Video: CEO Perspective: Building the Future of Bioinformatics with BigOmics and Seqera | Duration: 3620s | Summary: CEO Perspective: Building the Future of Bioinformatics with BigOmics and Seqera | Chapters: Welcome and Introduction (7.36s), Founding BigOmics (131.155s), Pharma Data Challenges (374.17s), Data Analysis Challenges (697.045s), Data Types Evolution (1179.175s), Omics Playground Overview (1565.42s), Interactive Data Analysis (2060.53s), AI in Bioinformatics (2800.775s), Scalable Bioinformatics Architecture (2958.67s), Architectural Decisions and Partnerships (3099.11s), Engaging Biologists with Bioinformatics (3218.23s), Commercialization Challenges (3317.965s), Learning Nextflow Resources (3404.97s), Conclusion and Farewell (3499.505s)
Transcript for "CEO Perspective: Building the Future of Bioinformatics with BigOmics and Seqera":

Hi, everyone. Welcome to today's webinar on the future of bioinformatics. We'll be starting in a couple of minutes; we're just waiting for more people to join. In the meantime, feel free to introduce yourself in the chat: what you're working on and where you're from. We'll also have a quick poll, which will pop up in a moment for you to complete. So, yeah, we'll be starting in just a couple of minutes.

Hi, everyone. Welcome to today's webinar on the future of bioinformatics. We're really excited to be doing this collaborative webinar with BigOmics Analytics today. Before we start, just a couple of housekeeping items. You should be able to see a chat tab and a Q&A tab on your screens. In the chat, feel free to introduce yourself: what you're working on and where you're from. We'll also be posting some useful links throughout the webinar. You've probably already seen the first poll that popped up. There will be a couple more polls throughout the webinar, so do feel free to join in on those. And finally, if you have any questions for Evan and Murod at the end of the webinar, feel free to put them in the Q&A tab. If you want Murod or Evan specifically to answer the question, do name who you want to answer. So without further ado, I'm delighted to introduce Murod Akhmedov, the CEO of BigOmics Analytics, and Evan Floden, the CEO of Seqera and cocreator of Nextflow. Let's kick off this webinar. Enjoy, everyone.

Awesome. Thanks a lot for the introduction, Lizzie. And it's great to have you here, Murod. I'd love to just jump in and hear a little bit about your background and what led you into the omics space.

Yeah. Hi, Evan. Thanks for having me here.
It's a great pleasure to be here and to share our journey, both my own and BigOmics'. I'm Murod, cofounder and CEO of BigOmics. My background is in bioinformatics. I've worked with different kinds of molecular-level data, including transcriptomics, proteomics, single-cell data, and others. My PhD was the period when I really became familiar with omics data and worked with it. The goal was making data analysis and interpretation available, especially to biologists, and supporting them. I've seen that biologists really struggle with data analysis, and they are separated from their data. Data was growing exponentially; it's still growing, and we can talk about data growth later on. But the main challenge for biologists was how to efficiently analyze and understand these data. Traditionally, biologists generate the data and send it to bioinformaticians, and we bioinformaticians analyze it and share results as PDFs and a bunch of Excel sheets. That creates a data analysis bottleneck, with a lot of back and forth between biologist and bioinformatician. That was also the period when we realized we wanted to solve it. We experienced it firsthand and wanted to solve it.

I very much share that same goal of making analysis much easier, as well as avoiding some of that back and forth. I think it's about working on a problem yourself, being able to come up with a solution, and then maybe others seeing that as valuable. We saw the same thing with Nextflow in many ways. We created the software for ourselves. Then, as others who had similar problems started adopting it, that enabled us to think: what could this be? How could we expand it and get it into as many hands as possible?
I guess from there you started to think more about the business side and actually creating a business around it. What were some of the experiences that drove you to create and build BigOmics?

Yeah. Great question. Before starting BigOmics, my cofounder, Ivo Kwee, and I, as mentioned, experienced this firsthand. We were overloaded with requests, and we started building a solution to make our lives easier. Around that time, Shiny was gaining popularity in the community, so it was easy for us to build the initial prototype, and we tested it in more than 50 different projects. It worked really well, because as two bioinformaticians supporting more than 50 biologists in 10 different labs, we really needed a solution to make our lives easier. And we realized that we were not the only bioinformaticians having difficulty supporting many biologists: bioinformaticians in other labs, and eventually biotech and pharma, were having similar issues. At that point, we spun out, wanting to really solve this problem. Today, we're really happy that more than 7,000 researchers across the globe, in more than 50 different institutions, are using Omics Playground. It's still just a starting point, but we're happy to see it and to support those researchers through Omics Playground.

That sort of setup, where you typically have a set of bioinformaticians working horizontally across multiple different projects, is something we see a lot across pharma as well. There are services groups within the organization, sometimes jumping into projects for a week, a month, or a year, and working on that.
And then just being able to provide something that can speed that up, so that the analysis can be more self-service, and setting that up as well. When you think about working with pharma customers and working in industry, how have you managed to work with them and build relationships inside of pharma? How have you found that experience, coming from the academic side?

Yeah. That's, again, a great question. Overall, the challenges are very similar, whether it's an academic lab or pharma. The main goal is making data available to a wider audience of researchers and supporting data analysis and integration fast enough. The eventual goal is obviously to shorten drug discovery time so that the final product hits the market faster and, eventually, so that we serve precision medicine. So overall, the challenge around data analysis and making it available is similar. But as you zoom out from a lab to a biotech or even big pharma, there are additional concerns and challenges that we need to be careful about. The general challenge is still there, but there are definitely additional points that a solution needs to answer. The first is scalability: you're not talking about a few users, but a few hundred users who might want to access data simultaneously. Standardization, or reproducibility, is another big aspect that we observe, and we always want to tackle that. The third aspect is enterprise readiness: how you manage users, assigning roles to users, admin users, and so on.

Yeah. There's a huge amount of those enterprise features, the enterprise functionality that allows organizations to adopt the software.
Sometimes it's seen as maybe more of the boring work that has to be done, but it really is important for getting that adoption. Maybe a more personal one: you and I both started as scientists and are now running businesses. Are there things you've learned, or things you've been surprised by, as you've had to manage people and build the business side of things?

Yeah. I think the first thing was really changing my thinking from an academic focus to zooming out and building a business. In an academic setting, the final output is always a paper, which is very important for sharing how you solved a problem. But in building a startup, or in an industry setting, that's just a starting point. The more important thing is how to commercialize it, and thinking more in terms of support, the scalability of your solution, and so on. That was the first challenge: shifting the mindset. Right?

Yeah. And it's very much a team game, I've found. From my experience, doing research is often yourself or maybe a very small group of folks working on something. Then as you start to scale up and build, there are very much team dynamics, and getting that right helps. I think it's mirrored in what happens with our customers, where they are ultimately trying to solve a business goal or a research project on a much bigger level, with a lot of coordination, etcetera.

Absolutely. Absolutely. And that was my second point that I wanted to highlight, which is really the team and how it's more of a collaborative goal here.
Rather than being first author on a publication, now it's: as a team, we need to solve this. As a team, we have milestones that we need to reach, and we need to make it happen.

Yeah. Amazing. So when you think about data analysis, there's a whole lot of work going on in terms of generating new datasets. This has become much more common, and the datasets are becoming larger. What are some of the challenges associated with the analysis and care of the biological datasets that you're helping to analyze with BigOmics?

Yeah. Great point. The general trend, as you mentioned, is that data acquisition is getting cheaper and cheaper. Therefore, the volume of data is growing exponentially, and with that come different challenges. The first is how to process this data efficiently, and the next is how to efficiently analyze and understand it, to uncover biology and develop drugs faster. At the moment, data generation runs like a factory unit, while data analysis and interpretation is still a craft. With Omics Playground, we would like to standardize this process and bring it into a more robust form to accelerate data analysis. One challenge around this is standardizing feature requests: how do you bring those feature requests into one place? Most of the time, the challenge is that different people ask for different sorts of analysis. Our rule of thumb is to be able to cover 80% of requests, with the remaining 20% always there.
Bioinformaticians are always happy to do bespoke analysis. That was also our experience: when we first developed the platform, we were able to cover 80 to 90% of analyses, the remaining 10% was bespoke, and we were happy to take care of that. Omics Playground can take 80 to 90% of the analysis. Now, going back to tertiary analysis and what kinds of questions can be answered on Omics Playground: it starts with basic analysis and also has more advanced analysis. The basic analysis starts with differential expression, enrichment analysis, pathway-level analysis, and gene set enrichment analysis, and continues to more advanced analysis such as biomarker selection and drug repurposing. We also have a deep level of batch correction available on the platform.

Yeah. I think jumping into that aspect of standardization is really key. We're seeing a lot of folks where the data generation point, as you say, is very automated, and you want to create datasets that are very standardized: you know exactly how they've been processed, and you have all of that information. A lot of the work we've done at Seqera is making that reproducibility aspect key. But you also need that customization. You need the ability to iterate quickly. I think what we're doing with BigOmics is really enabling that second piece to happen, maybe in a self-service way. We see this similarly in many ways: folks who are running Nextflow in production settings, where they're running things through in a very automated way, sometimes directly off the machines and readouts.
But then still being able to try to solve a specific scientific problem, ask that scientific question, and then iterate and learn. I think that's a really key thing. It brings out the difference between batch analysis and more interactive analysis, and sometimes it's a bit of a blurred line in what you see there.

Evan, by the way, would you be able to describe a little bit the unique parts of Nextflow and its benefits?

Yeah. Particularly when it comes to the downstream analysis that you're doing, having the data that goes into it in a standardized form, something predictable and reproducible, is really key. We see this across science: really being able to replicate what you're doing. With Nextflow, early on, we took this concept of reproducibility and tied it to the code that folks have, so the code that's generating the data, and to the containers, so the environment that the code is being executed in. And then being able to tie that to the computation. So Nextflow's benefits are really being able to do all of that analysis in a very controlled and reproducible way, but also in many different environments, so that you can develop on your laptop, share that analysis with colleagues, which is the collaboration aspect, and scale up, running in the cloud or on your HPC. Those are the key benefits that Nextflow has built in.
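As a rough illustration of the point above (the same pinned pipeline launched unchanged in different environments), here is a minimal Python sketch that assembles a `nextflow run` command line. The helper `build_nextflow_command` is hypothetical, and the pipeline name, revision, file names, and profile names are illustrative assumptions rather than details from the webinar. `-r`, `-profile`, and `-resume` are standard Nextflow command-line options; `--input` and `--outdir` are pipeline parameters in the style used by nf-core pipelines.

```python
from typing import List


def build_nextflow_command(
    pipeline: str,
    revision: str,
    profile: str,
    input_path: str,
    outdir: str,
    resume: bool = False,
) -> List[str]:
    """Assemble a `nextflow run` command (hypothetical helper, for illustration only)."""
    cmd = [
        "nextflow", "run", pipeline,
        "-r", revision,         # pin the exact pipeline code version
        "-profile", profile,    # select the container engine / environment settings
        "--input", input_path,  # pipeline parameter (nf-core style, assumed)
        "--outdir", outdir,     # pipeline parameter (nf-core style, assumed)
    ]
    if resume:
        cmd.append("-resume")   # reuse cached results from a previous run
    return cmd


# Same pipeline and revision in two environments; only the profile differs.
# All values below are illustrative assumptions.
laptop = build_nextflow_command(
    "nf-core/rnaseq", "3.14.0", "docker", "samplesheet.csv", "results"
)
cluster = build_nextflow_command(
    "nf-core/rnaseq", "3.14.0", "singularity", "samplesheet.csv", "results", resume=True
)

print(" ".join(laptop))
print(" ".join(cluster))
```

The design point is that the pipeline name and revision stay fixed while only the environment-specific profile changes, which is one way to read the "develop on your laptop, scale up on cloud or HPC" idea.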
What we've seen, though, is that the technologies I mentioned before have enabled a large amount of collaboration, with scientists coming together, working on similar problems, and then building those pipelines, which they share and collaborate on. Much of the growth of Nextflow has come from folks who have been building those pipelines. Other people who are interested in, say, running a particular analysis can take those pipelines off the shelf and get a result quickly, but in a way that's been tested and used by hundreds or, in many cases, thousands of people, so you get that reliability. You know that the outputs coming from there are a result you can rely on. And then your users who are, say, using BigOmics can take that data in a reliable way. Obviously, there's a whole bunch of options in BigOmics as well. If you think about the more common analyses that people are doing in BigOmics, we'd love to hear what the common datasets are that folks are analyzing, because I think some Nextflow pipelines sit upstream of that analysis, if I'm correct.

Yeah. That's exactly correct. Maybe I can start with data types. Omics Playground can handle transcriptomics, proteomics, metabolomics, and lipidomics as of today, and we plan to expand it even further.
If we take, for instance, bulk transcriptomics, the types of analysis the end user can do include, again, basic analyses such as differential expression, enrichment, pathway-level analysis, and gene set enrichment, which are the basic, core analyses. It also includes advanced analyses such as WGCNA network analysis. We also have a dedicated single-cell analysis module and biomarker selection and identification, which is important for biotech and pharma. We also have a dedicated tab for drug repurposing and drug connectivity analysis, which biotech and pharma find very useful. And before that, of course, as you also mentioned, batch correction, which is important. Luckily, in transcriptomics this is less of an issue, but in proteomics we see that batch effects between different experiments can be much larger, and we have a dedicated module to take care of this as well.

Awesome. Are there any particular trends you're seeing, or different analysis types or data types coming in that are becoming more prevalent in what your users are doing?

Yeah. We started from transcriptomics. That was a very natural way for us to start, because we came from the cancer research, oncology side. Recently we also see proteomics picking up, emerging really quickly, and in different forms: for instance, affinity-based proteomics, Olink and similar kinds of data types, but mass spec proteomics is also picking up. Metabolomics and lipidomics are also emerging. On the transcriptomics side, we see that single cell is definitely not only in research now; it's also in translational and clinical settings. Spatial transcriptomics is growing. It's a nice journey, actually: seeing more and more different data types is going to be interesting. It's definitely an evolving field.
I'm curious to see the future of that.

Yeah. Fantastic. I saw a very interesting diagram yesterday. We can pull it up on screen now. It shows the differences between bulk transcriptomics, pictured as a car that's been completely crushed, and single cell, where you have the individual components.

Dynamics. Right?

Exactly. And then spatial transcriptomics, where you know this piece sits in this place in the car. I think that higher resolution has really been helping our science. We're seeing the same thing: a lot of single cell and a lot of spatial transcriptomics is coming out. And it's not just the generation of very large datasets there, but the use of those datasets, particularly across drug development as well. It's really exciting to see, and I'd say it's great to see these challenges. I think the work that has been done over the years to create analysis systems is very much paying off: as these data types become more complex, the analysis becomes more complex as well. What do you think about the specific challenges that exist in interactive analysis? Are there any that were surprising to you, or things you've learned from users or from being involved in projects, as it relates to the interactive analysis itself?

Well, definitely. There are multiple challenges. What makes interactive analysis challenging is the involvement of multiple user types, people with different backgrounds.
For instance, bioinformaticians and biologists: we like each other, but at the same time we could hate each other, because of course there is a gap, a barrier, and people ultimately speak different languages. People who can speak both languages, the biologist's and the bioinformatician's, are very few, unfortunately. As an interactive tool, we're really happy to bridge the gap between biologists, bioinformaticians, and eventually clinicians. As for the challenge of interactive data analysis, we see that the current, traditional way is not scalable, so serving the purpose of scalability is important. We also see that currently the analysis is fragmented. The data itself is also siloed and fragmented; in most organizations, that's the case. We also see that the data itself is growing. The complexity within experiments is growing: more samples, more contrasts, larger samples. And we see multiomics coming in, which is a whole different space in terms of integrating all the different kinds of omics data. It's exciting. It's a current challenge, and even the next challenge: not only integrating transcriptomics and proteomics, but also linking that to metabolomics and eventually, who knows, imaging data and clinical data, to make better interpretations of the final phenotype.

Yeah. And with your users, you've got multiple different user types. That's very common with Nextflow as well. Are you typically working with both? Say you're deployed with a customer: are you working with both the bioinformaticians and the end users together to try to bridge that gap, and let the data speak as a kind of shared language?

Yes.
It's more of a collaborative tool. It helps bioinformaticians delegate those repetitive tasks, so that biologists, on the other hand, can do them interactively on their own without waiting too long, testing their hypotheses and looking at their genes. That way bioinformaticians don't get questions such as "Can you re-render my heat map or volcano plot?" or "I don't see my favorite gene"; biologists can check that on their own. And whenever they have bespoke analysis requests, they can come to the bioinformaticians, who can then work on those bespoke analyses. Again, we experienced this firsthand, and it was useful for us. We also see some other particular cases where bioinformaticians take advantage of it.

Got it. I don't know about the other folks, but I'd love to see a demo. Maybe we can pull one up. We'd love to see this software as well and get a bit of a feel for what we've been discussing.

So this is a brief introduction to the Omics Playground platform for the analysis of different types of omics data, from transcriptomics to proteomics, lipidomics, metabolomics, and more. Our platform is subdivided into seven different topic areas, with each module providing different types of visualizations. The data view module gives users an overview of the quality of their samples, visualization of individual genes or proteins across all samples, and other information that can be useful. Clustering analysis is where you can find your classic heat map and PCA, t-SNE, or UMAP plots. Again, this contains highly interactive and customizable plots. Here you can also see an introduction to our large collection of public databases.
We have more than 20 public databases covering 50,000 gene sets that you can use for everything from functional annotation of gene modules, in this case for the heat map, to gene set and pathway enrichment analysis, as well as annotation of WGCNA modules. Another basic feature is differential expression analysis. Here, you can see your classic volcano and MA plots. You can also visualize individual genes or proteins in your plots, and show extra plots with the expression in your comparisons of choice as well as an overview across all the comparisons. A more recent addition to our platform is the ability to perform time series analysis. Here you can see we have different time points, expressed as age, and the different gene modules that the platform identified, which can be used to subdivide the genes in your dataset. Biomarker discovery is a module that uses a combination of machine learning and other statistical tests on the gene expression or protein abundance profiles to identify biomarkers distinguishing your different phenotypic groups. Under gene sets, you will find gene set enrichment analysis as well as pathway enrichment analysis. Under pathway enrichment analysis, you can also get a visual representation of your different pathways, including the level of up- or downregulation of individual genes or proteins. Under comparative analysis, as well as performing your classic comparative analysis between comparisons in your dataset, you can also extend the analysis across all the datasets that you have uploaded to the platform, as well as a curated collection of 6,000 experiments obtained from the GEO database. The more advanced features of the platform cover the drug connectivity map.
This provides access to different databases, including the L1000 drug connectivity map, as well as drug sensitivity databases, which are really useful for understanding the modes of action of drugs or identifying drugs that can be repurposed for treating a condition. We have the cell profiling tab, which allows you to characterize individual cells in your single-cell RNA-seq or proteomics experiments based on a collection of different databases. You can also access the STRING database for gene or protein network analysis, where you can create your interactive gene or protein network. In this case, you can see the downregulated network in blue and the upregulated network in red. And finally, we provide access to WGCNA analysis. This is a module that is rich in plots and is particularly well suited to the analysis of complex datasets. A more recent addition to the Omics Playground platform is the multi-omics module, which allows users to combine datasets coming from different omics analyses, such as proteomics and transcriptomics, to gain insights. Under this module, you can find different tabs that represent different types of analysis, such as SNF, multi-omics GSEA, and MOFA. In particular, we have Lasagna, which we developed internally at BigOmics Analytics, where you can see multilayer modules of your different omics data types connected to the phenotype, and multipartite graphs that let you see connections between the different units or features of your omics data types and the phenotype you're looking at in your dataset. We have also included a deep learning tab that allows you to identify, among other things, potential biomarkers across your omics data types.

Well, it's awesome to see, Murod. There's a ton of functionality there. I can see how this will be super valuable for a broad set of users, and also just the number of analyses that you guys must have built in there.
Obviously, that's a ton of work. So, yeah, congrats on that. It looks fantastic. When we think about it, there is so much functionality that you've shared here. You're able to do so many analyses, but also to cover so many steps and so many different aspects of the data analysis. There's that ability to have a single tool where you can cover so many bases. Something that we've tried to build out is going from not just the data itself to running the pipelines, long-running pipelines, and then also tying in some of that interactive analysis. That's something we've been building out with Studios. And, yeah, I think we've got to show it a little bit. I can share a bit of how we can show some of that analysis, and particularly integrate BigOmics inside of those Studios, so that you can tie those automated analyses all the way through the processing and get to the scientific reporting. One of the things we've seen in particular is that when you want that sort of interactive analysis, you want to do it close to your data, and you don't want to have to move things around. That's the shuffling, the administration, all the annoying things that go into bioinformatics. We try to remove that, and we're doing that with Studios to make it a whole bunch easier. I can share a little bit more on how we integrate this, and I know that the teams have been working together to integrate this inside of Studios.
I think this is a really nice integration point, so that folks can get the benefit of Seqera in terms of the data, the pipelines, etcetera, and tie it in. So I'm going to pass it over now to Rob Syme, who's from our team, and he's got a fantastic little demo showing this end-to-end solution.

Okay. So let me talk a little bit about Studios. Obviously, Nextflow is the pipelines component for reproducible analysis. And workflows are great. They allow us to do a great deal of work without really touching or worrying too much about the data. But, of course, that has its limits. There are times when you're going to want to get a little bit closer to the data: to write a report, to publish a paper, to get some figures together. And this is really where Studios comes in. So I'm going to jump across to the Seqera Platform here. This is a workspace I've created, for a fake organization called Tangen Research Labs. Let's say we're doing some hematologic cancer work. I've run a couple of little pipelines here. I ran a little hello world pipeline, and then I ran a couple of pipelines to gather and analyze some data. The first pipeline I ran was the fetchngs pipeline, which took a sample sheet with some sample IDs. These get pulled from SRA and dumped into my own S3 bucket, ready for analysis. The second pipeline I ran was the nf-core RNA-seq pipeline. I'm going to jump in and look at that pipeline. It ran things like HISAT2 and Samtools for alignment. Inside of the run here, I can look at the Reports tab, and I can see some QC reports: some PCA plots and a MultiQC report, which summarizes the run. This is extraordinarily useful, but not quite interactive enough.
This is a great way to double check that everything has worked as we expect. We can see alignment rates, duplication rates, everything looks okay. But at some point, we're gonna want our hands on the analysis, and that's where these studios come in. So studios are a way of launching interactive analysis inside of the Seqera Platform. And those analyses can come in a couple of different types. If I add a new studio here, I can select the compute and the data I wanna mount. And this mounted data is data in object storage on S3. So these pipelines I ran earlier were writing data directly to S3 so that it can be shared and viewable by the rest of the team. And that data on S3 is accessible inside of these studios as if it was a native file system. So if I had my R scripts or my Python scripts or Ruby or whatever that work on local data, that data in object storage will be accessible to those scripts inside of these studios. Seqera provides a number of templates for these studios. So you can launch things like Jupyter Notebooks, RStudio as an R IDE, things like VS Code, or an X11 desktop environment for doing things like running IGV. So if one of those templates is sufficient, fantastic. We also provide the option to supply prebuilt container images. So if one of those templates isn't a good fit, you can create your own applications and provide the prebuilt container image here and have your team spin up reproducible analysis using that custom application. And one of the custom applications that is getting a lot of support is this BigOmics Omics Playground. We'll show you how to launch that in a second. So I have two studios here. The first studio I started was this exploratory R IDE. So this is the studio running now. You can see this is running on cloud.seqera.io.
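As a side note to the point about mounted object storage behaving like a native file system, here is a minimal sketch of what "a script that works on local data" might look like inside a studio. The mount path and file name are hypothetical placeholders, and the assumed file layout (a tab-separated table with gene IDs in the first column and one column per sample) is an illustration rather than a description of any specific pipeline output.

```python
import csv
from pathlib import Path

# Hypothetical mount point and file name, for illustration only. The real
# path depends on how the bucket is mounted in your own studio.
MOUNTED_COUNTS = Path("/workspace/data/my-bucket/rnaseq/gene_counts.tsv")


def load_counts(path):
    """Read a tab-separated counts table with ordinary file I/O.

    Assumed layout: the first column holds the gene ID and every
    remaining column holds the counts for one sample.
    """
    counts = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        samples = header[1:]
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            gene_id, values = row[0], row[1:]
            counts[gene_id] = {s: float(v) for s, v in zip(samples, values)}
    return samples, counts


if __name__ == "__main__":
    if MOUNTED_COUNTS.exists():
        samples, counts = load_counts(MOUNTED_COUNTS)
        print(f"{len(counts)} genes across {len(samples)} samples")
    else:
        print(f"{MOUNTED_COUNTS} not found; adjust the path to your mount")
```

Nothing in the function knows about S3. It only opens a path, which is the property the demo relies on: code written against local files should run unchanged once the bucket is mounted.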
And I can read files from what looks like a local file system, and this is actually a path on object storage, into the studio, and do the sort of data munging to create new sample sheets and new analyses. So this is just a really small little application. But what I can also do, the second studio that I have here is my Omics Playground. If I jump into that, here inside of my Omics Playground, I have the BigOmics Omics Playground application running inside of my own infrastructure. I can upload new data. When I click select files, I can browse data. So this is data on my object storage inside of my S3 buckets. I can do things like grab the counts file that was generated from that nf-core rnaseq run. I can step through. I can also select the samples.csv, which contains the phenotype data from those samples. I can create comparisons. I'm gonna let Playground auto-detect the comparisons from my phenotype information. I can have Omics Playground do things like handle missing values, do QC, normalization, removing outliers, etcetera. Give it a name. And now Omics Playground is gonna go away and compute the data. Now I'm gonna let Axel talk about the details of Omics Playground, given that he is definitely the expert there. But I just wanted to show you how easy it is to move from a Nextflow pipeline running on data pulled from SRA onto my S3, then analyzing that data on S3 with another Nextflow pipeline, and then ingesting that seamlessly into a studio, either something like an R IDE, VS Code, a Jupyter Notebook, or even these fully fledged applications like Omics Playground. Yeah. Perfect. It was a great pleasure to see the demo. And also for the audience, I think it shows how they can analyze on Seqera, download their data, and link it to Omics Playground. It was a fantastic demo, Evan. Quick question on the future.
So how do you see AI playing a role in the preprocessing aspects, in terms of building pipelines for the secondary analysis? Yeah. I'm just wrapping up here at JPMorgan in San Francisco. So this morning, it's been really inspiring to see all of the work that's been going on and how folks are really applying AI. For us, given the fact that so much of this is code generation, and so much of it is really similar to what's happening in code, I think that's a big thing that we've been focusing on, particularly with Seqera AI. So making it very easy for folks to develop pipelines, but then going beyond the development to the execution of pipelines and the setup of the data. When I see the way the future is going, I think there's an ability to take away a lot of that grunt work that's required. So you can think of things like creating sample sheets, and being able to do that in a much more agentic, automated way will really enable us, in some ways, to do more analysis. Going beyond the productivity aspects of it, we're starting to see a bit of work coming out on actually using LLMs for scientific understanding. So being able to use that to draw insights. Obviously, if you can run things more, if you can read all the papers that are relevant in an instant and then apply that scientific reasoning, I think that's sort of the next step on that. And then whether you can really use that information to come up with novel hypotheses that you can test in a new way. So there's a huge amount of excitement in the space. I think things are moving so fast, it's very difficult to keep up sometimes as well. So, yeah, we're seeing a ton of this, and, yeah, folks are interested.
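To make the "grunt work" of creating sample sheets concrete, here is a minimal non-AI sketch of the task being described: pairing FASTQ files by name and writing a CSV sample sheet. The file naming convention (`<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`), the function name, and the bucket paths are assumptions made for illustration. The column layout follows the `sample,fastq_1,fastq_2,strandedness` convention of the nf-core rnaseq sample sheet, but check the pipeline documentation for the version you run.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed naming convention: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def build_samplesheet(fastq_paths, strandedness="auto"):
    """Group FASTQ paths by sample name and return sample sheet CSV text."""
    pairs = defaultdict(dict)
    for path in fastq_paths:
        name = path.rsplit("/", 1)[-1]
        match = FASTQ_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed convention
        pairs[match["sample"]][match["read"]] = path

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
    for sample in sorted(pairs):
        reads = pairs[sample]
        if "1" not in reads:
            continue  # a row needs at least a forward read
        # Single-end samples get an empty fastq_2 field.
        writer.writerow([sample, reads["1"], reads.get("2", ""), strandedness])
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical object-storage paths, for illustration only.
    example = [
        "s3://my-bucket/fastq/tumor_R1.fastq.gz",
        "s3://my-bucket/fastq/tumor_R2.fastq.gz",
        "s3://my-bucket/fastq/normal_R1.fastq.gz",
    ]
    print(build_samplesheet(example), end="")
```

A script like this breaks as soon as file names drift from the assumed pattern, which is part of why an agentic approach to this step is attractive.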
I'd recommend giving Seqera AI a try there. We've got some recent blog posts around some of the work we've done with protein design highlighting that. So very cool work there as well. What do you see in particular around the application of AI in the interactive analysis? I'd be super interested to hear more about that. Yeah. So on our end, integration of AI and LLMs into our platform is actively ongoing. We see AI at two levels. The first one is LLMs. In terms of interpretation of results, that's very useful. We're working on it. The starting point for LLMs is not gonna be, for instance, doing the analysis from scratch, because we already have a well-established base in terms of computing these statistical results. But we see good value for end users in terms of how effectively they can summarize those experiments. For instance, take the expression enrichment tab or drug connectivity tab and make a common summary out of it quickly, or compare two different experiments and quickly see which trends are similar or dissimilar between the conditions in two different experiments, or even across all of their experiments. And this is the first layer. The second layer is deep learning, where, of course, it takes all this matrix-level input, and we use deep learning more for multi-omics at the moment. And there are other applications that we will be working on in terms of biomarker selection and others. The final goal here is, as a field, we also see, of course, newer and newer data types coming in, and that makes the whole field interesting and challenging. And one big challenge is how to effectively integrate this multi-modal data. It's not only multi-omics, but eventually other types of data as well, including imaging, as mentioned, and clinical data.
And we see the application of deep learning as one nice use case here in terms of integrating all this data. And at the moment, for instance, just considering a single data type, transcriptomics, of course bulk RNA-seq is limited there because of the number of samples, but then with single cell, in both the transcriptomics and proteomics layers, that could be really interesting. I'm curious to see how this evolves in the future. Yeah. And a lot of that single-cell data is very much more tabular. It's more in the format which is used in traditional machine learning algorithms. So you've seen a bunch of that with some of the things like the Virtual Cell Challenge, etcetera. I love the idea of the summarization. We've built this into MultiQC. So the analysis of a QC report can sometimes be 10 pages or so, with multiple different things. But, really, can you look at that and get that quick view, that summary, maybe pick up on something that you wouldn't essentially catch by eye? So we've gotta pull that in. I love that idea. We're really excited to see this come into the product as well. And, yeah, that's fantastic. Well, thanks so much for the time here. I really enjoyed this conversation. Maybe we could pass it over to the audience now, and we'll take some questions. And, yeah, looking forward to what you've all got to ask. Yeah. Thanks, Evan. Welcome back, everyone. We're gonna kick off the live Q and A now. Evan and Murad will be joining very shortly. It's really exciting seeing the questions that have come through. Many of you, obviously, sent us questions while you registered for the webinar. So we will be working through some of those. There's also a poll that should have just come up as well, which you might be interested in based on some of the questions that we're gonna be touching upon. So yeah.
So Evan and Murad are gonna join me very shortly. There we go. Hi, Evan. Hey, folks. How's it going? Hello, Murad. Awesome. Hi. So we have a lot of questions that have come in for the live Q and A. The first one actually is themed just where you guys were touching upon AI at the end of your talk. So the first question is around the use of AI agents in multi-omics. So do you think the increased use of AI in bioinformatics will reduce the quality of interpretations or enhance it, given the fact that AI is still prone to hallucinations and humans may resort to looking at AI interpretations rather than their own judgment? Who wants to go first? Murad? Okay. Yeah. I can go first. In general, I'm very positive about AI and LLM applications in many fields, including omics data and image data as well. So I think it will definitely improve and enhance the interpretability of omics data, if it's used correctly, obviously. Because LLMs and AI have massive potential in terms of identifying patterns quickly and doing the analysis effectively and fast. But at the same time, we should not overtrust them, because then some problems come up, including hallucination, as we were discussing. The models could also be biased based on what kind of data they have been trained on. So definitely, the combination of human plus AI, I would say, is gonna improve things in the future. That's how I would see it, at least in the very near future. Yeah. And, I think, would you... sorry. Go on, Evan. I think we can see something similar, where even a couple of years ago, you saw these very basic failures, like models not being able to add one plus one, etcetera. And things have really changed; the logic that's built into the models has changed a lot. We've seen this with MultiQC.
We added it into MultiQC so you can generate a report from your Nextflow runs, for example. And then from there, you're able to get a written summary of those. And people found that incredibly useful. I think it really depends as well on how critical that work is now. I think in the future, though, this is something which is gonna be built in everywhere, and we're all gonna rely on this a lot. Awesome. Thank you so much. Now this next question, I think, is for you, Evan. So from a CTO perspective, as bioinformatics platforms scale across organizations, how do you design an architecture that preserves reproducibility, security, and cost control while also enabling the scientific and bioinformatics teams to rapidly innovate? Yeah. I think there's a common tension that exists between being able to move very fast and iterate very quickly as you're developing things in science, and the more rigid controls that are needed, whether you're working in a more regulated environment or just in a larger organization where there are typically more policies in place. One of the things that we see is a separation between the administration aspects of having a large scientific platform and the day-to-day work. So keeping those things in many ways defined across an organization, while still enabling scientists to build pipelines, run analyses, access data, etcetera. Building a lot of the enterprise controls into the way that those things are accessed as well. There are some things which are common all the way through, though, and I think keeping as many of those principles, or even primitives, that you have in your systems the same across them really helps a lot. Think about reproducibility.
If you were developing a pipeline, say, and you use containers from the very first time that you run that task, you can still use those same containers all the way through to production runs, and they can be shared across the organization as well. So we thought about that a lot when thinking about the concept of a pipeline, which is a Git repository, and that concept holds. Similarly with Studios as well, as we've been building those out: keeping those primitives really constant across an organization, while being able to build in your enterprise layers around cost and data management, etcetera, seems to be a very good principle for that. We've also said something about ideal scenarios. We're happy to chat to folks if they want to plan anything out and get some changes going. Awesome. And a kind of follow-up question from that, and perhaps, Murad, you can weigh in here as well. So what architectural decisions are effectively irreversible at scale, and how do strategic partnerships like Seqera and BigOmics help de-risk long-term choices in a fast-moving computational biology landscape? Yeah. Great question. I mean, very quickly on the first part of the question: the important aspects of building those internal tools would definitely be the modularity of the code base, as more and more people might be coming into the team in terms of development and, later on, deployment, and thinking about, again, multi-seat use, scalability, all those aspects. And then the second part of the question, which is in terms of partnerships, is about specifically focusing on one part of the solution. For us, for instance, it was clear from the beginning that we really wanted to target the visualization aspect, because it was also our focus and our expertise. And it was a nice decision because, when you start small, every startup has very limited resources.
So having a focus is very important to tackle exactly what problem needs to be solved. And later on, we were really happy to see this collaboration coming up with Seqera, where the preprocessing side is nicely handled by Seqera. And later on, end users can use Omics Playground for the visualization and interpretation aspects. Awesome. No. That's brilliant. And you mentioned, obviously, one of your focuses was visualization. So another question that came through is: how should we engage biologists to use bioinformatics tools more? That's a great question. At the same time, I would say it's a tricky question because it's not easy to do that. I would say there are two parts. One part is ease of use, and the other angle is completeness of analysis, or the types of analysis that are requested across organizations. And obviously, for biologists, the solution should be easy to use. It cannot be complicated. It has to be intuitive. The learning curve should be short. But at the same time, the range of analyses that are requested should be covered by that platform. Otherwise, they need to be jumping from here to there, or downloading intermediate results from that platform to other platforms. So the coverage of analysis should also be high. And those two angles, actually, don't go along with each other. When you would like to make something complete, it's so easy to make it complex. And if you wanna make it user friendly and simple, then you would be missing a lot of other analyses that could be important. So the solution should be tackling these two angles nicely and bringing them under one roof, I would say. Awesome. Thank you for that. We have a question from the chat. And I guess this is for both of you again. So what were some of the challenges you faced taking Omics Playground, and also Nextflow, from an internal tool to the basis of a whole company? I'll take that.
I think there's an aspect of just the commercialization of it. You have to build something that is gonna generate revenue so that you can pay to build it, and ideally build a sustainable business around it. Those are very different skills from building a command-line workflow tool. To do that, I think you've gotta become very adaptable and learn a lot of new skills that maybe you weren't really exposed to through some of the more scientific backgrounds. I think many of those things I touched on in some of the earlier answers, but there are aspects of working with the team, learning to discuss with customers and learn about their problems, and being able to solve those quickly. A lot of it is being able to focus on many different things at the same time. And then there are aspects of people management, which maybe I hadn't been exposed to a lot before. I'm not sure if you had similar experiences. Yeah. Exactly. I'd just echo that. Nice. Awesome. And then another question for you, Evan. So how can early-career life science graduates effectively start using Nextflow and Seqera Platform for genomics and bioinformatics workflows, especially if they don't have access to large-scale HPC infrastructure? This is a great one. In terms of learning Nextflow, if you go on the Nextflow website, there is a huge amount of material. It's all free. We have training on there as well. There are a couple of series of videos; you can start from Hello Nextflow and go all the way through some of the things there. I'd also maybe make a plug here for nf-core if you wanna get involved in the community of people who are building open source pipelines. It has a large amount of resources, but also gives you a little more hands-on experience with some of the specific problems. You can join their Slack, join a hackathon. It's a great way to meet folks as well as contribute back.
In terms of running these pipelines at scale, if you wanna get exposure to running them on the cloud, you can sign up to Seqera Cloud. We have accounts there which are free to sign up for. There's also an academic program that we have for folks who want to have their own installation of Seqera in an academic organization. So there's a bunch of resources there. I'd also recommend you read some of the blogs, etcetera. And people are very welcoming across the community, I find. You'll gain some knowledge and hopefully make some friends. Awesome. We are running out of time, but I do have one more question for Murad. And then hopefully, we'll be able to follow up with some of the answers following this, as we've had lots of great questions come through. So I guess, Murad, how do you think about the market size when evaluating whether you'll build a new product or feature? Yeah. Nice question. Great question. So it's hard to evaluate market size, and that's also a very important question from the business perspective. How we do it is we actually collect the requested features: who is asking, how many times it's asked. We have a rule of thumb: if it is asked only three or five times, wait until it has passed that threshold to include it, and then also evaluate it based on where it's coming from, how commonly it's used, and what the potential is. Is it coming from an already existing client, or is it coming from a future client? Is it needed to maintain a client or to get a client? All those factors add up, and then, yeah, we can eventually add that feature or just wait on it. Awesome. And thank you so much for that. Unfortunately, we are out of time. For everyone watching, thank you so much for joining. We will be following up with the recording after the event, and also any follow-up material relating to Omics Playground, Nextflow, or Seqera as well.
Just before we sign out, we have just put up a final poll so we can direct the right information to you after this webinar. Thank you so much for joining, and Evan and Murad, thank you so much for joining the webinar today. I'm sure there are lots of key insights for everyone to take away. Yeah. Thanks so much, everyone. Thanks, Rob. Thanks, Amy. Thanks, everyone. Bye bye.