Video: Modern Pipeline Development: A Deep Dive into the Latest Nextflow Features | Duration: 3688s | Summary: Modern Pipeline Development: A Deep Dive into the Latest Nextflow Features | Chapters: Welcome and Introduction (6.4s), Nextflow Plugins Registry (307.92s), Nextflow Stable Release (464.185s), Typed Syntax Introduction (554.195s), Scalable Pipeline Development (2944.95s), Nextflow Design Decisions (3041.64s), AI in Development (3151.72s), Deprecating 'Each' Utility (3325.41s), Output Configuration Changes (3446.63s), Config System Improvements (3520.835s)
Transcript for "Modern Pipeline Development: A Deep Dive into the Latest Nextflow Features": Hello, everybody. Welcome to the latest Seqera Nextflow webinar. Thank you for joining, everyone. We'll start in just a couple of minutes. It takes a little bit of time for people to drop in. I'm gonna kick off with a poll or two at the start, I think, just to get you all warmed up. Drop your name in the chat as you're joining. Let us know what your name is and where you're joining from today. We're just curious to see where folks are based. Oh, there we go. First poll is up. What version of Nextflow are you currently using? You have a couple of minutes to think about it before we start the talk. Hey, May. From. Nice. From Dundee. I think you might be lying, Michelle. I'm in Stockholm, and it's definitely not sunny here. Rhode Island. Maybe some sunshine there. How many answers have we got for the poll so far? Oh, votes are coming in. Tony from Barcelona. Good to see you. Yeah. Rhode Island is overrepresented in this webinar so far. So when you put that poll thing in, we'll get an idea of how many people are running which versions of Nextflow. A bit of fun just to start off. If you're just dropping in, thank you for joining us today. This is the latest Seqera webinar about Nextflow updates. We'll be kicking off in just a minute or two. It just takes people a few minutes to join the webinar when it starts. I'm seeing a bit of a bimodal split coming in on the poll. Plus one for all of the above versions. That's a good answer. We should have put that on the poll, actually. Use all the versions. Alright. I'm gonna share the poll results now to put you all out of your misery. There we go. Not bad. Oh, it was looking quite good a few seconds ago and then a bunch of you hit submit all at once. But there was definitely a split. There's lots of people on 25.10.
And then earlier, that was a pretty good mix, actually. So a good spread of updates. Thank you for doing that. That was fun. Right. I'm gonna kick off. Got a good number of you. Joining seems to be slowing down a little bit. So, welcome today. Thank you for joining the Nextflow webinar from Seqera about updates to Nextflow. My name is Phil Ewels. I'm senior product manager for open source at Seqera, which includes Nextflow. And today, I'm also gonna be joined by Ben Sherman, senior software engineer at Seqera, one of the main developers working on Nextflow, and Paolo Di Tommaso, chief architect at Seqera, who's obviously the creator of Nextflow. And we've got a really nice setup for you today. I'm just gonna give you a very quick update about a couple of things at the start, and then Ben's gonna do a bit more of a deep dive into some of the new Nextflow language features which we've released, how they work and why we think you're hopefully going to love them. And at the end, we've got an extended Q&A section, part of this webinar where we're gonna do an AMA, an ask me anything, for Paolo, well, for the whole team, really, but Paolo is going to be there. So if you've got any burning questions about why Nextflow is the way it is, you can get them off your chest. Before we get started, I just want to point out a few things here. Firstly, we've got the chat on the side, which is great for doing what we're just doing, just chatting. If you have any questions, please put them into the Q&A tab, which is alongside. That way they won't get lost, we can keep them nicely organized, and we can also reply to them if we don't have time to reply on screen. People find this really difficult. So this is my challenge: see if you can be the first webinar where people actually use the Q&A tab. And then we also run the polls. You've seen one already.
We've got a couple more for you as we go through. We really appreciate it if you answer them. Some of them are for fun. There's a couple more at the end, so try and answer those. That's really appreciated. And if you need to run off at any point, or you miss something one of us said, the whole session is being recorded and will be available pretty soon after this webinar wraps up. Okay. So I'm gonna kick off with a quick intro. Like I say, then Ben's gonna talk about language updates and we'll have Q&A at the end. So, the main thing I'm gonna mention is the new Nextflow plugins registry, which is not really a feature of Nextflow itself so much, but we did release it alongside Nextflow 25.10 in October. And very briefly, what is this and why is it interesting? If you've ever used Nextflow in the past with plugins, you know you can import them. And what Nextflow did in the background is it went off and checked this GitHub repository, nextflow-io/plugins. And on this repository, there was simply a JSON file. It's about, I think, four and a half thousand lines of JSON. And in there is a list of every single Nextflow plugin with every single release, with a URL and a checksum and everything. And that's how Nextflow validated what you put in your import statement and then knew where to download the plugin from. This is a very simple setup, and it has served Nextflow very well over the years. It's been working, but it doesn't scale brilliantly. That JSON file gets bigger and bigger, and slower and slower for Nextflow to download on every run. And if there are any glitches with that download, then Nextflow can't run. So we've been feeling some growing pains around that for a while and wanted to see if we could do better.
So we've launched the Nextflow plugins registry, which you can visit now at registry.nextflow.io, where all Nextflow plugins, past and future, are published. You can browse them through your web browser, find them, discover them, which was basically impossible before, and see some metadata about them. And Nextflow can access that same registry through a stable, custom-designed API. You can go through and find things. For example, Robert's done a plugin here for and if factopia for his pipeline, and you can see all the different versions there and how many times they've been downloaded. And you can go into specific versions and see all the metadata. And we can do some clever stuff with this now that we're running a service ourselves. For example, there's a shiny new button down there saying security scan. Because of how we host the registry service, we can do security scans on every single plugin. One other cool feature: in the past, if you were using the language server with VS Code, configuration options from plugins were impossible to know about, because the language server doesn't know about that part of the language that's extended. And so you'd get a little wiggly yellow line like this at the top saying unrecognized config option. But now we can expose those, because the plugin registry knows about plugins and can tell the language server. And so you get really nice, rich error messages and language help as you develop. So those are just two things that we think the plugin registry can help with, and we hope that you find it really helpful. Go and check it out at registry.nextflow.io. Okay. With that, I'm gonna move on to Nextflow itself. So at the last Nextflow Summit, basically a couple of months ago, we released the latest Nextflow stable release, 25.10. We typically do two major stable releases per year for Nextflow, one in April, one in October.
And so this version of Nextflow has some key updates. There are improvements in the speed of S3 transfers. If you're interested in that, we've got a great blog post on seqera.io, with lots of nice technical graphs comparing what changed and why it improved things. A couple of new commands. So you can now authenticate against Seqera Platform very easily via the CLI and launch pipelines there directly. nextflow run works exactly the same, but if you want to run on Platform, you just switch it out for nextflow launch. Just hopefully making that process a little bit easier. But really the two killer features, and where we put most of our efforts, were around these new language syntax features, which were workflow inputs and outputs and static types. And that is what you've all come here to hear about today, I'm sure, and that's what Ben is now gonna tell you all about. So, with no further ado, I'm gonna pass over to Ben, and he can tell you all about it. And I'll see you at the Q&A. Ben, thanks for joining. Hello, everyone. Howdy. Thanks for having me. I'm just gonna pull up my screen here because I've got a lot to show you today. So I've prepared a little demo showing a side by side of what it looks like to use these new language features that we've added in Nextflow 25.10 in practice. I encourage you all to go and watch my demo at the Nextflow Summit, where I go into a lot more detail about all the specific features and how they work. But I figured for today, it might be nice to just show you a whole example. And I've taken this toy RNA-seq pipeline that we always use and shown you a before and after of what it looks like to write it idiomatically in 25.04 and 25.10. So about a year ago, we introduced the Nextflow language server, which we added as an enhancement to the Nextflow VS Code extension.
If you're using VS Code, it's very easy to install. You can just go in here and search for the Nextflow extension and install it. Ignore these. I don't know what these guys are doing, but this is ours. This is the main one. You can install that, and it should just work out of the box. If you're using other editors, the community has added support for the language server to a bunch of different editors like Neovim and Emacs. You can reach out to people in the community, like on the Nextflow Slack or on the language server GitHub, if you are interested in using the language server in other editors, because it should work with most editors. And so we added that, and this was a way to give a much better developer experience when you're writing Nextflow code. The main thing is that you get errors as you're writing code in the editor, rather than when you're halfway into a pipeline run and you've already spent however much money running your pipeline. Right? And then earlier this year, in 25.04, we added a lot of that functionality into Nextflow itself. We added a new command called nextflow lint, and you could run that lint command and it would give you the same errors that the language server gives you. And now in 25.10, we're building on that functionality to add new features that allow us to evolve beyond just being a Groovy DSL. So now Nextflow is really becoming its own language with its own syntax that has nothing to do with Groovy. And this will allow us to build features that are Nextflow-first, rather than having to deal with Groovy. And so we're in a little bit of a transition period right now, because that new functionality has to be enabled explicitly in 25.10. It's called the strict parser or the strict syntax.
And so for all of these new features, you need to enable that in order to use them. And then in the next version, 26.04, we're gonna make that enabled by default. I'll show what that looks like in a moment, when I run the pipeline. For now, I just want to walk through some of the code changes and get you guys familiar with what's going on here. So most of you, I assume, are probably familiar with this RNA-seq pipeline, but in case you're not, it's on the Nextflow GitHub. It's just a basic RNA-seq analysis. It takes some FASTQ pairs and it runs FastQC and then Salmon, and then it runs MultiQC on all of that. And so you've got this main workflow here with a couple of params for the input samples, the transcriptome, the MultiQC config, and the output directory. And then you've got a little subworkflow here that has all the actual workflow logic that calls all the processes. And then down here, we've got all of the individual processes. The actual pipeline repo has all this stuff split up into modules, but I just put everything in one file here to keep things simple. And so we can see it's all the standard Nextflow syntax that you're all used to. And in 25.10, the way I would describe it overall is that we've basically overhauled every level of the language to give you more ability to express the structure of the data that you're using. The technical term is static typing. We've added static typing to the language. This allows you, whenever you're declaring channels and variables and inputs and outputs, to declare the type of those values. And this has two purposes. One is just better documentation. It's a lot more informative for a user to see, like, okay, this reads, this is a path. Whereas over here, you can't really see what it is. Right? You just know that it defaults to null.
But here you see, okay, this is a file. I need to give it a file. So there's better documentation and readability, especially when collaborating with people. And then the other purpose is that it enables better validation. So with the language server and the strict syntax, we were able to improve a lot of the error checking, because it's a lot stricter about what syntax is allowed. But that was only the first layer of errors when it comes to all the different ways that an actual pipeline can go wrong. There's the first layer, which is basic syntax parsing errors. And then there's the second layer, which is more like logical errors or type mismatch errors, where you try to add a number and a string together, or you think you have a file and you try to call file.size, but it's actually a string, and so it gives you the length of the string instead of the size of the file. Things like that are kind of hard to catch. And so having the ability to specify types in the language allows Nextflow to catch a lot of those second-layer issues. And so let's just walk through a little bit to give you a concrete idea of what's going on here. The first thing that I wanna make very clear about all this new syntax that you're gonna see is that it's completely optional. None of the syntax in this 25.04 version will stop working in 25.10. I can take this 25.04 script, run it in Nextflow 25.10, and it'll work just fine as well. All we've done in this phase is add new syntax, new options. In time this old syntax may be deprecated and phased out, but we're in no hurry to do that. So don't feel like you need to upgrade immediately. Don't feel rushed. Take your time.
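The string-versus-file mix-up mentioned above can be sketched in a few lines of Nextflow. This is only an illustration, not code from the demo: the path `data/sampleA_1.fq` is a hypothetical placeholder and would need to exist for the second call to report a meaningful size.

```nextflow
// Illustrative sketch only; the file path below is a hypothetical placeholder.
workflow {
    // A String that merely looks like a file path
    name = 'data/sampleA_1.fq'

    // A Path created with the built-in file() function
    reads = file('data/sampleA_1.fq')

    // Same method name, very different meaning:
    println name.size()     // number of characters in the string
    println reads.size()    // size of the file in bytes
}
```

Both calls are accepted when nothing declares what `name` and `reads` are supposed to be, which is why this kind of mistake tends to surface late. Declared types give the tooling a chance to flag it earlier.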
Take the time to evaluate these features and how they'll help you, how much time it will take you to migrate, and migrate at your own pace, and we'll make sure to keep the ladder under you as you move up. Okay. So let's start with the params, because this is probably the simplest one. There's really not much difference functionally. It's just a different syntax. The main thing that this params block gives you is that now you can specify a type for each param, whereas you couldn't do that over here on the left. You could specify a default value, but is it a number? Is it a string? Is it a file? You had to just rely on users doing the right thing. For example, you might have a param that's a string that maybe corresponds to a sample ID, but then at run time, you give a sample ID that happens to be a number. And Nextflow is trying to be helpful, but it ends up parsing that string as a number, and then that causes other weird errors down the line. And so this just allows you to be much more precise about the types of your parameters. And then Nextflow can also use this type information at run time, so that when you say, like, --sample 123, and over here you've defined params.sample to be a string, it will understand, okay, I know it looks like a number, but it's actually declared as a string, so I'll keep it as a string, and problem solved there. Okay? When it comes to the workflow logic, there really isn't that much changed. If you compare these two, they're basically the same. We haven't really changed anything about the workflow logic. We are looking into further improvements, but for now nothing really changed there. One minor thing is that if you were using an onComplete handler or an onError handler, it used to be you had to do it like this. This is a very Groovy-friendly syntax. What we realized was that there was a much nicer way to do it.
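A typed params block along the lines described above might look like the following sketch. The parameter names and defaults are assumptions chosen for illustration (they are not copied from the demo), and the sketch assumes Nextflow 25.10 with the strict syntax enabled.

```nextflow
// Sketch of a typed params block (assumes Nextflow 25.10 with strict syntax enabled).
// All parameter names and default values here are hypothetical.
params {
    // Declared as a String, so a value such as 123 should stay a string
    sample: String

    // Declared as a Path, so readers can see that a file is expected
    transcriptome: Path

    // A parameter with a default value
    outdir: Path = 'results'
}

workflow {
    println "Sample ID: ${params.sample}"
    println "Transcriptome: ${params.transcriptome}"
}
```

It could be run with something like `nextflow run main.nf --sample 123 --transcriptome ref.fa`, where `main.nf` and `ref.fa` are placeholder names. Parameters declared without a default, as `sample` and `transcriptome` are here, need to be supplied at run time.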
So we added this new section to the entry workflow called onComplete. You just say onComplete, and then whatever code you write in there will be run at the end of your pipeline, as if you had written this. So again, this still works, but it's just a little bit of a nicer syntax. Another difference you'll notice here is we've got this new publish section and we've got this output block here. I'm gonna come back to that in a second, because that's a whole can of worms. What I wanna do right now is stay focused on the static types, and then we'll come back to the outputs. So here we've got our subworkflow. Again, the workflow logic hasn't really changed at all. The main thing is that now, with your takes and your emits, you can specify types. This is especially useful with the takes. It's not as necessary with the emits, because the type checker in Nextflow can actually infer the type of your emits, like samples and index, as long as you specify the types at the top here. There's a natural flow to it, where the type checker will start with the inputs and then step through your code and figure out what the type of each one of these variables is. But if you wanna specify them on the emits, you can. It can be nice as a sanity check and just for documentation, like if somebody is hovering over the pipeline. Because you wrote out the emits, it can show them all there. Now at this level, the main types that you'll be concerned with are the dataflow types, like channels and value channels, as well as tuples, which people typically use. So you see all those types here. We've got this reads channel, which is a channel of tuples where each tuple corresponds to some kind of sample. Right?
And so this is how you would express that in the type system: you would say it's a channel. And then you use these angle brackets to specify whenever there's some kind of nested type involved. It's like, okay, what's the type of the thing in the channel? Then you would say a tuple. And then a tuple also can contain different things. Right? So within that, you would say it's a tuple of String, Path, Path. What this means is that you've got a sample ID, which is the string, and then the two paths are the FASTQ one and FASTQ two, so the FASTQ pairs. I'll go ahead and say right out of the gate, I don't really like this syntax because it's very long, but that's just the nature of the beast of the system that we're currently in. Our focus with these new features was mainly to enable static types for existing code without requiring you to change your pipeline a lot. So we've designed it to meet you where you're at as much as possible. And then in 26.04, we're actually looking at adding even more kinds of new features, like record types, which may provide a better solution than these long tuple types. So stay tuned for that, because that's coming. And so that's really all there is to say about the workflows. It's basically just add types. Take some time to look at the types in the documentation. We've got a page in the Nextflow docs that shows what all the different types are. There's not that many of them. There's like 20 of them, I think. So they're pretty easy to learn. And then go through, and even if you just try adding the annotations for these workflow takes, see how far the type checker can go for you. We'll spend some time on the type checker in a second as well. The last piece to show is the processes, as far as static types are concerned.
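The typed workflow takes described above can be sketched as follows. This is a minimal illustration rather than the demo code: the workflow name, sample ID, and file paths are hypothetical, the type names follow the Nextflow 25.10 standard types as best understood here, and the sketch assumes the strict syntax is enabled.

```nextflow
// Sketch of typed workflow takes (assumes Nextflow 25.10, strict syntax enabled).
// Workflow name, sample ID, and file paths are hypothetical.
workflow DESCRIBE_READS {
    take:
    reads: Channel<Tuple<String,Path,Path>>    // (sample ID, FASTQ 1, FASTQ 2)
    transcriptome: Value<Path>                 // a single reference file

    main:
    ch_labels = reads.map { id, fastq_1, fastq_2 ->
        "${id}: ${fastq_1.name} + ${fastq_2.name}"
    }
    ch_ref = transcriptome.map { fasta -> fasta.name }

    emit:
    // Emits are left untyped here; per the talk, their types can be inferred
    labels = ch_labels
    ref = ch_ref
}

workflow {
    ch_reads = channel.of(
        tuple('sampleA', file('data/sampleA_1.fq'), file('data/sampleA_2.fq'))
    )
    ch_fasta = channel.value(file('data/transcriptome.fa'))

    DESCRIBE_READS(ch_reads, ch_fasta)
    DESCRIBE_READS.out.labels.view()
    DESCRIBE_READS.out.ref.view()
}
```

Only the takes are annotated, which matches the suggestion to start there and see how far the type checker gets with the rest.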
So again, it mostly hasn't changed. The directives are the same, the script block, the exec block, all that stuff is the same. What has changed are the inputs and outputs, and this is actually a pretty significant piece. And because of this, as far as the new language updates go, this new syntax for processes specifically has to be enabled with a feature flag, a preview flag. So that's why this nextflow.preview.types is up here. You don't need this to use the params block or the typed workflow takes or all that. This is purely if you wanna use the process syntax, because we expect that this will likely evolve more in the next version. As we add more things like record types, that's likely going to change some things about how we write inputs and outputs at the process level. The main thing here was that processes use this custom mini language for describing the types of your inputs and outputs. We've got this tuple, val, path. You don't really use these anywhere else in the language. It serves a very specific purpose, because Nextflow needs to know how to stage things into the task environment. Right? That's the purpose of declaring something as a path instead of a val: you're saying this FASTQ one is a file that I want you to stage into the task directory when the task runs, right? So we need a way to preserve that behavior. And we found a way to do this using a syntax that's more consistent with the rest of the language. And so what you see here is the inputs here, or also the inputs here. It's structured in a very similar way to what we saw up here with the workflow takes, where you give a name, and then you do the colon, and then you give a type. It's just that in the process there's a few extra bells and whistles going on here.
So first of all, as you can see here, we've got this ability to break out a tuple, in the same way we have this tuple here. This syntax doesn't exist at any other level of the language, but it basically allows you to say, okay, I need a tuple. The tuple has three things in it. I want to just pull those out, and here are their names. Their names are ID, fastq one and fastq two. And then over here in this tuple, it's the same tuple type that we saw up here in the workflow. We're able to specify it, and we see it's the same String, Path, Path. And so those types are used by Nextflow to figure out which things are files. Right? And so Nextflow can look at this and say, okay, I see fastq one is a file, fastq two is a file, so I'm gonna stage them in exactly the same way as if you were using this little path qualifier. So it's pretty one to one in terms of what the syntax is. Basically everything that you could do in this old syntax still works in the new syntax, with the exception of one thing, which is the each qualifier. If you're using each, there are workarounds, but you have to do that a different way. Now on the outputs, I really like the way the outputs work here, because this can basically be whatever you want. It can be any value. In this case, we're creating a tuple. We're just calling the standard tuple function that you also use in workflows. And then if you want a file, you call the file function and it gives you a file. You can also call the files function if, say, you wanna grab a glob pattern. Then you could do something like this, and that's gonna say, okay, give me all the files matching that glob pattern. This is one important change from the previous syntax, because this path qualifier on the output had no way to distinguish between a single file and a collection of files. And you see this a lot in nf-core.
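A typed process along the lines just described might look like the sketch below. It is not the demo code: the process name, script, sample ID, and file paths are hypothetical, the input files would need to exist for it to run, and it assumes Nextflow 25.10 with the strict syntax enabled and the preview flag set. The two outputs are given names using the assignment form covered in the talk.

```nextflow
// Sketch of a typed process (assumes Nextflow 25.10, strict syntax enabled).
// Process name, script, and file paths are hypothetical.
nextflow.preview.types = true

process COUNT_LINES {
    input:
    // Destructure the incoming tuple and declare its type in one line;
    // the two Path elements are what get staged into the task directory
    (id, fastq_1, fastq_2): Tuple<String,Path,Path>

    output:
    // file() declares exactly one file ...
    counts = tuple(id, file("${id}.counts.txt"))
    // ... while files() declares a collection matched by a glob
    logs = files('*.log')

    script:
    """
    wc -l ${fastq_1} ${fastq_2} > ${id}.counts.txt
    echo "counted ${id}" > ${id}.log
    """
}

workflow {
    ch_reads = channel.of(
        tuple('sampleA', file('data/sampleA_1.fq'), file('data/sampleA_2.fq'))
    )
    COUNT_LINES(ch_reads)
    COUNT_LINES.out.counts.view()
}
```

The point of interest is the output section: `file(...)` and `files(...)` make the single-file versus collection distinction explicit, which the old `path` output qualifier could not express.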
I'm sure many of you have dealt with this, where you think you have a file, and then it actually turns out to be a list of files, or maybe you think you have a list of files but it's actually just one file, and you try to call some method on it and then it fails in some weird way. Right? So now you can be much more precise about that by using either file or files, much the same way as you can call the file function up here in the workflow, like you see here. And if you wanna have multiple outputs, you can also do that. I think all of the processes here happen to only have one output, so we didn't do that. But let's say that we wanted to name this something. Instead of doing the emit option over here, you just assign it. So we'll just say something like result equals whatever, and that does the same thing that the emit option did for you before. So, again, essentially the same syntax that you see at the workflow level. And then if you wanna specify a type, you can do that as well. So I could go in here and specifically say, okay, this is a tuple of String, Path. Okay? So that's a crash course in what the new type syntax looks like. I think you should see some links popping up in the chat. We've written a lot of documentation on how these types work, how to migrate an existing pipeline to the new syntax, how to reason about the types, and all of that. So I encourage you all to go to the docs to learn more about this. And in fact, with the types, we've provided a little something special just for you guys, which is that the language server has this custom command which can automatically convert existing code to this new static syntax. And so if you go into the command palette, this is specifically in VS Code, and you look for convert script to static types.
This is at the top because I just tested it before I came on here. We can run that, and I've just run it on the 25.04 code on the left. And we can see how the language server has just updated our code, not to exactly match what we have on the right, but it's done a lot of the grunt work for us. The first thing that it's done is, because I had a nextflow_schema.json over on the side here, it's taken that and combined it with the old param assignments we had there, and it's generated a new params block, which you can see is pretty close to what we have on the right. And then it has also updated all of the processes to the new syntax, and it's added this little preview flag for you, just so that it's clear that you're stepping into a preview feature. I've tested this on a lot of different pipelines, a lot of different nf-core pipelines especially, and it's working pretty well. I will probably try to keep expanding on it as I can. One thing that it doesn't do right now is the takes and the emits. So you still have to add those manually, but consider that a good exercise for you to learn about the type system. Or you just wait, and maybe I'll figure out how to automate that as well. So that's a little goodie. Try that out. I definitely encourage trying that before trying to migrate a huge, like, 100-module pipeline by hand, because I can tell you I've tried it. It's very repetitive. Just use the magic button. If you run into issues, let me know, and I will try to fix it, because I do not want you guys to have to be updating code manually. So we've tried to automate it as much as we can. Now, the other thing I wanted to show here was the new workflow output syntax. How much time? I think we have maybe five or ten more minutes. So I'll try not to take too much longer here.
On the left here, we've got 25.04, and you can see we're using publishDir to publish the outputs of all these processes. And this is a pretty simple pipeline, so it's not really taking up that much space. It's just one little publishDir line. But I'm sure for many of you, if you've worked on more complex pipelines, you're probably more familiar with putting this in a config and having these massive config blocks of publishDir statements, and you're wrapping them in process selectors and you're specifying the publish mode over and over and over. And so that's just annoying syntactically. But also, at a higher level, this publishDir approach is a very DSL1 way of doing things, where everything was all in one script and the processes were the center of everything. You're defining your top-level pipeline outputs at the process level, and you're just grabbing file globs to say, okay, just grab all the *.bam or *.bai files and publish those. You're saving them to an output directory. It's very difficult to see from this pipeline what the expected structure of the pipeline outputs is. You kinda just have to run the pipeline and see what it spits out, and hope that maybe you caught all the cases to even get a good sense of what all is there. And so what we've done in 25.10, and if you've been following along you may already know, because we've been hinting at this for several releases now, is this output block. It's been in preview for, I think, three releases before this one. We finally made it stable.
If you've been playing with this in preview mode, especially if you were doing it in 25.04, it's basically the same as what the preview version was in 25.04. The only difference is you just have to remove the preview flag, nextflow.preview.output. You don't need that anymore. So just remove that and then it should basically work out of the box. But what this does is it's almost like a DSL2, dataflow-centric way to think about outputs and publishing outputs. Rather than trying to pull file globs out of individual processes and hope that everything lands in the right spot, you just publish channels. So we just take the output channel of, say, FastQC. And so where does that come in? That comes in right here. And then that gets passed along. It gets joined in with the quant results into the samples channel. And then up here, this samples channel, we publish it under this new publish section. So we just say samples equals the samples channel. I'm doing a little bit of dataflow logic here just to convert the tuple into a map, because that has some nice benefits down here. And then in the output block, we're declaring this samples output. In 25.10, and this is the one new thing, you can also specify a type for the output. Again, it's not necessary, it's more just if you want it for documentation. And then there's two things that you have to do. One is you use the path directive to define the directory structure that you want. And so in here we can say, okay, for every sample that's coming in from this channel, I wanna take the FastQC results and put them in the fastqc folder, and then the quant results, put them in the quant folder. And so already, we can get a pretty good gist here of what the directory structure is gonna look like.
And the other piece is this index file. Basically, what this does is take your channel and write it to a file, like a JSON version of your channel. In this case, I'm writing to JSON. You can also do CSV, which will basically give you a traditional sample sheet. I personally prefer JSON just because it's a little bit more precise, but either one works. So this allows you to get a sample sheet for free from your channel, and that's a sample sheet you could pipe directly into some downstream process. This is more of a higher-level paradigm shift. Unfortunately, it's not as easy to automate, although I promise if I can find a way to automate it, I will give it a try. For now at least, this is mainly something that needs to be done manually. Seqera AI might be able to help with this, because you've got to think about, okay, what were the outputs that you were trying to capture in this process? And then follow the output channel through your workflow logic and make sure it's being published up here in the same way. What you get from doing all of this, the static types and the workflow outputs, is a pipeline that is just a lot more robust and a lot more readable. First of all, we've got this nice little trio here of the params, the entry workflow, and the output. This forms a unit. This is the definition of a pipeline: inputs, dataflow, outputs. And we're going to continue to build on this to provide additional tooling. Think about things like the Nextflow schema, representing the outputs in the schema as well as the inputs. And that's just going to help with a lot of things.
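To make the index file concrete, here is a hedged sketch of an output declaration that also writes the published channel to a JSON file. It assumes Nextflow 25.10 workflow outputs; the channel contents and the file name `samples.json` are illustrative placeholders.

```nextflow
workflow {
    main:
    // Placeholder sample record; in a real pipeline this comes from process outputs
    ch_samples = channel.of(
        [id: 'gut', fastqc: file('fastqc_gut_logs'), quant: file('quant_gut')]
    )

    publish:
    samples = ch_samples
}

output {
    samples {
        path { sample ->
            sample.fastqc >> "fastqc/${sample.id}/"
            sample.quant >> "quant/${sample.id}/"
        }
        // Write one record per channel item; the extension selects the format
        index {
            path 'samples.json'   // use 'samples.csv' for a traditional sample sheet
        }
    }
}
```

The resulting index records the published location of each file, which is what makes it usable as an input sample sheet for a downstream pipeline.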
Any sort of automation or AI agent that wants to work with the pipeline is going to have a much easier time when you have this clear definition of the inputs and the outputs. And if you want to think about things like pipeline chaining, how to map the outputs of pipeline A to the inputs of pipeline B, that's a lot easier now because there's a nice parallel structure between the params and the outputs. They're just a list of named things, and all their structures are defined. I know that's a lot to take in, but like I said, we've written a ton of documentation on this topic, and we're going to continue to discuss openly and communicate what things are coming down the pipe. We believe this first foray into static types and the workflow inputs and outputs is going to put us on the path to having very readable and robust Nextflow code. Now, these examples aren't set up to be totally executable. I just pulled out the code. But if you want to try them out, what you can do is say nextflow run rnaseq-nf, and then use -r to set the revision to preview-25.10. This is also in one of the migration guides that I showed. Except hold on, there's one more thing you've got to do. If you run this just out of the box, you're going to get some kind of weird error, because these new features rely on the strict syntax, and so it doesn't know about things like the onComplete section. So what you want to do first is export NXF_SYNTAX_PARSER=v2. That will enable the strict syntax. This is just for 25.10. In 26.04, this will happen by default, so you won't have to worry about it then. And then let's give it a try. I just ran this, so it should work just fine. You should see it run just the same way. Oh, okay. I forgot.
Actually, what I forgot was to use a profile to provide the software dependencies. So we're going to add -profile and run it with conda. And there it goes. We can go over here and see this results directory being created, and we can see that the results directory lines up with what we specified here in the output block. We've got the fastqc folder, the quant folder, and we've also split them out so each sample is in its own subdirectory. So this sample was called gut. And then there's the MultiQC report. I am basically out of time, but if there was one more thing I was going to show, it's the actual type checking. If you want to enable that, it's an extension setting here in the settings. Just look up type checking. You can enable that, and then you can go in here and you will start to see more kinds of errors around type checking. Also, when you hover over things, you start to see a lot more information about types. I already went into a lot of that detail in the Summit talk, so definitely check that out if you want to see me playing with the type checker a bit more. Alright. Well, I think that's all I've got, so I want to hand it off to the next person.
Alright. Brilliant stuff. Thank you, Ben. Just before I bring the panel on, I thought I'd start with a couple of questions for you specifically about the things you were just talking about, because we had a few questions come in, and then we can have a more general Q&A after that. And there's one thing I wanted to say as well, to preface this whole body of work: we know that changing the syntax of the Nextflow language is such a delicate thing. We're so sensitive to the fact that people will have to update their pipelines when we do this.
And this work really comes on the back of at least two years of planning, of doing Nextflow surveys and seeing what people wanted. Every time we did a survey, as long as I can remember, going back to 2020 and before, it always came up the same: we need better error messages, we need to make it easier to debug things when they go wrong, we need to make it easier to track things and have more guardrails. And so that's why we're doing this now. It's not just because we think it's cool and it's the thing to do right now. We're trying to react to what the community has been asking for for years, and that's where these new language syntax features are coming from. I was going to get you to reiterate some of the opt-in stuff. You actually did it again just at the end, so I don't think I need to say that too much, but I think it's really important. If you just upgrade Nextflow and try out this new syntax, then it will break. You need to have that environment variable to use it. And then if you're going to do the typed processes, you need the preview flag. Ben, you said that if people have trouble, they should tell you about it and reach out for help. You didn't tell them where or how.
Well, the easiest place is to just make a GitHub issue if you find something that you think is a bug. I pay pretty close attention to those. You can also go to the Nextflow Slack if you want to do something more informal, or if you're not sure whether it's an issue or something you might be doing wrong. There are a couple of different channels. You can always just go to the help channel if you're not sure. There's also a channel called Nextflow syntax, and then there's also a channel called VS Code extension if you're dealing with something more VS Code related.
So just any of those channels. I mean, I watch all of them, and I try to come in and help as soon as I can. And there are also a lot of good Nextflow experts in that Slack who will help you out.
And there's a forum as well: community.seqera.io.
Yes. We watch that as well. So any one of those, whatever you like, just pick one, and we'll keep an eye out.
Okay. Two more questions, then I'll move on. So, type aliases: are they supported, or are they coming?
Type aliases. Well, that's kind of a vague term. There are ideas related to that. One is record types, which is sort of a replacement for tuples. We're also looking at ideas around, well, you probably saw the Path type a lot in that code. We're looking at maybe having some way to have something more specific than that, because usually you know that it's going to be a FASTQ file or a BAM file or a text file, whatever. So maybe having some kind of alias for that, just so that you can have a little bit more validation. That's what I would keep an eye out for.
Got a question here. I think you touched on it, but it's good to reiterate. Can I mix the two, static typing and the old style, within one workflow?
For the most part, yes. I think the only restriction is that when it comes to the typed process syntax, it needs to be all or nothing within a single script. So you can't have an old process and a new process in the same script. You can have both in the same pipeline, just as long as they're separated. So if you have a pipeline with a bunch of modules, you can update one module at a time if you want. You don't have to update all of them at once, because they're all separated into separate files. That's how we drew the line there.
And then I've got one there which I'll take, because I saw it come in. It's about AI, and I went and asked Sasha and the AI team live.
The question was: if I use Seqera AI now, will I get the old or the new features in the code that it generates? And the answer is, probably a mixture right now. But it should be better than if you use a vanilla LLM. The big major providers will still be going on what they know, which is the old syntax. Seqera AI knows about the Nextflow docs. And the Nextflow docs, I don't know, what would you say? They're about 50% over to the new syntax now, or something?
Yeah. Something like that.
So I would expect Seqera AI to be about 50% over to the new syntax, and that will get better quite quickly over the coming months as we upgrade more and more of the Nextflow docs to use the new syntax. So hopefully that will really help a lot. And especially if you push it further in that direction, it will do it, and it will know what you're talking about. Okay. I'm going to bring Paolo on now if that's okay, or Paolo can bring himself on, and we'll move on to the ask-me-anything.
Okay. Hi, Ben.
Thanks for joining, Paolo. Got my team together. The team's bigger than this. Good to have you on. Right. So, folks in the audience, put in your questions now. We are watching. I'm going to pick out a few to ask. And, Paolo, I'm going to kick off with one we've got for you here, which is related to the new syntax we're talking about. The DSL2 transition was a significant shift for the community. What was the hardest part of introducing such a fundamental change whilst maintaining backwards compatibility, and how do we avoid that kind of pain as we do the new syntax now?
Yeah. Like you were mentioning before, changing the language is always something a bit difficult. On one side it's dramatic, but it's an important change.
So first of all, a lot of thinking about what we are bringing, why we are doing it, what value we are adding by bringing this new feature. The transition to DSL2 was a radical change, but at the same time, I think the guideline, the north star, was clear. We decided to make this big change to allow Nextflow to be more modular. I resisted that for a long time, but at some point it was clear that it was something really important for the language, the runtime, the workflow engine to grow, to go to the next level, to allow people to write much bigger, much more complex workflows. And so it was a necessary step to be done. And the point was, okay, let's do a completely different syntax, but keep the same model. I think that was the guideline, the north star of this change. So embracing a different syntax that allows Nextflow developers to define modules to integrate processes, but not breaking the contract, or let's say better, not breaking the core model of Nextflow, which is about dataflow, about parallelism. So the paradigm is exactly the same with a different syntax, and that was the main idea. Essentially, if you get that it is just a different way to wire the tasks and the processes in your workflow, but the programming model stays exactly the same, the transition is very simple.
Nice. You kind of let me into the question I was eyeing up next as well, which is, if you go back further, what was the original problem that you were trying to solve when you started building Nextflow?
Yeah. Well, the original problem, again, didn't change much over time. The original problem was to enable people to write portable code for different infrastructures, especially in a scalable manner.
And more practically, when I was in the Notredame Lab at the Centre for Genomic Regulation, the particular problem was essentially to write pipelines for scaling and comparing protein sequence alignment tools on different datasets. The idea was, okay, we want to compare different tools for protein alignment across different data. So the essential problem was to replicate the same task with different datasets, a classic benchmark problem. And I realized there was not a simple way to do something that apparently should be trivial. There was no simple way to manage the scalability, to manage the recovery, to maybe prototype on my computer and then launch efficiently into the Amazon cloud. So I started to brainstorm what could be done, and that is what then became Nextflow, essentially. This was the concrete, specific problem and application, and then more in general, enabling the scalability and the portability of pipelines using this approach.
Very nice. And when you think back to those early stages, are there any design or architecture decisions that you made in those early days? Was it 2015 when you started, something like that?
No. Before. 2013, I think.
Yeah. Making us feel old. Are there any design or architecture decisions that you're proud of? And are there any that you'd do differently if you were starting a project today?
I don't know. But I think one good design choice was this idea to isolate each task in its own directory. This is something that is, I think, still unique somehow to Nextflow. I think we can say that it was a good decision, because it allows us to parallelize many tasks in a much simpler way. And also, when there is some failure, it's much simpler to debug failures in the pipeline execution. And also, likely, no, not likely, surely, the idea to isolate task execution in containers. At that time it was something really new.
I think Docker also came out around 2013. I remember there was a lot of skepticism. There was some criticism: no, it should not be done, it's too new, it's too beta, it doesn't make much sense. And then it really became the way to deploy computation, and not just bioinformatics, but also general computation, to the cloud and not only the cloud. So that was definitely a good decision, a good approach, and we, all the community, are still benefiting from this.
Thinking about similar kinds of technology changes, I mean, the obvious one happening now is AI and code generation. How do you see AI-driven code generation influencing workflow development? And do you have any best practices, or can you think of any guardrails that you'd recommend for teams using AI assistants?
Yeah. We are in a new paradigm shift in software engineering. I think this is comparable to what happened when the cloud was introduced. Well, even more, I think this is a huge change in the software industry. And so this is impacting everybody. I think for Nextflow, it's not really different from general programming and software development. I think all of us, and I'm the first one, are using AI agents more and more in our daily work. And developing Nextflow code, I don't think it's much different from developing any other program with any language, Python, Go, Java, whatever. So, guidelines. What I see is beneficial: focus on a clear goal when you use your agent, define proper context, and also, it's very useful to plan before executing. When you define exactly what you want to do with the agent, it will dramatically help the experience. And also, instead of saying, oh, generate this huge pipeline, you can guide the agent in a much more precise manner. You might say, I want to implement this task. I have a problem with this task.
Help me debug it. So providing very narrow instructions helps the agent a lot and helps the experience a lot. Pretty sure that for Nextflow it's identical.
I do agree with that. I think there's a nice comment in the chat from Austin as well, that adding typing to a language is also a good step to help agentic coding.
Yeah. Definitely. I think one reason for the decision to embrace typing was exactly this: to provide much more metadata about the language, about the types, about the structure, at the agent level, so we can have much better guidance for the code composition.
Yep. Whenever I'm working with AI code generation, I always find that the best practices for LLMs tend to also just be good practices for people as well. You know, write things in a clear way.
Yeah. This is not really surprising. It is just accelerating the natural behavior.
I'm going to pick off a couple of questions that are coming in about more of the technical syntax changes, if that's okay, because I don't want to run out of time and miss them. Ben or Paolo, either of you. There's a question here about the deprecation of the each qualifier, which you mentioned. Can you talk a little bit about why that's not part of the new syntax, why we've deprecated it, and what people can do about that if they're using it at the moment?
Yeah. Something to understand, more at a higher level, is that the trend of the language has been taking these process definitions, these modules, and making them as standalone as possible. In DSL1, they were all tangled up in the workflow logic. You've got the from and the into. In DSL2, they're more isolated, but you still have certain things like each or when that couple the process to the workflow logic.
And we want modules to be very standalone, so that in the future maybe they could even be executed on their own, or used by the AI on their own. And each is one of those things that implies some kind of workflow logic. I really like it. I mean, we all love it. It's a really nice shorthand, but there's just not a really good way to represent it in this model where the inputs and outputs are just a standalone thing. What that each is doing under the hood is taking two inputs and combining them, getting the combinations. Maybe you have an each for three different methods you want to use, and so you want to repeat the task for each method. You can do the same thing in the workflow definition instead, by using something like the combine operator, where you take two channels and it will give you all the combinations of those channels. So you do that, you get a combined tuple, and then you pass that into the process as a single input.
Talking a little bit about config: we're moving away from publishDir as well. That's another thing we're kind of deprecating. But with the new output syntax, will you still be able to configure whether files are published or not from a config file, without editing the pipeline source code? So, for example, if I have an nf-core pipeline, can I turn off certain publishing?
Yeah. You know, I think originally I didn't add it, and then I think Paolo added it under the hood while I wasn't looking. So you have him to thank. But, yes, you do have the ability to enable or disable workflow outputs at large. Also, in the script, you can enable or disable specific outputs.
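Going back to the each question above, the replacement with the combine operator can be sketched as follows. The process name `ALIGN`, the sample names, and the method names are hypothetical placeholders, not code from the talk.

```nextflow
// Legacy form, for comparison (deprecated):
//   input:
//   val sample
//   each method

process ALIGN {
    input:
    // A single tuple input replaces the separate `each` input
    tuple val(sample), val(method)

    output:
    stdout

    script:
    """
    echo "aligning ${sample} with ${method}"
    """
}

workflow {
    ch_samples = channel.of('sampleA', 'sampleB')
    ch_methods = channel.of('regular', 'espresso', 'psicoffee')

    // combine emits every (sample, method) pair: 2 x 3 = 6 tasks
    ALIGN(ch_samples.combine(ch_methods)).view()
}
```

The cross product now lives in the workflow rather than inside the process definition, which is what keeps the process standalone.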
If you want to control it through the config, what I recommend is that you use a parameter, because params can also be used in the output block, just like in the workflow. So you can have a parameter to toggle this output or that output or whatever. And then in your output block, you can say enabled and pass that parameter in. That's the way to give the user control of that kind of stuff at runtime.
On the topic of configs, and maybe, Paolo, you can comment on this one: Jason says that his team is often confused about configuration. It's so powerful. You can basically have any kind of Groovy code in your config file. Are there any thoughts about what can be done about the config system, and typing, and the new syntax parser?
Yeah. This is something that we want, and are planning, to improve with the next releases. Well, maybe this is one point that I may regret, linking to the previous question that you were asking. So, essentially, there are too many things that can be done in the Nextflow config. What we want to achieve is a much more declarative configuration experience. Now, essentially, a Nextflow configuration file is Groovy/Java code under the hood. You can use any code fragment in the Nextflow configuration, and that tends to put too many magic things into the configuration file. It's powerful, but at the same time, it can create things that are too complex or hard to maintain. So, ideally, what we are looking into is making the configuration file much more controlled, to have essentially a declarative configuration file in which you cannot use if, or for, or try/catch. Much more similar to any other programming environment's configuration file. Yeah.
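For the point above about toggling an output with a parameter, here is a minimal sketch assuming Nextflow 25.10 workflow outputs. The parameter name `publish_quant`, the output name `quant`, and the placeholder file are assumptions for illustration.

```nextflow
// Hypothetical parameter; declared in the legacy style for brevity
params.publish_quant = true

workflow {
    main:
    // Placeholder standing in for a real process output channel
    ch_quant = channel.of(file('quant_gut'))

    publish:
    quant = ch_quant
}

output {
    quant {
        // Publishing of this output is switched by the parameter
        enabled params.publish_quant
        path 'quant'
    }
}
```

A user could then turn this output off without touching the pipeline source, for example with `--publish_quant false` on the command line or by setting the parameter in a config file.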
The thing this conversation reminds me of is that in the Python world, we have had setup.py to install packages for a long time, which is just a Python script. But that community has moved to pyproject.toml, which is the same kind of declarative approach.
I think there are many similar processes.
Okay. We are at time, so I need to restrain myself and stop asking questions, which is a shame because I'm having fun. A lot of good questions here. But we need to wrap up. So thank you, Ben and Paolo, for joining me today. Thank you, everyone who's listening and held on to the end. We're going to put up a poll in a second. So please just take thirty seconds to fill that in, tell us what you think of the webinar and what you'd like to see in the future, and we'll do our best to listen. And, yeah, if you have any questions, please try out the new syntax. Try and break it. Try and find all the edge cases you've thought up which don't work, and then tell us about it, because now is the time, and we can try and help you out. Great stuff. Thanks very much, everyone. Goodbye. Bye bye.