
GRTiQ Podcast: Special Release: Alex Bourget on Substreams-powered Subgraphs

Today I am speaking with Alex Bourget, Co-Founder and CTO at StreamingFast, the core dev team behind the groundbreaking Substreams-powered subgraphs recently launched by The Graph. This innovative technology has the potential to redefine what’s possible for The Graph and push the boundaries of indexing blockchain data.

In this special edition of the podcast, Alex joins us to discuss Substreams, Substreams-powered subgraphs, and their impact on The Graph’s ecosystem. If you’ve been a long-time listener, you might remember Alex from his previous full-length interview in Ep. 42, where he shared insights into the origins of StreamingFast and their journey to becoming a core dev team at The Graph.

The GRTiQ Podcast owns the copyright in and to all content, including transcripts and images, of the GRTiQ Podcast, with all rights reserved, as well as our right of publicity. You are free to share and/or reference the information contained herein, including show transcripts (500-word maximum) in any media articles, personal websites, in other non-commercial articles or blog posts, or on a non-commercial personal social media account, so long as you include proper attribution (i.e., “The GRTiQ Podcast”) and link back to the appropriate URL (i.e., [episode]). We do not authorize anyone to copy any portion of the podcast content or to use the GRTiQ or GRTiQ Podcast name, image, or likeness, for any commercial purpose or use, including without limitation inclusion in any books, e-books or audiobooks, book summaries or synopses, or on any commercial websites or social media sites that either offer or promote your products or services, or anyone else’s products or services. The content of GRTiQ Podcasts is for informational purposes only and does not constitute tax, legal, or investment advice.



We use software and some light editing to transcribe podcast episodes. Any errors, typos, or other mistakes in the show transcripts are the responsibility of GRTiQ Podcast and not our guest(s). We review and update show notes regularly, and we appreciate suggested edits – email: iQ at GRTiQ dot COM.

Nick (00:12):

Welcome to the GRTiQ Podcast. This release is a special edition of the podcast. This past week on July 20th, The Graph announced the launch of Substreams-powered subgraphs. This innovative technology has the promise of redefining what’s possible for The Graph and pushing the frontiers of indexing blockchain data. Here to talk about Substreams and Substreams-powered subgraphs and what it all means for The Graph is Alex Bourget, co-founder and CTO at StreamingFast, the core dev team behind it all.


Alex Bourget, welcome back to the GRTiQ Podcast. I’m thrilled to have you and I really appreciate your time.

Alex Bourget (00:54):

Hey, Nick, Nick, Nick, I’m always extremely pleased to be back with you and thanks for having me.

Nick (00:59):

Well, first of all, I just want to say congratulations to you, the StreamingFast team, for continually innovating. And for people that aren’t aware, there was a huge announcement on July 20th about Substreams-powered subgraphs, and I want to just ask you a couple questions about that and try to educate listeners about why this announcement matters and how it’ll impact the future of The Graph.


Before we begin, do you mind just providing a high-level overview of what Substreams-powered subgraphs are?

Alex Bourget (01:27):

Yes. Substreams-powered subgraphs are really the fusion of those two pieces of technology into something that makes both of them even more powerful. But also, it sort of refreshes the indexing story on The Graph in a way. Subgraphs have been great for rapidly developing indexing of blockchain data, indexing solutions, and kicking off an API. That’s the subgraph aspect that is contributed here in the Substreams-powered subgraph, whereas Substreams take on the part of transformation and sucking the data from the blockchain and then analyzing what it means and then refining it, and then that becomes a stream that gets piped directly or closer to directly into the database under the subgraphs.


And the subgraphs provide their GraphQL API plus all the integration. That’s the most important aspect, the integration in The Graph Network, so that all the economics can still roll out, but we’re bringing the Substreams engines sooner, faster, into The Graph ecosystem. And eventually people will still be able to use subgraphs alone, eventually being able to use Substreams alone for different purposes. But the two fusing it together in this announcement, I mean it brings the future right now.

Nick (02:44):

Incredible. I got that. These are two technologies that have come together to form something that’s novel and innovative for users that are building on The Graph. What’s the origin story behind Substreams? Where did this idea come from?

Alex Bourget (02:59):

Our company started five years ago when we wanted to build a streaming-first engine. And we built tons of custom APIs, and at some point we reduced our technology to its core, and today it’s known as Firehose. And from there we were stemming a bunch of APIs or custom indexing technologies, and we were always on the lookout for some generalizable ways so that people could build on it. And we were looking at The Graph and eventually we fused, in some ways, with The Graph. And Substreams was taking that technology and taking that inspiration of generalizable, user programmable, remote code execution sort of thing, to allow people to build the indexing that they need within the scope of that powerful engine. Substreams is born of that fusion there.


It came about after some experiments we made trying to make a general purpose machine in the form of Sparkle, and I don’t know if you guys remember, where we would show that it is possible to have high performance, really fast indexing because of parallelization, and then we brought back the user-driven code like subgraphs already had.

Nick (04:09):

If you break out these two pieces of technology, you’ve got Substreams-powered subgraphs, and then you’ve got these standalone subgraphs, and you mentioned into the future, people will likely use both. And in fact there’ll be occasions where people just use Substreams. But can we talk a little bit about the benefits of why someone would choose to use a Substreams-powered subgraph versus maybe just a standalone subgraph?

Alex Bourget (04:32):

Yes. Well, the subgraphs, normally they’re forced in a way or they’re designed around linear execution of the history. And the histories of all these blockchains, they’re getting bigger and bigger. There’s more and more data, there’s more and more chains. And as the histories grow, well, it has a linear impact on the performance of a subgraph. Whereas with the Substreams, we designed it to be parallelizable because we wanted to be able to cut the history and increase the number of CPUs and increase the number of machines you need, perhaps, which also increases a bit of the cost.


But at least you can trade off the cost for the time so that you as a human person, when you come to the job in the morning, you don’t need to wait for a week. You can do your work faster. Albeit a little bit costlier, but faster. That’s what the Substreams aspect brings. It brings parallelization powers to the subgraph engine, which still handles all the querying, but at least the first draft.


There are also improvements to the developer cycle. Because normally you would craft your subgraph and you would deploy it and wait until it’s synced, but now you can iterate much, much faster because you can try a thing and it’s going to run in parallel, I don’t know, 20 minutes, you can be at the end of the chain for a certain nugget of information you’re looking for, a certain computation that you want to try.


It really changes the paradigm as to the speed of iteration and the speed of syncing in the end. And because Substreams has a layer below, and it’s not necessarily tied to subgraphs in its query layer form, well, you’re seeing people use Substreams to feed into different things like a Twitter feed or someone’s piping that into their Google Sheet because they’re tired of copy-pasting things from Etherscan. So they were able to hook that high speed transformation layer. That’s what was taken a little bit from the original subgraph and then feed into many, many other things that are also, by the way, coming to The Graph Network, as I call those deployable units, things that eventually Indexers can serve, but that will be different from subgraphs yet perhaps powered by strong and powerful and fast middle engine.
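The cost-for-time trade-off Alex describes in this answer can be pictured with a small sketch. It is not Substreams code and makes no claim about how the engine is built: `split_range`, `index_segment`, and `index_history` are hypothetical names, and "blocks divisible by 7" is a stand-in for real per-block work. It only shows the general idea of cutting a block history into segments, processing them on several workers, and merging the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start_block, end_block, segment_size):
    """Cut the half-open range [start_block, end_block) into consecutive segments."""
    return [
        (lo, min(lo + segment_size, end_block))
        for lo in range(start_block, end_block, segment_size)
    ]


def index_segment(segment):
    """Hypothetical per-segment work: count the 'interesting' blocks in a segment.

    A real indexer would decode block data here; divisibility by 7 is a stand-in.
    """
    lo, hi = segment
    return sum(1 for block in range(lo, hi) if block % 7 == 0)


def index_history(start_block, end_block, segment_size, workers):
    """Process every segment on a pool of workers, then merge the partial results."""
    segments = split_range(start_block, end_block, segment_size)
    # Threads keep the sketch short; CPU-bound work would use processes or
    # separate machines to get a real speed-up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(index_segment, segments))  # map keeps segment order
    return sum(partials)


# One worker walking the whole history versus eight workers on ten segments.
sequential_total = index_history(0, 100_000, 100_000, workers=1)
parallel_total = index_history(0, 100_000, 10_000, workers=8)
```

Both calls produce the same total; only the way the history is walked differs. The sketch works because each segment can be computed independently and the partial results are easy to merge, which is the property that makes this kind of parallel backfill possible at all.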

Nick (06:44):

You said a little of this in your answer there, but just double clicking on it. What does the release of Substreams-powered subgraphs mean for The Graph and for the future of indexing?

Alex Bourget (06:56):

It’s opening up a whole bunch of new paradigms of querying at least. Subgraphs have their ways to query, but in data science, you always need to think: how am I going to shape the data according to the query I have, to the needs I have? And sometimes we’re saying maybe the analytics use case is not served as well in a subgraph because you would need to lay out the data in a different way.


Well, now with Substreams, you can decide how you’re going to shape the data, put it in a different store, and now that’s what I just said before. Now, those new deployable units could shape the data in different ways and use perhaps different stores. Now we’re using Postgres for subgraphs, but maybe other stores are just crazy better for analytics use cases, like a time series database. And these databases and these new query layers can reach the decentralized network offering a plethora of new use cases for Indexers, new opportunities for Indexers, but also cover all of the ground that there is in terms of indexing technologies and indexing needs, query needs. There’s nothing we can, if someone wants a graph database and the [inaudible 00:08:09] because that’s how they want to query, well, we can have a deployable unit on The Graph Network and cover all cases.


I can’t imagine a case that would not be covered on The Graph Network, once we rolled out sinks, I call them “sinks,” as in the drain, sinks of all sorts of nature. And Substreams-powered subgraphs is the first that we are able to bridge those two technologies and bring into the network.
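To make the "sinks" idea in this answer concrete, here is a minimal sketch under stated assumptions. It does not use any Substreams or subgraph API: `EntitySink`, `TimeSeriesSink`, `fan_out`, and the sample `transfers` records are all hypothetical. It shows one stream of already-transformed records being written to two stores that lay the same data out differently, one per account and one per hour bucket.

```python
from collections import defaultdict


class EntitySink:
    """Keeps a running total per account, similar in spirit to an entity table."""

    def __init__(self):
        self.balances = {}

    def write(self, record):
        account = record["account"]
        self.balances[account] = self.balances.get(account, 0) + record["amount"]


class TimeSeriesSink:
    """Aggregates volume per hour bucket, a layout suited to analytics queries."""

    def __init__(self):
        self.volume_by_hour = defaultdict(int)

    def write(self, record):
        self.volume_by_hour[record["timestamp"] // 3600] += record["amount"]


def fan_out(stream, sinks):
    """Send every record of the stream to every sink."""
    for record in stream:
        for sink in sinks:
            sink.write(record)


# Hypothetical transformed records; timestamps are seconds since some origin.
transfers = [
    {"account": "alice", "amount": 5, "timestamp": 10},
    {"account": "bob", "amount": 3, "timestamp": 20},
    {"account": "alice", "amount": 2, "timestamp": 3700},
]
entities, series = EntitySink(), TimeSeriesSink()
fan_out(transfers, [entities, series])
```

The transformation step runs once, and each sink decides how to shape the result for the queries it needs to answer, which is the separation Alex is pointing at when he talks about different stores for different use cases.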

Nick (08:33):

Clearly on the day of the announcement, there was a blog post, there were some FAQs. For listeners that want to dive even a little bit deeper, Alex, what’s the best way to get educated and up to speed on Substreams and Substreams-powered subgraphs?

Alex Bourget (08:46):

A simple way to get started is you go to the docs, you go to the blog post that we just outlined, and there’s a cookbook entry where you can try directly how to write a small, simple Substreams. And there you have all the steps needed to deploy it as a subgraph on the network. Go there.


You can also head to the documentation on the Substreams docs website where you can learn more about the Substreams engine itself and how you can exert even more power and then extract more powerful stuff there to build general purpose Substreams that could fit with some perhaps other needs that you have.

Nick (09:20):

Alex, thank you so much for taking time to come and explain this. And again, congratulations to you and the whole StreamingFast team, for launching something that’ll have a real impact at The Graph. I think when we recorded our last full scale interview here on the GRTiQ Podcast, you had just had a baby. But I think since that time you’ve had another, how many are you up to now?

Alex Bourget (09:40):

Oh boy. I just had, they’re the closest babies we have. That’s my ninth baby, our ninth baby. It was like one year apart, so probably around one year we spoke. Yeah, that’s it. It’s a good setup we have here. I have a great wife and all that, so I’ve been lucky.

Nick (09:56):

Well, congratulations on that. Before I let you go, I do want to ask just one last question. And it’s the opportunity to ask somebody like yourself on this incredible team doing so much work, what’s it like working on the bleeding edge of technology? I mean, everything that’s happening at The Graph, everything that’s happening in web3, it’s brilliant stuff. It seems like we’re right at the tip of the spear of what’s possible, and every day you wake up and you work on this. What’s it like?

Alex Bourget (10:21):

I like creating, I like the fusion of different things coming in, sparking into, in our team, in my brain, and then figuring out how we’re going to do something new, how we’re going to do something that is going to be exciting, powerful, and it can have, people can use. It’s really exciting. I’m really happy to be in that setup where we can create out of new innovation, that spark together.


There’s that data aspect that existed, but that blockchain that didn’t exist as much. And there’s, even compared to a lot of the third-party, or generally I call them web2 data systems, the more I’m looking at it, the more we’re building something quite unique there. It’s because of the fusion of blockchain and data science that we’re figuring out new patterns that eventually can even be retrofit to web2 land. That’s it. I mean, it’s pretty cool. I’m happy. What about you?

Nick (11:14):

It’s a thrill to speak with people like you every week, and the opportunity to shine a light on their stories and showcase the work they’re working on. Thank you so much for joining me.

Alex Bourget (11:23):

Thank you, Nick. Thank you. I appreciate the work you’re doing there.

Nick (11:25):

We’ll talk again soon.

Alex Bourget (11:27):

Okay, bye.

Nick (11:28):

Listeners can visit the show notes for a full transcript and links from today’s episode. If you’re a longtime listener of the podcast, then you know that Alex Bourget joined me for a full length interview during Episode 42, so check it out if you want to learn more about Alex, the origins of StreamingFast and the story behind how they became a core dev team at The Graph.





DISCLOSURE: GRTIQ is not affiliated, associated, authorized, endorsed by, or in any other way connected with The Graph, or any of its subsidiaries or affiliates.  This material has been prepared for information purposes only, and it is not intended to provide, and should not be relied upon for, tax, legal, financial, or investment advice. The content for this material is developed from sources believed to be providing accurate information. The Graph token holders should do their own research regarding individual Indexers and the risks, including objectives, charges, and expenses, associated with the purchase of GRT or the delegation of GRT.