Behind the scenes: How Figma built its product growth platform

Alon Bartur
  • May 18, 2023
  • 17 min read

Joe and I had an opportunity to sit down with Matt Dailey, the Engineering Manager at Figma who built their product growth and growth platform teams from the ground up. During his tenure leading the team, he spearheaded Figma’s Responsive Content System, the internal platform that lets product development teams across the whole organization easily build and manage onboarding and education experiences throughout the product while maintaining the extremely high quality bar that Figma is known for.

In our chat, we dove deep into:

  • how the growth team and responsive content system came to be, and why they had a developer-in-the-loop philosophy
  • why product onboarding gets complicated quickly and the benefits of a growth platform
  • how they approached solving the hard problems of modeling, persistence, targeting, versioning, analysis, and orchestration
  • how they approached driving the platform’s adoption across the org, and the scaling challenges they ran into
  • the impact the responsive content system had on the organization, what they learned building it, and what’s next for the team.

Without further ado, let’s jump in.

The origins of Figma’s growth platform

Alon:

You were the first engineer on Figma’s growth product team—can you talk about how it got formed and where the team originally focused?

Matt:

Figma formed our growth team a bit later than hyper-growth startups tend to, because the nature of Figma’s growth is very community driven; it's designers building for designers, really deeply understanding the product needs.

We got to a point where there was no product team thinking about our core business metrics, so it was a bit of a “we should go figure out what this growth thing is” moment. Initially it was myself, another engineer, a product lead, and a data scientist, and we were hiring a designer.

The first product area we started looking into was onboarding: we were looking into our new user experience and, in parallel, building out some of the platform capabilities we needed to iterate and move quickly. As we dug in, we realized that there was a lot of cruft we needed to deal with.

The first one was getting our tracking story in a better place, and then our experiment framework: we had feature flags but not experimentation. Then we focused on building out our new user onboarding experience, which led to the responsive content system (RCS), the system we built to power it.

Figma’s Responsive Content System (RCS)

Figma feature modal

Alon:

How did the responsive content system (RCS) originally come to be? What problem were you trying to solve?

Matt:

The problem with all of our onboarding was that our tooltips were one big conditional. Each time we’d say, “let's change something just to do a quick experiment,” it was this Jenga tower of “I changed something and then something else broke somewhere”.

The other challenge was trying to explain that to the product folks or the designer I was working with. I would constantly say “hey, this is actually not easy because these things are dependent”.

Those were the two problems: 1) it was really hard to make changes safely, and 2) it was like speaking a foreign language explaining why it was hard to other people.

We wanted to build something that let us speak one language and that made making changes easy and safe and consistent.

Alon:

You had a developer-in-the-loop philosophy when building RCS. That is, developers were required to make changes or push things live into the product. Can you talk about the reasoning for taking that approach?

Matt:

There were a couple reasons. The first one was cultural: we were a small team and we were trying to be iterative and stay close together. The idea of building no-code tools for other members of the team to configure things was like building a wall in the middle of our collaboration.

Also, as we were evolving the system we didn't necessarily know where it was gonna go. Every step of RCS was a couple days of code at most. No big pull requests, just evolving the system to where it eventually ended up. The growth product team didn't want to be taking on big infrastructure projects. We wanted to be shipping things and evolving the platform in parallel.

Another reason is that a lot of the counterarguments to our dev-in-the-loop approach say, “if you have a no-code tool, it's gonna save you time”. I really think that over the lifecycle of RCS, across all of the state machines we've built, the time developers spent doesn’t exceed the time it would've taken to build a no-code tool flexible enough for a PM to build those same machines.

We wanted the system to be both flexible, but also make sure that the simple things are very easy. I think that goes to the core problem with this stuff: people underestimate how hard it is to get onboarding or tooltips right, to have them be a really good experience. People are like, “oh, I'll just put a string somewhere”. That's what they expect it to be, but there are a lot of interactions and nuance to doing it really well and it's really valuable to get right.

Alon:

You mentioned one of the motivations for building RCS was helping explain to designers and PMs on the team what was possible. With RCS, what was the process of working with them? How did you help them understand the constraints of the system?

Matt:

The goal was to be able to go to a whiteboard and draw a flow chart and convert that directly to code as long as the flow chart met some constraints.

RCS, in a nutshell, is listening for events, running data queries where you're checking for some data, and then executing actions where you do something like show a modal. That basic framework was pretty easy to communicate with other folks. E.g., an event is “someone opens this page” and then we check, “are they a paid team member?”; if not, show them this upsell.

That language made it so we were focusing on the problem and the nuances of it as opposed to the dependencies and the implementation. Spending time in the implementation is not valuable, but the problem and solving something for the user is really valuable.
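
To make that concrete, here is a minimal sketch of what an event/query/action machine could look like in TypeScript. All of the names here are invented for illustration; this is not Figma’s actual API.

```typescript
// A hypothetical RCS-style machine: wake on an event, run a query
// against client-side user data, and fire an action if it passes.
interface UserState {
  isPaidTeamMember: boolean;
}

interface UiController {
  showModal(id: string): void;
}

interface Machine {
  id: string;
  on: string;                          // event that wakes the machine
  query: (user: UserState) => boolean; // data check
  action: (ui: UiController) => void;  // e.g. show a modal
}

// "Someone opens this page" -> "are they a paid team member?" -> upsell.
const upsellMachine: Machine = {
  id: "team-upsell",
  on: "page:opened",
  query: (user) => !user.isPaidTeamMember,
  action: (ui) => ui.showModal("upgrade-to-paid"),
};
```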

Modeling state machines in RCS

Joe:

I know that you originally modeled the state machines in the responsive content system in JSON, and then moved that to TypeScript. Why did you start with JSON and what led to that migration?

Matt:

I think we started with JSON because of the idea that we were going to go from this thing we drew on the whiteboard to a language for some configuration. We thought a JSON file would be pretty easy for most folks who've worked with software to understand.

We immediately realized that TypeScript would be valuable because we wanted type checking for these files. We could have built some tooling around that, but decided to use TypeScript because the rest of our code base was in TypeScript. It also helped with reusability. We found that in each one of these state machines there's often a repeated graph of “do these types of checks”, so we could reuse it as opposed to copying and pasting JSON around.
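
As a rough illustration of that reusability point (with invented names, not Figma’s actual schema), a shared fragment of states can be typed once and spliced into several machines:

```typescript
// A state node: listen for an event, optionally check a flag,
// optionally show something, then transition.
interface StateNode {
  on: string;      // triggering event
  check?: string;  // user flag to query
  show?: string;   // modal or tooltip to display
  next?: string;   // state to transition to
}

type MachineStates = Record<string, StateNode>;

// A repeated "graph of checks" extracted once...
const paidTeamGate: MachineStates = {
  checkPaid: { on: "page:opened", check: "is_paid_team", next: "upsell" },
  upsell: { on: "enter", show: "upgrade-modal" },
};

// ...and reused: the compiler catches typos and missing fields
// that a raw JSON file would happily accept.
const editorOnboarding: MachineStates = {
  welcome: { on: "file:opened", show: "welcome-tooltip", next: "checkPaid" },
  ...paidTeamGate,
};
```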

Joe:

Did the JSON originally function as a shared language between non-technical and technical folks, or was that purely contained to the developers?

Matt:

I think that was part of the aspiration. It didn't play out that way. It turns out people were not as excited to write JSON configuration as I hoped they would be. It turned out to be more like, “we can write out what we want on the whiteboard and that works”.

Joe:

I would have the same aspiration! Did the JSON to TypeScript transition have any drawbacks?

Matt:

I think it was more giving up on the aspirations. I had also imagined we could do something using Figma’s API where you can draw a flow in Figma and then we can extract that and turn it into a JSON file that's the configuration. You could probably do that with TypeScript too, but it'd be harder. It was pretty much net upside to go to TypeScript for us.

Persisting the state of the machines

Joe:

I'm assuming that the state of the machines in RCS for a user are persisted so that you don't show the same experiences to a user twice. I'm interested to hear how y'all went about modeling that persistence?

Matt:

The assumption is close but not quite right. We didn't actually save any (user, machine) state; we didn't save that tuple. We just had user_flags as a table. So every machine was stateless in the sense that when you loaded a tab in Figma, all the machines are instantiated. Then they just run through and we'll check, for example, if you've seen onboarding. When you see it, we set a user flag that says you've seen this onboarding, and the machine just checks that as part of its state, as opposed to as part of its execution when an event comes in.
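
A minimal sketch of that flag-based approach (hypothetical names; the flag is the only persisted state):

```typescript
// Machines are stateless: on every tab load they re-run and consult
// plain user flags, which stand in for "where the user is".
interface UserFlags {
  get(flag: string): boolean;
  set(flag: string): Promise<void>; // persisted to the user_flags table
}

const SEEN_ONBOARDING = "seen_new_user_onboarding";

function runOnboardingMachine(flags: UserFlags, showModal: (id: string) => void) {
  // If the flag is already set, the machine simply never acts.
  if (flags.get(SEEN_ONBOARDING)) return;

  showModal("new-user-onboarding");
  // Setting the flag is all the "persistence" the machine needs.
  void flags.set(SEEN_ONBOARDING);
}
```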

Joe:

So to reflect that back: states in the machine were flags, but the actual graph itself was just encoded in code, in TypeScript, and the two of those allowed for state reconstruction and understanding where the user was. Is that correct?

Matt:

Yes, and I'll say one more thing. I think we're talking about the same thing, but all the machines start up at the same time as the page loads. So they're not starting by saying “okay, I'm in this state, what's my next thing?” Instead, they’re waiting for these circumstances to happen and reevaluating as events happen on the page.

Iterating on state machines in RCS

Joe:

User journeys are changing all the time, and iterating on experiences is common in growth. How did you think about that with respect to modeling the machines?

Matt:

All of the things we've built with RCS are confined to a single experience that you're experiencing on a page. We did have some higher level things, like we would have a checklist that someone might work through, but each of the items on the checklist took users through a page of “try this, and then this” which was its own separate machine.

So people only experienced it once, and we didn't save any states. It starts up, and you're running through it, and once you've run through it you’re done.

Joe:

So folks aren't existing in any particular machine for a long period of time, such that you would need to worry about what to do with them if that machine changes?

Matt:

Exactly. The closest thing that we had was across multiple tabs: if you had an old tab and a new tab, what's happening? We just didn't worry about that because you're not going to sign up, be partway through the new user experience, and then go to another tab and start completing it again.

The only thing we would do is communicate between tabs where if you close something in one tab, we close it in all the tabs to make sure you don’t have to keep closing the same popup.
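
Cross-tab dismissal like that can be done with the browser's standard BroadcastChannel API; a small sketch (not Figma’s actual code):

```typescript
// Mirror popup dismissals across tabs so the user only closes once.
const dismissals = new BroadcastChannel("rcs-dismissals");

declare function hidePopup(id: string): void; // app-specific UI hook

function dismissEverywhere(popupId: string): void {
  hidePopup(popupId);                                    // close locally
  dismissals.postMessage({ type: "dismiss", popupId }); // tell other tabs
}

dismissals.onmessage = (event: MessageEvent) => {
  if (event.data?.type === "dismiss") {
    hidePopup(event.data.popupId); // close the same popup in this tab
  }
};
```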

Joe:

Was there a notion of versioning for the machines? It sounds like they were simple and that folks flowed through them fast enough where you didn't have to worry about versioning.

Matt:

Yeah, that’s right. The closest thing to versioning would be the git history.

Targeting machines, tracking usage, and analyzing results

Joe:

To shift focus to analytics, we’ve found folks commonly want to understand how users are flowing through the machines, or where they're dropping off. Did you have requirements in these areas?

Matt:

One of the first things we worked on was building our tracking framework. At Figma we have a concept where a React component can be marked as a ‘tracking context’. Whenever it's shown we say “we showed this thing”. So by default, all the Responsive Content System modals are automatically tracked. We do the same thing with buttons, so we can get every click and every view tracked in a consistent way across the app.
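
A rough sketch of what a tracking-context component might look like in React (invented API; Figma’s internal version will differ):

```tsx
import React, { useEffect } from "react";

// Assumed analytics sink; stands in for whatever the app uses.
declare function trackEvent(name: string, props: Record<string, unknown>): void;

// Anything rendered inside a TrackingContext is tracked by default:
// mounting the component fires a "shown" event for that context.
function TrackingContext(props: { name: string; children: React.ReactNode }) {
  useEffect(() => {
    trackEvent("shown", { context: props.name });
  }, [props.name]);
  return <>{props.children}</>;
}

// Usage:
// <TrackingContext name="rcs-upsell-modal">
//   <UpgradeModal />
// </TrackingContext>
```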

We don't use any sort of tool like Amplitude that bills by events, because they’re super expensive for Figma. Our users are in the app for a long time every day, so the shape of our events was different from a lot of places, which are expecting, say, a shopping cart where you get a lot of users but not many events per user.

Most of our analysis was done in Mode on top of Redshift previously, and now Snowflake. It’s pretty ad hoc; all of our experiment analysis is unique to the experiment—we’d ask, what are the events we need?

One related area is around quality for the new user experience. You don't get people saying “oh, this is broken”; they just won't use the app. It’s not the same as people who are in the tool all the time. So we did build out some observability: we would pipe how many people are getting through each step or seeing each modal into Datadog, and we could alert if it dropped to zero.

I think a related thing is that we had a really hard time tracking all the things that exist and all the different flows someone could go through. Enumerating them up front was hard. Going back to look at the table for Responsive Content System modals that were shown was a much more reasonable way to say “these are all the things that are happening right now”.

Joe:

What were some of the most common questions folks were asking of this data? What did they want to know and learn from it?

Matt:

“Where are people dropping off in this flow?” Often they're in an experiment asking: “is this helping people be successful with this thing? Are people actually using the feature we're indicating?” So a bit is within the flow, and then a bit is comparing it to what happens after.

Alon:

Were the tracking contexts used to trigger when you’d show experiences in RCS?

Matt:

The way RCS works is that it sits on top of the event stream on Segment. We use Segment to track all of our user events. RCS listens to all those events and sees if there is anything it needs to show or check for the user. I think that worked really well because we wanted to have a consistent and clear system of tracking throughout the app regardless. RCS was able to piggyback on that really well, and it was another way to encourage more systematic usage tracking throughout the app.

We actually even imagined RCS running server side, because you can have a Segment hook on the other side and we could have a more stateful solution—that was actually a hack week project that we built. I think that event driven architecture is exciting, but it's often a lot easier to just do a batch process for the same problems. If you miss an event it's really frustrating to catch up, and you have to build around that, versus if you’re just doing a sync every night you don't need to worry about missing data because tomorrow it will be fixed.
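
One way to wire that up on the client, assuming Segment's analytics.js source-middleware hook (the exact mechanism at Figma may differ):

```typescript
// Fan every tracked event out to the RCS machines before it goes
// to Segment, so machines can reevaluate on each event.
declare const analytics: {
  addSourceMiddleware(
    fn: (ctx: {
      payload: { obj: { event?: string } };
      next: (payload: unknown) => void;
    }) => void
  ): void;
};

declare function reevaluateMachines(eventName: string): void;

analytics.addSourceMiddleware(({ payload, next }) => {
  const name = payload.obj.event;
  if (name) reevaluateMachines(name); // machines run queries, maybe act
  next(payload);                      // event still flows to Segment
});
```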

Alon:

It sounds like the triggers were all based off of client side events. Were there things that people wanted to trigger off where you hit a boundary with that approach?

Matt:

One boundary we hit was when the infrastructure team wanted to use RCS to alert people about downtime. They wanted to be able to hit a button and say “Hey, downtime's coming up.” The fact that all of our configuration was on the client in a static file wasn't very conducive to that. They also had higher uptime requirements; they needed it to work even if other parts of our system were down, so they hit other barriers there too. Another example: for our user conference, the marketing team wanted to just push something out and I was like, “it doesn't work that way”. That required a little bit more upfront work. This is where a developer being in the loop was a hindrance, in that we needed a release cycle to be able to put something out.

Scaling RCS

Alon:

Can you talk about the process of getting more folks across the org to use RCS and any demands that put on your team?

Matt:

In the beginning it was just us on growth using it. Then other teams started launching feature update modals, pretty simple ones. For the very simple ones, it was just “plug in your UI here.” They weren’t even really using RCS, they were just plugging in a little bit of content. We tried to make that part really easy, but that was definitely the first step of other teams using it.

Then they started wanting to do more complex things; they had more complex ideas and we got more complex requests coming in. Figma was at the scale where we could be aware of what was launching and what people were doing. A lot of getting people to use RCS was just saying it over and over again, going to all hands and being like “we have this cool thing”. Or “here's the thing, and by the way, use this”. I think that some of the best communicators I've ever worked with are people that, by the time they're done talking, you're like “I get it, you can stop now.” That's what you want: when you're trying to make a point to somebody, by the time you're done they're ready for you to stop talking.

Then eventually it hit a tipping point where teams started having someone on their team who was the RCS expert and we got involved only for the more advanced questions because there was a person who knows the basics in their local area and can help.

The eventual evolution of this, though, was that as we started getting more complex requests we used that as one of the bases, along with our tracking and experiment system, to build a growth platform team and spin that out of the growth product team. That way we had dedicated resources to not just be keeping up, but envisioning the future of this system.

Joe:

As you got broader adoption, it sounds like devs were coding those events that would drive those machines forward throughout the code base. Did you run into any issues managing those or having an understanding of which events were still actually being used to drive a machine?

Matt:

We did our best with the tracking system so that there would be as few of those events as possible, and we also had a very large Segment bill, so the data science team was very on top of what we were using everything for. We're now in the process of moving off Segment because it's gotten too expensive.

Joe:

I like the cost driven approach to understanding what's alive and used in code.

Matt:

I will say though, one of the problems we encountered was “what machines are actually important? What should we sunset?”

We would find out that there's a feature update from a year ago that still exists. We don't really want to be updating people about a launch a year ago, it's unlikely anyone's actually hitting it. There's a small set of people who come back after a year who would hit it, but is that really what we want to show them when they come back? Probably not. So that's still a sticking point, there's not really a good end of life for RCS machines at this point.

Joe:

Is there a manual inventory of all machines that are active currently? How do you audit that?

Matt:

There's no process so much as it happens occasionally. It happened recently because the team is working on Curator, which is the next iteration of RCS. So they've been doing a lot of auditing and trying to understand what the use cases are and how we can adapt the API.

Learnings and takeaways from building RCS

Alon:

Were there areas you pushed back on requirements to try to keep things simpler?

Matt:

A couple examples: the first was when the research team was excited to buy a tool for surveys. RCS supported a lot of that functionality, so we had to figure out whether we were actually getting value. That is, are we going to now have two systems that are similar? We have a much better component system we can use, and it will look native and everything. Ultimately we did trial a vendor, particularly because they were promising that they could analyze our data and give us good insights. That was okay, because RCS can't do that—that's a value add.

That was one example of pushing back and making sure that other parts of the org knew that we have this capability because it's a very flexible system. Letting other folks know what it can do is something we had to push on.

Another area we encountered that was a challenge is that RCS was based on having all of your user state available to query, because all the querying happens on the client and there's no state. As Figma grew, that state grew bigger for larger organizations. A major project was breaking that down and incrementally loading it, which proved to be a challenge for RCS because it had this premise of “your data is just there, period.”

Handling that was probably one of the biggest iterations on RCS, and ultimately it still works by saying “we get all the data”; it’s just a smaller, but still large, blob.

Alon:

Anything that you would've approached differently if you could go back in time and start this effort from the beginning?

Matt:

One of the biggest points of friction that I hand waved a little too much at the beginning was how different machines will interact with each other. At this point, there are about 90 machines at Figma across the code base. So the system for that was a little naive, a little too simple.

I was aiming to make this so you just have to worry about your thing and not worry about too much else. I think we needed another notch in terms of how it’s going to interact with other machines. Balancing that awareness of other things happening versus your own thing.

Joe:

How do you manage the orchestration of machines today?

Matt:

From the very beginning, one of the principles was RCS can't detract from the overall experience.

We isolated machines in terms of exceptions, and we also made sure only one thing can show at once, so the worst case is something gets hidden and something else shows up and the user sees a flash. We preferred that to there being a million popups, making it seem like Figma is not talking to itself.

The solution we went with was the ability to specify a channel that your machine is on. In practice that ended up being a single channel because we didn’t want to risk two things showing at the same time. Then we had priorities along with that where whenever an event came in, all the machines would run and say “I wanna show my thing, here's its priority” and then RCS would choose the highest priority one. The complexity came in when machines would then have to express behavior for “what should I do if something else goes ahead of me?” Some would want to queue up, some would want to go back to where they were, some would want to exit and not worry about it.
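
Here's a small sketch of that channel/priority arbitration and the “what if I'm blocked?” parameters (names invented; not the real RCS API):

```typescript
// When an event comes in, every machine on a channel may bid to show.
// Only the highest-priority bid wins; the rest fall back to their
// declared blocked-behavior.
type BlockedBehavior = "queue" | "restart" | "exit";

interface Bid {
  machineId: string;
  priority: number;
  show: () => void;
  onBlocked: BlockedBehavior; // what to do when something outbids you
}

function arbitrate(bids: Bid[], queued: Bid[]): void {
  if (bids.length === 0) return;
  const winner = bids.reduce((a, b) => (b.priority > a.priority ? b : a));
  for (const bid of bids) {
    if (bid === winner) continue;
    if (bid.onBlocked === "queue") queued.push(bid); // retry later
    // "restart" machines go back to where they were; "exit" machines
    // simply give up until their trigger fires again.
  }
  winner.show(); // only one thing on screen at a time
}
```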

That was one of the complexities. People would start building their machine and now you need to fill out this blob of parameters that are a little bit esoteric and maybe one step too far. You have to understand what it means to be blocked, which means you need to dig in and understand what RCS is doing. I feel like there could have been a clearer semantic representation of that perhaps.

Joe:

Right, because suddenly you have to understand the system or the broader set of modules in the system to make sense of what you need to do. Hard.

Matt:

Hard stuff, yeah.

The value of RCS

Alon:

If you take a step back how would you frame up the value that RCS has brought to the organization?

Matt:

I mentioned earlier, people really underestimate how complex onboarding can be and also how important it is. A really core part of a feature is the first experience with it. I think RCS helps product teams build these kinds of features in a consistent, quick, and high-quality way that allows them to focus on the user problem and the nuances there, and not the nuances of the implementation.

Alon:

What are you most proud of?

Matt:

One of the coolest things we built was a visualizer for these machines, and it's not even very complicated, it's just a graph representation of the machines. But it took that whiteboard drawing and then made it a thing that you can put in the PR and use as a debugging tool.
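
As a flavor of how simple such a visualizer can be: given a machine definition like the sketches above, a few lines can emit Graphviz DOT for the graph (illustrative, not Figma’s tool):

```typescript
interface VizNode {
  on: string;    // triggering event
  next?: string; // destination state
}

// Walk a machine's states and emit DOT; paste the output into any
// Graphviz renderer to get the "whiteboard drawing" for a PR.
function toDot(machineId: string, states: Record<string, VizNode>): string {
  const edges = Object.entries(states)
    .filter(([, node]) => node.next !== undefined)
    .map(([name, node]) => `  "${name}" -> "${node.next!}" [label="${node.on}"];`);
  return `digraph "${machineId}" {\n${edges.join("\n")}\n}`;
}
```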

I would've loved to go a step farther with being able to visualize, as you're using it yourself, what's happening in the machine. But we didn't go that far with the tooling, obviously.

Joe:

We're converging on the same goals from different angles. Folks design their machines in our canvas, and we would also love to offer the machine as a debugging tool that our users can use to see one of their users flowing through their machines. There's complexity in understanding the machines and how users move through them.

Matt:

I think a similarity is that RCS is sort of its own package. I imagine Dopt is the same, where people aren't expecting to have to go into your code base to understand what's actually happening. They could at Figma, but they didn't really want to. Probably the best thing we built was a tool where you could put in a URL param that said ‘enable logging’ and it would spit out what every machine thought of every event that came in, e.g. did it progress? You get a big, long thing with 80 machines telling you “this is what I thought of that” and you had to figure it out, but it had everything you could need to debug.
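
A toy version of that debug switch (hypothetical parameter name):

```typescript
// Enable verbose machine logging via a URL parameter, e.g.
// https://example.com/app?rcs_logging
const debugEnabled =
  new URLSearchParams(window.location.search).has("rcs_logging");

// Called by each machine for each incoming event.
function logMachineDecision(machineId: string, event: string, decision: string): void {
  if (!debugEnabled) return;
  // e.g. "[RCS] team-upsell on page:opened -> query failed, no action"
  console.log(`[RCS] ${machineId} on ${event} -> ${decision}`);
}
```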

Curator, the next iteration of RCS

Joe:

You mentioned that the team is working on Curator, the next iteration of RCS. Can you give us a peek into what it is?

Matt:

A lot of the development happened while I was on parental leave, so I don't have as many details, but they're adding a lot of React hooks to make it more in the language developers are used to. They actually noticed that a lot of the APIs they're designing are similar to Dopt’s APIs when they looked at your docs—they were excited to see that. They're also evaluating whether there are simpler ways to represent a sequence of things happening. A lot of machines take the form of a sequence more than a graph where you can go different directions. So that'd maybe be less cognitive load for developers.
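
Purely as speculation on what a hooks-and-sequences direction could look like (invented API; Matt describes the direction, not this code):

```tsx
import { useCallback, useState } from "react";

// A linear "try this, then this" flow as a hook, with no full graph.
function useSequence(steps: readonly string[]) {
  const [index, setIndex] = useState(0);
  const advance = useCallback(
    () => setIndex((i) => Math.min(i + 1, steps.length)),
    [steps.length]
  );
  return {
    current: steps[index],        // undefined once the sequence ends
    done: index >= steps.length,
    advance,
  };
}

// Usage in a component:
// const { current, advance } = useSequence(["welcome", "draw", "share"]);
// if (current === "welcome") return <WelcomeTooltip onNext={advance} />;
```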

Alon:

Thank you so much Matt, really appreciate you spending the time sharing your experience with us!