Rethinking serverless with FLAME

imafish
19 replies
1d3h

Having dealt with the pain and complexity of a 100+ lambda function app for the last 4 years, I must say this post definitely hits the spot wrt. the downsides of FaaS serverless architectures.

When starting out, these downsides are not really that visible. On the contrary, there is a very clear upside, which is that everything is free when you have low usage, and you have little to no maintenance.

It is only later, when you have built a hot mess of lambda workflows, which become more and more rigid due to interdependencies, that you wish you had just gone the monolith route and spent the few extra hundreds on something self-managed. (Or even less now, e.g. on fly.io)

A question for the author: what if you're not using Elixir?

hinkley
7 replies
1d2h

A pattern I see over and over, which has graduated to somewhere between a theorem and a law, is that motivated developers can make just about any process or architecture work for about 18 months.

By the time things get bad, it's almost time to find a new job, especially if the process was something you introduced a year or more into your tenure and are now regretting. I've seen it with a handful of bad bosses, at least half a dozen times with (shitty) 'unit testing', scrum, you name it.

But what I don't know is how many people are mentally aware of the sources of discomfort they feel at work, instead of a more nebulous "it's time to move on". I certainly get a lot of pushback trying to name uncomfortable things (and have a lot less bad feelings about it now that I've read Good to Great). Nobody wants to say, "Oh look, the consequences of my actions."

The people materially responsible for the Rube Goldberg machine I help maintain were among the first to leave. The captain of that ship asked a coworker of mine if he thought it would be a good idea to open source our engine. He responded that nobody would want to use our system when the wheels it reinvented already exist (and are better). That guy was gone within three to four months, under his own steam.

antod
4 replies
1d

That's why I'm always wary of people who hardly ever seem to stay anywhere more than a couple of years.

There's valuable learning (and empathy too) in having to see your own decisions and creations through their whole lifecycle. Understanding how tech debt comes to be, what tradeoffs were involved and how they came to bite later. Which ideas turned out to be bad in hindsight through the lens of the people making them at the time.

Rather than just painting the previous crowd as incompetent while simultaneously making worse decisions you'll never experience the consequences of.

Moving on every 18-24 months leaves you with a potentially false impression of your own skills/wisdom.

no_wizard
1 replies
21h23m

That's why I'm always wary of people who hardly ever seem to stay anywhere more than a couple of years

Do have some empathy for when job markets or life make people move on. I'd like to stay at an employer for more than 1-2 years, but between the layoffs (avoiding them or getting laid off) and the need for a higher salary that often only comes from switching jobs, it's not always possible to build tenure.

Frankly it's a big issue in the industry at large. I hate interviewing etc. but I'm not going to get paid 20% less because of it. I had to ride the job-hopping treadmill for a while, and I'd like to get off it just as much as you'd like to see people have more tenure.

When I first started out though, I worked for 5 years at the same place, until it was very clear I was going to cap out on being able to advance, and merit increases of 3-5% a year weren't going to cut it.

Aqua_Geek
0 replies
20h21m

Yep. Absolutely this. I get the frustration of people building fragile junk and bouncing before the house of cards falls, but I wouldn’t hold switching jobs every couple of years against a candidate when a significant difference in salary is potentially on the table. A 20% raise, compounded over a couple of switches, is massive.

I got way more by switching jobs (after 4 years) than I was ever going to get by staying.

jvans
0 replies
22h56m

There's valuable learning (and empathy too) in having to see your own decisions and creations through their whole lifecycle.

This is so true. It's extremely enlightening to watch a design go from design docs to implementation and then finally the maintenance phase. A lot of problems that could happen never do and some unexpected ones pop up along the way.

hinkley
0 replies
20h17m

It definitely colors which questions I want to ask people.

It is possible to avoid these traps, but then there are a lot of traps that we collectively have the wisdom to avoid but individually do not.

I started taking things apart and putting them back together at a very young age. When I was a young man I was in a big hurry to get somewhere, and so I would walk into a brownfield project and slowly reconstruct and deconstruct how we got here, whether I would have made the same decisions with the same information, and how I felt about that news. Not only did I avoid the "1 year of experience 10 times" dilemma, I got more like 3-4 years of experience 4 times in 10 years, by playing historian.

My first almost-mentor left me with a great parting gift at the start of the dot-com era. He essentially convinced me of the existence of the hype cycle (in '95!), that we were at the beginning of one/several, he had seen previous ones play out, and they would play out again. Not cycles for new things, mind you, but cycles trying to bring back old things that had been forgotten. Like fashion. If anything it made me more likely to want to excavate the site.

Going into the trap knowing it's a trap doesn't necessarily save you, but it does improve your odds. Of course it also makes you a curmudgeon at a tender age.

katzgrau
1 replies
1d

And don’t forget that the developer fought like hell to use that new process, architecture, pattern, framework, etc

hinkley
0 replies
20h6m

I have spent so much time fighting to do things the boring way.

There's only so much interesting code you can add to interesting code, before every single conversation becomes a proxy discussion of The Mythical Man Month - we can't teach anybody new how to use this code in anything like a reasonable time frame. The best we can do is create new experts as fast as the old ones disappear.

Really Interesting Code is more at home surrounded by Really Fucking Boring code. You get so blinded by the implementation details of the Interesting Code that you cannot see the forest for the trees. That is the secret wisdom of Relentless Refactoring. The more I rearrange the pieces the more things I can see that can be made out of them, some of which are better, and a few of which are much better or brilliant.

Last week I implemented something in 2 days that we had wanted for years and didn't do because it would have taken 1.5 people more than a month to build. But after a conversation in which I enumerated all the little islands of sanity I had created between Here and There, it took a couple dozen lines of code to fix a very old and very sore problem.

chrismccord
4 replies
1d3h

I talk about FLAME outside Elixir in one of the sections in the blog. The tl;dr is that it's a generally applicable pattern for languages with a reasonable concurrency model. You likely won't get all the ergonomics that we get for free, like functions with captured-variable serialization, but you can probably get 90% of the way there in something like JS, where you can move your modular execution to a new file rather than wrapping it in a closure. Someone implementing a FLAME library will also need to write the pooling, monitoring, and remote communication bits. We get a lot for free in Elixir on the distributed messaging and monitoring side. The process placement stuff is also really only applicable to Elixir. Hope that helps!

jrmiii
2 replies
1d

functions with captured variable serialization

Can't wait for the deep dive on how that works

quaunaut
0 replies
22h13m

That's just standard Erlang/Elixir: because all values are immutable, when a new anonymous function is defined it copies the current values of the external variables into it.

You can do it right now even without FLAME, just by opening two Elixir nodes; then it's as simple as

```elixir
iex(first_node@localhost)> name = "Santa"
iex(first_node@localhost)> Node.spawn_link(:other_node@localhost, fn -> IO.puts "Hello #{name}" end)
Hello Santa
#PID<1337.42.0>
```

Note that while the string interpolation and `IO.puts` were run on `other_node@localhost`, it still wrote to stdout on the first node. This is because the first node called `Node.spawn_link`, making it the 'group leader'. Outside of which stdout it went to, all the work was done on the other node.

azurelake
0 replies
22h9m

Probably not too much to say that’s specific to FLAME. Closures are serializable and can be sent as messages to actors on the BEAM with a few caveats.

From a quick look at the code, this looks like the magic line: https://github.com/phoenixframework/flame/blob/main/lib/flam...

crabmusket
0 replies
23h5m

JS kind of has inter-process communication built in with Web Workers and the channel messaging API. I wonder whether it'd be possible to essentially create a "FLAME Worker" with the same API as a web worker but backed by distributed execution!

viraptor
1 replies
1d1h

that you wish you had just gone the monolith route

Going from hundreds of lambdas to a monolith is overreacting to one extreme by swinging to the other. There's a whole spectrum of possible ways to split a project in useful ways that simplify development and maintenance.

Spivak
0 replies
21h4m

Anything in between is all the downsides of both approaches.

Once you have the flow for deploying always-running hot application(s) with autoscaling, the benefits of lambda are basically gone.

Low-volume scale-to-zero is just another route. No 15-minute limit, no having to marshal data through other AWS services because that's all lambda talks to, no more EventBridge for cron, no more payload size limits and having to use S3 as a buffer, no more network requests between different parts of the same logical app, and code deploys are atomic: you're either at v1.x or v1.x+1 but never some in-between state.

I really do like Lambda but once you're at the spend where it's the same as some dedicated always-on compute the value drops off.

randall
0 replies
7h59m

I don't get why you'd have a 100+ lambda function app... I can see purpose-built lambdas (i.e. we have one for "graphql" and "frontend" and a few backend services), but unless you're at Meta size, why would you have 100 lambdas? Do you have 100 teams?

p10jkle
0 replies
1d

I'm working on something that I think might solve the problem in any language (currently there's an SDK for TypeScript, with Java in the works). You can avoid splitting an application into 100s of small short-running chunks if you can write normal service-oriented code, where lambdas can call each other. But this isn't possible without paying for all that time spent waiting around. If the lambdas can pause execution while they are blocked on IO, it solves the problem. So I think durable execution might be the answer!

I've been working on a blog post to show this off for the last couple of weeks:

https://restate.dev/blog/suspendable-functions-make-lambda-t...

icedchai
0 replies
1d1h

I couldn't even stand having a dozen lambdas. The app was originally built by someone who didn't think much about maintenance or deployment. Code was copy-pasted all over the place. Eventually, we moved to a "fat lambda" monolith where a single lambda serves multiple endpoints.

Rapzid
0 replies
22h4m

You can monolith on lambda if you don't care too much about cold starts or can manage them.

Put another way, you can monolith and minimize spend on AWS; it's not either-or.

I'm using ASP.NET these days, and even a chunky app published ready-to-run with optimized EF models starts relatively quickly.

chrismccord
14 replies
1d7h

Author here. I'm excited to get this out and happy to answer any questions. Hopefully I sufficiently nerd-sniped some folks into implementing the FLAME pattern in JS, Go, and other langs :)

ryanjshaw
7 replies
1d7h

This looks great. Hopefully Microsoft are paying attention because Azure Functions are way too complicated to secure and deploy, and have weird assumptions about what kind of code you want to run.

bob1029
3 replies
1d5h

weird assumptions about what kind of code you want to run

Those "weird assumptions" are what makes the experience wonderful for the happy path. If you use the C#/v4 model, I can't imagine you'd have a hard time. Azure even sets up the CI/CD for you automatically if your functions are hosted in Github.

If your functions need to talk to SQL, you should be using Managed Identity authentication between these resources. We don't have any shared secrets in our connection strings today. We use Microsoft Auth to authenticate access to our HttpTrigger functions. We take a dep on IClaimsPrincipal right in the request and everything we need to know about the user's claims is trivially available.

I have zero experience using Azure Functions outside of the walled garden. If you are trying to deploy python or rust to Az Functions, I can imagine things wouldn't be as smooth. Especially, as you get into things like tracing, Application Insights, etc.

I feel like you should only use Microsoft tech if you intend to drink a large amount of their koolaid. The moment you start using their tooling with non C#/.NET stacks, things go a bit sideways. You might be better off in a different cloud if you want to use their FaaS runners in a more "open" way. If you can figure out how to dose yourself appropriately with M$ tech, I'd argue the dev experience is unbeatable.

Much of the Microsoft hate looks to me like a stick-in-bike-wheels meme. You can't dunk on the experience until you've tried the one the chef actually intended. Dissecting your burger and only eating a bit of the lettuce is not a thorough review of the cuisine on offer.

jorams
0 replies
1d3h

You can't dunk on the experience until you've tried the one the chef actually intended. Dissecting your burger and only eating a bit of the lettuce is not a thorough review of the cuisine on offer.

But Microsoft isn't selling burgers that people are taking a bit of lettuce from. They're selling lettuce, and if that lettuce sucks in any context that isn't the burger that they're also selling, then complaining about the quality of their lettuce is valid.

jabradoodle
0 replies
1d3h

A cloud vendor where using some of the most popular languages in the world makes your life harder is a genuine reason to dislike something.

devjab
0 replies
16h56m

Have you used azure functions in an enterprise setting? Because it’s a terrible experience.

I think our “best” issue is how sometimes our functions won't be capable of connecting to their container registry, for no apparent reason. We have an Entra ID group that has AcrPull access, and we add new function slot identities to it automatically; for the most part it works fine, but then suddenly it won't.

We currently have a pipeline slot that is completely the same as the four other slots, which will only deploy a new function version if you deploy it twice through the Azure pipeline. We've had Microsoft look at it and their suggestion was to delete the pipeline and create it again.

And that’s just part of it. Then comes the VNETs, the subnets the private endpoints and how both subscriptions and resource groups make everything a tiny battle.

I don’t really mind that much, we’re abandoning it in favour of Azure Container Apps and a fully run Bicep + Dapr pipeline and we’re never looking back.

Though, to be fair to Microsoft, the way they designed Azure Functions makes the migration away from them really, really easy. Which is frankly a brilliant design for the more managed side of “serverless”, in my opinion. It's just a shame that the managed part doesn't work very well. The functions themselves work fine - well, maybe not if you don't use .NET in isolation, but I can't speak to that as we weren't going to trust Microsoft to update our dependencies (even if they are Microsoft SDKs).

bbkane
1 replies
1d5h

I had a lot of problems trying to set up Azure Functions with Terraform a couple of years ago. Wonder if it's gotten better?

https://www.bbkane.com/blog/azure-functions-with-terraform/

orochimaaru
0 replies
1d3h

I used them with Python. Simple enough but opinionated. I didn’t play around with durable functions.

Don't have strong feelings there. It worked. I did have some issues with upgrading the functions but found the workarounds.

kapilvt
0 replies
1d2h

Azure Functions don't fit the common definition of serverless; I've had a few convos with them over several years. There is a real mismatch owing to the product's origins at Azure and a real lack of understanding of the space: Azure Functions was originally built on top of Web Apps, essentially a hack to enter the serverless market. How many websites do you need to run? You can't run more than 50 functions, and there's the 16-cell table of different runtime options (i.e. provision servers for your serverless). Consumption is better, but the origins in Web Apps mean it's just a different product - hey, every function has a URL by default :shrug:. Azure needs a radical rethink of what serverless is, and I haven't seen any evidence they got the memo. In AWS, Lambda originated out of S3, re: bringing compute to storage.

berniedurfee
1 replies
9h40m

I’d encourage you to temper the hyperbole.

It's easy to dismiss these types of articles as straight sales pitches when the problem is presented as "…a fate worse than death." and the solution as so easy and painless that you'd be a fool to consider any other approach! That's the MO of snake oil sellers.

Instead, I think a more objective comparison with much less editorializing would be less of a turn off for folks curious about a better approach to solve a problem that you did well in outlining at the beginning of the article.

Hopefully you take this as constructive feedback! I think the substance of the article is interesting, I was just turned off by the presentation.

pas
0 replies
15m

I especially liked that line. There's always a bit of space for comedy.

tlivolsi
0 replies
1d

On an unrelated note, what syntax highlighting theme did you use for the code? I love it.

matthewcford
0 replies
21h36m

Can you run your app on a smaller VM but boot a larger one on demand? I'm thinking Bumblebee etc.

danielskogly
0 replies
1d6h

Great article and video, and very exciting concept! Looking forward to a JS implementation, but that looks like a challenge to get done.

And now I feel (a tiny bit) bad for sniping ffmpeg.fly.dev :)

Ironchefpython
0 replies
19h35m

I feel like the most natural implementation would be something like vert.x for the jvm. They already have the mechanisms to handle async execution (via reactive extensions and futures and coroutines on top of the event bus) and serialization and distribution of data across a cluster. There are eventbus clients for many popular languages as well, so you'd be able to build your application in a mix of languages.

rubenfiszel
8 replies
1d5h

That's great. I agree with the whole thesis.

We took an alternative approach with https://www.windmill.dev which is to consider the unit of abstraction to be at the source-code level rather than the container level. We then parse the main function and imports to extract the args and dependencies, and then run the code as-is in the desired runtime (TypeScript, Python, Go, Bash). Then all the secret sauce is to manage the cache efficiently so that the workers are always hot regardless of your imports.

It's not as integrated in the codebase as this, but the audience is different: our users build complex workflows from scratch, cron jobs, or just one-off scripts with the auto-generated UI. Indeed, the whole context in FLAME seems to be snapshotted and then rehydrated on the target VM. Another approach would be to introduce syntax to specify what is required context and what is not, and only load the minimally required. That's what we are currently exploring to integrate Windmill better with existing codebases instead of having to rely on HTTP calls.

bo0tzz
5 replies
1d5h

Indeed the whole context in FLAME seems to be snapshotted and then rehydrated on the target VM. Another approach would be to introduce syntax to specify what is required context from what is not and only loading the minimally required.

This isn't strictly what is happening. FLAME just uses the BEAM's built in clustering features to call a function on a remote node. That implicitly handles transferring only the context that is necessary. From the article:

FLAME.call accepts the name of a runner pool, and a function. It then finds or boots a new copy of our entire application and runs the function there. Any variables the function closes over (like our %Video{} struct and interval) are passed along automatically.
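
To make that concrete, here is a rough sketch of a call site, loosely modeled on the article's thumbnail example (the pool name, helper function, and ffmpeg flags are illustrative assumptions, not lifted from the article):

```elixir
# Illustrative sketch only – pool/module names and the ffmpeg args are assumptions.
def generate_thumbnails(%Video{} = video, interval) do
  FLAME.call(MyApp.FFMpegRunner, fn ->
    # This closure executes on a runner machine in the pool; `video` and
    # `interval` travel with it because the closure captures them.
    tmp_dir = Path.join(System.tmp_dir!(), Ecto.UUID.generate())
    File.mkdir_p!(tmp_dir)
    System.cmd("ffmpeg", ~w(-i #{video.url} -vf fps=1/#{interval} #{tmp_dir}/%04d.png))
    File.ls!(tmp_dir)
  end)
end
```

Whatever the function returns comes back as the return value of `FLAME.call`, so the surrounding code reads like an ordinary synchronous call.
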
rubenfiszel
4 replies
1d4h

Fair point, TIL about another incredible capability of the BEAM. As long as you're willing to write Elixir, this is clearly a superior scheme for deferred tasks/background jobs.

One issue I see with this scheme still is that you have to be careful about what you do at initialization of the app, since now all your background jobs are going to run that. For instance, maybe your task doesn't need to be connected to the db, but as per the article it will be if your app is. They mention having hot modules, but what if you want to run 1M of those jobs on 100 workers? You now have 100 unnecessary apps. It's probably a non-issue; the number of things done at initialization could be kept minimal, and FLAME could just have some checks to skip initialization code when in a FLAME context.

chrismccord
3 replies
1d4h

This is actually a feature. If you watch the screencast, I talk about Elixir supervision trees and how all Elixir programs carefully specify the order their services start and stop in. So if your flame functions need DB access, you start your Ecto.Repo with a small or single-connection DB pool. If not, you flip it off.

It's probably a non-issue, the number of things done at initialization could be kept minimal, and FLAME could just have some checks to skip initialization code when in a flame context.

Exactly :)
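
As a minimal sketch of what that can look like in an application's `start/2` (assuming `FLAME.Parent.get/0` returns parent info inside a runner and `nil` otherwise; the child specs and options are invented for illustration):

```elixir
# Sketch only – FLAME.Pool options and FLAME.Parent.get/0 are from my reading and may differ.
def start(_type, _args) do
  flame_parent = FLAME.Parent.get()

  children =
    [
      # Runners get a small DB pool; the parent app keeps its full pool.
      {MyApp.Repo, pool_size: if(flame_parent, do: 1, else: 10)},
      {FLAME.Pool, name: MyApp.FFMpegRunner, min: 0, max: 10},
      # No need to serve web traffic from a runner.
      !flame_parent && MyAppWeb.Endpoint
    ]
    |> Enum.filter(& &1)

  Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
end
```
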

jrmiii
2 replies
1d4h

So, Chris, how do you envision the FLAME child understanding what OTP children it needs to start on boot, because this could be FLAME.call dependent if you have multiple types of calls as described above. Is there a way to pass along that data or for it to be pulled from the parent?

Acknowledging this is brand new; just curious what your thinking is.

EDIT: Would it go in the pool config, and a runner as a member of the pool has access to that?

chrismccord
1 replies
1d3h

Good question. The pools themselves in your app will be per use case, and you can reference the named pool you are a part of inside the runner, i.e. by looking in the system env passed as pool options. That said, we should probably just encode the pool name along with the other parent info in the `%FLAME.Parent{}` for easier lookup.

jrmiii
0 replies
1d

Ah, that makes a lot of sense - I think the FLAME.Parent{} approach may enable backends that wouldn't be possible otherwise.

For example, if I used the heroku api to do the equivalent of ps:scale to boot up more nodes - those new nodes (dynos in heroku parlance) could see what kind of pool members they are. I don't think there is a way to do dyno specific env vars - they apply at the app level.

If anyone tries to do a Heroku backend before I do, an alternative might be to use distinct process types in the Procfile for each named pool and ps:scale those to 0 or more.

Also, might need something like Supabase's libcluster_postgres[1] to fully pull it off.

EDIT2: So the Heroku backend would be a challenge. You'd maybe have to use something like the formation API[2] to spawn the pool, but even then you can't idle them down because Heroku will try to start them back up. I.e. there's no `restart: false` from what I can tell from the docs. Or you could use the dyno API[3] with a timeout set up front (no idle awareness).

[1] https://github.com/supabase/libcluster_postgres

[2] https://devcenter.heroku.com/articles/platform-api-reference...

[3] https://devcenter.heroku.com/articles/platform-api-reference...

Nezteb
1 replies
1d5h

Oops you've got an extra w, here is the URL for anyone looking: https://www.windmill.dev/

I love the project's goals; I'm really hoping Windmill becomes a superior open-source Retool/Airtable alternative!

rubenfiszel
0 replies
1d5h

Thanks, fixed! (and thanks)

gitgud
7 replies
1d6h

It then finds or boots a new copy of our entire application and runs the function there.

So for each “Flame.call” it begins a whole new app process and copies the execution context in?

A very simple solution to scaling, but I’d imagine this would have some disadvantages…

Adding 10ms to the app startup time adds 10ms to every "Flame.call" part of the application too… same with memory, I suppose.

I guess these concerns just need to be considered when using this system.

chrismccord
6 replies
1d6h

The FLAME.Pool discussed later in the post addresses this. Runners are pooled and remain configurably hot for whatever time you want before idling down. Under load you are rarely paying the cold start time because the pool is already hot. We are also adding more sophisticated pool growth techniques to the Elixir library next, so you also avoid hitting an at-capacity runner and cold starting one.

For hot runners, the only overhead is the latency between the parent and child, which should be in the same datacenter, so 1ms or sub-1ms.
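
For reference, a pool definition along these lines might look like the following; the option names are from my reading of the library and could differ, so treat it as a sketch:

```elixir
# Sketch – exact FLAME.Pool option names/defaults may differ from the released library.
{FLAME.Pool,
 name: MyApp.FFMpegRunner,
 min: 0,                      # scale to zero when there is no work
 max: 10,                     # cap on concurrently booted runner machines
 max_concurrency: 5,          # calls served per runner before the pool grows
 idle_shutdown_after: 30_000} # keep a hot runner around for 30s of inactivity
```
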

solatic
3 replies
1d3h

Cold start time is the issue with most serverless runtimes.

Your own mission statement states: "We want on-demand, granular elastic scale of specific parts of our app code." Doing that correctly is fundamentally a question of how long you need to wait for cold starts, because if you have a traffic spike, the spiked part of the traffic is simply not being served until the cold start period elapses. If you're running hot runners with no load, or if you have incoming load without runners (immediately) serving them, then you're not really delivering on your goal here. AWS EC2 has had autoscaling groups for more than a decade, and of course, a VM is essentially a more elaborate wrapper for any kind of application code you can write, and one with a longer cold-start time.

Under load you are rarely paying the cold start time because the pool is already hot.

My spiky workloads beg to differ.

bo0tzz
1 replies
1d2h

Depending of course on the workload and request volume, I imagine you could apply a strategy where code is run locally while waiting for a remote node to start up, so you can still serve the requests on time?

solatic
0 replies
1d2h

No, because then you're dividing the resources allocated to the function among the existing run + the new run. If you over-allocate ahead of time to accommodate for this, you might as well just run ordinary VMs, which always have excess allocation locally; the core idea of scaling granularly is that you only allocate the resources you need for that single execution (paying a premium compared to a VM but less overall for spiky workloads since less overhead will be wasted).

conradfr
0 replies
1d

In an Elixir/Phoenix app I don't think this will really be used for web traffic, but more for background/async jobs.

bo0tzz
1 replies
1d5h

Currently the per-runner concurrency is limited by a fixed number. Have you thought about approaches that instead base this on resource usage, so that runners can be used optimally?

chrismccord
0 replies
1d4h

Yes, more sophisticated pool growth options are something I want longer term. We can also provide knobs that will let you drive the pool growth logic yourself if needed.

jedberg
6 replies
22h33m

I’m a huge fan of Serverless. I’m also a huge fan of simplicity.

My advice to anyone starting out on a purely web adventure: run your monolith on lambda. Just upload your whole app as your lambda. Use dynamodb for storage.

When your app gets popular, then optimize. Make a separate lambda for the popular part. Spin up a relational database or maybe an auto scaling group.

But start with a monolith on lambda. Get all the benefits without the downsides.

imiric
3 replies
21h59m

That advice sounds like the opposite of simple for someone starting out on a new project. In addition to focusing on their core stack, they would also need to deal with Lambda details and its restrictions. How would they work around the 15-minute execution limit, for one?

My advice would be to start with a plain old monolith. Even launch with it. Once, and if, you have workloads that a) make sense to run as an isolated function, and b) would help offload some processing from your main app, consider splitting it and using a Lambda.

Doing so prematurely, or worse, starting with it, opens the door to complexity you just don't need early on.

jedberg
2 replies
21h54m

You shouldn’t need to run anything more than a minute or two. If you do lambda isn’t for you.

I’ve set up multiple businesses this way. It works great. There are no lambda details to worry about, the defaults are fine. Just upload code and run. No load balancers, no firewall rules. It couldn’t be simpler.

If it doesn’t work then worry about that other stuff.

But it’ll just work 9 times out of 10 in my experience.

imiric
1 replies
21h14m

You shouldn’t need to run anything more than a minute or two. If you do lambda isn’t for you.

Right, but how do you know that on a new project? What happens if you _do_ need more execution time, RAM, CPU, IOPS, etc., than what Lambda provides? Or need some specific integration or workflow that doesn't fit the Lambda model? It would be very expensive to backtrack and implement a traditional architecture at that point, instead of starting with the traditional architecture and using Lambda only when you have a good use case for it.

Not to mention that you're going all in with a single cloud provider. Vendor lock-in at an early stage is not a good idea.

Just upload code and run. No load balancers, no firewall rules. It couldn’t be simpler.

It's probably simple for you if you've done it many times, but it would require a considerable amount of time and effort for someone unfamiliar with the stack or AWS. The simplest thing is often whatever we're used to, and if that's a traditional LAMP stack, then I'd suggest going with that first.

jedberg
0 replies
21h11m

You don’t know. You write the code, upload it, see if it runs. If it doesn’t you run it somewhere else.

It’s dead simple for a beginner, far simpler than a LAMP stack.

I think you misunderstand how lambda works because your objections don’t make sense.

Lambda is a LAMP stack. They just take care of LA for you.

zxt_tzx
1 replies
11h5m

This advice sounds rad! Do you know of any open-source codebase that does this?

Use dynamodb for storage.

So far I've done the opposite, i.e. use a relational database at the beginning, and, if the access pattern is clear and there are some parts of the application that would not scale well with SQL, move those parts to DynamoDB.

anonyfox
0 replies
10h32m

Can't give you a codebase for permission reasons, but e.g. in Node.js land you can rather easily do:

1. write a bog standard express.js backend (might or might not use a SPA/SSR frontend)

2. make the normal (dev mode) server start with an index.js (which boots the express app on a port) and on top of that add a lambda.js next to it, which wraps the app (not booting it) with `@vendia/serverless-express`.

3. use AWS CDK (which is TypeScript anyways!) and ship it as a "NodejsFunction", which will also use esbuild under the hood to bundle your app into one JS blob (no need for a ZIP container with node_modules!) on deploy. The entry file is the "lambda.js" from step 2.

4. in addition use the CDK to configure some traffic origin for the lambda, the cheapest one being a CloudFront function URL - which also acts as a quite good cache in general when you send the appropriate HTTP response headers from your express app!

5. Point a domain to the Cloudfront.

6. Profit!

... I've used this extensively in the past years for lots of microservices/microfrontends and it basically works as promised: scale to zero, scale horizontally on demand automatically, and it's rather cheap (esp. when in the free tier, then it's basically free to run small-to-mid loads).

Having said that, this WILL become a maintenance trainwreck when scaled up, telling from my devops experience. All the libraries/apis/infrastructure (especially in nodejs land) tend to have breaking updates all the time, so better have at least one FTE dedicated to maintenance of all the moving parts.

And tbh: all this pain goes away for like <$100/mo with Elixir/Phoenix/LiveView on fly.io. A single Phoenix app on a single server can deal with _surprising_ amounts of traffic, and even if you get to serious load, scaling horizontally is outright trivial with Elixir+Fly. And there is no need for additional infra like Redis or message queues, since these things have natively built-in equivalents in Elixir (or: the BEAM itself). So all you need here in terms of infrastructure/maintenance is an app server and a database.

sofixa
5 replies
1d7h

Very interesting concept, however it's a bit soured by the fact that Container-based FaaS is never mentioned, and it removes a decent chunk of the negatives around FaaS. Yeah you still need to deal with the communication layer (probably with managed services such as SQS or Pub/Sub), but there's no proprietary runtime needed, no rewrites needed between local/remote runtime environments.

willsmith72
3 replies
1d7h

what are some examples of container-based faas? like you put your docker image onto lambda?

sofixa
1 replies
1d7h

* Google Cloud Run - https://cloud.google.com/run/docs/deploying#command-line

* OpenFaaS - https://www.openfaas.com/blog/porting-existing-containers-to...

* AWS Lambda - https://docs.aws.amazon.com/prescriptive-guidance/latest/pat...

* Scaleway Serverless Containers - https://www.scaleway.com/en/serverless-containers/

* Azure Container Instances - https://learn.microsoft.com/en-us/azure/container-instances/...

Probably others too, those are just the ones I know off the top of my head. I see very little reason to use traditional Function-based FaaS, which forces you into a special, locked-in framework, instead of using containers that work everywhere.

willsmith72
0 replies
1d6h

ok yeah so like an image on lambda, totally agree, a lot of the pros of serverless without a lot of the cons

dprotaso
0 replies
1d7h

https://knative.dev/ - (CloudRun API is based on this OSS project)

chrismccord
0 replies
1d6h

Bring-your-own-container is certainly better than proprietary js runtimes, but as you said it carries every other negative I talk about in the post. You get to run your language of choice, but you're still doing all the nonsense. And you need to reach for the mound of proprietary services to actually ship features. This doesn't move the needle for me, but I would be happy to have it if forced to use FaaS.

willsmith72
4 replies
1d7h

Pretty cool idea, and that api is awesome.

CPU bound work like video transcoding can quickly bring our entire service to a halt in production

Couldn't you just autoscale your app based on cpu though?

quaunaut
1 replies
1d7h

Yes and no: Maybe the rest of your workloads don't require much CPU- you only need this kind of power for one or two workloads, and you don't want them getting crowded out by other work potentially.

Or they require a GPU.

Or your core service only needs 1-2 servers, but you need to scale up to dozens/hundreds/thousands on demand, for work that only happens maybe once a day.

willsmith72
0 replies
1d6h

fair enough.

i think it's cool tech, but none of those things are "hair on fire" problems for me. i'm sure they are for some people.

chrismccord
1 replies
1d6h

Thanks! I try to address this thought in the opening. The issue with this approach is you are scaling at the wrong level of operation. You're scaling your entire app, i.e. webserver, in order to service specific hot operations. Instead what we want (and often reach for FaaS for) is granular elastic scale. The idea here is we can do this kind of granular scale for our existing app code rather than smashing the webserver/worker scale buttons and hoping for the best. Make sense?

stuartaxelowen
0 replies
1d

If you autoscale based on CPU consumption, doesn’t the macro level scaling achieve the same thing? Is the worry scaling small scale services where marginal scaling is a higher multiple, e.g. waste from unused capacity?

seabrookmx
4 replies
1d

I'm firmly in the "I prefer explicit lambda functions for off-request work" camp, with the recognition that you need a lot of operational and organizational maturity to keep a fleet of functions maintainable. I get that isn't everyone's cup of tea or a good fit for every org.

That said, I don't understand this bit:

Leaning on your worker queue purely for offloaded execution means writing all the glue code to get the data into and out of the job, and back to the caller or end-user’s device somehow

I assumed by "worker queue" they were talking about something akin to Celery in Python land, but Celery actually does handle all this glue. As far as I can tell, Celery provides a very similar developer experience to FLAME but has the added benefit that if you do want durability, those knobs are there. The only real downside seems to be that you need Redis or RabbitMQ to facilitate it? I don't have any experience with them, but I'd assume it's the same story with other languages/frameworks (e.g. Ruby+Sidekiq)?

Maybe I'm missing something.

josevalim
1 replies
23h49m

Wouldn’t you lose, for example, streaming capabilities once you use Celery? You would have to first upload the whole video, then enqueue the job, and then figure out a mechanism to send the thumbnails back to that client, while with FLAME you get a better user experience by streaming thumbnails as soon as the upload starts.

I believe the main point though is that background workers and FLAME are orthogonal concepts. You can use FLAME for autoscaling, you can use Celery for durability, and you could use Celery with FLAME to autoscale your background workers based on queue size. So being able to use these components individually will enable different patterns and use cases.

seabrookmx
0 replies
21h18m

Yes fair point. Celery has to pickle parameters so that they can flow through redis or rabbit to the worker pool.

It's worth pointing out this transparent remote function call ability is unique to the BEAM. The FLAME pattern in other languages (as described in the article with Javascript) would also require serializable parameters.

I believe the main point though is that background workers and FLAME are orthogonal concepts

Yeah I think this is what the author is driving at. I appreciate you helping me try and wrap my brain around it :)

jrmiii
0 replies
1d

Yeah, I think this was more inward focusing on things like `Oban` in elixir land.

He's made the distinction in the article that those tools are great when you need durability, but this gives you a lower ceremony way to make it Just Work™ when all you're after is passing off the work.

Doxin
0 replies
14h18m

Personally I feel more like this is Python's multiprocessing, with some spices added to start a server on demand.

It's been a while since I last read the multiprocessing docs, but last time I did, they did a pretty poor job of showing all the fancy tricks multiprocessing supports, like running a function in a different interpreter or on a different server altogether.

thefourthchime
3 replies
1d1h

I created something similar at my work, which I call "Long Lambda": the idea is, what if a lambda could run for more than 15 minutes? Then do everything in a Lambda. An advantage of our system is you can also run everything locally and debug it. I didn't see that with the FLAME but maybe I missed it.

We use it for our media supply chain which processes a few hundred videos daily using various systems.

Most other teams drank the AWS Step Koolaid and have thousands of lambdas deployed, with insane development friction and surprisingly high costs. I just found out today that we spend 6k a month on "Step Transitions", really?!

jrmiii
2 replies
1d

you can also run everything locally and debug it. I didn't see that with the FLAME but maybe I missed it.

He mentioned this:

With FLAME, your dev and test runners simply run on the local backend.

and this

by default, FLAME ships with a LocalBackend

thefourthchime
1 replies
19h20m

Yes, but does that mean you can debug it?

josevalim
0 replies
19h6m

Yes, the local backend runs on your machine, and you would debug it like any other Elixir code on your project/machine.
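
For reference, the backend is swapped per environment in config; a hedged sketch (the exact config keys are from memory and may differ):

```elixir
# Sketch – key names may differ; the point is that dev/test run the work
# in-process on your machine, while prod uses a hosted backend.

# config/dev.exs and config/test.exs
config :flame, :backend, FLAME.LocalBackend

# config/runtime.exs (prod)
config :flame, :backend, FLAME.FlyBackend
```
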

davidjfelix
3 replies
1d7h

This is a very neat approach and I agree with the premise that we need a framework that unifies some of the architecture of cloud - shuttle.rs has some thoughts here. I do take issue with this framing:

- Trigger the lambda via HTTP endpoint, S3, or API gateway ($)

  * Pretending that starting a fly machine doesn't cost the same as triggering via s3 seems disingenuous.
- Write the bespoke lambda to transcode the video ($)

  * In Go this would be about as difficult as FLAME -- you'd have to build a different entrypoint that would be 1 line of code, but it could be the same codebase. In Node it would depend on bundling, but in theory you could do the same -- it's just a promise that takes an S3 event; that doesn't seem much different.
- Place the thumbnail results into SQS ($)

  * I wouldn't do this at all. There's no reason the results need to be queued. Put them in a deterministically named s3 bucket where they'll live and be served from. Period.
- Write the SQS consumer in our app (dev $)

  * Again -- this is totally unnecessary. Your application *should forget* it dispatched work. That's the point of dispatching it. If you need subscribers to notice it or do some additional work I'd do it differently rather than chaining lambdas.
- Persist to DB and figure out how to get events back to active subscribers that may well be connected to other instances than the SQS consumer (dev $)

  * Your lambda really should be doing the DB work not your main application. If you've got subscribers waiting to be informed the lambda can fire an SNS notification and all subscribed applications will see "job 1234 complete"
So really the issue is:

* s3 is our image database

* our app needs to deploy an s3 hook for lambda

* our codebase needs to deploy that lambda

* we might need to listen to SNS

which is still some complexity, but it's not the same and it's not using the wrong technology like some chain of SQS nonsense.

chrismccord
2 replies
1d6h

Thanks for the thoughts – hopefully I can make this more clear:

* Pretending that starting a fly machine doesn't cost the same as triggering via s3 seems disingenuous.

You're going to be paying for resources wherever you decide to run your code. I don't think this needs to be spelled out. The point about costs is rather than paying to run "my app", I'm paying at multiple layers to run a full solution to my problem. Lambda gateway requests, S3 put, SQS insert, each have their own separate costs. You pay a toll at every step instead of a single step on Fly or wherever you host your app.

* I wouldn't do this at all. There's no reason the results need to be queued. Put them in a deterministically named s3 bucket where they'll live and be served from. Period. This is totally unnecessary. Your application should forget it dispatched work. That's the point of dispatching it. If you need subscribers to notice it or do some additional work I'd do it differently rather than chaining lambdas.

You still need to tell your app about the generated thumbnails if you want to persist the fact they exist where you placed them in S3, how many exist, where you left off, etc.

* Your lambda really should be doing the DB work not your main application. If you've got subscribers waiting to be informed the lambda can fire an SNS notification and all subscribed applications will see "job 1234 complete"

This is exactly my point. You bolt on ever more Serverless offerings to accomplish any actual goal of your application. SNS notifications is exactly the kind of thing I don't want to think about, code around, and pay for. I have Phoenix.PubSub.broadcast and I continue shipping features. It's already running on all my nodes and I pay nothing for it because it's already baked into the price of what I'm running – my app.
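
For anyone unfamiliar with the Elixir side, a hedged sketch of what that looks like (topic and payload are made up); because PubSub already spans the cluster, the broadcast works the same whether the function ran locally or on a FLAME runner:

```elixir
# Inside the FLAME-executed function, on whichever machine it landed:
Phoenix.PubSub.broadcast(MyApp.PubSub, "videos:#{video.id}", {:thumbnail_ready, thumb_url})

# In the LiveView (or any subscribed process) on any node in the cluster:
def mount(%{"id" => id}, _session, socket) do
  if connected?(socket), do: Phoenix.PubSub.subscribe(MyApp.PubSub, "videos:#{id}")
  {:ok, assign(socket, thumbnails: [])}
end

def handle_info({:thumbnail_ready, url}, socket) do
  {:noreply, update(socket, :thumbnails, &[url | &1])}
end
```
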

davidjfelix
1 replies
1d6h

This is exactly my point. You bolt on ever more Serverless offerings to accomplish any actual goal of your application. SNS notifications is exactly the kind of thing I don't want to think about, code around, and pay for. I have Phoenix.PubSub.broadcast and I continue shipping features. It's already running on all my nodes and I pay nothing for it because it's already baked into the price of what I'm running – my app.

I think this is fine if and only if you have an application that can subscribe to PubSub.broadcast. The problem is that not everything is Elixir/Erlang or even the same language internally to the org that runs it. The solution (unfortunately) seems to be reinventing everything that made Erlang good but for many general purpose languages at once.

I see this more as a mechanism to signal the runtime (the combination of Fly machines and Erlang nodes running on those machines) that you'd like to scale out for some scoped duration, but I'm not convinced that this needs to be initiated from inside the runtime for Erlang in most cases -- why couldn't something like this be achieved externally, by noticing a high watermark of usage and adding nodes, much like a Kubernetes horizontal pod autoscaler?

Is there something specific about CPU bound tasks that makes this hard for erlang that I'm missing?

Also, not trying to be combative -- I love the Phoenix framework and the work y'all are doing at Fly, especially you Chris; just wondering if/how this abstraction leaves the walls of Elixir/Erlang, which already has it significantly better than the rest of us for distributed abstractions.

tonyhb
0 replies
1d6h

You're literally describing what we've built at https://www.inngest.com/. I don't want to talk about us much in this post, but it's so relevant it's hard not to bring it up. (Huge disclaimer here, I'm the co-founder).

In this case, we give you global event streams with a durable workflow engine that any language (currently Typescript, Python, Go, Elixir) can hook into. Each step (or invocation) is backed by a lightweight queue, so queues are cheap and are basically a 1LOC wrapper around your existing code. Steps run as atomic "transactions" which must commit or be retried within a function, and are as close to exactly once as you could get.

tonyhb
2 replies
1d6h

This is great! It reminds me of a (very lightweight) Elixir specific version of what we built at https://www.inngest.com/.

That is, we both make your existing code available to serverless functions by wrapping it with something that, essentially, makes the code callable via remote RPC.

Some things to consider, which are called out in the blog post:

Often code like this runs in a series of imperative steps. Each of these steps can run in series or parallel as additional lambdas. However, there's implicit state captured in variables between steps. This means that functions become workflows. In the Inngest model, Inngest captures this state and injects it back into the function so that things are durable.

On the note of durability, these processes should also be backed by a queue. The good thing about this model is that queues are cheap. When you make queues cheap (eg. one line of code) everything becomes easy: any developer can write reliable code without worrying about infra.

Monitoring and observability, as called out, is critical. Dead letter queues suck absolute major heaving amounts of nauseous air, and being able to manage and replay failing functions or steps is critical.

A couple differences wrt. FLAME and Inngest. Inngest is queue backed, event-driven, and servable via HTTP across any language. Because Inngest backs your state externally, you can write a workflow in Elixir, rewrite it in Typescript, redeploy, and running functions live migrate across backend languages, similar to CRIU.

Being event-driven allows you to manage flow control: everything from debounce to batching to throttling to fan-out, across any runtime or language (eg. one Elixir app on Fly can send an event over to run functions on TypeScript + Lambda).

I'm excited where FLAME goes. I think there are similar goals!

chrismccord
1 replies
1d6h

Inngest looks like an awesome service! I talk about job processors/durability/retries in the post. For Elixir specifically, for durability, retries, and workflows we reach for Oban, which we'd continue to do here. The Oban job would call into FLAME to handle the elastic execution.
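
A hedged sketch of that layering, with invented module names: Oban provides the durable, retryable job, and the job body hands the heavy lifting to a FLAME pool:

```elixir
defmodule MyApp.Workers.Thumbnailer do
  # Oban gives us durability and retries; FLAME gives us elastic execution.
  use Oban.Worker, queue: :media, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"video_id" => video_id}}) do
    # Hypothetical context functions; only Oban and FLAME.call/2 are real APIs here.
    video = MyApp.Videos.get_video!(video_id)

    FLAME.call(MyApp.FFMpegRunner, fn ->
      # Runs on a pooled runner machine rather than the node serving web traffic.
      MyApp.Videos.generate_thumbnails(video)
    end)

    :ok
  end
end
```
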

darwin67
0 replies
1d5h

FYI: there's an Elixir SDK for Inngest as well. Haven't fully announced it yet, but plan to post it in ElixirForum some time soon.

https://github.com/inngest/ex_inngest

timenova
2 replies
1d2h

I have a question about distributed apps with FLAME. Let's say the app is running in 3 Fly regions, and each region has 2 "parent" servers with LiveViews and everything else.

In that case, what should the Flame pools look like? Do they communicate within the same region and share the pools? Or are Flame pools strictly children of each individual parent? Does it make a difference in pricing or anything else to run on hot workers instead of starting them up per parent?

What would you recommend the setup be in such a case?

Aside: I really liked the idea of Flame with Fly. It's a really neat implementation for a neat platform!

chrismccord
1 replies
1d1h

Or are Flame pools strictly children of each individual parent?

Confirm. Each parent node runs its own pool. There is no global coordination by design.

Does it make a difference in pricing or anything else to run on hot workers instead of starting up per parent?

A lot would depend on what you are doing, the size of runner machines you decide to start in your pools (which can be different sizes from the app or other pools), etc. In general Elixir scales well enough that you aren't going to be running your app in every possible region. You'll be in a handful of regions, servicing traffic in those regions and the load each region has. You could build your own global coordination on top, i.e. try to find processes already running on the cluster (which could be running in a FLAME runner), but you're in distributed systems land and it All Depends™ on what you're building and the tradeoffs you want.

timenova
0 replies
1d1h

Thanks for the reply!

Can I suggest adding some docs to Fly on running Flame apps? They could cover the more complex aspects of integrating with Fly, such as running Flame machines with a different size compared to the parent nodes, what kind of fly.toml config does and doesn't work with Flame (such as the auto_start and auto_stop configurations on the parent based on the number of requests), and anything else particularly important to remember with Fly.

solardev
2 replies
1d3h

Superficially, this sounds similar to how Google App Engine and Cloud Run already work (https://cloud.google.com/appengine/migration-center/run/comp...). Both are auto-scaling containers that can run a monolith inside.

Is that a fair comparison?

chrismccord
1 replies
1d3h

They handle scaling only at the highest level, similar to spinning up more dynos/workers/webservers like I talk about in the intro. FLAME is about elastically scaling individual hot operations of your app code. App Engine and such are about scaling at the level of your entire app/container. Splitting your operations into containers then breaks the monolith into microservice pieces and introduces all the downsides I talk about in the post. Also, while it's your code/language, you still need to interface with the mound of proprietary offerings to actually accomplish your needs.

dprotaso
0 replies
21h22m

Splitting your operations into containers then breaks the monolith into microservice pieces and introduces all the downsides

A pattern here is to not split the monolith and use the same container for your main app and hot operations. The hot operations just need some different configuration, e.g. container args or env vars.

isoprophlex
2 replies
1d3h

Whoa, great idea, explained nicely!

Elixir looks ridiculously powerful. How's the job market for Elixir -- could one expect to have a chance at making money writing Elixir?

ed
0 replies
1d3h

Yep! Elixir is ridiculously powerful. The best place to look for work is the Phoenix Discord, which has a pretty active job channel.

anonyfox
0 replies
1d2h

It's indeed very powerful and there are jobs out there. Besides being an excellent modern toolbox for lots of problems (scaling, performance, maintenance) and having arguably the best frontend tech in the industry (LiveView), the Phoenix framework is also the most loved web framework and Elixir itself the 2nd most loved language according to the Stack Overflow survey.

It's still a more exotic choice of tech stack, and IMO it's best suited for when you have fewer but more senior devs around; this is where it really shines. But I also found that a Phoenix codebase survived being "tortured" by a dozen juniors over years quite well.

I basically make my money solely with Elixir and have been for ~5 years now, interrupted only by gigs as a devops for the usual JS nightmares including serverless (where the cure has always been rewriting to Elixir/Phoenix at the end).

hq1
2 replies
1d2h

So how does it work if there are workers in flight and you redeploy the main application?

chrismccord
0 replies
1d1h

If you're talking about inflight work that is running on the runner, there is a Terminator process on the runner that will see the parent go away, then block on application shutdown for the configured `:shutdown_timeout` as long as active work is being done. So active processes/calls/casts are given a configurable amount of time to finish and no more work is accepted by the runner.

If you're talking about a FLAME.call at app shutdown that hasn't yet reached the runner, it will follow the same app shutdown flows of the rest of your code and eventually drop into the ether like any other code path you have. If you want durability you'd reach for your job queue (like Oban in Elixir) under the same considerations as regular app code. Make sense?

bo0tzz
0 replies
1d1h

The workers get terminated. If the work they were doing is important, it should be getting called from your job queue and so it should just get started up again.

hinkley
2 replies
1d2h

Also thanks to Fly infrastructure, we can guarantee the FLAME runners are started in the same region as the parent.

If customers think this is a feature and not a bug, then I have a very different understanding of what serverless/FaaS is meant to be used for. My division is pretty much only looking at edge networking scenarios. Can I redirect you to a CDN asset in Boston instead of going clear across the country to us-west-1? We would definitely NOT run Lambda out of us-west-1 for this work.

There are a number of common ways that people who don't understand concurrency think they can 'easily' or 'efficiently' solve a problem that provably do not work, and sometimes tragicomically so. This feels very similar and I worry that fly is Enabling people here.

Particularly in Elixir, where splitting off services is already partially handled for you.

quaunaut
1 replies
22h3m

If customers think this is a feature and not a bug, then I have a very different understanding of what serverless/FaaS is meant to be used for. My division is pretty much only looking at edge networking scenarios. Can I redirect you to a CDN asset in Boston instead of going clear across the country to us-west-1? We would definitely NOT run Lambda out of us-west-1 for this work.

I'm not sure how you're misunderstanding, but why would it go across the country when it's guaranteed to run in the same region as the parent? Just deploy the app in the region you want, and now its FLAME pools will be deployed in the same region.

If you want to switch the region it runs in, you can also easily just contact the other cluster to tell it to pick up the work.

hinkley
0 replies
19h56m

just

Every solution is easy when you oversimplify the problem.

None of what you said is true if you care about persistent state in the app. Local reads and distant writes are how you avoid speed of light problems.

bovermyer
2 replies
1d6h

As an alternative to Lambdas I can see this being useful.

However, the overhead concerns me. This would only make sense in a situation where the function in question takes long enough that the startup overhead doesn't matter or where the main application is running on hardware that can't handle the resource load of many instances of the function in question.

I'm still, I think, in the camp of "monoliths are best in most cases." It's nice to have this in the toolbox, though, for those edge cases.

freedomben
0 replies
1d5h

I don't think this goes against "monoliths are best in most cases" at all. In fact it supports that by letting you code like it's all one monolith, while behind the scenes it spins up instances as needed.

Resource-wise if you had a ton of unbounded concurrency then that would be a concern as you could quickly hit instance limits in the backend, but the pooling strategy discussed lower in the post addresses that pretty well, and gives you a good monitoring point as well.

cchance
0 replies
1d5h

He commented in another post that they use pooling, so you don't really pay the cold start penalty as often as you'd think, so maybe it's not an issue?

arianvanp
2 replies
1d2h

One thing I'm not following how this would work with IAM etc. The power of Lambda to me is that it's also easy to deal with authorization to a whole bunch of AWS services. If I fire off a flame to a worker in a pool and it depends on say accessing DynamoDB, how do I make sure that that unit of work has the right IAM role to do what it needs to do?

Similarly, how does authorization/authentication/encryption work between the host and the forked-off work? How is this all secured with minimal permissions?

xavriley
0 replies
1d2h

how does authorization work between the host and the forked-off work?

On fly.io you get a private network between machines so comms are already secure. For machines outside of fly.io it’s technically possible to connect them using something like Tailscale, but that isn’t the happy path.

how do I make sure that the unit of work has the right IAM

As shown in the demo, you can customise what gets loaded on boot - I can imagine that you’d use specific creds for services as part of that boot process based on the node’s role.
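
For example, something like this in `config/runtime.exs` (purely a sketch: the env var names are made up, and it assumes you inject narrower credentials into the runner's environment at boot):

    # config/runtime.exs -- illustrative only; hand the runner narrower
    # credentials than the parent via its boot-time environment
    if FLAME.Parent.get() do
      config :my_app, :aws,
        access_key_id: System.fetch_env!("RUNNER_AWS_ACCESS_KEY_ID"),
        secret_access_key: System.fetch_env!("RUNNER_AWS_SECRET_ACCESS_KEY")
    end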

moduspol
0 replies
22h0m

I'm interested in following this project, but I'm also quite skeptical of how much is being abstracted here.

I still feel like Kubernetes might be the most profitable thing to ever happen to AWS for the same reason--and not because of EKS (their hosted Kubernetes control plane).

agundy
2 replies
1d7h

Looks like a great integrated take on carving out serverless work. Curious to see how it handles the server parts of serverless like environment variables, db connection counts, etc.

One potential gotcha I'm curious if there is a good story for is whether it can guard against code that depends on other processes in the local supervision tree. I'm assuming, since it's talking about Ecto inserts, that it brings over and starts the whole app's supervision tree on the function executor, but that may or may not be desired for various reasons.

chrismccord
1 replies
1d7h

It starts your whole app, including the whole supervision tree, but you can turn on/off services based on whatever logic you want. I talk a bit about this in the screencast. For example, no need to start the phoenix endpoint (webserver) since we aren't serving web traffic. For the DB pool, you'd set a lower pool size or single connection in your runtime configuration based on the presence of FLAME parent or not.
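
In `config/runtime.exs` that can look something like this (a sketch; the numbers are illustrative):

    # config/runtime.exs (import Config is already at the top of the file)
    flame_parent = FLAME.Parent.get()

    config :my_app, MyApp.Repo,
      url: System.get_env("DATABASE_URL"),
      # single connection on FLAME runners, full pool on the parent app
      pool_size: if(flame_parent, do: 1, else: 10)

    config :my_app, MyAppWeb.Endpoint,
      # runners don't serve web traffic, so don't start the server
      server: is_nil(flame_parent)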

agundy
0 replies
1d6h

Oh cool! Thanks for the reply, haven't had time to watch the screencast yet. Looking forward to it.

OJFord
2 replies
1d7h

This is one reason I really don't like US headline casing as enforced by HN - it looks like Serverless, as in the capital-S company, serverless.com, is what's being rethought, not the small-s principle.

(Aside: I wish someone would rethink Serverless, heh.)

zxt_tzx
1 replies
11h12m

I think the casing is not enforced by HN but rather up to the poster?

(Aside: I wish someone would rethink Serverless, heh.)

Not sure if you've checked out https://sst.dev/ but I think they've done precisely that. For example, they have Live Lambda Development which makes local dev a real breeze by significantly shortening feedback loops (no need to push your code up to the cloud and wait for it to deploy)

OJFord
0 replies
10h6m

You can override it as the poster, but you have to edit it back to what you wanted after initially submitting to do so. (And I suppose I don't know if that's intentional or just the way it happens to be.) If you submit 'Foo bar' it will be made to be 'Foo Bar'.

siliconc0w
1 replies
20h43m

With autoscaling runtimes like Cloud Run isn't this the sorta default?

So: end-user -> app -> expensive_operation -> increase number of instances

rather than:

end-user -> app -> flame -> app_pool -> expensive operation -> scale pool

I guess this isn't 'specific parts' of my code, but practically aren't you using the same app image in the pool? You'd have to have ffmpeg available, for example. I'm not sure I see the difference.

josevalim
0 replies
18h58m

There is a similar discussion here: https://news.ycombinator.com/item?id=38544486

TL;DR: you get granular and programmatic scaling, including the ability to scale using specific resources (for example, machines with GPUs for certain workflows).
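
For example, you can define a separate pool per workload and size each one independently, then pick the pool at the call site. A sketch (names and sizes are illustrative; machine-specific details such as GPU instance types would live in each pool's backend configuration rather than here):

    # One pool per workload; each scales on its own
    children = [
      {FLAME.Pool, name: MyApp.ThumbnailRunner, min: 0, max: 20, max_concurrency: 10},
      {FLAME.Pool, name: MyApp.MLRunner, min: 0, max: 2, max_concurrency: 1}
    ]

    # At the call site you choose which pool (and therefore which resources) runs the work
    def classify(frames) do
      FLAME.call(MyApp.MLRunner, fn -> MyApp.ML.run_inference(frames) end)
    end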

abdellah123
1 replies
1d8h

Wow, this is amazing. Great work.

One could really spin up a whole Hetzner/OVH server and create a KVM for the workload on the fly!!

MoOmer
0 replies
1d6h

WELL, considering the time delay in provisioning on Hetzner/OVH, maybe Equinix Metal would work better? But, if you're provisioning + maybe running some configuration, and speed is a concern, probably using Fly or Hetzner Cloud, etc. still makes sense.

8organicbits
1 replies
1d9h

With FLAME, your dev and test runners simply run on the local backend.

Serverless with a good local dev story. Nice!

victorbjorklund
0 replies
1d8h

Totally. One reason I don't like serverless is because the local dev exp is so much worse compared to running a monolith.

tardismechanic
0 replies
1d6h

FLAME - Fleeting Lambda Application for Modular Execution

Reminds me of 12-factor app (https://12factor.net/) especially "VI. Processes" and "IX. Disposability"

sergiomattei
0 replies
1d9h

This is incredible. Great work.

neoecos
0 replies
1d4h

Awesome work, let's see how long it takes to get the Kubernetes backend.

guluarte
0 replies
21h58m

Companies should pay $1MM every time they use a new acronym

ekojs
0 replies
1d6h

I don't know if I agree with the argument regarding durability vs elastic execution. If I can get both (with a nice API/DX) via something like Temporal (https://github.com/temporalio/temporal), what's the drawback here?

e12e
0 replies
21h42m

This is really cool - but I would love to simply run an SSI (single system image) cluster with scaling support - a successor to openMOSIX for a post-cloud reality.

https://en.m.wikipedia.org/wiki/OpenMosix

At least shipping processes should generalize to any FFI for most any run-time/language?

anonyfox
0 replies
1d3h

Amazing addition to Elixir for even more scalability options! Love it!

amatheus
0 replies
1d8h

Imagine if you could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of your app.

That's interesting, sounds like what fork does but for serverless. Great work
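
i.e. you wrap the expensive block roughly like this (a sketch going off the article's description; the function names are made up):

    # The wrapped block runs on a short-lived copy of the app (a FLAME runner);
    # the caller gets the result back as if it had run locally.
    def generate_thumbnails(video_path) do
      FLAME.call(MyApp.FFMpegRunner, fn ->
        do_generate_thumbnails(video_path)
      end)
    end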

aidos
0 replies
1d1h

I used a service years ago that did effectively this. PiCloud were sadly absorbed into Dropbox but before that they had exactly this model of fanning out tasks to workers transparently. They would effectively bundle your code and execute it on a worker.

There’s an example here. You’ll see it’s exactly the same model.

https://github.com/picloud/basic-examples/blob/master/exampl...

I've not worked with Elixir, but I used Erlang a couple of decades back and it appears BEAM hasn't changed much (fundamentally). My suspicion is that it's much better suited for this work since it's a core part of the design. Still, not a totally free lunch, because presumably there's a chance the primary process crashes while waiting?

RcouF1uZ4gsC
0 replies
1d4h

This seems a lot like the “Map” part of map-reduce.

MoBarouma
0 replies
22h47m

A shapeless version of something like this has been in my head for a very long time. I'm glad someone did the hard work of giving it a shape.

AlchemistCamp
0 replies
1d5h

This looks fantastic! At my last gig we had exactly the “nuts” FaaS setup described in the article for generating thumbnails and alternate versions of images and it was a source of unnecessary complexity.