Does the other basic stuff work? I am shocked at how our production usage of Fly has gone. Even basic things fail, such as support not being able to just... look up internal platform issues. Cryptic/non-existent error messages. I'm not impressed. It feels like it's compelling to those scared of or ignorant of Kubernetes. I thought I was over Kubernetes, but Fly makes me miss it.
I was hoping to migrate to Fly.io, and during my testing I found that simple deploys would drop connections for a few seconds during the deploy switchover. Try a `watch -n 2 curl <serviceipv4>` during a deploy to see for yourself (try any one of the strategies documented, including blue-green). I wonder how many people know this?
When I tested it I was hoping for, at worst, early termination of old connections with no dropped new connections, and at best I expected them to gracefully wait for old connections to finish. But nope, just a full downtime switchover every time. But then when you think about the network topology described in their blog posts, you realize there's no way it could've been done correctly to begin with.
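For anyone who wants numbers rather than eyeballing `watch` output, here is a minimal sketch of the same test in Python. It assumes a plain HTTP endpoint; the function names (`check_once`, `probe`, `longest_outage`) are made up for illustration and are not part of any Fly tooling.

```python
import http.client
import time
import urllib.error
import urllib.request


def check_once(url, timeout=2.0):
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts, resets and 4xx/5xx all count as failures.
        return False


def probe(url, attempts, interval=0.5, check=check_once, sleep=time.sleep):
    """Poll the URL `attempts` times and return a list of (index, ok) results."""
    results = []
    for i in range(attempts):
        results.append((i, check(url)))
        sleep(interval)
    return results


def longest_outage(results):
    """Length of the longest run of consecutive failed probes."""
    longest = current = 0
    for _, ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest
```

Start `probe(...)` against the service address just before kicking off a deploy, then look at `longest_outage(results)`. Multiplying that by the interval gives a rough lower bound on the gap in seconds, since the time spent in each request is not counted.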
It's very rare for me to comment negatively on a service, but the fact that this was the case, paired with the way support acted like we were crazy when we sent video evidence of it, definitely irked me, given the standards for an infrastructure company. Wouldn't recommend it outside of toy applications now.
I've written pretty large deployment systems for Kubernetes. This isn't it. There's a real space for Heroku-like deploys done properly and no one is really doing it well (or at least not without ridiculously thin or expensive compute resources).
Have you tried Google Cloud Run (based on Knative)? I've never used it in production, but on paper it seems to fit the bill.
Yeah we're mostly hosted there now. The CPU/virtualization feels slow but I haven't had time to confirm (we had to offload super small ffmpeg operations).
It's in a weird place between Heroku and Lambda. If your container has a bad startup time, like one of our Python services, autoscaling can't be used because latency becomes a pain. It's also common to deploy services on there that need things like health checks (unlike functions, which you assume are alive); with health checks every minute, that means at least 1 instance of sustained use as well. Their domain mapping service is also really, really bad and can take hours to issue a cert for a domain, so you have to be very careful about putting an LB in front of it for hostname migrations.
I don't care right now, but the fact that we're paying 5x in compute is starting to bother me a bit. An 8-core 16GB 'node' is ~$500/month ($100 on DO) assuming you don't scale to zero (which you probably won't). Plus I'm pretty sure the 8 cores reported aren't a meaty 8 cores.
But it's been pretty stable and nice to use otherwise!
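To make the "5x" concrete, here is the arithmetic on the two monthly prices quoted above, as a small sketch. The figures are just the approximate ones from this comment, not current list prices, and the variable names are illustrative.

```python
def cost_ratio(price_a, price_b):
    """How many times more expensive price_a is than price_b."""
    return price_a / price_b


# Approximate monthly prices for an 8-core, 16 GB instance, as quoted above.
cloud_run_monthly = 500
digitalocean_monthly = 100

ratio = cost_ratio(cloud_run_monthly, digitalocean_monthly)
yearly_premium = (cloud_run_monthly - digitalocean_monthly) * 12

print(f"{ratio:.0f}x the price, ${yearly_premium} extra per year per instance")
```

That works out to roughly $4,800 a year per always-on instance, assuming it never scales to zero.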
A 6c / 12t dedicated server with 32GB of RAM is $65 a month with OVH.
I do get that it is a bare server, but if you deploy even just bare containers to it, you would be saving a good bit of money and get better performance from it.
Another interpretation is the so-called dedicated servers are too good to be true.
It depends on what the 6 cores are. Like I have a 8C/8T dedicated server sitting in my closet that costs $65 per the number of times you buy it. (Usually once.) The cores are not as fast as the highest-end Epyc cores, however ;)
At the $65/month level for an OVH dedicated server, you get a 6-core CPU from 2018 and a 500Mbps public network limit. Doesn't even seem like that good a deal.
There is also a $63/month option that is significantly worse.
I have yet to have a positive experience with Cloud Run. I have one project on it, and Cloud Run is very unpredictable with autoscaling. Sometimes it starts spinning containers up and down without any apparent reason, and after chasing Google support for months, they said it is "expected behavior". Good luck trying to debug this independently, because you don't have access to the Knative logs.
Starting containers on Cloud Run is weirdly slow, and oh boy, how expensive that thing is. I'm getting the impression that pure VMs + Nomad would be a way better option.
What is this about? I assumed a highly throttled cpu or terrible disk performance. A python process that would start in 4 seconds locally could easily take 30 seconds there.
Last I checked, Cloud Run isn't actually running real Linux, it's emulating Linux syscalls.
As a long time Nomad fan (disclaimer: now I work at HashiCorp), I would certainly agree. You lose some on the maintenance side because there's stuff for you to deal with that Google could abstract for you, but the added flexibility is probably worth it.
I just use AWS EC2, a load balancer, and auto scaling groups. The user_data pulls and runs a Docker image. To deploy I do an instance refresh, which has no downtime. The obvious downside is more configuration than with more managed services.
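A rough sketch of that setup in Python, for illustration only. It assumes Docker is already installed on the AMI, that the Auto Scaling group's launch template carries the user data, and that `boto3` is available; the helper names, the image name, and the port mapping are hypothetical.

```python
def build_user_data(image, host_port=80, container_port=8080):
    """Return a shell script for EC2 user data that pulls and runs one image.

    Assumption: Docker is preinstalled on the AMI and the registry is reachable.
    """
    return "\n".join([
        "#!/bin/bash",
        "set -euo pipefail",
        f"docker pull {image}",
        f"docker run -d --restart=always -p {host_port}:{container_port} {image}",
        "",
    ])


def start_refresh(asg_name, min_healthy_percentage=90):
    """Start a rolling instance refresh on an Auto Scaling group."""
    import boto3  # third-party; imported lazily so build_user_data has no dependency

    client = boto3.client("autoscaling")
    return client.start_instance_refresh(
        AutoScalingGroupName=asg_name,
        Preferences={"MinHealthyPercentage": min_healthy_percentage},
    )
```

With a setup like this, a deploy would mean pushing a new image, updating the launch template if the tag changed, and calling `start_refresh("my-asg")` (a placeholder name). The refresh replaces instances in batches while keeping the configured share of the group in service.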
I have been using Google Cloud Run in production for a few years and have had a very good experience. It has the fastest autoscaler I have ever seen, except for FaaS, which is not a good option for client-facing web services.
Same experience here, using it for years in production for our critical api services without issues.
Yeah I had a similar experience where my builds were frozen for a couple of days, so I was not able to release any updates. When I emailed their support, I got an auto-response asking me to post in the forum. Pretty much all hosts are expected to offer a ticket system, even for their unmanaged services, if it's a problem on their side. I just moved all my stuff over to Render.com; it's more expensive, but it's been reliable so far.
The first (pinned) post in the fly.io forum explains it:
https://community.fly.io/t/fly-io-support-community-vs-email...
That forum post just says what OP said, that they will ignore all tickets from unmanaged customers. Which is a pretty shitty thing to do to your customers.
You need blackbox HTTP monitoring right now, don't ever wait for your customer to tell you that your service is down.
I use Prometheus (& Grafana), but you can also get a hosted service like Pingdom or whatever.
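If you want to see what blackbox monitoring boils down to before picking a tool, here is a minimal sketch of the core logic: probe from outside, and alert only after a few consecutive failures so one blip does not page anyone. This is not how Prometheus or Pingdom implement it; the `Monitor` class and `http_ok` helper are made up for illustration.

```python
import http.client
import urllib.error
import urllib.request
from dataclasses import dataclass


def http_ok(url, timeout=5.0):
    """Blackbox check: True if the URL responds successfully from the outside."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


@dataclass
class Monitor:
    threshold: int = 3      # consecutive failures needed before alerting
    failures: int = 0
    alerting: bool = False

    def record(self, ok):
        """Feed one probe result; return "alert", "recovered", or None."""
        if ok:
            self.failures = 0
            if self.alerting:
                self.alerting = False
                return "recovered"
            return None
        self.failures += 1
        if self.failures >= self.threshold and not self.alerting:
            self.alerting = True
            return "alert"
        return None
```

Run `monitor.record(http_ok(url))` on a timer from a machine outside the platform you are monitoring, and send yourself a notification whenever it returns something other than `None`.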
Been on it 7 months, 0 issues. Feel like you're alone on this potentially.
Alone? Every thread about Fly has complaints about reliability, and people complain about it on Twitter too.
Every thread on the Internet about any product or service has complaints.
Not to this extent, it has always stood out to me in particular
Actually, here is a good example: Cloudflare. Sure, people complain a ton about privacy, but I haven't seen a single complaint about the reliability of Cloudflare Workers or similar products in the dozens of threads I've seen on HN.
That hasn’t been my experience with Fly, but I’m sorry to hear it seems to be for others :(
It's hard to tell how meaningful the reviews are. I have used AWS, GCP, DigitalOcean, and Linode throughout my career. Every single one of these, through no fault of mine or my team's, messed up and caused downtime. Like, you can get most SRE types in a room to laugh if you blurt out "us-east-1", because it's known to be so unreliable. And yet, it's where every Fortune 500 puts every service; we laugh about the reliability and it's literally powering the economy just fine.
So yes, a lot of people on HN complain about fly's reliability. fly posts to HN a lot and gives them the opportunity. Is it actually meaningful compared to the alternatives? It's very hard to tell.
To be fair, most hosting providers come with plenty of public complaints about downtime. The big ones do way better: the best one is AWS, then GC, and last Azure. They cost stupid money though.
DigitalOcean has been terrible for me; some regions just go down every month and I lose thousands of requests, increasing my churn rate.
Fly.io had tons of weird issues but it got better in the last months. It's still very incomplete in terms of functionality and figuring out how to deploy the first time is a massive pain.
My plan is to add Hetzner and load balance with bunnycdn across DO and H
ok possibly not alone, maybe the issues happened before I started using them extensively. I've had ~no downtime that affects me in 7 months.
I do wish they had some features I need, but their support and responses are top notch. And I've lost much less hair and time than I would going full-blown AWS or another cloud provider.
https://community.fly.io/t/reliability-its-not-great/11253
Not alone, I’ve been part of two teams who have evaluated fly and hit weird reliability or stability issues, deemed it not ready yet.
Can you email the first two letters of my username at fly.io with more details? I'd love to find out what you've been having trouble with so I can help make the situation better any way I can. Thanks!
Another support.flycombinator.com classic.
Why would you care about customer problems if they don’t embarrass you in public?
/s
the only thing easier than them responding in this thread is someone making this comment in this thread…
Don't worry, a random anime character is going to help you now that it's been brought to the top.
It seems to me that your comment is personally targeting OP and I think that is quite out of line.
Would you rather them be unresponsive?
It's HN -- if the company proved responsive it might invalidate his OP and everyone who bandwagons on it.
...as if it's one person who had issues! I thought it was just incompetency. But it now looks like a theatre, pretending now.
I've been a paying Fly.io customer for 3 years now, and for the past 18 months, I've had no real issue with any of my apps. In fact, I don't even monitor our Fly.io servers any more than I monitor S3 buckets; the kind of zero devops I expect from it is already a reality.
Issues specific to an application or one particular account have to be addressed as special cases (like any NewCloud platform, Fly.io has its own idiosyncrasies). The first step anyway is figuring out just what you're dealing with (special v common failure).
I have had the Fly.io CEO do customer service. Some may call it theatre, but this isn't uncommon for smaller upstarts, and indicative of their commitment, if anything.
Yep they have terrible reliability and support. Couldn’t deploy for 2 days once and they actually told me to use another company. Unmanaged dbs masquerading as managed. Random downtime. I could go on but it’s not a production ready service and I moved off of it months ago.
Are you talking about fly postgres? Because I use it and feel they've been pretty clear that it's unmanaged.
Seriously! That's crazy. I need to set up Terraform and move to AWS before launching, I guess.
huh? it does what it says on the tin. nothing crazy about it.
They spell out for you in detail what they offer: https://fly.io/docs/postgres/getting-started/what-you-should...
And suggest external providers if you need managed postgres: https://fly.io/docs/postgres/getting-started/what-you-should...
I was shocked because I didn't realise it wasn't managed. Even DigitalOcean offers managed Postgres.
Personally, I think that if you are offering a service like Fly, the database should be managed; the whole point of Fly.io is to provide abstractions that make production simpler.
Do you think the type of user who is using fly.io is interested in or capable of managing their own Postgres database? I'd rather just trust RDS or another provider.
Honestly.. kinda, yeah
At least I'm projecting my weird "I want to love you for some reason, Fly" plus my skillset onto anyone else that wants to love Fly too haha
They feel very developer/nerd/HN/tinkerer targeted
The header at the top of their Getting Started is "This Is Not Managed Postgres" [1]
and they have a managed offering [2] in private beta now...
[1] https://fly.io/docs/postgres/getting-started/what-you-should...
[2] https://fly.io/docs/reference/supabase/
Unfortunately this is a pretty common story. Half the people I know who adopted Fly migrated off it.
I was very excited about Fly originally, and built an entire orchestrator on top of Fly machines—until they had a multi-day outage where it took days to even get a response.
Kubernetes can be complex, but at least that complexity is (a) controllable and (b) fairly well-trodden.
Fly.io is not comparable to Kubernetes. It’s a bit like comparing AWS to Terraform.
Or to clarify your comment, Kubernetes on which cloud? Amazon? Google? Linode?
Kubernetes on AWS, GCP, and Linode are all controllable and well-trodden.
I definitely understand the comparison between Kubernetes and Fly. You have a couple of apps that are totally unrelated, managed by separate teams, and you want to figure out how you can avoid the two teams duplicating effort. One option is to use something like fly.io, where you get a command line you run to build your project and push the binary to a server. Another option is to self-host infrastructure like Kubernetes, and eventually get that down to one command to build and push (or have your CI system do it).
The end results that organizations are aiming for are similar: developers write the code and then the code runs in production. Frankly, a lot of toil and human effort is spent on this task, and everyone is aiming to make it take less effort. fly.io is an approach. Kubernetes is an approach. Terraform on AWS is an approach.
I switched to Kamal and Hetzner. It's the sweet spot.
I find it amazing how much bad vibes fly.io gets here.
It looks worse than AWS or Azure to me.
Never used the service, but based on what I hear, I'll never try...
I have run several services on Fly for almost a year now, have not had any issues.