Bluesky migrates to single-tenant SQLite

scorpion_farmer
37 replies
15h9m

That looks like the PR from hell - 190 files changed, 143 commits? Mostly with names like "tidy" and "wip"

Props to whoever actually reviewed that, you are a warrior

lrx
31 replies
15h6m

I prefer to read the unified diff, so the commits don't matter as much.

readline_prompt
17 replies
14h38m

Don't know why, but recent teams around me have always made strict rules about the number of commits in PRs. I just wanted to tell them the same thing you said: "Why don't you just look at the diffs?" Curious for other opinions. (Sorry, not really about this particular topic.)

lolinder
8 replies
14h25m

I prefer to have clear commits that tell a tidy story. For example:

* Refactor function `foo` to accept a second parameter

* Add function `bar`

* Use `bar` and `foo` in component `Baz` to implement feature #X

If you give me a commit history like this, I can easily validate that each step in your claimed process does what you describe.

If you instead give me a messy history and ask me to read the diff, you might know that the change to file `Something.ts` on line 125 was conceptually part of the refactor to `foo`, but I'll have to piece that together myself. It's not obvious to the person who didn't write the code what the purpose of any given change was supposed to be.

This isn't a huge deal if your team's process is such that each step above is a PR on its own, but if your PRs are at the coarseness of a full feature, it's helpful to break down the sub-steps into smaller (but sane and readable) diffs.

krainboltgreene
2 replies
13h41m

Funny that two of your commits don't actually tell us why they exist: one simply describes the diff (which you should never need, lol?) and the other proxies that responsibility to some other system.

You could have simply randomized the text in each commit, put the ticket id and the one "why" in the merge commit body, and ended up with the same amount of real information.

lolinder
0 replies
10h25m

The first line of the commit message isn't about including information that couldn't be gleaned from the commit. That can be done in subsequent lines. The first line is for two purposes:

* Priming the reader so they are able to quickly interpret what they're seeing when they open the commit.

* Making it easy to search or scan for a specific change.

The last commit message in my example would probably have included the name of the feature as well as the ticket number, but I couldn't be bothered to invent an actual feature name.

DRY doesn't really apply to technical writing, at least not as extremely as you seem to think it should. Headings are supposed to summarize the contents, and that's what commit messages are: headings.

fijiaarone
0 replies
6h43m

I like to leave comments like this too:

loop i up to n times

break when false

check value returned is not null

88913527
2 replies
14h20m

This is reasonable, but the problem I encounter is how stifling it seems to ask others to structure their work so specifically. By way of comparison, getting compliance on conventional commit messages is a challenge, and that's an appreciably smaller ask than this.

lolinder
1 replies
14h18m

Oh, for sure. This is how I structure my own PRs, but I've certainly never bothered to ask a coworker to do so, I just appreciate it when I see it.

That said, OP is in an environment where it sounds like this kind of structure is already the cultural norm.

eitland
0 replies
10h6m

From another one who tries to do the same (but doesn't enforce it):

Thanks!

dilyevsky
1 replies
12h34m

In the context of a GitHub PR you can't leave reviews on commits other than what's currently the tip commit of the PR branch, so structuring this way is just wasted effort.

What you should be doing is breaking down PRs more finely, so that your unrelated refactors are all separate single-commit PRs. That ofc requires that your PR review round-trip time is fast.

lolinder
0 replies
10h29m

I'm pretty sure I've left comments on a commit before in a GitHub PR. The comment just goes in the right place in the PR diff, assuming no changes, or comments can actually be attached to commits themselves (which is what happens when a comment becomes stale—it retains a reference to the original commit).

gonzo41
3 replies
14h31m

Commit and push often. Put a novel explaining yourself in the PR. And that's enough IMO.

caskstrength
2 replies
6h49m

Commit and push often. Put a novel explaining yourself in the PR. And that's enough IMO.

Someone reading the git changelog 5 years down the line most likely wouldn't be able to find your "novel" in the PR, and definitely won't appreciate it if instead of a "novel" you ended up having a "short call" with the assigned reviewer in which you explained what you actually did in your 50 "wip" commits.

moron4hire
1 replies
5h28m

Someone reading 5 year old git logs is lost to begin with.

caskstrength
0 replies
5h15m

When debugging I routinely explore git blame and read the changelog. This sometimes leads to code that is 3, 5, or even 10 years old. Doesn't mean I'm lost.

ukzwc0
1 replies
14h25m

A good practice is to rebase your commits into a single commit before creating a PR. You are free to commit as many times as you want while doing your work. This minimizes the noise in the log.

Hamuko
0 replies
4h49m

It's only a good practice if the PR is a single logical change.

recursive
0 replies
14h14m

Easy workaround. Start with feature branch f.

1. Branch f-prime from master.

2. Squash merge f to f-prime.

3. Pull request f-prime to master.

4. Profit.

FPGAhacker
0 replies
14h29m

Squash is our git given right.

_heimdall
8 replies
14h36m

Commit your code and commit it often. There's no reason not to.

nkozyra
2 replies
14h19m

Sure, but then there's nothing wrong with rebasing it and making a nicer story for other people that want to review it.

Diffs are great but sometimes they're just as overwhelming in a huge PR. It's nice to first follow 5-10 commits in chunks of logical change.

dilyevsky
0 replies
12h33m

Don't send huge PRs. They are hard/impossible to review anyway, with a good commit history or not.

arein3
0 replies
8h16m

I don't know why people are obsessed with squash merging. I always rebase (when needed) to preserve commit history. It's a good practice, and it makes it easier to spot errors after fixing conflicts.

I suspect squashers use the wrong tools. Use Sourcetree or, if you are on Linux, SmartGit. You can see a detailed log, which makes it much easier.

JammyDodge
1 replies
7h50m

Sure, commit often while you're working.

But then when you're done, turn it into a series of patches for a reviewer to read. In the words of Greg Kroah-Hartman, "pretend I'm your math teacher and show your working".

In a maths assignment, you spend ages making a big mess on a scrap of paper. Then when you've got the solution, you list the steps nice and clearly for the teacher as if you got it right first time. In software development, if you're not a dick, you do the same. You make a big old mess with loads of commits, then when you're done and it's review time, you turn it into a series of tidy commits that are easy for someone to review one-by-one.

caskstrength
0 replies
6h41m

Why on Earth did people flag this? Indeed, you won't have a good time sending a series of 50 "wip" commits to any kernel mailing list. Having a good split with proper commit messages and a cover letter will make your code much easier to understand both for current reviewers and for any future "code archeologist" who has to fix a bug in that code 10 years down the line.

Am I living in a bubble, and do all the glorified 500k TC FAANG devs from HN really routinely submit changes consisting of a tangled mess of 50 "wip" commits for code review without any repercussions?

lolinder
0 replies
14h30m

Commit and commit often, but then clean up the history into discrete, readable chunks.

If your PRs are tiny it's not a big deal, but with 190 files changed in this one, it absolutely should have been rebased into a more reasonable commit history.

h0l0cube
0 replies
14h28m

Also continuously integrate (from trunk) if you want to hit that moving target sooner.

aantix
0 replies
14h24m

Unless you’d like to maintain your train of thought.

I don’t want to interrupt my flow with intermediary commits.

MenhirMike
1 replies
14h25m

Same. Do whatever you want in your feature branch, what matters is the Files list and the description in the PR. The whole thing gets squashed into a single commit anyway (which also makes reverting much easier).

eitland
0 replies
10h9m

Reverts are also easy even if one merges the whole branch. Just revert the merge commit.

I almost never look at them, but once in a while it is really great to see the thought process that led to something.

throwaway892238
0 replies
14h10m

I don't think any method is gonna make it easy to grok 3,336 added lines and 5,421 removed

tehlike
0 replies
14h41m

This is the answer.

4death4
1 replies
14h41m

What if it was 190 files changed in 1 commit, would that make a difference?

rgoulter
0 replies
13h57m

It might.

With commits like "typo", you might as well squash these into the commit which introduced the typo in the changeset.

If there are changes across many files, and the changes were made automatically with some search-and-replace (or some refactoring tool), then by having a commit that's only that automatic change, it's easy to look at that commit and tell what the changes were. Presumably, non-automatic changes are going to be smaller.

I guess roughly, if it makes sense to apply a changeset that changes 5 things, you'd want 5 commits. Having commits like "typo" means there are more commits; but squashing those 5 things together makes it harder to discern the granular change.

scientaster2
0 replies
14h25m

lgtm

numbsafari
0 replies
15h6m

Props to whoever actually reviewed that, you are a warrior

Or a ghost.

jakebsky
0 replies
14h41m

Those two work very closely together, so probably not as nightmarish as it may appear to an observer. But, the two of them are most certainly warriors.

PrivateButts
29 replies
12h57m

I've got a bunch of invites if folks want them:

bsky-social-etdu7-njigu

bsky-social-2ktcs-uwoxg

bsky-social-6f5nh-36gnq

bsky-social-ciwro-3gzk5

bsky-social-y4h57-dxh3g

thegeekpirate
24 replies
12h53m

Grabbed bsky-social-6f5nh-36gnq, thanks!

pests
23 replies
12h16m

Damn, they all gone.

ochronus
17 replies
11h55m

There you go folks:

bsky-social-h3d4w-u6yn4

bsky-social-74bqi-vkmcq

bsky-social-n3fdq-46nxz

bsky-social-yippe-32vdr

bsky-social-l2fbt-xnscx

shmde
8 replies
11h42m

Gone in 60 seconds.

arthurcolle
7 replies
11h17m

Any more?

tymofiy
6 replies
11h6m

bsky-social-lbjkg-gcxs4

bsky-social-zigwm-f3qpq

bsky-social-2jlu7-apy5a

bsky-social-6ct52-4egmz

bsky-social-cy64m-53sqn

hoseja
4 replies
9h57m

Maybe if you stop posting them with the easily-greppable first part they won't be so easy to scrape.

numpad0
2 replies
8h20m

No one seems to be taking the two codes I'm putting up without the prefix for ~an hour, so this is likely the case

e: second one now used, first still up

e: both used

greyface-
1 replies
7h49m

I wonder what they're being used for. The UI doesn't expose it, but the Bluesky API will tell you who redeemed your invites. Open the site, watch for a "com.atproto.server.getAccountInviteCodes" request in your browser's network inspector, look in the "usedBy" field in the response JSON, and append the DID value there onto "https://bsky.app/profile/". Any commenters in the parent chain who got scraped want to take a look?

numpad0
0 replies
7h37m

I get "$username joined using your invite code!" in notification tab that leads to the user profile. So far the user hasn't done anything.

pests
0 replies
8h46m

grep "bsky-" internet.txt

da25
0 replies
10h37m

Ya'll got anymore of that.

arthurcolle
4 replies
11h17m

Any more?

CtrlAlt
3 replies
11h5m

Some more for y'all!

bsky-social-ge2mz-mfmpi

bsky-social-hykwa-x3ox4

bsky-social-gh4mt-2od6p

bsky-social-dejzy-mmcxf

edit: all gone :(

pixelat3d
2 replies
10h54m

Snagged bsky-social-hykwa-x3ox4

ochronus
1 replies
6h43m

4 more: (prepend bsky-social-)

7poji-p36pm

irn4h-ncvic

2hb2e-xhxnb

2k4na-5qiqu

ochronus
0 replies
6h33m

And they are gone.

chocolatkey
2 replies
10h55m

Either people were really prepared for these codes to appear, or they are being scraped. Regardless, they're all gone

numpad0
1 replies
10h2m

It seems a temporary, anonymous, private, receive-only dropbox (not the USB drive replacement kind) on the Internet is an unsolved problem. It doesn't have to be completely out-of-band like email; it could be just an encrypted public reply decoded by `cat | base64 -d | openssl rsautl -decrypt -inkey temp.key`, so long as up to a few bits (70 in this instance) of encrypted content would be allowed on a platform.

ulucs
0 replies
9h49m

piracy websites were using base64 encoding for this purpose a while ago, but now it seems they moved on to a proprietary algorithm

numpad0
4 replies
11h13m

e: burner email didn't work, sorry

e4: check out dns on [my username].com

WinstonSmith84
2 replies
7h28m

can't believe it's gone, too many smart people on this website lol

numpad0
1 replies
3h8m

Not called hn for nothing! Glad I'm absolute bottomest on the floor in terms of intelligence or ability here

pests
0 replies
1h10m

Nope that might be me. I started this comment thread and I didn't even manage to snag one despite getting direct replies multiple times with codes.

ldeso
0 replies
7h22m

Thanks a lot!

skybrian
0 replies
12h12m

These seem to be gone.

scudsworth
0 replies
4h14m

to the people doing this: your codes will most likely be instantly stolen by bots and not real people

okanesen
0 replies
10h37m

The codes are all gone. That was fast.

E: Happy to take one, if somebody happens to have a spare one left. Email is in my bio.

da25
0 replies
10h41m

they're all exhausted now :(

xrd
25 replies
15h46m

Is Bluesky still invite only?

nvy
18 replies
15h43m

Yes. I have invite codes if you would like one. Email in my profile.

Edit: they're all gone!

JosephK
11 replies
14h26m

Is your invite code offer available to other randoms like myself? I tried to register on bsky months ago and still haven't been approved.

defatigable
9 replies
14h12m

I have some extra if you'd like one. Let me know how to get it to you and I will.

duvara
4 replies
13h10m

Would also like one if you have an extra. Thx in advance. (Click on username to see my email in profile)

Pi9h
2 replies
12h18m

Sent you a code.

leviathan
0 replies
5h10m

I’d also like an invite if anyone still reading has any.

da25
0 replies
6h18m

I'd love to have one if you have any left. (email in my bio)

duvara
0 replies
11h52m

Got an invite from another user. Thx!!!

azemetre
2 replies
14h7m

If you're offering I'd love one. My email is my username on hackernews at gmail.com

Pi9h
1 replies
13h43m

Sent you an invite code.

AnonC
0 replies
11h17m

Could you send me one? Email in my profile. Thanks.

pests
0 replies
10h14m

I'm still trying to get one if anyone sees this. Keep missing the ones posted. Email in profile.

Thanks.

nvy
0 replies
13h52m

I'm happy to give them to any HN user but I'm afraid I have only three left and there are three emails in my inbox asking for invites, so if one is you then congratulations! Otherwise, sorry.

lucb1e
1 replies
15h31m

Was browsing around your website (mentioned in profile), noticed https://0x85.org/contact.html only mentions Twitter and email. Maybe the bluesky omission is intentional, but probably it just hasn't been updated yet? I'm not on bsky myself, currently having fun on mastodon and I'm not familiar with bsky enough to know what I'm missing out on, but for other folks I figured I'd mention it

nvy
0 replies
15h17m

Hey thanks. It's just outdated, what with young kids and grad school. Appreciate the note.

xarope
0 replies
11h16m

if anyone still has any codes, please DM me one, email in my profile. Thanks muchly

panja
0 replies
12h22m

I'll also add my 4 invite codes if anyone wants them

EDIT: I'm fresh out for now, sorry!

kin3tik
0 replies
11h22m

I'd also love a code if anyone has any to spare (email in bio)

eitland
0 replies
10h13m

If anyone else still has invites I am also interested.

My mail is at the bottom of my bio.

Hamuko
3 replies
4h51m

I still have a single invite left.

xrd
2 replies
3h45m

I'll take it if you still have it? chris@extrastatic.com.

Hamuko
1 replies
3h31m

Sent.

xrd
0 replies
3h24m

Thanks, and really enjoyed your blog while searching for your email! :)

jakebsky
1 replies
15h41m

It is, but not as a "growth hack" or anything. It's just a way of limiting growth while the system is scaled (in terms of the backend and abuse prevention).

There's a dedicated waitlist for developers that will get you access quite quickly: https://atproto.com/blog/call-for-developers

danShumway
0 replies
1h46m

It's pretty hard not to see it as a growth hack given that posts can't even be viewed without an account. That seems pretty transparently to be a system to create a feeling of FOMO/exclusivity, to make it so that you don't only need an account to participate, you need an account to even see what the network is or to follow anyone on it at all.

As a comparison, Cohost limited account setup when it launched as a way to limit growth. But it didn't lock viewing the entire site behind an account requirement because... come on. What does that have to do with scaling, we all know why that restriction is there :)

To be fair, it seems to be working. Needing to seek out and find invite codes means that signups are more visible -- signup codes get shared over social media and that means mentioning Bluesky publicly and keeping it in people's minds. It also forces people to ask publicly about access, which makes the network feel more exclusive and turns every signup or expression of interest into an advertisement for the network. It's a good marketing strategy, and I suspect that a nontrivial portion of Bluesky's current buzz comes from that marketing strategy, so I can understand why it hasn't been abandoned yet. I mean, look at the current thread; if people didn't need to coordinate publicly on HN to get access then this subthread wouldn't exist and then there wouldn't be a public thread where a bunch of people express interest in trying out the network -- and that publicly expressed interest in this very subthread makes Bluesky feel more in-demand.

In fact, this is such an effective marketing strategy that I've seen Bluesky users complain that invite codes are too common now and that their invite codes aren't in as much demand as they used to be. That FOMO loop is so powerful that it's even affecting the people who already have access to the network who enjoyed the feeling of being in control of an artificially scarce resource.

But sure, all of this is definitely not a growth hack, I believe you ;)

Regardless of whether it's good marketing, the account requirements make the platform a lot less relevant in any serious discussions about the direction of social media, because despite its plans for the future for federation and access, what Bluesky is today is a platform that is in practice even more locked down than Twitter is.

paxys
19 replies
13h36m

Cool, but maybe let people actually use your service before everyone forgets what it is?

nyx2d
18 replies
12h15m

They have over 1.8 million users currently, or do you mean PDSes specifically? Federation is in open beta on a test network, you can try it out today if you'd like.

Tomte
11 replies
10h32m

I have been on the wait list since they launched. They seem to mostly rely on invites.

felixthehat
4 replies
9h37m

  bsky-social-scbch-eolha
  bsky-social-fs26y-d6gnv
  bsky-social-2lx5u-ntrdv
  bsky-social-hboq7-dyuue
  bsky-social-b2v3f-3a23q

WinstonSmith84
2 replies
7h45m

damn, seems already all gone

garblegarble
1 replies
7h30m

    bsky-social-lkzsp-7x7ja
    bsky-social-p4vwr-nrthu
    bsky-social-bdu6c-6tbv4
    bsky-social-fkpgk-oestw

WinstonSmith84
0 replies
7h23m

Got one, thank you sir!

brylie
0 replies
9h30m

Thank you so very much! :-)

pnathan
1 replies
7h18m

I have a few invites. Email me and I'll pass them out. :)

pnathan
0 replies
59m

all gone

FlyingSnake
1 replies
9h43m

I’ve some invites lying around. DM me if you want one.

bugsmith
0 replies
8h14m

I'd like to take you up on that, if you still have one going?

soperj
0 replies
10h23m

It shouldn't be too difficult to find an invite? They hand them out pretty frequently.

BiteCode_dev
0 replies
7h37m

Got a few invites, DM me on twitter, substack or masto if you want them (listing on https://bitecode.dev)

raverbashing
4 replies
12h8m

Come on, "over 1.8 million users" is not an impressive number

These kind of movements makes me think they're not serious about scaling up. Wouldn't surprise me if then end up as an also-ran

pmontra
3 replies
10h20m

Maybe not impressive, but none of my customers' services has ever had 1.8 million users. And yet they (my customers) do well.

rapnie
2 replies
9h9m

That. And none of the big social media platforms were big at the start either.

raverbashing
1 replies
8h53m

Yes, but 1.8M at a time when people are longing for a Twitter alternative is just leaving money on the table.

pmontra
0 replies
6h56m

There are two usual strategies for growing: 1) low cost, organic and slow or 2) high cost, throw a lot of money at advertisement, saturate all media, grow quickly or bust.

The exceptions are those rare products that, despite low-cost marketing, sell themselves so well that their organic growth is fast and in a few months everybody uses them.

Maybe Bluesky doesn't have the money to advertise or is not compelling enough. As one data point: I know about Mastodon, but I think I learned about Bluesky only today. I went to their site and there is nothing to explain how it works except that it's some social thing. I learned more by reading the comments here. Apparently it's being marketed at a very low cost.

troupo
0 replies
9h41m

They have over 1.8 million users currently

How many of them active?

winrid
12 replies
13h34m

Why sha256 hash the user ID to get a two-character target directory? Wouldn't md5 be much faster and solve the same problem?
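
(For context on the scheme being asked about, here is a rough sketch of hash-prefix bucketing in TypeScript using Node's crypto module. The base directory, file naming, and the idea of keying on a user identifier are assumptions for illustration, not the actual PDS code.)

    // Illustrative only: pick a two-character bucket directory for a
    // per-user SQLite file from a SHA-256 hash of the user's identifier.
    import { createHash } from "node:crypto";
    import { join } from "node:path";

    function dbPathFor(userId: string, baseDir: string): string {
      const bucket = createHash("sha256").update(userId).digest("hex").slice(0, 2);
      // 256 possible buckets, so no single directory accumulates millions of files
      return join(baseDir, bucket, `${userId}.sqlite`);
    }

    // prints something like /data/pds/<two hex chars>/did_plc_example123.sqlite
    console.log(dbPathFor("did_plc_example123", "/data/pds"));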

paulryanrogers
3 replies
13h22m

At their scale maybe they're worried about collisions?

Or, like me, they're drowning in security tooling from corporate and don't want to have to carve out exceptions for md5 usage in each.

RaisingSpear
1 replies
8h31m

At their scale maybe they're worried about collisions?

With their scheme, collisions are already guaranteed to happen if they have >256 users.

fodkodrasz
0 replies
3h28m

I guess parent meant an abusable non-uniform distribution of collisions (they have collisions anyway, as they take only the first two characters, according to the GP comment)

winrid
0 replies
3h35m

It could be they didn't want to explain the md5 usage, yeah. But that's kinda nuts if they do this every query.

wmf
2 replies
13h19m

It's probably not healthy to have broken cryptographic hashes running around. If you don't need a secure hash there are plenty of fast non-cryptographic hashes.

winrid
1 replies
3h39m

There's nothing about security here. By this logic you should probably stop using hashmaps, then? :)

sophacles
0 replies
2h25m

That's literally not their logic.

They said:

if you need security don't use md5.

If you don't need security, use something faster than md5.

md5 is neither secure nor fast, why use it at all?

alex_suzuki
2 replies
11h38m

This is probably not about collisions but about filesystem limitations (max number of files in a directory).

muttled
0 replies
1h6m

I've done something similar and that's absolutely what it was. I'm no pro, knew I wasn't doing it the right way, but it was for a personal side project and Windows starts to get weird when you have a million files in a single directory.

fodkodrasz
0 replies
3h27m

having a good hash uniformly distribute content helps scaling (by sharding of data)

moreati
1 replies
9h29m

At a guess: that hash is performed relatively few times, so any performance difference is lost in the noise floor. Never having to answer "why did you use this insecure hash" or eliminating/minimising any possibility of a class of security problem is worth more.

winrid
0 replies
3h37m

This has nothing to do with security. It's just wasted CPU. I imagine you have to do this every time you make a query, to look up the user's DB?

Security is not a concern here. It's just literally bucketing ids. Also, this is not needed with modern file systems.

lucasyvas
11 replies
15h25m

Love SQLite - in general there are many challenges with a schema or database per tenant setup of any kind though. Consider the luxury of row-level security in a shared instance where your migration either works or rolls back. Not now! If you are doing a data migration and failed to account for some unexpected data, now you have people on different schema versions until you figure it out. Now, yes, if you are at sharding scale this may occur anyway, but consider that before you hit that point, a single database is easiest.

You will possibly want to combine the data for some reason in the future as well. Or, move ownership of resources atomically.

I'm not opposed to this setup at all and it does have its place. But we are running away from schema-per-tenant setup at warp speed at work. There are so many issues if you don't invest in it properly and I don't think many are prepared when they initially have the idea.

The funny thing is that about a decade ago, the app was born on a SQLite per tenant setup, then it moved to schema per tenant on Postgres, now it's finally moving to a single schema with RLS. So, the exact opposite progression.

viraptor
4 replies
9h32m

If you are doing a data migration and failed to account for some unexpected data, now you have people on different schema versions until you figure it out.

That shouldn't be a big issue. Any service large/complex enough to care does the schema upgrades in phases, so it's 1. Make code future compatible. 2. Migrate data. 3. Remove old schema support.

So typically it should be safe to run between steps 1 and 2 for a long time. (Modulo new bugs of course) As an ops-y person I'm comfortable with the system running mid-migration as long as the steps as described are used.

gigatexal
3 replies
7h21m

That shouldn't be a big issue. Any service large/complex enough to care does the schema upgrades in phases, so it's 1. Make code future compatible. 2. Migrate data. 3. Remove old schema support.

Exactly this, schema migrations should be an append, deprecate, drop operation over time.

rockwotj
2 replies
5h28m

I wish there were ways to enforce this on the db so you never accidentally grabbed a table lock during these operations.

definitely have shot myself in the foot with postgres on this

ddorian43
1 replies
4h31m

I wish there were ways to enforce this on the db so you never accidentally grabbed a table lock during these operations.

You can use a linter for PostgreSQL migrations https://squawkhq.com/

gigatexal
0 replies
1h50m

And Sqitch is a wonderful Perl tool for this as well.

starcraft2wol
1 replies
15h19m

now you have people on different schema versions until you figure it out.

That can be a good thing if your product has, say, < 100 customers. As each might have different upgrade timelines and needs. I even know of businesses like this who do custom work for some customers, so they essentially aren't even running the same code (gasp).

I guess it depends on the business structure.

lucasyvas
0 replies
15h14m

Totally correct. But not a good thing in our case!

ngrilly
0 replies
7h57m

The funny thing is that about a decade ago, the app was born on a SQLite per tenant setup, then it moved to schema per tenant on Postgres, now it's finally moving to a single schema with RLS.

To be fair, RLS was not available yet a decade ago :) It appeared in PostgreSQL 9.5 in 2016.

gigatexal
0 replies
7h22m

If you are doing a data migration and failed to account for some unexpected data, now you have people on different schema versions until you figure it out. Now, yes, if you are at sharding scale this may occur anyway, but consider that before you hit that point, a single database is easiest.

This can be accounted for and handled. Though if schema issues are enough of a scare, I wonder if a document-DB-style embeddable database like CouchDB/PouchDB might make more sense.

dilyevsky
0 replies
13h18m

I don't know - I have experience working with monster DBs in production, and never again. Under large enough load every change becomes risky because you can't test performance corner cases fully. Having a free-tier user take out your prod because they found a non-indexed code path is also a classic.

malkia
9 replies
15h40m

What do they mean by "Since SQLite does not support concurrent transactions" - it supports them, as long as you don't access the .db file through a file share (UNC, or NFS, etc) - https://www.sqlite.org/wal.html

I've been using this to update/read db from multiple threads/processes on the same machine. You can also do snapshotting with the sqlite backup API, if you want consistent view, and to not hold on transaction (or copy in-memory).

But maybe I'm missing something here... Also haven't touched sqlite in years, so not sure...

sinkwool
2 replies
15h22m

Writers merely append new content to the end of the WAL file. Because writers do nothing that would interfere with the actions of readers, writers and readers can run at the same time. However, since there is only one WAL file, there can only be one writer at a time.

I think the OP meant that updates have to run sequentially.

starcraft2wol
0 replies
15h16m

Which just means the lock happens at user scope in this case instead of per table or row. This limitation still causes so much confusion when it’s a completely reasonable design.

mikrotikker
0 replies
10h51m

I've been importing data into SQLite databases that are being actively written to for years. It just throws an exception if the database is locked and I retry. Do 10k row batches, with a small sleep between. No issues. Helps if your use case doesn't really care about data being in order, I guess.
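
(A rough sketch of that batch-plus-retry pattern, assuming better-sqlite3; the table, batch size, and backoff are made up for illustration.)

    // Retry a batched insert when another writer holds the lock (SQLITE_BUSY).
    import Database from "better-sqlite3";

    const db = new Database("tenant.sqlite");
    db.pragma("journal_mode = WAL"); // readers no longer block the single writer
    db.exec("CREATE TABLE IF NOT EXISTS rows (data TEXT)");

    const insert = db.prepare("INSERT INTO rows (data) VALUES (?)");
    const insertBatch = db.transaction((batch: string[]) => {
      for (const row of batch) insert.run(row);
    });

    async function importWithRetry(batch: string[], retries = 5): Promise<void> {
      for (let attempt = 0; attempt <= retries; attempt++) {
        try {
          insertBatch(batch);
          return;
        } catch (err: any) {
          // "database is locked": back off briefly and try the batch again
          if (err.code !== "SQLITE_BUSY" || attempt === retries) throw err;
          await new Promise((resolve) => setTimeout(resolve, 50 * (attempt + 1)));
        }
      }
    }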

stefanos82
0 replies
6h2m

Have patience: eventually hctree [1] will become stable and we will be offered the choice between the traditional backend mechanism and the newly implemented one that supports concurrency!

[1] https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html

manmal
0 replies
10h22m

Did you disable auto checkpointing? Wouldn’t checkpointing result in potential corruption or at least data loss if two processes do that simultaneously? Or is that scenario exhaustively prevented with a lock file?

malkia
0 replies
9h53m

Nope! I was mistaken - it's really multiple readers, single writer. Probably I was assuming things the whole time and did not spend enough time thoroughly checking - granted, most of the DBs I've done with SQLite were more about reading than writing.

So stand corrected!

dilyevsky
0 replies
14h40m

Probably means there's still no row-level locking (at least last I checked) and very limited table-level locking. The writer still grabs a lock on the entire db per the docs.

anaisbetts
0 replies
1h38m

This is the tradeoff of SQLite: it is extremely fast, as long as you mostly have only one user. With WAL you can get multiple readers, but it doesn't scale the same way that e.g. PostgreSQL does.

abbbi
0 replies
5h56m

It works if there is low traffic, but as soon as you get bigger transactions or the amount of concurrent writes becomes heavier, you will at some point (even with WAL enabled) get "database locked" issues. You can work around that at the application level to a certain point, but in general, if you are at that point, you should really consider using another database backend.

focom
9 replies
16h21m

I am curious: do the HN folks know if Bluesky is more active than Nostr or the Mastodon network?

muglug
3 replies
16h2m

Less active than Mastodon, I'd assume more active than Nostr.

But the interesting thing for me isn't activity — it's the people on there.

Of the cohort who had >100k followers on Twitter, I think more of them post regularly on Bluesky than post on Mastodon. Bluesky definitely has a more cohesive feel, especially because there's currently just one instance & mod team.

spookie
1 replies
14h25m

Mastodon, and the Fediverse in general, deliberately make user-interaction design decisions to limit many of the issues common to social media. Think: mob culture, addiction, and the like.

I wonder if Bluesky intends to follow on those. For example, hiding user action counts (repeats, favourites, etc...) until the user acts on one.

Things like these may be strange for those accustomed to Twitter, but personally, that's what makes me stick with smaller instances on the Fediverse.

indigochill
0 replies
8h35m

My bet is regardless of any initial good intentions, since BlueSky is a company, market pressures will inevitably force them into dark patterns like we see on every other commercial social network (going back to the early days of the companies, Facebook, Twitter, and even Google looked really good early on until all were corrupted by profit motive). My belief is that the profit motive is necessarily at odds with free communication.

To me, the ActivityPub network (Mastodon and friends) is relatively unique in the social media space in having no direct commercial pressures (the protocol is developed by W3C) and therefore being inoculated against the causes for these dark patterns.

StevenXC
0 replies
14h23m

I'm a donating supporter of the Mathstodon.xyz instance, but (sadly?) most of "math Twitter", at least the education-focused university faculty, ended up on Bluesky. I think there's a strong appeal for "a straightforward Twitter clone without Musk" for a lot of people.

steeleduncan
0 replies
8h34m

I don't know about nostr, but I find it is a lot less active than Mastodon. In general the tech accounts I am interested in have moved to Mastodon rather than bluesky. I imagine this would depend on whose activity you are interested in, and where they have chosen to migrate to

rsynnott
0 replies
1h53m

Bluesky is much smaller than Mastodon, but how active it feels will depend on who you're following. It also has an Algorithm (TM); I never really missed this when I went from Twitter to Mastodon as I mostly used the linear timeline anyway, but I gather that some people find that Mastodon feels empty/inactive without one.

nvy
0 replies
15h51m

Bluesky is pretty positive and definitely lacks the "American Suburbia HOA" energy that some Mastodon instances have.

It's pretty active during North American hours.

nemo44x
0 replies
14h16m

They’re all just arbitrary ghettos that aren’t dissimilar to each other. None of them matter in terms of influence but are like nice Reddit boards for certain interests.

BryantD
0 replies
16h13m

Less active than Mastodon, more active than nostr.

https://vqv.app/stats/chart is useful for Bluesky and draws on the Bluesky firehose for data. https://stats.nostr.band/ seems useful for nostr.

anonyfox
7 replies
8h30m

Slightly related: is Bluesky moderated well enough, or do I get lots of right-wing and conspiracy crap like on Twitter currently?

I'd really love to have some more civilized hub again that isn't full of hate and anti-intellectualism.

_heimdall
2 replies
4h47m

Are you assuming that hate and anti-intellectualism are exclusively a rightwing thing?

rsynnott
0 replies
1h57m

Not exclusively, but on the mainstream internet in 2023? Yeah, more or less, bar a few tankies.

rchaud
0 replies
2h50m

On Twitter, the place that hired Tucker Carlson after Fox News dumped him? Yeah it is. No need for "both sides"-ing on this one.

vehemenz
0 replies
5h10m

It's anti-intellectual and uncivilized, but not because of rightwing conspiracy content. There is a strong culture of intolerance and censorship of viewpoints that diverge from the norm.

rsynnott
0 replies
1h58m

It seems a lot nicer than Twitter. Though I'd wonder how much of that is just that it's invite-only right now. I haven't really gotten into it, for a variety of reasons (happy enough with Mastodon for most stuff, no decent client apps, vaguely suspicious of the involvement of Dorsey) but it seems... fine?

nonethewiser
0 replies
4h13m

Get out of your bubble

elAhmo
0 replies
8h27m

I think it is still too small and people seem quite nice there. But, that has its drawbacks, as I keep returning to Twitter due to the slow migration in the recent months.

It is a shame, as it seems like a nice alternative that has some cool ideas.

lukevp
6 replies
16h20m

Interesting... I like the strategy of having each user be 1:1 with a DB. What would be done for data that needs to be aggregated across users though? If I'm subscribed to another user and they post, how does my DB get updated with that new post? Or is this meant just for durable data and not feed data (like profile data, which users are followed / not followed / etc.) and all the interactive stuff happens separately?

I like that "connection pooling" is just limiting the number of open handles in a LRU cache. It's also interesting because instead of having to manage concurrency at the connection level, it handles it at the tenancy level since each DB connection is single-threaded. You could build up per-DB rate limiting on top of this pretty easily to prevent abuse by a given user.

Is there a straightforward way to set up Litestream to handle any arbitrary number of DBs?
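
(To make the "LRU of open handles" idea concrete, a minimal sketch assuming better-sqlite3; the class name and capacity are invented, and the real PDS presumably layers rate limiting and Litestream on top of something like this.)

    // "Connection pooling" as an LRU cache of open per-tenant SQLite handles.
    import Database from "better-sqlite3";

    type Handle = InstanceType<typeof Database>;

    class DbHandleCache {
      private handles = new Map<string, Handle>();
      constructor(private capacity = 100) {}

      get(path: string): Handle {
        const existing = this.handles.get(path);
        if (existing) {
          // Re-insert so Map iteration order reflects recency of use
          this.handles.delete(path);
          this.handles.set(path, existing);
          return existing;
        }
        if (this.handles.size >= this.capacity) {
          // Evict the least recently used handle (the first key in insertion order)
          const oldest = this.handles.keys().next().value!;
          this.handles.get(oldest)?.close();
          this.handles.delete(oldest);
        }
        const db = new Database(path);
        this.handles.set(path, db);
        return db;
      }
    }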

wmf
5 replies
16h6m

Retr0id
3 replies
15h36m

To summarise the relevant details, the "AppView" service is responsible for the sorts of queries that aggregate across users, and that has its own database setup - I think postgres but I'm not 100% sure on that.

jakebsky
2 replies
15h16m

You're right, as usual. AppView is on a Postgres cluster with read replicas doing timeline generation (and other things) on-demand. We're in the process of moving it toward a beefy ScyllaDB cluster designed around a fanout-on-write system.

The v1 backend system was optimized for rapid development and served us well. The v2 backend will be somewhat less flexible (no joins!) but is designed for much higher scale.
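
(For readers unfamiliar with the term, a hand-wavy sketch of what fanout-on-write means; the store interface is invented for illustration and implies nothing about the actual ScyllaDB schema.)

    // On write, copy a reference to the new post into each follower's timeline,
    // so reading a timeline becomes a simple per-user lookup with no joins.
    type PostRef = { authorDid: string; postUri: string; createdAt: string };

    interface TimelineStore {
      followersOf(did: string): Promise<string[]>;
      appendToTimeline(followerDid: string, ref: PostRef): Promise<void>;
    }

    async function fanoutOnWrite(store: TimelineStore, ref: PostRef): Promise<void> {
      const followers = await store.followersOf(ref.authorDid);
      // Expensive at write time, cheap at read time
      await Promise.all(followers.map((f) => store.appendToTimeline(f, ref)));
    }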

manmal
1 replies
10h10m

Does the BGS pull all the tenants' individual SQLite data? Or do the PDSes push new posts to the BGS?

jakebsky
0 replies
1h50m

The BGS (which is an atproto "relay" service) subscribes to all PDS event streams on the entire network, and aggregates and relays them.

This way it's possible to get all network data from a single place (the BGS) rather than having to connect to every PDS, which is simpler for consumers and dramatically reduces the workload of PDS hosts.

Some details about event streams here, although the APIs are still evolving: https://atproto.com/specs/event-stream
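
(A very rough sketch of what consuming such an event stream might look like, based on the linked spec; the host is a guess, and real frames are binary CBOR that a consumer would decode rather than just measure.)

    // Subscribe to a repo event stream over WebSocket and log frame sizes.
    import WebSocket from "ws";

    const ws = new WebSocket("wss://bsky.social/xrpc/com.atproto.sync.subscribeRepos");

    ws.on("message", (frame: Buffer) => {
      // Each frame is a CBOR header plus body describing repo commits/events
      console.log(`event frame: ${frame.length} bytes`);
    });

    ws.on("error", (err) => console.error("stream error:", err));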

rapnie
0 replies
9h5m

The BGS handles "big-world" networking. It crawls the network, gathering as much data as it can, and outputs it in one big stream for other services to use. It’s analogous to a firehose provider or a super-powered relay node.

"Big-world" networking by Big Tech-to-be Bluesky with super-powers, I wonder? Is this BGS also going to be federated, or is that the big centralized beating heart of this platform managed exclusively by BS?

pvg
4 replies
16h24m

This seems like a very misleading title: the Bluesky PDS is the meant-for-self-hosting thing they distribute, not the Bluesky service as experienced and used by most of its users.

wmf
1 replies
16h2m

AFAIK there's only one version of the software so "the service" runs the same thing that you self-host. SQLite seems like it will simplify the single-user case though.

jakebsky
0 replies
15h48m

That's right. This is the same code Bluesky is running on our new PDS hosts. It's all open source.

The main motivation in moving from a big central Postgres cluster to single tenant SQLite databases is to make hosting users much more efficient, inexpensive, and operationally simpler.

But it's also part of the plan to run regional PDS hosts near users, increasing performance by decreasing end-to-end latency.

The most experimental part of this setup is using Litestream to replicate these many SQLite databases (there are almost 2 million user repositories) to cloud storage. But we're not relying on this alone, we're also going to maintain standard SQLite ".backup" snapshots as well.
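
(A sketch of what a per-tenant ".backup" snapshot could look like, assuming better-sqlite3, whose backup() wraps SQLite's Online Backup API; the paths and naming are invented, not Bluesky's scheme.)

    // Take a point-in-time snapshot of one tenant's SQLite database.
    import Database from "better-sqlite3";

    async function snapshotTenant(dbPath: string, snapshotDir: string): Promise<void> {
      const db = new Database(dbPath);
      try {
        const stamp = new Date().toISOString().replace(/[:.]/g, "-");
        // The backup API copies pages safely even while other writes continue
        await db.backup(`${snapshotDir}/${stamp}.sqlite`);
      } finally {
        db.close();
      }
    }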

whyrusleeping
0 replies
15h53m

No, this actually is moving every single user currently on the service onto this setup. Everyone gets their own SQLite database under the hood.

typical182
0 replies
15h42m

The “Personal” in PDS doesn’t mean it is only for self-hosting.

Bluesky has a main PDS instance at https://bsky.social that serves almost all of the Bluesky user base.

There is a good overview of the architecture here:

https://blueskyweb.xyz/blog/5-5-2023-federation-architecture

Here’s a snippet from the protocol roadmap they published 3-4 weeks ago [1]:

Multiple PDS instances

The Bluesky PDS (bsky.social) is currently a monolithic PostgreSQL database with over a million hosted repositories. We will be splitting accounts across multiple instances, using the protocol itself to help with scaling.

[1] https://atproto.com/blog/2023-protocol-roadmap

PurpleRamen
4 replies
4h22m

On the surface, this looks like the worst combined with the awful. I hope someone will write a good article with some hard numbers to explain the benefits and analyze the assumed flaws, because this could be something really fascinating to learn about.

barnabee
3 replies
3h59m

Can you explain why this looks like the "worst combined with the awful" to you?

To me, on the surface, particularly assuming you are building a distributed system to be run and deployed by many users, some of which are not professional sysadmins (which I believe is likely to be a goal here, and should be), this seems like quite a sane choice. I'd definitely expect a design goal to be avoiding the need to setup/configure/look after any additional database or other servers.

PurpleRamen
2 replies
1h37m

This looks like someone building their own file-based database system, in TypeScript, while still using mature features of database servers. So instead of trusting the optimized, regularly maintained, and battle-tested solution, they build something themselves. This smells ugly, like something that will scale poorly in performance and will have security and tooling problems.

Simplification of installation does not seem like a good enough reason to trust your whole backend to this. Installing and maintaining a database server is not that hard today; it is well established and documented, unlike this. But I also don't know enough about this app; maybe this is just one of several options, meant for a specific use case? Using this in a standalone desktop app would make sense, while still offering a mature SQL backend for server installations.

muttled
0 replies
1h9m

I'm not a professional coder, only side-projects. Never formally taught. I looked at the solution and thought it kinda sounds like something I'd come up with. Like when I didn't know how to use data tables and would hold data in an array of arrays to form the rows and columns. Somewhat clever, "works", but would probably make my professional coder friends vomit if I explained it to them.

jakebsky
0 replies
13m

Using SQLite is most certainly not "building their own file-based database system".

SQLite is just about as mature and well-tested as it gets in the entire world of software: https://www.sqlite.org/testing.html

Each user's data is naturally partitioned at the atproto repository level, so this is the sweet spot for per-user SQLite databases. It would make total sense for a PDS instance to have just a single user on it, and in fact that is likely for many self-hosters. It's also worth noting that the PDS software already had SQLite support, which made this change somewhat easier.

There are legitimate trade-offs to this kind of a system, but it comes out way ahead in this case, and it's not as wild as it may seem to those not familiar with the power of SQLite.

A major consideration is that we're planning to run at least 100+ instances, which would require operating 100+ high availability (primary+replica) Postgres clusters. This would be a huge amount of operational and financial overhead.

We chose this route because it is better for us as a small team, with relatively limited resources. But it also has the property of being much easier for self-hosters, of which we hope there will be many.

victorbjorklund
3 replies
8h43m

Can someone who knows more about Bluesky explain what data is stored in SQLite and what isn't? Because I assume it isn't messages etc. between users.

williamcotton
0 replies
7h49m

I assume that messages between users are stored in those SQLite DBs.

Think email. When you send an email and CC five other people as well then seven people now have the same copy of the email stored on their email servers. That is, there’s no central database that contains a single email that is referenced by others.

This is basically how sharding with relational DBs works as well.

This sort of data denormalization is almost a requirement as applications scale and especially for many-to-many applications that have a high write to read ratio.

Low write to read and you can get away with a single master to many slave relational DB architecture for quite astonishing numbers of requests and data!

Hamuko
0 replies
4h52m

By messages, do you mean direct messages (private messages between two parties)? Because Bluesky doesn't have those at the moment. There's only public messages broadcast to the world.

Haven't done any research to determine if there are plans for direct messages.

BigTuna
0 replies
3h37m

It's all your posts and replies as a user. While they currently host the only* PDS themselves, the end goal is for every end user to have their own PDS. Inrupt/SOLID calls this concept a "pod".

*(actually they just onboarded a second production PDS yesterday.. progress!)

rapnie
3 replies
6h34m

Will the BGS also be federated, or is that to be the centralized big spider in Bluesky's web?

fiatjaf
1 replies
6h29m

In theory you can migrate between BGSes, but you can always just use one at any point in time.

In practice no one will switch because it makes no sense to do it. If there happen to ever be more than one real BGS contender, it will be from something like Cloudflare that will just replicate everything Bluesky Inc decides.

rapnie
0 replies
6h16m

I don't know if it does not make sense. AFAIU these BGSes could be special-purposed e.g. for a business, community or topic of interest. Why wouldn't it make sense to synchronise the collected data between these BGSes and get a combined view on the data? With just a single BGS we have another centralized big tech platform. I think decentralized BGSes are a major factor in how interested people are in becoming part of the ecosystem.

jakebsky
0 replies
1h42m

The BGS is a "dumb" relay and mirror of the network, so it generally shouldn't matter which one your client app is ultimately sourcing data from.

But yes, anyone is free to operate a BGS. It does necessarily require a non-trivial amount of storage, compute, and bandwidth. A funded startup, well-funded non-profit, or any just about any cloud provider could likely afford to run one.

It's also entirely possible to operate a BGS that only mirrors a slice of the network (for instance, only users in one country) if desired, which could in some cases make it affordable for a single user or small coop to operate.

mythz
3 replies
12h12m

Always happy to see more server-side SQLite/Litestream adoption, which we've also been using to build our new apps.

SQLite + Litestream is an even greater choice for tenant databases, which are vastly cheaper to replicate/back up to S3/R2 than expensive cloud-managed databases [1] (up to 3900% cheaper vs SQL Server on Azure).

[1] https://docs.servicestack.net/ormlite/litestream

patates
2 replies
4h38m

What does 3900% cheaper mean? I don't get it.

vizzah
1 replies
4h33m

yeah.. a weird way to say 39 times cheaper ;)

throwaway167
0 replies
4h31m

100% of, say 42 is 42. So 100% less than 42 is 0.

3900% cheaper makes no sense.

andrewstuart
2 replies
9h52m

I sure hope they don’t ever want to change their db structure.

Why not use Postgres with RLS (row-level security)?

dathinab
1 replies
6h2m

- simpler db client

- simpler cloud architecture

- simpler resource management

- simpler partial backups/restore

- simpler compliance with law enforcement

- partitioning might be easier, e.g. when handling "user account storage which should be undo-able for a while" (e.g. long-term-absent users' data could be moved to cold storage; blocked/deleted users' data could move to some scheduled-for-deletion space, allowing it to be undone for a while but then reliably auto-deleted; a copy of a user's data where crime detection triggered (e.g. CSAM) could be moved to a quarantine space; etc. And each of the spaces can be completely different servers with different storage methods and retention policies, virtual access control and physical access control. Sure, you can have all of that with RLS + partitioning + triggers + roles in Postgres, but it's the personal data store of a user, so you don't need cross-user FK constraint enforcement, and it makes it much easier to make sure you don't miss anything wrt. access control or forget to partition/move some columns of a new table, etc.)

- maybe simpler billing for storage ("just" size of DB)

Now, simpler doesn't mean better, but often it pays off as long as you don't run into the limits of what is possible with the simpler architecture (and as far as I can tell you can shard this approach really nicely, so at least there shouldn't be scaling performance limits; scaling cost and future feature-complexity limits might still apply).

andrewstuart
0 replies
59m

You didn't systematically document "harder".

tomashubelbauer
1 replies
6h55m

Anyone interested in joining Bluesky, please grab these. I have extra and I've already invited all my Twitter mutuals I wanted to invite.

Edit: I'm all out now :)

da25
0 replies
6h21m

All used :( Do you have any more?

gigatexal
1 replies
7h24m

At a previous fintech role the company would store customer accounts as encrypted sqlite3 files on blob storage ... this worked out decently well for our access patterns.

emadda
0 replies
30m

How did they lock the file when re-uploading it after edits?

m3kw9
0 replies
2h49m

That sounds like centralizing

hknmtt
0 replies
9h30m

good luck with running updates.

gigatexal
0 replies
7h19m

This should make leaving the service rather simple. Download your sqlite file and throw up a simple local-only html front end to the data and you're solid.