
Optimizing the Lichess Tablebase Server

hocuspocus
19 replies
5d5h

I know it's not a fair comparison, but I'm truly impressed by the quality of engineering shown by the Lichess team, while their main competitor was, for example, boasting about a migration to GCP and yet suffering repeated outages due to fairly organic growth in popularity. And I believe they employ 100x more people.

Lichess' mobile app was a weak spot; however, the v2 rewrite in Flutter is already pretty good, even while still in beta.

And keep in mind Thibault pays himself less than 60k/year.

sgt
9 replies
5d4h

I don't think he needs to feel bad about increasing his salary. Make it 200k/yr and make his life easier, which can only be good for the project long term.

epidemian
5 replies
5d2h

IDK about France (where Thibault is from, and IDK if he lives there), but where I'm from you would have a very comfortable life earning 5k every month, so his self-imposed 60k/yr salary doesn't seem unreasonable at all. At some point, more money yields diminishing returns.

hyperman1
2 replies
5d1h

I don't know if that 5K is before or after taxes. You easily lose half of what your employer actually pays.

maccard
1 replies
5d1h

€60k pre-tax is roughly in the top 10% of incomes in the country based on a quick google. Not opulent, but definitely comfortable.

hocuspocus
0 replies
5d1h

His salary is more like €55k though.

It's comfortable outside of Paris and other expensive cities. But he could easily double that given his background. Before quitting his job he already worked with Play and the Typesafe (now Lightbend) stack before the peak of its hype, when companies were paying top dollar for consultants.

diggan
1 replies
5d1h

but where i'm from, you would have a very comfortable life earning 5k every month, so his self-imposed 60k/yr salary doesn't seem unreasonable at all.

(Some) HN commenters seem weirdly out of touch when it comes to salaries outside of IT-heavy cities in the US. The other day someone claimed $125k/year for an employee wasn't "big money" (https://news.ycombinator.com/item?id=40927175), so I'd take any comment saying some salary is high/low with a box filled with sand.

AQuantized
0 replies
4d22h

To be fair, that really isn't 'big money' in most of those cities, assuming 'big money' connotes significantly above-average disposable income after taxes and expenses in those areas, especially relative to your peers. I don't think it would be unfair to say that it would be big money compared to many European workers in the same jobs, though.

hocuspocus
2 replies
5d4h

I don't know him personally but from the talks he's given, he seems to be ideological about Lichess and his own lifestyle, in a way that would be considered fairly anti-capitalistic by most of the HN crowd :)

treyd
1 replies
5d3h

Do you have links to any of these talks you could recommend?

peter_retief
5 replies
5d4h

Lichess is a great service to casual chess players like myself to get a quick game against another human. Never much of a wait.

What I do want to know is how does one pronounce Lichess? Lie chess, Le chess?, League chess?

tecleandor
0 replies
5d1h

I guess it's because of the lychee fruit?

peter_retief
0 replies
4d4h

Thanks.

ycombinete
0 replies
5d3h

I’m team lie-chess.

hocuspocus
0 replies
5d4h

/li:/ as in libre.

epolanski
1 replies
5d2h

I think you're highly overestimating how many devs Chess.com has.

hocuspocus
0 replies
5d1h

I am not, that's why I said employees not devs.

Sesse__
0 replies
5d2h

Lichess is a great example of how efficient Wikipedia should have been (both on the code and organization level). :-)

aeyes
19 replies
5d4h

Did they have to reduce cost, or is there some other reason not to stick 20TB of SSDs in a box and call it a day? 4TB SSDs only cost ~$300, and even HP or Dell SFF drives aren't much more expensive.

I guess they were interested in doing the testing and optimization for fun. From a product standpoint I probably would have invested my limited time in other projects.

diggan
4 replies
5d1h

From a product standpoint

Makes sense from that perspective, but Lichess is not run as a for-profit company with a product, it's run as a non-profit organization (which it is), so a perspective shift is needed to understand their decisions :)

silvestrov
3 replies
5d1h

Take a look at their financials, and $1500 for SSDs would not be out of place.

They have yearly expenses of more than $500,000:

https://docs.google.com/spreadsheets/d/1Si3PMUJGR9KrpE5lngSk...

Seems really weird to be using hard drives when they already have expenses like that.

lukhas
0 replies
5d

As mentioned elsewhere, we're renting most of our infra from OVH, and paying monthly for 40TB of SSDs or NVMes would simply explode our yearly budget.

Source: I am president of the Lichess charity (and also one of the sysadmins).

Timshel
0 replies
5d1h

Looks like rented stuff to me; you can't just add drives...

And while 500k is a lot, maybe they can do so much with it because they don't just throw $1500 in drives at every problem.

Out_of_Characte
0 replies
5d1h

The reason is buried in another article

"WDL tables (.rtbw) store the outcome of positions, e.g. if a position is winning. An engine will use this very frequently to decide which endgames to aim for. WDL tables should be stored on the fastest disk (preferably SSD) you have." "DTZ tables (.rtbz) tell the engine how to finish the endgame once it is on the board. They are optional, but required to reliably convert complicated endings."

Seems reasonable to put the WDL tables on the SSD for better engine performance. I do understand not choosing SSDs, though: the number of position lookups per user per game always stays the same, yet the tablebase is growing more than exponentially.
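The quoted advice boils down to a simple tiering rule keyed on the file extension: hot WDL tables on fast storage, DTZ tables on cheaper disks. A minimal sketch, assuming hypothetical mount points `/mnt/ssd/syzygy` and `/mnt/hdd/syzygy` (illustrative only, not Lichess's actual layout):

```python
from pathlib import PurePosixPath

# Hypothetical mount points, for illustration only.
FAST_TIER = PurePosixPath("/mnt/ssd/syzygy")
SLOW_TIER = PurePosixPath("/mnt/hdd/syzygy")


def placement(filename: str) -> PurePosixPath:
    """Return the directory a Syzygy table file should live in.

    WDL tables (.rtbw) are probed very frequently during search, so they
    go on the fast tier; DTZ tables (.rtbz) are only needed to convert an
    endgame already on the board, so slower, cheaper storage is acceptable.
    """
    suffix = PurePosixPath(filename).suffix
    if suffix == ".rtbw":
        return FAST_TIER
    if suffix == ".rtbz":
        return SLOW_TIER
    raise ValueError(f"not a Syzygy table file: {filename}")
```

For example, `placement("KQvK.rtbw")` picks the SSD directory and `placement("KRPvKR.rtbz")` the HDD one.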

https://lichess.org/@/lichess/blog/7-piece-syzygy-tablebases...

BSDobelix
4 replies
5d3h

testing and optimization for fun

In no other industry would an engineer think like that... except in IT.

We definitely have too-powerful, cheap hardware, combined with lazy wetware that just wants to "call it a day"... be proud of your work... or so they say.

WJW
1 replies
4d22h

You think engineers in other industries won't sometimes choose the more exciting option when a boring but well-understood one would do the trick? That's definitely not true in (at least) mechanical and electrical engineering from what I've seen. From people spending millions trying to have the entire factory operated by robots so they could save 100k on humans to engineers specifying friction stir welders for the most basic of welding jobs, overengineering of parts that would make the people at Juicero blush, etc etc etc.

I have no idea why software people think their industry is the only one where people cut corners. Some form of meta-imposter syndrome perhaps.

BSDobelix
0 replies
4d22h

From people spending millions trying to have the entire factory operated by robots so they could save 100k on humans to engineers specifying friction stir welders for the most basic of welding jobs

Look, I come from that industry (metalworking). If you use friction stir welding where it's not needed, you should be kicked out of your job, but honestly, I've never heard of such a thing in reality. Don't tell me you're buying another friction stir CNC to save 100k "on people": friction stir is slow and expensive, and any robot can do normal welding faster.

Yes, people are expensive, but unoptimized work is even more expensive (at the factory level). NO ONE in the metal industry would do something like this if it weren't necessary (well, except the defence sector, because those guys are crazy and have unlimited money).

I call your made-up story complete BS.

chronogram
0 replies
5d3h

Not calling it a day anywhere is why Lichess is such a good website.

aeyes
0 replies
5d2h

Most things in life are a compromise and it's easy to get tempted to find the perfect solution instead of spending your time on actually moving forward.

In all industries there is always something you can do better if only you spend more time. But at most places time is worth money and I'd say $3000 for a few SSDs is little enough to not make this worth my time.

broodbucket
2 replies
5d4h

Lichess is a non-profit with a lot of volunteers; they probably don't have the same time vs. hardware cost balance as most for-profit companies do.

traceroute66
1 replies
5d1h

It is important not to automatically assume that all non-profits are impoverished and run by volunteers.

One of the most famous examples is Wikipedia.

Technically, yes, they are a non-profit. Impoverished? Certainly not!

Look at the financials, as others have already pointed out. Especially if you are in the habit of donating to non-profits, the financials can make for interesting reading.

pfg_
0 replies
4d18h

If you look at the Lichess financials, they currently have two full-time employees, so in this case it's not a bad assumption. Wikipedia has significantly more users and runs fundraisers.

ViktorRay
2 replies
5d2h

Lichess is a non-profit. It is run entirely on donations and volunteering. It has only 1 employee, the dude who founded the non-profit, and it seems he takes far less money than he could make from any other job based on how talented he is.

Also, the organization is based in France. I don't know what impact that has on costs, but it's worth mentioning.

lukhas
0 replies
5d1h

We're up to 2 employees now! The founder and a mobile dev.

The impact on costs is "not small", because as a rough estimate, the charity pays overall about twice what the dev gets in take-home money, because French employer taxes are high (keyword for the Frenchies reading us: URSSAF).

Source: I am president of the Lichess charity and have the honour and pleasure of dealing with most of the French administrative paperwork.

jayemar
0 replies
5d1h

I had no idea that was the case, that's incredibly impressive!

KolmogorovComp
1 replies
5d3h

Why scale up when you can optimise? I'm probably going to be downvoted for this, but imo this is really the mindset that leads to bloated software.

tra3
0 replies
5d2h

Agreed.

This is the implicit assertion that developer time is more expensive than hardware costs.

Seems true in the short term, until the whole system crumbles.

bastawhiz
0 replies
5d2h

They managed to reduce max response times by an order of magnitude. If this project took a week (even two) and some users went from 15s response times to 1.5s, it's hard to think of a better use of time, unless the user experience elsewhere is even worse, or you work for a for-profit organization where there's money to be made elsewhere (and you admit you don't really care about customer pain).

imperialdrive
4 replies
5d

Lichess is one of those things you just have to sit and appreciate like a fine wine. It's absolutely wonderful for people in the chess community. I use it every day and am inspired by the functionality and performance, especially knowing it's a 1-2 person shop with limited budget.

why5s
0 replies
3d19h

I gotta be honest: I aspire to create something as valuable and as cool as lichess one day.

wavemode
0 replies
4d23h

I wish more open source end-user software learned from Lichess, in terms of how user friendly, well designed and well maintained it is.

lepetitchef
0 replies
5d

Me too. Recently the new beta mobile app is even cleaner and has haptic feedback which is so cool.

TheRoque
0 replies
5d

You forgot to mention that it's free and open source, doesn't and never will ask for your money, and a lot of people donate. Their expenses are public. It's also available as an app!

everyone
4 replies
5d2h

A lichess is a female lich I'm assuming? (It's like baron / baroness)

o11c
2 replies
5d2h

Noble titles are a poor comparison since they're the rare example where there actually is an exclusively-male root form. For most words the root form is neuter, and both male-only (if it exists) and female-only forms require an affix.

Properly, a male lich is "werlich" and a female lich is "wiflich" (unlike other words the /f/ sound is not likely to disappear); the plurals add "-en". But generally sex is irrelevant for undead{cn} so the neuter form by far predominates.

"lichess" is an abominable mixture of German and French roots ... so naturally it is indistinguishable from the rest of English.

claytonwramsey
1 replies
5d

note - "chess" is not a Germanic word (deriving from the Arabic شَاه (shah), meaning king). Ironically enough, it comes to English via the Old French eschés, meaning that "lichess" is arguably made from entirely French roots.

o11c
0 replies
5d

Hm, I guess the "libre" is French, but "live", "light", and most importantly "lich" are all German.

If we look for relatives of "libre", they include "leed"(song) and the first half of Leopold (adding "bold") and Luther (adding "army"). The common meaning is "people".

OsrsNeedsf2P
0 replies
5d1h

It's "Libre" chess, as in "Free (and open source)" chess

treebeard901
1 replies
4d23h

Some questionable choices are made in this optimization.

The reason for the optimization is that there is so much IO activity the RAID checks can't complete.

It is unclear from the article whether the RAID checks were ever completed on 17TiB of data. Instead, they choose to disable the periodic RAID checks and switch to doing the error checking as each page of data is read in. The two are not equivalent, and both should be used for important data.

Finding corrupt data only when you try to read it can let corruption go unnoticed for a long time, maybe to the point where your backups do not go back far enough to restore the uncorrupted data. Underpinning all this is also a switch to RAID 0... It may be the fastest option, but they are putting a lot of faith in that NVMe configuration handling that kind of workload.

Hope they have good backups...

EDIT: A good way to solve this is to spin up a temporary server, restore your backups to it, and run the full data checks; when they succeed, you have also verified your backup and restore process along with the integrity of the files. You still want enough headroom to complete the RAID checks on the primary server, and don't use RAID 0 for performance.
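The "full data check" on the restored copy can be as simple as re-hashing every file and comparing it against a known-good manifest. A minimal sketch, assuming a hypothetical manifest that maps file names to SHA-256 hex digests (the manifest format is an illustration, not something the article describes):

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # hash 1 MiB at a time so memory use stays flat on huge files


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def full_check(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return names of files under root that are missing or fail their checksum.

    `manifest` is a hypothetical {relative_name: expected_sha256_hex} mapping.
    Unlike check-on-read, every byte of every file is read, so corruption in
    rarely accessed data is found too.
    """
    bad = []
    for name, expected in sorted(manifest.items()):
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(name)
    return bad
```

An empty result means every file in the manifest is present and intact; running this against a restored backup also exercises the restore path itself.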

lukhas
0 replies
4d22h

They are indeed not equivalent, but for our use case this is sufficient: if we detect data corruption, we can just throw away the files and download/regenerate them (this is a freely available dataset, if a bit large; https://en.wikipedia.org/wiki/Endgame_tablebase will explain it better than me). For this reason, it is also not backed up.
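Check-on-read means each block is verified only at the moment it is served. The thread doesn't say exactly which layer does this on the Lichess server, so here is a generic sketch of the idea, assuming a hypothetical 4 KiB block size and per-block SHA-256 digests precomputed from a known-good copy; a failed check raises, and the caller can discard and re-download the file:

```python
import hashlib
from typing import BinaryIO, Sequence

BLOCK_SIZE = 4096  # hypothetical block size; real systems use their own granularity


class CorruptBlock(Exception):
    """A block failed verification; the caller should discard and re-fetch the file."""


def block_digests(data: bytes) -> list[bytes]:
    """Precompute one SHA-256 digest per block from a known-good copy of the file."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def read_verified(f: BinaryIO, digests: Sequence[bytes], index: int) -> bytes:
    """Read block `index` and verify it before handing it to the caller."""
    f.seek(index * BLOCK_SIZE)
    block = f.read(BLOCK_SIZE)
    if hashlib.sha256(block).digest() != digests[index]:
        raise CorruptBlock(f"block {index} failed verification")
    return block
```

Note that blocks which are never read are never checked, which is exactly the gap treebeard901 points out; it is acceptable here only because the dataset can be re-downloaded.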

robbles
0 replies
5d4h

here are the empirical distribution functions (ECDFs) with 30ms added to each response time

The added constant seems artificial, but it's just viewing the results from the point of view of a client with 30ms ping time. Otherwise the log scaled x-axis would overemphasize the importance of a few milliseconds at the low end.

I thought this was interesting - maybe it's a standard practice I was just unaware of but it seems like a smart trick.
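The trick is easy to reproduce. A minimal sketch of building ECDF points with a constant client round trip added (30 ms, as in the quoted plots), plus a helper that shows how the offset shrinks gaps at the low end of a log-scaled axis; the function names are illustrative:

```python
import math


def ecdf(samples_ms, offset_ms=30.0):
    """Return (xs, ys) for a step plot of the empirical CDF of response times.

    offset_ms is added to every sample to model a client's network round trip.
    """
    xs = sorted(s + offset_ms for s in samples_ms)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]  # cumulative fraction after the i-th sorted sample
    return xs, ys


def log_gap(a_ms, b_ms, offset_ms=30.0):
    """Horizontal distance between two response times on a log10 x-axis."""
    return math.log10((b_ms + offset_ms) / (a_ms + offset_ms))
```

Without the offset, 1 ms vs. 3 ms spans about 0.48 decades on the axis, as much as 100 ms vs. 300 ms; with 30 ms added, the same pair is less than 0.03 decades apart, which matches how little a client would notice the difference.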

29athrowaway
0 replies
5d2h

There is also lishogi, but it is small enough not to require such optimizations yet.

Shogi is the most entertaining of the chess variants. Xiangqi, not as much.