
The Internet Archive is under a DDoS attack

Simon_ORourke
57 replies
1d1h

Why, what's the point in doing such nonsense? Unless it's someone with lots of money, contacts in the dark web, and some historic Barbra Streisand type chip on the shoulder.

somenameforme
21 replies
1d

DDOS attacks are dirt cheap and can be contracted from large professional sites offering customer support and the works. The largest one taken down had hundreds of thousands of users, and had carried out some 4 million attacks, for prices starting at $14.99/month. [1]

So in other words, anybody can carry out a DDOS for basically no cost. So trying to analyze the purpose, let alone suspects, is probably not going to be fruitful.

[1] - https://wccftech.com/865619-2/

sva_
20 replies
22h9m

And they're curiously usually protected by Cloudflare.

neurostimulant
13 replies
21h58m

Doubt Cloudflare has anything to do with it. The operators most likely don't want to openly expose their website's IP addresses.

mike_d
7 replies
21h36m

That is exactly the problem. These services are constantly at war with each other and are attacked by competitors. Cloudflare provides DDoS protection to the DDoS providers so they can keep their services online, which directly benefits Cloudflare because DDoS stays a bigger problem than it would be if they were all busy attacking each other.

This is a sampling of currently available services and who they use for DDoS protection:

  stresslab.app - Cloudflare
  maxstresser.com - Cloudflare
  sunnystress.com - Cloudflare
  tresser.io - Cloudflare
  ip-stresser.net - Cloudflare
  hardstresser.com - DDoSGuard
  zdstresser.net - Cloudflare
  starkstresser.net - Cloudflare
  stresserhub.org - Cloudflare
  nightmarestresser.net - DDoSGuard

Just for fun head over to Cloudflare's abuse reporting site and try to figure out how to get one of these taken down. https://abuse.cloudflare.com/

jltsiren
4 replies
18h31m

I find the idea of DDoS providers confusing. If someone tried to operate a service that can be abused easily to cause similar disruption in the physical world, the operation would be taken down quickly and the people behind it would probably end up in prison. But somehow the internet is still a lawless zone where crime is tolerated and everyone is out for themselves.

mike_d
2 replies
14h27m

It used to be very rare for DDoS providers to publicly advertise their services, you kinda had to know a guy who knew a guy. If you put up a website offering this service the Good Guys of the Internet would track you down and get your provider to take you down, or that provider would in turn get disconnected from the internet.

Now they hide behind Cloudflare who will refuse to turn over any information so that security folks can get them taken down. Unfortunately Cloudflare has grown so large that we can't just block all of it or depeer them like we would any other network that provided services to bad actors.

jltsiren
0 replies
12h16m

That kind of vigilante justice is part of the general lawlessness.

Most of the listed domain names are under US jurisdiction. That means the authorities should be able to take them down. If Cloudflare is found to have been knowingly enabling crime, it could face fines, and the CEO and other key people could end up in prison. The Cloudflare services have probably been paid using means that are under US jurisdiction. Those payment accounts can be closed and the people behind them tracked down and potentially charged with crimes.

Or at least that's how things work in the real world. The internet is still apparently too new for the authorities to understand how to deal with it.

greyface-
0 replies
3h53m

It's true that you can't practically block Cloudflare without impacting legitimate users, but they can absolutely be depeered if you're willing to pay a higher transit bill.

marginalia_nu
0 replies
16h15m

The added element of international relations makes it a fair bit trickier than any real-world equivalent. Usually these operate out of places that are not on good relations with the countries they target. Russia and China are the big ones.

swyx
0 replies
17h3m

TIL. that's shocking. i doubt it’s intentional but “institutions will preserve the problem to which they are the solution”. no need to ascribe to malice that which can be blamed on simple incentives (and of course it's a big problem, things fall thru the cracks, etc etc)

jsheard
0 replies
21h31m

DDoSGuard has a reputation for being The Crime CDN, disproportionately serving things like phishing campaigns, black hat forums, piracy sites, etc, so the fact that they are merely the second most popular CDN amongst DDOS providers after Cloudflare speaks volumes.

jsheard
3 replies
21h45m

It's obvious why a DDOS provider would want to use Cloudflare, but their point is that Cloudflare turns a blind eye to DDOS providers using their services. Actively helping to keep DDOS providers online while also selling DDOS mitigation isn't a good look to say the least.

coretx
2 replies
19h9m

Cloudflare is a data goldmine set up by people who love fedoras and newspapers. Professional DDOS providers won't use Cloudflare ever and have the skills, metal and (human) network to do everything in-house.

jart
1 replies
15h1m

Yeah you're actually worse off using Cloudflare because you can't block attacker IPs anymore, once you're dependent on them to protect you, and they're not very good at protecting. I run an online service that invites hackers to DDOS the server. Cloudflare's servers would usually go down before we did. The only way we could stay online was by switching to GCS and using token buckets to blackhole IPs in the raw prerouting table, which made the hostile packets into mighty Google's problem. Thankfully they don't charge for ingress, so it was about as cheap as Cloudflare too.
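
The comment does not show the actual setup, so the following is only a minimal sketch of the token-bucket idea it mentions, not jart's implementation. Each source IP gets a bucket that refills at a fixed rate; a packet is allowed only while a token is left, and anything beyond that would be dropped. The function name `token_bucket_filter`, the `epoch_seconds ip` input format, and the rate and burst values are all assumptions for illustration; a real deployment would enforce this in the kernel's packet filter rather than in a script.

```shell
# token_bucket_filter RATE BURST
# Hypothetical helper: reads "epoch_seconds ip" lines on stdin and prints
# "ALLOW ip" or "DROP ip" for each, keeping one token bucket per IP.
token_bucket_filter() {
  awk -v rate="$1" -v burst="$2" '
    {
      now = $1; ip = $2
      # A new IP starts with a full bucket.
      if (!(ip in tokens)) { tokens[ip] = burst; last[ip] = now }
      # Refill in proportion to elapsed time, capped at the burst size.
      tokens[ip] += (now - last[ip]) * rate
      if (tokens[ip] > burst) tokens[ip] = burst
      last[ip] = now
      # Spend one token per packet; with none left the packet is dropped.
      if (tokens[ip] >= 1) { tokens[ip] -= 1; print "ALLOW", ip }
      else                 { print "DROP", ip }
    }'
}
```

With a rate of 1 token per second and a burst of 2, three packets from the same address in the same second would come out as ALLOW, ALLOW, DROP, and a fourth packet one second later would be allowed again.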

fch42
0 replies
2h37m

You mention the key feature for ddos (self-)protection - zero ingress fees. Non-Availability hurts you in harder-to-quantify terms than a bill for bandwidth used.

Zero ingress puts the upfront bandwidth cost onto the attacker. Because... you actually may succeed to defend and stay up. Their success is not guaranteed, they might be shouting into the void.

Attack success (as in, "impact on you") is guaranteed if your ingress is chargeable.

dclowd9901
0 replies
13h7m

And you probably think glass repairers don’t drive around freeways at night dusting gravel.

ziddoap
5 replies
22h4m

What makes that curious?

hnbad
3 replies
19h51m

While I think there might be valid arguments to support that claim, that blog post hardly qualifies. The author runs a gambling site and while the way Cloudflare handled the situation (according to the author) could certainly be improved, they clearly were affecting other users by "tainting" shared IPs.

jimmydorry
2 replies
17h24m

And yet if they ponied up the money, that issue of "tainting" shared IPs suddenly goes away. You can bet CloudFlare would graciously give the gambling site as much time as they need to bring their own IP (they went out of their way to link third party sellers of IPs with dubious provenance, after all).

hnbad
1 replies
8h55m

Did you read the blog post? It doesn't include the entire correspondence so it's not clear how explicit Cloudflare was about this but the Enterprise plan they were trying to upsell them includes BYOIP. It's clear to me that Cloudflare insisted they buy the enterprise plan because it includes BYOIP.

So in other words, Cloudflare noticed the author was running a gambling site, they decided that this was negatively impacting the shared IPs and the author would therefore need to upgrade to a plan that included BYOIP because they would need to use that feature to continue using Cloudflare and they likely insisted on prepayment for the annual plan because gambling sites have a reputation for being flaky and prepaying would have demonstrated the liquidity necessary to continue operating the site at that plan.

Again, Cloudflare could have communicated this better (and maybe they did in parts of the correspondence the author didn't share) but this all seems perfectly understandable, especially given how the sales team kept referencing Trust and Safety (implying the alternative is ending the contract for violating the ToS).

The issue of tainting shared IPs would indeed have suddenly gone away had the author brought their own IP (which would have required an Enterprise plan to do while staying on Cloudflare). Instead the author feigns ignorance arguing they don't even need the features of the Enterprise plan and doesn't acknowledge the issue with sharing IPs while sheepishly mentioning that maybe they're accidentally evading bans of their domain in certain countries by having alternative domains which they of course don't actually need because most traffic comes from their main domain yet somehow having these alternative domains is critical to running their business.

What are you even trying to argue here? The author is being deliberately dishonest in how they frame the incident and Cloudflare's motivation is perfectly understandable. The only thing to take offense with is the communication style which we can only judge based on a select few messages the author shows us. We have to rely on their word after they have already demonstrated dishonesty.

NineStarPoint
0 replies
2h47m

To me it’s similar to the whole “SSO wall of shame” thing, where a vital feature is locked behind more expensive pricing. As said in the article:

“We tried saying that we don't need any number of the 14 features that are included”

Which, to me, is the crux of the issue. Is it fair for Cloudflare to say “You are breaking the terms of service if you do not change your set up in this specific way, and also the way you need to change your setup is locked behind a significantly more expensive pricing.” Being able to bring your own IP does not, to me, seem like something that should require a plan that is orders of magnitude more expensive than the standard. It seems much more to me like something that is more fundamental, and should be included as an option in a lesser version of the product. Maybe I’m wrong, and there is actually significant overhead to Cloudflare for letting customers bring an IP. But as is, it feels very much to me like a situation where something vital was locked at the most expensive tier to force certain kinds of customer to pay more.

DougN7
16 replies
1d1h

Maybe there is something damning on there that someone needs kept quiet for a while?

textfiles
14 replies
1d

No.

sirtaj
12 replies
1d

The user you're responding to is Jason Scott of TIA.

matsemann
3 replies
21h58m

What's the significance of that?

(Googling "Jason Scott TIA" gives me "Dr Jason Scott is a Senior Research Fellow in the Tasmanian Institute of Agriculture" which doesn't explain much to me)

ziddoap
0 replies
21h55m

The beauty of acronyms/initialisms that people are too lazy to spell out!

TIA = The Internet Archive (i.e. the victim of the DDoS).

The user you're responding to is Jason Scott of The Internet Archive

gumby
0 replies
15h56m

The Tasmanian Institute of Agriculture are well known for their work on biological models of computer security architecture.

I am shocked that any HN reader could be ignorant of this fact. Their director is a (controversial) Turing Award winner.

ai_what
3 replies
1d

Oh sorry, I guess that makes it a very detailed and well thought-out response.

Brian_K_White
1 replies
23h52m

It does, yes. The single word, from that source, on this topic, communicates all relevant information.

textfiles
0 replies
5h27m

Hackernews, it never disappoints.

humanfromearth9
1 replies
13h11m

Shallow dismissal anyway, even if he was the Supreme Majestic King of New Americania. He might further explain his answer. And I'm truly sorry for the DDoS happening to this guy's organisation!

sirtaj
0 replies
3h13m

What is there to explain further?

dspillett
0 replies
4h20m

Which, without context that wasn't given, I certainly didn't know, so it isn't safe to expect others to know either. Are we supposed to dig into people's profiles to derive relevant context?

Freak_NL
0 replies
1d

That would have been an excellent addition to that comment.

pcdoodle
0 replies
23h41m

Makes sense: Large media outlets don't like their old BS stories staying accessible. I've seen it used as an accountability tool.

dxdm
5 replies
1d1h

Maybe it's a form of advertising certain capabilities and services.

neilv
4 replies
1d

IIUC, that's always a good theory for unexplained DDoS. Though, even if they have only profit motivations, I'm a little surprised when they don't seem to let ideology influence their selection of targets for demos.

For the sake of argument (maybe not true), let's say that all techies are aware of archive.org, and consider it beneficial, probably using it themselves.

Why don't they instead demo against a target that will be proof of capability, and one that someone won't pay them to do (no freebies), yet one that they perceive as bad or deserving in some way?

Probably improper to suggest "better" targets here, but I really wonder what's going on when some relative do-gooder gets attacked.

Similarly, ransomware attack on a children's hospital, of all places? Doesn't that get you uninvited to criminal mastermind dinner parties?

As Omar of "The Wire" told us, a man's gotta have a code.

aniviacat
1 replies
22h54m

Perhaps cruelty is the point.

Perhaps they intentionally attack targets that are generally seen in a positive light, to prove to potential customers that morality is not an issue.

Oh, you want me to DDOS a children's hospital? No problem.

jachee
0 replies
21h54m

Thus is the power of The Dark Side^W^W^W late-stage capitalism.

fmajid
0 replies
23h1m

Extortion is usually the motive. “Nice porn/gambling/crypto website you’ve got here. Shame if something happened to it”.

floam
0 replies
23h40m

One thing to keep in mind about LockBit ransomware is that it was SaaS (errr, RaaS), and there is a good chance the target was picked by an insider there, or it was at least some opportunistic hacker not really associated with those who provided the service, besides signing up as an affiliate.

LockBit was so successful partly because they didn’t have to hack anyone themselves. It was basically something advertised “Got SSH or RDP access? Let’s make a bunch of money.”

This attracted hackers who might not trust themselves to do the extortion part safely, as well as people who didn’t actually hack anything but hated their boss, wanted a payday.

krapp
2 replies
22h0m

Seriously? People do this shit for fun. There used to be a program (LOIC) popular on 4chan used for DDoS attacks all the time, it's the origin of the "firin mah lazer" meme.

krapp
0 replies
20h28m

I stand corrected.

codexon
2 replies
22h47m

It's probably because someone saved incriminating evidence on it and they refused to take it down.

jimbobthrowawy
1 replies
19h6m

If it's on their own site, that wouldn't be a problem. IIRC, archive.org stops serving pages that later appear in a robots.txt file.

codexon
0 replies
18h26m

It could be on a website they don't control like twitter.

swinglock
0 replies
22h44m

Are you upset? Can't do nothing about it? It even made a headline or even just a thread on a forum? That's reason enough for some. It could easily be a teenager with no better excuse than not having a fully developed brain and no better reason than liking to ruin things. Having seen how much that happens, I guess it's more likely than a conspiracy or a crime with any rationality behind it.

prmoustache
0 replies
22h51m

My wild guess is most of them are run by companies offering DDoS mitigation services.

jauntywundrkind
0 replies
22h7m

There are some very bad very shitty people about, just trying to make earth worse.

Npm has been under pretty severe attack for ~6 weeks now. I forget who else.

The scariest thing to me is what we might do in the face of persistent online attacks. If this stuff gets rolled up into western nations rolling back privacy & liberty? That's a theonion.com "bin laden plan to sit back and enjoy collapse" situation. Freak out & let cyber security paranoia reign & destroy free communication & connection.

banish-m4
0 replies
16h55m

Al-shabaab, Boko Haram, China, Russia, DPRK, basement dweller, or third-party offensive hackers.

Uptrenda
0 replies
18h24m

It's probably just some kid with a botnet that's showing off. You all give these people way too much credit, lmao.

Joel_Mckay
0 replies
1d1h

My thoughts exactly, what is the point of attacking a library... so lame... =/

OutOfHere
27 replies
22h45m

What are the ways to manage a DDoS attack, preferably using open source? Don't say Cloudflare because they're an extortionist firm.

jsyang00
7 replies
22h24m

Cloudflare is not "an extortionist firm". It is a large tech company, where occasionally teams employ shitty sales tactics to meet their numbers, but generally provides a valuable service and acts reasonably ethically.

There are open source tools to mitigate DDoS, but all of them will have some marginal cost to run, and they will all be significantly worse than Cloudflare as they benefit from neither Cloudflare's data moat nor its scale.

OutOfHere
5 replies
20h18m

No, thanks. Cloudflare acts ethically only as long as it suits them. It is the pre-exploitation phase to lure a customer. We are not fools here. The report at https://news.ycombinator.com/item?id=40481808 says it all.

Secondly, considering Cloudflare would MITM all traffic, it would make a good data source for the NSA, thereby violating all user privacy.

Capricorn2481
3 replies
10h20m

Secondly, considering Cloudflare would MITM all traffic, it would make a good data source for the NSA, thereby violating all user privacy.

This seems like a weak argument. Should we just take down anything widely accessed because it might be used by the NSA? What about AWS?

OutOfHere
1 replies
2h29m

Is AWS providing DDoS mitigation services now, coupled with MITM access to user traffic?

bozey07
0 replies
2h19m

They're the man at the end, actually. No extortion necessary.

cess11
0 replies
3h25m

Yes, that's pretty much the view in Schrems II from the European Court of Justice. The CLOUD Act does not respect data protection rights.

bdlowery
0 replies
10h50m

Nah. Just don’t do anything illegal and you’re good

ornornor
0 replies
4h18m

Cloudflare is not "an extortionist firm". It is a large tech company, where occasionally teams employ shitty sales tactics to meet their numbers, but generally provides a valuable service and acts reasonably ethically.

Nice.

Capricorn2481
6 replies
22h25m

I have asked this a few times and never gotten an answer beyond "One day they could turn evil." What is the reason Cloudflare is an extortionist firm? I am way more concerned about Amazon than Cloudflare.

ClassyJacket
2 replies
18h26m

"What is the reason Cloudflare is an extortionist firm?"

Because they are a publicly traded company

OutOfHere
0 replies
16h18m

Nailed it. The implications are saddening.

Capricorn2481
0 replies
10h18m

So is Amazon.

tripletao
1 replies
21h39m

Beyond the upselling under duress, I've also seen complaints that Cloudflare protects the client-facing websites of DDoS-as-service operators. This enables them to sell their service, which then creates demand for Cloudflare's service from their targets.

Cloudflare describes that policy as a commitment to content neutrality rather than extortion, and I think that's more or less sincere (since they've protected many other unpopular sites that didn't give them such a benefit, with a few high-profile exceptions). It does work out very conveniently for them, though.

fortran77
0 replies
17h20m

Cloudflare describes that policy as a commitment to content neutrality rather than extortion

But we know that's not true. Point out problems with a very controversial blogger and they'll cancel your service.

emporas
4 replies
19h24m

The only universal way, IMHO, is to attach a small cost to every internet request. The cost has to be as small as possible, but it has to be there so that over millions, billions, or trillions of requests it adds up and makes it uneconomical to continue the attack past some point.

It is the same problem with email spam. What's stopping someone from sending billions of spam mails?

If we suppose that a blockchain exists which is fast enough, cheap enough, and spread out enough across the globe (to mitigate latency), then there is no reason for a TCP packet not to carry with it a small money transaction, on the order of a millionth of a cent. Information gets served back only when the transaction is confirmed.

In that way, any request with no transaction gets discarded, and only requests with a small cost pass through. Suddenly, sending requests one after another with no end in sight starts to cost money, for DDoS attacks and mail spam alike. It is the serving of requests that makes DDoS attacks and mail spam effective.

The problem however, is that no blockchain is fast enough and cheap enough as of today. But there will be one in a handful of years.
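
To get a rough feel for the proposed fee, here is some back-of-the-envelope arithmetic using the figure from the comment, a millionth of a cent per request. The function name `fee_in_cents` and the two request counts are illustrative assumptions; shell arithmetic is integer-only, so anything under one cent is truncated to 0.

```shell
# fee_in_cents REQUESTS: total fee at one millionth of a cent per request
# (integer division, so amounts below one cent print as 0).
fee_in_cents() {
  echo $(( $1 / 1000000 ))
}

fee_in_cents 10000        # assumed heavy day for one human user: prints 0
fee_in_cents 1000000000   # assumed billion-request flood: prints 1000 (cents)
```

So at that price a billion requests would cost about ten dollars.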

0xcde4c3db
1 replies
18h51m

Similar systems were proposed in the late '90s/early 2000s (hashcash/micropayments) to combat spam. The big problem isn't a technological one, it's that it presupposes some "sweet spot" price (negligible for legitimate users, yet prohibitive for abusers) that has never been shown to exist in reality.

(also, you're arguably just moving the problem to DDoSing the payment processing / firewall mechanism)

emporas
0 replies
14h21m

Similar systems were proposed in the late '90s/early 2000s (hashcash/micropayments) to combat spam.

These ideas have indeed existed for decades.

The big problem isn't a technological one, it's that it presupposes some "sweet spot" price (negligible for legitimate users, yet prohibitive for abusers) that has never been shown to exist in reality.

Advances in technology, software and hardware, make it easier and easier for that sweet spot to exist. That sweet spot certainly didn't exist in the past, but we are close right now.

One example that I think is useful here is aluminum cans for fizzy drinks. Aluminum, a strong metal compared to cardboard or plastic or glass, is better at withstanding pressurized gases without exploding. The downside is that it's more expensive. Once manufacturing prices had dropped a lot, it became feasible to drink half a liter of liquid and just throw away the metal. Aluminum is still not free, but the small price was worth it. It is also a huge waste of energy to smelt all that metal and throw it away after 10 minutes of drinking, but it is economically viable.

One could manufacture Titanium cans, and drink even more fizzy drinks. But that's not economically viable as of today.

you're arguably just moving the problem to DDoSing the payment processing / firewall mechanism.

Yes, the problem is moved elsewhere; that's the weak link in the scheme I described. The thing is that a flood of transactions still costs money. Blockchains cannot be flooded just with requests, they have to be flooded with transactions. Take a look at the article [1], which outlines some ideas. I don't agree with a lot of things in there, but it states the problem and gives some numbers.

The theory when it comes to blockchains deterring DDoS attacks (and other kinds of attacks) is that there are no bad guys in general, just rational economic actors who use dirty tricks. When a dirty trick starts to cost money, and profit disappears from an attack, then the rational economic actor will stop the attack. A bad guy would resume the attack regardless of profit, but that's one of the axioms of the theory: that there are no bad guys.

[1] https://www.dlnews.com/articles/defi/ddos-attacks-are-an-inc...

jimbobthrowawy
0 replies
18h53m

That would assume the cost is borne by the attacker, and not every smart thermostat in their botnet.

Analemma_
0 replies
17h42m

This wouldn’t fix anything. Most DDoS attacks today are amplification attacks, e.g. “I sent 10 bytes to this unpatched NTP server and as a result it sends 500 kb to this target server”, so in your scheme the costs would not be borne by the attacker.
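
The ratio in that example can be worked out directly. This sketch uses the comment's own illustrative figures and assumes "500 kb" means 500,000 bytes; `amplification_factor` is a made-up helper name.

```shell
# amplification_factor REQUEST_BYTES RESPONSE_BYTES: bytes delivered to the
# target for every byte the attacker sends (integer division).
amplification_factor() {
  echo $(( $2 / $1 ))
}

amplification_factor 10 500000   # prints 50000
```

Under a per-request fee, the attacker in this example would pay for one small request while the reflector's much larger reply does the damage.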

robertakarobin
1 replies
22h22m

Aw darn, they are? I was just considering migrating my frontend to them after seeing all the positive reviews. What's the issue?

OutOfHere
0 replies
20h12m

Refer to the report at https://news.ycombinator.com/item?id=40481808

Secondly, considering Cloudflare would MITM all traffic, it would make a good data source for the NSA, thereby violating all user privacy.

remram
0 replies
22h12m

If your application can take it, drop it in the application. If your load balancers can take it, drop it on your load balancers. Otherwise you have to get your provider to drop it, if they can take it. Worst case they'll drop all traffic meant for you to protect the rest of their network.

rcxdude
0 replies
17h58m

It's not (just) a case of the software, it's the hardware and the position in the network. These DDoS attacks primarily just saturate your internet pipe: you need to be able to co-ordinate with core ISPs to block the DDoS traffic before it concentrates too much.

overstay8930
0 replies
20h53m

Plenty of DDoS mitigation firms use open source tech, but that’s only step one of mitigation. Most normal firms will never be able to stop a DDoS attack unless someone else with a lot of resources tanks the attack for them.

Even if you go all out and buy a bunch of huge IP transit links, you are not gonna be able to stop the IXP 800 miles away from getting congested and blocking your customers from accessing your site anyways. You need access to a backbone to route traffic differently to avoid those kinds of issues, which is why DDoS scrubbing services will partner with a T1 ISP to do most of the work.

brokenmachine
0 replies
19h13m

Proof of work gateways and really annoying captchas.
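
As a rough illustration of what a proof-of-work gateway asks of a client, here is a hashcash-style sketch: the server hands out a challenge, the client searches for a nonce whose SHA-256 hash starts with a given prefix, and the server checks the answer with a single hash. The function names, the `challenge:nonce` format, and the prefix length are assumptions for illustration, and the sketch relies on `sha256sum` from GNU coreutils being installed.

```shell
# pow_hash CHALLENGE NONCE: hex SHA-256 of "CHALLENGE:NONCE"
pow_hash() {
  printf '%s:%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

# pow_solve CHALLENGE PREFIX: client side, brute-forces nonces from 0 upward
# and prints the first one whose hash starts with PREFIX.
pow_solve() {
  nonce=0
  while :; do
    case "$(pow_hash "$1" "$nonce")" in
      "$2"*) echo "$nonce"; return 0 ;;
    esac
    nonce=$((nonce + 1))
  done
}

# pow_verify CHALLENGE NONCE PREFIX: server side, one hash; exit 0 if valid.
pow_verify() {
  case "$(pow_hash "$1" "$2")" in
    "$3"*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Each extra hex digit in the prefix makes solving about 16 times slower,
# while verification stays a single hash.
nonce=$(pow_solve "example-challenge" 00)
pow_verify "example-challenge" "$nonce" 00 && echo "accepted"
```

The asymmetry is the point: the client does the searching, and the gateway only pays for one hash per request it checks.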

Joe_Cool
0 replies
21h49m

HAProxy + DDOS protection? https://www.haproxy.com/blog/application-layer-ddos-attack-p...

Or any proof of work proxy that delays the ingress traffic. If you only have one server there is very little you can do except maybe redirect to a static page or kill the DNS entries.

Lammy
19 replies
19h34m

This is why I’ve gotten into the habit of maintaining my own WWW archive of sites I find interesting. Probably have around 1 TiB now, and One Of These Days I’d like to set my network up so it can serve arbitrary sites directly from local archive to revive any site I want.

I have a `wget-mirror` shell function invoking wget with all the trimmings that takes care of 99% of sites. I’ll edit the full command into this comment when I get home if anybody else wants to start doing the same :)

Lammy
4 replies
16h50m

Missed the edit window, but here's the command I use. Newlines added here for clarity.

  # Usage: wget-mirror URL  (mirrors the site into the current directory)
  wget-mirror() {
    wget --mirror --convert-links --adjust-extension --page-requisites \
    --no-parent --content-disposition --content-on-error \
    --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
    --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0" \
    --restrict-file-names="windows,nocontrol" -e robots=off --no-check-certificate \
    --no-hsts --retry-connrefused --retry-on-host-error --reject-regex=".*\/\/\/.*" "$1"
  }
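
For anyone who would rather not hit servers at full speed, a gentler variant might look like the sketch below. It uses wget's `--wait`, `--random-wait`, and `--waitretry` options; the function name `wget_mirror_polite` and the specific delay values are assumptions, not part of the original command, and only a subset of the original flags is repeated here for brevity.

```shell
# Hypothetical polite variant of the mirror command: pauses between requests.
wget_mirror_polite() {
  # --wait=1       wait about one second between retrievals
  # --random-wait  vary that delay so requests are less regular
  # --waitretry=10 back off for up to ten seconds between retries
  wget --mirror --convert-links --adjust-extension --page-requisites \
    --no-parent \
    --wait=1 --random-wait --waitretry=10 \
    "$1"
}
```

Usage would be the same as the original, e.g. `wget_mirror_polite https://example.com/`, at the cost of a much slower mirror on large sites.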

Some notes:

— This command hits servers as fast as possible. Not sorry. I have encountered a very small number of sites-I-care-to-mirror that have any sort of mitigation for this. The only site I'm IP banned from right now is http://elm-chan.org/ and that's just because I haven't cared to power-cycle my ISP box or bother with VPN. If you want to be a better neighbor than me, look into wget's `--wait`/`--waitretry`/`--random-wait`.

— The only part of this I'm actively unhappy with is the fixed version number in my fake User-Agent string. I go in and increment it to whatever version's current every once in a while. I am tempted to try automating it with an additional call to `date` assuming a six-week major-version cadence.

— The `--reject-regex` is a hack to work around lots of CMS I've encountered where it's possible to build up links with an infinite number of path separators, e.g. an `www.example.com///whatever` containing a link to `www.example.com////whatever` containing a link to…

— I am using wget1 aka wget. There is a wget2 project, but last time I looked into it wget2 did not support something I needed. I don't remember what that something was lol

— I have avoided WARC because I usually prefer the ergonomics of having separate files and because WARC seems more focused on use cases where one does multiple archives over time (as is the case for Wayback Machine or a search engine) where my archiving style is more one-and-done. I don't tend to back up sites that are actively changing/maintained.

— However I do like to wrap my mirrored files in a store-only Zip archive when there are a great number of mostly-identical pages, like for web forums. I back up to a ZFS dataset with ZSTD compression, and the space savings can be quite substantial for certain sites. A TAR compresses just as well, but a `zip -0` will have a central directory that makes it much easier to browse later.
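
A store-only archive like the one described in the last note can be made with Info-ZIP's `zip -0` (no compression) plus `-r` to recurse into the directory. This is only a sketch; the helper name `store_only_zip` and the `.store.zip` suffix are assumptions.

```shell
# store_only_zip DIR: wrap DIR in an uncompressed Zip archive, leaving the
# actual compression to the filesystem (for example ZFS with zstd).
store_only_zip() {
  zip -0 -r -q "$1.store.zip" "$1"
}
```

After something like `store_only_zip preserve.mactech.com`, the archive's central directory can be listed with `unzip -l preserve.mactech.com.store.zip` without unpacking anything.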

Here is an example of the file usage for http://preserve.mactech.com with separate files vs plain TAR vs DEFLATE Zip archive vs store-only Zip archive. These are all on the same ZSTD-compressed dataset and the DEFLATE example is here to show why one would want store-only when fs-level compression is enabled.

  982M    preserve.mactech.com.deflate.zip
  408M    preserve.mactech.com.store.zip
  410M    preserve.mactech.com.tar
  3.8G    preserve.mactech.com

Also I lied and don't have a full TiB yet ;)

  [lammy@popola#WWW] zfs list spinthedisc/Backups/WWW
  NAME                      USED  AVAIL     REFER  MOUNTPOINT
  spinthedisc/Backups/WWW   772G   299G      772G  /spinthedisc/Backups/WWW


  [lammy@popola#WWW] zfs get compression spinthedisc/Backups/WWW
  NAME                     PROPERTY     VALUE           SOURCE
  spinthedisc/Backups/WWW  compression  zstd            local



  [lammy@popola#WWW] ls 
  Academic                        DIY                             Medicine                        SA
  Animals                         Doujin                          Military                        Science
  Anime                           Electronics                     most_wanted.txt                 Space
  Appliance                       Fantasy                         Movies                          Sports
  Architecture                    Food                            Music                           Survivalism
  Art                             Games                           Personal                        Theology
  Books                           History                         Philosophy                      too_big_for_old_hdds.txt
  Business                        Hobby                           Photography                     Toys
  Cars                            Humor                           Politics                        Transportation
  Cartoons                        Kids                            Publications                    Travel
  Celebrity                       LGBT                            Radio                           Webcomics
  Communities                     Literature                      Railroad
  Computers                       Media                           README.txt


Some of this could stand to be re-organized. Since I've gotten more into it I've gotten better at anticipating an ideal directory depth/specificity at archive time instead of trying to come back to them later. Like `DIY` (i.e. home improvement) there should go into `Hobby` which did not exist at the time, `SA` (SomethingAwful) should go into `Communities` which did not exist at the time, `Cars` into `Transportation`, etc.

`Personal` is the directory that's been hardest to sort because personal sites are one of my fav things to back up but also one of the hardest things to try and organize when they reflect diverse interests. For now I've settled on a hybrid approach. If a site is geared toward one particular interest or subculture, it gets sorted into `Personal/<Interest>`, like `Academics`, `Authors`, `Artists`, `Goth` (loads of '90s goths had web pages for some reason). Sites reflecting The Style At The Time might get sorted into `1990s` for a blinking-construction-GIF Tripod/Angelfire site or `2000s` for an early blog. Sometimes I sort personal sites by generation like `GenX` or `Boomer` (said in a loving way; Boomers did nothing wrong) when they reflect interests more typical of one particular generation.
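A rough sketch of how the re-sorting mentioned above could be planned before touching anything on disk. The mapping only covers the three examples named in this comment, and `plan_moves` is a hypothetical helper, not part of any existing tool; it just computes source and destination paths so they can be reviewed first.

```python
from pathlib import PurePosixPath

# Hypothetical mapping built from the examples in the comment:
# the directory on the left would move under the parent on the right.
MOVES = {
    "DIY": "Hobby",
    "SA": "Communities",
    "Cars": "Transportation",
}


def plan_moves(root, moves):
    """Return (source, destination) path pairs without touching the disk."""
    root = PurePosixPath(root)
    return [
        (root / name, root / parent / name)
        for name, parent in sorted(moves.items())
    ]


if __name__ == "__main__":
    # Dry run: print the moves so they can be checked by eye.
    for src, dst in plan_moves("/spinthedisc/Backups/WWW", MOVES):
        print(f"mv {src} {dst}")
```

Printing the plan rather than executing it keeps the step reversible; the actual move can be done by hand once the list looks right.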

Adj_and_Styles
1 replies
11h40m

Maybe save the log automatically? And then check and report unresolved errors, either at the end of the function or, better, in a separate one so the log can be reinspected any time.

I have encountered "GnuTLS: The TLS connection was non-properly terminated. Unable to establish SSL connection." multiple times, and retry options seem to be useless when that happens. Some searches suggest it could be related to TLS handshake fragmentation, but nonetheless wget could retry if the related options are used. A manual retry seems to download the missing URLs; otherwise mirroring jobs are randomly incomplete.
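A minimal sketch of the "check and report" step, assuming the log was written with wget's `-o` option in its default (non-quiet) format. The line patterns below are assumptions about that format, so adjust them to whatever your own logs actually contain; `unresolved_urls` is a hypothetical helper name.

```python
import re

# Assumed shape of a request line, with an optional "(try: N)" on retries.
REQUEST_LINE = re.compile(
    r"^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(?:\(try:\s*\d+\)\s+)?(\S+)"
)
# Assumed markers for failure and success; extend as needed.
ERROR_MARKERS = ("Unable to establish SSL connection", "GnuTLS:", "ERROR ")
SUCCESS_MARKER = "saved ["


def unresolved_urls(log_text):
    """Map each URL that failed and was never saved later to its last error line."""
    failed = {}
    current = None
    for line in log_text.splitlines():
        match = REQUEST_LINE.match(line)
        if match:
            current = match.group(1)
        elif current is None:
            continue
        elif SUCCESS_MARKER in line:
            # A later successful attempt clears an earlier failure.
            failed.pop(current, None)
        elif any(marker in line for marker in ERROR_MARKERS):
            failed[current] = line.strip()
    return failed
```

Feeding the contents of a saved log to `unresolved_urls` gives a list of URLs to hand back to wget for a manual retry, which matches the observation that a second attempt usually succeeds.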

shanemhansen
0 replies
4h1m

It's weirdly specific but I remember old versions of Go caused that error. The final packet (close_notify) to close the connection was set with the wrong error level.

razodactyl
0 replies
15h35m

Wow. Only 772GB. Way under 1TB. Liar!!

IntoEmptySpace
0 replies
13h58m

This is great, thanks for sharing with that additional context.

JKCalhoun
3 replies
18h51m

Same. I've been scraping PDF'ed magazines, etc. and keeping them locally. In addition to feeding my byte-hoarding tendencies, I like the idea I could be off-grid in my van/RV somewhere and reading a "Popular Electronics" magazine from 1972 on my laptop.

(Oh, never mind YouTube videos that I once added to playlists ... that later disappear leaving only holes in my playlists.)

jasonfarnon
2 replies
18h17m

My problem with this approach is that the stuff I want to look at in 10 yrs time is never the stuff I think of saving right now. In the 2000s there were browser extensions I've forgotten the names of (shelf? slogger?) that would automatically save local copies of every webpage on page load. But I don't think they're around anymore and have no idea how you could achieve similar functionality with dynamic pages anyway.

rightbyte
1 replies
18h9m

But I don't think they're around anymore and have no idea how you could achieve similar functionality with dynamic pages anyway.

It is probably easiest to save the render as a picture and then store text separately for searchability?
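For the "store text separately" half, a small sketch using only Python's standard library. It assumes you already have the rendered HTML of the page as a string (getting that out of a dynamic page is the hard part and is not covered here); `TextExtractor` and `page_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Return the page's text as one space-joined string for indexing."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The resulting string can be stored next to the screenshot and fed to whatever search tool you prefer.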

disqard
0 replies
17h39m

There's a way to get "the best of both worlds" (tbh, works most of the time): print to pdf.

swyx
2 replies
17h6m

yes please. extra credit for anyone who shares instructions on how to inject this into every website i browse sans blocklist

password4321
1 replies
16h23m

Keeping a local, searchable record of all web browsing comes up every few months here, but it took me a while to find a lengthier discussion like this one from 2022: https://news.ycombinator.com/item?id=31848210

swyx
0 replies
4h52m

incredible linkfinding. you deserve 10 upvotes.

1vuio0pswjnm7
2 replies
13h57m

Assume everyone is familiar with this project, dating back to 1996:

https://en.wikipedia.org/wiki/WWWOFFLE

https://ftp.netbsd.org/pub/pkgsrc/distfiles/wwwoffle-2.9j.tg...

The way the www is going, it seems like downloading a copy of libgen, i.e., nonfiction books, and scimag, i.e., academic journals, via torrent, would be more valuable than archiving websites, in general. These primary sources are part of the material used to train so-called "AI" anyway. The problem is that this so-called "AI" also includes all the garbage from the www.

Worst case is eventually these books and journals will again become publicly inaccessible but "AI" will be offered as a bogus substitute; a future where few people will do research using primary materials anymore, they will just submit questions to a remote "AI" server. Truth will be decimated.

oredbored
0 replies
3h21m

Do we know for sure that they trained on data from libgen etc? It's such a powerful source of information you'd assume they must have, although they would never admit it. There must be a way to test if they have, via enquiring about some niche information only found in certain books.

dkz999
0 replies
12h23m

We already kind of see this with search.

nox101
1 replies
15h1m

how is this a solution? The Archive performs a valuable service. They're collecting way more of the internet than you are (I assume) so when that thing you didn't back up today is not available in 10yrs it's more likely to be on the archive.

I donate to The Archive. More people should too.

Lammy
0 replies
14h22m

I don't know why you're treating them as mutually exclusive. Single points of failure are as bad when it comes to organizations as they are with anything else. Internet Archive (the org) could stop existing with the flick of a pen. I don't think “Let Somebody Else Do It” is a healthy attitude to take, and I'm going to keep doing what I'm doing.

Plus for as great of a service as Wayback Machine is, it can be very unpleasant to actually browse. I dislike how it injects its own toolbar into every page (yes I know how to massage the URLs to get the raw page data, but it isn't browsable that way). Have you never encountered sites in Wayback Machine where certain pages were just randomly not archived? Or when you click a link and get a page from years earlier or later than the one you came from? Never encountered a page or an entire domain that was blocked from Wayback Machine? Why do you think I would get started doing something like this in the first place if I didn't find it more fun to browse my own archives than Somebody Else's?

jmholla
1 replies
12h36m

I’ll edit the full command into this comment when I get home if anybody else wants to start doing the same :)

I would love that. I have a little four-parameter version, but I feel yours is more tried and true.

freitzkriesler2
16 replies
1d1h

This is like an arsonist lighting an orphanage or library on fire. Why would you do something like that?

jedberg
10 replies
1d

To sell your services as an arsonist. Being able to point to a big successful attack helps professional DDoSers sell their services.

aprilthird2021
5 replies
23h39m

But wouldn't they need to telegraph the attack in advance to customers? Taking credit after the fact is risky as many competitors will also take credit

jedberg
4 replies
22h42m

Yes, that is part of it. They tell potential clients, "On May 27th, I'm going to take down the Internet Archive". Then they do it, and then go back to their clients and say, "now that you've seen my work, do you want to pay me?".

aprilthird2021
2 replies
20h18m

Then wouldn't it make more sense to take down a target with few eyes on it? Since you're not paid, why deal with the risk of an attack that will make the news?

jedberg
1 replies
20h8m

Making the news is the goal. As long as a few people can verify it was you, word will get around about the person who can take down big targets, and will cite the news articles as part of the proof.

aprilthird2021
0 replies
18h52m

Isn't word getting out about you bad though? As it puts you in the spotlight of law enforcement?

hnthrowaway0328
0 replies
21h58m

I wonder where we can find a middleman for that kind of service. Criminal groups would be pretty stupid not paying a middleman to do the checks and filtering.

ysofunny
2 replies
1d

more cynically, to sell fire-safety insurance

freitzkriesler2
1 replies
19h20m

I always wondered if the NSA and Cloudflare do stuff like this with websites not behind Cloudflare's umbrella.

"Make 'em an offer they can't refuse."

kmeisthax
0 replies
17h21m

Most DDoS-for-hire services are on Cloudflare and they refuse to drop them. You're at least in the right ZIP code.

DoctorOetker
0 replies
9h51m

I don't understand, what prevents them from running to your competition if you don't mark your brand?

mandibles
1 replies
22h59m

Some people just want to watch the world burn.

freitzkriesler2
0 replies
21h19m

Precisely. No honor amongst thieves.

karma_pharmer
1 replies
13h53m

Because the orphanage refuses to use cloudflare.

freitzkriesler2
0 replies
5h37m

Precisely this. There's a reason Cloudflare was part of the CIA's incubator. Totally organic growth and didn't have government and spook hands all over it /s.

Tao3300
0 replies
22h40m

Probably no good reason. In technical terms, some asshole is dicking around.

tetris11
10 replies
1d1h

Who's their biggest enemy at the moment

btbuildem
6 replies
1d

Paywalled sites?

ysofunny
5 replies
1d

let me go further: the whole of the copyrighted industry

including all media conglomerates (obviously) and all scientific, literary, etc, publishing houses.

also, there's a global war, so it well may be a fog-of-war technique or, like somebody else also mentioned, someone needs something to stay quiet for a little bit as part of some larger operation

genidoi
3 replies
20h4m

Doubt this is coordinated - more likely a singular (m/b)illionaire wanted a post/photo/video, or multiple of, deleted for good, perhaps for suppression of legal evidence, and this was one way of bringing some firepower to a… library. One of the internet's biggest libraries too. Odd.

brokenmachine
2 replies
19h4m

But a ddos doesn't remove the page...

ysofunny
1 replies
18h13m

it may prevent its capture in the first place, and it can also prevent consulting the page (a delay tactic)

brokenmachine
0 replies
18h9m

Good points.

markus_zhang
0 replies
23h14m

The establishment always gets the most advanced technology, attack and defense because they have the big bucks. That's why I never believed that technological advancement promotes individualism, distributed X (whatever X is, money, power, whatever). Eventually it always points to a more centralized world because the elites are able to control more with each technological advancement.

some_furry
1 replies
1d

That's a tough question to answer without devolving into politics, which is off topic for Hacker News.

I think that's also the wrong question to ask. "Who's doing it?" is less interesting than "What's enabling them to succeed?"

slater
0 replies
23h24m

Politics isn't fully off-topic on HN; per guidelines, most political stuff is off-topic, "unless they're evidence of some interesting new phenomenon"

https://news.ycombinator.com/newsguidelines.html

kristopolous
0 replies
20h5m

Anyone who doesn't like the availability and accessibility of history and documents.

Lots of people want to rewrite or erase history.

Quoting a story I wrote about this a few years ago:

"Everything you speak, all ideas, all things, all thoughts, they are all of the past. Society and knowledge is a composite of the shadows of former presents.

When people lie or misrepresent knowledge they speak of a past they wish to change.

What if people who have the most to gain from deceit had a tool to actually change the past and make these lies the truth?"

Here it is if you're curious https://kristopolous.medium.com/stephen-hawking-had-a-time-t...

pmarreck
9 replies
1d

Is there any way to know who is responsible?

pixelpoet
8 replies
23h39m

I think "follow the money" is a decent heuristic here. Why else would anyone do it?

markus_zhang
6 replies
23h8m

Is it possible for archive.org to trace back the IP addresses? I assume maybe the attackers used a lot of IoT devices or VMs in cloud?

fmajid
5 replies
23h3m

Usually botnets, sometimes amplification attacks that abuse NTP or DNS servers. The Chinese government’s Great Firewall also has an offensive capability known as the Great Cannon, though it has generally been used against GitHub because it hosts censorship-circumvention software like VPNs.

markus_zhang
4 replies
23h1m

Are botnets usually hosted on personal computers or servers or IoT? I'm thinking maybe archive.org can block a whole range of IPs if needed.

0xcde4c3db
3 replies
22h16m

Resistance to that kind of simple countermeasure is exactly what distinguishes a DDoS attack from a non-distributed DoS attack. The traffic basically comes from "everywhere". Not literally every IP block and route, but widespread enough that it's difficult to separate from legitimate users without actually processing the traffic (which is what you're trying to avoid by e.g. blocking an IP range).
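A toy illustration of that point, using Python's standard `ipaddress` module. The addresses are from the reserved documentation ranges and the function names are hypothetical; the idea is only to show that when sources are spread out, each one lands in a different prefix, and blocking those prefixes also cuts off legitimate users who share them.

```python
import ipaddress
from collections import Counter


def prefix_counts(addresses, prefix_len=16):
    """Count source addresses per covering IPv4 network of the given length."""
    counts = Counter()
    for addr in addresses:
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts


def collateral(blocked_prefixes, legit_addresses):
    """Return the legitimate addresses that the blocked prefixes would also cut off."""
    nets = [ipaddress.ip_network(prefix) for prefix in blocked_prefixes]
    return [
        addr for addr in legit_addresses
        if any(ipaddress.ip_address(addr) in net for net in nets)
    ]


if __name__ == "__main__":
    # Made-up example traffic, not real attack data.
    attack = ["192.0.2.10", "198.51.100.7", "203.0.113.5"]
    legit = ["192.0.2.200", "198.51.100.9"]
    counts = prefix_counts(attack)
    print(len(counts), "prefixes for", len(attack), "attack sources")
    print("legitimate users also blocked:", collateral(counts, legit))
```

With real botnet traffic the list of prefixes would be far longer, which is why range blocking tends to be either ineffective or very costly in blocked legitimate users.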

hnthrowaway0328
1 replies
21h52m

Thanks. And I assume they mostly come from friendly countries which makes it even harder to block?

This is indeed very tough to resist.

ziml77
0 replies
17h15m

Yes. This is why it's maddening that most people don't take computer security seriously. Virus infected devices are what give these botnets their scale and wide distribution.

pmarreck
0 replies
16h10m

So compile a long list of compromised source IPs and just block those upstream?

fortran77
0 replies
17h17m

There are many individuals with embarrassing things on the Internet. Perhaps one of the recent university encampment participants wants to get an internship and doesn't want prospective employers to see articles about them screaming and yelling. I think the odds that a major publisher is behind this are slim....

Uptrenda
5 replies
18h13m

If anyone from the archive.org team reads this: love the website, by the way. You've saved so much rare content, it's really awesome.

swyx
4 replies
17h2m

would a decentralized internet archive make sense? impossible to ddos.

immibis
2 replies
11h18m

That exists and it's called Arweave. You even collect imaginary brownie points for maintaining the archive.

swyx
1 replies
1h17m

Arweave

looks a bit more broad than i wanted. as is its like a thin wrapper for IPFS.

immibis
0 replies
15m

It's a fake brownie point scheme (i.e. cryptocurrency) to let the people who archive the most stuff decide what goes into the archive. There's also IPFS but that has absolutely no way to decide what gets archived.

Uptrenda
0 replies
15h49m

I'd love to work on something like this. But the internet archive themselves don't seem too interested in it I guess.

whatsakandr
3 replies
1d1h

Clicking this link isn't helping them with their current situation. You'd just be contributing to the ddos.

mttpgn
0 replies
1d1h

The link directs to an announcement via "toot" on mastodon.archive.org

aaomidi
0 replies
1d1h

Nope. This should have no impact on the rest of what they’re doing.

DaSHacka
0 replies
1d1h

Don't worry, you can visit it through the wayback machine instead!

oh wait

bhk
2 replies
1d1h

Cui bono?

niek_pas
1 replies
1d1h

The guy from U2, I think

MobiusHorizons
0 replies
1d

lol, thanks for the good laugh :)

markus_zhang
1 replies
23h10m

How much $$ does archive.org spend on infra and such? How much does one need to endure the most damaging DDOS? I remember seeing from somewhere that Google went through some huge DDOS attack without going down.

Given its benefit to lay people, I recommend that everyone who uses their services give a small amount once in a while. I already did so, but if not for family issues I'd donate way more.

mike_d
0 replies
21h28m

DDoS attacks are usually volumetric attacks: send more bits than the pipe the website has to the internet can carry.

To combat this you need to buy enough pipes to the internet for your regular internet traffic, as well as an extra 500 Gbps or so. That is a lot of unused bandwidth to be paying for every month. Then once the packets arrive at your datacenter you still need to buy dedicated appliances to scrub out the bad and let the good flow.

Google is constantly under attack, but their normal daily traffic volume (multiple Tbps) is large enough that just the extra capacity they keep on hand to deal with traffic spikes like the World Cup or a popular YouTube video is larger than what most attackers can muster.
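The arithmetic behind this is simple enough to sketch. The 500 Gbps allowance is the figure from above; the 20 Gbps of normal traffic in the example is a made-up number for illustration, not anything known about archive.org, and both function names are hypothetical.

```python
def link_saturated(capacity_gbps, normal_gbps, attack_gbps):
    """True when normal plus attack traffic exceeds the uplink capacity."""
    return normal_gbps + attack_gbps > capacity_gbps


def required_capacity(normal_gbps, headroom_gbps=500):
    """Capacity needed to carry normal traffic plus a fixed attack allowance."""
    return normal_gbps + headroom_gbps


if __name__ == "__main__":
    normal = 20  # hypothetical everyday traffic, in Gbps
    print(required_capacity(normal))         # capacity to provision, in Gbps
    print(link_saturated(100, normal, 500))  # a 100 Gbps uplink under a 500 Gbps flood
```

The gap between everyday traffic and the provisioned capacity is the part that sits idle most of the time but still has to be paid for.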

swyx
0 replies
17h0m

its probably fine now but maaaybe not great policy when ddos is involved

1vuio0pswjnm7
1 replies
20h38m

archive-it.org is still working

sans_souse
0 replies
20h4m

It's those damn record labels throwing a cog in the wheel, isn't it

rzr
0 replies
1d1h

so is purl.org

pockmockchock
0 replies
13h57m

they are trying to make their case for BGP regulation.

pixxel
0 replies
1d1h

Sorry to say, archive.org is under a ddos attack. The data is not affected, but most services are unavailable. We are working on it & will post updates in comments.

keploy
0 replies
2h50m

People who want to get rid of data history should be considered the enemy of humanity. I hope the archive is fine after all this.

anthk
0 replies
5h13m

It's where I get all my PD movies, among Jamendo's dump for music albums.

This is really bad for CC media.

amatecha
0 replies
19h6m

Seems to be back online. For now, anyway!

https://mastodon.archive.org/@brewsterkahle/1125141764988452...

  @internetarchive is back up!
  this is a back-and-forth with attackers.  Sees weekends and holidays are popular.
  we made adjustments, but we will see.
  at least Happy Memorial Day!

Animats
0 replies
14h8m

Seems to have stopped or been stopped.

1vuio0pswjnm7
0 replies
19h33m

1716850657

Internet Archive is now working for me