Any online service that lets users upload material that is then publicly visible will eventually be used for command-and-control, copyright infringement and hosting CSAM. This is especially true for services that have other important uses besides file hosting and hence are hard to block.
This already happened to Twitter[1], Telegram[2], and even the PGP key infrastructure[3], not to mention obvious suspects like GitHub.
[1] https://pentestlab.blog/2017/09/26/command-and-control-twitt... [2] https://www.blazeinfosec.com/post/leveraging-telegram-as-a-c... [3] https://torrentfreak.com/openpgp-keyservers-now-store-irremo...
And Gmail, and Google Groups, and Google Drive, and Gchat, on and on. The data you store doesn't even have to be public: with Gmail, they would distribute credentials to log in and read attachments that had been uploaded via IMAP.
(I am a former Google SAD-SRE [Spam, Abuse, Delivery])
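For anyone curious about the mechanics of that IMAP trick, here's a minimal sketch of the reader's side of the dead-drop pattern, using Python's standard imaplib (the credentials and address are made up, and the real tooling was obviously more elaborate):

    import email
    import imaplib

    # Hypothetical shared credentials, distributed out-of-band.
    USER, PASSWORD = "dropbox.account@example.com", "shared-secret"

    # Log in to the shared mailbox over IMAP and open the inbox.
    conn = imaplib.IMAP4_SSL("imap.gmail.com")
    conn.login(USER, PASSWORD)
    conn.select("INBOX")

    # Walk every message and pull out its attachments.
    _, nums = conn.search(None, "ALL")
    for num in nums[0].split():
        _, msg_data = conn.fetch(num, "(RFC822)")
        msg = email.message_from_bytes(msg_data[0][1])
        for part in msg.walk():
            if part.get_filename():  # any part with a filename is an attachment
                payload = part.get_payload(decode=True)  # the smuggled bytes
    conn.logout()

Note that nothing ever has to be sent anywhere; self-addressed mail (or drafts, as mentioned elsewhere in this thread) in a shared account works just as well.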
Question: how would you know without invading the user's privacy?
An algorithm that processes private user data is by itself not invading anyone's privacy. It's clear to me that invasion of privacy only happens when humans look at private user data directly, or look at user data that's not sufficiently processed by an algorithm.
Otherwise, something as simple as a spell checker would be an invasion of privacy because it literally looks at every word in an email you write. That's absurd.
At least in my opinion, there's a big difference between where the data lives and where the checking algorithm runs. I don't think a spell checker falls into what I'd consider a privacy concern as long as it's running locally on my device.
I don't work in the area of email nor Google but I see two problems.
1) You need to constantly update the spell checker, so each time you mark something as a valid word, that feedback most likely gets sent back; the feedback is part of the data. I assume Google does something similar with mail marked as spam or not-spam. That's full email collection and analysis, not partial like old word processors.
2) I feel AI makes this even harder: you can no longer simply check patterns the way you could before; you need to check the whole content constantly.
We've had spell/grammar checkers in word processors that worked totally offline for a long time now. They definitely can be improved with a hosted service but that's by no means necessary and comes with tradeoffs like latency and offline support.
An algorithm that denies service, changes ad behavior, etc based on user content is definitely invading privacy compared to your spell checker case.
The spell checker would also be a massive privacy invasion if it flagged users based on the content of what they wrote.
If an algorithm is looking through private stuff and making a decision based on it or is sending signals where the signal depends on the private stuff, then it's pretty much by definition leaking private information.
An algorithm that leaked no private information would not be useful to a business. It would do a bunch of computation and then throw it away. So realistically anything that looks at private information is privacy-relevant.
That includes even just the email headers. To quote the former head of the NSA "We Kill People Based on Metadata" https://abcnews.go.com/blogs/headlines/2014/05/ex-nsa-chief-...
You can have debates about how much private information should be leaked and for what purposes. But I don't think having a threshold like "it's all private unless another human reads it" is a good way to think about the issue.
Companies are legally obligated to scan for CSAM in the US.
I don't think that's accurate... Do you have a link?
I do think there is an obligation to report if any is found, but I don't think they need to look.
https://www.theguardian.com/technology/2022/aug/22/google-cs...
I don't think that's a hard legal requirement to scan, just a law about what to do once material is known, plus some executive arrangements.
I think there was a case where several people logged into the same Gmail account and shared data not by sending mail, just by writing and reading drafts.
Yep. And it would split uploads across dozens of accounts with parity, so that if any account was disabled it could re-create the data from what was in the other accounts (think BitTorrent using IMAP-uploaded content in Gmail).
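The parity trick is the same idea as RAID. Here's a minimal sketch with a single XOR parity chunk (that's an assumption on my part; I don't know the actual scheme that tool used):

    from functools import reduce

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def split_with_parity(data: bytes, k: int):
        # Split data into k equal chunks plus one XOR parity chunk.
        data = data.ljust(-(-len(data) // k) * k, b"\0")  # pad to a multiple of k
        size = len(data) // k
        chunks = [data[i * size:(i + 1) * size] for i in range(k)]
        return chunks, reduce(xor, chunks)

    def recover(chunks, parity, lost: int) -> bytes:
        # Rebuild the chunk at index `lost` from the survivors plus parity.
        survivors = [c for i, c in enumerate(chunks) if i != lost]
        return reduce(xor, survivors + [parity])

    # Upload each chunk to a different account; if one account is killed:
    chunks, parity = split_with_parity(b"smuggled payload", 4)
    assert recover(chunks, parity, lost=2) == chunks[2]

XOR parity survives losing any one chunk; surviving multiple simultaneous losses needs a real erasure code like Reed-Solomon, which is presumably closer to what they actually did across dozens of accounts.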
You might be thinking of the General David Petraeus case, a national security leak that was slightly worse than Snowden's, but with little repercussions :)
Pre-AI, we had a system that watched user patterns and would identify possibly suspect patterns that were outside of the norm. We also had a system that would content-id the images and attachments to see what was being uploaded in a general way. Given enough suspicion, the account would then be opened to look for abusive patterns.
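No idea what that content-id system looked like internally, but the basic shape is easy to sketch: hash each attachment and match against a database of known-bad hashes. Real systems use perceptual hashes (PhotoDNA-style) that survive re-encoding and cropping; plain SHA-256 is shown here purely for illustration, with a placeholder hash set:

    import hashlib

    # Hypothetical set of hashes of known abusive files (placeholder value).
    KNOWN_BAD = {"0" * 64}

    def content_id(attachment: bytes) -> str:
        # Exact-match ID; production systems use perceptual hashing instead.
        return hashlib.sha256(attachment).hexdigest()

    def is_suspect(attachment: bytes) -> bool:
        return content_id(attachment) in KNOWN_BAD

And as described above, a match alone didn't convict anyone; it just raised suspicion enough to justify a closer look at the account.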
There is absolutely no promise on any cloud-hosted service that a human will never see your data. However, at Google it was made very, very, VERY clear that if we had to scan somebody's personal email for any reason, then discussing the contents outside of legally mandated or work-required channels would lead to immediate termination and a possible lawsuit for any reputational damage incurred.
While fixing user accounts or dealing with delivery of content, I saw epic piles of personal email. Besides the ones full of CSAM or other abusive material, I couldn't say I ever remembered the contents 30 minutes later. It's like a checker at a grocery store: they don't care about whatever embarrassing things you're buying and won't remember you 10 minutes later. =)
Just curious, "Delivery" doesn't seem to be the same sort of thing as "Spam" and "Abuse": why are the three grouped?
Delivery is what happens if it’s not spam or abuse.
I was apparently not watching this well enough, sorry for the delayed response.
Delivery was in there because we ran the SMTP and queuing infrastructure at the time. We started as Gmail SRE, then split some of the delivery and abuse services out into their own team (SAD), then SAD got its own SRE, hence SAD-SRE =)
No inside information, but presumably this means Delivery to other organizations, which, among other things, includes maintaining outbound IP reputation, which is closely related to Spam and Abuse.
Just a side note: I found the name SAD-SRE funny and blursed at the same time.
Whats’ blursed mean?
What does “Whats’” mean?
Simultaneously blessed and cursed.
From long enough ago that I should apologize to you for libgmail: https://libgmail.sourceforge.net ? :D
libgmail was the least of our problems. There was a Polish software team that wrote a BitTorrent layer on top of Gmail. That thing was a pain in the butt, as they constantly improved it to get around abuse filters and such. Plus it had parity bits, so if we killed accounts it would just re-replicate the data to new accounts. That software was devilish and impressive at the exact same time. =)
Not sure if it has already happened, but a not-so-obvious one is HuggingFace.
No idea if it's used for CSAM or malware, but copyright infringement on a massive scale? https://huggingface.co/datasets/EleutherAI/pile
It seems like it would be pretty easy to use PyPI for this, because packages can contain arbitrary non-Python files. And you can also do things like base64-encode your files into strings in Python code.
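As a toy illustration (the package and payload names are made up), the base64 route is only a few lines:

    import base64

    # Packager side: turn an arbitrary file into an innocuous-looking constant.
    with open("payload.bin", "rb") as f:
        print(f'DATA = "{base64.b64encode(f.read()).decode()}"')

    # What ships inside the package, e.g. in evil_pkg/_assets.py:
    DATA = "aGVsbG8sIHdvcmxk"  # hypothetical smuggled payload

    # Consumer side, after `pip install evil-pkg`:
    payload = base64.b64decode(DATA)

Nothing about it looks anomalous to a cursory review, either: plenty of legitimate packages embed base64 blobs (icons, test fixtures, certificates).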