
Abusing Go's Infrastructure

miki123211
29 replies
1d3h

Any online service that lets users upload material that is then publicly visible will eventually be used for command-and-control, copyright infringement and hosting CSAM. This is especially true for services that have other important uses besides file hosting and hence are hard to block.

This already happened to Twitter[1], Telegram[2], and even the PGP key infrastructure[3], not to mention obvious suspects like GitHub.

[1] https://pentestlab.blog/2017/09/26/command-and-control-twitt... [2] https://www.blazeinfosec.com/post/leveraging-telegram-as-a-c... [3] https://torrentfreak.com/openpgp-keyservers-now-store-irremo...

liquidgecka
25 replies
1d1h

And Gmail and Google groups, and Google drive, and Gchat, on and on. The data you store doesn't even have to be public. With Gmail they would distribute credentials to log in and read attachments that they uploaded via imap.

(I am a former Google SAD-SRE [Spam, Abuse, Delivery])

zepolen
14 replies
1d1h

Question, how would you know without invading the user's privacy?

kccqzy
5 replies
1d

An algorithm that processes private user data is by itself not invading anyone's privacy. It's clear to me that invasion of privacy only happens when humans look at private user data directly, or look at user data that's not sufficiently processed by an algorithm.

Otherwise, something as simple as a spell checker would be an invasion of privacy because it literally looks at every word in an email you write. That's absurd.

_heimdall
2 replies
1d

At least in my opinion, there's a big difference with where the data lives and where the checking algorithm is run. I don't think a spell checker would fall into what I'd consider a privacy concern as long as the spell checker is running locally on my device.

imachine1980_
1 replies
22h28m

I don't work in the area of email nor Google but I see two problems.

1) You need to constantly update the spell checker, so each time you click "add to dictionary" or the like, that data is most likely sent back; the data itself is part of the problem. I assume Google does something similar with mail that users mark as spam or not spam. That's full email collection and analysis, not partial like in old word processors.

2) I feel AI makes this even harder: you can no longer just check simple patterns, you need to check the whole content constantly.

_heimdall
0 replies
15h42m

We've had spell/grammar checkers in word processors that worked totally offline for a long time now. They definitely can be improved with a hosted service but that's by no means necessary and comes with tradeoffs like latency and offline support.

kortilla
0 replies
12h39m

An algorithm that denies service, changes ad behavior, etc based on user content is definitely invading privacy compared to your spell checker case.

The spell checker would also be a massive privacy invasion if it flagged users based on the content of what they wrote.

ants_everywhere
0 replies
16h12m

If an algorithm is looking through private stuff and making a decision based on it or is sending signals where the signal depends on the private stuff, then it's pretty much by definition leaking private information.

An algorithm that leaked no private information would not be useful to a business. It would do a bunch of computation and then throw it away. So realistically anything that looks at private information is privacy-relevant.

That includes even just the email headers. To quote the former head of the NSA "We Kill People Based on Metadata" https://abcnews.go.com/blogs/headlines/2014/05/ex-nsa-chief-...

You can have debates about how much private information should be leaked and for what purposes. But I don't think having a threshold like "it's all private unless another human reads it" is a good way to think about the issue.

carom
3 replies
23h6m

Companies are legally obligated to scan for CSAM in the US.

toast0
2 replies
21h59m

I don't think that's accurate... Do you have a link?

I do think there is an obligation to report if any is found, but I don't think they need to look.

j16sdiz
0 replies
17h10m

I don't think there's a hard legal requirement to scan, just some law around what to do once materials are known, and some executive arrangements.

_trampeltier
2 replies
19h20m

I think there was a case where several people logged in to the same Gmail account and shared data not by sending mails, just by writing and reading drafts.

liquidgecka
0 replies
36m

Yep. And it would split uploads across dozens of accounts with parity, so that if any account was disabled it could re-create the data from what was in the other accounts (think BitTorrent using IMAP-uploaded content in Gmail).
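The parity idea described above can be sketched in a few lines. This is a hypothetical reconstruction of the general technique, not the actual tool's scheme: data is striped into n shards plus one XOR-parity shard, so any single lost shard (a disabled account) can be rebuilt from the survivors.

```python
# Single-parity striping sketch (hypothetical; names are illustrative).
# Losing any ONE shard is recoverable because parity = XOR of all data shards.

def split_with_parity(data: bytes, n: int) -> list[bytes]:
    """Split data into n equal-size shards (zero-padded) plus a parity shard."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(n * size, b"\0")
    shards = [padded[i * size:(i + 1) * size] for i in range(n)]
    parity = bytearray(size)
    for shard in shards:
        for i, b in enumerate(shard):
            parity[i] ^= b
    return shards + [bytes(parity)]

def recover(shards: list) -> list[bytes]:
    """Rebuild the one shard marked None by XORing all surviving shards."""
    missing = shards.index(None)
    size = len(next(s for s in shards if s is not None))
    rebuilt = bytearray(size)
    for s in shards:
        if s is not None:
            for i, b in enumerate(s):
                rebuilt[i] ^= b
    out = list(shards)
    out[missing] = bytes(rebuilt)
    return out
```

With more parity shards (Reed-Solomon style, as BitTorrent-like tools use), multiple simultaneous account losses survive; single XOR parity is the minimal version of the idea.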

bottom999mottob
0 replies
15h55m

You might be thinking of the General David Petraeus case, a national security leak that was slightly worse than Snowden's, but with few repercussions :)

liquidgecka
0 replies
42m

Pre-AI we had a system that watched user patterns and would identify possibly suspect patterns that were outside of the norm. We also had a system that would content-ID the images and attachments to see what was being uploaded, in a general way. Given enough suspicion, the account would be opened to look for abusive patterns.

There is absolutely no promise on any cloud-hosted service that a human will never see your data. However, at Google it was made very, very, VERY clear that if we had to scan somebody's personal email for any reason, then discussing the contents outside of legally mandated or work-required channels would lead to immediate termination and possibly a lawsuit for any damages to reputation incurred.

While fixing user accounts or dealing with delivery of content I saw epic piles of personal email. Besides the ones full of CSAM or other abusive material, I couldn't say that I ever remembered the contents 30 minutes later. It's like a checker at a grocery store: they don't care about whatever embarrassing things you're buying and won't remember you 10 minutes later. =)

howenterprisey
3 replies
1d1h

Just curious, "Delivery" doesn't seem to be the same sort of thing as "Spam" and "Abuse": why are the three grouped?

romafirst3
0 replies
1d

Delivery is what happens if it’s not spam or abuse.

liquidgecka
0 replies
48m

I was apparently not watching this well enough, sorry for the delayed response.

Delivery was because we ran the SMTP and queuing infrastructure at the time. We started as Gmail SRE, then split some of the delivery and abuse services out into their own team (SAD), then SAD got SRE, hence SAD-SRE =)

fred256
0 replies
1d1h

No inside information, but presumably this means Delivery to other organizations, which, among other things, includes maintaining outbound IP reputation, which is closely related to Spam and Abuse.

gloryjulio
3 replies
1d

Just a side note, I found the name sad sre funny and blursed at the same time

cdelsolar
2 replies
22h45m

Whats’ blursed mean?

zarathustreal
0 replies
44m

What does “Whats’” mean?

jprete
0 replies
22h43m

Simultaneously blessed and cursed.

follower
1 replies
21h37m

I am a former Google SAD-SRE

From long enough ago that I should apologize to you for libgmail: https://libgmail.sourceforge.net ? :D

liquidgecka
0 replies
38m

libgmail was the least of our problems. There was a Polish software team that wrote a bittorrent layer on top of Gmail. That thing was a pain in the butt as they constantly improved it to get around abuse filters and such. Plus it had parity bits so if we killed accounts it would just re-replicate the data to new accounts.. That software was devilish and impressive at the exact same time. =)

weinzierl
1 replies
1d

Not sure if it has already happened, but the not so obvious one is HuggingFace.

nerdponx
0 replies
1d

It seems like it would be pretty easy to use PyPI for this, because packages can contain arbitrary non-Python files. And you can also do things like base64-encode your files into strings in Python code.

8organicbits
12 replies
1d4h

I know pypi has some non-python projects as well. Python needs the ability to distribute wheels, which can contain compiled binaries, as the user may not be able to compile library code. Lots of that code is written in C, but Golang[1] is also possible. I can't find an example, but I believe I've seen this used for distributing applications (not libraries) as well. It's kinda cool to write some app in C, upload it to PyPI, and then ask users to install it with `pip install`.

[1] https://github.com/popatam/gopy_build_wheel_example

mort96
4 replies
1d

I guess that was much more useful as a use case before pip started requiring you to be in a venv/virtualenv/pipenv/pyenv/whatever to download packages

12_throw_away
3 replies
22h49m

I've never encountered this requirement in many years of daily use - pip for me has always happily installed anything if it can.

Now, I've definitely seen customized distributions of Python from package managers that have taken steps to prevent you from using pip. IIRC, the Python you get from `apt-get install python` on Debian does this? I.e., it's designed to support system utilities, not as a user's general-purpose Python environment, and they want `apt-get` to control this environment, not pip. So they've removed pip, ensurepip, and easy_install from your core system Python environment.

TLDR: In my experience, that requirement doesn't come from pip, it's your distro taking steps to prevent https://xkcd.com/1987/

thayne
0 replies
20h59m

I'm not sure if that is upstream or an Ubuntu or Debian patch, but that is the case on Ubuntu 24.04, at least unless you pass the --break-system-packages option.

mort96
0 replies
8h17m

https://p.mort.coffee/DEq.png

This happens on Ubuntu, Debian, Fedora and macOS (Homebrew). I'm pretty sure it's just a core part of pip these days.

rfoo
3 replies
1d4h

pip install cmake

or even proprietary binaries, pip install nvidia-cudnn-cu12

IshKebab
2 replies
1d2h

Yeah I copied CMake's idea of using PyPI and I also use it to distribute some pure Rust CLI tools using Maturin. It works really well. Pip is... well it's about on par with most other package managers, i.e. not great, not terrible, but it has some pretty huge advantages over any other software distribution method on Linux:

* Very likely to be installed already on Linux and probably Mac too.

* Doesn't require root to install. You can even have isolated installs via pyenv.

* I don't have to ask anyone's permission to publish a package.

* I only have to make one package.

If anyone can think of a better option I'm all ears, but until then I'm fairly happy with this hack.

Too
1 replies
1d

Some of those arguments are getting weaker and weaker as pip and distros push for the use of venvs, and you now need a scary --break-system-packages argument if you use the preinstalled launcher.

IshKebab
0 replies
22h46m

That is a good point. Distro package managers have somehow screwed this up too.

bee_rider
1 replies
1d4h

Hypothetically if they did try to add some requirement to use Python, people could just comply maliciously by providing the most minimal stub of Python code, right? Linux, but ls is written in Python. So it is probably better just to not play games.

Maxatar
0 replies
1d2h

You could embed the binary data in a Python string and then have the installer dump that string to a file.

blt
0 replies
19h4m

I've been using PyPI a lot recently for non-Python stuff such as FFmpeg and Eigen. Part of the reason why I've been able to ditch Homebrew entirely!

ithkuil
6 replies
1d3h

I toyed with the idea of piggybacking on (i.e. abusing) the golang proxy and sumdb to have a free transparent log of checksums of arbitrary URLs

https://getsum.pub/
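To make the piggybacking concrete, here is a sketch of how a Go-toolchain-style client forms a checksum-database lookup URL (the part of the protocol such a scheme rides on). One real wrinkle: Go escapes uppercase letters in module paths as "!" plus the lowercase letter, so case-insensitive hosts can't collide two modules.

```python
# Sketch of the sum.golang.org lookup URL construction used by the go command.

def escape_path(path: str) -> str:
    """Apply Go's module path escaping: each uppercase letter becomes '!' + lowercase."""
    return "".join("!" + ch.lower() if ch.isupper() else ch for ch in path)

def lookup_url(module: str, version: str,
               base: str = "https://sum.golang.org") -> str:
    """URL the go command fetches to obtain signed go.sum lines for a module."""
    return f"{base}/lookup/{escape_path(module)}@{version}"
```

Fetching that URL returns the module's go.sum lines plus a signed tree head, which is what makes the log usable as a free transparency service for anything you can dress up as a module.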

skybrian
2 replies
1d1h

Interesting! Looks like it's being used by some npm packages [1] and soon homebrew will be using it [2]. Any other interesting usage?

As a user, the npm usage doesn't seem very prominent. On an npm package's web page, there's a checkmark next to the version number on the right side that I hadn't paid any attention to before, with more information at the very bottom of the page. Here's an example. [3]

[1] https://blog.sigstore.dev/npm-provenance-ga/ [2] https://blog.sigstore.dev/homebrew-build-provenance/ [3] https://www.npmjs.com/package/fast-check

mynameisvlad
1 replies
1d

It’s at the bottom of the page on mobile. On desktop, that’s the first thing on the right hand side of the screen IIRC.

skybrian
0 replies
1d

For me on desktop, the version seems to be the fourth thing down in the right column, under weekly downloads, and there's a checkmark. (Or maybe I'm missing something.)

lpapez
0 replies
1d2h

Sure, but the gosum database is a critical piece of worldwide software infrastructure, so you can count on it being accessible behind many firewalls and always up. And it's completely free and anonymous.

Perfect for the purpose.

ithkuil
0 replies
1d2h

Yeah, when I did that there was no public Rekor instance run by the sigstore project, so I chose the only available public transparency log I could bend to my needs (x509 transparency logs were an alternative, but that would quickly hit rate limits at ACME providers).

erik_seaberg
5 replies
23h52m

W3C laid the groundwork for everything on the Web to be heavily cacheable, so it's weird that there are so few general-purpose proxy caches. Are publishers sending short "Cache-Control: max-age" or "Vary: Cookie" responses when they didn't need to? Are too many ISPs paying for transit rather than peering?

lmz
4 replies
22h4m

In general there's no way to ensure the cache hasn't tampered with the contents (e.g. ISP proxy ad injection on non HTTPS sites). For software downloads usually there are signatures and checksums. Arbitrary content, not so much.

iainmerrick
1 replies
19h44m

Even if the content is signed, there’s still the issue that the proxy gets to see everything you read, right?

arccy
0 replies
10h45m

that's the point of a proxy that can share contents between clients...

verdverm
4 replies
1d3h

CUE's module system is finally rolling out: MVS like Go's, but built on OCI infra. If you are interested in dependency management systems, here are some links

- proposal: https://github.com/cue-lang/proposal/tree/main/designs/modul...

- custom registry: https://cuelang.org/docs/tutorial/working-with-a-custom-modu...

- road map: https://github.com/orgs/cue-lang/projects/10/views/8

- in 0.9.0-alpha-5, modules become enabled by default: https://github.com/cue-lang/cue/releases/tag/v0.9.0-alpha.5

For Go Sum, the Trillian project backs the transparency log: https://github.com/google/trillian

CUE plans to piggyback on the OCI options with attestations and such

dgellow
3 replies
1d1h

What does that have to do with the linked article?

verdverm
2 replies
1d1h

The CUE team worked with the Go team on the module system. From these interactions, and community input, they decided against using a proxy like Go has. The "exploit" in the article was one of the reasons they made this decision, and chose to use OCI registries instead. The V1 proposal actually proposed using the same Go proxy servers as a stopgap, which received significant pushback from the community (I was probably the loudest voice against the idea). The Go team was supportive at the time, but this would have been exactly what OP talks about, having non-Go projects in the proxy/sumdb.

So CUE's module design can be seen as an evolution on Go's, building on the good parts while addressing some of the shortcomings.

Fun fact, CUE started as a fork of Go, mainly for the internal compiler tooling and packages

infogulch
1 replies
19h19m

One thing that the Go module system solves that seems to be unaddressed in CUE's design based on OCI is the sum database / transparency log.

I could add a "Statement that we might wish to make for a module M" to the "Module contents assurance" section:

- The content of module M is the same content that everyone else sees for the same `$path@$version`.

Though I guess users can utilize existing solutions like https://github.com/sigstore/cosign or rekor (mentioned elsewhere itt).

9dev
4 replies
1d3h

Off topic: took a look at the domain, had a foreboding on the innuendo, found mostly what I expected on put.as …

yazzku
0 replies
1d2h

Not off-topic at all; came here just for this.

moonlion_eth
0 replies
1d2h

I made the mistake of doing that at work

mdtrooper
0 replies
21h20m

Yes, it is a Spanish plural word. But...

KolmogorovComp
0 replies
1d2h

https://put.as/ Mildly NSFW.

yjftsjthsd-h
1 replies
1d4h

That fix would help with accidents, but wouldn't someone intentionally doing it just add a .mod and a .go file to the root?
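For reference, the bar really is that low. A hedged sketch of roughly the minimal repo contents the tooling would accept as a Go module (the module path and package name here are made up):

```go
// go.mod -- any syntactically valid module directive suffices
module example.com/anything

go 1.21
```

```go
// stub.go -- one compilable source file makes the repo "a Go module"
package anything
```

Everything else in the repo just comes along for the ride in the module zip.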

jerf
0 replies
1d4h

How do you "fix" that at all?

In the end, there is no definition of "a source control repository that is a Go module" that is robust to this sort of "attack"... although calling it an "attack" is kind of dubious; the reasons why this is a bad thing strike me as very strained and relatively weak. Mostly it hurts Google by hosting too much stuff, but good luck bringing them down that way.

oooyay
0 replies
1d2h

Color me unsurprised Marwan is on this issue. He and Aaron wrote Athens, Marwan wrote (to my knowledge) the first Go download protocol implementation that Athens is based on.

This issue is kind of curious because Athens already uses the go mod download -json command mentioned as a preflight check for module verification. More or less, if the repo passes the go mod command's understanding of a module then Athens will serve it. In more concrete terms:

- a module version, pseudo version, or +incompatible must be able to be formulated

- that module (and its dependencies) must produce a valid checksum

The checksum of a module just covers the current .mod file and all the module's files, recursively for each dependency. So, as the author pointed out, you have lots of space for arbitrary files by design, so long as you have a basic Go program.
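The "h1:" checksums in go.sum are just a hash over the file list, which is why arbitrary extra files pass verification fine as long as they're present when the hash is first recorded. A sketch of the scheme (mirroring, as I understand it, golang.org/x/mod/sumdb/dirhash.Hash1; not the official implementation):

```python
# Sketch of Go's "h1:" directory hash: sha256 each file, build sorted
# "hexdigest  name\n" lines, sha256 the concatenation, base64 the digest.
# File names are "module@version/relative/path" in real module zips.
import base64
import hashlib

def dirhash_h1(files: dict) -> str:
    """Compute an h1-style hash over a {name: contents-bytes} mapping."""
    h = hashlib.sha256()
    for name in sorted(files):
        file_digest = hashlib.sha256(files[name]).hexdigest()
        h.update(f"{file_digest}  {name}\n".encode())
    return "h1:" + base64.b64encode(h.digest()).decode()
```

Note the hash says nothing about what the files *are*; a repo full of arbitrary blobs hashes just as happily as one full of Go source.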

IshKebab
2 replies
1d2h

Maybe I'm being stupid but what exactly is the issue here? It's probably a bit wasteful of the proxy to cache non-Go repos, but even if it didn't you could make it store arbitrary data just by having it cache a Go repo surely? Sounds like a complete non-issue unless I've missed something.

gnfargbl
0 replies
1d

I don't think you've missed anything. The news here appears to be that an unsecured public proxy is willing to proxy things and make them available to the public in an unsecured fashion.

The article does make the point that some monitored networks might trust golang proxy URLs more than arbitrary web URLs and that this could be used for bypassing reputation filters etc -- but there are already several ways to do that, and this one doesn't seem particularly special.

arandomusername
0 replies
1d1h

you're right.

palata
1 replies
1d

That's maybe naive, but... how is that different than just pushing files to e.g. a GitHub repository? Is it just the fact that you need to create an account for GitHub? Because I can store arbitrary data there, too. Without the 500M limit...

withinboredom
0 replies
22h56m

GitHub has some pretty stringent rate limits for anon requests.

kyrra
1 replies
1d4h

Googler, opinions are my own. I know nothing about this space.

I would hope the Go team collaborated with GCP and Drive, as hosting malicious files is something Google has to deal with all the time. This isn't much different from other endpoints Google already allows people to put random data on.

vineyardmike
0 replies
13h7m

I would hope the Go team collaborated with GCP and Drive

Former Googler, I know nothing about the Go Dev Tools team, but Google collaborates in this way better than almost any massive company I've worked at or heard about from close friends.

Google is really good at having a central team manage infrastructure and share it across the company. As long as it's not a messenger app. Surely (pure guessing) the Go team is using the internal blob store, and I think there are some internal-infra teams that handle abuse and file scanning automatically.