
Show HN: 1-FPS encrypted screen sharing for introverts

vngzs
30 replies
1d

Good job releasing your project! It's a cool idea and surprisingly minimalist. That said, I've found a number of cryptographic flaws in the application source. This should not be used in instances where the encryption is mission-critical.

1) You generate a random key [0] and then feed it into PBKDF2 [1] to generate a 32-byte AES-GCM key. If you can generate 32 random bytes instead of 10 reduced-ASCII characters and a key stretch, just do that. PBKDF2 is for turning a password into a key, and it's far from the recommended algorithm nowadays; prefer scrypt if you need to do this sort of thing.
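
The fix for this is a few lines. A hedged sketch in Python (the project itself is Go, where `crypto/rand.Read` plays the same role as Python's `secrets`):

```python
import secrets

# Draw the 32-byte AES-256 key straight from the OS CSPRNG.
# No password, no PBKDF2/scrypt stretch needed: key stretching only
# matters when the input is a low-entropy human-chosen password.
key = secrets.token_bytes(32)
assert len(key) == 32
```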

2) AES-GCM with random 12-byte nonces. Never use random IVs with GCM; this breaks the authentication [2] [3]. Given the pitfalls of AES-GCM with respect to random nonces, you might prefer switching to XSalsa20+Poly1305. The advantage of XSalsa is it has an extended nonce length, so you can use random nonces without fear.

3) Random key derivation with a restricted character set can make brute force attacks easier. You should have a 256-bit random key, and if you want that key to be within a certain character set, then encode the byte output from the CSPRNG using that character set.
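
To make that concrete, here is a hedged Python sketch (the charset and sizes are illustrative, not the project's): rather than truncating the keyspace, draw enough CSPRNG-backed characters that the string still carries a full 256 bits of entropy.

```python
import math
import secrets

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"  # hypothetical URL-safe charset

def random_key_string(bits: int = 256) -> str:
    """A key string carrying at least `bits` of entropy.

    Each character contributes log2(len(ALPHABET)) bits (~5.17 for a
    36-character set), so round the character count up; secrets.choice
    samples from the CSPRNG without modulo bias.
    """
    per_char = math.log2(len(ALPHABET))
    n_chars = math.ceil(bits / per_char)  # 50 characters for 256 bits
    return "".join(secrets.choice(ALPHABET) for _ in range(n_chars))
```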

4) 1fps achieves symmetric key distribution via a URL with a fragment identifier ("#") which IIRC is not sent to the server. Therefore it assumes you have a secure key distribution channel - the link contains the key, so it's important that only the intended recipient can view the part after the "#". If the server is truly malicious, it can deploy client-side Javascript to send the fragment to the server, allowing the server to access the key (and thus cleartext communication).

[0]: https://github.com/1fpsvideo/1fps/blob/main/1fps.go#L99

[1]: https://github.com/1fpsvideo/1fps/blob/main/1fps.go#L287

[2]: https://eprint.iacr.org/2016/475.pdf

[3]: https://soatok.blog/2020/05/13/why-aes-gcm-sucks/

MoonObserver
6 replies
21h25m

Never use random IVs with GCM; this breaks the authentication [2] [3]. Given the pitfalls of AES-GCM with respect to random nonces, you might prefer switching to XSalsa20+Poly1305. The advantage of XSalsa is it has an extended nonce length, so you can use random nonces without fear.

Those papers are a bit over my head. Could you please explain what's wrong with using random IVs here? And what should we do instead (assuming we can only use GCM, and can't switch to ChaCha)?

conradludgate
3 replies
20h37m

There are two issues.

Background: the key+IV define a keystream which is XORed against the message. The same key+IV generate the same keystream. Thus you can XOR two ciphertexts and reveal information about the two plaintexts.

AES-GCM is authenticated encryption. To combat chosen-ciphertext attacks, you want authenticated ciphertexts. AES-GCM specifically is vulnerable to an attack where a reused IV lets you recover the authentication key, allowing you to forge authentication tags and mount a chosen-ciphertext attack.

The solution, if you're stuck with AES, is to switch to XAES-GCM or, better, AES-GCM-SIV. Alternatively, you must use a counter or another checked scheme so IVs are never reused. Since this is in the context of 1fps, you could use a unix timestamp + random bytes to reduce the chance of collisions.
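
A hedged sketch of the timestamp-plus-random construction suggested above (Python for brevity; the field split is one reasonable choice, not the project's):

```python
import os
import struct
import time

def gcm_nonce() -> bytes:
    """96-bit GCM nonce: 4-byte unix timestamp || 8 random bytes.

    The timestamp prefix means random collisions only matter among
    messages sent within the same second, shrinking the birthday
    problem dramatically. (AES-GCM-SIV or an extended-nonce cipher
    remains the more robust option.)
    """
    ts = struct.pack(">I", int(time.time()) & 0xFFFFFFFF)
    return ts + os.urandom(8)
```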

hatsunearu
2 replies
18h40m

Is the statement just that if you use a random value for a nonce rather than some guaranteed never-used-once value, it's possible to get a collision faster than the "natural" block collision complexity (half block size or something like that)?

conradludgate
1 replies
11h6m

It's the birthday attack principle. With only 96 bits, after roughly a billion messages under the same key with random IVs, you start reaching realistic probabilities that you will reuse an IV.

irundebian
0 replies
5h55m

And how will you get a billion messages at 1 frame per second?

jszymborski
0 replies
21h1m

Not an expert, but this is my understanding.

1. It is necessary for nonces to never be reused for a given key, lest you open yourself to a class of attacks that can compromise all messages using that key. This is especially bad for AES-GCM, where nonce reuse also breaks the authentication.

2. AES-GCM uses very small nonces, making the probability of randomly picking the same nonce twice unacceptably high as the number of messages encrypted with a given key increases (as it would with each frame sent by 1fps).

You can avoid all this by using a different primitive with a longer nonce, such as XSalsa20 (a variant of Salsa20 with a 192-bit nonce).

Vegenoid
0 replies
16h58m

In AES-GCM (simplified explanation), an encrypted message takes 3 inputs: plaintext, a symmetric encryption key (generally unique per-session, as in this program), and a 12-byte nonce (a.k.a. IV).

If an attacker intercepts 2 messages that were encrypted with the same key and the same nonce, they can reveal the authentication key used for the generation of authentication tags (auth tags), and they can then forge auth tags for any message. These auth tags are how a recipient verifies that the message was created by someone who knows the symmetric key that was used to encrypt the plaintext, and that it was not altered in transit.

More simply, it allows an attacker to alter the ciphertext of an encrypted message, and then forge an authentication tag so that the modification of the ciphertext could not be detected. It does not reveal the symmetric key that allows decryption of the ciphertext, or encryption of arbitrary plaintext.

If a random nonce is generated, there is a chance that it is the same as a random nonce that was generated earlier in the session. Since the nonce is 12 bytes, this chance is very small for any 2 random nonces (1 in 2^96), but the chance of a collision increases rapidly with the number of encrypted messages sent in a session (see the birthday problem). It still requires a large number of messages to be sent before the chance of a collision becomes significant: "after 2^28 encryptions the probability of a nonce collision will be around 0,2 % ... After 2^33 [encryptions] the probability will be more than 80 %"[0]

If this program is sending 1 message per second (1 FPS), it would take over 8 years for 2^28 messages to be sent. I haven't looked at the code; it may well be sending many more messages than that.
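
Both the quoted percentages and the 8-year figure can be checked with the standard birthday approximation p ≈ 1 − exp(−n²/2d). One caveat: the paper's figures line up with implementations that randomize only 64 of the nonce bits (a case the paper examines); with all 96 bits random, collisions arrive far more slowly. A small sketch:

```python
import math

def collision_probability(n_messages: int, random_bits: int) -> float:
    """Birthday approximation: p ~ 1 - exp(-n^2 / (2 * 2^bits))."""
    return 1.0 - math.exp(-(n_messages ** 2) / (2.0 * 2.0 ** random_bits))

p_64 = collision_probability(2**28, 64)     # ~0.002: the quoted "0,2 %"
p_64_hi = collision_probability(2**33, 64)  # ~0.86: "more than 80 %"
p_96 = collision_probability(2**28, 96)     # far smaller with 96 random bits

years = 2**28 / (3600 * 24 * 365)           # ~8.5 years at 1 message/second
```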

The alternative to a random nonce is a "counter" nonce, which starts at 1 and increments with each message. The potential pitfall of counter nonces is that they can be harder to implement, as they require tracking and updating state. If the program ever fails to track or update this state correctly, nonce reuse will occur. A different counter must be used for each symmetric key (which should be randomly generated for each session).
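
A counter nonce itself is only a few lines; as noted above, the hard part is the state. A hypothetical minimal sketch:

```python
import struct

class CounterNonce:
    """96-bit nonce: 4-byte fixed field || 8-byte big-endian counter.

    Hypothetical sketch only: a real implementation must persist the
    counter across restarts and must never reuse the key once the
    counter state is lost.
    """

    def __init__(self, fixed_field: bytes) -> None:
        if len(fixed_field) != 4:
            raise ValueError("fixed field must be 4 bytes")
        self._fixed = fixed_field
        self._count = 0

    def next_nonce(self) -> bytes:
        self._count += 1
        if self._count >= 2**64:
            raise OverflowError("counter exhausted; rotate the key")
        return self._fixed + struct.pack(">Q", self._count)
```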

EXTRA CREDIT: There is also information revealed about the plaintext of the 2 messages that used the same key and nonce - specifically, the XOR of the 2 plaintexts. While this doesn't directly reveal the plaintexts, if some information about one of the plaintexts is known, that can be used to reveal information about the other plaintext.
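
The "extra credit" leak is easy to demonstrate with a toy keystream (a stand-in for the AES-CTR keystream inside GCM, not real GCM):

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"
keystream = os.urandom(len(p1))   # same key+nonce => same keystream

c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# An eavesdropper XORs the two ciphertexts: the keystream cancels,
# leaving the XOR of the two plaintexts -- no key material required.
leaked = xor_bytes(c1, c2)
assert leaked == xor_bytes(p1, p2)
```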

I learned most of this information from David Wong's Real-World Cryptography.

[0]: https://eprint.iacr.org/2016/475.pdf, section 3.3

red0point
5 replies
23h50m

I feel like there are so many pitfalls when designing this - is there something standard and trusted (would TLS work?) that you could build your application on top of?

yyyfb
1 replies
23h43m

I guess TLS has a dependency on the public key infrastructure (e.g. Let's Encrypt, or whoever issues widely accepted certs), which makes end-to-end encryption between users harder (most of this stuff is intended for server auth and encryption)?

But otherwise, big +1 for not reimplementing crypto when there are alternatives. Another option for secret key stuff might be SSH?

bawolff
0 replies
14h52m

There is no requirement to use TLS with the web PKI if you are making your own application (not the browser); you can use TLS with custom certificate management.

You still need to figure out how you handle trust and key authentication somehow, but that is true of all cryptographic protocols.

vngzs
0 replies
23h19m

I assume there's TLS in the server connection already, but the encryption here is to make the communication unavailable to the server for decryption, so "bare" TLS does not solve the problem.

With TLS you need pubkeys you can trust (the certificate authority hierarchy provides that trust for the open Internet) or you're vulnerable to MITM. You could potentially share pubkeys using a similar out-of-band mechanism to that currently used for symmetric key distribution, and tunnel that TLS connection through the server's shared comms channel. That would work OK for two parties, but it becomes significantly more cumbersome if you want three or more, since each TLS session is a pairwise key exchange. Notably, however, this would not transit secret keys through server-controlled web pages where they could be available to Javascript. Something like Noise [0] might also be useful for a similar pubkey model.

Unfortunately, this kind of cryptography engineering is hard. Key distribution and exchange is hard. There isn't much of a way around learning the underlying material well enough to find this sort of issue yourself, but misuse-resistant libraries can help. Google's Tink [1] is misuse-resistant and provides a handful of blessed ways to do things such as key generation, but I'm not sure if it's suitable outside of cloud deployments with KMS solutions. nacl/secretbox handles straight encryption/decryption with sound primitives, but it still requires a correct means of key generation [2] and distribution.

[0]: http://www.noiseprotocol.org/noise.html

[1]: https://github.com/tink-crypto/tink-go

[2]: https://pkg.go.dev/golang.org/x/crypto/nacl/secretbox

dathery
0 replies
23h28m

It would be hard to do end-to-end TLS (where the server proxies the raw connection) because

(a) you can't share one TLS connection to the host between multiple clients; if you wanted multi-client support while preserving end-to-end TLS, the host would need to maintain a TLS connection with each client and waste bandwidth re-uploading the same image

(b) there is no client software requirement, so you would have to do the TLS decryption clientside in the browser (maybe via WASM) unless you're OK with having viewers download software

beltsazar
0 replies
23h20m

there are so many pitfalls when designing this

Agree. When people hear the adage "don't roll your own crypto", they often think it refers to crypto primitives only. In reality, it's also hard to design a secure crypto protocol, even if the underlying crypto primitives are secure.

NotPractical
5 replies
19h56m

Do you have a recommendation to address #4? That seems like an intrinsic problem for web apps, see also ProtonMail.

vngzs
2 replies
19h26m

You're very right! Luckily, we can resolve the vulnerability in this instance, although it's a challenging problem to resolve in general webapps.

The technical explanation for our issue is that the client-side Javascript in our webapp is trusted. To quote the late Ross Anderson [0, pg. 13], "a trusted system or component is one whose failure can break the security policy." In this case, our security policy is that the server must not be capable of viewing our screenshots. Our goal is to make that trusted Javascript more trustworthy: that is, closer to a system that can't fail.

We're at an advantage in this case: there's an open-source application on GitHub with eyeballs[1] on it that users must run on their endpoint machines. Given that we already have source-available local code running, we could instead serve the UI from the local Go application and use CORS[2] to permit access to the remote server. If the local application is trustworthy, and we're only sending data (not fetching remote Javascript), then the local client UI is trustworthy and won't steal your keys. If users run binaries directly from 1fps (as opposed to building from source), then you would want some multi-party verification that those binaries correspond directly to the associated source [3].
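
The remote side of that scheme can be sketched in a few lines (Python stand-in for illustration; the origin and handler names are hypothetical, and the real project would do this in Go): the remote API whitelists the locally served UI's origin via CORS instead of serving any script of its own.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical origin where the trusted local binary serves the UI.
LOCAL_UI_ORIGIN = "http://localhost:8080"

class RemoteAPIHandler(BaseHTTPRequestHandler):
    """Remote endpoint that accepts requests but never serves JavaScript."""

    def do_GET(self):
        body = b'{"status": "ok"}'
        self.send_response(200)
        # Allow the locally served (auditable) UI to call this API
        # cross-origin; the server itself ships no client-side code.
        self.send_header("Access-Control-Allow-Origin", LOCAL_UI_ORIGIN)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # keep the sketch quiet
        pass
```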

Protonmail is almost surprising: it's supposed to be end-to-end encrypted, but it's not end-to-end encrypted in the presence of a malicious server. If, say, a government order compelled Protonmail to deploy a backdoor only when a particular client visited the site, most users would be unaffected and the likelihood of discovery would be low.

[0]: https://www.cl.cam.ac.uk/~rja14/book.html

[1]: https://en.wikipedia.org/wiki/Linus%27s_law

[2]: https://stackoverflow.com/a/45910902

[3]: https://en.wikipedia.org/wiki/Reproducible_builds

refulgentis
1 replies
17h2m

It seems the answer is "no" --- am I right to understand it that way?

Another attempt at compression: use a native app to serve Javascript for the web app so you don't have to trust any server

I don't mean to skip anything; it's just not clear to me how related some of it is, and whether it's lengthy just because it is, or because I'm missing something that's very significant (e.g. the cite + name + page # to help say "something we trust is something we rely on", the Wikipedia link as a source for "eyeballs make bugs shallow").

vngzs
0 replies
1h52m

Links are just for reference, but the gist is: serve the webapp from the Go binary instead. The end-user already has to trust the Go binary, and if they need to they can look at the code once and confirm it's not vulnerable. I prefer this to browser extensions because the audit trail process from source to browser extension is less clear; even for open-source browser extensions, I still have to trust the author to not add code that isn't in the repository.

owjofwjeofm
0 replies
18h41m

Meta / WhatsApp have developed their own solution for the whatsapp web client (whatsapp is end-to-end-encrypted): https://engineering.fb.com/2022/03/10/security/code-verify/

It takes the form of a browser extension the user downloads that will tell the user if the JavaScript code is what it is expected to be. It checks this by verifying the code's expected hash against an endpoint hosted by Cloudflare. WhatsApp can publish new versions to Cloudflare, but they can't modify them.
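
The hash check itself is the easy part; a hedged sketch with hypothetical values (the hard part is distributing the expected hash through a party other than the serving origin, which is what the Cloudflare endpoint is for):

```python
import hashlib
import hmac

# Hash published out of band (e.g. via the third-party endpoint the
# extension queries). The bundle bytes here are purely illustrative.
GENUINE_BUNDLE = b"console.log('genuine app code');"
PUBLISHED_SHA256 = hashlib.sha256(GENUINE_BUNDLE).hexdigest()

def script_is_authentic(served_code: bytes) -> bool:
    """Compare the served script's hash against the published one."""
    digest = hashlib.sha256(served_code).hexdigest()
    return hmac.compare_digest(digest, PUBLISHED_SHA256)
```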

In this case it makes it so that you are trusting Cloudflare instead of just WhatsApp, but (as an amateur), I don't see why this couldn't be adapted into a standard that works with something like a blockchain or certificate authorities (or even something like a git host to go along with public source code auditing?). I think something like this should become a standard and be built into browsers, but currently not a lot of companies are using any solution at all.

The only other implementation of a solution to this that I found, which I think is pretty similar, is EteSync's PGP-signed-web-pages library + browser extension (https://stosb.com/blog/signed-web-pages/), which allows the developer to PGP-sign web pages so you know the code has not been modified by a malicious server without the developer's approval. So maybe you could use that in your project, or there are probably other solutions that I haven't found.

I think this problem might be called "Code Verification" in cryptography, if you want to look more into it

filleokus
0 replies
6h37m

Apart from having a local binary / extension / some bookmark URI magic, I don't think so.

A "lighter" /alternative to a local binary is to a have a local index.html and use SRI when linking to the remote scripts [0]. But seems clunky as well...

[0]: https://developer.mozilla.org/en-US/docs/Web/Security/Subres...

RomanPushkin
3 replies
1d

That's pretty cool and this is exactly why I am here :) To have this kind of advice. I'll implement these changes as soon as I can.

mass_and_energy
1 replies
23h53m

This is such a healthy interaction, it makes me so happy to see people lifting each other up like this

Teknomancer
0 replies
14h41m

Love to see things like this on HN.

vngzs
0 replies
23h50m

You will still need to get the nonce and key generation right, but I'd recommend using Golang's nacl/secretbox [0] for a project such as this. It's designed to be relatively misuse-resistant compared to using underlying primitives directly, and under the hood it's XSalsa20+Poly1305 - so you can use random nonces with negligible collision risk.

[0]: https://pkg.go.dev/golang.org/x/crypto/nacl/secretbox

lulzury
1 replies
21h5m

Thank you for sharing this and recommending XSalsa20+Poly1305. I have always been interested in cryptography, so learning about the many reasons one shouldn't roll their own crypto AND protocol is very cool.

Out of curiosity, is the primary reason you don't recommend fixing the nonce issue in this specific case the pitfalls in doing so, or is it more nuanced and related to the general issues mentioned in the articles above?

A naive perspective could be that one uses AES-GCM because it is used in so many places, such as TLS or SRTP, and someone who is not very well versed in cryptography assumes it can be the way to go.

vngzs
0 replies
20h42m

AES-GCM has more issues than merely the nonce reuse in the context of random nonces. For instance, the short tag issue[0] leaks authentication (not encryption) keys after a probabilistic "forged" message.

In general, the move in modern cryptography engineering is to assume the end user does not know what they are doing. For GCM, you have to get the nonces right and you need the right tag length, and the design uses lookup tables so it's prone to timing attacks in many implementations.

Later on I didn't just recommend an algorithm but a specific implementation (at least if we can find a better method of symmetric key distribution): nacl/secretbox [1]. This is a cryptographic library designed to be misuse-resistant, a property of cryptographic designs that makes implementation errors more difficult. nacl is a few years behind the curve inasmuch as it arguably gives the end-user too much control over key generation, but it permits random nonces (being based upon XSalsa) and provides a simple API that is difficult to mess up.

AES-GCM is secure with a correct implementation, but to build a correct implementation you often need to know the specific library inputs and configuration settings to produce your desired outcome. Something like secretbox doesn't give you those options: you get one relatively secure configuration ... and that's it!

[0]: https://csrc.nist.gov/csrc/media/projects/block-cipher-techn...

[1]: https://pkg.go.dev/golang.org/x/crypto/nacl/secretbox

ww520
0 replies
17h49m

This is an excellent analysis! It's amazing what can be found with a minimal source code review.

somat
0 replies
18h50m

With regard to point 4 (secure key distribution channel): as far as I can tell there is no good PKI built into the browser. My point being, any PKI tooling has to be shipped by the server, and you have to trust the server to supply you honest tools. The saving grace is that this does not really matter that much: each domain could send you totally broken tools and still only be able to steal keys produced for its own domain.

Footnote: there are client-side certs; however, because there is no tooling for them built into the browser, usability sucks. I want to try to get public key auth working on my toy JS application, and the browser tooling for user-generated keys sucks. I am tempted to use SSH keys (I like SSH keys), but will probably see if I can get HOBA working (https://datatracker.ietf.org/doc/html/rfc7486). I got all excited about HOBA when I first read about it, but am now a bit bitter having found out that there is zero built-in browser support.

sensanaty
0 replies
8h35m

Any tips on how/where one can learn more on these topics? I find cryptography fascinating, but whenever I've tried looking for some resources on my own, they all flew hilariously above my head, with dozens of acronyms and terms I've never even heard of before even as a native English speaker.

icanhasjonas
0 replies
13h12m

Came here to point out the PBKDF2 use on each frame, but found this fantastic write-up.

catoc
0 replies
1h55m

"Never use random IVs with GCM; this breaks the authentication"

Why could one not use Encrypt-then-HMAC and HMAC-then-Decrypt with a random IV ?

(Serious question. It definitely sounds like you know what you are talking about, I just can't see what I am missing here)
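
For reference, the pattern the question describes looks like this (sketch only, leaving aside whether it is the right answer here; the encryption step is elided, `ciphertext` stands in for any unauthenticated cipher output such as AES-CTR, and the MAC key must be separate from the encryption key):

```python
import hashlib
import hmac

def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: the tag covers the IV and the ciphertext."""
    return hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()

def verify(mac_key: bytes, iv: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Constant-time check; decrypt only if this returns True."""
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```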

burkaman
17 replies
1d

What is your use case for screensharing without audio? I can't figure out when that would be useful, you have to communicate with the other person somehow.

RomanPushkin
10 replies
1d

There are many. For example, you're leaving your home computer and going to work. Save a link and see what's going on there. The same for remote desktops.

Another use case is when you have a long-running meeting and still need to share. I find I sometimes just don't sit and listen through those 1-2 hour meetings, but prefer to code.

And for those of us who just don't like audio, like myself. I have many students who I am willing to help, but I don't wanna get audio-involved. My voice chat is not a very parallelizable resource.

remram
3 replies
22h47m

leaving your home computer and going to work. Save a link and see what's going on there

What could be going on when you're not home?

If this is meant as a security tool, the fact you have to look at it is a non-starter.

If this is meant as anything else, why wouldn't you use VNC?

dheera
2 replies
21h52m

VNC doesn't even share your screen; it creates its own offscreen display, doesn't even load your desktop, and uses some unusable minimalist window manager with a stupid X cursor. Yeah, I could probably figure out how to get it to work, but it's a chore. Terrible product design.

I've been wanting to create something WebRTC based, I'm not happy with either VNC or RDP.

Dwedit
0 replies
20h35m

On Windows, you do share the screen. I haven't seen any VNC servers that give you a new session like Terminal Services would.

On Linux, I've used both kinds of VNC server. One does start a new X instance, while the other one shares your main X instance. At the time I tried it, it was "TightVNCServer" to get a new X instance, and "X11vnc" to share the existing session.

3np
0 replies
18h30m

TigerVNC x0vncserver[0] is just one option for ezpz sharing your existing X session.

Couple it with novnc if you want it in the web browser. Currently WebSockets but WebCodecs support looks to be around the corner[1].

Terrible product design.

Which "product" are you even talking about here? VNC is a protocol with several different implementations.

[0]: https://tigervnc.org/doc/x0vncserver.html

[1]: https://github.com/novnc/noVNC/pull/1876

burkaman
2 replies
1d

Monitoring a remote machine makes a lot of sense. I'm still not totally getting how it can work for live collaboration, but if it works for you that's great. I do love the minimalist efficiency.

culi
1 replies
23h19m

I don't think it's meant for live collaboration. Given that the OP is pretty focused on overcoming time restrictions, all-day remote machine monitoring seems like exactly the kind of use case they had in mind.

wrs
0 replies
23h1m

What’s the 30FPS cursor tracking for, then?

browningstreet
0 replies
1d

I think adding audio would open interesting use cases though. People hate video, but 1fps isn't video. Audio still feels like a compelling feature, IMO.

CapstanRoller
0 replies
2h48m

For example, you're leaving your home computer and going to work. Save a link and see what's going on there.

Doesn't this require leaving the computer unlocked?

0points
0 replies
11h13m

There are many. For example, you're leaving your home computer and going to work. Save a link and see what's going on there

On what, your primary desktop?

Maybe you are just a bit Windows-centric but I would guess many of us run virtual desktops and/or other means of remote access such as ssh.

For general monitoring, maybe have a look at state of the art solutions, like https://prometheus.io/

skulk
2 replies
1d

From TFA:

1fps.video is perfect for introverts and remote workers who prefer sharing their screen without the pressure of audio or video calls. It's a versatile solution that works alongside any team chat application you're already using.

It seems closer to "text chat while sending screenshots" than "share screen in a voice call." I can see why some would prefer this.

theamk
0 replies
1d

that's what I thought as well, but then I read this part:

we use WebSocket-based cursor tracking, providing smooth, near 30 FPS pointer movement for precise demonstrations.

This part does not seem to support that use case; you don't need 30 FPS pointer tracking for text chat. Moreover, it'd be actively bad, as the cursor is likely to be pointing at the text chat window.

burkaman
0 replies
1d

I get that, but it's also designed for sharing your entire laptop screen, so you'd have to either switch back and forth between code and chat, or take up half the screen with your chat app, both of which seem like they would be pretty disruptive to the actual screensharing.

It seems like it would be better to just send a screenshot and then discuss, so the other person doesn't have to watch you typing messages to them instead of looking at the actual thing you want to share.

kirykl
1 replies
1d

Scammers use no-audio screen sharing while on the phone with a mark.

nottorp
0 replies
1d

And disgusting extroverts need full video and audio to ... feel complete?

zarzavat
0 replies
13h53m

You could just call them? Everybody has a phone on them at all times, there’s no need to reinvent that wheel.

cornholio
12 replies
23h40m

Does it use WebRTC? The last time I looked at this - and what stopped me from releasing a more polished MVP of the same low-impact continuous meeting-not-a-meeting concept - is that the only way to scale WebRTC is to use your own paid infrastructure. The only peer-to-peer topology WebRTC clients support is a star, so without a multiplexing server you are practically limited to a handful of peers in any session.

So you are either offering a slow and very limited free service, or you need to pay hand over fist and burn venture capital to basically compete with Zoom and WebRTC. Slowing the video stream to very low FPS does help somewhat with scaling, but makes for a niche product.

If you can crack P2P multiplexing and offer an unlimited free service, and tack a freemium model onto that, then this thing can take off like a rocket ship, if for no other reason than that every team leader in the world wants a continuous feed of their remote workers' desktops. A free and capable screen sharing app can become THE tool for collaboration, disrupting things like Slack if the right features are there.

I'm seriously interested to cofound something like that, let me know if anything I've said makes sense to you.

pjc50
4 replies
22h59m

if for no other reason that every team leader in the world wants a continuous feed of their remote worker's desktop

ahem GDPR?

Besides, that's absolutely the sort of enterpriseware that should be charged for.

cornholio
3 replies
12h15m

Not relevant in the context of an employer-provided machine that is used only for work-related collaboration, where the employee is aware of it.

You can charge for the enterprise features (and get the resources to develop them in the first place) after you reach a critical mass of users.

wasmitnetzen
2 replies
8h53m

Controlling your employees' day like that is still not allowed in a lot of places with strong worker rights.

cornholio
1 replies
8h30m

We're talking about a collaboration tool here, and there is no jurisdiction that I know of where employing such tools, even when mandated by the employer, is unlawful; it's definitely not related to GDPR, which is completely out of scope here.

Of course, like any tool, it can be used for bad things. If, say, instead of the default of sharing just the development apps, the employee shares the browser, email client, or instant messenger he uses for personal purposes, you could argue it crosses the line into unlawful workplace surveillance, so it becomes a matter of setting correct policies. Sounds to me like an enterprise feature set you could charge for, as a complement to the free tier.

martinsnow
0 replies
5h42m

As soon as private information (e.g. an email) is opened on that machine, you're in violation of workers' rights and the GDPR.

walterbell
2 replies
23h4m

> every team leader in the world wants a continuous feed of their remote worker's desktop.

"every" -- why?

Do high-performance teams have low or high trust?

cornholio
1 replies
12h11m

I'm not sure what you are arguing here: that low performance teams do not exist, or that they do not need to be managed, or that we should provide free educational resources to their managers instead of selling them the tools they want?

It's a real problem real companies face, look at r/overemployed for a taste.

chfritz
0 replies
1h29m

Would YOU like to work with your manager looking over your shoulder at all times? A good manager builds trust, rather than needing to rely on control.

burkaman
1 replies
22h59m

Please do not do this. The product you're describing would make the world a significantly worse place.

0xdeadbeefbabe
0 replies
22h21m

The market would punish you too.

lastiteration
0 replies
11h5m

"every team leader in the world wants a continuous feed of their remote worker's desktop"

..talk about micromanaging

cropcirclbureau
0 replies
23h7m

Ah, yes, constant spying on workers to make managers feel better. How about you feed each frame to an LLM for AI-powered productivity monitoring? How about you incorporate the webcam for next-gen AR AI companionship? How about you make it customizable so that managers can easily roll out and maintain appropriate cultural practices? Studies show synchronized boot clicking, twice daily, can foster excellent dedication and energy for the important tasks at hand. Zillion-dollar idea, buddy.

andriamanitra
12 replies
21h42m

From reading the code it looks like it's just taking a screenshot (.jpg) and sending it once a second. Does doing it that way actually save on bandwidth compared to modern video compression (that re-use information from previous frames)?

I recorded a one minute video clip of me editing some code in VS Code (1440p 10fps, using AV1 encoding) and it was about half the size of 60 JPEG screenshots of the same screen. I would be curious to see your numbers if you've done any tests.

AndrewKemendo
4 replies
21h35m

Seems like preventing data persistence (replace, delete) was chosen over minimizing bandwidth (no optimization).

But you could easily do both if you wanted to - though I’m not sure it’s worth the hassle. I agree that this might struggle if used at scale on the same IP.

nine_k
3 replies
18h51m

Not only that. JPEG works best on natural-looking images, with gradients, curves, and constant, wide color variation. Computer screens very often show entirely different kinds of images, dominated by a few flat colors, small details (like text), and sharp edges: that is, exactly the kind of "high-frequency noise" JPEG is built to throw away.

JPEG either makes "smeared" screenshots or low-compression screenshots. PNG often works better.

A proper video codec mostly sends the small changes between frames (including shifts,like scrolling), and relatively rare key frames. It could give both a better visual quality and better bandwidth usage.

What's interesting in the "screenshot per second" solution is that it can be hacked together from common existing pieces, like ImageMagick, netcat, and bash; no need to install anything. (Imagine you've got privilege-limited access to a remote box, and maybe cannot even write to disk! Oh wait...)
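
The delta idea mentioned above can be illustrated with a toy sketch (pure Python, nothing like a real codec): split each frame into fixed-size chunks and send only the chunks that changed since the previous frame.

```python
def changed_chunks(prev: bytes, cur: bytes, chunk: int = 64) -> dict:
    """Map of chunk index -> new bytes, for chunks that differ.

    Toy inter-frame delta: a real codec also handles motion (shifts,
    scrolling) and sends periodic key frames, but the principle of
    transmitting only the differences is the same.
    """
    assert len(prev) == len(cur)
    return {
        i // chunk: cur[i:i + chunk]
        for i in range(0, len(cur), chunk)
        if cur[i:i + chunk] != prev[i:i + chunk]
    }

def apply_delta(prev: bytes, delta: dict, chunk: int = 64) -> bytes:
    """Reconstruct the current frame from the previous one + the delta."""
    frame = bytearray(prev)
    for idx, data in delta.items():
        frame[idx * chunk:idx * chunk + len(data)] = data
    return bytes(frame)
```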

kijin
2 replies
14h36m

The problem with the JPEG vs. PNG debate for screenshots is that screenshots can contain anything from photos to text to UI elements to frames of video.

Just open any website and you'll see text right beside photos, or text against a photographic backdrop, often in the middle of being moved around with hardware-accelerated CSS animations.

I think we need an image container format that can use different compression algorithms for different regions or "layers" of the image, and an encoder that quickly detects how to slice up a screenshot into arbitrary layers. Both should be possible with modern tech. I just hope the resulting format isn't patent-encumbered.

nine_k
1 replies
13h21m

Completely agree. JPEG-only is insufficient. PNG-only is insufficient. An adaptive codec would apply the right algorithm to each area depending on its properties.

I suppose that the more modern video compression algorithms already apply such image analysis, to an extent. I don't know how e.g. VNC or RDP work, but it would be natural for them to have provisions like that to save bandwidth / latency, which is often in shorter supply than computing power.

Of existing still image codecs, JPEG XL seems to have the right properties[1]: the ability to split an image into areas and / or layers, and the ability to encode different areas either with DCT or losslessly. But these are capabilities of the format; I don't know how well existing encoder implementations can use them.

[1]: https://en.wikipedia.org/wiki/JPEG_XL#Technical_details

bblb
0 replies
12h41m

how RDP work

Uses a combination of different tech [0]. MS-RDPBCGR is at the base of it all, sort of like the main event loop [1]. MS-RDPEGDI looks into the actual drawing commands and optimizes them on the fly [2]. Then there's the MS-RDPEDC for desktop composition optimizations [3]. Also a bunch of other bits and pieces, like MS-RDPRFX which uses lossy compression optimization [4].

In RDP you don't get to play only with the bitmap or image stream data, but the actual interactions that are happening on the screen. You could say for example that the user right clicked a desktop item. Now send and render only the pop-up menu for this, and track and draw the mouse actions inside that "region" only.

[0] https://learn.microsoft.com/en-us/openspecs/windows_protocol... [1] https://learn.microsoft.com/en-us/openspecs/windows_protocol... [2] https://learn.microsoft.com/en-us/openspecs/windows_protocol... [3] https://learn.microsoft.com/en-us/openspecs/windows_protocol... [4] https://learn.microsoft.com/en-us/openspecs/windows_protocol...

RomanPushkin
3 replies
20h42m

It's not only a matter of bandwidth, but a matter of CPU utilization. I've tried to feed screenshots to ffmpeg and other tools, and it's just... unusable. It works, but consumes way too many resources. At least on my computer (MacBook 13-inch, 2019).

So on one side you have CPU utilization, on the other - network. Network is cheap, but encoding is expensive. This is my thinking at least. I don't have proof - only local experiments, but it's a really good idea to start measuring this.

I also have other ideas in mind on how to scan the screen and send only the parts that have been updated. Probably if I send only half of the screen, it will beat the video encoding in terms of network. The diff algo would have to be very fast though, since in the case of 1280x720 we're dealing with 1280x720 = 921,600 pixels, and 921,600 * 4 bytes ≈ 3.5 MB of data to process every second.

Also, curious to hear about video encoding efficiency vs 60x JPEG creation. Is it comparable?

satellitemx
0 replies
16h18m

I've tried to feed screenshots to ffmpeg and other tools, and it's just... unusable. It works, but consumes way too much resources.

Did you try to use the hardware encoder? Modern computers have chips to accelerate/offload video encode/decode. Your 2019 Mac has an Intel GPU with H.264 and HEVC hardware encoders, and it also has a T2 co-processor that can encode HEVC video. If you don't specify a hardware encoder (with the _videotoolbox suffix on Mac) via -c:v, then ffmpeg defaults to a software encoder, which consumes CPU.

how to scan the screen and send only parts of the screen that have been updated

You'll be reinventing video codecs with interframe compression.

Also, curious to hear about video encoding efficiency vs 60x JPEG creation. Is it comparable?

I see that you are comparing pixel by pixel for each image to dedupe, and also resizing the image to 1280px. The image then has to be encoded to JPEG. All of the above is done on the CPU. In essence, you've implemented Motion JPEG. Below is a command to let you evaluate a more efficient ffmpeg setup.

  ffmpeg \
    -f avfoundation -i "<screen device index>:<audio device index>" \
    -an \
    -c:v h264_videotoolbox \
    -r 1 \
    -vf scale=1920:-1 \
    -b:v 2M \
    out.mp4

Here -f avfoundation is the macOS-specific screen capture input, -an drops audio, h264_videotoolbox is the macOS H.264 hardware encoder, -r 1 gives 1 fps, scale=1920:-1 scales to 1920px wide, and -b:v 2M sets a 2 Mbps bitrate for clear and legible text. You may want to set up an RTMP server that ffmpeg can transmit the encoded video stream to, so visitors can view it.

gregw2
0 replies
5h16m

I co-built a similar screen sharing app (with a web server seeing traffic in the middle though) many years ago. MPEG isn't a good fit. We tried JPEG but it didn't look great. Version 1 decomposed the screen into regions, looked at which regions changed, and transmitted PNGs.

The second version tried to use the VNC approach developed years ago by AT&T and open sourced. The open source implementation glitched just a bit too much for my liking. Various companies white-labelled VNC in those days; not sure they fed back all their fixes. But the raw VNC protocol has a lot of good compression ideas specific to the screen sharing domain, documented in papers. People also tunneled VNC over SSH; I jerry-rigged an HTTPS tunnel of sorts.

After a while I started to suspect that if I wanted higher frame rates I should use a more modern, Microsoft-based/specific screen sharing codec. But it wasn't my skill set, so I never actually went down that route. I'd recommend researching other screen-optimized lossless codecs, open or closed, so you don't reinvent the wheel on that side of things if you're serious about reducing bandwidth.

bawolff
0 replies
14h48m

Keep in mind most codecs can be tuned. Live encoding is a very different use case from encoding a video file you only need later. Most codecs have knobs you can turn to get lower latency and CPU use in exchange for somewhat larger file sizes.

Retr0id
1 replies
21h17m

This was my first thought too; there's no reason not to use a standard codec, just configured to run at 1 fps.

j16sdiz
0 replies
17h54m

Depends if you need low latency.

Modern codecs do motion prediction and don't work well with low frame rates out of the box.

andai
0 replies
12h10m

Ten years ago I was experimenting with TimeSnapper Classic, a free utility for Windows that takes a screenshot every 5 seconds. The neat feature is it lets you view a timelapse of your day.

The screenshots were taking up a lot of disk space. I noticed that very little changed between pictures, so I started thinking of an algorithm that would make use of this characteristic: store only the changes between subsequent images. A few minutes in, I realized I was reinventing video compression!

So, I just used ffmpeg to turn the image sequence into a mp4. It was something like 95% reduction in file size. (I think I used ImageMagick to embed the timestamp filename into the images themselves, thus recreating basically all the features of TimeSnapper Classic with 2 commands.)
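The ffmpeg step is essentially one command. A sketch of the invocation - the filename pattern, frame rate, and codec are assumptions about the setup, not what I ran back then:

```shell
# Turn numbered screenshots into a timelapse video. Assumes files named
# shot_0001.png, shot_0002.png, ... in the current directory.
# -framerate sets how many source images become one second of output
# (10 here, so a day of 5-second screenshots plays back 50x faster).
ffmpeg -framerate 10 -i shot_%04d.png -c:v libx264 -pix_fmt yuv420p timelapse.mp4
```

The -pix_fmt yuv420p flag keeps the output playable in common players, which otherwise may reject the 4:4:4 chroma that libx264 picks for PNG input.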

WatchDog
4 replies
16h27m

I tired of sharing screen via Google Meet with 1-hour limitation, with Zoom and 40-minute limitation, etc.

FWIW, jitsi[0] is an open source[1] WebRTC based, full featured, video conference/meeting alternative to zoom, google-meet, slack, etc.

You can use it via the main site, or self-host it if you like.

[0]: https://meet.jit.si/ [1]: https://github.com/jitsi

unnouinceput
2 replies
14h14m

I prefer https://talk.brave.com/ over jitsi. Same idea, but I find it works better.

throwaway5070
0 replies
11h37m

This is literally a themed Jitsi instance. Even fetches its assets from Jitsi's CDN, lol

amelius
0 replies
9h52m

The maximum of 4 users can be a bit limiting.

zokier
3 replies
1d

Could be an interesting concept to try some heuristics for picking which frame to use; blindly picking the latest frame is unlikely to be ideal. Instead you might want to pick frames where there is little movement, or no ongoing animations, or use some other similar metric. If you want to be super fancy, you could try to do this analysis per-window and then construct some sort of aggregate for the whole frame.

philsnow
0 replies
23h41m

you could try to do this analysis per-window and then construct some sort of aggregate for the whole frame

This seems like it could get into the area of smartphone "cameras" that do so much computation on the output of the light sensors that it can hardly be called photography [0]. It's a cool idea (in chess I've heard a similar idea called "quiescence search"[1]), but probably not worth the trouble.

[0] https://old.reddit.com/r/Android/comments/11nzrb0/samsung_sp...

[1] https://en.wikipedia.org/wiki/Quiescence_search

eddd-ddde
0 replies
23h57m

I think this may defeat the purpose of minimal compute utilisation.

It definitely sounds like a great idea and an interesting problem.

RomanPushkin
0 replies
23h55m

Oh, I like it actually! I had an idea to scan the screen with a sparse 100x100 matrix (5-10 pixel spacing) at the point where a change was previously found, to detect a possible screen change.

38
3 replies
18h52m

does not work on windows

RomanPushkin
2 replies
16h18m

See updated docs for Windows users: https://github.com/1fpsvideo/1fps?tab=readme-ov-file#windows...

Please understand that the Windows toolchain is often broken, and binaries are preferable. I will roll out binaries soon for all platforms. I also noticed some cursor coordinate issues on a Windows machine (I have a high-resolution display). I'm wondering if you have any issues with that as well.

The good news is that there are steps to work around it and we're aware of these issues :) I guess check back later, I hope we're gonna fix it soon!

38
0 replies
32m

No, you are just ignorant of Windows development

0points
0 replies
11h9m

Please understand the Windows toolchain is often broken

What is going on? The golang windows toolchain is absolutely not "often broken".

AndrewKemendo
2 replies
21h40m

I was looking for something like this today because we're remote-monitoring a physical test event, and keeping an open Google Meet with recording is a mess - however, we would still want text chat in the interface.

Seems like this is a really good minimal interface - if I'm feeling wonky I might extend it with chat persistence somehow.

I am assuming any additional synchronized text or voice is done elsewhere, like calling someone on the phone or clarifying via text on Slack, right?

RomanPushkin
0 replies
20h38m

This is what I sometimes do with a friend of mine: WhatsApp phone call and a simple screen sharing between my laptop and his PC.

CyberDildonics
0 replies
19h9m

I was looking for something like this today

Earlier today you were thinking you specifically needed exactly 1 fps?

popcalc
1 replies
1d

  # github.com/go-vgo/robotgo
  In file included from go/pkg/mod/github.com/go-vgo/robotgo@v0.110.1/key.go:15:
  ./key/keypress_c.h:22:18: fatal error: X11/extensions/XTest.h: No such file or directory
     22 | #include <X11/extensions/XTest.h>
        |          ^~~~~~~~~~~~~~~~~~~~~~~~
  compilation terminated.

https://github.com/go-vgo/robotgo?tab=readme-ov-file#require...

There are some prereqs not listed on your page. On Mint 22 I had to install the libxtst-dev package.

RomanPushkin
0 replies
1d

Thanks, I will update docs. "robotgo" has some issues on Windows, I am currently looking into some of them.

imagetic
1 replies
1d

I love it.

Our workflow is built around removing the need for an office and technical infrastructure. We have live streams of our timeline output (video editing) and an open comms channel. Most of our team is pretty introverted, so it's a push to talk system. We mostly just leave notes in the chat if it doesn't warrant a full discussion.

Crude solutions are often the ones that get adopted.

RomanPushkin
0 replies
23h59m

I'm happy to hear that. I've just open sourced the server part, so you can play with it a little bit more! I will improve the docs over time, so it's easier to follow.

formerly_proven
1 replies
21h47m

1 FPS screen sharing? Isn't that just MS Teams on a tuesday?

rjsw
0 replies
1h7m

Teams does the no-audio part too.

account42
1 replies
6h47m

You need Golang installed for this command to work.

What makes all these newfangled language ecosystems think this is an acceptable way to distribute software? No, I don't want to install yet another giant tree of language-specific crap. Learn to distribute self-contained binaries, please.
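For what it's worth, Go specifically makes this easy to fix on the project side. A sketch of what a release build could look like - the output names are illustrative, and note the caveat in the comments:

```shell
# Go cross-compiles self-contained binaries from a single machine; no
# target-OS toolchain is needed. CGO_ENABLED=0 avoids libc dependencies.
# Caveat: this project depends on robotgo, which uses cgo, so in practice
# it would need per-OS build runners rather than plain GOOS switching.
CGO_ENABLED=0 GOOS=linux   GOARCH=amd64 go build -o 1fps-linux-amd64 .
CGO_ENABLED=0 GOOS=darwin  GOARCH=arm64 go build -o 1fps-darwin-arm64 .
CGO_ENABLED=0 GOOS=windows GOARCH=amd64 go build -o 1fps-windows-amd64.exe .
```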

Cyberdog
0 replies
3h21m

For most professional software, I agree. For hobby projects like this, I can certainly understand the appeal of asking users to compile it themselves rather than trying to provide binaries for all of the possible OS and hardware combinations in use today, most of which you might not even have access to for testing.

That said, certainly there does seem to be a weird glib assumption by some hobby/OSS projects that of course I already have the same developer ecosystem/toolchain installed on my system as they do - of course I have a Rust toolchain or can install dependencies with Homebrew or am using Linux with systemd/Wayland. This project at least kindly asks me to install Go first if I haven't already.

KomoD
1 replies
1d

I'm having issues with the cursor tracking, I don't know if it's because I have multiple monitors or something like that?

Here's a pic, the ring is where my cursor actually is https://i.imgur.com/TvzskjS.png

RomanPushkin
0 replies
23h57m

Thanks, you have my promise that for every major use case there is going to be a fix. Feel free to check back at a later time to see whether it has been fixed. I appreciate the feedback.

Dwedit
1 replies
20h37m

Moonlight Game Streaming has pretty much displaced VNC for my uses. It just needs some better features for things like file transfer, clipboard sharing, etc...

nicman23
0 replies
12h45m

I just need the clipboard to be honest, and maybe a service that can NAT punch.

samstave
0 replies
18h29m

This is awesome - and you've got some great advice.

The following was just inspired by your "1fps screen sharing" idea:

If you do 1 FPS screen sharing - then create a private gallery on imgur.com, and have a thing upload screenshots to that single gallery, with a garbage-collecting thread deleting a screenshot every 30 seconds/interval...

Then have the user keep the hidden gallery URL open in an auto-refreshing browser tab.

This might actually work well if you have a small data connection and a device that can upload only when an event occurs.

Just upload that to replace the file in the imgur gallery - and you have a free cloud cam.

rustcleaner
0 replies
23h45m

limited proprietary GNU-nonfrenly screensharing

Was Rustdesk on-radar?

goldielox
0 replies
11h19m

Cool! Been working on some automation bots in golang lately, so could I use your program to monitor my screen on the go over the phone? cheers

fitsumbelay
0 replies
21h13m

very cool idea, and the pro-introverts pitch is very interesting

I really appreciate the discussion about the tech involved, especially the non-Go-specific info and advice. Peak HN imo.

andrea76
0 replies
23h43m

Does it support wayland?