A long time ago, I did something similar: I took a screenshot every few seconds, with the goal of automatically extracting information from it, e.g. how long I had been using some app.
I wrote a PNG DB that splits PNG images into many blocks and stores each block in a DB. If several blocks are equal, the block is stored only once. A hash table makes the lookup for such blocks fast. With this PNG DB, I get a compression rate of about 400-500%. https://github.com/albertz/png-db
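For anyone curious what block-level deduplication looks like, here is a minimal sketch of the general idea. It is not the code from png-db: the block size, the `BlockStore` class and the use of SHA-256 as the lookup key are all assumptions made for illustration, and the "image" is just a 2D list of grayscale values rather than a decoded PNG.

```python
import hashlib

BLOCK = 4  # hypothetical block edge length in pixels


def split_blocks(pixels, block=BLOCK):
    """Yield (y, x, block_bytes) for each block of a 2D grid of ints in 0..255."""
    height = len(pixels)
    width = len(pixels[0])
    for y in range(0, height, block):
        for x in range(0, width, block):
            rows = [
                bytes(pixels[yy][x:x + block])
                for yy in range(y, min(y + block, height))
            ]
            yield y, x, b"".join(rows)


class BlockStore:
    """Hypothetical in-memory store that keeps each distinct block once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes (a dict is a hash table)

    def add_image(self, pixels):
        """Store an image and return its list of (y, x, digest) references."""
        refs = []
        for y, x, data in split_blocks(pixels):
            key = hashlib.sha256(data).digest()
            # Only the first occurrence of a block is actually stored.
            self.blocks.setdefault(key, data)
            refs.append((y, x, key))
        return refs
```

Consecutive screenshots tend to share most of their blocks, so each new image mostly adds references rather than new block data.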
Some of the scripts I used to analyze the screenshots are here, but in the end this approach was not very successful or reliable: https://github.com/albertz/screenshooting
In the end, that led to another project, where I just stored that information more directly, i.e. which application was in the foreground and which file was open. https://github.com/albertz/timecapture
On Windows I use a small program that grabs a frame every second through the desktop API as a DirectX texture, and compresses that straight on the GPU to h265 using AMF. I'll upload the source in case it's interesting for anyone else.
I would love this!
Alright, here you go.
https://github.com/kaetemi/second_capture/blob/master/second...
why does CPP code always look so messy and unintelligible?
like if I see C# or Python it makes sense to me at least in some way
whereas CPP code always looks like it's powering some rocket engine?
Also thanks for sharing!
Largely because it's a melting pot of ancient and modern coding standards. You've got the C Win32 API alongside COM style, and then whatever AMF is doing. That makes things very verbose and explicit.
I've seen worse Python.
Personally, I think it's charming. :)
Well, first of all, C++ is the language you'd be using to power a rocket engine. And second, that code is a terrible example because most of it isn't C++. Large parts of it are very C-like or directly C because it's using the Windows API.
Could it be that you're just more used to looking at C#/Python than other things, so other things are more foreign and therefore look messy?
As another anecdote, I cannot stand browsing/looking through C# code, as it tends to be filled with various classes just to write very basic programs. The amount of over-engineering I've seen in C# surpasses everything else I've looked at. Not to mention how people seem to arbitrarily choose between private/public with no real consensus on when to use what; everything seems to be encapsulated in the wrong way. And don't get me started on the infrastructure around it, csproj vs sln and dealing with dependencies/assemblies.
But then I mostly write Clojure code day-to-day, and I realize that my trouble dealing with C# is mostly because of what I'm used to, not because the language itself is inherently shit. I only have myself to blame for this. I'm sure people who write C# day-to-day have the same feelings about Clojure as I have about C#.
Thanks, I am giving it a try. Are any dependencies required on Windows 10? It compiles fine, but I get an error about AVIFileInit, maybe to do with <vfw.h>?
Thanks for open sourcing it so fast.
+1
See sibling comment response. :)
As much as I dislike the current AI hype, a local on-machine AI model that can read/interpret videos/thousands of images (basically a recording of screen time combined with video/audio/handwriting of my everyday life), store it in an indexed format, and project it back to me in an easy to understand/quickly digestible format would be a godsend I'd invest a lot of money into (provided false positives were close to zero)
To me, it seems obvious that Apple will eventually build this into macOS (“it’s a feature not a product”). This is like local apps or native OS features that would index your drive contents and provide a frontend to query, but on steroids. This also gets us closer to transparent computing.
Rewind claims to do this, but you'll have to trust them on the local claims, it's not open source: https://www.rewind.ai/
I would love for this project to serve that need, and I personally want this too.
Absolutely. Combine it with real-time analysis of your current screen, and you've got a computer that knows the complete history of what you're doing and why. That kind of global analysis could be really useful.
I used ffmpeg to try to do smart compression for me (diffing etc.), but ran OCR first. I also did a poor man’s text merging to try to make use of the overlap from scrolling.
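The comment doesn't say how the merging was done, so the following is only a guess at what a "poor man's" version could look like: find the longest run of lines at the end of one OCR result that also starts the next one, and keep that run only once. The function name `merge_scrolled` is made up for this sketch, and it assumes the OCR output is already split into lines and that overlapping lines match exactly, which real OCR noise would often break.

```python
def merge_scrolled(prev_lines, next_lines):
    """Merge two lists of OCR'd lines, dropping the overlap caused by scrolling.

    Looks for the longest suffix of prev_lines that equals a prefix of
    next_lines. If there is no exact overlap, the lists are concatenated.
    """
    max_overlap = min(len(prev_lines), len(next_lines))
    for k in range(max_overlap, 0, -1):
        if prev_lines[-k:] == next_lines[:k]:
            return prev_lines + next_lines[k:]
    return prev_lines + next_lines
```

A more robust variant would probably compare lines with some tolerance (e.g. a similarity ratio) instead of exact equality.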
What OCR did you use?
Tesseract?
What was the performance (of the OCR) like?