Show HN: Rem: Remember Everything (open source)

A long time ago, I did something similar, i.e. took a screenshot every few seconds, with the aim of automatically extracting information from it, e.g. how long I was using some app.

I wrote a PNG DB that splits PNG images into many blocks and stores each block in a DB. If several blocks are identical, the block is stored only once. A hash table makes the lookup for such blocks fast. With this PNG DB, I get a compression rate of about 400-500%. https://github.com/albertz/png-db
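A minimal sketch of the block-deduplication idea described above, assuming uncompressed single-channel pixel rows and a fixed tile size. The names `BlockStore`, `store_image` and `load_image` are hypothetical and this is not how png-db itself is implemented.

```python
import hashlib


class BlockStore:
    """Content-addressed store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> raw block bytes

    def put(self, block):
        key = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(key, block)  # no-op if the block is already stored
        return key

    def get(self, key):
        return self.blocks[key]


def store_image(store, rows, tile=4):
    """Split an image (list of equal-length bytes rows) into tile x tile
    blocks and return the grid of block keys needed to rebuild it."""
    grid = []
    for top in range(0, len(rows), tile):
        band = rows[top:top + tile]
        keys = []
        for left in range(0, len(rows[0]), tile):
            block = b"".join(row[left:left + tile] for row in band)
            keys.append(store.put(block))
        grid.append((len(band), keys))
    return grid


def load_image(store, grid):
    """Rebuild the pixel rows from a grid produced by store_image."""
    rows = []
    for height, keys in grid:
        band = [b""] * height
        for key in keys:
            block = store.get(key)
            width = len(block) // height
            for y in range(height):
                band[y] += block[y * width:(y + 1) * width]
        rows.extend(band)
    return rows


store = BlockStore()
# Two 8x8 "screenshots" that differ in a single pixel.
shot_a = [bytes(8) for _ in range(8)]
shot_b = list(shot_a)
shot_b[0] = bytes([255]) + shot_b[0][1:]
grid_a = store_image(store, shot_a)
grid_b = store_image(store, shot_b)
unique_blocks = len(store.blocks)  # 8 logical blocks, far fewer stored
```

Consecutive screenshots tend to share most of their content, which is why storing each distinct block once can pay off.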

Some of the scripts I used to analyze the screenshots are here, but in the end, it was not really so successful and reliable: https://github.com/albertz/screenshooting

In the end, that led to another project, where I just stored that information more directly, i.e. what application was in the foreground, what file was open. https://github.com/albertz/timecapture

On Windows I use a small program that grabs a frame every second through the desktop API as a DirectX texture, and compresses that straight on the GPU to h265 using AMF. I'll upload the source in case it's interesting for anyone else.

I would love this!

Alright, here you go.

https://github.com/kaetemi/second_capture/blob/master/second...

why does CPP code always look so messy and unintelligible?

like if I see C# or Python it makes sense to me at least in some way

whereas CPP code always looks like it's powering some rocket engine?

Also thanks for sharing!

Largely because it's a melting pot of ancient and modern coding standards. You've got the C Win32 API alongside COM style, and then whatever AMF is doing. That makes things very verbose and explicit.

I've seen worse Python.

Personally, I think it's charming. :)

Well first of all, C++ is the language you'd be using to power a rocket engine. And second, that code is a terrible example because most of it isn't C++. Large parts of that are very C like or directly C because it's using the Windows API.

> like if I see C# or Python it makes sense to me at least in some way

Could it be that you're just more used to looking at C#/Python than other things, so other things are more foreign and therefore look messy?

As another anecdote, I cannot stand browsing/looking through C# code as it tends to be filled with various classes just to write very basic programs. The amount of over-engineering I've seen in C# surpasses everything else I've looked at. Not to mention how people seem to arbitrarily choose between private/public with no real consensus on when to use what; everything seems to be encapsulated in the wrong way. And don't get me started on the infrastructure around it, csproj vs sln and dealing with dependencies/assemblies.

But then I mostly write Clojure code day-to-day, and I realize that my trouble dealing with C# is mostly because of what I'm used to, not because the language itself is inherently shit. I only have myself to blame for this. I'm sure people who write C# day-to-day have the same feelings about Clojure as I have about C#.

Thanks, I am giving it a try - any dependency required for Windows 10? Compiles fine, but get an error about AVIFileInit - maybe to do with <vfw.h>?

Thanks for open sourcing it so fast

See sibling comment response. :)

As much as I dislike the current AI hype, a local on-machine AI model that can read/interpret videos/thousands of images (basically a recording of screen time combined with video/audio/handwriting of my everyday life), store it in an indexed format, and project it back to me in an easy to understand/quickly digestible format would be a godsend I'd invest a lot of money into (provided false positives were close to zero)

To me, it seems obvious Apple eventually builds this into MacOS (“it’s a feature not a product”). This is like local apps or native OS features that would index your drive contents and provide a frontend to query, but on steroids. This also gets us closer to transparent computing.

Rewind claims to do this, but you'll have to trust them on the local claims, it's not open source: https://www.rewind.ai/

I would love this project to serve that need and personally want this too.

Absolutely. Combine it with real-time analysis of your current screen, and you've got a computer that knows the complete history of what you're doing and why. That kind of global analysis could be really useful.

I used ffmpeg to try to do smart compression for me (diffing etc)- but run OCR first. Also did a poor man’s text merging to try to make use of the overlap from scrolling
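The commenter does not say how the text merging was done. As one possible sketch, assuming the OCR output of consecutive frames is available as lists of lines and that overlapping lines match exactly (real OCR output is noisier), the tail of one frame can be matched against the head of the next. The function name `merge_scrolled` is hypothetical.

```python
def merge_scrolled(previous, current):
    """Append `current` lines to `previous`, dropping the overlap where the
    tail of `previous` equals the head of `current`."""
    max_overlap = min(len(previous), len(current))
    # Try the largest possible overlap first so repeated lines are not duplicated.
    for size in range(max_overlap, 0, -1):
        if previous[-size:] == current[:size]:
            return previous + current[size:]
    return previous + current  # no overlap found: keep everything


frame_1 = ["line 1", "line 2", "line 3"]
frame_2 = ["line 2", "line 3", "line 4"]  # same page after scrolling down one line
merged = merge_scrolled(frame_1, frame_2)
```

A more forgiving version could compare lines with a similarity threshold instead of strict equality.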

What OCR did you use?

Tesseract?

What was the performance (of the OCR) like?

Note: Relies on Apple Silicon, and configured to only produce Apple Silicon builds.

Just curious, what is relying on apple silicon?

Full disclosure, I haven’t tested it on Intel, but I don’t think it will be able to keep up with taking screenshots, generating ffmpeg videos, and doing OCR that often, and it will drain your battery very quickly.

But if you / someone can get it to be efficient enough, awesome!

I think you underestimate computers. Taking 2fps screen recordings is a trivial task. Doing OCR may be slightly more work but at 2fps I doubt it is an issue. Worst case you could tune the OCR frequency based on the computer's abilities.

You're confusing 2fps with 1-screenshot-every-2-seconds (or 0.5fps, which is what the README actually says).

I wouldn't be surprised if the battery issue is problematic, likely will result in at least some kind of battery life reduction, but perhaps not 30 or 50% at 0.5fps.

I haven't looked into the code, but if you're running ffmpeg, then battery life will likely take a hit depending on what exactly you're doing. Video encoding _can be_ heavy on the CPU/GPU.

That makes it even less work. Running ffmpeg is just video encoding, I don't think a 0.5fps video would be a huge issue.

Lots of people work plugged in most of the time. I don't see why one would want to gatekeep to keep them from using it.

What gate keeping? I just see a valid correction to your misstatement and your reaction reads like a defensive Karen wrote it.

Not supporting a platform just because it may cause battery drain, which may not even matter to plugged-in users, seems like gatekeeping.

Not supporting? The commenter simply said it may cause battery drain. It is a discussion on the topic (both sides based purely on conjecture), and a relevant one. You disagreeing does not mean others are "gate keeping". Stop trying to weaponize trendy language and white knight this thread.

The original README was claiming that it relies on Apple Silicon and that they have configured builds to exclude other Apple platforms. I see it has been greatly softened now to "Only tested on Apple Silicon, and the release is Apple Silicon", which I think is quite reasonable.

I have no problem with not supporting a platform because you have no interest or any other reason, but previously it was quite proud to not support it which is different.

Ridiculous. You are working very hard to be offended.

I don’t have an Intel Mac to test on- but you can absolutely just clone it and swap the config to Intel

I have to agree. If you're interested in supporting Intel(x86/64), it's open source, and you sound like you have the hardware to add support for and test on Intel.

it's literally an open source MIT licensed hobby project. fork it and improve it and share here. complaining about it is kinda rude.

I have been doing that with open-source Linksys IP cameras since 2010, and they only have something like 180 MHz and 32 MB of RAM. What are you thinking about?

I haven't looked at this codebase yet, but a screenshot every few seconds isn't a noticeable slowdown on most machines.

At such slow rates you don't need to create video - you just keep the individual images.

OCR doesn't need to be real-time, but can be done in batch mode or when the machine is idle.

This is also what I was wondering. The demo is showing recording a web-browser, and I'm wondering if that is all it is doing. If so, wouldn't that mean creating a browser plug-in would make this possible on any platform?

I also don't understand the chatGPT component, and what it is trying to tell him. Though I'm sure if you just threw the URL and the screenshot to chatGPT, you could ask it questions about that source.

I'm not sure how useful this is tbh, or how I would use it. I'm not saying it isn't useful, just that I'm not sure how I would use it, or why it is useful.

> The demo is showing recording a web-browser

He said it's not recording but taking a screenshot every 2 seconds, and I assume it's not just for a browser but for all text on the desktop.

> I also don't understand the chatGPT component

You give it context from the "recording" and it answers the questions you ask using that context.

This does look cool. It reminds me of a recent discovery I made. The other day, while trying to recover some disk space, I found a giant file on my hard disk. It turned out to be a nine-hour screen recording from almost a year ago. I had no idea it existed, so I must’ve accidentally left the screen recording on. Scrubbing through it sped up, watching the whole thing in a couple minutes, was fascinating; it was like a window into my thought process at that time. You could see how I was researching something online. It was almost like a play-by-play, akin to re-watching a sports performance – very instructive and surprisingly useful.

Also, the sense of being back in that time, seeing details that I otherwise probably would’ve forgotten, was transformative.

In a similar vein to what you’ve done, but focusing specifically on web browsing, I’ve created a tool called ‘DownloadNet.’ It archives for offline use and fully indexes every page you visit. Additionally, it can be configured to archive only the pages you bookmark, offering another mode of operation. It’s an open-source tool, so feel free to check it out: https://github.com/dosyago/DownloadNet

This sounds a bit obvious to me after I write it down: I think there’s some value in the fact you were unaware and it was a random time.

If you take your work very seriously, I can see it being valuable to record it like athletes do. It would be tempting to use this on the “most important” days or when you’re “really ready”. At the very least, there’s a burden of choice and memory. I don’t know about security implications, but it seems valuable to randomly record a day per month and send it to yourself a week later. Or in the case of this tool, select some period for extra review.

There's a Windows tool called TimeSnapper that takes a screenshot every few seconds and lets you replay and navigate.

After reviewing a few days I learned to start focusing on one thing at a time.

It was cringeworthy to see how ineffective multitasking by switching between a few tasks was.

Absolutely. Watching your playback in TimeSnapper gives a lot of insights into the way a small distraction can derail you for hours (or it does for me, I mean)

It's amazing to me the kind of vulnerable personal responsibility and insights that occur prompted by simply seeing yourself and how you act, clearly. I heartily concur with the above comments and am super happy to see other people having this similar experience.

It suggests these kinds of "mirroring" self-training practices and feedback might be useful across a whole range of endeavors, which sounds awesome. A super easy way to improve -- akin to people checking their reflection in a mirror -- that a bit of technology could really help with :)

> It archives for offline use and fully indexes every page you visit.

Oh, I also made a tool to do this! Never open-sourced, since it’s an utter pain to set up and the UX is terrible, but amazingly useful all the same.

Incidentally: how does DownloadNet work? My tool uses a browser extension to send the full-text of each webpage to a server, but yours doesn’t seem to have a corresponding extension, so I can’t see how it would retrieve the text.

Ah, good, let me introduce you to the wonderful world of the Chrome Devtools Protocol! (fka Chrome Remote Debugging Protocol)

I love this API for almost everything browser related. I built my RBI product atop this (BrowserBox: https://dosyago.com), and I think it's a drastically underrated API.

Also, it works out of the box in Edge, Brave, Chromium, and many parts of CRDP are supported by Firefox and Safari^1

1: See for example: https://github.com/WebKit/webkit/tree/main/Source/JavaScript...

I opened the page of BrowserBox but didn’t understand what it does. Can you provide an example of a real-world use case?

Very interesting, thanks! I’d better add this to my list of things to look into…

DownloadNet reminds me of how I really got started with Perl programming over 20 years ago. Since I was using my parents' landline with a dial-up modem (which cost cents per minute), I wanted to speed up the process of looking for a job via the government's official job search site.

Turns out, on my slow computer it was faster to clean up a megabyte of HTML with regular expressions before giving it to Firefox than just rendering it as-is - by about 30 seconds per search result page.

Perhaps it's possible to sanitize often visited websites with DownloadNet? (currently getting aggravated by reddit hiding images via JS code to prevent download / viewing in another tab...)

> Perhaps it's possible to sanitize often visited websites with DownloadNet? (currently getting aggravated by reddit hiding images via JS code to prevent download / viewing in another tab...)

Many years ago, I remember using a utility called: privoxy, on Linux/Unix, for that very purpose. No idea if it’s still viable, but thought I’d mention it, in case you’re serious?!

When allowed I use a tool called Manic Time that (in the paid version) does this.

It used to be "local by default" but now I think that might be changing to "local if you want".

They have also in the past been a perfect creator of commercial software as far as I know:

- generous free edition

- paid versions work forever with its current feature set

I typically set it to auto delete after 14 days and disallow screenshots from my ordinary browser (because meetings and passwords), Slack and Teams (meetings) etc.

ArchiveBox and its companion browser plugin can also archive everything you visit and may be of interest: https://archivebox.io/

I feel like recording everything is like recording nothing in practical terms

i record every command in .zsh_history (like everybody else does by default, but mine is configured to not have a size limit)

i often do things like

history | rg ..

it helps when you roughly know what you want to find, but want to check some detail you forgot

For those unaware: CTRL+R in terminal will also change your prompt to search your command history. After typing, CTRL+R again to cycle through matches.

what you really want though is fzf with C-r

Would agree 100%

fzf supercharges your shell history. I can’t imagine my life without it, since I spend most of my day in the terminal.

Okay that's interesting, thanks

...and if you're like me - I live on the terminal - tools like atuin[^1] are very handy.

[^1]: https://github.com/atuinsh/atuin

It lets you query any data once you realize what is important (which might vary depending on the question you're trying to answer).

It's like law enforcement tracking everything we say. They aren't catching many people right now, but wait until the future when they start working backwards with logs.

And when things are illegal which weren’t illegal when they were said.

Think of it as closed-circuit TV for your computer. You don’t need to watch 24/7, but you can go back for specific incidents/information.

Yeah, I understand that. It seems that it tries to classify activity in order to help find relevant stuff, judging by this:

  let configuration = ImageAnalyzer.Configuration([.text])
  let nsImage = NSImage(cgImage: image, size: NSSize(width: image.width, height: image.height))
  let analysis = try await ImageAnalyzer().analyze(nsImage, orientation: CGImagePropertyOrientation.up, configuration: configuration)
  let textToAssociate = analysis.transcript
  let newClipboardText = ClipboardManager.shared.getClipboardIfChanged() ?? ""

Don’t show the VCs that invested $27.9M into https://rewind.ai this

They will be very upset

Rewind.ai looks a lot more full featured (unless their site is complete BS and none of it works yet). Doesn't matter though because Apple will rebuild this themselves in 2-5 years with an on-device LLM chip that you will have to buy new hardware to get and it will be way more efficient and with way better privacy.

From the repo, OP did this in a couple of days with no experience in Swift. So getting to Rewind's stage is not that hard, it seems.

This has all the makings of the original “Dropbox is just rsync” comment.

Yes but in this case both apps are relatively new and not established ones if I’m not mistaken

Similar to the “Loom is just OBS with Dropbox on top”

for now it's easy to catch up. But after a few months they will be so far in the sky from all the vc money that it will be tremendously harder. Like the M1 chip for example

Works well. Been using it since beta. I've got a memory like a goldfish and this comes in handy.

a16z deploys capital fast into AI companies. They've already funded several companies running off the shelf open source models.

Find the latest flashy thing on Twitter / GitHub, spin it up with a waitlist, then send a16z your deck.

Guess it’s an improvement over deploying it to sociopathic felons which was their last claim to fame.

“pretty scary stuff” indeed!

This would inevitably end up ingesting secrets, right? Like say from my password manager? Or API keys in my terminal?

Lots of ways for this to go sideways even if the data stays local.

What’s the plan there?

> Lots of ways for this to go sideways even if the data stays local.

Could you name some?

> Like say from my password manager? Or API keys in my terminal?

That's not describing a bad outcome, it's describing how the tool works.

Oh, well I think what he meant is that some malicious program could read and transmit this unencrypted recorded data which is normally stored in an encrypted form

Thanks, I think so too, but the threat model is a bit odd. On a Mac, potentially malicious programs do not normally have access to files in every location (e.g. the prompts to allow a process to access your Documents dir); there is hardware-backed crypto available for further protections; full disk encryption; and so on. It's unclear to me how to evaluate the severity of the risk.

Every security decision is a risk-reward tradeoff, and the reward of a complete memory of computing tasks seems pretty huge.

The impression I was left with is that this tool would write things to disk. It would be helpful to know how that data is stored. I wouldn’t want my password manager OCR’d and then sitting in plain text on disk for example.

Come together as a community and help build the right thing. This isn’t the first implementation and I don’t have a fiduciary duty to create value to investors.

Loving this concept. Perhaps really useful for my work laptop (as in, my own one, but only for work stuff), as quite often you just want to quickly backtrack and find that piece of info you looked at earlier rather than navigate to it again. I’d imagine something like a physical wheel on your desk to wind back would be amazing. I have a useless Bose one that never gets used; I can imagine it would feel very “Black Mirror” to use that to rewind.

A lot of custom keyboards have wheels (search "rotary encoder"), common enough for qmk to support them (https://docs.qmk.fm/#/feature_encoders).

Or even something like this https://thepihut.com/products/adafruit-rotary-trinkey-usb-ne...

I love it. The touchpad feels pretty good, but a wheel would be incredible.

I debounce the livetext analysis on history so you should be able to spin fast without issue
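For readers unfamiliar with the term, debouncing means the expensive work only runs once input has been quiet for a short while. A minimal sketch with an explicit clock so it can be stepped by hand; the `Debouncer` class, the 0.3 second wait and the frame numbers are assumptions for illustration, not Rem's actual code.

```python
class Debouncer:
    """Run `action` only once input has been quiet for `wait` seconds."""

    def __init__(self, wait, action):
        self.wait = wait
        self.action = action
        self.pending = None   # latest value waiting to be processed
        self.deadline = None  # time at which the pending value fires

    def submit(self, value, now):
        # Every new event replaces the pending one and restarts the timer.
        self.pending = value
        self.deadline = now + self.wait

    def tick(self, now):
        # Called periodically; fires at most once per quiet period.
        if self.deadline is not None and now >= self.deadline:
            value, self.pending, self.deadline = self.pending, None, None
            self.action(value)


analysed = []
debouncer = Debouncer(wait=0.3, action=analysed.append)

# Spinning quickly through frames 0..3: nothing is analysed while scrubbing.
for frame, now in enumerate([0.00, 0.05, 0.10, 0.15]):
    debouncer.submit(frame, now)
    debouncer.tick(now)

debouncer.tick(0.50)  # quiet for long enough: only the last frame is analysed
debouncer.tick(1.00)  # nothing pending, so nothing happens
```

Only the frame the user stops on gets the costly text analysis, which is why spinning fast stays cheap.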

Sweet. Fast spin for the win!

Rewind has search - it just works better than rewinding manually.

How similar is this to rewind.ai (https://www.rewind.ai)?

I never heard of this until now but this looks amazing

Would be even more amazing with a locally running LLM

That’s a core purpose of the project!

Rewind relies on GPT-4 for the useful parts. I assume Rem will support local LLMs?

https://help.rewind.ai/en/articles/7791703-ask-rewind-s-priv...

That's the plan. Very open to ideas on the best way to do it. Seems like either Stdin/Stdout or API call via localhost.
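A rough sketch of the stdin/stdout option, which needs no network permission: the prompt is piped into a child process and the answer is read from its output. The `FAKE_MODEL` command is a stand-in that just upper-cases its input, and `ask_local_model` is a hypothetical helper, so treat both as assumptions.

```python
import subprocess
import sys

# Stand-in for a local model binary (hypothetical): echoes the prompt in upper case.
FAKE_MODEL = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def ask_local_model(prompt, command=FAKE_MODEL, timeout=30):
    """Pipe `prompt` to the child's stdin and return what it writes to stdout."""
    result = subprocess.run(
        command,
        input=prompt,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the child exits with a non-zero status
    )
    return result.stdout


answer = ask_local_model("what was I reading yesterday?")
```

The trade-off is that a new process per question reloads the model each time, whereas a long-running localhost server keeps it warm but requires network access.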

I only used Rewind at alpha, so I'm not sure how much they’ve added, but it has the value I got out of it, and doesn’t limit your searches arbitrarily.

- takes screenshots every two seconds

- records all the text via OCR

- builds full text search with sqlite

- allows you to go back in time however far and select/copy text from there
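The "full text search with sqlite" step can be sketched roughly as follows, assuming the OCR text for each screenshot is already available and that the SQLite build includes the FTS5 extension. The table and column names are made up for illustration and are not taken from Rem's schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One row per screenshot; the timestamp is stored but not indexed for search.
db.execute("CREATE VIRTUAL TABLE frames USING fts5(captured_at UNINDEXED, ocr_text)")


def record_frame(captured_at, ocr_text):
    """Store the OCR text of one screenshot."""
    db.execute(
        "INSERT INTO frames (captured_at, ocr_text) VALUES (?, ?)",
        (captured_at, ocr_text),
    )


def search(query):
    """Return the timestamps of all frames whose text matches the query."""
    rows = db.execute(
        "SELECT captured_at FROM frames WHERE frames MATCH ? ORDER BY captured_at",
        (query,),
    )
    return [row[0] for row in rows]


record_frame("2024-01-01T10:00:00", "Hacker News: Show HN Rem")
record_frame("2024-01-01T10:00:02", "Invoice 1042 due Friday")
record_frame("2024-01-01T10:00:04", "Terminal: git push origin main")

invoice_hits = search("invoice")
push_hits = search("git push")
```

The matching timestamps are what a timeline view would jump to when you pick a search result.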

No meeting recording / audio recognition. Kinda irks me. Easy to add though.

Really like this. I might use it as a way to keep myself accountable.

I wonder if the screenshots can easily be categorized as "time wasting" vs "productive" (possibly via ML model?). Could optionally gamify statistics. Example last hour: 78% productive, 12% hacker news, 10% inactive. You could go for your own high score (e.g. 3 x 100% hours in a day would probably be a great day for me!).
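As a rough sketch of the statistics part, here is a plain lookup table standing in for the ML model the comment suggests. The sample format (one frontmost app or site per screenshot, `None` when idle) and the `CATEGORIES` mapping are assumptions.

```python
from collections import Counter

# Hypothetical rules mapping the frontmost app or site to a category.
CATEGORIES = {
    "Xcode": "productive",
    "Terminal": "productive",
    "news.ycombinator.com": "hacker news",
}


def categorize(sample):
    """Map one sample to a label; None means no activity was detected."""
    if sample is None:
        return "inactive"
    return CATEGORIES.get(sample, "other")


def breakdown(samples):
    """Return the percentage of samples per category, rounded to whole numbers."""
    if not samples:
        return {}
    counts = Counter(categorize(sample) for sample in samples)
    return {label: round(100 * n / len(samples)) for label, n in counts.items()}


# 50 samples standing in for an hour of screenshots.
last_hour = ["Xcode"] * 39 + ["news.ycombinator.com"] * 6 + [None] * 5
stats = breakdown(last_hour)
```

Swapping `categorize` for a model-based classifier would leave the rest of the bookkeeping unchanged.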

PS: love the video demo. I figured out what this does in < 30 seconds. Thank you!

PPS: (very tangential) video speed controller (browser addon) now works with loom videos - a few months ago that wasn't the case.

Somebody else pointed out RescueTime, but if keeping it local is a priority, I recommend Qbserve [0], which I've been using (mostly passively in the background) for a few years now.

[0] https://qotoqot.com/qbserve/

This category of software was actually really useful when I wanted accountability during R&D tasks on engagements. I used Timing, and it would parse the active window titles and create a timeline. Then the creator wanted to charge $80/year and I ended up dropping it completely. I also kinda realized that this sort of software isn’t that different than a RAT and an attacker could target these sort of things. I also figured Apple would’ve opened up their screen time API by now and this class of software would become redundant

You can list windows and detect the front window using macOS APIs instead of taking a screenshot and running OCR/detection.

https://www.rescuetime.com/ already does what you describe very well, without ML. I've used it for years now, for personal accountability.

It even already does the "high score" thing you are talking about, LOL

Congrats on getting this off the ground, and thank you for putting it out there for us to learn from!

I've been curious how Rewind worked under the hood because I've been playing with an idea in my head: an AI assistant that helps you protect your attention.

You would describe the kind of content that you consider a distraction, and any other constraints you have (e.g. "Don't let me watch cat videos unless I'm on a break").

And whenever it sees you watching anything that fits your prompt, it'll pop up on the screen and start a conversation with you to try and understand whether you actually need to consume the content you're looking at.

An AI that intervenes when you're going off track (based purely on how YOU define going off track). Current website blocking approaches aren't useful because they're all-or-nothing. I don't ever want to block entire sites because often there's useful content there relevant for my work. I want to block content on a much more granular level.

And I'd love for an "attention audit" at the end of each day. Attention is our most valuable asset, and I believe protecting it is a worthwhile endeavor... I'd just like some help doing so :).

I encourage you to fork this repo and build it.

Might be worth checking out Ollama and bakllava. https://ollama.ai/library/bakllava

Maybe the model is a bit too slow, but I'm sure smaller ones will come out soon. You can likely fine tune to do exactly what you need.

Thanks for the share! Will check it out.

Oh this seems like a wonderful idea. Loads of invasive privacy issues if you’re not doing the detection locally but I’d absolutely use something like this

Thanks! I agree that everything needs to happen locally, and I believe it's possible.

I'd love to better understand the problems you're facing that make you want to use a tool like this.

Couldn't find your email, but if you're interested in chatting, you can find mine in my bio. Would appreciate it!

Does it do inter-frame compression at all?

Also, integrating with Ollama.ai or some other local LLM with an API server would be fantastic.

I’d love your opinion on the right way to do this! Being able to call APIs means network permissions, which I was trying to avoid. Maybe via Stdin / Stdout?

were you trying to avoid network permissions (I'm guessing) because this is Docker? (That's the only reason off the top of my head for wanting to avoid network access... in a non-Docker context, localhost is of course easy to hit up, but Docker and nets are a PITA)

Jason - great work here. Your Swift code looks like mine :) on this, some folks in the UK have created Crux - an interesting abstraction layer for mobile apps using Rust. Might provide some ideas for optimisation/ipc. https://github.com/redbadger/crux

Very cool demo, OP. Not sure why it's only for Apple Silicon; is it because of its superior ML support compared to Windows? Side observation: Ollama is not available for Windows. Sadly I won't be able to test this out since I don't own an Apple Silicon notebook; I only have an Intel Mac and a beefy Windows machine.

I don't know if I am just a basic programmer, or whether I lack an understanding of how folks build something like this from scratch with no Swift experience. If I were OP, I would first do a bunch of Swift tutorials.

This may be wishful thinking, but it would be a legitimate side project to make a clone of this that works on Linux or Windows in the programming languages I am most comfortable in, Java and C#. I have zero background in building anything in ML and am not at all familiar with the DirectX API or the Linux desktop APIs.

The point I am trying to make is that there is a crap ton of APIs and tools to be familiar with before even taking the first step to code.

How did OP crack this with no experience in Swift? Is it simpler to build projects on Apple Silicon, and should I get one?

Mind you, I have 4 YOE and code in Java and C#, doing vanilla web APIs and a bit of WinForms/DevExpress work.

I have been in software for a while now. I just love learning and doing cool stuff.

As for being able to hack in Swift specifically... I'm comfortable in 5-6 languages, and play with many more. The language itself feels like a mix of C# and Kotlin to me. But I had no familiarity with Mac OS APIs or SwiftUI / MVVM etc. but there's lots of docs, despite them having effectively no examples.

The repo has a lot of room to grow in terms of quality, which is perfectly ok in my book.

Re: Apple Silicon - I should have just said "I built it on my laptop which is an M1 Air and it's managing to keep up with the screenshot -> OCR -> ffmpeg rendering pipeline and not completely drain the battery and have a strong suspicion it will require a lot more work to get it to perform the same on Intel computers"

As for clone it and build it in Linux / Windows - do it. And there are other comments here suggesting others want to do the same.

I personally want this thing, if I can impact speed of it happening positively, awesome.

I want to record everything and have this nice big dataset which is what I've experienced (on my laptop) the last X amount of time, and be able to do stuff with it - whether it's chat with a local llm or have a really good way to search back in time. I _constantly_ need things from the past.

You could set up screen sharing for any system that supports it, and on another system receive the stream and use one or more models to run object ("cat, piano") and text detection on images and sound. Apple Silicon can do that, but it's not as efficient or cost effective as a dedicated GPU. This approach would be system neutral since models work on (increasingly) any GPU including Apple's. On your workstation, you'd just have the overhead (and concerns) of the video stream.

I suspect it's Apple Silicon only because there are simple APIs that Apple has provided to take screenshots of the desktop and OCR text from images. I don't think OP necessarily built any low-level code from scratch here (not that their provided code isn't useful).

Someone else has previously looked into how Rewind.ai may be doing its thing under the hood and there are more details about it here: https://kevinchen.co/blog/rewind-ai-app-teardown/ OP may have used some of the info there.

Can also see this being used by scammers/malware. Not saying it shouldn’t exist. It’s really cool. Just scary. Great job.

FWIW, it requires you to explicitly give it permission to record your screen. It would also require you to explicitly give it permission to use the network if it needed to make any requests.

I’m super glad about this personally.

I think OP is referring to a similar attack vector used in the recently presented “triangulation exploit”, wherein attackers used iOS’ stored data from its own local machine learning engine, which classifies photos using object recognition and stores text from images with OCR, to prioritise which photos from a victim's phone had content of interest for them.

Seems a legitimate concern; unsure why op is receiving negative attention for saying so.

Precisely, although I’ll come clean: I didn’t know about that exact triangulation exploit, just the fact that nicely organised historical data of all actions is there somewhere to be found. Especially if this type of software starts getting all the modern AI magic on top, this could be used by everyone as standard tooling and perhaps be a target.

Anyone else mentally associate REM with QBASIC?

https://www.qbasic.net/en/reference/qb11/Statement/REM.htm

I associate it with Commodore 64 BASIC V2

only half remembered and had a vague feeling it's familiar. It's been a while. Realizing now BASIC is pretty weird.

bat/cmd scripts for me

https://ss64.com/nt/rem.html

I have been recording what I type or copy, and the window titles of applications I interact with, for the past 15 years. And it has helped me recover stuff that I couldn't have recovered without this system.

I recently switched to MacOS, and I'm missing this very much.

I am building this exact app you're using on Windows but for MacOS - I love hearing that you're also a fan of screen recording!

Mind sharing the name of the app you use for recording on Windows?

As this seems to generate quite a lot of positive feedback, what would be use cases for something like this? Asking not only OP.

An easy one for me is programming.

This kind of approach is the only way I know to go back in time and recognize / resurrect your thought process.

But there are little thorns it solves all over. Ever experienced knowing you did something X days ago but it's in the past and there's literally no way to go back and look at it? Ideally, it solves that.

Version control / history is great if the app supports it, but depending on how it works, "a month ago" might not be available.

At one point I was considering building something similar for myself. Basic idea was something like: Take one screenshot every second, caption the image somehow and keep both things around forever. Add in some adapters that can extract more information (if the browser was active last minute, gather all URLs from that minute and categorize, and so on with different things) and put everything into one location.

The purpose of doing this would be to get a database I can search/query when I kind of know what I'm looking for, but cannot remember exactly what it was. Being able to query "show me all websites I've never visited before, but visited for the first time in week 35" would help me find those much more easily.
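To make the "first visited in week 35" query concrete, here is a minimal sketch using SQLite. The `visits` table, its columns, and the sample URLs are hypothetical; the comment above does not describe a schema, so treat this as one possible shape rather than a design.

```python
import sqlite3
from datetime import datetime

def first_visits_in_week(conn, year, week):
    """Return URLs whose earliest recorded visit falls in the given ISO week."""
    rows = conn.execute(
        "SELECT url, MIN(visited_at) FROM visits GROUP BY url"
    ).fetchall()
    result = []
    for url, first_seen in rows:
        iso = datetime.fromisoformat(first_seen).isocalendar()
        if (iso[0], iso[1]) == (year, week):
            result.append(url)
    return sorted(result)

# Hypothetical schema: one row per visit, timestamp stored as ISO 8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (url TEXT, visited_at TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", [
    ("https://example.com/a", "2023-08-29T10:00:00"),  # first seen in week 35
    ("https://example.com/b", "2023-08-01T09:00:00"),  # first seen earlier
    ("https://example.com/b", "2023-08-30T09:00:00"),  # revisit in week 35
    ("https://example.com/c", "2023-09-03T23:00:00"),  # Sunday, still week 35
])
print(first_visits_in_week(conn, 2023, 35))
```

Storing timestamps as ISO 8601 text keeps `MIN()` correct, because such strings sort chronologically. Page `b` is excluded since its first visit predates week 35.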

Also just having a recorded log of everything I'm doing would be helpful to see where I'm spending my time the most.

Used to do this several years back, but on a Windows machine and without any of the AI stuff, obviously. One use case I found is tracking down unpredictable and seemingly randomly occurring bugs, since you can rewatch the events leading up to the bug and form better hypotheses about what might reproduce it.

Eventually I had to stop because the fan was going crazy, plus I couldn't bear seeing how slow and error-prone I was at typing and at generally operating the computer (it never felt that way when I'm using the computer, but watching myself using it is a different story)

I am building this exact software for exploratory bug-testing. What have you replaced it with since you last used it on Windows? I think I tweaked the recording aspect to be super-clean, and CPU/memory impact is minimal now (1%).

Haven't found any replacement, but dashcam.io looks very promising for that use case, will definitely be checking it out!

Serious question: I have a serious case of OCD where I keep trying to remember things verbatim (the verbatim part is the OCD). Naturally there is a bunch of checking and repeating involved in trying to do so.

I have been considering the idea of using a similar app to this (or rewind.ai), but I have the concern that it might aggravate my situation. Just imagining my checking self watching 12 hours of video footage already gives me chills.

I would appreciate it if anyone with a related or similar situation could share their experience using those apps. Since this is fairly sensitive, my email is also in the profile if anyone wants to contact me directly.

You’ll end up with inception levels of watching yourself watching yourself…

There are folks in the Rewind.ai Slack community who mention that it has been invaluable for managing their ADHD. Perhaps if you join their Slack [1], you might be able to meet people in a similar situation to yours (with your type of OCD)?

1: https://rewind.ai/community

Another obvious option is to just access the browser’s History file and request and store the contents of each visited page. This prevents you from needing to do OCR and is more highly compressible. Or do your method, but throw away the screenshots after AI analyzes and OCRs them. BTW, Mistral 7B is good enough! We don’t need to rely on ChatGPT4 IMO and copy pasting context is a bit sloppy.

I wanted to build a similar tool that just relied on browser history. But I couldn't figure out any way to do it (especially not through browser extensions).

If anyone has any suggestions, I'd be more than grateful.
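One possible starting point for the browser-history route, sketched under assumptions: Chromium-based browsers keep history in an SQLite file named `History` with a `urls` table, and timestamps are microseconds since 1601-01-01 UTC. The file is locked while the browser is running, so the sketch queries a temporary copy. The function names are hypothetical, and fetching and storing each page's contents is left out.

```python
import shutil
import sqlite3
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Chromium stores times as microseconds since this epoch.
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def chrome_time_to_datetime(microseconds):
    return CHROME_EPOCH + timedelta(microseconds=microseconds)

def read_history(history_path, limit=20):
    """Return (url, title, last_visit) tuples, most recent first."""
    # Work on a copy because the live file is locked while the browser runs.
    with tempfile.TemporaryDirectory() as tmp:
        copy_path = Path(tmp) / "History"
        shutil.copy2(history_path, copy_path)
        conn = sqlite3.connect(copy_path)
        try:
            rows = conn.execute(
                "SELECT url, title, last_visit_time FROM urls "
                "ORDER BY last_visit_time DESC LIMIT ?",
                (limit,),
            ).fetchall()
        finally:
            conn.close()
    return [(url, title, chrome_time_to_datetime(ts)) for url, title, ts in rows]

# Example (assumed default Chrome profile location on macOS):
# path = Path.home() / "Library/Application Support/Google/Chrome/Default/History"
# for url, title, when in read_history(path):
#     print(when.isoformat(), title, url)
```

Polling this file periodically avoids the extension route entirely, at the cost of only seeing URLs and titles rather than page contents.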

Yeah that works well for browser stuff, but this works with IDEs etc too

and totally. Haven’t added direct local interaction yet, but on the roadmap.

cool concept, love the idea. Might be fun to integrate with local llama to get most privacy

100% and local embeddings. This is the area i want to explore next.

The demo I showed with ChatGPT works just as well with openhermes2.5-mistral. But it's instant with ChatGPT instead of taking 20s.

Interesting concept, however I don't get what information is pasted into the context. Also, ChatGPT's context is kinda limited. I can probably remember the recent context myself; what I have a problem with is context from, let's say, a week ago, which would probably be way over the LLM's context window.

Admittedly, it might have been a mistake as a demo / feature, but haven’t built embedding support yet. Working on it!

Have you tried using the Accessibility API instead of (or alongside) taking screenshots? It won't work with all apps, but you can fall back to OCR when it doesn’t, and best of all you can monitor the “DOM” for changes.

Candidly, I don't know how to do this effectively, especially with browsers. I looked into this approach using the notification pattern, but I just couldn't see a good way to do it. I'm no expert in Mac APIs and would love to learn and / or see any specific approaches you have in mind!

Are there instructions on how to launch the app? I’m able to clone the repo but a bit lost on next steps

Open in Xcode and create an archive or run it directly.

Or you can use the release I uploaded.

I added instructions for how to use it once it’s open to the README.

Apologies for anything unclear!

Surprisingly, this is something that I have been thinking about for the whole of 2023.

I myself am really bad at documenting findings while doing research or bugfixing, so I started recording all my daily activities, both for replaying research sessions and for my future self in case something is not clear in the docs.

Then I learned about Rewind and was happy to know that I am not alone. rem is the confirmation that this definitely has great use cases :)

I’d prefer the recording phase to be as lightweight as possible, so I am recording the full mp4 video and plan to re-encode at a lower rate at night. But there is a compromise between recording quality and file size; I do not want to end up with several petabytes of videos.

What codec do you recommend for this use case? Lossy video codecs are usually very efficient for real-world images (just like the comparison between jpg/png), and I am sure a video format that is PNG-based should be more efficient in space while preserving text quality.

I am very interested in reading your thoughts about this.

I used h264_videotoolbox, which is supposed to be efficient on Apple hardware. I'd like to get hevc_videotoolbox working.

rem does OCR in memory before streaming to ffmpeg. But it works on the screen grabs of the video anyway.

Yeah, it's a pretty different use case than other video. Curious too if there are "screen recording optimized" codecs.

Like non-contiguous diffing. Instead of "diff from last frame", "diff from frame X"- and/or some sort of quad tree hash lookup
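A toy sketch of the hash-lookup idea: split each frame into tiles, hash every tile, and store a tile only the first time its hash is seen. A frame then becomes a map from tile position to hash, so it can reuse tiles from any earlier frame, not just the previous one. The tile size, the `BlockStore` name, and the frame representation (a list of equal-length `bytes` rows) are all assumptions for illustration; a real codec would work on decoded pixel data and much larger tiles.

```python
import hashlib

BLOCK = 4  # toy tile size in pixels (assumption; real tiles would be larger)

def split_blocks(frame, block=BLOCK):
    """Yield ((top, left), tile_bytes) for a frame given as a list of bytes rows."""
    for top in range(0, len(frame), block):
        for left in range(0, len(frame[0]), block):
            tile = b"".join(row[left:left + block] for row in frame[top:top + block])
            yield (top, left), tile

class BlockStore:
    def __init__(self):
        self.blocks = {}  # digest -> tile bytes, each distinct tile stored once

    def add_frame(self, frame):
        """Store a frame; return (position -> digest index, number of new tiles)."""
        index = {}
        new_tiles = 0
        for pos, tile in split_blocks(frame):
            digest = hashlib.sha256(tile).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = tile
                new_tiles += 1
            index[pos] = digest
        return index, new_tiles

store = BlockStore()
blank = [bytes(8) for _ in range(8)]           # 8x8 frame, all zero pixels
changed = [b"\x01" + bytes(7)] + blank[1:]     # one pixel differs in the top-left tile

_, new_a = store.add_frame(blank)      # four identical tiles, stored once
_, new_b = store.add_frame(changed)    # only the changed tile is new
_, new_c = store.add_frame(blank)      # matches a non-adjacent earlier frame
print(new_a, new_b, new_c, len(store.blocks))
```

The third frame costs nothing to store even though it differs from the frame right before it, which is the case a plain "diff from last frame" scheme handles poorly.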

This is really awesome

Thank you! I hope it can become more awesome and be useful to people.

I really, really want something like this that is truly multi-platform and local. Linux and Windows are a must. Must be 100% offline so that it is usable without Internet. I'd gladly pay $60 per major version per year. Add a permissive open source license and you have me as a customer for life. Maybe I should just build it myself if others are interested?

I'd be definitely interested.

Comments:

- Insanely useful with some changes.

- Needs local llama support for privacy.

- Needs pause/record functionality, ideally w/ preset exclusions, again privacy.

- If this could evaluate in real time at some point and start intelligently adding value at that point it has the chance to change things.

My guess is that in 10 years this will seem absolutely archaic. Now, it feels a bit like magic.

Thanks for the feedback! You can start / stop remembering whenever you want.

As far as real time stuff and local llama- absolutely, on the roadmap.

I’ve been exploring / experimenting with embedding spaces and local models a lot.

Not quite the same thing (no screen grab) but there is a non-visual cousin, http://arbtt.nomeata.de that records X11 data and provides a query language to produce summaries of the data.

This is a lot like RescueTime which I use for personal accountability as to how I am actually spending my time

https://www.rescuetime.com/

Pretty interesting stuff.

I'm just wondering how you manage the limitation of context length.

For the "copy recent context"?

The last 15 frames.

It's a terrible approach! But I had to start somewhere. Actively experimenting with properly leveraging embedding search.
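For readers wondering what a "last 15 frames" context could look like in code, here is a rough sketch. It is not rem's actual implementation; the class name, the separator, and the choice to skip consecutive duplicate frames are assumptions.

```python
from collections import deque

class RecentContext:
    """Keep the OCR text of the most recent frames for pasting into a prompt."""

    def __init__(self, max_frames=15):
        # A bounded deque drops the oldest frame automatically.
        self.frames = deque(maxlen=max_frames)

    def add_frame_text(self, text):
        text = text.strip()
        # Assumption: skip empty frames and exact repeats of the previous frame,
        # since an idle screen would otherwise fill the window with copies.
        if text and (not self.frames or self.frames[-1] != text):
            self.frames.append(text)

    def as_prompt_context(self):
        return "\n---\n".join(self.frames)

ctx = RecentContext(max_frames=3)
for text in ["editor: main.py", "editor: main.py", "browser: docs", "terminal: tests", "browser: HN"]:
    ctx.add_frame_text(text)
print(ctx.as_prompt_context())
```

A fixed window like this is cheap but forgets anything older than the window, which is the limitation embedding search is meant to address.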

But I've had a hard time finding CPU + RAM efficient vector indexing + search that meets my expectations. Been doing a lot of personal experimentation and research in this space.

Is there a known approach for maintaining a large embedding space that you can insert into efficiently and search accurately without needing to load / keep the entire thing in memory?

About Remember Everything

I use SingleFile (a browser extension), which saves a copy of every webpage I view in Chrome and Firefox. I use a program, AutomaticScreenshotter, to record my screen activity and capture other non-browser activity. This enables me to work out what I was doing on my PC at any past date. All files are saved in a Year/month/day dir structure. For finding stuff, I use Windows search at present.

I also use ditto to save all copy and pastes in a mysqldb.

I've been doing this (the dir structure) since before 2010. The extensions and screengrabs I only started about 3-4 years ago.

I've often wondered if forensic PC investigation tools could also be used (maybe with some mods) to help produce a PC timeline of my activity.

I'm curious how much data is produced and saved every day with such a setup. If I had to guess I'd say multiple gigabytes, but that doesn't sound sustainable on any reasonably sized hard drive.

Copying text from the saved footage is wild!

I had a poor man's version of this with TimeSnapper Classic, a free Windows utility that takes a screenshot every n seconds then lets you view a timelapse at the end of the day to show how you spent your time.

After a few weeks my disk was starting to fill up with screenshots. I browsed the folders and noticed that most of the screenshots looked almost identical. "I should come up with some kind of image codec optimized for image sequences, that diffs against the previous image to save space." Then realized I had basically reinvented GIF / video codec haha. So I wrote a script to shove the timestamp (filename) into the image itself (with ImageMagick), and convert them to video with ffmpeg. 99.9% size reduction!
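A rough reconstruction of that kind of script, as a sketch: it only builds the ImageMagick and ffmpeg command lines, and every specific here (font size, corner, frame rate, file naming, `libx264`) is an assumption, since the original script is not shown. It assumes ImageMagick's `convert` command (ImageMagick 7 calls it `magick`) and writes sequentially numbered frames so ffmpeg can read them with a `%06d` pattern.

```python
from pathlib import Path

def stamp_command(src, dst, label):
    """ImageMagick command that burns `label` into the bottom-right corner."""
    return [
        "convert", str(src),
        "-gravity", "SouthEast",
        "-pointsize", "24",
        "-fill", "white",
        "-undercolor", "black",
        "-annotate", "+10+10", label,
        str(dst),
    ]

def video_command(frames_dir, output, fps=10):
    """ffmpeg command that encodes numbered PNG frames into an H.264 video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),
        "-i", str(Path(frames_dir) / "%06d.png"),
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        str(output),
    ]

def build_commands(screenshots, work_dir, output, fps=10):
    """Return every command needed, in order: one stamp per frame, then the encode."""
    commands = []
    # Filenames are assumed to be sortable timestamps, as in the comment above.
    for i, src in enumerate(sorted(screenshots), start=1):
        dst = Path(work_dir) / f"{i:06d}.png"
        commands.append(stamp_command(src, dst, Path(src).stem))
    commands.append(video_command(work_dir, output, fps))
    return commands

shots = [Path("2023-12-27_10-00-05.png"), Path("2023-12-27_10-00-00.png")]
for cmd in build_commands(shots, Path("stamped"), Path("day.mp4")):
    print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists.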

This looks a lot more useful though.

I forgot to explain the most valuable thing about watching a timelapse of your own day. It puts you "outside" time so you can view it from "above", essentially see the whole thing at once. (Not quite, since that would be a (still image) timeline, but the effect is very similar if the timelapse is short enough.)

Really puts things into perspective.

I'm curious as to why you chose to turn the screenshots into a video. What are the benefits of storing them like that instead of as image files?

Dramatically smaller size on disk. Video codecs leverage representing things using diffs. Think about a 2 minute video of someone reading an article online. Then think of 60 screenshots of someone reading that article over 2 minutes. The 60 screenshots are likely ~15-30MB. The video is probably like 3MB or less and that’s without doing much of anything. Any time the user is idle, that’s kind of free in a video. An image, it wouldn’t be.

definitely potential for nightmare scenarios - employers would love using this type of thing to fully surveil staff. Plug it in to AI and you have real time monitoring of everything everyone is doing with alerts.

It's interesting that, as someone who seems to care about privacy and security, you would use a closed-source web browser (Arc Browser).

Cool stuff. Interesting to see how these ideas evolve, now with LLMs. I made a similar thing some time ago (>2yrs): https://shkur.ski/chronocatch/ for Mac/Win (Intel, H264 for interframe compression and BM25-ranked search). Then the war started, and I regret not sharing it back then "as is" when I could.

Now I no longer need to wonder who Rem is.

Can anyone suggest a Linux equivalent of this project? X or Wayland, doesn't matter to me.

This is very cool, I am building a tool [1] to record 1H of screen at a time (to help developers debug errors while doing exploratory testing) and I always thought that I could add a layer to turn my 1-hours-brain-recording into a baby Rewind.

I have tried Rewind in alpha/beta, it was cool, but it was never something I felt like I needed. That being said things change, and maybe I'll change my mind when it's part of the OS in a seamless way, but it's sketchy for as long as it's not offline: let alone the privacy consequences of running Rewind ;)

[1] https://dashcam.io

How is the latency, impact on battery?

Would an iPhone version be possible?

Maybe in 5 years Apple will release a native version of this.

This looks super-interesting! I haven’t seen the questions yet scrolling through a number of comments, so:

- how much disk storage does this use, say per hour of typical computer use

- how much CPU/battery life impact does it have

It will be interesting to see if or how these technologies will be used in ten years, or even five. To me, it seems curious that we possess the most powerful memory ever created, and we're constantly trying not to use it.

On a more serious note, I wonder if such tools hinder creativity. By not remembering things directly, one could build the habit of relying on such tools for everything. Given creativity is the ability to recombine past memories into future ones...

Cool! I would be interested to hear what Apple Silicon specific features this uses. Is there some sort of image processing feature that Apple CPUs offer that is being leveraged?

Imagine raising a $1B + valuation just to have some random guy on HN make an open source version of your company...

VC economics are going to need to change with AI and I think many haven't got the memo.

I think my employer does it for me anyway lol

(mac only)

I wonder if there's a way to leverage this application to create a user profile while keeping the data locally (storing, processing, etc.), just for the user to know _what_ social media companies know (or think they know) about the user.

If this application monitors, stores and analyses social media presence, email, etc., could it present to the user a profile similar to what Google has for the user?

For example, would be interesting to know how Spotify or Netflix sees me in technical and/or social terms.

The idea for such an application comes from Yuval Harari.

You should also post this to https://www.reddit.com/r/LocalLLaMA/, since it may be useful with local LLMs.