A few years ago, my wife and I decided to adopt a rescue cat from battersea.org.uk. However, it was a frustrating experience, as the staff didn't always update the website promptly, and we'd find that suitable cats had been snapped up before we'd even seen them.
I spotted that the website served its data to the frontend via an unsecured internal JSON API, so I built an Elixir app that would poll the API endpoint and upsert the cat data into the database. Any new records would get posted to a Twitter account (a free way to get notifications on my phone).
It worked beautifully, and when a black cat called "Fluff" popped up, we both knew he was the right one, and we were able to phone them and arrange a meeting before anyone else. Fast forward five years, and he's sitting next to me on the sofa right now, purring away.
Why not go straight from the JSON to Twitter?
How would you keep state of which pets have already been posted?
Hi! I answered this in the other post, but the Postgres UPSERT was the key. By using a combination of attributes (it may have just been name & D.O.B.), in a single operation I could figure out whether the cat had already been posted (and update the existing row) or create a new row.
This worked as there were only ever 30-50 cats online at one time. If it was a thousand, I'm not sure what I would have done.
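In Ecto it looks roughly like this. This is only a minimal sketch of the idea, with made-up module and column names rather than the real code:

```elixir
defmodule MyApp.Cats do
  alias MyApp.{Repo, Cat}

  # Insert the cat, or update the existing row if a cat with the same
  # name + date of birth is already in the table. Requires a unique
  # index on (name, date_of_birth) for the conflict target to work.
  def upsert_cat(attrs) do
    %Cat{}
    |> Cat.changeset(attrs)
    |> Repo.insert(
      on_conflict: {:replace_all_except, [:id, :inserted_at]},
      conflict_target: [:name, :date_of_birth]
    )
  end
end
```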
Edit: I realise now this was a rhetorical question. Oops!
Good question!
The primary reason was to learn Elixir, so this was just a well-timed excuse to explore the language (and Phoenix, the web framework).
The secondary reason was that my wife was the main client, and she doesn't respond well to raw JSON. Each tweet would be just the cat's name, photo, and a link to the website. I also did some filtering, as certain cats had safety requirements we couldn't meet (e.g. no neighbouring cats, no children).
One of the main issues I had to figure out early on was "how do I distinguish which cats are new, compared to the previous response?". This was made harder because I couldn't rely on the ordering; occasionally previously-posted cats would have their details updated and they would move position. Postgres UPSERT was new (to me, at least) at the time, and it seemed like a very handy way to offload the responsibility. There were never more than 50 cats listed at any one time, so it was reasonable enough to request all the animals at once and let the database figure out which cats were new, based on a combination of identifiers that made them unique. I could also filter the updated records to see _what_ had been updated, e.g. that the cat had now been rehomed.
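The "request everything and let the database sort it out" step is roughly what Ecto's `insert_all` gives you. Again, just a sketch: the table, column, and JSON key names are placeholders, it assumes a unique index on (name, date_of_birth), and it assumes the API returns ISO 8601 dates:

```elixir
defmodule MyApp.Cats.Sync do
  alias MyApp.Repo

  # Upsert the whole API response in one statement and let Postgres decide
  # which rows are new vs. already known.
  def upsert_all(cats_from_api) do
    now = NaiveDateTime.utc_now() |> NaiveDateTime.truncate(:second)

    rows =
      Enum.map(cats_from_api, fn cat ->
        %{
          name: cat["name"],
          # Assumes an ISO 8601 date string in the JSON payload.
          date_of_birth: Date.from_iso8601!(cat["dateOfBirth"]),
          status: cat["status"],
          inserted_at: now,
          updated_at: now
        }
      end)

    Repo.insert_all("cats", rows,
      on_conflict: {:replace, [:status, :updated_at]},
      conflict_target: [:name, :date_of_birth]
    )
  end
end
```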
Another thing Elixir did really well was the polling mechanism. It's absolutely trivial to spawn a worker that can repeatedly perform a task and asynchronously hand it off to be processed.
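The worker is essentially a GenServer that re-sends itself a message on an interval; something along these lines (a minimal sketch, with illustrative module names and a 60-second interval rather than the real code):

```elixir
defmodule MyApp.Poller do
  use GenServer

  @interval :timer.seconds(60)

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    schedule_poll()
    {:ok, %{}}
  end

  @impl true
  def handle_info(:poll, state) do
    # Placeholder for the real work: fetch the JSON, upsert, tweet new rows.
    # Running it in a Task keeps a slow HTTP call from blocking the poll loop.
    Task.start(fn -> MyApp.Cats.Sync.run() end)
    schedule_poll()
    {:noreply, state}
  end

  defp schedule_poll, do: Process.send_after(self(), :poll, @interval)
end
```

Drop it under the application's supervision tree and it gets restarted if anything crashes, while the state of which cats have been seen lives safely in Postgres.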
Hope that answers your question!
lol it actually did and now I’m gonna read more about upserts!
I'm not the author, but one purpose the database may serve is to keep state in case the monitoring process needs to be restarted.
Great example of a pet project :)
Made me paws and chuckle!
Well played!
haha
This made me laugh. Thank you for brightening my morning!
This is a great example of the usefulness of making data available via a JSON API.
If the data is read-only, it's a GOOD thing, especially for non-confidential data that's meant to be public. Every government agency should open up their public data like this.
Absolutely! My original plan had been to scrape their website using Selenium or similar.
I quickly noticed that they had employed lazy loading, which would have made that all but impossible. It took me a good few minutes to realise that if they had lazy loading, there had to be a backend, and I was overjoyed when I found out it was serving JSON.
All in all, it was probably much cheaper for them to have me hitting the API endpoint every minute than to have me scraping the website even once a day.
Kind of similar: in the early days of COVID, I accidentally discovered that my state's website would have test results available several hours before they sent out the "view your results" email. So I made a script that would check the site every five or ten minutes and then ping me as soon as the result changed to something besides PENDING.
In the course of that, I stumbled on https://ntfy.sh/, which solved the notification problem without needing Twitter, and I've used it ever since to let me know when long-running scripts complete.
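ntfy's publish API is just an HTTP POST of plain text to a topic URL. A minimal sketch, shown in Elixir with the Req library to match the rest of the thread (the topic name is made up; subscribe to the same topic in the ntfy app):

```elixir
# Publish a push notification by POSTing plain text to an ntfy.sh topic.
Mix.install([{:req, "~> 0.5"}])

Req.post!("https://ntfy.sh/my-test-results",
  body: "Result posted: no longer PENDING",
  headers: [{"title", "Test result update"}]
)
```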
That looks great! I'm not sure it was available in 2018 when I looked. I tried a few options, but posting to Twitter & subbing to the account was the quickest hack that worked reliably.
Kinda similar story for me.
I was trying to find a used motorcycle, so I created an in-browser JavaScript app that could go over the listings on a local second-hand site in my country and score them to my liking, e.g. decrease the score for high mileage, increase it for a newer bike.
That worked pretty well and found me a great one. Good times.
I'm going to do this to find a Toyota truck.
I did a similar thing like twenty years ago to nab free stuff on the local Craigslist.
I have a similar story of accessing an internal JSON API for my own benefit.
I left my AirPods in a car I rented using Zipcar. I spoke to support etc. but nothing had been handed in. I checked to see if the car was still where I left it so that I could re-hire it and claim them, but it had been moved.
The app tells you the 'name' of the car you rented, which is used as an identifier. It also shows a map of where all available cars are. I sniffed the requests the app made to display this map and was able to filter them by the car name. From this I was able to locate the car I'd left my AirPods in. I headed there, unlocked the car, and to my amazement the AirPods were still there!
That's amazing!
That’s super heartwarming to read. Congrats on your success!
Thanks! Probably the only side project I've ever seen through to completion
Same sort of thing but this one is ongoing.
Standard story: the website is bad and hard to use, but there's a secret JSON feed of the useful data, so I hacked up an alternative view. (They've changed its format slightly once so far.)
This one is for the cinema. My local cinema (https://thelight.co.uk/, a 5-minute walk away) has a monthly membership with unlimited films, but it's hard to keep track of what's on time-wise, e.g. planning to watch one film as another ends, and it's also hard to tell which is the last showing of a film, or even what's on right now.
So: a simple table view sorted by time.
https://notshi.github.io/dump/ is the result. The source is on GitHub, but as it's just a single HTML page with embedded JavaScript, the source is also the page :)
Kinda nice that it is such a simple hack.
Black cats are the best. Our last one was more like a dog than a cat, but also liked to sit on my Mac's keyboard :)
An eBay sniper, for cats.
My third-hand experience with animal shelters in the UK was that the more appealing animals were announced privately to friends and family of the people running the shelters before they'd be put up for the public to adopt.
I'm confused: why was the site out of date if the API it was pulling from was not? Wasn't it just rendering what was coming from the API?