This is impressive work, especially for a one-man show!
One thing that stood out to me was the graph of the sentiment analysis over time. I hadn't seen something like that before and it was interesting to see it for Rust. What were the most positive topics over time? And were there topics that saw very sudden drops?
I also found this sentence interesting, as it rings true to me about social media: "there seems to be a lot of negative sentiment on HN in general." It would be cool to see a comparison of sentiment across social media platforms and across time!
Thanks! Yeah I'd like to dive deeper into the sentiment aspect. As you say it'd be interesting to see some overview, instead of specific queries.
The negative sentiment stood out to me mostly because I was expecting a more "clear-cut" sentiment graph: largely neutral-positive, with spikes in the positive direction around positive posts and negative around negative posts. However, for almost all my queries, the sentiment was almost always negative. Even positive posts apparently attracted a lot of negativity (according to the model and my approach, both of which could be wrong). It's something I'd like to dive deeper into, perhaps in a future blog post.
Anecdotally, I think anyone who reads HN for a while will realize it to be a negative, cynical place.
Posts written in sweet syrupy tones wouldn’t do well here, and jokes are in short supply or outright banned. Most people here also seem to be men. There’s always someone shooting you down. And after a while, you start to shoot back.
(Without wanting to sound negative or cynical) I don’t think it is, but maybe I haven’t been here long enough to notice. It skews towards technical, science- and technology-minded people, which makes it automatically a bit ‘cynical’, but I feel like 95% of commenters are commenting in good faith, at least. The same cannot be said of many comparable discussion forums or social media websites.
Jokes are also not banned; I see plenty on here. Low-effort ones and chains of unfunny wordplay or banter seem to be frowned upon though. And that makes it cleaner.
I've been here a hot minute and I agree with you. Lots of good faith. Lots of personal anecdotes presumably anchored in experience. Some jokes are really funny, just not Reddit-style. Similarly, no Slashdot quips generally, such as "first post" or "i, for one, welcome our new HN sentiment mapping robot overlords." Sometimes things get downvoted that shouldn't, but most of the flags I see are well deserved, and I vouch for ones that I think are not flag-worthy.
I wonder how much of a person's impression of this is formed by their browsing habits.
As a parent comment mentions, big threads can be a bit of a mess, but usually only for the first couple of hours. Comments made in the spirit of HN tend to bubble up, and off-topic or rude comments and bad jokes tend to percolate down over the course of hours. Also, a number of threads that tend to spiral get manually detached, which takes time.
Someone who isn't familiar with how HN works and who is consistently early to stories that attract a lot of comments is reading an almost entirely different site than someone who just catches up at the end of the day.
Some of the more negative threads will get flagged and detached, and by the end of the day a casual browse through the comments isn't even going to come across them. E.g. something about the situation in the Middle East is going to attract a lot of attention.
Really? Mmm, I think HN is a place with people of above-average intelligence. People who understand that their opinion is not the only one. I rarely have issues with people here. Might also be because we are all in the same bubble here.
I think it's the engineering mindset. You're always trying to figure out what's wrong with an idea, because you might be the poor bastard that ends up having to build it. Less costly all round if you can identify the flaw now, not halfway through sprint 7. After a while it bleeds into everything you do.
I don't think this is particularly unique to HN. Anonymous forums tend to attract contrarian assholes. Perhaps this place is more, erm, poorly socially adapted than the general population, but I don't see it as very far outside the norm, apart from the average wealth of the posters.
Sure, sometimes. But usually it's
Truth seeking > group thinking
There's a fine line between critical and cynical. Sometimes that line gets crossed. Sometimes the ambiguity of text-only comms clouds the water.
The sentiment issue is a curious one to me. For example, a lot of people I interact with who are not devs take my direct questioning or critical responses to be "negative" when there is no negative intent at all. Pointing out that something doesn't work, or raising anything else the dev community encounters on a daily basis, isn't automatically negative sentiment; it's just pointing out the issues. Is it like the meme of the helicopter parent constantly doling out praise, so that anything different reads as negative? Not every piece of art needs to be hung on the fridge door, and providing constructive criticism for improvement is oh so often framed as negative. That does the world no favors.
Essentially, I'm not familiar with HuggingFace or any models in this regard. But if they are trained from the socials, then it seems skewed from the start to me.
Also, fully aware that this comment will probably be viewed as negative based on stated assumptions.
edit: reading further down the comments, clearly I'm not the first with these sentiments.
Every helicopter gets a trophy
wait, the parents get a trophy?
You may be right, a more tailored classifier for HN comments specifically may be more accurate. It'd be interesting to consider the classes: would it still be simply positive/negative? Perhaps constructive/unconstructive? Usefulness? Something more along the lines of HN guidelines?
Speaking from experience, debate is easily misread as negative arguing by outsiders, even though all involved parties are enjoying challenging each other's ideas.
Just one point of note: people are FAR more likely to respond in writing to something negative than to something positive. I don’t know the exact numbers, but it just engages people more. People just don’t pick up the pen to write how good something is as much.
Great work! Would you consider adding support for search-via-URL (e.g. https://hn.wilsonl.in/?q=sentiment+analysis)? It would enable sharing and bookmarking of stable queries.
Thanks for the suggestion, I've just added the feature:
https://hn.wilsonl.in/s/sentiment%20analysis
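For anyone who wants to generate these links from a script, here is a minimal sketch. It assumes the path form in the link above is just the query percent-encoded after `/s/`; the helper name `search_url` is made up for illustration and is not part of the site.

```python
from urllib.parse import quote

# Base path copied from the link above; treating it as stable is an assumption.
BASE_URL = "https://hn.wilsonl.in/s/"


def search_url(query: str) -> str:
    """Hypothetical helper: build a shareable link for a search query.

    quote() percent-encodes the query, so a space becomes %20.
    safe="" makes sure characters like "/" are encoded too.
    """
    return BASE_URL + quote(query, safe="")


print(search_url("sentiment analysis"))
```

For the query "sentiment analysis" this produces the same URL as the one posted above.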
I did something related for my ChillTranslator project, which translates spicy HN comments into calm variations. It has a GGUF model that runs easily and quickly, but it's early days. I did it with a much smaller set of data, using LLMs to make calm variations and an algo to pick the closest, least spicy one to make the synthetic training data, then used Phi 2. I used Detoxify, and since OpenAI's sentiment analysis is free, I use that to verify that Detoxify has correctly identified spicy comments, then generate a calm pair. I do worry that HN could implode / degrade if there isn't a good balance for the comments and posts that people come here for. Maybe I can use your sentiment data to mine faster and generate more pairs. I've only done an initial end-to-end test so far (which works!). The model so far is not as high quality as I'd like, but I've not used Phi 3 on it yet and I've only used a very small fine-tune dataset. File is here though: https://huggingface.co/lukestanley/ChillTranslator I've had no feedback from anyone on it, though I did have a 404 in my Show HN post!
It will be a deep dive into the most essential of HN staples, the nitpick
It's so interesting that in Likert scale surveys, I tend to see huge positivity bias/agreement bias, but comments tend to be critical/negative. I think there is something related to the format of feedback that skews the graph in general.
On HN, my theory is that positivity is the upvotes, and negativity/criticality is the discussion.
Personally, my contribution to your effort is that I would love to see a tool that could do this analysis for me over a dataset/corpus of my choosing. The code is nice, but it is a bit beyond me to follow in your footsteps.
HN is a pretty toxic place indeed.
Perhaps... it can be toxic if you dip into the comments sometimes... Otherwise the content and links are the stuff of gold!
Links are indeed the best. It is hard not to click on the comments, however, which is a roll of the dice.
How did you get from negative sentiment to toxicity? Are those the same to you?
It may be a cultural thing, but I think many people see negative sentiment as a constructive tool and a demonstration of trust and respect among people who recognize each other as robust and capable peers.
Avoiding it is something you do with people who you believe need special delicacy: whether because they've told you so, because they intimidate you, or because you sense something pitiable and fragile about them.
If you can trust that it's given in good faith, and by the guidelines of HN you are asked to, negative sentiment should be seen as an expression that someone thinks you're a fully capable adult and peer. Personally, I deeply appreciate that it's generally so comfortably shared and received here and would never include "toxicity" in one of my critiques of HN.
It's a surprising thing to read someone say!
(Unless you're thinking of the nastiness that can surface on flamewar topics, but there are numerous means by which those get downranked and displaced, and they're otherwise sparse and easy to avoid.)
Negative sentiment is more general than toxicity in my understanding, but it does include it. The fact that the study found HN consistently negative does not surprise me; one of the ways HN is negative (the most disruptive one, and the one that makes me post here less often) is indeed toxic comments. But I am still here (in the comments no less), so the benefit still outweighs the pain.
This may be a personal style difference, but I find HN to be the least toxic of all social media I’ve tried. LinkedIn would be my example of ultra toxicity: the aggressive positivity there is unbearable. At least on HN people tell you what they think and even use a constructive, decently argued approach to doing so.
HN to me feels like a good technical discussion where people tear apart ideas instead of each other.
But yeah, if you put a lot of ego into your ideas, HN must be an awful place to visit.
I agree, HN is much less toxic than about any other place on the internet.
I actually did a blog post a few months ago where I analyzed HN commenter sentiment across AI, blockchain, remote work and Rust. The final graph at the very end of the post is the relevant one on this topic!
https://openpipe.ai/blog/hn-ai-crypto
Thanks, the sentiment in these graphs seems more positive in comparison. Did you run the sentiment on the whole corpus? What did that look like?
> sentiment across social media platforms and across time!
Also time zones and weekday/weekend.
It's really unfortunate the HN API does not provide votes on comments: I wonder if and how sentiment analysis would change if they were weighted by votes/downvotes?
My unsupported take is that engineers are mostly critical, but will +1 positive feedback instead of repeating it, as they might for criticism :)
Crypto, I imagine, is in that bucket.
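To make the vote-weighting idea concrete, here is a rough sketch. It is purely hypothetical, since the HN API does not expose comment scores: the `(sentiment, score)` pairs, the -1 to 1 sentiment range, and the rule of clamping each weight to at least 1 are all assumptions made for illustration.

```python
def weighted_sentiment(comments):
    """Vote-weighted mean sentiment.

    comments: iterable of (sentiment, score) pairs, where sentiment is
    assumed to lie in [-1, 1] and score is a hypothetical net vote count.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for sentiment, score in comments:
        # Assumption: every comment counts at least once, so scores of
        # zero or below are clamped to a weight of 1.
        weight = max(score, 1)
        weighted_sum += sentiment * weight
        total_weight += weight
    # An empty input has no sentiment to report; return neutral.
    return weighted_sum / total_weight if total_weight else 0.0


# Made-up data: one critical comment with few votes and one positive
# comment that many readers upvoted instead of repeating.
sample = [(-0.8, 1), (0.6, 9)]
unweighted = sum(s for s, _ in sample) / len(sample)
print(round(unweighted, 2), round(weighted_sentiment(sample), 2))
```

With this made-up sample the plain average comes out slightly negative while the weighted one is positive, which is the kind of shift the "upvote instead of repeating" theory would predict.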