I think the reviewers did a good job; the reviews are pretty reasonable. Reviews are supposed to be about the quality of a paper, not how influential they might be in the future! And not all influential papers are actually very good.
Surely those seemingly smart anonymous reviewers now feel pretty dumb in hindsight.
Peer review does not work for new ideas, because no one ever has the time or bandwidth to spend hours upon hours upon hours trying to understand new things.
It's worth pointing out that most of the best science happened before peer review was dominant.
There's an article I came across a while back, that I can't easily find now, that basically mapped out the history of our current peer review system. Peer review as we know it today was largely born in the 70s as a response to several funding crises in academia. Peer review was a strategy to make research appear more credible.
The most damning critique of peer review, of course, is that it completely failed to stop (and arguably aided) the reproducibility crisis. We have an academic system where the prime motivation is to secure funding through the image of credibility, which from first principles is a recipe for widespread fraud.
Peer review is basically anonymous Github PRs where the author pinky swears that the code compiles and 95% of test cases pass.
Academic careers are then decided by the Github activity charts.
The whole 'pinky swear' aspect is far from ideal.
But is there an alternative that still allows most academic aspirants to participate?
Github
Do you understand what the parent is saying? It's clearly an analogy, not a literal recommendation for all academics to use Github.
I understand, thank you for clarifying :)
My point was that academics could use Github (or something like it)
Can you write out the argument for it, or why you believe it to be a net positive change compared to the current paradigm?
Peer review is basically anonymous Github PRs where the author pinky swears that the code compiles and 95% of test cases pass.
It should be possible to use something like Github to *verify* "that the code compiles and 95% of test cases pass" instead of just "pinky swearing".
Based on...?
Tests and data.
I meant what is your belief that it will be successful, or even workable, based on?
The success of Github in creating software, and the success of software in advancing scientific progress.
Maybe something like nbdev.fast.ai.
In any case, it was just a thought, and likely not an original one. I would welcome it if someone tried to build this and proved it can’t be done.
Thank you for the stimulating discussion!
It seems kind of obvious that peer review is going to reward peer think, peer citation, and academic incremental advance. Obviously that's not how innovation works.
the system, as flawed as it is, is very effective for its purpose. see eg "success is 10% inspiration and 90% perspiration". on a darker side, the purpose is not to be fair to any particular individual, or even to be conducive to human flourishing at large.
yes - maybe a good filter for future academic success, which seems to be a game unto itself
academia is not about innovation, it should be trying to tend to the big self-referential kaleidoscope of knowledge.
mostly it should try to do that through falsifying things, though of course groupthink is seldom effective at that.
It's worth pointing out that most of the best science happened before peer review was dominant.
It's worth pointing out that most of everything happened before peer review was dominant. Given how many advances we've made in the past 50 years, I'm not super sure everyone would agree with your statement. If they did, they'd probably also agree that most of the worst science happened before peer review was dominant, too.
Our advances in the last 50 years have largely been in engineering, not science. You could probably take a random physics professor from 1970 and they'd not sweat too much trying to teach physics at the graduate level today.
But a biology professor from that time period would have a lot of catching up to do, perhaps too much, especially (but not only) if any part of their work touched molecular biology or genetics.
You might be thinking of Adam Mastroianni's essays on the subject:
https://www.experimental-history.com/p/the-rise-and-fall-of-... https://www.experimental-history.com/p/the-dance-of-the-nake...
Thanks so much for posting those. The essays were great; I hadn't seen them before.
But there is zero reason why the definition of peer review shouldn't immediately be extended to include:
- accessing and verifying the datasets (in some tamper-proof mechanism that has an audit trail); ditto the code. This would have detected the Francesca Gino and Dan Ariely alleged frauds, and many others. It's much easier in domains like behavioral psychology, where the datasets are spreadsheets of << 1 MB instead of GB or TB (a minimal hash-based sketch follows this list).
- picking a selective sample of papers to check reproducibility on; you can't verify all submissions, but you sure could verify most accepted papers, as well as the top 1000 most-cited new papers each year in each field, etc. This would prevent the worst excesses.
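A minimal sketch of the kind of tamper-evident check the first point imagines, in Python; the filenames and registry format are made up for illustration:

```python
import datetime
import hashlib
import json

def sha256_of(path: str) -> str:
    """Hash a dataset file so any later modification is detectable."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical audit-trail entry: deposit this alongside the submission so
# reviewers (or later auditors) can confirm the analyzed data is the deposited data.
record = {
    "dataset": "study1_responses.csv",  # hypothetical filename
    "sha256": sha256_of("study1_responses.csv"),
    "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
}
print(json.dumps(record, indent=2))
```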
PS a superb overview video [0] by Pete Judo, "6 Ways Scientists Fake Their Data" (p-hacking, data peeking, variable manipulation, hypothesis-shopping and selectively choosing the sample, selective reporting, and questionable outlier treatment). Based on article [1]. Also, as Judo frequently remarks, there should be much more formal incentive for publishing replication studies and negative results.
[0]: https://www.youtube.com/watch?v=6uqDhQxhmDg
[1]: "Statistics by Jim: What is P Hacking: Methods & Best Practices" https://statisticsbyjim.com/hypothesis-testing/p-hacking/
It's worth pointing out that most of the best science happened before peer review was dominant.
This seems unlikely to be true, simply given the growth. If you are arguing that the SNR was better, that's different.
Have they done a double-blind test on the peer review system?
You're probably thinking of this article:
https://www.experimental-history.com/p/the-rise-and-fall-of-...
This is not the takeaway I got. The takeaway I got was that the review process improved the paper and made it more rigorous. How is that a bad thing? But yes, sometimes reviewers focus on different issues instead of 'is this going to revolutionize A, B, and C'.
I currently have a paper under review (first round) that was submitted on the 2nd of August. This is at the second journal; the first submission was a few months before that.
I'm not sure peer review makes things more rigorous, but it surely makes it more slow.
I have been deeply unimpressed with the ML conference track this last year... There are too many papers and too few reviewers, leading to an insane number of PhD-student reviewers. We've gotten some real nonsense reviews, with some real sins against the spirit of science baked into them.
For example, a reviewer essentially insisting that nothing is worth publishing if it doesn't include a new architecture idea and SOTA results... God forbid we better understand and simplify the tools that already exist!
Do they do anything different in other countries, or is it just a copy of the U.S system?
Peer review isn't about the validity of your findings, and the reviewers are not tasked with evaluating the findings of the researchers. The point is to be a light filter to make sure a published paper has the necessary information and rigor for someone else to try to replicate your experiment or build off of your findings. Replication and follow-up work are the processes for evaluating the correctness of the findings.
The issue here wasn't that the reviewers couldn't handle a new idea. They were all very familiar with word embeddings and ways to make them. There weren't a lot of new concepts in word2vec; what distinguished it was that it was simple, fast, and good quality. The software and pretrained vectors were easy to access and use compared to existing methods.
I have finished a PhD in AI just this past year, and can assure you there exist reviewers who spend hours per review to do it well. It's true that these days it's often the case that you can (and are more likely than not to) get unlucky with lazier reviewers, but that does not appear to have been the case with this paper.
For example just see this from the review of f5bf:
"The main contribution of the paper comprises two new NLM architectures that facilitate training on massive data sets. The first model, CBOW, is essentially a standard feed-forward NLM without the intermediate projection layer (but with weight sharing + averaging before applying the non-linearity in the hidden layer). The second model, skip-gram, comprises a collection of simple feed-forward nets that predict the presence of a preceding or succeeding word from the current word. The models are trained on a massive Google News corpus, and tested on a semantic and syntactic question-answering task. The results of these experiments look promising.
...
(2) The description of the models that are developed is very minimal, making it hard to determine how different they are from, e.g., the models presented in [15]. It would be very helpful if the authors included some graphical representations and/or more mathematical details of their models. Given that the authors still almost have one page left, and that they use a lot of space for the (frankly, somewhat superfluous) equations for the number of parameters of each model, this should not be a problem."
These reviews in turn led to significant (though apparently not significant enough) modifications to the paper (https://openreview.net/forum?id=idpCdOWtqXd60&noteId=C8Vn84f...). These were some quality reviews, and the paper benefited from going through this review process, IMHO.
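(Aside: for anyone who hasn't used the two architectures the review summarizes, here is a minimal, purely illustrative sketch using gensim's reimplementation rather than the paper's original C code; the toy corpus is obviously made up.)

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (the paper trained on Google News).
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["paris", "is", "the", "capital", "of", "france"],
]

# sg=0 -> CBOW: predict the current word from the averaged context words.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# sg=1 -> skip-gram: predict the surrounding words from the current word.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["king"].shape)             # one 50-dim vector per word type
print(skipgram.wv.most_similar("king"))  # nearest neighbours in the vector space
```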
There are more details in the FB post of Tomas Mikolov (author of word2vec) recently: https://www.facebook.com/share/p/kXYaYaRvRCr5K2Ze
A hilarious and poignant point I see is how experts make mistakes too. Quote:
I also received a lot of comments on the word analogies - from "I knew that too but forgot to publish it!" (Geoff Hinton, I believe you :) happens to everyone, and anyways I think everybody knows what the origin of Distributed Representations is) to "it's a total hack and I'm sure it doesn't work!" (random guys who didn't bother to read the papers and try it out themselves - including Ian Goodfellow raging about it on Twitter).
Also, Tomas says he came up with the encoder-decoder (seq-to-seq) idea, and then Ilya and Quoc took over the idea after Tomas moved on to Facebook.
However, there is another statement by Quoc, saying this is not true: https://twitter.com/quocleix/status/1736523075943125029
We congratulate Tomas on winning the award. Regarding seq2seq, there are inaccuracies in his account. In particular, we all recall very specifically that he did not suggest the idea to us, and was in fact highly skeptical when we shared the end-to-end translation idea with him. Indeed, we worked very hard to make it work despite his skepticism.
So, word against word. I'm not accusing anyone of lying here, one of them probably just misremembers, but this also leaves a somewhat bad taste.
This is what Tomas Mikolov said on Facebook:
I wanted to popularize neural language models by improving Google Translate. I did start collaboration with Franz Och and his team, during which time I proposed a couple of models that could either complement the phrase-based machine translation, or even replace it. I came up (actually even before joining Google) with a really simple idea to do end-to-end translation by training a neural language model on pairs of sentences (say French - English), and then use the generation mode to produce translation after seeing the first sentence. It worked great on short sentences, but not so much on the longer ones. I discussed this project many times with others in Google Brain - mainly Quoc and Ilya - who took over this project after I moved to Facebook AI. I was quite negatively surprised when they ended up publishing my idea under now famous name "sequence to sequence" where not only I was not mentioned as a co-author, but in fact my former friends forgot to mention me also in the long Acknowledgement section, where they thanked personally pretty much every single person in Google Brain except me. This was the time when money started flowing massively into AI and every idea was worth gold. It was sad to see the deep learning community quickly turn into some sort of Game of Thrones. Money and power certainly corrupts people...
Reddit post: "Tomas Mikolov is the true father of sequence-to-sequence" https://www.reddit.com/r/MachineLearning/comments/18jzxpf/d_...
As another small hint of Mikolov-vs-Le divergence: they're the coauthors of the 'Paragraph Vector' paper (https://arxiv.org/abs/1405.4053) applying a slightly-modified version of word2vec to vectorize longer texts, still in a very shallow way. (This technique often goes by the name 'doc2vec', but other things also sometimes get called that, too.)
There are some results in that paper, supposedly from exactly the technique described on an open dataset, that no one has ever been able to reproduce – & you can see the effort has frustrated a lot of people, over the years, in different forums.
When asked, Mikolov has said, essentially: "I can't reproduce that either – those tests were run & reported by Le, you'll have to ask him."
This is interesting. I went off and searched for paragraph vector code and indeed find doc2vec stuff, including tutorials referring to the paper such as https://radimrehurek.com/gensim/auto_examples/howtos/run_doc.... It’s not obvious that the results aren’t reproducible (and I realise code is not the same as published results), but I wonder if you could steer us more specifically.
I think this must happen all the time. As they say, ideas are cheap. It's likely that ALL of them had the seq-to-seq idea cross their mind at some point before it was acted on, so if credit is assigned to whoever said it out loud first, there's going to be disagreement, since most people don't remember the full details of every conversation. It's also possible for someone to be skeptical of their own idea, so that argument isn't compelling to me either. Ultimately credit usually goes to the people who do the hard work to prove out the idea, so it seems like the system worked as intended in this case.
Indeed, we worked very hard to make it work despite his skepticism.
In this case, why did the authors of the seq2seq paper not mention him in their massive acknowledgement section?
Typical, saying they had the idea first without putting it on the blockchain to prove the time stamp!
"Success has a thousand mothers, but failure is an orphan"
I tried asking on another thread what Goodfellow rage he's referring to since all I could find was this: https://twitter.com/goodfellow_ian/status/113352818965167718...
If so, frankly I think it makes Mikolov sound pretty insecure.
Twitter no longer shows threads to people who aren't logged in. https://nitter.net/goodfellow_ian/status/1133528189651677184
To be fair, I have some memories of the papers and surrounding tech being really bad. The popular implementations didn't actually do what the papers said, and the tech wasn't great for anything beyond word-level comparisons. You got some juice doing tf-idf weighting of specific words, but then a tf-idf weighted bag of words was similarly powerful.
Cosine similarity of the sum of different word vectors sounds soooo dumb nowadays imo
Cosine similarity is like a very primitive version of word2vec. They both work off the idea of context being the key to the derivation of a token's semantic meaning. Cosine similarity is not very useful in itself but it's helpful for understanding the progression in the space
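For anyone who never used this in anger, the operation being dismissed above ("cosine similarity of the sum of word vectors") is roughly the following; `wv` stands for some pre-trained word vectors (e.g. a gensim KeyedVectors object) and is an assumption here:

```python
import numpy as np

def doc_vector(tokens, wv):
    # Sum the vectors of in-vocabulary words; out-of-vocabulary words are simply
    # dropped, which is exactly the weakness complained about elsewhere in this thread.
    vecs = [wv[t] for t in tokens if t in wv]
    return np.sum(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# similarity = cosine(doc_vector(doc1_tokens, wv), doc_vector(doc2_tokens, wv))
```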
That post sounds like a rant TBH with too many stabs at various people. It could have been a lot more graceful. OTOH I can believe most researchers are human and do not put the progress of shared knowledge first but are very much influenced by ego and money *cough* OpenAI *cough*
To err is human, to seek profit is common to all lifeforms
I wrote a detailed proof years before Mikolov on Twitter, but the 280 characters were too small to contain it
Did you now?? I'll have you know that I wrote the full word2vec paper on a roll of shabby two-ply tissue paper during my time in a Taco Bell stall. Sadly, it was then used to mop up my dietary regrets and was subsequently lost to foul wretches of the sewage system. Left with nothing but the memories of my groundbreaking thoughts and the lingering aroma of liquid feces, I texted Mikolov an idea I had about using neural nets to map sequences of text tokens from one language to another only for him to reply "lol thx" and ghost me. I was quite negatively surprised when he decided to take this to the public courts of Facebook and failed to mention the "brainest boi alive™" who gave him this idea in the first place.
The post would have found a more fitting home on Twitter rather than the 'forgotten' realms of FB. The very individuals and entities mentioned (or hinted at) might have had the chance to see the story and share their perspectives. Otherwise it just sounds like a rant.
when I was in college, I wrote a simple system for a class that made corrections to text based on some heuristics.
then the teacher of the class suggested I write a paper describing the system, with some results etc, for a local conference during the summer.
I wrote it with his support but it got rejected right away because of poor grammar or something similar. The conference was in Brazil, but required the papers to be in English. I was just a student and thought that indeed my English was pretty bad. The teacher told me to at least send an email to the reviewers to get some feedback, and maybe resubmit with corrections.
I asked specifically which paragraphs were confusing. They sent me some snippets of phrases that were obviously wrong. Yes, they were the "before" examples of the "before/after" pairs my system applied the corrections to. I tried to explain that the grammar was supposed to be wrong, but they just replied with "please fix your english mistakes and resubmit".
I tried 2 or 3 more times but just gave up.
I’ve been a reviewer and occasionally written reviews a bit like you describe.
Papers are an exercise in communicating information to the paper’s readers. If the writing makes it very difficult for the audience to understand that information, the paper is of little use and not suitable for publication regardless of the quality of the ideas within.
It is not the reviewer's job to rewrite the paper to make it comprehensible. Not only do reviewers not have the time, it simply isn't their role.
Writing is not easy, and writing technical papers is a genuinely difficult skill to learn. But it is necessary for the work to be useful.
To be honest, it sounds like the teacher who suggested you write the paper let you down and wasted your time. Either the work was worth their time to help you revise it into publishable form, or they shouldn't have suggested it in the first place.
Did you ironically misread their comment and not realize that the grammar the reviewers were complaining about was in the known-bad examples his algo could fix?
It's hard to believe that the reviewers misunderstood the examples. It's more likely that the surrounding text was badly written, and the reviewers had no idea what they should be looking at.
There is the option of contacting the program committee chair or proceedings editor to complain if the reviewers misunderstood something fundamental, as it looks like happened in his example.
The teacher should have fought this battle for the pupil, or they ought to have re-targeted their efforts at another conference.
Ha!
Sorry, I did miss that. And yes, that sounds like lazy reviewing.
But I have also read many word salads from grad students that their supervisors should never have let go to a reviewer.
*eyeroll* Sounds about right. Want to get that published anyway? You could pop it on the arXiv and let the HN hivemind suggest an appropriate venue.
If you don't have arXiv access, find an endorser <https://info.arxiv.org/help/endorsement.html>, and send them a SHORT polite email (prioritise brevity over politeness) with your paper and the details. Something like:
Hello,
I wrote a paper for college in yyyy (attached) on automatic grammar correction, which got rejected from Venue for grammatical errors in the figures. I still want to publish it. Could you endorse my arXiv account, please?
Also, could you suggest an appropriate venue to submit this work to?
Yours sincerely,
your name, etc
Follow the guidance on the arXiv website when asking for endorsement.
thank you for the suggestion, but it was just an undergraduate paper written in ~2014. I don't see any relevance in publishing it now.
It is a lot of effort to get something through the publication process, but if you can't find the technique you used in ten minutes of searching https://scholar.archive.org/, it would be a benefit to the commons if you published your work. At least on a website or something.
the method I used (I barely remember the details) is similar to this one https://scholar.archive.org/work/idi3kpcrlbfpvkxfxxf5tqr3om
You remind me of these anecdotes by Feynman about his time in Brazil. Specifically, search for "I was invited to give a talk at the Brazilian Academy of Sciences", but the whole thing is worth a read if you haven't seen it.
We already have a better mechanism for publishing and peer review... it's called the internet. Literally the comments section of Reddit would work better. Reviews would be tied to a pseudonymous account instead of being anonymous, allowing people to judge the quality of reviewers as well. Hacker News would work just as well too. It's also nearly free to set up a forum, and frictionless to use compared to paying academic journals $100k for them to sell your own labour back to you. Cost and ease of use also mean more broadly accessible and hence more widely reviewed.
Every once in a while I see a thread on reddit about a subject I know about and if someone shares a factual account that sounds unpopular it'll be downvoted even though it's true. I think reddit would be a terrible way to do this.
But academic review is like that, with the worst acktchsually guys in existence
The worst? Are you sure? Reddit's worst acktchsually guys are often spilling out of actual cesspit hate subreddits.
There is some meta-knowledge in your comment, but I'm focusing solely on the critique and pedantry levels, no comment on other factors
https://www.lesswrong.com/ lets you vote separately on agreement and quality axes. That seems to help a little bit.
The groupthink problems on reddit are quite severe.
And this is why the biggest evolution of AI has happened in companies, not in academic circles
Because there's too much nitpicking and grasping at straws amongst people that can't see novelty even when it's dancing in front of them
No, the reason is that it required substantial financial investments, and in some cases access to proprietary big-data collections.
word2vec does not require a large amount of data
mnist might have required a large amount of data at its creation, but it became a staple dataset
There was a lot of evolution before ChatGPT
And people in academia were all over Word2Vec. Mikolov presented his work in our research group around 2014, and people were very excited. Granted, that was _after_ Word2Vec had been published, and this was a very pro-vectorspaces (although of a different type) crowd.
This will keep happening because peer review itself, ironically, has no real feedback mechanism.
This is exactly correct! It's an asymmetrical accountability mechanism.
The whole reason OpenReview was created was to innovate and improve on the peer review process. If you have ideas, reach out to the program chairs of the conference you're submitting to. Many of them are open to running experiments.
In hindsight, reviewer f5bf’s comment is fascinating:
- It would be interesting if the authors could say something about how these models deal with intransitive semantic similarities, e.g., with the similarities between 'river', 'bank', and 'bailout'. People like Tversky have advocated against the use of semantic-space models like NLMs because they cannot appropriately model intransitive similarities.
What I've noticed in the latest models (GPT, image diffusion models, etc) is an ability to play with words when there's a double meaning. This struck me as something that used to be very human, but is now in the toolbox of generative models. (Most of which, I assume, use something akin to word2vec for deriving embedding vectors from prompts.)
Is the word2vec ambiguity contributing to the wordplay ability? I don’t know, but it points to a “feature vs bug” situation where such an ambiguity is a feature for creative purposes, but a bug if you want to model semantic space as a strict vector space.
My interpretation here is that the word/prompt embeddings in current models are so huge that they’re overloaded with redundant dimensions, such that it wouldn’t satisfy any mathematical formalism (eg of well-behaved vector spaces) at all.
Even small models (e.g. hidden dims = 32) should be able to handle token ambiguity with attention. The information is not so much in the token itself as in the context.
The key difference is what I'd call "context-free embeddings" vs "contextual embeddings". Due to its structure, word2vec and similar solutions have to assign every single "bank" in every sentence the exact same vector, but later models (e.g. all the transformer models, BERT, GPT, etc) will assign wildly different vectors to "bank" depending on the context of surrounding words for that particular mention of "bank".
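A rough sketch of that difference; the Hugging Face model name is real, but the snippet is only illustrative and assumes the usual `transformers`/`torch` setup:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Context-free (word2vec-style): one fixed vector per word type, e.g.
#   v = w2v.wv["bank"]   # identical in "river bank" and "bank bailout"

# Contextual: a transformer emits a different vector for each mention of "bank".
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state[0]          # (num_tokens, 768)
    idx = inputs["input_ids"][0].tolist().index(tok.convert_tokens_to_ids("bank"))
    return hidden[idx]

a = bank_vector("he sat down on the river bank")
b = bank_vector("the bank approved the bailout")
print(torch.cosine_similarity(a, b, dim=0))  # well below 1: different vectors
```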
Boiled down to its core essence, science is about testing ideas to see if they work. Peer review is not part of this process: reviewers rarely if ever attempt replication, and so they inevitably end up rejecting new ideas without even trying them. This isn't science.
Thankfully you can just keep submitting the same paper to different journals until someone is too busy to read it and just approves it blindly. The academic publication shitshow giveth and the academic publication shitshow taketh away.
The futile quest for algorithmification of truth, and the loopholes that make the system work again despite having an impossible objective in the first place. Couldn't have been any different - in fact we can take this as evidence that AI has not taken over science yet.
I'm curious how many commenters here who are making strong statements about the worth (or not) of peer review have actually participated in it both as author AND reviewer? Or even as an editor who is faced with the challenge of integrating and synthesizing multiple reviews into a recommendation?
There are many venues available to share your research or ideas absent formal peer review, arXiv/bioRxiv being among the most popular. If you reject the idea of peer review itself it seems like there are plenty of alternatives.
It's the internet, therefore a significant percentage of the strong opinions about any topic will come from people who have little to no experience or competence in the area. Being HN, it probably skews a bit better than average. OTOH, it will also skew towards people procrastinating. Factor that in how you will...
There are indeed four entries saying "strong reject", but they all appear to be from the same reviewer, at the same time, and saying the same thing. Isn't this just the one rejection?
Also, why is only that reviewer's score visible?
I agree that Glove was a fraud.
This was hilarious!
Many very broad and general statements are made without any citations to back them up.
- Please be more specific.
The number of self-citations seems somewhat excessive.
- We added more citations.
Previous discussion: https://news.ycombinator.com/item?id=38654038
Flagged for misleading title - the four strong rejects are from a single reviewer. It's listed four times for some unknown reason, likely an openreview quirk. The actual status described by the page is: 2 unknown (with accompanying long text), 1 weak reject, and 1 strong reject.
That didn’t age well
I've found that PhD level academics are usually wrong about practical matters. It's almost as if having a PhD is itself proof that they are not good at identifying optimal solutions to reach practical goals. Also, it may show that they are overly concerned with superficial status markers instead of results.
I think it's also the secret to why they can go so deep on certain abstract topics. Their minds are lacking a garbage collector and they can go in literally any direction and accumulate any amount of knowledge and they'll be able to memorize it all easily even if it has 0 utility value.
It seems they rejected initial versions of the paper, since there were later updates and clarifications based on the reviews. So it seems this was beneficial in the end, and how the review process should work? Especially since this was groundbreaking work, it makes sense there is more effort put into explaining why it works instead of relying too much on good benchmark results.
That's the most inspiring thing I've learned this year.
I'd reject it still (speaking as someone who has developed products based on word vectors, document vectors, dimensional reduction, etc. before y'all thought it was cool...)
I quit a job because they were insisting on using Word2Vec in an application where it would have doomed the project to failure. The basic problem is that in a real-life application many of the most important words are not in the dictionary and if you throw out words that are not in the dictionary you choose to fail.
Let a junk paper like that through and the real danger is that you will get 1000s of other junk papers following it up.
For instance, take a look at the illustrations on this page
https://nlp.stanford.edu/projects/glove/
particularly under "2. Linear Substructures". They make it look like a miracle that they project down from a 50-dimensional space to 2 and get a nice pattern of cities and zip codes, for instance. The thing is, you could have a random set of 20 points in a 50-d space and, assuming there are no degeneracies, you can map them to any 20 points you want in the 2-d space with an appropriately chosen projection matrix. Show me a graph like that with 200 points and I might be impressed. (I'd say those graphs on that server damage the Stanford brand for me about as much as SBF and Marc Tessier-Lavigne.)
(It's a constant theme in dimensional reduction literature that people forget that random matrices often work pretty well, fail to consider how much gain they are getting over the random matrix, ...)
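That linear-algebra claim is easy to check numerically; a tiny sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))   # 20 random points in 50-d space (as columns)
Y = rng.normal(size=(2, 20))    # 20 arbitrary target points in 2-d

# 20 generic points in 50-d are linearly independent, so a 2x50 projection
# matrix can send them exactly onto ANY chosen 2-d layout.
P = Y @ np.linalg.pinv(X)
print(np.allclose(P @ X, Y))    # True: a nice-looking 2-d pattern proves nothing
```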
BERT, FastText and the like were revolutionary for a few reasons, but I saw the use of subword tokens as absolutely critical because... for once, you could capture a medical note and not erase the patient's name!
The various conventions of computer science literature prevented explorations that would have put Word2Vec in its place. For instance, it's an obvious idea that you should be able to make a classifier that, given a document vector, can predict "is this a color word?" or "is this a verb?", but if you actually try it, it doesn't work, in a particularly maddening way. With a tiny training/eval set (say 10 words) you might convince yourself it is working, but the more data you train on, the more you realize the words are scattered mostly randomly, and even though those "linear structures" exist in a statistical sense they aren't well defined and not particularly useful. It's the kind of thing that is so weird and inconclusive and fuzzy that I'm not aware of anyone writing a paper about it... because you're not going to draw any conclusions out of it except that you found a Jupiter-sized hairball.
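A sketch of the kind of probe being described, using word vectors for simplicity; `w2v` is an assumed pre-trained gensim model and the word lists are hypothetical stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical labelled lexicon: 1 = colour word, 0 = anything else.
colour_words = ["red", "blue", "green", "yellow", "purple", "orange"]
other_words = ["table", "run", "seven", "idea", "quickly", "france"]

words = colour_words + other_words
X = np.stack([w2v.wv[w] for w in words])               # word vectors as features
y = np.array([1] * len(colour_words) + [0] * len(other_words))

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=3)
# With tiny lists like these the probe can look fine; the point above is that
# accuracy tends to fall apart as the lexicon grows.
print(scores.mean())
```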
For all the excitement people had over Word2Vec, you didn't see an explosion of interest in vector search engines because... Word2Vec sucked; applying it to documents didn't improve the search engine very much. Some of it is that adding sensitivity to synonyms can hurt performance, because many possible synonyms turn out to be red herrings. BERT, on the other hand, is context sensitive and is able, to some extent, to know the difference between "my pet jaguar" and "the jaguar dealership in your town", and that really does help find the relevant documents and hide the irrelevant documents.
The review thread (start at the bottom & work your way up) reads like a Show HN thread that went negative.
The paper initially received some questions/negative feedback, so the authors updated the paper and tweaked the reviewers a bit —
"We welcome discussion... The main contribution (that seems to have been missed by some of the reviews) is that we can use very shallow models to compute good vector representation of words."
The response to the authors' update:
Review: The revision and rebuttal failed to address the issues raised by the reviewers. I do not think the paper should be accepted in its current form.
Quality rating: Strong reject
Confidence: Reviewer is knowledgeable
Makes me not feel bad about my own rejections when I see stuff like this or Yann Lecun reacting poorly on twitter to his own papers being rejected.
"The eight-legged essay was needed for those candidates in these civil service tests to show their merits for government service... structurally and stylistically, the eight-legged essay was restrictive and rigid. There are rules governing different sections of the essay, including restrictions on the number of sentences in total, the number of words in total, the format and structure of the essay, and rhyming techniques."
https://en.wikipedia.org/wiki/Eight-legged_essay#Viewpoints
That's the medieval equivalent of leetcode.
The problem that the imperial Chinese government had to solve was pretty much the same as the problem the Big Tech companies are trying to solve with leetcode.
In earlier times, it used to be that the exams were more freestyle, but when tens/hundreds of thousands of people compete for a handful of high civil service positions, people are motivated to cheat by memorizing essays pre-written by somebody else. And the open-ended questions had subjective answers that didn't scale. So they basically gamified the whole thing.
They might not be perfect employees, but at least you know they are smart, are disciplined, and have the capacity to work hard for a long period.
Sounds like a hazing ritual’s outcome
Exactly what it is. "We had to go through it, so they'll have to too!" plus "Someone who has so little self-respect that they do this will do anything we ask of them."
(I know that some people genuinely like Leetcode and that's totally fine. But that's not why companies want people to do it)
If you think that’s bad, wait until you hear about medschool / nursing.
Standardized interviews or panels do not necessarily exist to find the best candidate. They exist as a tradeoff to ensure some measure of objectivity and prevent favoritism/influence/corruption/bribery/forgery/impersonation/unfair cheating by getting advance access to the test material; in such a way that this can then be verified, standardized, audited at higher levels or nationally. Even more important for medschool/nursing than engineering.
One of countless examples was the sale of 7600 fake nursing transcripts and diplomas in 2020/1 by three south Florida nursing schools [0]. (This happened in the middle of Covid, and at schools which were already being deaccredited.)
Buyers paid $10-15K to obtain fake diplomas and transcripts indicating that they had earned legitimate degrees, like a (two-year) associate degree in nursing; these credentials then allowed the buyers to qualify for the national nursing board exam (NCLEX). About 37% of those who bought the fake documents (about 2,800 people) passed the exam. (For comparison, candidates holding a bachelor's degree in nursing (BSN) reportedly pass at around 90%, versus 84% for those with an associate degree in nursing (ADN).)
Among those 2,800, a "significant number" then received nursing licenses and secured jobs in unnamed hospitals and other health care settings in MD, NY, NJ, and GA.
[0]: "7,600 Fake Nursing Diplomas Were Sold in Scheme, U.S. Says" https://web.archive.org/web/20230928151334/https://www.nytim...
I meant more that a massive part of the experience is hazing used to filter for less obvious criteria, but that is also good info!
Right, sure. But I was saying that it isn't only the candidates (or their schools) whose misconduct or lack of objectivity we want to guard against, but the interviewers/panelists/graders/regulators themselves.
Hazing is just an unfortunately necessary side-effect of this.
Are you saying "smart, are disciplined, and have the capacity to work hard for a long period" have no bearing on doing a good job?
No
For Leetcode, this is one of the typical rationalizations.
It's something a rich kid would come up with if they'd never done real work and were incapable of recognizing it, but had seen body-building and decided that's what everyone should demonstrate as the fundamentals, since you can't get muscles like that without being good at work.
And of course, besides the flawed theory, everyone cheated at the metric.
But other rich kids had more time and money for the non-work metrics-hitting, so the rich kid was happy they were getting "culture fit".
The ancient Chinese exams were the exact opposite of what you describe.
The Chinese rulers realized they had a problem where incompetent rich kids got promoted to important government jobs. This caused the government and therefore society to function poorly. As a result of this, many people died unnecessarily.
To combat this, the Chinese government instituted very hard entrance exams that promoted competent applicants regardless of whether they had rich parents.
The book Ancient China in Transition - An analysis of Social Mobility, 722-222 BC (Cho-yun Hsu, Stanford University Press, 1965) discusses this transition in rather great detail.
https://www.cambridge.org/core/journals/journal-of-asian-stu...
Imperial exams arguably started in the Sui and Tang dynasties, in the 6th century AD.
The 722-222BC in the article you linked to is the "wrong period". That period was a time when Chinese states transitioned from feudalism to empire, and the clan-based aristocracy was replaced by a class of educated scholars due to societal change (which needed lots of literate administrators, and not so much nepotism). The typical path for a person who aspired to work in government was to apply to be an "employee" (very loose translation of 門客) of a prominent minister, and rise up the ranks by impressing their bosses. It was also common for rulers of the time to hold meetings with intellectuals, who then tried to sell the rulers on the latest ideology/methods for running a country -- if the sales pitch worked, they'd have the job of implementing those policies. In general there were no exams, although one would assume that to become an "employee" they'd test your skills in some way, but it was not systematized at all.
Between 222BC and 6th century AD, the educated/scholar class gradually consolidated into a handful of prominent families who tended to monopolize high government posts. During the Jin dynasty (266–420AD) people generally believed one's virtues/abilities were tied to their birth and family status more than anything else. This was a time when prominent families monopolized government power and systematically rejected outsiders from holding important positions, even if they proved their abilities.
The imperial exam system introduced in the Sui/Tang dynasties gradually reversed this trend by allowing commoners to participate in the government exams, though it must be noted in the Tang dynasty exams the circumstances of the candidates were taken into account (family background, social ties, subjective opinion of the examiners, etc.), so it wasn't purely based on the paper exam results.
The exams were gradually systematized in the later dynasties, ultimately ending in the "Eight-legged essay", which was famous for its rigidity in form. It provided a great opportunity for intelligent aspirants from an underprivileged background, since everyone took the exam on an equal footing, but it kind of sucked the soul out of learning the classics.
This is also the original (stated) motivation for modern standardized testing.
Interesting. They had a different situation, and different motivations, and perhaps humility and awareness.
You're 100% right. Gave me a big, big smile after 7 years at Google feeling like an alien, as I was a college dropout from nowhere with nothing and nobody, but with a successful barely-6-figure exit at 27 years old.
A big reason why the FAANGs get dysfunctional too. You either have to be very, very, very fast to fire, almost arbitrarily (Amazon), or you end up with a bunch of people who feel safe enough to be trying to advance.
The "rich kids" w/o anything but college and FAANG were taught Being Visible and Discussing Things and Writing Papers is what "hard work" is, so you end up with a bunch of people building ivory towers to their own intellect (i.e. endless bikeshedding and arguing and design docs and asking for design docs) and afraid of anyone around them who looks like they are.
I have been on a few panels where the candidate passed all the leetcode questions and then turned out to be very poor on the job, with, in one case, the worst "teamwork" I've witnessed. These were not FAANG jobs though, so it might be more viable at a larger company where it's ok to axe big projects, have duplicated work, etc. Leetcode is just one muscle, and many jobs require more than one muscle.
Which was what I meant by "might not be perfect employees".
Sure. But high intelligence, discipline and a capacity for a high level of sustained effort is a good start.
And won't discuss salaries with each other :)
https://en.wikipedia.org/wiki/Song_official_headwear
And it's important to recognize the advantages and disadvantages to ensure that we have proper context.
For example, leetcode may be very appropriate for those programming jobs which are fairly standard. At every job you don't need to invent new things. Industrialization was amazing because of this standardization and ability to mass produce (in a way, potentially LLMs can be this for code. Not quite there yet but it seems like a reasonable potential).
But on the other hand, there are plenty of jobs where there are high levels of uniqueness and creativity and innovation dominate the skills of repetition and regurgitation. This is even true in research and science, though I think creativity is exceptionally important.
The truth is that you need both. Often we actually need more of the former than the latter, but both are needed. They have different jobs. The question is more about the distribution of these skillsets that you need to accomplish your goals. Too much rigidity is stifling and too much flexibility is chaos. But I'd argue that over the centuries we've learned to better wade through chaos, and this is one of the unique qualities that makes us human: to embrace the unknown while everything in us fights to find answers, even if they are not truth, because it is better to be ruled by a malicious but rational god than by existential chaos.
Those companies still use leetcode for those positions. It's just a blanket thing at this point.
Yes, and I think it is dumb. I'm personally fed up with how much we as a society rely on metrics for the sake of metrics. I can accept that things are difficult to measure and that there's a lot of chaos. Imperfection is perfectly okay. But I have a hard time accepting willful ignorance, acting like it is objective. I'm sure I am willfully ignorant many times too, but I think my ego should take the hit rather than continue.
I guess your comment is against the restrictive and rigid idea that peer review should be about making research papers more intellectually rigorous?
I agree. My own most influential paper received strong rejects the first time we submitted it, and rightfully so, I think. In retrospect, we didn't do a good job motivating it, the contributions weren't clearly presented, and the way we described it was super confusing. I'm genuinely grateful for it because the paper that we eventually published is so much better (although the core of the idea barely changed), and it's good because of the harsh reviews we received the first time around. The reviews themselves weren't even particularly "insightful", mostly along the lines of "this is confusing, I don't understand what you're doing or why you're doing it", but sometimes you just really need that outside perspective.
I've also reviewed and rejected my share of papers where I could tell there is a seed of a great idea, but the paper as written just isn't good. It always brings me joy to see those papers eventually published because they're usually so much better.
This is the first time I ever saw a scientist say something positive about peer review
I haven't seen a manuscript that could not be made a better paper through peer review.
Now, there are good and bad reviewers, and good and bad reviews. However, because you usually get assigned three reviewers, the chance that there is at least one good reviewer, or at least one good review from a middling-to-bad reviewer, is not that low, which means that if you get over the initial "reject" disappointment, you can benefit from that written feedback. The main drawback is the loss of time, since a rejection can mean you lose a whole year (only for conferences, and only if you are not willing to compromise by going to a "lower" conference after rejection by a top one).
I have often tried to fight for a good paper, but if the paper is technically not high quality, even the most original idea usually gets shot down, because top conferences cannot afford to publish immature material for reputational reasons. That's what happened to the original Brin & Page Google/PageRank paper, which was submitted to SIGIR and rejected. They dumped it in "Computer Networks and ISDN Systems" (may that journal rest in peace, and with it all ISDN hardware), and the rest is history. As the parent says, you want to see people succeed, and you want to give good grades (except that in my experience many first-year doctoral students are often a bit too harsh with their criticism).
I haven't seen a manuscript that could not be made better, but peer review isn't the best way to improve it.
For a start, even for a journal, the fact that there are multiple reviewers means that none of them feel much responsibility to help improve the paper. They (and I, honestly) often seem content to read part of it and just provide comments/criticism about that. Especially at conferences, the possible verdicts are "reject" or "can't find a reason to reject". I've submitted at least one paper where I wanted the reviewers to point out flaws and reject, and I was left sorely disappointed (it was rejected of course, but not for good reasons) and still in the dark about whether the research had merit.
That said, OpenReview reviews are on average far better than the ones I've received, I think it's fantastic.
Authors don't object to revision suggestions, they object to arbitrary/unfair rejections plus 4-year delays.
No, it's much worse if you're a masters student who wants to publish and have that publication accepted within 12-18 months for your job hunt; you'll necessarily compromise by skipping most journals and aiming for middle-tier conferences (high accept rate and fast turnaround), then falling back on journals only if you get rejected. This was why arxiv was created. (Some academic type here is inevitably going to retort "But conferences (not even those with official proceedings) aren't considered 'real' publications in my field", to which the counter-retort is "the job market doesn't care about that attitude".)
I don't think conferences have the capacity to do this. Journals, yeah, conferences no.
The difference is that in a conference you're in a zero-sum game, there is no chance for iteration, and the framing is reviewer versus author rather than both being on the same team. Yes, every work can be improved, but the process is far too noisy and we can't rely on that iteration happening between conferences.
From personal experience, I've had very few reviews with meaningful and actionable feedback. Far more frequently I've gotten ones that friends joke GPT could have done better. My last one was a strong reject with high confidence from a reviewer whose only notes were about a "missing" link (we redacted our github link) and a broken citation leading to the appendix. That's it. We reported them, then they got a chance to write a new angry review, which seemed to convince the other two borderline reviewers. Most frequently I get "not novel" or "needs more datasets" without references to similar work (or with references that are wildly off base) and without explanation as to what datasets they'd like to see and/or why. Most of my reviews are from reviewers reporting 3/5 confidence levels and are essentially always giving weak or borderline scores (always biased towards reject). It is more common for me to see a review that is worse than the example of a bad review in a conference's own guidelines than one that is better.
As a reviewer, I've often defended papers that were more than sufficient and that I could tell were making the rounds. I recently had to defend a paper for a workshop that was clearly a quick turnaround from the main conference (it was 10 pages + appendix when most workshop papers were ~5), and the other two reviewers admitted to not knowing the subject matter but made similar generic statements about "more datasets would make this more convincing." I don't think this is helping anyone. Even now, I've been handed a paper that's not in my domain, for god knows what reason (there are others in the domain). (I do know: it's because there are >13k submissions and not enough reviewers.)
I've only seen these problems continue to grow, with silly bandaids being applied. Like the social media ban, which had the direct opposite result of what they were attempting, and it was quite obvious that would happen. The new CVPR LLM ban is equally silly because it just states that you shouldn't do what was already known to be unethical, and it shifts the responsibility to the authors to prove that an LLM produced the review (which is a tall order). It is like proving to a cop that you've been shot, only for them to ask that you identify the caliber of the bullet and the gun that was used. Not an effective strategy for someone bleeding out. It's a laughable solution to the clear underlying problem of low-quality reviews. But that won't change until ACs and metas actually read the reviews too. And that won't happen until we admit that there are too many low-quality reviews and no clear mechanism to incentivize high-quality ones. https://twitter.com/CVPR/status/1722384482261508498
Eh, happens all the time. It's an extremely rare paper that isn't improved by the process (though it's also a pain sometimes, and clueless/antagonistic reviewers do happen).
It can be annoying, and reviewer #2 is a jerk, but I'd be highly suspicious of any scientist who doesn't respect the importance of peer review.
IMO maybe scientists should have experience critiquing stuff like poems, short essays, or fiction. Expecting a critiquer to give actually good suggestions matching your original vision, when your original vision's presentation is flawed, is incredibly rare. So the best critiques are usually in a "this section right here, wtf is it?" style, with bonus points for "wtf is this order of information" or other literary technique that is either being misused or unused.
Oh, I do completely agree and didn't mean to imply otherwise. I have had experiences where reviewers have given me great ideas for new experiments or ways to present things. But the most useful ones usually are the "wtf?" type comments, or comments that suggest the reviewers completely misunderstood or misread the text. While those are initially infuriating, the reviewers are usually among the people in the field that are most familiar with the topic of the paper--if they don't understand it or misread it, 95% of the time it's because it could be written more clearly.
Agree that this is how papers are often judged, but strong disagree on how this is how papers should be judged. This is exactly the problem of reviewers looking for the keys under the lamp post (does the paper check these boxes), versus where they lost the keys (should this paper get more exposure because it advances the field).
The fact that the first doesn't lead more to the second is a failure of the system.
This is the same sort of value system that leads to accepting job candidates who have neat haircuts and say the right shibboleths, versus the ones who make the right bottom-line impact.
Basically, are "good" papers that are very rigorous but lead to nothing actually "good"? If your model of progress in science is that rigorous papers are a higher-probability roll of the dice, and non-rigorous papers are low-probability rolls of the dice, then we should just look for rigorous papers. And that a low-rigor paper like word2vec actually made progress was "getting really lucky", so we should not have rated the paper well.
But I contend that word2vec was also very innovative, and that should be a positive factor for reviewers. In fact, I bet that innovative papers have a hard time being super rigorous because the definition of rigor in that field has yet to be settled. I'm basically contending that on the extreme margins, rigor is negatively correlated with innovation.
Your argument is that if a paper makes a valuable contribution then it should be accepted even if it's not well written. But the definition of "well written" is that it makes it easy for the reader to understand its value. If a paper is not well written, then reviewers won't understand its value and will reject it.
Good writing and rigor aren't highly correlated. You can have poorly written papers that are very rigorous, and vice versa. Rigor is often another checkbox (does the paper have some quantitative comparisons), especially if the proper rigor is hard to define by the writer or the reader.
My advice to PhD students is to always just focus on subjects where the rigor is straightforward, since that makes writing papers that get in easier. But of course, that is a selfish personal optimization that isn’t really what’s good for society.
Rigor here doesn't have to mean mathematical rigor, it includes qualitative rigor. It's unrigorous to include meaningless comparisons to prior work (which is a credible issue the reviewers raised in this case) and it's also poor writing.
Qualitative rigor isn't rigor at all, it's the opposite. It's still useful in a good narrative; sometimes it's the best thing you have to serve as evidence in your paper.
Prior work is a mess in any field. The PC will overemphasize the value of their own work, of course, just because of human ego. I've been on way too many papers where my coauthors defensively cite work based on who could review the paper. I'm not versed enough in this area to know if prior work was really an issue or not, but I used to do a lot of paper doctoring in fields that I wasn't very familiar with.
Papers are absolutely judged on impact - it's not as though any paper submitted to Nature gets published as long as it gets through peer review. Most journals (especially high-impact for-profit journals) have editors that are selecting interesting and important papers. I think it's probably a good idea to separate those two jobs ("is this work rigorous and clearly documented") vs ("should this be included in the fall 2023 issue").
That's (probably) good for getting the most important papers to the top, but it also strongly disincentivizes whole categories of often very important papers. Two obvious categories are replication studies and negative results. "I tried it too and it worked for me", "I tried it too and it didn't work", or "I tried this cool thing and it had absolutely no effect on how lasers work" could be the result of tons of very hard work and could have really important implications, but you're not likely to make a big splash in high-impact journals with work like that. A well-written negative result can prevent lots of other folks from wasting their own time (and you already spent your time on it, so you might as well write it up).
The pressure for impactful work also probably contributes to folks juicing the stats or faking results to make their work more exciting (other things certainly contribute to this too, like funding and tenure structures). I don't think "don't care about impact" is a solution to the problem, because obviously we want the papers that make cool new stuff.
This is post hoc thinking; it's impossible to judge a priori. You're also discounting the bias of top venues: being published in such a venue is itself a causal variable for higher impact if you measure by citation counts.
I'd also mention that ML does not typically use a journal system but rather conferences. A major difference is that conference submissions are not rolling and there is only one rebuttal available to authors, usually limited to a single page including citations. You can probably imagine that it's difficult to write an adequate rebuttal to 3-4 reviewers under the best of circumstances. It's like trying to hold a debate where the defending side must respond to any question from the opposition, with clear citations, in a short time frame, while there is no limit to how abstract the opposition's questions can be, nor any guarantee of congruence within the opposition. It's not a very good framework for making arguments clearer or more convincing, especially when you consider that the game is zero-sum.
I definitely agree with your comments about how other types of useful communication (like null results) are highly discouraged. But I wanted to note that the framework is poor even for "standard" works.
I don't consider clearly stating your model and meaningfully comparing it to prior work and other models (seemingly the main issues here) to be analogous to a proper haircut or a shibboleth. Actually I think it's a strange comparison to make.
You are right. I often got told "You don't compare with anything" when proposing something very new. That's true, because if you are literally the first one attempting a task, there isn't any benchmark. The trick then is to make up at least a straw man alternative to your method and to compare with that.
Since then, I have evolved my thinking, and I now use something that isn't just a straw man: before I even conceive my own method or model or algorithm, I ask myself "What is the simplest non-trivial way to do this?" For example, when tasked with developing a transformer-based financial summarization system, we pretrained a BERT model from scratch (several months' worth of work), but I also implemented a 2-line grep-based mini summarizer as a shell script, which defied the complexity of the BERT transformer yet proved to be a competitor tough to beat: https://www.springerprofessional.de/extractive-summarization...
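To make concrete what I mean by such a trivial baseline, here is a minimal sketch in Python (not the actual shell script, and the keyword list is made up purely for illustration):

    # Minimal keyword-grep style extractive baseline (illustrative only).
    import re

    KEYWORDS = {"revenue", "profit", "loss", "guidance", "dividend"}  # hypothetical keyword list

    def baseline_summary(text, max_sentences=3):
        # Split into sentences, then keep the first few that mention any keyword.
        sentences = re.split(r"(?<=[.!?])\s+", text)
        hits = [s for s in sentences if any(k in s.lower() for k in KEYWORDS)]
        return " ".join(hits[:max_sentences])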
I'm inclined to organize a workshop on small models with few parameters, with a shared task where no model can be larger than 65 kB: a sort of "small is beautiful" workshop dedicated to Occam.
Don't you think something is missing if we've defined "quality" as a characteristic that is independent of, and uncorrelated with, importance or influence?
Quality in the sense I meant it (cogency and intellectual depth/rigor) should certainly be correlated with importance and influence!
No, because "quality" means two different things here. I believe the main reason word2vec became important was purely on the software/engineering side, not because it was scientifically novel. Advancements in Python development, especially good higher-level constructs around numerical linear algebra, meant that the fairly shallow and simple tools of word2vec were available to almost every tech company. So I don't think word2vec was (or is) particularly good research, but it became good software for reasons beyond its own control.
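To illustrate how low the barrier became: with the gensim library (assuming gensim >= 4.0 parameter names), training a toy word2vec model is only a few lines, which is roughly why it spread so easily.

    # Toy word2vec training via gensim (assumes gensim >= 4.0 parameter names).
    from gensim.models import Word2Vec

    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
    model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram
    print(model.wv.most_similar("cat", topn=2))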
Around 2014 it was shown that word2vec (skip-gram with negative sampling) essentially factorizes a pointwise mutual information matrix over the words in its training set, after some preprocessing. This means that Claude Shannon had things mostly figured out in the 60s, and some reviewers were quite critical of the word2vec paper for not citing similar developments in the 70s.
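For reference, the equivalence is usually stated roughly as follows: with k negative samples, the learned word and context vectors satisfy

    \mathrm{PMI}(w, c) = \log \frac{P(w, c)}{P(w)\,P(c)}, \qquad
    \vec{w} \cdot \vec{c} \;\approx\; \mathrm{PMI}(w, c) - \log k

where w and c are a word and a context word, and the vectors are the embeddings word2vec learns.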
Yes and no. I think the larger issue is the ambiguity of what publications mean and what they should be. There are a lot of ways to optimize this, and none of them has an optimal solution. I don't think you should be downvoted for your different framing, because we need to be more open about this chaos and consider values other than our own or the road we're already on. It is very unclear what we are trying to optimize, there will obviously be many opinions on this, and your group of reviewers may all seek to optimize different things. The only solution I can see is to stop pretending that we know what each other is trying to do and be a bit more explicit about it, because if we argue from different assumptions while assuming the other is working from the same ones, we'll just talk past one another.
I believe many journals' focus on potentially influential papers is why we have a reproducibility crisis. Since it is very hard to publish null results, people often don't even bother trying. This leads to tons of wasted effort as multiple groups attempt the same thing, not knowing that others before them have failed.
Also, predicting whether a paper will be influential is very hard and inherently subjective, unless you are reviewing something truly groundbreaking. Quality-based reviews are also subjective, but less so.
When an author refuses to address reasonable questions by the reviewers, what should you expect? There were legitimate questions and concerns about potential alternative explanations for the increase in accuracy raised by the reviewers, and the authors didn't play ball.
I actually disagree, but maybe not for the reasons you're expecting. I actually disagree because the reviews are unreasonably reasonable. They are void of context.
It's tough to explain, but I think it's also something every person who has written research papers can understand: there's a big gap between reading and writing, and our papers are not written to communicate our work as clearly as possible, but rather to convince reviewers to accept it. The distinction is subtle but deceptively large, and I think we all could be a bit more honest about the system. After all, we want to optimize it, right?
The problem I see is all a matter of context. Good ideas often appear trivial once we see them. We often fool ourselves into thinking that we already knew this, without any meaningful evidence that it's true, and reason that x = y + z + epsilon; but almost any idea can be framed that way, even breakthroughs like evolution, quantum mechanics, or relativity. We look back at giants from a distance, but when looking at works now we don't see giants; we see a bunch of children standing on one another's shoulders in a trench coat. That is the reality of it: few works are giant leaps and bounds, most are incremental. The ones that take the biggest leaps are rare, often rely on luck (an ambiguous notion), and frequently (though not always) take a lot of time, which is something we certainly don't often have.
We're trained as scientists, and that means being trained in critiquing systems and letting questions spiral. Sometimes the spiraling of questions shows how absurd an idea is; other times it shows how ingenious it is. The former is easy to recognize, the latter often hard to distinguish. It is always easy to ask for more datasets, more experiments, and so on, but these are critiques that can apply to any work, because no work is complete. This is especially true for larger breakthroughs, because any paradigm shift (even a small one) causes a cascade of questions and creates a lot of curiosity. But as we've all written papers, we know this can be a never-ending cycle and is often quite impractical. The complaint about Table 4 is a perfect example. It is a difficult situation: the complaint is perfectly reasonable in that the questions and concerns are valid and do warrant further understanding, but it is also unreasonable because the work required to answer them is not appropriate for the timescale we work on. Do you have the compute or time to retrain all prior works under your settings? To retrain all your models under theirs? Maybe it doesn't work there, which may or may not be evidence about whether the other works are just as good. What it comes down to is asking whether answering these questions could be another work in its own right. I'm not as deep in NLP as I am in CV, but I suspect the answer is yes (as in, there are published works answering exactly those questions).
There are also reasonably unreasonable questions in other respects, such as the question about cosine distance vs Euclidean. This is one I see quite often: we rely too much on our intuition about low-dimensional geometry to reason about high-dimensional geometry. Things that seem obvious, like distance, are quite ambiguous there; this is exactly the reason for the curse of dimensionality (it becomes difficult to distinguish the furthest point from the nearest point), and it often leads us in the wrong direction. Honestly, it is a bit surprising that cosine similarity works at all: as D -> inf, cos(x, y) -> 0 for independent random x, y in R^D, because random vectors are expected to be nearly orthogonal, so getting cos(x, y) near 1 essentially requires y = x + epsilon with epsilon small. But I digress; it does work. There could be entire works exploring these geometries and determining different geodesics. It is entirely enough for a work to simply have something working, even if it doesn't quite yet make sense.
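A quick toy check of that orthogonality claim (nothing from the paper, just numpy): the cosine similarity of independent random vectors concentrates around 0 as the dimension grows.

    # Cosine similarity of independent random Gaussian vectors vs. dimension.
    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 10, 100, 10_000):
        x, y = rng.standard_normal((2, d))
        cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
        print(d, round(float(cos), 4))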
The thing is that science is exceptionally fuzzy. Research is people walking around in the dark, and papers are how they communicate what they have found. I think it is important to remember this framing, because we should then judge the viability/publishability of a work not by whether it illuminates everything, but by whether the things found are useful (which itself is not always known). You might uncover a cavern, and then it becomes easy to say "report back when you've explored it," but such an effort may be impossible to do alone. It can be a dead end, one that takes decades to explore (we'll always learn something, though), or it may lead to riches we've never seen before. We don't know, but that's really what we're all looking for (hopefully more riches for humanity than for one's self, though one should certainly be rewarded).
This is why, honestly, I advocate for abandoning the journal/conference system and leveraging modern tools like OpenReview to accelerate communication. It enables us to be more open about our limitations, to discuss our failures, and to write to our peers rather than to our critics. Critics are of course important, but they can take over too easily because they are reasonable and correct while often missing context. For an example, see the many HN comments about a technology in its infancy, where people complain that it is not yet competitive with existing technologies and thus devalue its potential, missing the context that it takes time to compete, that enormous time and energy went into what we have now, and that the existing thing has theoretical limits the new one, despite being only a demonstration, does not share. The question is rather whether such research warrants more eyes, and even a negative result can be good because it communicates that we've found dead ends (something we actively discourage reporting, needlessly forcing many researchers to re-explore those dead spaces). There's so much more I could say, and this is woefully incomplete, but I can't fit a novel into these comments, and I'm afraid the length already makes for poor communication given the context of the platform. Thanks to anyone who has taken the time.
Journal articles/conference papers are not the only outlet, you can still write technical monographs if you feel review cycles are holding you back.
It depends. Right now I'm a grad student and I'm just trying to graduate. My friend, who already left, summed it up pretty well.
I'm just trying to graduate and just have to figure out how to play the game well enough to leave. Honestly, I do not see myself ever submitting to a journal or conference again. I'll submit to OpenReview, arXiv, and my blog. I already open my works to discussion on GitHub, am very active in responding, and do actually appreciate critiques (there's lots of room for improvement). In fact, my most cited work has been rejected many times, but we also have a well-known blog post as a result, and even more than a year later we still get questions on our GitHub (which we still respond to, even though many are naive and ask for help debugging Python rather than our code).
But I'm done with academia because I have no more faith in it. I'd love to start or work for a truly open ML research group, where it is possible to explore seemingly naive or unpopular ideas rather than just accepting things the way they are and being forced to chase hype. I would turn down lots of money to do such a thing. To not just metric-hack but be honest about the limitations of my work and what still needs to be done, without that honesty simply handing ammunition to those who would use it against me. To do research that takes time rather than chase a moving goalpost against people with more compute who rely on pay-to-play, and not to work in this idiotic publish-or-perish paradigm. To not be beholden to massive compute or restricted to only tuning the monoliths that have already been created. To challenge the LLM and diffusion paradigms that are so woefully incomplete despite their undeniable success, and to openly recognize that both of these things can be true without it being misinterpreted as undermining those successes. You'd think academia would be the place for this, but I haven't seen a shred of evidence that it is. I've only seen the issues grow.
But if that's the case why put so much focus on and effort into the peer review system?
If you ask the people funding research, I'm pretty sure they'd prefer to fund influential ideas over non-influential "high-quality" paper production.
Even if you were to take the extreme position that influence or citation counts are all that matter, the problem is that 'future influence' is hard if not impossible to judge well in the present. (Of course it's easy to point to cases where it's possible to make an educated or confident guess.)
Also, an emphasis on clarity and intellectual depth/rigor is important for the healthy development of a field. Not for nothing, the lack of this emphasis is a pretty common criticism of the whole AI field!
High-quality writing improves information dissemination. A paper like word2vec has probably been skimmed by tens of thousands, perhaps hundreds of thousands, of people.
One extra day of revising is nothing, comparatively.
This is the right take, despite how some will want to frame it as "reviewers are dummies".