I always struggle to figure out what role arXiv should play in my information diet. On the one hand I support Open Access research. On the other hand, peer review is vital, and a substantial quantity of “papers” on arXiv are just blog posts in a LaTeX trench coat.
If you know the authors in your specific area of research, arXiv is a nice way to read their new papers once they are (mostly) done but before the journal submission has gone through.
Do people browse arxiv or monitor new posts like reddit or something? I only visit when I encounter a link to it or when I search for a specific paper.
It depends on the kind of person. Most normal people don't do that; it's not a reddit-like platform, after all.
But most researchers and grad students (like me) subscribe to the daily mailing list of papers posted that day in their particular field. Skimming the titles and then opening the papers relevant to you is a morning ritual for many.
I built a Bluesky bot, if anyone is interested in a live feed of new articles.
You can find it here: https://bsky.app/profile/arxiv-daily-bot.bsky.social
I did when I was in academia. Would open each day and check what new papers were in my field. It was fun, and I learned a ton.
I kept it up out of habit for a year after grad school. Then moved on.
I use the RSS feeds to watch for papers mentioning terms I'm curious about, do a casual skim for anything interesting, and maybe end up finding a paper or two per month that is worth reading more carefully. Lots of chaff for sure, but if you have some core interests it's quite useful.
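A keyword filter like that can be sketched in a few lines of standard-library Python. This is a rough sketch, not anyone's actual setup: it assumes an RSS 2.0 document whose `<item>` elements carry `<title>`, `<link>` and `<description>`, and the feed URL in the comment is an assumption about arXiv's current feed layout. Fetching is left out so the parsing can be tried offline; the sample links are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumed feed location, e.g. https://rss.arxiv.org/rss/cs.LG (download it however you like).


def matching_items(rss_xml: str, keywords: list[str]) -> list[tuple[str, str]]:
    """Return (title, link) for RSS items whose title or description mentions any keyword."""
    root = ET.fromstring(rss_xml)
    wanted = [k.lower() for k in keywords]
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        text = (title + " " + item.findtext("description", default="")).lower()
        # Case-insensitive substring match: crude, but enough for a casual skim.
        if any(k in text for k in wanted):
            hits.append((title.strip(), link.strip()))
    return hits


if __name__ == "__main__":
    sample = """<rss version="2.0"><channel>
      <item><title>Sparse attention at scale</title><link>https://example.org/1</link>
            <description>We study transformers.</description></item>
      <item><title>A note on Banach spaces</title><link>https://example.org/2</link>
            <description>Functional analysis.</description></item>
    </channel></rss>"""
    for title, link in matching_items(sample, ["transformer"]):
        print(title, link)
```

Plain substring matching keeps the sketch simple; it will also hit "transformers" for "transformer", which is usually what you want when skimming.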
Yes, people do that. Karpathy made a utility to monitor it better years ago: https://github.com/karpathy/arxiv-sanity-preserver
A bit too big and varied to browse, but you can get emails of all recent papers in your field(s) of interest with something like Scholars: https://app.scholars.io/newsletter I subscribe to "Functional Analysis" and get a weekly email listing 30-40 papers.
Yeah, it is not uncommon for people to visit the new listings (or subscribe to the email version) to (try to) keep track of what is going on in their field.
Supposing of course your field roughly matches one of the categories.
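If your field straddles several categories, one workaround is arXiv's public query API (`http://export.arxiv.org/api/query`, which returns Atom XML). The sketch below only builds the request URL; it assumes the API's `cat:` prefix, `OR` operator and `sortBy`/`sortOrder`/`max_results` parameters behave as in arXiv's API manual, and `newest_in_categories` is a hypothetical helper name. The categories are just examples (math.FA is Functional Analysis, math.OA is Operator Algebras).

```python
from urllib.parse import urlencode

# arXiv's public query API endpoint (responses are Atom XML).
API_ENDPOINT = "http://export.arxiv.org/api/query"


def newest_in_categories(categories: list[str], max_results: int = 50) -> str:
    """Hypothetical helper: URL for the newest submissions across several categories.

    ORing the categories covers a field that spans more than one listing.
    """
    if not categories:
        raise ValueError("need at least one category")
    query = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": query,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(newest_in_categories(["math.FA", "math.OA"]))
```

Keeping the URL builder separate from any HTTP call makes it easy to drop into a cron job or an existing feed reader.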
I’m RSS-subscribed to a few sections relevant to my research.
RSS feed, yes.
It’s a useful tool. But its “value” is about the same as a github repo with your pdf.
It doesn’t need much funding or staff, and I'm not quite sure why they’re going through all this rigmarole over independence. I almost think they’d be better off like Apache, where there are very few employees.
The bibliography is more important than the peer review, imo. I get the most use out of arXiv by surfing references and citations.
One growing role, especially in mathematics, is that of a host for "overlay journals": https://www.insmi.cnrs.fr/en/cnrsinfo/epijournaux-en-mathema...
I really like the idea. In short: arXiv, HAL and similar sites host the papers without any peer review (short of perhaps stopping crank spam) or access control. They're freely available to anyone. Authors then submit arXiv IDs (or similar) to the reviewers of "overlay journals", which then review and accept or not. The overlay journal accepts a paper by just adding it to its list of accepted arXiv identifiers, and that's that.
This ensures accessibility for all, keeps peer review, yet takes a lot of the practical hurdles away from actually running a journal. A journal can now just be a group of people who give thumbs up or down to arXiv identifiers, and if that group's conclusions start carrying weight in the community, it has become an important journal. Maybe they give away their listings for free, maybe they charge to read the reviews; it's really up to them what the business model (if any) will be.
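To make the "a journal is just a list of accepted identifiers" idea concrete, here is a toy sketch. `OverlayJournal` and its methods are hypothetical names, not any real overlay platform's API. It only handles new-style arXiv identifiers (`YYMM.NNNN` or `YYMM.NNNNN`, optionally followed by `vN`) and skips old-style ones like `hep-th/9901001`.

```python
import re

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, with an optional vN version suffix.
# Old-style identifiers (e.g. hep-th/9901001) are deliberately not handled in this sketch.
ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(?:v(\d+))?")


class OverlayJournal:
    """Hypothetical overlay journal: nothing but a list of accepted arXiv identifiers."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Keyed by the version-less ID; the value records exactly what was accepted.
        self._accepted: dict[str, str] = {}

    def accept(self, arxiv_id: str) -> None:
        match = ARXIV_ID.fullmatch(arxiv_id)
        if match is None:
            raise ValueError(f"not a new-style arXiv identifier: {arxiv_id!r}")
        self._accepted[match.group(1)] = arxiv_id

    def is_accepted(self, arxiv_id: str) -> bool:
        # Any version of an accepted paper resolves to the accepted entry.
        match = ARXIV_ID.fullmatch(arxiv_id)
        return match is not None and match.group(1) in self._accepted

    def listing(self) -> list[str]:
        return sorted(self._accepted.values())


if __name__ == "__main__":
    journal = OverlayJournal("Example Overlay")  # placeholder name and ID
    journal.accept("2101.01234v2")
    print(journal.listing(), journal.is_accepted("2101.01234"))
```

Keying on the version-less ID is one design choice among several: later revisions still resolve to the accepted paper, while the stored string keeps track of which version the reviewers actually saw.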
It's really nice.
Well, some blog posts are worth citing.
Of course some blog posts are worth citing. Then cite them as blog posts.
My point is that a LaTeX PDF can launder epistemic status. An unreviewed argument starts to look like established research merely because it adopts the visual grammar of a paper.
At least in economics it can easily take 1-5 years to go from draft to journal. In the meantime, you want a way for others to cite your paper easily, to make different revisions available, and to post it somewhere stable (people's websites change all the time, etc.)
Also, because most folks don't want to deal with paywalls, it's standard practice to put the last version of your draft before conditional acceptance on an online repository. It used to be SSRN for econ/finance, but they sold out to Elsevier, so now arxiv is increasingly being used.
> Then cite them as blog posts
My point is it's still useful to have a somewhat authoritative place to cite (high quality) blog post level content. arXiv has formatting requirements and doesn't go down like random personal sites.
> a LaTeX PDF can launder epistemic status
True to a certain extent, although people are aware of it and can (hopefully) judge the content themselves.
Actually, arXiv is frustrating from an open access angle. It is entirely possible to post documents without an open license, so the content does not always meet the open access definition.
Peer review WAS vital for a long time. Maybe the world looks different now; maybe LLMs can find value in things better than humans. When you make an assumption it's good to think about why you make it; in this case it seems to be for historical reasons.
likewise, taking a wrecking ball to systems refined over centuries should come with some burden of proof for the positive claim that a tool can replace an institution. most times this has happened before, we've had to strengthen credentialing requirements to stop people from dying
The burden of proof is on peer review, not the other way around. Peer review is a fairly modern, post-WWII invention. Before that, “peer review” looked very different.
i'm saying all positive claims need to be justified, not that priors are exempt. there is one claim with a vast body of evidence supporting it, and a competing claim that must meet the same standard. the world is not so magically different now that we can't look at software engineering and computer science the same way we look at real (credentialist, regulated) science and engineering disciplines. really all i was implying is "peer review WAS vital" is jumping the gun
I'm always grateful to arXiv. It allows non-scientists like me to access high-quality papers anytime. Thank you, always
It is also valuable for scientists, as it is often a 'director's cut' version of the paper. Journal submissions are heavily edited and shortened to fit the page limits.
I don't know which field you're talking about, but in general, math and cs journals do not have page limits.
By the way, one of my favorite pastimes is to download the LaTeX source of papers on arXiv and read all the commented-out stuff.
% we should make sure this theorem is actually true
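For anyone who wants to try this, a rough sketch that pulls comments out of LaTeX source. It assumes you have already downloaded the source (arXiv offers it via a paper's source/e-print link, often as a gzipped tarball, though sometimes as a single compressed file). It treats an unescaped `%` as the start of a comment and skips escaped characters such as `\%`, but it does not understand `verbatim` or similar environments, so expect some false positives there. The function names are illustrative.

```python
import tarfile


def extract_comments(tex: str) -> list[str]:
    """Return the non-empty comment text from each line of LaTeX source."""
    comments = []
    for line in tex.splitlines():
        i = 0
        while i < len(line):
            if line[i] == "\\":
                i += 2  # skip the escaped character, e.g. \% or \\
                continue
            if line[i] == "%":
                text = line[i + 1:].strip()
                if text:
                    comments.append(text)
                break  # the rest of the line is the comment
            i += 1
    return comments


def comments_in_tarball(path: str) -> dict[str, list[str]]:
    """Map each .tex file inside an (optionally compressed) tarball to its comments."""
    found = {}
    with tarfile.open(path, "r:*") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".tex"):
                handle = tar.extractfile(member)
                if handle is not None:
                    source = handle.read().decode("utf-8", errors="replace")
                    found[member.name] = extract_comments(source)
    return found
```

The scanner goes character by character rather than using a regex so that `\\%` (a line break followed by a comment) and `\%` (a literal percent sign) are told apart correctly.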
cryptography, for example, which is essentially math + cs together
Which journal?
look at essentially any proceedings of any conference (in crypto we don't really do journals). see EUROCRYPT for example: https://link.springer.com/book/10.1007/978-3-031-91098-2
in there, every paper is cut down and refers to a full version for proofs etc., which is typically on eprint.iacr.org
Well, yes, conference proceedings are usually page limited, but that's not a journal.
We usually do conferences in cryptography/security, and most of them have page limits: CCS, USENIX, NDSS, S&P, CRYPTO, EUROCRYPT all have page limits (some allow appendices, which reviewers are not obligated to read).
Related: https://news.ycombinator.com/item?id=47450478
“ArXiv declares independence from Cornell” (science.org)
811 points | 3 months ago | 291 comments
They should charge AI companies for training on it, or get them to donate. A small amount could easily fund them.
Part of the promise of open access and open science is that the information is free and open to all. Including robots.
I submit to open things because I want my material to be openly available. If I wanted restrictions, I would submit to gated journals.
That would be a trap. It's healthier for a non-profit to have many small funders than a few large ones.
exactly, the only reason Mozilla exists today is as a legal shield against a browser-monopoly antitrust suit against Google. that's the product they sell, and Google is paying hundreds of millions per year for this valuable service
I thought Google pays Mozilla so they don't set the default search engine to something else (the same way Google pays Apple for Safari), so that Google continues to dominate and make money off ads.
If Google didn't pay hundreds of millions, Microsoft would.
If Google just wanted them to exist and didn't care about profiting off of the search traffic they wouldn't partner with Mozilla.
Papers submitted to arXiv under its most permissive license should always be free, as in beer, speech, freedom. For researchers that contribute to it, that is the intention for a reason. It is to serve public and corporate good without restriction.
This isn't me siding with AI companies by the way; it's a slippery slope argument.
As if they would pay... They would pirate the content, as they already have.
They’ve never paid for any content?
ArXiv is a good complement to the modern peer review, IMO. As long as someone "vouches" for you, and you adhere to its minimal standards, you're able to post a paper. Other readers can decide whether the paper is worth their attention, and whether the presented ideas or results are valuable.
It's also good that it doesn't gatekeep with the paywalls that you can pretty much only afford by affiliating yourself with a toll-paying institution.
Obviously, there are plenty of flaws with this system:
1. If you're associated with a brand (e.g., Google, MIT) or have a recognizable co-author (e.g., Yann LeCun), you'll get attention and citations no matter what.
2. "Vouching" can also just mean accepting someone's email request without ever having met or known them.
3. It puts the effort on the readers to decide whether each paper is valuable, and particularly scientifically valuable, for which most readers will be unequipped.
4. "Minimal standards" can be gamed by AI-generated submissions.
I'd love to see a synthesis of arXiv, open-access publishing and artifact reviews, like the following:
- Have a number of reviewers on retainer, or design a reward system similar to bug bounties. The reward mechanism probably shouldn't be based on money or allow a winner-takes-all strategy.
- Have a number of badges with respect to the quality and value of the paper. For example: validated by peers (i.e., reviewed by at least 3 peers with minimum borderline accept consensus), valuable (i.e., reviewed by at least 5 peers with a valuable indicator), etc.
- Allow vouched comments on the platform, and moderate for self-promotion, toxicity, etc. Obviously a big ask.
- Improve the "vouching" system, or add badges like "vouched by X people" or "vouched by established scientist".
Hope their new organization will implement some of these improvements.
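The badge rules proposed above are concrete enough to sketch. Everything here is hypothetical: the five-point `Score` scale is invented, "minimum borderline accept consensus" is read as *every* review scoring at least borderline accept, and "valuable" is read as at least five distinct reviewers ticking a valuable flag.

```python
from dataclasses import dataclass
from enum import IntEnum


class Score(IntEnum):
    """Hypothetical review scale, loosely modelled on conference review forms."""
    REJECT = 1
    WEAK_REJECT = 2
    BORDERLINE_ACCEPT = 3
    ACCEPT = 4
    STRONG_ACCEPT = 5


@dataclass(frozen=True)
class Review:
    reviewer: str
    score: Score
    valuable: bool = False


def badges(reviews: list[Review]) -> set[str]:
    """Compute the (hypothetical) badges a paper has earned from its reviews."""
    # Keep only each reviewer's latest review so nobody counts twice.
    unique = list({r.reviewer: r for r in reviews}.values())
    earned = set()
    # "validated by peers": at least 3 reviewers, all at borderline accept or better.
    if len(unique) >= 3 and all(r.score >= Score.BORDERLINE_ACCEPT for r in unique):
        earned.add("validated by peers")
    # "valuable": at least 5 reviewers flagged the paper as valuable.
    if sum(r.valuable for r in unique) >= 5:
        earned.add("valuable")
    return earned
```

Deduplicating by reviewer is the one safeguard included here; a real system would also need the stronger vouching mentioned above to stop sock-puppet reviewers.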
I volunteered for a project [1] with roughly this philosophy. Traditional publishing currently serves three purposes:
- Organise peer feedback
- Publish the work
- Recognise good work, helping with both discovery and credit
That last part especially is what allows publishers to charge the ridiculous markups that they do.
But with "modern" technology, feedback and publishing really don't require all that infrastructure; email and arXiv can easily be used to self-organise them. So we built a system of recognition that does not block publication and can be used as a layer on top of arXiv or any other venue, allowing peers to vouch for ("endorse") a work.
I had even proposed and implemented an integration for arXiv Labs that got accepted, but then never merged. I should follow up on that...
[1] https://plaudit.pub/
> I had even proposed and implemented an integration for arXiv Labs that got accepted, but then never merged. I should follow up on that...
You definitely should - looks like what I roughly had in mind.
Thanks for sharing!
>3. It puts the effort on the readers to decide whether each paper is valuable, and particularly scientifically valuable, for which most readers will be unequipped.
You say it as if the replication crisis doesn't exist and publish-or-perish isn't a thing.
Actually, the replication crisis shows how difficult (or underinvested) the process of reviewing is.
Removing this (often very basic) peer review doesn't somehow fix the problem. The solution lies in more thorough reviews and replication studies, not in everyone deciding for themselves.
You can even combine arXiv and peer review very neatly: https://news.ycombinator.com/item?id=48744030
The big challenge may be governance more than infrastructure: staying community-driven while becoming an independent nonprofit is not trivial.
That worries me a bit. ArXiv was and is so great and useful to humanity, giving access to knowledge otherwise kept closed by the publishers' cartel, that I would not like to see it turn into a "non-profit" of the OpenAI kind...
openai had billionaire "donors" who understood the company was going to operate as a PBC with a positive return for them instead of a true nonprofit.
the heel turn to unlimited for-profit was only possible because of their unique structure and the fact that they were already selling commercial products. arxiv is not selling anything, so there's no financial incentive to take it over.
This is exactly the playbook that messed up scientific communication last time. Journals and research societies run by researchers and their institutions were spun off, sold, and made independent, which in turn made it possible for a few publishers to gobble up everything.