I recall attending a talk by Richard Stallman where he critiqued the use of the word "piracy" for file sharing. "Pirates" attack and destroy ships, kill innocents, and generally rape and plunder. How does that word fit what's basically sharing and trying to help your neighbor?
Utterly losing the plot while playing dumb word games is so typical Stallman.
The words we use to describe and talk about things and people are actually very important and very much help shape our impressions, so I can't really see how he's lost the plot.
Seems dead on in this case.
Also generally typical of Stallman
The real issue is that modern-day purchasing is not ownership. It's not even leasing. You pay the full list price for a PDF or music file that can disappear from your library at any time if the corpo-rats decide it shouldn't be available to you anymore.
If purchasing isn't ownership, then piracy shouldn't be theft.
So 'pirating' stuff for personal use is bad, but if it's for corp use and benefit, then it's good.
In America, the main argument I see for letting AI corporations destroy norms and commit copyright theft is as follows: "If we adhere to standard views of copyright, then we will be disadvantaged in the race to super-intelligence against China."
I really think this argument is baseless, however, because there's absolutely no reason to think that if Chinese corporations make some paradigm-shifting advancement in AI, American corporations won't quickly copy it, just as Chinese corporations VERY quickly caught up to GPT-3. Is less than 1 year of economic advantage really worth the permanent erasure of norms like copyright and privacy?
In USA? Correct (unfortunately)
I guess US, but not only. I am alluding to the whole AI thing: instead of using it as a platform to improve the human condition and not only to profit financially, we have 'Step mom' and 'Russian girl' bots built on top of 'pirated' content.
We are fucked.
And next week, "Meta argues new Law AI tool trained on books from Pirate Library is fair use."
But how was he found out? Did Telegram cooperate with Korean law enforcement?
Telegram was forced to "cooperate" with French law enforcement. Maybe the French police shared intel with their Korean counterparts.
In South Korea, educational inequality due to inability to afford educational materials has been an issue for many years.
The argument against piracy is that people have the right to be compensated for their work. Consider this argument in the context of medical services. The extreme pirate position is equivalent to expecting doctors to provide cosmetic surgery for free. The extreme industry position is equivalent to saying doctors should let patients die on the sidewalk if they can't pay.
I'm torn on the issue of sharing files, because I've watched its rise over the past 25 years and seen how tremendously it has improved educational opportunities for poorer peoples.
It is important to note that authors are not compensated when you buy access to an academic article in a journal.
In fact, even though the authors do the work of both creating the content and peer-reviewing submissions, they must also pay the journal in exchange for doing this work.
Think of it like the patient paying the hospital, the doctor paying the hospital, and in exchange the doctor can do their work at a prestigious hospital, which is all very lucrative (for the hospital).
Journal articles represent an extreme where sharing is most ethical. An opposite extreme would be something like AAA games. In the range between are things like monographs, textbooks, literature, etc.
> textbooks
Textbooks are not that distant from academic publishing in terms of the rent seeking that goes on. Textbook publishers essentially bribe academia for the right to rip off their students by providing courses with testing and academic services that are tied to a license in a new textbook that the student is required to buy.
>An opposite extreme would be something like AAA games.
There are further extremes still: sharing a game made by a solo developer whose only compensation comes from sales is less ethical than sharing a game made by a team who were paid a salary while they worked.
The problem with copyright is that it gives the owner too many other rights, which can be summed up as a right to also prevent distribution. This creates an imbalance. The owner should only be given the right to be compensated; the other rights should be stripped. For example, why not require the right to publish a book to always be a public offer that any publisher can join on the same terms?
How should such a publisher set the price?
The author sets the price. Specifically, the price the publisher -- any publisher -- pays the author to make a copy. Publishers can add a margin to this to cover distribution costs, but then they're in competition with other publishers and customers can buy from any of them.
No other contractual terms. The author only sets the price publishers pay and that price is the same for every publisher. Authors can sell directly to the public but still can't impose any additional license terms nor sell for a lower price than the one publishers pay. Individual members of the public can also make copies on their own as long as they pay the fee.
Then any publisher that wants to publish the work simply does so and pays the author the author's fee, which the author can only change once a year and only change uniformly for all publishers.
>The extreme pirate position is equivalent to expecting doctors to provide cosmetic surgery for free
This analogy is not even remotely applicable to piracy. Piracy is about content that can be shared; you can't grab a surgery and share it with someone else after buying it.
It's not even about "piracy". It's about "unauthorized copying".
Would you say the cost of copying a PDF file is similar to the cost of several hours in a surgical theatre?
Access to education and information should be free - or at the very least a lot less broken than it is today.
https://annas-archive.org/torrents
I really would love to help out, but I don't have 650TB of storage lying around :(
The logistics of this archive are quite crazy; most 2-4U JBODs I worked with hold like 24 or 45 SFF SAS disks.
Standard size (unless things have changed) for 10k sff sas disks seems to be about 1.2TB, so you'd need 544 of these to build a raidz big enough. So we're talking 12 4U jbods, well over a full rack.
I guess I can just hope some rich techie with a volcano lair / private datacenter somewhere is keeping a copy..
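The drive and shelf counts in this comment can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the 650TB target, 1.2TB drive size, 544-drive figure, and 24/45-bay 4U shelves come from the comment, while the function names and the 42U rack height are assumptions made for illustration. Parity and spares are ignored.

```python
import math

def drives_needed(target_tb: float, drive_tb: float) -> int:
    """Smallest number of drives whose raw capacity covers target_tb (no parity)."""
    return math.ceil(target_tb / drive_tb)

def chassis_needed(drives: int, bays_per_chassis: int) -> int:
    """Number of JBOD shelves required to hold the given number of drives."""
    return math.ceil(drives / bays_per_chassis)

RACK_UNITS = 42        # assumption: a common full-height rack
UNITS_PER_CHASSIS = 4  # the comment talks about 4U JBODs

print("raw minimum drives:", drives_needed(650, 1.2))

drives = 544  # the figure quoted in the comment
for bays in (24, 45):
    shelves = chassis_needed(drives, bays)
    print(f"{bays}-bay shelves: {shelves}, "
          f"{shelves * UNITS_PER_CHASSIS}U of a {RACK_UNITS}U rack")
```

With 45-bay shelves this comes out at 13 shelves (52U), close to the dozen quoted and indeed more than one 42U rack; with 24-bay shelves it is 23 shelves.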
24TB drives are quite available, $300 on newegg.
Buy a Data60 (60 disk chassis), add 60 drives. Buy a 1U server (2 for redundancy). I'd recommend 5 stripes of 11 drives (55 total) with 5 global spares. Use a RAIDz3 so 8 disks of data per 11 drives.
Total storage should be around 8 * 24 * 5 = 960TB, likely 10% less because drive sizes are marketed in powers of ten (10^12 bytes per TB) rather than powers of two (2^40). Another 10% because ZFS doesn't like to get very full. So something like 777TB usable, which easily fits 650TB.
I'd recommend a pair of 2TB NVMe with a high DWPD as a cache.
The disks will cost $18k, the Data60 is 4U, and a server to connect it is 1U. If you want more space, upgrade to 30TB ($550 each) drives or buy another Data60 full of drives.
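For anyone who wants to plug in their own numbers, the estimate in this comment can be written out as a short calculation. It is a back-of-the-envelope sketch, not a ZFS capacity calculator: the layout (5 RAIDz3 vdevs of 11 disks, 24TB drives, 60 drives at $300) is taken from the comments above, the two 10% haircuts are the comment's own rounded figures, and the function and parameter names are made up for illustration. Real pools also lose some space to metadata and padding, which is not modelled here.

```python
def usable_tb(vdevs: int, disks_per_vdev: int, parity_per_vdev: int,
              drive_tb: float, unit_haircut: float = 0.10,
              fill_headroom: float = 0.10) -> float:
    """Rough usable capacity of a pool built from identical RAIDZ vdevs.

    unit_haircut:  assumed loss from decimal (10^12) vs binary (2^40) terabytes.
    fill_headroom: assumed fraction left free so the pool never runs near full.
    """
    data_disks = vdevs * (disks_per_vdev - parity_per_vdev)
    raw_tb = data_disks * drive_tb
    return raw_tb * (1 - unit_haircut) * (1 - fill_headroom)

# Layout from the comment: 5 x 11-disk RAIDz3 (8 data disks each), 24TB drives.
estimate = usable_tb(vdevs=5, disks_per_vdev=11, parity_per_vdev=3, drive_tb=24)
disk_cost = 60 * 300  # 60 drives at the quoted $300 each

print(f"usable: about {estimate:.0f} TB, disks: ${disk_cost}")
print("fits 650TB:", estimate >= 650)
```

Changing `drive_tb`, the vdev width, or the headroom shows how quickly the margin over 650TB shrinks with smaller drives or fewer data disks.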
Why limit yourself to the SFF?
45 Drives is the company that builds the hard drive pods used by Backblaze, and they offer a 4U, 60x3.5 inch drive array. It has an advertised capacity of 1.44 PB, which would be 60 24TB drives configured without redundancy.
https://www.45drives.com/products/storage-server-storinator-...
Seagate have an Exos M in 36TB for about £450, so 24 of those could do it. Three vdevs with one parity drive each? Call the project £13k?
Not exactly a production grade setup, but it'd do the job and you'll see fewer failures each year than in 544 10k SAS drives.
> 10k sff sas disks
There's your problem. Ordinary consumer LFF SATA disks go up to 30TB-ish now, though that may not be the most cost-effective size (or it may be, when you consider the cost of the drive bays as well).
> Enter how many TBs you can help seed, and we’ll give you a list of torrents that need the most seeding! The list is somewhat random every time, so two people generating at the same time will still cover different parts of the collection.
I don't think you need 650TB!
Well I don't think you have to dedicate 650TB or nothing :P
A full rack or more sounds like a lot, but I don't know much about hardware, so I'll take your word for it.
At work I have about 1PB per full rack, and that's with plain HP/Dell servers with 12 3.5" hard disks in each.
Maximising storage isn't the purpose of this setup; much denser configurations are possible, as others have commented.
You can make it free by producing a book and then releasing it into the public domain.
No doubt many LLMs have trained on the same content, with no complaints.
I think that especially for educational, engineering, and scientific research, these kinds of “libraries” cross over firmly into the public good.
Making money on textbooks by selling them to students at premium prices is despicable and is a poverty of mind for the future. Course tuition should include necessary book access. We gain so much more as a civilisation by sharing within academia that there needs to be a change in the way that the publishing industry treats these kinds of “violations”.
It seems like the passing of linger should get more notice.
Obviously book authors need to be paid, but those at the bottom of the ladder seeking to climb it through education can't afford the books.
Perhaps some kind of program run by the publishers could give free books to these people, to reduce the motivation for piracy of the books? Those that can afford it would still pay, of course.
That said, the current system is broken, so while this particular site is gone, Z-Library and Anna's Archive live on and won't be getting taken down any time soon.
> Obviously book authors need to be paid
Do they? When the required reading is this year's version of the course professor's book, you are drowning in the BS.
> Copyright Crime Special Unit
It's amazing that people have somehow been convinced that it's sane not only to throw others in prison for copying files but also to have special police for it.
They've been getting away with it for ages. The RIAA even hired a bunch of ex-cops to act as goons, impersonate the FBI, and go on raids.
https://web.archive.org/web/20040209170528/http://www.laweek...
I swear I've seen photos of them in their RIAA jackets too, but I can't seem to find any at the moment.
Unless big tech does it I guess
That's easy: government should serve the needs of people, not companies.
It's not hypocritical to want a different set of rules for companies.
> a different set of rules for companies
That is sort of the problem we find ourselves in.
"It's not copyright infringement! I'm just using it to train my neural networks with the information!"
This quote is perfectly constructed to underline the hypocrisy considering this case is about pirating educational material. If pirating information is allowed when that neural network you are training is some LLM, why shouldn't it be allowed when that neural network is a human brain?
Human brains are not created and owned solely by companies
"fair use" for corporations.
I had the same thought. Those people could be put to so much better use, solving real problems that everyday people have.
> Those people could be put to so much better use, solving real problems that everyday people have.
You can apply that logic to a large swathe of well-paid jobs.
This is why an economic crash is not necessarily a bad thing in the long run. It shatters calcified assumptions about what labor is valuable and presents an opportunity to redefine them. We keep putting the next one off, and the people working jobs that "should" have gone away keep moving forward in their careers, gaining seniority and higher salaries doing varyingly terrible and unnecessary work.
The cyberpunk dystopia that 80s fiction authors prepared [some of] us for.
Drug and Copyright Enforcement Agency?
I always thought that one reason to go after marijuana was that it was something that one could easily "grow at home" -- i.e., make a copy. Alcohol and hard drugs typically required a lot of equipment or expertise.
..and in a dystopian future there will be an agency to remove any unauthorized knowledge (gained from pirated sources) from one's brain.
>..and in a dystopian future there will be an agency to remove any unauthorized knowledge (gained from pirated sources) from one's brain.
As a preview, something similar is going to happen to LLMs soon. Actually, with the growth of LLMs it may become unimportant what is in the actual biological brain of their user.
You would think we as a society should crack down harder on theft and robbery, but here we are subsidizing the rich through copyright laws.
>The Ministry of Culture and Sport says others involved will be tracked down and given lessons in copyright law.
This is way too soft a punishment. If people can get away with just attending a lesson, there will not be enough risk in the risk-reward analysis people do before violating others' copyrights.
Won’t someone think of the textbook publishers? They are but paupers.
Wait until you see the price of the textbook.