Regardless of this situation, I actually think that websites like Archive and TWBM should be fully transparent.
A very large portion of the citations on Wikipedia, for example, relies on them. Most pages that cite archived copies do so because the live version is no longer available, so I would like some assurance that archive.is and the like are not altering their content in any way over time.
Unironically, content-sensitive hashing of archived pages might be one of the few use cases where something like a blockchain is actually useful.
Why would the blockchain be useful here? You don’t need a blockchain to store a hash.
You do need some kind of reliable, distributed storage though. The sequential nature of a blockchain also ensures that such stored content is held no matter what by any full node.
No, just no.
A simple four-hash scheme, like BSD or Gentoo Linux use for their repositories, is more than sufficient.
No need to record who is requesting the recording, much less fetching.
It's a hash.
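For what it's worth, the multi-hash idea can be sketched in a few lines. This is only an illustration: the four algorithms, the record layout, and the names `manifest_entry` and `matches` are my own assumptions, not the actual BSD or Gentoo manifest format.

```python
import hashlib

# Assumed set of four algorithms, chosen for illustration only.
ALGORITHMS = ("sha256", "sha512", "blake2b", "sha3_256")


def manifest_entry(name: str, content: bytes) -> dict:
    """Build a record holding the size and several digests of the content."""
    entry = {"name": name, "size": len(content)}
    for algo in ALGORITHMS:
        entry[algo] = hashlib.new(algo, content).hexdigest()
    return entry


def matches(entry: dict, content: bytes) -> bool:
    """Return True only if the size and every recorded digest match."""
    if entry["size"] != len(content):
        return False
    return all(
        hashlib.new(algo, content).hexdigest() == entry[algo]
        for algo in ALGORITHMS
    )
```

Recording several digests means a weakness in any one algorithm is not enough to swap the content unnoticed. Note that the record holds nothing about who asked for it.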
You need both to generate the hash correctly at the point of archival and to store it in a way that cannot be modified later on.
Doing that with blockchain-like tech is one of the few use cases where the tech itself actually adds value.
Heck, you might be able to store the entire pages on a blockchain or in blockchain-linked storage.
The problem with these sites is that we implicitly trust them. Unlike a book or other printed media, where editing or destroying all unedited existing copies is effectively impossible, a shady actor can easily start editing archived news articles and other sites that are no longer publicly available.
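The "cannot be modified later" property is really tamper evidence, and a plain hash chain already gives that without any consensus machinery. A minimal sketch follows; the record fields, the `GENESIS` placeholder, and the function names are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(url: str, content_hash: str, prev: str) -> str:
    """Hash a record's fields together with the previous record's hash."""
    payload = json.dumps(
        {"url": url, "content_hash": content_hash, "prev": prev},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, url: str, content: bytes) -> None:
    """Append a record that commits to the content and to the prior record."""
    prev = log[-1]["hash"] if log else GENESIS
    content_hash = hashlib.sha256(content).hexdigest()
    log.append({
        "url": url,
        "content_hash": content_hash,
        "prev": prev,
        "hash": record_hash(url, content_hash, prev),
    })


def verify(log: list) -> bool:
    """Recompute every link; any edited record breaks the chain."""
    prev = GENESIS
    for rec in log:
        expected = record_hash(rec["url"], rec["content_hash"], rec["prev"])
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True
```

Whoever holds the log could still rewrite the whole chain from the edited record onward, so the latest hash has to be published somewhere the archive does not control. That is the part the distributed storage argument is really about.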
This is getting to blockchain for the sake of blockchain.
If Wikipedia recorded the hash of every referenced page you could verify that the archive.is page is unchanged.
You could certainly argue that archive.is isn’t the right place to store archives (I have no idea) but attempting to move all this to the blockchain would be very expensive.
You only need the hash of the original content. No blockchain is necessary. The problem is that there is no source for that hash except for the scraper that archives it since people don't put the hash in a hyperlink.
If you download an ISO for a Linux OS, for example, they give you the hash of the file so you can check it. They don't build an entire blockchain or whatever to validate the hash.
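The ISO-style check is about this much code. A rough sketch, assuming the recorded value is a SHA-256 hex digest taken over the exact bytes that were captured; the function names are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(archived_bytes: bytes, recorded_hex: str) -> bool:
    """Compare the archived copy against a digest recorded at citation time."""
    return sha256_hex(archived_bytes) == recorded_hex.strip().lower()


# Example: record the digest when citing, check it again later.
snapshot = b"<html>archived article</html>"
recorded = sha256_hex(snapshot)
print(is_unchanged(snapshot, recorded))
print(is_unchanged(snapshot + b" edited", recorded))
```

The catch is that an archive page as served may carry banners or rewritten links, so the hash would have to cover the captured content itself and not the wrapper around it.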
> very large portion of the citations on Wikipedia, for example, relies on them
Is the Internet Archive related to archive.is?
People who steal from Americans get pardons... archive.is gets the Feds on them.
The current administration would be a good joke if it wasn't real.
FWIW circumventing various paywalls is probably the bad thing archive.is is being investigated for, not the archiving bit.
An AdGuard employee working on their subreddit let slip that the legal order that forced them to block those domains (from their ad-blocking DNS) was a (claimed!) result of Archive.today having saved CP and refusing to delete it.
Methinks someone accidentally archived the Epstein files, and the FBI is desperately trying to scrub the unredacted backups before the archive URL becomes well-known. That alone would align somewhat with the CP claim.
Do they actually do anything to circumvent paywalls or do websites just whitelist their crawlers?
The subpoena is for archive.today.
The .is TLD is run by ISNIC, which processes registrations directly and operates out of Iceland, so it would be very strange if they took orders from the FBI.
I would bet that even when the FBI is able to track down archive.today, it will be a matter of hours until the archive is shifted to another network and reinstated. Even if a certain amount of archived data is lost, the core service will be rehosted quite fast somewhere else, I would think.
> The subpoena is supposed to be secret
Ars inventing their own colour here. This is simply not true.
>Ars inventing their own colour here. This is simply not true.
What are you talking about? Right at the top of the subpoena it literally says in bold and all caps, and I quote:
>YOU ARE REQUESTED NOT TO DISCLOSE THE EXISTENCE OF THIS SUBPOENA INDEFINITELY AS ANY SUCH DISCLOSURE COULD INTERFERE WITH AN ONGOING INVESTIGATION AND ENFORCEMENT OF THE LAW.
The "requested" part sounds like it's not mandatory, otherwise they would have used something like "you're not allowed to..." or similar language.
Sounds like a notice that disclosure could result in outcomes that make the discloser liable for (the crime of?) interfering with an investigation.
So, to avoid that risk, think twice before disclosing. Disclosure would only be punished if the specific circumstances actually amount to what the law considers unlawful interference.
The "supposed" part as well.
[dupe] https://news.ycombinator.com/item?id=45836826