For those first discovering the PSL, a brief review.
There are ~90 prior comments concentrated mostly in two prior submissions from 2016 and 2021 so far: https://news.ycombinator.com/from?site=publicsuffix.org
This is the top comment on the 2021 discussion:
> Before you begin to make use of the PSL, consider some of its problems: https://github.com/sleevi/psl-problems
There are another couple dozen comments on a few submissions of that: https://news.ycombinator.com/from?site=github.com/sleevi
HN frequently suggests that DNS should be used to solve this; sleevi replied a few years back with:
> This has been a common suggestion since before the Public Suffix List existed, as you can see from the linked issues in the text (and the references to the IETF DBOUND WG). Like most things, on first glance, it seems like it does make sense. Except it has a lot of issues, which you can see have been discussed for 15 years without resolution, even though yes, it would scale better.
The Public Suffix List changes often. I once worked with a team that built a major feature on top of the PSL, but the person who built it did not consider at all how it would handle changes to the list. Basically, the feature analyzed domains, used PSL data to extract the "important part" of each domain, and then stored that in the database as part of a primary key in a table. But whenever the PSL changed, the database had to be taken offline so that certain tables could be completely rebuilt, and code querying the database had to be updated in lockstep with the database changes. This design made zero-downtime deployments difficult. It then took quite a while for the team to evolve the schema so that the database contents no longer depended on the PSL.
This is just one cautionary tale I have personally experienced.
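One way out of that trap is to persist only the raw hostname, which never changes, and derive the PSL-based grouping at read time. A minimal sketch, assuming two hypothetical rule snapshots (`OLD_RULES`, `NEW_RULES`) and a simplified matcher that ignores wildcard and exception rules; it only illustrates how a derived key drifts when the list changes, not how that team fixed their schema.

```python
# Hypothetical snapshots of the PSL before and after "github.io" was added.
OLD_RULES = {"com", "io"}
NEW_RULES = {"com", "io", "github.io"}


def registrable_domain(host: str, rules: set[str]) -> str:
    """Public suffix plus one label (simplified: no wildcard or exception rules)."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2 or ".".join(labels) in rules:
        raise ValueError(f"{host!r} is itself a public suffix")
    # labels[i:] shrinks as i grows, so the first hit is the longest matching suffix.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in rules:
            return ".".join(labels[i - 1:])
    # No rule matched: the implicit "*" rule makes the last label the suffix.
    return ".".join(labels[-2:])


def group_by_site(hosts: list[str], rules: set[str]) -> dict[str, list[str]]:
    """Group stored hostnames by registrable domain under one rule snapshot."""
    groups: dict[str, list[str]] = {}
    for host in hosts:
        groups.setdefault(registrable_domain(host, rules), []).append(host)
    return groups


# The table stores only raw hostnames; the grouping is recomputed on demand.
stored_hosts = ["alice.github.io", "eve.github.io", "www.example.com"]
print(group_by_site(stored_hosts, OLD_RULES))
# {'github.io': ['alice.github.io', 'eve.github.io'], 'example.com': ['www.example.com']}
print(group_by_site(stored_hosts, NEW_RULES))
# {'alice.github.io': ['alice.github.io'], 'eve.github.io': ['eve.github.io'], 'example.com': ['www.example.com']}
```

Had rows been keyed on the first result, the newer snapshot would split one key into two, which is exactly the rebuild-the-table situation described above.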
It's also full of non-ICANN extensions, so a naive implementation will identify "github.io" as a TLD. There are lots of nuances to working with this list. Our team now has a pretty robust internal (Python) library that we hope to open source soon.
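The real `public_suffix_list.dat` separates registry-operated (ICANN) suffixes from privately submitted ones such as github.io using `// ===BEGIN ICANN DOMAINS===` and `// ===BEGIN PRIVATE DOMAINS===` comment markers. A sketch of a parser that keeps the two sets apart so each caller can choose which applies; the sample text is a tiny hypothetical excerpt, and IDN handling is omitted.

```python
SAMPLE_PSL = """\
// ===BEGIN ICANN DOMAINS===
com
io
// ===END ICANN DOMAINS===

// ===BEGIN PRIVATE DOMAINS===
// GitHub, Inc.
github.io
// ===END PRIVATE DOMAINS===
"""


def parse_psl(text: str) -> dict[str, set[str]]:
    """Split PSL rules into the ICANN and PRIVATE sections."""
    sections: dict[str, set[str]] = {"ICANN": set(), "PRIVATE": set()}
    current = None  # name of the section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("// ===BEGIN ") and line.endswith(" DOMAINS==="):
            current = line[len("// ===BEGIN "):-len(" DOMAINS===")]
            continue
        if line.startswith("// ===END "):
            current = None
            continue
        # Skip blank lines, comments, and anything outside a known section.
        if not line or line.startswith("//") or current not in sections:
            continue
        # A rule is the first whitespace-delimited token on the line.
        sections[current].add(line.split()[0].lower())
    return sections


rules = parse_psl(SAMPLE_PSL)
icann_only = rules["ICANN"]                     # registry-operated suffixes only
everything = rules["ICANN"] | rules["PRIVATE"]  # includes github.io and friends
print("github.io" in icann_only, "github.io" in everything)  # False True
```

Code that scopes cookies or other per-site data generally needs both sections; something that asks "what did this customer register with a registrar?" might reasonably load only the ICANN set.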
The whole point of PSL is to identify "github.io" as a TLD. Anyone can create a subdomain of it. Just like anyone can create a new subdomain of "com" (a real TLD).
The difference is that you don't register a domain under github.io; you are merely lent one. Some countries, like Poland, have a bunch that are real domain suffixes:
https://www.dns.pl/en/list_of_functional_domain_names
Loaning or renting (registering) amount to the same thing for the purposes of the public suffix list: because the *public* can create entries under github.io, you cannot assume that alice.github.io and eve.github.io are controlled by the same entity, so you should not share alice.github.io's data (e.g. cookies) with eve.github.io.
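In code, that rule roughly becomes two checks: two hosts are the same "site" only if their registrable domains match, and a host must not scope a cookie to a public suffix. A simplified sketch, assuming a hypothetical three-rule subset; real browsers also handle wildcard and exception rules, schemes, and the remaining details of RFC 6265 cookie processing (for example, RFC 6265 turns a Domain attribute equal to the request host into a host-only cookie rather than always rejecting it).

```python
from typing import Optional

RULES = {"com", "io", "github.io"}  # hypothetical subset of the list


def registrable_domain(host: str, rules: set[str]) -> Optional[str]:
    """Public suffix plus one label, or None if host is itself a suffix."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2 or ".".join(labels) in rules:
        return None
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in rules:
            return ".".join(labels[i - 1:])
    return ".".join(labels[-2:])  # implicit "*" rule


def same_site(a: str, b: str, rules: set[str]) -> bool:
    """True if both hosts share a registrable domain."""
    site_a = registrable_domain(a, rules)
    return site_a is not None and site_a == registrable_domain(b, rules)


def cookie_domain_allowed(host: str, domain_attr: str, rules: set[str]) -> bool:
    """Roughly: may `host` set a cookie with Domain=domain_attr?"""
    host = host.lower().rstrip(".")
    domain = domain_attr.lower().lstrip(".").rstrip(".")  # a leading dot is ignored
    if domain in rules:
        return False  # never scope a cookie to a public suffix
    return host == domain or host.endswith("." + domain)


print(same_site("alice.github.io", "eve.github.io", RULES))          # False
print(cookie_domain_allowed("alice.github.io", "github.io", RULES))  # False
```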
This list sees a lot more updates than you'd probably think: https://github.com/publicsuffix/list/commits/main/
I was looking at this while trying to keep an app up to date, and there was a lot more churn than I expected. If you have a security reason to be reading this, you may need to put some effort into maintaining it... at least, technically. I doubt there's an app out there "properly" keeping up with it, and the world seems to largely hold together even so.
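A minimal sketch of keeping a cached copy fresh: check the file's age, and when it is stale, download the list and swap it in atomically so a bad response never replaces a good copy. The download URL is the list's published location; the one-day `max_age`, the marker-based sanity check, and the function names are assumptions for illustration, and publicsuffix.org asks automated consumers not to poll too often.

```python
import os
import tempfile
import time
import urllib.request
from typing import Callable, Optional

PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"
ONE_DAY = 24 * 60 * 60  # assumed refresh interval, in seconds


def fetch_psl(url: str = PSL_URL) -> bytes:
    """Download the current list (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def needs_refresh(path: str, max_age: float = ONE_DAY, now: Optional[float] = None) -> bool:
    """True if the cached copy is missing or older than max_age seconds."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > max_age


def refresh(path: str, fetch: Callable[[], bytes] = fetch_psl) -> None:
    """Replace the cached list atomically; keep the old copy if the download looks wrong."""
    data = fetch()
    # Assumed sanity check: reject obviously wrong responses such as HTML error pages.
    if b"===BEGIN ICANN DOMAINS===" not in data:
        raise ValueError("downloaded data does not look like the PSL")
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```

An app would call `needs_refresh` at startup or on a timer, call `refresh` when it returns True, and then reload its in-memory rules; `fetch` is injectable so the logic can be tested offline.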
TIL! Guess I have to do a `go get -u golang.org/x/net/publicsuffix` now.
What sort of backwards system is this? Why is this not in DNS? Just drop an RFC that says how to add a trust demarcation record already. Here is how I would do it:
TXT v=ps1 ;trust boundary at this point
TXT v=ps2 exception1.my.network. ; trust boundary with exceptions at this point
And then let the big operators argue for a few years about why this is insufficient and we need a complicated DSL (cough SPF cough) in v=ps3, and about what to do when both ps1 and ps2 entries exist (confused operator: ignore the exceptions).
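To make the proposal concrete, here is a sketch of how a client might consume such records: walk up from the hostname, treat the nearest name publishing a `v=ps1` or `v=ps2` TXT record as the boundary, and take one label below it as the site. The record syntax is the commenter's hypothetical, not a standard; the lookup is a dictionary stand-in for real DNS queries; and reading a `v=ps2` exception as "this subtree is not split off at this level" is one possible interpretation. When both records exist, `v=ps1` wins, matching the "ignore exceptions" suggestion.

```python
from typing import Callable, Iterable, Optional

# Stand-in for a DNS TXT lookup: name -> TXT strings. A real client would query DNS.
TxtLookup = Callable[[str], Iterable[str]]


def site_for(host: str, lookup: TxtLookup) -> Optional[str]:
    """Return the name one label below the closest trust boundary above host."""
    labels = host.lower().rstrip(".").split(".")
    fqdn = ".".join(labels)
    # Walk from the immediate parent toward the root; the nearest boundary wins.
    for i in range(1, len(labels)):
        name = ".".join(labels[i:])
        for txt in lookup(name):
            fields = txt.split()
            if not fields:
                continue
            if fields[0] == "v=ps1":
                return ".".join(labels[i - 1:])
            if fields[0] == "v=ps2":
                exceptions = {f.lower().rstrip(".") for f in fields[1:]}
                # An excepted child (or anything below it) is not split off here;
                # a v=ps1 record at the same name is still honored on a later pass.
                if not any(fqdn == e or fqdn.endswith("." + e) for e in exceptions):
                    return ".".join(labels[i - 1:])
    return None  # no boundary published anywhere above host


# Hypothetical zone data.
ZONE = {
    "com": ["v=ps1"],
    "network": ["v=ps1"],
    "my.network": ["v=ps2 exception1.my.network."],
}
lookup = lambda name: ZONE.get(name, [])
print(site_for("www.example.com", lookup))          # example.com
print(site_for("x.exception1.my.network", lookup))  # my.network
```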
It says this is a project of Mozilla, but it seems like something that would make sense under IANA. Is there a reason why it is not maintained by a standards organization? Maybe the definition of what is/isn't a public suffix is too fuzzy to standardize?
edit: After reading https://github.com/sleevi/psl-problems maybe the standards organizations just don't think it's a good idea
Does HN use the PSL to decide how to display the domains attached to submissions?
It was a manual list in 2023, it may still be:
https://news.ycombinator.com/item?id=35884437#35894287
I worked on a DNS resolver that detects DNS exfiltration in part by using this list to aggregate high-entropy subdomains up to the first level below the TLD. And indeed, I didn’t account for the list updating frequently and need to fix that.
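A sketch of that aggregation step: measure the Shannon entropy of everything to the left of the first level below the public suffix, and count high-entropy queries per aggregation key. The rule subset, the 3.5 bits-per-character threshold, and `min_hits` are illustrative assumptions rather than tuned values, and a real detector would combine this with other signals.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable

RULES = {"com", "uk", "co.uk"}  # hypothetical subset of the list


def aggregation_key(name: str, rules: set[str]) -> str:
    """First level below the public suffix (simplified: no wildcard/exception rules)."""
    labels = name.split(".")
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in rules:
            return ".".join(labels[i - 1:])
    return ".".join(labels[-2:])  # implicit "*" rule


def shannon_entropy(s: str) -> float:
    """Bits per character of the empirical character distribution of s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def flag_exfiltration(queries: Iterable[str], rules: set[str],
                      threshold: float = 3.5, min_hits: int = 3) -> set[str]:
    """Return aggregation keys that received at least min_hits high-entropy queries."""
    hits: dict[str, int] = defaultdict(int)
    for q in queries:
        q = q.lower().rstrip(".")
        key = aggregation_key(q, rules)
        prefix = q[: len(q) - len(key)].rstrip(".")
        # Dots are label separators, not payload, so drop them before measuring.
        if shannon_entropy(prefix.replace(".", "")) >= threshold:
            hits[key] += 1
    return {key for key, n in hits.items() if n >= min_hits}


queries = [
    "a1b2c3d4e5f6g7h8.evil.com", "q9w8e7r6t5y4u3i2.evil.com", "z1x2c3v4b5n6m7l8.evil.com",
    "www.example.co.uk", "mail.example.co.uk", "api.example.co.uk",
]
print(flag_exfiltration(queries, RULES))  # {'evil.com'}
```

The PSL matters here because without the `co.uk` rule, unrelated high-entropy traffic to `foo.co.uk` and `bar.co.uk` would be pooled under `co.uk`; and when the list gains or loses a rule, the keys shift, which is why the list's update frequency matters.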
I only became passively aware of this because Let's Encrypt uses the PSL to determine the registered domain for its rate limits. I've been meaning to set up a dyndns service for a few of my domains and need to get them on the PSL so users can get HTTPS certificates without running into those limits.
Edit: I still think that domains hosted by major dyndns services (like freedns.afraid.org) should be treated as public suffixes.
I'm surprised most of the free dyndns domains aren't in there already. The first time I learned about the list was when Let's Encrypt was in closed beta, and they already had a warning on the site telling people not to add their own domain as a means to circumvent registration limits for ACME certs.
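Why a PSL entry matters to a dyndns provider can be shown with a rough model of a per-registered-domain certificate limit: each certificate is charged to every registered domain it covers, so without an entry all customers share one bucket. This sketch ignores the rolling time window and Let's Encrypt's other limits; the provider name `dyn-example.net` is hypothetical, and the limit of 50 is an illustrative value, so check Let's Encrypt's rate-limit documentation for current numbers.

```python
from collections import Counter


def registered_domain(host: str, rules: set[str]) -> str:
    """Public suffix plus one label (simplified: no wildcard/exception rules)."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in rules:
            return ".".join(labels[i - 1:])
    return ".".join(labels[-2:])  # implicit "*" rule


def certs_per_registered_domain(certs: list[list[str]], rules: set[str]) -> Counter:
    """Count certificates per registered domain; each cert is a list of hostnames."""
    counts: Counter = Counter()
    for names in certs:
        # A certificate counts once per distinct registered domain it covers.
        for domain in {registered_domain(n, rules) for n in names}:
            counts[domain] += 1
    return counts


def exhausted(certs: list[list[str]], rules: set[str], limit: int = 50) -> set[str]:
    """Registered domains whose bucket is already full."""
    return {d for d, n in certs_per_registered_domain(certs, rules).items() if n >= limit}


# 60 customers of a hypothetical dyndns provider, one certificate each.
issued = [[f"user{i}.dyn-example.net"] for i in range(60)]
print(exhausted(issued, {"net"}))                     # {'dyn-example.net'}
print(exhausted(issued, {"net", "dyn-example.net"}))  # set()
```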
Re. afraid.org, there's good discussion in the ticket tracker explaining why that hasn't happened. Anyone is free to try to convince the domain owners, but the domain owner needs to approve the addition.
https://github.com/publicsuffix/list/issues/271#issuecomment...
Story time!
I came across the PSL when a state government department contacted my consultancy and asked what the impact would be of uncommenting a line in the PSL. They were focused on the effect this would have on DMARC and SPF records of child agencies under the parent TLD, but I realised that it also meant that cookies that could previously be shared across agency boundaries would suddenly be siloed at a different level, potentially breaking web apps. (Think authentication portals using shared cookies across a bunch of things.)
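A quick static check, complementary to the browser-build approach and not what was actually done here, can list which hostname pairs would stop being same-site (and so stop sharing cookies scoped to the parent) if a rule were uncommented. The `state.gov.example` names are hypothetical stand-ins for the real agencies, and the matcher is simplified (no wildcard or exception rules).

```python
from itertools import combinations


def registrable_domain(host: str, rules: set[str]) -> str:
    """Public suffix plus one label (simplified: no wildcard/exception rules)."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in rules:
            return ".".join(labels[i - 1:])
    return ".".join(labels[-2:])  # implicit "*" rule


def changed_pairs(hosts: list[str], before: set[str], after: set[str]) -> set[tuple[str, str]]:
    """Host pairs whose same-site status differs between two rule sets."""
    changed = set()
    for a, b in combinations(sorted(hosts), 2):
        was = registrable_domain(a, before) == registrable_domain(b, before)
        now = registrable_domain(a, after) == registrable_domain(b, after)
        if was != now:
            changed.add((a, b))
    return changed


BEFORE = {"example", "gov.example"}     # "state.gov.example" still commented out
AFTER = BEFORE | {"state.gov.example"}  # the line uncommented
agencies = ["portal.state.gov.example", "health.state.gov.example", "transport.state.gov.example"]
for pair in sorted(changed_pairs(agencies, BEFORE, AFTER)):
    print(pair)
```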
But how to test this!?
I discovered that the PSL is embedded in browser executables when they’re compiled. So I came up with the approach of making two Chromium builds, one with the PSL change and one without it. Since Chromium has a nice blue icon, I changed the modified build to have a red icon. I called these the “red pill” and “blue pill” versions.
The idea was that web devs could test their sites with the two nearly identical browsers side-by-side, so any observed difference is a sign of a potential issue. I also used Playwright to scan over ten thousand public URLs with both and compared the traces programmatically.
Another trick I used was to spin up spot priced “HPC” instances in Azure with 120 AMD EPYC cores to run the builds.
One of the most fun projects I’ve ever worked on.
No, they never changed the PSL; it’s still incorrect.
I only found one site that has an issue, but that made them too nervous and they gave up…
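The comparison step of the Playwright scan in the story could look something like this. The sketch assumes each build's run produced, per URL, the cookie list returned by Playwright's `BrowserContext.cookies()` (dicts with `name`, `domain`, and `path` keys among others) and reports URLs where the two builds ended up with different cookies; the story doesn't say what the traces actually contained, so this data layout is an assumption.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

CookieKey = Tuple[str, str, str]  # (name, domain, path)


def summarize(cookies: List[dict]) -> FrozenSet[CookieKey]:
    """Reduce cookie dicts to comparable (name, domain, path) keys."""
    return frozenset((c["name"], c["domain"], c["path"]) for c in cookies)


def diff_runs(blue: Dict[str, List[dict]],
              red: Dict[str, List[dict]]) -> Dict[str, Tuple[Set[CookieKey], Set[CookieKey]]]:
    """For URLs visited by both builds, report cookies seen only in one of them."""
    report = {}
    for url in sorted(blue.keys() & red.keys()):
        b, r = summarize(blue[url]), summarize(red[url])
        if b != r:
            report[url] = (set(b - r), set(r - b))  # (blue only, red only)
    return report


# Hypothetical results: the red build drops a cookie scoped to the parent domain.
blue = {
    "https://portal.state.gov.example/": [{"name": "sid", "domain": ".state.gov.example", "path": "/"}],
    "https://ok.example/": [{"name": "a", "domain": "ok.example", "path": "/"}],
}
red = {
    "https://portal.state.gov.example/": [],
    "https://ok.example/": [{"name": "a", "domain": "ok.example", "path": "/"}],
}
print(diff_runs(blue, red))
```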
Why "suffix"? Aren't they technically domains?
They can happen at multiple levels of the hierarchy
That just means it is not limited to "top-level" domains. example.foo.com is a domain, as are foo.com and com.
This feels like you've accidentally waxed pedantic a bit. In common parlance, com is a TLD, example.com is a domain, foo.example.com is a subdomain. The suffix list is designed to capture all of that and maps to how it's used (you take the suffix list and check if anything in it is a suffix match for the name you've been given).
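The "suffix match" can be made concrete with the matching algorithm publicsuffix.org documents: a normal rule matches the host's trailing labels, `*` matches any single label, `!` marks an exception, the matching rule with the most labels prevails, a matching exception beats everything (its public suffix is the rule minus its leftmost label), and an unlisted TLD falls back to an implicit `*` rule. The registrable domain is the public suffix plus one more label. A linear-scan sketch over a handful of rules; real libraries use faster lookups and also handle internationalized names.

```python
from typing import Iterable, List, Optional


def _rule_matches(rule: List[str], labels: List[str]) -> bool:
    """A rule matches if it equals the host's rightmost labels ("*" matches any one label)."""
    if len(rule) > len(labels):
        return False
    tail = labels[len(labels) - len(rule):]
    return all(r == "*" or r == h for r, h in zip(rule, tail))


def public_suffix(host: str, rules: Iterable[str]) -> str:
    labels = host.lower().rstrip(".").split(".")
    prevailing = ["*"]  # implicit default rule: the last label is a public suffix
    exception: Optional[List[str]] = None
    for rule in rules:
        if rule.startswith("!"):
            rule_labels = rule[1:].split(".")
            if _rule_matches(rule_labels, labels):
                exception = rule_labels
        else:
            rule_labels = rule.split(".")
            if _rule_matches(rule_labels, labels) and len(rule_labels) > len(prevailing):
                prevailing = rule_labels
    # An exception rule wins; its public suffix drops the rule's leftmost label.
    n = len(exception) - 1 if exception is not None else len(prevailing)
    return ".".join(labels[len(labels) - n:])


def registrable_domain(host: str, rules: Iterable[str]) -> Optional[str]:
    """Public suffix plus one label, or None if host is itself a public suffix."""
    labels = host.lower().rstrip(".").split(".")
    suffix_len = len(public_suffix(host, rules).split("."))
    if suffix_len >= len(labels):
        return None
    return ".".join(labels[len(labels) - suffix_len - 1:])


RULES = ["com", "uk", "co.uk", "*.ck", "!www.ck"]  # a few entries, for illustration
print(registrable_domain("www.example.co.uk", RULES))  # example.co.uk
print(public_suffix("foo.bar.ck", RULES))             # bar.ck
print(registrable_domain("www.ck", RULES))            # www.ck
```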
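I always thought:
- com, example.com, foo.example.com are all domains
- com is a TLD
- subdomain is a relative term, not an absolute one:
  - example.com is a subdomain of com
  - foo.example.com is a subdomain of example.com
  - bar.foo.example.com is a subdomain of foo.example.com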
Yup, you’re correct. But in common usage, it would be weird to refer to example.com as a subdomain. Depending on the context, it would also be weird to refer to foo.example.com as a domain instead of a subdomain.
If somebody asked me what domain you’re using and you said “com”, you would technically have answered accurately but they’d be confused.
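The relative sense of "subdomain" maps onto a simple label-aware check. A minimal sketch with a hypothetical function name; the important detail is the leading dot, which keeps `badexample.com` from counting as a subdomain of `example.com`.

```python
def is_subdomain(child: str, parent: str) -> bool:
    """True if child sits strictly below parent in the DNS tree."""
    c = child.lower().rstrip(".")
    p = parent.lower().rstrip(".")
    # Compare whole labels, not raw strings.
    return c != p and c.endswith("." + p)


print(is_subdomain("example.com", "com"))                      # True
print(is_subdomain("bar.foo.example.com", "foo.example.com"))  # True
print(is_subdomain("badexample.com", "example.com"))           # False
```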