Alright, I've made several updates based on feedback!
Cost Estimation
- Shows (very rough) character count estimate (rounded to nearest thousand)
- Displays approximate cost at $0.12 per thousand characters
- Updates dynamically as selections change
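As a rough illustration of the estimate described above, here is a minimal sketch in Python (the app itself runs in the browser, so this is not its actual code). The function name `estimate_cost` and the round-half-up choice are assumptions; only the $0.12 per thousand characters rate and the rounding to the nearest thousand come from the list above.

```python
def estimate_cost(texts, usd_per_thousand_chars=0.12):
    """Return (rounded_chars, approx_usd) for a list of comment texts.

    Hypothetical helper: the rate and the rounding granularity follow the
    description above; everything else is an assumption.
    """
    total_chars = sum(len(t) for t in texts)
    # Round to the nearest thousand (halves round up) since the figure is
    # only meant to be a very rough estimate.
    thousands = (total_chars + 500) // 1000
    approx_usd = round(thousands * usd_per_thousand_chars, 2)
    return thousands * 1000, approx_usd


if __name__ == "__main__":
    chars, usd = estimate_cost(["a" * 1400, "b" * 1200])
    print(f"~{chars} characters, about ${usd:.2f}")
```

Re-running the function whenever the selection changes would give the "updates dynamically" behaviour.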
Advanced Input Options
- Added toggle between single thread URL and top 100 stories selection
- Implemented multi-thread selection with checkboxes
- Saves input mode preference to localStorage
Comment Limit Improvements
- Changed to "All" as default with option for custom limit
- Original post no longer counts against comment limit
Quote Formatting
- Text with > is now properly recognized as quotes
- Quotes are transformed with random introduction phrases
- Adds "End of quote" with variations at the end of quoted text
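The quote handling above could look roughly like the following Python sketch. The phrase lists `INTROS` and `OUTROS` and the function `speak_quotes` are hypothetical; the app's real phrases and parsing rules are not shown in this thread.

```python
import random

# Hypothetical phrase lists; the real app's wording is not known here.
INTROS = ["Quote:", "You wrote:", "Quoting the parent comment:"]
OUTROS = ["End of quote.", "End quote."]


def speak_quotes(comment, rng=None):
    """Rewrite '>'-prefixed lines as spoken quotes with an intro and outro."""
    if rng is None:
        rng = random.Random()
    out = []
    quote = []  # consecutive quoted lines collected so far

    def flush():
        # Emit the pending quote (if any) wrapped in random phrases.
        if quote:
            out.append(f"{rng.choice(INTROS)} {' '.join(quote)} {rng.choice(OUTROS)}")
            quote.clear()

    for line in comment.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            quote.append(stripped.lstrip(">").strip())
        else:
            flush()
            if stripped:
                out.append(stripped)
    flush()
    return "\n".join(out)
```

Passing a seeded `random.Random` makes the phrase choice repeatable, which helps when comparing two renders of the same thread.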
Link Handling
- Preserves shared links in expandable section at the bottom
- Different random phrases for first, second, and multiple links
- Links open in new tabs when clicked
Voice Matching
- Matches commenter usernames to ElevenLabs voices if names match
- Falls back to deterministic assignment if no match found
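One way to get "match by name, otherwise deterministic" is to hash the username, sketched below in Python. The function `pick_voice`, the case-insensitive comparison, and the use of SHA-256 are all assumptions for illustration; the thread does not say how the app actually does the fallback.

```python
import hashlib


def pick_voice(username, voices):
    """Pick a voice id for a commenter.

    `voices` is a list of (voice_id, voice_name) pairs, e.g. as listed in the
    user's ElevenLabs account. Hypothetical helper, not the app's real code.
    """
    if not voices:
        raise ValueError("no voices available")
    # 1. Prefer a voice whose name matches the username (assumed case-insensitive).
    for voice_id, name in voices:
        if name.lower() == username.lower():
            return voice_id
    # 2. Otherwise hash the username so the same commenter always gets the
    #    same voice across runs.
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(voices)
    return voices[index][0]
```

A stable hash (rather than Python's built-in `hash`, which is salted per process) keeps the assignment the same between sessions.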
Error Handling & Recovery
- Saves progress and allows resuming after errors
- Shows "Retry" button with partial audio when errors occur
- Audio generated so far is available for download
UI Improvements
- Added tooltip with API key information
- Persistent theme preferences via localStorage
- Improved responsive design for mobile
- The filename of the generated MP3 file matches the thread title
Issues I've noticed when running it against more threads:
- don't use Legacy voices as they seem to be of much lower quality (sounds like someone is calling in from an international landline)
- when the same poster appears many times, it gets tedious to hear them restate who they are. I think after the first 3, we should recognize the voice so that's not necessary anymore
Feature requests I'll add:
- emphasize quotes better
- add audio chapter marks if possible, so it's possible to skip ahead
- attach a speaker's voice to the relevant voice in the 11Labs account if there's a voice with the same name as the username
- add sound effects if people write down sound effects in their comments (this seems tough)
Anything I'm missing?
Alright, I've made several updates based on feedback!
Cost Estimation, Advanced Input Options, Comment Limit Improvements, Quote Formatting, Link Handling, Voice Matching, Error Handling & Recovery, UI Improvements
This is cool. Any chance you can drop an example?
Here's a quick example:
First 20 comments of "John Carmack: writing Rust code feels wholesome"
Here is the rendered mp3: https://drive.google.com/file/d/1yG1mwD70ZteXtdh8Jk_sXUXS_sQ...
The thread: https://news.ycombinator.com/item?id=19126795
First 30 comments of a recent thread, "AGI is still 30 years away": https://drive.google.com/file/d/1YbgRXBv1LC3IdMl8Xb4i9y98S2T...
The thread: https://news.ycombinator.com/item?id=43719280
Given recent developments I think it might be fun to listen to this very thread as audio!
Can I upload my own voiceprint so my comments are said in my voice, or a voice of my choosing?
Can I navigate by voice commands, for example if listening while driving?
1. This should be possible, I think for example if you saved your cloned voice in your account with the same name as your HN handle. I'll add this. This should then work for using any voice for a specific user (just use the right username as the voice's name in 11Labs).
2. No navigation by voice commands sadly - it generates a single audio track. I might be able to insert chapter marks for each comment though, so that it'd be possible to "skip" to the next comment!
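For the chapter-mark idea in point 2, the timestamps themselves are straightforward to derive if the duration of each comment's audio segment is known. A minimal Python sketch under that assumption follows; `chapter_starts` and `fmt` are hypothetical names, and how the marks would be embedded in the MP3 is left out.

```python
def chapter_starts(segments):
    """Given (title, duration_seconds) per comment, return (title, start_seconds)."""
    chapters = []
    elapsed = 0.0
    for title, duration in segments:
        # Each chapter starts where the previous segment ended.
        chapters.append((title, elapsed))
        elapsed += duration
    return chapters


def fmt(seconds):
    """Format a start offset as HH:MM:SS (fractions of a second are dropped)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


if __name__ == "__main__":
    for title, start in chapter_starts([("Original post", 30.0), ("alice", 12.5)]):
        print(fmt(start), title)
```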
I don't have a voice print, can I put something in my profile to get a generic feminine voice? I don't suppose there's a pronouns field
I would think once I introduce the feature above, you could just create a "01HNNWZ0MV43FF" voice with the Voice Lab[0] inside your account (not necessarily duplicating your real voice but just using 11Labs' tool to get a feminine voice). Would that work?
[0]: https://elevenlabs.io/app/voice-lab
One big post can have a bigger reply counter-arguing every point 1b1. It would be nice if the arguments go back and forth, basically segmenting the post and the replies into multiple lines of dialog, rather than feeling like you are listening to a speech.
Wait... do you mean, quoting the original (or parent) poster in their own voice when there's a quote?
That seems less natural. I think what I can do, though, is turn quotes into actual quotes, e.g. turning
> One big post can have a bigger reply counter-arguing every point 1b1
into:
"Look; you said 'One big post can have a bigger reply counter-arguing every point 1b1'"
>Wait... do you mean, quoting the original (or parent) poster in their own voice when there's a quote?
Yeah, I think what I'm getting at is when there is a big argumentative post crossing the line from chit-chat to speech, break out of the structure of the website, let the LLM get the arguments out and connect them to the counter-arguments and turn it into a back-and-forth with shorter dialog lines, without repeating too much or one person talking for very long.
Also I agree, the LLM should be free to transform or add dialog how it sees fit so it feels more natural but always keeping it true to what is written.
In this app, the process runs entirely in the browser and has no LLM calls at all, so we don't have the ability to rewrite the conversation (other than performing regexes or other crude operations on the text of a comment, which is how links are turned into "See the link I posted in the thread").
I also think it's incredibly difficult (even with an LLM) to render properly a multi-turn multi-user conversation without sticking to the actual hierarchy of the thread. We would probably run into the "summarize the thread and lose nuance" problem again.
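To make the "regexes or other crude operations" point concrete, here is a small Python sketch of swapping URLs for a spoken phrase while keeping the original links for later display. The pattern `URL_RE` and the function `replace_links` are assumptions; only the replacement phrase comes from the comment above.

```python
import re

# Deliberately crude: anything from http(s):// up to the next whitespace.
# Trailing punctuation stuck to a URL would be swallowed too.
URL_RE = re.compile(r"https?://\S+")


def replace_links(comment, phrase="See the link I posted in the thread"):
    """Return (spoken_text, links) with each URL replaced by a spoken phrase."""
    links = URL_RE.findall(comment)
    spoken = URL_RE.sub(phrase, comment)
    return spoken, links


if __name__ == "__main__":
    spoken, links = replace_links("Docs: https://example.com/a and more")
    print(spoken)
    print(links)
```

No model call is involved, which is why this kind of transformation can run entirely client-side.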
Note: I'm particularly interested in feedback on making the conversation feel even more "natural", so that the audio is as close as possible to really listening in on a watercooler chat.
[flagged]
I'm sorry, please allow me to clear up any misunderstanding.
I am not producing any derivative work of your comments, and this is not commercial use. Here is why:
a - this is a static HTML page so the only way to generate audio from it is for YOU to provide YOUR 11Labs key to get an audio on YOUR 11Labs account that YOU can then download to YOUR computer, so technically YOU would be creating the derivative work, not me
b - how is this promoting my business?? There is no link to it anywhere here
c - did you follow me from the TikTok thread to here? what's going on?
Also, are you also issuing cease and desist letters to every web app that is using the HN api? Because there's really nothing happening here warranting such a response.
To your first point, conceded on further review, but it is unwise to host this on your corporate Github account edit: or a Github Pages account sharing a name with your corporate entity. I would take it as a sign of good faith to see that change soon, which seems a trivial burden since personal hosting for Github Pages is also free. edit: No longer necessary in my view, though I still would recommend it; not everyone is as nice as me.
To your second, see my first, above. Also, you opened your Show HN as:
> Hi HN, here's something fun to play with.
> It takes any HN thread and turns it into an audio conversation so you can listen to the thread while doing other things.
> I've seen many previous attempts to turn HN threads into podcasts, but they all shared a common issue IMO: trying to reduce the very rich back-and-forth into a single-thread single-reader boring podcast. Instead, I wanted to hear the actual debate from the actual thread!
I appreciate you believe yourself to have acted in good faith, and on further review I concede I made some hasty assumptions, based on the nature of your own representations, about the nature of this work, which led me to conclude more than appears to be warranted in fact.
I know of no Tiktok thread, but infer from your mention of same I am not the only one who has reacted in a fashion which some could see fit to consider precipitous. Please consider the possibility that this is what a launch looks like in the moment when it is only going sideways, after the rocket has ceased going up but before it begins to succumb again to gravity's inexorable pull.
> To your first point, conceded on further review, but it is unwise to host this on your corporate Github account. I would take it as a sign of good faith to see that change soon, which seems a trivial burden since personal hosting for Github Pages is also free.
It's not a "corporate GitHub account"; it's a throwaway GitHub account I created a while ago because some app required a GitHub login to give me access. I have zero other repos there, as my company uses GitLab.
> I appreciate you believe yourself to have acted in good faith, and on further review I concede I made some hasty assumptions, based on the nature of your own representations, about the nature of this work, which led me to conclude more than appears to be warranted in fact.
No worries, everyone can be mistaken from time to time, especially in a written medium.
> I know of no Tiktok thread, but infer from your mention of same I am not the only one who has reacted in a fashion which some could see fit to consider precipitous.
You replied to me just earlier today, on a thread called "TikTok is harming children at an industrial scale" where I made a cheeky comment about AI-generated YouTube content: https://news.ycombinator.com/item?id=43718657
As an aside, I believe that AI should be used to enhance creation not replace it, which is why I built a video-editing agent (to help my customers who are experts or teachers who record themselves in real videos) and why we don't do avatar-style generative video.
So I have some strong opinions about AI content (that are vastly different from other founders in the space) and enjoy debate about it. Threatening legal action might not be the most constructive tool for debate when you could find out that you're mostly in agreement with your counterpart's position... this is one of the things HN is good for.
Oh, that Tiktok thread, of course. Sorry, I inferred from your mention in the context here that you were hearing about someone else complaining on Tiktok, on similar grounds to mine.
I threatened legal action when I believed you to be creating an unlicensed derivative work of my carefully considered original words, for the commercial purpose of promoting your business which I see functions in a way technically somewhat similar. I now understand that not to be the case, and for the sake of absolute clarity, I acknowledge that the cause for action I initially suspected is not in fact present, nor any other of which I'm aware, and I intend no further action of the sort I earlier described.
Sorry to blow you up over it. The folks who understand me to be "unhinged" don't know what it actually looks like to see someone care about something in a way that requires courage, and I don't blame them for failing to recognize it, but I'm not so small a person I can't concede I overreacted somewhat here.
That said, my voice - the way I use words, in text and in speech - has been the work of a lifetime, and to see that processed into AI-generated garbage may possibly be nothing I can eventually prevent but certainly is nothing I would intend to take lying down.
I'm sure someone here will take these words as reason to use your tool to generate something they can try to bother me with, and I hereby disclaim any intent of pursuing action against you, Sébastien, or OneTake, on that basis, unless I come to believe at some future time that you were substantially involved. (I'm sure I won't!) I don't expect to be particularly bothered in such event, but do intend to see someone else be, and I would advise anyone against the attempt.
I'm glad we had the opportunity to clear that up, and I look forward to following your further doings with the same favorable interest that's attended on our other conversation, and I hope now to some extent for both of us even at the end of this one, today.
Apologies accepted, no hard feelings, glad we've cleared the misunderstanding :-)
Likewise! Enjoy the balance of your day, hopefully much more than we both have the last hour or so :D
> conceded on further review
> I would take it as a sign of good faith to see that change soon
Who are you, the district attorney around these parts?
Have you never seen this kind of conversation before?
edit: Oh, sorry, never mind, I see the 'Extension' now.
One moment, please, while I review in more detail.
The real question is why is AI making you get so angry over trivial issues?
When you have nothing better to do, you do this.
It's nice to be on sabbatical, yes. I see the website mentioned in your profile, as the one in mine, is also broken. Perhaps we both have better ways to spend our time.
Mine is less broken than yours.
Is it? From here at least, multiplicity.studio does not presently resolve.
I will consult the manual and make the necessary adjustments.
[flagged]
[flagged]
> By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed.
You may have issued such a license...
Though without an explicit sublicense from Y Combinator, they may have issues with this application:
> Except as expressly authorized by Y Combinator, you agree not to modify, copy, frame, scrape, rent, lease, loan, sell, distribute or create derivative works based on the Site or the Site Content, in whole or in part, except that the foregoing does not apply to your own User Content (as defined below) that you legally upload to the Site.
https://www.ycombinator.com/legal/#tou
This is simply calling the official YC API[0] from the end user's browser, so any user is basically doing the equivalent of clicking the Text-to-Speech accessibility button in their browser, really (albeit with better voices).
[0]: https://github.com/HackerNews/API
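For readers unfamiliar with that API, a thread is read by fetching one item and then following its `kids` ids recursively. The Python sketch below illustrates the idea (the app does this in the browser, not in Python); `fetch_item` and `walk_thread` are hypothetical names, while the item endpoint and the `by`, `text`, `kids`, `deleted` and `dead` fields are the ones documented in the HackerNews/API repository.

```python
import json
from urllib.request import urlopen

ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{id}.json"


def fetch_item(item_id):
    """Fetch one story or comment from the public HN API."""
    with urlopen(ITEM_URL.format(id=item_id)) as resp:
        return json.load(resp)


def walk_thread(item_id, fetch=fetch_item, depth=0):
    """Yield (depth, author, text) for an item and its replies, depth-first."""
    item = fetch(item_id)
    if not item:
        return
    # Skip deleted or dead items themselves, but still visit their replies.
    if not (item.get("deleted") or item.get("dead")):
        yield depth, item.get("by"), item.get("text", "")
    for kid_id in item.get("kids", []):
        yield from walk_thread(kid_id, fetch, depth + 1)
```

The `fetch` parameter is injectable so the traversal can be tried against canned data without any network access.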
Oh for goodness sakes. This is why we can't have nice things.
I don't know if you're legally in the right or not, but if you are, it's only in the sense that media companies are in the right when they DMCA a child's fan art drawing of one of their characters.
Please stop.
There is an extremely material difference between Disney DMCAing a good-faith fan artist and me, one unemployed person without so much as a sole-owner LLC to my name, demanding that my words not be turned to some other bastard's moneymaking purpose.
I understand, as I have clarified elsewhere and for which I owe you no accounting of any kind, that is not happening in this case. I do not apologize, to you, for any particular of my reaction.
Please stop.
[flagged]
[flagged]
Because anonymous facts are less valuable than those from an identified individual?
[flagged]
Please go be nasty and uncivil elsewhere.
[flagged]
It seems that in the generated audio, the number of comments is off by one. It is missing 1 comment.
I think it counts the original post as a comment, so the total shown is (original post plus number of comments). Is it actually missing one comment in your audio? Which one? First or last?
The last one. I did https://news.ycombinator.com/item?id=43552385 and entered 26 comments.
Ah! You don't need to enter the exact number of comments in this field, you can leave it at 100.
Entering a max of "26" manually is what created the off-by-one error, I think, because of the original post being counted as a comment.
But yeah, I'll fix that.
If I leave the max at 100, then I get every comment (original post + all 26 comments), here's the output audio: https://drive.google.com/file/d/1fIis8yQn-YuOmJwq1J4cLtthQV0...
Update: I fixed it. The parent post is no longer counting towards the limit.
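The off-by-one described in this exchange comes from letting the original post use up one slot of the limit. A minimal Python sketch of the fixed behaviour, assuming the thread is a flat list with the original post first (`apply_comment_limit` is a hypothetical name, not the app's code):

```python
def apply_comment_limit(items, limit=None):
    """Keep the original post plus at most `limit` comments.

    `items[0]` is assumed to be the original post; `limit=None` means "All".
    """
    if not items:
        return []
    post, comments = items[0], items[1:]
    if limit is not None:
        # The limit applies to comments only, so the post never uses a slot.
        comments = comments[:limit]
    return [post] + comments


if __name__ == "__main__":
    thread = ["post"] + [f"c{i}" for i in range(1, 27)]
    print(len(apply_comment_limit(thread, 26)))
```

With the buggy variant (`items[:limit]`), entering 26 on a 26-comment thread would return the post plus only 25 comments, which matches the missing last comment reported above.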
This is pretty good. I might listen to this as an alternative to a podcast.
Maybe publish it as a podcast.
Thank you!
I have no plans to publish as a podcast (if I was going to go through all the trouble to put a podcast together, it would be an actual podcast for my startup, not for a hobby project!) but I'd love it if someone did it!
Oh nice cool water. It's a bit muddy looking? Is it safe to drink?
Continue straight for eleven thousand miles, then turn lreft
rips hair out
This sounds painful! I think I'll add a feature so 11Labs generates sound effects for comments like this, so they can be enjoyed in their full glory
[flagged]
Hmmm this personal non-profit project is not endorsed or owned by my company at all, so you should rather reach me individually if you have comments regarding it, but otherwise if you have any official communication to send to OneTake Pte Ltd (the company), then yes, that's a valid email.
May I ask why though? It seems oddly off-topic.
[flagged]
How can he know which is your real account though? I guess you’ll have to post under your real name
When you post on HN it becomes public data. Are you going to sue antirez too about his post earlier today on word similarity analysis?
> When you post on HN it becomes public data.
Sure but the public data is still protected by copyright laws. That's why journalists cannot reproduce your blog post verbatim or take your photos and use them in their articles without your explicit permission unless you've released your articles or photos under a Creative Commons license.
Don't get me wrong! I do think that this whole thread is a massive overreaction. A judge might even see this use of data as "fair use". I don't know. I'm not a lawyer. But I do know that "public data" != "can do whatever you want with it".
What's the difference between a screen reader and this guy's Show HN script? If he were releasing it as a paid podcast feed or something I might take offense, but it seems just like a screen reader that blind people might use.
Just curious!
What a completely unhinged knee jerk overreaction. Maybe take a nice walk.
Are you planning to send them a cease and desist for having AI read your comments?
I intend exactly that: https://news.ycombinator.com/item?id=43721405 edit: See above; no longer a concern.
I see I'm rate limited, which is reasonable at this time; I'll follow up via email tomorrow or so. Sorry for necessitating moderation.
Even if it had worked the way you thought it did, that’s absolutely bizarre.
To assert the property right in one's own likeness is bizarre? In what way?