Very interesting approach to showing the relationship between RLHF and AI Psychosis: taking clinical conversations and prompting the model with them seems like a grounded start. As I'm also investigating AI Psychosis, this is an approach I'd like to adopt for my own work.
I ran the data through our LLM behavioral analysis system at splabs.io, and it raised a Red alert on the RLHF-optimized output, compared to a Yellow on the no-RLHF output.
You can check out the analysis here: https://splabs.io/ai-psychosis-and-cognitive-cost
It was interesting to read just how much of a "sycophant" effect LLMs can have if they fully lean into the RLHF system.