I’ll be interested to see benchmarks. My expectation is that accuracy will take a hit on mid or longer context prompts: I’d bet that the heavy use of JSON in fine tuning will end up impacting quality of a more terse (less reasoning space) novel encoding.
Do you mean the [0] Token Benchmarks section? I only see token count numbers.
Which doesn't address the question: do LLMs understand TOON the same as they would JSON? It's quite likely that this notation is not interpreted the same by most LLM, as they would JSON. So benchmarks on, say, data processing tasks, would be warranted.
For repeating objects of the same structure, yaml will still require each key on each object, whereas this is a hybrid with csv, so it defines the keys once.
I’ll be interested to see benchmarks. My expectation is that accuracy will take a hit on mid or longer context prompts: I’d bet that the heavy use of JSON in fine tuning will end up impacting quality of a more terse (less reasoning space) novel encoding.
That said: I like the idea!
There are a very light benchmarks in the Readme, or are you looking for more?
Do you mean the [0] Token Benchmarks section? I only see token count numbers.
Which doesn't address the question: do LLMs understand TOON the same as they would JSON? It's quite likely that this notation is not interpreted the same by most LLM, as they would JSON. So benchmarks on, say, data processing tasks, would be warranted.
[0] https://github.com/johannschopplich/toon?tab=readme-ov-file#...
I think they're talking about these sections:
1. Retrieval Accuracy - https://github.com/johannschopplich/toon?tab=readme-ov-file#...
2. Performance by dataset - https://github.com/johannschopplich/toon?tab=readme-ov-file#...
Hello, it's probably better to add leading spaces before all of the words rather than none of them
Just send compressed JSON in, duh.
I don't get it, can't you just use yaml instead of inventing another DSL.
For repeating objects of the same structure, yaml will still require each key on each object, whereas this is a hybrid with csv, so it defines the keys once.
It's more compact than YAML. More like a combination of YAML and CSV.