Toon – Token Oriented Object Notation

21 points | by royosherove 21 hours ago

9 comments

vessenes 9 hours ago
I’ll be interested to see benchmarks. My expectation is that accuracy will take a hit on mid-length or longer-context prompts: I’d bet that the heavy use of JSON in fine-tuning will end up hurting the quality of a terser (less reasoning space) novel encoding.
That said: I like the idea!
- brian-bk 8 hours ago
  There are some very light benchmarks in the README, or are you looking for more?
  - Mumps 8 hours ago
    Do you mean the Token Benchmarks section [0]? I only see token count numbers.
    That doesn't address the question: do LLMs understand TOON as well as they understand JSON? It's quite likely that most LLMs don't interpret this notation the way they would JSON. So benchmarks on, say, data processing tasks would be warranted.
    [0] https://github.com/johannschopplich/toon?tab=readme-ov-file#...
    - tujux 7 hours ago
      I think they're talking about these sections:
      1. Retrieval Accuracy - https://github.com/johannschopplich/toon?tab=readme-ov-file#...
      2. Performance by dataset - https://github.com/johannschopplich/toon?tab=readme-ov-file#...
anonymoushn 18 hours ago
Hello, it's probably better to add leading spaces before all of the words rather than none of them
moralestapia 7 hours ago
Just send compressed JSON in, duh.
meander_water 10 hours ago
I don't get it, can't you just use YAML instead of inventing another DSL?
- jscheel 4 hours ago
  For repeated objects of the same structure, YAML still requires every key on every object, whereas this is a hybrid with CSV, so it defines the keys once.
- mhosayny 8 hours ago
  It's more compact than YAML. More like a combination of YAML and CSV.
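
To make the "keys once" point from the replies above concrete, here is a minimal sketch comparing the two layouts. The `name[count]{keys}:` header is only loosely modeled on the TOON README and is an assumption here, as are the helper names `encode_table` and `encode_yaml_like`; neither function is the project's actual encoder, and values are not quoted or escaped.

```python
def encode_table(name, rows):
    """Encode same-shaped dicts as one header line plus CSV-like rows.

    Hypothetical helper: the header syntax is an assumption, not the TOON spec.
    """
    if not rows:
        return f"{name}[0]:"
    keys = list(rows[0].keys())
    # This sketch only handles uniform rows; mixed shapes would need a fallback.
    if any(list(r.keys()) != keys for r in rows):
        raise ValueError("all rows must share the same keys in the same order")
    # Keys appear exactly once, in the header.
    lines = [f"{name}[{len(rows)}]{{{','.join(keys)}}}:"]
    for r in rows:
        # No quoting or escaping: values containing commas would break this.
        lines.append("  " + ",".join(str(r[k]) for k in keys))
    return "\n".join(lines)


def encode_yaml_like(name, rows):
    """Encode the same data YAML-style, repeating every key on every object."""
    lines = [f"{name}:"]
    for r in rows:
        for i, (k, v) in enumerate(r.items()):
            prefix = "  - " if i == 0 else "    "
            lines.append(f"{prefix}{k}: {v}")
    return "\n".join(lines)


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
print(encode_table("users", users))
print(encode_yaml_like("users", users))
```

With two rows the difference is small, but the YAML-style output grows by one key label per field per row, while the tabular form grows only by the values. Whether an LLM reads the tabular form as accurately as JSON is the separate question raised earlier in the thread, and this sketch says nothing about that.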