

https://minilanguage.com/ is an interesting one to look at. The full vocabulary is exactly 1000 words, though that's the Mini Mundo variant; there's also a second, smaller variant, Mini Kore, with 100 words.
I started learning it too soon after learning Toki Pona and lost steam. But I agree with the design principles. They stem from the observation that Toki Pona, as fun as it is, is just too damn ambiguous for anything non-superficial. All too often speakers need to clarify what they said by switching to a natural language. Even my own Toki notes become indecipherable after a few days.
Toki Pona: fun, therapeutic mental exercise, made even better with sitelen pona. Feels like writing poetry. Never meant to be a useful language. Easy to learn, hard to use.
Mini: useful as a language for general purpose communication. Small, primarily latinate vocabulary. Harder to learn, easier to use.
ML engineer here. My intuition says you won’t get better accuracy than with sentence template matching, provided your matching rules are free of contradictions. Of course, the downside is you need to remember (and teach others) the precise phrasing to trigger a certain intent. Refining your matching rules is probably a good task for a coding agent.
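
For illustration, a minimal Python sketch of what template matching could look like. The intent names, phrasings, and slot names here are invented, not taken from any existing system:

    import re

    # Hypothetical intents and their trigger templates; the exact phrasings
    # are whatever you decide to teach your users.
    TEMPLATES = {
        "light_on":  [r"^turn on (the )?(?P<target>.+) light$",
                      r"^lights? on in (the )?(?P<target>.+)$"],
        "set_timer": [r"^set a timer for (?P<duration>.+)$"],
    }

    def match_intent(utterance):
        """Return (intent, slots) for the first template that matches, else None."""
        text = utterance.strip().lower()
        for intent, patterns in TEMPLATES.items():
            for pattern in patterns:
                m = re.match(pattern, text)
                if m:
                    return intent, {k: v for k, v in m.groupdict().items() if v}
        return None

    print(match_intent("Turn on the kitchen light"))
    # ('light_on', {'target': 'kitchen'})

The rule set stays completely auditable, which is what makes "free of contradictions" checkable in the first place.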
Back in the pre-LLM days, we used simpler statistical models for intent classification. These were way smaller and could easily run on CPU. Check out random forests or SVMs that take bags of words as input. You do need enough labeled examples to train them on, though.
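
A rough sketch of that classical setup with scikit-learn, assuming you've collected some labeled utterances (the four examples below are toy data; you'd want far more per intent):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = [
        "turn on the kitchen light", "switch the lights on",
        "set a timer for ten minutes", "start a five minute timer",
    ]
    labels = ["light_on", "light_on", "set_timer", "set_timer"]

    # Bag-of-words features feeding a linear SVM; a RandomForestClassifier
    # would slot into the same pipeline.
    clf = make_pipeline(CountVectorizer(), LinearSVC())
    clf.fit(texts, labels)

    print(clf.predict(["please turn the light on"]))  # expect 'light_on'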
With an LLM you can reframe the problem as getting the model to generate the right ‘tool’ call. Most intents are a form of relation extraction: there’s an ‘action’ (verb) and one or more participants (subject, object, etc.). You could imagine a single tool definition (call it ‘SpeakerIntent’) that outputs the intent type (from an enum) as well as the arguments involved. Then you can link that to the final intent with some post-processing. There’s a 100M version of gemma3 that’s apparently not bad at tool calling.
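
Roughly, that single-tool idea could look like the JSON-schema style definition most tool-calling APIs accept (the intent enum and slot names below are placeholders, not a real schema from any product):

    # A single 'SpeakerIntent' tool the model is asked to call on every utterance.
    speaker_intent_tool = {
        "name": "SpeakerIntent",
        "description": "Extract the speaker's intent and the participants of the action.",
        "parameters": {
            "type": "object",
            "properties": {
                "intent": {
                    "type": "string",
                    "enum": ["light_on", "light_off", "set_timer", "play_music"],
                },
                "arguments": {
                    "type": "object",
                    "description": "Participants of the action, e.g. target, duration.",
                    "additionalProperties": {"type": "string"},
                },
            },
            "required": ["intent"],
        },
    }

    def post_process(call):
        """Map the model's tool call onto the final intent plus its slots."""
        return call["intent"], call.get("arguments", {})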