Imagine iterative summarization, or automatically refining code until it's right. For louie.ai sessions, that's meant a fascinating trade-off when doing the above:

* thinking: a lot of analytical approaches essentially use writing as both memory & thinking, so each revision pass is another inference, and faster inference means more passes in the same time

* reading: if you want the model to do inference over a lot of context, you'll need multiple inferences. If each inference is faster, you can 'read' more in the same time on the same hardware

Most LLM use shouldn't be 'raw' but part of a smart & iterative pipeline.
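A minimal sketch of what such an iterative pipeline can look like, assuming a generic `complete(prompt) -> str` LLM call supplied by the caller. The function names, chunk size, and prompts here are illustrative assumptions, not louie.ai's actual implementation:

```python
from typing import Callable


def chunk(text: str, size: int = 4000) -> list[str]:
    """Split text into roughly size-character pieces ('reading' passes)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def iterative_summarize(text: str, complete: Callable[[str], str],
                        max_rounds: int = 3, target_len: int = 2000) -> str:
    """Summarize chunks, then summarize the summaries, until short enough.

    Each round spends extra inferences: faster inference buys more
    'reading' (chunks covered) and more 'thinking' (rewrite passes)
    per unit of wall-clock time on the same hardware.
    """
    for _ in range(max_rounds):
        if len(text) <= target_len:
            break
        summaries = [complete(f"Summarize concisely:\n\n{c}")
                     for c in chunk(text)]
        # The written summary doubles as working memory for the next round.
        text = "\n".join(summaries)
    return text
```

The same loop shape works for code refinement: swap the summarize prompt for a "run tests, feed failures back, revise" step, with the latest draft playing the role of memory between rounds.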