LLM architecture comparison | Svelte Hacker News

strangescript 3 hours ago

The diagrams in this article are amazing if you are somewhere in between a novice and expert. Seeing all of the new models laid out next to each other is fantastic.

bravesoul2 6 hours ago

This is a nice catch-up for someone like me who hasn't been keeping up.

webappguy 2 hours ago

Would love to see a Part 2 covering even what is rumored about the top closed-source frontier models, e.g. o5, o3 Pro, o4 or 4.5, Gemini 2.5 Pro, Grok 4, and Claude Opus 4.

Chloebaker 4 hours ago

Honestly, it's crazy to think how far we've come since GPT-2 (2019). Today, comparing LLMs to determine their performance is notoriously challenging, and it feels like every two weeks a new model beats a benchmark. I'm really glad DeepSeek was mentioned here, because the key architectural techniques it introduced in V3, which improved its computational efficiency and distinguish it from many other LLMs, were really transformational when it came out.

dmezzetti 5 hours ago

While all these architectures are innovative and have helped improve either accuracy or speed, the same fundamental problem of generating factual information still exists.

Retrieval Augmented Generation (RAG), Agents and other similar methods help mitigate this. It will be interesting to see if future architectures eventually replace these techniques.

tormeh 3 hours ago

To me, the issue seems to be that we're training transformers to predict text, which only forces the model to embed limited amounts of logic. We'd have to find something different to train models on in order for them to stop hallucinating.

bsenftner 2 hours ago

I'm still wondering: if RAG is conceptually simple and easy to implement, why have the foundational models not incorporated it into their base functionality? The lack of that strikes me as a negative point about RAG and its variants, because if any of them worked, it would be in the models directly and would not need to be added afterwards.
- bavell an hour ago
  
  RAG is a prompting technique; how could they possibly incorporate it into pretraining?
  - maleldil an hour ago
    
    CoT is a prompting technique too, and it's been incorporated.
    
    bavell 15 minutes ago
    
    IIUC, CoT is "incorporated" into training by just providing better-quality training data, which steers the model towards "thinking" more deeply in its responses. But at the end of the day, it's still just regular pretraining.
    
    RAG (retrieval-augmented generation): how can the retrieval be done during training? RAG will always remain external to the model. The whole point is that you can augment the model by injecting relevant context into the prompt at inference time, bringing your own proprietary/domain-specific data.
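
For readers following the RAG exchange above, here is a minimal sketch of "injecting relevant context into the prompt at inference time". Everything in it is an assumption made for illustration: the names `retrieve`, `build_prompt`, and `DOCS` are hypothetical, the word-overlap scoring stands in for whatever embedding or search index a real system would use, and no model is actually called. It only shows that retrieval happens outside the model and the result ends up as plain prompt text.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents ranked by word overlap with the query.

    Word overlap is a toy stand-in (an assumption of this sketch) for a
    real retriever such as a vector index or a search engine.
    """
    query_tokens = tokenize(query)
    scored = [
        (sum((query_tokens & tokenize(doc)).values()), doc) for doc in documents
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query at all.
    return [doc for score, doc in ranked[:k] if score > 0]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble the augmented prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical "proprietary" documents the base model never saw in training.
DOCS = [
    "The refund window for hardware orders is 30 days.",
    "Support is available on weekdays from 9 to 5.",
    "Software licenses are not refundable after activation.",
]

if __name__ == "__main__":
    # The resulting string is what would be sent to any LLM at inference time.
    print(build_prompt("What is the refund window for hardware?", DOCS))
```

Swapping `DOCS` for a different corpus changes what the model can answer without touching its weights, which is the property bavell is pointing at.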