prolyxis 3 days ago

The human brain, the authors argue, in fact uses multiple networks when interpreting and producing language. These include:

- the language network, which delivers formal linguistic competence
- the multiple demand network, which provides reasoning ability
- the default network, which tracks narratives above the clause level
- the theory of mind network, which infers the mental state of another entity

This motivates their argument that a modular structure would make an LLM better able to be both formally and functionally competent. (While LLMs currently exhibit human-level formal linguistic competence, their functional competence, the ability to navigate the real world through language, has room for improvement.)

Transformer models, they note, already have a degree of emergent modularity through "allowing different attention heads to attend to different input features."

I was wondering: is it possible to characterize the degree of emergent modularity in current systems?
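One rough way to make the question concrete (my own sketch, not from the paper; all names here are illustrative): label each token with a coarse category, sum the attention mass a head gives to each category, and take the entropy of that distribution. A head whose attention mass concentrates on one category scores near zero bits; an unspecialized head scores near the maximum.

```python
import numpy as np

def head_specialization(attn, token_labels, n_labels):
    """Entropy of one attention head's mass over token categories.

    attn: (seq, seq) attention weights for a single head.
    token_labels: per-token category ids (e.g. 0=code, 1=prose).
    Lower entropy = more specialized head.
    """
    mass = np.zeros(n_labels)
    # Sum the attention each token category receives, over all query positions.
    for j, label in enumerate(token_labels):
        mass[label] += attn[:, j].sum()
    p = mass / mass.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Toy check: a head that only ever attends to category-0 tokens.
attn = np.array([[0.5, 0.0, 0.5],
                 [0.5, 0.0, 0.5],
                 [1.0, 0.0, 0.0]])
labels = [0, 1, 0]
print(head_specialization(attn, labels, 2))  # prints 0.0 — fully specialized
```

Averaging such scores across all heads of a trained model would give one crude number for "degree of emergent modularity", though of course real interpretability work uses much richer probes than token categories.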

  • imtringued 2 days ago

    One of the big limitations in LLMs is that they only have a single context window. People throw things that probably shouldn't be mixed together into the same context and hope for the best (e.g. system prompt, RAG context, user input, LLM output).
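    A common workaround (a sketch of the general pattern; the `Segment` type and tag format are hypothetical, not any particular API) is to at least tag each chunk with its provenance before flattening everything into the one window:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    role: str   # e.g. "system", "rag", "user"
    text: str

def flatten(segments):
    """Flatten separate segments into the single context window,
    wrapping each in delimiters so provenance isn't lost entirely."""
    return "\n".join(f"<{s.role}>\n{s.text}\n</{s.role}>" for s in segments)

prompt = flatten([
    Segment("system", "You are a helpful assistant."),
    Segment("rag", "Doc 1: retrieved passage goes here."),
    Segment("user", "Summarize the doc."),
])
```

    The delimiters only label the mixture, though; the model still sees one undifferentiated tape, which is exactly the limitation being described.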

    This is basically no different from a Turing machine going from one tape to multiple tapes. While in theory it doesn't make the Turing machine more powerful, it saves a whole lot of bookkeeping operations that are necessary to work around the limitations of a single tape.
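    The bookkeeping cost is easy to see in miniature with the standard interleaving construction (a toy sketch; function names are mine): cell i of tape t is stored at position i*k + t on the single tape, so every one-cell head move becomes a k-cell jump.

```python
def interleave(tapes):
    """Simulate k tapes on one tape by interleaving their cells:
    cell i of tape t lives at position i*k + t on the single tape."""
    k = len(tapes)
    n = max(len(t) for t in tapes)
    single = ['_'] * (n * k)  # '_' is the blank symbol
    for t, tape in enumerate(tapes):
        for i, sym in enumerate(tape):
            single[i * k + t] = sym
    return single

def read(single, k, tape_idx, head_pos):
    # A one-cell move on tape tape_idx becomes a k-cell jump here:
    # that's the bookkeeping a single tape pays.
    return single[head_pos * k + tape_idx]

single = interleave([list("abc"), list("xy")])
print(read(single, 2, 0, 1))  # prints b — cell 1 of tape 0
```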

    Another limitation is the inability to seek: the model cannot move a head back and forth to rewrite old data already in the context.

  • JieJie 3 days ago

    I’m not sure if this is exactly what you are referring to, but Anthropic has done a lot of interpretability work on Claude, which they’ve published along with the famous "Golden Gate Claude".^1

    "We also find more abstract features—responding to things like bugs in computer code, discussions of gender bias in professions, and conversations about keeping secrets."

    1: https://www.anthropic.com/research/mapping-mind-language-mod...

  • ithkuil 2 days ago

    How much of this modularity is worth building into the architecture itself, and how much should we let emerge from the training process?