Friends and Grandmothers in Silico
A blog post from Hugging Face
In exploring how large language models (LLMs) encode and retrieve factual knowledge about entities, researchers have found that, much like the hypothesized "grandmother cells" in the human brain, LLMs dedicate specific neurons, dubbed "entity cells," to representing individual entities. These neurons, located in the model's multi-layer perceptron (MLP) layers, act as semantic embeddings: they provide a single point of access to everything the model knows about an entity, regardless of the language or surface form in which that entity is mentioned.

After localizing these neurons, the researchers showed that artificially activating them retrieves entity-specific information, while inhibiting them induces "entity amnesia," in which the model loses access to its knowledge of that entity. This supports the "subject-as-key" hypothesis: entities serve as keys under which the model's factual knowledge is stored and retrieved. The study further shows that entity cells are robust to variations in how an entity is expressed and can be manipulated to inject or erase knowledge, establishing their causal role in knowledge processing.

The authors acknowledge limitations, including incomplete coverage of entities and possible redundancy mechanisms within the models that could compensate for ablated neurons. Still, the findings point to a promising direction for interpretability research, deepening our understanding of how LLMs work internally and opening new avenues for targeted interventions in AI systems.
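The localize-then-intervene workflow described above can be sketched in a few lines. The snippet below is a minimal, self-contained illustration with *synthetic* activations, not the paper's actual method or models: it assumes a toy MLP layer of 16 neurons, picks two hypothetical "entity cells" by construction, localizes them by comparing mean activations on prompts that mention the entity versus prompts that do not, and then zeroes them out to mimic the "entity amnesia" ablation. In the real setting, the activation matrices would come from an LLM's MLP layers (e.g., via forward hooks) rather than from a random generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 16 MLP neurons. Indices 3 and 7 are designated as the
# "entity cells" for a single entity (a hypothetical choice for this
# sketch, not taken from the paper).
n_neurons = 16
true_entity_cells = [3, 7]

def activations(mentions_entity: bool) -> np.ndarray:
    """Simulated MLP activations for one prompt."""
    a = rng.normal(0.0, 0.1, n_neurons)          # background noise
    if mentions_entity:
        a[true_entity_cells] += 1.0              # cells fire for any surface form
    return a

# 20 prompts mentioning the entity (in varied forms) vs. 20 unrelated prompts.
entity_prompts = np.stack([activations(True) for _ in range(20)])
other_prompts = np.stack([activations(False) for _ in range(20)])

# Localization: neurons whose mean activation gap between entity and
# non-entity prompts is large are candidate entity cells.
gap = entity_prompts.mean(axis=0) - other_prompts.mean(axis=0)
candidates = np.where(gap > 0.5)[0]
print("candidate entity cells:", candidates.tolist())

# Intervention: ablate (zero out) the candidate neurons -- the analogue
# of the "entity amnesia" experiment, where inhibiting the cells should
# cut off access to the entity's knowledge.
ablated = entity_prompts.copy()
ablated[:, candidates] = 0.0
print("mean activation before ablation:", entity_prompts[:, candidates].mean())
print("mean activation after ablation: ", ablated[:, candidates].mean())
```

With a real model, the same comparison would be run over many paraphrases and translations of prompts about one entity, which is what makes the localized cells invariant to surface form.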