Company:
Date Published:
Author: Jay Alammar
Word count: 1677
Language: English
Hacker News points: None

Summary

In this interview, Graham Neubig, an associate professor at Carnegie Mellon University, discusses the evaluation of large language models (LLMs) and the future of neural network architectures beyond Transformers. Neubig recommends using academic datasets for initial evaluations, but stresses the need to iteratively examine a model's outputs on real-world data to identify and correct errors. He highlights a growing trend in academia toward more complex, industry-representative datasets, and he expresses particular interest in LLM-backed agents, which can either use tools to solve tasks or act independently to affect the world. Neubig also explores the potential of combining LLMs with reinforcement learning and notes the emergence of new architectures such as Mamba that challenge the dominance of Transformers. Looking ahead to 2024, he is keen on improving evaluation reliability and on developing small, adaptable open-source models that perform well on specific tasks.
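
The iterative, real-world error analysis Neubig describes can be pictured as a simple loop: run the model over a sample of real inputs, bucket suspicious outputs, read the largest buckets by hand, then adjust and re-run. The sketch below is only an illustration of that workflow; the generate function and the error heuristics are hypothetical placeholders, not part of any specific library or of Neubig's own tooling.

from collections import Counter

def generate(prompt: str) -> str:
    """Placeholder for whatever LLM call the application actually makes."""
    raise NotImplementedError

def bucket_error(prompt: str, output: str) -> str | None:
    """Rough first-pass heuristics; real buckets come from reading outputs."""
    if not output.strip():
        return "empty_output"
    if len(output) > 4 * len(prompt) + 2000:
        return "runaway_length"
    if output.lower().startswith(("i'm sorry", "as an ai")):
        return "unwanted_refusal"
    return None  # looks fine at a glance; still worth spot-checking

def error_report(real_world_prompts: list[str]) -> Counter:
    buckets: Counter = Counter()
    for prompt in real_world_prompts:
        output = generate(prompt)
        bucket = bucket_error(prompt, output)
        if bucket:
            buckets[bucket] += 1
    return buckets  # review the largest buckets first, fix, and repeat

In practice the buckets themselves evolve across iterations: categories that academic benchmarks never surface (domain jargon, formatting constraints, refusal behavior) tend to appear only once real usage data is examined.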