Large language models struggle with comparing decimal numbers. Here are a few examples:
Transluce, a non-profit start-up, recently released a really cool demo for investigating and changing unwanted behaviors of large language models. In the demo, they dive into why one such model (Llama-3.1 8B Instruct) struggles with comparing numbers.
They demonstrate that neurons representing biblical verses and calendar dates activate when comparing 9.9 with 9.11. In these two settings, 9.11 is indeed bigger than 9.9. They then suppress the neurons representing biblical verses and calendar dates, and the model correctly answers that 9.9 is bigger than 9.11. I encourage you to check out the demo!
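The ambiguity the demo uncovers is easy to reproduce outside a language model: the string "9.11" orders differently depending on whether it is read as a decimal number or as a chapter-and-verse or month-and-day pair. A minimal sketch (this is my illustration, not Transluce's code):

```python
# As decimal numbers, 9.9 is larger than 9.11.
assert float("9.9") > float("9.11")

# As verse or date pairs (chapter.verse, month.day),
# 9.11 comes after 9.9 -- compare the parts as integers.
def as_pair(s: str) -> tuple[int, int]:
    major, minor = s.split(".")
    return (int(major), int(minor))

assert as_pair("9.11") > as_pair("9.9")  # verse 9:11 follows 9:9
```

So when the verse- and date-related neurons dominate, the model is effectively applying the second comparison to a question that calls for the first.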
There are still improvements to be made. In the demo, they only investigate individual neurons. However, in most cases, a feature is represented by multiple neurons, not one. If they considered directions in representation space instead of single neurons, they might have detected more features, such as software package versions (another setting where 9.11 comes after 9.9). That being said, extracting features represented by multiple neurons is an unsolved research problem, so it's understandable that they started with individual neurons.
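To make the distinction concrete, here is a toy sketch of reading out a feature from a single neuron versus from a direction in representation space. Everything here (the vector size, the random direction) is made up for illustration; real work would learn the direction with a probe or a dictionary-learning method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activation vector from one layer (16 neurons here).
activations = rng.normal(size=16)

# Single-neuron readout: assume the feature lives entirely in neuron 3.
neuron_readout = activations[3]

# Direction readout: the feature is a unit vector spread across
# many neurons; its strength is the scalar projection onto it.
direction = rng.normal(size=16)
direction /= np.linalg.norm(direction)
direction_readout = activations @ direction
```

A feature distributed across several neurons can have a large projection onto its direction even when no single neuron stands out, which is why a neuron-level search can miss it.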