Speaker
Moritz Munchmeyer
(University of Wisconsin–Madison)
Description
I will give a brief review of how large language models are now being used for theoretical physics research. I will show the rapid progress of these models using the TPBench benchmark as an example, and present our recent work on improving their reliability with a symbolic verification agent and test-time scaling techniques. I will also discuss whether these models are truly reasoning and speculate on how we might improve their performance in our field in the future.
External references
- 25080025
- 782a636c-03c7-4e4e-ad85-b75a5b351ec3