Theory + AI Symposium
Monday, April 7, 2025 (1:00 p.m.) to Tuesday, April 8, 2025 (7:30 p.m.)
Monday, April 7, 2025
1:00 p.m. - 1:45 p.m.
Registration
Room: PI/1-100 - Theatre
1:45 p.m. - 2:00 p.m.
Participants make their way into the Theatre
Room: PI/1-100 - Theatre
2:00 p.m. - 3:00 p.m.
Panel Discussion - Shirley Ho (Polymathic), Marcela Carena (Perimeter Institute), Roger Melko (University of Waterloo), Vicky Kalogera (Northwestern University), Jesse Thaler (MIT)
Room: PI/1-100 - Theatre
3:00 p.m. - 3:30 p.m.
Break
Room: PI/1-100 - Theatre
3:30 p.m. - 4:30 p.m.
Colloquium: Boltzmann Machines - Geoffrey Hinton (University of Toronto)
Room: PI/1-100 - Theatre
Training a neural network requires computing the gradient of an objective function with respect to the connection weights. The standard way to do this is to use the chain rule to backpropagate gradients through layers of neurons. I shall briefly review a few of the engineering successes of backpropagation and then describe a very different way of getting the gradients that, for a while, seemed a lot more plausible as a model of how the brain gets gradients.

Consider a system composed of binary neurons that can be active or inactive, with weighted pairwise couplings between pairs of neurons, including long-range couplings. If the neurons represent pixels in a binary image, we can store a set of binary training images by adjusting the coupling weights so that the images are local minima of a Hopfield energy function, which is minus the sum, over all pairs of active neurons, of their coupling weights. But this energy function can only capture pairwise correlations. It cannot represent the kinds of complicated higher-order correlations that occur in images. Now suppose that in addition to the "visible" neurons that represent the pixel intensities, we also have a large set of hidden neurons that have weighted couplings with each other and with the visible neurons. Suppose also that all of the neurons are asynchronous and stochastic: they adopt the active state with a log odds that is equal to the difference in the energy function when the neuron is inactive versus active. Given a set of training images, is there a simple way to set the weights on all of the couplings so that the training images are local minima of the free energy function obtained by integrating out the states of the hidden neurons? The Boltzmann machine learning algorithm solved this problem in an elegant way. It was proof of principle that learning in neural networks with hidden neurons was possible using only locally available information, contrary to what was generally believed at the time.
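For concreteness, the energy function and stochastic update rule described in the abstract can be written out as follows; the notation is chosen here purely for illustration, with s_i in {0,1} the state of neuron i and w_ij the symmetric coupling weight.

```latex
% Hopfield/Boltzmann energy of binary states s_i \in \{0,1\}
% with symmetric coupling weights w_{ij} (only pairs of active neurons contribute):
E(\mathbf{s}) \;=\; -\sum_{i<j} w_{ij}\, s_i\, s_j .
% Energy gap for neuron i between its inactive and active states,
% with all other neurons held fixed:
\Delta E_i \;=\; E(s_i{=}0) - E(s_i{=}1) \;=\; \sum_{j \neq i} w_{ij}\, s_j .
% Stochastic update: the log odds of becoming active equal the energy gap,
p(s_i = 1) \;=\; \sigma(\Delta E_i) \;=\; \frac{1}{1 + e^{-\Delta E_i}} .
```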
Tuesday, April 8, 2025
9:00 a.m. - 9:10 a.m.
Opening Remarks
Room: PI/1-100 - Theatre
9:10 a.m. - 9:40 a.m.
EAIRA: Establishing a methodology to evaluate LLMs as research assistants - Frank Cappello (Argonne National Laboratory)
Room: PI/1-100 - Theatre
Recent advancements have positioned Large Language Models (LLMs) as transformative tools for scientific research, capable of addressing complex tasks that require reasoning, problem-solving, and decision-making. Their exceptional capabilities suggest their potential as scientific research assistants, but also highlight the need for holistic, rigorous, and domain-specific evaluation to assess effectiveness in real-world scientific applications. This talk describes a multifaceted methodology for Evaluating AI models as scientific Research Assistants (EAIRA) developed at Argonne National Laboratory. The methodology incorporates four primary classes of evaluations: 1) Multiple Choice Questions to assess factual recall; 2) Open Response to evaluate advanced reasoning and problem-solving skills; 3) Lab-Style Experiments involving detailed analysis of capabilities as research assistants in controlled environments; and 4) Field-Style Experiments to capture researcher-LLM interactions at scale in a wide range of scientific domains and applications. These complementary methods enable a comprehensive analysis of LLM strengths and weaknesses with respect to their scientific knowledge, reasoning abilities, and adaptability. Recognizing the rapid pace of LLM advancements, we designed the methodology to evolve and adapt so as to ensure its continued relevance and applicability. This talk describes the methodology's current state. Although developed within a subset of scientific domains, the methodology is designed to be generalizable to a wide range of scientific domains.
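As an illustration of the first evaluation class only, a minimal multiple-choice scoring loop might look like the sketch below; the question format and the ask_model stub are assumptions made for this example, not part of the EAIRA tooling.

```python
# Minimal sketch of a multiple-choice evaluation loop (illustrative only;
# the MCQ format and ask_model() stub are assumptions, not EAIRA code).
from dataclasses import dataclass


@dataclass
class MCQ:
    question: str
    choices: list[str]   # e.g. ["A) ...", "B) ...", "C) ...", "D) ..."]
    answer: str          # correct choice label, e.g. "B"


def ask_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under evaluation."""
    raise NotImplementedError


def mcq_accuracy(questions: list[MCQ]) -> float:
    """Fraction of questions where the model's chosen label matches the answer key."""
    correct = 0
    for q in questions:
        prompt = (q.question + "\n" + "\n".join(q.choices)
                  + "\nAnswer with a single letter.")
        reply = ask_model(prompt).strip()
        if reply[:1].upper() == q.answer.upper():
            correct += 1
    return correct / len(questions) if questions else 0.0
```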
9:40 a.m. - 9:50 a.m.
State of AI Reasoning for Theoretical Physics - Insights from the TPBench Project - Moritz Munchmeyer (University of Wisconsin–Madison)
Room: PI/1-100 - Theatre
The newest large-language reasoning models are, for the first time, powerful enough to perform mathematical reasoning in theoretical physics at the graduate level. In the mathematics community, datasets such as FrontierMath are being used to drive progress and evaluate models, but theoretical physics has so far received less attention. In this talk I will present our dataset TPBench (arXiv:2502.15815, tpbench.org), which was constructed to benchmark and improve AI models specifically for theoretical physics. We find extremely rapid progress of models over the last few months, but also significant challenges at research-level difficulty. I will also briefly outline strategies to improve these models for theoretical physics.
9:50 a.m. - 10:00 a.m.
UniverseTBD: Democratising Science with AI & Why Stories Matter - Ioana Ciuca (Stanford University)
Room: PI/1-100 - Theatre
UniverseTBD is an interdisciplinary community of astronomers, AI researchers, engineers, artists, and enthusiasts aligned on a bold mission to democratise Science for everyone. From releasing the first large language model in Astronomy, AstroLLaMA-1, to the AI-enabled literature discovery tool Pathfinder, and through our research with AstroPT and HypoGen, our team has pushed the boundaries of AI for Science for the past two years. In this talk, I discuss for the first time how UniverseTBD came to be: our vision, our values, what drives us, and what has enabled us to scale our team's projects while sharing our learnings with the broader scientific community. I also briefly discuss our latest results with hypothesis generation (HypoGen), multimodal language models (AstroLlaVA-1), and agentic AI (AstroCoder). I conclude with a vision for the future where AI teams up with human researchers to "help us understand the Universe".
10:00 a.m. - 10:30 a.m.
Panel Discussion: Foundation Models for Theoretical Physics (Physicists)
Room: PI/1-100 - Theatre
10:30 a.m. - 11:00 a.m.
Break
Room: PI/1-100 - Theatre
11:00 a.m. - 11:30 a.m.
arXiv: AI and Physics past, present and future - Steinn Sigurdsson (arXiv)
Room: PI/1-100 - Theatre
The rise of AI, in particular recent LLM-based tools, has had an immediate impact on the production of physics, in ways both good and bad. I discuss some of the impact seen on arXiv in particular, what the status quo and the prospects are, and speculate on the longer-term impact.
11:30 a.m. - 11:40 a.m.
LaTeXML and the Math-rich Scholarly Web - Deyan Ginev (LaTeXML)
Room: PI/1-100 - Theatre
This short talk will outline some of LaTeXML's uses as infrastructure, as well as its enabling effect for search, AI, assistive technologies, and the mobile web. We have been on a journey towards scholarly articles with web-native mathematics since the dawn of the internet. The physics Open Science movement has led the way, along with LaTeX, its authoring framework of choice. NIST's LaTeXML is a conversion tool that, over the last twenty years, has increasingly bridged the gap between LaTeX sources and the web-native scholarly article.
11:40 a.m. - 11:50 a.m.
Searching Graphics and Text in Technical Documents: A Brief Overview and Plan - Richard Zanibbi (Rochester Institute of Technology)
Room: PI/1-100 - Theatre
What would effective and usable tools for searching text and graphics in research papers look like? In this talk we sketch a partial answer to this question, with reference to recent work in the Document and Pattern Recognition Lab at RIT. Two multimodal paper search prototypes, one for math (MathDeck) and one for chemistry (ReactionMiner search), will be used for illustration. A simple framework based on 'jars' of available information sources can organize and relate the actions performed by people and automated systems when retrieving, analyzing, and synthesizing sources. We will organize our answer sketch around this framework, and share open questions and research opportunities related to enhancing multimodal search tools for expert and non-expert users.
Note: ReactionMiner was developed in collaboration with NCSA and the Han lab at the University of Illinois, Urbana-Champaign.
MathDeck demo: https://people.rit.edu/ma5339/mathdeck_landing
ReactionMiner search demo: https://reactionminer.platform.moleculemaker.org/home
Biography: Richard Zanibbi is a Professor of Computer Science at the Rochester Institute of Technology (RIT, USA), where he directs the Document and Pattern Recognition Lab (dprl@RIT). His research focuses on the recognition and retrieval of graphical notations, particularly for mathematics and chemistry. He is also a member of the Molecule Maker Lab Institute (MMLI), one of the first NSF AI Centers. He received his PhD from Queen's University (Canada), and was an NSERC Postdoctoral Fellow at the Centre for Pattern Recognition and Machine Learning (CENPARMI) at Concordia University before joining RIT.
11:50 a.m. - 12:00 p.m.
Natural Proof Checking and AI - Peter Koepke (University of Bonn)
Room: PI/1-100 - Theatre
From the start of AI, mathematical theorem proving has been an important challenge and technique. We sketch the topics of Automated and Interactive Theorem Proving and present the checking of naturally readable mathematical texts in the Naproche proof system. This involves translations between informal, semi-formal and formal mathematical languages and shows great potential for the use of new AI techniques.
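For readers unfamiliar with proof assistants, the fully formal end of this spectrum looks like the short snippet below; Lean 4 is used here purely as a familiar illustration of interactive theorem proving, whereas Naproche's own input is a controlled natural language much closer to ordinary mathematical prose.

```lean
-- A fully formal, machine-checked statement and proof (Lean 4, illustrative only).
-- A natural-proof-checking system aims to accept the same content written
-- in near-natural mathematical language instead.
theorem no_largest_nat (n : Nat) : ∃ m : Nat, n < m :=
  ⟨n + 1, Nat.lt_succ_self n⟩
```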
12:00 p.m. - 12:30 p.m.
Panel Discussion: Processing the Data of Theoretical Physics (Engineers)
Room: PI/1-100 - Theatre
12:30 p.m. - 1:30 p.m.
Lunch
Room: PI/1-100 - Theatre
1:30 p.m. - 1:40 p.m.
Accelerating Discovery: Mapping the Future of AI-Enhanced Theoretical Physics - Axton Pitt (Litmaps)
Room: PI/1-100 - Theatre
This talk explores how artificial intelligence could transform theoretical physics over the next 25 years by addressing the crucial challenge of navigating an increasingly complex scientific literature landscape. We introduce Litmaps, a platform leveraging AI and visualization techniques to accelerate literature discovery and insights. We illustrate Litmaps' current capabilities in rapidly identifying relevant connections and advancing theoretical research. We also outline critical engineering challenges, including open access to historical literature, data standardization, and managing uncertainty in AI models. Finally, we highlight the importance of collaboration among physicists, AI researchers, engineers, and entrepreneurs, to realise the AI-enhanced future of theoretical physics research.
1:40 p.m. - 1:50 p.m.
Teaching and Mentoring the AI Scientists - Xiaoliang Qi (Stanford University)
Room: PI/1-100 - Theatre
Over the past two years, LLMs have made significant progress in math and reasoning, but they have not yet been widely applied to scientific research tasks. In this talk I will give a brief introduction to our ongoing efforts to build the first AI scientist platform, where researchers in all fields can contribute to teaching the AI scientists by contributing benchmarks and specialized tools. We believe that by providing AI with real-time updates of benchmarks and research tools, we are starting to enter an era of innovation driven by new types of human-AI collaboration.
1:50 p.m. - 2:00 p.m.
Beyond Articles: Three Pillars of Scientific Transformation - Oleg Ruchayskiy (Niels Bohr Institute)
Room: PI/1-100 - Theatre
Scientific research is facing mounting challenges: overwhelmed reviewers, fragmented expertise, an outdated and inefficient system for allocating resources, and dissemination tools that no longer match the complexity and scale of modern scientific output. In this talk, I speak both as a researcher and as the founder of a successful startup to explore what a more coherent, future-ready scientific ecosystem could look like. Drawing on real prototypes and emerging tools, I’ll outline how we might reshape the way we publish, collaborate, and share: not just to fix what’s broken, but to unlock what science could become tomorrow.
2:00 p.m. - 2:30 p.m.
Panel Discussion: Startups Accelerating Theoretical Physics (Entrepreneurs)
Room: PI/1-100 - Theatre
2:30 p.m. - 3:00 p.m.
Keynote Q&A - Stephen Wolfram (Wolfram Research)
Room: PI/1-100 - Theatre
3:00 p.m. - 3:30 p.m.
Break
Room: PI/1-100 - Theatre
3:30 p.m. - 4:00 p.m.
Human Level AI by 2030 - Jared Kaplan (Anthropic)
Room: PI/1-100 - Theatre
4:00 p.m. - 4:10 p.m.
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 - Yuri Chervonyi (DeepMind)
Room: PI/1-100 - Theatre
We present AlphaGeometry2, a significantly improved version of AlphaGeometry introduced in Trinh et al. (2024), which has now surpassed an average gold medalist in solving Olympiad geometry problems. To achieve this, we first extend the original AlphaGeometry language to tackle harder problems involving movements of objects, and problems containing linear equations of angles, ratios, and distances. This, together with support for non-constructive problems, has markedly improved the coverage rate of the AlphaGeometry language on International Math Olympiad (IMO) 2000-2024 geometry problems from 66% to 88%. The search process of AlphaGeometry2 has also been greatly improved through the use of the Gemini architecture for better language modeling, and a novel knowledge-sharing mechanism that enables effective communication between search trees. Together with further enhancements to the symbolic engine and synthetic data generation, we have significantly boosted the overall solving rate of AlphaGeometry2 to 84% for all geometry problems over the last 25 years, compared to 54% previously. AlphaGeometry2 was also part of the system that achieved silver-medal standard at IMO 2024. Last but not least, we report progress towards using AlphaGeometry2 as part of a fully automated system that reliably solves geometry problems directly from natural language input.
4:10 p.m. - 4:20 p.m.
LitLLMs, LLMs for Literature Review: Are We There Yet? - Gaurav Sahu (MILA)
Room: PI/1-100 - Theatre
Literature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, especially due to the recent influx of research papers. In this talk, we will explore the zero-shot abilities of recent Large Language Models (LLMs) in assisting with the writing of literature reviews based on an abstract. We will decompose the task into two components: 1. Retrieving related works given a query abstract, and 2. Writing a literature review based on the retrieved results. We will then analyze how effective LLMs are for both components. For retrieval, we will discuss a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper and then retrieves potentially relevant papers by querying an external knowledge base. Additionally, we will study a prompting-based re-ranking mechanism with attribution and show that re-ranking doubles the normalized recall compared to naive search methods, while providing insights into the LLM's decision-making process. We will then discuss the two-step generation phase that first outlines a plan for the review and then executes steps in the plan to generate the actual review. To evaluate different LLM-based literature review methods, we create test sets from arXiv papers using a protocol designed for rolling use with newly released LLMs to avoid test set contamination in zero-shot evaluations. We will also see a quick demo of LitLLM in action towards the end.
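As a rough illustration of the two-step retrieval strategy described above (LLM keyword extraction, a query to an external knowledge base, then prompting-based re-ranking), the sketch below shows one possible shape of such a pipeline; all function names and the search API are placeholders, not the LitLLM implementation.

```python
# Sketch of the retrieval stage described in the abstract (illustrative only):
# (1) an LLM extracts keywords from the query abstract,
# (2) an external knowledge base is queried with those keywords,
# (3) a prompting-based re-ranker orders the candidates with a short attribution.
# The llm and search_api callables and all names here are placeholders.

def extract_keywords(llm, abstract: str, k: int = 5) -> list[str]:
    """Ask the LLM for k search keywords summarizing the query abstract."""
    prompt = f"Extract {k} comma-separated search keywords from this abstract:\n{abstract}"
    return [w.strip() for w in llm(prompt).split(",") if w.strip()][:k]

def retrieve_candidates(search_api, keywords: list[str], n: int = 50) -> list[dict]:
    """Query an external paper index with the extracted keywords."""
    return search_api(" ".join(keywords), limit=n)

def rerank(llm, abstract: str, candidates: list[dict]) -> list[dict]:
    """Prompt the LLM to score each candidate's relevance and give a one-line reason."""
    scored = []
    for paper in candidates:
        prompt = (f"Query abstract:\n{abstract}\n\nCandidate paper:\n"
                  f"{paper['title']}\n{paper['abstract']}\n"
                  "Rate relevance 0-10 and give a one-line reason, formatted as 'score|reason'.")
        score_str, _, reason = llm(prompt).partition("|")
        try:
            score = float(score_str)
        except ValueError:
            score = 0.0
        scored.append((score, reason.strip(), paper))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [paper for _, _, paper in scored]
```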
4:20 p.m. - 5:00 p.m.
Panel Discussion: Harnessing Breakthroughs in Big Tech (AI Researchers)
Room: PI/1-100 - Theatre