Towards a Science of AI: Scaling laws and synthetic data
by
PI/2-292 - Time Room
Perimeter Institute for Theoretical Physics
The stunning capabilities of modern AI systems give rise to many questions regarding how they work and how much more capable they can possibly get. One way to gain additional insight is via synthetic models of data with tunable complexity, which can capture the basic relevant structures of real data. In recent work we have focused on sequences obtained from random walks on graphs, hypergraphs, and hierarchical graphical structures. I will present some recent empirical results regarding how transformers learn sequences arising from random walks on graphs. The focus will be on neural scaling laws, unexpected temperature-dependent effects, and sample complexity. If there is time, I will also discuss the effect of parameterization strategies on hyperparameter scaling laws, where we see the critical importance of appropriately scaling the embedding layer learning rate.
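As a rough illustration of the kind of synthetic data described above, the following minimal sketch generates token sequences from random walks on a graph. The abstract does not say which graph family, walk rule, or sequence format is used, so everything here is an assumption made for illustration: the graph is a random directed graph with fixed out-degree, the walk is uniform over out-neighbours, and the names `make_random_graph`, `sample_walk`, and `make_dataset` are hypothetical. In this toy setup, `num_nodes` and `degree` serve as the tunable complexity knobs.

```python
import random


def make_random_graph(num_nodes, degree, seed=0):
    """Build a random directed graph as an adjacency dict.

    Assumption for illustration: every node gets exactly `degree` distinct
    out-neighbours (no self-loops), chosen uniformly at random.
    Requires degree <= num_nodes - 1.
    """
    rng = random.Random(seed)
    graph = {}
    for node in range(num_nodes):
        candidates = [n for n in range(num_nodes) if n != node]
        graph[node] = rng.sample(candidates, degree)
    return graph


def sample_walk(graph, length, rng):
    """Sample one random walk of `length` nodes.

    The start node is uniform over all nodes; each step moves to a
    uniformly chosen out-neighbour of the current node.
    """
    node = rng.choice(list(graph))
    walk = [node]
    for _ in range(length - 1):
        node = rng.choice(graph[node])
        walk.append(node)
    return walk


def make_dataset(graph, num_sequences, length, seed=0):
    """Generate a list of walks; node ids double as token ids."""
    rng = random.Random(seed)
    return [sample_walk(graph, length, rng) for _ in range(num_sequences)]


if __name__ == "__main__":
    graph = make_random_graph(num_nodes=10, degree=3, seed=0)
    for walk in make_dataset(graph, num_sequences=3, length=8, seed=1):
        print(walk)
```

Each walk can be fed to a sequence model as a string of token ids. A model that has learned the graph should assign non-negligible probability only to next tokens that are out-neighbours of the current node.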
Jaume Gomis