Bridging the Data Gap Between Children and Large Language Models

Michael C. Frank, Stanford University
Seminar

While large language models require billions of words of text to show zero-shot generalization and in-context learning, children exhibit the same emergent behaviors with only a few million words of language input. What accounts for this difference? I'll discuss some of our attempts to measure and understand how language models and multimodal models can be compared productively with children's learning, using datasets and evaluations from developmental psychology.

Biography:

Michael C. Frank is the Benjamin Scott Crocker Professor of Human Biology in the Department of Psychology at Stanford University and Director of the Symbolic Systems Program. He received his PhD in Brain and Cognitive Sciences from MIT in 2010. He studies children's language learning and development, with a focus on the use of large-scale datasets to understand the variability and consistency of learning across cultures. He is a founder of the ManyBabies Consortium and has led open-data projects including Wordbank and the ongoing LEVANTE project. His awards include the Troland Award from the National Academy of Sciences and the FABBS Early Career Impact Award. He served as President of the Cognitive Science Society, has edited for journals including Cognition and Child Development, and is currently co-Editor-in-Chief of the Open Encyclopedia of Cognitive Science.

For more information about upcoming speakers, please visit the TPC Seminar Series webpage:

https://tpc.dev/tpc-seminar-series/