Distributions of Spearman rank correlations across all datasets and random seeds. Correlation is measured between metric response and perturbation probability. Image: Ryien Hosseini, University of Chicago
Deep generative models have recently achieved significant success in representing graph data, including dynamic graphs, where both topology and features change over time. These models have applications in diverse areas, including social network analysis, infrastructure analysis of grids and networks, the study of biological functions, materials design, and financial fraud detection. However, unlike in vision or language domains, evaluating generative models for dynamic graphs is challenging because visual inspection is impractical, and existing metrics often fail to capture temporal dependencies or the interplay between node and edge features. To address this challenge, a research collaboration between Argonne National Laboratory and the University of Chicago used ALCF resources to develop a new metric that provides a quantitative, interpretable measure of dynamic graph similarity, enabling more reliable evaluation of generative models.
Generative models for dynamic graphs promise to accelerate discovery by producing realistic system representations at scale, but their accuracy is difficult to assess. Traditional metrics often discretize continuous-time graph data into static snapshots and compare graph statistics, assuming snapshots are independent. This approach overlooks temporal dependencies, feature evolution, and interactions between topology and features. Without metrics that capture these complexities, it is challenging to determine whether a generative model has captured the essential dynamics of a system or how to improve it.
The researchers introduced a metric based on the Johnson-Lindenstrauss lemma, applying random projections directly to dynamic graph data to produce fixed-dimensional embeddings of variable-length node interactions. These embeddings preserve both topological and feature information while accounting for temporal evolution. The method produces a unified scalar measure that can be applied across graphs of varying sizes and time spans. Using ALCF supercomputers, the researchers scaled their framework to analyze large datasets, enabling evaluations that compare generated dynamic graphs with ground truth sequences.
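The idea behind the Johnson-Lindenstrauss lemma can be illustrated with a generic Gaussian random projection. The sketch below is not the team's implementation and does not handle variable-length node interactions; the function name `random_projection`, the dimensions, and the synthetic input are all assumptions chosen for illustration. It only shows that projecting high-dimensional vectors through a random matrix roughly preserves the distances between them.

```python
import numpy as np

def random_projection(x, out_dim, seed=0):
    """Project the rows of x to out_dim dimensions with a Gaussian random matrix.

    Hypothetical helper for illustration; not the authors' code.
    """
    rng = np.random.default_rng(seed)
    in_dim = x.shape[1]
    # Entries are drawn from N(0, 1/out_dim) so that squared norms
    # are preserved in expectation.
    proj = rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(in_dim, out_dim))
    return x @ proj

# Synthetic stand-in data: 20 vectors in 1000 dimensions.
rng = np.random.default_rng(42)
points = rng.normal(size=(20, 1000))
embedded = random_projection(points, out_dim=256)

# Compare one pairwise distance before and after projection.
orig_dist = np.linalg.norm(points[0] - points[1])
proj_dist = np.linalg.norm(embedded[0] - embedded[1])
ratio = proj_dist / orig_dist
print(f"distance ratio after projection: {ratio:.3f}")
```

A ratio close to 1 indicates that the lower-dimensional embedding kept the geometry of the original points, which is the property that makes a fixed-dimensional comparison meaningful.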
The team’s method revealed subtle differences between generative models, identifying discrepancies in feature values, edge dynamics, and temporal patterns that traditional metrics often miss. By providing a scalar, expressive measure of similarity, their approach allowed generative models to be ranked and assessed systematically. Sensitivity analyses demonstrated that the metric could detect issues such as mode collapse or missing dynamic behaviors, offering a robust tool for model evaluation.
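The figure caption describes sensitivity in terms of the Spearman rank correlation between metric response and perturbation probability. As a minimal sketch of that statistic, the code below computes it with NumPy for inputs without ties. The perturbation probabilities and the `toy_response` values are made-up illustrative numbers, not results from the study, and `spearman` is a hypothetical helper.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation for 1-D inputs that contain no ties."""
    # Applying argsort twice turns values into 0-based ranks.
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    # Spearman's rho is the Pearson correlation of the ranks.
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Illustrative values only: a metric score that grows as more of the
# graph is perturbed.
perturb_prob = np.array([0.0, 0.1, 0.2, 0.4, 0.8])
toy_response = perturb_prob ** 2 + 0.05

rho = spearman(perturb_prob, toy_response)
print(f"Spearman correlation: {rho:.2f}")
```

A correlation near 1 means the score rises monotonically with the amount of perturbation, which is the behavior expected of a metric that is sensitive to corruption of the data.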
By providing an efficient, interpretable, and quantitative evaluation method for dynamic graphs, the team’s approach can help researchers assess and improve generative models more reliably. Its general applicability and scalability make it a valuable tool for modeling dynamic systems across diverse fields, from network science to biology, supporting the development of more accurate and trustworthy AI models for discovery and design.
Hosseini, R., F. Simini, V. Vishwanath, R. Willett, and H. Hoffmann. “Quality Measures for Dynamic Graph Generative Models,” The Thirteenth International Conference on Learning Representations (ICLR 2025) (April 2025), Singapore, OpenReview.
https://openreview.net/pdf?id=8bjspmAMBk