CEU Electronic Theses and Dissertations, 2025
Author | Vaca, Felipe |
---|---|
Title | Consistency and Quality of fit of Stochastic Block Models in Realistic Settings |
Summary | The structure of real-world networked systems is crucial for understanding their origin, evolution, and behavior. Network structure can be summarized by decomposing the network into subsets of elements and assuming that the rate of interactions between individual elements is driven by these groupings. These groups, commonly referred to as "communities", play an important role in the network formation process and may significantly shape the behavior of the underlying system. Generative network models offer a flexible and robust approach to detecting communities in network data. The family of Stochastic Block Models (SBMs), together with Bayesian inference tools, has proven useful for community detection and link prediction tasks. SBMs yield a coarse-grained description of network data in a statistically principled way, which prevents misleading conclusions based on spurious patterns while still allowing the patterns that do exist in the data to be discovered. However informative they are, SBMs are approximations of real-world networks and rely on several simplifying assumptions that are unlikely to hold in many empirical settings. Currently, neither the extent of these potential discrepancies in empirical network data nor the consequences of SBM modeling inconsistencies are well understood. This dissertation addresses this issue by conducting large-scale studies of SBM fits to hundreds of empirical networks to uncover systematic patterns in SBM performance. We consider two complementary approaches to assessing the quality of the model: model checking and model selection. In model checking, the goal is to understand how the model fails to describe the data, as a path towards model comprehension, revision, and improvement. To this end, we first use posterior predictive checks, which involve comparing networks generated by the inferred model with the empirical network according to a set of network descriptors. Additionally, we study a scenario with noisy network measurements, in which we use a network reconstruction framework to test how accurately the SBM estimates the underlying patterns of empirical networks. In both analyses, we observe that while the SBM provides accurate descriptions or estimates for most networks in the corpus, it does not fulfill all modeling requirements, particularly for transportation networks. Finally, we study model selection approaches, considering several variants of the SBM. We evaluate the models based on their compression ability and predictive power, and examine the agreements and disagreements between these model selection criteria. Overall, we find these criteria to be consistent, i.e., the most compressive model is also the most predictive. Nevertheless, compression criteria tend to be more reliable for model selection, as predictive criteria cannot always determine which SBM variant is better. Thus, this dissertation aims to provide a better understanding of the behavior of SBMs and of their capabilities and limitations as approximations of the true underlying models of real-world networks. |
Supervisor | Peixoto, Tiago |
Department | Network Science PhD |
Full text | https://www.etd.ceu.edu/2025/vaca_felipe.pdf |
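The posterior predictive checks described in the summary compare a descriptor of the observed network with the same descriptor computed on networks generated by the fitted model. The following is a minimal Python sketch of that idea under assumptions that are not taken from the thesis: the model is a plain (non-degree-corrected) Bernoulli SBM, posterior draws of the partition `b` and the block-pair probability matrix `p` are assumed to come from some external sampler, the only descriptor is the triangle count, and all function names are hypothetical. It illustrates the technique and is not the dissertation's implementation.

```python
import random
from itertools import combinations


def sample_sbm(b, p, rng):
    """Draw a simple undirected graph from a Bernoulli SBM (illustrative variant).

    b:   list mapping node index -> group label in 0..B-1
    p:   symmetric B x B matrix of edge probabilities between groups
    rng: random.Random instance
    Returns the graph as an adjacency list (list of sets).
    """
    adj = [set() for _ in range(len(b))]
    for i, j in combinations(range(len(b)), 2):
        if rng.random() < p[b[i]][b[j]]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def triangle_count(adj):
    """Count triangles, visiting each one once as an ordered triple i < j < k."""
    total = 0
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            if j > i:
                total += sum(1 for k in nbrs & adj[j] if k > j)
    return total


def posterior_predictive_check(observed_adj, posterior_draws, descriptor, seed=0):
    """Compare a descriptor of the observed graph with its posterior predictive distribution.

    posterior_draws: sequence of (b, p) samples, assumed to come from an external
                     posterior sampler (hypothetical input format).
    Returns (observed value, replicated values, two-sided tail probability).
    """
    rng = random.Random(seed)
    observed = descriptor(observed_adj)
    # One replicated network per posterior draw, summarized by the same descriptor.
    replicated = [descriptor(sample_sbm(b, p, rng)) for b, p in posterior_draws]
    lower = sum(r <= observed for r in replicated) / len(replicated)
    upper = sum(r >= observed for r in replicated) / len(replicated)
    return observed, replicated, min(1.0, 2 * min(lower, upper))


# Usage: the "observed" network is two disjoint 5-cliques (20 triangles).
b = [0] * 5 + [1] * 5
assortative = [[1.0, 0.0], [0.0, 1.0]]
empty = [[0.0, 0.0], [0.0, 0.0]]
observed_graph = sample_sbm(b, assortative, random.Random(1))

# A model that reproduces the cliques: every replicate has 20 triangles, p-value 1.0.
print(posterior_predictive_check(observed_graph, [(b, assortative)] * 20, triangle_count)[2])
# A model with no edges: every replicate has 0 triangles, p-value 0.0.
print(posterior_predictive_check(observed_graph, [(b, empty)] * 20, triangle_count)[2])
```

Any other descriptor (for example clustering coefficient, degree assortativity, or mean path length) can be passed in place of `triangle_count`; a tail probability close to zero flags a property of the observed network that the model fails to reproduce.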
© 2007-2021, Central European University