Network embedding unveils the hidden interactions in the mammalian virome


Predicting host-virus interactions is fundamentally a network science problem. We develop a method for bipartite network prediction that combines a recommender system (linear filtering) with an imputation algorithm based on low-rank graph embedding. We test this method on a global database of mammal-virus interactions and show that it makes biologically plausible predictions that are robust to data biases. We find that the mammalian virome is under-characterized everywhere in the world. We suggest that future virus discovery efforts could prioritize the Amazon Basin (for its unique coevolutionary assemblages) and sub-Saharan Africa (for its poorly characterized zoonotic reservoirs). Graph embedding of the imputed network improves predictions of human infection from viral genome features, providing a shortlist of priorities for laboratory studies and surveillance. Overall, our study indicates that the global structure of the mammal-virus network contains a large amount of recoverable information, which provides new insights into fundamental biology and disease emergence.
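To make the recommender step concrete, here is a minimal sketch of linear filtering on a binary host-virus matrix (rows are hosts, columns are viruses). It assumes the commonly used formulation in which each pair's score is a weighted sum of the observed entry, its row mean, its column mean, and the overall connectance, with weights summing to 1. The equal weights, the function names `linear_filter` and `rank_missing`, and the toy matrix are illustrative assumptions, not the study's settings. The low-rank embedding imputation step is not shown.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def linear_filter(A: Matrix, alphas: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> List[List[float]]:
    """Score every (host, virus) pair of a binary bipartite adjacency matrix.

    Assumed formulation (weights are hypothetical, equal by default):
        score[i][j] = a1*A[i][j] + a2*rowmean[i] + a3*colmean[j] + a4*connectance
    """
    a1, a2, a3, a4 = alphas
    if abs(a1 + a2 + a3 + a4 - 1.0) > 1e-9:
        raise ValueError("filter weights must sum to 1")
    n_rows, n_cols = len(A), len(A[0])
    # Fraction of viruses each host is known to carry.
    row_means = [sum(row) / n_cols for row in A]
    # Fraction of hosts each virus is known to infect.
    col_means = [sum(A[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    # Overall density of observed interactions.
    connectance = sum(map(sum, A)) / (n_rows * n_cols)
    return [
        [a1 * A[i][j] + a2 * row_means[i] + a3 * col_means[j] + a4 * connectance
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


def rank_missing(A: Matrix, scores: List[List[float]]) -> List[Tuple[int, int]]:
    """Return unobserved (host, virus) pairs sorted by decreasing score.

    Ties are broken by row index, then column index, so the order is deterministic.
    """
    candidates = [
        (scores[i][j], i, j)
        for i in range(len(A))
        for j in range(len(A[0]))
        if A[i][j] == 0
    ]
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(i, j) for _, i, j in candidates]


if __name__ == "__main__":
    # Toy 3-host x 3-virus matrix (hypothetical data).
    A = [
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 1],
    ]
    scores = linear_filter(A)
    print(rank_missing(A, scores))
```

Unobserved pairs involving well-connected hosts or widely shared viruses rise to the top of the ranking; in the full method, these candidate links would then feed the imputation step rather than being reported directly.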

Patterns, 4(6): 100738