Community Detection in Complex Networks

Finding communities in networks has always been a central study of network analysis. With the advent of online social networks and digital user behavior, we can now obtain rich information about various types of interactions between individuals, such as in-person and online interactions, and about interactions between various types of entities like users, videos, and hashtags, to name a few. These multidimensional and multi-modal complex networks present unique challenges for problems like community detection as different modes and dimensions of interactions often have very different network topologies, which also influence each other. The objective of our research is then to find communities within complex networks that consist of multidimensional, multi-modal, and temporal networks, and combinations thereof.

To find communities in complex networks, we are exploring a host of techniques from genetic algorithms, spectral methods, neural networks, and unsupervised machine learning. In the first of or research projects, we are using k Nearest Neighbors (k-NN) unsupervised machine learning along with Newman Modularity to find communities and latent network representations of multidimensional networks. We are exploring various dimensionality reduction techniques, as are used in spectral methods, and auto-encoders from neural network research to create low-dimensional affinity matrices based on multidimensional networks. We then find both latent networks and community assignments form the affinity matrices using a multi-objective genetic algorithm k-NN that finds best fit networks and communities based primarily on Newman Modularity. In the second research project, we are using the concept of coclustering - also known as direct clustering or biclustering - to find communities within multi-modal networks. In this project we are finding communities between various classes of entities based on interactions between and within the classes, for all classes of entities at the same time.

Thus far, our preliminary results show promise at finding meaningful communities. Using the multi-objective genetic algorithm k-NN clustering, we have not only shown comparable performance to any other unsupervised learning technique in terms of accuracy, precision, and recall, but also interesting latent networks. The image below is the latent network, with coloring by community assignment, of the nations of the world by their stability factors (i.e. GDP per capita, infant mortality rate, Gini Index, etc.).

Complex Networks

From the network, it is clear that the communities recovered match expectations about the stability of different nations. Furthermore, the latent network shows us those nations that are close to moving to new communities of stability as well as the fact that the most stable nations (yellow) are distant and not connected to the least stable nations (blue); the most stable nations share almost nothing in common in terms of stability indicators with the least stable nations. We believe this research will allow for the first scalable community detection methods of full, complex networks that are multidimensional and multi-modal.