Conceived and designed the experiments: FC VDL MB AC. Performed the experiments: FC VDL MB AC. Analyzed the data: FC VDL MB AC. Wrote the paper: FC VDL MB AC.
The authors have declared that no competing interests exist.
Community detection is an important tool for exploring and classifying the properties of large complex networks and should be of great help for spatial networks. Indeed, in addition to their location, nodes in spatial networks can have attributes such as the language for individuals, or any other socioeconomical feature that we would like to identify in communities. We discuss in this paper a crucial aspect which was not considered in previous studies which is the possible existence of correlations between space and attributes. Introducing a simple toy model in which both space and node attributes are considered, we discuss the effect of spaceattribute correlations on the results of various community detection methods proposed for spatial networks in this paper and in previous studies. When space is irrelevant, our model is equivalent to the stochastic block model which has been shown to display a detectabilitynon detectability transition. In the regime where space dominates the link formation process, most methods can fail to recover the communities, an effect which is particularly marked when spaceattributes correlations are strong. In this latter case, community detection methods which remove the spatial component of the network can miss a large part of the community structure and can lead to incorrect results.
Many networks are embedded in real space and there is a cost associated to the length of links. Examples of such spatial networks can be found in infrastructures such as power grids, distribution and logistic networks, transportation and mobility networks, and also in computer science or biology with the Internet and neuronal networks respectively (see for example the review
In spatial networks, each node is described by its coordinates (usually in a 2d space) but has in general other attributes. For individuals, it can be any cultural or socioeconomical parameter. For infrastructure networks such as power grids, it can be the voltage at the electric substations. In general, this attribute depends on space and the resulting network displays entangled layers of parameters. An important goal in the analysis of these networks is to disentangle these different levels and to extract some mesoscopic information from the spatial network structure. If one is interested in studying effects beyond space
A natural tool for such a task is community detection which was used for the characterization at a mesoscopic scale of the properties of complex networks (see
Community detection can have several purposes in spatial networks
In the general case, for a given network we don’t know to what extent the existence of a link between a pair of nodes is due to a specific factor or to space only. The link could exist because of a strong attribute affinity between the nodes, or in the other extreme case, because they are close neighbors. In general, one could expect a combination of these two effects. If we are interested in recovering communities defined by an attribute (such as language for example) from the network structure, we then have to consider various assumptions such as the correlation between link formation, attribute values and space. In order to understand the effect of the underlying correlations, we can consider two extreme cases. When the links are purely spatial and independent from the attributes, if we remove the spatial component, we will observe random communities (obtained for a random graph) which contain a random number of nodes with random attributes. In this situation, community detection is unapplicable and there is no way to recover attribute communities from the network structure. The other extreme case is when the formation of a link depends on the attributes only. In this case, space is irrelevant and any standard community detection method should give sensible results, ie. communities made of nodes with the same attribute.
The important problem of interest here is thus the intermediate case when the probability to have a link depends both on attributes and on space. In this case we have to eliminate spatial effects in order to recover the attribute structure. An important point in the discussion is then the existence of correlation between space and attributes. The nature and existence of these correlations will govern the way we will have to do community detection. In this paper, we construct a simple artificial network model allowing us to investigate the effect of these correlations on the results of the community detection procedure. We will test various methods on this toy model.
In order to test these ideas and how community detection acts on spatial networks, we define a simple model of spatial networks with attributes. The attributes could be anything and we will restrict  without loss of generality  to the simple binary case where the attributes can have two possible values at each node. We will introduce a simple model where nodes and their attributes are randomly distributed in space. In general, according to the various parameters of the model, the attributes can be delocalized in space or, on the contrary, be localized in some welldefined region. In some cases, some attribute community could emerge in space, but our target community structure will always be the partition of the network in the two subgraphs composed of nodes with the same attribute and we will test how various methods can recover these two communities. In this respect the main focus of our work will be the disentanglement of the sole attribute network features beyond the spatial node arrangements.
In the A panel, we display the case
Spatially correlated: ( 
• Links are between neighboring nodes but spatialcommunities correspond to the attribute ones.• Any regular community detection will work.  • Links are between nodes with the same attribute.• Any community detection method should work. 
Spatially uncorrelated: ( 
• Links are between neighboring nodes but theattributes are anywhere in space.• It is necessary to ‘remove’ space in orderto uncover the attribute communities.  • Links are between nodes with the same attribute.• Any community detection method should work. 
We construct the test (benchmark) network defining the vertex and edge properties in the following way.
1. We generate points/nodes in the 2
2. We assign an attribute
So the relevant parameters for the generation of network nodes are
3. We then construct the network: for each pair of nodes, we create a link between nodes
When
In this way the probability associated to a link depends on both space and attribute, and the correlation between attributed and space can be controlled. If the attribute is the same between two nodes the probability to have a link will be reinforced, otherwise it will be weakened, the interplay being controlled by the parameter
The generation of attributes is an important point. We have two values of the attribute only so that we need to generate attributes for only half (
Attributes and space uncorrelated: this case is recovered by choosing
Attributes and space are strongly correlated. For this, we choose
Furthermore we can distinguish two different spatial arrangements for the northern and southern communities. The first case corresponds to a situation where the two communities are well separated with their average size
There are many proposal in the literature for networks benchmarking (see for example
The interplay between space and attributes can lead to various situations that need to be understood within the framework of community detection. Indeed we have two main regimes
(
(
The color (red and green) are the attributes, while the geometrical shapes represent the community memberships found with the various community detection procedure discussed in this paper. In the A panel, we present the case
The general assumption of our model is to what extent it is possible to detect communities even if there is a spatial influence. Without space the initial situation is clear: we have two communities by construction and the probability of two nodes to be connected is related to the attribute similarities. Nodes with S = +1 tend mainly to connect to each other and the same for the S = −1 nodes. If we then put nodes in space and enhance the connection probability due to the proximity of nodes, it is not clear if a regular community detection method is able to detect the original two communities structure. We thus see that correlations between space and attributes can be misleading and any community detection method for spatial networks should take into account this problem. There are now many community detection methods
The modularity function which needs to be optimized is defined as
Each point represents the average Jaccard index for 100 network community detection and the error bar is its standard deviation. The correlated case
Expert et al.
We now need a method to compare the community structure obtained with the modularity optimization and the expected one for the attribute membership. Many proposals have been introduced
In order to get a more intuitive picture of the Jaccard index, we show three different cases in
Finally, in order to have a baseline value we also computed the average Jaccard for a completely random partition for
Each point represents the average Jaccard index for 100 network community detection and the error bar is its standard deviation. The correlated case
NewmanGirvan  Data  Spatial  
2*0.0 (correlated) 

VG  B  VG 

VG  B  G  
2* 0.5 (uncorrelated) 

B  VG  G 

B  G  G 
The goal of this spatial community detection is to substract the spatial component and to recover the (two) attribute communities. We thus have three community detection methods: the original NewmanGirvan method, the ‘Data’ method proposed in
This transition was described in
We show here the case
We will now see how these three different methods perform in the two extreme cases of attribute correlated (
The behavior of the model depends on both parameters
In this case, for
In the uncorrelated case (
In
In order to summarize these results we show in
We note that the behavior of the error bar sizes in these
We thus recover the results of
Finally, we checked the performances of the Data and Spatial formulations looking at the
In this paper we propose a simple model which allows us to test community detection on spatial networks. Our model generates simple graphs that mix both geographical properties and attributes. In the literature many other spatial network models have been introduced for which nodes are connected each other through a certain spatial rule. Examples range from the growth of street networks to the evolution of the territorial infrastructural networks (see
In particular, we explicitly show that the existence of correlations between attributes and space drastically affects the result of community detection. The results presented in this study show that community detection in spatial networks should be taken with great care, and that including space in community detection methods could lead to results difficult to interpret. We show that for weak correlations, most community detection methods work, but that for stronger correlation community detection methods which remove the spatial component of the network can lead to incorrect results. It is thus important to have some information on the correlations between space and attributes in order to assess the validity of the results of community detection methods. In practical applications however, these attributesspace correlations are generally not known and this calls for the need of new approaches, for example such as community detection methods including in some tunable form the existence of such correlations.
We thank Gianni Mula for enlightening conversations. MB thanks Linkalab and the Department of Physics of the University of Cagliari for their warm hospitality during the early stage of this work.