Predicting the connectivity of primate cortical networks from topological and spatial node properties
© Costa et al. 2007
Received: 23 August 2006
Accepted: 08 March 2007
Published: 08 March 2007
The organization of the connectivity between mammalian cortical areas has become a major subject of study, because of its important role in scaffolding the macroscopic aspects of animal behavior and intelligence. In this study we present a computational reconstruction approach to the problem of network organization, by considering the topological and spatial features of each area in the primate cerebral cortex as a basis for the reconstruction of the global cortical network connectivity. Starting with all areas being disconnected, pairs of areas with similar sets of features are linked together, in an attempt to recover the original network structure.
When primate cortical connectivity was inferred from the properties of the nodes, remarkably good reconstructions of the global network organization were obtained, with the topological features allowing slightly superior accuracy to the spatial ones. Analogous reconstruction attempts for the C. elegans neuronal network resulted in substantially poorer recovery, indicating that cortical area interconnections are more strongly related to the considered topological and spatial properties than neuronal projections in the nematode.
The close relationship between area-based features and global connectivity may hint at developmental rules and constraints for cortical networks. In particular, differences between the predictions from topological and spatial properties, together with the poorer recovery resulting from spatial properties, indicate that the organization of cortical networks is not entirely determined by spatial constraints.
Scientific and technological advances over the last decades have produced ever-increasing experimental knowledge about brain organization and dynamics. In particular, modern anatomical techniques have provided extensive data on the interconnections of cerebral cortical areas in the brains of animals such as the cat or rat, or non-human primates such as the rhesus monkey. The intricate, non-random connectivity of cortical brain regions mediates the diverse and flexible sensory, cognitive and behavioral functions of the mammalian brain. However, the topological organization of these networks as well as their spatial layout in the brain are still incompletely understood. This is particularly apparent for the connectivity of the human cerebral cortex, which is largely unknown, due to experimental limitations.
A fundamental open problem in systems neuroscience is the relationship between specialized features of local nodes, such as areas of the cerebral cortex, and the global interaction and integration of these nodes in the neural networks. One aspect of this relationship concerns the question of which features of the local nodes might allow the structural connectivity between them to be predicted.
We address this question with the help of network analysis approaches. Because cortical networks are typically complex, little insight can be obtained through their visualization alone. Therefore, useful objective and quantitative characterizations of complex networks ultimately rely on the estimation of a number of complementary measurements of their properties. Network measurements typically provide information about specific topological or geographical features of the networks. For instance, the node degree provides a simple and valuable quantification of the intensity of connections between a specific node and the rest of the network. However, it says nothing about the origin or destinations of such connections. On the other hand, the clustering coefficient of a node provides an objective quantification of the degree to which the immediate neighbors of a node (nodes which can be reached directly without involving any intermediate nodes) are interconnected, but provides no information about the rest of the network. Because of the specificity and complementariness of typical network measurements, an essential question arises regarding which subsets of measurements are more complete, in the sense of allowing accurate, or at least reasonably approximate, reconstruction of the original network from its respective topological or geographical measurements. Remarkably, this question has been little explored in the complex networks literature, apart from an initial foray in this area.
It is important to note that the problem of network reconstruction from topological features is in a sense circular. Such features are derived from the complete connectivity of the network, so global connectivity is inferred from information that already takes it into account. However, this is by no means a trivial task. For instance, guessing which nodes are specifically interconnected, based on measurements such as their degree or clustering coefficient, is almost invariably an impossible task. The exercise of trying to reconstruct the connections from a collection of topological measurements therefore provides an interesting new way to look at specific properties and the structural organization of a complex network. For instance, if the connectivity could be reasonably guessed from the node degree correlations, this would provide a key insight about its underlying organization.
We consider topological as well as spatial parameters, as biological networks, and brain networks in particular, are embedded in space. It is an interesting question to ask how the topological and spatial organization of these networks relate to each other. In particular, how do the topological and spatial features of individual nodes relate to the connectivity and layout of the whole network? Answers to these questions may inform current theories on the evolution and development of complex biological networks.
The Methods section of this article presents the adopted topological and spatial features and describes the reconstruction methodology based on similarity between sets of features. The analysis was applied to primate cortical brain connectivity (2,402 connections among 95 cortical areas of the Macaque monkey). In order to provide a comparative case, we also describe the application of the same methodology to C. elegans neuronal connectivity [Additional file 1].
Communities 1 and 2 were of comparable size and included $N_1 = 44$ and $N_2 = 51$ nodes, respectively, and $E_1 = 1326$ and $E_2 = 1280$ directed edges. The clustering coefficients obtained for the two identified communities were found to be equal to 0.52 and 0.68, respectively. This might be explained by the higher number of connections within communities. Whereas the global edge density of the network was 0.17, the densities within the communities were 0.50 and 0.66.
[Table: Correlations of topological node-based measures.]
[Table: Correlations of spatial node-based measures.]
Except for the strong correlation between the node degree and matching index, all other pairs of measurements were unremarkable, supporting the complementariness of the adopted sets of features.
[Table: Expected ratios of correct ones and zeroes, and respective geometrical average, for cortical communities 1 and 2.]
[Table: Network reconstruction from individual and combined topological node measures, including the feature combinations (1, 2, 3), (1, 2, 4), (1, 3, 4) and (2, 3, 4).]
[Table: Network reconstruction from individual and combined spatial node measures, including the feature combinations (5, 6, 7), (5, 6, 8), (5, 7, 8) and (6, 7, 8).]
[Table: Network reconstruction from combinations of topological and spatial node measures, including the feature combinations (2, 4, 5), (2, 4, 7), (2, 5, 7) and (4, 5, 7).]
Interestingly, a comparison between the adjacency matrices in Figures 6 and 7 immediately shows that the networks inferred on the basis of the topological properties measured at each node reproduced the original connectivity better than networks constructed from the spatial properties.
[Table: Comparison of network reconstructions with random benchmarks, including reconstructions from mixed features.]
[Table: Potential impact of unknown data, by version of matrix.]
We have explored the role of local topological and spatial features in determining cortical connectivity. Topological features had been analyzed before with a measure similar to the matching index used here as a predictor of primate visual cortex connectivity. Previous studies also applied the notion of neighborhood as a predictor of connectivity, which suggests that spatially close regions tend to be connected by fiber tracts. In this article, we have expanded such notions by testing the relative impact of several topological and spatial constraints on neural network organization.
In general, a small number of local features is sufficient for predicting connections between regions. In the case of the topological features, the matching index represented the most effective individual feature for reconstruction of both communities, while the best selection for community 2 also required the clustering coefficient. This result substantiates the particular role of this feature for cortical organization and means that cortical areas which have similar inputs and outputs also tend to be connected with each other. The best reconstructions from spatial features were obtained by considering the local density for both communities. The area size was also required for the best reconstruction of community 2. These results suggest that regions with similar local densities tend to connect to one another. In the case of community 2, region interconnections also appear to favor similar area size.
Concerning single features for the prediction of connections, topological features led to a better estimation than spatial features. This may be partly explained by the fact that topological node features by their definition are indirectly linked to global network organization, as mentioned previously. It is, however, surprising that the 'purest' spatial parameter (parameter 8: area coordinates, which expresses the proximity between areas) did not result in a strong prediction for connectivity, as spatial distance has been previously put forward as an important factor in primate cortical connectivity. This can be explained by the existence of a significant number of long-range connections in cortical networks, resulting from the fact that some regions are part of a network cluster but nonetheless spatially distant. In these cases, such as the frontal eye field being spatially distant from the rest of the visual cortex, spatial proximity would not predict a connection. Indeed, there exists a significant proportion of long-distance connections in biological neural networks, which ensures a low number of processing steps across these systems.
Since previous tract tracing studies have focused on the visual cortex, there might exist additional connections mainly within and between motor, auditory, and somatosensory cortices. As demonstrated for a smaller subgraph of the primate cortical network, our reconstruction approach could be used to guide future experimental studies, by deriving hypotheses about currently unknown projections which would be expected to exist or be absent. The analysis of different versions of this subgraph, with varying proportions of unknown connections assumed to exist, also demonstrated that the principal conclusions of this study do not depend on the number of currently unknown connections which may be discovered in the future.
An earlier analysis of the relationship between the surface size of cortical areas and the number of projections they send or receive found no significant correlation between these parameters. The present analysis suggests that area size may be a factor contributing to the prediction of connections after all (Results, section 'Comparison between original and reconstructed networks'). Thus, perhaps what matters is not the absolute area size, but the matched size of the connected regions.
For the feature analysis we transformed unidirectional projections into bidirectional connections. This resulted in 3,044 directed edges compared to the original 2,402 directed edges. This step was necessary as the reconstruction based on spatial distance depends on the Euclidean distance which is symmetric in both directions. It may be an interesting task for the future to repeat the topological analyses based on unidirectional measures.
The observed relationships between local node properties and global connectivity may hint at developmental rules. As the reconstruction approach worked well for the primate network, but not for the neuronal connections in C. elegans [see Additional file 1], it appears that the organization of neural networks is subject to different constraints in these two systems. One possibility is that the neuronal network of C. elegans, which is identical in each organism, could be largely determined by genetic factors which may prescribe a specific connectivity independent of simple topological or spatial rules. For larger neural systems, however, it may be impossible to encode the entire connectivity between cortical regions within the genome, resulting in a larger contribution of spatial and topological constraints in the self-organization of systems connectivity.
Although the connectivity in non-human primates such as the macaque monkey is relatively well known, there is still only little information available about human connectivity. New methods such as diffusion tensor imaging or post-mortem tract tracing are applied to human brains but are still hindered by severe experimental limitations. It is our hope that the topological and spatial features reported in this study may complement and steer the current experimental approaches. These features could provide a basis for assessing the reliability of fiber tract predictions that are based on non-invasive methods.
The reconstruction of neural connectivity from local node properties offers insights into constraints of network organization. In particular, it suggests that neuronal networks in C. elegans and neural networks in the primate cerebral cortex developed under different constraints, and that the layout of primate cortical brain networks is not entirely determined by spatial properties.
We analyzed the organization of 2,402 projections among 95 cortical areas and sub-areas of the primate (Macaque monkey) brain. The connectivity data were retrieved from CoCoMac [17, 18] and are based on three extensive neuroanatomical compilations [9, 19, 20] that collectively cover large parts of the cerebral cortex. In the database, reported projections between cortical areas are based on anatomical tract tracing studies where dyes were injected into one cortical area, and anterograde or retrograde transport of the dye indicated target or source areas for projection fibers. Spatial positions of cortical areas were estimated from surface parceling using the CARET software (http://brainmap.wustl.edu/caret). The spatial positions of areas were calculated as the average surface coordinate (or center of mass) of the three-dimensional extension of an area. While this cortical data set is more extensive than those used in previous studies, it may still be partially incomplete, particularly for connections of motor, auditory and somatosensory areas. The restriction arose from the fact that only studies could be used for which a parcellation scheme with spatial coordinates existed in CARET. In order to avoid potential artifacts associated with the segregation of the data in available reports, we first performed a community analysis of the cortical network and then analyzed the two identified communities separately (Results, section 'Overview and community analysis').
For comparison, we also analyzed two-dimensional spatial representations of the rostral neuronal network (131 neurons, 764 connections) of the nematode C. elegans [see Additional file 1]. Spatial two-dimensional positions (in the lateral plane), representing the position of the soma of individual neurons in C. elegans, were provided by Y. Choe . Neuronal connectivity was obtained from . The dataset was slightly modified as described in detail in .
The cortical as well as the C. elegans datasets are available at .
The connections between cortical regions can be represented and understood as a graph, e.g., [10, 24], or as a complex network. More specifically, the N = 95 cortical areas considered in this work are represented as nodes while the existing connections between such nodes are expressed in terms of edges. More formally, the cortical network is represented in terms of its adjacency matrix $K$, with dimension $N \times N$, with the presence of a connection extending from node $j$ to node $i$ being indicated as $K(i, j) = 1$. The adjacency matrix only gives information about whether a connection between two nodes exists in the network; in particular, it contains only topological information about the network and is not related to the colloquial meaning of adjacent as spatially nearby. A non-directed version $K_{non}$ of the adjacency matrix $K$ can be obtained as
$$K_{non}(i,j) = \max\{K(i,j),\, K(j,i)\},$$
that is, $K_{non}(i,j) = 1$ whenever a projection exists in at least one direction between nodes $i$ and $j$.
Because we also have information about the spatial position of each cortical region, it is possible to construct a distance matrix $D$ such that $D(i, j)$ represents the Euclidean distance between nodes $i$ and $j$. Note that both matrices $K_{non}$ and $D$ are symmetric by construction, i.e., $K_{non}(i,j) = K_{non}(j,i)$ and $D(i,j) = D(j,i)$ for any $i$ and $j$. It is possible to calculate a series of measurements from matrices $K$, $K_{non}$ and $D$ in order to characterize the topological and spatial properties of the original network. For these measurements, we used the symmetric topological adjacency matrix $K_{non}$ to be comparable with the symmetric spatial distance matrix $D$. While such measurements are often performed for the network as a whole, here we focus on local measurements obtained for each network node.
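As a minimal sketch of the two matrices just described, the following Python fragment builds a symmetric adjacency matrix and a Euclidean distance matrix for a toy example. The function names `symmetrize` and `distance_matrix`, the three-node matrix and the coordinates are illustrative assumptions and are not taken from the study; symmetrization is assumed to link two areas whenever a projection exists in at least one direction.

```python
import math

def symmetrize(K):
    """Non-directed adjacency matrix: 1 if a projection exists in either direction."""
    n = len(K)
    return [[1 if (K[i][j] or K[j][i]) else 0 for j in range(n)]
            for i in range(n)]

def distance_matrix(coords):
    """Euclidean distance between the centers of mass of every pair of areas."""
    return [[math.dist(p, q) for q in coords] for p in coords]

# Toy directed network with three nodes (hypothetical data).
K = [[0, 1, 0],
     [0, 0, 0],
     [1, 0, 0]]
K_non = symmetrize(K)

# Hypothetical (x, y, z) coordinates of the three nodes.
D = distance_matrix([(0, 0, 0), (3, 4, 0), (0, 0, 2)])
```

Both resulting matrices are symmetric, which is the property that makes the topological and spatial measurements directly comparable.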
The following 8 node-based measurements (4 topological and 4 spatial) were considered in the analysis.
The node degree is a simple but informative measurement that quantifies the number of edges attached to a node. In the case of non-directed networks, the node degree $k_i$ of node $i$ can be calculated from the respective adjacency matrix as:
$$k_i = \sum_{j=1}^{N} K_{non}(i,j)$$
Note that the node degree provides a direct measurement of the degree to which the specific node is connected to the rest of the network.
Given a subset $S$ of the network nodes, the clustering coefficient of this set can be defined as the ratio between the number of edges between the elements of $S$ and the maximum possible number of such connections. Therefore, the clustering coefficient of a specific node $i$ can be more formally defined as
$$CC_i = \frac{2\,E(W_i)}{|W_i|\,(|W_i| - 1)}$$
where $W_i$ is the set containing the immediate neighbors of $i$, $E(W_i)$ is the number of edges between such neighbors, and $|W_i|$ is the number of elements in the set $W_i$. The clustering coefficient of node $i$ therefore expresses how intensely the neighbors of the reference node are interconnected through direct connections. Note that $0 \le CC_i \le 1$.
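The node degree and the clustering coefficient of a node can be computed directly from a symmetric adjacency matrix. The sketch below assumes a non-directed 0/1 matrix stored as a list of lists; the function names and the four-node example are hypothetical, and returning 0 for nodes with fewer than two neighbors is a convention chosen here, not one stated in the text.

```python
def node_degree(K_non, i):
    """Number of edges attached to node i in a non-directed network."""
    return sum(K_non[i])

def clustering_coefficient(K_non, i):
    """Edges among the neighbors of i divided by the maximum possible number."""
    neighbors = [j for j, linked in enumerate(K_non[i]) if linked and j != i]
    m = len(neighbors)
    if m < 2:
        return 0.0  # assumption: coefficient set to 0 when it is undefined
    edges = sum(K_non[a][b]
                for pos, a in enumerate(neighbors)
                for b in neighbors[pos + 1:])
    return 2.0 * edges / (m * (m - 1))

# Hypothetical example: a triangle (0, 1, 2) plus node 3 attached to node 0.
K_non = [[0, 1, 1, 1],
         [1, 0, 1, 0],
         [1, 1, 0, 0],
         [1, 0, 0, 0]]

degree_0 = node_degree(K_non, 0)
cc_0 = clustering_coefficient(K_non, 0)
cc_1 = clustering_coefficient(K_non, 1)
```

In this example node 0 has three neighbors, of which only one pair is linked, so its coefficient is one third, whereas the two neighbors of node 1 are linked and its coefficient is one.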
Given any two nodes $i$ and $j$ of a network, they are said to be connected in case there is a sequence of edges extending from one of those nodes to the other, possibly passing through several relay nodes. The shortest path $s_{i,j}$ between the two nodes $i$ and $j$ corresponds to the path involving the smallest sum of involved edge segments. The shortest path distance is defined as being equal to the respective sum of edge segments. Note that the shortest path may not be unique, but all such paths will have the same shortest path distance. Because the shortest path is defined with respect to a pair of nodes, and we want to assign a related measurement to each node $i$, we henceforth consider the average of the shortest paths between $i$ and all the other network nodes as a feature of node $i$, represented as $s_i$. Note that all nodes in the cortical network are connected.
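For an unweighted network, the average shortest path distance of a node can be obtained with a breadth-first search. The following is a minimal sketch that assumes a connected, non-directed 0/1 adjacency matrix in which every edge counts as one segment; the function name and the four-node chain are hypothetical.

```python
from collections import deque

def average_shortest_path(K_non, i):
    """Mean shortest path distance from node i to all other reachable nodes."""
    n = len(K_non)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if K_non[u][v] and v not in dist:
                dist[v] = dist[u] + 1  # each edge counts as one segment
                queue.append(v)
    others = [d for node, d in dist.items() if node != i]
    # Assumes the network is connected, so that `others` is not empty.
    return sum(others) / len(others)

# Hypothetical chain 0 - 1 - 2 - 3.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]

s_0 = average_shortest_path(chain, 0)
s_1 = average_shortest_path(chain, 1)
```

End nodes of the chain obtain a larger average distance than interior nodes, which illustrates how this feature reflects the centrality of a node.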
The matching index, introduced in [26, 27], applies to any pair of nodes $i$ and $j$ (connected or not) and can be conceptually defined as the amount of connectivity overlap between each of those nodes and the remainder of the network. More specifically, in the case of non-directed networks, $|a_i \wedge a_j|$ is the number of common projections that occur in node $i$ as well as in node $j$, denoting the number of common target or source nodes for projection fibers. The total number of connections that occur in node $i$, in node $j$, or in both nodes is denoted as $|a_i \vee a_j|$. The matching index is then calculated as:
$$MI_{i,j} = \frac{|a_i \wedge a_j|}{|a_i \vee a_j|}$$
A low matching index value indicates that the nodes have diverging input and output and are linked to substantially different parts of the network. As with the shortest path, the matching indices between a node and all other nodes are averaged to obtain a feature of that node.
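A minimal sketch of the matching index and its node-based average follows. It assumes a non-directed 0/1 adjacency matrix and, as a convention chosen here rather than one stated in the text, leaves the two nodes being compared out of the counted neighbors; the function names and the example network are hypothetical.

```python
def matching_index(K_non, i, j):
    """Shared neighbors of i and j divided by all neighbors of i or j."""
    common = total = 0
    for k in range(len(K_non)):
        if k == i or k == j:
            continue  # assumption: the compared nodes themselves are excluded
        a, b = K_non[i][k], K_non[j][k]
        if a and b:
            common += 1
        if a or b:
            total += 1
    return common / total if total else 0.0

def mean_matching_index(K_non, i):
    """Average matching index between node i and every other node."""
    others = [j for j in range(len(K_non)) if j != i]
    return sum(matching_index(K_non, i, j) for j in others) / len(others)

# Hypothetical example: a triangle (0, 1, 2) plus node 3 attached to node 0.
K_non = [[0, 1, 1, 1],
         [1, 0, 1, 0],
         [1, 1, 0, 0],
         [1, 0, 0, 0]]

mi_12 = matching_index(K_non, 1, 2)
mi_01 = matching_index(K_non, 0, 1)
mi_node_0 = mean_matching_index(K_non, 0)
```

Nodes 1 and 2 share their only remaining neighbor and therefore match perfectly, while node 0 overlaps only partly with node 1.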
It is often the case with point distributions (such as the centers of mass of the cortical areas) that the number of points per unit volume varies across space. In such cases, it is interesting to consider the local density around each point. This value can be estimated by dividing the number $P_i(R)$ of neighboring points contained in a sphere of small radius $R$ centered at the reference point $i$ by the volume of that sphere, i.e.
$$\rho_i = \frac{P_i(R)}{\frac{4}{3}\pi R^3}$$
The quantity $P_i(R)$ has been calculated with respect to each node $i$ by counting how many nodes are at a distance smaller than or equal to $R = 15$, which corresponds to about 1/4 of the maximum internode distances in the cortical networks. Note that this measurement is influenced by the volume of each cortical region. The larger the volume, the smaller the local density.
Given a reference point $i$ and a maximum radius $R$, the nearest neighbors $Q$ of that point can be defined as those points which are contained in the sphere of radius $R$ centered at point $i$. The measurements in this work assumed $R = 15$. The nearest distances of point $i$ are therefore defined as the set of the Euclidean distances between it and each of the nearest neighboring nodes in $Q$. The coefficient of variation (i.e. the standard deviation divided by the average) of the nearest distances provides an interesting indication about the local distance regularity around each reference point. For instance, a low value of this coefficient indicates that the nearest neighbors of a point are almost equidistant. As with the previous measurement, the coefficient of variation of the nearest distances can also be affected by the volume of the cortical region. More specifically, the larger the volume, the larger this measurement tends to be.
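The two neighborhood-based spatial measurements, local density and coefficient of variation of the nearest distances, can be sketched together, since both rely on the points within radius R of a reference point. The function names and coordinates below are hypothetical; excluding the reference point itself from the count and using the population standard deviation are assumptions made for this illustration.

```python
import math

def nearest_distances(coords, i, R=15.0):
    """Distances from point i to all other points lying within radius R."""
    dists = (math.dist(coords[i], q) for j, q in enumerate(coords) if j != i)
    return [d for d in dists if d <= R]

def local_density(coords, i, R=15.0):
    """Number of neighbors within radius R divided by the sphere volume."""
    volume = 4.0 / 3.0 * math.pi * R ** 3
    return len(nearest_distances(coords, i, R)) / volume

def cv_nearest_distances(coords, i, R=15.0):
    """Standard deviation of the nearest distances divided by their mean."""
    d = nearest_distances(coords, i, R)
    if len(d) < 2:
        return 0.0  # assumption: no variation defined for fewer than two neighbors
    mean = sum(d) / len(d)
    std = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
    return std / mean

# Hypothetical centers of mass; the last point lies far outside the radius.
coords = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (100, 0, 0)]

density_0 = local_density(coords, 0)
cv_0 = cv_nearest_distances(coords, 0)
```

For the first point, two neighbors at distances 3 and 4 fall inside the sphere, giving a coefficient of variation of 0.5 / 3.5.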
This measurement corresponds simply to the area size of the two-dimensional surface of each cortical region. The surface area was measured directly within three-dimensional space; that means, we did not use a flattened two-dimensional map to estimate the surface extent of a cortical region.
These features, considered together for simplicity's sake, correspond to the x, y and z coordinates of the center of mass of each cortical area. By applying the wiring rule described below to this feature, network nodes that are spatially close to each other were linked.
[Table: Node-based characterization measures used for network reconstruction (rows include avg. shortest path distance, standard deviation of nearest distances, and Cartesian coordinates (x, y, z)).]
Hypothetical cortical networks were created by assessing the pairwise similarity of nodes with respect to each of the eight features. Undirected links were created between nodes if their similarity exceeded a threshold. In order to avoid the need to specify this threshold, we considered a sequence of equally spaced thresholds during the reconstruction and took as the result the threshold leading to the best recovery of the original connectivity.
The topological and spatial context around each node $i$ in the two communities can be characterized in terms of a respective feature vector containing a subset of selected measurements at that node. In this work, we consider combinations of one, two and three of the eight measurements described in the section 'Network characterization indices' above. In order to avoid bias implied by the different ranges of each measurement, these values have been standardized. More specifically, for each type of measurement, the respective average was subtracted from each value and the result was divided by the respective standard deviation. Note that these new measurements have zero average and unit standard deviation.
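The standardization step can be sketched as follows. The function name `standardize` and the sample values are hypothetical, and the population standard deviation is assumed, since the text does not state which estimator was used.

```python
import math

def standardize(values):
    """Subtract the mean and divide by the standard deviation (z-scores)."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population standard deviation (division by n).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

# Hypothetical values of one measurement across three nodes.
z = standardize([2.0, 4.0, 6.0])
```

After this transformation every measurement contributes on the same scale when feature vectors are compared. A measurement that is constant across all nodes has zero standard deviation and would need to be handled separately.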
It is now possible to use the methodology suggested in [29, 30] in order to obtain networks from the feature vectors, where each element $v_i(p)$ of the feature vector of node $i$ holds one of the chosen measurements. We start with N = 95 isolated nodes and, for each possible pair of nodes $(i, j)$, establish a connection between them, by setting $K(i,j) = 1$ and $K(j,i) = 1$, whenever the dissimilarity between the feature vectors of $i$ and $j$, accumulated over the $n$ chosen measurements, does not exceed a pre-fixed threshold $T$. Note that the smaller the value of this threshold, the less densely connected the resulting network will be.
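The wiring rule can be sketched as follows, under the assumption that the dissimilarity between two nodes is the Euclidean distance between their standardized feature vectors; this choice of norm, the function name `reconstruct` and the example vectors are assumptions made for illustration only.

```python
import math

def reconstruct(features, T):
    """Link every pair of nodes whose feature vectors differ by at most T."""
    n = len(features)
    K = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: Euclidean distance over the n chosen measurements.
            if math.dist(features[i], features[j]) <= T:
                K[i][j] = K[j][i] = 1
    return K

# Hypothetical two-dimensional feature vectors for three nodes.
features = [(0.0, 0.0), (0.3, 0.4), (3.0, 4.0)]

K_small = reconstruct(features, 1.0)   # only the two similar nodes are linked
K_large = reconstruct(features, 10.0)  # every pair is linked
```

Increasing the threshold can only add connections, which is why a sweep over equally spaced thresholds moves from an empty network towards a fully connected one.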
Because the reconstructed networks are fully congruent with the original data, in the sense that they have the same number of nodes and each node refers to the same cortical region, it is possible to obtain a simple and effective measurement of the difference between the original network $G$ and each of the networks $F$ obtained from the topological and spatial features in terms of a distance defined as being equal to the number of different entries in the respective adjacency matrices. More formally, we have that
$$d(G, F) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ 1 - \delta\big(K_G(i,j),\, K_F(i,j)\big) \right]$$
where $K_G$ and $K_F$ are the adjacency matrices of the original and reconstructed networks, and $\delta(a, b)$ is the Kronecker delta function, which yields 1 whenever $a$ and $b$ are equal and 0 otherwise. The 1/2 factor is necessary in order to account for the fact that in a non-directed graph each edge appears twice in the respective adjacency matrix.
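Counting the differing entries of two adjacency matrices and halving the count takes only a few lines. The function name and the two three-node matrices below are hypothetical.

```python
def hamming_distance(K_G, K_F):
    """Half the number of entries in which two adjacency matrices differ."""
    n = len(K_G)
    differing = sum(1 for i in range(n) for j in range(n)
                    if K_G[i][j] != K_F[i][j])
    return differing / 2  # each non-directed edge appears twice

# Hypothetical original and reconstructed networks differing in one edge.
K_G = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
K_F = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]

d = hamming_distance(K_G, K_F)
```

Here the reconstruction misses the edge between nodes 0 and 2, which changes two matrix entries and yields a distance of one edge.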
However, this measure, the Hamming distance, provides a biased quantification of the similarity between any two matrices in case the numbers of zeros and ones are significantly different. For instance, in case a matrix contains few ones and many zeros, its Hamming distance to a null matrix (all entries equal to zero) will be very small. In order to provide a more balanced overall measurement of the similarity between two adjacency matrices $A$ and $B$, both with the same dimension $N \times N$, we consider the geometrical average $g(A, B)$ between the ratios of correct ones ($R_1$) and correct zeros ($R_0$). More specifically, in case matrix $A$ contains $A_0$ zeroes and $A_1$ ones, and matrix $B$ contains $b_0$ zeroes coinciding with the zeroes of $A$ and $b_1$ coinciding ones, we define $R_1 = b_1/A_1$ and $R_0 = b_0/A_0$. The two matrices will be maximally similar in case $g(A, B) = \sqrt{R_1 R_0} = 1$, which is verified if and only if $R_1 = R_0 = 1$.
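The balanced similarity measure can be sketched as below. The function name `similarity` and the example matrices are hypothetical; entries are counted over the whole N × N matrix, and the reference matrix is assumed to contain at least one zero and at least one one.

```python
import math

def similarity(A, B):
    """Ratios of correct ones and zeros of B relative to A, and their geometric mean."""
    n = len(A)
    cells = [(A[i][j], B[i][j]) for i in range(n) for j in range(n)]
    A1 = sum(1 for a, _ in cells if a == 1)   # ones in the reference matrix
    A0 = len(cells) - A1                      # zeros in the reference matrix
    b1 = sum(1 for a, b in cells if a == 1 and b == 1)
    b0 = sum(1 for a, b in cells if a == 0 and b == 0)
    R1, R0 = b1 / A1, b0 / A0                 # assumes A1 > 0 and A0 > 0
    return R1, R0, math.sqrt(R1 * R0)

# Hypothetical reference network, a partial reconstruction, and a null matrix.
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
B = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]
null = [[0] * 3 for _ in range(3)]

R1, R0, g = similarity(A, B)
g_null = similarity(A, null)[2]
```

The null matrix recovers every zero but none of the ones, so its geometric average is zero even though its Hamming distance to a sparse reference would be small.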
It is possible to obtain a random reference for the comparison between any two adjacency matrices $A$ and $B$ as follows. Let the ratio of ones in $A$ be $r_1 = A_1/N^2$ and the ratio of zeroes be $r_0 = A_0/N^2$. It can be shown that the average expected ratios of correct ones and zeros while comparing matrix $A$ with matrices $B$ generated randomly (uniform probability) with the same ratio $r_1$ of ones are given as $R_1 = r_1$ and $R_0 = r_0$.
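The random reference can be motivated by a short expectation argument, sketched here under the assumption that each entry of $B$ is set to one independently with probability $r_1$, and using $A_0 + A_1 = N^2$ so that $1 - r_1 = r_0$:

```latex
\begin{align*}
E[b_1] &= \sum_{(i,j):\,A(i,j)=1} P\big(B(i,j)=1\big) = A_1\, r_1
  &&\Rightarrow\quad E[R_1] = \frac{E[b_1]}{A_1} = r_1, \\
E[b_0] &= \sum_{(i,j):\,A(i,j)=0} P\big(B(i,j)=0\big) = A_0\,(1 - r_1) = A_0\, r_0
  &&\Rightarrow\quad E[R_0] = \frac{E[b_0]}{A_0} = r_0.
\end{align*}
```

A reconstruction is therefore informative only to the extent that its ratios of correct ones and zeros exceed $r_1$ and $r_0$, respectively.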
These comparisons with the original connectivity and random benchmarks were applied to all adjacency matrices reconstructed from individual and combined node features.
Luciano da F. Costa thanks FAPESP (05/00587-5) and CNPq (308231/03-1) for sponsorship. Marcus Kaiser acknowledges support from EPSRC (EP/E002331/1).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.