The post-genomic era of biological network alignment

Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches’ biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.


Introduction
Bioinformatics research has revolutionized our understanding of cellular functioning. The field has opened avenues to unveil complex biological mechanisms and their connections to disease. Genomic sequence alignment, in particular, has improved our biomedical knowledge by finding sequence regions of similarities between genes in different species, where the regions likely reflect functional and evolutionary relationships between the sequences [1][2][3][4]. However, genes or their protein products do not function in isolation; rather, they carry out cellular processes by interacting with each other. This is what biological networks model, such as protein-protein interaction (PPI), gene regulatory, or metabolic networks. In a biological network, nodes represent biomolecules (such as genes or proteins), and edges represent physical or functional interactions between the biomolecules (such as PPIs). For simplicity, henceforth, we use terms "gene" for studying disease [22,23]. Importantly, many crucial biological processes and diseases in human are hard to study experimentally, and hence, the corresponding knowledge needs to be transferred from model species [24][25][26][27][28][29][30]. Human aging is an example of such a biological process. Because susceptibility to many prevalent diseases increases with age, studying human aging could aid therapeutics. Yet, human aging is hard to study experimentally due to long human lifespan as well as ethical constraints. Thus, aging-related knowledge in human needs to be obtained computationally, by transferring experimentally obtained aging-related knowledge from model species to human. Traditionally, this transfer has relied on genomic sequence alignment [31]. However, biological network data and genomic sequence data can give complementary biological insights [32][33][34][35], implying that analyses of network data can elucidate functional knowledge that cannot be extracted from sequence data by current methods. Thus, restricting alignment to sequences may limit the knowledge transfer [32][33][34][35][36]. For example, ∼20 % of aging-related genes in model species do not have sequence-based orthologs in human [37]. And while sequence alignment can thus not transfer this knowledge between the species, network alignment can be used to identify network-based functional orthologs across the species and thus further our knowledge of aging. Similar holds for many other biological processes and diseases.
In addition to across-species transfer of functional knowledge discussed above, just as sequence alignment, network alignment can also be used to infer phylogenetic relationships of different species based on similarities between their biological networks [38][39][40]. We note that in the biomedical domain, network alignment has mostly been used in the context of PPI networks. However, the problem is applicable to other types of biological networks, such as gene co-expression networks [41]. Further, network alignment has applications outside of the biomedical domain [42], with implications on, e.g., user privacy in online social networks [43].
Unlike the computationally tractable "linear" sequence alignment, exact alignment of large networks, such as biological ones, is computationally intractable due to the nondeterministic polynomial time (NP)-completeness of the underlying subgraph isomorphism problem, which asks if a network exists as an exact subgraph of another network [44]. Therefore, efficient heuristic approaches need to be sought to solve the network alignment problem approximately.
Similar to sequence alignment, network alignment approaches (or network aligners) can be local and global. Local network alignment aims to find smaller network regions, such as biological pathways or protein complexes, which are highly conserved between larger input networks ( Fig. 1a; for a more formal description, see the following sections). Initial network alignment efforts have focused on local alignment [45][46][47][48][49][50][51][52][53][54]. However, local aligners are generally not capable of finding large subgraphs that are topologically and functionally conserved between input networks. Therefore, most of the recent efforts have focused on global network alignment [18, 19, 25, 38-40, 43, 55-82], which typically aims to map well (almost) entire networks to each other. As such, global alignment is typically capable of finding large subgraphs that are conserved between input networks but at potential expense of suboptimally matching local network regions ( Fig. 1b; for a more formal description, see the following sections).
A network alignment approach can also be categorized as either pairwise or multiple, based on how many networks it can align at once. Pairwise network alignment aligns two networks at a time (Fig. 2a), whereas multiple network alignment can align more than two networks at the same time (Fig. 2b).
There exists an additional categorization of network alignment approaches into one-to-one or many-to-many  methods. One-to-one network alignment produces oneto-one (or injective) node mapping, where a node from a given network can be mapped to at most one unique node from another network (Fig. 2a). On the other hand, many-to-many network alignment produces many-tomany node mapping, where a node from a given network can be mapped to several nodes from another network (Fig. 2b). We note that there also exists an approach that produces one-to-many node mapping, meaning that it maps a node from a given network to multiple nodes from another network, while a node from the later network can be mapped to at most one node from the former network [83].
To date, all local aligners have been of the many-tomany type, while global aligners have been of both one-toone and many-to-many types. Further, one-to-one global aligners have traditionally been associated with pairwise alignment. In this context, nodes in the smaller of the two aligned networks are injectively mapped to nodes in the larger network, thus resulting in aligned node pairs (Fig. 2a). Similarly, many-to-many global aligners have traditionally been associated with multiple alignment. In this context, the output is aligned node clusters rather than pairs, where each cluster can contain multiple nodes from the same network (Fig. 2b). Recently, "hybrid" approaches have appeared, such as one-to-one alignment of multiple networks [75,77,78]. In this case, an aligned node cluster can contain at most one node from each of the aligned networks, and each node can appear in at most one aligned cluster. Table 1 categorizes some of the most prominent network alignment approaches as either local or global, pairwise or multiple, and one-to-one or manyto-many. The approaches are discussed in more detail in the following sections.
A general algorithmic idea behind network alignment approaches is to compute similarities between nodes in different networks with respect to some cost function and rapidly identify from all possible alignments a high-scoring alignment with respect to the node similarities. Many of the existing network alignment algorithms use within their node cost function biological information external to network topology, such as protein sequence similarities. However, to extract the most from each source of biological information, it would be good to know how much of new biological knowledge can be uncovered solely from topology before integrating it with other sources of biological information [39,40,54,60,69,72,73]. Only after methods for topological network alignment are developed that result in alignments of good topological and biological quality, it is beneficial to integrate them with other biological (e.g., sequence) data to further improve the quality.
We note that network alignment is one possible type of network comparison. There also exists alignment-free network comparison. Approaches of this type "simply" aim to quantify the similarity between different networks by comparing their overall topological properties (e.g., degree distributions or graphlet-based properties), ignoring in the process node correspondence between the networks and without aiming to identify conserved edges or subgraphs [84][85][86][87][88]. On the other hand, network alignment aims to find good node correspondence between networks that leads to highly similar conserved network regions. Thus, network alignment and alignment-free network comparisons have very different goals. Our focus is on network alignment.
Also, there is another type of network comparison, called network querying, which is more related to network alignment. Network querying typically evaluates whether a small query subnetwork (e.g., a simple path or a treelike structure) exists as an exact subgraph of a larger target network, and if so, it identifies such a query-target subnetwork match. Prominent approaches of this type adopt a "color-coding" idea and ensure a confidence level of the resulting subnetwork match [89][90][91][92]. Unlike network querying, network alignment compares networks of Table 1 Overview of prominent network alignment approaches arbitrary sizes, and also, it typically searches for an inexact subnetwork match between networks. Again, our focus is on network alignment.
In the following sections, we discuss the different types of existing prominent network alignment approaches. After we contrast the different approaches, we discuss existing measures that are used to evaluate alignment quality of the approaches. Next, we discuss very recent innovative directions that question the traditional view of the network alignment problem. Further, we discuss key biological applications of network alignment. Finally, we present open research questions in comparative biological network research that are expected to enhance personalized health care via improved understanding of cellular functioning, disease, and therapeutics.

Local network alignment Approaches for local network alignment
We first discuss pairwise and then multiple local network aligners.
PathBLAST [93] aligns two PPI networks to identify their conserved pathways. An alignment graph is first built, in which a node represents a pair of putative orthologs (one from each network), and an edge represents a conserved interaction. Highest-scoring paths are then searched for through the alignment graph with a dynamic programming approach based on the degree of protein sequence similarity and the interaction quality. In the process, gaps and mismatches are allowed to account for evolution variations and experimental errors in pathway structure.
NetworkBLAST [46] is an extension of PathBLAST that aims to identify not just simple linear pathways (as Path-BLAST does) but also more complex network structures, e.g., dense functional modules or protein complexes. It does so by identifying high-scoring seeds in the alignment graph (which is similar to PathBLAST's) and extending around the seeds in a greedy fashion.
MaWISh [48] is a pairwise local aligner modeling evolution (conservation and divergence) of protein interactions. In the alignment graph, evolutionary information is encoded into edge weights through the concepts of matches, mismatches, and duplication. MaW-ISh addresses network alignment as a maximum weight induced subgraph problem. Intuitively, it greedily grows a subgraph starting from a seed with maximum gain with respect to the cost function; a bad move (negative gain) is allowed in order to bypass a poor local optimum.
NetAligner [54] is a pairwise aligner featuring pathway-to-interactome, complex-to-interactome, and interactome-to-interactome alignments. Also, it is able to perform both inter-and intra-species alignment of networks of arbitrary topology. NetAligner constructs an initial alignment graph and searches for connected components in this graph to be used as seeds. Then, nodes from different seeds, which are disconnected in the initial alignment graph as they do not conserve edges, are now connected if they conserve indirect interactions (i.e., three-node paths) in one or both input networks. Finally, NetAligner searches again for connected components in such extended alignment graph, which are its output.
AlignNemo [52] is a recent pairwise local aligner capable of handling sparse network data. It first uses the concept of a weighted alignment graph, in which nodes represent pairs of orthologs from different species (just as in the previous methods' alignment graphs), but edges are now weighted via a scoring strategy that accounts not only for directly conserved interactions but also for indirect interactions. So, the more paths connecting the two nodes and the more paths going through both nodes, the greater the edge weight. Then, a seed-and-extend strategy is used on the alignment graph to find relatively dense groups of nodes (i.e., proteins that have more interactions among themselves than with the rest of the network), which are AlignNemo's output.
AlignMCL [53] is another recent pairwise local aligner that is robust to the choice of networks to be aligned. It is based on Markov clustering (MCL), a known graph clustering algorithm that simulates random walks using Markov chains iteratively. AlignMCL first builds a weighted alignment graph the same way as AlignNemo. Next, it applies MCL to this graph to identify conserved protein modules.
Graemlin 1.0 [47] is an early multiple aligner. Based on a phylogenetic tree of species whose networks are being aligned, it uses a "progressive alignment" strategy by performing successively pairwise alignments of the closest network pairs. It first finds with a seed-and-extend strategy a pairwise alignment of the two closest species based on their phylogenetic relationship. Then, it transforms the resulting alignment together with unaligned nodes from the two networks into a new network for use in the next phase of the progressive alignment.
NetworkBLAST-M [94] is also a multiple aligner. It works with a novel representation of multiple networks, a layered alignment graph, in which each layer corresponds to a network and putative orthologs from different layers which are connected by inter-layer edges. NetworkBLAST-M then uses a seed-and-extend strategy to identify a high-scoring alignment from the layered alignment graph. Seeds come from a set of k-spines (a k-spine is a connected subgraph of size k with each node coming from a different layer) generated based on either identical topologies or underlying phylogeny. NetworkBLAST-M performs an expansion around the seed by iteratively adding to the alignment a k-spine that contributes the most to the current score, until no k-spine can be added or the alignment size exceeds the limit.

Alignment quality measures for local network alignment
A variety of approaches have been used to assess the quality of a local alignment, which is the set of conserved network modules. Generally, the resulting modules are compared against known protein complexes or other functional units to evaluate their overlap. Then, intuitively, the more conserved network modules there are that have large overlap with known protein complexes, the better the alignment quality. Popular biological alignment quality measures in the context of local network alignment, which quantify this notion of "network moduleprotein complex" overlap, are discussed below. We note that evaluation of topological alignment quality (e.g., of the amount of edges that are conserved by the alignment) is not common when it comes to local network alignment. This is because local aligners result in manyto-many node mappings, whereas edge conservation is typically defined with one-to-one node mapping in mind, and thus, it is not clear how to measure topological alignment quality of many-to-many aligners [38]. Plus, local network aligners are more biologically motivated, meaning that they are aimed at mapping protein complexes across networks, whereas global aligners are more mathematically motivated, aiming to solve a modification of the subgraph isomorphism problem.
1. Gene ontology (GO) [95] semantic similarity. GO semantic similarity aims to assess to what extent the mapped (i.e., conserved) network modules from different species are functionally related. First, GO semantic similarity is computed between each two proteins in the given module. This can be done in many ways, e.g., by averaging GO semantic similarities across the proteins' associated GO term pairs [52,53]. Then, semantic similarity of an entire module can be computed by averaging the resulting similarities over all mapped protein pairs within the module. Finally, the score of an alignment can be computed by summarizing the results over all mapped network modules, thus assigning a single GO semantic similarity score to the alignment. The higher the GO semantic similarity, the better the alignment quality. This measure was used by AlignNemo [52] and AlignMCL [53]. 2. Detection of known complexes. Given a conserved module produced by a local aligner and a known protein complex, precision is the percentage of proteins in the conserved network module that are also present in the protein complex, recall is the percentage of proteins in the protein complex that are also in the network module, and F -score is the harmonic mean of precision and recall aiming to reconcile the two mutually contradicting measures [52,54]. Then, the statistics can be summarized over all modules in the alignment. The higher the values of these measures, the better the alignment quality. These measures were used by NetAligner [54], AlignNemo [52], and AlignMCL [53]. 3. Specificity and sensitivity. For each species, specificity is the percentage of functionally coherent conserved modules, i.e., modules that are statistically significantly enriched in a given GO term, out of all conserved modules. The statistical significance of the enrichment is typically done with respect to the hypergeometric test, corrected for multiple hypothesis testing [96]. Sensitivity is the number of distinct GO terms that are statistically significantly enriched in the given local alignment, i.e., in its conserved network modules. The higher the values of these two measures, the better the alignment quality. These measures were used by NetworkBLAST [94], NetworkBLAST-M [94], and Graemlin 1.0 [47].

Summary of local network aligners: which one to use?
Of the pairwise local aligners, PathBLAST [93], Network-BLAST [46], and MaWISh [48] are among the earliest algorithms. When applied to real-world PPI networks, these pioneering aligners revealed existing as well as novel functional modules, many of which would not have been identified from sequence alignment alone. Path-BLAST has been obsolete for a while now and has not been evaluated against any recent local network aligner. NetAligner was shown to be better than NetworkBLAST when identifying known functional modules in terms of precision and recall (Fig. 3a), perhaps due to NetAligner being able to handle sparse complexes, while Net-workBLAST could only identify dense complexes [54]. AlignNemo was shown to outperform both MaWISh and NetworkBLAST when detecting protein complexes, while performing comparably to NetAligner (Fig. 3b) [52]. AlignMCL was shown to generate more of high-quality conserved network modules that match known complexes well compared to NetAligner and MaWISh (Fig. 3c) [53]. Unlike the above pairwise approaches, Graemlin 1.0 [47] and NetworkBLAST-M [94] are capable of aligning multiple networks. When compared against each other, NetworkBLAST-M was shown to outperform Graemlin 1.0 in terms of specificity and sensitivity ( Table 2) [94].
The above results are overall conclusions. Importantly, the superiority of an aligner also depends on the choice of the network data (e.g., synthetic versus real-world networks, binary Y2H versus co-complex AP/MS networks, etc.) as well as alignment quality measure and evaluation framework. Therefore, we recommend a new pairwise local aligner to be compared against AlignNemo and AlignMCL and a new multiple local aligner to be compared against NetworkBLAST-M.

Global network alignment Approaches for global network alignment
We classify prominent global network aligners into three groups according to their algorithmic design. The first group of methods employ a two-step approach: 1) use a cost function to compute pairwise similarities between nodes in different networks and 2) use an alignment strategy to rapidly identify from all possible alignments the highest-scoring alignment with respect to the total similarity over all aligned nodes [38-40, 55, 57, 59, 60, 63, 67, 74-76, 78]. Although a typical cost function aims to compute topological similarities between nodes, most of the global network aligners allow for the integration of sequence information into the node cost function. A typical alignment strategy uses the precomputed node similarity matrix resulting from the above step 1 to iteratively produce an alignment, and it does not allow for updating the matrix while creating an alignment. However, nodes that are already aligned at a given iteration of the alignment strategy might convey valuable information for guiding the remaining iterations of the strategy. Therefore, it could be desirable to update the initial node similarity matrix resulting from step 1 in each iteration of step 2. Motivated by this, the second group of recent global network aligners allows for iteratively updating the node similarity matrix while producing the alignment [68], or they employ an alternative similar idea [58,64,77]. Finally, the third group of very recent network aligners NetworkBLAST, and MaWISh, and c MaWISh, NetAligner, and AlignMCL, with respect to precision, recall, or F-score of "aligned network module-known protein complex" matches. In a, precision and recall are shown when aligning networks of human (H) and yeast (Y), where the order of networks (i.e., H/Y versus Y/H) plays a role. The statistical significance of the performance difference between the two methods is indicated with an asterisk. In b, results are shown only for conserved network modules with more than six nodes, when aligning yeast and fly networks. In the lower left, precision versus recall is shown for each conserved module, represented as a circle whose radius is proportional to the size of the module. In the lower right, F-score distributions are shown for each method. Finally, at the top right, percentages are shown to quantify how well the given method's conserved modules match known protein complexes. In c, each point represents an alignment. The position of a point on y-axis is determined by the number of modules (or solutions) conserved under the given alignment that match known protein complexes with F-score (i.e., F-index) above 0.5. The position of a point on x-axis is determined by the number of modules conserved under the given alignment that have semantic similarity (SS) scores above 0.3; semantic similarity of a conserved module quantifies functional homogeneity of proteins within the module. a, b, and c are adopted from [52,54] and [53], respectively proposes a novel view of the network alignment problem (see below for details) [25,69,72,73]. Next, we discuss these three approach groups.

Two-step global network aligners
Traditional global network aligners employ the above two-step approach to produce alignments, and they mostly differ in which For a given measure, the superior of the two methods is indicated in bold. The table is adopted from [94] node cost function and alignment strategy they use, as follows.
IsoRank has its pairwise version [55] and its multiple version [57]. Both exploit the idea of Google's PageRank algorithm [97] to define the same cost function. Intuitively, two nodes from different networks are similar if their network neighbors are similar, where the neighbors are similar if their own neighbors are similar, and so on. In each of the two IsoRank versions, the alignment strategy aligns in one-to-one fashion nodes between different networks greedily with respect to the cost function. Iso-Rank has further evolved into IsoRankN [59], a different multiple network aligner that uses the same cost function as IsoRank, but that uses a different, spectral graph theoretic alignment strategy to produce a many-to-many alignment. Intuitively, this alignment strategy is similar to PageRank-Nibble algorithm [98], and it finds dense clusters of nodes from multiple networks to produce aligned clusters, where each cluster can contain multiple nodes from the same network.
The GRAAL family of pairwise aligners [38][39][40], developed in parallel with the IsoRank family, uses graphlet (or small induced subgraph [84,99]) counts to compute mathematically rigorous topological node similarity scores [100][101][102]. Intuitively, two nodes are a good match if their extended network neighborhoods are topologically similar with respect to the graphlet counts. It is the alignment strategies of the GRAAL family members that are different. The alignment strategy of the original GRAAL [38] is a greedy seed-and-extend approach, while the alignment strategy of H-GRAAL [39] is an optimal approach that aims to solve the maximum weight bipartite matching problem using Hungarian algorithm [44]. More recent MI-GRAAL [40] combines alignment strategies of GRAAL and H-GRAAL to further improve alignment quality. In parallel to MI-GRAAL, C-GRAAL [63] has appeared, whose node cost function is based on the idea of shared network neighbors rather than graphlet counts. Its alignment strategy is a seed-and-extend approach.
More recent and also pairwise GHOST [60] uses "spectral signatures" to compute node similarities. GHOST's alignment strategy is similar to MI-GRAAL's, except that MI-GRAAL solves a linear assignment problem by taking into account the similarities between nodes in different networks, while GHOST heuristically solves a quadratic assignment problem by also taking into account the similarities between nodes within the same network.
Similar to IsoRank, SPINAL [67] computes the similarity between two nodes based on the confidence that their neighbors can be matched well. This is done via an iterative approach that stops once a converged node similarity matrix is obtained. Given the node similarities, the aligner uses a seed-and-extend approach to produce an alignment.
The remaining aligners in this two-step category are multiple rather than pairwise approaches. SMETANA [74] bases its node cost function on a semi-Markov random walk model. Its alignment strategy uses a greedy approach to produce aligned node clusters. BEAMS [75] uses protein sequence similarity as its node cost function. Then, given k networks, its alignment strategy constructs a kpartite node similarity graph, identifies a set of disjoint cliques from the similarity graph that maximizes the number of conserved edges between each pair of cliques, and repeatedly merges the cliques to form aligned node clusters until the alignment score (with respect to edge conservation) can no longer be maximized. NetCoffee's [76] cost function is based on the likelihood that two given proteins (from different networks) are topologically conserved. Its alignment strategy constructs a weighted bipartite graph for each pair of networks, searches for candidate edges from each bipartite graph (i.e., candidate aligned node pairs) by solving maximum weight bipartite matching problem, and finally produces an alignment over all network pairs using a simulated annealing-based approach guided by an objective function; the objective function is a measure of the summation of candidate edge weights, where the weight is defined by the node cost function. FUSE [78] bases its cost function on a non-negative matrix tri-factorization technique. Given k networks, its alignment strategy constructs a weighted k-partite graph and solves in approximate fashion the maximum weight k-partite matching problem to produce one-to-one multiple network alignment.

Iterative global network aligners
The following approaches allow for updating node similarity scores computed with respect to the given cost function during each iteration of the alignment strategy, or they employ a similar idea of iteratively improving the alignment as it is built via an optimization strategy.
NETAL [68] is a pairwise aligner that iteratively recomputes the similarity between two currently unaligned nodes based on the current alignment and the expected number of conserved interactions incident to the two nodes if the nodes were to be aligned. GraphM [58] is a pairwise aligner that employs a gradient ascent-based iterative approach guided by an objective function to iteratively find a high-scoring alignment with respect to the objective function. GraphM uses two variations of the objective function. The first variation is based on the overall protein sequence conservation in the alignment. The second variation is based on both the sequence and edge conservation in the alignment. Motivated by the mathematical foundations of NATALIE [62], NATALIE 2.0 [64] formulates the network alignment problem as a quadratic assignment problem (similar to GHOST) that is then generalized into an integer linear programming problem. Given the NP-completeness of the latter, the aligner adapts a Lagrangian relaxation approach to solve the problem using a subgradient optimization.
Unlike the above three pairwise one-to-one aligners, GEDEVO-M [77] is a multiple (also one-to-one) aligner.
It generalizes the concept of graph edit distance (GED) between two networks (which is the minimum number of edge insertions and deletions needed to transform one network into another) into GED for multiple networks, in order to solve multiple network alignment problem by iteratively optimizing a GED-based objective function.
Novel views of network alignment Recently, two major drawbacks have been recognized with the current view of the network alignment problem, as follows.

Mix-and-match-based network aligners.
Recall that many of the existing global network aligners rely on the two-step algorithmic idea: node cost function and alignment strategy. Most of them have their own cost functions and alignment strategies (see above). As a result, when a network aligner is found to be superior to another, it is not clear whether this superiority comes from the first aligner's cost function, its alignment strategy, or both. So, to fairly evaluate different two-step approaches, one should compare their different node cost functions under the same alignment strategy, for each alignment strategy, as well as their different alignment strategies under the same node cost function, for each node cost function [25,71]. This way, one can properly evaluate which node cost function or alignment strategy is superior. Also, in the process, the combination of cost function of one method and alignment strategy of another method could outperform each original method.
Motivated by this, recent efforts have been made to mix-and-match node cost functions and alignment strategies of prominent existing network aligners, in order to perform a fair evaluation of their two algorithmic components (Fig. 4) [25,71]. In particular, by comparing the three intuitively similar node cost functions of IsoRank family, GRAAL family, and GHOST, according to which two nodes are similar if their extended network neighborhoods are similar (see above), it was established that Fig. 4 Fair evaluation of two-step global network aligners. To fairly evaluate two aligners, one should mix and match their node cost functions (CFs) and alignment strategies (ASs) and compare the different CFs under the same AS, and vice versa the GRAAL family's graphlet-based node similarity measure is superior to those of IsoRank's family as well as GHOST [25,71]. On the other hand, which alignment strategy is superior depends on the choice of data set or evaluation criteria. But nonetheless, in each of the two efforts [25,71], novel superior network aligners have been proposed that combine cost function of one method and alignment strategy of another method, confirming that it is important to properly and fairly evaluate the two-step approaches as described above.

Edge-focused network aligners.
Traditional network aligners discussed so far (with exception of BEAMS and GraphM) identify from possible alignments the highscoring alignments with respect to the overall node similarity (or node conservation). However, the accuracy of the alignments is then evaluated with some other measure that is different than the node similarity used to construct the alignments. Typically, one measures the amount of conserved edges. Thus, the traditional methods align similar nodes between networks hoping to conserve many edges (after the alignment is constructed!).
Instead, MAGNA [69] has recently been proposed that directly optimizes edge conservation while the alignment is being constructed, without decreasing in the process the quality of node mapping. Intuitively, this aligner exploits the idea of a genetic algorithm to crossover via a novel mathematical concept two parent alignments into a superior child algorithm. MAGNA simulates the population of alignments that evolves over time for as long as allowed by computational resources and allows the fittest (highest scoring with respect to edge conservation) alignments to proceed to the next generation. Importantly, the initial population of alignments can consist of either random alignments or alignments from existing methods. Thus, MAGNA can work on top of the alignments of the existing methods to further improve their quality. But importantly, MAGNA improves upon the existing network alignment methods (that optimize node conservation rather than edge conservation) even when run on top of random alignments. MAGNA was recently extended into MAGNA++ framework [82], in order to simultaneously optimize both node and edge conservation, which further improves alignment quality. Further, MAGNA++ features a user-friendly graphical interface for domain (e.g., biological) scientists while also offering source code for easy extensibility by computational scientists.
Another edge-focused alignment effort is WAVE [73], which is not a complete aligner per se, but instead, a novel iterative alignment strategy that can be used on top of any existing node cost function. Importantly, just as MAGNA++, WAVE aims to optimize both node and edge conservation during the alignment process, unlike previous alignment strategies. For this, it uses a novel measure of edge conservation that (unlike existing measures that treat each conserved edge the same) weighs each conserved edge so that edges with highly similar end nodes (with respect to the cost function) are favored. Using WAVE on top of established node cost functions has led to superior alignments compared to the existing methods that optimize only node conservation or only edge conservation or that treat each conserved edge the same.
In parallel to WAVE, GREAT [72] has appeared, which just like WAVE optimizes both node and edge conservation and also weighs each conserved edge to favor conserved edges that are topologically similar over conserved edges that are topologically dissimilar. Unlike WAVE, GREAT approaches the network alignment problem from a novel perspective, by aligning well edges between networks first in order to improve the node cost function needed to then align well nodes between the networks. GREAT, the edge-based network aligner, outperforms fairly comparable node-based network aligners. Also, it improves upon the most recent state-of-the-art methods that aim to optimize node conservation only or edge conservation only or that treat each conserved edge the same.
We note an alternative novel view of the network alignment problem. Namely, unlike the other network aligners, PINALOG [65] first detects clusters (dense subnetworks) in the input networks, aligns the clusters between the networks, and finally aligns nodes within the aligned clusters using a seed-and-extend approach.

Alignment quality measures for global network alignment
Unlike local network alignment that is typically evaluated only biologically (see above), global network aligners are evaluated both topologically and biologically. Depending on whether a global alignment is pairwise or multiple, given the difference in their input (aligned node pairs versus aligned node clusters), different measures of alignment quality are used for the different approach categories, as follows.

Alignment quality measures for pairwise network aligners
Recall that all pairwise global aligners are also one-to-one in nature. Intuitively, a good network aligner should match well nodes between aligned networks, conserve many edges, and find a large common connected subgraph. With this in mind, the following topological quality measures are widely used by pairwise one-to-one aligners.
1. Node correctness (NC). NC of an alignment is the percentage of nodes in the smaller network (in terms of the number of nodes) that are correctly aligned (according to the ground truth node mapping) to nodes in the larger network [38]. Unlike other measures discussed below, NC is applicable only when the actual ground truth node mapping between networks is known, which is rarely the case for real-world networks. Thus, NC is typically computed on synthetic network data, when aligning a network to its noisy counterparts obtained by, e.g., randomly adding or rewiring a percentage of edges in the original network [38-40, 60, 69]. The higher the NC score, the better the alignment quality. This measure was used by GRAAL [38], H-GRAAL [39], MI-GRAAL [40], GHOST [60], NETAL [68], MAGNA [69], GREAT [72], and WAVE [73], as well as in a follow-up study on fair evaluation of existing aligners [71]. 2. Symmetric substructure score (S 3 ). S 3 measures the amount of edge conservation between two aligned networks [69]. This measure has been introduced to overcome the drawbacks of two other similar measures: edge correctness (EC) [38] and induced conserved structure (ICS) [60]. Namely, EC was proposed to measure the percentage of edges from the smaller network that are mapped to edges from the larger network under the given alignment. However, EC might fail to differentiate between two alignments that one might consider to be of different topological quality, as it is defined with respect to the smaller network but not the larger one [60]. Thus, ICS was defined as the percentage of edges from the subgraph of the larger network that participates in the alignment, which are mapped to edges in the smaller network under the given alignment [60]. However, now ICS is defined with respect to the larger network but not the smaller one. That is, since EC is defined with respect to the smaller network, it penalizes the alignment for having misaligned edges in the smaller network but not in the larger network. On the other hand, since ICS is defined with respect to the larger network, it penalizes the alignment for having misaligned edge in the larger network but not in the smaller network. With this motivation, S 3 has recently been proposed to improve upon EC and ICS by penalizing for misaligned edges in both the smaller and larger networks [69]. The higher the S 3 score, the better the alignment quality. This 2014 measure was used by MAGNA [69], GREAT [72], and WAVE [73], as well as in a follow-up study on fair evaluation of existing aligners [71]. In addition, its predecessors EC and ICS were used by GRAAL [38], H-GRAAL [39], MI-GRAAL [40], GHOST [60], and NETAL [68]. 3. Size of the largest connected common subgraph (LCCS). Of two alignments with similar S 3 scores, one could expose large, contiguous, and topologically complex regions of network similarity, while the other could fail to do so. Thus, in addition to counting aligned edges, it is important that the aligned edges cluster together to form large connected subgraphs rather than being isolated. In this context, a connected common subgraph (CCS) is defined to be a connected subgraph (not necessarily induced) that appears in both networks [39,40]. The size of the largest CCS (LCCS) can be measured in terms of the number of both nodes and edges [38,40], and a new summary measure reconciling the two, which also penalizes for misaligned edges in both networks (just as S 3 does), has been proposed [69]. The higher the LCCS score, the better the alignment quality. This measure was used by IsoRank [55], GraphM [58], and most of the aligners that also used NC (see above).
The following biological quality measures are widely used by pairwise aligners.

GO correctness is the percentage of aligned protein
pairs in which the two proteins share at least k GO terms [95], out of all aligned protein pairs in which both proteins are annotated with at least k GO terms [38]. GO correctness can be computed with respect to complete GO annotation data, independent of GO evidence code. However, since many GO annotations have been obtained via sequence comparison, and since some of the aligners also use sequence information to produce their alignments, GO correctness can be biased by the sequence information. Therefore, it is highly recommended to consider only GO annotations with experimental evidence codes when computing GO correctness of an aligner that uses sequence information [38]. This measure was used by GRAAL [38], H-GRAAL [39], MI-GRAAL [40], C-GRAAL [63], GHOST [60], NETAL [68], MAGNA [69], and WAVE [73], as well as in a follow-up study on fair evaluation of existing aligners [71]. 2. GO semantic similarity. We have already discussed this measure above in the context of local network alignment. It complements GO correctness, as follows. GO correctness is a stricter measure that requires two proteins in an alignment to share one or more common GO terms. However, two proteins can still be functionally similar if they share a similar GO term, without necessarily sharing the same GO term. GO correctness would fail to identify such functional similarity between two proteins. GO semantic similarity between two proteins overcomes the limitation by taking into account the semantic similarity between GO terms that the two proteins are annotated with [103][104][105][106][107]. Just as with local aligners, GO semantic similarity between two proteins can be computed in many ways, e.g., by averaging GO semantic similarities across the proteins' associated GO term pairs [52,69,108]. Then, GO semantic similarity of an entire alignment can be computed by averaging the resulting similarities over all protein pairs in the alignment, thus assigning a single semantic similarity score to the alignment [52,69,108]. This measure was used by GHOST [60] and MAGNA [69].

Alignment quality measures for multiple network aligners
Recall that most of multiple network aligners are also many-to-many in nature, meaning that multiple nodes in one network can be mapped to multiple nodes in another network, resulting in a set of aligned node clusters. The clusters are typically non-overlapping, but a cluster can contain multiple nodes from the same network. For these reasons, different alignment quality measures are required for multiple network aligners compared to pairwise aligners [59]. Intuitively, a good multiple network aligner should produce aligned clusters such that nodes in each cluster are functionally uniform or consistent. Also, it should produce many such clusters, so that it covers as many of the nodes from the aligned networks as possible.
The following alignment quality measures are widely used for a multiple network aligner. Among them, the first one is a topological measure and the remaining ones are biological measures. (Note that an additional potential measure of topological alignment quality exists, as follows. BEAMS generalizes the concept of edge conservation from pairwise to multiple alignment [75]. Whereas BEAMS uses this measure to define its alignment score that is optimized while creating a multiple alignment, nothing prevents one to use this measure to evaluate topological quality of another aligner's multiple alignment after it is created.) 1. The larger the number of aligned clusters containing at least three nodes, the better the alignment quality. This measure was used by IsoRankN [59], as well as in a follow-up study on fair evaluation of existing aligners [25]. 2. A related measure is k-coverage, which counts the number of clusters containing proteins from k different networks. This measure was used by SMETANA [74], BEAMS [75], and FUSE [78]. 3. Exact cluster ratio is the percentage of aligned clusters in which all proteins share a GO term. The higher its value, the better the alignment quality. This measure was used by IsoRankN [59] and in a follow-up study on fair evaluation of existing aligners [25]. 4. Exact protein ratio is the percentage of all proteins that are in the exact clusters (as defined above). The higher its value, the better the alignment quality.
This measure was used by IsoRankN [59] and in a follow-up study on fair evaluation of existing aligners [25]. 5. Mean entropy of alignment. First, the entropy of an aligned cluster S * v is computed as: where p i is the percentage of all proteins in S * v that have GO term i, and d is the total number of GO terms [59]. Then, the mean entropy of the alignment is obtained by averaging entropies across all clusters in the alignment. The lower the entropy of the alignment, the higher its average within-cluster GO term consistency, and consequently, the better its biological quality. This measure was used by IsoRankN [59], SMETANA [74], BEAMS [75], NetCoffee [76], GEDEVO-M [77], and FUSE [78], as well as in a follow-up study on fair evaluation of existing aligners [25]. 6. Normalized mean entropy of alignment. First, the normalized entropy of an aligned cluster S * v is computed as: . Then, the mean normalized entropy is obtained by averaging normalized entropies across all aligned clusters. The lower the normalized mean entropy, the better its biological quality. This measure was used by IsoRankN [59], SMETANA [74], BEAMS [75], NetCoffee [76], GEDEVO-M [77], and FUSE [78], as well as in a follow-up study on fair evaluation of existing aligners [25].

Evaluating statistical significance of an alignment
When aligning networks with an approach, it is important to measure the statistical significance of the given alignment quality score. There are several approaches to achieve this. One could compute the probability of obtaining the same or better score by aligning the actual networks with a random aligner [39,48]. Additionally, one could compute the probability of obtaining the same or better score by aligning random networks with the actual approach [39]. In this context, random networks should come from an appropriate network null model (i.e., graph family), and many network null models exist [109,110].

Summary of global network aligners: which one to use?
Of the pairwise global aligners, IsoRank [55], GraphM [58], GRAAL [38], and H-GRAAL [39] are among the earliest global alignment algorithms and have by now been outperformed by the newer approaches. MI-GRAAL was shown to perform significantly better than IsoRank, GRAAL, and H-GRAAL [40] (Fig. 5). MI-GRAAL and GHOST are mostly comparable to each other [71], with slight superiority of GHOST in some contexts. NETAL was shown to be either superior or comparable to MI-GRAAL [68]. MAGNA, a recent edge-based aligner, . GHOST and MAGNA, as more recent aligners, produced even larger LCCSs (figures not shown). Namely, GHOST's LCCS, as reported in [69], has 1622 proteins and 5356 interactions. MAGNA, when run on MI-GRAAL's initial population, returns LCCS with 1660 proteins and 3401 interactions. MAGNA, when run on GHOST's initial population, returns LCCS with 1327 proteins and 4363 interactions. Note that it is not the raw node and edge counts alone that should be compared between two methods' LCCSs. Namely, a good method should find an LCCS that has many edges as well as nodes, while at the same time penalizing for edges from both the smaller and larger networks that are misaligned under the given alignment (just as S 3 does). A summary LCCS measure capturing all of this was recently proposed [69]. The resulting summary LCCS scores are 51, 54, 59, and 59 % for MI-GRAAL, GHOST, and the two MAGNA versions, respectively. Clearly, GHOST is superior to MI-GRAAL, while MAGNA is superior to GHOST (and thus to MI-GRAAL). The LCCS figures of IsoRank, GRAAL, H-GRAAL, and MI-GRAAL are adopted from the original publications [38][39][40] was shown to be superior to IsoRank, MI-GRAAL, and GHOST [69] (Fig. 6a). Newer MAGNA++ was shown to outperform MAGNA [82]. GREAT and WAVE, also very recent edge-focused pairwise aligners, have been shown to perform better than MI-GRAAL, GHOST, MAGNA, and NETAL [72] (Fig. 6b). The above results are overall conclusions. Importantly, the superiority of an aligner also depends on the choice of the network data (e.g., synthetic versus real-world networks, binary Y2H versus co-complex AP/MS networks, etc.) as well as alignment quality measure and evaluation framework. Therefore, we recommend a new pairwise global aligner to be compared against MI-GRAAL, GHOST, NETAL, MAGNA++, GREAT, and WAVE.
Of the multiple global aligners, SMETANA, BEAMS, and NetCoffee were shown to be superior to IsoRankN [74][75][76], while SMETANA, BEAMS, and NetCoffee are mostly comparable to each other [75,76]. FUSE, a recent multiple aligner, has been shown to outperform SMETANA and BEAMS with respect to most of the alignment quality measures [78] (Fig. 6c). Since again, the superiority of an aligner depends on the choice of data and alignment quality measure, we recommend a new multiple global aligner to be compared against SMETANA, BEAMS, NetCoffee, and FUSE.

Key biological implications of network alignment
Due to recent popularity of global network alignment (Table 1), our discussion in this section focuses primarily on biological results of global aligners.

Revealing conserved PPI network regions between yeast and human
Network aligners have been used to reveal conserved and unexpectedly large PPI network regions between different species, with special focus on yeast and human, as the two most complete PPI eukaryotic networks to date [111]. Figure 5 illustrates the size of LCCS between yeast and human PPI networks obtained by IsoRank, GRAAL, H-GRAAL, and MI-GRAAL. GHOST and MAGNA, being the more recent aligners, revealed even larger LCCSs [60,69].  6 Additional illustration of the performance of different global network aligners. This figure shows a the superiority of MAGNA over IsoRank, MI-GRAAL, and GHOST with respect to NC when aligning synthetic noisy yeast networks with known node mapping, and with respect to S 3 and LCCS when aligning Campylobacter jejuni and Escherichia coli bacterial PPI networks [69], b the ranking of GREAT, MI-GRAAL, GHOST, MAGNA, and NETAL over all alignments produced by the original GREAT study [72] with respect to three alignment quality measures (NC, S 3 , and LCCS) combined, demonstrating the superiority of GREAT over the other aligners [72], and c the superiority of FUSE (its best parameter version, as reported in [78]) over BEAMS (its best parameter version, as reported in [78]) and SMETANA with respect to the number of functionally consistent aligned node clusters, i.e., clusters that are enriched in a biological process (BP) or molecular function (MF) GO term [78]. The figures in a, b, and c are adopted from [69,72] and [78], respectively Importantly, conserved network regions uncovered by the aligners are typically enriched in the same biological function (this is true for both local and global aligners). For example, GRAAL has aligned a subnetwork consisting of 52 nodes between yeast and human, where 98 % yeast proteins and 67 % human proteins are involved in splicing (Fig. 7) [38]. This is highly encouraging because splicing is known to be conserved among eukaryotic species, even if they are as diverse as yeast and human [112].

Revealing evolutionary relationships between species
Several prominent network aligners aimed to uncover evolutionary relationships between species and construct the species' phylogenetic tree based on similarities between their biological networks. GRAAL and H-GRAAL were first such aligners. Since PPI network structure has subtle effects on the evolution of proteins and reasonable phylogenetic inference can only be done between closely related species [113], and since no PPI data were available for closely related species around the time GRAAL and H-GRAAL appeared, these methods focused on phylogenetic tree inference based on metabolic networks [114]. In particular, these aligners used metabolic data from KEGG [15] to reconstruct phylogenetic relationships for seven closely related organisms from the family of protist, as well as for six closely related organisms from the family of fungi [38,39]. It was encouraging that the resulting phylogenetic trees were non-random and very similar to "ground truth" trees found by sequence comparison. The fact that the networkbased trees slightly differed from those based on sequence data is not alarming, as there is no reason to believe that the sequence-based ones should a priori be considered the correct ones. This is because sequence-based phylogenetic tree inference suffers from several problems underlying sequence comparison [38,39], which is a computational way of obtaining the trees, just as network alignment is.
Around the time MI-GRAAL appeared, genomewide PPI networks became available for five closely related species from the family of herpesviruses [115]. By "genome-wide, " we mean that all possible protein pairs in each virus were tested for interactions. Thus, MI-GRAAL used these PPI network data to infer evolutionary relationships between the five herpesviruses based on their network similarities. Importantly, Fig. 7 GRAAL's second largest CCS. The second largest CCS uncovered by GRAAL when aligning PPI networks of yeast and human, consisting of 286 interactions amongst 52 proteins; each node in the CCS contains a label denoting a pair of yeast and human proteins that are aligned and each edge between two nodes means that an interaction exists in both species between the corresponding protein pairs [38] MI-GRAAL correctly reconstructed the phylogenetic tree (Fig. 8).
All of these results support the original hypothesis of GRAAL, H-GRAAL, and MI-GRAAL studies that biological network data is a valuable source of biological and evolutionary information.

Revealing new knowledge about human aging
Since the US population is on average growing older because of ∼78 million of baby boomers who began turning 65 in 2011, and since susceptibility to diseases increases with age, studying molecular mechanisms behind aging and aging-associated diseases gains importance. However, human aging is hard to study experimentally because of long lifespan and ethical constraints. Therefore, learning knowledge about human aging needs to rely on computational research. Genomic sequence alignment, a popular computational direction, has typically been used to learn about human aging by transferring the knowledge from highly studied model species to poorly studied human between conserved sequence regions. According to GenAge, one of the most trusted data sources on aging, 298 human genes are involved in the aging processes or diseases, and most of this knowledge are predictions obtained via genomic sequence comparison [116]. However, non-sequence data and genomic sequence data can give complementary biological insights. Therefore, PPI network data can be studied to elucidate aging-related knowledge missed by the current sequence-based approaches. Also, since not all genes implicated in aging in model species have sequence-based orthologs in human, restricting comparison to sequence data may limit the knowledge transfer.
Motivated by this, it was recently hypothesized that network alignment can be used to transfer the knowledge about aging from one species to another between conserved (aligned) PPI networks [25]. Indeed, it was shown that state-of-the-art network aligners at the time, MI-GRAAL and IsoRankN, as well as their mix-and-match combination, can uncover existing aging-related knowledge with statistically significantly high accuracy, in the sense that the methods align well known aging-related network parts of one species to known aging-related network parts of other species. Then, from the alignments, novel aging-related knowledge was predicted in currently unannotated network regions whenever such regions were  [115], according to a the gold standard [117,118] and b MI-GRAAL alignments of the species' PPI networks [40] aligned to known aging-related network regions. In this way, compared to 298 human aging-related genes from GenAge, additional 792 human genes were predicted as novel aging-related candidates [25]. These predictions were validated by demonstrating their topological and biological similarities to known aging-related genes, as well as via literature search. For example, they were found to be involved in aging-related biological processes and diseases, such as brain tumor, cancer, or prostate cancer.
We note that additional methods for network-based research of human aging exist, which are not aimed at across-species network comparison. For example, recently, the current static PPI network of human was integrated with aging-related gene expression data to construct dynamic, age-specific networks [24]. Then, genes whose network positions significantly changed with age were predicted as aging-related, and they were validated in similar ways as above. The value of this dynamic network approach is that it overcomes the key drawback of traditional biological network research, which deals with static network representations of the cellular functioning that changes with time (or age).

Future directions and concluding remarks
Comparative biological network research has attracted significant attention in the computational biology community. Nonetheless, despite all of the valuable existing efforts, many research questions remain to be addressed. For example, different network aligners, and even different parameter versions of the same aligner, tend to identify very different solutions. A struggle for a computational scientist is the current lack of in-depth understanding of the qualitative (rather than just quantitative) effect of method or parameter choice on the resulting output. A struggle for a biological scientist is which of the different alignment solutions to focus on for their experimental validation.
Moreover, despite the increasing availability of biological network data, the data remains noisy and incomplete, even for well-studied species. The effect of noise on the data on the resulting alignment(s) is poorly understood. On a related note, many different types of biological network data exist that capture somewhat complementary functional slices of the cell, whereas the network alignment community has focused their attention mainly on PPI networks. By developing efficient approaches for data integration as well as for alignment of the resulting heterogeneous network data, one could not only buffer noise in each individual network type but also uncover novel biological knowledge that would be missed by studying each individual data type in isolation.
Further, as we have discussed, many different types of network aligners exist that typically aim to achieve different goals and thus require different evaluation frameworks, which makes it hard to fairly compare the different methods. Perhaps future focus should shift towards development of "hybrid" approaches that inherit the best from all worlds while offering consistency in terms of method evaluation and comparison.
Also, regarding method evaluation, whereas accuracy is important, so is computational complexity. Thus, improvements in this aspect are needed to make the existing and future methods scalable to biological network data that will only continue to grow in size. To further ensure practical usefulness of a method, proper documentation of the software implementing the method, reliability of the software, and availability of a friendly graphical user interface are all critical for the method to be widely adopted, especially by biomedical domain scientists.
Given the tremendous amounts of biological network data that are being produced, network alignment will only continue to gain importance. Further advances in this research area could lead to new discoveries about the principles of life, evolution, disease, and therapeutics, and in the long run, they could facilitate advances in health care and personalized medicine.