Graph representation learning is a technique that aims to learn low-dimensional vector representations of nodes and edges in graph data. Its overarching objective is to encode the structural and semantic information of graph data in a compact form that can be used for various downstream tasks. In this section, we introduce graph embedding methods and graph neural network methods, and we conclude by briefly introducing several other related methods.
3.2.1. Graph Embedding
The objective of graph embedding is to find a low-dimensional vector representation of a high-dimensional graph while preserving the connectivity and interactions between nodes in the graph, which typically retains some key information about the nodes in the original graph. We categorize these graph embedding methods into four major types: matrix factorization-based methods, manifold learning-based methods, random walk-based methods and deep learning-based methods.
Matrix factorization-based methods. A common graph embedding approach is matrix factorization, which learns low-dimensional node representations by factorizing a matrix derived from the graph, such as the adjacency matrix or the Laplacian matrix. Graph factorization (GF) [52] is an early instance of this idea, obtaining node embeddings by directly factorizing the adjacency matrix. While maintaining the advantages of unsupervised text embedding, Predictive Text Embedding (PTE) [
53] learns features from labeled data while using both labeled and unlabeled data for representation learning. Matrix factorization is used by both GraRep [
54] and HOPE [
55] to extract the higher-order neighborhood information from the graph. However, GraRep constructs the higher-order proximity matrix by concatenating k-hop transition probability matrices, while HOPE constructs the higher-order proximity matrix by measuring the pairwise similarity between nodes in subgraphs of varying sizes and orders. By generating edge representations and learning heterogeneous metrics, HEER [
56] decomposes the network into low-dimensional embeddings and captures both structural and semantic information. A joint embedding matrix can be constructed by HERec [
57] using meta-path-guided sampling to capture semantic and structural similarities between nodes of different types, which can then be factorized into low-dimensional embeddings.
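To make the shared idea behind these factorization approaches concrete, the sketch below builds a simple k-hop proximity matrix from a toy adjacency matrix and factorizes it with a truncated SVD. It is an illustrative simplification in the spirit of GraRep/HOPE, not any method's official implementation; the function name and toy graph are our own.

```python
import numpy as np

def factorization_embedding(A, dim, k=2):
    """Embed nodes by factorizing a k-hop proximity matrix built from the
    adjacency matrix A (a simplified sketch of the GraRep/HOPE idea)."""
    # Row-normalized transition matrix.
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    # Higher-order proximity: sum of transition matrices up to k hops.
    S = sum(np.linalg.matrix_power(P, i) for i in range(1, k + 1))
    # Truncated SVD: source embeddings U * sqrt(sigma), target embeddings V * sqrt(sigma).
    U, sigma, Vt = np.linalg.svd(S)
    src = U[:, :dim] * np.sqrt(sigma[:dim])
    tgt = Vt[:dim].T * np.sqrt(sigma[:dim])
    return src, tgt

# Toy 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
src_emb, tgt_emb = factorization_embedding(A, dim=2)
```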
Manifold learning-based methods. In high-dimensional spaces, the sparsity and noise of graph data can pose challenges for model training and lead to overfitting. Manifold learning methods address this issue by mapping high-dimensional graph data to a lower-dimensional manifold space, which preserves the local properties and structural information of the original data while reducing its dimensionality. The main idea of isometric mapping (IsoMap) [
58] is to preserve the shortest path between data points to maintain the manifold structure. Both locally linear embedding (LLE) [
59] and local tangent space alignment (LTSA) [
60] are local information manifold learning methods. A local linear reconstruction is carried out in the neighborhood of each data point by LLE to maintain the flow structure. LTSA preserves the manifold structure by aligning the data in each point’s local tangent space. Laplacian eigenmaps (LE) [
61] and hessian eigenmaps (HE) [
62] are both manifold learning methods based on spectral analysis. Specifically, LE utilizes the Laplacian matrix to perform local weight assignment on the data’s neighborhood for preserving the manifold structure, while HE models the curvature and geometry of the data by leveraging the Hessian matrix. t-Distributed stochastic neighbor embedding (t-SNE) [
63] uses a probabilistic approach to model the similarity between points in high-dimensional space and maps them to low-dimensional space while preserving pairwise similarities. Both uniform manifold approximation and projection (UMAP) [
64] and t-SNE use graph layout algorithms to arrange data in low-dimensional space, which makes them very similar. The distinction is that UMAP builds a low-dimensional representation using an optimization technique based on the nearest neighbor graph.
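The following sketch illustrates this manifold-learning workflow on synthetic data using scikit-learn's off-the-shelf Isomap and t-SNE estimators; the synthetic curve, neighborhood size, and perplexity are illustrative choices of ours, not settings from the cited papers.

```python
import numpy as np
from sklearn.manifold import Isomap, TSNE

# Toy high-dimensional points sampled near a 1-D curve (a simple "manifold").
rng = np.random.default_rng(0)
t = np.linspace(0, 3 * np.pi, 200)
X = np.stack([np.cos(t), np.sin(t), t], axis=1) + 0.05 * rng.standard_normal((200, 3))

# IsoMap: preserves geodesic (shortest-path) distances on a neighborhood graph.
Z_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# t-SNE: models pairwise similarities as probabilities and preserves them in 2-D.
Z_tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)
```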
Random walk-based methods. Inspired by word embedding models in NLP, researchers treat the nodes of a graph as words and node sequences as sentences: node sequences are generated by random walk sampling on the graph, and the representation vectors of the nodes are then learned with the skip-gram model. DeepWalk [
65] is the origin of the random walk approach, employing a simple, unbiased random walk followed by skip-gram representation learning of the walk sequences. Similarly, node2vec [
66] employs a biased random walk, introducing two parameters that regulate the search strategy of the walks to better balance the capture of graph structural features and node similarity features and thereby improve the quality of node representations. By maximizing the likelihood functions of first-order proximity and second-order proximity, large-scale information network embedding (LINE) [
67] learns the embedding of nodes. Additionally, LINE employs negative sampling strategies to boost training effectiveness. Walklets [
68], which concentrates on higher-order structural information, subsamples random walk sequences at different skip lengths to obtain structural information about nodes at multiple scales. struct2vec [69] takes the similarity of the structural roles of nodes in the network into account, not just the co-occurrence probability between nodes. Traditional node embedding techniques typically employ a co-occurrence probability-based methodology, which makes it difficult to adequately capture the semantic and structural information between nodes in heterogeneous information networks. By restricting the direction of random walks, meta-paths guide the generation of node sequences associated with particular meta-paths, thereby capturing semantic and structural information between nodes. Metapath2vec [
70] uses skip-gram to learn node embeddings after defining meta-paths to describe the semantic relationships between nodes and create heterogeneous neighborhoods of nodes. Using the defined meta-paths, HIN2vec [
71] also creates diverse node neighborhoods. The crucial distinction is that it reconstructs the neighborhood with an autoencoder model and learns the embedding vectors of the nodes by minimizing the reconstruction error. GATNE [
72] combines the ideas of the graph attention network (GAT) and neighborhood aggregation embedding (NAE). It employs the GAT model for node representation learning in heterogeneous graphs and the NAE method for aggregating node neighborhoods. Setting an appropriate number of walks and walk length is crucial when learning node embeddings in different heterogeneous networks.
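A minimal DeepWalk-style pipeline is sketched below: unbiased random walks are sampled from a toy graph and fed to a skip-gram model (gensim's Word2Vec). The walk counts, lengths, and embedding size are illustrative; node2vec would additionally bias the next-step choice with its return and in-out parameters.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, num_walks=10, walk_length=20, seed=0):
    """Generate unbiased DeepWalk-style walks over the graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(G.nodes())
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = list(G.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(n) for n in walk])  # skip-gram expects token strings
    return walks

G = nx.karate_club_graph()
walks = random_walks(G)
# Skip-gram (sg=1) over walk sequences; `vector_size` is the gensim >= 4 parameter name.
model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1, workers=1, epochs=5)
embedding_of_node_0 = model.wv["0"]
```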
Deep learning-based methods. Traditional machine learning techniques rely heavily on manually created feature representations, which have a priori knowledge restrictions and limit the expressiveness of the models. Deep learning techniques can enhance data representation by learning multi-level feature representations and reducing the need for manual features. To encode and decode the nodes and obtain the details of the graph structure, SDNE [
73] employs a deep autoencoder architecture. First- and second-order proximity are optimized simultaneously, and sparsity regularization is used to prevent overfitting. The advantage of SDNE is that it learns the relationships between nodes while retaining the graph's structural information, thereby improving the graph representation. DNGR [
74] directly obtains graph structural information using a random walk model and learns node representations using stacked denoising autoencoders. The nonlinear features of the graph can be learned using these two techniques, but they do not perform well on non-Euclidean graphs. To encode the text, HNE [
75] employs recursive neural networks, which can model the hierarchical structure of the text. Both image and text encoders are optimized using a joint training approach to learn better matching. A deep-aligned autoencoder-based embedding method for heterogeneous graphs was introduced by BL-MNE [
76] to embed different types of nodes into the same low-dimensional space. TADW [
77] learns low-dimensional embedding representations of network nodes using DeepWalk and latent Dirichlet allocation (LDA), and it introduces an attention mechanism to weight the nodes' textual information. Three attention mechanisms (node attention, attribute attention, and neighbor attention) are introduced by the LANE [
78] method, which adaptively adjusts the weights on different embedding layers to better capture the similarities and differences between nodes and improve the quality of embedding. ASNE [
79] creates supernodes by combining the structural and attribute information of the nodes, employs an attention-based framework to regulate the weights between the different supernodes in the embedding space, and employs adaptive sparse methods to learn the representation of the nodes. DANE [
80] can simultaneously capture highly nonlinear node attributes and topological relationships while preserving their proximity. The expressiveness and generalization performance of the node representation are improved by ANRL [
81], which uses adaptive neighborhood regularization to dynamically adjust the weights of each node’s neighbors using the relationships between neighboring nodes.
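The sketch below captures the core of such autoencoder-based embedding: a deep autoencoder reconstructs each node's adjacency row, with observed edges weighted more heavily, which loosely mirrors SDNE's second-order proximity term. The first-order Laplacian term and the exact loss weights of SDNE are omitted, and the architecture and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class AdjacencyAutoencoder(nn.Module):
    """Minimal autoencoder over adjacency rows (SDNE-like sketch)."""
    def __init__(self, num_nodes, dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(num_nodes, 128), nn.ReLU(),
                                     nn.Linear(128, dim))
        self.decoder = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(),
                                     nn.Linear(128, num_nodes))

    def forward(self, adj_rows):
        z = self.encoder(adj_rows)
        return z, self.decoder(z)

# Toy adjacency matrix of a 6-node ring graph.
A = torch.zeros(6, 6)
for i in range(6):
    A[i, (i + 1) % 6] = A[(i + 1) % 6, i] = 1.0

model = AdjacencyAutoencoder(num_nodes=6, dim=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    z, recon = model(A)
    # Weight reconstruction errors on observed edges more heavily
    # (a simplified stand-in for SDNE's weighted loss).
    weight = 1.0 + 4.0 * A
    loss = ((recon - A) ** 2 * weight).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
embeddings = z.detach()
```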
3.2.2. Graph Neural Network-Based Methods
Currently, GNNs have emerged as a prominent research field in graph representation learning, especially in handling non-Euclidean data. Their capacity to effectively process such data makes them a suitable choice for various real-world applications. GNNs differ from traditional neural network methods in that they can effectively model the interactions between nodes and edges, thereby exhibiting superior performance in tasks related to the analysis and prediction of graph-structured data. Drawing upon prior research [
82,
83], we present a taxonomy of graph neural networks in this section, organized into five distinct categories based on their architectural design and unique approaches to processing structured graph data: GCNs, GATs, GAEs, GGANs, and GPNs.
GCNs. GCNs define a convolution operation on graphs that aggregates the features of a node and its neighbors to generate a new node representation. Graph convolution methods can be classified into spectral methods and spatial methods. Because the relationships between nodes in graph data are irregular, GCN [
84] introduces learnable convolutional parameters to extract and learn features from graph data. By stacking multiple graph convolutional layers, GCN obtains more abundant and complex node feature representations. DGCN [
85] employs distinct graph convolution operations on the primal and dual graphs to obtain node representations on both graphs. To address the issue of traditional graph neural networks requiring the definition of static adjacency matrices in advance, AGCN [
86] introduces adaptive adjacency matrices and gate mechanisms, avoiding the computational burden brought by multiple convolutional operations. LGCN [
87] utilizes a learnable graph convolutional layer (LGCL) and a subgraph training technique for handling large-scale graph data, thereby circumventing the computational limitations of traditional graph convolutional operations that necessitate computing the entire graph. In addition, FastGCN [
88] was proposed to address the issue of high computational and memory overheads in traditional graph convolutional methods by introducing importance sampling and mini-batch training. The GraphSAGE [
89] approach is considered a significant milestone in the field of graph neural networks. It uses a sampler to randomly sample subgraphs from the graph data that contain the target node, enabling inductive representation learning and prediction on unseen nodes. GIN [
90] uses an injective sum aggregation over node neighborhoods, giving it discriminative power comparable to the Weisfeiler-Lehman graph isomorphism test: isomorphic graphs are mapped to the same representation, while non-isomorphic graphs can be distinguished. APPNP [
91] uses the personalized PageRank algorithm to achieve personalized propagation and prediction of nodes and utilizes a larger receptive field for node classification tasks.
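At the core of most of these models is the GCN propagation rule H' = sigma(D^{-1/2} (A + I) D^{-1/2} H W), sketched below for a single layer on a toy graph; the toy features and weights are random placeholders of ours.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # symmetric degree normalization
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # aggregate, transform, ReLU

# Toy graph: 4 nodes, 3-dimensional input features, 2 hidden units.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H0 = np.random.default_rng(0).standard_normal((4, 3))
W1 = np.random.default_rng(1).standard_normal((3, 2))
H1 = gcn_layer(A, H0, W1)   # stacking such layers yields deeper GCNs
```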
GATs. To address the issue of information aggregation in traditional graph neural networks, GATs introduced attention mechanisms [
46] that enable different weighting of neighboring nodes around each node, thus better capturing the relationships between nodes. The attention mechanism in GAT [
92] adaptively calculates the weights between nodes and further incorporates multi-head attention mechanisms to better capture the structural information of graphs and effectively handle large-scale graph data. AGNN [
93] utilizes an attention mechanism to dynamically learn the relationships between each node and its neighboring nodes, thereby enhancing the model’s performance. DySAT [
94] computes node representations through joint self-attention along the two dimensions of the structural neighborhood and temporal dynamics, more effectively capturing the relationships and features in dynamic graph data. GaAN [
95] introduced an adaptive receptive field mechanism to dynamically adjust the neighborhood size of each node, enabling the receptive field size to adaptively fit the density and distribution of its surrounding nodes, thus enhancing the model’s flexibility and adaptability to different graph structures. HAN [
96] utilizes distinct attention mechanisms to model different types of nodes and edges, learning complex relationships between nodes and the structural information of heterogeneous graphs by adaptively computing weights and representations of nodes. MAGNA [
97] adopts a multi-hop attention mechanism that utilizes diffusive priors on attention values to consider all paths between non-connected node pairs, enabling the dynamic capture of complex relationships between nodes. High-order attention mechanisms and adversarial regularization constraints are employed by GCAN [
98] to fully utilize both low-order and high-order information of nodes for representation learning.
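The sketch below shows a single attention head in the style of GAT: edge scores are computed from learned projections, passed through a LeakyReLU, normalized with a softmax over each neighborhood, and used for weighted aggregation. GAT itself concatenates several such heads; the parameter names (a_src, a_dst) and toy graph here are our own.

```python
import numpy as np

def gat_layer(A, H, W, a_src, a_dst, alpha=0.2):
    """Single-head graph attention sketch."""
    Wh = H @ W                                   # (N, F') projected node features
    scores = Wh @ a_src + (Wh @ a_dst).T         # e_ij = a_src . Wh_i + a_dst . Wh_j
    scores = np.where(scores > 0, scores, alpha * scores)   # LeakyReLU
    mask = (A + np.eye(A.shape[0])) > 0          # attend to neighbors and self only
    scores = np.where(mask, scores, -1e9)        # mask out non-edges
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)            # row-wise softmax
    return attn @ Wh                             # attention-weighted aggregation

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = rng.standard_normal((3, 4))
W = rng.standard_normal((4, 8))
a_src, a_dst = rng.standard_normal((8, 1)), rng.standard_normal((8, 1))
H_out = gat_layer(A, H, W, a_src, a_dst)
```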
GAEs. The encoder-decoder architecture has also been widely applied in graph generation and reconstruction tasks. GAE [
99] initially employs this architecture, utilizing an encoder to compress the raw graph data into low-dimensional vectors and a decoder to reconstruct the original graph from these vectors. The encoder is composed of GCN layers, while the decoder reconstructs the adjacency matrix from inner products of the node embeddings. VGAE [
99] is an extension of GAE that introduces a variational inference method on top of GAE, enabling VGAE to learn more robust and interpretable low-dimensional representations. GraphVAE [
100] adopts a similar structure to VAE, where the encoder transforms the graph information into a vector that approximates a normal distribution, and the decoder then transforms this vector back into a new graph. Graphite [
101] utilizes graph neural networks to parameterize a variational autoencoder and employs a novel iterative graph refinement strategy for decoding. Graph2Gauss [
102] builds upon VGAE by modeling node representations using Gaussian distributions and improving the distance metric to better capture complex category structures. In addition, DNVE [
103] introduces deep generative models into graph autoencoders, embedding the network through deep encoders and decoders. Contrastive learning has also been applied to graph embedding methods. For example, DGI [
104] uses a mutual information maximization method to learn node representations, which can learn high-order relationships between nodes without reconstructing the input graph. Another approach, InfoGraph [
105], learns graph-level representations by maximizing the mutual information between the graph-level representation and representations of substructures at different scales within the graph. There is also a novel approach called MaskGAE [
106], which combines the Masked Graph Model (MGM) with the graph autoencoder to achieve self-supervised graph representation learning and capture node features and structural information in the graph.
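A minimal GAE-style sketch is given below: a two-layer GCN encoder produces embeddings Z, and an inner-product decoder reconstructs edge probabilities as sigmoid(Z Z^T), trained with a binary cross-entropy reconstruction loss. VGAE would additionally make the encoder output a Gaussian posterior and add a KL term; the layer sizes and training settings here are illustrative.

```python
import torch
import torch.nn as nn

class GAE(nn.Module):
    """Minimal graph autoencoder sketch: GCN encoder + inner-product decoder."""
    def __init__(self, in_dim, hid_dim=32, out_dim=16):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w2 = nn.Linear(hid_dim, out_dim, bias=False)

    def encode(self, A_norm, X):
        h = torch.relu(A_norm @ self.w1(X))   # first GCN layer
        return A_norm @ self.w2(h)            # second GCN layer -> embeddings Z

    def decode(self, Z):
        return torch.sigmoid(Z @ Z.t())       # reconstructed adjacency probabilities

def normalize(A):
    A_hat = A + torch.eye(A.size(0))          # add self-loops
    d = A_hat.sum(dim=1).rsqrt()
    return A_hat * d.unsqueeze(0) * d.unsqueeze(1)

A = torch.tensor([[0., 1., 1., 0.], [1., 0., 1., 0.],
                  [1., 1., 0., 1.], [0., 0., 1., 0.]])
X = torch.eye(4)                              # featureless graphs use identity features
model, A_norm = GAE(in_dim=4), normalize(A)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    Z = model.encode(A_norm, X)
    loss = nn.functional.binary_cross_entropy(model.decode(Z), A + torch.eye(4))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```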
GGANs. GGANs are a type of generative model based on adversarial training that extends conventional generative adversarial networks (GANs) [
107] and can be used for generating and learning embedded representations of graph data. The generator network can generate realistic graph data, while the discriminator network can evaluate the quality of the generated graph data. Through adversarial training between the two networks, GGANs can continuously optimize their ability to generate and learn graph data. An earlier method, GraphGAN [
108], uses GAN to learn graph embedding representations and employs a loss function based on pairwise similarity to preserve the continuity of the embedding space. ARVGA [
109] uses adversarial regularization to regularize node embeddings in the embedding space, thereby improving the continuity and robustness of embedding representations. By introducing an additional regularization term in adversarial training, ANE [
110] enhances existing graph embedding methods by treating the prior distribution as real data and embedding vectors as generated samples. NetRA [
111] is a network representation learning method based on an encoder-decoder framework and adversarial training. It encodes and decodes random walks on nodes, utilizes adversarial training to regularize embeddings, and introduces a prior distribution to improve the model’s generalization performance. In addition, there is an implicit generative model called NetGAN [
112], which is used to simulate real-world networks. It uses GAN to train a generator and utilizes random walk loss in the graph to learn node-level representations. MolGAN [
113] is an implicit generative model used for generating small molecule graphs. It combines GAN and policy gradient reinforcement learning methods to improve the quality and diversity of generated molecular graphs.
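The toy loop below illustrates the adversarial principle only: a generator maps random noise to fake adjacency rows, and a discriminator learns to separate them from rows of a real adjacency matrix. It is a deliberately simplified sketch and does not follow GraphGAN's or NetGAN's actual sampling and loss schemes; all sizes are arbitrary.

```python
import torch
import torch.nn as nn

N = 8
real_A = (torch.rand(N, N) < 0.3).float()
real_A = torch.triu(real_A, 1)
real_A = real_A + real_A.t()                  # symmetric toy adjacency, no self-loops

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, N), nn.Sigmoid())
D = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(500):
    # Discriminator step: real rows -> label 1, generated rows -> label 0.
    fake_rows = G(torch.randn(N, 16)).detach()
    d_loss = bce(D(real_A), torch.ones(N, 1)) + bce(D(fake_rows), torch.zeros(N, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # Generator step: try to make the discriminator score fakes as real.
    g_loss = bce(D(G(torch.randn(N, 16))), torch.ones(N, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```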
GPNs. GPNs use two operations, graph pooling and graph convolution, to process input graph data, with the main goal of reducing the size of the graph data and extracting global features. A general differentiable pooling operation called DiffPool [
114] has been proposed, which can optimize node representations and pooling matrices through backpropagation to better capture the hierarchical structure of graph data. Zhang et al. [
115] proposed a novel SortPooling operation that sorts each node’s feature vector along with the node’s degree and then uses the sorted result as the pooling result to better capture the global features of graph data. A method different from other graph pooling methods is SAGPool [
116], which dynamically selects a subset of nodes using self-attention scores without the need to specify a fixed-size node subset. This approach enables more accurate learned graph representations that can adapt to different graph structures. Diehl et al. [
117] proposed a pooling method called EdgePool, which relies on the edge contraction concept to learn localized and sparse hard pooling transformations. This method calculates importance scores for each edge to select representative edges and compress them together to form new nodes.
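As a concrete illustration of score-based pooling, the sketch below follows the SAGPool idea under simplifying assumptions: node scores come from a one-layer graph convolution with a single scoring vector w, the top-k nodes are kept, and their features are gated by the scores. The official SAGPool implementation differs in its details, and the toy graph and names are our own.

```python
import numpy as np

def self_attention_pool(A, H, w, ratio=0.5):
    """SAGPool-style sketch: score nodes with a graph convolution, keep top-k."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    scores = np.tanh(A_norm @ H @ w)                   # attention score per node
    k = max(1, int(np.ceil(ratio * A.shape[0])))
    keep = np.argsort(-scores.ravel())[:k]             # indices of top-k nodes
    H_pooled = H[keep] * scores[keep]                  # gate kept features by score
    A_pooled = A[np.ix_(keep, keep)]                   # induced subgraph
    return A_pooled, H_pooled, keep

rng = np.random.default_rng(0)
A = (rng.random((6, 6)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                                            # symmetric, no self-loops
H = rng.standard_normal((6, 5))
w = rng.standard_normal((5, 1))
A_small, H_small, kept_nodes = self_attention_pool(A, H, w)
```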
In addition, research on embedding methods for non-Euclidean spaces, contrastive learning for graph neural networks, and feature processing is worth noting. Corso et al. [
118] proposed a neural distance embedding model, NeuroSEED, which embeds biological sequences into hyperbolic space for improved capture of hierarchical structure. A simple graph contrastive learning framework called SimGRACE [
119] was proposed, which does not require data augmentation and enhances the robustness of graph contrastive learning using an adversarial scheme. Tang et al. [
120] proposed a novel unsupervised feature selection method, MGF²WL, which combines multiple graph fusion with feature weight learning to address poor-quality similarity graphs, and designed an effective feature reconstruction model. Sun et al. [121] proposed a feature-expanded graph neural network model, FE-GNN, which utilizes feature subspace flattening and structural principal components to expand the feature space and improve model performance.
We also summarize the representation learning method implementations reviewed in the paper in Table 1, most of which are official implementations.
Table 1. Summary of Representation Learning Methods.
| Category | Method Name | Code Link |
| --- | --- | --- |
| Neural network-based language model representation learning | Word2vec [37] | https://code.google.com/archive/p/word2vec/ |
| | Doc2vec [38] | https://nbviewer.org/github/danielfrg/word2vec/blob/main/examples/doc2vec.ipynb |
| | GloVe [39] | https://nlp.stanford.edu/projects/glove/ |
| | FastText [40] | https://github.com/facebookresearch/fastText |
| | Bi-LSTM [42] | - |
| | ELMo [43] | https://allenai.org/allennlp/software/elmo |
| | Transformer [46] | https://github.com/tensorflow/tensorflow |
| | BERT [50] | https://github.com/google-research/bert |
| Graph representation learning: graph embedding | GF [52] | - |
| | PTE [53] | https://github.com/mnqu/PTE |
| | GraRep [54] | https://github.com/ShelsonCao/GraRep |
| | HOPE [55] | http://git.thumedia.org/embedding/HOPE |
| | HEER [56] | https://github.com/GentleZhu/HEER |
| | HERec [57] | https://github.com/librahu/HERec |
| | IsoMap [58] | https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/manifold/_isomap.py |
| | LLE [59] | - |
| | LTSA [60] | - |
| | LE [61] | - |
| | HE [62] | - |
| | t-SNE [63] | https://lvdmaaten.github.io/tsne/ |
| | UMAP [64] | https://github.com/lmcinnes/umap |
| | DeepWalk [65] | https://github.com/phanein/deepwalk |
| | node2vec [66] | https://github.com/aditya-grover/node2vec |
| | LINE [67] | https://github.com/tangjianpku/LINE |
| | Walklets [68] | https://github.com/benedekrozemberczki/walklets |
| | struct2vec [69] | https://github.com/leoribeiro/struc2vec |
| | Metapath2vec [70] | https://ericdongyx.github.io/metapath2vec/m2v.html |
| | HIN2vec [71] | https://github.com/csiesheep/hin2vec |
| | GATNE [72] | https://github.com/THUDM/GATNE |
| | SDNE [73] | https://github.com/suanrong/SDNE |
| | DNGR [74] | https://github.com/ShelsonCao/DNGR |
| | HNE [75] | - |
| | BL-MNE [76] | - |
| | TADW [77] | https://github.com/thunlp/tadw |
| | LANE [78] | https://github.com/xhuang31/LANE |
| | ASNE [79] | https://github.com/lizi-git/ASNE |
| | DANE [80] | https://github.com/gaoghc/DANE |
| | ANRL [81] | https://github.com/cszhangzhen/ANRL |
| Graph representation learning: graph neural network | GCN [84] | https://github.com/tkipf/gcn |
| | DGCN [85] | https://github.com/ZhuangCY/DGCN |
| | AGCN [86] | https://github.com/yimutianyang/AGCN |
| | LGCN [87] | https://github.com/divelab/lgcn |
| | FastGCN [88] | https://github.com/matenure/FastGCN |
| | GraphSAGE [89] | https://github.com/williamleif/GraphSAGE |
| | GIN [90] | https://github.com/weihua916/powerful-gnns |
| | APPNP [91] | https://github.com/gasteigerjo/ppnp |
| | GAT [92] | https://github.com/PetarV-/GAT |
| | AGNN [93] | - |
| | DySAT [94] | https://github.com/aravindsankar28/DySAT |
| | GaAN [95] | https://github.com/jennyzhang0215/GaAN |
| | HAN [96] | https://github.com/Jhy1993/HAN |
| | MAGNA [97] | https://github.com/xjtuwgt/GNN-MAGNA |
| | GCAN [98] | - |
| | GAE [99] | https://github.com/tkipf/gae |
| | VGAE [99] | https://github.com/tkipf/gae |
| | GraphVAE [100] | https://github.com/snap-stanford/GraphRNN/tree/master/baselines/graphvae |
| | Graphite [101] | https://github.com/ermongroup/graphite |
| | Graph2Gauss [102] | https://github.com/abojchevski/graph2gauss |
| | DNVE [103] | - |
| | DGI [104] | https://github.com/PetarV-/DGI |
| | InfoGraph [105] | https://github.com/fanyun-sun/InfoGraph |
| | MaskGAE [106] | https://github.com/EdisonLeeeee/MaskGAE |
| | GraphGAN [108] | https://github.com/hwwang55/GraphGAN |
| | ARVGA [109] | - |
| | ANE [110] | - |
| | NetRA [111] | https://github.com/chengw07/NetRA |
| | NetGAN [112] | https://github.com/danielzuegner/netgan |
| | MolGAN [113] | https://github.com/nicola-decao/MolGAN |
| | DiffPool [114] | https://github.com/RexYing/diffpool |
| | SortPooling [115] | https://github.com/muhanzhang/DGCNN |
| | SAGPool [116] | https://github.com/inyeoplee77/SAGPool |
| | EdgePool [117] | - |
| Others | NeuroSEED [118] | https://github.com/gcorso/NeuroSEED |
| | SimGRACE [119] | https://github.com/mpanpan/SimGRACE |
| | MGF²WL [120] | - |
| | FE-GNN [121] | https://github.com/sajqavril/Feature-Extension-Graph-Neural-Networks |