1. Introduction: Data analytics in modern cardiology
Cardiovascular diseases (CVDs) have long been a major global health problem, causing more deaths than all forms of cancer and chronic respiratory disease combined. The detection and prediction of CVDs are made difficult by the numerous etiological factors, complex disease pathways, and diverse clinical presentations [1,2]. However, with the advent of an enhanced capability to generate complex, high-dimensional data from electronic medical records, mobile health devices, and imaging studies, we are presented with both challenges and opportunities for data-driven discovery and research. While traditional statistical approaches to risk stratification have been broadly developed, leading to important improvements in diagnosis, prognosis, and in some cases therapeutics, most of these models are limited when it comes to individualized risk prediction. Recently, data analytics, artificial intelligence, and machine learning have emerged as potential solutions for overcoming the limitations of traditional approaches in CVD research.
Advanced analytics algorithms can have a major impact on cardiovascular disease prediction and diagnosis. CVD data, however, remain challenging for common machine learning and data analytics approaches because of the wide variety and large heterogeneity of the cardiovascular signals currently being probed [3,4]. Among the several approaches arising to cope with such disparate data, one that stands out for its generality and its ability to handle data spanning diverse dynamic ranges and scales is topological data analysis [5,6].
Topological data analysis (TDA) is, in short, a family of analytic methods that has been gaining relevance and recognition as a way to model, predict, and understand the behavior of complex biomedical data. TDA is founded on the tenets of algebraic topology, a mathematical field concerned with the shape of objects and equipped with a set of methods for studying it [7]. In this review article, we present the fundamentals of TDA and its applications to the analysis of cardiovascular signals. We aim to make these techniques accessible to non-experts by explaining their theoretical foundations and surveying their use in computational cardiology. We also discuss the limitations of these methods and suggest possible ways to incorporate them into clinical care and biomedical informatics in the context of cardiovascular diseases.
Figure 1 presents a graphic overview of the main ideas, starting with one or several sources of cardiovascular signals coming from medical devices, wearables, clinical monitors, electronic health records, and so on. The data are used to generate point clouds that are turned into metric data sets, which are then processed and analyzed with the tools of topological data analysis (see below) to generate homology groups, persistence diagrams, and signatures useful for classifying signals and reaching a deeper understanding of their phenomenology.
2. Fundamentals of Topological Data Analysis
Topology is a branch of mathematics that deals with the shapes of objects. It provides a framework for understanding how objects can be deformed and still retain their essential properties. For example, a circular rubber band can be stretched into an oval, but a segment of string cannot be deformed into a closed loop without joining its ends. Topologists study the connectedness of objects by counting their number of pieces and holes, and use this information to classify objects into different categories.
A related field, algebraic topology, provides a framework for describing the global structure of a space in a precise and quantitative way. It uses methods that take into account the entire space and its objects, rather than just local information. To do so, algebraic topology relies on the concept of homology, which classifies objects based on the number and type of their holes, and on the notion of a topological space, a set of points together with neighborhoods satisfying certain conditions. The notion of a topological space allows for flexibility in applying topological tools in various settings, since it does not rely on numerical values to determine the proximity of points, but rather on whether their neighborhoods overlap.
More formally, homology is a set of topological invariants, represented by the homology groups $H_k(X)$, that describe the $k$-dimensional holes in a topological space $X$. The rank of $H_k(X)$ (known as the $k$th Betti number, $\beta_k$) is analogous to the dimension of a vector space and indicates the number of $k$-dimensional holes. For example, $H_0$ corresponds to zero-dimensional features or connected components, $H_1$ corresponds to one-dimensional features or cycles, and $H_2$ corresponds to two-dimensional features or cavities. It is also possible to study $H_k$ for larger values of $k$, but it becomes more difficult to visualize the associated features.
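As a concrete, minimal illustration of these ideas, the sketch below (which assumes the open-source gudhi library and numpy are installed; the sample size, noise level, and scale are arbitrary choices of ours) samples points from a noisy circle, builds a simplicial complex on them at a fixed scale (a construction discussed in Section 2.1), and reads off the Betti numbers, which for a circle should typically be one connected component and one loop.

```python
# A minimal sketch (assumes gudhi and numpy are installed): sample a noisy
# circle, build a Vietoris-Rips complex at a fixed scale, and read off the
# Betti numbers beta_0 (connected components) and beta_1 (loops).
import numpy as np
import gudhi

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
points = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(scale=0.05, size=(200, 2))

rips = gudhi.RipsComplex(points=points, max_edge_length=0.5)
simplex_tree = rips.create_simplex_tree(max_dimension=2)
simplex_tree.persistence()           # persistence must be computed before Betti numbers
print(simplex_tree.betti_numbers())  # typically [1, 1]: one component, one loop
```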
At this stage, homology may seem a rather abstract concept; however, it can be connected in a straightforward manner to data analytics once we recognize one quite important property of data points: their shape as a set, that is, how the data points are distributed in feature space. One can approximate this shape by looking at how holes are distributed in data space. Understanding why points accumulate in one region of data space and are missing in other regions is a powerful tool for finding trends in the data. In order to understand the topological shape of a data set and identify its holes, it is useful to assign a topological structure to the data and calculate topological invariants. Homology groups are useful for this purpose because there are efficient algorithms for computing the most relevant of these invariants in the context of TDA [8].
Homology groups, hence, are used to classify topological spaces based on the topological features of their shape, such as connectedness, loops, and voids. The homology groups of a topological space are invariant under continuous deformations, meaning that if two spaces have different homology groups, they cannot be continuously deformed into one another and are therefore topologically distinct. Homology can thus be used to distinguish between spaces that may appear to be the same from other perspectives, such as spaces with the same dimension or the same symmetries.
In the context of topological data analysis (TDA), the interpretation of topological features like connectedness, loops, holes, and voids involves understanding the geometric and structural properties of the data that these features represent. Let us briefly review some of these ideas.
Connectedness: Connectedness refers to the property of data points or regions being connected in a topological space. In TDA, connectedness typically corresponds to the number of connected components in a data set. The number of connected components can provide insights into the overall structure of the data. High connectedness implies that the data is relatively well-connected, while low connectedness may indicate separate clusters or isolated data points.
Loops: Loops represent closed paths or cycles in the data. They can occur when points or regions in the data form closed curves or circles. Loops can capture repetitive or periodic patterns in the data. They are often associated with cyclic structures or data points arranged in circular or ring-like formations.
Holes: Holes correspond to empty spaces or voids in the data where there are no data points. These voids can take various shapes, including spherical voids, tunnel-like voids, or irregular voids. The presence and characteristics of holes provide information about where the data space is empty. They can indicate the absence of data in specific regions or reveal patterns in the data distribution, such as clustering around voids.
Voids: Voids are regions of space that lack data points. They are similar to holes but can be more generalized and may not necessarily be enclosed by data. Voids are often used to study the spatial distribution and density of data points. Large, persistent voids may suggest regions where data is scarce, while small, transient voids may highlight local fluctuations.
To interpret these topological features effectively, TDA often employs persistence diagrams or barcode diagrams. These diagrams summarize the births and deaths of topological features across a range of spatial scales, providing a quantitative way to assess the significance and persistence of these features. Here’s how persistence diagrams relate to the interpretation of topological features:
Connectedness: The number of connected components (the $H_0$ features) is quantified by points in the persistence diagram. Longer persistence (a larger gap between birth and death) indicates more robust connected components.
Loops: Loops correspond to $H_1$ features in the persistence diagram. Longer-lived loops correspond to more persistent cyclic patterns in the data.
Holes and Voids: Higher-dimensional holes and voids ($H_2$ and above) are likewise represented by points in the persistence diagram. The position of each point in the diagram indicates the spatial scale and persistence of the corresponding feature.
In summary, interpreting topological features in TDA involves understanding the presence, size, and persistence of connectedness, loops, holes, and voids in your data. Persistence diagrams provide a concise visual representation of these features and their characteristics across different scales, aiding in the exploration and analysis of complex data sets. A more formal treatment of these concepts is presented in the next subsection.
2.1. Persistent Homology
We will now discuss some of the main homology-based constructions used for data analytics, starting with persistent homology (PH).
PH identifies topological features of a space at different scales. Features that are consistently detected across a broad range of scales are considered more likely to be true features of the space, rather than being influenced by factors such as sampling errors or noise. To use persistent homology, the space must be represented as a simplicial complex (i.e. a set of polytopes, like points, line segments, triangles, tetrahedra and so on) and a filtration, or a nested sequence of increasing subsets, must be defined using a distance function on the space.
To clarify such abstract concepts as simplicial complex and filtration, let us consider a set of measurements of some property (or properties) of interest, described in terms of the associated features for each point (see Figure 2A). We will customarily call this set a point cloud; this represents the data. Point clouds, which, as we said, are simply collections of points, do not have many interesting topological properties per se. However, we can analyze their topology by placing a ball of radius $\varepsilon$ around each point. This method, called filtration, allows us to encode geometric information by increasing the value of $\varepsilon$, which determines how much the boundaries of the shape blur and expand. As $\varepsilon$ increases (Figure 2, panels B-E), points that were initially distinct may begin to overlap, altering the concept of proximity. Essentially, this process amounts to taking an impressionist look at the point cloud (partially closing our eyes to blur the fine details, as we might do to appreciate a painting by Claude Monet) to give it a more defined shape.
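To make the filtration idea tangible, the sketch below (assuming the gudhi library is installed; the random point cloud and the $\varepsilon$ values are arbitrary) shows how the complex built on the same points grows as $\varepsilon$ increases.

```python
# A sketch of the filtration idea (assumes gudhi and numpy are installed):
# as the radius epsilon grows, more and more simplices enter the complex.
import numpy as np
import gudhi

rng = np.random.default_rng(1)
points = rng.uniform(size=(60, 2))  # a small random point cloud

for eps in (0.05, 0.1, 0.2, 0.4):
    rips = gudhi.RipsComplex(points=points, max_edge_length=eps)
    st = rips.create_simplex_tree(max_dimension=2)
    print(f"epsilon = {eps}: {st.num_simplices()} simplices")
```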
A bit more formally, a simplicial complex is a collection of simplices: vertices (points), edges (line segments connecting two vertices), faces (triangles bounded by three edges), and their higher-dimensional analogues. The simplices of a simplicial complex must satisfy certain conditions:
Every element of the complex must be a simplex, that is, a vertex, an edge, a triangle, or a higher-dimensional analogue of a triangle.
Every face of a simplex in the complex must also belong to the complex.
The intersection of any two simplices in the complex must be either empty or a common face of both.
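A minimal plain-Python sketch of the closure condition is given below; the helper function name and the toy complexes are ours, for illustration only.

```python
# A plain-Python sketch of the closure condition: every face of every simplex
# in the complex must itself belong to the complex.
from itertools import combinations

def is_simplicial_complex(simplices):
    """simplices: a collection of frozensets of vertex labels."""
    complex_set = set(simplices)
    for simplex in complex_set:
        for k in range(1, len(simplex)):            # all proper non-empty faces
            for face in combinations(sorted(simplex), k):
                if frozenset(face) not in complex_set:
                    return False
    return True

# A filled triangle {a, b, c} together with all of its edges and vertices:
triangle = [frozenset(s) for s in
            [("a",), ("b",), ("c",), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]]
print(is_simplicial_complex(triangle))                       # True
print(is_simplicial_complex([frozenset(("a", "b", "c"))]))   # False: its faces are missing
```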
Once we have learned to build a simplicial complex for a given scale (value of $\varepsilon$), by changing the value of $\varepsilon$ what we are doing is creating a filtered simplicial complex (FSC). Every topological property (such as the homologies $H_k$) that persists through the FSC is a PH. Intuitively, different phenomena under study will give rise to different point clouds that, when analyzed via a FSC, will have different PHs.
2.1.1. Building the FSC
By building the FSC for a given point cloud, one aims to create a complex that approximates the original manifold using the given points. To do this, connections are established between points by adding edges between pairs of points, faces between triples of points, and so on. To determine which connections to create, we use the filtration value (the $\varepsilon$ we already mentioned), which limits the maximum length of the edges that can be included in our simplices. We vary $\varepsilon$ and build the complex at each value, calculating the homology of the complex at each step.
There are three main strategies for using $\varepsilon$ to assign simplices: the Vietoris-Rips strategy, the witness strategy, and the lazy-witness strategy [9]. The Vietoris-Rips strategy adds an edge between two points if their distance is less than $\varepsilon$, a face between three points if their pairwise distances are all less than $\varepsilon$, and so on. This approach is accurate but computationally expensive. The witness strategy uses two sets of points, called landmark points and witness points, to create the complex. Landmark points are used as vertices; edges are added between two landmark points if there is a witness point within distance $\varepsilon$ of both points, faces are added if there is a witness point within $\varepsilon$ of all three points, and so on. The lazy-witness strategy is similar to the witness strategy in terms of how edges are assigned, but simplices of higher order are added wherever there are n points that are all pairwise connected by edges.
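The Vietoris-Rips rule can be sketched in a few lines of plain Python, as below; the function is a didactic illustration of the rule for a given $\varepsilon$ (names are ours), not an efficient implementation.

```python
# A sketch of the Vietoris-Rips rule for a given filtration value epsilon:
# an edge is added when two points are closer than epsilon, a triangle when
# all three pairwise distances are below epsilon, and so on.
import numpy as np
from itertools import combinations

def vietoris_rips(points, epsilon, max_dim=2):
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    simplices = [(i,) for i in range(n)]             # 0-simplices: the points themselves
    for k in range(2, max_dim + 2):                  # k = 2 gives edges, k = 3 triangles, ...
        for combo in combinations(range(n), k):
            if all(dist[i, j] < epsilon for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8], [3.0, 3.0]])
print(vietoris_rips(pts, epsilon=1.2))  # the first three points form edges and a triangle
```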
2.1.2. Calculating the PH
Once we have chosen a filtration, it is possible to calculate the homology groups (the $H_k$'s) of each space in the filtration. Homology groups are a way of describing the topological features of a space, such as connected components, holes, and voids. Depending on the particular task, we may choose a maximum value of k to build the first k homology groups. Then we can use these homology groups to create a barcode or persistence diagram, which shows how the topological features of the space change as the scale changes [9].
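In practice one rarely carries out these computations by hand. The sketch below uses the ripser and persim packages (one of several available open-source implementations; we assume both are installed) to compute and plot the persistence diagrams up to $H_1$ for a noisy circle.

```python
# A sketch using the ripser and persim packages (one of several available
# implementations) to compute and plot persistence diagrams up to H1.
import numpy as np
from ripser import ripser
from persim import plot_diagrams

rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, 150)
X = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(scale=0.1, size=(150, 2))

diagrams = ripser(X, maxdim=1)['dgms']   # diagrams[0] -> H0, diagrams[1] -> H1
plot_diagrams(diagrams, show=True)       # the circle appears as one long-lived H1 point
```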
To calculate persistent homology, one can use a variety of constructions, such as the already mentioned Vietoris-Rips complex, the Čech complex, or the alpha complex. These constructions build a simplicial complex from the data, which can then be used to calculate the homology groups of the space.
Calculating persistent homology and interpreting the results is a non-trivial task. Several issues need to be considered and decisions need to be taken in every step of the process. We can summarize the process as follows:
- Simplicial Complex Construction:
  - Begin by constructing a simplicial complex from your data. This complex can be based on various covering maps, such as the Vietoris-Rips complex, Čech complex, or alpha complex, depending on the chosen strategy (see Section 2.4 and Section 2.5, as well as Table 1 below).
  - The simplicial complex consists of vertices (0-simplices), edges (1-simplices), triangles (2-simplices), and higher-dimensional simplices. The choice of complex depends on your data and the topological features of interest.
- Filtration:
  - Introduce a filtration parameter (often denoted as $\varepsilon$) that varies over a range of values. This parameter controls which simplices are included in the complex based on some criterion (e.g., a distance threshold).
  - As $\varepsilon$ increases, more simplices are added to the complex, and the complex evolves. The filtration process captures the topological changes as $\varepsilon$ varies.
- Boundary Matrix:
  - For each value of $\varepsilon$ in the filtration, compute the boundary matrix (also called the boundary operator) of the simplicial complex. This matrix encodes the incidence relations between simplices.
  - Each row of the boundary matrix corresponds to a (k-1)-dimensional simplex, and each column corresponds to a k-dimensional simplex. The entries indicate whether (and with which orientation) a (k-1)-dimensional simplex is a face of a k-dimensional simplex.
- Persistent Homology Calculation:
  - Perform a sequence of matrix reductions (e.g., Gaussian elimination) to identify the cycles and boundaries in the boundary matrix.
  - A cycle is a collection of simplices whose boundary sums to zero, while a boundary is itself the boundary of a higher-dimensional chain of simplices.
  - Persistent homology focuses on tracking the birth and death of cycles across different values of $\varepsilon$. These births and deaths are recorded in a persistence diagram or barcode.
- Persistence Diagram or Barcode:
  - The persistence diagram is a graphical representation of the births and deaths of topological features (connected components, loops, voids) as $\varepsilon$ varies.
  - Each point in the diagram represents a topological feature and is plotted at its birth value (x-coordinate) and death value (y-coordinate).
  - Interpretation:
    - A point far above the diagonal (toward the upper left of the diagram) represents a long-lived feature that persists across a wide range of $\varepsilon$ values.
    - A point close to the diagonal represents a short-lived feature that exists only for a narrow range of $\varepsilon$ values.
    - Points on the diagonal are born and die at the same value of $\varepsilon$; features near the diagonal therefore have negligible persistence and are usually attributed to noise.
  - The distance between the birth and death of a point in the diagram quantifies the feature's persistence or lifetime. Longer persistence indicates a more stable and significant feature.
- Topological Summaries:
  - By examining the persistence diagram or barcode, you can extract information about the prominent topological features in your data set.
  - Features with longer persistence are considered more robust and significant.
  - The number of connected components, loops, and voids can be quantified by counting points in specific regions of the diagram, as illustrated in the sketch below.
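As a minimal example of such a summary (numpy only; the toy diagram and the lifetime threshold are arbitrary choices of ours), one can rank features by their lifetime and count those that persist beyond a chosen threshold.

```python
# A sketch of a simple topological summary: given an H1 diagram as an array of
# (birth, death) pairs, rank features by lifetime and count the robust ones.
import numpy as np

def summarize(diagram, min_lifetime=0.5):
    finite = diagram[np.isfinite(diagram[:, 1])]   # drop features that never die
    lifetimes = finite[:, 1] - finite[:, 0]        # persistence = death - birth
    order = np.argsort(lifetimes)[::-1]
    robust = int(np.sum(lifetimes > min_lifetime))
    return lifetimes[order], robust

h1 = np.array([[0.10, 1.60], [0.20, 0.25], [0.30, 0.42]])  # toy H1 diagram
lifetimes, n_robust = summarize(h1)
print(lifetimes, n_robust)   # one long-lived loop, two short-lived ones (likely noise)
```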
2.2. The Mapper algorithm
Mapper is a recently developed algorithm that provides a reliable tool for topological data analysis. Mapper allows researchers to identify and visualize the structure of a data set by creating a graph representation of the data [10].
The Mapper algorithm consists of the following steps [11] (see Figure 3):
- Covering the data set: The original data set (Figure 3a) is partitioned into a number of overlapping subsets, called nodes (Figure 3b). This is accomplished using a function called the covering map, which assigns each point in the data set to a node. Since the nodes are allowed to overlap, every point potentially belongs to multiple nodes.
There are several different ways to define a covering map. The choice of covering map, however, can significantly affect the resulting Mapper graph. Some common approaches to define a covering map include:
- (a) Filtering: The data set is partitioned based on the values of one or more variables. A data set may, for instance, be partitioned based on the values of a categorical variable, such as gender or race.
- (b) Projection: The data set is partitioned by calculating the distances between points and using them as a membership criterion. This can be done using a distance function, such as the Euclidean distance or the cosine similarity.
- (c) Overlapping intervals: The data set is partitioned into overlapping intervals, such as bins or quantiles. This can be useful for data sets that are evenly distributed or that have a known underlying distribution.
The choice of covering map depends on the characteristics of the data set and the research question being addressed. It is important to choose a covering map that is appropriate for the data set and that will yield meaningful results.
- Clustering the nodes: The nodes are then clustered using a clustering algorithm, such as k-means or single-linkage clustering. The resulting clusters (Figure 3c) represent the topological features of the data set, and the edges between clusters represent the relationships between those features.
Figure 3.
The steps of the Mapper algorithm. a) The data, a cloud of points. b) The projection of the data into a lower-dimensional space. c) The preimage is clustered. d) A graph is built based on the clustered groups. See text.
The resulting graph (Figure 3d), called a Mapper graph, can be used to identify patterns and relationships in the data set that may not be apparent from other forms of visualization [12].
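As an illustration, the sketch below uses KeplerMapper (kmapper), an open-source implementation of the Mapper algorithm, assuming it and scikit-learn are installed; the lens, cover, and clusterer shown are illustrative choices rather than recommendations.

```python
# A sketch with the open-source KeplerMapper implementation (kmapper): project
# the data with a lens, cover the lens with overlapping intervals, cluster each
# preimage, and visualize the resulting Mapper graph as an HTML file.
import kmapper as km
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=500, noise=0.05, factor=0.4, random_state=0)

mapper = km.KeplerMapper(verbose=0)
lens = mapper.fit_transform(X, projection=PCA(n_components=1))    # the filter / covering map
graph = mapper.map(lens, X,
                   cover=km.Cover(n_cubes=10, perc_overlap=0.4),  # overlapping intervals
                   clusterer=DBSCAN(eps=0.3, min_samples=5))      # clustering inside each preimage
mapper.visualize(graph, path_html="mapper_output.html", title="Mapper graph")
```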
2.3. Multidimensional scaling
In the context of topological data analysis, multidimensional scaling (MDS) is a method to visualize the relationships among a set of complex objects by projecting them into a lower-dimensional space. MDS works by creating a map of the objects in which the distances between objects reflect (to a certain extent) the dissimilarities between them [13]. MDS is often used along with other techniques, such as clustering, to analyze patterns in data sets that have a large number of variables. Multidimensional scaling can help identify relationships between objects that are not immediately apparent; it is therefore useful for the visual exploration of complex data sets [14]. There are several different algorithms for performing MDS, including classical MDS, nonmetric MDS, and metric MDS.
Classical MDS is the most common method (see Figure 4). In a nutshell, we start with a set of n points in a space of high dimension m and introduce a measure of similarity (or dissimilarity), for instance a distance (such as the Euclidean distance). This yields a square symmetric matrix of pairwise distances, and MDS is attained by performing principal coordinate analysis (i.e., eigenvalue decomposition) of that matrix. The result is a set of lower-dimensional coordinates for the n points. Hence, classical MDS is based on the idea of preserving the pairwise distances between objects in the projected low-dimensional map; it finds the map that best preserves these distances using an optimization algorithm. Nonmetric MDS is similar to classical MDS, but it does not assume that the dissimilarities between objects are metric (i.e., represented on a continuous scale); instead of the absolute values, it preserves the rank order of the dissimilarities between objects. Metric MDS, conversely, is a variant of classical MDS based on stress minimization, that is, on minimizing the difference between the distances in the low-dimensional map and the dissimilarities in the data set. This method is used when the dissimilarities between objects can be represented on a continuous scale.
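The classical MDS recipe just described can be sketched directly in numpy; the function below is a didactic implementation of principal coordinate analysis (names are ours), under the assumption that the input is a symmetric matrix of pairwise distances.

```python
# A numpy sketch of classical MDS (principal coordinate analysis): double-center
# the squared distance matrix and keep the leading eigenvectors as coordinates.
import numpy as np

def classical_mds(D, n_components=2):
    """D: (n, n) symmetric matrix of pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[idx], 0, None))  # guard against tiny negatives
    return eigvecs[:, idx] * scale               # low-dimensional coordinates

# Usage: pairwise distances between points that actually live in 3D
X = np.random.default_rng(3).normal(size=(20, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(classical_mds(D, n_components=2).shape)    # (20, 2)
```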
In general, classical MDS is the most widely used method, but nonmetric MDS and metric MDS may be more appropriate in certain situations.
2.3.1. How to determine the scaling approach?
The choice between classical MDS, nonmetric MDS, and metric MDS depends ultimately on the characteristics of the data and the specific research question. Some general guidelines are as follows:
- Classical MDS:
  - When to Use: Classical MDS is suitable when you have metric (distance) data that accurately represent the pairwise dissimilarities between objects. In classical MDS, the goal is to find a configuration of points in a lower-dimensional space (usually 2D or 3D) that best approximates the given distance matrix.
  - Pros:
    - It preserves the actual distances between data points in the lower-dimensional representation.
    - It provides a faithful representation when the input distances are accurate.
    - It is well suited for situations where the metric properties of the data are crucial.
  - Cons:
    - It assumes that the input distances are accurate and may not work well with noisy or unreliable distance data.
    - It may not capture the underlying structure of the data if the metric assumption is violated.
- Nonmetric MDS:
  - When to Use: Nonmetric MDS is appropriate when you have ordinal or rank-order data, where the exact distances between data points are not known but their relative dissimilarities or rankings are available. Nonmetric MDS finds a configuration that best preserves the order of the dissimilarities.
  - Pros:
    - It is more flexible than classical MDS and can be used with ordinal data.
    - It can handle situations where the exact distances are uncertain or difficult to obtain.
  - Cons:
    - It does not preserve the actual distances between data points, so the resulting configuration is only an ordinal representation.
    - The choice of the monotonic transformation function used to convert ordinal data into dissimilarity values can affect the results.
- Metric MDS:
  - When to Use: Metric MDS can be used when you have data that are inherently non-metric, but you believe that transforming them into a metric space could reveal meaningful patterns. Metric MDS aims to find a metric configuration that best approximates the non-metric dissimilarities.
  - Pros:
    - It provides a way to convert non-metric data into a metric representation for visualization or analysis.
    - It can help identify relationships in the data that may not be apparent in the original non-metric space.
  - Cons:
    - The success of metric MDS depends on the choice of the transformation function used to convert non-metric data into metric distances.
    - It may not work well if the non-metric relationships in the data are too complex or cannot be adequately approximated by a metric space.
In summary, the choice between classical, nonmetric, and metric MDS depends on the nature of your data and the goals of your analysis. If you have accurate metric data and want to preserve the actual distances, classical MDS is appropriate. If you have ordinal data or uncertain dissimilarity measures, nonmetric MDS may be more suitable. Metric MDS can be considered when you want to convert non-metric data into a metric space for visualization or analysis, but it requires careful consideration of the transformation function.
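For practical work, the sketch below shows how the metric and nonmetric variants can be obtained from scikit-learn's MDS estimator via its metric flag, assuming scikit-learn is installed; a precomputed dissimilarity matrix is passed directly.

```python
# A sketch using scikit-learn: the same estimator covers metric and nonmetric
# MDS through the `metric` flag, and `dissimilarity='precomputed'` lets us pass
# a distance (or rank-derived dissimilarity) matrix directly.
import numpy as np
from sklearn.manifold import MDS

X = np.random.default_rng(4).normal(size=(30, 5))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise dissimilarities

metric_mds = MDS(n_components=2, dissimilarity='precomputed', metric=True, random_state=0)
nonmetric_mds = MDS(n_components=2, dissimilarity='precomputed', metric=False, random_state=0)

print(metric_mds.fit_transform(D).shape)     # (30, 2) metric embedding
print(nonmetric_mds.fit_transform(D).shape)  # (30, 2) rank-preserving embedding
```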
2.4. Choosing the covering map
When choosing a covering map in TDA, there are several characteristics of the data set that are relevant to consider; among them we can mention the following:
Data Dimensionality: The dimensionality of the data under consideration is crucial. Covering maps should be chosen to preserve the relevant topological information in the data. For high-dimensional data, dimension reduction techniques may be applied before selecting a covering map.
Noise and Outliers: The presence of noise and outliers in the data can affect the choice of a covering map. Robust covering maps can help mitigate the influence of noise and outliers on the topological analysis.
Data Density: The distribution of data points in the feature space matters. A covering map should be chosen to account for variations in data density, especially if there are regions of high density and regions with sparse data.
Topological Features of Interest: It is important to consider the specific topological features one is interested in analyzing. Different covering maps may emphasize different aspects of the data topology, such as connected components, loops, or voids. The choice of a covering map should align with the particular research objectives.
Computational Efficiency: The computational complexity of calculating the covering map should also be taken into account. Some covering maps may be computationally expensive, which can be a limiting factor for large data sets.
Continuous vs. Discrete Data: Determine whether the data you are analyzing is continuous or discrete. The choice of a covering map may differ based on the nature of the data.
Metric or Non-Metric Data: Some covering maps are designed for metric spaces, where distances between data points are well-defined, while others may work better for non-metric or qualitative data.
Geometric and Topological Considerations: Think about the geometric and topological characteristics of your data. Certain covering maps may be more suitable for capturing specific geometric or topological properties, such as persistence diagrams or Betti numbers.
Domain Knowledge: Incorporate domain-specific knowledge into your choice of a covering map. Understanding the underlying structure of the data can guide you in selecting an appropriate covering map.
Robustness and Stability: Assess the robustness and stability of the chosen covering map. TDA techniques should ideally produce consistent results under small perturbations of the data or variations in sampling.
In practice, there are various covering maps and TDA algorithms available, such as Vietoris-Rips complexes, Čech complexes, and alpha complexes. The choice of covering map should be guided by a combination of these factors, tailored to the specific characteristics and goals of your data analysis. It may also involve some experimentation to determine which covering map best captures the desired topological features.
Different types of covering maps are thus best suited for different kinds of data. Some of the main covering maps used in TDA are presented in Table 1.
Ultimately, the choice of covering map depends on the specific characteristics of your data, such as dimensionality, metric properties, and the topological features of interest. It’s often beneficial to experiment with different covering maps and parameters to determine which one best captures the desired topological information for a particular data set. Additionally, combining multiple covering maps and TDA techniques can provide a more comprehensive understanding of complex data sets.
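As a small experiment along these lines (assuming gudhi is installed; the point cloud and the Rips threshold are arbitrary), one can compare the size of the complexes produced by two common constructions on the same data, which illustrates the computational-efficiency consideration mentioned above.

```python
# A sketch comparing the size of two common complexes on the same point cloud;
# in low dimensions the alpha complex is typically much smaller than the Rips
# complex built with a comparable scale parameter.
import numpy as np
import gudhi

points = np.random.default_rng(5).uniform(size=(300, 2))

rips_st = gudhi.RipsComplex(points=points, max_edge_length=0.3).create_simplex_tree(max_dimension=2)
alpha_st = gudhi.AlphaComplex(points=points).create_simplex_tree()

print("Rips simplices: ", rips_st.num_simplices())
print("Alpha simplices:", alpha_st.num_simplices())
```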
2.5. Different strategies for topological feature selection
The Vietoris-Rips strategy, the witness strategy, and the lazy-witness strategy are some of the best known TDA methods to capture and analyze the topological features of data sets. Each of these strategies has its own advantages and disadvantages.
2.5.1. Vietoris-Rips (VR) Strategy:
The main advantage of the VR strategy is its simplicity: the VR complex is relatively easy to understand and implement, since it connects data points based on a fixed distance threshold, which is quite intuitive. Another advantage lies in its widespread use, as there is a significant body of literature and software implementations available. It works well with data in metric spaces, where distances between points are well-defined.
Among the disadvantages is that VR is quite sensitive to its parameters: the choice of the distance threshold (radius parameter) can significantly impact the topology of the resulting complex, and selecting an appropriate threshold can be challenging and may require prior knowledge of the data. VR can also be demanding in terms of computational burden: constructing the VR complex can be computationally expensive, especially for large data sets or high-dimensional data. VR is also limited in terms of robustness: the VR complex is sensitive to noise and outliers, and small perturbations in the data can lead to significant changes in the complex topology.
2.5.2. Witness Strategy (WS):
The witness strategy (WS), in turn, is more robust to noise and outliers than the VR complex; it selects a subset of witness points that can capture the topology of the data more effectively. WS is also more flexible: witness complexes can be applied to both metric and non-metric data, making them versatile for various data types, and they are able to handle data with varying sampling densities, which makes them suitable for irregularly sampled data sets.
Implementing the WS, however, can be more involved than the VR complex, as it requires selecting witness points and computing their witness neighborhoods. Also, while witness complexes are more robust, they still depend on parameters such as the number of witness points and the witness radius, and choosing appropriate values for these parameters can be a non-trivial task.
2.5.3. Lazy-Witness Strategy (LW):
The LW strategy is an optimization of the witness strategy that reduces computational cost. It constructs the witness complex on-the-fly as needed, potentially saving memory and computation time. Like the WS, the LW strategy is robust to noise and outliers.
In spite of these advantages, there are also shortcomings: Implementing the LW strategy can be more complex than the basic witness strategy, as it requires careful management of data structures and computational resources. While it can be more memory-efficient than precomputing a full witness complex, the LW strategy still consumes memory as it constructs the complex in real time. This may still be a limitation for very large data sets.
In summary, the choice between the Vietoris-Rips strategy, witness strategy, and lazy-witness strategy depends on the specific characteristics of your data and the computational resources available. The Vietoris-Rips complex is straightforward but sensitive to parameter choice and noise. The witness strategy offers improved robustness but may require more effort in parameter tuning. The lazy-witness strategy combines robustness with some memory and computation efficiency, making it a good choice for large data sets. Experimentation and a deep understanding of your data characteristics are essential when selecting the most appropriate strategy for your TDA analysis.