Submitted: 15 September 2023
Posted: 19 September 2023
Abstract
Keywords:
1. Introduction
1.1. Related Work
1.2. Contribution
- The development of a mathematical formalism that generalises kNN and makes explicit the contribution of its parameters, such as the dissimilarity and the dataspace (Section 2.4);
- Presenting the procedure and circuit for stereographic embedding using the Bloch embedding procedure, which consumes only constant time and resources (Section 3.1).
2. Preliminaries
2.1. Optical-Fibre Setup
2.2. Bloch Sphere
2.3. Bell-State Measurement and Fidelity
2.4. Nearest-Neighbour Clustering Algorithms
- A space called the dataspace, with elements called points;
- A subset called the dataset, consisting of points called datapoints;
- A list (of size k) of points called centroids;
- A lower-bounded function called the dissimilarity function, or dissimilarity for short (a generic implementation of this formalism is sketched after this list).
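In code, this formalism reads as a single alternating loop in which the dataspace, the dissimilarity and the centroid update are interchangeable parameters. The following is a minimal sketch (the function names are ours for illustration, not code from the paper):

```python
import numpy as np

def knn_clustering(points, centroids, dissimilarity, centroid_update, max_iter=100):
    """Generic nearest-neighbour clustering in the formalism of Section 2.4:
    the dataspace is implicit in the arrays, while `dissimilarity` and
    `centroid_update` are the parameters the definition leaves free."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # cluster update: each point joins the cluster of its 'closest' centroid
        labels = np.array([min(range(len(centroids)),
                               key=lambda k: dissimilarity(p, centroids[k]))
                           for p in points])
        # centroid update (an empty cluster keeps its old centroid)
        new = [centroid_update(points[labels == k]) if np.any(labels == k)
               else centroids[k] for k in range(len(centroids))]
        if all(np.allclose(a, b) for a, b in zip(new, centroids)):
            break  # the 'natural endpoint' of Section 5: nothing changed
        centroids = new
    return labels, centroids

# e.g. the standard Euclidean choice: squared distance and the cluster mean
euclidean = lambda p, c: float(np.sum((p - c) ** 2))
mean_update = lambda cluster: cluster.mean(axis=0)
```

Instantiating the two free parameters with the Euclidean dissimilarity and the cluster mean recovers the standard k-means of Section 2.5; the stereographic variants below only swap these two ingredients.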
2.5. Euclidean Dissimilarity and Classical Clustering
2.6. Cosine Dissimilarity
2.7. Stereographic Projection
3. Stereographic Quantum Nearest-Neighbour Clustering (SQ-kNN)
3.1. Stereographic Embedding, Bloch Embedding and Quantum Dissimilarity
- projecting the 2D point to a point on the sphere of radius r in 3D space through the ISP (Appendix C.1);
- Bloch embedding of the projected point into a one-qubit state (a sketch of both steps follows).
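A minimal sketch of these two steps, assuming the common convention of projecting from the top of the sphere (0, 0, r); the function names are ours:

```python
import numpy as np

def isp(point, r):
    """Inverse stereographic projection of a 2D point onto the sphere of
    radius r, projecting from the top of the sphere (0, 0, r)."""
    x, y = point
    d = x**2 + y**2 + r**2
    return np.array([2 * r**2 * x / d,
                     2 * r**2 * y / d,
                     r * (x**2 + y**2 - r**2) / d])

def bloch_embed(point, r):
    """Bloch embedding: rescale the projected point onto the unit (Bloch)
    sphere, then read off the polar and azimuthal angles of the qubit state
    cos(theta/2)|0> + e^(i*phi) sin(theta/2)|1>."""
    sx, sy, sz = isp(point, r) / r               # rescale the sphere to radius 1
    theta = np.arccos(np.clip(sz, -1.0, 1.0))    # polar angle
    phi = np.arctan2(sy, sx)                     # azimuthal angle
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
```

On hardware, the two angles are exactly the parameters fed to a single one-qubit rotation (cf. Appendix F), which is what makes the embedding constant in time and resources.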
3.2. The SQ-kNN Algorithm
- First, prepare the embedding of the classical data and initial centroids into quantum states using the ISP: project the 2-dimensional datapoints and initial centroids (in our case, the alphabet) into a sphere of radius r and calculate the polar and azimuthal angles of the points. This first step is executed entirely on a classical computer.
- Cluster Update: The calculated angles are used to create the states using Bloch embedding (Definition 7). The dissimilarity between the centroid and point is then estimated using the Bell-state measurement. Once the dissimilarities between a point and all the centroids have been obtained, the point is assigned to the cluster of the 'closest' centroid. This is repeated for all the points that have to be classified. This step is handled entirely by the quantum circuit and the classical controller: the controller feeds in the classical values at the appropriate times, stores the results of the various shots and assigns the point to the appropriate cluster.
- Centroid Update: Since any non-zero point on the relevant subspace (see Corollary 1, Figure 4) is an equivalent choice, to minimise computational expense the centroids are updated as the sum of all points in the cluster (as opposed to, for example, the average, which minimises the Euclidean dissimilarity, Eq. (26)). A classical simulation of these three steps is sketched at the end of this subsection.
- The stereographic embedding of the 2D datapoints is done by inverse stereographic projecting the point into a sphere of a chosen radius and then producing the quantum state obtained by rescaling the sphere to radius one. In contrast, in angle embedding the coefficients of the vectors are used as the angles of the Bloch vector (also known as dense angle embedding [47]), while in amplitude embedding they are used as the amplitudes in the standard basis. For 2D vectors, amplitude embedding allows one to encode only one coefficient (instead of two) in one qubit, and sometimes angle embedding also encodes only one coefficient by using a single rotation (standard angle embedding [48]). Both angle and amplitude embedding require the lengths of the vectors to be stored classically beside the quantum state, which is not needed in Bloch embedding.
- No post-processing is needed after the overlap estimation of stereographically embedded data, as the obtained estimate is already a linear function of the inner product, as opposed to standard approaches using angle or amplitude encoding. Amplitude embedding also requires non-trivial computational time for state preparation. In contrast, in angle embedding, though the state preparation time is constant, recovering a useful dissimilarity (e.g. Euclidean) may involve many post-processing steps.
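The following classical simulation of the three steps above makes the loop concrete. It reuses isp() and bloch_embed() from the sketch in Section 3.1 and replaces the Bell-state measurement by the exact overlap, which on hardware is only estimated from the statistics of many shots; summing the projected 3D points in the centroid update is our reading of the sum-point rule:

```python
import numpy as np
# reuses isp() and bloch_embed() from the sketch in Section 3.1

def state_from_sphere(v):
    """One-qubit state whose Bloch vector is the unit 3D vector v."""
    theta = np.arccos(np.clip(v[2], -1.0, 1.0))
    phi = np.arctan2(v[1], v[0])
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def sq_knn(points2d, centroids2d, r, max_iter=50):
    pts3 = np.array([isp(p, r) for p in points2d])    # step 1: classical ISP
    cents = np.array([isp(c, r) for c in centroids2d])
    states = [state_from_sphere(v / r) for v in pts3]
    labels = None
    for _ in range(max_iter):
        cstates = [state_from_sphere(v / np.linalg.norm(v)) for v in cents]
        # cluster update: maximal overlap = minimal quantum dissimilarity
        new_labels = np.array([np.argmax([abs(np.vdot(c, s)) ** 2 for c in cstates])
                               for s in states])
        if labels is not None and np.array_equal(new_labels, labels):
            break                                     # natural endpoint
        labels = new_labels
        # centroid update: the plain cluster sum; any non-zero point on the
        # same ray is an equivalent centroid (Corollary 1)
        cents = np.array([pts3[labels == k].sum(axis=0) if np.any(labels == k)
                          else cents[k] for k in range(len(cents))])
    return labels, cents
```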
3.3. Complexity Analysis and Scaling
3.3.1. Using Qubit-Based System
3.3.2. Using Qudit-Based System
3.4. SQ-kNN and Mixed States
4. Quantum-Inspired Stereographic k-Nearest-Neighbour Clustering
- Stereographically project all the 2-dimensional data and initial centroids into the sphere of radius r. Notice that the initial centroids will lie on the sphere by construction.
- Cluster Update: Form the clusters using the method defined in Eq. (22), with the dataspace and dissimilarity as given in Definition 5 and Lemma 2.
- Centroid Update: A closed-form expression for the centroid update was calculated in Eq. (41). This expression recalculates the centroid once the new clusters have been formed. Once all the centroids are updated, Step 2 (cluster update) is repeated, and so on, until a stopping condition is met (see the sketch below).
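A sketch of this loop, reusing isp() from Section 3.1. Since Eq. (41) is not reproduced here, the update below, which rescales the cluster sum back onto the sphere, is one natural reading of the closed form rather than a verbatim implementation:

```python
import numpy as np
# reuses isp() from the sketch in Section 3.1

def sc2d_knn(points2d, centroids2d, r, max_iter=50):
    pts3 = np.array([isp(p, r) for p in points2d])
    cents = np.array([isp(c, r) for c in centroids2d])  # on the sphere by construction
    labels = None
    for _ in range(max_iter):
        # cluster update: nearest centroid by squared Euclidean distance; since
        # all points and centroids lie on the same sphere, this ranking agrees
        # with the cosine dissimilarity
        d = ((pts3[:, None, :] - cents[None, :, :]) ** 2).sum(axis=-1)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                       # natural endpoint
        labels = new_labels
        # centroid update: cluster sum rescaled back onto the sphere
        sums = np.array([pts3[labels == k].sum(axis=0) if np.any(labels == k)
                         else cents[k] for k in range(len(cents))])
        cents = r * sums / np.linalg.norm(sums, axis=1, keepdims=True)
    return labels, cents
```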
| Algorithm | Reference | Dataset | Initial Centroids | Dataspace | Dissimilarity | Centroid Update |
|---|---|---|---|---|---|---|
| 2DEC-kNN | Definition 4 | D | | | | |
| 3DSC-kNN | Definition 6 | | | | | |
| 2DSC-kNN | Definition 12 | | | | | |
| SQ-kNN | Definition 10 | | | | | |
4.1. Equivalence
- ,
- and ,
- for all and any .
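The claimed equivalence can also be checked numerically with the two sketches above, up to floating-point ties in the nearest-centroid choice (random data and our assumed conventions):

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))      # synthetic 2D data
init = rng.normal(size=(4, 2))       # synthetic initial centroids

labels_q, _ = sq_knn(pts, init, r=3.0)     # simulated quantum version
labels_c, _ = sc2d_knn(pts, init, r=3.0)   # quantum-inspired classical version
assert np.array_equal(labels_q, labels_c)  # identical clusters, as per Section 4.1
```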
4.2. Complexity Analysis and Scaling
5. Experiments and Results
- The hardware performance and availability of quantum computers (NISQ devices) are currently so much worse than those of classical computers that no advantage is likely to be obtained with the quantum algorithm.
- The goal of this paper is not to show a "quantum advantage" in time complexity over classical k-means in the NISQ context; it is to show that stereographic projection can lead to better learning for classical clustering and to a better embedding for quantum clustering. In particular, the equivalence between 2DSC-kNN and SQ-kNN proves that noise is the only limitation preventing the stereographic quantum algorithm from achieving the accuracy of the quantum-inspired algorithm.
- Radius: the radius of the stereographic sphere into which the 2-dimensional points are projected.
- Number of points: the number of points upon which the clustering algorithm was performed. For every experiment, the selected points were a random subset of all the 64-QAM data (of a specific noise) with cardinality equal to the required number of points. The random subset is created using the numpy.random.sample() function from the Python NumPy library.
- Number of runs: Since each experiment selects a random subset of points for every choice of parameters, we repeat each experiment many times to remove the bias of the random choice and to obtain stable averages and standard deviations for the collected performance parameters (described below). This number of repetitions is the "number of runs". A sketch of this experimental harness follows the list.
- Dataset Noise: As explained in Section 2.1, data was collected for four different input powers. The data is divided into four datasets labelled with powers 2.7, 6.6, 8.6 and 10.7 dBm.
- Natural endpoint: The natural endpoint of a clustering algorithm occurs when the cluster update leaves every cluster unchanged, i.e. when all the clusters stay the same even after the centroid update. It is the natural endpoint since, if the clusters do not change, the centroids will not change in the next iteration either, leading to the same clusters (Eq. (85)) and centroids for all future iterations.
- 2DSC-kNN: The quantum-analogue algorithm of Definition 12, the classical equivalent of SQ-kNN and the most important candidate in our testing.
- 2DEC-kNN: The standard classical kNN of Definition 4, implemented on the original 2-dimensional dataset, which serves as the baseline for performance comparison.
- 3DSC-kNN: The standard classical kNN, but implemented on the stereographically projected 2-dimensional dataset, as defined in Definition 6. We again emphasise that, in contrast to 2DSC-kNN, the centroid lies within the sphere, and, in contrast to 2DEC-kNN, the clustering takes place in 3-dimensional space. This algorithm serves as another control, to gauge the relative impact of stereographically projecting the dataset versus restricting the centroid to the surface of the sphere. It is an intermediate step between the 2DSC-kNN and 2DEC-kNN algorithms.
- Accuracy: Since we have the true labels of the datapoints available, we can measure the accuracy of the algorithm as the percentage of points that have been given the correct label, i.e. symbol accuracy rate. All accuracies are recorded as a percentage.
- Symbol or bit error rate: As mentioned in Appendix A, due to Gray encoding the bit error rate is approximately one sixth of the symbol error rate, which in turn is simply one minus the accuracy. Although error rates are the standard performance parameter in channel coding, we decided to measure the accuracy instead, which is the standard performance parameter in machine learning.
- Accuracy gain: The gain is calculated as the accuracy of the candidate algorithm minus the accuracy of the 2-dimensional classical k-means clustering algorithm, i.e. it is the increase in accuracy of the algorithm over the baseline, defined as the accuracy of the classical k-means acting on the 2D dataset for that number of points.
- Number of iterations: One iteration of the clustering algorithm occurs when the algorithm performs the cluster update followed by the centroid update (the algorithm must then perform the cluster update again). The number of times the algorithm repeats these two steps before stopping is the number of iterations. We use the number of iterations the algorithm requires to reach its "natural endpoint" as a proxy for convergence performance: the fewer the iterations, the faster the convergence. The number of iterations does not directly correspond to time performance, since the time taken for one iteration differs between the algorithms.
- Iteration gain: The gain in iterations is defined as the number of iterations of the 2DEC-kNN algorithm minus the number of iterations of the candidate algorithm, i.e. how many fewer iterations the candidate algorithm took than 2DEC-kNN to reach its natural endpoint.
- Execution time: The amount of time taken for a clustering algorithm to give the final output (the final centroids and clusters) given the 2-dimensional data points as input, i.e. the time taken end to end for the clustering process. All times in this work are recorded in milliseconds (ms).
- Execution time gain: This gain is calculated as the execution time of the 2DEC-kNN algorithm minus the execution time of the candidate algorithm.
- Overfitting parameter: (accuracy in testing − accuracy in training).
- The Overfitting Test: The dataset is divided into a "training" and a "testing" set to characterise the clustering and classification performance of the algorithms.
- The Stopping Criterion Test: The number of iterations and the other performance parameters are varied to test whether a stopping criterion is required and, if so, of what kind.
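A minimal sketch of the experimental harness for one choice of parameters, assuming each algorithm returns symbol labels aligned with the true labels (which holds here because the initial centroids are the known alphabet); the dictionary keys and signatures are illustrative, not the code used for the paper:

```python
import numpy as np

rng = np.random.default_rng()

def run_experiment(data2d, true_labels, algorithms, n_points, n_runs, radius):
    """Draw a fresh random subset for each run, cluster it with every
    algorithm, and collect mean/std accuracy plus the gain over 2DEC-kNN."""
    acc = {name: [] for name in algorithms}
    for _ in range(n_runs):
        idx = rng.choice(len(data2d), size=n_points, replace=False)
        for name, algo in algorithms.items():
            pred = algo(data2d[idx], radius)  # predicted symbol labels
            acc[name].append(100.0 * np.mean(pred == true_labels[idx]))
    stats = {name: (np.mean(a), np.std(a)) for name, a in acc.items()}
    # accuracy gain: candidate accuracy minus the 2DEC-kNN baseline
    baseline = stats["2DEC-kNN"][0]
    gains = {name: mean - baseline for name, (mean, std) in stats.items()}
    return stats, gains
```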
5.1. Experiment 1: Overfitting
- It exhaustively covers the parameters that can be used to quantify the performance of the algorithms. We observed important trends in the performance parameters with respect to the choice of radius and the number of points (which affects the choice of when to trigger the clustering process on the collected received points).
- It avoids the well-known problem of overfitting. Though this approach is not usually used to test kNN, due to its iterative nature, we felt that, from a machine learning perspective, it is useful to know how well the algorithms perform in a classification setting as well.
- Another reason that justifies the training-and-testing approach (clustering and classification) is the nature of the real-world application. When transmitting QAM data through an optical fibre, the receiver receives one point at a time and has to classify it to a given cluster in real time using the current centroid values. Once a number of datapoints have accumulated, the kNN algorithm can be run to update the centroid values; after the update, the receiver again performs classification until the next batch of points has accumulated. In this scenario, both the clustering and the classification performance of the chosen method are therefore important.
5.1.1. Results
Accuracy Performance
Iteration Performance
Time Performance
Overfitting Performance
5.1.2. Discussion and Analysis
5.2. Experiment 2: Stopping criterion
5.2.1. Results
Characterisation of the 2DSC-kNN algorithm
Comparison with 2DEC-kNN and 3DSC-kNN k-Means Clustering
5.2.2. Discussion and Analysis
5.3. Overall Observations
Overall Observations from Experiment 1
- The ideal projection radius is greater than 1, lying between 2 and 5 in our experiments. At this ideal radius, one achieves maximum testing and training accuracy with the fewest iterations.
- In general, the accuracy performance of the 3DSC-kNN and 2DSC-kNN algorithms is the same; this shows a significant contribution of the ISP to the advantage, as opposed to "quantumness". This is a significant distinction, not made by any previous work.
- The 2DSC-kNN and 3DSC-kNN algorithms increase the accuracy performance in general, with the increase most pronounced for the 2.7 dBm dataset.
- The 2DSC-kNN and 3DSC-kNN algorithms give a larger iteration performance gain (fewer iterations required than 2DEC-kNN) for high-noise datasets and for a large number of points.
- Generally, increasing the number of points favours the 2DSC-kNN and 3DSC-kNN algorithms, with the caveat that a good radius must be chosen carefully.
Overall Observations from Experiment 2
- These results further stress the importance of choosing a good radius (2 to 5 in this application) and a better stopping criterion; the natural endpoint is not suitable.
- The results show that the developed 2DSC-kNN algorithm has significant advantages over 2DEC-kNN k-means clustering and 3DSC-kNN clustering.
- The 2DSC-kNN algorithm performs nearly the same as the 3DSC-kNN algorithm in terms of accuracy, but requires fewer iterations to reach that maximum accuracy (especially for high noise and a large number of points).
- The developed 2DSC-kNN and 3DSC-kNN algorithms are better than the 2DEC-kNN algorithm in general, in terms of both accuracy and the number of iterations needed to reach that maximum accuracy.
- The supremacy of the 2DSC-kNN algorithm over the 2DEC-kNN algorithm implies that a fully-quantum SQ-kNN algorithm would have an advantage over the fully-quantum k-means algorithm of [2].
6. Conclusion and Further work
6.1. Future Work
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| ADC | Analog-to-digital converter |
|---|---|
| CD | Chromatic dispersion |
| CFO | Carrier frequency offset |
| CPE | Carrier phase estimation |
| DAC | Digital-to-analog converter |
| DP | Dual Polarisation |
| ECL | External cavity laser |
| FEC | Forward error correction |
| GHz | Gigahertz |
| GBd | Gigabauds |
| GSa/s | Gigasamples per second |
| DSP | Digital signal processing |
| ISP | Inverse stereographic projection |
| MIMO | Multiple input multiple output |
| M-QAM | M-ary Quadrature Amplitude Modulation |
| QML | Quantum Machine Learning |
| QRAM | Quantum Random Access Memory |
| TR | Timing recovery |
| SQ access | Sample and query access |
| kNN | k-Nearest-Neighbour clustering algorithm (Definition 3) |
| FF-QRAM | Flip flop QRAM |
| NISQ | Noisy intermediate-scale quantum |
| | Dataspace |
| D | 2-dimensional dataset |
| | Set of all M centroids |
| | Cluster associated to a centroid |
| | Dissimilarity (measure function) |
| | Euclidean dissimilarity |
| | Cosine dissimilarity |
| | ISP into a sphere of radius r |
| | n-sphere of radius r |
| | Hilbert space of one qubit |
| nDEC-kNN | n-dimensional Euclidean Classical kNN (Definition 4) |
| SQ-kNN | Stereographic Quantum kNN (Definition 10) |
| 2DSC-kNN | 2D Stereographic Classical kNN (Definition 12) |
| 3DSC-kNN | 3D Stereographic Classical kNN (Definition 6) |
Appendix A. QAM and Data Visualisation
Appendix A.1. Description of 64-QAM Data
- 'alphabet': The initial analog values at which the data was transmitted, in the form of complex numbers whose real and imaginary parts give the in-phase and quadrature amplitudes of the transmitted signal. Since the transmission protocol is 64-QAM, there are 64 values in this variable. The transmission alphabet is the same irrespective of the non-linear distortions.
- 'rxsignal': The analog values of the signal detected by the receiver, in the form of a matrix. Each datapoint was transmitted five times, so each row contains the values detected by the receiver during the different transmission instances of the same datapoint, while different rows correspond to unique datapoint values.
- 'bits': The true labels for the transmitted points, in the form of a matrix. Since the protocol is 64-QAM, each analog point represents six bits, and each datapoint has its own set of six bits as the correct label. The first three bits encode the column and the last three bits encode the row of the constellation; see Figure A3. A loading sketch follows this list.
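A sketch of how such a dataset could be read, assuming it is stored as a MATLAB .mat file holding the three variables above (the filename and the storage format are assumptions for illustration, not taken from the paper):

```python
import numpy as np
from scipy.io import loadmat  # assumption: the dataset ships as a .mat file

data = loadmat("qam64_2.7dBm.mat")    # hypothetical filename
alphabet = data["alphabet"].ravel()    # 64 complex transmit symbols
rxsignal = data["rxsignal"]            # one row per datapoint, five received values
bits = data["bits"]                    # six Gray-coded bits per datapoint

# first transmission instance as 2D (I, Q) points for clustering
points = np.column_stack([rxsignal[:, 0].real, rxsignal[:, 0].imag])
# pack the six bits of each row into an integer symbol label (assuming one
# row per datapoint; the first three bits index the column, the last three the row)
symbols = bits @ (2 ** np.arange(5, -1, -1))
```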



Appendix B. Data Embedding
| Embedding | Encoding | Num. qubits required | Gate Depth |
|---|---|---|---|
| Basis | | per data point | |
| Angle | | | |
| Amplitude | | | gates |
| QRAM | | | queries |
Appendix B.1. Angle Embedding
Appendix C. Stereographic Projection
Appendix C.1. ISP for General Radius

- The azimuthal angle of the original point and of the projected point must be the same, i.e. the original point, the projected point and the top of the sphere (the point from which all projections are drawn) lie in the same plane, which is perpendicular to the 2D plane.
- The projected point lies on the sphere.
- The triangle formed by the top of the sphere, the foot of the projected point on the vertical axis, and the projected point is similar to the triangle formed by the top of the sphere, the origin, and the original point (solving these three conditions yields the closed form below):
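Solving these three conditions, under the common convention of projecting from the top of the sphere $(0,0,r)$ onto the plane $z=0$ (a reconstruction consistent with the conditions above rather than a copy of the paper's own equations), gives for a plane point $(x,y)$:

```latex
s_x = \frac{2r^2 x}{x^2 + y^2 + r^2}, \qquad
s_y = \frac{2r^2 y}{x^2 + y^2 + r^2}, \qquad
s_z = r\,\frac{x^2 + y^2 - r^2}{x^2 + y^2 + r^2}.
```

One checks directly that $s_x^2 + s_y^2 + s_z^2 = r^2$, that the azimuthal angle is preserved, and that $(x, y, 0)$, $(s_x, s_y, s_z)$ and $(0, 0, r)$ are collinear.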
Appendix C.2. Equivalence of Displacement and Scaling

Appendix D. Ellipsoidal Embedding

- The azimuthal angles of the original and projected points are equal;
- The projected point lies on the ellipsoid;
- The triangle formed by the top of the ellipsoid, the foot of the projected point on the vertical axis, and the projected point is similar to the triangle formed by the top of the ellipsoid, the origin, and the original point.
Appendix E. Distance Estimation Using Stereographic Embedding

Appendix F. Rotation Gates and the UGate
References
- Harrow, A.W.; Hassidim, A.; Lloyd, S. Quantum Algorithm for Linear Systems of Equations. Physical Review Letters 2009, 103.
- Lloyd, S.; Mohseni, M.; Rebentrost, P. Quantum algorithms for supervised and unsupervised machine learning. arXiv 2013, arXiv:1307.0411.
- Preskill, J. Quantum Computing in the NISQ era and beyond. Quantum 2018, 2, 79.
- Tang, E. Quantum Principal Component Analysis Only Achieves an Exponential Speedup Because of Its State Preparation Assumptions. Physical Review Letters 2021, 127.
- Arute, F.; Arya, K.; Babbush, R.; Bacon, D.; Bardin, J.; Barends, R.; Biswas, R.; Boixo, S.; Brandao, F.; Buell, D.; et al. Quantum Supremacy using a Programmable Superconducting Processor. Nature 2019, 574, 505–510.
- Schuld, M.; Petruccione, F. Supervised Learning with Quantum Computers; Quantum Science and Technology; Springer International Publishing, 2018.
- Schuld, M.; Sinayskiy, I.; Petruccione, F. An introduction to quantum machine learning. arXiv 2014, arXiv:1409.3097.
- Kerenidis, I.; Prakash, A. Quantum Recommendation Systems. In Proceedings of the 8th Innovations in Theoretical Computer Science Conference (ITCS 2017); Papadimitriou, C.H., Ed.; Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik: Dagstuhl, Germany, 2017; Vol. 67, Leibniz International Proceedings in Informatics (LIPIcs), pp. 49:1–49:21.
- Kerenidis, I.; Landman, J.; Luongo, A.; Prakash, A. q-means: A quantum algorithm for unsupervised machine learning. arXiv 2018, arXiv:1812.03584.
- Modi, A.; Jasso, A.V.; Ferrara, R.; Deppe, C.; Noetzel, J.; Fung, F.; Schaedler, M. Testing of Hybrid Quantum-Classical K-Means for Nonlinear Noise Mitigation. arXiv 2023, arXiv:2308.03540.
- Pakala, L.; Schmauss, B. Non-linear mitigation using carrier phase estimation and k-means clustering. In Proceedings of the Photonic Networks; 16. ITG Symposium; VDE, 2015; pp. 1–5.
- Zhang, J.; Chen, W.; Gao, M.; Shen, G. K-means-clustering-based fiber nonlinearity equalization techniques for 64-QAM coherent optical communication system. Optics Express 2017, 25, 27570–27580.
- Tang, E. Quantum Principal Component Analysis Only Achieves an Exponential Speedup Because of Its State Preparation Assumptions. Physical Review Letters 2021, 127, 060503.
- Gambetta, J. IBM's roadmap for scaling quantum technology.
- Diedolo, F.; Böcherer, G.; Schädler, M.; Calabró, S. Nonlinear Equalization for Optical Communications Based on Entropy-Regularized Mean Square Error. In Proceedings of the European Conference on Optical Communication (ECOC) 2022; Optica Publishing Group, 2022; p. 2.
- Martyn, J.M.; Rossi, Z.M.; Tan, A.K.; Chuang, I.L. A grand unification of quantum algorithms. arXiv 2021, arXiv:2105.02859.
- Kopczyk, D. Quantum machine learning for data scientists. arXiv 2018, arXiv:1804.10068.
- Aïmeur, E.; Brassard, G.; Gambs, S. Quantum clustering algorithms. In Proceedings of the 24th International Conference on Machine Learning (ICML '07), 2007; pp. 1–8.
- Cruise, J.R.; Gillespie, N.I.; Reid, B. Practical Quantum Computing: The value of local computation. arXiv 2020, arXiv:2009.08513.
- Johri, S.; Debnath, S.; Mocherla, A.; Singh, A.; Prakash, A.; Kim, J.; Kerenidis, I. Nearest centroid classification on a trapped ion quantum computer. npj Quantum Information 2021, 7, 122.
- Khan, S.U.; Awan, A.J.; Vall-Llosera, G. K-Means Clustering on Noisy Intermediate Scale Quantum Computers. arXiv 2019, arXiv:1909.12183.
- Cortese, J.A.; Braje, T.M. Loading classical data into a quantum computer. arXiv 2018, arXiv:1803.01958.
- Giovannetti, V.; Lloyd, S.; Maccone, L. Quantum Random Access Memory. Physical Review Letters 2008, 100.
- Buhrman, H.; Cleve, R.; Watrous, J.; de Wolf, R. Quantum fingerprinting. arXiv 2001, arXiv:quant-ph/0102001.
- Ripper, P.; Amaral, G.; Temporão, G. Swap Test-based characterization of decoherence in universal quantum computers. Quantum Information Processing 2023, 22, 1–14.
- Foulds, S.; Kendon, V.; Spiller, T. The controlled SWAP test for determining quantum entanglement. Quantum Science and Technology 2021, 6, 035002.
- Tang, E. A Quantum-Inspired Classical Algorithm for Recommendation Systems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC 2019); Association for Computing Machinery: New York, NY, USA, 2019; pp. 217–228.
- Chia, N.H.; Lin, H.H.; Wang, C. Quantum-inspired sublinear classical algorithms for solving low-rank linear systems. arXiv 2018, arXiv:1811.04852.
- Gilyén, A.; Lloyd, S.; Tang, E. Quantum-inspired low-rank stochastic regression with logarithmic dependence on the dimension. arXiv 2018, arXiv:1811.04909.
- Arrazola, J.M.; Delgado, A.; Bardhan, B.R.; Lloyd, S. Quantum-inspired algorithms in practice. Quantum 2020, 4, 307.
- Chia, N.H.; Gilyén, A.; Li, T.; Lin, H.H.; Tang, E.; Wang, C. Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, 2020.
- Arrazola, J.M.; Delgado, A.; Bardhan, B.R.; Lloyd, S. Quantum-inspired algorithms in practice. Quantum 2020, 4, 307.
- Chia, N.H.; Gilyén, A.; Lin, H.H.; Lloyd, S.; Tang, E.; Wang, C. Quantum-Inspired Algorithms for Solving Low-Rank Linear Equation Systems with Logarithmic Dependence on the Dimension. In Proceedings of the 31st International Symposium on Algorithms and Computation (ISAAC 2020); Cao, Y.; Cheng, S.W.; Li, M., Eds.; Schloss Dagstuhl–Leibniz-Zentrum für Informatik: Dagstuhl, Germany, 2020; Vol. 181, Leibniz International Proceedings in Informatics (LIPIcs), pp. 47:1–47:17.
- Sergioli, G.; Santucci, E.; Didaci, L.; Miszczak, J.A.; Giuntini, R. A quantum-inspired version of the nearest mean classifier. Soft Computing 2018, 22, 691–705.
- Sergioli, G.; Bosyk, G.M.; Santucci, E.; Giuntini, R. A quantum-inspired version of the classification problem. International Journal of Theoretical Physics 2017, 56, 3880–3888.
- Subhi, G.M.; Messikh, A. Simple quantum circuit for pattern recognition based on nearest mean classifier. International Journal on Perceptive and Cognitive Computing 2016, 2.
- Nguemto, S.; Leyton-Ortega, V. Re-QGAN: an optimized adversarial quantum circuit learning framework. 2022.
- Eybpoosh, K.; Rezghi, M.; Heydari, A. Applying inverse stereographic projection to manifold learning and clustering. Applied Intelligence 2022, 52, 4443–4457.
- Poggiali, A.; Berti, A.; Bernasconi, A.; Del Corso, G.; Guidotti, R. Quantum Clustering with k-Means: a Hybrid Approach. arXiv 2022, arXiv:2212.06691.
- de Veras, T.M.L.; de Araujo, I.C.S.; Park, D.K.; da Silva, A.J. Circuit-Based Quantum Random Access Memory for Classical Data With Continuous Amplitudes. IEEE Transactions on Computers 2021, 70, 2125–2135.
- Hornik, K.; Feinerer, I.; Kober, M.; Buchta, C. Spherical k-Means Clustering. Journal of Statistical Software 2012, 50, 1–22.
- Feng, C.; Zhao, B.; Zhou, X.; Ding, X.; Shan, Z. An Enhanced Quantum K-Nearest Neighbor Classification Algorithm Based on Polar Distance. Entropy 2023, 25.
- Nielsen, M.A.; Chuang, I.L. Quantum Computation and Quantum Information; Cambridge University Press, 2000.
- Lloyd, S. Least squares quantization in PCM. IEEE Transactions on Information Theory 1982, 28, 129–137.
- Schubert, E.; Lang, A.; Feher, G. Accelerating Spherical k-Means. In Similarity Search and Applications; Reyes, N.; Connor, R.; Kriege, N.; Kazempour, D.; Bartolini, I.; Schubert, E.; Chen, J.J., Eds.; Springer International Publishing: Cham, 2021; pp. 217–231.
- Ahlfors, L.V. Complex Analysis, 2nd ed.; McGraw-Hill Book Company, 1966.
- LaRose, R.; Coyle, B. Robust data encodings for quantum classifiers. Physical Review A 2020, 102.
- Weigold, M.; Barzen, J.; Leymann, F.; Salm, M. Expanding Data Encoding Patterns For Quantum Algorithms. In Proceedings of the 2021 IEEE 18th International Conference on Software Architecture Companion (ICSA-C), 2021; pp. 95–101.
- Fanizza, M.; Rosati, M.; Skotiniotis, M.; Calsamiglia, J.; Giovannetti, V. Beyond the Swap Test: Optimal Estimation of Quantum State Overlap. Physical Review Letters 2020, 124.
- Foulds, S.; Kendon, V.; Spiller, T. The controlled SWAP test for determining quantum entanglement. Quantum Science and Technology 2021, 6, 035002.
- Microsystems, B. Digital modulation efficiencies.
- Frenzel, L.E., Jr. Electronics Explained: Fundamentals for Engineers, Technicians, and Makers; Newnes, 2018.
- Plesch, M.; Brukner, Č. Quantum-state preparation with universal gate decompositions. Physical Review A 2011, 83, 032302.
- Quantum Computing Patterns. Available online: https://quantumcomputingpatterns.org/ (accessed on 30 October 2021).
- Roy, B. All about Data Encoding for Quantum Machine Learning. 2021. Available online: https://medium.datadriveninvestor.com/all-about-data-encoding-for-quantum-machine-learning-2a7344b1dfef.
| 1 | This might not be true for other definitions of quantum dissimilarity, as in Section 3.4 where we redefine it to include embedding into mixed states. |
| 2 | This is obtained as in Figure 2, where the Hadamard is replaced with the Fourier transform and the CNOT with its qudit generalisation. If we have multiple qubits instead of qudits, then the solution is even simpler: perform a qubit Bell measurement on each pair of qubits. This is because the tensor product of maximally entangled states is still a maximally entangled state. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

