1. Introduction
Consider the following consensus convex optimization problem:

$$\min_{x \in \mathbb{R}^p} \; \sum_{i=1}^{n} f_i(x), \tag{1.1}$$

where $x \in \mathbb{R}^p$ is the global optimization variable, $n$ is the number of agents in the multi-agent system, and $f_i \, (i = 1, \ldots, n): \mathbb{R}^p \to \mathbb{R}$ are convex functions. Each $f_i$ is known only by agent $i$, and the agents cooperatively solve the consensus optimization problem. Many problems encountered in machine learning [1] and power networks [2] can be posed in the model (1.1).
There are two types of distributed algorithms for solving problem (1.1): continuous-time algorithms [3,4,5,6] and discrete-time algorithms. The latter can be further divided into primal algorithms and dual algorithms. In primal algorithms, each agent takes a (sub)gradient-related step and averages its local solution with those of its neighbors [7,8,9]. One great advantage of these methods is their low computation burden, but slow convergence and low accuracy are two strikes against them. Typical dual algorithms include the augmented Lagrangian method [10] and the alternating direction method of multipliers (ADMM) [11,12,13,14,15,16], in which each agent needs to solve a subproblem at each iteration, leading to a high computation burden. However, the fact that they can quickly converge to exact optimal solutions makes up for this.
The ADMM algorithm has attracted significant research interest in recent years. With regard to distributed ADMM algorithms, almost all developments begin by transforming problem (1.1) into an equivalent form: a local copy $x_i$ is introduced for each agent $i$, and consensus $x_i = z$ is enforced. For star networks, the reformulation of problem (1.1) is

$$\min_{\{x_i\},\, z} \; \sum_{i=1}^{n} f_i(x_i) \quad \text{s.t.} \quad x_i = z, \; i = 1, \ldots, n,$$

where $x_i \in \mathbb{R}^p$ and $z$ is the so-called consensus variable. Considerable attention has been paid to this formulation; see [11,12] for details.
A central agent is required in a star network, and thus the algorithms in [11,12] have a high communication burden and low fault tolerance. This has led to growing research interest in general connected networks, for which the consensus optimization problem (1.1) can be rewritten in the following compact form:

$$\min_{x,\, z} \; f(x) \quad \text{s.t.} \quad Ax + Bz = 0,$$

where $f(x) = \sum_{i=1}^{n} f_i(x_i)$, $A$ and $B$ are matrices related to the network structure, and $z$ is the slack variable. For this kind of problem, Wei and Ozdaglar [13] proposed a distributed Gauss-Seidel ADMM algorithm and proved that its convergence rate is $O(1/k)$ when the objective functions $f_i$ ($i = 1, \ldots, n$) are convex. In this algorithm, agents can only update in order. To save the waiting time of the agents in [13], Yan [14] proposed a parallel ADMM algorithm that adopts the Jacobi iteration. Besides, some distributed ADMM algorithms for nonconvex but differentiable problems have also been established in [15,16].
In addition to the algorithms in [11,12,13,14,15,16], several other ADMM algorithms can also solve problem (1.1). These algorithms were originally designed to solve multi-block separable problems, which can be cast as

$$\min_{x_1, \ldots, x_n} \; \sum_{i=1}^{n} f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{n} A_i x_i = b,$$

where $x_i \in \mathbb{R}^{p_i}$. A wide variety of proximal ADMM algorithms have been proposed for this kind of formulation. Research on these algorithms mainly focuses on the proximal matrix $P_i$ and the damping parameter $\gamma$. Deng et al. [17] presented a parallel ADMM algorithm in which the proximal matrix $P_i$ is required to satisfy $P_i \succ \rho\left(\frac{n}{2-\gamma} - 1\right) A_i^{\top} A_i$, where $\rho > 0$ is the penalty parameter and $\gamma \in (0, 2)$. There are two specific choices for the proximal matrix $P_i$ in [18]: (1) the standard proximal matrix $P_i = \tau_i I$; (2) the linearized proximal matrix $P_i = \tau_i I - \rho A_i^{\top} A_i$. Therefore, the condition in [17] can be reduced to a lower bound on the proximal parameter $\tau_i$. Afterwards, Sun and Sun [19] came up with an improved proximal ADMM algorithm with partially parallel splitting and gave a corresponding lower bound on the proximal parameter.
Inspired by the works in [13,14,17,19], this paper puts forward a distributed Jacobi-proximal ADMM algorithm to solve the consensus convex optimization problem (1.1) over a general connected network. Compared with state-of-the-art algorithms, the proposed algorithm has the following outstanding features.
(1) Compared with the algorithm in [13], the optimization variables of all agents can be updated simultaneously; hence, waiting time is saved.
(2) Compared with [14], only half as many dual variables are used in the proposed algorithm. Therefore, the communication burden among agents and the storage cost for each agent are reduced.
(3) The proximal matrix of the presented algorithm is smaller than those in [17,19]. Thus, the distributed Jacobi-proximal ADMM algorithm is favorable according to the general principle given by Fazel et al. [20] that the proximal matrix should be as small as possible. Besides, the admissible range of the damping parameter in the proposed algorithm is larger than that in [19].
The rest of this paper is organized as follows. In Section 2, the equivalent form of the consensus convex optimization problem (1.1) is introduced and, based on this equivalent form, a distributed Jacobi-proximal ADMM algorithm is proposed. Section 3 supplies the convergence analysis of the algorithm. In Section 4, extensive numerical experiments are provided to verify the effectiveness of the proposed algorithm; moreover, the impacts of the penalty parameter, the damping parameter and the connectivity ratio on the algorithm are investigated. In Section 5, the proposed algorithm is applied to a logistic regression problem and its numerical results are compared with those in [17]. Finally, the conclusions of this paper are presented in Section 6.
2. Problem Formulation and Distributed Jacobi-Proximal ADMM Algorithm
In this section, some notations related to the network are introduced, and the consensus convex optimization problem (1.1) is reformulated so that it can be solved by ADMM.
The network topology of the multi-agent system is assumed to be a general undirected connected graph, described as $G = (V, E)$, where $V$ denotes the set of agents, $E$ denotes the set of edges, $|V| = n$ and $|E| = m$. The agents are labeled from 1 to $n$. The edge between agents $i$ and $j$ with $i < j$ is represented by $(i, j)$ or $e_{ij}$, and it means that agents $i$ and $j$ can exchange data with each other. The neighbors of agent $i$ are denoted by $N_i = \{j \in V : (i, j) \in E \text{ or } (j, i) \in E\}$.
The $m \times n$ edge-node incidence matrix of the network $G$ is denoted by $\tilde{A}$. The row in $\tilde{A}$ that corresponds to the edge $e_{ij}$ is denoted by $\tilde{a}_{ij}$, which is defined entrywise by

$$[\tilde{a}_{ij}]_k = \begin{cases} 1, & k = i, \\ -1, & k = j, \\ 0, & \text{otherwise}. \end{cases}$$

Here, the edges of the network are sorted by the order of their corresponding agents. For instance, the network $G$ in Fig. 1 determines its edge-node incidence matrix in this way, with one row per edge.
Figure 1. An example of the network G.
According to the edge-node incidence matrix, the extended edge-node incidence matrix $A$ of the network $G$ is given by

$$A = \tilde{A} \otimes I_p,$$

where $\otimes$ denotes the Kronecker product and $I_p$ is the $p \times p$ identity matrix. Obviously, $A$ is a block matrix with $m \times n$ blocks of $p \times p$ matrices.
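As an illustration, the following snippet builds an edge-node incidence matrix and its Kronecker extension; the edge list and dimensions are placeholders, not the graph of Fig. 1.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Edge-node incidence matrix: one row per edge (i, j) with i < j,
    entry +1 in column i and -1 in column j (0-based indexing)."""
    A_tilde = np.zeros((len(edges), n))
    for row, (i, j) in enumerate(sorted(edges)):
        A_tilde[row, i] = 1.0
        A_tilde[row, j] = -1.0
    return A_tilde

n, p = 4, 2                                   # hypothetical sizes
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]      # hypothetical network
A_tilde = incidence_matrix(n, edges)
A = np.kron(A_tilde, np.eye(p))               # extended matrix A_tilde (x) I_p
print(A.shape)                                # (m*p, n*p) = (8, 8)
```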
By introducing a separate decision variable $x_i \in \mathbb{R}^p$ for each agent $i$, the consensus convex optimization problem (1.1) takes the following form:

$$\min_{x} \; \sum_{i=1}^{n} f_i(x_i) \quad \text{s.t.} \quad x_i = x_j, \; \forall (i, j) \in E, \tag{2.1}$$

where $x = (x_1^{\top}, x_2^{\top}, \ldots, x_n^{\top})^{\top}$. Clearly, problem (2.1) is equivalent to problem (1.1) if $G$ is connected.
With the help of the extended edge-node incidence matrix $A$, problem (2.1) can be rewritten in the following compact form:

$$\min_{x} \; f(x) := \sum_{i=1}^{n} f_i(x_i) \quad \text{s.t.} \quad Ax = 0. \tag{2.2}$$
Divide the neighbors $N_i$ of agent $i$ into two sets: the predecessors $N_i^- = \{j \in N_i : j < i\}$ and the successors $N_i^+ = \{j \in N_i : j > i\}$. The distributed Jacobi-proximal ADMM (DJP-ADMM) algorithm is then described as Algorithm 1.
Algorithm 1: Distributed Jacobi-Proximal ADMM Algorithm (DJP-ADMM)
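As a rough illustration of the structure just described — simultaneous proximal $x$-updates over a general connected network, followed by one damped dual update per edge — here is a minimal Python sketch. The specific update formulas, the choice $P_i = \tau I$, the generic local solver and all parameter values are illustrative assumptions rather than a transcription of Algorithm 1.

```python
import numpy as np
from scipy.optimize import minimize

def djp_admm(f_list, p, edges, rho=1.0, gamma=1.0, tau=2.0, iters=200):
    """Sketch of a Jacobi-proximal consensus ADMM for
        min sum_i f_i(x_i)  s.t.  x_i = x_j for every edge (i, j), i < j,
    with one dual variable per edge, simultaneous (Jacobi) x-updates and
    a proximal term (tau/2)||x_i - x_i^k||^2, i.e. P_i = tau * I (assumed)."""
    n = len(f_list)
    x = np.zeros((n, p))
    lam = {e: np.zeros(p) for e in edges}
    nbrs = [[] for _ in range(n)]
    for (i, j) in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    for _ in range(iters):
        x_old = x.copy()
        for i in range(n):
            # Net dual term on agent i: +lam on edges (i, j), -lam on (j, i).
            s = np.zeros(p)
            for j in nbrs[i]:
                s += lam[(i, j)] if i < j else -lam[(j, i)]
            def local_obj(v, i=i, s=s):
                quad = sum(np.sum((v - x_old[j]) ** 2) for j in nbrs[i])
                return (f_list[i](v) + s @ v
                        + 0.5 * rho * quad
                        + 0.5 * tau * np.sum((v - x_old[i]) ** 2))
            x[i] = minimize(local_obj, x_old[i]).x  # parallelizable across agents
        for (i, j) in edges:                         # damped dual ascent per edge
            lam[(i, j)] += gamma * rho * (x[i] - x[j])
    return x

# Example: two quadratic agents on a single edge; the consensus optimum is 0.
f_list = [lambda v: np.sum((v - 1.0) ** 2), lambda v: np.sum((v + 1.0) ** 2)]
print(djp_admm(f_list, p=1, edges=[(0, 1)], iters=300))
```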
Remark 1. The parallel ADMM algorithm presented in [
14] is shown as follows:
It is clear that the number of dual variables in (2.3) is twice that in DJP-ADMM. Thus, the communication burden among agents and the storage cost for each agent in Algorithm 1 are smaller than those in [14].
3. Convergence Analysis
In this section, some important notations and technical lemmas are given. Then, the convergence of Algorithm 1 is analyzed.
Remark 2. Hong et al. [16] have pointed out that $\tilde{A}^{\top} \tilde{A}$ is the $n \times n$ sign Laplace matrix of the graph $G$.
The extended degree matrix and the extended sign Laplace matrix of the network $G$ are denoted by

$$D = \tilde{D} \otimes I_p, \qquad L_s = (\tilde{A}^{\top} \tilde{A}) \otimes I_p,$$

where $\tilde{D}$ is the $n \times n$ degree matrix of the graph $G$.
To simplify the notation, let $W = \tilde{W} \otimes I_p$ denote the extended adjacency matrix, where $\tilde{W}$ is the $n \times n$ adjacency matrix of the graph $G$, and let $Q$ denote the matrix, assembled from the proximal matrices, the penalty parameter and the extended network matrices, that governs the analysis below. To ensure the convergence of Algorithm 1, it is necessary to make the following assumption on the matrix $Q$.
Assumption 1. The matrix Q is a positive definite matrix.
Remark 3. If the proximal matrices $P_i$ ($i = 1, \ldots, n$) are symmetric, then Assumption 1 can be reduced to the positive definiteness of a simpler symmetric matrix, and a proximal matrix built from the degree matrix of the graph $G$ is a feasible choice.
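The explicit formula for $Q$ depends on the chosen proximal matrices; once $Q$ has been assembled as a matrix, Assumption 1 can be checked numerically. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def satisfies_assumption_1(Q, tol=1e-10):
    """Assumption 1 holds iff the smallest eigenvalue of (symmetric) Q
    is strictly positive; symmetrize first to guard against round-off."""
    Q_sym = 0.5 * (Q + Q.T)
    return np.linalg.eigvalsh(Q_sym).min() > tol
```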
Remark 4. By the definition of $Q$, the matrix $Q$ is symmetric positive definite under Assumption 1, and thus there exists a matrix $M$ such that

$$Q = M^{\top} M. \tag{3.6}$$
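Numerically, such an $M$ can be obtained from a Cholesky factorization; a small sketch with a placeholder $Q$:

```python
import numpy as np

Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])       # any symmetric positive definite matrix
L = np.linalg.cholesky(Q)        # lower triangular factor, Q = L @ L.T
M = L.T                          # hence Q = M.T @ M, matching (3.6)
assert np.allclose(M.T @ M, Q)
```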
According to the convexity of the objective function, we have the following result.
Lemma 1.
Assume that $\{(x^k, \lambda^k)\}$ is the sequence produced by Algorithm 1 for the problem (2.2), with penalty parameter $\rho > 0$ and an admissible damping parameter. Then the inequality (3.7) holds.
Proof. Define the function minimized in the $x$-update of Algorithm 1. Using the iteration of $x$ in Algorithm 1, one can conclude that $x^{k+1}$ is the minimizer of this function. Therefore, there exists a subgradient of $f$ at $x^{k+1}$ such that the first-order optimality condition (3.8) holds.
Due to the convexity of $f$, we have the subgradient inequality at $x^{k+1}$. This together with (3.8) implies that
Substituting the gradient of the quadratic part of the subproblem objective into the above inequality, we have
From the iteration of the multipliers, one can obtain that
Similarly,
Hence,
By the definition of the matrix $A$, the above inequality can be simplified as follows:
Then,
By the definition of matrices
A and
D, we have
In addition, a similar relation holds in which the adjacency matrix of the graph $G$ appears. The above two relations indicate that
Therefore, by the definition of the extended sign Laplace matrix, one can conclude that
Analogously,
Besides, by the definition of matrix
Q, we have
Thus, recalling (3.9)-(3.12), inequality (3.7) holds. □
The non-negativity of the norm is very important in the subsequent convergence analysis. To this end, certain terms in Lemma 1 will be converted into norm form. To simplify some expressions in the proofs of the following lemmas, the corresponding collection of terms is denoted as in (3.13), where $M$ is defined in (3.6).
Under Assumption 1, we obtain the following lemma.
Lemma 2.
Assume that $\{(x^k, \lambda^k)\}$ is the sequence produced by Algorithm 1 for the problem (2.2), with penalty parameter $\rho > 0$ and an admissible damping parameter. Then, under Assumption 1, the equality (3.14) holds, where the shorthand of (3.13) is used.
Proof. To prove (3.14), we first claim that (3.15) holds. Indeed, by the iteration of the multipliers, we obtain the two identities (3.17) and (3.18).
Therefore, equalities (3.17) and (3.18) indicate that equality (3.15) is valid. In addition, by rearranging some of the terms, we obtain two further identities. Combining these two equalities, and taking the definition of $M$ in (3.6) into account, we can get the equality (3.16). Consequently, by (3.15) and (3.16), the equality (3.14) holds. □
With the help of the preceding two lemmas, the convergence result of Algorithm 1 can be established.
Theorem 1.
Assume that $\{(x^k, \lambda^k)\}$ is the sequence produced by Algorithm 1, with penalty parameter $\rho > 0$ and an admissible damping parameter. Let $\bar{x}^k$ be the ergodic average of the iterates $\{x^t\}$ from step 1 to $k$, and let $(x^*, \lambda^*)$ be the optimal solution of the problem (2.2). Then, under Assumption 1, the relation (3.19) holds for any $k \geq 1$, where the term given in (3.13) is non-negative. Furthermore, $f(\bar{x}^k)$ converges to the optimal value $f(x^*)$ with the rate of $O(1/k)$.
Proof. It follows from the optimality of $x^*$ that the first inequality in (3.19) is clearly true. Letting $x = x^*$ in inequality (3.7), one then obtains a bound on the objective gap. Taking into consideration the feasibility of $x^*$ and the iteration of the multipliers, the above inequality can be rewritten as:
By Lemma 2, one has
and then
Due to the non-negativity of the term in (3.13) for any $k$, the following inequality holds:
Since the function $f$ is convex, Jensen's inequality applies to the ergodic average $\bar{x}^k$, and then, using the above bound, we have
i.e.,
Therefore, inequality (3.19) holds. Furthermore, inequality (3.22) implies that
On the other hand, from the optimality of $x^*$, we have the reverse bound. As a result, $f(\bar{x}^k) \to f(x^*)$ at the rate $O(1/k)$, and the proof is completed. □
Remark 5. Theorem 1 gives a theoretical upper bound for the objective error $f(\bar{x}^k) - f(x^*)$, which provides an error estimate for the optimal value at each iteration $k$. The upper bound consists of two additive terms, both of which approach zero at the rate $O(1/k)$. In addition, Theorem 1 implies that $f(\bar{x}^k)$ converges to the optimal value asymptotically. Furthermore, if at least one function $f_i$ is strongly convex, then the optimal solution is unique, and thus $\bar{x}^k$ asymptotically approaches the optimal solution $x^*$.
Remark 6. When solving the consensus optimization problem (1.1), the convergence condition of Algorithm 1 is less conservative than that in [17]: according to Remark 3, the convergence of Algorithm 1 can be guaranteed if the proximal matrices are symmetric and Assumption 1 holds, while the algorithm in [17] requires that each proximal matrix be symmetric positive semi-definite and satisfy $P_i \succ \rho\left(\frac{n}{2-\gamma} - 1\right) A_i^{\top} A_i$.
4. Numerical Experiments
In this section, some numerical experiments are provided to show the validity of Algorithm 1. First, the convergence property of Algorithm 1 is verified. Then, the impacts of the penalty parameter $\rho$, the damping parameter $\gamma$ and the connectivity ratio $d$ on Algorithm 1 are investigated.
In this section, each edge of the connected network $G$ is generated randomly. The connectivity ratio of the network $G$ is denoted by $d = 2m / \left(n(n-1)\right)$, the fraction of possible edges that are present. Consider the following consensus optimization problem given in [21]:

$$\min_{x \in \mathbb{R}^p} \; \sum_{i=1}^{n} \|x - \theta_i\|^2, \tag{4.1}$$

where $\theta_i \in \mathbb{R}^p$. Apparently, the optimal solution of this problem is the average $x^* = \frac{1}{n} \sum_{i=1}^{n} \theta_i$.
The consensus optimization problem (4.1) can be reformulated into a distributed version:

$$\min_{x} \; \sum_{i=1}^{n} \|x_i - \theta_i\|^2 \quad \text{s.t.} \quad Ax = 0, \tag{4.2}$$

where $f_i(x_i) = \|x_i - \theta_i\|^2$ and $f$ is convex. Therefore, Algorithm 1 can be used to solve the consensus optimization problem. For the consensus optimization problem (4.2), each $\theta_i$ is randomly generated from a normal distribution.
The proximal matrix of Algorithm 1 is set to a multiple of the identity. In this case, the iteration of $x$ has a closed-form solution, obtained by setting the gradient of the local subproblem to zero, in which $|N_i|$ is the number of neighbors of the agent $i$; a sketch is given below.
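For concreteness, here is a minimal end-to-end simulation of this experiment in the spirit of the sketch after Algorithm 1, using the closed-form update that follows from the first-order condition of the local subproblem under the assumed choices $f_i(x_i) = \|x_i - \theta_i\|^2$ and $P_i = \tau I$. The graph generator, parameter values and sign conventions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_connected_graph(n, d, rng):
    """Random spanning tree plus extra edges until the (assumed) connectivity
    ratio 2m / (n(n-1)) roughly reaches d; edges stored as (i, j), i < j."""
    edges = {(int(rng.integers(0, i)), i) for i in range(1, n)}
    while len(edges) < int(d * n * (n - 1) / 2):
        i, j = sorted(rng.choice(n, size=2, replace=False))
        edges.add((int(i), int(j)))
    return sorted(edges)

n, p, d = 30, 3, 0.3
edges = random_connected_graph(n, d, rng)
nbrs = [[] for _ in range(n)]
for (i, j) in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

theta = rng.normal(size=(n, p))
x_star = theta.mean(axis=0)                  # optimal solution of (4.1)
rho, gamma = 0.5, 1.0                        # illustrative values only
tau = rho * max(len(nb) for nb in nbrs)      # proximal weight, P_i = tau * I

x = np.zeros((n, p))
lam = {e: np.zeros(p) for e in edges}
for k in range(3500):
    x_old = x.copy()
    for i in range(n):
        s = np.zeros(p)
        for j in nbrs[i]:
            s += lam[(i, j)] if i < j else -lam[(j, i)]
        # Zero-gradient point of  ||v - theta_i||^2 + s.v
        #   + (rho/2) sum_{j in N_i} ||v - x_j^k||^2 + (tau/2)||v - x_i^k||^2
        x[i] = (2 * theta[i] - s + rho * sum(x_old[j] for j in nbrs[i])
                + tau * x_old[i]) / (2 + rho * len(nbrs[i]) + tau)
    for (i, j) in edges:
        lam[(i, j)] += gamma * rho * (x[i] - x[j])
    if np.linalg.norm(x - x_star) / np.linalg.norm(x_star) < 1e-10:
        break
print(k, np.linalg.norm(x - x_star) / np.linalg.norm(x_star))
```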
A. Convergence Property
To illustrate the convergence property of Algorithm 1 for the consensus optimization problem (4.2), ten networks are generated. The networks share the same number of agents, and their connectivity ratios are set as $d = 0.1$, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0, respectively. The penalty and damping parameters are fixed across all runs. The algorithm is stopped once the relative error with respect to $x^*$ reaches a prescribed tolerance or the number of iterations $k$ reaches 3500, where $x^*$ is the optimal solution of problem (4.1).
Fig. 2 and Fig. 3 respectively depict how the relative error and the constraint violation vary with the iteration $k$. One can find that Algorithm 1 attains high accuracy, since both the relative error and the constraint violation are driven down to very small values.
Figure 2. Relative error versus iteration.
Figure 3. Constraint violation versus iteration.
B. Algorithm Parameters $\rho$ and $\gamma$
In this part, the impacts of the algorithm parameters $\rho$ and $\gamma$ on the convergence speed of Algorithm 1 are discussed. The networks are generated in the same way as in Part A. In order to explore the influences of the parameters $\rho$ and $\gamma$ on Algorithm 1, the convergence speed is measured through the number of iterations $K$ required to reach a prescribed accuracy: the fewer iterations, the faster the convergence. Here, the accuracy is set to a fixed small tolerance.
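In code, this measurement can be taken from a logged error history; a small helper (names are illustrative):

```python
def iterations_to_accuracy(err_history, tol):
    """Return the first iteration count K at which the relative error drops
    below tol; fewer iterations means faster convergence."""
    for k, err in enumerate(err_history, start=1):
        if err < tol:
            return k
    return None   # tolerance not reached within the logged horizon
```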
Choosing a fixed damping parameter and selecting different penalty parameters $\rho$ to solve the problem (4.2), one obtains the relationship between the convergence speed and the parameter $\rho$, which is displayed in Fig. 4. Obviously, if the penalty parameter is too large or too small, the convergence speed of the algorithm is slow; the penalty parameter should be selected from an intermediate range. In general, a smaller connectivity ratio leads to a larger optimal penalty parameter. As a consequence, when the network is sparse, it is better to select a larger penalty parameter, and when it is dense, a smaller penalty parameter is a good choice.
Figure 4. Convergence speed versus $\rho$.
In order to explore the influence of the damping parameter $\gamma$ on Algorithm 1, the penalty parameter is fixed and the damping parameter is set to 60 different values. The numerical results are shown in Fig. 5. Obviously, the convergence speed of Algorithm 1 first increases with the damping parameter and then remains essentially constant. Therefore, a relatively large admissible damping parameter is a good choice.
Figure 5. Convergence speed versus $\gamma$.
C. Connectivity Ratio
In this part, the effect of the connectivity ratio $d$ on the convergence speed of Algorithm 1 is explored. From Fig. 4, one can find that the impact of the connectivity ratio on the convergence speed depends on the value of the penalty parameter. Therefore, the penalty parameter is set to six different values, ranging from small values such as 0.05 up to 2.
We generate 30 networks with the same number of agents, whose connectivity ratios are set to 30 evenly spaced values up to 1. From Fig. 6, one can find that when the penalty parameter takes a smaller value, such as 0.05, the convergence speed of Algorithm 1 generally slows down as the connectivity ratio increases; Fig. 7 shows that the opposite is true when the penalty parameter takes a bigger value, such as 2. It is worth noting that when the network is very sparse, the convergence speed is slow no matter what value the penalty parameter takes. Therefore, on the premise of ensuring network connectivity, a few edges can be added to increase the information exchange between agents.
Figure 6. Convergence speed versus d.
Figure 7. Convergence speed versus d.
5. Application to a Logistic Regression Problem
In this section, the proposed distributed Jacobi-proximal ADMM algorithm is applied to a logistic regression problem, which is a widely used machine learning model [22,23].
The network is generated randomly with a fixed number of agents and a prescribed connectivity ratio; the resulting network is shown in Fig. 8. Each agent $i$ holds $m_i$ training samples, denoted by $\{(a_{ij}, y_{ij})\}_{j=1}^{m_i}$, where $a_{ij} \in \mathbb{R}^p$ is a feature vector and $y_{ij} \in \{-1, 1\}$ is the corresponding label.
Figure 8. The network of problem (5.1).
The distributed logistic regression problem is described as follows:

$$\min_{x \in \mathbb{R}^p} \; \frac{1}{N} \sum_{i=1}^{n} \sum_{j=1}^{m_i} \log\!\left(1 + \exp\!\left(-y_{ij}\, a_{ij}^{\top} x\right)\right), \tag{5.1}$$

where $N = \sum_{i=1}^{n} m_i$ is the total number of samples. The dimension of the features is fixed, the number of samples $m_i$ is generated by a uniform distribution, and the ground-truth parameter $\tilde{x}$ is generated by a normal distribution. The generation rule of the label $y_{ij}$ is as follows:

$$y_{ij} = \begin{cases} 1, & u_{ij} \le \dfrac{1}{1 + \exp(-a_{ij}^{\top} \tilde{x})}, \\ -1, & \text{otherwise}, \end{cases}$$

where $u_{ij}$ is generated by a uniform distribution on $(0, 1)$.
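A minimal data-generation sketch consistent with the rule above; the distribution parameters, dimensions and the name x_true are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 5                                  # agents and feature dimension (assumed)
x_true = rng.normal(size=p)                   # ground-truth parameter
m = rng.integers(20, 51, size=n)              # samples per agent (assumed range)

data = []
for i in range(n):
    a = rng.normal(size=(int(m[i]), p))       # feature vectors (assumed distribution)
    prob = 1.0 / (1.0 + np.exp(-a @ x_true))  # logistic model probability
    y = np.where(rng.uniform(size=int(m[i])) <= prob, 1.0, -1.0)
    data.append((a, y))

N = int(m.sum())

def local_logistic_loss(x, a, y):
    """Agent i's share f_i of the objective (5.1)."""
    return np.sum(np.log1p(np.exp(-y * (a @ x)))) / N

print(sum(local_logistic_loss(x_true, a, y) for (a, y) in data))
```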
The distributed logistic regression problem (5.1) can be formulated as

$$\min_{x} \; \sum_{i=1}^{n} f_i(x_i) \quad \text{s.t.} \quad Ax = 0, \tag{5.2}$$

where $f_i(x_i) = \frac{1}{N} \sum_{j=1}^{m_i} \log\left(1 + \exp\left(-y_{ij}\, a_{ij}^{\top} x_i\right)\right)$ and each $f_i$ is convex. Obviously, problem (5.2) can be solved by Algorithm 1.
The convergence path of Algorithm 1 is compared with that of the Jacobi-proximal ADMM (JP-ADMM) algorithm in [17]. To investigate the performances of the two algorithms, the penalty parameter is set to 0.01, 0.1 and 1, respectively. In addition, the damping parameter is set to two different values. The proximal matrices of Algorithm 1 and of the algorithm in [17] are set according to their respective convergence conditions. Each algorithm is stopped once the relative error reaches a prescribed tolerance or the number of iterations $k$ reaches 1000. From Fig. 9 and Fig. 10, one can find that the convergence speed of Algorithm 1 is significantly faster than that of the algorithm in [17].
Figure 9. Objective value.
Figure 10. Objective value.