1. Introduction
We consider the subset sum problem: given a set of distinct non-negative integers and a target sum (the certificate), determine whether some subset of the given set has a sum equal to the certificate.
For this problem, exponential [1, 2] and pseudopolynomial [3, 4, 5, 6, 7] algorithms, as well as exhaustive search methods based on the “divide and conquer” principle [8], have been developed. The complexity of these algorithms was considered in [9, 10, 11].
It has been proved that the subset sum problem belongs to the class of NP-complete problems. According to a well-known theorem, if any NP-complete problem has a polynomial-time solution, then P = NP.
The main idea of the proposed approach is to transform the set (input data of length ) into subsets, based on the combination function, in the form of two-dimensional symmetric matrixes from which the main diagonal and all elements below it have been removed. Thus, these matrixes are triangular. Since the elements of the set have indexes, these matrixes are rewritten with respect to the indexes and the sums of the indexes, which we call index certificates. The index certificate corresponds to the certificate . It should be noted that the elements along each diagonal of the matrixes with index certificates are equal to each other. In this case, the time required for selecting subsets of (selected data of length ) is proportional to the number of elements in the used diagonal, which is less than . To the best of our knowledge, the proposed polynomial solution with time complexity and space complexity is the fastest general algorithm for this problem; it reduces the problem to one of selecting element indexes.
The paper is organized as follows:
- a problem statement in parametrized form;
- data input methods based on triangular two-dimensional matrixes;
- a lemma that estimates the required time and space;
- algorithms to solve the subset sum problem, with examples to confirm the claimed results.
The suggested algorithms can be implemented as a standalone, self-contained software module (for diverse problems) or as hardware chips.
2. Problem statement.
Several statements of the subset sum problem are known [1, 2, 12]. We propose a parametrized form in the following statement:
The subset
can be selected based on the combination function:
3. Data input methods.
Initially, let's introduce the subset of partial sums
, which is associated with the problem statement(1):
and the subset of index combinations:
where is a set of natural numbers, , this subset is necessary to ensure correspondence with the subset .
In the following discussion, we presume that these subsets are sorted and can be represented as two-dimensional matrixes. For
subset
can be represented as a triangular two-dimensional matrix of order (n-1)×(n-1):
Let's represent matrix(5) with respect to index combinations. It is enough to apply the concatenation operator
and attach the number 1 to the elements of the set
, starting from the second element to the end; then attach the number 2 to the elements of this set, starting from the third element to the end, and so forth until reaching the last element (n-1 n):
Thus the subset of index combinations
can be represented with respect to the partial index certificates
as a triangular two-dimensional matrix of order (n-1)×(n-1):
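The construction for two-element subsets can be sketched as follows. The sketch assumes that the cell in row i and column j (with i < j) holds the partial sum of the i-th and j-th elements, the index combination "i j", and the partial index certificate i + j; the function name `pair_matrixes` and the sample set are illustrative assumptions.

```python
def pair_matrixes(x):
    """Build (n-1) x (n-1) upper-triangular matrixes for 2-element subsets.

    Rows correspond to the first index i = 1 .. n-1, columns to the second
    index j = 2 .. n. Cells with j <= i are removed (stored as None).
    Hypothetical helper: the layout is an assumption based on the text.
    """
    n = len(x)
    sums, labels, certs = [], [], []
    for i in range(1, n):
        cols = range(2, n + 1)
        sums.append([x[i - 1] + x[j - 1] if j > i else None for j in cols])
        labels.append([f"{i} {j}" if j > i else None for j in cols])
        certs.append([i + j if j > i else None for j in cols])
    return sums, labels, certs


sums, labels, certs = pair_matrixes([10, 20, 43, 47])  # illustrative set
for row in sums:
    print(row)
for row in certs:
    print(row)
```

In the printed certificate matrix, equal values line up along the anti-diagonals, because moving one row down and one column left keeps i + j unchanged.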
For
the subset
is represented as two-dimensional matrixes of order (n-2)×(n-2), (n-3)×(n-3), …, 1×1 respectively:
A simple way to construct matrix(8) is to add element
to elements of matrix(5), starting from the second line and until the end; then add element
to the elements of matrix(5), starting from the third line and until the end, and so forth until reaching the last element (
). The number of these matrixes is
Let’s represent matrixes(8) with respect to index combinations. It is enough to apply the concatenation operator
and attach the number 1 to the elements of the second line of matrix(6); then attach the number 2 to the elements of the third line, and so forth until reaching the last element (n-2 n-1 n):
Then matrixes(9) can be represented with respect to the partial index certificates
as triangular two-dimensional matrixes of order (n-2)×(n-2), (n-3)×(n-3), …, 1×1 respectively:
A dash separates matrixes from each other.
The proposed method not only allows constructing triangular two-dimensional matrixes for but also establishes a correspondence between the partial sum and the partial index certificate (in particular, based on index combination matrixes (6) and (9)):
Definition. The correspondence of two subsets and is defined as any subset F ⊂ × . We write F: → , and instead of ( ) ∈ F we write ∈ F(z) or
In other words, if sets A and B are given, then an element of set A can correspond to any number of elements of set B, including none. A correspondence is referred to as one-to-one if for every a there exists a unique b such that (a, b) ∈ F.
Index combination matrixes(6) and (9) are indexes for partial sum matrixes(5) and partial index certificate matrixes(7), and indexes for similar matrixes (8), (10) respectively.
Thus, the proposed problem statement(1) is directly related to two-dimensional matrixes(5) and (8), if . In the case of , it is possible to construct matrixes similar to matrixes(5) and (8) related to the problem statement(1).
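The link between index combinations, partial sums and partial index certificates can be illustrated for an arbitrary subset size. The sketch below assumes 1-based indexes in increasing order and uses `itertools.combinations` for the enumeration; the names `correspondence` and `k` and the sample set are illustrative assumptions, not notation from the paper.

```python
from itertools import combinations


def correspondence(x, k):
    """Map each k-element index combination to (partial sum, index certificate).

    Keys are tuples of 1-based indexes in increasing order. The index
    certificate is taken to be the sum of the indexes (an assumption
    based on the description in the text).
    """
    table = {}
    for combo in combinations(range(1, len(x) + 1), k):
        partial_sum = sum(x[i - 1] for i in combo)
        table[combo] = (partial_sum, sum(combo))
    return table


table = correspondence([10, 20, 43, 47], 3)  # illustrative set
for combo, (partial_sum, index_cert) in table.items():
    print(combo, partial_sum, index_cert)
```

Each index combination determines exactly one partial sum and one index certificate, while several combinations may share the same index certificate.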
Now we can suggest a novel approach to solving problem(1) and construct a framework to manage the set of indexes
. For this purpose, an auxiliary problem on subsets sum
⊆
with
and a given index certificate
can be parametrized in the following statement:
Auxiliary problem(12) excludes the accuracy parameter p (the bit representation of elements) from the computational complexity of problem(1), and thus facilitates the solution of problem(1). The subsets are determined based on the combination function(2). Each subset consists of elements of the set .
According to the combination function(2), we have the range:
where based on the combination function(2) we can determine
and it is assumed that the set
is sorted in ascending order and elements from the subset
are selected such that:
Thus, the range(13) consists of elements of the subset satisfying condition(14). Certificate may or may not belong to the range(13). If certificate does not belong to the range(13), then problem(1) has no solution.
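The range test can be sketched as follows, under the assumption that for subsets of size k the lower bound of the range is the sum of the k smallest elements and the upper bound is the sum of the k largest elements of the sorted set. The names `certificate_range`, `may_have_solution` and `k` are illustrative.

```python
def certificate_range(x, k):
    """Return (lowest, highest) possible sum of a k-element subset of x.

    Assumes 1 <= k <= len(x); the bounds are an assumed reading of the range.
    """
    if not 1 <= k <= len(x):
        raise ValueError("k must satisfy 1 <= k <= len(x)")
    xs = sorted(x)
    return sum(xs[:k]), sum(xs[len(xs) - k:])


def may_have_solution(x, k, certificate):
    """Necessary (not sufficient) test: is the certificate inside the range?"""
    lowest, highest = certificate_range(x, k)
    return lowest <= certificate <= highest


x = [10, 20, 43, 47]                 # illustrative set
print(certificate_range(x, 2))       # (30, 90)
print(may_have_solution(x, 2, 67))   # True
print(may_have_solution(x, 2, 95))   # False
```

A certificate outside the range rules out a solution of that size; a certificate inside the range still has to be matched against the partial sums.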
In particular, for matrixes(5) and (8) we have respectively.
Let's demonstrate a method for identifying unique index certificates
. First, let's determine
,
,
. Next establish the potential range for the index certificate
, corresponding to a specific subset within the set of subsets
,
Notice that the range(15) describes only unique index certificates
. Next, let’s determine the number of unique index certificates
Formula (17) defines the number of unique index certificates. Let’s note that the sum of all elements of the constructed matrixes is equal to
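The count of unique index certificates can be checked numerically. The sketch assumes that an index certificate is the sum of k distinct indexes taken from 1..n, so that it ranges from k(k+1)/2 to k(2n-k+1)/2 and takes k(n-k)+1 distinct values; whether this coincides exactly with formula(17) is an assumption. All function names are illustrative.

```python
from itertools import combinations


def index_certificate_bounds(n, k):
    """Smallest and largest sum of k distinct indexes from 1..n."""
    z_min = k * (k + 1) // 2            # indexes 1, 2, ..., k
    z_max = k * (2 * n - k + 1) // 2    # indexes n-k+1, ..., n
    return z_min, z_max


def count_unique_index_certificates(n, k):
    """Number of distinct index sums; equals k * (n - k) + 1."""
    z_min, z_max = index_certificate_bounds(n, k)
    return z_max - z_min + 1


def enumerate_index_certificates(n, k):
    """Brute-force set of index sums, used only to cross-check the count."""
    return {sum(c) for c in combinations(range(1, n + 1), k)}


print(index_certificate_bounds(4, 2))           # (3, 7)
print(count_unique_index_certificates(4, 2))    # 5
print(sorted(enumerate_index_certificates(4, 2)))  # [3, 4, 5, 6, 7]
```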
4. Method to solve the problem.
Based on the above discussion, we can draw an important conclusion: the quantity
allows us to determine the time required for selecting subsets
,
, using only unique index certificates
:
Inequality(18) shows the high efficiency of the proposed approach in terms of algorithm execution time and required space.
Lemma. Let
,
and there is a one-to-one correspondence(11) of the subsets
and
then required time
to select subsets
,
and required space
satisfy the conditions:
Proof. The fact that the certificate belongs to the range(13) ensures the fulfillment of inequality(14) and the existence of a partial sum such that . The fact that the index certificate is in the range(15) means that it satisfies inequality(16) and exists among the unique index certificates , so that . These conditions allow us to find and construct triangular two-dimensional matrixes . In turn, the pair satisfies the correspondence condition(11) between the subsets and . In this case there is , as . Then, based on formula(17), we have
, 1 When recursively storing triangular two-dimensional matrixes, it is sufficient to .
Corollary. When directly using matrixes (5)-(10) with , we have and
5. The subset sum problem algorithms.
With the obtained results, we can develop algorithms to solve problem(1) and the auxiliary problem(12).
Subset sum problem algorithm.
Step1. Inputting certificate and sets
Step2. Determining subsets from inequalities(14), (16) with respect to certificate and the calculated boundaries of ranges(13), (15).
Step3. Constructing subsets and in the form of triangular two-dimensional matrixes similar to matrixes(5) and (8), as well as defining ranges(13) and (15) from elements of subsets , satisfying inequalities(14) and (16) respectively.
Step4. Checking the existence of element , and defining the partial index certificate based on correspondence(11) within subsets and
Step5. Determining whether belongs to the range(15) and checking the condition so that belongs to one of the diagonals of the partial index certificate matrixes obtained in Step3, and finding the number of elements of this diagonal.
Step6. Representing the unique index certificate found in Step5 as indexes, similar to elements of matrixes(6), (9).
Step7. Defining subsets based on the indexes found in Step6.
Step8. Calculating the number of unique index certificates based on the boundaries in formula(17).
Step9. Calculating the required time and required space to select subsets based on the formula: .
Step10. Outputting
Auxiliary problem(12) algorithm.
Step1. Inputting certificate and set
Step2. Calculating the boundaries of the range(15).
Step3. Constructing subsets in the form of triangular two-dimensional matrixes, similar to matrixes(7) and (10), and defining the range(15) from elements satisfying inequality(16) (which is not obligatory).
Step4. Obtaining the partial index certificate when and and checking inequality(16) for the index certificate
Step4. Obtaining the partial index certificate from the condition and checking inequality(16).
Step5. Determining whether the unique index certificate belongs to one of the diagonals of the partial index certificate matrixes obtained in Step3, and finding the number of elements of this diagonal to define the required time to select subsets
Step6. Representing the unique index certificate found in Step5 as indexes, similar to elements of matrixes(6), (9).
Step7. Defining subsets based on the indexes found in Step6.
Step8. Calculating the number of unique index certificates based on the boundaries in formula(17).
Step9. Calculating the required time and required space to select subsets based on the formula: .
Step10. Outputting
Examples are provided to confirm the claimed results and to show the possibility of applying the algorithms in a compressed form:
Given set
Example1. Let
Matrix(5) is constructed as follows:
Matrix(6) is constructed as follows:
Matrix(7) is constructed as follows:
From the condition(11) of correspondence and we have that for there is as per matrixes(19) and (21). Let's decompose into indexes and find subsets . Then possible subsets are . To the given certificate there are satisfying subsets ,
Example2. Let Subtract from 120-53=67. For the certificate there is as per matrixes(19) and (21). Subsets . Then . As per example1 we obtain 10,43,20,47}={}
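The bookkeeping in the examples can be mimicked with a short sketch. It uses plain enumeration via `itertools.combinations` purely for illustration, so it does not reproduce the time bound of the lemma. The set [10, 20, 43, 47], the subset size 2 and the function name `select_subsets` are assumptions chosen to be consistent with the arithmetic 120-53=67 of Example2.

```python
from itertools import combinations


def select_subsets(x, k, certificate):
    """Return (index combination, index certificate, subset) for every
    k-element subset of x whose elements sum to `certificate`.

    Indexes are 1-based; the index certificate is the sum of the indexes.
    Hypothetical helper based on exhaustive enumeration.
    """
    hits = []
    for combo in combinations(range(1, len(x) + 1), k):
        subset = [x[i - 1] for i in combo]
        if sum(subset) == certificate:
            hits.append((combo, sum(combo), subset))
    return hits


x = [10, 20, 43, 47]                     # illustrative set, sorted ascending
print(select_subsets(x, 2, 53))          # [((1, 3), 4, [10, 43])]
print(select_subsets(x, 2, 120 - 53))    # [((2, 4), 6, [20, 47])]
```

The two index combinations found are disjoint, so together they select four elements whose total is 120.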
6. Conclusions and future work.
The article provides a lemma to estimate the required time and space, and algorithms to solve the subset sum problem. On the basis of the results obtained, a framework for managing the indexes of the initial set has been developed. The proposed algorithms can be easily implemented as software and/or hardware solutions in a variety of applications, including scheduling [13], queries in databases [14], graph problems [15] and others.
In view of the fact that the linear (or quadratic) solvability of the subset sum problem, which belongs to the NP-complete class, is proved, and based on the well-known theorem stating that if some NP-complete problem is solvable in polynomial time, then P = NP, the equality of classes P and NP is claimed.
Further work will focus on:
- partition of an initial set into subsets (using Vandermonde’s convolution and the symmetry property);
- applying combination function properties;
- optimization of combination algorithms;
- parallelization of the calculation process, etc.
References
1. E. Horowitz, S. Sahni. Computing Partitions with Applications to the Knapsack Problem. Journal of the ACM (JACM), 1974, Vol. 21, pp. 277-292.
2. R. Schroeppel, A. Shamir. A T = O(2^{n/2}), S = O(2^{n/4}) Algorithm for Certain NP-Complete Problems. SIAM Journal on Computing, 1981, Vol. 10, No. 3, pp. 456-464.
3. Richard Bellman. Notes on the theory of dynamic programming IV - maximization over discrete sets. Naval Research Logistics Quarterly, 3(1-2):67–70, 1956.
4. David Pisinger. Linear time algorithms for knapsack problems with bounded weights. Journal of Algorithms, 33(1):1–14, 1999.
5. Konstantinos Koiliaris, Chao Xu. A faster pseudopolynomial time algorithm for subset sum. To appear in SODA ’17, 2017. arXiv:1610.04712v2 [cs.DS], 8 Jan 2017, 18 p.
6. Karl Bringmann. A near-linear pseudopolynomial time algorithm for subset sum. To appear in SODA ’17, 2017. arXiv:1610.04712v2 [cs.DS], 8 Jan 2017, 18 p.
7. A. Lincoln, V. V. Williams, J. R. Wang, R. R. Williams. Deterministic Time-Space Tradeoffs for k-SUM. arXiv preprint arXiv:1605.07285.
8. N. Wirth. Algorithms and Data Structures. Russian translation. Moscow: Mir, 2006.
9. R. M. Karp. Reducibility among combinatorial problems. Springer, 1972.
10. A. Cobham. The intrinsic computational difficulty of functions. In Proceedings of the Congress for Logic, Methodology and Philosophy of Science. North-Holland, 1964, pp. 24-30.
11. J. Edmonds. Paths, trees, and flowers. Canadian Journal of Mathematics, 1965, Vol. 17, pp. 449-467.
12. R. M. Kolpakov, M. A. Posypkin. An upper bound for the number of branches for the subset sum problem. Math. Issues of Cybernetics, Issue 18. Moscow: Fizmatlit, 2013, pp. 213-226.
13. Xiangtong Qi. Coordinated logistics scheduling for in-house production and outsourcing. IEEE Transactions on Automation Science and Engineering, 5(1):188–192, Jan 2008.
14. Quoc Trung Tran, Chee-Yong Chan, and Guoping Wang. Evaluation of set-based queries with aggregation constraints. In Proceedings of the 20th ACM Conference on Information and Knowledge Management, CIKM 2011, Glasgow, United Kingdom, October 24-28, 2011, pages 1495–1504, 2011.
15. Venkatesan Guruswami, Yury Makarychev, Prasad Raghavendra, David Steurer, and Yuan Zhou. Finding almost-perfect graph bisections. In Innovations in Computer Science - ICS 2010, Tsinghua University, Beijing, China, January 7-9, 2011. Proceedings, pages 321–337, 2011.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).