The development of ACMANT (Adapted Caussinus-Mestre Algorithm for the homogenization of Networks of climatic Time series) started around 2010 on the basis of the PRODIGE method. The method contains modern and effective tools for both break detection and the calculation of adjustment terms. It approaches the final solution through three homogenization cycles, and ensemble homogenization helps to attenuate random effects. A brief description of ACMANT is provided in a recent daily temperature and precipitation dataset development for Catalonia [24], while the full description of ACMANTv4 was published by Domonkos [12]. ACMANTv4 participated in the method comparison tests of the MULTITEST project. In those tests, ACMANTv4 often produced more accurate homogenization results than any other tested method [16,17], but a weakness of ACMANTv4 was also revealed: the method cannot effectively treat inhomogeneities with clustered breaks. To address this issue, the combined time series comparison was introduced in ACMANTv5 [18]. The development of ACMANTv5 is ongoing, and some of the recent developments have not been published elsewhere. Here the differences from ACMANTv4 in three subversions of ACMANTv5 are presented. Note that of the three subversions only ACMANTv5.1 is available so far; the other subversions, referred to as A52 and A53, are under development.
3.1. ACMANTv5.1
The method of combined time series comparison has been introduced into the first homogenization cycle, where it replaces the ensemble homogenization used in that cycle in the earlier versions. The first step of combined time series comparison is break detection with pairwise comparison of time series and optimal step function fitting with the Caussinus-Lyazrhi criterion [8]. In the second step, the time series comparison is performed using composite reference series, while the break detection method is the same as in the first step. In the second step, the timings of the breaks detected in the first step are introduced as obligatory break positions, so that the final set of breaks detected by the combined time series comparison contains the breaks of the first step together with the additional breaks detected in the second step. The number of detected breaks can be zero in either step. Inhomogeneity bias removal is performed only after both steps of the combined time series comparison are finished, and it is done with the ANOVA correction model [25]. The full description of the combined time series comparison was presented by Domonkos [18].
Two further important novelties of ACMANTv5 are: (i) the method has both automatic and interactive versions [26]; (ii) metadata can be used in both the automatic and the interactive versions [27]. Details of these aspects are not provided here, since the present study examines only automatic homogenization without metadata.
In ACMANTv5.1 the parameterization of the final ensemble homogenization (step 17.3.2 of ACMANTv4) is modified. In both ACMANTv4 and ACMANTv5, nine linear combinations of the minimum of the ensemble results (z’) of the second homogenization cycle and the arithmetic average of the same ensemble results (z+) are used. In ACMANTv4 the weights of z’ (denoted c’) range from –3 to 4, while those of z+ (denoted c+) decrease from 4 to –3. In ACMANTv5, each weight c’ is increased by 0.5, while each weight c+ is decreased by 0.5 (Table 2).
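The weight change can be illustrated with a minimal Python sketch. It uses only the two endpoint weight pairs stated in the text; the seven intermediate pairs are given in Table 2 and are not reproduced here. The function names `shift_weights` and `combine` are hypothetical and are not part of the ACMANT software.

```python
def shift_weights(pairs, delta=0.5):
    """Increase each c' by delta and decrease each c+ by delta."""
    return [(c_min + delta, c_avg - delta) for c_min, c_avg in pairs]


def combine(z_min, z_avg, c_min, c_avg):
    """Linear combination of the ensemble minimum (z') and ensemble mean (z+)."""
    return c_min * z_min + c_avg * z_avg


# Endpoint (c', c+) pairs of ACMANTv4 as stated in the text.
v4_endpoints = [(-3.0, 4.0), (4.0, -3.0)]

# The same endpoints after the ACMANTv5 shift of +0.5 / -0.5.
v5_endpoints = shift_weights(v4_endpoints)

for c_min, c_avg in v5_endpoints:
    print(c_min, c_avg, c_min + c_avg)
```

For these endpoint pairs the two weights sum to 1 both before and after the shift, since the same amount is added to one weight and subtracted from the other.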
3.2. Version A52
The changes introduced by ACMANTv5.1 are kept, but three further kinds of modifications are applied.
- (i)
The length of the overlapping periods in the use of relative time series for break detection is changed. The concept and practice of using overlapping relative time series are presented in Section B6 and step 10.1 of the ACMANTv4 description [12].
As a first approach, only one relative time series is used, always the one with the highest β score. This score is determined primarily by the number of neighbor series included in the composite reference series, but some other factors are also considered (see Section B6 of the ACMANTv4 description). However, close to either endpoint of a relative time series (which can differ from the endpoints of the candidate series), the reliability of break detection is reduced. Therefore, overlapping relative time series are applied when this helps to eliminate or reduce such edge effects. In ACMANTv4 the maximum length of the overlap is 9 years, while in ACMANTv5 it is increased to 15 years. However, when a detected break point is close to the endpoint of the previously used relative time series, the overlap by the subsequently used relative time series extends only to the timing of that detected break. This parameter change is applied in all break detection steps of A52 in which multiple relative time series are used.
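The overlap rule can be sketched as follows. This is a simplified reading and not the ACMANT implementation: time is counted in whole years, and "close to the endpoint" is assumed to mean "within the maximum overlap length before the endpoint". The function name `overlap_years` and its arguments are hypothetical.

```python
def overlap_years(prev_end_year, detected_breaks, max_overlap=15):
    """Length (years) of the overlap with the previously used relative series.

    The overlap is at most max_overlap years (9 in ACMANTv4, 15 in ACMANTv5).
    If a break was detected inside that window, the overlap is assumed to
    extend back only to the latest such break.
    """
    window_start = prev_end_year - max_overlap
    in_window = [b for b in detected_breaks if window_start < b <= prev_end_year]
    if in_window:
        return prev_end_year - max(in_window)
    return max_overlap


# No break near the endpoint: the full overlap is used.
print(overlap_years(1990, [1960]))
# A break 6 years before the endpoint truncates the overlap.
print(overlap_years(1990, [1984]))
# The same call with the ACMANTv4 maximum of 9 years.
print(overlap_years(1990, [1980], max_overlap=9))
```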
- (ii)
The creation of relative time series for break detection in the first homogenization cycle is modified. The applied modifications partly change the content of steps 9.1-9.3 of ACMANTv4. Note that in ACMANTv5, these steps are part of the combined time series comparison.
Networks are classified as small networks or large networks. The classification considers the mean number of time series with comparable observed data (N*). For the calculation of N*, the period from the earliest starting year (YA ≥ 1) of all homogenized periods to the latest ending year (YB ≤ n) of all homogenized periods is used. Here n denotes the number of years in the study period defined by the user, while the homogenized period is the period of a time series in which the ratio and compactness of observed data, as well as the availability of spatially comparable data of neighbor series, make it possible to perform homogenization with ACMANT [26]. When the total number of time series in the network is N, the number of truly comparable data N’ (N’ ≤ N) may vary in time (i), Equation (1).
In method A52 a network is considered a small network if N* ≤ 15, and a large network otherwise.
In small networks, only one relative time series is created for each candidate series. It covers the whole homogenized period of the candidate series. The composite reference series includes all neighbor series whose homogenized period overlaps with the homogenized period of the candidate series. When the considered neighbor series have missing data, they are completed over the homogenized period of the candidate series, and the completed series are used in the creation of the relative time series. Neighbor series are equally weighted in small networks.
In large networks the neighbor series are weighted by their squared spatial correlations with the candidate series. There is no other change in the creation of multiple relative time series for large networks.
Note again that this methodology is used only in the second step of the combined time series comparison and is not used in the other relative time series creation steps of A52.
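The classification and weighting rules of modification (ii) can be sketched in Python under stated assumptions. Equation (1) is not reproduced here, so N* is assumed to be the plain arithmetic mean of the yearly counts N’ over the years YA to YB. The normalization of the weights to a sum of 1 is also an assumption. All function names are hypothetical.

```python
def mean_comparable_count(n_comparable_by_year):
    """Assumed form of N*: mean of the yearly counts N' over years YA..YB."""
    return sum(n_comparable_by_year) / len(n_comparable_by_year)


def is_small_network(n_star, threshold=15):
    """A52 rule: the network is small if N* <= 15, large otherwise."""
    return n_star <= threshold


def reference_weights(correlations, small):
    """Weights of the neighbor series in the composite reference series.

    Small networks: equal weights. Large networks: squared spatial
    correlation with the candidate series. Normalizing the weights to
    sum to 1 is an assumption of this sketch.
    """
    if small:
        raw = [1.0] * len(correlations)
    else:
        raw = [r * r for r in correlations]
    total = sum(raw)
    return [w / total for w in raw]


n_star = mean_comparable_count([10, 12, 14])
small = is_small_network(n_star)
print(n_star, small, reference_weights([0.9, 0.8, 0.7], small))
```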
- (iii)
In the gap filling steps of A52, the use of monthly data is preferred in several details of the procedure, even when daily data homogenization is performed. The earlier concept of always using daily data for gap filling in daily data homogenization was based on the fact that monthly values may have elevated uncertainty when some of their daily data are missing. However, tests showed (results not shown) that the use of daily data in gap filling does not yield a perceptible accuracy improvement in the final results, except in a few details of the procedure, which are presented here and are still considered in A52. The motivation for these changes is that reducing the use of daily data in gap filling steps often significantly reduces the computation time.
The gap filling for monthly temperatures is performed by Equations (2)-(5) according to Section B12 of the ACMANTv4 description:
Denotations: gc – candidate series, gs – neighbor series s, h – serial number of month, h0 – timing (month) of missing data in the candidate series, N’’ – number of used neighbor series, ws – weight (depending mostly on the squared spatial correlation between gc and gs), h1 and h2 – bounds of the applied time window around h0, H’’ – number of months with observed data in both gc and gs within the time window, p4 – parameter (usually 0.4).
When the time resolution is changed from monthly to daily, d (day) can be written instead of h in Equations (2) and (3), which converts the formulas to Equations (6) and (7).
In ACMANTv4 and ACMANTv5.1, Equations (6) and (7) are applied in all gap filling steps of daily data homogenization. (Gap filling for precipitation data is somewhat different, but its presentation is excluded from this study.) However, Equation (7) is not used in A52, except in the initial generation of monthly data (next paragraph). In the preliminary operations and within the first two homogenization cycles, only monthly data are used in gap filling, and Equations (2) and (3) are used there in daily data homogenization as well. This is possible because in the first two homogenization cycles the other steps of the homogenization are also done in monthly or annual resolution. In the last homogenization cycle and in the final gap filling step, the daily values are determined by Equation (6), but monthly data are still used for calculating the differences between the averages of station series, as shown by Equation (8).
In daily data homogenization with ACMANT, a monthly value is considered observed when at least 75% of the daily data in the month are observed. Differences between the mean climate anomaly of the observed days of a month and that of the other days of the month may cause biased estimates of monthly values. To reduce such biases, gap filling with daily data is performed in the initial generation of monthly data. In this step, only the data of the month that includes the target missing data (d0) are used. Here, Equation (7) is used with d1 = 1, and d2 equal to the number of days in the month.
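The 75% completeness rule for monthly values can be sketched as follows. The representation of missing daily data as `None` or NaN and the function name `monthly_value_is_observed` are assumptions of this sketch, not part of ACMANT.

```python
import math


def monthly_value_is_observed(daily_values, min_fraction=0.75):
    """Return True if at least min_fraction of the days have observed data.

    daily_values holds one entry per calendar day of the month; missing
    days are assumed to be encoded as None or NaN.
    """
    observed = sum(
        1 for v in daily_values if v is not None and not math.isnan(v)
    )
    return observed >= min_fraction * len(daily_values)


# A 30-day month needs at least 23 observed days (75% of 30 is 22.5).
month_23 = [10.0] * 23 + [None] * 7
month_22 = [10.0] * 22 + [None] * 8
print(monthly_value_is_observed(month_23), monthly_value_is_observed(month_22))
```

In A52, months that pass this check but still contain missing days are the ones for which the within-month daily gap filling described above is applied before the monthly value is generated.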
3.3. Version A53
The changes introduced by A52 are kept, but further modifications are included. All the newly introduced changes are related to the automatic network construction. In A53 two networks are used when the input dataset contains more than 22 time series. One of the networks is constructed in exactly the same way as in the earlier method versions, while the other network is constructed with a few modified parameters. These new-type networks are generally smaller than the networks of the earlier method versions, and to distinguish the two network types easily, they will be referred to as large networks and small networks.
- (iv)
Generation of large networks: Identical with the network construction of the earlier method versions (see step 3.6 of the ACMANTv4 description).
- (v)
Generation of small networks:
- a)
First, the best correlating 20 neighbor series are selected;
- b)
When the first 20 neighbor series do not sufficiently cover parts of the homogenized section of the candidate series, further neighbor series are selected as long as a neighbor series s with index S > 0 can be found (Equation 9).
S1 is an empirically constructed index characterizing the frequency of those observed monthly values of the candidate series that are paired with fewer than 10 synchronous observed data of the neighbor series.
S2 is also an empirically constructed index, which considers the frequency of fewer than 20 synchronous observed data of neighbor series in overlapping 10-year-long sections of the homogenized period of the candidate series. There is no change in the calculation of S1 and S2 relative to their use in large networks. Index S3 is a penalty term for the excess in network size (N’, see Equation 10).
In the construction of large networks q = 31, while in that of the small networks, q = 21. When S is positive for more than one neighbor series, the one with the highest S is selected. The network construction is finished when no neighbor series with S > 0 can be found.
- (c)
Use of small networks and large networks in A53: In most parts of A53 the small network is used. The exceptions are the second step of the combined time series comparison, i.e. the break detection with composite reference series in the first homogenization cycle, and the preparatory steps for that break detection step.
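The generation of small networks in steps a) and b) amounts to a greedy selection, which can be sketched in Python. Equations (9) and (10) are not reproduced here, so the index S is represented by a user-supplied callable `score`. This callable, the function name `build_network` and the toy score in the example are all hypothetical stand-ins.

```python
def build_network(neighbors_by_correlation, score, initial_size=20):
    """Greedy network construction for one candidate series.

    neighbors_by_correlation: neighbor ids sorted by decreasing correlation.
    score(s, network): stand-in for the index S of neighbor s given the
    current network (its real form, Equation 9, is not reproduced here).
    """
    # Step a): start with the best correlating neighbor series.
    network = list(neighbors_by_correlation[:initial_size])
    remaining = list(neighbors_by_correlation[initial_size:])
    # Step b): add the neighbor with the highest positive S until none is left.
    while remaining:
        best_score, best = max(
            ((score(s, network), s) for s in remaining), key=lambda t: t[0]
        )
        if best_score <= 0:
            break
        network.append(best)
        remaining.remove(best)
    return network


# Toy example: a fixed gain per neighbor minus a penalty growing with size.
toy_gain = {20: 3.0, 21: 2.5, 22: 0.5, 23: 1.5, 24: -1.0}


def toy_score(s, network):
    return toy_gain[s] - (len(network) - 20)


result = build_network(list(range(25)), toy_score)
print(result[20:])
```

In the toy example the size penalty plays the role that S3 plays in the text: each added series makes further additions less attractive, so the loop stops once no remaining neighbor has a positive score.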