Combining "Deep Learning" and Physically Constrained Neural Networks to Derive Complex Glaciological Change Processes from Modern High-Resolution Satellite Imagery: Application of the GEOCLASS-image System to Create VarioCNN for Glacier Surges

The objectives of this paper are to investigate the trade-offs between a physically constrained neural network and a deep convolutional neural network (CNN) and to design a combined ML approach ("VarioCNN"). Our solution is provided in the framework of a cyberinfrastructure that includes newly designed ML software (GEOCLASS-image), modern high-resolution satellite image data sets (Maxar WorldView data), and instructions and descriptions that may facilitate solving similar spatial classification problems. Combining the advantages of the physically driven connectionist-geostatistical classification method with those of an efficient CNN, VarioCNN provides a means for rapid and efficient extraction of complex geophysical information from submeter-resolution satellite imagery. A retraining loop overcomes the difficulty of creating a labeled training data set. Computational analyses and developments are centered on a specific, but generalizable, geophysical problem: the classification of crevasse types that form during the surge of a glacier system. A surge is a glacial catastrophe, an acceleration of a glacier to typically 100-200 times its normal velocity. GEOCLASS-image is applied to study the current (2016-2024) surge in the Negribreen Glacier System, Svalbard. The geophysical result is a description of the structural evolution and expansion of the surge, based on crevasse types that capture ice deformation in six simplified classes.

Keywords:

Physically Constrained Neural Networks; connectionist-geostatistical classification; crevasse classification; glacier surging; satellite image classification; machine learning

Subject:

Environmental and Earth Sciences - Remote Sensing

1. Introduction

The objective of this paper is to contribute to three challenges in different disciplines: (1) Earth observation and data analysis, (2) climatic and cryospheric change, and (3) machine learning (ML).

Challenge 1. Harnessing the data revolution in Earth observation from space. Observations of our rapidly changing Earth are largely carried out from space, and the collection of Earth observation data from satellites has advanced rapidly, with increasingly large and detailed data sets becoming available for scientific investigations [1]. The data revolution has created both new opportunities and new challenges for science, as extracting information on complex geophysical processes from large, high-resolution data sets is becoming increasingly difficult (a problem that has been summarized as “Harnessing the Data Revolution” by the U.S. National Science Foundation [2]). In turn, this phenomenon has created a cyberinfrastructure problem: a disconnect between the revolutionary increase in satellite image data on the one hand and the development of numerical Earth system models on the other, which are employed to aid in the assessment of global climatic changes and their manifestations in warming and sea-level rise (SLR) [3,4,5,6,7,8,9,10,11,12]. The resulting bottleneck grows with the data revolution: the improved spatio-temporal resolution of the new satellite data sheds light on subprocesses that are not easily represented in physical-process models, which makes it increasingly hard to incorporate the new observations into such models.

In this paper, we will introduce an approach that integrates machine learning and physical knowledge into a physically-driven neural network, whose application will facilitate derivation of physical process understanding from high-resolution satellite data. Results include parameterized information in the form of thematic maps (time series of segmented satellite imagery) that can inform modeling as well as lend themselves to direct geophysical interpretation and discovery.

Challenge 2. Glacial acceleration and sea-level-rise assessment. We address a climatic and cryospheric change problem, the phenomenon of glacial acceleration, which was identified as one of two main sources of uncertainty in SLR assessment by the Intergovernmental Panel on Climate Change (IPCC) in its 2013 Fifth Assessment Report (AR5) (the other source is atmospheric) [13]. The most recent IPCC report, AR6, published in 2021, does not present a solution but rather elevates the urgency of understanding glacial acceleration by declaring it a “deep uncertainty” in SLR assessment [3]. The different types of accelerating glaciers include surge-type glaciers, tidewater glaciers, fjord glaciers (isbræ) and ice streams [14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]. Acceleration frequency may be intrinsic to the glacier type, quasi-periodic, or single-time. Initialization of an acceleration may be due to internal dynamics of the glacier or externally forced, for instance, induced by warming ocean water at the front of the glacier, or controlled by a combination of several factors [31,32,33,34]. Spatial acceleration may be due to subglacial (bed) topography [33] or caused by a dynamic event. All types of acceleration typically lead to the formation of crevasse fields. Surging is the type of acceleration that has seen the least research, and the complexity of ice flow during surging defies many classic data analysis methods, rendering most cyberinfrastructures incapable of modeling this geophysical process.

In this paper, we focus on an exemplary analysis of glacial acceleration during the surge of an Arctic glacier system, the Negribreen Glacier System (NGS), through classification of crevasse patterns as indicators of the drastic and rapid dynamic changes that occur during a surge. The surge has led to mass transfer from the glacier to the ocean on the order of 0.5-1% of global annual SLR in just a few months during the height of the surge [35,36]. Because a surge causes sudden mass transfer events from the cryosphere to the ocean, it introduces a catastrophic type of uncertainty into SLR estimation (the term “catastrophic” is used in the sense of continuous changes leading to sudden effects). If we are to reconcile SLR assessment, we need to understand surge processes.

Challenge 3. Integration of physically constrained classification and modern “Deep Learning” approaches in satellite image classification. The surge is captured in time series of high-resolution satellite image data, which motivates an ML-based classification. While deep convolutional neural network (CNN) architectures are considered to provide state-of-the-art performance on standard image classification benchmarks such as the ImageNet data set [37,38,39,40,41], two problems exist. First, deeper networks lead to increased performance only up to a point, after which increased network depth results in increasingly worse performance due to the vanishing gradient problem [42]. Second, and more challenging for applications in the cryospheric sciences, no published labeled training data sets exist for the classification of ice-surface features such as crevasses (see [43]). The role of crevasse types in the identification of deformation types, which are directly related to glacial acceleration, will be described in section 3. A main task is thus the creation of such labeled data sets, which are required for training a neural network (NN). For CNNs, the problem is exacerbated by the fact that very large numbers of training samples (on the order of hundreds of thousands) are needed.

We have previously developed a physically constrained ML approach, the connectionist-geostatistical classification method [18,44,45,46]. The connectionist-geostatistical method uses a two-tiered approach, in which the first step is a physically informed spatial statistical analysis, carried out in a discrete mathematics framework. The output of the geostatistical step provides the input for the NN, activating the neurons of the input layer. To carry out the actual classification, a connectionist approach is selected, which can utilize a multi-layer perceptron with backpropagation of errors (MLP-BP), or simply an MLP. The MLP has proven to provide a robust and functional architecture for this type of classification and provided an efficient solution as early as 20 years ago [44]. To train the connectionist-geostatistical classification, a small data set suffices, of a size that can reasonably be derived by an expert [44]: on the order of several hundred labeled video scenes or small subimages of a satellite image. However, advances in Earth observation, increasing data resolution and data set size, as well as advances in computer hardware and processing speed, warrant investigation of modern “Deep Learning” architectures to facilitate fast and efficient processing.
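The two-tier structure described above can be illustrated with a minimal, self-contained Python sketch. It is not GEOCLASS-image code, and several choices are assumptions made only for illustration: the 1-D profiles are synthetic stand-ins for image transects, the vario function is taken to be the classical experimental variogram estimator, and the network size, learning rate and number of epochs are arbitrary. Tier 1 turns each profile into a short vector of vario values; tier 2 feeds that vector into the input layer of a three-layer MLP trained with backpropagation of errors.

```python
import math
import random

def vario(z, h):
    """Experimental vario function of a 1-D profile z at lag h (classical estimator)."""
    sq = [(z[i + h] - z[i]) ** 2 for i in range(len(z) - h)]
    return sum(sq) / (2 * len(sq))

def features(z, lags=(1, 2, 3, 4)):
    # Tier 1 (geostatistical step): vario values become the input-neuron activations.
    return [vario(z, h) for h in lags]

def make_profile(crevassed, rng, n=40):
    """Hypothetical synthetic transect: periodic steps (crevassed) or a gentle slope."""
    if crevassed:
        base = [1.0 if (i // 2) % 2 == 0 else 0.0 for i in range(n)]
    else:
        base = [0.01 * i for i in range(n)]
    return [b + rng.gauss(0.0, 0.05) for b in base]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class MLP:
    """Tier 2: input layer, one hidden layer, one output neuron; trained with backpropagation."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        self.x = x
        self.h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        self.y = sigmoid(sum(w * hj for w, hj in zip(self.w2, self.h)) + self.b2)
        return self.y

    def backward(self, target, lr):
        # Error E = 0.5 * (y - target)^2; each delta is obtained with the chain rule.
        d_out = (self.y - target) * self.y * (1.0 - self.y)
        for j, hj in enumerate(self.h):
            d_hid = d_out * self.w2[j] * hj * (1.0 - hj)  # uses w2[j] before its update
            self.w2[j] -= lr * d_out * hj
            for i, xi in enumerate(self.x):
                self.w1[j][i] -= lr * d_hid * xi
            self.b1[j] -= lr * d_hid
        self.b2 -= lr * d_out

rng = random.Random(0)
data = [(features(make_profile(c, rng)), float(c)) for c in [0, 1] * 20]
net = MLP(n_in=4, n_hidden=3, rng=rng)
for epoch in range(1500):
    for x, t in data:
        net.forward(x)
        net.backward(t, lr=0.5)

accuracy = sum((net.forward(x) > 0.5) == (t == 1.0) for x, t in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

Because the input layer receives a handful of physically meaningful numbers rather than raw pixels, a very small network and a small training set suffice in this toy setting, which is the property the text attributes to the connectionist-geostatistical approach.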

The salient difference in the effectiveness of the two approaches lies less in the NN architecture (MLP versus CNN) than in the fact that the connectionist-geostatistical classification is a physically informed approach (physical knowledge informs our approach to geostatistics), whereas the CNN relies on its much larger number of degrees of freedom to determine the classes. CNNs can be trained in a supervised or an unsupervised manner [47].

In this paper, we will investigate the trade-offs between a physically constrained NN and a CNN and introduce a first approach to leverage the advantages of both ML methods in an integrated image classification system. We propose an approach to natural science problems that combines and integrates physically constrained neural networks and modern ML methods. To this end, we will demonstrate that a physically constrained NN can be utilized to aid in creating a labeled training data set of sufficient size to train a CNN. We emphasize that physical knowledge needs to be leveraged in designing an ML approach that can be expected to provide solutions for, and advance knowledge in, the physical sciences.
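One generic way in which a physically constrained classifier can help grow a labeled training set is a self-training (pseudo-labeling) loop. The sketch below shows that general idea under explicit assumptions: `teacher` stands in for the connectionist-geostatistical classifier, `train_student` stands in for CNN training, and the confidence threshold, stopping rule and toy data are all hypothetical. It is not the retraining loop of GEOCLASS-image.

```python
def grow_training_set(items, classify, threshold):
    """Split items into confidently labeled (item, label) pairs and deferred items.

    classify(item) must return (label, confidence) with confidence in [0, 1].
    """
    labeled, deferred = [], []
    for item in items:
        label, confidence = classify(item)
        if confidence >= threshold:
            labeled.append((item, label))
        else:
            deferred.append(item)
    return labeled, deferred

def retraining_loop(pool, teacher, train_student, threshold=0.9, rounds=5):
    """Self-training: the teacher seeds the labels, the retrained student extends them."""
    labeled, remaining = grow_training_set(pool, teacher, threshold)
    student = None
    for _ in range(rounds):
        student = train_student(labeled)
        new, remaining = grow_training_set(remaining, student, threshold)
        if not new:  # stop once no further items can be labeled confidently
            break
        labeled.extend(new)
    return student, labeled, remaining

# Toy usage with hypothetical 1-D "subimages" (numbers) and two classes (x > 0 or not).
def teacher(x):
    return x > 0, (0.95 if abs(x) >= 3 else 0.5)  # confident only far from the boundary

def train_student(labeled):
    pos = [x for x, y in labeled if y]
    neg = [x for x, y in labeled if not y]
    boundary = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: (x > boundary, min(1.0, 0.5 + abs(x - boundary) / 4))

student, labeled, remaining = retraining_loop(list(range(-5, 6)), teacher, train_student)
print(len(labeled), remaining)
```

In this toy run, the teacher labels only the six clear-cut items, and the retrained student adds two more before stopping; the items closest to the class boundary remain unlabeled, which is where expert input would be most valuable.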

2. Background on Neural Networks, Especially in the Geosciences and in (Satellite) Image Classification

In this section, we give a brief summary of the state of the art of ML in the geosciences, as well as ML applied to satellite image classification or analysis. Most works fall into one of several categories, addressed in the following subsections.

2.1. General References: Classic Papers, Review Papers and Books

While neural networks have seen a sudden rise in public attention in recent years, the first research dates back to neural psychology at the end of the 19th century [48,49]. Rosenblatt [50] introduced the perceptron, and the first deep learning perceptron appeared in 1967 [51]. The use of neural networks stalled in the early 1970s, mostly due to limitations in computation [52]. Foundational research on CNNs, and thus on Deep Learning, dates back to the 1960s [40,51,53,54]. Important concepts that mark steps in the development of NNs include backpropagation of errors and connectionism. Backpropagation of errors is an application of Leibniz's chain rule (from 1673) to networks of differentiable nodes and has become the standard for optimizing MLPs [55]. In designing the connectionist-geostatistical classification approach in the 1990s, we applied an MLP with backpropagation of errors, using the Stuttgart Neural Network Simulator (SNNS) [44,56]. The term connectionism refers to an approach to the study of human mental processes and cognition that utilizes mathematical models known as connectionist networks or artificial neural networks [57,58].
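Because backpropagation is the chain rule applied to a network of differentiable nodes, the gradients it produces can be checked against finite differences. The sketch below uses an assumed toy network of two sigmoid nodes; the weights `w1`, `w2`, input `x` and target `t` are arbitrary values, and the code illustrates the principle rather than any particular software package.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w1, w2, x=0.7, t=1.0):
    """Squared error of a two-node network y = sigmoid(w2 * sigmoid(w1 * x))."""
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    return 0.5 * (y - t) ** 2

def backprop(w1, w2, x=0.7, t=1.0):
    # Forward pass, keeping the intermediate activation h.
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    # Backward pass: multiply local derivatives along the path (chain rule).
    dE_dy = y - t
    dy_dz2 = y * (1.0 - y)
    grad_w2 = dE_dy * dy_dz2 * h
    dz2_dh = w2
    dh_dz1 = h * (1.0 - h)
    grad_w1 = dE_dy * dy_dz2 * dz2_dh * dh_dz1 * x
    return grad_w1, grad_w2

def numerical(w1, w2, eps=1e-6):
    """Central finite differences of the same loss, for comparison."""
    g1 = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
    g2 = (loss(w1, w2 + eps) - loss(w1, w2 - eps)) / (2 * eps)
    return g1, g2

print(backprop(0.3, -0.8))
print(numerical(0.3, -0.8))
```

The two printed pairs agree to within finite-difference error; an MLP-BP applies the same bookkeeping to every weight of every layer.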

A standard reference for Deep Learning is the book by Goodfellow and others [59]. A good general reference related to several topics of this paper is a book titled “Deep learning for the Earth Sciences: A comprehensive approach to remote sensing, climate science and geosciences” [47].

In their review of remote sensing image classification methods, [60] focus on applications of CNNs for extraction of semantic features in image data. Rawat and Wang [38] present a review of deep convolutional networks for image classification, and Garcia-Garcia and others write a survey of deep learning techniques for image and video semantic segmentation [61]. A review of ML methods for classification of remotely sensed imagery and applications to sea-ice classification is given in [62], and a review of hyperspectral image (HSI) classification using CNNs is presented by [63].

2.2. Classic Applications of NNs in the Geosciences

Prediction and assessment of sea-ice conditions in the Arctic, based on satellite remote sensing, have been important tools for ocean navigation. Synthetic Aperture Radar (SAR) data lend themselves well to sea-ice observations, because the radar signal penetrates cloud layers and fog, which are frequent in Arctic atmospheric conditions and obscure optical satellite image data. A short review of ML methods for classification of remotely sensed imagery and applications to sea-ice classification is given in [62]. Research on sea-ice classification based on SAR data goes back to the 1990s [64,65,66]. These early methods typically allow distinguishing a small number of sea-ice types, such as 3 or 4. Most methods use multivariate statistics of pixel values in different channels, for example [67,68,69]. The approach of [67] is innovative in that it bases image segmentation on gradients in the original multivariate statistical parameters, using an edge-detection method. Early applications of NNs include [70,71]. Karvonen's work [70] was a milestone in state-of-the-art statistical techniques for sea-ice classification, with applications to the seasonal ice cover in the Baltic Sea, and notes that understanding physical processes remains an open problem. [71] introduce an interesting concept that combines a number of statistical parameters and an NN. Recent publications that utilize sea-ice classification include [72,73,74,75].

Neural networks that address pattern recognition problems, such as self-organizing maps [76], a form of unsupervised classification, or learning vector quantization, a supervised NN approach [77], achieved some popularity, but were found to be outperformed by MLPs with backpropagation of errors (MLP-BP) for image analysis of repeated structural patterns [44]. An overview of pattern recognition using NNs is given in [78].

2.3. Spectral Versus Spatial Classification

Most image classification methods are based on spectral or multi-spectral classification, i.e. they utilize the fact that an image is composed of several spectral bands [60]. The connectionist-geostatistical classification method that will be utilized in this paper is a form of spatial classification, which is based on the fact that repeating spatial structures of crevasse fields lead to characteristic types of vario functions [44,45]. In [62], we compare statistical and geostatistical classification methods to explore the potential of combined methods for sea-ice classification.
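The difference between spectral and spatial classification can be shown with two tiny single-band images that contain exactly the same pixel values in different arrangements. The images below are hypothetical toy data, and the spatial statistic is a lag-1 variogram along image rows (the classical estimator, assumed here for illustration). Any statistic computed only from the distribution of pixel values cannot separate the two images; the spatial statistic can.

```python
from collections import Counter

def histogram(img):
    """Pixel-value frequencies: all that a purely per-pixel statistic sees."""
    return Counter(v for row in img for v in row)

def vario_rows(img, h=1):
    """Classical variogram estimator along image rows at lag h (in pixels)."""
    sq = [(row[j + h] - row[j]) ** 2 for row in img for j in range(len(row) - h)]
    return sum(sq) / (2 * len(sq))

# Two hypothetical single-band 4 x 4 images with identical values but different layouts.
stripes = [[0, 1, 0, 1] for _ in range(4)]  # fine, repeating structure
blocks = [[0, 0, 1, 1] for _ in range(4)]   # one dark and one bright half

print(histogram(stripes) == histogram(blocks))  # same distribution of values
print(vario_rows(stripes), vario_rows(blocks))  # different spatial structure
```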

Vario functions are a formulation of the variogram in discrete mathematics [79]. Variograms are employed in satellite image characterization by [80], who explore first- and second-order modeled histograms and variograms to characterize landscape spatial structures from remote-sensing imagery (SPOT-HRV NDVI data). They conclude that the method has potential to distinguish effects of anthropogenic landscape-forming processes from those of environmental and ecological processes, but note that the method can be improved. In contrast, the connectionist-geostatistical method in the form used in this paper employs experimental vario functions directly to initialize the input neurons of an NN.
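To make the idea of a directional vario function concrete, the sketch below assumes the classical (Matheron) form, γ(h) = 1/(2N(h)) Σ [z(x_i + h) − z(x_i)]², summed over the N(h) pixel pairs separated by the lag vector h; the exact definition and normalization used in GEOCLASS-image may differ. The test image is a hypothetical, idealized crevasse field with 1-pixel-wide crevasses spaced 4 pixels apart.

```python
def vario(img, dr, dc):
    """Experimental vario function for lag vector (dr, dc) in pixels (classical estimator)."""
    rows, cols = len(img), len(img[0])
    sq = []
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                sq.append((img[r2][c2] - img[r][c]) ** 2)
    return sum(sq) / (2 * len(sq))

# Hypothetical crevasse field: bright 1-px crevasses every 4 px, parallel to the columns.
field = [[1.0 if c % 4 == 0 else 0.0 for c in range(32)] for r in range(32)]

along = [vario(field, h, 0) for h in range(1, 9)]   # lags parallel to the crevasses
across = [vario(field, 0, h) for h in range(1, 9)]  # lags perpendicular to the crevasses
print(along)
print(across)
```

Along the crevasses the vario function is zero at every lag, while across them it oscillates and drops to zero at lags 4 and 8, the crevasse spacing. Directional contrasts and periodic minima of this kind are the characteristic signatures that make vario values useful as classifier inputs.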

Most applications of variograms in satellite image analysis are estimations (kriging) or spatial or temporal analyses of satellite data rather than classifications, for example of Synthetic Aperture Radar (SAR) data [81,82]. The differences between geostatistical estimation (interpolation and extrapolation), characterization and classification are explained in [45].

2.4. Computer Science Developments of ML Methods for Image Processing and Classification. CNNs.

Recent advances in NN research, especially for applications to image analysis, processing and classification, have been led by computer scientists. Over approximately the last 10-15 years, Deep Learning methods have dominated, and within the field of Deep Learning, CNNs are preeminent [47]. Deep Learning encompasses ML approaches that involve neural networks with large numbers of internal layers (for example, ResNet-1001 has 1001 layers [83,84]). Overviews of these methods are given in [47,60]. In contrast, the MLP used in the original (2001) connectionist-geostatistical classification has three layers: an input layer, an internal (hidden) layer and an output layer [44].

Types of CNNs that have been widely used include the following (described largely following [60], with some updated references): (1) AlexNet, a CNN with five convolutional layers and two fully connected layers, first evaluated on ImageNet [39,40], a benchmark data set; AlexNet won the so-called ImageNet challenge in 2012 [41]; (2) Network-in-Network (NiN) [85], in which an MLP is added to each convolutional layer, replacing a simple linear convolutional layer, and an averaging method called global average pooling is applied to counteract overfitting; (3) VGG-Net [86], a 19-layer network with small (3x3) convolutional kernels; (4) GoogLeNet [87], a 22-layer network; (5) ResNet [41,83,84], a family of so-called residual networks with depths of up to 1001 layers; (6) DenseNet [88], an NN type that uses cross-layer connections to improve network structure; and (7) MS-CapsNet [89], a multi-scale capsule network. ML methods, from multi-spectral statistical methods to CNNs, can be trained in a supervised or an unsupervised manner [47].
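Two building blocks named in this list, small (3x3) convolution kernels and NiN-style global average pooling, can be sketched in plain Python. The kernel below is a fixed, hypothetical vertical-edge detector chosen for illustration; in a real CNN the kernel weights are learned, and frameworks apply many kernels per layer.

```python
def conv2d_valid(img, kernel):
    """'Valid' 2-D cross-correlation (the operation CNN layers compute) with a square kernel."""
    k = len(kernel)
    out_rows, out_cols = len(img) - k + 1, len(img[0]) - k + 1
    return [[sum(kernel[i][j] * img[r + i][c + j] for i in range(k) for j in range(k))
             for c in range(out_cols)] for r in range(out_rows)]

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def global_average_pool(fmap):
    """Collapse a whole feature map to one number, as in a global average pooling layer."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)

edge = [[-1, 0, 1],
        [-1, 0, 1],
        [-1, 0, 1]]                                   # responds to dark-to-bright steps
dark_to_bright = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
bright_to_dark = [[1, 1, 1, 0, 0, 0] for _ in range(5)]

print(global_average_pool(relu(conv2d_valid(dark_to_bright, edge))))
print(global_average_pool(relu(conv2d_valid(bright_to_dark, edge))))
```

Global average pooling replaces a large fully connected layer with one averaged activation per feature map, which removes parameters and is the reason NiN uses it against overfitting.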

The work in this paper uses a form of ResNet, because ResNets have been found to excel at image classification problems; ResNet principles and architectures will therefore be described in more detail in section 8. Applications of CNNs in image classification are numerous (see, for example, [37,38,40]).
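The motivation for residual networks, and their relation to the vanishing gradient problem mentioned in the introduction, can be shown with a strongly simplified model: a chain of scalar sigmoid layers with unit weights. This is an assumed toy model, not the ResNet variant used in this paper. In a plain chain, the end-to-end derivative is a product of local derivatives of at most 0.25 each; with an identity shortcut, a_next = a + f(a), each factor becomes 1 + f'(a), so with these positive local derivatives the product cannot shrink toward zero.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def chain_gradient(depth, residual, w=1.0, x=0.0):
    """d(output)/d(input) through `depth` scalar layers.

    Plain layer:    a_next = sigmoid(w * a)
    Residual layer: a_next = a + sigmoid(w * a)   (identity shortcut)
    """
    grad, a = 1.0, x
    for _ in range(depth):
        s = sigmoid(w * a)
        local = w * s * (1.0 - s)  # derivative of sigmoid(w * a) with respect to a
        grad *= (1.0 + local) if residual else local
        a = a + s if residual else s
    return grad

for depth in (1, 10, 50):
    print(depth, chain_gradient(depth, residual=False), chain_gradient(depth, residual=True))
```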

2.5. Identified Needs for Advancing Remote-Sensing-Data Classification Using ML Methods, in General and in the Geosciences

In this paper, we will address some of the shortcomings or challenges associated with applications of CNNs in image classification identified in [60]: lack of sufficient training data (see also [43,90,91]), the need for remote-sensing-specific CNN architectures, time-efficiency of training CNNs for image classification, and a need for high-level CNN-based applications in remote-sensing image classification. The first three challenges concern technical aspects of NN development, and our work will address all three. Most interesting to us is the observation (made by [60]) that most current remote-sensing image ML applications resemble those in computer vision, whereas identification of semantically complex information is largely missing in state-of-the-art research. This resonates with the authors' observation that many modern CNNs are constructed for the same type of simple applications that were tackled with image processing methods several decades ago. For example, the hyper-deep ResNet-1001 [83,84] is derived for multiframe video satellite image super-resolution processing, but then applied to a problem of differencing aircraft presence and absence that had already been analyzed decades earlier. Another application, to moving-object detection, is described in [92]. Note that ResNets use very small convolutional kernels, which mirrors the fact that many image denoising or sharpening techniques of the 1900s used 3x3, 5x5 or 7x7 kernels [44]. It appears that modern ML methods often perform the same applications as older methods, only faster, at higher resolution, or for more modern observations, e.g., satellite videography. In our paper, we aim to create an approach that allows understanding of a specific, complex geophysical (cryospheric) phenomenon.

In part, the lack of actual conceptual advances or physical process understanding in the Earth sciences from ML applications to image classification is tied to the fact that ML research is based on a relatively small number of labeled training data sets (an example is ImageNet [40]).

Physically driven NNs fall into the category termed “high-level (C)NN-based applications” by [60], or classification of geophysically complex information, such as crevasse classification for the surge problem in this paper. Identification and classification of complex information in imagery requires large sub-images or large moving windows (two distinct concepts) [44], and, not least, the creation of labeled training data for cryospheric applications. Along similar lines, [93], in their review of Earth science applications, highlight NN structures that include modules of data analysis from fields other than ML (see subsections (2.5) and (2.6)); however, only a small number of such approaches are listed, and none in the cryospheric sciences. Our work falls into this category.
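The distinction between sub-images and moving windows can be made concrete by counting classification units. The image size, window size and step below are hypothetical values for illustration, not parameters used in this paper.

```python
def tile_origins(height, width, size):
    """Upper-left corners of non-overlapping sub-images (each pixel is used at most once)."""
    return [(r, c) for r in range(0, height - size + 1, size)
            for c in range(0, width - size + 1, size)]

def window_origins(height, width, size, step):
    """Upper-left corners of overlapping moving windows advanced by `step` pixels."""
    return [(r, c) for r in range(0, height - size + 1, step)
            for c in range(0, width - size + 1, step)]

# Hypothetical numbers: a 1000 x 1000 pixel image and 200 x 200 pixel classification units.
tiles = tile_origins(1000, 1000, 200)
windows = window_origins(1000, 1000, 200, step=50)
print(len(tiles), len(windows))
```

With sub-images, each unit receives one class label, so the resulting map has the resolution of the tile grid. With moving windows, the label is commonly assigned to the window center, which yields a finer map at the cost of many more, overlapping, evaluations.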

2.6. Recent Applications of NNs in Geosciences

Dominant application fields include land cover and land use (urban areas, farmlands, roads, water bodies), biogeosciences, and military applications, in particular change detection of airplanes present or absent at terminals (see, for example, [37,60,90,94,95,96,97]). Neural networks and other ML methods are increasingly finding applications in the geosciences; reviews are found in [62,98].

Examples of papers where ML structures are applied include the following: Neural networks have been utilized in studies of vegetation canopy height, using ICESat-2 and Landsat data [99,100]. In a case study of a forest in Texas, [100] investigate the potential of using a Deep Neural Net (DNN) or a Random Forest (RF) model for above-ground biomass assessment based on ICESat-2 and Landsat data, finding similar performance values for the DNN and the RF. [101] explore applications of NNs for analysis of atmospheric data from ICESat-2, treated like image data. Common to these studies is that they are case studies, which investigate the applicability of previously published types of NNs to satellite data analysis. Other applications include forest canopy height determination from ICESat-2 and Landsat data [102], disaster detection and monitoring (flood detection) using Random Forests [103], and geological image classification using CNNs [104].

In summary, recent applications of ML in the geosciences fall into two categories: (1) computer scientists taking summative approaches to geoscience data classification, and (2) geoscientists exploring applications of existing, previously published ML approaches to image analysis. Notable exceptions include feature-augmented neural networks for satellite image classification (an approach that augments data sets with handcrafted feature data sets; see, for example, [105]) and a new strand of methods that aim to integrate ML and physics (see the next section).

2.7. Approaches Aimed at Integrating Physical Sciences and ML

Most relevant for the work in this paper is a class of approaches that are aimed at integrating physical sciences and ML, by either using physical knowledge in ML or by using ML to improve physical models.

Exemplary approaches that include physics in ML have been termed “physics-aware ML” [106], based on the concept that the elementary laws of physics ought to be respected by ML approaches in the geosciences. Under this label, challenges, more so than solutions, in the interplay of physics and ML have been identified that may help advance Earth system knowledge (encoding differential equations from data, constraining data-driven models with physics priors and dependence constraints, improving parameterizations, emulating physical models, and blending data-driven and process-based models). [107] propose an approach termed “Geoscience-aware deep learning” (GADL), which incorporates geoscience features into Deep Learning models. This is similar to the concept of including handcrafted features in CNN-based satellite image classification suggested by [105]. Other authors recognize the need for collaborative efforts in the fields of geoscience and ML (e.g., [108]).

Physically guided neural networks (PGNNs) leverage scientific knowledge, physical models and observational data in a neural network in order to make better predictions [109]. Physical consistency is used as a learning objective to allow generalization of the learned network. PGNNs have been used to model complex physical systems that either lack the required data constraints or incur large computational costs, such as those found in fluid dynamics problems [110] or power flow analysis [111]. These include applications of ML methods in the determination of numerical modeling parameters.
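The idea of using physical consistency as a learning objective can be sketched as a loss function with two terms: a data misfit and a penalty for violating a physical constraint. The monotonicity constraint (the predicted quantity must not decrease with depth) and the weight `lam` are hypothetical choices for illustration; the formulation in [109] may differ. The physics term requires no observations, so it can also be evaluated on unlabeled inputs.

```python
def physics_guided_loss(pred, obs, lam=1.0):
    """Data misfit plus a penalty for violating a (hypothetical) monotonicity constraint.

    pred: predictions ordered by increasing depth.
    obs:  observations at the same depths, None where no observation exists.
    """
    pairs = [(p, o) for p, o in zip(pred, obs) if o is not None]
    mse = sum((p - o) ** 2 for p, o in pairs) / len(pairs)
    # Penalize every place where the prediction decreases from one depth to the next.
    violations = [max(0.0, pred[i] - pred[i + 1]) for i in range(len(pred) - 1)]
    physics = sum(violations) / len(violations)
    return mse + lam * physics

# Fits both observations exactly, but is physically inconsistent between depths 2 and 3.
print(physics_guided_loss([1.0, 2.0, 1.5, 3.0], [1.0, None, None, 3.0]))
```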

In a recent overview of ML in the Earth sciences, or the physical sciences in general, [93] emphasize that advancing knowledge in the sciences with the help of ML methods requires the development of novel NN approaches. Examples of methods that include non-ML physical data analysis modules in the NN operations flow or architecture stem from oceanography (sea-surface temperature patterns [112]) and biological applications [113].

3. Glaciology Background

3.1. Importance of Surging

Glacier surging is an important type of glacial acceleration, and surge-type glaciers are found around the world in many, but not all, geographic regions. However, the phenomenon remains poorly understood due to a relative paucity of comprehensive observational data and a lack of model applications to actual, complex ice systems [14,15,16,17,18]. A surge-type glacier experiences a quasi-periodic cycle between a long quiescent phase of normal flow and gradual retreat, and a short surge phase when the glacier accelerates to typically 10-200 times its quiescent speed, with heavy crevassing occurring throughout the ice system.

The recent surge of the Negribreen Glacier System (NGS), an Arctic glacier system located in eastern Spitsbergen, Svalbard, provides a rare opportunity to study a surge in a large and complex system [35,114,115]. In 2016, the NGS began to surge, with acceleration and heavy crevassing within 10 km of the terminus [35,36,115,116]. The largest surge speeds, around 22 m/day and equivalent to 200 times the glacier's quiescent flow velocity, occurred during the height of the acceleration phase in July 2017 [117].
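The two numbers given above imply the approximate quiescent flow speed of the NGS; the conversion to an annual rate assumes 365.25 days per year:

```latex
v_{\mathrm{q}} \approx \frac{v_{\mathrm{peak}}}{200}
  = \frac{22\ \mathrm{m\,d^{-1}}}{200}
  = 0.11\ \mathrm{m\,d^{-1}}
  \approx 0.11 \times 365.25\ \mathrm{m\,a^{-1}}
  \approx 40\ \mathrm{m\,a^{-1}}
```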

Negribreen last surged in 1935/36 [114], which indicates that the quasi-cycle of the surge in the NGS is approximately 80 years. From a methodological point of view, it is worth noting that there has been no opportunity for modern data analysis and study of the Negribreen surge process prior to the current surge. This example shows how the relative paucity of surges limits our ability to study them, but also that the Negribreen surge has provided a unique opportunity to advance several branches of science, mathematics and engineering [117,118]. Relevant to the study in this paper, the NGS has provided a unique collection of ice-surface structures and crevasse types in close proximity, unusual for an Arctic glacier system, and thus enabled the ML work reported here.

3.2. The Surge in the NGS

Negribreen is located on Spitsbergen in Svalbard, Norway, with the calving front at approximately (78.57°N, 19.083°E) in 2019, approximately 1270 km south of the North Pole. Negribreen receives most of its inflowing ice from the accumulation zone above the glacier to the west, called Filchnerfønna, and its northern part, the Lomonosovfønna, through Transparentbreen, Opalbreen and the Negribreen ice falls. The NGS, as defined by the blue contour in Figure 1a, has an ice extent of approximately 500 km². The main glacial trunk, referred to simply as Negribreen, is fed by several major tributaries: Rembebreen to the south and, to the north, Akademikarbreen and Ordonnansbreen. Rembebreen and Petermannbreen (southwest of Negribreen) flow out of a southern part of Filchnerfønna. Ordonnansbreen does not flow out of an icecap; its source areas are mountain cirques. The area of the NGS is classically given as 1180 km² [114], based on the extent of the glacier system at a time when Petermannbreen and Gardebreen (east of Ordonnansbreen) and their tributaries were still connected to Negribreen.

The NGS is a polythermal glacier, consisting of ice at and below the pressure melting point, and a marine-terminating (tidewater) glacier calving into Storfjorden and the Arctic Ocean. Like other tidewater glacier surges in Svalbard (e.g., [24,115,119]), Negribreen began accelerating near the terminus after a collapse near the glacier front [36]. Surge effects, such as heavy crevassing and elevated velocities, propagated upglacier until the end of 2020, when they reached the NGS boundary with Filchnerfønna, 30 km upglacier from the terminus [36]. Mean ice speeds remained significantly elevated in 2023 relative to quiescent speeds, with a maximum of 4 m/day near the calving front, although ice speeds have been decreasing steadily since the peak in 2017 (see Figure 1c). High velocities, large-scale crevassing and enhanced calving during the surge have led to rapid disintegration of the system and large mass loss [36], thus contributing a significant amount to annual sea-level rise during the surge years. Examples of surge crevasses are shown in aerial photographs in [36,117].

3.3. The Crevasse-Centered Approach

Because the analysis of crevasse patterns takes a central role in the physical part of our ML approach, we give a brief background summary on the role of crevassing in glacial acceleration and on the utilization of the crevasse concept in data analysis and modeling. The central idea of the crevasse-centered approach is that dynamic signatures of fast-moving ice and glacial acceleration are imprinted in ice in the form of crevasses, and consequently the deformation history of a glacier can be reconstructed through analysis of crevasse patterns. Structural geologic principles provide links between dynamics, kinematics and deformation, which can be physically formalized and quantified using continuum mechanics, and simulated in numerical models [20,120,121,122,123,124,125].

Crevasses can be characterized using generalized spatial surface roughness, a mathematical approach that utilizes parameters derived from spatial statistical functions to capture spatial properties of a surface [45]. Roughness-based characterization applies to both crevassed and non-crevassed ice surfaces and thus allows mapping of an entire glacier. The approach of combining structural geology and mathematical roughness analysis to derive deformation characteristics in fast-moving glaciers is described in theory in [126] and has been applied to map deformation provinces in surging and continuously fast-moving glaciers throughout the cryosphere [16,44,45,127,128,129,130]. Applications of other approaches to structural glaciology have been reported by [131,132,133]. These studies have shown that observations of crevasse patterns and surface roughness can be used as a source of geophysical or glaciological information.

Furthermore, the crevasse-based geophysical information obtained from remote sensing observations, such as satellite imagery, can be utilized in numerical models. [20] use Landsat-7 imagery of Bering Glacier, Alaska, during its peak surge phase in 2011 to derive crevasse locations, based on roughness characterization, and crevasse orientations, which are also modeled by simulating the stress regime in a 3D, full-Stokes finite element model. Differences in crevasse characteristics are minimized by optimizing important surge-model parameters such as the basal friction coefficient. This method is extended in [134] to include other sources of model-data comparisons, such as surface velocity, which allows the optimization of additional model parameters such as those related to ice rheology.
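The optimization step described above, adjusting a model parameter until modeled and observed crevasse characteristics agree, can be sketched as a misfit function plus a search. Everything model-specific here is an assumption: `toy_model` is a hypothetical stand-in for a full-Stokes finite element run, and a simple grid search replaces whatever optimization [20] and [134] actually use. One detail carries over to real data: crevasse orientations are axial (a crevasse at 5° is the same as one at 185°), so angle differences must be taken modulo 180°.

```python
def axial_difference(a_deg, b_deg):
    """Smallest angle between two crevasse orientations, which are axial (defined mod 180°)."""
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)

def misfit(beta, observed, forward_model):
    """Mean squared orientation difference (deg^2) between modeled and observed crevasses."""
    modeled = forward_model(beta)
    return sum(axial_difference(m, o) ** 2 for m, o in zip(modeled, observed)) / len(observed)

def best_parameter(candidates, observed, forward_model):
    """Grid search: the candidate parameter value with the smallest misfit."""
    return min(candidates, key=lambda b: misfit(b, observed, forward_model))

def toy_model(beta):
    # Hypothetical stand-in for a full-Stokes run: orientations at four sample points.
    return [(170.0 + 10.0 * beta + 5.0 * i) % 180.0 for i in range(4)]

observed = [5.0, 10.0, 15.0, 20.0]
candidates = [0.5 * k for k in range(1, 7)]  # 0.5, 1.0, ..., 3.0
print(best_parameter(candidates, observed, toy_model))
```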

Ice velocity observations are commonly used to constrain unknown model parameters (e.g., [135]). During a surge, however, the large-scale non-linear dynamics complicate velocity determination [20,136], particularly on the short time scales relevant to a surge. Therefore, crevasse observations are our most reliable source of dynamical information during peak surge activity and can be used to derive and optimize basal sliding laws for modeling a surge phase [137].

Crevasse classes, like those derived in the present paper, offer a more detailed picture of a glacier’s dynamic and structural state than a simple characterization of crevasse location and orientation. With more detailed geophysical information from crevasse classification, we expect to provide better constraints for numerical models, allowing better-optimized parameterization, better error correction for input data sets such as bed topography, and ultimately more realistic simulation of glacial acceleration and its resulting effects on SLR and the evolving cryosphere.

4. Summary of the Approach

4.1. Objectives, Summary of Approach, Classification and Analysis Steps

The main objective of this paper is the exploration of the trade-offs between a physically constrained NN and a CNN (“Deep Learning”) for a specific, but generalizable, problem in the geosciences: the classification of crevasse types that form during the surge in an Arctic glacier system, the Negribreen Glacier System, Svalbard, to derive objective information about the evolution of the surge. To achieve this objective, we create a software system, termed GEOCLASS-image, that facilitates classification of surface features from high-resolution satellite imagery and other imagery; we then perform testing and quality assessment (Q/A) of the software system and release it as the core of an associated cyberinfrastructure.

Based on the results of the two trade-off studies, we derive an example of an ML approach that combines the advantages of a physically constrained, classic NN with those of a CNN, thereby creating a physically constrained NN with a combined architecture that will be termed VarioCNN. The final VarioCNN is applied to a time series of WorldView images to derive information on the evolution of the surge in an Arctic glacier system, the Negribreen Glacier System.

The combined NN, VarioCNN, will be applied to a time series of WorldView-1 and WorldView-2 images, collected in 2016–2018 during the acceleration stage and mature stage of the surge in the NGS. Each image will be analyzed individually and will provide one element in a time series of thematic maps of crevasse provinces. The goal is to derive geophysical information on the evolution of the surge during these core stages. Specifically, we aim to create a classification of crevasse patterns as they relate to deformation types that occur as a result of ice-dynamic processes. Crevasses are manifestations of the local strain state of the ice. The occurrence of fresh crevassing indicates the expansion of the surge, and as the surge progresses, new types of crevasse patterns form. The time series of crevasse maps will be interpreted geophysically. Lastly, we provide a description of the GEOCLASS-image software system.

In summary, the work in this paper builds on the following three ideas:

(1): Employ geostatistical parameters as a mathematical formulation for physically informed extraction of complex information from imagery
(2): Utilize different NN types as connectionist association structures: MLPs and CNNs
(3): Compare and then combine the NNs into a three-tiered approach: Geostatistical-connectionist with MLP and CNN

4.2. Approach Steps

Objectives of the work in this paper are the following:

(1)

Create a software that

(1.1): encompasses the main principles of the connectionist-geostatistical classification method,
(1.2): is sufficiently tested/ robust/ quality-assessed to form the center-piece of a community software for image classification in the geosciences and beyond,
(1.3): has a user-friendly GUI for all steps, from image manipulation and selection of training data through classification,
(1.4): facilitates training and classification of several crevasse types
(1.5): allows analysis of different types of satellite imagery
(1.6): includes utility tools for cartographic projections and other image manipulations,
(1.7): includes several neural network types, including multi-layer perceptrons and convolutional neural networks, and
(1.8): is open to generalization to more architecture types

(2)

Explore the trade-offs between a physically constrained NN and a CNN for a specific, but generalizable, problem in the geosciences: the classification of crevasse types that form during a glacier surge

(3)

Create an example of a ML approach that combines the advantages of a physically constrained, classic NN with those of a CNN, thereby creating a physically constrained NN with a combined architecture, and

(4)

Apply the resultant NN to a time series of WorldView images to derive information on the evolution of the surge in an Arctic glacier system, the Negribreen Glacier System.

4.3. Terminology

We use the following terms to distinguish ML approaches and NNs in this paper; they are explained further in sections (6), (7), (8) and (9).

(1): The connectionist-geostatistical classification method [44] is the original approach that combines a physically driven geostatistical analysis of an input data set and a neural network into an ML approach. As described in [45], the geostatistical analysis or characterization can take several different forms; in any case, the output of the geostatistical analysis is used as input for the neural network. Examples of geostatistical analysis include (a) the experimental variogram, a discrete function, and (b) values of geostatistical characterization parameters. The neural network type applied in most of our studies is a form of multi-layer perceptron (MLP) with back-propagation of errors [44,45,46,62] (see section (6)).
(2): The acronym VarioMLP is used for the connectionist-geostatistical NN type that is applied in this paper; it employs a four-directional experimental vario function to activate the input neurons of an MLP with back-propagation of errors (see section (6)).
(3): The term Convolutional Neural Network (CNN) stands for a specific class of neural networks that realize the concept of “deep learning” [47,59].
(4): ResNet-18 is the acronym for the specific CNN used in this paper [41,83] (see section (8)).
(5): The acronym VarioCNN will be used for the combined new method that integrates VarioMLP and ResNet-18 into a unique, physically constrained ML approach (see section (9)).
(6): Specific architectures of an NN are identified by adding information in square brackets. For example, VarioMLP[18, 4, (5,2)] identifies a VarioMLP where 18 is the number of steps in the vario function (for each direction) and 4 the number of directions of vario-function calculations, yielding 72 nodes in the input layer, and (5,2) are the multipliers for the number of nodes in the hidden layers; here an MLP with two hidden layers is used, where the first layer includes 72 times 5 nodes and the second layer 72 times 2 nodes (see section (7)).

More generally, $\mathrm{VarioMLP}[n_{steps}, n_{dir}, (m_{1}, \dots, m_{n_l})]$ identifies a VarioMLP, where $n_{steps}$ is the number of steps in the vario function (for each of $n_{dir}$ directions) and $(m_{1}, \dots, m_{n_l})$, with $n_l \in \mathbb{N}$, are the multipliers for the number of nodes in the $n_l$ hidden layers; here an MLP with $n_l$ hidden layers is used, where layer $i$ has $m_{i}\, n_{steps}\, n_{dir}$ nodes for $i = 1, \dots, n_l$ (see section (7)).
(7): GEOCLASS-image is the software system utilized to create the neural networks and labeled data sets referred to in this paper and to carry out the classifications of crevasse types during the surge of the NGS, Svalbard [138].
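The bracket notation can be expanded mechanically into node counts per layer. The Python sketch below does this; the function name `variomlp_layer_sizes` is hypothetical (it is not a GEOCLASS-image API), and the default of six output nodes assumes the six surface classes used in this paper.

```python
from typing import List, Sequence


def variomlp_layer_sizes(n_steps: int, n_dir: int,
                         multipliers: Sequence[int], n_classes: int = 6) -> List[int]:
    """Node counts for VarioMLP[n_steps, n_dir, (m_1, ..., m_nl)] (hypothetical helper).

    Input layer: one node per directional vario-function value (n_steps * n_dir).
    Hidden layer i: m_i times the input-layer size, as in the VarioMLP[18, 4, (5, 2)] example.
    Output layer: one node per surface class.
    """
    n_in = n_steps * n_dir
    hidden = [m * n_in for m in multipliers]
    return [n_in, *hidden, n_classes]


print(variomlp_layer_sizes(18, 4, (5, 2)))  # [72, 360, 144, 6]
```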

5. Approach Component: Image Classification and Data Sources

5.1. Image Classification Challenges and Approaches

The data analysis challenge is a type of image classification, more specifically, image segmentation. Different types of image classification are the following: (1) each image is associated with a class; (2) features are extracted from images (an often-analyzed example is the detection of moving features between consecutive images, e.g., planes at terminals [84]); (3) image classification is applied to videography, i.e., time series of images (e.g., [44]), or to satellite videography [84]; (4) a single image is segmented into areas of several different classes, resulting in thematic maps. Early applications in the geosciences, e.g., sea-ice classification and land-cover classification, fall in this category. Typically, the classification is applied as a moving-window operator (i.e., to subimages, which can overlap). From a classification standpoint, the types of image arrangement can all be treated the same way, with different data handling utilities. Challenges in this context lie in the specifics of the observational data, which may include remotely sensed imagery from any tier of observation (satellite, airborne, subaerial, ground), in the specific spatial and spectral resolution of the sensor, in signal-background separation, and in other characteristics of the image. The problem treated in this study is a combination of (4) and (1), applied to a time series of satellite image data of the glacier surface. The classification will be applied to each image individually (i.e., without providing information on the previous image).

Because a surge in an Arctic glacier extends over several years, typically 7-10 years [139], data from several different satellite sources need to be integrated in an analysis. Here, we utilize Maxar (formerly DigitalGlobe) data from the WorldView-1 and WorldView-2 satellites. Both satellites carry high-resolution optical image sensors, but with different resolutions and spectral channels (see Table 1). Thus, a specific challenge lies in the identification of subimages for training that work for both satellite data types.

In order to facilitate application of our classification system GEOCLASS-image to data from different sources, a large range of data handling utility modules is included (see the software description). The software is designed to be generalizable to several data types, both (a) for different studies using a single data type and (b) for integrating data from several sources into one classification.

The classification will be trained using a set of labeled images. A challenge in a spatially based classification, but also in any image classification that uses subimages (or split images), is the selection of a subimage size that is large enough to include several repetitions of the crevasse pattern, but also small enough to be sufficiently homogeneous to be assigned to a single class.

5.2. Data Sources and Processing

The analysis in this paper utilizes Maxar WorldView-1 and WorldView-2 optical satellite image data. WorldView-1, WorldView-2 and WorldView-3 data are a widely used type of commercial satellite imagery [140]; hence, the classification approach described in this paper is relevant to large parts of the Earth science community. For example, an Arctic-wide mosaic and DEM have been created from WorldView data [141]. [142] apply a random forest classification to WorldView-2 imagery to identify tree species in the forests of Austria at high resolution. WorldView data are used extensively as the data source for classifications by the land-cover/land-use and vegetation ecology communities (e.g., [94,95,97,142,143,144,145,146,147,148,149]).

The WorldView-1, -2 and -3 satellites, owned and operated by Maxar (formerly DigitalGlobe), provide submeter optical imagery of much of the cryosphere [141], including all of Negribreen. WorldView-1 carries a single high-resolution optical imager, the WorldView 60 camera, which has a single panchromatic channel with a spectral range of 0.45 μm to 0.90 μm. WorldView 60 is a pushbroom sensor operating in a swath of 17.9 km with 0.5 m resolution at nadir, decreasing to 0.55 m resolution 20° off nadir (see Table 1). WorldView-2 also carries a single high-resolution optical imager, the WorldView 110 camera, but has two operational channels. The first is a panchromatic channel with a spectral range of 0.45 μm to 0.80 μm, while the second is an 8-band multispectral channel ranging from 0.4 μm to 1.05 μm. WorldView 110 is also a pushbroom sensor, with a swath width of 16.4 km and 0.46 m resolution at nadir, decreasing to 0.52 m resolution off nadir. A full comparison of the WorldView-1 and WorldView-2 specifications is given in Table 1.

In this analysis, we utilize panchromatic imagery from WorldView-1 (launched 18 September 2007, decommissioned September 2023, [150]) and WorldView-2 (launched 8 October 2009, operational as of 2024, [151]) to analyze the NGS surge from its start in 2016 through 2019. Data from the panchromatic channel will be utilized for the classification approaches in this paper because it has the highest spatial resolution for each satellite (0.45 m pixel size for WorldView-1 data and 0.42 m for WorldView-2 data) and thus retains the most information on spatial properties of the ice surface. While we will not employ data from the other spectral channels, we have described statistical and geostatistical image classification approaches for multispectral data elsewhere [62].

The VarioMLP has also been applied to classify Negribreen crevasse provinces based on Planet SkySat data [35,117] (for a data description, see [152,153]). Other commonly used satellite imagers include Landsat [154] and Sentinel-2 [155].

Processing. Images were selected with respect to spatial coverage, temporal coverage and lack of obscuration. The GEOCLASS-image system includes a tool for evaluating the area of overlap of a given WorldView image with the area of interest, as outlined by the polygon encompassing the NGS region (Figure 1a). Only images with 50% or more overlap with the NGS area of interest were used for analysis. To avoid obscuration, images with a high percentage of cloud cover over the area were avoided, as were images with deep snow cover on the glacier, which would obliterate the crevasse patterns; thus, winter images were rejected. Applying these criteria, 11 high-quality images from spring and summer 2016-2018 were selected from several hundred WorldView data sets (Table 2). All 11 images were used to create the labeled training data sets of split images, whereas 7 are used in the time-series analysis (leaving out images that are too close in time to other images already in the time series).
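The overlap criterion can be sketched in a few lines of pure Python. This is a minimal illustration, not the GEOCLASS-image tool: it assumes that "overlap" means the fraction of the area-of-interest polygon covered by the image footprint, that footprints are convex polygons in the same projected coordinates (e.g., UTM) as the area of interest, and that each candidate carries a cloud-cover fraction. The 20% cloud threshold and all function names are hypothetical, and snow-cover screening is not modeled.

```python
def signed_area(poly):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))


def clip_to_convex(subject, clip):
    """Sutherland-Hodgman clipping of polygon `subject` by a convex polygon `clip`."""
    if signed_area(clip) < 0:            # enforce counter-clockwise clip polygon
        clip = clip[::-1]

    def inside(p, a, b):                 # p lies left of (or on) the edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):           # intersection of segment p-q with line a-b
        den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
        t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    out = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not out:
            break
        inp, out = out, []
        for s, e in zip(inp[-1:] + inp[:-1], inp):   # consecutive vertex pairs s -> e
            if inside(e, a, b):
                if not inside(s, a, b):
                    out.append(intersect(s, e, a, b))
                out.append(e)
            elif inside(s, a, b):
                out.append(intersect(s, e, a, b))
    return out


def overlap_fraction(aoi, footprint):
    """Fraction of the area of interest covered by a convex image footprint."""
    return abs(signed_area(clip_to_convex(aoi, footprint))) / abs(signed_area(aoi))


def select_images(candidates, aoi, min_overlap=0.5, max_cloud=0.2):
    """Keep images covering >= min_overlap of the AOI with acceptable cloud cover."""
    return [c["id"] for c in candidates
            if overlap_fraction(aoi, c["footprint"]) >= min_overlap
            and c["cloud_cover"] <= max_cloud]
```

Sutherland-Hodgman clipping is used here because a convex clip polygon (a single image footprint) suffices; the area-of-interest polygon itself may be non-convex.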

The pixel intensity of the source GeoTIFF images was normalized to the 95th percentile and compressed to an 8-bit range. This was done because, although the WorldView-1 and WorldView-2 sensors allow for 11-bit depth, only a small number of pixels in each image actually reached those intensity values, the rest being several orders of magnitude below. Normalizing to an 8-bit range made data processing during training and testing much faster and more efficient, and thresholding the highest-intensity pixels made the resulting GeoTIFF images and the split images extracted from them much easier for the human eye to inspect, which is crucial for the labeling process.
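A minimal NumPy sketch of this normalization follows. It assumes linear scaling from zero to the 95th-percentile value, with saturation at 255 above it; the zero lower bound and the function name `to_uint8` are assumptions rather than details of the processing pipeline.

```python
import numpy as np


def to_uint8(img: np.ndarray, pct: float = 95.0) -> np.ndarray:
    """Scale an (e.g. 11-bit) panchromatic image to 8 bits, saturating above the pct-th percentile."""
    upper = np.percentile(img, pct)              # intensity that will map to 255
    if upper <= 0:                               # degenerate, e.g. an all-zero image
        return np.zeros(img.shape, dtype=np.uint8)
    scaled = np.clip(img.astype(np.float64) / upper, 0.0, 1.0) * 255.0
    return np.round(scaled).astype(np.uint8)
```

Because every value above the percentile is clipped to 255, the few very bright pixels no longer compress the contrast of the rest of the ice surface.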

Custom software was written to extract the coordinates, in both pixel space and UTM space, of each split image within a given set of GeoTIFF images that falls entirely within the NGS area of interest. These coordinates, along with fields for a class label, class prediction, confidence, and an index referencing the source GeoTIFF image, were stored in a large table, in addition to metadata containing the file paths, affine transforms to convert between pixel and UTM space, and class enumerations. This pipeline thus produces a standardized and efficient split-image data set format that can be utilized by the classification model, visualization tools, labeling tools and utility tools to reproducibly extract the split images from their corresponding source images at runtime. For the 11 source GeoTIFF images selected (Table 2), a total of 108,623 split images were extracted using this pipeline. A breakdown of the number of split images from each source GeoTIFF is also given in Table 2.
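The record table can be illustrated with the following sketch; it is not the authors' software. The class enumeration order, the field names, the non-overlapping grid of windows and the GDAL-style geotransform convention `(x0, dx, rx, y0, ry, dy)` are all assumptions. Testing only the four window corners against the area of interest is exact for a convex polygon and only approximate otherwise.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class CrevasseClass(IntEnum):        # hypothetical numbering of the six classes of section (7.2)
    UNDISTURBED = 0
    ONE_DIRECTIONAL = 1
    MULTI_DIRECTIONAL = 2
    SHEAR = 3
    CHAOS_SHEAR_CHAOS = 4
    OTHER = 5


@dataclass
class SplitImageRecord:
    source: int                      # index of the source GeoTIFF in the metadata table
    row: int                         # top-left pixel row
    col: int                         # top-left pixel column
    utm_x: float                     # UTM easting of the top-left corner
    utm_y: float                     # UTM northing of the top-left corner
    label: Optional[CrevasseClass] = None
    prediction: Optional[CrevasseClass] = None
    confidence: Optional[float] = None


def pixel_to_utm(gt, row, col):
    """Affine pixel -> map transform, GDAL-style gt = (x0, dx, rx, y0, ry, dy)."""
    return gt[0] + col * gt[1] + row * gt[2], gt[3] + col * gt[4] + row * gt[5]


def point_in_polygon(x, y, poly):
    """Even-odd ray-casting test for a point against a polygon [(x, y), ...]."""
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def build_records(source, gt, n_rows, n_cols, size, aoi):
    """Records for all non-overlapping size x size windows whose corners lie inside the AOI."""
    records = []
    for row in range(0, n_rows - size + 1, size):
        for col in range(0, n_cols - size + 1, size):
            corners = [pixel_to_utm(gt, r, c) for r, c in
                       ((row, col), (row, col + size), (row + size, col), (row + size, col + size))]
            if all(point_in_polygon(x, y, aoi) for x, y in corners):
                x0, y0 = corners[0]
                records.append(SplitImageRecord(source, row, col, x0, y0))
    return records
```

Storing only coordinates and an index, rather than pixel data, keeps such a table small; the split images are cut from the source image when needed.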

6. The Connectionist-Geostatistical Classification Method

The connectionist-geostatistical classification method [44,45] integrates and interleaves physical knowledge, spatial statistical analysis and computational components at several levels. The approach includes the following concepts:

(1): The idea of using spatial classification to extract features from image data
(2): The idea of using geostatistical parameters to pre-process the imagery
(3): The vario function and residual vario function
(4): Creation of input data to activate the input-layer neurons of the NN
(5): The feed-forward multi-layer perceptron with back-propagation of errors

The idea of the connectionist-geostatistical classification method is to utilize geostatistical parameters to pre-process the input image data, thereby reducing the complexity of an NN required to identify spatial structures that are surface signatures resulting from cryospheric processes. The relationship between glacial acceleration, crevassing and the resulting spatial structures reflected in imagery has been explained in section (3.3) and in more detail in [45]. In this section, we describe the mathematical and computational ingredients of this approach in the form that is employed in VarioMLP, the type of connectionist-geostatistical classification utilized as a component of VarioCNN.

6.1. Geostatistical Processing of the Input Image Data

6.1.1. Spatial Homogeneity

Depending on the type of image classification problem at hand, an input image can be a video frame, a photograph, a subset of a video frame or photograph, or a subimage of a large image such as a satellite image (termed split-image here). Split-images are created from satellite images by a moving-window process. The goal is to associate each image with a surface class, here a crevasse class, using the classification method. Applying the classification as a moving-window operation over a satellite image yields a segmentation of the area of the satellite image into crevasse classes, in other terms, a thematic map of structural glaciologic provinces. Similarly, a time-dependent segmentation of a video stream of a glacier will result in a mapping of surface classes.
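The moving-window process can be sketched as follows, assuming the image is a 2D NumPy array and the window is 201 rows by 268 columns. The callable `classify` is a hypothetical stand-in for the trained NN. Strides default to the window size (non-overlapping tiles), and the coarse class grid returned by `thematic_map` is an illustration of the resulting thematic map, not the GEOCLASS-image output format.

```python
import numpy as np


def split_image_windows(img, height, width, stride_y=None, stride_x=None):
    """Yield (row, col, subimage) for a moving window over a 2D image.

    Strides default to the window size (non-overlapping tiles);
    smaller strides produce overlapping split-images.
    """
    stride_y = stride_y or height
    stride_x = stride_x or width
    n_rows, n_cols = img.shape
    for row in range(0, n_rows - height + 1, stride_y):
        for col in range(0, n_cols - width + 1, stride_x):
            yield row, col, img[row:row + height, col:col + width]


def thematic_map(img, height, width, classify):
    """Assign each non-overlapping split-image a class label, giving a coarse class grid."""
    n_r = (img.shape[0] - height) // height + 1
    n_c = (img.shape[1] - width) // width + 1
    out = np.full((n_r, n_c), -1, dtype=int)
    for row, col, sub in split_image_windows(img, height, width):
        out[row // height, col // width] = classify(sub)
    return out
```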

To allow for characterization and classification of the spatial structures captured, the optimal size of a subimage (split-image) is determined as follows: a feature type needs to repeat approximately three times in the split-image, and the split-image needs to be spatially homogeneous with respect to surface structure, here, crevasse type. These two criteria will not be met exactly across an entire glacier region; thus, a split-image size needs to be selected that meets the criteria sufficiently often to make the classification operational. For experiments with only VarioMLP, split-images of 201 pixels by 268 pixels were used, which follow the (3-4-5) rectangle convention and correspond to approximately 92 m by 123 m for WorldView-1 data. An additional constraint is that the CNN component (ResNet-18) requires input images of 224 by 224 pixels. For WorldView data and Negribreen surge crevasses, this requirement can be met; however, it limits the generalizability of the approach. The entire training was therefore rerun with square images for the combined architecture of VarioCNN.
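The repetition criterion can be turned into a quick check of which crevasse spacings a window accommodates. The sketch below assumes square pixels and uses the panchromatic pixel sizes quoted in section (5.2); the function name is hypothetical, and "three repetitions" is taken literally as extent divided by spacing.

```python
def max_crevasse_spacing(window_px: int, pixel_size_m: float, repeats: float = 3.0) -> float:
    """Largest crevasse spacing (m) that still repeats `repeats` times across a window side."""
    extent_m = window_px * pixel_size_m      # ground extent of one side of the window
    return extent_m / repeats


# Square 224-pixel windows with the panchromatic pixel sizes of WorldView-1 and WorldView-2
for sensor, pixel_size in (("WorldView-1", 0.45), ("WorldView-2", 0.42)):
    extent = 224 * pixel_size
    print(f"{sensor}: window {extent:.1f} m, max spacing {max_crevasse_spacing(224, pixel_size):.1f} m")
```

Under these assumptions a 224-pixel window spans roughly 101 m (WorldView-1) or 94 m (WorldView-2), so crevasse patterns with spacings up to about 31 to 34 m repeat about three times within it.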

6.1.2. Vario Functions

In order to characterize the spatial surface structure in a given area, recorded in an image or subimage, we calculate vario functions, defined as follows:

v_{1} (h) = \frac{1}{2 n} \sum_{i = 1}^{n} {[z (x_{i}) - z (x_{i} + h)]}^{2}

(1)

for pairs of points $(x_{i}, z(x_{i})), (x_{i} + h, z(x_{i} + h)) \in D$, where $D$ is a region in $\mathbb{R}^{2}$ (case of profile data) or $\mathbb{R}^{3}$ (case of image data) and $n$ is the number of pairs separated by $h$; the distance value $h$ is also termed “lag”. The function $v_{1}(h)$ is called the first-order vario function. This function always exists and takes a finite value.

The residual vario function is often more useful for analyzing roughness in situations where a regional trend or a local drift underlies the data. Using

m (h) = \frac{1}{n} \sum_{i = 1}^{n} [z (x_{i}) - z (x_{i} + h)],

(2)

the residual vario function is defined as:

\mathrm{res}_{1}(h) = v_{1}(h) - \frac{1}{2}\, m(h)^{2}.

(3)

First-order vario functions are formally equivalent to variograms of geostatistics, but introduced in a discrete mathematics framework that facilitates easy numerical implementation as well as generalization to higher order [45].

The variogram is defined for a data set that may be considered a realization of a spatial random function satisfying the intrinsic hypothesis (see Matheron [156,157]); generalization of the variogram to higher order is difficult because of the statistical assumptions that need to be met. Equation (1) corresponds to the statistical second-order moment and Equation (2) to the first-order moment. Residual vario functions work best for data with an underlying trend. The second-order vario function and residual vario function can also be used (see Table 3). Numerical outputs of the first-order vario function were used in the original connectionist-geostatistical classification in [44]; they correspond to the experimental variogram values.
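Equations (1) to (3) translate directly into NumPy for a profile sampled at unit spacing, with the lag $h$ counted in samples (an assumption of this sketch; the function names are illustrative). The final identity in the tests, $\mathrm{res}_{1}(h) = \frac{1}{2}\,\mathrm{Var}[z(x_i) - z(x_i + h)]$, follows from expanding Equation (3) and shows why the residual vario function suppresses a linear trend.

```python
import numpy as np


def _differences(z: np.ndarray, h: int) -> np.ndarray:
    """z(x_i) - z(x_i + h) for all n pairs separated by lag h (in samples)."""
    if not 1 <= h < len(z):
        raise ValueError("lag must satisfy 1 <= h < len(z)")
    return z[:-h] - z[h:]


def vario_first_order(z: np.ndarray, h: int) -> float:
    """v_1(h), Eq. (1): half the mean squared difference of pairs h samples apart."""
    return float(0.5 * np.mean(_differences(z, h) ** 2))


def mean_difference(z: np.ndarray, h: int) -> float:
    """m(h), Eq. (2): mean difference of pairs h samples apart."""
    return float(np.mean(_differences(z, h)))


def residual_vario(z: np.ndarray, h: int) -> float:
    """res_1(h), Eq. (3): v_1(h) - m(h)^2 / 2, i.e. v_1 with the mean drift removed."""
    return vario_first_order(z, h) - 0.5 * mean_difference(z, h) ** 2
```

For a pure linear ramp, $v_1(h)$ grows as $h^2/2$ while $\mathrm{res}_1(h)$ is zero, which is the trend-removal behavior described above.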

In VarioMLP, first-order vario functions are calculated by sampling along four directions of each image, parallel to each side and to the two diagonals (see Figure 2). An efficient sampling algorithm makes use of the matrix structure of the 2D image. The sampling algorithm in [44] utilizes images of relative sizes (3-4-5), where 3 and 4 are the relative lengths of the split-image sides and 5 that of the diagonal. However, this sampling scheme cannot be transferred to the training of the combined architecture, which requires square images.
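The matrix structure of the image makes directional sampling a matter of array slicing. In this sketch, which is not the sampling algorithm of [44], the lag `h` is counted in pixel steps along each direction, so a diagonal lag of `h` corresponds to a ground distance of `h` times the square root of 2 pixel widths. The direction names are illustrative.

```python
import numpy as np


def directional_vario(img: np.ndarray, h: int) -> dict:
    """First-order vario function v_1 at lag h (pixel steps) along four image directions."""
    z = img.astype(np.float64)
    pairs = {
        "row": (z[:, :-h], z[:, h:]),                 # parallel to the image rows
        "column": (z[:-h, :], z[h:, :]),              # parallel to the image columns
        "diagonal": (z[:-h, :-h], z[h:, h:]),         # top-left to bottom-right
        "anti_diagonal": (z[:-h, h:], z[h:, :-h]),    # top-right to bottom-left
    }
    return {name: float(0.5 * np.mean((a - b) ** 2)) for name, (a, b) in pairs.items()}
```

A synthetic image of stripes that vary only from row to row (a crude stand-in for one-directional crevasses) shows the anisotropy discussed in section (7.4): the direction parallel to the stripes stays at zero, while the other directions oscillate with the stripe period.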

The discretization of the vario function is determined by the lag value h in pixels. In the final implementation of the algorithm, the lag step is chosen such that 18 lag steps span 80% of the image size (224 pixels). The values of

\left(v_{1}(1), \dots, v_{1}(m)\right)

(4)

become the activation values for the input layer of the MLP for any value of $m \in \mathbb{N}$; in our final structure, $m = 18$. Accounting for the 4 directions of directional vario functions calculated for each split image, we have a matrix of input values

V = \left(v_{1,(i, j)}\right)_{i = 1, \dots, m;\; j = 1, \dots, n_{dir}}

(5)

With $n_{dir} = 4$ directions, the number of input values is $m_{in} = n_{dir}\, m = 72$.
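The assembly of the 72 input values can be sketched as follows. Several assumptions are made here: a square input of 224 pixels, an integer lag step obtained by rounding 80% of the image size divided by 18, lags counted in pixel steps along each direction, and a flattening order of lag-major, then direction. The function names are hypothetical.

```python
import numpy as np


def lag_steps(image_size: int = 224, n_steps: int = 18, coverage: float = 0.8) -> np.ndarray:
    """Integer lags (pixels) such that n_steps equal steps span about `coverage` of the image."""
    step = max(1, round(coverage * image_size / n_steps))
    return step * np.arange(1, n_steps + 1)


def vario_input_vector(img: np.ndarray, lags: np.ndarray) -> np.ndarray:
    """Build the m x n_dir matrix V of Eq. (5) and flatten it into the MLP input vector."""
    z = img.astype(np.float64)

    def v1(a, b):
        return 0.5 * np.mean((a - b) ** 2)

    V = np.array([[v1(z[:, :-h], z[:, h:]),          # along rows
                   v1(z[:-h, :], z[h:, :]),          # along columns
                   v1(z[:-h, :-h], z[h:, h:]),       # main diagonal
                   v1(z[:-h, h:], z[h:, :-h])]       # anti-diagonal
                  for h in lags])                    # shape (m, n_dir) = (18, 4)
    return V.reshape(-1)                             # m_in = 72 activation values
```

With these assumptions the lag step is 10 pixels, so the 18 lags run from 10 to 180 pixels.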

6.2. NN Architecture: Multi-Layer Perceptron (MLP)

The NN structure of the connectionist-geostatistical classification is a multi-layer perceptron with back-propagation of errors (MLP-BP or simply MLP). The MLP has one input node per vario-function value; in the final VarioMLP structure, $m_{in} = 4m = 72$, accounting for 18 lag steps and 4 directions. MLPs have been found to be useful NN types for the solution of this type of classification problem.

The number of nodes (neurons) in the output layer has to equal the number of surface classes, here crevasse classes. In our experiments, this number is $m_{out} = 6$. Larger numbers of crevasse classes have been used, ranging up to 18. In [46], we describe a classification with up to 13 crevasse classes.

This leaves the number and size of internal, hidden layers as variables of the NN architecture that will be determined experimentally (see section (7.7.2)). The original work in [44] uses a single hidden layer, in fully connected or partly connected architectures. Here we experiment with two or three internal layers.
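To make the layer structure concrete, the following NumPy sketch runs a forward pass through a fully connected VarioMLP[18, 4, (5, 2)] with six output nodes. The ReLU hidden activation and the He-style random initialization are assumptions of this sketch, not details stated for VarioMLP; the output is one logit per class, to be converted to probabilities by the softmax described in section (7.5.1).

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Random weights and zero biases for a fully connected MLP with the given layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def mlp_logits(params, x):
    """Forward pass: ReLU hidden layers (assumed), linear output layer."""
    for W, b in params[:-1]:
        x = np.maximum(0.0, x @ W + b)
    W, b = params[-1]
    return x @ W + b


params = init_mlp([72, 360, 144, 6])          # VarioMLP[18, 4, (5, 2)], six classes
logits = mlp_logits(params, np.random.default_rng(1).random((10, 72)))
print(logits.shape)                           # (10, 6)
```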

7. Image Labeling and Training Approach (for VarioMLP and ResNet-18)

7.1. Training Approach

The training approach reflects the goal of creating a physically constrained NN by combining knowledge of glaciological processes and Earth observation technology with ML methods at every step. As shown in the previous section, the selection of training-image sizes is controlled by a requirement of spatial homogeneity, by constraints associated with the spatial resolution of the satellite imagery, and by the spacing of crevasses on the glacier surface, which results from the glacial movement and acceleration that we aim to analyze. Training is carried out as a form of supervised training; training as such is an optimization problem of the model’s internal parameters.

7.2. Crevasse Classes

Crevasse classes are selected by an expert, based on structural glaciology (section (3.3)). Because a main objective of this paper is the integration of a physically constrained NN and a CNN, we utilize only four basic crevasse classes: (a) one-directional crevasses, (b) multi-directional crevasses, (c) shear crevasses, and (d) chaos crevasses, or shear-chaos crevasses. The crevasse types associated with these classes are illustrated in Figure 3. Crevasse types (a), (b) and (c) are associated with basic deformation matrices [126]: the one-directional crevasse type results from an extension in one direction (Figure 3a). The multi-directional (including two-directional) crevasse type results from a deformation with more than one stress axis (Figure 3b); it can also result from two deformation processes that affect the ice in sequence. The shear crevasse type results from shear, a deformation type that typically occurs where fast-moving ice borders slow-moving ice. In the case of a surge, the ice of one glacier (Negribreen) accelerates, while the ice of an adjacent glacier (e.g., Ordonnansbreen) continues to flow at normal, much slower speeds (Figure 3c). Depending on the spatial and temporal velocity gradient, shear crevasses can take different appearances (Figure 3c and Figure 3d). Transportation, weathering and interaction of several deformation processes can lead to complex ice-surface and near-surface structures in which the signatures of individual processes can no longer be distinguished; these are summarized as the “chaos” crevasse class (Figure 3d). In some areas, the signature of shear deformation is still evident in the chaos crevasse fields (Figure 3f), but separation in an image classification process may be too difficult; thus, the class is summarized as chaos/shear-chaos.
Two additional classes need to be added to each classification: one for undisturbed snow/ice and a rest class for “other” surfaces, which can include moraines, rock avalanches, subimages that include both snow/ice and rock surfaces, and indiscernible images, to limit misclassification of the better-defined four crevasse classes. A rendering of representative examples of split images, subselected from WorldView satellite imagery, is shown in Figure 4. The images have a size of 201 (=3·67) pixels by 268 (=4·67) pixels, i.e., they follow the (3-4-5) size rule.

7.3. Image Labeling

A second main objective of this paper is the derivation of a labeled training data set for the problem of crevasse classification from satellite imagery. With this objective, we address the problem that application of ML in the geosciences and specifically the cryospheric sciences has been hampered by the lack of labeled training data sets, as identified by authors working in the field (e.g. [43,90,91]) and described in more detail in section (2).

To initiate the training, sets of split-images for each class are identified and selected by the structural glaciologist. In our experiments, we found that several tens of example images per class are sufficient for an initial training run of VarioMLP.

Technically, image labeling is carried out using the Split Image Explorer Tool, visualized in Figure 5 and described in more detail in [138]. Individual images can be selected from the WorldView image (optionally within an outlined polygonal area of interest that contains the glacier area), viewed enlarged at the top left, and associated with a class. The association can be (1) performed initially by the glaciologist, (2) displayed as the result of the NN classification, or (3) overwritten (accepted or rejected) in a control pass in the training loop (see section (7.6)). A sliding bar in the left middle of the explorer tool allows confidence to be applied as a visualization filter (only images classified with a confidence level exceeding the user-selected confidence threshold are displayed in color).

7.4. Data Handling and Feature Engineering

Feature engineering is the design of the input for the neural network. For robust results, the identification of a crevasse type must be independent of the orientation and view angle of the satellite relative to features on the ground. Directional bias is removed by calculating vario functions in several different directions for each split-image.

Prior to extraction of split-images, the satellite image needs to be oriented in a geographic or rectangular projection framework that facilitates output of the final classification in the form of a thematic map of crevasse provinces. Raw satellite imagery is typically collected along orbits and constrained by the view angle of the observatory, which is fixed for some imagers but adjustable or sweeping for most (including WorldView). To map larger areas from single or multiple satellite images, utility functions for image projection and mosaicking are implemented as part of the GEOCLASS-image system. For illustration, compare the different sizes and orientations of the input imagery shown in Figure 6.

Data from the panchromatic channel of WorldView-1 and WorldView-2 are utilized, because the classification principle is a spatial classification. In the more common form of multivariate statistical classification, data from several spectral channels are used. Our study combines imagery from two different image systems, WorldView-1 and WorldView-2, which result in imagery of somewhat different pixel size and resolution (0.45 m for WorldView-1 and 0.42 m for WorldView-2, see section (5.2)). A utility function in GEOCLASS-image facilitates simultaneous analysis and classification of imagery from both satellite types.

Application of the vario function to a typical image from each of the classes of (1) undisturbed snow and ice surfaces and (2) one-directional crevasse types, seen in Figure 2, illustrates how the NN can separate these crevasse types based on the vario function values for different directions and distances. First, the maximum of the resultant vario function values is much lower for undisturbed surfaces than for crevassed surfaces (compare the $v_{1}$ axes in Figure 2c and Figure 2d). Second, an anisotropic behavior of the set of directional vario functions is typical for one-directional crevasses (Figure 2b,d): the direction that is near-parallel to the crevasse direction does not reach the sill of the vario function (green in Figure 2d), whereas the other three directional vario functions exhibit a typical wavy pattern resulting from washed-out cross-correlation, with spacing dependent on the relative angle of the crevasse orientation to the directional calculations.

7.5. Criteria for Evaluation of Training Success

We use the term intrinsic criteria for quantitative, computational criteria (cross-entropy measure of training loss, confidence of classification result, co-occurrence matrix) and the term extrinsic criteria for glaciological criteria, which are typically based on airborne field observations of the glacier system during the surge and on additional expert knowledge of the evolution of crevasse types during a surge [16,18,20,127,128,158]. The application of extrinsic criteria is best explained in an applied example of image labeling and in the geophysical interpretation (see sections (7.6) and (11)).
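One of the intrinsic criteria, the co-occurrence matrix, can be sketched as follows, interpreting it (as an assumption of this sketch) as the matrix counting split-images by expert label versus predicted class, often called a confusion matrix. Class indices are assumed to be integers from 0 to 5, and the function names are hypothetical.

```python
import numpy as np


def cooccurrence_matrix(labels, predictions, n_classes=6):
    """C[i, j] = number of split-images with expert label i and predicted class j."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(C, (np.asarray(labels), np.asarray(predictions)), 1)   # handles repeated pairs
    return C


def per_class_accuracy(C):
    """Fraction of each labeled class classified correctly (diagonal / row sum; 0 if empty)."""
    row_sums = C.sum(axis=1)
    return np.divide(np.diag(C), row_sums, out=np.zeros(len(C)), where=row_sums > 0)
```

Off-diagonal entries show which crevasse classes are confused with each other, for example shear versus chaos/shear-chaos, which a single overall accuracy value would hide.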

7.5.1. Softmax Function

A softmax function is used to convert the NN output layer to a probability distribution over the possible classes. Each output node is assigned a value $p_{i}$ between 0 and 1, computed from the output-layer values $z_{i}$ as $p_{i} = e^{z_{i}} / \sum_{j=1}^{n} e^{z_{j}}$, so that all outputs sum to 1 and can be interpreted as probabilities. The class with the largest probability is selected as the NN's final classification of a given input, and the confidence of the classification result is equal to that probability, i.e., the maximum of the softmax function. The loss function associated with the softmax function is the cross-entropy loss, which is used for training (see section 7.5.2). The softmax function is commonly used in many CNNs [40,41] due to its simplicity and probabilistic interpretation.
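As a minimal NumPy sketch of the softmax step and of the confidence value it yields: the six class names follow the classes used in this paper (four crevasse classes, undisturbed surface and a rest class), but their order, the function names and the logits in the usage line are made up for illustration.

```python
import numpy as np

# Hypothetical ordering of the six surface classes.
CLASSES = ["one-directional", "multi-directional", "shear", "shear/chaos", "undisturbed", "rest"]


def softmax(logits):
    """Map output-layer values to probabilities p_i in (0, 1) that sum to 1."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # shifting by a constant leaves the softmax unchanged but avoids overflow
    e = np.exp(z)
    return e / e.sum()


def classify(logits, classes=CLASSES):
    """Return the most probable class and its probability (the classification confidence)."""
    p = softmax(logits)
    i = int(np.argmax(p))
    return classes[i], float(p[i])


label, confidence = classify([2.0, 0.5, 0.1, -1.0, 0.0, -0.5])
print(label, round(confidence, 3))
```

In this made-up example the top class has a confidence of about 0.61, so the image would not pass the 0.9 threshold described in section 7.5.3.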

7.5.2. Cross Entropy

Training an MLP is an iterative optimization of the model's internal parameters. At each iteration, VarioMLP predicts the class of each training example and uses the cross-entropy loss function to quantify the difference between the predicted values and the training data. Entropy was first introduced in [159] to quantify the level of uncertainty of a random variable X, based on the probabilities $p_{i}$ of its possible outcomes, according to

$$H(X) = -\sum_{i=1}^{n} p_{i} \log(p_{i}) \qquad (6)$$

for $i = 1, \dots, n$, where $n \in \mathbb{N}$ is the number of classes. For VarioMLP, the outcomes are the crevasse classes, and the probabilities are those that the model assigns to each output neuron. The DDA-MLP uses cross-entropy loss as its loss criterion, calculated as

$$\mathrm{Loss}_{ce} = -\sum_{i=1}^{n} t_{i} \log(p_{i}) \qquad (7)$$

where n is the number of classes, $t_{i}$ is the truth label for class i (1 for the true class and 0 otherwise), and $p_{i}$ is the model-predicted probability for class i. The optimization problem is then for the model to learn an internal parameter set that minimizes this loss function; to accomplish this, the DDA-MLP employs stochastic gradient descent (SGD) via the Adam algorithm, a first-order gradient-based optimization method introduced by [160]. During training, backpropagation, as defined in [161], computes the gradient of the loss layer by layer, starting from the output and moving backward towards the input layer. The Adam algorithm only requires the first-order gradient; it employs adaptive learning rates for the parameters based on estimates of the first and second moments of the gradient, and updates the parameters in a descent direction scaled by the learning-rate hyperparameter [160]. Application of cross-entropy loss for training of deep NNs is described in [162].
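The loss in Equation (7) and the Adam update of [160] can be sketched in NumPy. To keep the sketch short, a single softmax layer trained on synthetic data stands in for the full MLP, so backpropagation reduces to one gradient, (p - t) multiplied by the input; the data, learning rate, epoch count and function names are illustrative assumptions, not the VarioMLP settings.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # row-wise shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(p, t, eps=1e-12):
    """Batch mean of Loss_ce = -sum_i t_i log(p_i), Equation (7); t is one-hot."""
    return float(-np.mean(np.sum(t * np.log(p + eps), axis=1)))


class Adam:
    """Adam update with bias-corrected first and second moment estimates."""

    def __init__(self, shape, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.m, self.v = np.zeros(shape), np.zeros(shape)
        self.lr, self.beta1, self.beta2, self.eps, self.t = lr, beta1, beta2, eps, 0

    def step(self, w, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad**2
        m_hat = self.m / (1 - self.beta1**self.t)
        v_hat = self.v / (1 - self.beta2**self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


def train_softmax_layer(x, labels, n_classes, epochs=100, lr=0.05):
    """Fit W so that softmax(x @ W) matches the labels; return W and the loss history."""
    t = np.eye(n_classes)[labels]
    w = np.zeros((x.shape[1], n_classes))
    opt = Adam(w.shape, lr=lr)
    losses = []
    for _ in range(epochs):
        p = softmax(x @ w)
        losses.append(cross_entropy(p, t))
        grad = x.T @ (p - t) / len(x)  # gradient of the mean loss with respect to W
        w = opt.step(w, grad)
    return w, losses


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 8))
labels = np.argmax(x @ rng.normal(size=(8, 6)), axis=1)  # synthetic 6-class labels
w, losses = train_softmax_layer(x, labels, n_classes=6)
```

With all weights initialized to zero, the first loss equals log(6), the entropy of a uniform guess over six classes, and the loss then decreases as Adam adjusts the weights.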

Cross-entropy loss is used to identify successful training runs and to reject failed ones. For example, Figure 7 illustrates overfitting in a test run of the model.

7.5.3. Confidence

Classification confidence is a measurement of the probability that the association of an input image to a class is correct. Confidence approaches have been discussed in [163]. We utilize confidence to accept or reject classified crevasse images into the training data set, applying a threshold of 90% confidence. The Split-Image Explorer Tool allows user-selected confidence values.

7.5.4. Other Training Hyperparameters

Overall, the training and feedback-loop experiments were repeated several times with different parameters and variations of the classification models. The split of the labeled data into training images and validation images was held constant at 80% (training) and 20% (evaluation) for all experiments training VarioMLP and ResNet-18. This means that, however many labeled images existed for a given run, 80% were randomly selected at runtime for the actual training process, and 20% were reserved to evaluate model performance after each epoch. It is important to separate the training and evaluation data sets, because a sufficiently complex model can simply memorize the training data set, and such overfitting can only be detected by evaluating the model on images it did not see during training. Each training run was carried out for a maximum of 50 epochs. For each epoch that resulted in a new best validation loss, a checkpoint of the classification model was saved for further evaluation. For all training experiments, cross-entropy loss was used together with the Adam optimizer for the gradient descent calculation (section 7.5.2).
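The split and checkpoint rule can be sketched as follows. `train_one_epoch` and `validation_loss` are hypothetical stand-ins for the actual training and evaluation code, which is not shown in this paper; the sketch only illustrates the random 80/20 split at runtime, the 50-epoch limit, and saving a checkpoint whenever the validation loss reaches a new minimum. Rounding the validation share down is an assumption that happens to reproduce the 786 validation images reported in section 7.7.1.

```python
import copy
import random


def split_80_20(items, seed=None):
    """Randomly assign about 80 % of the labeled images to training and 20 % to validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(0.2 * len(items))  # rounded down: 786 of 3933 images
    return items[n_val:], items[:n_val]


def fit_with_checkpoints(model, train_set, val_set, train_one_epoch, validation_loss,
                         max_epochs=50):
    """Train for up to max_epochs; keep a copy of the model at every new best validation loss."""
    best_loss, checkpoints = float("inf"), []
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model, train_set)
        loss = validation_loss(model, val_set)
        if loss < best_loss:
            best_loss = loss
            checkpoints.append((epoch, loss, copy.deepcopy(model)))
    return checkpoints
```

Because checkpoints are only written on improvement, the last saved checkpoint is the model from the epoch with the lowest validation loss, even if later epochs start to overfit.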

7.6. Interleave of Split Image Labeling with the Training Process: The Feedback Loop

Following creation of an initial set of expert-labeled training data, a VarioMLP is trained. The resulting network can be applied directly to classify an entire satellite image. However, to derive a large data set of labeled training images, an iterative approach to split-image labeling and VarioMLP training is taken. The goal is to create a data set that is large enough to train a CNN, which in turn can be expected to facilitate rapid classification of many satellite images for similar problems, i.e., a higher level of generalization of the task of crevasse classification.

The iterative approach is implemented as a feedback loop in VarioMLP, executed as a mix of computational criteria and expert interaction and interleaved with the training process of VarioMLP as follows (see Figure 8). The initial data set is considered the first-order data set and is used to train the NN. Validation loss and training loss are evaluated as quantified by the cross-entropy measure (see section 7.5.2). The result is a trained VarioMLP.

The VarioMLP, with its first-approximation final structure, is then applied to classify the entire set of split-images from a given satellite image (all split-images inside the polygon that outlines the NGS). Each split-image is assigned to a class and written to the directory of that class. Next, only split-images with a classification confidence of at least 0.9 are retained in the crevasse class directories. Then, the glaciology expert quickly views all new images in each class (i.e., any images that were not part of the original labeled data set) and rejects images that are misclassified. This review is much faster than labeling thousands of split-images initially and requires only a fraction of the human expert's time. VarioMLP is then retrained, using the larger set of labeled data as training data. By repeating the feedback loop, a labeled data set of 3933 images was obtained in a reasonable amount of time. The final labeled data set of 3933 split-images includes between 522 and 953 images per crevasse class, with the distribution given in Table 4. This distribution is sufficiently even that class imbalance is not expected to be a significant source of inaccuracy in model training.
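The control flow of the feedback loop can be summarized in a short driver function. This is a sketch of the loop only: `train`, `classify` and `expert_review` are hypothetical callables standing in for VarioMLP training, classification of all split-images inside the NGS polygon, and the visual check by the glaciologist. The 0.9 confidence threshold is the one used in the text, while the stopping rule (a target data set size, a maximum number of rounds, or no newly accepted images) is an assumption.

```python
def grow_labeled_set(labeled, unlabeled, train, classify, expert_review,
                     threshold=0.9, target_size=4000, max_rounds=10):
    """Iteratively enlarge a labeled data set with confidently classified, expert-checked images.

    labeled:       dict image_id -> class label (initial expert-labeled set)
    unlabeled:     iterable of image_ids (all split-images in the glacier polygon)
    train:         labeled dict -> model (hypothetical stand-in for VarioMLP training)
    classify:      (model, image_id) -> (label, confidence)
    expert_review: dict of candidates -> dict of accepted image_id -> label
    """
    labeled = dict(labeled)
    pool = [i for i in unlabeled if i not in labeled]
    for _ in range(max_rounds):
        if len(labeled) >= target_size or not pool:
            break
        model = train(labeled)
        candidates = {}
        for image_id in pool:
            label, confidence = classify(model, image_id)
            if confidence >= threshold:  # keep only confidently classified images
                candidates[image_id] = label
        accepted = expert_review(candidates)  # the expert rejects misclassified images
        if not accepted:
            break
        labeled.update(accepted)
        pool = [i for i in pool if i not in labeled]
    return labeled
```

Rejected images stay in the pool and may be proposed again after retraining, so an image is only added once a later model classifies it in a way the expert accepts.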

In this exemplary application, the expert who selected the initial data set was a glaciologist experienced in structural glaciology, especially in observation of glacier surfaces during surges (the lead author of the paper), whereas in later iterations, the sorting of images was performed by a computer science student. This indicates that the sorting procedure becomes increasingly fast and simple as training proceeds through several iterations. To simplify the process, only four main crevasse classes, plus an undisturbed class and a rest/chaos class, were chosen for this study.

To ensure that the labeled training data set can be applied to a range of previously unseen WorldView data sets from the NGS and other surge-glacier regions, as well as to analysis and classification of data from both WorldView-1 and WorldView-2, split-images were sourced from 11 different WorldView data sets collected over the NGS in 2016, 2017 and 2018. This resulted in a total of 108,623 split-images. The distribution of split-images in the final 3933-image data set by WorldView source file is given in Table 2.

At this point, we have achieved two results: (1) The derivation of a labeled training data set, and (2) The VarioMLP together with the feedback loop as either a standalone NN or a component in a physically constrained CNN, the VarioCNN.

In the next sections, we describe ResNet-18, the CNN component selected for VarioCNN, its training, its comparison to VarioMLP, and finally the design of the combined classification system, VarioCNN, and of the classification software system, GEOCLASS-image. Experiments with VarioCNN, using GEOCLASS-image, are rounded off by a geophysical application and interpretation of the evolution of crevasse provinces during the surge in the NGS.

7.7. Determination of VarioMLP Hyperparameters

The VarioMLP architecture includes hyperparameters that can be optimized to tune the model for testing performance and generalization. Both the Directional Variogram and the Multi-Layer Perceptron steps of the VarioMLP architecture have hyperparameters, which affect training and testing in different ways. Input image size and resolution have already been discussed in section (6.1.1), as these are constrained by the observation technology, the surface signatures and the assumption of spatial homogeneity. To optimize the architecture of VarioMLP, experiments were carried out to determine the optimal number of lag steps in the vario function and the shape and number of the internal layers. In both series of experiments, cross-entropy loss was used as the measure for assessing training quality and network performance.

7.7.1. Number of Vario Function Steps

The optimal number of lag steps in the directional variogram depends on the input image size in pixels. If the number of lag steps is too small, the directional variogram may not provide sufficiently different characteristics for a given set of surface types to allow reliable classification. If the number of lag steps is too large, the characteristics provided by the directional variogram can be polluted by noise and small-scale features that are present in multiple surface types, and these may bury the salient features of each surface type needed for classification. During training, the lag step parameter was tested at values of 10, 12, 14, 16, 18 and 20 (Table 5). In this experiment, the hidden layer shape was fixed at [5, 2], and the final validation data set included 786 images (20% of the final 3933-image labeled data set). The best performance was achieved with a lag step value of 18. Notably, performance shows no clear trend with the number of lag steps used in the variogram phase; rather, the model performs relatively well with values of 12, 14 and 18, and relatively poorly with values of 10, 16 and 20.

7.7.2. Hidden Layer Structure in the MLP

The number of hidden layers in the MLP step of the VarioMLP architecture depends on the size of the input layer as well as on the size of the training dataset. If the number of hidden layers is too large relative to the input layer size, the model becomes unnecessarily complex and thus more susceptible to overfitting. Too few hidden layers produce the opposite problem: the model lacks the complexity necessary to capture the full variance of the dataset and suffers from underfitting. This is an example of what is commonly referred to in machine learning as the bias-variance tradeoff [164,165,166]. Choosing an optimal model size and depth becomes increasingly difficult for problems where no reference dataset of labeled training examples exists, since the optimal size of a fully connected model grows with the size of the training dataset. However, this relationship is nearly impossible to calculate, so trial-based estimation is necessary. To reduce the scope of this optimization during training, the hidden layers of the MLP architecture were limited to exact multiples of the input layer size. An MLP model denoted as [5, 10, 2] refers to a model with 3 fully connected hidden layers, which contain 5, 10 and 2 times as many nodes as the input layer, respectively. During training, model architectures of [2, 2], [5, 2], [5, 5, 2], [10, 5, 2] and [10, 10, 2] were tested (Table 6). For each test run, the number of lag steps for the variogram stage was fixed at 18. The best-performing hidden layer shape was [5, 2]. For networks both wider and deeper than this, performance decreased significantly. This is likely because, given the relatively small amount of information at the input layer (the concatenated output of the variogram stage), larger networks simply converge on memorizing the training dataset.
This is another example of the bias-variance tradeoff at play: the network must not be overly complex for the scope of the input data.
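The multiple-of-input-size convention can be made concrete with a little arithmetic. The sketch assumes that the MLP input is the concatenation of four directional vario functions (four directions, as in Figure 2) with one value per lag step, so that 18 lag steps give 72 input nodes, and that the output layer has one node per class (six classes); both are assumptions for illustration, as are the function names.

```python
def layer_sizes(n_directions, n_lags, multiples, n_classes):
    """Node counts per layer for an MLP whose hidden layers are multiples of the input size."""
    n_in = n_directions * n_lags
    return [n_in] + [m * n_in for m in multiples] + [n_classes]


def n_parameters(sizes):
    """Weights plus biases of the fully connected layers between consecutive layer sizes."""
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


for shape in ([2, 2], [5, 2], [5, 5, 2], [10, 5, 2], [10, 10, 2]):
    sizes = layer_sizes(4, 18, shape, 6)
    print(shape, sizes, n_parameters(sizes))
```

Under these assumptions the best-performing [5, 2] network has 79,134 trainable parameters, while [10, 10, 2] has 676,374 for the same 72 input values, which is consistent with the interpretation that the larger variants have enough capacity to memorize a training set of a few thousand images.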

8. ResNet-18

8.1. Description of the CNN ResNet-18

The term ResNet refers to a family of convolutional neural networks, based on residual learning, with depths of up to 1001 layers [41,83,84]. ResNet-18 is the ResNet with the fewest layers that is commonly used today (e.g. [167,168,169]) and is the ResNet type used for the studies in this paper. Notably, labeled training data sets exist in the medical sciences for image feature detection (e.g., brain tumors and Alzheimer's disease) [167], which is not the case for image classification in geoscience studies.

8.1.1. Mathematical Principles of ResNets

The following mathematical description of deep residual networks (ResNets) is summarized from [83], a key paper on ResNets. ResNets consist of many stacked so-called “Residual Units”. The defining equations of a ResNet are the following two:

$$y_{l} = h(x_{l}) + F(x_{l}, W_{l}) \qquad (8)$$

$$x_{l+1} = f(y_{l}) \qquad (9)$$

where $x_{l}$ and $x_{l+1}$ are the input and output of the l-th unit, F is a residual function, and $W_{l}$ denotes the set of weights (and biases) associated with the l-th residual unit, which may itself consist of K layers (K = 2 or K = 3 are typical values). The residual units form the building blocks of the modular architecture that characterizes a ResNet. In [41], only $h(x_{l})$ is an identity mapping and f is a ReLU function, whereas in [83], both $h(x_{l})$ and $f(y_{l})$ are identity mappings. The work in [83] shows that, if both are identity mappings, the signal can be propagated directly from one unit to any other unit; this finding leads to the definition of skip connections. The identity mapping $h(x_{l}) = x_{l}$ (already used in their first paper, [41]) achieves the fastest error reduction and lowest training loss among the model variants studied in [83]. The second identity mapping, $f(y_{l}) = y_{l}$, reflects a new interpretation of the activation functions (for example ReLU, the function used in our ResNet-18 model) as so-called pre-activation of the weight layers, as opposed to the previous view of post-activation. [83] introduces a 1001-layer ResNet, which is easier to train and generalizes better than the original ResNet described in [41].

Substituting the two identity mappings into Equations (8) and (9) reduces the essential mapping of the ResNet to

$$x_{l+1} = x_{l} + F(x_{l}, W_{l}) \qquad (10)$$

Applying Equation (10) recursively yields

$$x_{L} = x_{l} + \sum_{i=l}^{L-1} F(x_{i}, W_{i}) \qquad (11)$$

which relates any deeper unit L to any shallower unit l. The summation term $\sum_{i=l}^{L-1} F(x_{i}, W_{i})$ is the residual between units L and l. The innovative property of the ResNet is this residual function, which distinguishes it from previous designs of so-called plain networks, where a feature $x_{L}$ is a series of matrix-vector products. The simplification used in the defining equations of a ResNet not only introduces the skip connections, but also ensures that the gradient of a layer does not vanish, even when the weights are arbitrarily small. The signal can be propagated directly both forward and backward between units L and l. [83] also states explicitly that deeper (plain) networks suffer from increased errors.
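Equations (10) and (11) can be checked numerically. The sketch below assumes a two-layer residual function F(x, W) = W2 ReLU(W1 x) (K = 2), small random weights, and vector-valued features instead of convolutional feature maps. It verifies that stacking identity-mapped units reproduces the sum in Equation (11), and contrasts this with a plain network x_{l+1} = F(x_l, W_l), whose forward signal shrinks when the weights are small; the backward analogue is that the derivative of x_L with respect to x_l contains an identity term that does not depend on the weights.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n_units, scale = 16, 10, 0.01  # feature size, number of residual units, weight scale

# Hypothetical small random weights (W1, W2) for each unit, K = 2 layers per unit.
weights = [(scale * rng.normal(size=(dim, dim)), scale * rng.normal(size=(dim, dim)))
           for _ in range(n_units)]


def residual_function(x, w):
    """F(x, W) = W2 ReLU(W1 x), a two-layer residual function."""
    w1, w2 = w
    return w2 @ np.maximum(w1 @ x, 0.0)


x0 = rng.normal(size=dim)

# Residual network with identity h and f: x_{l+1} = x_l + F(x_l, W_l), Equation (10).
x_res, residuals = x0.copy(), []
for w in weights:
    r = residual_function(x_res, w)
    residuals.append(r)
    x_res = x_res + r

# Plain network for comparison: x_{l+1} = F(x_l, W_l), i.e. no identity term.
x_plain = x0.copy()
for w in weights:
    x_plain = residual_function(x_plain, w)

# Equation (11) with l = 0: x_L equals x_0 plus the sum of all residuals.
print(np.allclose(x_res, x0 + np.sum(residuals, axis=0)))
print(np.linalg.norm(x_res) / np.linalg.norm(x0))    # stays close to 1
print(np.linalg.norm(x_plain) / np.linalg.norm(x0))  # shrinks towards 0
```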

He and others [83] compare networks with $n \in \{3, 5, 7, 9\}$, leading to 20, 32, 44 and 56-layer networks (6n + 2 layers), and $n = 18$, which leads to a 110-layer ResNet. [83] conclude that ResNets (including ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152, which differ mainly in the number of network layers) perform better than other CNN models in image classification aimed at feature extraction, evaluated on the ImageNet dataset. In our work, we investigate the potential of ResNets for identification and classification of complex cryospheric spatial patterns, namely crevasse patterns.

8.1.2. Properties of ResNet-18

A few facts about ResNet-18 are the following:

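(a) Layer structure. ResNet-18 consists of an initial convolutional layer with 7x7 kernels, followed by 16 convolutional layers with 3x3 kernels arranged in eight residual blocks, two pooling (downsampling) layers (max pooling after the first convolution and global average pooling before the output), and a single fully connected output layer; the 18 layers counted in its name are the 17 convolutional layers and the fully connected layer. ResNet-18 includes shortcut connections around each residual block.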

(b) Size of input images. The input images need to have a size of 224 by 224 pixels, as a result of the requirement that the number of input neurons of the fully connected layer is fixed. This is a significant limitation, because it cannot be assumed that patterns or objects can be identified in a (sub)image of a specific size. After identifying optimal sizes for input images in the crevasse detection, the creation of labeled data sets was rerun using images of 224x224 pixels, in order to allow training of a ResNet-18 model. The connectionist-geostatistical method and VarioMLP do not require a specific size of input imagery. Vario-function calculation is computationally most efficient when images of a size of 3-4-5 are used [44]; however, a rectangular input image of any size can be used. This flexibility is important, because it does not require a-priori assumptions about the relationship between sensor resolution, data fields and physical sizes of cryospheric or other patterns.
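Cutting a satellite scene into 224 by 224 pixel split-images, and the ground footprint this implies for the two sensors, can be sketched as follows. The sketch assumes non-overlapping tiles aligned with the image grid and simply drops incomplete tiles at the right and bottom edges; masking to the glacier polygon and any resampling between WorldView-1 and WorldView-2 are not reproduced, and `split_image` and `footprint_m` are hypothetical names. The pixel sizes are the values given earlier in this paper.

```python
import numpy as np

# Pixel sizes as given in the text for the two sensors (metres per pixel).
PIXEL_SIZE_M = {"WorldView-1": 0.45, "WorldView-2": 0.42}


def footprint_m(tile_px, sensor):
    """Ground edge length in metres of a square tile of tile_px pixels."""
    return tile_px * PIXEL_SIZE_M[sensor]


def split_image(img, tile_px=224):
    """Yield (row, col, tile) for every complete, non-overlapping tile_px x tile_px tile."""
    rows, cols = img.shape[:2]
    for r in range(0, rows - tile_px + 1, tile_px):
        for c in range(0, cols - tile_px + 1, tile_px):
            yield r, c, img[r:r + tile_px, c:c + tile_px]


scene = np.zeros((1000, 700), dtype=np.uint16)  # placeholder for a panchromatic scene
tiles = list(split_image(scene))
print(len(tiles), footprint_m(224, "WorldView-1"), footprint_m(224, "WorldView-2"))
```

Under these pixel sizes, a 224-pixel split-image covers about 100.8 m of ice surface per side on WorldView-1 and about 94.1 m on WorldView-2, a difference that has to be reconciled when imagery from both sensors is classified together.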

(c) Class association. Class association is carried out with a vector of class probabilities; the class with the highest probability is associated with the image to be classified.

(d) Performance. Research [92] indicates that ResNet-18 cannot be expected to perform a complex class association, such as crevasse classification, but only extraction of low-level features such as edges and texture. [92] state that deeper ResNets (with more layers) would be needed for detection or classification of more complex features, such as spatial context, global semantics and local features of objects. As demonstrated in this paper, however, the ResNet-18 model derived in combination with VarioMLP does classify crevasse patterns. Of note, in classic satellite image processing and other image processing, edge detection and texture analysis are obtained by convolution with small kernels of sizes 3x3 to 7x7 [44], so the limitations caused by small kernels in ResNets are analogous to limitations in image processing that have been known for several decades.

(e) Activation functions. [170] explore the effect of different activation functions on image classification results. They note that CNNs outperform other machine learning techniques because of their multi-layer hierarchical feature extraction, which is controlled by variables such as the number of hidden layers, activation functions, learning rates, initial weights and decay functions. However, they attribute the non-linearity of the network solely to the activation function, which motivates their comparative investigation of under-researched problems, including (a) vanishing and exploding gradients during back-propagation, (b) zero-mean and range of outputs, (c) computational complexity of activation functions and (d) predictive performance of the model. The activation function used in our ResNet-18 experiments is the Rectified Linear Unit (ReLU), which is commonly used in the literature.

8.1.3. Applications of ResNets

Examples of studies that use ResNets include moving object detection in super-resolution videos [84] or complex scenes [92], analysis of COVID presence [167], Alzheimer's diagnosis [168] and an engineering application to classification of images of bearing faults [169]. In the medical sciences, unlike in the geosciences, labeled training data exist [167]. To process relatively low-resolution video satellite image sequences, collected by the Chinese video surveillance satellites Jilin-1 and OVS-1, into data streams with super-high spatial resolution while maintaining the high temporal resolution of the video data needed to detect moving objects, [84] develop a multiframe video super-resolution neural network (MVSRnet). The resulting MVSRnet is a ResNet with a depth of 1001 layers, the largest depth published to date and first mentioned in [83], and it includes an attention mechanism to improve detection of moving features. While the complexity of MVSRnet is impressive, it performs a relatively simple task, the detection of moving objects. In contrast, our research aims at extraction of complex information. In the next subsection, we shed some light on the expected and reported differences in capabilities and performance between comparatively shallow CNNs (such as ResNet-18) and deeper CNNs.

8.2. Selection of ResNet-18 as the CNN Structure for the Work in this Paper

ResNet-18 was chosen as the convolutional architecture. The ResNet architecture was first developed in 2015 in response to a growing set of problems with CNN image classification architectures at the time [41,83]. While deep convolutional architectures had been shown to provide state-of-the-art performance on standard image classification benchmarks such as the ImageNet dataset [39,40,41], it was soon discovered that, somewhat counter-intuitively, increasing network depth improved performance only up to a point, beyond which deeper networks performed increasingly worse. This was due to what is known as the vanishing gradient problem [170,171]. In essence, the deeper a network becomes, the smaller the derivatives used to adjust the model weights become during backpropagation. Beyond a certain depth, these values become so small that the initial layers of the network are effectively no longer trained. This results in significantly worse performance and overall training inefficiency at scale. ResNet addresses this problem with skip (shortcut) connections between convolutional layers, in which the output of a given layer is fed both to the next layer and to a layer several steps later in the model. This effectively solves the vanishing gradient problem, as it provides a much shorter path for the gradient to adjust the initial layers of the model during backpropagation. Not only do ResNet models achieve similar or better performance than other state-of-the-art convolutional image classification models of their time, such as VGG [86] or AlexNet [39,41], they do so with far fewer trainable parameters. ResNet-18, the shallowest commonly used variant with 18 layers, has roughly 11 million trainable parameters, whereas VGG16 (a state-of-the-art convolutional model contemporary to ResNet) contains around 138 million.
This drastically reduced model complexity with comparable performance results in faster, more efficient and more generalizable training for ResNet models compared with traditional convolutional architectures [83]. Furthermore, deeper ResNet architectures have been shown to yield consistently increased performance: models with even hundreds of layers perform better on benchmark datasets than shallower versions of the same architecture.
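The parameter counts quoted above can be reproduced by counting weights layer by layer. The sketch assumes the torchvision-style reference layouts (bias-free convolutions followed by batch normalization with two learnable values per channel in ResNet-18; convolutions with biases and three fully connected layers in VGG16; 1000 ImageNet output classes by default) and counts only trainable parameters; the function names are hypothetical.

```python
def conv(c_in, c_out, k, bias):
    """Parameters of a k x k convolution from c_in to c_out channels."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def bn(c):
    return 2 * c  # learnable scale and shift per channel


def basic_block(c_in, c_out, downsample):
    """Two 3x3 convolutions with batch norm, plus a 1x1 projection shortcut if needed."""
    n = conv(c_in, c_out, 3, False) + bn(c_out) + conv(c_out, c_out, 3, False) + bn(c_out)
    if downsample:
        n += conv(c_in, c_out, 1, False) + bn(c_out)
    return n


def resnet18_params(num_classes=1000):
    n = conv(3, 64, 7, False) + bn(64)  # initial 7x7 convolution
    c_in = 64
    for c_out in (64, 128, 256, 512):  # four stages of two residual blocks each
        n += basic_block(c_in, c_out, downsample=(c_in != c_out))
        n += basic_block(c_out, c_out, downsample=False)
        c_in = c_out
    return n + 512 * num_classes + num_classes  # single fully connected layer


VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_params(num_classes=1000):
    n, c_in = 0, 3
    for v in VGG16_CFG:
        if v == "M":
            continue  # max pooling has no parameters
        n += conv(c_in, v, 3, True)
        c_in = v
    for f_in, f_out in ((512 * 7 * 7, 4096), (4096, 4096), (4096, num_classes)):
        n += f_in * f_out + f_out  # fully connected layers dominate the total
    return n


print(resnet18_params(), vgg16_params())
```

This gives 11,689,512 parameters for ResNet-18 and 138,357,544 for VGG16; replacing the ImageNet head with a six-class output layer reduces the ResNet-18 count to 11,179,590, because only the final fully connected layer changes.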

For this project, ResNet-18 was chosen because of its overall efficiency and demonstrably high performance on similar image classification tasks. Because the problem domain requires building a custom dataset from scratch, deeper and wider convolutional architectures, which mainly show improved performance over ResNet-18 on benchmark datasets with millions of training examples, are not necessary. In summary, ResNet-18 provides an efficient and lightweight reference for convolutional image classification architectures. The reference ResNet-18 architecture was left essentially unmodified for the purposes of this paper, in order to serve as a reasonable benchmark for convolutional models on the glacier surface type classification problem, and for ease of comparison with the VarioMLP model.

8.3. Determination of ResNet-18 Hyperparameters

Because the main goal of this paper is to evaluate a CNN in comparison with a physically constrained NN, and the ResNet-18 model was selected as explained above (section 8.2), the commonly used hyperparameters of the ResNet-18 model were not reevaluated. The purpose of the ResNet CNN architecture in this project is to provide a benchmark for CNN models on the surface classification problem against which VarioMLP can be compared. The goal is to provide a methodology that inherits the strengths of both the “shallow” fully connected VarioMLP architecture and the deep ResNet model.

For ResNet-18, the one parameter tested is the batch size (Table 7). Batch size refers to the number of input examples that are fed forward through the network before backpropagation is performed during training. The resulting loss and gradient for backpropagation are then averages over the inputs in the “batch”. With a larger batch size, each update is based on a greater variety of examples, which can improve the model's generalizability. A batch size that is too large relative to the amount of training data, however, can result in a poorly directed gradient during backpropagation. For the ResNet-18 model, batch sizes of 1, 2, 4 and 8 were tested, with 2 achieving the lowest loss of any tested model or training hyperparameter set (Table 7). Notably, performance degrades significantly for batch sizes larger than 2. This is likely a consequence of the size of the training dataset: the larger the ratio of batch size to overall training set size, the less directed the gradient will be during backpropagation.
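How batch size changes the number and nature of weight updates can be sketched with a mini-batch iterator. The sketch assumes the 3933-image labeled set with 786 images held out for validation (leaving 3147 for training) and keeps an incomplete final batch; it shows that each update uses the mean of the per-example losses in its batch, and how many updates one epoch contains for the batch sizes of Table 7. The function names are hypothetical.

```python
import math

import numpy as np


def minibatches(n_examples, batch_size, rng):
    """Yield index arrays for one epoch of shuffled mini-batches (the last one may be smaller)."""
    order = rng.permutation(n_examples)
    for start in range(0, n_examples, batch_size):
        yield order[start:start + batch_size]


def batch_loss(per_example_losses, idx):
    """Loss used for one backpropagation step: the mean loss over the examples in the batch."""
    return float(np.mean(per_example_losses[idx]))


n_train = 3933 - 786  # assumed training-set size after holding out 786 validation images
for batch_size in (1, 2, 4, 8):
    # Number of weight updates in one pass over the training set.
    print(batch_size, math.ceil(n_train / batch_size))
```

With a batch size of 2, one epoch contains 1574 weight updates, each averaging two examples; with a batch size of 8, it contains only 394 updates, each averaging eight examples.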

9. Comparison and Integration of VarioMLP and ResNet-18 in a Combined NN Model: VarioCNN

9.1. Comparison of Capabilities and Performance of VarioMLP and ResNet-18

The primary advantage of VarioMLP over the CNN (ResNet-18) is that VarioMLP can be trained with a relatively small set of labeled training data, small enough to be labeled by an expert in the field. This is simply not possible for a CNN, not even a relatively shallow CNN such as ResNet-18, because the required number of training images is in the hundreds of thousands (and orders of magnitude higher for other types of CNNs, as reported in section (2)). Data sets of several hundred initially labeled images of crevasse types suffice to train VarioMLP; thereafter, the efficient training feedback loop developed here can be applied to increase the size of the training data set until it is large enough to train ResNet-18.

In comparisons of the efficiency and performance of VarioMLP and ResNet-18, VarioMLP outperformed the CNN for training data sets of up to several thousand (3500 or 5000) labeled images. An explanation is that the physically informed spatial-analysis (geostatistical) approach to NN neuron activation is more important than the NN model architecture. In other words, the performance-determining factor is not the selection of a shallow, potentially outdated model structure (here, the perceptron), but rather the pre-training effect of the geostatistical analysis, combined with expert understanding of the relationship between the surface signatures of crevasses on the ice surface and the resulting experimental vario functions. The MLP remains an efficient model architecture for this first classification task.

For classification tasks involving many more images, ResNet-18 is faster and trains more efficiently. For this study, we have only carried out a limited number of experiments with CNNs.

9.2. Derivation and Application of a Three-Tiered VarioCNN

The main idea is to use the physically constrained VarioMLP to drive the CNN. This exploits the main advantage of VarioMLP: the derivation of a labeled data set of approximately 4000 (here 3933) training images, starting from approximately 300 labeled images and using the iterative active learning approach.

The combined architecture, illustrated in Figure 8, consists of three tiers: (1) the vario-function calculation for each input image data set, which is used to activate the nodes in VarioMLP, using the connectionist-geostatistical method, (2) the MLP component of VarioMLP, which employs error backpropagation through the layers as a means of optimizing the weights, and (3) ResNet-18, a CNN that takes as input satellite split-images of 224 by 224 pixels, which are piped through the convolutional structure and associated with 6 output classes, here, the 6 surface classes. A retraining loop around VarioMLP serves to grow the training data set from several hundred images, selected by an expert, to a size that is sufficiently large to train the CNN (here, approximately 4000 images). ResNet-18 yields a stable and robust classification outcome. Based on the research in this paper, we formulate the hypothesis that a CNN can be used as a component in a physically constrained NN. We have tested this hypothesis for ResNet-18.

In the next section, we will use the trained ResNet-18 to perform a classification of a time series of WorldView-1 and WorldView-2 data sets to analyze the evolution of the surge in the NGS, based on insights from formation and expansion of 6 basic crevasse classes (4 classes and two rest classes). Experiments using VarioMLP alone for classification of crevasse types of the Negribreen surge from WorldView imagery and Planet SkySat imagery are described and analysed in [46].

10. Experiments with VarioCNN: Application to Classification of Crevasse Types from a Time Series of WorldView Satellite Imagery

Following training of VarioCNN using the set of 3933 labeled training images, a final architecture of VarioCNN was derived. The final, trained VarioCNN was then applied to a time series of 7 WorldView-1 and WorldView-2 data sets (Table 2). From a large catalog of WorldView images, 11 images were found to be suitable with respect to cloud cover and area coverage; of those, 7 images were selected to represent the time interval from May 2016 to May 2018. A disadvantage of any analysis that uses WorldView imagery is the large delay between the time of data collection and the time when imagery is first made available to the glaciological research community. All useful images are WorldView-1 or WorldView-2 data.

As described in section (3), crevasse types are the results of ice-dynamic processes that occur during the surge. The spatial patterns recorded in the satellite image provide a snapshot of the local result of the dynamic state of the ice material, i.e., of the kinematic force/state associated with the deformation that results in the crevasse type.

At the beginning of the classification work for this paper, 22 crevasse classes (including ancillary classes) were created. To facilitate efficient implementation and application of the software, the crevasse classes were combined into four larger classes. The current selection of classes ((1) one-directional, (2) multi-directional, (3) shear and (4) shear/chaos) provides relatively simple descriptors of deformation kinematics, but allows capturing the formation of the main crevasse provinces, as the following analysis will demonstrate.

The resultant time series of thematic maps of the 6 surface classes, which include the four crevasse types, undisturbed surface and a rest class, is shown in Figure 9. A criterion for the consistency and geophysical interpretability of the results is that the area of each crevasse class consists of one or several simply connected regions without any post-processing, such as smoothing. The region of crevassed ice expands upglacier as time evolves and the surge progresses. Therefore, a geophysical interpretation of the results from the physically constrained CNN, VarioCNN, is warranted; it is presented in the next section.
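The contiguity criterion above can be checked automatically on a thematic map. The following is a minimal sketch, not part of GEOCLASS-image: it assumes the map is stored as a 2D grid of integer class labels (one label per classified split image), it uses 4-connectivity, and the function name `count_regions` is hypothetical. Note that it counts connected regions per class; it does not test whether a region contains holes, so it checks contiguity rather than simple connectedness in the strict topological sense.

```python
from collections import deque


def count_regions(class_map):
    """Count 4-connected regions for every class label in a 2D thematic map.

    class_map: list of equal-length lists of integer class labels.
    Returns a dict {label: number_of_regions}.
    """
    rows, cols = len(class_map), len(class_map[0])
    seen = [[False] * cols for _ in range(rows)]
    counts = {}
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            # A not-yet-visited cell starts a new region of its class.
            label = class_map[r0][c0]
            counts[label] = counts.get(label, 0) + 1
            # Breadth-first flood fill marks every cell of this region.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and class_map[nr][nc] == label):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return counts


tile_map = [
    [0, 0, 1, 1],
    [0, 3, 1, 1],
    [0, 3, 0, 0],
    [3, 3, 0, 2],
]
print(count_regions(tile_map))  # {0: 2, 1: 1, 3: 1, 2: 1}
```

In this toy map, class 0 occurs in two separate patches, while every other class forms a single region. A small number of regions per crevasse class, obtained without smoothing, is what the criterion asks for.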

11. Geophysical Application: Evolution of the Surge in the NGS

The following geophysical analysis is based on the time series of thematic maps of crevasse classes, derived from WorldView imagery using VarioCNN (Figure 9).

11.1. Evolution of Crevasse Classes in 2016

The first image, collected on 2016-06-20 (Figure 6a and Figure 9a), corresponds to the time when the start of a surge was first detected in Sentinel imagery [115]. The area of crevassing at this time coincides with an area of fast movement [36,46]. At this time, one-directional crevasses form the center of the fast-moving region, which is flanked by shear crevassing along both its northern and southern edges. This classification is correct, as confirmed by visual interpretation and by field observations showing (1) that acceleration occurs in an along-flow direction near the calving front, and by the basic physical notion (2) that shear crevasses form between fast-moving and slow-moving ice. Only five days later, on 2016-06-25, the crevassed area had already expanded in both the upglacier and the across-flow directions (Figure 9b), as indicated by the larger region of ice classified as crevassed. This rapid change matches existing knowledge that a surge is a catastrophic event, which expands rapidly in the acceleration phase [18,20,128,137,172,173,174,175,176,177]. At this point, two new crevasse classes occur: (2) multi-directional and (4) shear/chaos. Multi-directional crevasses form near the calving front as the next acceleration wave affects existing one-directional crevasses (hence, these are multi-generational crevasses). Further progression of deformation results in crevasses that are collectively termed “chaos”, as the ice is too fractured to allow identification of crevasse phases (Figure 3d,f). Notably, the new crevasse types occur first in the oldest regions of crevassed ice. Thus, the analysis of crevasse formation allows reconstruction of the surge evolution beyond what velocity data alone provide. By 2016-07-08 (Figure 9c), the surge has expanded further upglacier, with its leading edge reaching almost as far upglacier as the Negribreen-Ordonnansbreen junction. Fields of one-directional crevasses are always on the upglacier edge of the surge expansion (see also Figure 9d–g).

11.2. Evolution of Crevasse Classes in 2017

In 2017, the surge in the NGS reached maximal velocities of 22 m/day in July, which marked the height of the acceleration phase and the most intensive phase of new crevassing. This result was obtained from velocity analysis of Sentinel-1 SAR data and from field observations of the authors [36] (see also Figure 3). Leading up to this, the surge expanded far upglacier, as seen in the analysis of WorldView imagery from 2017-04-25 (Figure 6d and Figure 9d) and 2017-05-30 (Figure 6e and Figure 9e). The image collected on 2017-04-25 covers only the lower glacier region, but the image from 2017-05-30 shows that the surge has expanded as far as the Negribreen-Akademikarbreen junction, as indicated by isolated fields of one-directional crevasses separated by undisturbed ice. Notably, the marginal area of Negribreen is not yet taking part in the surge acceleration at this point in time (summer 2017). In both April and May 2017, a large area of multi-directional crevasses is mapped that extends upglacier from the calving front to the Negribreen-Ordonnansbreen junction, which is as far upglacier as the provinces of one-directional crevassing extended in 2016. We therefore conclude that a new surge wave, or acceleration phase, affected the area of the 2016 surge in summer 2017. Based on these results, it is likely that surge speeds decreased in winter 2016 and increased again in summer 2017, reaching the all-time maximum. Beyond such inferences about speed, however, crevasse classification maps provide information on the deformation type at high resolution, whereas velocity maps yield only a single parameter (ice-flow speed). The classification map from 2017-05-30 shows an expansion of the shear zone (class 3: shear) along the southern margin of Negribreen, upglacier to the Negribreen-Rembebreen junction, bordering the most extensive region of one-directional crevassing.
In comparison, the shear zone (class (3)) along the northern margin extends only about 5 to 10 km upglacier of the Negribreen-Ordonnansbreen junction (Figure 1a). Hence, there is a clear asymmetry in the progression of the surge through the NGS, with a far more extensive surge region in the southern longitudinal part of the glacier. In the 2017 crevasse maps, we see two regions of class (4): shear/chaos. The northern region follows the northern edge of class (3) “shear”, with slow-flowing ice of non-surging Ordonnansbreen running along the northern edge of the class-(3) province; this region is thus classified correctly. The new and stronger acceleration in 2017 induced a stronger shear pattern, identified in training images for this class. The strong velocity gradient leads to so-called shear holes near the folded moraines (Figure 3c,e). However, the region identified as class (4) includes a wider part, which coincides with the region of the “retreating bay” [158], where the ice has retreated along the former Negribreen-Ordonnansbreen medial moraine, leaving an area of open water covered with ice chunks of various sizes, which produces “chaos class” (class 4) surface types. The second area where class (4) is identified covers much of the region of multi-directional crevasses (class (2)) in 2016, and field observations and imagery show that this is a region of “chaos”, as described above (Figure 3d,f). The significant differences between strong-shear and chaos crevasse types show the limitations of a classification that is based on too few characteristic crevasse classes. Furthermore, the 2017 classifications indicate that Ordonnansbreen has not been affected by the surge in Negribreen.

11.3. Evolution of Crevasse Classes in 2018

Classification results from two WorldView images, collected on 2018-04-29 and 2018-05-26, allow analysis of the surge progression in summer 2018. In 2018, the marginal area of Negribreen is also affected by surge crevassing, and crevassing is overall more pervasive (see Figure 9f,g). Thus, we conclude that in 2018 the surge is expanding into areas of shallower bed topography, which generally coincide with marginal glacier areas. Comparison of the widths of the crevassed regions in Figure 9f (2018-04-29) and Figure 9g (2018-05-26) suggests that this across-glacier expansion happens during summer 2018. However, the southern longitudinal part of the glacier continues to be more crevassed than the northern part, as already noted for summer 2017. Crevasse patterns may still be evolving, as the map in Figure 9f (2018-04-29) shows. The northernmost shear zone, which parallels the glacier edge between the Negribreen-Ordonnansbreen junction and the Negribreen-Akademikarbreen junction, is misclassified as “other” there, but is shown correctly as a shear margin (class 3) on the map for 2018-05-26 (Figure 9g). Similar to the progression of crevasse types observed for 2017 compared to 2016, the new surge wave leads to changes of crevasse type, with type (2) “multi-directional” following areas where type (1) “one-directional” formed in 2017, especially in central Negribreen. The regions of class (4) “shear/chaos” increased compared to 2017, expanding both the regions of complex shear and the regions of chaos crevasses, and the lack of separation between “complex shear” and “chaos” persists.

Of note is the mapping of crevasse types in upper Negribreen, south of the Negribreen-Akademikarbreen junction, north of Rembebreen, and downstream of the Filchnerfønna ice falls. The Negribreen-Akademikarbreen medial moraine is classified correctly as class (5) “other” in its upstream (eastern) part, but as class (3) “shear” in its downstream half, which is incorrect, as moraine material covers this part of the glacier (seen in airborne imagery from summer 2018, cf. Figure 1 in [117]).

In this area, crevasse fields overflown by our ICESat-2 validation campaign and analyzed in [35,117] (RGT450-RGT594 areas) are clearly seen and correctly classified in Figure 9g (2018-05-26). The one-directional crevasses south of the Negribreen-Akademikarbreen medial moraine are of the subtype of thin, parallel, freshly opened crevasses, measured by airborne altimetry and ICESat-2 satellite altimetry [35]. Furthermore, the classification of uncrevassed regions as undisturbed surface demonstrates that the surge has not transgressed the medial moraines between Negribreen and Akademikarbreen, nor advanced into Rembebreen in the south. In conclusion, as of summer 2018, the surge in the NGS has not (or not yet) affected the side glaciers of Negribreen. However, it has been hypothesized in [114] that some side glaciers may surge in a later part of the surge process.

11.4. Summary of Geophysical Findings

In summary, we conclude that crevasse classification using a physically constrained neural net yields a segmentation of a surging glacier into crevasse provinces that allows geophysical interpretation. A time series of crevasse provinces, derived from a time series of WorldView images, provides evidence of the complex deformation processes that occur during the evolution of the surge. The individual findings can be summarized as follows:

(1) More classes form, as the surge progresses.

(2) Fields of one-directional crevasses are always on the upglacier, leading edge of the surge expansion. From airborne observations and numerical analyses, we know that these crevasses are of the extensional type, with the direction of extension in the direction of largest strain (strain rate) [18,36,117].

(3) Fields of shear crevasse type form between areas of accelerating and fast-moving ice and areas of slow-moving ice that is not (or not yet) affected by the surge.

(4) Multi-generational, multi-directional crevasse types form as a new surge wave affects regions with pre-existing crevasses. Multi-directional crevasses can also form as a result of a two-directional, extensional force field, as observed during the surge of the BBGS [128]; however, these types are not differentiated in the experiments in this paper.

(5) Lastly, continued deformation can turn the crevassed area into a region of “chaos class”, where individual deformation events can no longer be traced from the crevasse patterns.

(6) Combining complex shear and chaos into a single class limits the potential for geophysical interpretation. For simplicity, these two different processes are not differentiated in the experiments in this paper. In a paper in preparation, we discriminate up to 13 crevasse classes [46].

(7) Over time, the surge expands into marginal areas, in addition to expanding upglacier.
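Findings (2) and (3) relate crevasse types to the strain-rate field of the flowing ice. As a hedged illustration using standard continuum-mechanics formulas (not a method from this paper), the sketch below computes the two principal strain rates of a 2D velocity field and the orientation of the most extensional axis from the velocity-gradient components. The function name is hypothetical. Crevasses are commonly assumed to open in the direction of the most extensional principal strain rate, so their traces run roughly perpendicular to that axis.

```python
import math


def principal_strain_rates(dudx, dudy, dvdx, dvdy):
    """Principal strain rates of a 2D flow field at one point.

    u and v are the velocity components along x and y; the inputs are
    their spatial derivatives (e.g., in 1/yr). Returns (eps_1, eps_2, theta)
    with eps_1 >= eps_2, where theta is the angle (radians, measured from
    the x-axis) of the most extensional principal axis.
    """
    # Strain-rate tensor: symmetric part of the velocity gradient
    # (the antisymmetric part, i.e. rigid rotation, is discarded).
    exx = dudx
    eyy = dvdy
    exy = 0.5 * (dudy + dvdx)
    # Mohr's circle: its center and radius give the principal values.
    center = 0.5 * (exx + eyy)
    radius = math.hypot(0.5 * (exx - eyy), exy)
    # Orientation of the eps_1 (most extensional) axis.
    theta = 0.5 * math.atan2(2.0 * exy, exx - eyy)
    return center + radius, center - radius, theta


# Simple shear near a glacier margin: u (along x) changes across the flow.
print(principal_strain_rates(0.0, 1.0, 0.0, 0.0))
# (0.5, -0.5, 0.7853981633974483)
```

For simple shear, as at a margin between fast-moving and slow-moving ice, the extensional axis lies at 45° (π/4) to the flow direction, which is consistent with shear crevasses forming oblique to the shear margin.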

12. Summary, Discussion and Conclusions

The work in this paper has addressed three challenges posed in the introduction: Challenge 1. Harnessing the data revolution in Earth observation from space; Challenge 2. Glacial acceleration and Sea-Level-Rise assessment; and Challenge 3. Integration of physically-constrained classification and modern “Deep Learning" approaches in satellite image classification.

Challenge 1. Harnessing the data revolution in Earth observation from space. Through the integration of physical knowledge and two different ML approaches into a physically-driven NN, the VarioCNN, we have provided a means for rapid and efficient extraction of complex information from submeter resolution satellite imagery (and other imagery). The new NN, VarioCNN, combines the advantages of a physically-driven, relatively easily trainable MLP with those of an efficient CNN, and thus directly addresses Challenge 3 (integration of physically-constrained classification and modern “Deep Learning” approaches in satellite image classification).

Several key concepts have been instrumental in the mathematical and computational formulation of a connection between physical understanding and physically constrained classification: (1) ice dynamics of glacial acceleration, especially surging, (2) deformation of the material ice during rapid acceleration, (3) resultant surface signatures: crevasse patterns, and their formation, transport and overprinting, (4) recording of ice-surface structures in optical satellite imagery (and other imagery), and (5) mathematical representation of crevasse patterns in multi-directional vario functions. Together, these components comprise the physical constraints of VarioMLP. VarioMLP utilizes the connectionist-geostatistical classification method [18,44,45] to first process satellite imagery by calculating directional vario functions, which are then used to activate the neurons of the input layer of an MLP.
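To illustrate how directional vario functions can serve as input features, the sketch below computes a classical experimental semivariogram, gamma(h) = 1/(2 N(h)) * sum of (z(x+h) - z(x))^2 over the N(h) pixel pairs separated by lag h, along four grid directions. It then concatenates the values into one feature vector that could feed the input layer of an MLP. This is a sketch under stated assumptions: the connectionist-geostatistical method [18,44,45] may use other vario-function definitions, directions and lag classes, and the names `directional_variogram`, `vario_features` and `DIRECTIONS` are hypothetical.

```python
# Hypothetical choice of four grid directions: E-W, N-S and both diagonals,
# given as (row_step, col_step) per unit lag.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def directional_variogram(img, direction, max_lag):
    """Experimental semivariogram gamma(h), h = 1..max_lag, along one direction.

    img: 2D list of grey values; direction: (row_step, col_step).
    """
    rows, cols = len(img), len(img[0])
    dr, dc = direction
    gammas = []
    for h in range(1, max_lag + 1):
        total, n = 0.0, 0
        for r in range(rows):
            for c in range(cols):
                r2, c2 = r + h * dr, c + h * dc
                # Only use pairs whose second pixel lies inside the image.
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    diff = img[r2][c2] - img[r][c]
                    total += diff * diff
                    n += 1
        gammas.append(total / (2 * n) if n else 0.0)
    return gammas


def vario_features(img, max_lag):
    """Concatenate the directional variograms into one MLP input vector."""
    features = []
    for d in DIRECTIONS:
        features.extend(directional_variogram(img, d, max_lag))
    return features


# Vertical stripes: grey value alternates 0, 1 from column to column.
stripes = [[0, 1, 0, 1] for _ in range(4)]
print(vario_features(stripes, 2))
# [0.5, 0.0, 0.0, 0.0, 0.5, 0.0, 0.5, 0.0]
```

For the striped test image, the north-south variogram is zero at all lags, while the east-west and diagonal variograms oscillate with the stripe period. Directional contrasts of this kind are what allow vario-function features to separate, for example, one-directional from multi-directional crevasse patterns.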

While deep learning methods have gained increasing acceptance in the geosciences, the lack of adequate, problem-specific labeled training data has hampered the derivation of new knowledge with these approaches, because CNNs typically require training data sets on the order of hundreds of thousands to millions of labeled examples. Science applications of CNNs have been limited to areas where more training data exist, including (a) biology and medicine, (b) atmospheric sciences and weather forecasting, and (c) sea surface temperature (ocean remote sensing) [93].

In a comparison of VarioMLP and ResNet-18, the shallowest “deep” NN in common use [41,83], we find that the primary advantage of VarioMLP over the CNN is that VarioMLP can be trained with a relatively small set of labeled training data, i.e., a number of input images that an expert in the field can feasibly label. Starting from several hundred training images of crevassed surfaces, assigned to six classes by a structural glaciologist, a feedback loop of retraining and reinforcement leads to the creation of a labeled crevasse-class data set of about 4000 images. The loop includes a fast rejection/acceptance feature, supported by a GUI, that combines a confidence measure with expert-controlled decisions.
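The acceptance step of the feedback loop can be sketched as confidence-based triage. This is a minimal sketch under two assumptions: the confidence measure is taken to be the top-class softmax probability, and predictions above a fixed threshold are accepted automatically while the rest are routed to the expert. The paper's actual confidence measure and threshold may differ, the GUI interaction is not modeled, and the names `softmax`, `triage` and the 0.9 threshold are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def triage(predictions, threshold=0.9):
    """Split model outputs into auto-accepted labels and expert-review items.

    predictions: list of (image_id, logits) pairs.
    Returns (accepted, review): accepted holds (image_id, class_index,
    confidence) for confident predictions; review holds the image_ids whose
    top-class probability falls below the threshold.
    """
    accepted, review = [], []
    for image_id, logits in predictions:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            accepted.append((image_id, best, probs[best]))
        else:
            review.append(image_id)
    return accepted, review


accepted, review = triage([("tile_a", [5.0, 0.0, 0.0]),
                           ("tile_b", [1.0, 0.9, 0.8])])
print(review)  # ['tile_b']
```

Here `tile_a` (top-class probability of about 0.99) is accepted as class 0, whereas `tile_b`, with nearly equal scores, is sent to expert review. Raising the threshold trades more expert work for fewer mislabeled training images.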

We then create a combined three-tiered network, termed VarioCNN, which consists of VarioMLP, the feedback loop and a CNN backend (ResNet-18); this NN can be trained with the labeled data set of about 4000 images and has better training properties than VarioMLP alone. A flexible and versatile open-source software system, GEOCLASS-image [138], was designed and built for image classification. It performs all the tasks in this analysis and more, and its modular design makes it easily generalizable to other network structures and applications. GEOCLASS-image is user-friendly: it includes a functional GUI that appeals to experts and non-experts in glaciology or computer science alike (i.e., it does not require extensive knowledge of ML, although it is built on a PyTorch framework). With GEOCLASS-image and VarioCNN, we have created an infrastructure that facilitates rapid analysis of submeter resolution commercial satellite image data, such as Maxar WorldView data, thus addressing Challenge 1. Furthermore, the work in this paper presents a path forward in harnessing the data revolution towards an advanced understanding of complex geophysical phenomena (here: glacial acceleration) in a climate-change science framework.

Challenge 2. Glacial acceleration and Sea-Level-Rise assessment. Our research in this paper advances the complexity of physics that can be extracted from satellite imagery (crevasse classification, deformation), in an area where such research has not yet been conducted. In the introduction, we summarized the relationship between glacial acceleration and sea-level rise (SLR). In summary, glacial acceleration constitutes a deep uncertainty in SLR assessment, a term used in the 6th Assessment Report of the IPCC [3]. Surges are the least understood form of glacial acceleration. The work in this paper culminates in an application of VarioCNN to study the evolution of crevasse provinces during the current (2016-2024) surge of an Arctic glacier system, the Negribreen Glacier System, Svalbard, based on classification of crevasse types in a time series of WorldView images for 2016-2018. This novel approach yields new results in glaciology. The classification is the first of its kind carried out for an entire Arctic glacier system and for WorldView data. Negribreen last surged in 1935/36 [35,46,114].

Using four principal crevasse types (one-directional, multi-directional, shear and chaos), plus a class for undisturbed snow/ice surfaces and a rest class, we have derived segmentations of a surging glacier into crevasse provinces that allow geophysical interpretation of the surge evolution in 2016-2018, which includes most of the acceleration phase of the surge. Some results are: More crevasses form as the surge expands. Fields of one-directional crevasses always form on the upglacier, leading edge of the surge expansion. Fields of shear crevasse type form between areas of accelerating and fast-moving ice and areas of slow-moving ice that is not, or not yet, affected by the surge. Multi-generational, multi-directional crevasse types form as a new surge wave affects regions with pre-existing crevasses. Lastly, continued deformation can turn the crevassed area into a region of “chaos class”, where individual deformation events can no longer be traced from the crevasse patterns. Over time, the surge expands upglacier and into marginal areas. Links to modeling are outlined.

A limitation of the current analysis is the small number of crevasse classes, chosen to more easily derive the first combined network that integrates the connectionist-geostatistical approach and a CNN. A classification that distinguishes up to 13 crevasse classes is in preparation [46]. ResNet-18 requires square input images of 224 by 224 pixels, whereas VarioMLP does not; however, this limitation is not severe, as split images of any size that capture the crevasse patterns can be created using the Split Image Explorer Tool in GEOCLASS-image [138].
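Cutting a large satellite scene into square split images can be sketched as simple tiling. The sketch below is not the Split Image Explorer Tool, whose behavior (overlap, padding, georeferencing) is not described here. It assumes non-overlapping tiles, drops partial tiles at the image edges rather than padding them, and keeps each tile's pixel offset so that classified tiles can be placed back into a thematic map. The function name and tile size are hypothetical, and resizing tiles to the CNN input size would be a separate step that is not shown.

```python
def split_into_tiles(img, tile_size):
    """Cut a 2D image (list of rows) into non-overlapping square tiles.

    Returns a list of ((row_offset, col_offset), tile) pairs. Edge strips
    narrower than tile_size are dropped rather than padded.
    """
    rows, cols = len(img), len(img[0])
    tiles = []
    # Step through the image in whole-tile increments only.
    for r in range(0, rows - tile_size + 1, tile_size):
        for c in range(0, cols - tile_size + 1, tile_size):
            tile = [row[c:c + tile_size] for row in img[r:r + tile_size]]
            tiles.append(((r, c), tile))
    return tiles


img = [[0] * 7 for _ in range(5)]
print([offset for offset, _ in split_into_tiles(img, 2)])
# [(0, 0), (0, 2), (0, 4), (2, 0), (2, 2), (2, 4)]
```

On a 5 by 7 image with a tile size of 2, this yields six 2 by 2 tiles; the last row and column are discarded. Keeping the offsets is the design choice that lets per-tile class labels be reassembled into a map such as those in Figure 9.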

More generally, the specific glaciological results obtained in this paper demonstrate that geoscience and computer science are equally important disciplines in the development of physically constrained NNs (i.e. glaciology is not merely “domain knowledge"), in light of the goal to utilize modern observation technology to advance geophysical understanding.

Author Contributions

UCH designed the research in this paper, developed the original Connectionist-Geostatistical Classification method and wrote most of this paper. The computational part of this paper is based substantially on unpublished work for a Master's thesis written by Lawrence “Jack” Hessburg, advised by U. Herzfeld. JH wrote the Python geoclass software package, which has been used for all computational work in this paper, based on legacy software developed by UCH and members of the Geomathematics Laboratory in various programming languages, including the GEOCLASS Library in C and GEOCLASS MATLAB software. TT contributed extensively to writing this paper. AH tested and used the new Python GEOCLASS-image software and contributed to writing.

Funding

The work in this paper was primarily funded by U.S. National Science Foundation (NSF) Office of Advanced Cyberinfrastructure award OAC-1835256. Research on the surge in the NGS and data collection were supported by the U.S. National Aeronautics and Space Administration (NASA) Earth Sciences Division under awards 80NSSC20K0975, 80NSSC18K1439 and NNX17AG75G and by the U.S. National Science Foundation (NSF) under awards OPP-1745705 and OPP-1942356 (Office of Polar Programs).

Acknowledgments

Thanks are due to Tasha Markley, Griffin Hayes, Lukas Goetz-Weiss, Alex Weltman, Alfredo de La Pena Gonzales, Connor Meyers and Chris Higginson, all Geomathematics Lab, University of Colorado Boulder, and to Oliver Zahner for previous work on the classification methods and the GEOCLASS software. Maxar WorldView satellite imagery from the surge of the NGS was acquired with help from the Polar Geospatial Center, University of Minnesota, where we are indebted to Paul Morin, Jonathan Pundsack, Cole Kelleher, Stephanie Linde and colleagues, and through the NASA Commercial Smallsat Data Acquisition Program (CSDAP).

Research on WorldView data analysis was also supported by NASA Earth Sciences under the CSDAP. Principal Investigator for all awards is Ute Herzfeld. Helicopter support was facilitated by the Norwegian Polar Center. Collection of airborne data in Svalbard was conducted with permission of the National Security Authority of Norway, the Civil Aviation Authority of Norway and the Governor of Svalbard, registered as Research in Svalbard Project RIS-10827 “NEGRIBREEN SURGE". The data collection was also partly supported through a 2018 Access Pilot Project (2017_0010) of the Svalbard Integrated Observing System (SIOS). All this support is gratefully acknowledged.

Appendix A. Software/ Open Science Section

The data and code supporting the findings of this study are openly available in the GEOCLASS-image GitHub repository [138] found at

https://github.com/Herzfeld-Lab/GEOCLASS-image/releases/tag/v1.0.

The repository contains all the necessary datasets, scripts, and code used for analysis and reproduction of the results presented in the article. Researchers interested in accessing and using the first release of this software can find it at the GitHub link above.

Additionally, specific instructions, descriptions, and necessary dependencies to replicate and conduct similar experiments and analyses are documented within the repository’s README file.

All materials in this repository are released under the MIT License. Please refer to the repository’s LICENSE file for detailed information on the permissions and restrictions regarding the use, reproduction, and distribution of the data and code.

References

Wagner, W. Big Data infrastructures for processing Sentinel data. In Proceedings of the Photogrammetric Week, 2015, pp. 93–104.
U.S. National Science Foundation. NSF’s Big 10 Ideas: Harnessing the Data Revolution, 2023.
Masson-Delmotte, V.; Zhai, P.; Pirani, A.; Connors, S.L.; Péan, C.; Berger, S.; Caud, N.; Chen, Y.; Goldfarb, L.; Gomis, M.I.; et al. (Eds.) AR6: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Cambridge University Press, 2021.
Greve, R. Thermomechanisches Verhalten polythermer Eisschilde – Theorie, Analytik, Numerik. Doctoral thesis, Department of Mechanics, Darmstadt University of Technology, Germany, 1995. Berichte aus der Geowissenschaft, Shaker Verlag, Aachen, Germany.
Greve, R.; Blatter, H. Dynamics of Ice Sheets and Glaciers; Springer: Berlin, Germany, 2009.
Larour, E.H.; Seroussi, H.; Morlighem, M.; Rignot, E. Continental scale, high order, high spatial resolution, ice sheet modeling using the Ice Sheet System Model (ISSM). J. Geophys. Res. 2012, 117 (F1), F01022.
Lipscomb, W.H.; Price, S.F.; Hoffman, M.J.; Leguy, G.R.; Bennett, A.R.; Bradley, S.; Evans, K.J.; Fyke, J.G.; Kennedy, J.H.; Perego, M.; Ranken, D.M.; et al. Description and evaluation of the community ice sheet model (CISM) v. 2.1. Geoscientific Model Development 2019, 12, 387–424.
Sellevold, R.; Van Kampenhout, L.; Lenaerts, J.T.; Noël, B.; Lipscomb, W.H.; Vizcaino, M. Surface mass balance downscaling through elevation classes in an Earth system model: Application to the Greenland ice sheet. The Cryosphere 2019, 13, 3193–3208.
Hanna, E.; Pattyn, F.; Navarro, F.; Favier, V.; Goelzer, H.; van den Broeke, M.R.; Vizcaino, M.; Whitehouse, P.L.; Ritz, C.; Bulthuis, K.; et al. Mass balance of the ice sheets and glaciers–Progress since AR5 and challenges. Earth-Science Reviews 2020, 201, 102976.
Payne, A.J.; Nowicki, S.; Abe-Ouchi, A.; Agosta, C.; Alexander, P.; Albrecht, T.; Asay-Davis, X.; Aschwanden, A.; Barthel, A.; Bracegirdle, T.J.; et al. Future sea level change under coupled model intercomparison project phase 5 and phase 6 scenarios from the Greenland and Antarctic ice sheets. Geophysical Research Letters 2021, 48, e2020GL091741.
Siahaan, A.; Smith, R.S.; Holland, P.R.; Jenkins, A.; Gregory, J.M.; Lee, V.; Mathiot, P.; Payne, A.J.; Ridley, J.K.; Jones, C.G. The Antarctic contribution to 21st-century sea-level rise predicted by the UK Earth System Model with an interactive ice sheet. The Cryosphere 2022, 16, 4053–4086.
Gettelman, A.; Geer, A.J.; Forbes, R.M.; Carmichael, G.R.; Feingold, G.; Posselt, D.J.; Stephens, G.L.; van den Heever, S.C.; Varble, A.C.; Zuidema, P. The future of Earth system prediction: Advances in model-data fusion. Science Advances 2022, 8, eabn3488.
Stocker, T.F.; Qin, D.; Plattner, G.K.; Tignor, M.; Allen, S.K.; Boschung, J.; Nauels, A.; Xia, Y.; Bex, V.; Midgley, P.M. (Eds.) Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change; Cambridge University Press, 2013.
Clarke, G. Fast glacier flow: Ice streams, surging, and tidewater glaciers. Journal of Geophysical Research 1987, 92, 8835–8842.
Truffer, M.; Echelmeyer, K.A. Of isbrae and ice streams. Annals of Glaciology 2003, 36, 66–72.
Mayer, H.; Herzfeld, U. Structural glaciology of the fast-moving Jakobshavn Isbræ, Greenland, compared to the surging Bering Glacier, Alaska, USA. Annals of Glaciology 2000, 30, 243–249.
Jiskoot, H. Glacier surging. Encyclopedia of Snow, Ice and Glaciers 2011, pp. 415–428.
Herzfeld, U.; McDonald, B.; Weltman, A. Bering Glacier and Bagley Ice Valley Surge 2011: Crevasse Classification as an Approach to Map Deformation Stages and Surge Progression. Annals of Glaciology 2013, 54(63), 279–286.
Straneo, F.; Cenedese, C. The dynamics of Greenland’s glacial fjords and their role in climate. Annual review of marine science 2015, 7, 89–112.
Trantow, T.; Herzfeld, U.C. Crevasses as indicators of surge dynamics in the Bering Bagley Glacier System, Alaska: Numerical experiments and comparison to image data analysis. Journal of Geophysical Research: Earth Surface 2018, 123, 1615–1637.
Murray, T.; Luckman, A.; Strozzi, T.; Nuttall, A. The initiation of glacier surging at Fridtjovbreen, Svalbard. Annals of Glaciology 2003, 36, 110–116.
Robel, A.A.; Roe, G.H.; Haseloff, M. Response of marine-terminating glaciers to forcing: time scales, sensitivities, instabilities, and stochastic dynamics. Journal of Geophysical Research: Earth Surface 2018, 123, 2205–2227.
Hill, E.A.; Carr, J.R.; Stokes, C.R.; Gudmundsson, G.H. Dynamic changes in outlet glaciers in northern Greenland from 1948 to 2015. The Cryosphere 2018, 12, 3243–3263.
Nuth, C.; Gilbert, A.; Köhler, A.; McNabb, R.; Schellenberger, T.; Sevestre, H.; Weidle, C.; Girod, L.; Luckman, A.; Kääb, A. Dynamic vulnerability revealed in the collapse of an Arctic tidewater glacier. Scientific reports 2019, 9, 5541.
Benn, D.; Fowler, A.; Hewitt, I.; Sevestre, H. A general theory of glacier surges. Journal of Glaciology 2019, 65, 701–716.
Zheng, W.; Pritchard, M.E.; Willis, M.J.; Stearns, L.A. The possible transition from glacial surge to ice stream on Vavilov Ice Cap. Geophysical Research Letters 2019, 46, 13892–13902.
Alley, K.E.; Wild, C.T.; Luckman, A.; Scambos, T.A.; Truffer, M.; Pettit, E.C.; Muto, A.; Wallin, B.; Klinger, M.; Sutterley, T.; et al. Two decades of dynamic change and progressive destabilization on the Thwaites Eastern Ice Shelf. The Cryosphere 2021, 15, 5187–5203.
Frank, T.; Akesson, H.; de Fleurian, B.; Morlighem, M.; Nisancioglu, K.H. Geometric controls of tidewater glacier dynamics. The Cryosphere 2022, 16, 581–601.
Grinsted, A.; Hvidberg, C.S.; Lilien, D.A.; Rathmann, N.M.; Karlsson, N.B.; Gerber, T.; Kjær, H.A.; Vallelonga, P.; Dahl-Jensen, D. Accelerating ice flow at the onset of the Northeast Greenland Ice Stream. Nature Communications 2022, 13, 5589.
Ehrenfeucht, S.; Morlighem, M.; Rignot, E.; Dow, C.F.; Mouginot, J. Seasonal acceleration of Petermann Glacier, Greenland, from changes in subglacial hydrology. Geophysical Research Letters 2023, 50, e2022GL098009.
Straneo, F.; Sutherland, D.; Holland, D.; Gladish, C.; Hamilton, G.; Johnson, H.; Rignot, E.; Xu, Y.; Koppes, M. Submarine melting of Greenland’s glaciers by Atlantic waters. Annals of Glaciology 2012, 53(60).
Rignot, E.; Fenty, I.; Menemenlis, D.; Xu, Y. Spreading of warm ocean waters around Greenland as a possible cause for glacier acceleration. Annals of Glaciology 2012, 53.
Herzfeld, U.; McDonald, B.; Wallin, B.; Krabill, W.; Manizade, S.; Sonntag, J.; Mayer, H.; Yearsley, W.; Chen, P.; Weltman, A. Elevation Changes and Dynamic Provinces of Jakobshavn Isbræ, Greenland, Derived Using Generalized Spatial Surface Roughness from ICESat GLAS and ATM Data. Journal of Glaciology 2014, 60, 834–848.
Herzfeld, U.; Wallin, B. Spatio-Temporal Analysis of Surface Elevation Changes in Pine Island Glacier, Antarctica, from ICESat GLAS Data and ERS-1 Radar Altimeter Data. Annals of Glaciology 2014, 55, 248–258.
Herzfeld, U.; Trantow, T.; Lawson, M.; Hans, J.; Medley, G. Surface heights and crevasse types of surging and fast-moving glaciers from ICESat-2 laser altimeter data — Application of the density-dimension algorithm (DDA-ice) and validation using airborne altimeter and Planet SkySat data. Science of Remote Sensing 2021, 3, 1–20.
Trantow, T.; Herzfeld, U.C. Progression of the surge in the Negribreen Glacier System from two years of ICESat-2 measurements. Journal of Glaciology (in review), Earth ArXiv (preprint: doi.org/10.31223/X5NT1Z) 2023.
Kampffmeyer, M.; Salberg, A.B.; Jenssen, R. Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 1–9.
Rawat, W.; Wang, Z. Deep convolutional neural networks for image classification: A comprehensive review. Neural computation 2017, 29, 2352–2449.
Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition; IEEE, 2009, pp. 248–255.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 2012, 25. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 1998, 6, 107–116. [Google Scholar] [CrossRef]
Meyer, H.; Pebesma, E. Estimating the area of applicability of remote sensing-based machine learning models with limited training data. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS. IEEE, 2021, pp. 2028–2030.
Herzfeld, U.C.; Zahner, O. A connectionist-geostatistical approach to automated image classification, applied to the analysis of crevasse patterns in surging ice. Computers & Geosciences 2001, 27, 499–512.
Herzfeld, U.C. Master of the Obscure — Automated Geostatistical Classification in Presence of Complex Geophysical Processes. Mathematical Geosciences 2008, 40, 587–618.
Herzfeld, U.C.; Markley, T.; Hayes, A.; de la Pena Gonzalez, A.; Hale, G.; Han, H. Evolution of the Surge in Negribreen, Svalbard, Derived from Automated Connectionist-Geostatistical Classification of Crevasse Provinces in WorldView and Planet SkySat Satellite Image Data. In preparation, 2023.
Camps-Valls, G.; Tuia, D.; Zhu, X.X.; Reichstein, M. Deep learning for the Earth Sciences: A Comprehensive Approach to Remote Sensing, Climate Science and Geosciences; John Wiley & Sons, 2021.
Bain, A. Mind and body: The theories of their relation; Vol. 4, D. Appleton, 1873.
James, W. The Principles of Psychology; H. Holt and Company, 1890.
Rosenblatt, F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review 1958, 65, 386.
Ivakhnenko, A.G.; Lapa, V.G. Cybernetics and forecasting techniques; Modern analytic and computational methods in science and mathematics, North-Holland: New York, NY, 1967; Trans. from the Russian, Kiev, Naukova Dumka, 1965.
Minsky, M.; Papert, S. Perceptrons: An Introduction to Computational Geometry; MIT Press: Cambridge, MA, USA, 1969.
Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological cybernetics 1980, 36, 193–202.
LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 1998, 86, 2278–2324.
Werbos, P.J. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE 1990, 78, 1550–1560.
Zell, A.; Mamier, G.; Vogt, M.; Mache, N.; Hubner, R.; Doring, S.; Herrmann, K.; Soyez, T.; Schmalzl, M.; Sommer, T.; et al. Stuttgart neural network simulator User Manual. University of Stuttgart, Stuttgart 1994.
Rumelhart, D.E.; McClelland, J.L.; PDP Research Group. Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations; MIT press, 1986.
Sun, R. Connectionism and neural networks. The Cambridge handbook of artificial intelligence 2014, pp. 108–127.
Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT press, 2016.
Song, J.; Gao, S.; Zhu, Y.; Ma, C. A survey of remote sensing image classification based on CNNs. Big earth data 2019, 3, 232–254.
Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Garcia-Rodriguez, J. A review on deep learning techniques applied to semantic segmentation. arXiv 2017, arXiv:1704.06857.
Herzfeld, U.; Williams, S.; Heinrichs, J.; Maslanik, J.; Sucht, S. Geostatistical and Statistical Classification of Sea-Ice Properties and Provinces from SAR Data — Methods and Applications to Ice Environments Near Point Barrow, Alaska. Remote Sensing 2016, 8, 1–37.
Li, S.; Song, W.; Fang, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Deep learning for hyperspectral image classification: An overview. IEEE Transactions on Geoscience and Remote Sensing 2019, 57, 6690–6709.
Kwok, R.; Cunningham, G.; Holt, B. An approach to identification of sea ice types from spaceborne SAR data. Microwave Remote Sensing of Sea Ice 1992, pp. 355–360.
Collins, M.J. Information fusion in sea ice remote sensing. Microwave remote sensing of sea ice 1992, pp. 431–441.
Steffen, K.; Heinrichs, J. Feasibility of sea ice typing with synthetic aperture radar (SAR): Merging of Landsat thematic mapper and ERS 1 SAR satellite imagery. Journal of Geophysical Research: Oceans 1994, 99, 22413–22424.
Ochilov, S.; Clausi, D. Operational SAR sea-ice image classification. IEEE Transactions on Geoscience and Remote Sensing 2012, 50, 4397–4408.
Wang, L.; Scott, K.; Clausi, D. Improved sea ice concentration estimation through fusing classified SAR imagery and AMSR-E data. Canadian Journal of Remote Sensing 2016.
Dabboor, M.; Geldsetzer, T. Towards sea ice classification using simulated RADARSAT Constellation Mission compact polarimetric SAR imagery. Remote Sensing of Environment 2014, 140, 189–195.
Karvonen, J. Compaction of C-band Synthetic Aperture Radar Based Sea Ice Information for Navigation in the Baltic Sea; Helsinki University of Technology, 2006.
Zakhvatkina, N.Y.; Alexandrov, V.Y.; Johannessen, O.M.; Sandven, S.; Frolov, I.Y. Classification of sea ice types in ENVISAT synthetic aperture radar images. IEEE Transactions on Geoscience and Remote Sensing 2013, 51, 2587–2600.
Roesel, A.; Kaleschke, L.; Birnbaum, G. Melt ponds on Arctic sea ice determined from MODIS satellite data using an artificial neural network. The Cryosphere 2012, 6.
Shen, X.y.; Zhang, J.; Meng, J.m.; Ke, C.q. Sea ice type classification based on random forest machine learning with CryoSat-2 altimeter data. In Proceedings of the 2017 International Workshop on Remote Sensing with Intelligent Processing (RSIP). IEEE, 2017, pp. 1–5.
Buckley, E.; Farrell, S.; Duncan, K.; Connor, L.; Kuhn, J.; Dominguez, R. Classification of Sea Ice Summer Melt Features in High-resolution IceBridge Imagery. Journal of Geophysical Research 2020, 125.
Buckley, E.M.; Farrell, S.L.; Herzfeld, U.C.; Trantow, T.M.; Baney, O.N.; Duncan, K.; Han, H.; Lawson, M.; Webster, M. Observing the Evolution of Summer Melt on Multiyear Sea Ice with ICESat-2 and Sentinel-2. The Cryosphere 2023, 17, 3695–3719, Published 2023/08/31.
Kohonen, T. Self-organization and associative memory; Vol. 8, Springer Science & Business Media, 2012.
Kohonen, T. Learning Vector Quantization for Pattern Recognition. Technical Report TKK-F-A602 1990.
Looney, C. Pattern Recognition Using Neural Networks; Oxford University Press: New York, NY, USA, 1997; p. 458.
Herzfeld, U.C. Vario functions of higher order–definition and application to characterization of snow surface roughness. Computers & Geosciences 2002, 28, 641–660.
Garrigues, S.; Allard, D.; Baret, F. Using First- and Second-Order Variograms for Characterizing Landscape Spatial Structures From Remote Sensing Imagery. IEEE Transactions on Geoscience and Remote Sensing 2007, 45, 1823–1834.
Qing, D.; Huadong, G.; Yun, S.; Zhen, L.; Changlin, W. Variograms: practical method to process polarimetric SAR data. In Proceedings of the IGARSS 2003. 2003 IEEE International Geoscience and Remote Sensing Symposium. Proceedings (IEEE Cat. No.03CH37477), 2003, Vol. 2, pp. 932–934.
Schwartz, D.; Pinel-Puyssegur, B. Spatial and Temporal Statistical Analysis of Stack of SAR Images: The Contribution of the Variogram. In Proceedings of the IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 4205–4208.
He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer, 2016, pp. 630–645.
He, Z.; Li, J.; Liu, L.; He, D.; Xiao, M. Multiframe video satellite image super-resolution via attention-based residual learning. IEEE Transactions on Geoscience and Remote Sensing 2021, 60, 1–15.
Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv preprint arXiv:1312.4400 2013.
Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, [arXiv:cs.CV/1409.1556].
Tai, C.; Xiao, T.; Zhang, Y.; Wang, X.; et al. Convolutional neural networks with low-rank regularization. arXiv preprint arXiv:1511.06067 2015.
Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
Xiang, C.; Zhang, L.; Tang, Y.; Zou, W.; Xu, C. MS-CapsNet: A novel multi-scale capsule network. IEEE Signal Processing Letters 2018, 25, 1850–1854.
Virts, K.; Shirey, A.; Priftis, G.; Ankur, K.; Ramasubramanian, M.; Muhammad, H.; Acharya, A.; Ramachandran, R. A quantitative analysis on the use of supervised machine learning in Earth science. In Proceedings of the IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2020, pp. 2252–2255.
Liu, X.; Hu, Q.; Cai, Y.; Cai, Z. Extreme learning machine-based ensemble transfer learning for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2020, 13, 3892–3902.
Ou, X.; Yan, P.; Zhang, Y.; Tu, B.; Zhang, G.; Wu, J.; Li, W. Moving object detection method via ResNet-18 with encoder–decoder structure in complex scenes. IEEE Access 2019, 7, 108152–108160.
Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat. Deep learning and process understanding for data-driven Earth system science. Nature 2019, 566, 195–204.
Aguilar, M.; Saldaña, M.; Aguilar, F. GeoEye-1 and WorldView-2 pan-sharpened imagery for object-based classification in urban environments. International Journal of Remote Sensing 2013, 34, 2583–2606.
Waser, L.T.; Küchler, M.; Jütte, K.; Stampfer, T. Evaluating the potential of WorldView-2 data to classify tree species and different levels of ash mortality. Remote Sensing 2014, 6, 4515–4545.
Sen, A.; Suleymanoglu, B.; Soycan, M. Unsupervised extraction of urban features from airborne lidar data by using self-organizing maps. Survey Review 2020, 52, 150–158.
Vigneshwaran, S.; Vasantha Kumar, S. Comparison of classification methods for urban green space extraction using very high resolution worldview-3 imagery. Geocarto International 2021, 36, 1429–1442.
Luan, Q.; Tian, Z. Application of Machine Learning Methods in Geoscience. In Proceedings of the 2022 IEEE 2nd International Conference on Electronic Technology, Communication and Information (ICETCI). IEEE, 2022, pp. 563–567.
Malambo, L.; Popescu, S. Image to Image Deep Learning for Enhanced Vegetation Height Modeling in Texas. Remote Sensing 2023, 15, 5391.
Narine, L.L.; Popescu, S.C.; Malambo, L. Synergy of ICESat-2 and Landsat for mapping forest aboveground biomass with deep learning. Remote Sensing 2019, 11, 1503.
Palm, S.P.; Selmer, P.; Yorks, J.; Nicholls, S.; Nowottnick, E. Planetary boundary layer height estimates from ICESat-2 and CATS backscatter measurements. Frontiers in Remote Sensing 2021, 2, 716951.
Li, W.; Niu, Z.; Shang, R.; Qin, Y.; Wang, L.; Chen, H. High-resolution mapping of forest canopy height using machine learning by coupling ICESat-2 LiDAR with Sentinel-1, Sentinel-2 and Landsat-8 data. International Journal of Applied Earth Observation and Geoinformation 2020, 92, 102163.
Gaffinet, B.; Hagensieker, R.; Loi, L.; Schumann, G. Supervised Machine Learning for Flood Extent Detection with Optical Satellite Data. In Proceedings of the IGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2023, pp. 2084–2087.
de Lima, R.P.; Bonar, A.; Coronado, D.D.; Marfurt, K.; Nicholson, C. Deep convolutional neural networks as a geological image classification tool. The Sedimentary Record 2019, 17, 4–9.
Liu, Q.; Basu, S.; Ganguly, S.; Mukhopadhyay, S.; DiBiano, R.; Karki, M.; Nemani, R. Deepsat v2: feature augmented convolutional neural nets for satellite image classification. Remote Sensing Letters 2020, 11, 156–165.
Camps-Valls, G.; Svendsen, D.H.; Cortés-Andrés, J.; Moreno-Martínez, Á.; Pérez-Suay, A.; Adsuara, J.; Martín, I.; Piles, M.; Muñoz-Marí, J.; Martino, L. Physics-aware machine learning for geosciences and remote sensing. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS. IEEE, 2021, pp. 2086–2089.
Ge, Y.; Zhang, X.; Atkinson, P.M.; Stein, A.; Li, L. Geoscience-aware deep learning: A new paradigm for remote sensing. Science of Remote Sensing 2022, 5, 100047.
Karpatne, A.; Ebert-Uphoff, I.; Ravela, S.; Babaie, H.A.; Kumar, V. Machine learning for the geosciences: Challenges and opportunities. IEEE Transactions on Knowledge and Data Engineering 2018, 31, 1544–1554.
Daw, A.; Karpatne, A.; Watkins, W.D.; Read, J.S.; Kumar, V. Physics-guided neural networks (pgnn): An application in lake temperature modeling. In Knowledge Guided Machine Learning; Chapman and Hall/CRC, 2022; pp. 353–372.
Muralidhar, N.; Bu, J.; Cao, Z.; He, L.; Ramakrishnan, N.; Tafti, D.; Karpatne, A. Phynet: Physics guided neural networks for particle drag force prediction in assembly. In Proceedings of the 2020 SIAM International Conference on Data Mining. SIAM, 2020, pp. 559–567.
Hu, X.; Hu, H.; Verma, S.; Zhang, Z.L. Physics-guided deep neural networks for power flow analysis. IEEE Transactions on Power Systems 2020, 36, 2082–2092.
De Bézenac, E.; Pajot, A.; Gallinari, P. Deep learning for physical processes: Incorporating prior scientific knowledge. Journal of Statistical Mechanics: Theory and Experiment 2019, 2019, 124009.
Willis, M.J.; von Stosch, M. Simultaneous parameter identification and discrimination of the nonparametric structure of hybrid semi-parametric models. Computers & Chemical Engineering 2017, 104, 366–376.
Lefauconnier, B.; Hagen, J.O. Surging and calving glaciers in eastern Svalbard; 1991.
Strozzi, T.; Paul, F.; Wiesmann, A.; Schellenberger, T.; Kääb, A. Circum-Arctic Changes in the Flow of Glaciers and Ice Caps from Satellite SAR Data between the 1990s and 2017. Remote Sensing 2017, 9, 947.
Haga, O.N.; McNabb, R.; Nuth, C.; Altena, B.; Schellenberger, T.; Kääb, A. From high friction zone to frontal collapse: dynamics of an ongoing tidewater glacier surge, Negribreen, Svalbard. Journal of Glaciology 2020, 66, 742–754.
Herzfeld, U.C.; Lawson, M.; Trantow, T.; Nylen, T. Airborne Validation of ICESat-2 ATLAS Data Over Crevassed Surfaces and Other Complex Glacial Environments: Results From Experiments of Laser Altimeter and Kinematic GPS Data Collection From a Helicopter Over a Surging Arctic Glacier (Negribreen, Svalbard). Remote Sensing 2022, 14, 1185–1224.
Herzfeld, U.C.; Trantow, T.; Hans, J.; Lawson, M.; Medley, G. Surface heights and crevasse types of surging and fast-moving glaciers from ICESat-2 laser altimeter data — Application of the density-dimension algorithm (DDA-ice) and validation using airborne and PlanetLab SkySat data. Science of Remote Sensing 2020, in review.
Sevestre, H.; Benn, D.I.; Luckman, A.; Nuth, C.; Kohler, J.; Lindbäck, K.; Pettersson, R. Tidewater glacier surges initiated at the terminus. Journal of Geophysical Research: Earth Surface 2018, 123, 1035–1051.
Means, W. Stress and Strain: Basic Concepts of Continuum Mechanics for Geologists; Springer: New York, NY, USA, 1976; p. 339.
Suppe, J. Principles of Structural Geology; Prentice-Hall: Englewood Cliffs, NJ, 1985; p. 573.
Twiss, R.; Moores, E. Structural Geology; W.H. Freeman: New York, NY, 1992; p. 532.
Ramsay, J.; Lisle, R. The Techniques of Modern Structural Geology, Vol. 3: Applications of Continuum Mechanics in Structural Geology; Academic Press: San Diego, CA, 2000; pp. 701–1061.
Liu, I. Continuum Mechanics; Springer: Berlin, 2002; p. 297.
Greve, R. Kontinuumsmechanik; Springer: Berlin, Germany, 2003.
Herzfeld, U.C.; Clarke, G.K.C.; Mayer, H.; Greve, R. Derivation of deformation characteristics in fast-moving glaciers. Computers & Geosciences 2004, 30, 291–302.
Herzfeld, U.C.; Mayer, H. Surge of Bering Glacier and Bagley Ice Field, Alaska: an update to August 1995 and an interpretation of brittle-deformation patterns. Journal of Glaciology 1997, 43, 427–434.
Herzfeld, U. The 1993-1995 surge of Bering Glacier (Alaska) — a photographic documentation of crevasse patterns and environmental changes; Vol. 17, Trierer Geograph. Studien, Geograph. Gesellschaft Trier and Fachbereich VI – Geographie/Geowissenschaften, Universität Trier, 1998; 211 pp.
Herzfeld, U.C.; Stauber, M.; Stahl, N. Geostatistical characterization of ice surfaces from ERS-1 and ERS-2 SAR data, Jakobshavn Isbræ, Greenland. Annals of Glaciology 2000, 30, 224–234.
Mayer, H.; Herzfeld, U. A structural segmentation, kinematic analysis and dynamic interpretation of Jakobshavns Isbræ, West Greenland. Zeitschrift für Gletscherkunde und Glazialgeologie 2001, 37, 107–124.
Vornberger, P.; Whillans, I. Crevasse deformation and examples from Ice Stream B, Antarctica. Journal of Glaciology 1990, 36 (122), 3–10.
Marmo, B.; Wilson, C. Strain localisation and incremental deformation within ice masses, Framnes Mountains, east Antarctica. Journal of Structural Geology 1998, 20 (2-3), 149–162.
Rist, M.; Sammonds, P.; Murrell, S.; Meredith, P.; Doake, C.; Oerter, H.; Matsuki, K. Experimental and theoretical fracture mechanics applied to Antarctic ice feature and surface crevassing. Journal of Geophysical Research 1999, 104 (B2), 2973–2987.
Trantow, T. Surging in the Bering-Bagley Glacier System, Alaska – Understanding Glacial Acceleration through New Methods in Remote Sensing, Numerical Modeling and Model-Data Comparison. PhD thesis, University of Colorado, 2020.
Larour, E.; Utke, J.; Csatho, B.; Schenk, A.; Seroussi, H.; Morlighem, M.; Rignot, E.; Schlegel, N.; Khazendar, A. Inferred basal friction and surface mass balance of the Northeast Greenland Ice Stream using data assimilation of ICESat (Ice Cloud and land Elevation Satellite) surface altimetry and ISSM (Ice Sheet System Model). The Cryosphere 2014, 8, 2335–2335.
Fatland, D.R.; Lingle, C.S. Analysis of the 1993-95 Bering Glacier (Alaska) surge using differential SAR interferometry. Journal of Glaciology 1998, 44, 532–546.
Trantow, T.; Herzfeld, U.C. Evolution of a Surge Cycle of the Bering-Bagley Glacier System From Observations and Numerical Modeling. Journal of Geophysical Research: Earth Surface 2024, 129, e2023JF007306.
Herzfeld, U.C.; Hessburg, J.; Hayes, A.; Trantow, T. GEOCLASS-image (v1.0). 2023. https://github.com/Herzfeld-Lab/GEOCLASS-image/releases/tag/v1.0.
Murray, T.; Strozzi, T.; Luckman, A.; Jiskoot, H.; Christakos, P. Is there a single surge mechanism? Contrasts in dynamics between glacier surges in Svalbard and other regions. J. Geophys. Res. 2003, 108, 2237.
Neigh, C.S.; Masek, J.G.; Nickeson, J.E. High-resolution satellite data open for government research. Eos, Transactions American Geophysical Union 2013, 94, 121–123.
Porter, C.; Howat, I.; Noh, M.J.; Husby, E.; Khuvis, S.; Danish, E.; Tomko, K.; Gardiner, J.; Negrete, A.; Yadav, B.; et al. ArcticDEM - Mosaics, Version 4.1. 2022.
Immitzer, M.; Atzberger, C.; Koukal, T. Tree species classification with random forest using very high spatial resolution 8-band WorldView-2 satellite data. Remote sensing 2012, 4, 2661–2693.
Elsharkawy, A.; Elhabiby, M.; El-Sheimy, N. Improvement in the detection of land cover classes using the Worldview-2 imagery. In Proceedings of the ASPRS, International Scientific Conference, 2012, pp. 19–23.
Koc-San, D. Evaluation of different classification techniques for the detection of glass and plastic greenhouses from WorldView-2 satellite imagery. Journal of Applied Remote Sensing 2013, 7, 073553.
Ghosh, A.; Joshi, P.K. A comparison of selected classification algorithms for mapping bamboo patches in lower Gangetic plains using very high resolution WorldView 2 imagery. International Journal of Applied Earth Observation and Geoinformation 2014, 26, 298–311.
Li, D.; Ke, Y.; Gong, H.; Li, X. Object-based urban tree species classification using bi-temporal WorldView-2 and WorldView-3 images. Remote Sensing 2015, 7, 16917–16937.
Gaertner, J.; Genovese, V.B.; Potter, C.; Sewake, K.; Manoukis, N.C. Vegetation classification of Coffea on Hawaii Island using WorldView-2 satellite imagery. Journal of Applied Remote Sensing 2017, 11, 046005.
Melville, B.; Lucieer, A.; Aryal, J. Object-based random forest classification of Landsat ETM+ and WorldView-2 satellite imagery for mapping lowland native grassland communities in Tasmania, Australia. International journal of applied earth observation and geoinformation 2018, 66, 46–55.
Caraballo-Vega, J.; Carroll, M.; Neigh, C.; Wooten, M.; Lee, B.; Weis, A.; Aronne, M.; Alemu, W.; Williams, Z. Optimizing WorldView-2,-3 cloud masking using machine learning approaches. Remote Sensing of Environment 2023, 284, 113332.
Earth Observation Portal (EOPortal). WorldView-1, 2023.
Earth Observation Portal (EOPortal). WorldView-2, 2023.
Fiuczynski, M.E. Planetlab: overview, history, and future directions. ACM SIGOPS Operating Systems Review 2006, 40, 6–10.
Peterson, L.; Muir, S.; Roscoe, T.; Klingaman, A. Planetlab architecture: An overview 2006.
Wulder, M.A.; Loveland, T.R.; Roy, D.P.; Crawford, C.J.; Masek, J.G.; Woodcock, C.E.; Allen, R.G.; Anderson, M.C.; Belward, A.S.; Cohen, W.B.; et al. Current status of Landsat program, science, and applications. Remote sensing of environment 2019, 225, 127–147.
Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote sensing of Environment 2012, 120, 25–36.
Matheron, G. Principles of geostatistics. Economic Geology 1963, 58, 1246.
Matheron, G. The intrinsic random functions and their applications. Adv. Appl. Probab. 1973, 5, 439–468.
Herzfeld, U.; Trantow, T.; Bennetts, S. Surge-forced structural disintegration, enhanced calving and resultant rapid mass loss of a large Arctic fjord glacier (Negribreen, Svalbard). Geophysical Research Letters, submitted.
Shannon, C.E. Prediction and entropy of printed English. Bell system technical journal 1951, 30, 50–64.
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.
Hecht-Nielsen, R. Theory of the backpropagation neural network. In Neural networks for perception; Elsevier, 1992; pp. 65–93.
Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in neural information processing systems 2018, 31.
Papadopoulos, G.; Edwards, P.J.; Murray, A.F. Confidence estimation methods for neural networks: A practical comparison. IEEE transactions on neural networks 2001, 12, 1278–1287.
Geman, S.; Bienenstock, E.; Doursat, R. Neural networks and the bias/variance dilemma. Neural computation 1992, 4, 1–58.
Briscoe, E.; Feldman, J. Conceptual complexity and the bias/variance tradeoff. Cognition 2011, 118, 2–16.
Belkin, M.; Hsu, D.; Ma, S.; Mandal, S. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 2019, 116, 15849–15854.
Kochgaven, C.; Mishra, P.; Shitole, S. Detecting Presence of COVID-19 with ResNet-18 using PyTorch. In Proceedings of the 2021 International Conference on Communication information and Computing Technology (ICCICT). IEEE, 2021, pp. 1–6.
Ramzan, F.; Khan, M.U.G.; Rehmat, A.; Iqbal, S.; Saba, T.; Rehman, A.; Mehmood, Z. A deep learning approach for automated diagnosis and multi-class classification of Alzheimer’s disease stages using resting-state fMRI and residual neural networks. Journal of medical systems 2020, 44, 1–16.
Jiang, S.; Hua, C.; Yuan, M. Image classification method of bearing fault based on BOA optimization ResNet-18. In Proceedings of the 2023 IEEE International Conference on Sensors, Electronics and Computer Engineering (ICSECE). IEEE, 2023, pp. 510–514.
Pandey, G.K.; Srivastava, S. ResNet-18 comparative analysis of various activation functions for image classification. In Proceedings of the 2023 International Conference on Inventive Computation Technologies (ICICT). IEEE, 2023, pp. 595–601.
Ruder, S. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 2016.
Raymond, C. How do glaciers surge? A review. Journal of Geophysical Research 1987, 92, 9121–9134.
Lingle, C.; Post, A.; Herzfeld, U.C.; Molnia, B.F.; Krimmel, R.; Roush, J. Bering Glacier surge and iceberg-calving mechanism at Vitus Lake, Alaska, USA. Journal of Glaciology 1993, 39, 722–727.
Molnia, B.; Post, A. Holocene history of Bering Glacier, Alaska: a prelude to the 1993-1994 surge. Physical Geography 1995, 16, 87–117.
Mayer, H.; Herzfeld, U.; Clarke, G. Analysis of deformation types in fast-moving glaciers. Terra Nostra 2002, 4, 273–278.
Molnia, B.F. Alaska; Satellite Image Atlas of Glaciers of the World, U.S. Geological Survey Professional Paper 1386-K: Washington, D.C., 2008; p. 525.
Molnia, B.F.; Post, A. Surges of the Bering Glacier. Geological Society of America Special Paper 2010, pp. 251–270.

Figure 1. Location of study area and surface velocities from Sentinel-1 SAR data. (a) The Negribreen Glacier System (NGS) with important geographical features labeled. The location of the NGS within the Svalbard archipelago is indicated by the red box in the inset in the upper-right corner. (b) NGS mean surface velocity between 2016-07-03 and 2016-07-15, shortly after the surge began. (c) NGS mean surface velocity between 2017-07-10 and 2017-07-22, when peak surge speeds were reached (upwards of 22 m/day). (d) NGS mean surface velocity between 2018-05-10 and 2018-05-22. Velocities in (b)-(d) are given in m/day; black arrows indicate the magnitude and direction of mean surface velocity between the baseline dates. Background image for each subfigure: Landsat-8 RGB image acquired 2019-08-05.

Figure 2. Typical images and associated directional variograms. (a) An example of an input image containing only undisturbed snow, which displays relatively uniform surface characteristics. (b) An example of an input image containing strong parallel crevasses, a prominent and repeating surface characteristic. (c) Output of the directional variogram v1(h) for (a) in 4 directions for 14 values of h. Note the relative uniformity of output values across all directions. (d) Output of the directional variogram v1(h) for (b) in 4 directions for 14 values of h. Note the much higher baseline values than in (c), and the sharp contrast between the Diag1 direction (defined as diagonal from top left to bottom right) and the other directions. In this case, the Diag1 direction is nearly parallel to the crevasses, so its variogram does not reach the same sill as the other directions until a much higher lag value.
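The directional variograms in Figure 2 can be illustrated with a minimal sketch. It assumes the classical semivariogram estimator v1(h) = (1/(2N(h))) · Σ [z(x) − z(x + h)]², integer pixel lags h = 1, …, max_lag, and four pixel-step directions. Only Diag1 (top left to bottom right) is defined in the caption; the names "horizontal", "vertical" and "diag2" and the helper functions are hypothetical, and the GEOCLASS-image implementation (lag spacing, normalization, higher-order vario functions) may differ.

```python
# Minimal sketch of a directional (first-order) variogram for a gray-scale image.
# Assumptions (not taken from GEOCLASS-image): unit pixel lag steps, the
# classical 1/(2N) semivariogram normalization, and hypothetical direction names.

# Pixel step (row, col) per direction; "diag1" runs from top left to bottom right.
DIRECTIONS = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "diag1": (1, 1),
    "diag2": (1, -1),
}


def directional_variogram(img, direction, max_lag):
    """Return [v1(1), ..., v1(max_lag)] for a 2-D list of pixel values."""
    dr, dc = DIRECTIONS[direction]
    rows, cols = len(img), len(img[0])
    values = []
    for h in range(1, max_lag + 1):
        total, n_pairs = 0.0, 0
        for r in range(rows):
            r2 = r + h * dr
            if not 0 <= r2 < rows:
                continue
            for c in range(cols):
                c2 = c + h * dc
                if 0 <= c2 < cols:
                    diff = img[r][c] - img[r2][c2]
                    total += diff * diff
                    n_pairs += 1
        # Semivariance: half the mean squared difference over all pixel pairs at lag h.
        values.append(total / (2 * n_pairs) if n_pairs else float("nan"))
    return values


def vario_features(img, max_lag):
    """Concatenate the variograms of all directions into one feature vector."""
    features = []
    for direction in DIRECTIONS:
        features.extend(directional_variogram(img, direction, max_lag))
    return features


if __name__ == "__main__":
    # Synthetic "crevasse" pattern: vertical stripes alternating every column.
    stripes = [[float(c % 2) for c in range(8)] for _ in range(8)]
    for name in DIRECTIONS:
        print(name, directional_variogram(stripes, name, 3))
```

For the synthetic stripe image, the variogram along the stripes ("vertical") stays at 0.0, while across them it alternates between 0.5 and 0.0. This is a toy analog of the directional contrast described for panel (d), where the direction parallel to the crevasses stays below the sill of the others.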

Figure 3. Airborne photographs of the four basic crevasse types, acquired during the authors' 2017 Negribreen campaign (Flight 2, 2017-07-15). (a) One-directional crevasses (DSC_0063). (b) Multi-directional crevasses (foreground, DSC_245). (c) Shear crevasses (DSC_0221). (d) Chaos or shear/chaos crevasses (DSC_0199). (e) Shear crevasses, so-called “shear holes” (DSC_0344). (f) Chaos or shear/chaos crevasses (DSC_0198), an example of chaos crevasses with a shear component.

Figure 4. Examples of “split images” of the 6 surface classes, subselected from WorldView satellite imagery. (a) Undisturbed snow/ice, (b) One-directional crevasses, (c) Multi-directional crevasses, (d) Shear crevasses (shear holes), (e) Chaos crevasses (shear/chaos) and (f) Other (not classified). The crevasse classes are the same as those illustrated in the aerial photographs in Figure 3, with a class for undisturbed snow/ice and a rest class “other” added.

Figure 5. Split Image Explorer tool in action. Red cross-hairs select the split image viewed in the upper left. Label options are displayed in the lower left. On the right, full classification and/or labeling results are shown overlaying the full starting image (here a WorldView-2 image from 2016-06-25). The sliding bar at the center left filters the displayed results by confidence level. Finally, the tool can switch between classification results for different starting images, here various WorldView images, as seen in the upper left.

Figure 6. Full WorldView imagery used in the classification time series analysis. (a) WorldView-1 acquired 2016-05-20, (b) WorldView-2 acquired 2016-06-25, (c) WorldView-1 acquired 2016-07-08, (d) WorldView-1 acquired 2017-04-25, (e) WorldView-1 acquired 2017-05-30, (f) WorldView-1 acquired 2018-04-29 and (g) WorldView-1 acquired 2018-05-26.

Figure 7. An example of overfitting from a test run of the ResNet-18 model trained with a training dataset of 1,362 split images. The loss function is the cross-entropy loss. The training loss approaches zero, whereas the validation loss stays high.

Figure 8. VarioCNN, derivation and architecture. The training data set (step (1)) is derived by an expert (structural glaciologist) and includes split images representing 6 crevasse classes. Vario functions are calculated for each input image, and the vario values activate the input nodes of VarioMLP (number of input nodes = number of vario values; the number of input nodes is a variable model parameter). VarioMLP example with two hidden layers of sizes [5,2] = [5 x 18, 2 x 18] and an output layer with 6 nodes (number of output nodes = number of crevasse classes, a variable model parameter). A retraining loop is used to augment the training images per class from step (i) to step (i+1), for n steps. The full labeled training data set is used to train ResNet-18. Note that training images must be of size 244 by 244 pixels (fixed due to ResNet-18 architecture requirements).
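The layer bookkeeping of the VarioMLP in Figure 8 can be made concrete with a small forward-pass sketch. It reads the hidden-layer shape [5, 2] as multipliers of the number of lag steps (18), as the caption's [5 x 18, 2 x 18] indicates, and it assumes one input node per vario value from 4 directions. The ReLU activations, the uniform weight initialization and the helper names (variomlp_layer_sizes, init_mlp, forward, cross_entropy) are illustrative assumptions, not the GEOCLASS-image code. Training is omitted.

```python
import math
import random


def variomlp_layer_sizes(n_lags, n_directions=4, hidden_multipliers=(5, 2), n_classes=6):
    """Layer widths: inputs, hidden layers (multiplier x n_lags), output classes."""
    n_inputs = n_directions * n_lags  # assumption: one input node per vario value
    hidden = [m * n_lags for m in hidden_multipliers]
    return [n_inputs] + hidden + [n_classes]


def init_mlp(sizes, seed=0):
    """Random weights (n_out rows of n_in values) and zero biases per dense layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Return softmax class probabilities for one vario feature vector x."""
    a = x
    for i, (weights, bias) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, a)) + b for row, b in zip(weights, bias)]
        is_output = i == len(layers) - 1
        a = z if is_output else [max(0.0, v) for v in z]  # ReLU on hidden layers only
    m = max(a)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy loss of one sample with an integer class label."""
    return -math.log(probs[label])


if __name__ == "__main__":
    sizes = variomlp_layer_sizes(n_lags=18)
    print(sizes)  # [72, 90, 36, 6]
    mlp = init_mlp(sizes, seed=1)
    probs = forward(mlp, [0.1] * sizes[0])
    print([round(p, 3) for p in probs], cross_entropy(probs, label=0))
```

In this sketch the largest softmax probability could serve as a per-split-image confidence value. Whether GEOCLASS-image derives the confidence levels filtered in the Split Image Explorer this way is an assumption.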

Figure 9. Time series of crevasse classes showing the evolution of the surge. Classification of the following images: (a) WorldView-1 acquired 2016-05-20, (b) WorldView-2 acquired 2016-06-25, (c) WorldView-1 acquired 2016-07-08, (d) WorldView-1 acquired 2017-04-25, (e) WorldView-1 acquired 2017-05-30, (f) WorldView-1 acquired 2018-04-29 and (g) WorldView-1 acquired 2018-05-26. (h) Crevasse-type legend for the classification time series.

Table 1. Instrument specifications for the WorldView-1 and WorldView-2 satellites (Maxar). Both satellites acquire high-resolution imagery with pushbroom scanners.

Table 2. List of WorldView satellite image data sets and the distribution of split images per source file in the final labeled training dataset of 3993 split images.

Table 3. The GEOCLASS-image software system: component design, implementation, testing and release, updated November 30, 2023.

Table 4. Number of split images in each of the 6 crevasse classes in the final labeled training data set of 3933 split images, derived using VarioMLP and several training and reinforcement loops.
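The loops that grew the labeled set can be sketched as a simple self-training procedure, following the retraining loop of Figure 8 (augmenting the training images per class from step (i) to step (i+1), for n steps). This is an illustration under explicit assumptions. A nearest-centroid classifier stands in for VarioMLP. Pool images are accepted when their top-class probability reaches a threshold. This selection rule, the threshold value and the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def fit_centroids(X: np.ndarray, y: np.ndarray, n_classes: int) -> np.ndarray:
    """Stand-in classifier (NOT VarioMLP): one mean feature vector per class.
    Every class needs at least one labeled example."""
    return np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])

def class_probs(centroids: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    e = np.exp(-d2 - (-d2).max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def retraining_loop(X_lab, y_lab, X_pool, n_classes, n_steps=3, threshold=0.9):
    """Grow the labeled set from step i to step i+1 for up to n_steps.

    Assumed rule: pool samples with top-class probability >= threshold are
    added with their predicted label. This sketch adds them automatically;
    an expert review step could be inserted before the concatenation.
    """
    for _ in range(n_steps):
        if len(X_pool) == 0:
            break
        probs = class_probs(fit_centroids(X_lab, y_lab, n_classes), X_pool)
        pred, conf = probs.argmax(axis=1), probs.max(axis=1)
        take = conf >= threshold
        if not take.any():
            break
        X_lab = np.concatenate([X_lab, X_pool[take]])
        y_lab = np.concatenate([y_lab, pred[take]])
        X_pool = X_pool[~take]
    return X_lab, y_lab, X_pool

# Toy two-class example with 2-D features (e.g., two vario values per image).
X_lab = np.array([[0.0, 0.0], [10.0, 10.0]])
y_lab = np.array([0, 1])
X_pool = np.array([[0.1, 0.0], [9.9, 10.0], [5.0, 5.0]])
X_out, y_out, pool_out = retraining_loop(X_lab, y_lab, X_pool, n_classes=2)
print(y_out, len(pool_out))
```

The ambiguous sample halfway between the two classes never reaches the threshold and stays in the pool. A loop like this is meant to leave such cases for expert labeling.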

Table 5. Minimum cross-entropy loss achieved by the VarioMLP model for different numbers of lag steps in the variogram calculation. The hidden-layer shape was fixed at [5, 2] for all runs.

Table 6. Minimum cross-entropy loss achieved by the VarioMLP model for each of the 5 hidden-layer shapes tested. The number of variogram lag steps was fixed at 18 for all runs. Beyond [5, 2], adding both depth and width to the hidden layers decreases validation performance.

Table 7. Minimum validation loss achieved on the full-sized validation dataset of 786 split images using batch sizes of 1, 2, 4 and 8. The best result was obtained with a batch size of 2, a slight improvement over a batch size of 1.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
