A Preliminary Fuzzy Inference System for Predicting Atmospheric Ozone in an Intermountain Basin

Preprint

Article

A peer-reviewed article of this preprint also exists.

This version is not peer-reviewed

Submitted:

07 September 2024

Posted:

09 September 2024

Abstract

Unhealthy concentrations of ozone in the Uinta Basin, Utah, can occur after sufficient snowfall and a strong atmospheric anticyclone create a persistent cold pool that traps atmospheric ozone and its precursors emitted from oil and gas operations. The winter-ozone system has two clear outcomes (occurrence or not) that are well understood by domain experts and supported by archives of atmospheric observations. Rules of the system can be formulated in natural language (“sufficient snowfall and high pressure lead to high ozone”), lending the system to analysis with a fuzzy-logic inference system. This method encodes human expertise as machine intelligence in a more constrained manner than alternative, more complex inference methods such as neural networks, increasing user trust in our model prototype before further optimization. Herein, we develop an ozone-forecasting system, Clyfar, based on knowledge of system dynamics and informed by an archive of meteorological conditions and ozone concentration. The inference system demonstrates proof-of-concept despite rudimentary tuning. We describe our framework for predicting future ozone concentrations if input values are drawn from numerical weather prediction (NWP) forecasts as a proxy for observations as the system’s initial conditions. Our model is computationally cheap, allowing us to sample uncertainty with substantially more ensemble members than in traditional NWP. We evaluate hindcasts for one winter, finding our prototype demonstrates promise to deliver useful guidance for users concurrent with optimization of system parameters using machine learning.

Keywords:

Subject: Environmental and Earth Sciences - Pollution

1. Introduction

High, unhealthy concentrations of ozone in the Uinta Basin, Utah [1] (Figure 1) within the U.S. Intermountain West can occur in some winters. If substantial snow cover persists in the wake of a snow-bearing extratropical cyclone, in tandem with increasing surface pressure, a persistent cold pool may form that traps emissions from oil and gas operations [2,3,4]. Insolation then drives ozone production through photolysis of these precursor pollutants, primarily nitrogen oxides (NO_x) and volatile organic compounds (VOCs). High ozone is typically an urban summertime problem due to intense human activity ([5] pp. 90–95, 884). However, the mechanism is different in locations with winter episodes that are dependent on snow [6] such as the Uinta Basin winter-ozone (UBWO) system [2,3]. The Uinta Basin is one of only two locations in North America with winter ozone episodes [7] due to the delicate balance of latitude, elevation, and terrain shape that enables simultaneous persistent snow cover and insolation strong enough to raise ozone concentration to unhealthy levels [6].

While snowfall predictions are difficult due to sensitivity to temperature and altitude in mountain regions, the UBWO physical system is well understood by domain experts [2,4,8,9]. After snowfall, the prime prerequisite, the UBWO system can evolve into two possible states: the development of ozone concentrations that exceed the U.S. Environmental Protection Agency (EPA) regulation, or not. Quantities such as snow depth, ozone concentration, and wind speed are continuous and subject to error, and rules of the system behavior can be described with adverbs of degree (e.g., “quite”, “sufficient”). The scientific problem of modeling a well understood physical system with sparse, imperfect data lends itself to a fuzzy-logic inference prediction system [10,11]. Fuzzy logic, an unfamiliar but relatively elemental form of machine intelligence, swaps familiar two-valued logic (True or False) for a continuum between those Boolean limits of zero and unity. Hence, snowfall can be partly sufficient and partly not, whereas this nuance is lost if Boolean logic is used (i.e., a snowfall is entirely sufficient or not).

The EPA regulates ozone concentration via National Ambient Air Quality Standards (NAAQS). We define a high-ozone episode as occurring when our representative observation for daily maximum ozone concentration in the Uinta Basin exceeds the 70-ppb NAAQS limit. Multiple exceedances of this threshold can trigger sanctions, leading to limits on and higher costs for industrial development. As such, forecasts of cold-pool and high-ozone events are critical to warn the oil and gas industry, protect public health, and support the regional economy. Since winter-ozone episodes only occur during relatively rare and particular meteorological conditions, useful predictions would better inform decision-makers responsible for reduction in emissions. The Ozone Alert system, run at the Bingham Research Center (BRC) in Vernal, Utah (Figure 1) since 2017, provides qualitative winter ozone forecasts to a network of over 100 oil and gas operators, other stakeholders, and local residents. The program followed a request by oil and gas companies, and members of regional oil and gas trade associations are encouraged to participate. However, the system is entirely manual and disseminated via a one-way email list. Seeking improved guidance to support the Ozone Alert program, we aim ultimately to replace some of the workload and subjectivity in the status quo of a manually administered email list.

1.1. Seeking an Alternative to Traditional Air-Quality Models

An obstacle to issuing accurate ozone forecasts stems from inability of traditional grid-point numerical weather-prediction (NWP) models to capture cold pools [3,12], snowfall [13], and when coupled with atmospheric-chemistry models, high concentrations of ozone [14,15]. Further, approximations of sub-grid-scale processes can perform poorly in mountainous regions [16], hence forecast systems might be better developed specifically for mountainous application [17] given compounding errors in atmospheric chemistry and dynamical processes [18]. Observational data is relatively sparse in the Basin for radar and in-situ observations (for example, Basin-level snowfall depth in Figure 1), posing an issue for training and/or evaluation of grid-based prediction systems.

More importantly, there is an unavoidable trade-off between sampling uncertainty in a forecast (achieved by a Monte Carlo or ensemble system) and the fineness of grid spacing (hence better resolving elements such as shallow cold pools [19]). Any physical NWP model must resolve complex terrain to capture the persistent cold pools that form within it; this is generally taken to require a horizontal grid spacing finer than $\Delta x = 3$ km in NWP models e.g., [20]. However, increasing the fineness of NWP grid spacing in two or three spatial dimensions rapidly raises computational demand when further considering the reduction in timestep to satisfy the Courant–Friedrichs–Lewy criterion (i.e., information does not cross more than one grid cell per timestep). Given finite compute resources, a resolution increase reduces the maximum number of ensemble members, and in turn, the ensemble prediction system more sparsely samples the uncertainty of future states, increasing the risk of an extreme event not captured by this limited set of forecast members.
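The cost argument above can be made concrete with a back-of-envelope sketch. It assumes, purely for illustration, that compute cost scales with the number of grid points times the number of timesteps, and that the timestep shrinks in proportion to the grid spacing to respect the Courant–Friedrichs–Lewy criterion. The function names, the 12 km baseline, and the 128-member budget are hypothetical and not taken from the manuscript.

```python
def relative_cost(refinement: float, spatial_dims: int = 2) -> float:
    """Cost multiplier when the grid spacing is divided by `refinement`.

    Assumption: cost ~ (grid points) x (timesteps), with the timestep
    shrinking in proportion to grid spacing (CFL criterion).
    """
    if refinement <= 0:
        raise ValueError("refinement must be positive")
    return refinement ** (spatial_dims + 1)


def affordable_members(baseline_members: int, refinement: float,
                       spatial_dims: int = 2) -> int:
    """Ensemble members affordable at a fixed compute budget after refinement."""
    return int(baseline_members // relative_cost(refinement, spatial_dims))


if __name__ == "__main__":
    # Hypothetical: refine horizontally from 12 km to 3 km (factor of 4).
    print(relative_cost(4.0))            # 4**3 = 64.0
    print(affordable_members(128, 4.0))  # 128 // 64 = 2
```

Under these assumptions a fourfold horizontal refinement costs 64 times more, so a fixed budget that supported 128 members supports only 2; refining the vertical grid as well (`spatial_dims=3`) steepens the penalty further.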

This reduction in ensemble membership comes despite more members being required to capture the variation in finer-scale phenomena captured by a finer-scale model [21]. Wind flow across the Basin, a complex landscape carved with canyons and surrounded by mountain ranges, is subject to diurnal reversal of slope flows, channeling in the canyons, and other small-scale patterns that cannot be observed with the sparse network of observations in the Basin [2]. While fine resolution is required to capture the mechanisms leading to cold pools and high ozone in the UBWO system, the intricacy of flow streamlines across the Basin is an unknown unknown, not captured by sparse observations, yet a high-resolution model will produce a prediction thereof. This appears irreconcilable for resolving a shallow planetary boundary layer such as a UBWO cold pool under high uncertainty. What are our alternatives to this traditional configuration of NWP ensembles?

1.2. From Machine Intelligence to Ozone Prediction

Atmospheric scientists are quickly embracing state-of-the-art methods in AI suitable for operational forecasting e.g., [22,23], including those relevant to air quality e.g., [24,25]. Alternatives to traditional NWP can range in complexity from simpler statistical relationships [26] to pure deep-learning AI models [27,28,29]. Through the Information Age, AI and machine learning (data-focused) techniques have become more powerful and accessible through open-source software such as scikit-learn [30], large language models (LLMs) such as ChatGPT (chatgpt.com, accessed 1 August 2024), PaLM [31,32] and BLOOM [33], and so on. While powerful models have never been more accessible, a potential pitfall is black-box behavior where the human supervisor cannot fully trust generated output because they are not sure how the conclusion was reached [34]. Adopting an Ockham’s razor approach to constructing a FIS e.g., [35], herein we seek the simplest model that gives useful guidance, and no simpler; at this point, developers may use post-processing or deep learning to fine-tune model performance by optimizing parameters [36,37].

We outline below a prototype ozone-concentration prediction model for the UBWO system, which implements a fuzzy-logic inference system (FIS) that infers the possibility of a cold pool from meteorological input. Its rules are drawn from human expertise and archived observations. We refer to our fuzzy-logic prediction system as Clyfar. This is Welsh for “clever”, reflecting our focus on codifying human expertise as machine intelligence, and is a loose abbreviation of “Computational Logic Yielding Forecasts for Atmospheric Research”.

2. Data and Methodology

2.1. Data Sources and Pre-Processing

We obtained atmospheric measurements from the compilation of sensor networks archived by Synoptic Weather (www.synopticdata.com, accessed 1 July 2024), a spin-off from MesoWest [38]. The geographical domain is a 72-km (45-mile) radius around Pelican Lake (UCL21) with coordinates (40.1742, -109.6666), shown by the red circle in Figure 1. As we will use a rule-based system where permutations of variables and their categories build rapidly, we limit this preliminary version of Clyfar to four input variables deemed most important for predicting high ozone concentrations in the Basin:

Snow cover
Mean sea-level pressure (MSLP)
Insolation
Surface wind

The rationale for the above might be summarized as “after a heavy snowfall, if wind calms under a strengthening high-pressure system and skies are mostly clear, ozone is possible”. Future iterations of Clyfar may include additional variables such as ground heat flux (available for snow melt), actinic irradiation [3], and a “memory” of cold-pool strength and ozone concentration.

Our output (target) variable, Uinta Basin daily ozone-concentration maximum, is defined for the local time-zone period of midnight to midnight. We must therefore engineer representative input variables in the same time period. Observation stations can have different suites of atmospheric sensors, and use of only one station leaves the analysis susceptible to spurious error. We therefore use the following functions to reduce observation sets to a Basin representative value configured after extensive preliminary testing and discussion between domain experts:

Snow-cover data are sparse in the Basin (stations reporting snow depth at Basin level are marked with black squares in Figure 1), where most stations are operated by volunteers in the Cooperative Observation Program (COOP; https://www.ncei.noaa.gov/products/land-based-station/cooperative-observer-network, accessed 1 July 2024). A station that reports once a day may not sample at a time most representative for that solar day. Therefore, our snow value is the 90th percentile of the set of maximum snow-depth reports from Basin-floor stations on the COOP network that report at least once a day.
Raw pressure data is reduced to mean sea-level pressure (MSLP) on Synoptic Weather’s server before download, and we use the median value from all stations’ daily maximum as representative. The computation of MSLP becomes less reliable with height, and preliminary work revealed absolute values of MSLP in the dataset to be excessively large. The excessive MSLP values appear to be a systematic, additive offset that does not preclude good performance (not shown). Current work is investigating alternative calculations of MSLP and the source of high bias.
Insolation is affected by both optical depth (humidity and clouds; particulate matter) and the solar angle. Passing clouds make the data temporally variable, and spatially, higher elevation stations will receive more radiation under clear skies. To generate a representative value for the Basin, we employ a “near-zenith mean" that takes the mean downwelling solar radiation for each station between 1000 and 1400 local time. From this set of all stations, we then take the median value.
Wind data. We want to identify wind strong enough to disperse pollutants and/or the cold pool, while ignoring transient gusts from storms (mainly a result of evaporative cooling and attendant downdrafts). Hence, we assume Vernal Regional Airport (KVEL) is representative and take its daily median 10-meter maximum reported wind value, with the benefit of a long, reliable archive of observations. The airport is approximately 4.5 km (2.8 miles) from the nearest foothills east of the runway, and even further from canyon exits north of the town. As such, we neglect effects from downslope winds, drainage flows, or wind funneling; we take KVEL wind reports as representative of the Basin as a whole. Future versions will consider more stations’ reports.
Ozone data. While internal data show there is occasionally considerable variation in ozone concentrations from west to east in the Basin (not shown), for the purpose of this initial study we choose one value by taking the 99th percentile at each station and then the median of this set.
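The reductions listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the Clyfar pre-processing module: the function names, the dictionary-of-lists input layout, the linear-interpolation percentile, and the inclusive 1000 to 1400 window are all assumptions. The single-station wind reduction at KVEL is omitted.

```python
from statistics import mean, median


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between ranks (assumed)."""
    data = sorted(values)
    if not data:
        raise ValueError("no data")
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def representative_snow(station_daily_max_mm):
    """90th percentile of Basin-floor stations' daily maximum snow depth."""
    return percentile(station_daily_max_mm, 90)


def representative_mslp(station_series_hpa):
    """Median across stations of each station's daily maximum MSLP."""
    return median(max(series) for series in station_series_hpa.values())


def representative_solar(station_series_wm2):
    """Median across stations of the 'near-zenith mean' insolation.

    Each series is a list of (local_hour, value) pairs; treating both
    1000 and 1400 local time as inclusive is an assumption.
    """
    means = []
    for series in station_series_wm2.values():
        window = [value for hour, value in series if 10 <= hour <= 14]
        if window:
            means.append(mean(window))
    return median(means)


def representative_ozone(station_series_ppb):
    """Median across stations of each station's 99th-percentile ozone."""
    return median(percentile(s, 99) for s in station_series_ppb.values())
```

Each function takes one local day of reports and returns a single Basin-representative value, matching the midnight-to-midnight window of the target variable.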

2.2. Fuzzy Logic: Background and Justification

Fuzzy logic differs from traditional two-valued logic (True or False) by allowing variables to have continuous set membership between 0 and 1. For example, in traditional logic, we might categorize a day as either "rainy" or "not rainy" based on a fixed threshold of precipitation. However, this binary classification fails to capture the nuances of weather conditions. Fuzzy logic allows us to define a “rainy day" as a continuous spectrum:

0 mm (trace) rain: Definitely not rainy (membership = 0)
0.1 mm of rain: Mostly not rainy (membership = 0.1)
1 mm of rain: Somewhat rainy (membership = 0.5)
5 mm of rain: Quite rainy (membership = 0.9)
over 10 mm of rain: Definitely rainy (membership = 1)

This approach allows for a more nuanced representation of weather conditions, where a day with 1 mm of rain is not simply “not rainy” or “rainy” but rather “somewhat rainy”, with a membership of 0.5 in the “rainy” set. Fuzzy logic has many advantages over bivalent logic. While its use in consumer products and control systems has integrated with AI and ML techniques [39,40,41], the philosophy of fuzzy-set theory still holds and is still deployed in many applications outside of control systems, such as predictions of rainfall [42] and fog [43] for meteorological examples. Output from FISs has numerous advantages, such as lower sensitivity to small perturbations versus probabilistic models due to smoothing [44], and acceptance of conflicting information [44]. Output can also be considered an upper bound on probability [44], usually preferred by risk-averse users.
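The rainfall example can be written as a membership function. The sketch below passes through the five anchor points listed above and, as an assumption not stated in the text, interpolates linearly between them. The function and constant names are illustrative.

```python
from bisect import bisect_right

# Anchor points from the rainfall example: (rain in mm, membership in "rainy")
RAIN_MM = [0.0, 0.1, 1.0, 5.0, 10.0]
MEMBERSHIP = [0.0, 0.1, 0.5, 0.9, 1.0]


def rainy_membership(rain_mm: float) -> float:
    """Degree of membership in the 'rainy' set, between 0 and 1."""
    if rain_mm <= RAIN_MM[0]:
        return MEMBERSHIP[0]
    if rain_mm >= RAIN_MM[-1]:
        return MEMBERSHIP[-1]
    i = bisect_right(RAIN_MM, rain_mm)  # index of first anchor above rain_mm
    x0, x1 = RAIN_MM[i - 1], RAIN_MM[i]
    y0, y1 = MEMBERSHIP[i - 1], MEMBERSHIP[i]
    # Linear interpolation between neighbouring anchors (an assumption)
    return y0 + (y1 - y0) * (rain_mm - x0) / (x1 - x0)


def rainy_boolean(rain_mm: float, threshold_mm: float = 1.0) -> bool:
    """Two-valued counterpart with a hypothetical fixed threshold."""
    return rain_mm >= threshold_mm
```

With a 1 mm threshold, the Boolean version treats 0.9 mm and 1.1 mm as opposite outcomes, whereas the fuzzy memberships for those two amounts differ only slightly.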

We can encode nuance in our ruleset with membership functions that determine how much a given input value belongs to a particular category of the variable. For instance, in our rainfall example, we might define overlapping membership functions for each variable’s category, where the observed rainfall might have partial membership in multiple sets, allowing the system to reckon with ambiguity or conflict.

We can use domain expertise and archived observations to determine numerical values for adjectives/adverbs when creating a ruleset for the system at hand. For example, researchers might use their experience and historical data to define what constitutes "high pressure" or "calm wind" in a particular region, translating these linguistic terms into specific membership functions. The use of a FIS is motivated by multiple characteristics of the UBWO system:

The formation of UBWO cold pools—and usually high ozone concentration—is a well known system, but hinges on sufficient snowfall. As a complex system with two basins of attraction, sensitivity of cold-pool formation is lower when snow is either absent or very deep, whereas near the cusp of the two potential future states (near the bifurcation point), chaotic growth means small changes grow rapidly [45,46]. Setting and predicting representative values of snow depth is difficult due to drifting snow, sparse data observations, and inherent limitations of human knowledge and ability to represent UBWO system complexity. Fuzzy logic effectively smooths some noise, making its behavior more resilient in presence of error [44], trading some specificity for the estimate of uncertainty.
Clyfar is intended to evolve as an AI system: with ongoing development and optimization, its complexity can be increased to optimize output utility to Ozone Alert forecasters and decision makers. Machine learning techniques can be deployed with rulesets and parameter tuning [39] to leverage benefits from different AI/ML techniques, while the FIS ruleset remains understandable by the human.
Capturing both complex terrain and uncertainty is a trade-off when running expensive NWP models. As grid spacing becomes finer, timesteps between integrations must become closer together, and we might consider a finer grid in the vertical direction to better capture shallow cold pools in simulations. However, a rare event (e.g., a heavy snowfall that occurs 1 in 5 winters) requires ample sampling of the uncertainty distribution. The fewer members in a forecast ensemble, the less chance of capturing the true nature of uncertainty, and the more difficult to calibrate the system to optimize balance between sharpness and reliability of uncertainty estimates. Further, fine-scale atmospheric flow and state is an unknown unknown: a high-resolution NWP model may be overkill. However, we lack the observations to diagnose such a scenario: the so-called curse of dimensionality. Running many lightweight statistical simulations may better spend finite computer resources than on unfalsifiable and demanding high-resolution NWP models.

We are further motivated to use a FIS to follow best practices of explainable AI [34], albeit fuzzy logic being only an elementary form of AI [47,48]. A FIS encodes domain knowledge explicitly, enabling explainable and transparent construction of its workings and can be extended with a fuzzy neural network e.g., [49] or fine-tuned with deep-learning e.g., [50]. Herein, we create a prototype model to demonstrate potential of forecasting ozone concentrations for the purpose of automation, optimization, and greater insight into UBWO-system behavior. Comprehensive reviews of fuzzy logic can be found in, e.g., [51]. We perform inference with the so-called Mamdani method, which the authors found more accessible than, e.g., the Sugeno method; the choice of inference is outside the scope of this manuscript but the method is discussed further in [52] and references therein.

3. Configuration of Clyfar: A Fuzzy Inference System for Ozone Prediction

Written entirely in Python, Clyfar comprises a module for pre-processing input data, an inference system based on a fuzzy ruleset, and a planned post-processing module that will optimize output further based on observations.

We define membership functions for each category in each variable informed by an archive of meteorological and ozone-concentration observations. Though the authors had access to 20 years of data for this region, the present study will focus on the winter of 2021/2022 as an illustrative case study to demonstrate the promising (but mixed) results of our prototype. To simplify our prototype for sake of understanding, we restrict our system to four input weather variables with ozone as the sole output variable. Further Clyfar iterations will consider more rules and variables. The authors stress this single winter is not a representative evaluation of long-term performance, but a foundation for future versions via lessons learned.

3.1. Overview of Approach

Some users seek a deterministic forecast, perhaps interpreted as a hedged ‘best guess’. However, other decision-makers benefit from information about uncertainty, increasing the chances of detecting an early, low-risk, high-impact event [53,54] by accounting for chaotic error growth [45,55]. Inference of both a single value and uncertainty distribution follows this method:

Pre-process observational data to create a representative value of the Basin state per input variable and time (feature engineering).
Define Membership Functions: Define the distribution of membership of the variable to a category (“adverbs of degree”, e.g., sufficient snow). These functions (curves) map the input data (e.g., 250 mm snow) to their corresponding fuzzy sets with non-zero memberships (e.g., 1.0 sufficient snow).
Construct Fuzzy Rules: Develop a set of if–then rules that define the relationship between input and output variables based on domain expertise (e.g., "Sufficient snow and calm winds lead to elevated ozone.")
Fuzzification: Convert the crisp input values into fuzzy values using the defined membership functions. For instance, snowfall at the cusp of negligible and sufficient for cold-pool formation will have non-zero membership to both categories.
Apply Inference Rules: For each fuzzy rule, we compute an activation in the range $[0, 1]$ of the target variable’s category. We use the fuzzy "AND" operator to combine multiple activations with an infimum (a minimum in finite sets). This matches the intuition that it is harder to activate multiple rules at a higher level. Further, "OR" operators are combined with the supremum (maximum), and this is used to create an aggregated activation or possibility distribution [44].
Possibility distribution: the supremum is also used to aggregate the rule outputs (i.e., the maximum value from each rule output for each point in the output’s numerical range). Then each category has an activation level that represents a possibility [56,57], conceptually an upper bound on probability [44,58] that can be considered a likelihood (but not a probability);
Defuzzification: To generate a single, deterministic value in native units, we convert the aggregated activation distribution back into crisp values using defuzzification methods such as the centroid method (a sort of weighted average or center-of-gravity). We might also preserve the possibility distribution by skipping this final step.
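The steps above can be sketched end to end as a toy Mamdani-style inference with two inputs and two output categories. The crossover points of 100 mm and 2.5 m/s follow the values quoted later in the text; every slope, Gaussian center and width, the ozone universe of 20 to 100 ppb, and the two-rule ruleset are hypothetical simplifications and not the Clyfar configuration.

```python
import math


def sigmoid(x, center, slope):
    """Rising sigmoid membership that equals 0.5 at `center`."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


def gaussian(x, center, width):
    """Gaussian membership peaking at 1.0 at `center`."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


# Hypothetical universe of discourse for ozone: 20 to 100 ppb in 0.5 ppb steps
OZONE = [20.0 + 0.5 * i for i in range(161)]


def infer_ozone(snow_mm, wind_ms):
    """Return (category activations, crisp ozone in ppb) for a toy FIS."""
    # Fuzzification: crisp inputs to memberships (slopes are hypothetical)
    snow_sufficient = sigmoid(snow_mm, 100.0, 0.05)
    snow_negligible = 1.0 - snow_sufficient
    wind_breezy = sigmoid(wind_ms, 2.5, 3.0)
    wind_calm = 1.0 - wind_breezy

    # Rules: OR uses the maximum, AND uses the minimum
    act_background = max(snow_negligible, wind_breezy)
    act_elevated = min(snow_sufficient, wind_calm)

    # Clip each output curve at its activation, then aggregate with the maximum
    aggregated = [
        max(min(act_background, gaussian(o, 40.0, 6.0)),
            min(act_elevated, gaussian(o, 75.0, 8.0)))
        for o in OZONE
    ]

    # Defuzzification: centroid (weighted average) of the aggregated curve
    crisp = sum(o * m for o, m in zip(OZONE, aggregated)) / sum(aggregated)
    return {"background": act_background, "elevated": act_elevated}, crisp


if __name__ == "__main__":
    print(infer_ozone(250.0, 1.0))  # deep snow, calm wind
    print(infer_ozone(50.0, 4.0))   # thin snow, breezy
```

Skipping the final centroid step and returning only the activations preserves the possibility distribution, as noted in the last list item.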

To gauge performance of Clyfar, we will compare inferred values (resembling forecasts) with observed ozone concentrations. Our system is assumed stationary, therefore the model should capture the UBWO key behavior with observations before forecasts can be issued. As there is no machine learning occurring at this stage of the FIS, there is no concern in training and testing over the same dataset.

3.2. Pre-Processing and Membership Functions

Input variables were chosen by inspecting our archive of observations as detailed in Section 2. Clusters or bifurcations in scatter plots of daily representative values of ozone concentration against various input variables, as shown for wind speed in Figure 2, represent potential regimes or areas of nonlinear behavior in the UBWO atmospheric state known to domain experts. For instance, Figure 2 shows that even a moderate wind speed can disperse the pollution and lower concentrations. In the following figures, the x-axis represents the possible range considered by the inference system (also called the universe of discourse); values outside of either bound are clipped to the appropriate minimum or maximum.

We construct membership functions as follows (shown in Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7):

Wind speed. As seen in Figure 2, exceedance events in winter 2021–2022 only occurred if the representative wind was calm enough. Preliminary testing showed this was common to numerous stations and seasons, matching domain expertise. We chose two opposing sigmoid distributions crossing close to 2.5 ${m s}^{- 1}$, as advised by observations and adjusted slightly during preliminary testing.
Snow depth. Similarly to the wind variable, we choose two opposing sigmoid functions that cross around a region of “sufficient snow”. This is around 100 mm (3.9 inch). Although the two variables are difficult to compare directly, the snow sigmoids are shallower, giving more overlap across the depths most frequently observed in the UBWO system (cf. the inset of Figure 4) to represent more uncertainty around what constitutes “sufficient” snow depth.
Mean sea-level pressure (MSLP). Rising pressure behind a snow storm reinforces the surface anticyclone in cold air, often in tandem with warm air advection aloft e.g., [12]. We choose three categories: two extremes are conducive to dissipation or formation of cold pools, while the middle category essentially increases specificity (an additional membership function curve) at the cost of increasing the ruleset complexity. Regarding magnitudes of mean sea-level pressure (MSLP), values appear too high, perhaps due to calculation error, but preliminary testing showed no obvious errors. This will be adjusted in future. The authors also tested for sensitivity to normalization of input data (i.e., pressure in [0,1]) due to the large gap in ranges between MSLP and the other variables. There was no observed improvement in performance, with some loss of transparency due to the required transform to and from the normalized range [0,1].
Solar insolation. The authors found the most subjective uncertainty and sensitivity when considering downwelling solar radiation, which is critical for photolysis and the process leading to unhealthy ozone concentrations. Solar insolation measured at the surface is highly sensitive to cloud cover, factored nonlinearly by the time of day at which solar obscuration occurred. A further complexity in the ozone–insolation relationship is that increasing insolation increases photolysis and ozone production, but eventually mixes out the cold pool due to melting snow and thermal mixing of the planetary boundary layer. We encode this large uncertainty with larger overlap of membership functions (Figure 6). We define four periods to reflect the four main months of the UBWO system (December to March inclusive) and to parallel the ozone output categories discussed next. We label the solar insolation categories as seasons as these ranges are typical of those seasons in the Uinta Basin. There is much overlap between a cloudy spring day and a clear mid-winter’s day in terms of insolation. Given the importance of actinic irradiation to the UBWO [3], these estimates may be required to narrow bounds of uncertainty regarding photolysis rates.
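To illustrate how sigmoid slope controls the overlap between two opposing categories, and how inputs are clipped to the universe of discourse, the sketch below builds an opposing pair and measures the input range over which both categories are non-negligibly active. The crossover values of 2.5 m/s and 100 mm come from the text; the slopes, universe bounds, and the 0.1 membership level are hypothetical choices for illustration.

```python
import math


def sigmoid(x, crossover, slope):
    """Rising sigmoid membership that equals 0.5 at `crossover`."""
    return 1.0 / (1.0 + math.exp(-slope * (x - crossover)))


def opposing_pair(x, crossover, slope, lower, upper):
    """Memberships (low category, high category) for two opposing sigmoids.

    The input is first clipped to the universe of discourse [lower, upper].
    """
    x = min(max(x, lower), upper)
    high = sigmoid(x, crossover, slope)
    return 1.0 - high, high


def overlap_width(slope, level=0.1):
    """Width of the input range where both categories have membership >= level.

    Valid for 0 < level < 0.5; follows from inverting the sigmoid at `level`.
    """
    return 2.0 * math.log((1.0 - level) / level) / slope


if __name__ == "__main__":
    # Hypothetical slopes: steep for wind (per m/s), shallow for snow (per mm)
    print(opposing_pair(2.5, 2.5, 3.0, 0.0, 10.0))    # (0.5, 0.5) at the crossover
    print(opposing_pair(100.0, 100.0, 0.05, 0.0, 500.0))
    print(overlap_width(3.0), overlap_width(0.05))    # in m/s and mm respectively
```

Because the two widths are in different units, they are not directly comparable, which echoes the caveat in the snow-depth item; the point is only that a shallower slope widens the band of inputs with partial membership in both categories.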

The output is ozone concentration in four categories: background, moderate, elevated, and extreme. We choose not to match the EPA Air Quality Index categories (https://www.airnow.gov/aqi/aqi-basics/, accessed 1 August 2024) but instead opt for fewer categories to focus on understandable FIS configuration. The choice of four categories strikes a balance between complexity (required to capture extremes) and simplicity (to reduce the size of the FIS ruleset). Not all permutations of these rules are needed as they are either physically inconsistent (e.g., snow is sufficient and solar is summer) or already captured by another rule (optimizing the ruleset is outside the scope of the current text). Again, it is human expertise that can handpick or modify rules, adding trustworthy complexity. Next, we use relationships between input variables and ozone concentration samples to determine membership functions. The membership functions are constructed using Gaussian or sigmoid functions (Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7). See Appendix A for details on function construction for each variable and category. Below, we italicize variable categories (e.g., sufficient snow) to differentiate fuzzy-variable descriptors from body text.

3.3. Ruleset of UBWO Behavior

In natural language, we can describe the UBWO system with the following rules. We define a limited list known to human experts [2,4,8,9] and the list does not exhaust the permutations of all variables and categories for our Clyfar prototype:

If there is negligible snow, or pressure is low, or wind is breezy, then the ozone level will be at background levels. This is because pollutants are blown away from the region of interest;
If there is sufficient snow, and if pressure is high, and if wind is calm, and if the solar radiation is typical for spring, then the ozone level will be extreme (typical high-ozone case).
If there is sufficient snow, and if pressure is high, and if wind is calm, and the solar radiation is typical for winter, then the ozone level will be elevated. There is still sufficient sunlight for photolysis to build ozone to unhealthy levels, but it may take longer to build, for example.
If there is sufficient snow, and if pressure is high, and if wind is calm, and the solar radiation is low (midwinter) or high (summer), then the ozone level will be moderate.
If there is sufficient snow, and if pressure is average, and if wind is calm, and the solar radiation is low to moderate (winter into spring), then the ozone level will be elevated.
If there is sufficient snow, and if pressure is average, and if wind is calm, and the solar radiation is lowest (midwinter) or highest (late spring into summer), then ozone level will be moderate. This is because insolation is either too weak for prolific ozone generation, or so strong it may mix out the boundary layer.

We render this ruleset using logic operators in Appendix B.
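As a complement to the natural-language list, the six rules can be sketched with minimum for "and" and maximum for "or", taking the maximum again when two rules share an output category. This is an illustrative encoding and not a reproduction of Appendix B: the dictionary keys, in particular the solar category names (midwinter, winter, spring, summer), and the example membership values are assumptions.

```python
def apply_rules(m):
    """Map input memberships (dict of values in [0, 1]) to ozone activations."""
    snow = m["snow_sufficient"]
    calm = m["wind_calm"]
    p_high, p_avg = m["mslp_high"], m["mslp_average"]
    weak_or_strong_sun = max(m["solar_midwinter"], m["solar_summer"])

    rules = [
        # Rule 1: negligible snow OR low pressure OR breezy wind
        ("background", max(m["snow_negligible"], m["mslp_low"], m["wind_breezy"])),
        # Rules 2-4: sufficient snow AND high pressure AND calm wind AND solar
        ("extreme", min(snow, p_high, calm, m["solar_spring"])),
        ("elevated", min(snow, p_high, calm, m["solar_winter"])),
        ("moderate", min(snow, p_high, calm, weak_or_strong_sun)),
        # Rules 5-6: as above but with average pressure
        ("elevated", min(snow, p_avg, calm,
                         max(m["solar_winter"], m["solar_spring"]))),
        ("moderate", min(snow, p_avg, calm, weak_or_strong_sun)),
    ]

    activation = {"background": 0.0, "moderate": 0.0,
                  "elevated": 0.0, "extreme": 0.0}
    for category, strength in rules:
        # Rules sharing a consequent are aggregated with the maximum
        activation[category] = max(activation[category], strength)
    return activation


if __name__ == "__main__":
    example = {  # hypothetical memberships
        "snow_negligible": 0.0, "snow_sufficient": 1.0,
        "mslp_low": 0.0, "mslp_average": 0.2, "mslp_high": 0.8,
        "wind_calm": 0.9, "wind_breezy": 0.1,
        "solar_midwinter": 0.0, "solar_winter": 0.3,
        "solar_spring": 0.7, "solar_summer": 0.0,
    }
    print(apply_rules(example))
```

Keeping the rules as a plain list of (category, strength) pairs means a domain expert can add, remove, or reweight a rule without touching the aggregation logic.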

4. Illustrative Examples

Here, we assess our system with synthetic examples to demonstrate expected behavior for four scenarios whereby unhealthy levels of ozone are deemed (1) likely, (2) unlikely, (3) on the cusp of occurring or not, and (4) an implausible scenario of snow in summer.

4.1. Case 1: Ozone Likely

We begin with an example in a situation where ozone levels are expected to be higher than the NAAQS limit, given deep snow, high pressure, weak winds, and insolation strong enough to instigate ample ozone production but weak enough not to mix out the cold pool.

snow = 250 mm (9.8 inches)
mslp = 1045 hPa
wind = 1.0 ${m s}^{- 1}$
solar = 640 ${W m}^{- 2}$

The crisp value predicts an ozone level of 75 ppb. Looking at the four categories, there is little support (possibility) for background and elevated levels of ozone, but strong possibility for extreme levels. The centroid method is used to generate a most-likely value, but by its nature of computing a weighted average over the rule-activation aggregation, it cannot generate a crisp value to the right of the extreme Gaussian curve’s center value.
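The centroid limitation can be checked numerically. The sketch below clips a single Gaussian output curve at several activation levels and computes its centroid. The center of 90 ppb, width of 10 ppb, and universe of 20 to 125 ppb are hypothetical stand-ins for the extreme category, chosen so that the universe extends further to the left of the center than to the right.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership peaking at 1.0 at `center`."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def centroid(xs, memberships):
    """Center of gravity of a sampled membership curve."""
    return sum(x * m for x, m in zip(xs, memberships)) / sum(memberships)


# Hypothetical universe and 'extreme' category parameters
UNIVERSE = [20.0 + 0.5 * i for i in range(211)]  # 20 to 125 ppb
EXTREME_CENTER, EXTREME_WIDTH = 90.0, 10.0


def crisp_for_extreme_only(activation):
    """Centroid when only the extreme rule fires, with 0 < activation <= 1."""
    clipped = [min(activation, gaussian(o, EXTREME_CENTER, EXTREME_WIDTH))
               for o in UNIVERSE]
    return centroid(UNIVERSE, clipped)


if __name__ == "__main__":
    for level in (0.2, 0.5, 1.0):
        print(level, crisp_for_extreme_only(level))
```

Even at full activation the centroid sits at or just below the Gaussian center in this configuration, and any activation of a lower category can only pull it further left, so the crisp output saturates no matter how strongly the extreme rule fires.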

4.2. Case 2: Ozone Unlikely

Next, we consider a case unlikely to yield ozone above a typical background level. We prescribe a thin snow depth and a breeze that would likely blow a portion of pollutants from the Basin and/or initiate mechanical mixing of the cold pool and dissipation into the free troposphere.

snow = 50 mm (2.0 inches)
mslp = 1025 hPa
wind = 4.0 $\mathrm{m\,s^{-1}}$
solar = 600 $\mathrm{W\,m^{-2}}$

The inferred ozone concentration suggests it is entirely possible (likely) that ozone remains near background levels. The impossibility of any outcome other than background is triggered by Rule 1 (breezy wind → background ozone). We recall that possibilities across categories can sum to more than one, unlike probabilities. Hence, not only is the possibility (activation) of background close to 1.0, but the possibilities of the other categories are also equal or close to zero. A background level is not only totally possible but entirely necessary due to the impossibility of all other outcomes. Further information on possibility and necessity (dual measures that represent bounds on uncertainty) is found in [44,57].
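As a minimal sketch of this duality, the necessity of a category can be computed as one minus the largest possibility among all other categories, which is the standard definition in possibility theory. The function name and the possibility values below are hypothetical illustrations rather than Clyfar output.

```python
def necessity(possibility, category):
    """Necessity of `category`: 1 minus the max possibility of every other category."""
    others = [value for name, value in possibility.items() if name != category]
    return 1.0 - max(others, default=0.0)


# Hypothetical distribution resembling the 'ozone unlikely' case.
poss = {"background": 1.0, "moderate": 0.0, "elevated": 0.0, "extreme": 0.0}
# Hypothetical distribution with two competing categories.
cusp = {"background": 0.4, "moderate": 0.0, "elevated": 0.7, "extreme": 0.1}

for label, dist in (("unlikely", poss), ("cusp", cusp)):
    for name in dist:
        print(label, name, "possibility:", dist[name], "necessity:", necessity(dist, name))
```

In the first distribution background is both fully possible and fully necessary; in the second, the competing background possibility caps how necessary elevated ozone can be.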

4.3. Case 3: On the Cusp

We consider a case where it is deliberately not immediately clear which ozone level is most possible due to variables on the cusp of the membership function’s intersection (i.e., close to a potential tipping point in the physical system, such as sufficient snow).

snow = 100 mm (3.9 inches)
mslp = 1040 hPa
wind = 1.5 $\mathrm{m\,s^{-1}}$
solar = 500 $\mathrm{W\,m^{-2}}$

The sum of all possibilities is less than unity: a so-called subnormal distribution [59]. While further discussion is outside the scope of this study, this signals insufficient rule coverage as something must happen (i.e., at least one category must be entirely possible before it may necessarily occur). Confirming a weakness in the ruleset construction, we find that moderate was not activated whereas its adjacent categories were, which seems counterintuitive. Alternatively, this distribution and the two basins of attraction to ultimate states may indicate a bifurcation in solutions (i.e., it is difficult to discriminate between the two states).

There is a high possibility of elevated ozone, but there is also a considerable possibility of ozone limited to background levels. The activated range (filled area of membership functions in figures) around the crisp value (black line) is large, suggesting considerable uncertainty (i.e., a wide range of output variables are possible). In this case, a centroid value does not communicate the high uncertainty (i.e., the substantial possibility of other sets, particularly background). Further to this crisp (deterministic) value, stakeholders who are risk-averse would benefit from information that extreme levels are still possible in case evasive action makes financial sense e.g., [60].

4.4. Case 4: Ignorance

A common mantra for statistical processing states “garbage in, garbage out”; unfortunately, “garbage out” and useful guidance are often indistinguishable before the event occurs or not. A supervising human in the loop or automatic quality control may prevent nonsensical values from being processed by Clyfar, but let us consider raw output in an implausible scenario of summer snow.

snow = 83 mm (3.3 inches)
mslp = 1050 hPa
wind = 1.0 $\mathrm{m\,s^{-1}}$
solar = 1100 $\mathrm{W\,m^{-2}}$

Figure 11. As Figure 8, but for an impossible scenario with summer snow, unforeseen by Clyfar.

We know Clyfar cannot offer a useful prediction. The lack of support in the data and a near-uniform distribution of (not very) possible outcomes represents substantial ignorance, which may be preferable over a deterministic, crisp ozone concentration that is extricated from scarce information: we obtain a (meta-)confidence in the confidence of an event. Cases that fall outside the ruleset (i.e., little activation of few rules) but still result in high-impact events resemble black swans [61] in that they have not been considered due to their absence in observation records. In a stationary climate with a long record of observation, we can confidently say some events—such as snow on July 1 at the Basin floor—are impossible, or “off the attractor" in the paradigm of chaotic, complex systems [62]. While we find Clyfar does suggest all outcomes are not very plausible, which is true, a non-optimized or restrictive model will continue to suffer from these problems if the set of rule permutations is not explored sufficiently. Indeed, a FIS with complete ruleset coverage would show ignorance during a nonsensical event (like this example) with low activations across all categories. The knowledge of a lack of knowledge is useful to know!

5. Case Study: Winter 2021/2022

We now present inferred values (resembling predictions) from this preliminary version of Clyfar, here marked as version 0.1 (v0.1), using observed weather variables and evaluating against collected ozone data for the same daily periods.

The advantage of choosing this winter is the two clear spikes in ozone concentration during the season, only one of which was captured by Clyfar. The high ozone was associated with typical precursors familiar to domain experts, such as calm wind and antecedent snowfall (not shown). We highlight three regions of the 2021/2022 season that illustrate the good, bad, and typical (null) performance of the Clyfar prototype. We present these subsections in chronological order; each event is labeled with a black arrow above the axis in Figure 12. We include the rank (a description of the percentile in which this possibility fell for this winter) for reference. It is intuitive that achieving an extreme value of ozone is more difficult than a background level: we see a background value more often (regression to the mean).

5.1. 14 December 2021: Example of Background Signal

As noted above, crisp (deterministic) forecast values generated from Clyfar cannot exceed the center of the Gaussian curve for each category book-ending the universe of discourse (i.e., background and extreme ozone). This hard limit is an artifact of the defuzzification method (here the centroid method, a sort of weighted average), and can be addressed by changing this method [63], or perhaps post-processing with another algorithm or model. We configure Clyfar in a modular manner to allow for modification of algorithms or pre-/post-processing independently during optimization.

When we view output as the possibility of each category (Figure 13), Clyfar suggests background ozone levels are almost entirely possible ($\approx 0.95$), in contrast to the almost impossible occurrence of the other, higher concentrations. Similarly to Figure 9, the impossibility of other categories makes background levels necessary, not just possible. Despite a high possibility value for background levels, we find this value to still be in the lowest quartile of possibility for the season. This is sensible: all else equal, it is more possible to achieve typical, background levels of ozone than rare, extreme levels. However, further investigation is needed into whether percentiles rather than raw possibility values are more useful to signal a potential low-risk, high-impact event at long (less predictable) lead times e.g., [64].
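The percentile rank quoted here can be sketched as follows, assuming it is the share of days in the season whose possibility for a category does not exceed the value on the day of interest. That definition, the function name, and the sample values are assumptions for illustration; the ranks shown in the figures may be computed differently.

```python
def percentile_rank(value, season_values):
    """Percentage of season values less than or equal to `value`."""
    if not season_values:
        raise ValueError("season_values must not be empty")
    count = sum(1 for v in season_values if v <= value)
    return 100.0 * count / len(season_values)


# Hypothetical daily possibilities of 'extreme' ozone over a short season.
extreme_poss = [0.00, 0.01, 0.00, 0.02, 0.05, 0.01, 0.00, 0.30, 0.02, 0.01]
print(percentile_rank(0.05, extreme_poss))
```

A small raw possibility such as 0.05 can therefore sit high in the seasonal ranking when most days have values near zero.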

5.2. 2 January 2022: Poor Forecast

In this event, Clyfar predicted that an elevated level of ozone was most possible, but without high support in the data (evidenced by the possibility value $\approx 0.3$).

We see in Figure 14 that extreme levels of ozone, while deemed not likely by Clyfar, are predicted with a possibility in the top 2% of values for this winter. (This is possible with hindsight; operationally, percentiles would be computed from a longer archive.) The long tail of the extreme ozone category (e.g., Figure 7) allocates possibilistic weight towards high values, drawing the centroid towards a larger crisp-value forecast. In this poorly forecast event, one sees the benefit of preserving uncertainty of a possibility distribution as an additional source of forecast information to the deterministic prediction. A decision-maker would arguably avoid the worst losses from a missed event if there is an expression of uncertainty. In the context of the entire winter (Figure 12), we see the crisp predicted value (blue) is a stark false alarm in contrast to observed (orange), but also note the lower possibility of extreme values for the 2 January event than for 27 February, as discussed next.

5.3. 27 February 2022: Good Forecast

Here, Clyfar excels in both the magnitude of the crisp value and the timing of the sudden (nonlinear) increase in ozone levels, which occurred on the same day that observed values rose substantially. We note the peak is not sustained in forecasts as long as the period observed; this prototype has no memory, and each forecast day is computed independently. Work is underway to couple this prototype with, e.g., the previous day’s ozone concentration, given the common strong auto-correlation between yesterday’s and today’s ozone values (not shown).

To further understand the utility of a possibility distribution, compare the 2 January and 27 February cases in Figure 12 and Figure 15: while the time series of crisp values (deterministic ozone forecasts) has stark performance disparity, the 27 February case (high ozone observed) had larger possibility values for extreme ozone (red bars).

6. Synthesis and Future Work

In summary, performance of the preliminary version is promising. There is an unsurprising need for optimization, potentially with gradient descent [30] and other machine-learning methods, and data mining may continue to provide insight into variables that explain more variance in the ozone time series [65]. The low possibility values in aggregate seen in activation output (e.g., Figure 10) suggest that larger coverage of the ruleset permutations is needed. Further, users would find the impact of a missed high-ozone event much worse than a false alarm due to the risk aversion inherent in oil and gas operations.

The display of nonlinearity in a prototype model is encouraging, exemplified by a sudden increase in ozone concentrations for the well forecast event in Figure 12 and Figure 15, despite a lack of day-to-day memory. It may be more common for Clyfar to infer higher levels of ozone if we include the previous day’s maximum as another input. Despite this, the deterministic crisp value of ozone concentration is a hedged forecast [66]. The defuzzification is not a representation of the most likely forecast, but rather a value that minimizes a perceived loss function (such as mean-square-error). Throughout development, the authors will use individual members from ensemble NWP models to drive instances of Clyfar. Ultimately, this yields an ensemble of possibility distributions and an ensemble of crisp values, from which users can generate an accessible summary of uncertainty in addition to a deterministic forecast. Members of a Monte Carlo collection of Clyfar forecasts are cheap to run in large numbers, enabling a wide sampling of forecast uncertainty.
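One way such an ensemble of crisp values might be summarized is sketched below, using the median, an inner range, and the fraction of members above a user-chosen threshold. The member values, the 70 ppb threshold, and the function name are hypothetical; real members would come from Clyfar instances driven by individual NWP ensemble members.

```python
import statistics


def summarize_ensemble(crisp_values, threshold_ppb):
    """Return median, 10th/90th percentiles, and the exceedance fraction."""
    deciles = statistics.quantiles(crisp_values, n=10, method="inclusive")
    exceed = sum(1 for v in crisp_values if v > threshold_ppb) / len(crisp_values)
    return {
        "median": statistics.median(crisp_values),
        "p10": deciles[0],
        "p90": deciles[-1],
        "exceedance_fraction": exceed,
    }


# Hypothetical crisp ozone forecasts (ppb) from ten ensemble members.
members = [48, 52, 55, 61, 66, 71, 74, 78, 80, 45]
summary = summarize_ensemble(members, threshold_ppb=70)
print(summary)
```

The exceedance fraction gives users a compact statement of risk alongside the single deterministic value.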

It is difficult to communicate uncertainty [67], and the concept of possibilities (rather than probabilities) is not a familiar one for many stakeholders and air-quality scientists. However, we leave discussion of risk communication for a future manuscript. We decide not to normalize our possibility distribution (i.e., the heights of each bar chart or height of color-fill in activation results). Doing so would give a false sense that the rule coverage is sufficient to cover all outcomes, leaving the user susceptible to “black swan" (unforeseen; failure of imagination) events. The authors consider it more useful for this iteration of Clyfar to leave a non-zero possibility value assigned to an unknown category (conceptually, "unknown") rather than normalizing the bars (i.e., stretching the possibility values until at least one bar equals unity). However, this lack of rule coverage is information in its own right, borne from poor support in the data, and represents a lack of confidence in the possibility distribution—uncertainty of uncertainty!
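The contrast between normalizing and retaining an explicit unknown share can be sketched as below. Both choices are assumptions for illustration: normalization is shown as division by the maximum (one common scheme), and the unknown share is taken as one minus the maximum possibility. Neither is confirmed as Clyfar’s implementation, and the values are hypothetical.

```python
def normalize(possibility):
    """Rescale so the largest possibility equals 1 (ratio normalization)."""
    peak = max(possibility.values())
    if peak == 0.0:
        raise ValueError("cannot normalize an all-zero distribution")
    return {name: value / peak for name, value in possibility.items()}


def with_unknown(possibility):
    """Keep raw values and add an 'unknown' share equal to 1 - max possibility."""
    out = dict(possibility)
    out["unknown"] = 1.0 - max(possibility.values())
    return out


# Hypothetical subnormal distribution: no category reaches 1.
raw = {"background": 0.25, "moderate": 0.0, "elevated": 0.5, "extreme": 0.125}
print(normalize(raw))
print(with_unknown(raw))
```

Normalizing hides the fact that the most supported category only reached 0.5, whereas the second form keeps that shortfall visible to the user.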

Accordingly, small differences in the categories’ possibility values from day to day may represent large anomalies in terms of percentiles. Figure 16 shows the distribution of possibility values for the same season. Circles are single days, and the boxes indicate the interquartile range. Short boxes most likely indicate that the extreme is difficult to achieve and that the majority of days have a small, similar value, which manifests as a small range of possibility values. However, they may also be a sign of a lack of ruleset coverage: output categories are insufficiently activated to capture the full complexity of the UBWO system.

Future Work: Optimizing and Deployment

The rudimentary version of Clyfar herein is a baseline for future versions and other models to beat in performance skill. A more complex model should only supplant a previous version when it shows skill increase worthy of an increase in computational demand or complexity (the latter of which comes at the loss of explainability of results). Use of machine learning techniques such as gradient descent [30] can optimize a fit faster if the areas of sampling are constrained closer to human-defined regions of phase space. Further, neural networks can optimize FISs [50], yielding a hybrid prediction system known as a neurofuzzy e.g., [68]. During optimization of Clyfar, data sparsity will hamper training of machine-learning methods. While satellite imagery is accessible and covers a wide area, it is most useful for identifying snowfall when it is already snowing (therefore unable to identify surface snow during storm passage).

The upcoming first operational version of Clyfar will ingest pre-processed NWP forecasts, rather than observations, such that inferences represent predictions. We intend to use Global Ensemble Forecast System (GEFS) data [69,70] as input, generating 14-day forecasts of daily maximum ozone concentration. These forecasts will be made available to the public via a website currently in development as part of Ozone Alert. We will improve dissemination of Clyfar by holding site-user surveys, and we will continue research into deploying LLMs to tailor atmospheric-hazard risk communication appropriately for the end user, advised by recent studies in LLM translation and communication skill [71,72].

Author Contributions

Conceptualization, J.R.L. and S.N.L.; methodology, J.R.L. and S.N.L.; software, J.R.L.; validation, J.R.L. and S.N.L.; formal analysis, J.R.L. and S.N.L.; investigation, J.R.L. and S.N.L.; resources, J.R.L. and S.N.L.; data curation, J.R.L. and S.N.L.; writing—original draft preparation, J.R.L.; writing—review and editing, J.R.L. and S.N.L.; visualization, J.R.L. and S.N.L.; supervision, S.N.L.; project administration, S.N.L.; funding acquisition, S.N.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is funded by Uintah County Special Service District 1 and the Utah Legislature.

Data Availability Statement

All data and methods are found in the Supplementary Material, and along with archived observations and worked Jupyter notebooks that generate figures herein, are also available upon request from the corresponding author. Code for Clyfar is documented and updated at https://github.com/Bingham-Research-Center/clyfar (accessed 1 August 2024) and more information on the Bingham Research Center data collection is found at https://www.usu.edu/binghamresearch/ (accessed 1 July 2024).

Acknowledgments

The authors thank the editor and two anonymous reviewers for their critique in improving this paper. The authors further thank Brian Blaylock of the U.S. Naval Research Laboratory for his continued work developing critical open-source Python packages at https://github.com/blaylockbk (accessed 1 July 2024), and Trevor O’Neil and Michael Davies for assisting with previous and ongoing data collection and processing, respectively. JRL thanks his wife Taylor for delivering baby Finn during completion of this manuscript, and thanks the editorial team, coauthor, and wife alike for their patience. Brainstorming with OpenAI GPT-4 output accelerated project development and helped link disparate concepts. GitHub Copilot output was used to assist Python code development. No generative AI was used verbatim in the writing of this paper.

Conflicts of Interest

The authors have no outside conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

BRC	Bingham Research Center
Clyfar	Computational Logic for Yielding Atmospheric Research
COOP	Cooperative Observation Program
EPA	Environmental Protection Agency
FIS	Fuzzy-logic Inference System
GEFS	Global Ensemble Forecast System
KVEL	Vernal Regional Airport
LLM	Large language model
MSLP	Mean sea level pressure
NWP	Numerical Weather Prediction
NAAQS	National Ambient Air Quality Standards
UBWO	Uinta Basin Winter Ozone
VOC	Volatile Organic Compound

Appendix A

Gaussian curves are each defined by mean ($\bar{x}$) and standard deviation ($\sigma$) values. This approach is implemented with the scikit-fuzzy Python module [73] (https://github.com/scikit-fuzzy/scikit-fuzzy, accessed 1 January 2024). The general formula for each ozone level’s membership function is given by:

$$\mathrm{VARIABLE}_{\mathrm{level}}(x) = \exp\left[-\frac{(x - \bar{x})^{2}}{2\sigma^{2}}\right] \tag{A1}$$

where “level” is replaced by the descriptive term for each membership function. We also use sigmoid (“S-shaped") functions in the FIS mechanics to represent variables that asymptote to 0 or 1. The sigmoid membership function is generated with the equation

$$\mu_{x} = \frac{1}{1 + \exp\left[-c \cdot (x - b)\right]} \tag{A2}$$

where $\mu_{x}$ is the membership value with respect to $x$; $x$ is the variable of interest; $b$ is the center value of the sigmoid ($y = \frac{1}{2}$); $c$ controls the width of the sigmoidal region about $b$ (magnitude) and determines the function’s shape. A positive value of $c$ implies the left side approaches 0.0 while the right side approaches 1.0; likewise, vice versa for a negative sign. We show numerical values for each variable category’s membership function in Table A1. The range of that variable (formally the universe of discourse) considered by the FIS is also shown. Values outside of this range are clipped to the nearest value in that range.
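Equations (A1) and (A2), together with the clipping rule, can be sketched in plain Python as below. The parameter values are taken from Table A1; the function names and sample inputs are illustrative, and Clyfar itself relies on scikit-fuzzy rather than this code.

```python
import math


def clip(x, low, high):
    """Clip x to the universe of discourse [low, high]."""
    return max(low, min(high, x))


def gaussian_mf(x, mean, sigma):
    """Gaussian membership function, Equation (A1)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


def sigmoid_mf(x, b, c):
    """Sigmoid membership function, Equation (A2); equals 0.5 at x = b."""
    return 1.0 / (1.0 + math.exp(-c * (x - b)))


# Wind categories (range 0-20 m/s): calm uses c = -3.0, breezy uses c = 3.0.
wind = clip(1.0, 0.0, 20.0)
calm = sigmoid_mf(wind, b=2.5, c=-3.0)
breezy = sigmoid_mf(wind, b=2.5, c=3.0)

# Solar 'winter' category: Gaussian with mean 450 and sigma 100 W/m^2.
winter = gaussian_mf(clip(500.0, 0.0, 1100.0), mean=450.0, sigma=100.0)
print(f"calm={calm:.3f}, breezy={breezy:.3f}, winter={winter:.3f}")
```

Because calm and breezy share the same center with opposite signs of $c$, their membership values sum to one at every wind speed.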

Table A1. Parameters for membership functions shown graphically in Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7.

Variable	Range	Units	Category	Function	$\bar{x}$	$σ$	b	c
wind	0–20	$\mathrm{m\,s^{-1}}$	calm	sigmoid	-	-	2.5	-3.0
			breezy	sigmoid	-	-	2.5	3.0
snow	0–750	mm	negligible	sigmoid	-	-	70	-0.07
			sufficient	sigmoid	-	-	100	0.07
mslp	1000–1070 ($\times 10^{2}$)	Pa	low	sigmoid	-	-	101300	-0.005
			average	Gaussian	102900	800	-	-
			high	sigmoid	-	-	104500	0.005
solar	0–1100	$\mathrm{W\,m^{-2}}$	midwinter	sigmoid	-	-	300	-0.03
			winter	Gaussian	450	100	-	-
			spring	Gaussian	650	100	-	-
			summer	sigmoid	-	-	750	0.03
ozone	20–140	ppb	background	Gaussian	40	6.0	-	-
			moderate	Gaussian	52	5.5	-	-
			elevated	Gaussian	67	6.0	-	-
			extreme	Gaussian	95	10.0	-	-

Appendix B

We can construct rulesets for the ozone system as follows (mslp denoting MSLP):

\begin{matrix} 1 . & snow = negligible \lor mslp = low \lor wind = breezy \\ \to ozone = background \\ 2 . & snow = sufficient \land mslp = high \land wind = calm \land solar = spring \\ \to ozone = extreme \\ 3 . & snow = sufficient \land mslp = high \land wind = calm \land solar = winter \\ \to ozone = elevated \\ 4 . & snow = sufficient \land mslp = high \land wind = calm \land solar = (midwinter \lor summer) \\ \to ozone = moderate \\ 5 . & snow = sufficient \land mslp = average \land wind = calm \land solar = (winter \lor spring) \\ \to ozone = elevated \\ 6 . & snow = sufficient \land mslp = average \land wind = calm \land solar = (midwinter \lor summer) \\ \to ozone = moderate \end{matrix}

Table A2. Logical operators and associated functions for bivalent logic and fuzzy equivalents, where A and B represent independent events.

Description	Rendered	Bivalent Function	Fuzzy Function
Implication (IF...THEN)	→
A AND B	$A \land B$	minimum	infimum
A OR B	$A \lor B$	maximum	supremum
NOT A	$\neg A$	( $1 - A$ )	( $1 - A$ )
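The six rules and the operators in Table A2 can be sketched with the minimum for AND and the maximum for OR. The membership degrees passed in are hypothetical, the dictionary keys and function name are illustrative, and taking the maximum across rules that share an output category is an assumption about aggregation rather than a documented Clyfar detail.

```python
def rule_activations(m):
    """Evaluate the six rules; `m` maps 'variable.category' to a membership degree."""
    trapped = min(m["snow.sufficient"], m["wind.calm"])  # shared by rules 2-6
    low_or_high_sun = max(m["solar.midwinter"], m["solar.summer"])
    r1 = max(m["snow.negligible"], m["mslp.low"], m["wind.breezy"])
    r2 = min(trapped, m["mslp.high"], m["solar.spring"])
    r3 = min(trapped, m["mslp.high"], m["solar.winter"])
    r4 = min(trapped, m["mslp.high"], low_or_high_sun)
    r5 = min(trapped, m["mslp.average"], max(m["solar.winter"], m["solar.spring"]))
    r6 = min(trapped, m["mslp.average"], low_or_high_sun)
    return {
        "background": r1,
        "moderate": max(r4, r6),
        "elevated": max(r3, r5),
        "extreme": r2,
    }


# Hypothetical membership degrees for a snowy, calm, high-pressure spring day.
memberships = {
    "snow.negligible": 0.0, "snow.sufficient": 1.0,
    "mslp.low": 0.0, "mslp.average": 0.25, "mslp.high": 0.75,
    "wind.calm": 1.0, "wind.breezy": 0.0,
    "solar.midwinter": 0.0, "solar.winter": 0.25,
    "solar.spring": 1.0, "solar.summer": 0.0,
}
activations = rule_activations(memberships)
print(activations)
```

Because Rule 1 uses OR, any single unfavorable input (little snow, low pressure, or breezy wind) is enough to activate background, whereas the remaining rules require all conditions to hold at once.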

References

Bader, J.W. Structural and tectonic evolution of the Douglas Creek arch, the Douglas Creek fault zone, and environs, northwestern Colorado and northeastern Utah: Implications for petroleum accumulation in the Piceance and Uinta basins. Rocky Mountain Geology 2009, 44, 121–145.
Lyman, S.; Tran, T. Inversion structure and winter ozone distribution in the Uintah Basin, Utah, U.S.A. Atmos. Environ. 2015, 123, 156–165.
Neemann, E.M.; Crosman, E.T.; Horel, J.D.; Avey, L. Simulations of a cold-air pool associated with elevated wintertime ozone in the Uintah Basin, Utah. Atmos. Chem. Phys. 2015, 15, 135–151.
Mansfield, M.L. Statistical analysis of winter ozone exceedances in the Uintah Basin, Utah, USA. J. Air Waste Manag. Assoc. 2018, 68, 403–414.
Finlayson-Pitts, B.J.; Pitts, Jr, J.N. Chemistry of the Upper and Lower Atmosphere: Theory, Experiments, and Applications; Elsevier, 1999.
Schnell, R.C.; Oltmans, S.J.; Neely, R.R.; Endres, M.S.; Molenar, J.V.; White, A.B. Rapid photochemical production of ozone at high concentrations in a rural site during winter. Nat. Geosci. 2009, 2, 120–122.
Mansfield, M.L.; Hall, C.F. A survey of valleys and basins of the western United States for the capacity to produce winter ozone. J. Air Waste Manag. Assoc. 2018, 68, 909–919.
Mansfield, M.L.; Hall, C.F. Statistical analysis of winter ozone events. Air Qual. Atmos. Health 2013, 6, 687–699.
Oltmans, S.; Schnell, R.; Johnson, B.; Pétron, G.; Mefford, T.; Neely, III, R. Anatomy of wintertime ozone associated with oil and natural gas extraction activity in Wyoming and Utah. Elementa (Wash., DC) 2014, 2, 000024.
Zadeh, L.A. Fuzzy sets. Information and Control 1965, 8, 338–353.
Zadeh, L.A. The role of fuzzy logic in the management of uncertainty in expert systems. Fuzzy Sets and Systems 1983, 11, 199–227.
Lareau, N.P.; Crosman, E.; David Whiteman, C.; Horel, J.; Hoch, S.W.; Brown, W.O.J.; Horst, T.W. The Persistent Cold-Air Pool Study. Bull. Am. Meteorol. Soc. 2013, 94, 51–63.
Terzago, S.; Andreoli, V.; Arduini, G.; Balsamo, G.; Campo, L.; Cassardo, C.; Cremonese, E.; Dolia, D.; Gabellani, S.; von Hardenberg, J.; Morra di Cella, U.; Palazzi, E.; Piazzi, G.; Pogliotti, P.; Provenzale, A. Sensitivity of snow models to the accuracy of meteorological forcings in mountain environments. Hydrol. Earth Syst. Sci. 2020, 24, 4061–4090.
Matichuk, R.; Tonnesen, G.; Luecken, D.; Gilliam, R.; Napelenok, S.L.; Baker, K.R.; Schwede, D.; Murphy, B.; Helmig, D.; Lyman, S.N.; Roselle, S. Evaluation of the Community Multiscale Air Quality Model for Simulating Winter Ozone Formation in the Uinta Basin. J. Geophys. Res. D: Atmos. 2017, 122, 13545–13572.
Tran, T.; Tran, H.; Mansfield, M.; Lyman, S. ; others. Four dimensional data assimilation (FDDA) impacts on WRF performance in simulating inversion layer structure and distributions of CMAQ-simulated winter ozone …. Atmos. Environ. 2018.
Herrero, J.; Polo, M.J. Parameterization of atmospheric longwave emissivity in a mountainous site for all sky conditions. Hydrol. Earth Syst. Sci. 2012, 16, 3139–3147.
Awan, N.K.; Truhetz, H.; Gobiet, A. Parameterization-induced error characteristics of MM5 and WRF operated in climate mode over the alpine region: An ensemble-based Analysis. J. Clim. 2011, 24, 3107–3123.
Gilliam, R.C.; Hogrefe, C.; Rao, S.T. New methods for evaluating meteorological models used in air quality applications. Atmos. Environ. (1994) 2006, 40, 5073–5086.
Squitieri, B.J.; Gallus, W.A. On the forecast sensitivity of MCS cold pools and related features to horizontal grid-spacing in convection-allowing WRF simulations. Weather Forecast. 2019.
Skamarock, W.C.; Klemp, J.B.; Dudhia, J.; Gill, D.O.; Liu, Z.; Berner, J.; Wang, W.; Powers, J.G.; Duda, M.G.; Barker, D.M.; Huang, X.Y. A description of the advanced research WRF model version 4. Technical report, 2019.
Tennekes, H. Turbulent Flow In Two and Three Dimensions. Bull. Amer. Meteor. Soc. 1978, 59, 22–28.
Bommer, P.L.; Kretschmer, M.; Hedström, A.; Bareeva, D.; Höhne, M.M.C. Finding the right XAI method—A guide for the evaluation and ranking of Explainable AI methods in climate science. Artificial Intelligence for the Earth Systems 2024, 3.
Potvin, C.K.; Flora, M.L.; Skinner, P.S.; Reinhart, A.E.; Matilla, B.C. Using machine learning to predict convection-allowing ensemble forecast skill: Evaluation with the NSSL Warn-on-Forecast System. Artificial Intelligence for the Earth Systems 2024, 3.
Casallas, A.; Ferro, C.; Celis, N.; Guevara-Luna, M.A.; Mogollón-Sotelo, C.; Guevara-Luna, F.A.; Merchán, M. Long short-term memory artificial neural network approach to forecast meteorology and PM2.5 local variables in Bogotá, Colombia. Model. Earth Syst. Environ. 2022, 8, 2951–2964.
Park, M.; Zheng, Z.; Riemer, N.; Tessum, C.W. Learned 1D passive scalar advection to accelerate chemical transport modeling: A case study with GEOS-FP horizontal wind fields. Artificial Intelligence for the Earth Systems 2024, 3.
Lindsey, D.; McNoldy, B.; Finch, Z.O.; Henderson, D.; Lerach, D.; Seigel, R.; Steinweg, J.; Stuckmeyer, E.A.; Van Cleave, D.T.; Williams, G.; Woloszyn, M. A high wind statistical prediction model for the northern Front Range of Colorado. Electronic Journal of Operational Meteorology 2011.
Keisler, R. Forecasting global weather with graph neural networks. arXiv [physics.ao-ph], 2022; [arXiv:physics.ao-ph/2202.07575].
Jeon, H.J.; Kang, J.H.; Kwon, I.H.; Lee, O.J. CloudNine: Analyzing Meteorological Observation Impact on Weather Prediction Using Explainable Graph Neural Networks. arXiv [cs.LG], 2024; arXiv:cs.LG/2402.14861.
Hakim, G.J.; Masanam, S. Dynamical tests of a deep-learning weather prediction model. Artificial Intelligence for the Earth Systems 2024, 3.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, É. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Driess, D.; Xia, F.; Sajjadi, M.S.M.; Lynch, C.; Chowdhery, A.; Ichter, B.; Wahid, A.; Tompson, J.; Vuong, Q.; Yu, T.; Huang, W.; Chebotar, Y.; Sermanet, P.; Duckworth, D.; Levine, S.; Vanhoucke, V.; Hausman, K.; Toussaint, M.; Greff, K.; Zeng, A.; Mordatch, I.; Florence, P. PaLM-E: An Embodied Multimodal Language Model. arXiv [cs.LG], 2023; arXiv:cs.LG/2303.03378.
Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; Schuh, P.; Shi, K.; Tsvyashchenko, S.; Maynez, J.; Rao, A.; Barnes, P.; Tay, Y.; Shazeer, N.; Prabhakaran, V.; Reif, E.; Du, N.; Hutchinson, B.; Pope, R.; Bradbury, J.; Austin, J.; Isard, M.; Gur-Ari, G.; Yin, P.; Duke, T.; Levskaya, A.; Ghemawat, S.; Dev, S.; Michalewski, H.; Garcia, X.; Misra, V.; Robinson, K.; Fedus, L.; Zhou, D.; Ippolito, D.; Luan, D.; Lim, H.; Zoph, B.; Spiridonov, A.; Sepassi, R.; Dohan, D.; Agrawal, S.; Omernick, M.; Dai, A.M.; Pillai, T.S.; Pellat, M.; Lewkowycz, A.; Moreira, E.; Child, R.; Polozov, O.; Lee, K.; Zhou, Z.; Wang, X.; Saeta, B.; Diaz, M.; Firat, O.; Catasta, M.; Wei, J.; Meier-Hellstern, K.; Eck, D.; Dean, J.; Petrov, S.; Fiedel, N. PaLM: Scaling Language Modeling with Pathways. arXiv [cs.CL], 2022; arXiv:cs.CL/2204.02311.
BigScience, Workshop.; Le Scao, T.; Fan, A.; Akiki, C.; Pavlick, E.; Ilić, S.; Hesslow, D.; Castagné, R.; Luccioni, A.S.; Yvon, F. et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv [cs.CL], 2022.
Flora, M.L.; Potvin, C.K.; McGovern, A.; Handler, S. A Machine Learning Explainability Tutorial for Atmospheric Sciences. Artificial Intelligence for the Earth Systems 2024, 3.
Camastra, F.; Ciaramella, A.; Giovannelli, V.; Lener, M.; Rastelli, V.; Staiano, A.; Staiano, G.; Starace, A. A fuzzy decision system for genetically modified plant environmental risk assessment using Mamdani inference. Expert Syst. Appl. 2015, 42, 1710–1716.
Chase, R.J.; Harrison, D.R.; Lackmann, G.M.; McGovern, A. A Machine Learning Tutorial for Operational Meteorology. Part II: Neural Networks and Deep Learning. Weather Forecast. 2023, 38, 1271–1293.
Höhlein, K.; Schulz, B.; Westermann, R.; Lerch, S. Postprocessing of Ensemble Weather Forecasts Using Permutation-Invariant Neural Networks. Artificial Intelligence for the Earth Systems 2024, 3.
Horel, J.; Splitt, M.; Dunn, L.; Pechmann, J.; White, B.; Ciliberi, C.; Lazarus, S.; Slemmer, J.; Zaff, D.; Burks, J. Mesowest: cooperative mesonets in the western United States. Bull. Am. Meteorol. Soc. 2002, 83, 211–225.
Shapiro, A.F. The merging of neural networks, fuzzy logic, and genetic algorithms. Insur. Math. Econ. 2002, 31, 115–131.
Zadeh, L.A. Is there a need for fuzzy logic? Inf. Sci. 2008, 178, 2751–2779.
Sarker, I.H. AI-based modeling: Techniques, applications and research issues towards automation, intelligent and smart systems. SN Comput. Sci. 2022, 3, 158.
Asklany, S.A.; Elhelow, K.; Youssef, I.K.; Abd El-wahab, M. Rainfall events prediction using rule-based fuzzy inference system. Atmos. Res. 2011, 101, 228–236.
Mitra, A.K.; Nath, S.; Sharma, A.K. Fog forecasting using rule-based fuzzy inference system. J. Ind. Soc. Remote Sens. 2008, 36, 243–253.
Dubois, D.; Prade, H. Possibility theory: An approach to computerized processing of uncertainty; Plenum Press: New York; London, 1988.
Lorenz, E.N. Deterministic Nonperiodic Flow. J. Atmos. Sci. 1963, 20, 130–141.
May, R.M. Simple mathematical models with very complicated dynamics. Nature 1976, 261, 459–467.
Dubois, D.; Prade, H. Fuzzy set and possibility theory-based methods in artificial intelligence. Artif. Intell. 2003, 148, 1–9.
Nedjah, N.; de Macedo Mourelle, L. Fuzzy systems engineering: Theory and practice, 2005 ed.; Studies in Fuzziness and Soft Computing, Springer: Berlin, Germany, 2005.
Chang, F.J.; Chang, Y.T. Adaptive neuro-fuzzy inference system for prediction of water level in reservoir. Adv. Water Resour. 2006, 29, 1–10.
Abraham, A. Adaptation of fuzzy inference system using neural learning. In Fuzzy Systems Engineering; Studies in fuzziness and soft computing, Springer Berlin Heidelberg: Berlin, Heidelberg, 2005; pp. 53–83.
Zadeh, L.A.; Klir, G.J.; Yuan, B. Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems: Selected Papers; World Scientific, 1996.
Mamdani, E.H. Advances in the linguistic synthesis of fuzzy controllers. Int. J. Man. Mach. Stud. 1976, 8, 669–678.
Williams, R.M.; Ferro, C.A.T.; Kwasniok, F. A comparison of ensemble post-processing methods for extreme events. Quart. J. Roy. Meteor. Soc. 2014, 140, 1112–1120.
Sterk, A.E.; Stephenson, D.B.; Holland, M.P.; Mylne, K.R. On the predictability of extremes: Does the butterfly effect ever decrease? Q.J.R. Meteorol. Soc. 2016, 142, 58–64.
Palmer, T.N.; Döring, A.; Seregin, G. The real butterfly effect. Nonlinearity 2014, 27, R123.
Zadeh, L.A. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 1978, 1, 3–28.
Le Carrer, N. Possibly extreme, probably not: Is possibility theory the route for risk-averse decision-making? Atmos. Sci. Lett. 2021, 22.
Zadeh, L. A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination. AI Mag. 1985, 7, 85–90.
Oussalah, M. On the normalization of subnormal possibility distributions: New investigations. Int. J. Gen. Syst. 2002, 31, 277–301.
Buizza, R. Accuracy and Potential Economic Value of Categorical and Probabilistic Forecasts of Discrete Events. Mon. Weather Rev. 2001, 129, 2329–2345.
Taleb, N.N. The Black Swan: The Impact of the Highly Improbable, 1st ed.; Random House: New York, 2007.
Palmer, T.N. Quantum Reality, Complex Numbers, and the Meteorological Butterfly Effect. Bull. Am. Meteorol. Soc. 2005, 86, 519–530.
Chakraverty, S.; Sahoo, D.M.; Mahato, N.R. Defuzzification. In Concepts of Soft Computing; Springer Singapore: Singapore, 2019; pp. 117–127.
Li, H.; Wang, X.; Choy, S.; Jiang, C.; Wu, S.; Zhang, J.; Qiu, C.; Zhou, K.; Li, L.; Fu, E.; Zhang, K. Detecting heavy rainfall using anomaly-based percentile thresholds of predictors derived from GNSS-PWV. Atmos. Res. 2022, 265, 105912.
Han, J.; Pei, J.; Tong, H. Data Mining: Concepts and Techniques; Morgan Kaufmann, 2022.
Wilks, D.S. Statistical Methods in the Atmospheric Sciences; Academic Press, 2011.
Demuth, J.L.; Morss, R.E.; Palen, L.; Anderson, K.M.; Anderson, J.; Kogan, M.; Stowe, K.; Bica, M.; Lazrus, H.; Wilhelmi, O.; Henderson, J. “Sometimes da #beachlife ain’t always da wave”: Understanding People’s Evolving Hurricane Risk Communication, Risk Assessments, and Responses Using Twitter Narratives. Weather Clim. Soc. 2018, 10, 537–560.
Zounemat-Kermani, M.; Teshnehlab, M. Using adaptive neuro-fuzzy inference system for hydrological time series prediction. Appl. Soft Comput. 2008, 8, 928–936.
Zhou, X.; Zhu, Y.; Hou, D.; Fu, B.; Li, W.; Guan, H.; Sinsky, E.; Kolczynski, W.; Xue, X.; Luo, Y.; Peng, J.; Yang, B.; Tallapragada, V.; Pegion, P. The development of the NCEP global ensemble forecast system version 12. Weather Forecast. 2022, 37, 1069–1084.
Harrison, L.; Landsfeld, M.; Husak, G.; Davenport, F.; Shukla, S.; Turner, W.; Peterson, P.; Funk, C. Advancing early warning capabilities with CHIRPS-compatible NCEP GEFS precipitation forecasts. Sci. Data 2022, 9, 375.
Trujillo-Falcón, J.E.; Reedy, J.; Klockow-McClain, K.E.; Berry, K.L.; Stumpf, G.J.; Bates, A.V.; LaDue, J.G. Creating a Communication Framework for FACETs: How Probabilistic Hazard Information Affected Warning Operations in NOAA’s Hazardous Weather Testbed. Weather Clim. Soc. 2022, 14, 881–892.
Lawson, J.R.; Flora, M.L.; Goebbert, K.H.; Lyman, S.N.; Potvin, C.K.; Schultz, D.M.; Stepanek, A.J.; Trujillo-Falcón, J.E. Pixels and Predictions: Potential of GPT-4V in Meteorological Imagery Analysis and Forecast Communication. arXiv 2024, arXiv:2404.15166 [cs.CL].
Warner, J. JDWarner/scikit-fuzzy: Scikit-Fuzzy version 0.4.2.

Figure 1. Geographical domain of the present study. Panel (a) is a satellite image showing the approximate bounding box of the Uinta Basin. The red circle and text denote the radius within which all available observations were obtained for the study period. The red cross marks the center of that circle. Blue circles are towns; green points are geological features; black squares mark observation stations reporting snow depth via the COOP network. Major orographic features bounding the Basin’s perimeter are labeled with a cyan background. The black vertical line marks the Utah–Colorado boundary (Utah to the west). In panel (b), the Uinta Basin (whose bounding box is labeled and marked in red) is shown in the context of the Intermountain West of the continental United States.

Figure 2. Scatter plot of representative ozone against wind speed for the 2021–2022 winter. The purple dashed line indicates the NAAQS limit. Red markers denote days in exceedance of the NAAQS limit; orange markers denote days within 10 ppb of it.

Figure 3. Membership function for the representative Basin value for 10-m wind speed. The x-axis range is zoomed to capture the salient aspects of the sigmoids.

Figure 4. Membership function for daily median snow depth. As in Figure 3, the x-axis range is zoomed to capture the salient aspects of the sigmoids.

Figure 5. Membership function for daily median mean-sea-level pressure (MSLP).

Figure 6. Membership function for incoming short-wave solar radiation.

Figure 7. Membership function for daily maximum of atmospheric ozone concentration.

Figure 8. Possibility of each ozone fuzzy set (filled color) with its membership function overlaid (colored line), both generated by the inference system for a likely high-ozone day.

Figure 9. As in Figure 8, but for a scenario unlikely to yield ozone in excess of background levels.

Figure 10. As in Figure 8, but for a scenario on the cusp of yielding predictions of elevated ozone levels.

Figure 12. Full forecast of centroid (hedged) and observed (orange) values, and four possibility levels overlaid so that more extreme levels are plotted higher in the stack of bars for conspicuousness.

Figure 13. Inferred possibility of ozone categories valid 14 December 2021, showing a well-predicted background case. F and O denote the rough category into which the forecast and observed values fell, respectively. The annotated rank gives each possibility value’s percentile within this winter’s set.

Figure 14. Forecast of ozone categories valid 2 January 2022, subjectively a poorly forecast case.

Figure 15. Forecast of ozone categories valid 27 February 2022. This was a subjectively good forecast, including in the deterministic time series (Figure 12).

Figure 16. Box-and-whisker plot of the possibility distribution for each category of daily-maximum ozone concentration for winter 2021/2022. Circles are individual events. Boxes represent the interquartile range.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.


