This section summarizes opportunities, issues, and discussions found in academic research to identify research gaps. First, it goes over the current state of research regarding blockchains and smart contracts in general. Second, it summarizes open questions and identifies challenges regarding the implementation of oracles.
3.1. Summary of Research
While the topic of oracles is still quite niche and largely unknown outside of the blockchain field, numerous papers have been published. These papers largely fall into three categories: (1) hybrid smart contracts, (2) oracle technology and implementations, and (3) specific use cases. In the context of this paper, (2) and, to a certain extent, (3) are the most relevant.
The 2016 paper “The Blockchain as a Software Connector” by Xu et al. is among the first independent research papers discussing off-chain and on-chain communication and data transfer in the context of smart contracts. A key argument made in this paper is that in certain transactions, first-party oracles are sufficient. The example mentioned involves a person renewing their driving license. In this case, it is assumed that the person already trusts the issuing party, and there is no need to decentralize the data feed or employ a third party. If there is no trust between the transacting parties, a “validation oracle” may be needed.
In 2017, Chainlink released its initial whitepaper [151], just 14 days before its initial token sale (ICO) (De Collibus et al. [152]). The whitepaper was a complete, yet purely theoretical, introduction to Chainlink’s proposed solution to the oracle problem. Aside from the initial design, the paper highlights a need for composability: building simple tools that can be combined to form complex systems. While this paper mentions the concept of off-chain data aggregation, it falls short of mentioning DONs. Several of the decentralized computation functions mentioned in the theory section are mentioned only briefly or not at all. Nonetheless, the release of this whitepaper represents a significant milestone and, in some ways, the birth of the decentralized blockchain oracle space.
In the paper “The Oracle Problem - An Analysis of How Blockchain Oracles Undermine the Advantages of Decentralized Ledger Systems”, Egberts [153] discusses how centralized oracles pose a single point of failure, as well as methods such as multi-source data feeds and reputation systems that may help solve this problem. The influence of Egberts’ writing is evident in the way it inspires other papers, such as Breidenbach et al. [26]. Eberhardt and Tai [154] published a paper highlighting the need for a way in which blockchains may be able to move data off-chain for computation. While not explicitly mentioning oracles, the paper concludes, “We still consider off-chaining techniques to be key tools in blockchain-based application engineering as they introduce additional functionality and potentially significant cost benefits” [154]. As discussed previously, oracles now offer a variety of off-chain computation tools.
2018 and 2019 saw the launch of the first wave of dApps and a variety of other novel blockchain applications. Some of the core learnings about practical blockchain application design gained during this time are synthesized in the book “Architecture for Blockchain Applications” [155], including the three ways in which blockchains may communicate with the real world: oracles, reverse oracles, and pairings of legal and smart contracts (p. 113). According to the book, oracles have two main drawbacks. The first is trust: oracles vary in quality depending on implementation, from centralized first-party oracles to decentralized oracle networks. Users of oracles must understand this and select an implementation that suits the given circumstances. The second is validity: data cannot natively be verified by the consensus of a given chain, so there is full reliance on the oracle (p. 117).
By 2020, the oracle problem began receiving more attention, with several directly related papers being published. Giulio Caldarelli published a systematic literature review (2020a), which found that in most papers addressing a use case for which off-chain connectivity is needed, the oracle problem was not taken into consideration. Shortly thereafter, he published a journal article titled “Understanding the Blockchain Oracle Problem: A Call for Action” (2020b). This article hypothesizes oracle implementations for various use cases and raises questions about the reputation of data sources: Can a firm’s reputation alone counter the oracle problem in supply chain use cases? Can patients in a health care system act as oracles? Is it at all possible to manage an energy market platform without a central authority?
2020 also marked the publication of one of the most commonly cited papers in the oracle field, “Trustworthy Blockchain Oracles: Review, Comparison, and Open Research Challenges” by Albreiki et al. (2020). This paper compares the approaches of a variety of decentralized oracle protocols, including Augur, Chainlink, Witnet, ASTRAEA, and Aeternity. Challenges identified include developing ways in which smart contracts may feature integrated fail-safes in the case of unintended data and implementing ways to cryptographically verify data received from oracle nodes. Such methods are now being implemented by various projects, but it is unclear whether they have become a widespread industry standard. Most notably, the paper is concerned with the trustworthiness and decentralization of actors participating in oracle networks [156]. The importance of decentralization with regard to the reliability of oracles is also highlighted in the paper “Reliability Analysis for Blockchain Oracles” [157] and the paper “Blockchain Oracles: A Framework for Blockchain-Based Applications” (Mammadzada et al., 2020), which explain how centralized oracles may work well for permissioned systems, while multi-source oracles are better suited for situations with a large number of actors. Also noteworthy: in 2020, Town Crier had not yet been purchased by Chainlink and was mentioned by several papers as a standalone oracle implementation.
In December 2020, the World Economic Forum published a whitepaper titled “Bridging the Governance Gap: Interoperability for blockchain and legacy systems”, written by Chainlink Labs in collaboration with the Blockchain and Digital Assets team of the World Economic Forum [158]. The report gives an example of how India’s national crop insurance portal (NCIP) could effectively leverage decentralized oracle feeds to validate data. It goes on to state, “An oracle network with a proven incentive and reputation system would also unlock a marketplace for the larger open source community to enrich NCIP’s forecast systems with increasingly localized and granular data.” In this case, oracles would be able to augment a legacy system to improve the reliability of weather information systems. Other examples mentioned include customs processing and vehicle registration systems. These cases demonstrate how smart contracts may be integrated with current infrastructure rather than disrupting it outright.
Another relevant 2020 paper, “Blockchain for COVID-19: Review, Opportunities, and a Trusted Tracking System” (Marbouh, 2020), proposes a dApp to track new cases, recoveries, and deaths due to COVID-19. The proposed application uses multiple trusted oracles to retrieve relevant data from various official sources. The oracles send the data to a smart contract that records both the data and the oracle that broadcast it. The data is aggregated, and oracles are assigned a trust score based on how closely their data matches the aggregated values. While using multiple oracles does “avoid the occurrence of a single point of failure,” as stated in the paper, this implementation does not mention any economic incentives for individual nodes to behave honestly. This is likely not relevant since nodes are considered trusted. If nodes were untrusted, i.e., with anyone able to join as a node operator, this implementation would be insufficient. The paper does highlight ways for permissioned systems to natively integrate oracles via smart contracts.
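The aggregation and scoring step described above can be illustrated with a minimal sketch. It is not the implementation from the cited paper: the use of the median as the aggregate, the relative-deviation measure, the smoothing `rate`, the default starting score of 0.5, and all function and variable names are assumptions made purely for illustration.

```python
from statistics import median


def aggregate_and_score(reports, trust, rate=0.2):
    """Aggregate oracle reports and update a per-oracle trust score.

    reports: dict mapping oracle id -> reported numeric value
    trust:   dict mapping oracle id -> previous trust score in [0, 1]
    rate:    hypothetical smoothing factor for the score update
    """
    # Assumption: the aggregate is the median of all reported values.
    aggregate = median(reports.values())
    scale = abs(aggregate) if aggregate != 0 else 1.0

    new_trust = {}
    for oracle_id, value in reports.items():
        # Relative deviation from the aggregate, capped at 1.
        deviation = min(abs(value - aggregate) / scale, 1.0)
        agreement = 1.0 - deviation
        # Assumption: unknown oracles start at a neutral score of 0.5.
        previous = trust.get(oracle_id, 0.5)
        # Exponential moving average of how closely the oracle agrees.
        new_trust[oracle_id] = (1 - rate) * previous + rate * agreement
    return aggregate, new_trust


if __name__ == "__main__":
    reports = {"oracle_a": 100, "oracle_b": 102, "oracle_c": 150}
    aggregate, scores = aggregate_and_score(reports, trust={})
    print(aggregate, scores)
```

In this sketch an oracle whose report lies far from the aggregate ends up with a lower score than one that matches it. As noted above, such scoring alone provides no economic incentive for honesty, so it only makes sense when the set of oracles is permissioned and trusted.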
2021 saw Giulio Caldarelli continue publishing research about blockchain oracles, focusing on separate use cases such as the fashion supply chain [72], decentralized finance [159], and wrapped tokens [88]. The paper “Verifiable Computing Applications in Blockchain” [160] briefly mentions the use case of verifiable randomness (VRF) in the context of verifiable computation. Another paper from this year [127] closely examines Chainlink oracle activity, providing insights into the demand for data, node earnings, and the data feeds themselves. According to this paper, Chainlink’s customers were primarily interested in price feeds for DeFi applications, with 75% of price feed traffic originating from the DeFi project Synthetix. Interestingly, at the time of writing, Synthetix uses Pyth in addition to Chainlink. Furthermore, the paper speculates that high fees on smart contract platforms at the time may be a reason the demand for oracle feeds outside of price and market data was low. The paper concludes, “...the Chainlink ecosystem on the Ethereum network appears to be driven purely by DeFi’s demand for decentralized market price feeds”.
A Systematization of Knowledge titled “SoK: Oracles from the Ground Truth to Market Manipulation” by Eskandari et al. [161] provides a thorough summary of the research available at the time. Included in the paper is a table comparing available oracle implementations and modules. Attributes mentioned include how reporting nodes are selected from a decentralized network, types of data sources (categorized into API, Human, Smart Contract, and HTTPS), aggregation mechanisms, and more. The paper considers what is possibly the largest number of oracle projects covered up to that point. Most projects mentioned in this SoK are still active today and are also considered in this paper. Several important considerations regarding the crypto-economic systems supporting oracle networks are also discussed in this publication. One such insight is that on-chain modules on public blockchains may lead to high fees for data consumers, potentially disqualifying applications that need frequently updated data but generate limited revenue in proportion. This may explain why DeFi applications make up the largest portion of oracle data consumers; here, data is used to directly facilitate revenue-generating activity. The paper concludes with the warning that the failure of a prominent oracle project may lead to a chain reaction, with applications using this specific oracle implementation failing as well. While several DeFi applications today integrate a secondary fall-back oracle, it would be important to see how common this practice is.
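The fee argument made by Eskandari et al. can be made concrete with a small back-of-the-envelope calculation. The sketch below is only illustrative: the gas amount per update, the gas price, the ETH price, the revenue figures, and the 10% cost threshold are all hypothetical values, not figures from the cited paper or from any live network.

```python
GWEI_PER_ETH = 1_000_000_000


def daily_update_cost_usd(updates_per_day, gas_per_update,
                          gas_price_gwei, eth_price_usd):
    """Estimate the daily cost of writing oracle updates on-chain."""
    cost_eth = updates_per_day * gas_per_update * gas_price_gwei / GWEI_PER_ETH
    return cost_eth * eth_price_usd


def is_feed_viable(daily_cost_usd, daily_revenue_usd, max_cost_share=0.1):
    """Assumption: a feed is viable if it costs at most a fixed share of revenue."""
    return daily_cost_usd <= max_cost_share * daily_revenue_usd


if __name__ == "__main__":
    # Hypothetical scenario: hourly updates, 100,000 gas each,
    # a gas price of 50 gwei, and ETH priced at 2,000 USD.
    cost = daily_update_cost_usd(24, 100_000, 50, 2_000)
    print(f"daily cost: {cost:.2f} USD")
    # A high-revenue DeFi protocol versus a low-revenue application.
    print(is_feed_viable(cost, daily_revenue_usd=5_000))
    print(is_feed_viable(cost, daily_revenue_usd=500))
```

Under these assumed numbers, the same feed is affordable for an application whose revenue scales with the data it consumes and unaffordable for one with limited revenue, which is consistent with the explanation offered above for DeFi's dominance among oracle data consumers.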
2021 also marked the publication of Chainlink’s updated whitepaper “Chainlink 2.0: Next Steps in the Evolution of Decentralized Oracle Networks” [26]. This paper was another major milestone, as the previous whitepaper, which acted as the primary document explaining how the network works, had been published in 2017, two years before Chainlink went live on mainnet. The years of practical work and experimentation with a live network have allowed Chainlink Labs to iterate on and expand the functionality of its oracle implementation. For comparison, the initial whitepaper is 38 pages, while whitepaper 2.0 is 136. Three authors contributed to the initial whitepaper, while whitepaper 2.0 credits 14 authors. Whitepaper 2.0 explains in detail how Chainlink’s oracle implementation operates regarding security, crypto-economic incentives, and how specific functions contribute value. All in all, there is a remarkable difference in research attention dedicated to oracles between 2017 and 2021. Of course, in 2017, oracles were in their infancy, and even today, the topic is still relatively unknown in the grand scheme of things.
In 2022, a number of relevant papers were published. “On the Integration of Blockchain With IoT and the Role of Oracle in the Combined System: The Full Picture” by Sadawi et al. (2022) discusses the importance of oracles when integrating IoT with blockchain, something that had not been extensively researched previously. As part of the paper, the authors set up a prototype CO2 sensor able to communicate with Ethereum smart contracts. While this was only a basic prototype using a centralized, first-party connection, it helps illustrate, for those unfamiliar with the topic, the means by which sensors may make data available on-chain. Pasdar et al., meanwhile, published a paper focused on the technical underpinnings of various oracle implementations (2023). The selection of oracle protocols considered in this paper is based on Egberts’ 2017 paper. Pasdar et al. argue that oracle implementations largely fall into one of two categories: voting-based oracles and reputation-based oracles. Voting-based oracles are centered on explicit incentives, while reputation-based oracles focus on implicit incentives.
The paper “Toward Trustworthy DeFi Oracles: Past, Present, and Future” [144] compares the performance and trustworthiness of popular oracle implementations, including Chainlink, Band, DOS, Nest, and Witnet. Furthermore, the paper outlines a rubric by which the trustworthiness of an oracle may be evaluated. Its criteria are accuracy (accounting for bias in the data), time efficiency, scalability, and security (short-term and long-term adversarial costs). The paper also points out that major DeFi protocols may begin operating as oracles themselves, providing market data feeds natively and directly.
2022 also saw the publication of two papers addressing oracles and associated research directly: “Blockchain Oracles: State-of-the-Art and Research Directions” [162] and “Overview of Blockchain Oracle Research” (Caldarelli, 2022). The former highlights several open challenges faced by oracle implementations as of the time of writing. Mostly, these have to do with a need for deeper evaluations concerning cost, performance, and security. This is an important point, as the crypto-economic systems of some oracle implementations are incomplete or subsidized, likely with the goal of increasing adoption. The latter paper evaluates the amount of research done relating to oracles in the seven years prior to its publication, finding a total of 162 relevant papers. It concludes that blockchain oracles are “still a widely neglected subject” despite their crucial importance in securing decentralized applications. This is likely to change in the coming years as oracles find more use cases and implementations are iterated upon.
Now, in 2023, research continues. In the paper “Before Ethereum. The Origin and Evolution of Blockchain Oracles”, Caldarelli and Ellul [72] shine a spotlight on the beginnings of the oracle field. In 2011, Satoshi Nakamoto mentioned the possibility of running scripts dependent on external data on Bitcoin. Developer Mike Hearn recognized potential issues with relying on a single data source and theorized how an oracle implementation might overcome them. As a result, there were several little-known attempts to implement trust-minimized oracles on Bitcoin, including Oraclize, ORISI, Reality Keys, Counterparty, and Truthcoin. Ultimately, oracles were never widely adopted on Bitcoin, likely because smart contracts were far more easily implemented on purpose-built blockchains, and many Bitcoiners were opposed to storing external data on-chain. This paper shows that a small number of developers have long been familiar with the oracle problem and have long attempted to solve it. Unfortunately, most early Bitcoin developers primarily shared insights via private interactions, email chains, and forum posts, not academic publications.
Compared to the early days of trying to solve the oracle problem on Bitcoin, there is now significantly more academic interest in the topic, and likely even more developer interest. While Caldarelli is right in pointing out that this topic is still widely neglected considering its importance, this may be changing.
3.2. Open Questions and Challenges
Considering the aforementioned literature, there are clearly still plenty of little-explored or completely unexplored topics, open challenges, and unanswered questions. Largely, research impulses regarding blockchain oracles are twofold: on the one hand, explicitly defined questions outlined in the existing literature; on the other hand, implicit research impulses based on little-explored challenges and untouched topics. This section explores these research impulses with the goal of synthesizing specific research questions, which may later serve as a basis for defining interview questions tailored to industry insiders and researchers.
Firstly, as previously discussed, the book “Architecture for Blockchain Applications” [155] outlines two defining factors that are vital when building trustless hybrid smart contracts: trust and validity. Trust refers to the structure of a given oracle implementation. The degree of decentralization applied across a specific oracle structure, from data creation to smart contract execution, plays a decisive role in both the functionality and security of a smart contract. In a proof of reserves use case, for example, it may only be possible to have one data provider, which prompts the question of whether a decentralized network of nodes adds any further reliability. In other cases, such as price feeds, data aggregation does improve reliability. A question that arises here is whether one approach is superior to the others: networks of off-chain decentralized oracle nodes, a separate blockchain as part of the oracle implementation, or on-chain aggregation of data received from a variety of first-party oracles. It appears as though the ideal implementation may be case-specific. Looking further into this point would also address the question raised by Caldarelli et al. [23] regarding whether a firm’s reputation alone might be enough to counter the oracle problem in supply chain use cases. A comprehensive comparison of different use cases and available solutions would be helpful for smart contract architects.
“Demystifying Pythia: A Survey of ChainLink Oracles Usage on Ethereum” [127] points out that “the number of individual users of the Chainlink platform is not very high” and that Chainlink price feeds were the most in-demand. A few other publications mentioned above seem to confirm this finding. Meanwhile, several newer oracle platforms are focused exclusively on price feeds. This raises questions about other use cases explored in research, such as supply chain and healthcare. Overall, the growth and discovery of practical hybrid smart contract use cases seem to be among the greatest challenges to their wide implementation. This in turn raises several questions about the broader utilization of oracles beyond market data. For instance, if such utilization is low, could this be attributed to particular factors like economic constraints or challenges in identifying and developing distinct use cases? Additionally, it is worth considering whether the demand for specific data feeds influences their supply or vice versa. Lastly, whether the future will bring a single, universal oracle solution or multiple specialized implementations catering to varied use cases, such as price feeds, randomness, and sensor data, remains to be seen.
Another general question concerns the maturity of the oracle space. Several papers mention the state of the oracle space at the time of writing. Has rapid progress continued, as it did between 2017 and 2021? What future developments are on the horizon? Understanding where the oracle space is currently and how experts see it developing in both the near and far future could provide an updated outlook on greater trends in the hybrid smart contract space and, to an extent, the DeFi space.
As discussed in the first part of the literature review, the publication “SoK: Oracles from the Ground Truth to Market Manipulation” [161] brings up two important points. Firstly, it highlights two rarely discussed risks: the potential fallout for external projects in the unlikely case that an oracle project or implementation fails, and the possibility that the economic security of token-based oracles may be undermined if the token drastically loses value. The former risk could be addressed by using fall-back oracles or possibly by adding another level of aggregation. The area of integrated fail-safes and data verification was also mentioned as an open research area in “Trustworthy Blockchain Oracles: Review, Comparison, and Open Research Challenges” [156]. “Blockchain Oracles: State-of-the-Art and Research Directions” [162] goes on to state the need for deeper evaluations concerning cost, performance, and security.
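The fall-back approach mentioned above can be sketched as follows. This is a minimal illustration rather than a description of how any particular DeFi application or oracle project implements it: the `Reading` type, the `read_with_fallback` function, the staleness limit, and the rule that a valid value must be positive are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Reading:
    value: float
    timestamp: float  # seconds since epoch at which the value was reported


def read_with_fallback(sources: Sequence[Callable[[], Optional[Reading]]],
                       now: float,
                       max_age_seconds: float = 3600.0) -> Reading:
    """Return the first fresh, valid reading, trying sources in priority order."""
    for fetch in sources:
        try:
            reading = fetch()
        except Exception:
            # The source is unreachable or failed; try the next one.
            continue
        if reading is None:
            continue
        if now - reading.timestamp > max_age_seconds:
            # Stale data is treated like a failed source.
            continue
        if reading.value <= 0:
            # Assumption: the feed is price-like, so only positive values are valid.
            continue
        return reading
    raise RuntimeError("no oracle returned a fresh, valid reading")


if __name__ == "__main__":
    def primary() -> Optional[Reading]:
        raise ConnectionError("primary oracle unavailable")

    def secondary() -> Optional[Reading]:
        return Reading(value=101.0, timestamp=990.0)

    print(read_with_fallback([primary, secondary], now=1000.0))
```

Ordering the sources by priority keeps the primary oracle authoritative while limiting the fallout if it fails. It does not address the second risk, a collapse in token value, since the fall-back source may rest on similar economic assumptions.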
Furthermore, Eskandari et al. [161] point out that oracles may employ entirely different methods of sourcing data. While some use an API, others may gather data directly from people, among other methods. As discussed earlier, selecting the ideal constellation from data sourcing to data delivery depends heavily on the use case, and not every case can be decentralized to the same degree. In some cases, there is only one original source of the data, for example, when asking a patient how they feel. There is no way to decentralize that data any further, and adding intermediaries may introduce points of failure. This raises the question of how difficult it is to develop a proper methodology for data provision for a given use case, even within the framework of existing oracle implementations. This also links back to the earlier questions regarding the general development process of new use cases. Furthermore, it might be possible to develop a rubric by which new use cases and their difficulty of implementation may be evaluated.
Two topics seem to be neglected in research. The first is the relationships between the various stakeholders involved in providing data feeds to smart contracts, beyond purely economic aspects. The second is the opportunity for coaxial business models that may either build on present constellations or create value by augmenting and improving them. Additionally, it may be interesting to ask different stakeholders directly which use cases they are most excited about, to better understand what drives and motivates their involvement in such a new industry.