2.1. Understanding Data Lakes and Data Meshes
A Data Lake (DL) is a centralized architecture designed to store vast amounts of structured, semi-structured, and unstructured data at any scale. Unlike traditional databases or data warehouses that require data to be structured before storage, a DL allows raw data to be hosted in its native format [13]. This means that data from various sources, such as logs, clickstreams, social media, videos, and sensor data, can be stored without pre-defined schemas. DLs offer storage flexibility, allowing data to be stored in its raw form without an upfront schema definition. This enables the accommodation of various data types and formats from diverse sources at any production frequency. DLs are highly scalable and capable of handling very large datasets, making them ideal for Big Data applications. They often provide cost-effective storage by leveraging cloud object storage, resulting in more economical solutions than traditional data warehouses.
DLs integrate seamlessly with tools and technologies that enable processing, querying, and analyzing the stored data. Properly configured DLs can implement security measures and data governance policies to ensure privacy and compliance with regulations. While DLs offer a high degree of flexibility, they require careful management to prevent them from becoming "data swamps", that is, repositories where data is poorly organized, difficult to find, and hard to analyze [14]. To address this concern, practices such as metadata management, data cataloguing, and the establishment of data governance policies are crucial.
Figure 1 presents the structure of a DL and an algorithmic description of how the DL concept works in practice: collecting the data, annotating it with metadata, storing it, and finally retrieving it based on the metadata tags.
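To make this workflow concrete, the following minimal Python sketch illustrates the same ingest–annotate–store–retrieve loop. It is an illustration only and not the implementation behind Figure 1; the `DataLake` class, its file-based storage, and the tag vocabulary are hypothetical choices made for the example.

```python
import uuid
from pathlib import Path

class DataLake:
    """Minimal illustration of a Data Lake: raw objects are stored as-is,
    while a metadata catalogue makes them discoverable by tags."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.catalogue = {}  # object_id -> metadata record

    def ingest(self, raw_bytes: bytes, source: str, fmt: str, tags: list[str]) -> str:
        """Store raw data in its native format and register its metadata."""
        object_id = str(uuid.uuid4())
        (self.root / object_id).write_bytes(raw_bytes)  # no upfront schema
        self.catalogue[object_id] = {"source": source, "format": fmt, "tags": tags}
        return object_id

    def retrieve(self, tag: str) -> list[bytes]:
        """Retrieve every stored object whose metadata carries the given tag."""
        return [(self.root / oid).read_bytes()
                for oid, meta in self.catalogue.items() if tag in meta["tags"]]

# Example: sensor readings and clickstream logs land in the same lake.
lake = DataLake("./lake")
lake.ingest(b'{"temp": 21.5}', source="sensor-7", fmt="json", tags=["sensors", "farm"])
lake.ingest(b"GET /home 200", source="web", fmt="log", tags=["clickstream"])
print(len(lake.retrieve("sensors")))  # -> 1
```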
The concept of the Data Mesh (DM), as mentioned in the previous section, was introduced in 2019 [15] and represents a novel approach to data management within large organizations. Unlike traditional methods, a DM emphasizes several key concepts to revolutionize data handling. Firstly, it advocates Domain-oriented Ownership: data domains are entrusted to the teams or business units with the highest expertise in that specific domain, and these teams bear the responsibility for ensuring the quality, accessibility, and privacy of their respective domain’s data. Additionally, a DM promotes the idea of Decentralized Data Products. Here, data is treated as a product, and each domain team is accountable for the entire data lifecycle within its domain, encompassing tasks such as production, consumption, quality assurance, privacy measures, and comprehensive documentation. Furthermore, DMs advocate Federated Computational Governance, an approach in which each domain team defines and enforces the computational logic specific to its domain; this logic is then executed within the broader context of the mesh [16].
To facilitate autonomy and efficiency, a DM incorporates a self-serve data infrastructure. This infrastructure is designed to empower domain teams with the necessary tools and resources to independently manage their data products, reducing reliance on centralized data engineering teams. Embracing an API-first approach, a DM encourages the use of Application Programming Interfaces (APIs) for seamless data exchange and communication between different components of the system, promoting loose coupling and flexibility in how data is consumed and utilized.
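The following Python sketch illustrates, under simplifying assumptions, how domain-oriented ownership combined with an API-first output port might look in code. The `DataProduct` and `DataMesh` classes and the `trading/daily-sales` product are hypothetical and serve only to show that consumers reach data exclusively through the product's published interface rather than its underlying storage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class DataProduct:
    """A domain-owned data product: the owning team is accountable for its
    quality, documentation, and the API (output port) through which it is consumed."""
    name: str
    domain: str
    owner_team: str
    documentation: str
    output_port: Callable[[], List[dict]]  # API-first: consumers call this, not the storage

@dataclass
class DataMesh:
    """Registry of data products; no central data engineering team is involved."""
    products: Dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        self.products[f"{product.domain}/{product.name}"] = product

    def consume(self, address: str) -> List[dict]:
        return self.products[address].output_port()

# A domain team publishes its product; another team consumes it via the API only.
mesh = DataMesh()
mesh.publish(DataProduct(
    name="daily-sales", domain="trading", owner_team="trading-analytics",
    documentation="Aggregated daily sales, refreshed nightly.",
    output_port=lambda: [{"day": "2024-01-01", "units": 1200}],
))
print(mesh.consume("trading/daily-sales"))
```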
Figure 2 presents the structure of a DM and the algorithm that serves as the core of its operation.
Furthermore, DM emphasizes a holistic view of the data product lifecycle. This encompasses stages such as discovery, ingestion, processing, storage, access, and consumption. Each of these stages is to be carefully considered and managed by the respective domain teams, ensuring a comprehensive and efficient data handling process. By adopting a DM approach, organizations aim to address the challenges of scaling data operations in a complex environment, where multiple teams work on diverse data domains. It provides a framework for decentralizing data ownership and enabling more effective, scalable, and resilient data operations.
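As a rough illustration of these stages, the short Python sketch below encodes the lifecycle as an ordered enumeration that a domain team could walk a data product through; the `LifecycleStage` enum and the `advance` helper are assumptions made for the example, not part of any DM specification.

```python
from enum import Enum, auto

class LifecycleStage(Enum):
    """Stages of the data product lifecycle that each domain team manages."""
    DISCOVERY = auto()
    INGESTION = auto()
    PROCESSING = auto()
    STORAGE = auto()
    ACCESS = auto()
    CONSUMPTION = auto()

def advance(stage: LifecycleStage) -> LifecycleStage:
    """Move a data product to the next stage; consumption is terminal."""
    members = list(LifecycleStage)
    idx = members.index(stage)
    return members[min(idx + 1, len(members) - 1)]

# Walk a product from discovery to consumption, stage by stage.
stage = LifecycleStage.DISCOVERY
while stage is not LifecycleStage.CONSUMPTION:
    stage = advance(stage)
    print(stage.name)
```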
In turn, a Data Market is an ecosystem or marketplace where individuals, companies, or systems can buy, sell, or exchange data by leveraging the idea of DMs. Data suppliers in a Data Market offer datasets for purchase or access by data consumers for a range of applications, such as analysis, research, machine learning, and more [7]. Data Markets facilitate the efficient sharing and monetization of data, allowing businesses to leverage external sources of information to enhance their insights and decision-making processes.
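A minimal sketch of such a marketplace is given below, assuming a hypothetical `DataMarket` registry in which suppliers list datasets and consumers purchase access entitlements; payment, licensing, and delivery details are deliberately omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class Listing:
    """A dataset offered by a supplier on the Data Market."""
    dataset_id: str
    supplier: str
    description: str
    price: float

@dataclass
class DataMarket:
    """Minimal marketplace: suppliers list datasets, consumers purchase access."""
    listings: Dict[str, Listing] = field(default_factory=dict)
    access: Dict[str, Set[str]] = field(default_factory=dict)  # dataset_id -> consumers

    def list_dataset(self, listing: Listing) -> None:
        self.listings[listing.dataset_id] = listing

    def purchase(self, dataset_id: str, consumer: str) -> Listing:
        listing = self.listings[dataset_id]
        self.access.setdefault(dataset_id, set()).add(consumer)  # record the entitlement
        return listing

market = DataMarket()
market.list_dataset(Listing("weather-2023", "agri-sensors-ltd", "Hourly weather readings", 99.0))
print(market.purchase("weather-2023", "ml-research-team").description)
```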
Using a large manufacturing company in the field of poultry farming and poultry meat trading as our case study and demonstrator, we were able to identify various operational areas, including livestock records, agricultural data, supply chain management information, financial transactions, and trading analytics. Each operational area is assigned to a specialized team responsible for its monitoring and upkeep. Moreover, each team is tasked with generating specific data products tailored to its domain, such as APIs for accessing data, algorithms for analyzing trading trends, tools for secure data sharing, and reporting mechanisms for financial analytics. This makes the proposed approach an ideal way of sharing portions of data across authorized groups (e.g., departments) or individuals. Adopting a federated computational governance approach ensures that each team defines and enforces the computational logic for its specific domain, facilitating the implementation of specialized algorithms and quality checks. Additionally, each team has access to a self-serve data infrastructure, equipped with tools and resources for managing its data products independently, thereby ensuring autonomy and operational efficiency. To enhance interoperability within the field, the implementation of APIs and adherence to industry standards are prioritized, allowing seamless communication and data exchange between different operational areas. This approach contributes to the optimization of data management and the creation of tailored products, ultimately benefiting stakeholders in poultry farming and trading, including producers, traders, and administrators.
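To illustrate how portions of a domain's data could be shared only with authorized groups, the following Python sketch applies a simple field-level policy; the `DomainPolicy` class, the field names, and the department names are hypothetical, with only the domain labels drawn from the case study above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class DomainPolicy:
    """Federated governance sketch: each domain team declares which groups may
    read which portions (fields) of its data."""
    domain: str
    field_access: Dict[str, Set[str]] = field(default_factory=dict)  # field -> allowed groups

    def allow(self, data_field: str, group: str) -> None:
        self.field_access.setdefault(data_field, set()).add(group)

    def filter_record(self, record: dict, group: str) -> dict:
        """Return only the portion of a record the requesting group may see."""
        return {k: v for k, v in record.items() if group in self.field_access.get(k, set())}

# The trading-analytics team shares prices with finance, and volumes more widely.
policy = DomainPolicy("trading-analytics")
policy.allow("daily_price", "finance")
policy.allow("volume", "finance")
policy.allow("volume", "supply-chain")

record = {"daily_price": 3.1, "volume": 14500, "internal_margin": 0.22}
print(policy.filter_record(record, "supply-chain"))  # -> {'volume': 14500}
```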
It is important to note that while a DM is more about organizational and conceptual principles for data management, a DL refers specifically to the technology and infrastructure for storing large volumes of raw data. These concepts are not mutually exclusive and, in practice, organizations can implement a DM framework while utilizing a DL as a basic underlying component of their technical infrastructure for data storage and processing.
2.2. Understanding Blockchain and NFTs
Blockchain provides a secure and transparent means of recording and transferring data. Notably, it addresses privacy concerns by anonymizing personal data, contributing to its increasing popularity and integration into infrastructure and opening avenues for innovative applications [17]. Functioning as a decentralized database on a peer-to-peer network, Blockchain establishes a distributed communication network that enables non-trusting nodes to interact without relying on a central authority. Its protocols ensure a verifiable and trustworthy system, offering traceability, transparency, and enhanced privacy and security features. In essence, Blockchain is evolving into a fundamental technology with wide-ranging applications and use-cases, such as IoT, Smart Contracts, NFTs, Cybersecurity, and Cryptocurrency, providing a foundation for secure and trustworthy data transactions [18].
Algorithmically, Blockchain comprises a number of essential elements, procedures, and guidelines that together create a robust and feature-rich decentralized system. Initializing basic elements, such as a consensus mechanism and cryptographic algorithms for secure key management and hashing, is the first step in the process. Implementing token and smart contract standards such as ERC-20 and ERC-721 increases functionality by managing the creation, transfer, and ownership of assets [19]. The approach also incorporates Decentralized Identifiers (DIDs), for example in combination with zero-knowledge proofs, to guarantee secure identification, meet privacy standards, and offer strong protection of user data. Interoperability standards such as the Interledger Protocol make cross-chain communication easier [20]. The integration of decentralized storage protocols, such as IPFS, ensures distributed, censorship-resistant file storage. Security measures provide protection against vulnerabilities, compliance standards guarantee conformity to legal requirements, and governance standards support secure and efficient decision-making. This all-encompassing strategy creates a conceptual framework for building a Blockchain that integrates these fundamental criteria, promoting a secure, interoperable, and privacy-respecting decentralized ecosystem.
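To ground the hashing and consensus elements mentioned above, the following self-contained Python sketch builds a toy hash-linked chain with a simplified proof-of-work; it is a didactic illustration rather than a real Blockchain node, and the `Block`, `mine`, and `is_valid` helpers are assumptions of the example.

```python
import hashlib
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class Block:
    index: int
    timestamp: float
    data: dict
    previous_hash: str
    nonce: int = 0

    def hash(self) -> str:
        """Cryptographic digest of the block contents; the next block links to it."""
        return hashlib.sha256(json.dumps(asdict(self), sort_keys=True).encode()).hexdigest()

def mine(block: Block, difficulty: int = 3) -> Block:
    """Toy proof-of-work: find a nonce whose hash starts with `difficulty` zeros."""
    while not block.hash().startswith("0" * difficulty):
        block.nonce += 1
    return block

def is_valid(chain: list[Block]) -> bool:
    """A chain is valid when every block references the hash of the one before it."""
    return all(chain[i].previous_hash == chain[i - 1].hash() for i in range(1, len(chain)))

genesis = mine(Block(0, time.time(), {"event": "genesis"}, previous_hash="0" * 64))
block1 = mine(Block(1, time.time(), {"transfer": "asset-42 to alice"}, previous_hash=genesis.hash()))
print(is_valid([genesis, block1]))  # -> True
```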
Non-Fungible Tokens (NFTs) are a ground-breaking innovation in the ownership and management of digital assets. Because every NFT is distinct and has a unique identifier, it cannot be replicated or exchanged on a like-for-like basis; this uniqueness is achieved through Blockchain technology. NFTs are used to verify ownership of a wide range of digital and physical goods [21], including digital art, music videos, real estate, gaming avatars, etc. NFTs are also crucial to Web 3.0, the next iteration of the Internet that many companies and analysts are promoting. Blockchain’s decentralized structure guarantees the integrity and transparency of ownership data, and smart contracts streamline transactions by automating tasks such as ownership transfers and royalty distribution.
Finally, the NFT process algorithm starts with the digital asset being initialized, having its nature defined, and being given a unique identifier. The implementation of a smart contract that oversees the NFT requires integration with a Blockchain platform, such as Ethereum, via the ERC-721 standard [22]. The NFT is created during the minting process by adding ownership information and other pertinent metadata to the smart contract. Smart contract updates enable ownership transfers, guaranteeing safe and transparent transactions recorded on the Blockchain. The NFT ecosystem is made more efficient by automating features in the smart contract, such as the distribution of royalties upon resale. NFTs are listed on marketplaces such as OpenSea or Rarible, where buyers and sellers can transact, making them more widely available [21]. Authenticating an NFT involves verifying the integrity of its associated metadata and examining its ownership records on the Blockchain. The aforementioned algorithmic procedure forms the foundation of the NFT lifecycle, providing a methodical way to create, transfer, and confirm ownership of distinct digital assets on the Blockchain. The whole process is depicted graphically in Figure 3.
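The lifecycle just described can be mimicked with a toy, in-memory registry, sketched below in Python under strong simplifications: it is not Solidity code and does not follow the ERC-721 interface, but it reproduces the mint, transfer-with-royalty, and authentication steps; the `NFTRegistry` class, its 5% royalty rate, and the sample metadata are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class NFTRegistry:
    """Toy, in-memory analogue of an ERC-721-style contract: mint, transfer with a
    royalty to the creator on resale, and verify metadata integrity via its hash."""
    owners: Dict[int, str] = field(default_factory=dict)         # token_id -> owner
    creators: Dict[int, str] = field(default_factory=dict)       # token_id -> original creator
    metadata_hash: Dict[int, str] = field(default_factory=dict)  # token_id -> sha256 of metadata
    royalty_rate: float = 0.05
    next_id: int = 1

    def mint(self, creator: str, metadata: str) -> int:
        """Create the token and bind it to a digest of its metadata."""
        token_id = self.next_id
        self.next_id += 1
        self.owners[token_id] = creator
        self.creators[token_id] = creator
        self.metadata_hash[token_id] = hashlib.sha256(metadata.encode()).hexdigest()
        return token_id

    def transfer(self, token_id: int, seller: str, buyer: str, price: float) -> float:
        """Transfer ownership and return the royalty owed to the creator on resale."""
        assert self.owners[token_id] == seller, "only the current owner can sell"
        self.owners[token_id] = buyer
        return price * self.royalty_rate if seller != self.creators[token_id] else 0.0

    def authenticate(self, token_id: int, metadata: str, claimed_owner: str) -> bool:
        """Check both the ownership record and the metadata digest."""
        return (self.owners.get(token_id) == claimed_owner and
                self.metadata_hash.get(token_id) == hashlib.sha256(metadata.encode()).hexdigest())

registry = NFTRegistry()
token = registry.mint("artist", metadata='{"title": "Hen #1"}')
registry.transfer(token, "artist", "alice", price=100.0)         # primary sale, no royalty
royalty = registry.transfer(token, "alice", "bob", price=150.0)  # resale pays 5% royalty
print(registry.authenticate(token, '{"title": "Hen #1"}', "bob"), royalty)  # -> True 7.5
```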