The application implemented is a Progressive Web App written in TypeScript using the Next.js framework. On first access, a Service Worker is installed on the user's device and, thanks to the presence of a Web App Manifest, the user is asked whether they wish to install the application on their device, whether mobile or desktop.
The REST APIs it uses are implemented within the Next.js framework using TypeScript and run within Node.js, a JavaScript runtime built on the V8 JavaScript engine from Google Chrome.
The decision to use Next.js was dictated by the desire to implement both the client and server components within the same project, using the same programming language. During development this not only made it simple to share software modules, but also allowed application logic to be moved from the client to the server in a completely transparent and immediate manner, making it possible to find a sound compromise between building REST APIs potentially reusable in other projects and realizing a Progressive Web App.
Instead of JavaScript, it was decided to use TypeScript: the latter was built as a superset of JavaScript with the addition of an optional static typing system that enables static analysis of the code, allowing the compiler to detect and avoid many of the most common errors and providing a smoother and faster programming experience.
This application is easily scalable horizontally by exploiting the PaaS (Platform as a Service) features of Heroku [128], which allow the number of running instances of the application to grow as traffic increases, while Azure Cosmos DB, being offered as a SaaS (Software as a Service) [124], can scale in a manner completely transparent to the user, who only has to indicate the maximum level of service required.
The application takes care of returning to the user the Service Worker and the static resources, such as JavaScript files and style sheets, that make up the Single Page Application running on the client, as well as providing the REST API for requesting routes; it requires some environment variables to configure some of its functions.
The decision to use environment variables makes it possible to securely manage the access credentials to the Cosmos DB database and allows for easy deployment. Furthermore, again with a view to ensuring security and data integrity, communication between the client application running in the user's browser and the server is performed exclusively over the secure HTTPS protocol.
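As an illustration, a minimal sketch of how such configuration might be read in a Next.js/Node.js API route is given below; the variable names are assumptions and not necessarily those used by the application.

const cosmosEndpoint = process.env.COSMOSDB_ENDPOINT; // Cosmos DB account URL (assumed name)
const cosmosKey = process.env.COSMOSDB_KEY;           // access credential, kept out of source control (assumed name)

// The cache is optional: without credentials the application falls back to
// calling the external services directly (see Section 3.3.7).
const cacheEnabled = Boolean(cosmosEndpoint && cosmosKey);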
3.1. Innovation
A study was carried out to choose the web services and related communication protocols required to support keyword-based knowledge search alongside existing search engine interfaces.
The use of the Linked Open Data (LOD) platform required that certain characteristics be taken into account in the choice:
- the use of URIs (Uniform Resource Identifiers) to identify physical or abstract resources,
- the use of HTTP as the resource access protocol,
- the representation of data in the Resource Description Framework (RDF) format,
- the SparQL protocol for querying,
- the inclusion of links to other URIs so that further resources can be discovered.
The SparQL communication protocol for querying knowledge bases is thus defined by the choice of using LOD. SparQL is an open standard which, via the Internet and a query and manipulation language, operates on data represented in RDF format in a way similar to how SQL is used for relational databases. The RDF data format is a standard for representing information in the semantic web and creating typed links between arbitrary objects (associations between resources), and consists of the Subject-Predicate-Object triple [59].
The Subject represents the main resource or entity about which a statement is being made, the Predicate specifies the relationship or attribute between the subject and the object, and the Object represents the value or object of the statement [60].
RDF data can be represented in several formats, including RDF/XML, JSON-LD, Turtle, N-Triples, N-Quads, TriG and TriX.
RDF triples, by exploiting semantic technologies and including metadata, can be combined to form more complex structures known as knowledge graphs. These improve the performance of search engines by providing direct answers to common queries, combining data from different sources and surfacing information not explicitly requested in the keywords. In knowledge graphs, each node and each relation is represented by an RDF triple and, like RDF data, knowledge graphs can be queried using the SparQL language, searching for relations, patterns, related concepts, specific properties and more.
Knowledge graphs use the RDF model as a practical way of representing real-world concepts and the relationships between them, incorporating not only structured data but also semantic and associative information between concepts.
By equipping the platform with a simple API, it becomes capable of connecting to various RDF databases as well as to remote SparQL endpoints for retrieving the information necessary for both the representation of cultural-historical itineraries and the storage of all the details and metadata required for data enrichment.
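As a sketch of how such a connection can be made, the following TypeScript fragment queries a generic SparQL endpoint over HTTP using the standard SparQL protocol and the JSON results format; the function name and error handling are illustrative only.

async function querySparqlEndpoint(endpoint: string, query: string): Promise<unknown> {
  // The SparQL protocol accepts the query as a URL parameter on a GET request
  const url = `${endpoint}?query=${encodeURIComponent(query)}`;
  const response = await fetch(url, {
    headers: { Accept: 'application/sparql-results+json' },
  });
  if (!response.ok) {
    throw new Error(`SparQL endpoint returned ${response.status}`);
  }
  return response.json();
}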
In addition, by exposing SparQL endpoints in turn, the platform is able to offer the general public access to the data stored in it in various formats, as well as manually enriched or reconstructed data such as customised maps and archaeological and historical data annotated with metadata and georeferenced.
Tools capable of handling the types of data managed by the platform and retrieved from search engines and other external sources, such as structured and unstructured data, images, texts, etc., were also studied. MultiMedia Database Management Systems (MMDBMSs), capable of managing heterogeneous and unstructured multimedia data more efficiently than a traditional DBMS and of providing support to applications interacting with such information, were then examined.
In particular, MMDBMSs must possess certain distinctive features:
- Support for multimedia data types, i.e. the ability to handle multimedia data such as images, audio, video and other multimedia formats.
- Functionality for creating, storing, accessing, querying and controlling multimedia data.
- Support for traditional DBMS functions, i.e. not only managing multimedia data but also providing the traditional functions of a database management system, such as database definition, data retrieval, data access, integrity management, version control and concurrency support.
Among commercial MMDBMSs, a system of interest for the platform is Virtuoso Universal Server, of which there is an open-source version known as OpenLink Virtuoso. In addition to offering multi-model functionality in a single system, allowing the management of structured, semi-structured and unstructured data such as relational data, RDF, XML and plain text, Virtuoso Universal Server also supports the SparQL query language (in addition to SQL) and integrates file servers, web servers, web services and web content management systems. It also has built-in support for Dublin Core, a standard for describing and annotating digital resources by means of metadata, which allows DC (Dublin Core) metadata to be associated with stored data [61].
DC metadata is a standardised set of metadata elements used to describe digital resources such as documents, images, videos, web pages and more. This metadata provides essential information to enhance the discovery of digital resources, enabling users to find, identify and retrieve relevant content based on the information provided by the metadata. Some of the common DC elements are: Title (resource name), Author, Date, Subject, Description, Format, Type, Identifier, Language, Publisher. These elements can be extended to include more detailed and specific information as required.
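By way of example, the following TypeScript interface sketches how a DC record could be modelled in the application; the field names follow the Dublin Core element names (the Author element above corresponds to the dc:creator term), and the choice of optional fields is an assumption.

interface DublinCoreRecord {
  title: string;        // resource name
  creator?: string;     // author of the resource
  date?: string;
  subject?: string;
  description?: string;
  format?: string;
  type?: string;
  identifier?: string;
  language?: string;
  publisher?: string;
}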
3.3. Selection of places of interest
To select places of interest, a process consisting of several steps has been implemented which, starting from a topic of interest, the possible indication of an Italian region and the number of days available, is able to offer the user a series of places to visit, as illustrated in Figure 6.
Figure 6. Selection of places of interest.
To start the procedure, the user indicates a particular topic of interest for the search, can optionally indicate an Italian region to narrow the search, and specifies the number of days available to limit the number of locations proposed in the itinerary. Using a SparQL query, all the places that preserve cultural objects relating to the topic of interest are identified, together with the number of objects found; the results are sorted so as to propose the places with the greatest number of objects, as detailed in the following sections. The regional filter uses the Italian regions identified with a separate SparQL query.
Unfortunately, the structure of the data used turned out to be non-homogeneous; therefore, to identify the geographical coordinates of the identified places, it was necessary to analyze various relationships present in the database and use a geocoding service to obtain the correct coordinates.
This type of query proved very slow on the public SparQL endpoint used: response times are in the order of ten seconds, well above the limit of 2 seconds that a Web user is willing to tolerate [52]. It was therefore necessary to implement a persistent cache system that allows faster access to the results.
Once the places have been obtained, with the indication of their position and the number of interesting objects preserved, they can be grouped by the cities in which they are located, in order to identify the cities that preserve the greatest number of cultural assets; the procedure then continues with the identification of the services and accommodation available nearby, through geographical queries, as detailed below.
3.3.1. Search for culture points
By analyzing the data contained in the Catalog of Cultural Heritage, the CulturalProperty class of the Arco ontology was identified; this class represents both material and immaterial cultural assets, recognized as part of the national cultural heritage and useful for the knowledge and reconstruction of history and landscape. For the query, all the subclasses related to the searched topic were considered.
Once the cultural assets have been selected, the places where they are preserved can be obtained by considering the hasCulturalInstituteOrSite relationship [53], defined in the Location ontology of Arco [54], which connects a cultural asset to its container (place or institute of culture). Those marked as deprecated by the presence of the owl:deprecated value are automatically excluded.
By optionally filtering by the selected region, grouping the results by cultural place and counting the cultural objects, the requested data were extracted together with the indication of the city in which each place is located.
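A minimal sketch of such a grouping query, written as a TypeScript template string, is shown below; the prefix IRIs follow those published by the ArCo project, but the exact prefixes, property paths and filters of the production query are assumptions and are simplified or omitted here.

const placesByObjectCount = `
  PREFIX arco: <https://w3id.org/arco/ontology/arco/>
  PREFIX a-loc: <https://w3id.org/arco/ontology/location/>
  SELECT ?site (COUNT(?property) AS ?objects) WHERE {
    ?property a arco:CulturalProperty ;
              a-loc:hasCulturalInstituteOrSite ?site .
    # topic, region and owl:deprecated filters omitted for brevity
  }
  GROUP BY ?site
  ORDER BY DESC(?objects)
`;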
3.3.2. Identification of the Italian regions
In the previous section, the cultural assets were filtered by the region to which they belong. For this reason, it was first necessary to identify the Italian regions. From an analysis of the data, the regions turned out not to be uniquely represented in the Cultural Heritage dataset, but to present the critical issues already highlighted.
3.3.3. Non-homogeneous data
A very obvious example concerns geolocation, in which two ontologies, the Location Ontology and the Basic Geo (WGS84 lat/long) Vocabulary [55], are used to represent latitude, longitude and other spatial information. Geolocation information was associated inconsistently, using different data hierarchies.
These issues require the implementation of more complex queries, capable on the one hand of managing multiple different ontologies and on the other of examining different data structures to extract the information. Clearly, as the number of different strategies used in the query increases, the execution time grows accordingly.
In the presence of non-homogeneous data, the exhaustiveness of the data obtained cannot be guaranteed, because it is not possible to write an exhaustive query: neither the number of different ontologies used nor the class hierarchies employed are known a priori.
For example, within the dataset the geographical coordinates of a CulturalInstituteOrSite can be obtained in various ways:
• directly from the lat and long coordinates of the Basic Geo ontology;
• through the hasSite relation, which leads to the Site class carrying the lat and long coordinates of the Location Ontology;
• through the hasGeometry relation of the Italian Core Location Vocabulary (CLV) ontology [56] towards the Geometry class, which carries the lat and long coordinates of the Location Ontology;
• through the hasGeometry relation of the CLV ontology towards the Geometry class and then through the hasCoordinates relation of CLV towards the Coordinates class, which carries the lat and long coordinates of the Location Ontology;
• using the coordinates associated with a cultural property that shares the same siteAddress, associated with the Site, in turn associated with the CulturalInstituteOrSite class.
After querying these sources, all the results must be merged into a single pair of lat and long coordinates so that the information can be returned.
To avoid exceeding the maximum execution time of a SparQL query on the database, it was necessary to execute the individual queries separately; the logic to identify the geographical coordinates or, in their absence, the address with which to proceed with geocoding is then implemented within the application.
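A sketch of this merging logic is shown below; the type and function names are illustrative and the real implementation may differ.

interface Coordinates { lat: number; long: number; }

// Merge the partial results of the separate coordinate queries: the first
// strategy that produced a complete pair wins; otherwise fall back to the
// address, which will be passed to the geocoding step.
function resolveCoordinates(
  results: Array<Partial<Coordinates>>,
  address?: string,
): { coords?: Coordinates; geocode?: string } {
  for (const r of results) {
    if (r.lat !== undefined && r.long !== undefined) {
      return { coords: { lat: r.lat, long: r.long } };
    }
  }
  return { geocode: address };
}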
3.3.4. Geocoding
To obtain the spatial coordinates absent from the Cultural Heritage dataset it is necessary to use geocoding; the available information (name of the site, address of the site) is used, which suffers from problems similar to those previously described.
In order to obtain good-quality results, three separate geocoding operations are carried out for each site, using different values and parameters; the results obtained are then sorted according to the importance parameter returned by the service itself.
The geocoding service used is OpenStreetMap's Nominatim, which has a limit of one request per second. To limit excessive consumption of service resources, the results are stored in the application cache.
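A minimal sketch of such a geocoding call is given below; the query composition is simplified to a single request and the User-Agent string is an example, while the real implementation performs three separate operations and caches the results as described.

async function geocode(query: string): Promise<{ lat: number; lon: number } | undefined> {
  const url = `https://nominatim.openstreetmap.org/search?format=json&q=${encodeURIComponent(query)}`;
  const response = await fetch(url, {
    // Nominatim's usage policy requires an identifying User-Agent and at most one request per second
    headers: { 'User-Agent': 'cultural-itinerary-app-example' },
  });
  const results: Array<{ lat: string; lon: string; importance: number }> = await response.json();
  // Keep the candidate that the service itself ranks as most important
  results.sort((a, b) => b.importance - a.importance);
  const best = results[0];
  return best ? { lat: Number(best.lat), lon: Number(best.lon) } : undefined;
}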
3.3.5. Itinerary creation
Once a list of geolocalized cultural places has been obtained, ordered by number of cultural assets, the stages of the itinerary must be defined. To simplify the algorithm, it was assumed that only one city can be visited per day, and a parameter was introduced to select how many cultural places can be visited each day. With these simplifications, a recursive algorithm was implemented with the following logic:
• group cultural places by city and sort the list in descending order;
• remove the first N cultural places with multiple objects from the list to create a stop on the itinerary, where N is the maximum number of cultural places that can be visited per day;
• repeat the algorithm, if cultural places are still present.
From this list, it is then sufficient to take as many stops as the number of days available to obtain the requested itinerary.
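A compact sketch of this recursive logic is shown below; the data shapes and names are illustrative, not the application's actual types.

interface CulturalPlace { city: string; objects: number; }

// Group places by city, pick the city with the most cultural objects,
// take at most `perDay` of its places as one stop, then recurse on the rest.
function buildStops(places: CulturalPlace[], perDay: number): CulturalPlace[][] {
  if (places.length === 0) return [];
  const byCity = new Map<string, CulturalPlace[]>();
  for (const p of places) {
    const list = byCity.get(p.city) ?? [];
    list.push(p);
    byCity.set(p.city, list);
  }
  const totalObjects = (l: CulturalPlace[]) => l.reduce((s, p) => s + p.objects, 0);
  const richestCity = [...byCity.values()].sort((a, b) => totalObjects(b) - totalObjects(a))[0];
  const stop = richestCity.sort((a, b) => b.objects - a.objects).slice(0, perDay);
  const remaining = places.filter((p) => !stop.includes(p));
  return [stop, ...buildStops(remaining, perDay)];
}

Taking the first elements of the returned list, e.g. buildStops(places, perDay).slice(0, days), then yields the requested itinerary.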
3.3.6. Selection of services and accommodation
To create an itinerary that users can actually follow, once the sites to visit have been identified, it is necessary to provide information on the services available and the accommodation available for overnight stays. Selection criteria were therefore implemented, such as the style, which determines the type of services and accommodation to be shown, and the distance from the places of interest, as shown in Figure 7.
Figure 7. Selection of services.
Using OpenStreetMap data, the search for services and accommodation is carried out with two alternative implementations, one based on Sophox and one via Overpass.
For both services it was possible to use the same criteria as they both rely on the same database.
For style-based accommodation selection, hotels are divided according to their stars.
Other types of accommodation were also considered: 'chalet', 'apartment', 'hostel', 'guest_house', 'motel', 'camp_site', 'alpine_hut', 'wilderness_hut'.
As regards the selection of services, the classification shown in the following code fragment was used:
// Amenity types proposed for each itinerary style
case Style.Luxury:
  return ['restaurant'];
case Style.Medium:
  return ['restaurant', 'food_court', 'pub', 'cafe'];
case Style.Budget:
  return ['pub', 'cafe'];
Sophox is queried via GeoSparQL queries; in addition to the filters described above, the distance in kilometers from a given point is specified.
Overpass querying was implemented using Overpass QL; in addition to the filters described above, the distance in meters from a given point is specified.
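The following fragment sketches how such an Overpass request might be built; the amenity values come from the classification above, while the endpoint URL and radius handling are illustrative.

// Find the selected amenity types within `radiusMeters` of a point using Overpass QL.
async function findServices(lat: number, lon: number, amenities: string[], radiusMeters: number) {
  const filters = amenities
    .map((a) => `node["amenity"="${a}"](around:${radiusMeters},${lat},${lon});`)
    .join('');
  const query = `[out:json];(${filters});out;`;
  const response = await fetch('https://overpass-api.de/api/interpreter', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: 'data=' + encodeURIComponent(query),
  });
  return response.json();
}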
3.3.7. Cache
For all the services mentioned, a simple persistent cache system has been implemented both to improve their use and to respect the conditions of use of the services themselves.
Before sending the request to the service, the cache is queried and if the response is already present in the cache, the value present in the cache is returned without making the request to the service. If the value is not present in the cache, the request is instead sent to the service and its response, even in the event of an error, is saved in the cache before being returned.
To avoid showing users stale data, a TTL (Time To Live) check has been implemented which invalidates cached entries once they exceed a configured age.
To maximize performance, a separate container is used for each type of request, created on the first request and indexed using the same field as the unique identifier of the request.
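A sketch of this read-through behaviour is given below; the storage layer is abstracted behind a small interface so that the example does not depend on the details of the CosmosDB SDK, and all names are illustrative.

interface CacheStore {
  read(container: string, id: string): Promise<{ value: unknown; savedAt: number } | undefined>;
  write(container: string, id: string, value: unknown): Promise<void>;
}

// Return the cached value if it is fresher than `ttlMs`; otherwise call the
// service, store the response (even an error payload) and return it.
async function cached<T>(
  store: CacheStore,
  container: string,
  id: string,
  ttlMs: number,
  request: () => Promise<T>,
): Promise<T> {
  const hit = await store.read(container, id);
  if (hit && Date.now() - hit.savedAt < ttlMs) {
    return hit.value as T;
  }
  const value = await request();
  await store.write(container, id, value);
  return value;
}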
The cache data is saved in the Microsoft CosmosDB system [57], which offers:
- NoSQL Database features with a simple API
- an availability level of 99.999%
- a free level of service more than adequate for the needs of this project.
All the various services have been implemented in such a way as to make it possible to use the application without cache if the credentials for using CosmosDB are not provided.
3.3.8. WebGIS presentation
The generated itinerary is shown in Figure 8, using Leaflet together with OpenStreetMap cartography.
3.3.9. Layers
The map is made up of different layers, updated independently to increase the performance and responsiveness of the application; information is displayed progressively as it becomes available.
The different layers are as follows:
• Current position: the current position, represented with a blue icon, is obtained using the Geolocation API [58], supported by all the main browsers, after asking the user for permission;
• Raster map: the OpenStreetMap service provides the tiles used for the map;
• Cultural Sites: The identified cultural places are represented with a green icon;
• Accommodations and Services: The accommodations and services are extracted as indicated and are represented in two separate layers; they are indicated respectively with a red and a yellow icon and can be deactivated using the control at the top right;
• Itinerary: the itinerary to visit all the cultural places shown constitutes a polyline, kept separate from the other layers and decorated with triangles to indicate the direction of the route.
To avoid duplication in the code, the layers relating to the Cultural Sites, Accommodations and Services were implemented using the same generic software capable of representing a set of points of interest.
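A minimal sketch of such a generic layer using the Leaflet API is given below; the point-of-interest shape and icon handling are illustrative.

import * as L from 'leaflet';

interface PointOfInterest { name: string; lat: number; lon: number; }

// Generic layer shared by Cultural Sites, Accommodations and Services:
// one marker per point of interest, all using the same icon.
function buildPoiLayer(points: PointOfInterest[], iconUrl: string): L.LayerGroup {
  const icon = L.icon({ iconUrl, iconSize: [25, 41] });
  const markers = points.map((p) => L.marker([p.lat, p.lon], { icon }).bindPopup(p.name));
  return L.layerGroup(markers);
}

Layers built this way can then be registered with a layers control, e.g. L.control.layers(undefined, { Accommodations: accommodationLayer, Services: serviceLayer }).addTo(map), which provides the on/off toggle at the top right of the map.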
3.3.10. Routing
The polyline, which represents the optimal sequence in which to visit the places of interest, is generated using the Open Source Routing Machine (OSRM) service, providing the coordinates of the Cultural Sites and, if available, the user's location.
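A sketch of such a routing request against an OSRM HTTP endpoint is given below; the public demo server URL and the driving profile are used for illustration, and the project may rely on a different instance or profile.

// Request the route through all the stops and return its geometry as
// [lat, lon] pairs, ready to be drawn as the itinerary polyline.
async function fetchRoute(stops: Array<{ lat: number; lon: number }>): Promise<Array<[number, number]>> {
  const coords = stops.map((s) => `${s.lon},${s.lat}`).join(';');
  const url = `https://router.project-osrm.org/route/v1/driving/${coords}?overview=full&geometries=geojson`;
  const response = await fetch(url);
  const data = await response.json();
  // GeoJSON coordinates are [lon, lat]; Leaflet expects [lat, lon]
  return data.routes[0].geometry.coordinates.map(
    ([lon, lat]: [number, number]) => [lat, lon] as [number, number],
  );
}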