3.2. Methodology
The proposed methodology for dynamic visitor profiling and group formation in museums utilizes a combination of explicit and implicit profiling techniques, and smart badge technology. The methodology starts with coarse-grained profiling through OAuth registration, where visitors share basic information like age, gender, interests, etc. This profile is further refined through brief surveys conducted via a mobile app, collecting more detailed information about visitor interests and expectations.
Fine-grained profiling is achieved by monitoring how visitors interact with exhibits. This includes tracking the sequence of rooms visited and the time spent at each exhibit, offering a deeper analysis of visitor preferences and engagement. Visitors wear smart badges, which transmit telemetry data such as location and accelerometer readings, allowing for precise localization and movement tracking within the museum.
Dynamic group formation is achieved using smart visitor badges. When a visitor starts the process, the system detects nearby visitors based on badge signal strength (RSSI). Those interested in joining the group can accept the invitation by pressing a button on their badge. Once the group is formed, a composite profile is created from the common segments of individual profiles, which is then used to tailor content to the group’s shared interests. This approach enhances the social aspect of the museum experience by enabling visitors with similar interests to connect and explore together with personalized recommendations. Additionally, the museum can provide specialized tours and experiences tailored to these groups, enriching the visitor experience further.
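The proximity step described above can be sketched as follows. This is a minimal illustration, not the system's implementation: the `BadgeReading` type, the function names, and the -65 dBm cutoff are all assumptions introduced here, since the text does not state the RSSI threshold or the data format.

```python
from dataclasses import dataclass

# Assumed cutoff: badges heard at or above this RSSI (in dBm) count as "nearby".
RSSI_THRESHOLD_DBM = -65


@dataclass(frozen=True)
class BadgeReading:
    badge_id: str
    rssi_dbm: int           # signal strength observed by the initiator's badge
    accepted: bool = False  # True once the wearer pressed the badge button


def nearby_badges(readings, threshold=RSSI_THRESHOLD_DBM):
    """Return IDs of badges whose signal is strong enough to be invited."""
    return [r.badge_id for r in readings if r.rssi_dbm >= threshold]


def form_group(initiator_id, readings, threshold=RSSI_THRESHOLD_DBM):
    """Group = initiator plus nearby badges whose wearers accepted the invitation."""
    invited = set(nearby_badges(readings, threshold))
    members = [r.badge_id for r in readings if r.badge_id in invited and r.accepted]
    return [initiator_id] + members
```

A badge that accepts but is out of range is not added, and a badge in range that does not accept is ignored, which mirrors the two conditions in the text.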
All collected data—from OAuth registration, mobile app surveys, tracking movements, and smart badge interactions—is integrated into a visitor profile database. This data integration and analysis process enables the museum to deliver highly personalized content and recommendations to each visitor.
The museum visitor profiling service integrates several subsystems within a distributed architecture to provide a personalized and seamless experience. Central to the system is a mobile app that handles visitor registration, authorization, and profiling through brief surveys. This app communicates with the Kio Cloud via an SDK for Android and iOS, enabling real-time data transmission. The system’s architecture ensures interoperability, scalability, and security, facilitating smooth data flow and safeguarding visitor information.
By integrating these components, the proposed methodology provides a comprehensive solution for enhancing visitor experiences through personalized content delivery and dynamic group interactions. This approach not only improves engagement but also promotes social interaction, thereby enriching the overall museum visit.
3.3. Database Design
To support hybrid museum visitor profiling, the service requires a flexible database architecture. This database captures and manages extensive data on both physical and digital visitor interactions, including individual profiles, visit histories, and engagement metrics with exhibits. The service uses a MongoDB database, well-suited for this purpose due to its flexible schema design, scalability, and robust data-handling capabilities. MongoDB’s document-oriented architecture efficiently manages complex and evolving data structures, making it ideal for handling the dynamic data.
The database consists of eleven collections. The Entity Relationship (ER) diagram of the database is shown in Figure 2.
The Visitors collection is a core element of the museum database and holds the key information about individual visitors. Its primary properties, “profile” and “preferences”, combine OAuth-based pseudo-explicit registration data with explicit profiling from short surveys conducted during museum visits. The “profile” property holds data obtained from the OAuth registration process, while “preferences” reflects data collected from visitor surveys and implicit profiling. This collection is essential for tracking visitor segments, enabling targeted personalization, and supporting in-depth analysis. The preferences include the average viewing time (e.g., 5 minutes) and the pace of the visit (e.g., moderate). The collection also records the visitor’s engagement with technology, use of museum apps, preferred language, viewing habits such as time spent (e.g., Long), and group preferences (e.g., Any). Finally, it stores any impairment types the visitor may have, which helps accommodate their needs and enhance their overall experience.
The Authorizations collection manages OAuth authentication details for visitors, containing key data related to their login credentials and provider information. This collection includes the following main data:
visitor_id: A unique identifier for the visitor within the database, ensuring that authorization details are correctly associated with the visitor’s profile regardless of the social network used for authentication.
auth: An array of objects, each representing authentication details for a specific OAuth provider. Each object includes the “provider” field, which specifies the OAuth service provider used for authentication, such as “facebook” or “google”. The “provider_id” field contains the unique identifier assigned by the provider, linking the visitor’s account to the external service. The “email” field stores the visitor’s email address associated with the specific provider, ensuring accurate identification across different platforms. The “access_token” field holds the encrypted access token provided by the OAuth service, used for authenticating the visitor. The “user_data” field stores user information obtained from the OAuth provider.
The “user_data” field encapsulates supplementary user information obtained from the OAuth provider during the authentication process. For example, when using OAuth registration through Facebook, this field can include detailed attributes such as public_profile, email, user_hometown, user_birthday, user_age_range, user_gender, user_link, user_friends, user_location, user_likes, user_photos, user_videos, and user_posts, depending on the permissions granted. This data facilitates the delivery of contextually relevant content and allows for the customization of interactions and communication based on the visitor’s personal attributes, preferences, and demographic information.
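To make the structure concrete, a document in the Authorizations collection might look like the following sketch. The field names come from the description above; every value, and the keys inside `user_data`, are invented placeholders, and `providers_for` is a hypothetical helper.

```python
# Hypothetical Authorizations document: field names follow the text, values are invented.
authorization_doc = {
    "visitor_id": "visitor-001",
    "auth": [
        {
            "provider": "google",
            "provider_id": "google-user-123",
            "email": "visitor@example.com",
            "access_token": "<encrypted token>",
            # Keys inside user_data depend on the provider and granted permissions.
            "user_data": {"age_range": "25-34", "locale": "en"},
        },
    ],
}


def providers_for(doc):
    """List the OAuth providers linked to one visitor."""
    return [entry["provider"] for entry in doc["auth"]]
```

Keeping `auth` as an array lets one visitor link several providers under a single `visitor_id`.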
The Visits collection documents individual instances of museum visits. It tracks the total duration of each visit, the number of exhibits viewed, and the rooms visited. Each entry in this collection is associated with a specific visitor and badge, and it also records the groups to which the visitor belonged during the visit. This collection is crucial for maintaining a historical record of museum visits, enabling the analysis of visit patterns, durations, and group dynamics over time.
The Rooms collection maintains data about the museum’s physical spaces, including details for each room such as its name, capacity, floor number, and the building in which it is situated. It also tracks the IDs of exhibits present in each room, providing a clear association between rooms and their contained exhibits. This collection is essential for spatial analysis of visitor behavior, supporting effective museum layout planning, and optimizing space utilization.
The Room Movements collection monitors visitor movements between different rooms or areas within the museum. It records entry and exit times for each room visit and associates these visits with specific visitor trips and groups. This data is crucial for analyzing visitor flow and congestion within the museum, revealing which rooms or areas are most frequently visited.
The Exhibits collection catalogs detailed information about each exhibit within the museum. It includes data such as the exhibit’s name, artist, creation year, description, and multimedia content. Additionally, it records the exhibit’s location within the museum through a designated room ID and specific grid coordinates. The collection also tracks a popularity score for each exhibit, reflecting visitor engagement and interest. This repository is pivotal for referencing and analyzing exhibit characteristics, and it facilitates the correlation between exhibit details and visitor interactions.
The Exhibit Interactions collection records data on how visitors engage with specific exhibits. It tracks the duration and type of interaction each visitor has with an exhibit, linking these interactions to particular visits and groups. The collection also includes timestamp data, enabling temporal analysis of visitor behavior. This information is essential for assessing exhibit popularity and effectiveness, and for analyzing how different visitor segments or groups engage with exhibits.
The Badges collection oversees the physical badges issued to visitors for tracking purposes. It monitors the status of each badge (active or inactive), links badges to current visitors and their groups, and records the usage history for each badge. This collection is crucial for the proximity-based group formation feature, aiding in the management of the museum’s badge inventory and analyzing usage patterns.
The Visitor Groups collection tracks the formation and dissolution of visitor groups within the museum. It records essential details about each group, including associated visitors, badges, segments, and their current location. This collection is vital for analyzing social dynamics, such as how various group compositions interact with exhibits and each other. It shows how group size and composition impact museum experiences and helps in creating features and activities designed for groups. Understanding group behaviors, movement patterns, and exhibit preferences helps design spaces that foster social interaction and collaborative learning.
The Feedback collection is essential for refining visitor profiling and enhancing museum experiences. It records detailed input from visitors, linking feedback to specific individuals, visits, and, optionally, particular exhibits. Each entry includes qualitative comments, a quantitative satisfaction rating, and a timestamp.
The Segments collection categorizes visitors into one or more of 18 predefined segments based on their preferences and behaviors, enabling museums to deliver personalized content and experiences. This segmentation approach helps museums tailor their exhibits and interactions to meet diverse audience needs, enhancing overall visitor satisfaction and engagement.
The segment names are chosen to be easily understood by ChatGPT. The following segments are currently used:
Quick Visitors: Quick Visitors spend approximately 1-2 minutes per exhibit. They move quickly through the museum, focusing on key highlights and major attractions rather than in-depth exploration, and they prefer brief overviews and concise, impactful presentations.
Moderate Explorers: Moderate Explorers spend about 3-5 minutes per exhibit. They enjoy a balanced experience that includes both quick highlights and more detailed content. Their pace is moderate, allowing them to engage with a mix of brief and moderately detailed exhibits.
Leisurely Movers: Leisurely Movers spend approximately 6-10 minutes per exhibit. They engage more deeply with each exhibit, taking their time to explore detailed information and absorb the content. Their slower pace indicates a preference for thorough examination and contemplation of exhibits.
In-Depth Movers: In-Depth Movers spend more than 10 minutes per exhibit. They immerse themselves extensively in the details, often revisiting exhibits and taking significant time to fully engage with and understand the content. Their extended viewing time reflects a desire for a comprehensive and immersive museum experience.
Interactive Kids: This group consists of children aged 6 to 12 who are particularly attracted to interactive exhibits. Their visits tend to be brief, and the system should prioritize delivering content in the form of images, audio, and video to effectively engage this audience.
Teen Trendsetters: Teen Trendsetters are teenagers aged 13-17 years who are interested in technology and interactive media. They engage with multimedia and social features and tend to move quickly through exhibits, focusing on the latest trends and interactive elements.
Young Professionals: Young Professionals are individuals aged 18-30 years who are attracted to contemporary art, innovative exhibits, and social media. They prefer fast-paced exploration of trending exhibits and are often interested in the intersection of technology and art.
Midlife Explorers: Midlife Explorers are adults aged 31-50 who are deeply engaged with historical artifacts, detailed exhibitions, and educational content. They seek meaningful and in-depth experiences that reflect their mature perspective and extensive life experience.
Senior Art Connoisseurs: Senior Art Connoisseurs are visitors aged 51 and older who appreciate classic art, historical narratives, and in-depth stories. They enjoy leisurely visits with detailed explanations, taking their time to fully understand and reflect on the exhibits.
Mobility-Friendly Visitors: Mobility-Friendly Visitors are individuals with mobility impairments who need accessible routes and interactive aids. They are interested in exhibits with accessibility features and may prefer guided tours that accommodate their needs.
Sensory-Sensitive Visitors: This group includes individuals with sensory sensitivities, such as visual or hearing impairments. Visitors with visual impairments should be provided with audio content tailored to their needs. In contrast, visitors with hearing impairments should primarily receive visual content to ensure an accessible and engaging experience.
Language-Specific Aficionados: Language-Specific Aficionados are visitors who prefer or require exhibits in specific languages. They seek out exhibits with multilingual information or content available in their native or preferred language. They are interested in ensuring that their museum experience is accessible and comprehensible in their chosen language.
Group Collaborators: Group Collaborators are visitors who are willing to form a group. They enjoy collaborative and group-oriented exhibits and activities, valuing interactive and social experiences that allow them to engage with others.
Family Visitors: Family Visitors include groups consisting of parents and children or extended family members. They are interested in exhibits that are engaging for all age groups and may seek out interactive, educational, and family-friendly activities. They prefer exhibits that offer something for everyone and provide opportunities for family interaction and learning.
Solo Navigators: Solo Navigators prefer visiting the museum alone. They are interested in self-guided tours and personal reflection exhibits. These visitors value independence and personal space during their museum experience.
Art and Cultural Enthusiasts: Art and Cultural Enthusiasts are deeply interested in art history, cultural exhibits, and historical artifacts. They engage thoroughly with exhibits related to art and culture, seeking detailed narratives and rich historical context.
First-Time Explorers: First-Time Explorers are new visitors to the museum who seek a broad introduction to its offerings. They are interested in general overviews and introductory exhibits and may look for guidance or highlights to help them get acquainted with the museum.
Frequent Visitors: Frequent Visitors are regular patrons of the museum who come often to see new and changing exhibits, explore special collections, and access behind-the-scenes content. They appreciate the opportunity to engage with fresh displays and exclusive material.
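The pace-based and age-based segments above are defined by numeric ranges, so they can be assigned with simple rules. The sketch below is only illustrative: the function names are hypothetical, and the handling of values that fall between the stated ranges (for example, 2.5 minutes) or below them is an assumption, since the text does not specify it.

```python
def pace_segment(avg_minutes_per_exhibit):
    """Map average viewing time per exhibit to a pace segment.

    Assumption: gaps between the stated ranges are assigned to the faster segment.
    """
    if avg_minutes_per_exhibit < 3:
        return "Quick Visitors"
    if avg_minutes_per_exhibit < 6:
        return "Moderate Explorers"
    if avg_minutes_per_exhibit <= 10:
        return "Leisurely Movers"
    return "In-Depth Movers"


def age_segment(age):
    """Map a visitor's age to an age segment; None if no segment covers the age."""
    if 6 <= age <= 12:
        return "Interactive Kids"
    if 13 <= age <= 17:
        return "Teen Trendsetters"
    if 18 <= age <= 30:
        return "Young Professionals"
    if 31 <= age <= 50:
        return "Midlife Explorers"
    if age >= 51:
        return "Senior Art Connoisseurs"
    return None  # ages under 6 are not covered by the listed segments
```

Because a visitor can belong to several segments, the two functions are independent and their results would simply be combined into the visitor's segment list.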
3.4. Software Architecture
The Museum Visitor Profiling module within the ExhibitXplorer service employs a microservice-based architecture to ensure scalability, flexibility, and maintainability. This architecture is designed to handle visitor data efficiently while providing a seamless and personalized museum experience.
Figure 3 shows a summary diagram of the microservices used that are directly or indirectly related to museum visitor profiling.
Visitors interact with the museum’s services through a mobile application. This app interfaces with the backend infrastructure through an API Gateway.
The API Gateway serves as the main entry point for all client interactions. It is responsible for routing incoming requests to the appropriate microservices, handling load balancing, and managing security aspects such as authentication and rate limiting. This centralized access point ensures that external communications are streamlined and secure.
The Service Mesh is an architectural layer that manages communication between the microservices of the distributed application. It provides dedicated infrastructure for service-to-service interactions, which allows developers to focus on business logic rather than on communication concerns. The service mesh ensures that the services communicate efficiently and securely, and it provides tools for monitoring and managing their interactions. Several groups of modules operate within this mesh.
Core Modules are essential components for the system’s operation, encompassing the following microservices:
The Auth Service plays a critical role by managing user access, ensuring that only authorized individuals can interact with the system. It handles tasks such as registration, login, permission checks, and security protocols.
The Visitor Profile microservice maintains profiles for each visitor. This service manages the core visitor data, including the VISITORS, SEGMENTS, and VISITOR_GROUPS collections. It is responsible for handling visitor profiles, implementing visitor segmentation, and managing visitor groups. The service also maintains two mapping collections: VISITOR_SEGMENT_MAPPING (V-S MAP) and VISITOR_GROUP_MAPPING (V-G MAP). In a microservice architecture, mapping collections are database collections designed to manage the relationships between different entities. They are essential when services need to know about relationships without directly accessing each other’s primary data collections. They store references to the IDs of the entities they link, which allows connections to be looked up efficiently. Here, the mapping collections allow flexible associations between visitors and their segments or groups without tightly coupling the data.
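As an illustration of the mapping-collection idea, the sketch below models VISITOR_SEGMENT_MAPPING as a list of documents that hold only two IDs each. The field names `visitor_id` and `segment_id`, the sample values, and both helper functions are assumptions made for this example; the actual schema is given in the ER diagram.

```python
# Hypothetical rows of VISITOR_SEGMENT_MAPPING: each links one visitor ID to one segment ID.
visitor_segment_mapping = [
    {"visitor_id": "v1", "segment_id": "quick_visitors"},
    {"visitor_id": "v1", "segment_id": "solo_navigators"},
    {"visitor_id": "v2", "segment_id": "quick_visitors"},
]


def segments_of(visitor_id, mapping):
    """All segment IDs linked to one visitor."""
    return sorted(row["segment_id"] for row in mapping if row["visitor_id"] == visitor_id)


def visitors_in(segment_id, mapping):
    """All visitor IDs linked to one segment."""
    return sorted(row["visitor_id"] for row in mapping if row["segment_id"] == segment_id)
```

The same rows answer both directions of the many-to-many relationship, without either the visitor or the segment document embedding the other.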
The Content Personalization Service uses visitor profile data to customize recommendations and content for individual visitors. Functioning as both an aggregator and an analyzer, this microservice integrates data from various other microservices to create tailored content suggestions. The service uses a Chatbot service from ExhibitXplorer to interact with ChatGPT. It sends the Chatbot service requests that contain detailed information about all current visitor segments. Using this information, ChatGPT delivers relevant and personalized exhibit recommendations aligned with each visitor’s interests and preferences.
Tracking Modules are designed to capture and analyze data related to visitor behavior and interactions within the museum:
The Visit Tracking microservice monitors and records how visitors engage with exhibits and navigate through the museum. It tracks movements, interactions, and other visit-related activities to enhance visitor experiences and optimize exhibit placements. It manages the VISITS, EXHIBIT_INTERACTIONS, and ROOM_MOVEMENTS collections. To maintain relationships between visits and specific interactions or movements, it also includes two mapping collections: VISIT_EXHIBIT_MAPPING (V-E MAP) and VISIT_ROOM_MAPPING (V-R MAP). These mapping collections allow the service to efficiently record and query a visitor’s journey through the museum, including which exhibits they interacted with and how they moved between rooms.
Support Modules provide additional functionalities that support and enhance the core services:
The Badge Service manages visitor badges, which are used for access control and personalized experiences. It handles badge issuance, tracking, and associated data. It maintains the BADGES collection and includes a BADGE_VISITOR_MAPPING (B-V MAP) collection. This mapping collection is crucial for implementing proximity-based group formation, as it allows the service to quickly determine which visitors are associated with which badges at any given time. The Badge Service provides an API for assigning badges to visitors and querying badge-visitor associations.
The Exhibit Service maintains detailed information about museum exhibits, including their descriptions and attributes, ensuring that accurate and up-to-date information is available. It maintains the EXHIBITS collection, storing details about each exhibit such as its description, location, and any multimedia content associated with it. The Exhibit Service provides an API for retrieving exhibit information.
The Room Service manages data related to the museum’s physical layout, including room configurations and features. It maintains the ROOMS collection. The Room Service offers an API for querying room information, which can be used in conjunction with the Visit Tracking Service to analyze visitor movement patterns.
The Feedback Service collects and processes visitor feedback, which is crucial for assessing visitor satisfaction and making improvements to exhibits and services. It maintains the FEEDBACK collection, storing comments, ratings, and other forms of feedback provided by visitors. The Feedback Service offers APIs for submitting new feedback and retrieving existing feedback, which can be used to improve the museum experience and contribute to visitor profiling.
Together, these modules form a system that facilitates efficient visitor profiling.
3.5. Methods
3.5.1. Group Profiling
This subsection describes the implementation of the visitor group profiling algorithm, which categorizes each newly formed group into relevant segments. When a new group is created or an existing visitor profile is updated, the Visitor Profile microservice runs the algorithm to analyze the segments within the profiles of the group’s members. The group profile is then generated by aggregating the segments that appear more frequently than a predefined threshold, which is set as an environment variable.
To implement the visitor group profiling algorithm, we use the aggregation capabilities of MongoDB.
Figure 4 shows an example aggregation pipeline used to obtain segment statistics, that is, how many times each segment occurs in the profiles of visitors from a selected group. The pipeline starts by filtering documents that match the given “group_id” ($match). It then deconstructs the segments array in each document, creating a separate document for each segment ($unwind). Next, it groups these segments by their value and counts how often each segment appears ($group). The pipeline then sorts the segments by their count in descending order ($sort). Subsequently, the pipeline regroups all segments into a single array, where each entry contains the segment’s ID and its count ($group). Finally, the pipeline projects the result, outputting only the segments array without the “_id” field ($project).
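The six stages described above can be written out as a pipeline definition. The sketch below is a reconstruction from the prose, not a copy of Figure 4: the document shape (a `group_id` field and a `segments` array), the output field names `segment` and `count`, and both function names are assumptions. A pure-Python equivalent is included so the logic can be checked without a database.

```python
from collections import Counter


def segment_stats_pipeline(group_id):
    """Build the aggregation pipeline as plain dicts (usable with a MongoDB driver)."""
    return [
        {"$match": {"group_id": group_id}},                      # keep one group
        {"$unwind": "$segments"},                                # one doc per segment
        {"$group": {"_id": "$segments", "count": {"$sum": 1}}},  # count occurrences
        {"$sort": {"count": -1}},                                # most frequent first
        {"$group": {                                             # collect into one array
            "_id": None,
            "segments": {"$push": {"segment": "$_id", "count": "$count"}},
        }},
        {"$project": {"_id": 0, "segments": 1}},                 # drop the _id field
    ]


def segment_stats(docs, group_id):
    """In-memory equivalent of the pipeline, for illustration and testing."""
    counts = Counter(
        seg
        for doc in docs
        if doc.get("group_id") == group_id
        for seg in doc.get("segments", [])
    )
    return [{"segment": seg, "count": n} for seg, n in counts.most_common()]
```

With a driver such as PyMongo, the list returned by `segment_stats_pipeline` would be passed to a collection's `aggregate` method.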
Possible methods for setting a threshold for segment selection include absolute value thresholds, percentage-based thresholds, percentile-based thresholds, and statistical techniques. Absolute value thresholds involve setting a fixed numerical value, such as retaining only segments with counts exceeding a specific number. This method is straightforward but does not consider the variability within the data. Percentage-based thresholds use a proportion of the total count or maximum value, such as retaining segments that account for at least 20% of the total. While this method scales with the dataset, it can still be somewhat arbitrary. Percentile-based thresholds use statistical percentiles, such as the 75th percentile, to determine the cutoff for segment selection. This approach considers the relative position of segments within the distribution, ensuring that only those above a certain rank are kept. On the other hand, statistical techniques like Z-scores standardize data by converting counts into a common scale where the mean is 0 and the standard deviation is 1:
$$z = \frac{x - \mu}{\sigma}$$
where $x$ is the segment count, $\mu$ is the mean of the counts, and $\sigma$ is their standard deviation. By calculating Z-scores, the number of standard deviations by which a segment’s count deviates from the mean can be quantified. For instance, retaining segments with Z-scores greater than 1 selects those whose counts lie well above the average, providing a statistically grounded way to identify meaningful deviations.
When there are only a few visitors in a group, it is possible that no segments will have a Z-score greater than 1. In such cases, the algorithm aggregates the segments of all visitors. However, four segments are always prioritized when delivering content due to their significance: “Mobility-Friendly Visitors,” “Sensory-Sensitive Visitors,” “Language-Specific Aficionados,” and “Interactive Kids.” These segments are consistently considered when delivering content, regardless of whether the visitor is part of a group or visiting alone.
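The selection rule, including the small-group fallback and the four prioritized segments, can be sketched as below. Several details are assumptions not stated in the text: the population standard deviation is used, a zero standard deviation is treated as "no segment qualifies", and a prioritized segment is added only when at least one group member has it. The function name is hypothetical.

```python
from statistics import mean, pstdev

PRIORITY_SEGMENTS = {
    "Mobility-Friendly Visitors",
    "Sensory-Sensitive Visitors",
    "Language-Specific Aficionados",
    "Interactive Kids",
}


def select_group_segments(counts, z_threshold=1.0):
    """Pick group segments from a {segment: count} mapping using Z-scores."""
    if not counts:
        return set()
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        selected = set()    # all counts equal, so no segment stands out
    else:
        selected = {s for s, c in counts.items() if (c - mu) / sigma > z_threshold}
    if not selected:
        selected = set(counts)  # fallback: aggregate the segments of all visitors
    # Prioritized segments present in the group are always kept.
    return selected | (PRIORITY_SEGMENTS & set(counts))
```

For counts of 5, 1, 1, and 1 the mean is 2 and the population standard deviation is about 1.73, so only the segment with count 5 has a Z-score above 1 (about 1.73).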
3.5.2. Visitor Similarity Estimates
Delivering personalized content using ChatGPT requires significant time and resources. To reduce requests to the Chatbot microservice, Redis is used as a content caching server. When a request for custom content is made, the system checks if the desired information is already cached. This involves searching the cache for data related to the exhibit that was obtained for visitors with similar profiles. Jaccard’s algorithm will be used for this purpose. The Jaccard similarity measures the similarity between two sets by calculating the proportion of shared elements relative to the total number of distinct elements in both sets. It is calculated by dividing the size of the intersection of the two sets by the size of their union:
$$J(S_i, S_j) = \frac{|S_i \cap S_j|}{|S_i \cup S_j|}$$
where $S_i$ is the set containing the segments for the visitor with identifier $i$, and $J(S_i, S_j)$ is the probability that visitor $i$ and visitor $j$ have similar profiles. The similarity coefficient is a number in the interval [0, 1].
The Jaccard algorithm is highly efficient when comparing sets that are relatively small to medium-sized. The time complexity of computing the Jaccard similarity between two sets is $O(m + n)$, where $m$ and $n$ are the sizes of the sets. Because the sets in our case contain at most 18 segments, the intersection and union operations are computationally lightweight. To further optimize performance, particularly when managing frequent queries or larger data volumes, MongoDB’s built-in set operators, such as set intersection and set union, are used. These operators allow set operations to be performed directly within the database, reducing the need to transfer data to the application layer. This approach keeps the similarity calculations fast and scalable as the application grows.
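For reference, the Jaccard similarity of two segment sets takes only a few lines at the application level. This is a generic sketch; the function name and the convention of returning 0.0 for two empty sets are assumptions made here.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of segment names, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty profiles are treated as dissimilar
    return len(a & b) / len(union)
```

For example, the sets {A, B, C} and {B, C, D} share two of four distinct segments, giving a similarity of 0.5.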
There are two distinct strategies for retrieving cached content, each with its own approach to leveraging stored information:
Exhibit-Centric Strategy: This strategy starts by querying the cache to check for content related to the desired exhibit. If cached content is available, the system retrieves the identifiers of the visitors for whom this content was created. It then compares the visitor’s segments with those of the cached visitors to assess similarity. If a visitor with a sufficiently similar profile is found, the corresponding cached content is retrieved. If no similar profile is found or no cached content exists, the system will generate new content using ChatGPT.
Profile-Centric Strategy: This strategy begins by identifying visitors in the database whose profiles resemble that of the current visitor. Once similar profiles are identified, the system checks the cache for any content related to the desired exhibit that was generated for these visitors. If relevant cached content is found, it is retrieved. If no suitable cached content is available, the system defaults to generating new content using ChatGPT.
In this work, the Profile-Centric Strategy is employed because it significantly reduces the time required to retrieve the desired information. By identifying similar visitor profiles before querying the cache, the system reduces the search space and minimizes redundant operations. As a result, extracting the required content with the Profile-Centric Strategy takes approximately half the time required by the Exhibit-Centric Strategy, which improves the responsiveness of the content retrieval process.
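The Profile-Centric Strategy can be sketched as follows. To keep the example self-contained, a plain dictionary stands in for the Redis cache and a callback stands in for the Chatbot service; the cache key layout `(exhibit_id, visitor_id)`, the 0.6 threshold, and all function names are assumptions for illustration only.

```python
def jaccard(a, b):
    """Jaccard similarity of two segment sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def get_content(visitor_id, exhibit_id, profiles, cache, generate, threshold=0.6):
    """Profile-centric lookup: find similar visitors first, then check the cache."""
    target = profiles[visitor_id]
    # Step 1: rank the other visitors by profile similarity, most similar first.
    ranked = sorted(
        ((jaccard(target, segs), vid) for vid, segs in profiles.items() if vid != visitor_id),
        reverse=True,
    )
    # Step 2: look for content cached for a sufficiently similar visitor.
    for score, vid in ranked:
        if score < threshold:
            break
        cached = cache.get((exhibit_id, vid))
        if cached is not None:
            return cached
    # Step 3: nothing suitable cached, so generate new content and store it.
    content = generate(visitor_id, exhibit_id)
    cache[(exhibit_id, visitor_id)] = content
    return content
```

Only visitors above the similarity threshold trigger a cache lookup, which is the search-space reduction the text attributes to this strategy.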
Figure 5 illustrates an example aggregation pipeline that implements the Jaccard similarity in this process. This MongoDB aggregation pipeline identifies visitors whose segments closely match those of a target visitor. It first excludes the target visitor (“target_visitor_id”) from the results with the $match stage. Then, it performs a $lookup to find other visitors who share any segments with the target visitor, creating a list of matched visitors. The pipeline then computes the intersection and union sizes between the target visitor’s segments (“target_segments”) and each matched visitor’s segments using $setIntersection and $setUnion, respectively. The Jaccard similarity, representing the proportion of shared segments, is calculated by dividing the intersection size by the union size. Finally, it filters the results to include only those visitors whose similarity score exceeds a predefined threshold (“SIMILARITY_TH”).
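A simplified variant of such a pipeline is sketched below. It is not the pipeline of Figure 5: instead of a $lookup, it filters directly for visitors that share at least one segment with the target, and it assumes each visitor document carries its segment names in a `segments` array with the visitor ID in `_id`. The function name and the intermediate field names are likewise assumptions.

```python
def similar_visitors_pipeline(target_visitor_id, target_segments, similarity_th):
    """Build a Jaccard-similarity pipeline as plain dicts (simplified, no $lookup)."""
    target_segments = list(target_segments)
    return [
        # Exclude the target and keep only visitors sharing at least one segment.
        {"$match": {
            "_id": {"$ne": target_visitor_id},
            "segments": {"$in": target_segments},
        }},
        # Sizes of the intersection and union with the target's segments.
        {"$addFields": {
            "intersection_size": {"$size": {"$setIntersection": ["$segments", target_segments]}},
            "union_size": {"$size": {"$setUnion": ["$segments", target_segments]}},
        }},
        # Jaccard similarity = |intersection| / |union|; the union is never empty
        # here because every remaining visitor shares a segment with the target.
        {"$addFields": {"similarity": {"$divide": ["$intersection_size", "$union_size"]}}},
        # Keep only visitors above the threshold, most similar first.
        {"$match": {"similarity": {"$gt": similarity_th}}},
        {"$sort": {"similarity": -1}},
    ]
```

The literal segment list is embedded in the expressions, which works as long as no segment name begins with a `$` character.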