Introduction
Web tracking (WT) systems are technologies that automatically record data and behaviors of internet users. Their utilization by service providers is growing rapidly. WT tools can be examined from various perspectives, including company profits, privacy protection, information security, and system architecture, [
1]. For companies, WT tools are invaluable for collecting data from internet users. By analyzing this data, companies apply personalization techniques and optimize marketing and advertising strategies, [
2].
However, WT technologies also pose a threat to individual privacy rights. Internet users may be aware of how companies process their data, especially when it is consciously shared (e.g., through online or registration forms). Research indicates that users often lack awareness of the extent of personally identifiable information they share or who can access it—even when shared consciously. Awareness is even more limited when data is collected via WT technologies without users' knowledge.
Another critical aspect is the use of security mechanisms, like cryptography, to protect against unauthorized data disclosure or modification through WT tools. Additionally, the architecture of tracking systems is crucial, as the knowledge they produce depends on how each system operates in the background, [
3,
4].
Most WT research focuses on privacy concerns, highlighting the lack of understanding about the technical workings of these systems. There is also a lack of consistent classification schemes for WT systems. For example, [
5,
6], introduced a classification framework based on observable behaviors and properties, such as a tracker’s scope (within-site or cross-site tracking). Similarly, [
6,
7,
8], developed a scheme based on how websites use WT tools. Such works are essential for better understanding WT systems. Currently, most internet users have limited familiarity with WT systems (beyond cookies) and possess only basic knowledge of these technologies, as shown by recent surveys, [
5,
9,
10,
11,
12].
WT systems are web software tools that collect data and store it in files or databases. This data is transmitted through internet communication protocols, such as HTTP, during interactions between a web client (e.g., a browser) and a web server. The process begins with a request from the web client. For example, when a web client requests to view a website (i.e. the front end technologies consisting of: static HTML and CSS or dynamic webpages of JavaScript, and HTML/CSS files), it sends information through the HTTP protocol, [
13].
If the client does not send its IP address, the server cannot know where to deliver the website. This information is added to the HTTP protocol in text form. After the client gathers the requested data, the server sends the website to the client and stores the data received from the client (e.g., IP address, operating system details). These data points can be saved on either the client’s or the server’s side.
Data can also be transferred via APIs (application programming interfaces), which are sets of subroutine definitions, communication protocols, and programming tools designed to enhance software functionality. APIs make tracking significantly easier and more efficient. There are various methods to track user data, which will be examined in the following sections, [
7,
14].
Materials and Methods
Materials
The proposed classifying tracking systems is not very common in the literature, even though it can provide a better understanding on those systems. A classification scheme allows the separation of certain objects into categories, according on specified criteria. The current paper investigates WT systems based on how they work and their technological architecture and for that reason the proposed classification is developed based upon this. It’s clear that HTTP and APIs are the main methods of tracking, either if the data are stored on client either on server. Moreover, fingerprints are usually created by APIs or scripts, but it would be better to separate them from any other WT system. That’s because fingerprints don’t collect data, but it’s an exclusive category of identifying internet users in order to help other tracking mechanism. And there are more complicated tracking systems as well, combining all the previous tracking methods. Hence the current scheme proposed 6 classes, which are represented in
Table 1. Lastly to mention, this scheme is not based on any other previous research effort.
Class A – HTTP Tracking
This class includes technologies that rely on the HTTP protocol to collect data, often using HREFs, URLs, or HTTP GET requests. Code must be executed to store the data in files or databases, which can run either on the server or client. The data storage location isn’t necessarily tied to where the code is executed. For example, code may run on the server, while data is stored on the client. Users can interact with these technologies by analyzing executable code and HTML files. Most tracking technologies depend on HTTP, as it facilitates basic communication between the server and client. However, this class excludes more complex, combinational methods. Therefore, Class A includes the following technologies:
- a)
Web Form Authentication
- b)
Session IDs in hidden fields
- c)
HTTP Cookies
- d)
HTML
- e)
Web Beacons
- f)
Email Tracking
Class A can be further divided into A1 and A2. A1 consists of technologies that store data on the client’s computer, while A2 refers to technologies that store data directly on their own servers. Thus, a), b), c) belong to A1, while d), e), f) are assigned to A2.
Class B – API Tracking
Technologies in Class B collect and store data in storage structures (databases or files) using APIs or other interfaces that mediate communication between objects and their environment. These interfaces often rely on low-level operating system APIs (such as Layered Service Providers), application APIs (e.g., HTML5, Adobe Flash), and browser interfaces (e.g., IConnectionPoint, COM). Users interact with these systems through web platforms, applications, or web pages. Implementing these tracking systems requires significant coding and a deep understanding of how APIs function. In conclusion, Class B includes the following technologies:
- a)
Flash cookies and local shared objects
- b)
Persistent cookies, zombie cookies, evercookies
- c)
Local Based Services
- d)
HTML5
- e)
Taint tracking
- f)
Spyware
Class B can be divided into B1 and B2. B1 includes tracking systems that use their own software APIs, while B2 comprises WT systems that rely on third-party APIs from programs, operating systems, or browser interfaces. Thus, a), b), c), d) belong to B1, while e), f) belong to B2.
Class C – User Identification
Class C includes WT techniques that identify specific internet users. These systems collect data to create fingerprints, which uniquely identify users. A script or API generates a fingerprint based on collected data, storing it in cookies, local shared objects, or other storage. This process can be executed via simple scripts or APIs. The unique identifiers distinguish users as visitors to web pages or web browsers. Users may encounter these identifiers in cookies or local shared objects, often encrypted and difficult to recognize. Therefore, Class C includes the following systems:
- a)
Canvas fingerprinting
- b)
Synchronization of cookies
- c)
E-tags
Clarification on Categorization Criteria
It is noted that the classification of each tracking technology into a specific category was based on its core mechanism, i.e., its operational properties and the generic technological layering it relies on. For instance, session IDs in hidden fields are placed under Class A (HTTP Tracking) because they operate through strictly HTTP protocol and do not require any scripting or external software entities. In contrast, persistent cookies do not operate purely through HTTP protocols, as, for example, zombie cookies or evercookies are included in Class B (API Tracking) as they depend on browser-based APIs or even client-side scripts (e.g., Flash or HTML4) to recreate or manipulate the stored data in order to extend their functionality beyond the strict boundaries of HTTP behavior. As such, this distinction reflects whether a system may function using straightforward protocol-level interactions or ir may employ a more complex interface and programming logic for tracking.
Methods
Based on the four classes analyzed above,
Table 1 showcases a comparison of these different classes. In this section, we compare our work with two previous studies. In [
5], the authors introduced a classification framework, classifying web trackers based on tracking behaviors and properties. Specifically, the scheme identifies 5 behaviors and 7 properties, primarily observable from the client’s side. A web tracker is classified into a behavior if it exhibits at least one of the specified properties, as detailed in
Table 2 and
Table 3, [
5].
Another classification scheme was proposed in Imane Fouad’s research to identify tracking domains and their interconnections. The authors in [
7], focused on analyzing WT webpages using web beacons (pixel tags) by detecting these beacons. The arrows in [
7], representing potential relationships between categories in a stateful crawl, suggest that categories within each class likely interact sequentially, as shown in
Table 4.
By crawling 829,349 webpages and detecting web beacons, a tracking classification framework was developed, categorizing web trackers based on their behavior. This scheme consists of 4 classes and 7 categories, each explaining how the WT domain operates, its impact on user privacy, and statistical results from the research dataset. A web tracker using web beacons is classified based on its tracking behavior, as outlined in [
7].
The Franziska Roesner classification scheme (
Table 2 and
Table 3) focuses on web tracker behaviors and properties, primarily high-level criteria observable from the client (user’s browser). In contrast, our scheme (
Table 1) relies on lower-level criteria, emphasizing programming implementation and technological background, mostly observable from the server’s side. Additionally, Imane Fouad’s scheme (
Table 4) covers more complex, non-basic WT systems based on the perspective of website usage. Our classification (
Table 1), however, includes simpler mechanisms like HTTP and session IDs in hidden fields, classifying software rather than the website’s tracking methods. In conclusion, the differences between these classification schemes stem from differing research perspectives, providing a more comprehensive understanding of WT mechanisms, [
5,
7].
The next section will discuss internet users’ awareness and their perspective on WT systems. The classification of WT systems in this research (
Table 1) has limited relevance to users’ perspective, as it focuses on observable behaviors from the browser rather than what occurs on the server side. For example, users may notice cookie notifications but are less aware of fingerprints. Consequently, the most recognized WT tools are cookies (96.9% familiarity) and GPS (68.9%), while other tools are 30% less well-known (Appendix, Questions 7, 18). In conclusion, previous research and classification efforts are more aligned with users’ perspective, which relies on observable behaviors from the client side. In contrast, this paper’s classification, focused on server-side criteria, is less related to users’ awareness.
Results and Discussion
The results are twofold. Firstly, we specify the questionnaire structure, followed by an explanation of the study’s metrics and statistics.
It is noted that, to gain a detailed understanding of the user’s knowledge and perceptions so as to analyze and explain what web tracking systems mean to them, the questionnaire was carefully designed with both structure and accessibility in mind. Specifically, it consisted of clear, concise questions formulated to avoid technical jargon or other misconceptions so as to ensure that participants, even without a strict technical or computer science-related background, could respond in a meaningful manner. As such, the survey included demographic items followed by targeted questions assessing both factual knowledge (identification of cookies/tracking methods) and subjective awareness (e.g., perceived danger or familiarity with terms). Multiple-choice questions were balanced with open-ended options, allowing for both quantitative future analysis and richer user insights. Our aim was not to merely explore what users know, but also to try to understand how they interpret and react to WT technologies.
Questionnaire Structure
In this research, an online questionnaire survey was conducted using Google Forms, with 1,032 participants from Greece. Different types of surveys offer varying advantages and disadvantages depending on the research goal. For our study, we chose an online questionnaire via Google Forms to reach as many internet users as possible and ensure ease of data collection. This approach was simple to create and share online, guaranteeing anonymity and no time pressure for participants (allowing them to review questions as needed). The questionnaire aimed to explore how much internet users know about WT systems and the factors influencing their knowledge. Additionally, we examined whether a lack of awareness about WT systems affects users’ privacy concerns.
To ensure the questions were understandable for all participants, regardless of their technical knowledge, we avoided jargon and technical terms, opting for small, closed questions (up to 21 words). Most questions included open-ended answers, such as “other, describe in a few words.” The data was analyzed using distributional descriptions, statistically comparing groups (e.g., correct vs. incorrect answers) and identifying possible associations between variables. The questionnaire began with demographic questions, including gender, age, daily internet usage, and computer science background (Appendix, Questions 1-4) to identify factors influencing awareness of WT systems. It then included four multiple-choice questions on WT systems, each with three incorrect answers and one correct option, such as: “Do you know what cookies are? a) Text files b) Software c) Virus d) I don’t know” (Appendix, Question 8). Some multiple-choice questions were more subjective or debatable, such as: “Which of the following tracking systems do you know? a) Local-based service b) Flash Cookies c) …” (Appendix, Question 18), [
50,
51,
52].
Lastly, regarding the survey methodology used, it is noted that participants were reached online using anonymous sharing, i.e., sharing of Google Forms; thus, we aimed for wide accessibility and voluntary participation. Specifically, we paid extra attention to ensure that the questionnaire was intentionally simple so as to avoid bias from technical terminology, and closed-ended questions were used as a means to facilitate statistical comparison. However, it must be noted that the sampling method may introduce limitations via this mode of operation, as it may lead to self-selection bias, and thus the sample may not fully represent the broader spectrum of the population. As such, while descriptive statistics and basic comparison were used as a means to analyze responses, in future analysis of a bigger sample, it should also help to include regression models to quantify the impact of different variables on user awareness and concern.
Structure Analysis
The first objective was to uncover how much internet users know about tracking systems. The average correct answers across four multiple-choice questions was 37.63%, although only 26.4% claimed to understand how WT systems work (Appendix, Question 6). People appear aware of the data websites collect and the reasons behind it, yet only 25.2% are familiar with the WT systems (33.6% including cookies) that websites employ (Appendix, Questions 16-19). In conclusion, internet users have some knowledge about tracking systems, but at a very basic level.
The second goal of the questionnaire was to identify the most significant factor affecting internet users’ awareness (gender, age, daily internet usage, and relation to computer science). To achieve this, queries in SQL were utilized within an Oracle Apex database. The sum of correct answers was calculated for each factor. For example, if 51% of men answered correctly to Question 8, and 54% answered correctly to Question 9, the sum for men would be 51% + 54% + ... The goal was to assess the deviation between lower and higher correct answers for each factor. For the “Age” factor, 13-18-year-olds had the lowest sum of correct answers (446%), while 19-24-year-olds had the highest sum (540%). This indicates that younger people know the least, whereas 19-24-year-olds know the most about WT systems. The deviation is the difference between these percentages, 540% - 446% = 94%, indicating that age affects awareness by 0.94. Similarly, gender affects awareness by 0.78, daily internet usage by 0.7, and relation to computer science by 1.42. These numbers provide a comparative measure of how each factor influences correct answers. Thus, the “relation to computer science” emerged as the most important factor affecting internet users’ awareness of WT systems.
The third and final objective was to examine whether internet users' lack of awareness correlates with increased concerns about WT tools. To accomplish this, SQL queries were used to identify users who answered with the term “virus” (i.e., “cookies are viruses” from Appendix, Question 8). These users (203 in total) are assumed to be concerned about WT systems. The average of correct answers for those concerned is 0.39 (let A1). By repeating this process for users not concerned about WT systems, the average of correct answers is 0.37 (let A2). If A1 significantly differed from A2, it would suggest that concerned users have less knowledge about WT systems than non-concerned users, implying that awareness reduces concerns about WT systems. However, since A1 and A2 do not differ significantly, it is concluded that awareness about WT systems is independent of concerns regarding WT tools. While initial results showed that awareness does not significantly correlate with concern about data misuse (as the average correct answers between concerned and unconcerned users were 0.39 and 0.37 respectively), further statistical analysis could help validate this finding. Future work could involve the use of inferential statistical methods such as hypothesis testing or regression models to confirm the absence or presence of statistically significant differences between user groups. This would provide a more robust understanding of whether concern is indeed independent of awareness or influenced by other hidden variables.
Conclusions
This paper provides a deeper understanding of the technological background of existing WT systems, which is largely underexplored in the literature. Other studies often examine aspects such as privacy impact, but they rarely analyze how these systems function. The current research focuses on the technological foundation of WT systems and introduces a new classification system based on their technical workings. We then compare our classification with prior systems. Other classification schemes can be developed depending on the perspective from which WT systems are studied (e.g., privacy impact, security, algorithmic efficiency), leading to more comprehensive knowledge of these systems. Furthermore, there may be other WT systems not covered by this study, especially those with research or commercial contexts. A deeper investigation into these systems will foster the development of more efficient WT technologies across various domains.
This paper also examines internet users’ knowledge of WT systems through an online survey, identifying the key factors influencing their awareness. We found that “relation to computer science” is the most significant factor affecting knowledge, surpassing other variables like gender, age, and daily internet usage. Our findings reveal that internet users’ understanding of WT systems remains at a very basic level. Additionally, the study shows that a user’s lack of awareness doesn’t necessarily translate into heightened concerns about WT systems. However, the paper does not address how users can acquire a better understanding of these technologies. As WT systems continue to evolve, finding effective ways to increase awareness—especially among those not familiar with computer science—remains essential.
Although this study focuses on the technological classification of web tracking systems and excludes the organizational and privacy implications by design, it is important to note that many of the tracking methods analyzed may pose significant security and privacy risks. Specifically, we must acknowledge that, techniques such as persistent cookies, fingerprinting, spyware, and cookie synchronization can be used to track users across websites without explicit consent, thus potentially providing a gateway for unauthorized profiling or data exploitation. As such, while these concerns were not the central aim of this research, future studies should expand on the proposed classification scheme by integrating risk assessment or evaluating the ethical implications of the technological categories.
Lastly, it is also acknowledged that several WT systems used in this research, or commercial environments may not be fully covered in this study. Analytically, although the classification is based on widely used and technically established tracking methods, emerging technologies such as AI-enhanced fingerprinting techniques or blockchain-based tracking mitigation strategies, and specifically tracking mechanisms embedded in IoT ecosystems, represent promising areas for future analysis. As such, including such systems could provide a broader classification that would strengthen the taxonomy and categorization, but also make the study more adaptable to current and evolving digital environments. Additionally, understanding the technical characteristics of such systems can also support regulatory efforts, such as improving GDPR compliance mechanisms. This is particularly important, as it helps scientists define clearer standards for what constitutes tracking and ensures that new technologies are addressed by data protection frameworks.
Author Contributions
The authors equally contributed in the present research, at all stages from the formulation of the problem to the final findings and solution. Validation, Writing—review, Data curation, Conceptualization, Project administration, Supervision: Aggeliki Tsohou; Methodology, Formal analysis, Investigation, Writing, Citation —original draft preparation: Theofanis Tasoulas; Validation, Writing—review and editing, Resources, Citation editing, Template Formatting: Alexandros Gazis.
Funding
No external funding was received for this research.
Availability of Data and Materials
Data supporting the results of this study are available upon request from the corresponding author.
Conflict of Interest
The authors have no conflicts of interest to declare that are relevant to the content of this article
Appendix
List of Abbreviations
| Abbreviation |
Full Term |
| WT |
Web Tracking |
| IoT |
Internet of Things |
| API |
Application Programming Interface |
| HTTP |
HyperText Transfer Protocol |
| HTML |
HyperText Markup Language |
| CSS |
Cascading Style Sheets |
| OS |
Operating System |
| GPS |
Global Positioning System |
| COM |
Component Object Model |
| VM |
Virtual Machine |
| IPC |
Inter-process Communication |
| DB |
Database |
| SQL |
Structured Query Language |
| ID |
Identifier |
| NoSQL |
Not Only SQL |
| GET |
HTTP GET method |
| POST |
HTTP POST method |
| GDPR |
General Data Protection Regulation |
| CSS |
Cascading Style Sheets |
| HTML5 |
HyperText Markup Language version 5 |
| JSON |
JavaScript Object Notation |
Survey Questions
In this section we showcase the Survey’s Questions:
☐ Man
☐ Woman
- 2)
What is your age?
☐ 0 to 12
☐ 13 to 18
☐ 19 to 24
☐ 25 to 38
☐ 39 to 55
☐ 56 and above
- 3)
How much time do you spend online?
☐ I rarely use the internet
☐ I use the internet sometimes
☐ 0 to 2 hours a day
☐ 2 to 5 hours a day
☐ More than 5 hours a day
- 4)
What is your relation with Computer Science?
☐ I have not studied Computer Science
☐ I have not studied Computer Science, but my profession is related to it (e.g. I do some programming)
☐ I currently study Computer Science at University or college
☐ I have a degree (bachelor, master degree, etc.) in Computer Science
- 5)
Do you know that the websites you visit online record data about you? (e.g. When did you visit a web site and from which device (smartphone, computer, etc.)?
☐ Yes
☐ No
- 6)
Do you know how this is done?
☐ Yes, I know
☐ Yes, but just a bit
☐ No, I don’t know
Have you ever heard of cookies?
☐ Yes
☐ No
- 7)
What do you think cookies are?
☐ They are text files
☐ They are software
☐ They are viruses
☐ I don’t know
☐ Other (please describe in a few words)
- 8)
Do you think cookies are dangerous?
☐ Yes
☐ Probably
☐ No, I don’t think so
☐ I don’t know
Do you think a website can collect your information if you just view a picture online?
☐ Yes
☐ Probably
☐ No, I don’t think so
☐ I don’t know
Do you think a website can collect information about you through HTML (without cookies)?
☐ Yes
☐ Probably
☐ No, I don’t think so
☐ I don’t know
- 9)
Do you think two or more sites can share cookies?
☐ Yes
☐ Probably
☐ No, I don’t think so
☐ I don’t know
- 10)
Do you think it’s easy for a website to distinguish two or more people? (e.g. to distinguish George from Maria)
☐ Yes
☐ Probably
☐ No, I don’t think so
☐ I don’t know
Do you think there are ways to prevent a site from tracking you?
☐ Yes
☐ Probably
☐ No, I don’t think so
☐ I don’t know
- 11)
Do you think Ad blocker or incognito mode makes it difficult for a site to track you?
☐ Yes
☐ Probably
☐ No, I don’t think so
☐ I don’t know
- 12)
Do you think it’s possible to track you when they send you an email? If so, how? (select one or more answers)
☐ Yes. They use web software (e.g. in JavaScript).
☐ Yes. They use viruses.
☐ Yes. They use images.
☐ Probably
☐ No, I don’t think so
☐ I don’t know
- 13)
What information do you think a site can collect from you? (select one or more answers)
☐ Date and time I visited a site
☐ Browsing history (previous visits to other sites)
☐ Location (e.g. city, home address)
☐ Browser (e.g., Google Chrome, Mozilla Firefox)
☐ Operating system (e.g., Windows, IOS, Android)
☐ IP address
☐ Data from other cookies on my computer
☐ Passwords
☐ Credit card numbers
☐ Files from my computer (e.g., photos, videos, music)
☐ I don’t know
- 14)
Do you know any of the following tracking systems? (select one or more answers)
☐ GPS or other Local Based Service
☐ Flash cookies
☐ Spyware
☐ Email tracking
☐ Web beacon (or Web bug, tracking bug, pixel tag)
☐ HTML5 Tracking
☐ Taint Tracking
☐ I don’t know any of them
- 15)
Finally, why do you think websites collect your personal data? (select one or more answers)
☐ Maybe they want to hurt me
☐ For security reasons and to know that I am not doing anything illegal on the internet
☐ To improve their services
☐ To make money, mainly through advertising
☐ To sell my information to third parties and make money
☐ I don’t know
☐ Other (please describe in a few words)
References
- Samarasinghe N, Mannan M. Towards a global perspective on web tracking. Computers & Security. 2019 Nov 1;87:101569. [CrossRef]
- Kusumojati PP, Mediawati E. Web-Based Asset Management Information Systems in Higher Education. International Journal of Business, Law, and Education. 2024 Jan 21;5(1):398-411. [CrossRef]
- Bujlow T, Carela-Español V, Sole-Pareta J, Barlet-Ros P. A survey on web tracking: Mechanisms, implications, and defenses. Proceedings of the IEEE. 2017 Mar 6;105(8):1476-510. [CrossRef]
- Pilton C, Faily S, Henriksen-Bulmer J. Evaluating privacy-determining user privacy expectations on the web. Computers & Security. 2021 Jun 1;105:102241. [CrossRef]
- Roesner F, Kohno T, Wetherall D. Detecting and defending against {Third-Party} tracking on the web. In 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12) 2012 (pp. 155-168). https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/roesner.
- Utz C, Amft S, Degeling M, Holz T, Fahl S, Schaub F. Privacy rarely considered: Exploring considerations in the adoption of third-party services by websites. arXiv preprint arXiv:2203.11387. 2022 Mar 21. [CrossRef]
- Fouad I, Bielova N, Legout A, Sarafijanovic-Djukic N. Missed by filter lists: Detecting unknown third-party trackers with invisible pixels. arXiv preprint arXiv:1812.01514. 2018 Dec 4. [CrossRef]
- Fouad I, Santos C, Laperdrix P. The Devil is in the Details: Detection, Measurement and Lawfulness of Server-Side Tracking on the Web. In 24th Privacy Enhancing Technologies Symposium (PETS 2024) 2024 Jul 15 (Vol. 2024, No. 4). https://hal.science/hal-04617727/.
- Singh N, Do Y, Yu Y, Fouad I, Kim J, Kim H. Crumbled Cookies: Exploring E-commerce Websites? Cookie Policies with Data Protection Regulations. ACM Transactions on the Web. 2024 Jan 11. [CrossRef]
- Pitkänen O, Tuunainen VK. Disclosing personal data socially—An empirical study on Facebook users' privacy awareness. Journal of Information Privacy and Security. 2012 Jan 1;8(1):3-29. [CrossRef]
- Soumelidou A, Tsohou A. Towards the creation of a profile of the information privacy aware user through a systematic literature review of information privacy awareness. Telematics and Informatics. 2021 Aug 1;61:101592. [CrossRef]
- Wakefield RL, Wakefield LT. How do job seekers respond to cybervetting? An exploration of threats, fear, and access control. ACM SIGMIS Database: the DATABASE for Advances in Information Systems. 2024 Feb 6;55(1):136-55. [CrossRef]
- Byrne J, Heavey C, Byrne PJ. A review of Web-based simulation and supporting tools. Simulation modelling practice and theory. 2010 Mar 1;18(3):253-76. [CrossRef]
- Castell Uroz I. A novel approach to web tracking detection and removal with minimal functionality loss. UPCommons. 2023. [CrossRef]
- Cahn A, Alfeld S, Barford P, Muthukrishnan S. An empirical study of web cookies. In Proceedings Of The 25th International Conference On World Wide Web 2016 Apr 11 (pp. 891-901). [CrossRef]
- Rasaii A, Singh S, Gosain D, Gasser O. Exploring the cookieverse: A multi-perspective analysis of web cookies. In International Conference on Passive and Active Network Measurement 2023 Mar 10 (pp. 623-651). Cham: Springer Nature Switzerland. [CrossRef]
- Kretschmer M, Pennekamp J, Wehrle K. Cookie banners and privacy policies: Measuring the impact of the GDPR on the web. ACM Transactions on the Web (TWEB). 2021 Jul 15;15(4):1-42. [CrossRef]
- Ioannou A, Tussyadiah I, Miller G. That’s private! Understanding travelers’ privacy concerns and online data disclosure. Journal of Travel Research. 2021 Sep;60(7):1510-26. [CrossRef]
- Ermakova T, Fabian B, Bender B, Klimek K. Web tracking-A literature review on the state of research. HICSS. 2018. https://aisel.aisnet.org/hicss-51/os/information_security/5/.
- Belloro S, Mylonas A. I know what you did last summer: New persistent tracking mechanisms in the wild. IEEE Access. 2018 Sep 11;6:52779-92. [CrossRef]
- Kia AN, Murphy F, Sheehan B, Shannon D. A cyber risk prediction model using common vulnerabilities and exposures. Expert Systems with Applications. 2024 Mar 1;237:121599. [CrossRef]
- Debnath N, Jain AK. A comprehensive survey on mobile browser security issues, challenges and solutions. Information Security Journal: A Global Perspective. 2024 May 1:1-20. [CrossRef]
- Kirchner R, Koch S, Kamangar N, Klein D, Johns M. A Black-Box Privacy Analysis of Messaging Service Providers’ Chat Message Processing. Proceedings on Privacy Enhancing Technologies. 2024;3:1-8. https://www.semanticscholar.org/paper/A-Black-Box-Privacy-Analysis-of-Messaging-Service-Kirchner-Koch/9408a90d2d4c0df4caaf7c398e156bc993170f96.
- Stafford TF, Urbaczewski A. Spyware: The ghost in the machine. The Communications of the Association for Information Systems. 2004 Sep 7;14(1):49. [CrossRef]
- Naser M, Albazar H, Abdel-Jaber H. Mobile Spyware Identification and Categorization: A Systematic Review. Informatica. 2023 Sep 28;47(8). [CrossRef]
- Wengelin Å, Johansson V. Investigating writing processes with keystroke logging. In Digital Writing Technologies in Higher Education: Theory, Research, and Practice 2023 Sep 15 (pp. 405-420). Cham: Springer International Publishing. [CrossRef]
- Englehardt S, Han J, Narayanan A. I never signed up for this! Privacy implications of email tracking. Proceedings on Privacy Enhancing Technologies. 2018. [CrossRef]
- Fabian B, Bender B, Hesseldieck B, Haupt J, Lessmann S. Enterprise-grade protection against e-mail tracking. Information Systems. 2021 Mar 1;97:101702. [CrossRef]
- Zheng J, Sun Y. Emailtracker: An Intelligent Analytical System To Assist Email Event Tracking Using Artificial Intelligence And Big Data. Proceedings OF CS & IT - CSCP. 2022; 271-276. Https://Doi.Org/10.5121/Csit.2022.121222.
- Jiang H, Li J, Zhao P, Zeng F, Xiao Z, Iyengar A. Location privacy-preserving mechanisms in location-based services: A comprehensive survey. ACM Computing Surveys (CSUR). 2021 Jan 2;54(1):1-36. [CrossRef]
- Dhinakaran D, Khanna MR, Panimalar SP, Kumar SP, Sudharson K. Secure android location tracking application with privacy enhanced technique. In 2022 5th International Conference On Computational Intelligence And Communication Technologies (CCICT) 2022 Jul 8 (pp. 223-229). IEEE. [CrossRef]
- Lubbers P, Albers B, Salim F, Lubbers P, Albers B, Salim F. Using the html5 canvas api. Pro HTML5 programming: Powerful APIs for richer internet application development. Pro HTML5 Programming. 2010: 25-63. [CrossRef]
- Nikiforakis N, Joosen W, Livshits B. Privaricator: Deceiving fingerprinters with little white lies. In Proceedings of the 24th International Conference on World Wide Web 2015 May 18 (pp. 820-830). [CrossRef]
- Ayenson MD, Wambach DJ, Soltani A, Good N, Hoofnagle CJ. Flash cookies and privacy II: Now with HTML5 and ETag respawning. Available at SSRN 1898390. 2011 Jul 29. [CrossRef]
- Fernandez-de-Retana A, Santos-Grueiro I. Keep your Identity Small: Privacy-preserving Client-side Fingerprinting. arXiv preprint arXiv:2309.07563. 2023 Sep 14. [CrossRef]
- Saito T, Koshiba R. Examination and comparison of countermeasures against web tracking technologies. In Innovative Mobile and Internet Services in Ubiquitous Computing: Proceedings of the 13th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS-2019) 2020 (pp. 477-489). Springer International Publishing. [CrossRef]
- Ahmad Z, Casarin S, Calzavara S. An Empirical Analysis of Web Storage and Its Applications to Web Tracking. ACM Transactions on the Web. 2023 Oct 11;18(1):1-28. [CrossRef]
- Enck W, Gilbert P, Han S, Tendulkar V, Chun BG, Cox LP, Jung J, McDaniel P, Sheth AN. Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Transactions on Computer Systems (TOCS). 2014 Jun 1;32(2):1-29. [CrossRef]
- Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, Le Traon Y, Octeau D, McDaniel P. Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. ACM sigplan notices. 2014 Jun 9;49(6):259-69. [CrossRef]
- Kanyal R, Sarangi SR. PanoptiChrome: A Modern In-browser Taint Analysis Framework. In Proceedings of the ACM on Web Conference 2024 2024 May 13 (pp. 1914-1922). [CrossRef]
- Metwalley H, Traverso S, Mellia M. Unsupervised detection of web trackers. In 2015 IEEE Global Communications Conference (GLOBECOM) 2015 Dec 6 (pp. 1-6). IEEE. [CrossRef]
- Englehardt S, Eubank C, Zimmerman P, Reisman D, Narayanan A. OpenWPM: An automated platform for web privacy measurement. Manuscript. 2016. Proceedings of ACM CCS. https://github.com/openwpm/OpenWPM?tab=readme-ov-file#citation.
- Bui D, Tang B, Shin KG. Do opt-outs really opt me out?. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security 2022 Nov 7 (pp. 425-439). [CrossRef]
- Rieder W, Raschke P, Cory T. Beyond the Request: Harnessing HTTP Response Headers for Cross-Browser Web Tracker Classification in an Imbalanced Setting. arXiv preprint arXiv:2402.01240. 2024 Feb 2. [CrossRef]
- Martin D, Wu H, Alsaid A. Hidden surveillance by Web sites: Web bugs in contemporary use. Communications of the ACM. 2003 Dec 1;46(12):258-64. [CrossRef]
- Fortier A, Burkell J. Hidden online surveillance: What librarians should know to protect their own privacy and that of their patrons. Information Technology and Libraries. 2015 Sep 21;34(3):59-72. [CrossRef]
- Senol A, Ukani A, Cutler D, Bilogrevic I. The Double Edged Sword: Identifying Authentication Pages and their Fingerprinting Behavior. In Proceedings of the ACM on Web Conference 2024 2024 May 13 (pp. 1690-1701). [CrossRef]
- Papadopoulos P, Kourtellis N, Markatos E. Cookie synchronization: Everything you always wanted to know but were afraid to ask. In The World Wide Web Conference 2019 May 13 (pp. 1432-1442). [CrossRef]
- Smith J. Review of Cookie Synchronization Detection Methods. arXiv preprint arXiv:2301.02619. 2022 Oct 28. [CrossRef]
- Andonie R, Dzitac I. How to write a good paper in computer science and HowWill it Be measured by ISI web of knowledge. International Journal of Computers Communications & Control. 2010 Nov 1;5(4):432-46. https://www.univagora.ro/jour/index.php/ijccc/article/view/2493.
- Williams A. How to… Write and analyse a questionnaire. Journal of orthodontics. 2003 Sep;30(3):245-52. https://journals.sagepub.com/doi/abs/10.1093/ortho.30.3.245.
- Taherdoost H. Designing a questionnaire for a research paper: A comprehensive guide to design and develop an effective questionnaire. Asian Journal of Managerial Science. 2022 Mar 20;11(1):8-16. [CrossRef]
Table 1.
Classification scheme based on technological architecture.
Table 1.
Classification scheme based on technological architecture.
Class A HTTP Tracking |
Class B API Tracking |
Class C User identification |
Class D Complex tracking |
| A1 (Client storage) |
B1 (Self owned API) |
|
|
|
Web Form Authentication
|
Flash cookies and local shared objects |
Canvas fingerprinting |
Analysis services |
|
Session IDs in hidden fields
|
Persistent cookies, zombie cookies, evercookies |
Synchronization of cookies |
Web Privacy Measurements |
|
HTTP Cookies
|
Local Based Services |
Etags |
|
| |
HTML5 |
|
|
| A2 (Server storage) |
B2 (Third party API) |
|
|
|
HTML
|
Taint tracking |
|
|
|
Web Beacons
|
Spyware |
|
|
|
Email Tracking
|
|
|
|
Table 2.
Classification scheme based on observable behaviors, tracking categories as derived from [
5].
Table 2.
Classification scheme based on observable behaviors, tracking categories as derived from [
5].
| Category |
Name |
Profile Scope |
Description |
Example |
Visit Directly? |
| A |
Analytics |
Within-Site |
Acts as a third-party analytics engine for individual websites. |
Google Analytics |
No |
| B |
Vanilla |
Cross-Site |
Employs third-party storage to track users across multiple websites. |
Doubleclick |
No |
| C |
Forced |
Cross-Site |
Strictly requires users to visit the tracker directly (e.g., through popups or redirects). |
InsightExpress |
Yes (forced) |
| D |
Referred |
Cross-Site |
Relies on another tracker (B, C or E) to establish unique identifiers. |
Invite Media |
No |
| E |
Personal |
Cross-Site |
Accessed directly by users in other contexts. |
Facebook |
Yes |
Table 3.
Classification scheme based on observable behaviors, tracking begaviour as mechanism as derived from [
5].
Table 3.
Classification scheme based on observable behaviors, tracking begaviour as mechanism as derived from [
5].
| Property |
Behavior |
| Tracker creates state owned by the site itself (first-party state). |
A |
| Requests made to the tracker expose site-owned data. |
A |
| The third-party request to the tracker contains state from the tracker. |
B, C, E |
| Tracker establishes its state from a third-party position; users do not directly engage with the tracker. |
B |
| The tracker forces users to visit it directly. |
C |
| Relies on interactions with another tracker (A, B, C, or E) to transmit data (not originating from the site itself). |
D |
| Users voluntarily access the tracker’s site. |
E |
Table 4.
A classification of tracking methodologies, grouping techniques into classes and categories based on functionality and mechanisms, [
7].
Table 4.
A classification of tracking methodologies, grouping techniques into classes and categories based on functionality and mechanisms, [
7].
| Class |
Category |
Description |
| Explicit Cross-Domain Tracking |
Basic Tracking |
Standard tracking functionality, including the integration of third-party trackers. |
| Third Party Included by a Tracker |
Third-party domains directly included by the tracker for data collection. |
| Implicit Cross-Domain Tracking |
Basic Tracking Initiated by a Tracker |
Tracking initiated indirectly through other mechanisms. |
| Third Parties That Include Trackers |
Third parties that themselves host or incorporate tracking scripts. |
| Cookie Syncing |
First to Third Party Cookie Syncing |
Syncing cookies from first-party domains to third-party trackers. |
| Third to Third Party Cookie Syncing |
Syncing cookies between multiple third-party trackers. |
| Third Party Cookie Forwarding |
Forwarding cookies between third parties for enhanced tracking. |
| Analytics |
Analytics |
Analytical tools providing site-specific or cross-domain insights. |
|
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).