Preprint (Review), Version 2. This version is not peer-reviewed.

Web Content Mining: A Review on Concepts, Techniques, and Tools

Version 1: Received: 29 July 2024 / Approved: 29 July 2024 / Online: 29 July 2024 (16:37:56 CEST)
Version 2: Received: 30 July 2024 / Approved: 30 July 2024 / Online: 30 July 2024 (09:04:41 CEST)

How to cite: Sial, A. H. Web Content Mining: A Review on Concepts, Techniques, and Tools. Preprints 2024, 2024072339. https://doi.org/10.20944/preprints202407.2339.v2

Abstract

With the emergence of the Internet, the World Wide Web (WWW) has become a comprehensive source of information, and extracting meaningful information from it has become a significant challenge over the past decade. The available information is classified into unstructured, structured, and semi-structured forms, and a large share of the information on the web is unstructured or semi-structured. Extracting potentially useful information from these multiple forms is therefore considered a leading area of current research. Web content mining is presented as a subcategory of web mining that focuses on extracting important patterns from the content of documents on the web. Accordingly, this paper reviews the concepts, tools, and techniques of web content mining applied to documents available on the WWW.
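As a concrete illustration of the kind of processing described above, the following is a minimal sketch (not the author's method) of mining content from a semi-structured HTML page with Python's standard-library `html.parser`. It separates visible text from markup, skips `<script>` and `<style>` content, collects hyperlink targets, and uses plain term-frequency counting as a simple stand-in for pattern extraction. The names `ContentExtractor` and `mine_content`, the small stopword list, and the sample page are hypothetical and chosen only for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class ContentExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and hyperlink targets from HTML."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def mine_content(html, top_n=3, stopwords=frozenset({"the", "a", "of", "and", "is", "on"})):
    """Hypothetical helper: extract text, links, and the most frequent terms."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    # Tokenise into lowercase alphabetic words and drop the (assumed) stopwords.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]
    return {
        "text": text,
        "links": parser.links,
        "top_terms": Counter(words).most_common(top_n),
    }


if __name__ == "__main__":
    page = """
    <html><head><title>Web mining</title><style>p {color: red}</style></head>
    <body>
      <h1>Web content mining</h1>
      <p>Content mining extracts patterns from web documents.</p>
      <p>Mining semi-structured documents is a challenge.</p>
      <a href="https://example.org/structured">Structured data</a>
    </body></html>
    """
    result = mine_content(page)
    print(result["links"])
    print(result["top_terms"])
```

Plain term frequency is deliberately simple; the same extracted text and links could instead feed other techniques such as classification or clustering.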

Keywords

WWW, Web Content Mining, Unstructured, Structured, Semi-Structured, Web Documents

Subject

Computer Science and Mathematics, Computer Science
