1. Introduction
Information is widely disseminated through websites. When it comes to providing users with high-quality material, the quality of websites is the most crucial concern. By considering every factor impacting website quality, more precise and pertinent information can be made available to consumers. To assess the actual condition of a website, it is crucial to identify the variables that affect website quality and to use models that aid in quantitative website quality computation.
More than 42 parameters [1,2] are utilized to calculate website quality. A dedicated model is necessary for calculating the quality of each sub-factor. To calculate the quality of a factor, the qualities of all its sub-factors are added and averaged.
Everyone now engages in the exchange of information via websites. Website content is available in various formats, including sound, photographs, videos, graphics, etc., and different material types are posted using multiple formats. Numerous new websites are launched every day. The real issue is the calibre of these websites: users have little choice but to depend on such content being of high quality. The most challenging task is evaluating that quality. Customer satisfaction depends on how well-designed the website being searched or browsed for information is.
For evaluating the quality of websites, numerous aspects must be considered [3]. Among the factors are usability, connectivity, navigation, structure, safety, maintainability, dependability, functionality, privacy, portability, etc. Various tools, approaches, models, and procedures must be used to evaluate the calibre of a website. Most website businesses require many elements, such as a logo, animated graphics, colour schemes, mouse-over effects, graphic art, database communication, and other requirements.
To assess a website's quality, it is necessary to employ several characteristics [4], either directly connected or unrelated. Some techniques used to evaluate the quality of websites are subjective and rely on user input. Statistical methods, usually called objective measurements, can be used to measure quality factors.
Different stakeholders have different quality standards. Designers, management, consumers, developers, and other actors evaluate quality from different perspectives. End users care about usability, efficiency, credibility, and similar concerns, whereas programmers care about maintainability, security, functionality, etc. Only after determining a website's demands from an actor's viewpoint can the quality criteria that best satisfy those expectations be obtained.
Websites concerned with e-commerce, electronics, museums, etc., are becoming more complicated. Due to the growing number of components and the interactions between parameters, choosing relevant quality factors and building quality evaluation procedures to quantify the variables is becoming more difficult. The relationships between traits, attributes, and websites must be considered. Website evaluation must combine well-considered subjective inclusion with impartial objective assessment, and given the criteria that necessitate those approaches, it is imperative to weigh both procedures.
Websites have become the most important mechanism for disseminating information on the different aspects a user focuses on. However, the reliability of the information posted on websites must be verified. The quality of a website is the question users always want answered, so that the dependability of the information posted on the site can be assessed.
Most assessments of website quality are based on human-computer interaction through usability heuristics (Nielsen et al. [5]), design principles (Tognazzini et al. [6]), and rules (Shneiderman et al. [7]). Evaluating the quality of a website has taken different dimensions. As such, a formal framework has yet to be arrived at, even though several quality standards have been prescribed (Law et al. [8], Semeradova and Heinrich [9]).
A website's quality has been described in various ways by different people. Most recently, many metrics have been used to determine a website's capacity to live up to its owners' and visitors' expectations (Morales-Vargas et al. [10]). There is no common strategy because every solution suggested in the literature uses distinct evaluation techniques (Law et al. [11]). Evaluating the quality of a website is different from doing so for other systems; website quality is evaluated using multi-dimensional modelling (Ecer et al. [12]).
The quality of a website is computed from three different perspectives: functional (Luieng et al. [13]), strategic (Maia and Furtado [14]), and experiential (Sambre et al. [15]). All the methods focus on certain measures, attributes, characteristics, dimensions, etc. The methods are largely synonymous and explore distinctive features of websites.
Different methodologies are used to compute website quality, including experimental/quasi-experimental, descriptive-observational, associative, correlative, qualitative-quantitative, and subjective-objective; the methods are either participative (surveys, checklists) or non-participative (web analytics). Participative methods focus on user preferences and psychological responses (Bevan et al. [16]). Testing techniques are frequently employed for computing quality indicators, including usability tests, A/B tests, ethnographic studies, think-aloud sessions, diary studies, questionnaires, surveys, checklists, and interviews, to name a few (Rousala and Karause [17]).
The most recent method uses an expert system involving heuristic evaluation (Jainari et al. [18]). Experts judge the quality of a website with respect to a chosen set of website features, and their judgments are evaluated manually or through an automated software system. A few expert systems have also been developed using artificial intelligence techniques (Jayanthi and Krishna Kumari [19]) and natural language processing (Nicolic et al. [20]).
According to a survey by Morales-Vargas et al. [21], one of three dimensions (strategic, functional, and experiential) is used to judge the quality of websites, and expert analysis is the necessary method for evaluating it. They examined a website's qualities and categorised them into variables used to calculate the website's quality. Different approaches have been put forward to aid in quantifying the quality of a website.
Research Questions
What are the features that together determine the completeness of a website?
How are the features computed?
How do the features relate to the website's degree of excellence?
How do we predict the quality of a website given its code?
Research outcomes
This research presents a parser that computes the counts of different features for a given website.
A model is presented that can compute the quality of a website based on the feature counts.
An example set is developed considering the code of 100 websites. Each website is represented as a set of features with associated counts, and the website quality is assessed through a quality model.
A multi-layer perceptron-based model is presented that learns the quality of a website based on the feature counts and can be used to predict the quality of a website given the counts computed through the parser model.
2. Related work
By considering several elements, including security, usability, sufficiency, and appearance, Kausar Fiaz Khawaja et al. [22] evaluated website quality. A good website is simple to use and offers a learning opportunity. A website's quality increases when it is used more frequently. When consumers learn from high-quality websites, their experience can be rich and worthwhile. The factor "appearance" describes how a website looks and feels, including how appealing it is, how items and colours are arranged, how information is organized meaningfully, etc. Khawaja et al. presented a method for calculating website quality based on user observations made while the website is being used.
Flexibility, safety, and usability are just a few of the elements that Sastry et al. [23] and Vijay Kumar Mantri et al. [24] considered while determining website quality. "Usability" refers to the website's usefulness, enjoyment, and efficacy. The "safety" factor concerns the user's presence on, and connection to, the website while browsing; there should never be public access to the user's connection to the website. The "flexibility" aspect is connected to the capability included in a website's design that enables adjustments to the website even while it is being used. Users can assess a website's quality using PoDQA (Portal Data Quality Assessment Tool), which uses pre-set criteria.
Vassilis S. Moustakis et al. [25] presented that the assessment of web quality needs to consider several elements, including navigation, content, structure, multimedia, appearance, and originality. The term "content" describes the data published on a website and made accessible to visitors via an interface; the "content" quality factor describes how fully a domain is covered, both in general and in specialised terms.
Navigation refers to the aid created and offered to the user to assist in navigating the website. The ease of navigating a website, the accessibility of associated links, and its simplicity all affect its navigational quality. The "structure" quality aspect has to do with accessibility, organization of the content, and speed. The appearance and application of various multimedia and graphics forms can impact a website's overall feel and aesthetic, and a website could be developed with a variety of styles in mind. A website's "uniqueness" relates to how distinct it is from other comparable websites; a high-quality website must be unique, and users visit such websites frequently. Moustakis et al. proposed a technique known as AHP (Analytical Hierarchical Process) and utilized it to determine website quality. Numerous additional factors must also be considered: Andrina Granić et al. [26] have considered the "portability" of content, the capacity to move it from one site to another without requiring any adjustments on either end.
Tanya Singh et al. [27] have presented an evaluation system that considers a variety of variables, such as appearance, sufficiency, security, and privacy, portrayed in their literal sense. A website's usability should be calculated to represent its quality in terms of how easy it is to use and how much one can learn from it; the usability of a website refers to how easily users can utilize it. Some information published on a website can be private to a concerned user, so the relevant information is made available only to qualified website users.
The "privacy"-related quality attribute is the exactness with which a user's privacy is preserved. Only those who have been verified and authorized should have access to the content. Users' information communicated with websites must be protected to prevent loss or corruption while in transit. The security level used during data exchange can be used to evaluate the website's quality.
The "Adequacy" factor, which deals with the correctness and completeness of the content hosted on the website, was also considered. Anusha et al. [
28] evaluated similar traits while determining the websites' quality. The most important factor they considered was "Portability." This is called portability when content and code may be transferred from one machine to another without being modified or prepared for the target machine. Another key element of websites is the dependability of the content users see each time they launch a browser window. When a user hits on a certain link on a website, it must always display the same content unless the content is continually changing. The dependability of a website is determined by the possibility that the desired page won't be accessible.
When improvements to the website are required, maintenance of the existing website should be straightforward. The ease of website maintenance affects the quality of the website. When evaluating the quality of a website while taking the aspect of "maintainability" into account, several factors, such as changeability, testability, analysability, etc., must be considered. The capacity to modify the website while it is active is a crucial consideration regarding maintainability.
The website's capacity to be analysed is another crucial aspect that should be considered when evaluating a website's quality. The ability to read the information, relate the content, comprehend it, and locate and identify the website's navigational paths all fall under a website's analysability. A website can be called stable when there is no unfinished business, no unexpected change to the user interface, and no disorganised presentation of the display; a stable website mostly stays the same. The problem of testing must be considered when designing a website: it should be possible to apply updates while the website is being used, a feature most websites still do not provide.
Filippo Ricca et al. [29] have considered numerous other parameters to calculate website quality. The website's design, organization, user-friendliness, and organizational structure are among the elements considered. Web pages and their interlinking are part of the website's organization, and how the pages are linked directly impacts their practical accessibility. When creating websites, it is essential to consider user preferences and to render the anticipated content.
According to Saleh Alwahaishi et al. [30], the levels at which the content is created and the playfulness with which the content is accessible are the two most important factors to consider when evaluating a website's quality. Although they have established the structure, more sophisticated computational methods are still required to evaluate website quality. They contend that a broad criterion is required to assess the value of any website that provides any services or material supported by the website. Layla Hasan and Emad Abuelrub [31] discussed the many elements influencing a website's quality and stressed that web designers and developers should consider these factors in addition to quality indicators and checklists.
The amount of information shared over the Internet is rising at an alarming rate, and websites and web applications have expanded quickly. Websites must meet a high standard if they are routinely used for obtaining the information required for various purposes. Kavindra Kumar Singh et al. [32] have created a methodology known as WebQEM (Web Quality Evaluation Method) for computing the quality of websites based on objective evaluation, although judging the quality of websites based on subjective evaluation might be more accurate. They quantitatively evaluated website quality based on an impartial review, incorporating qualities, characteristics, and sub-characteristics into their method.
People communicate with one another online, especially on social media sites, so it has become essential that these platforms offer top-notch user experiences. Long-Sheng Chen et al. [33] attempted to define the standards for judging social media site quality, using feature selection methodologies to determine the quality of social networking sites. The calculation of website quality requires the evolution of suitable metrics.
According to metrics provided by Naw Lay Wah et al. [34], website usability was calculated using sixteen parameters, including the number of total pages, the proportion of words in body text, the number of text links, the overall website size in bytes, etc. Support vector machines were used to predict the quality of web pages.
Sastry JKR et al. [35] assert that various factors determine a product's quality, including content, structure, navigation, multimedia, and usability. To provide new viewpoints for website evaluation, website quality has been discussed from several angles [36,37,38,39,40,41,42,43,44,45,46,47,48,49]. However, none of this research has addressed how to gauge the comprehensiveness of a website's content in terms of quality.
A hierarchical framework of several elements that allows quality assessment of websites has been developed by Vassilis S. Moustakis et al. [50]. The framework considers the features and characteristics of an organisation, but the model does not account for computational measurement of either factors or sub-factors.
Oon-it, S. [51] conducted a survey to find out from users which factors they consider to reflect the quality of health-related websites. The users opined that trust in health websites increases when the quality of the content hosted on the website is perceived to be high in terms of correctness, simplicity, etc.
Allison R. et al. [52] have reviewed the existing methodologies and techniques for evaluating website quality and presented a framework of factors and attributes. No computational methods were covered in this presentation.
Barnes, Stuart et al. [53] conducted a questionnaire survey and developed a method of computing quality based on t-scores computed from the various participants' responses. Based on the t-score, the questions were classified, with each class representing the website quality.
Layla et al. [54] have attempted to assess the quality of websites based on the quality of the services rendered through them, presenting standard dimensions based on those services. The measurements they consider include accuracy, content, value-added content, reliability, ease of use, availability, download speed, customisation, effective internal search, security, service support, and privacy. They collected criteria to measure each factor and then presented standards combining all the requirements to form an overall quality assessment. However, combining different metrics into a single metric does not adequately reflect the overall quality of a website.
Sasi Bhanu et al. [55] have presented a manually defined expert model for assessing website quality. The factors related to the "completeness" of a website were also measured manually. The computations were demonstrated on a single website, no example set was used, and the accuracy of prediction varied from website to website.
Rim Rekik et al. [56] have used a text mining technique to extract features of websites and a reduction technique to filter out the most relevant features. An example set was constructed considering different sites, and the Apriori algorithm was applied to it to find associations between the criteria and frequent measures. The quality of a website is computed by considering the applicability of the association rules to that website.
Several soft computing models have been used for computing the quality of a website, including fuzzy-hierarchical [57], fuzzy linguistic [58,59], fuzzy C-means [60], Bayesian [61], fuzzy-neural [62], SVM [63], and genetic algorithm [64] approaches. Most of these techniques focused on filtering the website features and creating example sets that are mined to predict overall website quality. No generalized model is learned that can indicate the quality of a new, unseen website.
Michal Kakol et al. [65] have presented a predictive model to find the credibility of content hosted on a website based on human evaluations considering a comprehensive set of independent factors. The classification is limited to a binary classification of the content, but scaled (multi-level) prediction is the need of the hour.
The expert systems referred to in the literature depend on human experts and on experimentation, which is a complicated process. No models exist that attempt to compute a website's features given the web application code, and no model is presented that learns website quality based on the measurement of website elements.
4. Methods and Techniques
4.1. Analysis of sub-factors relating to the factor “Completeness”.
The information must be complete for consumers to understand the meaning of the content hosted on a website. The completeness of the content hosted on a website depends on a number of factors. Missing URLs or web pages, self-referential "hrefs" on the same page, tabular columns in tables, data items in input-output forms, context-appropriate menus, and missing images, videos, and audio are just a few examples of the characteristics of the attribute "Completeness" that are taken into account and evaluated to determine the overall quality of a website.
Connections, usability, structure, maintainability, navigation, safety, functionality, portability, privacy, etc., are just a few of the numerous aspects of evaluating a website's quality, and each has several related attributes or factors. Important supporting variables and features, such as the following, relate to completeness.
Quality of URLs/web pages
When evaluating the effectiveness of websites, the number of missing pages is considered. If an "href" is present but the matching web page is not in line with the directory structure indicated by the URL, the URL is deemed missing. The quality of this sub-factor is determined by the number of references missing across all web pages.
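As a concrete illustration, a minimal Python sketch of this check (not the authors' implementation; the regular expression and directory walk are simplifying assumptions) could count local href targets that do not exist under the site root:

```python
import os
import re

# Simplified href extractor; ignores query strings and other edge cases.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\'#]+)["\']', re.IGNORECASE)

def count_missing_pages(site_root: str) -> tuple[int, int]:
    """Return (total local references, missing local references)."""
    total, missing = 0, 0
    for dirpath, _, filenames in os.walk(site_root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue
            with open(os.path.join(dirpath, name), encoding="utf-8", errors="ignore") as f:
                for target in HREF_RE.findall(f.read()):
                    if target.startswith(("http://", "https://", "mailto:")):
                        continue  # external references are out of scope here
                    total += 1
                    if not os.path.exists(os.path.join(dirpath, target)):
                        missing += 1  # URL deemed missing per the rule above
    return total, missing
```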
Quality of self-referential hrefs
"hrefs" can provide navigation within an HTML page, especially if the page size is huge; self-hrefs are used to implement navigation on the same page. Self-hrefs are frequently coded on a page without programme control ever being transferred to them; conversely, control is sometimes transferred to a self-href that is not defined in the code. The quality of this sub-factor is determined by the number of missing self-hrefs (not coded) and of self-hrefs that are coded but never referred to, as sketched below.
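A minimal sketch of this pairing, assuming same-page anchors are declared with id or name attributes (a simplification of real HTML):

```python
import re

ANCHOR_REF_RE = re.compile(r'href\s*=\s*["\']#([^"\']+)["\']', re.IGNORECASE)
ANCHOR_DEF_RE = re.compile(r'(?:id|name)\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def self_href_counts(html: str) -> tuple[int, int]:
    """Return (missing self-hrefs, coded-but-unreferenced self-hrefs)."""
    refs = set(ANCHOR_REF_RE.findall(html))      # href="#x" references on the page
    targets = set(ANCHOR_DEF_RE.findall(html))   # id="x" / name="x" anchor targets
    return len(refs - targets), len(targets - refs)
```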
Quality of Tables
Sometimes content is delivered by displaying tables. A repository of the data items is created, and linkages of those data items to different tables are established. The tables in the web pages must correspond to the tables in the database, and the columns coded when defining a table must correspond to the table attributes represented as columns. The number of discrepancies between the HTML tables and the RDBMS tables reveals the quality of the HTML tables.
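A hypothetical sketch of this discrepancy count, assuming a SQLite backing store (the paper does not name the RDBMS):

```python
import re
import sqlite3

TH_RE = re.compile(r"<th[^>]*>(.*?)</th>", re.IGNORECASE | re.DOTALL)

def table_mismatches(html: str, db_path: str, table_name: str) -> int:
    """Count columns present in only one of the HTML table / database table."""
    html_cols = {re.sub(r"<[^>]+>", "", c).strip().lower() for c in TH_RE.findall(html)}
    with sqlite3.connect(db_path) as conn:
        # PRAGMA table_info yields (cid, name, type, ...); index 1 is the column name.
        db_cols = {row[1].lower() for row in conn.execute(f"PRAGMA table_info({table_name})")}
    return len(html_cols ^ db_cols)  # symmetric difference = discrepancies
```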
Quality of forms
Sometimes data is collected from the user using forms within the HTML pages. The forms are designed around the attributes for which the data must be collected. Forms may be poorly designed, lacking the relevant fields or any connection between the fields for which data is collected. The number of such mismatches reveals the quality of the forms coded into the website.
Missing images
The content displayed on websites is occasionally enhanced by incorporating multimedia-based items, and the quality of those images matters. Some important considerations include an image's size and resolution. Every HTML page links to its images using URLs and normally specifies the size and resolution. Blank images are displayed when the image an "href" refers to doesn't exist. Furthermore, the attributes of the actual image must correspond to the image specifications coded in the HTML page. The number of images on each page can be counted, and each image's details, including its dimensions, resolution, and URL, can then be found. The quality of the images can be gauged from the number of blank, incorrectly sized, or incorrectly resolved images.
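A hedged sketch of this check using Pillow (a library choice assumed here, as is the src/width/height attribute order in the regular expression):

```python
import os
import re
from PIL import Image  # pip install Pillow

IMG_RE = re.compile(
    r'<img[^>]*src\s*=\s*["\']([^"\']+)["\'][^>]*'
    r'width\s*=\s*["\']?(\d+)["\']?[^>]*height\s*=\s*["\']?(\d+)',
    re.IGNORECASE,
)

def image_defects(html: str, page_dir: str) -> int:
    """Count images that are missing or whose coded size disagrees with the file."""
    defects = 0
    for src, width, height in IMG_RE.findall(html):
        path = os.path.join(page_dir, src)
        if not os.path.exists(path):
            defects += 1  # missing image: a blank placeholder would render
        else:
            with Image.open(path) as im:
                if im.size != (int(width), int(height)):
                    defects += 1  # coded dimensions disagree with the stored file
    return defects
```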
Missing videos
Video is another form of content found on websites with interactive content. Users typically have a strong desire to learn from online videos. The availability of the videos and the calibre of video playback are the two sub-factors that most impact video quality. An HTML or script file that references the videos contains the URL and the player that must be used locally to play them.
The videos are identified by position, width, number of frames displayed per second, and other details. The size coded for a video must match the actual size of the video stored at the URL location. Another crucial factor for determining the quality of the videos is whether they are accessible at the location mentioned in the code. The properties of a video, if it exists, can be checked: the HTML pages can be examined, and the video's existence at the supplied URL confirmed. The number of missing or mismatched videos reflects the quality of the website.
Missing PDFs
Consolidated, important, and related content is typically rendered using PDFs, and most websites offer the option to download them. The material kept in PDFs is rendered by referring to the PDFs using hrefs. The quality of the PDFs can be calculated from the number of missing PDFs.
4.2. Total Quality considering the factor "Completeness".
When assessing the quality of a website, the factor "Completeness" and its attributes are considered, computed as the average of the qualities of the individual criteria. The more defects found for a characteristic, the more that feature's quality is diminished. When the "Completeness" factor is considered, the quality of the corresponding sub-factors defines a website's overall quality. One sub-factor's quality could be top-notch while another is below average; the sub-factors are therefore independent. Because each sub-factor is equally important, no weights are assigned. As shown in equation (1), a simple measure of the factor's quality is the average quality of all the sub-factors.
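The averaging described above can be written as equation (1), assuming n equally weighted sub-factors with qualities Q_i:

$$Q_{\text{Completeness}} = \frac{1}{n}\sum_{i=1}^{n} Q_i \qquad (1)$$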
4.3. Computing the counts of sub-factors through a parser.
A parser is developed that scans through the entire website given as an example. The website as such is a hierarchical structure of files stored on the disk drive; an example of such a structure is shown in Figure 1. The URL of the root of the structure is given as the beginning of the website, which the parser takes as input.
4.4. An algorithm for computing object counts.
The following algorithm represents a parser designed to compute the counts related to the different objects described above. The counts include the total referred objects, total objects in existence, total missing objects, and total mismatched objects. This algorithm is used to build the example set and to compute the counts for a new website whose quality is to be predicted.
Input: WEB site structure
Outputs:
The list of all files
Number of total, existing, and missing images
Number of total, existing, and missing videos
Number of total, existing, and missing PDFs
Number of total, existing, and missing columns in the tables
Number of total, existing, and missing fields in the forms
Number of total, existing, and missing self-references
Procedure
Scan through the files in the structure and find the URLs of the code files.
For each code file:
- Check for URLs of PDFs; if present, enter them into a PDF_URL array.
- Check for URLs of images; if present, enter them into an Image_URL array.
- Check for URLs of videos; if present, enter them into a Video_URL array.
- Check for URLs of inner pages; if present, enter them into an Inner_Pages_URL array.
- Check for the existence of tables and enter each table's description as a string into a Table-Desc array.
- Check for the existence of forms and enter each form's description as a string into a Field-Desc array.
For each entry in the PDF_URL, Image_URL, Video_URL, and Inner_Pages_URL arrays:
- Add to the total count for that object type; if the referenced object exists at the indicated location, add to the existing count, else add to the missing count.
For each entry in the Table-Desc array:
- Fetch the column names in the entry.
- Fetch the database tables having those column names as table fields.
- If the column names and table names match, add to Total-Tables and Existing-Tables; else, add to Missing-Tables.
For each entry in the Field-Desc array:
- Fetch the field names in the entry.
- Fetch the forms having those field names as form fields.
- If the field names match, add to Total-Forms and Existing-Forms; else, add to Missing-Forms.
Write all the counts to a CSV file.
Write all names of the program files to a CSV file.
Write all image URLs with associated properties to a CSV file.
Write all video URLs with associated properties to a CSV file.
Write all names of the PDF files to a CSV file.
The websites are input to the parser, which computes the counts of the different objects explained above.
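A condensed Python rendering of the parser described above (a sketch under simplifying assumptions: only HTML code files are scanned, and objects are matched by file extension) is:

```python
import csv
import os
import re

# Extension-based matchers for the object categories described above.
PATTERNS = {
    "pdf":   re.compile(r'href\s*=\s*["\']([^"\']+\.pdf)["\']', re.I),
    "image": re.compile(r'src\s*=\s*["\']([^"\']+\.(?:png|jpe?g|gif))["\']', re.I),
    "video": re.compile(r'src\s*=\s*["\']([^"\']+\.(?:mp4|webm|avi))["\']', re.I),
    "page":  re.compile(r'href\s*=\s*["\']([^"\'#]+\.html?)["\']', re.I),
}

def parse_site(site_root: str, out_csv: str) -> dict:
    """Walk the site tree and write total/existing/missing counts per category."""
    counts = {k: {"total": 0, "existing": 0, "missing": 0} for k in PATTERNS}
    for dirpath, _, filenames in os.walk(site_root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue  # only code files are scanned in this sketch
            with open(os.path.join(dirpath, name), encoding="utf-8", errors="ignore") as f:
                text = f.read()
            for kind, pattern in PATTERNS.items():
                for url in pattern.findall(text):
                    counts[kind]["total"] += 1
                    exists = os.path.exists(os.path.join(dirpath, url))
                    counts[kind]["existing" if exists else "missing"] += 1
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([f"{k}_{c}" for k in counts for c in ("total", "existing", "missing")])
        writer.writerow([counts[k][c] for k in counts for c in ("total", "existing", "missing")])
    return counts
```

Each CSV row produced this way corresponds to one website in the example set, which is the input representation the learning model consumes.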
4.5. Designing the Model Parameters.
A multi-layer perceptron neural network has been designed, representing an expert model. The model is used to predict the quality of a website given the counts computed through the parser.
Table 3 details the design parameters used for building the neural network.
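Since Table 3 is not reproduced here, the following Keras sketch uses assumed layer sizes and hyper-parameters purely for illustration:

```python
import tensorflow as tf

def build_model(n_features: int) -> tf.keras.Model:
    """MLP mapping parser feature counts to a predicted quality score."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),           # one input per feature count
        tf.keras.layers.Dense(32, activation="relu"),  # assumed hidden-layer size
        tf.keras.layers.Dense(16, activation="relu"),  # assumed hidden-layer size
        tf.keras.layers.Dense(1),                      # predicted website quality
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```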
4.6. Platform for Implementing the Model.
KERAS and TensorFlow have been used in a notebook interface running on the Microsoft Windows 11 operating system. The software runs on a DELL laptop built on a 12th-generation processor with 8 cores and 2 NVIDIA GPUs.
4.7. Inputting the Data into the Model.
The example set is broken into batches of 10 examples each, and the records are shuffled in each epoch. Each split is further divided into mini-batches of 5 for updating the weights and the biases.
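A minimal tf.data sketch of this scheme (per-epoch shuffling with mini-batches of 5 for the weight and bias updates; the pipeline details are assumptions, not the authors' code):

```python
import tensorflow as tf

def make_dataset(features, labels) -> tf.data.Dataset:
    """Shuffle every epoch, then emit mini-batches of 5 examples."""
    return (
        tf.data.Dataset.from_tensor_slices((features, labels))
        .shuffle(buffer_size=len(features), reshuffle_each_iteration=True)
        .batch(5)  # granularity of the weight/bias updates
    )

# Hypothetical usage: model.fit(make_dataset(X_train, y_train), epochs=50)
```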