Results and Discussions
The subject matter of the journal's publications can be described by systematizing the author keywords of individual articles. In the data from The Lens platform, the keywords field for this journal was empty, so the keywords were taken from the RIS records exported from ScienceDirect. Conversely, the ScienceDirect records contain no information about citations of publications, so the citation counts were taken from The Lens records. It should be emphasized that the citation rate here is based on one specific database, The Lens, and will differ from a citation rate based on another database, such as Scopus.
Author Keywords Clustering
Clustering of the author keywords was performed using VOSviewer software. Two iterations were performed. In the first, the file my_thesaurus_terms.txt was built to bring variant terms to a common form; this file was then used in the second iteration.
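The thesaurus step can be sketched as follows. This is a minimal illustration, not the author's actual file: it assumes the usual VOSviewer thesaurus layout (a tab-separated text file whose header columns are "label" and "replace by"), and the term pairs in the mapping are hypothetical examples.

```python
import csv
import io


def write_thesaurus(mapping, handle):
    """Write a tab-separated thesaurus with 'label' / 'replace by' columns."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["label", "replace by"])
    for variant, canonical in sorted(mapping.items()):
        writer.writerow([variant, canonical])


def apply_thesaurus(keywords, mapping):
    """Lower-case keywords, map variants to canonical terms, drop duplicates."""
    seen, result = set(), []
    for keyword in keywords:
        key = keyword.strip().lower()
        term = mapping.get(key, key)
        if term not in seen:
            seen.add(term)
            result.append(term)
    return result


# Hypothetical variant -> canonical pairs (not taken from my_thesaurus_terms.txt).
mapping = {
    "fuel cells": "fuel cell",
    "her": "hydrogen evolution reaction",
}

buffer = io.StringIO()
write_thesaurus(mapping, buffer)
thesaurus_text = buffer.getvalue()

normalized = apply_thesaurus(["Fuel cells", "HER", "fuel cell"], mapping)
```

Applying the mapping outside VOSviewer in this way is only a quick check of which variants collapse into one term before the second iteration is run.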
The 500 keywords with the highest total link strength were clustered. The default settings produced 10 clusters, but the 10th cluster contained only one term, "carbon materials", so the minimum number of terms per cluster was increased to 2, yielding 9 clusters.
Figure 1 shows the general view of the author keyword clustering.
Explanation of term characteristics: for example, the most frequently cited term is identified from the citations of the publications in which it appears.
The data presented in Figure 1 can be viewed in more detail by opening the AuthorKeywords-top500.json file, located in the archive attached to this article, in the online application https://app.vosviewer.com/.
Here are some findings that characterize the landscape of keyword distribution.
10 author keywords most frequently appearing in new publications: wind, sustainable energy, cushion gas, hydrogen supply chain, solar-to-hydrogen efficiency, techno-economic assessment, energy transition, flow field, electrochemical impedance spectroscopy, Tafel slope.
10 author keywords more frequently occurring in publications with high citations: clean energy, carbon capture, combined heat and power, bibliometric analysis, hydrogen economy, hybrid renewable energy system, s-scheme, sustainable development, cost, energy management system. As can be seen from this list, the term "bibliometric analysis" is often found in publications with high citation rates.
10 author keywords most frequently occurring in all publications: hydrogen, hydrogen production, hydrogen storage, hydrogen evolution reaction, oxygen evolution reaction, electrocatalyst, water splitting, fuel cell, oxygen reduction reaction, solid oxide fuel cell. These terms reflect well the dominant theme of the journal.
Clustering of the 'Fields of Study' Terms
As noted earlier, the 'Fields of Study' terms can be considered an analog of Index Keywords in Scopus and used to describe the topics of publications.
Figure 2 shows a picture similar to Figure 1 but constructed for the 'Fields of Study' terms.
The author keywords are more diverse (23215 terms in total, of which 1390 occur at least 5 times) than the 'Fields of Study' terms (5152 in total, of which 1794 occur at least 5 times). Because the 'Fields of Study' terms are standardized, a larger share of them meets the threshold of five occurrences.
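The threshold comparison above can be illustrated with a short sketch. It assumes that each record contributes a term at most once and that the threshold is a minimum number of occurrences; the records and the threshold of 2 are toy values chosen only to keep the example small.

```python
from collections import Counter


def terms_meeting_threshold(records, min_occurrences=5):
    """Count in how many records each term occurs and keep the frequent ones."""
    counts = Counter()
    for terms in records:
        counts.update(set(terms))  # each record counts a term once
    kept = {term: n for term, n in counts.items() if n >= min_occurrences}
    return counts, kept


# Toy records (hypothetical); a real run would use one term list per publication.
records = [
    ["hydrogen", "catalysis", "electrode"],
    ["hydrogen", "catalysis"],
    ["hydrogen", "physics"],
]

counts, kept = terms_meeting_threshold(records, min_occurrences=2)
share_kept = len(kept) / len(counts)
```

The share of terms that survive the threshold is the quantity that differs so strongly between free-text author keywords and standardized 'Fields of Study' terms.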
Here are some data characterizing the distribution of the 'Fields of Study' terms.
10 'Fields of Study' terms most frequently appearing in new publications: mechanism (biology), literature, dual (grammatical number), reduction (mathematics), scale (ratio), art, oxygen reduction, macroeconomics, production (economics), plasma.
10 'Fields of Study' terms more frequently occurring in publications with high citations: hydrogen technologies, hydrogen economy, software deployment, sustainability, energy carrier, greenhouse gas, climate change, fossil fuel, natural resource economics, risk analysis (engineering).
10 'Fields of Study' terms most frequently occurring in all publications: chemistry, materials science, engineering, organic chemistry, chemical engineering, hydrogen, catalysis, physics, physical chemistry, electrode.
The 5 most frequently occurring terms and the 5 most cited terms in the 4 largest clusters (green, red, khaki and purple) are listed below, in that order and separated by a semicolon.
Green cluster: materials science, chemical engineering, catalysis, physical chemistry, electrode; charge carrier, crystallinity, semiconductor, nanomaterials, nanocomposite.
Red cluster: engineering, environmental science, computer science, quantum mechanics, electrical engineering; hydrogen technologies, hydrogen economy, software deployment, sustainability, energy carrier.
Khaki cluster: chemistry, organic chemistry, physics, thermodynamics, combustion; hydrogen fuel enhancement, diesel engine, thermal efficiency, diesel fuel, biodiesel.
Purple cluster: hydrogen, hydrogen storage, adsorption, metal, alloy; ab initio, photocatalytic water splitting, magnesium hydride, Gibbs free energy, gravimetric analysis.
When building queries for a research task, many terms can be linked by the 'OR' operator, but 'AND' can rarely link more than 3-4 terms, since each of them acts as a filter. The selection of terms linked by the 'AND' operator therefore requires more careful reasoning.
To select terms connected by the AND operator, it is useful to visualize the terms as a network based on their co-occurrence. Most node placement algorithms aim at good "readability" of the figure, but this approach is less informative than displaying the network of terms in meaningful coordinates. In the author's experience, such coordinates can be Avg. pub. year and Avg. norm. citations, as used in the VOSviewer program and described in its manual [2].
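The two coordinates can be computed per term as in the sketch below. It assumes the normalization described in the VOSviewer manual, in which a document's citation count is divided by the average citation count of all documents published in the same year; the field names and the four toy documents are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def term_scores(docs):
    """Return {term: (avg_pub_year, avg_norm_citations)} for a list of documents."""
    citations_by_year = defaultdict(list)
    for doc in docs:
        citations_by_year[doc["year"]].append(doc["citations"])
    year_mean = {year: mean(vals) for year, vals in citations_by_year.items()}

    years, norm = defaultdict(list), defaultdict(list)
    for doc in docs:
        average = year_mean[doc["year"]]
        # Normalized citations: citations relative to the same-year average.
        normalized = doc["citations"] / average if average else 0.0
        for term in set(doc["terms"]):
            years[term].append(doc["year"])
            norm[term].append(normalized)
    return {term: (mean(years[term]), mean(norm[term])) for term in years}


# Toy documents (hypothetical values).
docs = [
    {"year": 2022, "citations": 10, "terms": ["hydrogen", "catalysis"]},
    {"year": 2022, "citations": 30, "terms": ["hydrogen"]},
    {"year": 2023, "citations": 4, "terms": ["hydrogen", "wind"]},
    {"year": 2023, "citations": 12, "terms": ["wind"]},
]

scores = term_scores(docs)
```

Plotting each term at its (average year, average normalized citations) position places recent, highly cited terms in one corner of the chart, which is what makes these coordinates useful when choosing terms for an AND query.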
A visualization of this kind for the red and green cluster data is shown in Figure 3 and Figure 4. The terms in each cluster were additionally grouped using an algorithm built into the Scimago Graphica program.
It is noteworthy that one cluster dominates in both figures, although the networks are built using a different algorithm than in Figure 2, which may indicate the stability of the results obtained.
Clustering of Articles
The topics of publications presented in the journal can be described not only by keywords, but also by clustering the articles themselves. Bibliographic coupling is used for this purpose.
The BibliographicCouplingDocuments.json file in the archive attached to this article can be opened in VOSviewer Online to examine in detail the publication clustering network obtained by Bibliographic Coupling on cited Documents. A significant advantage of online browsing is that the tooltip shows not only general characteristics of the publication network, but also detailed data on the selected publication: its title, where and when it was published, and which other documents share its cluster. A screenshot of the json file opened in VOSviewer Online is shown in Figure 5.
Documents are organized into 8 clusters. General characteristics of the document network: Items: 500; Links: 20193; Total link strength: 39785.
The problem with using 'Bibliographic coupling' is that many publications nowadays have very large reference lists. For example, in the data exported from The Lens used in our work, the average number of cited references per publication is 575389/10928 ≈ 52.7. Many of these references are usually cited in the Introduction section and only indirectly reflect the content of the article. In my opinion, it is currently more appropriate to assess the similarity of publications using the tags (index keywords/labels) that the abstract database platform assigns to each publication. In The Lens, such labels are placed in the 'Fields of Study' column. In the data used in this paper, there was an average of 190972/10972 ≈ 17.4 'Fields of Study' terms per publication. This is quite sufficient to compare the proximity of publication topics, all the more so because the terms in the 'Fields of Study' field have consistent spelling and therefore require no additional normalization, and the field itself contains no empty values.
Based on the above, in the data table from The Lens analyzed by VOSviewer, the 'Fields of Study' column was renamed to 'References' and the same Bibliographic Coupling analysis was performed; the results are shown in Figure 6.
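The effect of treating 'Fields of Study' terms as references can be sketched as follows. The example assumes that the coupling strength between two records is simply the number of items they share (full counting); the record identifiers and term lists are hypothetical.

```python
from itertools import combinations


def coupling_links(records):
    """Return {(doc_a, doc_b): number of shared terms} for pairs sharing any term."""
    term_sets = {doc: set(terms) for doc, terms in records.items()}
    links = {}
    for a, b in combinations(sorted(term_sets), 2):
        shared = len(term_sets[a] & term_sets[b])
        if shared:
            links[(a, b)] = shared
    return links


# Toy records: 'Fields of Study' terms play the role of the reference list.
records = {
    "doc_A": ["Hydrogen", "Catalysis", "Chemistry"],
    "doc_B": ["Hydrogen", "Catalysis", "Electrode"],
    "doc_C": ["Renewable energy", "Hydrogen"],
}

links = coupling_links(records)
total_link_strength = sum(links.values())
```

Because nearly every record in a single-topic journal shares a few very common terms, most pairs end up linked, which is one plausible reason for a small number of clusters under default settings.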
The distinctive feature of this graph is the small number of clusters (2) when using VOSviewer with default parameters.
The following procedure was carried out to determine the current topics of the two clusters presented above:
A total of 372 'Fields of Study' terms were contained in the 20 records of the first cluster and 460 in the second cluster. Of these, the first cluster contained 154 unique and the second cluster contained 135 unique 'Fields of Study' terms.
The co-occurrence of terms was determined using the fp-growth algorithm with parameters -s40m4n4. A value of s40 is very high for this parameter, which indicates significant similarity of the 'Fields of Study' terms across different bibliometric records. This can be explained by the fact that all articles are published in one journal with a narrowly defined subject matter.
For the articles in the first cluster, 57 co-occurring combinations of four 'Fields of Study' terms were obtained. For the articles in the second cluster, 199 such combinations were obtained.
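The co-occurrence step can be illustrated with a brute-force sketch. It is not FP-growth itself: for a handful of records, counting all four-term combinations directly yields the same frequent itemsets. The sketch reads the parameters as a minimum support of 40% of records (s40) and itemsets of exactly four terms (m4, n4), which is an assumption about the tool's options; the five records are toy data.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, size=4, min_support_pct=40.0):
    """Return {itemset: count} for itemsets of the given size meeting the support."""
    min_count = math.ceil(len(transactions) * min_support_pct / 100.0)
    counts = Counter()
    for items in transactions:
        for combo in combinations(sorted(set(items)), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_count}


# Toy records of 'Fields of Study' terms (hypothetical).
transactions = [
    ["Catalysis", "Chemical_engineering", "Chemistry", "Materials_science", "Hydrogen"],
    ["Catalysis", "Chemical_engineering", "Chemistry", "Materials_science"],
    ["Catalysis", "Chemistry", "Materials_science", "Electrochemistry"],
    ["Hydrogen", "Engineering"],
    ["Catalysis", "Chemical_engineering", "Chemistry", "Materials_science", "Electrochemistry"],
]

result = frequent_itemsets(transactions, size=4, min_support_pct=40.0)
```

Direct enumeration grows combinatorially with the number of terms per record, so for the full data set a dedicated frequent-itemset implementation remains the practical choice.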
The Alluvial diagrams below were constructed using the Scimago Graphica program for the 45 most frequently occurring four-term combinations.
This chart shows the most pronounced co-occurrence of terms: Catalysis, Chemical_engineering, Chemistry, Materials_science. The terms Catalysis, Hydrogen, Physical_chemistry, Chemical_engineering, Nanotechnology, Electrochemistry, Organic_chemistry, Materials_science are most reflective of the subject matter of this chart.
This chart shows the most pronounced co-occurrence of terms: Electrical_engineering, Environmental_science, Chemistry, Hydrogen. The terms Electrical_engineering, Hydrogen_production, Renewable_energy, Environmental_science, Hydrogen_economy, Hydrogen, Engineering are most reflective of the subject matter of this chart.
The color selection options for each of the term layers are presented as four interactive web pages for both figures, available in the attached archive.
The 'Fields of Study' terms can serve as filters when searching for information on The Lens platform. Knowing which terms co-occur significantly narrows the sample of bibliometric records exported from The Lens, thus speeding up the search for the desired information.
Conclusions
The combined use of fields from bibliometric records of the abstract databases ScienceDirect and The Lens, which complement each other, is demonstrated. For example, in The Lens the keywords field is poorly filled, while ScienceDirect records have no fields for citations and reference lists. The 'Fields of Study' field of The Lens platform can be interpreted as system keywords similar to Index Keywords in Scopus.
Given that widely used bibliometric analysis programs such as VOSviewer can use The Lens data, the easiest way to combine the data is to populate the Keywords and Abstract fields of the data exported from The Lens with the 'KW' and 'AB' fields, respectively, of the RIS files exported from ScienceDirect. More generally, a merged data table in Scopus CSV format can be created by renaming the relevant fields and converting the separator characters between terms. It is advisable to merge data from the tables by DOI.
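A minimal sketch of such a merge is given below. It assumes the standard RIS line layout (a two-character tag, two spaces, a hyphen and a space) with the DOI in the 'DO' tag, and it assumes the table from The Lens has columns named 'DOI', 'Keywords' and 'Abstract'; these column names, the "; " separator and the sample records are illustrative assumptions rather than a description of the actual export.

```python
import re

RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def normalize_doi(doi):
    """Lower-case a DOI and strip a leading doi.org resolver prefix."""
    return re.sub(r"^https?://(dx\.)?doi\.org/", "", doi.strip().lower())


def parse_ris(text):
    """Return {doi: {'KW': [...], 'AB': str, 'DO': str}} from RIS text."""
    by_doi, current = {}, {"KW": [], "AB": "", "DO": ""}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "KW":
            current["KW"].append(value)
        elif tag in ("AB", "DO"):
            current[tag] = value
        elif tag == "ER":  # end of record
            if current["DO"]:
                by_doi[normalize_doi(current["DO"])] = current
            current = {"KW": [], "AB": "", "DO": ""}
    return by_doi


def merge(lens_rows, ris_by_doi, sep="; "):
    """Fill Keywords and Abstract of Lens rows from RIS records matched by DOI."""
    merged = []
    for row in lens_rows:
        row = dict(row)
        extra = ris_by_doi.get(normalize_doi(row.get("DOI", "")))
        if extra:
            row["Keywords"] = sep.join(extra["KW"])
            row["Abstract"] = extra["AB"]
        merged.append(row)
    return merged


# Hypothetical sample data; real rows could come from csv.DictReader.
ris_text = (
    "TY  - JOUR\n"
    "DO  - https://doi.org/10.1000/Example.1\n"
    "KW  - hydrogen storage\n"
    "KW  - metal hydride\n"
    "AB  - Placeholder abstract.\n"
    "ER  - \n"
)
lens_rows = [
    {"DOI": "10.1000/example.1", "Keywords": "", "Abstract": ""},
    {"DOI": "10.1000/example.2", "Keywords": "", "Abstract": ""},
]

merged = merge(lens_rows, parse_ris(ris_text))
```

Normalizing the DOI before matching matters in practice, since one source may store a bare DOI and the other a full resolver URL or a different letter case.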
The feasibility of using the VOSviewer and Scimago Graphica programs in sequence for a more complete visualization of the results of bibliometric analysis is demonstrated. It is also shown that an Alluvial diagram can be used to map the co-occurrence of, for example, four keywords, and that the co-occurrence network of keywords can be mapped in the coordinates of average publication year and average normalized citations.
The 'Fields of Study' data, being normalized terms, provide good opportunities to analyze the topics of publications.
By analyzing the 'Fields of Study' bibliometric data of the International Journal of Hydrogen Energy for 2022-2024, two dominant publication themes were identified, which can be described by the terms 'Catalysis, Hydrogen, Physical_chemistry, Chemical_engineering, Nanotechnology, Electrochemistry, Organic_chemistry, Materials_science' and 'Electrical_engineering, Hydrogen_production, Renewable_energy, Environmental_science, Hydrogen_economy, Hydrogen, Engineering'.
Possible follow-up study: comparing different record grouping approaches — bibliographic coupling variants and GSDMM.