1. Introduction
As research progresses and scholars explore new lines of inquiry and publish their most recent results, the range of topics in a scientific field typically broadens. New discoveries open up new questions, often generating novel research directions. Understanding the dynamics and diversity of research topics in a discipline is potentially a relevant indicator of how active the area is [1]. For instance, a lower degree of diversity in a scientific field may indicate the presence of one or more prominent topics on which most publications focus; this in turn could be driven by various factors, either intrinsic to the field (such as a compelling scientific conundrum that attracts the attention of many scholars in the community) or external to it (e.g., socioeconomic incentives) [2,3].
However, understanding research diversity entails understanding its epistemic composition, which requires analyzing the different scientific areas that a field consists of. To this end, topic modeling can be a valuable resource to unpack the main lines along which research is conducted and evolves [4]. Topic modeling algorithms are a convenient tool to get a sense of the array of topics that are present in a field [5]. These algorithms, such as Latent Dirichlet Allocation (LDA) or, more recently, BERTopic, are capable of scanning large datasets and segmenting a whole corpus of documents according to their semantic content. For the present work we used BERTopic, an algorithm centered on BERT (Bidirectional Encoder Representations from Transformers) embeddings, i.e., numerical representations of the semantics of a sentence or even of a whole document [6]. BERT is a language model developed by Google, which is based on a mechanism known as attention, and has quickly outperformed earlier embedding algorithms in various tasks [7].
To better investigate how research efforts are distributed across research topics, we resorted to Shannon’s entropy, a concept derived from information theory that measures the uncertainty or randomness in a dataset [8]. It was originally formulated by Claude Shannon in 1948 and has since been adapted for various domains. Although entropy measurements have been applied to investigate the performance of topic modeling algorithms [9,10,11], we propose to use Shannon’s entropy to quantify research diversity by assessing the distribution of research topics within a given field, as determined by BERTopic. The mere number of topics in a field may not be an accurate measurement of diversity when taken alone. A field where, for example, one predominant topic absorbs 80% of the global research effort alongside 19 minor niche research areas is in a very different situation from a field with 20 equally relevant research areas. Entropy can help account for the distribution of research efforts (measured through their output, i.e., the resulting publications). Higher entropy values indicate a more even distribution of articles across topics, reflecting greater diversity, while lower entropy values suggest that research is concentrated around a few dominant topics.
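The contrast between the two hypothetical fields described above can be made concrete with a short calculation. The following is a minimal sketch, assuming the usual definition H = -Σ p_i log(p_i) over the share p_i of articles in each topic and a base-2 logarithm (entropy in bits); the choice of base and the function name `shannon_entropy` are illustrative assumptions, not a description of the pipeline used in this study.

```python
import math


def shannon_entropy(counts, base=2.0):
    """Shannon entropy of a topic-size distribution (bits when base is 2).

    `counts` may hold raw article counts or proportions; they are
    normalized to probabilities before the entropy is computed.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    entropy = 0.0
    for count in counts:
        if count > 0:  # empty topics contribute nothing (0 * log 0 is taken as 0)
            p = count / total
            entropy -= p * math.log(p, base)
    return entropy


# Hypothetical field A: one topic holds 80% of the articles,
# while 19 niche topics share the remaining 20% equally.
skewed = [0.80] + [0.20 / 19] * 19

# Hypothetical field B: 20 equally sized topics.
even = [1 / 20] * 20

print(f"skewed field: {shannon_entropy(skewed):.2f} bits")
print(f"even field:   {shannon_entropy(even):.2f} bits")
```

Both fields contain 20 topics, yet the skewed field scores about 1.57 bits against about 4.32 bits (log2 of 20, the maximum for 20 topics) for the even one, which is why the topic count alone can be misleading.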
Despite its potential, the use of Shannon’s entropy to measure research diversity remains underexplored in many disciplines, including the dental sciences. Dental research includes a wide array of specialties, each with its own focus and themes, which underwent tremendous growth in the last decades of the 20th century [12]. The introduction of groundbreaking techniques (such as implant dentistry) has also opened up new therapeutic opportunities, challenges and questions [13]. Understanding how research topics evolve and diversify within these specialties could help guide future research efforts and policy decisions.
In this study, we applied Shannon’s entropy to analyze the diversity of research topics across multiple dental disciplines over a thirty-year period (1994–2023). Using BERTopic, we extracted and categorized research topics from a comprehensive dataset of scientific articles from MEDLINE. By calculating the entropy of these topics over time, we aimed to provide a quantitative measure of research diversity within each discipline and to offer insights into the evolution and distribution of research themes in dental research.
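As an illustration of what calculating the entropy of topics over time can look like in practice, the sketch below groups articles by publication year and computes the entropy of each year's topic distribution. It assumes that every article has already been assigned a single topic label; the function name `entropy_by_year`, the toy records and the topic labels are hypothetical and are not taken from the study's dataset.

```python
import math
from collections import Counter, defaultdict


def entropy_by_year(records):
    """Map each year to the Shannon entropy (in bits) of its topic distribution.

    `records` is an iterable of (year, topic_label) pairs, one per article.
    """
    topics_per_year = defaultdict(Counter)
    for year, topic_label in records:
        topics_per_year[year][topic_label] += 1

    result = {}
    for year in sorted(topics_per_year):
        counts = topics_per_year[year].values()
        total = sum(counts)
        # H = -sum(p * log2(p)) over the topics observed in that year
        result[year] = -sum((c / total) * math.log2(c / total) for c in counts)
    return result


# Toy records (hypothetical): 1994 is dominated by one topic,
# 2023 is spread evenly over four topics.
toy_records = [
    (1994, "caries"), (1994, "caries"), (1994, "caries"), (1994, "adhesives"),
    (2023, "caries"), (2023, "adhesives"), (2023, "composites"), (2023, "ceramics"),
]

for year, entropy in entropy_by_year(toy_records).items():
    print(year, round(entropy, 3))
```

In this toy example the entropy rises from roughly 0.81 bits in 1994 to 2 bits in 2023, the kind of upward trend that would be read as diversification.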
4. Discussion
The results of this study provide insights into the diversity and evolution of research within the dental sciences over the past three decades. By employing topic modeling techniques and Shannon’s entropy, we quantified and compared the diversity of research topics across six distinct dental disciplines.
The dataset was arbitrarily generated through PubMed searches to collect a sizable corpus of publications in six core areas of dental practice and research. The queries were generic enough to embrace a large portion of the publications in each discipline, without getting into the details of specific clinical or scientific issues. Although other areas, e.g., temporomandibular joint (TMJ) disorders or endodontics, could have been devised and searched (and actually were), preliminary attempts did not yield a comparable number of publications, and we refrained from expanding our corpus to maintain a certain degree of comparability between disciplines. For the same reason we did not extend our corpus to dates earlier than 1994, although by doing so we likely missed some pivotal events in the development of dentistry, such as the introduction of bonding in Restorative Dentistry [27] or the development of root-form osseointegrated fixtures in Implant Dentistry [13], which presumably had a great impact on research trends in those areas at the time.
Our results confirm that a growing number of articles is published each year, i.e., that the rate of publication is accelerating, as previously reported [28], although different dental disciplines understandably move at different speeds. In particular, Periodontics and Prosthodontics have been growing faster than the other disciplines we considered. The growth of some disciplines, such as Oral Surgery, might even be starting to slow down, as the number of new publications per year has remained constant over the last 10 years. Several factors may combine to determine the number of publications in a discipline, including new discoveries (or new challenges), new needs that arise and must be addressed, or even changes in the way a certain issue is culturally conceived, interpreted or categorized. Determining why these disciplines behave in a specific way is outside the scope of the present report, which aimed at measuring one aspect of the research infoscape in dentistry, i.e., its diversity. Diversity is a polysemous term, which has acquired a novel set of meanings, including the one currently prevalent in the social sciences, i.e., the inclusion of people in events, activities, jobs, etc. beyond racial, gender, or religious divides [29,30]. However, the word has a rich history in the natural sciences, where it has long been used to refer to the wide range of different life forms within ecosystems [31]. Our present work has focused on diversity in research, assessed through the main product of scientific activity, i.e., its dissemination through published papers [32]. In this context, diversity can be conceived as the presence, within a certain scientific field, of multiple areas of investigation that are actively pursued by the scientific community. Dentistry was chosen, out of all the life science fields, because of the specific domain knowledge of the authors and because it has fairly well-defined boundaries compared to, e.g., medicine or biology, so it was easier to identify some major areas of investigation, which correspond to the main clinical branches of dental practice.
To better understand what kind of research is conducted in these fields, however, we resorted to topic modeling. Topic modeling is a field concerned with processing unlabelled texts and automatically identifying their theme or topic (what they are “about”) using a wide range of techniques [33,34]. Recently, neural networks have allowed the creation of high-performing algorithms that cluster documents based on their semantics, and BERTopic in particular has repeatedly exceeded the benchmark performance of previous algorithms on a wide range of tasks [18,35]. Clearly, BERTopic is far from omniscient and still requires human input to work properly. At its core is a transformer architecture that takes sentences and translates them into numerical sequences, known as embeddings; these are then clustered based on their similarity, and appropriate labels are found to characterize the clusters. To do that, BERTopic relies on a series of representation models, which range from keyword descriptors to short sentences, as it can also accommodate LLMs such as GPT [23], which we used to quickly characterize the main topics in Table 3. Depending on how its parameters are set, BERTopic can identify fewer, larger topics or many smaller ones, sometimes splitting the dataset into tiny clusters. Two critical parameters in this respect are the number of neighbors that the UMAP dimensionality reduction algorithm uses to process embeddings prior to clustering, and the minimum cluster size used by the HDBSCAN algorithm. By changing their values, the number of topics in a discipline could drop to two or three, or could skyrocket to several thousand for a large dataset such as Prosthodontics or Periodontics (data not shown). The choice of parameters is largely arbitrary and depends on the needs of the investigators, and there are no safe and tested recipes to obtain the optimal (not to mention the “true”) number of topics in a corpus. We picked the settings that yielded a sizable number of topics, around fifty or fewer, while keeping them manageable and avoiding micro-topics of just a few articles and dubious meaning. Interestingly, although fields with more articles also (unsurprisingly) had more topics, the change in topic number over time did not directly reflect the number of publications. Although the Prosthodontics and Periodontics datasets included about 50% more articles than Restorative Dentistry at the beginning of the period we investigated, the number of topics in the latter progressively became higher than in Periodontics and almost twice as large as in Prosthodontics. This suggests that the literature corpora of Periodontics and Prosthodontics have expanded faster than that of Restorative Dentistry while maintaining a similar or lower number of topics. We could envisage a situation, in Restorative Dentistry, where the front of research knowledge expands more homogeneously along a number of different trajectories.
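The two settings discussed in this paragraph, the UMAP number of neighbors and the HDBSCAN minimum cluster size, can be passed to BERTopic through its `umap_model` and `hdbscan_model` arguments. The sketch below is only an illustration of where these knobs sit: the helper names, the numeric values and the remaining keyword arguments (which follow commonly used defaults) are assumptions and do not reproduce the settings of this study. The third-party packages are imported lazily, so the settings helper can be inspected without installing them.

```python
def topic_model_settings(n_neighbors, min_cluster_size):
    """Collect the two settings discussed above as keyword arguments.

    All other values are illustrative assumptions, not the study's settings.
    """
    if n_neighbors < 2 or min_cluster_size < 2:
        raise ValueError("both settings must be integers of at least 2")
    umap_kwargs = {
        "n_neighbors": n_neighbors,  # larger values emphasize global structure
        "n_components": 5,
        "min_dist": 0.0,
        "metric": "cosine",
        "random_state": 42,  # fixed seed so repeated runs are comparable
    }
    hdbscan_kwargs = {
        "min_cluster_size": min_cluster_size,  # smallest group kept as a topic
        "metric": "euclidean",
        "cluster_selection_method": "eom",
        "prediction_data": True,
    }
    return umap_kwargs, hdbscan_kwargs


def build_topic_model(n_neighbors, min_cluster_size):
    """Build a BERTopic model (requires bertopic, umap-learn and hdbscan)."""
    # Imported here so that the helper above works without these packages.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from umap import UMAP

    umap_kwargs, hdbscan_kwargs = topic_model_settings(n_neighbors, min_cluster_size)
    return BERTopic(
        umap_model=UMAP(**umap_kwargs),
        hdbscan_model=HDBSCAN(**hdbscan_kwargs),
    )


# Two hypothetical configurations: the first tends to give few broad topics,
# the second many small ones.
coarse = topic_model_settings(n_neighbors=50, min_cluster_size=200)
fine = topic_model_settings(n_neighbors=5, min_cluster_size=10)
```

With the packages installed, a call such as `topics, probs = build_topic_model(15, 50).fit_transform(abstracts)`, where `abstracts` is a hypothetical list of abstract strings, would return one topic label per document, from which topic counts and entropy can then be derived.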
A more homogeneous expansion of this kind would correspond to higher levels of entropy, which reflect a more even distribution of articles across topics. In contrast, our data suggest that Periodontics and Prosthodontics attract a vast number of new publications every year, but that these are more focused, i.e., these disciplines contain fewer topics (as is the case with Prosthodontics) or at least have some main research directions that contain most of the publications and are prominent over the rest. This could be indicative of a field that has matured, with researchers concentrating on specific, well-established topics rather than exploring new areas. On the other hand, disciplines like Oral Surgery have shown an increase in entropy, at least over a sustained period of time, reflecting a diversification of research topics. This may correspond to an expansion of this field, driven by innovations in implant materials (which are in part contained in this dataset), techniques (such as the use of lasers in oral surgery [36]), and therapeutic strategies [37].
The divergence in entropy trends across disciplines raises some important questions about the direction of future research. Fields with increasing entropy, such as Oral Surgery and, to a lesser extent, Restorative Dentistry and Orthodontics, appear to be in a phase of exploration, where topics expand isometrically, so to speak, and research questions are continually being investigated, although the number of new papers per year is not increasing, i.e., the literature grows linearly. This could be interpreted as a sign of a dynamic and evolving field that is nevertheless still pursued by a ‘niche’ community, which is not expanding on par with other specialties (e.g., Periodontics). Conversely, although a field like Periodontics is growing in both number of publications and number of topics, a few main research axes (e.g., Periodontal Disease Treatment Studies, Smoking and periodontal disease, and Periodontitis and preterm birth relation, to mention a few of its largest topics) still robustly lead the research landscape [38].
Our findings suggest that dental disciplines grow following different dynamics, and thus indirectly point to the importance of balance in a research agenda that promotes both depth and breadth. While specialization is necessary for advancing knowledge in specific areas, it is equally important to encourage exploratory research that can lead to the discovery of new topics and subfields. Policymakers, funding agencies, and academic institutions could consider similar approaches to data analysis when developing research priorities and funding strategies.
While this study may provide some valuable insights, it also has limitations that must be acknowledged. We relied on very generic bibliographic data from PubMed, retrieved with generic keywords. The results should not be considered exhaustive for any specific field and may not fully capture the entirety of research within these disciplines [39]. Additionally, the selection of disciplines and the exclusion of others, such as TMJ disorders or Endodontics, were based on the availability of a comparable volume of publications, and richer datasets might allow this analysis to be extended further. Future studies could broaden the analysis to include these and other related fields to provide a more comprehensive overview of dental research.