Preprint Article | Version 1 | This version is not peer-reviewed

Responses of AI Chatbots to Testosterone Replacement Therapy: Patients Beware!

Version 1 : Received: 1 October 2024 / Approved: 2 October 2024 / Online: 2 October 2024 (10:56:40 CEST)

How to cite: Pabla, H.; Lange, A.; Nadiminty, N.; Sindhwani, P. Responses of AI Chatbots to Testosterone Replacement Therapy: Patients Beware!. Preprints 2024, 2024100152. https://doi.org/10.20944/preprints202410.0152.v1

Abstract

Purpose: Using AI chatbots to seek healthcare information is becoming increasingly popular, yet misinformation and knowledge gaps persist regarding the risks and benefits of testosterone replacement therapy (TRT). We aimed to assess and compare the quality and readability of responses generated by four AI chatbots.

Materials and Methods: ChatGPT, Google Bard, Bing Chat, and Perplexity AI were asked the same eleven questions about TRT. Four reviewers evaluated the responses using the DISCERN and Patient Education Materials Assessment Tool (PEMAT) questionnaires. Readability was assessed with the Readability Scoring System v2.0, which calculates the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Kruskal-Wallis tests were performed using GraphPad Prism v10.1.0.

Results: Google Bard received the highest DISCERN and PEMAT scores, and Perplexity AI received the highest FRES and the best FKGL. Significant differences were found in PEMAT understandability between Bing Chat and Google Bard, in DISCERN scores between Bing Chat and Google Bard, in FRES between ChatGPT and Perplexity AI, and in FKGL between ChatGPT and Perplexity AI.

Conclusion: ChatGPT and Google Bard were the top performers in quality, understandability, and actionability. Although Perplexity AI scored highest on readability, its generated text still read at an eleventh-grade level. Perplexity AI stood out for its extensive use of citations; however, it gave repetitive answers despite the diversity of the questions posed. Google Bard provided highly detailed answers and offered additional value through visual aids.
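For context, the two readability metrics named in the abstract follow fixed, published formulas. A minimal sketch (assuming word, sentence, and syllable counts are supplied by an external tokenizer rather than computed here; this is an illustration, not the Readability Scoring System's implementation):

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score (FRES): higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (FKGL): approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical passage: 100 words, 5 sentences, 150 syllables
print(flesch_reading_ease(100, 5, 150))   # ~59.6 (fairly difficult)
print(flesch_kincaid_grade(100, 5, 150))  # ~9.9 (about 10th grade)
```

An eleventh-grade FKGL, as reported for Perplexity AI, thus corresponds to text well above the sixth-grade level commonly recommended for patient education materials.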

Keywords

artificial intelligence; misinformation; testosterone

Subject

Medicine and Pharmacology, Urology and Nephrology

