Kornhauser Health Sciences Library

Generative Artificial Intelligence (AI): Limitations

Guide to widely used generative AI systems, such as ChatGPT, and their applications in health sciences research.

The Problem of "Hallucination"

Large language models (LLMs) cannot "understand" user input; they can only identify linguistic patterns and imitate them.

By default, language models optimize the next word prediction objective, which is only a proxy for what we want these models to do.

Ouyang et al. (2022)1

ChatGPT is fundamentally a text transformer—not an information retrieval system.

Walters and Wilder (2023)2

Consequently, LLMs will sometimes output text that appears credible but has no factual basis. In particular, LLMs have a known tendency to cite non-existent sources in convincing APA style. Even when citing real sources, LLMs may paraphrase them inaccurately.
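The pattern-imitation described above can be illustrated with a toy next-word predictor. The sketch below is a deliberately tiny bigram model built on an invented three-sentence corpus; real LLMs are vastly larger and more sophisticated, but the core training objective is the same: given the words so far, emit a statistically plausible next word, with no check against any source of truth.

```python
import random
from collections import defaultdict

# An invented miniature corpus (purely illustrative).
corpus = (
    "the study found that the drug reduced symptoms . "
    "the study found that the placebo reduced costs . "
    "the trial found that the drug reduced mortality ."
).split()

# Count which words follow each word (a bigram table).
following = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev].append(nxt)

def generate(start, n_words, seed=0):
    """Extend a prompt by repeatedly sampling a likely next word."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(n_words):
        words.append(rng.choice(following[words[-1]]))
    return " ".join(words)

print(generate("the", 8))
```

Every word transition this model produces was observed somewhere in its corpus, so its output reads fluently; yet it can happily assemble sentences (e.g., a claim that "the placebo reduced mortality") that no source ever stated. That is the mechanism of "hallucination" in miniature: fluent recombination of patterns, not retrieval of facts.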

If a prompt is ambiguously phrased, LLMs may (wrongly) guess user intent rather than asking clarifying questions or admitting that they do not "understand" what is being asked. Indeed, they can be "confidently wrong".3 This is, perhaps, because LLMs are optimized to provide answers satisfying to human users, who are biased in favor of confident responses over doubtful or noncommittal ones.2

ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as [during reinforcement learning training] there's currently no source of truth.

OpenAI (2022)4
  1. Ouyang L, Wu J, Jiang X, Almeida D, Wainwright CL, Mishkin P, et al. Training language models to follow instructions with human feedback. arXiv [Preprint]. 2022:2203.02155.
  2. Walters WH, Wilder EI. Fabrication and errors in the bibliographic citations generated by ChatGPT. Sci Rep. 2023;13(1):14045. Epub 20230907. PubMed PMID: 37679503; PubMed Central PMCID: PMC10484980.
  3. Gravel J, D’Amours-Gravel M, Osmanlliu E. Learning to Fake It: Limited Responses and Fabricated References Provided by ChatGPT for Medical Questions. Mayo Clinic Proceedings: Digital Health. 2023;1(3):226-34.
  4. Introducing ChatGPT [Internet]. San Francisco: OpenAI; 2022 Nov 30 [cited 2023 Oct 17]. Available from:

The "Black Box" Problem

  • AI increasingly relies on paradigms, such as deep learning, in which the developers themselves do not understand how their models make decisions.
  • For particularly complex models, such as LLMs, it becomes impossible to document and reproduce research methods.
    • As with humans learning from experience, an AI's methods are more comparable to intuition than to a set of instructions that can be followed or reproduced.1
  • For example, if we use an LLM to conduct a literature search, we cannot document the search strategy or reproduce the results:
    • For most LLMs, the same prompt will produce a different output each time.
    • We cannot determine or appraise the criteria by which articles were included or excluded.
    • This is antithetical to the principles and practice of systematic search, which is a cornerstone of evidence-based practice.
  1. Bathaee Y. The Artificial Intelligence Black Box and the Failure of Intent and Causation. Harv J Law Technol. 2018;31(2):889-938. Available from:

Algorithmic Bias

As machines, AI systems may give a false impression of impartiality and objectivity. In reality, they are products of the human-created data used to train them and of the choices made by their human developers. Thus, AI is subject to many of the same biases and errors as humans are.

Some researchers are concerned that existing datasets underrepresent particular sociodemographic groups;1 models trained on such data may therefore treat those groups inequitably. It may be possible to counter this issue through careful selection of training data.2 Racial and political biases have already been observed in the outputs of ChatGPT.3-5

  1. Arias-Garzon D, Tabares-Soto R, Bernal-Salcedo J, Ruz GA. Biases associated with database structure for COVID-19 detection in X-ray images. Sci Rep. 2023;13(1):3477. Epub 20230301. PubMed PMID: 36859430; PubMed Central PMCID: PMC9975856.
  2. Wang R, Chaudhari P, Davatzikos C. Bias in machine learning models can be significantly mitigated by careful training: Evidence from neuroimaging studies. Proc Natl Acad Sci U S A. 2023;120(6):e2211613120. Epub 20230130. PubMed PMID: 36716365; PubMed Central PMCID: PMC9962919.
  3. Deshpande A, Murahari V, Rajpurohit T, Kalyan A, Narasimhan K. Toxicity in ChatGPT: Analyzing Persona-assigned Language Models. arXiv [Preprint]. 2023.
  4. Baum J, Villasenor J. The politics of AI: ChatGPT and political bias [Internet]. Washington (DC): Brookings Institution; 2023 May 8 [cited 2023 Oct 17]. Available from:
  5. Rozado D. The Political Biases of ChatGPT. Social Sciences. 2023;12(3).

Privacy and Security

We conclude that chatbots cannot comply with the Health Insurance Portability and Accountability Act (HIPAA) in any meaningful way despite industry assurances.

Marks and Haupt (2023)1
  • Inputting even deidentified data into a chatbot may give the developers enough information to make inferences about a patient's health or a clinician's prescribing practices.
    • Such information is extremely valuable and can be sold to advertisers and data brokers.
  • Experts can use AI to reidentify data.
  • Prompting is a two-way street—chatbots prompt the user with follow-up questions.
    • The FTC has expressed concerns about the ability of chatbots to gain undeserved trust from users.2 Healthcare providers may be tempted to divulge more than they originally intend.

The answer is to use chatbots sparingly despite the temptation to flood them with clinical information.

Marks and Haupt (2023)1
  1. Marks M, Haupt CE. AI Chatbots, Health Privacy, and Challenges to HIPAA Compliance. JAMA. 2023 Jul 25;330(4):309-310. PMID: 37410450.
  2. Atleson M. The Luring Test: AI and the engineering of consumer trust [Internet]. Washington (DC): Federal Trade Commission, Division of Advertising Practices; 2023 May 1 [cited 2023 Oct 23]. Available from:

Authorship and Research Integrity

Chatbots (such as ChatGPT) should not be listed as authors because they cannot be responsible for the accuracy, integrity, and originality of the work.

International Committee of Medical Journal Editors (2023)1

The general consensus among prominent scientific publishing organizations1-5 is that AI models cannot be credited as authors because they cannot be held accountable for their statements. Human authors alone must accept the responsibility of authorship. At the same time, presenting the output of an AI model as your own work is unethical and compromises the integrity of your research.

Therefore, if you use these tools in your research:

  • Use them to support and complement your own efforts, not as a replacement for them.
  • Declare that you have used AI tools, including which ones and how you used them.
  1. Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals [Internet]. International Committee of Medical Journal Editors; 2023 May [cited 2023 Oct 13]. 19 p. Available from:
  2. Tools such as ChatGPT threaten transparent science; here are our ground rules for their use. Nature. 2023;613(7945):612. doi: 10.1038/d41586-023-00191-1. PubMed PMID: 36694020.
  3. Flanagin A, Bibbins-Domingo K, Berkwits M, Christiansen SL. Nonhuman "Authors" and Implications for the Integrity of Scientific Publication and Medical Knowledge. JAMA. 2023;329(8):637-9. doi: 10.1001/jama.2023.1344. PubMed PMID: 36719674.
  4. Hosseini M, Rasmussen LM, Resnik DB. Using AI to write scholarly publications. Account Res. 2023:1-9. Epub 20230125. doi: 10.1080/08989621.2023.2168535. PubMed PMID: 36697395; PubMed Central PMCID: PMC10366336.
  5. Thorp HH. ChatGPT is fun, but not an author. Science. 2023;379(6630):313. Epub 20230126. doi: 10.1126/science.adg7879. PubMed PMID: 36701446.

Other Limitations

  • Outputs are often unnecessarily verbose and repetitive.
  • Because developers attempt to moderate harmful content, LLMs are sometimes excessively cautious.
  • Conversely, because those moderation practices cannot account for everything, some harmful outputs remain possible.