Hallucination
When an AI model generates information that sounds plausible but is factually incorrect or fabricated.
A hallucination in the AI context occurs when a large language model generates output that sounds confident and plausible but is factually incorrect, fabricated, or nonsensical. The model might cite studies that do not exist, invent statistics, attribute quotes to the wrong person, or describe features of a product that do not exist. The term "hallucination" reflects the fact that the model is "perceiving" patterns that are not grounded in reality.
Why it matters: hallucinations are the primary risk in using LLMs for any factual or high-stakes application. If you use an LLM to write a blog post and it invents a statistic, you publish misinformation. If an AI-powered customer support bot hallucinates product features, you create false expectations. If an AI agent makes business decisions based on fabricated data, the consequences can be severe. Understanding hallucination risk is essential for anyone deploying AI in production.
Why they happen: LLMs generate text by predicting the most likely next token based on patterns learned during training. They do not have a "truth database" or the ability to verify facts. When the model encounters a prompt about something it has limited training data on, or when it needs to fill a gap in its knowledge, it generates plausible-sounding text based on patterns rather than facts. The model is always generating, never verifying.
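The mechanics above can be sketched with a toy next-token distribution. This is an illustrative simplification, not real model code: the logit values and tokens are invented, but they show why a decoder always emits *some* plausible continuation regardless of whether it is true.

```python
import math

# Toy next-token distribution for "The study was published in ...".
# The scores come purely from pattern statistics; nothing in the
# decoding step can check which continuation is factually correct.
logits = {"2019": 2.1, "2020": 2.0, "2018": 1.7, "Nature": 0.9}

def softmax(scores):
    """Convert raw logits into a probability distribution."""
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(logits)

# Greedy decoding emits the most probable token -- the model is
# always generating, never verifying.
next_token = max(probs, key=probs.get)
```

Here the model confidently outputs "2019" even if the study was actually published in 2020, because "2019" merely scored highest under its learned patterns.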
How to reduce hallucinations: use RAG (Retrieval-Augmented Generation) to ground the model's responses in verified data. Include explicit instructions in your prompts ("Only use information from the provided context. If you do not know the answer, say so."). Lower the temperature parameter, which controls randomness; lower values produce more conservative output. Implement fact-checking layers where a second LLM or automated system verifies the first model's output. For structured data tasks, validate outputs against schemas or databases.
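The first two mitigations can be combined into a single grounded prompt. This is a minimal sketch with invented example data; the `build_grounded_prompt` helper is hypothetical, not part of any library.

```python
def build_grounded_prompt(context: str, question: str) -> str:
    """Assemble a RAG-style prompt that confines the model to the
    retrieved context and explicitly licenses "I don't know"."""
    return (
        "Only use information from the provided context. "
        'If the answer is not in the context, say "I don\'t know."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    context="Acme Pro costs $49/month and includes 5 seats.",
    question="Does Acme Pro include SSO?",
)
# The prompt would then be sent to the model with a low temperature
# setting (e.g. temperature=0.2 in most chat completion APIs);
# the network call is omitted here.
```

With the grounding instruction in place, a well-behaved model should answer "I don't know" rather than inventing an SSO feature that is not in the context.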
Detection strategies: implement automated checks on LLM output. For factual claims, cross-reference against known databases. For code generation, run the generated code. For content, compare key claims against source documents. Human review remains the most reliable detection method for critical use cases. Some organizations use a "confidence scoring" approach where the LLM rates its own confidence and flags low-confidence outputs for human review.
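One of the cheapest automated checks described above, cross-referencing factual claims against a source, can be approximated for numeric claims with a few lines of code. This is a rough sketch: the regex and the substring check are deliberately naive, and the sample pricing text is invented.

```python
import re

def unverified_numbers(output: str, source: str) -> list[str]:
    """Return numeric claims in the LLM output that never appear in
    the source document -- candidates for human review."""
    claimed = re.findall(r"\$?\d+(?:\.\d+)?%?", output)
    return [n for n in claimed if n not in source]

source = "Plan A costs $29 and has a 14-day trial."
draft = "Plan A costs $29, offers a 30-day trial, and saves 40%."

flags = unverified_numbers(draft, source)
# flags -> ["30", "40%"]: the price checks out, but the trial
# length and the savings figure need human verification.
```

A production version would normalize units and currencies, but even this crude filter catches the most dangerous class of hallucination: a fabricated number stated with confidence.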
Common mistakes: assuming a confident-sounding response is accurate. Not implementing any verification layer in LLM-powered workflows. Blaming the model when the real issue is insufficient context in the prompt. Using LLMs for tasks that require 100% factual accuracy without a verification step. Not educating team members about hallucination risk before they start using AI tools.
Practical example: a marketing team uses an LLM to draft competitor comparison pages. The first draft includes specific pricing for three competitors, two of which are accurate and one of which is fabricated. Without human review, this would publish incorrect pricing information. The team implements a workflow: the LLM drafts the content, flags any specific claims with [VERIFY] tags, and a team member checks each tagged claim against the competitor's actual pricing page before publishing. This hybrid approach leverages the LLM's speed while preventing hallucinated facts from reaching the public.
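The review step in this workflow is easy to automate up to the hand-off point. The sketch below assumes a paired [VERIFY]...[/VERIFY] tagging convention, which is an illustrative choice; the actual tag format would be whatever your prompt instructs the model to emit.

```python
import re

def claims_to_verify(draft: str) -> list[str]:
    """Extract every claim the LLM wrapped in [VERIFY] tags so a
    reviewer can check each one before publishing."""
    return re.findall(r"\[VERIFY\](.*?)\[/VERIFY\]", draft, flags=re.DOTALL)

draft = (
    "Competitor X charges [VERIFY]$99/month[/VERIFY] for its base tier, "
    "while Competitor Y starts at [VERIFY]$149/month[/VERIFY]."
)

checklist = claims_to_verify(draft)
# checklist -> ["$99/month", "$149/month"]
```

Routing this checklist into a review queue gives the team a concrete, enumerable to-do list instead of asking a human to re-read the whole draft for possible fabrications.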
Related terms
Large Language Model. A neural network trained on massive text data that can generate, summarize, and reason about language.
Retrieval-Augmented Generation. A technique that feeds relevant external data into an LLM at query time to ground its responses in facts.
Prompt Engineering. The practice of crafting inputs to an LLM to reliably produce desired outputs, including system prompts and few-shot examples.
Fine-Tuning. Training a pre-existing model on a specific dataset to improve its performance on a narrow task or domain.