
1.6 – Hallucinations and Prompt Quality (Getting to Reliable Answers)

Stop fabricated AI answers by designing prompts that require sources, adding grounding, and setting clear escalation rules. Watch the lesson video for real failure cases and practical tactics you can apply today.

What you'll learn

  • Identify moments when models are likely to hallucinate, including thin prompt context, stale data, and pressure to answer quickly.

  • Write prompts as clear specifications that state the task, inputs, constraints, and output format.

  • Make honesty the default by requiring citations for claims or an explicit "I don’t know" when evidence is missing.

  • Ground responses with Retrieval-Augmented Generation (RAG) that quotes sources and avoids speculation or creative gap filling.

  • Measure reliability using citation coverage checks, spot audits of sources, and tracking accuracy over time.

  • Escalate uncertain or high risk topics to search or human review using confidence thresholds and safe fallback paths.

Lesson Overview

This lesson tackles one of the most common failures in AI use: outputs that sound confident yet are wrong. Hallucinations happen because models predict likely words, not truth. When your prompt is light on context, your data is out of date, or you push for fast answers, the error rate rises sharply. The goal here is to replace guesswork with systems that prefer evidence, cite sources, and decline when they do not know.

You will learn how to write prompts like specifications. That means defining the task, supplying inputs such as files or examples, naming constraints, and stating the format you expect back. You will also set one simple default: require sources or abstain. The video shows how this single change lifts reliability.
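A prompt specification like the one described above can be captured as a reusable template. This is a minimal sketch; the field names, wording, and the `build_prompt` helper are illustrative assumptions, not the lesson's exact wording:

```python
# A hypothetical prompt specification following the task / inputs /
# constraints / output-format structure. All wording here is illustrative.
PROMPT_SPEC = """\
Task: Answer the employee's question using ONLY the documents provided below.

Inputs:
{documents}

Constraints:
- Every claim must quote the source passage and name the document and section.
- If the documents do not support an answer, reply exactly: "I don't know."
- Do not speculate or fill gaps with general knowledge.

Output format:
1. Short answer (2-3 sentences).
2. Citations: quoted snippet, document name, section.

Question: {question}
"""

def build_prompt(documents: str, question: str) -> str:
    """Fill the specification template with concrete inputs."""
    return PROMPT_SPEC.format(documents=documents, question=question)
```

Keeping the specification as a single versioned template makes it easy to review edits and compare reliability before and after a change.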

Grounding adds another layer of safety. With Retrieval-Augmented Generation (RAG), the model pulls facts from a trusted knowledge base and quotes the exact snippets it used. If retrieval finds nothing relevant, the model should say so. No speculation. You will also see how to measure what the system produces, including tests for citation coverage and spot checks, and when to add human review for high impact work like legal or policy decisions. The lesson closes with examples of what good looks like, such as a contract tool that cites section numbers, and what to avoid, like a support bot that invents policy and erodes trust.
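The "quote or decline" rule above can be sketched in a few lines. This toy example uses keyword overlap as a stand-in for a real retriever, and the knowledge-base entries and threshold are made up for illustration:

```python
# Toy sketch of the "quote or decline" grounding rule. The knowledge base,
# the overlap scoring, and the min_overlap threshold are all illustrative
# assumptions; a real system would use a proper retriever.
from typing import Optional

KNOWLEDGE_BASE = [
    ("Leave Policy §2.1", "Employees accrue 1.5 vacation days per month."),
    ("Leave Policy §3.4", "Unused vacation days expire at year end."),
]

def retrieve(question: str, min_overlap: int = 2) -> Optional[tuple]:
    """Return the best (source, passage) match, or None if nothing is relevant."""
    q_words = set(question.lower().split())
    best, best_score = None, 0
    for source, passage in KNOWLEDGE_BASE:
        score = len(q_words & set(passage.lower().split()))
        if score > best_score:
            best, best_score = (source, passage), score
    return best if best_score >= min_overlap else None

def grounded_answer(question: str) -> str:
    """Quote the retrieved passage, or decline when retrieval finds nothing."""
    hit = retrieve(question)
    if hit is None:
        return "I don't know."          # decline instead of speculating
    source, passage = hit
    return f'"{passage}" ({source})'    # answer is the quoted snippet + source
```

The key design choice is that the decline path is structural: when retrieval returns nothing, no generation step runs at all, so there is nothing to hallucinate.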

Who This Is For

If you need AI outputs you can stand behind, this lesson is for you. It is useful for anyone who ships prompts, owns AI features, or reviews AI answers when errors carry real costs.

  • Product and ops teams that deliver AI features to customers or internal users
  • Customer support leaders who need accurate, policy-aligned responses
  • Legal, compliance, and risk teams that must prevent fabricated citations
  • Analysts and researchers who summarize documents or answer questions from files
  • Educators and trainers who want students to use AI without spreading falsehoods

Where This Fits in a Workflow

Use these methods when you define how your AI should answer and before you roll out a tool to a wider audience. Start by shaping the prompt as a clear specification and add a strict rule: cite or decline. Next, connect grounding so the system retrieves relevant passages and quotes them. Then set confidence thresholds and escalation paths for risky topics.
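The threshold-and-escalation step can be expressed as a small routing function. The topic list, the 0.5 cutoff, and the route names below are illustrative assumptions, not values from the lesson:

```python
# Hedged sketch of a confidence-threshold escalation rule. HIGH_RISK_TOPICS
# and the 0.5 retrieval-score cutoff are illustrative placeholders that a
# real deployment would tune against its own data.
HIGH_RISK_TOPICS = {"legal", "termination", "compensation"}

def route(topic: str, retrieval_score: float) -> str:
    """Decide whether to publish an answer, fall back, or escalate."""
    if topic in HIGH_RISK_TOPICS:
        return "escalate_to_human"      # high-risk topics always get review
    if retrieval_score < 0.5:           # low coverage: decline, don't guess
        return "fallback_i_dont_know"
    return "publish"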

Examples:

  • Contract review: The assistant retrieves clauses from a repository, quotes section text, and if it finds no match, returns "No termination clause found in this document."
  • Customer support: The bot answers only from approved policy pages, cites the page and section, and routes the case to a human when coverage is low or the topic is high risk.

Technical & Workflow Benefits

The old way: write a generic prompt, trust the model’s tone, and fix errors after they reach users. This leads to fabricated answers, false citations, refunds, and lost trust. It also creates hidden workload, because teams must triage complaints, rework outputs, and explain failures.

The improved approach: treat the prompt as a specification, require sources or an explicit "I don’t know", and ground answers with retrieval that quotes exact passages. Add simple checks for citation coverage and a human review path for high stakes cases. This reduces guesswork, shortens review cycles, and raises the share of answers you can publish as-is. In legal and policy-heavy work, it prevents invented case law and protects your organization from avoidable risk. In support, it reduces escalations caused by wrong answers and keeps responses consistent across the team.

Practice Exercise

Scenario: You maintain a knowledge base with five policy PDFs and want reliable answers to employee questions.

Steps:

  1. Write a prompt specification. Include:
  • Task: Answer employee questions using only the attached policy documents.
  • Inputs: The five PDFs.
  • Constraints: Quote the source passage and include the document name and section. If a claim lacks a source, respond with "I don’t know."
  • Output format: Short answer, followed by citations with quoted snippets.

  2. Add grounding. If you have a retrieval tool, connect the PDFs. If not, paste short excerpts from the documents into the prompt context. Require the model to select and quote the most relevant excerpt for every claim.

  3. Test and measure. Ask 10 realistic questions. Record:

  • Percent of answers with at least one citation and a direct quote
  • Percent of "I don’t know" responses when coverage is missing
  • Results of three spot checks where you open the cited source and verify the quote
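If you log each test question as a record, the first two metrics above can be computed directly. The field names in this sketch are assumptions about how you might structure the log:

```python
# Minimal sketch of the reliability metrics from the test step, assuming
# each trial is logged as a dict. Field names are illustrative.
trials = [
    {"has_citation": True,  "said_idk": False, "coverage": True},
    {"has_citation": False, "said_idk": True,  "coverage": False},
    # ... one record per question asked
]

def citation_coverage(trials) -> float:
    """Share of answered questions that include at least one citation."""
    answered = [t for t in trials if not t["said_idk"]]
    if not answered:
        return 0.0
    return sum(t["has_citation"] for t in answered) / len(answered)

def honest_abstention(trials) -> float:
    """Share of no-coverage questions where the model said "I don't know"."""
    uncovered = [t for t in trials if not t["coverage"]]
    if not uncovered:
        return 1.0
    return sum(t["said_idk"] for t in uncovered) / len(uncovered)
```

Tracking these two numbers over time, alongside manual spot checks of the quoted sources, gives you the trend line the lesson recommends.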

Reflection: Which prompt edits improved citation coverage the most, and where did the system still guess or overreach?

Course Context Recap

This lesson strengthens the course theme of getting to reliable answers. Earlier lessons introduced the idea that AI should cite or decline. Here, you turn that principle into daily practice using grounding, measurement, and clear escalation rules. Next, you will continue building guardrails and evaluation habits that adapt as models change. Keep going to see how these safety checks link with your broader AI workflows and how to evolve them as new failure modes appear.