Chain-of-thought (CoT) is a prompting and training technique that elicits intermediate reasoning steps from a language model before it produces a final answer. Writing the reasoning out tends to improve performance on multi-step arithmetic, logic, planning, and code generation, and it makes the decision path easier to review.
What is Chain-of-Thought (CoT)?
CoT instructs the model to “think step by step,” optionally with few-shot exemplars that demonstrate decomposed reasoning. Three techniques make it more robust: supervised fine-tuning on rationale annotations, self-consistency sampling (sample several chains, then take a majority vote over their final answers), and reinforcement learning from process-level or outcome-level feedback. Variants include deliberate reasoning, program-of-thought (steps expressed as executable code), and tree-of-thought (several branches explored, then selected or reranked). Effective CoT keeps chains long enough to carry the reasoning without padding, uses domain-specific schemas for math and code, and integrates with tools such as calculators and code runners to verify individual steps.
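The self-consistency step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `sample_chain` stands in for whatever model call produces one reasoning chain, the `Answer:` marker is an assumed output convention, and `make_fake_sampler` is a hypothetical stub that returns canned chains so the example runs without a model.

```python
import itertools
import re
from collections import Counter
from typing import Callable, Optional, Tuple


def extract_answer(chain: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker (assumed convention)."""
    matches = re.findall(r"Answer:\s*(.+)", chain)
    return matches[-1].strip() if matches else None


def self_consistency(
    sample_chain: Callable[[str], str], question: str, n: int = 5
) -> Tuple[Optional[str], float]:
    """Sample n chains and majority-vote over their final answers.

    Returns (winning answer, share of parsed answers that agree with it).
    """
    answers = []
    for _ in range(n):
        answer = extract_answer(sample_chain(question))
        if answer is not None:  # skip chains with no parseable answer
            answers.append(answer)
    if not answers:
        return None, 0.0
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)


def make_fake_sampler() -> Callable[[str], str]:
    """Hypothetical stand-in for a model: cycles through canned chains."""
    canned = itertools.cycle([
        "3 boxes * 4 = 12. 12 + 2 = 14. Answer: 14",
        "3 * 4 = 12, plus 2 is 14. Answer: 14",
        "3 + 4 = 7, then 7 + 2 = 9. Answer: 9",
    ])
    return lambda question: next(canned)
```

With the stub sampler and `n=5`, four of the five chains end in 14, so the vote returns `("14", 0.8)`. The agreement share can serve as a rough confidence signal, though how well it tracks correctness depends on the model and task.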
Where it’s used and why it matters
CoT supports stronger reasoning on exams, planning, debugging, and analytics tasks. It discourages shortcut heuristics, surfaces assumptions, and lets answers be adjudicated by voting or external checks. In enterprise settings, it supports auditability for regulated decisions and helps agents structure plans into verifiable sub-steps.
Examples
- Math word problems: parse quantities, form equations, compute iteratively.
- Incident triage: outline hypotheses, test via logs/metrics, converge on root cause.
- Code: draft algorithm steps, implement, run tests, fix failing cases.
- Sales ops: reason about territories, quotas, and constraints before recommending actions.
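As one way to picture the math case, with tool verification of each step, the sketch below rechecks every arithmetic claim in a chain using a small, restricted calculator. The step format (an expression paired with the value the chain claims for it) and the names `safe_eval` and `check_steps` are assumptions made for illustration; a real chain would first need its steps parsed into this form.

```python
import ast
import operator
from typing import List, Tuple

# Only these binary operators are allowed in a step expression.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr: str) -> float:
    """Evaluate +, -, *, / over numeric literals; reject anything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


def check_steps(steps: List[Tuple[str, float]], tol: float = 1e-9) -> List[int]:
    """Return the indices of steps whose claimed value does not match."""
    return [
        i for i, (expr, claimed) in enumerate(steps)
        if abs(safe_eval(expr) - claimed) > tol
    ]


# Word problem: 3 boxes of 12 pencils, 5 given away. How many remain?
good_chain = [("3 * 12", 36), ("36 - 5", 31)]
bad_chain = [("3 * 12", 36), ("36 - 5", 30)]  # second step miscomputed
```

Here `check_steps(good_chain)` returns an empty list and `check_steps(bad_chain)` returns `[1]`, flagging the faulty subtraction. Walking the syntax tree instead of calling `eval` keeps model-written text from running arbitrary code.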
FAQs
- Does CoT reveal sensitive prompts? It can, because a chain may restate parts of the prompt or its inputs. Redact chain content before logging, and restrict chain output in production if disclosure risk is high.
- Is more chain always better? No. Overly long chains can introduce errors; self-consistency and tool checks help catch them.
- Can small models do CoT? With fine-tuning and scaffolding, small language models (SLMs) benefit, especially when paired with calculators or retrieval.
- How should CoT be evaluated? Use step-level correctness, final-answer accuracy, and process metrics such as contradiction rate.
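The three evaluation measures named in the last FAQ could be computed along the following lines. This is a sketch under stated assumptions: the `GradedChain` record and its fields are hypothetical, the per-step labels and contradiction flags are taken to come from a human or automated grader, and contradiction rate is defined here as the share of chains containing at least one contradiction, which is only one of several reasonable definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GradedChain:
    """One graded reasoning chain (hypothetical record layout)."""
    step_correct: List[bool]   # grader label for each step
    predicted: str             # final answer produced by the chain
    gold: str                  # reference answer
    has_contradiction: bool    # grader found two steps that conflict


def evaluate(chains: List[GradedChain]) -> Dict[str, float]:
    """Compute step accuracy, final accuracy, and contradiction rate."""
    n = len(chains)
    total_steps = sum(len(c.step_correct) for c in chains)
    correct_steps = sum(sum(c.step_correct) for c in chains)
    return {
        # share of all steps, pooled across chains, graded correct
        "step_accuracy": correct_steps / total_steps if total_steps else 0.0,
        # share of chains whose final answer matches the reference
        "final_accuracy": sum(c.predicted == c.gold for c in chains) / n if n else 0.0,
        # share of chains flagged with at least one contradiction
        "contradiction_rate": sum(c.has_contradiction for c in chains) / n if n else 0.0,
    }


sample = [
    GradedChain([True, True], predicted="31", gold="31", has_contradiction=False),
    GradedChain([True, False], predicted="30", gold="31", has_contradiction=True),
]
```

For `sample`, `evaluate` reports a step accuracy of 0.75, a final accuracy of 0.5, and a contradiction rate of 0.5. Reporting step and final accuracy together helps separate chains that reach a right answer through wrong steps from chains that reason correctly throughout.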
