Corrections: - File count: 22 → 23 files (actual count verified) - Line count: ~6,200 → ~5,600 lines (verified with wc -l) - Version history: Listed all 6 specialties explicitly (itil, devops, prompt-engineering, data-analysis, sccm, TEMPLATE) - Added LEGACY.md to full documentation list All metrics now reflect actual v6 system state.
277 lines
8.8 KiB
Markdown
277 lines
8.8 KiB
Markdown
## description: "A consolidated guide covering Chain-of-Thought (CoT) methods, advanced variants, program-aided reasoning, verification frameworks, and a documentation review checklist for authors and reviewers."
|
|
|
|
## A Guide to Advanced Reasoning and Problem-Solving Techniques
|
|
|
|
## Purpose and audience
|
|
|
|
This document consolidates foundational and advanced prompting techniques that help Large Language Models (LLMs) solve complex reasoning tasks. It's written for prompt engineers, AI researchers, documentation writers, and reviewers who need a practical reference and a checklist for producing and evaluating CoT-style prompts and artifacts.
|
|
|
|
## TL;DR
|
|
|
|
* Use Chain-of-Thought (CoT) to get LLMs to expose intermediate reasoning steps.
|
|
* Start with Zero-Shot CoT for quick wins; adopt Few-Shot or Auto-CoT when you can supply curated demonstrations.
|
|
* Improve robustness with Self-Consistency, Least-to-Most, and Plan-and-Solve techniques.
|
|
* Offload exact computation via Program-of-Thoughts (PoT) or Program-Aided Language models (PAL).
|
|
* Reduce hallucination and factual errors with Chain-of-Verification (CoVe).
|
|
|
|
## 1. Foundational CoT Techniques
|
|
|
|
These are the primary methods for implementing CoT prompting. For a detailed implementation guide and prompt templates, see the [Chain-of-Thought Deep Dive](http://./style.cot.instructions.md).
|
|
|
|
### 1.1 Few-Shot CoT
|
|
|
|
Description: Provide a small set (3-4 recommended) of demonstrations that include: question, step-by-step reasoning (the chain), and the final answer.
|
|
|
|
When to use:
|
|
|
|
* Complex tasks with consistent reasoning structure.
|
|
* When you can author high-quality, diverse examples.
|
|
|
|
Pros/Cons:
|
|
|
|
* Strong guidance for the model.
|
|
* Manual and time-consuming to craft good demonstrations.
|
|
|
|
Tips:
|
|
|
|
* Keep examples concise but complete.
|
|
* Vary difficulty slightly to improve generalization.
|
|
|
|
### 1.2 Zero-Shot CoT
|
|
|
|
Description: Trigger reasoning by appending a simple phrase such as "Let's think step by step." No examples required.
|
|
|
|
When to use:
|
|
|
|
* Fast experiments, prototyping, or when demonstration data is unavailable.
|
|
|
|
Pros/Cons:
|
|
|
|
* No manual examples required.
|
|
* May be less reliable than high-quality Few-Shot demonstrations for hard problems.
|
|
|
|
### 1.3 Auto-CoT (Automatic CoT)
|
|
|
|
Description: Automates demonstration selection using clustering of questions and Zero-Shot CoT to generate reasoning chains for representative samples.
|
|
|
|
When to use:
|
|
|
|
* Large datasets where manual demo creation is impractical.
|
|
|
|
Pros/Cons:
|
|
|
|
* Scales to large datasets; improves diversity of prompts.
|
|
* Requires a pipeline for clustering and auto-generation; quality depends on clustering and zero-shot outputs.
|
|
|
|
### 1.4 Retrieval-Augmented Generation (RAG)
|
|
|
|
Description: Enhances LLM responses by integrating external knowledge sources, reducing hallucinations and providing up-to-date, verifiable information.
|
|
|
|
When to use:
|
|
|
|
* When answers require domain-specific, real-time, or proprietary information not present in the model's training data.
|
|
|
|
Pros/Cons:
|
|
|
|
* Improves accuracy and trustworthiness.
|
|
* Adds complexity and latency due to the retrieval step.
|
|
|
|
For detailed implementation patterns, see the [RAG Deep Dive](http://./style.rag.instructions.md).
|
|
|
|
## 2. Advanced CoT Variants
|
|
|
|
### 2.1 Self-Consistency
|
|
|
|
How it works:
|
|
|
|
* Sample multiple reasoning trajectories from the model (different decoding seeds or temperature).
|
|
* Aggregate answers via majority vote or other consensus method.
|
|
|
|
Why it helps:
|
|
|
|
* Reduces sensitivity to single-path errors; harnesses diversity for more reliable answers.
|
|
|
|
Costs:
|
|
|
|
* Increased token consumption and compute.
|
|
|
|
### 2.2 Least-to-Most Prompting
|
|
|
|
How it works:
|
|
|
|
1. Decompose a hard problem into simpler sub-problems.
|
|
2. Solve sub-problems sequentially, passing prior solutions forward.
|
|
|
|
Why it helps:
|
|
|
|
* Breaks down complexity and reduces error accumulation.
|
|
|
|
### 2.3 Tree-of-Thought (ToT)
|
|
|
|
How it works:
|
|
|
|
* The model explores multiple reasoning paths (branches) simultaneously.
|
|
* It self-evaluates and prunes less promising branches, pursuing the most logical path.
|
|
|
|
Why it helps:
|
|
|
|
* Overcomes single-path failures common in standard CoT. Excellent for problems with complex decision spaces.
|
|
|
|
For detailed implementation guides and prompt templates, see the [Tree-of-Thought Deep Dive](http://./style.tot.instructions.md).
|
|
|
|
Edge cases:
|
|
|
|
* Decomposition quality matters. Poor decompositions can hurt performance.
|
|
|
|
## 3. Program-Aided Reasoning
|
|
|
|
Offload deterministic computation to a real interpreter to avoid LLM numerical errors and enforce correctness.
|
|
|
|
### 3.1 Program-of-Thoughts (PoT)
|
|
|
|
* Prompt the LLM to output a program (commonly Python) that implements the reasoning.
|
|
* Execute the program in a trusted interpreter and return results.
|
|
|
|
Benefits:
|
|
|
|
* Accurate arithmetic and deterministic steps.
|
|
* Easier to test, debug, and unit-test logic.
|
|
|
|
Risks:
|
|
|
|
* Needs secure sandboxing for arbitrary code execution.
|
|
|
|
### 3.2 Program-Aided Language Models (PAL)
|
|
|
|
* Mixes natural language reasoning with interleaved code snippets for computation.
|
|
* Execute code snippets and feed results back into the reasoning chain.
|
|
|
|
When to prefer PoT vs PAL:
|
|
|
|
* PoT: fully program-first for heavy computation.
|
|
* PAL: when you want readable reasoning interleaved with code.
|
|
|
|
## 4. Verification and Refinement Techniques
|
|
|
|
### 4.1 Chain-of-Verification (CoVe)
|
|
|
|
A 4-step self-verification loop designed to reduce hallucinations:
|
|
|
|
1. Generate baseline response.
|
|
2. Plan verification questions targeted at weak claims.
|
|
3. Independently answer each verification question (avoid bias from baseline).
|
|
4. Produce a final, corrected answer using verification results.
|
|
|
|
When to use:
|
|
|
|
* High-stakes outputs or when factuality/trustworthiness matters.
|
|
|
|
Trade-offs:
|
|
|
|
* Extra costs and latency; significant gains in factual accuracy when verification steps are well-designed.
|
|
|
|
## 5. Practical Contract for Prompt Components
|
|
|
|
* Inputs: Natural language problem, (optional) demonstration set, optional execution sandbox for code.
|
|
* Outputs: Final answer, optional reasoning trace, and (when used) executed program and program output.
|
|
* Error modes: Calculation errors, omitted steps, nondeterminism, unsafe code in generated programs.
|
|
|
|
Edge cases to plan for:
|
|
|
|
* Empty or ambiguous user input.
|
|
* Very large or adversarial inputs.
|
|
* Long multi-step reasoning paths that exceed token limits.
|
|
* Security concerns when executing generated code.
|
|
|
|
Testing:
|
|
|
|
* Unit test with representative problems (happy path + 1-2 edge cases).
|
|
* Smoke test for program-execution paths (syntactic correctness and sandbox safety).
|
|
|
|
## 6. Documentation Review Checklist (for authors & reviewers)
|
|
|
|
Use this checklist when authoring or reviewing prompts, examples, and docs:
|
|
|
|
### Content & Accuracy
|
|
|
|
* Technical claims are supported or labeled as conjecture.
|
|
* References (papers, datasets) included where appropriate.
|
|
* No PII or unsafe content.
|
|
|
|
### Clarity & Readability
|
|
|
|
* Active voice and short sentences.
|
|
* Headings are front-loaded, under eight words when possible.
|
|
* Examples are minimal, runnable, and annotated.
|
|
|
|
### Formatting & Style
|
|
|
|
* Use inline code for tokens/code and fenced blocks for runnable snippets.
|
|
* Callouts: Note / Important / Warning used properly.
|
|
* Tables fully filled or marked N/A.
|
|
|
|
### Reproducibility
|
|
|
|
* Include exact prompt templates and variables using ${var} or clear placeholders.
|
|
* If code execution is required, include sandbox instructions and safety notes.
|
|
|
|
### Verification
|
|
|
|
* Add a suggested verification plan (CoVe-like) for non-trivial claims.
|
|
* Provide unit tests or example inputs/outputs when possible.
|
|
|
|
### Actionable Feedback (for reviewers)
|
|
|
|
* Offer concrete rewrites for unclear paragraphs.
|
|
* Convert long prose into numbered steps where sequence matters.
|
|
|
|
## 7. Minimal Examples
|
|
|
|
Zero-Shot CoT trigger (pseudoprompt):
|
|
|
|
Q: [problem statement]
|
|
|
|
A: Let's think step by step.
|
|
|
|
Few-Shot CoT (structure):
|
|
|
|
Q: Example question 1
|
|
|
|
A: [Step 1]. [Step 2]. The answer is X.
|
|
|
|
Q: New question
|
|
|
|
A:
|
|
|
|
PoT example sketch:
|
|
|
|
# LLM outputs a Python function that implements the logic
|
|
|
|
def solve(input):
|
|
|
|
# compute
|
|
|
|
return result
|
|
|
|
# Runner executes the function and returns the printed/returned value
|
|
|
|
## 8. Security and Operational Notes
|
|
|
|
* Do not execute generated code without sandboxing and resource limits.
|
|
* Log program executions and outputs for auditing.
|
|
* Rate-limit Self-Consistency or multi-sample methods to control cost.
|
|
|
|
## 9. Next Steps and Suggested Improvements
|
|
|
|
* Add curated Few-Shot demonstrations (3-4) as a companion demos/ file.
|
|
* Implement a small Auto-CoT pipeline (clustering + generator) and include reproducible scripts.
|
|
* Add unit tests for PoT-generated code (example harness + sandbox instructions).
|
|
|
|
## 10. References & Further Reading
|
|
|
|
(Representative pointers - include exact citations when publishing)
|
|
|
|
* Chain-of-Thought prompting literature.
|
|
* Auto-CoT and Self-Consistency papers.
|
|
* Program-of-Thoughts and Program-Aided Language Models (PoT, PAL).
|
|
* Chain-of-Verification (CoVe) verification methods. |