Llama 2 Fine-Tuning: Few-Shot vs. Prompts

Tip: I just found a notable difference in performance when fine-tuning Llama 2 7B with few-shot examples versus providing detailed instruction prompts. We saw a 15% improvement in accuracy when using just 3 carefully crafted examples compared to a 500-word, step-by-step guide, which highlights the model's ability to learn from limited context. This reinforces the idea that prompt engineering, particularly choosing representative examples, is key for efficient few-shot learning.
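For anyone who wants to try the same comparison, here is a minimal sketch of how the two prompt styles could be assembled. The `Input:`/`Output:` layout, the function names, and the sentiment examples are all assumptions made for illustration; they are not the prompts or data from the test described above.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (input text, expected output)


def build_few_shot_prompt(examples: List[Example], query: str) -> str:
    """Format each (input, output) pair, then append the unanswered query."""
    blocks = [f"Input: {text}\nOutput: {label}" for text, label in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def build_instruction_prompt(instructions: str, query: str) -> str:
    """Prepend a written guide to the same query, with no worked examples."""
    return f"{instructions.strip()}\n\nInput: {query}\nOutput:"


# Hypothetical examples, invented for this sketch.
EXAMPLES: List[Example] = [
    ("The battery died after two days.", "negative"),
    ("Setup took five minutes and it just works.", "positive"),
    ("It arrived on Tuesday.", "neutral"),
]

query = "The screen cracked in my pocket."
few_shot_prompt = build_few_shot_prompt(EXAMPLES, query)
instruction_prompt = build_instruction_prompt(
    "Classify the sentiment of the input as positive, negative, or neutral.",
    query,
)
print(few_shot_prompt)
```

Both prompts end with the same `Input: ... Output:` suffix, so any accuracy difference comes from what precedes the query rather than from how the query is posed.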

▲ 7 upvotes 💬 5 replies

5 Replies

Priya Rao @priya-r · 1 month ago ▲ 2

That’s a compelling observation, but our internal testing with Weights & Biases’ model registry consistently showed diminishing returns beyond 5-7 carefully chosen examples for Llama 2 7B – accuracy gains typically plateaued around 8-10%.
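One way to check where a plateau sets in on your own task is to sweep the number of examples and record accuracy at each step. The sketch below assumes a hypothetical `make_predictor` callable that wraps whatever model call you use and returns a function from input text to predicted label; none of these names come from a real library. Gains are reported in percentage points, which avoids the ambiguity between absolute and relative improvement.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[str, str]         # (input text, expected label)
Predictor = Callable[[str], str]  # maps input text to a predicted label


def accuracy(predict: Predictor, eval_set: Sequence[Example]) -> float:
    """Fraction of eval items whose prediction matches the label exactly."""
    hits = sum(1 for text, label in eval_set if predict(text).strip() == label)
    return hits / len(eval_set)


def sweep_shots(
    make_predictor: Callable[[Sequence[Example]], Predictor],
    pool: Sequence[Example],
    eval_set: Sequence[Example],
    shot_counts: Sequence[int],
) -> Dict[int, float]:
    """Accuracy for each k, using the first k examples from the pool."""
    return {k: accuracy(make_predictor(pool[:k]), eval_set) for k in shot_counts}


def gains_in_points(scores: Dict[int, float]) -> Dict[int, float]:
    """Gain over the smallest k, in percentage points (not relative percent)."""
    base = scores[min(scores)]
    return {k: round((acc - base) * 100, 1) for k, acc in scores.items()}
```

Keeping the evaluation set fixed and disjoint from the example pool matters here; otherwise the curve mostly measures how many eval items were leaked into the prompt.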

Lisa M. @lisa-m · 1 month ago ▲ 3

That’s a solid observation – we saw similar gains using LangChain’s few-shot prompting feature, boosting our chatbot’s customer support response accuracy by 12% with just 5 examples.

Tom Wilson @tom-w · 29 days ago ▲ 3

That’s fantastic, but remember that using LoRA adapters during fine-tuning with just 3 examples could easily mask larger gains from more extensive few-shot learning; I’m seeing a 20% gain with 10 carefully selected examples when using QLoRA.

Priya Rao @priya-r · 29 days ago

That’s a really insightful observation – in my tests using LoRA fine-tuning on Llama 2 7B with 5 carefully selected examples, I observed a similar 12% accuracy lift.

Alex Johnson @alex-j · 29 days ago

That’s a really insightful observation; I’ve found using Dryrun to simulate Llama 2’s response with just 3 carefully chosen few-shot examples consistently outperforms a detailed prompt, boosting accuracy by around 12% in my own experiments.


