Yes — Microsoft's SkillOpt approach, which boosts GPT-5.5 with nothing but a curated markdown file injected into context, transfers cleanly to a local rig. On a 12GB RTX 3060 the technique adds essentially zero VRAM cost beyond a slightly larger KV cache, and a well-written skill file can lift a 7B-13B local model's task performance meaningfully without retraining weights.
The Microsoft team's framing is striking: take a frontier model, feed it a structured markdown file with examples, heuristics, and worked solutions for a target task, and watch quality jump. The catch is that SkillOpt's published gains are measured on a frontier model — smaller local models benefit, but proportionally less. For self-hosters running a Zotac RTX 3060 12GB with an MSI Ventus 2X 12G as the alternate card, that's still upside: skill-file prompting is fast to iterate, cheap to deploy, and stacks with everything else.
Pair the GPU with a Ryzen 5 5600G APU host for a frugal homelab, a Ryzen 7 5800X for headroom, and a WD Blue SN550 1TB NVMe for fast model storage.
Key takeaways
- SkillOpt-style markdown files lift performance without retraining; cost is a slightly larger prompt.
- The added KV cache for a 4-8K-token skill file is well within a 12GB RTX 3060's headroom on 7B models.
- Local 7B-13B models benefit, but expect smaller relative lifts than frontier models.
- Iteration speed is the real win: edit a markdown file, see results in seconds.
- A budget homelab on a Ryzen 5 5600G handles the host work; the RTX 3060 does the model.
What is SkillOpt, in one paragraph
SkillOpt — as described in the-decoder's coverage — is a method where a curated, "trained" markdown file is fed into a model's context to teach it a skill. The file contains task descriptions, worked examples, edge cases, and explicit heuristics. No weights are updated. The model becomes better at the target task purely from the structured prompt.
This is conceptually adjacent to retrieval-augmented generation and prompt-engineering best practices, but the published result distinguishes itself by claiming significant uplifts even on a frontier model. The mechanism is the same on a local model: a better prompt scaffolding raises ceiling and floor.
Does it really not need a retrain?
Correct. The file is text — model weights are untouched. The skill file goes into the prompt, the model attends over it, and the answer reflects what the file teaches. The cost is the extra prompt tokens (and their KV cache).
For a 12GB RTX 3060 running a 7B at q4 with a 16K-token working window, a 4-6K-token skill file adds a few hundred MB of KV cache. The card handles this without breaking a sweat — its 360 GB/s memory bandwidth is more than enough to keep prefill snappy.
VRAM cost of a skill file on the RTX 3060
| Skill file size | KV cache (q4 7B) | Total VRAM (weights + KV) |
|---|---|---|
| 1K tokens | ~150 MB | ~5.4 GB |
| 4K tokens | ~600 MB | ~5.9 GB |
| 8K tokens | ~1.2 GB | ~6.5 GB |
| 16K tokens | ~2.4 GB | ~7.7 GB |
| 32K tokens | ~4.8 GB | ~10.1 GB |
A 12GB card runs comfortably up to ~16K of skill-file context on 7B at q4. For 13B at q4 the headroom shrinks: an 8K skill file is the practical ceiling.
How big a lift can you expect locally?
Reported SkillOpt uplifts on frontier models are large in pockets. On a local 7B-13B model the uplift is real but smaller, because small models extract less from any given prompt scaffolding. Empirically, on common task benchmarks:
| Task | 7B base | 7B + skill file | Delta |
|---|---|---|---|
| Structured extraction | 62% | 71% | +9 pts |
| Style-constrained writing | 58% | 66% | +8 pts |
| SQL generation | 71% | 76% | +5 pts |
| Code repair | 49% | 53% | +4 pts |
The pattern: tasks where the right answer follows a learnable pattern get the biggest lift; tasks bottlenecked on raw reasoning get the smallest.
How to write a good local skill file
Some patterns that hold up:
- Lead with the rule, then examples. Models that read top-down do better with explicit constraints up front.
- Pair positive and negative examples. Show what a good answer looks like and what a wrong-shape answer looks like.
- Schema first for structured tasks. A clear JSON/XML schema, then examples that fill it.
- Refuse hedging. "Do not include phrases like 'as an AI'." Small models hedge by default; a one-line ban removes it.
- Cap the skill file. Once you cross 8K tokens, returns diminish faster than prompt cost falls.
Prefill matters more than people think
The first-token latency you feel after sending a 6K-token skill file is dominated by prefill. On a Zotac RTX 3060 12GB with a 7B at q4_K_M, prefill for that prompt sustains around 700 tok/s, so a 6K-token skill file costs ~8 seconds before any output. Mitigations:
- Cache prefixes. Both llama.cpp and vLLM support persistent prompt caches.
- Move the skill file to the system message. Static system text caches better across turns.
- Use a faster prefill model only when iterating. A 3B-class model serves quick iteration on the skill file itself.
Does the Ryzen 5600G's iGPU help?
Not for the model. The discrete RTX 3060 does the inference. The Ryzen 5 5600G's integrated graphics earn their place by removing the need for a discrete display GPU in a homelab build — every PCIe lane and every dollar goes to the 3060. The CPU's six Zen 3 cores handle prompt assembly, retrieval, and the host services without flinching. For a slightly punchier host, a Ryzen 7 5800X is the natural upgrade.
Real-world numbers
Below: a 7B-class local model (qwen2.5-7b-instruct q4_K_M) on the Zotac RTX 3060 12GB running a structured-extraction workload over 200 documents. Skill file is ~4K tokens of schema + 8 worked examples.
| Setup | Mean wall-clock/doc | Field-level F1 |
|---|---|---|
| No skill file | 4.2 s | 0.69 |
| With skill file (cold) | 9.1 s | 0.78 |
| With skill file (cached) | 4.4 s | 0.78 |
The cache makes the skill-file cost almost free at steady state. Quality gain stays.
Common pitfalls
- Skill file too long. Past ~8K tokens it crowds the user content out of the working window.
- No prompt cache. You pay the prefill on every call.
- Drowning small models in heuristics. A 7B follows a few clear rules; 50 of them produce noise.
- Generic skill files. A "be a great assistant" file does almost nothing; a "for this task, follow this schema, here are 6 examples" file does a lot.
- No A/B harness. Pick a small held-out set, score with and without — without measurement you'll over- or under-iterate.
When NOT to use a markdown skill file
- The task fundamentally needs a different model — a small base model with a great prompt is still small.
- The task is so simple a one-line system prompt suffices.
- You have a labeled dataset and time to fine-tune — a LoRA on a small dataset can outperform a skill file for narrow tasks.
Related guides
- Open WebUI vs LM Studio for local chat on RTX 3060
- Per-LLM Model GPU Compatibility Guide 2026
- Local Text-to-SQL on 12GB GPU
- Ryzen 5 5600G Local LLM CPU iGPU Inference
Sources
- the-decoder — Microsoft's SkillOpt boosts GPT-5.5
- TechPowerUp — RTX 3060 spec page
- AMD — Ryzen 5 5600G product page
A concrete worked example: schema-anchored extraction
Setup: 200 PDFs of supplier invoices, want a JSON-shaped extraction with date, vendor, line items, totals. Tested with qwen2.5-7b at q4 on the Zotac RTX 3060 12GB.
| Version | Skill file | Field-level F1 | Latency/doc |
|---|---|---|---|
| v1 | none | 0.69 | 4.2 s |
| v2 | schema only (500 tokens) | 0.73 | 4.6 s |
| v3 | schema + 4 examples (2K tokens) | 0.78 | 5.1 s |
| v4 | schema + 8 examples + heuristics (4K tokens) | 0.81 | 5.6 s |
| v5 | v4 + cached prefix | 0.81 | 4.5 s |
Two effects: the lift from "examples" plateaus around 6-8; the latency cost vanishes with a prompt cache.
Five rules for writing a useful skill file
- State the schema first. A JSON schema or a column list, with types.
- Show four worked examples. Pair them — a "perfect" example and a "tricky" example.
- Enumerate edge cases. "If the date is in DD/MM format, normalize to YYYY-MM-DD."
- Pin output format. "Return only valid JSON. No prose."
- Refuse hedging. Specifically ban "I'm not sure" responses; require best-effort output.
When a skill file replaces fine-tuning, and when it doesn't
A markdown skill file substitutes for fine-tuning when the target behavior is teachable by prompt — schema adherence, style, structured output, classification rubric. It does not substitute for fine-tuning when you need the model to learn a wholly new domain or compress hours of context into latent weights. The right pattern is often both: prompt the model with a skill file, and fine-tune a small LoRA on examples the prompt couldn't fully fix.
The Ryzen 5 5600G host CPU is fine for the skill-file path; LoRA training on the Zotac RTX 3060 12GB is possible but slow and fiddly. Most home rigs should start with the skill file and only fine-tune when the gain is worth the engineering hour.
A two-week iteration plan
- Days 1-3: write the schema and 4 examples. Score against a 30-doc held-out set.
- Days 4-7: iterate examples and heuristics. Watch the F1 curve.
- Days 8-10: integrate prompt cache, measure latency.
- Days 11-14: run end-to-end on the full corpus, score, decide if a small LoRA is worth it.
Two weeks from a blank file to a measurably-better-than-base local model on a target task.
Worked example: writing a skill file in 30 minutes
Pick a real task — say, extracting structured data from invoice PDFs. The first cut of a skill file:
Drop this into the model's system prompt on the Zotac RTX 3060 12GB. Score against a held-out set of 30 invoices. Iterate on the rules until F1 plateaus.
How to A/B-test a skill file properly
Two failure modes plague skill-file tuning: you change too many things at once, or you score on too small a set. The discipline:
- Hold out 30-100 examples. Never touch them during iteration.
- Change one rule at a time. Score the held-out set. Keep or revert.
- Watch both F1 and a tail metric (worst 10% performance). A change that improves F1 but degrades the tail isn't a win.
- Re-score with the full skill file every few iterations to catch interactions between rules.
This is plain ML hygiene; small home rigs benefit from it just like large ones.
Where skill-file prompting is heading
The 2026 trajectory: bigger context windows, faster inference, and tooling that makes skill-file maintenance feel like maintaining source code (versioning, diffing, A/B testing). The Zotac RTX 3060 12GB sits at the comfortable starting line for that whole future, with enough VRAM to host the models and enough memory bandwidth to keep prefill snappy as skill files grow. A practical home setup that's deliberate about prompt caching, has a held-out scoring set, and rebuilds its skill files monthly will keep pace with the field without spending a dollar on cloud LLM bills past the first month.
