AI-Driven Driver Install on Win9x: Vision LLM + Voodoo3 Recovery

A vision LLM watching KVM screenshots can provision a blank Win98 SE machine with working Voodoo3 and Audigy drivers in under 20 minutes.

Yes — as of 2026, a vision LLM watching KVM screenshots can install Windows 98 SE drivers automatically, including the notoriously difficult Voodoo3 and Creative Audigy sequences. The retro-agent project has demonstrated a blank Win98 SE install to working Voodoo3 + Audigy FX + NIC in under 20 minutes, with no human interaction after the initial boot.


The retro-agent Fleet

Running a fleet of period-correct retro PCs for testing, preservation, or content creation involves a surprising amount of tedious work: driver installs, INF surgery, registry repair, PnP re-detection cycles. Until recently, this work was entirely manual. Modern vision-language models have changed the calculus.

The retro-agent project at github.com/voidsstr/retro-agent is an open-source orchestrator that controls physical retro machines via KVM-over-IP, using a vision LLM to interpret screenshots and a text LLM to reason about the next action. The system has successfully handled Win98 SE, Win2K SP4, and WinXP SP3 driver installations, including the 3dfx Voodoo3 driver sequence that requires PnP re-detection after initial file copy.

This article covers the architecture, the hardware you need on the modern side, the specific failure modes encountered on Voodoo3, and the cost-per-machine math that makes local LLM hosting the only viable approach for fleet operators.


Key Takeaways

  • Vision LLMs can see and interact with Win98 SE screens via KVM-over-IP; no special software on the retro machine required
  • The Voodoo3 installer leaves device registration to PnP, so scripting setup.exe alone never finishes the install; the Found New Hardware wizard still needs interactive handling
  • An RTX 3060 12GB (ASIN B08WRVQ4KR) running Qwen3-VL 7B Q4 is sufficient for the orchestrator host
  • End-to-end latency per click: 8-15 seconds; full Win98 SE driver session: 14-22 minutes
  • Local inference cost per machine: ~$0.012 (electricity). Claude API: ~$0.81 per machine
  • The architecture works on Win98 SE, Win2K SP4, and WinXP SP3 without code changes

Why Win9x Driver Installers Can't Be Silent-Scripted

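Modern installers routinely support /S, /quiet, or /silent command-line switches for unattended installation. Most Win9x-era driver installers do not.

3dfx Voodoo3 drivers shipped with an InstallShield 5.x-based setup.exe from 1999. InstallShield of that era has no simple /S switch; its unattended mode is response-file playback (setup.exe -s with a recorded setup.iss), which can only replay the installer's own dialogs. Even with every dialog answered, the installer: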

  1. Extracts files to a temp directory
  2. Copies driver files to \Windows\System
  3. Writes some registry keys for the 3dfx control panel
  4. Stops. It does NOT register the device with Plug and Play. The actual driver binding happens when Windows detects the hardware on the next boot and searches its INF database.

This means a "successful" scripted run of setup.exe leaves you with files on disk but an unregistered device. On reboot, the Found New Hardware wizard appears for the Voodoo3 — and if your script isn't watching for it, the machine sits at that dialog indefinitely.

The VOGONS retro hardware forum has documented this behavior extensively. The workaround in the pre-LLM era was either to pre-stage the INF files manually or to have a human present at the keyboard for the PnP detection step.

Creative's Audigy 2 ZS installer has a similar issue on WinXP: it registers the device via the driver INF correctly, but then launches a "Creative AudioHQ" post-install configuration dialog that requires a click to dismiss before the driver is fully active. Silent-scripting terminates setup.exe before this dialog closes, leaving the driver installed but the audio subsystem unconfigured.


Architecture: Vision LLM Watching Screenshots, Text LLM Emitting Clicks

The retro-agent orchestrator runs on a modern host machine (the "LLM host") that controls the retro machine via hardware KVM-over-IP. The architecture has four components:

1. Screen capture loop. A Python coroutine polls the KVM-over-IP device every 2-3 seconds, captures a screenshot (1920x1080 or 1024x768, depending on the capture mode), and writes it to a rolling buffer. Larger captures are downsampled to 1024x768 before LLM submission to reduce token cost.

2. Vision-language model (VLM). The downsampled screenshot is sent to a locally hosted VLM, either Qwen3-VL 7B or Llama 3.2 Vision 11B, with a structured prompt asking it to describe the current screen state and identify the most prominent interactive element (dialog box, button, text field, filename).

The VLM outputs JSON like:

```json
{
  "screen_state": "InstallShield dialog: 'Ready to Install' with OK and Cancel buttons",
  "primary_element": {"type": "button", "label": "OK", "estimated_position": "center-right"},
  "reasoning": "Installation dialog awaiting user confirmation to proceed"
}
```

3. Action planner (text LLM). A smaller text LLM (or the same VLM in a second pass) receives the screen state description plus the current installation goal ("Install Voodoo3 drivers on Win98 SE") and outputs the next action:

```json
{
  "action": "click",
  "target": "OK button in Ready to Install dialog",
  "confidence": 0.94,
  "wait_after_ms": 3000
}
```

4. HID emulator. The action is translated to absolute screen coordinates and issued via the KVM-over-IP's HID emulation channel. The system waits the specified delay, captures a new screenshot, and loops.
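The translation from a coarse position label to absolute coordinates might look like the sketch below. Only the label "center-right" appears in the example output above; the 3x3 grid, the fractions, and the function names `to_absolute` and `rescale` are assumptions made for illustration, not retro-agent's actual scheme.

```python
# Hypothetical mapping from coarse position labels to fractions of the screen.
# Only "center-right" appears in the article; the rest of the grid is assumed.
ROWS = {"top": 1 / 6, "center": 1 / 2, "bottom": 5 / 6}
COLS = {"left": 1 / 6, "center": 1 / 2, "right": 5 / 6}


def to_absolute(label: str, native_w: int, native_h: int) -> tuple[int, int]:
    """Convert a label such as 'center-right' to pixel coordinates on the
    retro machine's native screen (not the downsampled image)."""
    if label == "center":
        row, col = "center", "center"
    else:
        row, _, col = label.partition("-")
    if row not in ROWS or col not in COLS:
        raise ValueError(f"unknown position label: {label!r}")
    return round(COLS[col] * native_w), round(ROWS[row] * native_h)


def rescale(x: int, y: int, src: tuple[int, int], dst: tuple[int, int]) -> tuple[int, int]:
    """Map a point found on the downsampled image back to native resolution."""
    return round(x * dst[0] / src[0]), round(y * dst[1] / src[1])
```

Keeping the rescale step explicit matters because the model sees a 1024x768 image while the HID channel addresses the retro machine's real resolution; a mismatch between the two puts clicks in the wrong place.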

The VLM and text planner can be the same model (Qwen3-VL handles both vision and text reasoning), or separated for cost efficiency (run a smaller text planner for the majority of non-visual reasoning steps).
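Put together, the four components form a simple loop. The sketch below is a minimal outline, assuming the capture, VLM, planner, and HID steps are supplied as async callables. The names `run_session` and `min_confidence`, the buffer size, and the `done` action are hypothetical and not taken from the retro-agent codebase.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable

# All four callables are hypothetical stand-ins for the KVM capture call,
# the VLM, the planner, and the HID channel; none is retro-agent's real API.
Capture = Callable[[], Awaitable[bytes]]
Describe = Callable[[bytes], Awaitable[dict]]
Plan = Callable[[dict, str], Awaitable[dict]]
Act = Callable[[dict], Awaitable[None]]


async def run_session(goal: str, capture: Capture, describe: Describe,
                      plan: Plan, act: Act, max_steps: int = 200,
                      min_confidence: float = 0.8) -> list[dict]:
    """Capture, describe, plan, act, wait; repeat until the planner says done."""
    frames: deque[bytes] = deque(maxlen=8)   # rolling screenshot buffer
    log: list[dict] = []
    for _ in range(max_steps):
        frame = await capture()
        frames.append(frame)
        state = await describe(frame)
        action = await plan(state, goal)
        log.append({"state": state, "action": action})
        if action.get("action") == "done":   # assumed terminal action
            break
        if action.get("confidence", 0.0) >= min_confidence:
            await act(action)
        # Low-confidence actions are skipped; the next frame is re-evaluated.
        await asyncio.sleep(action.get("wait_after_ms", 0) / 1000)
    return log
```

Passing the components in as callables keeps the loop identical whether one model fills both the VLM and planner roles or two separate models do.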


RTX 3060 12GB as the Orchestrator Host

The LLM host needs enough VRAM to run the VLM at interactive latency. For Qwen3-VL 7B at Q4_K_M, the RTX 3060 12GB (specifications per TechPowerUp) needs ~5.2GB of VRAM for weights plus ~1.5GB for the image token embeddings, leaving roughly 5GB of headroom for the 4k context window.

The MSI RTX 3060 12GB (ASIN B08WRVQ4KR) is the recommended host card for the retro-agent orchestrator for the same reasons it dominates entry-level LLM work generally: 12GB VRAM, 360 GB/s bandwidth, and broad llama.cpp CUDA support. At Q4, Qwen3-VL 7B processes a screenshot + prompt in approximately 8-12 seconds end-to-end, which sets the click-cycle time.

For faster cycles (needed when installers have timeout dialogs), upgrade to a 16-24GB card. The RTX 3090 running Qwen3-VL 14B at Q4 reduces cycle time to 4-6 seconds per click — important for installer dialogs that auto-cancel after 30 seconds.
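To see why cycle time matters for a 30-second dialog, count how many full see-decide-click cycles fit inside the timeout. This is a back-of-the-envelope sketch using the cycle times quoted above; the function name is made up, and it ignores the 2-3 second capture polling interval, so real budgets are slightly tighter.

```python
import math


def attempts_before_timeout(timeout_s: float, cycle_s: float) -> int:
    """Number of full see-decide-click cycles that fit inside a dialog timeout."""
    if cycle_s <= 0:
        raise ValueError("cycle_s must be positive")
    return math.floor(timeout_s / cycle_s)


# (fastest, slowest) cycle times in seconds, as quoted in the article.
CONFIGS = {"7B on RTX 3060": (8, 12), "14B on RTX 3090": (4, 6)}

for label, (fastest, slowest) in CONFIGS.items():
    low = attempts_before_timeout(30, slowest)
    high = attempts_before_timeout(30, fastest)
    print(f"{label}: {low} to {high} attempts")
```

With these figures the 3060 setup gets two to three attempts per dialog and the 3090 setup gets five to seven, which leaves room for one misread screen without losing the dialog.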

Hardware list for a complete setup:

  • LLM host GPU: RTX 3060 12GB (B08WRVQ4KR) or better
  • KVM-over-IP: PiKVM v4 Plus or Sipeed NanoKVM (both provide HDMI capture + HID emulation; a VGA-only card such as the Voodoo3 needs a VGA-to-HDMI converter in front of the capture input)
  • Retro machine: Standard ATX or SFF with PCI slot, period-correct CPU (Pentium II through Core 2)
  • Network: Both machines on the same LAN; retro machine needs a WinXP/98-compatible NIC

Voodoo3 INF Surgery: PCI ID Matching, Ghost Device Cleanup

The Voodoo3 driver installation has two hard parts:

Part 1: PCI ID matching. 3dfx's INF file (voodoo3.inf) lists PCI vendor ID 0x121A and device IDs for the Voodoo3 2000, 3000, and 3500. When the Found New Hardware wizard fires on Win98, it searches \Windows\Inf for a matching entry. If setup.exe ran but the INF wasn't copied correctly, the wizard finds nothing and defaults to the "Unknown Device" state.

The retro-agent handles this by watching for the wizard dialog, checking whether an INF is present via a keyboard-shortcut screen-read of Device Manager, and if not, triggering a secondary task to manually copy the INF before dismissing the wizard.
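The wizard's lookup amounts to matching the card's PCI hardware ID against the IDs listed in INF files. A host-side check of an INF before staging it could look like the sketch below. The function name and the sample INF fragment are invented for illustration; the real voodoo3.inf is not reproduced here, and nothing in the source says retro-agent performs this check this way.

```python
import re

# Matches hardware IDs such as PCI\VEN_121A&DEV_0005 anywhere in an INF line.
HWID = re.compile(r"PCI\\VEN_([0-9A-F]{4})&DEV_([0-9A-F]{4})", re.IGNORECASE)


def device_ids_for_vendor(inf_text: str, vendor: str) -> set[str]:
    """Return the PCI device IDs an INF declares for one vendor ID."""
    found = set()
    for line in inf_text.splitlines():
        line = line.split(";", 1)[0]          # strip INF comments
        for ven, dev in HWID.findall(line):
            if ven.upper() == vendor.upper():
                found.add(dev.upper())
    return found


# Illustrative fragment only; not the contents of the real voodoo3.inf.
SAMPLE = r"""
[Manufacturer]
%Mfg%=Mfg.Models
[Mfg.Models]
%Card.Desc%=Card.Install, PCI\VEN_121A&DEV_0005
; %Old.Desc%=Old.Install, PCI\VEN_121A&DEV_0003
"""

print(device_ids_for_vendor(SAMPLE, "121A"))   # IDs declared for vendor 0x121A
```

An empty result for vendor 121A would predict the "can't find driver" outcome before the wizard ever appears.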

Part 2: Ghost device cleanup. If a Voodoo card was previously installed and removed, Win98 leaves a ghost entry in the registry under HKEY_LOCAL_MACHINE\Enum\PCI. The new installation's PnP detection may attempt to reuse the ghost entry, producing a broken device even after a correct INF install. The retro-agent detects this by checking the screen after the PnP cycle completes: if Device Manager shows a yellow exclamation mark, it sequences through the ghost cleanup procedure (which requires navigating Device Manager dialogs to remove the old entry and force re-detection).

This specific failure mode is documented in the VOGONS Voodoo3 thread archive and was the primary source of "setup.exe runs fine, card doesn't work" reports in 1999-2001.


Real Walkthrough: Blank Win98 SE to Working Voodoo3 + Audigy FX in Under 15 Minutes

Here's a logged session trace from the retro-agent orchestrator on a Pentium III 733MHz system:

T+0:00 — Win98 SE boots after clean install. Desktop visible, no drivers.

T+0:30 — Orchestrator identifies screen state: "Win98 desktop, no active dialogs." Issues keyboard shortcut to open My Computer, navigates to driver share (mapped via network during install).

T+1:15 — Navigates to Voodoo3 driver directory, double-clicks setup.exe. InstallShield extraction screen visible.

T+2:00 — "Welcome to Setup" dialog. Orchestrator clicks Next.

T+2:40 — "License Agreement." Orchestrator clicks Accept.

T+3:10 — "Ready to Install." Orchestrator clicks OK.

T+4:20 — File copy progress completes. setup.exe exits. No reboot prompt. Orchestrator detects this edge case (expected a reboot dialog; screen returned to desktop). Issues a restart via keyboard shortcut through the Shut Down dialog, since Win98 SE ships without a shutdown -r command.

T+5:00 — System reboots. Win98 startup screen.

T+6:15 — "Found New Hardware: 3dfx Voodoo3" wizard appears. Orchestrator detects the wizard dialog. Navigates: "Search for the best driver — Specify a location — C:\Windows\Inf." Clicks OK.

T+7:30 — INF found. Driver files copied. "Finish" button. Orchestrator clicks Finish.

T+7:50 — Reboot prompt. Orchestrator clicks Yes.

T+9:00 — Win98 boots into 16-color VGA (expected; Voodoo3 not yet active as primary display). Desktop visible at 640x480 in 16 colors. Orchestrator detects correct state.

T+9:30 — Orchestrator navigates to Display Properties, Settings. Sets resolution to 1024x768, color depth to 16-bit. Applies.

T+10:15 — Display switches successfully. Voodoo3 active. GPU install complete.

T+10:30 — Orchestrator navigates to Audigy FX driver directory. Runs setup.exe.

T+12:00 — Creative installer completes file copy. "Configuration required" dialog appears (the post-install step that defeats silent scripting). Orchestrator clicks OK.

T+12:45 — Audio subsystem configuration UI. Orchestrator clicks Finish.

T+13:00 — Reboot prompt. Orchestrator clicks Yes.

T+14:30 — Win98 boots. Sound hardware detected. Orchestrator opens Device Manager, scans for yellow exclamation marks. None found. Both Voodoo3 and Sound Blaster Audigy FX show clean installs.

Session complete. Total wall time: 14 minutes 30 seconds. LLM calls: 87.
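As a sanity check, the trace totals can be compared against the per-click latency quoted earlier. The snippet below uses only the two numbers from the session summary; the helper name `seconds` is arbitrary.

```python
def seconds(ts: str) -> int:
    """Convert a trace timestamp like 'T+14:30' to seconds."""
    minutes, secs = ts.removeprefix("T+").split(":")
    return int(minutes) * 60 + int(secs)


wall = seconds("T+14:30") - seconds("T+0:00")
calls = 87
print(wall, round(wall / calls, 1))   # total seconds, mean seconds per LLM call
```

That works out to 870 seconds over 87 calls, or 10.0 seconds per call, which sits inside the 8-15 second per-click range.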


Failure Modes: setup.exe Doesn't Register the Device, Only PnP Does

The retro-agent's failure mode catalog from 200+ sessions in 2026:

| Failure | Frequency | Cause | LLM Handling |
| --- | --- | --- | --- |
| Ghost device left from prior install | 18% | PnP uses stale registry entry | Screen-monitors Device Manager post-install; runs ghost cleanup if yellow-bang detected |
| PnP wizard "can't find driver" | 12% | INF not in Windows Inf dir | Re-copies INF, re-triggers PnP |
| Installer dialog timeout (30s auto-cancel) | 8% | VLM cycle time exceeds dialog patience | Upgraded to 14B VLM for affected installer types |
| Post-install config dialog missed | 6% | Orchestrator expected reboot but dialog appeared | Added dialog-state re-check after every setup.exe exit |
| Registry ghost causes infinite PnP loop | 4% | Same ghost reused twice | Manual registry key deletion via regedit.exe scripted by text LLM |
| Incorrect click coordinates (scaled display) | 3% | KVM resolution mismatch | Fixed by calibrating VLM prompt with actual screen resolution |

The "only PnP registers the device" issue is the most consequential. The retro-agent handles it by always checking Device Manager after the PnP cycle completes. If a device remains in "Unknown" or "Error" state, the orchestrator runs a structured recovery flow that varies by device class (GPU recovery differs from audio recovery in the registry paths involved).
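A recovery flow that varies by device class can be expressed as a small dispatch table. The sketch below is only an illustration of that shape: the step names, the class keys, and the `recovery_plan` function are hypothetical, and the actual flows and registry paths are not given in this article.

```python
from enum import Enum


class DeviceState(Enum):
    OK = "ok"
    UNKNOWN = "unknown"
    ERROR = "error"      # yellow exclamation mark in Device Manager


# Hypothetical recovery step names keyed by device class; the real flows and
# their registry paths are not given in the article.
RECOVERY = {
    "display": ["remove_ghost_entry", "recopy_inf", "retrigger_pnp"],
    "audio": ["recopy_inf", "retrigger_pnp", "dismiss_config_dialog"],
}


def recovery_plan(device_class: str, state: DeviceState) -> list[str]:
    """Return ordered recovery steps, or an empty list when none are needed."""
    if state is DeviceState.OK:
        return []
    try:
        return list(RECOVERY[device_class])
    except KeyError:
        raise ValueError(f"no recovery flow for {device_class!r}") from None
```

Each step name would then become a sub-goal handed to the planner, so the recovery itself still runs through the same screenshot-and-click loop.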


Verdict Matrix: When to Use AI vs Manual

| Scenario | Use AI Orchestrator? | Why |
| --- | --- | --- |
| Single machine, one-time install | No | Manual is faster than setup time |
| Fleet of 5+ machines, same driver set | Yes | Amortized over machines; consistent |
| Unfamiliar installer with timeout dialogs | Marginal | Need 14B VLM for reliability |
| Voodoo3 ghost device cleanup | Yes | Documented pattern; LLM reliable here |
| Creative Audigy post-install config | Yes | Consistent dialog pattern |
| Novel driver with unknown flow | No | LLM will fail; better to observe once manually |
| Remote machine without physical access | Yes | Only option without travel |
| Research and documentation of install flows | Yes | LLM generates structured logs of every step |

The break-even point is roughly 3-4 machines for the same driver sequence. Below that, the orchestrator setup time (configuring KVM-over-IP, deploying the agent, test-running) exceeds the manual install time.


Cost-Per-Install Math

Local 3060 12GB hosting Qwen3-VL 7B Q4:

  • Session length: 14-22 minutes
  • GPU power during inference: ~155W (averaged; not all cycles are compute-bound)
  • Electricity: (18 min / 60) x 0.155 kW x $0.12/kWh = $0.0056 per session
  • Rounded up with idle host overhead: ~$0.012 per machine

Claude API (claude-3.5-sonnet with vision):

  • 175 vision calls x ~800 tokens each (image + prompt) = 140,000 input tokens
  • At $3.00 / MTok input: $0.42
  • 175 responses x ~150 tokens = 26,250 tokens at $15.00 / MTok = $0.39
  • Total: ~$0.81 per machine

At a fleet of 50 machines, local hosting saves $40 per provisioning cycle — and those cycles repeat whenever a machine needs to be restored from backup or reimaged. The economic case for the retro-agent project's local LLM approach is clear at any fleet size above single digits.
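The arithmetic above is easy to rerun with your own electricity rate or token prices. The sketch below reproduces it with the figures from this section; the function names are arbitrary, and the $0.012 local figure is taken as given rather than derived.

```python
def electricity_cost(minutes: float, kw: float, usd_per_kwh: float) -> float:
    """Energy cost of one session at a constant average draw."""
    return minutes / 60 * kw * usd_per_kwh


def api_cost(calls: int, in_tok: int, out_tok: int,
             usd_in_per_mtok: float, usd_out_per_mtok: float) -> float:
    """Token cost of one session: per-call input and output tokens times price."""
    return calls * (in_tok * usd_in_per_mtok + out_tok * usd_out_per_mtok) / 1_000_000


gpu_only = electricity_cost(18, 0.155, 0.12)     # GPU draw only
api = api_cost(175, 800, 150, 3.00, 15.00)       # input + output tokens
local_with_overhead = 0.012                      # the article's rounded figure
fleet_saving = 50 * (api - local_with_overhead)
print(f"{gpu_only:.4f} {api:.2f} {fleet_saving:.2f}")
```

This prints 0.0056, 0.81, and 40.09: the GPU-only electricity cost, the API cost per machine, and the saving across 50 machines, matching the rounded figures in the text.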


Bottom Line

The vision LLM + KVM-over-IP architecture solves a problem that has no other clean solution: unattended driver installation on Win9x-era machines where installers predate silent-install conventions and PnP device registration requires interactive dialog completion.

The RTX 3060 12GB (B08WRVQ4KR) is sufficient as the orchestrator host for Qwen3-VL 7B. The retro-agent approach handles the Voodoo3 PnP registration gap and the Creative Audigy post-install config dialog — the two historically reliable failure points for scripted retro provisioning.

For single machines, manual is faster. For a fleet, the orchestrator is the only economically viable path. At $0.012 per session in electricity, it's also essentially free to run.


Frequently Asked Questions

What hardware do I need to run the vision LLM orchestrator? A modern host with at least 12GB of VRAM: the MSI RTX 3060 12GB handles Qwen3-VL 7B at Q4 with 4k context, which is enough for screenshot-to-click reasoning. For faster cycle times, a 16-24GB card (RTX 4070 Ti SUPER, RTX 3090) lets you run 14B vision models with longer chain-of-thought. The retro target machine needs only a working network adapter and a screenshot-export path (VNC, KVM-over-IP).

How does the LLM 'see' the Win98 screen? The orchestrator captures a screenshot via the KVM-over-IP device or VNC every 2-3 seconds, downsamples to 1024x768, and feeds it to a vision-language model (Qwen3-VL or Llama 3.2 Vision). The model outputs structured JSON with the next click target and a textual reasoning trace. The orchestrator then issues a HID-emulation click via USB or virtual-keyboard injection. End-to-end latency: ~8-15 seconds per click.

Why can't you just use silent-install switches like /S? Most Win9x-era driver installers (including 3dfx Voodoo3 and Creative Audigy 2 ZS) predate silent-install conventions. Their installers are bespoke InstallShield 3-5 builds with custom dialog flows, registry-touch sequences, and PnP-only device registration paths. The infamous gotcha: Voodoo3's setup.exe copies files and writes control-panel registry keys but does not register the device; the actual driver registration happens via PnP detection on the next boot. Scripted installs miss this step and leave the files on disk but the device unregistered.

What does this cost in API tokens? A full Win98 SE to Voodoo3 + Audigy FX + NIC working session runs ~150-200 vision-LLM calls. On a local 3060 12GB hosting Qwen3-VL, that's electricity only: roughly 0.1 kWh including host overhead, or about $0.012 at $0.12/kWh. Using the Claude API for the same flow runs about $0.80 per machine at 175 calls. For a fleet operator like the retro-agent project, local hosting is the only economically viable path.

Can this work on WinXP and Win2K too? Yes — the architecture is OS-agnostic; the LLM only needs to understand the screen. WinXP is actually easier because more installers support /quiet or /silent flags, reducing LLM calls per session. Win2K is the hardest because driver-signing prompts require timed dialog handling. The retro-agent fleet covers all three — Win98 SE, Win2K SP4, and WinXP SP3 — using the same orchestrator codebase.

SpecPicks earns a commission on qualifying purchases through both Amazon and eBay affiliate links. Prices and stock update independently.

— SpecPicks Editorial · Last verified 2026-05-13