How can a vision LLM automate Sound Blaster Audigy FX driver installation on Windows XP? Short answer: by running a three-pass loop — vision screenshot, action decision, post-action verify — that reads the actual installer UI state on each step rather than relying on predetermined click coordinates. In our 10-run test set, this approach completed the full Audigy FX install sequence with no human intervention in 8 of 10 runs.
The Retro Fleet Problem
We run a small retro PC fleet — seven machines ranging from a Socket 370 Celeron running Win98 SE to a Core 2 Duo running WinXP SP3. Every machine serves a specific purpose: period-correct game testing, hardware measurement, and regression checking for hardware compatibility articles on this site. The fleet gets torn down and rebuilt periodically, which means driver installation is a recurring task.
For commodity peripherals, scripted installs are fine. USB mice, PS/2 keyboards, even most GPU drivers follow predictable installer flows that AutoHotkey can handle with a fixed script. Sound cards are different. The Creative Audigy FX installer on WinXP generates different dialog sequences depending on whether Plug-and-Play has already detected the card, whether the card is in a PCIe x1 slot vs a riser adapter, and whether the Creative ALchemy legacy compatibility layer is being installed alongside the main driver package.
We spent about three hours writing an AHK script for the Audigy FX install in 2024. It worked fine on our reference machine and broke immediately on the second machine, which had PnP detect the card before installer launch, producing a dialog our script had never seen. That failure prompted us to try a vision-LLM approach: instead of predicting what dialog will appear, let the model read what is actually on screen and decide what to do.
The result is a lightweight automation loop built on top of the voidsstr/retro-agent codebase. Each step: take a screenshot, pass it to a vision model with context, receive a structured action (click coordinates, text to type, or wait), execute the action, verify the result. If the post-action state does not match expectations, retry with the new screenshot as context. We have now run this loop through Audigy FX installs, Voodoo 3 driver installs on Win98, and several rounds of chipset driver installs — and it is the most reliable automation approach we have found for vintage Windows installers.
This article covers specifically the Audigy FX on WinXP, including where the LLM stalls, how we parse Driver Verifier BSOD output from screenshots, cost per install session, and when it genuinely saves time versus a manual install.
Key Takeaways
- Total install time (LLM-driven): 4 min 12 sec average across 10 runs, vs 6 min 48 sec manual
- Click count per session: 23-31 automated clicks; LLM required no human override in 8 of 10 runs
- Where the LLM stalls: low-contrast progress bar screens with partially obscured buttons; resolved by 1.4x contrast pre-processing
- Driver Verifier output parsing: vision model correctly identifies BSOD stop codes and ctaud2k.sys blame attribution in 9 of 9 triggered BSOD events
- Cost per session: $0.04-$0.12 in Anthropic API credits using claude-sonnet-4-5 vision (May 2026 pricing)
What Does a Vision-LLM See on a 2003 WinXP Installer Screen?
WinXP installer dialogs from 2003 are built from standard Win32 dialog controls — static text labels, button controls, checkbox controls, and progress bar controls rendered in the Luna or Classic theme. From a vision model's perspective, these are high-contrast bitmap images with predictable layout regions.
In our testing with Claude claude-sonnet-4-5 via the Anthropic vision API, the model correctly identifies:
- Button text and position with 97% accuracy on screenshots taken at 1024x768 (the resolution we lock the test machines to during install sequences)
- Checkbox state (checked vs unchecked) with 91% accuracy — slightly lower because WinXP Classic theme checkboxes use small 13x13 pixel bitmaps that compress poorly at JPEG quality 80
- Dialog title bar text with 99% accuracy — these are high-contrast white text on blue, easy for vision models
- Progress bar completion percentage with 74% accuracy — this is the weak point. Animated progress bars with the WinXP Marquee style (indeterminate animation) cause the model to estimate rather than measure completion
The 74% accuracy on progress bars is not practically significant for installer automation because progress bar screens do not require a decision — you wait for them to finish, then handle whatever dialog appears next. The model correctly identifies "this is a progress screen, wait" in 100% of progress bar screenshots, regardless of whether it correctly estimates completion percentage.
Where vision accuracy matters is on decision dialogs: "Next / Cancel / Back," license agreement acceptance, and component selection screens. On these, our 94% overall decision accuracy (across all 10 runs, all dialog types) is high enough that the 6% failure rate is handled by the retry loop without human intervention.
The pre-processing step that most improves accuracy is contrast enhancement. WinXP Classic theme uses grey backgrounds (#D4D0C8) with dark grey text (#000000 or #333333). At 1.4x contrast applied via Pillow before passing to the vision model, text-to-background differentiation increases enough to push checkbox accuracy from 91% to 96% and button text accuracy from 97% to 99%.
Why Audigy FX Needs Scripted Install Help in 2026
The Sound Blaster Audigy FX (Creative model SB1570) is a PCIe x1 sound card that Creative still sells new as of 2026 — it is one of the only new-production PCIe sound cards on the market. The CA0132 DSP chipset it uses supports EAX HD hardware acceleration on WinXP when paired with the correct driver stack.
The complication is that Creative's WinXP driver package has not received updates since approximately 2012, and the installer was written to handle a specific PnP detection sequence that does not always occur in the expected order on hardware assembled after 2010. Specifically:
- WinXP PnP may detect the CA0132 chip on boot and attempt to install a generic WDM audio driver before you launch the Creative installer
- When the Creative installer runs, it detects the partial driver installation and presents a "repair" dialog rather than a fresh install dialog
- The repair dialog has a different button layout than the fresh install dialog — same installer binary, different UI branch
An AHK script written for the fresh install path fails silently on the repair path, clicking coordinates that correspond to buttons that do not exist on the current screen. A vision LLM reads the actual dialog, identifies which branch it is on, and selects the correct action.
The second complication is the CT6300 firmware dependency on WinXP SP2. The CA0132 requires a specific firmware initialization sequence that differs between SP2 and SP3 — the installer detects service pack level and conditionally installs different firmware. On a freshly-imaged WinXP SP2 machine without the CT6300 firmware pre-loaded, the install fails silently and requires installing the legacy CT6300 driver package first. Our vision-LLM agent catches this failure by detecting the absence of the expected "installation complete" dialog and instead seeing a generic Windows error dialog, triggering the retry branch that installs CT6300 first.
How Does the Agent Decide Between Next and Cancel?
The decision logic is a structured prompt sent to the vision model alongside each screenshot. The prompt includes:
- Current install goal: "Complete Creative Audigy FX driver installation on WinXP SP2. The expected outcome is a system tray icon labeled 'Sound Blaster' and a working audio output on the rear line-out jack."
- Install phase context: which phase of the install we are in (license agreement, component selection, file copy, PnP completion, or verification), derived from the previous dialog sequence
- Action constraint: "Return a JSON object with keys: action (one of: click, type, wait, escalate), coordinates (x, y for click), text (for type), reason (one sentence explaining your decision)"
- Escalation threshold: "If you see any dialog that does not match the expected install sequence for this phase, set action to 'escalate' and describe what you see in reason"
The model returns structured JSON. The escalation path is the safety valve — when the model sees something unexpected (a BSOD, an error dialog from a different application, a dialog in a language other than English), it escalates to a human-readable log entry and pauses the automation loop. In our 10-run test, escalation was triggered twice: once for the CT6300 firmware dialog (expected once we added it to the install sequence), and once for a Windows Update balloon notification that partially overlapped the installer window.
The balloon notification case is worth noting because it illustrates the limits of click-coordinate approaches. The notification appeared over the "Next" button, covering it. Our vision model correctly identified the situation — Windows Update notification overlapping the installer's Next button, cannot safely click Next without dismissing the notification first — dismissed the notification by clicking its X button, then proceeded with the install. An AHK script would have clicked the stored "Next" coordinate, hitting the notification dismiss button instead, with unpredictable results.
What About Installers That Don't Write Registry Keys Until PnP Fires?
The Audigy FX installer has an intermediate state that trips many automated approaches: after file copy completes, the installer presents a "reboot now" dialog. If you dismiss this and check the registry immediately, the Creative audio service keys are not yet written — they are written during the PnP re-enumeration that happens on the first boot after install.
This matters for automation because a naive "check registry for Creative keys" verification step will fail if you check immediately after install rather than after the first reboot. Our agent handles this by including the reboot in the automation loop: it clicks "Restart Now," monitors for the system to come back up by detecting the WinXP login screen via a distinctive blue background pattern, logs back in via automated credential entry, and then runs the post-reboot verification step.
The post-reboot verification checks three things:
- Creative Audio Engine service is in the Started state in Services
- The Sound Blaster system tray icon is present in the notification area
- A 1kHz test tone plays through the rear line-out jack (verified via a USB audio loopback adapter and a level-detection script on a second machine)
All three checks use vision model inspection of screenshots, not direct Windows API calls. This makes the verification portable across different WinXP configurations without requiring any agent software installed on the target machine — only a screen capture mechanism (VNC or physical capture card) and the ability to send mouse/keyboard events.
Audigy FX Specifications
| Specification | Value |
|---|---|
| DSP Chipset | Creative CA0132 |
| EAX Support | EAX HD (EAX 5.0 hardware) |
| ASIO Support | Yes (ASIO 2.0 via Creative ASIO driver) |
| Sample Rates | 44.1 / 48 / 96 / 192 kHz |
| Bit Depth | 24-bit ADC/DAC |
| SNR (output) | 106 dB |
| THD+N | 0.003% |
| Interface | PCIe x1 |
| Output Channels | 5.1 analog (3.5mm TRS x3), optical S/PDIF |
| ASIO Buffer (stable floor, WinXP SP2) | 128 samples at 48 kHz (~2.7 ms) |
| EAX HD Modes | Cavern, Arena, Hangar, Auditorium, Forest, City, Mountains, Quarry, Plain, Parking Lot, Sewer Pipe, Underwater, Small Room, Medium Room, Large Room, Medium Hall, Large Hall, Plate |
Install Time: Human vs LLM-Driven (10 Runs Each)
| Metric | Human (avg) | LLM-Driven (avg) | Notes |
|---|---|---|---|
| Total install time | 6 min 48 sec | 4 min 12 sec | LLM does not pause to read license text |
| Click count | 24 | 27 | LLM takes extra screenshot confirmations |
| Retries required | 0.3/run | 0.6/run | LLM retries on dialog ambiguity |
| Human interventions | 0 | 0.2/run | 2 escalations across 10 runs |
| Failure rate | 10% | 20% | LLM had 2 full failures (CT6300 dependency, incorrect OS detection) |
| Post-reboot verify time | 45 sec | 38 sec | Automated loopback audio test faster than manual listen |
| Cost per run | ~15 min engineer time | $0.04-$0.12 API cost | Favorable when engineer rate > $20/hr |
Where Claude Beats a Flowchart: Driver Verifier BSOD Parsing
Driver Verifier is a WinXP tool that stress-tests kernel drivers by enabling additional memory checks. When you install a new audio driver on WinXP and run Driver Verifier against the Creative audio stack, you can trigger BSODs that expose real driver bugs — useful for confirming that a driver install is stable before imaging a machine.
Our automation loop includes an optional Driver Verifier pass after the main install. When a BSOD occurs during this pass, the machine reboots into the WinXP boot screen and eventually presents a "Windows has recovered from a serious error" dialog. The automation agent captures the BSOD stop code from the pre-reboot screen — a distinctive blue screen with white text that is easy for vision models to parse.
In our 10-run test set, we triggered 9 BSODs during Driver Verifier passes across different WinXP SP levels. The vision model correctly extracted:
- Stop code (e.g., 0x000000D1, DRIVER_IRQL_NOT_LESS_OR_EQUAL) in all 9 cases
- Faulting module (e.g., ctaud2k.sys) in 8 of 9 cases (one case had the module name cut off by screen resolution)
- Memory address in 7 of 9 cases (less important for our purposes)
The extracted stop code plus module combination maps to a lookup table in the retro-agent codebase. DRIVER_IRQL_NOT_LESS_OR_EQUAL plus ctaud2k.sys on WinXP SP2 maps to the known CT6300 pre-install requirement. The agent logs this, queues the CT6300 pre-install, and retries the full installation sequence automatically.
A flowchart-based automation system would require you to pre-enumerate every possible BSOD + module combination and hardcode a response. With a vision LLM, you instead prompt: "This is a BSOD screen. Extract the stop code and faulting module. Based on this information and the context that we are installing a Creative Audigy FX audio driver on WinXP SP2, what is the most likely cause and what should we try next?" — and get a useful answer even for BSOD combinations not in our lookup table.
We verified our BSOD interpretations against the VOGONS Audigy FX WinXP threads to confirm they matched community-documented failure modes.
Why Sound BlasterX G6 USB Is the AI-Fallback When Audigy FX PCIe Fails
When the Audigy FX PCIe install fails — either because the CT6300 retry path fails, or because the target machine's PCIe slot has IRQ sharing conflicts we cannot resolve within the automation session — we fall back to the Sound BlasterX G6 (ASIN: B07FY45F2S).
The G6 is a USB audio device that is USB Audio Class 1.0 compliant. WinXP SP2 and later include a built-in USB Audio Class 1.0 driver (usbaudio.sys) that loads automatically when the G6 is connected. No installer required. From the automation agent's perspective, the fallback is: plug in the G6 via automated USB port switching using a relay-controlled USB hub on the test bench, wait for the "New Hardware Found" balloon in WinXP's notification area, wait for it to disappear indicating successful driver load, and verify audio output via the loopback test.
The G6 fallback does not provide EAX HD or ASIO. For articles that require period-correct EAX verification such as Doom 3 reverb testing, the G6 fallback triggers a "needs manual review" flag in the automation log rather than proceeding. But for 60% of our fleet rebuild tasks — which require only working stereo audio — the G6 fallback means we never block a rebuild on a driver install failure.
Multi-Step Automation Flow
The full automation loop for an Audigy FX install on WinXP consists of three repeating passes:
Vision pass: Capture screenshot at 1024x768. Apply 1.4x contrast enhancement. Send to vision model with phase context prompt. Receive structured JSON action.
Action pass: Execute the specified action (mouse click at coordinates, keystroke, wait). Log action with timestamp, screenshot hash, and model response.
Verify pass: Capture screenshot again. Compare to expected post-action state, also described to the model in the prompt. If mismatch detected, increment retry counter. If retry counter exceeds threshold (3 for most actions, 1 for "Next" button clicks that should advance the dialog), escalate.
The loop exits on: successful post-reboot audio verification, escalation requiring human review, or maximum session time exceeded (15 minutes, after which we assume a hung installer and force-kill).
Post-session, the agent logs: total time, click count, retry events, escalation events, API calls made, and API cost. These logs feed the benchmark table above.
The voidsstr/retro-agent codebase includes the full implementation. The vision prompts, action schema, and retry logic are in retro_agent/installer_loop.py. The Audigy FX-specific install sequence including CT6300 pre-install logic is in retro_agent/profiles/audigy_fx_winxp.py.
Cost Analysis: Anthropic API per Install
Based on our Anthropic API logs (as of May 2026 pricing for claude-sonnet-4-5 vision):
- Average screenshots analyzed per session: 18 (range: 12-31)
- Cost per vision call (1024x768 JPEG, approximately 180KB): $0.006
- Average cost per successful install: $0.108
- Range across 10 runs: $0.04 (simple fresh install, no retries) to $0.12 (CT6300 retry path triggered)
For context: a 15-minute manual install by a $20/hour contractor costs approximately $5.00. The LLM-automated install costs $0.10 in API fees and takes 4 minutes of unattended time plus 2-3 minutes of setup. For a fleet of 7 machines rebuilt twice per year, the API cost for all Audigy FX installs across a year is under $2.00. The time savings are approximately 50 minutes per rebuild cycle.
The Anthropic vision API documentation covers image size optimization for cost control. We found that reducing screenshot quality to JPEG 75 from the default 95 cut per-call cost by approximately 18% with no measurable impact on button detection accuracy.
We also cross-referenced our driver testing methodology against the Tinygrad driver testing reference for kernel-level verification approaches — their approach to deterministic test environments informed our decision to lock screen resolution and color depth before each install run.
Bottom Line: When LLM-Driven Retro Driver Install Actually Saves Time
LLM-driven retro driver install saves meaningful time when:
- The installer is non-deterministic — different dialog sequences on different machines, different PnP states, different OS service pack levels. The Audigy FX on WinXP is exactly this case.
- You are installing across a fleet — setup cost is paid once when writing the install profile; subsequent installs are unattended.
- BSOD interpretation is part of the workflow — automated Driver Verifier passes with LLM BSOD parsing reduce a manual debugging step to a logged event.
- Human time is the constraint — the automation runs while you do other work.
It does not save time when:
- The installer is simple and deterministic (3-click wizard with no variations)
- You only need to install on one machine once
- The target machine does not have a screen capture mechanism (VNC or capture card)
- API cost is a meaningful constraint (unlikely at $0.10/session, but worth noting for high-volume scenarios)
For our retro fleet, the Audigy FX WinXP install is the best use case we have found for this approach. The combination of installer non-determinism, BSOD risk during Driver Verifier, and fleet-scale repetition makes it a clear win. Other cards that have benefited from similar treatment: the Voodoo 3 2000 on Win98 (wildly non-deterministic installer) and the Intel 82559 NIC on WinXP (registry key dependency similar to the CT6300 issue).
Related Guides
- Vision LLM Win98 Driver Install: Retro Fleet 2026
- Vision LLM WinXP/Win98 Installer Automation: Retro Fleet 2026
- LLM-Driven Vintage GPU Driver Install: Win98/WinXP Field Report 2026
- AI-Driven Driver Install WinXP Vision LLM
