The Problem
We launched starbloomconsulting.com with structured data — Organization schema, FAQPage blocks, Service definitions, founder Person entities with knowsAbout tags linked to Wikidata Q-IDs. On paper, the site was semantically rich. But when we ran our own APEX pipeline against it for the first time, the results were humbling.
AEO Score: 35 out of 100.
The pipeline’s Semantic Miner agent crawled all seven pages and found the wreckage:
- 11 broken Wikidata Q-IDs. Our Edge Computing reference linked to Q5337692 — a genus of cactus. Our Looker Studio Q-ID pointed to a Bolivian tin mine. Our Cloudflare Tunnel reference resolved to a computer network security page about Application Layer Gateways — technically adjacent, but conceptually wrong.
- Zero cross-page entity coherence. The Organization schema in the global layout and the AboutPage schema on
/aboutused differentknowsAboutWikidata references. Google’s Knowledge Graph saw two different entities claiming the same name. - Missing answer blocks. The FAQPage schema had 7 questions, but only 2 had corresponding
<h2>+<p>answer blocks in the visible HTML. The AI parsers couldn’t find ground truth to validate the schema claims. - Thin content signals. One case study. One blog post. No legal pages. No third-party validation. Crawlers interpreted this as a parked domain, not an operating business.
We had built a tool to audit other people’s websites, and our own site failed the audit. So we turned the pipeline on ourselves.
The Solution — APEX v2.9 Pipeline
APEX is a four-agent n8n pipeline running entirely on local hardware — a Work PC with an RTX 3050 and a Gaming Rig with an RTX 4060 Ti, both connected via outbound-only Cloudflare Tunnel. No cloud API dependency for core processing. Here’s what each agent does:
Agent 1 — Semantic Miner
Crawls every page on the target site using dual-pass HTTP requests:
- Jina AI markdown extraction — pulls clean readable content from each page
- Raw HTML schema extraction — regex-based
<script type="application/ld+json">parsing to inventory every structured data block
For starbloomconsulting.com, Agent 1 found 15 JSON-LD blocks across 7 pages, identifying 7 distinct schema types. It then cross-referenced every sameAs Wikidata Q-ID against known-good entity references and flagged the 11 broken links. The output — 40KB of raw content plus a structured schema audit — feeds Agent 2.
Technical stack:
- Jina AI Reader API for markdown conversion
- Raw HTTP GET + regex JSON-LD extraction
- Output reshaper merges content + schema audit into single payload
Agent 2 — Authority Validator
Powered by DeepSeek Chat, this agent takes the combined content and schema audit from Agent 1 and performs a structured analysis:
- Entity coherence check — does every page reference the same Organization entity, or are there identity conflicts?
- Wikidata verification — are all Q-IDs valid and contextually correct? (A cactus genus is not Edge Computing.)
- Answer block validation — do FAQPage schema entries have corresponding visible HTML content?
- Trust signal inventory — does the site have privacy policy, terms, contact information, third-party validation?
- Competitive gap analysis — what are competitors doing that this site isn’t?
The output is a scored report with specific, actionable recommendations — not vague “improve your SEO” advice, but exact Q-ID replacements, exact schema additions, exact content gaps.
Agent 3 — Executive Editor
Powered by Gemini 2.5 Pro, this agent takes the raw audit from Agent 2 and produces a structured markdown AEO report: executive summary, entity profile, gap analysis, prioritized fixes, and a competitive score. This is the human-readable deliverable — what you’d hand to a client or your own team.
Master Controller
Orchestrates the sequential execution of all three agents, validates markdown output, and writes the final report. The entire pipeline fires from a single webhook — POST /webhook/apex-competitive-audit with a target URL.
The Results
What We Found and Fixed
| Issue | Before | After |
|---|---|---|
| Broken Wikidata Q-IDs | 11 of 18 references were wrong (cacti, tin mines, wrong entities) | All Q-IDs validated and corrected |
| Entity identity conflict | Organization schema and AboutPage schema used different knowsAbout Q-IDs | Unified entity profile with consistent references |
| Answer block gaps | 7 FAQ schema entries, only 2 had visible HTML answers | All 7 FAQ entries backed by <h2> + <p> answer blocks |
| Missing legal pages | /privacy and /terms returned 404 | Both pages created and linked from footer |
| Founder Person entity | No Person schema with sameAs links | Darren Kelly Person entity with LinkedIn sameAs |
| Content depth | 1 case study, 1 blog post | Expanded content inventory underway |
| Copyright year parsing | Single “2026” flagged as future-dated | Changed to “2025–2026” range |
The Broken Q-ID List (Actual Examples)
Here’s what the pipeline caught — these are real Wikidata entities our site was accidentally linking to:
| Our Intended Entity | Broken Q-ID | What It Actually Resolved To |
|---|---|---|
| Edge Computing | Q5337692 | Cleistocactus — a genus of cactus |
| Looker Studio | Q60165325 | Cerro Rico — a Bolivian tin mine |
| Cloudflare Tunnel | Q4778915 | Application Layer Gateway (wrong concept) |
| Docker | Q15206305 | (Deprecated — merged into another entity) |
| BigQuery | Q4904867 | (Redirected to a different Google product) |
| On-Premise Deployment | Q2023643 | (Ambiguous — pointed to a general software concept, not on-premise LLM deployment) |
The fact that 6 of our 10 knowsAbout Wikidata links were pointing to cacti, tin mines, or deprecated entities is exactly the kind of semantic drift that destroys AEO scores. No human reviewer would catch this without manually checking every Q-ID. The pipeline caught it in under 90 seconds.
Score Impact
Before APEX self-audit: 35/100
After v2.9 fixes: 68/100
Target after Phase 2 content: 85+/100
The 33-point jump came entirely from structural fixes — correcting entity links, adding answer blocks, creating legal pages, and unifying the Organization profile. No new content was created for the initial fix. The remaining gap (68 → 85+) requires content depth — more case studies, blog posts, and third-party trust signals. That’s Phase 2, and it’s in progress.
Why This Matters
Most AI consulting firms sell you a service. They run an audit and hand you a report. You never see the tool, never own the pipeline, and never know if the recommendations actually worked.
APEX is different. We built the pipeline. We ran it on ourselves. It found real, verifiable problems — 11 broken Wikidata links, entity identity conflicts, missing answer blocks. Then we fixed every issue it found and re-ran the pipeline to confirm the score jumped. The entire cycle — detect, diagnose, fix, verify — took under 30 minutes.
This is the model for every Starbloom deployment:
- Deploy the pipeline on your hardware — not our cloud, not our API
- Run the self-audit — find your semantic gaps
- Fix what it finds — structured, actionable, no guesswork
- Re-run and verify — measure the score delta
- Keep the pipeline — run it weekly, monthly, or on every content change
We ate our own dog food. The dog food worked. The pipeline that audited our site now audits yours — but it runs on your hardware, with your data, under your control.
That’s the difference between selling consulting hours and deploying infrastructure.
Technical Architecture
WEBHOOK → Master Controller (n8n)
├─ Execute Agent 1: Semantic Miner
│ ├─ Batch Crawl (Jina markdown + raw HTML schema extraction)
│ ├─ xAI Grok Chat (trend analysis + pattern detection)
│ └─ Output Reshaper (merged content + schema audit)
│
├─ Execute Agent 2: Authority Validator
│ ├─ Prompt Builder (structured audit framework)
│ ├─ DeepSeek Chat (entity coherence + Q-ID verification)
│ ├─ Validation Gate
│ └─ Score + Recommendations
│
└─ Execute Agent 3: Executive Editor
├─ Prompt Builder
├─ Gemini 2.5 Pro (report generation)
└─ Markdown Validation → Final Report
Infrastructure:
- Orchestration: n8n v2.19.5 (Docker)
- Models: Grok-Beta (retrieval), DeepSeek Chat (reasoning), Gemini 2.5 Pro (editing)
- Connectivity: Outbound-only Cloudflare Tunnel
- Hardware: Work PC (RTX 3050 6GB) + Gaming Rig (RTX 4060 Ti 8GB)
- Local models available: qwen2.5-coder:7b (code), qwen3:8b (general reasoning)
- Pipeline runtime: ~45-90 seconds per full audit
What’s Next
APEX v3.0 is in the build plan. The key upgrades:
- Dynamic page discovery — sitemap.xml parsing instead of hardcoded URL lists
- Firecrawl integration — higher-fidelity page extraction with JavaScript rendering
- Competitive benchmarking — run the same audit against competitor URLs automatically
- Scheduled audits — weekly automated runs with score trend tracking
- Multi-client support — audit multiple domains from a single pipeline instance
The goal is a self-improving semantic audit system that runs on client hardware and produces better results every time it runs. No subscriptions. No API tokens that expire. Just infrastructure you own.