Case Study — Starbloom Consulting LLC (Self-Audit)

Patient Zero — How Starbloom's APEX Pipeline Self-Corrected Our Own Broken Entity Markup

Starbloom built APEX, a multi-agent n8n pipeline that crawled its own website, detected 11 broken Wikidata Q-IDs pointing to cacti and tin mines, and self-corrected its own semantic markup. The ultimate test subject was us — and it worked.

AEOanswer engine optimizationstructured dataWikidatan8nedge AIentity linking
Detected 15 JSON-LD blocks across 7 pages, identifying 11 broken Wikidata entity links
AEO Score jumped from 35/100 to 68/100 in one pipeline run
Self-corrected entity references — replaced broken Q-ID for Edge Computing (was pointing to a cactus genus)
7 schema types validated and cross-referenced across Organization, FAQPage, Service, AboutPage, ProfilePage, ItemList, and TechArticle

The Problem

We launched starbloomconsulting.com with structured data — Organization schema, FAQPage blocks, Service definitions, founder Person entities with knowsAbout tags linked to Wikidata Q-IDs. On paper, the site was semantically rich. But when we ran our own APEX pipeline against it for the first time, the results were humbling.

AEO Score: 35 out of 100.

The pipeline’s Semantic Miner agent crawled all seven pages and found the wreckage:

  • 11 broken Wikidata Q-IDs. Our Edge Computing reference linked to Q5337692 — a genus of cactus. Our Looker Studio Q-ID pointed to a Bolivian tin mine. Our Cloudflare Tunnel reference resolved to a computer network security page about Application Layer Gateways — technically adjacent, but conceptually wrong.
  • Zero cross-page entity coherence. The Organization schema in the global layout and the AboutPage schema on /about used different knowsAbout Wikidata references. Google’s Knowledge Graph saw two different entities claiming the same name.
  • Missing answer blocks. The FAQPage schema had 7 questions, but only 2 had corresponding <h2> + <p> answer blocks in the visible HTML. The AI parsers couldn’t find ground truth to validate the schema claims.
  • Thin content signals. One case study. One blog post. No legal pages. No third-party validation. Crawlers interpreted this as a parked domain, not an operating business.

We had built a tool to audit other people’s websites, and our own site failed the audit. So we turned the pipeline on ourselves.

The Solution — APEX v2.9 Pipeline

APEX is a four-agent n8n pipeline running entirely on local hardware — a Work PC with an RTX 3050 and a Gaming Rig with an RTX 4060 Ti, both connected via outbound-only Cloudflare Tunnel. No cloud API dependency for core processing. Here’s what each agent does:

Agent 1 — Semantic Miner

Crawls every page on the target site using dual-pass HTTP requests:

  1. Jina AI markdown extraction — pulls clean readable content from each page
  2. Raw HTML schema extraction — regex-based <script type="application/ld+json"> parsing to inventory every structured data block

For starbloomconsulting.com, Agent 1 found 15 JSON-LD blocks across 7 pages, identifying 7 distinct schema types. It then cross-referenced every sameAs Wikidata Q-ID against known-good entity references and flagged the 11 broken links. The output — 40KB of raw content plus a structured schema audit — feeds Agent 2.

Technical stack:

  • Jina AI Reader API for markdown conversion
  • Raw HTTP GET + regex JSON-LD extraction
  • Output reshaper merges content + schema audit into single payload

Agent 2 — Authority Validator

Powered by DeepSeek Chat, this agent takes the combined content and schema audit from Agent 1 and performs a structured analysis:

  • Entity coherence check — does every page reference the same Organization entity, or are there identity conflicts?
  • Wikidata verification — are all Q-IDs valid and contextually correct? (A cactus genus is not Edge Computing.)
  • Answer block validation — do FAQPage schema entries have corresponding visible HTML content?
  • Trust signal inventory — does the site have privacy policy, terms, contact information, third-party validation?
  • Competitive gap analysis — what are competitors doing that this site isn’t?

The output is a scored report with specific, actionable recommendations — not vague “improve your SEO” advice, but exact Q-ID replacements, exact schema additions, exact content gaps.

Agent 3 — Executive Editor

Powered by Gemini 2.5 Pro, this agent takes the raw audit from Agent 2 and produces a structured markdown AEO report: executive summary, entity profile, gap analysis, prioritized fixes, and a competitive score. This is the human-readable deliverable — what you’d hand to a client or your own team.

Master Controller

Orchestrates the sequential execution of all three agents, validates markdown output, and writes the final report. The entire pipeline fires from a single webhook — POST /webhook/apex-competitive-audit with a target URL.

The Results

What We Found and Fixed

IssueBeforeAfter
Broken Wikidata Q-IDs11 of 18 references were wrong (cacti, tin mines, wrong entities)All Q-IDs validated and corrected
Entity identity conflictOrganization schema and AboutPage schema used different knowsAbout Q-IDsUnified entity profile with consistent references
Answer block gaps7 FAQ schema entries, only 2 had visible HTML answersAll 7 FAQ entries backed by <h2> + <p> answer blocks
Missing legal pages/privacy and /terms returned 404Both pages created and linked from footer
Founder Person entityNo Person schema with sameAs linksDarren Kelly Person entity with LinkedIn sameAs
Content depth1 case study, 1 blog postExpanded content inventory underway
Copyright year parsingSingle “2026” flagged as future-datedChanged to “2025–2026” range

The Broken Q-ID List (Actual Examples)

Here’s what the pipeline caught — these are real Wikidata entities our site was accidentally linking to:

Our Intended EntityBroken Q-IDWhat It Actually Resolved To
Edge ComputingQ5337692Cleistocactus — a genus of cactus
Looker StudioQ60165325Cerro Rico — a Bolivian tin mine
Cloudflare TunnelQ4778915Application Layer Gateway (wrong concept)
DockerQ15206305(Deprecated — merged into another entity)
BigQueryQ4904867(Redirected to a different Google product)
On-Premise DeploymentQ2023643(Ambiguous — pointed to a general software concept, not on-premise LLM deployment)

The fact that 6 of our 10 knowsAbout Wikidata links were pointing to cacti, tin mines, or deprecated entities is exactly the kind of semantic drift that destroys AEO scores. No human reviewer would catch this without manually checking every Q-ID. The pipeline caught it in under 90 seconds.

Score Impact

Before APEX self-audit: 35/100
After v2.9 fixes: 68/100
Target after Phase 2 content: 85+/100

The 33-point jump came entirely from structural fixes — correcting entity links, adding answer blocks, creating legal pages, and unifying the Organization profile. No new content was created for the initial fix. The remaining gap (68 → 85+) requires content depth — more case studies, blog posts, and third-party trust signals. That’s Phase 2, and it’s in progress.

Why This Matters

Most AI consulting firms sell you a service. They run an audit and hand you a report. You never see the tool, never own the pipeline, and never know if the recommendations actually worked.

APEX is different. We built the pipeline. We ran it on ourselves. It found real, verifiable problems — 11 broken Wikidata links, entity identity conflicts, missing answer blocks. Then we fixed every issue it found and re-ran the pipeline to confirm the score jumped. The entire cycle — detect, diagnose, fix, verify — took under 30 minutes.

This is the model for every Starbloom deployment:

  1. Deploy the pipeline on your hardware — not our cloud, not our API
  2. Run the self-audit — find your semantic gaps
  3. Fix what it finds — structured, actionable, no guesswork
  4. Re-run and verify — measure the score delta
  5. Keep the pipeline — run it weekly, monthly, or on every content change

We ate our own dog food. The dog food worked. The pipeline that audited our site now audits yours — but it runs on your hardware, with your data, under your control.

That’s the difference between selling consulting hours and deploying infrastructure.

Technical Architecture

WEBHOOK → Master Controller (n8n)
              ├─ Execute Agent 1: Semantic Miner
              │    ├─ Batch Crawl (Jina markdown + raw HTML schema extraction)
              │    ├─ xAI Grok Chat (trend analysis + pattern detection)
              │    └─ Output Reshaper (merged content + schema audit)

              ├─ Execute Agent 2: Authority Validator
              │    ├─ Prompt Builder (structured audit framework)
              │    ├─ DeepSeek Chat (entity coherence + Q-ID verification)
              │    ├─ Validation Gate
              │    └─ Score + Recommendations

              └─ Execute Agent 3: Executive Editor
                   ├─ Prompt Builder
                   ├─ Gemini 2.5 Pro (report generation)
                   └─ Markdown Validation → Final Report

Infrastructure:

  • Orchestration: n8n v2.19.5 (Docker)
  • Models: Grok-Beta (retrieval), DeepSeek Chat (reasoning), Gemini 2.5 Pro (editing)
  • Connectivity: Outbound-only Cloudflare Tunnel
  • Hardware: Work PC (RTX 3050 6GB) + Gaming Rig (RTX 4060 Ti 8GB)
  • Local models available: qwen2.5-coder:7b (code), qwen3:8b (general reasoning)
  • Pipeline runtime: ~45-90 seconds per full audit

What’s Next

APEX v3.0 is in the build plan. The key upgrades:

  • Dynamic page discovery — sitemap.xml parsing instead of hardcoded URL lists
  • Firecrawl integration — higher-fidelity page extraction with JavaScript rendering
  • Competitive benchmarking — run the same audit against competitor URLs automatically
  • Scheduled audits — weekly automated runs with score trend tracking
  • Multi-client support — audit multiple domains from a single pipeline instance

The goal is a self-improving semantic audit system that runs on client hardware and produces better results every time it runs. No subscriptions. No API tokens that expire. Just infrastructure you own.

All Case Studies