Jul 21, 2025

SQL is everywhere, but nowhere centralized.
At most organizations, SQL lives in dozens of places: Snowflake worksheets, dbt models, GitHub repos, dashboards, notebooks, and scripts. Each one tells part of the story, but there’s no reliable way to see the full picture.
Sherloq was built to fix that. But this isn’t just about better search.
It’s about solving a core challenge in modern data work: how do you search across heterogeneous sources, written in different styles, with different schemas, syntax, and structure, and still understand what’s actually going on?
That’s the real problem federated search is designed to solve.
The Core Challenge: Fragmented SQL Across Sources
At first glance, centralization may seem like the obvious solution. Just put everything in one database.
But that only works if the sources are uniform. In real-world organizations, they never are.
Each source of SQL reflects different realities:
Different syntax conventions (Snowflake SQL, dbt Jinja macros, raw SQL scripts)
Different environments and access levels
Different naming conventions for the same concept (user_id vs. uid vs. customer_id)
Different query intent (ad hoc vs. production vs. debug)
You can’t flatten these into a single system. You need infrastructure that respects these differences and still allows you to reason across them.
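To make "respecting differences while reasoning across them" concrete, here is a minimal sketch that reduces SQL from two differently styled sources to one common record. The `QueryRecord` type and the `normalize` helper are hypothetical and not Sherloq's implementation. The regexes are a deliberately naive stand-in for a real SQL parser: they resolve dbt `{{ ref('...') }}` calls to plain table names, strip database and schema prefixes, and record joins as unordered table pairs.

```python
import re
from dataclasses import dataclass, field

# Hypothetical common record: every source is reduced to the same shape.
@dataclass
class QueryRecord:
    source: str                                   # e.g. "snowflake", "dbt", "github"
    sql: str                                      # raw text exactly as found in the source
    tables: set[str] = field(default_factory=set)
    joins: set[tuple[str, str]] = field(default_factory=set)

# dbt models reference tables through Jinja, e.g. {{ ref('users') }}.
DBT_REF = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")
# Naive table capture after FROM / JOIN; a production system would use a real SQL parser.
TABLE = re.compile(r"\b(from|join)\s+([\w.]+)", re.IGNORECASE)

def normalize(source: str, sql: str) -> QueryRecord:
    # Replace dbt ref() calls with bare table names so every source parses alike.
    text = DBT_REF.sub(lambda m: m.group(1), sql)
    record = QueryRecord(source=source, sql=sql)
    base = None
    for keyword, table in TABLE.findall(text):
        name = table.lower().split(".")[-1]       # drop database/schema prefix, ignore case
        record.tables.add(name)
        if keyword.lower() == "from":
            base = name
        elif base is not None:
            # Store the join as a sorted pair so (a, b) and (b, a) compare equal.
            record.joins.add(tuple(sorted((base, name))))
    return record

# Same relationship, written in two different styles.
dbt = normalize("dbt", "select * from {{ ref('payments') }} p join {{ ref('users') }} u on p.user_id = u.id")
raw = normalize("snowflake", "SELECT * FROM analytics.public.USERS u JOIN analytics.public.payments p ON u.uid = p.uid")
print(dbt.joins == raw.joins)  # True
```

The raw text is kept on each record, so normalization adds a shared view without discarding what each source actually says.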
Sherloq’s Approach: Federated Indexing and Search
Sherloq treats SQL as a primary knowledge asset and builds a federated search index around it.
Instead of requiring users to adopt a new IDE or workflow, Sherloq integrates with existing tools, including Snowflake, dbt, and GitHub, to extract and organize SQL artifacts.
The core system includes:
Connectors that pull queries from multiple sources, without requiring migration
Parsing and metadata enrichment, including table usage, join structures, authorship, and execution context
Embedding models that capture semantic similarity between queries
An indexing layer that supports both keyword-based and vector-based retrieval
This allows Sherloq to serve as a cross-platform query layer. Users, tools, and AI systems can search, retrieve, and act on SQL across disparate environments.
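An indexing layer that supports both keyword-based and vector-based retrieval can be sketched as a weighted blend of two scores. This toy version assumes a bag-of-words vector in place of a learned embedding model, so its "vector" score is still lexical; a real embedding would also match synonyms. The `HybridIndex` class, the `alpha` weight, and the document IDs are illustrative assumptions, not Sherloq's API.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: a sparse bag-of-words vector.
    return Counter(tokenize(text))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class HybridIndex:
    """Toy index that ranks documents by keyword overlap plus vector similarity."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha  # weight on the keyword score; 1 - alpha goes to the vector score
        self.docs: list[tuple[str, str, Counter]] = []  # (doc_id, text, vector)

    def add(self, doc_id: str, text: str) -> None:
        self.docs.append((doc_id, text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q_terms = set(tokenize(query))
        q_vec = embed(query)
        scored = []
        for doc_id, _text, vec in self.docs:
            # Keyword score: share of query terms that appear verbatim in the document.
            keyword = len(q_terms & set(vec)) / len(q_terms) if q_terms else 0.0
            semantic = cosine(q_vec, vec)
            scored.append((doc_id, self.alpha * keyword + (1 - self.alpha) * semantic))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

index = HybridIndex(alpha=0.5)
index.add("snowflake/ws_12", "select count(*) from users where is_active = true")
index.add("dbt/payments_daily", "select sum(amount) from payments group by day")
results = index.search("active users", k=1)
print(results[0][0])  # snowflake/ws_12
```

Keeping both scores matters because exact identifiers (table and column names) reward keyword matching, while paraphrased questions reward vector similarity.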
Federated Search Requires Semantic Understanding
Sherloq doesn’t just index text. It builds semantic understanding across systems.
We normalize and connect queries by:
Grouping equivalent fields using naming patterns, co-occurrence, and usage context
Inferring join paths and relationships across sources
Recognizing equivalent logic such as user_status = 'active' vs. is_active = TRUE
Embedding the structure and intent of queries for similarity-based retrieval
This enables you to ask questions like:
Where is the concept of active users calculated?
What joins happen between payments and users tables across the org?
Which queries filter out test users?
These can’t be answered by grep. They require field-level alignment and structural reasoning across systems.
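Here is a minimal sketch of the two normalization steps behind a question like "where is the concept of active users calculated?": mapping field aliases to one canonical key, and tagging different predicates that express the same concept. The alias table and concept patterns are hypothetical and hard-coded for illustration; the post describes learning these from naming patterns, co-occurrence, and usage context, which this sketch does not attempt.

```python
import re

# Hypothetical alias groups: different names for the same underlying concept.
FIELD_ALIASES = {"user_id": "user_key", "uid": "user_key", "customer_id": "user_key"}

# Hypothetical concept rules: syntactically different predicates with the same meaning.
CONCEPT_PATTERNS = {
    "active_user": [
        re.compile(r"\buser_status\s*=\s*'active'", re.IGNORECASE),
        re.compile(r"\bis_active\s*=\s*true\b", re.IGNORECASE),
    ],
}

def canonical_fields(sql: str) -> set[str]:
    # Map every recognized column name to its canonical key.
    tokens = re.findall(r"[a-z_][a-z0-9_]*", sql.lower())
    return {FIELD_ALIASES[t] for t in tokens if t in FIELD_ALIASES}

def concepts(sql: str) -> set[str]:
    # Tag the query with every concept whose pattern it matches.
    return {name for name, patterns in CONCEPT_PATTERNS.items() if any(p.search(sql) for p in patterns)}

queries = {
    "snowflake/ws_7": "SELECT uid FROM users WHERE user_status = 'active'",
    "dbt/dim_users": "select customer_id from {{ ref('customers') }} where is_active = TRUE",
    "github/debug.sql": "select user_id from users where is_active = false",
}

# "Where is the concept of active users calculated?"
active = sorted(qid for qid, sql in queries.items() if "active_user" in concepts(sql))
print(active)  # ['dbt/dim_users', 'snowflake/ws_7']
```

A text search for `is_active` would miss the Snowflake worksheet and match the debug query, which is exactly the gap this alignment step closes.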
Why This Matters for AI: Retrieval Comes First
Most AI systems focus on generating SQL, but generation is just the final step.
The first step is retrieving the right context, which means answering questions like:
What does a typical join look like between these two tables?
Which filters are commonly used in production logic?
What was the last query that used this metric or table successfully?
Without this retrieval layer, LLMs operate in a vacuum. They guess.
Sherloq provides a structured memory of how your team writes SQL. It powers retrieval-augmented generation (RAG), making AI copilots more accurate, relevant, and aligned with internal standards.
It is not enough to generate correct SQL in theory. The model must generate SQL that fits your organization’s language, logic, and constraints. That starts with surfacing the right examples.
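The retrieval step that precedes generation can be sketched as "find recent, successful queries over the same tables, then place them in the prompt as examples." Everything here is an assumption for illustration: the `Example` record, the filtering rule, and the prompt wording. The call to an actual LLM is deliberately left out, since any model could consume the resulting string.

```python
from dataclasses import dataclass

@dataclass
class Example:
    query_id: str
    sql: str
    tables: frozenset[str]
    succeeded: bool   # did the last execution succeed?
    last_run: str     # ISO date of the last execution

def retrieve(examples: list[Example], tables: set[str], k: int = 2) -> list[Example]:
    # Keep queries that touched every requested table and last ran successfully,
    # then prefer the most recently executed ones.
    hits = [e for e in examples if tables <= e.tables and e.succeeded]
    hits.sort(key=lambda e: e.last_run, reverse=True)
    return hits[:k]

def build_prompt(question: str, context: list[Example]) -> str:
    # Put the retrieved team examples in front of the question as few-shot context.
    shots = "\n\n".join(f"-- {e.query_id} (last run {e.last_run})\n{e.sql}" for e in context)
    return (
        "You write SQL for our warehouse. Follow the conventions in these examples.\n\n"
        f"{shots}\n\n-- Question: {question}\n"
    )

examples = [
    Example("dbt/fct_revenue",
            "select u.id, sum(p.amount) from payments p join users u on p.user_id = u.id group by 1",
            frozenset({"payments", "users"}), True, "2025-07-18"),
    Example("ws/scratch_3", "select * from payments p join users u on p.uid = u.uid",
            frozenset({"payments", "users"}), False, "2025-07-20"),
    Example("ws/churn", "select id from users where is_active = true",
            frozenset({"users"}), True, "2025-07-19"),
]
context = retrieve(examples, {"payments", "users"})
print([e.query_id for e in context])  # ['dbt/fct_revenue']
prompt = build_prompt("Total payments per active user last month", context)
```

The failed scratch query is filtered out even though it is the most recent, which reflects the point above: the model should see SQL that actually works in your environment, not just any SQL that mentions the right tables.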
A Foundation That Extends Beyond SQL
Federated indexing and semantic search are not just solutions for SQL.
They represent a broader architectural approach to working with disconnected, inconsistent, but deeply valuable knowledge artifacts - whether that knowledge is expressed in SQL, logs, JSON configs, or alerting rules.
The core idea is this: you can’t centralize everything. But you can understand it, connect it, and make it searchable.
Sherloq applies this concept to SQL because it’s one of the highest-signal, most structured sources of intent in modern data work. SQL has well-defined syntax, predictable logic patterns, and natural points of reuse, which makes it an excellent domain for building federated, AI-ready systems.
But the same principles apply elsewhere. Each domain has its own quirks, syntax, and semantics - and each deserves a system that can reason across them, not flatten them.
The pattern (federated indexing, semantic enrichment, and context retrieval) is broadly applicable. It’s the groundwork for AI systems that learn from what your team actually does, not just what’s documented.
We’re building infrastructure that connects knowledge as it exists today, across formats and tools, and makes it usable, not just for search, but for intelligence.