Why Sherloq Is Building the Federated Search Layer for SQL

Jul 21, 2025


SQL is everywhere, but nowhere centralized.

At most organizations, SQL lives in dozens of places: Snowflake worksheets, dbt models, GitHub repos, dashboards, notebooks, and scripts. Each one tells part of the story, but there’s no reliable way to see the full picture.

Sherloq was built to fix that. But this isn’t just about better search.

It’s about solving a core challenge in modern data work: how do you search across heterogeneous sources, written in different styles, with different schemas, syntax, and structure, and still understand what’s actually going on?

That’s the real problem federated search is designed to solve.


The Core Challenge: Fragmented SQL Across Sources

At first glance, centralization may seem like the obvious solution. Just put everything in one database.

But that only works if the sources are uniform. In real-world organizations, they never are.


Each source of SQL reflects different realities:

  • Different syntax conventions (Snowflake SQL, dbt Jinja macros, raw SQL scripts)

  • Different environments and access levels

  • Different naming conventions for the same concept (user_id vs. uid vs. customer_id)

  • Different query intent (ad hoc vs. production vs. debug)

You can’t flatten these into a single system. You need infrastructure that respects these differences and still allows you to reason across them.

Sherloq’s Approach: Federated Indexing and Search


Sherloq treats SQL as a primary knowledge asset and builds a federated search index around it.

Instead of requiring users to adopt a new IDE or workflow, Sherloq integrates with existing tools such as Snowflake, dbt, and GitHub to extract and organize SQL artifacts.


The core system includes:

  • Connectors that pull queries from multiple sources, without requiring migration

  • Parsing and metadata enrichment, including table usage, join structures, authorship, and execution context

  • Embedding models that capture semantic similarity between queries

  • An indexing layer that supports both keyword-based and vector-based retrieval

This allows Sherloq to serve as a cross-platform query layer. Users, tools, and AI systems can search, retrieve, and act on SQL across disparate environments.
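The last item in the list above, an index that serves both keyword-based and vector-based retrieval, can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not Sherloq's implementation: a term-count vector stands in for a learned embedding model (so the "vector" side here is still lexical), keyword matching is plain token overlap, and `tokenize`, `hybrid_search`, the blending weight `alpha`, and the sample artifacts are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(sql: str) -> list[str]:
    # Keep lowercase identifier-like tokens; punctuation and tokens starting with a digit are dropped.
    return re.findall(r"[a-z_][a-z0-9_]*", sql.lower())

def keyword_score(query_tokens: set[str], doc_tokens: set[str]) -> float:
    # Share of distinct query terms that also appear in the document.
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_search(query: str, docs: dict[str, str], alpha: float = 0.5, k: int = 3) -> list[tuple[str, float]]:
    # Blend the keyword score (weight alpha) with the vector score (weight 1 - alpha).
    q_tokens = tokenize(query)
    q_vec = Counter(q_tokens)
    scored = []
    for doc_id, sql in docs.items():
        d_tokens = tokenize(sql)
        kw = keyword_score(set(q_tokens), set(d_tokens))
        vec = cosine(q_vec, Counter(d_tokens))  # stand-in for embedding similarity
        scored.append((doc_id, alpha * kw + (1 - alpha) * vec))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical artifacts pulled from three different sources, left where they live.
docs = {
    "snowflake/worksheet_12": "SELECT user_id, COUNT(*) FROM payments GROUP BY user_id",
    "dbt/models/active_users.sql": "SELECT uid FROM users WHERE is_active = TRUE",
    "github/scripts/cleanup.sql": "DELETE FROM staging_events WHERE created_at < '2024-01-01'",
}

print(hybrid_search("payments by user_id", docs)[0][0])  # → snowflake/worksheet_12
```

A weighted sum is only one way to merge the two signals; rank-based fusion such as reciprocal rank fusion is a common alternative when the raw scores live on different scales.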

Federated Search Requires Semantic Understanding


Sherloq doesn’t just index text. It builds semantic understanding across systems.


We normalize and connect queries by:

  • Grouping equivalent fields using naming patterns, co-occurrence, and usage context

  • Inferring join paths and relationships across sources

  • Recognizing equivalent logic such as user_status = 'active' vs. is_active = TRUE

  • Embedding the structure and intent of queries for similarity-based retrieval
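The field grouping and equivalent-logic steps can be approximated with a hand-maintained alias table and rewrite rules. The sketch below assumes both are written by hand and covers only naming patterns; the co-occurrence and usage-context signals mentioned above are not modeled. `FIELD_ALIASES`, `PREDICATE_RULES`, `canonical_field`, and `canonical_predicate` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical alias table mapping observed column names to one canonical concept.
# Whether customer_id really means user_id is org-specific; here we assume it does.
FIELD_ALIASES = {
    "user_id": "user_id",
    "uid": "user_id",
    "userid": "user_id",
    "customer_id": "user_id",
}

# Hypothetical rewrite rules: each pattern matches one spelling of a filter
# (optionally table-qualified) and maps it to a canonical concept label.
PREDICATE_RULES = [
    (re.compile(r"^\s*(?:\w+\.)?user_status\s*=\s*'active'\s*$", re.IGNORECASE), "active_user"),
    (re.compile(r"^\s*(?:\w+\.)?is_active\s*=\s*true\s*$", re.IGNORECASE), "active_user"),
    (re.compile(r"^\s*(?:\w+\.)?is_active\s*$", re.IGNORECASE), "active_user"),
]

def canonical_field(name: str) -> str:
    # Drop table qualifiers and identifier quotes, lowercase, then apply the alias table.
    bare = name.split(".")[-1].strip('"`').lower()
    return FIELD_ALIASES.get(bare, bare)

def canonical_predicate(predicate: str) -> Optional[str]:
    # Return the concept a filter expresses, or None if no rule recognizes it.
    for pattern, concept in PREDICATE_RULES:
        if pattern.match(predicate):
            return concept
    return None

print(canonical_field("u.uid"), canonical_field('"CUSTOMER_ID"'))  # → user_id user_id
print(canonical_predicate("u.is_active = TRUE") == canonical_predicate("user_status = 'active'"))  # → True
```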


This enables you to ask questions like:

  • Where is the concept of active users calculated?

  • What joins happen between the payments and users tables across the org?

  • Which queries filter out test users?

These can’t be answered by grep. They require field-level alignment and structural reasoning across systems.
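To show what structural reasoning adds for the second question, here is a sketch that extracts table-to-table joins from SQL text, resolves aliases back to table names, and counts join conditions across sources. It is a deliberate simplification: the regexes handle only plain `FROM t [AS] alias` / `JOIN t alias ON a.col = b.col` shapes, where a real system would need a full, dialect-aware SQL parser. `extract_joins`, `join_usage`, and the sample queries are hypothetical.

```python
import re
from collections import Counter

# FROM/JOIN followed by a (possibly schema-qualified) table and an optional alias.
# The negative lookahead stops SQL keywords such as JOIN or WHERE from being read as aliases.
TABLE_REF = re.compile(
    r"\b(?:from|join)\s+([a-z_][\w.]*)"
    r"(?:\s+(?:as\s+)?(?!(?:join|on|where|left|right|inner|outer|full|cross|group|order|limit|using)\b)([a-z_]\w*))?",
    re.IGNORECASE,
)
# Equality join condition of the form "ON a.col = b.col" (only the first equality is read).
JOIN_COND = re.compile(r"\bon\s+([a-z_]\w*)\.(\w+)\s*=\s*([a-z_]\w*)\.(\w+)", re.IGNORECASE)

def extract_joins(sql: str) -> list[tuple[str, str, str]]:
    """Return (left_table, right_table, condition) for each equality join found."""
    aliases: dict[str, str] = {}
    for table, alias in TABLE_REF.findall(sql):
        name = table.split(".")[-1].lower()  # drop schema/database prefix
        aliases[name] = name
        if alias:
            aliases[alias.lower()] = name
    joins = []
    for left_ref, left_col, right_ref, right_col in JOIN_COND.findall(sql):
        left = aliases.get(left_ref.lower(), left_ref.lower())
        right = aliases.get(right_ref.lower(), right_ref.lower())
        # Sort the two sides so "a.x = b.y" and "b.y = a.x" count as the same join.
        sides = sorted([f"{left}.{left_col.lower()}", f"{right}.{right_col.lower()}"])
        joins.append((left, right, " = ".join(sides)))
    return joins

def join_usage(queries: dict[str, str], table_a: str, table_b: str) -> Counter:
    """Count each distinct join condition between two tables across all sources."""
    wanted = {table_a.lower(), table_b.lower()}
    counts: Counter = Counter()
    for sql in queries.values():
        for left, right, condition in extract_joins(sql):
            if {left, right} == wanted:
                counts[condition] += 1
    return counts

queries = {
    "snowflake/worksheet_7": "SELECT * FROM payments p JOIN users u ON p.user_id = u.id",
    "dbt/models/revenue.sql": "select sum(pay.amount) from analytics.users as usr "
                              "left join analytics.payments pay on usr.id = pay.user_id",
    "github/scripts/audit.sql": "SELECT * FROM payments JOIN refunds ON payments.id = refunds.payment_id",
}
print(join_usage(queries, "payments", "users"))  # → Counter({'payments.user_id = users.id': 2})
```

The two payments/users joins are written with different aliases, casing, and operand order, yet they collapse to one condition, which is the kind of alignment plain text search cannot do.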


Why This Matters for AI: Retrieval Comes First


Most AI systems focus on generating SQL, but generation is just the final step.

The first step is retrieving the right context.

  • What does a typical join look like between these two tables?

  • Which filters are commonly used in production logic?

  • What was the last query that used this metric or table successfully?


Without this retrieval layer, LLMs operate in a vacuum. They guess.

Sherloq provides a structured memory of how your team writes SQL. It powers retrieval-augmented generation (RAG), making AI copilots more accurate, relevant, and aligned with internal standards.

It is not enough to generate correct SQL in theory. The model must generate SQL that fits your organization’s language, logic, and constraints. That starts with surfacing the right examples.
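A minimal sketch of that retrieval step: pick past queries that touch the same tables as the user's question and place them ahead of the question as few-shot examples for a model. The record fields, the table-overlap ranking, the production tie-breaker, and the prompt wording are all assumptions for illustration; `QueryRecord`, `select_examples`, and `build_prompt` are hypothetical, and no model API is called.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryRecord:
    # Hypothetical enriched record: raw SQL plus metadata extracted during parsing.
    source: str
    sql: str
    tables: frozenset[str]  # lowercase table names the query reads
    is_production: bool     # e.g. a dbt model vs. an ad hoc worksheet

def select_examples(records: list[QueryRecord], target_tables: list[str], k: int = 2) -> list[QueryRecord]:
    # Keep queries that touch at least one target table; rank by how many they share,
    # breaking ties in favor of production queries.
    target = {t.lower() for t in target_tables}
    candidates = [r for r in records if r.tables & target]
    candidates.sort(key=lambda r: (len(r.tables & target), r.is_production), reverse=True)
    return candidates[:k]

def build_prompt(question: str, records: list[QueryRecord], target_tables: list[str], k: int = 2) -> str:
    # Assemble retrieved examples ahead of the question as context for an LLM.
    parts = ["Write SQL that follows the conventions in these examples from our codebase."]
    for r in select_examples(records, target_tables, k):
        parts.append(f"-- source: {r.source}\n{r.sql}")
    parts.append(f"-- question: {question}")
    return "\n\n".join(parts)

records = [
    QueryRecord("snowflake/worksheet_3", "SELECT * FROM payments LIMIT 10",
                frozenset({"payments"}), False),
    QueryRecord("dbt/models/user_revenue.sql",
                "SELECT u.id, SUM(p.amount) FROM users u JOIN payments p ON p.user_id = u.id GROUP BY u.id",
                frozenset({"users", "payments"}), True),
    QueryRecord("github/scripts/events.sql", "SELECT * FROM events",
                frozenset({"events"}), True),
]
print(build_prompt("Total revenue per active user", records, ["payments", "users"]))
```

Table overlap is the simplest possible ranking signal; a fuller version would combine it with keyword and semantic similarity and with whether a query ran successfully.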


A Foundation That Extends Beyond SQL


Federated indexing and semantic search are not just solutions for SQL.

They represent a broader architectural approach to working with disconnected, inconsistent, but deeply valuable knowledge artifacts, whether that knowledge is expressed in SQL, logs, JSON configs, or alerting rules.

The core idea is this: you can’t centralize everything. But you can understand it, connect it, and make it searchable.


Sherloq applies this concept to SQL because it’s one of the highest-signal, most structured sources of intent in modern data work. SQL has well-defined syntax, predictable logic patterns, and natural points of reuse, which makes it an excellent domain for building federated, AI-ready systems.


But the same principles apply elsewhere. Each domain has its own quirks, syntax, and semantics, and each deserves a system that can reason across them, not flatten them.

The model (federated indexing, semantic enrichment, context retrieval) is broadly applicable. It’s the groundwork for AI systems that learn from what your team actually does, not just what’s documented.


We’re building infrastructure that connects knowledge as it exists today, across formats and tools, and makes it usable, not just for search, but for intelligence.
