October 25, 2025

Grounding LLMs with Real-time Data

How to ground LLMs with real-time data using web search tools and structured outputs. Compare different approaches including search APIs, native LLM tools, and MCP integration.



When you build an AI agent or large language model (LLM) workflow, one major pain point is stale or hallucinated output.

For example, imagine a travel operations manager trying to keep attendees updated on airport delays, flight cancellations, or sudden event schedule changes. Without live data, the team spends time manually searching multiple sites, risking missed alerts.

Here’s the thing: if you want your AI workflow to be trustworthy and actionable, you need grounding — anchoring the output in live web search results or other external data.

Grounding means your model doesn’t just rely on internal knowledge; it taps into up-to-date external sources, typically via search tools.

A simple example: you ask “What’s the latest regulation for AI models in the EU?” If the LLM relies only on what it learned before its training cutoff (say mid-2024), the answer may be out of date. If it instead uses a search tool to fetch the latest web articles or a reliable regulatory database and incorporates them into its reasoning, the answer is much more likely to be current.

My Checklist for AI Teams

If I were advising a team building real-time workflows (like travel alerts, events, or logistics), here’s what I’d have them check before implementing grounding:

Do we have queries where live data matters and could change suddenly?
Can we afford the extra latency and cost of live search?
Are we capturing metadata (queries, sources, timestamps) for observability?
Do we monitor how often grounding is used and how successful it is?
Are we prepared for user trust / audit questions — which source, when, and how up-to-date?

Case Study (AI Workflow Example – Travel & Events)

Problem:
A travel operations manager for a global conference platform wants to keep attendees informed in real time about airport delays, cancellations, or sudden event venue changes. Currently, the team manually searches multiple airline and local news sites to verify information — slow and error-prone.

AI Assistant with Grounding:

Triggers a web search for queries like:

“London Heathrow flight delays today Oct 25 2025” or
“FIFA fan fest schedule updates October 2025”
Fetches up-to-the-minute news articles, tweets from official sources, and announcements
Injects context into the LLM prompt
Generates concise updates:

“According to Heathrow Airport (URL), flights from Terminal 5 are delayed due to technical issues. The FIFA Fan Fest schedule has moved the 5 PM match preview to the east plaza.”
Logs metadata: query, URLs, fetch timestamp, source credibility

Outcome:

Operations team and attendees get instant, actionable alerts
Reduces manual search overhead and error risk
All alerts are traceable for reporting or audit purposes

Takeaway:
Grounding turns the AI from a “best guess” assistant into a reliable real-time workflow tool, automatically keeping teams up to date in situations where schedules or conditions change suddenly.
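
The metadata-logging step in the case study can be sketched in a few lines. This is only an illustration under assumptions: the `GroundingRecord` class, its field names, and the `log_grounding` helper are hypothetical, and the URL is a placeholder. The record is serialized as one JSON line so it can be appended to any log store.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class GroundingRecord:
    """One audit entry per grounded answer (hypothetical schema)."""
    query: str
    source_urls: list[str]
    answer: str
    # Fetch timestamp in UTC, filled in when the record is created.
    fetched_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def log_grounding(record: GroundingRecord) -> str:
    """Serialize the record as a single JSON line."""
    return json.dumps(asdict(record), ensure_ascii=False)


record = GroundingRecord(
    query="London Heathrow flight delays today",
    source_urls=["https://example.com/heathrow-status"],  # placeholder URL
    answer="Flights from Terminal 5 are delayed due to technical issues.",
)
line = log_grounding(record)
print(line)
```

Keeping the query, sources, and timestamp together in one record is what makes each alert traceable later.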

Now, let’s get practical

Let’s look at how to implement grounding with real-time web search and structured outputs.

Without grounding in search

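from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Plain chat completion: no search tool, so the model can only use its training data.
resp = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "What was the latest Indian Premier League match and who won?",
        },
    ],
)
print(resp.choices[0].message.content)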

RESPONSE: 
I don’t have live access to today’s results. The most recent IPL match I have on record (up to Oct 2024) is the IPL 2024 Final on 26 May 2024, where Kolkata Knight Riders beat Sunrisers Hyderabad by 8 wickets.

If you want the latest match result right now, please check:
- Official: iplt20.com/matches/results
- ESPNcricinfo: espncricinfo.com/series/ipl-2025-xxxxx/match-schedule-fixtures-and-results

Here are the knowledge cutoff dates for major models:

Model	Knowledge Cutoff
GPT-4.1	June 2024
o4-mini	June 2024
GPT-5 (reports vary)	Sept–Oct 2024
Claude 3.5 Sonnet	April 2024
Claude 3.7 Sonnet	Oct–Nov 2024
Claude 4 (Opus/Sonnet)	Jan–Mar 2025
Gemini 2.5 (Pro/Flash)	Jan 2025
DeepSeek-V3	July–Dec 2024
QwQ-32B	Nov 28, 2024

Why Structured Outputs Are Critical for Search Grounding

When you ground an LLM with web search results, the output can be messy — dates, names, or numbers might get mixed up across multiple sources, and the LLM might phrase things differently each time. This makes it hard to use the results in a real workflow or feed them into downstream tools.

Structured outputs solve this problem by enforcing a consistent, machine-readable format. They provide:

Reliability: Each answer can be parsed automatically without fragile heuristics.
Consistency: Multiple queries follow the same format, making dashboards and analytics accurate.
Traceability: Each piece of data links back to its source, timestamp, and query.
Safer automation: Downstream AI agents or business processes can consume the data without introducing errors or hallucinations.

Example (Unstructured vs Structured)

Unstructured output from LLM:

“According to the news, Company X launched Product Y yesterday, and analysts are optimistic about its growth.”

Structured output:

{
  "company": "Company X",
  "product": "Product Y",
  "launch_date": "2025-10-24",
  "source": "https://news.example.com/article123",
  "analyst_sentiment": "positive"
}

With this structure, your system can automatically:

Update dashboards
Trigger notifications for relevant teams
Feed data into other AI agents or reporting pipelines

Structured outputs are not just a “nice-to-have.” They turn messy web data into actionable intelligence, which is essential for any real-world AI workflow that relies on live data.
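
To make the parsing benefit concrete, here is a minimal sketch that validates the JSON example above using only the standard library. The `LaunchEvent` class, the `parse_launch_event` helper, and the allowed sentiment labels are assumptions for illustration; in practice a library such as Instructor or BAML would handle this step.

```python
import json
from dataclasses import dataclass
from datetime import date

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}  # assumed label set


@dataclass(frozen=True)
class LaunchEvent:
    company: str
    product: str
    launch_date: date
    source: str
    analyst_sentiment: str


def parse_launch_event(raw: str) -> LaunchEvent:
    """Parse the model's JSON; raises ValueError or KeyError on malformed data."""
    data = json.loads(raw)
    if data["analyst_sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data['analyst_sentiment']!r}")
    if not data["source"].startswith(("http://", "https://")):
        raise ValueError("source must be a URL")
    return LaunchEvent(
        company=data["company"],
        product=data["product"],
        launch_date=date.fromisoformat(data["launch_date"]),  # rejects bad dates
        source=data["source"],
        analyst_sentiment=data["analyst_sentiment"],
    )


raw_output = """{
  "company": "Company X",
  "product": "Product Y",
  "launch_date": "2025-10-24",
  "source": "https://news.example.com/article123",
  "analyst_sentiment": "positive"
}"""
event = parse_launch_event(raw_output)
print(event.company, event.launch_date.isoformat(), event.analyst_sentiment)
```

Because bad dates, unknown sentiments, and missing fields fail loudly at the boundary, downstream dashboards and agents never see a half-valid record.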

Implementation

Some use cases demand live data — like news verification, travel updates, or event alerts.
There are three main ways to add real-time search grounding:

1. Search API + Structured Output (Recommended): Use providers like Exa, Brave, or Google Search API, then format the output with Instructor or BAML.
2. Native LLM Tools: Use the built-in search tools from providers like OpenAI, Anthropic, or Gemini.
3. MCP-based Search Integration: Best for local development or if your pipeline already runs on MCP servers.

1. Search API + Structured Output

This approach gives you the most control: you can choose the LLM provider, the search engine, and other parameters independently to get the best results.

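import os

from exa_py import Exa


def execute_web_search(query):
    # Assumes the API key is stored in the EXA_API_KEY environment variable.
    exa = Exa(api_key=os.environ["EXA_API_KEY"])
    search_params = {
        "query": query,
        "num_results": 5,
        "text": True,  # return page text along with each result
        "type": "auto",
    }
    search_response = exa.search_and_contents(**search_params)
    return search_response.results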

Filter and summarize results with structured output:

// websearch.baml
class CleanSummary {
  answer string
  links string[]
}

// SearchResults is the input class, assumed to be defined elsewhere in the project.
function SummarizeAndClean(results: SearchResults[]) -> CleanSummary[] {
  client "openai/gpt-4.1"
  prompt #"
    Summarize these search results into clean, concise summaries:
    - Remove any profanity
    - Keep key details
    - Include relevant links
    - Use the latest info if conflicting

    {{ results }}

    {{ ctx.output_format }}
  "#
}
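
Between the search call and the summarization step, it usually helps to tidy the raw hits. The sketch below is one possible way to do that, not part of the original pipeline: the `SearchResult` shape and the `prepare_results` helper are hypothetical. It drops duplicate URLs and truncates long page text so the summarizer prompt stays small.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Hypothetical shape for one search hit: title, URL, and page text."""
    title: str
    url: str
    text: str


def prepare_results(results: list[SearchResult], max_chars: int = 1000) -> list[SearchResult]:
    """Drop duplicate URLs and truncate page text before summarization."""
    seen: set[str] = set()
    cleaned = []
    for r in results:
        key = r.url.rstrip("/")  # treat ".../a" and ".../a/" as the same page
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(SearchResult(r.title, r.url, r.text[:max_chars]))
    return cleaned


hits = [
    SearchResult("Page A", "https://example.com/a", "x" * 50),
    SearchResult("Page A (duplicate)", "https://example.com/a/", "y" * 50),
    SearchResult("Page B", "https://example.com/b", "z" * 5),
]
cleaned = prepare_results(hits, max_chars=10)
print([r.url for r in cleaned])
```

The first occurrence of each URL wins, so results keep the ranking order returned by the search provider.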


2. Use Native Tools from LLM Providers

This is the fastest way to get started.

Using OpenAI client:

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    tools=[{"type": "web_search"}],
    input="What was a positive news story from today?"
)
print(response.output_text)

Using Gemini models:

from google import genai
from IPython.display import HTML, Markdown, display

client = genai.Client()  # reads the API key from the environment
MODEL_ID = "gemini-2.5-flash"  # assumption: any search-capable Gemini model

response = client.models.generate_content(
    model=MODEL_ID,
    contents="What was the latest Indian Premier League match and who won?",
    config={"tools": [{"google_search": {}}]},
)
metadata = response.candidates[0].grounding_metadata

# print the response
display(Markdown(f"**Response**:\n {response.text}"))
# print the search queries the model issued
print(f"Search Query: {metadata.web_search_queries}")
# titles of the pages used for grounding
print(f"Search Pages: {', '.join(site.web.title for site in metadata.grounding_chunks)}")
# render the search entry point returned with the response
display(HTML(metadata.search_entry_point.rendered_content))
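
For logging or citations, the grounding metadata printed above can be flattened into plain dictionaries. The helper below is a sketch: `extract_sources` is a hypothetical name, and it assumes each web chunk exposes `title` and `uri` attributes. The `SimpleNamespace` objects only mimic that attribute layout so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_sources(candidate) -> list[dict]:
    """Return one {'title', 'url'} dict per web grounding chunk."""
    metadata = getattr(candidate, "grounding_metadata", None)
    chunks = getattr(metadata, "grounding_chunks", None) or []
    sources = []
    for chunk in chunks:
        web = getattr(chunk, "web", None)
        if web is None:  # skip chunks that are not web pages
            continue
        sources.append({"title": web.title, "url": web.uri})
    return sources


# Stand-in for a real response candidate (assumed attribute layout).
fake_candidate = SimpleNamespace(
    grounding_metadata=SimpleNamespace(
        grounding_chunks=[
            SimpleNamespace(
                web=SimpleNamespace(title="example.com", uri="https://example.com/a")
            ),
            SimpleNamespace(web=None),
        ]
    )
)
sources = extract_sources(fake_candidate)
print(sources)
```

With a real response, the call would be `extract_sources(response.candidates[0])`; an ungrounded answer simply yields an empty list.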


The new Google Maps integration with the Gemini API lets you ground results in Maps data, which enables interesting queries like:
“Do any cafes around here do a good flat white? I will walk up to 20 minutes away.”

Native tool + Structured outputs

Combining structured outputs with native tools gives you more control over data format and consistency.

// websearch.baml
client<llm> WebSearchClient {
  provider "openai-responses"
  options {
    api_key env.OPENAI_API_KEY
    model "gpt-5"
    tools [{
      type "web_search"
    }]
  }
}

// WebResults is the output class, assumed to be defined elsewhere in the project.
function SearchNews(query: string) -> WebResults {
  client WebSearchClient
  prompt #"
    Search for and summarize {{ query }}. Only give results from 2025, point by point. Limit the search results to 3.
    {{ ctx.output_format }}
  "#
}

// Example query: "What are the biggest conferences on AI Engineering and what do they focus on?"

3. MCP for Search Integration

This approach makes sense if you already use MCP in your AI pipeline, where web search would be just another server. If you use Claude Desktop or an IDE that supports MCP, adding a web search MCP server is straightforward.
Create a custom client to connect to the Exa MCP server:

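import asyncio
import os

from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

# Assumes the Exa MCP endpoint and API key are set as environment variables.
EXA_MCP_URL = os.environ["EXA_MCP_URL"]
EXA_API_KEY = os.environ["EXA_API_KEY"]

transport = StreamableHttpTransport(
    url=EXA_MCP_URL,
    headers={"Authorization": f"Bearer {EXA_API_KEY}"},
)
mcp = Client(transport)

async def main():
    async with mcp:
        # List the tools first; the name and arguments below must match the server's.
        tools = await mcp.list_tools()
        search_response = await mcp.call_tool(
            "exa_search",
            {"query": "artificial intelligence latest developments", "num_results": 5},
        )
        print(search_response)

asyncio.run(main())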

Provider Comparison

Provider	Index Type	Typical Cost (per query)	Strengths	Best Fit
Brave Search API	Independent	~$0.005	Fresh index, high QPS	Broad, low-cost grounding
Google Programmable Search	Google	~$0.005	Huge corpus	Google-specific retrieval
Perplexity API	Aggregated + LLM ranking	~$0.005	Hybrid ranking	Default SERP-style retrieval
Tavily	Meta-search + extract	1–2 credits	LLM-focused	Agentic RAG
Exa	In-house + content fetch	~$0.005	Fine-grained control	Summarization & analysis
Anthropic Web Search	Native tool	~$0.01 + tokens	Governance, citations	LLM-native agents
Kagi API	Curated	~$0.025	Privacy, ad-free	Premium user experiences

Fact lookups at scale: Use Brave Base or Google JSON for the lowest cost per query, high QPS, and straightforward SERP parsing.
Agentic RAG: Use Tavily or Exa if you want summaries while keeping tight control over the output structure.
Fully LLM-native research tasks: Use Anthropic Web Search for tool-based queries with automatic citations, multi-step searches, and domain allow/block lists. This is the easiest option to get started with.
High-fidelity, privacy-focused consumer search in a product: Use Kagi API when curated ranking and ad-free results matter more than marginal per-query cost; good for premium user experiences.
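
To compare these options on cost, a back-of-the-envelope estimate is often enough. The sketch below copies the approximate prices from the table and treats them as per-query prices, which is an assumption; `monthly_search_cost` is a hypothetical helper that ignores LLM token costs, free tiers, and volume discounts.

```python
# Approximate prices from the comparison table, assumed to be USD per query.
PRICE_PER_QUERY = {
    "Brave Search API": 0.005,
    "Exa": 0.005,
    "Anthropic Web Search": 0.01,  # search fee only, token costs excluded
    "Kagi API": 0.025,
}


def monthly_search_cost(provider: str, queries_per_day: int, days: int = 30) -> float:
    """Estimated search spend for one provider over a month."""
    return round(PRICE_PER_QUERY[provider] * queries_per_day * days, 2)


for name in PRICE_PER_QUERY:
    cost = monthly_search_cost(name, queries_per_day=1000)
    print(f"{name}: ${cost:,.2f}/month")
```

At 1,000 queries per day, the gap between a $0.005 and a $0.025 provider is $150 versus $750 per month, which is why per-query price matters most for high-volume fact lookups.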

Conclusion

Based on your use case — whether you’re researching, verifying, or reasoning — choose the grounding method that fits best.
But for all use cases, structured outputs help you experiment faster, track retrieval performance, and make your AI pipeline production-ready.

Try out the examples here:
github.com/sandipan1/baml-agents