Production-Ready MCP Clients for LLMs
Build robust, scalable MCP integrations with structured validation, error handling, and debugging capabilities
In the previous blog post, we explored the basics of MCP and built a simple integration connecting your LLM to an MCP server using Pydantic AI. That approach works perfectly for demos and experimentation, but production systems need more robust patterns. If you don't know what's happening underneath, you'll lose visibility and control at the worst possible time: when things break.
This post focuses on building an MCP client that can handle real-world complexity: multiple concurrent tool calls, validation failures, error recovery, and the debugging visibility you need when things go wrong.
The Production Reality Check
Here's what happens when you move beyond demos:
Tool Selection Failures: Your LLM confidently chooses write_query when it should use read_query, potentially corrupting data.
Parameter Validation Issues: The LLM generates syntactically correct but semantically wrong SQL queries that crash your database.
Concurrent Execution Problems: Multiple tool calls running simultaneously cause race conditions and resource conflicts.
Debugging Nightmares: When something goes wrong, you have no visibility into why the LLM made specific tool choices.
Scale Bottlenecks: Your simple sequential execution pattern can't handle the volume of requests in production.
Let's solve these problems systematically.
Case Study
An e-commerce company deployed an AI assistant to help its customer service team query their customer service database. The demo was flawless: the AI could answer questions like "How many customers signed up last month?" and "Show me John Smith's order history" with perfect accuracy.
Three weeks after launch, their system had:
- Created 2,847 duplicate customer records (AI chose INSERT instead of SELECT)
- Corrupted 156 order statuses (invalid SQL syntax passed validation)
Root cause? Their LLM integration had no validation, no error handling, and no visibility into what the AI was actually doing.
Architecture Overview: Structured Tool Call Management

Instead of letting the LLM directly execute tools through a simple interface, we'll build a structured pipeline:
- Tool Discovery & Model Generation: Dynamically create validation models for each tool
- Intelligent Tool Selection: Let the LLM choose tools with proper validation
- Parameter Validation: Ensure all tool calls have valid parameters before execution
- Robust Execution: Handle errors gracefully and provide detailed logging
- Result Synthesis: Consolidate results into coherent responses
This approach gives us type safety, error recovery, and the debugging visibility needed for production systems.
Step 1: Structured Tool Definitions for LLM Use
The MCP protocol defines a structured message format based on JSON-RPC 2.0.
To ensure that the LLM can generate valid inputs for tools, we need to give it a precise schema. That's where Pydantic models come in: they enforce structure, validate inputs, and catch errors before execution.
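For context, here is roughly what a tool invocation looks like on the wire as a JSON-RPC request (the tool name and arguments below are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "read_query",
    "arguments": { "query": "SELECT COUNT(*) FROM customers" }
  }
}
```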
For the write_query tool, the corresponding Pydantic model would be:

```python
from pydantic import BaseModel, Field

class write_query(BaseModel):
    query: str = Field(..., description="SQL query to execute")
```
This guides the LLM to return JSON like
{ "query": "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)" }
Once the LLM selects the relevant tools, we convert them into ToolDefinition objects. These definitions are later used to generate Pydantic models for structured parameter generation by the LLM.
Let's define models that can handle multiple tool calls with proper validation:

```python
from typing import Dict, List

from pydantic import BaseModel, ValidationInfo, field_validator


class ToolCall(BaseModel):
    """Model for a single tool call with its parameters"""
    name: str
    description: str
    parameters: dict


class ToolCalls(BaseModel):
    """Model for multiple tool calls with comprehensive validation"""
    calls: List[ToolCall]

    @field_validator("calls")
    @classmethod
    def validate_tool_calls(cls, v, info: ValidationInfo):
        # Get available tools from context (handle missing context gracefully)
        tools: List[ToolDefinition] = (info.context or {}).get("tools", [])
        valid_tool_names = [tool.name for tool in tools]

        # Check for invalid tool names
        invalid_names = [call.name for call in v if call.name not in valid_tool_names]
        if invalid_names:
            raise ValueError(
                f"Tools {invalid_names} are not valid. Valid tools are: {valid_tool_names}"
            )

        # Prevent overwhelming the system with too many concurrent calls
        if len(v) > 4:
            raise ValueError("You can only select at most 4 tools to call")

        # Check for duplicate tool calls (usually indicates poor planning)
        tool_names = [call.name for call in v]
        if len(tool_names) != len(set(tool_names)):
            raise ValueError("Duplicate tool calls detected - this usually indicates inefficient planning")

        return v


class LLMResponse(BaseModel):
    """Structured response from the final LLM synthesis"""
    answer: str
    confidence: float = 1.0
    tools_used: List[str] = []
    warnings: List[str] = []
```
This validation layer catches common problems early:
- Invalid tool names are rejected with helpful error messages
- Resource limits prevent the system from being overwhelmed
- Duplicate calls are flagged as potential inefficiencies
- Missing context is handled gracefully
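Because the validator pulls the tool list from Pydantic's validation context, the caller has to pass that context in explicitly. A minimal usage sketch (the raw payload below is hypothetical; `tools` is the list of ToolDefinition objects discovered from the MCP server):

```python
from pydantic import ValidationError

raw_response = {
    "calls": [
        {
            "name": "read_query",
            "description": "Run a SELECT statement",
            "parameters": {"query": "SELECT COUNT(*) FROM customers"},
        }
    ]
}

try:
    # context= is how the validator's info.context gets populated
    tool_calls = ToolCalls.model_validate(raw_response, context={"tools": tools})
except ValidationError as exc:
    # An unknown tool name, duplicates, or more than 4 calls ends up here
    print(f"LLM produced invalid tool calls: {exc}")
```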
Step 2: Dynamic Tool Model Creation
Rather than manually creating Pydantic models for each tool, we can dynamically generate them from the MCP server's tool schemas. Each generated model acts as a response template for the LLM, guiding it to produce structured parameters tailored to the tool.
Once the parameters are generated, they're used to invoke the tool via the MCP client, completing the reasoning-to-execution loop:
```python
def create_tool_models(tools: List[ToolDefinition]) -> Dict[str, BaseModel]:
    """
    Creates Pydantic models for each tool based on their schemas.

    Args:
        tools: List of tool definitions from the MCP client

    Returns:
        Dictionary mapping tool names to their Pydantic models
    """
    tool_models = {}
    for tool in tools:
        tool_def = ToolDefinition(
            name=tool.name,
            description=tool.description,
            parameters_json_schema=tool.inputSchema,
        )
        tool_models[tool.name] = create_model_from_tool_schema(tool_def)
    return tool_models


# Example usage:
async def initialize_database_agent():
    """Initialize an agent that can manage customer database operations"""
    config = {
        "mcpServers": {
            "customer_db": {
                "command": "uvx",
                "args": ["mcp-server-sqlite", "--db-path", "customers.sqlite"],
            }
        }
    }

    client = Client(config)
    async with client:
        tools = await client.list_tools()
        tool_models = create_tool_models(tools)

        print(f"Successfully created models for {len(tool_models)} tools:")
        for name in tool_models.keys():
            print(f" - {name}")

        return client, tools, tool_models
```
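The helper create_model_from_tool_schema is referenced above but not shown. Here's a minimal sketch, assuming flat JSON schemas with primitive property types; a real implementation would handle nested objects, enums, and defaults, and presumably also carries the tool's name/description fields (which is why Step 4 excludes them when dumping parameters):

```python
from typing import Any, Dict, Optional, Type

from pydantic import BaseModel, Field, create_model

# Rough mapping from JSON schema types to Python types
_JSON_TO_PY: Dict[str, type] = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def create_model_from_tool_schema(tool: "ToolDefinition") -> Type[BaseModel]:
    """Sketch: build a Pydantic model from a tool's JSON schema."""
    schema = tool.parameters_json_schema or {}
    properties = schema.get("properties", {})
    required = set(schema.get("required", []))

    fields: Dict[str, Any] = {}
    for prop_name, prop_schema in properties.items():
        py_type = _JSON_TO_PY.get(prop_schema.get("type", "string"), str)
        description = prop_schema.get("description", "")
        if prop_name in required:
            fields[prop_name] = (py_type, Field(..., description=description))
        else:
            fields[prop_name] = (Optional[py_type], Field(None, description=description))

    return create_model(tool.name, **fields)
```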
This approach has several advantages:
- Automatic synchronization with MCP server capabilities as they evolve over time
- Type safety for all tool parameters
- Graceful degradation when individual tools have schema issues
- Easy debugging of tool model creation problems
Step 3: Intelligent Tool Selection with Context
Instead of letting the LLM choose tools blindly, we'll give it rich context about available tools and guide it toward making good decisions:
```python
async def generate_tool_calls(
    user_query: str,
    tools: List[ToolDefinition],
    async_client,
    context: Dict = None,
) -> ToolCalls:
    """
    Uses the LLM to generate appropriate tool calls based on the user query.

    Returns:
        ToolCalls object containing the LLM's chosen tool calls
    """
    system_prompt = f"""You are a helpful assistant that can call tools in response to user requests.

Available tools:
{[f"- {tool.name}: {tool.description}" for tool in tools]}

Guidelines for tool selection:
1. **Read before write**: Always use read_query before write_query to understand data structure
2. **Validate existence**: Use list_tables or describe_table to check if tables/columns exist
3. **Be conservative**: Don't make unnecessary function calls
4. **Think sequentially**: Some operations must happen in order
5. **Maximum 4 tools**: You can select at most 4 tools per request

For each tool call, provide the appropriate parameters based on the tool's schema.
Think step by step about what information you need and in what order."""

    return await async_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_query},
        ],
        temperature=0.0,  # Use deterministic output for tool selection
        response_model=ToolCalls,
        context={"tools": tools},
    )
```
The improved prompting includes:
- Tool names and descriptions so the LLM understands exactly what each tool does
- Best practice guidelines based on common failure patterns
- Sequential thinking encouragement for multi-step operations
- Conservative defaults to prevent unnecessary tool execution
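Note that async_client isn't defined in this snippet; it behaves like an OpenAI client patched by the instructor library, since response_model and the validation context keyword aren't raw OpenAI API parameters. A minimal setup sketch, assuming instructor:

```python
import instructor
from openai import AsyncOpenAI

# Patch the OpenAI async client so it returns validated Pydantic models.
# Depending on your instructor version, the keyword used in generate_tool_calls
# may need to be validation_context= instead of context=.
async_client = instructor.from_openai(AsyncOpenAI())

# tool_calls = await generate_tool_calls("How many customers signed up last month?", tools, async_client)
```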
Step 4: Robust Tool Execution with Error Handling
Now we'll execute the tool calls with comprehensive error handling and logging:
```python
import asyncio
import logging
import time

logger = logging.getLogger(__name__)


async def execute_tool_calls(
    tool_calls: ToolCalls,
    tool_models: Dict[str, BaseModel],
    client,
    timeout: int = 30,
) -> List[dict]:
    """
    Executes tool calls with robust error handling and detailed logging.

    Args:
        tool_calls: The generated tool calls to execute
        tool_models: Dictionary of tool models for validation
        client: The MCP client
        timeout: Timeout for individual tool calls

    Returns:
        List of dictionaries containing tool call results and metadata
    """
    results = []

    for i, tool_call in enumerate(tool_calls.calls):
        start_time = time.time()

        try:
            # Validate tool exists
            if tool_call.name not in tool_models:
                error_msg = f"Tool {tool_call.name} not found in available tools"
                logger.error(error_msg)
                results.append({
                    "tool_name": tool_call.name,
                    "parameters": tool_call.parameters,
                    "response": None,
                    "error": error_msg,
                    "execution_time": 0,
                    "success": False,
                })
                continue

            # Validate parameters using the tool's Pydantic model
            tool_model = tool_models[tool_call.name]
            try:
                validated_params = tool_model(**tool_call.parameters)
                clean_params = validated_params.model_dump(exclude={"name", "description"})
            except Exception as validation_error:
                error_msg = f"Parameter validation failed for {tool_call.name}: {validation_error}"
                logger.error(error_msg)
                results.append({
                    "tool_name": tool_call.name,
                    "parameters": tool_call.parameters,
                    "response": None,
                    "error": error_msg,
                    "execution_time": 0,
                    "success": False,
                })
                continue

            # Execute the tool call with timeout
            logger.info(f"Executing tool {tool_call.name} with params: {clean_params}")
            try:
                response = await asyncio.wait_for(
                    client.call_tool(tool_call.name, clean_params),
                    timeout=timeout,
                )
                execution_time = time.time() - start_time
                logger.info(f"Tool {tool_call.name} completed in {execution_time:.2f}s")
                results.append({
                    "tool_name": tool_call.name,
                    "parameters": clean_params,
                    "response": response[0] if response else "No response",
                    "error": None,
                    "execution_time": execution_time,
                    "success": True,
                })
            except asyncio.TimeoutError:
                error_msg = f"Tool {tool_call.name} timed out after {timeout}s"
                logger.error(error_msg)
                results.append({
                    "tool_name": tool_call.name,
                    "parameters": clean_params,
                    "response": None,
                    "error": error_msg,
                    "execution_time": timeout,
                    "success": False,
                })
            except Exception as execution_error:
                execution_time = time.time() - start_time
                error_msg = f"Tool execution failed: {execution_error}"
                logger.error(f"Tool {tool_call.name} failed after {execution_time:.2f}s: {execution_error}")
                results.append({
                    "tool_name": tool_call.name,
                    "parameters": clean_params,
                    "response": None,
                    "error": error_msg,
                    "execution_time": execution_time,
                    "success": False,
                })

        except Exception as unexpected_error:
            # Catch-all for any unexpected errors
            execution_time = time.time() - start_time
            error_msg = f"Unexpected error: {unexpected_error}"
            logger.error(f"Unexpected error in tool {tool_call.name}: {unexpected_error}")
            results.append({
                "tool_name": tool_call.name,
                "parameters": tool_call.parameters,
                "response": None,
                "error": error_msg,
                "execution_time": execution_time,
                "success": False,
            })

    # Log execution summary
    successful_calls = sum(1 for r in results if r["success"])
    total_time = sum(r["execution_time"] for r in results)
    logger.info(
        f"Executed {len(results)} tool calls: {successful_calls} successful, total time: {total_time:.2f}s"
    )

    return results
```
This execution framework provides:
- Parameter validation before any tool execution
- Timeout protection to prevent hanging calls
- Detailed logging for debugging and monitoring
- Graceful error handling that doesn't crash the entire workflow
- Performance metrics for optimization
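The loop above runs calls sequentially, which keeps logging simple but can be slow when calls are independent. A hedged sketch of a concurrent variant, bounding parallelism with a semaphore; it assumes the per-call validation/timeout/logging logic above has been factored into a hypothetical execute_single_call helper, and it is only appropriate when the selected calls don't depend on each other (sequential execution stays safer for read-then-write plans):

```python
import asyncio


async def execute_tool_calls_concurrently(
    tool_calls: ToolCalls,
    tool_models: Dict[str, BaseModel],
    client,
    max_concurrency: int = 2,
    timeout: int = 30,
) -> List[dict]:
    """Run independent tool calls in parallel, capped by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(tool_call: ToolCall) -> dict:
        async with semaphore:
            # execute_single_call is a hypothetical refactor of the per-call
            # logic from execute_tool_calls; it returns the same result dict.
            return await execute_single_call(tool_call, tool_models, client, timeout)

    # No return_exceptions needed: execute_single_call converts failures
    # into result dicts instead of raising.
    return await asyncio.gather(*(run_one(call) for call in tool_calls.calls))
```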
Step 5: Intelligent Response Synthesis
Finally, we'll synthesize the results into a coherent response that acknowledges both successes and failures:
```python
async def generate_final_response(
    user_query: str,
    tool_responses: List[dict],
    async_client,
) -> LLMResponse:
    """
    Generates a comprehensive final response that handles both successful
    and failed tool executions intelligently.
    """
    # Separate successful and failed tool calls
    successful_results = [r for r in tool_responses if r["success"]]
    failed_results = [r for r in tool_responses if not r["success"]]

    # Build context for the LLM
    context_parts = []

    if successful_results:
        context_parts.append("Successful tool executions:")
        for result in successful_results:
            context_parts.append(f"- {result['tool_name']}: {result['response']}")

    if failed_results:
        context_parts.append("\nFailed tool executions:")
        for result in failed_results:
            context_parts.append(f"- {result['tool_name']}: {result['error']}")

    context = "\n".join(context_parts)

    system_prompt = f"""You are analyzing the results of tool executions to answer a user query.

Original query: {user_query}

Tool execution results:
{context}

Instructions:
1. If all tools succeeded, provide a complete answer based on the results
2. If some tools failed, acknowledge the limitations and provide partial answers where possible
3. If critical tools failed, explain what couldn't be determined and why
4. Suggest next steps if the query couldn't be fully answered
5. Be honest about limitations - don't make up information

Provide your confidence level (0.0-1.0) based on how completely you could answer the query."""

    return await async_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Please analyze these results and provide a comprehensive response."},
        ],
        temperature=0.1,
        response_model=LLMResponse,
    )
```
Step 6: Complete Production Workflow
Here's how all the pieces fit together in a production-ready system:
```python
import asyncio
import time

from fastmcp import Client


async def production_mcp_workflow(
    user_query: str,
    config: dict,
    max_retries: int = 2,
) -> LLMResponse:
    """
    Complete production-ready MCP workflow with error handling and retries.
    """
    for attempt in range(max_retries + 1):
        try:
            logger.info(f"Processing query (attempt {attempt + 1}): {user_query}")

            # Initialize MCP client and discover tools
            client = Client(config)
            async with client:
                tools = await client.list_tools()
                tool_models = create_tool_models(tools)

                if not tools:
                    return LLMResponse(
                        answer="No tools available from the MCP server.",
                        confidence=0.0,
                        warnings=["MCP server returned no tools"],
                    )

                # Generate tool calls
                tool_calls = await generate_tool_calls(user_query, tools, async_client)
                logger.info(
                    f"Generated {len(tool_calls.calls)} tool calls: {[call.name for call in tool_calls.calls]}"
                )

                # Execute tools
                tool_responses = await execute_tool_calls(tool_calls, tool_models, client)
                logger.info(f"Executed {len(tool_responses)} tool calls")

                # Generate final response
                final_response = await generate_final_response(
                    user_query=user_query,
                    tool_responses=tool_responses,
                    async_client=async_client,
                )
                return final_response

        except Exception as e:
            logger.error(f"Error in workflow (attempt {attempt + 1}): {str(e)}")

            if attempt == max_retries:
                return LLMResponse(
                    answer="I apologize, but I encountered an error while processing your request. Please try again later.",
                    confidence=0.0,
                    warnings=[f"Workflow failed after {max_retries + 1} attempts: {str(e)}"],
                )

            # Wait before retrying (linearly increasing backoff)
            await asyncio.sleep(1 * (attempt + 1))
```
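Putting it together, here's a hedged usage sketch for the query below. The config mirrors the SQLite example from Step 2, and async_client is the instructor-patched client from Step 3:

```python
async def main():
    config = {
        "mcpServers": {
            "customer_db": {
                "command": "uvx",
                "args": ["mcp-server-sqlite", "--db-path", "customers.sqlite"],
            }
        }
    }
    response = await production_mcp_workflow(
        user_query="see if the table animal exists. If it exists, give description of the table",
        config=config,
    )
    print(response.answer)


if __name__ == "__main__":
    asyncio.run(main())
```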
User Query : "see if the table animal exists. If it exists, give description of the table"
Response : The table "animals" does exist in the database. It has the following structure:
1. name: Type - TEXT, Nullable - Yes
2. type: Type - TEXT, Nullable - Yes
3. age: Type - INTEGER, Nullable - Yes
This table does not have any primary key defined
DAG-based Execution for Multi-Step Workflows
So far, we've focused on single-turn tool interactions, but many real-world queries require multi-step, dependent tool interactions between your LLM agent and an MCP server.
Suppose your LLM agent receives a user query:
"List all tables in the database and describe each one."
With a DAG-based approach:
- The agent calls the list_tables tool (root node).
- For each table returned, the agent creates a node to call the describe_table tool.
- The results are gathered and passed to the LLM for summarization or further reasoning.
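A minimal sketch of this fan-out pattern, assuming the same fastmcp Client and the SQLite server's list_tables/describe_table tools used earlier; a full DAG executor would also track dependencies explicitly and run independent branches concurrently:

```python
async def list_and_describe_tables(client) -> dict:
    """Root node: list_tables; one dependent node per table: describe_table."""
    tables_result = await client.call_tool("list_tables", {})
    # The exact result shape depends on the MCP server; parse_table_names is a
    # hypothetical helper that extracts table names from the returned content.
    table_names = parse_table_names(tables_result)

    descriptions = {}
    for table in table_names:
        # Assumes the SQLite server's describe_table tool accepts a table_name argument
        descriptions[table] = await client.call_tool("describe_table", {"table_name": table})
    return descriptions
```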
This pattern generalizes to any scenario where:
- Agents dynamically decide the number of tool calls based on data.
- Tool calls can be made only if certain conditions are met.
- The output of one tool call influences the flow of the next.
I'll explore this DAG-based approach more thoroughly in a later part of this series.
Production MCP Pitfalls
Pitfall #1: The "Demo Magic" Problem
Your MCP integration works perfectly in controlled demos but fails with real user queries.
Real example: A customer support bot worked flawlessly when demoed with "Show me customer John Smith's tickets." But when a real user asked "What's up with john smith's stuff?", the AI generated:
```sql
SELECT * FROM customers WHERE name = "john smith's stuff"
```
Pitfall #2: The Timeout Cascade
One slow query blocks your entire system.
Real Example: A user asked "Show me all customer data for analysis." The AI generated a query that took 45 seconds to complete, blocking all other requests.
Pitfall #3: The Context Window Explosion
Tool responses exceed the LLM's context window, causing failures or truncated responses.
Real Example: "Show me all customer tickets" returned 50,000 rows, consuming the entire context window and making the LLM unusable.
Pitfall #4: The Silent Failure Trap
Tools fail silently, and the AI hallucinates responses based on no data.
Real Example: Database query fails due to a locked table, but the AI responds: "John Smith has 3 open tickets" (completely made up).
I recently tweeted about one.
Pitfall #5: The Error Message Black Hole
Unclear error messages make debugging impossible when your tools don't return appropriate error messages for the LLM to parse.
Pitfall #6: The Parameter Validation Illusion
Parameters look correct but contain subtle errors that cause wrong results.
Real example: The AI generates SELECT * FROM customers WHERE created_at = '2024-13-45' (an invalid date that SQLite accepts but that returns no results).
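One mitigation in the spirit of this post's validation layer: when a parameter has known semantics, give the generated model a real type rather than a bare string, so impossible values never reach the database. A hedged, hypothetical illustration (SignupFilter is not part of the code above):

```python
from datetime import date

from pydantic import BaseModel, ValidationError


class SignupFilter(BaseModel):
    # A real date type lets Pydantic reject impossible dates before execution,
    # unlike a plain str field that would happily accept '2024-13-45'.
    created_at: date


try:
    SignupFilter(created_at="2024-13-45")
except ValidationError as exc:
    print(f"Caught before it ever hit the database: {exc}")
```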
What's Next: The Intelligence Problem
You now have a robust, production-ready MCP client that won't crash, corrupt data, or leave you debugging at 3 AM. But there's still one critical question we haven't answered:
How do you know if your AI is making the right tool choices? The decision to invoke the right tool with the right parameters still rests with the LLM, which is inherently non-deterministic. Things get tricky when you're dealing with 20 tools and the LLM has to decide which ones to pick and in what order to execute them.
In the next part of this series, we'll dive into the overlooked but critical problem: How do you know if your agent is choosing the right tool — and how do you fix it when it doesn't?
We'll explore tool retrieval evaluation, failure patterns, and practical ways to debug and improve tool selection, so your agent not only runs, but runs smart.
💡 Got questions about implementing these patterns? Drop them in comments
This post is part of a series on production AI engineering with MCP. Follow along as we build from basic connections to enterprise-grade AI systems that you can actually depend on.