Module 4: Advanced Inferencing: MCP Integration¶
In Module 3, you configured tool calling and saw how the AI generates function calls. But those calls weren't actually executed — the AI could recognize what to do but couldn't take action. Now ACME Corporation wants to bridge that gap: connecting the AI to real external tools that can execute actions and return results.
Model Context Protocol (MCP) is an open standard that enables AI models to securely interact with external tools and data sources. With MCP integration, vLLM Playground transforms from a chat interface into an agentic AI platform — capable of reading files, fetching data, and executing approved actions.
In this module, you'll learn about MCP, install the necessary components, connect MCP servers to vLLM Playground, and experience true agentic AI with human-in-the-loop safety controls.
Learning Objectives¶
By the end of this module, you'll be able to:
- ✅ Understand the Model Context Protocol (MCP) and its role in agentic AI
- ✅ Install and configure MCP components for vLLM Playground
- ✅ Connect and use MCP servers for external tool access
- ✅ Configure file system access for AI-powered document analysis
- ✅ Execute tool calls with human-in-the-loop approval
- ✅ Build agentic workflows that combine AI reasoning with real tool execution
Exercise 1: Understand MCP and Install Components¶
Before connecting external tools, ACME's engineering team needs to understand what MCP is and ensure all components are properly installed.
What is MCP?¶
Model Context Protocol (MCP) is an open standard developed to enable AI models to:
- Access external tools: File systems, APIs, databases
- Execute actions: Run commands, fetch data, modify files
- Maintain safety: Human-in-the-loop approval for sensitive operations
MCP Architecture¶
┌─────────────────────────────────────────────────────────────┐
│ MCP Architecture │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ vLLM │ │ MCP │ │ External │ │
│ │ Playground │◄──►│ Server │◄──►│ Resource │ │
│ │ (AI Chat) │ │ (Bridge) │ │ (Files,API) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │
│ └───────────────────┘ │
│ Human-in-the-Loop │
│ Approval Layer │
└─────────────────────────────────────────────────────────────┘
Key MCP Concepts¶
| Concept | Description |
|---|---|
| MCP Server | A bridge that exposes tools and resources to AI models |
| Tools | Functions the MCP client can call (e.g., read_file, get_time) |
| Resources | Data sources the MCP client can access (e.g., files, databases) |
| Human-in-the-Loop | Approval mechanism for sensitive operations |
| Transport | Communication protocol between client and server (stdio, HTTP) |
Why MCP Matters for ACME¶
| Without MCP | With MCP |
|---|---|
| AI can only generate text responses | AI accesses real customer documents |
| No access to real-time data | Retrieves current time for scheduling |
| Cannot execute actions | Executes approved support actions |
| Limited to trained knowledge | Integrates with existing systems |
Steps¶
-
Verify Python version for MCP support:
Expected output: Python 3.10 or later
Python 3.10+ Required
MCP requires Python 3.10+. If your version is older, MCP features will not work.
-
Install MCP client dependencies:
Or install with vLLM Playground:
-
Verify MCP installation:
-
Install MCP server transport dependencies:
# Install uv (includes uvx) for Python-based MCP servers pip install uv # Verify uvx installation uvx --version# Install npx (for Node.js-based MCP servers like Filesystem) # On macOS: brew install node # On Linux: # sudo apt install nodejs npm (Debian/Ubuntu) # sudo dnf install nodejs npm (Fedora/RHEL) # Verify npx installation npx --versionTransport Dependencies
npxis used for Node.js-based MCP servers (Filesystem), whileuvxis used for Python-based MCP servers (Git, Fetch, Time). -
Restart vLLM Playground to detect MCP:
-
Open the vLLM Playground web UI:
-
Navigate to the MCP Servers section in the sidebar to verify MCP support is available.
You should see preset MCP server options:
Server Purpose Filesystem Read, write, and navigate files Git Interact with Git repositories Fetch Retrieve content from URLs Time Get current time and timezone information
✅ Verify¶
Confirm MCP is ready:
- Python 3.10+ is installed
- MCP library is installed (
import mcpworks) - vLLM Playground shows MCP Servers panel
- Preset servers are visible in the panel
Troubleshooting¶
'MCP requires Python 3.10+'
Solution:
- Check Python version:
python3 --version - Install Python 3.10+ or use pyenv
- Ensure vLLM Playground uses the correct Python
'ModuleNotFoundError: No module named mcp'
Solution:
- Install MCP:
pip install mcp - If using virtual environment, ensure it's activated
- Try:
pip3 install mcp
Exercise 2: Connect Time Server¶
Now that MCP is installed, you'll connect your first MCP server. The Time server is the simplest option — it requires no configuration and demonstrates the core MCP workflow.
ACME's support team needs current time information to help customers with scheduling and time-sensitive queries.
Steps¶
-
Start a vLLM server with tool calling enabled:
Setting Value Model Qwen/Qwen2.5-3B-InstructRun Mode Container Compute Mode GPU Enable Tool Calling Checked ✓ Tool Call Parser hermesTool Calling Required
MCP requires tool calling to be enabled on the vLLM server.
-
Click Start Server and wait for the server to be ready.
-
Navigate to the MCP Servers panel in the sidebar.
-
In Quick Start with Presets, click the Time preset.
The Time server configuration dialog appears. No additional settings needed.
-
Click Save Server to save the MCP server configuration.
-
Click the Connect toggle to establish the connection.
-
Wait for the connection. You should see:
- Status indicator turns green
- Available tools from the server are listed (e.g.,
get_current_time)
-
In the Chat panel, enable MCP tools:
- Look for the MCP tools toggle
- Ensure Time server tools are enabled
-
Test the Time server with a prompt:
The AI should:
- Recognize it needs the current time
- Generate a tool call to
get_current_time - Wait for your approval (human-in-the-loop)
- Return the actual current time after approval
-
When prompted for approval, review the tool call and click Execute.
-
Click Continue Conversation to allow the AI to process the result.
-
Observe the complete flow:
- AI generates tool call
- You approve the execution
- MCP server executes the tool
- Result returns to AI
- AI incorporates result in response
-
Test additional time queries:
Understanding the MCP Workflow¶
User: "What time is it in New York?"
│
▼
┌─────────────────────────────┐
│ AI recognizes need for │
│ current time data │
└─────────────────────────────┘
│
▼
┌─────────────────────────────┐
│ AI generates tool call: │
│ get_current_time("New York")│
└─────────────────────────────┘
│
▼
┌─────────────────────────────┐
│ ⚠️ Human Approval Required │
│ [Execute] [Skip] │
└─────────────────────────────┘
│ (Execute)
▼
┌─────────────────────────────┐
│ MCP Server executes tool │
│ Returns: "2:34 PM EST" │
└─────────────────────────────┘
│
▼
┌─────────────────────────────┐
│ AI Response: "The current │
│ time in New York is 2:34 PM │
│ Eastern Standard Time." │
└─────────────────────────────┘
✅ Verify¶
Confirm Time server works:
- Time server shows "Connected" status
- Tool calls trigger approval dialog
- After approval, actual time is returned
- AI response includes real-time data
Exercise 3: File System Access with MCP¶
ACME's support team needs the AI to analyze customer documents, read configuration files, and access knowledge base articles. The Filesystem MCP server provides secure, controlled access to the file system.
Steps¶
-
Create a documents directory for the Filesystem server:
-
In the MCP Servers panel, click the Filesystem preset.
-
Configure the Filesystem server:
Setting Value Allowed Directories /Users/YOUR_USERNAME/documents(replace with your path)Security
Only grant access to directories the AI should read. Be careful with sensitive directories.
-
Click Save Server and Connect.
-
Create test files for the AI to analyze:
# Create a sample customer FAQ cat > ~/documents/customer_faq.txt << 'EOF' ACME Corporation - Customer FAQ Q: What are your support hours? A: Our support team is available Monday-Friday, 9am-6pm EST. Q: How do I track my order? A: Visit acme.com/orders and enter your order number. Q: What is your return policy? A: We accept returns within 30 days of purchase with original receipt. Q: How do I contact support? A: Email support@acme.com or call 1-800-ACME-HELP. EOF # Create a product catalog summary cat > ~/documents/products.txt << 'EOF' ACME Product Catalog - Q1 2026 Electronics: - ACME SmartWatch Pro - $299 - ACME Wireless Earbuds - $79 - ACME Tablet 10" - $449 Home: - ACME Robot Vacuum - $399 - ACME Air Purifier - $199 - ACME Smart Thermostat - $129 EOF -
In the Chat panel, ensure Filesystem tools are enabled.
-
Test file reading:
The AI should:
- Generate a tool call to read the file
- Wait for your approval
- After approval, read and summarize the contents
-
Click Execute to approve the file read operation.
-
Test directory listing:
-
Test document analysis:
✅ Verify¶
Confirm Filesystem MCP works:
- Filesystem server connected successfully
- AI can list directory contents (with approval)
- AI can read file contents (with approval)
- AI correctly analyzes and summarizes documents
- All operations require explicit approval
Security Considerations¶
| Practice | Why It Matters |
|---|---|
| Limit directories | Prevent access to sensitive system files |
| Use read-only mode | Prevent accidental file modifications |
| Review each approval | Maintain control over AI actions |
| Audit tool calls | Track what the AI accesses |
Exercise 4: Agentic Workflow with Human-in-the-Loop¶
Now you'll combine everything into a complete agentic workflow. ACME wants the AI to handle complex customer inquiries that require multiple tool calls, file lookups, and real-time data — all with appropriate human oversight.
Understanding Agentic Workflows¶
An agentic AI can:
- Plan: Break complex requests into steps
- Execute: Call tools to gather information
- Reason: Analyze results and determine next actions
- Respond: Synthesize findings into helpful answers
Human-in-the-loop ensures:
- Safety: Sensitive operations require approval
- Control: You can deny inappropriate requests
- Visibility: Full transparency into AI actions
Steps¶
-
Ensure both Time and Filesystem MCP servers are connected.
-
Set a comprehensive system prompt:
You are an AI assistant for ACME Corporation's customer support team. You have access to: - Current time information (for scheduling and time-sensitive queries) - Customer documentation files (FAQ, product catalog) When helping customers: 1. Use available tools to find accurate information 2. Combine information from multiple sources when needed 3. Provide helpful, accurate responses based on real data 4. If you can't find information, say so honestly Always be professional and customer-focused. -
Test a complex customer scenario:
A customer in Los Angeles is asking: "What time does your support close today, and can you tell me about your return policy? I'm also interested in your wireless earbuds."Watch the AI:
- Recognize multiple information needs
- Plan which tools to call
- Request approval for each tool call
- Synthesize all information into one response
-
For each tool call, you'll see an approval prompt:
- Time query: Get current time in LA to calculate support hours
- FAQ read: Get support hours and return policy
- Products read: Get earbuds information
-
Test the safety controls with an inappropriate request:
Expected behavior:
- If directory not in allowed list: Tool call should fail or not be attempted
- AI should explain it can only access authorized directories
-
Experiment with denying a request:
When an approval dialog appears, click Skip instead of Execute.
Observe how the AI handles the denial — it should acknowledge it couldn't complete the action.
✅ Verify¶
Confirm agentic workflow works:
- AI plans multi-step approaches for complex queries
- Each tool call triggers separate approval
- AI synthesizes results from multiple tools
- Denied requests are handled gracefully
- Unauthorized access attempts are blocked
Clean Up¶
Before proceeding to Module 5:
- Disconnect any connected MCP servers by clicking the Connect toggle to turn it off
- Click Stop Server in the vLLM Playground web UI
Preparation for Module 5
Module 5 focuses on performance testing, which requires a clean server configuration without MCP overhead.
Learning Outcomes¶
By completing this module, you should now understand:
- ✅ What MCP is and its role in enabling agentic AI
- ✅ How to install and verify MCP components
- ✅ How to connect and configure MCP servers in vLLM Playground
- ✅ The MCP workflow from tool call to execution to response
- ✅ The importance of human-in-the-loop approval for safe AI operations
- ✅ How agentic workflows combine planning, tool use, and reasoning
- ✅ Security best practices for granting AI access to resources
Module Summary¶
What you accomplished:
- Learned MCP concepts and installed MCP components
- Connected Time MCP server for real-time data access
- Configured Filesystem server for document analysis
- Experienced human-in-the-loop approval for tool execution
- Built agentic workflows combining multiple tools and reasoning
Key takeaways:
- MCP transforms AI from passive responder to active agent
- Human-in-the-loop provides safety without sacrificing capability
- Start with minimal permissions and expand as needed
- Agentic AI can handle complex, multi-step tasks autonomously
- The approval layer ensures you maintain control over AI actions
Business impact for ACME:
- AI can now access and analyze real customer documents
- Support agents get accurate, real-time information
- Complex queries handled with multiple data sources
- Safe, auditable AI operations with full transparency
Next: Module 5: Performance Testing — Use GuideLLM to benchmark your vLLM server and optimize for production.













