Module 4: Advanced Inferencing: MCP Integration

In Module 3, you configured tool calling and saw how the AI generates function calls. But those calls weren't actually executed — the AI could recognize what to do but couldn't take action. Now ACME Corporation wants to bridge that gap: connecting the AI to real external tools that can execute actions and return results.

Model Context Protocol (MCP) is an open standard that enables AI models to securely interact with external tools and data sources. With MCP integration, vLLM Playground transforms from a chat interface into an agentic AI platform — capable of reading files, fetching data, and executing approved actions.

In this module, you'll learn about MCP, install the necessary components, connect MCP servers to vLLM Playground, and experience true agentic AI with human-in-the-loop safety controls.

Learning Objectives

By the end of this module, you'll be able to:

✅ Understand the Model Context Protocol (MCP) and its role in agentic AI
✅ Install and configure MCP components for vLLM Playground
✅ Connect and use MCP servers for external tool access
✅ Configure file system access for AI-powered document analysis
✅ Execute tool calls with human-in-the-loop approval
✅ Build agentic workflows that combine AI reasoning with real tool execution

Exercise 1: Understand MCP and Install Components

Before connecting external tools, ACME's engineering team needs to understand what MCP is and ensure all components are properly installed.

What is MCP?

Model Context Protocol (MCP) is an open standard developed to enable AI models to:

Access external tools: File systems, APIs, databases
Execute actions: Run commands, fetch data, modify files
Maintain safety: Human-in-the-loop approval for sensitive operations

MCP Architecture

┌─────────────────────────────────────────────────────────────┐
│                    MCP Architecture                         │
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │   vLLM       │    │     MCP      │    │   External   │   │
│  │  Playground  │◄──►│   Server     │◄──►│   Resource   │   │
│  │  (AI Chat)   │    │  (Bridge)    │    │  (Files,API) │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│         │                   │                               │
│         └───────────────────┘                               │
│              Human-in-the-Loop                              │
│              Approval Layer                                 │
└─────────────────────────────────────────────────────────────┘

Key MCP Concepts

Concept	Description
MCP Server	A bridge that exposes tools and resources to AI models
Tools	Functions the MCP client can call (e.g., read_file, get_time)
Resources	Data sources the MCP client can access (e.g., files, databases)
Human-in-the-Loop	Approval mechanism for sensitive operations
Transport	Communication protocol between client and server (stdio, HTTP)
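To make the Tools and Transport rows more concrete, here is a minimal sketch of what a tool invocation looks like on the wire. MCP messages are JSON-RPC 2.0, and a tool is invoked with the `tools/call` method; over the stdio transport each message travels as one newline-terminated line of JSON. The helper names below are hypothetical, and the `timezone` argument is an assumption about the Time server's input schema rather than something taken from vLLM Playground.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to requests.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def encode_for_stdio(message: dict) -> bytes:
    """Serialize one message as a single newline-terminated line of JSON."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


# Assumption: the Time server's tool takes a `timezone` argument.
request = build_tool_call("get_current_time", {"timezone": "America/New_York"})
print(encode_for_stdio(request).decode(), end="")
```

The MCP client inside vLLM Playground builds and sends these messages for you; the sketch only illustrates the shape of the exchange.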

Why MCP Matters for ACME

Without MCP	With MCP
AI can only generate text responses	AI accesses real customer documents
No access to real-time data	Retrieves current time for scheduling
Cannot execute actions	Executes approved support actions
Limited to trained knowledge	Integrates with existing systems

Steps

Verify Python version for MCP support:
```
python3 --version
```
Expected output: Python 3.10 or later

Python 3.10+ Required

MCP requires Python 3.10+. If your version is older, MCP features will not work.

Install MCP client dependencies:
```
pip install mcp
```
Or install with vLLM Playground:
```
pip install "vllm-playground[mcp]"
```

Verify MCP installation:

```
python3 -c "import mcp; print('MCP installed successfully')"
```

Install MCP server transport dependencies:

```
# Install uv (includes uvx) for Python-based MCP servers
pip install uv

# Verify uvx installation
uvx --version

# Install npx (for Node.js-based MCP servers like Filesystem)
# On macOS:
brew install node

# On Linux:
# sudo apt install nodejs npm  (Debian/Ubuntu)
# sudo dnf install nodejs npm  (Fedora/RHEL)

# Verify npx installation
npx --version
```

Transport Dependencies

npx is used for Node.js-based MCP servers (Filesystem), while uvx is used for Python-based MCP servers (Git, Fetch, Time).
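The split between npx and uvx can be sketched as a small lookup table. The package names below are those of the publicly available reference MCP servers; whether the vLLM Playground presets launch exactly these commands is an assumption, and `PRESET_COMMANDS`, `launch_command`, and `required_runner` are hypothetical names used only for illustration.

```python
import shlex

# Assumed launch commands, based on the public reference MCP servers.
# The presets in vLLM Playground may use different commands or arguments.
PRESET_COMMANDS = {
    "filesystem": ["npx", "-y", "@modelcontextprotocol/server-filesystem"],
    "git": ["uvx", "mcp-server-git"],
    "fetch": ["uvx", "mcp-server-fetch"],
    "time": ["uvx", "mcp-server-time"],
}


def launch_command(server: str, *extra_args: str) -> list[str]:
    """Return the argv list for a preset server plus server-specific arguments."""
    return [*PRESET_COMMANDS[server.lower()], *extra_args]


def required_runner(server: str) -> str:
    """Return the launcher (npx or uvx) that a preset server depends on."""
    return PRESET_COMMANDS[server.lower()][0]


for name in PRESET_COMMANDS:
    print(f"{name:<11}{shlex.join(launch_command(name))}")
```

If a preset fails to connect, checking that its runner is on your PATH is a reasonable first step.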

Restart vLLM Playground to detect MCP:
```
vllm-playground stop
vllm-playground
```
Open the vLLM Playground web UI:
```
http://localhost:7860
```
Navigate to the MCP Servers section in the sidebar to verify MCP support is available.

You should see preset MCP server options:

Server	Purpose
Filesystem	Read, write, and navigate files
Git	Interact with Git repositories
Fetch	Retrieve content from URLs
Time	Get current time and timezone information

✅ Verify

Confirm MCP is ready:

Python 3.10+ is installed
MCP library is installed (import mcp works)
vLLM Playground shows MCP Servers panel
Preset servers are visible in the panel
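The command-line items on this checklist can also be checked in one pass. The following is a minimal sketch using only the standard library; the function name `check_mcp_prerequisites` is hypothetical, and the script cannot check the two UI items, which you still need to confirm in the browser.

```python
import importlib.util
import shutil
import sys


def check_mcp_prerequisites() -> dict[str, bool]:
    """Report whether each command-line prerequisite from this exercise is met."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        # find_spec returns None when the top-level package is not installed.
        "mcp importable": importlib.util.find_spec("mcp") is not None,
        "uvx on PATH": shutil.which("uvx") is not None,
        "npx on PATH": shutil.which("npx") is not None,
    }


for check, ok in check_mcp_prerequisites().items():
    print(f"[{'ok' if ok else 'MISSING'}] {check}")
```

Run it with the same Python interpreter that vLLM Playground uses; a different interpreter or virtual environment can report different results.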

Troubleshooting

'MCP requires Python 3.10+'

Solution:

Check Python version: python3 --version
Install Python 3.10+ or use pyenv
Ensure vLLM Playground uses the correct Python

'ModuleNotFoundError: No module named mcp'

Solution:

Install MCP: pip install mcp
If using virtual environment, ensure it's activated
Try: pip3 install mcp

Exercise 2: Connect Time Server

Now that MCP is installed, you'll connect your first MCP server. The Time server is the simplest option — it requires no configuration and demonstrates the core MCP workflow.

ACME's support team needs current time information to help customers with scheduling and time-sensitive queries.

Steps

Start a vLLM server with tool calling enabled:

Setting	Value
Model	Qwen/Qwen2.5-3B-Instruct
Run Mode	Container
Compute Mode	GPU
Enable Tool Calling	Checked ✓
Tool Call Parser	hermes

Tool Calling Required

MCP requires tool calling to be enabled on the vLLM server.
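One way to see why tool calling is required: the model only learns about MCP tools if they are presented to it as function definitions in the chat request. The sketch below converts an MCP tool description (which has `name`, `description`, and `inputSchema` fields) into the OpenAI-style `tools` entry that vLLM's OpenAI-compatible API accepts. How vLLM Playground performs this step internally is an assumption, and the example schema for `get_current_time` is illustrative.

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Convert an MCP tool description into an OpenAI-style tool definition."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP describes tool inputs with JSON Schema, which is also what
            # the `parameters` field of a function definition expects.
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


# Illustrative tool description; the real schema comes from the MCP server.
time_tool = {
    "name": "get_current_time",
    "description": "Get the current time in a given timezone",
    "inputSchema": {
        "type": "object",
        "properties": {"timezone": {"type": "string"}},
        "required": ["timezone"],
    },
}

print(mcp_tool_to_openai(time_tool))
```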
Click Start Server and wait for the server to be ready.
Navigate to the MCP Servers panel in the sidebar.
In Quick Start with Presets, click the Time preset.

The Time server configuration dialog appears. No additional settings needed.
Click Save Server to save the MCP server configuration.
Click the Connect toggle to establish the connection.
Wait for the connection. You should see:
- Status indicator turns green
- Available tools from the server are listed (e.g., get_current_time)
In the Chat panel, enable MCP tools:
- Look for the MCP tools toggle
- Ensure Time server tools are enabled
Test the Time server with a prompt:
```
What time is it right now in New York?
```
The AI should:
1. Recognize it needs the current time
2. Generate a tool call to get_current_time
3. Wait for your approval (human-in-the-loop)
4. Return the actual current time after approval
When prompted for approval, review the tool call and click Execute.
Click Continue Conversation to allow the AI to process the result.
Observe the complete flow:
- AI generates tool call
- You approve the execution
- MCP server executes the tool
- Result returns to AI
- AI incorporates result in response

Test additional time queries:

```
What's the time difference between Tokyo and London right now?
```

Understanding the MCP Workflow

User: "What time is it in New York?"
              │
              ▼
┌─────────────────────────────┐
│ AI recognizes need for      │
│ current time data           │
└─────────────────────────────┘
              │
              ▼
┌─────────────────────────────┐
│ AI generates tool call:     │
│ get_current_time("New York")│
└─────────────────────────────┘
              │
              ▼
┌─────────────────────────────┐
│ ⚠️ Human Approval Required  │
│ [Execute] [Skip]            │
└─────────────────────────────┘
              │ (Execute)
              ▼
┌─────────────────────────────┐
│ MCP Server executes tool    │
│ Returns: "2:34 PM EST"      │
└─────────────────────────────┘
              │
              ▼
┌─────────────────────────────┐
│ AI Response: "The current   │
│ time in New York is 2:34 PM │
│ Eastern Standard Time."     │
└─────────────────────────────┘
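The approval gate in the middle of this flow can be sketched in a few lines. This is a simplified model under stated assumptions, not vLLM Playground's implementation: `ToolCall`, `run_with_approval`, and the stand-in `fake_get_current_time` tool are all hypothetical, and the `approve` callback plays the role of the Execute and Skip buttons.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    """A tool call proposed by the model but not yet executed."""
    name: str
    arguments: dict


def run_with_approval(
    call: ToolCall,
    tools: dict[str, Callable[..., str]],
    approve: Callable[[ToolCall], bool],
) -> dict:
    """Execute a proposed tool call only if the reviewer approves it."""
    if not approve(call):
        # Skip: the tool is never invoked, and the model is told so.
        return {"status": "skipped", "result": None}
    result = tools[call.name](**call.arguments)
    return {"status": "executed", "result": result}


def fake_get_current_time(timezone: str) -> str:
    """Stand-in for the real MCP tool; returns a canned value."""
    return f"14:34 in {timezone}"


tools = {"get_current_time": fake_get_current_time}
proposed = ToolCall("get_current_time", {"timezone": "America/New_York"})

print(run_with_approval(proposed, tools, approve=lambda call: True))
print(run_with_approval(proposed, tools, approve=lambda call: False))
```

The key design point is that the approval check sits between generation and execution, so a skipped call has no side effects.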

✅ Verify

Confirm Time server works:

Time server shows "Connected" status
Tool calls trigger approval dialog
After approval, actual time is returned
AI response includes real-time data
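If the connection does not behave as expected, you can check the server independently of the web UI. The following sketch uses the `mcp` Python package installed in Exercise 1 to start a server over stdio and list its tools. It assumes the Time server can be launched as `uvx mcp-server-time`, which may differ from what the preset uses; the function names are hypothetical.

```python
import asyncio


async def list_server_tools(command: str, args: list[str]) -> list[str]:
    """Start an MCP server over stdio and return the names of its tools."""
    # Imported here so the module loads even where `mcp` is not installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(command=command, args=args)
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            return [tool.name for tool in result.tools]


def list_server_tools_sync(command: str, args: list[str]) -> list[str]:
    """Blocking wrapper for use from a plain script or REPL."""
    return asyncio.run(list_server_tools(command, args))
```

Calling `list_server_tools_sync("uvx", ["mcp-server-time"])` should return a list that includes `get_current_time` if the server starts correctly.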

Exercise 3: File System Access with MCP

ACME's support team needs the AI to analyze customer documents, read configuration files, and access knowledge base articles. The Filesystem MCP server provides secure, controlled access to the file system.

Steps

Create a documents directory for the Filesystem server:
```
mkdir -p ~/documents
```
In the MCP Servers panel, click the Filesystem preset.
Configure the Filesystem server:

Setting	Value
Allowed Directories	/Users/YOUR_USERNAME/documents on macOS, or /home/YOUR_USERNAME/documents on Linux (replace with the absolute path to the directory you created)

Security

Only grant access to directories the AI should read. Be careful with sensitive directories.
Click Save Server and Connect.

Create test files for the AI to analyze:

```
# Create a sample customer FAQ
cat > ~/documents/customer_faq.txt << 'EOF'
ACME Corporation - Customer FAQ

Q: What are your support hours?
A: Our support team is available Monday-Friday, 9am-6pm EST.

Q: How do I track my order?
A: Visit acme.com/orders and enter your order number.

Q: What is your return policy?
A: We accept returns within 30 days of purchase with original receipt.

Q: How do I contact support?
A: Email support@acme.com or call 1-800-ACME-HELP.
EOF

# Create a product catalog summary
cat > ~/documents/products.txt << 'EOF'
ACME Product Catalog - Q1 2026

Electronics:
- ACME SmartWatch Pro - $299
- ACME Wireless Earbuds - $79
- ACME Tablet 10" - $449

Home:
- ACME Robot Vacuum - $399
- ACME Air Purifier - $199
- ACME Smart Thermostat - $129
EOF
```

In the Chat panel, ensure Filesystem tools are enabled.
Test file reading:
```
Can you read the customer FAQ file in ~/documents and summarize the key points?
```
The AI should:
1. Generate a tool call to read the file
2. Wait for your approval
3. After approval, read and summarize the contents
Click Execute to approve the file read operation.

Test directory listing:

```
What files are available in the ~/documents folder?
```

Test document analysis:

```
Based on the products.txt file, what's the most expensive item
and what's the cheapest?
```
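To check the AI's answer to the last prompt, you can compute the expected result directly from the catalog text. This is a small verification sketch, not part of the lab tooling; it embeds the same content as products.txt and assumes every item line follows the `- Name - $Price` pattern used in that file.

```python
import re

# Same content as ~/documents/products.txt from the earlier step.
CATALOG = """\
ACME Product Catalog - Q1 2026

Electronics:
- ACME SmartWatch Pro - $299
- ACME Wireless Earbuds - $79
- ACME Tablet 10" - $449

Home:
- ACME Robot Vacuum - $399
- ACME Air Purifier - $199
- ACME Smart Thermostat - $129
"""

# One item per line: "- <name> - $<whole-dollar price>"
ITEM = re.compile(r"^- (?P<name>.+) - \$(?P<price>\d+)$", re.MULTILINE)


def parse_prices(text: str) -> dict[str, int]:
    """Map each product name to its price in dollars."""
    return {match["name"]: int(match["price"]) for match in ITEM.finditer(text)}


prices = parse_prices(CATALOG)
most = max(prices, key=prices.get)
least = min(prices, key=prices.get)
print(f"Most expensive: {most} (${prices[most]})")
print(f"Cheapest: {least} (${prices[least]})")
```

The AI's response should name the ACME Tablet 10" at $449 as the most expensive item and the ACME Wireless Earbuds at $79 as the cheapest.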

✅ Verify

Confirm Filesystem MCP works:

Filesystem server connected successfully
AI can list directory contents (with approval)
AI can read file contents (with approval)
AI correctly analyzes and summarizes documents
All operations require explicit approval

Security Considerations

Practice	Why It Matters
Limit directories	Prevent access to sensitive system files
Use read-only mode	Prevent accidental file modifications
Review each approval	Maintain control over AI actions
Audit tool calls	Track what the AI accesses
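The "Limit directories" practice comes down to a containment check on every requested path. The sketch below shows one way such a check can be written; it is an illustration of the idea, not the Filesystem server's actual implementation, and `is_path_allowed` is a hypothetical name. Resolving paths first matters because it collapses `..` segments and follows symlinks before the comparison.

```python
from pathlib import Path


def is_path_allowed(requested: str, allowed_dirs: list[str]) -> bool:
    """Return True only if the requested path lies inside an allowed directory."""
    # resolve() normalizes ".." segments and follows symlinks, so a path
    # cannot escape an allowed directory through traversal tricks.
    target = Path(requested).expanduser().resolve()
    for allowed in allowed_dirs:
        root = Path(allowed).expanduser().resolve()
        if target == root or root in target.parents:
            return True
    return False


allowed = ["~/documents"]
for candidate in (
    "~/documents/customer_faq.txt",
    "/etc/passwd",
    "~/documents/../.ssh/id_rsa",
):
    verdict = "allowed" if is_path_allowed(candidate, allowed) else "blocked"
    print(f"{candidate}: {verdict}")
```

A plain string prefix comparison would be weaker here: it would treat a sibling directory such as `~/documents-private` as if it were inside `~/documents`.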

Exercise 4: Agentic Workflow with Human-in-the-Loop

Now you'll combine everything into a complete agentic workflow. ACME wants the AI to handle complex customer inquiries that require multiple tool calls, file lookups, and real-time data — all with appropriate human oversight.

Understanding Agentic Workflows

An agentic AI can:

Plan: Break complex requests into steps
Execute: Call tools to gather information
Reason: Analyze results and determine next actions
Respond: Synthesize findings into helpful answers

Human-in-the-loop ensures:

Safety: Sensitive operations require approval
Control: You can deny inappropriate requests
Visibility: Full transparency into AI actions

Steps

Ensure both Time and Filesystem MCP servers are connected.

Set a comprehensive system prompt:

```
You are an AI assistant for ACME Corporation's customer support team.
You have access to:
- Current time information (for scheduling and time-sensitive queries)
- Customer documentation files (FAQ, product catalog)

When helping customers:
1. Use available tools to find accurate information
2. Combine information from multiple sources when needed
3. Provide helpful, accurate responses based on real data
4. If you can't find information, say so honestly

Always be professional and customer-focused.
```

Test a complex customer scenario:
```
A customer in Los Angeles is asking: "What time does your support close 
today, and can you tell me about your return policy? I'm also interested 
in your wireless earbuds."
```
Watch the AI:
1. Recognize multiple information needs
2. Plan which tools to call
3. Request approval for each tool call
4. Synthesize all information into one response
For each tool call, you'll see an approval prompt:
- Time query: Get current time in LA to calculate support hours
- FAQ read: Get support hours and return policy
- Products read: Get earbuds information
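The time portion of this scenario involves a timezone conversion that is worth working through by hand, so you can judge the AI's answer. The sketch below takes the FAQ's "Monday-Friday, 9am-6pm EST" literally and uses fixed offsets (EST as UTC-5, Pacific Standard Time as UTC-8). That is a simplifying assumption: it ignores daylight saving time, although the three-hour gap between the two zones is the same in summer. The sample date and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: a simplification that ignores daylight saving time.
EST = timezone(timedelta(hours=-5), "EST")
PST = timezone(timedelta(hours=-8), "PST")

OPENING_HOUR_EST = 9   # from the FAQ: 9am EST
CLOSING_HOUR_EST = 18  # from the FAQ: 6pm EST


def closing_time_for_customer(now_customer: datetime) -> datetime:
    """Return today's support closing time, expressed in the customer's timezone."""
    now_est = now_customer.astimezone(EST)
    closing_est = now_est.replace(
        hour=CLOSING_HOUR_EST, minute=0, second=0, microsecond=0
    )
    return closing_est.astimezone(now_customer.tzinfo)


def is_support_open(now_customer: datetime) -> bool:
    """Return True if support is open at the given moment (Monday-Friday only)."""
    now_est = now_customer.astimezone(EST)
    is_weekday = now_est.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and OPENING_HOUR_EST <= now_est.hour < CLOSING_HOUR_EST


# Hypothetical "current time" as the Time server might report it for LA:
# Wednesday 14 January 2026, 1:30 PM Pacific.
now_la = datetime(2026, 1, 14, 13, 30, tzinfo=PST)
closing = closing_time_for_customer(now_la)
print(f"Support closes at {closing:%H:%M} {closing.tzname()} for this customer")
print(f"Open right now: {is_support_open(now_la)}")
```

For a Los Angeles customer, support that closes at 6pm EST closes at 3pm local time, so a correct response should say so rather than repeat "6pm".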
Test the safety controls with an inappropriate request:
```
Can you read the /etc/passwd file?
```
Expected behavior:
- If directory not in allowed list: Tool call should fail or not be attempted
- AI should explain it can only access authorized directories
Experiment with denying a request:

When an approval dialog appears, click Skip instead of Execute.

Observe how the AI handles the denial — it should acknowledge it couldn't complete the action.

✅ Verify

Confirm agentic workflow works:

AI plans multi-step approaches for complex queries
Each tool call triggers separate approval
AI synthesizes results from multiple tools
Denied requests are handled gracefully
Unauthorized access attempts are blocked

Clean Up

Before proceeding to Module 5:

Disconnect any connected MCP servers by clicking the Connect toggle to turn it off
Click Stop Server in the vLLM Playground web UI

Preparation for Module 5

Module 5 focuses on performance testing, which requires a clean server configuration without MCP overhead.

Learning Outcomes

By completing this module, you should now understand:

✅ What MCP is and its role in enabling agentic AI
✅ How to install and verify MCP components
✅ How to connect and configure MCP servers in vLLM Playground
✅ The MCP workflow from tool call to execution to response
✅ The importance of human-in-the-loop approval for safe AI operations
✅ How agentic workflows combine planning, tool use, and reasoning
✅ Security best practices for granting AI access to resources

Module Summary

What you accomplished:

Learned MCP concepts and installed MCP components
Connected Time MCP server for real-time data access
Configured Filesystem server for document analysis
Experienced human-in-the-loop approval for tool execution
Built agentic workflows combining multiple tools and reasoning

Key takeaways:

MCP transforms AI from passive responder to active agent
Human-in-the-loop provides safety without sacrificing capability
Start with minimal permissions and expand as needed
Agentic AI can plan and carry out complex, multi-step tasks while you approve each tool call
The approval layer ensures you maintain control over AI actions

Business impact for ACME:

AI can now access and analyze real customer documents
Support agents get accurate, real-time information
Complex queries handled with multiple data sources
Safe, auditable AI operations with full transparency

Next: Module 5: Performance Testing — Use GuideLLM to benchmark your vLLM server and optimize for production.
