Module 3: Advanced Inferencing: Tool Calling¶
In Module 2, you learned to constrain AI outputs to specific formats. Now ACME Corporation wants to take their AI customer support to the next level: enabling the AI to not just respond, but to take actions — looking up order status, checking inventory, or scheduling callbacks.
Tool calling (also known as function calling) allows the AI to recognize when it needs external data or actions, and generate structured function calls that your systems can execute. In this module, you'll configure tool calling and define custom tools for ACME's customer support scenarios.
Tool Calling Pattern
This module focuses on the tool calling pattern — how the AI generates function calls. Actual tool execution with human-in-the-loop approval is covered in Module 4 (MCP Integration).
Learning Objectives¶
By the end of this module, you'll be able to:
- ✅ Enable tool calling in vLLM Playground server configuration
- ✅ Understand tool calling parsers for different model families
- ✅ Define custom tools with proper function schemas
- ✅ Interpret AI-generated function calls and arguments
- ✅ Choose the appropriate model and parser for tool calling use cases
Exercise 1: Enable Tool Calling¶
ACME's engineering team needs to configure the vLLM server to support tool calling. This requires enabling the feature and selecting the appropriate parser for the model being used.
Understanding Tool Calling¶
Tool calling enables AI models to recognize when they need external data or capabilities, and generate structured requests (function calls) that your application can interpret and execute.
Important: AI Doesn't Execute Tools
Tool calling does NOT mean the LLM executes tools. When tool calling is enabled, the AI model generates a JSON-formatted function call with arguments, but the LLM itself does not execute any code or call external APIs. Your application (vLLM Playground in this case) is responsible for parsing the tool call, executing the actual function, and returning the results to the model. This design ensures security and gives you full control over what actually runs.
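To make that separation concrete, here is a sketch of what an assistant message containing a tool call looks like on an OpenAI-compatible API such as vLLM's, and how an application might parse it. The message contents and call ID are illustrative, not real server output.

```python
import json

# Illustrative assistant message, shaped like the response from an
# OpenAI-compatible /v1/chat/completions endpoint when the model
# decides to call a tool. The model only generates this structure;
# nothing has been executed yet.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",  # hypothetical call ID
            "type": "function",
            "function": {
                "name": "get_order_status",
                "arguments": '{"order_id": "ORD-12345"}',
            },
        }
    ],
}

# Your application parses the call. Note that "arguments" arrives as a
# JSON string, not a dict, so it must be decoded before use.
call = assistant_message["tool_calls"][0]["function"]
args = json.loads(call["arguments"])
print(call["name"], args["order_id"])  # get_order_status ORD-12345
```

Executing the named function, and feeding its result back to the model, is entirely your application's job.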
Understanding Tool Calling Parsers¶
Different model families use different formats for tool calling. vLLM supports several parsers:
| Parser | Models | Format |
|---|---|---|
| `llama3_json` | Llama 3.x, Llama 3.1, Llama 3.2 | JSON-based function calls |
| `mistral` | Mistral, Mixtral | Mistral's native tool format |
| `hermes` | Hermes, Qwen (with Hermes prompt) | Hermes-style function calling |
| `auto` (Auto-detect) | Various | Attempts to detect model type |
Steps¶
1. Open the vLLM Playground web UI.

2. If a server is running, stop it first: click Stop Server in the Server Configuration panel.

3. In the Server Configuration panel, configure tool calling:
    - Check Enable Tool Calling
    - Select a tool calling parser (or leave as "Auto-detect")

4. For this exercise, configure:

    | Setting | Value |
    |---|---|
    | Model | `Qwen/Qwen2.5-3B-Instruct` (or similar Qwen model) |
    | Enable Tool Calling | ✓ Checked |
    | Tool Call Parser | `hermes` (or Auto-detect) |

    Open Models

    Qwen models are open and don't require HuggingFace authentication. If using gated models like Llama, ensure you have HuggingFace access configured.

5. Click Start Server and wait for the model to load.

6. Monitor the server logs in the Server Logs panel for tool calling confirmation.
✅ Verify¶
Confirm tool calling is enabled:
- Server started without errors
- "Enable Tool Calling" shows as active in the UI
- Server logs confirm tool calling parser loaded
Troubleshooting¶
'Tool calling not supported for this model'
Solution:
- Use a model that supports function calling (Llama 3.x, Mistral, Qwen)
- Check the model's documentation for tool calling support
- Try a different parser setting
Server fails to start with tool calling enabled
Solution:
- Check GPU memory — tool calling may require additional resources
- Try a smaller model
- Review server logs for specific errors
Exercise 2: Define Custom Tools¶
With tool calling enabled, ACME needs to define the tools (functions) that the AI can invoke. Each tool has a name, description, and parameter schema that tells the AI when and how to use it.
Understanding Tool Definitions¶
A tool definition includes:
- name: Function identifier (e.g., `get_order_status`)
- description: When to use this tool (helps AI decide)
- parameters: JSON Schema defining expected arguments
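As a sketch, the same three parts can be assembled as a Python dict in the OpenAI-style schema that vLLM accepts; round-tripping it through `json` is a quick way to confirm the definition is well-formed before pasting it into the Tools panel. This mirrors the order status tool defined in the steps below.

```python
import json

order_status_tool = {
    "type": "function",
    "function": {
        # name: the function identifier
        "name": "get_order_status",
        # description: tells the AI when to use this tool
        "description": "Look up the current status of a customer order. "
                       "Use this when a customer asks about their order, "
                       "shipping, or delivery.",
        # parameters: JSON Schema for the expected arguments
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The order ID to look up (e.g., ORD-12345)",
                }
            },
            "required": ["order_id"],
        },
    },
}

# Round-tripping through JSON confirms the structure serializes cleanly.
serialized = json.dumps(order_status_tool, indent=2)
assert json.loads(serialized)["function"]["name"] == "get_order_status"
```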
Steps¶
1. In vLLM Playground, navigate to the Tools panel (🔧 icon in the toolbar).

2. Click Add Tool to define your first customer support function.

3. Create an order status lookup tool:

    ```json
    {
      "type": "function",
      "function": {
        "name": "get_order_status",
        "description": "Look up the current status of a customer order. Use this when a customer asks about their order, shipping, or delivery.",
        "parameters": {
          "type": "object",
          "properties": {
            "order_id": {
              "type": "string",
              "description": "The order ID to look up (e.g., ORD-12345)"
            }
          },
          "required": ["order_id"]
        }
      }
    }
    ```

4. Add a second tool for customer information:

    ```json
    {
      "type": "function",
      "function": {
        "name": "get_customer_info",
        "description": "Retrieve customer account information. Use this when you need to verify customer identity or look up account details.",
        "parameters": {
          "type": "object",
          "properties": {
            "customer_email": {
              "type": "string",
              "description": "Customer's email address"
            },
            "customer_id": {
              "type": "string",
              "description": "Customer's account ID (optional if email provided)"
            }
          },
          "required": ["customer_email"]
        }
      }
    }
    ```

5. Add a third tool for scheduling callbacks:

    ```json
    {
      "type": "function",
      "function": {
        "name": "schedule_callback",
        "description": "Schedule a callback from a support agent. Use this when the customer requests to speak with a human or needs escalated support.",
        "parameters": {
          "type": "object",
          "properties": {
            "customer_phone": {
              "type": "string",
              "description": "Customer's phone number for callback"
            },
            "preferred_time": {
              "type": "string",
              "description": "Preferred callback time (e.g., 'morning', 'afternoon', '2pm EST')"
            },
            "issue_summary": {
              "type": "string",
              "description": "Brief summary of the customer's issue"
            }
          },
          "required": ["customer_phone", "issue_summary"]
        }
      }
    }
    ```

6. Add a product search tool:

    ```json
    {
      "type": "function",
      "function": {
        "name": "search_products",
        "description": "Search the product catalog. Use this when a customer asks about products, availability, or pricing.",
        "parameters": {
          "type": "object",
          "properties": {
            "query": {
              "type": "string",
              "description": "Search query (product name, category, or keywords)"
            },
            "category": {
              "type": "string",
              "description": "Product category filter (optional)",
              "enum": ["electronics", "clothing", "home", "sports", "all"]
            },
            "max_results": {
              "type": "integer",
              "description": "Maximum number of results to return"
            }
          },
          "required": ["query"]
        }
      }
    }
    ```

7. Your tools panel should now show 4 defined tools.
✅ Verify¶
Confirm tools are properly defined:
- All 4 tools appear in the Tools panel
- Each tool has name, description, and parameters
- No JSON syntax errors (panel accepts the definitions)
Best Practices for Tool Definitions¶
| Practice | Why It Matters |
|---|---|
| Clear descriptions | Helps AI decide when to use the tool |
| Specific parameter names | Reduces ambiguity in extracted values |
| Required vs optional | Guides AI on minimum needed information |
| Enum constraints | Limits values to valid options |
| Default values | Provides sensible fallbacks |
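These practices can also be enforced mechanically before anything is executed. The sketch below is a hypothetical pre-execution check, not part of vLLM: it verifies required fields and enum constraints from a tool's parameter schema (a production system would use a full JSON Schema validator instead).

```python
import json

def check_arguments(schema, arguments_json):
    """Return a list of problems with a generated tool call's arguments."""
    args = json.loads(arguments_json)
    errors = []
    # Required vs optional: every required field must be present.
    for field in schema.get("required", []):
        if field not in args:
            errors.append("missing required field: " + field)
    # Enum constraints: supplied values must be among the valid options.
    for field, spec in schema.get("properties", {}).items():
        if field in args and "enum" in spec and args[field] not in spec["enum"]:
            errors.append(field + " not in " + str(spec["enum"]))
    return errors

# Parameter schema from the search_products tool in Exercise 2.
schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "category": {
            "type": "string",
            "enum": ["electronics", "clothing", "home", "sports", "all"],
        },
        "max_results": {"type": "integer"},
    },
    "required": ["query"],
}

# A malformed call: missing the required query, invalid category value.
print(check_arguments(schema, '{"category": "toys"}'))  # reports two problems
```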
Exercise 3: Test Tool Calling Workflow¶
Now you'll see tool calling in action. The AI will analyze customer messages and generate appropriate function calls based on the tools you defined.
Steps¶
1. Ensure your tools from Exercise 2 are loaded and the server is running with tool calling enabled.

2. In the Chat panel, set a system prompt:

3. In the Tools panel, set Tool Choice to "Auto (recommended)". This allows the AI to automatically decide when to use tools.

4. Test order status inquiry:

    Hi, I placed an order last week and haven't received any updates. My order number is ORD-78432. Can you check the status?

    Expected AI behavior: the model generates a tool call.

    Streaming Disabled

    Streaming is disabled for tool calling due to a bug in vLLM v0.11.0. This has been resolved in later versions.

5. Observe the response format:
    - The AI recognizes the need for external data
    - Instead of making up information, it generates a function call
    - The arguments are extracted from the customer's message

6. Test customer lookup:

    Expected tool call:

7. Test callback scheduling:

    This is frustrating! I've been trying to resolve this billing issue for days. Can someone call me back? My number is 555-123-4567, preferably in the afternoon.

    Expected tool call:

8. Test product search:

    Expected tool call:

9. Test a message that doesn't need tools:

    Expected: a normal text response (no tool call); the AI should respond conversationally.

10. Test multiple potential tools:

    Observe: The AI may generate multiple tool calls or prioritize one.
✅ Verify¶
Confirm tool calling workflow works:
- AI generates tool calls for appropriate requests
- Arguments are correctly extracted from customer messages
- Tool names match the defined functions
- AI responds normally when tools aren't needed
- JSON format is valid and parseable
Understanding the Workflow¶
┌─────────────────────────────────────────────────────────────┐
│ Tool Calling Workflow │
│ │
│ 1. Customer Message │
│ "Check order ORD-12345" │
│ ↓ │
│ 2. AI Analysis │
│ Recognizes need for get_order_status tool │
│ ↓ │
│ 3. Tool Call Generated │
│ {"name": "get_order_status", "arguments": {...}} │
│ ↓ │
│ 4. [Module 4] Tool Execution (MCP Server) │
│ Actual function runs, returns data │
│ ↓ │
│ 5. [Module 4] AI Response │
│ AI incorporates result into customer response │
└─────────────────────────────────────────────────────────────┘
In this module, we completed steps 1-3. Module 4 (MCP Integration) covers steps 4-5 with actual tool execution.
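The workflow above can be sketched offline in Python. The generated tool call below is hand-written to illustrate step 3, and the stub function stands in for the real execution that Module 4 wires up through MCP:

```python
import json

# Stub implementation: a real version would query ACME's order system.
def get_order_status(order_id):
    return {"order_id": order_id, "status": "shipped"}

# The application, not the model, owns this mapping from tool names
# to real functions.
TOOL_REGISTRY = {"get_order_status": get_order_status}

# Step 3: a tool call as the model might generate it (illustrative).
generated_call = {
    "name": "get_order_status",
    "arguments": '{"order_id": "ORD-12345"}',
}

# Step 4 (previewing Module 4): the application dispatches the call.
func = TOOL_REGISTRY[generated_call["name"]]
result = func(**json.loads(generated_call["arguments"]))
print(result)  # {'order_id': 'ORD-12345', 'status': 'shipped'}
```

Looking up the function in an explicit registry, rather than calling whatever name the model emits, is what keeps execution under your control.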
Troubleshooting¶
AI responds with text instead of tool calls
Solution:
- Verify tool calling is enabled in server config
- Check that tools are properly loaded
- Ensure system prompt mentions available tools
- Try rephrasing the customer message more explicitly
AI generates wrong tool or arguments
Solution:
- Improve tool descriptions to be more specific
- Add examples in the description
- Check parameter names are clear and unambiguous
- Try a larger model (Llama 4 has improved tool calling)
Invalid JSON in tool call
Solution:
- Check tool call parser matches model
- Try "Auto-detect" parser setting
- Some models may need specific prompt formatting
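Because the model emits its arguments as text, a parser/model mismatch can produce arguments that are not valid JSON. A defensive parse, sketched here, lets your application fall back gracefully instead of crashing:

```python
import json

raw_arguments = '{"order_id": "ORD-12345"'  # truncated, invalid JSON

try:
    args = json.loads(raw_arguments)
except json.JSONDecodeError as err:
    # Fall back: log the failure and retry the request, or route the
    # customer to a human instead of executing anything.
    args = None
    print("unparseable tool arguments:", err)
```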
Clean Up¶
Before proceeding to Module 4, stop the current vLLM server:
- Click Stop Server in the vLLM Playground web UI
- Verify the server has stopped
Preparation for Module 4
Module 4 requires restarting the vLLM server after installing MCP dependencies.
Learning Outcomes¶
By completing this module, you should now understand:
- ✅ How tool calling enables AI to request external data and actions
- ✅ The role of tool parsers for different model families
- ✅ How to define tools with proper schemas and descriptions
- ✅ The workflow from customer message to generated function call
- ✅ Best practices for tool definitions that guide AI behavior
- ✅ The difference between tool call generation and tool execution
Module Summary¶
What you accomplished:
- Enabled tool calling in vLLM Playground server configuration
- Understood tool calling parsers for Llama, Mistral, and Hermes models
- Defined 4 customer support tools with proper schemas
- Tested tool calling workflow with various customer scenarios
- Observed AI-generated function calls with extracted arguments
Key takeaways:
- Tool calling transforms AI from passive responder to active assistant
- Good tool descriptions are critical for AI decision-making
- The AI generates structured calls; your systems handle execution
- Different models have different tool calling capabilities
- Tool calling is the foundation for agentic AI workflows
Business impact for ACME:
- AI can now recognize when it needs customer data
- Structured tool calls integrate with existing backend APIs
- Reduces need for customers to provide information multiple ways
- Foundation for automated support workflows
Next: Module 4: MCP Integration — Connect the AI to actual external tools with human-in-the-loop approval.
References¶
- vLLM Playground - Tool Calling
- vLLM Tool Calling
- OpenAI Function Calling Guide (compatible format)