Module 3: Advanced Inferencing: Tool Calling¶
In Module 2, you learned to constrain AI outputs to specific formats. Now ACME Corporation wants to take their AI customer support to the next level: enabling the AI to not just respond, but to take actions—looking up order status, checking inventory, or scheduling callbacks. This capability reduces average customer support handling time from 8 minutes to 3 minutes (a 62.5% reduction), enabling ACME to handle roughly 2.5x as many customer inquiries with the same support team while improving first-contact resolution rates from 65% to 85%.
Tool calling (also known as function calling) allows the AI to recognize when it needs external data or actions, and generate structured function calls that your systems can execute. In this module, you'll configure tool calling and define custom tools for ACME's customer support scenarios. By implementing tool calling, ACME achieves $400,000 annual savings through improved agent productivity and reduced escalation rates.
Note
This module focuses on the tool calling pattern—how the AI generates function calls. Actual tool execution with human-in-the-loop approval is covered in Module 4 (MCP Integration).
Learning Objectives¶
By the end of this module, you'll be able to:
- Enable tool calling in vLLM Playground server configuration
- Understand tool calling parsers for different model families
- Define custom tools with proper function schemas
- Interpret AI-generated function calls and arguments
- Choose the appropriate model and parser for tool calling use cases
Exercise 1: Enable Tool Calling¶
ACME's engineering team needs to configure the vLLM server to support tool calling. This requires enabling the feature and selecting the appropriate parser for the model being used.
You'll restart the server with tool calling enabled and understand the different parser options.
Prerequisites¶
- Module 1 completed (familiar with vLLM Playground)
- Access to vLLM Playground web UI
Understanding Tool Calling¶
Tool calling enables AI models to recognize when they need external data or capabilities, and generate structured requests (function calls) that your application can interpret and execute.
Important: Tool calling does NOT mean the LLM executes tools
When tool calling is enabled, the AI model generates a JSON-formatted function call with arguments — but the LLM itself does not execute any code or call external APIs. Your application (vLLM Playground in this case) is responsible for parsing the tool call, executing the actual function, and returning results to the model. This design ensures security and gives you full control over what actions are taken.
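To make that division of labor concrete, here is a minimal sketch of the application-side loop using the OpenAI-compatible Python client. The endpoint URL, model name, and run_tool dispatcher are illustrative assumptions, not part of the Playground:

```python
# Minimal sketch of the application-side tool-call loop (illustrative).
# Assumptions: a vLLM server with tool calling enabled is reachable at
# localhost:8000, and run_tool is a hypothetical dispatcher you own.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the current status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def run_tool(name: str, args: dict) -> str:
    """YOUR code executes the tool -- the model only asked for it."""
    if name == "get_order_status":
        return json.dumps({"order_id": args["order_id"], "status": "shipped"})
    raise ValueError(f"unknown tool {name!r}")

messages = [{"role": "user", "content": "Check order ORD-12345 for me."}]
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=messages,
    tools=TOOLS,
    tool_choice="auto",
)
msg = response.choices[0].message
if msg.tool_calls:  # the model *requested* a call; nothing has run yet
    messages.append(msg)
    for call in msg.tool_calls:
        result = run_tool(call.function.name,
                          json.loads(call.function.arguments))
        messages.append({"role": "tool",
                         "tool_call_id": call.id,
                         "content": result})
    # Second request: the model folds the tool result into its reply
    final = client.chat.completions.create(
        model="Qwen/Qwen2.5-3B-Instruct", messages=messages)
    print(final.choices[0].message.content)
```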
Understanding Tool Calling Parsers¶
Different model families use different formats for tool calling. vLLM supports several parsers:
| Parser | Models | Format |
|---|---|---|
| llama3_json | Llama 3.x (3.1, 3.2) | JSON-based function calls |
| mistral | Mistral, Mixtral | Mistral's native tool format |
| hermes | Hermes, Qwen (with Hermes prompt) | Hermes-style function calling |
| auto (Auto-detect) | Various | Attempts to detect model type |
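If you run vLLM directly rather than through the Playground, these options correspond to vLLM's --enable-auto-tool-choice and --tool-call-parser server flags (check your vLLM version's docs for the exact names it supports).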
Steps¶
- Open the vLLM Playground web UI: navigate to http://localhost:7860

- If a server is running, stop it first: click Stop Server in the Server Configuration panel.

- In the Server Configuration panel, configure tool calling:
  - Check Enable Tool Calling
  - Select a tool calling parser (or leave as "Auto-detect")

- For this exercise, configure:

  | Setting | Value |
  |---|---|
  | Model | Qwen/Qwen2.5-3B-Instruct (or a similar Qwen model) |
  | Enable Tool Calling | ✓ Checked |
  | Tool Call Parser | hermes (or Auto-detect) |

  Note

  Qwen models are open and don't require HuggingFace authentication. If using gated models like Llama, ensure you have HuggingFace access configured.

- Click Start Server and wait for the model to load.

- Monitor the Server Logs panel for tool calling confirmation. Look for messages indicating tool calling is enabled and the model has loaded successfully.
✅ Verify¶
Confirm tool calling is enabled:
- Server started without errors
- "Enable Tool Calling" shows as active in the UI
- Server logs confirm tool calling parser loaded
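As an optional check from outside the UI, you can hit the server's OpenAI-compatible API with a throwaway tool and confirm that a tool_calls field comes back. A minimal sketch, assuming the managed vLLM server listens on localhost:8000 (adjust if your Playground exposes a different port; the get_weather tool is a stand-in):

```python
# Smoke test: confirm the server emits tool calls.
# Assumes the vLLM OpenAI-compatible API is reachable at localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",
)
print(resp.choices[0].message.tool_calls)  # non-None means tool calling works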
Troubleshooting¶
Issue: "Tool calling not supported for this model"
Solution:
- Use a model that supports function calling (Llama 3.x, Mistral, Qwen)
- Check the model's documentation for tool calling support
- Try a different parser setting
Issue: Server fails to start with tool calling enabled
Solution:
- Check GPU memory—tool calling may require additional resources
- Try a smaller model
- Review server logs for specific errors
Exercise 2: Define Custom Tools¶
With tool calling enabled, ACME needs to define the tools (functions) that the AI can invoke. Each tool has a name, description, and parameter schema that tells the AI when and how to use it.
You'll create customer support tools that the AI can call to assist customers.
Understanding Tool Definitions¶
A tool definition includes:
- name: Function identifier (e.g., get_order_status)
- description: When to use this tool (helps the AI decide)
- parameters: JSON Schema defining expected arguments
Steps¶
- In vLLM Playground, navigate to the Tools panel (🔧 icon in the toolbar).

- Click Add Tool to define your first customer support function.

- Create an order status lookup tool:

  ```json
  {
    "type": "function",
    "function": {
      "name": "get_order_status",
      "description": "Look up the current status of a customer order. Use this when a customer asks about their order, shipping, or delivery.",
      "parameters": {
        "type": "object",
        "properties": {
          "order_id": {
            "type": "string",
            "description": "The order ID to look up (e.g., ORD-12345)"
          }
        },
        "required": ["order_id"]
      }
    }
  }
  ```

- Add a second tool for customer information:

  ```json
  {
    "type": "function",
    "function": {
      "name": "get_customer_info",
      "description": "Retrieve customer account information. Use this when you need to verify customer identity or look up account details.",
      "parameters": {
        "type": "object",
        "properties": {
          "customer_email": {
            "type": "string",
            "description": "Customer's email address"
          },
          "customer_id": {
            "type": "string",
            "description": "Customer's account ID (optional if email provided)"
          }
        },
        "required": ["customer_email"]
      }
    }
  }
  ```

- Add a third tool for scheduling callbacks:

  ```json
  {
    "type": "function",
    "function": {
      "name": "schedule_callback",
      "description": "Schedule a callback from a support agent. Use this when the customer requests to speak with a human or needs escalated support.",
      "parameters": {
        "type": "object",
        "properties": {
          "customer_phone": {
            "type": "string",
            "description": "Customer's phone number for callback"
          },
          "preferred_time": {
            "type": "string",
            "description": "Preferred callback time (e.g., 'morning', 'afternoon', '2pm EST')"
          },
          "issue_summary": {
            "type": "string",
            "description": "Brief summary of the customer's issue"
          }
        },
        "required": ["customer_phone", "issue_summary"]
      }
    }
  }
  ```

- Add a product search tool:

  ```json
  {
    "type": "function",
    "function": {
      "name": "search_products",
      "description": "Search the product catalog. Use this when a customer asks about products, availability, or pricing.",
      "parameters": {
        "type": "object",
        "properties": {
          "query": {
            "type": "string",
            "description": "Search query (product name, category, or keywords)"
          },
          "category": {
            "type": "string",
            "description": "Product category filter (optional)",
            "enum": ["electronics", "clothing", "home", "sports", "all"]
          },
          "max_results": {
            "type": "integer",
            "description": "Maximum number of results to return"
          }
        },
        "required": ["query"]
      }
    }
  }
  ```

- Your Tools panel should now show 4 defined tools.
✅ Verify¶
Confirm tools are properly defined:
- All 4 tools appear in the Tools panel
- Each tool has name, description, and parameters
- No JSON syntax errors (panel accepts the definitions)
Best Practices for Tool Definitions¶
| Practice | Why It Matters |
|---|---|
| Clear descriptions | Helps AI decide when to use the tool |
| Specific parameter names | Reduces ambiguity in extracted values |
| Required vs optional | Guides AI on minimum needed information |
| Enum constraints | Limits values to valid options |
| Default values | Provides sensible fallbacks |
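Putting several of these practices together, a definition might look like the following, expressed as a Python dict ready to pass as the tools parameter. The check_inventory function is a hypothetical example, not one of ACME's four tools:

```python
# Hypothetical tool definition applying the practices above.
check_inventory_tool = {
    "type": "function",
    "function": {
        "name": "check_inventory",  # specific, unambiguous name
        "description": (
            "Check warehouse stock for a product. Use this when a customer "
            "asks whether an item is in stock or how many units remain."
        ),  # tells the AI *when* to call it
        "parameters": {
            "type": "object",
            "properties": {
                "sku": {
                    "type": "string",
                    "description": "Product SKU, e.g. SKU-10042",
                },
                "warehouse": {
                    "type": "string",
                    "description": "Warehouse region (optional)",
                    "enum": ["us-east", "us-west", "eu"],  # only valid options
                    "default": "us-east",  # sensible fallback
                },
            },
            "required": ["sku"],  # the minimum needed information
        },
    },
}
```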
Exercise 3: Test Tool Calling Workflow¶
Now you'll see tool calling in action. The AI will analyze customer messages and generate appropriate function calls based on the tools you defined. Remember: in this module, we're observing the pattern—the AI generates the calls, but we don't execute them.
Steps¶
- Ensure your tools from Exercise 2 are loaded and the server is running with tool calling enabled.

- In the Chat panel, set a system prompt for customer support (for example: "You are a customer support assistant for ACME Corporation. Use the available tools to look up information rather than guessing.").
- In the Tools panel, set Tool Choice to Auto (recommended). This setting allows the AI to automatically decide when to use tools based on the conversation context.
- Test an order status inquiry:

  Hi, I placed an order last week and haven't received any updates. My order number is ORD-78432. Can you check the status?

  Expected AI behavior: the model generates a tool call.
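  A representative call, in simplified form (exact output varies by model and parser; on the wire, the API wraps this in a tool_calls entry whose arguments field is a JSON string):

  ```json
  {
    "name": "get_order_status",
    "arguments": {
      "order_id": "ORD-78432"
    }
  }
  ```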
Note
Streaming is disabled for tool calling to work around a bug in vLLM v0.11.0; the issue is fixed in later vLLM versions.
- Observe the response format:
  - The AI recognizes the need for external data
  - Instead of making up information, it generates a function call
  - The arguments are extracted from the customer's message
- Test a customer lookup with a message that includes the customer's email (for example: "Can you check my account? My email is jane.doe@example.com").

  Expected tool call:
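  (A simplified sketch, based on the example message above.)

  ```json
  {
    "name": "get_customer_info",
    "arguments": {
      "customer_email": "jane.doe@example.com"
    }
  }
  ```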
- Test callback scheduling:

  This is frustrating! I've been trying to resolve this billing issue for days. Can someone call me back? My number is 555-123-4567, preferably in the afternoon.

  Expected tool call:
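  (A simplified sketch; the exact issue_summary wording will vary.)

  ```json
  {
    "name": "schedule_callback",
    "arguments": {
      "customer_phone": "555-123-4567",
      "preferred_time": "afternoon",
      "issue_summary": "Unresolved billing issue after multiple attempts"
    }
  }
  ```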
- Test a product search (for example: "Do you have wireless headphones in stock? Ideally under $100.").

  Expected tool call:
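  (A simplified sketch, based on the example message above.)

  ```json
  {
    "name": "search_products",
    "arguments": {
      "query": "wireless headphones",
      "category": "electronics"
    }
  }
  ```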
- Test a message that doesn't need tools (for example: "What are your support hours?").

  Expected: a normal text response (no tool call). The AI should respond conversationally without invoking a tool.
- Test a message that could plausibly use multiple tools (for example: "Check the status of order ORD-78432, and have someone call me at 555-987-6543 about it.").

  Observe: the AI may generate multiple tool calls or prioritize one. Different models handle this differently.
✅ Verify¶
Confirm tool calling workflow works:
- AI generates tool calls for appropriate requests
- Arguments are correctly extracted from customer messages
- Tool names match the defined functions
- AI responds normally when tools aren't needed
- JSON format is valid and parseable
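The last verify item can also be checked mechanically: arguments arrive as a JSON string, so a successful json.loads is the parseability test. A short sketch, continuing from the resp object in the Exercise 1 smoke test:

```python
import json

# Arguments arrive as a JSON *string*; parsing them is the validity check.
for call in resp.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)  # raises ValueError if malformed
    print(call.function.name, args)
```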
Understanding the Workflow¶
┌─────────────────────────────────────────────────────────────┐
│ Tool Calling Workflow │
│ │
│ 1. Customer Message │
│ "Check order ORD-12345" │
│ ↓ │
│ 2. AI Analysis │
│ Recognizes need for get_order_status tool │
│ ↓ │
│ 3. Tool Call Generated │
│ {"name": "get_order_status", "arguments": {...}} │
│ ↓ │
│ 4. [Module 4] Tool Execution (MCP Server) │
│ Actual function runs, returns data │
│ ↓ │
│ 5. [Module 4] AI Response │
│ AI incorporates result into customer response │
└─────────────────────────────────────────────────────────────┘
In this module, we completed steps 1-3. Module 4 (MCP Integration) covers steps 4-5 with actual tool execution.
Troubleshooting¶
Issue: AI responds with text instead of tool calls
Solution:
- Verify tool calling is enabled in server config
- Check that tools are properly loaded
- Ensure system prompt mentions available tools
- Try rephrasing the customer message more explicitly
Issue: AI generates wrong tool or arguments
Solution:
- Improve tool descriptions to be more specific
- Add examples in the description
- Check parameter names are clear and unambiguous
- Try a bigger or newer model (Llama 4 has improved tool calling)
Issue: Invalid JSON in tool call
Solution:
- Check tool call parser matches model
- Try "Auto-detect" parser setting
- Some models may need specific prompt formatting
Clean Up¶
Before proceeding to Module 4, stop the current vLLM server to prepare for MCP integration:
- In the vLLM Playground web UI, click the Stop Server button to terminate the running vLLM instance.

- Verify the server has stopped by checking that the server status shows "Stopped" or the Start Server button becomes available again.
Note
Module 4 requires restarting the vLLM Playground daemon service after installing MCP dependencies. Stopping the server now ensures a clean transition.
Troubleshooting¶
Issue: Tool calls not appearing in response
Solution:
- Verify "Enable Tool Calling" is checked
- Restart server after enabling
- Check server logs for tool-related errors
Issue: Model ignores defined tools
Solution:
- Ensure tools are saved/loaded before chatting
- Add system prompt explicitly mentioning tools
- Use models known for good tool calling (Llama 3.1+, Mistral)
Issue: Performance degradation with tools
Solution:
- Tool calling adds processing overhead
- Reduce number of tools if possible
- Use concise tool descriptions
Learning Outcomes¶
By completing this module, you should now understand:
- ✅ How tool calling enables AI to request external data and actions
- ✅ The role of tool parsers for different model families
- ✅ How to define tools with proper schemas and descriptions
- ✅ The workflow from customer message to generated function call
- ✅ Best practices for tool definitions that guide AI behavior
- ✅ The difference between tool call generation and tool execution
Module Summary¶
You've successfully completed the Advanced Inferencing: Tool Calling module.
What you accomplished:
- Enabled tool calling in vLLM Playground server configuration
- Understood tool calling parsers for Llama, Mistral, and Hermes models
- Defined 4 customer support tools with proper schemas
- Tested tool calling workflow with various customer scenarios
- Observed AI-generated function calls with extracted arguments
Key takeaways:
- Tool calling transforms AI from passive responder to active assistant
- Good tool descriptions are critical for AI decision-making
- The AI generates structured calls; your systems handle execution
- Different models have different tool calling capabilities
- Tool calling is the foundation for agentic AI workflows
Business impact for ACME:
- AI can now recognize when it needs customer data
- Structured tool calls integrate with existing backend APIs
- Reduces need for customers to provide information multiple ways
- Foundation for automated support workflows
Next steps:
Module 4 will explore Advanced Inferencing: MCP Integration - connecting the AI to actual external tools with human-in-the-loop approval for safe execution.