Module 2: Advanced Inferencing: Structured Outputs¶
In Module 1, you deployed a vLLM server and experienced the chat interface. Now ACME Corporation faces a new challenge: their customer support system needs to integrate AI responses with existing backend systems. Free-form text responses are difficult to parse and process programmatically.
ACME's engineering team requires predictable, structured outputs that downstream systems can reliably consume. vLLM Playground's structured output capabilities — JSON Schema, Regex, and Grammar — provide exactly this control.
In this module, you'll learn to constrain LLM outputs to specific formats, ensuring ACME's AI responses are system-ready and consistently parseable.
Learning Objectives¶
By the end of this module, you'll be able to:
- Analyze customer feedback using sentiment classification with constrained outputs
- Define JSON Schema constraints for structured API-ready responses
- Apply Regex patterns to enforce specific output formats
- Create custom Grammar rules for complex output structures
- Choose the appropriate structured output method for different use cases
Exercise 1: Sentiment Analysis with Constrained Outputs¶
ACME's customer support team receives thousands of feedback messages daily. Before routing to the appropriate team, they need to automatically classify the sentiment of each message. The classification must be consistent and machine-readable.
You'll configure vLLM to output only valid sentiment labels, ensuring reliable automated processing.
Prerequisites¶
- Module 1 completed
- Access to vLLM Playground web UI
Steps¶
-
Open the vLLM Playground web UI:
Navigate to: http://localhost:7860
-
Configure a more capable model:
For structured outputs to work reliably, we need a more capable model than TinyLlama used in Module 1.
- Click Stop Server if a server is currently running
- In the Model section, select or enter:
Qwen/Qwen2.5-3B-Instruct - Ensure Container mode and GPU mode are selected
- Click Start Server
Note
Qwen2.5-3B-Instruct is a 3 billion parameter model that provides better instruction-following capabilities for structured output tasks while still being fast on GPU.
-
Wait for the server to be ready (green "Server is ready to chat!" notification).
-
Navigate to the Structured Outputs section in the toolbar panel.
-
Click "Enable Structured Outputs".
-
Select Choice mode for simple constrained outputs.
-
Configure the sentiment choices:
This constrains the model to output ONLY one of these three values.
-
Set a system prompt for sentiment analysis:
-
Test with sample customer feedback:
Feedback: "I absolutely love your product! It exceeded all my expectations and the customer service was fantastic."Expected output:
positive -
Test with negative feedback:
Feedback: "Terrible experience. The product arrived broken and nobody responded to my support ticket for a week."Expected output:
negative -
Test with neutral feedback:
Feedback: "The product works as described. Delivery was on time. Nothing special but nothing wrong either."Expected output:
neutral
✅ Verify¶
Confirm structured outputs are working:
- Model outputs ONLY one of the three allowed values
- No additional text, explanations, or formatting
- Consistent classification across similar inputs
Business Value for ACME¶
With constrained sentiment outputs, ACME achieves measurable business outcomes:
- 95% reduction in manual review time: From 20 hours per day to 1 hour per day analyzing customer feedback
- 60% faster response to negative feedback: Priority routing in under 2 minutes vs 30-minute manual triage
- Real-time sentiment tracking: Monitor 10,000+ daily messages with automated dashboards (previously batched weekly)
- 40% improvement in customer satisfaction scores: Faster issue identification and resolution
- $250,000 annual savings: Reduced customer support operational costs through automation
Exercise 2: JSON Schema for Structured Responses¶
ACME's backend systems expect customer support responses in a specific JSON format. The AI must generate responses that conform exactly to the required schema, enabling direct API integration without post-processing.
You'll define a JSON Schema that constrains the model to output properly structured customer support tickets.
Steps¶
-
Click the Clear button in the Chat Interface to start a new conversation.
-
In the Structured Outputs panel, select JSON mode.
-
Define a schema for customer support ticket extraction:
{ "type": "object", "properties": { "ticket_type": { "type": "string", "enum": ["billing", "technical", "general", "complaint", "feature_request"] }, "priority": { "type": "string", "enum": ["low", "medium", "high", "urgent"] }, "summary": { "type": "string", "maxLength": 100 }, "customer_sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] }, "requires_escalation": { "type": "boolean" } }, "required": ["ticket_type", "priority", "summary", "customer_sentiment", "requires_escalation"] } -
Set a system prompt for ticket extraction:
-
Test with a sample customer message:
Customer message: "I've been charged twice for my subscription this month! This is unacceptable and I want a refund immediately. I've been a loyal customer for 3 years and this is how you treat me?"Expected output (formatted):
-
Test with a different scenario:
Customer message: "Hey, just wondering if you have any plans to add dark mode to the mobile app? Would be really nice to have. Thanks!"Expected output (formatted):
-
Verify the output is valid JSON by checking the response panel shows proper formatting.
✅ Verify¶
Confirm JSON Schema constraints are enforced:
# You can validate the JSON output by saving it to a file and using Python
cat << 'EOF' > /tmp/output.json
<paste_your_json_output_here>
EOF
python3 -c "import json; print(json.load(open('/tmp/output.json')))"
Tip
The vLLM Playground response panel already validates JSON formatting - if the output displays properly formatted, it's valid JSON.
- Output is valid JSON (no syntax errors)
- All required fields are present
- Enum values match the defined options
- Boolean field is true/false (not string)
Exercise 3: Regex Patterns for Formatted Outputs¶
ACME's ticketing system requires specific ID formats for tracking. The AI must generate ticket IDs that match the exact pattern expected by downstream systems: ACME-XXXX-YYYY where X is a letter and Y is a digit.
You'll use Regex constraints to enforce this precise format.
Steps¶
-
Click the Clear button in the Chat Interface to start a new conversation.
-
In the Structured Outputs panel, select Regex mode.
-
Define a pattern for ACME ticket IDs:
This pattern enforces: - Literal prefix
ACME-- Exactly 4 uppercase letters - A hyphen-- Exactly 4 digits -
Set a system prompt:
-
Test the Regex constraint:
Expected output format:
ACME-BXYZ-1234(exact letters/numbers will vary) -
Verify the format matches by checking:
- Starts with
ACME- - Followed by 4 uppercase letters
- Then a hyphen
- Ends with 4 digits
- Starts with
-
Try a more complex Regex for phone number formatting:
This enforces US phone format:
(XXX) XXX-XXXX -
Test with prompt:
Expected output:
(555) 123-4567
✅ Verify¶
Confirm Regex patterns are enforced:
- Output matches the exact pattern
- No extra characters or whitespace
- Pattern is consistently applied across multiple generations
Use Cases for ACME¶
Regex constraints enable:
- Ticket IDs: Consistent format for tracking systems
- Reference numbers: Order IDs, invoice numbers, case numbers
- Formatted data: Phone numbers, dates, postal codes
- Codes: Product SKUs, department codes, status codes
Exercise 4: Grammar for Complex Output Structures¶
ACME's analytics team needs AI-generated reports in a specific markup format that their reporting tool can parse. The format is more complex than what JSON Schema or Regex can easily express.
You'll define a custom Grammar to enforce a report structure with sections, bullet points, and specific delimiters.
Steps¶
-
Click the Clear button in the Chat Interface to start a new conversation.
-
In the Structured Outputs panel, select Grammar mode.
-
Grammar uses GBNF (GGML BNF) format. Define a simple report structure:
root ::= report report ::= header sections footer header ::= "## REPORT START ##\n" sections ::= section+ section ::= section-title bullet-list "\n" section-title ::= "### " [A-Za-z ]+ " ###\n" bullet-list ::= bullet+ bullet ::= "- " [A-Za-z0-9 ,.]+ "\n" footer ::= "## REPORT END ##"This grammar enforces: - Report wrapped in START/END markers - One or more sections with titles - Bullet lists within each section
-
Set a system prompt:
-
Test with a prompt:
Summarize today's customer support metrics: 150 tickets resolved, 23 escalated, average response time 4.2 minutes, customer satisfaction 94%.Expected output format:
✅ Verify¶
Confirm Grammar constraints work:
- Output follows the defined structure exactly
- Required delimiters (##, ###, -) are present
- No content outside the grammar rules
When to Use Grammar¶
| Use Case | Best Method |
|---|---|
| Simple choices (yes/no, categories) | Choice mode |
| Structured data (APIs, databases) | JSON Schema |
| Specific formats (IDs, codes) | Regex |
| Complex markup, reports, custom formats | Grammar |
Reset Structured Outputs¶
Before moving on, disable Structured Outputs to reset the configuration:
- In the Structured Outputs panel, uncheck Enable Structured Outputs.
- Click the Clear button to start fresh.
Note
In this module, we explored each structured output mode individually for learning purposes. In practice, these features can complement each other - for example, you might use JSON Schema for API responses in one conversation and Regex for ID generation in another.
Troubleshooting¶
This section covers common issues across all structured output modes (Choice, JSON Schema, Regex, Grammar).
JSON Schema Issues¶
Issue: Output contains extra text before/after JSON
Solution:
- Ensure JSON Schema mode is properly selected
- Check that the schema is valid JSON
- Restart the server if mode changes don't take effect
Issue: Model outputs invalid enum values
Solution:
- Verify enum arrays are properly formatted in schema
- Ensure model supports guided decoding (check server logs)
Grammar Syntax Issues¶
Issue: Grammar syntax errors
Solution:
- Verify GBNF syntax is correct
- Test with simpler grammar first
- Check for missing quotes or escape characters
Issue: Model output doesn't match grammar
Solution:
- Ensure Grammar mode is selected (not JSON or Regex)
- Restart server after changing grammar
- Simplify grammar rules and test incrementally
General Structured Output Issues¶
Issue: Structured outputs not enforced (model ignores constraints)
Solution:
- Verify the server was started with structured output support
- Check server logs for guided decoding errors
- Some models may not support all constraint types
Issue: "Guided decoding not supported" error
Solution:
- Ensure you're using a compatible model
- Check vLLM version supports guided decoding
- Try a different model that supports constrained generation
Issue: Performance slower with structured outputs
Solution:
- This is expected - constraint checking adds overhead
- Use simpler constraints when possible
- For high throughput, consider JSON Schema over Grammar
Learning Outcomes¶
By completing this module, you should now understand:
- ✅ How structured outputs enable reliable AI integration with backend systems
- ✅ The difference between Choice, JSON Schema, Regex, and Grammar modes
- ✅ When to use each structured output method based on requirements
- ✅ How to define JSON Schemas for API-ready responses
- ✅ How Regex patterns enforce specific output formats
- ✅ How Grammar rules handle complex structured outputs
Module Summary¶
You've successfully completed the Advanced Inferencing: Structured Outputs module.
What you accomplished:
- Implemented sentiment classification with constrained choice outputs
- Defined JSON Schemas for customer support ticket extraction
- Applied Regex patterns for ticket ID and phone number formatting
- Created Grammar rules for structured report generation
Key takeaways:
- Structured outputs transform unpredictable AI text into reliable, parseable data
- JSON Schema is ideal for API integration and database storage
- Regex excels at enforcing specific ID and code formats
- Grammar handles complex markup and custom structures
- Choose the simplest constraint method that meets your requirements
Business impact for ACME:
- Customer feedback automation: 10,000+ daily messages automatically categorized with 98% accuracy, eliminating 20 hours of manual review
- Support ticket efficiency: JSON-formatted tickets integrate directly with Salesforce, reducing data entry time by 75% (15 min to 4 min per ticket)
- Consistent tracking: Regex-validated ticket IDs enable 100% traceability across systems, reducing lost ticket incidents from 50/month to zero
- Executive reporting: Grammar-constrained reports generate automatically at 5 PM daily, saving 10 hours/week of manual report compilation
- Total productivity gain: 35 hours per week freed for high-value customer interactions, equivalent to adding 1 full-time support agent ($65K annual value)
Next steps:
Module 3 will explore Advanced Inferencing: Tool Calling - enabling the AI to execute functions, retrieve data, and perform actions on behalf of users.
Next: Module 3: Tool Calling — Enable the AI to execute functions, retrieve data, and perform actions.









