Module 2: Advanced Inferencing: Structured Outputs¶

In Module 1, you deployed a vLLM server and experienced the chat interface. Now ACME Corporation faces a new challenge: their customer support system needs to integrate AI responses with existing backend systems. Free-form text responses are difficult to parse and process programmatically.

ACME's engineering team requires predictable, structured outputs that downstream systems can reliably consume. vLLM Playground's structured output capabilities — JSON Schema, Regex, and Grammar — provide exactly this control.

In this module, you'll learn to constrain LLM outputs to specific formats, ensuring ACME's AI responses are system-ready and consistently parseable.

Learning Objectives¶

By the end of this module, you'll be able to:

Analyze customer feedback using sentiment classification with constrained outputs
Define JSON Schema constraints for structured API-ready responses
Apply Regex patterns to enforce specific output formats
Create custom Grammar rules for complex output structures
Choose the appropriate structured output method for different use cases

Exercise 1: Sentiment Analysis with Constrained Outputs¶

ACME's customer support team receives thousands of feedback messages daily. Before routing to the appropriate team, they need to automatically classify the sentiment of each message. The classification must be consistent and machine-readable.

You'll configure vLLM to output only valid sentiment labels, ensuring reliable automated processing.

Prerequisites¶

Module 1 completed
Access to vLLM Playground web UI

Steps¶

Open the vLLM Playground web UI:

Navigate to: http://localhost:7860
Configure a more capable model:

For structured outputs to work reliably, we need a more capable model than TinyLlama used in Module 1.
- Click Stop Server if a server is currently running
- In the Model section, select or enter: Qwen/Qwen2.5-3B-Instruct
- Ensure Container mode and GPU mode are selected
- Click Start Server
Note

Qwen2.5-3B-Instruct is a 3 billion parameter model that provides better instruction-following capabilities for structured output tasks while still being fast on GPU.
Wait for the server to be ready (green "Server is ready to chat!" notification).
Navigate to the Structured Outputs section in the toolbar panel.
Click "Enable Structured Outputs".
Select Choice mode for simple constrained outputs.
Configure the sentiment choices:
```
["positive", "negative", "neutral"]
```
This constrains the model to output ONLY one of these three values.

Set a system prompt for sentiment analysis:

You are a sentiment classifier. Analyze the customer feedback and respond with exactly one word: positive, negative, or neutral. No other output.

Test with sample customer feedback:

Feedback: "I absolutely love your product! It exceeded all my expectations and the customer service was fantastic."

Expected output: positive

Test with negative feedback:

Feedback: "Terrible experience. The product arrived broken and nobody responded to my support ticket for a week."

Expected output: negative

Test with neutral feedback:

Feedback: "The product works as described. Delivery was on time. Nothing special but nothing wrong either."

Expected output: neutral

✅ Verify¶

Confirm structured outputs are working:

Model outputs ONLY one of the three allowed values
No additional text, explanations, or formatting
Consistent classification across similar inputs

Business Value for ACME¶

With constrained sentiment outputs, ACME achieves measurable business outcomes:

95% reduction in manual review time: From 20 hours per day to 1 hour per day analyzing customer feedback
60% faster response to negative feedback: Priority routing in under 2 minutes vs 30-minute manual triage
Real-time sentiment tracking: Monitor 10,000+ daily messages with automated dashboards (previously batched weekly)
40% improvement in customer satisfaction scores: Faster issue identification and resolution
$250,000 annual savings: Reduced customer support operational costs through automation

Exercise 2: JSON Schema for Structured Responses¶

ACME's backend systems expect customer support responses in a specific JSON format. The AI must generate responses that conform exactly to the required schema, enabling direct API integration without post-processing.

You'll define a JSON Schema that constrains the model to output properly structured customer support tickets.

Steps¶

Click the Clear button in the Chat Interface to start a new conversation.
In the Structured Outputs panel, select JSON mode.

Define a schema for customer support ticket extraction:

{
  "type": "object",
  "properties": {
    "ticket_type": {
      "type": "string",
      "enum": ["billing", "technical", "general", "complaint", "feature_request"]
    },
    "priority": {
      "type": "string",
      "enum": ["low", "medium", "high", "urgent"]
    },
    "summary": {
      "type": "string",
      "maxLength": 100
    },
    "customer_sentiment": {
      "type": "string",
      "enum": ["positive", "negative", "neutral"]
    },
    "requires_escalation": {
      "type": "boolean"
    }
  },
  "required": ["ticket_type", "priority", "summary", "customer_sentiment", "requires_escalation"]
}

Set a system prompt for ticket extraction:

You are a customer support ticket classifier. Extract structured information from customer messages and output a JSON object with ticket_type, priority, summary, customer_sentiment, and requires_escalation fields.

Test with a sample customer message:

Customer message: "I've been charged twice for my subscription this month! This is unacceptable and I want a refund immediately. I've been a loyal customer for 3 years and this is how you treat me?"

Expected output (formatted):

{
  "ticket_type": "billing",
  "priority": "high",
  "summary": "Customer charged twice for subscription, requesting immediate refund",
  "customer_sentiment": "negative",
  "requires_escalation": true
}

Test with a different scenario:

Customer message: "Hey, just wondering if you have any plans to add dark mode to the mobile app? Would be really nice to have. Thanks!"

Expected output (formatted):

{
  "ticket_type": "feature_request",
  "priority": "low",
  "summary": "Request for dark mode in mobile app",
  "customer_sentiment": "positive",
  "requires_escalation": false
}

Verify the output is valid JSON by checking the response panel shows proper formatting.

✅ Verify¶

Confirm JSON Schema constraints are enforced:

# You can validate the JSON output by saving it to a file and using Python
cat << 'EOF' > /tmp/output.json
<paste_your_json_output_here>
EOF
python3 -c "import json; print(json.load(open('/tmp/output.json')))"

Tip

The vLLM Playground response panel already validates JSON formatting - if the output displays properly formatted, it's valid JSON.

Output is valid JSON (no syntax errors)
All required fields are present
Enum values match the defined options
Boolean field is true/false (not string)

Exercise 3: Regex Patterns for Formatted Outputs¶

ACME's ticketing system requires specific ID formats for tracking. The AI must generate ticket IDs that match the exact pattern expected by downstream systems: ACME-XXXX-YYYY where X is a letter and Y is a digit.

You'll use Regex constraints to enforce this precise format.

Steps¶

Click the Clear button in the Chat Interface to start a new conversation.
In the Structured Outputs panel, select Regex mode.
Define a pattern for ACME ticket IDs:
```
ACME-[A-Z]{4}-[0-9]{4}
```
This pattern enforces: - Literal prefix ACME- - Exactly 4 uppercase letters - A hyphen - - Exactly 4 digits

Set a system prompt:

Generate a unique ticket ID for the customer support system. Output only the ticket ID, nothing else.

Test the Regex constraint:
```
Generate a ticket ID for a new billing inquiry.
```
Expected output format: ACME-BXYZ-1234 (exact letters/numbers will vary)
Verify the format matches by checking:
- Starts with ACME-
- Followed by 4 uppercase letters
- Then a hyphen
- Ends with 4 digits
Try a more complex Regex for phone number formatting:
```
$\d{3}$ \d{3}-\d{4}
```
This enforces US phone format: (XXX) XXX-XXXX
Test with prompt:
```
Format this phone number: 5551234567
```
Expected output: (555) 123-4567

✅ Verify¶

Confirm Regex patterns are enforced:

# Validate ticket ID format with grep
echo "ACME-WXYZ-1234" | grep -E "^ACME-[A-Z]{4}-[0-9]{4}$"

Output matches the exact pattern
No extra characters or whitespace
Pattern is consistently applied across multiple generations

Use Cases for ACME¶

Regex constraints enable:

Ticket IDs: Consistent format for tracking systems
Reference numbers: Order IDs, invoice numbers, case numbers
Formatted data: Phone numbers, dates, postal codes
Codes: Product SKUs, department codes, status codes

Exercise 4: Grammar for Complex Output Structures¶

ACME's analytics team needs AI-generated reports in a specific markup format that their reporting tool can parse. The format is more complex than what JSON Schema or Regex can easily express.

You'll define a custom Grammar to enforce a report structure with sections, bullet points, and specific delimiters.

Steps¶

Click the Clear button in the Chat Interface to start a new conversation.
In the Structured Outputs panel, select Grammar mode.

Grammar uses GBNF (GGML BNF) format. Define a simple report structure:

root ::= report
report ::= header sections footer
header ::= "## REPORT START ##\n"
sections ::= section+
section ::= section-title bullet-list "\n"
section-title ::= "### " [A-Za-z ]+ " ###\n"
bullet-list ::= bullet+
bullet ::= "- " [A-Za-z0-9 ,.]+ "\n"
footer ::= "## REPORT END ##"

This grammar enforces: - Report wrapped in START/END markers - One or more sections with titles - Bullet lists within each section

Set a system prompt:

Generate a brief customer support summary report. Follow the exact format structure provided.

Test with a prompt:

Summarize today's customer support metrics: 150 tickets resolved, 23 escalated, average response time 4.2 minutes, customer satisfaction 94%.

Expected output format:

## REPORT START ##
### Daily Metrics ###
- Total tickets resolved, 150
- Escalated tickets, 23
- Average response time, 4.2 minutes
### Customer Satisfaction ###
- Overall satisfaction score, 94 percent
## REPORT END ##

✅ Verify¶

Confirm Grammar constraints work:

Output follows the defined structure exactly
Required delimiters (##, ###, -) are present
No content outside the grammar rules

When to Use Grammar¶

Use Case	Best Method
Simple choices (yes/no, categories)	Choice mode
Structured data (APIs, databases)	JSON Schema
Specific formats (IDs, codes)	Regex
Complex markup, reports, custom formats	Grammar

Reset Structured Outputs¶

Before moving on, disable Structured Outputs to reset the configuration:

In the Structured Outputs panel, uncheck Enable Structured Outputs.
Click the Clear button to start fresh.

Note

In this module, we explored each structured output mode individually for learning purposes. In practice, these features can complement each other - for example, you might use JSON Schema for API responses in one conversation and Regex for ID generation in another.

Troubleshooting¶

This section covers common issues across all structured output modes (Choice, JSON Schema, Regex, Grammar).

JSON Schema Issues¶

Issue: Output contains extra text before/after JSON

Solution:

Ensure JSON Schema mode is properly selected
Check that the schema is valid JSON
Restart the server if mode changes don't take effect

Issue: Model outputs invalid enum values

Solution:

Verify enum arrays are properly formatted in schema
Ensure model supports guided decoding (check server logs)

Grammar Syntax Issues¶

Issue: Grammar syntax errors

Solution:

Verify GBNF syntax is correct
Test with simpler grammar first
Check for missing quotes or escape characters

Issue: Model output doesn't match grammar

Solution:

Ensure Grammar mode is selected (not JSON or Regex)
Restart server after changing grammar
Simplify grammar rules and test incrementally

General Structured Output Issues¶

Issue: Structured outputs not enforced (model ignores constraints)

Solution:

Verify the server was started with structured output support
Check server logs for guided decoding errors
Some models may not support all constraint types

Issue: "Guided decoding not supported" error

Solution:

Ensure you're using a compatible model
Check vLLM version supports guided decoding
Try a different model that supports constrained generation

Issue: Performance slower with structured outputs

Solution:

This is expected - constraint checking adds overhead
Use simpler constraints when possible
For high throughput, consider JSON Schema over Grammar

Learning Outcomes¶

By completing this module, you should now understand:

✅ How structured outputs enable reliable AI integration with backend systems
✅ The difference between Choice, JSON Schema, Regex, and Grammar modes
✅ When to use each structured output method based on requirements
✅ How to define JSON Schemas for API-ready responses
✅ How Regex patterns enforce specific output formats
✅ How Grammar rules handle complex structured outputs

Module Summary¶

You've successfully completed the Advanced Inferencing: Structured Outputs module.

What you accomplished:

Implemented sentiment classification with constrained choice outputs
Defined JSON Schemas for customer support ticket extraction
Applied Regex patterns for ticket ID and phone number formatting
Created Grammar rules for structured report generation

Key takeaways:

Structured outputs transform unpredictable AI text into reliable, parseable data
JSON Schema is ideal for API integration and database storage
Regex excels at enforcing specific ID and code formats
Grammar handles complex markup and custom structures
Choose the simplest constraint method that meets your requirements

Business impact for ACME:

Customer feedback automation: 10,000+ daily messages automatically categorized with 98% accuracy, eliminating 20 hours of manual review
Support ticket efficiency: JSON-formatted tickets integrate directly with Salesforce, reducing data entry time by 75% (15 min to 4 min per ticket)
Consistent tracking: Regex-validated ticket IDs enable 100% traceability across systems, reducing lost ticket incidents from 50/month to zero
Executive reporting: Grammar-constrained reports generate automatically at 5 PM daily, saving 10 hours/week of manual report compilation
Total productivity gain: 35 hours per week freed for high-value customer interactions, equivalent to adding 1 full-time support agent ($65K annual value)

Next steps:

Module 3 will explore Advanced Inferencing: Tool Calling - enabling the AI to execute functions, retrieve data, and perform actions on behalf of users.

Next: Module 3: Tool Calling — Enable the AI to execute functions, retrieve data, and perform actions.

Module 2: Advanced Inferencing: Structured Outputs¶

Learning Objectives¶

Exercise 1: Sentiment Analysis with Constrained Outputs¶

Prerequisites¶

Steps¶

✅ Verify¶

Business Value for ACME¶

Exercise 2: JSON Schema for Structured Responses¶

Steps¶

✅ Verify¶

Exercise 3: Regex Patterns for Formatted Outputs¶

Steps¶

✅ Verify¶

Use Cases for ACME¶

Exercise 4: Grammar for Complex Output Structures¶

Steps¶

✅ Verify¶

When to Use Grammar¶

Reset Structured Outputs¶

Troubleshooting¶

JSON Schema Issues¶

Grammar Syntax Issues¶

General Structured Output Issues¶

Learning Outcomes¶

Module Summary¶

References¶