Structured Outputs Schema Generator

Name: Pierre Kasparian - Freelance AI integration

Describe in plain language the data you want to extract, and instantly get a production-ready Pydantic class or JSON Schema for OpenAI Function Calling or Instructor.

System prompt used by this tool

For full transparency, here is the exact system prompt sent to the Groq API (Llama 3.3 70B). The version below is the one used for the Pydantic format; the JSON Schema format uses its own matching prompt.

You are a Python expert specializing in Pydantic v2 for OpenAI Structured Outputs and the Instructor library.

Your task: generate a Pydantic BaseModel class from the user's description of data to extract.

Strict rules:
1. Imports: from typing import Optional, List; from pydantic import BaseModel, Field
2. Use Optional[T] for fields that may be absent in the source text
3. Use List[SubModel] for collections; create a dedicated sub-class for complex list items
4. Every field must carry Field(description="..."), as this description guides the LLM during extraction
5. snake_case for all field names
6. Add a one-line class docstring to every class
7. Output ONLY valid Python code, with no markdown code fences and no explanations

Few-shot example:
User: "I want to extract the company name, the amount before tax, the invoice date and a list of line items"
Output:
from typing import Optional, List
from pydantic import BaseModel, Field

class LineItem(BaseModel):
    """A single line item on the invoice."""
    description: str = Field(description="Product or service description")
    quantity: Optional[float] = Field(None, description="Quantity ordered")
    unit_price: Optional[float] = Field(None, description="Unit price in euros")
    total: Optional[float] = Field(None, description="Line total in euros")

class Invoice(BaseModel):
    """Structured invoice data for LLM extraction."""
    company_name: str = Field(description="Name of the issuing company")
    amount_before_tax: float = Field(description="Total before-tax amount in euros")
    invoice_date: str = Field(description="Invoice date in YYYY-MM-DD format")
    items: List[LineItem] = Field(default_factory=list, description="All line items on the invoice")

Why this prompt reduces hallucinations

Few-shot prompting anchors the model to an exact output format through a worked example. The rule "Output ONLY valid Python code" suppresses any surrounding text. A low temperature (0.1) makes the output close to deterministic, and the numbered rules act as a checklist the model works through in order.
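A minimal sketch of the generation step described above, assuming the Groq Python SDK's OpenAI-style chat interface. `SYSTEM_PROMPT` is abridged, `MODEL_ID` is an assumed Groq identifier for Llama 3.3 70B, and `clean_output` is a hypothetical safety net (not necessarily what this tool does) that strips stray code fences and checks the result with `ast.parse` before it is shown.

```python
import ast

# Hypothetical placeholders: SYSTEM_PROMPT stands for the full prompt shown above,
# MODEL_ID is an assumption about Groq's naming for Llama 3.3 70B.
SYSTEM_PROMPT = "You are a Python expert specializing in Pydantic v2 ..."
MODEL_ID = "llama-3.3-70b-versatile"


def build_request(description: str) -> dict:
    """Assemble chat-completion kwargs: system prompt, user description, low temperature."""
    return {
        "model": MODEL_ID,
        "temperature": 0.1,  # low temperature for near-deterministic output
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": description},
        ],
    }


def clean_output(text: str) -> str:
    """Drop a leading/trailing markdown fence if the model added one, then check syntax."""
    fence = "`" * 3
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(fence):
        lines = lines[1:]  # remove an opening fence such as ```python
    if lines and lines[-1].strip() == fence:
        lines = lines[:-1]  # remove the closing fence
    code = "\n".join(lines)
    ast.parse(code)  # raises SyntaxError if the result is not valid Python
    return code
```

With the `groq` package installed and `GROQ_API_KEY` set, the request could be sent as `Groq().chat.completions.create(**build_request(description))`, with `clean_output` applied to `response.choices[0].message.content`.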

How it works

Describe your data

Write in plain language what you want to extract: fields, expected types, nested lists.

Choose the format

Pydantic for a Python project using Instructor or LangChain. JSON Schema for direct integration with the OpenAI API.

Copy and use

The schema is ready to use. Paste it into your code and wire up your LLM extraction in a few lines.

Why strict typing is essential for connecting AI to your databases

Connecting an LLM to PostgreSQL, BigQuery or a REST API without strict typing is like submitting a form without validation: data arrives, but in any format. JSON becomes unusable, inserts fail, and your pipeline crashes on the first exception.

The free-form LLM problem

Without a schema, an LLM returns free text. "amount": "$1,200.00" instead of 1200.0 as a float. "date": "March 15th" instead of "2024-03-15". These formats are unreadable by a SQL engine or REST API expecting precise types.
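To make the problem concrete, here is a small sketch using a hypothetical two-field Pydantic model: the free-form values above are rejected at validation time instead of reaching the database, while properly typed values pass.

```python
from datetime import date

from pydantic import BaseModel, Field, ValidationError


class InvoiceRow(BaseModel):
    """Hypothetical two-field model used only for this illustration."""
    amount: float = Field(description="Amount before tax, as a number")
    invoice_date: date = Field(description="Invoice date (ISO 8601, YYYY-MM-DD)")


def try_parse(payload: dict):
    """Return the validated model, or the names of the fields that failed validation."""
    try:
        return InvoiceRow.model_validate(payload)
    except ValidationError as exc:
        return [err["loc"][0] for err in exc.errors()]


# Free-form values an unconstrained LLM might return
print(try_parse({"amount": "$1,200.00", "invoice_date": "March 15th"}))
# -> ['amount', 'invoice_date']

# Typed values a SQL column can accept; the ISO string is parsed into a date
print(try_parse({"amount": 1200.0, "invoice_date": "2024-03-15"}))
```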

Structured Outputs: the solution

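OpenAI Structured Outputs (a json_schema response format, or function calling with strict mode enabled) constrains the model to produce JSON that conforms to your schema, and Instructor validates each response against your Pydantic model, re-asking the model when validation fails. No more manual string parsing, and far fewer failures on unexpected formats: the output can go straight into your database.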

Pydantic or JSON Schema?

Pydantic is ideal in Python: automatic validation, native serialisation, direct integration with Instructor and LangChain. JSON Schema suits multi-language environments or when calling the OpenAI API directly without an SDK.
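The two formats are not mutually exclusive: a Pydantic class can emit its own JSON Schema. The sketch below uses an abridged copy of the few-shot `Invoice` and `LineItem` classes; it does not claim the tool's JSON Schema output is byte-for-byte identical.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, Field


class LineItem(BaseModel):
    """A single line item on the invoice."""
    description: str = Field(description="Product or service description")
    total: Optional[float] = Field(None, description="Line total in euros")


class Invoice(BaseModel):
    """Structured invoice data for LLM extraction."""
    company_name: str = Field(description="Name of the issuing company")
    items: List[LineItem] = Field(default_factory=list, description="All line items on the invoice")


# Nested classes become entries under "$defs", referenced via "$ref"
schema = Invoice.model_json_schema()
print(json.dumps(schema, indent=2))
```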

In production: PostgreSQL and BigQuery

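With a Pydantic schema, .model_dump() turns the validated object into a plain dictionary you can pass to a SQLAlchemy insert or load into BigQuery (the Storage Write API additionally needs the rows converted to its own format). Every field is typed, validated and documented, so the LLM-to-database pipeline becomes reliable and maintainable.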
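A runnable sketch of the dump-and-insert step. It uses Python's built-in `sqlite3` in memory as a stand-in for PostgreSQL, and `InvoiceHeader` is a hypothetical flat model; nested line items would normally go to a separate table.

```python
import sqlite3
from datetime import date

from pydantic import BaseModel, Field


class InvoiceHeader(BaseModel):
    """Hypothetical flat invoice model for this illustration."""
    company_name: str = Field(description="Name of the issuing company")
    amount_before_tax: float = Field(description="Total before-tax amount in euros")
    invoice_date: date = Field(description="Invoice date")


# In-memory SQLite stands in for PostgreSQL so the sketch runs anywhere
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (company_name TEXT, amount_before_tax REAL, invoice_date TEXT)"
)

invoice = InvoiceHeader(company_name="Acme Corp", amount_before_tax=1200.0, invoice_date=date(2024, 3, 15))

# mode="json" serialises the date as an ISO string that SQLite can store
row = invoice.model_dump(mode="json")
conn.execute(
    "INSERT INTO invoices VALUES (:company_name, :amount_before_tax, :invoice_date)",
    row,
)
conn.commit()

print(conn.execute("SELECT * FROM invoices").fetchall())
# -> [('Acme Corp', 1200.0, '2024-03-15')]
```

With SQLAlchemy Core, the same dictionary could be passed to `insert(invoices_table).values(row)`; a PostgreSQL driver can also accept the plain `model_dump()` output with native `date` objects.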

Usage with Instructor (Python)

Once you have your Pydantic schema, plug it directly into Instructor for structured extraction with zero manual parsing:

import instructor
from openai import OpenAI

# Paste your generated Pydantic class here (e.g. the Invoice class above)

client = instructor.from_openai(OpenAI())  # reads OPENAI_API_KEY from the environment

invoice_text = "..."  # raw text of the invoice to process

result = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Invoice,  # your generated class
    max_retries=2,  # re-ask the model if validation fails
    messages=[{
        "role": "user",
        "content": f"Extract data from this invoice:\n{invoice_text}",
    }],
)

# result is a typed, validated Invoice object
print(result.model_dump())
# e.g. {'company_name': 'Acme Corp', 'amount_before_tax': 1200.0, ...}

Frequently asked questions

What is the difference between Pydantic and JSON Schema for Structured Outputs?

Pydantic generates a Python class with built-in validation, ideal for Python projects using Instructor or LangChain. JSON Schema is a universal format natively supported by the OpenAI API, useful for multi-language projects or when calling the API without a Python SDK.

Is the generated schema compatible with Instructor?

Yes. The generated Pydantic schema uses Pydantic v2 with BaseModel and Field, which is directly compatible with Instructor (client.chat.completions.create with response_model=YourModel).

Can I use JSON Schema directly with the OpenAI API?

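Yes. Pass the generated schema in the response_format parameter when calling the OpenAI API: {"type": "json_schema", "json_schema": {"name": "MySchema", "schema": your_schema}}. Adding "strict": true inside json_schema enables strict Structured Outputs, which also requires every property to appear in "required" and "additionalProperties": false on each object.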
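A small sketch of building that payload in Python. The schema here is a hypothetical hand-written example made strict-compatible: every property is listed in `required`, extra keys are forbidden, and the optional date is expressed as a union with `null`.

```python
import json

# Hypothetical strict-compatible schema for this illustration
schema = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string", "description": "Name of the issuing company"},
        "amount_before_tax": {"type": "number", "description": "Total before-tax amount in euros"},
        "invoice_date": {
            "type": ["string", "null"],
            "description": "Invoice date in YYYY-MM-DD format, or null if absent",
        },
    },
    "required": ["company_name", "amount_before_tax", "invoice_date"],
    "additionalProperties": False,
}


def make_response_format(name: str, json_schema: dict, strict: bool = True) -> dict:
    """Wrap a JSON Schema in the response_format structure expected by the OpenAI API."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": json_schema, "strict": strict},
    }


response_format = make_response_format("Invoice", schema)
print(json.dumps(response_format, indent=2))
```

The dictionary would then be passed as `client.chat.completions.create(model=..., messages=..., response_format=response_format)`, and the JSON text in `completion.choices[0].message.content` parsed with `json.loads` (or `model_validate_json` if you also have a Pydantic class).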

How are optional fields handled?

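In Pydantic, Optional[T] combined with a default of None (for example Field(None, description="...")) marks a field that may be absent from the source text. Note that in Pydantic v2, Optional[T] without a default is still required: it accepts null but cannot be omitted. In JSON Schema, fields absent from the "required" array are optional. The generator handles this logic automatically based on your description.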
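The difference is easy to check. This sketch uses a hypothetical `Contact` model to contrast a required field, an optional field with a None default, and a nullable field with no default, and shows how each lands in the generated "required" array.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Contact(BaseModel):
    """Hypothetical model contrasting three ways to declare a field."""
    name: str = Field(description="Always present")
    email: Optional[str] = Field(None, description="May be absent; defaults to None")
    phone: Optional[str] = Field(description="Nullable but still required: no default")


print(Contact.model_json_schema()["required"])  # -> ['name', 'phone']

print(Contact(name="Ada", phone=None).email)  # -> None

try:
    Contact(name="Ada")  # phone omitted entirely
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # -> missing
```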

What technology powers this tool?

The tool uses Meta's Llama 3.3 70B via the Groq API, with a temperature of 0.1 for near-deterministic results. The system prompt includes a few-shot example to anchor the output format.

Do your data pipelines need more than schemas?

I build LLM-connected agents and ETL pipelines for you, from prototype to production.

Get in touch