A Practical Guide to using Pydantic | by Marc Nealer | Jun, 2024 | Medium


Jun 22, 2024


When I started to experiment with FastAPI, I came across Pydantic. With FastAPI, you really have no choice. However, my initial thoughts on the library were not the best. It has a somewhat steep learning curve, and there seem to be a lot of ways to do the same thing without anyone saying, “oh, use this route unless…”.

With that said, Pydantic is wonderful and such a powerful tool once you understand it. It's in my top 10 Python libraries.

Before I continue, note that this document discusses Pydantic v2.*. There are significant differences between versions 1 and 2. I would also caution you against using ChatGPT or Gemini to help with coding Pydantic; the results they give are a strange mix of version 1 and 2.

So What is Pydantic?

Pydantic is like Python dataclasses with validation, serialization and data transformation built in. So you can use Pydantic to check that your data is valid, transform it into the shapes you need, and then serialize the results so they can be passed on to other applications.
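To make those three jobs concrete, here is a minimal sketch using a hypothetical Order model (my own illustration, not from the article). The quantity arrives as the string "3", Pydantic v2's default (lax) validation converts it to an int, and model_dump_json() serializes the result.

```python
from pydantic import BaseModel


class Order(BaseModel):  # hypothetical model, for illustration only
    item: str
    quantity: int


# validate: the incoming dict is checked against the field types
order = Order.model_validate({"item": "widget", "quantity": "3"})

# transform: in the default lax mode the numeric string "3" becomes the int 3
print(order.quantity + 1)  # 4

# serialize: produce compact JSON ready to hand on to another application
print(order.model_dump_json())  # {"item":"widget","quantity":3}
```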

A REALLY Basic example

Let's say you have a function that expects a first and last name. You need to ensure both are there and that they are strings.

from pydantic import BaseModel

class MyFirstModel(BaseModel):
    first_name: str
    last_name: str

validating = MyFirstModel(first_name="marc", last_name="nealer")

While this example is a little silly, it shows a couple of things. First off, you can see that Pydantic classes look almost the same as Python dataclasses. The second thing to note is that, unlike a dataclass, Pydantic checks that the values are strings and raises a ValidationError if they are not.

A point to note is that validating by the declared type, as shown here, is known as the default validation. Later we will discuss validation that runs before and after this point.
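To see what the default validation reports when it fails, the sketch below passes an int for last_name and inspects the errors. Pydantic v2 does not coerce numbers to strings by default, so this is rejected rather than converted.

```python
from pydantic import BaseModel, ValidationError


class MyFirstModel(BaseModel):
    first_name: str
    last_name: str


try:
    # an int is not accepted for a str field in Pydantic v2's default mode
    MyFirstModel(first_name="marc", last_name=42)
except ValidationError as exc:
    errors = exc.errors()
    for err in errors:
        # each entry records where the problem is and what kind it is
        print(err["loc"], err["type"])  # ('last_name',) string_type
```

The errors() list is handy when you need to report problems back to a caller, for example in an API response.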

Lets get a little more complicated

When it comes to optional parameters, Pydantic handles them with no problem, but the typing might not be what you expect.

from pydantic import BaseModel
from typing import Union, Optional

class MySecondModel(BaseModel):
    first_name: str
    middle_name: Union[str, None] = None  # has a default, so it doesn't have to be sent
    title: Optional[str]  # no default, so it must be sent, but it can be None
    last_name: str

In Pydantic v2, Optional[str] is just shorthand for Union[str, None]; both mean "a string or None". Whether a parameter can be left out is decided by the default, not by the type. middle_name has a default of None, so Pydantic is OK whether it is there or not. title has no default, so it is expected to be sent, even if its value is None. This might not be what you expect; I find it a little odd.
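A quick way to check how Pydantic v2 treats these annotations is to build the model with and without each field and look at the errors. In this sketch, middle_name carries a default of None and title carries none, so the two behaviours can be compared side by side.

```python
from typing import Optional, Union

from pydantic import BaseModel, ValidationError


class MySecondModel(BaseModel):
    first_name: str
    middle_name: Union[str, None] = None  # has a default, so it may be omitted
    title: Optional[str]                  # no default, so it is required (None allowed)
    last_name: str


# title is sent explicitly as None; middle_name is left out entirely
m = MySecondModel(first_name="marc", title=None, last_name="nealer")
print(m.middle_name, m.title)  # None None

try:
    MySecondModel(first_name="marc", last_name="nealer")  # title omitted
except ValidationError as exc:
    missing = [(err["loc"], err["type"]) for err in exc.errors()]
print(missing)  # [(('title',), 'missing')]
```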

From this, you can see that we can use the types from the typing module, and Pydantic will validate against them.

from pydantic import BaseModel
from typing import Union, List, Dict
from datetime import datetime

class MyThirdModel(BaseModel):
    name: Dict[str, str]
    skills: List[str]
    holidays: List[Union[str, datetime]]
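The sketch below exercises a model like this one (note the comma in Dict[str, str]) and then breaks one list item on purpose. The dictionary keys and values are made up for illustration; the point is that Pydantic validates every element of a container and reports the exact position of a bad one.

```python
from datetime import datetime
from typing import Dict, List, Union

from pydantic import BaseModel, ValidationError


class MyThirdModel(BaseModel):
    name: Dict[str, str]
    skills: List[str]
    holidays: List[Union[str, datetime]]


m = MyThirdModel(
    name={"first": "marc", "last": "nealer"},  # hypothetical keys
    skills=["python", "pydantic"],
    holidays=["christmas", datetime(2024, 8, 1)],  # each item may be str or datetime
)
print(m.holidays)

try:
    MyThirdModel(name={"first": "marc"}, skills=["python", 3], holidays=[])
except ValidationError as exc:
    # the location points at the offending list item: ('skills', 1)
    bad_locs = [err["loc"] for err in exc.errors()]
print(bad_locs)
```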

Applying Default Values

So far we haven’t discussed what to do if values are missing.

from pydantic import BaseModel


class DefaultsModel(BaseModel):
    first_name: str = "jane"
    middle_names: list = []
    last_name: str = "doe"

This seems kinda obvious. The part that looks risky is the list. In plain Python, a mutable default like this is created once and shared between every instance (dataclasses refuse it outright and ask for a default_factory). Pydantic v2 actually guards against this by copying non-hashable defaults for each new instance, but I prefer to make the intent explicit.

To do that, we need to introduce the Field object.

from pydantic import BaseModel, Field

class DefaultsModel(BaseModel):
    first_name: str = "jane"
    middle_names: list = Field(default_factory=list)
    last_name: str = "doe"

Notice that a callable (here the list class itself) is passed to default_factory, not an instance of one. Pydantic calls it each time a model is created, so every instance gets its own new list.
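A quick check that each instance really does get its own list: appending to one model's middle_names leaves the other untouched.

```python
from pydantic import BaseModel, Field


class DefaultsModel(BaseModel):
    first_name: str = "jane"
    middle_names: list = Field(default_factory=list)  # list() is called per instance
    last_name: str = "doe"


a = DefaultsModel()
b = DefaultsModel()
a.middle_names.append("mary")

print(a.middle_names)  # ['mary']
print(b.middle_names)  # []
```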

If you have been looking at the Pydantic documentation, you will have seen the Field class used in lots of different ways. However, the more I use Pydantic, the less I use the Field object. It can do a lot of things, but it can also make life complicated. For defaults and default factories, it's the way to go. For the rest, well, you will see what I do here.

Nesting Models

I don’t often need nested Pydantic models, but I can see them being useful. Nesting is really simple:

from pydantic import BaseModel

class NameModel(BaseModel):
    first_name: str
    last_name: str
    
class UserModel(BaseModel):
    username: str
    name: NameModel
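Here is how the nested model behaves in use, with a hypothetical username. A plain nested dictionary is validated into a NameModel instance, and model_dump() turns the whole tree back into nested dictionaries.

```python
from pydantic import BaseModel


class NameModel(BaseModel):
    first_name: str
    last_name: str


class UserModel(BaseModel):
    username: str
    name: NameModel


# a plain nested dict is validated into a NameModel instance
user = UserModel.model_validate(
    {"username": "mnealer", "name": {"first_name": "marc", "last_name": "nealer"}}
)
print(type(user.name).__name__)  # NameModel
print(user.model_dump())
# {'username': 'mnealer', 'name': {'first_name': 'marc', 'last_name': 'nealer'}}
```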

Custom Validation

While the default validation through types is great, we will always need to go beyond that. Pydantic has a number of different ways that you can add your own validation routines.

Before we start looking at any of these, we need to discuss the Before and After options. As I stated above, the type validation is considered the default, so when Pydantic adds custom validation on fields, it's defined as running before or after this default.

With model validation, which we will discuss a little later, the meaning is different: before refers to validating before the object is initialized, and after is when the object has been initialized and the other validation has completed.

Field Validation

We can define validation using the Field() object, but as we get further into Pydantic, overuse of Field() makes life difficult. We can also create validators using a decorator that states the fields it is supposed to be applied to. What I prefer to use are the Annotated validators. They are neat and tidy, and easy to understand. Fellow programmers will be able to follow what you're doing with ease.

from pydantic import BaseModel, BeforeValidator
import datetime
from typing import Annotated


def stamp2date(value):
    # validators signal failure with ValueError; Pydantic wraps it in a ValidationError
    if not isinstance(value, (int, float)):
        raise ValueError("incoming date must be a timestamp")
    try:
        res = datetime.datetime.fromtimestamp(value)
    except (ValueError, OverflowError, OSError):
        raise ValueError("Time stamp appears to be invalid")
    return res


class DateModel(BaseModel):
    dob: Annotated[datetime.datetime, BeforeValidator(stamp2date)]

The example validates the data before the default validation. This is really useful, as it gives us a chance to change and reformat the data as well as validate it. In this case I’m expecting a numerical timestamp to be passed. I validate for that and then convert the timestamp to a datetime object, which is what the default validation expects.
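One detail worth checking: inside a validator function, Pydantic v2 expects you to raise ValueError (or AssertionError), not ValidationError, and it wraps that into a ValidationError for the caller. The sketch below uses a simplified stamp2date and shows both a successful conversion and the wrapped error; the timestamp value is arbitrary.

```python
import datetime
from typing import Annotated

from pydantic import BaseModel, BeforeValidator, ValidationError


def stamp2date(value):
    if not isinstance(value, (int, float)):
        # raise ValueError, not ValidationError: Pydantic does the wrapping
        raise ValueError("incoming date must be a timestamp")
    return datetime.datetime.fromtimestamp(value)


class DateModel(BaseModel):
    dob: Annotated[datetime.datetime, BeforeValidator(stamp2date)]


ok = DateModel(dob=1719014400.0)
print(ok.dob)  # a datetime in the machine's local time zone

try:
    DateModel(dob="22/06/2024")
except ValidationError as exc:
    first = exc.errors()[0]
    print(first["type"], "-", first["msg"])
    # value_error - Value error, incoming date must be a timestamp
```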

Pydantic also has AfterValidator and WrapValidator. The former runs after the default validation, and the latter works like middleware, performing actions before and after it. We can also apply multiple validators:

from pydantic import BaseModel, BeforeValidator, AfterValidator
import datetime
from typing import Annotated


def one_year(value):
    if value < datetime.datetime.today() - datetime.timedelta(days=365):
        raise ValueError("the date must be less than a year old")
    return value


def stamp2date(value):
    if not isinstance(value, (int, float)):
        raise ValueError("incoming date must be a timestamp")
    try:
        res = datetime.datetime.fromtimestamp(value)
    except (ValueError, OverflowError, OSError):
        raise ValueError("Time stamp appears to be invalid")
    return res


class DateModel(BaseModel):
    dob: Annotated[datetime.datetime, BeforeValidator(stamp2date), AfterValidator(one_year)]
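To see the chain in action, the sketch below feeds in a current timestamp (passes everything) and 1 January 2000 (converts fine in the BeforeValidator, then fails the AfterValidator). The validator functions are simplified versions that raise ValueError, which Pydantic wraps.

```python
import datetime
from typing import Annotated

from pydantic import AfterValidator, BaseModel, BeforeValidator, ValidationError


def stamp2date(value):
    if not isinstance(value, (int, float)):
        raise ValueError("incoming date must be a timestamp")
    return datetime.datetime.fromtimestamp(value)


def one_year(value):
    if value < datetime.datetime.now() - datetime.timedelta(days=365):
        raise ValueError("the date must be less than a year old")
    return value


class DateModel(BaseModel):
    # order of execution: stamp2date -> default datetime check -> one_year
    dob: Annotated[datetime.datetime, BeforeValidator(stamp2date), AfterValidator(one_year)]


recent = DateModel(dob=datetime.datetime.now().timestamp())  # passes both validators

try:
    DateModel(dob=946684800.0)  # 1 Jan 2000: converts fine, then fails one_year
except ValidationError as exc:
    old_msg = exc.errors()[0]["msg"]
print(old_msg)  # Value error, the date must be less than a year old
```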

Most of the time, I use the BeforeValidator. Transforming incoming data is a must in many use cases. AfterValidator is great when the value is of the right type but also has to meet other criteria. WrapValidator I haven’t used. I would like to hear from anyone who has, as I would like to understand the use cases for it.
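For anyone curious about WrapValidator, here is a minimal sketch with a hypothetical Score model. The wrap function receives the raw value plus a handler that runs Pydantic's own validation, so it can act before and after that step in one place. The blank-string and clamping rules are invented purely for illustration.

```python
from typing import Annotated, Any, Callable

from pydantic import BaseModel, WrapValidator


def blank_to_zero(value: Any, handler: Callable[[Any], Any]) -> int:
    # before the default validation: treat an empty string as 0 (illustrative rule)
    if value == "":
        value = 0
    # hand over to Pydantic's own int validation
    result = handler(value)
    # after the default validation: clamp negative values to 0 (illustrative rule)
    return max(result, 0)


class Score(BaseModel):  # hypothetical model, for illustration only
    points: Annotated[int, WrapValidator(blank_to_zero)]


print(Score(points="").points)   # 0
print(Score(points="7").points)  # 7
print(Score(points=-3).points)   # 0
```

Another common pattern is to wrap handler(value) in a try/except so that a failed validation falls back to a default instead of raising.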

Before we move on, here is an example where more than one type needs to be accepted, or more to the point, where a parameter is optional.

from pydantic import BaseModel, BeforeValidator, Field
import datetime
from typing import Annotated


def stamp2date(value):
    if not isinstance(value, (int, float)):
        raise ValueError("incoming date must be a timestamp")
    try:
        res = datetime.datetime.fromtimestamp(value)
    except (ValueError, OverflowError, OSError):
        raise ValueError("Time stamp appears to be invalid")
    return res


class DateModel(BaseModel):
    dob: Annotated[Annotated[datetime.datetime, BeforeValidator(stamp2date)] | None, Field(default=None)]
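The sketch below checks the three cases: field omitted, explicit None, and a timestamp. It is a variation on the model above that uses Optional[...] instead of the | None syntax (so it also runs before Python 3.10) and a plain = None default in place of Field(default=None); both swaps are my own choices.

```python
import datetime
from typing import Annotated, Optional

from pydantic import BaseModel, BeforeValidator


def stamp2date(value):
    if not isinstance(value, (int, float)):
        raise ValueError("incoming date must be a timestamp")
    return datetime.datetime.fromtimestamp(value)


class DateModel(BaseModel):
    # Optional[X] means the same as X | None
    dob: Optional[Annotated[datetime.datetime, BeforeValidator(stamp2date)]] = None


print(DateModel().dob)                 # None: omitted, so the default is used
print(DateModel(dob=None).dob)         # None: accepted by the None branch
print(DateModel(dob=946684800.0).dob)  # a datetime for 1 Jan 2000 (local time)
```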

Model Validation

Let's take a simple use case. We have three values that are all optional, but at least one of them has to be sent. Field validation only looks at each field on its own, so it's no good here. This is where model validation comes in.

from pydantic import BaseModel, model_validator
from typing import Union, Any

class AllOptionalAfterModel(BaseModel):
    param1: Union[str, None] = None
    param2: Union[str, None] = None
    param3: Union[str, None] = None
    
    @model_validator(mode="after")
    def there_must_be_one(self):
        if not (self.param1 or self.param2 or self.param3):
            raise ValueError("One parameter must be specified")
        return self

class AllOptionalBeforeModel(BaseModel):
    param1: Union[str, None] = None
    param2: Union[str, None] = None
    param3: Union[str, None] = None
    
    @model_validator(mode="before")
    @classmethod
    def there_must_be_one(cls, data: Any):
        # the raw input may be missing keys, so use .get() rather than data["..."]
        if isinstance(data, dict) and not (data.get("param1") or data.get("param2") or data.get("param3")):
            raise ValueError("One parameter must be specified")
        return data

Above are two examples. The first is an After validator. You will notice that it's marked mode="after" and that it is passed the model instance as self. This is an important distinction.

The Before validator follows a very different route. First off, you can see the model_validator decorator with mode="before", then the classmethod decorator. Important: YOU NEED TO SPECIFY BOTH, AND IN THIS ORDER.

I got some very odd error messages when I didn’t do this, so it's an important point to note.

Next you will notice that the class and the data (parameters) passed to the model are both passed to the method as arguments. Validation is done on the raw data, which usually arrives as a dictionary. The data needs to be returned at the end of the validation, which means you can also use this method to alter the data, just like the BeforeValidator.
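As a sketch of that last point, here is a before-mode model validator that reshapes the incoming data rather than just checking it. The full_name input key is a hypothetical example: the validator splits it into the two real fields, and normal input still works unchanged.

```python
from typing import Any

from pydantic import BaseModel, model_validator


class NameModel(BaseModel):
    first_name: str
    last_name: str

    @model_validator(mode="before")
    @classmethod
    def split_full_name(cls, data: Any) -> Any:
        # the raw input is usually a dict, but it is not guaranteed to be one
        if isinstance(data, dict) and "full_name" in data:  # hypothetical key
            first, _, last = data["full_name"].partition(" ")
            # build a new dict so the caller's data is left untouched
            data = {**data, "first_name": first, "last_name": last}
        return data


n = NameModel(full_name="marc nealer")
print(n.first_name, n.last_name)  # marc nealer
print(NameModel(first_name="jane", last_name="doe").last_name)  # doe
```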

Aliases

Aliases are important, especially if you're dealing with incoming data and performing transformations. We use aliases to change the names of values, or to locate values when they are not passed under the field name.

Pydantic defines two kinds of alias: validation aliases (the name of the incoming value is not the same as the field) and serialization aliases (changing the name when we serialize, or output, the data after validation).

The documentation goes into a lot of detail on defining aliases using the Field() object, but I ran into issues with this; in my experience, defining defaults and aliases together in Field() didn’t work well. We can, however, define aliases at the model level instead of at the field level.

from pydantic import AliasGenerator, BaseModel, ConfigDict


class Tree(BaseModel):
    model_config = ConfigDict(
        alias_generator=AliasGenerator(
            validation_alias=lambda field_name: field_name.upper(),
            serialization_alias=lambda field_name: field_name.title(),
        )
    )

    age: int
    height: float
    kind: str


t = Tree.model_validate({'AGE': 12, 'HEIGHT': 1.2, 'KIND': 'oak'})
print(t.model_dump(by_alias=True))
#> {'Age': 12, 'Height': 1.2, 'Kind': 'oak'}

I took this example from the documentation. It's a bit on the simple side and not really of much practical use, but it does show how the field names can be transformed. A point to note here is that if you want to serialize the model using the serialization aliases, you need to say so by passing by_alias=True to model_dump().
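Two behaviours of this Tree model are worth seeing side by side: model_dump() without by_alias uses the plain field names, and once a validation alias is set, input keyed by the plain field names is rejected by default.

```python
from pydantic import AliasGenerator, BaseModel, ConfigDict, ValidationError


class Tree(BaseModel):
    model_config = ConfigDict(
        alias_generator=AliasGenerator(
            validation_alias=lambda field_name: field_name.upper(),
            serialization_alias=lambda field_name: field_name.title(),
        )
    )

    age: int
    height: float
    kind: str


t = Tree.model_validate({"AGE": 12, "HEIGHT": 1.2, "KIND": "oak"})
print(t.model_dump())               # {'age': 12, 'height': 1.2, 'kind': 'oak'}
print(t.model_dump(by_alias=True))  # {'Age': 12, 'Height': 1.2, 'Kind': 'oak'}

try:
    # with a validation alias set, plain field names are not accepted by default
    Tree.model_validate({"age": 12, "height": 1.2, "kind": "oak"})
except ValidationError as exc:
    error_types = [err["type"] for err in exc.errors()]
print(error_types)  # ['missing', 'missing', 'missing']
```

If you need to accept both forms, ConfigDict's populate_by_name option allows validation by field name as well.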

Now let's look at some more useful examples of aliases, using the AliasChoices and AliasPath objects.

AliasChoices

It is really common for data to be sent to you where the same value arrives under different field or column names. Ask a dozen people to send a list of names with first and last names in different columns, and I bet you get different column names!!

AliasChoices allows you to define a list of incoming value names that will match a given field.

from pydantic import BaseModel, ConfigDict, AliasGenerator, AliasChoices

aliases = {
    "first_name": AliasChoices("fname", "forename", "first_name"),
    "last_name": AliasChoices("lname", "surname", "family_name", "last_name")
}


class FirstNameChoices(BaseModel):
    model_config = ConfigDict(
        alias_generator=AliasGenerator(
            validation_alias=lambda field_name: aliases.get(field_name, None)
        )
    )
    title: str
    first_name: str
    last_name: str

The code shown here allows you to define a dictionary where the key is the field name and the value is an AliasChoices object. Do note that I have included the actual field name in the list. You might be using this to transform and serialize data to be saved, and then want to read it back into the model for use. Thus the actual field name should be in the list.
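The sketch below feeds three differently shaped rows (the column names are made up) into a model built this way. All three end up as the same validated object, including the row that uses the real field names, which is exactly why those names belong in the choices.

```python
from pydantic import AliasChoices, AliasGenerator, BaseModel, ConfigDict

aliases = {
    "first_name": AliasChoices("fname", "forename", "first_name"),
    "last_name": AliasChoices("lname", "surname", "family_name", "last_name"),
}


class FirstNameChoices(BaseModel):
    model_config = ConfigDict(
        alias_generator=AliasGenerator(
            # fields missing from the dict (title) get None, i.e. no alias
            validation_alias=lambda field_name: aliases.get(field_name, None)
        )
    )
    title: str
    first_name: str
    last_name: str


# three differently shaped inputs for the same person
rows = [
    {"title": "Mr", "fname": "marc", "lname": "nealer"},
    {"title": "Mr", "forename": "marc", "surname": "nealer"},
    {"title": "Mr", "first_name": "marc", "last_name": "nealer"},
]
people = [FirstNameChoices.model_validate(row) for row in rows]
print(people[0].model_dump())  # {'title': 'Mr', 'first_name': 'marc', 'last_name': 'nealer'}
```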

AliasPath

Very often, incoming data is not flat: it comes in blobs of JSON, which are turned into dictionaries and then passed to your model. So how do we set a field from a value that sits inside a dictionary or list? Well, that’s what AliasPath does.

from pydantic import BaseModel, ConfigDict, AliasGenerator, AliasPath

aliases = {
    "first_name": AliasPath("name", "first_name"),
    "last_name": AliasPath("name",  "last_name")
}


class FirstNameChoices(BaseModel):
    model_config = ConfigDict(
        alias_generator=AliasGenerator(
            validation_alias=lambda field_name: aliases.get(field_name, None)
        )
    )
    title: str
    first_name: str
    last_name: str

obj = FirstNameChoices(**{"name":{"first_name": "marc", "last_name": "Nealer"},"title":"Master Of All"})

From the code above you can see that the first and last names are in a nested dictionary. I’ve used AliasPath to flatten the data, pulling the values out of the dictionary so all values are on the same level.
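AliasPath can also step into lists: an integer in the path is treated as an index. The payload shape below, with coordinates stored as a [lat, lon] pair, is a hypothetical example using the same alias-dictionary style.

```python
from pydantic import AliasGenerator, AliasPath, BaseModel, ConfigDict

# hypothetical payload shape: {"name": ..., "location": {"point": [lat, lon]}}
aliases = {
    "lat": AliasPath("location", "point", 0),  # integers index into lists
    "lon": AliasPath("location", "point", 1),
}


class Place(BaseModel):
    model_config = ConfigDict(
        alias_generator=AliasGenerator(
            validation_alias=lambda field_name: aliases.get(field_name, None)
        )
    )
    name: str
    lat: float
    lon: float


place = Place.model_validate({"name": "London", "location": {"point": [51.5, -0.12]}})
print(place.model_dump())  # {'name': 'London', 'lat': 51.5, 'lon': -0.12}
```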

Using AliasPath and AliasChoices

We can use both of these together, so that a field accepts either a flat key or a nested path.

from pydantic import BaseModel, ConfigDict, AliasGenerator, AliasPath, AliasChoices

aliases = {
    "first_name": AliasChoices("first_name", AliasPath("name", "first_name")),
    "last_name": AliasChoices("last_name", AliasPath("name",  "last_name"))
}


class FirstNameChoices(BaseModel):
    model_config = ConfigDict(
        alias_generator=AliasGenerator(
            validation_alias=lambda field_name: aliases.get(field_name, None)
        )
    )
    title: str
    first_name: str
    last_name: str

obj = FirstNameChoices(**{"name":{"first_name": "marc", "last_name": "Nealer"},"title":"Master Of All"})
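A quick check of the combined version: the nested input and the flat input produce the same model, and the model's own dump can be read straight back in, which is handy when you save transformed data and load it again later.

```python
from pydantic import AliasChoices, AliasGenerator, AliasPath, BaseModel, ConfigDict

aliases = {
    "first_name": AliasChoices("first_name", AliasPath("name", "first_name")),
    "last_name": AliasChoices("last_name", AliasPath("name", "last_name")),
}


class FirstNameChoices(BaseModel):
    model_config = ConfigDict(
        alias_generator=AliasGenerator(
            validation_alias=lambda field_name: aliases.get(field_name, None)
        )
    )
    title: str
    first_name: str
    last_name: str


nested = FirstNameChoices.model_validate(
    {"name": {"first_name": "marc", "last_name": "Nealer"}, "title": "Master Of All"}
)
flat = FirstNameChoices.model_validate(
    {"first_name": "marc", "last_name": "Nealer", "title": "Master Of All"}
)
print(nested.model_dump() == flat.model_dump())  # True

# round trip: the dumped (flat) data validates back into the model
again = FirstNameChoices.model_validate(nested.model_dump())
```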

Final Thoughts

Pydantic is a Mega Brilliant library, but it does suffer from having a lot of ways to do the same thing. Getting to the point of understanding and using the examples I’ve shown here took a lot of work. I hope that, using these, you can get stuck into Pydantic faster and with far less work than I had to go through.

One last thing: Pydantic and AI services. ChatGPT, Gemini, etc. give erratic answers to questions on Pydantic. It's as if they can’t decide whether it's Pydantic v1 or v2, and they just mix them up. You even get “Pydantic can’t do that” for things it can do. So it's best to avoid them when using the library.
