Categories: Project Management, Python

Getting Started with FastAPI-Users and Alembic

(Updated 2022-03-15)

FastAPI-Users is a user registration and authentication system that makes adding user accounts to your FastAPI project easier and secure-by-default. It comes with support for various ORMs, and contains all the models, dependencies, and routes you need for registration, activation, email verification, and more.

When setting up your database, you can use SQLAlchemy (or your preferred ORM), plus the provided models, to create the necessary tables very quickly–as you can see from the example in the docs, it doesn’t take much to get everything you need.

In an actively developed project, though, your database is likely to go through many changes over time. Alembic is a tool, used alongside SQLAlchemy, that helps manage database migrations.

This article will cover how to get started with FastAPI-Users and Alembic in a Poetry project. I’ll be using a SQLite database in the examples, because it’s readily available in Python, it’s a good database, and it will illustrate one of Alembic’s features.

Start by running poetry new or poetry init to create a new project.

Adding Dependencies

First of all, let’s add our dependencies to our Poetry project. For the sake of this tutorial, I’ll be pinning specific version numbers. You should consider what versions you want your project to be compatible with when adding your dependencies.

$ poetry add fastapi==0.74.0
$ poetry add fastapi-users[sqlalchemy2]==9.2.5
$ poetry add databases[sqlite]==0.5.5
$ poetry add alembic==1.7.7

Creating the FastAPI App

(Update: since I first wrote this, FastAPI-Users has made some fairly significant changes that make it much more flexible, but require a bit more setup. I’ve kept it as one file below, but I highly recommend seeing the full example in the docs, where it’s separated into different files)

In your project’s source code directory, create a file main.py and put the following code in it.

from typing import AsyncGenerator, Optional

import databases
from fastapi import Depends, FastAPI, Request
from fastapi_users import models as user_models
from fastapi_users import db as users_db
from fastapi_users import BaseUserManager, FastAPIUsers
from fastapi_users.authentication import (
    AuthenticationBackend,
    CookieTransport,
    JWTStrategy,
)
from fastapi_users.db import SQLAlchemyUserDatabase
import sqlalchemy as sa
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.ext.declarative import DeclarativeMeta, declarative_base
from sqlalchemy.orm import sessionmaker

DATABASE_URL = "sqlite+aiosqlite:///./test.db"
SECRET = "SECRET"


class User(user_models.BaseUser):
    pass


class UserCreate(user_models.BaseUserCreate):
    pass


class UserUpdate(User, user_models.BaseUserUpdate):
    pass


class UserDB(User, user_models.BaseUserDB):
    pass


database = databases.Database(DATABASE_URL)

Base: DeclarativeMeta = declarative_base()


class UserTable(Base, users_db.SQLAlchemyBaseUserTable):
    pass


engine = create_async_engine(DATABASE_URL, connect_args={"check_same_thread": False})

users = UserTable.__table__
user_db = users_db.SQLAlchemyUserDatabase(UserDB, database, users)


def get_jwt_strategy() -> JWTStrategy:
    return JWTStrategy(secret=SECRET, lifetime_seconds=3600)


auth_backend = AuthenticationBackend(
    name="jwt",
    transport=CookieTransport(),
    get_strategy=get_jwt_strategy,
)


class UserManager(BaseUserManager[UserCreate, UserDB]):
    user_db_model = UserDB
    reset_password_token_secret = SECRET
    verification_token_secret = SECRET

    async def on_after_register(self, user: UserDB, request: Optional[Request] = None):
        print(f"User {user.id} has registered.")

    async def on_after_forgot_password(
        self, user: UserDB, token: str, request: Optional[Request] = None
    ):
        print(f"User {user.id} has forgot their password. Reset token: {token}")

    async def on_after_request_verify(
        self, user: UserDB, token: str, request: Optional[Request] = None
    ):
        print(f"Verification requested for user {user.id}. Verification token: {token}")


async_session_maker = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)


async def get_async_session() -> AsyncGenerator[AsyncSession, None]:
    async with async_session_maker() as session:
        yield session


async def get_user_db(session: AsyncSession = Depends(get_async_session)):
    yield SQLAlchemyUserDatabase(UserDB, session, UserTable)


async def get_user_manager(user_db: SQLAlchemyUserDatabase = Depends(get_user_db)):
    yield UserManager(user_db)


app = FastAPI()
fastapi_users = FastAPIUsers(
    get_user_manager,
    [auth_backend],
    User,
    UserCreate,
    UserUpdate,
    UserDB,
)


@app.on_event("startup")
async def startup():
    await database.connect()


@app.on_event("shutdown")
async def shutdown():
    await database.disconnect()


app.include_router(
    fastapi_users.get_auth_router(auth_backend), prefix="/auth/jwt", tags=["auth"]
)
app.include_router(fastapi_users.get_register_router(), prefix="/auth", tags=["auth"])
app.include_router(
    fastapi_users.get_reset_password_router(),
    prefix="/auth",
    tags=["auth"],
)
app.include_router(
    fastapi_users.get_verify_router(),
    prefix="/auth",
    tags=["auth"],
)
app.include_router(fastapi_users.get_users_router(), prefix="/users", tags=["users"])

This is basically just the example given by FastAPI-Users, condensed into one file, and minus a few things, including the code that creates the database table–we’ll be using Alembic to do that. If you’re not already familiar with this, I recommend going through the configuration docs where each section of the code is explained.

You can start up the server using poetry run uvicorn projectname.main:app and make sure everything is working. To test it, I navigated to the docs in my browser (http://127.0.0.1:8000/docs). This should show all the FastAPI-Users routes and how to use them. They won’t work yet, though, since the database hasn’t been created. So, it’s time to set that up!

Initializing Alembic

In the top level directory of your project, run this command:

$ poetry run alembic init alembic

This will create some directories and files. Note that the name passed to the init command can be whatever you want: maybe you’re going to be managing two different databases, and you want to name each directory after the database that it will apply to.

For a description of the files and directories inside the new alembic directory, take a look at the tutorial.

Now, we need to edit alembic.ini to tell it to use our SQLite database. Find the line that looks like

sqlalchemy.url = driver://user:pass@localhost/dbname

and replace it with

sqlalchemy.url = sqlite:///./test.db

Now Alembic is all set up and ready to go!

Creating a Migration Script

We can use the command alembic revision to have Alembic create our first migration script. Passing the -m flag gives the script a title.

$ poetry run alembic revision -m "Create FastAPI-Users user table"

This should create a file named something like {identifier}_create_fastapi_users_user_table.py, which contains the beginnings of a migration script. All we have to do is write the actual migration.

One of the columns in the user table uses a custom type, so add this import at the top of the file: from fastapi_users_db_sqlalchemy import GUID

Now, it’s time to write the actual migration! This comes in the form of upgrade() and downgrade() functions. The downgrade() function isn’t required, but if you don’t write it, you won’t be able to revert a database to a previous version.

# op and sa are already imported in Alembic's generated template:
from alembic import op
import sqlalchemy as sa

from fastapi_users_db_sqlalchemy import GUID  # the custom type imported earlier


def upgrade():
    op.create_table( # This tells Alembic that, when upgrading, a table needs to be created.
        "user", # The name of the table.
        sa.Column("id", GUID, primary_key=True), # The column "id" uses the custom type imported earlier.
        sa.Column(
            "email", sa.String(length=320), unique=True, index=True, nullable=False
        ),
        sa.Column("hashed_password", sa.String(length=72), nullable=False),
        sa.Column("is_active", sa.Boolean, default=True, nullable=False),
        sa.Column("is_superuser", sa.Boolean, default=False, nullable=False),
        sa.Column("is_verified", sa.Boolean, default=False, nullable=False),
    )


def downgrade():
    op.drop_table("user") # If we need to downgrade the database--in this case, that means restoring the database to being empty.

Running the Migration

To update to the most recent revision, we can use this command:

$ poetry run alembic upgrade head

It’s also possible to pass a specific revision identifier to upgrade or downgrade to, or to use relative identifiers to, say, upgrade to the revision two steps ahead of the current one.
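For example, using relative identifiers:

$ poetry run alembic upgrade +2
$ poetry run alembic downgrade -1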

If the command was successful, your test.db database file should now contain a user table with all of the columns specified in the script.

Register an Account

To test that the database is now set up, let’s try creating an account.

First, run poetry install to make sure that your project is installed and ready to go. Then, go ahead and start up uvicorn again. You can use anything you want to send the data to the API, but I think the easiest way to verify everything is working is through the docs.

Go to http://127.0.0.1:8000/docs#/auth/register_register_auth_register_post and click on Try it out. Default values are supplied, but you can edit them if you want. When you’re ready, click Execute and see what your server responds with; if all is well, it should be a 201 response code.

Congratulations, you’ve now used Alembic to migrate your database from nonexistence to having a user table suitable for use with FastAPI-Users! From here, you can continue making revisions based on the needs of your project. It’s totally possible to do that by repeating this same process: create a new migration script using alembic revision, then fill in the upgrade() function, then run the migration. However, there is an alternative that many people find easier. Let’s explore that.

Using Alembic’s Autorevision

Suppose, now that you have the basic columns required by FastAPI-Users, you want to update your user table to also have a name column.

When Alembic ran our first migration, it also created its own table, alembic_version, to track which revision the database is currently at. When we make changes to our models, we can tell Alembic to compare the new models to the current database, and automatically generate a revision based on the differences.

It’s not perfect, and it isn’t intended to be: in some cases, you’ll still need to edit the output to reflect your intentions.

To get started, we need to edit env.py in the alembic directory. All you have to do is import the Base variable from main.py, which should look like from yourproject.main import Base and then assign it to the target_metadata variable. There’s already a line in env.py that looks like target_metadata = None; just change that to target_metadata = Base.metadata. For more detailed instructions, the tutorial has you covered.
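Put together, the relevant lines in env.py end up looking like this (assuming your package is named yourproject, as above):

from yourproject.main import Base  # import the declarative Base so Alembic can see the models

target_metadata = Base.metadata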

Updating the Models

Now that Alembic can see our model, it’s time to actually change the model. To add a name attribute, there are a few lines of code we need to add to main.py.

The Pydantic models User and UserCreate need to be updated. These are what FastAPI uses to establish what data the API expects to be sent, for example, when a user is created.

Under User, replace pass with the attribute name: Optional[str].

Under UserCreate, replace pass with the attribute name: str.

Now, users will be required to send a name when they make an account, and can optionally update their name by making a PATCH request to /users/me.

That takes care of the FastAPI side, but we still need to add the column to the SQLAlchemy model. In the UserTable class, replace pass with

    name = sa.Column(
        sa.String(length=100),
        server_default=sa.sql.expression.literal("No name given"),
        nullable=False,
    )

So, on the database side, the name column is going to be a string, not nullable, and with a default value of “No name given”. Adding new NOT NULL columns to an existing table is a tricky business, and having a default value may not be the right way to do it, but that’s another post for another time.

The Migration Script

To generate the migration script, run this command:

$ poetry run alembic revision --autogenerate -m "Added name column"

The new script should look like this:

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column('user', sa.Column('name', sa.String(length=100), server_default=sa.text("'No name given'"), nullable=False))
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_column('user', 'name')
    # ### end Alembic commands ###

Looks good! After running the migration with poetry run alembic upgrade head, you can run uvicorn again and check to see that the docs are updated and you’re able to add a user with a name successfully.

Batch Operations

(Update: this section was true when I first wrote it. However, as of version 3.35.0, SQLite supports DROP COLUMN. If you run the downgrade command below with the latest versions of SQLite, it will work instead of failing. I’m leaving this section for historical reasons, and because SQLite still doesn’t support other operations, so the information is still useful. You can check the docs for more information about what is supported and why)

Remember how I said that SQLite would help illustrate one of the features of Alembic? Now is that time.

To see why Batch Operations are necessary, try downgrading the latest revision.

$ poetry run alembic downgrade -1

If you got an exception like sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) near "DROP": syntax error, then you’ve discovered that SQLite has some limitations when it comes to editing tables in certain ways. One of those limitations is that you can’t drop a column.

Instead, you have to create a new table minus the column you want to drop, copy the data to the new table, drop the old table, and finally rename the new table to the old name.

That sounds like a lot of work, but we can use Alembic’s batch operations to do it for us.

Open up the “Added name column” revision and edit the downgrade() function.

def downgrade():
    with op.batch_alter_table("user", schema=None) as batch_op:
        batch_op.drop_column("name")

On SQLite, this migration will go through the whole create-copy-drop-rename procedure to drop the column. On databases that support dropping columns directly, it will just drop the column without the extra effort.

There is a lot more to batch operations than that, which you can read all about in the docs, but that covers the basic idea.

If you want, you can have Alembic output autogenerated migration scripts as batch operations by editing env.py and passing the argument render_as_batch=True to context.configure() in the run_migrations_online() function.
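That change is small; here’s roughly what it looks like, with the rest of the generated run_migrations_online() elided:

def run_migrations_online():
    # ... (engine setup generated by alembic init) ...
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            render_as_batch=True,  # emit batch operations in autogenerated scripts
        )

        with context.begin_transaction():
            context.run_migrations()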

Conclusion

In this article, we’ve learned about FastAPI-Users, Alembic, and the basics of how to use Alembic to manage your database. At this point, you should have a functional project with a database ready to accept user registrations. You should have the tools needed to start expanding on that database and upgrade it incrementally as you develop your project.

Finally, here are a few links that I think would be a good place to start learning more.
https://www.chesnok.com/daily/2013/07/02/a-practical-guide-to-using-alembic/
https://speakerdeck.com/selenamarie/alembic-and-sqlalchemy-sane-schema-management?slide=45
https://alembic.sqlalchemy.org/en/latest/cookbook.html#building-an-up-to-date-database-from-scratch
https://www.viget.com/articles/required-fields-should-be-marked-not-null/
https://stackoverflow.com/questions/3492947/insert-a-not-null-column-to-an-existing-table
https://www.red-gate.com/hub/product-learning/sql-prompt/problems-with-adding-not-null-columns-or-making-nullable-columns-not-null-ei028

Categories: Book Reviews

Digital Declutter: Halfway Reflection

As I was getting ready to write this post, I came across this quote:

“The things you learn in maturity aren’t simple things such as acquiring information and skills. You learn not to engage in self-destructive behavior. You learn not to burn up energy in anxiety. You discover how to manage your tensions. You learn that self-pity and resentment are among the most toxic of drugs. You find that the world loves talent but pays off on character.”

John Gardner

This quote sums up one of my main purposes for a digital declutter. It’s not about being a better programmer. I won’t learn how to write better code. There isn’t a programming language that I’ll suddenly know at the end. Instead, the goal is to learn how to live with the technology that’s in my life. To discover self-destructive behaviors and replace them with self-constructive ones. Sure, as a nice side-effect, that should result in having more time to learn those hard skills, but that isn’t the motivation.

Now, on to some reflection on what I’ve learned so far.

In the first few days, it was interesting to see how often I would check my phone for no reason. Even though everything I would normally turn to was gone, the habit was still hard to break. I found myself scrolling through my list of apps, looking for anything to do. I ended up having to uninstall a few more apps that I hadn’t expected to be a problem. During this time, I definitely also looked at the weather more often than normal.

After some time, that tendency started to fade. I started reading more: two books in the past week and a half or so. That’s something I want to emphasize. It wasn’t the case that I put these restrictions in place, then suddenly became super productive. Actually, for several days I’ve been far behind on my todo list. For example, I intended to write this post six days ago as an “end of the first week” reflection. Instead, it turned into an “almost halfway through” reflection.

The point is, reading those books actually felt good. Those hours I spent on my phone were, at worst, draining, and at best only mildly beneficial. I’m sure there will continue to be ups and downs in my energy to be productive. I believe that I’m learning how to recover that energy better, though, instead of throwing it away.

To wrap this up, let me just say that the feeling of freedom alone has been worth the cost of giving up some of the digital distractions I clung to. Feel free to reach out to me if you want to talk about doing your own digital declutter.

Categories: Book Reviews

Digital Declutter: Preparations

Recently, I decided to do a digital declutter. This has been on my mind ever since I read Digital Minimalism, and I’ve half-heartedly implemented some aspects of a digital declutter, but never fully committed. All the recent stress, though, has had me reaching more and more for cheap distractions: phone games, scrolling through social media, and the like.

The goal is to create some mental space to be able to examine what works best for my life. What will best help me accomplish what I want to accomplish?

Here are some actions I took before starting, to support me in this process:

  1. Talked to my wife about it. As my partner in life, and the person who knows me best, she can help me be accountable and reflect on how I’m changing.
  2. Made a list of categories of services that I spend my time on, and how I wanted to handle that. For example, for Twitter, I needed to change my password to something I can’t remember (so that I must use a password manager to access Twitter).
  3. Made a new user on my laptop dedicated to work. I set up a new work-only Protonmail email account, work-only Lastpass account, etc. By not having easy access to the passwords for distracting services, I make it more inconvenient to get on Facebook than just Ctrl+T, F, Enter. Plus, it was a good opportunity to be more mindful of the services I wanted to use. Instead of the data-hungry Gmail, for example, I went with privacy-focused Protonmail.
  4. Set up access controls through PAM to restrict my old account to be used only during certain times of the day.
  5. Removed various apps from my phone, and disabled the ones I couldn’t remove. I also made sure it would be inconvenient to get the passwords for services like Twitter on my phone.
  6. Decided on a time frame: November 28 – December 28.

During this time, I want to reflect on what high quality activities can replace my low quality ones. Instead of browsing the suggested articles that Google bombards me with, for example, perhaps it would be better to sign up for one or two high quality newsletters that cover topics I’m interested in.

So far, my preparations seem to be helping me stay focused. I expect there will be some difficulty adjusting and figuring out what works best for me, but I’m excited for the possibilities!

Categories: Python

Investigating Python Memory Usage

Alternate title: Why Is This Program Using So Much Memory and What Can I Do About It??

This post grew out of work on my speech transcriber project, which aims to transcribe longer recordings while using less memory by segmenting audio with DeepSegment. It’s still very much a work in progress.

While testing on an AWS EC2 instance with 2GB of RAM, though, it crashed with a memory error, even though it shouldn’t use nearly that much. This post is about how I diagnosed and solved the problem, and what tools are available.

Getting an Overview of the Problem

First, I narrowed down my code to something that was more easily repeatable.

from pydub import AudioSegment

segment = AudioSegment.from_file("test_audio.mp3") # Open the 57MB mp3
segment.set_frame_rate(16000) # Resample to 16 kHz (returns a new AudioSegment, discarded here)

All of the graphs below were based on running this simple test.

Now, it’s time to introduce psrecord, which is capable of measuring the CPU and RAM usage of a process. It can attach to an already-running process with the command psrecord <pid> --plot plot.png, which is useful for peeking at a long-running process.

For our purposes, though, psrecord can start the process for us and monitor it from start to finish. Just put the command to run in quotation marks in place of the pid. It’ll look like psrecord "python test_memory.py" --plot plot.png

Here’s what the resulting graph looks like:

Pydub memory usage before changes

The red line plots CPU usage (on the left) and the blue line memory usage (on the right). The peak memory usage is roughly 2,300MB. Definitely too much for my 2GB EC2 instance.

This is a good overview of the scope of the problem, and gives a baseline of CPU and time to compare to. In other words, if a change gets us below the 2GB mark on RAM, but suddenly takes longer to process, or uses more CPU, that’s something we want to be aware of.

Finding the Root of the Problem

What psrecord does not tell us is where the memory is being allocated in the program. What line(s) of code, specifically, are using up all of this memory?

That’s where Fil comes in. It produces a flamegraph, much like Py-Spy, but with memory usage instead of CPU. This will let us zoom in on the specific lines in pydub that allocate memory.

(Note that Fil’s actual output is an SVG and much easier to use)

According to Fil, the peak memory was 2,147MB and there are a number of places that memory is allocated. Our goal, then, is to look through those places and see if any of them can be removed.

Diving into the Pydub Source

To do that, we’re going to have to dig into the source code and try to understand the flow of data. The following samples come from this file in the pydub repository.

def from_file(cls, file, format=None, codec=None, parameters=None, **kwargs):
    ...  # Open the file and convert it to the WAV format
    p_out = bytearray(p_out)  # Cast to bytearray to make it mutable
    fix_wav_headers(p_out)  # Mutate the WAV data to fix the headers
    obj = cls._from_safe_wav(BytesIO(p_out))  # Create the AudioSegment


def _from_safe_wav(cls, file):
    file, close_file = _fd_or_path_or_tempfile(file, 'rb', tempfile=False)
    file.seek(0)
    obj = cls(data=file)
    if close_file:
        file.close()
    return obj


def __init__(self, data=None, *args, **kwargs):
    ...
    else:
        # normal construction
        try:
            data = data if isinstance(data, (basestring, bytes)) else data.read()
    ...
        wav_data = read_wav_audio(data)


def read_wav_audio(data, headers=None):
    ...  # Read the headers to get various metadata to store in the WavData
    return WavData(audio_format, channels, sample_rate, bits_per_sample,
                   data[pos:pos + data_hdr.size])

When opening a file using AudioSegment.from_file, the flow is basically:

  1. Open the file and convert it to WAV.
  2. Cast the bytes to a bytearray, then mutate that bytearray to fix the wav headers.
  3. Cast the bytearray to BytesIO, then use AudioSegment._from_safe_wav to create the instance of AudioSegment.
  4. _from_safe_wav makes sure the file is opened and at the beginning of the file, before constructing the AudioSegment using the data.
  5. __init__ reads the data from the BytesIO object.
  6. The data is passed to read_wav_audio to get headers extracted, so the raw data being operated on is only the audio data.
  7. read_wav_audio extracts the headers and returns them as part of a WavData object, along with the raw audio data. It cuts off the headers by slicing the bytes.

As Fil showed, there are several copies of the data being passed around. Some can’t really be avoided. For example, slicing bytes is going to make a copy.

The Solution

It took quite a bit of experimenting to arrive at the solution. I started by using a memoryview, which would allow the last step (slicing the data) to avoid making a copy. That worked for my use case, but it broke a number of other functions, so it wasn’t acceptable as a contribution.
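To illustrate the difference (a standalone sketch, not pydub’s actual code):

# Slicing bytes copies the data; slicing a memoryview does not.
data = bytes(100_000_000)  # stand-in for ~100MB of WAV data

copied = data[44:]            # allocates a second ~100MB bytes object
view = memoryview(data)[44:]  # a view into the same buffer, no copy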

My next try used a bytearray, which again allowed me to cut off the headers without making a big copy. This got closer (at least, most things didn’t break), but it did break Python 2.7 support. More importantly, it made AudioSegments mutable.

Finally, I realized that I was focusing on the wrong end of the stack. The last operation is naturally what drew my attention first–since it showed up as the cause of the exception when my program ran out of memory. However, there’s a much easier place to reduce copying earlier in the call stack.

Here’s how I changed from_file:

p_out = bytes(p_out)  # back to immutable bytes
obj = cls(p_out)  # construct the AudioSegment directly

Yes, all that happened is I replaced the casting to BytesIO and the call to _from_safe_wav with casting back to bytes, then instantiating the class directly. If you look back at it, this is exactly what _from_safe_wav did. It just had several layers of indirection: wrapping the bytes in BytesIO, then reading them back later.

So, was that small change worth it? Let’s see what Fil says about it now.

I would say that a ~900MB savings in RAM is worthwhile!

And for completeness, here’s the psrecord graph:

Pydub memory usage after changes

As might be expected, removing things only made it faster. Memory usage peaks lower, and the whole program runs much faster. A lot of the run time seems to have been just copying data around, so that makes sense.

Lessons Learned

First, keep looking until you find the right tools for the job. When I first set out to understand the memory usage, the tools I reached for were designed more for finding memory leaks, which is a different category of memory problem. Finding the right tools helped me get to the solution much more easily.

Second, slow down and think through the options. My initial efforts focused on only one portion of the possible locations that memory usage could be reduced, which ended up being the wrong place to focus.

On the other hand, don’t let analysis paralysis win. Even if it’s not clear where the solution might end up being, jumping in and experimenting can give you a better idea of what might work.

Third, don’t be afraid to explore whether an open source library could be better for your use case! For small files, the overhead of making all those copies is not so significant, so not as many people have likely looked into improving memory usage. Taking the time to explore the issue allowed me to make a contribution.

Thanks for reading! I would appreciate your feedback, or to hear about a tricky memory problem that you debugged.

Categories: Project Management

On Writing Good Documentation

Recently, I’ve been working on a project, mostly for the purpose of learning how to set up a modern Python project using the best tools currently available. Most of it is based on the Hypermodern Python series of articles, supplemented by my own preferences and research.

At this point in the project, I’m working on writing the docs, so I want to clarify my thoughts on what should go into them to make them as useful as possible.

It boils down to answering questions that people are likely to have. Good docs make it easy to find the answers you’re looking for. Generally, that means being concise, understandable, and organized.

Concise

The definition of concise is “brief, yet including all important information.” Saying what needs to be said in few words is a skill that can be learned over time, but it requires effort. Keeping it short makes scanning documentation to find what you need easier.

The “including all important information” is there for a reason, though. Thorough documentation shouldn’t be sacrificed for anything!

Resources for learning to write concisely are everywhere, so it may be worth investing some time in learning about this in a more structured way. Other than that, I think a process of editing previous writing is a good way to hone this skill.

Understandable

In some ways, this is the opposite of concise. You can usually pack more information into fewer words by using difficult-to-understand words or obscure grammar. There’s a balance to find here.

There are some tools that can help determine how difficult your writing is to read. It’s probably not a good idea to rely on them too much, but they can give a general idea. For example, this blog post has “an average grade level of about 8,” so it should be pretty easy to understand.
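For instance, the textstat package (one such tool; any readability scorer works) can estimate the grade level of a draft:

import textstat

with open("draft.md") as f:  # hypothetical draft file
    text = f.read()

print(textstat.flesch_kincaid_grade(text))  # approximate U.S. grade level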

Along with finding the right words, having examples can make things a lot clearer, when applicable. They show concretely how to do what your words are describing.

Organized

If the answer to your question is buried somewhere that isn’t obvious, or it’s split between multiple sections, it’s going to be harder to find. My plan is to think about what questions are likely to be asked, then try to group sections based on that.

Initially, this should be easy enough. Most questions are likely to be along the lines of “How do I use X to do Y?” That will likely be in the docstring for X, for common or simple Y. Or maybe a page in the docs dedicated to Y, if it’s more complex.

Later, though, there may be questions like “How do I contribute?” or “Why was X decision made instead of Y?” Or any number of questions that don’t belong in a docstring.

Conclusion

There’s a lot that goes into writing good documentation. These are just some general principles, which really apply to all technical writing. I think getting them written down has helped me consider how they apply to my project. Anyway, if you manage to have concise, understandable, and organized docs, you’re already doing well.

What else do you think goes into writing good documentation?

Categories: Python

Xonsh Alias: Setup a Python Project

Recently, I was reading The Pragmatic Programmer, and the section on continual learning, especially learning how to use new tools, really stuck with me.

With that in mind, I turned my attention to ways to improve my process for starting up a new project, which is something I do fairly often to experiment.

There are several aspects of setting up a new project, and managing all of them manually can be repetitive, error-prone, and difficult. Some of the ones I wanted to take care of using tools include:

  • Structuring the project directory structure
  • Creating a virtual environment and activating it every time
  • Managing dependencies

The tools I decided to use are poetry and vox/autovox. Poetry takes care of a lot of project management issues, while vox allows me to use virtualenvs that play well with Xonsh. In the future, I’d also like to explore using cookiecutter for templates.

I tied all of these tools–of course, plus git–together into a Xonsh alias. If you’re not familiar with that, check out my introduction to Xonsh and my article about using an alias to filter Mut.py results.

xontrib load vox autovox

from pathlib import Path


def _create_project(args):
    project_name = args[0]
    # Let vox manage virtualenvs instead of poetry
    poetry config virtualenvs.create false
    poetry new @(project_name)
    cd @(project_name)

    # Name the environment after the project path, minus the leading "/"
    env_name = str(Path.cwd())[1:]
    print("Removing previous virtualenv, if it exists (KeyError means that it did not exist)")
    vox deactivate
    vox remove @(env_name)

    # Use the interpreter that pyenv has selected for this directory
    pyenv_path = Path($(pyenv root).strip()) / "versions"
    interpreter = $(pyenv version-name).split(":")[0]
    interpreter_path = str(pyenv_path / interpreter / "bin" / "python")
    print("Using Python " + interpreter + " at " + str(interpreter_path))

    vox new @(env_name) --interpreter @(interpreter_path)
    git init
    # Replace the pyproject.toml made by poetry new with an interactive one
    rm pyproject.toml
    poetry init

aliases["create_project"] = _create_project


@events.autovox_policy
def auto_based_on_dir(path, **_):
    venv = Path($HOME + "/.virtualenvs" + str(path))
    if venv.exists():
        return venv

The usage is simple: $ create_project project_name will use poetry to create a new project directory project_name, then create an environment, initialize the git repository, remove the pyproject.toml made by poetry new, and finally run poetry init to interactively create a new pyproject.toml.

Most of this is pretty simple, but it takes care of several steps at once with one command, which allows me to jump right in to coding when I have a new idea or want to experiment with something.

The most complicated part is the creation of the virtual environment and registering an autovox policy to automatically activate the environment. Vox creates all virtual environments in ~/.virtualenvs. So, for example, if I start a project in /home/harrison/project_name, then a virtual environment gets created at ~/.virtualenvs/home/harrison/project_name.

The auto_based_on_dir function gets registered with autovox and controls activating the proper environment based on the directory I’m working in. It does this by checking whether a virtual environment corresponding to the current path exists, and returning the path to it if it does.

Conclusion

I’m excited to continue to improve the tools I use in my projects. In particular, poetry seems like a good way to manage and publish projects to the PyPI. It only took a little bit of time to put this together, and I expect it will result in a lot of good.

Switching to vox and using autovox to activate virtual environments should also save a lot of time. In the past, I’ve used pyenv virtualenv and manually activated environments as needed.

What tools do you use as part of your workflow?

Categories: Python

Advanced Python Data Classes: Custom Tools

Python’s dataclasses module, added in 3.7, is a great way to create classes designed to hold data. Although they don’t do anything that a regular class couldn’t do, they take out a lot of boilerplate code and let you focus on the data.

If you aren’t already familiar with dataclasses, check out the docs. There are also plenty of great tutorials covering their features.

In this tutorial, we’re going to look at a way to write tools that extend dataclasses.

Let’s start with a simple dataclass that holds a UUID, username, and email address of a user.

from dataclasses import dataclass, field
import uuid


@dataclass
class UserData:
    username: str
    email: str
    _id: uuid.UUID = field(default_factory=uuid.uuid4)


if __name__ == "__main__":
    username = input("Enter username: ")
    email = input("Enter your email address: ")

    data = UserData(username, email)
    print(data)

This is pretty simple. Ask the user for a username and an email address, then show them the shiny new data class instance that we made using their information. The class will, by default, generate a unique id for every user.

But what if we have sneaky users who might try giving an invalid email address, just to break things?

It’s simple enough to extend data classes to support field validation. dataclass is just a decorator that takes a class and adds various methods and attributes to it, so let’s make our own decorator that does the same thing.

def validated_dataclass(cls):
    cls.__post_init__ = lambda self: print("Initializing!")
    cls = dataclass(cls)
    return cls

@validated_dataclass
class UserData:
...

Here, we add a simple __post_init__ method to the class, which will be called by the data class every time we instantiate the class. But how can we use this power to validate an email address?

This is where the metadata argument of a field comes in. Basically, it’s a dict that we can set when defining a field in the data class. It’s completely ignored by the regular dataclass implementation, so we can use it to include information about the field for our own purposes.

Here’s how UserData looks after adding a validator for the email field.

from dataclasses import dataclass, field
import uuid

def validate_email(value):
    if "@" not in value:
        raise ValueError("There must be an '@' in your email!")
    
    return value


@validated_dataclass
class UserData:
    username: str
    email: str = field(metadata={"validator": validate_email})
    _id: uuid.UUID = field(default_factory=uuid.uuid4)

Now the email field of the data class will carry around that validator function, so that anyone can access it. Let’s update the decorator to make use of it.

from dataclasses import dataclass, field, fields

def validated_dataclass(cls):
    cls = dataclass(cls)

    def _set_attribute(self, attr, value):
        # Check the field's metadata for a validator before setting the value
        for field in fields(self):
            if field.name == attr and "validator" in field.metadata:
                value = field.metadata["validator"](value)
                break

        object.__setattr__(self, attr, value)

    cls.__setattr__ = _set_attribute
    return cls

The new decorator replaces the regular __setattr__ with a function that first looks at the metadata of the fields. If there is a validator function associated with the attribute, it calls the function and uses its return value as the value to set.

The power of this approach is that now anybody can validate fields on their data classes by importing this decorator and defining a validator function in the metadata of their field. It’s a drop-in replacement to extend any data class.
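For example, with the decorator applied to UserData as above (the values here are, of course, just illustrative):

user = UserData("alice", "alice@example.com")  # validator passes
user.email = "not-an-email"  # raises ValueError: There must be an '@' in your email!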

One downside to this is the performance cost. Even attributes that don’t need validation will run through the list of fields every time they’re set. In another article, I’ll look at how much of a cost this actually is, and explore some optimizations we can make to reduce the overhead.
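As a preview, one likely optimization is to build the validator lookup once, when the decorator runs, instead of scanning the fields on every assignment (a sketch I haven’t benchmarked yet):

def validated_dataclass(cls):
    cls = dataclass(cls)
    # Collect validators a single time, at class creation
    validators = {
        f.name: f.metadata["validator"]
        for f in fields(cls)
        if "validator" in f.metadata
    }

    def _set_attribute(self, attr, value):
        validator = validators.get(attr)
        if validator is not None:
            value = validator(value)
        object.__setattr__(self, attr, value)

    cls.__setattr__ = _set_attribute
    return cls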

Another downside is the potential lack of readability of setting metadata on every field. If that becomes a problem, you could try defining the metadata dict elsewhere, so the field would look like email: str = field(metadata=email_metadata).

The possible uses of metadata are limitless! Combined with custom decorators that use dataclass behind the scenes, we can add all sorts of functionality to data classes.

For serious validation needs, it’s still most likely to be better to use something like Pydantic or Marshmallow, rather than make your own. Both of them have either built-in support for data classes, or there are other packages available to add that support.

If you have any ideas for extending data classes, let me know in the comments!

Categories: Python

Learning CPython Bytecode Instructions

Recently, I’ve been interested in learning some of the internal workings of CPython in order to more deeply understand Python and possibly contribute to it. I’ve also come across Nim and wanted to try a project using it. To accomplish two goals at once, I decided to write a toy Python VM using Nim to execute Python bytecode.

This post is where I intend to compile information about Python’s compilation step as I learn about it, as a reminder for myself and a resource for anyone else who might be curious.

What Is Python’s Compilation Step?

For a detailed overview of CPython and the steps a program goes through from source code to execution, check out this RealPython guide by Anthony Shaw. Part 3, in particular, describes the step in question here.

In short, and broadly speaking, the process goes source code -> lexing/tokenizing -> parsing to AST -> compiling to bytecode -> execution. You can compile code using Python’s built-in compile function, which results in a code object.

Here’s the JSON object that results from a code object compiled from the statement hello = 3000. This JSON contains everything needed to run the program. The most important items in this example are "code", which contains the opcodes; "consts", which is a list of constants used in the program; and "names", which is a list of variable names used in the program. This will make more sense later.

{
  "argcount": 0,
  "cellvars": [],
  "code": "d\u0000Z\u0000d\u0001S\u0000",
  "consts": [3000, null],
  "filename": "test.py",
  "firstlineno": 1,
  "flags": 64,
  "freevars": [],
  "kwonlyargcount": 0,
  "lnotab": "",
  "name": "<module>",
  "names": ["hello"],
  "nlocals": 0,
  "stacksize": 1,
  "varnames": []
}

Another helpful tool is dis, which outputs the opcodes of a compiled program.
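Using it takes just a couple of lines:

import dis

code = compile("hello = 3000", "test.py", "exec")
dis.dis(code)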

For example, the code object, according to dis, looks like this:

  1           0 LOAD_CONST               0 (3000)
              2 STORE_NAME               0 (hello)
              4 LOAD_CONST               1 (None)
              6 RETURN_VALUE

The first column is the line number. The second column is the byte offset of the instruction within the bytecode; since the code alternates [opcode, arg, opcode, arg…], it increases by two per instruction. The third column is the name of the opcode. The fourth column is the argument for the opcode, which is usually an index into a different list. Finally, the fifth column is what that index points to.

Putting it all together:

The first opcode, LOAD_CONST, loads a constant from index 0 in the consts list, which is 3000, and puts it on the stack. The next opcode, STORE_NAME, pops a value (3000) off the stack and associates it with the name at index 0 of the names list, which is hello.

The next two opcodes load the constant None and return it, which ends the module’s frame.

That’s a lot to digest, and we’ve only just managed to assign an integer to a variable!

Strategy

This is a big project, so I’m planning on doing it a bite at a time. At its current stage, my Nim VM can take the bytecode of programs that simply assign constants to variables —and the constant has to be either an integer or a string.

My strategy is to write a Python program that requires just a bit more functionality than what my VM currently implements. Then work on the VM until it can successfully run the program. Rinse and repeat.

I’ll try to write it in the best Nim that I can, but I won’t let analysis paralysis prevent me from going ahead and getting something working. Since I’m doing this to learn Nim, a lot of it probably isn’t going to be the most idiomatic, performant Nim code. That’s okay. I’ll try to fix things as I continue to learn.

When I finish implementing an opcode, or learn something useful about code objects and such like, I’ll update this article with what I’ve learned.

Credits/References

Thanks to Allison Kaptur for an excellent article describing a simple implementation of a Python VM written in Python. That pointed me to the actual VM, Byterun, which she worked on, though it was primarily written by Ned Batchelder. It has been a great help in understanding how all of this works. So, without further ado, here are my notes on terminology, code objects, and opcodes.

Definitions

  • Code Object: A collection of fields containing all the information necessary to execute a Python program.
  • Stack: A data structure holding the values that the VM is working on. You push values to the top of the stack to store them and pop values from the top of the stack to work with them.
  • Opcode: A byte that instructs the VM to take a particular action, such as pushing a value to the stack or popping the top two values from the stack and adding them together.
  • Jump: An opcode that “jumps,” or skips, to another part of the bytecode. This can be relative (jump forward 10 bytes from the current position, for example) or absolute (jump to the 23rd byte from the beginning).

Code Objects

Code objects consist of several fields containing various information about the program.

argcount: I’m not sure yet.
cellvars: I’m not sure yet.
code: A bytestring representing the opcodes and their arguments. Every other byte is an opcode, and in between each opcode is a byte for the argument.
consts: A list of constants. These could be integers, strings, None [null], etc. These are put onto the stack by the LOAD_CONST opcode.
filename: The name of the file which was compiled. When compiling from within a program, this is passed to the compile built-in function as an argument, so it could be any string, really.
firstlineno: The first line number. I’m not sure what this is for, yet. I’m assuming it has something to do with frames.
flags: I’m not sure yet.
freevars: I’m not sure yet.
kwonlyargcount: I’m not sure yet. I assume it has something to do with keyword-only arguments to functions.
lnotab: I’m not sure yet.
name: The name of the module. I assume this is related to importing.
names: A list of names of variables, which will be referenced by certain opcodes to associate variables with values.
nlocals: I’m not sure yet.
stacksize: I’m not sure yet. It’s possible that the max size of the stack is precomputed so that the data structure can be initialized to the correct size.
varnames: I’m not sure yet.

Opcodes

Note: sometimes the byte value skips ahead, to leave room for new opcodes to be inserted there in the future. In the list below, byte values where that happens are marked with an asterisk (*).

Some of the information here comes from the dis module’s documentation. Some of it comes from the opcode module’s source code. Some of it comes from what I’ve learned through experimentation.

POP_TOP (1): Pop the top value from the stack.
ROT_TWO (2): Rotate the top two values of the stack. For example: [1, 2, 3, 4] -> [1, 2, 4, 3].
ROT_THREE (3): As above, but rotate the top three values.
DUP_TOP (4): Duplicate the top stack value.
DUP_TOP_TWO (5): Duplicate the top two stack values.
ROT_FOUR (6): As ROT_TWO, but rotate the top four values.
NOP (9*): No operation. Does nothing. Used as a placeholder.
UNARY_POSITIVE (10): Unary operations take the top of the stack, apply an operation to it, then push the result back on the stack. This one applies unary plus (as in +x).
UNARY_NEGATIVE (11): Same as above, but negates the value (as in -x).
UNARY_NOT (12): Negates the top stack value logically (as in not x).
UNARY_INVERT (15*): Inverts the top stack value bitwise (as in ~x).
BINARY_MATRIX_MULTIPLY (16): Binary operations take the top two values from the stack, perform an operation on them, then push the result onto the stack. This one performs matrix multiplication (the @ syntax, new in 3.5).
INPLACE_MATRIX_MULTIPLY (17): Performs an in-place matrix multiplication.
BINARY_POWER (19*)
BINARY_MULTIPLY (20)
BINARY_MODULO (22*)
BINARY_ADD (23)
BINARY_SUBTRACT (24)
BINARY_SUBSCR (25)
BINARY_FLOOR_DIVIDE (26)
BINARY_TRUE_DIVIDE (27)
INPLACE_FLOOR_DIVIDE (28)
INPLACE_TRUE_DIVIDE (29)
GET_AITER (50*)
GET_ANEXT (51)
BEFORE_ASYNC_WITH (52)
BEGIN_FINALLY (53)
END_ASYNC_FOR (54)
INPLACE_ADD (55)
INPLACE_SUBTRACT (56)
INPLACE_MULTIPLY (57)
INPLACE_MODULO (59*)
STORE_SUBSCR (60)
DELETE_SUBSCR (61)
BINARY_LSHIFT (62)
BINARY_RSHIFT (63)
BINARY_AND (64)
BINARY_XOR (65)
BINARY_OR (66)
INPLACE_POWER (67)
GET_ITER (68)
GET_YIELD_FROM_ITER (69)
PRINT_EXPR (70)
LOAD_BUILD_CLASS (71)
YIELD_FROM (72)
GET_AWAITABLE (73)
INPLACE_LSHIFT (75*)
INPLACE_RSHIFT (76)
INPLACE_AND (77)
INPLACE_XOR (78)
INPLACE_OR (79)
WITH_CLEANUP_START (81*)
WITH_CLEANUP_FINISH (82)
RETURN_VALUE (83)
IMPORT_STAR (84)
SETUP_ANNOTATIONS (85)
YIELD_VALUE (86)
POP_BLOCK (87)
END_FINALLY (88)
POP_EXCEPT (89)
STORE_NAME (90): All opcodes from here on have arguments. This operation pops the top value from the stack and associates it with a name in the names list. The argument is the index of the name.
DELETE_NAME (91): Deletes the association between a name and a value. The argument is the index of the name.
UNPACK_SEQUENCE (92): The argument is the number of tuple items.
FOR_ITER (93): The argument is a relative jump.
UNPACK_EX (94)
STORE_ATTR (95): The argument is the index of a name in the names list.
DELETE_ATTR (96): The argument is the index of a name in the names list.
STORE_GLOBAL (97): The argument is the index of a name in the names list.
DELETE_GLOBAL (98): The argument is the index of a name in the names list.
LOAD_CONST (100*): Push a constant onto the stack. The argument is the index of a constant in the consts list.
LOAD_NAME (101): Push the value associated with a name onto the stack. The argument is the index of a name in the names list.
BUILD_TUPLE (102): Pop the top n values from the stack and build a tuple from them. The argument is the number of tuple items to pop. Push the resulting tuple onto the stack.
BUILD_LIST (103): Pop the top n values from the stack and build a list from them. The argument is the number of list items to pop. Push the resulting list onto the stack.
BUILD_SET (104): Pop the top n values from the stack and build a set from them. The argument is the number of set items to pop. Push the resulting set onto the stack.
BUILD_MAP (105): Pop the top n values from the stack and build a map (dictionary) from them. The argument is the number of dict entries. Push the resulting dict onto the stack.
LOAD_ATTR (106): The argument is the index of a name in the names list.
COMPARE_OP (107)
IMPORT_NAME (108)
IMPORT_FROM (109)
JUMP_FORWARD (110): The argument is the number of bytes to skip (a relative jump).
JUMP_IF_FALSE_OR_POP (111): The argument is the byte index to jump to (an absolute jump).
JUMP_IF_TRUE_OR_POP (112): The argument is the byte index to jump to (an absolute jump).
JUMP_ABSOLUTE (113): The argument is the byte index to jump to (an absolute jump).
POP_JUMP_IF_FALSE (114): The argument is the byte index to jump to (an absolute jump).
POP_JUMP_IF_TRUE (115): The argument is the byte index to jump to (an absolute jump).
LOAD_GLOBAL (116): The argument is the index of a name in the names list.
SETUP_FINALLY (122*): The argument is the number of bytes to jump (a relative jump).
LOAD_FAST (124*): The argument is the local variable number.
STORE_FAST (125): The argument is the local variable number.
DELETE_FAST (126): The argument is the local variable number.
RAISE_VARARGS (130*): The argument is the number of raise arguments (1, 2, or 3).
CALL_FUNCTION (131)
MAKE_FUNCTION (132)
BUILD_SLICE (133)
LOAD_CLOSURE (135*)
LOAD_DEREF (136)
STORE_DEREF (137)
DELETE_DEREF (138)
CALL_FUNCTION_KW (141*)
CALL_FUNCTION_EX (142)
SETUP_WITH (143)
LIST_APPEND (145*)
SET_ADD (146)
MAP_ADD (147)
LOAD_CLASSDEREF (148)
EXTENDED_ARG (144): Note: this one is out of order in opcode.py, so I’ve listed it out of order here, too.
BUILD_LIST_UNPACK (149)
BUILD_MAP_UNPACK (150)
BUILD_MAP_UNPACK_WITH_CALL (151)
BUILD_TUPLE_UNPACK (152)
BUILD_SET_UNPACK (153)
SETUP_ASYNC_WITH (154)
FORMAT_VALUE (155)
BUILD_CONST_KEY_MAP (156)
BUILD_STRING (157)
BUILD_TUPLE_UNPACK_WITH_CALL (158)
LOAD_METHOD (160*)
CALL_METHOD (161)
CALL_FINALLY (162)
POP_FINALLY (163)

Categories: Python

Getting Started with DeepSpeech on AWS

Recently, I’ve been working on a project using Python + DeepSpeech. In this article, I’ll share some considerations for setting this type of project up on AWS, including which instance types to look at. It took quite a bit of trial and error to figure out which one would work best!

What Is DeepSpeech?

DeepSpeech is a speech-to-text engine + model. In other words, it comes with everything you need to get started transcribing audio files to text.

It comes with Python bindings and a client, which you can use as a command line utility, or as an example of how to write your own Python program that uses DeepSpeech.
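For example, after downloading a model, transcribing a file with the client looks something like this (exact filenames vary by release):

$ pip install deepspeech
$ deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio audio.wav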

There are some limitations: the model requires WAV audio at 16,000 Hz. The client can use SoX to resample to 16,000 Hz if required, but it’s up to you to make sure the file is in the WAV format. My project uses Pydub to handle preprocessing audio files.
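That preprocessing might look something like this (a sketch; it assumes ffmpeg is available for Pydub to decode MP3s):

from pydub import AudioSegment

audio = AudioSegment.from_file("recording.mp3")
# DeepSpeech expects 16,000 Hz, mono, 16-bit WAV
audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
audio.export("recording.wav", format="wav")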

DeepSpeech on AWS?

There are a few considerations when putting a DeepSpeech project on AWS EC2. At minimum, you need the right CPU and enough memory.

The main requirement of the CPU is that it support the AVX instruction set, which rules out several instance types. Even on those that do support the instruction set, you need to make sure to use an HVM AMI in order to access it.

Beyond that, it’s helpful to note that DeepSpeech will only use one core of the CPU, so using an instance with a lot of cores will only help if transcribing multiple files in parallel.

Memory requirements are also an important consideration, and depend on the size of the audio files. If you’re only transcribing small files, it shouldn’t be an issue. Trying to work with 30-45 minute-long recordings has required some workarounds to keep the memory usage reasonable, especially in the preprocessing step.

So, what instance type am I using? Right now, it’s a t3.small. It has the right kind of CPU and enough memory to do the preprocessing and transcribe small chunks at a time. However, I would need more memory if trying to transcribe a large audio file straight through.

If I were putting this in production, though, I think I would split preprocessing and transcribing and put the former on a C5 instance and the latter on either a C5 or P3 instance, after testing to see which works best for the requirements.

After picking an instance, installation is fairly easy. Just follow the instructions and it should work fine.

So, AWS experts, did I overlook an option that would suit DeepSpeech even better? Let me know!

–Harrison

Categories: Python

Measuring Python Performance Using Py-Spy

When optimizing the performance of a program, it’s essential to test and measure what the bottlenecks are. Programmers are bad at guessing what part of a program will be the slowest. Trying to guess is likely to lead to sacrificing code readability for uncertain gains, or even losses in performance (Code Complete, 2nd Edition page 594).

In my case, I have a starry sky generator that I wanted to improve the performance of. The goal: to allow people to generate bigger images in a reasonable amount of time. So, how can we find out where improvements need to be made?

Enter Py-Spy

Py-Spy is a tool for profiling a Python program. When I was first optimizing the performance of my program, I used PyFlame, but that project is no longer maintained. Py-Spy does everything PyFlame did and more. Another nice bonus is that it isn’t limited to Linux. On top of all that, it’s also installable through pip, so it seems to be a big win!

To install it, just run pip install py-spy.

Py-Spy has a number of commands and options we can use to customize the output. For one, we can attach it to an already-running process (for example, a production web server that’s having issues we want to diagnose) using -p PID. This method will probably require you to run it as root (sudo) so it can access the memory of the other process. The method I will be using is to pass py-spy the command to start the Python program itself, which will look like py-spy [command] -- python stars.py.

Speaking of commands, there are three available: record, top, and dump. Top is an interesting one: it looks like the unix top command, but instead shows data about which functions your program is spending the most time in. Dump just prints out the current call stack, and can only be used by attaching to an already-running process. This is useful if your program is getting hung up somewhere, and you want to find out where.
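For example, dumping the call stack of a running process (with a made-up PID):

$ sudo py-spy dump --pid 12345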

For our purposes, though, the record command is the most useful. It comes with various options.

Record’s Options

$ py-spy record --help
py-spy-record 
Records stack trace information to a flamegraph, speedscope or raw file

USAGE:
    py-spy record [OPTIONS] --output <filename> --pid <pid> [python_program]...

OPTIONS:
    -p, --pid <pid>              PID of a running python program to spy on
    -o, --output <filename>      Output filename
    -f, --format <format>        Output file format [default: flamegraph]  [possible values:
                                 flamegraph, raw, speedscope]
    -d, --duration <duration>    The number of seconds to sample for [default: unlimited]
    -r, --rate <rate>            The number of samples to collect per second [default: 100]
    -s, --subprocesses           Profile subprocesses of the original process
    -F, --function               Aggregate samples by function name instead of by line number
    -g, --gil                    Only include traces that are holding on to the GIL
    -t, --threads                Show thread ids in the output
    -i, --idle                   Include stack traces for idle threads
    -n, --native                 Collect stack traces from native extensions written in Cython, C
                                 or C++
        --nonblocking            Don't pause the python process when collecting samples. Setting
                                 this option will reduce the perfomance impact of sampling, but
                                 may lead to inaccurate results
    -h, --help                   Prints help information
    -V, --version                Prints version information

ARGS:
    <python_program>...    commandline of a python program to run

There are a few options that are particularly noteworthy here. --format lets us pick between flamegraph, raw, or speedscope. We’ll be using flamegraph, but speedscope is interesting, too. You can examine a speedscope file using the webapp at https://www.speedscope.app.

--function will group the output by function, rather than by line number. Both have pros and cons. Grouping by function is helpful to get an easier-to-understand overview, while grouping by line number can help you narrow it down further.

Finally, --rate tells py-spy how many times per second to sample the program. The default is 100, but I’ve found that adjusting this either up or down can help, especially if there are a lot of small, quick functions (or lines) that add up. It doesn’t hurt to play around with this and compare the resulting flamegraphs to see which seem the most useful.
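For instance, to sample twice as often as the default (200 here is just a value to experiment with):

$ py-spy record -o profile.svg -r 200 -- python stars.py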

Now, we can generate a flamegraph of the starry sky generator. I’ll be running py-spy record -o profile.svg --function -- python stars.py on this commit, modified to generate one image with the dimensions (2000, 2000).

Reading a Flamegraph

Here it is!

[Flamegraph SVG, grouped by function. Roughly: all (3,951 samples, 100%) → <module> (stars.py) 99.4% → generate_sky (stars.py) 98.4%; beneath that, generate_sky_pixel (stars.py) takes 74.6% and putpixel (PIL/Image.py) 16.9%; generate_sky_pixel’s largest children are cast (stars.py) 24.2%, randint (random.py) 17.5%, and generate_star_pixel (stars.py) 16.9%.]

When running it yourself, you’ll get an SVG file that, when opened in your browser, is interactive: you can click on a block to zoom in on it, search by name, and hover over a block to see its full text and sample percentage.

The graph is read from top to bottom. So, in this case, all of the time was spent in the module stars.py. Underneath that is the call to generate_sky, which also basically takes up all of the time. From there, things get more interesting. A portion of the time is taken up just by generate_sky (the part that doesn’t have any blocks beneath it), most of it is taken up by generate_sky_pixel, and some is used by putpixel.

Note that this isn’t grouped by time, but by function. These functions are called one-by-one, so if it were grouped by time, it would be a tiny block for generate_sky_pixel, then a tiny block for putpixel and so on several thousand times.

Since it’s grouped by function, we can more easily compare overall how much time is spent in a particular function versus another. At a glance, we can see that much more time is spent generating the pixel vs. putting it into the image.

A lot of the time in generate_sky_pixel isn’t taken up by any sub-function, but a fairly significant amount is still used by cast and others.

Let’s get a new graph, but grouped by line number instead of function: py-spy record -o profile.svg -- python stars.py

[Flamegraph SVG, grouped by line number. Roughly: all (4,100 samples, 100%) → <module> (stars.py:145) 98.2% → generate_sky (stars.py:140) 96.5%; the hottest descendants are generate_sky_pixel at stars.py:117 (17.5%), :103 (15.2%), :110 (14.7%), and :111 (8.4%); the cast calls at stars.py:79 (11.4% and 10.7%); and putpixel at PIL/Image.py:1680 (8.7%) and :1692 (5.7%).]

There’s a lot more information in this graph. For example, it calls attention to the fact that cast is called on two different lines in generate_sky_pixel. The time spent in generate_star_pixel is pretty evenly distributed between lines 87, 88, and 89, which makes sense, because those are the three lines that call planck.

There’s one more piece of information that will be useful: the total time it takes to generate an image. The flamegraph tells us what percentage of time each function/line takes, but it isn’t meant to measure the total run time. I created a performance.py file which uses timeit to generate ten images with the dimensions (900, 900) and return the average number of seconds it took per image. In this case, it took 7.64 seconds. We can definitely do better.
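A minimal sketch of such a harness, assuming stars.py exposes a generate_sky(width, height) function (the real signature may differ):

import timeit

RUNS = 10

# Time RUNS full generations and report the per-image average.
total = timeit.timeit(
    "generate_sky(900, 900)",
    setup="from stars import generate_sky",
    number=RUNS,
)
print(f"Average seconds per image: {total / RUNS:.2f}")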

Tuning the Performance

Now that we have the information from these two graphs, as well as the run time, we can get to work making it faster. Looking again at the first flamegraph above, it seems putpixel uses up a total of ~17% of the run time. The docs specifically warn that putpixel is relatively slow, so it seems like this should be a pretty easy win.

I experimented with several methods. One was storing the data as a list of lists of tuples, converting that to a NumPy array, and feeding it to Image.fromarray; that brought the run time to 7.1 seconds, about a 7% savings. As you might imagine, this still wasn’t very good.
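A rough sketch of that approach, assuming a hypothetical generate_sky_pixel(x, y) helper that returns an (R, G, B) tuple (the real function may look different):

import numpy as np
from PIL import Image

def generate_sky(width, height):
    # Build nested Python lists of pixel tuples, then convert once.
    # generate_sky_pixel(x, y) is assumed to return an (R, G, B) tuple.
    rows = [
        [generate_sky_pixel(x, y) for x in range(width)]
        for y in range(height)
    ]
    arr = np.array(rows, dtype=np.uint8)  # shape: (height, width, 3)
    return Image.fromarray(arr, mode="RGB")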

The natural progression from there seemed to be to skip the lists altogether and start with a numpy array, filled with zeroes initially. For some reason, this was actually slower than putpixel: 7.89 seconds. I’m not a NumPy expert, so I’m not sure why this is. Perhaps mutating is a slow operation for NumPy, or maybe I was just doing it the wrong way. If someone takes a look and wants to let me know, I’d be happy to learn about this.
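That attempt looked something like the following sketch (same hypothetical generate_sky_pixel as above):

import numpy as np
from PIL import Image

def generate_sky(width, height):
    # Pre-allocate the image buffer, then mutate it pixel by pixel.
    arr = np.zeros((height, width, 3), dtype=np.uint8)
    for y in range(height):
        for x in range(width):
            arr[y, x] = generate_sky_pixel(x, y)
    return Image.fromarray(arr, mode="RGB")

One plausible explanation for the slowdown: every arr[y, x] assignment crosses the Python/NumPy boundary and pays indexing and type-conversion overhead per pixel, while appending to plain Python lists stays on Python’s fast path.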

After that, I tried building up a bytearray, extending it by the three pixels each time they were generated, then converting that to bytes and passing it to Image.frombytes(). Total run time: 6.44 seconds. That’s about a 17% savings over putpixel.
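Sketched out under the same assumptions, the winning approach looks like this:

from PIL import Image

def generate_sky(width, height):
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            # Each pixel contributes three bytes: R, G, B.
            pixels.extend(generate_sky_pixel(x, y))
    return Image.frombytes("RGB", (width, height), bytes(pixels))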

Here’s what our flamegraph looks like now that we’ve settled on a putpixel replacement (and after splitting the bytearray.extend onto its own line, so that it will show up separately):

[Flamegraph SVG after the change. Roughly: all (3,232 samples, 100%) → <module> (stars.py:161) 99.8% → generate_sky (stars.py:153) 91.8%, plus generate_sky (stars.py:154, the bytearray.extend line) 5.9%; generate_sky_pixel’s hottest lines are stars.py:126 (22.9%), :119 (18.1%), :112 (18.0%), and :120 (11.3%).]

Line 154 (bytearray.extend(pixels)) now only took up about 6% of the time. Even on a small image of 900 by 900 pixels, this resulted in a savings of over a second per image. For bigger images, this savings is in the range of several seconds.

Everything else in the program is directly related to image generation and the math and random number generation behind that, so assuming all of that is already optimal (spoiler alert: it isn’t; the cast() function was entirely unnecessary), this is about as fast as the program can get.

Conclusion

Flamegraphs and the profilers that generate them are useful tools for understanding the performance of a program. Using them, you can avoid trying to guess where bottlenecks are and potentially doing a lot of work for little gain.

For further reading, I recommend this article about the reasoning behind the creation of flamegraphs and the problem they were trying to solve. If you’re struggling to understand how to read the graph, it may help more than my explanation.

Now, go forth and conquer your performance problems!