The workflow for a data migration in Django with South migrations is relatively simple, and fairly well-documented. If you have a model that you want to modify, you’ll want to
- define your new fields and create a schemamigration;
- create a blank migration and access the ORM dictionary to write your data migration, which moves the data from the old fields to the new; and
- remove the old fields and create another schemamigration to say goodbye to those unsalted passwords forever.
The workflow is simple enough to understand, but if you want to do anything more complicated than break your names into first_name and last_name, you’ll need some more tools. Recently, I ran into a situation where I needed to condense two entire models into a single super-model that contained all fields from both of the originals. To illustrate, I will first give a simple, silly example. If you are neither of these things, feel free to skip to the latter section in which I lay out how to write an epic-level data migration.
Silly Example: Hybridizing Animals
First, lay out the models. Ducks and beavers each get a name, a weight, a tail type, and a boolean for their bill (by default, beavers don’t have one). For simplicity’s sake, put both of these in an “animals” app, in its models.py:

    from django.db import models


    class Duck(models.Model):
        name = models.CharField(max_length=32)
        # DecimalField requires max_digits and decimal_places;
        # the values used here are illustrative.
        weight = models.DecimalField(max_digits=6, decimal_places=2)
        tail = models.CharField(default="feathered", max_length=32)
        bill = models.BooleanField(default=True)


    class Beaver(models.Model):
        name = models.CharField(max_length=32)
        weight = models.DecimalField(max_digits=6, decimal_places=2)
        tail = models.CharField(default="broad and featherless", max_length=32)
        bill = models.BooleanField(default=False)
With that taken care of, create the initial migration and apply it:

    ./manage.py schemamigration --initial animals
    ./manage.py migrate animals
Then, create some animals in the database. Registering the app in the Django admin makes creating animals easy.
Time to get hybridizing! The three steps are schemamigration, datamigration, schemamigration, so start by creating the hybrid animal class. This goes in animals/models.py with the other two. Give it the same fields as before, but do not specify defaults: the values will come from the parent animals during the data migration, and every field is required by default anyway. The name field is longer so that it can hold two 32-character names joined by a hyphen.

    class Platypus(models.Model):
        name = models.CharField(max_length=65)
        # max_digits and decimal_places are required; values are illustrative.
        weight = models.DecimalField(max_digits=6, decimal_places=2)
        tail = models.CharField(max_length=32)
        bill = models.BooleanField()
New model added; run the schemamigration:
./manage.py schemamigration animals --auto
To set up the datamigration, begin by creating an empty migration. Don’t forget to give it a name:
./manage.py datamigration animals hybridize_ducks_and_beavers
Inside the migration file, write a forwards function:
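    def forwards(self, orm):
        # Platypus is imported directly from the app rather than
        # looked up in the ORM dictionary.
        from animals.models import Platypus

        for duck in orm['animals.duck'].objects.all():
            # Assumes a beaver with the same id exists for every duck.
            beaver = orm['animals.beaver'].objects.get(id=duck.id)
            platypus = Platypus(
                name=duck.name + "-" + beaver.name,
                weight=(duck.weight + beaver.weight) / 2,
                tail=beaver.tail,
                bill=duck.bill,
            )
            # Without save(), the new platypus is never written to the database.
            platypus.save()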
A couple of things to note here:
- The script loops through every duck in the list of ducks. It matches every duck with a beaver by grabbing the beaver that has the same id as each duck. (It assumes, of course, that there is a matching beaver for each duck.)
- Since there are not currently any Platypuses registered, they do not appear in the ORM. Rather than referencing existing models – as done with ducks and beavers – the script needs to import Platypus from the animals models.py file, and create a new instance of the model each time it iterates through the loop.
The new platypuses have hyphenated names. Their weights are the average of their parents’ weights, and they get their tails and bills from their beaver and duck parents, respectively.
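To make the combination rule concrete, here is a minimal sketch in plain Python with no Django or South involved. The Animal dataclass, the hybridize function, and the sample animals are all hypothetical stand-ins for the models and rows described above. Unlike the migration, which assumes every duck has a matching beaver, this sketch skips any duck that has none.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Animal:
    """Plain stand-in for a Duck, Beaver, or Platypus row (not a Django model)."""
    id: int
    name: str
    weight: Decimal
    tail: str
    bill: bool


def hybridize(ducks, beavers):
    """Pair each duck with the beaver sharing its id and build a platypus."""
    beavers_by_id = {beaver.id: beaver for beaver in beavers}
    platypuses = []
    for duck in ducks:
        beaver = beavers_by_id.get(duck.id)
        if beaver is None:
            # No matching beaver: skip this duck instead of raising an error.
            continue
        platypuses.append(Animal(
            id=duck.id,
            name=duck.name + "-" + beaver.name,         # hyphenated name
            weight=(duck.weight + beaver.weight) / 2,   # average of the parents
            tail=beaver.tail,                           # tail from the beaver
            bill=duck.bill,                             # bill from the duck
        ))
    return platypuses


# Hypothetical sample data: two ducks, but only one beaver.
ducks = [
    Animal(1, "Donald", Decimal("1.2"), "feathered", True),
    Animal(2, "Daisy", Decimal("1.0"), "feathered", True),
]
beavers = [Animal(1, "Bucky", Decimal("20.0"), "broad and featherless", False)]

platypuses = hybridize(ducks, beavers)
for platypus in platypuses:
    print(platypus.name, platypus.weight, platypus.tail, platypus.bill)
```

Only the first duck produces a platypus here, because the second has no beaver with the same id. Whether to skip such ducks or stop the migration is a choice that depends on the data.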
The genetic experimentation is complete; all that is left is to remove the old models. In animals/models.py, delete all the duck and beaver code, and run
./manage.py schemamigration animals --auto
Once this migration is applied with ./manage.py migrate animals, the old tables are dropped, leaving only platypuses!
Serious Example: Merging Django’s auth.user Model With a Custom User Model
Django’s default user model automatically provides a variety of commonly used fields, such as username, email, password, is_staff, last_login, and so on. With the release of Django 1.5, it is now relatively simple to write a user model that encapsulates these fields as well as any other custom information that needs to be stored about the user. Prior to this, however, it was necessary to create a separate, custom table to contain any extra information, and link it via a one-to-one relationship to the auth.user table. This is the situation I was confronted with on a recent project, and when the time came to upgrade the project to Django 1.5, it made sense to combine the two user tables into one larger table to simplify storage and referencing. The procedure helped solidify my understanding of Django user models as well as South migrations, and I hope it helps you as well!
To begin, the auth_user table contained the columns: id, username, first_name, last_name, email, password, is_staff, is_active, is_superuser, last_login, and date_joined. Additionally, the auth_user model had many-to-many relationships with tables called “groups” and “user_permissions”. The custom user model was in an app called members. Thus, the members_user model contained the columns: user_ptr_id (the link to auth_user), user_type, birthdate, bio, email_prefs, hide_onboarding, cancel_state, cancel_reason, and photo. Additionally, the members_user model had three many-to-many fields: each user had favorite_comments, favorite_journal_entries, and favorite_videos.
Ultimately, I wanted all of this data to be encapsulated in a new model called “Profile” in the members app. First, I created the new Profile class in my members/models.py file. It was a duplicate of the existing members_user model, except that it also inherited from django.contrib.auth.models.AbstractUser. This is the abstract base class that the regular auth.user model itself extends, and it granted my Profile model all of the usual user fields (password, username, etc.). Then, I ran
./manage.py schemamigration members --auto
to generate the migration for the new, empty table, ready to be populated.
The tricky part is the data migration. To coerce the data into a single table, the migration must loop through every auth_user row and, for each one:
- create a new profile object,
- copy in the auth_user data,
- copy the many-to-many relationships (groups and user_permissions) from auth_user,
- copy in the members_user data, if there is any, and
- copy the many-to-many relationships from members_user.
First, create the empty data migration:
./manage.py datamigration members migrate_userdata_to_profiledata
Next, the data migration forwards function:
    from south.v2 import DataMigration


    class Migration(DataMigration):

        def forwards(self, orm):
            "Write your forwards methods here."
            # Note: Remember to use orm['appname.ModelName']
            # rather than "from appname.models..."

            # I needed to import Profile in order to create new instances of it.
            from members.models import Profile

            for authuser in orm['auth.user'].objects.all():
                # Create a new members.Profile for every existing auth.User.
                memberprofile = Profile(
                    id=authuser.id,
                    password=authuser.password,
                    last_login=authuser.last_login,
                    is_superuser=authuser.is_superuser,
                    username=authuser.username,
                    first_name=authuser.first_name,
                    last_name=authuser.last_name,
                    email=authuser.email,
                    is_staff=authuser.is_staff,
                    is_active=authuser.is_active,
                    date_joined=authuser.date_joined
                )
                # Save first so the profile row exists before any
                # many-to-many rows reference it.
                memberprofile.save()

                # Transfer the many-to-many relationships from auth_user
                for group in authuser.groups.all():
                    memberprofile.groups.add(group.id)
                for permission in authuser.user_permissions.all():
                    memberprofile.user_permissions.add(permission.id)

                try:
                    # If there is an associated members.User,
                    # add those fields to the members.Profile
                    memberuser = orm['members.user'].objects.get(user_ptr_id=authuser.id)
                    memberprofile.user_type = memberuser.user_type
                    memberprofile.birthdate = memberuser.birthdate
                    memberprofile.bio = memberuser.bio
                    memberprofile.email_prefs = memberuser.email_prefs
                    memberprofile.hide_onboarding = memberuser.hide_onboarding
                    memberprofile.cancel_state = memberuser.cancel_state
                    memberprofile.cancel_reason = memberuser.cancel_reason
                    memberprofile.photo = memberuser.photo

                    # Transfer the m2m fields from user to profile
                    for comment in memberuser.favorite_comments.all():
                        memberprofile.favorite_comments.add(comment.id)
                    for journalentry in memberuser.favorite_journal_entries.all():
                        memberprofile.favorite_journal_entries.add(journalentry.id)
                    for video in memberuser.favorite_videos.all():
                        memberprofile.favorite_videos.add(video.id)
                except orm['members.user'].DoesNotExist:
                    # No members.User for this auth.User; keep the auth fields only.
                    pass
                except Exception as e:
                    # In case there is a problem getting the related
                    # members_user model, I used pdb to diagnose the issue.
                    import pdb; pdb.set_trace()

                # All done! Save, and move on to the next user.
                memberprofile.save()
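The long runs of field-by-field assignments could also be driven by a list of field names. The following is only a sketch of that idea in plain Python, not the code I ran: copy_fields, AUTH_FIELDS, and MEMBER_FIELDS are hypothetical names, and SimpleNamespace objects stand in for the model instances so that the snippet runs without Django.

```python
from types import SimpleNamespace

# Column names taken from the auth_user and members_user tables described above.
AUTH_FIELDS = [
    "id", "password", "last_login", "is_superuser", "username",
    "first_name", "last_name", "email", "is_staff", "is_active", "date_joined",
]
MEMBER_FIELDS = [
    "user_type", "birthdate", "bio", "email_prefs",
    "hide_onboarding", "cancel_state", "cancel_reason", "photo",
]


def copy_fields(source, target, field_names):
    """Copy each named attribute from source onto target and return target."""
    for name in field_names:
        setattr(target, name, getattr(source, name))
    return target


# Stand-in objects with made-up values; in a migration these would be
# model instances fetched through the ORM.
authuser = SimpleNamespace(
    id=7, password="hashed", last_login=None, is_superuser=False,
    username="ada", first_name="Ada", last_name="Lovelace",
    email="ada@example.com", is_staff=False, is_active=True, date_joined=None,
)
memberuser = SimpleNamespace(
    user_type="member", birthdate=None, bio="", email_prefs="weekly",
    hide_onboarding=False, cancel_state="", cancel_reason="", photo="",
)

memberprofile = SimpleNamespace()
copy_fields(authuser, memberprofile, AUTH_FIELDS)
copy_fields(memberuser, memberprofile, MEMBER_FIELDS)
print(sorted(vars(memberprofile)))
```

The many-to-many relationships would still need their own loops, since they are copied row by row with add() rather than assigned as attributes.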
After performing a data migration this big, it’s important to check the actual data for consistency. Indeed, as I wrote this function, I performed the data migration, identified an error, and deleted the table data and migration many times.
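One way to make that check repeatable is to compare the old rows against the new ones field by field. The sketch below is a hypothetical helper in plain Python, not part of South or Django: find_mismatches and the sample rows are made up for illustration, and it assumes the rows have already been fetched as dictionaries keyed by column name (for example with a queryset's values() method).

```python
def find_mismatches(old_rows, new_rows, field_names):
    """Return a list of (id, field, old_value, new_value) tuples.

    A row that is missing from new_rows is reported once, as
    (id, None, None, None).
    """
    new_by_id = {row["id"]: row for row in new_rows}
    problems = []
    for old in old_rows:
        new = new_by_id.get(old["id"])
        if new is None:
            problems.append((old["id"], None, None, None))
            continue
        for name in field_names:
            if old[name] != new[name]:
                problems.append((old["id"], name, old[name], new[name]))
    return problems


# Hypothetical sample data: user 2 has a changed email, user 3 was never migrated.
old_rows = [
    {"id": 1, "username": "ada", "email": "ada@example.com"},
    {"id": 2, "username": "bob", "email": "bob@example.com"},
    {"id": 3, "username": "cy", "email": "cy@example.com"},
]
new_rows = [
    {"id": 1, "username": "ada", "email": "ada@example.com"},
    {"id": 2, "username": "bob", "email": "BOB@example.com"},
]

problems = find_mismatches(old_rows, new_rows, ["username", "email"])
for problem in problems:
    print(problem)
```

An empty result means every old row has a counterpart with matching values for the listed fields. It says nothing about the many-to-many relationships, which need a separate comparison.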
The last step was to delete the old members.user model and run
./manage.py schemamigration members --auto
Transition complete; all user data is in a single table!
Concentric Sky uses Django as one of our core technologies. With Django, we build backends for mobile applications, craft custom web applications and deploy data-driven websites.