Converting Django Tests to Use Factory Boy

I heard about Factory Boy this last summer at a Django meetup in LA and figured I’d give it a shot. I ended up using it on a small project and intended to write a post back then, but I got distracted and never finished the project. I’ve been working on another Django site the past month or two and decided today would be a good day to convert all of my partner’s code to use Factory Boy.

First off, Factory Boy is a tool for generating data in tests. Pretty much any project you have is going to need some way of doing this. There are basically 3 options that I can think of:

  1. Use fixtures – this works, but quickly becomes a pain in the ass to maintain. If you change a model, you need to modify your fixtures to keep them up to date.
  2. Use calls to your Model.objects.create() and pass in all the arguments you need. This works, but you end up repeating a lot of code.
  3. Create some sort of factory for building objects.

I’ve tried all of these and think #3 is probably the best, and Factory Boy is a lot better implemented than the solution we came up with to solve #3 at Mahalo. Fixtures are very hard to maintain, and making create() calls all over your tests leads to a lot of repeated code.

What I’m doing now is converting #2 into #3. It’s a pretty simple process. Here are the basics for the setup:

  • Install factory_boy – check out the github page for details – https://github.com/dnerdy/factory_boy
  • Set up your app – there’s no specific way you have to do this, but I like to make tests.py into a module and put all my factories in a file inside of that. Here’s the steps for that:
    • $ mkdir myapp/tests
    • $ cp myapp/tests.py myapp/tests/tests.py
    • $ echo "from .tests import *" > myapp/tests/__init__.py
    • From here you can start making your factories. I like to call my factory file factories.py ( myapp/tests/factories.py ), but you can call it whatever you want. Don’t call it factory.py though, that gave me some import issues because the package itself is called factory.
  • From here you’re good to start converting your code over.
For me, I decided to do it one model at a time. So let’s look at some code (I’m going to keep it pretty simple, but you’ll get the idea). Here’s my existing code. In this example, the view displays the products in the database, so we create two products and then check that they are displayed in this view:
import django.test as django_test
import products.models as products_models
class ProductViewTests(django_test.TestCase):
    def test_basic_view(self):
        product1 = products_models.Product.objects.create(name="Test Product",
            features="Awesome Features")
        product2 = products_models.Product.objects.create(name="Another Test Product",
            features="Even More Awesome Features")
        c = django_test.client.Client()

        response = c.get("/products/")
        self.assertContains(response, product1.name)
        self.assertContains(response, product1.features)
        self.assertContains(response, product2.name)
        self.assertContains(response, product2.features)

So, as you can see, we need a factory to create product objects. Let’s put that into our factories.py file:

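import factory
from products.models import Product
class ProductFactory(factory.django.DjangoModelFactory):
    class Meta:
        # current factory_boy releases need the model named explicitly
        # (older releases guessed it from the factory's class name)
        model = Product
    name = "Test Product"
    features = "Awesome feature set brah!"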

That’s technically all I need to start using the factory to make objects in the database (or not in the database if you don’t want to save them, but that’s up to you to figure out; hint: it’s in the docs). However, if you want to automagically make objects that don’t all have the same name, you can modify your factory a bit. Factory Boy provides some cool ways to automate things. For this we can use the Sequence object:

    name = factory.Sequence(lambda n: 'Test Product {0}'.format(n))

This will make Products with names like “Test Product 1”, “Test Product 2”, etc. (the exact starting number depends on your factory_boy version). Pretty cool, huh?
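If you’re curious what a sequence-driven factory boils down to, here’s a minimal plain-Python sketch of the idea. It is not how factory_boy is implemented; the Product class is a stand-in for the Django model, and make_product_factory is a name made up for this example.

```python
import itertools


class Product:
    """Stand-in for the Django model; a plain class so the sketch runs anywhere."""

    def __init__(self, name, features):
        self.name = name
        self.features = features


def make_product_factory():
    """Return a function that builds Products with a fresh name on each call."""
    counter = itertools.count(1)  # starting at 1 is a choice made for this sketch

    def product_factory(**overrides):
        n = next(counter)
        fields = {
            "name": "Test Product {0}".format(n),
            "features": "Awesome feature set brah!",
        }
        fields.update(overrides)  # callers can still pin any field explicitly
        return Product(**fields)

    return product_factory


product_factory = make_product_factory()
first = product_factory()
second = product_factory(features="Even More Awesome Features")
print(first.name, "|", second.name, "|", second.features)
```

The point is the same one Sequence makes: defaults live in one place, each object gets a unique value for free, and a test only spells out the fields it actually cares about.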

Ok, back to our test code. Let’s pull out the two object creation lines and replace them with our factory.

import django.test as django_test
import products.models as products_models
from products.tests.factories import ProductFactory
# One note here: I tried to import this like the others:
# import products.tests.factories as prod_fac - but it
# kept giving me an import error, so I gave up
class ProductViewTests(django_test.TestCase):
    def test_basic_view(self):
        product1 = ProductFactory()
        product2 = ProductFactory()
        c = django_test.client.Client()

        response = c.get("/products/")
        self.assertContains(response, product1.name)
        self.assertContains(response, product1.features)
        self.assertContains(response, product2.name)
        self.assertContains(response, product2.features)

There you have it. There are a lot more features to Factory Boy, such as having a model generate any other model dependencies it has, but this post is pretty long already, so I’ll save that for later. You can always check out the docs on github too, they’re good.

Fixing Postgres on Mac 10.7 Lion for Django

I’m currently in the process of downsizing my number of computers to a single laptop (or maybe two… I mean who doesn’t need at least 2 laptops these days?), so I got a used macbook pro from a friend and have been trying to make the conversion to developing on it. So far I’ve spent far more time sysadmining on the mac than I have on ubuntu in the last 2 years. Not worth it for the pretty icons, but I really just got the macbook for the touchpad and battery life, which are pretty gangbusters.

Anyways, setting up postgresql. Given that I’m not a Mac guy, I’m not 100% sure that everything I recommend here is the best way to do it, but I figured I’m not the only one who will run into these issues, so here’s my steps:

1) I set up my machine using pydanny’s snippet here: https://github.com/pydanny/pydanny-computer-setup/blob/master/mba-osx-lion.rst.

Here’s the postgres part:

$ sudo sysctl -w kern.sysv.shmall=65536
$ sudo sysctl -w kern.sysv.shmmax=16777216

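Basically, what this is doing is setting two kernel shared-memory limits: shmall caps the total amount of shared memory (counted in pages), and shmmax caps the size of a single shared memory segment (in bytes). If you are curious and want to see what the values are before changing them, you can do this: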

$ sudo sysctl kern.sysv.shmall
$ sudo sysctl kern.sysv.shmmax

Then you use brew to install postgres and initialize a database cluster to store all the info that postgres needs to run. The final command actually starts postgres running (the -D option specifies where the database you just initialized is located).

$ brew install postgresql
$ initdb /usr/local/var/postgres
$ postgres -D /usr/local/var/postgres

You probably don’t want to have to type that whole business out every time you want to reload/stop/start postgres though, so put this in your ~/.bash_profile:

export PGDATA='/usr/local/var/postgres/'

Now you can leave off the -D /path/ part. Just type pg_ctl start to start up postgresql.

Troubleshooting

So that may have worked perfectly for you, but it didn’t for me.

Issue #1: Postgres comes bundled with Lion, or rather someone decided that they would just put a box filled with shit into the OS and write postgres on it and hope no one noticed. Because of this, when you try to run psql, you’ll get a permission error. Bash is picking up the wrong version of psql (or whatever postgres binary you’re trying to run). To fix this, you need to put the brew-installed binaries first on the path. Put this in your ~/.bash_profile:

export PATH=/usr/local/bin:$PATH

Issue #2: The kern.sysv.shmall and shmmax values from above didn’t work for me. It was working for a bit, but then stopped (weird huh? I’m thinking maybe that the shmall/shmmax that I set earlier got unset somehow). I kept getting nasty messages like this:

$ pg_ctl start
server starting
$ FATAL:  could not create shared memory segment:
Cannot allocate memory 
DETAIL:  Failed system call was shmget(key=5432001, size=16498688, 03600).
HINT:  This error usually means that PostgreSQL's request for a shared 
memory segment exceeded available memory or swap space, or exceeded your
kernel's SHMALL parameter.  You can either reduce the request size or 
reconfigure the kernel with larger SHMALL.  To reduce the request 
size (currently 16498688 bytes), reduce PostgreSQL's shared_buffers
parameter (currently 1536) and/or its max_connections parameter 
(currently 104).
The PostgreSQL documentation contains more information about 
shared memory configuration.

After spending a while reading the documentation and lamenting how much easier this would be if Apple spent as much time making their development environment work as they did printing money, I finally just upped the values a lot:

$ sudo sysctl -w kern.sysv.shmmax=32768000
$ sudo sysctl -w kern.sysv.shmall=8000

One note on this: the shmall value is a value in pages (which are 1024 bytes * 4). The shmmax value (from what I could tell) has to be 4096*shmall, so I just picked a random value and now it works.
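To make the arithmetic concrete, here’s a small sketch that checks the numbers from this post against each other. It only encodes the rule of thumb above (shmmax = 4096 * shmall) and the request size from the error message; it doesn’t read or change any kernel settings, and the function name is made up for the example.

```python
PAGE_SIZE = 4096  # bytes per page, as stated above (1024 * 4)


def shmmax_for(shmall_pages, page_size=PAGE_SIZE):
    """Bytes of shared memory covered by `shmall_pages` pages."""
    return shmall_pages * page_size


requested = 16498688          # size from the shmget() call in the error message

new_shmall = 8000             # pages, the value I ended up with
new_shmmax = shmmax_for(new_shmall)

original_shmmax = 16777216    # bytes, from the setup snippet at the top

print(new_shmmax)                    # 32768000
print(requested <= new_shmmax)       # True
print(requested <= original_shmmax)  # True
```

Interestingly, the request also fits under the original shmmax. Values set with sysctl -w don’t survive a reboot, so that would be consistent with my guess that the earlier settings simply got unset.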

Issue #3: Django was giving me a connection error when trying to run syncdb (oddly enough, dbshell worked. I think this may have something to do with reading vs. writing, so it could be my fault). Here’s the last bit of traceback:

.../python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 140, in _cursor
    self.connection = Database.connect(**conn_params)
psycopg2.OperationalError: could not connect to server: Permission denied
	Is the server running locally and accepting
	connections on Unix domain socket "/var/pgsql_socket/.s.PGSQL.5432"?

Someone in #django was kind enough to point me in the direction of this fix: change the HOST value in the database dictionary in settings.py to this:

'HOST': '/tmp',

Ok, so I agree. WTF. But everything seems to be working now, so I’m just going to roll with it. Hope this helps someone out there.
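For reference, here’s roughly where that line sits in settings.py. This is a sketch: the ENGINE matches the backend in the traceback above, but NAME and USER are placeholders I made up. My best guess at why it works: a HOST that starts with a slash is treated as the directory holding the Unix socket, the brew-installed server most likely creates its socket in /tmp, and the client library was looking in /var/pgsql_socket instead.

```python
# settings.py (fragment). NAME and USER are placeholders, not real values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",  # backend from the traceback
        "NAME": "myproject",  # hypothetical database name
        "USER": "myuser",     # hypothetical role
        "HOST": "/tmp",       # directory expected to contain .s.PGSQL.5432
        "PORT": "",           # empty string means the default port
    }
}
```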

 

What’s an API? An explanation for the layman

So working as a web developer, I work with a lot of APIs. I’m also frequently asked things by non-coders about our API and the APIs of other sites. The concept of API seems to be that of some magical coding device that solves problems. However, while this is an amusing concept, it’s not entirely correct. In trying to explain the details of this to a coworker the other day, it occurred to me that I needed a good parable. So here you go:

Imagine, if you would, a grouping of small islands in the middle of an ocean. Many islands are rich in coconut trees, but one is favored by schools of fish for its shady rock outcroppings near shore. Between these islands are a number of conveyor belts, capable of holding a small amount of weight. Each island is inhabited by a single individual. The islands are too far apart for speech, and they have long since run out of anything to write on, so the individuals are left to communicate by a pre-agreed upon system. They send across pebbles on the conveyor system, with different numbers and colors corresponding to different messages. An example exchange:

1 Red rock + 2 Blue rocks = What kind of fish do you have today?

1 Red + 1 Green = 1 Cod + 1 Tuna

2 Red + 2 Green = **Bad data — not translatable** — (Should a non-agreed upon message be passed, it will not be understood)

4 Green = Message not understood. Please resend

1 Red + 2 Green = I’ll take the Cod please

2 Blue + 2 Green = Will do, please send 2 coconuts in exchange

Now this is of course a simple nonsense exchange, and it’s not a very intuitive system, but in creating a communication system which allows them to exchange information over the conveyor belt (which is our metaphor for the copper or fiber-optic lines of the internet), these men have created an API. In its simplest sense, an API is a predefined system for communication, allowing two parties to share information. While many APIs share similarities, each data provider is free to set up their API however they want, both in terms of how the requests are made and how they respond to requests.
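For the coders reading along, the islanders’ code book can be sketched as a simple lookup table. This is only an illustration of “a predefined system for communication”; the tuple layout and function names are made up for the example.

```python
# (red, green, blue) pebble counts -> agreed meaning, taken from the exchange above
CODE_BOOK = {
    (1, 0, 2): "What kind of fish do you have today?",
    (1, 1, 0): "1 Cod + 1 Tuna",
    (0, 4, 0): "Message not understood. Please resend",
    (1, 2, 0): "I'll take the Cod please",
    (0, 2, 2): "Will do, please send 2 coconuts in exchange",
}

NOT_UNDERSTOOD = (0, 4, 0)  # the pebbles you send back when you can't decode


def decode(pebbles):
    """Return the agreed meaning, or None if the message isn't in the code book."""
    return CODE_BOOK.get(pebbles)


def error_reply(pebbles):
    """Pebbles to send back if the incoming message can't be decoded, else None."""
    return NOT_UNDERSTOOD if decode(pebbles) is None else None


print(decode((1, 0, 2)))       # the opening question
print(error_reply((2, 2, 0)))  # 2 Red + 2 Green isn't in the book
```

Both sides need the same table for any of this to work, which is really all an API’s documentation is.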

As you can see, there’s nothing magical about this. In a practical sense, any data that your company or site wants to allow others to read or write can be built into an API, though some data takes longer than other data to expose in an intelligible manner. For example, a request to “give me the username and email for a user with id=300” is simpler than something like “give me the social graph of usernames connected to the user with id=300.”

Obviously, providing your data in such a manner could be a security problem, so oftentimes APIs will be locked down to only users or other sites making requests with appropriate keys, or perhaps the site providing the API will limit the number of requests per hour. There are a number of technical considerations that go into designing an API, but I’ll save you from those and other complications here.
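If you do want a peek at what “locked down” can look like, here’s a toy sketch of a key check plus an hourly request limit. Everything in it (the key, the limit, the function name) is made up for illustration; real APIs handle this in many different ways.

```python
import time

VALID_KEYS = {"demo-key-123"}  # hypothetical keys handed out to partners
MAX_REQUESTS_PER_HOUR = 3      # deliberately tiny so the limit is easy to see
WINDOW_SECONDS = 3600

_request_log = {}  # api_key -> timestamps of that key's recent requests


def check_request(api_key, now=None):
    """Return 'ok', 'invalid key', or 'rate limited' for one incoming request."""
    now = time.time() if now is None else now
    if api_key not in VALID_KEYS:
        return "invalid key"
    # Keep only the requests this key made within the last hour.
    recent = [t for t in _request_log.get(api_key, []) if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_REQUESTS_PER_HOUR:
        _request_log[api_key] = recent
        return "rate limited"
    recent.append(now)
    _request_log[api_key] = recent
    return "ok"
```

With these settings, a fourth request inside the same hour with the demo key comes back as “rate limited”, and an unknown key is turned away before anything is counted.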

Hope that helps. Feel free to post questions in the comments, and I’ll try to explain.

Mahalo 4.0 Launched

Last Tuesday morning, after 6 months of working on it, the Mahalo team launched our newest version of the site: Mahalo 4.0. I have a lot of thoughts on what went well and what could have been better during the whole process, but I’m going to save those for later. We had a nearly flawless cutover (technology-wise). In total our downtime was only about 3 hours, and for most of that it would have been unnoticeable to most visitors. Not bad for a complete rewrite of our backend storage layer and our front-end js/css/html. We’re still working on some bugs, but that was to be expected.

Check out this video to hear our development team talk about the new site:

Lessons Learned from a Coding Sprint

We spent the month of October at Mahalo pushing really hard on the tech side of things to try to get some new products built. The plan was to set aggressive goals in the last few weeks of September and then spend October trying to complete those goals for every team. If we met the goals, then each team member would get a reward. It was a great chance for the team to really show what they were capable of and bond through their mutual struggles (and, from a more stoic standpoint, to help them appreciate the times when they’re not sprinting).

Seems like a pretty simple plan, right? Well it is/was, but I learned a lot along the way.

Pre-Sprint

1) Pick a date range for the sprint as early as you can. This isn’t always easy. Suppose, for example, you are invited to launch a new feature at a conference in a month. Yet if you have a larger project and more runway, giving your team as much warning as possible is valuable. Engineers (and most people) like to be able to plan things, and even though you may be ok with them missing a weekend day or two because of previous plans, they may feel like they’re letting the team down by not being there. Don’t put people in this situation if you don’t have to.

2) Provide detailed expectations. The more accurately you can detail what is expected in the month, the better everyone can plan their time and coordinate efforts. I came up with a list of things that we wanted to get done and some rough estimates for how long each item would take. From there I pulled out a reasonable but aggressive schedule for the month. I ran this list by my team to see if they thought I’d missed anything important or if my estimates were off. This doesn’t need to be perfect, but everyone can be a little more relaxed when they know what’s expected and have had a chance to vet it for accuracy.

3) Pick a reward. We have a very motivated team at Mahalo, but even so, working extra hours can be hard on the psyche. Having something to look forward to at the end of the sprint is nice. Hopefully your team will be proud of their accomplishments and enjoy the camaraderie during the sprint, but having a physical reward is important. It’s tangible. This could be anything from a cash bonus to some time off to some sort of gift. After some discussion, we decided the best thing to do was head to a nearby shopping center and give everyone the opportunity to buy something they wanted within certain limits. (Also, as a side note, we required that everyone use part of their money to buy some sort of totem to put on their desk at work. That went over well.)

During the Sprint

During the sprint I saw my job as two-fold: getting people the things they needed to do their jobs and move the product forward (product decisions, design mocks, etc.), and keeping people happy. The first is more important up front and the second becomes more important as the sprint progresses, but both roles are active throughout. As much as possible, I tried to stay as late as my team stayed and never ask them to do things which I wouldn’t do myself. That said, as a leader, you need to take responsibility for being on top of your game and in a good mood. If you’re grumpy and tired because you stayed too late, that could have a more adverse effect than ducking out a little early one day. Don’t do it every day, but remember to take care of yourself.

On that note, people might get grumpy during the sprint. Remember to be patient and to warn others in the company, otherwise some poor guy from the sales department might end up getting snapped at for asking a reasonable question at the wrong time.

People aren’t machines, they’re fallible, and such long hours and hard thinking are going to take their toll. Hopefully the following can help:

1) Architect up front. Do as much planning and architecting as possible at the beginning of the sprint. As the sprint progressed, we all noticed a decrease in brain-power. If you can front-load the sprint with the majority of the hard thinking, you can avoid making mistakes you might regret later because of fatigue.

2) Don’t expect peak performance at the end. People are going to get tired, and output is going to falter towards the end. No matter how energetic and motivated your team is, coding 10+ hours/day, 6 days a week is going to wear them out eventually. On this note, you can help out some by keeping weekend days short if you notice people getting burned out and trying to avoid having people stay late to put out fires if possible. We also made the mistake of planning the sprint for 1 month. By the time the 4th week rolled around, I polled the team to get a feeling for how everyone was doing and nearly every developer independently said something along the lines of: “man, I was doing great up until the end of last week, but now my brain is fried.” I would definitely say that if you have the option, plan the sprint for either 2 or 3 weeks. This will mean minimal performance degradation at the end and minimal cool-down time post sprint. We’ve also had some 10-day sprints at Mahalo, and they seem to have about a 3-4 day cooldown period. After the month-long sprint, it has taken most people about 7-10 days to start to feel like they’re getting back into things again.

3) Use some common sense. Developers are, in 99% of cases, more important than due dates. If someone is getting visibly distressed or burned out, send them home early or give them a day off. This one is really a gut call. Some complaining is inevitable, so you’re going to have to recognize the difference between that and someone who is truly suffering adverse effects from the hours.

4) Keep people fed. As trivial as this seems, food is primal, and people respond to it. It’s nice to be taken care of. It saves developers time and energy to not have to take off on a weekend or late at night to go find food. If you can provide them with some, it’s win-win. This could really be a blog post unto itself.

Post Sprint

1) Recognize People. After the sprint, it’s best to let people enjoy what they’ve done. I think it’s often tempting to just accept what’s been done and move on to the next thing. As backwards as it seems, you may have to force people to show off a little bit. At Mahalo, we took an hour or so and had an all-company meeting to show off what we’d accomplished and discuss what was coming next and some of the reasons why we’d been working on what we’d been doing. I made a point to have every developer talk, even if just for 30 seconds. Everybody contributed, so everyone should get a chance to be recognized.

2) Reward People. This is as simple as following through on the plans you made earlier for rewarding employees. If you can go a bit above and beyond, that makes a difference too. Just giving them what was promised counts for a C. You want an A+. Throw in a nice dinner and some drinks or some sort of company-sponsored activity. Yay, teambuilding too!

3) Let People Relax. One last thing to remember is to be patient in the cooldown period from the sprint. Everyone’s brains are fried. They’re going to need some decompression to get themselves back up to full capacity and get rid of that caffeine addiction they acquired (true story). Cleanup work is a good thing to do during this period. There’s likely a lot of it around, and it’s menial enough that people can sort of recuperate while doing it. Another thing we encouraged at Mahalo was people leaving early. Early at Mahalo is like 6:30-7; even so, I found that having an extra hour each night was great.

One other quick note: During this period, try to refrain from saying things like: “This is awesome, good work, but why don’t we add…” This can be a really tempting trap. Just remember your suggestion in your head, write it down, and save it for a week. No one wants to hear about the next 16 tons right after they finished hauling the last 16. It’ll get done, don’t worry.

I certainly made some mistakes during the sprint, but we managed to do some stuff right. Only time will tell exactly what the outcome will be, but hopefully I can save others from learning the hard way, and save your teams from having to deal with your mistakes. :)