Gyepi Sam
2014-02-13
Survey Choice Rotation

Order bias is a potential problem with any survey. Online surveys try to counter it by reordering pages, questions or question choices. The two common reordering methods are randomization and rotation. Randomization mixes up the items so that each respondent is presented with one of the possible orderings. Most randomization algorithms are actually shuffles, most likely using the Fisher-Yates algorithm. Rotation reorders the choices for each respondent so that every choice is the first choice for an equal share of respondents. The key idea behind rotation is that each respondent sees the next rotation in the series.
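For reference, here is a minimal Fisher-Yates shuffle in Python. It is a sketch of the textbook algorithm, not any particular survey platform's code, and the choice labels are hypothetical. Python's own `random.shuffle` performs an equivalent in-place shuffle.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Return a new list with the items in uniformly random order.

    Walks the list from the end; at each position i it swaps in an element
    chosen uniformly from positions 0..i (inclusive).
    """
    result = list(items)  # copy so the caller's list is untouched
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        result[i], result[j] = result[j], result[i]
    return result


# Hypothetical choices for illustration only.
choices = ["TV", "Radio", "Newspaper", "Web", "Billboard"]
print(fisher_yates_shuffle(choices))
```

Passing a seeded `random.Random(seed)` as `rng` makes the shuffle reproducible, which is handy in tests.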

Rotation is considerably less random than randomization. For N options there are N! (N factorial) possible orderings, but only N rotations, so a question with 5 options has 5*4*3*2*1 = 120 possible orderings and only 5 rotations. In a sense, rotation ensures that every choice takes its turn in every position, so each is equally exposed to any position bias, whereas randomization aims to wash the bias out across all possible orderings.
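Worked out for the 5-option example, rotation uses only a 1/(N-1)! fraction of the possible orderings:

$$
N! \text{ orderings vs. } N \text{ rotations:}\qquad 5! = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120 \quad\text{vs.}\quad 5,
\qquad \frac{N}{N!} = \frac{1}{(N-1)!} = \frac{5}{120} = \frac{1}{24}.
$$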

Conceptually, rotation is simple to implement. Using a zero-based index where the first rotation is the original ordering, each subsequent rotation is generated by moving the first item to the end of the list and shifting the rest of the list up by one. For k from 0 to N-1, the kth rotation of list L in Python syntax is:

  L[k:N] + L[0:k]

The first k items move to the end of the list while the remaining items move up the list. There are N rotations and k uniquely determines each one. After N rotations, the list ends up in the original configuration.
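A minimal Python sketch of the slice formula above; the function name `rotation` and the single-letter choices are just for illustration. It prints all N rotations of a 5-item list.

```python
def rotation(items, k):
    """Return the k-th rotation (0 <= k < N) of items, i.e. L[k:N] + L[0:k]."""
    n = len(items)
    if not 0 <= k < n:
        raise ValueError(f"k must be in [0, {n}), got {k}")
    return items[k:] + items[:k]


choices = ["A", "B", "C", "D", "E"]
for k in range(len(choices)):
    print(k, rotation(choices, k))
```

Across the N rotations every choice appears exactly once in every position, and rotating by one step N times returns the original list.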

Rotations are, however, more difficult to implement than randomization because it is necessary to track the number of respondents who have seen a particular order, requiring some state on the server.
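To see what that state looks like, here is a naive round-robin rotator: respondent i simply gets rotation i mod N. This is a hypothetical, single-process sketch (the class name is made up; a real deployment would keep the counter in a database). Note that it counts starts, not completes.

```python
import threading


class RoundRobinRotator:
    """Hypothetical naive rotator: the i-th respondent gets rotation i mod N.

    The shared counter is the server-side state; the lock keeps two
    concurrent requests from reading the same counter value.
    """

    def __init__(self, n_rotations):
        self.n = n_rotations
        self._next = 0
        self._lock = threading.Lock()

    def assign(self):
        with self._lock:
            k = self._next % self.n
            self._next += 1
        return k


rotator = RoundRobinRotator(5)
print([rotator.assign() for _ in range(7)])  # [0, 1, 2, 3, 4, 0, 1]
```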

Furthermore, when some respondents may not complete the survey, whether through abandonment, disqualification, or some other mechanism, a rotation based only on starts can easily end up with an uneven number of completed respondents across the rotation orders. The rotation algorithm therefore has to take the number of completed responses in each rotation into account: we track rotations across completions and recycle rotations for unsuccessful respondents. A simple and effective solution is to:

Create N “buckets”, one per rotation, for tracking assignments and completions. For simplicity, these can be represented as database records. We’ll use SQL for the example, but this could easily be implemented in a key-value store with a few changes.

Let’s assume we have a table named ‘buckets’ with the following fields:

CREATE TABLE buckets (
  survey_id int NOT NULL,
  rotation  int NOT NULL,
  assigned  int DEFAULT 0,
  completed int DEFAULT 0,
  target    int DEFAULT 0,
  PRIMARY KEY (survey_id, rotation)  -- one row per (survey, rotation)
);

The first two fields are self-explanatory. The other three less so:

* 'assigned' tracks the number of respondents who are assigned to a particular ordering.

* 'completed' tracks the number that finished.

* 'target' records the number of completed respondents we want for that ordering; 0 means no target.

The table must be initialized before use. A question like ‘media_exposures’ with 5 choices that need to be rotated can be initialized with:

for i = 0 to 4:
    INSERT INTO buckets (survey_id, rotation, target)
    VALUES (ID, i, TARGET/5)

where ID and TARGET are predefined values and the target number of respondents is divided evenly across the buckets. Other splits are possible. Note that if TARGET is not a multiple of 5, integer arithmetic will round the per-bucket targets down.
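A runnable version of the initialization, sketched with Python's built-in `sqlite3` module and an in-memory database. The function name `init_buckets` is hypothetical, and spreading any remainder over the earliest rotations (so the per-bucket targets always add up to the overall target) is an assumption, one of the "other splits" mentioned above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE buckets (
    survey_id INTEGER NOT NULL,
    rotation  INTEGER NOT NULL,
    assigned  INTEGER DEFAULT 0,
    completed INTEGER DEFAULT 0,
    target    INTEGER DEFAULT 0,
    PRIMARY KEY (survey_id, rotation)
)
"""


def init_buckets(conn, survey_id, n_rotations, total_target):
    """Create one bucket per rotation, splitting total_target as evenly as possible.

    Assumption: the remainder goes to the earliest rotations, so the
    per-bucket targets sum to total_target (TARGET/5 alone would round down).
    """
    base, extra = divmod(total_target, n_rotations)
    rows = [
        (survey_id, k, base + (1 if k < extra else 0))
        for k in range(n_rotations)
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO buckets (survey_id, rotation, target) VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
init_buckets(conn, survey_id=1, n_rotations=5, total_target=102)
print(conn.execute("SELECT rotation, target FROM buckets ORDER BY rotation").fetchall())
# [(0, 21), (1, 21), (2, 20), (3, 20), (4, 20)]
```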

When a respondent is assigned to an ordering, the ‘assigned’ column of that bucket is incremented:

UPDATE buckets
SET assigned = assigned + 1
WHERE survey_id = ? AND rotation = ?

When the respondent completes the survey, the ‘completed’ column is incremented:

UPDATE buckets
SET completed = completed + 1
WHERE survey_id = ? AND rotation = ?
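The two updates wrapped as small Python helpers, again assuming SQLite via the standard `sqlite3` module. The helper names are hypothetical, and the setup at the bottom exists only so the example runs on its own.

```python
import sqlite3


def record_assignment(conn, survey_id, rotation):
    """Increment 'assigned' when a respondent is given this rotation."""
    with conn:
        conn.execute(
            "UPDATE buckets SET assigned = assigned + 1 "
            "WHERE survey_id = ? AND rotation = ?",
            (survey_id, rotation),
        )


def record_completion(conn, survey_id, rotation):
    """Increment 'completed' when a respondent with this rotation finishes."""
    with conn:
        conn.execute(
            "UPDATE buckets SET completed = completed + 1 "
            "WHERE survey_id = ? AND rotation = ?",
            (survey_id, rotation),
        )


# Minimal in-memory setup: survey 1, five rotations, target 20 each.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE buckets (survey_id INTEGER NOT NULL, rotation INTEGER NOT NULL,"
    " assigned INTEGER DEFAULT 0, completed INTEGER DEFAULT 0, target INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO buckets (survey_id, rotation, target) VALUES (?, ?, ?)",
    [(1, k, 20) for k in range(5)],
)
record_assignment(conn, 1, 0)
record_assignment(conn, 1, 0)
record_completion(conn, 1, 0)
print(conn.execute(
    "SELECT assigned, completed FROM buckets WHERE survey_id = 1 AND rotation = 0"
).fetchone())  # (2, 1)
```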

With that out of the way, we can assign a respondent to an ordering with the following SQL query:

SELECT rotation
FROM buckets
WHERE survey_id = ? AND (target = 0 OR completed < target)
ORDER BY assigned - completed, rotation
LIMIT 1

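Among the buckets that have not yet reached their target, this chooses the one with the fewest respondents assigned but not completed (assigned - completed) and, in the event of a tie, selects the earlier rotation. Because a target of 0 means no target, it also handles the case where no targets are specified. If every bucket has reached its target, the query returns no rows.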
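A sketch of the whole assignment step in Python with SQLite, running the selection query as written and then recording the assignment. The text does not say how concurrent respondents are serialized; wrapping the SELECT and the UPDATE in one `BEGIN IMMEDIATE` transaction is an assumption made here so that two requests cannot both read the same counts. The name `assign_rotation` is hypothetical.

```python
import sqlite3

PICK_ROTATION = """
SELECT rotation
FROM buckets
WHERE survey_id = ? AND (target = 0 OR completed < target)
ORDER BY assigned - completed, rotation
LIMIT 1
"""


def assign_rotation(conn, survey_id):
    """Pick a rotation and increment its 'assigned' count atomically.

    Returns None when every bucket has met its target. Assumes the
    connection is in autocommit mode (isolation_level=None).
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(PICK_ROTATION, (survey_id,)).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE buckets SET assigned = assigned + 1 "
                "WHERE survey_id = ? AND rotation = ?",
                (survey_id, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:", isolation_level=None)  # we manage transactions
conn.execute(
    "CREATE TABLE buckets (survey_id INTEGER NOT NULL, rotation INTEGER NOT NULL,"
    " assigned INTEGER DEFAULT 0, completed INTEGER DEFAULT 0, target INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO buckets (survey_id, rotation, target) VALUES (?, ?, ?)",
    [(1, k, 20) for k in range(5)],
)
print([assign_rotation(conn, 1) for _ in range(7)])  # [0, 1, 2, 3, 4, 0, 1]
```

With no completions yet, every assignment raises its bucket's assigned - completed by one, so the first respondents cycle through the rotations in order.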

The selected rotation must be stored with the respondent's data for later use. With this in place, all we need to do is look up the assigned rotation when the question is presented and order the choices accordingly.
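A minimal sketch of the presentation step, assuming a hypothetical respondent store that maps respondent ids to their assigned rotation index (a plain dict here) and hypothetical choice labels for ‘media_exposures’.

```python
# Hypothetical respondent store: respondent id -> assigned rotation index.
respondent_rotation = {"resp-001": 0, "resp-002": 3}

# Hypothetical choices for the 'media_exposures' question.
MEDIA_EXPOSURES = ["TV", "Radio", "Newspaper", "Web", "Billboard"]


def choices_for(respondent_id, choices):
    """Return the choices in the respondent's stored rotation order, L[k:] + L[:k]."""
    k = respondent_rotation[respondent_id]
    return choices[k:] + choices[:k]


print(choices_for("resp-002", MEDIA_EXPOSURES))
# ['Web', 'Billboard', 'TV', 'Radio', 'Newspaper']
```

Storing only the index k keeps the record small, and the respondent sees the same order if the page is reloaded.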

Note that while the first respondents will indeed see the question in sequential rotation order, later ones will not necessarily, because the goal is to ensure an even distribution of COMPLETED respondents across the rotations.

I have written a program in Go to model this process. It has some built-in assumptions about the various completion rates. Here’s the output from a typical run with a target of 100 completed respondents.

Starts: 400
Target Completes: 100

     ABANDONED: prob: 15%, count: 71
  DISQUALIFIED: prob: 15%, count: 65
     OVERQUOTA: prob: 20%, count: 69
     TERMINATE: prob: 25%, count: 92
      COMPLETE: prob: 25%, count: 103

    bucket   assigned  completed
         0         80         22
         1         80         21
         2         80         19
         3         80         21
         4         80         20

Actual Completes: 103

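Because it is an adaptive process, the values are slightly inexact: respondents who were assigned to a bucket before it reached its target can still complete afterwards, so the final count (103) slightly overshoots the target. Still, it’s interesting to note how closely this models real survey outcomes.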

It’s worth pointing out again that, in general, randomization is preferable to rotation. Nonetheless, when rotation is required, this is a fast and simple solution to the problem.