Santa’s Helper

Naughty or nice? Using data insights to dish out Christmas pressies to sugared-up rugrats.

The Problem

Every year in the run up to Christmas we always seem to get asked for help with IT-related problems right before the silly season really kicks off. But this is the first time we’ve been asked for help by the big guy from the North Pole himself, Santa.

So what did he have in mind? ‘Automation!’, he told us. ‘My Head Elf is down with the spicy cough and he usually decides which kid gets what. Hey, I just deliver the toys. I don’t tick off the naughty or nice list’.

Luckily the other elves had already consolidated the global present wishlist so they had a good idea of what each kid wanted (and didn’t want). They cross checked this against the ‘Naughty v Nice’ master spreadsheet (no pressies for the ‘naughties’ of course) and worked out how many gifts each kid could expect based on the size of their fam.

With the boss Elf on sick leave and Santa self-medicating with shots of butterscotch schnapps, the rest of the elf crew began frantically making random toys. With no one there to guide them, stock levels were all over the shop! However, they did manage to categorise and approximate the value of every gift they rolled out – High, Medium or Low. This way they made sure the kids got a variety of pressies of varying value.

Pre-game: Information Analysis

So what do we know about Santa’s problem? And, how do we begin to solve it?

Let’s define some business rules from what we know so far. Thanks to the elves we know what the little rug rats don’t want. Let’s call these Exclusion Rules. I mean we don’t want one of our little princes/princesses to get something they told us they don’t want right? Can you imagine?

But they’ve also let us know what they do want. We’ll do our best, but there’s no guarantees, so they might not get exactly what they asked for. There might not be enough stock to go around for instance. So let’s call these Weighted Rules.

We can use these rules to allocate a score to every child/present combo. Think of this as a suitability score. This is what we call a cartesian join (all presents and all children). We use BigQuery to handle this huge volume of data.

Let’s give an Exclusion Rule a large negative score and a Weighted Rule a small positive score. In this way, we can ensure that an Exclusion Score will always trump a Weighted Score, and these negative scores can be used to make sure a child won’t receive a present in a category they’ve told us they don’t like. 

The more of the Weighted Scores a child/present combination receives, the more likely the child is to receive that present. 

We now have a score for every child and present combination. All of this can be calculated using vanilla SQL – more on this shortly. Now we have these scores, we can use them for present allocation. We do this by:

  • Sorting the child/present combinations using the score – the highest scores go first, as these represent the most suitable presents.
  • Looping through each combination to check if we have the present available:
    • If we do, give the child the present and decrement the present stock level, as well as keep count of how many presents we’ve given to the child.
    • If not, move onto the next child/present combination and continue.

Deck the Halls with Data

With all Christmas Stories (or algorithms, in this case), an example of how this works will help (Santa is a visual learner and wants to see how this all hangs together). 

We’ll use BigQuery for our sample data and to run the SQL to create the scores. We’ll then use Python running in a Jupyter Notebook. Any relational database could be used, as could any programming language that can read/write to this database. However, we are fans of BigQuery, and a Jupyter notebook is an easy and familiar way to demonstrate the allocation process. We’ll also use a small number of children and presents to demonstrate how the process works – but the algorithm can scale to much larger datasets – especially if BigQuery is your database of choice.

The categories of presents that we have available are ACTION_FIGURES, CRAFTS, BUILDING, DOLLS, EDUCATIONAL, TECHNOLOGY, PUZZLES, SOFT_TOYS, MODELS, SPORTS and  REMOTE_CONTROL.

Let’s start with the children. Each child can select the types of presents they do/don’t like. The code repository accompanying this article explains how we generate this sample data, but a snapshot of the data looks as follows:
Then we need some presents for the children. Each of these presents belongs to one or more of the specified categories, as well as having a given Value – High, Medium or Low. A sample of this data looks as follows:
Now we need every combination of children and presents using our cartesian (or CROSS) join:

Likes vs. Dislikes

Using this cartesian data, we can now apply a Dislike Score for each combination of child/ present. We do this by looking at all the categories of present the child doesn’t like, against the categories of present for a given child/present row. We count how many matches of dislikes we have for present categories – and multiply this by a large negative score. The data looks as follows:

We can see here that for child ID #1 (Adrian) and present ID # 5, there is a match on MODELS (i.e. Adrian has said he doesn’t like presents of type MODELS and present # 5 is of type MODELS. We therefore have a single match count (1) and multiply this by a large negative number (-9999999) to provide our Dislike Score.

We do exactly the same to define the Wish Score for the children’s desired presents. However, we will multiply the count by a much smaller positive number (10, say) to ensure the dislike scores (if present) are always higher than the wish scores i.e.
So, for child ID #1 and present ID #1, the child likes ACTION_FIGURES and CRAFTS, and we can see the present is of type CRAFT. Hence we have a match count of 1 and we multiply this by 10 to give us our Wish Score. For present ID #2, we have a match on ACTION_FIGURES, giving us a Wish Score of 10 again, and so on.

Process the Values & Pass the Eggnog

We can now start to process the Value allocated to each present. We use this information in two ways.

  • Give High value presents a high score, with Medium a lower score and Low the lowest score. As every parent knows, the most expensive presents are the most desirable.
  • As desirable as High value presents are, we have to take into account stock levels of all presents, as well as being fair and trying to give presents of all values to the children. We can use these factors to provide a Diversity Score for each present based on its value. The more presents we have of a given value, the higher its score – which in turn means it’s more likely to be allocated.
Both of these scores can be applied to each present – although they won’t take into account the suitability of the present for the child – this is handled by the Wish Score.

So, how to go about obtaining these Value and Diversity scores? Let’s start with the Diversity score. This is can be calculated as follows:

(Number of presents in Value / Total Number of Presents) * Arbitrary Multiplier

We can see how this data looks in our required format as follows:

For the first row, we have 9 presents of High Value out of a total of 20 presents overall. Therefore our diversity score for any presents allocated to child #1 that are of High Value is 4.5 (9/20*10)

To ensure that High Value presents are treated as being more desirable, we can simply add a score for each present value, so Low = 1, Medium = 2 and High = 3. This means that Higher Value presents will be placed further up the desirability list of presents for the child – as the score for those presents will be higher.

Finally, we want to collect all of these scores – for both Exclusion and Weighted Rules – and sum them altogether to get a final grand total score for a given child/present combination.

Note: the dislike & like categories for the children and categories for the present are also available, but have been omitted here for clarity.

We sort by the score in descending order, so the best matches are listed first. For a limited number of categories and business rules – as per this example – a lot of the values will be the same – in which case additional sort fields (i.e. the child ID and present ID here) will be required in order to make the process deterministic and repeatable. You can randomise the sort – but it can make tracking down errors hard – as it can be difficult to repeat the process that caused the error in the first place.

Allocating Presents. You shouldn’t have!

Now we progress to the allocation process using all these scores – something we’re going to do in code. Let’s start by demonstrating the process using ‘pseudo code’. We already touched on how we’d go about this earlier, but the detailed process will look like this:

				
					# For every child/present score (sorted by score in descending order):
        
        # Get the child, present and score

        # If there is a negative score (or the child has been naughty), don't allocate,
        # otherwise continue

        # Check to see if we've give out all of these type of presents (continue if not)

        # Check to see if the child has their quota of presents (continue if not)

        # If we're allocating the present, then

            # Decrement the stock level for this present
            # Decrement the number of presents the child is now entitled to

        # Keep a reference to the allocation (including non-allocations, so we can 
        # check why)
        # Go back to the beginning of the loop and repeat.
				
			
The Python code corresponding to the pseudo code above is included in the accompanying project. We use this allocation process to create a new table called allocations (predictably) that contains all the child/present combinations along with whether that child received the present. For every child/present combination, as part of the allocation process, we want to collect the following information to help us understand why a present was allocated to the child. i.e.

  • Include – a boolean flag indicating whether the present was included in the child’s Christmas Stocking.
  • Exclude Present – if a present was specifically excluded due to an exclusion business rule (a negative score) or if the child was on the naughty list.
  • Present Allocation Exceeded – this will be true if we’ve run out of this present to give to the child (as other children were luckier and received it first).
  • Child Allocation Exceeded – if the child already has all the presents they’re entitled to (every family member has been catered for).
We now have enough information not only to provide Santa with his list of presents for every kid, but also to explain to him why each child was (or was not) given a certain present – handy when replying to those naughty children’s letters that arrive in the new year asking why they missed out. The list of children and presents including (some of) the available data used to provide the allocation is as follows:

We can use this information (along with additional data that has been omitted here for clarity – such as all the business rule scores) to help us understand why a child was allocated a given present.

However, the list that Santa wants on Christmas Eve is as follows:

Where to Next? Infinity and Beyond!

This example is intentionally trivial to demonstrate how straightforward logic can be used to solve an allocation problem. There are many other considerations as the problem scales in terms of volumes of data and complexity, such as:

  • Catering for missed allocations – that is children, that didn’t receive a present (due to low stock levels, for example).
  • Dealing with the volume of data caused by the cartesian join of children/presents – both when generating the scores using SQL and executing the allocation process.
  • Using more advanced techniques, such as Machine Learning, in order to provide a more accurate score for each child/present combination. For example, the machine could interrogate the presents each child has been given in previous years and use this information as input to help with its recommendation.

The complete set of accompanying code to demonstrate this allocation process can be found at:

That’s a Wrap

Santa is relieved (Mrs Clause is happy Santa is no longer stressed and/or drunk on schnapps), and the Mondo Ventures Team are all going to bed as it’s way past their bedtime.

Merry Christmas everyone, and if you have a Christmas nightmare you’d like help with, then give Mondo a shout. We’d love to hear from you.

Ready to get stuff done?
We’d love to chat.

Drop us a note and we’ll be in touch as soon as we can be.