Tuesday, January 22, 2013

Dealing with Nulls in Veera Transform Formulas

This next post comes from Jeff Fleischer, our Director of Client Operations, support wiz, and analyst extraordinaire: 

Working out the logic of a new variable you want to create with a TRANSFORM node can be challenging. But when missing data ("nulls") get into the mix, it can be especially confusing and frustrating. For example, if you'd written the conditional formula...

               IF ([A]='Freshman', 'UG', 'Grad')

...and some of the fields under column [A] were null, you would get nulls as an output for those rows rather than the desired 'Grad'. This is because trying to equate something with "nothing" confuses Veera as to what you would really want as a result. So here are some suggestions on how best to deal with those gaps and still get to the outcome you need...

1. 
The most obvious way to deal with gaps in data is to replace them with something. This may not always be desirable, but when it is, using a CLEANSE ahead of your TRANSFORM is your best bet. Select the "Is Missing" operator and use Alt-Left Mouse to select all the columns that need their data fields filled in with that new value, like 'unknown'. 

Of course, you could instead place a CLEANSE after your TRANSFORM, using it to fill in any missing values appearing in the new column. 

2.
If filling in those data holes using a cleanse is not preferable, maybe just a temporary patch will do. Look for the "Treat Missings in Formula as Zeros" checkbox just above the "New Variable Name" field in the TRANSFORM. Just as the name suggests, this will temporarily replace any missing data with a zero, allowing most operations to function. Be careful, though, if the column you're evaluating already contains zeros - the output may not be what you intended!

3.
If even temporarily replacing nulls with something else isn't an option, then change your TRANSFORM formula to deal with them ahead of everything else. To do this, you'll likely need to use one of two built-in Veera functions - IS NULL or IS NOT NULL. We might change our example to include another condition, such as...

IF ([A] IS NULL, 'Withdrawn', 
IF ([A]='Freshman', 'UG', 'Grad'))

The idea here is to catch any nulls before they affect the rest of the logic by putting that condition first. 

4.
Finally, another (if more specialized) option might be to use the "Missings:" TRANSFORM feature. Unlike the "Treat Missings in Formula as Zeros" checkbox, this control changes nulls that appear as the final result of a formula. The replacement options offered by this feature are limited (0 or 1), but it may be an easy way to fix a problem with absent data appearing in a new numeric field. 

-Jeff Fleischer

No comments:

Post a Comment