#### Python in SPSS

Using Python in SPSS is great if you want to do complex calculations without leaving the SPSS environment. Python is much more flexible than SPSS syntax, and it’s very easy to use. It is especially useful when you are collaborating with people who are not willing to do all their analysis in Python (e.g. with Spyder), yet need complex data-processing steps in their analysis, for example converting between colour spaces. The online documentation is pretty good, but I thought I’d post a very simple use case: converting a colour from sRGB to LAB colour space. Doing this in SPSS syntax would be very tedious, but in Python I can just paste the code into a loop and it’s done. Here is the program:

```spss
DELETE VARIABLES LAB_L LAB_A LAB_B.
OUTPUT CLOSE *.
```

I start off with some syntax that deletes any variables with the same names as those I am about to create. This is especially useful when developing, as you might run the script many times and don’t want to delete the variables manually each time.

```spss
BEGIN PROGRAM Python.
import spss

spss.StartDataStep()
datasetObj = spss.Dataset()

# Manipulation of variables goes here!

spss.EndDataStep()
END PROGRAM.
```

This is the standard boilerplate needed in most Python SPSS scripts. BEGIN PROGRAM and END PROGRAM mark the region in which you write Python. You then ‘import spss’ and start a data step. Finally you get a dataset object, which lets you iterate through rows and perform manipulations.

```python
# Create the new variables
datasetObj.varlist.append('LAB_L',0)
datasetObj.varlist.append('LAB_A',0)
datasetObj.varlist.append('LAB_B',0)
```

Here I create the new variables. The second argument, 0, sets the variable type (0 means numeric); it is not an initial value. LAB is a colour space with three dimensions, L, A and B.

```python
# Get the variable indexes
rIndex = datasetObj.varlist['r'].index
gIndex = datasetObj.varlist['g'].index
bIndex = datasetObj.varlist['b'].index
LIndex = datasetObj.varlist['LAB_L'].index
AIndex = datasetObj.varlist['LAB_A'].index
BIndex = datasetObj.varlist['LAB_B'].index
```

In order to perform data manipulations you need the index of the variable in the dataset. I get all these indexes at the start and store them in their own variables.

```python
for idx, row in enumerate(datasetObj.cases):
    r = row[rIndex]
    g = row[gIndex]
    b = row[bIndex]
    if r >= 0 and g >= 0 and b >= 0:
        LAB = calc_LAB(r,g,b)
    else:
        LAB = [None, None, None]

    # calc_LAB returns [L, A, B]; write one element to each new variable
    datasetObj.cases[idx, LIndex] = LAB[0]
    datasetObj.cases[idx, AIndex] = LAB[1]
    datasetObj.cases[idx, BIndex] = LAB[2]
```

Here I iterate over each row (or case), pulling the R, G and B values into Python variables using the variable indexes (i.e. rIndex, gIndex and bIndex). Then, if each is 0 or greater, I send them to a function calc_LAB (which I will define later); otherwise the three results are set to None. Finally I take the LAB values and put them into the correct place in the dataset.
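The row logic is easy to try outside SPSS. The following is only a minimal sketch: a plain list of lists stands in for datasetObj.cases, and `fake_convert` and `is_valid` are hypothetical helpers of my own, not part of the SPSS API or of the program in this post. It also guards against None, on the assumption that missing numeric values reach Python as None, since comparing None with a number raises a TypeError in Python 3.

```python
# Stand-in for datasetObj.cases: columns are r, g, b, then three output slots.
cases = [
    [255, 0, 0, None, None, None],
    [-1, 10, 10, None, None, None],    # negative code: treated as invalid
    [None, 10, 10, None, None, None],  # missing value (assumed to arrive as None)
]
R, G, B, OUT = 0, 1, 2, 3  # column indexes, looked up once before the loop


def fake_convert(r, g, b):
    """Hypothetical placeholder for calc_LAB: returns a list of three numbers."""
    return [r / 255.0, g / 255.0, b / 255.0]


def is_valid(*values):
    """True only if every value is present and non-negative."""
    return all(v is not None and v >= 0 for v in values)


for idx, row in enumerate(cases):
    r, g, b = row[R], row[G], row[B]
    result = fake_convert(r, g, b) if is_valid(r, g, b) else [None, None, None]
    # Write one element of the result into each output column.
    for offset, value in enumerate(result):
        cases[idx][OUT + offset] = value
```

The point of the inner loop is that the conversion returns a three-element list, so each output column receives a single element rather than the whole list.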

You can see how powerful this can be: calc_LAB is actually a lengthy function that would be a real chore to program in syntax. Here is the full program with the function:

```spss
* Encoding: UTF-8.
*Importing all of the data.
DELETE VARIABLES LAB_L LAB_A LAB_B.
OUTPUT CLOSE *.

BEGIN PROGRAM Python.
import spss

spss.StartDataStep()
datasetObj = spss.Dataset()

# Create the new variables
datasetObj.varlist.append('LAB_L',0)
datasetObj.varlist.append('LAB_A',0)
datasetObj.varlist.append('LAB_B',0)

# Get the variable indexes
rIndex = datasetObj.varlist['r'].index
gIndex = datasetObj.varlist['g'].index
bIndex = datasetObj.varlist['b'].index
LIndex = datasetObj.varlist['LAB_L'].index
AIndex = datasetObj.varlist['LAB_A'].index
BIndex = datasetObj.varlist['LAB_B'].index

def calc_LAB(R,G,B):
    # Scale 0-255 channel values to 0-1
    var_R = float(R) / 255.0
    var_G = float(G) / 255.0
    var_B = float(B) / 255.0

    # Undo the sRGB gamma curve for each channel
    if ( var_R > 0.04045 ):
        var_R = pow(( ( var_R + 0.055 ) / 1.055 ), 2.4)
    else:
        var_R = var_R / 12.92
    if ( var_G > 0.04045 ):
        var_G = pow(( ( var_G + 0.055 ) / 1.055 ), 2.4)
    else:
        var_G = var_G / 12.92
    if ( var_B > 0.04045 ):
        var_B = pow(( ( var_B + 0.055 ) / 1.055 ), 2.4)
    else:
        var_B = var_B / 12.92

    var_R = var_R * 100
    var_G = var_G * 100
    var_B = var_B * 100

    # Linear RGB to XYZ
    X = var_R * 0.4124 + var_G * 0.3576 + var_B * 0.1805
    Y = var_R * 0.2126 + var_G * 0.7152 + var_B * 0.0722
    Z = var_R * 0.0193 + var_G * 0.1192 + var_B * 0.9505

    # Normalise by the reference white
    var_X = X / 95.047
    var_Y = Y / 100.000
    var_Z = Z / 108.883

    third = 1.0/3.0

    if ( var_X > 0.008856 ):
        var_X = pow(var_X,third)
    else:
        var_X = ( 7.787 * var_X ) + ( 16.0 / 116.0 )
    if ( var_Y > 0.008856 ):
        var_Y = pow(var_Y,third)
    else:
        var_Y = ( 7.787 * var_Y ) + ( 16.0 / 116.0 )
    if ( var_Z > 0.008856 ):
        var_Z = pow(var_Z,third)
    else:
        var_Z = ( 7.787 * var_Z ) + ( 16.0 / 116.0 )

    # XYZ to LAB
    L = ( 116 * var_Y ) - 16
    A = 500 * ( var_X - var_Y )
    B = 200 * ( var_Y - var_Z )

    return [L, A, B]

for idx, row in enumerate(datasetObj.cases):
    r = row[rIndex]
    g = row[gIndex]
    b = row[bIndex]
    if r >= 0 and g >= 0 and b >= 0:
        LAB = calc_LAB(r,g,b)
    else:
        LAB = [None, None, None]

    # calc_LAB returns [L, A, B]; write one element to each new variable
    datasetObj.cases[idx, LIndex] = LAB[0]
    datasetObj.cases[idx, AIndex] = LAB[1]
    datasetObj.cases[idx, BIndex] = LAB[2]

spss.EndDataStep()
END PROGRAM.
```
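Because the conversion itself does not touch the spss module, it can be sanity-checked in any Python interpreter before being pasted into SPSS. Below is a compact sketch of the same arithmetic, using the constants from the program above; the names `srgb_to_linear`, `lab_f` and `calc_lab` are my own for this illustration and are not part of the original script. It just factors the three repeated if/else branches into helpers.

```python
def srgb_to_linear(channel):
    """Undo the sRGB gamma curve for one 0-255 channel; result is in 0-1."""
    c = float(channel) / 255.0
    if c > 0.04045:
        return ((c + 0.055) / 1.055) ** 2.4
    return c / 12.92


def lab_f(t):
    """Cube root with a linear segment near zero, as in calc_LAB's branches."""
    if t > 0.008856:
        return t ** (1.0 / 3.0)
    return 7.787 * t + 16.0 / 116.0


def calc_lab(r, g, b):
    """Convert 0-255 sRGB values to [L, A, B] with the same constants as above."""
    lr, lg, lb = [100.0 * srgb_to_linear(c) for c in (r, g, b)]
    x = lr * 0.4124 + lg * 0.3576 + lb * 0.1805
    y = lr * 0.2126 + lg * 0.7152 + lb * 0.0722
    z = lr * 0.0193 + lg * 0.1192 + lb * 0.9505
    fx, fy, fz = lab_f(x / 95.047), lab_f(y / 100.0), lab_f(z / 108.883)
    return [116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)]


white = calc_lab(255, 255, 255)  # L should be close to 100, A and B close to 0
black = calc_lab(0, 0, 0)        # all three should be close to 0
print(white, black)
```

White and black make handy checks because their expected LAB values are easy to reason about: lightness at the two ends of the scale, and no colour component.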
