When originally created in 1968, SPSS stood for Statistical Package for the Social Sciences, but this data analysis software has since been widely adopted in many other fields too.
SPSS can provide some robust and exciting data manipulation options, but only if you’ve set your survey up to take advantage of these capabilities. For those looking to get more advanced insight into the results of their survey, SPSS can be a great tool.
I’ve been working with SPSS for years, and I’ve seen just about every possible mistake in creating an SPSS-compatible survey. My hope is that you’ll be able to sidestep these all-too-common problems with this basic guide to survey variables and SPSS.
Here I’ll cover some of the core uses of SPSS syntax, including:
- Renaming variables
- Adding and changing variable labels
- Adding, changing, and updating value labels
- Creating and manipulating variables
Laying the Groundwork: Survey Variables and Values
Survey variables are the individual fields (like responses) in an SPSS data file, and each one is identified by a variable name. For example, if you were asking people to select movies they like (select all that apply), you might offer a list like this:
- Butch Cassidy and the Sundance Kid
- The Devil Wears Prada
- You Only Live Twice
Most survey tools would generate the corresponding variable names:
To make reporting and analysis in SPSS easier you could adjust the variable names for these responses to be something like:
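As a hypothetical sketch, suppose your survey tool exported the three movie choices as Q1_1, Q1_2, and Q1_3. Both those export names and the new names below are assumptions for illustration, not the names any particular tool produces. A single RENAME VARIABLES command could give them descriptive names:

```spss
* Hypothetical export names Q1_1 to Q1_3; adjust to match your own file.
* Each pair maps an old name to a new, more descriptive one.
RENAME VARIABLES (Q1_1=Butch_Cassidy) (Q1_2=Devil_Prada) (Q1_3=Live_Twice).
```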
Variable names are just that: names for variable data points, which you can adjust to suit your SPSS analysis needs. Generally speaking, they are not used in reporting; they are what you see when you are setting up and running your analysis. The variable labels are what will actually show up in the report.
Survey values, on the other hand, are the individual pieces of data. In this example, each movie variable would probably hold a zero or a one indicating whether the respondent selected that movie. The value zero would typically be labeled "Not selected," while the value one would be labeled "Selected." Value labels can contain nearly any Unicode character, so they are much more flexible than variable names and can help your report look significantly better.
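Here is a minimal sketch of both kinds of labels, assuming the three movie variables are named Butch_Cassidy, Devil_Prada, and Live_Twice (hypothetical names) and coded 0/1. The variable labels carry the full movie titles for reports, and one VALUE LABELS command labels the 0/1 codes for all three variables at once:

```spss
* Hypothetical variable names; each holds 0 = not selected, 1 = selected.
VARIABLE LABELS Butch_Cassidy "Butch Cassidy and the Sundance Kid"
  /Devil_Prada "The Devil Wears Prada"
  /Live_Twice "You Only Live Twice".
VALUE LABELS Butch_Cassidy Devil_Prada Live_Twice
  0 "Not selected"
  1 "Selected".
```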
Any time you open a data file in SPSS and examine the data, you are looking at variables (think of columns) and values (the cells in each column).
You are probably used to manually clicking into Data View or Variable View to change a variable name or update its labels. Unfortunately, in the world of survey research we frequently re-export our data and then have to update the data file to our liking by hand all over again.
Using syntax like the commands outlined below allows you to make your changes once and simply re-apply them to each newly exported data file.
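For instance, you might keep your renaming and labeling commands in their own syntax file and run that file against each fresh export. A minimal sketch, assuming hypothetical file paths and a hypothetical syntax file named fix_variables.sps (swap in your own):

```spss
* Open the newly exported data file (hypothetical path).
GET FILE="C:\Surveys\movie_survey_export.sav".
* Re-run the saved renaming and labeling commands.
INSERT FILE="C:\Surveys\fix_variables.sps".
* Save the cleaned data under a new name so the raw export stays untouched.
SAVE OUTFILE="C:\Surveys\movie_survey_clean.sav".
```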
Now that we’ve got the terminology right, let’s jump into exactly how to configure your survey variables so your SPSS analysis can give you all the information you’re after.
(You can also see a detailed walkthrough of why it's important to learn SPSS syntax in this video.)
SPSS variables follow a very particular naming convention. The first character must be a letter or one of the following characters: @, #, or $. The characters that follow can be nearly any combination of letters, numbers, non-punctuation characters, and a period (.).
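The best advice I can give you is to start every variable name with a letter: a name beginning with # creates a temporary scratch variable that isn't saved with your data, and names beginning with $ are reserved for SPSS system variables.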
And remember, you cannot begin your variable name with a number!
Some of the most common characters I see people trying to use in variable names (none of which are allowed) are:
,' - ! " $ % & ( ) * ... / : ; ? [ \ ] ^ | + < = > © ® ™
To rename a variable named Fake_Var1 to New_Var1, use this command:
RENAME VARIABLES (Fake_Var1=New_Var1).
To rename more than one variable at a time, you can use this syntax:
RENAME VARIABLES (Fake_Var1=New_Var1) (Fake_Var2=New_Var2).
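SPSS also accepts a list form, in which the old names to the left of the equals sign are paired, in order, with the new names to the right. This sketch renames the same two variables:

```spss
* Fake_Var1 becomes New_Var1 and Fake_Var2 becomes New_Var2.
RENAME VARIABLES (Fake_Var1 Fake_Var2 = New_Var1 New_Var2).
```

The list form is handy when you are renaming a long run of variables, as long as both lists contain the same number of names.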
Renaming variables takes effect immediately after you run your command.
(Some commands don't take effect right away, but don't worry about that for now.)
Adding to or Changing Variable Labels
Now that we’ve covered variable naming conventions, let’s look at how to add a label to a variable (or replace its existing label).
Adding or changing a variable's label is straightforward.
Here is the syntax you would use to add (or change) the label for the Fake_Var1 variable used above:
VARIABLE LABELS Fake_Var1 "My Fake Variable".
If you wish to label multiple variables at one time, simply add them on the following lines, starting each additional variable with a slash (/) and putting the period only after the last one (don't repeat the command name). Here is an example:
VARIABLE LABELS Fake_Var1 "My Fake Variable"
 /Fake_Var2 "My Second Fake Variable".
Adding to or Changing Value Labels
Remember that there’s an important difference between a variable label and a value label. Think of the variable label as the overall question being asked, while the value labels are the answers.
For this example, let's assume you have a variable named Age_Group and that your survey offered the following options (coded 1 through 4):
- Under 18
- 18-26
- 27-50
- 51+
To add or change the value labels you would use the following syntax:
VALUE LABELS Age_Group 1 "<18" 2 "18-26" 3 "27-50" 4 "51+".
If you want to update just one value label (without overwriting all the others), you can use this syntax:
ADD VALUE LABELS Age_Group 1 "Teenagers - ugh!".
In this scenario it probably isn't worth updating a single value rather than rewriting them all; however, when you have dozens (or hundreds) of value labels, this comes in very handy.
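To confirm which labels are now attached, you can print the data dictionary entry for the variable. A sketch, assuming the Age_Group variable from this example exists in the active dataset:

```spss
* List the variable label and every value label currently defined for Age_Group.
DISPLAY DICTIONARY
  /VARIABLES=Age_Group.
```

After the ADD VALUE LABELS command, the output should show the new label for value 1 alongside the unchanged labels for values 2 through 4.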
Creating and Manipulating Variables
One of the best ways analysts can add value is by finding new ways to examine data. There are many ways to transform data that allow for further, more specific analysis.
Here are a few common transformations I perform regularly using the techniques we’ve been discussing:
- Create variables that summarize others (average, sum, etc.)
- Create an interval variable from a well-grouped categorical variable
- Create a binary (two-category) variable from a categorical variable
- Regroup categorical variables to increase the size of subgroups
- Combine several variables into one
- Mathematically transform a variable (log, square, square root, etc.)
- Set specific values to missing so they are not used in analysis
- Replace missing values with a valid value
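Several of these transformations map onto short commands. The sketch below uses the Age_Group variable from earlier plus three hypothetical variables: Rating (a scale where the code 99 is assumed to mean "Don't know"), Income, and Checkbox_1 (a select-all-that-apply item assumed to be exported as blank when unchecked). Those names, codes, and the new variable names Age_2Grp, Under_18, and Log_Income are assumptions for illustration:

```spss
* Regroup Age_Group into two larger subgroups (under 27 versus 27 and older).
RECODE Age_Group (1 THRU 2=1) (3 THRU 4=2) INTO Age_2Grp.
VALUE LABELS Age_2Grp 1 "Under 27" 2 "27 and older".
* Binary flag: 1 if the respondent is under 18, otherwise 0.
RECODE Age_Group (1=1) (2 THRU 4=0) INTO Under_18.
* Treat the hypothetical code 99 in Rating as missing so it is excluded.
MISSING VALUES Rating (99).
* Log transform; values of 0 or less (and missing Income) become system-missing.
COMPUTE Log_Income = LG10(Income).
* Fill blank (system-missing) checkbox answers with 0 in hypothetical Checkbox_1.
RECODE Checkbox_1 (SYSMIS=0).
EXECUTE.
```

Recoding INTO a new variable leaves the original Age_Group untouched, which makes it easy to compare the regrouped version against the source.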
I always find examples are much easier to follow, so let’s walk through one now.
Deeper Insight Into Soda Buying Behavior
Let’s say you’re a soda company trying to understand how many brands customers consider before a purchase, commonly referred to as the "consideration set."
For sodas, the consideration set might look like this:
When considering your next soda purchase, which of the following brands do you consider? (select all that apply)
- Diet Coke
- Diet Pepsi
- Dr. Pepper
- Mountain Dew
Typical reports show the percentage of respondents who selected each choice.
While this is interesting, it is also helpful to understand other basic statistics about customers’ purchase patterns.
An easy option is to set up a "sum" variable that adds up the total number of sodas selected by each respondent. In this case the minimum is zero (unless you required an answer), and the maximum is eight.
Competition Data Through Survey Variables
Reviewing the mean, median, and mode of the sum variable would then provide useful insights into consumers’ attitudes and behaviors.
For example, if the average number of sodas selected was 1.2, that would suggest consumers rarely stray from their preferred brand. On the other hand, if the average was 6.5, the market would appear to be more fragmented and competitive.
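In the example above, assuming the data is coded as 0=Not selected and 1=Selected, and the soda variables sit next to each other in the data file from Coke through Other (so the TO keyword can span them), the code could be as simple as this:
COMPUTE Consid_Set_Sum = SUM(Coke TO Other).
EXECUTE.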
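COMPUTE is a transformation, not a procedure, so it isn't "hot": SPSS holds it as pending and doesn't actually change the data until you follow it with an EXECUTE command (or run a procedure, such as FREQUENCIES, that reads the data).
This may seem odd, but when dealing with large data sets it is very helpful not to read and write the entire data set every time you use a COMPUTE command; SPSS can apply all pending transformations in a single pass instead.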
The above code creates a "sum" variable that adds up the total number of sodas selected. So if someone selected Pepsi, Diet Pepsi, and Mountain Dew, the new variable would contain a value of three for that respondent.
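As an alternative to the SUM approach (a different command, named plainly here), SPSS's COUNT command tallies how many of the listed variables equal a given value. A sketch, assuming the same adjacent Coke TO Other variables coded 1=Selected; Consid_Set_Count is a hypothetical name:

```spss
* Count how many of the soda variables equal 1 for each respondent.
COUNT Consid_Set_Count = Coke TO Other (1).
EXECUTE.
```

The two differ mainly in how blanks are handled: SUM returns system-missing when every variable in the list is missing, while COUNT returns 0 in that case, so choose based on how you want completely blank responses treated.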
Better Clarity Through Variable Labels
After the sum variable has been created, you should give it a descriptive variable label so the output for the statistics discussed above is easy to read.
Here is the syntax that adds the label and then requests the mean, median, and mode (plus the minimum, maximum, and standard deviation):
VARIABLE LABELS Consid_Set_Sum "Number in consideration set".
FREQUENCIES VARIABLES=Consid_Set_Sum
 /STATISTICS=STDDEV MINIMUM MAXIMUM MEAN MEDIAN MODE.
SPSS Syntax and Open Ended Questions
Another great use of syntax is to help code open-ended questions, which are notoriously difficult to analyze.
In the above example, respondents who selected "Other__________" might have written in an option. Let's say you saw a lot of people who wrote in variations on the answer, "Mr. Pibb."
The following syntax is a great way to create a variable that will allow you to quantify how many respondents wrote it in. The first line sets Mr_Pibb to zero for every respondent, and the second line looks for "PIBB":
COMPUTE Mr_Pibb = 0.
IF INDEX(UPCASE(Other_Write_In), UPCASE("PIBB")) > 0 Mr_Pibb = 1.
EXECUTE.
After running the above code, if the word "PIBB" appeared anywhere in a respondent's Other_Write_In text, there would be a 1 in that respondent's Mr_Pibb variable (and a 0 otherwise).
The code may look odd, but that’s because it has case insensitivity built into it.
This means that you don't have to convert your variable to all uppercase letters before checking. And even if you typed "pibb" as the text to find, it would still match "Pibb" or "PIBB," because UPCASE converts both the write-in text and the search term to uppercase before INDEX compares them.
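To see how many respondents mentioned the brand, you can label the new flag variable and run a frequency table on it. A sketch, assuming Mr_Pibb has already been created by the code above; the label text is only an example:

```spss
* Label the 0/1 flag so the output is self-explanatory.
VARIABLE LABELS Mr_Pibb "Wrote in Mr. Pibb".
VALUE LABELS Mr_Pibb 0 "Did not mention" 1 "Mentioned".
* The count of 1s is the number of respondents who wrote it in.
FREQUENCIES VARIABLES=Mr_Pibb.
```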
Incidentally, if you did want to uppercase the Other_Write_In variable, below is the syntax:
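A minimal sketch, assuming Other_Write_In is a string variable; UPCASE is SPSS's built-in uppercase function:

```spss
* Overwrite the write-in text with its uppercase version.
COMPUTE Other_Write_In = UPCASE(Other_Write_In).
EXECUTE.
```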
And to complement that, you can lowercase the variable with:
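The same sketch using SPSS's LOWER function, again assuming Other_Write_In is a string variable:

```spss
* Overwrite the write-in text with its lowercase version.
COMPUTE Other_Write_In = LOWER(Other_Write_In).
EXECUTE.
```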
SPSS Syntax: Worth the Trouble
It may seem difficult at first, but setting up your syntax can give you much deeper insight into your survey data in a much shorter period of time. If you are interested in learning more, you can review this post, which demonstrates three ways to obtain the syntax SPSS generates when you use the GUI.
Do you have burning questions about SPSS? Give us a shout in the comments and Joe might answer them in a future article!
Joe Glines is the co-founder of the-Automator, a small company that specializes in automating reporting and daily tasks. He is an expert at SPSS as well as market research, and will be bringing his expertise to the SurveyGizmo blog on a regular basis.