How to Analyze Survey Data
The vast amount of data produced by survey methods can be difficult to wade through. In order to gain the most valid and valuable results in any survey, it is important to know how to analyze survey data in a proper way. This can be an intimidating task initially, but one that will become a lot clearer once you get to grips with a few key concepts!
Before launching into a discussion of the actual analysis, it is worth emphasizing the importance of the questionnaire itself, as a faulty questionnaire cannot be saved by even the most accurate analysis!
To avoid this, make sure to put some thought into designing your survey questions and take care to avoid the common survey pitfalls.
Analyzing the data involves a number of considerations:
1) Suitable statistics package
Import your survey results into a suitable, well-recognized statistics package before you analyze them. This is a very basic but key step. SPSS is one of the most widely used statistics packages for survey-based behavioral research; however, Excel, MATLAB and R are also suitable.
2) Data validation
As a research method that relies on human responses, surveys are bound to be susceptible to human error. This most commonly results in incomplete or incorrect answers. Missing data, for example, is one of the most commonly encountered issues with survey-based methods.
If you use a survey-design program, ensure it contains some data validation features. For example, a simple but effective feature is disallowing impossible or invalid answers on certain items. Other useful features are required responses and enforced date formats.
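If your survey tool does not offer these checks, a similar pass can be run on the exported data before analysis. The following is only a minimal sketch: the field names, the sample responses and the allowed ranges in `RULES` are all made up for illustration, and `None` is assumed to mark a skipped question.

```python
# Hypothetical raw responses; None marks a skipped question.
responses = [
    {"id": 1, "age": 34, "satisfaction": 4},
    {"id": 2, "age": None, "satisfaction": 2},
    {"id": 3, "age": 230, "satisfaction": 3},
    {"id": 4, "age": 51, "satisfaction": 7},
]

# Assumed validity rules: inclusive (lowest, highest) value per item.
RULES = {"age": (18, 110), "satisfaction": (0, 4)}


def validate(response, rules=RULES):
    """Return a list of problems found in a single response."""
    problems = []
    for field, (low, high) in rules.items():
        value = response.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}-{high}")
    return problems


# Map each respondent id to its list of problems (empty list = clean).
report = {r["id"]: validate(r) for r in responses}
clean = [r for r in responses if not report[r["id"]]]

for respondent_id, problems in report.items():
    if problems:
        print(respondent_id, problems)
```

Whether flagged responses are dropped, corrected or followed up is a judgment call that depends on the study; the sketch only separates them out so the decision is made deliberately.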
3) Response Formats
The response format selected will influence how you should analyze your data.
Four of the most common response formats/data types include:
a ) Categorical Data:
This is sometimes known as ‘nominal’ data. Categorical data is sorted, predictably, into categories.
For example, a researcher might want to look at differences in consumer trends between those who have children and those who do not.
Categorical data is among the simplest to analyze, with percentages commonly reported in the results section. To calculate them, the number of responses in each category is divided by the total number of responses. This can be done manually for smaller studies, but most statistics packages will do this for you automatically. For a question where each respondent picks a single answer, the percentages across all categories should add up to 100%; if they do not, something has gone wrong in the calculation.
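As a rough illustration of that calculation, here is a short sketch in plain Python. The answers are invented for the example and stand in for a single-answer question such as "Do you have children?".

```python
from collections import Counter

# Hypothetical answers to a single-answer question.
answers = ["yes", "no", "no", "yes", "no", "no", "yes", "no"]

counts = Counter(answers)
total = sum(counts.values())

# Share of all responses falling in each category, as a percentage.
percentages = {category: 100 * n / total for category, n in counts.items()}

for category, pct in percentages.items():
    print(f"{category}: {counts[category]} responses ({pct:.1f}%)")

# Sanity check: the shares of a single-answer question add up to 100%.
assert abs(sum(percentages.values()) - 100) < 1e-9
```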
The most common method of reporting the analyzed categorical data is cross-tabulation. If you’d like to see examples of this then check out the following link.
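To give a feel for what a cross-tabulation looks like, the sketch below counts two categorical answers against each other and reports row percentages. The respondents and the "preferred channel" question are hypothetical, chosen only to echo the children/no children example above.

```python
from collections import Counter

# Hypothetical respondents: (has_children, preferred_channel)
respondents = [
    ("yes", "online"), ("yes", "in-store"), ("yes", "online"),
    ("no", "online"), ("no", "online"), ("no", "in-store"),
    ("no", "online"), ("yes", "in-store"),
]

# Count how often each combination of answers occurs.
table = Counter(respondents)
rows = sorted({r for r, _ in respondents})
cols = sorted({c for _, c in respondents})

# Row percentages: share of each group choosing each channel.
row_totals = {r: sum(table[(r, c)] for c in cols) for r in rows}
crosstab = {
    r: {c: 100 * table[(r, c)] / row_totals[r] for c in cols} for r in rows
}

# Print a small text table, one row per group.
print(f"{'children':<10}" + "".join(f"{c:>10}" for c in cols))
for r in rows:
    print(f"{r:<10}" + "".join(f"{crosstab[r][c]:>9.0f}%" for c in cols))
```

Row percentages are used here because the groups differ in size; column or overall percentages are equally valid choices depending on the question being asked.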
b ) Ordinal data:
Ordinal refers to items that have a particular order on a scale. In relation to surveys, ordinal data is often that which is gathered using a rating or “Likert” scale. If you’ve ever completed a survey where you were asked to rate your answers from “Strongly Disagree” all the way up to “Strongly Agree” then you’ve come across this format!
These types of questionnaires are among the most commonly used but there is disagreement as to how ordinal data should actually be treated once it comes to the analysis stage.
Many people convert responses to numbers and then calculate the average.
Take this example:
You ask customers whether they like your new website, with 0 = Strongly Disagree, 1 = Disagree, 2 = Neutral, 3 = Agree and 4 = Strongly Agree.
The reason some statisticians take issue with this method is that it grants the difference between each point on the scale equal weighting. It is not really logical, however, to say that a “Strongly Agree” response is worth two “Neutral” responses.
Such methods can lead to distorted results!
The simplest way to avoid this is to treat the data like categorical data (see above) and report the percentage of responses at each point on the scale.
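The sketch below shows how an average can hide what is going on. Both groups of responses are invented: one is split between the two extremes and the other is entirely neutral, yet both average out to 2 on the 0 to 4 coding used above. Reporting the percentage per scale point keeps the difference visible.

```python
from collections import Counter
from statistics import mean

LABELS = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

# Two hypothetical groups of coded responses (0-4).
polarized = [0, 0, 0, 4, 4, 4]
lukewarm = [2, 2, 2, 2, 2, 2]


def distribution(codes):
    """Percentage of responses at each point on the scale."""
    counts = Counter(codes)
    return {LABELS[c]: 100 * counts[c] / len(codes) for c in range(len(LABELS))}


# The two means are identical even though the groups answered very differently.
print("means:", mean(polarized), mean(lukewarm))

for name, group in [("polarized", polarized), ("lukewarm", lukewarm)]:
    print(name, distribution(group))
```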
c ) Interval data:
Interval data is ordered data where the distance between values is meaningful. An example of interval data might be 10 years, 20 years and 30 years.
If you are using interval data, make sure the intervals are sized equally. Otherwise, you will find it difficult to calculate averages or produce clear graphs.
A foolproof method of summarizing interval data is to treat it in the same manner as ordinal data. If the intervals are not equal, there are some more complex methods available. For an in-depth discussion have a look at this article.
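One way to guarantee equally sized intervals is to build them from a fixed width instead of typing the ranges by hand. The sketch below is only an illustration: the ages and the 10-year width are assumptions, and the counts per interval are then summarized just like any other category.

```python
from collections import Counter

# Hypothetical ages reported by respondents.
ages = [19, 23, 27, 31, 34, 38, 42, 45, 47, 58]

WIDTH = 10  # assumed interval width in years


def bracket(age, width=WIDTH):
    """Label the equal-width interval an age falls into, e.g. '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Count respondents per interval.
counts = Counter(bracket(a) for a in ages)

for label in sorted(counts):
    print(label, counts[label])
```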
d ) Ratio data:
The last of our response formats, and the most complicated, is ratio data. Ratio data is that which has an absolute zero, such as years of work experience or length (measured in cm, for example).
As ratio data has a baseline zero as well as set meanings for intervals, it is acceptable to use averages.
It is also suitable for calculating the standard deviation. This is, roughly speaking, the typical distance of the responses from the mean.
Ratio data will allow you to answer questions such as, “What is the average number of times customers visit my website per month?”
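As a quick sketch of how that question could be answered, the snippet below uses Python's built-in `statistics` module on some invented visit counts. It shows both the sample standard deviation (`stdev`, which divides by n - 1 and is the usual choice when your respondents are a sample of a larger population) and the population version (`pstdev`, which divides by n).

```python
from statistics import mean, pstdev, stdev

# Hypothetical number of website visits per customer in one month.
visits = [2, 4, 4, 4, 5, 5, 7, 9]

average = mean(visits)

# Sample standard deviation: appropriate when respondents are a sample.
sample_sd = stdev(visits)

# Population standard deviation: appropriate if you surveyed everyone.
population_sd = pstdev(visits)

print(f"average visits per month: {average}")
print(f"sample standard deviation: {sample_sd:.2f}")
print(f"population standard deviation: {population_sd:.2f}")
```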
Once you have a grasp of the different data types and of how your response format influences the type of analysis you should choose, you will be well on your way to mastering survey analysis.
What is the takeaway?
- Remember, a survey is only a snapshot of the behavior of a particular population, so if at all possible, repeating the survey is a great bonus!
- Home in on the hypothesis or question that you explicitly set out to answer at the start of the project. Don’t get bogged down by the amount of data you have, and stay focused!
- The most important findings are those that are the most obvious and have the most statistical evidence. Don’t start looking for “more interesting” findings that have weaker statistical backing.
- Choose the appropriate analysis type for your response format and don’t fall into the trap of treating ordinal data like interval data.
With these hints under your belt, you will be much better prepared to take on that large survey dataset.