Statistics and Data Analysis

Statistical data analysis is a tool to help us better understand the world around us and make sense of the infinite data with which we are constantly bombarded. In the business world, statistical data analysis can be used across the organization to help managers and others make better decisions and optimize the effectiveness and profitability of the organization. In particular, inferential statistical techniques are used to test hypotheses to determine if the results of a study occur at a rate that is unlikely to be due to chance. Statistical techniques are applied to the analysis of data collected through various types of research paradigms. However, although it would be comforting to assume that the application of statistical tools to the analysis of empirical data would yield unequivocal answers to aid in decision making, it does not. Without understanding the principles behind statistical methods, it is difficult to analyze data or to correctly interpret the results.

Mathematical statistics is a branch of mathematics that deals with the analysis and interpretation of data and provides the theoretical underpinnings for various applied statistical disciplines. Although statistical manipulation for the sake of learning more about stochastic processes or expanding the understanding of statistical principles is fine for theorists or the classroom, most people use statistics as a tool, a means rather than an end. In particular, statistics is a way to help us better understand the world around us and make sense of the infinite data with which we are constantly bombarded. To that end, in the business world, statistics tends to be used to organize and analyze data so that it can be interpreted and applied to solving business problems. Through statistical data analysis, marketing analysts can better predict future trends in the marketplace or understand how best to market to specific market segments. Through statistical data analysis, logisticians can better understand how to manage the supply chain so that it is both more effective and efficient, with supplies, raw materials, and components being received just before they are needed and products finished just before they are to be delivered in order to cut down on wasted time, money, and storage. Through statistical data analysis, engineers can determine ways to better control the quality of manufacturing processes or design products that will meet the needs of the marketplace while lowering costs for the organization.

The best way to perform these and other tasks is determined through the application of inferential statistics to the analysis of empirical data. Inferential statistics is a collection of techniques that allow one to make inferences about data, including drawing conclusions about a population from a sample. Inferential statistics is used to test hypotheses to determine if the results of a study have statistical significance, meaning they occur at a rate that is unlikely to be due to chance. A hypothesis is an empirically testable declarative statement that the independent and dependent variables and their corresponding measures are related in a specific way as proposed by the theory. The independent variable is manipulated by the researcher. For example, an organization might want to determine which of two new designs it should bring to market. The independent variable is the design of the product. The dependent variable, so called because its value depends on which level of the independent variable the subject receives, is the subject's response to the independent variable -- in this case, whether people prefer Design A or Design B. The researcher may set up an experiment to test the hypothesis that one design is preferred over the other. The results of the analysis would give the company support for making an empirically based decision about which product to bring to market.

For purposes of data analysis, a hypothesis is stated in two ways. The null hypothesis (H0) is a statement that there is no statistical difference between the status quo and the experimental condition. For example, a null hypothesis about people's preference for the two new product designs would be that there is no preference for one design over the other. The alternative hypothesis (H1) would be that there is, in fact, a preference for one design over the other. After the hypothesis has been formulated, an experimental design is developed that allows the hypothesis to be empirically tested. Data is then collected and statistically analyzed to determine whether the null hypothesis should be rejected or retained.
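The design-preference example can be sketched as an exact binomial test of H0 (no preference, so each respondent is equally likely to choose either design) against H1 (some preference). This is a minimal illustration rather than a prescribed method: the function name, the survey counts, and the choice of α = .05 are all assumptions made for the example.

```python
from math import comb

def binomial_two_sided_p(successes: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided p-value for H0: P(prefer Design A) = p0.

    Sums the probability of every outcome that is no more likely
    under H0 than the observed count.
    """
    def pmf(k: int) -> float:
        return comb(n, k) * p0**k * (1 - p0) ** (n - k)

    observed = pmf(successes)
    # The small tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical survey result: 60 of 100 respondents prefer Design A.
p_value = binomial_two_sided_p(60, 100)
alpha = 0.05  # assumed significance level

print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these made-up counts the p-value falls just above .05, so the null hypothesis of no preference is not rejected even though more respondents chose Design A.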

Statistical Methods

There are a number of different statistical methods for testing hypotheses, each appropriate for a different type of experimental design. One frequently used technique is the t-test, which is used to analyze the mean of a population or compare the means of two different populations. When the samples are large or the population variances are known, a z statistic may be used instead to compare the means of two populations. Another useful technique is analysis of variance (ANOVA), a family of techniques used to analyze the joint and separate effects of multiple independent variables on a single dependent variable to determine the statistical significance of the effect. Other statistical tools allow the prediction of one variable from the knowledge of another variable. Correlation coefficients allow analysts to determine whether two variables are positively related (e.g., the older people become, the more they prefer a certain brand of cereal), negatively related (e.g., the older people become, the less they prefer that brand of cereal), or not related at all. Regression is a family of techniques that are used to develop mathematical models for use in predicting one variable from the knowledge of another variable. In general, statistical techniques can be applied to a wide range of business problems, including marketing research, quality control, prediction of marketplace trends or sales volume, and comparing the relative efficiency of the various operations in a multinational organization.
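Three of the tools named above can be sketched with nothing but the Python standard library: a two-sample t statistic, a correlation coefficient, and a simple regression line. The sketch assumes Welch's form of the t statistic (which does not require equal variances), Pearson's correlation coefficient, and ordinary least squares; the function names and all data are hypothetical. In practice a statistics package would normally supply these calculations along with p-values.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    var_a, var_b = variance(sample_a), variance(sample_b)  # n - 1 denominators
    std_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / std_error

def pearson_r(x, y):
    """Pearson correlation: +1 is perfectly positive, -1 perfectly negative."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def least_squares_line(x, y):
    """Slope and intercept of the line predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx

# Hypothetical ratings (1-10) of two product designs.
design_a = [7, 8, 6, 9, 8]
design_b = [5, 6, 7, 5, 6]
t_stat = welch_t(design_a, design_b)

# Hypothetical ages and cereal-preference scores (1-5).
ages = [20, 30, 40, 50, 60]
preference = [2, 4, 5, 4, 5]
r = pearson_r(ages, preference)
slope, intercept = least_squares_line(ages, preference)

print(f"t = {t_stat:.3f}, r = {r:.3f}")
print(f"predicted preference at age 45: {intercept + slope * 45:.2f}")
```

A positive r here corresponds to the "older people prefer the brand more" case in the text, and the fitted line is the kind of model regression uses to predict one variable from another.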

It would be comforting to assume that the application of statistical tools to the analysis of empirical data would yield definitive answers that would unequivocally indicate what decision should be made. Unfortunately, it does not. Without understanding the principles behind statistical methods, it is difficult to analyze data or to correctly interpret the results.

Limitations to Real-World Statistical Data Analysis

Even when the principles behind statistical methods are well understood, there are practical limitations to real-world statistical data analysis that need to be taken into account. As complicated as human behavior is and as confounding as extraneous variables can be, the data that is collected in a laboratory is pristine compared with the data that can be collected in the real world. There are several problems with the analysis of real-world data. First, it is virtually impossible to name and control every variable that might have an effect on the dependent variable. In addition, it is frequently not possible to collect the data needed to test the hypothesis of interest. There are both practical and ethical considerations that need to be taken into account when collecting real-world data. For example, one cannot arbitrarily reduce the wages of half the employees in the company just to see if their job satisfaction goes down or their work product suffers.

If a hypothesis is poorly stated or the underlying theory is flawed, the statistical analysis may not be of use. Similarly, an analyst who does not understand how probability works may choose an incorrect statistical technique, resulting in a poor experimental design that yields spurious results. It must also be understood that the results of a statistical data analysis do not prove whether or not the hypothesis is true; they indicate only how unlikely the observed results would be if the null hypothesis were true. For example, if a t-test or analysis of variance results in a value that is significant at the α = .05 level, this means not that the hypothesis is true but that a result this extreme would be expected no more than five times out of 100 if chance alone were at work; in rejecting the null hypothesis, the analyst is accepting that risk of being wrong.
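One way to see what the α = .05 level means is to simulate many studies in which the null hypothesis is true by construction and count how often a test still comes out "significant." The sketch below is an illustration under stated assumptions: both groups are drawn from the same normal distribution with a known standard deviation of 1, a two-sample z test is used so that the standard library's normal distribution suffices, and the function name, sample size, trial count, and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def false_positive_rate(trials=2000, n=30, alpha=0.05, seed=1):
    """Share of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Both groups come from the same population, so any difference
        # between their means is due to chance alone.
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        # Two-sample z statistic with known sigma = 1 in each group.
        z = (mean(group_a) - mean(group_b)) / sqrt(2 / n)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / trials

rate = false_positive_rate()
print(f"share of 'significant' results when H0 is true: {rate:.3f}")
```

The printed share should land close to .05, which is the sense in which an analyst testing at that level accepts being wrong about five times in 100 when no real effect exists.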

Opinion about the best way to make business decisions is divided between those who rely on statistical methodology and those who prefer to use their "gut" for making decisions. In reality, both approaches have advantages and disadvantages. Statistical methods are less prone to bias than are subjective judgments. In addition, statistics tend to be more reliable and can more efficiently make use of historical data. However, statistical techniques can only work with the data they are given, and sometimes sufficient empirical data is not available. Judgmental decision making can sometimes be useful to make reasoned judgments in such situations, based on the experience and insights of the decision maker. However, human error may make the analyst or manager more optimistic (or pessimistic) than actually warranted, trends or factors may be read into the data that are not actually there, or the effects of correlated variables may not be taken into account. In addition, not everyone has the experience or insight necessary to make reliable judgment-only forecasts.

In many situations, the best approach is to combine the objectivity of statistical analysis with the insight of human experience and judgment. Judgment is key to determining which pieces of data are relevant to designing a research paradigm. Judgment is also important in determining which statistical technique is most appropriate to analyze the data. There are a number of statistical techniques available for data analysis, and whether or not the end result will be of use depends on the choice of the right statistical tool. In addition, expert judgments can be helpful in aiding the analyst to understand the situation and give insight regarding the parameters in which the data and subsequent analysis should be interpreted.

Applications

Statistical data analysis allows researchers to interpret the results of research studies and determine the answers to real-world questions. The goal of research is to describe, explain, and predict behavior. For example, the marketing department may need to know which of two proposed new company logos will be most memorable and will have the most positive image in the minds of prospective customers, or the engineering department may need to determine which of two graphical user interfaces is more user friendly. Among the necessary steps for generating a useful research design are controlling the situation so that the research is only measuring what it is supposed to measure and including as many of the relevant factors as possible so that the research fairly emulates the real-world experience.

In the simplest research design, a stimulus (e.g., a new product design) is presented to the research subjects (e.g., potential customers), and a response is observed and recorded (e.g., which design they liked better and why). However, the real world tends to be more complicated than this, and three types of variables need to be considered. The variables of most concern are the independent variable (i.e., the stimulus or experimental condition that is hypothesized to affect behavior) and the dependent variable (i.e., the observed effect on behavior caused by the independent variable). However, these are not the only variables that need to be controlled during a study. As shown in Figure 1, there are also extraneous variables, which are variables that affect the outcome of the experiment (i.e., whether or not the people questioned like the new design) but have nothing to do with the independent variable itself. For example, if the person questioned does not like that brand of product, he or she may find none of its designs appealing. Similarly, if the person is in a hurry to get somewhere else or has something else on his or her mind, he or she may not give the alternative designs sufficient consideration to determine which is better. There are any number of such variables that are extraneous to the research question being asked but still affect the outcome of the research. One of the hallmarks of a well-designed experiment is that it controls for as many of the extraneous variables as possible. Although it is impossible to control for every possible extraneous variable, the more of these that are accounted for and controlled in the experimental design, the more meaningful the results will be.

[Figure 1: image not reproduced (ors-bus-444-126395.jpg)]

Although laboratory research allows the most control over variables, it often is far removed from real life. As discussed above, it is important not only to control as many variables as one can when designing an experiment but also to have the experimental situation emulate the real-world situation as much as possible. For example, although Design A may work fine in a controlled laboratory environment, when the potential consumer is faced with the reality of using it at work while simultaneously sending e-mail, answering the phone, and sitting in a different chair, the design may not be workable. Therefore, in order to be able to extrapolate research results to the real world in a way that is meaningful, it is important to design an experiment that not only controls extraneous variables but is as realistic as possible.

Types of Research Techniques

There are several research techniques that can be used to investigate real-world business problems. As discussed above, the laboratory experiment allows the researcher the most control over extraneous variables. However, laboratory situations are far removed from the reality of how most people live their lives. Another approach to research is simulation. This technique allows the researcher to bring in more real-world variables but still control many of the extraneous variables. For example, people could be asked to perform a set number of tasks with the new product in an environment that simulates the way they would actually use it in the real world. Another approach would be to perform a field experiment in which people are given the product to try at work under the actual conditions in which they would use it. This approach has the advantage of being more realistic, but it also has the disadvantage of giving the researcher less control over extraneous variables.

There are other approaches to studying business problems that trade more realism for less control. One approach is field study, which is an examination of how people behave in the real world. For example, if both designs are already on the market, the researcher could observe what type of people buy each design and use this information to determine how to target the marketing of the product. This approach could be combined with another research technique called survey research. In this approach, subjects are interviewed by a member of the research team or asked to fill out a questionnaire regarding their preferences, reactions, habits, or other characteristics of interest to the researcher. For example, the researcher could ask each person buying one of the products a list of questions concerning what he or she is looking for in this type of product, where and when he or she will use the product, why he or she is replacing the old product, and what the determining factors were in buying the product that he or she ultimately decided to purchase. In theory, it is possible to develop a very detailed research instrument that could be used to collect all the data that the researcher needs for analysis. In practice, however, such detailed instruments are often lengthier than the potential research subject's attention span. In addition, unlike the other research techniques available, surveys and interviews are not based on observation. Therefore, there is no way to know whether or not the subject is telling the truth, and there are any number of reasons why he or she might not be (e.g., he or she did not have much time to answer the questionnaire, was not really interested in helping in the research, did not like the company, or wanted to please the researcher).

Conclusion

Performing a new research study is not the only way to obtain data for statistical analysis for decision making. Meta analysis and other secondary data analysis techniques allow researchers to analyze multiple previous research studies to look for trends or general findings. Existing data that the business has collected for other purposes or that is publicly available can also be statistically analyzed. Although these approaches do not give the researcher control over how the data was collected, they often can yield interesting results that can inform business decisions or simply add to the body of knowledge about a topic.

Terms & Concepts

Analysis of Variance (ANOVA): A family of statistical techniques that analyze the joint and separate effects of multiple independent variables on a single dependent variable and determine the statistical significance of the effect.

Dependent Variable: The outcome variable or resulting behavior that changes depending on whether the subject receives the control or experimental condition.

Hypothesis: An empirically testable declaration that certain variables and their corresponding measures are related in a specific way proposed by a theory.

Independent Variable: The variable in an experiment or research study that is intentionally manipulated in order to determine its effect on the dependent variable.

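Inferential Statistics: A subset of mathematical statistics used to make inferences about data, including drawing conclusions about a population from a sample and testing hypotheses for statistical significance.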

Mathematical Statistics: A branch of mathematics that deals with the analysis and interpretation of data. Mathematical statistics provides the theoretical underpinnings for various applied statistical disciplines, including business statistics, in which data is analyzed to find answers to quantifiable questions.

Null Hypothesis (H0): The statement that the findings of the experiment will show no statistical difference between the control condition and the experimental condition.

Population: The entire group of subjects belonging to a certain category, such as all women between the ages of 18 and 27, all dry-cleaning businesses, or all college students.

Sample: A subset of a population. A random sample is a sample that is chosen at random from the larger population with the assumption that it will reflect the characteristics of the larger population.

Statistical Significance: The degree to which an observed outcome is unlikely to have occurred due to chance.

Variable: An object in a research study that can have more than one value.

Essay by Ruth A. Wienclaw, PhD

Dr. Ruth A. Wienclaw holds a doctorate in industrial/organizational psychology with a specialization in organization development from the University of Memphis. She is the owner of a small business that works with organizations in both the public and private sectors, consulting on matters of strategic planning, training, and human-systems integration.