Archive for the ‘Evaluation Design’ Category

Interviews are a commonly used data collection method in qualitative studies, where the goal is to understand or explore a phenomenon. They’re an extremely effective way to gather rich, descriptive data about people’s experiences in a program or exhibition, which is one reason we use them often in our work at RK&A. Figuring out sample size for interviews can sometimes feel trickier than for quantitative methods like questionnaires, because there aren’t tools like sample size calculators to use. However, there are several important questions to consider that can help guide your decision-making (and while you do so, remember that a cornerstone of qualitative research is that it requires a high tolerance for ambiguity and a willingness to rely on instinct!):

  1. How much does your population vary? The more homogeneous the population, the smaller the sample size can be. For example, is your population all teachers? Do they all teach the same grade level? If so, you can use a smaller sample size, since the population and related phenomenon are narrow. Generally speaking, if a population is very homogeneous and the phenomenon narrow, aim for a sample size of around 10. If the population is varied or the phenomenon is complex, aim for around 40 to 50. And if you want to compare populations, aim for 25 to 30 per segment. In any case, a sample of more than 100 is generally excessive.
  2. What is the scope of the question or phenomenon you are exploring? The narrower the question being explored or the phenomenon being studied, the smaller your sample size can be. Are you looking at one program, or just one aspect of a program? Or are you comparing programs or looking at many different aspects of a program?
  3. At what point will you reach redundancy? This is key for determining sample size for any qualitative data collection method.  You want to sample only to the point of saturation—that is, stop sampling when no new information emerges.  Another way to think about this is that you stop collecting data when you keep hearing the same things again and again.  To be clear, I’m talking about big trends here—while each interview will have its own nuance and the small details might vary from interview to interview, you can stop when the larger trends start to repeat themselves and no new trends arise.
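
For anyone who likes to keep a running tally while interviewing, the redundancy question above can be sketched in a few lines of Python. This is only an illustration, not a procedure we follow at RK&A: the function name `saturation_point`, the theme labels, and the choice of three "quiet" interviews in a row are all assumptions made for the example, and judgment about what counts as a big trend still belongs to the researcher.

```python
def saturation_point(interview_themes, run_length=3):
    """Return the number (1-based) of the interview at which `run_length`
    interviews in a row have added no new themes, or None if that never happens.

    `run_length=3` is an arbitrary, hypothetical threshold.
    """
    seen = set()   # every big-picture theme heard so far
    quiet = 0      # consecutive interviews that added nothing new
    for number, themes in enumerate(interview_themes, start=1):
        new_themes = set(themes) - seen
        seen |= new_themes
        quiet = 0 if new_themes else quiet + 1
        if quiet == run_length:
            return number
    return None


# Hypothetical coded themes from five interviews.
coded = [
    {"hands-on", "family time"},
    {"family time", "crowding"},
    {"hands-on"},
    {"crowding"},
    {"family time"},
]
print(saturation_point(coded))
```

With these made-up codes, the last three interviews repeat themes already heard, so the tally flags the fifth interview as a reasonable place to consider stopping.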

The question of “how many” for qualitative studies might always feel a bit frustrating, since (as illustrated by the questions above) the answer will always be “it depends.”  But remember, as the word “qualitative” suggests, it’s less about exact numbers and more about understanding the quality of responses, including the breadth, depth, and range of responses.  Each study will vary, but as long as you consider the questions above the next time you are deciding on sample size for qualitative methods, you can be confident you’re approaching the study in a systematic and rigorous way.


Sampling is a very important consideration for all types of data collection. For audience research and summative evaluations in particular, it is important that the sample from which data is collected represents the actual population. That is, the visitors who participate in a questionnaire or interview should match the entire population of visitors. For instance, if the population of program visitors is 75% female, the sample should include approximately the same percentage of females. When the study sample closely matches the museum’s visiting population, the sample has external validity. And when there is external validity, we can draw conclusions from a study’s results and generalize them to the entire population.


There are several protocols RK&A follows to work towards external validity.  First, to select study participants, we use a random sampling method, and most often, a continuous random selection method.  To follow the method, we instruct data collectors to position themselves in a designated recruitment location (e.g., museum or exhibition exit) and ask them to visualize an imaginary line on the floor.  Once they are in place, we instruct data collectors to select the first person who crosses the line.  If two people cross the line at the same time, we ask data collectors to select the person closest to them.  After the data collector finishes surveying or interviewing the selected person, the data collector returns to their recruitment location and selects the very next person to cross the line.  It is important for data collectors to follow this protocol every time so as not to introduce bias into the sample.  For instance, data collectors should not move the imaginary line or decide to delay recruiting because the person crossing the line looks unfriendly.


Second, we record observable demographics (e.g., approximate age) and visit characteristics (e.g., presence of children in the group) of any visitor who is invited to participate in the study but declines.  We also record the reason these recruited visitors provide for declining (e.g., parking meter is about to run out).  These data points are important to confirm or reject the external validity of the sample because we compare demographic and visit characteristics of those who participated in the study to the demographic and visit characteristics of those who declined participation.  While the data points for comparison are limited, they are still informative.  For instance, a trend we have observed is that visitors 35 – 54 years are most likely to decline participation, so their voices are often underrepresented.  The same goes for visitors with children, which may be a subset of those in the 35 – 54 year age group; they are often underrepresented in visitor studies.  Knowing where your sample may be lacking is important context when interpreting the results.
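
The comparison between participants and decliners described above can be sketched as a simple tally. The counts below are invented for illustration (they are not RK&A data), and the function name `decline_rates` and the age bands are assumptions; the point is only to show how a recruitment log makes underrepresented groups visible.

```python
def decline_rates(participants, decliners):
    """Share of recruited visitors in each group who declined to take part.

    Both arguments map a group label to a count of recruited visitors.
    """
    rates = {}
    for group in participants:
        recruited = participants[group] + decliners.get(group, 0)
        rates[group] = decliners.get(group, 0) / recruited if recruited else 0.0
    return rates


# Hypothetical recruitment log, tallied by approximate age.
participants = {"18-34": 60, "35-54": 45, "55+": 55}
decliners = {"18-34": 15, "35-54": 40, "55+": 20}

rates = decline_rates(participants, decliners)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} declined")
```

A group with a noticeably higher decline rate than the others is one whose voice is likely thin in the final sample, which is exactly the context you want on hand when interpreting results.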


For these two reasons, we aim to systematically recruit visitors for audience research and evaluation studies. Even for studies that use standardized questionnaires, we hire data collectors who use a random selection protocol to recruit participants and track information about those who declined. As such, we do not recommend using survey kiosks to collect data, since visitors self-select to complete the survey and cannot be compared to those who decided not to complete it (and if you think kiosks may be preferable because you could boost the number of surveys collected, see my earlier post on sample sizes). As always, there are exceptions to the general rules described above. Yet our goal is always to use protocols that promote external validity as well as document threats to it…because what you don’t know can hurt you.




Sample size is a standard question we are asked, particularly for questionnaires, since we will be using statistical analyses. For most audience research projects, we recommend collecting 400 questionnaires. We are not alone in this general rule of thumb: 400 is considered by some researchers (and market researchers in particular) to be the “magic number” in the world of sample sizes. What makes 400 magical is that it is the most economical number of questionnaires to collect (from most populations) while keeping the margin of error at ± 5% (and the confidence level at 95%). A sample size of 400 questionnaires keeps the cost of the research down while still allowing us to have high confidence in the results.


To dive into this issue deeper, let’s talk about the three primary factors to think about when deciding on a sample size: (1) population; (2) confidence level; and (3) margin of error. Population is the number of people in the group from which you are sampling. For instance, your population may be the number of annual visitors to the Museum, members, or visitors to a specific exhibition or program. A fact that is often enlightening and counter-intuitive is that population does not have a proportional relationship to sample size. To demonstrate this, follow my calculations by trying out one of the many sample size calculators available on the web. Let’s start by determining a sample size for surveying the National Gallery of Art, which reported nearly 4 million visits in 2014 (3,892,459 to be exact). Using a margin of error of ± 5% and a 95% confidence level, the sample size suggested is 385. By comparison, the sample size suggested for The Phillips Collection, which welcomed 106,154 exhibition visitors in 2014, is 383. Despite vastly different sized visiting populations, the recommended sample size for each museum differs by just two! This example demonstrates that sample size is not proportional to the population, and also that having an estimate of your population is often sufficient to determine a sample size (unless you are determining a sample size for a program with small attendance or another small population).
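
If you would rather see the arithmetic than trust a web calculator, here is a minimal sketch. It assumes the calculators use the common textbook approach (Cochran's formula with a finite population correction, the most conservative proportion of 50%, and a z-score of 1.96 for 95% confidence); the function name `sample_size` and its parameters are chosen for this example.

```python
import math


def sample_size(population, margin_of_error=0.05, z=1.96, p=0.5):
    """Suggested number of completed questionnaires.

    Assumes Cochran's formula with a finite population correction;
    z=1.96 corresponds to a 95% confidence level, p=0.5 is the
    most conservative assumed proportion.
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Adjust downward for the actual (finite) population.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(3_892_459))  # National Gallery of Art, 2014 visits -> 385
print(sample_size(106_154))    # The Phillips Collection, 2014 exhibition visitors -> 383
```

Under these assumptions the sketch reproduces the two figures quoted above, and it makes the reason visible: once the population is in the tens of thousands, the correction term barely changes the result.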


Confidence level and margin of error (or confidence interval), as you might expect, indicate the level of confidence or how “sure” you are about the results of the questionnaires. Here, the researcher has to make a choice about an appropriate confidence level and margin of error based on how the data will be used. At RK&A, we generally plan for a margin of error of ± 5% and a confidence level of either 90 or 95%, because that provides enough confidence in the data given how our museum clients use the data to make institutional decisions. If we were working with a medical professional making life-or-death decisions, we would want to be more confident in the results (thus, a lower margin of error and higher confidence level). So why not plan to be as confident in the results as possible (regardless of how they are used)? Money. Confidence comes at a cost because, like population and sample size, the relationship between sample size and margin of error is not proportional. For instance, see the graph below based on the population reported above for the National Gallery of Art. Notice that the slope of the line is steepest on the left side of the graph and more gradual on the right side. This shows the law of diminishing returns at play. There are great benefits when moving from a sample of 200 to 400 (the margin of error shrinks by about 2 percentage points), but the benefits are not nearly as great when moving from a sample of 400 to 600 (the margin of error shrinks by less than 1 percentage point). Thus, returning to our initial point, collecting more than 400 questionnaires is rarely prudent, since the cost of data collection goes up disproportionately to the reduction in the margin of error. For our museum clients, we do not think that increase in confidence justifies the extra costs.

[Chart: margin of error by sample size, based on the National Gallery of Art visiting population]
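
The diminishing returns can also be checked with a few lines of Python. This sketch assumes the standard margin-of-error formula at a 95% confidence level (z of 1.96) with a 50% proportion, and it ignores the finite population correction, which is negligible for a population of nearly 4 million; the function name `margin_of_error` is chosen for this example.

```python
import math


def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error (as a proportion) for a simple random sample of size n.

    Assumes a large population, so no finite population correction is applied.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (200, 400, 600):
    print(f"n = {n}: ± {100 * margin_of_error(n):.1f}%")
```

Doubling the sample from 200 to 400 buys roughly two percentage points of precision; adding another 200 questionnaires buys less than one.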

I would be remiss to end this post without a footnote. While 400 is our rule of thumb for audience research data being collected through a standardized questionnaire, there are certainly many considerations and reasons why 400 might not be the magic number in every case. We joke that the response to any methodological question is the often frustrating retort: “It depends.” Sample size is no different: it depends.


There is a reason behind every methodological decision we make as evaluators. While we give great thought to our evaluation design, our thinking is not always transparent.


We have decided to pull the curtain back on our thinking in a new blog series that we are calling “Evaluation Design: A Peek Behind the Scenes.” Our goals are to reflect deeply on our practice (and maybe even question the way we do things), make our evaluation decisions transparent to our museum colleagues, and challenge our evaluator colleagues to reflect on and document their own practices. Check in tomorrow for the first post in the series!


Also, if you have ever wondered why evaluators do the things they do, send us your queries! Our hope is that these posts can provide relevant insight into how we think as evaluators.
