The Statistical Validity of The 2015 GPC Member Survey

Open Letter To The Board, Nextdoor, September 2017

To The Tahoe Donner Board of Directors:

I understand that the statistical validity of the survey the General Planning Committee commissioned in 2015 may be a topic of discussion at an upcoming Board of Directors meeting. I have had the opportunity to review that survey’s methodology and its crosstab data, and though I do have a specific concern about two questions, overall I believe the survey is statistically valid and quite robust. I see no reason for the Board to discount it, and I believe it would be unreasonable to ignore it.

I became interested in this topic because I read and participate on Tahoe Donner’s Nextdoor forum. The 2015 survey comes up occasionally in discussion there, and a vocal few Nextdoor participants believe the survey is biased. I have a decent (if rusty) knowledge of statistics, and some minor experience with political survey design from many years ago. Remembering enough to judge the basics, I was curious why some thought the survey biased. More than once I asked those vocal few to enumerate the specific biases they saw, but my inquiries went unanswered. Finally, Jeff Connors passed along some concerns that members had conveyed to him.

As he had heard it, members were concerned about bias in two forms. First, because the survey was not sent to all homeowners, there was concern about sample bias. Second, there was concern about question bias, specifically in questions C1 and C2, because they surveyed a predetermined list of improvement priorities instead of asking members to name their priorities in an open-ended fashion. That list, it was believed, privileged specific projects. My interest piqued, I contacted Michael Sullivan at the GPC, who provided me with the survey’s crosstab data.

Sample Size

The fact that the survey was not sent to 100% of Tahoe Donner homeowners, and was certainly not completed by 100% of Tahoe Donner homeowners, is not a legitimate statistical complaint. The point of a statistical survey is to study a whole population by means of a sample population. There is no need to contact 100% of a population to derive robust, statistically valid data. Hence, the legitimate question is not whether 100% of Tahoe Donner members received or took a survey, but rather whether the survey polled a random sample of Tahoe Donner members in sufficient quantities to be statistically valid.

For the purposes of the 2015 survey, the whole population was composed of approximately 6,350 individual Tahoe Donner properties. The research company that conducted the survey originally set out to sample 500 to 1,000 properties. However, according to the December 2015 TD News, “The response rate was approximately five times the normal rate for online surveys,” and they well exceeded their quota. In the end they surveyed 1,490 individual properties. There is zero doubt that they sampled a sufficient quantity of properties to yield statistically robust data. Given the population size and the sample size, the math suggests that we should be 99% confident the true figure is within plus or minus 3 percentage points of the survey results. This, of course, assumes the sample was random, and that no bias was introduced through the questions.
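For anyone who wants to check that arithmetic, here is a minimal sketch of the standard margin-of-error calculation, with a finite population correction since the sample is a large share of the population. The 6,350 and 1,490 figures come from the survey; the 50% proportion is the conventional worst-case assumption.

```python
import math

N = 6350   # approximate number of Tahoe Donner properties (whole population)
n = 1490   # individual properties actually surveyed (sample)
z = 2.576  # z-score for a 99% confidence level
p = 0.5    # worst-case proportion, which maximizes the margin of error

# Standard margin of error, shrunk by a finite population correction because
# the sample is a sizable fraction of the population.
fpc = math.sqrt((N - n) / (N - 1))
moe = z * math.sqrt(p * (1 - p) / n) * fpc

print(f"Margin of error at 99% confidence: +/- {moe:.1%}")  # ~ +/- 2.9%
```

The result, roughly plus or minus 2.9%, rounds to the 3 percentage points cited above.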

Sample Bias

Quantity alone is not enough. The sample must also be random. Provided the sample is large enough, as it certainly was in this case, random selection ensures that the sample is also statistically representative of the whole. Was the sample random, or at least sufficiently random to give us reasonable confidence that it was not biased in any significant fashion?

To think about this question we should examine the methods the research company used to contact Tahoe Donner property owners. Often this survey is termed an email survey, but this is not actually true. It was an online survey that solicited participants from three distinct sources. First, members were invited to participate in the online survey via an email announcement and link. Second, members were directed to participate in the online survey via various Tahoe Donner member outreach platforms, including the Tahoe Donner website, the TD News, and informational monitors at the amenities. Third, Tahoe Donner personnel conducted in-person intercept interviews with members at various amenities.

The email outreach efforts were wildly successful, leading to 1,042 interviews. Outreach via the website, TD News, and other platforms was also quite successful, adding 440 interviews. The intercept method was largely a failure, yielding a negligible 8 interviews.

How confident can we be that these methods yielded an essentially random, and thus statistically valid sample?

Tahoe Donner does not have email addresses for every single property within the Association. For statistical purposes this is perfectly acceptable, provided there is no systematic difference between those properties with and without email addresses known to the Association. While it might be reassuring to know that the survey reached most if not all properties, in the absence of clear evidence to the contrary, I would expect that properties with known email addresses do not differ in any systematic fashion from properties without known email addresses. As such, I would expect the sample derived via email contact was effectively random.

Do property owners who read the TD News, use the Tahoe Donner website, or view the other outreach platforms differ systematically from property owners who do not? Here again, in the absence of clear evidence to the contrary, I would expect these property owners do not differ systematically from other property owners. With these efforts as well, I would expect outreach was effectively random.

If, however, the email and online samples were each biased in some way that we are unable to detect, it is highly improbable that both would have been biased in the same way. Combining two large samples derived from completely different outreach methods therefore also guards against sample bias.

We can also take reassurance from the fact that the researchers took steps to ensure that individual properties were surveyed only once, which also would have served as a safeguard against sample bias.

Is there any evidence at all for sample bias in the data? The best evidence I can find is that the survey may have slightly oversampled full-timers. Even here, however, I would urge caution. Question D2 asked survey takers to describe their use of their Tahoe Donner property, and it gave them three options: full-time, part-time, own a lot. In their responses, 77% self-identified as part-timers, 20% as full-timers, and 3% as lot owners. At the time it was believed that only 14% of Tahoe Donner properties were full-time residences. However, anecdotal experience tells me that not everyone uses the terms full-time and part-time in the same way. Indeed, I have met a number of Tahoe Donner members who say they are “full-time,” but they really mean they are full-time in the summer; Tahoe Donner is their summer, full-time residence. Are those full-timers or part-timers? I might say they are part-timers, but if you surveyed them in the summer, I would expect they would self-identify as full-timers.

So I do not entirely trust self-reporting about full-time and part-time occupancy. My doubts about self-reporting on this point make it difficult to say for certain whether the researchers truly over-sampled full-timers, or whether there is a small population of objectively part-time owners who prefer to identify themselves as full-timers. These doubts aside, the researchers did weight certain findings (namely the ranking of investment priorities) to “correct” for this possible over-sampling. (In these rankings the researchers appear to have counted lot owners as full-timers, for a total of 17% “on the hill.”) Provided the samples of part-timers and full-timers were random, this is a defensible approach.
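To make that weighting concrete, here is a minimal sketch of post-stratification weighting. The 23% and 17% “on the hill” shares come from the survey responses and the researchers’ rankings; the “very important” rates are hypothetical, invented purely to show the mechanics of re-weighting.

```python
# A minimal sketch of post-stratification weighting. The group shares are from
# the survey; the example rating percentages are hypothetical, invented only
# to show how re-weighting changes a result.

survey_share = {"on_the_hill": 0.23, "part_time": 0.77}  # 20% full-time + 3% lot owners
true_share   = {"on_the_hill": 0.17, "part_time": 0.83}  # believed demographics

# Hypothetical share of each group rating some improvement "very important".
rated_very_important = {"on_the_hill": 0.60, "part_time": 0.40}

# The unweighted estimate mirrors the (possibly skewed) sample composition.
unweighted = sum(survey_share[g] * rated_very_important[g] for g in survey_share)

# The weighted estimate re-scales each group to its believed population share.
weighted = sum(true_share[g] * rated_very_important[g] for g in true_share)

print(f"Unweighted: {unweighted:.1%}")  # 44.6%
print(f"Weighted:   {weighted:.1%}")    # 43.4%
```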

Aside from the possible over-sampling of full-timers, there is no evidence to suggest that the samples were biased. Further, even if full-timers were over-sampled, that over-sampling was easily corrected by weighting the data according to known demographics, which is precisely what the researchers did to rank investment priorities.

Note: There will always be some selection bias in the specific sense that surveys like this one ask people to spend 10 minutes filling out a questionnaire, and there are many who simply will not do that. But are such people otherwise systematically different from those who willingly participate? Perhaps, but we have no way of knowing if they are, and if they are different, we have no way of knowing how they are different. Here as well, in the absence of clear evidence to the contrary, we must accept that the sample was sufficiently random to yield statistically valid results.

Question Bias

Were any of the questions formulated or sequenced in a way that might bias responses? I have read through the entire survey several times, and I see no obvious bias from the sequencing or the formulation of questions, except for question C1 and its related question, C2. C1 asked respondents to rate various improvements on a scale from very important to not important at all. In C2 respondents were asked to rank the improvements that they rated “very important” in C1. They could rank up to three “very important” improvements: most important, second most important, and third most important. The researchers combined responses in C1 and C2 to develop an overall, weighted ranking for the projects listed.
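The materials I reviewed do not document the exact weighting scheme, so the sketch below is purely hypothetical: one common way to combine a rating question and a capped ranking question into a single weighted score. The project names, counts, and point values are all invented.

```python
# A purely hypothetical illustration of combining a rating question (C1) with
# a capped ranking question (C2) into one weighted score. The scheme the
# researchers actually used is not documented in the materials I reviewed.

projects = {
    "Project A": {"very_important": 700, "rank1": 300, "rank2": 150, "rank3": 100},
    "Project B": {"very_important": 650, "rank1": 200, "rank2": 250, "rank3": 120},
}

# One common convention: descending points for first/second/third place,
# plus a smaller credit for each "very important" rating.
POINTS = {"rank1": 3, "rank2": 2, "rank3": 1, "very_important": 0.5}

scores = {
    name: sum(POINTS[key] * counts[key] for key in POINTS)
    for name, counts in projects.items()
}
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score}")  # Project A: 1650.0, Project B: 1545.0
```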

From what I gather from Jeff Connors, the complaint against C1 seems to be that it privileged 13 possible improvements to the exclusion of all others. Some noticed, for instance, that improvements to the golf course, undoubtedly a major Tahoe Donner amenity, were excluded from the list of possible improvement priorities.

However, I have read through old GPC minutes and made a few inquiries to current and former GPC members, and I believe I understand how and why these particular projects populated this list. Question C1 was not intended to survey members generally about what Tahoe Donner’s improvement priorities should be. Nor was it intended to suggest new projects to the GPC. Rather, it was intended to survey members specifically about improvements that were either in the GPC pipeline at various levels of development and seriousness (items 1-9) or had previously been suggested to the GPC by members (items 10-13). If improvements to the golf course were not featured on this list, that was simply because no improvements to the golf course were being considered at any level by the GPC when the survey was created.

This is not question bias. If survey respondents believed Tahoe Donner’s improvement priorities should lie elsewhere, or that Tahoe Donner should not be investing in improvements at all, they could have responded “Not important at all” to each of the improvements listed. Further, members did have an opportunity to suggest new projects in questions B2 and D10, which were both open-ended.

While I do not agree that question C1 was biased because it surveyed only projects under a degree of consideration by the GPC, I do believe that the data gathered in C1 and C2 might have been subject to a social desirability bias. Social desirability bias is a type of survey response bias in which respondents tend to tailor their answers in ways they believe will reflect favorably on themselves, both in their own eyes and in the eyes of others.

Options 3 and 4, invest in clean energy and conserve water, were quite unlike the other options in the list. The rest of the list pertained to recreation. (Even option 8 noted the recreational nature of Tahoe Donner’s adjacent open spaces.) Options 3 and 4 had no bearing on recreation. They were relatively non-controversial, non-amenity specific options pertaining to environmental responsibility. The inclusion of these options in the list might have biased the data against the listed amenity improvements in two ways.

First, it might have depressed the ratings of leisure-oriented amenity improvements in C1, because respondents may have been reluctant to rate leisure activities as more important than, or even as important as, environmentally responsible investments. For example, compared with environmentally responsible improvements like conserving water and investing in clean energy, is improving the leisure experience at the Marina more or less important? A concern with social desirability would likely lead respondents to rate the leisure improvement lower. Further, respondents were primed by the drought to rate water conservation highly, a possible bias that the researchers themselves noted.

The phrasing of the question was also problematic on this point: “Please rate how important each of these is to you.” Many do not want to appear to be the kind of person who is not environmentally conscious or water conscious, especially during a drought. The “to you” in the question invited precisely that sort of personal self-evaluation. I would be interested to know whether people who received one or both of those options early in the random order tended to rate amenity improvements lower than those who received them later. My suspicion is that they did, because they did not want to be seen by themselves or others as the kind of people who rate leisure higher than environmental responsibility. Such data, however, was not retained.
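Had the presentation-order data been retained, the check would have been simple. Here is a sketch with entirely invented data, assuming each respondent’s record included the earliest position at which option 3 or 4 appeared in their random order:

```python
# Sketch of the order-effect check described above. All data here is invented;
# the real presentation-order data was not retained.

# Each tuple: (earliest position of option 3 or 4 in the random order,
#              that respondent's mean rating of the amenity improvements, 1-5)
respondents = [(1, 3.2), (2, 3.4), (10, 3.9), (12, 3.8), (3, 3.3), (11, 4.0)]

early = [rating for pos, rating in respondents if pos <= 6]  # saw them early
late  = [rating for pos, rating in respondents if pos > 6]   # saw them late

def mean(values):
    return sum(values) / len(values)

print(f"Mean amenity rating, early group: {mean(early):.2f}")
print(f"Mean amenity rating, late group:  {mean(late):.2f}")
# A real analysis would apply a two-sample significance test rather than
# eyeballing the difference.
```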

Finally, despite assurances at the beginning and end of the survey that responses would be kept anonymous, from the respondent’s perspective this survey was not at all anonymous. Participant names and contact information were taken for the purpose of dispensing prizes. While the lure of prizes almost certainly increased survey participation, the attendant lack of anonymity might have exacerbated a social desirability bias on questions C1 and C2.

Second, the inclusion of these socially desirable environmental improvements in the C1 list may have displaced, and therefore depressed, rankings for amenity improvements in C2. Possibly because of their high social desirability, many people rated options 3 and 4 “very important” in C1. Only improvements rated “very important” in C1 were eligible for ranking in C2. Here again, social desirability bias may have contributed to respondents ranking environmentally responsible improvements higher than leisure improvements. Further, because respondents were allowed to rank no more than three “very important” improvements, the inclusion of these socially desirable options may have crowded an unknown number of leisure amenity improvements out of the C2 rankings, biasing both those rankings and the derived weighted ranking against amenity improvements.
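To make the displacement mechanism concrete, here is a small simulation under invented assumptions: respondents rate thirteen options “very important” with the probabilities shown, then rank at most three of them. The probabilities are hypothetical; the point is only that two popular non-amenity options can crowd amenity improvements out of a three-slot ranking.

```python
import random

random.seed(0)

# Hypothetical probabilities that a respondent rates an option "very important".
ENV_OPTIONS = {"clean energy": 0.6, "conserve water": 0.7}
AMENITY_OPTIONS = {f"amenity {i}": 0.4 for i in range(1, 12)}

def avg_amenities_ranked(trials=10_000, env_included=True):
    """Average number of amenity options that make a respondent's 3-slot C2 list."""
    total = 0
    for _ in range(trials):
        pool = dict(AMENITY_OPTIONS)
        if env_included:
            pool.update(ENV_OPTIONS)
        # Options this respondent rated "very important" in C1.
        very_important = [name for name, p in pool.items() if random.random() < p]
        random.shuffle(very_important)   # no preference order assumed
        top3 = very_important[:3]        # C2 allowed at most three rankings
        total += sum(1 for name in top3 if name.startswith("amenity"))
    return total / trials

print(f"With options 3 and 4 included:  {avg_amenities_ranked(env_included=True):.2f}")
print(f"Without options 3 and 4:        {avg_amenities_ranked(env_included=False):.2f}")
```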

For all these reasons, I am concerned that C1 and C2, along with the weighted rankings derived from them, were biased against improvements to the listed recreational amenities. Critically, however, concern is not proof. It is not even evidence. We cannot know how members would have rated and ranked recreational amenity improvements without those two socially desirable options on the list. And we cannot know whether social desirability truly played a role in the popularity and ranking of water conservation and alternative energy improvements. One lesson we can draw, however, is that future surveys should take care to avoid social desirability bias by dealing with leisure and recreational improvements separately from environmental improvements.

Conclusion

How should we assess this survey, and how much weight should we give its findings? The sample size was tremendous, and there is no real evidence that the sample composition was biased in any significant fashion. To the extent that it may have been slightly skewed toward full-timers, steps were taken to account for that in the rankings the researchers produced. With the possible exception of C1 and C2, the questions were not obviously biased, and even for C1 and C2 we merely have concerns about bias, not proof or evidence. Lacking any proof or evidence of bias, I must conclude that the survey is both valid and robust.

All surveys have flaws, and if you were to show me a perfect survey, I would tell you to look harder for its flaws. We cannot expect or demand perfection from surveys of this sort, and if we do expect or demand perfection, then we should not waste time and resources conducting them.

Given the absence of evidence proving bias, it would be unreasonable, irresponsible, and indefensible to invalidate or discount the survey’s findings. Further, it would be an insult to the Tahoe Donner members who made the effort and took the time to express their views through this survey, and a dereliction of the Board’s responsibility to listen seriously to the voice of the members, no matter how those members communicate – even when they write long and tedious letters.

What conclusions should we draw from this survey? The survey suggests that Tahoe Donner members are devoted to their trails and open spaces. They are environmentally conscious and would like to be environmentally responsible. Trout Creek Recreation Center and the Marina both have substantial constituencies, perhaps even larger than the survey’s results suggest. There is also a substantial, and perhaps underestimated, constituency for expanding Tahoe Donner’s amenities for children, possibly by upgrading the Northwoods Clubhouse. Tahoe Donner members are not, however, particularly interested in starting a community garden or building sports fields. Members are also in near-unanimous agreement with the Association’s vision statement, and have a genuine desire to see the existing amenities and assets well maintained and improved.

Finally, we would be wise to remember that surveys are not shortcuts through the difficult work of gathering and arraying information to make an informed and prudent decision. Their findings should always be considered alongside a wider variety of facts, opinions, reports, anecdotes, inferences, and considerations.