It’s the Sample

Ziad Salim, a retired international civil servant

The Jakarta Post, July 15, 2014
In a previous article published in this newspaper (“Samples and souls: hazards of election prediction in Indonesia”, April 24), the difference between surveys and quick counts, and the often less than accurate results of the former, was clarified.
Surveys use hypothetical voters in their samples while quick counts use real voters; hence, the latter are more accurate.
But now we have a case where even the quick counts are disagreeing with one another.
First, the culprit is not the methodology or the statistics; all the outfits use the same, correct methodology and statistical techniques.
If the results all point the same way (or move in one direction), the difference in percentage points can be interpreted in light of the margin of error and the confidence level.
If the differences are very small (say 50 percent for one candidate against 48 percent for the other, with a margin of error of plus or minus 2 percent), it simply means the two estimates overlap and no valid inference about the winner can be drawn, that is all.
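The arithmetic behind that overlap can be sketched in a few lines (a minimal illustration only; the 95 percent confidence level and the sample size of 2,000 are assumptions, not figures from the article):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

n = 2000                  # assumed sample size, not a figure from the article
a, b = 0.50, 0.48         # the article's 50% vs. 48% example
moe_a = margin_of_error(a, n)
moe_b = margin_of_error(b, n)

# If the two confidence intervals overlap, the poll cannot tell the candidates apart.
overlap = (a - moe_a) <= (b + moe_b)
print(f"margin of error: about ±{moe_a:.1%}; intervals overlap: {overlap}")
```

With these assumed numbers the margin of error comes out near the article’s 2 percent, and the two intervals do overlap, which is exactly why no winner can be declared from such a result.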
But if the results are topsy-turvy, as they are now (with half of the survey results pointing to pair Number 1 and the rest to pair Number 2), then the culprit almost always lies in the samples, which can be biased, under-representative or outright unrepresentative, and this can happen as the result of sample manipulation. Samples can be manipulated not only in many ways but very easily too.
At the extreme end, a geek with a laptop, alone in a room, can come up with a survey result based entirely on data mined from his own head.
A real example of this showed up a few weeks ago, when one clever person cut and pasted the survey results of the earlier Barack Obama and John McCain campaigns in the US and tacked them onto a CNN website, with many lapping it all up. Even though the hoax was eventually discovered, the damage had been done.
If the person wants to make his data look a little more legitimate, he can collect a few real data points from the field.
If he collects too few (say, fewer than 1,000 respondents), his sample will be under-representative, too small relative to the population.
But all the quick counts collected large samples for this election (between 2,000 and 8,000 respondents), which means the divergent results did not come from sample size.
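Why sample size cannot explain the split can be seen from the standard formula for a proportion’s margin of error, which shrinks with the square root of the sample size (a rough illustration; the 95 percent confidence level is an assumption):

```python
import math

def moe(n, p=0.5, z=1.96):
    # Worst-case (p = 0.5) margin of error at roughly 95% confidence
    return z * math.sqrt(p * (1 - p) / n)

for n in (1000, 2000, 8000):
    print(f"n = {n:5d}: margin of error about ±{moe(n):.1%}")
```

Even the smallest quick-count sample in this range yields a margin of only about two percentage points, far too tight to produce results that point at opposite winners.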
So, where did the divergence come from? First, it is telling that the survey outfits were split between those described as “generally credible” and those suspected of leaning toward one candidate, a candidate whose supporters have been accused of running “black” campaigns and of being willing to do anything to win.
So, what would stop them from manipulating their data, when it is almost impossible to tell which data is legitimate and which is not?
As stated above, the extreme form of data manipulation is where it is all cooked up in one room (the data is fictitious, but in a country where 600,000 “fictitious cows” could be imported from Australia, coming up with 2,000 “fictitious voters” would not be hard).
Second, data can be collected from electoral districts known to be populated by supporters of one group (so that, if you collect most of your data from West Java, the stronghold of pair Number 1, you are bound to come up with pair Number 1 as the winner, due to a biased sample).
If you want to be more subtle and make your data manipulation a little less detectable, you will collect your data from some remote areas too (say, in Papua), but only from locations where you know there are many supporters of pair Number 1 (and this is easy to ascertain from the work of party members, field observers and election officials, who can easily be paid if not bought).
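The effect of such geographically skewed sampling can be sketched with a small simulation (all numbers here are hypothetical, chosen only to illustrate the mechanism, not taken from any real poll):

```python
import random

random.seed(42)

# Hypothetical electorate: two regions of equal population with opposite
# leanings, so the true national split is 50/50.
regions = {
    "stronghold_1": {"share_pair1": 0.70},  # 70% support pair Number 1
    "stronghold_2": {"share_pair1": 0.30},  # 30% support pair Number 1
}

def poll(allocation, n=2000):
    """Simulate an n-voter poll given a per-region share of the sample."""
    votes_pair1 = 0
    for region, frac in allocation.items():
        k = int(n * frac)
        p = regions[region]["share_pair1"]
        votes_pair1 += sum(random.random() < p for _ in range(k))
    return votes_pair1 / n

fair   = poll({"stronghold_1": 0.5, "stronghold_2": 0.5})  # matches population
biased = poll({"stronghold_1": 0.8, "stronghold_2": 0.2})  # over-samples region 1
print(f"fair sample:   pair 1 at about {fair:.1%}")
print(f"biased sample: pair 1 at about {biased:.1%}")
```

With an even allocation the simulated poll lands near the true 50 percent; over-sampling the stronghold pushes the estimate toward roughly 62 percent, even though every individual interview in the sample is real.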
The above possibilities explain why it is so easy to conduct a biased survey and to tailor the results to whatever somebody has ordered.
The claim by some leaders of bogus survey outfits that payment by the party ordering the survey does not influence the result is an outright lie, a form of public deception.
Even if you bare all your data (as one head of a suspected survey group challenged others to do), no one will be able to find any irregularity in it, especially if the field reports were sent to the survey centers by electronic means such as text messages via cell phones and the like, all unverifiable and subject to manipulation.
It is telling that during the legislative election last April, when there was no national polarization in the choice of candidates or parties, all the survey results had their ducks in a row.
As for the divergent results in the recent presidential election (with only two, polar-opposite pairs running, one of them determined to win by hook or by crook), it is not difficult to believe that the survey groups, too, had taken sides and that data manipulation was the route they took: not so much to convince the public of who won the election as to create uncertainty, sow doubt and muddy the waters in which they hope to do their dirty fishing.
In the end, while the presence of survey institutes is legitimate in supporting democracy, by providing early results so the powers that be can no longer manipulate elections, this time, sadly, some of them have entered the fray to subvert the process.
The problem is not that the people were split down the middle and could not, or did not, cleanly make up their minds about whom to support in this election, but that the survey groups themselves split in two, one side actively taking sides, manipulating its samples, hiding behind a scientific statistical approach and taking advantage of democracy to subvert democracy itself, all while assuming we are stupid, when in fact it is their samples that are stupid.