Professor '''John Brignell''' held the Chair in Industrial Instrumentation at the University of Southampton (UK) from 1980 to the late 1990s. [http://www.ecs.soton.ac.uk/~jeb/cv.htm]

Brignell retired from his academic career in the late 1990s and now devotes his time to his interest in debunking the use of what he claims to be false statistics common in much of today's media. He presents his views on his website ''Numberwatch'', which was launched in July 2000 and is "devoted to the monitoring of the misleading numbers that rain down on us via the media. Whether they are generated by Single Issue Fanatics (SIFs), politicians, bureaucrats, quasi-scientists (junk, pseudo- or just bad), such numbers swamp the media, generating unnecessary alarm and panic. They are seized upon by media, hungry for eye-catching stories. There is a growing band of people whose livelihoods depend on creating and maintaining panic." [http://www.numberwatch.co.uk/number%20watch.htm]

Brignell has expressed delight at the "encouragement and support I have received from some of the giants of the pro-science movement in the USA -- in no particular order [[Steve Milloy]], [[Alan Caruba|Alan Coruba]] [''sic''], [[James Randi]], [[Bob Caroll]], [[Michael Fumento]] and [[S. Fred Singer]]." [http://www.numberwatch.co.uk/term%20end.htm]

A number of popular, politically correct theories are based on unsound mathematical evidence. In choosing to debunk such theories Brignell occasionally provokes the wrath of self-interested supporters. He seems to enjoy this.

== Statistical Significance (P<0.05) ==

When researchers conduct a study, they select a sample from a population and compare the value of a statistic found in the sample with that found in the population. They ask the question: "Is the statistic in the sample significantly different to that in the population?" If it is, they look for a reason why. What does significantly different mean? Researchers start with a set of numbers. They apply statistics to those numbers to turn them into a set of odds: the odds of getting those numbers by chance. If the odds are too long, they say "It's too unlikely to have happened by chance. Something must have caused it." How unlikely is too unlikely? That is set by the significance level. A typical significance level might be (P<0.01), which means that the probability of getting the result by chance alone is less than 1 in 100 (odds of more than 100 to 1 against).

Brignell suggests that one common source of error in experiments is the use of low levels of significance in statistical testing, particularly P<0.05.

If one applies a statistical test at a significance level of 0.05, one is accepting a probability of 0.05 of a false positive; that is, one is accepting a 1 in 20 chance that the result will appear significant when it isn't. Note that these odds apply to all tests carried out, not just the ones that return significant results. This has been called the 1 in 20 lottery.

This can cause problems when combined with publication bias. Studies that produce significant results tend to be published; ones that don't tend not to be. However, the number of false positives depends on the total number of studies carried out, not on the number published. In simple terms, if 1 in 2 studies produces a significant result, then 50% of studies will be published, of which 10% will be bogus. If 1 in 5 produces a significant result, then 20% will be published, of which 25% will be bogus. If 1 in 10 produces a significant result, then 10% will be published, of which 50% will be bogus. How many studies produce significant results? What percentage of published studies are bogus? The numbers are simply not known.
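
As a rough check of this arithmetic, the sketch below (a minimal illustration, not from Numberwatch; the 5% false-positive rate and the publication fractions are simply the assumptions stated above) computes what share of published studies would be false positives for different publication rates.

<pre>
# Minimal sketch: share of published studies that are false positives,
# assuming a 5% false-positive rate and that only "significant" studies
# get published. The figures are the article's illustrative assumptions.

FALSE_POSITIVE_RATE = 0.05   # significance level P < 0.05

def bogus_share(fraction_significant, n_studies=1000):
    """Fraction of published (significant) studies expected to be bogus."""
    published = n_studies * fraction_significant        # studies that found "something"
    false_positives = n_studies * FALSE_POSITIVE_RATE   # expected spurious findings
    return false_positives / published

for frac in (1/2, 1/5, 1/10):
    print(f"{frac:.2f} of studies significant -> "
          f"{bogus_share(frac):.0%} of published studies bogus")
# prints 10%, 25%, 50%
</pre>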

Another source of problems occurs when this level of significance is used in combination with categorisation. This occurs when the data in a study is broken up into a number of categories, and a statistical test is applied in each category. The effect is that of having multiple goes at the lottery: the more categories, the better the chance of producing a false positive. If the test is applied to 1 category, the odds of getting at least one bogus result are 1 in 20 (0.05). If there are 4 categories, the odds are nearly 1 in 5 (0.185). If there are 8 categories, the odds are about 1 in 3 (0.337). If there are 12 categories, the odds are about 9 in 20 (0.46). If there are 16 categories, the odds are about 5 in 9 (0.56). The odds continue to rise with the number of categories.
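
These probabilities follow from treating the categories as independent tests: the chance of at least one false positive among k tests at level 0.05 is 1 - 0.95^k. A minimal sketch of that formula (independence between categories is an assumption of the illustration; the 24-category value anticipates the chocolate example below):

<pre>
# Probability of at least one false positive among k independent tests
# at significance level 0.05: 1 - (1 - 0.05)**k.
# Independence between the categories is an assumption of this sketch.

ALPHA = 0.05

def p_at_least_one_false_positive(k):
    return 1 - (1 - ALPHA) ** k

for k in (1, 4, 8, 12, 16, 24):
    print(f"{k:2d} categories: {p_at_least_one_false_positive(k):.3f}")
# 1 -> 0.050, 4 -> 0.185, 8 -> 0.337, 12 -> 0.460, 16 -> 0.560, 24 -> 0.708
</pre>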

Combining these sources of error compounds them. If 100 studies are conducted at a significance level of 0.05, each categorising their data into 10 categories, and 50 of them are published, then 40 will be bogus and only 10 will have found any real significance.

If one hears that a study has been published that has "found a link between thyroid cancer and eating chocolate in women between the ages of 20 and 30 who eat more than three pieces of chocolate per day", how likely is it that the study is bogus? If one assumes that data was collected from a randomly selected sample of people, one could hypothesise that the data was categorised by sex (male, female), age (less than 20, 20 to 30, 30 to 40, over 40), and number of pieces eaten (none, 1 to 3, more than 3), for a total of 24 categories. This would suggest that the basic chance of the study being bogus is about 70%. Incorporating publication bias might raise it to over 90%. This is of course assuming that the level of significance used is 0.05, a safe assumption in most such studies.

Brignell suggests that the use of a significance level of 0.05 is inherently unsound. He suggests that "It is difficult to generalise, but on the whole P<0.01 would normally be considered significant and P<0.001 highly significant." [http://www.numberwatch.co.uk/significance.htm]

Brignell suggests: "Many leading scientists and mathematicians today believe that the emphasis on significance testing is grossly overdone. P<0.05 had become an end in itself and the determinant of a successful outcome to an experiment, much to the detriment of the fundamental objective of science, which is to understand."

== Fitting Linear Trends ==

Brignell states "One of the major problems in using a finite sequence of data to represent a source that is effectively infinitely long is that the process of chopping off the ends is a distortion. [...] The other is the fact that, even when there is no linear trend in the original process, there is always one in the finite block of data taken to represent it."

Picture the sine wave y=sin(x). It is a continuous curve centred on the x axis. The x-values are in radians, and the y-values range between -1 and 1. Let's take some samples from the curve and fit a straight line to each sample.

*Sample 1: The points are (0.44,0.42), (0.52,0.50), (0.61,0.57), (0.70,0.64) and (0.79,0.71), and the fitted line is y = 0.82x + 0.07
*Sample 2: The points are (0.79,0.71), (0.87,0.77), (0.96,0.82), (1.05,0.87) and (1.13,0.91), and the fitted line is y = 0.57x + 0.26
*Sample 3: The points are (1.13,0.91), (1.22,0.94), (1.31,0.97), (1.40,0.98) and (1.48,1.00), and the fitted line is y = 0.26x + 0.62
*Sample 4: The points are (1.48,1.00), (1.57,1.00), (1.66,1.00), (1.75,0.98) and (1.83,0.97), and the fitted line is y = -0.09x + 1.13
*Sample 5: The points are (1.83,0.97), (1.92,0.94), (2.01,0.91), (2.09,0.87) and (2.18,0.82), and the fitted line is y = -0.42x + 1.74

So, depending on which set of points is selected, the fitted slope is 0.82, 0.57, 0.26, -0.09 or -0.42: anything from a steep rise to a clear fall!
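
The slopes above can be reproduced with an ordinary least-squares fit. A minimal sketch (not from Numberwatch; it assumes the same 5-point samples, spaced 5 degrees apart in x, and uses numpy's polyfit for the straight-line fit):

<pre>
import numpy as np

# Fit a straight line to successive 5-point windows of y = sin(x).
# Window spacing (5 degrees, ~0.087 rad) matches the samples listed above.
step = np.deg2rad(5)
starts = [0.44, 0.79, 1.13, 1.48, 1.83]          # first x-value of each sample

for i, x0 in enumerate(starts, start=1):
    x = x0 + step * np.arange(5)                 # five equally spaced x-values
    y = np.sin(x)
    slope, intercept = np.polyfit(x, y, 1)       # least-squares straight line
    print(f"Sample {i}: slope {slope:+.2f}, intercept {intercept:+.2f}")
# slopes come out close to 0.82, 0.57, 0.26, -0.09, -0.42
</pre>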

A sine wave is a continuous, cycling curve; it has no linear trend. If one were to try to fit a line of best fit to it, the best one could fit would be the x-axis, y=0. Yet none of these samples has given that line. It should be obvious that, depending on which data points are used, it is possible to get just about any line of best fit that one could desire. The linear trend found is there, not because of any underlying cause, but because it is generated by the act of selecting the data set.

As an alternative example, consider a set of points generated at random. There can obviously be no linear trend to such data. And yet, a line of best fit can be applied to any subset of these points, and a linear trend deduced from it. Again, the linear trend found is there, not because of any underlying cause, but because it is generated by the act of selecting the subset.
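
The point that a fitted trend is essentially never exactly zero is easy to demonstrate. A minimal sketch (not from Numberwatch; the sample size of 20 points and the use of Gaussian noise are assumptions for illustration):

<pre>
import numpy as np

# Fit straight lines to pure noise: there is no trend in the process,
# but the fitted slope of any finite sample is essentially never zero.
rng = np.random.default_rng(0)   # fixed seed so the illustration is repeatable

x = np.arange(20)
for trial in range(5):
    y = rng.normal(size=20)              # random data, no underlying trend
    slope, _ = np.polyfit(x, y, 1)
    print(f"trial {trial}: fitted slope {slope:+.3f}")
# every slope is non-zero; it is a property of the sample, not of the process
</pre>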

These examples illustrate the "greatest hazard of trend estimation. The trend is a property of the data points we have and not of the original process from which they came. As we can never have an infinite number of readings there is always an error introduced by using a restricted number of data points to represent a process in the real world. Naturally the error decreases rapidly as we increase the number of data points, but it is always there and, in fact, the calculated trend is never zero".

It is not enough to fit a line to a set of points and declare a linear trend. One must understand the underlying data in order to know whether there actually is a real trend. It may simply be that the trend one sees is an artifact of the selection of data points, and doesn't actually exist in the underlying process.

Brignell's comments can be found here. [http://www.numberwatch.co.uk/Trends.htm]

== The End Effect ==

Brignell states "A major problem is the end effect, which relates to the huge changes in apparent slope that can be wrought just by the choice of where to start the data selection." "The reason for this is that in the calculation of the slope, the contribution of each data point is weighted according to its distance from the centre", so variation in the end points has much more effect on the final result than variation in the central points.

If we start with the nine points (1,1), (2,2), (3,3), ..., (9,9), the slope of the fitted line is 1. (Obviously.) If we replace the points (3,3) through (7,7) with (3,8), (4,8), (5,8), (6,8) and (7,8), the slope of the fitted line is 0.83. The change has flattened the line. If we instead replace only the end point (1,1) with (1,4), the slope of the fitted line is 0.80. Changing just one end point has a greater effect on the slope than changing five central points.
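
The three slopes can be checked with a least-squares fit; a minimal sketch (not from Numberwatch, just verifying the arithmetic above with numpy):

<pre>
import numpy as np

# Check how much the fitted slope moves when five central points versus one
# end point are perturbed (the three cases described above).
x = np.arange(1, 10)

cases = {
    "original":            np.arange(1, 10).astype(float),
    "five centre points":  np.array([1, 2, 8, 8, 8, 8, 8, 8, 9], dtype=float),
    "one end point":       np.array([4, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float),
}

for name, y in cases.items():
    slope = np.polyfit(x, y, 1)[0]
    print(f"{name:<20} slope = {slope:.2f}")
# original 1.00, five centre points 0.83, one end point 0.80
</pre>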

Brignell offers an example of the end point effect. [http://www.numberwatch.co.uk/2003%20May.htm#wheeze] The graph contains 15 data points displaying a strong linear trend. The removal of 2 points from either end removes the trend, so the trend is really only supported by those 4 data points.

Brignell's comments can be found here. [http://www.numberwatch.co.uk/Trends.htm]

== Trojan Numbers ==

The term Trojan number was coined by Brignell to describe a number used by authors to "get their articles or propaganda into the media." "The allusion is, of course, to the mythical stratagem whereby the Greeks infiltrated the city of Troy inside a giant wooden horse." The number looks impressive, but on further examination isn't.

Brignell states "The major form of Trojan Number is the size of study. Early on in the piece it will be mentioned that (to invent some arbitrary numbers) there were 60,000 people in the study. The number experiencing the condition in question, say toe-nail cancer, is, however, much smaller, perhaps 60. Of these the number indulging in the putative cause, say passive drinking, is even smaller say 20. There is a number expected (as a proportion of the 60) at random from knowledge of the statistics for the general population, say, 14. Thus the number that really matters, the excess number of cases, is half a dozen. It is surprising how often an apparently huge study whittles down to an excess that you can count on your fingers. If the number 6 had been mentioned at the outset, the claim would have been laughed out of court, so it is never mentioned, though you can often have a pretty good stab at deducing it. In the statistics of rare events an excess of 6 on an expectation of 14 would be unsurprising. The rest of the 60,000 are mere bystanders." In fact, finding an extra 6 would not be significant at P<0.05.
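
That last point can be checked with the square-root rule used later in the Poisson section; a minimal sketch (an illustration only, assuming Brignell's invented figures of 20 observed cases against 14 expected):

<pre>
import math

# Is an excess of 6 cases on an expectation of 14 significant at P < 0.05?
# For rare events the standard deviation is roughly sqrt(expected cases).
expected = 14
observed = 20

sd = math.sqrt(expected)              # ~3.74
threshold_95 = expected + 2 * sd      # ~21.5 cases needed for P < 0.05
print(f"sd = {sd:.2f}, two-sigma threshold = {threshold_95:.1f}, observed = {observed}")
# 20 observed < 21.5, so the excess of 6 is not significant at P < 0.05
</pre>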

Trojan numbers can be repeatedly presented in different ways. For example, "3.21% of passive drinkers are depressed" or "72.45% of women under 35 are unaware that passive drinking causes toe-nail cancer". In each case, after the headline and a couple of sentences, the body of the article is a repeat of the material already presented. The repeated presentation of the same material helps to lodge it into the public consciousness, as well as to raise the profile of the academic doing the research.

Brignell notes "One of the most effective forms of Trojan Number is the Virtual Body Count. Sub-editors cannot resist a headline Thousands to die of X." The body count is of course obtained by scaling the study's excess cases up to the size of the population. The US population is 295,734,134. For a study of 60,000 people, one extra case therefore equals a body count of 4,929. So six extra cases is 29,573 bodies. If the next study finds only three extra cases, the body count halves.
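
The scaling itself is just a ratio; a minimal sketch of the arithmetic (using the population and study figures quoted above):

<pre>
# Scale excess cases in a study up to a national "virtual body count".
US_POPULATION = 295_734_134
STUDY_SIZE = 60_000

scale = US_POPULATION / STUDY_SIZE       # ~4,929 bodies per excess case
for excess_cases in (1, 3, 6):
    print(f"{excess_cases} excess case(s) -> body count {excess_cases * scale:,.0f}")
# 1 -> 4,929; 3 -> 14,787; 6 -> 29,573
</pre>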

Brignell's comments on Trojan numbers can be found here. [http://www.numberwatch.co.uk/trojan_number.htm]

== Publication Bias ==

Brignell states: "Publication bias is a tendency on average to produce results that appear significant, because negative or near neutral results are almost never published." Researchers submit their work for publication if it produces significant results; if it does not, they put it in the drawer and go on to something else. Editors publish papers that show that something has occurred, not those that show nothing.

Brignell suggests: "It is possible to estimate the effects of publication bias, as long as we are prepared to indulge in a bit of guess work about the willingness of researchers to publish for various claimed levels of Relative Risk."

Assume that a rare disease (toenail cancer) affects one in every thousand people. Assume that a group of researchers decide to study the effect of a particular factor (passive drinking) on the incidence of the disease. Assume that the factor has no effect. If the researchers study 10,000 people, they would expect to see 10 cases of toenail cancer, which would correspond to a relative risk (RR) of 1.0. Of course, as the distribution of cases is random, there may be more or fewer cases, their number being governed by a probability density function.

Brignell continues: "Now, the guesswork part is to invent a plausible function that would represent the willingness of authors to publish. Observing the literature in general, we know that they are all willing to publish a RR of 2.0 but almost nobody will publish an RR of less than 1.1." He suggests a simple function that meets these guidelines.

Brignell multiplies the two functions together, ordinate by ordinate, to determine the probability density function for published data. The resulting function has an average value of 15.8. This corresponds to a relative risk of 1.58, or an increase of 58%.

To put it another way, researchers would expect to find about 10 cases of toenail cancer in a batch of 10,000 people. However, as the number is random, they could find any number. They have to find a minimum of about 16 cases before they will publish their data. So in reviewing publications one would expect to see an average of 15.8 cases, not 10. This is not due to any underlying linkage between toenail cancer and passive drinking, but simply due to the selectivity applied in choosing what to publish.
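
Brignell does not spell out his willingness function, so the sketch below is only an illustration of the mechanism, not his calculation: it assumes a simple linear ramp (zero willingness to publish below RR 1.1, full willingness above RR 2.0) and weights the Poisson distribution of case counts by it. With that assumed ramp the average of the published results comes out around 15 cases rather than the true 10, the same kind of inflation Brignell describes.

<pre>
import math

# Illustration of publication bias: weight the chance of observing n cases
# (Poisson, mean 10, i.e. no real effect) by an ASSUMED willingness-to-publish
# ramp, then look at the average of the results that get published.
EXPECTED = 10          # true expected cases (RR = 1.0)

def poisson_pmf(n, mean=EXPECTED):
    return math.exp(-mean) * mean ** n / math.factorial(n)

def willingness(n):
    """Assumed ramp: never publish below RR 1.1, always publish above RR 2.0."""
    rr = n / EXPECTED
    return min(max((rr - 1.1) / 0.9, 0.0), 1.0)

weights = [poisson_pmf(n) * willingness(n) for n in range(60)]
published_mean = sum(n * w for n, w in enumerate(weights)) / sum(weights)
print(f"average published case count: {published_mean:.1f}")   # ~15, i.e. RR ~1.5
</pre>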

Brignell comments: "The headlines will, of course, say something like Passive drinking causes a 58% increase in toe-nail cancer whereas the result is entirely spurious. This is a major reason for never accepting RRs of less than 2.0."

Brignell's comments can be found here. [http://www.numberwatch.co.uk/publication_bias.htm]

== The Extreme Value Fallacy ==

Brignell states: "If you take a number of samples of a random variable and put them in order of magnitude, the extreme values are the largest and smallest. These extreme values exhibit special distributions of their own, which depend on the distribution of the original variate and the number of ranked samples from which they were drawn. The fallacy occurs when the extremes are treated as though they were single samples from the original distribution."

For example, choose some men and calculate their average height. About half of them will be taller than that average. Pick out the tallest. Marvel at how much taller he is than the average.

For example, divide some people into groups based on their month of birth. Work out how many people in each group have toenail cancer. Calculate the average number across the groups. Pick out the highest. Marvel at how much bigger it is than the average. Publish a newspaper report: "People born in July are more likely to get toe-nail cancer". Hypothesise about the causes for this. Suggest that toenail cancer might be linked to sunlight, and that people should stay indoors during summer months. This form of the fallacy is very common, and is referred to as the birth month fallacy. Brignell notes that it "recurs in the media several times a year."
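
A small simulation makes the point; the sketch below (not from Numberwatch; the case count and fixed seed are assumptions for illustration) assigns simulated cases to birth months completely at random and then reports how far the biggest month sits above the average.

<pre>
import numpy as np

# Birth-month fallacy: with no real effect, the month with the MOST cases
# always looks impressively above the average, simply because it is the
# maximum of twelve random counts.
rng = np.random.default_rng(1)      # fixed seed; purely illustrative

N_CASES = 1200                      # total simulated cases, no month effect at all
months = rng.integers(0, 12, size=N_CASES)       # each case gets a random birth month
counts = np.bincount(months, minlength=12)

print("average cases per month:", counts.mean())          # 100 by construction
print("worst month:", counts.max(),
      f"({counts.max() / counts.mean() - 1:+.0%} above average)")
# the maximum of 12 such counts is typically 10-20% above the mean
</pre>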

For example, divide some people into groups based on their star sign. Work out how many people in each group are involved in car accidents over a set period. Calculate the average number across the groups. Pick out the biggest. Marvel at how much bigger it is than the average. Suggest that Capricorns are more likely to be involved in car accidents.

Brignell asks: "A survey in the village pub established that ordinary punters find this sort of thing ludicrous, so why do journalists and even self-styled scientific journals fall for it?"

Brignell states "extreme values exhibit special distributions of their own, which depend on the distribution of the original variate and the number of ranked samples from which they were drawn." They are not binomially distributed, yet some researchers blithely apply statistics based on the binomial distribution to them. Brignell provides a worked example of this sort of abuse here. [http://www.numberwatch.co.uk/2001%20August.htm#extreme]

Brignell's comments on the extreme value fallacy can be found here. [http://www.numberwatch.co.uk/extreme_value_fallacy.htm]

== Computer Modelling ==

Brignell states computer modelling "is one of the most powerful tools available to science and engineering and, like all powerful tools, it brings dangers as well as benefits. Andrew Donald Booth said 'Every system is its own best analogue'. As a scientist he should, of course, have said in what sense he means best. The statement is true in terms of accuracy but not in terms of utility. If you want to determine the optimum shape for the members of a bridge structure, for example, you cannot build half a dozen bridges and test them to destruction, but you can try large numbers of variations in a computer model. Computers allow us to optimise designs in ways that were unavailable in times past. Nevertheless, the very flexibility of a computer program, the ease with which a glib algorithm can be implemented with a few lines of code and the difficulty of fully understanding its implications can pave the path to Cloud Cuckoo Land."

Brignell identifies the main hazards of computer models as follows:
*Assumptions: "At almost every stage in the development of a model it is necessary to make assumptions, perhaps hundreds of them. These might or might not be considered reasonable by others in the field, but they rapidly become hidden. Some of the larger models of recent times deal with the interactions of variables whose very nature is virtually unknown to science."
*Auditability: "In olden times, if a scientist published a theory, all the stages of reasoning that led to it could be critically examined by other scientists. With a computer model, it is possible, within a few days of development, for it to become so complex that it is a virtual impossibility for an outsider to understand it fully. Indeed, where it is the result of a team effort, it becomes unlikely that any individual understands it."
*Omissions: "Often vital elements can be left out of a model and the effect of the omissions is only realised if and when it is tested against reality. A notorious example is the Millennium Bridge in London. It was only after it was built and people started to walk on it that the engineers realised that they had created a resonant structure. This could have been modelled dynamically if they had thought about it. Some models that produce profound political and economic consequences have never faced such a challenge."
*Subconscious: "The human subconscious is a powerful force. Even in relatively simple physical measurements it has been shown that the results can be affected by the desires and expectations of the experimenter. In a large computer model this effect can be multiplied a thousandfold. Naturally, we discount the possibility of deliberate fraud."
*Sophistication: "This word, which literally means falsification or adulteration, has come to mean advanced and efficient. In large computer models, however, the literal meaning is often more applicable. The structure simply becomes too large and complex for the inputs that support it."
*Testability: "When we were pioneering the applications of computer modelling about forty years ago, we soon came to the conclusion that a model is useless unless it can be tested against reality. If a model gives a reasonably accurate prediction on a simple system then we have reasonable, but not irrefutable, grounds for believing it to be accurate in other circumstances. Unfortunately, this is one of the truisms that have been lost in the enthusiasms of the new age."
*Chaos: "Large models are often chaotic, which means that very small changes in the input variables produce very large changes in the output variables." Errors present in the input variables are magnified in the output. If feedback mechanisms are present, "it is quite possible for systems to operate on the noise alone."

Brignell comments: "Many of the computer models that receive great media coverage and political endorsement fail under some of these headings; and, indeed, some fail under all of them. Yet they are used as the excuse for profound, and often extremely damaging, policies that affect everyone. That is why computer models are dangerous tools."

Brignell's comments on computer models can be found here. [http://www.numberwatch.co.uk/computer_modelling.htm]

Another problem seen in some models is the presentation of input assumptions as predictions of the model. A researcher produces a theory and wishes to test it. He crafts a model to implement his theory, and uses it to make predictions. He then looks to see whether the predictions match the real world. If they do, they support the accuracy of his model; if the model proves accurate, it in turn provides support for his theory. If the predictions are wrong, the researcher changes his model and tries again. If the model cannot be made to produce accurate predictions, he must look for other ways to support his theory. The predictions of the theory are therefore not predictions of the model; rather, they are design constraints on it. If the model does not produce results that meet these design constraints, it is wrong and must be changed until it does.

A classic example of this lies in models used to support theories about the long term effects of global warming. A researcher theorises that one such effect will be a rise in sea level. He then attempts to support this theory by modelling the predicted rise, using the model to produce data which can be tested against real world data. If there is agreement, he can use the accuracy of his model as evidence to support his theory. If there isn't, he must change his model in an effort to find such agreement. If he cannot make the model agree he must ultimately abandon his theory. The model shows that the sea level will rise because it was written to do so; the fact that it does so is not of itself evidence that the sea level will indeed rise.

There is a simple way to ascertain whether a prediction comes from the theory or from the model: ask "If the model doesn't support the prediction, will the researcher change the model so that it does?" If the answer is no, the prediction came from the model; if yes, it came from the researcher.

== Poisson ==

If some process generates random numbers, those numbers will be distributed in a predictable way. The function that describes their distribution is called their probability function. The most commonly encountered distributions are the Normal (Gaussian) distribution and, for counts of events, the Binomial distribution. Some sources of random numbers are distributed differently. In order to do statistics on a source of random numbers it is vitally important to know what the underlying distribution is. Applying statistical tests designed for Normally distributed data to data distributed in a different way is a great way to generate nonsense.

The Poisson distribution is a limiting case of the Binomial distribution that applies when the probability of a single event is very small. "It is important because most of the cases for which statistical support is needed are necessarily concerned with small probabilities, i.e. relatively rare events."

A feature of the Poisson distribution is that the standard deviation is the square root of the mean. This makes it easy to work out how many extra results are needed to achieve significance without doing massive calculations, because roughly 95% of results fall within two standard deviations of the mean and over 99% within three.

For example, assume a study, based on 10,000 people, investigating a rare disease (toenail cancer) that strikes 1 in 1000. This is a small probability, so the Poisson distribution applies. The expected number of cases is 10, of which the square root is 3.16.
*Twice that is 6.32, so for significance at (P<.05) you must find more than 16 or fewer than 4 people with toenail cancer.
*Thrice that is 9.48, so for significance at (P<.01) you must find either 0 or more than 19 people with toenail cancer.
If there are 14 people with toenail cancer then the result cannot be significant, even at (P<.05).
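
A minimal sketch of this square-root rule (the study size, disease rate and sigma multiples are the figures used above):

<pre>
import math

# Square-root rule for a Poisson count: 10 cases expected in a study of
# 10,000 people when the disease strikes 1 in 1,000.
expected = 10_000 * (1 / 1000)        # 10 expected cases
sd = math.sqrt(expected)              # 3.16

for sigmas, label in ((2, "P<0.05"), (3, "P<0.01")):
    low = expected - sigmas * sd
    high = expected + sigmas * sd
    print(f"{label}: significant only below {low:.1f} or above {high:.1f} cases")
# P<0.05: below 3.7 or above 16.3  ->  more than 16 or fewer than 4 cases
# P<0.01: below 0.5 or above 19.5  ->  0 cases or more than 19
</pre>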

It is interesting to look at the relative risks (RR) associated with these results. For significance at (P<0.05) you need at least 17 people with toenail cancer, which equates to a RR of about 1.7, or "Passive drinking increases the likelihood of getting toenail cancer by 70%!". For (P<0.01) the minimum number of cases needed is 20, which equates to a RR of 2.0, or "Passive drinking doubles the chances of getting toenail cancer!" A study of this size can never demonstrate a small relative risk such as 1.25, because so small an excess can never reach significance.

Brignell's much more rigorous explanation of the Poisson distribution can be found here. [http://www.numberwatch.co.uk/Poisson.htm]

== Relative Risk ==

If a researcher conducts a study seeking to link toenail cancer with passive drinking, and finds 16 people with toenail cancer amongst the passive drinkers when she expected to find 10, then the relative risk of toenail cancer associated with passive drinking is 1.6, or an increase of 60%.

Of course, this is not quite true. The RR of 1.6 is only the RR found by that one study. Because of the random nature of the data, successive studies may find 12, 8, and 21 cases of toenail cancer, in which case the relative risks would be 1.2, 0.8, and 2.1 respectively. The ''real'' relative risk is much more difficult to uncover. If a headline states that a study has found that "passive drinking causes a 60% increase in toenail cancer", this does not mean that it actually does. It just means that one study found a relative risk of 1.6.

Actually, this is all a gross simplification. Brignell gives a more formal definition: "If X% of people exposed to a putative cause suffer a certain effect and Y% not exposed to the cause (or alternatively the general population) suffer the same effect, the RR is X/Y. If the effect is 'bad', then a RR greater than unity denotes a 'bad' cause, while an RR less than unity suggests a beneficial cause (and likewise if they are both 'good'). An RR of exactly unity suggests that there is no correlation."
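
In code the definition is a one-liner; a minimal sketch (hypothetical numbers, reusing the 16-observed-versus-10-expected example above):

<pre>
# Relative risk as defined above: rate in the exposed group divided by the
# rate in the unexposed group (or the general population).
def relative_risk(exposed_cases, exposed_total, baseline_cases, baseline_total):
    return (exposed_cases / exposed_total) / (baseline_cases / baseline_total)

# Hypothetical example: 16 cases among 10,000 passive drinkers against a
# background rate of 1 per 1,000 (10 per 10,000).
print(relative_risk(16, 10_000, 10, 10_000))   # 1.6, i.e. a 60% increase
</pre>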

Brignell states: "most scientists [...] take a fairly rigorous view of RR values. In observational studies, they will not normally accept an RR of less than 3 as significant and never an RR of less than 2. Likewise, for a putative beneficial effect, they never accept an RR of greater than 0.5." Some scientists dispute this, and there are respected publications that will happily publish studies that do not meet these standards.

Brignell gives the reasons for his point of view as follows:
*"Even where there is no correlation, the RR is never exactly unity, since both X and Y are estimates of statistical variates, so the question arises as to how much deviation from unity should be acceptable as significant."
*"X and Y, while inherently unrelated, might be correlated through a third factor, or indeed many others (for example, age). Sometimes such confounding factors might be known (or thought to be known) and (sometimes dubious) attempts are made to allow for them. Where they are not known they cannot be compensated for, by definition."
*"Sometimes biases are inherent in the method of measurement employed."
*"Statistical results are often subjected to a chain of manipulations and selections which (whether designed to or not) can increase the deviation of the RR from unity."
*"Publication bias can give the impression of average RRs greater than 1.5 when there is no effect at all."

To these can be added Brignell's own experience as a former Professor of Industrial Instrumentation. But, in the end, as Brignell himself admits, the level of relative risk that should be accepted is a judgement call.

The field of epidemiology is one where the rigorous standards of significance that Brignell calls for are not generally accepted. This field has produced the following headlines in recent years:

*Pets "double children's risk of asthma attacks" (''The Independent'', 5 June 2001). Relative Risk: 2.0
*Keeping pets 'prevents allergies' (''BBC News'', 27 May 2001). Relative Risk: 2.0 [http://news.bbc.co.uk/1/hi/health/1352145.stm]

*Regular painkiller use linked to cancer (''BBC News'', 10 September 1999) [http://news.bbc.co.uk/1/hi/health/443250.stm]
*Pain killers prevent cancer (''BBC News'', 8 April 2003). Relative Risks: 1.21, 1.28, and 1.49 [http://news.bbc.co.uk/1/hi/health/2928343.stm]
*Aspirin linked with 30% increase in cancer (January 2004). Relative Risk: 1.3

*Soy sauce cancer warning (''BBC News'', 20 June 2001) [http://news.bbc.co.uk/1/hi/health/1399042.stm]
*Another Study Showing Soy Fights Cancer (University of Missouri, December 2001) [http://www.agweb.com/get_article.asp?pageid=82237&newscat=GN&src=gennews]

*Hormone Studies: What Went Wrong? (''New York Times'', 22 April 2003) "How could two large high-quality studies come to diametrically different conclusions about menopause, hormone therapy and heart disease?" Relative Risks: 1.3 and 1.4 [http://www.susanlovemd.com/community/flashes/in-the-news/news030422.htm]

Brignell suggests that "If just one such contradiction occurred in a branch of real science there would be the immediate calling of an international conference to sort it out. As with cold fusion, laboratories all over the world would attempt to replicate the results. [...] In the field of epidemiology, such conflicts are accepted as normal."

Brignell's comments on relative risk can be found here. [http://www.numberwatch.co.uk/RR.htm]

== Data Dredging ==

If a researcher conducts a statistical test at a significance level of (P<0.05), she is accepting that there will be a 1 in 20 chance of getting a false positive, that is, of getting a bogus result. If she conducts a study that requires 100 such tests, she can expect to get five bogus results, and has a probability of 99.4% of getting at least one. She can then publish these bogus results. Brignell calls this sort of activity "data dredging".

One way to do this would be to conduct a study of the usage of 10 substances, checking to see if each is a possible factor in the incidence of each of 10 diseases. Another would be to break the data into categories based on different criteria. For example, 2 sexes, 5 age ranges, 4 different putative factors, and 3 different dosage factors make for a total of 120 separate tests. When the study is published, the researcher will seek to downplay the number of tests actually carried out. This can be done by focussing on the areas that produced results and leaving out or glossing over the bits that didn't.

A third and popular form of data dredge is to set up a database containing results on hundreds of putative causes and effects. Each combination can then be checked as a separate experiment, in the knowledge that about one in every 20 combinations will produce a publishable result. And each result can be presented as a separate study.
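
A simulation shows how reliably a dredge of this size turns up "findings"; the sketch below (not from Numberwatch; it assumes 120 independent tests on pure noise, matching the 2 x 5 x 4 x 3 example above) counts how many spurious positives come out.

<pre>
import numpy as np
from scipy import stats

# Data dredge on pure noise: run 120 independent two-sample t-tests where no
# real effect exists, and count how many come out "significant" at P < 0.05.
rng = np.random.default_rng(2)     # fixed seed; purely illustrative
N_TESTS = 120                      # e.g. 2 sexes x 5 age ranges x 4 factors x 3 doses

significant = 0
for _ in range(N_TESTS):
    exposed = rng.normal(size=50)       # "exposed" group, no real difference
    control = rng.normal(size=50)       # control group from the same population
    _, p = stats.ttest_ind(exposed, control)
    significant += p < 0.05

print(f"{significant} of {N_TESTS} tests 'significant' with no effect present")
# on average about 6 (120 x 0.05), each one publishable as a separate 'study'
</pre>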

Besides this fundamental statistical flaw, Brignell suggests that there are other problems inherent in the results of data dredging:

*"The data gathered are often of an anecdotal nature. They are often in areas where people lie to themselves, let alone to others, especially with the modern pressures of political correctness (e.g. alcohol, tobacco or fat intake). People like to be helpful and give the answers they think questioners want to hear. There is little or no checking as to whether the recorded 'facts' are true." For example, people who have given up smoking may claim that they have never smoked.

*"People are often equally vague about their illnesses. They might be either hypochondriacs or ostriches." Thus a vague description of some symptoms can either be diagnosed as a particular disease, or ignored because of its vagueness, depending on the requirements of the researcher.

*"Many of the questions put to subjects are vague to the point of absurdity, such as what people have in their diet. Can you remember what and in what quantity you ate last week?"

*"Questioners are liable to add biases of their own in recording responses." For example, recording a vague response as a more definite one. Or, if there are multiple data gatherers, each may apply different standards to the data they gather.

*"Some of the variables are then calculated indirectly from these vague data, such as the intake of a particular vitamin being estimated from the reported diet, or factors such as deprivation and pollution being estimated from the postal code."

*"There are often indications of reverse causality (e.g. people might have abandoned a therapy through illness, rather than become ill because of abandoning the therapy)."

*"Often researchers disguise the fact that a result is part of a data dredge and present it as a single experiment, but they sometimes inadvertently let the cat out of the bag by mentioning other variables."

*"The subjects of the data base are often unrepresentative of the population at large, e.g. they might be restricted to the medical professions or co-religionists." Kinsey's landmark studies of human sexuality are often taken as being representative of the general population, but were actually based mostly on a mixture of the responses of prisoners and of those who attended Kinsey's lectures.

Brignell's comments on data dredging can be found here. [http://www.numberwatch.co.uk/data_dredge.htm]

== Premature Termination ==

Assume that you conduct a study to determine the effects of passive drinking on toenail cancer. The latter affects 1 in 1000 people, your significance level is (P<0.05), and your study is of 10,000 people, so you need to find at least 17 cases of toenail cancer in passive drinkers before you can declare that there is a link. Let's say that the data are accumulated over time, perhaps during the course of a year. The question then arises: when do you find these cases? Do they occur at regular intervals over the year? Do they all come in a clump at the end? Or at the beginning? The answer is that their appearance is random, so they can appear at any time.

Let us say that your study is going to find 14 cases of toenail cancer in passive drinkers, that 8 of these cases will occur in the first three months of the study, and the other 6 in the final nine months. The conclusion you would have to come to after the year has passed is that passive drinking is not a factor in the incidence of toenail cancer. Suppose, however, that you were to stop the study after three months. You could then extrapolate that the study was on course to find 32 cases, which would be very significant. You could of course justify stopping early by claiming that it is unethical to continue to subject people to passive drinking when your study shows that it causes toenail cancer.

During the course of a study, it is quite normal for the accumulating data to wander between significance and insignificance. The practice of monitoring the data so as to be able to stop the study at the point that suits the researcher best is dodgy to say the least.
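
The effect of such monitoring is easy to simulate; the sketch below (not from Numberwatch; the study size, disease rate, monthly peeking schedule and naive annualised projection are assumptions for illustration) counts how often a study with no real effect can be declared "significant" at some point during the year if it is checked every month.

<pre>
import numpy as np

# Premature termination: a study with NO real effect is checked every month,
# and stopped the first time the naive annualised projection crosses the
# threshold of 17 cases derived above. Peeking inflates the false-positive rate.
rng = np.random.default_rng(3)    # fixed seed; purely illustrative

MONTHLY_MEAN = 10 / 12            # 10 expected cases per year, no effect present
THRESHOLD = 17                    # cases per year needed for P < 0.05

def stopped_early(n_months=12):
    cases = 0
    for month in range(1, n_months + 1):
        cases += rng.poisson(MONTHLY_MEAN)
        projected = cases * 12 / month          # annualised projection so far
        if projected >= THRESHOLD:
            return True                         # researcher stops and declares a link
    return False

trials = 10_000
rate = sum(stopped_early() for _ in range(trials)) / trials
print(f"fraction of null studies declared 'significant' at some point: {rate:.0%}")
# far more than the nominal 5%, because there are twelve chances to stop
</pre>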

Brignell's comments on this practice can be found under the headings "Number of the month 1.0" [http://www.numberwatch.co.uk/2002%20July.htm#monthnumber], "From the scaremongers' manual" [http://www.numberwatch.co.uk/2005%20March.htm], and "Dangerous and destructive nonsense!" [http://www.numberwatch.co.uk/2002%20July.htm].

== DDT ==