John Brignell

Professor '''John Brignell''' held the Chair in Industrial Instrumentation at the University of Southampton (UK) from 1980 until the late 1990s. [http://www.ecs.soton.ac.uk/~jeb/cv.htm]
 
Brignell retired from his academic career in the late 1990s and now devotes his time to debunking what he claims are false statistics common in much of today's media. He presents his views on his website ''Numberwatch'', which was launched in July 2000, and is "devoted to the monitoring of the misleading numbers that rain down on us via the media. Whether they are generated by Single Issue Fanatics (SIFs), politicians, bureaucrats, quasi-scientists (junk, pseudo- or just bad), such numbers swamp the media, generating unnecessary alarm and panic. They are seized upon by media, hungry for eye-catching stories. There is a growing band of people whose livelihoods depend on creating and maintaining panic." [http://www.numberwatch.co.uk/number%20watch.htm]
  
 
Brignell has expressed delight at the "encouragement and support I have received from some of the giants of the pro-science movement in the USA -- in no particular order [[Steve Milloy]], [[Alan Caruba|Alan Coruba]] [''sic''], [[James Randi]], [[Bob Caroll]], [[Michael Fumento]] and [[S. Fred Singer]]." [http://www.numberwatch.co.uk/term%20end.htm]
 
  
 
 
== Statistical Significance (P<0.05) ==
 
 
When researchers conduct a study, they select a sample from a population and compare the value of a statistic found in the sample with that found in the population. They ask: "Is the statistic in the sample significantly different to that in the population?" If it is, they look for a reason why. What does significantly different mean? Researchers start with a set of numbers. They apply statistical tests to those numbers to turn them into a set of odds: the odds of getting those numbers by chance. If the odds are too long, they say, "It's too unlikely to have happened by chance. Something must have caused it." How unlikely is too unlikely? That is set by the significance level. A typical significance level might be (P<0.01), which means the odds against getting this result by chance alone are more than 100 to 1.
 
 
Brignell suggests that one common source of error in experiments is the use of lax significance levels in statistical testing, particularly P<0.05.
 
 
If one applies a statistical test at a level of significance of 0.05, that means that one is accepting a probability of 0.05 of a false positive; that is, one is accepting that there is a chance of 1 in 20 that the result will appear significant when it isn't. Note that these odds apply to all tests carried out, not just the ones that return significant results. This has been called the 1 in 20 lottery.
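As a check on the "1 in 20 lottery", here is a minimal simulation (my own sketch, not taken from ''Numberwatch''): it repeatedly tests samples drawn from a population in which there is nothing to find, and counts how often a result comes out "significant" at P<0.05.

<pre>
# Test data in which there is genuinely nothing to find, at P<0.05,
# and count how often a "significant" result appears anyway.
import random, math

def z_test_is_significant(sample, critical_z=1.96):
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = mean / (sd / math.sqrt(n))       # true population mean is 0
    return abs(z) > critical_z           # two-sided test at roughly P<0.05

random.seed(1)
trials = 10_000
false_positives = sum(
    z_test_is_significant([random.gauss(0, 1) for _ in range(50)])
    for _ in range(trials)
)
print(f"Significant at P<0.05 in {false_positives / trials:.1%} of null studies")
# Prints roughly 5%: about 1 study in 20 "finds" an effect that is not there.
</pre>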
 
 
This can cause problems when combined with publication bias. Studies that produce significant results tend to be published, ones that don't tend not to be. However, the number of false positives depends on the total number of studies carried out rather than the number published. In simple terms, if 1 in 2 studies produce significant results, then 50% of them will be published of which 10% will be bogus. If 1 in 5 studies produce significant results, then 20% of them will be published of which 25% will be bogus. If 1 in 10 studies produce significant results, then 10% of them will be published of which 50% will be bogus. How many studies produce significant results? What percentage of published studies are bogus? The numbers are simply not known.
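The arithmetic in the previous paragraph can be laid out in a few lines. The sketch below assumes, as the paragraph does, that every significant study is published, no insignificant study is, and the 5% false-positive rate applies across all studies conducted:

<pre>
# Share of published studies that are bogus, assuming only "significant"
# studies get published and 5% of all studies are false positives.
alpha = 0.05
for significant_rate in (0.5, 0.2, 0.1):
    bogus_share = alpha / significant_rate
    print(f"{significant_rate:.0%} of studies significant -> "
          f"{bogus_share:.0%} of published studies bogus")
# Prints 10%, 25% and 50% respectively, matching the figures quoted above.
</pre>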
 
 
Another source of problems occurs when this level of significance is used in combination with categorisation. This occurs when the data in a study is broken up into a number of categories, and a statistical test is applied in each category. The effect is that of having multiple goes at the lottery: the more categories, the better the chance of producing a false positive. If the test is applied to 1 category, the odds of getting at least one bogus result are 1 in 20 (0.05). If there are 4 categories, the odds are nearly 1 in 5 (0.185). If there are 8 categories, the odds are about 1 in 3 (0.337). If there are 12 categories, the odds are about 9 in 20 (0.46). If there are 16 categories, the odds are about 5 in 9 (0.56). The odds continue to rise with the number of categories.
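These figures follow from a one-line calculation. A minimal sketch (mine, not from ''Numberwatch'') that assumes the tests in the different categories are independent:

<pre>
# Chance of at least one false positive when an independent test at P<0.05
# is run once per category: 1 - 0.95**k.
alpha = 0.05
for categories in (1, 4, 8, 12, 16, 24):
    p_at_least_one = 1 - (1 - alpha) ** categories
    print(f"{categories:2d} categories: {p_at_least_one:.3f}")
# 1 -> 0.050, 4 -> 0.185, 8 -> 0.337, 12 -> 0.460, 16 -> 0.560, 24 -> 0.708
# (24 categories is the chocolate example discussed below.)
</pre>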
 
 
Combining these sources of error compounds them. If 100 studies are conducted at a significance level of 0.05, each categorising their data into 10 categories, and the 50 that produce significant results are published, then about 40 of the published studies will be bogus and only about 10 will have found any real significance.
 
 
If one hears that a study has been published that has "found a link between thyroid cancer and eating chocolate in women between the ages of 20 and 30 who eat more than three pieces of chocolate per day", how likely is it that the study is bogus? If one assumes that data was collected from a randomly selected sample of people, one could hypothesise that the data was categorised by sex (male, female), age (less than 20, 20 to 30, 30 to 40, over 40), and number of pieces eaten (none, 1 to 3, more than 3), for a total of 24 categories. This would suggest that the basic chance of the study being bogus is about 70%. Incorporating publication bias might raise it to over 90%. This is of course assuming that the level of significance used is 0.05, a safe assumption in most such studies.
 
 
Brignell suggests that the use of a significance level of 0.05 is inherently unsound: "It is difficult to generalise, but on the whole P<0.01 would normally be considered significant and P<0.001 highly significant." [http://www.numberwatch.co.uk/significance.htm]
 
 
Brignell suggests: "Many leading scientists and mathematicians today believe that the emphasis on significance testing is grossly overdone. P<0.05 had become an end in itself and the determinant of a successful outcome to an experiment, much to the detriment of the fundamental objective of science, which is to understand."
 
 
== Fitting Linear Trends ==
 
 
Brignell states "One of the major problems in using a finite sequence of data to represent a source that is effectively infinitely long is that the process of chopping off the ends is a distortion. [...] The other is the fact that, even when there is no linear trend in the original process, there is always one in the finite block of data taken to represent it."
 
 
Picture the sine wave y = sin(x). It is a continuous curve centred on the x-axis: the x-values are in radians, and the y-values range between -1 and 1. Let's take some samples from the graph.
 
 
*Sample 1: The points are (0.44,0.42), (0.52,0.50), (0.61,0.57), (0.70,0.64) and (0.79,0.71), and the fitted line is y = 0.82 x + 0.07

*Sample 2: The points are (0.79,0.71), (0.87,0.77), (0.96,0.82), (1.05,0.87) and (1.13,0.91), and the fitted line is y = 0.57 x + 0.26

*Sample 3: The points are (1.13,0.91), (1.22,0.94), (1.31,0.97), (1.40,0.98) and (1.48,1.00), and the fitted line is y = 0.26 x + 0.62

*Sample 4: The points are (1.48,1.00), (1.57,1.00), (1.66,1.00), (1.75,0.98) and (1.83,0.97), and the fitted line is y = -0.09 x + 1.13

*Sample 5: The points are (1.83,0.97), (1.92,0.94), (2.01,0.91), (2.09,0.87) and (2.18,0.82), and the fitted line is y = -0.42 x + 1.74
 
 
So, depending on which set of points is selected, the associated linear trend is either an 82% rise, a 57% rise, a 26% rise, a 9% fall, or a 42% fall!
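The five fits above can be reproduced with a few lines of code. The sketch below is mine, not Brignell's; the 0.0873-radian spacing is inferred from the sample points quoted above.

<pre>
# Fit a straight line to five consecutive points of y = sin(x) and watch
# the "trend" change sign depending on where the window starts.
import numpy as np

x_starts = (0.44, 0.79, 1.13, 1.48, 1.83)
step = 0.0873            # roughly 5 degrees in radians, matching the spacing above
for x0 in x_starts:
    x = np.array([x0 + i * step for i in range(5)])
    y = np.sin(x)
    slope, intercept = np.polyfit(x, y, 1)    # least-squares straight line
    print(f"window starting at x={x0:.2f}: y = {slope:+.2f} x + {intercept:.2f}")
# The fitted slopes run from about +0.82 down to about -0.42, even though the
# underlying sine wave has no linear trend at all.
</pre>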
 
 
A sine wave is a continuous, cycling curve; it has no linear trend. If one were to fit a line of best fit to the whole curve, the best possible answer would be the x-axis, y = 0. Yet none of these samples has given that line. It should be obvious that, depending on which data points are used, it is possible to get just about any line of best fit that one could desire. The linear trend found is there, not because of any underlying cause, but because it is generated by the act of selecting the data set.
 
 
As an alternative example, consider a set of points generated at random. There can obviously be no linear trend to such data. And yet, a line of best fit can be applied to any subset of these points, and a linear trend deduced from it. Again, the linear trend found is there, not because of any underlying cause, but because it is generated by the act of selecting the subset.
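The same point can be made in a few lines with random numbers; a minimal sketch, for illustration only:

<pre>
# Fit a line to purely random data: the fitted slope of any finite sample is
# essentially never exactly zero, even though the process that generated the
# numbers has no trend whatsoever.
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(20)
for trial in range(5):
    y = rng.normal(size=20)                  # independent noise, no trend
    slope = np.polyfit(x, y, 1)[0]
    print(f"trial {trial}: fitted slope = {slope:+.3f}")
# Small but non-zero slopes every time; they are artifacts of the sample,
# not properties of the underlying process.
</pre>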
 
 
These examples illustrate the "greatest hazard of trend estimation. The trend is a property of the data points we have and not of the original process from which they came. As we can never have an infinite number of readings there is always an error introduced by using a restricted number of data points to represent a process in the real world. Naturally the error decreases rapidly as we increase the number of data points, but it is always there and, in fact, the calculated trend is never zero".
 
 
It is not enough to fit a line to a set of points and declare a linear trend. One must understand the underlying data in order to know whether there actually is a real trend. It may simply be that the trend one sees is an artifact of the selection of data points, and doesn't actually exist in the underlying process.
 
 
Brignell's comments can be found here. [http://www.numberwatch.co.uk/Trends.htm]
 
 
== The End Effect ==
 
 
Brignell states "A major problem is the end effect, which relates to the huge changes in apparent slope that can be wrought just by the choice of where to start the data selection." "The reason for this is that in the calculation of the slope, the contribution of each data point is weighted according to its distance from the centre", so variation in the end points has much more effect on the final result than variation in the central points.
 
 
If we start with the nine points (1,1), (2,2), (3,3), ..., (9,9), the slope of the fitted line is 1. (Obviously.) If we replace the points (3,3) through (7,7) with (3,8), (4,8), (5,8), (6,8) and (7,8), the slope of the fitted line is 0.83. The change has flattened the line. If we instead replace only the end point (1,1) with (1,4), the slope of the fitted line is 0.80. The effect on the slope of changing just the end point is greater than that of changing five central points.
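The three slopes quoted above are easy to check with an ordinary least-squares fit; a minimal sketch, for illustration:

<pre>
# Verify the end-effect example with least-squares fits.
import numpy as np

x = np.arange(1, 10)
y_base = np.arange(1, 10, dtype=float)       # (1,1) ... (9,9)

y_mid = y_base.copy()
y_mid[2:7] = 8                               # flatten the five central points

y_end = y_base.copy()
y_end[0] = 4                                 # move only the first point

for label, y in (("original", y_base),
                 ("five middle points changed", y_mid),
                 ("one end point changed", y_end)):
    slope = np.polyfit(x, y, 1)[0]
    print(f"{label}: slope = {slope:.2f}")
# original 1.00, middle points changed 0.83, end point changed 0.80: moving a
# single end point shifts the slope more than moving five inner points.
</pre>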
 
 
Brignell offers the following as an example of the end point effect. [http://www.numberwatch.co.uk/2003%20May.htm#wheeze]
 
The graph contains 15 data points displaying a strong linear trend. The removal of 2 points from either end removes the trend. So the trend is really only supported by those 4 data points.
 
 
Brignell's comments can be found here. [http://www.numberwatch.co.uk/Trends.htm]
 
 
== Trojan Numbers ==
 
 
The term Trojan number was coined by Brignell to describe a number used by authors to "get their articles or propaganda into the media." "The allusion is, of course, to the mythical stratagem whereby the Greeks infiltrated the city of Troy inside a giant wooden horse." The number looks impressive, but on further examination isn't.
 
 
Brignell states "The major form of Trojan Number is the size of study. Early on in the piece it will be mentioned that (to invent some arbitrary numbers) there were 60,000 people in the study. The number experiencing the condition in question, say toe-nail cancer, is, however, much smaller, perhaps 60. Of these the number indulging in the putative cause, say passive drinking, is even smaller say 20. There is a number expected (as a proportion of the 60) at random from knowledge of the statistics for the general population, say, 14. Thus the number that really matters, the excess number of cases, is half a dozen. It is surprising how often an apparently huge study whittles down to an excess that you can count on your fingers. If the number 6 had been mentioned at the outset, the claim would have been laughed out of court, so it is never mentioned, though you can often have a pretty good stab at deducing it. In the statistics of rare events an excess of 6 on an expectation of 14 would be unsurprising. The rest of the 60,000 are mere bystanders." In fact, finding an extra 6 would not be significant at P<0.05.
 
 
Trojan numbers can be repeatedly presented in different ways. For example, "3.21% of passive drinkers are depressed" or "72.45% of women under 35 are unaware that passive drinking causes toe-nail cancer". In each case, after the headline and a couple of sentences, the body of the article is a repeat of the material already presented. The repeated presentation of the same material helps to lodge it into the public consciousness, as well as to raise the profile of the academic doing the research.
 
 
Brignell notes "One of the most effective forms of Trojan Number is the Virtual Body Count. Sub-editors cannot resist a headline Thousands to die of X." The body count is of course obtained by scaling up the study to the size of the population. The US population is 295,734,134. For a study of 60,000 people, one extra case therefore equals a body count of 4929. So six extra cases is 29,573 bodies. If the next study finds only three extra cases, the body count halves.
 
 
Brignell's comments on Trojan numbers can be found here. [http://www.numberwatch.co.uk/trojan_number.htm]
 
 
== Publication Bias ==
 
 
Brignell states: "Publication bias is a tendency on average to produce results that appear significant, because negative or near neutral results are almost never published." Researchers submit their work for publication if it produces significant results; if it does not, they put it in the drawer and go on to something else. Editors publish papers that show that something has occured, not those that show nothing.
 
 
Brignell suggests: "It is possible to estimate the effects of publication bias, as long as we are prepared to indulge in a bit of guess work about the willingness of researchers to publish for various claimed levels of Relative Risk."
 
 
Assume that a rare disease (toenail cancer) affects one in every thousand people. Assume that a group of researchers decide to study the effect of a particular factor (passive drinking) on the incidence of the disease. Assume that the factor has no effect. If the researchers study 10,000 people, they would expect to see 10 cases of toenail cancer, which would correspond to a relative risk (RR) of 1.0. Of course, as the distribution of cases is random, there may be more or fewer cases, their number being governed by a probability distribution.
 
 
Brignell continues: "Now, the guesswork part is to invent a plausible function that would represent the willingness of authors to publish. Observing the literature in general, we know that they are all willing to publish a RR of 2.0 but almost nobody will publish an RR of less than 1.1." He suggests a simple function that meets these guidelines.
 
 
Brignell multiplies the two functions together, ordinate by ordinate, to determine the probability density function for published data. The resulting function has an average value of 15.8. This corresponds to a relative risk of 1.58, or an increase of 58%.
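The calculation can be sketched in code. The Poisson distribution of case counts (10 expected cases) is as described above, but the willingness-to-publish curve below is only my guess at "a simple function" meeting the stated guidelines, not Brignell's actual function, so the exact average will differ somewhat from his 15.8.

<pre>
# Weight a Poisson(10) distribution of case counts by a guessed
# willingness-to-publish curve and find the mean of what gets published.
import math

def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)

def willingness_to_publish(cases, expected=10.0):
    rr = cases / expected
    if rr <= 1.1:
        return 0.0
    if rr >= 2.0:
        return 1.0
    return (rr - 1.1) / (2.0 - 1.1)          # linear ramp between RR 1.1 and 2.0

expected = 10.0
ks = range(60)
weights = [poisson_pmf(k, expected) * willingness_to_publish(k, expected)
           for k in ks]
mean_published = sum(k * w for k, w in zip(ks, weights)) / sum(weights)
print(f"average case count among published studies: {mean_published:.1f}")
# Comes out well above the true expectation of 10, i.e. an apparent relative
# risk well above 1.0 created purely by the decision of what to publish.
</pre>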
 
 
To put it another way, researchers would expect to find about 10 cases of toenail cancer in a batch of 10,000 people. However, as the number is random, they could find any number. They have to find a minimum of about 16 cases before they will publish their data. So in reviewing publications one would expect to see an average of 15.8 cases, not 10. This is not due to any underlying linkage between toenail cancer and passive drinking, but simply due to the selectivity applied in choosing what to publish.
 
 
Brignell comments: "The headlines will, of course, say something like Passive drinking causes a 58% increase in toe-nail cancer whereas the result is entirely spurious. This is a major reason for never accepting RRs of less than 2.0."
 
 
Brignell's comments can be found here. [http://www.numberwatch.co.uk/publication_bias.htm]
 
 
== The Extreme Value Fallacy ==
 
 
Brignell states: "If you take a number of samples of a random variable and put them in order of magnitude, the extreme values are the largest and smallest. These extreme values exhibit special distributions of their own, which depend on the distribution of the original variate and the number of ranked samples from which they were drawn. The fallacy occurs when the extremes are treated as though they were single samples from the original distribution."
 
 
For example, choose some men and calculate their average height. You will find that about half of them are taller than that average. Pick out the tallest. Marvel at how much taller he is than the average.
 
 
For example, divide some people into groups based on their month of birth. Work out how many people in each group have toenail cancer. Calculate the average number across the groups. Pick out the highest. Marvel at how much bigger it is than the average. Publish a newspaper report: "People born in July are more likely to get toe-nail cancer". Hypothesise about the causes for this. Suggest that toenail cancer might be linked to sunlight, and that people should stay indoors during summer months. This form of the fallacy is very common, and is referred to as the birth month fallacy. Brignell notes that it "recurs in the media several times a year."
 
 
For example, divide some people into groups based on their star sign. Work out how many people in each group are involved in car accidents over a set period. Calculate the average number across the groups. Pick out the biggest. Marvel at how much bigger it is than the average. Suggest that Capricorns are more likely to be involved in car accidents.
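A short simulation (my own illustration, with made-up numbers) shows how reliably the fallacy "works":

<pre>
# Split people at random into 12 groups, count a rare condition in each group,
# and the biggest group will always look impressively far above the average.
import random

random.seed(2)
groups = 12
people_per_group = 10_000
incidence = 1 / 1000          # a rare condition, with no real group effect at all

counts = [sum(random.random() < incidence for _ in range(people_per_group))
          for _ in range(groups)]
average = sum(counts) / groups
print("counts per group:", counts)
print(f"average {average:.1f}, maximum {max(counts)} "
      f"({max(counts) / average:.0%} of the average)")
# The maximum is an extreme value, not a typical sample, so comparing it with
# the group average always produces an apparent excess with no cause behind it.
</pre>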
 
 
Brignell asks: "A survey in the village pub established that ordinary punters find this sort of thing ludicrous, so why do journalists and even self-styled scientific journals fall for it?"
 
 
Brignell states "extreme values exhibit special distributions of their own, which depend on the distribution of the original variate and the number of ranked samples from which they were drawn." They are not binomially distributed, yet some researchers blithely apply statistics based on the binomial distribution to them. Brignell provides a worked example of this sort of abuse here.
 
[http://www.numberwatch.co.uk/2001%20August.htm#extreme]
 
 
Brignell's comments on the extreme value fallacy can be found here. [http://www.numberwatch.co.uk/extreme_value_fallacy.htm]
 
 
== Computer Modelling ==
 
 
Brignell states that computer modelling "is one of the most powerful tools available to science and engineering and, like all powerful tools, it brings dangers as well as benefits. Andrew Donald Booth said 'Every system is its own best analogue'. As a scientist he should, of course, have said in what sense he means best. The statement is true in terms of accuracy but not in terms of utility. If you want to determine the optimum shape for the members of a bridge structure, for example, you cannot build half a dozen bridges and test them to destruction, but you can try large numbers of variations in a computer model. Computers allow us to optimise designs in ways that were unavailable in times past. Nevertheless, the very flexibility of a computer program, the ease with which a glib algorithm can be implemented with a few lines of code and the difficulty of fully understanding its implications can pave the path to Cloud Cuckoo Land."
 
 
Brignell identifies the main hazards of computer models as follows:
 
*Assumptions: "At almost every stage in the development of a model it is necessary to make assumptions, perhaps hundreds of them. These might or might not be considered reasonable by others in the field, but they rapidly become hidden. Some of the larger models of recent times deal with the interactions of variables whose very nature is virtually unknown to science."
 
*Auditability: "In olden times, if a scientist published a theory, all the stages of reasoning that led to it could be critically examined by other scientists. With a computer model, it is possible, within a few days of development, for it to become so complex that it is a virtual impossibility for an outsider to understand it fully. Indeed, where it is the result of a team effort, it becomes unlikely that any individual understands it."
 
*Omissions: "Often vital elements can be left out of a model and the effect of the omissions is only realised if and when it is tested against reality. A notorious example is the Millennium Bridge in London. It was only after it was built and people started to walk on it that the engineers realised that they had created a resonant structure. This could have been modelled dynamically if they had thought about it. Some models that produce profound political and economic consequences have never faced such a challenge."
 
*Subconscious: "The human subconscious is a powerful force. Even in relatively simple physical measurements it has been shown that the results can be affected by the desires and expectations of the experimenter. In a large computer model this effect can be multiplied a thousandfold. Naturally, we discount the possibility of deliberate fraud."
 
*Sophistication: "This word, which literally means falsification or adulteration, has come to mean advanced and efficient. In large computer models, however, the literal meaning is often more applicable. The structure simply becomes too large and complex for the inputs that support it."
 
*Testability: "When we were pioneering the applications of computer modelling about forty years ago, we soon came to the conclusion that a model is useless unless it can be tested against reality. If a model gives a reasonably accurate prediction on a simple system then we have reasonable, but not irrefutable, grounds for believing it to be accurate in other circumstances. Unfortunately, this is one of the truisms that have been lost in the enthusiasms of the new age."
 
*Chaos: "Large models are often chaotic, which means that very small changes in the input variables produce very large changes in the output variables." Errors present in the input variables are magnified in the output. If feedback mechanisms are present, "it is quite possible for systems to operate on the noise alone."
 
 
Brignell comments: "Many of the computer models that receive great media coverage and political endorsement fail under some of these headings; and, indeed, some fail under all of them. Yet they are used as the excuse for profound, and often extremely damaging, policies that affect everyone. That is why computer models are dangerous tools."
 
 
Brignell's comments on computer models can be found here. [http://www.numberwatch.co.uk/computer_modelling.htm]
 
 
Another problem seen in some models is the presentation of input assumptions as predictions of the model. The researcher produces a theory and wishes to test it. He crafts a model to implement his theory, and uses it to make predictions. He then looks to see if the predictions match the real world situation. If they do, they support the accuracy of his model; if the model proves accurate, it in turn provides support for his theory. If the predictions are wrong, the researcher changes his model and tries again. If the model cannot be made to produce accurate predictions, he must look for other ways to support his theory. The predictions of the theory are not predictions of the model, rather they are design constraints of it. If the model does not produce results that meet these design constraints, then it is wrong and must be changed so that it does.
 
 
A classic example of this lies in models used to support theories about the long term effects of global warming. A researcher theorises that one such effect will be a rise in sea level. He then attempts to support this theory by modelling the prediction that there will be a rise, using it to produce data which can be tested against real world data. If there is agreement, then he can use the accuracy of his model as evidence to support his theory. If there isn't, he must change his model in an effort to find such agreement. If he cannot make the model agree he must ultimately abandon his theory. The model shows that the sea level will rise; it was written to do so; the fact that it does so is not of itself evidence that the sea level will indeed rise!
 
 
There is a simple way to ascertain whether a prediction belongs to the theory or to the model: ask, "If the model doesn't support the prediction, will the researcher change the model so that it does?" If the answer is no, the prediction came from the model; if yes, it came from the researcher.
 
 
== Poisson ==
 
 
If some process generates random numbers, those numbers will be distributed in a predictable way. The function that describes their distribution is called their probability distribution. The most familiar of these is the Normal (Gaussian) distribution; counts of independent events follow the Binomial distribution, which for large samples is closely approximated by the Normal. Some sources of random numbers are distributed differently. In order to do statistics on a source of random numbers it is vitally important to know what the underlying distribution is. Applying statistical tests designed to be used with Normally distributed data to data distributed in a different way is a great way to generate nonsense.
 
 
The Poisson distribution is a simplification of the Binomial distribution that applies in extreme cases: both are based on the probability of a single event happening, and if this probability is very small the Poisson distribution applies. "It is important because most of the cases for which statistical support is needed are necessarily concerned with small probabilities, i.e. relatively rare events."
 
 
A feature of the Poisson distribution is that the standard deviation is the square root of the mean. This makes it easy to work out how many extra results are needed to achieve significance without doing massive calculations, because roughly 95% of results fall within two standard deviations of the mean and over 99% within three.
 
 
For example, assume a study, based on 10,000 people, investigating a rare disease (toenail cancer) that strikes 1 in 1000. This is a small probability, so the Poisson distribution applies. The expected number of cases is 10, of which the square root is 3.16.
 
*Twice that is 6.32, so for significance at (P<0.05) you must find at least 17, or at most 3, people with toenail cancer.

*Thrice that is 9.48, so for significance at (P<0.01) you must find either none at all or at least 20 people with toenail cancer.

If there are 14 people with toenail cancer, the result cannot be significant even at (P<0.05).
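A minimal sketch (mine, for illustration) checking these thresholds with the square-root rule:

<pre>
# Two- and three-sigma significance bands for a Poisson count with mean 10.
import math

expected = 10                                 # 1-in-1000 condition, 10,000 people
sd = math.sqrt(expected)                      # Poisson: sd is the root of the mean

for label, multiple in (("P<0.05 (two sigma)", 2), ("P<0.01 (three sigma)", 3)):
    low = expected - multiple * sd
    high = expected + multiple * sd
    print(f"{label}: significant only below {low:.1f} or above {high:.1f} cases")
# P<0.05: below 3.7 or above 16.3, i.e. at most 3 or at least 17 cases;
# P<0.01: essentially 0 or at least 20 cases. A count of 14 is nowhere near.
</pre>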
 
 
It's interesting to look at the relative risks (RR) associated with these results. For significance at (P<0.05) you have to find at least 17 people with toenail cancer, which equates to an RR of 1.7, or "Passive drinking increases the likelihood of getting toenail cancer by 70%!" For (P<0.01) the minimum number of cases needed is 20, which equates to an RR of 2.0, or "Passive drinking doubles the chances of getting toenail cancer!" Note that a relative risk as small as 1.25 could never show up as significant, because the study is simply too small to detect it.
 
 
Brignell's much more rigorous explanation of the Poisson distribution can be found here. [http://www.numberwatch.co.uk/Poisson.htm]
 
 
== Relative Risk ==
 
 
If a researcher conducts a study seeking to link toenail cancer with passive drinking, and finds 16 people with toenail cancer amongst the passive drinkers when she expected to find 10, then the relative risk of toenail cancer associated with passive drinking is 1.6, or an increase of 60%.
 
 
Of course, this is not actually true. The RR of 1.6 is only the RR found by that one study. Because of the random nature of the data being dealt with, successive studies may find 12, 8, and 21 cases of toenail cancer, in which case the relative risks would be 1.2, 0.8, and 2.1 respectively. The ''real'' relative risk is much more difficult to uncover. If a headline states that a study has found that "passive drinking causes a 60% increase in toenail cancer", this does not mean that it actually does. It just means that one study found a relative risk of 1.6.
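A short simulation (illustrative numbers only) shows how far a single study's RR can stray from the true value of 1.0 when only about 10 cases are expected:

<pre>
# Repeat the "study" many times with no real effect present and look at the
# spread of relative risks that would be reported.
import random

random.seed(3)
expected = 10.0
study_size = 10_000
incidence = expected / study_size

reported_rrs = []
for _ in range(20):
    cases = sum(random.random() < incidence for _ in range(study_size))
    reported_rrs.append(cases / expected)
print("relative risks from 20 identical studies:",
      [f"{rr:.1f}" for rr in reported_rrs])
# Values scattered roughly between 0.5 and 1.7 are routine, even though the
# true relative risk here is exactly 1.0.
</pre>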
 
 
Actually, this is all a gross simplification. Brignell gives a more formal definition: "If X% of people exposed to a putative cause suffer a certain effect and Y% not exposed to the cause (or alternatively the general population) suffer the same effect, the RR is X/Y. If the effect is 'bad', then a RR greater than unity denotes a 'bad' cause, while an RR less than unity suggests beneficial cause (and likewise if they are both 'good'). An RR of exactly unity suggests that there is no correlation."
 
 
Brignell states: "most scientists [...] take a fairly rigorous view of RR values. In observational studies, they will not normally accept an RR of less than 3 as significant and never an RR of less than 2. Likewise, for a putative beneficial effect, they never accept an RR of greater than 0.5." Some scientists dispute this, and there are respected publications who will happily publish studies that do not meet these rigourous standards.
 
 
Brignell gives his reasons for his point of view as follows:
 
*"Even where there is no correlation, the RR is never exactly unity, since both X and Y are estimates of statistical variates, so the question arises as to how much deviation from unity should be acceptable as significant."
 
*"X and Y, while inherently unrelated, might be correlated through a third factor, or indeed many others (for example, age). Sometimes such confounding factors might be known (or thought to be known) and (sometimes dubious) attempts are made to allow for them. Where they are not known they cannot be compensated for, by definition."
 
*"Sometimes biases are inherent in the method of measurement employed."
 
*"Statistical results are often subjected to a chain of manipulations and selections which (whether designed to or not) can increase the deviation of the RR from unity."
 
*"Publication bias can give the impression of average RRs greater than 1.5 when there is no effect at all."
 
 
To these can be added Brignell's own experience as a former Professor of Industrial Instrumentation. But, in the end, as Brignell himself admits, the level of relative risk that should be accepted is a judgement call.
 
 
The field of epidemiology is one where the rigorous standards of significance that Brignell calls for are not generally accepted. This field has produced the following headlines in recent years:
 
 
*Pets "double children's risk of asthma attacks" (''The Independent'', 5 June 2001) Relative Risk: 2.0
 
*Keeping pets 'prevents allergies' (''BBC News'', 27 May 2001) Relative Risk: 2.0 [http://news.bbc.co.uk/1/hi/health/1352145.stm]
 
 
*Regular painkiller use linked to cancer (''BBC News'', 10 September 1999) [http://news.bbc.co.uk/1/hi/health/443250.stm]
 
*Pain killers prevent cancer (''BBC News'', 8 April 2003) Relative Risks: 1.21, 1.28, and 1.49 [http://news.bbc.co.uk/1/hi/health/2928343.stm]
 
*Aspirin linked with 30% increase in cancer (January 2004) Relative Risk: 1.3
 
 
*Soy sauce cancer warning (''BBC News'', 20 June 2001) [http://news.bbc.co.uk/1/hi/health/1399042.stm]
 
*Another Study Showing Soy Fights Cancer (University of Missouri, December 2001) [http://www.agweb.com/get_article.asp?pageid=82237&newscat=GN&src=gennews]
 
 
*Hormone Studies: What Went Wrong? (''New York Times'', 22 April 2003) "How could two large high-quality studies come to diametrically different conclusions about menopause, hormone therapy and heart disease?" Relative Risks: 1.3 and 1.4 [http://www.susanlovemd.com/community/flashes/in-the-news/news030422.htm]
 
 
Brignell suggests that "If just one such contradiction occurred in a branch of real science there would be the immediate calling of an international conference to sort it out. As with cold fusion, laboratories all over the world would attempt to replicate the results.[...] in the field of epidemiology, such conflicts are accepted as normal."
 
 
Brignell's comments on relative risk can be found here: [http://www.numberwatch.co.uk/RR.htm]
 
 
== Data Dredging ==
 
 
If a researcher conducts a statistical test at a significance level of (P<0.05), she is accepting that there will be a 1 in 20 chance of getting a false positive, that is, of getting a bogus result. If she conducts a study that requires 100 such tests, she can expect to get five bogus results, and has a probability of 99.4% of getting at least one. She can then publish these bogus results. Brignell calls this sort of activity "data dredging".
 
 
One way to do this would be to conduct a study of the usage of 10 substances, checking to see if each is a possible factor in the incidence of each of 10 diseases. Another would be to break the data into categories based on different criteria. For example, 2 sexes, 5 age ranges, 4 different putative factors, and 3 different dosage factors make for a total of 120 separate tests. When the study is published, the researcher will seek to downplay the number of tests actually carried out. This can be done by focussing on the areas that produced results and leaving out or glossing over the bits that didn't.
 
 
A third and popular form of data dredge is to set up a database containing results on hundreds of putative causes and effects. Each combination can then be checked as a separate experiment, in the knowledge that about one in every 20 combinations will produce a publishable result. And each result can be presented as a separate study.
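A minimal simulation of such a database dredge (my own sketch, with invented exposure and disease rates) shows how reliably "publishable" links appear in data that contain none:

<pre>
# Dredge a database of 10 putative causes and 10 effects in which, by
# construction, nothing is related to anything else: 100 tests at P<0.05.
import random, math

random.seed(4)
people, causes, effects = 1_000, 10, 10
exposed = [[random.random() < 0.5 for _ in range(people)] for _ in range(causes)]
diseased = [[random.random() < 0.1 for _ in range(people)] for _ in range(effects)]

def two_proportion_z(cause, effect):
    """z statistic comparing disease rates in exposed vs unexposed groups."""
    a = [e for c, e in zip(cause, effect) if c]        # exposed
    b = [e for c, e in zip(cause, effect) if not c]    # unexposed
    p1, p2 = sum(a) / len(a), sum(b) / len(b)
    p = (sum(a) + sum(b)) / (len(a) + len(b))
    se = math.sqrt(p * (1 - p) * (1 / len(a) + 1 / len(b)))
    return (p1 - p2) / se

significant = sum(abs(two_proportion_z(c, e)) > 1.96
                  for c in exposed for e in diseased)
print(f"'significant' links found at P<0.05: {significant} out of "
      f"{causes * effects} tests")
# Typically around 5 spurious links, each one a potential headline.
</pre>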
 
 
Besides this fundamental statistical flaw, Brignell suggests that there are other problems inherent in the results of data dredging:
 
 
*"The data gathered are often of an anecdotal nature. They are often in areas where people lie to themselves, let alone to others, especially with the modern pressures of political correctness (e.g. alcohol, tobacco or fat intake). People like to be helpful and give the answers they think questioners want to hear. There is little or no checking as to whether the recorded 'facts' are true." For example, people who have given up smoking may claim that they have never smoked.
 
 
*"People are often equally vague about their illnesses. They might be either hypochondriacs or ostriches." Thus a vague description of some symptoms can be either diagnosed as a particular disease, or ignored because of its vagueness, depending on the requirements of the researcher.
 
 
*"Many of the questions put to subjects are vague to the point of absurdity, such as what people have in their diet. Can you remember what and in what quantity you ate last week?"
 
 
*"Questioners are liable to add biases of their own in recording responses." For example, recording a vague response as a more definite one. Or, if there are multiple data gathers, each may apply different standards to the data they gather.
 
 
*"Some of the variables are then calculated indirectly from these vague data, such as the intake of a particular vitamin being estimated from the reported diet, or factors such as deprivation and pollution being estimated from the postal code."
 
 
*"There are often indications of reverse causality (e.g. people might have abandoned a therapy through illness, rather than become ill because of abandoning the therapy)."
 
 
*"Often researchers disguise the fact that a result is part of a data dredge and present it as a single experiment, but they sometimes inadvertently let the cat out of the bag by mentioning other variables."
 
 
*"The subjects of the data base are often unrepresentative of the population at large, e.g. they might be restricted to the medical professions or co-religionists." Kinsey's landmark studies of human sexuality are often taken as being representative of the general population, but were actually based mostly on a mixture of the responses of prisoners and of those who attended Kinsey's lectures.
 
 
Brignell's comments on data dredging can be found here. [http://www.numberwatch.co.uk/data_dredge.htm]
 
 
== Premature Termination ==
 
 
Assume that you conduct a study to determine the effects of passive drinking on toenail cancer. The latter affects 1 in 1000 people, your significance level is (P<0.05), and your study covers 10,000 people, so you need to find at least 17 cases of toenail cancer in passive drinkers before you can declare that there is a link. Let's say that the data for this study are accumulated over time, perhaps during the course of a year. The question then arises: when do you find these cases? Do they occur at regular intervals over the year? Do they all come in a clump at the end? Or at the beginning? The answer is that their appearance is random, and so they can appear at any time.
 
 
Let us say that your study is going to find 14 cases of toenail cancer in passive drinkers, that 8 of these cases will occur in the first three months of the study, and the other 6 in the final nine months. The conclusion you would have to come to after the year has passed is that passive drinking is not a factor in the incidence of toenail cancer. Suppose, however, that you were to stop the study after three months. You could then predict that the study was actually going to find 32 cases, which would be very significant. You could of course justify your action by claiming that it is unethical to subject people to passive drinking when your study shows that it causes toenail cancer.
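A minimal sketch of this scenario, using the two-sigma rule from the Poisson section (my own illustration):

<pre>
# Compare the running case count with the scaled two-sigma threshold at an
# early stopping point and at the end of the study.
import math

expected_per_year = 10.0

def significant(cases, months_elapsed):
    expected = expected_per_year * months_elapsed / 12
    return cases > expected + 2 * math.sqrt(expected)

print("after 3 months, 8 cases:",
      "significant" if significant(8, 3) else "not significant")
print("after 12 months, 14 cases:",
      "significant" if significant(14, 12) else "not significant")
# Stopped at three months the study looks significant (and extrapolates to a
# dramatic 32 cases a year); run to completion it finds nothing at all.
</pre>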
 
 
During the course of a study, it is quite normal for the study to wander between significance and insignificance. The practice of monitoring it so as to be able to stop the study at a point that suits the researcher best is dodgy to say the least.
 
 
Brignell's comments on this practice can be found under the headings "Number of the month 1.0" [http://www.numberwatch.co.uk/2002%20July.htm#monthnumber], "From the scaremongers' manual" [http://www.numberwatch.co.uk/2005%20March.htm], and "Dangerous and destructive nonsense!" [http://www.numberwatch.co.uk/2002%20July.htm].
 
  
 


Brignell has self-published two books debunking the mathematics behind media scares, ''Sorry, wrong number!'' and ''The epidemiologists: Have they got scares for you!'', under the name Brignell Associates.

== Brignell's Point of View ==

Brignell is a trained mathematician and scientist who has spent much of his life working with statistics. His point of view can be summed up as follows:

*A theory is only as good as the evidence that supports it. Theories supported by weak evidence should not be given much credence.
*Evidence drawn from mathematics is only as sound as the underlying mathematics. This is particularly true in the field of statistics.
*Most people are not qualified to judge the soundness of mathematics presented as evidence. They instead rely on the testimony of experts. Many people who present themselves as experts are either themselves not qualified to judge, or are self-interested.
*The result is that much evidence is presented as sound when it is not. Consequently many theories are given a great deal more credence by most people than they deserve.

Brignell has made it his task to seek out media articles where conclusions are drawn from unsound mathematical evidence and debunk them on his website.

A number of popular, politically correct theories are based on unsound mathematical evidence. In choosing to debunk such theories, Brignell occasionally provokes the wrath of self-interested supporters. He seems to enjoy this.

== DDT ==

On his website Brignell rails against what he calls the "deadly legacy of Rachel Carson" in curtailing the use of DDT. The Environmental Protection Agency, he wrote, "and its allies used their influence with international organisations to enforce the ban throughout the world. Some poor countries were actually blackmailed into banning it under threat of withdrawal of aid. As a result two and a half million people die of malaria every year, most of them poor children in Africa." [4]

Brignell claimed that the resurgence of malaria in Sri Lanka was a case in point: Sri Lanka, he said, banned DDT in 1964 "under the influence of the sainted Rachel", whose book ''Silent Spring'' was published in 1962. Brignell notes that the number of cases of malaria had been reduced to 17 in 1963 before rising once more to 2.5 million by the end of the decade.

The author of the Deltoid blog, Tim Lambert, took issue with Brignell's and others' claims. "Now when you think about it, the story that they tell just isn't credible. If DDT spraying had almost eliminated malaria, and they got a new outbreak, then no environmentalists would be able to stop them from resuming spraying," he wrote. On investigating, Lambert found that Sri Lanka did restart spraying with DDT, but that the target mosquito had grown resistant to it.

"So in 1977 they switched to the more expensive malathion and were able to reduce the number of cases to about 50,000 by 1980. In 2004, the number was down to 3,000, without using DDT," he wrote. "And the reason why they stopped spraying in 1964? It wasn't environmentalist pressure. With only 17 cases in 1963, they didn't think it was needed any more ... The anti-environmentalist version of what happened is a hoax. That doesn't mean that all the writers above were being deliberately misleading: they might be just repeating what another anti-environmentalist wrote and be unaware of the true story," he wrote. [5]

Brignell responded to Tim Lambert's blog entry as follows: "Tim Lambert supplies a reference that is certainly well worth reading, which suggests that DDT was abandoned rather than banned. It is possible that the Government in Sri Lanka, when it decided its budget priorities, was unaware of the international hype surrounding Rachel Carson at the time, but whatever the motivation the result was the same." [6]

Brignell and his ideological supporters believe that the use of DDT, by reducing the number of deaths per year from malaria from 2.8 million in 1948 to 18 in 1963, had by 1970 saved about 56 million lives. Brignell states that the number of deaths due to malaria "makes The Holocaust look like a dress rehearsal." However, their argument willfully omits a number of important facts.

For starters, Rachel Carson herself was not opposed to all pesticide use. Prophetically, she worried that widespread agricultural use of pesticides would endanger efforts to control malaria, typhus and other diseases. In ''Silent Spring'', she wrote, "No responsible person contends that insect-borne disease should be ignored. The question that has now urgently presented itself is whether it is either wise or responsible to attack the problem by methods that are rapidly making it worse. The world has heard much of the triumphant war against disease through the control of insect vectors of infection, but it has heard little of the other side of the story - the defeats, the short-lived triumphs that now strongly support the alarming view that the insect enemy has been made stronger by our efforts. Even worse, we may have destroyed our very means of fighting." Carson noted that the widespread use of DDT created selection pressure that led to the emergence of DDT-resistant mosquitoes and flies.

As science writer Laurie Garrett notes in her 1994 book, ''The Coming Plague: Newly Emerging Diseases In a World Out of Balance'', the early success of efforts to control malaria contributed to the disease's later resurgence. The effort to control the disease was led by malariologist Paul Russell, who promised in 1956 that a multimillion-dollar effort could eliminate the disease by 1963: "Thus, in 1958 Russell's battle for malaria eradication began, backed directly by $23.3 million a year from Congress. Because Russell had been so adamant about the time frame, Congress stipulated that the funds would stop flowing in 1963. ... It was a staggering economic commitment, the equivalent of billions of dollars in 1990." By 1963, however, malaria "had indeed reached its nadir. But it had not been eliminated. ... But a deal's a deal. Russell promised success by 1963, and Congress was in no mood to entertain extending funds for another year, or two. As far as Congress was concerned, failure to reach eradication by 1963 simply meant it couldn't be done, in any time frame. And at the time virtually all the spare cash was American; without steady infusions of U.S. dollars, the effort died abruptly." Worse yet,

Thanks to the near-eradication effort, hundreds of millions of people now lacked immunity to the disease, but lived in areas where the Anopheles [mosquitos that carry the disease] would undoubtedly return. Pulling the plug abruptly on their control programs virtually guaranteed future surges in malaria deaths, particularly in poor countries lacking their own disease control infrastructures. As malaria relentlessly increased again after 1963, developing countries were forced to commit ever-larger amounts of scarce public health dollars to the problem. India, for example, dedicated over a third of its entire health budget to malaria control. ...
At the very time malaria control efforts were splintering or collapsing, the agricultural use of DDT and its sister compounds was soaring. Almost overnight resistant mosquito populations appeared all over the world. ... By the time the smallpox campaign was approaching victory in 1975, parasite resistance to chloroquine and mosquito resistance to DDT and other pesticides were so widespread that nobody spoke of eliminating malaria. (''The Coming Plague'', pp. 46-52)

== Counting the dead ==

In the discussion that followed a study published in ''The Lancet'' estimating the number of people killed in Iraq since the invasion, Brignell wrote that the use of a relative risk figure of 1.5 was inappropriate. "A relative risk of 1.5 is not acceptable as significant," he wrote. [7] Brignell argues that increases in risk of less than 100% should be ignored.

However, as Tim Lambert pointed out, the increased risk is statistically significant. "You won't find support for Brignell's claim in any conventional statistical text or paper. To support his claim he cites a book called Sorry, wrong number!. Trouble is, that book was written by? John Brignell. Not only that, it was published by? John Brignell," he wrote. It should be noted that Brignell does provide a number of supporting statements from recognised authorities. [8]

Lambert suggests a thought experiment:

Suppose we had perfect records of every death in Iraq and there were 200,000 in the year before the invasion, and 300,000 in the year after. Then the relative risk would be 1.5 and Brignell would dismiss the increase as not significant even though in this case we have absolute certainty that there were 100,000 extra deaths. [9]

This is what is called a "straw man" argument, where you make an unrelated claim and then attack it as though it were the original claim. The statistics in these surveys are not drawn from hundreds of thousands of results but from dozens, and the samples are not "perfect records"; they are samples from populations, which means they are subject to random variation. So the "thought experiment" is obviously invalid.

In fact, the journal ''Science'' published in 1995 a list of published risks for cancer from the previous 8 years; among those were:

*Smoking more than 100 cigarettes in a lifetime: RR 1.2 for breast cancer (February 1990)
*Lengthy occupational exposure to dioxin: RR 1.5 for all cancers (January 1991)
*Regular use of high-alcohol mouthwash: RR 1.5 for mouth cancer (June 1991)
*Use of phenoxy herbicides on lawns: RR 1.3 for malignant lymphoma in dogs (September 1991)
*Weighing 3.6 kilograms or more at birth: RR 1.3 for breast cancer (October 1992)
*Occupational exposure to electromagnetic fields: RR 1.38 for breast cancer (June 1994)
*Ever having used a sun lamp: RR 1.3 for melanoma (November 1994)
*Abortion: RR 1.5 for breast cancer (November 1994)
*Consuming olive oil only once a day or less: RR 1.25 for breast cancer (January 1995)

(Sizing Up the Cancer Risks, ''Science'' 269, p. 165, 1995). Although the validity of some of these relationships is still controversial, it is clear that "A relative risk of 1.5 is not acceptable as significant" is not true as far as scientific publication is concerned.

Publication, of course, is not the same as significance. The list above supports Professor Brignell's demonstration that publication bias alone can produce an apparent relative risk of approximately 1.6.

Attention should also be drawn to the publication of studies which produce contradictory results. For example, the Nurses' Health Study found that the use of hormone treatments reduced the risk of cancer (relative risk 1.3), whereas the Women's Health Initiative found that the same hormone treatments increased the risk of cancer (relative risk 1.4). (Gina Kolata, Hormone Studies: What Went Wrong?, ''The New York Times'', 22 April 2003). These results are not due to faults in the studies themselves, but rather to the acceptance of low levels of relative risk as significant.

== The hole in the ozone layer ==

In another article Brignell complained about restrictions imposed in the U.K. on people being able to dump old refrigerators, due to concerns about the release of ozone-depleting gases. "It is now illegal to dispose of both the coolant and the insulant in fridges, but in Britain there is no legal way of doing it. All because of a hole in the ozone layer that was probably always there and an unproven theory as to how it was caused," he wrote. [10]

Once more Lambert challenged Brignell's claim and cited Antarctic scientific data. "It is perfectly clear that the hole was not always there. There is not one scrap of evidence to support Brignell's claim. Yet even when confronted with the evidence that proves his claim is false he continues to maintain that it is true," Lambert wrote. [11]

== Second Hand Smoke ==

Bob Carroll (author of the ''Skeptic's Dictionary'') initially accepted Brignell's argument against the EPA's finding that second-hand smoke caused lung cancer. However, he changed his mind when he found that the "scientific principle" (that relative risks of less than 2 should be disregarded) that Brignell used to reject the finding was not recognized by epidemiologists, only by tobacco companies.

== Books ==

*John Brignell, ''Sorry, wrong number!'', Brignell Associates, September 2000. ISBN 0-9539108-0-6
*John Brignell, ''The epidemiologists: Have they got scares for you!'', Brignell Associates, July 2004. ISBN 0-9539108-2-2

== Contact details ==

Department of Electronics & Computer Science,
University of Southampton,
Highfield, Southampton SO17 1BJ
Telephone: (01703) 594450
FAX: (01703) 592901
Email: jeb AT numberwatch.co.uk
Web: http://www.numberwatch.co.uk
