How scientists cheat

Models, Hypotheses and Logic in Science

[...]"the logic of science," said John Stuart Mill, "is also that of business and life," and science, said T. H. Huxley, is "organised common sense." Indeed, scientific philosophy does produce much the same conclusions as common sense.

Faced with unexplained observations, a scientist is advised to devise a model. In some fields this model can be a physical object but, on many other occasions, the word model is interchangeable with hypothesis. In the philosophy of science the two words have somewhat different meanings but here the distinction is unimportant. If a hypothesis successfully predicts the outcome of many critical experiments, then it is proved beyond reasonable doubt and has become a theory.

The term "beyond reasonable doubt" again brings out the analogies between scientific and legal investigations. Scientific logic is the logic of investigation and decision making everywhere. No theory is ever actually proved; it is only not disproved. Likewise, lawyers use the phrase "beyond reasonable doubt" to recognise that the guilt of a defendant is never proved with absolute certainty. Guilt is proved only beyond reasonable doubt.


A model is a set of axioms or postulates which, it is thought, might fairly describe the nature of the phenomenon being studied. Model building is like using an intellectual version of a child's construction kit; scientists gather a set of axioms and concepts (the component parts of a hypothesis), assemble them into a model and compare its behaviour with that of nature. Models are valuable because they can be used to predict the outcome of experiments, and scientists compare these predictions with observation. They may discard a new model immediately if it fails to predict existing results. More usefully, the predictions of a model will guide the experimenter's hand, enabling him to design investigations to differentiate two or more opposing ideas. The model(s) failing to predict the outcome of the test are discarded in favour of those that do.

Philosophers of science point to overarching or general models, called paradigms: ideas that are very wide-ranging and provide the framework for the formation of many more specific models. An example might be Newton's mechanics, a paradigm whose ideas are contained in many narrower models from fields as diverse as atomic theory and cosmology.

Classic Scientific Logic

There is no more to science than its method, and there is no more to its method than Popper has said. Hermann Bondi (Quoted by Magee (1973))

Model building is the classic description of scientific method expounded at length by Karl Popper in his famous books The Logic of Scientific Discovery (1968) and Conjectures and Refutations (1972). His approach, often called the hypothetico-deductive method, is accepted as a major feature of scientific logic. Popper is often thought to have regarded falsification as the centre of scientific logic but this is an error. To him falsification was extremely important and the elaboration of this principle was his own major contribution. However, he also held that all ideas, even his own, could and should be subject to reasoned, rational criticism. This principle of critical rationalism originated in ancient Greece, not with Popper, but to him it, not falsification, was the central scientific principle. Thus, it is necessary to be clear about the meaning of these two words, rationality and criticism.

The philosophy of rationality is the philosophy of the Enlightenment. It originated much earlier but was elaborated in the 17th and 18th centuries by Descartes, Spinoza, Leibnitz and others in response to the growing success of science. Rationalism incorporates the principles of logic and certain ideas about the universe. It holds, for example, that there is only one single reality, hence that a person cannot simultaneously hold two contradictory beliefs about the world. It follows that to assert one theory is to simultaneously reject all competing theories. To assert otherwise is, in the strict meaning of the word, irrational. Further, it holds that a rational belief must be based on sufficient reason and that a rational believer should proffer reasons that are sufficient to justify holding his view. Rationality asserts that, to hold any belief, one must equally accept all the logical deductions that flow from it. The process of testing ideas by experiment depends on this principle; it leads to the conclusion that inconsistent experimental results undermine a theory.

Rationalism does contain different streams of thought, one split being into subjective and objective rationality. The latter is exemplified by Popper and asserts that the external world is real and that science seeks that reality. Objective rationality is the traditional system and remains the foundation of science; it rejects all authorities other than observation and reason but does accept that no certain conclusions can ever be drawn. Subjective rationalists include pragmatists and naturalists, who note that lack of certainty and conclude that ultimate reality must reside in humans themselves - their motives, objectives and beliefs. The subjective/objective distinction was made by Horkheimer, The Eclipse of Reason (1947), who attacked subjective philosophies, noting how they can rationalise any act, for example, "I have to consider my own best interests," or "I was just following orders". Thus subjective rationality can maintain bizarre social practices, such as witchcraft, or become the tool of authoritarian social attitudes. Such social impacts led Horkheimer to reject all subjective rationality, adding that the "denunciation of what is currently called reason is the greatest service reason can render." Both in science and elsewhere, people who use the word rationality normally mean objective rationality.

We come now to the meaning of criticise - to find fault with. This is a word that does have quite negative overtones but finding fault is exactly what scientists are asked to do with theories - hypothesis testing is a negative logic. However, they are not asked to give just any criticism; it should be rational, reasoned criticism. The three practical characteristics of such criticism were summed up by Bertrand Russell (1935, p66) in his description of reason, "in the first place it relies upon persuasion rather than force; in the second place it seeks to persuade by arguments which the man using them believes to be completely valid; and in the third place it uses observation .... as much as possible and intuition as little as possible." The first of these rules out the use of inquisitorial methods, the second rules out the use of propaganda and the third rules out appeals to the emotions or self-interest of the audience.

The implication of this is that critically rationalist debate requires certain behaviours from participants, generally that they be seriously seeking the truth. Thus, they must present all arguments they believe to be valid and may only present arguments they believe to be valid; both facts and opinions must be reported honestly. To enable criticism, such presentations must be open and available to all. A further facet of critical rationalism is "the principle of sufficient reason": decisions are not made arbitrarily but must be founded on reasons that are stated and adequate to justify the verdict.

Critically rational debate in science involves relevant experiment, and the last idea surviving after a period of such debate becomes knowledge. We can never be sure that a piece of knowledge is true, because a better idea or contrary observation may come along later. Nevertheless such knowledge is the closest we can come to knowing external reality. Because doubt can always be expressed, it is often useful to think of knowledge as a contrast concept to a guess (Harré (1972)). Knowledge is the product of a rationally considered choice between alternative hypotheses, rather than choosing between them by guesswork. Thus, one may not randomly choose two alternatives from three, then conduct a rational debate to decide which of these two is correct. Such a mixing of rationality with irrationality is simply irrational.

These principles of critical rationalism generate the ethical imperatives of science. Popper suggested that they separate random ideas from knowledge, pseudoscience from science; modern scientists agree. It is evident that many human dialogues are not critically rationalist. In many situations the aim of participants in dialogue is to "win," whatever that may mean in their circumstances. Accordingly, in Popper's hands, critical rationalism became more than a scientific principle; he saw it as the alternative to all authoritarianism and it guided his political thinking. To him these principles underlay the freedom of speech and democracy upon which western society prides itself. Science is often held up as a bastion against authoritarianism because of this.

Today Popper's ideas are widely accepted, so much so that they are offered as advice to prospective research students. For example, Phillips & Pugh (1987) begin their advice to students by demolishing an older scientific philosophy, the idea that science starts with the gathering of disparate facts by entirely objective and dispassionate researchers:-

The myth of scientific method is that it is inductive: that the formulation of scientific theory starts with the basic raw evidence of the senses - simple unbiased unprejudiced observation. Out of these sensory data, commonly referred to as "facts" - generalizations will form. The myth is that from a disorderly array of factual information an orderly, relevant theory will somehow emerge. However the starting point of induction is an impossible one.

They point out that even scientists are human and begin with their own prejudices:-

There is no such thing as an unbiased observation. Every act of observation is a function of what we have seen or otherwise experienced in the past. All scientific work of an experimental or exploratory nature starts with some expectation about the outcome. This expectation is an hypothesis. They provide the initiative and incentive for the enquiry and influence the method. It is in the light of an expectation that some observations are held to be relevant and some irrelevant, that one methodology is chosen and others discarded, that some experiments are conducted and others are not. Where is your naive pure and objective researcher now?

Then, crucially, they go on - all scientists start with a hypothesis, a model, but they must never think they have proved it - they must try to disprove it :-

Hypotheses arise by guesswork, or by inspiration, but having been formulated they can and must be tested rigorously, using the appropriate methodology. If the predictions you make as a result of deducing certain consequences from your hypothesis are not shown to be correct then you must discard or modify your hypothesis. If the predictions turn out to be correct then your hypothesis has been supported and may be retained until such time as some further test shows it not to be correct. Once you have arrived at your hypothesis, which is a product of your imagination, you then proceed to a strictly logical and rigorous process, based upon deductive argument - hence the term "hypothetico-deductive".

Prejudices may govern how a hypothesis is created but it is illegitimate to display the same prejudice when comparing its predictions with data. A scientist should permit criticism of his ideas and accept disproofs, even of his own models, when they are there.

Probable and Improbable Hypotheses

Not all models are equal. Apart from well thought out concepts, a whole range of improbable or downright silly notions could be created to account for a set of observed results - Heath Robinson could have worked on scientific theories had he so chosen. How one model is chosen for test, and another deemed silly, is for the judgement of scientists but the verdict should not be random. Intuition, guesswork, prejudice, analogy or any other thought process may help conceive a model but, once devised, there is little reason for the judgement of its reasonableness to be personal and absolutely none for the interpretation to be inexplicable or secret. Scientists can articulate the reasons to consider one model, while dismissing another. There are analogous situations.

[...]Great scientists may be distinguished by their insight into how to eliminate unworkable models. This is scientific strategy but it is a phase of reasoning almost never recorded. During their training, scientists do not read books explaining the principles used to reduce the number of hypotheses to be considered. Even so, practising scientists must surely use such principles, possibly subconsciously. Analysis of this thinking is quite disparate. Most thought has been due to philosophers of science, with their demarcation criteria, and to sociologists of science, who simply ask the workers concerned. In both cases their studies are little read by practising scientists; some will be reviewed later. It is strange that this stage of reasoning is so little recorded. Not only is it perfectly possible to make a record but, at times, scientists have an evident duty to do so.

[...]Three Stages of Scientific Method

The hypothetico-deductive method can be seen as requiring three phases in scientific thought. These phases are -

1. Laying down, or brainstorming, of all possible explanations of an observation. As many hypotheses as possible can be created here as this gives the best chance of the "correct" model being among those considered. The inclusion of incorrect models should be unimportant.

2. A judgement- or strategy-based screening of the various models to decide between those worthy of being tested and those that can be discarded on some general principle - some demarcation criterion. For this stage to work, it should be regarded as permissible to criticise the ideas put forward in stage 1. The models surviving this stage are likely to be those for which a reasonable a priori (or prima facie) case can be made.

3. Test of surviving models against empirical observation, either by reference to available data, or by designing new and critical experiments.

The three stages need not be executed consecutively. A new hypothesis may be advanced at any time, even after attempts have been made to test other hypotheses. No theory is ever proved. All theories are open to challenge and criticism may be advanced at any time.

Moreover, there is no reason why a new hypothesis should not be proposed by anybody, including people not deemed to be "expert". Non-experts, people without considerable training, would find it difficult to produce a theoretical novelty that could not be dismissed by reference to established experimental data or a demarcation criterion. Even so, there is no logical barrier to them doing so. The task of criticising theories seems easier than that of devising them and may well be within the capabilities of non-experts but, in practice, the difficulty of the task is not the only fence an amateur would have to jump. Even if his new theory, or his criticisms, met all scientific criteria, the non-scientist may not be listened to by professionals. Even well-established scientists find it difficult to get new theories heard against earlier alternatives.

Of the three stages, generally only the third is found in the scientific literature. The processes going on during the first two stages are rarely recorded. This is unfortunate as the agenda of science, its operational timetable, is laid down during those earlier periods. The exclusion of a concept from that agenda is just as important as the inclusion of another, and more capable of invalidating scientific conclusions. Exclusion, at any stage, is equivalent to saying a theory is wrong. No experiment can ever be done without some form of screening process having been performed but the scientific literature explains these stages only after the event or, more probably, does not explain them at all. When it does, the presentation is a sanitised representation of what may have been a messy process.

To put it another way, and more baldly, it is during those first two stages of a scientific programme that decisions are made as to how research funds will be allocated. In the real world, those decisions largely prejudge the outcome of scientific inquiry, yet there is little study of their formation and only the most opaque of records.

Gatekeepers and the Management of Science

Whatever system of philosophy is adopted, science poses certain unavoidable management problems. Its fields are highly specialised and proper, effective decisions depend upon access to technical knowhow. Such expertise is normally available only from the scientists themselves. To ensure such knowledge is available during administrative decisions, certain scientists are appointed to decision making positions involving, for example, deciding what projects should receive research funds, which individuals will be appointed or promoted, or what papers will be published. The scientists chosen for these roles have often distinguished themselves in some way and are the elite of science. These gatekeepers play a key role in scientific management.

Scientific gatekeepers decide what is, or is not, science. Their corporate decisions define science in an administrative and practical way, marking out the area of human endeavour called science. Something in the nature of gatekeeping exists for all subcultures and the role is a key and often very powerful one. Most professional subcultures try to select gatekeepers so as to avoid their having any personal vested interest in the decisions they will take. However, science is different in this regard. Because of its highly technical nature, science selects its gatekeepers solely from the field being gatekept. As a result, virtually every gatekeeping decision in science is taken by an individual with a very definite self-interest in its outcome. Also, there is almost no definition of gatekeeping responsibilities and virtually no public accountability for the way gatekeepers discharge their roles. Scientific gatekeeping decisions are taken anonymously; even those affected by a decision are kept ignorant of the identity of the person who made it and the rationale he used.[...]

It is most disturbing. The gatekeepers of a field are its existing experts. They can exclude views, not merely because those views lack sense, but simply because they "disagree" with them, and in this context "disagree" can have a range of meaning running from "disagree," through "can't reply," to "I'm jealous." In "disagreeing", gatekeepers can and do turn their back on reasoned explanations. This administrative state of affairs flies in the face of Popperian logic, the principles of critical rationalism, openness and freedom of speech. In effect, science is subject to authoritarian government by gatekeepers.

Chronological Order Dictates Merit

It seems that what matters about a theory is not whether it is right or wrong but whether it was proposed first, second or third etc. (Who proposed it also matters; if the innovator is himself already a gatekeeper things are different.) The first theory in a field is advocated by its first workers. Those workers are taken to be experts. New hypotheses are assessed, anonymously and without accountability, by the same men who, now acting in the role of gatekeeper, have a vested interest - an interest in thwarting any ideas that threaten to replace those from which their own influence flows. Those "experts" have complete freedom to reply to the alternative in a rationalist way, to simply ignore or patronise the upstart idea, or perhaps even to steal it. If a good argument is available to rebut an alternative theory, they will no doubt present it in their reply. But even if the newly developed theory is plainly superior, the "expert" gatekeeper is in no way obliged to accept or even consider it. New theories can simply be stifled by gatekeeper disinterest.[...]

Weakness of the Hypothetico-deductive Method

Popper's basic idea, of model (or hypothesis) falsification based on critical rationalism and its concomitant antiauthoritarianism, is the accepted base of scientific logic. It is a testing protocol linking scientific ideas to experimental reality. This link, connecting theory, through experiment, to reality, is the reason for the great success of science as a philosophy but it is not a perfect link - it has weaknesses. The main problem is in the early phases of the process. Firstly, science makes almost no record of how it decides which models or theories it should test. Secondly, and compounding the first problem, in the real world scientific judgement is clouded by the personal subjectivities and deviations of scientists themselves. Thus it is the initial development and selection of models to be tested, a process not necessarily linked to experiment at all, that remains the major logical difficulty inherent in the paradigm of falsification.

Robert K. Merton enunciated principles of scientific ethics which included Universalism, the belief that ideas must be considered without regard for their origins or who proposed them; this is implicit in Popper's logic. However, that cannot mean all theories must be translated into experiment; that would be impractical. To put it baldly, again, the problem is how to decide which research projects to fund, especially when sociological observation indicates that the advice given by scientists themselves is hampered by personal subjectivities and deviations from logic. It is necessary to have some ground, some demarcation criterion, to decide before experiment which theories are most likely to be correct.

In law, similar problems can arise. On the basis of the law and the evidence before him, a judge must often try a case but be unsure of the right decision. If the case is a criminal case, the benefit of this doubt will go to the defendant. In a civil action a judge may be forced to take some kind of practical line. He does not have the luxury of scratching his head for ever; he must decide on the balance of probability. He will need to find a rationale, even if it is not perfectly logical. This may lead the judge to error but it is unlikely it will lead him to fraud - he must give an open account of his judgement and explain the case and how it relates to the law. If he gets these things wrong his judgement is subject to appeal. What is more, a judge should never try a case in which there is any hint of a personal interest.

In one role, a scientist can scratch his head and vacillate between two theories for ever, or stick to a wrong theory purely to save face. There will always be some argument to put. Set against a great mass of often conflicting experimental data, no opposing scientific theory will ever be completely perfect. But gatekeepers are the judges of science and for scientists in this role things are different; at the end of the day they must decide. When they go home at night, they must have made funding decisions, or job appointment decisions, or publication decisions. They must decide - whether or not they are sure. A rationale must be found even if it is not perfectly logical. However, although he is forming a judgement, the gatekeeper is not in nearly the same position as a judge. He is not subject to the discipline of explaining his decisions or recounting any scientific law or principle. What is more, he would not be deciding the issue at all unless he had a vested interest in its outcome. For the gatekeeper the temptation to follow the easy route of his interests or relativism must be very real.

In these circumstances problems arise, more for everyone else than for the gatekeeper. There are logical approaches, demarcation criteria, for selecting without experiment those theories most likely to be valid and therefore to reward funding. But how can anyone be sure the gatekeeper follows them? The observer is in a predicament. Strictly, the problem should be addressed by the administrators of science but, [...] they are content. That is not surprising - they are the gatekeepers.

Reducing the number of models - Demarcation Criteria

We will now turn to the question of which hypotheses are scientific: how to choose, from a range of possibilities, those hypotheses that are worthy of attention and deserve to be pursued. Philosophers of science address this problem by laying down demarcation criteria. A new theory should then be tested against the chosen criterion. Those ideas which satisfy the demarcation criteria would be most likely to be productive and most attention would be paid to them. The following sections present a series of demarcation criteria, though the list may not be complete.


The main demarcation criterion associated with Popper is falsifiability - in order to be scientific, a hypothesis should be falsifiable - it should make predictions that can be tested by observation or experiment. By tested, Popper meant some of its predictions must be such that, at least in principle, the contrary could be observed. This was his primary demarcation criterion and was seen by him as very important. On this basis, for example, he criticised the various schools of psychiatric thought because each could accommodate all observations. As a result the ideas did not compete with one another and attempts to distinguish them could not be informative. This test separates the hypotheses inherent in an act of faith - religion for example - from a scientific hypothesis. The statement, "God created the heavens and the earth," cannot be contradicted by observation. Therefore, Popper would not see it as a scientific hypothesis, whether or not it is believed true.

The idea is that only models which can, in principle, be falsified are scientific - others need not be considered. It is useful to view this assertion from a different perspective. Popper is saying that, to be meaningful, a scientific theory must deny something. The idea must prohibit some observations from being made; this is extremely important because Popper's logic is purely negative: it asserts that the actual meaning of a theory lies not in what it asserts about the universe but in what it denies. Some philosophers go further, arguing that any statement has meaning only in what it denies. Thus, even a sentence as simple as, "this paper is white," actually means, "this paper is not, not white." That is, it is not green, not blue etc.

Falsifiability is the first example of a strategy, or general principle, for reducing the number of models. It is probably the most widely discussed demarcation criterion and shows at once that asserting a scientific theory is equivalent to denying alternatives.

Popper listed two other criteria besides falsifiability. Firstly, a good, new theory should, "proceed from some simple, new, and powerful unifying idea," (Conjectures and Refutations). It should, in principle, be able to unify a body of knowledge that would otherwise be a set of disparate facts. Secondly, Popper held that it should pass some tests. A good new theory should make at least one successful prediction not apparent from existing theory. This seems rather restrictive but is not as bad as it sounds. Popper would not have demanded that a theoretical astronomer build a radio telescope before publishing a new theory. Predictions explaining data within existing knowledge do meet this criterion.[...]

Metaphysical Logic and Scientific Logic

It is undesirable to believe a proposition when there is no ground whatever for supposing it true. (Bertrand Russell, Sceptical Essays)

The distinction between science and metaphysics matters because there seems to be a significant difference between the logics of metaphysics and science. Science seeks to disprove a hypothesis and a persistent failure to do so leads to its acceptance. This is the negative logic of falsification. Metaphysics is not quite like this; before the existence of a postulated entity should be accepted, there needs to be positive reason to require the existence in question. For example, the postulate of life on Mars is a postulate of existence. It may be believed or not but well-justified belief would require positive supportive evidence, such as Martian roses.

In laying down theories, scientists do not normally distinguish science from metaphysics. That may be unfortunate; much of the philosophical disputation between confirmation and elimination of theories might be removed if this were done. Metaphysical logic seems to be largely the positive logic of confirmation, while scientific logic seems largely the negative logic of falsification.

Popper's hypothetico-deductive model applies to the scientific parts of theories but not so obviously to their metaphysical elements. It is generally a very difficult, or even universally impossible, task to disprove a metaphysical postulate. Even though it seems very unlikely, it would be difficult to actually prove that there is no life on the moon.

However, it is only when a metaphysical idea has supportive evidence that it becomes important. As an example, consider the atomic theory of matter. As every schoolboy knows, the idea of atoms was originally advanced by the Greeks but in this form the idea was metaphysical speculation unsupported by evidence. The idea of atoms was merely a conjecture, unproven, unlinked to any body of experimental evidence, and irrelevant to any possible course of action. Agnosticism was a rational view of the debate about atoms until Dalton's chemical laws, based as they were on observation, began to require them for chemical interpretations. The observations that positively required atoms also made them relevant, and they began to influence men's actions. In the twentieth century, photographs of atoms have been obtained, and disbelief has become irrational.

In logic, then, you just cannot win. Theories need positive evidence for the entities whose existence they postulate. Then they need negative disproof of competing theories.[...]

Occam's Razor - the Coherence Criterion

A principle stated in correspondence by Dr. John Maddox, as Editor-in-Chief of Nature, is that a hypothesis should be "grounded on previous understanding or observation." To take his example, in the nineteenth century there might have been competing hypotheses about the make-up of the moon, one school of thought arguing the moon was made of rock, another advancing the view that it was green cheese. As he says, even without experiment intelligent scientists would not have considered the green cheese hypothesis, because it was founded upon no present knowledge or observation. There are other, rather trite, reasons to reject the green cheese model. Cheese is a dairy product made by men from milk, in turn produced by lactating mammals. The green cheese hypothesis implies that men and other mammals are at large within the solar system, giving it some very complex, improbable and unsupported implications.

The existence of such complex ramifications is a general reason for rejecting, or at least downgrading, a hypothesis without experiment. All this boils down to Occam's razor - hypotheses involving the least possible departure from the existing body of knowledge are most likely to be correct. Hypotheses that pick up well-established ideas from related areas inherit much of their supportive evidence, much as an organism inherits many characteristics from its evolutionary forebears.

Occam's razor is related to the idea of coherence with existing knowledge. To understand coherence one may think of all knowledge as being cut into a large number of small pieces, much like a jigsaw puzzle. To reassemble the picture we must examine a piece to see if the pattern on it fits in with, or coheres with, the pattern on those pieces we have already assembled in that area. For a new piece of knowledge to fit comfortably in place, the shape of knowledge painted onto it should form a continuous pattern with, or cohere with, surrounding pieces.

A new claim to knowledge or a hypothesis which fails to cohere with surrounding knowledge is an extraordinary claim. Its acceptance would demand the revision of knowledge within those surrounding areas and, consequently, its acceptance demands extraordinary evidence.

Coherence, or Occam's razor, is a well-known and important principle, but two important caveats should be stated. Firstly, the coherence criterion must be used with care and moderation; applied rigidly, it produces closed systems of thought. The pieces of the jigsaw already assembled may actually be in the wrong places. Secondly, the existing body of knowledge means exactly what it says, and knowledge is well-founded belief (Popper). The existing body of knowledge does not mean the existing body of hypotheses. To be of any real value, a new idea must compete with existing suppositions used to explain the same data set. It is diametrically wrong to demand of a new hypothesis that it be consistent with the ideas it sets out to replace.

Hypothesis Testing and Probability

Many years before Popper, Bayes investigated the branch of mathematics applied to formal hypothesis evaluation and now known as Bayesian statistics. A scientific investigation links experimental results with the probability assignments attached to particular hypotheses. Before any experimental test is performed, initial probabilities (known as antecedent probabilities) must be assigned to the various hypotheses. As experimental data become available, these antecedent probabilities are adjusted up or down depending on whether the observations support or do not support the corresponding hypothesis. The theorem used to adjust the probabilities is known as Bayes' theorem. Some fields can use the procedure quite formally. For example, in medical diagnostics, antecedent probabilities reflect the incidence of a disease in the population. In practical science Bayes' theorem has little formal use because of the general difficulty of giving objective numerical values to the antecedent probabilities. Accordingly, the theorem is neither stated nor used here. Even so, scientists must intuitively use Bayes' theorem, assigning antecedent probabilities by judgement.

Mathematicians have investigated the fallacies arising in Bayesian statistics, some of which help to clarify points made earlier. A hypothesis is meaningful only if it partitions the possibility space; for example, the hypotheses that a dice will fall as a five or as an odd number are both meaningful in that they can both be wrong - it may fall as a four. On the other hand, the hypothesis that the dice will fall with a number uppermost is not meaningful because all possible outcomes are numbers - the hypothesis cannot be falsified because it fails to partition the possibility space. This failure is what philosophers of science mean when a hypothesis is described as vacuous.
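The partition test can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the possibility space is taken to be the six faces of an ordinary die, a hypothesis is modelled as the set of outcomes it permits, and the names `OUTCOMES` and `partitions_space` are invented here rather than taken from the text. Treating a hypothesis that permits no outcome at all as failing the test is also an assumption of this sketch.

```python
# Possibility space: the six faces of an ordinary die (an assumption of this sketch).
OUTCOMES = frozenset(range(1, 7))


def partitions_space(allowed, outcomes=OUTCOMES):
    """Return True if the hypothesis splits the possibility space in two.

    `allowed` is the set of outcomes the hypothesis permits. The hypothesis
    partitions the space only if some possible outcome agrees with it and
    some other possible outcome would falsify it.
    """
    allowed = frozenset(allowed) & outcomes
    return bool(allowed) and allowed != outcomes


falls_as_five = {5}
falls_as_odd = {1, 3, 5}
falls_as_a_number = set(OUTCOMES)  # every face is a number, so nothing can falsify it

for name, hypothesis in [
    ("five", falls_as_five),
    ("odd number", falls_as_odd),
    ("a number", falls_as_a_number),
]:
    print(name, partitions_space(hypothesis))
```

In this model the first two hypotheses pass the test because a four would contradict them, while "a number" fails it because no face of the die could ever contradict it.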

A hypothesis may be "academic" (in a pejorative sense); whether it be true or not will make no difference to actions or beliefs flowing from the statistical analysis. The distinction is important for doctors making a diagnosis - only if two diseases require different treatment is the physician concerned to know which one his patient suffers from. Returning to the example of the dice, whether it falls as a five or not will affect my actions only if I am playing snakes and ladders or have some other link to this test. For most people, the outcome of throwing dice is academic and uninteresting. In science, this pejorative form of the word academic means that whether a hypothesis is true will have no effect on perceptions of the world or how people act.

Finally, note again that a hypothesis set should be well chosen and, without overlap, cover all possible explanations. It is hard, in science, to prove that a hypothesis set does entirely cover the possibility space. The proper response to this problem is to contemplate the possibility that all the considered hypotheses are wrong. It remains very wrong to use a hypothesis set that is known not to cover the possibility space.

Assessment of Antecedent Probabilities

Much of the intuitive Bayesian statistics used by practising scientists consists of the assignment of antecedent probabilities to any suggested hypotheses. This is the statistical equivalent of initial hypothesis screening [...]. If a hypothesis fails to cohere with existing knowledge, it is right to assign it a low antecedent probability. Only very clear evidence supporting it, and contradicting more cohering hypotheses, will bring its probability assignment up to a point where it would be accepted.

Invalid criteria such as relativism and self-interest will intrude on the intuitive assignment of antecedent probabilities. They will lead to the assignment of a low antecedent probability to a correct hypothesis and vice versa. However, unless the correct theory is actually assigned an antecedent probability of zero, this should only slow things down. The objective application of Bayes' theorem would steadily improve the probability assigned to the correct hypothesis as experimental data became available. (In Bayesian statistics, antecedent probabilities can, in principle, be assigned randomly but still ultimately produce good knowledge. This may be how some sciences arose from areas we would today classify as mythology. Alchemy, for example, led to chemistry and astrology to astronomy.) Only if the correct hypothesis is dismissed entirely will Bayesian statistics fail. If the antecedent probability assigned to a correct hypothesis is zero, Bayes' theorem will keep the probability at zero no matter what the outcome of experiment, and the remaining ideas will become a closed system of thought. This seems to be true of the intuitive Bayesian scientist, just as it is of the formal statistical process.
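The behaviour described above can be checked numerically. The following Python sketch is an illustration only and not part of the original argument: the two hypotheses, the likelihood values (0.8 and 0.3) and the function names `update` and `after_observations` are hypothetical choices made here. It applies Bayes' theorem repeatedly to a run of observations, each of which is more probable under the correct hypothesis than under its rival.

```python
def update(priors, likelihoods):
    """One application of Bayes' theorem over a set of rival hypotheses.

    priors:      {hypothesis: antecedent probability}, summing to 1
    likelihoods: {hypothesis: probability of the observation if it is true}
    Returns the adjusted probabilities after the observation.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("observation is impossible under every hypothesis")
    return {h: p / total for h, p in joint.items()}


# Hypothetical numbers: each observation is likelier under "correct" than "rival".
LIKELIHOODS = {"correct": 0.8, "rival": 0.3}


def after_observations(prior_correct, n):
    """Probability of the correct hypothesis after n supporting observations."""
    probs = {"correct": prior_correct, "rival": 1.0 - prior_correct}
    for _ in range(n):
        probs = update(probs, LIKELIHOODS)
    return probs["correct"]


# Compare a fair prior, a badly prejudiced prior and outright dismissal.
for prior in (0.5, 0.01, 0.0):
    print(prior, [round(after_observations(prior, n), 3) for n in (0, 5, 10, 20)])
```

Under these assumed numbers a prejudiced antecedent probability of 0.01 only delays acceptance of the correct hypothesis, whereas an antecedent probability of exactly zero is never revised, however many observations accumulate.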

Intuitive Bayesian statistics are applied both by individuals and by the community of scientists. Both levels will assign intuitive antecedent probabilities to hypotheses and both, being human, will err. [...] In general, the scientific community is too willing both to assign a probability of zero to dissenting ideas and to assign a probability of one to its own beliefs.

The Origins of Uncertainty

It is universally accepted, and implied by the use of probability theory in hypothesis testing, that no scientific theory can be known, with total certainty, to be true. Scientific certainty is lost in two general ways - uncertainty in the outcome of experiments and uncertainty in their interpretation. Our certainty in the outcome of experiments is greatly increased by care in their execution and by repetition by other groups or on analogous systems. Unfortunately, these measures hardly improve our confidence in the interpretation of the results.

Clearly, repeat experiments and studies on related systems have a role in ensuring the validity of results, but there are also structural and social reasons for such studies. If an experiment is cheap, quick and already within the laboratory's range, it is quite easy to perform a series of studies around a theme. Moreover, results that accord with earlier data are theoretically uncontroversial and, if the field already understands a technique, other workers are less likely to obstruct publication by raising queries about the validity of the observations. Thus, a large body of publication can quickly accumulate that hinges round one basic experiment.

For purposes of interpretation it is important to realise that, for all its size, that body of papers only amounts to one experiment. Failing to recognise this is to act like the man Wittgenstein mentions in Philosophical Investigations, who purchases several copies of the morning paper to reassure himself that what he reads there is true. Committing this fallacy is both a common individual fault and also structurally embedded in modern scientific administration. Of course, scientists do not buy many copies of their morning paper, but they do publish many copies of the same, or very similar, experiment; then they point to the "mountain of evidence" supporting their ideas.

Experiments report reality much as newspapers report news. The hypothesis used to explain their outcome is the impression of reality they give. Like a newspaper article, the scientific observations may be clear and accurate, or misleading and inaccurate. Because observations may be inaccurate, they need to be reported in a way that enables other workers to replicate them. Because the observation may be misleading, even though accurate, the generated hypothesis should be confirmed by data which is as unrelated to the original observations as possible. Reverting to Wittgenstein's analogy, his man would have been well advised to read another newspaper, one which employed a different reporter who, himself, employed different sources for the news he reported.

This point has been made by many philosophers of science; for example, in the nineteenth century, Whewell adopted it as a criterion of induction, referring to it as the consilience of inductions. Although we no longer think there is a logic of induction, his point remains valid as a means of increasing our confidence in a theory. On the same lines, Popper asserted that a hypothesis supported by data of two or more distinctly different types should be preferred to an alternative able to explain only a narrow domain of data.

In summary, repetition offers confidence that the published data are accurate, but those scientists who believe that repetition of data can support ideas are buying too many copies of the morning paper. No matter how often an experiment, or one of its close siblings, is repeated - in one hundred papers or one thousand - repetition adds no assurance that any particular interpretation of that result into a hypothesis is valid. If another idea will explain the data from one such experiment, then it will equally apply to any number of repetitions. Assurance of interpretation can come only by comparing the success of competing hypotheses in interpreting data from disparate areas. The more dissimilar the sources of data used the better, provided only that they do fall within the range of application of the hypotheses in question. Modern scientific administrations fail to recognise this fallacy, a failure linked closely to the procedures they use for quality assessment.
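The same conclusion can be expressed in the intuitive Bayesian terms used earlier. The Python sketch below is an illustration under stated assumptions, not a formal treatment: the two rival hypotheses H1 and H2, the likelihood values and the function name `posterior` are all hypothetical. It assumes that both hypotheses explain experiment A equally well and that the reported results are taken as accurate, so that repetition is being asked to support the interpretation and not the data.

```python
def posterior(prior_h1, likelihood_pairs):
    """Probability of hypothesis H1 against a single rival H2.

    prior_h1:         antecedent probability of H1, strictly between 0 and 1
    likelihood_pairs: one (P(result | H1), P(result | H2)) pair per reported
                      result, with every P(result | H2) greater than zero
    Bayes' theorem is applied in odds form: each result multiplies the odds
    on H1 by the ratio of the two likelihoods.
    """
    odds = prior_h1 / (1.0 - prior_h1)
    for p_h1, p_h2 in likelihood_pairs:
        odds *= p_h1 / p_h2
    return odds / (1.0 + odds)


# Hypothetical likelihoods. Both hypotheses explain experiment A equally well;
# only experiment B, drawn from an unrelated area, tells them apart.
EXPERIMENT_A = (0.9, 0.9)
EXPERIMENT_B = (0.9, 0.2)

one_paper = posterior(0.5, [EXPERIMENT_A])
thousand_papers = posterior(0.5, [EXPERIMENT_A] * 1000)
one_disparate_result = posterior(0.5, [EXPERIMENT_A, EXPERIMENT_B])

print(one_paper, thousand_papers, one_disparate_result)
```

With these assumed values a thousand repetitions of experiment A leave the probability of H1 exactly where one report left it, while a single result from a different kind of experiment shifts it.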

Quality Assessment - Peer Review and Citation Analysis

Science managers and gatekeepers base many policies and decisions on quality assessments. Consequently, how quality is defined, maintained and assessed is a pivotal issue for modern science - it is also one of the few areas in which scientific practice overlaps with scientific philosophy. In principle, assessment of quality in research programmes should include a rational assignment of the antecedent probability of the underlying ideas. In practice, however, the methods adopted simply abandon rationality and one of them jumps head first into Wittgenstein's fallacy, buying as many copies of the morning paper as leaders in a field might find convenient. Assessments are made at several levels, for example of :-

* Research projects before they are funded.

* The value of work before it is published in the scientific literature.

* The worth of researchers before they are appointed to posts.

These prospective evaluations are usually made by peer review. Referees, anonymous experts in the field, are selected by scientific authorities. The expert will then write a report, which is taken to be an objective evaluation of the work in question, but that report is unlikely to make any attempt at explanation and it may not be seen by the scientist concerned, who will have little or no opportunity to reply if he does see it. Besides these initial screening steps, post hoc assessments are also made of :-

* The "success" of published articles in terms of their scientific impact when set against competing articles.

* The "success" of published scientists in terms of their scientific impact when set against competing workers.

* The "status" of institutions and journals.

Sometimes such assessments are made by committees of experts but one of the most important tools used for the appraisal is citation analysis, a tool developed over the past twenty to thirty years.

A scientific paper does not stand alone; it builds on what has gone before, using other workers' ideas, techniques and results. To place the work in context, the scientific article ends with a list of relevant publications showing where the ideas it used came from[...]. These are citations and they interested an American named Eugene Garfield. His Institute for Scientific Information (ISI) notes every scientific paper published and, from their citation lists, constructs a computer database called the Science Citation Index (SCI). Scientists can use the SCI to find all papers citing any earlier article. It has proved to be a very valuable research tool, enabling workers to research a topic forward through the literature, whereas traditional abstracting media permitted only a backwards search.

The SCI is also used in quality assessments. Using it, one can easily determine how often, or whether, a paper is cited by subsequent publications, a process called citation analysis. The argument is that rarely-cited studies cannot have been very important. In making this count, the ISI itself carefully avoids the term "quality", preferring to call the resulting measure the "impact" of a paper, but scientific institutions do take this impact as a measure of quality. Journals and institutions can also be ranked according to the impact of articles published during a given period. Journals even tout their impact rating when advertising to libraries for sales or soliciting the scientific community for new papers.

This way of assessing quality means that the citation practices of authors influence the assessment of the work done by their contemporaries and colleagues. If a scientific theory is not mentioned by establishment figures, and the articles which propose it are not cited by them, the theory is automatically assessed as of low quality, even if no reason for disregarding it has been given. By contrast, if scientists go to great lengths to rebut an incorrect theory, that theory will be assessed as being of high quality, even if most observers regarded the theory as absurd from the outset.
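The effect can be made concrete with a toy citation count. The Python sketch below is purely illustrative: the citation records, paper names and the function `impact` are hypothetical, and the simple tally is an assumption of this sketch, not a description of how the ISI actually computes its measures. Its only purpose is to show what follows when citations are counted without regard to why they were made.

```python
from collections import Counter

# Hypothetical citation records: (citing paper, cited paper, stance).
CITATIONS = [
    ("P1", "absurd_theory", "rebuts"),
    ("P2", "absurd_theory", "rebuts"),
    ("P3", "absurd_theory", "rebuts"),
    ("P4", "mainstream_theory", "supports"),
    ("P5", "mainstream_theory", "supports"),
    # No paper cites "ignored_theory" at all.
]


def impact(citations):
    """Count citations per cited paper, ignoring the stance of each citation."""
    return Counter(cited for _citing, cited, _stance in citations)


scores = impact(CITATIONS)
for paper in ("absurd_theory", "mainstream_theory", "ignored_theory"):
    print(paper, scores[paper])
```

In this toy data set the much-rebutted theory scores highest and the uncited theory scores zero, although no reason for disregarding it appears anywhere in the record.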

Whatever its value as a management tool, quality assessment by citation analysis is clearly prone to Wittgenstein's fallacy. Moreover, its practical implications for the assessment of theories are clear. Under that process, ignoring, or not citing, a theory is the same as rejecting it. For their part, scientists are well aware of the quality assessment procedures used and the implications of their actions. When a scientist disregards a theory, he knows the result this will have for its assessment and presumably intends that outcome. In short, a scientist who chooses to ignore a theory is broadcasting a message about that theory - namely that he rejects the theory as of low quality. The message thus broadcast may be implicit, but the scientist knows it is sent, he knows who receives it and he knows how they will interpret and act upon it.

Both citation analysis and peer review are highly questionable as methods of quality assessment and amount to little more than statements of establishment opinion[...].

© Copyright John A Hewitt.

[Source: John A Hewitt - A Habit of Lies - How Scientists Cheat :]