Uncomfortable science, as identified by statistician John Tukey,[1][2] comprises situations in which there is a need to draw an inference from a limited sample of data, where further samples influenced by the same cause system will not be available. More specifically, it involves the analysis of a finite natural phenomenon for which it is difficult to overcome the problem of using a common sample of data for both exploratory data analysis and confirmatory data analysis. This leads to the danger of systematic bias through testing hypotheses suggested by the data.
A typical example is Bode's law, which provides a simple numerical rule for the distances of the planets in the Solar System from the Sun. Once the rule has been derived, through the trial and error matching of various rules with the observed data (exploratory data analysis), there are not enough planets remaining for a rigorous and independent test of the hypothesis (confirmatory data analysis). We have exhausted the natural phenomena. The agreement between data and the numerical rule should be no surprise, as we have deliberately chosen the rule to match the data. If we are concerned about what Bode's law tells us about the cause system of planetary distribution then we demand confirmation that will not be available until better information about other planetary systems becomes available.
See also
- Cosmic variance for an extreme example of this phenomenon
- Data mining
Bibliography
- Diaconis, P. (1985). "Theories of data analysis: from magical thinking through classical statistics". In Hoaglin, D.C; et al. (eds.). Exploring Data Tables Trends and Shapes. Wiley. ISBN 0-471-09776-4.
References
- ↑ Norel, R.; Rice, J. J.; Stolovitzky, G. (2011). "The self-assessment trap: Can we all be better than average?". Molecular Systems Biology. 7: 537. doi:10.1038/msb.2011.70. PMC 3261704. PMID 21988833.
- ↑ Hoaglin, D.C; et al. (eds.). Exploring Data Tables Trends and Shapes. Wiley. ISBN 0-471-09776-4.
Much of science also falls under John Tukey's label "uncomfortable science," because real repetition is not feasible or practical.