University of Jinan, Quancheng College
Graduation Thesis: Translation of Foreign-Language Material

Statistical hypothesis testing

Adriana Albu, Loredana Ungureanu
Politehnica University Timisoara, adrianaa@aut.utt.ro
Politehnica University Timisoara, loredanau@aut.utt.ro

Abstract

In this article we present statistical hypothesis testing, including the Bayesian approach. We cover the theory of testing and the testing process, discuss the use and importance of hypothesis testing in the real world, and note cautions for conducting a successful test.

Key words

Bayesian hypothesis testing; Bayesian inference; Test of significance

Introduction

A statistical hypothesis test is a method of making decisions using data, whether
from a controlled experiment or an observational study (not controlled). In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level. The phrase "test of significance" was coined by Ronald Fisher: "Critical tests of this kind may be called tests of significance, and when such tests are available we may discover whether a second sample is or is not significantly different from the first." [1]

Hypothesis testing is sometimes called confirmatory data analysis, in contrast to exploratory data analysis. In frequency probability, these decisions are almost always made using null-hypothesis tests. These are tests that answer the question: assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme
as the value that was actually observed? [2] More formally, they represent answers to the question, posed before undertaking an experiment, of what outcomes of the experiment would lead to rejection of the null hypothesis for a pre-specified probability of an incorrect rejection. One use of hypothesis testing is deciding whether experimental results contain enough information to cast doubt on conventional wisdom.

Statistical hypothesis testing is a key technique of frequentist statistical inference. The Bayesian approach to hypothesis testing is to base rejection of the hypothesis on the posterior probability. [3][4] Other approaches to reaching a decision based on data are available via decision theory and optimal decisions.

The critical region of a hypothesis test is the set of all outcomes which cause the null hypothesis to be rejected in favor of the alternative hypothesis. The critical region is usually denoted by the letter C.

One-sample tests are appropriate when a sample is being compared to the population from a hypothesis. The population characteristics are known from theory or are calculated from the population.

Two-sample tests are appropriate for
comparing two samples, typically experimental and control samples from a scientifically controlled experiment.

Paired tests are appropriate for comparing two samples where it is impossible to control important variables. Rather than comparing two sets, members are paired between samples so the difference between the members becomes the sample. Typically the mean of the differences is then compared to zero.

Z-tests are appropriate for comparing means under stringent conditions regarding normality and a known standard deviation.

T-tests are appropriate for comparing means under relaxed conditions (less is assumed).

Tests of proportions are analogous to tests of means (the 50% proportion).

Chi-squared tests use the same calculations and the same probability distribution for different applications:

- Chi-squared tests for variance are used to determine whether a normal population has a specified variance. The null hypothesis is that it does.
- Chi-squared tests of independence are used for deciding whether two variables are associated or are independent. The variables are categorical rather than numeric. Such a test can be used to decide whether left-handedness is correlated with libertarian politics (or not). The null hypothesis is that the variables are independent. The numbers used in the calculation are the observed and expected frequencies of occurrence (from contingency tables).
- Chi-squared goodness of fit tests are used to determine the adequacy of curves fit to data. The null hypothesis is that the curve fit is adequate. It is common to determine curve shapes to minimize the mean square error, so it is appropriate that the goodness-of-fit calculation sums the squared errors.

F-tests (analysis of variance, ANOVA) are commonly used when deciding whether groupings of data by category are meaningful. If the variance of test scores of the left-handed in a class is much smaller than the variance of the whole class, then it may be useful to study lefties as a group. The null hypothesis is that two variances are the same, so the proposed grouping is not meaningful.

The testing process

In the statistical literature, statistical hypothesis testing plays a fundamental role. The usual line of reasoning is as follows:

1. There is an initial research hypothesis of which the truth is unknown.
2. The first step is to state the relevant null and alternative hypotheses. This is important, as mis-stating the hypotheses will muddy the rest of the process. Specifically, the null hypothesis should be chosen in such a way that it allows us to conclude whether the alternative hypothesis can be accepted or remains undecided, as it was before the test. [9]
3. The second step is to consider the statistical assumptions being made about the sample in doing the test; for example, assumptions about the statistical independence or about the form of the distributions of the observations. This is equally important, as invalid assumptions will mean that the results of the test are invalid.
4. Decide which test is appropriate, and state the relevant test statistic T.
5. Derive the distribution of the test statistic under the null hypothesis from the assumptions. In standard cases this will be a well-known result. For example, the test statistic may follow a Student's t distribution or a normal distribution.
6. Select a significance level (α), a probability threshold below which the null hypothesis will be rejected. Common values are 5% and 1%.
7. The distribution of the test statistic under the null hypothesis partitions
the possible values of T into those for which the null hypothesis is rejected, the so-called critical region, and those for which it is not. The probability of the critical region is α.
8. Compute from the observations the observed value t_obs of the test statistic T.
9. Decide to either fail to reject the null hypothesis or reject it in favor of the alternative. The decision rule is to reject the null hypothesis H0 if the observed value t_obs is in the critical region, and to accept or "fail to reject" the hypothesis otherwise.

Use and Importance

Statistics are helpful in analyzing most collections of data. This is equally true of hypothesis testing, which can justify conclusions even when no scientific theory exists. Real world applications of hypothesis testing include [7]:

- Testing whether more men than women suffer from nightmares
- Establishing authorship of documents
- Evaluating the effect of the full moon on behavior
- Determining the range at which a bat can detect an insect by echo
- Deciding whether hospital carpeting results in more infections
- Selecting the best means to stop smoking
- Checking whether bumper stickers reflect car owner behavior
- Testing the claims of handwriting analysts
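
To make the testing process concrete, the following is a minimal sketch of steps 4 to 9 for a two-sided, one-sample Z-test, written in Python with only the standard library. It is an illustration under stated assumptions, not a method taken from the article: the function name one_sample_z_test, the sample values, the hypothesized mean mu0, and the known standard deviation sigma are all hypothetical and chosen only for the example.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample Z-test of H0: population mean == mu0.

    Assumes independent, normally distributed observations with a
    known standard deviation sigma (the "stringent conditions" of a Z-test).
    """
    n = len(sample)
    std_normal = NormalDist()  # distribution of the statistic under H0 (step 5)

    # Step 7: the critical region is |z| > z_crit; its probability under H0 is alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)

    # Step 8: observed value of the test statistic.
    z_obs = (fmean(sample) - mu0) / (sigma / sqrt(n))

    # Two-sided p-value: probability under H0 of a value at least as extreme as z_obs.
    p_value = 2 * (1 - std_normal.cdf(abs(z_obs)))

    # Step 9: reject H0 only if the observed value falls in the critical region.
    reject = abs(z_obs) > z_crit
    return z_obs, z_crit, p_value, reject


# Hypothetical measurements, invented for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.9, 5.5]
z_obs, z_crit, p_value, reject = one_sample_z_test(sample, mu0=5.0, sigma=0.5)
print(f"z_obs={z_obs:.3f}, z_crit={z_crit:.3f}, p={p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

Comparing the observed statistic with the critical value and comparing the p-value with α are equivalent decision rules here; the sketch returns both so that either can be inspected.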

Statistical hypothesis testing plays an important role in the whole of statistics and in statistical inference. For example, Lehmann (1992), in a review of the fundamental paper by Neyman and Pearson (1933), says: "Nevertheless, despite their shortcomings, the new paradigm formulated in the 1933 paper, and the many developments carried out within its framework continue to play a central role in both the theory and practice of statistics and can be expected to do so in the foreseeable future."

Significance testing has been the favored statistical tool in some experimental social sciences (over 90% of articles in the Journal of Applied Psychology during the early 1990s). [8] Other fields have favored the estimation of parameters. Editors often consider significance as a criterion for the publication of scientific conclusions based on experiments with statistical results.

Cautions

A successful hypothesis test is associated with a probability and a type I error rate. The conclusion might be wrong.
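
The caution about the type I error rate can be illustrated with a small simulation. The sketch below repeatedly draws samples from a population for which the null hypothesis is actually true and counts how often a two-sided Z-test at level α rejects it anyway. Everything specific here is an assumption made for the example: the function name type_one_error_rate, the standard normal population, the sample size, the number of trials, and the random seed.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(alpha=0.05, n=20, trials=5000, seed=0):
    """Estimate how often a two-sided Z-test rejects a TRUE null hypothesis.

    Every sample is drawn from N(0, 1), so H0: mean == 0 holds in all trials
    and each rejection is a type I error.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Known sigma = 1, so the standard error of the mean is 1 / sqrt(n).
        z_obs = fmean(sample) / (1.0 / sqrt(n))
        if abs(z_obs) > z_crit:
            rejections += 1
    return rejections / trials


rate = type_one_error_rate(alpha=0.05)
print(f"fraction of true null hypotheses rejected: {rate:.3f}")
```

Under these assumptions the printed fraction should be close to the chosen α, which is the sense in which even a correctly executed test reaches a wrong conclusion some of the time.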