ATLAS Offline Software
Functions

def ks_2samp (data1, data2, binned=False)
def chi2_2samp (data1, data2, normed=True, binned=True)
def BDM_2samp (data1, data2, normed=True, binned=True)
def CVM_2samp (data1, data2, normed=True, binned=False)
def _anderson_ksamp_midrank_binned (data, Z, Zstar, k, n, N)
def _anderson_ksamp_midrank (samples, Z, Zstar, k, n, N)
def _anderson_ksamp_right (samples, Z, Zstar, k, n, N)
def anderson_ksamp (data1, data2, binned=False, midrank=True)
def _log_fac (m, n)
def _log_binomial (n, k)
def _loglikelihood (data1, data2, N_data1, N_data2, ratio, H="H0")
def _zloglikelihood (data1, data2, N_data1, N_data2, H="H0")
def likelihoodratio_ksamp (data1, data2, normed=True, binned=True)
def likelihoodvalue_ksamp (data1, data2, normed=True, binned=True)

Variables

bool DEBUG = False
list statistic_seq = []
Ks_2sampResult = namedtuple('Ks_2sampResult', ('statistic', 'pvalue', 'ndf'))
Chi2_2sampResult = namedtuple('Chi2_2sampResult', ('statistic', 'pvalue', 'ndf'))
BDM_2sampResult = namedtuple('BDM_2sampResult', ('statistic', 'pvalue', 'ndf'))
CVM_2sampResult = namedtuple('CVM_2sampResult', ('statistic', 'pvalue', 'ndf'))
Anderson_ksampResult = namedtuple('Anderson_ksampResult', ('statistic', 'pvalue', 'ndf'))
LikelihoodRatio_ksampResult = namedtuple('LikelihoodRatio_ksampResult', ('statistic', 'pvalue', 'ndf'))
LikelihoodValue_ksampResult = namedtuple('LikelihoodValue_ksampResult', ('statistic', 'pvalue', 'ndf'))
def python.test_statistics._anderson_ksamp_midrank (samples, Z, Zstar, k, n, N)

private

Compute A2akN, equation 7 of Scholz and Stephens.

Parameters
----------
samples : sequence of 1-D array_like
    Array of sample arrays.
Z : array_like
    Sorted array of all observations.
Zstar : array_like
    Sorted array of unique observations.
k : int
    Number of samples.
n : array_like
    Number of observations in each sample.
N : int
    Total number of observations.

Returns
-------
A2aKN : float
    The A2aKN statistic of Scholz and Stephens 1987.

Definition at line 309 of file test_statistics.py.
def python.test_statistics._anderson_ksamp_midrank_binned (data, Z, Zstar, k, n, N)

private

Definition at line 288 of file test_statistics.py.
def python.test_statistics._anderson_ksamp_right (samples, Z, Zstar, k, n, N)

private

Compute A2akN, equation 6 of Scholz & Stephens.

Parameters
----------
samples : sequence of 1-D array_like
    Array of sample arrays.
Z : array_like
    Sorted array of all observations.
Zstar : array_like
    Sorted array of unique observations.
k : int
    Number of samples.
n : array_like
    Number of observations in each sample.
N : int
    Total number of observations.

Returns
-------
A2KN : float
    The A2KN statistic of Scholz and Stephens 1987.

Definition at line 361 of file test_statistics.py.
def python.test_statistics._log_binomial (n, k)

private

Definition at line 542 of file test_statistics.py.
def python.test_statistics._log_fac (m, n)

private

Definition at line 539 of file test_statistics.py.
def python.test_statistics._loglikelihood (data1, data2, N_data1, N_data2, ratio, H="H0")

private

Definition at line 548 of file test_statistics.py.
def python.test_statistics._zloglikelihood (data1, data2, N_data1, N_data2, H="H0")

private

Definition at line 564 of file test_statistics.py.
def python.test_statistics.anderson_ksamp (data1, data2, binned=False, midrank=True)

The Anderson-Darling test for 2 samples. It tests the null hypothesis
that the 2 samples are drawn from the same population without having to
specify the distribution function of that population. The critical
values depend on the number of samples.

Parameters
----------
samples : sequence of 1-D array_like
    Array of sample data in arrays.

Returns
-------
statistic : float
    AD statistic.
pvalue : float
    Two-tailed p-value.
ndf : NaN
    "Degrees of freedom": the degrees of freedom for the p-value.
    Always NaN in this case.

Raises
------
ValueError
    If fewer than 2 samples are provided, a sample is empty, or no
    distinct observations are in the samples.

See Also
--------
ks_2samp : 2 sample Kolmogorov-Smirnov test

Notes
-----
This code is modified from scipy.stats.anderson and extended to support
frequency data. See [1]_ for further information.

[2]_ defines three versions of the k-sample Anderson-Darling test: one
for continuous distributions and two for discrete distributions, in
which ties between samples may occur. The default of this routine is to
compute the version based on the midrank empirical distribution
function. This test is applicable to continuous and discrete data. If
midrank is set to False, the right-side empirical distribution is used
for a test for discrete data. According to [1]_, the two discrete test
statistics differ only slightly if a few collisions due to round-off
errors occur in the test not adjusted for ties between samples.

References
----------
.. [1] scipy.stats.anderson — SciPy Reference Guide
       http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson.html
.. [2] Scholz, F. W. and Stephens, M. A. (1987), K-Sample
       Anderson-Darling Tests, Journal of the American Statistical
       Association, Vol. 82, pp. 918-924.

Examples
--------
>>> from scipy import stats
>>> np.random.seed(314159)

The null hypothesis that the two random samples come from the same
distribution can be rejected at the 5% level because the returned test
value is greater than the critical value for 5% (1.961) but not at the
2.5% level. The interpolation gives an approximate significance level
of 3.1%:

>>> stats.anderson_ksamp([np.random.normal(size=50),
... np.random.normal(loc=0.5, size=30)])
(2.4615796189876105, array([ 0.325,  1.226,  1.961,  2.718,  3.752]), 0.03134990135800783)

The null hypothesis cannot be rejected for three samples from an
identical distribution. The approximate p-value (87%) has to be
computed by extrapolation and may not be very accurate:

>>> stats.anderson_ksamp([np.random.normal(size=50),
... np.random.normal(size=30), np.random.normal(size=20)])
(-0.73091722665244196, array([ 0.44925884,  1.3052767 ,  1.9434184 ,  2.57696569,  3.41634856]), 0.8789283903979661)
Definition at line 398 of file test_statistics.py.
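The examples above exercise the scipy API; the module's own two-sample wrapper returns a (statistic, pvalue, ndf) namedtuple and accepts binned input. A minimal usage sketch, assuming the module is importable as shown below (the actual import path depends on the Athena package layout) and using made-up histogram contents:

import numpy as np
from python import test_statistics as ts

# Two histograms with identical binning (observed frequencies per bin).
h1 = np.array([12., 45., 80., 52., 11.])
h2 = np.array([10., 50., 75., 55., 14.])

res = ts.anderson_ksamp(h1, h2, binned=True, midrank=True)
print(res.statistic, res.pvalue, res.ndf)  # ndf is always NaN for this test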
def python.test_statistics.BDM_2samp (data1, data2, normed=True, binned=True)

The Bhattacharyya test tests the null hypothesis that 2 given
frequencies are drawn from the same distribution.

Parameters
----------
data1, data2 : sequence of 1-D ndarrays
    Input data. Observed frequencies in each category.
normed : bool, optional, default: True
    If True, perform a shape comparison test; if False, a value
    comparison test.
binned : bool, optional, default: True
    If True, the input data are treated as observed frequencies; if
    False, as arrays of sample observations.

Returns
-------
statistic : float
    The Bhattacharyya distance measure.
pvalue : float
    Corresponding p-value.
ndf : int
    "Degrees of freedom": the degrees of freedom for the p-value. The
    p-value is computed using a chi-squared distribution with k - 1
    degrees of freedom, where k is the number of observed frequencies,
    excluding bins that have zero counts in both histograms.
Definition at line 196 of file test_statistics.py.
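For orientation, the core quantity behind this test is the Bhattacharyya distance between the two normalised histograms; the exact statistic-to-p-value mapping used by BDM_2samp lives in test_statistics.py and is not reproduced here. A minimal sketch of the distance itself:

import numpy as np

def bhattacharyya_distance(u, v):
    # Normalise both histograms to probability vectors (shape comparison).
    p = np.asarray(u, dtype=float)
    q = np.asarray(v, dtype=float)
    p /= p.sum()
    q /= q.sum()
    bc = np.sum(np.sqrt(p * q))   # Bhattacharyya coefficient, in (0, 1]
    return -np.log(bc)            # distance: 0 exactly when the shapes coincide

print(bhattacharyya_distance([12, 45, 80, 52, 11], [10, 50, 75, 55, 14]))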
def python.test_statistics.chi2_2samp (data1, data2, normed=True, binned=True)

The chi-square test tests the null hypothesis that 2 given frequencies
are drawn from the same distribution.

Parameters
----------
data1, data2 : sequence of 1-D ndarrays
    Input data. Observed frequencies in each category.
normed : bool, optional, default: True
    If True, perform a shape comparison test; if False, a value
    comparison test.
binned : bool, optional, default: True
    If True, the input data are treated as observed frequencies; if
    False, as arrays of sample observations.

Returns
-------
statistic : float
    The chi-squared test statistic.
pvalue : float
    Corresponding p-value.
ndf : int
    "Degrees of freedom": the degrees of freedom for the p-value. The
    p-value is computed using a chi-squared distribution with k - 1
    degrees of freedom, where k is the number of observed frequencies,
    excluding bins that have zero counts in both histograms.

Note
----
This code is modified from scipy.stats.chisquare and extended to
support the 2-sample case and the shape comparison test.
Definition at line 142 of file test_statistics.py.
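The textbook form of the two-sample (homogeneity) chi-square on binned data is sketched below; chi2_2samp may differ in details such as the normed handling, but the ndf bookkeeping matches the docstring (k - 1 after dropping bins empty in both histograms). chi2_2samp_sketch is a hypothetical name for illustration:

import numpy as np
from scipy import stats

def chi2_2samp_sketch(u, v):
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    mask = (u + v) > 0                 # drop bins empty in both histograms
    u, v = u[mask], v[mask]
    n1, n2 = u.sum(), v.sum()
    # Homogeneity chi-square for two unweighted histograms (shape test).
    stat = np.sum((n2 * u - n1 * v) ** 2 / (u + v)) / (n1 * n2)
    ndf = u.size - 1                   # k - 1, with k the bins kept above
    return stat, stats.chi2.sf(stat, ndf), ndf

print(chi2_2samp_sketch([12, 45, 80, 52, 11], [10, 50, 75, 55, 14]))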
def python.test_statistics.CVM_2samp (data1, data2, normed=True, binned=False)

Computes the Cramér-von Mises statistic on 2 samples/frequencies. This
is a two-sided test for the null hypothesis that 2 independent samples
are drawn from the same continuous distribution.

Parameters
----------
data1, data2 : sequence of 1-D ndarrays
    Input data. Can be either observed frequencies in each category or
    arrays of sample observations.
binned : bool, optional, default: False
    If True, the input data are treated as observed frequencies; if
    False, as arrays of sample observations.

Returns
-------
statistic : float
    CVM statistic.
pvalue : float
    Two-tailed p-value.
ndf : NaN
    "Degrees of freedom": the degrees of freedom for the p-value.
    Always NaN in this case.
Definition at line 240 of file test_statistics.py.
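As a reference point, the two-sample Cramér-von Mises criterion can be written in terms of the two empirical CDFs evaluated over the pooled sample. A minimal sketch of the statistic for unbinned input (the p-value mapping used by CVM_2samp is not reproduced here; scipy.stats.cramervonmises_2samp offers a full reference implementation):

import numpy as np

def cvm_2samp_statistic(x, y):
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n, m = x.size, y.size
    z = np.concatenate([x, y])                      # pooled observations
    fx = np.searchsorted(x, z, side='right') / n    # ECDF of sample 1 at z
    fy = np.searchsorted(y, z, side='right') / m    # ECDF of sample 2 at z
    # T = n*m/(n+m)^2 * sum over pooled points of (F_n - G_m)^2
    return n * m / (n + m) ** 2 * np.sum((fx - fy) ** 2)

rng = np.random.default_rng(0)
print(cvm_2samp_statistic(rng.normal(size=200), rng.normal(size=300)))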
def python.test_statistics.ks_2samp (data1, data2, binned=False)

Computes the Kolmogorov-Smirnov statistic on 2 samples/frequencies.
This is a two-sided test for the null hypothesis that 2 independent
samples are drawn from the same continuous distribution.

Parameters
----------
data1, data2 : sequence of 1-D ndarrays
    Input data. Can be either observed frequencies in each category or
    arrays of sample observations.
binned : bool, optional, default: False
    If True, the input data are treated as observed frequencies; if
    False, as arrays of sample observations.

Returns
-------
statistic : float
    KS statistic.
pvalue : float
    Two-tailed p-value.
ndf : NaN
    "Degrees of freedom": the degrees of freedom for the p-value.
    Always NaN in this case.

Notes
-----
This code is modified from scipy.stats.ks_2samp and extended to support
frequency data. See [1]_ for further information.

This tests whether 2 samples are drawn from the same distribution. Note
that, as in the case of the one-sample K-S test, the distribution is
assumed to be continuous. This is the two-sided test; one-sided tests
are not implemented. The test uses the two-sided asymptotic
Kolmogorov-Smirnov distribution.

If the K-S statistic is small or the p-value is high, then we cannot
reject the hypothesis that the distributions of the two samples are the
same.

References
----------
.. [1] scipy.stats.mstats.ks_2samp — SciPy Reference Guide
       http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.ks_2samp.html
.. [2] Frank C. Porter (2008), Testing Consistency of Two Histograms,
       arXiv:0804.0380 [physics.data-an].

Examples
--------
>>> from scipy import stats
>>> np.random.seed(12345678)  # fix random seed to get the same result
>>> n1 = 200  # size of first sample
>>> n2 = 300  # size of second sample

For a different distribution, we can reject the null hypothesis since
the p-value is below 1%:

>>> rvs1 = stats.norm.rvs(size=n1, loc=0., scale=1)
>>> rvs2 = stats.norm.rvs(size=n2, loc=0.5, scale=1.5)
>>> stats.ks_2samp(rvs1, rvs2)
(0.20833333333333337, 4.6674975515806989e-005)

For a slightly different distribution, we cannot reject the null
hypothesis at a 10% or lower alpha since the p-value, 0.144, is higher
than 10%:

>>> rvs3 = stats.norm.rvs(size=n2, loc=0.01, scale=1.0)
>>> stats.ks_2samp(rvs1, rvs3)
(0.10333333333333333, 0.14498781825751686)

For an identical distribution, we cannot reject the null hypothesis
since the p-value is high, 41%:

>>> rvs4 = stats.norm.rvs(size=n2, loc=0.0, scale=1.0)
>>> stats.ks_2samp(rvs1, rvs4)
(0.07999999999999996, 0.41126949729859719)
Definition at line 22 of file test_statistics.py.
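To illustrate the frequency-data extension mentioned in the Notes: on binned input the KS distance is simply the largest gap between the cumulative distributions built from the two frequency arrays. A minimal sketch under that assumption (ks_2samp_binned_sketch is a hypothetical name; the module's own p-value handling may differ):

import numpy as np
from scipy.stats import kstwobign

def ks_2samp_binned_sketch(h1, h2):
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    n1, n2 = h1.sum(), h2.sum()
    cdf1 = np.cumsum(h1) / n1           # empirical CDF at bin upper edges
    cdf2 = np.cumsum(h2) / n2
    d = np.max(np.abs(cdf1 - cdf2))     # KS statistic
    en = np.sqrt(n1 * n2 / (n1 + n2))   # effective sample size
    return d, kstwobign.sf(en * d)      # asymptotic two-sided p-value

print(ks_2samp_binned_sketch([12, 45, 80, 52, 11], [10, 50, 75, 55, 14]))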
def python.test_statistics.likelihoodratio_ksamp (data1, data2, normed=True, binned=True)

The likelihood-ratio test tests the null hypothesis that 2 given
frequencies are drawn from the same distribution.

Parameters
----------
data1, data2 : sequence of 1-D ndarrays
    Input data. Observed frequencies in each category.
normed : bool, optional, default: True
    If True, perform a shape comparison test; if False, a value
    comparison test.
binned : bool, optional, default: True
    If True, the input data are treated as observed frequencies; if
    False, as arrays of sample observations.

Returns
-------
statistic : float
    The likelihood-ratio test statistic.
pvalue : float
    Corresponding p-value.
ndf : int
    "Degrees of freedom": the degrees of freedom for the p-value. The
    p-value is computed using a chi-squared distribution with k - 1
    degrees of freedom, where k is the number of observed frequencies,
    excluding bins that have zero counts in both histograms.
Definition at line 581 of file test_statistics.py.
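The standard likelihood-ratio (G-test) construction for two histograms compares each bin to its pooled expectation under H0; the statistic is asymptotically chi-squared with k - 1 degrees of freedom, matching the docstring. A sketch of that textbook form (the module's _loglikelihood helper may differ in detail; lratio_2samp_sketch is a hypothetical name):

import numpy as np
from scipy import stats

def lratio_2samp_sketch(u, v):
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    mask = (u + v) > 0                  # bins empty in both carry no information
    u, v = u[mask], v[mask]
    n1, n2 = u.sum(), v.sum()
    eu = n1 * (u + v) / (n1 + n2)       # expected counts under H0 (sample 1)
    ev = n2 * (u + v) / (n1 + n2)       # expected counts under H0 (sample 2)
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = np.where(u > 0, u * np.log(u / eu), 0.0) \
              + np.where(v > 0, v * np.log(v / ev), 0.0)
    g = 2.0 * terms.sum()               # -2 ln(lambda), asymptotically chi2
    ndf = u.size - 1
    return g, stats.chi2.sf(g, ndf), ndf

print(lratio_2samp_sketch([12, 45, 80, 52, 11], [10, 50, 75, 55, 14]))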
def python.test_statistics.likelihoodvalue_ksamp (data1, data2, normed=True, binned=True)

The likelihood-value test tests the null hypothesis that 2 given
frequencies are drawn from the same distribution.

Parameters
----------
data1, data2 : sequence of 1-D ndarrays
    Input data. Observed frequencies in each category.
normed : bool, optional, default: True
    If True, perform a shape comparison test; if False, a value
    comparison test.
binned : bool, optional, default: True
    If True, the input data are treated as observed frequencies; if
    False, as arrays of sample observations.

Returns
-------
statistic : float
    The likelihood-value test statistic.
pvalue : float
    Corresponding p-value.
ndf : int
    "Degrees of freedom": the degrees of freedom for the p-value. The
    p-value is computed using a chi-squared distribution with k - 1
    degrees of freedom, where k is the number of observed frequencies,
    excluding bins that have zero counts in both histograms.
Definition at line 625 of file test_statistics.py.
python.test_statistics.Anderson_ksampResult = namedtuple('Anderson_ksampResult', ('statistic', 'pvalue', 'ndf'))

Definition at line 397 of file test_statistics.py.

python.test_statistics.BDM_2sampResult = namedtuple('BDM_2sampResult', ('statistic', 'pvalue', 'ndf'))

Definition at line 195 of file test_statistics.py.

python.test_statistics.Chi2_2sampResult = namedtuple('Chi2_2sampResult', ('statistic', 'pvalue', 'ndf'))

Definition at line 141 of file test_statistics.py.

python.test_statistics.CVM_2sampResult = namedtuple('CVM_2sampResult', ('statistic', 'pvalue', 'ndf'))

Definition at line 239 of file test_statistics.py.

bool python.test_statistics.DEBUG = False

Definition at line 17 of file test_statistics.py.

python.test_statistics.Ks_2sampResult = namedtuple('Ks_2sampResult', ('statistic', 'pvalue', 'ndf'))

Definition at line 21 of file test_statistics.py.

python.test_statistics.LikelihoodRatio_ksampResult = namedtuple('LikelihoodRatio_ksampResult', ('statistic', 'pvalue', 'ndf'))

Definition at line 580 of file test_statistics.py.

python.test_statistics.LikelihoodValue_ksampResult = namedtuple('LikelihoodValue_ksampResult', ('statistic', 'pvalue', 'ndf'))

Definition at line 624 of file test_statistics.py.

list python.test_statistics.statistic_seq = []

Definition at line 19 of file test_statistics.py.