Comparison of Test Statistics for Mean Difference Testing Between Two Independent Populations

ABSTRACT. The purpose of this article is to evaluate the efficiency of seven test statistics for testing the mean difference between two independent populations. The evaluation was based on the probability of type I error and the power of the test at the 0.05 significance level, with population distributions assumed to be normal, exponential, log-normal, gamma, and Laplace, equal sample sizes, and both equal and unequal variances. The results showed that for equal variance, the test statistics that controlled the probability of type I error with the highest power were the Z-test for normal and exponential distributions, the Welch based on rank test (WBR) for log-normal and gamma distributions, and the Mann-Whitney U test (MWU) for the Laplace distribution. For unequal variance, the Z-test was more efficient under normal, exponential, log-normal, and gamma distributions, while WBR was appropriate for the Laplace distribution.


Introduction
Testing the mean difference between two independent populations is widely used for statistical hypothesis testing in many research areas, especially educational science and medical studies. Various test statistics are available, so selecting one that suits the properties of the data and the objectives of the research is important for comparing means. If the two populations from which the samples are drawn are normally distributed with known variances, the Z-test is used. If the populations are normally distributed with unknown variances, the t-test is applied when the variances are homogeneous, and the Welch t-test is used when they are heterogeneous. These are parametric statistics, which depend strongly on the assumption of normality [5]. In practice, applying them to non-normally distributed populations can lead to erroneous conclusions. Consequently, non-parametric statistics have become an attractive alternative when the normality assumption is violated ([8], [13]).
In the literature, the performance of parametric and non-parametric statistics for testing the means of two independent populations has been compared in numerous publications. Thidarat et al. [12] compared the efficiency of the Mann-Whitney U test (MWU), the Brunner and Munzel test, and the nonparametric bootstrap rank Welch test (BRW) in terms of the probability of type I error and power of the test when the two populations were assumed to follow normal, gamma, exponential, and chi-squared distributions. They reported that BRW had the highest power but the weakest ability to control the probability of type I error, whereas MWU was the second most powerful and was able to control the probability of type I error when the variances were equal. Eriobu and Umeh [4] pointed out that MWU performed better than the Kolmogorov-Smirnov test and the modified intrinsically ties adjusted Mann-Whitney U test (MAMWU) for testing two population means when the populations had gamma and Weibull distributions. Sangthong and Klubnual [11] found that the Welch based on rank test (WBR) was appropriate when the populations had log-normal, gamma, and Poisson distributions with equal variance, while the Welch t-test yielded the highest efficiency when the populations had log-normal, gamma, exponential, uniform, and Poisson distributions with unequal variance. Dollada et al. [2] stated that MWU was the most powerful when the populations had negatively skewed, symmetric and leptokurtic, or positively skewed and leptokurtic distributions.
In this article, we evaluate the performance of test statistics for the mean difference between two independent populations. The test statistics compared are the Z-test, t-test, Welch t-test, MWU, MAMWU, WBR, and BRW. A simulation study is conducted to determine which test statistics control the probability of type I error while achieving the highest power under normal, exponential, log-normal, gamma, and Laplace distributions with equal and unequal variances at the 0.05 significance level.
The rest of this article is organized as follows. Section 2 introduces the test statistics used in this study, and Section 3 describes the research method used to evaluate them. Section 4 presents simulation results on the probability of type I error and power of the test, and Section 5 presents the conclusion and discussion.

Z-test
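The Z-test is a parametric statistic used for testing the difference between the means of two independent normally distributed populations with known variances [6]. The Z-test statistic is

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}},$$

where $\mu_1, \mu_2$ are the means of populations 1 and 2, $\sigma_1, \sigma_2$ are the standard deviations of populations 1 and 2, $n_1, n_2$ are the sample sizes drawn from populations 1 and 2, and $\bar{x}_1, \bar{x}_2$ are the means of the samples drawn from populations 1 and 2.
For the two-sided hypothesis $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$, reject $H_0$ when $|Z| \ge z_{\alpha/2}$.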
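A minimal Python sketch of the Z-test above, assuming a two-sided alternative at α = 0.05 and a null difference μ1 − μ2 = 0. The function name `z_test` is hypothetical; the critical value z_{α/2} comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test(x1, x2, sigma1, sigma2, alpha=0.05):
    """Two-sample Z-test with known population SDs (two-sided, assumed)."""
    n1, n2 = len(x1), len(x2)
    # Standard error of the difference in sample means under known variances
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Under H0 the hypothesised difference mu1 - mu2 is zero, so it drops out
    z = (fmean(x1) - fmean(x2)) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96
    return z, abs(z) >= z_crit


z, reject = z_test([5.1, 4.9, 5.3, 5.0], [4.0, 4.2, 3.9, 4.1], 0.5, 0.5)
```

Here the sample means differ by 1.025 with a standard error of about 0.354, so Z is roughly 2.90 and H0 is rejected.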

t-test
The two-population t-test, also known as the independent samples t-test, is a parametric statistic appropriate for determining whether the means of two populations differ significantly when the population standard deviations are unknown and the samples are relatively small. There are two variants of the two-population t-test: the t-test for equal variances and the Welch t-test for unequal variances.
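The two variants can be sketched as follows, assuming the pooled-variance formula for the equal-variance t-test and the Welch-Satterthwaite approximation for the degrees of freedom of the Welch t-test. The function names are hypothetical; turning the statistics into p-values would additionally need a t-distribution CDF (for example `scipy.stats.t`), which the Python standard library does not provide.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(x1, x2):
    """Equal-variance t statistic and its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = variance(x1), variance(x2)  # sample variances (n - 1 denominator)
    sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)  # pooled variance
    t = (fmean(x1) - fmean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x1, x2):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2
    t = (fmean(x1) - fmean(x2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t_p, df_p = pooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
t_w, df_w = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

With equal sample sizes the two statistics coincide (about −1.897 here); only the degrees of freedom differ (8 versus about 5.88), which is where the Welch correction matters when the variances are unequal.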
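The independent samples t-test assumes normally distributed populations with equal variances [7]. The t-test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$

where $s_1^2, s_2^2$ are the sample variances; under $H_0$, $t$ follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom.

Mann-Whitney U test (MWU)
The Mann-Whitney U test, which builds on the rank-sum statistic of Wilcoxon [15], is based on ranks assigned to the observations from the two populations. It tests the hypothesis that the two independent populations have the same distribution [4].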
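For $n_1 \le 20$ and $n_2 \le 20$, the MWU test statistic is

$$U = \min(U_1, U_2), \qquad U_1 = R_1 - \frac{n_1(n_1 + 1)}{2}, \qquad U_2 = n_1 n_2 - U_1,$$

where $R_1$ and $R_2$ are the sums of the ranks assigned to observations from populations 1 and 2 in the combined ranking of these observations from the two populations. For $n_1 > 20$ and $n_2 > 20$, the MWU test statistic is

$$Z = \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1) / 12}}.$$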
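The MWU statistic with midranks can be sketched as follows. The helpers `midranks` and `mann_whitney` are hypothetical names, and the sketch uses the large-sample normal approximation without a tie correction to the variance, which is an assumption rather than the exact form used in the study.

```python
from math import sqrt


def midranks(values):
    """1-based ranks of `values`; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks


def mann_whitney(x1, x2):
    """Return U = min(U1, U2) and its normal-approximation Z."""
    n1, n2 = len(x1), len(x2)
    ranks = midranks(list(x1) + list(x2))
    r1 = sum(ranks[:n1])  # R1: rank sum of sample 1 in the combined ranking
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, z


u, z = mann_whitney([1, 2, 3], [4, 5, 6])
```

When every observation of sample 1 lies below every observation of sample 2, U is 0; here Z is about −1.96.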

Modified intrinsically ties adjusted Mann-Whitney U test (MAMWU)
Oyeka and Okeh [9] developed the modified intrinsically ties adjusted Mann-Whitney U test (MAMWU), which adjusts the MWU approach to handle tied observations between the two samples. MAMWU is useful for testing the hypothesis that the samples are drawn from the same distribution [4]. The MAMWU test statistic is given in [9].

Welch based on rank test (WBR)
The Welch based on rank test (WBR) applies the statistic proposed by Welch [14] to midranks. The procedure is as follows ([10], [11], [16]):
(1) Combine the two samples of sizes $n_1$ and $n_2$ and rank the pooled data from lowest to highest. Tied observations between the two samples receive the average (mid)rank.
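Assuming WBR is the Welch t statistic computed on the pooled midranks from step (1), a sketch is shown below. The name `rank_welch` is hypothetical, and the degrees of freedom use the Welch-Satterthwaite approximation. Because only ranks enter the statistic, the result is unchanged by any strictly increasing transformation of the data.

```python
from math import sqrt
from statistics import fmean, variance


def midranks(values):
    """1-based ranks of `values`; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_welch(x1, x2):
    """Welch t statistic and df computed on pooled midranks (assumed WBR form)."""
    n1 = len(x1)
    r = midranks(list(x1) + list(x2))  # step (1): pool and rank with midranks
    r1, r2 = r[:n1], r[n1:]
    v1, v2 = variance(r1) / len(r1), variance(r2) / len(r2)
    t = (fmean(r1) - fmean(r2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (len(r1) - 1) + v2**2 / (len(r2) - 1))
    return t, df


t, df = rank_welch([1.2, 3.4, 2.2, 5.0], [4.1, 6.3, 7.7, 5.5])
```

The sketch divides by zero if both groups have constant ranks; a full implementation would need to guard against that case.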
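
Nonparametric bootstrap rank Welch test (BRW)
The nonparametric bootstrap rank Welch test (BRW) combines the bootstrap principle with the rank Welch test statistic ([3], [10]). The hypothesis-testing procedure is as follows:
(1) Let … If tied observations between the two samples are found, they receive the average (mid)rank.
(2) Pool the two samples into one combined sample of size $n = n_1 + n_2$ and sort it in ascending order.
(3) Calculate the means and variances of the midranks of the first and second samples, denoted …
(4) Resample with replacement from $x_1$ and $x_2$ to obtain bootstrap samples $x_1^*$ and $x_2^*$ of sizes $n_1$ and $n_2$, then sort the combined observations in ascending order. If tied observations between the two samples are found, they receive the average (mid)rank.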
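The exact BRW resampling and decision rules are only partly recoverable from the steps above, so the following is a sketch under stated assumptions: each sample is resampled with replacement as in step (4), and the p-value follows a standard bootstrap-t scheme in which the bootstrap rank-mean difference is centred at the observed one. All names (`bootstrap_rank_welch`, `n_boot`, `seed`) are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, variance


def midranks(values):
    """1-based ranks of `values`; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_diff_and_se(x1, x2):
    """Difference in mean midranks and its Welch-type standard error."""
    r = midranks(list(x1) + list(x2))
    r1, r2 = r[:len(x1)], r[len(x1):]
    d = fmean(r1) - fmean(r2)
    se = sqrt(variance(r1) / len(r1) + variance(r2) / len(r2))
    return d, se


def bootstrap_rank_welch(x1, x2, n_boot=2000, seed=1):
    """Bootstrap-t p-value for the rank Welch statistic (assumed scheme)."""
    rng = random.Random(seed)
    d_obs, se_obs = rank_diff_and_se(x1, x2)
    t_obs = d_obs / se_obs
    hits = used = 0
    for _ in range(n_boot):
        # Resample each sample with replacement, keeping sizes n1 and n2
        b1 = rng.choices(x1, k=len(x1))
        b2 = rng.choices(x2, k=len(x2))
        d_star, se_star = rank_diff_and_se(b1, b2)
        if se_star == 0:
            continue  # skip degenerate resamples with no rank variation
        used += 1
        # Centre at the observed difference so T* mimics the null distribution
        if abs((d_star - d_obs) / se_star) >= abs(t_obs):
            hits += 1
    return t_obs, hits / used


t_obs, p = bootstrap_rank_welch([float(v) for v in range(1, 11)],
                                [float(v) for v in range(11, 21)])
```

H0 would then be rejected at the 0.05 level when `p <= 0.05`. Centring at the observed difference is what lets resampling from the original (non-null) samples approximate the null distribution.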

Research Method
Monte Carlo simulation was used to generate data for assessing the performance of the test statistics. Programs written in MATLAB R2021b calculated the probability of type I error and power of the test under the following conditions:
1) Generate two independent populations from five distributions: normal, exponential, log-normal, gamma, and Laplace.

2) Set equal sample sizes for the two populations: $(n_1, n_2) = (10, 10)$, $(20, 20)$, and $(50, 50)$.
3) Set the means of the two populations as follows:
3.1) Equal means, when calculating the probability of type I error.
3.2) Different means, with mean ratios of 1:1.5 and 1:2, when estimating power [13].
6) The number of iterations for each case is 10,000.
7) Compute the probability of type I error using equal means (mean ratio 1:1).
The probability of type I error is the number of times $H_0$ is rejected divided by the number of iterations. At the 0.05 significance level, a test statistic is considered capable of controlling the probability of type I error if its estimated probability of type I error lies between 0.025 and 0.075, following Bradley's criterion [1].
8) Compute the power of the test using mean ratios of 1:1.5 for the small mean difference and 1:2 for the moderate mean difference. Power is the number of times $H_0$ is rejected divided by the number of iterations.
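The Monte Carlo steps above can be sketched for a single test as follows. This is an illustrative Python translation, not the authors' MATLAB code. It assumes the Z-test with known standard deviations, hypothetical population parameters (means 1 and 2, standard deviation 1), $n_1 = n_2 = 20$, and 2,000 rather than 10,000 replications to keep it fast. Bradley's liberal criterion [1] is the interval from 0.5α to 1.5α, which gives 0.025 to 0.075 at α = 0.05.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value (assumed)


def z_reject(x1, x2, sd1, sd2):
    """Z-test decision with known population SDs."""
    se = sqrt(sd1**2 / len(x1) + sd2**2 / len(x2))
    return abs((fmean(x1) - fmean(x2)) / se) >= Z_CRIT


def rejection_rate(draw1, draw2, n, sd1, sd2, reps=10_000, seed=2021):
    """Share of simulated data sets in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x1 = [draw1(rng) for _ in range(n)]
        x2 = [draw2(rng) for _ in range(n)]
        if z_reject(x1, x2, sd1, sd2):
            rejections += 1
    return rejections / reps


def bradley_ok(rate, alpha=ALPHA):
    """Bradley's liberal criterion: 0.5*alpha <= rate <= 1.5*alpha."""
    return 0.5 * alpha <= rate <= 1.5 * alpha


def same_mean(rng):
    return rng.normalvariate(1.0, 1.0)  # mean ratio 1:1 (type I error)


def double_mean(rng):
    return rng.normalvariate(2.0, 1.0)  # mean ratio 1:2 (power)


def laplace(rng, mean=1.0, sd=1.0):
    # The difference of two iid Exp(1) draws is Laplace(0, 1), with variance 2
    return mean + sd * (rng.expovariate(1.0) - rng.expovariate(1.0)) / sqrt(2)


type1 = rejection_rate(same_mean, same_mean, 20, 1.0, 1.0, reps=2000)
power = rejection_rate(same_mean, double_mean, 20, 1.0, 1.0, reps=2000)
type1_laplace = rejection_rate(laplace, laplace, 20, 1.0, 1.0, reps=2000)
print(type1, bradley_ok(type1), power, type1_laplace)
```

To cover other cells of the design, you can swap the sampler functions for the other distributions (for example `random.Random.expovariate`, `lognormvariate`, or `gammavariate`) and replace `z_reject` with the other statistics.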

Results
To evaluate the performance of the seven test statistics (Z-test, t-test, Welch t-test, MWU, MAMWU, WBR, and BRW), we simulated data following the research method described above and calculated their probability of type I error and power of the test for testing the mean difference of two independent populations at the 0.05 significance level ($\alpha = 0.05$), where the populations have normal, exponential, log-normal, gamma, and Laplace distributions.
In the comparison, test statistics that control the probability of type I error and achieve higher power are preferred as robust choices.
Tables 1 to 5 display the simulated probability of type I error and power of the test for the normal, exponential, log-normal, gamma, and Laplace distributions, respectively.
The following abbreviations are used in the tables: $(n_1, n_2)$, $(V_1 : V_2)$, TE, PE 1:1.5, and PE 1:2 represent the sample sizes, the variance ratios, the probability of type I error, and the power of the test for the small and the moderate mean differences between the two populations, respectively.

Table 1. Probability of type I error and power of the test of seven test statistics for two populations with normal distributions
Note: bold entries indicate the test statistics that could control the probability of type I error, whereas bold and underlined entries indicate the test statistics that could control the probability of type I error while retaining the highest power.

For the normally distributed populations in Table 1, the Z-test, t-test, Welch t-test, MWU, and WBR were all able to control the probability of type I error for all sample sizes with both equal and unequal variances. Table 1 also indicates that for equal variance, the Z-test had the highest power among the tests that controlled the probability of type I error, for both small and moderate mean differences. For unequal variance with a small mean difference, the Z-test had the highest power while controlling the probability of type I error in most conditions, while the t-test performed similarly to MWU, each accounting for close to 22.22% of the total conditions. When the mean difference was moderate, the Z-test had the greatest power.

Table 2. Probability of type I error and power of the test of seven test statistics for two populations with exponential distributions

As shown in Table 2, when the populations had exponential distributions, the Z-test, t-test, and Welch t-test were capable of controlling the probability of type I error for all sample sizes with both equal and unequal variances, whereas MWU and WBR were capable of controlling it only for equal variance. For equal variance, the test statistics with the highest power that controlled the probability of type I error in most conditions were the Z-test for the small mean difference and the t-test for the moderate mean difference. For unequal variance with a small mean difference, the Z-test and t-test had the highest power among the tests that controlled the probability of type I error. When the mean difference was moderate, the Z-test had the highest power, followed by the t-test and the Welch t-test.

Table 3. Probability of type I error and power of the test of seven test statistics for two populations with log-normal distributions

For the log-normal distribution in Table 3, the Z-test was able to control the probability of type I error for all sample sizes with both equal and unequal variances. For equal variance, the t-test, Welch t-test, MWU, and WBR were also able to control it for all sample sizes. Nevertheless, for unequal variance, the t-test … with unequal variance ratio of 1:5 were able to control the probability of type I error. Table 3 confirms that for equal variance with small and moderate mean differences, WBR had the highest power while controlling the probability of type I error, followed by MWU. For unequal variance with small and moderate mean differences, the Z-test was very effective, with the highest power among the tests that could control the probability of type I error.

Table 4. Probability of type I error and power of the test of seven test statistics for two populations with gamma distributions

Table 4 gives the results for the gamma distribution. The Z-test could control the probability of type I error for all sample sizes with both equal and unequal variances. The t-test and Welch t-test could also control it in all conditions except for … with unequal variance ratios of 1:5 and 1:9. For equal variance, MWU and WBR could control the probability of type I error for all sample sizes, while BRW could do so only for $(n_1, n_2) = (10, 10)$.
Based on power for equal variance with small and moderate mean differences, WBR dominated the other test statistics, with the highest power and the ability to control the probability of type I error. The second-highest power with control of the probability of type I error was achieved by MWU and BRW for the small mean difference and by MWU for the moderate mean difference. For unequal variance, the Z-test was suitable for both small and moderate mean differences, while the second most powerful test statistics were the t-test for the small mean difference and the Welch t-test for the moderate mean difference.

Table 5. Probability of type I error and power of the test of seven test statistics for two populations with Laplace distributions

Table 5 displays the results for the Laplace distribution, where the t-test, Welch t-test, MWU, and WBR were able to control the probability of type I error in all cases with both equal and unequal variances. For unequal variance, the Z-test was able to control the probability of type I error with a variance ratio of 1:3, while BRW could control it for $(n_1, n_2) = …$ with a variance ratio of 1:5.
Considering power for equal variance with small and moderate mean differences, the test statistics with the highest power while controlling the probability of type I error in most conditions were MWU and WBR, respectively. For unequal variance with a small mean difference, MWU and WBR had the greatest power while controlling the probability of type I error, followed by BRW. For the moderate mean difference, the highest power with control of the probability of type I error was achieved by WBR, MWU, and BRW, in that order.

Conclusion and Discussion
The results showed that when the two populations were assumed to have normal, exponential, log-normal, gamma, or Laplace distributions with homogeneous variances, the Z-test, t-test, Welch t-test, MWU, and WBR controlled the probability of type I error acceptably for all sample sizes under Bradley's criterion, except for the Z-test, which was not suitable for the Laplace distribution. Consequently, these test statistics were robust to changes in the population distribution. When the variances of the two populations were heterogeneous, all test statistics showed a weaker ability to control the probability of type I error. The Z-test, t-test, and Welch t-test controlled the probability of type I error in almost all conditions under all the distributions considered. Moreover, MWU and WBR were superior for all sample sizes under the normal and Laplace distributions, whereas BRW was …
For equal variance with both small and moderate mean differences, the test statistics with the highest power while controlling the probability of type I error in most conditions were the Z-test for the normal distribution, WBR for the log-normal and gamma distributions, and MWU for the Laplace distribution. When the two populations had exponential distributions, the Z-test was better for the small mean difference, while the t-test was superior for the moderate mean difference.
For unequal variance with both small and moderate mean differences, the Z-test had the highest power and controlled the probability of type I error in most conditions for the normal, exponential, log-normal, and gamma distributions. For the Laplace distribution, MWU and WBR were superior when the mean difference was small, and WBR was suitable when the mean difference was moderate.
In addition, MAMWU gave the highest power in every condition but could not control the probability of type I error, which is consistent with the research of Eriobu and Umeh [4]. When the two populations had gamma distributions with equal variance, WBR is recommended for hypothesis testing because it had the highest power while controlling the probability of type I error; this agrees with the work of Sangthong and Klubnual [11]. Furthermore, power tended to increase with sample size for a given variance ratio between the two populations.