Abstract
The common way to deal with outliers in empirical Economics and Finance is to remove their influence, either by trimming or winsorizing, or by computing statistics robust to outliers. However, because of their importance, there are situations where excluding these observations is not reasonable and may even be counterproductive. For example, should we exclude the very high stock prices of Amazon and Google from an empirical analysis? Even if the purpose is only to compute an average of tech stock prices, would excluding them make economic and financial sense? Maybe not. A solution that keeps the two companies in the data set and yet does not penalize the higher observations as much as the median and the harmonic and geometric averages do would be an attractive alternative. In this paper we propose and analyze a modified measure, the adjusted median, in which the outlying observations have less influence than in the arithmetic average but more weight than in the median and the harmonic and geometric averages. Monte Carlo simulations and bootstrapping of real financial data confirm the usefulness of the adjusted median.
Notes
We use “average” or “mean” interchangeably.
See “Appendix B” for demonstration.
See “Appendix A” for demonstrations.
See “Appendix B” for demonstrations.
See “Appendix C” for simple applications of the results of this subsection.
See Sect. 4 for details.
The data source is: https://www.tradingview.com. Prices refer to June 30, 2020.
Acknowledgements
I would like to thank the Editor-in-Chief Hans Amman and the anonymous referee for their suggestions and comments.
Author information
Contributions
Everything written in the paper is the responsibility of the author.
Ethics declarations
Conflict of interest
The author declares that he has no conflict of interest.
Ethical approval
No particular ethical approval was required for this study because it does not entail human participation or personal data.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported by Fundação para a Ciência e a Tecnologia, Grant UIDB/00315/2020.
Appendices
A Contributions of Each Data Point
See Eq. (5):
if \(x_i \sum _{i=1}^n x_i \ne 0.\)
See Eq. (7):
if \(x_i \sum _{i=1}^n x_i \ne 0.\)
B Averages Inequalities
According to the means definition, see Eqs. (1), (2) and (3), their logarithms are:
By Jensen’s inequality,
which can be exponentiated to give the arithmetic mean-geometric mean inequality:
Now comparing the harmonic with the geometric mean (and by Jensen’s inequality):
and by exponentiating both sides:
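The chain of inequalities described in this appendix can be sketched as follows. This is the standard Jensen argument, written under the assumption that Eqs. (1), (2) and (3) are the usual definitions of the arithmetic, geometric and harmonic means of positive observations \(x_1,\ldots ,x_n\); it is not a reproduction of the paper's displayed formulas.

```latex
% Assumed standard definitions (positive data x_1, ..., x_n)
\[
\bar{X}_A=\frac{1}{n}\sum_{i=1}^n x_i,\qquad
\bar{X}_G=\Bigl(\prod_{i=1}^n x_i\Bigr)^{1/n},\qquad
\bar{X}_H=\frac{n}{\sum_{i=1}^n 1/x_i}
\]
% Their logarithms
\[
\ln \bar{X}_A=\ln\Bigl(\frac{1}{n}\sum_{i=1}^n x_i\Bigr),\qquad
\ln \bar{X}_G=\frac{1}{n}\sum_{i=1}^n \ln x_i,\qquad
\ln \bar{X}_H=-\ln\Bigl(\frac{1}{n}\sum_{i=1}^n \frac{1}{x_i}\Bigr)
\]
% Jensen's inequality for the concave function ln
\[
\ln\Bigl(\frac{1}{n}\sum_{i=1}^n x_i\Bigr)\ge \frac{1}{n}\sum_{i=1}^n \ln x_i
\quad\Longrightarrow\quad \bar{X}_A\ge \bar{X}_G
\]
% Jensen's inequality applied to the reciprocals 1/x_i
\[
\ln\Bigl(\frac{1}{n}\sum_{i=1}^n \frac{1}{x_i}\Bigr)\ge \frac{1}{n}\sum_{i=1}^n \ln\frac{1}{x_i}
=-\ln \bar{X}_G
\quad\Longrightarrow\quad \ln \bar{X}_H\le \ln \bar{X}_G
\quad\Longrightarrow\quad \bar{X}_H\le \bar{X}_G
\]
```

Taken together, the two steps give \(\bar{X}_H \le \bar{X}_G \le \bar{X}_A\), with equality only when all observations are equal.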
C Different Meanings of the Center
Median
Consider a first data set: 4, 6, 10, 100 (\(n=4\), even). The median rank is \(r_M=\frac{1+4}{2}=2.5\), the median is \(\bar{X}_M= \frac{6+10}{2}=8\) and \(\sum _{i=1}^4 \left( r_i-r_M\right) =(1-2.5)+(2-2.5)+(3-2.5)+(4-2.5)=0\).
For a second data set 4, 6, 10, 20, 100 (\(n=5\), odd), the median rank is \(r_M=\frac{1+5}{2}=3\), the median is \(\bar{X}_M= 10\) and \(\sum _{i=1}^5 \left( r_i-r_M \right) =(1-3)+(2-3)+(3-3)+(4-3)+(5-3)=0\). Thus, the median is the center of the distribution in terms of the count of observations: half of the observations lie to the left of the median and half to the right, whatever their values.
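The two rank calculations above can be checked numerically. The following is a minimal Python sketch; the helper name `rank_deviations` is hypothetical and introduced here only for illustration.

```python
from statistics import median


def rank_deviations(values):
    """Return (median rank, median, sum of rank deviations) for a data set."""
    ordered = sorted(values)
    n = len(ordered)
    median_rank = (1 + n) / 2            # r_M = (1 + n) / 2
    ranks = range(1, n + 1)              # r_i = 1, ..., n
    total = sum(r - median_rank for r in ranks)
    return median_rank, median(ordered), total


even_case = rank_deviations([4, 6, 10, 100])      # first data set, n = 4
odd_case = rank_deviations([4, 6, 10, 20, 100])   # second data set, n = 5

print(even_case)
print(odd_case)
```

The values of the observations never enter the sum of rank deviations, which is why replacing 100 by any larger number leaves the median unchanged.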
Arithmetic average
Consider again the second data set: \(\bar{X}_A=\frac{4+6+10+20+100}{5}=28\) and
\(\sum _{i=1}^5 \left( x_i-\bar{X}_A\right) =(4-28)+(6-28) + (10-28)+ (20-28) + (100-28)=0\).
Thus, the arithmetic average is the center of the distribution in terms of deviations measured in absolute terms (in levels, as opposed to percentages): the deviations on its right are exactly offset by the deviations on its left. So, the center is defined in terms of the deviations (or signed distances) between each value and the arithmetic average: \(\sum _{i=1}^n \left( x_i-\bar{X}_A\right) =0\).
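As a quick numerical illustration of this balancing property, the sketch below splits the deviations of the second data set into those on the left and those on the right of the arithmetic average. The variable names are illustrative only.

```python
data = [4, 6, 10, 20, 100]

arithmetic_mean = sum(data) / len(data)
deviations = [x - arithmetic_mean for x in data]

# Deviations of the values below and above the arithmetic average
left = sum(d for d in deviations if d < 0)
right = sum(d for d in deviations if d > 0)

print(arithmetic_mean, deviations)
print(left, right, left + right)
```

The single large value 100 contributes a deviation of 72, which has to be offset by all four remaining observations together; this is the sense in which the arithmetic average gives outliers their full weight.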
Geometric average
For the second data set: \(\bar{X}_G=\sqrt[5]{4 \times 6 \times 10 \times 20 \times 100}=13.69\) and
\(x_i\) | \(\ln (x_i)\) | \(\ln (x_i)-\ln (\bar{X}_G)\) |
---|---|---|
4 | 1.386 | −123.00% |
6 | 1.792 | −82.45% |
10 | 2.303 | −31.37% |
20 | 2.996 | 37.94% |
100 | 4.605 | 198.89% |
sum | | 0 |
Compounding these percentage deviations means that:
\(4=13.69 \times \exp (-123\%)\), \(\ldots \), \(100=13.69 \times \exp (198.89\%)\).
Thus, the geometric average is the value that balances the negative percentage deviations with the positive ones.
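The table and the compounding identities can be reproduced with a short Python sketch. It assumes the geometric average is computed as the exponential of the mean of the logarithms, which is equivalent to the fifth root used above; the variable names are illustrative.

```python
import math

data = [4, 6, 10, 20, 100]

# Geometric average as exp of the mean log (same as the n-th root of the product)
log_mean = sum(math.log(x) for x in data) / len(data)
geometric_mean = math.exp(log_mean)

# Log (percentage) deviations of each value from the geometric average
log_deviations = [math.log(x) - log_mean for x in data]

for x, d in zip(data, log_deviations):
    # Compounding the deviation recovers the original value
    recovered = geometric_mean * math.exp(d)
    print(f"{x:>4} {math.log(x):.3f} {100 * d:8.2f}% {recovered:.2f}")

print("sum of log deviations:", round(sum(log_deviations), 10))
```

Because the deviations are measured in logs, the value 100 sits about 199% above the center while the value 4 sits about 123% below it, so the large observation is penalized relative to its role in the arithmetic average.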
Harmonic average
\(x_i\) | \(x_i-\bar{X}_H\) | \(w_i=1/x_i\) | \(\frac{w_i}{\sum _{i=1}^n w_i}\) | \(\left( x_i-\bar{X}_H\right) \left( \frac{w_i}{\sum _{i=1}^n w_i}\right) \) |
---|---|---|---|---|
4 | −4.671 | 0.250 | 0.434 | −2.025 |
6 | −2.671 | 0.167 | 0.289 | −0.772 |
10 | 1.329 | 0.100 | 0.173 | 0.231 |
20 | 11.329 | 0.050 | 0.087 | 0.982 |
100 | 91.329 | 0.010 | 0.017 | 1.584 |
sum | | 0.577 | 1 | 0 |
Thus, the harmonic average defines the center of the distribution such that the weighted deviations on its left offset the weighted deviations on its right, with weights inversely proportional to the original values.
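The columns of the table can be recomputed with the following Python sketch. It assumes the usual definition of the harmonic average, \(n / \sum 1/x_i\), applied to the second data set; the variable names are illustrative.

```python
data = [4, 6, 10, 20, 100]

# Weights inversely proportional to the values
weights = [1 / x for x in data]
weight_sum = sum(weights)
normalized = [w / weight_sum for w in weights]

# Harmonic average under the usual definition n / sum(1 / x_i)
harmonic_mean = len(data) / weight_sum

# Weighted deviations from the harmonic average
weighted_deviations = [
    (x - harmonic_mean) * w for x, w in zip(data, normalized)
]

for x, w, wd in zip(data, normalized, weighted_deviations):
    print(f"{x:>4} {x - harmonic_mean:8.3f} {1 / x:.3f} {w:.3f} {wd:7.3f}")

print("harmonic average:", round(harmonic_mean, 3))
print("sum of weighted deviations:", round(sum(weighted_deviations), 10))
```

The observation 100 receives a normalized weight of less than 2%, which is why its deviation of more than 91 units is balanced by the much smaller deviations of 4 and 6.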
Cite this article
Dias Curto, J. Averages: There is Still Something to Learn. Comput Econ 60, 755–779 (2022). https://doi.org/10.1007/s10614-021-10165-y
DOI: https://doi.org/10.1007/s10614-021-10165-y