Averages: There is Still Something to Learn

Dias Curto, José

doi:10.1007/s10614-021-10165-y


Published: 25 July 2021

Volume 60, pages 755–779, (2022)

Computational Economics

José Dias Curto ORCID: orcid.org/0000-0003-2012-9015^1,2


Abstract

The common way to deal with outliers in empirical Economics and Finance is to remove or cap them, by trimming or winsorizing, or to compute statistics that are robust to outliers. However, because of their importance, there are situations where excluding these observations is not reasonable and may even be counterproductive. For example, should we exclude the very high stock prices of Amazon and Google from an empirical analysis? Even if the purpose is to compute an average of tech stock prices, does it make economic and financial sense? Maybe not. A solution that keeps the two companies in the data set, yet does not penalize the higher observations as much as the median, harmonic and geometric averages do, would be an attractive alternative. In this paper we propose and analyze a modified measure, the adjusted median, in which the influence of the outlying observations is lower than in the arithmetic average but higher than in the median, harmonic and geometric averages. Monte Carlo simulations and bootstrapping of real financial data confirm how useful the adjusted median could be.


Notes

We use “average” or “mean” interchangeably.
See “Appendix B” for demonstration.
See “Appendix A” for demonstrations.
See “Appendix B” for demonstrations.
See “Appendix C” for simple applications of the results of this subsection.
See Sect. 4 for details.
The data source is: https://www.tradingview.com. Prices refer to June 30, 2020.


Acknowledgements

I would like to thank the Editor-in-Chief Hans Amman and the anonymous referee for their suggestions and comments.

Author information

Authors and Affiliations

Instituto Universitário de Lisboa (ISCTE-IUL), BRU-UNIDE, Lisbon, Portugal
José Dias Curto
Department of Quantitative Methods for Management and Economics, Av. Prof. Aníbal Bettencourt, 1600-189, Lisbon, Portugal
José Dias Curto

Authors

José Dias Curto

Contributions

Everything written in the paper is the responsibility of the author.

Corresponding author

Correspondence to José Dias Curto.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Ethical approval

No particular ethical approval was required for this study because it does not entail human participation or personal data.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by Fundação para a Ciência e a Tecnologia, Grant UIDB/00315/2020.

Appendices

A Contributions of Each Data Point

See Eq. (5):

$$\begin{aligned} \bar{X}_{H}&= \sum _{i=1}^n x_i \frac{w_i}{\sum _{i=1}^n w_i} = \sum _{i=1}^n \left[ x_i \frac{ \frac{\sum _{i=1}^n x_i}{x_i} }{\frac{\sum _{i=1}^n x_i}{x_1}+\frac{\sum _{i=1}^n x_i}{x_2}+ \ldots + \frac{\sum _{i=1}^n x_i}{x_n}} \right] \\&= \sum _{i=1}^n \left[ \frac{x_i \sum _{i=1}^n x_i}{x_i \sum _{i=1}^n x_i {\sum _{i=1}^n \frac{1}{x_i}}} \right] \\&= \sum _{i=1}^n \left[ \frac{1}{\sum _{i=1}^n \frac{1}{x_i}}\right] = \frac{1}{\sum _{i=1}^n \frac{1}{x_i}}+\frac{1}{\sum _{i=1}^n \frac{1}{x_i}} + \ldots + \frac{1}{\sum _{i=1}^n \frac{1}{x_i}}=\frac{n}{\sum _{i=1}^n \frac{1}{x_i}}, \end{aligned}$$

if $x_i \sum _{i=1}^n x_i \ne 0.$

See Eq. (7):

$$\begin{aligned} x_i \frac{w_i}{\sum _{i=1}^n w_i}=x_i \frac{ \frac{\sum _{i=1}^n x_i}{x_i} }{\frac{\sum _{i=1}^n x_i}{x_1}+\frac{\sum _{i=1}^n x_i}{x_2}+ \ldots + \frac{\sum _{i=1}^n x_i}{x_n}}= \frac{x_i \sum _{i=1}^n x_i}{x_i \sum _{i=1}^n x_i {\sum _{i=1}^n \frac{1}{x_i}}}=\frac{1}{\sum _{i=1}^n \frac{1}{x_i}}, \end{aligned}$$

if $x_i \sum _{i=1}^n x_i \ne 0.$
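The equal-contribution result above can be checked numerically. The following Python sketch is illustrative only: the function name `harmonic_contributions` is not from the paper, and the data set is borrowed from Appendix C. It uses the weights $w_i = \sum x_i / x_i$ that appear in the derivation.

```python
import math
import statistics


def harmonic_contributions(xs):
    """Return x_i * w_i / sum(w) for each x_i, with w_i = sum(xs) / x_i."""
    total = sum(xs)
    weights = [total / x for x in xs]
    weight_sum = sum(weights)
    return [x * w / weight_sum for x, w in zip(xs, weights)]


data = [4, 6, 10, 20, 100]
contribs = harmonic_contributions(data)
harmonic = sum(contribs)

# Every observation contributes the same amount, 1 / sum(1 / x_i).
print(contribs)
# The contributions add up to the harmonic mean n / sum(1 / x_i).
print(harmonic, statistics.harmonic_mean(data))
```

Each entry of `contribs` equals $1/\sum 1/x_i$ regardless of the size of $x_i$, which is why large observations have so little influence on the harmonic average.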

B Averages Inequalities

From the definitions of the means in Eqs. (1), (2) and (3), their logarithms are:

$$\begin{aligned} \ln \left( \bar{X} \right) =\ln \left( \frac{1}{n} \sum _{i=1}^{n}X_i \right) , \quad \ln \left( \bar{X}_G \right) =\frac{1}{n}\sum _{i=1}^{n} \ln \left( X_i\right) \text{ and } \ln \left( \bar{X}_H \right) =-\ln \left( \frac{1}{n} \sum _{i=1}^{n}\frac{1}{X_i}\right) . \end{aligned}$$

By Jensen’s inequality applied to the concave function $\ln$,

$$\begin{aligned} \ln \left( \frac{1}{n}\sum _{i=1}^nX_i \right) \ge \frac{1}{n} \sum _{i=1}^n \ln \left( X_i\right) , \end{aligned}$$

which can be exponentiated to give the arithmetic mean-geometric mean inequality:

$$\begin{aligned} \underbrace{\frac{1}{n} \sum _{i=1}^{n}X_i}_{\bar{X}} \ge \underbrace{\left( \prod _{i=1}^{n} X_i\right) ^{\frac{1}{n}}}_{\bar{X}_G}, \;\; \text{ thus } \bar{X} \ge \bar{X}_G. \end{aligned}$$

Now compare the harmonic mean with the geometric mean. Applying Jensen’s inequality to the values $1/X_i$ gives:

$$\begin{aligned} -\ln \left( \frac{1}{n} \sum _{i=1}^{n}\frac{1}{X_i}\right) \le -\frac{1}{n} \sum _{i=1}^{n}\ln \left( \frac{1}{X_i}\right) =\frac{1}{n}\sum _{i=1}^{n} \ln \left( X_i\right) , \end{aligned}$$

and by exponentiating both sides:

$$\begin{aligned} \underbrace{\frac{n}{\sum _{i=1}^{n} \frac{1}{X_i}}}_{\bar{X}_H} \le \underbrace{\left( \prod _{i=1}^{n} X_i\right) ^{\frac{1}{n}}}_{\bar{X}_G}, \; \; \text{ thus } \bar{X}_H \le \bar{X}_G. \end{aligned}$$
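As a numerical sanity check of the ordering $\bar{X}_H \le \bar{X}_G \le \bar{X}$, the sketch below evaluates the three means with Python's standard `statistics` module. The helper name `three_means`, the random seed and the sampling range are assumptions made for illustration, and a simulation of this kind complements the proof rather than replacing it.

```python
import random
import statistics


def three_means(xs):
    """Return (harmonic, geometric, arithmetic) means of positive values."""
    return (
        statistics.harmonic_mean(xs),
        statistics.geometric_mean(xs),
        statistics.fmean(xs),
    )


# Second data set of Appendix C.
h_c, g_c, a_c = three_means([4, 6, 10, 20, 100])
print(f"H = {h_c:.3f}, G = {g_c:.3f}, A = {a_c:.3f}")

# Random positive samples; a small tolerance absorbs floating-point error.
rng = random.Random(0)
violations = 0
for _ in range(1000):
    sample = [rng.uniform(0.1, 100.0) for _ in range(rng.randint(2, 20))]
    h, g, a = three_means(sample)
    if not (h <= g + 1e-9 and g <= a + 1e-9):
        violations += 1
print("violations:", violations)
```

The inequalities hold with equality only when all observations are identical, so strictly positive samples with any spread should show strict ordering.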

C Different Meanings of the Center

Median

Consider a first data set: 4, 6, 10, 100 ($n=4$, even). Thus, the median rank is $r_M=\frac{1+4}{2}=2.5$, the median is $\bar{X}_M= \frac{6+10}{2}=8$ and $\sum _{i=1}^4 \left( r_i-r_M\right) =(1-2.5)+(2-2.5)+(3-2.5)+(4-2.5)=0$.

For a second data set, 4, 6, 10, 20, 100 ($n=5$, odd), the median rank is $r_M=\frac{1+5}{2}=3$, the median is $\bar{X}_M= 10$ and $\sum _{i=1}^5 \left( r_i-r_M \right) =(1-3)+(2-3)+(3-3)+(4-3)+(5-3)=0$. Thus, the median is the center of the distribution in terms of counting observations: half of the observations lie to the left of the median and half to the right, whatever their values.
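The rank calculations for both data sets can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `rank_deviations` is illustrative and does not come from the paper.

```python
import statistics


def rank_deviations(xs):
    """Return the median rank (1 + n) / 2 and each rank's deviation from it."""
    n = len(xs)
    median_rank = (1 + n) / 2
    return median_rank, [r - median_rank for r in range(1, n + 1)]


first = [4, 6, 10, 100]       # n = 4, even
second = [4, 6, 10, 20, 100]  # n = 5, odd

rank1, devs1 = rank_deviations(first)
rank2, devs2 = rank_deviations(second)

print(rank1, statistics.median(first), sum(devs1))
print(rank2, statistics.median(second), sum(devs2))
```

The rank deviations depend only on $n$, not on the observed values, which is the sense in which the median ignores how extreme the largest observation is.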

Arithmetic average

Consider again the second data set: $\bar{X}_A=\frac{4+6+10+20+100}{5}=28$ and

$\sum _{i=1}^5 \left( x_i-\bar{X}_A\right) =(4-28)+(6-28) + (10-28)+ (20-28) + (100-28)=0$.

Thus, the arithmetic average is the center of the distribution in terms of deviations measured in levels (absolute rather than percentage terms): the deviations to its right are exactly offset by the deviations to its left. So, the center is defined in terms of the signed distances between each value and the arithmetic average:

$$\begin{aligned} \underbrace{\underbrace{(4-28)}_{-24}+\underbrace{(6-28)}_{-22}+ \underbrace{(10-28)}_{-18}+\underbrace{(20-28)}_{-8}}_{-72}+\underbrace{(100-28)}_{+72}=0. \end{aligned}$$
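The balance of deviations around the arithmetic average can be verified directly. The short Python sketch below uses the second data set; the variable names are illustrative.

```python
data = [4, 6, 10, 20, 100]
mean = sum(data) / len(data)

# Signed deviations of each observation from the arithmetic average.
deviations = [x - mean for x in data]
left = sum(d for d in deviations if d < 0)   # observations below the mean
right = sum(d for d in deviations if d > 0)  # observations above the mean

print(mean, left, right, left + right)  # 28.0 -72.0 72.0 0.0
```

A single observation (100) offsets the four observations below the mean, which shows how strongly one outlier pulls the arithmetic average.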

Geometric average

For the second data set: $\bar{X}_G=\root 5 \of {4 \times 6 \times 10 \times 20 \times 100}=13.69$ and

$x_i$	$\ln (x_i)$	$\ln (x_i)-\ln (\bar{X}_G)$
4	1.386	−123.00%
6	1.792	−82.45%
10	2.303	−31.37%
20	2.996	37.94%
100	4.605	198.89%
	Sum	0

Reading these as continuously compounded percentage deviations from the geometric average means that:

$4=13.69 \times \exp (-123\%)$, $\ldots $, $100=13.69 \times \exp (198.89\%)$.

Thus, the geometric average is the value that balances the negative percentage deviations with the positive ones.
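The log deviations in the table can be recomputed as follows. This Python sketch is an illustration with assumed variable names; it also confirms that each observation is recovered by compounding its log deviation on the geometric average.

```python
import math

data = [4, 6, 10, 20, 100]

# ln of the geometric average is the arithmetic average of the logs.
log_mean = sum(math.log(x) for x in data) / len(data)
geo_mean = math.exp(log_mean)

# Log (continuously compounded) deviations from the geometric average.
log_devs = [math.log(x) - log_mean for x in data]

for x, d in zip(data, log_devs):
    recovered = geo_mean * math.exp(d)  # should reproduce x
    print(f"{x:>4}  {math.log(x):.3f}  {d:+.2%}  {recovered:.2f}")

print(f"geometric average = {geo_mean:.2f}")
```

Because the deviations are measured in logs, an observation ten times larger than another adds the same amount to the sum wherever it sits on the scale, which is why the geometric average reacts to 100 far less than the arithmetic average does.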

Harmonic average

For the second data set: $\bar{X}_H=\frac{5}{\frac{1}{4}+\frac{1}{6}+\frac{1}{10}+\frac{1}{20}+\frac{1}{100}}=8.671$ and

$x_i$	$x_i-\bar{X}_H$	$w_i=1/x_i$	$\frac{w_i}{\sum _{i=1}^n w_i}$	$\left( x_i-\bar{X}_H\right) \left( \frac{w_i}{\sum _{i=1}^n w_i}\right) $
4	−4.671	0.250	0.434	−2.025
6	−2.671	0.167	0.289	−0.772
10	1.329	0.100	0.173	0.231
20	11.329	0.050	0.087	0.982
100	91.329	0.010	0.017	1.584
Sum		0.577	1	0

Thus, the harmonic average defines the center of the distribution such that the weighted deviations on its left offset the weighted deviations on its right. The weights are inversely proportional to the original values.
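The harmonic table can be reproduced with the sketch below, which uses the weights $w_i = 1/x_i$ from the table header. The variable names are illustrative. The weighted deviations sum to zero because $\sum_i w_i (x_i - \bar{X}_H) = n - \bar{X}_H \sum_i w_i$ and $\bar{X}_H = n / \sum_i w_i$.

```python
data = [4, 6, 10, 20, 100]

weights = [1 / x for x in data]        # w_i = 1 / x_i
weight_sum = sum(weights)
harm_mean = len(data) / weight_sum     # harmonic average n / sum(1 / x_i)

norm_weights = [w / weight_sum for w in weights]
weighted_devs = [(x - harm_mean) * w for x, w in zip(data, norm_weights)]

for x, w, nw, wd in zip(data, weights, norm_weights, weighted_devs):
    print(f"{x:>4}  {x - harm_mean:8.3f}  {w:.3f}  {nw:.3f}  {wd:7.3f}")

print(f"harmonic average = {harm_mean:.3f}, sum of weights = {weight_sum:.3f}")
```

The observation 100 deviates by more than 91 from the center yet receives a normalized weight below 2%, so its weighted deviation is smaller than that of the observation 4.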


About this article

Cite this article

Dias Curto, J. Averages: There is Still Something to Learn. Comput Econ 60, 755–779 (2022). https://doi.org/10.1007/s10614-021-10165-y


Accepted: 15 July 2021
Published: 25 July 2021
Issue Date: August 2022
DOI: https://doi.org/10.1007/s10614-021-10165-y
