
Averages: There is Still Something to Learn


Abstract

The common way to deal with outliers in empirical Economics and Finance is to remove them, either by trimming or winsorizing, or to compute statistics that are robust to outliers. However, there are situations where, because of their importance, excluding these observations is not reasonable and may even be counterproductive. For example, should we exclude the very high stock prices of Amazon and Google from an empirical analysis? Even if the purpose is to compute an average of tech stock prices, does doing so make economic and financial sense? Maybe not. A measure that keeps the two companies in the data set and yet does not penalize the higher observations as much as the median, harmonic and geometric averages do would constitute an attractive alternative. In this paper we propose and analyze such a measure, the adjusted median, in which the influence of outlying observations is smaller than in the arithmetic average but larger than in the median, harmonic and geometric averages. Monte Carlo simulations and bootstrapping of real financial data confirm how useful the adjusted median can be.


Notes

  1. We use “average” or “mean” interchangeably.

  2. See “Appendix B” for demonstration.

  3. See “Appendix A” for demonstrations.

  4. See “Appendix B” for demonstrations.

  5. See “Appendix C” for simple applications of the results of this subsection.

  6. See Sect. 4 for details.

  7. The data source is: https://www.tradingview.com. Prices refer to June 30, 2020.


Acknowledgements

I would like to thank the Editor-in-Chief Hans Amman and the anonymous referee for their suggestions and comments.

Author information


Contributions

Everything written in the paper is the responsibility of the author.

Corresponding author

Correspondence to José Dias Curto.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Ethical approval

No particular ethical approval was required for this study because it does not entail human participation or personal data.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by Fundação para a Ciência e a Tecnologia, Grant UIDB/00315/2020.

Appendices

A Contributions of Each Data Point

See Eq. (5):

$$\begin{aligned} \bar{X}_{H}&= \sum _{i=1}^n x_i \frac{w_i}{\sum _{i=1}^n w_i} = \sum _{i=1}^n \left[ x_i \frac{ \frac{\sum _{i=1}^n x_i}{x_i} }{\frac{\sum _{i=1}^n x_i}{x_1}+\frac{\sum _{i=1}^n x_i}{x_2}+ \ldots + \frac{\sum _{i=1}^n x_i}{x_n}} \right] = \sum _{i=1}^n \left[ \frac{x_i \sum _{i=1}^n x_i}{x_i \sum _{i=1}^n x_i {\sum _{i=1}^n \frac{1}{x_i}}} \right] \\&= \sum _{i=1}^n \left[ \frac{1}{\sum _{i=1}^n \frac{1}{x_i}}\right] = \frac{1}{\sum _{i=1}^n \frac{1}{x_i}}+\frac{1}{\sum _{i=1}^n \frac{1}{x_i}} + \ldots + \frac{1}{\sum _{i=1}^n \frac{1}{x_i}}=\frac{n}{\sum _{i=1}^n \frac{1}{x_i}}, \end{aligned}$$

if \(x_i \sum _{i=1}^n x_i \ne 0.\)

See Eq. (7):

$$\begin{aligned} x_i \frac{w_i}{\sum _{i=1}^n w_i}=x_i \frac{ \frac{\sum _{i=1}^n x_i}{x_i} }{\frac{\sum _{i=1}^n x_i}{x_1}+\frac{\sum _{i=1}^n x_i}{x_2}+ \ldots + \frac{\sum _{i=1}^n x_i}{x_n}}= \frac{x_i \sum _{i=1}^n x_i}{x_i \sum _{i=1}^n x_i {\sum _{i=1}^n \frac{1}{x_i}}}=\frac{1}{\sum _{i=1}^n \frac{1}{x_i}}, \end{aligned}$$

if \(x_i \sum _{i=1}^n x_i \ne 0.\)
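A minimal Python sketch of this decomposition, using the illustrative data set 4, 6, 10, 20, 100 from Appendix C: with weights \(w_i=\frac{\sum _{i=1}^n x_i}{x_i}\), every observation contributes the same amount, \(\frac{1}{\sum _{i=1}^n 1/x_i}\), and the contributions add up to \(\bar{X}_H\).

```python
# Numerical check of Appendix A: with weights w_i = (sum of x) / x_i,
# every observation contributes the same amount to the weighted average,
# and the contributions sum to the harmonic mean n / sum(1/x_i).
x = [4.0, 6.0, 10.0, 20.0, 100.0]   # second data set from Appendix C
n = len(x)

total = sum(x)
w = [total / xi for xi in x]                      # w_i = (sum of x) / x_i
contrib = [xi * wi / sum(w) for xi, wi in zip(x, w)]

harmonic = n / sum(1.0 / xi for xi in x)

print(contrib)                  # every entry equals 1 / sum(1/x_i)
print(sum(contrib), harmonic)   # both equal the harmonic mean (about 8.671)
```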

B Averages Inequalities

According to the definitions of the means in Eqs. (1), (2) and (3), their logarithms are:

$$\begin{aligned} \ln \left( \bar{X} \right) =\ln \left( \frac{1}{n} \sum _{i=1}^{n}X_i \right) , \quad \ln \left( \bar{X}_G \right) =\frac{1}{n}\sum _{i=1}^{n} \ln \left( X_i\right) \text{ and } \ln \left( \bar{X}_H \right) =-\ln \left( \frac{1}{n} \sum _{i=1}^{n}\frac{1}{X_i}\right) . \end{aligned}$$

By Jensen’s inequality,

$$\begin{aligned} \ln \left( \frac{1}{n}\sum _{i=1}^nX_i \right) \ge \frac{1}{n} \sum _{i=1}^n \ln \left( X_i\right) , \end{aligned}$$

which can be exponentiated to give the arithmetic mean-geometric mean inequality:

$$\begin{aligned} \underbrace{\frac{1}{n} \sum _{i=1}^{n}X_i}_{\bar{X}} \ge \underbrace{\left( \prod _{i=1}^{n} X_i\right) ^{\frac{1}{n}}}_{\bar{X}_G}, \;\; \text{ thus } \bar{X} \ge \bar{X}_G. \end{aligned}$$

Now comparing the harmonic with the geometric mean (and by Jensen’s inequality):

$$\begin{aligned} -\ln \left( \frac{1}{n} \sum _{i=1}^{n}\frac{1}{X_i}\right) \le -\frac{1}{n} \sum _{i=1}^{n}\ln \left( \frac{1}{X_i}\right) =\frac{1}{n}\sum _{i=1}^{n} \ln \left( X_i\right) , \end{aligned}$$

and by exponentiating both sides:

$$\begin{aligned} \underbrace{\frac{n}{\sum _{i=1}^{n} \frac{1}{X_i}}}_{\bar{X}_H} \le \underbrace{\left( \prod _{i=1}^{n} X_i\right) ^{\frac{1}{n}}}_{\bar{X}_G}, \; \; \text{ thus } \bar{X}_H \le \bar{X}_G. \end{aligned}$$
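These inequalities can be checked numerically; the Python sketch below evaluates the three means on simulated positive samples (the log-normal draws are purely illustrative and are not the paper's data) and verifies \(\bar{X}_H \le \bar{X}_G \le \bar{X}\) sample by sample.

```python
# Check the inequality X_H <= X_G <= X (Appendix B) on simulated positive data.
import math
import random

random.seed(0)

def arithmetic(x):
    return sum(x) / len(x)

def geometric(x):
    return math.exp(sum(math.log(v) for v in x) / len(x))

def harmonic(x):
    return len(x) / sum(1.0 / v for v in x)

for _ in range(1000):
    sample = [random.lognormvariate(0.0, 1.0) for _ in range(50)]
    assert harmonic(sample) <= geometric(sample) <= arithmetic(sample)

print("harmonic <= geometric <= arithmetic held on every simulated sample")
```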

C Different Meanings of the Center

Median

Consider a first data set: 4, 6, 10, 100 (\(n=4\), even). Thus, the median rank is \(r_M=\frac{1+4}{2}=2.5\), the median is \(\bar{X}_M= \frac{6+10}{2}=8\) and \(\sum _{i=1}^4 \left( r_i-r_M\right) =(1-2.5)+(2-2.5)+(3-2.5)+(4-2.5)=0\).

For a second data set 4, 6, 10, 20, 100 (\(n=5\), odd), the median rank is \(r_M=\frac{1+5}{2}=3\), the median is \(\bar{X}_M= 10\) and \(\sum _{i=1}^5 \left( r_i-r_M \right) =(1-3)+(2-3)+(3-3)+(4-3)+(5-3)=0\). Thus, the median is the center of the distribution in terms of counting observations: half of the observations lie to the left of the median and half to the right, regardless of their values.
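A minimal Python sketch of this counting-center property for the two data sets above:

```python
# The median as the counting center: as many observations lie below it as above it.
def median(x):
    s = sorted(x)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

for data in ([4, 6, 10, 100], [4, 6, 10, 20, 100]):
    m = median(data)
    below = sum(v < m for v in data)
    above = sum(v > m for v in data)
    print(data, "median:", m, "below:", below, "above:", above)
```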

Arithmetic average

Consider again the second data set: \(\bar{X}_A=\frac{4+6+10+20+100}{5}=28\) and

\(\sum _{i=1}^5 \left( x_i-\bar{X}_A\right) =(4-28)+(6-28) + (10-28)+ (20-28) + (100-28)=0\).

Thus, the arithmetic average is the center of the distribution in terms of deviations: the arithmetic mean is such that the absolute deviations on its right are compensated by the absolute deviations on its left. So, the center is defined in terms of the deviations (or distances) between each value and the arithmetic average:

$$\begin{aligned} \underbrace{\underbrace{(4-28)}_{-24}+\underbrace{(6-28)}_{-22}+ \underbrace{(10-28)}_{-18}+\underbrace{(20-28)}_{-8}}_{-72}+\underbrace{(100-28)}_{+72}. \end{aligned}$$
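The same balancing property can be checked in a few lines of Python for this data set:

```python
# Deviations from the arithmetic mean cancel out exactly.
x = [4, 6, 10, 20, 100]
mean = sum(x) / len(x)                      # 28.0
deviations = [v - mean for v in x]          # [-24, -22, -18, -8, 72]
print(mean, deviations, sum(deviations))    # the deviations sum to 0
```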

Geometric average

For the second data set: \(\bar{X}_G=\root 5 \of {4 \times 6 \times 10 \times 20 \times 100}=13.69\) and

| \(x_i\) | \(\ln (x_i)\) | \(\ln (x_i)-\ln (\bar{X}_G)\) |
| --- | --- | --- |
| 4 | 1.386 | −123.00% |
| 6 | 1.792 | −82.45% |
| 10 | 2.303 | −31.37% |
| 20 | 2.996 | 37.94% |
| 100 | 4.605 | 198.89% |
| Sum |  | 0 |

Compounding the percentage deviations means that:

\(4=13.69 \times \exp (-123\%)\), \(\ldots \), \(100=13.69 \times \exp (198.89\%)\).

Thus, the geometric average is the value that balances the negative percentage deviations with the positive ones.
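A short Python sketch of this log-deviation property for the same data set, showing both that the deviations sum to zero and that each observation is recovered by compounding its deviation onto \(\bar{X}_G\):

```python
# The geometric mean as the center of the log (percentage) deviations.
import math

x = [4, 6, 10, 20, 100]
g = math.exp(sum(math.log(v) for v in x) / len(x))   # geometric mean, about 13.69
dev = [math.log(v) - math.log(g) for v in x]         # log deviations from ln(X_G)

print(round(g, 2), [round(d, 4) for d in dev], round(sum(dev), 10))  # deviations sum to 0
print([round(g * math.exp(d), 2) for d in dev])      # recovers 4, 6, 10, 20, 100
```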

Harmonic average

For the second data set, \(\bar{X}_H=\frac{5}{\frac{1}{4}+\frac{1}{6}+\frac{1}{10}+\frac{1}{20}+\frac{1}{100}}=8.671\) and

| \(x_i\) | \(x_i-\bar{X}_H\) | \(w_i=1/x_i\) | \(\frac{w_i}{\sum _{i=1}^n w_i}\) | \(\left( x_i-\bar{X}_H\right) \left( \frac{w_i}{\sum _{i=1}^n w_i}\right) \) |
| --- | --- | --- | --- | --- |
| 4 | −4.671 | 0.250 | 0.434 | −2.025 |
| 6 | −2.671 | 0.167 | 0.289 | −0.772 |
| 10 | 1.329 | 0.100 | 0.173 | 0.231 |
| 20 | 11.329 | 0.050 | 0.087 | 0.982 |
| 100 | 91.329 | 0.010 | 0.017 | 1.584 |
| Sum |  | 0.577 | 1 | 0 |

Thus, the harmonic average defines the center of the distribution such that the weighted deviations on its left compensate the weighted deviations on its right, with weights inversely proportional to the original values.
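A minimal Python sketch reproducing the table above: the weights are proportional to \(1/x_i\), and the weighted deviations from \(\bar{X}_H\) sum to zero.

```python
# The harmonic mean as the center of the 1/x-weighted deviations.
x = [4, 6, 10, 20, 100]
h = len(x) / sum(1.0 / v for v in x)          # harmonic mean, about 8.671
w = [1.0 / v for v in x]                      # raw weights 1/x_i
shares = [wi / sum(w) for wi in w]            # normalised weights, sum to 1
weighted_dev = [(v - h) * s for v, s in zip(x, shares)]

print(round(h, 3))
print([round(d, 3) for d in weighted_dev], round(sum(weighted_dev), 10))  # sums to 0
```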


About this article


Cite this article

Dias Curto, J. Averages: There is Still Something to Learn. Comput Econ 60, 755–779 (2022). https://doi.org/10.1007/s10614-021-10165-y

