Abstract
Real estate is one of the most critical investments in a household portfolio and represents the largest share of private household wealth in highly developed countries. This research provides a succinct review of machine learning techniques for predicting house prices. Data on dwelling transaction prices in Taipei City were collected from the real price registration system of the Ministry of the Interior, Taiwan. Four well-known artificial intelligence techniques, namely Artificial Neural Networks (ANNs), Support Vector Machine, Classification and Regression Tree, and Linear Regression, were used to develop both baseline and ensemble models. A hybrid model was also built, and its predictive performance was compared with that of the individual models in both the baseline and ensemble schemes. The comprehensive comparison indicated that the particle swarm optimization (PSO)-Bagging-ANNs hybrid model outperforms the other models proposed herein as well as those reported in the literature. Providing multiple prediction models allows users to select the one best suited to predicting house prices, based on their background, needs, and understanding of machine learning.
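The abstract does not specify how the PSO-Bagging-ANNs hybrid is configured. As a rough illustration only, the sketch below assumes that PSO searches two hyperparameters of a bagged one-hidden-layer network (the number of hidden units and the learning rate) by minimizing validation RMSE. The synthetic data, the helper names (`train_mlp`, `bagged_mlp_predict`, `pso`, `fitness`), the search bounds, and the PSO coefficients (inertia 0.7, acceleration constants 1.5) are hypothetical choices, not the authors' settings. The sketch uses only NumPy.

```python
import numpy as np

# Hypothetical sketch of a PSO-tuned bagged ANN; NOT the authors' configuration.


def train_mlp(X, y, hidden, lr, epochs, rng):
    """Fit a one-hidden-layer tanh network by full-batch gradient descent on 0.5 * MSE."""
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden activations, shape (n, hidden)
        err = H @ W2 + b2 - y                     # residuals, shape (n,)
        dZ = np.outer(err, W2) * (1.0 - H ** 2)   # backpropagate through tanh (uses old W2)
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dZ) / n
        b1 -= lr * dZ.mean(axis=0)
    return W1, b1, W2, b2


def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2


def bagged_mlp_predict(X_tr, y_tr, X_new, hidden, lr, n_bags=5, epochs=300, seed=0):
    """Bagging: train each member on a bootstrap resample and average the predictions."""
    rng = np.random.default_rng(seed)             # fixed seed keeps the PSO fitness deterministic
    n = len(X_tr)
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)          # draw n rows with replacement
        params = train_mlp(X_tr[idx], y_tr[idx], hidden, lr, epochs, rng)
        preds.append(predict_mlp(params, X_new))
    return np.mean(preds, axis=0)


def pso(fitness, lower, upper, n_particles=6, n_iter=8, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization that minimizes `fitness` inside a box."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pos = rng.uniform(lower, upper, size=(n_particles, len(lower)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
    for _ in range(n_iter):
        gbest = pbest[np.argmin(pbest_val)]
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Inertia + pull toward each particle's own best + pull toward the swarm's best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([fitness(p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
    best = np.argmin(pbest_val)
    return pbest[best].copy(), float(pbest_val[best])


# Synthetic stand-in for house-price data: three hypothetical standardized features.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = 2 * X[:, 0] - X[:, 1] ** 2 + 0.5 * np.sin(3 * X[:, 2]) + rng.normal(0.0, 0.1, 200)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]


def fitness(p):
    """Validation RMSE of a bagged network with (hidden units, learning rate) = p."""
    pred = bagged_mlp_predict(X_tr, y_tr, X_val, hidden=int(round(p[0])), lr=p[1])
    rmse = float(np.sqrt(np.mean((pred - y_val) ** 2)))
    return rmse if np.isfinite(rmse) else np.inf  # penalize diverged runs


best, best_rmse = pso(fitness, lower=[2, 0.01], upper=[16, 0.2])
print(f"hidden units = {int(round(best[0]))}, learning rate = {best[1]:.3f}, RMSE = {best_rmse:.3f}")
```

Bagging reduces the variance of individual networks by averaging models trained on bootstrap resamples. PSO needs only the fitness value, so the same loop could in principle wrap any of the other learners named above. Rounding the hidden-unit count lets PSO's continuous search handle an integer hyperparameter.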
Data availability
The core code and data that support the findings of this study are available from the corresponding author upon reasonable request.
Funding
Funding was provided by the Ministry of Science and Technology, Taiwan.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chou, JS., Fleshman, DB. & Truong, DN. Comparison of machine learning models to provide preliminary forecasts of real estate prices. J Hous and the Built Environ 37, 2079–2114 (2022). https://doi.org/10.1007/s10901-022-09937-1
DOI: https://doi.org/10.1007/s10901-022-09937-1