Combination of machine learning-based automatic valuation models for residential properties in South Korea
Abstract
Machine learning (ML) techniques are increasingly being applied to automatic real estate valuation models. Their main advantage is that they can better capture the complexity of the value determination process, and as a result they have been shown to outperform conventional models. In this paper, the latest ML algorithms (i.e., support vector machine, random forest, XGBoost, LightGBM, and CatBoost) are examined as automatic valuation models, and several combination methods are proposed to improve the models’ predictive power. We applied the ML models to approximately 57,000 records of apartment transactions that occurred in Seoul in 2018, provided by South Korea’s Ministry of Land, Infrastructure, and Transport. The results are as follows. First, ML-based predictors (especially the latest decision tree-based algorithms) outperform conventional models. Second, the prediction error from one model can be partially offset by another model’s error, which implies that an efficient averaging of the predictors improves predictive accuracy. Third, the models’ relative performance can itself be learned by the ML algorithms, which means that they can also be used to recommend which algorithm should be selected for making predictions.
Keywords: automatic valuation model, mass appraisal, machine learning (ML) techniques, combined approach, decision tree-based algorithms
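The second finding, that one model's error can partially offset another's, can be illustrated with a minimal sketch. The abstract does not specify the paper's combination methods, so the inverse-MSE weighting below is an assumed, generic scheme rather than the authors' method. The data are synthetic, and the two predictors (`pred_a`, `pred_b`) are hypothetical stand-ins with independent errors.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def inverse_mse_weights(y_true, preds):
    """Weights proportional to 1/MSE; preds has shape (n_models, n_samples)."""
    mse = np.mean((preds - y_true) ** 2, axis=1)
    w = 1.0 / mse
    return w / w.sum()


# Synthetic targets standing in for log prices (assumption, not the paper's data).
rng = np.random.default_rng(0)
n = 10_000
y = rng.normal(loc=10.0, scale=1.0, size=n)

# Two hypothetical predictors whose errors are independent of each other.
pred_a = y + rng.normal(scale=0.20, size=n)
pred_b = y + rng.normal(scale=0.30, size=n)
preds = np.vstack([pred_a, pred_b])

# Estimate the weights on the first half, evaluate on the second half.
half = n // 2
w = inverse_mse_weights(y[:half], preds[:, :half])
combined = w @ preds[:, half:]

rmse_a = rmse(y[half:], pred_a[half:])
rmse_b = rmse(y[half:], pred_b[half:])
rmse_combined = rmse(y[half:], combined)

print(f"weights={w.round(3)}, A={rmse_a:.3f}, B={rmse_b:.3f}, combined={rmse_combined:.3f}")
```

Because the two error terms are independent, the weighted average has a lower error variance than either predictor alone. With strongly correlated errors the gain would shrink, which is why the choice of weights matters.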
This work is licensed under a Creative Commons Attribution 4.0 International License.