Internet Research (IF 5.9) Pub Date: 2021-06-29, DOI: 10.1108/intr-05-2020-0299 Daejin Kim, Hyoung-Goo Kang, Kyounghun Bae, Seongmin Jeon
Purpose
To overcome the shortcomings of traditional industry classification systems such as the Standard Industrial Classification (SIC), the North American Industry Classification System (NAICS), and the Global Industry Classification Standard (GICS), the authors explore industry classifications using machine learning methods as an application of interpretable artificial intelligence (AI).
Design/methodology/approach
The authors propose a text-based industry classification combined with a machine learning technique by extracting distinguishable features from business descriptions in financial reports. The proposed method can reduce the dimensions of word vectors to avoid the curse of dimensionality when measuring the similarities of firms.
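The idea of comparing firms through low-dimensional text vectors can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not name the reduction technique, so feature hashing is swapped in here as a simple stand-in, and the firm names, descriptions, and the `dim=32` setting are all invented for the example.

```python
import math
import re
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def hashed_vector(text, dim=32):
    """Map word counts into a fixed, low-dimensional vector (feature hashing).

    Assumption: feature hashing stands in for whatever reduction the
    authors actually used; it only illustrates a fixed, small dimension.
    """
    vec = [0.0] * dim
    for word, count in Counter(tokenize(text)).items():
        # crc32 is deterministic across runs, unlike the built-in hash().
        bucket = zlib.crc32(word.encode("utf-8")) % dim
        vec[bucket] += count
    return vec


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical business descriptions (invented for illustration only).
descriptions = {
    "FirmA": "We design and sell semiconductor chips for mobile devices.",
    "FirmB": "We sell and design semiconductor chips for mobile devices.",
    "FirmC": "We operate retail grocery stores and online food delivery.",
}

vectors = {name: hashed_vector(text) for name, text in descriptions.items()}

# Print the similarity of every distinct pair of firms.
for a in vectors:
    for b in vectors:
        if a < b:
            print(a, b, round(cosine_similarity(vectors[a], vectors[b]), 3))
```

Every firm ends up with a vector of the same small length regardless of vocabulary size, so pairwise similarity stays cheap to compute. Hash collisions can merge unrelated words, which is one reason a real study would likely prefer a learned reduction.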
Findings
Using the proposed method, the sample firms form clusters of distinctive industries, thus overcoming the limitations of existing classifications. The method also clarifies industry boundaries based on lower-dimensional information. The graphical closeness between industries can reflect the industry-level relationship as well as the closeness between individual firms.
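One way to read "graphical closeness" at both the firm level and the industry level is sketched below. It assumes each firm already has 2-D coordinates from some dimensionality reduction step. The coordinates, firm names, and industry labels are hypothetical, and using centroid distance as the industry-level measure is an assumption of this sketch, not something the abstract specifies.

```python
import math
from collections import defaultdict

# Hypothetical 2-D coordinates for firms after dimensionality reduction,
# with hypothetical industry labels (all values invented for illustration).
firms = {
    "FirmA": ("Semiconductors", (0.0, 0.0)),
    "FirmB": ("Semiconductors", (2.0, 0.0)),
    "FirmC": ("Software", (1.0, 3.0)),
    "FirmD": ("Software", (3.0, 3.0)),
    "FirmE": ("Grocery retail", (10.0, 10.0)),
    "FirmF": ("Grocery retail", (12.0, 10.0)),
}


def industry_centroids(firms):
    """Average the coordinates of all firms that share an industry label."""
    grouped = defaultdict(list)
    for industry, point in firms.values():
        grouped[industry].append(point)
    return {
        industry: tuple(sum(axis) / len(points) for axis in zip(*points))
        for industry, points in grouped.items()
    }


centroids = industry_centroids(firms)

# Firm-level closeness: Euclidean distance between two firms.
firm_gap = math.dist(firms["FirmA"][1], firms["FirmB"][1])

# Industry-level closeness: Euclidean distance between industry centroids.
near = math.dist(centroids["Semiconductors"], centroids["Software"])
far = math.dist(centroids["Semiconductors"], centroids["Grocery retail"])

print("FirmA to FirmB:", firm_gap)
print("Semiconductors to Software:", round(near, 3))
print("Semiconductors to Grocery retail:", round(far, 3))
```

In this toy layout the semiconductor and software centroids sit much closer to each other than either does to grocery retail, which is the kind of industry-level relationship the plot is meant to expose.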
Originality/value
The authors’ work contributes to the industry classification literature by empirically investigating the effectiveness of machine learning methods. The text mining method addresses the limited timeliness of traditional industry classifications by capturing new information in annual reports. In addition, the authors’ approach can ease the computational burden of high dimensionality.
Title (from the Chinese translation): Artificial intelligence-enabled industry classification and its interpretation