A Lexical Approach to Estimating Environmental Goods and Services Output in the Construction Sector via Soft Classification of Enterprise Activity Descriptions Using Latent Dirichlet Allocation
Journal of Official Statistics ( IF 1.1 ) Pub Date : 2019-09-01 , DOI: 10.2478/jos-2019-0026
Gerard Keogh

Abstract: The research question addressed here is whether the semantic value implicit in environmental terms in an activity description text string can be translated into economic value for firms in the construction sector. We address this question using a relatively new applied statistical method called Latent Dirichlet Allocation (LDA). We first identify a satellite register of firms in the construction sector that engage in some form of environmental work. From these we construct a vocabulary of meaningful words. Then, for each firm on this satellite register in turn, we take its activity description text string and process it with LDA. This softly classifies the descriptions on the satellite register into just seven environmentally relevant topics. With this seven-topic classification we extract a statistically meaningful weight of evidence associated with the environmental terms in each activity description. This weight is applied to the associated firm's overall output value recorded on our national Business Register to arrive at a supply-side estimate of the firm's EGSS value. On this basis we find the EGSS estimate for construction in Ireland in 2013 is about EUR 229 million. We contrast this estimate with estimates from other countries obtained by demand-side methods and show that it compares satisfactorily, thereby enhancing its credibility. Our method also has the advantage that it provides a breakdown of EGSS output by EU environmental classifications (CEPA/CReMA), as these align closely with the discovered topics. We stress that the success of this application of LDA relies greatly on our small vocabulary, which is constructed directly from the satellite register.
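
The pipeline described in the abstract (restrict each description to a small vocabulary, softly classify it with LDA, then scale the firm's output by a weight) can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation. The vocabulary, the firm descriptions, the output values, the use of two topics rather than seven, and in particular the weight (here simply the share of description tokens found in the vocabulary) are all assumptions made for the example. The abstract does not state how the weight of evidence is computed. LDA is fitted here with a small collapsed Gibbs sampler written in plain Python.

```python
import random
from collections import Counter

# Hypothetical vocabulary; the paper builds its own from the satellite register.
VOCAB = {"insulation", "solar", "panel", "wastewater", "treatment", "sewer", "retrofit"}


def tokenize(text):
    """Lower-case whitespace tokenizer (an assumption, kept deliberately simple)."""
    return text.lower().split()


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    n_words = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                        # tokens in topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic conditional on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed topic proportions; each row sums to 1.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def egss_by_topic(description, output_value, theta_d):
    """Assumed weight: share of tokens in VOCAB, split across topics by theta_d."""
    tokens = tokenize(description)
    hits = sum(1 for t in tokens if t in VOCAB)
    weight = hits / len(tokens) if tokens else 0.0
    return [weight * output_value * p for p in theta_d]


# Made-up firms: (activity description, output value in EUR).
firms = [
    ("attic insulation and energy retrofit of houses", 1_000_000.0),
    ("wastewater treatment plant and sewer construction", 2_500_000.0),
    ("solar panel installation and general building", 800_000.0),
]
docs = [[t for t in tokenize(desc) if t in VOCAB] for desc, _ in firms]
theta = fit_lda(docs, n_topics=2)
estimates = [egss_by_topic(desc, out, th) for (desc, out), th in zip(firms, theta)]
for (desc, out), est in zip(firms, estimates):
    print(f"{desc!r}: EGSS estimate {sum(est):,.0f} of {out:,.0f}")
```

Because each row of `theta` sums to one, the per-topic figures for a firm add up to the weighted output, which is what would allow a breakdown by classification once topics are matched to CEPA/CReMA categories. With only three toy descriptions the fitted topics carry no real meaning.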

Updated: 2019-09-01