On semi-supervised multiple representation behavior learning
Journal of Computational Science ( IF 3.1 ) Pub Date : 2020-05-23 , DOI: 10.1016/j.jocs.2020.101111
Ruqian Lu , Shengluan Hou

Since Shahshahani and Landgrebe published their seminal paper in 1994 (Shahshahani and Landgrebe, 1994) [1], research on semi-supervised learning (SSL) has developed rapidly and has become one of the main streams of machine learning (ML) research. However, there are still areas and problems where the capability of SSL remains seriously limited. Firstly, according to our observation, almost all SSL research targets classification, regression or clustering tasks. More difficult tasks such as planning, construction, summarization and argumentation are rarely studied with SSL methods. Secondly, most SSL research uses only simple labels (e.g. a string, an identifier or a numerical value) to mark the text data. Such simple labels can hardly characterize data that carry fine-grained information. This limitation may be the reason why current SSL techniques are ill-suited to complex tasks. Thirdly, in the age of big data and big knowledge, SSL, like the other branches of ML, now faces the challenge of learning big knowledge from big data. The shortcomings of traditional SSL mentioned above have become even more serious, and new SSL technology is needed.

In this paper, we propose and discuss a novel paradigm of SSL: semi-supervised multiple representation behavior learning (SSMRBL). It aims to meet the challenges to SSL stated above. SSMRBL is intended to extend current SSL techniques to support the learning of complex tasks such as planning, construction, summarization and argumentation. To this end, SSMRBL introduces compound structured labels, such as trees, graphs and lattices, to represent the complicated information of the objects and tasks to be learned. Thus, to label an unlabeled datum is to construct a compound structured label for it. As a consequence, SSMRBL needs multiple representations: there may be one for the compound structured labels, one for the target model, which is the unification of all local models (labels), one for the process (behavior) of label construction, and one for efficient computation during learning. This paper also introduces a typical instance of SSMRBL, semi-supervised grammar learning (SSGL), which learns a grammar from a set of natural language texts and then applies this grammar to parse new texts and to summarize their content. We also provide experimental results based on a variety of algorithms to show the soundness of our ideas.
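As a rough illustration of what "labeling by constructing a compound structured label" can mean, the sketch below treats a parse tree as the label, unifies the rules found in a few labeled trees into one grammar, and then builds a tree label for an unlabeled sentence. It is not the authors' SSGL algorithm: the toy sentences, the nested-tuple tree encoding, the function names (`extract_rules`, `learn_grammar`, `parse`) and the use of standard CKY parsing are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy data: each label is a binary parse tree as nested tuples.
# Leaf: (tag, word). Internal node: (tag, left_subtree, right_subtree).
LABELED = [
    ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats"))),
    ("S", ("NP", "cats"), ("VP", ("V", "fear"), ("NP", "dogs"))),
]


def extract_rules(tree, binary, lexical):
    """Record the grammar rules used by one tree label (a 'local model')."""
    tag = tree[0]
    if len(tree) == 2 and isinstance(tree[1], str):
        lexical[tree[1]].add(tag)          # rule: tag -> word
        return
    left, right = tree[1], tree[2]
    binary[(left[0], right[0])].add(tag)   # rule: tag -> left_tag right_tag
    extract_rules(left, binary, lexical)
    extract_rules(right, binary, lexical)


def learn_grammar(trees):
    """Unify the rules of all labeled trees into one grammar."""
    binary, lexical = defaultdict(set), defaultdict(set)
    for tree in trees:
        extract_rules(tree, binary, lexical)
    return binary, lexical


def parse(words, binary, lexical, root="S"):
    """CKY parsing: construct one tree label for `words`, or None."""
    n = len(words)
    if n == 0:
        return None
    chart = {}  # (start, end) -> {tag: tree covering words[start:end]}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = {tag: (tag, word) for tag in lexical.get(word, ())}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = {}
            for k in range(i + 1, j):
                for ltag, ltree in chart[(i, k)].items():
                    for rtag, rtree in chart[(k, j)].items():
                        for parent in binary.get((ltag, rtag), ()):
                            # Keep the first tree found for each tag.
                            cell.setdefault(parent, (parent, ltree, rtree))
            chart[(i, j)] = cell
    return chart[(0, n)].get(root)


if __name__ == "__main__":
    binary, lexical = learn_grammar(LABELED)
    # Label an unlabeled sentence by constructing a tree for it.
    print(parse("dogs fear cats".split(), binary, lexical))
```

In a semi-supervised loop, trees constructed this way for unlabeled sentences could be added back to the labeled set before the grammar is re-learned; how SSGL actually selects and verifies such labels is not specified in the abstract.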




Updated: 2020-05-23