Prompt engineering for zero-shot and few-shot defect detection and classification using a visual-language pretrained model
Computer-Aided Civil and Infrastructure Engineering (IF 9.6), Pub Date: 2022-11-28, DOI: 10.1111/mice.12954
Gunwoo Yong, Kahyun Jeon, Daeyoung Gil, Ghang Lee

Zero-shot learning with vision-language pretrained (VLP) models is expected to be an alternative to existing deep learning models for defect detection when datasets are insufficient. However, the performance of VLP models, including contrastive language-image pretraining (CLIP), fluctuates with the prompts (inputs) they are given, which has motivated research on prompt engineering: the optimization of prompts to improve performance. Therefore, this study aims to identify the features of a prompt that yield the best performance in classifying and detecting building defects using the zero-shot and few-shot capabilities of CLIP. The results reveal the following: (1) domain-specific definitions are better than general definitions and images; (2) a complete sentence is better than a set of core terms; and (3) multimodal information is better than single-modal information. The detection performance achieved with the proposed prompting method exceeded that of existing supervised models.
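
To illustrate why the prompt matters in this setting, the scoring step of CLIP-style zero-shot classification can be sketched as follows: an image embedding is compared with one text embedding per prompt, and the most similar prompt determines the predicted class. This is a minimal sketch, not the authors' implementation. The three-dimensional vectors, the defect class names, and the `temperature` value are all hypothetical stand-ins; in practice the embeddings would come from the CLIP image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, prompt_embs, temperature=0.01):
    """Score an image embedding against one embedding per prompt.

    Returns the best-matching label and a softmax distribution over labels.
    `temperature` is an assumed scaling constant, not a value from the paper.
    """
    labels = list(prompt_embs)
    logits = [cosine(image_emb, prompt_embs[label]) / temperature for label in labels]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical toy embeddings; real ones would be produced by encoding
# a prompt such as a full-sentence, domain-specific defect definition.
prompt_embs = {
    "crack": [0.9, 0.1, 0.0],
    "water leakage": [0.1, 0.9, 0.1],
    "no defect": [0.0, 0.1, 0.9],
}
image_emb = [0.8, 0.2, 0.1]

label, probs = zero_shot_classify(image_emb, prompt_embs)
print(label, probs)
```

Because the prediction depends only on where each prompt lands in the embedding space, rewording a prompt (for example, replacing a bare class name with a complete sentence) changes its embedding and can change the result, which is the sensitivity the study investigates.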

Updated: 2022-11-28