Cosine Similarity of Multimodal Content Vectors for TV Programmes


arXiv - CS - Multimedia, Pub Date: 2020-09-23, DOI: arxiv-2009.11129
Saba Nazir, Taner Cagali, Chris Newell, Mehrnoosh Sadrzadeh

Multimodal information originates from a variety of sources: audiovisual files, textual descriptions, and metadata. We show how to represent the content encoded by each individual source as a vector, how to combine the vectors via middle and late fusion techniques, and how to compute semantic similarities between the contents. Our vector representations are built from spectral features and Bags of Audio Words for audio, LSI topics and Doc2vec embeddings for subtitles, and categorical features for metadata. We implement our model on a dataset of BBC TV programmes and evaluate the fused representations by using them to provide recommendations. The late fused similarity matrices significantly improve the precision and diversity of recommendations.
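
The abstract names cosine similarity, middle fusion, and late fusion without giving formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that middle fusion means concatenating the per-modality vectors of a programme before computing one similarity matrix, and that late fusion means computing one similarity matrix per modality and taking a weighted average of them. The function names, the equal default weights, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities between all programme vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def middle_fusion(*modalities):
    """ASSUMPTION: fuse by concatenating each programme's per-modality vectors,
    then compute a single similarity matrix on the concatenated vectors."""
    fused = []
    for per_programme in zip(*modalities):
        row = []
        for vec in per_programme:
            row.extend(vec)
        fused.append(row)
    return similarity_matrix(fused)


def late_fusion(*modalities, weights=None):
    """ASSUMPTION: fuse by a weighted average of per-modality similarity
    matrices; weights default to equal shares (a hypothetical choice)."""
    matrices = [similarity_matrix(m) for m in modalities]
    if weights is None:
        weights = [1.0 / len(matrices)] * len(matrices)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(n)]
        for i in range(n)
    ]


# Toy vectors for three programmes (made-up values, not from the BBC dataset).
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
subtitles = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

middle = middle_fusion(audio, subtitles)
late = late_fusion(audio, subtitles)
```

With raw concatenation the two schemes generally give different scores, because middle fusion lets a modality with larger vector norms dominate, while late fusion weights each modality's similarity explicitly. Ranking the off-diagonal entries of a row of either matrix gives a simple list of recommended programmes for the programme in that row.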


Updated: 2020-11-10
