Current journal: arXiv - CS - Multimedia
  • Learn to Dance with AIST++: Music Conditioned 3D Dance Generation
    arXiv.cs.MM Pub Date : 2021-01-21
    Ruilong Li; Shan Yang; David A. Ross; Angjoo Kanazawa

    In this paper, we present a transformer-based learning framework for 3D dance generation conditioned on music. We carefully design our network architecture and empirically study the keys for obtaining qualitatively pleasing results. The critical components include a deep cross-modal transformer, which well learns the correlation between the music and dance motion; and the full-attention with future-N

    Updated: 2021-01-22
  • Weighted Fuzzy-Based PSNR for Watermarking
    arXiv.cs.MM Pub Date : 2021-01-21
    Maedeh Jamali; Nader Karimi; Shadrokh Samavi

    One of the problems of conventional visual quality evaluation criteria such as PSNR and MSE is the lack of appropriate standards based on the human visual system (HVS). They are calculated based on the difference of the corresponding pixels in the original and manipulated image. Hence, they practically do not provide a correct understanding of the image quality. Watermarking is an image processing

    Updated: 2021-01-22
  • Guidelines for the Development of Immersive Virtual Reality Software for Cognitive Neuroscience and Neuropsychology: The Development of Virtual Reality Everyday Assessment Lab (VR-EAL)
    arXiv.cs.MM Pub Date : 2021-01-20
    Panagiotis Kourtesis; Danai Korre; Simona Collina; Leonidas A. A. Doumas; Sarah E. MacPherson

    Virtual reality (VR) head-mounted displays (HMD) appear to be effective research tools, which may address the problem of ecological validity in neuropsychological testing. However, their widespread implementation is hindered by VR induced symptoms and effects (VRISE) and the lack of skills in VR software development. This study offers guidelines for the development of VR software in cognitive neuroscience

    Updated: 2021-01-21
  • Validation of the Virtual Reality Neuroscience Questionnaire: Maximum Duration of Immersive Virtual Reality Sessions Without the Presence of Pertinent Adverse Symptomatology
    arXiv.cs.MM Pub Date : 2021-01-20
    Panagiotis Kourtesis; Simona Collina; Leonidas A. A. Doumas; Sarah E. MacPherson

    Research suggests that the duration of a VR session modulates the presence and intensity of VRISE, but there are no suggestions regarding the appropriate maximum duration of VR sessions. The implementation of high-end VR HMDs in conjunction with ergonomic VR software seems to mitigate the presence of VRISE substantially. However, a brief tool does not currently exist to appraise and report both the

    Updated: 2021-01-21
  • Technological Competence is a Precondition for Effective Implementation of Virtual Reality Head Mounted Displays in Human Neuroscience: A Technological Review and Meta-analysis
    arXiv.cs.MM Pub Date : 2021-01-20
    Panagiotis Kourtesis; Simona Collina; Leonidas A. A. Doumas; Sarah E. MacPherson

    Immersive virtual reality (VR) emerges as a promising research and clinical tool. However, several studies suggest that VR induced adverse symptoms and effects (VRISE) may undermine the health and safety standards, and the reliability of the scientific results. In the current literature review, the technical reasons for the adverse symptomatology are investigated to provide suggestions and technological

    Updated: 2021-01-21
  • Wide Color Gamut Image Content Characterization: Method, Evaluation, and Applications
    arXiv.cs.MM Pub Date : 2021-01-19
    Junghyuk Lee; Toinon Vigier; Patrick Le Callet; Jong-Seok Lee

    In this paper, we propose a novel framework to characterize a wide color gamut image content based on perceived quality due to the processes that change color gamut, and demonstrate two practical use cases where the framework can be applied. We first introduce the main framework and implementation details. Then, we provide analysis for understanding of existing wide color gamut datasets with quantitative

    Updated: 2021-01-20
  • Ambiguity of Objective Image Quality Metrics: A New Methodology for Performance Evaluation
    arXiv.cs.MM Pub Date : 2021-01-19
    Manri Cheon; Toinon Vigier; Lukáš Krasula; Junghyuk Lee; Patrick Le Callet; Jong-Seok Lee

    Objective image quality metrics try to estimate the perceptual quality of the given image by considering the characteristics of the human visual system. However, it is possible that the metrics produce different quality scores even for two images that are perceptually indistinguishable by human viewers, which have not been considered in the existing studies related to objective quality assessment.

    Updated: 2021-01-20
  • Human Action Recognition Based on Multi-scale Feature Maps from Depth Video Sequences
    arXiv.cs.MM Pub Date : 2021-01-19
    Chang Li; Qian Huang; Xing Li; Qianhan Wu

    Human action recognition is an active research area in computer vision. Although great progress has been made, previous methods mostly recognize actions based on depth data at only one scale, and thus they often neglect multi-scale features that provide additional information for action recognition in practical application scenarios. In this paper, we present a novel framework focusing on multi-scale motion

    Updated: 2021-01-20
  • Designing a mobile game to generate player data -- lessons learned
    arXiv.cs.MM Pub Date : 2021-01-18
    William Wallis; William Kavanagh; Alice Miller; Tim Storer

    User friendly tools have lowered the requirements of high-quality game design to the point where researchers without development experience can release their own games. However, there is no established best-practice as few games have been produced for research purposes. Having developed a mobile game without the guidance of similar projects, we realised the need to share our experience so future researchers

    Updated: 2021-01-19
  • A Novel Local Binary Pattern Based Blind Feature Image Steganography
    arXiv.cs.MM Pub Date : 2021-01-16
    Soumendu Chakraborty; Anand Singh Jalal

    Steganography methods in general terms tend to embed more and more secret bits in the cover images. Most of these methods are designed to embed secret information in such a way that the change in the visual quality of the resulting stego image is not detectable. There exist some methods which preserve the global structure of the cover after embedding. However, the embedding capacity of these methods

    Updated: 2021-01-19
  • Attention Based Video Summaries of Live Online Zoom Classes
    arXiv.cs.MM Pub Date : 2021-01-15
    Hyowon Lee; Mingming Liu; Hamza Riaz; Navaneethan Rajasekaren; Michael Scriney; Alan F. Smeaton

    This paper describes a system developed to help University students get more from their online lectures, tutorials, laboratory and other live sessions. We do this by logging their attention levels on their laptops during live Zoom sessions and providing them with personalised video summaries of those live sessions. Using facial attention analysis software we create personalised video summaries composed

    Updated: 2021-01-19
  • Generalized Image Reconstruction over T-Algebra
    arXiv.cs.MM Pub Date : 2021-01-17
    Liang Liao; Xuechun Zhang; Xinqiang Wang; Sen Lin; Xin Liu

    Principal Component Analysis (PCA) is well known for its capability of dimension reduction and data compression. However, when using PCA for compressing/reconstructing images, images need to be recast to vectors. The vectorization of images causes some correlation constraints of neighboring pixels and spatial information to be lost. To deal with the drawbacks of the vectorizations adopted by PCA, we used

    Updated: 2021-01-19
  • A Hitchhiker's Guide to Structural Similarity
    arXiv.cs.MM Pub Date : 2021-01-16
    Abhinau K. Venkataramanan; Chengyang Wu; Alan C. Bovik; Ioannis Katsavounidis; Zafar Shahid

    The Structural Similarity (SSIM) Index is a very widely used image/video quality model that continues to play an important role in the perceptual evaluation of compression algorithms, encoding recipes and numerous other image/video processing algorithms. Several public implementations of the SSIM and Multiscale-SSIM (MS-SSIM) algorithms have been developed, which differ in efficiency and performance

    Updated: 2021-01-19
  • The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements
    arXiv.cs.MM Pub Date : 2021-01-15
    Lukas Stappen; Alice Baird; Lea Schumann; Björn Schuller

    Truly real-life data presents a strong, but exciting challenge for sentiment and emotion research. The high variety of possible 'in-the-wild' properties makes large datasets such as these indispensable with respect to building robust machine learning models. A sufficient quantity of data covering a deep variety in the challenges of each modality to force the exploratory analysis of the interplay of

    Updated: 2021-01-18
  • Video Summarization Using Deep Neural Networks: A Survey
    arXiv.cs.MM Pub Date : 2021-01-15
    Evlampios Apostolidis; Eleni Adamantidou; Alexandros I. Metsai; Vasileios Mezaris; Ioannis Patras

    Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content. Several approaches have been developed over the last couple of decades and the current state of the art is represented by methods that rely on modern deep neural network architectures. This work focuses on the recent advances in the area and provides a comprehensive

    Updated: 2021-01-18
  • Augmented Informative Cooperative Perception
    arXiv.cs.MM Pub Date : 2021-01-14
    Pengyuan Zhou; Pranvera Kortoci; Yui-Pan Yau; Tristan Braud; Xiujun Wang; Benjamin Finley; Lik-Hang Lee; Sasu Tarkoma; Jussi Kangasharju; Pan Hui

    Connected vehicles, whether equipped with advanced driver-assistance systems or fully autonomous, are currently constrained to visual information in their lines-of-sight. A cooperative perception system among vehicles increases their situational awareness by extending their perception ranges. Existing solutions imply significant network and computation load, as well as high flow of not-always-relevant

    Updated: 2021-01-15
  • Piano Skills Assessment
    arXiv.cs.MM Pub Date : 2021-01-13
    Paritosh Parmar; Jaiden Reddy; Brendan Morris

    Can a computer determine a piano player's skill level? Is it preferable to base this assessment on visual analysis of the player's performance or should we trust our ears over our eyes? Since current CNNs have difficulty processing long videos, how can shorter clips be sampled to best reflect the player's skill level? In this work, we collect and release a first-of-its-kind dataset for multimodal

    Updated: 2021-01-14
  • Urban land-use analysis using proximate sensing imagery: a survey
    arXiv.cs.MM Pub Date : 2021-01-13
    Zhinan Qiao; Xiaohui Yuan

    Urban regions are complicated functional systems that are closely associated with and reshaped by human activities. The propagation of online geographic information-sharing platforms and mobile devices equipped with Global Positioning System (GPS) greatly proliferates proximate sensing images taken near or on the ground at a close distance to urban targets. Studies leveraging proximate sensing imagery

    Updated: 2021-01-14
  • Evaluation of quality scalability techniques for video transmission
    arXiv.cs.MM Pub Date : 2021-01-12
    Wilder Castellanos

    The significant increase in the transmission of multimedia content over the Internet demands new delivery strategies to assure a good quality of experience for users. Transmission of video over packet networks is not an easy task due to multiple fluctuations of the network conditions. One possibility to improve the quality of some video streaming services is the combined use of the scalable video

    Updated: 2021-01-14
  • A Compact Deep Learning Model for Face Spoofing Detection
    arXiv.cs.MM Pub Date : 2021-01-12
    Seyedkooshan Hashemifard; Mohammad Akbari

    In recent years, the use of face biometric security systems has been rapidly increasing; therefore, presentation attack detection (PAD) has received significant attention from research communities and has become a major field of research. Researchers have tackled the problem with various methods, from exploiting conventional texture feature extraction such as LBP, BSIF, and LPQ to using deep neural networks with

    Updated: 2021-01-14
  • Network-Distributed Video Coding
    arXiv.cs.MM Pub Date : 2021-01-12
    Johan De Praeter; Christopher Hollmann; Rickard Sjoberg; Glenn Van Wallendael; Peter Lambert

    Nowadays, an enormous amount of videos are streamed every day to countless users, all using different devices and networks. These videos must be adapted in order to provide users with the most suitable video representation based on their device properties and current network conditions. However, the two most common techniques for video adaptation, simulcast and transcoding, represent two extremes.

    Updated: 2021-01-13
  • Multimodal Engagement Analysis from Facial Videos in the Classroom
    arXiv.cs.MM Pub Date : 2021-01-11
    Ömer Sümer; Patricia Goldberg; Sidney D'Mello; Peter Gerjets; Ulrich Trautwein; Enkelejda Kasneci

    Student engagement is a key construct for learning and teaching. While most of the literature explored the student engagement analysis on computer-based settings, this paper extends that focus to classroom instruction. To best examine student visual engagement in the classroom, we conducted a study utilizing the audiovisual recordings of classes at a secondary school over one and a half month's time

    Updated: 2021-01-13
  • Facial Biometric System for Recognition using Extended LGHP Algorithm on Raspberry Pi
    arXiv.cs.MM Pub Date : 2021-01-09
    Soumendu Chakraborty; Satish Kumar Singh; Kush Kumar

    In today's world, where the need for security is paramount and biometric access control systems are gaining mass acceptance due to their increased reliability, research in this area is quite relevant. Also, with the advent of IoT devices and increased community support for cheap and small computers like Raspberry Pi, it is more convenient than ever to design a complete standalone system for any purpose. This

    Updated: 2021-01-12
  • Efficiency of Using Utility for Usernames Verification in Online Community Management
    arXiv.cs.MM Pub Date : 2021-01-04
    Solomiia Fedushko; Yuriy Syerov; Oleksandr Skybinskyi; Nataliya Shakhovska; Zoryana Kunch

    The study deals with the methods and means of checking the reliability of usernames of online communities on the basis of computer-linguistic analysis of the results of their communicative interaction. The methodological basis of the study is a combination of general scientific methods and special approaches to the study of the data verification of online communities in the Ukrainian segment of the

    Updated: 2021-01-07
  • QoE-driven Secure Video Transmission in Cloud-edge Collaborative Networks
    arXiv.cs.MM Pub Date : 2021-01-05
    Tantan Zhao; Lijun He; Xinyu Huang; Fan Li

    Video transmission over the backhaul link in cloud-edge collaborative networks usually suffers security risks. Only a few existing studies focus on ensuring secure backhaul link transmission. However, video content characteristics, which have significant effects on quality of experience (QoE), are ignored in the study. In this paper, we investigate the QoE-driven cross-layer optimization of secure video

    Updated: 2021-01-06
  • End-to-End Video Question-Answer Generation with Generator-Pretester Network
    arXiv.cs.MM Pub Date : 2021-01-05
    Hung-Ting Su; Chen-Hsi Chang; Po-Wei Shen; Yu-Siang Wang; Ya-Liang Chang; Yu-Cheng Chang; Pu-Jen Cheng; Winston H. Hsu

    We study a novel task, Video Question-Answer Generation (VQAG), for the challenging Video Question Answering (Video QA) task in multimedia. Due to expensive data annotation costs, many widely used, large-scale Video QA datasets such as Video-QA, MSVD-QA and MSRVTT-QA are automatically annotated using Caption Question Generation (CapQG) which inputs captions instead of the video itself. As captions neither

    Updated: 2021-01-06
  • A Database for Digital Image Forensics of Recaptured Document
    arXiv.cs.MM Pub Date : 2021-01-05
    Changsheng Chen; Shuzheng Zhang; Fengbo Lan

    The recapturing attack on document images is a topic that has received little research attention. However, such an attack can be employed as a simple but effective anti-forensic tool for digital document images. In this work, we present a high quality captured and recaptured image dataset of some representative identity documents to facilitate the study of this important issue. To highlight the risks posed by such an attack

    Updated: 2021-01-06
  • Interpersonal distance in VR: reactions of older adults to the presence of a virtual agent
    arXiv.cs.MM Pub Date : 2021-01-05
    Grzegorz Pochwatko; Barbara Karpowicz; Anna Chrzanowska; Wiesław Kopeć

    The rapid development of virtual reality technology has increased its availability and, consequently, increased the number of its possible applications. The interest in the new medium has grown due to the entertainment industry (games, VR experiences and movies). The number of freely available training and therapeutic applications is also increasing. Contrary to popular opinion, new technologies are

    Updated: 2021-01-06
  • Similarity Reasoning and Filtration for Image-Text Matching
    arXiv.cs.MM Pub Date : 2021-01-05
    Haiwen Diao; Ying Zhang; Lin Ma; Huchuan Lu

    Image-text matching plays a critical role in bridging the vision and language, and great progress has been made by exploiting the global alignment between image and sentence, or local alignments between regions and words. However, how to make the most of these alignments to infer more accurate matching scores is still underexplored. In this paper, we propose a novel Similarity Graph Reasoning and Attention

    Updated: 2021-01-06
  • All Factors Should Matter! Reference Checklist for Describing Research Conditions in Pursuit of Comparable IVR Experiments
    arXiv.cs.MM Pub Date : 2021-01-04
    Kinga Skorupska; Daniel Cnotkowski; Julia Paluch; Rafał Masłyk; Anna Jaskulska; Monika Kornacka; Wiesław Kopeć

    A significant problem with immersive virtual reality (IVR) experiments is the ability to compare research conditions. VR kits and IVR environments are complex and diverse but researchers from different fields, e.g. ICT, psychology, or marketing, often neglect to describe them with a level of detail sufficient to situate their research on the IVR landscape. Careful reporting of these conditions may

    Updated: 2021-01-06
  • CSIS: compressed sensing-based enhanced-embedding capacity image steganography scheme
    arXiv.cs.MM Pub Date : 2021-01-03
    Rohit Agrawal; Kapil Ahuja

    Image steganography plays a vital role in securing secret data by embedding it in the cover images. Usually, these images are communicated in a compressed format. Existing techniques achieve this but have low embedding capacity. Enhancing this capacity causes a deterioration in the visual quality of the stego-image. Hence, our goal here is to enhance the embedding capacity while preserving the visual

    Updated: 2021-01-05
  • Duration-Squeezing-Aware Communication and Computing for Proactive VR
    arXiv.cs.MM Pub Date : 2021-01-03
    Xing Wei; Chenyang Yang; Shengqian Han

    Proactive tile-based virtual reality video streaming computes and delivers the predicted tiles to be requested before playback. All existing works overlook the important fact that computing and communication (CC) tasks for a segment may squeeze the time for the tasks for the next segment, which will cause less and less available time for the latter segments. In this paper, we jointly optimize the durations

    Updated: 2021-01-05
  • Deploying Crowdsourcing for Workflow Driven Business Process
    arXiv.cs.MM Pub Date : 2021-01-04
    Rafał Masłyk; Kinga Skorupska; Piotr Gago; Marcin Niewiński; Barbara Karpowicz; Anna Jaskulska; Katarzyna Abramczuk; Wiesław Kopeć

    The main goal of this paper is to discuss how to integrate the possibilities of crowdsourcing platforms with systems supporting workflow to enable the engagement and interaction with business tasks of a wider group of people. Thus, this work is an attempt to expand the functional capabilities of typical business systems by allowing selected process tasks to be performed by unlimited human resources

    Updated: 2021-01-05
  • Personal Privacy Protection via Irrelevant Faces Tracking and Pixelation in Video Live Streaming
    arXiv.cs.MM Pub Date : 2021-01-04
    Jizhe Zhou; Chi-Man Pun

    To date, pixelation tasks intended for privacy protection are still labor-intensive and yet to be studied. With the prevalence of video live streaming, establishing an online face pixelation mechanism during streaming is an urgent need. In this paper, we develop a new method called Face Pixelation in Video Live Streaming (FPVLS) to generate automatic personal privacy filtering during unconstrained streaming

    Updated: 2021-01-05
  • Temporal Contrastive Graph for Self-supervised Video Representation Learning
    arXiv.cs.MM Pub Date : 2021-01-04
    Yang Liu; Keze Wang; Haoyuan Lan; Liang Lin

    Attempting to fully explore the fine-grained temporal structure and global-local chronological characteristics for self-supervised video representation learning, this work takes a closer look at exploiting the temporal structure of videos and further proposes a novel self-supervised method named Temporal Contrastive Graph (TCG). In contrast to the existing methods that randomly shuffle the video frames

    Updated: 2021-01-05
  • News Image Steganography: A Novel Architecture Facilitates the Fake News Identification
    arXiv.cs.MM Pub Date : 2021-01-03
    Jizhe Zhou; Chi-Man Pun; Yu Tong

    A larger portion of fake news quotes untampered images from other sources with ulterior motives rather than conducting image forgery. Such elaborate engraftments keep the inconsistency between images and text reports stealthy, thereby, palm off the spurious for the genuine. This paper proposes an architecture named News Image Steganography (NIS) to reveal the aforementioned inconsistency through image

    Updated: 2021-01-05
  • Privacy-sensitive Objects Pixelation for Live Video Streaming
    arXiv.cs.MM Pub Date : 2021-01-03
    Jizhe Zhou; Chi-Man Pun; Yu Tong

    With the prevalence of live video streaming, establishing an online pixelation method for privacy-sensitive objects is an urgent need. Because the detection of privacy-sensitive objects is inaccurate, simply migrating the tracking-by-detection structure into the online form will incur problems in target initialization, drifting, and over-pixelation. To cope with the inevitable but impacting detection issue

    Updated: 2021-01-05
  • Identity-aware Facial Expression Recognition in Compressed Video
    arXiv.cs.MM Pub Date : 2021-01-01
    Xiaofeng Liu; Linghao Jin; Xu Han; Jun Lu; Jane You; Lingsheng Kong

    This paper aims to explore facial expression representations free of inter-subject variations in the compressed video domain. Most of the previous methods process the RGB images of a sequence, while the off-the-shelf and valuable expression-related muscle movement is already embedded in the compression format. In the up to two orders of magnitude compressed domain, we can explicitly infer the

    Updated: 2021-01-05
  • Overview of MediaEval 2020 Predicting Media Memorability Task: What Makes a Video Memorable?
    arXiv.cs.MM Pub Date : 2020-12-31
    Alba García Seco De Herrera; Rukiye Savran Kiziltepe; Jon Chamberlain; Mihai Gabriel Constantin; Claire-Hélène Demarty; Faiyaz Doctor; Bogdan Ionescu; Alan F. Smeaton

    This paper describes the MediaEval 2020 Predicting Media Memorability task. After first being proposed at MediaEval 2018, the Predicting Media Memorability task is in its 3rd edition this year, as the prediction of short-term and long-term video memorability (VM) remains a challenging task. In 2020, the format remained the same as in previous editions. This year the videos are a subset of

    Updated: 2021-01-01
  • Investigating Memorability of Dynamic Media
    arXiv.cs.MM Pub Date : 2020-12-31
    Phuc H. Le-Khac; Ayush K. Rai; Graham Healy; Alan F. Smeaton; Noel E. O'Connor

    The Predicting Media Memorability task in MediaEval'20 has some challenging aspects compared to previous years. In this paper we identify the high-dynamic content in videos and dataset of limited size as the core challenges for the task, we propose directions to overcome some of these challenges and we present our initial result in these directions.

    Updated: 2021-01-01
  • Leveraging Audio Gestalt to Predict Media Memorability
    arXiv.cs.MM Pub Date : 2020-12-31
    Lorin Sweeney; Graham Healy; Alan F. Smeaton

    Memorability determines what evanesces into emptiness, and what worms its way into the deepest furrows of our minds. It is the key to curating more meaningful media content as we wade through daily digital torrents. The Predicting Media Memorability task in MediaEval 2020 aims to address the question of media memorability by setting the task of automatically predicting video memorability. Our approach

    Updated: 2021-01-01
  • Sub-sampled Cross-component Prediction for Emerging Video Coding Standards
    arXiv.cs.MM Pub Date : 2020-12-30
    Junru Li; Meng Wang; Li Zhang; Shiqi Wang; Kai Zhang; Shanshe Wang; Siwei Ma; Wen Gao

    Cross-component linear model (CCLM) prediction has been repeatedly proven to be effective in reducing the inter-channel redundancies in video compression. Essentially speaking, the linear model is identically trained by employing accessible luma and chroma reference samples at both encoder and decoder, elevating the level of operational complexity due to the least square regression or max-min based

    Updated: 2021-01-01
  • An Efficient QP Variable Convolutional Neural Network Based In-loop Filter for Intra Coding
    arXiv.cs.MM Pub Date : 2020-12-30
    Zhijie Huang; Xiaopeng Guo; Mingyu Shang; Jie Gao; Jun Sun

    In this paper, a novel QP variable convolutional neural network based in-loop filter is proposed for VVC intra coding. To avoid training and deploying multiple networks, we develop an efficient QP attention module (QPAM) which can capture compression noise levels for different QPs and emphasize meaningful features along channel dimension. Then we embed QPAM into the residual block, and based on it

    Updated: 2021-01-01
  • Exploration of Voice User Interfaces for Older Adults - A Pilot Study to Address Progressive Vision Loss
    arXiv.cs.MM Pub Date : 2020-12-31
    Anna Jaskulska; Kinga Skorupska; Barbara Karpowicz; Cezary Biele; Jarosław Kowalski; Wiesław Kopeć

    Voice User Interfaces (VUIs) owing to recent developments in Artificial Intelligence (AI) and Natural Language Processing (NLP), are becoming increasingly intuitive and functional. They are especially promising for older adults, also with special needs, as VUIs remove some barriers related to access to Information and Communications Technology (ICT) solutions. In this pilot study we examine interdisciplinary

    Updated: 2021-01-01
  • The VIP Gallery for Video Processing Education
    arXiv.cs.MM Pub Date : 2020-12-29
    Todd Goodall; Alan C. Bovik

    Digital video pervades daily life. Mobile video, digital TV, and digital cinema are now ubiquitous, and as such, the field of Digital Video Processing (DVP) has experienced tremendous growth. Digital video systems also permeate scientific and engineering disciplines including but not limited to astronomy, communications, surveillance, entertainment, video coding, computer vision, and vision research

    Updated: 2021-01-01
  • Detecting Medical Misinformation on Social Media Using Multimodal Deep Learning
    arXiv.cs.MM Pub Date : 2020-12-27
    Zuhui Wang; Zhaozheng Yin; Young Anna Argyris

    In 2019, outbreaks of vaccine-preventable diseases reached the highest number in the US since 1992. Medical misinformation, such as antivaccine content propagating through social media, is associated with increases in vaccine delay and refusal. Our overall goal is to develop an automatic detector for antivaccine messages to counteract the negative impact that antivaccine messages have on the public

    Updated: 2020-12-29
  • Study On Coding Tools Beyond AV1
    arXiv.cs.MM Pub Date : 2020-12-25
    Xin Zhao; Liang Zhao; Madhu Krishnan; Yixin Du; Shan Liu; Debargha Mukherjee; Yaowu Xu; Adrian Grange

    The Alliance for Open Media has recently initiated coding tool exploration activities towards the next-generation video coding beyond AV1. With this regard, this paper presents a package of coding tools that have been investigated, implemented and tested on top of the codebase, known as libaom, which is used for the exploration of next-generation video compression tools. The proposed tools cover several

    Updated: 2020-12-29
  • Deep Learning-Based Human Pose Estimation: A Survey
    arXiv.cs.MM Pub Date : 2020-12-24
    Ce Zheng; Wenhan Wu; Taojiannan Yang; Sijie Zhu; Chen Chen; Ruixu Liu; Ju Shen; Nasser Kehtarnavaz; Mubarak Shah

    Human pose estimation aims to locate the human body parts and build human body representation (e.g., body skeleton) from input data such as images and videos. It has drawn increasing attention during the past decade and has been utilized in a wide range of applications including human-computer interaction, motion analysis, augmented reality, and virtual reality. Although the recently developed deep

    Updated: 2020-12-25
  • Digital Reconstruction of Elmina Castle for Mobile Virtual Reality via Point-based Detail Transfer
    arXiv.cs.MM Pub Date : 2020-12-19
    Sifan Ye; Ting Wu; Michael Jarvis; Yuhao Zhu

    Reconstructing 3D models from large, dense point clouds is critical to enable Virtual Reality (VR) as a platform for entertainment, education, and heritage preservation. Existing 3D reconstruction systems inevitably make trade-offs between three conflicting goals: the efficiency of reconstruction (e.g., time and memory requirements), the visual quality of the constructed scene, and the rendering speed

    Updated: 2020-12-22
  • Self-Supervision based Task-Specific Image Collection Summarization
    arXiv.cs.MM Pub Date : 2020-12-19
    Anurag Singh; Deepak Kumar Sharma; Sudhir Kumar Sharma; Joel J. P. C. Rodrigues

    Successful applications of deep learning (DL) require large amounts of annotated data. This often restricts the benefits of employing DL to businesses and individuals with large budgets for data-collection and computation. Summarization offers a possible solution by creating much smaller representative datasets that can allow real-time deep learning and analysis of big data and thus democratize use

    Updated: 2020-12-22
  • PPGN: Phrase-Guided Proposal Generation Network For Referring Expression Comprehension
    arXiv.cs.MM Pub Date : 2020-12-20
    Chao Yang; Guoqing Wang; Dongsheng Li; Huawei Shen; Su Feng; Bin Jiang

    Reference expression comprehension (REC) aims to find the location that the phrase refers to in a given image. Proposal generation and proposal representation are two effective techniques in many two-stage REC methods. However, most of the existing works only focus on proposal representation and neglect the importance of proposal generation. As a result, the low-quality proposals generated by these

    Updated: 2020-12-22
  • Overcoming Language Priors with Self-supervised Learning for Visual Question Answering
    arXiv.cs.MM Pub Date : 2020-12-17
    Xi Zhu; Zhendong Mao; Chunxiao Liu; Peng Zhang; Bin Wang; Yongdong Zhang

    Most Visual Question Answering (VQA) models suffer from the language prior problem, which is caused by inherent data biases. Specifically, VQA models tend to answer questions (e.g., what color is the banana?) based on the high-frequency answers (e.g., yellow) ignoring image contents. Existing approaches tackle this problem by creating delicate models or introducing additional visual annotations to

    Updated: 2020-12-22
  • Visual Speech Enhancement Without A Real Visual Stream
    arXiv.cs.MM Pub Date : 2020-12-20
    Sindhu B Hegde; K R Prajwal; Rudrabha Mukhopadhyay; Vinay Namboodiri; C. V. Jawahar

    In this work, we re-think the task of speech enhancement in unconstrained real-world environments. Current state-of-the-art methods use only the audio stream and are limited in their performance in a wide range of real-world noises. Recent works using lip movements as additional cues improve the quality of generated speech over "audio-only" methods. However, these methods cannot be used for several applications

    Updated: 2020-12-22
  • Self-Supervised Sketch-to-Image Synthesis
    arXiv.cs.MM Pub Date : 2020-12-16
    Bingchen Liu; Yizhe Zhu; Kunpeng Song; Ahmed Elgammal

    Imagining a colored realistic image from an arbitrarily drawn sketch is one of the human capabilities that we are eager for machines to mimic. Unlike previous methods that either require sketch-image pairs or utilize low-quantity detected edges as sketches, we study the exemplar-based sketch-to-image (s2i) synthesis task in a self-supervised learning manner, eliminating the necessity of the paired sketch

    Updated: 2020-12-18
  • 3D Trajectory Design for UAV-Assisted Oblique Image Acquisition
    arXiv.cs.MM Pub Date : 2020-12-16
    Xiao-Wei Tang; Changsheng You; Shuowen Zhang; Xin-Lin Huang; Rui Zhang

    In this correspondence, we consider a new unmanned aerial vehicle (UAV)-assisted oblique image acquisition system where a UAV is dispatched to take images of multiple ground targets (GTs). To study the three-dimensional (3D) UAV trajectory design for image acquisition, we first propose a novel UAV-assisted oblique photography model, which characterizes the image resolution with respect to the UAV's

    Updated: 2020-12-17
  • An adaptive algorithm for embedding information into compressed JPEG images using the QIM method
    arXiv.cs.MM Pub Date : 2020-12-16
    Anna Melman; Pavel Petrov; Alexander Shelupanov

    The widespread use of JPEG images makes them good covers for storing and transmitting secret messages. This paper proposes a new algorithm for embedding information in JPEG images based on the steganographic QIM method. The main problem of such embedding is the vulnerability to statistical steganalysis. To solve this problem, it is proposed to use a variable quantization step, which is adaptively selected

    Updated: 2020-12-17
  • Secret Key Agreement with Physical Unclonable Functions: An Optimality Summary
    arXiv.cs.MM Pub Date : 2020-12-16
    Onur Günlü; Rafael F. Schaefer

    We address security and privacy problems for digital devices and biometrics from an information-theoretic optimality perspective, where a secret key is generated for authentication, identification, message encryption/decryption, or secure computations. A physical unclonable function (PUF) is a promising solution for local security in digital devices and this review gives the most relevant summary for

    Updated: 2020-12-17
  • Learning-Based Quality Assessment for Image Super-Resolution
    arXiv.cs.MM Pub Date : 2020-12-16
    Tiesong Zhao; Yuting Lin; Yiwen Xu; Weiling Chen; Zhou Wang

    Image Super-Resolution (SR) techniques improve visual quality by enhancing the spatial resolution of images. Quality evaluation metrics play a critical role in comparing and optimizing SR algorithms, but current metrics achieve only limited success, largely due to the lack of large-scale quality databases, which are essential for learning accurate and robust SR quality metrics. In this work, we first

    Updated: 2020-12-17
  • A Deep Multi-Level Attentive network for Multimodal Sentiment Analysis
    arXiv.cs.MM Pub Date : 2020-12-15
    Ashima Yadav; Dinesh Kumar Vishwakarma

    Multimodal sentiment analysis has attracted increasing attention with broad application prospects. Existing methods focus on a single modality, which fails to capture social media content across multiple modalities. Moreover, in multi-modal learning, most of the works have focused on simply combining the two modalities, without exploring the complicated correlations between them. This resulted

    Updated: 2020-12-16
  • An Artistic Visualization of Music Modeling a Synesthetic Experience
    arXiv.cs.MM Pub Date : 2020-12-15
    Matthew Joseph Adiletta; Oliver Thomas

    This project brings music to sight. Music can be a visual masterpiece. Some people naturally experience a visualization of audio - a condition called synesthesia. The type of synesthesia explored is the one in which sounds create colors in the 'mind's eye.' The project included interviews with people who experience synesthesia, examination of prior art, and topic research to inform project design. Audio input, digital

    Updated: 2020-12-16
Contents have been reproduced by permission of the publishers.