Current journal: arXiv - CS - Multimedia
  • Packet Compressed Sensing Imaging (PCSI): Robust Image Transmission over Noisy Channels
    arXiv.cs.MM Pub Date : 2020-09-24
    Scott Howard; Grant Barthelmes; Cara Ravasio; Lisa Huang; Benjamin Poag; Varun Mannam

    Packet Compressed Sensing Imaging (PCSI) is a digital unconnected image transmission method resilient to packet loss. The goal is to develop a robust image transmission method that is computationally trivial to transmit (e.g., compatible with low-power 8-bit microcontrollers) and well suited for weak signal environments where packets are likely to be lost. In other image transmission techniques, noise

    Updated: 2020-09-25
  • Cosine Similarity of Multimodal Content Vectors for TV Programmes
    arXiv.cs.MM Pub Date : 2020-09-23
    Saba Nazir; Taner Cagali; Chris Newell; Mehrnoosh Sadrzadeh

    Multimodal information originates from a variety of sources: audiovisual files, textual descriptions, and metadata. We show how one can represent the content encoded by each individual source using vectors, how to combine the vectors via middle and late fusion techniques, and how to compute the semantic similarities between the contents. Our vectorial representations are built from spectral features

    Updated: 2020-09-24
  • Exploring global diverse attention via pairwise temporal relation for video summarization
    arXiv.cs.MM Pub Date : 2020-09-23
    Ping Li; Qinghao Ye; Luming Zhang; Li Yuan; Xianghua Xu; Ling Shao

    Video summarization is an effective way to facilitate video searching and browsing. Most of existing systems employ encoder-decoder based recurrent neural networks, which fail to explicitly diversify the system-generated summary frames while requiring intensive computations. In this paper, we propose an efficient convolutional neural network architecture for video SUMmarization via Global Diverse Attention

    Updated: 2020-09-24
  • Can we trust online crowdworkers? Comparing online and offline participants in a preference test of virtual agents
    arXiv.cs.MM Pub Date : 2020-09-22
    Patrik Jonell; Taras Kucherenko; Ilaria Torre; Jonas Beskow

    Conducting user studies is a crucial component in many scientific fields. While some studies require participants to be physically present, other studies can be conducted both physically (e.g. in-lab) and online (e.g. via crowdsourcing). Inviting participants to the lab can be a time-consuming and logistically difficult endeavor, not to mention that sometimes research groups might not be able to run

    Updated: 2020-09-24
  • H.264/SVC Mode Decision Based on Mode Correlation and Desired Mode List
    arXiv.cs.MM Pub Date : 2020-09-22
    L. Balaji; K. K. Thyagharajan

    The design of video encoders involves the implementation of fast mode decision (FMD) algorithm to reduce computation complexity while maintaining the performance of the coding. Although H.264/scalable video coding (SVC) achieves high scalability and coding efficiency, it also has high complexity in implementing its exhaustive computation. In this paper, a novel algorithm is proposed to reduce the redundant

    Updated: 2020-09-23
  • Frame-wise Cross-modal Match for Video Moment Retrieval
    arXiv.cs.MM Pub Date : 2020-09-22
    Haoyu Tang; Jihua Zhu; Meng Liu; Zan Gao; Zhiyong Cheng

    Video moment retrieval aims to retrieve a golden moment in a video for a given natural language query. The main challenges of this task include 1) the requirement of accurately localizing (i.e., the start time and the end time of) the relevant moment in an untrimmed video stream, and 2) bridging the semantic gap between textual query and video contents. To tackle those problems, one mainstream

    Updated: 2020-09-23
  • Visual Methods for Sign Language Recognition: A Modality-Based Review
    arXiv.cs.MM Pub Date : 2020-09-22
    Bassem Seddik; Najoua Essoukri Ben Amara

    Sign language visual recognition from continuous multi-modal streams is still one of the most challenging fields. Recent advances in human actions recognition are exploiting the ascension of GPU-based learning from massive data, and are getting closer to human-like performances. They are then prone to creating interactive services for the deaf and hearing-impaired communities. A population that is

    Updated: 2020-09-23
  • PodSumm -- Podcast Audio Summarization
    arXiv.cs.MM Pub Date : 2020-09-22
    Aneesh Vartakavi; Amanmeet Garg

    The diverse nature, scale, and specificity of podcasts present a unique challenge to content discovery systems. Listeners often rely on text descriptions of episodes provided by the podcast creators to discover new content. Some factors like the presentation style of the narrator and production quality are significant indicators of subjective user preference but are difficult to quantify and not reflected

    Updated: 2020-09-23
  • An enhanced performance for H.265/SHVC based on combined AEGBM3D filter and back-propagation neural network
    arXiv.cs.MM Pub Date : 2020-09-20
    L. Balaji; K. K. Thyagharajan

    This paper deals with the latest video coding standard H.265/SHVC, a scalable extension to High Efficiency Video Coding (HEVC). HEVC introduces new coding tools compared to its predecessor and is backward compatible with all types of electronic gadgets. The gadgets with different display capabilities cannot be offered the same quality video due to the constraints in transmission bandwidth is a major

    Updated: 2020-09-22
  • Features based Mammogram Image Classification using Weighted Feature Support Vector Machine
    arXiv.cs.MM Pub Date : 2020-09-19
    S. Kavitha; K. K. Thyagharajan

    In the existing research of mammogram image classification, either clinical data or image features of a specific type is considered along with the supervised classifiers such as Neural Network (NN) and Support Vector Machine (SVM). This paper considers automated classification of breast tissue type as benign or malignant using Weighted Feature Support Vector Machine (WFSVM) through constructing the

    Updated: 2020-09-22
  • Temporally Guided Music-to-Body-Movement Generation
    arXiv.cs.MM Pub Date : 2020-09-17
    Hsuan-Kai Kao; Li Su

    This paper presents a neural network model to generate virtual violinist's 3-D skeleton movements from music audio. Improved from the conventional recurrent neural network models for generating 2-D skeleton data in previous works, the proposed model incorporates an encoder-decoder architecture, as well as the self-attention mechanism to model the complicated dynamics in body movement sequences. To

    Updated: 2020-09-20
  • A Multimodal Memes Classification: A Survey and Open Research Issues
    arXiv.cs.MM Pub Date : 2020-09-17
    Tariq Habib Afridi; Aftab Alam; Muhammad Numan Khan; Jawad Khan; Young-Koo Lee

    Memes are graphics and text overlapped so that together they present concepts that become dubious if one of them is absent. They are spread mostly on social media platforms, in the form of jokes, sarcasm, motivational messages, etc. After the success of BERT in Natural Language Processing (NLP), researchers have turned to Visual-Linguistic (VL) multimodal problems like memes classification, image captioning, Visual

    Updated: 2020-09-20
  • Crossing You in Style: Cross-modal Style Transfer from Music to Visual Arts
    arXiv.cs.MM Pub Date : 2020-09-17
    Cheng-Che Lee; Wan-Yi Lin; Yen-Ting Shih; Pei-Yi Patricia Kuo; Li Su

    Music-to-visual style transfer is a challenging yet important cross-modal learning problem in the practice of creativity. Its major difference from the traditional image style transfer problem is that the style information is provided by music rather than images. Assuming that musical features can be properly mapped to visual contents through semantic links between the two domains, we solve the music-to-visual

    Updated: 2020-09-20
  • Word Segmentation from Unconstrained Handwritten Bangla Document Images using Distance Transform
    arXiv.cs.MM Pub Date : 2020-09-17
    Pawan Kumar Singh; Shubham Sinha; Sagnik Pal Chowdhury; Ram Sarkar; Mita Nasipuri

    Segmentation of handwritten document images into text lines and words is one of the most significant and challenging tasks in the development of a complete Optical Character Recognition (OCR) system. This paper addresses the automatic segmentation of text words directly from unconstrained Bangla handwritten document images. The popular Distance transform (DT) algorithm is applied for locating the outer

    Updated: 2020-09-20
  • Using Sensory Time-cue to enable Unsupervised Multimodal Meta-learning
    arXiv.cs.MM Pub Date : 2020-09-16
    Qiong Liu; Yanxia Zhang

    As data from IoT (Internet of Things) sensors become ubiquitous, state-of-the-art machine learning algorithms face many challenges on directly using sensor data. To overcome these challenges, methods must be designed to learn directly from sensors without manual annotations. This paper introduces Sensory Time-cue for Unsupervised Meta-learning (STUM). Different from traditional learning approaches

    Updated: 2020-09-20
  • A Human-Computer Duet System for Music Performance
    arXiv.cs.MM Pub Date : 2020-09-16
    Yuen-Jen Lin; Hsuan-Kai Kao; Yih-Chih Tseng; Ming Tsai; Li Su

    Virtual musicians have become a remarkable phenomenon in the contemporary multimedia arts. However, most of the virtual musicians nowadays have not been endowed with abilities to create their own behaviors, or to perform music with human musicians. In this paper, we firstly create a virtual violinist, who can collaborate with a human pianist to perform chamber music automatically without any intervention

    Updated: 2020-09-18
  • Exploring Speech Cues in Web-mined COVID-19 Conversational Vlogs
    arXiv.cs.MM Pub Date : 2020-09-16
    Kexin Feng; Preeti Zanwar; Amir H. Behzadan; Theodora Chaspari

    The COVID-19 pandemic caused by the novel SARS-Coronavirus-2 (n-SARS-CoV-2) has impacted people's lives in unprecedented ways. During the time of the pandemic, social vloggers have used social media to actively share their opinions or experiences in quarantine. This paper collected videos from YouTube to track emotional responses in conversational vlogs and their potential associations with events

    Updated: 2020-09-18
  • ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit
    arXiv.cs.MM Pub Date : 2020-09-16
    Zijie Ye; Haozhe Wu; Jia Jia; Yaohua Bu; Wei Chen; Fanbo Meng; Yanfeng Wang

    Dance and music are two highly correlated artistic forms. Synthesizing dance motions has attracted much attention recently. Most previous works conduct music-to-dance synthesis via directly music to human skeleton keypoints mapping. Meanwhile, human choreographers design dance motions from music in a two-stage manner: they firstly devise multiple choreographic dance units (CAUs), each with a series

    Updated: 2020-09-18
  • Helping Users Tackle Algorithmic Threats on Social Media: A Multimedia Research Agenda
    arXiv.cs.MM Pub Date : 2020-08-26
    Christian von der Weth; Ashraf Abdul; Shaojing Fan; Mohan Kankanhalli

    Participation on social media platforms has many benefits but also poses substantial threats. Users often face an unintended loss of privacy, are bombarded with mis-/disinformation, or are trapped in filter bubbles due to over-personalized content. These threats are further exacerbated by the rise of hidden AI-driven algorithms working behind the scenes to shape users' thoughts, attitudes, and behavior

    Updated: 2020-09-18
  • SLGAN: Style- and Latent-guided Generative Adversarial Network for Desirable Makeup Transfer and Removal
    arXiv.cs.MM Pub Date : 2020-09-16
    Daichi Horita; Kiyoharu Aizawa

    There are five features to consider when using generative adversarial networks to apply makeup to photos of the human face. These features include (1) facial components, (2) interactive color adjustments, (3) makeup variations, (4) robustness to poses and expressions, and the (5) use of multiple reference images. Several related works have been proposed, mainly using generative adversarial networks

    Updated: 2020-09-18
  • CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation
    arXiv.cs.MM Pub Date : 2020-09-16
    Jing Yu; Yuan Chai; Yue Hu; Qi Wu

    Scene graphs are semantic abstractions of images that encourage visual understanding and reasoning. However, the performance of Scene Graph Generation (SGG) is unsatisfactory when faced with biased data in real-world scenarios. Conventional debiasing research mainly studies from the view of data representation, e.g. balancing data distribution or learning unbiased models and representations, ignoring

    Updated: 2020-09-18
  • A Convolutional LSTM based Residual Network for Deepfake Video Detection
    arXiv.cs.MM Pub Date : 2020-09-16
    Shahroz Tariq; Sangyup Lee; Simon S. Woo

    In recent years, deep learning-based video manipulation methods have become widely accessible to the masses. With little to no effort, people can easily learn how to generate deepfake videos with only a few victims or target images. This creates a significant social problem for everyone whose photos are publicly available on the Internet, especially on social media websites. Several deep learning-based

    Updated: 2020-09-18
  • Themes Inferred Audio-visual Correspondence Learning
    arXiv.cs.MM Pub Date : 2020-09-14
    Runze Su; Fei Tao; Xudong Liu; Haoran Wei; Xiaorong Mei; Zhiyao Duan; Lei Yuan; Ji Liu; Yuying Xie

    The applications of short-term user generated video (UGV), such as Snapchat and YouTube short-term videos, have boomed recently, raising lots of multimodal machine learning tasks. Among them, learning the correspondence between audio and visual information from videos is a challenging one. Most previous work on audio-visual correspondence (AVC) learning only investigated constrained videos or simple settings

    Updated: 2020-09-15
  • DualLip: A System for Joint Lip Reading and Generation
    arXiv.cs.MM Pub Date : 2020-09-12
    Weicong Chen; Xu Tan; Yingce Xia; Tao Qin; Yu Wang; Tie-Yan Liu

    Lip reading aims to recognize text from talking lip, while lip generation aims to synthesize talking lip according to text, which is a key component in talking face generation and is a dual task of lip reading. In this paper, we develop DualLip, a system that jointly improves lip reading and generation by leveraging the task duality and using unlabeled text and lip video data. The key ideas of the

    Updated: 2020-09-15
  • A Review of Visual Descriptors and Classification Techniques Used in Leaf Species Identification
    arXiv.cs.MM Pub Date : 2020-09-13
    K. K. Thyagharajan; I. Kiruba Raji

    Plants are fundamentally important to life. Key research areas in plant science include plant species identification, weed classification using hyper spectral images, monitoring plant health and tracing leaf growth, and the semantic interpretation of leaf information. Botanists easily identify plant species by discriminating between the shape of the leaf, tip, base, leaf margin and leaf vein, as well

    Updated: 2020-09-15
  • Attention Cube Network for Image Restoration
    arXiv.cs.MM Pub Date : 2020-09-13
    Yucheng Hang; Qingmin Liao; Wenming Yang; Yupeng Chen; Jie Zhou

    Recently, deep convolutional neural networks (CNNs) have been widely used in image restoration and obtained great success. However, most existing methods are limited to a local receptive field and equal treatment of different types of information. Besides, existing methods always use a multi-supervised method to aggregate different feature maps, which cannot effectively aggregate hierarchical feature

    Updated: 2020-09-15
  • Micro-Facial Expression Recognition Based on Deep-Rooted Learning Algorithm
    arXiv.cs.MM Pub Date : 2020-09-12
    S. D. Lalitha; K. K. Thyagharajan

    Facial expressions are important cues to observe human emotions. Facial expression recognition has attracted many researchers for years, but it is still a challenging topic since expression features vary greatly with the head poses, environments, and variations in the different persons involved. In this work, three major steps are involved to improve the performance of micro-facial expression recognition

    Updated: 2020-09-15
  • RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
    arXiv.cs.MM Pub Date : 2020-09-12
    Niluthpol Chowdhury Mithun; Karan Sikka; Han-Pang Chiu; Supun Samarasekera; Rakesh Kumar

    We study an important, yet largely unexplored problem of large-scale cross-modal visual localization by matching ground RGB images to a geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior works were demonstrated on small datasets and did not lend themselves to scaling up for large-scale applications. To enable large-scale evaluation, we introduce a new dataset containing over

    Updated: 2020-09-15
  • Hybrid Space Learning for Language-based Video Retrieval
    arXiv.cs.MM Pub Date : 2020-09-10
    Jianfeng Dong; Xirong Li; Chaoxi Xu; Gang Yang; Xun Wang

    This paper attacks the challenging problem of video retrieval by text. In such a retrieval paradigm, an end user searches for unlabeled videos by ad-hoc queries described exclusively in the form of a natural-language sentence, with no visual example provided. Given videos as sequences of frames and queries as sequences of words, an effective sequence-to-sequence cross-modal matching is crucial. To

    Updated: 2020-09-14
  • OCR Graph Features for Manipulation Detection in Documents
    arXiv.cs.MM Pub Date : 2020-09-10
    Hailey James; Otkrist Gupta; Dan Raviv

    Detecting manipulations in digital documents is becoming increasingly important for information verification purposes. Due to the proliferation of image editing software, altering key information in documents has become widely accessible. Nearly all approaches in this domain rely on a procedural approach, using carefully generated features and a hand-tuned scoring system, rather than a data-driven

    Updated: 2020-09-14
  • Emotion-Based End-to-End Matching Between Image and Music in Valence-Arousal Space
    arXiv.cs.MM Pub Date : 2020-08-22
    Sicheng Zhao; Yaxian Li; Xingxu Yao; Weizhi Nie; Pengfei Xu; Jufeng Yang; Kurt Keutzer

    Both images and music can convey rich semantics and are widely used to induce specific emotions. Matching images and music with similar emotions might help to make emotion perceptions more vivid and stronger. Existing emotion-based image and music matching methods either employ limited categorical emotion states which cannot well reflect the complexity and subtlety of emotions, or train the matching

    Updated: 2020-09-14
  • Key-Point Sequence Lossless Compression for Intelligent Video Analysis
    arXiv.cs.MM Pub Date : 2020-09-10
    Weiyao Lin; Xiaoyi He; Wenrui Dai; John See; Tushar Shinde; Hongkai Xiong; Lingyu Duan

    Feature coding has been recently considered to facilitate intelligent video analysis for urban computing. Instead of raw videos, extracted features in the front-end are encoded and transmitted to the back-end for further processing. In this article, we present a lossless key-point sequence compression approach for efficient feature coding. The essence of this predict-and-encode strategy is to eliminate

    Updated: 2020-09-11
  • Multi-modal Attention for Speech Emotion Recognition
    arXiv.cs.MM Pub Date : 2020-09-09
    Zexu Pan; Zhaojie Luo; Jichen Yang; Haizhou Li

    Emotion represents an essential aspect of human speech that is manifested in speech prosody. Speech, visual, and textual cues are complementary in human communication. In this paper, we study a hybrid fusion method, referred to as multi-modal attention network (MMAN) to make use of visual and textual cues in speech emotion recognition. We propose a novel multi-modal attention mechanism, cLSTM-MMA,

    Updated: 2020-09-10
  • An optimal mode selection algorithm for scalable video coding
    arXiv.cs.MM Pub Date : 2020-09-08
    L. Balaji; K. K. Thyagharajan; C. Raja; A. Dhanalakshmi

    Scalable video coding (SVC) is extended from its predecessor advanced video coding (AVC) because of its flexible transmission to all types of gadgets. Although SVC is more flexible and scalable than AVC, it is also more computationally complex than AVC. The traditional full search method in the standard H.264 SVC consumes more encoding time for computation. This complexity in computation

    Updated: 2020-09-10
  • Deep Local and Global Spatiotemporal Feature Aggregation for Blind Video Quality Assessment
    arXiv.cs.MM Pub Date : 2020-09-07
    Wei Zhou; Zhibo Chen

    In recent years, deep learning has achieved promising success for multimedia quality assessment, especially for image quality assessment (IQA). However, since there exist more complex temporal characteristics in videos, very little work has been done on video quality assessment (VQA) by exploiting powerful deep convolutional neural networks (DCNNs). In this paper, we propose an efficient VQA method

    Updated: 2020-09-10
  • User-assisted Video Reflection Removal
    arXiv.cs.MM Pub Date : 2020-09-07
    Amgad Ahmed; Suhong Kim; Mohamed Elgharib; Mohamed Hefeeda

    Reflections in videos are obstructions that often occur when videos are taken behind reflective surfaces like glass. These reflections reduce the quality of such videos, lead to information loss and degrade the accuracy of many computer vision algorithms. A video containing reflections is a combination of background and reflection layers. Thus, reflection removal is equivalent to decomposing the video

    Updated: 2020-09-08
  • Deepfake detection: humans vs. machines
    arXiv.cs.MM Pub Date : 2020-09-07
    Pavel Korshunov; Sébastien Marcel

    Deepfake videos, where a person's face is automatically swapped with a face of someone else, are becoming easier to generate with more realistic results. In response to the threat such manipulations can pose to our trust in video evidence, several large datasets of deepfake videos and many methods to detect them were proposed recently. However, it is still unclear how realistic deepfake videos are

    Updated: 2020-09-08
  • A Convolutional Neural Network-Based Low Complexity Filter
    arXiv.cs.MM Pub Date : 2020-09-06
    Chao Liu; Heming Sun; Jiro Katto; Xiaoyang Zeng; Yibo Fan

    Convolutional Neural Network (CNN)-based filters have achieved significant performance in video artifacts reduction. However, the high complexity of existing methods makes it difficult to be applied in real usage. In this paper, a CNN-based low complexity filter is proposed. We utilize depth separable convolution (DSC) merged with the batch normalization (BN) as the backbone of our proposed CNN-based

    Updated: 2020-09-08
  • Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching
    arXiv.cs.MM Pub Date : 2020-09-05
    Jingjun Liang; Ruichen Li; Qin Jin

    Automatic emotion recognition is an active research topic with a wide range of applications. Due to the high manual annotation cost and inevitable label ambiguity, the development of emotion recognition datasets is limited in both scale and quality. Therefore, one of the key challenges is how to build effective models with limited data resources. Previous works have explored different approaches to tackle

    Updated: 2020-09-08
  • Visual Sentiment Analysis from Disaster Images in Social Media
    arXiv.cs.MM Pub Date : 2020-09-04
    Syed Zohaib Hassan; Kashif Ahmad; Steven Hicks; Paal Halvorsen; Ala Al-Fuqaha; Nicola Conci; Michael Riegler

    The increasing popularity of social networks and users' tendency towards sharing their feelings, expressions, and opinions in text, visual, and audio content, have opened new opportunities and challenges in sentiment analysis. While sentiment analysis of text streams has been widely explored in literature, sentiment analysis from images and videos is relatively new. This article focuses on visual sentiment

    Updated: 2020-09-08
  • Dynamic Context-guided Capsule Network for Multimodal Machine Translation
    arXiv.cs.MM Pub Date : 2020-09-04
    Huan Lin; Fandong Meng; Jinsong Su; Yongjing Yin; Zhengyuan Yang; Yubin Ge; Jie Zhou; Jiebo Luo

    Multimodal machine translation (MMT), which mainly focuses on enhancing text-only translation with visual features, has attracted considerable attention from both computer vision and natural language processing communities. Most current MMT models resort to attention mechanism, global context modeling or multimodal joint representation learning to utilize visual features. However, the attention mechanism

    Updated: 2020-09-08
  • Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics
    arXiv.cs.MM Pub Date : 2020-09-03
    Arun K. Singh (Indian Institute of Technology Jammu); Priyanka Singh (Dhirubhai Ambani Institute of Information and Communication Technology)

    Digital technology has made previously unimaginable applications possible. It seems exciting to have a handful of tools for easy editing and manipulation, but it raises alarming concerns that can propagate as speech clones, duplicates, or maybe deep fakes. Validating the authenticity of a speech is one of the primary problems of digital audio forensics. We propose an approach to distinguish human speech

    Updated: 2020-09-08
  • Robust Homomorphic Video Hashing
    arXiv.cs.MM Pub Date : 2020-09-03
    Priyanka Singh

    The Internet has been weaponized to carry out cybercriminal activities at an unprecedented pace. The rising concerns for preserving the privacy of personal data while availing modern tools and technologies is alarming. End-to-end encrypted solutions are in demand for almost all commercial platforms. On one side, it seems imperative to provide such solutions and give people trust to reliably use these

    Updated: 2020-09-08
  • Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
    arXiv.cs.MM Pub Date : 2020-09-03
    Long Chen; Wenbo Ma; Jun Xiao; Hanwang Zhang; Wei Liu; Shih-Fu Chang

    The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals. Existing two-stage solutions mostly focus on the grounding step, which aims to align the expressions with the proposals. In this paper, we argue that these methods overlook an obvious mismatch between

    Updated: 2020-09-05
  • Embedded Blockchains: A Synthesis of Blockchains, Spread Spectrum Watermarking, Perceptual Hashing & Digital Signatures
    arXiv.cs.MM Pub Date : 2020-09-02
    Sam Blake

    In this paper we introduce a scheme for detecting manipulated audio and video. The scheme is a synthesis of blockchains, encrypted spread spectrum watermarks, perceptual hashing and digital signatures, which we call an Embedded Blockchain. Within this scheme, we use the blockchain for its data structure of a cryptographically linked list, cryptographic hashing for absolute comparisons, perceptual hashing

    Updated: 2020-09-03
  • Depth Range Reduction for 3D Range Geometry Compression
    arXiv.cs.MM Pub Date : 2020-09-02
    Matthew G. Finley; Tyler Bell

    Three-dimensional (3D) shape measurement devices and techniques are being rapidly adopted within a variety of industries and applications. As acquiring 3D range data becomes faster and more accurate it becomes more challenging to efficiently store, transmit, or stream this data. One prevailing approach to compressing 3D range data is to encode it within the color channels of regular 2D images. This

    Updated: 2020-09-03
  • Unsupervised Single-Image Reflection Separation Using Perceptual Deep Image Priors
    arXiv.cs.MM Pub Date : 2020-09-01
    Suhong Kim; Hamed RahmaniKhezri; Seyed Mohammad Nourbakhsh; Mohamed Hefeeda

    Reflections often degrade the quality of the image by obstructing the background scene. This is not desirable for everyday users, and it negatively impacts the performance of multimedia applications that process images with reflections. Most current methods for removing reflections utilize supervised-learning models. However, these models require an extensive number of image pairs to perform well,

    Updated: 2020-09-03
  • Online Multi-Object Tracking and Segmentation with GMPHD Filter and Simple Affinity Fusion
    arXiv.cs.MM Pub Date : 2020-08-31
    Young-min Song; Moongu Jeon

    In this paper, we propose a highly practical fully online multi-object tracking and segmentation (MOTS) method that uses instance segmentation results as an input in video. The proposed method exploits the Gaussian mixture probability hypothesis density (GMPHD) filter for online approach which is extended with a hierarchical data association (HDA) and a simple affinity fusion (SAF) model. HDA consists

    Updated: 2020-09-02
  • Personal Food Model
    arXiv.cs.MM Pub Date : 2020-08-28
    Ali Rostami; Vaibhav Pandey; Nitish Nag; Vesper Wang; Ramesh Jain

    Food is central to life. Food provides us with energy and foundational building blocks for our body and is also a major source of joy and new experiences. A significant part of the overall economy is related to food. Food science, distribution, processing, and consumption have been addressed by different communities using silos of computational approaches. In this paper, we adopt a person-centric multimedia

    Updated: 2020-09-01
  • Semantics Preserving Hierarchy based Retrieval of Indian heritage monuments
    arXiv.cs.MM Pub Date : 2020-08-28
    Ronak Gupta; Prerana Mukherjee; Brejesh Lall; Varshul Gupta

    Monument classification can be performed on the basis of their appearance and shape from coarse to fine categories. However, there is much semantic information present in the monuments, which is reflected in the eras they were built, their type or purpose, the dynasty which established them, etc. In particular, the Indian subcontinent exhibits a great deal of variation in terms of architectural styles owing to

    Updated: 2020-09-01
  • Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset for Personality Assessment
    arXiv.cs.MM Pub Date : 2020-08-31
    Shahid Nawaz Khan; Maitree Leekha; Jainendra Shukla; Rajiv Ratn Shah

    Automatically detecting personality traits can aid several applications, such as mental health recognition and human resource management. Most datasets introduced for personality detection so far have analyzed these traits for each individual in isolation. However, personality is intimately linked to our social behavior. Furthermore, surprisingly little research has focused on personality analysis

    Updated: 2020-09-01
  • Augmented Reality-Based Advanced Driver-Assistance System for Connected Vehicles
    arXiv.cs.MM Pub Date : 2020-08-31
    Ziran Wang; Kyungtae Han; Prashant Tiwari

    With the development of advanced communication technology, connected vehicles become increasingly popular in our transportation systems, which can conduct cooperative maneuvers with each other as well as road entities through vehicle-to-everything communication. A lot of research interests have been drawn to other building blocks of a connected vehicle system, such as communication, planning, and control

    Updated: 2020-09-01
  • Joint Transmission in QoE-Driven Backhaul-Aware MC-NOMA Cognitive Radio Network
    arXiv.cs.MM Pub Date : 2020-08-30
    Hosein Zarini; Ata Khalili; Hina Tabassum; Mehdi Rasti

    In this paper, we develop a resource allocation framework to optimize the downlink transmission of a backhaul-aware multi-cell cognitive radio network (CRN) which is enabled with multi-carrier non-orthogonal multiple access (MC-NOMA). The considered CRN is composed of a single macro base station (MBS) and multiple small BSs (SBSs) that are referred to as the primary and secondary tiers, respectively

    Updated: 2020-09-01
  • Dual Attention GANs for Semantic Image Synthesis
    arXiv.cs.MM Pub Date : 2020-08-29
    Hao Tang; Song Bai; Nicu Sebe

    In this paper, we focus on the semantic image synthesis task that aims at transferring semantic label maps to photo-realistic images. Existing methods lack effective semantic constraints to preserve the semantic information and ignore the structural correlations in both spatial and channel dimensions, leading to unsatisfactory blurry and artifact-prone results. To address these limitations, we propose

    Updated: 2020-09-01
  • Rate distortion optimization over large scale video corpus with machine learning
    arXiv.cs.MM Pub Date : 2020-08-27
    Sam John; Akshay Gadde; Balu Adsumilli

    We present an efficient codec-agnostic method for bitrate allocation over a large scale video corpus with the goal of minimizing the average bitrate subject to constraints on average and minimum quality. Our method clusters the videos in the corpus such that videos within one cluster have similar rate-distortion (R-D) characteristics. We train a support vector machine classifier to predict the R-D

    Updated: 2020-08-31
  • Quality of Service (QoS): Measurements of Video Streaming
    arXiv.cs.MM Pub Date : 2020-08-27
    Sajida Karim; Hui He; Asif Ali Laghari; Hina Madiha

    Nowadays, video streaming is growing on social clouds, where end users want to share High Definition (HD) videos with friends. Most videos are recorded with smartphones and other HD devices, and even short videos have large file sizes. Large video files require high bandwidth to upload and download over the Internet and take more time to load in a web page for playback. So

    Updated: 2020-08-28
  • Multi-task deep CNN model for no-reference image quality assessment on smartphone camera photos
    arXiv.cs.MM Pub Date : 2020-08-27
    Chen-Hsiu Huang; Ja-Ling Wu

    The smartphone is the most successful consumer electronic product in today's mobile social network era. Smartphone camera quality and image post-processing capability are the dominant factors in consumers' buying decisions. However, evaluating the quality of photos taken with smartphones remains labor-intensive work that relies on professional photographers and experts. As an extension

    Updated: 2020-08-28
  • High Efficiency Rate Control for Versatile Video Coding Based on Composite Cauchy Distribution
    arXiv.cs.MM Pub Date : 2020-08-26
    Yunhao Mao; Meng Wang; Shiqi Wang; Sam Kwong

    In this work, we propose a novel rate control algorithm for the Versatile Video Coding (VVC) standard based on its distinct rate-distortion characteristics. By modelling the transform coefficients with the composite Cauchy distribution, higher accuracy is achieved than with traditional distributions. Based on the transform coefficient modelling, the theoretically derived R-Q and D-Q models which

    Updated: 2020-08-27
  • Low Complexity Trellis-Coded Quantization in Versatile Video Coding
    arXiv.cs.MM Pub Date : 2020-08-26
    Meng Wang; Shiqi Wang; Junru Li; Li Zhang; Yue Wang; Siwei Ma; Sam Kwong

    The forthcoming Versatile Video Coding (VVC) standard adopts the trellis-coded quantization, which leverages the delicate trellis graph to map the quantization candidates within one block into the optimal path. Despite the high compression efficiency, the complex trellis search with soft decision quantization may hinder the applications due to high complexity and low throughput capacity. To reduce

    Updated: 2020-08-27
  • ByeGlassesGAN: Identity Preserving Eyeglasses Removal for Face Images
    arXiv.cs.MM Pub Date : 2020-08-25
    Yu-Hui Lee; Shang-Hong Lai

    In this paper, we propose a novel image-to-image GAN framework for eyeglasses removal, called ByeGlassesGAN, which is used to automatically detect the position of eyeglasses and then remove them from face images. Our ByeGlassesGAN consists of an encoder, a face decoder, and a segmentation decoder. The encoder is responsible for extracting information from the source face image, and the face decoder

    Updated: 2020-08-26
Contents have been reproduced by permission of the publishers.