Current journal: arXiv - CS - Multimedia
  • QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)
    arXiv.cs.MM Pub Date : 2020-06-10
    Andrew Perkis; Christian Timmerer; Sabina Baraković; Jasmina Baraković Husić; Søren Bech; Sebastian Bosse; Jean Botev; Kjell Brunnström; Luis Cruz; Katrien De Moor; Andrea de Polo Saibanti; Wouter Durnez; Sebastian Egger-Lampl; Ulrich Engelke; Tiago H. Falk; Asim Hameed; Andrew Hines; Tanja Kojic; Dragan Kukolj; Eirini Liotou; Dragorad Milovanovic; Sebastian Möller; Niall Murray; Babak Naderi; Manuela

    With the coming of age of virtual/augmented reality and interactive media, numerous definitions, frameworks, and models of immersion have emerged across different fields ranging from computer graphics to literary works. Immersion is oftentimes used interchangeably with presence as both concepts are closely related. However, there are noticeable interdisciplinary differences regarding definitions, scope

    Updated: 2020-07-15
  • Transformer-XL Based Music Generation with Multiple Sequences of Time-valued Notes
    arXiv.cs.MM Pub Date : 2020-07-11
    Xianchao Wu; Chengyuan Wang; Qinying Lei

    Current state-of-the-art AI based classical music creation algorithms such as Music Transformer are trained by employing single sequence of notes with time-shifts. The major drawback of absolute time interval expression is the difficulty of similarity computing of notes that share the same note value yet different tempos, in one or among MIDI files. In addition, the usage of single sequence restricts

    Updated: 2020-07-15
  • MFRNet: A New CNN Architecture for Post-Processing and In-loop Filtering
    arXiv.cs.MM Pub Date : 2020-07-14
    Di Ma; Fan Zhang; David R. Bull

    In this paper, we propose a novel convolutional neural network (CNN) architecture, MFRNet, for post-processing (PP) and in-loop filtering (ILF) in the context of video compression. This network consists of four Multi-level Feature review Residual dense Blocks (MFRBs), which are connected using a cascading structure. Each MFRB extracts features from multiple convolutional layers using dense connections

    Updated: 2020-07-15
  • Knowledge Graph Driven Approach to Represent Video Streams for Spatiotemporal Event Pattern Matching in Complex Event Processing
    arXiv.cs.MM Pub Date : 2020-07-13
    Piyush Yadav; Dhaval Salwala; Edward Curry

    Complex Event Processing (CEP) is an event processing paradigm to perform real-time analytics over streaming data and match high-level event patterns. Presently, CEP is limited to process structured data stream. Video streams are complicated due to their unstructured data model and limit CEP systems to perform matching over them. This work introduces a graph-based structure for continuous evolving

    Updated: 2020-07-14
  • Fast Griffin Lim based Waveform Generation Strategy for Text-to-Speech Synthesis
    arXiv.cs.MM Pub Date : 2020-07-11
    Ankit Sharma; Puneet Kumar; Vikas Maddukuri; Nagasai Madamshettib; Kishore KG; Sahit Sai Sriram Kavurub; Balasubramanian Raman; Partha Pratim Roy

    The performance of text-to-speech (TTS) systems heavily depends on spectrogram to waveform generation, also known as the speech reconstruction phase. The time required for the same is known as synthesis delay. In this paper, an approach to reduce speech synthesis delay has been proposed. It aims to enhance the TTS systems for real-time applications such as digital assistants, mobile phones, embedded

    Updated: 2020-07-14
  • $\ell_1$SABMIS: $\ell_1$-minimization and sparse approximation based blind multi-image steganography scheme
    arXiv.cs.MM Pub Date : 2020-07-09
    Rohit Agrawal

    Steganography plays a vital role in achieving secret data security by embedding it into cover media. The cover media and the secret data can be text or multimedia, such as images, videos, etc. In this paper, we propose a novel $\ell_1$-minimization and sparse approximation based blind multi-image steganography scheme, termed $\ell_1$SABMIS. By using $\ell_1$SABMIS, multiple secret images can be hidden

    Updated: 2020-07-13
  • Multi-task Regularization Based on Infrequent Classes for Audio Captioning
    arXiv.cs.MM Pub Date : 2020-07-09
    Emre Çakır; Konstantinos Drossos; Tuomas Virtanen

    Audio captioning is a multi-modal task, focusing on using natural language for describing the contents of general audio. Most audio captioning methods are based on deep neural networks, employing an encoder-decoder scheme and a dataset with audio clips and corresponding natural language descriptions (i.e. captions). A significant challenge for audio captioning is the distribution of words in the captions:

    Updated: 2020-07-10
  • Reversible Data Hiding in Encrypted Images Based on Bit plane Compression of Prediction Error
    arXiv.cs.MM Pub Date : 2020-07-08
    Youqing Wu; Wenjing Ma; Yinyin Peng; Ruiling Zhang; Zhaoxia Yin

    As a technology that can protect the information on the original image of being disclosed and accurately extract the embedded information, the reversible data hiding in encrypted images (RDHEI) has been widely concerned by researchers. One of the current challenges is how to further improve the performance of the RDHEI method. In this paper, a high-capacity RDHEI method based on bit plane compression

    Updated: 2020-07-09
  • Reversible data hiding in encrypted images based on pixel prediction and multi-MSB planes rearrangement
    arXiv.cs.MM Pub Date : 2020-07-08
    Zhaoxia Yin; Xiaomeng She; Jin Tang; Bin Luo

    Great concern has arisen in the field of reversible data hiding in encrypted images (RDHEI) due to the development of cloud storage and privacy protection. RDHEI is an effective technology that can embed additional data after image encryption, extract additional data without any errors and reconstruct original images losslessly. In this paper, a high-capacity and fully reversible data hiding in encrypted

    Updated: 2020-07-09
  • Real-time Semantic Segmentation with Fast Attention
    arXiv.cs.MM Pub Date : 2020-07-07
    Ping Hu; Federico Perazzi; Fabian Caba Heilbron; Oliver Wang; Zhe Lin; Kate Saenko; Stan Sclaroff

    In deep CNN based models for semantic segmentation, high accuracy relies on rich spatial context (large receptive fields) and fine spatial details (high resolution), both of which incur high computational costs. In this paper, we propose a novel architecture that addresses both challenges and achieves state-of-the-art performance for semantic segmentation of high-resolution images and videos in real-time

    Updated: 2020-07-09
  • Cost-Efficient Storage for On-Demand Video Streaming on Cloud
    arXiv.cs.MM Pub Date : 2020-07-07
    Mahmoud Darwich; Yasser Ismail; Talal Darwich; Magdy Bayoumi

    Video stream is converted to several formats to support the user's device, this conversion process is called video transcoding, which imposes high storage and powerful resources. With emerging of cloud technology, video stream companies adopted to process video on the cloud. Generally, many formats of the same video are made (pre-transcoded) and streamed to the adequate user's device. However, pre-transcoding

    Updated: 2020-07-08
  • Smartphone-based Wellness Assessment Using Mobile Environmental Sensor
    arXiv.cs.MM Pub Date : 2020-07-01
    Katherine McLeod; Petros Spachos; Konstantinos Plataniotis

    Mental health and general wellness are becoming a growing concern in our society. Environmental factors contribute to mental illness and have the power to affect a person's wellness. This work presents a smartphone-based wellness assessment system and examines if there is any correlation with one's environment and their wellness. The introduced system was initiated in response to a growing need for

    Updated: 2020-07-08
  • An Automated and Robust Image Watermarking Scheme Based on Deep Neural Networks
    arXiv.cs.MM Pub Date : 2020-07-05
    Xin Zhong; Pei-Chi Huang; Spyridon Mastorakis; Frank Y. Shih

    Digital image watermarking is the process of embedding and extracting a watermark covertly on a cover-image. To dynamically adapt image watermarking algorithms, deep learning-based image watermarking schemes have attracted increased attention during recent years. However, existing deep learning-based watermarking methods neither fully apply the fitting ability to learn and automate the embedding and

    Updated: 2020-07-07
  • Deep Convolutional Neural Network for Identifying Seam-Carving Forgery
    arXiv.cs.MM Pub Date : 2020-07-05
    Seung-Hun Nam; Wonhyuk Ahn; In-Jae Yu; Myung-Joon Kwon; Minseok Son; Heung-Kyu Lee

    Seam carving is a representative content-aware image retargeting approach to adjust the size of an image while preserving its visually prominent content. To maintain visually important content, seam-carving algorithms first calculate the connected path of pixels, referred to as the seam, according to a defined cost function and then adjust the size of an image by removing and duplicating repeatedly

    Updated: 2020-07-07
  • Image Aesthetics Prediction Using Multiple Patches Preserving the Original Aspect Ratio of Contents
    arXiv.cs.MM Pub Date : 2020-07-05
    Lijie Wang; Xueting Wang; Toshihiko Yamasaki

    The spread of social networking services has created an increasing demand for selecting, editing, and generating impressive images. This trend increases the importance of evaluating image aesthetics as a complementary function of automatic image processing. We propose a multi-patch method, named MPA-Net (Multi-Patch Aggregation Network), to predict image aesthetics scores by maintaining the original

    Updated: 2020-07-07
  • An Integer Approximation Method for Discrete Sinusoidal Transforms
    arXiv.cs.MM Pub Date : 2020-07-05
    R. J. Cintra

    Approximate methods have been considered as a means to the evaluation of discrete transforms. In this work, we propose and analyze a class of integer transforms for the discrete Fourier, Hartley, and cosine transforms (DFT, DHT, and DCT), based on simple dyadic rational approximation methods. The introduced method is general, applicable to several block-lengths, whereas existing approaches are usually

    Updated: 2020-07-07
  • Deep Graph Random Process for Relational-Thinking-Based Speech Recognition
    arXiv.cs.MM Pub Date : 2020-07-04
    Hengguan Huang; Fuzhao Xue; Hao Wang; Ye Wang

    Lying at the core of human intelligence, relational thinking is characterized by initially relying on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge, consequently becoming a recognizable concept or object through coupling and transformation of these percepts. Such mental processes are difficult to model in real-world problems such as in conversational

    Updated: 2020-07-07
  • Quo Vadis, Skeleton Action Recognition ?
    arXiv.cs.MM Pub Date : 2020-07-04
    Pranay Gupta; Anirudh Thatipelli; Aditya Aggarwal; Shubh Maheshwari; Neel Trivedi; Sourav Das; Ravi Kiran Sarvadevabhatla

    In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. To begin with, we benchmark state-of-the-art models on the NTU-120 dataset and provide multi-layered assessment of the results. To examine skeleton action recognition 'in the wild', we introduce Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700

    Updated: 2020-07-07
  • Estimating Blink Probability for Highlight Detection in Figure Skating Videos
    arXiv.cs.MM Pub Date : 2020-07-02
    Tamami Nakano; Atsuya Sakata; Akihiro Kishimoto

    Highlight detection in sports videos has a broad viewership and huge commercial potential. It is thus imperative to detect highlight scenes more suitably for human interest with high temporal accuracy. Since people instinctively suppress blinks during attention-grabbing events and synchronously generate blinks at attention break points in videos, the instantaneous blink rate can be utilized as a highly

    Updated: 2020-07-03
  • Playback experience driven cross layer optimisation of APP, transport and MAC layer for video clients over long-term evolution system
    arXiv.cs.MM Pub Date : 2020-07-02
    Xinyu Huang; Lijun He

    In traditional communication system, information of APP (Application) layer, transport layer and MAC (Media Access Control) layer has not been fully interacted, which inevitably leads to inconsistencies among TCP congestion state, clients' requirements and resource allocation. To solve the problem, we propose a joint optimization framework, which consists of APP layer, transport layer and MAC layer, to

    Updated: 2020-07-03
  • Bayesian Low Rank Tensor Ring Model for Image Completion
    arXiv.cs.MM Pub Date : 2020-06-29
    Zhen Long; Ce Zhu; Jiani Liu; Yipeng Liu

    Low rank tensor ring model is powerful for image completion which recovers missing entries in data acquisition and transformation. The recently proposed tensor ring (TR) based completion algorithms generally solve the low rank optimization problem by alternating least squares method with predefined ranks, which may easily lead to overfitting when the unknown ranks are set too large and only a few measurements

    Updated: 2020-07-03
  • FVV Live: A real-time free-viewpoint video system with consumer electronics hardware
    arXiv.cs.MM Pub Date : 2020-07-01
    Pablo Carballeira; Carlos Carmona; César Díaz; Daniel Berjón; Daniel Corregidor; Julián Cabrera; Francisco Morán; Carmen Doblado; Sergio Arnaldo; María del Mar Martín; Narciso García

    FVV Live is a novel end-to-end free-viewpoint video system, designed for low cost and real-time operation, based on off-the-shelf components. The system has been designed to yield high-quality free-viewpoint video using consumer-grade cameras and hardware, which enables low deployment costs and easy installation for immersive event-broadcasting or videoconferencing. The paper describes the architecture

    Updated: 2020-07-02
  • FVV Live: Real-Time, Low-Cost, Free Viewpoint Video
    arXiv.cs.MM Pub Date : 2020-06-30
    Daniel Berjón; Pablo Carballeira; Julián Cabrera; Carlos Carmona; Daniel Corregidor; César Díaz; Francisco Morán; Narciso García

    FVV Live is a novel real-time, low-latency, end-to-end free viewpoint system including capture, transmission, synthesis on an edge server and visualization and control on a mobile terminal. The system has been specially designed for low-cost and real-time operation, only using off-the-shelf components.

    Updated: 2020-07-01
  • BitMix: Data Augmentation for Image Steganalysis
    arXiv.cs.MM Pub Date : 2020-06-30
    In-Jae Yu; Wonhyuk Ahn; Seung-Hun Nam; Heung-Kyu Lee

    Convolutional neural networks (CNN) for image steganalysis demonstrate better performances with employing concepts from high-level vision tasks. The major employed concept is to use data augmentation to avoid overfitting due to limited data. To augment data without damaging the message embedding, only rotating multiples of 90 degrees or horizontally flipping are used in steganalysis, which generates

    Updated: 2020-07-01
  • A Universal Framework to Construct a Huffman-Code-Mapping-based Reversible Data Hiding Scheme for JPEG Images
    arXiv.cs.MM Pub Date : 2020-06-29
    Zhaoxia Yin; Yang Du; Yuan Ji

    Huffman code mapping (HCM) is a recent technique for reversible data hiding (RDH) in JPEG images. The existing HCM-based RDH schemes cause neither file-size increment nor visual distortion for the marked JPEG image, which is the superiority compared to the RDH schemes that use other techniques, such as histogram shifting (HS). However, the embedding capacity achieved by the HCM-based RDH schemes is

    Updated: 2020-06-30
  • Chroma Intra Prediction with attention-based CNN architectures
    arXiv.cs.MM Pub Date : 2020-06-27
    Marc Górriz; Saverio Blasi; Alan F. Smeaton; Noel E. O'Connor; Marta Mrak

    Neural networks can be used in video coding to improve chroma intra-prediction. In particular, usage of fully-connected networks has enabled better cross-component prediction with respect to traditional linear models. Nonetheless, state-of-the-art architectures tend to disregard the location of individual reference samples in the prediction process. This paper proposes a new neural network architecture

    Updated: 2020-06-30
  • An Advert Creation System for 3D Product Placements
    arXiv.cs.MM Pub Date : 2020-06-26
    Ivan Bacher; Hossein Javidnia; Soumyabrata Dev; Rahul Agrahari; Murhaf Hossari; Matthew Nicholson; Clare Conran; Jian Tang; Peng Song; David Corrigan; François Pitié

    Over the past decade, the evolution of video-sharing platforms has attracted a significant amount of investments on contextual advertising. The common contextual advertising platforms utilize the information provided by users to integrate 2D visual ads into videos. The existing platforms face many technical challenges such as ad integration with respect to occluding objects and 3D ad placement. This

    Updated: 2020-06-29
  • QoE-Driven UAV-Enabled Pseudo-Analog Wireless Video Broadcast: A Joint Optimization of Power and Trajectory
    arXiv.cs.MM Pub Date : 2020-06-25
    Xiao-Wei Tang; Xin-Lin Huang; Fei Hu

    The explosive demands for high quality mobile video services have caused heavy overload to the existing cellular networks. Although the small cell has been proposed to alleviate such a problem, the network operators may not be interested in deploying numerous base stations (BSs) due to expensive infrastructure construction and maintenance. The unmanned aerial vehicles (UAVs) can provide the low-cost

    Updated: 2020-06-26
  • Fine granularity access in interactive compression of 360-degree images based on rate adaptive channel codes
    arXiv.cs.MM Pub Date : 2020-06-25
    Navid Mahmoudian Bidgoli; Thomas Maugey; Aline Roumy

    In this paper, we propose a new interactive compression scheme for omnidirectional images. This requires two characteristics: efficient compression of data, to lower the storage cost, and random access ability to extract part of the compressed stream requested by the user (for reducing the transmission rate). For efficient compression, data needs to be predicted by a series of references that have

    Updated: 2020-06-26
  • Audeo: Audio Generation for a Silent Performance Video
    arXiv.cs.MM Pub Date : 2020-06-23
    Kun Su; Xiulong Liu; Eli Shlizerman

    We present a novel system that gets as an input video frames of a musician playing the piano and generates the music for that video. Generation of music from visual cues is a challenging problem and it is not clear whether it is an attainable goal at all. Our main aim in this work is to explore the plausibility of such a transformation and to identify cues and components able to carry the association

    Updated: 2020-06-26
  • Comprehensive Information Integration Modeling Framework for Video Titling
    arXiv.cs.MM Pub Date : 2020-06-24
    Shengyu Zhang; Ziqi Tan; Jin Yu; Zhou Zhao; Kun Kuang; Tan Jiang; Jingren Zhou; Hongxia Yang; Fei Wu

    In e-commerce, consumer-generated videos, which in general deliver consumers' individual preferences for the different aspects of certain products, are massive in volume. To recommend these videos to potential consumers more effectively, diverse and catchy video titles are critical. However, consumer-generated videos seldom accompany appropriate titles. To bridge this gap, we integrate comprehensive

    Updated: 2020-06-25
  • A Study on Impacts of Multiple Factors on Video Qualify of Experience
    arXiv.cs.MM Pub Date : 2020-06-23
    Huyen T. T. Tran; Nam Pham Ngoc; Truong Cong Thang

    HTTP Adaptive Streaming (HAS) has become a cost-effective means for multimedia delivery nowadays. However, how the quality of experience (QoE) is jointly affected by 1) varying perceptual quality and 2) interruptions is not well-understood. In this paper, we present the first attempt to quantitatively quantify the relative impacts of these factors on the QoE of streaming sessions. To achieve this purpose

    Updated: 2020-06-24
  • DeepQTMT: A Deep Learning Approach for Fast QTMT-based CU Partition of Intra-mode VVC
    arXiv.cs.MM Pub Date : 2020-06-23
    Tianyi Li; Mai Xu; Runzhi Tang

    The latest standard Versatile Video Coding (VVC) significantly improves the coding efficiency over its ancestor standard High Efficiency Video Coding (HEVC), but at the expense of sharply increased complexity. In VVC, the quadtree plus multi-type tree (QTMT) structure of coding unit (CU) partition accounts for most of encoding time, due to the brute-force search for recursive rate-distortion (RD) optimization

    Updated: 2020-06-24
  • On Addressing the Impact of ISO Speed upon PRNU and Forgery Detection
    arXiv.cs.MM Pub Date : 2020-06-20
    Yijun Quan; Chang-Tsun Li

    Photo Response Non-Uniformity (PRNU) has been used as a powerful device fingerprint for image forgery detection because image forgeries can be revealed by finding the absence of the PRNU in the manipulated areas. The correlation between an image's noise residual with the device's reference PRNU is often compared with a decision threshold to check the existence of the PRNU. A PRNU correlation predictor

    Updated: 2020-06-23
  • Capturing Video Frame Rate Variations through Entropic Differencing
    arXiv.cs.MM Pub Date : 2020-06-19
    Pavan C. Madhusudana; Neil Birkbeck; Yilin Wang; Balu Adsumilli; Alan C. Bovik

    High frame rate videos are increasingly getting popular in recent years majorly driven by strong requirements by the entertainment and streaming industries to provide high quality of experiences to consumers. To achieve the best trade-off between the bandwidth requirements and video quality in terms of frame rate adaptation, it is imperative to understand the effects of frame rate on video quality

    Updated: 2020-06-23
  • Feel The Music: Automatically Generating A Dance For An Input Song
    arXiv.cs.MM Pub Date : 2020-06-21
    Purva Tendulkar; Abhishek Das; Aniruddha Kembhavi; Devi Parikh

    We present a general computational approach that enables a machine to generate a dance for any input music. We encode intuitive, flexible heuristics for what a 'good' dance is: the structure of the dance should align with the structure of the music. This flexibility allows the agent to discover creative dances. Human studies show that participants find our dances to be more creative and inspiring compared

    Updated: 2020-06-23
  • Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams
    arXiv.cs.MM Pub Date : 2020-06-20
    Huirong Huang; Zhiyong Wu; Shiyin Kang; Dongyang Dai; Jia Jia; Tianxiao Fu; Deyi Tuo; Guangzhi Lei; Peng Liu; Dan Su; Dong Yu; Helen Meng

    Generating 3D speech-driven talking head has received more and more attention in recent years. Recent approaches mainly have following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input. In this work, we propose a novel approach using phonetic

    Updated: 2020-06-23
  • A Multiparametric Class of Low-complexity Transforms for Image and Video Coding
    arXiv.cs.MM Pub Date : 2020-06-19
    D. R. Canterle; T. L. T. da Silveira; F. M. Bayer; R. J. Cintra

    Discrete transforms play an important role in many signal processing applications, and low-complexity alternatives for classical transforms became popular in recent years. Particularly, the discrete cosine transform (DCT) has proven to be convenient for data compression, being employed in well-known image and video coding standards such as JPEG, H.264, and the recent high efficiency video coding (HEVC)

    Updated: 2020-06-23
  • M2P2: Multimodal Persuasion Prediction using Adaptive Fusion
    arXiv.cs.MM Pub Date : 2020-06-03
    Chongyang Bai; Haipeng Chen; Srijan Kumar; Jure Leskovec; V. S. Subrahmanian

    Identifying persuasive speakers in an adversarial environment is a critical task. In a national election, politicians would like to have persuasive speakers campaign on their behalf. When a company faces adverse publicity, they would like to engage persuasive advocates for their position in the presence of adversaries who are critical of them. Debates represent a common platform for these forms of

    Updated: 2020-06-23
  • Experimental Analysis of Locality Sensitive Hashing Techniques for High-Dimensional Approximate Nearest Neighbor Searches
    arXiv.cs.MM Pub Date : 2020-06-19
    Omid Jafari; Parth Nagarkar

    Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many multimedia retrieval applications. Exact tree-based indexing approaches are known to suffer from the notorious curse of dimensionality for high-dimensional data. Approximate searching techniques sacrifice some accuracy while returning good enough results for faster performance. Locality Sensitive Hashing (LSH) is

    Updated: 2020-06-23
  • Improving Locality Sensitive Hashing by Efficiently Finding Projected Nearest Neighbors
    arXiv.cs.MM Pub Date : 2020-06-19
    Omid Jafari; Parth Nagarkar; Jonathan Montaño

    Similarity search in high-dimensional spaces is an important task for many multimedia applications. Due to the notorious curse of dimensionality, approximate nearest neighbor techniques are preferred over exact searching techniques since they can return good enough results at a much better speed. Locality Sensitive Hashing (LSH) is a very popular random hashing technique for finding approximate nearest

    Updated: 2020-06-23
  • iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks
    arXiv.cs.MM Pub Date : 2020-06-13
    Aman Chadha; John Britto; M. Mani Roja

    Recently, learning-based models have enhanced the performance of single-image super-resolution (SISR). However, applying SISR successively to each video frame leads to a lack of temporal coherency. Convolutional neural networks (CNNs) outperform traditional approaches in terms of image quality metrics such as peak signal to noise ratio (PSNR) and structural similarity (SSIM). However, generative adversarial

    Updated: 2020-06-22
  • N=1 Modelling of Lifestyle Impact on Sleep Performance
    arXiv.cs.MM Pub Date : 2020-06-18
    Dhruv Upadhyay; Vaibhav Pandey; Nitish Nag; Ramesh Jain

    Sleep is critical to leading a healthy lifestyle. Each day, most people go to sleep without any idea about how their night's rest is going to be. For an activity that humans spend around a third of their life doing, there is a surprising amount of mystery around it. Despite current research, creating personalized sleep models in real-world settings has been challenging. Existing literature provides

    Updated: 2020-06-22
  • Artificial Musical Intelligence: A Survey
    arXiv.cs.MM Pub Date : 2020-06-17
    Elad Liebman; Peter Stone

    Computers have been used to analyze and create music since they were first introduced in the 1950s and 1960s. Beginning in the late 1990s, the rise of the Internet and large scale platforms for music recommendation and retrieval have made music an increasingly prevalent domain of machine learning and artificial intelligence research. While still nascent, several different approaches have been employed

    Updated: 2020-06-19
  • Video Moment Localization using Object Evidence and Reverse Captioning
    arXiv.cs.MM Pub Date : 2020-06-18
    Madhawa Vidanapathirana; Supriya Pandhre; Sonia Raychaudhuri; Anjali Khurana

    We address the problem of language-based temporal localization of moments in untrimmed videos. Compared to temporal localization with fixed categories, this problem is more challenging as the language-based queries have no predefined activity classes and may also contain complex descriptions. Current state-of-the-art model MAC addresses it by mining activity concepts from both video and language modalities

    Updated: 2020-06-19
  • Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance
    arXiv.cs.MM Pub Date : 2020-06-16
    Hao Hao Tan; Yin-Jyun Luo; Dorien Herremans

    We present a controllable neural audio synthesizer based on Gaussian Mixture Variational Autoencoders (GM-VAE), which can generate realistic piano performances in the audio domain that closely follows temporal conditions of two essential style features for piano performances: articulation and dynamics. We demonstrate how the model is able to apply fine-grained style morphing over the course of synthesizing

    Updated: 2020-06-18
  • AcED: Accurate and Edge-consistent Monocular Depth Estimation
    arXiv.cs.MM Pub Date : 2020-06-16
    Kunal Swami; Prasanna Vishnu Bondada; Pankaj Kumar Bajpai

    Single image depth estimation is a challenging problem. The current state-of-the-art method formulates the problem as that of ordinal regression. However, the formulation is not fully differentiable and depth maps are not generated in an end-to-end fashion. The method uses a naïve threshold strategy to determine per-pixel depth labels, which results in significant discretization errors. For the first

    Updated: 2020-06-16
  • AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
    arXiv.cs.MM Pub Date : 2020-06-16
    Andrew Rouditchenko; Angie Boggust; David Harwath; Dhiraj Joshi; Samuel Thomas; Kartik Audhkhasi; Rogerio Feris; Brian Kingsbury; Michael Picheny; Antonio Torralba; James Glass

    Current methods for learning visually grounded language from videos often rely on time-consuming and expensive data collection, such as human annotated textual summaries or machine generated automatic speech recognition transcripts. In this work, we introduce Audio-Video Language Network (AVLnet), a self-supervised network that learns a shared audio-visual embedding space directly from raw video inputs

    Updated: 2020-06-16
  • Iterative Nadaraya-Watson Distribution Transfer for Colour Grading
    arXiv.cs.MM Pub Date : 2020-06-15
    Hana Alghamdi; Rozenn Dahyot

    We propose a new method with Nadaraya-Watson that maps one N-dimensional distribution to another taking into account available information about correspondences. We extend the 2D/3D problem to higher dimensions by encoding overlapping neighborhoods of data points and solve the high dimensional problem in 1D space using an iterative projection approach. To show potentials of this mapping, we apply it

    Updated: 2020-06-15
  • Go-CaRD -- Generic, Optical Car Part Recognition and Detection: Collection, Insights, and Applications
    arXiv.cs.MM Pub Date : 2020-06-15
    Lukas Stappen; Xinchen Du; Vincent Karas; Stefan Müller; Björn W. Schuller

    Systems for the automatic recognition and detection of automotive parts are crucial in several emerging research areas in the development of intelligent vehicles. They enable, for example, the detection and modelling of interactions between human and the vehicle. In this paper, we present three suitable datasets as well as quantitatively and qualitatively explore the efficacy of state-of-the-art deep

    Updated: 2020-06-15
  • ORD: Object Relationship Discovery for Visual Dialogue Generation
    arXiv.cs.MM Pub Date : 2020-06-15
    Ziwei Wang; Zi Huang; Yadan Luo; Huimin Lu

    With the rapid advancement of image captioning and visual question answering at single-round level, the question of how to generate multi-round dialogue about visual content has not yet been well explored. Existing visual dialogue methods encode the image into a fixed feature vector directly, concatenated with the question and history embeddings to predict the response. Some recent methods tackle the

    Updated: 2020-06-15
  • Mitigating Gender Bias in Captioning Systems
    arXiv.cs.MM Pub Date : 2020-06-15
    Ruixiang Tang; Mengnan Du; Yuening Li; Zirui Liu; Xia Hu

    Recent studies have shown that captioning datasets, such as the COCO dataset, may contain severe social bias which could potentially lead to unintentional discrimination in learning models. In this work, we specifically focus on the gender bias problem. The existing dataset fails to quantify bias because models that intrinsically memorize gender bias from training data could still achieve a competitive

    Updated: 2020-06-15
  • The genesis of Hipparchus' celestial globe
    arXiv.cs.MM Pub Date : 2020-06-12
    Susanne M Hoffmann

    This paper summarises briefly and in English some of the results of the book Hoffmann: Hipparchs Himmelsglobus, Springer, 2017 that had to be written in German. The globe of Hipparchus is not preserved. For that reason, it has been a source of much speculation and scientific inquiry during the last few centuries. This study presents a new analysis of the data given in the commentary on Aratus' poem

    Updated: 2020-06-12
  • Interpreting CNN for Low Complexity Learned Sub-pixel Motion Compensation in Video Coding
    arXiv.cs.MM Pub Date : 2020-06-11
    Luka Murn; Saverio Blasi; Alan F. Smeaton; Noel E. O'Connor; Marta Mrak

    Deep learning has shown great potential in image and video compression tasks. However, it brings bit savings at the cost of significant increases in coding complexity, which limits its potential for implementation within practical applications. In this paper, a novel neural network-based tool is presented which improves the interpolation of reference samples needed for fractional precision motion compensation

    Updated: 2020-06-11
  • Automatic Photo to Ideophone Manga Matching
    arXiv.cs.MM Pub Date : 2020-06-11
    David A. Shamma; Tony Dunnigan; Lyndon Kennedy

    Photo applications offer tools for annotation via text and stickers. Ideophones, mimetic and onomatopoeic words, which are common in graphic novels, have yet to be explored for photo annotation use. We present a method for automatic ideophone recommendation and positioning of the text on photos. These annotations are accomplished by obtaining a list of ideophones with English definitions and applying

    Updated: 2020-06-11
  • Hysia: Serving DNN-Based Video-to-Retail Applications in Cloud
    arXiv.cs.MM Pub Date : 2020-06-09
    Huaizheng Zhang; Yuanming Li; Qiming Ai; Yong Luo; Yonggang Wen; Yichao Jin; Nguyen Binh Duong Ta

    Combining video streaming and online retailing (V2R) has been a growing trend recently. In this paper, we provide practitioners and researchers in multimedia with a cloud-based platform named Hysia for easy development and deployment of V2R applications. The system consists of: 1) a back-end infrastructure providing optimized V2R related services including data engine, model

    Updated: 2020-06-09
  • Robust watermarking with double detector-discriminator approach
    arXiv.cs.MM Pub Date : 2020-06-06
    Marcin Plata; Piotr Syga

    In this paper we present a novel deep framework for watermarking, a technique of embedding a transparent message into an image in a way that allows retrieving the message from a (perturbed) copy, so that copyright infringement can be tracked. For this technique, it is essential to extract the information from the image even after imposing some digital processing operations on it. Our framework outperforms

    Updated: 2020-06-06
  • Are Social Networks Watermarking Us or Are We (Unawarely) Watermarking Ourself?
    arXiv.cs.MM Pub Date : 2020-06-06
    Flavio Bertini; Rajesh Sharma; Danilo Montesi

    In the last decade, Social Networks (SNs) have deeply changed many aspects of society, and one of the most widespread behaviours is the sharing of pictures. However, malicious users often exploit shared pictures to create fake profiles leading to the growth of cybercrime. Thus, keeping in mind this scenario, authorship attribution and verification through image watermarking techniques are becoming

    Updated: 2020-06-06
  • Ensemble Network for Ranking Images Based on Visual Appeal
    arXiv.cs.MM Pub Date : 2020-06-06
    Sachin Singh; Victor Sanchez; Tanaya Guha

    We propose a computational framework for ranking images (group photos in particular) taken at the same event within a short time span. The ranking is expected to correspond with human perception of overall appeal of the images. We hypothesize and provide evidence through subjective analysis that the factors that appeal to humans are its emotional content, aesthetics and image quality. We propose a

    Updated: 2020-06-06
  • A Dataset and Benchmarks for Multimedia Social Analysis
    arXiv.cs.MM Pub Date : 2020-06-05
    Bofan Xue; David Chan; John Canny

    We present a new publicly available dataset with the goal of advancing multi-modality learning by offering vision and language data within the same context. This is achieved by obtaining data from a social media website with posts containing multiple paired images/videos and text, along with comment trees containing images/videos and/or text. With a total of 677k posts, 2.9 million post images, 488k

    Updated: 2020-06-05
Contents have been reproduced by permission of the publishers.