Landmarking for Navigational Streaming of Stored High-Dimensional Media
arXiv - CS - Multimedia Pub Date : 2021-04-14 , DOI: arxiv-2104.06876
Yuan Yuan, Gene Cheung, Pascal Frossard, H. Vicky Zhao, Jiwu Huang

Modern media data such as 360 videos and light field (LF) images are typically captured in much higher dimensions than observers' visual displays. To efficiently browse high-dimensional media over bandwidth-constrained networks, a navigational streaming model is considered: a client navigates the large media space by dictating a navigation path to a server, which in response transmits the corresponding pre-encoded media data units (MDUs) to the client one by one in sequence. Intra-coding an MDU (I-MDU) results in a large bitrate, but an I-MDU can be randomly accessed, while inter-coding an MDU (P-MDU) using another MDU as a predictor incurs a small coding cost but imposes an order in which the predictor must be transmitted and decoded first. From a compression perspective, the technical challenge is how to achieve coding gain via inter-coding of MDUs while enabling adequate random access for satisfactory user navigation. To address this problem, we propose landmarks, a selection of key MDUs from the high-dimensional media. Using landmarks as predictors, nearby MDUs in local neighborhoods are inter-coded, resulting in a predictive MDU structure with controlled coding cost. This means that any requested MDU can be decoded by transmitting at most a landmark and an inter-coded MDU, enabling navigational random access. To build a landmarked MDU structure, we employ a tree-structured vector quantizer (TSVQ) to first optimize landmark locations, then iteratively add/remove inter-coded MDUs as refinements using a fast branch-and-bound technique. Taking interactive LF images and viewport-adaptive 360 images as illustrative applications, and using I-frames, P-frames, and previously proposed merge frames to intra- and inter-code MDUs, we show experimentally that landmarked MDU structures can noticeably reduce the expected transmission cost compared with MDU structures without landmarks.
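
The random-access property described in the abstract (any requested MDU is reachable with at most one landmark plus one inter-coded MDU) can be illustrated with a small toy model. The sketch below is not the authors' method: it places MDUs on a 1-D line, assigns each MDU to its nearest landmark instead of optimizing landmarks with TSVQ, assumes the client caches everything it has decoded, and uses made-up sizes `I_COST` and `P_COST`. All function and constant names are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical unit sizes, chosen only for illustration (not from the paper).
I_COST = 10.0  # size of an intra-coded MDU (a landmark)
P_COST = 2.0   # size of an MDU inter-coded from its landmark


def nearest_landmark(mdu: int, landmarks: List[int]) -> int:
    """Assumed assignment rule: predict each MDU from its closest landmark.
    Ties are broken toward the smaller landmark index."""
    return min(landmarks, key=lambda lm: (abs(lm - mdu), lm))


def units_for_request(mdu: int, landmarks: List[int],
                      decoded: Set[int]) -> List[Tuple[str, int]]:
    """Return the units the server sends for one request and update the
    client's set of decoded MDUs. At most two units are ever returned."""
    if mdu in decoded:
        return []  # assumed: the client caches what it already decoded
    units = []
    lm = nearest_landmark(mdu, landmarks)
    if lm not in decoded:
        units.append(("I", lm))   # the landmark must arrive first
        decoded.add(lm)
    if mdu != lm:
        units.append(("P", mdu))  # then the inter-coded MDU itself
        decoded.add(mdu)
    return units


def path_cost(path: List[int], landmarks: List[int]) -> float:
    """Total transmission cost of a navigation path under the landmark model."""
    decoded: Set[int] = set()
    total = 0.0
    for mdu in path:
        for kind, _ in units_for_request(mdu, landmarks, decoded):
            total += I_COST if kind == "I" else P_COST
    return total


def all_intra_cost(path: List[int]) -> float:
    """Baseline without inter-coding: every distinct MDU is sent as an I-MDU."""
    return I_COST * len(set(path))


if __name__ == "__main__":
    path = list(range(8))   # client sweeps across MDUs 0..7
    landmarks = [2, 6]      # two hand-picked landmarks
    print(path_cost(path, landmarks), all_intra_cost(path))
```

In this toy setting the landmarked structure sends two I-MDUs and six P-MDUs for the sweep, whereas the all-intra baseline sends eight I-MDUs. The actual gain reported in the paper depends on the measured coding costs and the navigation model, neither of which is reproduced here.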

Updated: 2021-04-15