Depth-aware blending of smoothed images for Bokeh effect generation
Introduction
The depth-of-field effect, or Bokeh effect, is often used in photography to produce aesthetically pleasing pictures. A bokeh image keeps a chosen subject in focus while blurring out-of-focus regions. Bokeh images can be captured with Single Lens Reflex cameras using a wide aperture. In contrast, most smartphone cameras have small, fixed-size apertures that cannot capture bokeh images optically. Many smartphones with dual rear cameras can synthesize the bokeh effect: two images are captured, stereo matching algorithms compute a depth map, and the depth-of-field effect is rendered from this depth map. Some smartphones (e.g., iPhone 7, Google Pixel 2) with capable auto-focus hardware (dual lenses or Phase-Detect Auto-Focus) can generate depth maps that help render bokeh images. However, smartphones with a single camera and no such auto-focus sensor have to rely on software to synthesize the bokeh effect.
Such software can also post-process already captured images to add a Bokeh effect. Synthetic depth-of-field, or Bokeh effect, generation is therefore an important problem in Computer Vision and has gained attention recently. Most existing approaches [1], [2], [3] work on human portraits by leveraging image segmentation and depth estimation, but few approaches target bokeh effect generation for images in the wild. Recently, [4] proposed an end-to-end network that generates the bokeh effect on arbitrary images by leveraging monocular depth estimation and saliency detection.
In this paper, one such algorithm is proposed that can generate the Bokeh effect from diverse images. The proposed approach relies on a depth-estimation network to generate weight maps that blend the input image with differently smoothed versions of itself. The bokeh images generated by this algorithm are visually pleasing. The proposed approach ranked 2nd in the AIM 2019 Challenge on Bokeh Effect Synthesis (Perceptual Track) [5].
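The core blending idea can be sketched as follows: the sharp input and several Gaussian-smoothed copies of it are combined per pixel with weight maps that sum to one. This is a minimal NumPy/SciPy illustration only; in the paper the weight maps come from the depth-estimation network, whereas here they are simply supplied by the caller, and the kernel sigmas are arbitrary example values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blend_with_smoothed(image, weights, sigmas=(1.0, 2.0, 4.0)):
    """Blend the input image with Gaussian-smoothed versions of itself.

    image:   H x W x 3 float array in [0, 1]
    weights: H x W x (len(sigmas) + 1) array of per-pixel blending
             weights summing to 1 along the last axis (predicted by
             the depth-estimation network in the paper; supplied by
             the caller in this sketch).
    """
    # Stack the sharp input with progressively blurred versions
    # (sigma = 0 along the channel axis keeps colours independent).
    layers = [image] + [
        gaussian_filter(image, sigma=(s, s, 0)) for s in sigmas
    ]
    stack = np.stack(layers, axis=-1)            # H x W x 3 x (K+1)
    # Per-pixel weighted sum over the K+1 layers.
    return (stack * weights[:, :, None, :]).sum(axis=-1)

# Toy usage: uniform weights reproduce an average of the layers.
img = np.random.rand(32, 32, 3)
w = np.full((32, 32, 4), 0.25)
out = blend_with_smoothed(img, w)
print(out.shape)  # (32, 32, 3)
```

Because the weights are convex per pixel, the output stays in the input's value range; in the full system, pixels predicted to be far from the focal plane receive larger weights on the heavily blurred layers.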
Monocular depth estimation
Depth estimation from a single RGB image is a significant problem in Computer Vision with a wide range of applications, including robotics, augmented reality and autonomous driving. Recent advances in deep learning have driven progress in monocular depth estimation algorithms. Supervised algorithms rely on ground-truth depth data captured from depth sensors. [6] formulated monocular depth estimation as a combination of two sub-problems: view synthesis and stereo matching. View synthesis
Depth estimation network
Depth maps are important for Bokeh effect rendering. Since the goal is to design a system that is independent of camera hardware, the depth map has to be computed from the input RGB image alone. Thus, a monocular depth estimation network is a key element of the proposed algorithm. MegaDepth [11] is used as the monocular depth estimation network in this work. Its authors use an hourglass architecture originally proposed in [7]. The architecture is shown in Fig. 1. The encoder part of this
System configuration
The code was written in Python, with PyTorch [25] as the deep learning framework. The models were trained on a machine with an Intel Xeon 2.40 GHz processor, 64 GB of RAM and an NVIDIA GeForce TITAN X GPU with approximately 12 GB of GPU memory.
Dataset description
We use the ETH Zurich Bokeh dataset [5] (also known as the EBB! dataset [20]), which was used in the AIM 2019 Bokeh Effect Synthesis Challenge. The dataset contains 4893 pairs of bokeh and bokeh-free images. The training set contains 4493 pairs, whereas the Validation and
Testing strategy
During testing, the input image is first resized to the resolution at which the network was trained (384 × 512 for Phase-1 and 768 × 1024 for Phase-2 and Phase-3) and then passed to the network. The synthesized image is then scaled back to the input image resolution using bilinear interpolation.
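This resize, infer, resize-back strategy can be sketched in PyTorch. The `model` here is a stand-in (an identity module in the toy usage), not the paper's actual network, and the default `train_size` corresponds to the Phase-2/Phase-3 resolution mentioned above.

```python
import torch
import torch.nn.functional as F

def run_at_fixed_resolution(model, image, train_size=(768, 1024)):
    """Resize to the training resolution, run the network, then scale
    the result back to the original resolution with bilinear
    interpolation, mirroring the testing strategy described above.

    image: 1 x 3 x H x W tensor.
    """
    h, w = image.shape[-2:]
    # Downscale/upscale to the fixed training resolution.
    x = F.interpolate(image, size=train_size, mode="bilinear",
                      align_corners=False)
    with torch.no_grad():
        y = model(x)
    # Restore the caller's original resolution.
    return F.interpolate(y, size=(h, w), mode="bilinear",
                         align_corners=False)

# Toy usage with an identity "network" standing in for the real model.
img = torch.rand(1, 3, 600, 800)
out = run_at_fixed_resolution(torch.nn.Identity(), img)
print(out.shape)  # torch.Size([1, 3, 600, 800])
```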
Evaluation metrics
Both fidelity and perceptual metrics are used to evaluate the model’s performance. Fidelity measures include Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) [26]. Learned
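As a concrete reference for the fidelity metrics, PSNR can be computed directly from the mean squared error; this is a minimal NumPy sketch (SSIM and LPIPS are normally taken from libraries such as `skimage.metrics.structural_similarity` and the `lpips` package rather than hand-rolled).

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak Signal-to-Noise Ratio (dB) between two images in [0, max_val]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Sanity check: a uniform error of 0.1 gives MSE = 0.01,
# so PSNR = 10 * log10(1 / 0.01) = 20 dB.
a = np.zeros((8, 8))
b = np.full((8, 8), 0.1)
print(round(psnr(a, b), 2))  # 20.0
```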
Changing the type of blur kernel
Although the proposed network produces a perceptually good bokeh effect, the type of background blur in the rendered images differs from that of the ground-truth images. Instead of using Gaussian blur kernels to obtain the differently smoothed images, one can also use disk blur kernels. Table 7 shows that the PSNR and SSIM scores decrease slightly when disk blur is used instead of Gaussian blur, while the LPIPS score improves slightly. However, disk blur also
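The two kernel types compared above can be constructed as follows. This is an illustrative sketch with arbitrary example sizes: a Gaussian kernel falls off smoothly, whereas a disk kernel is uniform inside a circle and zero outside, which is closer to the circular aperture of a real lens.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2-D Gaussian kernel with odd side length `size`."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def disk_kernel(radius):
    """Normalized disk kernel: uniform weight inside a circle of the
    given radius, zero outside."""
    size = 2 * radius + 1
    ax = np.arange(size) - radius
    xx, yy = np.meshgrid(ax, ax)
    k = (xx**2 + yy**2 <= radius**2).astype(np.float64)
    return k / k.sum()

g = gaussian_kernel(7, sigma=1.5)
d = disk_kernel(3)
print(g.shape, d.shape)  # (7, 7) (7, 7)
```

Convolving the image with either kernel family yields the smoothed layers to be blended; the hard edge of the disk kernel is what produces the characteristic circular highlights of optical bokeh.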
Conclusion
In this paper, an end-to-end deep learning approach for Bokeh effect synthesis is proposed. The synthesized bokeh image is rendered as a weighted sum of the input image and a number of differently smoothed images, where the corresponding weight maps are predicted by a depth-estimation network. The proposed system is trained in three phases to synthesize realistic bokeh images. Experiments show that using more blur kernels and bigger blur kernels produces better quality
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
I would like to thank Computer Vision Lab, IIT Madras for providing GPU resources used in this work.
References (36)
- et al., Automatic portrait segmentation for image stylization, Comput. Graph. Forum (2016)
- et al., Synthetic depth-of-field with a single-camera mobile phone, ACM Trans. Graph. (2018)
- X. Xu, D. Sun, S. Liu, W. Ren, Y.-J. Zhang, M.-H. Yang, J. Sun, Rendering portraitures from monocular camera and...
- et al., Depth-guided dense dynamic filtering network for bokeh effect rendering
- et al., AIM 2019 challenge on bokeh effect synthesis: Methods and results
- Y. Luo, J. Ren, M. Lin, J. Pang, W. Sun, H. Li, L. Lin, Single view stereo matching, in: Proceedings of the IEEE...
- et al., Single-image depth perception in the wild
- A. Atapour-Abarghouei, T.P. Breckon, Real-time monocular depth estimation using synthetic data with domain adaptation...
- C. Godard, O. Mac Aodha, G.J. Brostow, Unsupervised monocular depth estimation with left–right consistency, in:...
- C. Godard, O. Mac Aodha, M. Firman, G.J. Brostow, Digging into self-supervised monocular depth estimation, in:...
- Realistic rendering of bokeh effect based on optical aberrations, Vis. Comput.
- Real-time lens blur effects and focus control, ACM Trans. Graph.
- Real-time depth of field rendering via dynamic light field generation and filtering, Comput. Graph. Forum
- Fourier depth of field, ACM Trans. Graph.
- Learning affinity via spatial propagation networks