Robust Detection and Affine Rectification of Planar Homogeneous Texture for Scene Understanding

Abstract

Man-made environments abound with planar homogeneous texture, which manifests as regularly repeating scene elements along a plane. In this work, we propose to exploit such structure to facilitate high-level scene understanding. By robustly fitting a texture projection model to optimal dominant-frequency estimates in image patches, we arrive at a projective-invariant method to localize such generic, semantically meaningful regions in multi-planar scenes. The recovered projective parameters also allow an affine-ambiguous rectification in real-world images marred by outliers, room clutter, and severe photometric conditions. Comprehensive qualitative and quantitative evaluations show that our method outperforms existing representative work for both rectification and detection. The potential of homogeneous texture for two scene understanding tasks is then explored. First, in environments where vanishing points cannot be reliably detected, or where the Manhattan assumption is not satisfied, homogeneous texture detected by the proposed approach is shown to provide alternative cues for obtaining a scene geometric layout. Second, low-level feature descriptors extracted upon affine rectification of detected texture are found to be not only class-discriminative but also complementary to features without rectification, improving recognition performance on the 67-category MIT benchmark of indoor scenes. One of our configurations involving deep ConvNet features outperforms most current state-of-the-art work on this dataset, achieving a classification accuracy of 76.90%. The approach is additionally validated on a set of 31 categories (mostly outdoor man-made environments exhibiting regular, repeating structure) drawn from the large-scale Places2 scene dataset.

Notes

  1. The symbol tilde ( \(\tilde{}\) ) is used to denote an instantaneous quantity in (Super and Bovik 1995a, b). In this paper, however, it is used to denote an estimated quantity, while the instantaneous nature is already clear by writing it as a function of \(\mathbf {x}\). As such, equality (\(=\)) is used in Eq. 10 instead of the approximate equality (\(\approx \)) appearing in (Super and Bovik 1995a, b).

  2. We define drift as deviations from the “ideal” frequencies expected to be present in a perspectively projected image of a homogeneous textured patch due to perturbations by other scene elements.

  3. For computational stability, the pixel coordinates are also normalized such that the top-left of the patch maps to \((-1,-1)\) and the bottom-right to \((1,1)\); a minimal sketch of this normalization appears after these notes.

  4. The Fourier spectrum (magnitude of the Fourier transform) of a given texture is known to be invariant to an affine transform upon normalization by its \(l_1\)-norm (Zhang and Tan 2003). Our scenario, however, concerns the frequency-plane coordinates (i.e., the frequencies themselves), which have undergone the said unknown transform.

  5. For rectification experiments on cropped homogeneous texture (Sect. 7.1), a strict error tolerance of \(10^{-3}\) is used. Since RANSAC is run for a large number of iterations (50), and because multiple anisotropically scaled representations are used, the algorithm usually converges for most (if not all) of them. For experiments on detection (Sect. 7.2), however, the tolerance is relaxed to \(10^{-2}\), and the number of remaining iterations is adapted continuously based on the current proportion of outliers to speed up convergence (Fischler and Bolles 1981); a minimal sketch of this adaptive stopping criterion appears after these notes. While more iterations would certainly improve performance, we choose to make this trade-off since we evaluate a large number of overlapping patches.

  6. Since our detector is not “trained” to produce an exact bounding box (we need multiple detections to cover a perspectively projected textured region, whose boundaries are therefore not aligned with the image axes, and we also allow multi-scale detections), our definitions of these measures differ slightly from those used in object detection (Everingham et al. 2014). Standard object detection methodology counts all detections beyond the first for a given ground truth as FPs, whereas all such detections are considered TPs in our scenario (see the sketch of this accounting after these notes).

  7. The evaluation presented herein can be considered as covering both detection and geometric class assignment (presented in Sect. 8), since it is the assignment (via the proposed approach) of a geometric class to a given proposal that ultimately determines the detector’s precision and recall.

  8. In principle, it is also possible to classify a given detection as frontal: if the vanishing line lies ‘far’ from the patch (based on some pre-defined threshold), the patch may be classified as a frontal surface exhibiting no or only minute perspective distortion; note that the slope is uninformative in this case. In practice, however, this was observed to cause planes that would otherwise be assigned to the vertical (walls) or horizontal (ceiling/floor) classes to be misclassified as frontal, owing to the ill-conditioned nature of the slope in such cases. This adversely increases false positives and decreases true positives. Moreover, since fronto-parallel planes rarely appear in this dataset, we choose, for simplicity, not to model the fronto-parallel class.

  9. Recall that for a line in the general form \(ax + by + c = 0\), the slope and y-intercept are given by \(-a/b\) and \(-c/b\), respectively. Thus, the slope of the vanishing line is \(-h_7/h_8\) and its intercept is \(-1/h_8\) (see the sketch after these notes).

  10. Indeed, we found the original SIFT to yield a lower performance of 59.14%, as opposed to 60.93% with RootSIFT, on MIT Indoor67 (the RootSIFT mapping is sketched after these notes). Incidentally, a denser grid spacing of 4 pixels is used for SIFT feature extraction in our experiments (though computationally expensive for the rectified representation) in order to remain compatible with Juneja et al. (2013) (60.77%); the other three descriptors use a spacing of 8 pixels.

  11. As an aside, the accuracy of 68.57% obtained by the CNN image description is very impressive, especially since the dimensionality is merely 4096 and a linear-kernel SVM is used. By contrast, a Fisher-encoded descriptor (Table 6a) is 204,800-dimensional and also needs a non-linear kernel (the Hellinger mapping, sketched after these notes) to achieve an accuracy that is still significantly lower than that of the CNN. Clearly, CNNs are able to produce a very low-dimensional, highly discriminative, invariant and powerful representation for a given image. Compared with previous works employing off-the-shelf ConvNet models to compute a single image-level descriptor with linear SVM classification, the performance obtained here is slightly higher than that of Cimpoi et al. (2015) (FC-CNN in Table 5(a), 67.6% using the VGG-M pre-trained model), and slightly lower than that of Razavian et al. (2014) (CNNaug-SVM, 69% using the OverFeat model, but with additional augmented training images).

  12. In order to ascertain whether the improvement is indeed due to the inclusion of rectified features in the image representation, and not simply due to the incorporation of multi-scale features, we performed an experiment in which an image representation is constructed from element-wise max-pooled CNN features extracted from the same patches as detected for the rectified CNN representation (recall the patches are detected at multiple scales as described in Sect. 7.2). We call this non-rectified representation \(\hbox {CNN}^{\prime }\). A classification performance of 60.37% is obtained, which is similar to that obtained by CNN_Rect(max) (Table 6b). When combining \(\hbox {CNN}^{\prime }\) with CNN, a performance of 72.16% is obtained, similar to CNN \(+\) CNN_Rect(max). Finally, when combining \(\hbox {CNN}^{\prime }\) with both CNN and CNN_Rect(max), a performance of 75.08% is observed. From this experiment, we conclude that explicit rectification does introduce additional, complementary CNN features that are not otherwise present in a non-rectified representation. Therefore, just as augmenting data with multi-scale information (Cimpoi et al. 2015) or with rotated and cropped examples (Razavian et al. 2014) is known to yield improved deep image representations, our work demonstrates that rectification can help as well.
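
The coordinate normalization of Note 3 amounts to a linear remapping of each patch onto \([-1,1]\times [-1,1]\). A minimal sketch follows; the function and variable names are ours for illustration and are not taken from the paper.

```python
import numpy as np

def normalize_patch_coords(xs, ys, width, height):
    """Map pixel coordinates of a width-by-height patch onto [-1, 1] x [-1, 1],
    so that the top-left pixel becomes (-1, -1) and the bottom-right (1, 1).
    This keeps the subsequent model fitting numerically well conditioned."""
    xn = 2.0 * xs / (width - 1) - 1.0
    yn = 2.0 * ys / (height - 1) - 1.0
    return xn, yn

# Example: normalize the coordinate grid of a 64 x 64 patch.
xs, ys = np.meshgrid(np.arange(64), np.arange(64))
xn, yn = normalize_patch_coords(xs, ys, 64, 64)
```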
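
Note 5 relies on the adaptive stopping criterion of Fischler and Bolles (1981). The sketch below shows that criterion in isolation; the minimal sample size and the texture projection model actually fitted are as described in the paper, and the names used here are hypothetical.

```python
import math

def adaptive_ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Iterations needed so that, with probability `confidence`, at least one
    minimal sample drawn by RANSAC consists purely of inliers
    (Fischler and Bolles 1981)."""
    w = min(max(inlier_ratio, 1e-9), 1.0 - 1e-9)  # guard the logarithms
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - w ** sample_size))

# During detection, the remaining iteration budget shrinks whenever a larger
# consensus set is found, e.g.:
#   max_iters = min(max_iters,
#                   adaptive_ransac_iterations(best_inliers / n_points, sample_size))
```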
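
The modified true/false-positive accounting of Note 6 can be sketched as follows. The overlap measure and the 0.5 threshold below are hypothetical stand-ins; the point is only that every detection matching some ground-truth region is kept as a TP, unlike the one-match-per-ground-truth PASCAL rule.

```python
def evaluate_detections(detections, ground_truths, overlap, min_overlap=0.5):
    """Precision/recall where every detection that sufficiently overlaps some
    ground-truth region is a true positive (TP) and detections overlapping
    nothing are false positives (FP). `overlap(det, gt)` is a user-supplied
    overlap measure in [0, 1]."""
    tp = fp = 0
    covered = set()  # indices of ground-truth regions hit at least once
    for det in detections:
        scores = [overlap(det, gt) for gt in ground_truths]
        if scores and max(scores) >= min_overlap:
            tp += 1
            covered.add(max(range(len(scores)), key=scores.__getitem__))
        else:
            fp += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = len(covered) / len(ground_truths) if ground_truths else 0.0
    return precision, recall
```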
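
The identities in Note 9 simply read off the slope and intercept of the vanishing line \(h_7 x + h_8 y + 1 = 0\); a trivial sketch, with a guard for a (near-)vertical vanishing line, is given below.

```python
def vanishing_line_slope_intercept(h7, h8, eps=1e-12):
    """Slope and y-intercept of the vanishing line h7*x + h8*y + 1 = 0, using
    slope = -a/b and intercept = -c/b for a general line a*x + b*y + c = 0."""
    if abs(h8) < eps:
        raise ValueError("h8 ~ 0: vanishing line is (near-)vertical, slope undefined")
    return -h7 / h8, -1.0 / h8
```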
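
The RootSIFT mapping of Arandjelović and Zisserman (2012) mentioned in Note 10 is a simple post-processing of standard SIFT descriptors; a minimal sketch:

```python
import numpy as np

def root_sift(descriptors, eps=1e-12):
    """Convert SIFT descriptors (one per row) to RootSIFT: l1-normalize each
    descriptor and take the element-wise square root, so that Euclidean
    distances between the results correspond to the Hellinger kernel on the
    original SIFT histograms."""
    d = np.asarray(descriptors, dtype=np.float64)
    d = d / (np.abs(d).sum(axis=1, keepdims=True) + eps)
    return np.sqrt(d)
```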
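
Note 11 mentions a Hellinger kernel mapping for the Fisher-encoded descriptors. A common explicit feature map is sketched below, assuming the usual signed square-root formulation used with Fisher vectors (an illustrative sketch, not necessarily the paper's exact pipeline); after the mapping, a linear SVM on the mapped vectors behaves like a Hellinger-kernel SVM on the originals.

```python
import numpy as np

def hellinger_map(x, eps=1e-12):
    """Explicit feature map for the (signed) Hellinger kernel: l1-normalize the
    vector and take the signed square root, after which the ordinary dot
    product of two mapped vectors equals their Hellinger kernel value."""
    x = np.asarray(x, dtype=np.float64)
    x = x / (np.abs(x).sum() + eps)
    return np.sign(x) * np.sqrt(np.abs(x))
```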

References

  • Ahmad, S., & Cheong, L.-F. (2016). Facilitating and exploring planar homogeneous texture for indoor scene understanding. In Proceedings of European conference on computer vision (pp. 35–51).

  • Aiger, D., Cohen-Or, D., & Mitra, N. J. (2012). Repetition maximization based texture rectification. Computer Graphics Forum (EUROGRAPHICS), 31(2.2), 439–448.

  • Arandjelović, R., & Zisserman, A. (2012). Three things everyone should know to improve object retrieval. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2911–2918).

  • Bappy, J. H., & Roy-Chowdhury, A. K. (2016). Inter-dependent CNNs for joint scene and object recognition. In Proceedings of international conference on pattern recognition.

  • Boureau, Y.-L., Bach, F., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2559–2566).

  • Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.

  • Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

  • Chatfield, K., Lempitsky, V., Vedaldi, A., & Zisserman, A. (2011). The devil is in the details: An evaluation of recent feature encoding methods. In Proceedings of British machine vision conference (pp. 76.1–76.12).

  • Chum, O., & Matas, J. (2010). Planar affine rectification from change of scale. In Proceedings of Asian conference on computer vision (pp. 347–360).

  • Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., & Vedaldi, A. (2014). Describing textures in the wild. In Proceedings IEEE conference on computer vision and pattern recognition (pp. 3606–3613).

  • Cimpoi, M., Maji, S., & Vedaldi, A. (2015). Deep filter banks for texture recognition and segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3828–3836).

  • Collins, T., Durou, J., Gurdjos, P., & Bartoli, A. (2010). Single-view perspective shape-from-texture with focal length estimation: A piecewise affine approach. In Proceedings of 3D data processing, visualization and transmission (3DPVT).

  • Coughlan, J. M., & Yuille, A. L. (1999). Manhattan world: Compass direction from a single image by Bayesian inference. In Proceedings of IEEE international conference on computer vision (pp. 941–947).

  • Criminisi, A., & Zisserman, A. (2000). Shape from texture: Homogeneity revisited. In Proceedings of British machine vision conference (pp. 82–91).

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 886–893).

  • Doersch, C., Gupta, A., & Efros, A. A. (2013). Mid-level visual element discovery as discriminative mode seeking. In Proceedings of neural information processing systems (pp. 494–502).

  • Donahue*, J., Jia*, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E. & Darrell, T. (2014). DeCAF: A deep convolutional activation feature for generic visual recognition. In Proceedings of international conference on machine learning. ( * = equal contribution).

  • Eigen, D., & Fergus, R. (2015). Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of IEEE international conference on computer vision.

  • Everingham, M., Eslami, S. M. A., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2014). The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.

  • Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.

  • Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.

  • Gong, Y., Wang, L., Guo, R., & Lazebnik, S. (2014). Multi-scale orderless pooling of deep convolutional activation features. In Proceedings of European conference on computer vision (pp. 392–407).

  • Hartley, R. I., & Zisserman, A. (2004). Multiple view geometry in computer vision (2nd ed.). Cambridge: Cambridge University Press. ISBN: 0521540518.

  • Havlicek, J. P., Bovik, A. C., & Maragos, P. (1992). Modulation models for image processing and wavelet-based image demodulation. In Proceedings of Asilomar conference on signals, systems and computers (pp. 805–810).

  • Hedau, V., Hoiem, D., & Forsyth, D. (2009). Recovering the spatial layout of cluttered rooms. In Proceedings of IEEE international conference on computer vision (pp. 1849–1856).

  • Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1), 151–172.

  • Hong, W., Yang, A. Y., Huang, K., & Ma, Y. (2004). On symmetry and multiple-view geometry: Structure, pose, and calibration from a single image. International Journal of Computer Vision, 60(3), 241–265.

  • Huang, Y., Wu, Z., Wang, L., & Tan, T. (2014). Feature coding in image classification: A comprehensive study. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(3), 493–506.

  • Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093. http://caffe.berkeleyvision.org/.

  • Juneja, M., Vedaldi, A., Jawahar, C. V., & Zisserman, A. (2013). Blocks that shout: Distinctive parts for scene classification. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 923–930).

  • Kosecka, J., & Zhang, W. (2003). Extraction, matching and pose recovery based on dominant rectangular structures. In First IEEE international workshop on higher-level knowledge in 3D modeling and motion analysis (pp. 83–91).

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Proceedings of neural information processing systems (pp. 1097–1105).

  • Krumm, J., & Shafer, S. (1992). Shape from periodic texture using the spectrogram. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 284–289).

  • Kulkarni, P., Jurie, F., Zepeda, J., Pérez, P., & Chevallier, L. (2016). SPLeaP: Soft pooling of learned parts for image classification. In Proceedings of European conference on computer vision (pp. 329–345).

  • Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2169–2178).

  • Leung, T., & Malik, J. (1996). Detecting, localizing and grouping repeated scene elements from an image. In Proceedings of European conference on computer vision (pp. 546–555).

  • Lian, X.-C., Li, Z., Lu, B.-L., & Zhang, L. (2010). Max-margin dictionary learning for multiclass image categorization. In Proceedings of European conference on computer vision (pp. 157–170).

  • Lin, D., Lu, C., Liao, R., & Jia, J. (2014). Learning important spatial pooling regions for scene classification. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3726–3733).

  • Liu, X., Veksler, O., & Samarabandu, J. (2010). Order-preserving moves for graph-cut-based optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7), 1182–1196.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

  • Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., et al. (2005). A comparison of affine region detectors. International Journal of Computer Vision, 65(1–2), 43–72.

  • Ojala, T., Pietikäinen, M., & Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971–987.

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.

  • Pandey, M., & Lazebnik, S. (2011). Scene recognition and weakly supervised object localization with deformable part-based models. In Proceedings of IEEE international conference on computer vision (pp. 1307–1314).

  • Patterson, G., Xu, C., Su, H., & Hays, J. (2014). The SUN attribute database: Beyond categories for deeper scene understanding. International Journal of Computer Vision, 108(1), 59–81.

  • Petkov, N., & Kruizinga, P. (1997). Computational models of visual neurons specialised in the detection of periodic and aperiodic oriented visual stimuli: Bar and grating cells. Biological Cybernetics, 76(2), 83–96.

  • Picard, R. W. (2010). A society of models for video and image libraries. IBM Systems Journal, 35(3.4), 292–312.

  • Pritts, J., Chum, O., & Matas, J. (2014). Detection, rectification and segmentation of coplanar repeated patterns. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2973–2980).

  • Qi, M., & Wang, Y. (2016). Deep-CSSR: Scene classification using category-specific salient region with deep features. In Proceedings of international conference on image processing.

  • Quan, Y., Xu, Y., Sun, Y., & Luo, Y. (2014). Lacunarity analysis on image patterns for texture classification. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 160–167).

  • Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 413–420).

  • Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of IEEE conference on computer vision and pattern recognition workshop (pp. 512–519).

  • Renninger, L. W., & Malik, J. (2004). When is scene identification just texture recognition? Vision Research, 44(19), 2301–2311.

  • Ribeiro, E., & Hancock, E. R. (2000). Estimating the 3d orientation of texture planes using local spectral analysis. Image and Vision Computing, 18(8), 619–631.

  • Rosenholtz, R., & Malik, J. (1997). Surface orientation from texture: Isotropy or homogeneity (or both)? Vision Research, 37(16), 2283–2293.

  • Rother, C. (2000). A new approach for vanishing point detection in architectural environments. In Proceedings of British machine vision conference (pp. 382–391).

  • Russakovsky*, O., Deng*, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252. ( * = equal contribution).

  • Schaffalitzky, F., & Zisserman, A. (1998). Geometric grouping of repeated elements within images. In Proceedings of British machine vision conference (pp. 165–181).

  • Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2014). Overfeat: Integrated recognition, localization and detection using convolutional networks. In Proceedings of international conference on learning representations. http://cilvr.nyu.edu/doku.php?id=software:overfeat:start.

  • Shaw, D., & Barnes, N. (2006). Perspective rectangle detection. In Proceedings of European conference on computer vision workshop on applications of computer vision.

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Proceedings of international conference on learning representations.

  • Singh, S., Gupta, A., & Efros, A. A. (2012). Unsupervised discovery of mid-level discriminative patches. In Proceedings of European conference on computer vision (pp. 73–86).

  • Yu, S. X., Zhang, H., & Malik, J. (2008). Inferring spatial layout from a single image via depth-ordered grouping. In IEEE conference on computer vision and pattern recognition workshop (pp. 1–7).

  • Super, B. J., & Bovik, A. C. (1991). Three-dimensional orientation from texture using Gabor wavelets. In Proceedings of SPIE visual communications and image processing ’91: Image processing.

  • Super, B. J., & Bovik, A. C. (1995a). Planar surface orientation from texture spatial frequencies. Pattern Recognition, 28(5), 729–743.

  • Super, B. J., & Bovik, A. C. (1995b). Shape from texture using local spectral moments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(4), 333–343.

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 1–9).

  • Tuytelaars, T., Turina, A., & Gool, L. V. (2003). Noncombinatorial detection of regular repetitions under perspective skew. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(4), 418–432.

  • Varma, M., & Zisserman, A. (2002). Classifying images of materials: Achieving viewpoint and illumination independence. In Proceedings of European conference on computer vision (pp. 255–271).

  • Vedaldi, A., & Fulkerson, B. (2008). VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/.

  • Vedaldi, A., & Lenc, K. (2015). MatConvNet—convolutional neural networks for MATLAB. In Proceedings of ACM international conference on multimedia. http://www.vlfeat.org/matconvnet/.

  • Wu, C., Frahm, J.-M., & Pollefeys, M. (2010). Detecting large repetitive structures with salient boundaries. In Proceedings of European conference on computer vision (pp. 142–155).

  • Wu, C., Frahm, J.-M., & Pollefeys, M. (2011). Repetition-based dense single-view reconstruction. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3113–3120).

  • Wu, J., & Rehg, J. M. (2011). CENTRIST: A visual descriptor for scene categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 1489–1501.

  • Wu, R., Wang, B., Wang, W., & Yu, Y. (2015). Harvesting discriminative meta objects with deep CNN features for scene classification. In Proceedings of IEEE international conference on computer vision.

  • Xiao, J., Ehinger, K. A., Hays, J., Torralba, A., & Oliva, A. (2016). SUN database: Exploring a large collection of scene categories. International Journal of Computer Vision, 119, 3–22.

  • Xie, L., Wang, J., Guo, B., Zhang, B., & Tian, Q. (2014). Orientational pyramid matching for recognizing indoor scenes. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3734–3741).

  • Yang, J., Yu, K., & Huang, T. (2010). Supervised translation-invariant sparse coding. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3517–3524).

  • Zhang, J., & Tan, T. (2003). Affine invariant classification and retrieval of texture images. Pattern Recognition, 36(3), 657–664.

  • Zhang, J., Marszałek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2), 213–238.

  • Zhang, Z. (1998). Determining the epipolar geometry and its uncertainty: A review. International Journal of Computer Vision, 27(2), 161–195.

  • Zhang, Z., Liang, X., Ganesh, A., & Ma, Y. (2010). TILT: Transform invariant low-rank textures. In Proceedings of Asian conference on computer vision (pp. 314–328).

  • Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using Places Database. In Proceedings of neural information processing systems.

  • Zhou, B., Khosla, A., Lapedriza, A., Torralba, A., & Oliva, A. (2016). Places: An image database for deep scene understanding. arXiv preprint. http://places2.csail.mit.edu/.

  • Zuo, Z., Wang, G., Shuai, B., Zhao, L., Yang, Q., & Jiang, X. (2014). Learning discriminative and shareable features for scene classification. In Proceedings of European conference on computer vision (pp. 552–568).

Author information

Correspondence to Shahzor Ahmad.

Additional information

Communicated by Larry Davis.

This work was carried out when Shahzor Ahmad was a graduate student at NUS.

About this article

Cite this article

Ahmad, S., Cheong, LF. Robust Detection and Affine Rectification of Planar Homogeneous Texture for Scene Understanding. Int J Comput Vis 126, 822–854 (2018). https://doi.org/10.1007/s11263-018-1078-2
