Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment | IEEE Journals & Magazine | IEEE Xplore