Multi-modal video retrieval using Dilated Pyramidal Residual network

An Ngoc Thuy La; Dat Phuoc Nguyen; Nhut Minh Pham; Quan Hai Vu

doi:10.32508/stdjns.v2i5.789

VNUHCM Journal of Natural Sciences

An official journal of Viet Nam National University Ho Chi Minh City, Viet Nam

ISSN 2588-106X

Original Research

Open Access

Abstract

The Pyramidal Residual Network has achieved high accuracy in image classification tasks. However, no previous work has applied this model to sequence recognition. We present how to extend its architecture to form the Dilated Pyramidal Residual Network (DPRN) for this long-standing research topic, and we evaluate it on automatic speech recognition and optical character recognition. Together, these two components form a multi-modal video retrieval framework for Vietnamese broadcast news. Experiments were conducted on caption images and speech frames extracted from VTV broadcast videos. The results show that DPRN is not only end-to-end trainable but also performs well on sequence recognition tasks.
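The two ideas named in the model's title can be illustrated with a little arithmetic. The sketch below is not the authors' configuration: the base width, widening factor, kernel size, and dilation rates are all hypothetical values chosen for illustration. It assumes the additive (linear) channel widening usually associated with pyramidal residual networks, and it uses the standard receptive-field formula for stacked stride-1 dilated convolutions.

```python
def pyramidal_widths(base_width, alpha, num_blocks):
    """Channel count per residual block under additive (linear) widening.

    base_width, alpha and num_blocks are hypothetical illustration values,
    not settings reported in the paper.
    """
    return [base_width + (alpha * k) // num_blocks
            for k in range(1, num_blocks + 1)]


def receptive_field(kernel_size, dilations):
    """Receptive field, in input frames, of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Widths grow gradually from block to block instead of doubling at a few stages.
widths = pyramidal_widths(base_width=16, alpha=48, num_blocks=4)

# Four layers with growing dilation versus four ordinary (dilation 1) layers.
span_dilated = receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])
span_plain = receptive_field(kernel_size=3, dilations=[1, 1, 1, 1])

print(widths)        # [28, 40, 52, 64]
print(span_dilated)  # 31
print(span_plain)    # 9
```

With the same number of layers and parameters per layer, the dilated stack covers a much longer stretch of the input sequence, which is one plausible reason dilation is attractive for speech frames and text-line images.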
