Open Access


Abstract

The Pyramidal Residual Network has achieved high accuracy in image classification tasks. However, no previous work has applied this model to sequence recognition, a long-standing research topic. We show how to extend its architecture into a Dilated Pyramidal Residual Network (DPRN) and evaluate it on two sequence recognition problems: automatic speech recognition and optical character recognition. Together, the two recognizers form a multi-modal video retrieval framework for Vietnamese Broadcast News. Experiments were conducted on caption images and speech frames extracted from VTV broadcast videos. The results show that DPRN is not only end-to-end trainable but also performs well on sequence recognition tasks.



Article Details

Issue: Vol 2 No 5 (2018)
Page No.: 138-143
Published: Jul 2, 2019
Section: Original Research
DOI: https://doi.org/10.32508/stdjns.v2i5.789

 Copyright Info


Copyright: The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License CC-BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.

 How to Cite
La, A., Nguyen, D., Pham, N., & Vu, Q. (2019). Multi-modal video retrieval using Dilated Pyramidal Residual network. Science & Technology Development Journal: Natural Sciences, 2(5), 138-143. https://doi.org/10.32508/stdjns.v2i5.789

