Abstract
Pyramidal Residual Networks have achieved high accuracy in image classification tasks. However, no previous work has applied this model to sequence recognition tasks. We present how to extend its architecture into a Dilated Pyramidal Residual Network (DPRN) for this long-standing research topic, and evaluate it on two problems: automatic speech recognition and optical character recognition. Together, the two systems form a multi-modal video retrieval framework for Vietnamese Broadcast News. Experiments were conducted on caption images and speech frames extracted from VTV broadcast videos. Results show that DPRN is not only end-to-end trainable but also performs well on sequence recognition tasks.
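The abstract does not spell out the DPRN architecture. As a hedged illustration of the two ingredients its name combines, the sketch assumes the additive channel-widening rule of the original Pyramidal Residual Network, where each residual unit adds alpha / N channels so width grows linearly rather than doubling at a few stages. It also shows how dilation enlarges the temporal receptive field of stacked stride-1 1-D convolutions. The function names, the base width of 16, alpha = 48, and the dilation schedule are hypothetical choices for illustration, not values taken from this paper.

```python
import math
from typing import List


def pyramidal_widths(base: int, alpha: int, num_units: int) -> List[int]:
    """Channel width of each residual unit under an additive widening rule.

    Assumption: width increases by alpha / num_units per unit (as in the
    original PyramidNet), so it climbs linearly from `base` to about
    `base + alpha`. Fractional widths are floored to whole channels.
    """
    widths = []
    width = float(base)
    for _ in range(num_units):
        width += alpha / num_units
        widths.append(math.floor(width))
    return widths


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field (in input frames) of stacked stride-1 1-D convolutions.

    A convolution with kernel size k and dilation d spans d * (k - 1) + 1
    frames, so each layer widens the field by d * (k - 1).
    """
    rf = 1
    for d in dilations:
        rf += d * (kernel_size - 1)
    return rf


if __name__ == "__main__":
    # Hypothetical settings: 6 units, base width 16, alpha 48.
    print(pyramidal_widths(16, 48, 6))        # [24, 32, 40, 48, 56, 64]
    # Four kernel-3 layers: undilated vs. exponentially dilated.
    print(receptive_field(3, [1, 1, 1, 1]))   # 9
    print(receptive_field(3, [1, 2, 4, 8]))   # 31
```

With the same four layers and the same parameter count, doubling the dilation rate at each layer grows the receptive field from 9 to 31 frames. This is one common reason dilated convolutions are attractive for long speech or text-line sequences.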
Issue: Vol 2 No 5 (2018)
Page No.: 138-143
Published: Jul 2, 2019
Section: Original Research
DOI: https://doi.org/10.32508/stdjns.v2i5.789