Anime Character Recognition Using Intermediate Features Aggregation

Details

Presenter(s)

Edwin Arkel Rios

Affiliation

National Yang Ming Chiao Tung University

Author(s)

Edwin Arkel Rios

Affiliation

National Yang Ming Chiao Tung University

Min-Chun Hu

Affiliation

National Tsing Hua University

Bo-Cheng Lai

Affiliation

National Yang Ming Chiao Tung University

Abstract

We study the task of anime character recognition. We propose a novel Intermediate Features Aggregation classification head for this task, which helps smooth the optimization landscape of Vision Transformers (ViTs) by adding skip connections between intermediate layers and the classification head, yielding a relative improvement in classification accuracy of up to 28%. We conduct extensive experiments with a variety of classification models, and we also adapt Vision-Language Transformers (ViLT) to incorporate external tag data for classification without additional multimodal pre-training. Our results offer new insights into how hyperparameters such as input sequence length and mini-batch size, as well as architectural variations, affect performance on this task.
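
The idea of connecting intermediate layers to the classification head can be illustrated with a small, framework-free sketch. This is not the authors' implementation. The abstract does not say how the intermediate features are combined, so the averaging in `aggregate` is an assumption, and `toy_layer` is a hypothetical stand-in for a real Transformer encoder layer. All function names are illustrative.

```python
def toy_layer(tokens, weight):
    """Hypothetical stand-in for an encoder layer: a residual update
    that adds a scaled copy of the sequence mean to every token."""
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [[t[d] + weight * mean[d] for d in range(dim)] for t in tokens]


def forward_with_intermediates(tokens, layer_weights):
    """Run every layer and record the class token (index 0) after each."""
    cls_per_layer = []
    for w in layer_weights:
        tokens = toy_layer(tokens, w)
        cls_per_layer.append(tokens[0])
    return cls_per_layer


def aggregate(cls_per_layer):
    """Skip connections into the head: combine the class tokens of all
    layers. Averaging is an assumption made for this sketch."""
    n = len(cls_per_layer)
    dim = len(cls_per_layer[0])
    return [sum(c[d] for c in cls_per_layer) / n for d in range(dim)]


def linear_head(features, weights, bias):
    """Plain linear classifier applied to the aggregated features."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def classify(tokens, layer_weights, head_w, head_b):
    """Return the predicted class index and the logits."""
    features = aggregate(forward_with_intermediates(tokens, layer_weights))
    logits = linear_head(features, head_w, head_b)
    return max(range(len(logits)), key=logits.__getitem__), logits


# Two tokens of dimension 2; token 0 plays the role of the class token.
tokens = [[1.0, 0.0], [0.0, 1.0]]
pred, logits = classify(tokens, [0.5, 0.5],
                        head_w=[[1.0, 0.0], [0.0, 1.0]], head_b=[0.0, 0.0])
```

In a standard ViT head only the final layer's class token would reach the classifier. Here every layer contributes directly, which gives earlier layers a short path to the loss during training.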

Slides

Anime Character Recognition Using Intermediate Features Aggregation (PDF)
