Video s3
    Details
    Presenter(s)
    Daniele Jahier Pagliari
    Affiliation
    Politecnico di Torino
    Author(s)
    Display Name
    Yukai Chen
    Affiliation
    Politecnico di Torino
    Display Name
    Roberta Chiaro
    Affiliation
    Politecnico di Torino
    Display Name
    Enrico Macii
    Affiliation
    Politecnico di Torino
    Display Name
    Massimo Poncino
    Affiliation
    Politecnico di Torino
    Affiliation
    Politecnico di Torino
    Abstract

    Collaborative Inference (CI) optimizes the latency and energy consumption of deep learning inference through the inter-operation of edge and cloud devices. Although beneficial for other tasks, CI has never been applied to the sequence-to-sequence mapping problem at the heart of Neural Machine Translation (NMT). In this work, we address the specific issues of collaborative NMT, such as estimating the latency required to generate the (unknown) output sequence, and show how existing CI methods can be adapted to these applications. Our experiments show that CI can reduce the latency of NMT by up to 44% compared to a non-collaborative approach.
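
    The abstract does not spell out how the edge/cloud decision is made, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a linear latency model per device, a fixed output-to-input length ratio as the estimate of the unknown output length, and a known network round-trip time. The names Device, estimate_output_length and choose_device, and every numeric value, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical linear latency model for one device (values are assumptions)."""
    name: str
    ms_per_input_token: float   # assumed encoder cost per source token
    ms_per_output_token: float  # assumed decoder cost per generated token

    def latency_ms(self, n_in: int, n_out: float) -> float:
        # Total time = encoding the input + generating the (estimated) output.
        return self.ms_per_input_token * n_in + self.ms_per_output_token * n_out


def estimate_output_length(n_in: int, ratio: float = 1.1) -> float:
    """Guess the unknown output length from the input length.

    The fixed ratio is an assumption made for illustration only.
    """
    return ratio * n_in


def choose_device(n_in: int, edge: Device, cloud: Device, round_trip_ms: float):
    """Return (device name, estimated latency in ms) for the faster option."""
    n_out = estimate_output_length(n_in)
    edge_ms = edge.latency_ms(n_in, n_out)
    # Offloading pays the network round trip on top of the cloud compute time.
    cloud_ms = cloud.latency_ms(n_in, n_out) + round_trip_ms
    if edge_ms <= cloud_ms:
        return edge.name, edge_ms
    return cloud.name, cloud_ms


if __name__ == "__main__":
    edge = Device("edge", ms_per_input_token=2.0, ms_per_output_token=10.0)
    cloud = Device("cloud", ms_per_input_token=0.2, ms_per_output_token=1.0)
    for n_in in (5, 40):
        name, ms = choose_device(n_in, edge, cloud, round_trip_ms=100.0)
        print(f"{n_in} input tokens: run on {name} (~{ms:.0f} ms estimated)")
```

    Under these assumed numbers, short sentences stay on the edge device because the round trip dominates, while long sentences are offloaded because the slower edge decoder dominates.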

    Slides
    • C-NMT: A Collaborative Inference Framework for Neural Machine Translation (application/pdf)