RETVec: Resilient and Efficient Text Vectorizer

Conference: Neural Information Processing Systems (NeurIPS) 2023
Authors: Elie Bursztein, Marina Zhang, Owen Vallis, Xinyu Jia, Alexandros Kapravelos, Alexey Kurakin
BibTeX Citation

@inproceedings{BURSZTEIN2023RETVEC:,
  title = {{RETVec}: Resilient and Efficient Text Vectorizer},
  author = {Bursztein, Elie and Zhang, Marina and Vallis, Owen and Jia, Xinyu and Kapravelos, Alexandros and Kurakin, Alexey},
  booktitle = {Neural Information Processing Systems},
  year = {2023},
  organization = {NeurIPS}
}

This paper describes RETVec, an efficient, resilient, and multilingual text vectorizer designed for neural-based text processing. RETVec combines a novel character encoding with an optional small embedding model to embed words into a 256-dimensional vector space. The RETVec embedding model is pre-trained using pair-wise metric learning to be robust against typos and character-level adversarial attacks. In this paper, we evaluate and compare RETVec to state-of-the-art vectorizers and word embeddings on popular model architectures and datasets. These comparisons demonstrate that RETVec leads to competitive, multilingual models that are significantly more resilient to typos and adversarial text attacks. RETVec is available under the Apache 2 license at https://github.com/google-research/retvec
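The abstract mentions a novel character encoding without spelling it out. As a rough illustration only, here is a minimal sketch of one plausible scheme. It assumes each character's Unicode code point is written as a fixed-width binary vector and each word is padded or truncated to a fixed number of characters. The names `encode_char`, `encode_word`, `flatten`, `WORD_LENGTH = 16`, and `BITS_PER_CHAR = 24` are hypothetical and are not the API of the released RETVec library.

```python
# Hypothetical sketch of a character-level binary encoding (not the RETVec API).
# Assumptions: each character becomes the fixed-width binary form of its Unicode
# code point, and each word is a fixed number of such rows (padded/truncated).

WORD_LENGTH = 16    # assumed maximum number of characters kept per word
BITS_PER_CHAR = 24  # assumed bits per character; Unicode's max (0x10FFFF) needs 21


def encode_char(ch: str, bits: int = BITS_PER_CHAR) -> list[int]:
    """Return the code point of `ch` as a list of bits, most significant first."""
    cp = ord(ch)
    if cp >= 2 ** bits:
        raise ValueError(f"code point {cp:#x} does not fit in {bits} bits")
    return [(cp >> i) & 1 for i in range(bits - 1, -1, -1)]


def encode_word(word: str, length: int = WORD_LENGTH,
                bits: int = BITS_PER_CHAR) -> list[list[int]]:
    """Encode a word as a `length` x `bits` binary matrix.

    Characters beyond `length` are dropped; shorter words are padded with
    all-zero rows.
    """
    rows = [encode_char(ch, bits) for ch in word[:length]]
    rows += [[0] * bits for _ in range(length - len(rows))]
    return rows


def flatten(matrix: list[list[int]]) -> list[int]:
    """Concatenate the rows into one flat bit vector (e.g. as model input)."""
    return [b for row in matrix for b in row]


if __name__ == "__main__":
    vec = flatten(encode_word("héllo"))
    print(len(vec))  # 16 rows * 24 bits
```

Because each character occupies its own row, a single substitution such as "hello" → "hallo" changes only one row (here a single bit), while an insertion or deletion shifts every later row. In this sketch, absorbing shift-style typos would be left to the learned embedding model on top of the encoding. Since the encoding works directly on code points, it needs no vocabulary and handles any script.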
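Pair-wise metric learning, as described in the abstract, trains the embedding model so that a word and a typo'd copy of it map to nearby vectors while unrelated words stay apart. The sketch below illustrates that setup with a hypothetical one-edit typo generator (`make_typo`), a pair builder (`make_pairs`), and a standard margin-based contrastive loss on cosine distance (`contrastive_loss`). The abstract does not state which augmentations, loss, or margin were used to pre-train RETVec, so every choice here is an assumption.

```python
import math
import random
import string

# Hypothetical sketch of pair-wise metric-learning data and loss.
# The edit operations, the contrastive loss, and margin=0.5 are all assumptions.


def make_typo(word: str, rng: random.Random) -> str:
    """Apply at most one random character edit: substitute, delete, insert, or swap."""
    if not word:
        return rng.choice(string.ascii_lowercase)
    i = rng.randrange(len(word))
    op = rng.choice(["substitute", "delete", "insert", "swap"])
    c = rng.choice(string.ascii_lowercase)
    if op == "substitute":
        return word[:i] + c + word[i + 1:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + c + word[i:]
    # Swap with the next character; leave the word unchanged at the last position.
    if i + 1 < len(word):
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    return word


def make_pairs(vocab: list[str], rng: random.Random, n: int):
    """Build n positive (word, typo) pairs and n negative (word, other word) pairs.

    `vocab` must contain at least two distinct words.
    """
    pairs = []
    for _ in range(n):
        w = rng.choice(vocab)
        pairs.append((w, make_typo(w, rng), True))
        other = rng.choice([x for x in vocab if x != w])
        pairs.append((w, other, False))
    return pairs


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def contrastive_loss(u: list[float], v: list[float], same: bool,
                     margin: float = 0.5) -> float:
    """Pull positive pairs together; push negatives to at least `margin` apart."""
    d = 1.0 - cosine(u, v)  # cosine distance in [0, 2]
    if same:
        return d * d
    return max(0.0, margin - d) ** 2
```

In an actual training loop, the vectors passed to `contrastive_loss` would be the embedding model's outputs for each word's encoding, and the loss would be minimized with a gradient-based optimizer. This pure-Python version only shows how pairs are formed and how the loss scores them.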
