RETVec: Resilient and Efficient Text Vectorizer

This paper describes RETVec, an efficient, resilient, and multilingual text vectorizer designed for neural-based text processing. RETVec combines a novel character encoding with an optional small embedding model to embed words into a 256-dimensional vector space. The RETVec embedding model is pre-trained using pair-wise metric learning to be robust against typos and character-level adversarial attacks. In this paper, we evaluate and compare RETVec to state-of-the-art vectorizers and word embeddings on popular model architectures and datasets. These comparisons demonstrate that RETVec leads to competitive, multilingual models that are significantly more resilient to typos and adversarial text attacks. RETVec is available under the Apache 2 license at https://github.com/google-research/retvec
Conference | Neural Information Processing Systems (NeurIPS), 2023 |
Authors | Elie Bursztein, Marina Zhang, Owen Vallis |
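The pair-wise metric-learning objective mentioned in the abstract can be sketched with a toy example. Everything here is an illustrative assumption, not RETVec's actual design: a hypothetical trigram-hashing character encoder stands in for RETVec's character encoding, and a standard contrastive loss stands in for its training objective.

```python
import zlib
import numpy as np

def char_encode(word, dim=256):
    """Toy character encoder (hypothetical stand-in for RETVec's encoder):
    hash character trigrams of the padded word into a fixed-size vector,
    then L2-normalize so distances are comparable across word lengths."""
    vec = np.zeros(dim)
    padded = f"^{word}$"
    for i in range(len(padded) - 2):
        trigram = padded[i : i + 3]
        # crc32 gives a deterministic hash, unlike Python's salted hash().
        vec[zlib.crc32(trigram.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def pairwise_contrastive_loss(a, b, same_word, margin=0.5):
    """Pair-wise metric-learning objective: pull embeddings of matching
    pairs (e.g. a word and its typo) together, push non-matching pairs
    at least `margin` apart."""
    d = np.linalg.norm(a - b)
    return d ** 2 if same_word else max(0.0, margin - d) ** 2

# A typo'd word stays close to the clean word; an unrelated word is far,
# which is the property the pair-wise training encourages.
clean = char_encode("resilient")
typo = char_encode("resiliant")
other = char_encode("vectorizer")
print(np.linalg.norm(clean - typo) < np.linalg.norm(clean - other))  # True
```

Training with this kind of objective over many (clean word, perturbed word) pairs is what lets the resulting embedding map typo'd and adversarially perturbed inputs near their clean counterparts at inference time.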