
RETVec: Resilient and Efficient Text Vectorizer

Available Media

Publication (PDF)

Conference: Neural Information Processing Systems (NeurIPS 2023)
Authors: Elie Bursztein, Marina Zhang, Owen Vallis, Xinyu Jia, Alexandros Kapravelos, Alexey Kurakin

Citation

BibTeX Citation

@inproceedings{bursztein2023retvec,
  title = {RETVec: Resilient and Efficient Text Vectorizer},
  author = {Bursztein, Elie and Zhang, Marina and Vallis, Owen and Jia, Xinyu and Kapravelos, Alexandros and Kurakin, Alexey},
  booktitle = {Neural Information Processing Systems},
  year = {2023},
  organization = {NeurIPS}
}

This paper describes RETVec, an efficient, resilient, and multilingual text vectorizer designed for neural-based text processing. RETVec combines a novel character encoding with an optional small embedding model to embed words into a 256-dimensional vector space. The RETVec embedding model is pre-trained using pair-wise metric learning to be robust against typos and character-level adversarial attacks. In this paper, we evaluate and compare RETVec to state-of-the-art vectorizers and word embeddings on popular model architectures and datasets. These comparisons demonstrate that RETVec leads to competitive, multilingual models that are significantly more resilient to typos and adversarial text attacks. RETVec is available under the Apache 2 license at https://github.com/google-research/retvec.
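Because RETVec is distributed as a drop-in Keras layer, a typical integration looks roughly like the sketch below. This is a minimal, illustrative example rather than the paper's reference implementation: the retvec package name, the RETVecTokenizer layer, and its sequence_length argument are taken from the project's public README and may change between releases, and the downstream bidirectional-LSTM classifier is an arbitrary placeholder.

import tensorflow as tf
from tensorflow.keras import layers

# RETVecTokenizer is the Keras layer exposed by the retvec package
# (pip install retvec); the exact import path may differ across versions.
from retvec.tf import RETVecTokenizer

# Raw UTF-8 strings feed directly into the model: no fitted vocabulary,
# lookup table, or language-specific preprocessing is required.
inputs = layers.Input(shape=(1,), dtype=tf.string, name="text")

# Each word is mapped to a 256-dimensional embedding produced by the
# character encoding plus the small pre-trained embedding model.
x = RETVecTokenizer(sequence_length=128)(inputs)

# Any downstream architecture can consume the embeddings; a small
# bidirectional LSTM classifier is used here purely for illustration.
x = layers.Bidirectional(layers.LSTM(64))(x)
outputs = layers.Dense(2, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()

Because the vectorizer needs no fitted vocabulary, the same model definition can be trained on text in any language, which is the property the paper's multilingual and robustness evaluations rely on.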
