Analyzing ambiguity and word embeddings by probing semantic classes
Word embeddings have had a big impact on many applications in natural language processing (NLP) and information retrieval. It is therefore crucial to open the black box and understand how they represent meaning. We propose probing tasks…
DialoGPT
The DialoGPT project establishes a foundation for building versatile open-domain chatbots that can deliver engaging, natural responses across a variety of conversational topics, tasks, and information requests without resorting to heavy hand-crafting.
Bringing the power of machine reading comprehension to specialized documents
With the advent of AI assistants, initially developed for structured databases and manually curated knowledge graphs, answers to the types of basic fact-based questions people encounter during the course of regular conversation became keystrokes or…
Multilingual Model Transfer
In this project, we develop new deep learning models for bootstrapping language understanding models in languages with no labeled data, using labeled data from other languages.
MASS: Masked Sequence to Sequence Pre-training for Language Generation
MASS is a novel pre-training method for sequence-to-sequence language generation tasks. It randomly masks a sentence fragment in the encoder input and then predicts that fragment in the decoder.
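The masking step described above can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the `[MASK]` symbol, and the default `ratio` are assumptions made for illustration, and the sketch only builds the encoder input and the decoder target for one sentence.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration


def mask_fragment(tokens, start, length, mask_token=MASK):
    """Mask tokens[start:start+length] for the encoder.

    Returns (encoder_input, decoder_target), where decoder_target is the
    contiguous fragment the decoder would be trained to predict.
    """
    if length <= 0 or start < 0 or start + length > len(tokens):
        raise ValueError("fragment must lie inside the sentence")
    end = start + length
    encoder_input = tokens[:start] + [mask_token] * length + tokens[end:]
    decoder_target = tokens[start:end]
    return encoder_input, decoder_target


def random_mask_fragment(tokens, ratio=0.5, rng=random):
    """Pick a random contiguous fragment covering roughly `ratio` of the tokens.

    `ratio` is an illustrative parameter, not a value taken from the project.
    """
    if not tokens:
        raise ValueError("cannot mask an empty sentence")
    length = max(1, int(len(tokens) * ratio))
    start = rng.randrange(0, len(tokens) - length + 1)
    return mask_fragment(tokens, start, length)


sentence = "the quick brown fox jumps over the lazy dog".split()
encoder_input, decoder_target = mask_fragment(sentence, start=2, length=3)
print(encoder_input)
print(decoder_target)
```

Passing a seeded `random.Random` instance as `rng` makes the random variant reproducible.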
Phonetic matching library
A phonetic matching library. It includes text utilities that compare strings by their phonemes (how the string sounds) rather than by their characters.
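To show the difference between comparing phonemes and comparing characters, here is a small sketch. It does not use the library's API: the pronunciation table is a hand-written toy with ARPAbet-style symbols, and `phonetic_distance` and `character_distance` are hypothetical helper names. Both apply the same Levenshtein edit distance, once to phoneme sequences and once to raw characters.

```python
# Toy pronunciation table, hand-written for illustration (ARPAbet-style symbols).
# A real system would obtain phonemes from a pronunciation model or lexicon.
PHONEMES = {
    "knight": ["N", "AY", "T"],
    "night": ["N", "AY", "T"],
    "right": ["R", "AY", "T"],
    "write": ["R", "AY", "T"],
}


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phonetic_distance(word1, word2):
    """Edit distance over phoneme sequences from the toy table."""
    return edit_distance(PHONEMES[word1], PHONEMES[word2])


def character_distance(word1, word2):
    """Edit distance over characters, for comparison."""
    return edit_distance(word1, word2)


for pair in [("knight", "night"), ("write", "right"), ("night", "right")]:
    print(pair, phonetic_distance(*pair), character_distance(*pair))
```

Homophones such as "knight" and "night" are identical at the phoneme level even though their spellings differ, which is the case a character-based comparison misses.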