Discover an index of datasets, SDKs, APIs and open-source tools developed by Microsoft researchers and shared with the global academic community below. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.

VQA Introspect
The VQA-Introspect dataset consists of 238K new perception questions that serve as sub-questions corresponding to the set of perceptual tasks needed to effectively answer the complex reasoning questions in the Reasoning split of the…
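To make the dataset's structure concrete, here is a minimal sketch of how a reasoning question might pair with its perception sub-questions. The field names below are hypothetical illustrations, not the dataset's actual schema.

```python
# Illustrative pairing of one reasoning question with its perception
# sub-questions. Field names are hypothetical, not VQA-Introspect's schema.
example = {
    "reasoning_question": "Is this an appropriate outfit for the weather?",
    "reasoning_answer": "yes",
    "sub_questions": [
        {"question": "Is it snowing?", "answer": "yes"},
        {"question": "Is the person wearing a coat?", "answer": "yes"},
    ],
}

# Each sub-question names a perceptual check that supports the reasoning answer.
for sub in example["sub_questions"]:
    print(f"{sub['question']} -> {sub['answer']}")
```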
This repository contains the source code necessary to reproduce the results presented in the paper Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks. We propose a new cross-modal pre-training method, Oscar (Object-Semantics Aligned Pre-training). It leverages object…
This repository contains the code for reproducing the quantitative experiments in our publication “Independent Subspace Analysis for Unsupervised Learning of Disentangled Representations.”
This data was collected for and used in our ACL 2020 paper, which demonstrates the potential to effectively combine explanations and demonstrations to learn web-based procedures. It consists of 520 explanations and corresponding demonstrations…
SPLASH is a dataset for the task of semantic parse correction with natural language feedback. The task and dataset, along with baseline results, are presented in: Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback. Ahmed…
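To make the task concrete, here is an illustrative example of what a parse-correction instance looks like: a question, an initial incorrect SQL parse, natural language feedback, and the corrected parse. The field names and SQL are invented for illustration, not taken from SPLASH.

```python
# Hypothetical shape of one parse-correction instance; field names and
# SQL are illustrative, not SPLASH's actual schema.
example = {
    "question": "Show the names of employees hired after 2015.",
    "predicted_parse": "SELECT name FROM employees WHERE hire_year < 2015",
    "feedback": "You should look for years greater than 2015, not less.",
    "gold_parse": "SELECT name FROM employees WHERE hire_year > 2015",
}
# A correction model takes (question, predicted_parse, feedback) as input
# and is trained to produce gold_parse.
```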
Using machine learning to detect beluga whale calls in hydrophone recordings. Of the five populations of beluga whales in Alaska, the Cook Inlet population is the smallest and has declined by about seventy-five percent since…
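As a rough illustration of this kind of pipeline, the sketch below scores sliding spectrogram windows of a hydrophone recording with a pre-fitted classifier. The band limits, window sizes, and classifier are assumptions for illustration, not the project's actual detector.

```python
# Minimal sketch of spectrogram-based call detection. Band limits,
# window sizes, and the classifier are illustrative assumptions.
import numpy as np
from scipy.signal import spectrogram

def detect_calls(audio, sr, clf, win_s=1.0, hop_s=0.5,
                 fmin=500.0, fmax=12_000.0):
    """Slide a window over `audio` and score each window with `clf`.

    `clf` is any fitted scikit-learn classifier over flattened,
    band-limited spectrogram patches (a hypothetical stand-in for a
    trained call detector). Returns (start_time_seconds, score) pairs.
    """
    win, hop = int(win_s * sr), int(hop_s * sr)
    detections = []
    for start in range(0, len(audio) - win + 1, hop):
        chunk = audio[start:start + win]
        freqs, _, sxx = spectrogram(chunk, fs=sr, nperseg=256)
        band = sxx[(freqs >= fmin) & (freqs <= fmax)]   # keep call band only
        feats = np.log1p(band).ravel()[None, :]          # one feature row
        score = clf.predict_proba(feats)[0, 1]           # P(call | window)
        detections.append((start / sr, score))
    return detections
```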
Code accompanying “Conservative Uncertainty Estimation By Fitting Prior Networks” – ICLR 2020
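The core idea, fitting trainable networks to randomly initialized prior networks and reading uncertainty off the fitting error, can be sketched as follows. The architecture and hyperparameters here are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of uncertainty from fitting random prior networks.
# Architectures and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_prior(d_in, seed):
    """A fixed random network: random features with a random readout."""
    r = np.random.default_rng(seed)
    w = r.normal(size=(d_in, 64))
    v = r.normal(size=64)
    return lambda x: np.tanh(x @ w) @ v

def uncertainty(x_train, x_test, n_priors=5):
    errs = []
    for s in range(n_priors):
        prior = make_prior(x_train.shape[1], s)
        # Fit a trainable network to the prior's outputs on training data only.
        fit = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000)
        fit.fit(x_train, prior(x_train))
        # Where the fit fails to match the prior, we are far from the data.
        errs.append((fit.predict(x_test) - prior(x_test)) ** 2)
    return np.mean(errs, axis=0)  # large error => high uncertainty
```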
VL-BERT is a simple yet powerful pre-trainable generic representation for visual-linguistic tasks. It is pre-trained on a massive-scale caption dataset and a text-only corpus, and can be fine-tuned for various downstream visual-linguistic tasks, such as Visual…
This repository implements Ranking-Critical Training (RaCT) for Collaborative Filtering, accepted at the International Conference on Learning Representations (ICLR) 2020. By using an actor-critic architecture to fine-tune a differentiable collaborative filtering model, we can improve the performance…
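A heavily simplified sketch of the actor-critic idea follows, assuming a hypothetical actor that maps user histories to item scores and a critic trained to predict a ranking metric such as NDCG. The shapes, losses, and training details below are illustrative, not RaCT's exact recipe.

```python
# Illustrative actor-critic fine-tuning for a differentiable recommender.
# All shapes and training details are assumptions, not RaCT's exact recipe.
import torch
import torch.nn as nn

n_items = 1000
actor = nn.Sequential(nn.Linear(n_items, 200), nn.Tanh(), nn.Linear(200, n_items))
critic = nn.Sequential(nn.Linear(2 * n_items, 200), nn.ReLU(), nn.Linear(200, 1))

def critic_input(scores, targets):
    # The critic sees the actor's scores alongside held-out interactions.
    return torch.cat([scores, targets], dim=-1)

# Phase 1 (not shown): pretrain the actor with its usual likelihood loss and
# train the critic to regress the true, non-differentiable ranking metric.
# Phase 2: fine-tune the actor to maximize the critic's predicted metric.
opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
history = torch.rand(32, n_items)   # toy user click histories
targets = torch.rand(32, n_items)   # toy held-out interactions
for _ in range(100):
    scores = actor(history)
    loss = -critic(critic_input(scores, targets)).mean()  # ascend predicted metric
    opt.zero_grad()
    loss.backward()
    opt.step()
```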