{"id":664554,"date":"2020-06-11T11:01:36","date_gmt":"2020-06-11T18:01:36","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?p=664554"},"modified":"2020-06-22T14:13:04","modified_gmt":"2020-06-22T21:13:04","slug":"xglue-expanding-cross-lingual-understanding-and-generation-with-tasks-from-real-world-scenarios","status":"publish","type":"post","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/xglue-expanding-cross-lingual-understanding-and-generation-with-tasks-from-real-world-scenarios\/","title":{"rendered":"XGLUE: Expanding cross-lingual understanding and generation with tasks from real-world scenarios"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-665691 size-full\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/1400x788_Xglue_NoLogo__updated.gif\" alt=\"\" width=\"1400\" height=\"788\" \/><\/p>\n<p>What we can teach a model to do with natural language is dictated by the availability of data. Currently, abundant labeled data exists for only a handful of languages. This makes it difficult to train models for question answering, text summarization, and other tasks in every language, and it ultimately limits the number of people who can benefit from advanced AI systems. In natural language processing (NLP), low-resource learning remains an open challenge. In this setting, a model is trained on very little data, or on data that contains no examples of the kind it will encounter at test time. Cross-lingual transfer learning is a type of low-resource learning that trains a model on data in one language, such as English, and tests it on the same task in other languages. 
With cross-lingual transfer capability, we could leverage the rich resources of a few languages to build NLP services for all the languages in the world.<\/p>\n<p>Recently, pre-trained models such as <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/unicoder-a-universal-language-encoder-by-pre-training-with-multiple-cross-lingual-tasks\/\">Unicoder<\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/google-research\/bert\/blob\/master\/multilingual.md\">M-BERT<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/facebookresearch\/XLM\">XLM<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> have been developed to learn multilingual representations for cross-lingual and multilingual tasks. These models pre-train on multilingual corpora, and some also use bilingual corpora. They share a vocabulary and weights across languages and use tasks such as masked language modeling and translation language modeling. With this pre-training, they achieve surprisingly strong cross-lingual capability. However, the community still lacks benchmark datasets that evaluate this capability comprehensively. 
The <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/cims.nyu.edu\/~sbowman\/xnli\/\">Cross-Lingual Natural Language Inference (XNLI) corpus<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> is the most widely used cross-lingual benchmark for these models. However, its single evaluation scenario, natural language inference, is too narrow to cover the variety of real-world cross-lingual tasks.<\/p>\n<p>To help the research community further advance language-agnostic models and make AI systems more inclusive, we introduce \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/xglue-a-new-benchmark-dataset-for-cross-lingual-pre-training-understanding-and-generation\/\">XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation.<\/a>\u201d Because its training data is available only in English, XGLUE\u2019s 11 downstream tasks test a model\u2019s zero-shot cross-lingual transfer capability. That is, they test its ability to transfer what it learned in English to the same task in other languages. In all, XGLUE covers 19 languages, including Italian, Portuguese, Russian, Swahili, and Urdu. XGLUE comprises both cross-lingual natural language understanding (NLU) tasks <em>and<\/em> cross-lingual natural language generation (NLG) tasks. It also offers six <em>new<\/em> tasks drawn from real-world search engine and news site scenarios. These features set XGLUE apart from existing NLP datasets. 
We also extended the universal language encoder Unicoder to serve as a baseline, introducing two generation pre-training tasks to support NLG.<\/p>\n<p>Languages covered by XGLUE tasks: the table below breaks down the 19 languages covered by XGLUE\u2019s 11 tasks, task by task. Asterisks denote the new understanding and generation tasks offered by the dataset.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-665721 \" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Updated-XGLUE-languages-table.jpg\" alt=\"\" width=\"653\" height=\"495\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Updated-XGLUE-languages-table.jpg 553w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Updated-XGLUE-languages-table-300x227.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Updated-XGLUE-languages-table-80x60.jpg 80w\" sizes=\"auto, (max-width: 653px) 100vw, 653px\" \/><\/p>\n<p>We see XGLUE as an important tool in helping researchers and developers ensure that access to advanced AI systems isn\u2019t limited by the language an individual speaks. 
That includes AI systems being developed through <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/ai-at-scale\/\">AI at Scale<\/a>, the Microsoft initiative driving next-generation AI capabilities.<\/p>\n<div id=\"attachment_664557\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-664557\" class=\"wp-image-664557 size-large\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Figure-1-_-XGLUE-1024x348.png\" alt=\"\" width=\"1024\" height=\"348\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Figure-1-_-XGLUE-1024x348.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Figure-1-_-XGLUE-300x102.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Figure-1-_-XGLUE-768x261.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Figure-1-_-XGLUE.png 1430w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><p id=\"caption-attachment-664557\" class=\"wp-caption-text\">Figure 1: The new XGLUE benchmark dataset has 11 tasks, including four new NLU tasks and two new NLG tasks, which are marked with an asterisk in the table above. For each task, the training set is available only in English. The third column of the table lists the number of labeled instances in the training set. 
The fourth and fifth columns list the average number of labeled instances in the development and test sets, respectively.<\/p><\/div>\n<h3>New tasks from real-world scenarios, including NLG tasks<\/h3>\n<p>XGLUE includes five existing NLU tasks: named entity recognition (NER), part-of-speech tagging (POS), machine reading comprehension (MLQA dataset), paraphrase classification (PAWS-X dataset), and XNLI. However, the dataset\u2019s six new tasks are what give researchers an opportunity to evaluate a model\u2019s real-world potential:<\/p>\n<ul>\n<li><strong>News Classification (NC):<\/strong> Given a news article, a model identifies its category, such as sports, entertainment, or world news; languages: English, French, German, Spanish, and Russian<\/li>\n<li><strong>Query-Ad Matching (QADSM):<\/strong> Given a query, a model determines whether a recommended ad is \u201cgood\u201d or \u201cbad\u201d; languages: English, French, and German<\/li>\n<li><strong>Web Page Ranking (WPR):<\/strong> Given a query, a model ranks web page results by relevance, which is labeled on a scale from 0 (bad) to 4 (perfect); languages: English, French, German, Spanish, Italian, Portuguese, and Chinese<\/li>\n<li><strong>QA Matching (QAM):<\/strong> A model determines whether a passage is relevant to a given query; languages: English, French, and German<\/li>\n<li><strong>Question Generation (QG):<\/strong> Given a passage, a model generates a corresponding query; languages: English, French, German, Spanish, Italian, and Portuguese<\/li>\n<li><strong>News Title Generation (NTG):<\/strong> Given a news article, a model generates a headline for it; languages: English, French, German, Spanish, and Russian<\/li>\n<\/ul>\n<p>Data for XGLUE\u2019s news classification and news title generation tasks came from Microsoft News. Data for the remaining new tasks came from the Microsoft search engine Bing. The data was collected with privacy-preserving steps, including the removal of any data that could potentially contain personally identifiable information.<\/p>\n<p>Collectively, these six new tasks reflect much of what today\u2019s commercial search engines do and how people experience them. They provide a realistic test of how well a model generalizes across NLU and NLG tasks. They also show more concretely how a model could eventually affect people. For example, a model could improve direct answers to search queries, which saves people from sifting through pages of search results. It could also organize news so that people can more easily find what they want to read.<\/p>\n<p>The <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/XGLUE\">XGLUE dataset<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/Unicoder\">example code for running the XGLUE baseline<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> are available on GitHub. 
Researchers can share their results on the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/microsoft.github.io\/XGLUE\/\">XGLUE leaderboard<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<div id=\"attachment_665676\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-665676\" class=\"wp-image-665676 size-large\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Unicoder-Updated-Fig-1-1024x508.jpg\" alt=\"A figure shows the training procedure for the four tasks used to pre-train Unicoder for cross-lingual understanding tasks. Unicoder is labeled as having 12 layers and a shared vocabulary size of 250,000 across 100 languages. For the masked language model task, the sentence \u201cthis is an example\u201d becomes \u201cthis is [MASK] [MASK],\u201d and Unicoder predicts that the masked words are \u201can\u201d and \u201cexample.\u201d The translation language model task combines a bilingual sentence pair (\u201cthis is an example\u201d in English and Chinese) and then masks words, which Unicoder predicts. For contrastive learning, Unicoder is given the sentence \u201cThis is an example\u201d and its Chinese equivalent and determines whether they have the same meaning. In the cross-lingual word recovery task, the sentence pair is represented by a newly generated sequence of word representations, from which Unicoder recovers all the words. 
\" width=\"1024\" height=\"508\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Unicoder-Updated-Fig-1-1024x508.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Unicoder-Updated-Fig-1-300x149.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Unicoder-Updated-Fig-1-768x381.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Unicoder-Updated-Fig-1.jpg 1424w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><p id=\"caption-attachment-665676\" class=\"wp-caption-text\">Figure 2: Researchers used a simplified version of the universal language encoder Unicoder as an XGLUE baseline. The original Unicoder is pre-trained for cross-lingual understanding tasks using four pre-training tasks: masked language model, translation language model, contrastive learning, and cross-lingual word recovery (above). Each column shows an example of one of these pre-training tasks. For XGLUE, Unicoder is pre-trained with masked language model and translation language model only.<\/p><\/div>\n<h3>Cross-lingual pre-training in Unicoder<\/h3>\n<p>For our baseline, we chose Unicoder, which we introduced at the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.emnlp-ijcnlp2019.org\/\">2019 Conference on Empirical Methods in Natural Language Processing<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. 
For cross-lingual NLU tasks, Unicoder is pre-trained on both multilingual and bilingual corpora with the following tasks (Figure 2 above):\n<ul>\n<li><em>Masked language model<\/em> and <em>translation language model<\/em> predict each masked token or phrase from monolingual and bilingual context, respectively.<\/li>\n<li><em>Contrastive learning<\/em> predicts whether a bilingual word\/phrase\/sentence pair is a translation pair.<\/li>\n<li><em>Cross-lingual word recovery<\/em> reconstructs the original source sequence. It works from vector representations that are computed from the token representations of the source sequence\u2019s target-language translation.<\/li>\n<\/ul>\nFor XGLUE, we used a simplified version of Unicoder, pre-training it with masked language model and translation language model only.<\/p>\n<div id=\"attachment_665679\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-665679\" class=\"wp-image-665679 size-large\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Unicoder-Updated-Fig-2-1024x445.jpg\" alt=\"A flowchart depicts the process of extending Unicoder for cross-lingual generation tasks. Unicoder is labeled as having 12 layers and a shared vocabulary size of 250,000 across 100 languages. The sentence \u201cThis could be a sentence in any language\u201d is corrupted via one of four text noising methods: sentence permutation (\u201ccould this be sentence a in . any language\u201d); token deletion (\u201cthis be a in any language\u201d); token masking (\u201c[MASK] could be a [MASK] in any [MASK] .\u201d); or text infilling (\u201cthis could be [MASK] in [MASK] .\u201d). The corrupted sentence is input into the Unicoder encoder. It then moves through the decoder, which uses one of the two text denoising methods (xDAE or xFNP) to generate the original sentence. A figure representing xDAE shows the decoder generating a single token at each time step. A figure representing xFNP shows the decoder generating multiple tokens at each time step.\" width=\"1024\" height=\"445\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Unicoder-Updated-Fig-2-1024x445.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Unicoder-Updated-Fig-2-300x130.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Unicoder-Updated-Fig-2-768x334.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/Unicoder-Updated-Fig-2.jpg 1435w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><p id=\"caption-attachment-665679\" class=\"wp-caption-text\">Figure 3: Researchers extended Unicoder to cross-lingual generation tasks by adding two generation tasks to pre-training: multilingual Denoising Auto-Encoding (xDAE) and multilingual Future N-gram Prediction (xFNP). During pre-training, a text noising method corrupts a sentence, which then serves as the input to the Unicoder encoder. The decoder then attempts to generate the original input sequence from its corrupted form. Researchers tried four text noising methods and two text denoising methods.<\/p><\/div>\n<p>Using an encoder-decoder architecture, we extend the original Unicoder to cross-lingual NLG tasks (Figure 3 above) by introducing two generation tasks into the pre-training stage: <em>multilingual Denoising Auto-Encoding (xDAE)<\/em> and <em>multilingual Future N-gram Prediction (xFNP)<\/em>. In xDAE, the model predicts the original input text from a corrupted version of it. The corrupted version is produced by text noising methods that randomly mask, delete, or reorder words. The xDAE decoder generates a single token at each time step. 
xFNP instead predicts each masked span, a series of successive words, from a corrupted version of the input text. This corrupted version is produced by span-based masking, which masks several successive words at once rather than a single word. The xFNP decoder generates multiple tokens simultaneously at each time step. Based on the context generated so far and the most probable future tokens, xFNP can select the best token for the current time step. xFNP is a multilingual version of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2001.04063\">ProphetNet<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. Both <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/prophetnet\">xFNP and ProphetNet are available on GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<h3>Evaluation results on XGLUE<\/h3>\n<p>We evaluate Unicoder and two other recent cross-lingual pre-trained models, M-BERT and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/pytorch\/fairseq\/tree\/master\/examples\/xlmr\">XLM-RoBERTa (XLM-R)<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, on XGLUE. For each task, each model is fine-tuned on the task\u2019s English training data. It is then evaluated on all of the task\u2019s test sets, which span a variety of languages including English, to measure its zero-shot cross-lingual transfer capability. 
For the cross-lingual NLU tasks, Unicoder performs slightly better than M-BERT and XLM-R. We attribute this to its pre-training on both multilingual and bilingual corpora; the other two models are pre-trained on a multilingual corpus only. For the cross-lingual NLG tasks, Unicoder performs significantly better than M-BERT and XLM-R. We attribute this to its use of generation tasks in the pre-training stage; the other two models are fine-tuned directly on these two downstream tasks without any generation pre-training.<\/p>\n<p>We also investigated the impact of different fine-tuning strategies:<\/p>\n<ol>\n<li><em>Pivot-language fine-tuning<\/em>, which fine-tunes a pre-trained model on labeled data in a single language, referred to here as the pivot language, and evaluates it on test sets in other languages. Interestingly, Spanish, Greek, and Turkish, rather than English, proved to be the most effective pivot languages on XNLI. This suggests that the average performance of a cross-lingual pre-trained model could be further improved by choosing the pivot language according to the downstream task.<\/li>\n<li><em>Multi-language fine-tuning<\/em>, which fine-tunes a pre-trained model on a combination of the labeled data available in different languages. This can yield significant gains on downstream tasks such as XNLI and NTG.<\/li>\n<li><em>Multi-task fine-tuning<\/em>, which fine-tunes a pre-trained model on the combined English labeled data of multiple downstream tasks. Whether joint fine-tuning helped varied by task. Further investigation is needed to better understand the relationships between tasks and how they can improve fine-tuning.<\/li>\n<\/ol>\n<h3>Looking forward<\/h3>\n<p>With XGLUE, we seek to leverage abundant English training data to support the development of models that can be applied to all languages: models with a truly universal language representation. 
Going forward, we will extend XGLUE to more languages and downstream tasks. We will also continue to advance cross-lingual pre-trained models by exploring new model structures, introducing new pre-training tasks, using different types of data, and expanding cross-lingual pre-training to other modalities such as images and videos.<\/p>\n<p><em>Acknowledgment: This research was conducted by Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, and Ming Zhou. We thank the Search Technology Center Asia NLP team, the Bing Answer team, the Bing Relevance team, the Bing Ads team, and the Microsoft News team for providing the real-world datasets.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What we can teach a model to do with natural language is dictated by the availability of data. 
Currently, we have a lot of labeled data for very few languages, making it difficult to train models to accomplish question answering, text summarization, and other tasks in every language and ultimately limiting the amount of people [&hellip;]<\/p>\n","protected":false},"author":38838,"featured_media":665694,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-664554","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[649749],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Yaobo Liang","user_id":36036,"display_name":"Yaobo Liang","author_link":"<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/yalia\/\" aria-label=\"Visit the profile page for Yaobo Liang\">Yaobo Liang<\/a>","is_active":false,"last_first":"Liang, Yaobo","people_section":0,"alias":"yalia"},{"type":"guest","value":"daniel-campos","user_id":"664566","display_name":"Daniel Campos","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/spacemanidol\/\" aria-label=\"Visit the 
profile page for Daniel Campos\">Daniel Campos<\/a>","is_active":true,"last_first":"Campos, Daniel","people_section":0,"alias":"daniel-campos"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image-960x540.png\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image-960x540.png 960w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image-300x169.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image-1024x576.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image-768x432.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image-1536x864.png 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image-1066x600.png 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image-655x368.png 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image-343x193.png 343w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image-640x360.png 640w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image-1280x720.png 1280w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2020\/06\/XGLUE-homepage-feat-image.png 1573w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"Nan Duan, <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/yalia\/\" title=\"Go to researcher profile for Yaobo Liang\" 
aria-label=\"Go to researcher profile for Yaobo Liang\" data-bi-type=\"byline author\" data-bi-cN=\"Yaobo Liang\">Yaobo Liang<\/a>, and <a href=\"https:\/\/www.linkedin.com\/in\/spacemanidol\/\" title=\"Go to researcher profile for Daniel Campos\" aria-label=\"Go to researcher profile for Daniel Campos\" data-bi-type=\"byline author\" data-bi-cN=\"Daniel Campos\">Daniel Campos<\/a>","formattedDate":"June 11, 2020","formattedExcerpt":"What we can teach a model to do with natural language is dictated by the availability of data. Currently, we have a lot of labeled data for very few languages, making it difficult to train models to accomplish question answering, text summarization, and other tasks&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/664554","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/users\/38838"}],"replies":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/comments?post=664554"}],"version-history":[{"count":10,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/664554\/revisions"}],"predecessor-version":[{"id":665733,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/664554\/revisions\/665733"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media\/665694"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=664554"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/res
earch\/wp-json\/wp\/v2\/categories?post=664554"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/tags?post=664554"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=664554"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=664554"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=664554"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=664554"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=664554"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=664554"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=664554"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=664554"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}