
Translator Human Parity Data

Human evaluation results and translation output for the Translator Human Parity Data release.

  • Version: 1.0
    Date Published: 15/07/2024
    File Name: Translator-HumanParityData-v1.0.zip
    File Size: 1.3 MB

    Human evaluation results and translation output for the Translator Human Parity Data release, as described in https://blogs.microsoft.com/ai/machine-translation-news-test-set-human-parity/

    The Translator Human Parity Data release contains all human evaluation results and translations related to our paper "Achieving Human Parity on Automatic Chinese to English News Translation", published on March 14, 2018. We have released this data to 1) allow external validation of our claim to have achieved human parity and 2) foster future research by providing two additional human references for the Reference-WMT test set.

    The package includes:

    1) two new references for newstest2017, one produced by human translation from scratch (Reference-HT) and the other by human post-editing (Reference-PE);
    2) human parity translations generated by our research systems Combo-4, Combo-5, and Combo-6, as well as translation output from the online machine translation service Online-A-1710, collected on October 16, 2017;
    3) all data points collected in our human evaluation campaigns, including annotations for Subset-1, Subset-2, Subset-3, and Subset-4.

    We share the (anonymized) annotator IDs, segment IDs, system IDs, type IDs (either TGT or CHK, where CHK is a repeated judgment of the corresponding TGT item), raw scores r in [0,100], and annotation start and end times. Additionally, we share the combined data for the Meta-1 campaign on Subset-1.
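The annotation fields listed above can be illustrated with a short Python sketch. It is not part of the release and is not the scoring procedure used in the paper: the `Judgment` record, its field names and types, and the helper functions are hypothetical, and no loader is shown because this page does not describe the file layout inside the ZIP archive. It also assumes that each CHK judgment repeats the TGT judgment made by the same annotator for the same segment and system. The sketch averages raw TGT scores per system and pairs each CHK judgment with the TGT judgment it repeats, one simple way to inspect annotator consistency.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


# Hypothetical record mirroring the fields listed for each human judgment;
# the actual column names and file format in the release may differ.
@dataclass(frozen=True)
class Judgment:
    annotator_id: str
    segment_id: int
    system_id: str
    type_id: str      # "TGT" (original judgment) or "CHK" (repeated judgment)
    raw_score: float  # raw score r in [0, 100]
    start_time: float
    end_time: float


def mean_score_per_system(judgments):
    """Average raw TGT score per system; CHK repeats are excluded."""
    scores = defaultdict(list)
    for j in judgments:
        if not 0 <= j.raw_score <= 100:
            raise ValueError(f"raw score out of range: {j.raw_score}")
        if j.type_id == "TGT":
            scores[j.system_id].append(j.raw_score)
    return {system: mean(values) for system, values in scores.items()}


def repeat_pairs(judgments):
    """Return (TGT score, CHK score) pairs.

    Assumption (not stated by the release): a CHK item repeats the TGT item
    with the same annotator, segment, and system.
    """
    originals = {
        (j.annotator_id, j.segment_id, j.system_id): j.raw_score
        for j in judgments
        if j.type_id == "TGT"
    }
    pairs = []
    for j in judgments:
        key = (j.annotator_id, j.segment_id, j.system_id)
        if j.type_id == "CHK" and key in originals:
            pairs.append((originals[key], j.raw_score))
    return pairs
```

Raw scores are averaged here only for simplicity; annotators use the 0 to 100 scale differently, so a serious comparison between systems would likely need some per-annotator normalization.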

    When using this data, we require that you cite our paper, which is available here: https://cm-edgetun.pages.dev/en-us/research/publication/achieving-human-parity-on-automatic-chinese-to-english-news-translation/
  • Supported Operating Systems

    Windows 10