{"id":921747,"date":"2023-02-21T21:56:10","date_gmt":"2023-02-22T05:56:10","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/"},"modified":"2023-02-21T21:56:10","modified_gmt":"2023-02-22T05:56:10","slug":"dataset-and-baseline-system-for-multi-lingual-extraction-and-normalization-of-temporal-and-numerical-expressions","status":"publish","type":"msr-research-item","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/dataset-and-baseline-system-for-multi-lingual-extraction-and-normalization-of-temporal-and-numerical-expressions\/","title":{"rendered":"Dataset and Baseline System for Multi-lingual Extraction and Normalization of Temporal and Numerical Expressions"},"content":{"rendered":"<p>Temporal and numerical expression understanding is of great importance in many downstream Natural Language Processing (NLP) and Information Retrieval (IR) tasks. However, much previous work covers only a few sub-types and focuses only on entity extraction, which severely limits the usability of identified mentions. In order for such entities to be useful in downstream scenarios, the coverage and granularity of sub-types are important, and even more so is their resolution into concrete values that can be manipulated. Moreover, most previous work addresses only a handful of languages. 
Here we propose a multi-lingual evaluation dataset &#8211; NTX &#8211; covering diverse temporal and numerical expressions across 14 languages, including extraction, normalization, and resolution. Along with the dataset, we provide a robust rule-based system as a strong baseline for comparison against other models evaluated on this dataset. Data and code can be accessed at https:\/\/aka.ms\/NTX.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Temporal and numerical expression understanding is of great importance in many downstream Natural Language Processing (NLP) and Information Retrieval (IR) tasks. However, much previous work covers only a few sub-types and focuses only on entity extraction, which severely limits the usability of identified mentions. 
In order for such entities to be useful in\u00a0downstream scenarios, the [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"MSR-TR-2023-9","msr_organization":"Microsoft Research","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2023-2-1","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13545],"msr-publication-type":[193718],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-921747","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-a
rea-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2023-2-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"MSR-TR-2023-9","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"Microsoft Research","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/aka.ms\/NTX","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Sanxing Chen","user_id":0,"rest_url":false},{"type":"text","value":"Yongqiang Chen","user_id":0,"rest_url":false},{"type":"user_nicename","value":"B\u00f6rje F. Karlsson","user_id":31280,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=B\u00f6rje F. 
Karlsson"}],"msr_impact_theme":[],"msr_research_lab":[199560],"msr_event":[],"msr_group":[144919,714577],"msr_project":[714646],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"techreport","related_content":{"projects":[{"ID":714646,"post_title":"VERT: Versatile Entity Recognition &amp; Disambiguation Toolkit","post_name":"vert-versatile-entity-recognition-disambiguation-toolkit","post_type":"msr-project","post_date":"2020-12-30 02:54:35","post_modified":"2021-10-13 21:15:01","post_status":"publish","permalink":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/vert-versatile-entity-recognition-disambiguation-toolkit\/","post_excerpt":"While knowledge about entities is a key building block in the mentioned systems, creating effective\/efficient models for real-world scenarios remains a challenge (tech\/data\/real workloads). Based on such needs, we've created VERT - a Versatile Entity Recognition &amp; Disambiguation Toolkit. VERT is a pragmatic toolkit that combines rules and ML, offering both powerful pretrained models for core entity types (recognition and linking) and the easy creation of custom models. 
Custom models use our deep learning-based NER\/EL&hellip;","_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/714646"}]}}]},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/921747","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/921747\/revisions"}],"predecessor-version":[{"id":921756,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/921747\/revisions\/921756"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=921747"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=921747"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=921747"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=921747"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=921747"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=921747"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=921747"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/re
search\/wp-json\/wp\/v2\/msr-post-option?post=921747"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=921747"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=921747"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=921747"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=921747"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=921747"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}