{"id":171315,"date":"2014-03-24T02:04:37","date_gmt":"2014-03-24T02:04:37","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/voice-conversion-with-neural-network\/"},"modified":"2016-04-08T20:28:03","modified_gmt":"2016-04-08T20:28:03","slug":"voice-conversion-with-neural-network","status":"publish","type":"msr-project","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/voice-conversion-with-neural-network\/","title":{"rendered":"Voice Conversion with Neural Network"},"content":{"rendered":"<div class=\"asset-content\">Sequence Error (SE) Minimization Training of Neural Network for Voice Conversion<\/div>\n<p><!-- .asset-content --><\/p>\n<div id=\"en-usprojectsvcnndefault\" class=\"page-content\">\n<p>Neural network (NN) based voice conversion, which employs a nonlinear function to map the features from a source to a target speaker, has been shown to outperform GMM-based voice conversion approaches. However, there are still limitations to be overcome in NN-based voice conversion: the NN is trained on a frame error (FE) minimization criterion, and the corresponding weights are adjusted to minimize the squared error over the whole source-target stereo training data set. In this paper, we borrow the idea of sentence-level optimization from minimum generation error (MGE) training in HMM-based TTS synthesis, and change the frame error (FE) minimization to Sequence Error (SE) minimization in NN training for voice conversion. The conversion error over a training sentence from a source speaker to a target speaker is minimized via a gradient descent-based back propagation (BP) procedure. Experimental results show that speech converted by the NN, which is first trained with frame error minimization and then refined with sequence error minimization, sounds subjectively better than speech converted by an NN trained with frame error minimization only. 
Scores on both naturalness and similarity to the target speaker are improved.<\/p>\n<p>Some samples (click to play)<\/p>\n<p><strong>Source\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Target\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0FE\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SE<\/strong><\/p>\n<\/div>\n<table class=\"blue tWiz\" style=\"height: 135px;\" width=\"719\">\n<tbody>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0118_source_bdl.wav\">BDL<\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0118_target_slt.wav\">SLT <\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0118_fe_bdl2slt.wav\">BDL to SLT<\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0118_se_bdl2slt.wav\">BDL to SLT<\/a><\/strong><\/td>\n<\/tr>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0156_source_rms.wav\">RMS <\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a 
class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0118_target_slt.wav\">SLT <\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0118_fe_bdl2slt.wav\">BDL to SLT<\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0118_se_bdl2slt.wav\">BDL to SLT<\/a><\/strong><\/td>\n<\/tr>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_source_slt.wav\">SLT <\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0118_source_bdl.wav\">BDL<\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong>SLT\u00a0to\u00a0BDL<\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong>SLT\u00a0to\u00a0BDL <\/strong><\/td>\n<\/tr>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_source_slt.wav\">SLT <\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0118_source_bdl.wav\">BDL<\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong>SLT\u00a0to\u00a0BDL<\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong>SLT\u00a0to\u00a0BDL<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<table class=\"blue tWiz\" 
style=\"height: 135px;\" width=\"720\">\n<tbody>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_source_slt.wav\">SLT <\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_target_clb.wav\">CLB<\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_fe_slt2clb.wav\">SLT to CLB<\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_se_slt2clb.wav\">SLT to CLB<\/a><\/strong><\/td>\n<\/tr>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0118_target_slt.wav\">SLT <\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_target_clb.wav\">CLB<\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_fe_slt2clb.wav\">SLT to CLB<\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0156_se_rms2bdl.wav\">RMS to BDL<\/a> <\/strong><\/td>\n<\/tr>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td 
class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_target_clb.wav\">CLB<\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_source_slt.wav\">SLT <\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong>CLB to\u00a0SLT<\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0142_se_bdl2rms.wav\">BDL to RMS<\/a> <\/strong><\/td>\n<\/tr>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_target_clb.wav\">CLB<\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0125_source_slt.wav\">SLT <\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong>CLB to\u00a0SLT<\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0146_se_bdl2rms.wav\">BDL to RMS<\/a><\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<table class=\"blue tWiz\" style=\"height: 135px;\" width=\"719\">\n<tbody>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0152_source_rms.wav\">RMS <\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" 
href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0152_target_bdl.wav\">BDL <\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0152_fe_rms2bdl.wav\">RMS to BDL<\/a> <\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0152_se_rms2bdl.wav\">RMS to BDL<\/a> <\/strong><\/td>\n<\/tr>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0156_source_rms.wav\">RMS <\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0156_target_bdl.wav\">BDL <\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0156_fe_rms2bdl.wav\">RMS to BDL<\/a> <\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0156_se_rms2bdl.wav\">RMS to BDL<\/a> <\/strong><\/td>\n<\/tr>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0142_source_bdl.wav\">BDL <\/a><\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" 
href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0142_target_rms.wav\">RMS<\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0142_fe_bdl2rms.wav\">BDL to RMS<\/a> <\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0142_se_bdl2rms.wav\">BDL to RMS<\/a> <\/strong><\/td>\n<\/tr>\n<tr class=\"blueTableOddRow\" style=\"background-color: #9cc6e6;\">\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0146_source_bdl.wav\">BDL<\/a> <\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0146_target_rms.wav\">RMS<\/a><\/strong><\/td>\n<td class=\"blueTableEvenCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0146_fe_bdl2rms.wav\">BDL to RMS<\/a> <\/strong><\/td>\n<td class=\"blueTableOddCol\"><strong><a class=\"invalidLink\" href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/vcnn-arctic_a0146_se_bdl2rms.wav\">BDL to RMS<\/a><\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sequence Error (SE) Minimization Training of Neural Network for Voice Conversion Neural network (NN) based voice conversion, which employs a nonlinear function to map the features from a source to a target speaker, has been shown to outperform GMM-based voice conversion approaches. 
However, there are still limitations to be overcome in NN-based voice conversion: NN [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-171315","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2014-03-24","related-publications":[],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[],"msr_research_lab":[199560],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171315","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":1,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171315\/revisions"}],"predecessor-version":[{"id":213331,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171315\/revisions\/213331"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=171315"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=171315"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=171315"},{"taxonomy":"msr-impact-th
eme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=171315"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=171315"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}