{"id":511253,"date":"2018-10-18T08:55:20","date_gmt":"2018-10-18T15:55:20","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?p=511253"},"modified":"2018-10-18T15:23:18","modified_gmt":"2018-10-18T22:23:18","slug":"the-poet-in-the-machine-auto-generation-of-poetry-directly-from-images-through-multi-adversarial-training-and-a-little-inspiration","status":"publish","type":"post","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/blog\/the-poet-in-the-machine-auto-generation-of-poetry-directly-from-images-through-multi-adversarial-training-and-a-little-inspiration\/","title":{"rendered":"The poet in the machine: Auto-generation of poetry directly from images through multi-adversarial training \u2013 and a little inspiration"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"size-large wp-image-511256 aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-1024x576.png\" alt=\"Bei Liu and Jianlong Fu of Microsoft research\" width=\"1024\" height=\"576\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-1024x576.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-300x169.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-768x432.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-1066x600.png 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-655x368.png 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-343x193.png 343w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<p>As the means of expressing thoughts and feelings too sublime and elusive to be conveyed in everyday language, poetry across all cultures occupies a most sophisticated and mysterious of realms, just beyond the outskirts of creativity. Along the language-based avenues of expression available to people, poetry represents a departure from casual speech; few possess the talent to produce it. Yet successful poetry is instantly recognized and appreciated by millions when it is encountered.<\/p>\n<p>To achieve true poetry then requires a very special and little understood brand of creativity. As difficult as it is for people to write it, imagine the challenge of designing artificial intelligence that might mimic the rare ability to produce poetic language in response to experience and existence.<\/p>\n<p>A team of talented researchers at Microsoft Research Asia set out to attempt just that. In a paper titled \u201c<a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/beyond-narrative-description-generating-poetry-from-images-by-multi-adversarial-training\/\">Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training<\/a>\u201d to be presented at the Association for Computing Machinery\u2019s Multimedia 2018 conference in Seoul, South Korea, October 22\u201326, researchers Bei Liu and <a href=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/people\/jianf\/\">Jianlong Fu<\/a> of Microsoft Research Asia in Beijing, along with teammates Makoto P. Kato and Masatoshi Yoshikawa of Kyoto University, took an imaginative approach to the quest of generating poetic language in response to images for automatic poetry creation, opening new possibilities for augmenting human endeavor. 
The project involved multiple challenges, including discovering poetic clues from images, as well as generating poems that would satisfy both relevance to an image and \u2014 something difficult to define but not disputed to exist \u2014 poeticness, to use the term coined by the researchers.<\/p>\n<p>Automatic text generation from images is a field of research that has generated a lot of interest in recent years in, for example, automated image captioning and intelligent description and automatic sentence generation for images. These applications are concerned more with accuracy and utility. But compared with image captioning and paragraphing, things get very challenging when the objective is to wax poetic. Why? Relatively speaking, there is a philosophical chasm between mere visual representations and the poetic concepts and emotions that can be inspired in people by images that could then potentially be used to generate better \u2014 more poetic \u2014 poems. One illuminating figure in the ACM Multimedia 2018 paper illustrates the difference between mere textual description and a poem for the same image:<\/p>\n<div id=\"attachment_511259\" style=\"width: 635px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-511259\" class=\"wp-image-511259 size-full\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/A-falcon-eating-figure-2.png\" alt=\"\" width=\"625\" height=\"221\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/A-falcon-eating-figure-2.png 625w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/A-falcon-eating-figure-2-300x106.png 300w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><p id=\"caption-attachment-511259\" class=\"wp-caption-text\">The difference between a human-written description and poem, based on the same image. 
Instead of describing the image factually, a poem tends to perceive a deeper meaning and poetic symbols from objects \u2013 knight from falcon, hunting and fight versus eating, and so on.<\/p><\/div>\n<p>Understanding clearly what the challenge involved was central to the researchers\u2019 success. \u201cGenerating a poem from an image is a cross-modality problem compared with poem generation from topics,\u201d said researcher Jianlong Fu. A tried-and-true and perhaps more intuitive way to generate poems from images is to first extract keywords or captions from the images and then use these as seeds for poem generation. However, as fellow researcher Bei Liu pointed out, keywords or captions miss much of the information in images, particularly the poetic clues that are important for poem generation. Compared with image captioning and image paragraphing, poem generation from images is a far more subjective task.<\/p>\n<p>The researchers also were aware that the style and form of lines in poetry are vastly different from those found in narrative sentences. The team made an early decision to concentrate on free verse, a much more open form of poetry, abandoning any requirements that the poem should rhyme or adhere to meter. This still allowed for a real sense of poetic structures and poetic language, what the team decided to call poeticness. By definition, such poems and the lines they contained wouldn\u2019t be overly long. Word selections reflected words preferred by real poets versus the more mundane or literal words found in image descriptions. 
The team also recognized that poems favor particular words over those common in image descriptions and that the lines of a poem stay consistent with an overarching theme.<\/p>\n<blockquote>\n<p style=\"text-align: center;\"><strong>\u201cThe possibility of creating human-level content by AI requires building a deep and multi-modal understanding model spanning vision and language boundaries. Our team has consistently accelerated AI innovation to advance this dream.\u201d \u2013 Jianlong Fu<\/strong><\/p>\n<\/blockquote>\n<p>Approaching their unique set of challenges, the researchers assembled two poem datasets using human annotators and decided on a methodology that integrates retrieval and generation techniques in one system. The team tested their methodology in extensive experiments on more than 8,000 images and evaluated the results using both automatic metrics and people.<\/p>\n<p>To better learn poetic clues from images for poem generation, they first learned a deep coupled visual-poetic embedding model with CNN features of images and skip-thought vector features of poems from a multi-modal poem dataset, MultiM-Poem, that consisted of thousands of image-poem pairs. This embedding model was then used to retrieve relevant and diverse poems for images from a larger uni-modal poem corpus, UniM-Poem. Images with these retrieved poems and MultiM-Poem together constructed an enlarged image-poem pair dataset (MultiM-Poem (Ex)). The team took it a step further, deciding to leverage state-of-the-art sequential learning techniques for training an end-to-end image-to-poem model on the MultiM-Poem (Ex) dataset. This framework would ensure that the more robust poetic clues significant for poem generation would be discovered and modeled from the extended pairs. 
Two discriminative networks were used to provide rewards based on the generated poem\u2019s relevance to the given image and its poeticness.<\/p>\n<p>The generated poems were evaluated in both objective and subjective ways. The team defined evaluation metrics with regard to relevance, novelty, and translative consistence, and then conducted user studies on the relevance, coherence, and imaginativeness of generated poems to compare their model with existing methods.<\/p>\n<p>The point of this research is not to have AI replace poets. It is about the many ways in which even mildly creative AI could augment creative activity and achievement. Although the researchers acknowledge that truly creative AI is still very far away, the boldness of their project and the encouraging results have been inspiring. Setting a non-sentient machine to take a run at English free verse, a genre that is rather elusive even for motivated souls, whether you\u2019re a besotted high school student or Bob Dylan himself, got the attention of fellow researchers in the space. That the early results aren\u2019t on par with the Bard is beside the point; to the best of the researchers\u2019 knowledge, theirs represents the first attempt to study the image-inspired English poem generation problem in a holistic framework, enabling a machine to approach human-level capability in cognition tasks.<\/p>\n<p>They incorporated a deep coupled visual-poetic embedding model and an RNN-based generator for joint learning in which two discriminators provide rewards for measuring cross-modality relevance and poeticness by multi-adversarial training. On the way, they built the first dataset of image-poem pairs annotated by people, pairing it with the largest public poem corpus dataset. 
Extensive experimentation, relying on objective and subjective evaluation metrics, demonstrated the effectiveness of their approach; the evaluation included a Turing test with more than 500 people, 30 of them poetry experts. In a spirit of sharing and to encourage further research in poetry generation from images, the team has released their datasets and code on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/bei21\/img2poem\">GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<p>\u201cIt\u2019s important to point out that we didn\u2019t define what poeticness is, which actually is difficult to define,\u201d said Liu. \u201cWe tried to make the machine learn both from poems and non-poems, so it could distinguish whether generated sentences were in a poem style or not.\u201d<\/p>\n<p>The results of the efforts are both fascinating and encouraging. Both subjective and objective evaluations showed that the approach outperforms existing state-of-the-art methods for poem generation from images.<\/p>\n<blockquote>\n<p style=\"text-align: center;\"><strong>\u201cWe have introduced a new artist. And I hope he\/she can help prompt more people\u2019s interest in art.\u201d \u2013 Bei Liu<\/strong><\/p>\n<\/blockquote>\n<p>Machine generation of poems has been tried before. XiaoIce, the Microsoft advanced natural language chatbot, has been dabbling in Chinese poetry aimed at entertaining Chinese users for a couple of years now. But XiaoIce relies on keywords to generate poetry, whereas this team\u2019s model generates poems directly from images, end to end, without relying on keywords as an intermediate result.<\/p>\n<p>Again, why the quest to achieve machine-generated poetry? There are both aesthetic and commercial motivations. 
Fu sees advances in the creative abilities of AI as augmenting human creativity in spaces such as gaming, image generation, and the fashion industry. \u201cTo be honest, the current creative capability of AI is still far from that of people,\u201d said Fu. \u201cBut we believe AI could be a type of assistant \u2018who\u2019 can help to reduce redundant work in design for artists and designers in the future.\u201d<\/p>\n<h3>Parallel research<\/h3>\n<p>What are the researchers looking at next? Storytelling. Their current project aims to generate a story from multiple images, which they call visual storytelling. Right now, generation is limited to general sentences and constrained by the dataset. Fu and Liu are trying to introduce extra signals into the mix, such as emotion, to generate the stories. Given a picture, a model built to simulate human emotions will first generate a distribution of emotions; indeed, multiple emotions can relate to an image because every individual may have different views based on experience, culture, and identity. The model would automatically sample a specific emotion from the distribution and then use this signal as an input to generate the final story. \u201cThis could enhance the model to generate more diverse stories and thus resemble something more created by an individual,\u201d said Fu. \u201cWe are trying to incorporate random signals in training the model to create a more unique machine, that is, one more resembling a subjective individual.\u201d<\/p>\n<p>Liu pointed out that the machine\u2019s eventual individuality may be unavoidable. 
\u201cAlthough we currently are training the machine to simulate how most people might respond, the machine nevertheless progressively learns from its own subjective experience, just as do individual people.\u201d<\/p>\n<h3>Moved by AI<\/h3>\n<p>And was there a favorite poem for Liu, among all those the AI created during the study?<\/p>\n<p>Liu smiled. \u201cIt was this one,\u201d she said, pasting a poem along with the image that inspired it (see below).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-511262 aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/Poetic-image-figure-3.jpg\" alt=\"poetic image of a field through the trees\" width=\"500\" height=\"377\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/Poetic-image-figure-3.jpg 500w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/Poetic-image-figure-3-300x226.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/Poetic-image-figure-3-80x60.jpg 80w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/Poetic-image-figure-3-240x180.jpg 240w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<p><em>The sun is shining<\/em><br \/>\n<em>The wind moves<\/em><br \/>\n<em>Naked trees<\/em><br \/>\n<em>You dance<\/em><\/p>\n<p>\u201cThis is a poem simple in language, inspired by a very common image we might easily glimpse in daily life,\u201d she said. \u201cAnd yet it seems real.\u201d She draws our attention to the ambiguity of the word \u201cyou\u201d in the final line. \u201cIt could be the tree. A friend. Even myself. 
I think this is the magic of poetry, and our work has the potential to create that magic.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As the means of expressing thoughts and feelings too sublime and elusive to be conveyed in everyday language, poetry across all cultures occupies a most sophisticated and mysterious of realms, just beyond the outskirts of creativity. Along the language-based avenues of expression available to people, poetry represents a departure from casual speech; few possess the [&hellip;]<\/p>\n","protected":false},"author":37074,"featured_media":511256,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[194471],"tags":[],"research-area":[13562,13551],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-511253","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-vision","msr-research-area-computer-vision","msr-research-area-graphics-and-multimedia","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144736,144916],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788.png\" class=\"img-object-cover\" alt=\"Bei Liu and Jianlong Fu of Microsoft research\" 
decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788.png 1400w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-300x169.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-768x432.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-1024x576.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-1066x600.png 1066w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-655x368.png 655w, https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-content\/uploads\/2018\/10\/ACM-Multimedia-Blog_Site_10_2018_1400x788-343x193.png 343w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"October 18, 2018","formattedExcerpt":"As the means of expressing thoughts and feelings too sublime and elusive to be conveyed in everyday language, poetry across all cultures occupies a most sophisticated and mysterious of realms, just beyond the outskirts of creativity. 
Along the language-based avenues of expression available to people,&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/511253","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/users\/37074"}],"replies":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/comments?post=511253"}],"version-history":[{"count":9,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/511253\/revisions"}],"predecessor-version":[{"id":544095,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/posts\/511253\/revisions\/544095"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media\/511256"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=511253"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/categories?post=511253"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/tags?post=511253"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=511253"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=511253"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=511253"},{"taxonomy
":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=511253"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=511253"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=511253"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=511253"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=511253"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}