{"id":1169861,"date":"2026-04-27T11:16:51","date_gmt":"2026-04-27T18:16:51","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/koala-efficient-pipeline-training-through-automated-schedule-searching-on-domain-specific-language\/"},"modified":"2026-05-08T15:02:10","modified_gmt":"2026-05-08T22:02:10","slug":"koala-efficient-pipeline-training-through-automated-schedule-searching-on-domain-specific-language","status":"publish","type":"msr-research-item","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/koala-efficient-pipeline-training-through-automated-schedule-searching-on-domain-specific-language\/","title":{"rendered":"Koala: Efficient Pipeline Training through Automated Schedule Searching on Domain-Specific Language"},"content":{"rendered":"<p>Pipeline parallelism is a crucial technique for large-scale model training, enabling parameter partitioning and improved training performance. However, creating effective pipeline schedules often requires significant manual effort and coding skill, which makes schedules inconvenient to develop and difficult to debug. Major frameworks such as DeepSpeed and ColossalAI simplify the process by adopting predefined pipeline schedule strategies, such as GPipe and 1F1B. Predefined schedules, however, offer limited flexibility and can yield suboptimal training efficiency, because a small number of manually chosen candidates cannot provide the optimal strategy for an arbitrary model. To address this issue, this article aims to search for the optimal strategy automatically and efficiently. Because current frameworks support only a limited set of fixed strategies and lack the capability to construct a comprehensive strategy search space, we first design a novel domain-specific language (DSL) for pipeline schedule development. The DSL is easy to understand, flexible, and reusable, and it supports the development of all known pipeline schedule strategies and their variants. 
Second, we are the first to model the complete pipeline schedule strategy space via the DSL, enabling an automated, end-to-end search for the globally optimal pipeline schedule, whereas past work may get stuck in a local optimum. Finally, we optimize pipeline performance by modeling the pipeline schedule as a Binary-Tree-Traversing (BTT) optimization problem and solving it. Based on this formalization, we further adopt a Dynamic Try-Test Genetic Algorithm to search for the best pipeline schedule strategy, which outperforms a variety of predefined ones. Experimental results show that Koala improves performance by up to 1.53× over state-of-the-art approaches. In addition, the pipeline schedule strategy found by Koala outperforms predefined pipeline schedule strategies by 1.10×–1.55×. Moreover, Koala shows superior scalability and combines effectively with data parallelism and tensor parallelism.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Pipeline parallelism is a crucial technique for large-scale model training, enabling parameter splitting and performance enhancement. However, creating effective pipeline schedules often requires significant manual effort and coding skills, leading to practical inconveniences and complex debugging. 
Major frameworks such as DeepSpeed and ColossalAI simplify the process by adopting predefined pipeline schedule strategies, such as GPipe [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"1","msr_page_range_end":"25","msr_series":"","msr_volume":"22","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2025-03-07","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":false,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[{"provider":"s2","id":"a3f657d21eb9762783d1ab424096cbc230bcfcb6"},{"provider":"doi","id":"10.1145\/3722113"}],"msr_hide_image_in_river":null,"footnotes":""},"msr-research-highlight":[],"research-area":[13546,13553],"msr-publication-type":[193715],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[246691,268170],"msr-conference":[],"msr
-journal":[270300],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1169861","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-computational-sciences-mathematics","msr-research-area-medical-health-genomics","msr-locale-en_us","msr-field-of-study-computer-science","msr-field-of-study-systems-and-networking"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2025-03-07","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"22","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":0,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"doi","viewUrl":"false","id":"false","title":"https:\/\/doi.org\/10.1145\/3722113","label_id":"243106","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/dblp.org\/rec\/journals\/taco\/TangYLZLZQLL25.html","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"name","value":"Yu Tang","user_id":0,"rest_url":false},{"type":"name","value":"Lujia Yin","user_id":0,"rest_url":false},{"type":"name","value":"Qiao Li","user_id":0,"rest_url":false},{"type":"name","value":"Hongyu Zhu","user_id":0,"rest_url":false},{"type":"name","value":"Hengjie 
Li","user_id":0,"rest_url":false},{"type":"name","value":"Xingcheng Zhang","user_id":0,"rest_url":false},{"type":"name","value":"Linbo Qiao","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Dongsheng Li","user_id":39402,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Dongsheng Li"},{"type":"name","value":"Jiaxin Li","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"article","related_content":[],"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1169861","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1169861\/revisions"}],"predecessor-version":[{"id":1171225,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1169861\/revisions\/1171225"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1169861"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=1169861"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1169861"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1169861"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\
/research\/wp-json\/wp\/v2\/msr-publisher?post=1169861"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1169861"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1169861"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1169861"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1169861"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1169861"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1169861"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1169861"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1169861"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}