{"id":1170806,"date":"2026-05-06T10:27:46","date_gmt":"2026-05-06T17:27:46","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/energy-use-of-ai-inference-efficiency-pathways-and-test-time-scaling\/"},"modified":"2026-05-07T14:58:43","modified_gmt":"2026-05-07T21:58:43","slug":"energy-use-of-ai-inference-efficiency-pathways-and-test-time-scaling","status":"publish","type":"msr-research-item","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/energy-use-of-ai-inference-efficiency-pathways-and-test-time-scaling\/","title":{"rendered":"Energy use of AI inference, efficiency pathways, and test-time scaling"},"content":{"rendered":"<p>As artificial intelligence (AI) becomes widely used and new workflows scale token demand, questions about how much energy large language models (LLMs) consume are increasingly important for grid planning, AI development, and energy policy. Rapid expansion of AI data-center capacity further increases the importance of accurate energy estimates. Public estimates of AI inference energy often overlook how efficiently large-scale inference systems operate and how rapidly efficiency improvements are reducing per-query energy use. Under realistic assumptions for large-scale deployments, this study finds that optimized frontier-scale inference consumes sub-watt-hour energy per query (median 0.31 Wh, interquartile range [IQR] 0.16\u20130.60), aligning with multiple independent measurements and estimates under production conditions. It also shows that high-profile public estimates and measurements based on non-production assumptions overstate energy use by 4\u201320\u00d7. Long reasoning and agentic queries, however, are estimated to increase energy consumption by more than an order of magnitude due to increased token generation and reduced serving concurrency. At data-center scale, serving billions of queries per day translates into substantial electricity demand, and even a modest share of long reasoning queries, for example, 10% of daily requests, can more than double total energy consumption. Encouragingly, recent efficiency improvements in model design, serving systems, and hardware could together reduce energy use by 8\u201320\u00d7, helping to keep overall demand manageable even as AI inference scales to billions of queries per day. This pattern echoes how data centers historically tempered energy demand through efficiency gains during the internet and cloud buildup.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As artificial intelligence (AI) becomes widely used and new workflows scale token demand, questions about how much energy large language models (LLMs) consume are increasingly important for grid planning, AI development, and energy policy. Rapid expansion of AI data-center capacity further increases the importance of accurate energy estimates. Public estimates of AI inference energy often [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2026-04-01","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":false,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[{"provider":"s2","id":"5dff1eb6df018110d0da578a6b640a295a3a95a4"},{"provider":"doi","id":"10.1016\/j.joule.2026.102430"}],"msr_hide_image_in_river":null,"footnotes":""},"msr-research-highlight":[],"research-area":[13556],"msr-publication-type":[193715],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[246694,251950],"msr-conference":[],"msr-journal":[270325],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1170806","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-field-of-study-artificial-intelligence","msr-field-of-study-efficient-energy-use"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2026-04-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":0,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"doi","viewUrl":"false","id":"false","title":"https:\/\/doi.org\/10.1016\/j.joule.2026.102430","label_id":"243106","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"Felipe Oviedo","user_id":39925,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Felipe Oviedo"},{"type":"user_nicename","value":"Fiodar Kazhamiaka","user_id":42714,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Fiodar Kazhamiaka"},{"type":"user_nicename","value":"Esha Choukse","user_id":40417,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Esha Choukse"},{"type":"user_nicename","value":"Allen Kim","user_id":41704,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Allen Kim"},{"type":"user_nicename","value":"Amy Luers","user_id":41269,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Amy Luers"},{"type":"name","value":"Melanie Nakagawa","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Ricardo Bianchini","user_id":33393,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ricardo Bianchini"},{"type":"user_nicename","value":"Juan M. Lavista Ferres","user_id":39552,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Juan M. Lavista Ferres"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[696544],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"article","related_content":[],"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1170806","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1170806\/revisions"}],"predecessor-version":[{"id":1171099,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1170806\/revisions\/1171099"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1170806"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=1170806"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1170806"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1170806"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=1170806"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1170806"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1170806"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1170806"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1170806"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1170806"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1170806"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1170806"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1170806"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}