Euclean: Automated Geometry Problem Formalization with Unified Verification in Lean

  • Linbin Tang ,
  • Jingyan You ,
  • Zilin Kang ,
  • Hanzhang Liu ,
  • Sophia Zhang ,
  • Zenan Li ,
  • Chenrui Cao ,
  • Liangcheng Song ,
  • Jiaao Wu ,
  • ,

2026 ICML 2026 |

Recent formal reasoning systems achieve IMO-level performance, but create a fragmented landscape: algebra and number theory use Lean, while geometry relies on domain-specific languages with limited formal guarantees. This fragmentation increases the trusted computing base and hinders unified model development. Existing geometry-in-Lean efforts (LeanEuclid, LeanGeo) introduce custom axiom systems incompatible with standard \mathlib{}, and their small scale (1K problems) prevents large-scale training. However, native \mathlib{} autoformalization of geometry poses unique challenges: explicating implicit diagrammatic assumptions (e.g., topological configuration and non-degeneracy)—unlike existing custom systems that defer validity checks to external solvers—and adapting to \mathlib{}’s small, rapidly-evolving geometric constructs. We present \method{}, a framework that addresses these challenges through a four-stage pipeline—constraint explication, configuration anchoring, formalization mapping, and iterative repair—to automatically formalize geometry in native \mathlib{}. We construct OMNI-Geometry (768 competition problems) and Numina-Geometry (177,597 problems), the largest geometry formalization dataset in Lean. Human evaluation shows 48.89\% TOP1 and 73.33\% TOP5 accuracy. Training Goedel v2 on our formalizations improves proof success from 13.6\% to 15.1\%, validating dataset quality for unified neural theorem proving.

GitHubGitHub