Euclean: Automated Geometry Problem Formalization with Unified Verification in Lean
- Linbin Tang ,
- Jingyan You ,
- Zilin Kang ,
- Hanzhang Liu ,
- Sophia Zhang ,
- Zenan Li ,
- Chenrui Cao ,
- Liangcheng Song ,
- Jiaao Wu ,
- Xian Zhang ,
- Fan Yang
2026 ICML 2026 |
Recent formal reasoning systems achieve IMO-level performance, but create a fragmented landscape: algebra and number theory use Lean, while geometry relies on domain-specific languages with limited formal guarantees. This fragmentation increases the trusted computing base and hinders unified model development. Existing geometry-in-Lean efforts (LeanEuclid, LeanGeo) introduce custom axiom systems incompatible with standard \mathlib{}, and their small scale (1K problems) prevents large-scale training. However, native \mathlib{} autoformalization of geometry poses unique challenges: explicating implicit diagrammatic assumptions (e.g., topological configuration and non-degeneracy)—unlike existing custom systems that defer validity checks to external solvers—and adapting to \mathlib{}’s small, rapidly-evolving geometric constructs. We present \method{}, a framework that addresses these challenges through a four-stage pipeline—constraint explication, configuration anchoring, formalization mapping, and iterative repair—to automatically formalize geometry in native \mathlib{}. We construct OMNI-Geometry (768 competition problems) and Numina-Geometry (177,597 problems), the largest geometry formalization dataset in Lean. Human evaluation shows 48.89\% TOP1 and 73.33\% TOP5 accuracy. Training Goedel v2 on our formalizations improves proof success from 13.6\% to 15.1\%, validating dataset quality for unified neural theorem proving.