We introduce ProcGen3D, a new approach for 3D content creation by generating procedural graph abstractions of 3D objects, which can then be decoded into rich, complex 3D assets. Inspired by the prevalent use of procedural generators in production 3D applications, we propose a sequentialized, graph-based procedural representation for 3D assets. We use this representation to learn to approximate the landscape of a procedural generator for image-based 3D reconstruction. We employ edge-based tokenization to encode the procedural graphs, and train a transformer prior to predict the next token conditioned on an input RGB image. Crucially, to better align our generated outputs with an input image, we incorporate Monte Carlo Tree Search (MCTS) guided sampling into our generation process, steering output procedural graphs towards more image-faithful reconstructions. Our approach is applicable across a variety of objects that can be synthesized with procedural generators. Extensive experiments on cacti, trees, and bridges show that our neural procedural graph generation outperforms both state-of-the-art generative 3D methods and domain-specific modeling techniques. Furthermore, our approach generalizes better to real-world input images, despite training only on synthetic data.
ProcGen3D enables high-fidelity image-to-3D reconstruction across diverse categories of procedurally generated objects by developing a sequentialized, graph-based representation. Using an autoregressive transformer, we predict a procedural graph from an input image, which is then decoded by a procedural generator into a high-fidelity 3D asset.
We adopt a GPT-style transformer to model procedural graph generation. Each graph is sequentialized into edge-based tokens, where each token encodes the spatial positions of its two endpoint vertices, along with their corresponding attributes and the attributes of the edge itself. The transformer is trained autoregressively for next-token prediction, conditioned on the input RGB image. To better align the predicted graph with the input condition, we introduce Monte Carlo Tree Search (MCTS)-guided sampling into our inference process.
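To make the edge-based tokenization concrete, the following is a minimal sketch of how one graph could be turned into a token sequence. It is an illustration under stated assumptions, not our exact implementation: the names `Vertex`, `Edge`, `quantize`, `edge_to_token`, and `sequentialize` are hypothetical, each attribute is reduced to a single scalar, all values are assumed to be normalized to [0, 1], uniform quantization into 256 bins is assumed, and the edges are assumed to arrive already in the desired order.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Vertex:
    position: Tuple[float, float, float]  # assumed normalized to [0, 1]
    attribute: float                      # hypothetical scalar attribute in [0, 1]


@dataclass(frozen=True)
class Edge:
    src: int          # index of the first endpoint vertex
    dst: int          # index of the second endpoint vertex
    attribute: float  # hypothetical scalar edge attribute in [0, 1]


def quantize(value: float, bins: int = 256) -> int:
    """Map a value in [0, 1] to an integer bin index in [0, bins - 1]."""
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * bins), bins - 1)


def edge_to_token(vertices: Sequence[Vertex], edge: Edge, bins: int = 256) -> Tuple[int, ...]:
    """Encode one edge as a tuple of discrete fields:
    both endpoint positions, both endpoint attributes, and the edge attribute."""
    a, b = vertices[edge.src], vertices[edge.dst]
    fields = [*a.position, a.attribute, *b.position, b.attribute, edge.attribute]
    return tuple(quantize(f, bins) for f in fields)


def sequentialize(vertices: Sequence[Vertex], edges: Sequence[Edge], bins: int = 256) -> List[Tuple[int, ...]]:
    """Turn a graph into a sequence of edge tokens, keeping the given edge order
    (the ordering itself is an assumption left to the caller)."""
    return [edge_to_token(vertices, e, bins) for e in edges]
```

In this sketch every token is self-contained: it repeats the endpoint positions rather than referring to vertex indices, so graph connectivity can be recovered by matching identical quantized positions across tokens.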
By iteratively applying the MCTS steps, our MCTS-guided search steers towards graphs that maximize consistency with the input image.
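As an illustration of how MCTS can guide token sampling, the sketch below runs the four standard MCTS steps (selection, expansion, simulation, backpropagation) over partial token sequences. It is a generic UCT-style search written under assumptions, not our exact procedure: `policy` is a hypothetical stand-in for the image-conditioned transformer and must return strictly positive next-token weights, `reward` is a hypothetical stand-in for an image-consistency score in [0, 1], sequences have a fixed length `max_len`, and every node is expanded over the full vocabulary, which is only practical for small toy vocabularies.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class Node:
    """One partial token sequence in the search tree."""

    def __init__(self, tokens: List[int], parent: Optional["Node"] = None):
        self.tokens = tokens
        self.parent = parent
        self.children: Dict[int, "Node"] = {}
        self.visits = 0
        self.total_reward = 0.0


def uct(parent: Node, child: Node, c: float) -> float:
    """Mean reward plus an exploration bonus for rarely visited children."""
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def mcts_sample(
    policy: Callable[[List[int]], Sequence[float]],  # assumed: next-token weights
    reward: Callable[[List[int]], float],            # assumed: score in [0, 1]
    vocab_size: int,
    max_len: int,
    iterations: int = 200,
    c: float = 1.4,
    seed: int = 0,
) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    root = Node([])
    best_tokens: List[int] = []
    best_reward = float("-inf")
    for _ in range(iterations):
        # 1. Selection: descend through fully expanded nodes by UCT score.
        node = root
        while len(node.tokens) < max_len and len(node.children) == vocab_size:
            parent = node
            node = max(parent.children.values(), key=lambda ch: uct(parent, ch, c))
        # 2. Expansion: add one untried token, sampled from the policy.
        if len(node.tokens) < max_len:
            untried = [t for t in range(vocab_size) if t not in node.children]
            probs = policy(node.tokens)
            token = rng.choices(untried, weights=[probs[t] for t in untried])[0]
            child = Node(node.tokens + [token], parent=node)
            node.children[token] = child
            node = child
        # 3. Simulation: complete the sequence with the policy and score it.
        rollout = list(node.tokens)
        while len(rollout) < max_len:
            rollout.append(rng.choices(range(vocab_size), weights=policy(rollout))[0])
        value = reward(rollout)
        if value > best_reward:
            best_tokens, best_reward = rollout, value
        # 4. Backpropagation: update statistics along the path to the root.
        current: Optional[Node] = node
        while current is not None:
            current.visits += 1
            current.total_reward += value
            current = current.parent
    return best_tokens, best_reward
```

The function returns the highest-scoring complete sequence seen during the search together with its score. In a real system the reward would compare the decoded graph against the input image; here it is left as a caller-supplied function so that the search loop stays independent of any particular scoring choice.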