AAAI 2025
EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing
Umar Khalid, Kashif Munir, Hasan Iqbal, Azib Farooq, Jing Hua, Nazanin Rahnavard, Chen Chen, Victor Zhu, Zhengping Ji
AAAI Creative AI for Live Interactive Performances 2025
A system that interprets ambiguous instructions in conjunction with reference visuals to produce precise, context-aware editing prompts, combining a reflective reasoning framework with Chain-of-Thought reasoning.
Editing complex visual content from ambiguous or partially specified instructions remains a core challenge in vision-language modeling. Existing models can contextualize content but often fail to infer the underlying intent within a reference image or scene, leading to inconsistent or misaligned edits. We introduce the Editing Vision-Language Model (EVLM), a system that interprets ambiguous instructions in conjunction with reference visuals to produce precise, context-aware editing prompts. EVLM's key innovation is a reflective reasoning framework that translates subjective user intent into structured, actionable outputs by aligning with human-rated rationales through Reflection-Aware KL-Divergence Target Optimization (RKTO). By combining Chain-of-Thought (CoT) reasoning with RKTO alignment, EVLM captures fine-grained editing preferences without relying on binary supervision.
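The RKTO objective is not reproduced here; as a rough, purely illustrative sketch, a KTO-style preference loss with a KL anchor to a frozen reference model, weighted by scalar human ratings rather than binary labels, might look like the following (the exact form and all names are assumptions, not the paper's implementation):

```python
# Purely illustrative: a KTO-style loss with a KL anchor to a frozen
# reference model, weighted by fine-grained human ratings instead of
# binary labels. The exact RKTO objective is an assumption here.
import torch

def rkto_style_loss(policy_logps, ref_logps, ratings, beta=0.1):
    """policy_logps, ref_logps: (B,) log-probs of sampled CoT rationales
    under the trained and frozen reference models; ratings: (B,) in [0, 1]."""
    log_ratio = policy_logps - ref_logps                # implicit reward
    kl_anchor = log_ratio.mean().clamp(min=0).detach()  # batch-level KL estimate
    value = torch.sigmoid(beta * (log_ratio - kl_anchor))
    # High-rated rationales are pushed above the anchor, low-rated below it.
    return (ratings * (1.0 - value) + (1.0 - ratings) * value).mean()
```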
ECCV 2024
LatentEditor: Text Driven Local Editing of 3D Scenes
Umar Khalid*, Hasan Iqbal*, Nazmul Karim, Muhammad Tayyab, Jing Hua, Chen Chen
European Conference on Computer Vision (ECCV) 2024
A framework for precise, locally controlled editing of neural fields using text prompts, leveraging denoising diffusion models for faster and more adaptable NeRF editing.
While neural fields have made significant strides in view synthesis and scene reconstruction, editing them poses a formidable challenge due to their implicit encoding of geometry and texture information from multi-view inputs. In this paper, we introduce LatentEditor, an innovative framework designed to empower users with the ability to perform precise and locally controlled editing of neural fields using text prompts. Leveraging denoising diffusion models, we successfully embed real-world scenes into the latent space, resulting in a faster and more adaptable NeRF backbone for editing compared to traditional methods.
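To make "embedding scenes into the latent space" concrete, here is a minimal sketch of supervising a scene representation on VAE latents instead of pixels, in the style of latent-diffusion pipelines; `render_latents` is a hypothetical stand-in for the field's renderer, not LatentEditor's actual API:

```python
# Sketch only: fit a scene representation to VAE latents rather than pixels,
# as in latent-diffusion pipelines. The renderer and names are hypothetical.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

def latent_supervision_loss(render_latents, image, camera):
    """image: (1, 3, H, W) in [-1, 1]; render_latents(camera) -> (1, 4, H/8, W/8)."""
    with torch.no_grad():
        # Encode the training view into the latent space (SD scaling convention).
        target = vae.encode(image).latent_dist.sample() * vae.config.scaling_factor
    # Training the field on latents makes it directly editable by a
    # latent diffusion model later.
    return torch.nn.functional.mse_loss(render_latents(camera), target)
```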
ECCV 2024
Free-Editor: Zero-shot Text-driven 3D Scene Editing
Nazmul Karim*, Hasan Iqbal*, Umar Khalid, Chen Chen, Jing Hua
European Conference on Computer Vision (ECCV) 2024
A training-free 3D scene editing technique that lets users edit 3D scenes without per-scene model retraining, achieving 20x faster editing than SOTA methods.
Text-to-Image (T2I) diffusion models have recently gained traction for their versatility and user-friendliness in 2D content generation and editing. However, training a diffusion model specifically for 3D scene editing is challenging due to the scarcity of large-scale datasets. In this study, we introduce Free-Editor, a novel, training-free 3D scene editing technique that addresses multi-view style inconsistency through a single-view editing scheme.
ECCV 2024
3DEgo: 3D Editing on the Go!
Umar Khalid*, Hasan Iqbal*, Nazmul Karim, Azib Farooq, Chen Chen, Jing Hua
European Conference on Computer Vision (ECCV) 2024
A streamlined framework for directly synthesizing photorealistic 3D scenes from monocular videos guided by textual prompts, utilizing 3D Gaussian Splatting.
We introduce 3DEgo to address the novel problem of directly synthesizing photorealistic 3D scenes from monocular videos guided by textual prompts. Our framework streamlines the conventional multi-stage 3D editing process into a single-stage workflow by overcoming the reliance on COLMAP and eliminating the cost of model initialization.
CVPR 2025
SPF-4D: A Progressive Sampling Framework for View-Consistent 4D Editing
Umar Khalid, Nazmul Karim, Hasan Iqbal, Jing Hua, Chen Chen, Nazanin Rahnavard
CVPR 2025 (Under Review)
A progressive sampling framework for view-consistent 4D scene editing using diffusion models.
ICRA 2025
SAVE: Spectral-Shift-Aware Adaptation of Image Diffusion Models for Text-driven Video Editing
Umar Khalid, Nazmul Karim, Mohsen Joneidi, Chen Chen, Nazanin Rahnavard
IEEE International Conference on Robotics and Automation (ICRA) 2025
A spectral-shift-aware adaptation framework that fine-tunes image diffusion models for text-driven video editing with 10x faster training.
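As a concrete illustration of spectral-shift adaptation (in the spirit of SVDiff-style parameter-efficient fine-tuning; whether SAVE uses exactly this parameterization is an assumption, and the names are hypothetical), one can freeze the SVD factors of a pretrained weight and train only a per-singular-value shift:

```python
# Illustrative sketch of "spectral shift" fine-tuning: learn only a small
# shift of a frozen weight's singular values instead of updating the full
# matrix. Whether SAVE uses exactly this parameterization is an assumption.
import torch
import torch.nn as nn

class SpectralShiftLinear(nn.Module):
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        # Freeze the SVD factors of the pretrained weight: W = U diag(s) Vh.
        U, s, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)
        self.register_buffer("s", s)
        self.register_buffer("Vh", Vh)
        # The only trainable parameters: one shift per singular value.
        self.delta = nn.Parameter(torch.zeros_like(s))

    def forward(self, x):
        # Reassemble the weight with a shifted, non-negative spectrum.
        s_new = torch.relu(self.s + self.delta)
        W = self.U @ torch.diag(s_new) @ self.Vh
        return x @ W.T

# Usage: wrap a pretrained layer's weight and train only `delta`.
layer = SpectralShiftLinear(torch.randn(64, 128))
out = layer(torch.randn(4, 128))
```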
MICCAI 2023
Unsupervised Anomaly Detection in Medical Images Using Masked Diffusion Model
Hasan Iqbal, Umar Khalid, Chen Chen, Jing Hua
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2023
CVPR Workshop 2022
RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection
Umar Khalid, Ashkan Esmaeili, Nazmul Karim, Nazanin Rahnavard
CVPR Workshop on Robust Vision 2022
A self-supervised out-of-distribution (OOD) detection technique that maps in-distribution class embeddings onto a one-dimensional subspace for efficient OOD detection.
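A minimal sketch of the one-dimensional-subspace idea: summarize each in-distribution class by the first singular vector of its feature matrix and score a test sample by its maximum cosine similarity to those directions (the exact RODD scoring rule may differ; all names here are hypothetical):

```python
# Sketch of OOD scoring with one-dimensional class subspaces; the exact
# rule used in RODD may differ, and the names are hypothetical.
import torch

def fit_class_directions(features_by_class):
    """features_by_class: list of (N_c, D) tensors of ID features per class.
    Returns a (C, D) tensor of unit first singular vectors."""
    dirs = []
    for F_c in features_by_class:
        # First right-singular vector = dominant 1-D subspace of the class.
        _, _, Vh = torch.linalg.svd(F_c, full_matrices=False)
        dirs.append(Vh[0])
    return torch.stack(dirs)

def ood_score(x, class_dirs):
    """x: (D,) test feature. Lower max |cosine| => more likely OOD."""
    x = x / x.norm()
    cos = class_dirs @ x  # rows of class_dirs are unit-norm
    return cos.abs().max()
```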
CVPR Workshop 2022
CNLL: A Semi-supervised Approach For Continual Noisy Label Learning
Nazmul Karim, Umar Khalid, Ashkan Esmaeili, Nazanin Rahnavard
CVPR Workshop on Continual Learning 2022
The first study to investigate semi-supervised learning for continual learning with noisy labels.