Ashish Goswami

I'm a fourth-year PhD student (TCS Research Fellow) at Yardi School of AI@IIT-Delhi, where I work on object-centric generation and editing of images using diffusion models, advised by Prof. Parag Singla.

I completed my Bachelor's in Electronics and Communication Engineering at IIIT-Guwahati, during which I did a few internships centered on large-scale ML and generative models at Delhivery and Spyne.ai.

Email  /  CV  /  Bio  /  Scholar  /  Twitter  /  Github


News

  • Nov 2025 Gave a tutorial on Diffusion Models @ ICGEB Workshop on AI and data-science
  • Sep 2025 Gave a presentation on Foundations of Flow-Based Generative Models in the weekly reading group: Slides!
  • May 2025 Our paper GraPE accepted at MMFM Workshop @ CVPR 2025!
  • Mar 2025 Gave a talk on Diffusion+RL as part of AIL 821: Advanced Reinforcement Learning: Slides!
  • Dec 2024 New preprint on GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis.
  • Aug 2024 Passed my Comprehensive Examination: Slides!
  • Jul 2024 Received TCS Research Fellowship
  • Oct 2023 Paper on multi-hop image manipulation accepted at EMNLP 2023.

Research

I'm broadly interested in building reasoning capabilities into multi-modal generative models such as text-to-image diffusion and text-driven image-editing models. Recent work on object-centric representation learning has some pretty neat ideas that I'd like to explore.

Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach
Harman Singh, Poorva Garg, Mohit Gupta, Kevin Shah, Ashish Goswami, Satyam Modi, Arnab Kumar Mondal, Dinesh Khandelwal, Parag Singla, Dinesh Garg

EMNLP 2023
video / arXiv

New datasets and a modular method for weakly supervised, instruction-guided image manipulation.

GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis
Ashish Goswami, Satyam Modi, Santhosh Rishi Deshineni, Harman Singh, Prathosh A. P, Parag Singla

arXiv Preprint 2024 & MMFM Workshop @ CVPR'25
Project Page / arXiv / Code

A unifying and generic framework for improving the semantic alignment of T2I models through post-hoc alignment of generated images via iterative editing.

TA Duties

Sem II · 2025-26 COL775: Deep Learning
Sem I · 2025-26 COL774: Machine Learning
Sem II · 2024-25 AIL721: Deep Learning
Sem I · 2024-25 COL828: Advanced Computer Vision
Sem II · 2023-24 COL775: Deep Learning
Sem I · 2023-24 COL774: Machine Learning
Sem II · 2022-23 AIL861: Special topics in Applications
Sem I · 2022-23 COL671: Intro to Principles of AI

Website template by: Jon Barron