Mohammad Mahdi Derakhshani

Computer Vision | Machine Learning

Welcome! I'm Mohammad, a Ph.D. student in the VIS lab at the University of Amsterdam, supervised by Cees Snoek and Yuki Asano. My research focuses on multi-modal foundation models, with a particular interest in vision-language models.

In the summer of 2023, I interned at Microsoft Research, Cambridge, working on fine-tuning large-scale LLMs (e.g., GPT-3 and GPT-3.5) and on conditional text-to-image generation, alongside Molly Xia, Harkirat Behl, and Victor Ruehle. In the summer of 2022, I was at the Samsung AI Center, Cambridge, researching large-scale language-image models and federated learning with Brais Martinez and Georgios Tzimiropoulos.

Before that, I completed my master's at the University of Tehran under the guidance of Babak Nadjar Araabi and Mohammad Amin Sadeghi. At the Machine Learning and Computational Modeling lab, my studies focused on object detection and image compression; I also collaborated on object detection research with Mohammad Rastegari.

I'm proud to be an ELLIS Society member and have reviewed for venues such as CVPR, NeurIPS, ICLR, ICML, ICCV, and TPAMI.



Research



TULIP: Token-length Upgraded CLIP

Ivona Najdenkoska*, Mohammad Mahdi Derakhshani*, Yuki M. Asano, Nanne van Noord, Marcel Worring, Cees G. M. Snoek.

*: equal contribution; random order.

arXiv bibtex

We propose TULIP, a generalizable method that upgrades the token length of CLIP-like models to any desired length. We do so by improving the architecture with relative position encodings, followed by a training procedure that (i) distills the original CLIP text encoder into an encoder with relative position encodings and (ii) enhances the model's ability to align longer captions with images.

Learning to Ground VLMs without Forgetting

Aritra Bhowmik*, Mohammad Mahdi Derakhshani*, Dennis Koelma, Martin R. Oswald, Yuki M. Asano, Cees G. M. Snoek

*: equal contribution; random order.

arXiv bibtex

Our contributions include (a) enabling a pre-trained caption-based vision-language model to learn new grounding skills through fine-tuning without forgetting old ones, (b) improving model performance by scaling the generated dataset, and (c) enabling visual grounding tasks through step-by-step training on our synthetic dataset.

Any-Shift Prompting for Generalization over Distributions

Zehao Xiao, Jiayi Shen, Mohammad Mahdi Derakhshani, Shengcai Liao, Cees G. M. Snoek.

CVPR bibtex

We propose any-shift prompting: a general probabilistic inference framework that considers the relationship between training and test distributions during prompt learning. We explicitly connect training and test distributions in the latent space by constructing training and test prompts in a hierarchical architecture.

Unlocking Spatial Comprehension in Text-to-Image Diffusion Models

Mohammad Mahdi Derakhshani, Menglin Xia, Harkirat Behl, Cees G. M. Snoek, Victor Rühle.

arXiv bibtex

We propose CompFuser, an image generation pipeline that enhances spatial comprehension and attribute assignment in text-to-image generative models. Our pipeline enables the interpretation of instructions defining spatial relationships between objects in a scene.

Self-Supervised Open-Ended Classification with Small Visual Language Models

Mohammad Mahdi Derakhshani*, Ivona Najdenkoska*, Cees G. M. Snoek, Marcel Worring, Yuki M. Asano.

arXiv bibtex

We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks open-ended few-shot abilities of small visual language models. Our proposed adaptation algorithm explicitly learns from symbolic, yet self-supervised training tasks.

Bayesian Prompt Learning for Image-Language Model Generalization

Mohammad Mahdi Derakhshani, Enrique Sanchez, Adrian Bulat, Victor Guilherme Turrisi da Costa, Cees G. M. Snoek, Georgios Tzimiropoulos, Brais Martinez.

ICCV bibtex

We propose a probabilistic model of the underlying distribution of prompts, allowing prompts within the support of an associated concept to be derived through stochastic sampling. This results in a more complete and richer transfer of the information captured by the language model, providing better generalization capabilities for downstream tasks.

Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models

Tom van Sonsbeek*, Mohammad Mahdi Derakhshani*, Ivona Najdenkoska*, Cees G. M. Snoek, Marcel Worring.

MICCAI bibtex

We introduce a novel method particularly suited to small, domain-specific medical datasets. To properly communicate the medical images to the language model, we develop a network that maps the extracted visual features to a set of learnable tokens. Then, alongside the question, these learnable tokens directly prompt the language model.

LifeLonger: A Benchmark for Continual Disease Classification

Mohammad Mahdi Derakhshani*, Ivona Najdenkoska*, Tom van Sonsbeek*, Xiantong Zhen, Dwarikanath Mahapatra, Marcel Worring, Cees G. M. Snoek.

MICCAI bibtex

We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection, by applying existing state-of-the-art continual learning methods. We perform a thorough analysis of the performance and examine how well-known challenges of continual learning, such as catastrophic forgetting, manifest themselves in this setting.

Generative Kernel Continual Learning

Mohammad Mahdi Derakhshani, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

arXiv bibtex

We introduce generative kernel continual learning, which explores and exploits the synergies between generative models and kernels for continual learning.

Kernel Continual Learning

Mohammad Mahdi Derakhshani, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

ICML bibtex

This paper introduces kernel continual learning, a simple but effective variant of continual learning that leverages the non-parametric nature of kernel methods to tackle catastrophic forgetting.

Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors

Mohammad Mahdi Derakhshani, Saeed Masoudnia, Amir Hossein Shaker, Omid Mersa, Mohammad Amin Sadeghi, Mohammad Rastegari, Babak N. Araabi

CVPR bibtex

We present a simple and effective learning technique that significantly improves the mAP of YOLO object detectors without compromising their speed.

BlockCNN: A Deep Network for Artifact Removal and Image Compression

Danial Maleki, Soheila Nadalian, Mohammad Mahdi Derakhshani, Mohammad Amin Sadeghi

CVPR (Workshop) bibtex

We present a general technique that performs both artifact removal and image compression. For artifact removal, the network takes a JPEG image as input and removes its compression artifacts.