My primary research interests are in the area of natural language processing (NLP) and, specifically, in multimodal semantics. This area of computational semantics aims at making purely text-based models of meaning interact with other modalities, such as the visual and sensorimotor ones, and thus at simulating how humans acquire and process information and draw inferences using all available modalities. In particular, I focus on interfacing the linguistic modality with its visual counterpart through the cross-modal learning paradigm.

So far, I’ve worked on demonstrating how cross-modal mapping can be used in novel computational tasks in computer vision (attribute learning) and NLP (learning word embeddings). Given the promising results of cross-modal mapping across several tasks, I’ve also looked into ways to improve how the mapping is learned, by exploiting ideas from other discriminative tasks as well as from domain adaptation.
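To make the core technique concrete, here is a minimal sketch rather than the exact formulation used in my work: cross-modal mapping can be cast as a linear function from a word-embedding space into a visual feature space, estimated (for instance) by ridge regression over concepts for which both representations are available. All function names, variables, and dimensionalities below are hypothetical.

```python
import numpy as np

def fit_cross_modal_map(W, V, lam=1.0):
    """Learn a linear map M such that W @ M approximates V.

    W:   (n_concepts, d_text)   word embeddings of training concepts
    V:   (n_concepts, d_visual) visual feature vectors of the same concepts
    lam: L2 (ridge) regularization strength
    """
    d_text = W.shape[1]
    # Closed-form ridge regression: M = (W^T W + lam * I)^(-1) W^T V
    M = np.linalg.solve(W.T @ W + lam * np.eye(d_text), W.T @ V)
    return M

def map_to_visual(M, w):
    """Project a word embedding w into the visual feature space."""
    return w @ M
```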

“When the thunder rolled, the wampimuk ran behind the oak.” Most of us, after hearing this sentence, would form visual expectations about what wampimuks look like. Humans are able to reason across modalities even when exposed to limited linguistic input, which is obviously the case for novel concepts and rare words. Currently, I’m investigating to what extent distributional semantics and cross-modal mapping can come together to simulate cross-modal reasoning in the way humans do.
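One way to simulate this kind of reasoning, again only as a hypothetical sketch building on the fit_cross_modal_map example above: the distributional vector of a novel word such as “wampimuk,” induced purely from text, is projected into visual space, and its nearest image neighbours stand in for the visual expectations a human would form.

```python
import numpy as np

def visual_expectations(M, word_vec, image_feats, image_ids, k=5):
    """Return the k images whose features are closest (by cosine similarity)
    to the cross-modal projection of a novel word's text vector."""
    query = word_vec @ M  # project the distributional vector into visual space
    # Cosine similarity between the projected vector and every image feature
    sims = image_feats @ query / (
        np.linalg.norm(image_feats, axis=1) * np.linalg.norm(query) + 1e-12)
    top = np.argsort(-sims)[:k]
    return [(image_ids[i], float(sims[i])) for i in top]
```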

In the near future, I’m planning to demonstrate that cross-modal mapping, combined with recent advances in computer vision, can provide a window into computational models of meaning through concept image generation.