I am a Computer Vision engineer with a PhD from EPFL in Computational Imaging. My interests are in 3D reconstruction, segmentation, inverse rendering, inverse problems and numerical optimization. Below is a selection of my recent work.
We aim to detect surface defects, anomalies and events on Artmyn’s multimodal gigapixel digital twins. We worked with external experts to develop a small annotated dataset. For training, we leveraged distribution-aware resampling, heavy augmentations, a class-frequency-weighted loss function, selective masking, and elements of weakly supervised and semi-supervised learning to develop a zoo of semantic segmentation models. At gigapixel scales, getting consistent predictions without compromising precision or recall is not trivial, so during inference, in addition to heavy test-time augmentation, we also ensemble the models via majority voting.
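As a rough illustration of the inference step, the sketch below pools per-pixel votes from several models and from a flipped copy of each tile. It is not the production pipeline: the function names, the use of a horizontal flip as the only augmentation, and the lowest-class-id tie-break are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(label_maps):
    """Fuse label maps (lists of rows of int class ids) by per-pixel majority.

    Ties are broken in favour of the lowest class id (an assumption).
    """
    if not label_maps:
        raise ValueError("need at least one label map")
    height, width = len(label_maps[0]), len(label_maps[0][0])
    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            votes = Counter(m[y][x] for m in label_maps)
            top = max(votes.values())
            row.append(min(label for label, n in votes.items() if n == top))
        fused.append(row)
    return fused


def hflip(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def predict_with_flip_tta(models, tile):
    """Collect each model's prediction on the tile and on its mirrored copy
    (flipped back afterwards), then fuse all candidates by majority vote."""
    candidates = []
    for model in models:
        candidates.append(model(tile))
        candidates.append(hflip(model(hflip(tile))))
    return majority_vote(candidates)
```

Here each `model` is assumed to be any callable that maps a tile to a label map of the same size. On real gigapixel assets, the per-pixel loops would be replaced by vectorized array operations.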
An aspect of real-world objects that is often ignored in 3D reconstruction is reflectance capture. The visual richness of material textures in the real world arises from the way light interacts with the surface roughness of the object, resulting in cues we perceive as matte, glossy, or shiny. To effectively estimate this reflectance, I built a differentiable renderer for inverse rendering, partially inspired by the idea behind NeRF. The resulting reflectance model brought drastic improvements in realism, with an increase of up to 15 dB in PSNR for objects with regions of specular highlights.
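For readers less familiar with the metric, PSNR is 10·log10(peak² / MSE), so a gain of 15 dB corresponds to roughly a 31.6-fold reduction in mean squared error. The sketch below uses the standard definition; the function names and the flat-list image representation are assumptions for illustration, not the evaluation code used in the project.

```python
import math


def psnr(reference, rendered, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    flat lists of pixel intensities in the range [0, peak]."""
    if not reference or len(reference) != len(rendered):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)
```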
Artmyn’s multimodal imaging pipeline operates at 2000 ppi, producing gigapixel assets. This comes with several challenges. Consistent image registration is a major one, particularly across regions with repeating patterns, which occur surprisingly often in paintings. To overcome this, I built CRAFT, an in-house adaptation of the RAFT optical flow CNN, to which we added a real geometry encoder based on estimated depth and which we trained via transfer learning on synthetic data, reducing failure rates from 20% to under 1%. Accurate depth estimation at these scales is also non-trivial. I cast the problem as a large-scale inverse problem with priors from both photometric cues and stereo displacement, which improved depth estimation drastically. I also contributed heavily to the camera sensor and color calibration models, resulting in extremely high-fidelity digital twins.
This was a research project for the course CS231N - Deep Learning for Computer Vision at Stanford Online. In this project, the problem of Phototourism, i.e., recreating a 3D model of the real world from an unstructured set of photographs, is considered from a deep learning perspective. The work explores the use of deep learning components in the classical structure-from-motion pipeline. It also explores replacing the optimization component, bundle adjustment, with the recently proposed DBARF, a generalized NeRF-inspired neural rendering method that simultaneously optimizes camera pose and image rendering. The results show that while deep learning components are generally successful in improving the feature description and matching stage, even neural rendering methods that optimize instead of learn fail to achieve the accuracy of bundle adjustment.
This was a research project for the course CS224N - Natural Language Processing at Stanford Online. In this study, we are interested in Tamil language models that can tell stories of the same complexity as those told to toddlers and young children. To build such a model, we created a machine-translated version of the TinyStories dataset with 1M stories in the train split. We then explore GPTNeo and Llama models of differing sizes, all with fewer than 150M parameters, to learn storytelling. We take a three-stage approach. The model is first pretrained on internet-quality Tamil data. Next, the machine-translated dataset is used for continual training. Finally, we run a fine-tuning pass with a very small expert-curated dataset of 2000 stories in the train split. We also attempt LoRA fine-tuning of an English-language GPTNeo model. We see that while the models are able to tell stories, the stories are not of high quality, mainly because of the low quality of the machine translations. Some of the resulting models are under 100 MB, small enough to run easily in your browser. Head over to the project site to give it a try.
At Rayform, I contributed heavily to the production algorithm for the design of freeform optics. My particular focus was on the computational method behind the lens surface design, resulting in a parameter-free algorithm that was robust to more general light configurations. This has since been licensed to LVMH’s Fred brand.
During my PhD at EPFL, I worked on a Google-funded project called “eFacsimile” that focused on modern computational methods for high-fidelity digitization of cultural heritage artifacts. The work resulted in collaborations and exhibitions at Vitromusée, and in IP that was later licensed to Artmyn through Invaluable. A selection of resulting publications and media coverage is linked below.
I used to be obsessed with mountains and the night sky. Most of my photos still reside on a hard drive somewhere, but I have a few on my 500px.