• Deep Learning

    On the Attention Mechanism

    What is revolutionary about the attention mechanism is that it allows the network to dynamically change the weight given to each piece of information. This is not possible with, say, traditional RNNs or LSTMs. Attention mechanisms were proposed for sequence-to-sequence translation, where they made a significant difference by allowing networks to identify which words are most relevant to the $t$th word in the translation (i.e., soft alignment) and to put more weight on them even if they are far apart from the word that is…
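    The excerpt above describes the mechanism only in words; as a reminder of how the dynamic weights are actually computed, below is a minimal NumPy sketch of scaled dot-product attention. It illustrates the general idea of input-dependent weighting, not the original seq2seq attention formulation specifically, and all names and sizes are illustrative.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        """Scaled dot-product attention: the weights depend on the input itself."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)     # relevance of each key to each query
        weights = softmax(scores, axis=-1)  # soft alignment: each row sums to 1
        return weights @ V, weights

    # Toy example: 3 target positions attending over 5 source positions.
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
    out, w = attention(Q, K, V)
    print(w.round(2))   # the soft-alignment (attention weight) matrix
    ```

    Note how distance plays no role here: a query can put high weight on any source position, however far apart, which is the property the post highlights.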

  • Deep Learning

    LSTM Examples #1: Basic Time Series Prediction

    The results above illustrate how LSTM works with continuous input/output on simple 1D time-series prediction tasks. We aim to estimate simple shapes by using past data, following increasingly complex scenarios. The results show the limitations of continuous time-series prediction via LSTM; the last slides show that even simple shapes cannot be accurately predicted. Still, the code is helpful for illustrating the basics of time-series prediction. Below we provide all the commands and the Python script needed to generate…
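    The full commands and script are in the post; as a stand-in, here is a minimal sketch of this kind of setup, assuming PyTorch: an LSTM trained to predict the next sample of a sine wave from a window of past samples. The window length, hidden size, and training loop are arbitrary choices for illustration, not the post's actual script.

    ```python
    import numpy as np
    import torch
    import torch.nn as nn

    # Toy data: predict the next sample of a sine wave from the previous 20 samples.
    t = np.linspace(0, 20 * np.pi, 2000, dtype=np.float32)
    series = np.sin(t)
    win = 20
    X = np.stack([series[i:i + win] for i in range(len(series) - win)])[..., None]
    y = series[win:, None]

    class LSTMPredictor(nn.Module):
        def __init__(self, hidden=32):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):
            out, _ = self.lstm(x)           # out: (batch, time, hidden)
            return self.head(out[:, -1])    # predict from the last time step

    model = LSTMPredictor()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    Xb, yb = torch.from_numpy(X), torch.from_numpy(y)
    for epoch in range(200):
        opt.zero_grad()
        loss = loss_fn(model(Xb), yb)
        loss.backward()
        opt.step()
    print(f"final train MSE: {loss.item():.5f}")
    ```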

  • Deep Learning

    Self-supervision on Deep Nets

    Arguably, the main reason that deep nets became so powerful is self-supervision. In many domains, from images to text to DNA analysis, the concept of self-supervision was sufficient to generate practically infinite “labelled” data for training deep models. The idea is simple yet extremely powerful: just hide some parts of the (unlabelled) data and turn the hidden parts into the labels to predict. Here are some notes (mostly to myself) about self-supervision. There are two standard ways to do self-supervision:…
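    To make the hide-and-predict recipe concrete, here is a toy sketch of one common instance of it (BERT-style masked prediction): mask a fraction of an unlabelled token sequence and use the hidden tokens themselves as the labels. The mask fraction and token ids below are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def make_masked_example(tokens, mask_id=-1, mask_frac=0.15):
        """Turn an unlabelled sequence into (input, label) pairs by hiding tokens."""
        tokens = np.asarray(tokens)
        hide = rng.random(tokens.shape) < mask_frac   # pick ~15% of positions
        x = tokens.copy()
        x[hide] = mask_id                             # the model only sees the mask
        return x, tokens, hide                        # labels = the original tokens

    seq = rng.integers(0, 1000, size=12)              # stand-in for word ids
    x, labels, hide = make_masked_example(seq)
    print("input :", x)
    print("target:", labels[hide])                    # predict only hidden positions
    ```

    No human annotation is involved at any point, which is what makes the supply of “labelled” data practically infinite.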

  • Computer Vision,  Research

    State of the art in 3D face reconstruction may be wrong

    Our IJCB’23 study exposed a significant problem with the benchmark procedures of 3D face reconstruction, something that should worry any researcher in the field. That is, we showed that the standard metric for evaluating 3D face reconstruction methods, namely geometric error via Chamfer (i.e., nearest-neighbor) correspondence, is highly problematic. Results showed that the Chamfer error not only significantly underestimates the true error, but does so inconsistently across reconstruction methods, thus the ranking between methods can be artificially…
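    For reference, here is a minimal sketch of how a Chamfer (nearest-neighbor) error is typically computed between a reconstructed point cloud and a ground-truth scan. Benchmarks differ in details such as pre-alignment, symmetrization, and normalization; the underestimation discussed in the paper stems from the nearest-neighbor correspondence step itself, which this sketch makes explicit.

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    def chamfer_error(recon, gt):
        """Average distance from each reconstructed point to its *nearest* GT point.
        Nearest-neighbor matching is not a true anatomical correspondence, which is
        why this error can flatter a reconstruction."""
        d, _ = cKDTree(gt).query(recon)   # closest ground-truth point per vertex
        return d.mean()

    rng = np.random.default_rng(0)
    gt = rng.normal(size=(5000, 3))                      # stand-in for a GT face scan
    recon = gt + rng.normal(scale=0.01, size=gt.shape)   # a slightly-off reconstruction
    print(f"Chamfer error: {chamfer_error(recon, gt):.4f}")
    ```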

  • Computer Vision,  Research,  Software

    3DI: Face Reconstruction via Inequality Constraints

    We present the CUDA code of our optimization-based 3DMM fitting method (i.e., no learning), which was first presented at ECCV’20. The journal version of the paper (currently under revision) presented a significant speed-up with a new optimization approach, making the method feasible for real-life applications. 3DI is an optimization-based 3DMM fitting (3D reconstruction) method that enforces inequality constraints on 3DMM parameters and landmarks, thus significantly restricting the search space and ruling out implausible solutions (Figure 1). 3DI is not a…
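    The actual CUDA implementation is linked in the post. Purely to illustrate why inequality constraints restrict the search space, below is a toy, hypothetical sketch (emphatically not the 3DI algorithm) that fits 3DMM-like coefficients under box constraints with SciPy; the basis, landmark count, and bound of three standard deviations per component are all stand-in assumptions.

    ```python
    import numpy as np
    from scipy.optimize import lsq_linear

    # Toy sketch: fit 3DMM-like coefficients a so that landmarks L ~ B @ a, with
    # box (inequality) constraints |a_i| <= 3*sigma_i ruling out implausible shapes.
    rng = np.random.default_rng(0)
    n_lmk, n_coef = 68 * 2, 40              # hypothetical 2D landmark count, basis size
    B = rng.normal(size=(n_lmk, n_coef))    # stand-in for a 3DMM basis
    sigma = np.linspace(1.0, 0.1, n_coef)   # stand-in per-component std. deviations
    a_true = np.clip(rng.normal(scale=sigma), -3 * sigma, 3 * sigma)
    L = B @ a_true + rng.normal(scale=0.05, size=n_lmk)

    res = lsq_linear(B, L, bounds=(-3 * sigma, 3 * sigma))   # constrained fit
    print("max |a|/sigma:", np.max(np.abs(res.x) / sigma))   # stays within bounds
    ```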

  • Research,  Signal Processing,  Software

    SyncRef: Fast & Scalable Way to Find Synchronized Time Series

    In CVPR’20 we presented a new method (with code) for finding the largest subset of synchronized time series from a given set of time series. Specifically, we aim to find the largest subset of time series such that all pairs in the subset are correlated at least by a (given) threshold value $\rho$. This is an NP-hard problem and the exact solution is, in general, infeasible. We propose a new method, called SyncRef, for finding an approximate solution in an efficient…
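    To make the problem statement concrete, here is a brute-force sketch of the objective (not the SyncRef algorithm itself, which is what makes the problem tractable at scale): it exhaustively searches for the largest subset whose pairwise correlations all reach the threshold $\rho$, which is exponential in the number of series.

    ```python
    import numpy as np
    from itertools import combinations

    def largest_synced_subset_bruteforce(X, rho):
        """Find the largest subset of rows of X whose pairwise Pearson correlations
        are all >= rho, by exhaustive search. Only feasible for tiny sets."""
        C = np.corrcoef(X)
        n = len(X)
        for k in range(n, 1, -1):                   # try the largest sizes first
            for S in combinations(range(n), k):
                sub = C[np.ix_(S, S)]
                if np.all(sub[np.triu_indices(k, 1)] >= rho):
                    return S
        return (0,)

    rng = np.random.default_rng(0)
    base = rng.normal(size=200)
    X = np.vstack([base + rng.normal(scale=0.3, size=200) for _ in range(5)]
                  + [rng.normal(size=200) for _ in range(4)])   # 5 synced + 4 random
    print(largest_synced_subset_bruteforce(X, rho=0.7))         # -> the synced five
    ```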

  • Computer Vision,  Research

    Is Pose & Expression Separable with WP Camera?

    This study on facial analysis pipelines shows the limitations of using a Weak Perspective (WP) camera when it comes to decoupling pose and expression from face videos. That is, decoupling facial pose and expression within images, when done through 3D reconstruction, requires a camera model for the 3D-to-2D mapping. The weak perspective (WP) camera has been the most popular choice; it is the default, if not the only option, in state-of-the-art facial analysis methods and software. The WP camera is justified…
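    For concreteness, the standard WP camera maps a 3D point $\mathbf X$ to $s\,\mathbf{\Pi}\mathbf R\mathbf X + \mathbf t$, where $\mathbf{\Pi}$ keeps the first two coordinates. Below is a generic NumPy sketch of this model (not the paper's code); the yaw angle, scale, and translation are illustrative values.

    ```python
    import numpy as np

    def weak_perspective(X, R, s, t):
        """Weak perspective (WP) projection: rotate, drop depth (orthographic
        projection), then apply a single global scale s and 2D translation t.
        Per-point depth variation is ignored entirely."""
        return s * (R @ X.T)[:2].T + t

    # Toy usage with a 15-degree yaw rotation.
    theta = np.deg2rad(15)
    R = np.array([[ np.cos(theta), 0, np.sin(theta)],
                  [ 0,             1, 0            ],
                  [-np.sin(theta), 0, np.cos(theta)]])
    X = np.random.default_rng(0).normal(size=(68, 3))   # stand-in 3D landmarks
    x2d = weak_perspective(X, R, s=100.0, t=np.array([320.0, 240.0]))
    print(x2d.shape)   # (68, 2)
    ```

    Because the scale is global and depth is discarded, distinct combinations of pose and expression can project to very similar 2D landmarks, which is the ambiguity the study examines.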

  • Linear Algebra

    Can I swap one matrix norm with another?

    Some matrix norms are significantly costlier to compute than others. For example, the matrix 2-norm requires the computation of eigenvalues, whereas the matrix 1-norm is computed simply by finding the column of the matrix with the largest absolute sum (p. 283 of Carl D. Meyer). Thus, the following question becomes relevant: can we use one matrix norm in place of another? Fortunately, for some applications, we can. First of all, for analyzing limiting behavior, it makes no difference whether we use…
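    A quick numerical illustration, assuming NumPy: the cheap 1- and $\infty$-norms (max absolute column sum and max absolute row sum) bound the expensive 2-norm via the classical inequality $\|\mathbf A\|_2 \le \sqrt{\|\mathbf A\|_1 \|\mathbf A\|_\infty}$, so a cheap norm can often stand in for an expensive one.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(500, 500))

    n1   = np.linalg.norm(A, 1)       # max absolute column sum (cheap)
    ninf = np.linalg.norm(A, np.inf)  # max absolute row sum (cheap)
    n2   = np.linalg.norm(A, 2)       # largest singular value (expensive)

    # Classical bound: ||A||_2 <= sqrt(||A||_1 * ||A||_inf)
    print(n2, np.sqrt(n1 * ninf), n2 <= np.sqrt(n1 * ninf))
    ```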

  • Linear Algebra

    Does it really take 10¹⁴¹ years to compute the determinant of a 100×100 matrix?

    Well, it depends on how you compute it. If you compute it by using the definition of the determinant directly, it can in fact take more than 10¹⁴¹ years, as we’ll see.${}^\dagger$ Let’s recall the definition first. The determinant of a matrix $\mathbf A$ is $$\text{det}(\mathbf A) = \sum\limits_p \sigma(p) a_{1p_1} a_{2p_2} \dots a_{np_n},$$ where the sum is taken over the $n!$ permutations $p=(p_1,p_2,\dots,p_n)$ of the numbers $1,2,\dots,n$. The total number of multiplications in this definition is $n!\,(n-1)$. Based on my…
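    A back-of-the-envelope check of that count, assuming (hypothetically) a machine that performs $10^{10}$ multiplications per second; the post's own hardware assumption is truncated above, so the exponent may differ slightly from its estimate.

    ```python
    import math

    n = 100
    mults = math.factorial(n) * (n - 1)   # multiplications in the definition
    rate = 1e10                           # ASSUMED: 10^10 multiplications/second
    seconds_per_year = 3600 * 24 * 365
    years = mults / rate / seconds_per_year
    print(f"{years:.2e} years")           # ~2.9e142 years: more than 10^141
    ```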