Variational (deep) parametric auto-encoders (VAEs) have shown great potential for unsupervised extraction of latent representations from large amounts of data. The human face exhibits an inherent hierarchy in its representations, encoded in facial action units (AUs) and their intensities. This makes VAEs well suited to learning facial features for AU intensity estimation. Yet most existing methods apply classifiers that are learned separately from the encoded features. On the other hand, non-parametric (probabilistic) approaches, such as Gaussian Processes (GPs), typically outperform their parametric counterparts but cannot easily handle large amounts of data. In this paper, we propose a novel semi-parametric VAE modeling framework, named DeepCoder, which combines the modeling power of parametric (convolutional) and non-parametric (ordinal GP) VAEs for joint learning of (1) latent representations at multiple levels in a task hierarchy, and (2) classification of multiple ordinal outputs (AU intensities). We show on benchmark datasets for AU intensity estimation that the proposed DeepCoder significantly outperforms state-of-the-art approaches, as well as related parametric VAEs, deep learning models, and parametric models.