Convolutional Neural Networks (CNNs) have been successfully applied to many computer vision tasks, such as image classification. By performing linear combinations and element-wise nonlinear operations, these networks can be thought of as extracting solely first-order information from an input image. In the past, however, second-order statistics computed from handcrafted features, e.g., covariances, have proven highly effective in diverse recognition tasks. In this paper, we introduce a novel class of CNNs that exploit second-order statistics. To this end, we design a series of new layers that (i) extract a covariance matrix from convolutional activations, (ii) compute a parametric, second-order transformation of a matrix, and (iii) perform a parametric vectorization of a matrix. These operations can be assembled to form a Covariance Descriptor Unit (CDU), which replaces the fully-connected layers of standard CNNs. Our experiments demonstrate the benefits of our new architecture, which outperform the first-order CNNs, while relying on up to 90% fewer parameters.