We consider the problem of obtaining individualized estimates of the effect of a treatment from observational data. The problem differs fundamentally from classical supervised learning because, for each individual subject, we observe the response either with or without the treatment, but never both. Estimating the effect of a treatment is therefore a causal inference task that requires estimating counterfactual outcomes. To address this problem, we propose a novel multi-task learning framework in which an individual's responses with and without the treatment are modeled as a vector-valued function belonging to a reproducing kernel Hilbert space. Unlike previous causal inference methods based on the G-computation formula, our approach does not estimate the treatment and control response surfaces separately; instead, it estimates them jointly, which ensures data efficiency in scenarios where the selection bias is strong. To provide individualized measures of uncertainty in our estimates, we adopt a Bayesian approach and learn this vector-valued function under a multi-task Gaussian process prior; uncertainty is quantified via posterior credible intervals. We develop a novel risk-based empirical Bayes method for calibrating the Gaussian process hyperparameters in a data-driven fashion, based on gradient descent in which the update rule is itself learned from the data using a recurrent neural network. Experiments on semi-synthetic data show that our algorithm significantly outperforms state-of-the-art causal inference methods.
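To make the joint-estimation idea concrete, here is a minimal sketch. It is not the method described above. It assumes an intrinsic coregionalization kernel, cov(f_w(x), f_w'(x')) = B[w, w'] k(x, x'), over the two potential outcomes f_0 and f_1. It then conditions on the factual outcomes only and reads off the posterior mean and a 95% credible interval of the individualized effect f_1(x) - f_0(x). The coregionalization matrix `B`, the RBF lengthscale and the noise level are fixed by hand here; they are not calibrated with the risk-based empirical Bayes procedure. The helper names `rbf` and `mtgp_ite` and the synthetic data are hypothetical.

```python
import numpy as np


def rbf(a, b, lengthscale):
    # Squared-exponential kernel on the covariates; k(x, x) = 1.
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)


def mtgp_ite(X, w, y, X_test, B, lengthscale=1.0, noise=0.1):
    """Posterior mean and 95% credible interval of f1(x) - f0(x).

    Hypothetical helper: the prior is an intrinsic coregionalization model
    cov(f_w(x), f_w'(x')) = B[w, w'] * k(x, x'), with fixed hyperparameters.
    """
    # Covariance of the observed factual outcomes (row i uses its own arm w_i).
    K = B[np.ix_(w, w)] * rbf(X, X, lengthscale) + noise**2 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

    # cov(f1(x*) - f0(x*), y_j) = (B[1, w_j] - B[0, w_j]) * k(x*, x_j).
    Kx = rbf(X_test, X, lengthscale)
    K_tau = (B[1, w] - B[0, w])[None, :] * Kx

    mean = K_tau @ alpha
    V = np.linalg.solve(L, K_tau.T)
    # Prior variance of f1(x) - f0(x), using k(x, x) = 1.
    prior_var = B[0, 0] + B[1, 1] - 2.0 * B[0, 1]
    sd = np.sqrt(np.maximum(prior_var - np.sum(V**2, axis=0), 0.0))
    return mean, mean - 1.96 * sd, mean + 1.96 * sd


# Synthetic example with selection bias: treatment is more likely for larger x.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-2.0, 2.0, size=(n, 1))
propensity = 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))
w = (rng.uniform(size=n) < propensity).astype(int)
y = np.sin(X[:, 0]) + 1.0 * w + 0.1 * rng.standard_normal(n)  # true effect = 1

B = np.array([[1.0, 0.9], [0.9, 1.0]])  # assumed correlation between the arms
X_test = np.linspace(-1.0, 1.0, 5)[:, None]
tau_hat, lo, hi = mtgp_ite(X, w, y, X_test, B)
```

Both arms share one covariance matrix, so treated observations inform the control surface and vice versa through the off-diagonal entry of `B`. A separate-regression approach would discard this coupling.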