This paper introduces QMDP-net, a neural network architecture for planning under partial observability. The QMDP-net combines the strengths of model-free learning and model-based planning. It is a recurrent policy network, but it represents a policy by connecting a model with a planning algorithm that solves the model, thus embedding the solution structure of planning in the network architecture. The QMDP-net is fully differentiable and allows end-to-end training. We train a QMDP-net over a set of different environments so that it can generalize over new ones. In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, it also sometimes outperformed the QMDP algorithm, which generated the data for learning, because of QMDP-net’s robustness resulting from end-to-end learning.