Multiple object detection in wide-area aerial videos has drawn the attention of the computer vision research community for a number of years. This paper proposes a novel framework based on a fully convolutional deep neural network that detects all objects in a given region of interest simultaneously. The network accepts multiple video frames at a time as input and yields detection results for all objects in the temporally central frame. This multi-frame approach yields far better results than its single-frame counterpart. Additionally, the proposed method can detect vehicles that are slowing, stopped, or partially or fully occluded during some frames, cases that nearly all state-of-the-art methods cannot handle. To the best of our knowledge, this is the first use of a multiple-frame, fully convolutional deep model for detecting multiple small objects in aerial videos, and the only framework that can detect stopped and temporarily occluded vehicles. The proposed network significantly exceeds state-of-the-art results on the WPAFB 2009 dataset.