When training an image classifier, one of the things we normally do is add noise to the dataset. This paper tackles the inverse problem (denoising) and adds semantic information to the images. With ImageNet as the test set, the results are compelling and show plenty of promise for future research.
Change detection datasets can be hard to come by, especially low-altitude geolocated drone datasets. While there’s a lot of opportunity for misuse of this research and we’re uncomfortable with all the scenarios presented, there are also a lot of humanitarian use cases. Drone delivery of medication to natural disaster areas is just one that comes to mind.
April 05 2017
Want to build a faster / better image search? Combine your hashing and aggregating systems. Or at least that’s the advice from a new paper out of Baidu Research yesterday. The storage space needed gets a bit larger, but you reap the benefits of a much faster lookup.
This latest paper has state-of-the-art results getting depth information out of noisy data, working in situations where the 3D space output partitions are unknown. There are still a couple of downsides, though: you’ll need a large number of frames for filtering, and it doesn’t guess at what it can’t see.
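The summary above doesn't describe the paper's method, so the following is not a reproduction of it. It is only a minimal sketch of why binary hash codes make lookup fast: comparing two codes is an XOR plus a bit count. The codes, item names, and function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary hash codes."""
    return bin(a ^ b).count("1")


def nearest(query: int, codes: dict, k: int = 1) -> list:
    """Return the k item names whose codes are closest to the query.

    Ties are broken by name so the ordering is deterministic.
    """
    ranked = sorted(codes, key=lambda name: (hamming(query, codes[name]), name))
    return ranked[:k]


# Hypothetical 8-bit codes standing in for hashed image descriptors.
codes = {"cat": 0b10110010, "dog": 0b10110011, "car": 0b01001100}
print(nearest(0b10110000, codes, k=2))
```

A real system would use much longer codes and an index rather than a full sort, but the per-comparison cost stays this cheap.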
April 04 2017
Ooh, this is interesting. There’s a new DAVIS Challenge for 2017. It comes with a beautiful updated dataset composed of some new videos and some old videos that were relabeled with multiple objects. There’s been a lot of research in the area of semi-supervised video object segmentation lately, so we expect there will be some strong competitors, especially since the results of the public challenge will be presented during a workshop at CVPR 2017 in Hawaii.
Keyphrase extraction and classification have uses ranging from chatbots to scientific analysis. We were just looking into it the other day as a way to automate tag generation. This latest paper sets a new state of the art on the SemEval-2017 Task 10 and ACL RD-TEC 2.0 datasets.
Semantic Role Labeling took a big step forward today. The newly proposed Syntax Aware LSTM model reset the benchmark on the challenging Chinese Proposition Bank dataset, achieving a new state-of-the-art F1 score of 79.60%.
April 03 2017
GANs just keep getting better and better. Last Friday, researchers from Google posted some really solid results using a pretty simple (for GANs) model architecture. They put an auto-encoder in the discriminator and combined it with a training procedure built on Wasserstein GANs. It allows you to control the dial on diversity vs. realism in the generated images. Oh, and the results look great too.
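The description above matches BEGAN, although the text does not name it, so treat that as an assumption. Under that assumption, the "dial" is a ratio `gamma` between the auto-encoder's reconstruction loss on generated and real images, held in place by a running balance term `k`. A minimal sketch of that bookkeeping, with the reconstruction losses passed in as plain numbers and all names chosen for illustration:

```python
def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One balance update for a BEGAN-style objective (illustrative only).

    loss_real: auto-encoder reconstruction loss on real images
    loss_fake: auto-encoder reconstruction loss on generated images
    k:         current balance term, kept in [0, 1]
    gamma:     diversity ratio; lower values favour realism over diversity
    """
    d_loss = loss_real - k * loss_fake   # discriminator objective
    g_loss = loss_fake                   # generator objective
    # Nudge k toward the equilibrium loss_fake == gamma * loss_real.
    k_next = k + lambda_k * (gamma * loss_real - loss_fake)
    k_next = min(max(k_next, 0.0), 1.0)
    # Scalar convergence measure: lower is better.
    m_global = loss_real + abs(gamma * loss_real - loss_fake)
    return d_loss, g_loss, k_next, m_global


d_loss, g_loss, k, m = began_step(loss_real=1.0, loss_fake=0.2, k=0.0, lambda_k=0.1)
print(d_loss, g_loss, k, m)
```

In a real training loop the two losses would come from the auto-encoder discriminator on each batch; only the update rule is shown here.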
Looking at the images in this paper, it’s easy to see a correlation between machine learning and archeology. The authors propose a new image problem: using small patches of an image to reconstruct the full image. They train a custom GAN with a spatial loss on multiple object-specific datasets including faces, waterfalls, cars, and ceramics and show its ability to generate images from the small patches. To fuel future research, the datasets that were used in this paper were also published.
This completely unsupervised model learns to detect and segment foreground objects in images. The teacher (VideoPCA, an unsupervised video segmentation model) learns to recognize objects in videos while the student (a deep neural net) attempts to match the teacher’s prediction for each frame. By the end of training, the student outperforms the teacher, works on single frames, and supports new, unseen classes. This model achieves state-of-the-art results on both the Object Discovery Dataset and the YouTube-Objects Dataset.
March 31 2017
Words are hard. It’s one thing to generate a caption for an image (a difficult problem); it’s another to generate human-esque captions (a very difficult problem). Patterns of speech are complex, and the uncanny valley is wide. With all the effort being put into GANs lately, it was only a matter of time before someone used them to generate better captions. The discriminator compares a set of generated sentences to both the image and each other.
From the group that brought you pix2pix comes CycleGAN. CycleGAN learns to translate partial images from one domain or style to another (e.g. turning a horse into a zebra) without requiring the matching image pairs that pix2pix depends on. Removing the image pairs and relying on the network to learn a latent representation of each style makes this useful in the real world. If you’re interested in this area, CVAE-GAN is another similar paper from last week.
Want a more efficient attention mechanism that saves on computation time? Baidu Research did. They just introduced a new model that uses policy optimization and two new actions: continue and stop. This is useful because it allows the network to learn how long to spend attending to a target. Even better, they released a git repo with source code. Before you run off into the wilds, one thing to keep in mind is that it still requires a few steps of training and fine-tuning (strongly supervised).
Predicting protein-ligand interactions is a computationally intensive, domain-specific problem that requires expert knowledge to model accurately. To make it more data-driven, the authors present an Atomic Convolutional Network (ACNN). This network utilizes a 3D convolution for learning physical relationships between atoms and predicts their stability. The more stable, the better. Comparing the network’s output to experimental results shows that ACNNs stand toe-to-toe with current methods. This is the first end-to-end fully-differentiable model of protein-ligand interactions.
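The summary above doesn't spell out the training objective. The idea that gives CycleGAN its name is cycle consistency: translating an image to the other domain and back should return something close to the original, which is what lets training proceed without paired examples. A minimal sketch of that one loss term, using toy list "images" and hypothetical mapping functions `g` and `f` in place of real networks (the adversarial losses are omitted):

```python
def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x, y, g, f):
    """Cycle loss for one unpaired sample from each domain.

    x: sample from domain X, y: sample from domain Y
    g: maps X -> Y, f: maps Y -> X
    """
    forward = l1(f(g(x)), x)    # X -> Y -> X should recover x
    backward = l1(g(f(y)), y)   # Y -> X -> Y should recover y
    return forward + backward


# Toy stand-ins for the two generators (hypothetical, not trained networks).
g = lambda img: [2.0 * p for p in img]
f_good = lambda img: [p / 2.0 for p in img]
f_off = lambda img: [p / 2.0 + 0.1 for p in img]

x, y = [0.1, 0.4, 0.8], [0.2, 0.6, 1.0]
print(cycle_consistency_loss(x, y, g, f_good))  # exact inverse, so no penalty
print(cycle_consistency_loss(x, y, g, f_off))   # imperfect inverse is penalised
```

Note that `x` and `y` are unrelated samples; nothing in the loss requires them to depict the same scene.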
March 30 2017
Sentiment analysis of photos, plus a new public dataset. The authors combine adjective-noun pairs such as ‘bad dog’ or ‘cute dog’ for sentiment information and CNNs for visual contextual information. The dataset consists of ~12k images taken while wearing a body camera. Overall it’s an interesting read and gives a decent background on the issues involved.
State Of The Art Results
- Apr 25 End to End Module Networks
- Apr 13 General Approach to Real World Text Extraction
- Apr 13 MAGAN, Better than BEGAN
- Apr 12 New SOTA for VQA 1.0
- Apr 11 Predicting Recommendations with TransNets
- Apr 7 Use Machine Learning to Write Your Code For You
- Apr 5 Build A Faster Image Search
- Apr 5 2D to 3D Depth in Noisy Environments
- Apr 4 New State of The Art on Keyphrase Boundary Classification
- Apr 4 New State of the Art In Semantic Role Labeling
- Apr 21 SREFI: Synthesis of Realistic Example Face Images
- Apr 15 Neural Paraphrase Identification of Questions with Noisy Pretraining
- Apr 13 3d Point Cloud Dataset and Benchmark
- Apr 10 Loss Max-Pooling for Semantic Image Segmentation
- Apr 9 BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis
- Apr 6 A Low Altitude Geo-Referenced Drone Dataset
- Apr 3 Auto-Encode Your Way to Realistic Images
- Mar 21 Boost Your Cross-Media Retrieval Process with Twitter100k
- Mar 21 Counterfactual Fairness: Combat the Inherent Social Biases of Your Dataset
- Apr 24 Accelerated Nearest Neighbor Search with Quick ADC
- Apr 24 Consistency of community detection in multi-layer networks using spectral and matrix factorization methods
- Apr 24 A Saddle Point Approach to Structured Low-rank Matrix Learning in Large-scale Applications
- Apr 24 Detecting and Recognizing Human-Object Interactions
- Apr 24 A Trie-Structured Bayesian Model for Unsupervised Morphological Segmentation
- Apr 24 Accurate Optical Flow via Direct Cost Volume Processing
- Apr 24 Elite Bases Regression: A Real-time Algorithm for Symbolic Regression
- Apr 24 A Real-time Hand Gesture Recognition and Human-Computer Interaction System
- Apr 24 Measuring the Accuracy of Object Detectors and Trackers
- Apr 24 Joint Modeling of Text and Acoustic-Prosodic Cues for Neural Parsing
- Apr 24 Fast PET reconstruction using Multi-scale Fully Convolutional Neural Networks
- Apr 24 Supervised Adversarial Networks for Image Saliency Detection
- Apr 24 Automatic Liver Lesion Segmentation Using A Deep Convolutional Neural Network Method
- Apr 24 Learning from Comparisons and Choices
- Apr 24 Entropic Trace Estimates for Log Determinants
- Apr 24 Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM
- Apr 24 What is the Essence of a Claim? Cross-Domain Claim Identification
- Apr 24 Reinforcement Learning Based Dynamic Selection of Auxiliary Objectives with Preserving of the Best Found Solution
- Apr 24 Stochastic Constraint Programming as Reinforcement Learning
- Apr 24 Monocular Visual Odometry with a Rolling Shutter Camera