arxivst — stuff from arXiv that you should probably bookmark

Want Less Noise in Your Images? Add Semantic Information

When training an image classifier, one of the things we normally do is add noise to the dataset. This paper tackles the inverse problem (denoising) and adds semantic information to the images. Using ImageNet as their test set, the results are really compelling and show tons of promise for future research.

imagenet denoising classification

A Low Altitude Geo-Referenced Drone Dataset

Change-detection datasets can be hard to come by, especially low-altitude, geolocated drone datasets. While there’s a lot of opportunity for misuse of this research and we’re uncomfortable with some of the scenarios presented, there are also plenty of humanitarian use cases. Drone delivery of medication to natural disaster areas is just one that comes to mind.

drone change-detection mosaics dataset

April 05 2017

Build A Faster Image Search

Want to build a faster, better image search? Combine your hashing and aggregating systems. Or at least that’s the advice from a new paper out of Baidu Research yesterday. The storage space needed gets a bit larger, but you reap the benefits of a much faster lookup.

oxford5k holidays image-search baidu state-of-the-art

2D to 3D Depth in Noisy Environments

This latest paper has state-of-the-art results getting depth information out of noisy data, working in situations where the 3D space output partitions are unknown. There are still a couple of downsides, though: you’ll need a large number of frames for filtering, and it doesn’t guess at what it can’t see.

state-of-the-art 2d-3d-depth

April 04 2017

2017 DAVIS Challenge and Dataset

Ooh, this is interesting. There’s a new DAVIS Challenge for 2017. It comes with a beautiful updated dataset composed of some new videos and some old videos that were relabeled with multiple objects. There’s been a lot of research in the area of semi-supervised video object detection lately, so we expect there will be some strong competitors. Especially since the results of the public challenge will be presented during a workshop at CVPR 2017 in Hawaii.

davis-challenge video object-segmentation

New State of the Art on Keyphrase Boundary Classification

Keyphrase extraction and classification have uses ranging from chatbots to scientific analysis. We were just looking into them the other day as a way to automate tag generation. This latest paper sets a new state of the art on the SemEval-2017 Task 10 and ACL RD-TEC 2.0 datasets.

state-of-the-art semeval-2017 acl-rd-tec

New State of the Art In Semantic Role Labeling

Semantic Role Labeling took a big step forward today. The newly proposed Syntax-Aware LSTM model resets the benchmark on the challenging Chinese Proposition Bank dataset, achieving a new state-of-the-art F1 score of 79.60%.

state-of-the-art semantic-role-labeling cpb-dataset

April 03 2017

Auto-Encode Your Way to Realistic Images

GANs just keep getting better and better. Last Friday, researchers from Google posted some really solid results using a pretty simple (for GANs) model architecture. They put an auto-encoder in the discriminator and combined it with a training procedure built on Wasserstein GANs. The approach lets you control the dial between diversity and realism in the generated images. Oh, and the results look great too.

gan dataset
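For the curious, the diversity-vs-realism dial described above comes down to a single scalar (often written γ) feeding a running balance term between the discriminator’s reconstruction losses on real and generated images. A minimal sketch of that update — the names and default values here are illustrative, not the paper’s exact settings:

```python
def update_k(k, loss_real, loss_fake, gamma=0.5, lam=0.001):
    """One step of the equilibrium balance term.

    gamma is the diversity-vs-realism dial: lower values push the
    generator toward realism, higher values toward diversity.
    loss_real / loss_fake are the discriminator's auto-encoder
    reconstruction losses on real and generated images.
    """
    k = k + lam * (gamma * loss_real - loss_fake)
    return min(max(k, 0.0), 1.0)  # keep the balance term in [0, 1]
```

During training, the discriminator’s loss on generated images is weighted by this term, which automatically eases the discriminator off when the generator is falling behind.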

Reassembling Image Fragments

Looking at the images in this paper, it’s easy to see a connection between machine learning and archeology. The authors propose a new image problem: reconstructing a full image from small patches of it. They train a custom GAN with a spatial loss on multiple object-specific datasets, including faces, waterfalls, cars, and ceramics, and show its ability to generate full images from the small patches. To fuel future research, the datasets used in the paper were also published.

gan wgan ebgan

State-Of-The-Art Foreground Object Detection

This completely unsupervised model learns to detect and segment foreground objects in images. The teacher (VideoPCA, an unsupervised video segmentation model) learns to pick out foreground objects in videos, while the student (a deep neural net) attempts to match the teacher’s prediction for each frame. By the end of training, the student outperforms the teacher, works on single frames, and supports new, unseen classes. This model achieves state of the art on both the Object Discovery dataset and the YouTube-Objects dataset.

state-of-the-art object-discovery-dataset youtube-objects video
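The teacher-student mechanics above boil down to a per-frame distillation loop. Here’s a toy numpy sketch under loud assumptions: the “student” is a single per-pixel logistic unit rather than the paper’s deep net, and the teacher masks stand in for VideoPCA output — all names are illustrative:

```python
import numpy as np

def student_predict(frame, w, b):
    """Per-pixel sigmoid 'student' -- a stand-in for the deep net."""
    return 1.0 / (1.0 + np.exp(-(w * frame + b)))

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between student prediction and teacher mask."""
    p = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)).mean()

def distill_step(frame, teacher_mask, w, b, lr=1.0):
    """One gradient step pulling the student toward the teacher's mask."""
    err = student_predict(frame, w, b) - teacher_mask  # dBCE/dlogit
    w -= lr * (err * frame).mean()
    b -= lr * err.mean()
    return w, b
```

The key property carries over even in this toy form: once trained, the student needs only a single frame, which is why it can outgrow its video-bound teacher.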

March 31 2017

Generate Human-like Image Captions

Words are hard. It’s one thing to generate a caption for an image (a difficult problem); it’s another to generate human-esque captions (a very difficult problem). Patterns of speech are complex, and the uncanny valley is wide. With all the effort being put into GANs lately, it was only a matter of time before someone used them to generate better captions. The discriminator compares a set of generated sentences to both the image and each other.


Horses to Zebras and Back Again With CycleGAN

From the group that brought you pix2pix comes CycleGAN. CycleGAN learns to translate images from one domain or style to another (e.g. turning a horse into a zebra) without requiring matched image pairs, a limitation of pix2pix. Removing the paired data requirement and relying on the network to learn a latent representation of each style makes this useful in the real world. If you’re interested in this area, CVAE-GAN is another similar paper from last week.

gan images
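The trick that replaces paired data is a cycle-consistency constraint: translating to the other domain and back should return the original image. A hedged numpy sketch of that loss, with G and F as stand-in callables rather than the paper’s generator networks:

```python
import numpy as np

def cycle_consistency_loss(G, F, x, y):
    """L1 penalty on x -> Y -> X and y -> X -> Y round trips.

    G maps domain X to Y (e.g. horse -> zebra), F maps Y back to X.
    If the two generators are consistent, both terms go to zero.
    """
    forward = np.abs(F(G(x)) - x).mean()   # reconstruct x after a round trip
    backward = np.abs(G(F(y)) - y).mean()  # reconstruct y after a round trip
    return forward + backward
```

In the paper this term is added to the usual adversarial losses for both translation directions; it’s what keeps the output anchored to the input instead of collapsing to an arbitrary image in the target style.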

Save Time by Teaching Your Model When to Stop

Want a more efficient attention mechanism that saves on computation time? Baidu Research did. They just introduced a new model that uses policy optimization and two new actions: continue and stop. This is useful because it allows the network to learn how long to spend attending to a target. Even better, they released a Git repo with source code. Before you run off into the wilds, one thing to keep in mind is that it still requires a few steps of training and fine-tuning (strongly supervised).

attention dynamic-computation-time reinforcement-learning

Speed Up Your Drug Design with Atomic Convolutional Networks

Predicting protein-ligand interactions is a computationally intensive and domain-specific problem, one that requires expert knowledge to model accurately. To make it more data-driven, the authors present the Atomic Convolutional Network (ACNN). The network uses a 3D convolution to learn physical relationships between atoms and predict the stability of their interactions; the more stable, the better. Comparing the network’s output to experimental results shows that ACNNs stand toe-to-toe with current methods. This is the first end-to-end, fully differentiable model of protein-ligand interactions.

cnn chemistry

March 30 2017

Inferring Feelings from Photos

Sentiment analysis of photos, plus a new public dataset. The authors combine adjective-noun pairs such as ‘bad dog’ or ‘cute dog’ for sentiment information with CNNs for visual contextual information. The dataset consists of ~12k images taken with a body camera. Overall it’s an interesting read and gives a decent background on the issues involved.

images sentiment-analysis
