arXiv Paper Daily: Fri, 9 Jun 2017

Neural and Evolutionary Computing

Spatio-Temporal Backpropagation for Training High-performance Spiking Neural Networks

Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Luping Shi

Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

Compared with artificial neural networks (ANNs), spiking neural networks

(SNNs) are promising for exploring brain-like behaviors, since spikes can

encode more spatio-temporal information. Although pre-training from ANN or

direct training based on backpropagation (BP) makes the supervised training of

SNNs possible, these methods only exploit the networks’ spatial domain

information, which leads to a performance bottleneck and requires many

complicated training techniques. One fundamental issue is that spike activity

is naturally non-differentiable, which causes great difficulties in training

SNNs. To this end, we build an iterative leaky integrate-and-fire (LIF) model that is better suited to

gradient descent training. By simultaneously considering the layer-by-layer

spatial domain (SD) and the timing-dependent temporal domain (TD) in the

training phase, as well as an approximated derivative for the spike activity,

we propose a spatio-temporal backpropagation (STBP) training framework without

using any complicated techniques. We achieve the best multi-layer

perceptron (MLP) performance compared with existing state-of-the-art

algorithms on the static MNIST and dynamic N-MNIST datasets as well as a

custom object detection dataset. This work provides a new perspective for

exploring high-performance SNNs for future brain-like computing paradigms with

rich spatio-temporal dynamics.
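
As a rough illustration of the two ingredients the abstract names, the sketch below iterates a leaky integrate-and-fire neuron in discrete time and pairs the non-differentiable spike with a rectangular surrogate derivative. The update rule, the decay factor, the threshold, the window width and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
def lif_step(u, spike, x, decay=0.5, threshold=1.0):
    """One discrete-time LIF update (assumed form): reset after a spike, leak, integrate."""
    u_next = decay * u * (1.0 - spike) + x
    spike_next = 1.0 if u_next >= threshold else 0.0
    return u_next, spike_next


def surrogate_grad(u, threshold=1.0, width=1.0):
    """Rectangular stand-in for d(spike)/d(u): 1/width inside a window around the threshold."""
    return 1.0 / width if abs(u - threshold) < width / 2.0 else 0.0


def run_lif(inputs, decay=0.5, threshold=1.0):
    """Feed a sequence of input currents through one neuron and record its state."""
    u, s = 0.0, 0.0
    potentials, spikes = [], []
    for x in inputs:
        u, s = lif_step(u, s, x, decay, threshold)
        potentials.append(u)
        spikes.append(s)
    return potentials, spikes


# Constant input: the potential builds up, crosses the threshold once, then resets.
potentials, spikes = run_lif([0.6, 0.6, 0.6, 0.6])
```

Because each state depends on the previous time step as well as the previous layer, gradients in such a model can flow through both the layer dimension and the time dimension, which is the general idea behind combining the spatial and temporal domains.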

Surprise Search for Evolutionary Divergence

Daniele Gravina, Antonios Liapis, Georgios N. Yannakakis

Subjects: Neural and Evolutionary Computing (cs.NE)

Inspired by the notion of surprise for unconventional discovery, we introduce

a general search algorithm, which we name surprise search, as a new method of

evolutionary divergent search. Surprise search is grounded in the divergent

search paradigm and is fabricated within the principles of evolutionary search.

The algorithm mimics the self-surprise cognitive process and equips

evolutionary search with the ability to seek solutions that deviate from

the algorithm’s expected behaviour. The predictive model of expected solutions

is based on historical trails of where the search has been and local

information about the search space. Surprise search is tested extensively in a

robot maze navigation task: experiments are held in four authored deceptive

mazes and in 60 generated mazes and compared against objective-based

evolutionary search and novelty search. The key findings of this study reveal

that surprise search is advantageous compared to the other two search

processes. In particular, it outperforms objective search and it is as

efficient as novelty search in all tasks examined. Most importantly, surprise

search is faster, on average, and more robust in solving the navigation problem

compared to any other algorithm examined. Finally, our analysis reveals that

surprise search explores the behavioural space more extensively and yields

higher population diversity compared to novelty search. What distinguishes

surprise search from other forms of divergent search, such as the search for

novelty, is its ability to diverge not from earlier and seen solutions but

rather from predicted and unseen points in the domain considered.
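
To make the difference from novelty search concrete, here is a minimal sketch of a surprise score: a prediction is extrapolated from where the search has recently been, and a candidate is rewarded for landing far from that prediction. The linear extrapolation from two past points, the 2-D behaviour space and the function names are simplifying assumptions for illustration, not the predictive model used in the paper.

```python
import math


def predict_next(prev2, prev1):
    """Linearly extrapolate the next expected behaviour from the two latest points (assumed model)."""
    return tuple(2.0 * b - a for a, b in zip(prev2, prev1))


def surprise_score(behaviour, predicted):
    """Euclidean distance from the prediction; a larger value means a more surprising candidate."""
    return math.sqrt(sum((b - p) ** 2 for b, p in zip(behaviour, predicted)))


# Hypothetical trail of the search in a 2-D behaviour space (e.g. final robot positions).
history = [(0.0, 0.0), (1.0, 1.0)]
expected = predict_next(history[-2], history[-1])

# Candidates are ranked by how far they deviate from the expected point.
candidates = [(2.0, 2.0), (2.0, 0.0), (0.5, 3.5)]
ranked = sorted(candidates, key=lambda c: surprise_score(c, expected), reverse=True)
```

Novelty search would instead score each candidate by its distance to previously seen behaviours; here the reference point is one that has not been visited yet.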

Where is my forearm? Clustering of body parts from simultaneous tactile and linguistic input using sequential mapping

Karla Stepanova, Matej Hoffmann, Zdenek Straka, Frederico B. Klein, Angelo Cangelosi, Michal Vavrecka

Comments: pp. 155-162

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Robotics (cs.RO)

Humans and animals are constantly exposed to a continuous stream of sensory

information from different modalities. At the same time, they form more

compressed representations like concepts or symbols. In species that use

language, this process is further structured by this interaction, where a

mapping between the sensorimotor concepts and linguistic elements needs to be

established. There is evidence that children might be learning language by

simply disambiguating potential meanings based on multiple exposures to

utterances in different contexts (cross-situational learning). In existing

models, the mapping between modalities is usually found in a single step by

directly using frequencies of referent and meaning co-occurrences. In this

paper, we present an extension of this one-step mapping and introduce a newly

proposed sequential mapping algorithm together with a publicly available Matlab

implementation. For demonstration, we have chosen a less typical scenario:

instead of learning to associate objects with their names, we focus on body

representations. A humanoid robot is receiving tactile stimulations on its

body, while at the same time listening to utterances of the body part names

(e.g., hand, forearm and torso). With the goal of arriving at the correct “body

categories”, we demonstrate how a sequential mapping algorithm outperforms

one-step mapping. In addition, the effects of data set size and noise in the

linguistic input are studied.
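
The one-step baseline described above can be sketched as a co-occurrence count followed by a single argmax per word. The toy observations, the cluster labels and the function name are hypothetical; the sequential mapping algorithm proposed in the paper (and its Matlab implementation) is not reproduced here.

```python
from collections import Counter, defaultdict


def one_step_mapping(observations):
    """Map each word to the referent it co-occurs with most often across all situations."""
    counts = defaultdict(Counter)
    for words, referents in observations:
        for word in words:
            for referent in referents:
                counts[word][referent] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


# Each situation pairs the words heard with the tactile clusters active at that time.
observations = [
    ({"hand", "forearm"}, {"cluster_A", "cluster_B"}),
    ({"hand", "torso"}, {"cluster_A", "cluster_C"}),
    ({"forearm", "torso"}, {"cluster_B", "cluster_C"}),
]
mapping = one_step_mapping(observations)
```

No single situation is unambiguous, yet the counts accumulated over several situations single out one referent per word, which is the cross-situational effect the abstract refers to.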

Learning Local Receptive Fields and their Weight Sharing Scheme on Graphs

Jean-Charles Vialatte, Vincent Gripon, Gilles Coppin

Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

To achieve state-of-the-art results on challenges in vision, Convolutional

Neural Networks learn stationary filters that take advantage of the underlying

image structure. Our purpose is to propose an efficient layer formulation that

extends this property to any domain described by a graph. Namely, we use the

support of its adjacency matrix to design learnable weight sharing filters able

to exploit the underlying structure of signals. The proposed formulation makes

it possible to learn the weights of the filter as well as a scheme that

controls how they are shared across the graph. We perform validation

experiments with image datasets and show that these filters offer performance

comparable with convolutional ones.

Reading Twice for Natural Language Understanding

Dirk Weissenborn

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Despite the recent success of neural networks in tasks involving natural

language understanding (NLU), there has been only limited progress in some of

the fundamental challenges of NLU, such as the disambiguation of the meaning

and function of words in context. This work approaches this problem by

incorporating contextual information into word representations prior to

processing the task at hand. To this end we propose a general-purpose reading

architecture that is employed prior to a task-specific NLU model. It is

responsible for refining context-agnostic word representations with contextual

information and lends itself to the introduction of additional,

context-relevant information from external knowledge sources. We demonstrate

that previously non-competitive models benefit dramatically from employing

contextual representations, closing the gap between general-purpose reading

architectures and the state-of-the-art performance obtained with fine-tuned,

task-specific architectures. Apart from our empirical results we present a

comprehensive analysis of the computed representations which gives insights

into the kind of information added during the refinement process.

ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks

Denis A. Gudovskiy, Luca Rigazio

Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

In this paper we introduce ShiftCNN, a generalized low-precision architecture

for inference of multiplierless convolutional neural networks (CNNs). ShiftCNN

is based on a power-of-two weight representation and, as a result, performs

only shift and addition operations. Furthermore, ShiftCNN substantially reduces

computational cost of convolutional layers by precomputing convolution terms.

Such an optimization can be applied to any CNN architecture with a relatively

small codebook of weights and decreases the number of product

operations by at least two orders of magnitude. The proposed architecture

targets custom inference accelerators and can be realized on FPGAs or ASICs.

Extensive evaluation on ImageNet shows that the state-of-the-art CNNs can be

converted without retraining into ShiftCNN with less than 1% drop in accuracy

when the proposed quantization algorithm is employed. RTL simulations,

targeting modern FPGAs, show that power consumption of convolutional layers is

reduced by a factor of 4 compared to conventional 8-bit fixed-point

architectures.
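
The reason a power-of-two weight representation removes multipliers can be shown in a few lines: once a weight is rounded to sign × 2^e, multiplying an integer activation by it is a bit shift. The single-term rounding, the exponent range and the function names below are assumptions for illustration; the abstract does not describe the paper's quantization algorithm or codebook, and they are not reproduced here.

```python
import math


def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Round a weight to the nearest sign * 2**e in the log domain; return (sign, e)."""
    if w == 0.0:
        return 0, min_exp
    sign = 1 if w > 0 else -1
    e = int(round(math.log2(abs(w))))
    e = max(min_exp, min(max_exp, e))  # clamp to the assumed exponent range
    return sign, e


def shift_multiply(x, sign, e):
    """Multiply a non-negative integer activation x by sign * 2**e using shifts only."""
    if sign == 0:
        return 0
    shifted = x << e if e >= 0 else x >> -e
    return sign * shifted


sign, e = quantize_pow2(0.24)          # 0.24 is rounded to 2**-2 = 0.25
product = shift_multiply(64, sign, e)  # 64 * 0.25 computed as 64 >> 2
```

Right shifts discard low-order bits, so a real fixed-point datapath would also need to decide on rounding and on how negative activations are handled.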

Computer Vision and Pattern Recognition

Structured Light Phase Measuring Profilometry Pattern Design for Binary Spatial Light Modulators

Daniel L. Lau, Yu Zhang, Kai Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Structured light illumination is an active 3-D scanning technique based on

projecting/capturing a set of striped patterns and measuring the warping of the

patterns as they reflect off a target object’s surface. In the case of phase

measuring profilometry (PMP), the projected patterns are composed of a rolling

sinusoidal wave, but as a set of time-multiplexed patterns, PMP requires the

target surface to remain motionless or for scanning to be performed at such

high rates that any movement is small. But high speed scanning places a

significant burden on the projector electronics to produce contone patterns

inside of short exposure intervals. Binary patterns are, therefore, of great

value, but converting contone patterns into binary comes with significant risk.

As such, this paper introduces a contone-to-binary conversion algorithm for

deriving binary patterns that best mimic their contone counterparts.

Experimental results show a greater than threefold reduction in pattern

noise over traditional halftoning procedures.
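
For readers unfamiliar with PMP, the sketch below shows the standard N-step phase recovery that the rolling sinusoidal patterns make possible. It assumes ideal contone patterns of the form I_n = A + B·cos(φ − 2πn/N) with N ≥ 3; the offset, amplitude and function names are illustrative assumptions, and the contone-to-binary conversion proposed in the paper is not shown.

```python
import math


def simulate_pixel(phase, n_patterns=4, offset=0.5, amplitude=0.4):
    """Intensities one camera pixel would see under N phase-shifted sinusoidal patterns."""
    return [
        offset + amplitude * math.cos(phase - 2.0 * math.pi * n / n_patterns)
        for n in range(n_patterns)
    ]


def pmp_phase(intensities):
    """Recover the wrapped phase from N >= 3 equally shifted patterns (N-step formula)."""
    n_patterns = len(intensities)
    s = sum(i * math.sin(2.0 * math.pi * n / n_patterns) for n, i in enumerate(intensities))
    c = sum(i * math.cos(2.0 * math.pi * n / n_patterns) for n, i in enumerate(intensities))
    return math.atan2(s, c)


recovered = pmp_phase(simulate_pixel(1.0))
```

Any deviation of a projected binary pattern from its sinusoidal counterpart perturbs the sums s and c, which is one way to see why pattern noise translates directly into phase error.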

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He

Comments: Tech report

Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Deep learning thrives with large neural networks and large datasets. However,

larger networks and larger datasets result in longer training times that impede

research and development progress. Distributed synchronous SGD offers a

potential solution to this problem by dividing SGD minibatches over a pool of

parallel workers. Yet to make this scheme efficient, the per-worker workload

must be large, which implies nontrivial growth in the SGD minibatch size. In

this paper, we empirically show that on the ImageNet dataset large minibatches

cause optimization difficulties, but when these are addressed the trained

networks exhibit good generalization. Specifically, we show no loss of accuracy

when training with large minibatch sizes up to 8192 images. To achieve this

result, we adopt a linear scaling rule for adjusting learning rates as a

function of minibatch size and develop a new warmup scheme that overcomes

optimization challenges early in training. With these simple techniques, our

Caffe2-based system trains ResNet-50 with a minibatch size of 8192 on 256 GPUs

in one hour, while matching small minibatch accuracy. Using commodity hardware,

our implementation achieves ~90% scaling efficiency when moving from 8 to 256

GPUs. This system enables us to train visual recognition models on

internet-scale data with high efficiency.
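
The two techniques named in the abstract can be sketched as a single learning-rate function: scale the rate in proportion to the minibatch size, and ramp up to that value gradually at the start of training. The reference rate of 0.1 per 256 images, the 5-epoch warmup and the function name are illustrative assumptions, since the abstract itself gives no hyperparameters.

```python
def learning_rate(epoch, batch_size, base_lr=0.1, base_batch=256, warmup_epochs=5):
    """Linear scaling rule with a gradual warmup (all default values are illustrative)."""
    # Linear scaling: multiplying the minibatch size by k multiplies the learning rate by k.
    target = base_lr * batch_size / base_batch
    # Warmup: ramp linearly from the small-batch rate to the scaled rate.
    if epoch < warmup_epochs:
        return base_lr + (target - base_lr) * epoch / warmup_epochs
    return target


# `epoch` may be fractional, so the rate can be updated every iteration.
schedule = [learning_rate(e, 8192) for e in (0, 2.5, 5, 10)]
```

With these assumed defaults, a minibatch of 8192 images starts at the small-batch rate and reaches 32 times that rate once the warmup ends.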

An Efficient Approach for Object Detection and Tracking of Objects in a Video with Variable Background

Kumar S. Ray, Soma Chakraborty

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a novel approach to create an automated visual

surveillance system which is very efficient in detecting and tracking moving

objects in a video captured by a moving camera without any a priori information

about the captured scene. Separating foreground from the background is a

challenging job in videos captured by a moving camera, as both foreground and

background information change in every consecutive frame of the image

sequence; thus a pseudo-motion is perceptible in the background. In the proposed

algorithm, the pseudo-motion in the background is estimated and compensated using

phase correlation of consecutive frames based on the principle of the Fourier shift

theorem. Then a method is proposed to model an acting background from the recent

history of commonality of the current frame, and the foreground is detected by

the differences between the background model and the current frame. Further

exploiting the recent history of dissimilarities of the current frame, actual

moving objects are detected in the foreground. Next, a two-stepped

morphological operation is proposed to refine the object region for an optimum

object size. Each object is attributed by its centroid, dimension and three

highest peaks of its gray value histogram. Finally, each object is tracked

using a Kalman filter based on its attributes. The major advantage of this

algorithm over most of the existing object detection and tracking algorithms is

that it does not require initialization of object position in the first frame

or training on sample data to perform. Performance of the algorithm is tested

on benchmark videos containing variable background, and very satisfactory results

are achieved. The performance of the algorithm is also comparable with some of

the state-of-the-art algorithms for object detection and tracking.
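
The background-compensation step relies on the Fourier shift theorem: a translation in space becomes a linear phase ramp in frequency, so the normalised cross-power spectrum of two frames transforms back to a peak at the displacement. The sketch below shows this on a 1-D circularly shifted signal with a naive DFT to stay dependency-free; the signal values and function names are hypothetical, and the paper applies the idea to 2-D video frames.

```python
import cmath


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward, or inverse with 1/n scaling)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def phase_correlation_shift(reference, shifted, eps=1e-12):
    """Estimate the circular shift that maps `reference` onto `shifted`."""
    f = dft(reference)
    g = dft(shifted)
    cross = []
    for fk, gk in zip(f, g):
        r = gk * fk.conjugate()
        mag = abs(r)
        # Keep only the phase; skip bins with (near-)zero energy.
        cross.append(r / mag if mag > eps else 0j)
    corr = dft(cross, inverse=True)
    # The correlation surface peaks at the displacement.
    return max(range(len(corr)), key=lambda i: corr[i].real)


reference = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 5.0, 1.0]
shifted = reference[-3:] + reference[:-3]  # circular shift to the right by 3 samples
estimated = phase_correlation_shift(reference, shifted)
```

Once the global shift is known, the previous frame can be translated by that amount so that the remaining differences come from objects moving relative to the background.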

Generative Autotransporters

Jiqing Wu, Zhiwu Huang, Wen Li, Luc Van Gool

Comments: *First two authors made equal contributions. Submitted to NIPS on May 19, 2017

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

In this paper, we aim to introduce the classic Optimal Transport theory to

enhance deep generative probabilistic modeling. For this purpose, we design a

Generative Autotransporter (GAT) model with explicit distribution optimal

transport. In particular, the GAT model has a deep distribution transporter to

transfer the target distribution to a specific prior probability distribution,

which enables a regular decoder to generate target samples from the input data

that follows the transported prior distribution. With such a design, the GAT

model can be stably trained to generate novel data by merely using a very

simple $l_1$ reconstruction loss function with a generalized manifold-based

Adam training algorithm. The experiments on two standard benchmarks demonstrate

its strong generation ability.

ToxTrac: a fast and robust software for tracking organisms

Alvaro Rodriquez, Hanqing Zhang, Jonatan Klaminder, Tomas Brodin, Patrik L. Andersson, Magnus Andersson

Comments: File contains supplementary materials (user guide)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

1. Behavioral analysis based on video recording is becoming increasingly

popular within research fields such as ecology, medicine, ecotoxicology, and

toxicology. However, the available programs for analyzing the data that are

free of cost, user-friendly, versatile, robust and fast, and that provide reliable

statistics for different organisms (invertebrates, vertebrates and mammals) are

significantly limited.

2. We present an automated open-source executable software (ToxTrac) for

image-based tracking that can simultaneously handle several organisms monitored

in a laboratory environment. We compare the performance of ToxTrac with currently

accessible programs on the web.

3. The main advantages of ToxTrac are: i) no specific knowledge of the

geometry of the tracked bodies is needed; ii) processing speed, ToxTrac can

operate at a rate >25 frames per second in HD videos using modern desktop

computers; iii) simultaneous tracking of multiple organisms in multiple arenas;

iv) integrated distortion correction and camera calibration; v) robust against

false positives; vi) preservation of individual identification if crossing

occurs; vii) useful statistics and heat maps in real scale are exported in

image, text and Excel formats.

4. ToxTrac can be used for high speed tracking of insects, fish, rodents or

other species, and provides useful locomotor information. We suggest using

ToxTrac for future studies of animal behavior independent of research area.

Download ToxTrac here: this https URL

Learning Deep Representations for Scene Labeling with Guided Supervision

Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Comments: 13 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Scene labeling is a challenging classification problem where each input image

requires a pixel-level prediction map. Recently, deep-learning-based methods

have shown their effectiveness in solving this problem. However, we argue that

the large intra-class variation provides ambiguous training information and

hinders the deep models’ ability to learn more discriminative deep feature

representations. Unlike existing methods that mainly utilize semantic context

for regularizing or smoothing the prediction map, we design novel supervisions

from semantic context for learning better deep feature representations. Two

types of semantic context, scene names of images and label map statistics of

image patches, are exploited to create label hierarchies between the original

classes and newly created subclasses as the learning supervisions. Such

subclasses show lower intra-class variation, and help CNNs detect more

meaningful visual patterns and learn more effective deep features. Novel

training strategies and network structures that take advantage of such label

hierarchies are introduced. Our proposed method is evaluated extensively on

four popular datasets, Stanford Background (8 classes), SIFTFlow (33 classes),

Barcelona (170 classes) and LM+Sun datasets (232 classes) with 3 different

network structures, and shows state-of-the-art performance. The experiments

show that our proposed method makes deep models learn more discriminative

feature representations without increasing model size or complexity.

Automatic tracking of vessel-like structures from a single starting point

Dario Augusto Borges Oliveira, Laura Leal-Taixe, Raul Queiroz Feitosa, Bodo Rosenhahn

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The identification of vascular networks is an important topic in the medical

image analysis community. While most methods focus on single vessel tracking,

the few solutions that exist for tracking complete vascular networks are

usually computationally intensive and require a lot of user interaction. In

this paper we present a method to track full vascular networks iteratively

using a single starting point. Our approach is based on a cloud of sampling

points distributed over concentric spherical layers. We also propose a vessel

model and a metric of how well a sample point fits this model. Then, we

implement the network tracking as a min-cost flow problem, and propose a novel

optimization scheme to iteratively track the vessel structure by inherently

handling bifurcations and paths. The method was tested using both synthetic and

real images. On the 9 different data-sets of synthetic blood vessels, we

achieved maximum accuracies of more than 98%. We further use the synthetic

data-set to analyse the sensitivity of our method to parameter settings, showing

the robustness of the proposed algorithm. For real images, we used coronary,

carotid and pulmonary data to segment vascular structures and present the

visual results. Still for real images, we present numerical and visual results

for networks of nerve fibers in the olfactory system. Further visual results

also show the potential of our approach for identifying vascular network

topologies. The presented method delivers good results for the several

different datasets tested and has potential for segmenting vessel-like

structures. Also, the topology information, inherently extracted, can be used

for further analysis in computer-aided diagnosis and surgical planning.

Finally, the method’s modular aspect holds potential for problem-oriented

adjustments and improvements.

Image Captioning with Object Detection and Localization

Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman, Yongfeng Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automatically generating a natural language description of an image is a task

close to the heart of image understanding. In this paper, we present a

multi-model neural network method closely related to the human visual system

that automatically learns to describe the content of images. Our model consists

of two sub-models: an object detection and localization model, which extracts

information about objects and their spatial relationships in images,

respectively; and a deep recurrent neural network (RNN) based on long

short-term memory (LSTM) units with an attention mechanism for sentence

generation. Each word of the description will be automatically aligned to

different objects of the input image when it is generated. This is similar to

the attention mechanism of the human visual system. Experimental results on the

COCO dataset showcase the merit of the proposed method, which outperforms

previous benchmark models.

C-arm Tomographic Imaging Technique for Nephrolithiasis and Detection of Kidney Stones

Nuhad A. Malalla, Ying Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we investigated a C-arm tomographic technique as a new

three-dimensional (3D) kidney imaging method for nephrolithiasis and kidney stone

detection over a view angle of less than 180°. Our C-arm tomographic technique

provides a series of two-dimensional (2D) images with a single scan over a 40°

view angle. Experimental studies were performed with a kidney phantom that was

formed from a pig kidney with two embedded kidney stones. Different

reconstruction methods were developed for C-arm tomographic technique to

generate 3D kidney information including: point by point back projection (BP),

filtered back projection (FBP), simultaneous algebraic reconstruction technique

(SART) and maximum likelihood expectation maximization (MLEM). A computer

simulation study was also performed with a simulated 3D spherical object to evaluate

the reconstruction results. Preliminary results demonstrated the capability of

our C-arm tomographic technique to generate 3D kidney information for kidney

stone detection with low exposure of radiation. The kidney stones are visible

on reconstructed planes with identifiable shapes and sizes.

Leveraging deep neural networks to capture psychological representations

Joshua C. Peterson, Joshua T. Abbott, Thomas L. Griffiths

Comments: 22 pages, 3 figures, submitted for publication

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Artificial neural networks have seen a recent surge in popularity for their

ability to solve complex problems as well as or better than humans. In computer

vision, deep convolutional neural networks have become the standard for object

classification and image understanding due to their ability to learn efficient

representations of high-dimensional data. However, the relationship between

these representations and human psychological representations has remained

unclear. Here we evaluate the quantitative and qualitative nature of this

correspondence. We find that state-of-the-art object classification networks

provide a reasonable first approximation to human similarity judgments, but

fail to capture some of the structure of psychological representations. We show

that a simple transformation that corrects these discrepancies can be obtained

through convex optimization. Such representations provide a tool that can be

used to study human performance on complex tasks with naturalistic stimuli,

such as predicting the difficulty of learning novel categories. Our results

extend the scope of psychological experiments and computational modeling of

cognition by enabling tractable use of large natural stimulus sets.

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Few prior works study deep learning on point sets. PointNet by Qi et al. is a

pioneer in this direction. However, by design PointNet does not capture local

structures induced by the metric space points live in, limiting its ability to

recognize fine-grained patterns and generalizability to complex scenes. In this

work, we introduce a hierarchical neural network that applies PointNet

recursively on a nested partitioning of the input point set. By exploiting

metric space distances, our network is able to learn local features with

increasing contextual scales. With further observation that point sets are

usually sampled with varying densities, which results in greatly decreased

performance for networks trained on uniform densities, we propose novel set

learning layers to adaptively combine features from multiple scales.

Experiments show that our network called PointNet++ is able to learn deep point

set features efficiently and robustly. In particular, results significantly

better than state-of-the-art have been obtained on challenging benchmarks of 3D

point clouds.
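
One way to build a nested partitioning from metric distances alone is to pick well-spread region centres and then group the points around each centre. The sketch below uses farthest point sampling and a fixed-radius ball query for these two steps; the abstract does not say how the partition is constructed, so both choices, the toy points and the function names are assumptions for illustration, and the learned feature extraction applied to each region is not shown.

```python
import math


def distance(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices so that each new point is as far as possible from those chosen."""
    chosen = [start]
    min_dist = [distance(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], distance(p, points[nxt]))
    return chosen


def ball_query(points, centre, radius):
    """Indices of all points within `radius` of `centre` (one local region)."""
    return [i for i, p in enumerate(points) if distance(p, centre) <= radius]


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
centres = farthest_point_sampling(points, k=3)
region = ball_query(points, points[centres[0]], radius=0.5)
```

Repeating the two steps on the selected centres yields progressively coarser levels, and varying the radius gives regions at several scales around the same centre.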