Summary: arXiv Paper Daily: Fri, 9 Jun 2017
Neural and Evolutionary Computing
Spatio-Temporal Backpropagation for Training High-performance Spiking Neural Networks
Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Luping Shi
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
Compared with artificial neural networks (ANNs), spiking neural networks
(SNNs) are promising to explore the brain-like behaviors since the spikes could
encode more spatio-temporal information. Although pre-training from ANN or
direct training based on backpropagation (BP) makes the supervised training of
SNNs possible, these methods only exploit the networks’ spatial domain
information which leads to the performance bottleneck and requires many
complicated training skills. One fundamental issue is that the spike activity
is naturally non-differentiable which causes great difficulties in training
SNNs. To this end, we build an iterative LIF model that is more friendly for
gradient descent training. By simultaneously considering the layer-by-layer
spatial domain (SD) and the timing-dependent temporal domain (TD) in the
training phase, as well as an approximated derivative for the spike activity,
we propose a spatio-temporal backpropagation (STBP) training framework without
using any complicated techniques. We achieve the best multi-layer perceptron
(MLP) performance among existing state-of-the-art algorithms on the static
MNIST and the dynamic N-MNIST datasets as well as on a custom object
detection dataset. This work provides a new perspective to
explore the high-performance SNNs for future brain-like computing paradigm with
rich spatio-temporal dynamics.
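The two ingredients named in this abstract, an iterative LIF update and an approximated derivative for the spike activity, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `lif_step` and `spike_grad`, the decay factor `tau`, the threshold `v_th`, the reset-by-gating form, and the rectangular window of width `a`.

```python
import numpy as np

def lif_step(u_prev, o_prev, x, w, tau=0.2, v_th=0.5):
    """One iterative LIF update (illustrative form, not the paper's exact one).

    The membrane potential decays by tau, is reset to zero wherever the
    neuron fired on the previous step, and then receives the weighted input.
    """
    u = tau * u_prev * (1.0 - o_prev) + x @ w
    o = (u >= v_th).astype(u.dtype)  # hard threshold: non-differentiable spike
    return u, o

def spike_grad(u, v_th=0.5, a=1.0):
    """Rectangular surrogate for d(spike)/d(u) (assumed shape).

    Returns 1/a inside a window of width a centred on the threshold and
    0 elsewhere, so gradients can flow through the spike during backprop.
    """
    return (np.abs(u - v_th) < a / 2.0).astype(u.dtype) / a
```

Unrolling `lif_step` over time steps and layers and using `spike_grad` in place of the true derivative is what lets gradients propagate along both the spatial and the temporal direction.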
Surprise Search for Evolutionary Divergence
Daniele Gravina, Antonios Liapis, Georgios N. Yannakakis
Subjects: Neural and Evolutionary Computing (cs.NE)
Inspired by the notion of surprise for unconventional discovery we introduce
a general search algorithm we name surprise search as a new method of
evolutionary divergent search. Surprise search is grounded in the divergent
search paradigm and is fabricated within the principles of evolutionary search.
The algorithm mimics the self-surprise cognitive process and equips
evolutionary search with the ability to seek solutions that deviate from
the algorithm’s expected behaviour. The predictive model of expected solutions
is based on historical trails of where the search has been and local
information about the search space. Surprise search is tested extensively in a
robot maze navigation task: experiments are held in four authored deceptive
mazes and in 60 generated mazes and compared against objective-based
evolutionary search and novelty search. The key findings of this study reveal
that surprise search is advantageous compared to the other two search
processes. In particular, it outperforms objective search and it is as
efficient as novelty search in all tasks examined. Most importantly, surprise
search is faster, on average, and more robust in solving the navigation problem
compared to any other algorithm examined. Finally, our analysis reveals that
surprise search explores the behavioural space more extensively and yields
higher population diversity compared to novelty search. What distinguishes
surprise search from other forms of divergent search, such as the search for
novelty, is its ability to diverge not from earlier and seen solutions but
rather from predicted and unseen points in the domain considered.
Comments: pp. 155-162
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Robotics (cs.RO)
Humans and animals are constantly exposed to a continuous stream of sensory
information from different modalities. At the same time, they form more
compressed representations like concepts or symbols. In species that use
language, this process is further structured by this interaction, where a
mapping between the sensorimotor concepts and linguistic elements needs to be
established. There is evidence that children might be learning language by
simply disambiguating potential meanings based on multiple exposures to
utterances in different contexts (cross-situational learning). In existing
models, the mapping between modalities is usually found in a single step by
directly using frequencies of referent and meaning co-occurrences. In this
paper, we present an extension of this one-step mapping and introduce a newly
proposed sequential mapping algorithm together with a publicly available Matlab
implementation. For demonstration, we have chosen a less typical scenario:
instead of learning to associate objects with their names, we focus on body
representations. A humanoid robot is receiving tactile stimulations on its
body, while at the same time listening to utterances of the body part names
(e.g., hand, forearm and torso). With the goal of arriving at the correct “body
categories”, we demonstrate how a sequential mapping algorithm outperforms
one-step mapping. In addition, the effects of data set size and noise in the
linguistic input are studied.
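The one-step baseline described above, which maps each word to a referent directly from co-occurrence frequencies, can be sketched in a few lines of Python. This is only an illustration of that baseline, not the paper's Matlab implementation or its sequential algorithm; the function name `one_step_mapping` and the data layout are hypothetical.

```python
from collections import Counter, defaultdict

def one_step_mapping(observations):
    """Map each word to the referent it co-occurred with most often.

    observations: iterable of (words, referents) pairs, one per situation,
    e.g. the body-part names heard and the skin regions touched.
    """
    counts = defaultdict(Counter)
    for words, referents in observations:
        for word in words:
            counts[word].update(referents)  # count every word/referent pairing
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}
```

A single situation is ambiguous because every word is paired with every referent present, but across several situations the correct pairing accumulates the highest count.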
Learning Local Receptive Fields and their Weight Sharing Scheme on Graphs
Jean-Charles Vialatte, Vincent Gripon, Gilles Coppin
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
To achieve state-of-the-art results on challenges in vision, Convolutional
Neural Networks learn stationary filters that take advantage of the underlying
image structure. Our purpose is to propose an efficient layer formulation that
extends this property to any domain described by a graph. Namely, we use the
support of its adjacency matrix to design learnable weight sharing filters able
to exploit the underlying structure of signals. The proposed formulation makes
it possible to learn the weights of the filter as well as a scheme that
controls how they are shared across the graph. We perform validation
experiments with image datasets and show that these filters offer performance
comparable with convolutional ones.
Reading Twice for Natural Language Understanding
Dirk Weissenborn
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Despite the recent success of neural networks in tasks involving natural
language understanding (NLU) there has only been limited progress in some of
the fundamental challenges of NLU, such as the disambiguation of the meaning
and function of words in context. This work approaches this problem by
incorporating contextual information into word representations prior to
processing the task at hand. To this end we propose a general-purpose reading
architecture that is employed prior to a task-specific NLU model. It is
responsible for refining context-agnostic word representations with contextual
information and lends itself to the introduction of additional,
context-relevant information from external knowledge sources. We demonstrate
that previously non-competitive models benefit dramatically from employing
contextual representations, closing the gap between general-purpose reading
architectures and the state-of-the-art performance obtained with fine-tuned,
task-specific architectures. Apart from our empirical results we present a
comprehensive analysis of the computed representations which gives insights
into the kind of information added during the refinement process.
ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
Denis A. Gudovskiy, Luca Rigazio
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
In this paper we introduce ShiftCNN, a generalized low-precision architecture
for inference of multiplierless convolutional neural networks (CNNs). ShiftCNN
is based on a power-of-two weight representation and, as a result, performs
only shift and addition operations. Furthermore, ShiftCNN substantially reduces
computational cost of convolutional layers by precomputing convolution terms.
Such an optimization can be applied to any CNN architecture with a relatively
small codebook of weights and reduces the number of product
operations by at least two orders of magnitude. The proposed architecture
targets custom inference accelerators and can be realized on FPGAs or ASICs.
Extensive evaluation on ImageNet shows that the state-of-the-art CNNs can be
converted without retraining into ShiftCNN with less than 1% drop in accuracy
when the proposed quantization algorithm is employed. RTL simulations,
targeting modern FPGAs, show that power consumption of convolutional layers is
reduced by a factor of 4 compared to conventional 8-bit fixed-point
architectures.
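To make the power-of-two idea concrete, here is a simplified sketch that rounds each weight to a single signed power of two and then multiplies an integer activation using only a shift. It is an assumption-laden illustration rather than the quantization algorithm of the paper: the names `quantize_pow2` and `shift_multiply`, the single-term representation, and the exponent range are all hypothetical.

```python
import numpy as np

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Round each weight to the nearest signed power of two (in the log
    domain), clipping exponents to [min_exp, max_exp]; zeros stay zero."""
    w = np.asarray(w, dtype=np.float64)
    sign = np.sign(w)
    mag = np.abs(w)
    safe = np.where(mag > 0, mag, 1.0)  # placeholder value avoids log2(0)
    exp = np.clip(np.round(np.log2(safe)), min_exp, max_exp).astype(int)
    return sign * np.exp2(exp), exp

def shift_multiply(x, sign, exp):
    """Multiply an integer activation by sign * 2**exp using only a shift."""
    shifted = x << exp if exp >= 0 else x >> -exp
    return int(sign) * shifted
```

Because the set of distinct exponents is small, the shifted versions of each input can be computed once and reused across filters, which is the intuition behind precomputing convolution terms.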
Computer Vision and Pattern Recognition
Structured Light Phase Measuring Profilometry Pattern Design for Binary Spatial Light Modulators
Daniel L. Lau, Yu Zhang, Kai Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Structured light illumination is an active 3-D scanning technique based on
projecting/capturing a set of striped patterns and measuring the warping of the
patterns as they reflect off a target object’s surface. In the case of phase
measuring profilometry (PMP), the projected patterns are composed of a rolling
sinusoidal wave, but as a set of time-multiplexed patterns, PMP requires the
target surface to remain motionless or for scanning to be performed at such
high rates that any movement is small. But high speed scanning places a
significant burden on the projector electronics to produce contone patterns
inside of short exposure intervals. Binary patterns are, therefore, of great
value, but converting contone patterns into binary comes with significant risk.
As such, this paper introduces a contone-to-binary conversion algorithm for
deriving binary patterns that best mimic their contone counterparts.
Experimental results will show a greater than 3 times reduction in pattern
noise over traditional halftoning procedures.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Comments: Tech report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Deep learning thrives with large neural networks and large datasets. However,
larger networks and larger datasets result in longer training times that impede
research and development progress. Distributed synchronous SGD offers a
potential solution to this problem by dividing SGD minibatches over a pool of
parallel workers. Yet to make this scheme efficient, the per-worker workload
must be large, which implies nontrivial growth in the SGD minibatch size. In
this paper, we empirically show that on the ImageNet dataset large minibatches
cause optimization difficulties, but when these are addressed the trained
networks exhibit good generalization. Specifically, we show no loss of accuracy
when training with large minibatch sizes up to 8192 images. To achieve this
result, we adopt a linear scaling rule for adjusting learning rates as a
function of minibatch size and develop a new warmup scheme that overcomes
optimization challenges early in training. With these simple techniques, our
Caffe2-based system trains ResNet-50 with a minibatch size of 8192 on 256 GPUs
in one hour, while matching small minibatch accuracy. Using commodity hardware,
our implementation achieves ~90% scaling efficiency when moving from 8 to 256
GPUs. This system enables us to train visual recognition models on
internet-scale data with high efficiency.
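The linear scaling rule and a warmup phase can be written down as a small learning-rate schedule. The sketch below is a hedged illustration: the reference rate `base_lr=0.1` per `base_batch=256` images, the linear ramp shape, and the function names are assumptions chosen for the example, since the abstract does not state them.

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch=256):
    """Linear scaling rule: scale the learning rate by the same factor
    as the minibatch size (reference values are assumed)."""
    return base_lr * batch_size / base_batch

def warmup_lr(step, warmup_steps, batch_size, base_lr=0.1, base_batch=256):
    """Ramp linearly from base_lr to the scaled rate over warmup_steps,
    then hold the scaled rate (one possible warmup shape)."""
    target = scaled_lr(batch_size, base_lr, base_batch)
    if step >= warmup_steps:
        return target
    return base_lr + (target - base_lr) * step / warmup_steps
```

Starting from the small-batch rate and ramping up avoids applying the full large-batch rate while the network is still changing rapidly early in training.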
Kumar S. Ray, Soma Chakraborty
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a novel approach to creating an automated visual
surveillance system that efficiently detects and tracks moving
objects in a video captured by a moving camera without any a priori information
about the captured scene. Separating foreground from background is a
challenging job in videos captured by a moving camera, as both foreground and
background information change in every consecutive frame of the image
sequence; thus a pseudo-motion is perceptible in the background. In the proposed
algorithm, the pseudo-motion in the background is estimated and compensated using
phase correlation of consecutive frames based on the principle of the Fourier shift
theorem. Then a method is proposed to model an acting background from the recent
history of commonality of the current frame, and the foreground is detected from
the differences between the background model and the current frame. By further
exploiting the recent history of dissimilarities of the current frame, actual
moving objects are detected in the foreground. Next, a two-step
morphological operation is proposed to refine the object region for an optimum
object size. Each object is characterized by its centroid, dimension and the three
highest peaks of its gray value histogram. Finally, each object is tracked
using a Kalman filter based on its attributes. The major advantage of this
algorithm over most existing object detection and tracking algorithms is
that it does not require initialization of the object position in the first frame
or training on sample data. The performance of the algorithm is tested
on benchmark videos containing variable backgrounds, and very satisfactory results
are achieved. The performance of the algorithm is also comparable with some of
the state-of-the-art algorithms for object detection and tracking.
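Phase correlation, the step used here to estimate background pseudo-motion, follows from the Fourier shift theorem: a translation in the image becomes a linear phase in the spectrum, so the normalized cross-power spectrum inverts to a peak at the shift. The NumPy sketch below is a generic textbook version for integer, circular shifts between two grayscale frames; the function name is hypothetical and this is not the authors' implementation.

```python
import numpy as np

def phase_correlation_shift(prev, curr):
    """Estimate the integer (dy, dx) translation such that curr is
    approximately np.roll(prev, (dy, dx), axis=(0, 1))."""
    f_prev = np.fft.fft2(prev)
    f_curr = np.fft.fft2(curr)
    cross = f_curr * np.conj(f_prev)
    cross /= np.maximum(np.abs(cross), 1e-12)  # discard magnitude, keep phase
    corr = np.fft.ifft2(cross).real            # peak sits at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Compensating the estimated shift (for example by rolling or warping the previous frame) aligns the backgrounds of consecutive frames before any differencing.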
Generative Autotransporters
Comments: *First two authors made equal contributions. Submitted to NIPS on May 19, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
In this paper, we aim to introduce the classic Optimal Transport theory to
enhance deep generative probabilistic modeling. For this purpose, we design a
Generative Autotransporter (GAT) model with explicit distribution optimal
transport. Particularly, the GAT model owns a deep distribution transporter to
transfer the target distribution to a specific prior probability distribution,
which enables a regular decoder to generate target samples from the input data
that follows the transported prior distribution. With such a design, the GAT
model can be stably trained to generate novel data by merely using a very
simple l_1 reconstruction loss function with a generalized manifold-based
Adam training algorithm. The experiments on two standard benchmarks demonstrate
its strong generation ability.
ToxTrac: a fast and robust software for tracking organisms
Comments: File contains supplementary materials (user guide)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
1. Behavioral analysis based on video recording is becoming increasingly
popular within research fields such as ecology, medicine, ecotoxicology, and
toxicology. However, few of the programs available to analyze the data are
free of cost, user-friendly, versatile, robust, fast and able to provide reliable
statistics for different organisms (invertebrates, vertebrates and mammals).
2. We present an automated open-source executable software (ToxTrac) for
image-based tracking that can simultaneously handle several organisms monitored
in a laboratory environment. We compare the performance of ToxTrac with
programs currently accessible on the web.
3. The main advantages of ToxTrac are: i) no specific knowledge of the
geometry of the tracked bodies is needed; ii) processing speed: ToxTrac can
operate at a rate >25 frames per second in HD videos using modern desktop
computers; iii) simultaneous tracking of multiple organisms in multiple arenas;
iv) integrated distortion correction and camera calibration; v) robustness against
false positives; vi) preservation of individual identification if crossing
occurs; vii) useful statistics and heat maps in real scale, exported in
image, text and Excel formats.
4. ToxTrac can be used for high speed tracking of insects, fish, rodents or
other species, and provides useful locomotor information. We suggest using
ToxTrac for future studies of animal behavior independent of research area.
Download ToxTrac here: this https URL
Learning Deep Representations for Scene Labeling with Guided Supervision
Comments: 13 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Scene labeling is a challenging classification problem where each input image
requires a pixel-level prediction map. Recently, deep-learning-based methods
have shown their effectiveness on solving this problem. However, we argue that
the large intra-class variation provides ambiguous training information and
hinders the deep models’ ability to learn more discriminative deep feature
representations. Unlike existing methods that mainly utilize semantic context
for regularizing or smoothing the prediction map, we design novel supervisions
from semantic context for learning better deep feature representations. Two
types of semantic context, scene names of images and label map statistics of
image patches, are exploited to create label hierarchies between the original
classes and newly created subclasses as the learning supervisions. Such
subclasses show lower intra-class variation, and help CNNs detect more
meaningful visual patterns and learn more effective deep features. Novel
training strategies and network structures that take advantage of such label
hierarchies are introduced. Our proposed method is evaluated extensively on
four popular datasets, Stanford Background (8 classes), SIFTFlow (33 classes),
Barcelona (170 classes) and LM+Sun (232 classes), with three different
network structures, and shows state-of-the-art performance. The experiments
show that our proposed method makes deep models learn more discriminative
feature representations without increasing model size or complexity.
Automatic tracking of vessel-like structures from a single starting point
Dario Augusto Borges Oliveira, Laura Leal-Taixe, Raul Queiroz Feitosa, Bodo Rosenhahn
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The identification of vascular networks is an important topic in the medical
image analysis community. While most methods focus on single vessel tracking,
the few solutions that exist for tracking complete vascular networks are
usually computationally intensive and require a lot of user interaction. In
this paper we present a method to track full vascular networks iteratively
using a single starting point. Our approach is based on a cloud of sampling
points distributed over concentric spherical layers. We also propose a vessel
model and a metric of how well a sample point fits this model. Then, we
implement the network tracking as a min-cost flow problem, and propose a novel
optimization scheme to iteratively track the vessel structure by inherently
handling bifurcations and paths. The method was tested using both synthetic and
real images. On the 9 different data-sets of synthetic blood vessels, we
achieved maximum accuracies of more than 98%. We further use the synthetic
data-set to analyse the sensitivity of our method to parameter setting, showing
the robustness of the proposed algorithm. For real images, we used coronary,
carotid and pulmonary data to segment vascular structures and present the
visual results. Also for real images, we present numerical and visual results
for networks of nerve fibers in the olfactory system. Further visual results
also show the potential of our approach for identifying vascular network
topologies. The presented method delivers good results for the several
different datasets tested and has potential for segmenting vessel-like
structures. Also, the topology information, inherently extracted, can be used
for further analysis in computer-aided diagnosis and surgical planning.
Finally, the method’s modular aspect holds potential for problem-oriented
adjustments and improvements.
Image Captioning with Object Detection and Localization
Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman, Yongfeng Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatically generating a natural language description of an image is a task
close to the heart of image understanding. In this paper, we present a
multi-model neural network method closely related to the human visual system
that automatically learns to describe the content of images. Our model consists
of two sub-models: an object detection and localization model, which extracts
the information of objects and their spatial relationships in images,
and a deep recurrent neural network (RNN) based on long
short-term memory (LSTM) units with an attention mechanism for sentence
generation. Each word of the description will be automatically aligned to
different objects of the input image when it is generated. This is similar to
the attention mechanism of the human visual system. Experimental results on the
COCO dataset showcase the merit of the proposed method, which outperforms
previous benchmark models.
C-arm Tomographic Imaging Technique for Nephrolithiasis and Detection of Kidney Stones
Nuhad A. Malalla, Ying Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we investigated a C-arm tomographic technique as a new
three-dimensional (3D) kidney imaging method for nephrolithiasis and kidney stone
detection over a view angle of less than 180°. Our C-arm tomographic technique
provides a series of two-dimensional (2D) images with a single scan over a 40°
view angle. Experimental studies were performed with a kidney phantom that was
formed from a pig kidney with two embedded kidney stones. Different
reconstruction methods were developed for the C-arm tomographic technique to
generate 3D kidney information, including point-by-point back projection (BP),
filtered back projection (FBP), simultaneous algebraic reconstruction technique
(SART) and maximum likelihood expectation maximization (MLEM). A computer
simulation study was also done with a simulated 3D spherical object to evaluate
the reconstruction results. Preliminary results demonstrated the capability of
our C-arm tomographic technique to generate 3D kidney information for kidney
stone detection with low radiation exposure. The kidney stones are visible
on reconstructed planes with identifiable shapes and sizes.
Leveraging deep neural networks to capture psychological representations
Comments: 22 pages, 3 figures, submitted for publication
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Artificial neural networks have seen a recent surge in popularity for their
ability to solve complex problems as well as or better than humans. In computer
vision, deep convolutional neural networks have become the standard for object
classification and image understanding due to their ability to learn efficient
representations of high-dimensional data. However, the relationship between
these representations and human psychological representations has remained
unclear. Here we evaluate the quantitative and qualitative nature of this
correspondence. We find that state-of-the-art object classification networks
provide a reasonable first approximation to human similarity judgments, but
fail to capture some of the structure of psychological representations. We show
that a simple transformation that corrects these discrepancies can be obtained
through convex optimization. Such representations provide a tool that can be
used to study human performance on complex tasks with naturalistic stimuli,
such as predicting the difficulty of learning novel categories. Our results
extend the scope of psychological experiments and computational modeling of
cognition by enabling tractable use of large natural stimulus sets.
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Few prior works study deep learning on point sets. PointNet by Qi et al. is a
pioneer in this direction. However, by design PointNet does not capture local
structures induced by the metric space points live in, limiting its ability to
recognize fine-grained patterns and generalizability to complex scenes. In this
work, we introduce a hierarchical neural network that applies PointNet
recursively on a nested partitioning of the input point set. By exploiting
metric space distances, our network is able to learn local features with
increasing contextual scales. With further observation that point sets are
usually sampled with varying densities, which results in greatly decreased
performance for networks trained on uniform densities, we propose novel set
learning layers to adaptively combine features from multiple scales.
Experiments show that our network called PointNet++ is able to learn deep point
set features efficiently and robustly. In particular, results significantly
better than state-of-the-art have been obtained on challenging benchmarks of 3D
point clouds.
Active Learning for Structured Prediction from Partially Labeled Data
Comments: This paper is submitted to ICCV 2017. 2nd and 3rd authors are in alphabetic order (equal contribution)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a general purpose active learning algorithm for structured
prediction, gathering labeled data for training a model that outputs a set of
related labels for an image or video. Active learning starts with a limited
initial training set, then iterates querying a user for labels on unlabeled
data and retraining the model. We propose a novel algorithm for selecting data
for labeling, choosing examples to maximize expected information gain based on
belief propagation inference. This is a general purpose method and can be
applied to a variety of tasks or models. As a specific example we demonstrate
this framework for learning to recognize human actions and group activities in
video sequences. Experiments show that our proposed algorithm outperforms
previous active learning methods and can achieve accuracy comparable to fully
supervised methods while utilizing significantly less labeled data.
Comments: CVPR 2017 Spotlight
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present an end-to-end, multimodal, fully convolutional network for
extracting semantic structures from document images. We consider document
semantic structure extraction as a pixel-wise segmentation task, and propose a
unified model that classifies pixels based not only on their visual appearance,
as in the traditional page segmentation task, but also on the content of
underlying text. Moreover, we propose an efficient synthetic document
generation process that we use to generate pretraining data for our network.
Once the network is trained on a large set of synthetic documents, we fine-tune
the network on unlabeled real documents using a semi-supervised approach. We
systematically study the optimum network architecture and show that both our
multimodal approach and the synthetic data pretraining significantly boost the
performance.
Low-shot learning with large-scale diffusion
Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
This paper considers the problem of inferring image labels for which only a
few labelled examples are available at training time. This setup is often
referred to as low-shot learning in the literature, where a standard approach
is to re-train the last few layers of a convolutional neural network learned on
separate classes. We consider a semi-supervised setting in which we exploit a
large collection of images to support label propagation. This is made possible
by leveraging the recent advances on large-scale similarity graph construction.
We show that despite its conceptual simplicity, scaling up label propagation to
hundreds of millions of images leads to state-of-the-art accuracy in the
low-shot learning regime.
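Label propagation over a similarity graph can be illustrated at toy scale with dense NumPy arrays. The sketch below is a generic clamped-diffusion variant under stated assumptions: cosine-similarity kNN edges, row-normalized weights, and the function names `knn_graph` and `propagate_labels` are all illustrative, and nothing here reflects the large-scale graph construction the paper relies on.

```python
import numpy as np

def knn_graph(features, k):
    """Symmetric kNN affinity matrix from cosine similarity (dense, toy scale)."""
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)              # exclude self-matches
    n = sim.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]      # k most similar nodes per row
    rows = np.repeat(np.arange(n), k)
    w = np.zeros((n, n))
    w[rows, nbrs.ravel()] = np.maximum(sim[rows, nbrs.ravel()], 0.0)
    return np.maximum(w, w.T)                   # make the graph undirected

def propagate_labels(w, labels, n_classes, n_iter=50):
    """Diffuse one-hot labels over the graph; labels == -1 marks unlabeled
    nodes. Labeled nodes are clamped to their class after every iteration."""
    labeled = labels >= 0
    seed = np.zeros((len(labels), n_classes))
    seed[labeled, labels[labeled]] = 1.0
    p = w / np.maximum(w.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic
    scores = seed.copy()
    for _ in range(n_iter):
        scores = p @ scores
        scores[labeled] = seed[labeled]
    return scores.argmax(axis=1)
```

With only one labeled example per class, the unlabeled images act as stepping stones: class scores spread along graph edges, so the many unlabeled neighbours determine where each label ends up.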
CoMaL Tracking: Tracking Points at the Object Boundaries
Comments: 10 pages, 10 figures, to appear in 1st Joint BMTT-PETS Workshop on Tracking and Surveillance, CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Traditional point tracking algorithms such as the KLT use local 2D
information aggregation for feature detection and tracking, due to which their
performance degrades at the object boundaries that separate multiple objects.
Recently, CoMaL Features have been proposed that handle such a case. However,
they proposed a simple tracking framework where the points are re-detected in
each frame and matched. This is inefficient and may also lose many points that
are not re-detected in the next frame. We propose a novel tracking algorithm to
accurately and efficiently track CoMaL points. For this, the level line segment
associated with the CoMaL points is matched to MSER segments in the next frame
using shape-based matching and the matches are further filtered using
texture-based matching. Experiments show improvements over a simple
re-detect-and-match framework as well as KLT in terms of speed/accuracy on
different real-world applications, especially at the object boundaries.
Training Quantized Nets: A Deeper Understanding
Hao Li, Soham De, Zheng Xu, Christoph Studer, Hanan Samet, Tom Goldstein
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Currently, deep neural networks are deployed on low-power embedded devices by
first training a full-precision model using powerful computing hardware, and
then deriving a corresponding low-precision model for efficient inference on
such systems. However, training models directly with coarsely quantized weights
is a key step towards learning on embedded platforms that have limited
computing resources, memory capacity, and power consumption. Numerous recent
publications have studied methods for training quantized networks, but these
studies have mostly been empirical. In this work, we investigate training
methods for quantized neural networks from a theoretical viewpoint. We first
explore accuracy guarantees for training methods under convexity assumptions.
We then look at the behavior of algorithms for non-convex problems, and we show
that training algorithms that exploit high-precision representations have an
important annealing property that purely quantized training methods lack, which
explains many of the observed empirical differences between these types of
algorithms.
Artificial Intelligence
What Does a Belief Function Believe In?
Comments: 13 pages
Subjects: Artificial Intelligence (cs.AI)
The conditioning in the Dempster-Shafer Theory of Evidence has been defined
(by Shafer \cite{Shafer:90}) as a combination of a belief function and of an
“event” via the Dempster rule.
On the other hand, Shafer \cite{Shafer:90} gives a “probabilistic”
interpretation of a belief function (hence indirectly its derivation from a
sample). Given the fact that the conditional probability distribution of a
sample-derived probability distribution is a probability distribution derived
from a subsample (selected on the grounds of a conditioning event), the paper
investigates the empirical nature of the Dempster rule of combination.
It is demonstrated that the so-called “conditional” belief function is not a
belief function given an event but rather a belief function given manipulation
of original empirical data. Given this, an interpretation of belief function
different from that of Shafer is proposed. Algorithms for construction of
belief networks from data are derived for this interpretation.
Responsible Autonomy
Comments: IJCAI2017 (International Joint Conference on Artificial Intelligence)
Subjects: Artificial Intelligence (cs.AI)
As intelligent systems are increasingly making decisions that directly affect
society, perhaps the most important upcoming research direction in AI is to
rethink the ethical implications of their actions. Means are needed to
integrate moral, societal and legal values with technological developments in
AI, both during the design process and as part of the deliberation
algorithms employed by these systems. In this paper, we describe leading ethics
theories and propose alternative ways to ensure ethical behavior by artificial
systems. Given that ethics are dependent on the socio-cultural context and are
often only implicit in deliberation processes, methodologies are needed to
elicit the values held by designers and stakeholders, and to make these
explicit, leading to better understanding of and trust in artificial autonomous
systems.
Regular Boardgames
Jakub Kowalski , Jakub Sutowicz , Marek Szykuła Subjects : Artificial Intelligence (cs.AI)
We present an initial version of the Regular Boardgames general game description
language. It is an extension of the Simplified Boardgames language. Our
language is designed to be able to express the rules of a majority of popular
boardgames including the complex rules such as promotions, castling, en
passant, jump captures, liberty captures, and obligatory moves. The language
describes all the above through one consistent general mechanism based on
regular expressions, without using exceptions or ad hoc rules.
Predictive Coding-based Deep Dynamic Neural Network for Visuomotor Learning
Comments: Accepted at the 7th Joint IEEE International Conference of Developmental Learning and Epigenetic Robotics (ICDL-EpiRob 2017)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO); Neurons and Cognition (q-bio.NC)
This study presents a dynamic neural network model based on the predictive
coding framework for perceiving and predicting the dynamic visuo-proprioceptive
patterns. In our previous study [1], we showed that the deep dynamic neural
network model was able to coordinate visual perception and action generation in
a seamless manner. In the current study, we extended the previous model under
the predictive coding framework to endow the model with a capability of
perceiving and predicting dynamic visuo-proprioceptive patterns as well as a
capability of inferring intention behind the perceived visuomotor information
through minimizing prediction error. A set of synthetic experiments was
conducted in which a robot learned to imitate the gestures of another robot in
a simulation environment. The experimental results showed that with given
intention states, the model was able to mentally simulate the possible incoming
dynamic visuo-proprioceptive patterns in a top-down process without the inputs
from the external environment. Moreover, the results highlighted the role of
minimizing prediction error in inferring underlying intention of the perceived
visuo-proprioceptive patterns, supporting the predictive coding account of the
mirror neuron systems. The results also revealed that minimizing prediction
error in one modality induced the recall of the corresponding representation of
another modality acquired during the consolidative learning of raw-level
visuo-proprioceptive patterns.
Comments: Accepted in the IEEE Transactions on Cognitive and Developmental Systems (TCDS), 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)
This study investigates how adequate coordination among the different
cognitive processes of a humanoid robot can be developed through end-to-end
learning of direct perception of visuomotor stream. We propose a deep dynamic
neural network model built on a dynamic vision network, a motor generation
network, and a higher-level network. The proposed model was designed to process
and to integrate direct perception of dynamic visuomotor patterns in a
hierarchical model characterized by different spatial and temporal constraints
imposed on each level. We conducted synthetic robotic experiments in which a
robot learned to read a human’s intention by observing gestures and then
to generate the corresponding goal-directed actions. Results verify that the
proposed model is able to learn the tutored skills and to generalize them to
novel situations. The model showed synergic coordination of perception, action
and decision making, and it integrated and coordinated a set of cognitive
skills including visual perception, intention reading, attention switching,
working memory, action preparation and execution in a seamless manner. Analysis
reveals that coherent internal representations emerged at each level of the
hierarchy. Higher-level representation reflecting actional intention developed
by means of continuous integration of the lower-level visuo-proprioceptive
stream.
Design and Implementation of Modified Fuzzy based CPU Scheduling Algorithm
Comments: 6 Pages
Journal-ref: International Journal of Computer Applications, Volume 77, No 17, September 2013
Subjects: Operating Systems (cs.OS); Artificial Intelligence (cs.AI)
CPU scheduling is the basis of multiprogramming. Scheduling is the process that
decides the order of tasks from a set of multiple tasks that are ready to execute.
There are a number of CPU scheduling algorithms available, but it is very
difficult to decide which one is better. This paper discusses the design
and implementation of a modified fuzzy-based CPU scheduling algorithm. It
presents a new set of fuzzy rules and demonstrates that scheduling done with the
new priority improves average waiting time and average turnaround time.
Reading Twice for Natural Language Understanding
Dirk Weissenborn Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Despite the recent success of neural networks in tasks involving natural
language understanding (NLU) there has only been limited progress in some of
the fundamental challenges of NLU, such as the disambiguation of the meaning
and function of words in context. This work approaches this problem by
incorporating contextual information into word representations prior to
processing the task at hand. To this end we propose a general-purpose reading
architecture that is employed prior to a task-specific NLU model. It is
responsible for refining context-agnostic word representations with contextual
information and lends itself to the introduction of additional,
context-relevant information from external knowledge sources. We demonstrate
that previously non-competitive models benefit dramatically from employing
contextual representations, closing the gap between general-purpose reading
architectures and the state-of-the-art performance obtained with fine-tuned,
task-specific architectures. Apart from our empirical results we present a
comprehensive analysis of the computed representations which gives insights
into the kind of information added during the refinement process.
Dynamic Discovery of Type Classes and Relations in Semantic Web Data
Serkan Ayvaz , Mehmet Aydar Subjects : Databases (cs.DB) ; Artificial Intelligence (cs.AI)
The continuing development of Semantic Web technologies and the increasing
user adoption in recent years have accelerated progress in incorporating
explicit semantics with data on the Web. With the rapidly growing RDF (Resource
Description Framework) data on the Semantic Web, processing large semantic
graph data has become more challenging. Constructing a summary graph structure
from the raw RDF can help obtain semantic type relations and reduce the
computational complexity for graph processing purposes. In this paper, we
addressed the problem of graph summarization in RDF graphs, and we proposed an
approach for building summary graph structures automatically from RDF graph
data. Moreover, we introduced a measure to help discover optimum class
dissimilarity thresholds and an effective method to discover the type classes
automatically. In future work, we plan to investigate further improvement
options on the scalability of the proposed method.
Comments: pp. 155-162
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Robotics (cs.RO)
Humans and animals are constantly exposed to a continuous stream of sensory
information from different modalities. At the same time, they form more
compressed representations like concepts or symbols. In species that use
language, this process is further structured by this interaction, where a
mapping between the sensorimotor concepts and linguistic elements needs to be
established. There is evidence that children might be learning language by
simply disambiguating potential meanings based on multiple exposures to
utterances in different contexts (cross-situational learning). In existing
models, the mapping between modalities is usually found in a single step by
directly using frequencies of referent and meaning co-occurrences. In this
paper, we present an extension of this one-step mapping and introduce a newly
proposed sequential mapping algorithm together with a publicly available Matlab
implementation. For demonstration, we have chosen a less typical scenario:
instead of learning to associate objects with their names, we focus on body
representations. A humanoid robot is receiving tactile stimulations on its
body, while at the same time listening to utterances of the body part names
(e.g., hand, forearm and torso). With the goal of arriving at the correct “body
categories”, we demonstrate how a sequential mapping algorithm outperforms
one-step mapping. In addition, the effects of data set size and noise in the
linguistic input are studied.
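The one-step mapping baseline described above (picking, for each word, the referent it co-occurs with most often) can be sketched as follows. This is a minimal illustration only: the function name, the data layout, and the toy observations are assumptions made here, and the sequential mapping algorithm from the paper is not reproduced.

```python
from collections import Counter, defaultdict


def one_step_mapping(observations):
    """Map each word to the referent it co-occurs with most often.

    `observations` is a list of (words, referents) pairs, one pair per
    situation. This layout is a hypothetical choice for the sketch.
    """
    counts = defaultdict(Counter)
    for words, referents in observations:
        for word in words:
            for referent in referents:
                counts[word][referent] += 1
    # Single step: take the most frequent co-occurring referent per word.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


# Toy data: each situation pairs heard body-part names with touched body parts.
observations = [
    (["hand", "forearm"], ["HAND", "FOREARM"]),
    (["hand", "torso"], ["HAND", "TORSO"]),
    (["forearm", "torso"], ["FOREARM", "TORSO"]),
]
mapping = one_step_mapping(observations)
```

In this toy data each word is ambiguous within a single situation, but the correct referent is the only one present in every situation where the word occurs, which is the cross-situational idea.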
Distribution-Free One-Pass Learning
Peng Zhao , Zhi-Hua Zhou Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In many large-scale machine learning applications, data are accumulated with
time, and thus, an appropriate model should be able to update in an online
paradigm. Moreover, as the whole data volume is unknown when constructing the
model, it is desirable to scan each data item only once, with storage
independent of the data volume. It is also noteworthy that the underlying
distribution may change during the data accumulation procedure. To handle such
tasks, in this paper we propose DFOP, a distribution-free one-pass learning
approach. This approach works well when distribution change occurs during data
accumulation, without requiring prior knowledge about the change. Every data
item can be discarded once it has been scanned. In addition, a theoretical guarantee
shows that the estimation error, under a mild assumption, decreases until
convergence with high probability. The performance of DFOP for both regression
and classification is validated in experiments.
Generalized Value Iteration Networks: Life Beyond Lattices
Comments: 14 pages, conference
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
In this paper, we introduce a generalized value iteration network (GVIN),
which is an end-to-end neural network planning module. GVIN emulates the value
iteration algorithm by using a novel graph convolution operator, which enables
GVIN to learn and plan on irregular spatial graphs. We propose three novel
differentiable kernels as graph convolution operators and show that the
embedding based kernel achieves the best performance. We further propose
episodic Q-learning, an improvement upon traditional n-step Q-learning that
stabilizes training for networks that contain a planning module. Lastly, we
evaluate GVIN on planning problems in 2D mazes, irregular graphs, and
real-world street networks, showing that GVIN generalizes well for both
arbitrary graphs and unseen graphs of larger scale and outperforms a naive
generalization of VIN (discretizing a spatial graph into a 2D image).
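For readers unfamiliar with the algorithm that GVIN is said to emulate, here is a minimal sketch of classical value iteration on an irregular graph. It is not GVIN itself: there are no learned kernels or graph convolutions, and the graph, discount factor, step cost, and function names are all assumptions chosen for illustration.

```python
def value_iteration(neighbors, goal, gamma=0.9, step_cost=-1.0, iters=50):
    """Classical value iteration on a graph with deterministic moves.

    `neighbors` maps each node to the nodes reachable in one step.
    The goal node is absorbing with value 0.
    """
    values = {node: 0.0 for node in neighbors}
    for _ in range(iters):
        new_values = {}
        for node, nbrs in neighbors.items():
            if node == goal:
                new_values[node] = 0.0
            else:
                # Bellman backup: best neighbor after paying the step cost.
                new_values[node] = max(step_cost + gamma * values[n] for n in nbrs)
        values = new_values
    return values


def greedy_next(neighbors, values, node):
    """Plan one step by moving to the highest-valued neighbor."""
    return max(neighbors[node], key=lambda n: values[n])


# A small irregular graph (hypothetical example).
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d", "e"],
    "d": ["b", "c", "goal"],
    "e": ["c"],
    "goal": ["d"],
}
values = value_iteration(graph, "goal")
```

Because each backup only looks at a node's neighbors, the same update applies to lattices and to arbitrary graphs, which is the property a graph-based planning module relies on.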
Information Retrieval
On the Robustness of Deep Convolutional Neural Networks for Music Classification
Keunwoo Choi , George Fazekas , Kyunghyun Cho , Mark Sandler Subjects : Information Retrieval (cs.IR) ; Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
Deep neural networks (DNNs) have been successfully applied to music
classification including music tagging. However, there are several open
questions regarding generalisation and best practices in the choice of network
architectures, hyper-parameters and input representations. In this article, we
investigate specific aspects of neural networks to deepen our understanding of
their properties. We analyse and (re-)validate a large music tagging dataset to
investigate the reliability of training and evaluation. We perform
comprehensive experiments involving audio preprocessing using different
time-frequency representations, logarithmic magnitude compression, frequency
weighting and scaling. Using a trained network, we compute label vector
similarities, which are compared to groundtruth similarity.
The results highlight several important aspects of music tagging and neural
networks. We show that networks can be effective despite relatively large
error rates in groundtruth datasets. We subsequently show that many commonly
used input preprocessing techniques are redundant except magnitude compression.
Lastly, the analysis of our trained network provides valuable insight into the
relationships between music tags. These results highlight the benefit of using
data-driven methods to address automatic music tagging.
Comments: 2 pages, workshop paper accepted at the SIGIR 2017
Subjects: Digital Libraries (cs.DL); Information Retrieval (cs.IR)
The large scale of scholarly publications poses a challenge for scholars in
information seeking and sensemaking. Bibliometrics, information retrieval (IR),
text mining and NLP techniques could help in these search and look-up
activities, but are not yet widely used. This workshop is intended to stimulate
IR researchers and digital library professionals to elaborate on new approaches
in natural language processing, information retrieval, scientometrics, text
mining and recommendation techniques that can advance the state-of-the-art in
scholarly document understanding, analysis, and retrieval at scale. The BIRNDL
workshop at SIGIR 2017 will incorporate an invited talk, paper sessions and the
third edition of the Computational Linguistics (CL) Scientific Summarization
Shared Task.
Computation and Language
Reading Twice for Natural Language Understanding
Dirk Weissenborn Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Despite the recent success of neural networks in tasks involving natural
language understanding (NLU) there has only been limited progress in some of
the fundamental challenges of NLU, such as the disambiguation of the meaning
and function of words in context. This work approaches this problem by
incorporating contextual information into word representations prior to
processing the task at hand. To this end we propose a general-purpose reading
architecture that is employed prior to a task-specific NLU model. It is
responsible for refining context-agnostic word representations with contextual
information and lends itself to the introduction of additional,
context-relevant information from external knowledge sources. We demonstrate
that previously non-competitive models benefit dramatically from employing
contextual representations, closing the gap between general-purpose reading
architectures and the state-of-the-art performance obtained with fine-tuned,
task-specific architectures. Apart from our empirical results we present a
comprehensive analysis of the computed representations which gives insights
into the kind of information added during the refinement process.
The Algorithmic Inflection of Russian and Generation of Grammatically Correct Text
We present a deterministic algorithm for Russian inflection. This algorithm
is implemented in a publicly available web-service www.passare.ru which
provides functions for inflection of single words, word matching and synthesis
of grammatically correct Russian text. The inflectional functions have been
tested against the annotated corpus of Russian language OpenCorpora.
Comments: Accepted by ACL
Subjects: Computation and Language (cs.CL)
Current Chinese social media text summarization models are based on an
encoder-decoder framework. Although their generated summaries are literally similar
to the source texts, they have low semantic relevance. In this work, our
goal is to improve semantic relevance between source texts and summaries for
Chinese social media summarization. We introduce a Semantic Relevance Based
neural model to encourage high semantic similarity between texts and summaries.
In our model, the source text is represented by a gated attention encoder,
while the summary representation is produced by a decoder. Besides, the
similarity score between the representations is maximized during training. Our
experiments show that the proposed model outperforms baseline systems on a
social media corpus.
Content-Based Table Retrieval for Web Queries
Zhao Yan , Duyu Tang , Nan Duan , Junwei Bao , Yuanhua Lv , Ming Zhou , Zhoujun Li Subjects : Computation and Language (cs.CL)
Understanding the connections between unstructured text and semi-structured
tables is an important yet neglected problem in natural language processing. In
this work, we focus on content-based table retrieval. Given a query, the task
is to find the most relevant table from a collection of tables. Further
progress towards improving this area requires powerful models of semantic
matching and richer training and evaluation resources. To remedy this, we
present a ranking based approach, and implement both carefully designed
features and neural network architectures to measure the relevance between a
query and the content of a table. Furthermore, we release an open-domain
dataset that includes 21,113 web queries for 273,816 tables. We conduct
comprehensive experiments on both real world and synthetic datasets. Results
verify the effectiveness of our approach and present the challenges for this
task.
Context encoders as a simple but powerful extension of word2vec
Comments: ACL 2017 2nd Workshop on Representation Learning for NLP
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)
With a simple architecture and the ability to learn meaningful word
embeddings efficiently from texts containing billions of words, word2vec
remains one of the most popular neural language models used today. However, as
only a single embedding is learned for every word in the vocabulary, the model
fails to optimally represent words with multiple meanings. Additionally, it is
not possible to create embeddings for new (out-of-vocabulary) words on the
spot. Based on an intuitive interpretation of the continuous bag-of-words
(CBOW) word2vec model’s negative sampling training objective in terms of
predicting context based similarities, we motivate an extension of the model we
call context encoders (ConEc). By multiplying the matrix of trained word2vec
embeddings with a word’s average context vector, out-of-vocabulary (OOV)
embeddings and representations for a word with multiple meanings can be created
based on the word’s local contexts. The benefits of this approach are
illustrated by using these word embeddings as features in the CoNLL 2003 named
entity recognition (NER) task.
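The core ConEc operation named in the abstract, multiplying the trained embedding matrix with a word's average context vector, can be sketched with NumPy. The window size, the normalization of the context vector (counts divided by the number of context tokens), and all names below are assumptions for this sketch; the paper's exact definition may differ.

```python
import numpy as np


def average_context_vector(word, sentences, vocab, window=2):
    """Average bag-of-words context of `word` over all its occurrences."""
    index = {w: i for i, w in enumerate(vocab)}
    ctx = np.zeros(len(vocab))
    total = 0
    for sent in sentences:
        for pos, token in enumerate(sent):
            if token != word:
                continue
            lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
            for j in range(lo, hi):
                if j != pos and sent[j] in index:
                    ctx[index[sent[j]]] += 1
                    total += 1
    return ctx / total if total else ctx


def conec_embedding(W, ctx):
    """Project a context vector (length V) through embeddings W (V x d)."""
    return W.T @ ctx


# Toy setup: "x" is out-of-vocabulary but appears next to known words.
vocab = ["a", "b", "c"]
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # stand-in for trained word2vec
sentences = [["a", "x", "b"], ["b", "x"]]
ctx_x = average_context_vector("x", sentences, vocab)
emb_x = conec_embedding(W, ctx_x)
```

Restricting `sentences` to a single document or a single occurrence would give a context-specific embedding, which is how local contexts can separate multiple meanings of one word.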
Comments: pp. 155-162
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Robotics (cs.RO)
Humans and animals are constantly exposed to a continuous stream of sensory
information from different modalities. At the same time, they form more
compressed representations like concepts or symbols. In species that use
language, this process is further structured by this interaction, where a
mapping between the sensorimotor concepts and linguistic elements needs to be
established. There is evidence that children might be learning language by
simply disambiguating potential meanings based on multiple exposures to
utterances in different contexts (cross-situational learning). In existing
models, the mapping between modalities is usually found in a single step by
directly using frequencies of referent and meaning co-occurrences. In this
paper, we present an extension of this one-step mapping and introduce a newly
proposed sequential mapping algorithm together with a publicly available Matlab
implementation. For demonstration, we have chosen a less typical scenario:
instead of learning to associate objects with their names, we focus on body
representations. A humanoid robot is receiving tactile stimulations on its
body, while at the same time listening to utterances of the body part names
(e.g., hand, forearm and torso). With the goal of arriving at the correct “body
categories”, we demonstrate how a sequential mapping algorithm outperforms
one-step mapping. In addition, the effects of data set size and noise in the
linguistic input are studied.
Distributed, Parallel, and Cluster Computing
Study of Vital Data Analysis Platform Using Wearable Sensor
Comments: 5 pages, 2 figures, IEICE Technical Report, SC2016-34, Mar. 2017. arXiv admin note: substantial text overlap with arXiv:1704.05573
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY)
In this paper, we propose a vital data analysis platform that resolves
existing problems in using vital data for real-time actions. IoT
technologies have progressed recently, but in the healthcare area, real-time actions
based on analyzed vital data have not yet been considered sufficiently. The causes
are the proper use of stream / micro-batch processing methods for analysis and
network cost. To resolve these problems, we propose our vital data analysis
platform. Our platform collects electrocardiograph and
acceleration data from an example wearable vital sensor and analyzes them to
extract posture, fatigue and relaxation on smartphones or in the cloud. Our platform
can show an analyzed dangerous posture or a change in fatigue level. We implemented the
platform and are now preparing a field test.
Clique Gossiping
Yang Liu , Bo Li , Brian Anderson , Guodong Shi Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
This paper proposes and investigates a framework for clique gossip protocols.
As complete subnetworks, cliques are ubiquitous in various
social, computer, and engineering networks. By clique gossiping, nodes interact
with each other along a sequence of cliques. Clique-gossip protocols are
defined as arbitrary linear node interactions where node states are vectors
evolving as linear dynamical systems. Such protocols become clique-gossip
averaging algorithms when node states are scalars under averaging rules. We
generalize the classical notion of line graph to capture the essential node
interaction structure induced by both the underlying network and the specific
clique sequence. We prove a fundamental eigenvalue invariance principle for
periodic clique-gossip protocols, which implies that any permutation of the
clique sequence leads to the same spectrum for the overall state transition
when the generalized line graph contains no cycle. We also prove that for a
network with n nodes, cliques with smaller sizes determined by factors of n
can always be constructed leading to finite-time convergent clique-gossip
averaging algorithms, provided n is not a prime number. Particularly, such
finite-time convergence can be achieved with cliques of equal size m if and
only if n is divisible by m and they have exactly the same prime factors. A
proven fastest finite-time convergent clique-gossip algorithm is constructed
for clique-gossiping using size-m cliques. Additionally, the acceleration
effects of clique-gossiping are illustrated via numerical examples.
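A small sketch of clique-gossip averaging with scalar node states, where every node in the active clique replaces its state with the clique mean. The four-node example and the particular clique sequence are assumptions chosen for illustration; this is not the paper's construction of the fastest algorithm.

```python
def clique_gossip_average(states, clique_sequence):
    """Apply averaging along a sequence of cliques and return the final states."""
    x = list(states)
    for clique in clique_sequence:
        mean = sum(x[i] for i in clique) / len(clique)
        for i in clique:
            x[i] = mean  # every node in the clique adopts the clique average
    return x


# n = 4 nodes, cliques of equal size m = 2 (n is divisible by m, and both
# have the single prime factor 2).
states = [1.0, 3.0, 5.0, 7.0]
cliques = [(0, 1), (2, 3), (0, 2), (1, 3)]
result = clique_gossip_average(states, cliques)
```

After one pass over the four cliques, every node holds the global average 4.0, an instance of finite-time convergence with equal-size cliques.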
Asynchronous Pattern Formation: the effects of a rigorous approach
Comments: 41 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Given a multiset F of points in the Euclidean plane and a set R of robots
such that |R|=|F|, the Pattern Formation (PF) problem asks for a distributed
algorithm that moves robots so as to reach a configuration similar to F.
Similarity means that robots must be arranged as in F regardless of translations,
rotations, reflections, and uniform scalings. Initially, each robot occupies a
distinct position. When active, a robot operates in standard Look-Compute-Move
cycles. Robots are asynchronous, oblivious, anonymous, silent and execute the
same distributed algorithm. So far, the problem has been mainly addressed by
assuming chirality, that is, robots share a common left-right orientation. We
are interested in removing such a restriction. While working on the subject, we
faced several issues that required close attention. We deeply investigated how
such difficulties were overcome in the literature, revealing that crucial
arguments for the correctness proof of the existing algorithms have been
neglected. Here we design a new deterministic distributed algorithm that solves
PF for any pattern when asynchronous robots start from asymmetric
configurations, without chirality. The focus on asymmetric configurations might
be perceived as an over-simplification of the subject, given the common intuition
about the PF problem in the scientific community. However, we demonstrate that
this is not the case. The systematic lack of rigorous arguments with respect to
necessary conditions required for providing correctness proofs deeply affects
the validity as well as the relevance of strategies proposed in the literature.
Our new methodology is characterized by the use of logical predicates in order
to formally describe our algorithm as well as its correctness.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Comments: Tech report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Deep learning thrives with large neural networks and large datasets. However,
larger networks and larger datasets result in longer training times that impede
research and development progress. Distributed synchronous SGD offers a
potential solution to this problem by dividing SGD minibatches over a pool of
parallel workers. Yet to make this scheme efficient, the per-worker workload
must be large, which implies nontrivial growth in the SGD minibatch size. In
this paper, we empirically show that on the ImageNet dataset large minibatches
cause optimization difficulties, but when these are addressed the trained
networks exhibit good generalization. Specifically, we show no loss of accuracy
when training with large minibatch sizes up to 8192 images. To achieve this
result, we adopt a linear scaling rule for adjusting learning rates as a
function of minibatch size and develop a new warmup scheme that overcomes
optimization challenges early in training. With these simple techniques, our
Caffe2-based system trains ResNet-50 with a minibatch size of 8192 on 256 GPUs
in one hour, while matching small minibatch accuracy. Using commodity hardware,
our implementation achieves ~90% scaling efficiency when moving from 8 to 256
GPUs. This system enables us to train visual recognition models on
internet-scale data with high efficiency.
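The linear scaling rule and a warmup phase can be sketched as a simple learning-rate schedule. The abstract does not give the reference learning rate, the reference minibatch size, the warmup length, or the exact shape of the warmup, so `base_lr`, `base_batch`, `warmup_iters`, and the linear ramp below are all assumptions made for illustration.

```python
def learning_rate(iteration, batch_size, base_lr=0.1, base_batch=256, warmup_iters=500):
    """Learning rate under linear scaling with a gradual (linear) warmup.

    Linear scaling: the target rate grows in proportion to the minibatch size.
    Warmup: ramp from the small-batch rate up to the target rate.
    """
    target = base_lr * batch_size / base_batch
    if iteration < warmup_iters:
        return base_lr + (target - base_lr) * iteration / warmup_iters
    return target


# Example: a minibatch of 8192 images scales the rate by 8192 / 256 = 32.
lr_start = learning_rate(0, 8192)
lr_mid = learning_rate(250, 8192)
lr_final = learning_rate(500, 8192)
```

Starting from the small-batch rate avoids taking very large steps while the weights are still changing rapidly early in training, which is the optimization difficulty the abstract attributes to large minibatches.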
Learning
Principled Detection of Out-of-Distribution Examples in Neural Networks
Shiyu Liang , Yixuan Li , R. Srikant Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
We consider the problem of detecting out-of-distribution examples in neural
networks. We propose ODIN, a simple and effective out-of-distribution detector
for neural networks, that does not require any change to a pre-trained model.
Our method is based on the observation that using temperature scaling and
adding small perturbations to the input can separate the softmax score
distributions of in- and out-of-distribution samples, allowing for more
effective detection. We show in a series of experiments that our approach is
compatible with diverse network architectures and datasets. It consistently
outperforms the baseline approach[1] by a large margin, establishing a new
state-of-the-art performance on this task. For example, ODIN reduces the false
positive rate from the baseline 34.7% to 4.3% on the DenseNet (applied to
CIFAR-10) when the true positive rate is 95%. We theoretically analyze the
method and prove that performance improvement is guaranteed under mild
conditions on the image distributions.
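The two ingredients named in the abstract, temperature scaling and a small input perturbation, can be sketched for a linear classifier, where the input gradient has a closed form. The linear model, the default `temperature` and `epsilon`, and the choice to perturb in the direction that increases the top softmax score are assumptions of this sketch, not details stated in the abstract.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def odin_score(x, W, b, temperature=1000.0, epsilon=0.001):
    """Max temperature-scaled softmax score after a small input perturbation.

    The classifier is linear (logits = W @ x + b), a stand-in for a
    pre-trained network.
    """
    probs = softmax((W @ x + b) / temperature)
    top = int(np.argmax(probs))
    # Gradient of log probs[top] with respect to x for a linear model.
    grad = (W[top] - probs @ W) / temperature
    x_perturbed = x + epsilon * np.sign(grad)
    return float(softmax((W @ x_perturbed + b) / temperature).max())


# Toy two-class example (hypothetical values).
W_demo = np.array([[1.0, 0.0], [0.0, 1.0]])
b_demo = np.zeros(2)
x_demo = np.array([1.0, 0.5])
score_plain = odin_score(x_demo, W_demo, b_demo, temperature=1.0, epsilon=0.0)
score_perturbed = odin_score(x_demo, W_demo, b_demo, temperature=1.0, epsilon=0.1)
```

A detector would then compare the score against a threshold chosen on held-out data and flag inputs that fall below it as out-of-distribution; the pre-trained model itself is left unchanged.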
Learning Local Receptive Fields and their Weight Sharing Scheme on Graphs
Jean-Charles Vialatte , Vincent Gripon , Gilles Coppin Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
To achieve state-of-the-art results on challenges in vision, Convolutional
Neural Networks learn stationary filters that take advantage of the underlying
image structure. Our purpose is to propose an efficient layer formulation that
extends this property to any domain described by a graph. Namely, we use the
support of its adjacency matrix to design learnable weight sharing filters able
to exploit the underlying structure of signals. The proposed formulation makes
it possible to learn the weights of the filter as well as a scheme that
controls how they are shared across the graph. We perform validation
experiments with image datasets and show that these filters offer performances
comparable with convolutional ones.
Nuclear Discrepancy for Active Learning
Comments: 32 pages, 5 figures, 4 tables
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Active learning algorithms propose which unlabeled objects should be queried
for their labels to improve a predictive model the most. We study active
learners that minimize generalization bounds and uncover relationships between
these bounds that lead to an improved approach to active learning. In
particular we show the relation between the bound of the state-of-the-art
Maximum Mean Discrepancy (MMD) active learner, the bound of the Discrepancy,
and a new and looser bound that we refer to as the Nuclear Discrepancy bound.
We motivate this bound by a probabilistic argument: we show it considers
situations which are more likely to occur. Our experiments indicate that active
learning using the tightest Discrepancy bound performs the worst in terms of
the squared loss. Overall, our proposed loosest Nuclear Discrepancy
generalization bound performs the best. We confirm our probabilistic argument
empirically: the other bounds focus on more pessimistic scenarios that are
rarer in practice. We conclude that tightness of bounds is not always of main
importance and that active learning methods should concentrate on realistic
scenarios in order to improve performance.
Decoupling "when to update" from "how to update"
Eran Malach , Shai Shalev-Shwartz Subjects : Learning (cs.LG)
Deep learning requires data. A useful approach to obtain data is to be
creative and mine data from various sources that were created for different
purposes. Unfortunately, this approach often leads to noisy labels. In this
paper, we propose a meta algorithm for tackling the noisy labels problem. The
key idea is to decouple “when to update” from “how to update”. We demonstrate
the effectiveness of our algorithm by mining data for gender classification by
combining the Labeled Faces in the Wild (LFW) face recognition dataset with a
textual genderizing service, which leads to a noisy dataset. While our approach
is very simple to implement, it leads to state-of-the-art results. We analyze
some convergence properties of the proposed algorithm.
Clustering with t-SNE, provably
George C. Linderman , Stefan Steinerberger Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
t-distributed Stochastic Neighborhood Embedding (t-SNE), a clustering and
visualization method proposed by van der Maaten & Hinton in 2008, has rapidly
become a standard tool in a number of natural sciences. Despite its
overwhelming success, there is a distinct lack of mathematical foundations and
the inner workings of the algorithm are not well understood. The purpose of
this paper is to prove that t-SNE is able to recover well-separated clusters;
more precisely, we prove that t-SNE in the ‘early exaggeration’ phase, an
optimization technique proposed by van der Maaten & Hinton (2008) and van der
Maaten (2014), can be rigorously analyzed. As a byproduct, the proof suggests
novel ways for setting the exaggeration parameter α and step size h.
Numerical examples illustrate the effectiveness of these rules: in particular,
the quality of embedding of topological structures (e.g. the swiss roll)
improves. We also discuss a connection to spectral clustering methods.
Pain-Free Random Differential Privacy with Sensitivity Sampling
Comments: 12 pages, 9 figures, 1 table; full report of paper accepted into ICML’2017
Subjects : Learning (cs.LG) ; Cryptography and Security (cs.CR); Databases (cs.DB)
Popular approaches to differential privacy, such as the Laplace and
exponential mechanisms, calibrate randomised smoothing through global
sensitivity of the target non-private function. Bounding such sensitivity is
often a prohibitively complex analytic calculation. As an alternative, we
propose a straightforward sampler for estimating sensitivity of non-private
mechanisms. Since our sensitivity estimates hold with high probability, any
mechanism that would be (ε, δ)-differentially private under
bounded global sensitivity automatically achieves
(ε, δ, γ)-random differential privacy (Hall et al., 2012),
without any target-specific calculations required. We demonstrate on worked
example learners how our usable approach adopts a naturally-relaxed privacy
guarantee, while achieving more accurate releases even for non-private
functions that are black-box computer programs.
Self-Normalizing Neural Networks
Comments: 9 pages (+ 93 pages appendix)
Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
Deep Learning has revolutionized vision via convolutional neural networks
(CNNs) and natural language processing via recurrent neural networks (RNNs).
However, success stories of Deep Learning with standard feed-forward neural
networks (FNNs) are rare. FNNs that perform well are typically shallow and
therefore cannot exploit many levels of abstract representations. We introduce
self-normalizing neural networks (SNNs) to enable high-level abstract
representations. While batch normalization requires explicit normalization,
neuron activations of SNNs automatically converge towards zero mean and unit
variance. The activation functions of SNNs are “scaled exponential linear units”
(SELUs), which induce self-normalizing properties. Using the Banach fixed-point
theorem, we prove that activations close to zero mean and unit variance that
are propagated through many network layers will converge towards zero mean and
unit variance — even under the presence of noise and perturbations. This
convergence property of SNNs makes it possible to (1) train deep networks with
many layers, (2) employ strong regularization, and (3) make learning highly
robust. Furthermore, for activations not close to unit variance, we prove an
upper and lower bound on the variance; thus, vanishing and exploding gradients
are impossible. We compared SNNs on (a) 121 tasks from the UCI machine learning
repository, on (b) drug discovery benchmarks, and on (c) astronomy tasks with
standard FNNs and other machine learning methods such as random forests and
support vector machines. SNNs significantly outperformed all competing FNN
methods on the 121 UCI tasks, outperformed all competing methods on the Tox21
dataset, and set a new record on an astronomy data set. The winning SNN
architectures are often very deep. Implementations are available at:
github.com/bioinf-jku/SNNs.
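The self-normalizing behaviour can be checked numerically with a minimal sketch. It assumes the commonly published SELU constants (alpha ≈ 1.6733, scale ≈ 1.0507) and feeds standard-normal inputs through the activation; the helper names and the sample size are illustrative choices, and the check covers a single activation rather than a full network.

```python
import math
import random

# SELU constants as commonly published, rounded to double precision.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)

def mean_and_variance(values):
    """Sample mean and (biased) sample variance of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance

# Assumption: inputs are drawn from N(0, 1); the seed only makes the run repeatable.
rng = random.Random(0)
inputs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
out_mean, out_var = mean_and_variance([selu(x) for x in inputs])
print(f"mean={out_mean:.3f} variance={out_var:.3f}")
```

With zero-mean, unit-variance inputs the outputs should again have a mean near zero and a variance near one, which is the fixed point the abstract refers to.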
Unlocking the Potential of Simulators: Design with RL in Mind
Comments: Extended abstract for RLDM17 (3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making)
Subjects : Learning (cs.LG) ; Robotics (cs.RO)
Using Reinforcement Learning (RL) in simulation to construct policies useful
in real life is challenging. This is often attributed to the sequential
decision making aspect: inaccuracies in simulation accumulate over multiple
steps, hence the simulated trajectories diverge from what would happen in
reality.
In our work we show the need to consider another important aspect: the
mismatch in simulating control. We bring attention to the need for modeling
control as well as dynamics, since oversimplifying assumptions about applying
actions of RL policies could make the policies fail on real-world systems.
We design a simulator for solving a pivoting task (of interest in Robotics)
and demonstrate that even a simple simulator designed with RL in mind
outperforms high-fidelity simulators when it comes to learning a policy that is
to be deployed on a real robotic system. We show that a phenomenon that is hard
to model – friction – could be exploited successfully, even when RL is
performed using a simulator with a simple dynamics and noise model. Hence, we
demonstrate that as long as the main sources of uncertainty are identified, it
could be possible to learn policies applicable to real systems even using a
simple simulator.
RL-compatible simulators could open the possibilities for applying a wide
range of RL algorithms in various fields. This is important, since currently
data sparsity in fields like healthcare and education frequently forces
researchers and engineers to only consider sample-efficient RL approaches.
Successful simulator-aided RL could increase flexibility of experimenting with
RL algorithms and help apply RL policies to real-world settings in fields
where data is scarce. We believe that lessons learned in Robotics could help
other fields design RL-compatible simulators, so we summarize our experience
and conclude with suggestions.
Distribution-Free One-Pass Learning
Peng Zhao , Zhi-Hua Zhou Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In many large-scale machine learning applications, data are accumulated with
time, and thus, an appropriate model should be able to update in an online
paradigm. Moreover, as the whole data volume is unknown when constructing the
model, it is desirable to scan each data item only once, with storage
independent of the data volume. It is also noteworthy that the underlying
distribution may change during the data accumulation procedure. To handle such
tasks, in this paper we propose DFOP, a distribution-free one-pass learning
approach. This approach works well when distribution change occurs during data
accumulation, without requiring prior knowledge about the change. Every data
item can be discarded once it has been scanned. In addition, a theoretical
guarantee shows that the estimation error, under a mild assumption, decreases
until convergence with high probability. The performance of DFOP for both
regression and classification is validated in experiments.
Luck is Hard to Beat: The Difficulty of Sports Prediction
Comments: 10 pages, KDD2017, Applied Data Science track
Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
Predicting the outcome of sports events is a hard task. We quantify this
difficulty with a coefficient that measures the distance between the observed
final results of sports leagues and idealized perfectly balanced competitions
in terms of skill. This indicates the relative presence of luck and skill. We
collected and analyzed all games from 198 sports leagues comprising 1503
seasons from 84 countries of 4 different sports: basketball, soccer, volleyball
and handball. We measured the competitiveness by country and sport. We also
identify in each season which teams, if removed from their league, result in a
completely random tournament. Surprisingly, not many of them are needed. As
another contribution of this paper, we propose a probabilistic graphical model
to learn about the teams’ skills and to decompose the relative weights of luck
and skill in each game. We break down the skill component into factors
associated with the teams’ characteristics. The model also lets us estimate
at 0.36 the probability that an underdog team wins in the NBA league, with a
home advantage adding 0.09 to this probability. As shown in the first part of
the paper, luck is substantially present even in the most competitive
championships, which partially explains why sophisticated and complex
feature-based models hardly beat simple models in the task of forecasting
sports’ outcomes.
Generalized Value Iteration Networks: Life Beyond Lattices
Comments: 14 pages, conference
Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI)
In this paper, we introduce a generalized value iteration network (GVIN),
which is an end-to-end neural network planning module. GVIN emulates the value
iteration algorithm by using a novel graph convolution operator, which enables
GVIN to learn and plan on irregular spatial graphs. We propose three novel
differentiable kernels as graph convolution operators and show that the
embedding based kernel achieves the best performance. We further propose
episodic Q-learning, an improvement upon traditional n-step Q-learning that
stabilizes training for networks that contain a planning module. Lastly, we
evaluate GVIN on planning problems in 2D mazes, irregular graphs, and
real-world street networks, showing that GVIN generalizes well for both
arbitrary graphs and unseen graphs of larger scale and outperforms a naive
generalization of VIN (discretizing a spatial graph into a 2D image).
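For readers unfamiliar with the algorithm that GVIN emulates, here is a minimal sketch of classical value iteration on a small irregular graph. This is the textbook procedure with deterministic moves, not the GVIN module or its graph convolution kernels; the graph, rewards and discount factor are made up for illustration.

```python
def value_iteration(neighbors, rewards, goal, gamma=0.9, iterations=50):
    """Tabular value iteration with deterministic moves along graph edges.

    neighbors: dict mapping each node to the list of nodes reachable in one step.
    rewards:   dict mapping each node to the reward received on entering it.
    """
    values = {node: 0.0 for node in neighbors}
    for _ in range(iterations):
        new_values = {}
        for node, nbrs in neighbors.items():
            if node == goal or not nbrs:
                new_values[node] = 0.0  # terminal node: nothing further to collect
            else:
                new_values[node] = max(rewards[n] + gamma * values[n] for n in nbrs)
        values = new_values
    return values

# Hypothetical four-node chain ending in a goal node.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "goal"], "goal": []}
rewards = {"a": 0.0, "b": 0.0, "c": 0.0, "goal": 1.0}
values = value_iteration(graph, rewards, goal="goal")
```

Values decay by the discount factor with each extra step from the goal, so a greedy walk over neighbour values traces a shortest path to it.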
A Convex Framework for Fair Regression
Richard Berk , Hoda Heidari , Shahin Jabbari , Matthew Joseph , Michael Kearns , Jamie Morgenstern , Seth Neel , Aaron Roth Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
We introduce a flexible family of fairness regularizers for (linear and
logistic) regression problems. These regularizers all enjoy convexity,
permitting fast optimization, and they span the range from notions of group
fairness to strong individual fairness. By varying the weight on the fairness
regularizer, we can compute the efficient frontier of the accuracy-fairness
trade-off on any given dataset, and we measure the severity of this trade-off
via a numerical quantity we call the Price of Fairness (PoF). The centerpiece
of our results is an extensive comparative study of the PoF across six
different datasets in which fairness is a primary consideration.
On learning the structure of Bayesian Networks and submodular function maximization
Giulio Caravagna , Daniele Ramazzotti , Guido Sanguinetti Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
Learning the structure of dependencies among multiple random variables is a
problem of considerable theoretical and practical interest. In practice, score
optimisation with multiple restarts provides a practical and surprisingly
successful solution, yet the conditions under which this may be a well founded
strategy are poorly understood. In this paper, we prove that the problem of
identifying the structure of a Bayesian Network via regularised score
optimisation can be recast, in expectation, as a submodular optimisation
problem, thus guaranteeing optimality with high probability. This result both
explains the practical success of optimisation heuristics, and suggests a way
to improve on such algorithms by artificially simulating multiple data sets via
a bootstrap procedure. We show on several synthetic data sets that the
resulting algorithm yields better recovery performance than the state of the
art, and illustrate in a real cancer genomic study how such an approach can
lead to valuable practical insights.
Training Quantized Nets: A Deeper Understanding
Hao Li , Soham De , Zheng Xu , Christoph Studer , Hanan Samet , Tom Goldstein Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV)
Currently, deep neural networks are deployed on low-power embedded devices by
first training a full-precision model using powerful computing hardware, and
then deriving a corresponding low-precision model for efficient inference on
such systems. However, training models directly with coarsely quantized weights
is a key step towards learning on embedded platforms that have limited
computing resources, memory capacity, and power consumption. Numerous recent
publications have studied methods for training quantized networks, but these
studies have mostly been empirical. In this work, we investigate training
methods for quantized neural networks from a theoretical viewpoint. We first
explore accuracy guarantees for training methods under convexity assumptions.
We then look at the behavior of algorithms for non-convex problems, and we show
that training algorithms that exploit high-precision representations have an
important annealing property that purely quantized training methods lack, which
explains many of the observed empirical differences between these types of
algorithms.
Fast Black-box Variational Inference through Stochastic Trust-Region Optimization
Comments: submitted to NIPS 2017
Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
We introduce TrustVI, a fast second-order algorithm for black-box variational
inference based on trust-region optimization and the reparameterization trick.
At each iteration, TrustVI proposes and assesses a step based on minibatches of
draws from the variational distribution. The algorithm provably converges to a
stationary point. We implement TrustVI in the Stan framework and compare it to
ADVI. TrustVI typically converges in tens of iterations to a solution at least
as good as the one that ADVI reaches in thousands of iterations. TrustVI
iterations can be more computationally expensive, but total computation is
typically an order of magnitude less in our experiments.
Generative-Discriminative Variational Model for Visual Recognition
Chih-Kuan Yeh , Yao-Hung Hubert Tsai , Yu-Chiang Frank Wang Subjects : Learning (cs.LG)
The paradigm shift from shallow classifiers with hand-crafted features to
end-to-end trainable deep learning models has shown significant improvements on
supervised learning tasks. Despite the promising power of deep neural networks
(DNN), how to alleviate overfitting during training has been a research topic
of interest. In this paper, we present a Generative-Discriminative Variational
Model (GDVM) for visual classification, in which we introduce a latent variable
inferred from inputs for exhibiting generative abilities towards prediction. In
other words, our GDVM casts the supervised learning task as a generative
learning process, with data discrimination to be jointly exploited for improved
classification. In our experiments, we consider the tasks of multi-class
classification, multi-label classification, and zero-shot learning. We show
that our GDVM performs favorably against the baselines or recent generative DNN
models.
Meta-Learning for Construction of Resampling Recommendation Systems
Evgeny Burnaev , Pavel Erofeev , Artem Papanov Subjects : Learning (cs.LG) ; Applications (stat.AP); Computation (stat.CO); Methodology (stat.ME)
One possible approach to tackling class imbalance in classification tasks is to
resample the training dataset, i.e., to drop some of its elements or to
synthesize new ones. Several resampling methods are widely used. Recent research
showed that the choice of resampling method significantly affects the quality of
classification, which raises the problem of resampling selection. Exhaustive
search for the optimal resampling is time-consuming and hence of limited use. In
this paper, we describe an alternative approach to resampling selection. We
follow the meta-learning concept to build resampling recommendation systems,
i.e., algorithms recommending resampling for datasets on the basis of their
properties.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Comments: Tech report
Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Deep learning thrives with large neural networks and large datasets. However,
larger networks and larger datasets result in longer training times that impede
research and development progress. Distributed synchronous SGD offers a
potential solution to this problem by dividing SGD minibatches over a pool of
parallel workers. Yet to make this scheme efficient, the per-worker workload
must be large, which implies nontrivial growth in the SGD minibatch size. In
this paper, we empirically show that on the ImageNet dataset large minibatches
cause optimization difficulties, but when these are addressed the trained
networks exhibit good generalization. Specifically, we show no loss of accuracy
when training with large minibatch sizes up to 8192 images. To achieve this
result, we adopt a linear scaling rule for adjusting learning rates as a
function of minibatch size and develop a new warmup scheme that overcomes
optimization challenges early in training. With these simple techniques, our
Caffe2-based system trains ResNet-50 with a minibatch size of 8192 on 256 GPUs
in one hour, while matching small minibatch accuracy. Using commodity hardware,
our implementation achieves ~90% scaling efficiency when moving from 8 to 256
GPUs. This system enables us to train visual recognition models on
internet-scale data with high efficiency.
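The linear scaling rule and a warmup phase can be sketched as a small learning-rate schedule. The reference values (a base rate of 0.1 for 256 images) and the linear shape of the ramp are assumptions chosen for illustration and are not stated in the abstract; the function names are hypothetical.

```python
def scaled_learning_rate(batch_size, base_lr=0.1, base_batch_size=256):
    """Linear scaling rule: the learning rate grows in proportion to the minibatch size."""
    return base_lr * batch_size / base_batch_size

def warmup_learning_rate(step, warmup_steps, batch_size,
                         base_lr=0.1, base_batch_size=256):
    """Ramp linearly from base_lr to the scaled rate over warmup_steps, then hold."""
    target = scaled_learning_rate(batch_size, base_lr, base_batch_size)
    if step >= warmup_steps:
        return target
    fraction = step / warmup_steps
    return base_lr + fraction * (target - base_lr)

# Illustrative schedule for a minibatch of 8192 images and 100 warmup steps.
schedule = [warmup_learning_rate(s, 100, 8192) for s in range(0, 151, 50)]
```

Starting at the small-batch rate and ramping up avoids taking very large steps while the network is still changing rapidly early in training.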
Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs
Comments: 11 pages, 4 figures, 2 tables
Subjects : Machine Learning (stat.ML) ; Learning (cs.LG)
Generative Adversarial Networks (GANs) have shown remarkable success as a
framework for training models to produce realistic-looking data. In this work,
we propose a Recurrent GAN (RGAN) and Recurrent Conditional GAN (RCGAN) to
produce realistic real-valued multi-dimensional time series, with an emphasis
on their application to medical data. RGANs make use of recurrent neural
networks in the generator and the discriminator. In the case of RCGANs, both of
these RNNs are conditioned on auxiliary information. We demonstrate our models
in a set of toy datasets, where we show visually and quantitatively (using
sample likelihood and maximum mean discrepancy) that they can successfully
generate realistic time-series. We also describe novel evaluation methods for
GANs, where we generate a synthetic labelled training dataset, and evaluate on
a real test set the performance of a model trained on the synthetic data, and
vice-versa. We illustrate with these metrics that RCGANs can generate
time-series data useful for supervised training, with only minor degradation in
performance on real test data. This is demonstrated on digit classification
from ‘serialised’ MNIST and by training an early warning system on a medical
dataset of 17,000 patients from an intensive care unit. We further discuss and
analyse the privacy concerns that may arise when using RCGANs to generate
realistic synthetic medical time series data.
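Maximum mean discrepancy, one of the quantitative measures mentioned above, can be sketched for scalar samples as follows. This uses the standard biased estimator with a Gaussian (RBF) kernel; the bandwidth and the toy samples are arbitrary, and the paper's own evaluation on time series may differ in kernel choice and estimator.

```python
import math

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))

# Toy samples: `close` resembles `real`, `far` does not.
real = [0.0, 0.1, -0.2, 0.3, 0.05]
close = [0.05, 0.0, -0.1, 0.2, 0.1]
far = [3.0, 3.2, 2.9, 3.1, 3.3]
mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
```

A generator whose samples match the real data should give a value near zero, while a poorly matched one gives a clearly larger value.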
Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes
Hyunjik Kim , Yee Whye Teh Subjects : Machine Learning (stat.ML) ; Learning (cs.LG)
Automating statistical modelling is a challenging problem that has
far-reaching implications for artificial intelligence. The Automatic
Statistician employs a kernel search algorithm to provide a first step in this
direction for regression problems. However, this does not scale due to its
O(N^3) running time for model selection. This is undesirable not only
because the average size of data sets is growing fast, but also because there
is potentially more information in bigger data, implying a greater need for
more expressive models that can discover finer structure. We propose Scalable
Kernel Composition (SKC), a scalable kernel search algorithm, to encompass big
data within the boundaries of automated statistical modelling.
Context encoders as a simple but powerful extension of word2vec
Comments: ACL 2017 2nd Workshop on Representation Learning for NLP
Subjects : Machine Learning (stat.ML) ; Computation and Language (cs.CL); Learning (cs.LG)
With a simple architecture and the ability to learn meaningful word
embeddings efficiently from texts containing billions of words, word2vec
remains one of the most popular neural language models used today. However, as
only a single embedding is learned for every word in the vocabulary, the model
fails to optimally represent words with multiple meanings. Additionally, it is
not possible to create embeddings for new (out-of-vocabulary) words on the
spot. Based on an intuitive interpretation of the continuous bag-of-words
(CBOW) word2vec model’s negative sampling training objective in terms of
predicting context based similarities, we motivate an extension of the model we
call context encoders (ConEc). By multiplying the matrix of trained word2vec
embeddings with a word’s average context vector, out-of-vocabulary (OOV)
embeddings and representations for a word with multiple meanings can be created
based on the word’s local contexts. The benefits of this approach are
illustrated by using these word embeddings as features in the CoNLL 2003 named
entity recognition (NER) task.
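The matrix product described above can be sketched in a few lines. The sketch assumes that a word's average context vector is a vocabulary-length vector of normalized counts of its context words, so that multiplying it with the embedding matrix gives a weighted average of the context words' embeddings; the vocabulary, embeddings and helper names are made up for illustration.

```python
def average_context_vector(context_words, vocab_index):
    """Normalized counts of in-vocabulary context words, one entry per vocabulary word."""
    counts = [0.0] * len(vocab_index)
    for word in context_words:
        if word in vocab_index:
            counts[vocab_index[word]] += 1.0
    total = sum(counts)
    if total == 0:
        return counts  # no known context words: return the zero vector
    return [c / total for c in counts]

def context_embedding(embeddings, context_vector):
    """Multiply the (transposed) embedding matrix with the context vector."""
    dim = len(embeddings[0])
    return [
        sum(weight * row[k] for weight, row in zip(context_vector, embeddings))
        for k in range(dim)
    ]

# Hypothetical 3-word vocabulary with 2-dimensional trained embeddings.
vocab_index = {"river": 0, "money": 1, "water": 2}
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]

# Embedding for an unseen word observed next to "river", "water", "water".
context = average_context_vector(["river", "water", "water", "unknown"], vocab_index)
oov_embedding = context_embedding(embeddings, context)
```

Because the result depends only on the observed contexts, the same computation can give one embedding per occurrence of an ambiguous word.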
Comments: pp. 155-162
Subjects : Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Robotics (cs.RO)
Humans and animals are constantly exposed to a continuous stream of sensory
information from different modalities. At the same time, they form more
compressed representations like concepts or symbols. In species that use
language, this process is further structured by this interaction, where a
mapping between the sensorimotor concepts and linguistic elements needs to be
established. There is evidence that children might be learning language by
simply disambiguating potential meanings based on multiple exposures to
utterances in different contexts (cross-situational learning). In existing
models, the mapping between modalities is usually found in a single step by
directly using frequencies of referent and meaning co-occurrences. In this
paper, we present an extension of this one-step mapping and introduce a newly
proposed sequential mapping algorithm together with a publicly available Matlab
implementation. For demonstration, we have chosen a less typical scenario:
instead of learning to associate objects with their names, we focus on body
representations. A humanoid robot is receiving tactile stimulations on its
body, while at the same time listening to utterances of the body part names
(e.g., hand, forearm and torso). With the goal of arriving at the correct “body
categories”, we demonstrate how a sequential mapping algorithm outperforms
one-step mapping. In addition, the effects of data set size and noise in the
linguistic input are studied.
Forward Thinking: Building and Training Neural Networks One Layer at a Time
Chris Hettinger , Tanner Christensen , Ben Ehlert , Jeffrey Humpherys , Tyler Jarvis , Sean Wade Subjects : Machine Learning (stat.ML) ; Learning (cs.LG)
We present a general framework for training deep neural networks without
backpropagation. This substantially decreases training time and also allows for
construction of deep networks with many sorts of learners, including networks
whose layers are defined by functions that are not easily differentiated, like
decision trees. The main idea is that layers can be trained one at a time, and
once they are trained, the input data are mapped forward through the layer to
create a new learning problem. The process is repeated, transforming the data
through multiple layers, one at a time, rendering a new data set, which is
expected to be better behaved, and on which a final output layer can achieve
good performance. We call this forward thinking and demonstrate a proof of
concept by achieving state-of-the-art accuracy on the MNIST dataset for
convolutional neural networks. We also provide a general mathematical
formulation of forward thinking that allows for other types of deep learning
problems to be considered.
Predictive Coding-based Deep Dynamic Neural Network for Visuomotor Learning
Comments: Accepted at the 7th Joint IEEE International Conference of Developmental Learning and Epigenetic Robotics (ICDL-EpiRob 2017)
Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Robotics (cs.RO); Neurons and Cognition (q-bio.NC)
This study presents a dynamic neural network model based on the predictive
coding framework for perceiving and predicting the dynamic visuo-proprioceptive
patterns. In our previous study [1], we have shown that the deep dynamic neural
network model was able to coordinate visual perception and action generation in
a seamless manner. In the current study, we extended the previous model under
the predictive coding framework to endow the model with a capability of
perceiving and predicting dynamic visuo-proprioceptive patterns as well as a
capability of inferring intention behind the perceived visuomotor information
through minimizing prediction error. A set of synthetic experiments was
conducted in which a robot learned to imitate the gestures of another robot in
a simulation environment. The experimental results showed that with given
intention states, the model was able to mentally simulate the possible incoming
dynamic visuo-proprioceptive patterns in a top-down process without the inputs
from the external environment. Moreover, the results highlighted the role of
minimizing prediction error in inferring underlying intention of the perceived
visuo-proprioceptive patterns, supporting the predictive coding account of the
mirror neuron systems. The results also revealed that minimizing prediction
error in one modality induced the recall of the corresponding representation of
another modality acquired during the consolidative learning of raw-level
visuo-proprioceptive patterns.
Comments: Accepted in the IEEE Transactions on Cognitive and Developmental Systems (TCDS), 2017
Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Robotics (cs.RO)
This study investigates how adequate coordination among the different
cognitive processes of a humanoid robot can be developed through end-to-end
learning of direct perception of visuomotor stream. We propose a deep dynamic
neural network model built on a dynamic vision network, a motor generation
network, and a higher-level network. The proposed model was designed to process
and to integrate direct perception of dynamic visuomotor patterns in a
hierarchical model characterized by different spatial and temporal constraints
imposed on each level. We conducted synthetic robotic experiments in which a
robot learned to read a human’s intention by observing gestures and then
to generate the corresponding goal-directed actions. Results verify that the
proposed model is able to learn the tutored skills and to generalize them to
novel situations. The model showed synergic coordination of perception, action
and decision making, and it integrated and coordinated a set of cognitive
skills including visual perception, intention reading, attention switching,
working memory, action preparation and execution in a seamless manner. Analysis
reveals that coherent internal representations emerged at each level of the
hierarchy. Higher-level representation reflecting actional intention developed
by means of continuous integration of the lower-level visuo-proprioceptive
stream.
Creating Virtual Universes Using Generative Adversarial Networks
Comments: 8 pages, 5 figures
Subjects : Instrumentation and Methods for Astrophysics (astro-ph.IM) ; Learning (cs.LG)
Inferring model parameters from experimental data is a grand challenge in
many sciences, including cosmology. This often relies critically on high
fidelity numerical simulations, which are prohibitively computationally
expensive. The application of deep learning techniques to generative modeling
is renewing interest in using high dimensional density estimators as
computationally inexpensive emulators of fully-fledged simulations. These
generative models have the potential to make a dramatic shift in the field of
scientific simulations, but for that shift to happen we need to study the
performance of such generators in the precision regime needed for science
applications. To this end, in this letter we apply Generative Adversarial
Networks to the problem of generating cosmological weak lensing convergence
maps. We show that our generator network produces maps that are described,
with high statistical confidence, by the same summary statistics as the fully
simulated maps.
On the Robustness of Deep Convolutional Neural Networks for Music Classification
Keunwoo Choi , George Fazekas , Kyunghyun Cho , Mark Sandler Subjects : Information Retrieval (cs.IR) ; Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
Deep neural networks (DNN) have been successfully applied for music
classification including music tagging. However, there are several open
questions regarding generalisation and best practices in the choice of network
architectures, hyper-parameters and input representations. In this article, we
investigate specific aspects of neural networks to deepen our understanding of
their properties. We analyse and (re-)validate a large music tagging dataset to
investigate the reliability of training and evaluation. We perform
comprehensive experiments involving audio preprocessing using different
time-frequency representations, logarithmic magnitude compression, frequency
weighting and scaling. Using a trained network, we compute label vector
similarities, which are compared to groundtruth similarity.
The results highlight several important aspects of music tagging and neural
networks. We show that networks can be effective despite relatively large
error rates in groundtruth datasets. We subsequently show that many commonly
used input preprocessing techniques are redundant except magnitude compression.
Lastly, the analysis of our trained network provides valuable insight into the
relationships between music tags. These results highlight the benefit of using
data-driven methods to address automatic music tagging.
Comments: CVPR 2017 Spotlight
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present an end-to-end, multimodal, fully convolutional network for
extracting semantic structures from document images. We consider document
semantic structure extraction as a pixel-wise segmentation task, and propose a
unified model that classifies pixels based not only on their visual appearance,
as in the traditional page segmentation task, but also on the content of
underlying text. Moreover, we propose an efficient synthetic document
generation process that we use to generate pretraining data for our network.
Once the network is trained on a large set of synthetic documents, we fine-tune
the network on unlabeled real documents using a semi-supervised approach. We
systematically study the optimum network architecture and show that both our
multimodal approach and the synthetic data pretraining significantly boost the
performance.
Low-shot learning with large-scale diffusion
Matthijs Douze , Arthur Szlam , Bharath Hariharan , Hervé Jégou Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG); Machine Learning (stat.ML)
This paper considers the problem of inferring image labels for which only a
few labelled examples are available at training time. This setup is often
referred to as low-shot learning in the literature, where a standard approach
is to re-train the last few layers of a convolutional neural network learned on
separate classes. We consider a semi-supervised setting in which we exploit a
large collection of images to support label propagation. This is made possible
by leveraging the recent advances on large-scale similarity graph construction.
We show that, despite its conceptual simplicity, scaling up label propagation
to hundreds of millions of images leads to state-of-the-art accuracy in the
low-shot learning regime.
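Label propagation itself can be illustrated on a toy graph. The sketch below is a minimal clamped propagation scheme, not the large-scale diffusion method of the paper: the function name `propagate_labels`, the six-node weight matrix and the iteration count are all hypothetical choices made for illustration.

```python
def propagate_labels(weights, seeds, num_classes, iterations=50):
    """Clamped label propagation on a small weighted graph (illustrative only).

    weights: symmetric list-of-lists of non-negative edge weights.
    seeds:   dict mapping node index -> known class label.
    Returns the predicted class index for every node.
    """
    n = len(weights)
    # Row-normalise so that each node averages over its neighbours.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total if total > 0 else 0.0 for w in row])

    # One score vector per node: seeds start one-hot, all others at zero.
    scores = [[0.0] * num_classes for _ in range(n)]
    for node, label in seeds.items():
        scores[node][label] = 1.0

    for _ in range(iterations):
        new_scores = [
            [sum(transition[i][j] * scores[j][c] for j in range(n))
             for c in range(num_classes)]
            for i in range(n)
        ]
        # Clamp the labelled nodes back to their known labels.
        for node, label in seeds.items():
            new_scores[node] = [1.0 if c == label else 0.0
                                for c in range(num_classes)]
        scores = new_scores

    return [max(range(num_classes), key=lambda c: row[c]) for row in scores]


# Hypothetical graph: nodes 0-2 and 3-5 form two tight clusters
# joined by a single weak edge between nodes 2 and 3.
WEIGHTS = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = propagate_labels(WEIGHTS, seeds={0: 0, 5: 1}, num_classes=2)
```

With one labelled example per cluster, the unlabelled nodes inherit the label of the seed in their own cluster, which is the effect the abstract relies on when only a few labelled images are available.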
Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features
Sharath Adavanne , Giambattista Parascandolo , Pasi Pertilä , Toni Heittola , Tuomas Virtanen Subjects : Sound (cs.SD) ; Learning (cs.LG)
In this paper, we propose the use of spatial and harmonic features in
combination with a long short-term memory (LSTM) recurrent neural network (RNN)
for the automatic sound event detection (SED) task. Real-life sound recordings
typically contain many overlapping sound events, making them hard to recognize
with just mono-channel audio. Human listeners successfully recognize mixtures
of overlapping sound events using pitch cues and by exploiting the stereo
(multichannel) audio signal available at their ears to spatially localize these
events. Traditionally, SED systems have used only mono-channel audio; motivated
by the human listener, we propose to extend them to use multichannel audio.
The proposed SED system is compared against the state-of-the-art mono-channel
method on the development subset of the TUT sound events detection 2016
database. The use of spatial and harmonic features is shown to improve the
performance of SED.
Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition
Comments: Accepted for Sound and Music Computing (SMC 2017)
Subjects: Sound (cs.SD); Learning (cs.LG)
This paper studies emotion recognition from musical tracks in the
2-dimensional valence-arousal (V-A) emotional space. We propose a method based
on convolutional (CNN) and recurrent neural networks (RNN), having
significantly fewer parameters compared with the state-of-the-art method for
the same task. We utilize one CNN layer followed by two branches of RNNs
trained separately for arousal and valence. The method was evaluated using the
‘MediaEval2015 emotion in music’ dataset. We achieved an RMSE of 0.202 for
arousal and 0.268 for valence, which is the best result reported on this
dataset.
Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network
Comments: Accepted for IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017)
Subjects: Sound (cs.SD); Learning (cs.LG)
This paper proposes to use low-level spatial features extracted from
multichannel audio for sound event detection. We extend the convolutional
recurrent neural network to handle more than one type of these multichannel
features by learning from each of them separately in the initial stages. We
show that, instead of concatenating the features of each channel into a single
feature vector, the network learns sound events in multichannel audio better
when they are presented as separate layers of a volume. Using the proposed
spatial features over monaural features on the same network gives an absolute
F-score improvement of 6.1% on the publicly available TUT-SED 2016 dataset and
2.7% on the TUT-SED 2009 dataset, which is fifteen times larger.
Information Theory
Physical Layer Security of Generalised Pre-coded Spatial Modulation with Antenna Scrambling
Rong Zhang , Lie-Liang Yang , Lajos Hanzo Subjects : Information Theory (cs.IT)
We advocate a novel physical layer security solution that is unique to our
previously proposed GPSM scheme, with the aid of the proposed antenna
scrambling. The novelty and contribution of our paper lie in three aspects: 1/
principle: we introduce a 'security key' generated at Alice that is unknown to
both Bob and Eve, where the design goal is that the publicly unknown security
key imposes a barrier only for Eve. 2/ approach: we achieve this by conveying
useful information only through the activation of RA indices, which is in turn
concealed by the unknown security key in terms of the randomly scrambled
symbols used in place of the conventional modulated symbols in the GPSM scheme.
3/ design: we consider both Circular Antenna Scrambling (CAS) and Gaussian
Antenna Scrambling (GAS) in detail, and the resultant security capacity of both
designs is quantified and compared.
Scaling Exponent and Moderate Deviations Asymptotics of Polar Codes for the AWGN Channel
This paper investigates polar codes for the additive white Gaussian noise
(AWGN) channel. The scaling exponent \(\mu\) of polar codes for a memoryless
channel \(q_{Y|X}\) with capacity \(I(q_{Y|X})\) characterizes the closest gap
between the capacity and non-asymptotic achievable rates in the following way:
For a fixed \(\varepsilon \in (0, 1)\), the gap between the capacity
\(I(q_{Y|X})\) and the maximum non-asymptotic rate \(R_n^*\) achieved by a
length-\(n\) polar code with average error probability \(\varepsilon\) scales
as \(n^{-1/\mu}\), i.e., \(I(q_{Y|X}) - R_n^* = \Theta(n^{-1/\mu})\).
It is well known that the scaling exponent \(\mu\) for any binary-input
memoryless channel (BMC) with \(I(q_{Y|X}) \in (0,1)\) is bounded above by
\(4.714\), which was shown by an explicit construction of polar codes. Our main
result shows that \(4.714\) remains to be a valid upper bound on the scaling
exponent for the AWGN channel. Our proof technique involves the following two
ideas: (i) The capacity of the AWGN channel can be achieved within a gap of
\(O(n^{-1/\mu}\sqrt{\log n})\) by using an input alphabet consisting of \(n\)
constellations and restricting the input distribution to be uniform; (ii) The
capacity of a multiple access channel (MAC) with an input alphabet consisting
of \(n\) constellations can be achieved within a gap of
\(O(n^{-1/\mu}\log n)\) by using a superposition of \(\log n\) binary-input
polar codes. In addition, we investigate the performance of polar codes in the
moderate deviations regime where both the gap to capacity and the error
probability vanish as \(n\) grows. An explicit construction of polar codes is
proposed to obey a certain tradeoff between the gap to capacity and the decay
rate of the error probability for the AWGN channel.
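To give a feel for what a scaling exponent means in practice, the short sketch below treats the gap to capacity as an exact power law in the block length, ignoring the constants hidden in the asymptotic notation, and plugs in the upper bound 4.714 quoted above. The helper name `block_length_factor` is hypothetical.

```python
def block_length_factor(gap_reduction, mu):
    """Factor by which the block length n must grow so that a gap
    proportional to n**(-1/mu) shrinks by `gap_reduction`.

    Solving (n_new / n_old) ** (-1 / mu) = 1 / gap_reduction
    gives n_new / n_old = gap_reduction ** mu.
    """
    return gap_reduction ** mu


MU_UPPER_BOUND = 4.714  # upper bound on the scaling exponent from the abstract

# Growth in block length needed to halve the gap under this idealised model.
factor = block_length_factor(2.0, MU_UPPER_BOUND)
```

Under this simplified model, halving the gap to capacity at the bound of 4.714 requires a block length roughly 26 times longer, which is why a smaller scaling exponent is desirable.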
Estimating Mixture Entropy with Pairwise Distances
Artemy Kolchinsky , Brendan D. Tracey Subjects : Information Theory (cs.IT) ; Methodology (stat.ME); Machine Learning (stat.ML)
Mixture distributions arise in many parametric and non-parametric settings,
for example in Gaussian mixture models and in non-parametric estimation. It is
often necessary to compute the entropy of a mixture, but in most cases this
quantity has no closed-form expression, making some form of approximation
necessary. We propose a family of estimators based on a pairwise distance
function between mixture components, and show that this estimator class has
many attractive properties. For many distributions of interest, the proposed
estimators are efficient to compute, differentiable in the mixture parameters,
and become exact when the mixture components are clustered. We prove this
family includes lower and upper bounds on the mixture entropy. The Chernoff
\(\alpha\)-divergence gives a lower bound when chosen as the distance function,
with the Bhattacharyya distance providing the tightest lower bound for
components that are symmetric and members of a location family. The
Kullback-Leibler divergence gives an upper bound when used as the distance
function. We provide closed-form expressions of these bounds for mixtures of
Gaussians, and discuss their applications to the estimation of mutual
information. We then demonstrate that our bounds are significantly tighter than
well-known existing bounds using numeric simulations. This pairwise estimator
class is very useful in optimization problems involving
maximization/minimization of entropy and mutual information, such as MaxEnt and
rate distortion problems.
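The abstract does not spell out the estimator, so the sketch below assumes it takes the form H_D = H(X|C) - sum_i c_i ln sum_j c_j exp(-D(p_i, p_j)), where c_i are the mixture weights and D is the chosen pairwise distance. It is restricted to one-dimensional Gaussian components, and all function names are hypothetical. The brute-force numerical integral is only there to compare the two bounds against a reference value.

```python
import math


def gaussian_entropy(sigma):
    """Differential entropy (in nats) of a 1-D Gaussian with std `sigma`."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def kl_divergence(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)) for 1-D Gaussians."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


def bhattacharyya_distance(m1, s1, m2, s2):
    """Bhattacharyya distance between two 1-D Gaussians."""
    var_sum = s1 ** 2 + s2 ** 2
    return ((m1 - m2) ** 2 / (4 * var_sum)
            + 0.5 * math.log(var_sum / (2 * s1 * s2)))


def pairwise_entropy_estimate(weights, means, sigmas, distance):
    """Assumed estimator: H(X|C) - sum_i c_i ln sum_j c_j exp(-D(p_i, p_j))."""
    conditional = sum(w * gaussian_entropy(s) for w, s in zip(weights, sigmas))
    correction = 0.0
    for wi, mi, si in zip(weights, means, sigmas):
        inner = sum(wj * math.exp(-distance(mi, si, mj, sj))
                    for wj, mj, sj in zip(weights, means, sigmas))
        correction -= wi * math.log(inner)
    return conditional + correction


def numeric_entropy(weights, means, sigmas, lo, hi, steps=20000):
    """Reference entropy via a midpoint Riemann sum of -p(x) ln p(x)."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        p = sum(w * math.exp(-(x - m) ** 2 / (2 * s ** 2))
                / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(weights, means, sigmas))
        if p > 0.0:
            total -= p * math.log(p) * dx
    return total


# Hypothetical two-component mixture.
W, M, S = [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]
lower = pairwise_entropy_estimate(W, M, S, bhattacharyya_distance)
upper = pairwise_entropy_estimate(W, M, S, kl_divergence)
reference = numeric_entropy(W, M, S, lo=-10.0, hi=12.0)
```

When every component is identical, all pairwise distances are zero and the correction term vanishes, so the estimate reduces to the entropy of a single component, matching the claim that the estimators become exact when components are clustered.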
Delay Optimal Scheduling for Chunked Random Linear Network Coding Broadcast
Comments: 13 pages, 17 figures, to be submitted in Transactions on Control of Network Systems
Subjects: Information Theory (cs.IT)
We study the broadcast transmission of a single file to an arbitrary number
of receivers using Random Linear Network Coding (RLNC) in a network with
unreliable channels. Due to the increased computational complexity of the
decoding process (especially for large files) we apply chunked RLNC (i.e. RLNC
is applied within non-overlapping subsets of the file).
In our work we show the optimality of the Least Received (LR) batch
scheduling policy (which was introduced in our prior work) with regard to the
expected file transfer completion time. Furthermore, we refine some of our
earlier results, namely the expected file transfer completion time of the LR
policy and the minimum achievable coding window size in the case of a
user-defined delay constraint. Finally, we experimentally evaluate a modification of
the LR policy in a more realistic system setting with reduced feedback from the
receivers.
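The idea of chunked RLNC can be sketched as follows. This is an illustration under strong simplifying assumptions, none of which come from the paper: coding is over GF(2), there is a single receiver on a lossless channel, packets are modelled as integers, and chunks are sent one after another, so the LR scheduling policy is not modelled. All function names are hypothetical.

```python
import random


def split_into_chunks(packets, chunk_size):
    """Partition the file into non-overlapping chunks of source packets."""
    return [packets[i:i + chunk_size]
            for i in range(0, len(packets), chunk_size)]


def encode(chunk, rng):
    """Return (coeffs, payload): a random non-zero GF(2) combination.

    Bit i of `coeffs` says whether source packet i is XORed into `payload`.
    """
    coeffs = 0
    while coeffs == 0:
        coeffs = rng.getrandbits(len(chunk))
    payload = 0
    for i, packet in enumerate(chunk):
        if (coeffs >> i) & 1:
            payload ^= packet
    return coeffs, payload


def decode(coded, chunk_size):
    """Gauss-Jordan elimination over GF(2).

    Returns the source packets, or None while the rank is still too low.
    """
    pivots = {}  # pivot bit -> (coeffs, payload), kept in reduced form
    for coeffs, payload in coded:
        # Remove every existing pivot bit from the incoming row.
        for bit, (pc, pp) in pivots.items():
            if (coeffs >> bit) & 1:
                coeffs ^= pc
                payload ^= pp
        if coeffs == 0:
            continue  # linearly dependent packet, carries no new information
        bit = coeffs.bit_length() - 1
        # Remove the new pivot bit from the rows already stored.
        for b, (pc, pp) in list(pivots.items()):
            if (pc >> bit) & 1:
                pivots[b] = (pc ^ coeffs, pp ^ payload)
        pivots[bit] = (coeffs, payload)
    if len(pivots) < chunk_size:
        return None
    return [pivots[i][1] for i in range(chunk_size)]


def transfer(source, chunk_size, rng):
    """Send coded packets chunk by chunk until each chunk is decodable."""
    decoded_file, sent = [], 0
    for chunk in split_into_chunks(source, chunk_size):
        received, decoded = [], None
        while decoded is None:
            received.append(encode(chunk, rng))
            sent += 1
            decoded = decode(received, len(chunk))
        decoded_file.extend(decoded)
    return decoded_file, sent


rng = random.Random(0)
source_packets = [rng.getrandbits(32) for _ in range(8)]
recovered, packets_sent = transfer(source_packets, chunk_size=4, rng=rng)
```

Because decoding works on one chunk at a time, the elimination step only ever handles a matrix of side `chunk_size` rather than the full file length, which is the complexity saving that motivates chunking.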