内容简介:arXiv Paper Daily: Wed, 14 Jun 2017
Neural and Evolutionary Computing
Temporally Efficient Deep Learning with Spikes
Comments: 8 pages + references and appendix
Subjects:
Neural and Evolutionary Computing (cs.NE)
The vast majority of natural sensory data is temporally redundant. Video
frames or audio samples which are sampled at nearby points in time tend to have
similar values. Typically, deep learning algorithms take no advantage of this
redundancy to reduce computation. This can be an obscene waste of energy. We
present a variant on backpropagation for neural networks in which computation
scales with the rate of change of the data – not the rate at which we process
the data. We do this by having neurons communicate a combination of their
state, and their temporal change in state. Intriguingly, this simple
communication rule give rise to units that resemble biologically-inspired leaky
integrate-and-fire neurons, and to a weight-update rule that is equivalent to a
form of Spike-Timing Dependent Plasticity (STDP), a synaptic learning rule
observed in the brain. We demonstrate that on MNIST and a temporal variant of
MNIST, our algorithm performs about as well as a Multilayer Perceptron trained
with backpropagation, despite only communicating discrete values between
layers.
Prediction of Muscle Activations for Reaching Movements using Deep Neural Networks
Comments: To be presented at the Annual meeting of American Society of Biomechanics 2017
Subjects:
Neural and Evolutionary Computing (cs.NE)
The motor control problem involves determining the time-varying muscle
activation trajectories required to accomplish a given movement. Muscle
redundancy makes motor control a challenging task: there are many possible
activation trajectories that accomplish the same movement. Despite this
redundancy, most movements are accomplished in highly stereotypical ways. For
example, point-to-point reaching movements are almost universally performed
with very similar smooth trajectories. Optimization methods are commonly used
to predict muscle forces for measured movements. However, these approaches
require computationally expensive simulations and are sensitive to the chosen
optimality criteria and regularization. In this work, we investigate deep
autoencoders for the prediction of muscle activation trajectories for
point-to-point reaching movements. We evaluate our DNN predictions with
simulated reaches and two methods to generate the muscle activations: inverse
dynamics (ID) and optimal control (OC) criteria. We also investigate optimal
network parameters and training criteria to improve the accuracy of the
predictions.
From MEGATON to RASCAL: Surfing the Parameter Space of Evolutionary Algorithms
Moshe Sipper , Weixuan Fu , Karuna Ahuja , Jason H. Moore Subjects : Neural and Evolutionary Computing (cs.NE)
The practice of evolutionary algorithms involves a mundane yet inescapable
phase, namely, finding parameters that work well. How big should the population
be? How many generations should the algorithm run? What is the (tournament
selection) tournament size? What probabilities should one assign to crossover
and mutation? All these nagging questions need good answers if one is to
embrace success. Through an extensive series of experiments over multiple
evolutionary algorithm implementations and problems we show that parameter
space tends to be rife with viable parameters. We aver that this renders the
life of the practitioner that much easier, and cap off our study with an
advisory digest for the weary.
Recurrent Inference Machines for Solving Inverse Problems
Patrick Putzky , Max Welling Subjects : Neural and Evolutionary Computing (cs.NE) ; Computer Vision and Pattern Recognition (cs.CV)
Much of the recent research on solving iterative inference problems focuses
on moving away from hand-chosen inference algorithms and towards learned
inference. In the latter, the inference process is unrolled in time and
interpreted as a recurrent neural network (RNN) which allows for joint learning
of model and inference parameters with back-propagation through time. In this
framework, the RNN architecture is directly derived from a hand-chosen
inference algorithm, effectively limiting its capabilities. We propose a
learning framework, called Recurrent Inference Machines (RIM), in which we turn
algorithm construction the other way round: Given data and a task, train an RNN
to learn an inference algorithm. Because RNNs are Turing complete [1, 2] they
are capable to implement any inference algorithm. The framework allows for an
abstraction which removes the need for domain knowledge. We demonstrate in
several image restoration experiments that this abstraction is effective,
allowing us to achieve state-of-the-art performance on image denoising and
super-resolution tasks and superior across-task generalization.
Comments: AAAI 2017
Subjects:
Artificial Intelligence (cs.AI)
; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Monte Carlo tree search (MCTS) is extremely popular in computer Go which
determines each action by enormous simulations in a broad and deep search tree.
However, human experts select most actions by pattern analysis and careful
evaluation rather than brute search of millions of future nteractions. In this
paper, we propose a computer Go system that follows experts way of thinking and
playing. Our system consists of two parts. The first part is a novel deep
alternative neural network (DANN) used to generate candidates of next move.
Compared with existing deep convolutional neural network (DCNN), DANN inserts
recurrent layer after each convolutional layer and stacks them in an
alternative manner. We show such setting can preserve more contexts of local
features and its evolutions which are beneficial for move prediction. The
second part is a long-term evaluation (LTE) module used to provide a reliable
evaluation of candidates rather than a single probability from move predictor.
This is consistent with human experts nature of playing since they can foresee
tens of steps to give an accurate estimation of candidates. In our system, for
each candidate, LTE calculates a cumulative reward after several future
interactions when local variations are settled. Combining criteria from the two
parts, our system determines the optimal choice of next move. For more
comprehensive experiments, we introduce a new professional Go dataset (PGD),
consisting of 253233 professional records. Experiments on GoGoD and PGD
datasets show the DANN can substantially improve performance of move prediction
over pure DCNN. When combining LTE, our system outperforms most relevant
approaches and open engines based on MCTS.
Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks
Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016
Subjects:
Learning (cs.LG)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)
Recommendation algorithms that incorporate techniques from deep learning are
becoming increasingly popular. Due to the structure of the data coming from
recommendation domains (i.e., one-hot-encoded vectors of item preferences),
these algorithms tend to have large input and output dimensionalities that
dominate their overall size. This makes them difficult to train, due to the
limited memory of graphical processing units, and difficult to deploy on mobile
devices with limited hardware. To address these difficulties, we propose Bloom
embeddings, a compression technique that can be applied to the input and output
of neural network models dealing with sparse high-dimensional binary-coded
instances. Bloom embeddings are computationally efficient, and do not seriously
compromise the accuracy of the model up to 1/5 compression ratios. In some
cases, they even improve over the original accuracy, with relative increases up
to 12%. We evaluate Bloom embeddings on 7 data sets and compare it against 4
alternative methods, obtaining favorable results. We also discuss a number of
further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time
operation, zero or marginal space requirements, training time speedups, or the
fact that they do not require any change to the core model architecture or
training configuration.
A Supervised Approach to Extractive Summarisation of Scientific Papers
Comments: 11 pages, 6 figures
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Applications (stat.AP); Machine Learning (stat.ML)
Automatic summarisation is a popular approach to reduce a document to its
main arguments. Recent research in the area has focused on neural approaches to
summarisation, which can be very data-hungry. However, few large datasets exist
and none for the traditionally popular domain of scientific publications, which
opens up challenging research avenues centered on encoding large, complex
documents. In this paper, we introduce a new dataset for summarisation of
computer science publications by exploiting a large resource of author provided
summaries and show straightforward ways of extending it further. We develop
models on the dataset making use of both neural sentence encoding and
traditionally used summarisation features and show that models which encode
sentences as well as their local and global context perform best, significantly
outperforming well-established baseline methods.
Computer Vision and Pattern Recognition
Video Imagination from a Single Image with Transformation Generation
Comments: 9 pages, 10 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In this work, we focus on a challenging task: synthesizing multiple imaginary
videos given a single image. Major problems come from high dimensionality of
pixel space and the ambiguity of potential motions. To overcome those problems,
we propose a new framework that produce imaginary videos by transformation
generation. The generated transformations are applied to the original image in
a novel volumetric merge network to reconstruct frames in imaginary video.
Through sampling different latent variables, our method can output different
imaginary video samples. The framework is trained in an adversarial way with
unsupervised learning. For evaluation, we propose a new assessment metric
(RIQA). In experiments, we test on 3 datasets varying from synthetic data to
natural scene. Our framework achieves promising performance in image quality
assessment. The visual inspection indicates that it can successfully generate
diverse five-frame videos in acceptable perceptual quality.
Joint Max Margin and Semantic Features for Continuous Event Detection in Complex Scenes
Comments: submit to journal of Computer Vision and Image Understanding
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In this paper the problem of complex event detection in the continuous domain
(i.e. events with unknown starting and ending locations) is addressed. Existing
event detection methods are limited to features that are extracted from the
local spatial or spatio-temporal patches from the videos. However, this makes
the model vulnerable to the events with similar concepts e.g. “Open drawer” and
“Open cupboard”. In this work, in order to address the aforementioned
limitations we present a novel model based on the combination of semantic and
temporal features extracted from video frames. We train a max-margin classifier
on top of the extracted features in an adaptive framework that is able to
detect the events with unknown starting and ending locations. Our model is
based on the Bidirectional Region Neural Network and large margin Structural
Output SVM. The generality of our model allows it to be simply applied to
different labeled and unlabeled datasets. We finally test our algorithm on
three challenging datasets, “UCF 101-Action Recognition”, “MPII Cooking
Activities” and “Hollywood”, and we report state-of-the-art performance.
Deep Learning-Based Food Calorie Estimation Method in Dietary Assessment
Comments: 13 pages. arXiv admin note: substantial text overlap with arXiv:1705.07632
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Obesity treatment requires obese patients to record all food intakes per day.
Computer vision has been introduced to estimate calories from food images. In
order to increase accuracy of detection and reduce the error of volume
estimation in food calorie estimation, we present our calorie estimation method
in this paper. To estimate calorie of food, a top view and side view is needed.
Faster R-CNN is used to detect the food and calibration object. GrabCut
algorithm is used to get each food’s contour. Then the volume is estimated with
the food and corresponding object. Finally we estimate each food’s calorie. And
the experiment results show our estimation method is effective.
Text Extraction From Texture Images Using Masked Signal Decomposition
Comments: arXiv admin note: text overlap with arXiv:1704.07711
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Text extraction is an important problem in image processing with applications
from optical character recognition to autonomous driving. Most of the
traditional text segmentation algorithms consider separating text from a simple
background (which usually has a different color from texts). In this work we
consider separating texts from a textured background, that has similar color to
texts. We look at this problem from a signal decomposition perspective, and
consider a more realistic scenario where signal components are overlaid on top
of each other (instead of adding together). When the signals are overlaid, to
separate signal components, we need to find a binary mask which shows the
support of each component. Because directly solving the binary mask is
intractable, we relax this problem to the approximated continuous problem, and
solve it by alternating optimization method. We show that the proposed
algorithm achieves significantly better results than other recent works on
several challenging images.
Probabilistic RGB-D Odometry based on Points, Lines and Planes Under Depth Uncertainty
Pedro F. Proenca , Yang Gao Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Robotics (cs.RO)
This work proposes a robust visual odometry method for structured
environments that combines point features with line and plane segments,
extracted through an RGB-D camera. Noisy depth maps are processed by a
probabilistic depth fusion framework based on Mixtures of Gaussians to denoise
and derive the depth uncertainty, which is then propagated throughout the
visual odometry pipeline. Probabilistic 3D plane and line fitting solutions are
used to model the uncertainties of the feature parameters and pose is estimated
by combining the three types of primitives based on their uncertainties.
Performance evaluation on RGB-D sequences collected in this work and two public
RGB-D datasets: TUM and ICL-NUIM show the benefit of using the proposed depth
fusion framework and combining the three feature-types, particularly in scenes
with low-textured surfaces, dynamic objects and missing depth measurements.
Long-Term Video Interpolation with Bidirectional Predictive Network
Comments: 5 pages, 7 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This paper considers the challenging task of long-term video interpolation.
Unlike most existing methods that only generate few intermediate frames between
existing adjacent ones, we attempt to speculate or imagine the procedure of an
episode and further generate multiple frames between two non-consecutive frames
in videos. In this paper, we present a novel deep architecture called
bidirectional predictive network (BiPN) that predicts intermediate frames from
two opposite directions. The bidirectional architecture allows the model to
learn scene transformation with time as well as generate longer video
sequences. Besides, our model can be extended to predict multiple possible
procedures by sampling different noise vectors. A joint loss composed of clues
in image and feature spaces and adversarial loss is designed to train our
model. We demonstrate the advantages of BiPN on two benchmarks Moving 2D Shapes
and UCF101 and report competitive results to recent approaches.
SEP-Nets: Small and Effective Pattern Networks
Zhe Li , Xiaoyu Wang , Xutao Lv , Tianbao Yang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG)
While going deeper has been witnessed to improve the performance of
convolutional neural networks (CNN), going smaller for CNN has received
increasing attention recently due to its attractiveness for mobile/embedded
applications. It remains an active and important topic how to design a small
network while retaining the performance of large and deep CNNs (e.g., Inception
Nets, ResNets). Albeit there are already intensive studies on compressing the
size of CNNs, the considerable drop of performance is still a key concern in
many designs. This paper addresses this concern with several new contributions.
First, we propose a simple yet powerful method for compressing the size of deep
CNNs based on parameter binarization. The striking difference from most
previous work on parameter binarization/quantization lies at different
treatments of (1 imes 1) convolutions and (k imes k) convolutions ((k>1)),
where we only binarize (k imes k) convolutions into binary patterns. The
resulting networks are referred to as pattern networks. By doing this, we show
that previous deep CNNs such as GoogLeNet and Inception-type Nets can be
compressed dramatically with marginal drop in performance. Second, in light of
the different functionalities of (1 imes 1) (data projection/transformation)
and (k imes k) convolutions (pattern extraction), we propose a new block
structure codenamed the pattern residual block that adds transformed feature
maps generated by (1 imes 1) convolutions to the pattern feature maps
generated by (k imes k) convolutions, based on which we design a small network
with (sim 1) million parameters. Combining with our parameter binarization, we
achieve better performance on ImageNet than using similar sized networks
including recently released Google MobileNets.
Comments: Submitted to BMVC 2017 on 2nd May 2017
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Training a deep convolutional neural net typically starts with a random
initialisation of all filters in all layers which severely reduces the forward
signal and back-propagated error and leads to slow and sub-optimal training.
Techniques that counter that focus on either increasing the signal or
increasing the gradients adaptively but the model behaves very differently at
the beginning of training compared to later when stable pathways through the
net have been established. To compound this problem the effective minibatch
size varies greatly between layers at different depths and between individual
filters as activation sparsity typically increases with depth leading to a
reduction in effective learning rate since gradients may superpose rather than
add and this further compounds the covariate shift problem as deeper neurons
are less able to adapt to upstream shift.
Proposed here is a method of automatic gain control of the signal built into
each convolutional neuron that achieves equivalent or superior performance than
batch normalisation and is compatible with single sample or minibatch gradient
descent. The same model is used both for training and inference.
The technique comprises a scaled per sample map mean subtraction from the raw
convolutional filter output followed by scaling of the difference.
Contrast Enhancement Estimation for Digital Image Forensics
Longyin Wen , Honggang Qi , Siwei Lyu Subjects : Computer Vision and Pattern Recognition (cs.CV)
Inconsistency in contrast enhancement can be used to expose image forgeries.
In this work, we describe a new method to estimate contrast enhancement from a
single image. Our method takes advantage of the nature of contrast enhancement
as a mapping between pixel values, and the distinct characteristics it
introduces to the image pixel histogram. Our method recovers the original pixel
histogram and the contrast enhancement simultaneously from a single image with
an iterative algorithm. Unlike previous methods, our method is robust in the
presence of additive noise perturbations that are used to hide the traces of
contrast enhancement. Furthermore, we also develop an e effective method to to
detect image regions undergone contrast enhancement transformations that are
different from the rest of the image, and use this method to detect composite
images. We perform extensive experimental evaluations to demonstrate the
efficacy and efficiency of our method method.
Comments: 5 pages, 6 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Plant aliveness is proven through laboratory experiments and special
scientific instruments. In this paper, we aim to detect the degree of animation
of plants based on the magnification of the small color changes in the plant’s
green leaves using the Eulerian video magnification. Capturing the video under
a controlled environment, e.g., using a tripod and direct current (DC) light
sources, reduces camera movements and minimizes light fluctuations; we aim to
reduce the external factors as much as possible. The acquired video is then
stabilized and a proposed algorithm used to reduce the illumination variations.
Lastly, the Euler magnification is utilized to magnify the color changes on the
light invariant video. The proposed system does not require any special purpose
instruments as it uses a digital camera with a regular frame rate. The results
of magnified color changes on both natural and plastic leaves show that the
live green leaves have color changes in contrast to the plastic leaves. Hence,
we can argue that the color changes of the leaves are due to biological
operations, such as photosynthesis. To date, this is possibly the first work
that focuses on interpreting visually, some biological operations of plants
without any special purpose instruments.
Criteria Sliders: Learning Continuous Database Criteria via Interactive Ranking
James Tompkin , Kwang In Kim , Hanspeter Pfister , Christian Theobalt Subjects : Computer Vision and Pattern Recognition (cs.CV)
Large databases are often organized by hand-labeled metadata, or criteria,
which are expensive to collect. We can use unsupervised learning to model
database variation, but these models are often high dimensional, complex to
parameterize, or require expert knowledge. We learn low-dimensional continuous
criteria via interactive ranking, so that the novice user need only describe
the relative ordering of examples. This is formed as semi-supervised label
propagation in which we maximize the information gained from a limited number
of examples. Further, we actively suggest data points to the user to rank in a
more informative way than existing work. Our efficient approach allows users to
interactively organize thousands of data points along 1D and 2D continuous
sliders. We experiment with datasets of imagery and geometry to demonstrate
that our tool is useful for quickly assessing and organizing the content of
large databases.
A Direction Search and Spectral Clustering Based Approach to Subspace Clustering
Mostafa Rahmani , George Atia Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Information Retrieval (cs.IR); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
This paper presents a new spectral-clustering-based approach to the subspace
clustering problem in which the data lies in the union of an unknown number of
unknown linear subspaces. Underpinning the proposed method is a convex program
for optimal direction search, which for each data point d, finds an optimal
direction in the span of the data that has minimum projection on the other data
points and non-vanishing projection on d. The obtained directions are
subsequently leveraged to identify a neighborhood set for each data point. An
Alternating Direction Method of Multipliers (ADMM) framework is provided to
efficiently solve for the optimal directions. The proposed method is shown to
often outperform the existing subspace clustering methods, particularly for
unwieldy scenarios involving high levels of noise and close subspaces, and
yields the state-of-the-art results for the problem of face clustering using
subspace segmentation.
Indirect Image Registration with Large Diffeomorphic Deformations
Comments: 34 pages, 4 figures, 1 table
Subjects:
Numerical Analysis (math.NA)
; Computer Vision and Pattern Recognition (cs.CV); Dynamical Systems (math.DS); Functional Analysis (math.FA); Optimization and Control (math.OC)
We introduce a variational framework for indirect image registration where a
template is registered against a target that is known only through indirect
noisy observations, such as in tomographic imaging. The registration uses
diffeomorphisms that transform the template through a (group) action. These
diffeomorphisms are generated using the large deformation diffeomorphic metric
mapping framework, i.e., they are given as solutions of a flow equation that is
defined by a velocity field. We prove existence of solutions to this indirect
image registration procedure and provide examples of its performance on 2D
tomography with very sparse and/or highly noisy data.
Recurrent Inference Machines for Solving Inverse Problems
Patrick Putzky , Max Welling Subjects : Neural and Evolutionary Computing (cs.NE) ; Computer Vision and Pattern Recognition (cs.CV)
Much of the recent research on solving iterative inference problems focuses
on moving away from hand-chosen inference algorithms and towards learned
inference. In the latter, the inference process is unrolled in time and
interpreted as a recurrent neural network (RNN) which allows for joint learning
of model and inference parameters with back-propagation through time. In this
framework, the RNN architecture is directly derived from a hand-chosen
inference algorithm, effectively limiting its capabilities. We propose a
learning framework, called Recurrent Inference Machines (RIM), in which we turn
algorithm construction the other way round: Given data and a task, train an RNN
to learn an inference algorithm. Because RNNs are Turing complete [1, 2] they
are capable to implement any inference algorithm. The framework allows for an
abstraction which removes the need for domain knowledge. We demonstrate in
several image restoration experiments that this abstraction is effective,
allowing us to achieve state-of-the-art performance on image denoising and
super-resolution tasks and superior across-task generalization.
SmoothGrad: removing noise by adding noise
Comments: 10 pages
Subjects:
Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Explaining the output of a deep network remains a challenge. In the case of
an image classifier, one type of explanation is to identify pixels that
strongly influence the final decision. A starting point for this strategy is
the gradient of the class score function with respect to the input image. This
gradient can be interpreted as a sensitivity map, and there are several
techniques that elaborate on this basic idea. This paper makes two
contributions: it introduces SmoothGrad, a simple method that can help visually
sharpen gradient-based sensitivity maps, and it discusses lessons in the
visualization of these maps. We publish the code for our experiments and a
website with our results.
Artificial Intelligence
Technical Report: Implementation and Validation of a Smart Health Application
Comments: 4-page Tech Report
Subjects:
Artificial Intelligence (cs.AI)
; Computers and Society (cs.CY)
In this article, we explain in detail the internal structures and databases
of a smart health application. Moreover, we describe how to generate a
statistically sound synthetic dataset using real-world medical data.
Comments: AAAI 2017
Subjects:
Artificial Intelligence (cs.AI)
; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Monte Carlo tree search (MCTS) is extremely popular in computer Go which
determines each action by enormous simulations in a broad and deep search tree.
However, human experts select most actions by pattern analysis and careful
evaluation rather than brute search of millions of future nteractions. In this
paper, we propose a computer Go system that follows experts way of thinking and
playing. Our system consists of two parts. The first part is a novel deep
alternative neural network (DANN) used to generate candidates of next move.
Compared with existing deep convolutional neural network (DCNN), DANN inserts
recurrent layer after each convolutional layer and stacks them in an
alternative manner. We show such setting can preserve more contexts of local
features and its evolutions which are beneficial for move prediction. The
second part is a long-term evaluation (LTE) module used to provide a reliable
evaluation of candidates rather than a single probability from move predictor.
This is consistent with human experts nature of playing since they can foresee
tens of steps to give an accurate estimation of candidates. In our system, for
each candidate, LTE calculates a cumulative reward after several future
interactions when local variations are settled. Combining criteria from the two
parts, our system determines the optimal choice of next move. For more
comprehensive experiments, we introduce a new professional Go dataset (PGD),
consisting of 253233 professional records. Experiments on GoGoD and PGD
datasets show the DANN can substantially improve performance of move prediction
over pure DCNN. When combining LTE, our system outperforms most relevant
approaches and open engines based on MCTS.
Meta learning Framework for Automated Driving
Ahmad El Sallab , Mahmoud Saeed , Omar Abdel Tawab , Mohammed Abdou Subjects : Artificial Intelligence (cs.AI) ; Machine Learning (stat.ML)
The success of automated driving deployment is highly depending on the
ability to develop an efficient and safe driving policy. The problem is well
formulated under the framework of optimal control as a cost optimization
problem. Model based solutions using traditional planning are efficient, but
require the knowledge of the environment model. On the other hand, model free
solutions suffer sample inefficiency and require too many interactions with the
environment, which is infeasible in practice. Methods under the Reinforcement
Learning framework usually require the notion of a reward function, which is
not available in the real world. Imitation learning helps in improving sample
efficiency by introducing prior knowledge obtained from the demonstrated
behavior, on the risk of exact behavior cloning without generalizing to unseen
environments. In this paper we propose a Meta learning framework, based on data
set aggregation, to improve generalization of imitation learning algorithms.
Under the proposed framework, we propose MetaDAgger, a novel algorithm which
tackles the generalization issues in traditional imitation learning. We use The
Open Race Car Simulator (TORCS) to test our algorithm. Results on unseen test
tracks show significant improvement over traditional imitation learning
algorithms, improving the learning time and sample efficiency in the same time.
The results are also supported by visualization of the learnt features to prove
generalization of the captured details.
On Natural Language Generation of Formal Argumentation
Comments: 17 pages, 4 figures, technical report
Subjects:
Artificial Intelligence (cs.AI)
In this paper we provide a first analysis of the research questions that
arise when dealing with the problem of communicating pieces of formal
argumentation through natural language interfaces. It is a generally held
opinion that formal models of argumentation naturally capture human argument,
and some preliminary studies have focused on justifying this view.
Unfortunately, the results are not only inconclusive, but seem to suggest that
explaining formal argumentation to humans is a rather articulated task.
Graphical models for expressing argumentation-based reasoning are appealing,
but often humans require significant training to use these tools effectively.
We claim that natural language interfaces to formal argumentation systems offer
a real alternative, and may be the way forward for systems that capture human
argument.
Comments: conference
Subjects:
Artificial Intelligence (cs.AI)
; Computers and Society (cs.CY)
A major investment made by a telecom operator goes into the infrastructure
and its maintenance, while business revenues are proportional to how big and
good the customer base is. We present a data-driven analytic strategy based on
combinatorial optimization and analysis of historical data. The data cover
historical mobility of the users in one region of Sweden during a week.
Applying the proposed method to the case study, we have identified the optimal
proportion of geo-demographic segments in the customer base, developed a
functionality to assess the potential of a planned marketing campaign, and
explored the problem of an optimal number and types of the geo-demographic
segments to target through marketing campaigns. With the help of fuzzy logic,
the conclusions of data analysis are automatically translated into
comprehensible recommendations in a natural language.
Fuzzy Recommendations in Marketing Campaigns
Comments: conference
Subjects:
Artificial Intelligence (cs.AI)
The population in Sweden is growing rapidly due to immigration. In this
light, the issue of infrastructure upgrades to provide telecommunication
services is of importance. New antennas can be installed at hot spots of user
demand, which will require an investment, and/or the clientele expansion can be
carried out in a planned manner to promote the exploitation of the
infrastructure in the less loaded geographical zones. In this paper, we explore
the second alternative. Informally speaking, the term Infrastructure-Stressing
describes a user who stays in the zones of high demand, which are prone to
produce service failures, if further loaded. We have studied the
Infrastructure-Stressing population in the light of their correlation with
geo-demographic segments. This is motivated by the fact that specific
geo-demographic segments can be targeted via marketing campaigns. Fuzzy logic
is applied to create an interface between big data, numeric methods for
processing big data and a manager.
Item Difficulty-Based Label Aggregation Models for Crowdsourcing
Chi Hong Subjects : Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Learning (cs.LG)
A large amount of labeled data is required for supervised learning. However,
labeling by domain experts is expensive and time-consuming. A low cost and high
efficiency way to obtain large training datasets is to aggregate noisy labels
collected from non-professional crowds. Prior works have proposed confusion
matrices to evaluate the reliability of workers. In this paper, we redefine the
structure of the confusion matrices and propose two Bayesian Network based
methods which utilize item difficulty in label aggregation. We assume that
labels are generated by a probability distribution over confusion matrices,
item difficulties, labels and true labels. We use Markov chain Monte Carlo
method to generate samples from the posterior distribution of model parameters
and then infer the results. To avoid bad local optima, we design a method to
preliminarily predict the difficulty of each item and initialize the model
parameters. We also introduce how to improve the scalability of our model.
Empirical results show that our methods consistently outperform
state-of-the-art methods.
A New Probabilistic Algorithm for Approximate Model Counting
Cunjing Ge , Feifei Ma , Tian Liu , Jian Zhang Subjects : Artificial Intelligence (cs.AI)
Constrained counting is important in domains ranging from artificial
intelligence to software analysis. There are already a few approaches for
counting models over various types of constraints. Recently, hashing-based
approaches achieve both theoretical guarantees and scalability, but still rely
on solution enumeration. In this paper, a new probabilistic polynomial time
approximate model counter is proposed, which is also a hashing-based universal
framework, but with only satisfiability queries. A variant with a dynamic
stopping criterion is also presented. Empirical evaluation over benchmarks on
propositional logic formulas and SMT(BV) formulas shows that the approach is
promising.
Gradient descent GAN optimization is locally stable
Vaishnavh Nagarajan , J. Zico Kolter Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Despite their growing prominence, optimization in generative adversarial
networks (GANs) is still a poorly-understood topic. In this paper, we analyze
the “gradient descent” form of GAN optimization (i.e., the natural setting
where we simultaneously take small gradient steps in both generator and
discriminator parameters). We show that even though GAN optimization does not
correspond to a convex-concave game, even for simple parameterizations, under
proper conditions, equilibrium points of this optimization procedure are still
locally asymptotically stable for the traditional GAN formulation. On the other
hand, we show that the recently-proposed Wasserstein GAN can have
non-convergent limit cycles near equilibrium. Motivated by this stability
analysis, we propose an additional regularization term for gradient descent GAN
updates, which is able to guarantee local stability for both the WGAN and for
the traditional GAN, and also shows practical promise in speeding up
convergence and addressing mode collapse.
Zero-Shot Relation Extraction via Reading Comprehension
Comments: CoNLL 2017
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Learning (cs.LG)
We show that relation extraction can be reduced to answering simple reading
comprehension questions, by associating one or more natural-language questions
with each relation slot. This reduction has several advantages: we can (1)
learn relation-extraction models by extending recent neural
reading-comprehension techniques, (2) build very large training sets for those
models by combining relation-specific crowd-sourced questions with distant
supervision, and even (3) do zero-shot learning by extracting new relation
types that are only specified at test-time, for which we have no labeled
training examples. Experiments on a Wikipedia slot-filling task demonstrate
that the approach can generalize to new questions for known relation types with
high accuracy, and that zero-shot generalization to unseen relation types is
possible, at lower accuracy levels, setting the bar for future work on this
task.
Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks
Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016
Subjects:
Learning (cs.LG)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)
Recommendation algorithms that incorporate techniques from deep learning are
becoming increasingly popular. Due to the structure of the data coming from
recommendation domains (i.e., one-hot-encoded vectors of item preferences),
these algorithms tend to have large input and output dimensionalities that
dominate their overall size. This makes them difficult to train, due to the
limited memory of graphical processing units, and difficult to deploy on mobile
devices with limited hardware. To address these difficulties, we propose Bloom
embeddings, a compression technique that can be applied to the input and output
of neural network models dealing with sparse high-dimensional binary-coded
instances. Bloom embeddings are computationally efficient, and do not seriously
compromise the accuracy of the model up to 1/5 compression ratios. In some
cases, they even improve over the original accuracy, with relative increases up
to 12%. We evaluate Bloom embeddings on 7 data sets and compare it against 4
alternative methods, obtaining favorable results. We also discuss a number of
further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time
operation, zero or marginal space requirements, training time speedups, or the
fact that they do not require any change to the core model architecture or
training configuration.
A Supervised Approach to Extractive Summarisation of Scientific Papers
Comments: 11 pages, 6 figures
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Applications (stat.AP); Machine Learning (stat.ML)
Automatic summarisation is a popular approach to reduce a document to its
main arguments. Recent research in the area has focused on neural approaches to
summarisation, which can be very data-hungry. However, few large datasets exist
and none for the traditionally popular domain of scientific publications, which
opens up challenging research avenues centered on encoding large, complex
documents. In this paper, we introduce a new dataset for summarisation of
computer science publications by exploiting a large resource of author provided
summaries and show straightforward ways of extending it further. We develop
models on the dataset making use of both neural sentence encoding and
traditionally used summarisation features and show that models which encode
sentences as well as their local and global context perform best, significantly
outperforming well-established baseline methods.
Causal Discovery in the Presence of Measurement Error: Identifiability Conditions
Comments: 15 pages, 5 figures, 1 table
Subjects:
Methodology (stat.ME)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Measurement error in the observed values of the variables can greatly change
the output of various causal discovery methods. This problem has received much
attention in multiple fields, but it is not clear to what extent the causal
model for the measurement-error-free variables can be identified in the
presence of measurement error with unknown variance. In this paper, we study
precise sufficient identifiability conditions for the measurement-error-free
causal model and show what information of the causal model can be recovered
from observed data. In particular, we present two different sets of
identifiability conditions, based on the second-order statistics and
higher-order statistics of the data, respectively. The former was inspired by
the relationship between the generating model of the
measurement-error-contaminated data and the factor analysis model, and the
latter makes use of the identifiability result of the over-complete independent
component analysis problem.
Optimal Auctions through Deep Learning
Paul Dütting , Zhe Feng , Harikrishna Narasimhan , David C. Parkes Subjects : Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Learning (cs.LG)
Designing an auction that maximizes expected revenue is an intricate task.
Indeed, as of today–despite major efforts and impressive progress over the
past few years–only the single-item case is fully understood. In this work, we
initiate the exploration of the use of tools from deep learning on this topic.
The design objective is revenue optimal, dominant-strategy incentive compatible
auctions. We show that multi-layer neural networks can learn almost-optimal
auctions for settings for which there are analytical solutions, such as
Myerson’s auction for a single item, Manelli and Vincent’s mechanism for a
single bidder with additive preferences over two items, or Yao’s auction for
two additive bidders with binary support distributions and multiple items, even
if no prior knowledge about the form of optimal auctions is encoded in the
network and the only feedback during training is revenue and regret. We further
show how characterization results, even rather implicit ones such as Rochet’s
characterization through induced utilities and their gradients, can be
leveraged to obtain more precise fits to the optimal design. We conclude by
demonstrating the potential of deep learning for deriving optimal auctions with
high revenue for poorly understood problems.
Information Retrieval
Recurrent Latent Variable Networks for Session-Based Recommendation
Sotirios Chatzis , Panayiotis Christodoulou , Andreas S. Andreou Subjects : Information Retrieval (cs.IR) ; Learning (cs.LG); Machine Learning (stat.ML)
In this work, we attempt to ameliorate the impact of data sparsity in the
context of session-based recommendation. Specifically, we seek to devise a
machine learning mechanism capable of extracting subtle and complex underlying
temporal dynamics in the observed session data, so as to inform the
recommendation algorithm. To this end, we improve upon systems that utilize
deep learning techniques with recurrently connected units; we do so by adopting
concepts from the field of Bayesian statistics, namely variational inference.
Our proposed approach consists in treating the network recurrent units as
stochastic latent variables with a prior distribution imposed over them. On
this basis, we proceed to infer corresponding posteriors; these can be used for
prediction and recommendation generation, in a way that accounts for the
uncertainty in the available sparse training data. To allow for our approach to
easily scale to large real-world datasets, we perform inference under an
approximate amortized variational inference (AVI) setup, whereby the learned
posteriors are parameterized via (conventional) neural networks. We perform an
extensive experimental evaluation of our approach using challenging benchmark
datasets, and illustrate its superiority over existing state-of-the-art
techniques.
RELink: A Research Framework and Test Collection for Entity-Relationship Retrieval
Comments: SIGIR 17 (resource)
Subjects:
Information Retrieval (cs.IR)
Improvements of entity-relationship (E-R) search techniques have been
hampered by a lack of test collections, particularly for complex queries
involving multiple entities and relationships. In this paper we describe a
method for generating E-R test queries to support comprehensive E-R search
experiments. Queries and relevance judgments are created from content that
exists in a tabular form where columns represent entity types and the table
structure implies one or more relationships among the entities. Editorial work
involves creating natural language queries based on relationships represented
by the entries in the table. We have publicly released the RELink test
collection comprising 600 queries and relevance judgments obtained from a
sample of Wikipedia List-of-lists-of-lists tables. The latter comprise tuples
of entities that are extracted from columns and labelled by corresponding
entity types and relationships they represent. In order to facilitate research
in complex E-R retrieval, we have created and released as open source the
RELink Framework that includes Apache Lucene indexing and search specifically
tailored to E-R retrieval. RELink includes entity and relationship indexing
based on the ClueWeb-09-B Web collection with FACC1 text span annotations
linked to Wikipedia entities. With ready to use search resources and a
comprehensive test collection, we support community in pursuing E-R research at
scale.
Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks
Massimo Quadrana , Alexandros Karatzoglou , Balázs Hidasi , Paolo Cremonesi Subjects : Learning (cs.LG) ; Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Session-based recommendations are highly relevant in many modern on-line
services (e.g. e-commerce, video streaming) and recommendation settings.
Recurrent Neural Networks have recently been shown to perform very well in
session-based settings. While in many session-based recommendation domains user
identifiers are hard to come by, there are also domains in which user profiles
are readily available. We propose a seamless way to personalize RNN models with
cross-session information transfer and devise a hierarchical RNN model that
relies end evolves latent hidden states of the RNN’s across user sessions.
Results on two industry datasets show large improvements over the session-only
RNN’s.
Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks
Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016
Subjects:
Learning (cs.LG)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)
Recommendation algorithms that incorporate techniques from deep learning are
becoming increasingly popular. Due to the structure of the data coming from
recommendation domains (i.e., one-hot-encoded vectors of item preferences),
these algorithms tend to have large input and output dimensionalities that
dominate their overall size. This makes them difficult to train, due to the
limited memory of graphical processing units, and difficult to deploy on mobile
devices with limited hardware. To address these difficulties, we propose Bloom
embeddings, a compression technique that can be applied to the input and output
of neural network models dealing with sparse high-dimensional binary-coded
instances. Bloom embeddings are computationally efficient, and do not seriously
compromise the accuracy of the model up to 1/5 compression ratios. In some
cases, they even improve over the original accuracy, with relative increases up
to 12%. We evaluate Bloom embeddings on 7 data sets and compare it against 4
alternative methods, obtaining favorable results. We also discuss a number of
further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time
operation, zero or marginal space requirements, training time speedups, or the
fact that they do not require any change to the core model architecture or
training configuration.
A Direction Search and Spectral Clustering Based Approach to Subspace Clustering
Mostafa Rahmani , George Atia Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Information Retrieval (cs.IR); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
This paper presents a new spectral-clustering-based approach to the subspace
clustering problem in which the data lies in the union of an unknown number of
unknown linear subspaces. Underpinning the proposed method is a convex program
for optimal direction search, which for each data point d, finds an optimal
direction in the span of the data that has minimum projection on the other data
points and non-vanishing projection on d. The obtained directions are
subsequently leveraged to identify a neighborhood set for each data point. An
Alternating Direction Method of Multipliers (ADMM) framework is provided to
efficiently solve for the optimal directions. The proposed method is shown to
often outperform the existing subspace clustering methods, particularly for
unwieldy scenarios involving high levels of noise and close subspaces, and
yields the state-of-the-art results for the problem of face clustering using
subspace segmentation.
Dionysius: A Framework for Modeling Hierarchical User Interactions in Recommender Systems
Jian Wang , Krishnaram Kenthapadi , Kaushik Rangadurai , David Hardtke Subjects : Social and Information Networks (cs.SI) ; Information Retrieval (cs.IR)
We address the following problem: How do we incorporate user item interaction
signals as part of the relevance model in a large-scale personalized
recommendation system such that, (1) the ability to interpret the model and
explain recommendations is retained, and (2) the existing infrastructure
designed for the (user profile) content-based model can be leveraged? We
propose Dionysius, a hierarchical graphical model based framework and system
for incorporating user interactions into recommender systems, with minimal
change to the underlying infrastructure. We learn a hidden fields vector for
each user by considering the hierarchy of interaction signals, and replace the
user profile-based vector with this learned vector, thereby not expanding the
feature space at all. Thus, our framework allows the use of existing
recommendation infrastructure that supports content based features. We
implemented and deployed this system as part of the recommendation platform at
LinkedIn for more than one year. We validated the efficacy of our approach
through extensive offline experiments with different model choices, as well as
online A/B testing experiments. Our deployment of this system as part of the
job recommendation engine resulted in significant improvement in the quality of
retrieved results, thereby generating improved user experience and positive
impact for millions of users.
Computation and Language
An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing
Marcin Junczys-Dowmunt , Roman Grundkiewicz Subjects : Computation and Language (cs.CL)
In this work, we explore multiple neural architectures adapted for the task
of automatic post-editing of machine translation output. We focus on neural
end-to-end models that combine both inputs (mt) and (src) in a single neural
architecture, modeling ({mt, src}
ightarrow pe) directly. Apart from that,
we investigate the influence of hard-attention models which seem to be
well-suited for monolingual tasks, as well as combinations of both ideas. We
report results on data sets provided during the WMT 2016 shared task on
automatic post-editing and can demonstrate that double-attention models that
incorporate all available data in the APE scenario in a single model improve on
the best shared task system and on all other published results after the shared
task. Double-attention models that are combined with hard attention remain
competitive despite applying fewer changes to the input.
Zero-Shot Relation Extraction via Reading Comprehension
Comments: CoNLL 2017
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Learning (cs.LG)
We show that relation extraction can be reduced to answering simple reading
comprehension questions, by associating one or more natural-language questions
with each relation slot. This reduction has several advantages: we can (1)
learn relation-extraction models by extending recent neural
reading-comprehension techniques, (2) build very large training sets for those
models by combining relation-specific crowd-sourced questions with distant
supervision, and even (3) do zero-shot learning by extracting new relation
types that are only specified at test-time, for which we have no labeled
training examples. Experiments on a Wikipedia slot-filling task demonstrate
that the approach can generalize to new questions for known relation types with
high accuracy, and that zero-shot generalization to unseen relation types is
possible, at lower accuracy levels, setting the bar for future work on this
task.
Modelling prosodic structure using Artificial Neural Networks
Comments: 4 pages, 3 figures, Experimental linguistics 2017
Subjects:
Computation and Language (cs.CL)
The ability to accurately perceive whether a speaker is asking a question or
is making a statement is crucial for any successful interaction. However,
learning and classifying tonal patterns has been a challenging task for
automatic speech recognition and for models of tonal representation, as tonal
contours are characterized by significant variation. This paper provides a
classification model of Cypriot Greek questions and statements. We evaluate two
state-of-the-art network architectures: a Long Short-Term Memory (LSTM) network
and a convolutional network (ConvNet). The ConvNet outperforms the LSTM in the
classification task and exhibited an excellent performance with 95%
classification accuracy.
A Supervised Approach to Extractive Summarisation of Scientific Papers
Comments: 11 pages, 6 figures
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Applications (stat.AP); Machine Learning (stat.ML)
Automatic summarisation is a popular approach to reduce a document to its
main arguments. Recent research in the area has focused on neural approaches to
summarisation, which can be very data-hungry. However, few large datasets exist
and none for the traditionally popular domain of scientific publications, which
opens up challenging research avenues centered on encoding large, complex
documents. In this paper, we introduce a new dataset for summarisation of
computer science publications by exploiting a large resource of author provided
summaries and show straightforward ways of extending it further. We develop
models on the dataset making use of both neural sentence encoding and
traditionally used summarisation features and show that models which encode
sentences as well as their local and global context perform best, significantly
outperforming well-established baseline methods.
Six Challenges for Neural Machine Translation
Comments: 12 pages; First Workshop on Neural Machine Translation, 2017
Subjects:
Computation and Language (cs.CL)
We explore six challenges for neural machine translation: domain mismatch,
amount of training data, rare words, long sentences, word alignment, and beam
search. We show both deficiencies and improvements over the quality of
phrase-based statistical machine translation.
Attention-based Vocabulary Selection for NMT Decoding
Comments: Submitted to Second Conference on Machine Translation (WMT-17); 7 pages
Subjects:
Computation and Language (cs.CL)
Neural Machine Translation (NMT) models usually use large target vocabulary
sizes to capture most of the words in the target language. The vocabulary size
is a big factor when decoding new sentences as the final softmax layer
normalizes over all possible target words. To address this problem, it is
widely common to restrict the target vocabulary with candidate lists based on
the source sentence. Usually, the candidate lists are a combination of external
word-to-word aligner, phrase table entries or most frequent words. In this
work, we propose a simple and yet novel approach to learn candidate lists
directly from the attention layer during NMT training. The candidate lists are
highly optimized for the current NMT model and do not need any external
computation of the candidate pool. We show significant decoding speedup
compared with using the entire vocabulary, without losing any translation
quality for two language pairs.
Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings
Comments: To appear Interspeech 2017
Subjects:
Computation and Language (cs.CL)
Query-by-example search often uses dynamic time warping (DTW) for comparing
queries and proposed matching segments. Recent work has shown that comparing
speech segments by representing them as fixed-dimensional vectors — acoustic
word embeddings — and measuring their vector distance (e.g., cosine distance)
can discriminate between words more accurately than DTW-based approaches. We
consider an approach to query-by-example search that embeds both the query and
database segments according to a neural model, followed by nearest-neighbor
search to find the matching segments. Earlier work on embedding-based
query-by-example, using template-based acoustic word embeddings, achieved
competitive performance. We find that our embeddings, based on recurrent neural
networks trained to optimize word discrimination, achieve substantial
improvements in performance and run-time efficiency over the previous
approaches.
Encoding of phonology in a recurrent neural model of grounded speech
Comments: Accepted at CoNLL 2017
Subjects:
Computation and Language (cs.CL)
; Learning (cs.LG); Sound (cs.SD)
We study the representation and encoding of phonemes in a recurrent neural
network model of grounded speech. We use a model which processes images and
their spoken descriptions, and projects the visual and auditory representations
into the same semantic space. We perform a number of analyses on how
information about individual phonemes is encoded in the MFCC features extracted
from the speech signal, and the activations of the layers of the model. Via
experiments with phoneme decoding and phoneme discrimination we show that
phoneme representations are most salient in the lower layers of the model,
where low-level signals are processed at a fine-grained level, although a large
amount of phonological information is retain at the top recurrent layer. We
further find out that the attention mechanism following the top recurrent layer
significantly attenuates encoding of phonology and makes the utterance
embeddings much more invariant to synonymy. Moreover, a hierarchical clustering
of phoneme representations learned by the network shows an organizational
structure of phonemes similar to those proposed in linguistics.
Verb Physics: Relative Physical Knowledge of Actions and Objects
Comments: 11 pages, published in Proceedings of ACL 2017
Subjects:
Computation and Language (cs.CL)
Learning commonsense knowledge from natural language text is nontrivial due
to reporting bias: people rarely state the obvious, e.g., “My house is bigger
than me.” However, while rarely stated explicitly, this trivial everyday
knowledge does influence the way people talk about the world, which provides
indirect clues to reason about the world. For example, a statement like, “Tyler
entered his house” implies that his house is bigger than Tyler.
In this paper, we present an approach to infer relative physical knowledge of
actions and objects along five dimensions (e.g., size, weight, and strength)
from unstructured natural language text. We frame knowledge acquisition as
joint inference over two closely related problems: learning (1) relative
physical knowledge of object pairs and (2) physical implications of actions
when applied to those object pairs. Empirical results demonstrate that it is
possible to extract knowledge of actions and objects from language and that
joint inference over different types of knowledge improves performance.
Adversarial Feature Matching for Text Generation
Comments: Accepted by ICML 2017
Subjects:
Machine Learning (stat.ML)
; Computation and Language (cs.CL); Learning (cs.LG)
The Generative Adversarial Network (GAN) has achieved great success in
generating realistic (real-valued) synthetic data. However, convergence issues
and difficulties dealing with discrete data hinder the applicability of GAN to
text. We propose a framework for generating realistic text via adversarial
training. We employ a long short-term memory network as generator, and a
convolutional network as discriminator. Instead of using the standard objective
of GAN, we propose matching the high-dimensional latent feature distributions
of real and synthetic sentences, via a kernelized discrepancy metric. This
eases adversarial training by alleviating the mode-collapsing problem. Our
experiments show superior performance in quantitative evaluation, and
demonstrate that our model can generate realistic-looking sentences.
Distributed, Parallel, and Cluster Computing
Live Service Migration in Mobile Edge Clouds
Comments: This is the author’s version of the paper accepted for publication in IEEE Wireless Communications
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Networking and Internet Architecture (cs.NI)
Mobile edge clouds (MECs) bring the benefits of the cloud closer to the user,
by installing small cloud infrastructures at the network edge. This enables a
new breed of real-time applications, such as instantaneous object recognition
and safety assistance in intelligent transportation systems, that require very
low latency. One key issue that comes with proximity is how to ensure that
users always receive good performance as they move across different locations.
Migrating services between MECs is seen as the means to achieve this. This
article presents a layered framework for migrating active service applications
that are encapsulated either in virtual machines (VMs) or containers. This
layering approach allows a substantial reduction in service downtime. The
framework is easy to implement using readily available technologies, and one of
its key advantages is that it supports containers, which is a promising
emerging technology that offers tangible benefits over VMs. The migration
performance of various real applications is evaluated by experiments under the
presented framework. Insights drawn from the experimentation results are
discussed.
Distributed Subgraph Detection
Pierre Fraigniaud , Pedro Montealegre , Dennis Olivetti , Ivan Rapaport , Ioan Todinca Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
In the standard CONGEST model for distributed network computing, it is known
that “global” tasks such as minimum spanning tree, diameter, and all-pairs
shortest paths, consume large bandwidth, for their running-time is
(Omega(mbox{poly}(n))) rounds in (n)-node networks with constant diameter.
Surprisingly, “local” tasks such as detecting the presence of a 4-cycle as a
subgraph also requires (widetilde{Omega}(sqrt{n})) rounds, even using
randomized algorithms, and the best known upper bound for detecting the
presence of a 3-cycle is (widetilde{O}(n^{frac{2}{3}})) rounds. The objective
of this paper is to better understand the landscape of such subgraph detection
tasks. We show that, in contrast to emph{cycles}, which are hard to detect in
the CONGEST model, there exists a deterministic algorithm for detecting the
presence of a subgraph isomorphic to (T) running in a emph{constant} number of
rounds, for every tree (T). Our algorithm provides a distributed implementation
of a combinatorial technique due to ErdH{o}s et al. for sparsening the set of
partial solutions kept by the nodes at each round. Our result has important
consequences to emph{distributed property-testing}, i.e., to randomized
algorithms whose aim is to distinguish between graphs satisfying a property,
and graphs far from satisfying that property. In particular, we get that, for
every graph pattern (H) composed of an edge and a tree connected in an
arbitrary manner, there exists a distributed testing algorithm for
(H)-freeness, performing in a constant number of rounds. Although the class of
graph patterns (H) formed by a tree and an edge connected arbitrarily may look
artificial, all previous results of the literature concerning testing
(H)-freeness for classical patterns such as cycles and cliques can be viewed as
direct consequences of our result, while our algorithm enables testing more
complex patterns.
Distributed Detection of Cycles
Pierre Fraigniaud , Dennis Olivetti Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
Distributed property testing in networks has been introduced by Brakerski and
Patt-Shamir (2011), with the objective of detecting the presence of large dense
sub-networks in a distributed manner. Recently, Censor-Hillel et al. (2016)
have shown how to detect 3-cycles in a constant number of rounds by a
distributed algorithm. In a follow up work, Fraigniaud et al. (2016) have shown
how to detect 4-cycles in a constant number of rounds as well. However, the
techniques in these latter works were shown not to generalize to larger cycles
(C_k) with (kgeq 5). In this paper, we completely settle the problem of cycle
detection, by establishing the following result. For every (kgeq 3), there
exists a distributed property testing algorithm for (C_k)-freeness, performing
in a constant number of rounds. All these results hold in the classical CONGEST
model for distributed network computing. Our algorithm is 1-sided error. Its
round-complexity is (O(1/epsilon)) where (epsilonin(0,1)) is the property
testing parameter measuring the gap between legal and illegal instances.
The Power of Choice in Priority Scheduling
Dan Alistarh , Justin Kopinsky , Jerry Li , Giorgi Nadiradze Subjects : Data Structures and Algorithms (cs.DS) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Consider the following random process: we are given (n) queues, into which
elements of increasing labels are inserted uniformly at random. To remove an
element, we pick two queues at random, and remove the element of lower label
(higher priority) among the two. The cost of a removal is the rank of the label
removed, among labels still present in any of the queues, that is, the distance
from the optimal choice at each step. Variants of this strategy are prevalent
in state-of-the-art concurrent priority queue implementations. Nonetheless, it
is not known whether such implementations provide any rank guarantees, even in
a sequential model.
We answer this question, showing that this strategy provides surprisingly
strong guarantees: Although the single-choice process, where we always insert
and remove from a single randomly chosen queue, has degrading cost, going to
infinity as we increase the number of steps, in the two choice process, the
expected rank of a removed element is (O( n )) while the expected worst-case
cost is (O( n log n )). These bounds are tight, and hold irrespective of the
number of steps for which we run the process.
The argument is based on a new technical connection between “heavily loaded”
balls-into-bins processes and priority scheduling.
Our analytic results inspire a new concurrent priority queue implementation,
which improves upon the state of the art in terms of practical performance.
Serialisable Multi-Level Transaction Control: A Specification and Verification
Comments: 25 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1706.01762
Journal-ref: Sci. Comput. Program. 131: 42-58 (2016)
Subjects:
Databases (cs.DB)
; Distributed, Parallel, and Cluster Computing (cs.DC)
We define a programming language independent controller TaCtl for multi-level
transactions and an operator (TA), which when applied to concurrent programs
with multi-level shared locations containing hierarchically structured complex
values, turns their behavior with respect to some abstract termination
criterion into a transactional behavior. We prove the correctness property that
concurrent runs under the transaction controller are serialisable, assuming an
Inverse Operation Postulate to guarantee recoverability. For its applicability
to a wide range of programs we specify the transaction controller TaCtl and the
operator (TA) in terms of Abstract State Machines (ASMs). This allows us to
model concurrent updates at different levels of nested locations in a precise
yet simple manner, namely in terms of partial ASM updates. It also provides the
possibility to use the controller TaCtl and the operator (TA) as a plug-in when
specifying concurrent system components in terms of sequential ASMs.
Asynchronous Graph Pattern Matching on Multiprocessor Systems
Comments: 14 Pages, Extended version for ADBIS 2017
Subjects:
Databases (cs.DB)
; Distributed, Parallel, and Cluster Computing (cs.DC)
Pattern matching on large graphs is the foundation for a variety of
application domains. Strict latency requirements and continuously increasing
graph sizes demand the usage of highly parallel in-memory graph processing
engines that need to consider non-uniform memory access (NUMA) and concurrency
issues to scale up on modern multiprocessor systems. To tackle these aspects,
graph partitioning becomes increasingly important. Hence, we present a
technique to process graph pattern matching on NUMA systems in this paper. As a
scalable pattern matching processing infrastructure, we leverage a
data-oriented architecture that preserves data locality and minimizes
concurrency-related bottlenecks on NUMA systems. We show in detail, how graph
pattern matching can be asynchronously processed on a multiprocessor system.
Learning
Gradient descent GAN optimization is locally stable
Vaishnavh Nagarajan , J. Zico Kolter Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Despite their growing prominence, optimization in generative adversarial
networks (GANs) is still a poorly-understood topic. In this paper, we analyze
the “gradient descent” form of GAN optimization (i.e., the natural setting
where we simultaneously take small gradient steps in both generator and
discriminator parameters). We show that even though GAN optimization does not
correspond to a convex-concave game, even for simple parameterizations, under
proper conditions, equilibrium points of this optimization procedure are still
locally asymptotically stable for the traditional GAN formulation. On the other
hand, we show that the recently-proposed Wasserstein GAN can have
non-convergent limit cycles near equilibrium. Motivated by this stability
analysis, we propose an additional regularization term for gradient descent GAN
updates, which is able to guarantee local stability for both the WGAN and for
the traditional GAN, and also shows practical promise in speeding up
convergence and addressing mode collapse.
Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks
Massimo Quadrana , Alexandros Karatzoglou , Balázs Hidasi , Paolo Cremonesi Subjects : Learning (cs.LG) ; Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Session-based recommendations are highly relevant in many modern on-line
services (e.g. e-commerce, video streaming) and recommendation settings.
Recurrent Neural Networks have recently been shown to perform very well in
session-based settings. While in many session-based recommendation domains user
identifiers are hard to come by, there are also domains in which user profiles
are readily available. We propose a seamless way to personalize RNN models with
cross-session information transfer and devise a hierarchical RNN model that
relies end evolves latent hidden states of the RNN’s across user sessions.
Results on two industry datasets show large improvements over the session-only
RNN’s.
Online Learning for Structured Loss Spaces
We consider prediction with expert advice when the loss vectors are assumed
to lie in a set described by the sum of atomic norm balls. We derive a regret
bound for a general version of the online mirror descent (OMD) algorithm that
uses a combination of regularizers, each adapted to the constituent atomic
norms. The general result recovers standard OMD regret bounds, and yields
regret bounds for new structured settings where the loss vectors are (i) noisy
versions of points from a low-rank subspace, (ii) sparse vectors corrupted with
noise, and (iii) sparse perturbations of low-rank vectors. For the problem of
online learning with structured losses, we also show lower bounds on regret in
terms of rank and sparsity of the source set of the loss vectors, which implies
lower bounds for the above additive loss settings as well.
Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations
Comments: Accepted to the International Conference on Machine Learning (ICML), 2017
Subjects:
Learning (cs.LG)
; Data Structures and Algorithms (cs.DS); Numerical Analysis (cs.NA); Machine Learning (stat.ML)
Non-negative matrix factorization is a basic tool for decomposing data into
the feature and weight matrices under non-negativity constraints, and in
practice is often solved in the alternating minimization framework. However, it
is unclear whether such algorithms can recover the ground-truth feature matrix
when the weights for different features are highly correlated, which is common
in applications. This paper proposes a simple and natural alternating gradient
descent based algorithm, and shows that with a mild initialization it provably
recovers the ground-truth in the presence of strong correlations. In most
interesting cases, the correlation can be in the same order as the highest
possible. Our analysis also reveals its several favorable features including
robustness to noise. We complement our theoretical results with empirical
studies on semi-synthetic datasets, demonstrating its advantage over several
popular methods in recovering the ground-truth.
Convergence Analysis of Belief Propagation for Pairwise Linear Gaussian Models
Comments: arXiv admin note: text overlap with arXiv:1704.03969
Subjects:
Learning (cs.LG)
; Machine Learning (stat.ML)
Gaussian belief propagation (BP) has been widely used for distributed
inference in large-scale networks such as the smart grid, sensor networks, and
social networks, where local measurements/observations are scattered over a
wide geographical area. One particular case is when two neighboring agents
share a common observation. For example, to estimate voltage in the direct
current (DC) power flow model, the current measurement over a power line is
proportional to the voltage difference between two neighboring buses. When
applying the Gaussian BP algorithm to this type of problem, the convergence
condition remains an open issue. In this paper, we analyze the convergence
properties of Gaussian BP for this pairwise linear Gaussian model. We show
analytically that the updating information matrix converges at a geometric rate
to a unique positive definite matrix with arbitrary positive semidefinite
initial value and further provide the necessary and sufficient convergence
condition for the belief mean vector to the optimal estimate.
Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks
Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016
Subjects:
Learning (cs.LG)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)
Recommendation algorithms that incorporate techniques from deep learning are
becoming increasingly popular. Due to the structure of the data coming from
recommendation domains (i.e., one-hot-encoded vectors of item preferences),
these algorithms tend to have large input and output dimensionalities that
dominate their overall size. This makes them difficult to train, due to the
limited memory of graphical processing units, and difficult to deploy on mobile
devices with limited hardware. To address these difficulties, we propose Bloom
embeddings, a compression technique that can be applied to the input and output
of neural network models dealing with sparse high-dimensional binary-coded
instances. Bloom embeddings are computationally efficient, and do not seriously
compromise the accuracy of the model up to 1/5 compression ratios. In some
cases, they even improve over the original accuracy, with relative increases up
to 12%. We evaluate Bloom embeddings on 7 data sets and compare it against 4
alternative methods, obtaining favorable results. We also discuss a number of
further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time
operation, zero or marginal space requirements, training time speedups, or the
fact that they do not require any change to the core model architecture or
training configuration.
Accelerated Dual Learning by Homotopic Initialization
Hadi Daneshmand , Hamed Hassani , Thomas Hofmann Subjects : Learning (cs.LG)
Gradient descent and coordinate descent are well understood in terms of their
asymptotic behavior, but less so in a transient regime often used for
approximations in machine learning. We investigate how proper initialization
can have a profound effect on finding near-optimal solutions quickly. We show
that a certain property of a data set, namely the boundedness of the
correlations between eigenfeatures and the response variable, can lead to
faster initial progress than expected by commonplace analysis. Convex
optimization problems can tacitly benefit from that, but this automatism does
not apply to their dual formulation. We analyze this phenomenon and devise
provably good initialization strategies for dual optimization as well as
heuristics for the non-convex case, relevant for deep learning. We find our
predictions and methods to be experimentally well-supported.
Exact Learning from an Honest Teacher That Answers Membership Queries
Nader H. Bshouty Subjects : Learning (cs.LG)
Given a teacher that holds a function (f:X o R) from some class of functions
(C). The teacher can receive from the learner an element~(d) in the domain (X)
(a query) and returns the value of the function in (d), (f(d)in R). The
learner goal is to find (f) with a minimum number of queries, optimal time
complexity, and optimal resources.
In this survey, we present some of the results known from the literature,
different techniques used, some new problems, and open problems.
A Well-Tempered Landscape for Non-convex Robust Subspace Recovery
Tyler Maunu , Teng Zhang , Gilad Lerman Subjects : Learning (cs.LG) ; Optimization and Control (math.OC); Machine Learning (stat.ML)
We present a mathematical analysis of a non-convex energy landscape for
Robust Subspace Recovery. We prove that an underlying subspace is the only
stationary point and local minimizer in a large neighborhood if a generic
condition holds for a dataset. We further show that if the generic condition is
satisfied, a geodesic gradient descent method over the Grassmannian manifold
can exactly recover the underlying subspace with proper initialization. The
condition is shown to hold with high probability for a certain model of data.
MNL-Bandit: A Dynamic Learning Approach to Assortment Selection
Shipra Agrawal , Vashist Avadhanula , Vineet Goyal , Assaf Zeevi Subjects : Learning (cs.LG)
We consider a dynamic assortment selection problem, where in every round the
retailer offers a subset (assortment) of (N) substitutable products to a
consumer, who selects one of these products according to a multinomial logit
(MNL) choice model. The retailer observes this choice and the objective is to
dynamically learn the model parameters, while optimizing cumulative revenues
over a selling horizon of length (T). We refer to this exploration-exploitation
formulation as the MNL-Bandit problem. Existing methods for this problem follow
an “explore-then-exploit” approach, which estimate parameters to a desired
accuracy and then, treating these estimates as if they are the correct
parameter values, offers the optimal assortment based on these estimates. These
approaches require certain a priori knowledge of “separability”, determined by
the true parameters of the underlying MNL model, and this in turn is critical
in determining the length of the exploration period. (Separability refers to
the distinguishability of the true optimal assortment from the other
sub-optimal alternatives.) In this paper, we give an efficient algorithm that
simultaneously explores and exploits, achieving performance independent of the
underlying parameters. The algorithm can be implemented in a fully online
manner, without knowledge of the horizon length (T). Furthermore, the algorithm
is adaptive in the sense that its performance is near-optimal in both the “well
separated” case, as well as the general parameter setting where this separation
need not hold.
Recurrent Neural Networks with Top-k Gains for Session-based Recommendations
Balázs Hidasi , Alexandros Karatzoglou Subjects : Learning (cs.LG)
RNNs have been shown to be excellent models for sequential data and in
particular for session-based user behavior. The use of RNNs provides impressive
performance benefits over classical methods in session-based recommendations.
In this work we introduce a novel ranking loss function tailored for RNNs in
recommendation settings. The better performance of such loss over alternatives,
along with further tricks and improvements described in this work, allow to
achieve an overall improvement of up to 35% in terms of MRR and Recall@20 over
previous session-based RNN solutions and up to 51% over classical collaborative
filtering approaches. Unlike data augmentation-based improvements, our method
does not increase training times significantly.
SmoothGrad: removing noise by adding noise
Comments: 10 pages
Subjects:
Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Explaining the output of a deep network remains a challenge. In the case of
an image classifier, one type of explanation is to identify pixels that
strongly influence the final decision. A starting point for this strategy is
the gradient of the class score function with respect to the input image. This
gradient can be interpreted as a sensitivity map, and there are several
techniques that elaborate on this basic idea. This paper makes two
contributions: it introduces SmoothGrad, a simple method that can help visually
sharpen gradient-based sensitivity maps, and it discusses lessons in the
visualization of these maps. We publish the code for our experiments and a
website with our results.
Lost Relatives of the Gumbel Trick
Comments: 34th International Conference on Machine Learning (ICML 2017)
Subjects:
Machine Learning (stat.ML)
; Learning (cs.LG)
The Gumbel trick is a method to sample from a discrete probability
distribution, or to estimate its normalizing partition function. The method
relies on repeatedly applying a random perturbation to the distribution in a
particular way, each time solving for the most likely configuration. We derive
an entire family of related methods, of which the Gumbel trick is one member,
and show that the new methods have superior properties in several settings with
minimal additional computational cost. In particular, for the Gumbel trick to
yield computational benefits for discrete graphical models, Gumbel
perturbations on all configurations are typically replaced with so-called
low-rank perturbations. We show how a subfamily of our new methods adapts to
this setting, proving new upper and lower bounds on the log partition function
and deriving a family of sequential samplers for the Gibbs distribution.
Finally, we balance the discussion by showing how the simpler analytical form
of the Gumbel trick enables additional theoretical results.
Zero-Shot Relation Extraction via Reading Comprehension
Comments: CoNLL 2017
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Learning (cs.LG)
We show that relation extraction can be reduced to answering simple reading
comprehension questions, by associating one or more natural-language questions
with each relation slot. This reduction has several advantages: we can (1)
learn relation-extraction models by extending recent neural
reading-comprehension techniques, (2) build very large training sets for those
models by combining relation-specific crowd-sourced questions with distant
supervision, and even (3) do zero-shot learning by extracting new relation
types that are only specified at test-time, for which we have no labeled
training examples. Experiments on a Wikipedia slot-filling task demonstrate
that the approach can generalize to new questions for known relation types with
high accuracy, and that zero-shot generalization to unseen relation types is
possible, at lower accuracy levels, setting the bar for future work on this
task.
Interaction-Based Distributed Learning in Cyber-Physical and Social Networks
Francesco Sasso , Angelo Coluccia , Giuseppe Notarstefano Subjects : Optimization and Control (math.OC) ; Learning (cs.LG); Statistics Theory (math.ST)
In this paper we consider a network scenario in which agents can evaluate
each other according to a score graph that models some physical or social
interaction. The goal is to design a distributed protocol, run by the agents,
allowing them to learn their unknown state among a finite set of possible
values. We propose a Bayesian framework in which scores and states are
associated to probabilistic events with unknown parameters and hyperparameters
respectively. We prove that each agent can learn its state by means of a local
Bayesian classifier and a (centralized) Maximum-Likelihood (ML) estimator of
the parameter-hyperparameter that combines plain ML and Empirical Bayes
approaches. By using tools from graphical models, which allow us to gain
insight on conditional dependences of scores and states, we provide two relaxed
probabilistic models that ultimately lead to ML parameter-hyperparameter
estimators amenable to distributed computation. In order to highlight the
appropriateness of the proposed relaxations, we demonstrate the distributed
estimators on a machine-to-machine testing set-up for anomaly detection and on
a social interaction set-up for user profiling.
Comments: AAAI 2017
Subjects:
Artificial Intelligence (cs.AI)
; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Monte Carlo tree search (MCTS) is extremely popular in computer Go which
determines each action by enormous simulations in a broad and deep search tree.
However, human experts select most actions by pattern analysis and careful
evaluation rather than brute search of millions of future nteractions. In this
paper, we propose a computer Go system that follows experts way of thinking and
playing. Our system consists of two parts. The first part is a novel deep
alternative neural network (DANN) used to generate candidates of next move.
Compared with existing deep convolutional neural network (DCNN), DANN inserts
recurrent layer after each convolutional layer and stacks them in an
alternative manner. We show such setting can preserve more contexts of local
features and its evolutions which are beneficial for move prediction. The
second part is a long-term evaluation (LTE) module used to provide a reliable
evaluation of candidates rather than a single probability from move predictor.
This is consistent with human experts nature of playing since they can foresee
tens of steps to give an accurate estimation of candidates. In our system, for
each candidate, LTE calculates a cumulative reward after several future
interactions when local variations are settled. Combining criteria from the two
parts, our system determines the optimal choice of next move. For more
comprehensive experiments, we introduce a new professional Go dataset (PGD),
consisting of 253233 professional records. Experiments on GoGoD and PGD
datasets show the DANN can substantially improve performance of move prediction
over pure DCNN. When combining LTE, our system outperforms most relevant
approaches and open engines based on MCTS.
Recurrent Latent Variable Networks for Session-Based Recommendation
Sotirios Chatzis , Panayiotis Christodoulou , Andreas S. Andreou Subjects : Information Retrieval (cs.IR) ; Learning (cs.LG); Machine Learning (stat.ML)
In this work, we attempt to ameliorate the impact of data sparsity in the
context of session-based recommendation. Specifically, we seek to devise a
machine learning mechanism capable of extracting subtle and complex underlying
temporal dynamics in the observed session data, so as to inform the
recommendation algorithm. To this end, we improve upon systems that utilize
deep learning techniques with recurrently connected units; we do so by adopting
concepts from the field of Bayesian statistics, namely variational inference.
Our proposed approach consists in treating the network recurrent units as
stochastic latent variables with a prior distribution imposed over them. On
this basis, we proceed to infer corresponding posteriors; these can be used for
prediction and recommendation generation, in a way that accounts for the
uncertainty in the available sparse training data. To allow for our approach to
easily scale to large real-world datasets, we perform inference under an
approximate amortized variational inference (AVI) setup, whereby the learned
posteriors are parameterized via (conventional) neural networks. We perform an
extensive experimental evaluation of our approach using challenging benchmark
datasets, and illustrate its superiority over existing state-of-the-art
techniques.
Item Difficulty-Based Label Aggregation Models for Crowdsourcing
Chi Hong Subjects : Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Learning (cs.LG)
A large amount of labeled data is required for supervised learning. However,
labeling by domain experts is expensive and time-consuming. A low cost and high
efficiency way to obtain large training datasets is to aggregate noisy labels
collected from non-professional crowds. Prior works have proposed confusion
matrices to evaluate the reliability of workers. In this paper, we redefine the
structure of the confusion matrices and propose two Bayesian Network based
methods which utilize item difficulty in label aggregation. We assume that
labels are generated by a probability distribution over confusion matrices,
item difficulties, labels and true labels. We use Markov chain Monte Carlo
method to generate samples from the posterior distribution of model parameters
and then infer the results. To avoid bad local optima, we design a method to
preliminarily predict the difficulty of each item and initialize the model
parameters. We also introduce how to improve the scalability of our model.
Empirical results show that our methods consistently outperform
state-of-the-art methods.
Analyzing the Robustness of Nearest Neighbors to Adversarial Examples
Yizhen Wang , Somesh Jha , Kamalika Chaudhuri Subjects : Machine Learning (stat.ML) ; Cryptography and Security (cs.CR); Learning (cs.LG)
Motivated by applications such as autonomous vehicles, test-time attacks via
adversarial examples have received a great deal of recent attention. In this
setting, an adversary is capable of making queries to a classifier, and
perturbs a test example by a small amount in order to force the classifier to
report an incorrect label. While a long line of work has explored a number of
attacks, not many reliable defenses are known, and there is an overall lack of
general understanding about the foundations of designing machine learning
algorithms robust to adversarial examples.
In this paper, we take a step towards addressing this challenging question by
introducing a new theoretical framework, analogous to bias-variance theory,
which we can use to tease out the causes of vulnerability. We apply our
framework to a simple classification algorithm: nearest neighbors, and analyze
its robustness to adversarial examples. Motivated by our analysis, we propose a
modified version of the nearest neighbor algorithm, and demonstrate both
theoretically and empirically that it has superior robustness to standard
nearest neighbors.
SEP-Nets: Small and Effective Pattern Networks
Zhe Li , Xiaoyu Wang , Xutao Lv , Tianbao Yang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG)
While going deeper has been witnessed to improve the performance of
convolutional neural networks (CNN), going smaller for CNN has received
increasing attention recently due to its attractiveness for mobile/embedded
applications. It remains an active and important topic how to design a small
network while retaining the performance of large and deep CNNs (e.g., Inception
Nets, ResNets). Albeit there are already intensive studies on compressing the
size of CNNs, the considerable drop of performance is still a key concern in
many designs. This paper addresses this concern with several new contributions.
First, we propose a simple yet powerful method for compressing the size of deep
CNNs based on parameter binarization. The striking difference from most
previous work on parameter binarization/quantization lies at different
treatments of (1 imes 1) convolutions and (k imes k) convolutions ((k>1)),
where we only binarize (k imes k) convolutions into binary patterns. The
resulting networks are referred to as pattern networks. By doing this, we show
that previous deep CNNs such as GoogLeNet and Inception-type Nets can be
compressed dramatically with marginal drop in performance. Second, in light of
the different functionalities of (1 imes 1) (data projection/transformation)
and (k imes k) convolutions (pattern extraction), we propose a new block
structure codenamed the pattern residual block that adds transformed feature
maps generated by (1 imes 1) convolutions to the pattern feature maps
generated by (k imes k) convolutions, based on which we design a small network
with (sim 1) million parameters. Combining with our parameter binarization, we
achieve better performance on ImageNet than using similar sized networks
including recently released Google MobileNets.
A Direction Search and Spectral Clustering Based Approach to Subspace Clustering
Mostafa Rahmani , George Atia Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Information Retrieval (cs.IR); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
This paper presents a new spectral-clustering-based approach to the subspace
clustering problem in which the data lies in the union of an unknown number of
unknown linear subspaces. Underpinning the proposed method is a convex program
for optimal direction search, which for each data point d, finds an optimal
direction in the span of the data that has minimum projection on the other data
points and non-vanishing projection on d. The obtained directions are
subsequently leveraged to identify a neighborhood set for each data point. An
Alternating Direction Method of Multipliers (ADMM) framework is provided to
efficiently solve for the optimal directions. The proposed method is shown to
often outperform the existing subspace clustering methods, particularly for
unwieldy scenarios involving high levels of noise and close subspaces, and
yields the state-of-the-art results for the problem of face clustering using
subspace segmentation.
Adversarial Feature Matching for Text Generation
Comments: Accepted by ICML 2017
Subjects:
Machine Learning (stat.ML)
; Computation and Language (cs.CL); Learning (cs.LG)
The Generative Adversarial Network (GAN) has achieved great success in
generating realistic (real-valued) synthetic data. However, convergence issues
and difficulties dealing with discrete data hinder the applicability of GAN to
text. We propose a framework for generating realistic text via adversarial
training. We employ a long short-term memory network as generator, and a
convolutional network as discriminator. Instead of using the standard objective
of GAN, we propose matching the high-dimensional latent feature distributions
of real and synthetic sentences, via a kernelized discrepancy metric. This
eases adversarial training by alleviating the mode-collapsing problem. Our
experiments show superior performance in quantitative evaluation, and
demonstrate that our model can generate realistic-looking sentences.
Encoding of phonology in a recurrent neural model of grounded speech
Comments: Accepted at CoNLL 2017
Subjects:
Computation and Language (cs.CL)
; Learning (cs.LG); Sound (cs.SD)
We study the representation and encoding of phonemes in a recurrent neural
network model of grounded speech. We use a model which processes images and
their spoken descriptions, and projects the visual and auditory representations
into the same semantic space. We perform a number of analyses on how
information about individual phonemes is encoded in the MFCC features extracted
from the speech signal, and the activations of the layers of the model. Via
experiments with phoneme decoding and phoneme discrimination we show that
phoneme representations are most salient in the lower layers of the model,
where low-level signals are processed at a fine-grained level, although a large
amount of phonological information is retain at the top recurrent layer. We
further find out that the attention mechanism following the top recurrent layer
significantly attenuates encoding of phonology and makes the utterance
embeddings much more invariant to synonymy. Moreover, a hierarchical clustering
of phoneme representations learned by the network shows an organizational
structure of phonemes similar to those proposed in linguistics.
Optimal Auctions through Deep Learning
Paul Dütting , Zhe Feng , Harikrishna Narasimhan , David C. Parkes Subjects : Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Learning (cs.LG)
Designing an auction that maximizes expected revenue is an intricate task.
Indeed, as of today–despite major efforts and impressive progress over the
past few years–only the single-item case is fully understood. In this work, we
initiate the exploration of the use of tools from deep learning on this topic.
The design objective is revenue optimal, dominant-strategy incentive compatible
auctions. We show that multi-layer neural networks can learn almost-optimal
auctions for settings for which there are analytical solutions, such as
Myerson’s auction for a single item, Manelli and Vincent’s mechanism for a
single bidder with additive preferences over two items, or Yao’s auction for
two additive bidders with binary support distributions and multiple items, even
if no prior knowledge about the form of optimal auctions is encoded in the
network and the only feedback during training is revenue and regret. We further
show how characterization results, even rather implicit ones such as Rochet’s
characterization through induced utilities and their gradients, can be
leveraged to obtain more precise fits to the optimal design. We conclude by
demonstrating the potential of deep learning for deriving optimal auctions with
high revenue for poorly understood problems.
Information Theory
Wiretap Channels: Nonasymptotic Fundamental Limits
Comments: 53 pages, 3 figures
Subjects:
Information Theory (cs.IT)
This paper investigates the maximal secret communication rate over a wiretap
channel subject to reliability and secrecy constraints at a given blocklength.
New achievability and converse bounds are derived, which are uniformly tighter
than existing bounds, and lead to the tightest bounds on the second-order
coding rate for discrete memoryless and Gaussian wiretap channels. The exact
second-order coding rate is established for semi-deterministic wiretap
channels, which characterizes the optimal tradeoff between reliability and
secrecy in the finite-blocklength regime. Underlying our achievability bounds
are two new privacy amplification results, which not only refine the existing
results, but also achieve stronger notions of secrecy.
Fast Maximum-Likelihood Decoder for 4*4 Quasi-Orthogonal Space-Time Block Code
Adel Ahmadi , Siamak Talebi Subjects : Information Theory (cs.IT)
This letter introduces two fast maximum-likelihood (ML) detection methods for
4*4 quasi-orthogonal space-time block code (QOSTBC). The first algorithm with a
relatively simple design exploits structure of quadrature amplitude modulation
(QAM) constellations to achieve its goal and the second algorithm, though
somewhat more complex, can be applied to any arbitrary constellation. Both
decoders utilize a novel decomposition technique for ML metric which divides
the metric into independent positive parts and a positive interference part.
Search spaces of symbols are substantially reduced by employing the independent
parts and statistics of noise. Finally, the members of search spaces are
successively evaluated until the metric is minimized. Simulation results
confirm that the proposed decoder is superior to some of the most recently
published methods in terms of complexity level. More specifically, the results
verified that application of the new algorithm with 1024-QAM would require
reduced computational complexity compared to state-of-the-art solution with
16-QAM.
Comments: Accepted by IPDPS’17, 11 pages, 10 figures, double column
Subjects:
Information Theory (cs.IT)
Today’s HPC applications are producing extremely large amounts of data, such
that data storage and analysis are becoming more challenging for scientific
research. In this work, we design a new error-controlled lossy compression
algorithm for large-scale scientific data. Our key contribution is
significantly improving the prediction hitting rate (or prediction accuracy)
for each data point based on its nearby data values along multiple dimensions.
We derive a series of multilayer prediction formulas and their unified formula
in the context of data compression. One serious challenge is that the data
prediction has to be performed based on the preceding decompressed values
during the compression in order to guarantee the error bounds, which may
degrade the prediction accuracy in turn. We explore the best layer for the
prediction by considering the impact of compression errors on the prediction
accuracy. Moreover, we propose an adaptive error-controlled quantization
encoder, which can further improve the prediction hitting rate considerably.
The data size can be reduced significantly after performing the variable-length
encoding because of the uneven distribution produced by our quantization
encoder. We evaluate the new compressor on production scientific data sets and
compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP,
SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class,
especially with regard to compression factors (or bit-rates) and compression
errors (including RMSE, NRMSE, and PSNR). Our solution is better than the
second-best solution by more than a 2x increase in the compression factor and
3.8x reduction in the normalized root mean squared error on average, with
reasonable error bounds and user-desired bit-rates.
Approximate Optimal Designs for Multivariate Polynomial Regression
Comments: 24 Pages, 5 Figures. arXiv admin note: substantial text overlap with arXiv:1703.01777
Subjects:
Statistics Theory (math.ST)
; Information Theory (cs.IT); Numerical Analysis (math.NA); Computation (stat.CO); Methodology (stat.ME)
We introduce a new approach aiming at computing approximate optimal designs
for multivariate polynomial regressions on compact (semi-algebraic) design
spaces. We use the moment-sum-of-squares hierarchy of semidefinite programming
problems to solve numerically and approximately the optimal design problem. The
geometry of the design is recovered via semidefinite programming duality
theory. This article shows that the hierarchy converges to the approximate
optimal design as the order of the hierarchy increases. Furthermore, we provide
a dual certificate ensuring finite convergence of the hierarchy and showing
that the approximate optimal design can be computed exactly in polynomial time
thanks to our method. Finite convergence of the hierarchy can be standardly
certified numerically as well. As a byproduct, we revisit the equivalence
theorem of the experimental design theory: it is linked to the Christoffel
polynomial and it characterizes finite convergence of the moment-sum-of-square
hierarchies.
欢迎加入我爱机器学习QQ11群:191401275
微信扫一扫,关注我爱机器学习公众号
微博:我爱机器学习
以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
The Golden Ticket
Lance Fortnow / Princeton University Press / 2013-3-31 / USD 26.95
The P-NP problem is the most important open problem in computer science, if not all of mathematics. The Golden Ticket provides a nontechnical introduction to P-NP, its rich history, and its algorithmi......一起来看看 《The Golden Ticket》 这本书的介绍吧!