arXiv Paper Daily: Wed, 14 Jun 2017

栏目: 数据库 · 发布时间: 7年前

内容简介:arXiv Paper Daily: Wed, 14 Jun 2017

Neural and Evolutionary Computing

Temporally Efficient Deep Learning with Spikes

Peter O'Connor , Efstratios Gavves , Max Welling

Comments: 8 pages + references and appendix

Subjects

:

Neural and Evolutionary Computing (cs.NE)

The vast majority of natural sensory data is temporally redundant. Video

frames or audio samples which are sampled at nearby points in time tend to have

similar values. Typically, deep learning algorithms take no advantage of this

redundancy to reduce computation. This can be an obscene waste of energy. We

present a variant on backpropagation for neural networks in which computation

scales with the rate of change of the data – not the rate at which we process

the data. We do this by having neurons communicate a combination of their

state, and their temporal change in state. Intriguingly, this simple

communication rule give rise to units that resemble biologically-inspired leaky

integrate-and-fire neurons, and to a weight-update rule that is equivalent to a

form of Spike-Timing Dependent Plasticity (STDP), a synaptic learning rule

observed in the brain. We demonstrate that on MNIST and a temporal variant of

MNIST, our algorithm performs about as well as a Multilayer Perceptron trained

with backpropagation, despite only communicating discrete values between

layers.

Prediction of Muscle Activations for Reaching Movements using Deep Neural Networks

Najeeb Khan , Ian Stavness

Comments: To be presented at the Annual meeting of American Society of Biomechanics 2017

Subjects

:

Neural and Evolutionary Computing (cs.NE)

The motor control problem involves determining the time-varying muscle

activation trajectories required to accomplish a given movement. Muscle

redundancy makes motor control a challenging task: there are many possible

activation trajectories that accomplish the same movement. Despite this

redundancy, most movements are accomplished in highly stereotypical ways. For

example, point-to-point reaching movements are almost universally performed

with very similar smooth trajectories. Optimization methods are commonly used

to predict muscle forces for measured movements. However, these approaches

require computationally expensive simulations and are sensitive to the chosen

optimality criteria and regularization. In this work, we investigate deep

autoencoders for the prediction of muscle activation trajectories for

point-to-point reaching movements. We evaluate our DNN predictions with

simulated reaches and two methods to generate the muscle activations: inverse

dynamics (ID) and optimal control (OC) criteria. We also investigate optimal

network parameters and training criteria to improve the accuracy of the

predictions.

From MEGATON to RASCAL: Surfing the Parameter Space of Evolutionary Algorithms

Moshe Sipper , Weixuan Fu , Karuna Ahuja , Jason H. Moore Subjects : Neural and Evolutionary Computing (cs.NE)

The practice of evolutionary algorithms involves a mundane yet inescapable

phase, namely, finding parameters that work well. How big should the population

be? How many generations should the algorithm run? What is the (tournament

selection) tournament size? What probabilities should one assign to crossover

and mutation? All these nagging questions need good answers if one is to

embrace success. Through an extensive series of experiments over multiple

evolutionary algorithm implementations and problems we show that parameter

space tends to be rife with viable parameters. We aver that this renders the

life of the practitioner that much easier, and cap off our study with an

advisory digest for the weary.

Recurrent Inference Machines for Solving Inverse Problems

Patrick Putzky , Max Welling Subjects : Neural and Evolutionary Computing (cs.NE) ; Computer Vision and Pattern Recognition (cs.CV)

Much of the recent research on solving iterative inference problems focuses

on moving away from hand-chosen inference algorithms and towards learned

inference. In the latter, the inference process is unrolled in time and

interpreted as a recurrent neural network (RNN) which allows for joint learning

of model and inference parameters with back-propagation through time. In this

framework, the RNN architecture is directly derived from a hand-chosen

inference algorithm, effectively limiting its capabilities. We propose a

learning framework, called Recurrent Inference Machines (RIM), in which we turn

algorithm construction the other way round: Given data and a task, train an RNN

to learn an inference algorithm. Because RNNs are Turing complete [1, 2] they

are capable to implement any inference algorithm. The framework allows for an

abstraction which removes the need for domain knowledge. We demonstrate in

several image restoration experiments that this abstraction is effective,

allowing us to achieve state-of-the-art performance on image denoising and

super-resolution tasks and superior across-task generalization.

Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation

Jinzhuo Wang , Wenmin Wang , Ronggang Wang , Wen Gao

Comments: AAAI 2017

Subjects

:

Artificial Intelligence (cs.AI)

; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Monte Carlo tree search (MCTS) is extremely popular in computer Go which

determines each action by enormous simulations in a broad and deep search tree.

However, human experts select most actions by pattern analysis and careful

evaluation rather than brute search of millions of future nteractions. In this

paper, we propose a computer Go system that follows experts way of thinking and

playing. Our system consists of two parts. The first part is a novel deep

alternative neural network (DANN) used to generate candidates of next move.

Compared with existing deep convolutional neural network (DCNN), DANN inserts

recurrent layer after each convolutional layer and stacks them in an

alternative manner. We show such setting can preserve more contexts of local

features and its evolutions which are beneficial for move prediction. The

second part is a long-term evaluation (LTE) module used to provide a reliable

evaluation of candidates rather than a single probability from move predictor.

This is consistent with human experts nature of playing since they can foresee

tens of steps to give an accurate estimation of candidates. In our system, for

each candidate, LTE calculates a cumulative reward after several future

interactions when local variations are settled. Combining criteria from the two

parts, our system determines the optimal choice of next move. For more

comprehensive experiments, we introduce a new professional Go dataset (PGD),

consisting of 253233 professional records. Experiments on GoGoD and PGD

datasets show the DANN can substantially improve performance of move prediction

over pure DCNN. When combining LTE, our system outperforms most relevant

approaches and open engines based on MCTS.

Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks

Joan Serrà , Alexandros Karatzoglou

Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016

Subjects

:

Learning (cs.LG)

; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

Recommendation algorithms that incorporate techniques from deep learning are

becoming increasingly popular. Due to the structure of the data coming from

recommendation domains (i.e., one-hot-encoded vectors of item preferences),

these algorithms tend to have large input and output dimensionalities that

dominate their overall size. This makes them difficult to train, due to the

limited memory of graphical processing units, and difficult to deploy on mobile

devices with limited hardware. To address these difficulties, we propose Bloom

embeddings, a compression technique that can be applied to the input and output

of neural network models dealing with sparse high-dimensional binary-coded

instances. Bloom embeddings are computationally efficient, and do not seriously

compromise the accuracy of the model up to 1/5 compression ratios. In some

cases, they even improve over the original accuracy, with relative increases up

to 12%. We evaluate Bloom embeddings on 7 data sets and compare it against 4

alternative methods, obtaining favorable results. We also discuss a number of

further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time

operation, zero or marginal space requirements, training time speedups, or the

fact that they do not require any change to the core model architecture or

training configuration.

A Supervised Approach to Extractive Summarisation of Scientific Papers

Ed Collins , Isabelle Augenstein , Sebastian Riedel

Comments: 11 pages, 6 figures

Subjects

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Applications (stat.AP); Machine Learning (stat.ML)

Automatic summarisation is a popular approach to reduce a document to its

main arguments. Recent research in the area has focused on neural approaches to

summarisation, which can be very data-hungry. However, few large datasets exist

and none for the traditionally popular domain of scientific publications, which

opens up challenging research avenues centered on encoding large, complex

documents. In this paper, we introduce a new dataset for summarisation of

computer science publications by exploiting a large resource of author provided

summaries and show straightforward ways of extending it further. We develop

models on the dataset making use of both neural sentence encoding and

traditionally used summarisation features and show that models which encode

sentences as well as their local and global context perform best, significantly

outperforming well-established baseline methods.

Computer Vision and Pattern Recognition

Video Imagination from a Single Image with Transformation Generation

Baoyang Chen , Wenmin Wang , Jinzhuo Wang , Xiongtao Chen , Weimian Li

Comments: 9 pages, 10 figures

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

In this work, we focus on a challenging task: synthesizing multiple imaginary

videos given a single image. Major problems come from high dimensionality of

pixel space and the ambiguity of potential motions. To overcome those problems,

we propose a new framework that produce imaginary videos by transformation

generation. The generated transformations are applied to the original image in

a novel volumetric merge network to reconstruct frames in imaginary video.

Through sampling different latent variables, our method can output different

imaginary video samples. The framework is trained in an adversarial way with

unsupervised learning. For evaluation, we propose a new assessment metric

(RIQA). In experiments, we test on 3 datasets varying from synthetic data to

natural scene. Our framework achieves promising performance in image quality

assessment. The visual inspection indicates that it can successfully generate

diverse five-frame videos in acceptable perceptual quality.

Joint Max Margin and Semantic Features for Continuous Event Detection in Complex Scenes

Iman Abbasnejad , Sridha Sridharan , Simon Denman , Clinton Fookes , Simon Lucey

Comments: submit to journal of Computer Vision and Image Understanding

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

In this paper the problem of complex event detection in the continuous domain

(i.e. events with unknown starting and ending locations) is addressed. Existing

event detection methods are limited to features that are extracted from the

local spatial or spatio-temporal patches from the videos. However, this makes

the model vulnerable to the events with similar concepts e.g. “Open drawer” and

“Open cupboard”. In this work, in order to address the aforementioned

limitations we present a novel model based on the combination of semantic and

temporal features extracted from video frames. We train a max-margin classifier

on top of the extracted features in an adaptive framework that is able to

detect the events with unknown starting and ending locations. Our model is

based on the Bidirectional Region Neural Network and large margin Structural

Output SVM. The generality of our model allows it to be simply applied to

different labeled and unlabeled datasets. We finally test our algorithm on

three challenging datasets, “UCF 101-Action Recognition”, “MPII Cooking

Activities” and “Hollywood”, and we report state-of-the-art performance.

Deep Learning-Based Food Calorie Estimation Method in Dietary Assessment

Yanchao Liang , Jianhua Li

Comments: 13 pages. arXiv admin note: substantial text overlap with arXiv:1705.07632

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Obesity treatment requires obese patients to record all food intakes per day.

Computer vision has been introduced to estimate calories from food images. In

order to increase accuracy of detection and reduce the error of volume

estimation in food calorie estimation, we present our calorie estimation method

in this paper. To estimate calorie of food, a top view and side view is needed.

Faster R-CNN is used to detect the food and calibration object. GrabCut

algorithm is used to get each food’s contour. Then the volume is estimated with

the food and corresponding object. Finally we estimate each food’s calorie. And

the experiment results show our estimation method is effective.

Text Extraction From Texture Images Using Masked Signal Decomposition

Shervin Minaee , Yao Wang

Comments: arXiv admin note: text overlap with arXiv:1704.07711

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Text extraction is an important problem in image processing with applications

from optical character recognition to autonomous driving. Most of the

traditional text segmentation algorithms consider separating text from a simple

background (which usually has a different color from texts). In this work we

consider separating texts from a textured background, that has similar color to

texts. We look at this problem from a signal decomposition perspective, and

consider a more realistic scenario where signal components are overlaid on top

of each other (instead of adding together). When the signals are overlaid, to

separate signal components, we need to find a binary mask which shows the

support of each component. Because directly solving the binary mask is

intractable, we relax this problem to the approximated continuous problem, and

solve it by alternating optimization method. We show that the proposed

algorithm achieves significantly better results than other recent works on

several challenging images.

Probabilistic RGB-D Odometry based on Points, Lines and Planes Under Depth Uncertainty

Pedro F. Proenca , Yang Gao Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Robotics (cs.RO)

This work proposes a robust visual odometry method for structured

environments that combines point features with line and plane segments,

extracted through an RGB-D camera. Noisy depth maps are processed by a

probabilistic depth fusion framework based on Mixtures of Gaussians to denoise

and derive the depth uncertainty, which is then propagated throughout the

visual odometry pipeline. Probabilistic 3D plane and line fitting solutions are

used to model the uncertainties of the feature parameters and pose is estimated

by combining the three types of primitives based on their uncertainties.

Performance evaluation on RGB-D sequences collected in this work and two public

RGB-D datasets: TUM and ICL-NUIM show the benefit of using the proposed depth

fusion framework and combining the three feature-types, particularly in scenes

with low-textured surfaces, dynamic objects and missing depth measurements.

Long-Term Video Interpolation with Bidirectional Predictive Network

Xiongtao Chen , Wenmin Wang , Jinzhuo Wang , Weimian Li , Baoyang Chen

Comments: 5 pages, 7 figures

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

This paper considers the challenging task of long-term video interpolation.

Unlike most existing methods that only generate few intermediate frames between

existing adjacent ones, we attempt to speculate or imagine the procedure of an

episode and further generate multiple frames between two non-consecutive frames

in videos. In this paper, we present a novel deep architecture called

bidirectional predictive network (BiPN) that predicts intermediate frames from

two opposite directions. The bidirectional architecture allows the model to

learn scene transformation with time as well as generate longer video

sequences. Besides, our model can be extended to predict multiple possible

procedures by sampling different noise vectors. A joint loss composed of clues

in image and feature spaces and adversarial loss is designed to train our

model. We demonstrate the advantages of BiPN on two benchmarks Moving 2D Shapes

and UCF101 and report competitive results to recent approaches.

SEP-Nets: Small and Effective Pattern Networks

Zhe Li , Xiaoyu Wang , Xutao Lv , Tianbao Yang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG)

While going deeper has been witnessed to improve the performance of

convolutional neural networks (CNN), going smaller for CNN has received

increasing attention recently due to its attractiveness for mobile/embedded

applications. It remains an active and important topic how to design a small

network while retaining the performance of large and deep CNNs (e.g., Inception

Nets, ResNets). Albeit there are already intensive studies on compressing the

size of CNNs, the considerable drop of performance is still a key concern in

many designs. This paper addresses this concern with several new contributions.

First, we propose a simple yet powerful method for compressing the size of deep

CNNs based on parameter binarization. The striking difference from most

previous work on parameter binarization/quantization lies at different

treatments of (1 imes 1) convolutions and (k imes k) convolutions ((k>1)),

where we only binarize (k imes k) convolutions into binary patterns. The

resulting networks are referred to as pattern networks. By doing this, we show

that previous deep CNNs such as GoogLeNet and Inception-type Nets can be

compressed dramatically with marginal drop in performance. Second, in light of

the different functionalities of (1 imes 1) (data projection/transformation)

and (k imes k) convolutions (pattern extraction), we propose a new block

structure codenamed the pattern residual block that adds transformed feature

maps generated by (1 imes 1) convolutions to the pattern feature maps

generated by (k imes k) convolutions, based on which we design a small network

with (sim 1) million parameters. Combining with our parameter binarization, we

achieve better performance on ImageNet than using similar sized networks

including recently released Google MobileNets.

Deep Control – a simple automatic gain control for memory efficient and high performance training of deep convolutional neural networks

Brendan Ruff

Comments: Submitted to BMVC 2017 on 2nd May 2017

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Training a deep convolutional neural net typically starts with a random

initialisation of all filters in all layers which severely reduces the forward

signal and back-propagated error and leads to slow and sub-optimal training.

Techniques that counter that focus on either increasing the signal or

increasing the gradients adaptively but the model behaves very differently at

the beginning of training compared to later when stable pathways through the

net have been established. To compound this problem the effective minibatch

size varies greatly between layers at different depths and between individual

filters as activation sparsity typically increases with depth leading to a

reduction in effective learning rate since gradients may superpose rather than

add and this further compounds the covariate shift problem as deeper neurons

are less able to adapt to upstream shift.

Proposed here is a method of automatic gain control of the signal built into

each convolutional neuron that achieves equivalent or superior performance than

batch normalisation and is compatible with single sample or minibatch gradient

descent. The same model is used both for training and inference.

The technique comprises a scaled per sample map mean subtraction from the raw

convolutional filter output followed by scaling of the difference.

Contrast Enhancement Estimation for Digital Image Forensics

Longyin Wen , Honggang Qi , Siwei Lyu Subjects : Computer Vision and Pattern Recognition (cs.CV)

Inconsistency in contrast enhancement can be used to expose image forgeries.

In this work, we describe a new method to estimate contrast enhancement from a

single image. Our method takes advantage of the nature of contrast enhancement

as a mapping between pixel values, and the distinct characteristics it

introduces to the image pixel histogram. Our method recovers the original pixel

histogram and the contrast enhancement simultaneously from a single image with

an iterative algorithm. Unlike previous methods, our method is robust in the

presence of additive noise perturbations that are used to hide the traces of

contrast enhancement. Furthermore, we also develop an e effective method to to

detect image regions undergone contrast enhancement transformations that are

different from the rest of the image, and use this method to detect composite

images. We perform extensive experimental evaluations to demonstrate the

efficacy and efficiency of our method method.

Can We See Photosynthesis? Magnifying the Tiny Color Changes of Plant Green Leaves Using Eulerian Video Magnification

Islam A.T.F. Taj-Eddin , Mahmoud Afifi , Mostafa Korashy , Ali H. Ahmed , Ng Yoke Cheng , Evelyng Hernandez , Salma M. Abdel-latif

Comments: 5 pages, 6 figures

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Plant aliveness is proven through laboratory experiments and special

scientific instruments. In this paper, we aim to detect the degree of animation

of plants based on the magnification of the small color changes in the plant’s

green leaves using the Eulerian video magnification. Capturing the video under

a controlled environment, e.g., using a tripod and direct current (DC) light

sources, reduces camera movements and minimizes light fluctuations; we aim to

reduce the external factors as much as possible. The acquired video is then

stabilized and a proposed algorithm used to reduce the illumination variations.

Lastly, the Euler magnification is utilized to magnify the color changes on the

light invariant video. The proposed system does not require any special purpose

instruments as it uses a digital camera with a regular frame rate. The results

of magnified color changes on both natural and plastic leaves show that the

live green leaves have color changes in contrast to the plastic leaves. Hence,

we can argue that the color changes of the leaves are due to biological

operations, such as photosynthesis. To date, this is possibly the first work

that focuses on interpreting visually, some biological operations of plants

without any special purpose instruments.

Criteria Sliders: Learning Continuous Database Criteria via Interactive Ranking

James Tompkin , Kwang In Kim , Hanspeter Pfister , Christian Theobalt Subjects : Computer Vision and Pattern Recognition (cs.CV)

Large databases are often organized by hand-labeled metadata, or criteria,

which are expensive to collect. We can use unsupervised learning to model

database variation, but these models are often high dimensional, complex to

parameterize, or require expert knowledge. We learn low-dimensional continuous

criteria via interactive ranking, so that the novice user need only describe

the relative ordering of examples. This is formed as semi-supervised label

propagation in which we maximize the information gained from a limited number

of examples. Further, we actively suggest data points to the user to rank in a

more informative way than existing work. Our efficient approach allows users to

interactively organize thousands of data points along 1D and 2D continuous

sliders. We experiment with datasets of imagery and geometry to demonstrate

that our tool is useful for quickly assessing and organizing the content of

large databases.

A Direction Search and Spectral Clustering Based Approach to Subspace Clustering

Mostafa Rahmani , George Atia Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Information Retrieval (cs.IR); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

This paper presents a new spectral-clustering-based approach to the subspace

clustering problem in which the data lies in the union of an unknown number of

unknown linear subspaces. Underpinning the proposed method is a convex program

for optimal direction search, which for each data point d, finds an optimal

direction in the span of the data that has minimum projection on the other data

points and non-vanishing projection on d. The obtained directions are

subsequently leveraged to identify a neighborhood set for each data point. An

Alternating Direction Method of Multipliers (ADMM) framework is provided to

efficiently solve for the optimal directions. The proposed method is shown to

often outperform the existing subspace clustering methods, particularly for

unwieldy scenarios involving high levels of noise and close subspaces, and

yields the state-of-the-art results for the problem of face clustering using

subspace segmentation.

Indirect Image Registration with Large Diffeomorphic Deformations

Chong Chen , Ozan Öktem

Comments: 34 pages, 4 figures, 1 table

Subjects

:

Numerical Analysis (math.NA)

; Computer Vision and Pattern Recognition (cs.CV); Dynamical Systems (math.DS); Functional Analysis (math.FA); Optimization and Control (math.OC)

We introduce a variational framework for indirect image registration where a

template is registered against a target that is known only through indirect

noisy observations, such as in tomographic imaging. The registration uses

diffeomorphisms that transform the template through a (group) action. These

diffeomorphisms are generated using the large deformation diffeomorphic metric

mapping framework, i.e., they are given as solutions of a flow equation that is

defined by a velocity field. We prove existence of solutions to this indirect

image registration procedure and provide examples of its performance on 2D

tomography with very sparse and/or highly noisy data.

Recurrent Inference Machines for Solving Inverse Problems

Patrick Putzky , Max Welling Subjects : Neural and Evolutionary Computing (cs.NE) ; Computer Vision and Pattern Recognition (cs.CV)

Much of the recent research on solving iterative inference problems focuses

on moving away from hand-chosen inference algorithms and towards learned

inference. In the latter, the inference process is unrolled in time and

interpreted as a recurrent neural network (RNN) which allows for joint learning

of model and inference parameters with back-propagation through time. In this

framework, the RNN architecture is directly derived from a hand-chosen

inference algorithm, effectively limiting its capabilities. We propose a

learning framework, called Recurrent Inference Machines (RIM), in which we turn

algorithm construction the other way round: Given data and a task, train an RNN

to learn an inference algorithm. Because RNNs are Turing complete [1, 2] they

are capable to implement any inference algorithm. The framework allows for an

abstraction which removes the need for domain knowledge. We demonstrate in

several image restoration experiments that this abstraction is effective,

allowing us to achieve state-of-the-art performance on image denoising and

super-resolution tasks and superior across-task generalization.

SmoothGrad: removing noise by adding noise

Daniel Smilkov , Nikhil Thorat , Been Kim , Fernanda Viégas , Martin Wattenberg

Comments: 10 pages

Subjects

:

Learning (cs.LG)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Explaining the output of a deep network remains a challenge. In the case of

an image classifier, one type of explanation is to identify pixels that

strongly influence the final decision. A starting point for this strategy is

the gradient of the class score function with respect to the input image. This

gradient can be interpreted as a sensitivity map, and there are several

techniques that elaborate on this basic idea. This paper makes two

contributions: it introduces SmoothGrad, a simple method that can help visually

sharpen gradient-based sensitivity maps, and it discusses lessons in the

visualization of these maps. We publish the code for our experiments and a

website with our results.

Artificial Intelligence

Technical Report: Implementation and Validation of a Smart Health Application

Fran Casino , Constantinos Patsakis , Antoni Martinez-Balleste , Frederic Borras , Edgar Batista

Comments: 4-page Tech Report

Subjects

:

Artificial Intelligence (cs.AI)

; Computers and Society (cs.CY)

In this article, we explain in detail the internal structures and databases

of a smart health application. Moreover, we describe how to generate a

statistically sound synthetic dataset using real-world medical data.

Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation

Jinzhuo Wang , Wenmin Wang , Ronggang Wang , Wen Gao

Comments: AAAI 2017

Subjects

:

Artificial Intelligence (cs.AI)

; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Monte Carlo tree search (MCTS) is extremely popular in computer Go which

determines each action by enormous simulations in a broad and deep search tree.

However, human experts select most actions by pattern analysis and careful

evaluation rather than brute search of millions of future nteractions. In this

paper, we propose a computer Go system that follows experts way of thinking and

playing. Our system consists of two parts. The first part is a novel deep

alternative neural network (DANN) used to generate candidates of next move.

Compared with existing deep convolutional neural network (DCNN), DANN inserts

recurrent layer after each convolutional layer and stacks them in an

alternative manner. We show such setting can preserve more contexts of local

features and its evolutions which are beneficial for move prediction. The

second part is a long-term evaluation (LTE) module used to provide a reliable

evaluation of candidates rather than a single probability from move predictor.

This is consistent with human experts nature of playing since they can foresee

tens of steps to give an accurate estimation of candidates. In our system, for

each candidate, LTE calculates a cumulative reward after several future

interactions when local variations are settled. Combining criteria from the two

parts, our system determines the optimal choice of next move. For more

comprehensive experiments, we introduce a new professional Go dataset (PGD),

consisting of 253233 professional records. Experiments on GoGoD and PGD

datasets show the DANN can substantially improve performance of move prediction

over pure DCNN. When combining LTE, our system outperforms most relevant

approaches and open engines based on MCTS.

Meta learning Framework for Automated Driving

Ahmad El Sallab , Mahmoud Saeed , Omar Abdel Tawab , Mohammed Abdou Subjects : Artificial Intelligence (cs.AI) ; Machine Learning (stat.ML)

The success of automated driving deployment is highly depending on the

ability to develop an efficient and safe driving policy. The problem is well

formulated under the framework of optimal control as a cost optimization

problem. Model based solutions using traditional planning are efficient, but

require the knowledge of the environment model. On the other hand, model free

solutions suffer sample inefficiency and require too many interactions with the

environment, which is infeasible in practice. Methods under the Reinforcement

Learning framework usually require the notion of a reward function, which is

not available in the real world. Imitation learning helps in improving sample

efficiency by introducing prior knowledge obtained from the demonstrated

behavior, on the risk of exact behavior cloning without generalizing to unseen

environments. In this paper we propose a Meta learning framework, based on data

set aggregation, to improve generalization of imitation learning algorithms.

Under the proposed framework, we propose MetaDAgger, a novel algorithm which

tackles the generalization issues in traditional imitation learning. We use The

Open Race Car Simulator (TORCS) to test our algorithm. Results on unseen test

tracks show significant improvement over traditional imitation learning

algorithms, improving the learning time and sample efficiency in the same time.

The results are also supported by visualization of the learnt features to prove

generalization of the captured details.

On Natural Language Generation of Formal Argumentation

Federico Cerutti , Alice Toniolo , Timothy J. Norman

Comments: 17 pages, 4 figures, technical report

Subjects

:

Artificial Intelligence (cs.AI)

In this paper we provide a first analysis of the research questions that

arise when dealing with the problem of communicating pieces of formal

argumentation through natural language interfaces. It is a generally held

opinion that formal models of argumentation naturally capture human argument,

and some preliminary studies have focused on justifying this view.

Unfortunately, the results are not only inconclusive, but seem to suggest that

explaining formal argumentation to humans is a rather articulated task.

Graphical models for expressing argumentation-based reasoning are appealing,

but often humans require significant training to use these tools effectively.

We claim that natural language interfaces to formal argumentation systems offer

a real alternative, and may be the way forward for systems that capture human

argument.

Recommendations for Marketing Campaigns in Telecommunication Business based on the footprint analysis

J. Sidorova , L. Skold , O. Rosander , L. Lundberg

Comments: conference

Subjects

:

Artificial Intelligence (cs.AI)

; Computers and Society (cs.CY)

A major investment made by a telecom operator goes into the infrastructure

and its maintenance, while business revenues are proportional to how big and

good the customer base is. We present a data-driven analytic strategy based on

combinatorial optimization and analysis of historical data. The data cover

historical mobility of the users in one region of Sweden during a week.

Applying the proposed method to the case study, we have identified the optimal

proportion of geo-demographic segments in the customer base, developed a

functionality to assess the potential of a planned marketing campaign, and

explored the problem of an optimal number and types of the geo-demographic

segments to target through marketing campaigns. With the help of fuzzy logic,

the conclusions of data analysis are automatically translated into

comprehensible recommendations in a natural language.

Fuzzy Recommendations in Marketing Campaigns

S. Podapati , L. Lundberg , L. Skold , O. Rosander , J. Sidorova

Comments: conference

Subjects

:

Artificial Intelligence (cs.AI)

The population in Sweden is growing rapidly due to immigration. In this

light, the issue of infrastructure upgrades to provide telecommunication

services is of importance. New antennas can be installed at hot spots of user

demand, which will require an investment, and/or the clientele expansion can be

carried out in a planned manner to promote the exploitation of the

infrastructure in the less loaded geographical zones. In this paper, we explore

the second alternative. Informally speaking, the term Infrastructure-Stressing

describes a user who stays in the zones of high demand, which are prone to

produce service failures, if further loaded. We have studied the

Infrastructure-Stressing population in the light of their correlation with

geo-demographic segments. This is motivated by the fact that specific

geo-demographic segments can be targeted via marketing campaigns. Fuzzy logic

is applied to create an interface between big data, numeric methods for

processing big data and a manager.

Item Difficulty-Based Label Aggregation Models for Crowdsourcing

Chi Hong Subjects : Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Learning (cs.LG)

A large amount of labeled data is required for supervised learning. However,

labeling by domain experts is expensive and time-consuming. A low cost and high

efficiency way to obtain large training datasets is to aggregate noisy labels

collected from non-professional crowds. Prior works have proposed confusion

matrices to evaluate the reliability of workers. In this paper, we redefine the

structure of the confusion matrices and propose two Bayesian Network based

methods which utilize item difficulty in label aggregation. We assume that

labels are generated by a probability distribution over confusion matrices,

item difficulties, labels and true labels. We use Markov chain Monte Carlo

method to generate samples from the posterior distribution of model parameters

and then infer the results. To avoid bad local optima, we design a method to

preliminarily predict the difficulty of each item and initialize the model

parameters. We also introduce how to improve the scalability of our model.

Empirical results show that our methods consistently outperform

state-of-the-art methods.

A New Probabilistic Algorithm for Approximate Model Counting

Cunjing Ge , Feifei Ma , Tian Liu , Jian Zhang Subjects : Artificial Intelligence (cs.AI)

Constrained counting is important in domains ranging from artificial

intelligence to software analysis. There are already a few approaches for

counting models over various types of constraints. Recently, hashing-based

approaches achieve both theoretical guarantees and scalability, but still rely

on solution enumeration. In this paper, a new probabilistic polynomial time

approximate model counter is proposed, which is also a hashing-based universal

framework, but with only satisfiability queries. A variant with a dynamic

stopping criterion is also presented. Empirical evaluation over benchmarks on

propositional logic formulas and SMT(BV) formulas shows that the approach is

promising.

Gradient descent GAN optimization is locally stable

Vaishnavh Nagarajan , J. Zico Kolter Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

Despite their growing prominence, optimization in generative adversarial

networks (GANs) is still a poorly-understood topic. In this paper, we analyze

the “gradient descent” form of GAN optimization (i.e., the natural setting

where we simultaneously take small gradient steps in both generator and

discriminator parameters). We show that even though GAN optimization does not

correspond to a convex-concave game, even for simple parameterizations, under

proper conditions, equilibrium points of this optimization procedure are still

locally asymptotically stable for the traditional GAN formulation. On the other

hand, we show that the recently-proposed Wasserstein GAN can have

non-convergent limit cycles near equilibrium. Motivated by this stability

analysis, we propose an additional regularization term for gradient descent GAN

updates, which is able to guarantee local stability for both the WGAN and for

the traditional GAN, and also shows practical promise in speeding up

convergence and addressing mode collapse.

Zero-Shot Relation Extraction via Reading Comprehension

Omer Levy , Minjoon Seo , Eunsol Choi , Luke Zettlemoyer

Comments: CoNLL 2017

Subjects

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Learning (cs.LG)

We show that relation extraction can be reduced to answering simple reading

comprehension questions, by associating one or more natural-language questions

with each relation slot. This reduction has several advantages: we can (1)

learn relation-extraction models by extending recent neural

reading-comprehension techniques, (2) build very large training sets for those

models by combining relation-specific crowd-sourced questions with distant

supervision, and even (3) do zero-shot learning by extracting new relation

types that are only specified at test-time, for which we have no labeled

training examples. Experiments on a Wikipedia slot-filling task demonstrate

that the approach can generalize to new questions for known relation types with

high accuracy, and that zero-shot generalization to unseen relation types is

possible, at lower accuracy levels, setting the bar for future work on this

task.

Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks

Joan Serrà , Alexandros Karatzoglou

Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016

Subjects

:

Learning (cs.LG)

; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

Recommendation algorithms that incorporate techniques from deep learning are

becoming increasingly popular. Due to the structure of the data coming from

recommendation domains (i.e., one-hot-encoded vectors of item preferences),

these algorithms tend to have large input and output dimensionalities that

dominate their overall size. This makes them difficult to train, due to the

limited memory of graphical processing units, and difficult to deploy on mobile

devices with limited hardware. To address these difficulties, we propose Bloom

embeddings, a compression technique that can be applied to the input and output

of neural network models dealing with sparse high-dimensional binary-coded

instances. Bloom embeddings are computationally efficient, and do not seriously

compromise the accuracy of the model up to 1/5 compression ratios. In some

cases, they even improve over the original accuracy, with relative increases up

to 12%. We evaluate Bloom embeddings on 7 data sets and compare it against 4

alternative methods, obtaining favorable results. We also discuss a number of

further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time

operation, zero or marginal space requirements, training time speedups, or the

fact that they do not require any change to the core model architecture or

training configuration.

A Supervised Approach to Extractive Summarisation of Scientific Papers

Ed Collins , Isabelle Augenstein , Sebastian Riedel

Comments: 11 pages, 6 figures

Subjects

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Applications (stat.AP); Machine Learning (stat.ML)

Automatic summarisation is a popular approach to reduce a document to its

main arguments. Recent research in the area has focused on neural approaches to

summarisation, which can be very data-hungry. However, few large datasets exist

and none for the traditionally popular domain of scientific publications, which

opens up challenging research avenues centered on encoding large, complex

documents. In this paper, we introduce a new dataset for summarisation of

computer science publications by exploiting a large resource of author provided

summaries and show straightforward ways of extending it further. We develop

models on the dataset making use of both neural sentence encoding and

traditionally used summarisation features and show that models which encode

sentences as well as their local and global context perform best, significantly

outperforming well-established baseline methods.

Causal Discovery in the Presence of Measurement Error: Identifiability Conditions

Kun Zhang , Mingming Gong , Joseph Ramsey , Kayhan Batmanghelich , Peter Spirtes , Clark Glymour

Comments: 15 pages, 5 figures, 1 table

Subjects

:

Methodology (stat.ME)

; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Measurement error in the observed values of the variables can greatly change

the output of various causal discovery methods. This problem has received much

attention in multiple fields, but it is not clear to what extent the causal

model for the measurement-error-free variables can be identified in the

presence of measurement error with unknown variance. In this paper, we study

precise sufficient identifiability conditions for the measurement-error-free

causal model and show what information of the causal model can be recovered

from observed data. In particular, we present two different sets of

identifiability conditions, based on the second-order statistics and

higher-order statistics of the data, respectively. The former was inspired by

the relationship between the generating model of the

measurement-error-contaminated data and the factor analysis model, and the

latter makes use of the identifiability result of the over-complete independent

component analysis problem.

Optimal Auctions through Deep Learning

Paul Dütting , Zhe Feng , Harikrishna Narasimhan , David C. Parkes Subjects : Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Learning (cs.LG)

Designing an auction that maximizes expected revenue is an intricate task.

Indeed, as of today–despite major efforts and impressive progress over the

past few years–only the single-item case is fully understood. In this work, we

initiate the exploration of the use of tools from deep learning on this topic.

The design objective is revenue optimal, dominant-strategy incentive compatible

auctions. We show that multi-layer neural networks can learn almost-optimal

auctions for settings for which there are analytical solutions, such as

Myerson’s auction for a single item, Manelli and Vincent’s mechanism for a

single bidder with additive preferences over two items, or Yao’s auction for

two additive bidders with binary support distributions and multiple items, even

if no prior knowledge about the form of optimal auctions is encoded in the

network and the only feedback during training is revenue and regret. We further

show how characterization results, even rather implicit ones such as Rochet’s

characterization through induced utilities and their gradients, can be

leveraged to obtain more precise fits to the optimal design. We conclude by

demonstrating the potential of deep learning for deriving optimal auctions with

high revenue for poorly understood problems.

Information Retrieval

Recurrent Latent Variable Networks for Session-Based Recommendation

Sotirios Chatzis , Panayiotis Christodoulou , Andreas S. Andreou Subjects : Information Retrieval (cs.IR) ; Learning (cs.LG); Machine Learning (stat.ML)

In this work, we attempt to ameliorate the impact of data sparsity in the

context of session-based recommendation. Specifically, we seek to devise a

machine learning mechanism capable of extracting subtle and complex underlying

temporal dynamics in the observed session data, so as to inform the

recommendation algorithm. To this end, we improve upon systems that utilize

deep learning techniques with recurrently connected units; we do so by adopting

concepts from the field of Bayesian statistics, namely variational inference.

Our proposed approach consists in treating the network recurrent units as

stochastic latent variables with a prior distribution imposed over them. On

this basis, we proceed to infer corresponding posteriors; these can be used for

prediction and recommendation generation, in a way that accounts for the

uncertainty in the available sparse training data. To allow for our approach to

easily scale to large real-world datasets, we perform inference under an

approximate amortized variational inference (AVI) setup, whereby the learned

posteriors are parameterized via (conventional) neural networks. We perform an

extensive experimental evaluation of our approach using challenging benchmark

datasets, and illustrate its superiority over existing state-of-the-art

techniques.

RELink: A Research Framework and Test Collection for Entity-Relationship Retrieval

Pedro Saleiro , Natasa Milic-Frayling , Eduarda Mendes Rodrigues , Carlos Soares

Comments: SIGIR 17 (resource)

Subjects

:

Information Retrieval (cs.IR)

Improvements of entity-relationship (E-R) search techniques have been

hampered by a lack of test collections, particularly for complex queries

involving multiple entities and relationships. In this paper we describe a

method for generating E-R test queries to support comprehensive E-R search

experiments. Queries and relevance judgments are created from content that

exists in a tabular form where columns represent entity types and the table

structure implies one or more relationships among the entities. Editorial work

involves creating natural language queries based on relationships represented

by the entries in the table. We have publicly released the RELink test

collection comprising 600 queries and relevance judgments obtained from a

sample of Wikipedia List-of-lists-of-lists tables. The latter comprise tuples

of entities that are extracted from columns and labelled by corresponding

entity types and relationships they represent. In order to facilitate research

in complex E-R retrieval, we have created and released as open source the

RELink Framework that includes Apache Lucene indexing and search specifically

tailored to E-R retrieval. RELink includes entity and relationship indexing

based on the ClueWeb-09-B Web collection with FACC1 text span annotations

linked to Wikipedia entities. With ready to use search resources and a

comprehensive test collection, we support community in pursuing E-R research at

scale.

Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks

Massimo Quadrana , Alexandros Karatzoglou , Balázs Hidasi , Paolo Cremonesi Subjects : Learning (cs.LG) ; Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

Session-based recommendations are highly relevant in many modern on-line

services (e.g. e-commerce, video streaming) and recommendation settings.

Recurrent Neural Networks have recently been shown to perform very well in

session-based settings. While in many session-based recommendation domains user

identifiers are hard to come by, there are also domains in which user profiles

are readily available. We propose a seamless way to personalize RNN models with

cross-session information transfer and devise a hierarchical RNN model that

relies end evolves latent hidden states of the RNN’s across user sessions.

Results on two industry datasets show large improvements over the session-only

RNN’s.

Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks

Joan Serrà , Alexandros Karatzoglou

Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016

Subjects

:

Learning (cs.LG)

; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

Recommendation algorithms that incorporate techniques from deep learning are

becoming increasingly popular. Due to the structure of the data coming from

recommendation domains (i.e., one-hot-encoded vectors of item preferences),

these algorithms tend to have large input and output dimensionalities that

dominate their overall size. This makes them difficult to train, due to the

limited memory of graphical processing units, and difficult to deploy on mobile

devices with limited hardware. To address these difficulties, we propose Bloom

embeddings, a compression technique that can be applied to the input and output

of neural network models dealing with sparse high-dimensional binary-coded

instances. Bloom embeddings are computationally efficient, and do not seriously

compromise the accuracy of the model up to 1/5 compression ratios. In some

cases, they even improve over the original accuracy, with relative increases up

to 12%. We evaluate Bloom embeddings on 7 data sets and compare it against 4

alternative methods, obtaining favorable results. We also discuss a number of

further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time

operation, zero or marginal space requirements, training time speedups, or the

fact that they do not require any change to the core model architecture or

training configuration.

A Direction Search and Spectral Clustering Based Approach to Subspace Clustering

Mostafa Rahmani , George Atia Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Information Retrieval (cs.IR); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

This paper presents a new spectral-clustering-based approach to the subspace

clustering problem in which the data lies in the union of an unknown number of

unknown linear subspaces. Underpinning the proposed method is a convex program

for optimal direction search, which for each data point d, finds an optimal

direction in the span of the data that has minimum projection on the other data

points and non-vanishing projection on d. The obtained directions are

subsequently leveraged to identify a neighborhood set for each data point. An

Alternating Direction Method of Multipliers (ADMM) framework is provided to

efficiently solve for the optimal directions. The proposed method is shown to

often outperform the existing subspace clustering methods, particularly for

unwieldy scenarios involving high levels of noise and close subspaces, and

yields the state-of-the-art results for the problem of face clustering using

subspace segmentation.

Dionysius: A Framework for Modeling Hierarchical User Interactions in Recommender Systems

Jian Wang , Krishnaram Kenthapadi , Kaushik Rangadurai , David Hardtke Subjects : Social and Information Networks (cs.SI) ; Information Retrieval (cs.IR)

We address the following problem: How do we incorporate user item interaction

signals as part of the relevance model in a large-scale personalized

recommendation system such that, (1) the ability to interpret the model and

explain recommendations is retained, and (2) the existing infrastructure

designed for the (user profile) content-based model can be leveraged? We

propose Dionysius, a hierarchical graphical model based framework and system

for incorporating user interactions into recommender systems, with minimal

change to the underlying infrastructure. We learn a hidden fields vector for

each user by considering the hierarchy of interaction signals, and replace the

user profile-based vector with this learned vector, thereby not expanding the

feature space at all. Thus, our framework allows the use of existing

recommendation infrastructure that supports content based features. We

implemented and deployed this system as part of the recommendation platform at

LinkedIn for more than one year. We validated the efficacy of our approach

through extensive offline experiments with different model choices, as well as

online A/B testing experiments. Our deployment of this system as part of the

job recommendation engine resulted in significant improvement in the quality of

retrieved results, thereby generating improved user experience and positive

impact for millions of users.

Computation and Language

An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing

Marcin Junczys-Dowmunt , Roman Grundkiewicz Subjects : Computation and Language (cs.CL)

In this work, we explore multiple neural architectures adapted for the task

of automatic post-editing of machine translation output. We focus on neural

end-to-end models that combine both inputs (mt) and (src) in a single neural

architecture, modeling ({mt, src}

ightarrow pe) directly. Apart from that,

we investigate the influence of hard-attention models which seem to be

well-suited for monolingual tasks, as well as combinations of both ideas. We

report results on data sets provided during the WMT 2016 shared task on

automatic post-editing and can demonstrate that double-attention models that

incorporate all available data in the APE scenario in a single model improve on

the best shared task system and on all other published results after the shared

task. Double-attention models that are combined with hard attention remain

competitive despite applying fewer changes to the input.

Zero-Shot Relation Extraction via Reading Comprehension

Omer Levy , Minjoon Seo , Eunsol Choi , Luke Zettlemoyer

Comments: CoNLL 2017

Subjects

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Learning (cs.LG)

We show that relation extraction can be reduced to answering simple reading

comprehension questions, by associating one or more natural-language questions

with each relation slot. This reduction has several advantages: we can (1)

learn relation-extraction models by extending recent neural

reading-comprehension techniques, (2) build very large training sets for those

models by combining relation-specific crowd-sourced questions with distant

supervision, and even (3) do zero-shot learning by extracting new relation

types that are only specified at test-time, for which we have no labeled

training examples. Experiments on a Wikipedia slot-filling task demonstrate

that the approach can generalize to new questions for known relation types with

high accuracy, and that zero-shot generalization to unseen relation types is

possible, at lower accuracy levels, setting the bar for future work on this

task.

Modelling prosodic structure using Artificial Neural Networks

Jean-Philippe Bernandy , Charalambos Themistocleous

Comments: 4 pages, 3 figures, Experimental linguistics 2017

Subjects

:

Computation and Language (cs.CL)

The ability to accurately perceive whether a speaker is asking a question or

is making a statement is crucial for any successful interaction. However,

learning and classifying tonal patterns has been a challenging task for

automatic speech recognition and for models of tonal representation, as tonal

contours are characterized by significant variation. This paper provides a

classification model of Cypriot Greek questions and statements. We evaluate two

state-of-the-art network architectures: a Long Short-Term Memory (LSTM) network

and a convolutional network (ConvNet). The ConvNet outperforms the LSTM in the

classification task and exhibited an excellent performance with 95%

classification accuracy.

A Supervised Approach to Extractive Summarisation of Scientific Papers

Ed Collins , Isabelle Augenstein , Sebastian Riedel

Comments: 11 pages, 6 figures

Subjects

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Applications (stat.AP); Machine Learning (stat.ML)

Automatic summarisation is a popular approach to reduce a document to its

main arguments. Recent research in the area has focused on neural approaches to

summarisation, which can be very data-hungry. However, few large datasets exist

and none for the traditionally popular domain of scientific publications, which

opens up challenging research avenues centered on encoding large, complex

documents. In this paper, we introduce a new dataset for summarisation of

computer science publications by exploiting a large resource of author provided

summaries and show straightforward ways of extending it further. We develop

models on the dataset making use of both neural sentence encoding and

traditionally used summarisation features and show that models which encode

sentences as well as their local and global context perform best, significantly

outperforming well-established baseline methods.

Six Challenges for Neural Machine Translation

Philipp Koehn , Rebecca Knowles

Comments: 12 pages; First Workshop on Neural Machine Translation, 2017

Subjects

:

Computation and Language (cs.CL)

We explore six challenges for neural machine translation: domain mismatch,

amount of training data, rare words, long sentences, word alignment, and beam

search. We show both deficiencies and improvements over the quality of

phrase-based statistical machine translation.

Attention-based Vocabulary Selection for NMT Decoding

Baskaran Sankaran , Markus Freitag , Yaser Al-Onaizan

Comments: Submitted to Second Conference on Machine Translation (WMT-17); 7 pages

Subjects

:

Computation and Language (cs.CL)

Neural Machine Translation (NMT) models usually use large target vocabulary

sizes to capture most of the words in the target language. The vocabulary size

is a big factor when decoding new sentences as the final softmax layer

normalizes over all possible target words. To address this problem, it is

widely common to restrict the target vocabulary with candidate lists based on

the source sentence. Usually, the candidate lists are a combination of external

word-to-word aligner, phrase table entries or most frequent words. In this

work, we propose a simple and yet novel approach to learn candidate lists

directly from the attention layer during NMT training. The candidate lists are

highly optimized for the current NMT model and do not need any external

computation of the candidate pool. We show significant decoding speedup

compared with using the entire vocabulary, without losing any translation

quality for two language pairs.

Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings

Shane Settle , Keith Levin , Herman Kamper , Karen Livescu

Comments: To appear Interspeech 2017

Subjects

:

Computation and Language (cs.CL)

Query-by-example search often uses dynamic time warping (DTW) for comparing

queries and proposed matching segments. Recent work has shown that comparing

speech segments by representing them as fixed-dimensional vectors — acoustic

word embeddings — and measuring their vector distance (e.g., cosine distance)

can discriminate between words more accurately than DTW-based approaches. We

consider an approach to query-by-example search that embeds both the query and

database segments according to a neural model, followed by nearest-neighbor

search to find the matching segments. Earlier work on embedding-based

query-by-example, using template-based acoustic word embeddings, achieved

competitive performance. We find that our embeddings, based on recurrent neural

networks trained to optimize word discrimination, achieve substantial

improvements in performance and run-time efficiency over the previous

approaches.

Encoding of phonology in a recurrent neural model of grounded speech

Afra Alishahi , Marie Barking , Grzegorz Chrupała

Comments: Accepted at CoNLL 2017

Subjects

:

Computation and Language (cs.CL)

; Learning (cs.LG); Sound (cs.SD)

We study the representation and encoding of phonemes in a recurrent neural

network model of grounded speech. We use a model which processes images and

their spoken descriptions, and projects the visual and auditory representations

into the same semantic space. We perform a number of analyses on how

information about individual phonemes is encoded in the MFCC features extracted

from the speech signal, and the activations of the layers of the model. Via

experiments with phoneme decoding and phoneme discrimination we show that

phoneme representations are most salient in the lower layers of the model,

where low-level signals are processed at a fine-grained level, although a large

amount of phonological information is retain at the top recurrent layer. We

further find out that the attention mechanism following the top recurrent layer

significantly attenuates encoding of phonology and makes the utterance

embeddings much more invariant to synonymy. Moreover, a hierarchical clustering

of phoneme representations learned by the network shows an organizational

structure of phonemes similar to those proposed in linguistics.

Verb Physics: Relative Physical Knowledge of Actions and Objects

Maxwell Forbes , Yejin Choi

Comments: 11 pages, published in Proceedings of ACL 2017

Subjects

:

Computation and Language (cs.CL)

Learning commonsense knowledge from natural language text is nontrivial due

to reporting bias: people rarely state the obvious, e.g., “My house is bigger

than me.” However, while rarely stated explicitly, this trivial everyday

knowledge does influence the way people talk about the world, which provides

indirect clues to reason about the world. For example, a statement like, “Tyler

entered his house” implies that his house is bigger than Tyler.

In this paper, we present an approach to infer relative physical knowledge of

actions and objects along five dimensions (e.g., size, weight, and strength)

from unstructured natural language text. We frame knowledge acquisition as

joint inference over two closely related problems: learning (1) relative

physical knowledge of object pairs and (2) physical implications of actions

when applied to those object pairs. Empirical results demonstrate that it is

possible to extract knowledge of actions and objects from language and that

joint inference over different types of knowledge improves performance.

Adversarial Feature Matching for Text Generation

Yizhe Zhang , Zhe Gan , Kai Fan , Zhi Chen , Ricardo Henao , Dinghan Shen , Lawrence Carin

Comments: Accepted by ICML 2017

Subjects

:

Machine Learning (stat.ML)

; Computation and Language (cs.CL); Learning (cs.LG)

The Generative Adversarial Network (GAN) has achieved great success in

generating realistic (real-valued) synthetic data. However, convergence issues

and difficulties dealing with discrete data hinder the applicability of GAN to

text. We propose a framework for generating realistic text via adversarial

training. We employ a long short-term memory network as generator, and a

convolutional network as discriminator. Instead of using the standard objective

of GAN, we propose matching the high-dimensional latent feature distributions

of real and synthetic sentences, via a kernelized discrepancy metric. This

eases adversarial training by alleviating the mode-collapsing problem. Our

experiments show superior performance in quantitative evaluation, and

demonstrate that our model can generate realistic-looking sentences.

Distributed, Parallel, and Cluster Computing

Live Service Migration in Mobile Edge Clouds

Andrew Machen , Shiqiang Wang , Kin K. Leung , Bong Jun Ko , Theodoros Salonidis

Comments: This is the author’s version of the paper accepted for publication in IEEE Wireless Communications

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

; Networking and Internet Architecture (cs.NI)

Mobile edge clouds (MECs) bring the benefits of the cloud closer to the user,

by installing small cloud infrastructures at the network edge. This enables a

new breed of real-time applications, such as instantaneous object recognition

and safety assistance in intelligent transportation systems, that require very

low latency. One key issue that comes with proximity is how to ensure that

users always receive good performance as they move across different locations.

Migrating services between MECs is seen as the means to achieve this. This

article presents a layered framework for migrating active service applications

that are encapsulated either in virtual machines (VMs) or containers. This

layering approach allows a substantial reduction in service downtime. The

framework is easy to implement using readily available technologies, and one of

its key advantages is that it supports containers, which is a promising

emerging technology that offers tangible benefits over VMs. The migration

performance of various real applications is evaluated by experiments under the

presented framework. Insights drawn from the experimentation results are

discussed.

Distributed Subgraph Detection

Pierre Fraigniaud , Pedro Montealegre , Dennis Olivetti , Ivan Rapaport , Ioan Todinca Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

In the standard CONGEST model for distributed network computing, it is known

that “global” tasks such as minimum spanning tree, diameter, and all-pairs

shortest paths, consume large bandwidth, for their running-time is

(Omega(mbox{poly}(n))) rounds in (n)-node networks with constant diameter.

Surprisingly, “local” tasks such as detecting the presence of a 4-cycle as a

subgraph also requires (widetilde{Omega}(sqrt{n})) rounds, even using

randomized algorithms, and the best known upper bound for detecting the

presence of a 3-cycle is (widetilde{O}(n^{frac{2}{3}})) rounds. The objective

of this paper is to better understand the landscape of such subgraph detection

tasks. We show that, in contrast to emph{cycles}, which are hard to detect in

the CONGEST model, there exists a deterministic algorithm for detecting the

presence of a subgraph isomorphic to (T) running in a emph{constant} number of

rounds, for every tree (T). Our algorithm provides a distributed implementation

of a combinatorial technique due to ErdH{o}s et al. for sparsening the set of

partial solutions kept by the nodes at each round. Our result has important

consequences to emph{distributed property-testing}, i.e., to randomized

algorithms whose aim is to distinguish between graphs satisfying a property,

and graphs far from satisfying that property. In particular, we get that, for

every graph pattern (H) composed of an edge and a tree connected in an

arbitrary manner, there exists a distributed testing algorithm for

(H)-freeness, performing in a constant number of rounds. Although the class of

graph patterns (H) formed by a tree and an edge connected arbitrarily may look

artificial, all previous results of the literature concerning testing

(H)-freeness for classical patterns such as cycles and cliques can be viewed as

direct consequences of our result, while our algorithm enables testing more

complex patterns.

Distributed Detection of Cycles

Pierre Fraigniaud , Dennis Olivetti Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

Distributed property testing in networks has been introduced by Brakerski and

Patt-Shamir (2011), with the objective of detecting the presence of large dense

sub-networks in a distributed manner. Recently, Censor-Hillel et al. (2016)

have shown how to detect 3-cycles in a constant number of rounds by a

distributed algorithm. In a follow up work, Fraigniaud et al. (2016) have shown

how to detect 4-cycles in a constant number of rounds as well. However, the

techniques in these latter works were shown not to generalize to larger cycles

(C_k) with (kgeq 5). In this paper, we completely settle the problem of cycle

detection, by establishing the following result. For every (kgeq 3), there

exists a distributed property testing algorithm for (C_k)-freeness, performing

in a constant number of rounds. All these results hold in the classical CONGEST

model for distributed network computing. Our algorithm is 1-sided error. Its

round-complexity is (O(1/epsilon)) where (epsilonin(0,1)) is the property

testing parameter measuring the gap between legal and illegal instances.

The Power of Choice in Priority Scheduling

Dan Alistarh , Justin Kopinsky , Jerry Li , Giorgi Nadiradze Subjects : Data Structures and Algorithms (cs.DS) ; Distributed, Parallel, and Cluster Computing (cs.DC)

Consider the following random process: we are given (n) queues, into which

elements of increasing labels are inserted uniformly at random. To remove an

element, we pick two queues at random, and remove the element of lower label

(higher priority) among the two. The cost of a removal is the rank of the label

removed, among labels still present in any of the queues, that is, the distance

from the optimal choice at each step. Variants of this strategy are prevalent

in state-of-the-art concurrent priority queue implementations. Nonetheless, it

is not known whether such implementations provide any rank guarantees, even in

a sequential model.

We answer this question, showing that this strategy provides surprisingly

strong guarantees: Although the single-choice process, where we always insert

and remove from a single randomly chosen queue, has degrading cost, going to

infinity as we increase the number of steps, in the two choice process, the

expected rank of a removed element is (O( n )) while the expected worst-case

cost is (O( n log n )). These bounds are tight, and hold irrespective of the

number of steps for which we run the process.

The argument is based on a new technical connection between “heavily loaded”

balls-into-bins processes and priority scheduling.

Our analytic results inspire a new concurrent priority queue implementation,

which improves upon the state of the art in terms of practical performance.

Serialisable Multi-Level Transaction Control: A Specification and Verification

Egon Börger , Klaus-Dieter Schewe , Qing Wang

Comments: 25 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1706.01762

Journal-ref: Sci. Comput. Program. 131: 42-58 (2016)

Subjects

:

Databases (cs.DB)

; Distributed, Parallel, and Cluster Computing (cs.DC)

We define a programming language independent controller TaCtl for multi-level

transactions and an operator (TA), which when applied to concurrent programs

with multi-level shared locations containing hierarchically structured complex

values, turns their behavior with respect to some abstract termination

criterion into a transactional behavior. We prove the correctness property that

concurrent runs under the transaction controller are serialisable, assuming an

Inverse Operation Postulate to guarantee recoverability. For its applicability

to a wide range of programs we specify the transaction controller TaCtl and the

operator (TA) in terms of Abstract State Machines (ASMs). This allows us to

model concurrent updates at different levels of nested locations in a precise

yet simple manner, namely in terms of partial ASM updates. It also provides the

possibility to use the controller TaCtl and the operator (TA) as a plug-in when

specifying concurrent system components in terms of sequential ASMs.

Asynchronous Graph Pattern Matching on Multiprocessor Systems

Alexander Krause , Annett Ungethüm , Thomas Kissinger , Dirk Habich , Wolfgang Lehner

Comments: 14 Pages, Extended version for ADBIS 2017

Subjects

:

Databases (cs.DB)

; Distributed, Parallel, and Cluster Computing (cs.DC)

Pattern matching on large graphs is the foundation for a variety of

application domains. Strict latency requirements and continuously increasing

graph sizes demand the usage of highly parallel in-memory graph processing

engines that need to consider non-uniform memory access (NUMA) and concurrency

issues to scale up on modern multiprocessor systems. To tackle these aspects,

graph partitioning becomes increasingly important. Hence, we present a

technique to process graph pattern matching on NUMA systems in this paper. As a

scalable pattern matching processing infrastructure, we leverage a

data-oriented architecture that preserves data locality and minimizes

concurrency-related bottlenecks on NUMA systems. We show in detail, how graph

pattern matching can be asynchronously processed on a multiprocessor system.

Learning

Gradient descent GAN optimization is locally stable

Vaishnavh Nagarajan , J. Zico Kolter Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

Despite their growing prominence, optimization in generative adversarial

networks (GANs) is still a poorly-understood topic. In this paper, we analyze

the “gradient descent” form of GAN optimization (i.e., the natural setting

where we simultaneously take small gradient steps in both generator and

discriminator parameters). We show that even though GAN optimization does not

correspond to a convex-concave game, even for simple parameterizations, under

proper conditions, equilibrium points of this optimization procedure are still

locally asymptotically stable for the traditional GAN formulation. On the other

hand, we show that the recently-proposed Wasserstein GAN can have

non-convergent limit cycles near equilibrium. Motivated by this stability

analysis, we propose an additional regularization term for gradient descent GAN

updates, which is able to guarantee local stability for both the WGAN and for

the traditional GAN, and also shows practical promise in speeding up

convergence and addressing mode collapse.

Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks

Massimo Quadrana , Alexandros Karatzoglou , Balázs Hidasi , Paolo Cremonesi Subjects : Learning (cs.LG) ; Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

Session-based recommendations are highly relevant in many modern on-line

services (e.g. e-commerce, video streaming) and recommendation settings.

Recurrent Neural Networks have recently been shown to perform very well in

session-based settings. While in many session-based recommendation domains user

identifiers are hard to come by, there are also domains in which user profiles

are readily available. We propose a seamless way to personalize RNN models with

cross-session information transfer and devise a hierarchical RNN model that

relies end evolves latent hidden states of the RNN’s across user sessions.

Results on two industry datasets show large improvements over the session-only

RNN’s.

Online Learning for Structured Loss Spaces

Siddharth Barman , Aditya Gopalan , Aadirupa Saha

Comments: 23 pages

Subjects

:

Learning (cs.LG)

We consider prediction with expert advice when the loss vectors are assumed

to lie in a set described by the sum of atomic norm balls. We derive a regret

bound for a general version of the online mirror descent (OMD) algorithm that

uses a combination of regularizers, each adapted to the constituent atomic

norms. The general result recovers standard OMD regret bounds, and yields

regret bounds for new structured settings where the loss vectors are (i) noisy

versions of points from a low-rank subspace, (ii) sparse vectors corrupted with

noise, and (iii) sparse perturbations of low-rank vectors. For the problem of

online learning with structured losses, we also show lower bounds on regret in

terms of rank and sparsity of the source set of the loss vectors, which implies

lower bounds for the above additive loss settings as well.

Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations

Yuanzhi Li , Yingyu Liang

Comments: Accepted to the International Conference on Machine Learning (ICML), 2017

Subjects

:

Learning (cs.LG)

; Data Structures and Algorithms (cs.DS); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

Non-negative matrix factorization is a basic tool for decomposing data into

the feature and weight matrices under non-negativity constraints, and in

practice is often solved in the alternating minimization framework. However, it

is unclear whether such algorithms can recover the ground-truth feature matrix

when the weights for different features are highly correlated, which is common

in applications. This paper proposes a simple and natural alternating gradient

descent based algorithm, and shows that with a mild initialization it provably

recovers the ground-truth in the presence of strong correlations. In most

interesting cases, the correlation can be in the same order as the highest

possible. Our analysis also reveals its several favorable features including

robustness to noise. We complement our theoretical results with empirical

studies on semi-synthetic datasets, demonstrating its advantage over several

popular methods in recovering the ground-truth.

Convergence Analysis of Belief Propagation for Pairwise Linear Gaussian Models

Jian Du , Shaodan Ma , Yik-Chung Wu , Soummya Kar , José M. F. Moura

Comments: arXiv admin note: text overlap with arXiv:1704.03969

Subjects

:

Learning (cs.LG)

; Machine Learning (stat.ML)

Gaussian belief propagation (BP) has been widely used for distributed

inference in large-scale networks such as the smart grid, sensor networks, and

social networks, where local measurements/observations are scattered over a

wide geographical area. One particular case is when two neighboring agents

share a common observation. For example, to estimate voltage in the direct

current (DC) power flow model, the current measurement over a power line is

proportional to the voltage difference between two neighboring buses. When

applying the Gaussian BP algorithm to this type of problem, the convergence

condition remains an open issue. In this paper, we analyze the convergence

properties of Gaussian BP for this pairwise linear Gaussian model. We show

analytically that the updating information matrix converges at a geometric rate

to a unique positive definite matrix with arbitrary positive semidefinite

initial value and further provide the necessary and sufficient convergence

condition for the belief mean vector to the optimal estimate.

Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks

Joan Serrà , Alexandros Karatzoglou

Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016

Subjects

:

Learning (cs.LG)

; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

Recommendation algorithms that incorporate techniques from deep learning are

becoming increasingly popular. Due to the structure of the data coming from

recommendation domains (i.e., one-hot-encoded vectors of item preferences),

these algorithms tend to have large input and output dimensionalities that

dominate their overall size. This makes them difficult to train, due to the

limited memory of graphical processing units, and difficult to deploy on mobile

devices with limited hardware. To address these difficulties, we propose Bloom

embeddings, a compression technique that can be applied to the input and output

of neural network models dealing with sparse high-dimensional binary-coded

instances. Bloom embeddings are computationally efficient, and do not seriously

compromise the accuracy of the model up to 1/5 compression ratios. In some

cases, they even improve over the original accuracy, with relative increases up

to 12%. We evaluate Bloom embeddings on 7 data sets and compare it against 4

alternative methods, obtaining favorable results. We also discuss a number of

further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time

operation, zero or marginal space requirements, training time speedups, or the

fact that they do not require any change to the core model architecture or

training configuration.

Accelerated Dual Learning by Homotopic Initialization

Hadi Daneshmand , Hamed Hassani , Thomas Hofmann Subjects : Learning (cs.LG)

Gradient descent and coordinate descent are well understood in terms of their

asymptotic behavior, but less so in a transient regime often used for

approximations in machine learning. We investigate how proper initialization

can have a profound effect on finding near-optimal solutions quickly. We show

that a certain property of a data set, namely the boundedness of the

correlations between eigenfeatures and the response variable, can lead to

faster initial progress than expected by commonplace analysis. Convex

optimization problems can tacitly benefit from that, but this automatism does

not apply to their dual formulation. We analyze this phenomenon and devise

provably good initialization strategies for dual optimization as well as

heuristics for the non-convex case, relevant for deep learning. We find our

predictions and methods to be experimentally well-supported.

Exact Learning from an Honest Teacher That Answers Membership Queries

Nader H. Bshouty Subjects : Learning (cs.LG)

Given a teacher that holds a function (f:X o R) from some class of functions

(C). The teacher can receive from the learner an element~(d) in the domain (X)

(a query) and returns the value of the function in (d), (f(d)in R). The

learner goal is to find (f) with a minimum number of queries, optimal time

complexity, and optimal resources.

In this survey, we present some of the results known from the literature,

different techniques used, some new problems, and open problems.

A Well-Tempered Landscape for Non-convex Robust Subspace Recovery

Tyler Maunu , Teng Zhang , Gilad Lerman Subjects : Learning (cs.LG) ; Optimization and Control (math.OC); Machine Learning (stat.ML)

We present a mathematical analysis of a non-convex energy landscape for

Robust Subspace Recovery. We prove that an underlying subspace is the only

stationary point and local minimizer in a large neighborhood if a generic

condition holds for a dataset. We further show that if the generic condition is

satisfied, a geodesic gradient descent method over the Grassmannian manifold

can exactly recover the underlying subspace with proper initialization. The

condition is shown to hold with high probability for a certain model of data.

MNL-Bandit: A Dynamic Learning Approach to Assortment Selection

Shipra Agrawal , Vashist Avadhanula , Vineet Goyal , Assaf Zeevi Subjects : Learning (cs.LG)

We consider a dynamic assortment selection problem, where in every round the

retailer offers a subset (assortment) of (N) substitutable products to a

consumer, who selects one of these products according to a multinomial logit

(MNL) choice model. The retailer observes this choice and the objective is to

dynamically learn the model parameters, while optimizing cumulative revenues

over a selling horizon of length (T). We refer to this exploration-exploitation

formulation as the MNL-Bandit problem. Existing methods for this problem follow

an “explore-then-exploit” approach, which estimate parameters to a desired

accuracy and then, treating these estimates as if they are the correct

parameter values, offers the optimal assortment based on these estimates. These

approaches require certain a priori knowledge of “separability”, determined by

the true parameters of the underlying MNL model, and this in turn is critical

in determining the length of the exploration period. (Separability refers to

the distinguishability of the true optimal assortment from the other

sub-optimal alternatives.) In this paper, we give an efficient algorithm that

simultaneously explores and exploits, achieving performance independent of the

underlying parameters. The algorithm can be implemented in a fully online

manner, without knowledge of the horizon length (T). Furthermore, the algorithm

is adaptive in the sense that its performance is near-optimal in both the “well

separated” case, as well as the general parameter setting where this separation

need not hold.

Recurrent Neural Networks with Top-k Gains for Session-based Recommendations

Balázs Hidasi , Alexandros Karatzoglou Subjects : Learning (cs.LG)

RNNs have been shown to be excellent models for sequential data and in

particular for session-based user behavior. The use of RNNs provides impressive

performance benefits over classical methods in session-based recommendations.

In this work we introduce a novel ranking loss function tailored for RNNs in

recommendation settings. The better performance of such loss over alternatives,

along with further tricks and improvements described in this work, allow to

achieve an overall improvement of up to 35% in terms of MRR and Recall@20 over

previous session-based RNN solutions and up to 51% over classical collaborative

filtering approaches. Unlike data augmentation-based improvements, our method

does not increase training times significantly.

SmoothGrad: removing noise by adding noise

Daniel Smilkov , Nikhil Thorat , Been Kim , Fernanda Viégas , Martin Wattenberg

Comments: 10 pages

Subjects

:

Learning (cs.LG)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Explaining the output of a deep network remains a challenge. In the case of

an image classifier, one type of explanation is to identify pixels that

strongly influence the final decision. A starting point for this strategy is

the gradient of the class score function with respect to the input image. This

gradient can be interpreted as a sensitivity map, and there are several

techniques that elaborate on this basic idea. This paper makes two

contributions: it introduces SmoothGrad, a simple method that can help visually

sharpen gradient-based sensitivity maps, and it discusses lessons in the

visualization of these maps. We publish the code for our experiments and a

website with our results.

Lost Relatives of the Gumbel Trick

Matej Balog , Nilesh Tripuraneni , Zoubin Ghahramani , Adrian Weller

Comments: 34th International Conference on Machine Learning (ICML 2017)

Subjects

:

Machine Learning (stat.ML)

; Learning (cs.LG)

The Gumbel trick is a method to sample from a discrete probability

distribution, or to estimate its normalizing partition function. The method

relies on repeatedly applying a random perturbation to the distribution in a

particular way, each time solving for the most likely configuration. We derive

an entire family of related methods, of which the Gumbel trick is one member,

and show that the new methods have superior properties in several settings with

minimal additional computational cost. In particular, for the Gumbel trick to

yield computational benefits for discrete graphical models, Gumbel

perturbations on all configurations are typically replaced with so-called

low-rank perturbations. We show how a subfamily of our new methods adapts to

this setting, proving new upper and lower bounds on the log partition function

and deriving a family of sequential samplers for the Gibbs distribution.

Finally, we balance the discussion by showing how the simpler analytical form

of the Gumbel trick enables additional theoretical results.

Zero-Shot Relation Extraction via Reading Comprehension

Omer Levy , Minjoon Seo , Eunsol Choi , Luke Zettlemoyer

Comments: CoNLL 2017

Subjects

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Learning (cs.LG)

We show that relation extraction can be reduced to answering simple reading

comprehension questions, by associating one or more natural-language questions

with each relation slot. This reduction has several advantages: we can (1)

learn relation-extraction models by extending recent neural

reading-comprehension techniques, (2) build very large training sets for those

models by combining relation-specific crowd-sourced questions with distant

supervision, and even (3) do zero-shot learning by extracting new relation

types that are only specified at test-time, for which we have no labeled

training examples. Experiments on a Wikipedia slot-filling task demonstrate

that the approach can generalize to new questions for known relation types with

high accuracy, and that zero-shot generalization to unseen relation types is

possible, at lower accuracy levels, setting the bar for future work on this

task.

Interaction-Based Distributed Learning in Cyber-Physical and Social Networks

Francesco Sasso , Angelo Coluccia , Giuseppe Notarstefano Subjects : Optimization and Control (math.OC) ; Learning (cs.LG); Statistics Theory (math.ST)

In this paper we consider a network scenario in which agents can evaluate

each other according to a score graph that models some physical or social

interaction. The goal is to design a distributed protocol, run by the agents,

allowing them to learn their unknown state among a finite set of possible

values. We propose a Bayesian framework in which scores and states are

associated to probabilistic events with unknown parameters and hyperparameters

respectively. We prove that each agent can learn its state by means of a local

Bayesian classifier and a (centralized) Maximum-Likelihood (ML) estimator of

the parameter-hyperparameter that combines plain ML and Empirical Bayes

approaches. By using tools from graphical models, which allow us to gain

insight on conditional dependences of scores and states, we provide two relaxed

probabilistic models that ultimately lead to ML parameter-hyperparameter

estimators amenable to distributed computation. In order to highlight the

appropriateness of the proposed relaxations, we demonstrate the distributed

estimators on a machine-to-machine testing set-up for anomaly detection and on

a social interaction set-up for user profiling.

Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation

Jinzhuo Wang , Wenmin Wang , Ronggang Wang , Wen Gao

Comments: AAAI 2017

Subjects

:

Artificial Intelligence (cs.AI)

; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Monte Carlo tree search (MCTS) is extremely popular in computer Go which

determines each action by enormous simulations in a broad and deep search tree.

However, human experts select most actions by pattern analysis and careful

evaluation rather than brute search of millions of future nteractions. In this

paper, we propose a computer Go system that follows experts way of thinking and

playing. Our system consists of two parts. The first part is a novel deep

alternative neural network (DANN) used to generate candidates of next move.

Compared with existing deep convolutional neural network (DCNN), DANN inserts

recurrent layer after each convolutional layer and stacks them in an

alternative manner. We show such setting can preserve more contexts of local

features and its evolutions which are beneficial for move prediction. The

second part is a long-term evaluation (LTE) module used to provide a reliable

evaluation of candidates rather than a single probability from move predictor.

This is consistent with human experts nature of playing since they can foresee

tens of steps to give an accurate estimation of candidates. In our system, for

each candidate, LTE calculates a cumulative reward after several future

interactions when local variations are settled. Combining criteria from the two

parts, our system determines the optimal choice of next move. For more

comprehensive experiments, we introduce a new professional Go dataset (PGD),

consisting of 253233 professional records. Experiments on GoGoD and PGD

datasets show the DANN can substantially improve performance of move prediction

over pure DCNN. When combining LTE, our system outperforms most relevant

approaches and open engines based on MCTS.

Recurrent Latent Variable Networks for Session-Based Recommendation

Sotirios Chatzis , Panayiotis Christodoulou , Andreas S. Andreou Subjects : Information Retrieval (cs.IR) ; Learning (cs.LG); Machine Learning (stat.ML)

In this work, we attempt to ameliorate the impact of data sparsity in the

context of session-based recommendation. Specifically, we seek to devise a

machine learning mechanism capable of extracting subtle and complex underlying

temporal dynamics in the observed session data, so as to inform the

recommendation algorithm. To this end, we improve upon systems that utilize

deep learning techniques with recurrently connected units; we do so by adopting

concepts from the field of Bayesian statistics, namely variational inference.

Our proposed approach consists in treating the network recurrent units as

stochastic latent variables with a prior distribution imposed over them. On

this basis, we proceed to infer corresponding posteriors; these can be used for

prediction and recommendation generation, in a way that accounts for the

uncertainty in the available sparse training data. To allow for our approach to

easily scale to large real-world datasets, we perform inference under an

approximate amortized variational inference (AVI) setup, whereby the learned

posteriors are parameterized via (conventional) neural networks. We perform an

extensive experimental evaluation of our approach using challenging benchmark

datasets, and illustrate its superiority over existing state-of-the-art

techniques.

Item Difficulty-Based Label Aggregation Models for Crowdsourcing

Chi Hong Subjects : Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Learning (cs.LG)

A large amount of labeled data is required for supervised learning. However,

labeling by domain experts is expensive and time-consuming. A low cost and high

efficiency way to obtain large training datasets is to aggregate noisy labels

collected from non-professional crowds. Prior works have proposed confusion

matrices to evaluate the reliability of workers. In this paper, we redefine the

structure of the confusion matrices and propose two Bayesian Network based

methods which utilize item difficulty in label aggregation. We assume that

labels are generated by a probability distribution over confusion matrices,

item difficulties, labels and true labels. We use Markov chain Monte Carlo

method to generate samples from the posterior distribution of model parameters

and then infer the results. To avoid bad local optima, we design a method to

preliminarily predict the difficulty of each item and initialize the model

parameters. We also introduce how to improve the scalability of our model.

Empirical results show that our methods consistently outperform

state-of-the-art methods.

Analyzing the Robustness of Nearest Neighbors to Adversarial Examples

Yizhen Wang , Somesh Jha , Kamalika Chaudhuri Subjects : Machine Learning (stat.ML) ; Cryptography and Security (cs.CR); Learning (cs.LG)

Motivated by applications such as autonomous vehicles, test-time attacks via

adversarial examples have received a great deal of recent attention. In this

setting, an adversary is capable of making queries to a classifier, and

perturbs a test example by a small amount in order to force the classifier to

report an incorrect label. While a long line of work has explored a number of

attacks, not many reliable defenses are known, and there is an overall lack of

general understanding about the foundations of designing machine learning

algorithms robust to adversarial examples.

In this paper, we take a step towards addressing this challenging question by

introducing a new theoretical framework, analogous to bias-variance theory,

which we can use to tease out the causes of vulnerability. We apply our

framework to a simple classification algorithm: nearest neighbors, and analyze

its robustness to adversarial examples. Motivated by our analysis, we propose a

modified version of the nearest neighbor algorithm, and demonstrate both

theoretically and empirically that it has superior robustness to standard

nearest neighbors.

SEP-Nets: Small and Effective Pattern Networks

Zhe Li , Xiaoyu Wang , Xutao Lv , Tianbao Yang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG)

While going deeper has been witnessed to improve the performance of

convolutional neural networks (CNN), going smaller for CNN has received

increasing attention recently due to its attractiveness for mobile/embedded

applications. It remains an active and important topic how to design a small

network while retaining the performance of large and deep CNNs (e.g., Inception

Nets, ResNets). Albeit there are already intensive studies on compressing the

size of CNNs, the considerable drop of performance is still a key concern in

many designs. This paper addresses this concern with several new contributions.

First, we propose a simple yet powerful method for compressing the size of deep

CNNs based on parameter binarization. The striking difference from most

previous work on parameter binarization/quantization lies at different

treatments of (1 imes 1) convolutions and (k imes k) convolutions ((k>1)),

where we only binarize (k imes k) convolutions into binary patterns. The

resulting networks are referred to as pattern networks. By doing this, we show

that previous deep CNNs such as GoogLeNet and Inception-type Nets can be

compressed dramatically with marginal drop in performance. Second, in light of

the different functionalities of (1 imes 1) (data projection/transformation)

and (k imes k) convolutions (pattern extraction), we propose a new block

structure codenamed the pattern residual block that adds transformed feature

maps generated by (1 imes 1) convolutions to the pattern feature maps

generated by (k imes k) convolutions, based on which we design a small network

with (sim 1) million parameters. Combining with our parameter binarization, we

achieve better performance on ImageNet than using similar sized networks

including recently released Google MobileNets.

A Direction Search and Spectral Clustering Based Approach to Subspace Clustering

Mostafa Rahmani , George Atia Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Information Retrieval (cs.IR); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

This paper presents a new spectral-clustering-based approach to the subspace

clustering problem in which the data lies in the union of an unknown number of

unknown linear subspaces. Underpinning the proposed method is a convex program

for optimal direction search, which for each data point d, finds an optimal

direction in the span of the data that has minimum projection on the other data

points and non-vanishing projection on d. The obtained directions are

subsequently leveraged to identify a neighborhood set for each data point. An

Alternating Direction Method of Multipliers (ADMM) framework is provided to

efficiently solve for the optimal directions. The proposed method is shown to

often outperform the existing subspace clustering methods, particularly for

unwieldy scenarios involving high levels of noise and close subspaces, and

yields the state-of-the-art results for the problem of face clustering using

subspace segmentation.

Adversarial Feature Matching for Text Generation

Yizhe Zhang , Zhe Gan , Kai Fan , Zhi Chen , Ricardo Henao , Dinghan Shen , Lawrence Carin

Comments: Accepted by ICML 2017

Subjects

:

Machine Learning (stat.ML)

; Computation and Language (cs.CL); Learning (cs.LG)

The Generative Adversarial Network (GAN) has achieved great success in

generating realistic (real-valued) synthetic data. However, convergence issues

and difficulties dealing with discrete data hinder the applicability of GAN to

text. We propose a framework for generating realistic text via adversarial

training. We employ a long short-term memory network as generator, and a

convolutional network as discriminator. Instead of using the standard objective

of GAN, we propose matching the high-dimensional latent feature distributions

of real and synthetic sentences, via a kernelized discrepancy metric. This

eases adversarial training by alleviating the mode-collapsing problem. Our

experiments show superior performance in quantitative evaluation, and

demonstrate that our model can generate realistic-looking sentences.

Encoding of phonology in a recurrent neural model of grounded speech

Afra Alishahi , Marie Barking , Grzegorz Chrupała

Comments: Accepted at CoNLL 2017

Subjects

:

Computation and Language (cs.CL)

; Learning (cs.LG); Sound (cs.SD)

We study the representation and encoding of phonemes in a recurrent neural

network model of grounded speech. We use a model which processes images and

their spoken descriptions, and projects the visual and auditory representations

into the same semantic space. We perform a number of analyses on how

information about individual phonemes is encoded in the MFCC features extracted

from the speech signal, and the activations of the layers of the model. Via

experiments with phoneme decoding and phoneme discrimination we show that

phoneme representations are most salient in the lower layers of the model,

where low-level signals are processed at a fine-grained level, although a large

amount of phonological information is retain at the top recurrent layer. We

further find out that the attention mechanism following the top recurrent layer

significantly attenuates encoding of phonology and makes the utterance

embeddings much more invariant to synonymy. Moreover, a hierarchical clustering

of phoneme representations learned by the network shows an organizational

structure of phonemes similar to those proposed in linguistics.

Optimal Auctions through Deep Learning

Paul Dütting , Zhe Feng , Harikrishna Narasimhan , David C. Parkes Subjects : Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Learning (cs.LG)

Designing an auction that maximizes expected revenue is an intricate task.

Indeed, as of today–despite major efforts and impressive progress over the

past few years–only the single-item case is fully understood. In this work, we

initiate the exploration of the use of tools from deep learning on this topic.

The design objective is revenue optimal, dominant-strategy incentive compatible

auctions. We show that multi-layer neural networks can learn almost-optimal

auctions for settings for which there are analytical solutions, such as

Myerson’s auction for a single item, Manelli and Vincent’s mechanism for a

single bidder with additive preferences over two items, or Yao’s auction for

two additive bidders with binary support distributions and multiple items, even

if no prior knowledge about the form of optimal auctions is encoded in the

network and the only feedback during training is revenue and regret. We further

show how characterization results, even rather implicit ones such as Rochet’s

characterization through induced utilities and their gradients, can be

leveraged to obtain more precise fits to the optimal design. We conclude by

demonstrating the potential of deep learning for deriving optimal auctions with

high revenue for poorly understood problems.

Information Theory

Wiretap Channels: Nonasymptotic Fundamental Limits

Wei Yang , Rafael F. Schaefer , H. Vincent Poor

Comments: 53 pages, 3 figures

Subjects

:

Information Theory (cs.IT)

This paper investigates the maximal secret communication rate over a wiretap

channel subject to reliability and secrecy constraints at a given blocklength.

New achievability and converse bounds are derived, which are uniformly tighter

than existing bounds, and lead to the tightest bounds on the second-order

coding rate for discrete memoryless and Gaussian wiretap channels. The exact

second-order coding rate is established for semi-deterministic wiretap

channels, which characterizes the optimal tradeoff between reliability and

secrecy in the finite-blocklength regime. Underlying our achievability bounds

are two new privacy amplification results, which not only refine the existing

results, but also achieve stronger notions of secrecy.

Fast Maximum-Likelihood Decoder for 4*4 Quasi-Orthogonal Space-Time Block Code

Adel Ahmadi , Siamak Talebi Subjects : Information Theory (cs.IT)

This letter introduces two fast maximum-likelihood (ML) detection methods for

4*4 quasi-orthogonal space-time block code (QOSTBC). The first algorithm with a

relatively simple design exploits structure of quadrature amplitude modulation

(QAM) constellations to achieve its goal and the second algorithm, though

somewhat more complex, can be applied to any arbitrary constellation. Both

decoders utilize a novel decomposition technique for ML metric which divides

the metric into independent positive parts and a positive interference part.

Search spaces of symbols are substantially reduced by employing the independent

parts and statistics of noise. Finally, the members of search spaces are

successively evaluated until the metric is minimized. Simulation results

confirm that the proposed decoder is superior to some of the most recently

published methods in terms of complexity level. More specifically, the results

verified that application of the new algorithm with 1024-QAM would require

reduced computational complexity compared to state-of-the-art solution with

16-QAM.

Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization

Dingwen Tao , Sheng Di , Zizhong Chen , Franck Cappello

Comments: Accepted by IPDPS’17, 11 pages, 10 figures, double column

Subjects

:

Information Theory (cs.IT)

Today’s HPC applications are producing extremely large amounts of data, such

that data storage and analysis are becoming more challenging for scientific

research. In this work, we design a new error-controlled lossy compression

algorithm for large-scale scientific data. Our key contribution is

significantly improving the prediction hitting rate (or prediction accuracy)

for each data point based on its nearby data values along multiple dimensions.

We derive a series of multilayer prediction formulas and their unified formula

in the context of data compression. One serious challenge is that the data

prediction has to be performed based on the preceding decompressed values

during the compression in order to guarantee the error bounds, which may

degrade the prediction accuracy in turn. We explore the best layer for the

prediction by considering the impact of compression errors on the prediction

accuracy. Moreover, we propose an adaptive error-controlled quantization

encoder, which can further improve the prediction hitting rate considerably.

The data size can be reduced significantly after performing the variable-length

encoding because of the uneven distribution produced by our quantization

encoder. We evaluate the new compressor on production scientific data sets and

compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP,

SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class,

especially with regard to compression factors (or bit-rates) and compression

errors (including RMSE, NRMSE, and PSNR). Our solution is better than the

second-best solution by more than a 2x increase in the compression factor and

3.8x reduction in the normalized root mean squared error on average, with

reasonable error bounds and user-desired bit-rates.

Approximate Optimal Designs for Multivariate Polynomial Regression

Yohann De Castro , Fabrice Gamboa , Didier Henrion , Roxana Hess , Jean-Bernard Lasserre

Comments: 24 Pages, 5 Figures. arXiv admin note: substantial text overlap with arXiv:1703.01777

Subjects

:

Statistics Theory (math.ST)

; Information Theory (cs.IT); Numerical Analysis (math.NA); Computation (stat.CO); Methodology (stat.ME)

We introduce a new approach aiming at computing approximate optimal designs

for multivariate polynomial regressions on compact (semi-algebraic) design

spaces. We use the moment-sum-of-squares hierarchy of semidefinite programming

problems to solve numerically and approximately the optimal design problem. The

geometry of the design is recovered via semidefinite programming duality

theory. This article shows that the hierarchy converges to the approximate

optimal design as the order of the hierarchy increases. Furthermore, we provide

a dual certificate ensuring finite convergence of the hierarchy and showing

that the approximate optimal design can be computed exactly in polynomial time

thanks to our method. Finite convergence of the hierarchy can be standardly

certified numerically as well. As a byproduct, we revisit the equivalence

theorem of the experimental design theory: it is linked to the Christoffel

polynomial and it characterizes finite convergence of the moment-sum-of-square

hierarchies.

欢迎加入我爱机器学习QQ11群:191401275

arXiv Paper Daily: Wed, 14 Jun 2017

微信扫一扫,关注我爱机器学习公众号

微博:我爱机器学习


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

The Golden Ticket

The Golden Ticket

Lance Fortnow / Princeton University Press / 2013-3-31 / USD 26.95

The P-NP problem is the most important open problem in computer science, if not all of mathematics. The Golden Ticket provides a nontechnical introduction to P-NP, its rich history, and its algorithmi......一起来看看 《The Golden Ticket》 这本书的介绍吧!

XML、JSON 在线转换
XML、JSON 在线转换

在线XML、JSON转换工具

XML 在线格式化
XML 在线格式化

在线 XML 格式化压缩工具

RGB HSV 转换
RGB HSV 转换

RGB HSV 互转工具