arXiv Paper Daily: Thu, 15 Jun 2017

Neural and Evolutionary Computing

A Fast Foveated Fully Convolutional Network Model for Human Peripheral Vision

Lex Fridman, Benedikt Jenik, Shaiyan Keshvari, Bryan Reimer, Christoph Zetzsche, Ruth Rosenholtz

Comments: NIPS 2017 submission

Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

Visualizing the information available to a human observer in a single glance

at an image provides a powerful tool for evaluating models of full-field human

vision. The hard part is human-realistic visualization of the periphery.

Degradation of information with distance from fixation is far more complex than

a mere reduction of acuity that might be mimicked using blur with a standard

deviation that linearly increases with eccentricity. Rather,

behaviorally-validated models hypothesize that peripheral vision measures a

large number of local texture statistics in pooling regions that overlap, grow

with eccentricity, and tile the visual field. We propose a “foveated” variant

of a fully convolutional network that approximates one such model. Our approach

achieves a 21,000-fold reduction in average running time (from 4.2 hours to 0.7

seconds per image), and statistically similar results to the

behaviorally-validated model.
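
The reported speedup can be checked directly from the two running times quoted in the abstract. The short calculation below is only that arithmetic; the variable names are ours and nothing here comes from the paper's code.

```python
# Running times quoted in the abstract (per image).
baseline_seconds = 4.2 * 3600   # 4.2 hours for the behaviorally validated model
foveated_seconds = 0.7          # 0.7 seconds for the foveated network

# Ratio of the two running times.
speedup = baseline_seconds / foveated_seconds
print(f"speedup is roughly {speedup:,.0f}x")
```

The ratio comes out slightly above the rounded "21,000-fold" figure stated in the abstract, which is consistent with the quoted times being rounded.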

MATIC: Adaptation and In-situ Canaries for Energy-Efficient Neural Network Acceleration

Sung Kim, Patrick Howe, Thierry Moreau, Armin Alaghi, Luis Ceze, Visvesh Sathe

Subjects: Neural and Evolutionary Computing (cs.NE)

We present MATIC (Memory-Adaptive Training and In-situ Canaries), a voltage

scaling methodology that addresses the SRAM efficiency bottleneck in DNN

accelerators. To overscale DNN weight SRAMs, MATIC combines specific

characteristics of destructive SRAM reads with the error resilience of neural

networks in a memory-adaptive training process. PVT-related voltage margins are

eliminated using bit-cells from synaptic weights as in-situ canaries to track

runtime environmental variation. Demonstrated on a low-power DNN accelerator

fabricated in 65nm CMOS, MATIC enables up to 3.3x total energy reduction, or

18.6x application error reduction.

Neural Models for Key Phrase Detection and Question Generation

Sandeep Subramanian, Tong Wang, Xingdi Yuan, Adam Trischler

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

We propose several neural models arranged in a two-stage framework to tackle

question generation from documents. First, we estimate the probability of

“interesting” answers in a document using a neural model trained on a

question-answering corpus. The predicted key phrases are then used as answers

to condition a sequence-to-sequence question generation model. Empirically, our

neural key phrase detection models significantly outperform an entity-tagging

baseline system. We demonstrate that the question generator formulates good

quality natural language questions from extracted key phrases. The resulting

questions and answers can be used to assess reading comprehension in

educational settings.

Transfer entropy-based feedback improves performance in artificial neural networks

Sebastian Herzog, Christian Tetzlaff, Florentin Wörgötter

Subjects: Learning (cs.LG); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE)

The structure of the majority of modern deep neural networks is characterized

by unidirectional feed-forward connectivity across a very large number of

layers. By contrast, the architecture of the cortex of vertebrates contains

fewer hierarchical levels but many recurrent and feedback connections. Here we

show that a small, few-layer artificial neural network that employs feedback

will reach top level performance on a standard benchmark task, otherwise only

obtained by large feed-forward structures. To achieve this we use feed-forward

transfer entropy between neurons to structure feedback connectivity. Transfer

entropy can here intuitively be understood as a measure for the relevance of

certain pathways in the network, which are then amplified by feedback. Feedback

may therefore be key for high network performance in small brain-like


Adversarially Regularized Autoencoders for Generating Discrete Structures

Junbo (Jake) Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, Yann LeCun

Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Generative adversarial networks are an effective approach for learning rich

latent representations of continuous data, but have proven difficult to apply

directly to discrete structured data, such as text sequences or discretized

images. Ideally we could encode discrete structures in a continuous code space

to avoid this problem, but it is difficult to learn an appropriate

general-purpose encoder. In this work, we consider a simple approach for

handling these two challenges jointly, employing a discrete structure

autoencoder with a code space regularized by generative adversarial training.

The model learns a smooth regularized code space while still being able to

model the underlying data, and can be used as a discrete GAN with the ability

to generate coherent discrete outputs from continuous samples. We demonstrate

empirically how key properties of the data are captured in the model’s latent

space, and evaluate the model itself on the tasks of discrete image generation,

text generation, and semi-supervised learning.

Identifying Spatial Relations in Images using Convolutional Neural Networks

Mandar Haldekar, Ashwinkumar Ganesan, Tim Oates

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Traditional approaches to building a large scale knowledge graph have usually

relied on extracting information (entities, their properties, and relations

between them) from unstructured text (e.g., DBpedia). Recent advances in

Convolutional Neural Networks (CNN) allow us to shift our focus to learning

entities and relations from images, as they build robust models that require

little or no pre-processing of the images. In this paper, we present an

approach to identify and extract spatial relations (e.g., The girl is standing

behind the table) from images using CNNs. Our research addresses two specific

challenges: providing insight into how spatial relations are learned by the

network and which parts of the image are used to predict these relations. We

use the pre-trained network VGGNet to extract features from an image and train

a Multi-layer Perceptron (MLP) on a set of synthetic images and the sun09

dataset to extract spatial relations. The MLP predicts spatial relations

without a bounding box around the objects or the space in the image depicting

the relation. To understand how the spatial relations are represented in the

network, a heatmap is overlaid on the image to show the regions that are

deemed important by the network. Also, we analyze the MLP to show the

relationship between the activation of consistent groups of nodes and the

prediction of a spatial relation. We show how the loss of these groups affects

the network's ability to identify relations.

Computer Vision and Pattern Recognition

Learning without Prejudice: Avoiding Bias in Webly-Supervised Action Recognition

Christian Rupprecht, Ansh Kapil, Nan Liu, Lamberto Ballan, Federico Tombari

Comments: Submitted to CVIU SI: Computer Vision and the Web

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Webly-supervised learning has recently emerged as an alternative paradigm to

traditional supervised learning based on large-scale datasets with manual

annotations. The key idea is that models such as CNNs can be learned from the

noisy visual data available on the web. In this work we aim to exploit web data

for video understanding tasks such as action recognition and detection. One of

the main problems in webly-supervised learning is cleaning the noisy labeled

data from the web. The state-of-the-art paradigm relies on training a first

classifier on noisy data that is then used to clean the remaining dataset. Our

key insight is that this procedure biases the second classifier towards samples

that the first one understands. Here we train two independent CNNs, an RGB

network on web images and video frames and a second network using temporal

information from optical flow. We show that training the networks independently

is vastly superior to selecting the frames for the flow classifier by using our

RGB network. Moreover, we show benefits in enriching the training set with

different data sources from heterogeneous public web databases. We demonstrate

that our framework outperforms all other webly-supervised methods on two public

benchmarks, UCF-101 and Thumos’14.

Learning local shape descriptors with view-based convolutional networks

Haibin Huang, Evangelos Kalogerakis, Siddhartha Chaudhuri, Duygu Ceylan, Vladimir G. Kim, Ersin Yumer

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

We present a new local descriptor for 3D shapes, directly applicable to a

wide range of shape analysis problems such as point correspondences, semantic

segmentation, affordance prediction, and shape-to-scan matching. Our key

insight is that the neighborhood of a point on a shape is effectively captured

at multiple scales by a succession of progressively zoomed out views, taken

from carefully selected camera positions. We propose a convolutional neural

network that uses local views around a point to embed it to a multidimensional

descriptor space, such that geometrically and semantically similar points are

close to one another. To train our network, we leverage two extremely large

sources of data. First, since our network processes 2D images, we repurpose

architectures pre-trained on massive image datasets. Second, we automatically

generate a synthetic dense correspondence dataset by part-aware, non-rigid

alignment of a massive collection of 3D models. As a result of these design

choices, our view-based architecture effectively encodes multi-scale local

context and fine-grained surface detail. We demonstrate through several

experiments that our learned local descriptors are more general and robust

compared to state-of-the-art alternatives, and have a variety of applications

without any additional fine-tuning.

Large-Scale YouTube-8M Video Understanding with Deep Neural Networks

Manuk Akopyan (1), Eshsou Khashba (1) ((1) Institute for System Programming)

Comments: 6 pages, 5 figures, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The video classification problem has been studied for many years. The success of

Convolutional Neural Networks (CNN) in image recognition tasks gives a powerful

incentive for researchers to create more advanced video classification

approaches. As video has temporal content, Long Short-Term Memory (LSTM)

networks become a handy tool for modeling long-term temporal clues. Both

approaches need a large dataset of input data. In this paper, three models

are provided to address video classification using the recently announced YouTube-8M

large-scale dataset. The first model is based on a frame pooling approach. The two

other models are based on LSTM networks. A Mixture of Experts intermediate layer is

used in the third model, increasing model capacity without dramatically

increasing computation. A set of experiments for handling imbalanced

training data has also been conducted.
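
As a rough illustration of why a Mixture of Experts layer adds capacity cheaply, the NumPy sketch below combines several small expert classifiers through a softmax gate. It is a generic formulation under our own assumptions (sigmoid experts, a linear gate, the function name `moe_forward`, and all shapes are hypothetical) and is not the authors' architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def moe_forward(x, gate_w, expert_w):
    """x: (batch, d), gate_w: (d, k), expert_w: (k, d, c) -> (batch, c)."""
    gates = softmax(x @ gate_w)                                  # (batch, k), rows sum to 1
    expert_out = sigmoid(np.einsum("bd,kdc->bkc", x, expert_w))  # (batch, k, c)
    # Gate-weighted average of the expert predictions.
    return np.einsum("bk,bkc->bc", gates, expert_out)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))            # hypothetical video-level features
gate_w = rng.normal(size=(16, 3))       # 3 experts
expert_w = rng.normal(size=(3, 16, 5))  # 5 output classes
probs = moe_forward(x, gate_w, expert_w)
print(probs.shape)
```

Each expert is only a single linear map here, so adding an expert grows the parameter count linearly while the output remains a convex combination of per-expert probabilities.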

SalProp: Salient object proposals via aggregated edge cues

Prerana Mukherjee, Brejesh Lall, Sarvaswa Tandon

Comments: 5 pages, 4 figures, accepted at ICIP 2017

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a novel object proposal generation scheme by

formulating a graph-based salient edge classification framework that utilizes

the edge context. In the proposed method, we construct a Bayesian probabilistic

edge map to assign a saliency value to the edgelets by exploiting low level

edge features. A Conditional Random Field is then learned to effectively

combine these features for edge classification with object/non-object label. We

propose an objectness score for the generated windows by analyzing the salient

edge density inside the bounding box. Extensive experiments on PASCAL VOC 2007

dataset demonstrate that the proposed method gives competitive performance

against 10 popular generic object detection techniques while using a smaller number

of proposals.
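
To make the idea of scoring a window by its salient edge density concrete, here is a minimal sketch. It assumes the score is simply the mean salient-edge value inside the box; the abstract does not give the exact formula, so `edge_density` and the box convention are hypothetical.

```python
import numpy as np

def edge_density(salient_edges, box):
    """Mean salient-edge value inside box = (row0, col0, row1, col1), end-exclusive."""
    r0, c0, r1, c1 = box
    window = salient_edges[r0:r1, c0:c1]
    return float(window.sum()) / window.size

# Toy salient-edge map: a 2x2 block of salient edge pixels in a 4x4 image.
edges = np.zeros((4, 4))
edges[1:3, 1:3] = 1.0

tight = edge_density(edges, (1, 1, 3, 3))   # window hugging the salient edges
loose = edge_density(edges, (0, 0, 4, 4))   # window covering the whole image
print(tight, loose)
```

Under this assumption a tight window around the salient edges scores higher than a loose one, which is the behavior a proposal ranking would need.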

ν-net: Deep Learning for Generalized Biventricular Cardiac Mass and Function Parameters

Hinrich B Winther, Christian Hundt, Bertil Schmidt, Christoph Czerner, Johann Bauersachs, Frank Wacker, Jens Vogel-Claussen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Background: Cardiac MRI derived biventricular mass and function parameters,

such as end-systolic volume (ESV), end-diastolic volume (EDV), ejection

fraction (EF), stroke volume (SV), and ventricular mass (VM) are clinically

well established. Image segmentation can be challenging and time-consuming, due

to the complex anatomy of the human heart.

Objectives: This study introduces ν-net (/nju:nɛt/), a deep

learning approach allowing for fully-automated high quality segmentation of

right (RV) and left ventricular (LV) endocardium and epicardium for extraction

of cardiac function parameters.

Methods: A set consisting of 253 manually segmented cases has been used to

train a deep neural network. Subsequently, the network has been evaluated on 4

different multicenter data sets with a total of over 1000 cases.

Results: For LV EF the intraclass correlation coefficient (ICC) is 98, 95,

and 80 % (95 %), and for RV EF 96, and 87 % (80 %) on the respective data sets

(human expert ICCs reported in parentheses). The LV VM ICC is 95, and 94 % (84

%), and the RV VM ICC is 83, and 83 % (54 %). This study proposes a simple

adjustment procedure, allowing for the adaptation to distinct segmentation

philosophies. ν-net exhibits state-of-the-art performance in terms of Dice


Conclusions: Biventricular mass and function parameters can be determined

reliably in high quality by applying a deep neural network for cardiac MRI

segmentation, especially in the anatomically complex right ventricle. Adaptation

to individual segmentation styles by applying a simple adjustment procedure is

viable, allowing for the processing of novel data without time-consuming

additional training.
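
The function parameters named in this abstract follow from the segmented volumes by standard definitions: stroke volume is EDV minus ESV, and ejection fraction is stroke volume divided by EDV. The sketch below shows those formulas together with the Dice overlap commonly used to compare segmentation masks. The function names and the toy numbers are ours, not the paper's.

```python
import numpy as np

def stroke_volume(edv_ml, esv_ml):
    """SV = EDV - ESV, in millilitres."""
    return edv_ml - esv_ml

def ejection_fraction(edv_ml, esv_ml):
    """EF = SV / EDV, returned as a percentage."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml

def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A and B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total

sv = stroke_volume(120.0, 50.0)        # hypothetical volumes in ml
ef = ejection_fraction(120.0, 50.0)
dice = dice_coefficient([[1, 1, 0], [0, 1, 0]],
                        [[1, 0, 0], [0, 1, 1]])
print(sv, round(ef, 1), round(dice, 3))
```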

Alignment Distances on Systems of Bags

Alexander Sagel, Martin Kleinsteuber

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent research in image and video recognition indicates that many visual

processes can be thought of as being generated by a time-varying generative

model. A nearby descriptive model for visual processes is thus a statistical

distribution that varies over time. Specifically, modeling visual processes as

streams of histograms generated by a kernelized linear dynamic system turns out

to be efficient. We refer to such a model as a System of Bags. In this work, we

investigate Systems of Bags with special emphasis on dynamic scenes and dynamic

textures. Parameters of linear dynamic systems suffer from ambiguities. In

order to cope with these ambiguities in the kernelized setting, we develop a

kernelized version of the alignment distance. For its computation, we use a

Jacobi-type method and prove its convergence to a set of critical points. We

employ it as a dissimilarity measure on Systems of Bags. As such, it

outperforms other known dissimilarity measures for kernelized linear dynamic

systems, in particular the Martin Distance and the Maximum Singular Value

Distance, in every tested classification setting. A considerable margin can be

observed in settings where classification is performed with respect to an

abstract mean of video sets. For this scenario, the presented approach can

outperform state-of-the-art techniques, such as Dynamic Fractal Spectrum or

Orthogonal Tensor Dictionary Learning.

Shape-Color Differential Moment Invariants under Affine Transformations

Hanlin Mo, Shirui Li, You Hao, Hua Li

Comments: 13 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose the general construction formula of shape-color primitives by

using partial differentials of each color channel in this paper. By using all

kinds of shape-color primitives, shape-color differential moment invariants can

be constructed very easily, which are invariant to the shape affine and color

affine transforms. In total, 50 instances of SCDMIs are obtained. In experiments,

several commonly used color descriptors and SCDMIs are used in image

classification and retrieval of color images, respectively. By comparing the

experimental results, we find that SCDMIs get better results.

Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection

Zhe Wang, Yanxin Yin, Jianping Shi, Wei Fang, Hongsheng Li, Xiaogang Wang

Comments: accepted by MICCAI 2017

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a convolutional neural network-based algorithm for simultaneously

diagnosing diabetic retinopathy and highlighting suspicious regions. Our

contributions are twofold: 1) a network termed Zoom-in-Net which mimics the

zoom-in process of a clinician to examine the retinal images. Trained with only

image-level supervision, Zoom-in-Net can generate attention maps which

highlight suspicious regions, and predict the disease level accurately based

on both the whole image and its high resolution suspicious patches. 2) Only

four bounding boxes generated from the automatically learned attention maps are

enough to cover 80% of the lesions labeled by an experienced ophthalmologist,

which shows good localization ability of the attention maps. By clustering

features at high response locations on the attention maps, we discover

meaningful clusters which contain potential lesions in diabetic retinopathy.

Experiments show that our algorithm outperforms the state-of-the-art methods on

two datasets, EyePACS and Messidor.

Hierarchical Gaussian Descriptors with Application to Person Re-Identification

Tetsu Matsukawa, Takahiro Okabe, Einoshin Suzuki, Yoichi Sato

Comments: 14 pages, 12 figures, 4 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Describing the color and textural information of a person image is one of the

most crucial aspects of person re-identification (re-id). In this paper, we

present novel meta-descriptors based on a hierarchical distribution of pixel

features. Although hierarchical covariance descriptors have been successfully

applied to image classification, the mean information of pixel features, which

is absent from the covariance, tends to be the major discriminative information

for person re-id. To solve this problem, we describe a local region in an image

via hierarchical Gaussian distribution in which both means and covariances are

included in their parameters. More specifically, the region is modeled as a set

of multiple Gaussian distributions in which each Gaussian represents the

appearance of a local patch. The characteristics of the set of Gaussians are

again described by another Gaussian distribution. In both steps, we embed the

parameters of the Gaussian into a point of Symmetric Positive Definite (SPD)

matrix manifold. By changing the way to handle mean information in this

embedding, we develop two hierarchical Gaussian descriptors. Additionally, we

develop feature norm normalization methods with the ability to alleviate the

biased trends that exist on the descriptors. The experimental results conducted

on five public datasets indicate that the proposed descriptors achieve

remarkably high performance on person re-id.
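
One common way to place a Gaussian's parameters on the SPD manifold is to stack the mean and covariance into a single (d+1) x (d+1) matrix. The sketch below uses that embedding purely as an illustration; it is an assumption on our part, and the paper's two descriptors may scale the matrix or treat the mean differently. The helper name `gaussian_to_spd` and the random pixel features are hypothetical.

```python
import numpy as np

def gaussian_to_spd(mean, cov):
    """Embed N(mean, cov) as the block matrix [[cov + m m^T, m], [m^T, 1]]."""
    m = np.asarray(mean, dtype=float).reshape(-1, 1)   # column vector, shape (d, 1)
    cov = np.asarray(cov, dtype=float)                 # shape (d, d)
    top = np.hstack([cov + m @ m.T, m])
    bottom = np.hstack([m.T, np.ones((1, 1))])
    return np.vstack([top, bottom])

rng = np.random.default_rng(0)
pixels = rng.normal(size=(200, 3))                       # hypothetical pixel features of one patch
mu = pixels.mean(axis=0)
sigma = np.cov(pixels, rowvar=False) + 1e-6 * np.eye(3)  # small ridge keeps sigma positive definite
P = gaussian_to_spd(mu, sigma)
eigvals = np.linalg.eigvalsh(P)
print(P.shape, eigvals.min() > 0)
```

The Schur complement of the bottom-right entry is the covariance itself, so the embedded matrix is positive definite whenever the covariance is.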

Teaching Compositionality to CNNs

Austin Stone, Huayan Wang, Michael Stark, Yi Liu, D. Scott Phoenix, Dileep George

Comments: Preprint appearing in CVPR 2017

Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Convolutional neural networks (CNNs) have shown great success in computer

vision, approaching human-level performance when trained for specific tasks via

application-specific loss functions. In this paper, we propose a method for

augmenting and training CNNs so that their learned features are compositional.

It encourages networks to form representations that disentangle objects from

their surroundings and from each other, thereby promoting better

generalization. Our method is agnostic to the specific details of the

underlying CNN to which it is applied and can in principle be used with any

CNN. As we show in our experiments, the learned representations lead to feature

activations that are more localized and improve performance over

non-compositional baselines in object recognition tasks.

Photo-realistic Facial Texture Transfer

Parneet Kaur, Hang Zhang, Kristin J. Dana

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Style transfer methods have achieved significant success in recent years with

the use of convolutional neural networks. However, many of these methods

concentrate on artistic style transfer with few constraints on the output image

appearance. We address the challenging problem of transferring face texture

from a style face image to a content face image in a photorealistic manner

without changing the identity of the original content image. Our framework for

face texture transfer (FaceTex) augments the prior work of MRF-CNN with a novel

facial semantic regularization that incorporates a face prior regularization

smoothly suppressing the changes around facial meso-structures (e.g., eyes, nose

and mouth) and a facial structure loss function which implicitly preserves the

facial structure so that face texture can be transferred without changing the

original identity. We demonstrate results on face images and compare our

approach with recent state-of-the-art methods. Our results demonstrate superior

texture transfer because of the ability to maintain the identity of the

original face image.

Accurate Pulmonary Nodule Detection in Computed Tomography Images Using Deep Convolutional Neural Networks

Jia Ding, Aoxue Li, Zhiqiang Hu, Liwei Wang

Comments: MICCAI 2017 accepted

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Early detection of pulmonary cancer is the most promising way to enhance a

patient’s chance for survival. Accurate pulmonary nodule detection in computed

tomography (CT) images is a crucial step in diagnosing pulmonary cancer. In

this paper, inspired by the successful use of deep convolutional neural

networks (DCNNs) in natural image recognition, we propose a novel pulmonary

nodule detection approach based on DCNNs. We first introduce a deconvolutional

structure to Faster Region-based Convolutional Neural Network (Faster R-CNN)

for candidate detection on axial slices. Then, a three-dimensional DCNN is

presented for the subsequent false positive reduction. Experimental results of

the LUng Nodule Analysis 2016 (LUNA16) Challenge demonstrate the superior

detection performance of the proposed approach on nodule detection (average

FROC-score of 0.893, ranking 1st among all submitted results), which

outperforms the best result on the leaderboard of the LUNA16 Challenge (average

FROC-score of 0.864).

Saliency detection by aggregating complementary background template with optimization framework

Chenxing Xia, Hanling Zhang, Xiuju Gao

Comments: 28 pages, 10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes an unsupervised bottom-up saliency detection approach by

aggregating complementary background template with refinement. Feature vectors

are extracted from each superpixel to cover regional color, contrast and

texture information. By using these features, a coarse detection for salient

region is realized based on background template achieved by different

combinations of boundary regions instead of only treating four boundaries as

background. Then, by ranking the relevance of the image nodes with foreground

cues extracted from the former saliency map, we obtain an improved result.

Finally, smoothing operation is utilized to refine the foreground-based

saliency map to improve the contrast between salient and non-salient regions

until a close to binary saliency map is reached. Experimental results show that

the proposed algorithm generates more accurate saliency maps and performs

favorably against the state-of-the-art saliency detection methods on four

publicly available datasets.

When Image Denoising Meets High-Level Vision Tasks: A Deep Learning Approach

Ding Liu, Bihan Wen, Xianming Liu, Thomas S. Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Conventionally, image denoising and high-level vision tasks are handled

separately in computer vision, and their connection is fragile. In this paper,

we cope with the two jointly and explore the mutual influence between them,

with the focus on two questions, namely (1) how image denoising can help

solve high-level vision problems, and (2) how the semantic information from

high-level vision tasks can be used to guide image denoising. We propose a deep

convolutional neural network solution that cascades two modules for image

denoising and various high level tasks, respectively, and propose the use of

joint loss for training to allow the semantic information to flow into the

optimization of the denoising network via back-propagation. Our experimental

results demonstrate that the proposed architecture not only yields superior

image denoising results preserving fine details, but also overcomes the

performance degradation of different high-level vision tasks, e.g., image

classification and semantic segmentation, due to image noise or artifacts

caused by conventional denoising approaches such as over-smoothing.
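
A joint loss of the kind described can be sketched as a reconstruction term plus a weighted task term, so that gradients from the high-level task also reach the denoiser. The example below assumes mean squared error for denoising, cross-entropy for classification, and a hypothetical weight `task_weight`; the abstract does not specify these choices.

```python
import numpy as np

def mse(denoised, clean):
    """Reconstruction term: mean squared error between denoised and clean images."""
    return float(np.mean((denoised - clean) ** 2))

def cross_entropy(logits, label):
    """Task term: softmax cross-entropy for a single example."""
    shifted = logits - logits.max()                      # stabilise the exponentials
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def joint_loss(denoised, clean, logits, label, task_weight=0.25):
    """Weighted sum of the reconstruction term and the high-level task term."""
    return mse(denoised, clean) + task_weight * cross_entropy(logits, label)

clean = np.zeros((2, 2))             # toy clean image
denoised = np.full((2, 2), 0.1)      # toy denoiser output
logits = np.array([2.0, 0.0, 0.0])   # toy classifier output on the denoised image
loss = joint_loss(denoised, clean, logits, label=0)
print(round(loss, 4))
```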

AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces

Mahmoud Afifi, Abdelrahman Abdelhamed

Comments: submitted to Journal of Visual Communication and Image Representation. 26 pages, 7 figures, 7 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Gender classification aims at recognizing a person’s gender. Despite the high

accuracy achieved by state-of-the-art methods for this task, there is still room

for improvement in generalized and unrestricted datasets. In this paper, we

advocate a new strategy inspired by the behavior of humans in gender

recognition. Instead of dealing with the face image as a sole feature, we rely

on the combination of isolated facial features and a holistic feature which we

call the foggy face. Then, we use these features to train deep convolutional

neural networks followed by an AdaBoost-based score fusion to infer the final

gender class. We evaluate our method on four challenging datasets to

demonstrate its efficacy in achieving better or on-par accuracy with

state-of-the-art methods. In addition, we present a new face dataset that

intensifies the challenges of occluded faces and illumination changes, which we

believe to be a much-needed resource for gender classification research.

Action Search: Learning to Search for Human Activities in Untrimmed Videos

Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem (King Abdullah University of Science and Technology (KAUST))

Comments: 9 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Traditional approaches for action detection use trimmed data to learn

sophisticated action detector models. Although these methods have achieved

great success at detecting human actions, we argue that a great deal of information is

discarded when ignoring the process through which this trimmed data is

obtained. In this paper, we propose Action Search, a novel approach that mimics

the way people annotate activities in video sequences. Using a Recurrent Neural

Network, Action Search can efficiently explore a video and determine the time

boundaries during which an action occurs. Experiments on the THUMOS14 dataset

reveal that our model is not only able to explore the video efficiently but

also accurately find human activities, outperforming state-of-the-art methods.

von Mises-Fisher Mixture Model-based Deep learning: Application to Face Verification

Md. Abul Hasnat, Julien Bohné, Jonathan Milgram, Stéphane Gentric, Liming Chen

Comments: Under review

Subjects: Computer Vision and Pattern Recognition (cs.CV)

A number of pattern recognition tasks, e.g., face verification, can be boiled

down to classification or clustering of unit length directional feature vectors

whose distance can be simply computed by their angle. In this paper, we propose

the von Mises-Fisher (vMF) mixture model as the theoretical foundation for an

effective deep-learning of such directional features and derive a novel vMF

Mixture Loss and its corresponding vMF deep features. The proposed vMF feature

learning achieves discriminative learning, i.e., compacting the instances of

the same class while increasing the distance of instances from different

classes, and subsumes a number of loss functions or deep learning practices,

e.g., normalization. The experiments carried out on face verification using 4

different challenging face datasets, i.e., LFW, IJB-A, YouTube Faces and CACD,

show the effectiveness of the proposed approach, which displays very

competitive and state-of-the-art results.
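
The premise that unit-length feature vectors can be compared by their angle can be illustrated in a few lines. This sketch covers only the normalization and the angular distance, not the vMF mixture loss itself; the helper names and the toy vectors are ours.

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """Project each row onto the unit sphere."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)

def angular_distance(u, v):
    """Angle in radians between two unit-length vectors."""
    cos = np.clip(np.sum(u * v, axis=-1), -1.0, 1.0)   # clip guards against rounding error
    return np.arccos(cos)

feats = l2_normalize(np.array([[3.0, 4.0], [4.0, 3.0], [-3.0, -4.0]]))
d_close = angular_distance(feats[0], feats[1])   # similar directions
d_far = angular_distance(feats[0], feats[2])     # opposite directions
print(d_close < d_far)
```

Because only direction matters after normalization, the first and third vectors, which differ by a sign, end up as far apart as two points on the sphere can be.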

The "something something" video database for learning and evaluating visual common sense

Raghav Goyal, Samira Kahou, Vincent Michalski, Joanna Materzyńska, Susanne Westphal, Heuna Kim, Valentin Haenel, Ingo Fruend, Peter Yianilos, Moritz Mueller-Freitag, Florian Hoppe, Christian Thurau, Ingo Bax, Roland Memisevic

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Neural networks trained on datasets such as ImageNet have led to major

advances in visual object classification. One obstacle that prevents networks

from reasoning more deeply about complex scenes and situations, and from

integrating visual knowledge with natural language, like humans do, is their

lack of common sense knowledge about the physical world. Videos, unlike still

images, contain a wealth of detailed information about the physical world.

However, most labelled video datasets represent high-level concepts rather than

detailed physical aspects about actions and scenes. In this work, we describe

our ongoing collection of the “something-something” database of video

prediction tasks whose solutions require a common sense understanding of the

depicted situation. The database currently contains more than 100,000 videos

across 174 classes, which are defined as caption-templates. We also describe

the challenges in crowd-sourcing this data at scale.

Online Convolutional Dictionary Learning for Multimodal Imaging

Kevin Degraux, Ulugbek S. Kamilov, Petros T. Boufounos, Dehong Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Computational imaging methods that can exploit multiple modalities have the

potential to enhance the capabilities of traditional sensing systems. In this

paper, we propose a new method that reconstructs multimodal images from their

linear measurements by exploiting redundancies across different modalities. Our

method combines a convolutional group-sparse representation of images with

total variation (TV) regularization for high-quality multimodal imaging. We

develop an online algorithm that enables the unsupervised learning of

convolutional dictionaries on large-scale datasets that are typical in such

applications. We illustrate the benefit of our approach in the context of joint

intensity-depth imaging.

Automatic Localization of Deep Stimulation Electrodes Using Trajectory-based Segmentation Approach

Roger Gomez Nieto , Andres Marino Alvarez Meza , Julian David Echeverry Correa , Alvaro Angel Orozco Gutierrez

Comments: 13 pages, 5 figures

Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Neurons and Cognition (q-bio.NC)

Parkinson’s disease (PD) is a degenerative condition of the nervous system,

which manifests itself primarily as muscle stiffness, hypokinesia,

bradykinesia, and tremor. In patients suffering from advanced stages of PD,

Deep Brain Stimulation neurosurgery (DBS) is the best alternative to medical

treatment, especially when patients become tolerant to the drugs. This surgery

produces neuronal activity as a result of electrical stimulation, whose

quantification is known as the Volume of Tissue Activated (VTA). To correctly

locate the VTA in the cerebral volume space, one must know the exact

location of the tip of the DBS electrodes, as well as their spatial projection.

In this paper, we automatically locate DBS electrodes using a threshold-based

medical imaging segmentation methodology, determining the optimal value of this

threshold adaptively. The proposed methodology allows the localization of DBS

electrodes in Computed Tomography (CT) images, with high noise tolerance, using

automatic threshold detection methods.

Deep Learning Methods for Efficient Large Scale Video Labeling

Miha Skalic , Marcin Pekalski , Xingguo E. Pan

Comments: 7 pages, 5 tables, 1 figure

Subjects : Machine Learning (stat.ML) ; Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We present a solution to the “Google Cloud and YouTube-8M Video Understanding

Challenge” that ranked in 5th place. The proposed model is an ensemble of three

model families, two frame level and one video level. Training was performed

on an augmented dataset, with cross validation.

Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification

Yu-Gang Jiang , Zuxuan Wu , Jinhui Tang , Zechao Li , Xiangyang Xue , Shih-Fu Chang Subjects : Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)

Videos are inherently multimodal. This paper studies the problem of how to

fully exploit the abundant multimodal clues for improved video categorization.

We introduce a hybrid deep learning framework that integrates useful clues from

multiple modalities, including static spatial appearance information, motion

patterns within a short time window, audio information as well as long-range

temporal dynamics. More specifically, we utilize three Convolutional Neural

Networks (CNNs) operating on appearance, motion and audio signals to extract

their corresponding features. We then employ a feature fusion network to derive

a unified representation with an aim to capture the relationships among

features. Furthermore, to exploit the long-range temporal dynamics in videos,

we apply two Long Short Term Memory networks with extracted appearance and

motion features as inputs. Finally, we also propose to refine the prediction

scores by leveraging contextual relationships among video semantics. The hybrid

deep learning framework is able to exploit a comprehensive set of multimodal

features for video classification. Through an extensive set of experiments, we

demonstrate that (1) LSTM networks which model sequences in an explicitly

recurrent manner are highly complementary with CNN models; (2) the feature

fusion network which produces a fused representation through modeling feature

relationships outperforms alternative fusion strategies; (3) the semantic

context of video classes can help further refine the predictions for improved

performance. Experimental results on two challenging benchmarks, the UCF-101

and the Columbia Consumer Videos (CCV), provide strong quantitative evidence

that our framework achieves promising results: 93.1% on the UCF-101 and

84.5% on the CCV, outperforming competing methods with clear margins.

Enhanced discrete particle swarm optimization path planning for UAV vision-based surface inspection

Manh Duong Phung , Cong Hoang Quach , Tran Hiep Dinh , Quang Ha

Journal-ref: Automation in Construction, Vol.81, pp.25-33 (2017)

Subjects : Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

In built infrastructure monitoring, an efficient path planning algorithm is

essential for robotic inspection of large surfaces using computer vision. In

this work, we first formulate the inspection path planning problem as an

extended travelling salesman problem (TSP) in which both coverage and

obstacle avoidance are taken into account. An enhanced discrete particle swarm

optimization (DPSO) algorithm is then proposed to solve the TSP, with

performance improvement by using deterministic initialization, random mutation,

and edge exchange. Finally, we take advantage of parallel computing to

implement the DPSO in a GPU-based framework so that the computation time can be

significantly reduced while keeping the hardware requirement unchanged. To show

the effectiveness of the proposed algorithm, experimental results are included

for datasets obtained from UAV inspection of an office building and a bridge.
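
The abstract names edge exchange as one of the improvements to the DPSO but does not say which exchange operator is used. As a rough illustration only, the sketch below shows 2-opt, a common edge-exchange move for TSP tours; it is a swapped-in technique, not the authors' operator. The function names `tour_length` and `two_opt` and the example points are hypothetical, and the swarm, coverage, and obstacle-avoidance parts of the method are not modelled.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(points, tour):
    """Repeatedly apply 2-opt edge exchanges until no exchange shortens the tour.

    Assumption: 2-opt stands in for the unspecified edge exchange in the abstract.
    """
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reversing best[i..j] replaces two edges of the tour with two new ones.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best = candidate
                    improved = True
    return best


if __name__ == "__main__":
    # Corners of a unit square, visited in a self-crossing order.
    points = [(0, 0), (1, 1), (1, 0), (0, 1)]
    start = [0, 1, 2, 3]
    result = two_opt(points, start)
    print(tour_length(points, start), tour_length(points, result))
```

On the unit square the crossing tour is uncrossed by a single exchange, which is the kind of local improvement an edge-exchange step contributes inside a larger search.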

Identifying Spatial Relations in Images using Convolutional Neural Networks

Mandar Haldekar , Ashwinkumar Ganesan , Tim Oates Subjects : Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Traditional approaches to building a large scale knowledge graph have usually

relied on extracting information (entities, their properties, and relations

between them) from unstructured text (e.g. DBpedia). Recent advances in

Convolutional Neural Networks (CNN) allow us to shift our focus to learning

entities and relations from images, as they build robust models that require

little or no pre-processing of the images. In this paper, we present an

approach to identify and extract spatial relations (e.g., The girl is standing

behind the table) from images using CNNs. Our research addresses two specific

challenges: providing insight into how spatial relations are learned by the

network and which parts of the image are used to predict these relations. We

use the pre-trained network VGGNet to extract features from an image and train

a Multi-layer Perceptron (MLP) on a set of synthetic images and the sun09

dataset to extract spatial relations. The MLP predicts spatial relations

without a bounding box around the objects or the space in the image depicting

the relation. To understand how the spatial relations are represented in the

network, a heatmap is overlaid on the image to show the regions that are

deemed important by the network. Also, we analyze the MLP to show the

relationship between the activation of consistent groups of nodes and the

prediction of a spatial relation. We show how the loss of these groups affects

the network’s ability to identify relations.

Artificial Intelligence

The Opacity of Backbones and Backdoors Under a Weak Assumption

Lane A. Hemaspaandra , David E. Narváez Subjects : Artificial Intelligence (cs.AI) ; Computational Complexity (cs.CC); Logic in Computer Science (cs.LO)

Backdoors and backbones of Boolean formulas are hidden structural properties

that are relevant to the analysis of the hardness of instances of the SAT

problem. The development and analysis of algorithms to find and make use of

these properties is thus useful to improve the performance of modern solvers

and our general understanding of SAT. In this work we show that, under the

assumption that P ≠ NP, there are easily-recognizable sets of Boolean

formulas for which it is hard to determine whether they have a backbone. We

also show that, under the same assumption, there are easily-recognizable

families of Boolean formulas with strong backdoors that are easy to find, for

which it is hard to determine whether they are satisfiable or not.
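
For readers unfamiliar with the term, the backbone of a satisfiable formula is the set of literals that hold in every satisfying assignment. The sketch below computes it by brute force for tiny CNF formulas. It only illustrates the definition and is not an algorithm from the paper; the function name `backbone` and the DIMACS-style integer encoding of literals are assumptions made for this example.

```python
from itertools import product


def backbone(clauses, num_vars):
    """Return the literals fixed in every satisfying assignment.

    clauses: list of clauses, each a list of non-zero ints
             (k means variable k is true, -k means it is false).
    Returns None if the formula is unsatisfiable, and an empty set
    if it is satisfiable but has no backbone.
    Runs in O(2^num_vars) time, so it is only usable on toy inputs.
    """
    common = None
    for bits in product([False, True], repeat=num_vars):
        # A clause is satisfied if at least one of its literals is true.
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            lits = {v + 1 if b else -(v + 1) for v, b in enumerate(bits)}
            common = lits if common is None else common & lits
    return common


if __name__ == "__main__":
    # (x1) and (x1 or x2) and (not x2 or x3): only x1 is forced.
    print(backbone([[1], [1, 2], [-2, 3]], 3))
```

The exponential enumeration is consistent with the abstract's point: deciding whether a formula has a backbone at all can be hard under the stated assumption, even for easily recognizable families.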

Simultaneous merging multiple grid maps using the robust motion averaging

Zutao Jiang , Jihua Zhu , Yaochen Li , Zhongyu Li , Huimin Lu Subjects : Artificial Intelligence (cs.AI) ; Robotics (cs.RO)

Mapping in the GPS-denied environment is an important and challenging task in

the field of robotics. In the large environment, mapping can be significantly

accelerated by multiple robots exploring different parts of the environment.

Accordingly, a key problem is how to integrate these local maps built by

different robots into a single global map. In this paper, we propose an

approach for simultaneous merging of multiple grid maps by the robust motion

averaging. The main idea of this approach is to recover all global motions for

map merging from a set of relative motions. Therefore, it firstly adopts the

pair-wise map merging method to estimate relative motions for grid map pairs.

To obtain as many reliable relative motions as possible, a graph-based sampling

scheme is utilized to efficiently remove unreliable relative motions obtained

from the pair-wise map merging. Subsequently, the accurate global motions can

be recovered from the set of reliable relative motions by the motion averaging.

Experimental results on real robot data sets demonstrate that the proposed

approach can achieve simultaneous merging of multiple grid maps with good


Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics

Ken Kansky , Tom Silver , David A. Mély , Mohamed Eldawy , Miguel Lázaro-Gredilla , Xinghua Lou , Nimrod Dorfman , Szymon Sidor , Scott Phoenix , Dileep George Subjects : Artificial Intelligence (cs.AI)

The recent adaptation of deep neural network-based methods to reinforcement

learning and planning domains has yielded remarkable progress on individual

tasks. Nonetheless, progress on task-to-task transfer remains limited. In

pursuit of efficient and robust generalization, we introduce the Schema

Network, an object-oriented generative physics simulator capable of

disentangling multiple causes of events and reasoning backward through causes

to achieve goals. The richly structured architecture of the Schema Network can

learn the dynamics of an environment directly from data. We compare Schema

Networks with Asynchronous Advantage Actor-Critic and Progressive Networks on a

suite of Breakout variations, reporting results on training efficiency and

zero-shot generalization, consistently demonstrating faster, more robust

learning and better transfer. We argue that generalizing from limited data and

learning causal relationships are essential abilities on the path toward

generally intelligent systems.

Neural Models for Key Phrase Detection and Question Generation

Sandeep Subramanian , Tong Wang , Xingdi Yuan , Adam Trischler Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

We propose several neural models arranged in a two-stage framework to tackle

question generation from documents. First, we estimate the probability of

“interesting” answers in a document using a neural model trained on a

question-answering corpus. The predicted key phrases are then used as answers

to condition a sequence-to-sequence question generation model. Empirically, our

neural key phrase detection models significantly outperform an entity-tagging

baseline system. We demonstrate that the question generator formulates good

quality natural language questions from extracted key phrases. The resulting

questions and answers can be used to assess reading comprehension in

educational settings.

Learning and Evaluating Musical Features with Deep Autoencoders

Mason Bretan , Sageev Oore , Doug Eck , Larry Heck Subjects : Sound (cs.SD) ; Artificial Intelligence (cs.AI)

In this work we describe and evaluate methods to learn musical embeddings.

Each embedding is a vector that represents four contiguous beats of music and

is derived from a symbolic representation. We consider autoencoding-based

methods including denoising autoencoders, and context reconstruction, and

evaluate the resulting embeddings on a forward prediction and a classification


Optimization by a quantum reinforcement algorithm

A. Ramezanpour

Comments: 11 pages, 5 figures

Subjects : Disordered Systems and Neural Networks (cond-mat.dis-nn) ; Statistical Mechanics (cond-mat.stat-mech); Artificial Intelligence (cs.AI); Learning (cs.LG); Quantum Physics (quant-ph)

A reinforcement algorithm solves a classical optimization problem by

introducing a feedback to the system which slowly changes the energy landscape

and converges the algorithm to an optimal solution in the configuration space.

Here, we use this strategy to concentrate (localize) preferentially the wave

function of a quantum particle, which explores the configuration space of the

problem, on an optimal configuration. We examine the method by solving

numerically the equations governing the evolution of the system, which are

similar to the nonlinear Schrödinger equations, for small problem sizes. In

particular, we observe that reinforcement increases the minimal energy gap of

the system in a quantum annealing algorithm. Our numerical simulations and the

latter observation show that this kind of quantum feedback might be helpful in

solving a computationally hard optimization problem by a quantum reinforcement


Autonomous Reactive Mission Scheduling and Task-Path Planning Architecture for Autonomous Underwater Vehicle

Somaiyeh Mahmoud.Zadeh

Comments: Thesis of PhD completed at Flinders University of South Australia, 2017

Subjects : Robotics (cs.RO) ; Artificial Intelligence (cs.AI)

An Autonomous Underwater Vehicle (AUV) should carry out complex tasks in a

limited time interval. Since existing AUVs have limited battery capacity and

restricted endurance, they should autonomously manage mission time and the

resources to perform effective persistent deployment in longer missions. Task

assignment requires making decisions subject to resource constraints, while

tasks are assigned with costs and/or values that are budgeted in advance. Tasks

are distributed in a particular operation zone and mapped by a waypoint covered

network. Thus, designing an efficient routing and task-priority assignment framework

that considers the vehicle’s availabilities and properties is essential for increasing

mission productivity and on-time mission completion. This depends strongly on

the order and priority of the tasks that are located between node-like

waypoints in an operation network. On the other hand, autonomous operation of

AUVs in an unfamiliar, dynamic underwater environment, with quick responses to

sudden environmental changes, is a complicated process. Water current

instabilities can deflect the vehicle in an undesired direction and compromise

the AUV’s safety. The vehicle’s robustness to strong environmental variations is

extremely crucial for its safe and optimum operations in an uncertain and

dynamic environment. To this end, the AUV needs to have a general overview of

the environment in top level to perform an autonomous action selection (task

selection) and a lower level local motion planner to operate successfully in

dealing with continuously changing situations. This research deals with

developing a novel reactive control architecture to provide a higher level of

decision autonomy for the AUV operation that enables a single vehicle to

accomplish multiple tasks in a single mission in the face of periodic

disturbances in a turbulent and highly uncertain environment.

Information Retrieval

Hybrid Collaborative Recommendation via Semi-AutoEncoder

Shuai Zhang , Lina Yao , Xiwei Xu , Sen Wang , Liming Zhu

Comments: 10 pages

Subjects : Information Retrieval (cs.IR)

In this paper, we first present a novel structure of AutoEncoder, namely

Semi-AutoEncoder. We generalize it into a designated hybrid collaborative

filtering model, which is able to predict ratings as well as to generate

personalized top-N recommendations. Experimental results on two real-world

datasets demonstrate its state-of-the-art performance.

Evaluating Personal Assistants on Mobile devices

Julia Kiseleva , Maarten de Rijke Subjects : Human-Computer Interaction (cs.HC) ; Information Retrieval (cs.IR)

The iPhone was introduced only a decade ago in 2007 but has fundamentally

changed the way we interact with online information. Mobile devices differ

radically from classic command-based and point-and-click user interfaces, now

allowing for gesture-based interaction using fine-grained touch and swipe

signals. Due to the rapid growth in the use of voice-controlled intelligent

personal assistants on mobile devices, such as Microsoft’s Cortana, Google Now,

and Apple’s Siri, mobile devices have become personal, allowing us to be online

all the time, and assist us in any task, both in work and in our daily lives,

making context a crucial factor to consider.

Mobile usage is now exceeding desktop usage, and is still growing at a rapid

rate, yet our main ways of training and evaluating personal assistants are

still based on (and framed in) classical desktop interactions, focusing on

explicit queries, clicks, and dwell time spent. However, modern user

interaction with mobile devices is radically different due to touch screens

with gesture- and voice-based control and the varying context of use, e.g.,

in a car, by bike, often invalidating the assumptions underlying today’s user

satisfaction evaluation.

There is an urgent need to understand voice- and gesture-based interaction,

taking all interaction signals and context into account in appropriate ways. We

propose a research agenda for developing methods to evaluate and improve

context-aware user satisfaction with mobile interactions using gesture-based

signals at scale.

Identifying Condition-Action Statements in Medical Guidelines Using Domain-Independent Features

Hossein Hematialam , Wlodek Zadrozny Subjects : Computation and Language (cs.CL) ; Information Retrieval (cs.IR)

This paper advances the state of the art in text understanding of medical

guidelines by releasing two new annotated clinical guidelines datasets, and

establishing baselines for using machine learning to extract condition-action

pairs. In contrast to prior work that relies on manually created rules, we

report experiments with several supervised machine learning techniques to

classify sentences as to whether they express conditions and actions. We show

the limitations and possible extensions of this work on text mining of medical


Computation and Language

Idea density for predicting Alzheimer's disease from transcribed speech

Kairit Sirts , Olivier Piguet , Mark Johnson

Comments: CoNLL 2017

Subjects : Computation and Language (cs.CL)

Idea Density (ID) measures the rate at which ideas or elementary predications

are expressed in an utterance or in a text. Lower ID is found to be associated

with an increased risk of developing Alzheimer’s disease (AD) (Snowdon et al.,

1996; Engelman et al., 2010). ID has been used in two different versions:

propositional idea density (PID) counts the expressed ideas and can be applied

to any text while semantic idea density (SID) counts pre-defined information

content units and is naturally more applicable to normative domains, such as

picture description tasks. In this paper, we develop DEPID, a novel

dependency-based method for computing PID, and its variant DEPID-R, which can

exclude repeated ideas, a feature characteristic of AD speech. We conduct

the first comparison of automatically extracted PID and SID in the diagnostic

classification task on two different AD datasets covering both closed-topic and

free-recall domains. While SID performs better on the normative dataset, adding

PID leads to a small but significant improvement (+1.7 F-score). On the

free-topic dataset, PID performs better than SID as expected (77.6 vs 72.3 in

F-score) but adding the features derived from the word embedding clustering

underlying the automatic SID increases the results considerably, leading to an

F-score of 84.8.

Fine-grained human evaluation of neural versus phrase-based machine translation

Filip Klubička , Antonio Toral , Víctor M. Sánchez-Cartagena

Comments: 12 pages, 2 figures, The Prague Bulletin of Mathematical Linguistics

Journal-ref: The Prague Bulletin of Mathematical Linguistics No. 108, pp. 121-132 (2017)

Subjects : Computation and Language (cs.CL)

We compare three approaches to statistical machine translation (pure

phrase-based, factored phrase-based and neural) by performing a fine-grained

manual evaluation via error annotation of the systems’ outputs. The error types

in our annotation are compliant with the multidimensional quality metrics

(MQM), and the annotation is performed by two annotators. Inter-annotator

agreement is high for such a task, and results show that the best performing

system (neural) reduces the errors produced by the worst system (phrase-based)

by 54%.

Transfer Learning for Neural Semantic Parsing

Xing Fan , Emilio Monti , Lambert Mathias , Markus Dreyer

Comments: Accepted for ACL Repl4NLP 2017

Subjects : Computation and Language (cs.CL) ; Learning (cs.LG)

The goal of semantic parsing is to map natural language to a machine

interpretable meaning representation language (MRL). One of the constraints

that limits full exploration of deep learning technologies for semantic parsing

is the lack of sufficient annotated training data. In this paper, we propose

using sequence-to-sequence in a multi-task setup for semantic parsing with a

focus on transfer learning. We explore three multi-task architectures for

sequence-to-sequence modeling and compare their performance with an

independently trained model. Our experiments show that the multi-task setup

aids transfer learning from an auxiliary task with large labeled data to a

target task with smaller labeled data. We see absolute accuracy gains ranging

from 1.0% to 4.4% on our in-house data set, and we also see good gains ranging

from 2.5% to 7.0% on the ATIS semantic parsing tasks with syntactic and

semantic auxiliary tasks.

Is Natural Language Strongly Nonergodic? A Stronger Theorem about Facts and Words

Łukasz Dębowski

Comments: 20 pages, 1 figure

Subjects : Information Theory (cs.IT) ; Computation and Language (cs.CL)

As we discuss, a stationary stochastic process is nonergodic when a random

persistent topic can be detected in the infinite random text sampled from the

process, whereas we call the process strongly nonergodic when an infinite

sequence of random bits, called random facts, is needed to describe this topic

completely. Whereas natural language has often been supposed to be nonergodic,

we exhibit some indirect evidence that natural language may also be strongly

nonergodic. First, we present a surprising assertion, which we call the theorem

about facts and words. This proposition states that the number of random facts

which can be inferred from a finite text sampled from a stationary process must

be roughly smaller than the number of word-like strings detected in this text

by the PPM compression algorithm. Second, we observe that the number of the

word-like strings for some texts in natural language follows an empirical

stepwise power law. In view of both observations, the number of inferrable

facts for natural language may also follow a power law, i.e., natural language

may be strongly nonergodic.

Adversarially Regularized Autoencoders for Generating Discrete Structures

Junbo (Jake) Zhao , Yoon Kim , Kelly Zhang , Alexander M. Rush , Yann LeCun Subjects : Learning (cs.LG) ; Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Generative adversarial networks are an effective approach for learning rich

latent representations of continuous data, but have proven difficult to apply

directly to discrete structured data, such as text sequences or discretized

images. Ideally we could encode discrete structures in a continuous code space

to avoid this problem, but it is difficult to learn an appropriate

general-purpose encoder. In this work, we consider a simple approach for

handling these two challenges jointly, employing a discrete structure

autoencoder with a code space regularized by generative adversarial training.

The model learns a smooth regularized code space while still being able to

model the underlying data, and can be used as a discrete GAN with the ability

to generate coherent discrete outputs from continuous samples. We demonstrate

empirically how key properties of the data are captured in the model’s latent

space, and evaluate the model itself on the tasks of discrete image generation,

text generation, and semi-supervised learning.

Distributed, Parallel, and Cluster Computing

Block-space GPU Mapping for Embedded Sierpiński Gasket Fractals

Cristóbal A. Navarro , Benjamín Bustos , Raimundo Vega , Nancy Hitschfeld

Comments: 7 pages, 8 Figures

Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

This work studies the problem of GPU thread mapping for a Sierpiński gasket

fractal embedded in a discrete Euclidean space of \(n \times n\). A block-space

map \(\lambda: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}\)

is proposed, from Euclidean parallel space \(\mathbb{E}\) to embedded fractal

space \(\mathbb{F}\), that maps in \(\mathcal{O}(\log_2 \log_2(n))\) time and uses

no more than \(\mathcal{O}(n^{\mathbb{H}})\) threads with \(\mathbb{H} \approx 1.58\ldots\)

being the Hausdorff dimension, making it parallel space efficient.

When compared to a bounding-box map, \(\lambda(\omega)\) offers a sub-exponential

improvement in parallel space and a monotonically increasing speedup once \(n > n_0\).

Experimental performance tests show that in practice \(\lambda(\omega)\)

can produce performance improvement at any block-size once \(n > n_0 = 2^8\),

reaching approximately \(10\times\) of speedup for \(n=2^{16}\) under optimal block


Towards Adaptive Resilience in High Performance Computing

Siavash Ghiasvand , Florina M. Ciorba

Comments: 2 pages, to be published in Proceedings of the Work in Progress Session held in connection with the 25th EUROMICRO International Conference on Parallel, Distributed and Network-based Processing, PDP 2017



Distributed, Parallel, and Cluster Computing (cs.DC)

; Performance (cs.PF)

Failure rates in high performance computers rapidly increase due to the

growth in system size and complexity. Hence, failures have become the norm rather

than the exception. Different approaches on high performance computing (HPC)

systems have been introduced to prevent failures (e.g., redundancy) or at

least minimize their impacts (e.g., checkpoint and restart). In most cases,

when these approaches are employed to increase the resilience of certain parts

of a system, energy consumption rapidly increases, or performance significantly

degrades. To address this challenge, we propose on-demand resilience as an

approach to achieve adaptive resilience in HPC systems. In this work, the HPC

system is considered in its entirety and resilience mechanisms such as

checkpointing, isolation, and migration, are activated on-demand. Using the

proposed approach, the unavoidable increase in total energy consumption and

system performance degradation is decreased compared to the typical

checkpoint/restart and redundant resilience mechanisms. Our work aims to

mitigate a large number of failures occurring at various layers in the system,

to prevent their propagation, and to minimize their impact, all of this in an

energy-saving manner. In the case of failures that are estimated to occur but

cannot be mitigated using the proposed on-demand resilience approach, the

system administrators will be notified in view of performing further

investigations into the causes of these failures and their impacts.

Anonymization of System Logs for Privacy and Storage Benefits

Siavash Ghiasvand , Florina M. Ciorba

Comments: 8 pages, 5 figures, for demonstration see this https URL



Distributed, Parallel, and Cluster Computing (cs.DC)

; Cryptography and Security (cs.CR)

System logs constitute valuable information for analysis and diagnosis of

system behavior. The size of parallel computing systems and the number of their

components steadily increase. The volume of logs generated by the system is in

proportion to this increase. Hence, long-term collection and storage of system

logs is challenging. The analysis of system logs requires advanced text

processing techniques. For very large volumes of logs, the analysis is highly

time-consuming and requires a high level of expertise. For many parallel

computing centers, outsourcing the analysis of system logs to third parties is

the only affordable option. The existence of sensitive data within system log

entries obstructs, however, the transmission of system logs to third parties.

Moreover, the analytical tools for processing system logs and the solutions

provided by such tools are highly system specific. Achieving a more general

solution is only possible through access to and analysis of the system logs of

multiple computing systems. The privacy concerns impede, however, the sharing

of system logs across institutions as well as in the public domain. This work

proposes a new method for the anonymization of the information within system

logs that employs de-identification and encoding to provide sharable system

logs, with the highest possible data quality and of reduced size. The results

presented in this work indicate that apart from eliminating the sensitive data

within system logs and converting them into shareable data, the proposed

anonymization method provides a 25% performance improvement in post-processing of

the anonymized system logs, and a more than 50% reduction in their required

storage space.
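
As a rough illustration of the de-identification step only, the sketch below replaces IPv4 addresses and user names in a log line with short, stable tokens. The regular expressions, the `user=` field convention, the salt, and the token format are all assumptions made for this example; the abstract does not describe the authors' actual encoding, and this sketch makes no attempt to reproduce their size reduction.

```python
import hashlib
import re

# Hypothetical patterns: the abstract does not say which fields count as sensitive.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
USER = re.compile(r"\buser=(\w+)")


def token(kind, value, salt="demo-salt"):
    """Map a sensitive value to a short, stable, non-reversible token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return f"<{kind}:{digest[:8]}>"


def anonymize(line):
    """Replace IPv4 addresses and user names with tokens; keep everything else."""
    line = IPV4.sub(lambda m: token("ip", m.group(0)), line)
    line = USER.sub(lambda m: "user=" + token("user", m.group(1)), line)
    return line


print(anonymize("sshd: failed login user=alice from 10.0.0.7 port 22"))
```

Because the same input always yields the same token, events involving one user or host can still be correlated after anonymization, which is the property that keeps shared logs useful for analysis.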

Runtime Verification for Business Processes Utilizing the Bitcoin Blockchain

Christoph Prybila , Stefan Schulte , Christoph Hochreiner , Ingo Weber

Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC)

The usage of process choreographies and decentralized Business Process

Management Systems has been named as an alternative to centralized business

process orchestration. In choreographies, control over a process instance is

shared between independent parties, and no party has full control or knowledge

during process runtime. Nevertheless, it is necessary to monitor and verify

process instances during runtime for purposes of documentation, accounting, or


To achieve business process runtime verification, this work explores the

suitability of the Bitcoin blockchain to create a novel solution for

choreographies. The resulting approach is realized in a fully-functional

software prototype. This software solution is evaluated in a qualitative

comparison. Findings show that our blockchain-based approach enables a seamless

execution monitoring and verification of choreographies, while at the same time

preserving anonymity and independence of the process participants. Furthermore,

the prototype is evaluated in a performance analysis.

A Hybrid Observer for a Distributed Linear System with a Changing Neighbor Graph

L. Wang , A. S. Morse , D. Fullmer , J. Liu

Comments: 7 pages, the 56th IEEE Conference on Decision and Control



Systems and Control (cs.SY)

; Distributed, Parallel, and Cluster Computing (cs.DC)

A hybrid observer is described for estimating the state of an \(m>0\) channel,

\(n\)-dimensional, continuous-time, distributed linear system of the form

\(\dot{x} = Ax,\; y_i = C_i x,\; i\in\{1,2,\ldots,m\}\). The system’s state \(x\) is

simultaneously estimated by \(m\) agents assuming each agent \(i\) senses \(y_i\) and

receives appropriately defined data from each of its current neighbors.

Neighbor relations are characterized by a time-varying directed graph

\(\mathbb{N}(t)\) whose vertices correspond to agents and whose arcs depict

neighbor relations. Agent \(i\) updates its estimate \(x_i\) of \(x\) at “event

times” \(t_1,t_2,\ldots\) using a local observer and a local parameter

estimator. The local observer is a continuous time linear system whose input is

\(y_i\) and whose output \(w_i\) is an asymptotically correct estimate of \(L_i x\)

where \(L_i\) is a matrix with kernel equaling the unobservable space of \((C_i,A)\).

The local parameter estimator is a recursive algorithm designed to estimate,

prior to each event time \(t_j\), a constant parameter \(p_j\) which satisfies the

linear equations

\(w_k(t_j-\tau) = L_k p_j+\mu_k(t_j-\tau),\; k\in\{1,2,\ldots,m\}\), where \(\tau\) is a small

positive constant and \(\mu_k\) is the state estimation error of local observer

\(k\). Agent \(i\) accomplishes this by iterating its parameter estimator state

\(z_i\), \(q\) times within the interval \([t_j-\tau, t_j)\), and by making use of

the state of each of its neighbors’ parameter estimators at each iteration. The

updated value of \(x_i\) at event time \(t_j\) is then

\(x_i(t_j) = e^{A\tau}z_i(q)\). Subject to the assumptions that (i) the neighbor graph

\(\mathbb{N}(t)\) is strongly connected for all time, (ii) the system whose state

is to be estimated is jointly observable, (iii) \(q\) is sufficiently large, it

is shown that each estimate \(x_i\) converges to \(x\) exponentially fast as

\(t \rightarrow \infty\) at a rate which can be controlled.


Provable benefits of representation learning

Sanjeev Arora , Andrej Risteski

Comments: 22 pages



Learning (cs.LG)

; Machine Learning (stat.ML)

There is general consensus that learning representations is useful for a

variety of reasons, e.g. efficient use of labeled data (semi-supervised

learning), transfer learning and understanding hidden structure of data.

Popular techniques for representation learning include clustering, manifold

learning, kernel-learning, autoencoders, Boltzmann machines, etc.

To study the relative merits of these techniques, it’s essential to formalize

the definition and goals of representation learning, so that they all

become instances of the same definition. This paper introduces such a formal

framework that also formalizes the utility of learning the representation. It

is related to previous Bayesian notions, but with some new twists. We show the

usefulness of our framework by exhibiting simple and natural settings — linear

mixture models and loglinear models, where the power of representation learning

can be formally shown. In these examples, representation learning can be

performed provably and efficiently under plausible assumptions (despite being

NP-hard), and furthermore: (i) it greatly reduces the need for labeled data

(semi-supervised learning), (ii) it allows solving classification tasks when

simpler approaches like nearest neighbors require too much data, and (iii) it is

more powerful than manifold learning methods.

On Calibration of Modern Neural Networks

Chuan Guo , Geoff Pleiss , Yu Sun , Kilian Q. Weinberger

Comments: Accepted to ICML 2017



Learning (cs.LG)

Confidence calibration — the problem of predicting probability estimates

representative of the true correctness likelihood — is important for

classification models in many applications. We discover that modern neural

networks, unlike those from a decade ago, are poorly calibrated. Through

extensive experiments, we observe that depth, width, weight decay, and Batch

Normalization are important factors influencing calibration. We evaluate the

performance of various post-processing calibration methods on state-of-the-art

architectures with image and document classification datasets. Our analysis and

experiments not only offer insights into neural network learning, but also

provide a simple and straightforward recipe for practical settings: on most

datasets, temperature scaling — a single-parameter variant of Platt Scaling —

is surprisingly effective at calibrating predictions.
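
A minimal sketch of temperature scaling as the abstract describes it: the logits are divided by a single scalar \(T\) before the softmax, and \(T\) is chosen to minimize negative log-likelihood on held-out data. The function names and the grid search are assumptions made for this example (a gradient-based optimizer is the more usual way to fit \(T\)), and the code uses only the Python standard library.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of the logits after dividing them by a temperature T > 0."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels at temperature T."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, grid=None):
    """Choose T on held-out data by grid search (a stand-in for a real optimizer)."""
    if grid is None:
        grid = [0.25 + 0.05 * i for i in range(96)]  # 0.25 up to 5.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))


# Overconfident toy model: very confident in class 0, but right only 3 times out of 4.
rows = [[5.0, 0.0]] * 4
labels = [0, 0, 0, 1]
T = fit_temperature(rows, labels)
print(T, softmax(rows[0], T))
```

Dividing every logit by the same positive \(T\) never changes which class has the largest probability, so accuracy is unaffected; only the confidence values move.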

SEARNN: Training RNNs with Global-Local Losses

Rémi Leblond , Jean-Baptiste Alayrac , Anton Osokin , Simon Lacoste-Julien

Comments: 12 pages



Learning (cs.LG)

; Machine Learning (stat.ML)

We propose SEARNN, a novel training algorithm for recurrent neural networks

(RNNs) inspired by the “learning to search” (L2S) approach to structured

prediction. RNNs have been widely successful in structured prediction

applications such as machine translation or parsing, and are commonly trained

using maximum likelihood estimation (MLE). Unfortunately, this training loss is

not always an appropriate surrogate for the test error: by only maximizing the

ground truth probability, it fails to exploit the wealth of information offered

by structured losses. Further, it introduces discrepancies between training and

predicting (such as exposure bias) that may hurt test performance. Instead,

SEARNN leverages test-alike search space exploration to introduce global-local

losses that are closer to the test error. We demonstrate improved performance

over MLE on three different tasks: OCR, spelling correction and text chunking.

Finally, we propose a subsampling strategy to enable SEARNN to scale to large

vocabulary sizes.

Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

Levent Sagun , Utku Evci , V. Ugur Guney , Yann Dauphin , Leon Bottou

Subjects: Learning (cs.LG)

We study the properties of common loss surfaces through their Hessian matrix.

In particular, in the context of deep learning, we empirically show that the

spectrum of the Hessian is composed of two parts: (1) the bulk centered near

zero, and (2) outliers away from the bulk. We present numerical evidence and

mathematical justifications for the following conjectures laid out by Sagun et

al. (2016): Fixing data, increasing the number of parameters merely scales the

bulk of the spectrum; fixing the dimension and changing the data (for instance

adding more clusters or making the data less separable) only affects the

outliers. We believe that our observations have striking implications for

non-convex optimization in high dimensions. First, the flatness of such

landscapes (which can be measured by the singularity of the Hessian) implies

that classical notions of basins of attraction may be quite misleading, and

that the discussion of wide/narrow basins may be in need of a new perspective

around over-parametrization and redundancy that are able to create large

connected components at the bottom of the landscape. Second, the dependence of a

small number of large eigenvalues on the data distribution can be linked to the

spectrum of the covariance matrix of gradients of model outputs. With this in

mind, we may reevaluate the connections within the data-architecture-algorithm

framework of a model, hoping that it would shed light on the geometry of

high-dimensional and non-convex spaces in modern applications. In particular,

we present a case that links the two observations: a gradient based method

appears to be first climbing uphill and then falling downhill between two

points; whereas, in fact, they lie in the same basin.

A survey of dimensionality reduction techniques based on random projection

Haozhe Xie , Jie Li , Hanqing Xue

Subjects: Learning (cs.LG)

Dimensionality reduction techniques play important roles in the analysis of

big data. Traditional dimensionality reduction approaches, such as Principal

Component Analysis (PCA) and Linear Discriminant Analysis (LDA), have been

studied extensively in the past few decades. However, as the dimension of huge

data increases, the computational cost of traditional dimensionality reduction

approaches grows dramatically and becomes prohibitive. This has triggered

the development of the Random Projection (RP) technique, which maps high-dimensional

data onto a low-dimensional subspace in a short time. However, RP generates its

transformation matrix without considering the intrinsic structure of the original data

and usually leads to relatively high distortion. Therefore, in the past few

years, some approaches based on RP have been proposed to address this problem.

In this paper, we summarize these approaches in different applications to help

practitioners choose suitable approaches for their specific applications. We also

enumerate their benefits and limitations to provide further references for

researchers developing novel RP-based approaches.
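
To make the basic technique concrete, here is a small sketch of plain Gaussian random projection, the data-independent baseline that the surveyed methods aim to improve on. The matrix is drawn once with i.i.d. \(N(0, 1/k)\) entries and never looks at the data. The dimensions, seeds, and function names are assumptions chosen for this example, and the code uses only the Python standard library.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries; the data is never consulted."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional vector x to k dimensions."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in matrix]


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two arbitrary points in 500 dimensions, projected down to 200.
rng = random.Random(1)
x = [rng.random() for _ in range(500)]
y = [rng.random() for _ in range(500)]
R = random_projection_matrix(500, 200)
ratio = distance(project(R, x), project(R, y)) / distance(x, y)
print(ratio)  # close to 1 when k is large enough
```

The ratio printed at the end measures the distortion of one pairwise distance; the smaller the target dimension \(k\), the more this ratio tends to stray from 1.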

Dueling Bandits With Weak Regret

Bangrui Chen , Peter I. Frazier

Subjects: Learning (cs.LG)

We consider online content recommendation with implicit feedback through

pairwise comparisons, formalized as the so-called dueling bandit problem. We

study the dueling bandit problem in the Condorcet winner setting, and consider

two notions of regret: the more well-studied strong regret, which is 0 only

when both arms pulled are the Condorcet winner; and the less well-studied weak

regret, which is 0 if either arm pulled is the Condorcet winner. We propose a

new algorithm for this problem, Winner Stays (WS), with variations for each

kind of regret: WS for weak regret (WS-W) has expected cumulative weak regret

that is \(O(N^2)\), and \(O(N\log(N))\) if arms have a total order; WS for strong

regret (WS-S) has expected cumulative strong regret of \(O(N^2 + N \log(T))\),

and \(O(N\log(N)+N\log(T))\) if arms have a total order. WS-W is the first

dueling bandit algorithm with weak regret that is constant in time. WS is

simple to compute, even for problems with many arms, and we demonstrate through

numerical experiments on simulated and real data that WS has significantly

smaller regret than existing algorithms in both the weak- and strong-regret


Transfer entropy-based feedback improves performance in artificial neural networks

Sebastian Herzog , Christian Tetzlaff , Florentin Wörgötter

Subjects: Learning (cs.LG); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE)

The structure of the majority of modern deep neural networks is characterized

by uni-directional feed-forward connectivity across a very large number of

layers. By contrast, the architecture of the cortex of vertebrates contains

fewer hierarchical levels but many recurrent and feedback connections. Here we

show that a small, few-layer artificial neural network that employs feedback

will reach top level performance on a standard benchmark task, otherwise only

obtained by large feed-forward structures. To achieve this we use feed-forward

transfer entropy between neurons to structure feedback connectivity. Transfer

entropy can here intuitively be understood as a measure for the relevance of

certain pathways in the network, which are then amplified by feedback. Feedback

may therefore be key for high network performance in small brain-like


Adversarially Regularized Autoencoders for Generating Discrete Structures

Junbo (Jake) Zhao , Yoon Kim , Kelly Zhang , Alexander M. Rush , Yann LeCun

Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Generative adversarial networks are an effective approach for learning rich

latent representations of continuous data, but have proven difficult to apply

directly to discrete structured data, such as text sequences or discretized

images. Ideally we could encode discrete structures in a continuous code space

to avoid this problem, but it is difficult to learn an appropriate

general-purpose encoder. In this work, we consider a simple approach for

handling these two challenges jointly, employing a discrete structure

autoencoder with a code space regularized by generative adversarial training.

The model learns a smooth regularized code space while still being able to

model the underlying data, and can be used as a discrete GAN with the ability

to generate coherent discrete outputs from continuous samples. We demonstrate

empirically how key properties of the data are captured in the model’s latent

space, and evaluate the model itself on the tasks of discrete image generation,

text generation, and semi-supervised learning.

Hybrid Reward Architecture for Reinforcement Learning

Harm van Seijen , Mehdi Fatemi , Joshua Romoff , Romain Laroche , Tavian Barnes , Jeffrey Tsang

Subjects: Learning (cs.LG)

One of the main challenges in reinforcement learning (RL) is generalisation.

In typical deep RL methods this is achieved by approximating the optimal value

function with a low-dimensional representation using a deep network. While this

approach works well in many domains, in domains where the optimal value

function cannot easily be reduced to a low-dimensional representation, learning

can be very slow and unstable. This paper contributes towards tackling such

challenging domains, by proposing a new method, called Hybrid Reward

Architecture (HRA). HRA takes as input a decomposed reward function and learns

a separate value function for each component reward function. Because each

component typically only depends on a subset of all features, the overall value

function is much smoother and can be more easily approximated by a low-dimensional

representation, enabling more effective learning. We demonstrate HRA on a

toy problem and the Atari game Ms. Pac-Man, where HRA achieves above-human


Deep Learning Methods for Efficient Large Scale Video Labeling

Miha Skalic , Marcin Pekalski , Xingguo E. Pan

Comments: 7 pages, 5 tables, 1 figure



Machine Learning (stat.ML)

; Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We present a solution to the “Google Cloud and YouTube-8M Video Understanding

Challenge” that ranked 5th. The proposed model is an ensemble of three

model families, two frame level and one video level. The training was performed

on an augmented dataset, with cross validation.

Accelerated Reinforcement Learning Algorithms with Nonparametric Function Approximation for Opportunistic Spectrum Access

Theodoros Tsiligkaridis , David Romero

Comments: 12 pages, submitted



Information Theory (cs.IT)

; Learning (cs.LG); Machine Learning (stat.ML)

We study the problem of throughput maximization by predicting spectrum

opportunities using reinforcement learning. Our kernel-based reinforcement

learning approach is coupled with a sparsification technique that efficiently

captures the environment states to control dimensionality and finds the best

possible channel access actions based on the current state. This approach

allows learning and planning over the intrinsic state-action space and extends

well to large state and action spaces. For stationary Markov environments, we

derive the optimal policy for channel access, its associated limiting

throughput, and propose a fast online algorithm for achieving the optimal

throughput. We then show that the maximum-likelihood channel prediction and

access algorithm is suboptimal in general, and derive conditions under which

the two algorithms are equivalent. For reactive Markov environments, we derive

kernel variants of Q-learning and R-learning, and propose an accelerated R-learning

algorithm that achieves faster convergence. We finally test our algorithms

against a generic reactive network. Simulation results are shown to validate

the theory and show the performance gains over current state-of-the-art


Transfer Learning for Neural Semantic Parsing

Xing Fan , Emilio Monti , Lambert Mathias , Markus Dreyer

Comments: Accepted for ACL Repl4NLP 2017



Computation and Language (cs.CL)

; Learning (cs.LG)

The goal of semantic parsing is to map natural language to a machine

interpretable meaning representation language (MRL). One of the constraints

that limits full exploration of deep learning technologies for semantic parsing

is the lack of sufficient annotation training data. In this paper, we propose

using sequence-to-sequence in a multi-task setup for semantic parsing with a

focus on transfer learning. We explore three multi-task architectures for

sequence-to-sequence modeling and compare their performance with an

independently trained model. Our experiments show that the multi-task setup

aids transfer learning from an auxiliary task with large labeled data to a

target task with smaller labeled data. We see absolute accuracy gains ranging

from 1.0% to 4.4% in our in-house data set, and we also see good gains ranging

from 2.5% to 7.0% on the ATIS semantic parsing tasks with syntactic and

semantic auxiliary tasks.

Teaching Compositionality to CNNs

Austin Stone , Huayan Wang , Michael Stark , Yi Liu , D. Scott Phoenix , Dileep George

Comments: Preprint appearing in CVPR 2017



Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG)

Convolutional neural networks (CNNs) have shown great success in computer

vision, approaching human-level performance when trained for specific tasks via

application-specific loss functions. In this paper, we propose a method for

augmenting and training CNNs so that their learned features are compositional.

It encourages networks to form representations that disentangle objects from

their surroundings and from each other, thereby promoting better

generalization. Our method is agnostic to the specific details of the

underlying CNN to which it is applied and can in principle be used with any

CNN. As we show in our experiments, the learned representations lead to feature

activations that are more localized and improve performance over

non-compositional baselines in object recognition tasks.

Leveraging Node Attributes for Incomplete Relational Data

He Zhao , Lan Du , Wray Buntine

Comments: Appearing in ICML 2017



Machine Learning (stat.ML)

; Learning (cs.LG); Social and Information Networks (cs.SI)

Relational data are usually highly incomplete in practice, which inspires us

to leverage side information to improve the performance of community detection

and link prediction. This paper presents a Bayesian probabilistic approach that

incorporates various kinds of node attributes encoded in binary form in

relational models with Poisson likelihood. Our method works flexibly with both

directed and undirected relational networks. The inference can be done by

efficient Gibbs sampling which leverages sparsity of both networks and node

attributes. Extensive experiments show that our models achieve the

state-of-the-art link prediction results, especially with highly incomplete

relational data.

Optimization by a quantum reinforcement algorithm

A. Ramezanpour

Comments: 11 pages, 5 figures



Disordered Systems and Neural Networks (cond-mat.dis-nn)

; Statistical Mechanics (cond-mat.stat-mech); Artificial Intelligence (cs.AI); Learning (cs.LG); Quantum Physics (quant-ph)

A reinforcement algorithm solves a classical optimization problem by

introducing a feedback to the system which slowly changes the energy landscape

and converges the algorithm to an optimal solution in the configuration space.

Here, we use this strategy to concentrate (localize) preferentially the wave

function of a quantum particle, which explores the configuration space of the

problem, on an optimal configuration. We examine the method by solving

numerically the equations governing the evolution of the system, which are

similar to the nonlinear Schrödinger equations, for small problem sizes. In

particular, we observe that reinforcement increases the minimal energy gap of

the system in a quantum annealing algorithm. Our numerical simulations and the

latter observation show that this kind of quantum feedback might be helpful in

solving a computationally hard optimization problem by a quantum reinforcement


On Optimistic versus Randomized Exploration in Reinforcement Learning

Ian Osband , Benjamin Van Roy

Comments: Extended abstract for RLDM 2017



Machine Learning (stat.ML)

; Learning (cs.LG)

We discuss the relative merits of optimistic and randomized approaches to

exploration in reinforcement learning. Optimistic approaches presented in the

literature apply an optimistic boost to the value estimate at each state-action

pair and select actions that are greedy with respect to the resulting

optimistic value function. Randomized approaches sample from among

statistically plausible value functions and select actions that are greedy with

respect to the random sample. Prior computational experience suggests that

randomized approaches can lead to far more statistically efficient learning. We

present two simple analytic examples that elucidate why this is the case. In

principle, there should be optimistic approaches that fare well relative to

randomized approaches, but that would require intractable computation.

Optimistic approaches that have been proposed in the literature sacrifice

statistical efficiency for the sake of computational efficiency. Randomized

approaches, on the other hand, may enable simultaneous statistical and

computational efficiency.
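
The two action-selection rules can be contrasted in a deliberately simplified setting. The sketch below uses a Bernoulli multi-armed bandit rather than the full reinforcement learning problem the abstract addresses, so it is an assumption-laden illustration and not one of the paper's analytic examples: the optimistic rule adds a UCB1-style bonus to each estimate, while the randomized rule draws one sample per arm from a Beta posterior and acts greedily on the draw. All names and the bonus constant are chosen for this example.

```python
import math
import random


def optimistic_action(counts, means, t, c=2.0):
    """Act greedily on the estimate plus an optimism bonus (UCB1-style)."""
    def boosted(a):
        if counts[a] == 0:
            return float("inf")  # untried arms are treated as maximally promising
        return means[a] + math.sqrt(c * math.log(t) / counts[a])
    return max(range(len(counts)), key=boosted)


def randomized_action(successes, failures, rng):
    """Act greedily on one random draw from each arm's Beta posterior."""
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


rng = random.Random(0)
print(optimistic_action(counts=[0, 3], means=[0.0, 0.9], t=4))        # untried arm 0
print(randomized_action(successes=[2, 5], failures=[3, 1], rng=rng))  # varies with the draw
```

In the optimistic rule all exploration comes from the deterministic bonus; in the randomized rule it comes from posterior sampling, so an arm is explored roughly in proportion to how plausible it is that the arm is best.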

Information Theory

Accelerated Reinforcement Learning Algorithms with Nonparametric Function Approximation for Opportunistic Spectrum Access

Theodoros Tsiligkaridis , David Romero

Comments: 12 pages, submitted



Information Theory (cs.IT)

; Learning (cs.LG); Machine Learning (stat.ML)

We study the problem of throughput maximization by predicting spectrum

opportunities using reinforcement learning. Our kernel-based reinforcement

learning approach is coupled with a sparsification technique that efficiently

captures the environment states to control dimensionality and finds the best

possible channel access actions based on the current state. This approach

allows learning and planning over the intrinsic state-action space and extends

well to large state and action spaces. For stationary Markov environments, we

derive the optimal policy for channel access, its associated limiting

throughput, and propose a fast online algorithm for achieving the optimal

throughput. We then show that the maximum-likelihood channel prediction and

access algorithm is suboptimal in general, and derive conditions under which

the two algorithms are equivalent. For reactive Markov environments, we derive

kernel variants of Q-learning and R-learning, and propose an accelerated R-learning

algorithm that achieves faster convergence. We finally test our algorithms

against a generic reactive network. Simulation results are shown to validate

the theory and show the performance gains over current state-of-the-art


On Error Detection in Asymmetric Channels

Mladen Kovačević

Comments: 4 pages, 2 figures



Information Theory (cs.IT)

; Discrete Mathematics (cs.DM)

We study the error detection problem in \(q\)-ary asymmetric channels wherein

every input symbol \(x_i\) is mapped to an output symbol \(y_i\) satisfying

\(y_i \geq x_i\). A general setting is assumed where the noise vectors are

(potentially) restricted in: 1) the amplitude, \(y_i - x_i \leq a\), 2) the

Hamming weight, \(\sum_{i=1}^n 1_{\{y_i \neq x_i\}} \leq h\), and 3) the total

weight, \(\sum_{i=1}^n (y_i - x_i) \leq t\). Optimal codes detecting these

types of errors are described for certain sets of parameters \(a, h, t\), both

in the standard and in the cyclic (\(\operatorname{mod}\, q\)) version of the

problem. It is also demonstrated that these codes are optimal in the large

alphabet limit for every \(a, h, t\) and every block-length \(n\).
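
The three noise restrictions translate directly into a membership check. The sketch below tests whether an output word could have been produced from an input word under given limits on amplitude, Hamming weight, and total weight. It covers only the standard (non-cyclic) version and does not enforce the alphabet size; the function name and the use of `None` for an unrestricted parameter are assumptions made for this example.

```python
def is_admissible(x, y, a=None, h=None, t=None):
    """Check whether output y can arise from input x under the given noise limits.

    a bounds each per-symbol increase, h bounds the number of changed symbols,
    and t bounds the total increase; None means the parameter is unrestricted.
    """
    if len(x) != len(y):
        return False
    noise = [yi - xi for xi, yi in zip(x, y)]
    if any(e < 0 for e in noise):  # asymmetric channel: every y_i >= x_i
        return False
    if a is not None and max(noise, default=0) > a:
        return False
    if h is not None and sum(e != 0 for e in noise) > h:
        return False
    if t is not None and sum(noise) > t:
        return False
    return True


print(is_admissible([0, 1, 2], [0, 2, 2], a=1, h=1, t=1))  # True
print(is_admissible([0, 1, 2], [1, 2, 2], h=1))            # False: two symbols changed
```

A code detects all such errors exactly when no codeword is an admissible output of a different codeword, so a check of this kind is the basic building block for verifying a candidate code by brute force.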

On Distributed Power Control for Uncoordinated Dual Energy Harvesting Links: Performance Bounds and Near-Optimal Policies

Mohit K. Sharma , Chandra R. Murthy , Rahul Vaze

Comments: 8 pages



Information Theory (cs.IT)

In this paper, we consider a point-to-point link between an energy harvesting

transmitter and receiver, where neither node has the information about the

battery state or energy availability at the other node. We consider a model

where data is successfully delivered only in slots where both nodes are active.

Energy loss occurs whenever one node turns on while the other node is in sleep

mode. In each slot, based on their own energy availability, the transmitter and

receiver need to independently decide whether or not to turn on, with the aim

of maximizing the long-term time-average throughput. We present an upper bound

on the throughput achievable by analyzing a genie-aided system that has

noncausal knowledge of the energy arrivals at both the nodes. Next, we propose

an online policy requiring an occasional one-bit feedback whose throughput is

within one bit of the upper bound, asymptotically in the battery size. In order

to further reduce the feedback required, we propose a time-dilated version of

the online policy. As the time dilation gets large, this policy does not

require any feedback and achieves the upper bound asymptotically in the battery

size. Inspired by this, we also propose a near-optimal fully uncoordinated

policy. We use Monte Carlo simulations to validate our theoretical results and

illustrate the performance of the proposed policies.

Is Natural Language Strongly Nonergodic? A Stronger Theorem about Facts and Words

Łukasz Dębowski

Comments: 20 pages, 1 figure



Information Theory (cs.IT)

; Computation and Language (cs.CL)

As we discuss, a stationary stochastic process is nonergodic when a random

persistent topic can be detected in the infinite random text sampled from the

process, whereas we call the process strongly nonergodic when an infinite

sequence of random bits, called random facts, is needed to describe this topic

completely. Whereas natural language has been often supposed to be nonergodic,

we exhibit some indirect evidence that natural language may be also strongly

nonergodic. First, we present a surprising assertion, which we call the theorem

about facts and words. This proposition states that the number of random facts

which can be inferred from a finite text sampled from a stationary process must

be roughly smaller than the number of word-like strings detected in this text

by the PPM compression algorithm. Second, we observe that the number of the

word-like strings for some texts in natural language follows an empirical

stepwise power law. In view of both observations, the number of inferrable

facts for natural language may also follow a power law, i.e., natural language

may be strongly nonergodic.

Strong converse bounds for high-dimensional estimation

Ramji Venkataramanan , Oliver Johnson Subjects : Information Theory (cs.IT) ; Statistics Theory (math.ST); Machine Learning (stat.ML)

In statistical inference problems, we wish to obtain lower bounds on the

minimax risk, that is to bound the performance of any possible estimator. A

standard technique to obtain risk lower bounds involves the use of Fano’s

inequality. In an information-theoretic setting, it is known that Fano’s

inequality typically does not give a sharp converse result (error lower bound)

for channel coding problems. Moreover, recent work has shown that an argument

based on binary hypothesis testing gives tighter results. We adapt this

technique to the statistical setting, and argue that Fano’s inequality can

always be replaced by this approach to obtain tighter lower bounds that can be

easily computed and are asymptotically sharp. We illustrate our technique in

three applications: density estimation, active learning of a binary classifier,

and compressed sensing, obtaining tighter risk lower bounds in each case.
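
For reference, the standard Fano-based argument that the abstract takes as its starting point can be written as follows. The notation is chosen here and is not taken from the paper: V is uniform on M hypotheses, X is the observed data, and the parameters indexed by V form a 2δ-separated packing in a metric ρ.

```latex
% Fano's inequality for M-ary hypothesis testing, V uniform on {1,...,M}:
\[
  \Pr\bigl[\hat V(X) \neq V\bigr] \;\ge\; 1 - \frac{I(V;X) + \log 2}{\log M}.
\]
% Resulting minimax risk lower bound over a 2\delta-separated packing
% \{\theta_1,\dots,\theta_M\} \subset \Theta in the metric \rho:
\[
  \inf_{\hat\theta}\,\sup_{\theta\in\Theta}\,
  \mathbb{E}_\theta\bigl[\rho(\hat\theta,\theta)\bigr]
  \;\ge\; \delta\left(1 - \frac{I(V;X) + \log 2}{\log M}\right).
\]
```

The second bound follows because any estimator within δ of the true parameter identifies V correctly, so the risk is at least δ times the testing error probability.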

Sequential Channel Estimation in the Presence of Random Phase Noise in NB-IoT Systems

Fredrik Rusek , Sha Hu

Comments: 5 pages, 4 figures, submitted to conference



Information Theory (cs.IT)

We consider channel estimation (CE) in narrowband Internet-of-Things (NB-IoT)

systems. Due to the fluctuations in phase within receiver and transmitter

oscillators, and also the residual frequency offset (FO) caused by

discontinuous receiving of repetition coded transmit data-blocks, random phase

noise is present in the received signals. Although the coherence time of the fading

channel can be assumed fairly long due to the low-mobility of NB-IoT

user-equipments (UEs), such phase noises have to be considered before combining

the channel estimates over repetition copies to improve their accuracy.

In this paper, we derive a sequential minimum-mean-square-error (MMSE) channel

estimator in the presence of random phase noise that refines the CE

sequentially with each received repetition copy, which has low complexity and

a small data-storage requirement. Further, we show through simulations that the proposed

sequential MMSE estimator improves the mean-square-error (MSE) of CE by 1 dB in

the low signal-to-noise ratio (SNR) regime, compared to a traditional

sequential MMSE estimator that does not thoroughly consider the impact of

random phase noises.
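
As a point of comparison, the sketch below shows a textbook sequential MMSE update for a scalar channel that ignores phase noise, i.e. the kind of baseline the abstract contrasts against, not the estimator derived in the paper. The model is an assumption made here: each repetition copy is y = h + n with a unit pilot, a zero-mean prior on h with variance `prior_var`, and noise variance `noise_var`; the function names are hypothetical.

```python
def sequential_mmse(observations, prior_var, noise_var):
    """Refine a scalar channel estimate one repetition copy at a time.

    Assumed model (not from the paper): y_k = h + n_k, with h zero-mean
    with variance prior_var and n_k zero-mean with variance noise_var.
    Only the running estimate and its error variance are stored.
    """
    estimate = 0j
    err_var = prior_var
    for y in observations:
        gain = err_var / (err_var + noise_var)      # weight of the new copy
        estimate = estimate + gain * (y - estimate)  # innovation update
        err_var = (1.0 - gain) * err_var             # posterior error variance
    return estimate, err_var


def batch_mmse(observations, prior_var, noise_var):
    """Closed-form MMSE estimate using all copies at once (same model)."""
    k = len(observations)
    return prior_var * sum(observations) / (k * prior_var + noise_var)


if __name__ == "__main__":
    copies = [1 + 0.5j, 0.8 + 0.7j, 1.2 + 0.4j]
    est, var = sequential_mmse(copies, prior_var=1.0, noise_var=0.5)
    print(est, var, batch_mmse(copies, 1.0, 0.5))
```

Under this model the sequential and batch estimates coincide, which is why the sequential form needs so little storage; a random phase rotation on each copy breaks that equivalence and motivates a modified estimator.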

WLS-Based Self-Localization Using Perturbed Anchor Positions and RSSI Measurements

Vikram Kumar , Reza Arablouei , Brano Kusy , Raja Jurdak , Neil W. Bergmann Subjects : Information Theory (cs.IT)

We consider the problem of self-localization by a resource-constrained node

within a network given radio signal strength indicator (RSSI) measurements from

a set of anchor nodes where the RSSI measurements as well as the anchor

position information are subject to perturbation. In order to achieve a

computationally efficient estimate for the unknown position, we minimize a

weighted sum-square-distance-error cost function in an iterative fashion

utilizing the gradient-descent method. We calculate the weights in the cost

function by taking into account perturbations in both RSSI measurements and

anchor node position information while assuming normal distribution for the

perturbations in the anchor node position information and log-normal

distribution for the RSSI-induced distance estimates. The latter assumption is

due to considering the log-distance path-loss model with normally-distributed

perturbations for the RSSI measurements in the logarithmic scale. We also

derive the Cramer-Rao lower bound associated with the considered position

estimation problem. We evaluate the performance of the proposed algorithm

considering various arbitrary network topologies and compare it with an

existing algorithm that is based on a similar approach but only accounts for

perturbations in the RSSI measurements. The experimental results show that the

proposed algorithm yields significant improvement in localization performance

over the existing algorithm while maintaining its computational efficiency.

This makes the proposed algorithm suitable for real-world applications where

the information available about the positions of anchor nodes often suffers from

uncertainty due to observational noise or error and the computational and

energy resources of mobile nodes are limited, prohibiting the use of more

sophisticated techniques such as those based on semidefinite or second-order

cone programming.
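
The approach described above can be sketched as gradient descent on a weighted sum of squared distance errors, J(x) = sum_i w_i (||x - a_i|| - d_i)^2, with distances d_i obtained from RSSI through the log-distance path-loss model. This is a minimal sketch under stated assumptions: the weights 1/d_i^2 are a placeholder chosen here and are not the weights derived in the paper, and the path-loss parameters `p0`, `n`, `d0`, the step size, and all function names are hypothetical.

```python
import math


def rssi_to_distance(rssi, p0=-40.0, n=3.0, d0=1.0):
    """Invert the log-distance model rssi = p0 - 10*n*log10(d/d0).

    p0, n and d0 are illustrative values, not taken from the paper.
    """
    return d0 * 10 ** ((p0 - rssi) / (10.0 * n))


def localize(anchors, distances, weights, start, step=1.0, iters=2000):
    """Minimize sum_i w_i * (||x - a_i|| - d_i)**2 by gradient descent (2-D)."""
    x, y = start
    for _ in range(iters):
        gx = gy = 0.0
        for (ax, ay), d, w in zip(anchors, distances, weights):
            r = math.hypot(x - ax, y - ay)
            if r < 1e-12:
                continue  # gradient is undefined exactly at an anchor
            c = 2.0 * w * (r - d) / r
            gx += c * (x - ax)
            gy += c * (y - ay)
        x -= step * gx
        y -= step * gy
    return x, y


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    true_pos = (3.0, 4.0)
    # Noise-free RSSI generated from the same assumed path-loss model.
    rssi = [-40.0 - 30.0 * math.log10(math.hypot(true_pos[0] - ax, true_pos[1] - ay))
            for ax, ay in anchors]
    dists = [rssi_to_distance(v) for v in rssi]
    weights = [1.0 / d ** 2 for d in dists]  # placeholder weights (assumption)
    print(localize(anchors, dists, weights, start=(5.0, 5.0)))
```

Each iteration costs only a handful of arithmetic operations per anchor, which is the kind of budget a resource-constrained node can afford; perturbed anchor positions would enter through the choice of weights.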

Compressed Secret Key Agreement: Maximizing Multivariate Mutual Information Per Bit

Chung Chan Subjects : Information Theory (cs.IT)

The multiterminal secret key agreement problem by public discussion is

revisited with an additional source compression step. Prior to public

discussion, users independently compress their private sources to filter out

less correlated information that adds little to the maximum achievable secret

key rate, referred to as the compressed secrecy capacity. The characterization

of this secrecy capacity poses new challenges in information processing and

dimension reduction, and the idea has given rise to one of the best achieving

schemes for secret key agreement under public discussion rate constraints.

Exploiting such connection, we derive single-letter lower and upper bounds for

a general source model, and a precise single-letter characterization for the

pairwise independent network model.

Quantifying genuine multipartite correlations and their pattern complexity

Davide Girolami , Tommaso Tufarelli , Cristian E. Susa

Comments: 4 pages



Quantum Physics (quant-ph)

; Disordered Systems and Neural Networks (cond-mat.dis-nn); Information Theory (cs.IT); Adaptation and Self-Organizing Systems (nlin.AO); Data Analysis, Statistics and Probability (

We propose an information-theoretic framework to quantify multipartite

correlations in classical and quantum systems, answering questions such as:

what is the amount of seven-partite correlations in a given state of ten

particles? We identify measures of genuine multipartite correlations, i.e.

statistical dependencies which cannot be ascribed to bipartite correlations,

satisfying a set of desirable properties. Inspired by ideas developed in

complexity science, we then introduce the concept of weaving to classify states

with an equal amount of total correlations, but displaying different patterns

of multipartite correlations. The weaving of a state is defined as the weighted

sum of correlations of any order. Weaving measures are good descriptors of the

complexity of correlation structures in multipartite systems.

Transfer entropy-based feedback improves performance in artificial neural networks

Sebastian Herzog , Christian Tetzlaff , Florentin Wörgötter Subjects : Learning (cs.LG) ; Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE)

The structure of the majority of modern deep neural networks is characterized

by unidirectional feed-forward connectivity across a very large number of

layers. By contrast, the architecture of the cortex of vertebrates contains

fewer hierarchical levels but many recurrent and feedback connections. Here we

show that a small, few-layer artificial neural network that employs feedback

will reach top level performance on a standard benchmark task, otherwise only

obtained by large feed-forward structures. To achieve this we use feed-forward

transfer entropy between neurons to structure feedback connectivity. Transfer

entropy can here intuitively be understood as a measure for the relevance of

certain pathways in the network, which are then amplified by feedback. Feedback

may therefore be key for high network performance in small brain-like





