arXiv Paper Daily: Fri, 9 Jun 2017

Neural and Evolutionary Computing

Spatio-Temporal Backpropagation for Training High-performance Spiking Neural Networks

Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Luping Shi

Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

Compared with artificial neural networks (ANNs), spiking neural networks

(SNNs) are promising to explore the brain-like behaviors since the spikes could

encode more spatio-temporal information. Although pre-training from ANN or

direct training based on backpropagation (BP) makes the supervised training of

SNNs possible, these methods only exploit the networks’ spatial domain

information which leads to the performance bottleneck and requires many

complicated training skills. One fundamental issue is that the spike activity

is naturally non-differentiable which causes great difficulties in training

SNNs. To this end, we build an iterative LIF model that is more friendly for

gradient descent training. By simultaneously considering the layer-by-layer

spatial domain (SD) and the timing-dependent temporal domain (TD) in the

training phase, as well as an approximated derivative for the spike activity,

we propose a spatio-temporal backpropagation (STBP) training framework without

using any complicated technology. We achieve the best performance of

multi-layered perceptron (MLP) compared with existing state-of-the-art

algorithms over the static MNIST and the dynamic N-MNIST dataset as well as a

custom object detection dataset. This work provides a new perspective to

explore the high-performance SNNs for future brain-like computing paradigm with

rich spatio-temporal dynamics.
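
The two ingredients named in the abstract above, an iterative LIF update and an approximated derivative for the spike, can be sketched roughly as follows. This is a minimal illustration, not the authors' formulation: the decay constant, the reset-to-zero rule, the threshold, and the rectangular shape of the surrogate derivative are all assumptions made for the example.

```python
def lif_step(u_prev, spike_prev, input_current, decay=0.5, threshold=1.0):
    """One iteration of a discrete-time leaky integrate-and-fire neuron.

    Assumed form: the membrane potential leaks by `decay` each step and is
    reset to zero on the step after a spike.
    """
    u = decay * u_prev * (1.0 - spike_prev) + input_current
    spike = 1.0 if u >= threshold else 0.0
    return u, spike


def surrogate_spike_grad(u, threshold=1.0, width=1.0):
    """Rectangular stand-in for d(spike)/d(u).

    The true derivative of the step function is zero almost everywhere, so a
    window of the given width around the threshold is used instead (assumed shape).
    """
    return 1.0 / width if abs(u - threshold) < width / 2.0 else 0.0


def run_neuron(inputs, decay=0.5, threshold=1.0):
    """Unroll the neuron over time and return its spike train."""
    u, spike, spikes = 0.0, 0.0, []
    for x in inputs:
        u, spike = lif_step(u, spike, x, decay, threshold)
        spikes.append(spike)
    return spikes
```

Because the update is written as an explicit recurrence over time steps, a gradient can flow both across layers and backwards through time, with `surrogate_spike_grad` substituted wherever the derivative of the spike would be needed.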

Surprise Search for Evolutionary Divergence

Daniele Gravina, Antonios Liapis, Georgios N. Yannakakis

Subjects: Neural and Evolutionary Computing (cs.NE)

Inspired by the notion of surprise for unconventional discovery we introduce

a general search algorithm we name surprise search as a new method of

evolutionary divergent search. Surprise search is grounded in the divergent

search paradigm and is fabricated within the principles of evolutionary search.

The algorithm mimics the self-surprise cognitive process and equips

evolutionary search with the ability to seek solutions that deviate from

the algorithm’s expected behaviour. The predictive model of expected solutions

is based on historical trails of where the search has been and local

information about the search space. Surprise search is tested extensively in a

robot maze navigation task: experiments are held in four authored deceptive

mazes and in 60 generated mazes and compared against objective-based

evolutionary search and novelty search. The key findings of this study reveal

that surprise search is advantageous compared to the other two search

processes. In particular, it outperforms objective search and it is as

efficient as novelty search in all tasks examined. Most importantly, surprise

search is faster, on average, and more robust in solving the navigation problem

compared to any other algorithm examined. Finally, our analysis reveals that

surprise search explores the behavioural space more extensively and yields

higher population diversity compared to novelty search. What distinguishes

surprise search from other forms of divergent search, such as the search for

novelty, is its ability to diverge not from earlier and seen solutions but

rather from predicted and unseen points in the domain considered.

Where is my forearm? Clustering of body parts from simultaneous tactile and linguistic input using sequential mapping

Karla Stepanova, Matej Hoffmann, Zdenek Straka, Frederico B. Klein, Angelo Cangelosi, Michal Vavrecka

Comments: pp. 155-162

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Robotics (cs.RO)

Humans and animals are constantly exposed to a continuous stream of sensory

information from different modalities. At the same time, they form more

compressed representations like concepts or symbols. In species that use

language, this process is further structured by this interaction, where a

mapping between the sensorimotor concepts and linguistic elements needs to be

established. There is evidence that children might be learning language by

simply disambiguating potential meanings based on multiple exposures to

utterances in different contexts (cross-situational learning). In existing

models, the mapping between modalities is usually found in a single step by

directly using frequencies of referent and meaning co-occurrences. In this

paper, we present an extension of this one-step mapping and introduce a newly

proposed sequential mapping algorithm together with a publicly available Matlab

implementation. For demonstration, we have chosen a less typical scenario:

instead of learning to associate objects with their names, we focus on body

representations. A humanoid robot is receiving tactile stimulations on its

body, while at the same time listening to utterances of the body part names

(e.g., hand, forearm and torso). With the goal of arriving at the correct “body

categories”, we demonstrate how a sequential mapping algorithm outperforms

one-step mapping. In addition, the effects of data set size and noise in the

linguistic input are studied.
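
For orientation, the one-step baseline that the abstract contrasts against (mapping by raw co-occurrence frequency) might look like the sketch below. The function name and the `(word, category)` observation format are hypothetical, and this is not the authors' Matlab implementation or their sequential algorithm.

```python
from collections import Counter, defaultdict


def one_step_mapping(observations):
    """Map each word to the category it co-occurs with most often.

    observations: iterable of (word, category) pairs, e.g. an utterance heard
    while a tactile cluster was active (hypothetical data format).
    """
    counts = defaultdict(Counter)
    for word, category in observations:
        counts[word][category] += 1
    # Pick the most frequent category for every word in a single pass.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}
```

Each word is resolved independently here, so a noisy word cannot benefit from assignments already made for cleaner words; a sequential scheme would presumably exploit that ordering.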

Learning Local Receptive Fields and their Weight Sharing Scheme on Graphs

Jean-Charles Vialatte, Vincent Gripon, Gilles Coppin

Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

To achieve state-of-the-art results on challenges in vision, Convolutional

Neural Networks learn stationary filters that take advantage of the underlying

image structure. Our purpose is to propose an efficient layer formulation that

extends this property to any domain described by a graph. Namely, we use the

support of its adjacency matrix to design learnable weight sharing filters able

to exploit the underlying structure of signals. The proposed formulation makes

it possible to learn the weights of the filter as well as a scheme that

controls how they are shared across the graph. We perform validation

experiments with image datasets and show that these filters offer performances

comparable with convolutional ones.

Reading Twice for Natural Language Understanding

Dirk Weissenborn

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Despite the recent success of neural networks in tasks involving natural

language understanding (NLU) there has only been limited progress in some of

the fundamental challenges of NLU, such as the disambiguation of the meaning

and function of words in context. This work approaches this problem by

incorporating contextual information into word representations prior to

processing the task at hand. To this end we propose a general-purpose reading

architecture that is employed prior to a task-specific NLU model. It is

responsible for refining context-agnostic word representations with contextual

information and lends itself to the introduction of additional,

context-relevant information from external knowledge sources. We demonstrate

that previously non-competitive models benefit dramatically from employing

contextual representations, closing the gap between general-purpose reading

architectures and the state-of-the-art performance obtained with fine-tuned,

task-specific architectures. Apart from our empirical results we present a

comprehensive analysis of the computed representations which gives insights

into the kind of information added during the refinement process.

ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks

Denis A. Gudovskiy, Luca Rigazio

Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

In this paper we introduce ShiftCNN, a generalized low-precision architecture

for inference of multiplierless convolutional neural networks (CNNs). ShiftCNN

is based on a power-of-two weight representation and, as a result, performs

only shift and addition operations. Furthermore, ShiftCNN substantially reduces

computational cost of convolutional layers by precomputing convolution terms.

Such an optimization can be applied to any CNN architecture with a relatively

small codebook of weights and can decrease the number of product

operations by at least two orders of magnitude. The proposed architecture

targets custom inference accelerators and can be realized on FPGAs or ASICs.

Extensive evaluation on ImageNet shows that the state-of-the-art CNNs can be

converted without retraining into ShiftCNN with less than 1% drop in accuracy

when the proposed quantization algorithm is employed. RTL simulations,

targeting modern FPGAs, show that power consumption of convolutional layers is

reduced by a factor of 4 compared to conventional 8-bit fixed-point

architectures.
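
To make the "shift and addition only" idea concrete, here is a small sketch of a power-of-two weight representation. It is a simplified illustration under stated assumptions: each weight is rounded to a single term of the form sign * 2^(-shift), rounding is done in the log domain, and the shift range is capped at 7. It does not reproduce the paper's quantization algorithm or its precomputation of convolution terms.

```python
import math


def quantize_pow2(w: float, max_shift: int = 7):
    """Round a weight to sign * 2**(-shift) and return (sign, shift).

    Weights with magnitude above 1 are clamped to shift 0 (assumed range).
    """
    if w == 0.0:
        return 0, 0
    sign = 1 if w > 0 else -1
    # Nearest exponent in the log domain, clamped to the allowed range.
    shift = round(-math.log2(abs(w)))
    shift = min(max(shift, 0), max_shift)
    return sign, shift


def shift_multiply(x: int, sign: int, shift: int) -> int:
    """Multiply an integer activation by sign * 2**(-shift) with a right shift."""
    return sign * (x >> shift)


def shift_dot(xs, weights):
    """Dot product that uses only shifts and additions after quantization."""
    total = 0
    for x, w in zip(xs, weights):
        sign, shift = quantize_pow2(w)
        total += shift_multiply(x, sign, shift)
    return total
```

For example, `shift_dot([64, 32, 128], [0.25, -0.5, 1.0])` evaluates 64 >> 2, 32 >> 1 and 128 >> 0 and combines them with one subtraction and one addition, with no multiplier involved.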

Computer Vision and Pattern Recognition

Structured Light Phase Measuring Profilometry Pattern Design for Binary Spatial Light Modulators

Daniel L. Lau, Yu Zhang, Kai Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Structured light illumination is an active 3-D scanning technique based on

projecting/capturing a set of striped patterns and measuring the warping of the

patterns as they reflect off a target object’s surface. In the case of phase

measuring profilometry (PMP), the projected patterns are composed of a rolling

sinusoidal wave, but as a set of time-multiplexed patterns, PMP requires the

target surface to remain motionless or for scanning to be performed at such

high rates that any movement is small. But high speed scanning places a

significant burden on the projector electronics to produce contone patterns

inside of short exposure intervals. Binary patterns are, therefore, of great

value, but converting contone patterns into binary comes with significant risk.

As such, this paper introduces a contone-to-binary conversion algorithm for

deriving binary patterns that best mimic their contone counterparts.

Experimental results will show a greater than 3 times reduction in pattern

noise over traditional halftoning procedures.

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He

Comments: Tech report

Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Deep learning thrives with large neural networks and large datasets. However,

larger networks and larger datasets result in longer training times that impede

research and development progress. Distributed synchronous SGD offers a

potential solution to this problem by dividing SGD minibatches over a pool of

parallel workers. Yet to make this scheme efficient, the per-worker workload

must be large, which implies nontrivial growth in the SGD minibatch size. In

this paper, we empirically show that on the ImageNet dataset large minibatches

cause optimization difficulties, but when these are addressed the trained

networks exhibit good generalization. Specifically, we show no loss of accuracy

when training with large minibatch sizes up to 8192 images. To achieve this

result, we adopt a linear scaling rule for adjusting learning rates as a

function of minibatch size and develop a new warmup scheme that overcomes

optimization challenges early in training. With these simple techniques, our

Caffe2-based system trains ResNet-50 with a minibatch size of 8192 on 256 GPUs

in one hour, while matching small minibatch accuracy. Using commodity hardware,

our implementation achieves ~90% scaling efficiency when moving from 8 to 256

GPUs. This system enables us to train visual recognition models on

internet-scale data with high efficiency.
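
The linear scaling rule and warmup mentioned in the abstract above can be sketched as a learning-rate schedule. The reference rate of 0.1 per 256 images, the 5-epoch warmup length and the linear ramp shape are illustrative assumptions, not values stated in the abstract; any later decay steps are omitted.

```python
def learning_rate(epoch: float, minibatch_size: int,
                  base_lr: float = 0.1, base_batch: int = 256,
                  warmup_epochs: float = 5.0) -> float:
    """Learning rate under a linear scaling rule with a gradual warmup.

    base_lr, base_batch and warmup_epochs are assumed values for illustration.
    """
    # Linear scaling rule: the target rate grows in proportion to batch size.
    target_lr = base_lr * minibatch_size / base_batch
    if epoch >= warmup_epochs:
        return target_lr
    # Warmup: ramp linearly from the small-batch rate up to the target rate.
    return base_lr + (target_lr - base_lr) * epoch / warmup_epochs
```

With these assumed defaults, a minibatch of 8192 images starts at the small-batch rate and reaches a rate 32 times larger once the warmup ends, since 8192 / 256 = 32.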

An Efficient Approach for Object Detection and Tracking of Objects in a Video with Variable Background

Kumar S. Ray, Soma Chakraborty

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a novel approach to create an automated visual

surveillance system which is very efficient in detecting and tracking moving

objects in a video captured by a moving camera without any a priori information

about the captured scene. Separating foreground from background is a

challenging job in videos captured by a moving camera, as both foreground and

background information change in every consecutive frame of the image

sequence; thus a pseudo-motion is perceptible in the background. In the proposed

algorithm, the pseudo-motion in background is estimated and compensated using

phase correlation of consecutive frames based on the principle of Fourier shift

theorem. Then a method is proposed to model an acting background from recent

history of commonality of the current frame and the foreground is detected by

the differences between the background model and the current frame. Further

exploiting the recent history of dissimilarities of the current frame, actual

moving objects are detected in the foreground. Next, a two-stepped

morphological operation is proposed to refine the object region for an optimum

object size. Each object is attributed by its centroid, dimension and three

highest peaks of its gray value histogram. Finally, each object is tracked

using a Kalman filter based on its attributes. The major advantage of this

algorithm over most of the existing object detection and tracking algorithms is

that it does not require initialization of object position in the first frame

or training on sample data. Performance of the algorithm is tested

on benchmark videos containing variable background and very satisfactory results

are achieved. The performance of the algorithm is also comparable with some of

the state-of-the-art algorithms for object detection and tracking.
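
The background-motion step described above relies on phase correlation, which follows from the Fourier shift theorem: a translation in the image becomes a phase ramp in the frequency domain. The sketch below, which assumes NumPy is available, estimates a whole-frame integer translation between two grayscale frames. The function name is hypothetical, and the authors' actual compensation procedure may differ (for instance in sub-pixel handling).

```python
import numpy as np


def phase_correlation_shift(prev_frame, next_frame):
    """Estimate the integer (dy, dx) translation taking prev_frame to next_frame."""
    f_prev = np.fft.fft2(prev_frame)
    f_next = np.fft.fft2(next_frame)
    # Normalized cross-power spectrum keeps only the phase difference.
    cross_power = f_next * np.conj(f_prev)
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)
    # Its inverse transform peaks at the translation (Fourier shift theorem).
    correlation = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks in the upper half of each axis correspond to negative shifts.
    height, width = correlation.shape
    if dy > height // 2:
        dy -= height
    if dx > width // 2:
        dx -= width
    return int(dy), int(dx)
```

Once the shift is known, the previous frame can be translated by `(dy, dx)` before differencing, so that apparent background motion caused by the camera is largely cancelled.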

Generative Autotransporters

Jiqing Wu, Zhiwu Huang, Wen Li, Luc Van Gool

Comments: *First two authors made equal contributions. Submitted to NIPS on May 19, 2017

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

In this paper, we aim to introduce the classic Optimal Transport theory to

enhance deep generative probabilistic modeling. For this purpose, we design a

Generative Autotransporter (GAT) model with explicit distribution optimal

transport. Particularly, the GAT model owns a deep distribution transporter to

transfer the target distribution to a specific prior probability distribution,

which enables a regular decoder to generate target samples from the input data

that follows the transported prior distribution. With such a design, the GAT

model can be stably trained to generate novel data by merely using a very

simple \(l_1\) reconstruction loss function with a generalized manifold-based

Adam training algorithm. The experiments on two standard benchmarks demonstrate

its strong generation ability.

ToxTrac: a fast and robust software for tracking organisms

Alvaro Rodriquez, Hanqing Zhang, Jonatan Klaminder, Tomas Brodin, Patrik L. Andersson, Magnus Andersson

Comments: File contains supplementary materials (user guide)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

1. Behavioral analysis based on video recording is becoming increasingly

popular within research fields such as ecology, medicine, ecotoxicology, and

toxicology. However, the programs available to analyze the data that are

free of cost, user-friendly, versatile, robust, fast and provide reliable

statistics for different organisms (invertebrates, vertebrates and mammals) are

significantly limited.

2. We present an automated open-source executable software (ToxTrac) for

image-based tracking that can simultaneously handle several organisms monitored

in a laboratory environment. We compare the performance of ToxTrac with current

accessible programs on the web.

3. The main advantages of ToxTrac are: i) no specific knowledge of the

geometry of the tracked bodies is needed; ii) processing speed: ToxTrac can

operate at a rate >25 frames per second in HD videos using modern desktop

computers; iii) simultaneous tracking of multiple organisms in multiple arenas;

iv) integrated distortion correction and camera calibration; v) robust against

false positives; vi) preservation of individual identification if crossing

occurs; vii) useful statistics and heat maps in real scale are exported in

image, text and Excel formats.

4. ToxTrac can be used for high speed tracking of insects, fish, rodents or

other species, and provides useful locomotor information. We suggest using

ToxTrac for future studies of animal behavior independent of research area.

Download ToxTrac here: this https URL

Learning Deep Representations for Scene Labeling with Guided Supervision

Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Comments: 13 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Scene labeling is a challenging classification problem where each input image

requires a pixel-level prediction map. Recently, deep-learning-based methods

have shown their effectiveness in solving this problem. However, we argue that

the large intra-class variation provides ambiguous training information and

hinders the deep models’ ability to learn more discriminative deep feature

representations. Unlike existing methods that mainly utilize semantic context

for regularizing or smoothing the prediction map, we design novel supervisions

from semantic context for learning better deep feature representations. Two

types of semantic context, scene names of images and label map statistics of

image patches, are exploited to create label hierarchies between the original

classes and newly created subclasses as the learning supervisions. Such

subclasses show lower intra-class variation, and help CNNs detect more

meaningful visual patterns and learn more effective deep features. Novel

training strategies and network structures that take advantage of such label

hierarchies are introduced. Our proposed method is evaluated extensively on

four popular datasets, Stanford Background (8 classes), SIFTFlow (33 classes),

Barcelona (170 classes) and LM+Sun (232 classes), with 3 different

network structures, and shows state-of-the-art performance. The experiments

show that our proposed method makes deep models learn more discriminative

feature representations without increasing model size or complexity.

Automatic tracking of vessel-like structures from a single starting point

Dario Augusto Borges Oliveira, Laura Leal-Taixe, Raul Queiroz Feitosa, Bodo Rosenhahn

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The identification of vascular networks is an important topic in the medical

image analysis community. While most methods focus on single vessel tracking,

the few solutions that exist for tracking complete vascular networks are

usually computationally intensive and require a lot of user interaction. In

this paper we present a method to track full vascular networks iteratively

using a single starting point. Our approach is based on a cloud of sampling

points distributed over concentric spherical layers. We also propose a vessel

model and a metric of how well a sample point fits this model. Then, we

implement the network tracking as a min-cost flow problem, and propose a novel

optimization scheme to iteratively track the vessel structure by inherently

handling bifurcations and paths. The method was tested using both synthetic and

real images. On the 9 different data-sets of synthetic blood vessels, we

achieved maximum accuracies of more than 98%. We further use the synthetic

data-set to analyse the sensitivity of our method to parameter settings, showing

the robustness of the proposed algorithm. For real images, we used coronary,

carotid and pulmonary data to segment vascular structures and present the

visual results. Still for real images, we present numerical and visual results

for networks of nerve fibers in the olfactory system. Further visual results

also show the potential of our approach for identifying vascular network

topologies. The presented method delivers good results for the several

different datasets tested and has potential for segmenting vessel-like

structures. Also, the topology information, inherently extracted, can be used

for further analysis in computer-aided diagnosis and surgical planning.

Finally, the method’s modular aspect holds potential for problem-oriented

adjustments and improvements.

Image Captioning with Object Detection and Localization

Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman, Yongfeng Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automatically generating a natural language description of an image is a task

close to the heart of image understanding. In this paper, we present a

multi-model neural network method closely related to the human visual system

that automatically learns to describe the content of images. Our model consists

of two sub-models: an object detection and localization model, which extracts

information about objects and their spatial relationships in images,

respectively; and a deep recurrent neural network (RNN) based on long

short-term memory (LSTM) units with an attention mechanism for sentence

generation. Each word of the description will be automatically aligned to

different objects of the input image when it is generated. This is similar to

the attention mechanism of the human visual system. Experimental results on the

COCO dataset showcase the merit of the proposed method, which outperforms

previous benchmark models.

C-arm Tomographic Imaging Technique for Nephrolithiasis and Detection of Kidney Stones

Nuhad A. Malalla, Ying Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we investigated a C-arm tomographic technique as a new

three-dimensional (3D) kidney imaging method for nephrolithiasis and kidney stone

detection over a view angle of less than 180°. Our C-arm tomographic technique

provides a series of two-dimensional (2D) images with a single scan over a 40°

view angle. Experimental studies were performed with a kidney phantom that was

formed from a pig kidney with two embedded kidney stones. Different

reconstruction methods were developed for the C-arm tomographic technique to

generate 3D kidney information including: point by point back projection (BP),

filtered back projection (FBP), simultaneous algebraic reconstruction technique

(SART) and maximum likelihood expectation maximization (MLEM). A computer

simulation study was also done with a simulated 3D spherical object to evaluate

the reconstruction results. Preliminary results demonstrated the capability of

our C-arm tomographic technique to generate 3D kidney information for kidney

stone detection with low exposure of radiation. The kidney stones are visible

on reconstructed planes with identifiable shapes and sizes.

Leveraging deep neural networks to capture psychological representations

Joshua C. Peterson, Joshua T. Abbott, Thomas L. Griffiths

Comments: 22 pages, 3 figures, submitted for publication

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Artificial neural networks have seen a recent surge in popularity for their

ability to solve complex problems as well as or better than humans. In computer

vision, deep convolutional neural networks have become the standard for object

classification and image understanding due to their ability to learn efficient

representations of high-dimensional data. However, the relationship between

these representations and human psychological representations has remained

unclear. Here we evaluate the quantitative and qualitative nature of this

correspondence. We find that state-of-the-art object classification networks

provide a reasonable first approximation to human similarity judgments, but

fail to capture some of the structure of psychological representations. We show

that a simple transformation that corrects these discrepancies can be obtained

through convex optimization. Such representations provide a tool that can be

used to study human performance on complex tasks with naturalistic stimuli,

such as predicting the difficulty of learning novel categories. Our results

extend the scope of psychological experiments and computational modeling of

cognition by enabling tractable use of large natural stimulus sets.

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Few prior works study deep learning on point sets. PointNet by Qi et al. is a

pioneer in this direction. However, by design PointNet does not capture local

structures induced by the metric space points live in, limiting its ability to

recognize fine-grained patterns and generalizability to complex scenes. In this

work, we introduce a hierarchical neural network that applies PointNet

recursively on a nested partitioning of the input point set. By exploiting

metric space distances, our network is able to learn local features with

increasing contextual scales. With further observation that point sets are

usually sampled with varying densities, which results in greatly decreased

performance for networks trained on uniform densities, we propose novel set

learning layers to adaptively combine features from multiple scales.

Experiments show that our network called PointNet++ is able to learn deep point

set features efficiently and robustly. In particular, results significantly

better than state-of-the-art have been obtained on challenging benchmarks of 3D

point clouds.

Active Learning for Structured Prediction from Partially Labeled Data

Mehran Khodabandeh, Zhiwei Deng, Mostafa Saad, Shinichi Satoh, Greg Mori

Comments: This paper is submitted to ICCV 2017. 2nd and 3rd authors are in alphabetic order (equal contribution)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a general purpose active learning algorithm for structured

prediction, gathering labeled data for training a model that outputs a set of

related labels for an image or video. Active learning starts with a limited

initial training set, then iterates querying a user for labels on unlabeled

data and retraining the model. We propose a novel algorithm for selecting data

for labeling, choosing examples to maximize expected information gain based on

belief propagation inference. This is a general purpose method and can be

applied to a variety of tasks or models. As a specific example we demonstrate

this framework for learning to recognize human actions and group activities in

video sequences. Experiments show that our proposed algorithm outperforms

previous active learning methods and can achieve accuracy comparable to fully

supervised methods while utilizing significantly less labeled data.

Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network

Xiao Yang, Ersin Yumer, Paul Asente, Mike Kraley, Daniel Kifer, C. Lee Giles

Comments: CVPR 2017 Spotlight

Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We present an end-to-end, multimodal, fully convolutional network for

extracting semantic structures from document images. We consider document

semantic structure extraction as a pixel-wise segmentation task, and propose a

unified model that classifies pixels based not only on their visual appearance,

as in the traditional page segmentation task, but also on the content of

underlying text. Moreover, we propose an efficient synthetic document

generation process that we use to generate pretraining data for our network.

Once the network is trained on a large set of synthetic documents, we fine-tune

the network on unlabeled real documents using a semi-supervised approach. We

systematically study the optimum network architecture and show that both our

multimodal approach and the synthetic data pretraining significantly boost the

performance.

Low-shot learning with large-scale diffusion

Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou

Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

This paper considers the problem of inferring image labels for which only a

few labelled examples are available at training time. This setup is often

referred to as low-shot learning in the literature, where a standard approach

is to re-train the last few layers of a convolutional neural network learned on

separate classes. We consider a semi-supervised setting in which we exploit a

large collection of images to support label propagation. This is made possible

by leveraging the recent advances on large-scale similarity graph construction.

We show that despite its conceptual simplicity, scaling up label propagation to

hundreds of millions of images leads to state-of-the-art accuracy in the

low-shot learning regime.
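
As a toy picture of label propagation over a similarity graph, the sketch below diffuses class scores from a few labelled seeds to their unlabelled neighbors. It is a generic clamped-averaging variant written for illustration; the graph format, function name, iteration count and update rule are assumptions, and it says nothing about how the paper builds or scales its graph.

```python
def propagate_labels(neighbors, seed_labels, num_classes, iterations=20):
    """Diffuse class scores from labelled seeds over a neighbor graph.

    neighbors: dict mapping node -> non-empty list of neighboring nodes.
    seed_labels: dict mapping each labelled node -> class index.
    Returns a dict mapping every node to its highest-scoring class.
    """
    scores = {n: [0.0] * num_classes for n in neighbors}
    for node, label in seed_labels.items():
        scores[node][label] = 1.0
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in seed_labels:
                updated[node] = scores[node]  # Seeds stay clamped.
                continue
            # Each unlabelled node takes the mean score of its neighbors.
            updated[node] = [
                sum(scores[a][c] for a in adjacent) / len(adjacent)
                for c in range(num_classes)
            ]
        scores = updated
    return {n: max(range(num_classes), key=lambda c: scores[n][c])
            for n in neighbors}
```

On a five-node chain with the two end nodes labelled differently, each inner node next to a seed ends up with that seed's class, which is the behavior one would hope a large unlabelled image collection provides at scale.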

CoMaL Tracking: Tracking Points at the Object Boundaries

Santhosh K. Ramakrishnan, Swarna Kamlam Ravindran, Anurag Mittal

Comments: 10 pages, 10 figures, to appear in 1st Joint BMTT-PETS Workshop on Tracking and Surveillance, CVPR 2017

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Traditional point tracking algorithms such as the KLT use local 2D

information aggregation for feature detection and tracking, due to which their

performance degrades at the object boundaries that separate multiple objects.

Recently, CoMaL Features have been proposed that handle such a case. However,

they proposed a simple tracking framework where the points are re-detected in

each frame and matched. This is inefficient and may also lose many points that

are not re-detected in the next frame. We propose a novel tracking algorithm to

accurately and efficiently track CoMaL points. For this, the level line segment

associated with the CoMaL points is matched to MSER segments in the next frame

using shape-based matching and the matches are further filtered using

texture-based matching. Experiments show improvements over a simple

re-detect-and-match framework as well as KLT in terms of speed/accuracy on

different real-world applications, especially at the object boundaries.

Learning Local Receptive Fields and their Weight Sharing Scheme on Graphs

Jean-Charles Vialatte , Vincent Gripon , Gilles Coppin Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

To achieve state-of-the-art results on challenges in vision, Convolutional

Neural Networks learn stationary filters that take advantage of the underlying

image structure. Our purpose is to propose an efficient layer formulation that

extends this property to any domain described by a graph. Namely, we use the

support of its adjacency matrix to design learnable weight sharing filters able

to exploit the underlying structure of signals. The proposed formulation makes

it possible to learn the weights of the filter as well as a scheme that

controls how they are shared across the graph. We perform validation

experiments with image datasets and show that these filters offer performances

comparable with convolutional ones.

Training Quantized Nets: A Deeper Understanding

Hao Li , Soham De , Zheng Xu , Christoph Studer , Hanan Samet , Tom Goldstein Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV)

Currently, deep neural networks are deployed on low-power embedded devices by

first training a full-precision model using powerful computing hardware, and

then deriving a corresponding low-precision model for efficient inference on

such systems. However, training models directly with coarsely quantized weights

is a key step towards learning on embedded platforms that have limited

computing resources, memory capacity, and power consumption. Numerous recent

publications have studied methods for training quantized networks, but these

studies have mostly been empirical. In this work, we investigate training

methods for quantized neural networks from a theoretical viewpoint. We first

explore accuracy guarantees for training methods under convexity assumptions.

We then look at the behavior of algorithms for non-convex problems, and we show

that training algorithms that exploit high-precision representations have an

important annealing property that purely quantized training methods lack, which

explains many of the observed empirical differences between these types of

algorithms.

Artificial Intelligence

What Does a Belief Function Believe In ?

Andrzej Matuszewski , Mieczysław A. Kłopotek

Comments: 13 pages

Subjects: Artificial Intelligence (cs.AI)

The conditioning in the Dempster-Shafer Theory of Evidence has been defined

(by Shafer \cite{Shafer:90}) as the combination of a belief function and of an

“event” via the Dempster rule.

On the other hand, Shafer \cite{Shafer:90} gives a “probabilistic”

interpretation of a belief function (hence indirectly its derivation from a

sample). Given the fact that conditional probability distribution of a

sample-derived probability distribution is a probability distribution derived

from a subsample (selected on the grounds of a conditioning event), the paper

investigates the empirical nature of the Dempster rule of combination.

It is demonstrated that the so-called “conditional” belief function is not a

belief function given an event but rather a belief function given manipulation

of the original empirical data. Given this, an interpretation of belief functions

different from that of Shafer is proposed. Algorithms for construction of

belief networks from data are derived for this interpretation.

Responsible Autonomy

Virginia Dignum

Comments: IJCAI2017 (International Joint Conference on Artificial Intelligence)

Subjects: Artificial Intelligence (cs.AI)

As intelligent systems are increasingly making decisions that directly affect

society, perhaps the most important upcoming research direction in AI is to

rethink the ethical implications of their actions. Means are needed to

integrate moral, societal and legal values with technological developments in

AI, both during the design process as well as part of the deliberation

algorithms employed by these systems. In this paper, we describe leading ethics

theories and propose alternative ways to ensure ethical behavior by artificial

systems. Given that ethics are dependent on the socio-cultural context and are

often only implicit in deliberation processes, methodologies are needed to

elicit the values held by designers and stakeholders, and to make these

explicit, leading to better understanding of and trust in artificial autonomous

systems.

Regular Boardgames

Jakub Kowalski , Jakub Sutowicz , Marek Szykuła Subjects : Artificial Intelligence (cs.AI)

We present an initial version of the Regular Boardgames general game description

language. This stands as an extension of the Simplified Boardgames language. Our

language is designed to be able to express the rules of a majority of popular

boardgames including the complex rules such as promotions, castling, en

passant, jump captures, liberty captures, and obligatory moves. The language

describes all the above through one consistent general mechanism based on

regular expressions, without using exceptions or ad hoc rules.

Predictive Coding-based Deep Dynamic Neural Network for Visuomotor Learning

Jungsik Hwang , Jinhyung Kim , Ahmadreza Ahmadi , Minkyu Choi , Jun Tani

Comments: Accepted at the 7th Joint IEEE International Conference of Developmental Learning and Epigenetic Robotics (ICDL-EpiRob 2017)

Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO); Neurons and Cognition (q-bio.NC)

This study presents a dynamic neural network model based on the predictive

coding framework for perceiving and predicting the dynamic visuo-proprioceptive

patterns. In our previous study [1], we have shown that the deep dynamic neural

network model was able to coordinate visual perception and action generation in

a seamless manner. In the current study, we extended the previous model under

the predictive coding framework to endow the model with a capability of

perceiving and predicting dynamic visuo-proprioceptive patterns as well as a

capability of inferring intention behind the perceived visuomotor information

through minimizing prediction error. A set of synthetic experiments were

conducted in which a robot learned to imitate the gestures of another robot in

a simulation environment. The experimental results showed that with given

intention states, the model was able to mentally simulate the possible incoming

dynamic visuo-proprioceptive patterns in a top-down process without the inputs

from the external environment. Moreover, the results highlighted the role of

minimizing prediction error in inferring underlying intention of the perceived

visuo-proprioceptive patterns, supporting the predictive coding account of the

mirror neuron systems. The results also revealed that minimizing prediction

error in one modality induced the recall of the corresponding representation of

another modality acquired during the consolidative learning of raw-level

visuo-proprioceptive patterns.

Seamless Integration and Coordination of Cognitive Skills in Humanoid Robots: A Deep Learning Approach

Jungsik Hwang , Jun Tani

Comments: Accepted in the IEEE Transactions on Cognitive and Developmental Systems (TCDS), 2017

Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

This study investigates how adequate coordination among the different

cognitive processes of a humanoid robot can be developed through end-to-end

learning of direct perception of visuomotor stream. We propose a deep dynamic

neural network model built on a dynamic vision network, a motor generation

network, and a higher-level network. The proposed model was designed to process

and to integrate direct perception of dynamic visuomotor patterns in a

hierarchical model characterized by different spatial and temporal constraints

imposed on each level. We conducted synthetic robotic experiments in which a

robot learned to read a human’s intention by observing gestures and then

to generate the corresponding goal-directed actions. Results verify that the

proposed model is able to learn the tutored skills and to generalize them to

novel situations. The model showed synergic coordination of perception, action

and decision making, and it integrated and coordinated a set of cognitive

skills including visual perception, intention reading, attention switching,

working memory, action preparation and execution in a seamless manner. Analysis

reveals that coherent internal representations emerged at each level of the

hierarchy. Higher-level representation reflecting actional intention developed

by means of continuous integration of the lower-level visuo-proprioceptive

stream.

Design and Implementation of Modified Fuzzy based CPU Scheduling Algorithm

Rajani Kumari , Vivek Kumar Sharma , Sandeep Kumar

Comments: 6 Pages

Journal-ref: International Journal of Computer Applications, Volume 77, No 17, September 2013

Subjects: Operating Systems (cs.OS); Artificial Intelligence (cs.AI)

CPU scheduling is the basis of multiprogramming. Scheduling is a process that

decides the order of tasks from a set of multiple tasks that are ready to execute.

There are a number of CPU scheduling algorithms available, but it is very

difficult to decide which one is better. This paper discusses the design

and implementation of a modified fuzzy-based CPU scheduling algorithm. This paper

presents a new set of fuzzy rules. It demonstrates that scheduling done with the new

priority improves average waiting time and average turnaround time.

Reading Twice for Natural Language Understanding

Dirk Weissenborn Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Despite the recent success of neural networks in tasks involving natural

language understanding (NLU) there has only been limited progress in some of

the fundamental challenges of NLU, such as the disambiguation of the meaning

and function of words in context. This work approaches this problem by

incorporating contextual information into word representations prior to

processing the task at hand. To this end we propose a general-purpose reading

architecture that is employed prior to a task-specific NLU model. It is

responsible for refining context-agnostic word representations with contextual

information and lends itself to the introduction of additional,

context-relevant information from external knowledge sources. We demonstrate

that previously non-competitive models benefit dramatically from employing

contextual representations, closing the gap between general-purpose reading

architectures and the state-of-the-art performance obtained with fine-tuned,

task-specific architectures. Apart from our empirical results we present a

comprehensive analysis of the computed representations which gives insights

into the kind of information added during the refinement process.

Dynamic Discovery of Type Classes and Relations in Semantic Web Data

Serkan Ayvaz , Mehmet Aydar Subjects : Databases (cs.DB) ; Artificial Intelligence (cs.AI)

The continuing development of Semantic Web technologies and the increasing

user adoption in recent years have accelerated progress in incorporating

explicit semantics with data on the Web. With the rapidly growing RDF (Resource

Description Framework) data on the Semantic Web, processing large semantic

graph data has become more challenging. Constructing a summary graph structure

from the raw RDF can help obtain semantic type relations and reduce the

computational complexity for graph processing purposes. In this paper, we

addressed the problem of graph summarization in RDF graphs, and we proposed an

approach for building summary graph structures automatically from RDF graph

data. Moreover, we introduced a measure to help discover optimum class

dissimilarity thresholds and an effective method to discover the type classes

automatically. In future work, we plan to investigate further improvement

options on the scalability of the proposed method.

Where is my forearm? Clustering of body parts from simultaneous tactile and linguistic input using sequential mapping

Karla Stepanova , Matej Hoffmann , Zdenek Straka , Frederico B. Klein , Angelo Cangelosi , Michal Vavrecka

Comments: pp. 155-162

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Robotics (cs.RO)

Humans and animals are constantly exposed to a continuous stream of sensory

information from different modalities. At the same time, they form more

compressed representations like concepts or symbols. In species that use

language, this process is further structured by this interaction, where a

mapping between the sensorimotor concepts and linguistic elements needs to be

established. There is evidence that children might be learning language by

simply disambiguating potential meanings based on multiple exposures to

utterances in different contexts (cross-situational learning). In existing

models, the mapping between modalities is usually found in a single step by

directly using frequencies of referent and meaning co-occurrences. In this

paper, we present an extension of this one-step mapping and introduce a newly

proposed sequential mapping algorithm together with a publicly available Matlab

implementation. For demonstration, we have chosen a less typical scenario:

instead of learning to associate objects with their names, we focus on body

representations. A humanoid robot is receiving tactile stimulations on its

body, while at the same time listening to utterances of the body part names

(e.g., hand, forearm and torso). With the goal of arriving at the correct “body

categories”, we demonstrate how a sequential mapping algorithm outperforms

one-step mapping. In addition, the effects of data set size and noise in the

linguistic input are studied.

Distribution-Free One-Pass Learning

Peng Zhao , Zhi-Hua Zhou Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In many large-scale machine learning applications, data are accumulated with

time, and thus, an appropriate model should be able to update in an online

paradigm. Moreover, as the whole data volume is unknown when constructing the

model, it is desirable to scan each data item only once, with storage

independent of the data volume. It is also noteworthy that the underlying

distribution may change during the data accumulation procedure. To handle such

tasks, in this paper we propose DFOP, a distribution-free one-pass learning

approach. This approach works well when distribution change occurs during data

accumulation, without requiring prior knowledge about the change. Every data

item can be discarded once it has been scanned. Besides, a theoretical guarantee

shows that the estimation error, under a mild assumption, decreases until

convergence with high probability. The performance of DFOP for both regression

and classification is validated in experiments.

Generalized Value Iteration Networks: Life Beyond Lattices

Sufeng Niu , Siheng Chen , Hanyu Guo , Colin Targonski , Melissa C. Smith , Jelena Kovačević

Comments: 14 pages, conference

Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

In this paper, we introduce a generalized value iteration network (GVIN),

which is an end-to-end neural network planning module. GVIN emulates the value

iteration algorithm by using a novel graph convolution operator, which enables

GVIN to learn and plan on irregular spatial graphs. We propose three novel

differentiable kernels as graph convolution operators and show that the

embedding based kernel achieves the best performance. We further propose

episodic Q-learning, an improvement upon traditional n-step Q-learning that

stabilizes training for networks that contain a planning module. Lastly, we

evaluate GVIN on planning problems in 2D mazes, irregular graphs, and

real-world street networks, showing that GVIN generalizes well for both

arbitrary graphs and unseen graphs of larger scale and outperforms a naive

generalization of VIN (discretizing a spatial graph into a 2D image).

Information Retrieval

On the Robustness of Deep Convolutional Neural Networks for Music Classification

Keunwoo Choi , George Fazekas , Kyunghyun Cho , Mark Sandler Subjects : Information Retrieval (cs.IR) ; Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)

Deep neural networks (DNN) have been successfully applied for music

classification including music tagging. However, there are several open

questions regarding generalisation and best practices in the choice of network

architectures, hyper-parameters and input representations. In this article, we

investigate specific aspects of neural networks to deepen our understanding of

their properties. We analyse and (re-)validate a large music tagging dataset to

investigate the reliability of training and evaluation. We perform

comprehensive experiments involving audio preprocessing using different

time-frequency representations, logarithmic magnitude compression, frequency

weighting and scaling. Using a trained network, we compute label vector

similarities, which are compared to groundtruth similarity.

The results highlight several important aspects of music tagging and neural

networks. We show that networks can be effective despite relatively large

error rates in groundtruth datasets. We subsequently show that many commonly

used input preprocessing techniques are redundant except magnitude compression.

Lastly, the analysis of our trained network provides valuable insight into the

relationships between music tags. These results highlight the benefit of using

data-driven methods to address automatic music tagging.

Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017)

Muthu Kumar Chandrasekaran , Kokil Jaidka , Philipp Mayr

Comments: 2 pages, workshop paper accepted at the SIGIR 2017

Subjects: Digital Libraries (cs.DL); Information Retrieval (cs.IR)

The large scale of scholarly publications poses a challenge for scholars in

information seeking and sensemaking. Bibliometrics, information retrieval (IR),

text mining and NLP techniques could help in these search and look-up

activities, but are not yet widely used. This workshop is intended to stimulate

IR researchers and digital library professionals to elaborate on new approaches

in natural language processing, information retrieval, scientometrics, text

mining and recommendation techniques that can advance the state-of-the-art in

scholarly document understanding, analysis, and retrieval at scale. The BIRNDL

workshop at SIGIR 2017 will incorporate an invited talk, paper sessions and the

third edition of the Computational Linguistics (CL) Scientific Summarization

Shared Task.

Computation and Language

Reading Twice for Natural Language Understanding

Dirk Weissenborn Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Despite the recent success of neural networks in tasks involving natural

language understanding (NLU) there has only been limited progress in some of

the fundamental challenges of NLU, such as the disambiguation of the meaning

and function of words in context. This work approaches this problem by

incorporating contextual information into word representations prior to

processing the task at hand. To this end we propose a general-purpose reading

architecture that is employed prior to a task-specific NLU model. It is

responsible for refining context-agnostic word representations with contextual

information and lends itself to the introduction of additional,

context-relevant information from external knowledge sources. We demonstrate

that previously non-competitive models benefit dramatically from employing

contextual representations, closing the gap between general-purpose reading

architectures and the state-of-the-art performance obtained with fine-tuned,

task-specific architectures. Apart from our empirical results we present a

comprehensive analysis of the computed representations which gives insights

into the kind of information added during the refinement process.

The Algorithmic Inflection of Russian and Generation of Grammatically Correct Text

T.M. Sadykov , T.A. Zhukov

Comments: 9 pages, 1 figure

Subjects: Computation and Language (cs.CL)

We present a deterministic algorithm for Russian inflection. This algorithm

is implemented in a publicly available web-service www.passare.ru which

provides functions for inflection of single words, word matching and synthesis

of grammatically correct Russian text. The inflectional functions have been

tested against the annotated corpus of Russian language OpenCorpora.

Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization

Shuming Ma , Xu Sun , Jingjing Xu , Houfeng Wang , Wenjie Li , Qi Su

Comments: Accepted by ACL

Subjects: Computation and Language (cs.CL)

Current Chinese social media text summarization models are based on an

encoder-decoder framework. Although its generated summaries are similar to

source texts literally, they have low semantic relevance. In this work, our

goal is to improve semantic relevance between source texts and summaries for

Chinese social media summarization. We introduce a Semantic Relevance Based

neural model to encourage high semantic similarity between texts and summaries.

In our model, the source text is represented by a gated attention encoder,

while the summary representation is produced by a decoder. Besides, the

similarity score between the representations is maximized during training. Our

experiments show that the proposed model outperforms baseline systems on a

social media corpus.

Content-Based Table Retrieval for Web Queries

Zhao Yan , Duyu Tang , Nan Duan , Junwei Bao , Yuanhua Lv , Ming Zhou , Zhoujun Li Subjects : Computation and Language (cs.CL)

Understanding the connections between unstructured text and semi-structured

tables is an important yet neglected problem in natural language processing. In

this work, we focus on content-based table retrieval. Given a query, the task

is to find the most relevant table from a collection of tables. Further

progress towards improving this area requires powerful models of semantic

matching and richer training and evaluation resources. To remedy this, we

present a ranking based approach, and implement both carefully designed

features and neural network architectures to measure the relevance between a

query and the content of a table. Furthermore, we release an open-domain

dataset that includes 21,113 web queries for 273,816 tables. We conduct

comprehensive experiments on both real world and synthetic datasets. Results

verify the effectiveness of our approach and present the challenges for this

task.

Context encoders as a simple but powerful extension of word2vec

Franziska Horn

Comments: ACL 2017 2nd Workshop on Representation Learning for NLP

Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

With a simple architecture and the ability to learn meaningful word

embeddings efficiently from texts containing billions of words, word2vec

remains one of the most popular neural language models used today. However, as

only a single embedding is learned for every word in the vocabulary, the model

fails to optimally represent words with multiple meanings. Additionally, it is

not possible to create embeddings for new (out-of-vocabulary) words on the

spot. Based on an intuitive interpretation of the continuous bag-of-words

(CBOW) word2vec model’s negative sampling training objective in terms of

predicting context based similarities, we motivate an extension of the model we

call context encoders (ConEc). By multiplying the matrix of trained word2vec

embeddings with a word’s average context vector, out-of-vocabulary (OOV)

embeddings and representations for a word with multiple meanings can be created

based on the word’s local contexts. The benefits of this approach are

illustrated by using these word embeddings as features in the CoNLL 2003 named

entity recognition (NER) task.
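
The matrix-times-context-vector step described above can be sketched as follows. The toy vocabulary, the two-dimensional embeddings, and the helper names are hypothetical, and the way the average context vector is built here (normalized counts of in-vocabulary context words) is an assumption rather than the paper's exact definition.

```python
def average_context_vector(contexts, vocab):
    """Return normalized counts of in-vocabulary context words (assumed definition)."""
    counts = [0.0] * len(vocab)
    total = 0
    for context in contexts:
        for word in context:
            if word in vocab:
                counts[vocab[word]] += 1.0
                total += 1
    if total == 0:
        return counts
    return [c / total for c in counts]


def conec_embedding(embeddings, context_vector):
    """Multiply the (vocab x dim) embedding matrix, transposed, by the context vector."""
    dim = len(embeddings[0])
    return [
        sum(context_vector[i] * embeddings[i][k] for i in range(len(embeddings)))
        for k in range(dim)
    ]


# Hypothetical trained embeddings for a three-word vocabulary.
vocab = {"river": 0, "money": 1, "loan": 2}
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# Local contexts observed for an out-of-vocabulary word.
contexts = [["money", "loan"], ["loan", "money", "river"]]
context_vector = average_context_vector(contexts, vocab)
oov_embedding = conec_embedding(embeddings, context_vector)
print(oov_embedding)
```

The resulting vector is a weighted average of the embeddings of the words seen around the new word, so in this toy case it lands close to the "money" and "loan" embeddings. Restricting `contexts` to a single occurrence would give a sense-specific representation for a word with several meanings.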

Where is my forearm? Clustering of body parts from simultaneous tactile and linguistic input using sequential mapping

Karla Stepanova , Matej Hoffmann , Zdenek Straka , Frederico B. Klein , Angelo Cangelosi , Michal Vavrecka

Comments: pp. 155-162

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Robotics (cs.RO)

Humans and animals are constantly exposed to a continuous stream of sensory

information from different modalities. At the same time, they form more

compressed representations like concepts or symbols. In species that use

language, this process is further structured by this interaction, where a

mapping between the sensorimotor concepts and linguistic elements needs to be

established. There is evidence that children might be learning language by

simply disambiguating potential meanings based on multiple exposures to

utterances in different contexts (cross-situational learning). In existing

models, the mapping between modalities is usually found in a single step by

directly using frequencies of referent and meaning co-occurrences. In this

paper, we present an extension of this one-step mapping and introduce a newly

proposed sequential mapping algorithm together with a publicly available Matlab

implementation. For demonstration, we have chosen a less typical scenario:

instead of learning to associate objects with their names, we focus on body

representations. A humanoid robot is receiving tactile stimulations on its

body, while at the same time listening to utterances of the body part names

(e.g., hand, forearm and torso). With the goal of arriving at the correct “body

categories”, we demonstrate how a sequential mapping algorithm outperforms

one-step mapping. In addition, the effects of data set size and noise in the

linguistic input are studied.

Distributed, Parallel, and Cluster Computing

Study of Vital Data Analysis Platform Using Wearable Sensor

Yoji Yamato

Comments: 5 pages, 2 figures, IEICE Technical Report, SC2016-34, Mar. 2017. arXiv admin note: substantial text overlap with arXiv:1704.05573

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY)

In this paper, we propose a vital data analysis platform which resolves

existing problems to utilize vital data for real-time actions. Recently, IoT

technologies have progressed, but in the healthcare area, real-time actions

based on analyzed vital data have not yet been sufficiently considered. The obstacles

are the proper use of stream / micro-batch analysis methods and

network cost. To resolve existing problems, we propose our vital data analysis

platform. Our platform collects vital data of Electrocardiograph and

acceleration using an example of wearable vital sensor and analyzes them to

extract posture, fatigue and relaxation in smart phones or cloud. Our platform

can show analyzed dangerous posture or fatigue level change. We implemented the

platform and we are now preparing a field test.

Clique Gossiping

Yang Liu , Bo Li , Brian Anderson , Guodong Shi Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

This paper proposes and investigates a framework for clique gossip protocols.

As complete subnetworks, the existence of cliques is ubiquitous in various

social, computer, and engineering networks. By clique gossiping, nodes interact

with each other along a sequence of cliques. Clique-gossip protocols are

defined as arbitrary linear node interactions where node states are vectors

evolving as linear dynamical systems. Such protocols become clique-gossip

averaging algorithms when node states are scalars under averaging rules. We

generalize the classical notion of line graph to capture the essential node

interaction structure induced by both the underlying network and the specific

clique sequence. We prove a fundamental eigenvalue invariance principle for

periodic clique-gossip protocols, which implies that any permutation of the

clique sequence leads to the same spectrum for the overall state transition

when the generalized line graph contains no cycle. We also prove that for a

network with \(n\) nodes, cliques with smaller sizes determined by factors of \(n\)

can always be constructed leading to finite-time convergent clique-gossip

averaging algorithms, provided \(n\) is not a prime number. Particularly, such

finite-time convergence can be achieved with cliques of equal size \(m\) if and

only if \(n\) is divisible by \(m\) and they have exactly the same prime factors. A

proven fastest finite-time convergent clique-gossip algorithm is constructed

for clique-gossiping using size-\(m\) cliques. Additionally, the acceleration

effects of clique-gossiping are illustrated via numerical examples.
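
To make the finite-time convergence claim concrete, here is a minimal sketch of clique-gossip averaging on four nodes with cliques of size two, a case where the network size is divisible by the clique size and both share the same prime factors. The `clique_gossip` helper, the initial states, and the particular clique sequence are assumptions chosen for illustration and are not taken from the paper.

```python
def clique_gossip(states, clique_sequence):
    """Visit cliques in order; every node in a clique adopts that clique's mean."""
    states = list(states)
    for clique in clique_sequence:
        mean = sum(states[i] for i in clique) / len(clique)
        for i in clique:
            states[i] = mean
    return states


# Four nodes with scalar states; the network average is 4.0.
initial = [1.0, 3.0, 5.0, 7.0]

# Hypothetical clique sequence: pair up neighbors, then pair across the pairs.
cliques = [(0, 1), (2, 3), (0, 2), (1, 3)]

final = clique_gossip(initial, cliques)
print(final)  # [4.0, 4.0, 4.0, 4.0]
```

After one pass through the four cliques every node holds the exact network average, so no further gossip rounds are needed. Standard pairwise gossip over randomly chosen pairs, by contrast, generally approaches the average only asymptotically.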

Asynchronous Pattern Formation: the effects of a rigorous approach

Serafino Cicerone , Gabriele Di Stefano , Alfredo Navarra

Comments: 41 pages

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Given a multiset F of points in the Euclidean plane and a set R of robots

such that |R|=|F|, the Pattern Formation (PF) problem asks for a distributed

algorithm that moves robots so as to reach a configuration similar to F.

Similarity means that robots must be disposed as F regardless of translations,

rotations, reflections, uniform scalings. Initially, each robot occupies a

distinct position. When active, a robot operates in standard Look-Compute-Move

cycles. Robots are asynchronous, oblivious, anonymous, silent and execute the

same distributed algorithm. So far, the problem has been mainly addressed by

assuming chirality, that is robots share a common left-right orientation. We

are interested in removing such a restriction. While working on the subject, we

faced several issues that required close attention. We deeply investigated how

such difficulties were overcome in the literature, revealing that crucial

arguments for the correctness proof of the existing algorithms have been

neglected. Here we design a new deterministic distributed algorithm that solves

PF for any pattern when asynchronous robots start from asymmetric

configurations, without chirality. The focus on asymmetric configurations might

be perceived as an over-simplification of the subject due to the common feeling

with the PF problem by the scientific community. However, we demonstrate that

this is not the case. The systematic lack of rigorous arguments with respect to

necessary conditions required for providing correctness proofs deeply affects

the validity as well as the relevance of strategies proposed in the literature.

Our new methodology is characterized by the use of logical predicates in order

to formally describe our algorithm as well as its correctness.

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Priya Goyal , Piotr Dollár , Ross Girshick , Pieter Noordhuis , Lukasz Wesolowski , Aapo Kyrola , Andrew Tulloch , Yangqing Jia , Kaiming He

Comments: Tech report

Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Deep learning thrives with large neural networks and large datasets. However,

larger networks and larger datasets result in longer training times that impede

research and development progress. Distributed synchronous SGD offers a

potential solution to this problem by dividing SGD minibatches over a pool of

parallel workers. Yet to make this scheme efficient, the per-worker workload

must be large, which implies nontrivial growth in the SGD minibatch size. In

this paper, we empirically show that on the ImageNet dataset large minibatches

cause optimization difficulties, but when these are addressed the trained

networks exhibit good generalization. Specifically, we show no loss of accuracy

when training with large minibatch sizes up to 8192 images. To achieve this

result, we adopt a linear scaling rule for adjusting learning rates as a

function of minibatch size and develop a new warmup scheme that overcomes

optimization challenges early in training. With these simple techniques, our

Caffe2-based system trains ResNet-50 with a minibatch size of 8192 on 256 GPUs

in one hour, while matching small minibatch accuracy. Using commodity hardware,

our implementation achieves ~90% scaling efficiency when moving from 8 to 256

GPUs. This system enables us to train visual recognition models on

internet-scale data with high efficiency.
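The linear scaling rule and the warmup scheme named in the abstract can be sketched as a learning-rate schedule. This is a minimal illustration, not the authors' Caffe2 code: the reference rate of 0.1 per 256 images, the five-epoch warmup length, and the function name `scaled_learning_rate` are assumptions chosen for the example.

```python
def scaled_learning_rate(epoch, minibatch_size,
                         base_lr=0.1, base_batch=256, warmup_epochs=5):
    """Learning rate at a (possibly fractional) epoch.

    base_lr, base_batch and warmup_epochs are illustrative assumptions,
    not values stated in the abstract.
    """
    # Linear scaling rule: grow the rate by the same factor as the minibatch.
    target_lr = base_lr * minibatch_size / base_batch
    if epoch >= warmup_epochs:
        return target_lr
    # Gradual warmup: ramp linearly from the small-batch rate to the target.
    fraction = epoch / warmup_epochs
    return base_lr + fraction * (target_lr - base_lr)


for epoch in (0, 2.5, 5, 10):
    print(epoch, scaled_learning_rate(epoch, minibatch_size=8192))
```

Starting from the small-batch rate and ramping up is one plausible way to avoid the early-training instability the abstract mentions; a real schedule would also decay the rate later in training.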

Learning

Principled Detection of Out-of-Distribution Examples in Neural Networks

Shiyu Liang , Yixuan Li , R. Srikant Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)

We consider the problem of detecting out-of-distribution examples in neural

networks. We propose ODIN, a simple and effective out-of-distribution detector

for neural networks, that does not require any change to a pre-trained model.

Our method is based on the observation that using temperature scaling and

adding small perturbations to the input can separate the softmax score

distributions of in- and out-of-distribution samples, allowing for more

effective detection. We show in a series of experiments that our approach is

compatible with diverse network architectures and datasets. It consistently

outperforms the baseline approach[1] by a large margin, establishing a new

state-of-the-art performance on this task. For example, ODIN reduces the false

positive rate from the baseline 34.7% to 4.3% on the DenseNet (applied to

CIFAR-10) when the true positive rate is 95%. We theoretically analyze the

method and prove that performance improvement is guaranteed under mild

conditions on the image distributions.
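The two ingredients named above, temperature scaling and a small input perturbation, can be sketched on a toy model. The example below assumes a linear softmax classifier as a stand-in for a pre-trained network, so that the input gradient has a closed form; the function names and the default values of `temperature` and `epsilon` are illustrative assumptions, not settings taken from the abstract.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(weights, bias, x):
    """Logits of a toy linear classifier (stand-in for a pre-trained net)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def odin_style_score(weights, bias, x, temperature=1000.0, epsilon=0.001):
    """Max softmax probability after temperature scaling and perturbation."""
    probs = softmax(linear_logits(weights, bias, x), temperature)
    k = max(range(len(probs)), key=probs.__getitem__)
    # For a linear model, d log p_k / dx = (W[k] - sum_j p_j W[j]) / T.
    grad = [(weights[k][d] - sum(p * row[d] for p, row in zip(probs, weights)))
            / temperature for d in range(len(x))]
    # Small step in the direction that raises the predicted class score.
    x_tilde = [xi + epsilon * sign(g) for xi, g in zip(x, grad)]
    return max(softmax(linear_logits(weights, bias, x_tilde), temperature))


W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
print(odin_style_score(W, b, [2.0, 1.0], temperature=1.0, epsilon=0.01))
```

An input would then be flagged as out-of-distribution when its score falls below a threshold chosen on held-out data; with a deep network the gradient would come from automatic differentiation rather than the closed form used here.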

Learning Local Receptive Fields and their Weight Sharing Scheme on Graphs

Jean-Charles Vialatte , Vincent Gripon , Gilles Coppin Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

To achieve state-of-the-art results on challenges in vision, Convolutional

Neural Networks learn stationary filters that take advantage of the underlying

image structure. Our purpose is to propose an efficient layer formulation that

extends this property to any domain described by a graph. Namely, we use the

support of its adjacency matrix to design learnable weight sharing filters able

to exploit the underlying structure of signals. The proposed formulation makes

it possible to learn the weights of the filter as well as a scheme that

controls how they are shared across the graph. We perform validation

experiments with image datasets and show that these filters offer performances

comparable with convolutional ones.

Nuclear Discrepancy for Active Learning

Tom J. Viering , Jesse H. Krijthe , Marco Loog

Comments: 32 pages, 5 figures, 4 tables

Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Active learning algorithms propose which unlabeled objects should be queried

for their labels to improve a predictive model the most. We study active

learners that minimize generalization bounds and uncover relationships between

these bounds that lead to an improved approach to active learning. In

particular we show the relation between the bound of the state-of-the-art

Maximum Mean Discrepancy (MMD) active learner, the bound of the Discrepancy,

and a new and looser bound that we refer to as the Nuclear Discrepancy bound.

We motivate this bound by a probabilistic argument: we show it considers

situations which are more likely to occur. Our experiments indicate that active

learning using the tightest Discrepancy bound performs the worst in terms of

the squared loss. Overall, our proposed loosest Nuclear Discrepancy

generalization bound performs the best. We confirm our probabilistic argument

empirically: the other bounds focus on more pessimistic scenarios that are

rarer in practice. We conclude that tightness of bounds is not always of main

importance and that active learning methods should concentrate on realistic

scenarios in order to improve performance.

Decoupling "when to update" from "how to update"

Eran Malach , Shai Shalev-Shwartz Subjects : Learning (cs.LG)

Deep learning requires data. A useful approach to obtaining data is to be

creative and mine data from various sources that were created for different

purposes. Unfortunately, this approach often leads to noisy labels. In this

paper, we propose a meta algorithm for tackling the noisy labels problem. The

key idea is to decouple “when to update” from “how to update”. We demonstrate

the effectiveness of our algorithm by mining data for gender classification by

combining the Labeled Faces in the Wild (LFW) face recognition dataset with a

textual genderizing service, which leads to a noisy dataset. While our approach

is very simple to implement, it leads to state-of-the-art results. We analyze

some convergence properties of the proposed algorithm.

Clustering with t-SNE, provably

George C. Linderman , Stefan Steinerberger Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)

t-distributed Stochastic Neighborhood Embedding (t-SNE), a clustering and

visualization method proposed by van der Maaten & Hinton in 2008, has rapidly

become a standard tool in a number of natural sciences. Despite its

overwhelming success, there is a distinct lack of mathematical foundations and

the inner workings of the algorithm are not well understood. The purpose of

this paper is to prove that t-SNE is able to recover well-separated clusters;

more precisely, we prove that t-SNE in the ‘early exaggeration’ phase, an

optimization technique proposed by van der Maaten & Hinton (2008) and van der

Maaten (2014), can be rigorously analyzed. As a byproduct, the proof suggests

novel ways for setting the exaggeration parameter α and step size h.

Numerical examples illustrate the effectiveness of these rules: in particular,

the quality of embedding of topological structures (e.g. the swiss roll)

improves. We also discuss a connection to spectral clustering methods.
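To make the roles of the exaggeration parameter and the step size concrete, here is a minimal sketch of one plain gradient-descent step on the standard t-SNE objective with the input affinities multiplied by `alpha`. It uses the usual t-SNE gradient without momentum or gains; it does not implement the parameter rules proposed in the paper, and the function name is hypothetical.

```python
def exaggerated_tsne_step(points, p, alpha, h):
    """One gradient-descent step with affinities p scaled by alpha.

    points: list of (x, y) embedding coordinates.
    p: symmetric matrix of input affinities p[i][j] that sums to 1.
    alpha: exaggeration factor; h: step size.
    """
    n = len(points)
    # Student-t kernel weights w_ij = 1 / (1 + |y_i - y_j|^2).
    w = [[0.0 if i == j else
          1.0 / (1.0 + (points[i][0] - points[j][0]) ** 2
                 + (points[i][1] - points[j][1]) ** 2)
          for j in range(n)] for i in range(n)]
    z = sum(sum(row) for row in w)  # normaliser, so q_ij = w_ij / z
    new_points = []
    for i in range(n):
        gx = gy = 0.0
        for j in range(n):
            if i == j:
                continue
            # Gradient term 4 * (alpha * p_ij - q_ij) * w_ij * (y_i - y_j).
            coeff = 4.0 * (alpha * p[i][j] - w[i][j] / z) * w[i][j]
            gx += coeff * (points[i][0] - points[j][0])
            gy += coeff * (points[i][1] - points[j][1])
        new_points.append((points[i][0] - h * gx, points[i][1] - h * gy))
    return new_points


pair = [(0.0, 0.0), (1.0, 0.0)]
affinity = [[0.0, 0.5], [0.5, 0.0]]
print(exaggerated_tsne_step(pair, affinity, alpha=4.0, h=0.1))
```

With `alpha` above 1 the attractive term dominates, so points with high mutual affinity are pulled together, which is the behaviour the early exaggeration phase relies on.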

Pain-Free Random Differential Privacy with Sensitivity Sampling

Benjamin I. P. Rubinstein , Francesco Aldà

Comments: 12 pages, 9 figures, 1 table; full report of paper accepted into ICML’2017

Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Databases (cs.DB)

Popular approaches to differential privacy, such as the Laplace and

exponential mechanisms, calibrate randomised smoothing through global

sensitivity of the target non-private function. Bounding such sensitivity is

often a prohibitively complex analytic calculation. As an alternative, we

propose a straightforward sampler for estimating sensitivity of non-private

mechanisms. Since our sensitivity estimates hold with high probability, any

mechanism that would be (ε, δ)-differentially private under

bounded global sensitivity automatically achieves

(ε, δ, γ)-random differential privacy (Hall et al., 2012),

without any target-specific calculations required. We demonstrate on worked

example learners how our usable approach adopts a naturally-relaxed privacy

guarantee, while achieving more accurate releases even for non-private

functions that are black-box computer programs.
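The idea of replacing an analytic sensitivity bound with a sampled estimate can be illustrated with the Laplace mechanism. The sketch below is a simplification under stated assumptions: it takes the maximum change over randomly drawn neighbouring databases, whereas the paper's sampler and its probabilistic guarantee are not reproduced here. All function names and the choice of 200 sampled pairs are hypothetical.

```python
import random


def sample_sensitivity(f, draw_record, n, num_pairs=200, rng=None):
    """Estimate the sensitivity of f from random neighbouring databases.

    Taking the maximum over sampled pairs is a simplification and gives
    no formal guarantee by itself.
    """
    rng = rng or random.Random(0)
    worst = 0.0
    for _ in range(num_pairs):
        db = [draw_record(rng) for _ in range(n)]
        neighbour = list(db)
        # A neighbouring database differs in exactly one record.
        neighbour[rng.randrange(n)] = draw_record(rng)
        worst = max(worst, abs(f(db) - f(neighbour)))
    return worst


def laplace_release(value, sensitivity, epsilon, rng=None):
    """Laplace mechanism: add noise with scale sensitivity / epsilon."""
    rng = rng or random.Random(1)
    scale = sensitivity / epsilon
    # The difference of two unit exponentials is a standard Laplace draw.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return value + noise


def mean(db):
    return sum(db) / len(db)


def uniform_record(rng):
    return rng.random()


estimate = sample_sensitivity(mean, uniform_record, n=50)
data = [uniform_record(random.Random(i)) for i in range(50)]
print(estimate, laplace_release(mean(data), estimate, epsilon=1.0))
```

For the mean of 50 values in [0, 1) the true global sensitivity is 1/50, so the sampled estimate can only approach that bound from below; the target function is treated as a black box throughout.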

Self-Normalizing Neural Networks

Günter Klambauer , Thomas Unterthiner , Andreas Mayr , Sepp Hochreiter

Comments: 9 pages (+ 93 pages appendix)

Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Deep Learning has revolutionized vision via convolutional neural networks

(CNNs) and natural language processing via recurrent neural networks (RNNs).

However, success stories of Deep Learning with standard feed-forward neural

networks (FNNs) are rare. FNNs that perform well are typically shallow and

therefore cannot exploit many levels of abstract representations. We introduce

self-normalizing neural networks (SNNs) to enable high-level abstract

representations. While batch normalization requires explicit normalization,

neuron activations of SNNs automatically converge towards zero mean and unit

variance. The activation functions of SNNs are “scaled exponential linear units”

(SELUs), which induce self-normalizing properties. Using the Banach fixed-point

theorem, we prove that activations close to zero mean and unit variance that

are propagated through many network layers will converge towards zero mean and

unit variance — even under the presence of noise and perturbations. This

convergence property of SNNs makes it possible to (1) train deep networks with many

layers, (2) employ strong regularization, and (3) make learning highly

robust. Furthermore, for activations not close to unit variance, we prove an

upper and lower bound on the variance; thus, vanishing and exploding gradients

are impossible. We compared SNNs on (a) 121 tasks from the UCI machine learning

repository, on (b) drug discovery benchmarks, and on (c) astronomy tasks with

standard FNNs and other machine learning methods such as random forests and

support vector machines. SNNs significantly outperformed all competing FNN

methods on 121 UCI tasks, outperformed all competing methods on the Tox21

dataset, and set a new record on an astronomy data set. The winning SNN

architectures are often very deep. Implementations are available at:

github.com/bioinf-jku/SNNs.
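The SELU activation itself is short enough to write out. The sketch below defines it with the commonly published constants and checks empirically that standard-normal inputs keep roughly zero mean and unit variance after the activation. It covers only the activation applied to already normalised inputs, not a full network with weights, and the sample size and seed are arbitrary choices.

```python
import math
import random

# Commonly published SELU constants.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit."""
    return SCALE * x if x > 0 else SCALE * ALPHA * (math.exp(x) - 1.0)


def mean_and_variance(values):
    """Population mean and variance of a list of numbers."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(0)
inputs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
m, v = mean_and_variance([selu(x) for x in inputs])
print(f"mean={m:.3f}, variance={v:.3f}")
```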

Unlocking the Potential of Simulators: Design with RL in Mind

Rika Antonova , Silvia Cruciani

Comments: Extended abstract for RLDM17 (3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making)

Subjects: Learning (cs.LG); Robotics (cs.RO)

Using Reinforcement Learning (RL) in simulation to construct policies useful

in real life is challenging. This is often attributed to the sequential

decision making aspect: inaccuracies in simulation accumulate over multiple

steps, hence the simulated trajectories diverge from what would happen in

reality.

In our work we show the need to consider another important aspect: the

mismatch in simulating control. We bring attention to the need for modeling

control as well as dynamics, since oversimplifying assumptions about applying

actions of RL policies could make the policies fail on real-world systems.

We design a simulator for solving a pivoting task (of interest in Robotics)

and demonstrate that even a simple simulator designed with RL in mind

outperforms high-fidelity simulators when it comes to learning a policy that is

to be deployed on a real robotic system. We show that a phenomenon that is hard

to model – friction – could be exploited successfully, even when RL is

performed using a simulator with a simple dynamics and noise model. Hence, we

demonstrate that as long as the main sources of uncertainty are identified, it

could be possible to learn policies applicable to real systems even using a

simple simulator.

RL-compatible simulators could open the possibilities for applying a wide

range of RL algorithms in various fields. This is important, since currently

data sparsity in fields like healthcare and education frequently forces

researchers and engineers to only consider sample-efficient RL approaches.

Successful simulator-aided RL could increase the flexibility of experimenting with

RL algorithms and help apply RL policies to real-world settings in fields

where data is scarce. We believe that lessons learned in Robotics could help

other fields design RL-compatible simulators, so we summarize our experience

and conclude with suggestions.

Distribution-Free One-Pass Learning

Peng Zhao , Zhi-Hua Zhou Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In many large-scale machine learning applications, data accumulate over

time, and thus an appropriate model should be able to update in an online

fashion. Moreover, as the total data volume is unknown when constructing the

model, it is desirable to scan each data item only once, with storage

independent of the data volume. It is also noteworthy that the underlying

distribution may change during the data accumulation procedure. To handle such

tasks, in this paper we propose DFOP, a distribution-free one-pass learning

approach. This approach works well when distribution change occurs during data

accumulation, without requiring prior knowledge about the change. Every data

item can be discarded once it has been scanned. In addition, a theoretical guarantee

shows that the estimation error, under a mild assumption, decreases until

convergence with high probability. The performance of DFOP for both regression

and classification is validated in experiments.

Luck is Hard to Beat: The Difficulty of Sports Prediction

Raquel YS Aoki , Renato M Assuncao , Pedro OS Vaz de Melo

Comments: 10 pages, KDD2017, Applied Data Science track

Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Predicting the outcome of sports events is a hard task. We quantify this

difficulty with a coefficient that measures the distance between the observed

final results of sports leagues and idealized perfectly balanced competitions

in terms of skill. This indicates the relative presence of luck and skill. We

collected and analyzed all games from 198 sports leagues comprising 1503

seasons from 84 countries of 4 different sports: basketball, soccer, volleyball

and handball. We measure competitiveness by country and by sport. We also

identify in each season which teams, if removed from their league, result in a

completely random tournament. Surprisingly, not many of them are needed. As

another contribution of this paper, we propose a probabilistic graphical model

to learn about the teams’ skills and to decompose the relative weights of luck

and skill in each game. We break down the skill component into factors

associated with the teams’ characteristics. The model also allows us to estimate

at 0.36 the probability that an underdog team wins in the NBA league, with a

home advantage adding 0.09 to this probability. As shown in the first part of

the paper, luck is substantially present even in the most competitive

championships, which partially explains why sophisticated and complex

feature-based models hardly beat simple models in the task of forecasting

sports’ outcomes.

Generalized Value Iteration Networks: Life Beyond Lattices

Sufeng Niu , Siheng Chen , Hanyu Guo , Colin Targonski , Melissa C. Smith , Jelena Kovačević

Comments: 14 pages, conference

Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

In this paper, we introduce a generalized value iteration network (GVIN),

which is an end-to-end neural network planning module. GVIN emulates the value

iteration algorithm by using a novel graph convolution operator, which enables

GVIN to learn and plan on irregular spatial graphs. We propose three novel

differentiable kernels as graph convolution operators and show that the

embedding based kernel achieves the best performance. We further propose

episodic Q-learning, an improvement upon traditional n-step Q-learning that

stabilizes training for networks that contain a planning module. Lastly, we

evaluate GVIN on planning problems in 2D mazes, irregular graphs, and

real-world street networks, showing that GVIN generalizes well for both

arbitrary graphs and unseen graphs of larger scale and outperforms a naive

generalization of VIN (discretizing a spatial graph into a 2D image).
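For orientation, the classical value iteration algorithm that GVIN is described as emulating can be run directly on an irregular graph. The sketch below is the tabular algorithm with deterministic moves, not the learned network or its graph convolution kernels; the graph, the rewards, and the discount factor are made-up illustrative values.

```python
def value_iteration(neighbours, rewards, gamma=0.9, tol=1e-9):
    """Classical value iteration on a directed graph.

    neighbours: dict node -> list of nodes reachable in one move.
    rewards: dict node -> reward received for entering that node.
    """
    values = {node: 0.0 for node in neighbours}
    while True:
        delta = 0.0
        for node, successors in neighbours.items():
            if not successors:  # terminal node keeps value 0
                continue
            # Bellman backup: best one-step reward plus discounted value.
            best = max(rewards[s] + gamma * values[s] for s in successors)
            delta = max(delta, abs(best - values[node]))
            values[node] = best
        if delta < tol:
            return values


graph = {"a": ["b", "c"], "b": ["goal"], "c": ["b"], "goal": []}
reward = {"a": 0.0, "b": 0.0, "c": 0.0, "goal": 1.0}
values = value_iteration(graph, reward)
print(values)
```

Because the backup only needs each node's successor list, nothing here depends on a lattice layout, which is the property a graph-based planning module has to preserve.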

A Convex Framework for Fair Regression

Richard Berk , Hoda Heidari , Shahin Jabbari , Matthew Joseph , Michael Kearns , Jamie Morgenstern , Seth Neel , Aaron Roth Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)

We introduce a flexible family of fairness regularizers for (linear and

logistic) regression problems. These regularizers all enjoy convexity,

permitting fast optimization, and they span the range from notions of group

fairness to strong individual fairness. By varying the weight on the fairness

regularizer, we can compute the efficient frontier of the accuracy-fairness

trade-off on any given dataset, and we measure the severity of this trade-off

via a numerical quantity we call the Price of Fairness (PoF). The centerpiece

of our results is an extensive comparative study of the PoF across six

different datasets in which fairness is a primary consideration.

On learning the structure of Bayesian Networks and submodular function maximization

Giulio Caravagna , Daniele Ramazzotti , Guido Sanguinetti Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)

Learning the structure of dependencies among multiple random variables is a

problem of considerable theoretical and practical interest. In practice, score

optimisation with multiple restarts provides a practical and surprisingly

successful solution, yet the conditions under which this may be a well founded

strategy are poorly understood. In this paper, we prove that the problem of

identifying the structure of a Bayesian Network via regularised score

optimisation can be recast, in expectation, as a submodular optimisation

problem, thus guaranteeing optimality with high probability. This result both

explains the practical success of optimisation heuristics, and suggests a way

to improve on such algorithms by artificially simulating multiple data sets via

a bootstrap procedure. We show on several synthetic data sets that the

resulting algorithm yields better recovery performance than the state of the

art, and illustrate in a real cancer genomic study how such an approach can

lead to valuable practical insights.

Training Quantized Nets: A Deeper Understanding

Hao Li , Soham De , Zheng Xu , Christoph Studer , Hanan Samet , Tom Goldstein Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV)

Currently, deep neural networks are deployed on low-power embedded devices by

first training a full-precision model using powerful computing hardware, and

then deriving a corresponding low-precision model for efficient inference on

such systems. However, training models directly with coarsely quantized weights

is a key step towards learning on embedded platforms that have limited

computing resources, memory capacity, and power consumption. Numerous recent

publications have studied methods for training quantized networks, but these

studies have mostly been empirical. In this work, we investigate training

methods for quantized neural networks from a theoretical viewpoint. We first

explore accuracy guarantees for training methods under convexity assumptions.

We then look at the behavior of algorithms for non-convex problems, and we show

that training algorithms that exploit high-precision representations have an

important annealing property that purely quantized training methods lack, which

explains many of the observed empirical differences between these types of

algorithms.

Fast Black-box Variational Inference through Stochastic Trust-Region Optimization

Jeffrey Regier , Michael I. Jordan , Jon McAuliffe

Comments: submitted to NIPS 2017

Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We introduce TrustVI, a fast second-order algorithm for black-box variational

inference based on trust-region optimization and the reparameterization trick.

At each iteration, TrustVI proposes and assesses a step based on minibatches of

draws from the variational distribution. The algorithm provably converges to a

stationary point. We implement TrustVI in the Stan framework and compare it to

ADVI. TrustVI typically converges in tens of iterations to a solution at least

as good as the one that ADVI reaches in thousands of iterations. TrustVI

iterations can be more computationally expensive, but total computation is

typically an order of magnitude less in our experiments.

Generative-Discriminative Variational Model for Visual Recognition

Chih-Kuan Yeh , Yao-Hung Hubert Tsai , Yu-Chiang Frank Wang Subjects : Learning (cs.LG)

The paradigm shift from shallow classifiers with hand-crafted features to

end-to-end trainable deep learning models has shown significant improvements on

supervised learning tasks. Despite the promising power of deep neural networks

(DNN), how to alleviate overfitting during training has been a research topic

of interest. In this paper, we present a Generative-Discriminative Variational

Model (GDVM) for visual classification, in which we introduce a latent variable

inferred from inputs for exhibiting generative abilities towards prediction. In

other words, our GDVM casts the supervised learning task as a generative

learning process, with data discrimination to be jointly exploited for improved

classification. In our experiments, we consider the tasks of multi-class

classification, multi-label classification, and zero-shot learning. We show

that our GDVM performs favorably against the baselines or recent generative DNN

models.

Meta-Learning for Construction of Resampling Recommendation Systems

Evgeny Burnaev , Pavel Erofeev , Artem Papanov Subjects : Learning (cs.LG) ; Applications (stat.AP); Computation (stat.CO); Methodology (stat.ME)

One possible approach to tackling class imbalance in classification tasks is to

resample the training dataset, i.e., to drop some of its elements or to synthesize

new ones. There exist several widely used resampling methods. Recent research

showed that the choice of resampling method substantially affects the quality of

classification, which raises the problem of resampling selection. Exhaustive search

for the optimal resampling is time-consuming and hence of limited use. In

this paper, we describe an alternative approach to resampling selection. We

follow the meta-learning concept to build resampling recommendation systems, i.e.,

algorithms recommending resampling for datasets on the basis of their

properties.
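The two families of resampling mentioned above, dropping elements and adding new ones, can be sketched in their simplest random forms. This is only an illustration of what a recommended resampling method would do to a dataset: oversampling here duplicates existing elements as a stand-in for synthesising new ones, and the function name and `mode` values are hypothetical.

```python
import random
from collections import defaultdict


def resample(samples, labels, mode="under", rng=None):
    """Balance classes by random under- or over-sampling."""
    rng = rng or random.Random(0)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    sizes = [len(items) for items in by_class.values()]
    target = min(sizes) if mode == "under" else max(sizes)
    balanced = []
    for y, items in by_class.items():
        if mode == "under":
            # Drop surplus elements of the larger classes.
            chosen = rng.sample(items, target)
        else:
            # Duplicate random elements until the class reaches the target.
            extra = [rng.choice(items) for _ in range(target - len(items))]
            chosen = items + extra
        balanced.extend((x, y) for x in chosen)
    return balanced


points = list(range(10))
classes = [0] * 8 + [1] * 2
print(len(resample(points, classes, "under")), len(resample(points, classes, "over")))
```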

Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs

Cristóbal Esteban , Stephanie L. Hyland , Gunnar Rätsch

Comments: 11 pages, 4 figures, 2 tables

Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Generative Adversarial Networks (GANs) have shown remarkable success as a

framework for training models to produce realistic-looking data. In this work,

we propose a Recurrent GAN (RGAN) and Recurrent Conditional GAN (RCGAN) to

produce realistic real-valued multi-dimensional time series, with an emphasis

on their application to medical data. RGANs make use of recurrent neural

networks in the generator and the discriminator. In the case of RCGANs, both of

these RNNs are conditioned on auxiliary information. We demonstrate our models

in a set of toy datasets, where we show visually and quantitatively (using

sample likelihood and maximum mean discrepancy) that they can successfully

generate realistic time-series. We also describe novel evaluation methods for

GANs, where we generate a synthetic labelled training dataset, and evaluate on

a real test set the performance of a model trained on the synthetic data, and

vice-versa. We illustrate with these metrics that RCGANs can generate

time-series data useful for supervised training, with only minor degradation in

performance on real test data. This is demonstrated on digit classification

from ‘serialised’ MNIST and by training an early warning system on a medical

dataset of 17,000 patients from an intensive care unit. We further discuss and

analyse the privacy concerns that may arise when using RCGANs to generate

realistic synthetic medical time series data.

Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes

Hyunjik Kim , Yee Whye Teh Subjects : Machine Learning (stat.ML) ; Learning (cs.LG)

Automating statistical modelling is a challenging problem that has

far-reaching implications for artificial intelligence. The Automatic

Statistician employs a kernel search algorithm to provide a first step in this

direction for regression problems. However, this does not scale due to its

O(N^3) running time for model selection. This is undesirable not only

because the average size of data sets is growing fast, but also because there

is potentially more information in bigger data, implying a greater need for

more expressive models that can discover finer structure. We propose Scalable

Kernel Composition (SKC), a scalable kernel search algorithm, to encompass big

data within the boundaries of automated statistical modelling.

Context encoders as a simple but powerful extension of word2vec

Franziska Horn

Comments: ACL 2017 2nd Workshop on Representation Learning for NLP

Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

With a simple architecture and the ability to learn meaningful word

embeddings efficiently from texts containing billions of words, word2vec

remains one of the most popular neural language models used today. However, as

only a single embedding is learned for every word in the vocabulary, the model

fails to optimally represent words with multiple meanings. Additionally, it is

not possible to create embeddings for new (out-of-vocabulary) words on the

spot. Based on an intuitive interpretation of the continuous bag-of-words

(CBOW) word2vec model’s negative sampling training objective in terms of

predicting context based similarities, we motivate an extension of the model we

call context encoders (ConEc). By multiplying the matrix of trained word2vec

embeddings with a word’s average context vector, out-of-vocabulary (OOV)

embeddings and representations for a word with multiple meanings can be created

based on the word’s local contexts. The benefits of this approach are

illustrated by using these word embeddings as features in the CoNLL 2003 named

entity recognition (NER) task.
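The multiplication described above, trained embeddings times a word's average context vector, can be sketched in a few lines. The sketch assumes the context vector is the relative frequency of each known vocabulary word across the target word's context windows; that normalisation, the function name, and the toy vectors are assumptions made for illustration.

```python
def context_encoder_embedding(embeddings, vocab, contexts):
    """Embed a word from the contexts it occurs in.

    embeddings: dict word -> list of floats (trained word vectors).
    vocab: ordered list of known words.
    contexts: list of context windows (lists of words) around the target.
    """
    # Average context vector: relative frequency of each known word.
    counts = {w: 0 for w in vocab}
    total = 0
    for window in contexts:
        for w in window:
            if w in counts:
                counts[w] += 1
                total += 1
    if total == 0:
        raise ValueError("no known context words")
    dim = len(embeddings[vocab[0]])
    # Multiply the embedding matrix by the averaged context vector.
    return [sum(counts[w] / total * embeddings[w][d] for w in vocab)
            for d in range(dim)]


toy_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
toy_vocab = ["cat", "dog"]
oov_vector = context_encoder_embedding(
    toy_vectors, toy_vocab, [["cat", "dog"], ["cat", "zebra"]])
print(oov_vector)
```

Because the result depends only on the supplied contexts, the same function can embed a word that was never in the training vocabulary, or produce different vectors for one word used in different senses.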

Where is my forearm? Clustering of body parts from simultaneous tactile and linguistic input using sequential mapping

Karla Stepanova , Matej Hoffmann , Zdenek Straka , Frederico B. Klein , Angelo Cangelosi , Michal Vavrecka

Comments: pp. 155-162

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Robotics (cs.RO)

Humans and animals are constantly exposed to a continuous stream of sensory

information from different modalities. At the same time, they form more

compressed representations like concepts or symbols. In species that use

language, this process is further structured by this interaction, where a

mapping between the sensorimotor concepts and linguistic elements needs to be

established. There is evidence that children might be learning language by

simply disambiguating potential meanings based on multiple exposures to

utterances in different contexts (cross-situational learning). In existing

models, the mapping between modalities is usually found in a single step by

directly using frequencies of referent and meaning co-occurrences. In this

paper, we present an extension of this one-step mapping and introduce a newly

proposed sequential mapping algorithm together with a publicly available Matlab

implementation. For demonstration, we have chosen a less typical scenario:

instead of learning to associate objects with their names, we focus on body

representations. A humanoid robot is receiving tactile stimulations on its

body, while at the same time listening to utterances of the body part names

(e.g., hand, forearm and torso). With the goal of arriving at the correct “body

categories”, we demonstrate how a sequential mapping algorithm outperforms

one-step mapping. In addition, the effects of data set size and noise in the

linguistic input are studied.

Forward Thinking: Building and Training Neural Networks One Layer at a Time

Chris Hettinger , Tanner Christensen , Ben Ehlert , Jeffrey Humpherys , Tyler Jarvis , Sean Wade Subjects : Machine Learning (stat.ML) ; Learning (cs.LG)

We present a general framework for training deep neural networks without

backpropagation. This substantially decreases training time and also allows for

construction of deep networks with many sorts of learners, including networks

whose layers are defined by functions that are not easily differentiated, like

decision trees. The main idea is that layers can be trained one at a time, and

once they are trained, the input data are mapped forward through the layer to

create a new learning problem. The process is repeated, transforming the data

through multiple layers, one at a time, rendering a new data set, which is

expected to be better behaved, and on which a final output layer can achieve

good performance. We call this forward thinking and demonstrate a proof of

concept by achieving state-of-the-art accuracy on the MNIST dataset for

convolutional neural networks. We also provide a general mathematical

formulation of forward thinking that allows for other types of deep learning

problems to be considered.

Predictive Coding-based Deep Dynamic Neural Network for Visuomotor Learning

Jungsik Hwang , Jinhyung Kim , Ahmadreza Ahmadi , Minkyu Choi , Jun Tani

Comments: Accepted at the 7th Joint IEEE International Conference of Developmental Learning and Epigenetic Robotics (ICDL-EpiRob 2017)

Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO); Neurons and Cognition (q-bio.NC)

This study presents a dynamic neural network model based on the predictive

coding framework for perceiving and predicting the dynamic visuo-proprioceptive

patterns. In our previous study [1], we have shown that the deep dynamic neural

network model was able to coordinate visual perception and action generation in

a seamless manner. In the current study, we extended the previous model under

the predictive coding framework to endow the model with a capability of

perceiving and predicting dynamic visuo-proprioceptive patterns as well as a

capability of inferring intention behind the perceived visuomotor information

through minimizing prediction error. A set of synthetic experiments were

conducted in which a robot learned to imitate the gestures of another robot in

a simulation environment. The experimental results showed that with given

intention states, the model was able to mentally simulate the possible incoming

dynamic visuo-proprioceptive patterns in a top-down process without the inputs

from the external environment. Moreover, the results highlighted the role of

minimizing prediction error in inferring underlying intention of the perceived

visuo-proprioceptive patterns, supporting the predictive coding account of the

mirror neuron systems. The results also revealed that minimizing prediction

error in one modality induced the recall of the corresponding representation of

another modality acquired during the consolidative learning of raw-level

visuo-proprioceptive patterns.

Seamless Integration and Coordination of Cognitive Skills in Humanoid Robots: A Deep Learning Approach

Jungsik Hwang , Jun Tani

Comments: Accepted in the IEEE Transactions on Cognitive and Developmental Systems (TCDS), 2017

Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

This study investigates how adequate coordination among the different

cognitive processes of a humanoid robot can be developed through end-to-end

learning of direct perception of visuomotor stream. We propose a deep dynamic

neural network model built on a dynamic vision network, a motor generation

network, and a higher-level network. The proposed model was designed to process

and to integrate direct perception of dynamic visuomotor patterns in a

hierarchical model characterized by different spatial and temporal constraints

imposed on each level. We conducted synthetic robotic experiments in which a

robot learned to read a human’s intention by observing gestures and then

to generate the corresponding goal-directed actions. Results verify that the

proposed model is able to learn the tutored skills and to generalize them to

novel situations. The model showed synergic coordination of perception, action

and decision making, and it integrated and coordinated a set of cognitive

skills including visual perception, intention reading, attention switching,

working memory, action preparation and execution in a seamless manner. Analysis

reveals that coherent internal representations emerged at each level of the

hierarchy. Higher-level representation reflecting actional intention developed

by means of continuous integration of the lower-level visuo-proprioceptive

stream.

Creating Virtual Universes Using Generative Adversarial Networks

Mustafa Mustafa , Deborah Bard , Wahid Bhimji , Rami Al-Rfou , Zarija Lukić

Comments: 8 pages, 5 figures

Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Learning (cs.LG)

Inferring model parameters from experimental data is a grand challenge in

many sciences, including cosmology. This often relies critically on high

fidelity numerical simulations, which are prohibitively computationally

expensive. The application of deep learning techniques to generative modeling

is renewing interest in using high dimensional density estimators as

computationally inexpensive emulators of fully-fledged simulations. These

generative models have the potential to make a dramatic shift in the field of

scientific simulations, but for that shift to happen we need to study the

performance of such generators in the precision regime needed for science

applications. To this end, in this letter we apply Generative Adversarial

Networks to the problem of generating cosmological weak lensing convergence

maps. We show that our generator network produces maps that are described by,

with high statistical confidence, the same summary statistics as the fully

simulated maps.

On the Robustness of Deep Convolutional Neural Networks for Music Classification

Keunwoo Choi , George Fazekas , Kyunghyun Cho , Mark Sandler Subjects : Information Retrieval (cs.IR) ; Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)

Deep neural networks (DNN) have been successfully applied for music

classification including music tagging. However, there are several open

questions regarding generalisation and best practices in the choice of network

architectures, hyper-parameters and input representations. In this article, we

investigate specific aspects of neural networks to deepen our understanding of

their properties. We analyse and (re-)validate a large music tagging dataset to

investigate the reliability of training and evaluation. We perform

comprehensive experiments involving audio preprocessing using different

time-frequency representations, logarithmic magnitude compression, frequency

weighting and scaling. Using a trained network, we compute label vector

similarities, which are compared to ground-truth similarity.

The results highlight several important aspects of music tagging and neural

networks. We show that networks can be effective despite relatively large

error rates in groundtruth datasets. We subsequently show that many commonly

used input preprocessing techniques are redundant except magnitude compression.

Lastly, the analysis of our trained network provides valuable insight into the

relationships between music tags. These results highlight the benefit of using

data-driven methods to address automatic music tagging.

Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network

Xiao Yang , Ersin Yumer , Paul Asente , Mike Kraley , Daniel Kifer , C. Lee Giles

Comments: CVPR 2017 Spotlight

Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We present an end-to-end, multimodal, fully convolutional network for

extracting semantic structures from document images. We consider document

semantic structure extraction as a pixel-wise segmentation task, and propose a

unified model that classifies pixels based not only on their visual appearance,

as in the traditional page segmentation task, but also on the content of

underlying text. Moreover, we propose an efficient synthetic document

generation process that we use to generate pretraining data for our network.

Once the network is trained on a large set of synthetic documents, we fine-tune

the network on unlabeled real documents using a semi-supervised approach. We

systematically study the optimum network architecture and show that both our

multimodal approach and the synthetic data pretraining significantly boost the

performance.

Low-shot learning with large-scale diffusion

Matthijs Douze , Arthur Szlam , Bharath Hariharan , Hervé Jégou Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG); Machine Learning (stat.ML)

This paper considers the problem of inferring image labels for which only a

few labelled examples are available at training time. This setup is often

referred to as low-shot learning in the literature, where a standard approach

is to re-train the last few layers of a convolutional neural network learned on

separate classes. We consider a semi-supervised setting in which we exploit a

large collection of images to support label propagation. This is made possible

by leveraging the recent advances on large-scale similarity graph construction.

We show that, despite its conceptual simplicity, scaling up label propagation to

hundreds of millions of images leads to state-of-the-art accuracy in the

low-shot learning regime.
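
Label propagation over a similarity graph can be illustrated on a toy example. The sketch below is a generic diffusion with clamped seed labels on a hand-built graph, not the paper's large-scale procedure; the function name `propagate_labels`, its parameters, and the five-node path graph are all assumptions made for illustration.

```python
def propagate_labels(neighbors, seeds, n_classes, n_iters=200):
    """Diffuse class scores over a weighted graph.

    neighbors: {node: {neighbor: weight}}, seeds: {node: class_index}.
    Seed nodes keep their label; every other node repeatedly takes the
    weighted average of its neighbors' scores.
    """
    scores = {v: [0.0] * n_classes for v in neighbors}
    for v, c in seeds.items():
        scores[v][c] = 1.0
    for _ in range(n_iters):
        new_scores = {}
        for v, nbrs in neighbors.items():
            total = sum(nbrs.values())
            if v in seeds or total == 0:
                new_scores[v] = scores[v]  # clamp seeds, skip isolated nodes
                continue
            new_scores[v] = [
                sum(w * scores[u][c] for u, w in nbrs.items()) / total
                for c in range(n_classes)
            ]
        scores = new_scores
    return scores


# Toy path graph 0 - 1 - 2 - 3 - 4 with one labelled example at each end.
path = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0},
        3: {2: 1.0, 4: 1.0}, 4: {3: 1.0}}
scores = propagate_labels(path, seeds={0: 0, 4: 1}, n_classes=2)
```

Unlabelled nodes end up with scores that reflect their graph distance to each seed, so node 1 leans towards class 0 and node 3 towards class 1. In the setting of the abstract, the graph would instead be a nearest-neighbour similarity graph over image descriptors.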

Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features

Sharath Adavanne , Giambattista Parascandolo , Pasi Pertilä , Toni Heittola , Tuomas Virtanen Subjects : Sound (cs.SD) ; Learning (cs.LG)

In this paper, we propose the use of spatial and harmonic features in

combination with long short term memory (LSTM) recurrent neural network (RNN)

for the automatic sound event detection (SED) task. Real life sound recordings

typically have many overlapping sound events, making them hard to recognize with

just mono channel audio. Human listeners have been successfully recognizing the

mixture of overlapping sound events using pitch cues and exploiting the stereo

(multichannel) audio signal available at their ears to spatially localize these

events. Traditionally, SED systems have only used mono channel audio;

motivated by the human listener, we propose to extend them to use multichannel

audio. The proposed SED system is compared against the state of the art mono

channel method on the development subset of TUT sound events detection 2016

database. The usage of spatial and harmonic features is shown to improve the

performance of SED.

Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition

Miroslav Malik , Sharath Adavanne , Konstantinos Drossos , Tuomas Virtanen , Dasa Ticha , Roman Jarina

Comments: Accepted for Sound and Music Computing (SMC 2017)

Subjects: Sound (cs.SD); Learning (cs.LG)

This paper studies the emotion recognition from musical tracks in the

2-dimensional valence-arousal (V-A) emotional space. We propose a method based

on convolutional (CNN) and recurrent neural networks (RNN), having

significantly fewer parameters compared with the state-of-the-art method for

the same task. We utilize one CNN layer followed by two branches of RNNs

trained separately for arousal and valence. The method was evaluated using the

‘MediaEval2015 emotion in music’ dataset. We achieved an RMSE of 0.202 for

arousal and 0.268 for valence, which is the best result reported on this

dataset.

Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network

Sharath Adavanne , Pasi Pertilä , Tuomas Virtanen

Comments: Accepted for IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017)

Subjects: Sound (cs.SD); Learning (cs.LG)

This paper proposes to use low-level spatial features extracted from

multichannel audio for sound event detection. We extend the convolutional

recurrent neural network to handle more than one type of these multichannel

features by learning from each of them separately in the initial stages. We

show that instead of concatenating the features of each channel into a single

feature vector the network learns sound events in multichannel audio better

when they are presented as separate layers of a volume. Using the proposed

spatial features over monaural features on the same network gives an absolute

F-score improvement of 6.1% on the publicly available TUT-SED 2016 dataset and

2.7% on the TUT-SED 2009 dataset that is fifteen times larger.

Information Theory

Physical Layer Security of Generalised Pre-coded Spatial Modulation with Antenna Scrambling

Rong Zhang , Lie-Liang Yang , Lajos Hanzo Subjects : Information Theory (cs.IT)

We now advocate a novel physical layer security solution that is unique to

our previously proposed GPSM scheme with the aid of the proposed antenna

scrambling. The novelty and contribution of our paper lie in three aspects: 1/

principle: we introduce a ‘security key’ generated at Alice that is unknown to

both Bob and Eve, where the design goal is that the publicly unknown security

key imposes a barrier only for Eve. 2/ approach: we achieve this by conveying

useful information only through the activation of RA indices, which is in turn

concealed by the unknown security key in terms of the randomly scrambled

symbols used in place of the conventional modulated symbols in GPSM scheme. 3/

design: we consider both Circular Antenna Scrambling (CAS) and Gaussian Antenna

Scrambling (GAS) in detail, and the resultant security capacities of both designs

are quantified and compared.

Scaling Exponent and Moderate Deviations Asymptotics of Polar Codes for the AWGN Channel

Silas L. Fong , Vincent Y. F. Tan

Comments: 24 pages

Subjects: Information Theory (cs.IT)

This paper investigates polar codes for the additive white Gaussian noise

(AWGN) channel. The scaling exponent \(\mu\) of polar codes for a memoryless

channel \(q_{Y|X}\) with capacity \(I(q_{Y|X})\) characterizes the closest gap

between the capacity and non-asymptotic achievable rates in the following way:

For a fixed \(\varepsilon \in (0, 1)\), the gap between the capacity \(I(q_{Y|X})\)

and the maximum non-asymptotic rate \(R_n^*\) achieved by a length-\(n\) polar code

with average error probability \(\varepsilon\) scales as \(n^{-1/\mu}\), i.e.,

\(I(q_{Y|X})-R_n^* = \Theta(n^{-1/\mu})\).

It is well known that the scaling exponent \(\mu\) for any binary-input

memoryless channel (BMC) with \(I(q_{Y|X})\in(0,1)\) is bounded above by \(4.714\),

which was shown by an explicit construction of polar codes. Our main result

shows that \(4.714\) remains to be a valid upper bound on the scaling exponent

for the AWGN channel. Our proof technique involves the following two ideas: (i)

The capacity of the AWGN channel can be achieved within a gap of

\(O(n^{-1/\mu}\sqrt{\log n})\) by using an input alphabet consisting of \(n\)

constellations and restricting the input distribution to be uniform; (ii) The

capacity of a multiple access channel (MAC) with an input alphabet consisting

of \(n\) constellations can be achieved within a gap of \(O(n^{-1/\mu}\log n)\) by

using a superposition of \(\log n\) binary-input polar codes. In addition, we

investigate the performance of polar codes in the moderate deviations regime

where both the gap to capacity and the error probability vanish as \(n\) grows.

An explicit construction of polar codes is proposed to obey a certain tradeoff

between the gap to capacity and the decay rate of the error probability for the

AWGN channel.
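
To get a feel for what a scaling exponent of at most 4.714 means in practice, the short calculation below treats the gap to capacity as exactly proportional to n^(-1/mu). That is an idealisation of the Theta(.) statement, and the helper names `gap_ratio` and `length_factor_for` are assumptions introduced here.

```python
MU_UPPER_BOUND = 4.714  # upper bound on the scaling exponent quoted in the abstract


def gap_ratio(length_factor, mu=MU_UPPER_BOUND):
    """Factor by which a gap proportional to n**(-1/mu) shrinks
    when the block length n is multiplied by length_factor."""
    return length_factor ** (-1.0 / mu)


def length_factor_for(gap_reduction, mu=MU_UPPER_BOUND):
    """Block-length multiplier needed to shrink the gap by gap_reduction (> 1)."""
    return gap_reduction ** mu


# Doubling the block length shrinks the idealised gap by roughly 14 percent,
# and halving the gap needs a block length roughly 26 times longer.
doubling_effect = gap_ratio(2)
halving_cost = length_factor_for(2)
```

Because 4.714 is only an upper bound on the exponent, these figures describe the slowest decay compatible with the bound under the stated idealisation; a smaller exponent would close the gap faster.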

Estimating Mixture Entropy with Pairwise Distances

Artemy Kolchinsky , Brendan D. Tracey Subjects : Information Theory (cs.IT) ; Methodology (stat.ME); Machine Learning (stat.ML)

Mixture distributions arise in many parametric and non-parametric settings,

for example in Gaussian mixture models and in non-parametric estimation. It is

often necessary to compute the entropy of a mixture, but in most cases this

quantity has no closed-form expression, making some form of approximation

necessary. We propose a family of estimators based on a pairwise distance

function between mixture components, and show that this estimator class has

many attractive properties. For many distributions of interest, the proposed

estimators are efficient to compute, differentiable in the mixture parameters,

and become exact when the mixture components are clustered. We prove this

family includes lower and upper bounds on the mixture entropy. The Chernoff

\(\alpha\)-divergence gives a lower bound when chosen as the distance function,

with the Bhattacharyya distance providing the tightest lower bound for

components that are symmetric and members of a location family. The

Kullback-Leibler divergence gives an upper bound when used as the distance

function. We provide closed-form expressions of these bounds for mixtures of

Gaussians, and discuss their applications to the estimation of mutual

information. We then demonstrate that our bounds are significantly tighter than

well-known existing bounds using numeric simulations. This pairwise estimator

class is very useful in optimization problems involving

maximization/minimization of entropy and mutual information, such as MaxEnt and

rate distortion problems.
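
The pairwise idea can be tried on a one-dimensional Gaussian mixture with a shared standard deviation. The sketch assumes the estimator has the form H(X|C) - sum_i c_i ln sum_j c_j exp(-D(p_i, p_j)); this form is recalled from the paper rather than stated in the abstract, so treat it as an assumption. The closed-form KL and Bhattacharyya distances used are the standard ones for equal-variance Gaussians, and all function names are illustrative.

```python
import math


def gaussian_entropy(sigma):
    """Differential entropy (in nats) of a 1-D Gaussian with std sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def kl_distance(m_i, m_j, sigma):
    """KL divergence between two 1-D Gaussians with equal std sigma."""
    return (m_i - m_j) ** 2 / (2 * sigma ** 2)


def bhattacharyya_distance(m_i, m_j, sigma):
    """Bhattacharyya distance between two 1-D Gaussians with equal std sigma."""
    return (m_i - m_j) ** 2 / (8 * sigma ** 2)


def pairwise_entropy_estimate(weights, means, sigma, distance):
    """Assumed estimator form: H(X|C) - sum_i c_i * ln(sum_j c_j * exp(-D_ij))."""
    cond_entropy = sum(w * gaussian_entropy(sigma) for w in weights)
    mixing_term = 0.0
    for w_i, m_i in zip(weights, means):
        inner = sum(w_j * math.exp(-distance(m_i, m_j, sigma))
                    for w_j, m_j in zip(weights, means))
        mixing_term -= w_i * math.log(inner)
    return cond_entropy + mixing_term


weights, means, sigma = [0.5, 0.5], [0.0, 1.0], 1.0
lower = pairwise_entropy_estimate(weights, means, sigma, bhattacharyya_distance)
upper = pairwise_entropy_estimate(weights, means, sigma, kl_distance)
```

With identical component means both estimates collapse to the entropy of a single component, and with widely separated means both approach the component entropy plus ln 2, which matches the abstract's remark that the estimators become exact when components are clustered.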

Delay Optimal Scheduling for Chunked Random Linear Network Coding Broadcast

Emmanouil Skevakis , Ioannis Lambadaris , Hassan Halabian

Comments: 13 pages, 17 figures, to be submitted to Transactions on Control of Network Systems

Subjects

:

Information Theory (cs.IT)

We study the broadcast transmission of a single file to an arbitrary number

of receivers using Random Linear Network Coding (RLNC) in a network with

unreliable channels. Due to the increased computational complexity of the

decoding process (especially for large files) we apply chunked RLNC (i.e. RLNC

is applied within non-overlapping subsets of the file).

In our work we show the optimality of the Least Received (LR) batch

scheduling policy (which was introduced in our prior work) with regard to the

expected file transfer completion time. Furthermore, we refine some of our

earlier results, namely the expected file transfer completion time of the LR

policy and the minimum achievable coding window size in the case of a user

defined delay constraint. Finally, we experimentally evaluate a modification of

the LR policy in a more realistic system setting with reduced feedback from the

receivers.
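
Chunked RLNC itself can be sketched in a few lines. The example below is a simplified illustration under several assumptions: coding is over GF(2) (packets are integers combined with XOR), there is a single receiver behind an erasure channel, and chunks are served in order, so the LR scheduling policy from the abstract is not reproduced. The names `encode`, `ChunkDecoder` and `broadcast` are hypothetical.

```python
import random


def encode(chunk, rng):
    """Return (mask, payload): a random nonzero GF(2) combination of the
    packets in one chunk; bit i of mask says whether packet i was XORed in."""
    mask = rng.randrange(1, 1 << len(chunk))
    payload = 0
    for i, packet in enumerate(chunk):
        if (mask >> i) & 1:
            payload ^= packet
    return mask, payload


class ChunkDecoder:
    """Incremental Gauss-Jordan elimination over GF(2) for one chunk."""

    def __init__(self, k):
        self.k = k
        self.rows = {}  # pivot bit -> (mask, payload), kept fully reduced

    def receive(self, mask, payload):
        for pivot, (m, p) in self.rows.items():
            if (mask >> pivot) & 1:          # cancel known pivots
                mask ^= m
                payload ^= p
        if mask == 0:
            return False                     # linearly dependent, no new information
        pivot = mask.bit_length() - 1
        for q, (m, p) in list(self.rows.items()):
            if (m >> pivot) & 1:             # clear the new pivot from older rows
                self.rows[q] = (m ^ mask, p ^ payload)
        self.rows[pivot] = (mask, payload)
        return True

    def done(self):
        return len(self.rows) == self.k

    def packets(self):
        return [self.rows[i][1] for i in range(self.k)]


def broadcast(packets, chunk_size, loss_prob=0.3, seed=0):
    """Send coded packets chunk by chunk until the receiver decodes each chunk."""
    rng = random.Random(seed)
    chunks = [packets[i:i + chunk_size] for i in range(0, len(packets), chunk_size)]
    decoded, sent = [], 0
    for chunk in chunks:
        decoder = ChunkDecoder(len(chunk))
        while not decoder.done():
            mask, payload = encode(chunk, rng)
            sent += 1
            if rng.random() >= loss_prob:    # packet survives the lossy channel
                decoder.receive(mask, payload)
        decoded.extend(decoder.packets())
    return decoded, sent
```

Restricting coding to a chunk keeps each elimination small (a system of at most `chunk_size` unknowns), which is the complexity argument the abstract gives for chunking; the cost is that a coded packet is only useful to receivers still missing that particular chunk.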
