arXiv Paper Daily: Wed, 14 Jun 2017


Neural and Evolutionary Computing

Temporally Efficient Deep Learning with Spikes

Peter O'Connor, Efstratios Gavves, Max Welling

Comments: 8 pages + references and appendix

Subjects: Neural and Evolutionary Computing (cs.NE)

The vast majority of natural sensory data is temporally redundant. Video

frames or audio samples which are sampled at nearby points in time tend to have

similar values. Typically, deep learning algorithms take no advantage of this

redundancy to reduce computation. This can be an obscene waste of energy. We

present a variant on backpropagation for neural networks in which computation

scales with the rate of change of the data – not the rate at which we process

the data. We do this by having neurons communicate a combination of their

state, and their temporal change in state. Intriguingly, this simple

communication rule gives rise to units that resemble biologically-inspired leaky

integrate-and-fire neurons, and to a weight-update rule that is equivalent to a

form of Spike-Timing Dependent Plasticity (STDP), a synaptic learning rule

observed in the brain. We demonstrate that on MNIST and a temporal variant of

MNIST, our algorithm performs about as well as a Multilayer Perceptron trained

with backpropagation, despite only communicating discrete values between

layers.
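
As a rough illustration of the communication scheme described above, here is a plain numpy sketch (all names and constants are illustrative, not taken from the paper): each unit transmits a quantized mix of its state and its change in state, so near-constant inputs generate almost no messages.

```python
import numpy as np

def td_encode(x_prev, x_curr, alpha=0.5, step=0.05):
    """Quantize a mix of the unit's state and its change in state.
    Near-constant inputs yield (almost) all-zero messages."""
    analog = alpha * x_curr + (1 - alpha) * (x_curr - x_prev)
    return np.round(analog / step)          # integer "spike counts"

# Toy demo: a slowly drifting signal produces few nonzero messages,
# a fast-changing one produces many.
rng = np.random.default_rng(0)
slow = np.cumsum(rng.normal(scale=0.002, size=(100, 8)), axis=0)
fast = np.cumsum(rng.normal(scale=0.2, size=(100, 8)), axis=0)
for name, x in (("slow", slow), ("fast", fast)):
    msgs = sum(np.count_nonzero(td_encode(x[t - 1], x[t]))
               for t in range(1, len(x)))
    print(name, "nonzero messages per step:", msgs / (len(x) - 1))
```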

Prediction of Muscle Activations for Reaching Movements using Deep Neural Networks

Najeeb Khan, Ian Stavness

Comments: To be presented at the Annual Meeting of the American Society of Biomechanics 2017

Subjects: Neural and Evolutionary Computing (cs.NE)

The motor control problem involves determining the time-varying muscle

activation trajectories required to accomplish a given movement. Muscle

redundancy makes motor control a challenging task: there are many possible

activation trajectories that accomplish the same movement. Despite this

redundancy, most movements are accomplished in highly stereotypical ways. For

example, point-to-point reaching movements are almost universally performed

with very similar smooth trajectories. Optimization methods are commonly used

to predict muscle forces for measured movements. However, these approaches

require computationally expensive simulations and are sensitive to the chosen

optimality criteria and regularization. In this work, we investigate deep

autoencoders for the prediction of muscle activation trajectories for

point-to-point reaching movements. We evaluate our DNN predictions with

simulated reaches and two methods to generate the muscle activations: inverse

dynamics (ID) and optimal control (OC) criteria. We also investigate optimal

network parameters and training criteria to improve the accuracy of the

predictions.

From MEGATON to RASCAL: Surfing the Parameter Space of Evolutionary Algorithms

Moshe Sipper, Weixuan Fu, Karuna Ahuja, Jason H. Moore

Subjects: Neural and Evolutionary Computing (cs.NE)

The practice of evolutionary algorithms involves a mundane yet inescapable

phase, namely, finding parameters that work well. How big should the population

be? How many generations should the algorithm run? What is the (tournament

selection) tournament size? What probabilities should one assign to crossover

and mutation? All these nagging questions need good answers if one is to

embrace success. Through an extensive series of experiments over multiple

evolutionary algorithm implementations and problems we show that parameter

space tends to be rife with viable parameters. We aver that this renders the

life of the practitioner that much easier, and cap off our study with an

advisory digest for the weary.

Recurrent Inference Machines for Solving Inverse Problems

Patrick Putzky, Max Welling

Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

Much of the recent research on solving iterative inference problems focuses

on moving away from hand-chosen inference algorithms and towards learned

inference. In the latter, the inference process is unrolled in time and

interpreted as a recurrent neural network (RNN) which allows for joint learning

of model and inference parameters with back-propagation through time. In this

framework, the RNN architecture is directly derived from a hand-chosen

inference algorithm, effectively limiting its capabilities. We propose a

learning framework, called Recurrent Inference Machines (RIM), in which we turn

algorithm construction the other way round: Given data and a task, train an RNN

to learn an inference algorithm. Because RNNs are Turing complete [1, 2] they

are capable of implementing any inference algorithm. The framework allows for an

abstraction which removes the need for domain knowledge. We demonstrate in

several image restoration experiments that this abstraction is effective,

allowing us to achieve state-of-the-art performance on image denoising and

super-resolution tasks and superior across-task generalization.
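
The following numpy sketch illustrates the unrolled-update idea on a toy linear inverse problem; the recurrent update here is a fixed momentum-style rule standing in for a trained RNN cell, so it shows the loop structure only, not the authors' architecture.

```python
import numpy as np

def rim_solve(y, A, n_steps=50):
    """RIM-style loop for y = A x + noise: a recurrent update maps the
    data-fidelity gradient and a memory state to the next estimate."""
    x = A.T @ y                       # crude initial estimate
    h = np.zeros(A.shape[1])          # recurrent memory
    for _ in range(n_steps):
        grad = A.T @ (A @ x - y)      # gradient of ||A x - y||^2 / 2
        h = 0.9 * h + 0.1 * grad      # stand-in for a trained RNN cell
        x = x - h                     # learned update direction
    return x

# Toy inverse problem: recover x from noisy linear measurements.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 20)) / np.sqrt(30)
x_true = rng.normal(size=20)
y = A @ x_true + 0.01 * rng.normal(size=30)
x_hat = rim_solve(y, A)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```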

Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation

Jinzhuo Wang, Wenmin Wang, Ronggang Wang, Wen Gao

Comments: AAAI 2017

Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Monte Carlo tree search (MCTS) is extremely popular in computer Go which

determines each action by enormous simulations in a broad and deep search tree.

However, human experts select most actions by pattern analysis and careful

evaluation rather than brute-force search over millions of future interactions. In this

paper, we propose a computer Go system that follows experts' way of thinking and

playing. Our system consists of two parts. The first part is a novel deep

alternative neural network (DANN) used to generate candidates of next move.

Compared with existing deep convolutional neural networks (DCNNs), DANN inserts a

recurrent layer after each convolutional layer and stacks them in an

alternating manner. We show that this setting preserves more context of local

features and their evolution, which is beneficial for move prediction. The

second part is a long-term evaluation (LTE) module used to provide a reliable

evaluation of candidates rather than a single probability from the move predictor.

This is consistent with human experts' nature of play, since they can foresee

tens of steps to give an accurate estimation of candidates. In our system, for

each candidate, LTE calculates a cumulative reward after several future

interactions when local variations are settled. Combining criteria from the two

parts, our system determines the optimal choice of next move. For more

comprehensive experiments, we introduce a new professional Go dataset (PGD),

consisting of 253233 professional records. Experiments on GoGoD and PGD

datasets show that DANN substantially improves move prediction performance

over a pure DCNN. When combined with LTE, our system outperforms most relevant

approaches and open engines based on MCTS.

Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks

Joan Serrà, Alexandros Karatzoglou

Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016

Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

Recommendation algorithms that incorporate techniques from deep learning are

becoming increasingly popular. Due to the structure of the data coming from

recommendation domains (i.e., one-hot-encoded vectors of item preferences),

these algorithms tend to have large input and output dimensionalities that

dominate their overall size. This makes them difficult to train, due to the

limited memory of graphical processing units, and difficult to deploy on mobile

devices with limited hardware. To address these difficulties, we propose Bloom

embeddings, a compression technique that can be applied to the input and output

of neural network models dealing with sparse high-dimensional binary-coded

instances. Bloom embeddings are computationally efficient, and do not seriously

compromise the accuracy of the model up to 1/5 compression ratios. In some

cases, they even improve over the original accuracy, with relative increases up

to 12%. We evaluate Bloom embeddings on 7 data sets and compare them against 4

alternative methods, obtaining favorable results. We also discuss a number of

further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time

operation, zero or marginal space requirements, training time speedups, or the

fact that they do not require any change to the core model architecture or

training configuration.
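
A small numpy sketch of the hashing idea: each item id is mapped by k hash functions into an m-dimensional binary vector, so the input layer size no longer depends on the vocabulary. The hash construction below is illustrative, not the paper's exact choice.

```python
import hashlib
import numpy as np

def bloom_encode(item_ids, m=1000, k=4):
    """Project a set of item ids (one user's preferences) onto an
    m-dimensional binary vector via k hash functions."""
    v = np.zeros(m, dtype=np.float32)
    for item in item_ids:
        for seed in range(k):
            h = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8)
            v[int.from_bytes(h.digest(), "big") % m] = 1.0
    return v

# A 5-item preference set over a 100k vocabulary becomes a fixed
# 1000-dim input, independent of vocabulary size.
x = bloom_encode([17, 4242, 99001, 31337, 7], m=1000, k=4)
print(x.shape, int(x.sum()), "bits set")
```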

A Supervised Approach to Extractive Summarisation of Scientific Papers

Ed Collins, Isabelle Augenstein, Sebastian Riedel

Comments: 11 pages, 6 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Applications (stat.AP); Machine Learning (stat.ML)

Automatic summarisation is a popular approach to reduce a document to its

main arguments. Recent research in the area has focused on neural approaches to

summarisation, which can be very data-hungry. However, few large datasets exist

and none for the traditionally popular domain of scientific publications, which

opens up challenging research avenues centered on encoding large, complex

documents. In this paper, we introduce a new dataset for summarisation of

computer science publications by exploiting a large resource of author-provided

summaries and show straightforward ways of extending it further. We develop

models on the dataset making use of both neural sentence encoding and

traditionally used summarisation features and show that models which encode

sentences as well as their local and global context perform best, significantly

outperforming well-established baseline methods.

Computer Vision and Pattern Recognition

Video Imagination from a Single Image with Transformation Generation

Baoyang Chen, Wenmin Wang, Jinzhuo Wang, Xiongtao Chen, Weimian Li

Comments: 9 pages, 10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this work, we focus on a challenging task: synthesizing multiple imaginary

videos given a single image. Major problems come from high dimensionality of

pixel space and the ambiguity of potential motions. To overcome those problems,

we propose a new framework that produces imaginary videos by transformation

generation. The generated transformations are applied to the original image in

a novel volumetric merge network to reconstruct frames of the imaginary video.

Through sampling different latent variables, our method can output different

imaginary video samples. The framework is trained in an adversarial way with

unsupervised learning. For evaluation, we propose a new assessment metric

(RIQA). In experiments, we test on 3 datasets varying from synthetic data to

natural scenes. Our framework achieves promising performance in image quality

assessment. Visual inspection indicates that it can successfully generate

diverse five-frame videos of acceptable perceptual quality.

Joint Max Margin and Semantic Features for Continuous Event Detection in Complex Scenes

Iman Abbasnejad, Sridha Sridharan, Simon Denman, Clinton Fookes, Simon Lucey

Comments: Submitted to the journal Computer Vision and Image Understanding

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper the problem of complex event detection in the continuous domain

(i.e. events with unknown starting and ending locations) is addressed. Existing

event detection methods are limited to features that are extracted from the

local spatial or spatio-temporal patches from the videos. However, this makes

the model vulnerable to events with similar concepts, e.g., “Open drawer” and

“Open cupboard”. In this work, in order to address the aforementioned

limitations, we present a novel model based on the combination of semantic and

temporal features extracted from video frames. We train a max-margin classifier

on top of the extracted features in an adaptive framework that is able to

detect the events with unknown starting and ending locations. Our model is

based on the Bidirectional Region Neural Network and large margin Structural

Output SVM. The generality of our model allows it to be simply applied to

different labeled and unlabeled datasets. We finally test our algorithm on

three challenging datasets, “UCF 101-Action Recognition”, “MPII Cooking

Activities” and “Hollywood”, and we report state-of-the-art performance.

Deep Learning-Based Food Calorie Estimation Method in Dietary Assessment

Yanchao Liang, Jianhua Li

Comments: 13 pages. arXiv admin note: substantial text overlap with arXiv:1705.07632

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Obesity treatment requires obese patients to record all food intakes per day.

Computer vision has been introduced to estimate calories from food images. In

order to increase the accuracy of detection and reduce the error of volume

estimation, we present our calorie estimation method

in this paper. To estimate the calories in a food item, a top view and a side view are needed.

Faster R-CNN is used to detect the food and the calibration object. The GrabCut

algorithm is used to obtain each food's contour. The volume is then estimated from

the food contour and the calibration object. Finally, we estimate each food's calories;

the experimental results show that our estimation method is effective.
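
As a hedged illustration of the arithmetic at the end of this pipeline, the sketch below converts toy top-view and side-view segmentation masks into a calorie estimate; the shape factor, density, and energy constants are made-up placeholders, and the volume model is one of many possibilities rather than the authors' formula.

```python
import numpy as np

def pixel_to_cm(calib_px_width, calib_cm_width=2.5):
    """A calibration object of known width (assumed 2.5 cm here, e.g. a
    coin) gives the cm-per-pixel factor for the image."""
    return calib_cm_width / calib_px_width

def estimate_kcal(mask_top, mask_side, cm_per_px, density_g_cm3, kcal_per_g):
    """Rough volume model: top-view area times side-view height, shrunk
    by an assumed shape factor; then volume -> mass -> energy."""
    area_cm2 = mask_top.sum() * cm_per_px ** 2
    height_cm = mask_side.any(axis=1).sum() * cm_per_px
    volume_cm3 = 0.6 * area_cm2 * height_cm      # 0.6 = assumed shape factor
    return volume_cm3 * density_g_cm3 * kcal_per_g

# Toy masks standing in for GrabCut output:
top = np.ones((40, 40), dtype=bool)      # 40x40 px food region, top view
side = np.ones((20, 40), dtype=bool)     # 20 px tall food region, side view
scale = pixel_to_cm(calib_px_width=25)   # a 25 px wide coin -> 0.1 cm/px
print(round(estimate_kcal(top, side, scale, 0.8, 0.52), 1), "kcal")
```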

Text Extraction From Texture Images Using Masked Signal Decomposition

Shervin Minaee, Yao Wang

Comments: arXiv admin note: text overlap with arXiv:1704.07711

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text extraction is an important problem in image processing with applications

from optical character recognition to autonomous driving. Most of the

traditional text segmentation algorithms consider separating text from a simple

background (which usually has a different color from the text). In this work we

consider separating text from a textured background that has a similar color to

the text. We look at this problem from a signal decomposition perspective, and

consider a more realistic scenario where signal components are overlaid on top

of each other (instead of being added together). When the signals are overlaid, to

separate the signal components, we need to find a binary mask which shows the

support of each component. Because directly solving for the binary mask is

intractable, we relax the problem to an approximate continuous one, and

solve it with an alternating optimization method. We show that the proposed

algorithm achieves significantly better results than other recent works on

several challenging images.

Probabilistic RGB-D Odometry based on Points, Lines and Planes Under Depth Uncertainty

Pedro F. Proenca, Yang Gao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

This work proposes a robust visual odometry method for structured

environments that combines point features with line and plane segments,

extracted through an RGB-D camera. Noisy depth maps are processed by a

probabilistic depth fusion framework based on Mixtures of Gaussians to denoise

and derive the depth uncertainty, which is then propagated throughout the

visual odometry pipeline. Probabilistic 3D plane and line fitting solutions are

used to model the uncertainties of the feature parameters and pose is estimated

by combining the three types of primitives based on their uncertainties.

Performance evaluation on RGB-D sequences collected in this work and two public

RGB-D datasets (TUM and ICL-NUIM) shows the benefit of using the proposed depth

fusion framework and combining the three feature types, particularly in scenes

with low-textured surfaces, dynamic objects and missing depth measurements.

Long-Term Video Interpolation with Bidirectional Predictive Network

Xiongtao Chen, Wenmin Wang, Jinzhuo Wang, Weimian Li, Baoyang Chen

Comments: 5 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper considers the challenging task of long-term video interpolation.

Unlike most existing methods that only generate a few intermediate frames between

existing adjacent ones, we attempt to speculate or imagine the procedure of an

episode and further generate multiple frames between two non-consecutive frames

in videos. In this paper, we present a novel deep architecture called

bidirectional predictive network (BiPN) that predicts intermediate frames from

two opposite directions. The bidirectional architecture allows the model to

learn scene transformation with time as well as generate longer video

sequences. In addition, our model can be extended to predict multiple possible

procedures by sampling different noise vectors. A joint loss composed of clues

in image and feature spaces and adversarial loss is designed to train our

model. We demonstrate the advantages of BiPN on two benchmarks, Moving 2D Shapes

and UCF101, and report results competitive with recent approaches.

SEP-Nets: Small and Effective Pattern Networks

Zhe Li, Xiaoyu Wang, Xutao Lv, Tianbao Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

While going deeper has been shown to improve the performance of

convolutional neural networks (CNNs), going smaller has received

increasing attention recently due to its attractiveness for mobile/embedded

applications. How to design a small

network while retaining the performance of large and deep CNNs (e.g., Inception

Nets, ResNets) remains an active and important topic. Although there are already intensive studies on compressing the

size of CNNs, the considerable drop in performance is still a key concern in

many designs. This paper addresses this concern with several new contributions.

First, we propose a simple yet powerful method for compressing the size of deep

CNNs based on parameter binarization. The striking difference from most

previous work on parameter binarization/quantization lies at different

treatments of (1 \times 1) convolutions and (k \times k) convolutions ((k > 1)),

where we only binarize (k \times k) convolutions into binary patterns. The

resulting networks are referred to as pattern networks. By doing this, we show

that previous deep CNNs such as GoogLeNet and Inception-type Nets can be

compressed dramatically with marginal drop in performance. Second, in light of

the different functionalities of (1 \times 1) (data projection/transformation)

and (k \times k) convolutions (pattern extraction), we propose a new block

structure codenamed the pattern residual block that adds transformed feature

maps generated by (1 \times 1) convolutions to the pattern feature maps

generated by (k \times k) convolutions, based on which we design a small network

with (\sim 1) million parameters. Combining with our parameter binarization, we

achieve better performance on ImageNet than similarly sized networks,

including the recently released Google MobileNets.
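
A numpy sketch of the selective binarization described above: k x k kernels become sign patterns with one full-precision scale per filter (the mean-absolute-value scale is a common convention assumed here), while 1 x 1 projections stay dense.

```python
import numpy as np

def binarize_kxk(weights):
    """Binarize a conv weight tensor of shape (out_c, in_c, k, k) into
    {-1, +1} patterns with one full-precision scale per output filter."""
    out_c = weights.shape[0]
    alpha = np.abs(weights).reshape(out_c, -1).mean(axis=1)  # per-filter scale
    return np.sign(weights).astype(np.int8), alpha

def compress(layer_weights):
    """Binarize only the k x k kernels (k > 1); keep 1 x 1 projections dense."""
    return {name: (binarize_kxk(w) if w.shape[-1] > 1 else w)
            for name, w in layer_weights.items()}

rng = np.random.default_rng(0)
layers = {"conv3x3": rng.normal(size=(64, 32, 3, 3)),
          "proj1x1": rng.normal(size=(32, 64, 1, 1))}
out = compress(layers)
patterns, alpha = out["conv3x3"]
print(patterns.dtype, alpha.shape)   # int8 sign patterns + per-filter scales
```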

Deep Control – a simple automatic gain control for memory efficient and high performance training of deep convolutional neural networks

Brendan Ruff

Comments: Submitted to BMVC 2017 on 2nd May 2017

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Training a deep convolutional neural net typically starts with a random

initialisation of all filters in all layers, which severely reduces the forward

signal and back-propagated error and leads to slow and sub-optimal training.

Techniques that counter this focus on adaptively increasing either the signal or

the gradients, but the model behaves very differently at

the beginning of training compared to later, when stable pathways through the

net have been established. To compound this problem, the effective minibatch

size varies greatly between layers at different depths and between individual

filters, as activation sparsity typically increases with depth. This leads to a

reduction in the effective learning rate, since gradients may superpose rather than

add, and it further compounds the covariate shift problem, as deeper neurons

are less able to adapt to upstream shift.

Proposed here is a method of automatic gain control of the signal built into

each convolutional neuron that achieves performance equivalent or superior to

batch normalisation and is compatible with single-sample or minibatch gradient

descent. The same model is used both for training and inference.

The technique comprises a scaled per-sample map-mean subtraction from the raw

convolutional filter output, followed by scaling of the difference.
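
A per-sample numpy sketch of that last sentence; the abstract does not give the exact gain rule, so the normalisation by the centred map's RMS below is an assumption.

```python
import numpy as np

def gain_control(feature_map, subtract_scale=1.0, out_scale=1.0, eps=1e-5):
    """Per-sample automatic gain control: subtract a scaled map mean from
    the raw filter output, then rescale the difference. Works on ONE
    sample of shape (channels, height, width), so no batch statistics
    are involved and training and inference share the same model."""
    mean = feature_map.mean(axis=(1, 2), keepdims=True)
    centered = feature_map - subtract_scale * mean
    rms = np.sqrt((centered ** 2).mean(axis=(1, 2), keepdims=True) + eps)
    return out_scale * centered / rms     # RMS normalisation is an assumption

x = np.random.default_rng(0).normal(loc=3.0, size=(16, 8, 8))
y = gain_control(x)
print(round(float(y.mean()), 4), round(float(y.std()), 4))  # ~0 mean, ~unit scale
```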

Contrast Enhancement Estimation for Digital Image Forensics

Longyin Wen, Honggang Qi, Siwei Lyu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Inconsistency in contrast enhancement can be used to expose image forgeries.

In this work, we describe a new method to estimate contrast enhancement from a

single image. Our method takes advantage of the nature of contrast enhancement

as a mapping between pixel values, and the distinct characteristics it

introduces to the image pixel histogram. Our method recovers the original pixel

histogram and the contrast enhancement simultaneously from a single image with

an iterative algorithm. Unlike previous methods, our method is robust in the

presence of additive noise perturbations that are used to hide the traces of

contrast enhancement. Furthermore, we also develop an effective method to

detect image regions that have undergone contrast enhancement transformations that are

different from the rest of the image, and use this method to detect composite

images. We perform extensive experimental evaluations to demonstrate the

efficacy and efficiency of our method.

Can We See Photosynthesis? Magnifying the Tiny Color Changes of Plant Green Leaves Using Eulerian Video Magnification

Islam A.T.F. Taj-Eddin, Mahmoud Afifi, Mostafa Korashy, Ali H. Ahmed, Ng Yoke Cheng, Evelyng Hernandez, Salma M. Abdel-latif

Comments: 5 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Plant aliveness is proven through laboratory experiments and special

scientific instruments. In this paper, we aim to detect the degree of animation

of plants based on the magnification of the small color changes in the plant’s

green leaves using the Eulerian video magnification. Capturing the video under

a controlled environment, e.g., using a tripod and direct current (DC) light

sources, reduces camera movements and minimizes light fluctuations; we aim to

reduce the external factors as much as possible. The acquired video is then

stabilized, and a proposed algorithm is used to reduce the illumination variations.

Lastly, the Eulerian magnification is utilized to magnify the color changes on the

light invariant video. The proposed system does not require any special purpose

instruments as it uses a digital camera with a regular frame rate. The results

of magnified color changes on both natural and plastic leaves show that the

live green leaves have color changes in contrast to the plastic leaves. Hence,

we can argue that the color changes of the leaves are due to biological

operations, such as photosynthesis. To date, this is possibly the first work

that focuses on visually interpreting some biological operations of plants

without any special-purpose instruments.
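
For a sense of the core Eulerian step, the sketch below (numpy/scipy, ignoring the spatial pyramid used in full Eulerian video magnification) band-passes each pixel's time series and adds an amplified copy back.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_color(frames, fps, low_hz, high_hz, amplification=50.0):
    """frames: (T, H, W) float array of one colour channel.
    Band-pass each pixel's time series and re-add it, amplified."""
    b, a = butter(2, [low_hz / (fps / 2), high_hz / (fps / 2)], btype="band")
    bandpassed = filtfilt(b, a, frames, axis=0)   # temporal filtering per pixel
    return frames + amplification * bandpassed

# Toy clip: a faint 1 Hz colour oscillation hidden in noise.
rng = np.random.default_rng(0)
t = np.arange(120) / 30.0                          # 4 s at 30 fps
clip = 0.5 + 0.001 * np.sin(2 * np.pi * 1.0 * t)[:, None, None]
clip = clip + 0.0005 * rng.normal(size=(120, 16, 16))
out = magnify_color(clip, fps=30, low_hz=0.5, high_hz=2.0)
print(float(clip.std()), "->", float(out.std()))   # oscillation now dominates
```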

Criteria Sliders: Learning Continuous Database Criteria via Interactive Ranking

James Tompkin, Kwang In Kim, Hanspeter Pfister, Christian Theobalt

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Large databases are often organized by hand-labeled metadata, or criteria,

which are expensive to collect. We can use unsupervised learning to model

database variation, but these models are often high dimensional, complex to

parameterize, or require expert knowledge. We learn low-dimensional continuous

criteria via interactive ranking, so that the novice user need only describe

the relative ordering of examples. This is formulated as semi-supervised label

propagation in which we maximize the information gained from a limited number

of examples. Further, we actively suggest data points to the user to rank in a

more informative way than existing work. Our efficient approach allows users to

interactively organize thousands of data points along 1D and 2D continuous

sliders. We experiment with datasets of imagery and geometry to demonstrate

that our tool is useful for quickly assessing and organizing the content of

large databases.

A Direction Search and Spectral Clustering Based Approach to Subspace Clustering

Mostafa Rahmani, George Atia

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

This paper presents a new spectral-clustering-based approach to the subspace

clustering problem in which the data lies in the union of an unknown number of

unknown linear subspaces. Underpinning the proposed method is a convex program

for optimal direction search, which, for each data point d, finds an optimal

direction in the span of the data that has minimum projection on the other data

points and non-vanishing projection on d. The obtained directions are

subsequently leveraged to identify a neighborhood set for each data point. An

Alternating Direction Method of Multipliers (ADMM) framework is provided to

efficiently solve for the optimal directions. The proposed method is shown to

often outperform the existing subspace clustering methods, particularly for

unwieldy scenarios involving high levels of noise and close subspaces, and

yields the state-of-the-art results for the problem of face clustering using

subspace segmentation.

Indirect Image Registration with Large Diffeomorphic Deformations

Chong Chen, Ozan Öktem

Comments: 34 pages, 4 figures, 1 table

Subjects: Numerical Analysis (math.NA); Computer Vision and Pattern Recognition (cs.CV); Dynamical Systems (math.DS); Functional Analysis (math.FA); Optimization and Control (math.OC)

We introduce a variational framework for indirect image registration where a

template is registered against a target that is known only through indirect

noisy observations, such as in tomographic imaging. The registration uses

diffeomorphisms that transform the template through a (group) action. These

diffeomorphisms are generated using the large deformation diffeomorphic metric

mapping framework, i.e., they are given as solutions of a flow equation that is

defined by a velocity field. We prove existence of solutions to this indirect

image registration procedure and provide examples of its performance on 2D

tomography with very sparse and/or highly noisy data.

Recurrent Inference Machines for Solving Inverse Problems

Patrick Putzky, Max Welling

Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

Much of the recent research on solving iterative inference problems focuses

on moving away from hand-chosen inference algorithms and towards learned

inference. In the latter, the inference process is unrolled in time and

interpreted as a recurrent neural network (RNN) which allows for joint learning

of model and inference parameters with back-propagation through time. In this

framework, the RNN architecture is directly derived from a hand-chosen

inference algorithm, effectively limiting its capabilities. We propose a

learning framework, called Recurrent Inference Machines (RIM), in which we turn

algorithm construction the other way round: Given data and a task, train an RNN

to learn an inference algorithm. Because RNNs are Turing complete [1, 2] they

are capable of implementing any inference algorithm. The framework allows for an

abstraction which removes the need for domain knowledge. We demonstrate in

several image restoration experiments that this abstraction is effective,

allowing us to achieve state-of-the-art performance on image denoising and

super-resolution tasks and superior across-task generalization.

SmoothGrad: removing noise by adding noise

Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, Martin Wattenberg

Comments: 10 pages

Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Explaining the output of a deep network remains a challenge. In the case of

an image classifier, one type of explanation is to identify pixels that

strongly influence the final decision. A starting point for this strategy is

the gradient of the class score function with respect to the input image. This

gradient can be interpreted as a sensitivity map, and there are several

techniques that elaborate on this basic idea. This paper makes two

contributions: it introduces SmoothGrad, a simple method that can help visually

sharpen gradient-based sensitivity maps, and it discusses lessons in the

visualization of these maps. We publish the code for our experiments and a

website with our results.
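
The essence of SmoothGrad fits in a few lines: average the gradient-based sensitivity map over noisy copies of the input. A framework-agnostic numpy sketch, with a toy quadratic "class score" standing in for a real network.

```python
import numpy as np

def smoothgrad(score_grad, x, n_samples=50, noise_scale=0.1, seed=0):
    """Average the raw gradient map over Gaussian-perturbed copies of x.
    score_grad: function returning d(score)/d(input) for one input."""
    rng = np.random.default_rng(seed)
    sigma = noise_scale * (x.max() - x.min())
    grads = [score_grad(x + rng.normal(scale=sigma, size=x.shape))
             for _ in range(n_samples)]
    return np.mean(grads, axis=0)

# Toy "class score": sum of squared centre pixels, so the true
# sensitivity map highlights the image centre and nothing else.
def score_grad(x):
    g = np.zeros_like(x)
    g[8:24, 8:24] = 2 * x[8:24, 8:24]
    return g

x = np.random.default_rng(1).random((32, 32))
sens = np.abs(smoothgrad(score_grad, x))
print(sens.shape, float(sens[16, 16]) > float(sens[0, 0]))  # centre dominates
```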

Artificial Intelligence

Technical Report: Implementation and Validation of a Smart Health Application

Fran Casino, Constantinos Patsakis, Antoni Martinez-Balleste, Frederic Borras, Edgar Batista

Comments: 4-page Tech Report

Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

In this article, we explain in detail the internal structures and databases

of a smart health application. Moreover, we describe how to generate a

statistically sound synthetic dataset using real-world medical data.

Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation

Jinzhuo Wang, Wenmin Wang, Ronggang Wang, Wen Gao

Comments: AAAI 2017

Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Monte Carlo tree search (MCTS) is extremely popular in computer Go which

determines each action by enormous simulations in a broad and deep search tree.

However, human experts select most actions by pattern analysis and careful

evaluation rather than brute-force search over millions of future interactions. In this

paper, we propose a computer Go system that follows experts' way of thinking and

playing. Our system consists of two parts. The first part is a novel deep

alternative neural network (DANN) used to generate candidates of next move.

Compared with existing deep convolutional neural networks (DCNNs), DANN inserts a

recurrent layer after each convolutional layer and stacks them in an

alternating manner. We show that this setting preserves more context of local

features and their evolution, which is beneficial for move prediction. The

second part is a long-term evaluation (LTE) module used to provide a reliable

evaluation of candidates rather than a single probability from the move predictor.

This is consistent with human experts' nature of play, since they can foresee

tens of steps to give an accurate estimation of candidates. In our system, for

each candidate, LTE calculates a cumulative reward after several future

interactions when local variations are settled. Combining criteria from the two

parts, our system determines the optimal choice of next move. For more

comprehensive experiments, we introduce a new professional Go dataset (PGD),

consisting of 253233 professional records. Experiments on GoGoD and PGD

datasets show that DANN substantially improves move prediction performance

over a pure DCNN. When combined with LTE, our system outperforms most relevant

approaches and open engines based on MCTS.

Meta learning Framework for Automated Driving

Ahmad El Sallab, Mahmoud Saeed, Omar Abdel Tawab, Mohammed Abdou

Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

The success of automated driving deployment depends heavily on the

ability to develop an efficient and safe driving policy. The problem is well

formulated under the framework of optimal control as a cost optimization

problem. Model based solutions using traditional planning are efficient, but

require the knowledge of the environment model. On the other hand, model free

solutions suffer sample inefficiency and require too many interactions with the

environment, which is infeasible in practice. Methods under the Reinforcement

Learning framework usually require the notion of a reward function, which is

not available in the real world. Imitation learning helps in improving sample

efficiency by introducing prior knowledge obtained from the demonstrated

behavior, at the risk of exact behavior cloning without generalizing to unseen

environments. In this paper we propose a Meta learning framework, based on data

set aggregation, to improve generalization of imitation learning algorithms.

Under the proposed framework, we propose MetaDAgger, a novel algorithm which

tackles the generalization issues in traditional imitation learning. We use The

Open Racing Car Simulator (TORCS) to test our algorithm. Results on unseen test

tracks show significant improvement over traditional imitation learning

algorithms, improving learning time and sample efficiency at the same time.

The results are also supported by visualization of the learnt features to prove

generalization of the captured details.
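
A generic dataset-aggregation (DAgger-style) loop, which is the starting point the abstract builds on; this numpy sketch shows the aggregation mechanics on a toy steering task and does not reproduce the paper's meta-learning extension.

```python
import numpy as np

def dagger(expert, fit, rollout, start, n_iters=5):
    """Dataset-aggregation loop: roll out the current learner, relabel
    the states it visits with the expert, retrain on the union."""
    states, actions = [], []
    policy = expert                       # iteration 0 ~ behaviour cloning
    for _ in range(n_iters):
        visited = rollout(start, policy)  # states the LEARNER reaches
        states.extend(visited)
        actions.extend(expert(s) for s in visited)   # expert labels
        policy = fit(np.array(states), np.array(actions))
    return policy

# Toy 1-D steering task: the expert steers toward the lane centre (0).
expert = lambda s: float(np.clip(-0.5 * s, -1.0, 1.0))

def rollout(start, policy, T=20):
    rng = np.random.default_rng(0)
    s, traj = start, []
    for _ in range(T):
        traj.append(s)
        s = s + policy(s) + rng.normal(scale=0.01)
    return traj

def fit(S, A):                            # least-squares linear policy
    w = float(S @ A / (S @ S + 1e-8))
    return lambda s: w * s

policy = dagger(expert, fit, rollout, start=2.0)
print("learned gain:", round(policy(1.0), 3))   # close to the expert's -0.5
```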

On Natural Language Generation of Formal Argumentation

Federico Cerutti, Alice Toniolo, Timothy J. Norman

Comments: 17 pages, 4 figures, technical report

Subjects: Artificial Intelligence (cs.AI)

In this paper we provide a first analysis of the research questions that

arise when dealing with the problem of communicating pieces of formal

argumentation through natural language interfaces. It is a generally held

opinion that formal models of argumentation naturally capture human argument,

and some preliminary studies have focused on justifying this view.

Unfortunately, the results are not only inconclusive, but seem to suggest that

explaining formal argumentation to humans is a rather articulated task.

Graphical models for expressing argumentation-based reasoning are appealing,

but often humans require significant training to use these tools effectively.

We claim that natural language interfaces to formal argumentation systems offer

a real alternative, and may be the way forward for systems that capture human

argument.

Recommendations for Marketing Campaigns in Telecommunication Business based on the footprint analysis

J. Sidorova, L. Skold, O. Rosander, L. Lundberg

Comments: conference

Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

A major investment made by a telecom operator goes into the infrastructure

and its maintenance, while business revenues are proportional to how big and

good the customer base is. We present a data-driven analytic strategy based on

combinatorial optimization and analysis of historical data. The data cover

historical mobility of the users in one region of Sweden during a week.

Applying the proposed method to the case study, we have identified the optimal

proportion of geo-demographic segments in the customer base, developed a

functionality to assess the potential of a planned marketing campaign, and

explored the problem of an optimal number and types of the geo-demographic

segments to target through marketing campaigns. With the help of fuzzy logic,

the conclusions of data analysis are automatically translated into

comprehensible recommendations in a natural language.

Fuzzy Recommendations in Marketing Campaigns

S. Podapati, L. Lundberg, L. Skold, O. Rosander, J. Sidorova

Comments: conference

Subjects: Artificial Intelligence (cs.AI)

The population in Sweden is growing rapidly due to immigration. In this

light, the issue of infrastructure upgrades to provide telecommunication

services is of importance. New antennas can be installed at hot spots of user

demand, which will require an investment, and/or the clientele expansion can be

carried out in a planned manner to promote the exploitation of the

infrastructure in the less loaded geographical zones. In this paper, we explore

the second alternative. Informally speaking, the term Infrastructure-Stressing

describes a user who stays in the zones of high demand, which are prone to

produce service failures, if further loaded. We have studied the

Infrastructure-Stressing population in the light of their correlation with

geo-demographic segments. This is motivated by the fact that specific

geo-demographic segments can be targeted via marketing campaigns. Fuzzy logic

is applied to create an interface between big data, numeric methods for

processing big data and a manager.

Item Difficulty-Based Label Aggregation Models for Crowdsourcing

Chi Hong

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

A large amount of labeled data is required for supervised learning. However,

labeling by domain experts is expensive and time-consuming. A low cost and high

efficiency way to obtain large training datasets is to aggregate noisy labels

collected from non-professional crowds. Prior works have proposed confusion

matrices to evaluate the reliability of workers. In this paper, we redefine the

structure of the confusion matrices and propose two Bayesian Network based

methods which utilize item difficulty in label aggregation. We assume that

labels are generated by a probability distribution over confusion matrices,

item difficulties, labels and true labels. We use a Markov chain Monte Carlo

method to generate samples from the posterior distribution of model parameters

and then infer the results. To avoid bad local optima, we design a method to

preliminarily predict the difficulty of each item and initialize the model

parameters. We also introduce how to improve the scalability of our model.

Empirical results show that our methods consistently outperform

state-of-the-art methods.

A New Probabilistic Algorithm for Approximate Model Counting

Cunjing Ge, Feifei Ma, Tian Liu, Jian Zhang

Subjects: Artificial Intelligence (cs.AI)

Constrained counting is important in domains ranging from artificial

intelligence to software analysis. There are already a few approaches for

counting models over various types of constraints. Recently, hashing-based

approaches achieve both theoretical guarantees and scalability, but still rely

on solution enumeration. In this paper, a new probabilistic polynomial time

approximate model counter is proposed, which is also a hashing-based universal

framework, but with only satisfiability queries. A variant with a dynamic

stopping criterion is also presented. Empirical evaluation over benchmarks on

propositional logic formulas and SMT(BV) formulas shows that the approach is

promising.

Gradient descent GAN optimization is locally stable

Vaishnavh Nagarajan, J. Zico Kolter

Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

Despite their growing prominence, optimization in generative adversarial

networks (GANs) is still a poorly-understood topic. In this paper, we analyze

the “gradient descent” form of GAN optimization (i.e., the natural setting

where we simultaneously take small gradient steps in both generator and

discriminator parameters). We show that even though GAN optimization does not

correspond to a convex-concave game, even for simple parameterizations, under

proper conditions, equilibrium points of this optimization procedure are still

locally asymptotically stable for the traditional GAN formulation. On the other

hand, we show that the recently-proposed Wasserstein GAN can have

non-convergent limit cycles near equilibrium. Motivated by this stability

analysis, we propose an additional regularization term for gradient descent GAN

updates, which is able to guarantee local stability for both the WGAN and for

the traditional GAN, and also shows practical promise in speeding up

convergence and addressing mode collapse.
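
The stability issue and the proposed fix can be seen on a scalar toy problem: simultaneous gradient steps on a bilinear "GAN" objective spiral away from the equilibrium, while adding a penalty on the squared discriminator gradient (the flavour of regularizer proposed; the paper's exact term may differ) makes them contract.

```python
import numpy as np

def simultaneous_gd(eta_reg, lr=0.1, steps=200):
    """Toy bilinear 'GAN': V(d, g) = d * g, with equilibrium (0, 0).
    Generator minimises V + eta_reg * (dV/dd)^2; discriminator maximises V."""
    g, d = 1.0, 1.0
    for _ in range(steps):
        grad_g = d + 2 * eta_reg * g   # d/dg of [d*g + eta*g^2], since dV/dd = g
        grad_d = g
        g, d = g - lr * grad_g, d + lr * grad_d   # simultaneous updates
    return np.hypot(g, d)

print("plain GD distance to equilibrium:", simultaneous_gd(0.0))  # grows
print("regularised distance:           ", simultaneous_gd(0.5))  # shrinks
```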

Zero-Shot Relation Extraction via Reading Comprehension

Omer Levy, Minjoon Seo, Eunsol Choi, Luke Zettlemoyer

Comments: CoNLL 2017

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

We show that relation extraction can be reduced to answering simple reading

comprehension questions, by associating one or more natural-language questions

with each relation slot. This reduction has several advantages: we can (1)

learn relation-extraction models by extending recent neural

reading-comprehension techniques, (2) build very large training sets for those

models by combining relation-specific crowd-sourced questions with distant

supervision, and even (3) do zero-shot learning by extracting new relation

types that are only specified at test-time, for which we have no labeled

training examples. Experiments on a Wikipedia slot-filling task demonstrate

that the approach can generalize to new questions for known relation types with

high accuracy, and that zero-shot generalization to unseen relation types is

possible, at lower accuracy levels, setting the bar for future work on this

task.

Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks

Joan Serrà, Alexandros Karatzoglou

Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016

Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

Recommendation algorithms that incorporate techniques from deep learning are

becoming increasingly popular. Due to the structure of the data coming from

recommendation domains (i.e., one-hot-encoded vectors of item preferences),

these algorithms tend to have large input and output dimensionalities that

dominate their overall size. This makes them difficult to train, due to the

limited memory of graphical processing units, and difficult to deploy on mobile

devices with limited hardware. To address these difficulties, we propose Bloom

embeddings, a compression technique that can be applied to the input and output

of neural network models dealing with sparse high-dimensional binary-coded

instances. Bloom embeddings are computationally efficient, and do not seriously

compromise the accuracy of the model up to 1/5 compression ratios. In some

cases, they even improve over the original accuracy, with relative increases up

to 12%. We evaluate Bloom embeddings on 7 data sets and compare them against 4

alternative methods, obtaining favorable results. We also discuss a number of

further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time

operation, zero or marginal space requirements, training time speedups, or the

fact that they do not require any change to the core model architecture or

training configuration.

A Supervised Approach to Extractive Summarisation of Scientific Papers

Ed Collins, Isabelle Augenstein, Sebastian Riedel

Comments: 11 pages, 6 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Applications (stat.AP); Machine Learning (stat.ML)

Automatic summarisation is a popular approach to reduce a document to its

main arguments. Recent research in the area has focused on neural approaches to

summarisation, which can be very data-hungry. However, few large datasets exist

and none for the traditionally popular domain of scientific publications, which

opens up challenging research avenues centered on encoding large, complex

documents. In this paper, we introduce a new dataset for summarisation of

computer science publications by exploiting a large resource of author-provided

summaries and show straightforward ways of extending it further. We develop

models on the dataset making use of both neural sentence encoding and

traditionally used summarisation features and show that models which encode

sentences as well as their local and global context perform best, significantly

outperforming well-established baseline methods.

Causal Discovery in the Presence of Measurement Error: Identifiability Conditions

Kun Zhang, Mingming Gong, Joseph Ramsey, Kayhan Batmanghelich, Peter Spirtes, Clark Glymour

Comments: 15 pages, 5 figures, 1 table

Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Measurement error in the observed values of the variables can greatly change

the output of various causal discovery methods. This problem has received much

attention in multiple fields, but it is not clear to what extent the causal

model for the measurement-error-free variables can be identified in the

presence of measurement error with unknown variance. In this paper, we study

precise sufficient identifiability conditions for the measurement-error-free

causal model and show what information of the causal model can be recovered

from observed data. In particular, we present two different sets of

identifiability conditions, based on the second-order statistics and

higher-order statistics of the data, respectively. The former was inspired by

the relationship between the generating model of the

measurement-error-contaminated data and the factor analysis model, and the

latter makes use of the identifiability result of the over-complete independent

component analysis problem.

Optimal Auctions through Deep Learning

Paul Dütting, Zhe Feng, Harikrishna Narasimhan, David C. Parkes

Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Learning (cs.LG)

Designing an auction that maximizes expected revenue is an intricate task.

Indeed, as of today–despite major efforts and impressive progress over the

past few years–only the single-item case is fully understood. In this work, we

initiate the exploration of the use of tools from deep learning on this topic.

The design objective is revenue-optimal, dominant-strategy incentive-compatible

auctions. We show that multi-layer neural networks can learn almost-optimal

auctions for settings for which there are analytical solutions, such as

Myerson’s auction for a single item, Manelli and Vincent’s mechanism for a

single bidder with additive preferences over two items, or Yao’s auction for

two additive bidders with binary support distributions and multiple items, even

if no prior knowledge about the form of optimal auctions is encoded in the

network and the only feedback during training is revenue and regret. We further

show how characterization results, even rather implicit ones such as Rochet’s

characterization through induced utilities and their gradients, can be

leveraged to obtain more precise fits to the optimal design. We conclude by

demonstrating the potential of deep learning for deriving optimal auctions with

high revenue for poorly understood problems.

Information Retrieval

Recurrent Latent Variable Networks for Session-Based Recommendation

Sotirios Chatzis, Panayiotis Christodoulou, Andreas S. Andreou

Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

In this work, we attempt to ameliorate the impact of data sparsity in the

context of session-based recommendation. Specifically, we seek to devise a

machine learning mechanism capable of extracting subtle and complex underlying

temporal dynamics in the observed session data, so as to inform the

recommendation algorithm. To this end, we improve upon systems that utilize

deep learning techniques with recurrently connected units; we do so by adopting

concepts from the field of Bayesian statistics, namely variational inference.

Our proposed approach consists in treating the network recurrent units as

stochastic latent variables with a prior distribution imposed over them. On

this basis, we proceed to infer corresponding posteriors; these can be used for

prediction and recommendation generation, in a way that accounts for the

uncertainty in the available sparse training data. To allow for our approach to

easily scale to large real-world datasets, we perform inference under an

approximate amortized variational inference (AVI) setup, whereby the learned

posteriors are parameterized via (conventional) neural networks. We perform an

extensive experimental evaluation of our approach using challenging benchmark

datasets, and illustrate its superiority over existing state-of-the-art

techniques.

RELink: A Research Framework and Test Collection for Entity-Relationship Retrieval

Pedro Saleiro, Natasa Milic-Frayling, Eduarda Mendes Rodrigues, Carlos Soares

Comments: SIGIR 17 (resource)

Subjects: Information Retrieval (cs.IR)

Improvements of entity-relationship (E-R) search techniques have been

hampered by a lack of test collections, particularly for complex queries

involving multiple entities and relationships. In this paper we describe a

method for generating E-R test queries to support comprehensive E-R search

experiments. Queries and relevance judgments are created from content that

exists in a tabular form where columns represent entity types and the table

structure implies one or more relationships among the entities. Editorial work

involves creating natural language queries based on relationships represented

by the entries in the table. We have publicly released the RELink test

collection comprising 600 queries and relevance judgments obtained from a

sample of Wikipedia List-of-lists-of-lists tables. The latter comprise tuples

of entities that are extracted from columns and labelled by corresponding

entity types and relationships they represent. In order to facilitate research

in complex E-R retrieval, we have created and released as open source the

RELink Framework that includes Apache Lucene indexing and search specifically

tailored to E-R retrieval. RELink includes entity and relationship indexing

based on the ClueWeb-09-B Web collection with FACC1 text span annotations

linked to Wikipedia entities. With ready to use search resources and a

comprehensive test collection, we support the community in pursuing E-R research at

scale.

Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks

Massimo Quadrana, Alexandros Karatzoglou, Balázs Hidasi, Paolo Cremonesi

Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

Session-based recommendations are highly relevant in many modern on-line

services (e.g. e-commerce, video streaming) and recommendation settings.

Recurrent Neural Networks have recently been shown to perform very well in

session-based settings. While in many session-based recommendation domains user

identifiers are hard to come by, there are also domains in which user profiles

are readily available. We propose a seamless way to personalize RNN models with

cross-session information transfer and devise a hierarchical RNN model that

relays and evolves latent hidden states of the RNNs across user sessions.

Results on two industry datasets show large improvements over the session-only

RNNs.

Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks

Joan Serrà, Alexandros Karatzoglou

Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016

Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

Recommendation algorithms that incorporate techniques from deep learning are

becoming increasingly popular. Due to the structure of the data coming from

recommendation domains (i.e., one-hot-encoded vectors of item preferences),

these algorithms tend to have large input and output dimensionalities that

dominate their overall size. This makes them difficult to train, due to the

limited memory of graphical processing units, and difficult to deploy on mobile

devices with limited hardware. To address these difficulties, we propose Bloom

embeddings, a compression technique that can be applied to the input and output

of neural network models dealing with sparse high-dimensional binary-coded

instances. Bloom embeddings are computationally efficient, and do not seriously

compromise the accuracy of the model up to 1/5 compression ratios. In some

cases, they even improve over the original accuracy, with relative increases up

to 12%. We evaluate Bloom embeddings on 7 data sets and compare them against 4

alternative methods, obtaining favorable results. We also discuss a number of

further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time

operation, zero or marginal space requirements, training time speedups, or the

fact that they do not require any change to the core model architecture or

training configuration.

A Direction Search and Spectral Clustering Based Approach to Subspace Clustering

Mostafa Rahmani, George Atia

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

This paper presents a new spectral-clustering-based approach to the subspace

clustering problem in which the data lies in the union of an unknown number of

unknown linear subspaces. Underpinning the proposed method is a convex program

for optimal direction search, which, for each data point d, finds an optimal

direction in the span of the data that has minimum projection on the other data

points and non-vanishing projection on d. The obtained directions are

subsequently leveraged to identify a neighborhood set for each data point. An

Alternating Direction Method of Multipliers (ADMM) framework is provided to

efficiently solve for the optimal directions. The proposed method is shown to

often outperform the existing subspace clustering methods, particularly for

unwieldy scenarios involving high levels of noise and close subspaces, and

yields the state-of-the-art results for the problem of face clustering using

subspace segmentation.

Dionysius: A Framework for Modeling Hierarchical User Interactions in Recommender Systems

Jian Wang, Krishnaram Kenthapadi, Kaushik Rangadurai, David Hardtke

Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)

We address the following problem: How do we incorporate user-item interaction

signals as part of the relevance model in a large-scale personalized

recommendation system such that, (1) the ability to interpret the model and

explain recommendations is retained, and (2) the existing infrastructure

designed for the (user profile) content-based model can be leveraged? We

propose Dionysius, a hierarchical graphical model based framework and system

for incorporating user interactions into recommender systems, with minimal

change to the underlying infrastructure. We learn a hidden fields vector for

each user by considering the hierarchy of interaction signals, and replace the

user profile-based vector with this learned vector, thereby not expanding the

feature space at all. Thus, our framework allows the use of existing

recommendation infrastructure that supports content based features. We

implemented and deployed this system as part of the recommendation platform at

LinkedIn for more than one year. We validated the efficacy of our approach

through extensive offline experiments with different model choices, as well as

online A/B testing experiments. Our deployment of this system as part of the

job recommendation engine resulted in significant improvement in the quality of

retrieved results, thereby generating improved user experience and positive

impact for millions of users.

Computation and Language

An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing

Marcin Junczys-Dowmunt, Roman Grundkiewicz

Subjects: Computation and Language (cs.CL)

In this work, we explore multiple neural architectures adapted for the task

of automatic post-editing of machine translation output. We focus on neural

end-to-end models that combine both inputs (mt) and (src) in a single neural

architecture, modeling ({mt, src} \rightarrow pe) directly. Apart from that,

we investigate the influence of hard-attention models which seem to be

well-suited for monolingual tasks, as well as combinations of both ideas. We

report results on data sets provided during the WMT 2016 shared task on

automatic post-editing and demonstrate that double-attention models that

incorporate all available data in the APE scenario in a single model improve on

the best shared task system and on all other published results after the shared

task. Double-attention models that are combined with hard attention remain

competitive despite applying fewer changes to the input.

Zero-Shot Relation Extraction via Reading Comprehension

Omer Levy , Minjoon Seo , Eunsol Choi , Luke Zettlemoyer

Comments: CoNLL 2017

Subjects

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Learning (cs.LG)

We show that relation extraction can be reduced to answering simple reading

comprehension questions, by associating one or more natural-language questions

with each relation slot. This reduction has several advantages: we can (1)

learn relation-extraction models by extending recent neural

reading-comprehension techniques, (2) build very large training sets for those

models by combining relation-specific crowd-sourced questions with distant

supervision, and even (3) do zero-shot learning by extracting new relation

types that are only specified at test-time, for which we have no labeled

training examples. Experiments on a Wikipedia slot-filling task demonstrate

that the approach can generalize to new questions for known relation types with

high accuracy, and that zero-shot generalization to unseen relation types is

possible, at lower accuracy levels, setting the bar for future work on this

task.
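
A hypothetical sketch of the reduction: each relation slot is mapped to a question template, and any reading-comprehension model can then answer it over the sentence. The templates and the `qa_model` stub below are placeholders, not the authors' system:

```python
# Map each relation slot to a natural-language question template
# (these example relations and templates are assumptions).
TEMPLATES = {
    "educated_at": "Where did {} study?",
    "occupation":  "What is {}'s job?",
    "spouse":      "Who is {} married to?",
}

def qa_model(question, context):
    """Stub: a real reading-comprehension model would return an
    answer span from `context`, or None if there is no answer."""
    return None

def extract_relation(entity, relation, sentence):
    question = TEMPLATES[relation].format(entity)
    # None doubles as "the relation is not expressed in this sentence".
    return qa_model(question, sentence)

# Usage: extract_relation("Alan Turing", "educated_at",
#                         "Alan Turing studied at Princeton University.")
```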

Modelling prosodic structure using Artificial Neural Networks

Jean-Philippe Bernardy , Charalambos Themistocleous

Comments: 4 pages, 3 figures, Experimental linguistics 2017

Subjects

:

Computation and Language (cs.CL)

The ability to accurately perceive whether a speaker is asking a question or

is making a statement is crucial for any successful interaction. However,

learning and classifying tonal patterns has been a challenging task for

automatic speech recognition and for models of tonal representation, as tonal

contours are characterized by significant variation. This paper provides a

classification model of Cypriot Greek questions and statements. We evaluate two

state-of-the-art network architectures: a Long Short-Term Memory (LSTM) network

and a convolutional network (ConvNet). The ConvNet outperforms the LSTM on the

classification task, achieving excellent performance with 95% classification accuracy.
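
As a rough illustration of the kind of architecture evaluated, a small 1-D ConvNet over fixed-length tonal (F0) contours might look as follows; the layer sizes are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ContourConvNet(nn.Module):
    """Classify an F0 contour as question (1) or statement (0)."""
    def __init__(self, contour_len=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(32 * (contour_len // 4), 2)

    def forward(self, x):                  # x: (batch, 1, contour_len)
        return self.classifier(self.features(x).flatten(1))

model = ContourConvNet()
logits = model(torch.randn(8, 1, 128))    # 8 synthetic contours
```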

A Supervised Approach to Extractive Summarisation of Scientific Papers

Ed Collins , Isabelle Augenstein , Sebastian Riedel

Comments: 11 pages, 6 figures

Subjects

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Applications (stat.AP); Machine Learning (stat.ML)

Automatic summarisation is a popular approach to reduce a document to its

main arguments. Recent research in the area has focused on neural approaches to

summarisation, which can be very data-hungry. However, few large datasets exist

and none for the traditionally popular domain of scientific publications, which

opens up challenging research avenues centered on encoding large, complex

documents. In this paper, we introduce a new dataset for summarisation of

computer science publications by exploiting a large resource of author-provided

summaries and show straightforward ways of extending it further. We develop

models on the dataset making use of both neural sentence encoding and

traditionally used summarisation features and show that models which encode

sentences as well as their local and global context perform best, significantly

outperforming well-established baseline methods.

Six Challenges for Neural Machine Translation

Philipp Koehn , Rebecca Knowles

Comments: 12 pages; First Workshop on Neural Machine Translation, 2017

Subjects

:

Computation and Language (cs.CL)

We explore six challenges for neural machine translation: domain mismatch,

amount of training data, rare words, long sentences, word alignment, and beam

search. We show both deficiencies and improvements over the quality of

phrase-based statistical machine translation.

Attention-based Vocabulary Selection for NMT Decoding

Baskaran Sankaran , Markus Freitag , Yaser Al-Onaizan

Comments: Submitted to Second Conference on Machine Translation (WMT-17); 7 pages

Subjects

:

Computation and Language (cs.CL)

Neural Machine Translation (NMT) models usually use large target vocabulary

sizes to capture most of the words in the target language. The vocabulary size

is a big factor when decoding new sentences as the final softmax layer

normalizes over all possible target words. To address this problem, it is

common to restrict the target vocabulary with candidate lists based on the

source sentence. Usually, the candidate lists combine the output of an external

word-to-word aligner, phrase table entries, and the most frequent words. In this

work, we propose a simple yet novel approach to learning candidate lists

directly from the attention layer during NMT training. The candidate lists are

highly optimized for the current NMT model and do not need any external

computation of the candidate pool. We show significant decoding speedup

compared with using the entire vocabulary, without losing any translation

quality for two language pairs.
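
A minimal sketch of the idea, assuming access to per-sentence attention matrices collected during NMT training: attention mass from each source word to each target word is accumulated, and the top-k targets per source word form its candidate list. The data layout below is an assumption:

```python
from collections import defaultdict

def build_candidate_lists(training_batches, k=10):
    """training_batches yields (src_words, tgt_words, attn), where
    attn[t][s] is the attention of target position t on source position s."""
    mass = defaultdict(lambda: defaultdict(float))
    for src_words, tgt_words, attn in training_batches:
        for t, tgt in enumerate(tgt_words):
            for s, src in enumerate(src_words):
                mass[src][tgt] += attn[t][s]
    # Keep the k target words with the most accumulated attention per source word.
    return {src: sorted(tgts, key=tgts.get, reverse=True)[:k]
            for src, tgts in mass.items()}

def decoding_vocabulary(src_sentence, candidates):
    vocab = set()
    for w in src_sentence:
        vocab.update(candidates.get(w, ()))
    return vocab  # the softmax is restricted to this set at decoding time
```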

Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings

Shane Settle , Keith Levin , Herman Kamper , Karen Livescu

Comments: To appear Interspeech 2017

Subjects

:

Computation and Language (cs.CL)

Query-by-example search often uses dynamic time warping (DTW) for comparing

queries and proposed matching segments. Recent work has shown that comparing

speech segments by representing them as fixed-dimensional vectors — acoustic

word embeddings — and measuring their vector distance (e.g., cosine distance)

can discriminate between words more accurately than DTW-based approaches. We

consider an approach to query-by-example search that embeds both the query and

database segments according to a neural model, followed by nearest-neighbor

search to find the matching segments. Earlier work on embedding-based

query-by-example, using template-based acoustic word embeddings, achieved

competitive performance. We find that our embeddings, based on recurrent neural

networks trained to optimize word discrimination, achieve substantial

improvements in performance and run-time efficiency over the previous

approaches.
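
A minimal sketch of the search stage, assuming some trained embedding function `f` that maps a variable-length acoustic segment to a fixed-dimensional vector; the ranking itself is just cosine distance plus a sort:

```python
import numpy as np

def cosine_distances(a, B):
    a = a / np.linalg.norm(a)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - B @ a

def query_by_example(f, query_feats, segment_feats_list):
    q = f(query_feats)                                # fixed-dim query embedding
    E = np.stack([f(s) for s in segment_feats_list])  # database embeddings
    d = cosine_distances(q, E)
    return np.argsort(d)   # indices of best-matching segments first
```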

Encoding of phonology in a recurrent neural model of grounded speech

Afra Alishahi , Marie Barking , Grzegorz Chrupała

Comments: Accepted at CoNLL 2017

Subjects

:

Computation and Language (cs.CL)

; Learning (cs.LG); Sound (cs.SD)

We study the representation and encoding of phonemes in a recurrent neural

network model of grounded speech. We use a model which processes images and

their spoken descriptions, and projects the visual and auditory representations

into the same semantic space. We perform a number of analyses on how

information about individual phonemes is encoded in the MFCC features extracted

from the speech signal, and the activations of the layers of the model. Via

experiments with phoneme decoding and phoneme discrimination we show that

phoneme representations are most salient in the lower layers of the model,

where low-level signals are processed at a fine-grained level, although a large

amount of phonological information is retained at the top recurrent layer. We

further find that the attention mechanism following the top recurrent layer

significantly attenuates encoding of phonology and makes the utterance

embeddings much more invariant to synonymy. Moreover, a hierarchical clustering

of phoneme representations learned by the network shows an organizational

structure of phonemes similar to that proposed in linguistics.
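
The phoneme-decoding analysis is essentially a diagnostic classifier; a minimal sketch, assuming aligned (activation, phoneme) pairs have already been extracted from a chosen layer, could be:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def phoneme_decoding_accuracy(activations, phoneme_labels):
    """activations: (n_frames, dim) array from one layer;
    phoneme_labels: (n_frames,) aligned phoneme ids."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        activations, phoneme_labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # Higher held-out accuracy = more phonemic information in this layer.
    return probe.score(X_te, y_te)
```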

Verb Physics: Relative Physical Knowledge of Actions and Objects

Maxwell Forbes , Yejin Choi

Comments: 11 pages, published in Proceedings of ACL 2017

Subjects

:

Computation and Language (cs.CL)

Learning commonsense knowledge from natural language text is nontrivial due

to reporting bias: people rarely state the obvious, e.g., “My house is bigger

than me.” However, while rarely stated explicitly, this trivial everyday

knowledge does influence the way people talk about the world, which provides

indirect clues to reason about the world. For example, a statement like, “Tyler

entered his house” implies that his house is bigger than Tyler.

In this paper, we present an approach to infer relative physical knowledge of

actions and objects along five dimensions (e.g., size, weight, and strength)

from unstructured natural language text. We frame knowledge acquisition as

joint inference over two closely related problems: learning (1) relative

physical knowledge of object pairs and (2) physical implications of actions

when applied to those object pairs. Empirical results demonstrate that it is

possible to extract knowledge of actions and objects from language and that

joint inference over different types of knowledge improves performance.

Adversarial Feature Matching for Text Generation

Yizhe Zhang , Zhe Gan , Kai Fan , Zhi Chen , Ricardo Henao , Dinghan Shen , Lawrence Carin

Comments: Accepted by ICML 2017

Subjects

:

Machine Learning (stat.ML)

; Computation and Language (cs.CL); Learning (cs.LG)

The Generative Adversarial Network (GAN) has achieved great success in

generating realistic (real-valued) synthetic data. However, convergence issues

and difficulties dealing with discrete data hinder the applicability of GAN to

text. We propose a framework for generating realistic text via adversarial

training. We employ a long short-term memory network as generator, and a

convolutional network as discriminator. Instead of using the standard objective

of GAN, we propose matching the high-dimensional latent feature distributions

of real and synthetic sentences, via a kernelized discrepancy metric. This

eases adversarial training by alleviating the mode-collapsing problem. Our

experiments show superior performance in quantitative evaluation, and

demonstrate that our model can generate realistic-looking sentences.
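
A minimal sketch of the kernelized feature-matching objective, here instantiated as a plain MMD estimate with a Gaussian kernel over discriminator features; the kernel choice and bandwidth are assumptions:

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    d2 = torch.cdist(x, y) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd_loss(real_feats, fake_feats, sigma=1.0):
    """real_feats, fake_feats: (batch, dim) discriminator features of
    real and synthetic sentences."""
    k_rr = gaussian_kernel(real_feats, real_feats, sigma).mean()
    k_ff = gaussian_kernel(fake_feats, fake_feats, sigma).mean()
    k_rf = gaussian_kernel(real_feats, fake_feats, sigma).mean()
    return k_rr + k_ff - 2 * k_rf   # generator minimizes this discrepancy
```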

Distributed, Parallel, and Cluster Computing

Live Service Migration in Mobile Edge Clouds

Andrew Machen , Shiqiang Wang , Kin K. Leung , Bong Jun Ko , Theodoros Salonidis

Comments: This is the author’s version of the paper accepted for publication in IEEE Wireless Communications

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

; Networking and Internet Architecture (cs.NI)

Mobile edge clouds (MECs) bring the benefits of the cloud closer to the user,

by installing small cloud infrastructures at the network edge. This enables a

new breed of real-time applications, such as instantaneous object recognition

and safety assistance in intelligent transportation systems, that require very

low latency. One key issue that comes with proximity is how to ensure that

users always receive good performance as they move across different locations.

Migrating services between MECs is seen as the means to achieve this. This

article presents a layered framework for migrating active service applications

that are encapsulated either in virtual machines (VMs) or containers. This

layering approach allows a substantial reduction in service downtime. The

framework is easy to implement using readily available technologies, and one of

its key advantages is that it supports containers, which is a promising

emerging technology that offers tangible benefits over VMs. The migration

performance of various real applications is evaluated by experiments under the

presented framework. Insights drawn from the experimentation results are

discussed.

Distributed Subgraph Detection

Pierre Fraigniaud , Pedro Montealegre , Dennis Olivetti , Ivan Rapaport , Ioan Todinca Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

In the standard CONGEST model for distributed network computing, it is known

that “global” tasks such as minimum spanning tree, diameter, and all-pairs

shortest paths consume large bandwidth, for their running time is

(\Omega(\mathrm{poly}(n))) rounds in (n)-node networks with constant diameter.

Surprisingly, “local” tasks such as detecting the presence of a 4-cycle as a

subgraph also require (\widetilde{\Omega}(\sqrt{n})) rounds, even using

randomized algorithms, and the best known upper bound for detecting the

presence of a 3-cycle is (\widetilde{O}(n^{2/3})) rounds. The objective

of this paper is to better understand the landscape of such subgraph detection

tasks. We show that, in contrast to cycles, which are hard to detect in

the CONGEST model, there exists a deterministic algorithm for detecting the

presence of a subgraph isomorphic to (T) running in a constant number of

rounds, for every tree (T). Our algorithm provides a distributed implementation

of a combinatorial technique due to Erdős et al. for sparsening the set of

partial solutions kept by the nodes at each round. Our result has important

consequences for distributed property-testing, i.e., for randomized

algorithms whose aim is to distinguish between graphs satisfying a property,

and graphs far from satisfying that property. In particular, we get that, for

every graph pattern (H) composed of an edge and a tree connected in an

arbitrary manner, there exists a distributed testing algorithm for

(H)-freeness, performing in a constant number of rounds. Although the class of

graph patterns (H) formed by a tree and an edge connected arbitrarily may look

artificial, all previous results of the literature concerning testing

(H)-freeness for classical patterns such as cycles and cliques can be viewed as

direct consequences of our result, while our algorithm enables testing more

complex patterns.

Distributed Detection of Cycles

Pierre Fraigniaud , Dennis Olivetti Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

Distributed property testing in networks has been introduced by Brakerski and

Patt-Shamir (2011), with the objective of detecting the presence of large dense

sub-networks in a distributed manner. Recently, Censor-Hillel et al. (2016)

have shown how to detect 3-cycles in a constant number of rounds by a

distributed algorithm. In a follow up work, Fraigniaud et al. (2016) have shown

how to detect 4-cycles in a constant number of rounds as well. However, the

techniques in these latter works were shown not to generalize to larger cycles

(C_k) with (k \geq 5). In this paper, we completely settle the problem of cycle

detection by establishing the following result. For every (k \geq 3), there

exists a distributed property testing algorithm for (C_k)-freeness, performing

in a constant number of rounds. All these results hold in the classical CONGEST

model for distributed network computing. Our algorithm has 1-sided error, and its

round complexity is (O(1/\epsilon)), where (\epsilon \in (0,1)) is the property

testing parameter measuring the gap between legal and illegal instances.

The Power of Choice in Priority Scheduling

Dan Alistarh , Justin Kopinsky , Jerry Li , Giorgi Nadiradze Subjects : Data Structures and Algorithms (cs.DS) ; Distributed, Parallel, and Cluster Computing (cs.DC)

Consider the following random process: we are given (n) queues, into which

elements of increasing labels are inserted uniformly at random. To remove an

element, we pick two queues at random, and remove the element of lower label

(higher priority) among the two. The cost of a removal is the rank of the label

removed, among labels still present in any of the queues, that is, the distance

from the optimal choice at each step. Variants of this strategy are prevalent

in state-of-the-art concurrent priority queue implementations. Nonetheless, it

is not known whether such implementations provide any rank guarantees, even in

a sequential model.

We answer this question, showing that this strategy provides surprisingly

strong guarantees: Although the single-choice process, where we always insert

and remove from a single randomly chosen queue, has degrading cost, going to

infinity as we increase the number of steps, in the two-choice process, the

expected rank of a removed element is (O(n)), while the expected worst-case

cost is (O(n \log n)). These bounds are tight, and hold irrespective of the

number of steps for which we run the process.

The argument is based on a new technical connection between “heavily loaded”

balls-into-bins processes and priority scheduling.

Our analytic results inspire a new concurrent priority queue implementation,

which improves upon the state of the art in terms of practical performance.
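
The two-choice process is easy to simulate directly; the sketch below (an illustration, not the paper's analysis) measures the average rank cost of removals:

```python
import random

def simulate(n_queues=32, steps=5000, seed=0):
    rng = random.Random(seed)
    queues = [[] for _ in range(n_queues)]
    next_label, costs = 0, []
    for _ in range(steps):
        queues[rng.randrange(n_queues)].append(next_label)  # labels arrive in order
        next_label += 1
        qa, qb = queues[rng.randrange(n_queues)], queues[rng.randrange(n_queues)]
        heads = [q[0] for q in (qa, qb) if q]
        if not heads:
            continue
        chosen = min(heads)                    # lower label = higher priority
        remaining = sorted(l for q in queues for l in q)
        costs.append(remaining.index(chosen))  # rank among remaining labels
        (qa if qa and qa[0] == chosen else qb).remove(chosen)
    return sum(costs) / len(costs)

print(simulate())  # average rank cost; stays O(n) as steps grow
```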

Serialisable Multi-Level Transaction Control: A Specification and Verification

Egon Börger , Klaus-Dieter Schewe , Qing Wang

Comments: 25 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1706.01762

Journal-ref: Sci. Comput. Program. 131: 42-58 (2016)

Subjects

:

Databases (cs.DB)

; Distributed, Parallel, and Cluster Computing (cs.DC)

We define a programming language independent controller TaCtl for multi-level

transactions and an operator (TA) which, when applied to concurrent programs

with multi-level shared locations containing hierarchically structured complex

values, turns their behavior with respect to some abstract termination

criterion into a transactional behavior. We prove the correctness property that

concurrent runs under the transaction controller are serialisable, assuming an

Inverse Operation Postulate to guarantee recoverability. For its applicability

to a wide range of programs we specify the transaction controller TaCtl and the

operator (TA) in terms of Abstract State Machines (ASMs). This allows us to

model concurrent updates at different levels of nested locations in a precise

yet simple manner, namely in terms of partial ASM updates. It also provides the

possibility to use the controller TaCtl and the operator (TA) as a plug-in when

specifying concurrent system components in terms of sequential ASMs.

Asynchronous Graph Pattern Matching on Multiprocessor Systems

Alexander Krause , Annett Ungethüm , Thomas Kissinger , Dirk Habich , Wolfgang Lehner

Comments: 14 Pages, Extended version for ADBIS 2017

Subjects

:

Databases (cs.DB)

; Distributed, Parallel, and Cluster Computing (cs.DC)

Pattern matching on large graphs is the foundation for a variety of

application domains. Strict latency requirements and continuously increasing

graph sizes demand the usage of highly parallel in-memory graph processing

engines that need to consider non-uniform memory access (NUMA) and concurrency

issues to scale up on modern multiprocessor systems. To tackle these aspects,

graph partitioning becomes increasingly important. Hence, in this paper we present a

technique for processing graph pattern matching on NUMA systems. As a

scalable pattern matching processing infrastructure, we leverage a

data-oriented architecture that preserves data locality and minimizes

concurrency-related bottlenecks on NUMA systems. We show in detail how graph

pattern matching can be asynchronously processed on a multiprocessor system.

Learning

Gradient descent GAN optimization is locally stable

Vaishnavh Nagarajan , J. Zico Kolter Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

Despite their growing prominence, optimization in generative adversarial

networks (GANs) is still a poorly-understood topic. In this paper, we analyze

the “gradient descent” form of GAN optimization (i.e., the natural setting

where we simultaneously take small gradient steps in both generator and

discriminator parameters). We show that even though GAN optimization does not

correspond to a convex-concave game, even for simple parameterizations, under

proper conditions, equilibrium points of this optimization procedure are still

locally asymptotically stable for the traditional GAN formulation. On the other

hand, we show that the recently-proposed Wasserstein GAN can have

non-convergent limit cycles near equilibrium. Motivated by this stability

analysis, we propose an additional regularization term for gradient descent GAN

updates, which is able to guarantee local stability for both the WGAN and for

the traditional GAN, and also shows practical promise in speeding up

convergence and addressing mode collapse.
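
The stability question can be felt on the classic bilinear toy game, where one player minimizes x*y and the other maximizes it: simultaneous gradient steps spiral away from the equilibrium at (0, 0) rather than converging. This standard example is for intuition only and is not taken from the paper:

```python
# min over x, max over y of x*y; equilibrium at (0, 0).
x, y, lr = 1.0, 0.0, 0.1
for step in range(1, 6):
    # Simultaneous update: both gradients use the old (x, y).
    x, y = x - lr * y, y + lr * x        # descent in x, ascent in y
    print(step, round(x**2 + y**2, 4))   # radius^2 grows by (1 + lr^2) each step
```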

Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks

Massimo Quadrana , Alexandros Karatzoglou , Balázs Hidasi , Paolo Cremonesi Subjects : Learning (cs.LG) ; Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

Session-based recommendations are highly relevant in many modern on-line

services (e.g. e-commerce, video streaming) and recommendation settings.

Recurrent Neural Networks have recently been shown to perform very well in

session-based settings. While in many session-based recommendation domains user

identifiers are hard to come by, there are also domains in which user profiles

are readily available. We propose a seamless way to personalize RNN models with

cross-session information transfer and devise a hierarchical RNN model that

relays and evolves latent hidden states of the RNNs across user sessions.

Results on two industry datasets show large improvements over the session-only

RNNs.

Online Learning for Structured Loss Spaces

Siddharth Barman , Aditya Gopalan , Aadirupa Saha

Comments: 23 pages

Subjects

:

Learning (cs.LG)

We consider prediction with expert advice when the loss vectors are assumed

to lie in a set described by the sum of atomic norm balls. We derive a regret

bound for a general version of the online mirror descent (OMD) algorithm that

uses a combination of regularizers, each adapted to the constituent atomic

norms. The general result recovers standard OMD regret bounds, and yields

regret bounds for new structured settings where the loss vectors are (i) noisy

versions of points from a low-rank subspace, (ii) sparse vectors corrupted with

noise, and (iii) sparse perturbations of low-rank vectors. For the problem of

online learning with structured losses, we also show lower bounds on regret in

terms of rank and sparsity of the source set of the loss vectors, which implies

lower bounds for the above additive loss settings as well.

Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations

Yuanzhi Li , Yingyu Liang

Comments: Accepted to the International Conference on Machine Learning (ICML), 2017

Subjects

:

Learning (cs.LG)

; Data Structures and Algorithms (cs.DS); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

Non-negative matrix factorization is a basic tool for decomposing data into

the feature and weight matrices under non-negativity constraints, and in

practice is often solved in the alternating minimization framework. However, it

is unclear whether such algorithms can recover the ground-truth feature matrix

when the weights for different features are highly correlated, which is common

in applications. This paper proposes a simple and natural alternating gradient

descent based algorithm, and shows that with a mild initialization it provably

recovers the ground-truth in the presence of strong correlations. In most

interesting cases, the correlation can be in the same order as the highest

possible. Our analysis also reveals its several favorable features including

robustness to noise. We complement our theoretical results with empirical

studies on semi-synthetic datasets, demonstrating its advantage over several

popular methods in recovering the ground-truth.
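
A generic alternating-gradient-descent NMF loop looks as follows; this is a plain-vanilla sketch for orientation, while the paper's provable variant uses a specific initialization and update schedule not reproduced here:

```python
import numpy as np

def nmf_agd(Y, r, steps=500, lr=1e-3, seed=0):
    """Factor Y (m x n) as A (m x r) @ W (r x n) with A, W >= 0."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    A = np.abs(rng.standard_normal((m, r)))    # feature matrix
    W = np.abs(rng.standard_normal((r, n)))    # weight matrix
    for _ in range(steps):
        R = A @ W - Y
        A = np.maximum(A - lr * R @ W.T, 0.0)  # gradient step + projection
        R = A @ W - Y
        W = np.maximum(W - lr * A.T @ R, 0.0)
    return A, W

Y = np.abs(np.random.default_rng(1).standard_normal((20, 30)))
A, W = nmf_agd(Y, r=5)   # A @ W approximates Y under non-negativity
```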

Convergence Analysis of Belief Propagation for Pairwise Linear Gaussian Models

Jian Du , Shaodan Ma , Yik-Chung Wu , Soummya Kar , José M. F. Moura

Comments: arXiv admin note: text overlap with arXiv:1704.03969

Subjects

:

Learning (cs.LG)

; Machine Learning (stat.ML)

Gaussian belief propagation (BP) has been widely used for distributed

inference in large-scale networks such as the smart grid, sensor networks, and

social networks, where local measurements/observations are scattered over a

wide geographical area. One particular case is when two neighboring agents

share a common observation. For example, to estimate voltage in the direct

current (DC) power flow model, the current measurement over a power line is

proportional to the voltage difference between two neighboring buses. When

applying the Gaussian BP algorithm to this type of problem, the convergence

condition remains an open issue. In this paper, we analyze the convergence

properties of Gaussian BP for this pairwise linear Gaussian model. We show

analytically that the updating information matrix converges at a geometric rate

to a unique positive definite matrix for any positive semidefinite initial

value, and further provide the necessary and sufficient condition for the

belief mean vector to converge to the optimal estimate.

Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks

Joan Serrà , Alexandros Karatzoglou

Comments: Accepted for publication at ACM RecSys 2017; previous version submitted to ICLR 2016

Subjects

:

Learning (cs.LG)

; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

Recommendation algorithms that incorporate techniques from deep learning are

becoming increasingly popular. Due to the structure of the data coming from

recommendation domains (i.e., one-hot-encoded vectors of item preferences),

these algorithms tend to have large input and output dimensionalities that

dominate their overall size. This makes them difficult to train, due to the

limited memory of graphics processing units (GPUs), and difficult to deploy on mobile

devices with limited hardware. To address these difficulties, we propose Bloom

embeddings, a compression technique that can be applied to the input and output

of neural network models dealing with sparse high-dimensional binary-coded

instances. Bloom embeddings are computationally efficient, and do not seriously

compromise the accuracy of the model at compression ratios up to 1/5. In some

cases, they even improve over the original accuracy, with relative increases up

to 12%. We evaluate Bloom embeddings on 7 data sets and compare them against 4

alternative methods, obtaining favorable results. We also discuss a number of

further advantages of Bloom embeddings, such as ‘on-the-fly’ constant-time

operation, zero or marginal space requirements, training time speedups, or the

fact that they do not require any change to the core model architecture or

training configuration.
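
A minimal sketch of the encoding step: each item id is hashed by k independent hash functions into a vector of length m much smaller than the vocabulary. The hash construction below is an assumption; the paper evaluates several variants:

```python
import hashlib
import numpy as np

def bloom_encode(item_ids, m=1000, k=4):
    """Encode a set of item ids as an m-dimensional binary vector."""
    v = np.zeros(m, dtype=np.float32)
    for item in item_ids:
        for j in range(k):
            # Salt with j to emulate k independent hash functions.
            h = hashlib.md5(f"{j}:{item}".encode()).hexdigest()
            v[int(h, 16) % m] = 1.0
    return v  # feeds the network in place of a huge one-hot vector

x = bloom_encode(["movie_123", "movie_77"], m=64)
```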

Accelerated Dual Learning by Homotopic Initialization

Hadi Daneshmand , Hamed Hassani , Thomas Hofmann Subjects : Learning (cs.LG)

Gradient descent and coordinate descent are well understood in terms of their

asymptotic behavior, but less so in a transient regime often used for

approximations in machine learning. We investigate how proper initialization

can have a profound effect on finding near-optimal solutions quickly. We show

that a certain property of a data set, namely the boundedness of the

correlations between eigenfeatures and the response variable, can lead to

faster initial progress than expected by commonplace analysis. Convex

optimization problems can tacitly benefit from that, but this automatism does

not apply to their dual formulation. We analyze this phenomenon and devise

provably good initialization strategies for dual optimization as well as

heuristics for the non-convex case, relevant for deep learning. We find our

predictions and methods to be experimentally well-supported.

Exact Learning from an Honest Teacher That Answers Membership Queries

Nader H. Bshouty Subjects : Learning (cs.LG)

Consider a teacher that holds a function (f: X \to R) from some class of

functions (C). The teacher can receive from the learner an element (d) in the

domain (X) (a query) and returns the value of the function at (d), (f(d) \in R).

The learner's goal is to find (f) with a minimum number of queries, optimal time

complexity, and optimal resources.

In this survey, we present some of the results known from the literature,

different techniques used, some new problems, and open problems.

A Well-Tempered Landscape for Non-convex Robust Subspace Recovery

Tyler Maunu , Teng Zhang , Gilad Lerman Subjects : Learning (cs.LG) ; Optimization and Control (math.OC); Machine Learning (stat.ML)

We present a mathematical analysis of a non-convex energy landscape for

Robust Subspace Recovery. We prove that an underlying subspace is the only

stationary point and local minimizer in a large neighborhood if a generic

condition holds for a dataset. We further show that if the generic condition is

satisfied, a geodesic gradient descent method over the Grassmannian manifold

can exactly recover the underlying subspace with proper initialization. The

condition is shown to hold with high probability for a certain model of data.

MNL-Bandit: A Dynamic Learning Approach to Assortment Selection

Shipra Agrawal , Vashist Avadhanula , Vineet Goyal , Assaf Zeevi Subjects : Learning (cs.LG)

We consider a dynamic assortment selection problem, where in every round the

retailer offers a subset (assortment) of (N) substitutable products to a

consumer, who selects one of these products according to a multinomial logit

(MNL) choice model. The retailer observes this choice and the objective is to

dynamically learn the model parameters, while optimizing cumulative revenues

over a selling horizon of length (T). We refer to this exploration-exploitation

formulation as the MNL-Bandit problem. Existing methods for this problem follow

an “explore-then-exploit” approach, which estimates parameters to a desired

accuracy and then, treating these estimates as if they were the correct

parameter values, offers the optimal assortment based on them. These

approaches require certain a priori knowledge of “separability”, determined by

the true parameters of the underlying MNL model, and this in turn is critical

in determining the length of the exploration period. (Separability refers to

the distinguishability of the true optimal assortment from the other

sub-optimal alternatives.) In this paper, we give an efficient algorithm that

simultaneously explores and exploits, achieving performance independent of the

underlying parameters. The algorithm can be implemented in a fully online

manner, without knowledge of the horizon length (T). Furthermore, the algorithm

is adaptive in the sense that its performance is near-optimal in both the “well

separated” case, as well as the general parameter setting where this separation

need not hold.

Recurrent Neural Networks with Top-k Gains for Session-based Recommendations

Balázs Hidasi , Alexandros Karatzoglou Subjects : Learning (cs.LG)

RNNs have been shown to be excellent models for sequential data and in

particular for session-based user behavior. The use of RNNs provides impressive

performance benefits over classical methods in session-based recommendations.

In this work we introduce a novel ranking loss function tailored for RNNs in

recommendation settings. The better performance of such loss over alternatives,

along with further tricks and improvements described in this work, allow us to

achieve an overall improvement of up to 35% in terms of MRR and Recall@20 over

previous session-based RNN solutions and up to 51% over classical collaborative

filtering approaches. Unlike data augmentation-based improvements, our method

does not increase training times significantly.

SmoothGrad: removing noise by adding noise

Daniel Smilkov , Nikhil Thorat , Been Kim , Fernanda Viégas , Martin Wattenberg

Comments: 10 pages

Subjects

:

Learning (cs.LG)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Explaining the output of a deep network remains a challenge. In the case of

an image classifier, one type of explanation is to identify pixels that

strongly influence the final decision. A starting point for this strategy is

the gradient of the class score function with respect to the input image. This

gradient can be interpreted as a sensitivity map, and there are several

techniques that elaborate on this basic idea. This paper makes two

contributions: it introduces SmoothGrad, a simple method that can help visually

sharpen gradient-based sensitivity maps, and it discusses lessons in the

visualization of these maps. We publish the code for our experiments and a

website with our results.
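
SmoothGrad itself is a few lines: average the input gradient over several noisy copies of the image. The sketch below assumes a standard PyTorch classifier; `n` and `sigma` are the method's sample count and noise scale:

```python
import torch

def smoothgrad(model, image, target, n=25, sigma=0.15):
    """image: (C, H, W) tensor; target: class index to explain."""
    grads = torch.zeros_like(image)
    for _ in range(n):
        # Perturb the input with Gaussian noise and take the class-score gradient.
        noisy = (image + sigma * torch.randn_like(image)).requires_grad_(True)
        score = model(noisy.unsqueeze(0))[0, target]
        score.backward()
        grads += noisy.grad
    return (grads / n).abs()   # smoothed sensitivity map
```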

Lost Relatives of the Gumbel Trick

Matej Balog , Nilesh Tripuraneni , Zoubin Ghahramani , Adrian Weller

Comments: 34th International Conference on Machine Learning (ICML 2017)

Subjects

:

Machine Learning (stat.ML)

; Learning (cs.LG)

The Gumbel trick is a method to sample from a discrete probability

distribution, or to estimate its normalizing partition function. The method

relies on repeatedly applying a random perturbation to the distribution in a

particular way, each time solving for the most likely configuration. We derive

an entire family of related methods, of which the Gumbel trick is one member,

and show that the new methods have superior properties in several settings with

minimal additional computational cost. In particular, for the Gumbel trick to

yield computational benefits for discrete graphical models, Gumbel

perturbations on all configurations are typically replaced with so-called

low-rank perturbations. We show how a subfamily of our new methods adapts to

this setting, proving new upper and lower bounds on the log partition function

and deriving a family of sequential samplers for the Gibbs distribution.

Finally, we balance the discussion by showing how the simpler analytical form

of the Gumbel trick enables additional theoretical results.
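
The basic Gumbel trick in isolation, for orientation: perturb unnormalized log-potentials with Gumbel noise and take the argmax, which both samples from the distribution and, via the expected maximum, estimates the log partition function:

```python
import numpy as np

def gumbel_max_sample(log_potentials, rng):
    g = rng.gumbel(size=log_potentials.shape)
    return np.argmax(log_potentials + g)   # exact sample from phi / Z

rng = np.random.default_rng(0)
log_phi = np.log(np.array([0.1, 0.2, 0.7]))
samples = [gumbel_max_sample(log_phi, rng) for _ in range(10000)]
print(np.bincount(samples) / 10000)        # approx [0.1, 0.2, 0.7]
# E[max(log_phi + g)] = log Z + Euler-Mascheroni constant, so averaging
# the maxima over repetitions estimates the partition function.
```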

Zero-Shot Relation Extraction via Reading Comprehension

Omer Levy , Minjoon Seo , Eunsol Choi , Luke Zettlemoyer

Comments: CoNLL 2017

Subjects

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Learning (cs.LG)

We show that relation extraction can be reduced to answering simple reading

comprehension questions, by associating one or more natural-language questions

with each relation slot. This reduction has several advantages: we can (1)

learn relation-extraction models by extending recent neural

reading-comprehension techniques, (2) build very large training sets for those

models by combining relation-specific crowd-sourced questions with distant

supervision, and even (3) do zero-shot learning by extracting new relation

types that are only specified at test-time, for which we have no labeled

training examples. Experiments on a Wikipedia slot-filling task demonstrate

that the approach can generalize to new questions for known relation types with

high accuracy, and that zero-shot generalization to unseen relation types is

possible, at lower accuracy levels, setting the bar for future work on this

task.

Interaction-Based Distributed Learning in Cyber-Physical and Social Networks

Francesco Sasso , Angelo Coluccia , Giuseppe Notarstefano Subjects : Optimization and Control (math.OC) ; Learning (cs.LG); Statistics Theory (math.ST)

In this paper we consider a network scenario in which agents can evaluate

each other according to a score graph that models some physical or social

interaction. The goal is to design a distributed protocol, run by the agents,

allowing them to learn their unknown state among a finite set of possible

values. We propose a Bayesian framework in which scores and states are

associated to probabilistic events with unknown parameters and hyperparameters

respectively. We prove that each agent can learn its state by means of a local

Bayesian classifier and a (centralized) Maximum-Likelihood (ML) estimator of

the parameter-hyperparameter that combines plain ML and Empirical Bayes

approaches. By using tools from graphical models, which allow us to gain

insight on conditional dependences of scores and states, we provide two relaxed

probabilistic models that ultimately lead to ML parameter-hyperparameter

estimators amenable to distributed computation. In order to highlight the

appropriateness of the proposed relaxations, we demonstrate the distributed

estimators on a machine-to-machine testing set-up for anomaly detection and on

a social interaction set-up for user profiling.

Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation

Jinzhuo Wang , Wenmin Wang , Ronggang Wang , Wen Gao

Comments: AAAI 2017

Subjects

:

Artificial Intelligence (cs.AI)

; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Monte Carlo tree search (MCTS) is extremely popular in computer Go which

determines each action by enormous simulations in a broad and deep search tree.

However, human experts select most actions by pattern analysis and careful

evaluation rather than brute-force search over millions of future interactions. In this

paper, we propose a computer Go system that follows experts' way of thinking and

playing. Our system consists of two parts. The first part is a novel deep

alternative neural network (DANN) used to generate candidates of next move.

Compared with the existing deep convolutional neural network (DCNN), DANN inserts

a recurrent layer after each convolutional layer and stacks them in an

alternating manner. We show that such a setting can preserve more context of

local features and their evolution, which is beneficial for move prediction. The

second part is a long-term evaluation (LTE) module used to provide a reliable

evaluation of candidates rather than a single probability from move predictor.

This is consistent with how human experts play, since they can foresee

tens of moves ahead to give an accurate estimation of candidates. In our system, for

each candidate, LTE calculates a cumulative reward after several future

interactions when local variations are settled. Combining criteria from the two

parts, our system determines the optimal choice of next move. For more

comprehensive experiments, we introduce a new professional Go dataset (PGD),

consisting of 253233 professional records. Experiments on GoGoD and PGD

datasets show that DANN can substantially improve the performance of move prediction

over a pure DCNN. When combined with LTE, our system outperforms most relevant

approaches and open engines based on MCTS.

Recurrent Latent Variable Networks for Session-Based Recommendation

Sotirios Chatzis , Panayiotis Christodoulou , Andreas S. Andreou Subjects : Information Retrieval (cs.IR) ; Learning (cs.LG); Machine Learning (stat.ML)

In this work, we attempt to ameliorate the impact of data sparsity in the

context of session-based recommendation. Specifically, we seek to devise a

machine learning mechanism capable of extracting subtle and complex underlying

temporal dynamics in the observed session data, so as to inform the

recommendation algorithm. To this end, we improve upon systems that utilize

deep learning techniques with recurrently connected units; we do so by adopting

concepts from the field of Bayesian statistics, namely variational inference.

Our proposed approach consists in treating the network recurrent units as

stochastic latent variables with a prior distribution imposed over them. On

this basis, we proceed to infer corresponding posteriors; these can be used for

prediction and recommendation generation, in a way that accounts for the

uncertainty in the available sparse training data. To allow for our approach to

easily scale to large real-world datasets, we perform inference under an

approximate amortized variational inference (AVI) setup, whereby the learned

posteriors are parameterized via (conventional) neural networks. We perform an

extensive experimental evaluation of our approach using challenging benchmark

datasets, and illustrate its superiority over existing state-of-the-art

techniques.

Item Difficulty-Based Label Aggregation Models for Crowdsourcing

Chi Hong Subjects : Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Learning (cs.LG)

A large amount of labeled data is required for supervised learning. However,

labeling by domain experts is expensive and time-consuming. A low-cost and

high-efficiency way to obtain large training datasets is to aggregate noisy labels

collected from non-professional crowds. Prior works have proposed confusion

matrices to evaluate the reliability of workers. In this paper, we redefine the

structure of the confusion matrices and propose two Bayesian Network based

methods which utilize item difficulty in label aggregation. We assume that

labels are generated by a probability distribution over confusion matrices,

item difficulties, labels, and true labels. We use a Markov chain Monte Carlo

method to generate samples from the posterior distribution of model parameters

and then infer the results. To avoid bad local optima, we design a method to

preliminarily predict the difficulty of each item and initialize the model

parameters. We also describe how to improve the scalability of our model.

Empirical results show that our methods consistently outperform

state-of-the-art methods.

Analyzing the Robustness of Nearest Neighbors to Adversarial Examples

Yizhen Wang , Somesh Jha , Kamalika Chaudhuri Subjects : Machine Learning (stat.ML) ; Cryptography and Security (cs.CR); Learning (cs.LG)

Motivated by applications such as autonomous vehicles, test-time attacks via

adversarial examples have received a great deal of recent attention. In this

setting, an adversary is capable of making queries to a classifier, and

perturbs a test example by a small amount in order to force the classifier to

report an incorrect label. While a long line of work has explored a number of

attacks, not many reliable defenses are known, and there is an overall lack of

general understanding about the foundations of designing machine learning

algorithms robust to adversarial examples.

In this paper, we take a step towards addressing this challenging question by

introducing a new theoretical framework, analogous to bias-variance theory,

which we can use to tease out the causes of vulnerability. We apply our

framework to a simple classification algorithm: nearest neighbors, and analyze

its robustness to adversarial examples. Motivated by our analysis, we propose a

modified version of the nearest neighbor algorithm, and demonstrate both

theoretically and empirically that it has superior robustness to standard

nearest neighbors.

SEP-Nets: Small and Effective Pattern Networks

Zhe Li , Xiaoyu Wang , Xutao Lv , Tianbao Yang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG)

While going deeper has been shown to improve the performance of

convolutional neural networks (CNN), going smaller for CNN has received

increasing attention recently due to its attractiveness for mobile/embedded

applications. How to design a small network while retaining the performance of

large and deep CNNs (e.g., Inception Nets, ResNets) remains an active and

important topic. Although there are already intensive studies on compressing the

size of CNNs, the considerable drop of performance is still a key concern in

many designs. This paper addresses this concern with several new contributions.

First, we propose a simple yet powerful method for compressing the size of deep

CNNs based on parameter binarization. The striking difference from most

previous work on parameter binarization/quantization lies at different

treatments of (1 \times 1) convolutions and (k \times k) convolutions ((k > 1)),

where we only binarize (k \times k) convolutions into binary patterns. The

resulting networks are referred to as pattern networks. By doing this, we show

that previous deep CNNs such as GoogLeNet and Inception-type Nets can be

compressed dramatically with marginal drop in performance. Second, in light of

the different functionalities of (1 \times 1) convolutions (data

projection/transformation) and (k \times k) convolutions (pattern extraction),

we propose a new block structure codenamed the pattern residual block that adds

transformed feature maps generated by (1 \times 1) convolutions to the pattern

feature maps generated by (k \times k) convolutions, based on which we design a

small network with (\sim 1) million parameters. Combined with our parameter

binarization, we achieve better performance on ImageNet than similarly sized networks

including recently released Google MobileNets.
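
A minimal sketch of the binarization rule as described: only k x k (k > 1) convolution kernels are reduced to sign patterns with a per-filter scale, while 1 x 1 convolutions stay at full precision. Applying it post hoc, as below, is a simplification; the paper trains with binarization in the loop:

```python
import torch
import torch.nn as nn

def binarize_kxk(model):
    """Binarize k x k (k > 1) Conv2d kernels to sign patterns in place."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d) and m.kernel_size[0] > 1:
            w = m.weight.data
            # One scale per output filter, as in common binarization schemes.
            alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)
            m.weight.data = alpha * w.sign()
    return model
```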

A Direction Search and Spectral Clustering Based Approach to Subspace Clustering

Mostafa Rahmani , George Atia Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Information Retrieval (cs.IR); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

This paper presents a new spectral-clustering-based approach to the subspace

clustering problem in which the data lies in the union of an unknown number of

unknown linear subspaces. Underpinning the proposed method is a convex program

for optimal direction search, which, for each data point d, finds an optimal

direction in the span of the data that has minimum projection on the other data

points and non-vanishing projection on d. The obtained directions are

subsequently leveraged to identify a neighborhood set for each data point. An

Alternating Direction Method of Multipliers (ADMM) framework is provided to

efficiently solve for the optimal directions. The proposed method is shown to

often outperform the existing subspace clustering methods, particularly for

unwieldy scenarios involving high levels of noise and close subspaces, and

yields state-of-the-art results for the problem of face clustering using

subspace segmentation.

Adversarial Feature Matching for Text Generation

Yizhe Zhang , Zhe Gan , Kai Fan , Zhi Chen , Ricardo Henao , Dinghan Shen , Lawrence Carin

Comments: Accepted by ICML 2017

Subjects

:

Machine Learning (stat.ML)

; Computation and Language (cs.CL); Learning (cs.LG)

The Generative Adversarial Network (GAN) has achieved great success in

generating realistic (real-valued) synthetic data. However, convergence issues

and difficulties dealing with discrete data hinder the applicability of GAN to

text. We propose a framework for generating realistic text via adversarial

training. We employ a long short-term memory network as generator, and a

convolutional network as discriminator. Instead of using the standard objective

of GAN, we propose matching the high-dimensional latent feature distributions

of real and synthetic sentences, via a kernelized discrepancy metric. This

eases adversarial training by alleviating the mode-collapsing problem. Our

experiments show superior performance in quantitative evaluation, and

demonstrate that our model can generate realistic-looking sentences.

Encoding of phonology in a recurrent neural model of grounded speech

Afra Alishahi , Marie Barking , Grzegorz Chrupała

Comments: Accepted at CoNLL 2017

Subjects

:

Computation and Language (cs.CL)

; Learning (cs.LG); Sound (cs.SD)

We study the representation and encoding of phonemes in a recurrent neural

network model of grounded speech. We use a model which processes images and

their spoken descriptions, and projects the visual and auditory representations

into the same semantic space. We perform a number of analyses on how

information about individual phonemes is encoded in the MFCC features extracted

from the speech signal, and the activations of the layers of the model. Via

experiments with phoneme decoding and phoneme discrimination we show that

phoneme representations are most salient in the lower layers of the model,

where low-level signals are processed at a fine-grained level, although a large

amount of phonological information is retained at the top recurrent layer. We

further find that the attention mechanism following the top recurrent layer

significantly attenuates encoding of phonology and makes the utterance

embeddings much more invariant to synonymy. Moreover, a hierarchical clustering

of phoneme representations learned by the network shows an organizational

structure of phonemes similar to that proposed in linguistics.

Optimal Auctions through Deep Learning

Paul Dütting , Zhe Feng , Harikrishna Narasimhan , David C. Parkes Subjects : Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Learning (cs.LG)

Designing an auction that maximizes expected revenue is an intricate task.

Indeed, as of today–despite major efforts and impressive progress over the

past few years–only the single-item case is fully understood. In this work, we

initiate the exploration of the use of tools from deep learning on this topic.

The design objective is revenue-optimal, dominant-strategy incentive-compatible

auctions. We show that multi-layer neural networks can learn almost-optimal

auctions for settings for which there are analytical solutions, such as

Myerson’s auction for a single item, Manelli and Vincent’s mechanism for a

single bidder with additive preferences over two items, or Yao’s auction for

two additive bidders with binary support distributions and multiple items, even

if no prior knowledge about the form of optimal auctions is encoded in the

network and the only feedback during training is revenue and regret. We further

show how characterization results, even rather implicit ones such as Rochet’s

characterization through induced utilities and their gradients, can be

leveraged to obtain more precise fits to the optimal design. We conclude by

demonstrating the potential of deep learning for deriving optimal auctions with

high revenue for poorly understood problems.

Information Theory

Wiretap Channels: Nonasymptotic Fundamental Limits

Wei Yang , Rafael F. Schaefer , H. Vincent Poor

Comments: 53 pages, 3 figures

Subjects

:

Information Theory (cs.IT)

This paper investigates the maximal secret communication rate over a wiretap

channel subject to reliability and secrecy constraints at a given blocklength.

New achievability and converse bounds are derived, which are uniformly tighter

than existing bounds, and lead to the tightest bounds on the second-order

coding rate for discrete memoryless and Gaussian wiretap channels. The exact

second-order coding rate is established for semi-deterministic wiretap

channels, which characterizes the optimal tradeoff between reliability and

secrecy in the finite-blocklength regime. Underlying our achievability bounds

are two new privacy amplification results, which not only refine the existing

results, but also achieve stronger notions of secrecy.

Fast Maximum-Likelihood Decoder for 4×4 Quasi-Orthogonal Space-Time Block Code

Adel Ahmadi , Siamak Talebi Subjects : Information Theory (cs.IT)

This letter introduces two fast maximum-likelihood (ML) detection methods for

4×4 quasi-orthogonal space-time block code (QOSTBC). The first algorithm with a

relatively simple design exploits structure of quadrature amplitude modulation

(QAM) constellations to achieve its goal and the second algorithm, though

somewhat more complex, can be applied to any arbitrary constellation. Both

decoders utilize a novel decomposition technique for ML metric which divides

the metric into independent positive parts and a positive interference part.

Search spaces of symbols are substantially reduced by employing the independent

parts and statistics of noise. Finally, the members of search spaces are

successively evaluated until the metric is minimized. Simulation results

confirm that the proposed decoder is superior to some of the most recently

published methods in terms of complexity level. More specifically, the results

verified that application of the new algorithm with 1024-QAM would require

reduced computational complexity compared to the state-of-the-art solution with

16-QAM.

Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization

Dingwen Tao , Sheng Di , Zizhong Chen , Franck Cappello

Comments: Accepted by IPDPS’17, 11 pages, 10 figures, double column

Subjects

:

Information Theory (cs.IT)

Today’s HPC applications are producing extremely large amounts of data, such

that data storage and analysis are becoming more challenging for scientific

research. In this work, we design a new error-controlled lossy compression

algorithm for large-scale scientific data. Our key contribution is

significantly improving the prediction hitting rate (or prediction accuracy)

for each data point based on its nearby data values along multiple dimensions.

We derive a series of multilayer prediction formulas and their unified formula

in the context of data compression. One serious challenge is that the data

prediction has to be performed based on the preceding decompressed values

during the compression in order to guarantee the error bounds, which may

degrade the prediction accuracy in turn. We explore the best layer for the

prediction by considering the impact of compression errors on the prediction

accuracy. Moreover, we propose an adaptive error-controlled quantization

encoder, which can further improve the prediction hitting rate considerably.

The data size can be reduced significantly after performing the variable-length

encoding because of the uneven distribution produced by our quantization

encoder. We evaluate the new compressor on production scientific data sets and

compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP,

SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class,

especially with regard to compression factors (or bit-rates) and compression

errors (including RMSE, NRMSE, and PSNR). Our solution is better than the

second-best solution by more than a 2x increase in the compression factor and

3.8x reduction in the normalized root mean squared error on average, with

reasonable error bounds and user-desired bit-rates.
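
A one-dimensional toy version of the prediction-plus-quantization scheme helps fix ideas: predict each value from the preceding decompressed value (so the error bound provably holds) and store the quantized residual codes. The paper's multidimensional, multilayer predictors are not reproduced here:

```python
import numpy as np

def compress(data, err):
    """Previous-value prediction + error-controlled quantization (1-D toy)."""
    codes, recon = [], []
    prev = 0.0
    for x in data:
        q = int(round((x - prev) / (2 * err)))  # residual quantization bin
        codes.append(q)                          # entropy-code these in practice
        prev = prev + 2 * err * q                # decompressed value
        recon.append(prev)
    return np.array(codes), np.array(recon)

data = np.cumsum(np.random.default_rng(0).normal(size=8))
codes, recon = compress(data, err=0.01)
assert np.all(np.abs(data - recon) <= 0.01 + 1e-9)  # pointwise bound holds
```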

Approximate Optimal Designs for Multivariate Polynomial Regression

Yohann De Castro , Fabrice Gamboa , Didier Henrion , Roxana Hess , Jean-Bernard Lasserre

Comments: 24 Pages, 5 Figures. arXiv admin note: substantial text overlap with arXiv:1703.01777

Subjects

:

Statistics Theory (math.ST)

; Information Theory (cs.IT); Numerical Analysis (math.NA); Computation (stat.CO); Methodology (stat.ME)

We introduce a new approach aiming at computing approximate optimal designs

for multivariate polynomial regressions on compact (semi-algebraic) design

spaces. We use the moment-sum-of-squares hierarchy of semidefinite programming

problems to solve numerically and approximately the optimal design problem. The

geometry of the design is recovered via semidefinite programming duality

theory. This article shows that the hierarchy converges to the approximate

optimal design as the order of the hierarchy increases. Furthermore, we provide

a dual certificate ensuring finite convergence of the hierarchy and showing

that the approximate optimal design can be computed exactly in polynomial time

thanks to our method. Finite convergence of the hierarchy can also be certified

numerically in a standard way. As a byproduct, we revisit the equivalence

theorem of the experimental design theory: it is linked to the Christoffel

polynomial, and it characterizes finite convergence of the moment-sum-of-squares

hierarchies.
