arXiv Paper Daily: Fri, 5 May 2017

栏目: 编程工具 · 发布时间: 7年前

内容简介：arXiv Paper Daily: Fri, 5 May 2017

Neural and Evolutionary Computing

Evolutionary learning of fire fighting strategies

Martin Kretschmer , Elmar Langetepe Subjects : Neural and Evolutionary Computing (cs.NE)

The dynamic problem of enclosing an expanding fire can be modelled by a

discrete variant in a grid graph. While the fire expands to all neighbouring

cells in any time step, the fire fighter is allowed to block (c) cells in the

average outside the fire in the same time interval. It was shown that the

success of the fire fighter is guaranteed for (c>1.5) but no strategy can

enclose the fire for (cleq 1.5). For achieving such a critical threshold the

correctness (sometimes even optimality) of strategies and lower bounds have

been shown by integer programming or by direct but often very sophisticated

arguments. We investigate the problem whether it is possible to find or to

approach such a threshold and/or optimal strategies by means of evolutionary

algorithms, i.e., we just try to learn successful strategies for different

constants (c) and have a look at the outcome. The main general idea is that

this approach might give some insight in the power of evolutionary strategies

for similar geometrically motivated threshold questions. We investigate the

variant of protecting a highway with still unknown threshold and found

interesting strategic paradigms.

Keywords: Dynamic environments, fire fighting, evolutionary strategies,

threshold approximation

Pixel Normalization from Numeric Data as Input to Neural Networks

Parth Sane , Ravindra Agrawal

Comments: IEEE WiSPNET 2017 conference in Chennai

Subjects

Computer Vision and Pattern Recognition (cs.CV)

; Neural and Evolutionary Computing (cs.NE)

Text to image transformation for input to neural networks requires

intermediate steps. This paper attempts to present a new approach to pixel

normalization so as to convert textual data into image, suitable as input for

neural networks. This method can be further improved by its Graphics Processing

Unit (GPU) implementation to provide significant speedup in computational time.

Computer Vision and Pattern Recognition

Recurrent Soft Attention Model for Common Object Recognition

Liliang Ren , Tong Xiao , Xiaogang Wang Subjects : Computer Vision and Pattern Recognition (cs.CV)

We propose the Recurrent Soft Attention Model, which integrates the visual

attention from the original image to a LSTM memory cell through a down-sample

network. The model recurrently transmits visual attention to the memory cells

for glimpse mask generation, which is a more natural way for attention

integration and exploitation in general object detection and recognition

problem. We test our model under the metric of the top-1 accuracy on the

CIFAR-10 dataset. The experiment shows that our down-sample network and

feedback mechanism plays an effective role among the whole network structure.

Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks

Yifan Liu , Zengchang Qin , Zhenbo Luo , Hua Wang

Comments: 12 pages, 7 figures

Subjects

Computer Vision and Pattern Recognition (cs.CV)

Recently, realistic image generation using deep neural networks has become a

hot topic in machine learning and computer vision. Images can be generated at

the pixel level by learning from a large collection of images. Learning to

generate colorful cartoon images from black-and-white sketches is not only an

interesting research problem, but also a potential application in digital

entertainment. In this paper, we investigate the sketch-to-image synthesis

problem by using conditional generative adversarial networks (cGAN). We propose

the auto-painter model which can automatically generate compatible colors for a

sketch. The new model is not only capable of painting hand-draw sketch with

proper colors, but also allowing users to indicate preferred colors.

Experimental results on two sketch datasets show that the auto-painter performs

better that existing image-to-image methods.

Edge-based Component-Trees for Multi-Channel Image Segmentation

Tobias Böttger , Dominik Gutermuth

Comments: 11 pages, 8 figures

Subjects

Computer Vision and Pattern Recognition (cs.CV)

We introduce the concept of edge-based component-trees for images with an

arbitrary number of channels. The approach is a natural extension of the

classical component-tree devoted to gray-scale images. The similar structure

enables the translation of many gray-level image processing techniques based on

the component-tree to hyperspectral and color images. As an example

application, we present an image segmentation approach that extracts Maximally

Stable Homogeneous Regions (MSHR). The approach very similar to MSER but can be

applied to images with an arbitrary number of channels. As opposed to MSER, our

approach implicitly segments regions with are both lighter and darker than

their background for gray-scale images and can be used in OCR applications

where MSER will fail. We introduce a local flooding-based immersion for the

edge-based component-tree construction which is linear in the number of pixels.

In the experiments, we show that the runtime scales favorably with an

increasing number of channels and may improve algorithms which build on MSER.

Action Tubelet Detector for Spatio-Temporal Action Localization

Vicky Kalogeiton , Philippe Weinzaepfel , Vittorio Ferrari , Cordelia Schmid

Comments: 9 pages, 8 figures

Subjects

Computer Vision and Pattern Recognition (cs.CV)

Current state-of-the-art approaches for spatio-temporal action detection rely

on detections at the frame level that are then linked or tracked across time.

In this paper, we leverage the temporal continuity of videos instead of

operating at the frame level. We propose the ACtion Tubelet detector

(ACT-detector) that takes as input a sequence of frames and outputs tubeletes,

i.e., sequences of bounding boxes with associated scores. The same way

state-of-the-art object detectors rely on anchor boxes, our ACT-detector is

based on anchor cuboids. We build upon the state-of-the-art SSD framework

(Single Shot MultiBox Detector). Convolutional features are extracted for each

frame, while scores and regressions are based on the temporal stacking of these

features, thus exploiting information from a sequence. Our experimental results

show that leveraging sequences of frames significantly improves detection

performance over using individual frames. The gain of our tubelet detector can

be explained by both more relevant scores and more precise localization. Our

ACT-detector outperforms the state of the art methods for frame-mAP and

video-mAP on the J-HMDB and UCF-101 datasets, in particular at high overlap

thresholds.

A Deep Learning Perspective on the Origin of Facial Expressions

Ran Breuer , Ron Kimmel Subjects : Computer Vision and Pattern Recognition (cs.CV)

Facial expressions play a significant role in human communication and

behavior. Psychologists have long studied the relationship between facial

expressions and emotions. Paul Ekman et al., devised the Facial Action Coding

System (FACS) to taxonomize human facial expressions and model their behavior.

The ability to recognize facial expressions automatically, enables novel

applications in fields like human-computer interaction, social gaming, and

psychological research. There has been a tremendously active research in this

field, with several recent papers utilizing convolutional neural networks (CNN)

for feature extraction and inference. In this paper, we employ CNN

understanding methods to study the relation between the features these

computational networks are using, the FACS and Action Units (AU). We verify our

findings on the Extended Cohn-Kanade (CK+), NovaEmotions and FER2013 datasets.

We apply these models to various tasks and tests using transfer learning,

including cross-dataset validation and cross-task performance. Finally, we

exploit the nature of the FER based CNN models for the detection of

micro-expressions and achieve state-of-the-art accuracy using a simple

long-short-term-memory (LSTM) recurrent neural network (RNN).

Pixel Normalization from Numeric Data as Input to Neural Networks

Parth Sane , Ravindra Agrawal

Comments: IEEE WiSPNET 2017 conference in Chennai

Subjects

Computer Vision and Pattern Recognition (cs.CV)

; Neural and Evolutionary Computing (cs.NE)

Text to image transformation for input to neural networks requires

intermediate steps. This paper attempts to present a new approach to pixel

normalization so as to convert textual data into image, suitable as input for

neural networks. This method can be further improved by its Graphics Processing

Unit (GPU) implementation to provide significant speedup in computational time.

From Zero-shot Learning to Conventional Supervised Classification: Unseen Visual Data Synthesis

Yang Long , Li Liu , Ling Shao , Fumin Shen , Guiguang Ding , Jungong Han Subjects : Computer Vision and Pattern Recognition (cs.CV)

Robust object recognition systems usually rely on powerful feature extraction

mechanisms from a large number of real images. However, in many realistic

applications, collecting sufficient images for ever-growing new classes is

unattainable. In this paper, we propose a new Zero-shot learning (ZSL)

framework that can synthesise visual features for unseen classes without

acquiring real images. Using the proposed Unseen Visual Data Synthesis (UVDS)

algorithm, semantic attributes are effectively utilised as an intermediate clue

to synthesise unseen visual features at the training stage. Hereafter, ZSL

recognition is converted into the conventional supervised problem, i.e. the

synthesised visual features can be straightforwardly fed to typical classifiers

such as SVM. On four benchmark datasets, we demonstrate the benefit of using

synthesised unseen data. Extensive experimental results suggest that our

proposed approach significantly improve the state-of-the-art results.

Am I Done? Predicting Action Progress in Videos

Federico Becattini , Tiberio Uricchio , Lamberto Ballan , Lorenzo Seidenari , Alberto Del Bimbo

Comments: Submitted to BMVC 2017

Subjects

Computer Vision and Pattern Recognition (cs.CV)

In this paper we introduce the problem of predicting action progress in

untrimmed videos. We argue that this is an extremely important task because, on

the one hand, it can be valuable for a wide range of applications and, on the

other hand, it facilitates better action detection results. To solve this

problem we introduce a novel approach, named ProgressNet, capable of predicting

when an action takes place in a video, where it is located within the frames,

and how far it has progressed during its execution. Motivated by the recent

success obtained from the interaction of Convolutional and Recurrent Neural

Networks, our model is based on a combination of the well known Faster R-CNN

framework, to make framewise predictions, and LSTM networks, to estimate action

progress through time. After introducing two evaluation protocols for the task

at hand, we demonstrate the capability of our model to effectively predict

action progress on a subset of 11 classes from UCF-101, all of which exhibit

strong temporal structure. Moreover, we show that this leads to

state-of-the-art spatio-temporal localization results.

Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video

Hou-Ning Hu , Yen-Chen Lin , Ming-Yu Liu , Hsien-Tzu Cheng , Yung-Ju Chang , Min Sun

Comments: 13 pages, 8 figures, To appear in CVPR 2017 as an Oral paper. The first two authors contributed equally to this work. this https URL

Subjects

Computer Vision and Pattern Recognition (cs.CV)

; Graphics (cs.GR); Multimedia (cs.MM)

Watching a 360{deg} sports video requires a viewer to continuously select a

viewing angle, either through a sequence of mouse clicks or head movements. To

relieve the viewer from this “360 piloting” task, we propose “deep 360 pilot”

— a deep learning-based agent for piloting through 360{deg} sports videos

automatically. At each frame, the agent observes a panoramic image and has the

knowledge of previously selected viewing angles. The task of the agent is to

shift the current viewing angle (i.e. action) to the next preferred one (i.e.,

goal). We propose to directly learn an online policy of the agent from data. We

use the policy gradient technique to jointly train our pipeline: by minimizing

(1) a regression loss measuring the distance between the selected and ground

truth viewing angles, (2) a smoothness loss encouraging smooth transition in

viewing angle, and (3) maximizing an expected reward of focusing on a

foreground object. To evaluate our method, we build a new 360-Sports video

dataset consisting of five sports domains. We train domain-specific agents and

achieve the best performance on viewing angle selection accuracy and transition

smoothness compared to [51] and other baselines.

Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning

Berkan Demirel , Ramazan Gokberk Cinbis , Nazli Ikizler Cinbis Subjects : Computer Vision and Pattern Recognition (cs.CV)

We propose a novel approach for unsupervised zero-shot learning (ZSL) of

classes based on their names. Most existing unsupervised ZSL methods aim to

learn a model for directly comparing image features and class names. However,

this proves to be a difficult task due to dominance of non-visual semantics in

underlying vector-space embeddings of class names. To address this issue, we

discriminatively learn a word representation such that the similarities between

class and combination of attribute names fall in line with the visual

similarity. Contrary to the traditional zero-shot learning approaches that are

built upon attribute presence, our approach avoids the laborious

attribute-class relation annotations for unseen classes. In addition, our

proposed approach renders text-only training possible, hence, the training can

be augmented without the need to collect additional image data. The

experimental results show that our method yields state-of-the-art results for

unsupervised ZSL in three benchmark datasets.

Generative Convolutional Networks for Latent Fingerprint Reconstruction

Jan Svoboda , Federico Monti , Michael M. Bronstein Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG)

Performance of fingerprint recognition depends heavily on the extraction of

minutiae points. Enhancement of the fingerprint ridge pattern is thus an

essential pre-processing step that noticeably reduces false positive and

negative detection rates. A particularly challenging setting is when the

fingerprint images are corrupted or partially missing. In this work, we apply

generative convolutional networks to denoise visible minutiae and predict the

missing parts of the ridge pattern. The proposed enhancement approach is tested

as a pre-processing step in combination with several standard feature

extraction methods such as MINDTCT, followed by biometric comparison using MCC

and BOZORTH3. We evaluate our method on several publicly available latent

fingerprint datasets captured using different sensors.

VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

Dushyant Mehta , Srinath Sridhar , Oleksandr Sotnychenko , Helge Rhodin , Mohammad Shafiei , Hans-Peter Seidel , Weipeng Xu , Dan Casas , Christian Theobalt

Comments: Accepted to SIGGRAPH 2017

Subjects

Computer Vision and Pattern Recognition (cs.CV)

; Graphics (cs.GR)

We present the first real-time method to capture the full global 3D skeletal

pose of a human in a stable, temporally consistent manner using a single RGB

camera. Our method combines a new convolutional neural network (CNN) based pose

regressor with kinematic skeleton fitting. Our novel fully-convolutional pose

formulation regresses 2D and 3D joint positions jointly in real time and does

not require tightly cropped input frames. A real-time kinematic skeleton

fitting method uses the CNN output to yield temporally stable 3D global pose

reconstructions on the basis of a coherent kinematic skeleton. This makes our

approach the first monocular RGB method usable in real-time applications such

as 3D character control—thus far, the only monocular methods for such

applications employed specialized RGB-D cameras. Our method’s accuracy is

quantitatively on par with the best offline 3D monocular RGB pose estimation

methods. Our results are qualitatively comparable to, and sometimes better

than, results from monocular RGB-D approaches, such as the Kinect. However, we

show that our approach is more broadly applicable than RGB-D solutions, i.e. it

works for outdoor scenes, community videos, and low quality commodity RGB

cameras.

Toward Open Set Face Recognition

Manuel Günther , Steve Cruz , Ethan M. Rudd , Terrance E. Boult Subjects : Computer Vision and Pattern Recognition (cs.CV)

Much research has been conducted on both face identification and face

verification problems, with greater focus on the latter. Research on face

identification has mostly focused on using closed-set protocols, which assume

that all probe images used in evaluation contain identities of subjects that

are enrolled in the gallery. Real systems, however, where only a fraction of

probe sample identities are enrolled in the gallery, cannot make this

closed-set assumption. Instead, they must assume an open set of probe samples

and be able to reject/ignore those that correspond to unknown identities. In

this paper, we address the widespread misconception that thresholding

verification-like scores is sufficient to solve the open-set face

identification problem, by formulating an open-set face identification protocol

and evaluating different strategies for assessing similarity. Our open-set

identification protocol is based on the canonical labeled faces in the wild

(LFW) dataset. We compare three algorithms for assessing similarity in a deep

feature space under an open-set protocol: thresholded verification-like scores,

linear discriminant analysis (LDA) scores, and an extreme value machine (EVM)

output probabilities. Our findings suggest that thresholding similarity

measures that are open-set by design outperforms verification-like score level

thresholding.

Fast k-means based on KNN Graph

Cheng-Hao Deng , Wan-Lei Zhao Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

In the era of big data, k-means clustering has been widely adopted as a basic

processing tool in various contexts. However, its computational cost could be

prohibitively high as the data size and the cluster number are large. It is

well known that the processing bottleneck of k-means lies in the operation of

seeking closest centroid in each iteration. In this paper, a novel solution

towards the scalability issue of k-means is presented. In the proposal, k-means

is supported by an approximate k-nearest neighbors graph. In the k-means

iteration, each data sample is only compared to clusters that its nearest

neighbors reside. Since the number of nearest neighbors we consider is much

less than k, the processing cost in this step becomes minor and irrelevant to

k. The processing bottleneck is therefore overcome. The most interesting thing

is that k-nearest neighbor graph is constructed by iteratively calling the fast

(k)-means itself. Comparing with existing fast k-means variants, the proposed

algorithm achieves hundreds to thousands times speed-up while maintaining high

clustering quality. As it is tested on 10 million 512-dimensional data, it

takes only 5.2 hours to produce 1 million clusters. In contrast, to fulfill the

same scale of clustering, it would take 3 years for traditional k-means.

Artificial Intelligence

Semi-supervised model-based clustering with controlled clusters leakage

Marek Śmieja , Łukasz Struski , Jacek Tabor Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we focus on finding clusters in partially categorized data

sets. We propose a semi-supervised version of Gaussian mixture model, called

C3L, which retrieves natural subgroups of given categories. In contrast to

other semi-supervised models, C3L is parametrized by user-defined leakage

level, which controls maximal inconsistency between initial categorization and

resulting clustering. Our method can be implemented as a module in practical

expert systems to detect clusters, which combine expert knowledge with true

distribution of data. Moreover, it can be used for improving the results of

less flexible clustering techniques, such as projection pursuit clustering. The

paper presents extensive theoretical analysis of the model and fast algorithm

for its efficient optimization. Experimental results show that C3L finds high

quality clustering model, which can be applied in discovering meaningful groups

in partially classified data.

A Reasoning System for a First-Order Logic of Limited Belief

Christoph Schwering

Comments: 22 pages, 0 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Subjects

Artificial Intelligence (cs.AI)

Logics of limited belief aim at enabling computationally feasible reasoning

in highly expressive representation languages. These languages are often

dialects of first-order logic with a weaker form of logical entailment that

keeps reasoning decidable or even tractable. While a number of such logics have

been proposed in the past, they tend to remain for theoretical analysis only

and their practical relevance is very limited. In this paper, we aim to go

beyond the theory. Building on earlier work by Liu, Lakemeyer, and Levesque, we

develop a logic of limited belief that is highly expressive while remaining

decidable in the first-order and tractable in the propositional case and

exhibits some characteristics that make it attractive for an implementation. We

introduce a reasoning system that employs this logic as representation language

and present experimental results that showcase the benefit of limited belief.

Tramp Ship Scheduling Problem with Berth Allocation Considerations and Time-dependent Constraints

Francisco López-Ramos , Armando Guarnaschelli , José-Fernando Camacho-Vallejo , Laura Hervert-Escobar , Rosa G. González-Ramírez

Comments: 16 pages, 3 figures, 5 tables, proceedings paper of Mexican International Conference on Artificial Intelligence (MICAI) 2016

Subjects

Artificial Intelligence (cs.AI)

This work presents a model for the Tramp Ship Scheduling problem including

berth allocation considerations, motivated by a real case of a shipping

company. The aim is to determine the travel schedule for each vessel

considering multiple docking and multiple time windows at the berths. This work

is innovative due to the consideration of both spatial and temporal attributes

during the scheduling process. The resulting model is formulated as a

mixed-integer linear programming problem, and a heuristic method to deal with

multiple vessel schedules is also presented. Numerical experimentation is

performed to highlight the benefits of the proposed approach and the

applicability of the heuristic. Conclusions and recommendations for further

research are provided.

Semi-supervised cross-entropy clustering with information bottleneck constraint

Marek Śmieja , Bernhard C. Geiger Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we propose a semi-supervised clustering method, CEC-IB, that

models data with a set of Gaussian distributions and that retrieves clusters

based on a partial labeling provided by the user (partition-level side

information). By combining the ideas from cross-entropy clustering (CEC) with

those from the information bottleneck method (IB), our method trades between

three conflicting goals: the accuracy with which the data set is modeled, the

simplicity of the model, and the consistency of the clustering with side

information. Experiments demonstrate that CEC-IB has a performance comparable

to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but

is faster, more robust to noisy labels, automatically determines the optimal

number of clusters, and performs well when not all classes are present in the

side information. Moreover, in contrast to other semi-supervised models, it can

be successfully applied in discovering natural subgroups if the partition-level

side information is derived from the top levels of a hierarchical clustering.