arXiv Paper Daily: Fri, 5 May 2017

栏目: 编程工具 · 发布时间: 7年前

内容简介:arXiv Paper Daily: Fri, 5 May 2017

Neural and Evolutionary Computing

Evolutionary learning of fire fighting strategies

Martin Kretschmer , Elmar Langetepe Subjects : Neural and Evolutionary Computing (cs.NE)

The dynamic problem of enclosing an expanding fire can be modelled by a

discrete variant in a grid graph. While the fire expands to all neighbouring

cells in any time step, the fire fighter is allowed to block (c) cells in the

average outside the fire in the same time interval. It was shown that the

success of the fire fighter is guaranteed for (c>1.5) but no strategy can

enclose the fire for (cleq 1.5). For achieving such a critical threshold the

correctness (sometimes even optimality) of strategies and lower bounds have

been shown by integer programming or by direct but often very sophisticated

arguments. We investigate the problem whether it is possible to find or to

approach such a threshold and/or optimal strategies by means of evolutionary

algorithms, i.e., we just try to learn successful strategies for different

constants (c) and have a look at the outcome. The main general idea is that

this approach might give some insight in the power of evolutionary strategies

for similar geometrically motivated threshold questions. We investigate the

variant of protecting a highway with still unknown threshold and found

interesting strategic paradigms.

Keywords: Dynamic environments, fire fighting, evolutionary strategies,

threshold approximation

Pixel Normalization from Numeric Data as Input to Neural Networks

Parth Sane , Ravindra Agrawal

Comments: IEEE WiSPNET 2017 conference in Chennai

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Neural and Evolutionary Computing (cs.NE)

Text to image transformation for input to neural networks requires

intermediate steps. This paper attempts to present a new approach to pixel

normalization so as to convert textual data into image, suitable as input for

neural networks. This method can be further improved by its Graphics Processing

Unit (GPU) implementation to provide significant speedup in computational time.

Computer Vision and Pattern Recognition

Recurrent Soft Attention Model for Common Object Recognition

Liliang Ren , Tong Xiao , Xiaogang Wang Subjects : Computer Vision and Pattern Recognition (cs.CV)

We propose the Recurrent Soft Attention Model, which integrates the visual

attention from the original image to a LSTM memory cell through a down-sample

network. The model recurrently transmits visual attention to the memory cells

for glimpse mask generation, which is a more natural way for attention

integration and exploitation in general object detection and recognition

problem. We test our model under the metric of the top-1 accuracy on the

CIFAR-10 dataset. The experiment shows that our down-sample network and

feedback mechanism plays an effective role among the whole network structure.

Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks

Yifan Liu , Zengchang Qin , Zhenbo Luo , Hua Wang

Comments: 12 pages, 7 figures

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Recently, realistic image generation using deep neural networks has become a

hot topic in machine learning and computer vision. Images can be generated at

the pixel level by learning from a large collection of images. Learning to

generate colorful cartoon images from black-and-white sketches is not only an

interesting research problem, but also a potential application in digital

entertainment. In this paper, we investigate the sketch-to-image synthesis

problem by using conditional generative adversarial networks (cGAN). We propose

the auto-painter model which can automatically generate compatible colors for a

sketch. The new model is not only capable of painting hand-draw sketch with

proper colors, but also allowing users to indicate preferred colors.

Experimental results on two sketch datasets show that the auto-painter performs

better that existing image-to-image methods.

Edge-based Component-Trees for Multi-Channel Image Segmentation

Tobias Böttger , Dominik Gutermuth

Comments: 11 pages, 8 figures

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

We introduce the concept of edge-based component-trees for images with an

arbitrary number of channels. The approach is a natural extension of the

classical component-tree devoted to gray-scale images. The similar structure

enables the translation of many gray-level image processing techniques based on

the component-tree to hyperspectral and color images. As an example

application, we present an image segmentation approach that extracts Maximally

Stable Homogeneous Regions (MSHR). The approach very similar to MSER but can be

applied to images with an arbitrary number of channels. As opposed to MSER, our

approach implicitly segments regions with are both lighter and darker than

their background for gray-scale images and can be used in OCR applications

where MSER will fail. We introduce a local flooding-based immersion for the

edge-based component-tree construction which is linear in the number of pixels.

In the experiments, we show that the runtime scales favorably with an

increasing number of channels and may improve algorithms which build on MSER.

Action Tubelet Detector for Spatio-Temporal Action Localization

Vicky Kalogeiton , Philippe Weinzaepfel , Vittorio Ferrari , Cordelia Schmid

Comments: 9 pages, 8 figures

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Current state-of-the-art approaches for spatio-temporal action detection rely

on detections at the frame level that are then linked or tracked across time.

In this paper, we leverage the temporal continuity of videos instead of

operating at the frame level. We propose the ACtion Tubelet detector

(ACT-detector) that takes as input a sequence of frames and outputs tubeletes,

i.e., sequences of bounding boxes with associated scores. The same way

state-of-the-art object detectors rely on anchor boxes, our ACT-detector is

based on anchor cuboids. We build upon the state-of-the-art SSD framework

(Single Shot MultiBox Detector). Convolutional features are extracted for each

frame, while scores and regressions are based on the temporal stacking of these

features, thus exploiting information from a sequence. Our experimental results

show that leveraging sequences of frames significantly improves detection

performance over using individual frames. The gain of our tubelet detector can

be explained by both more relevant scores and more precise localization. Our

ACT-detector outperforms the state of the art methods for frame-mAP and

video-mAP on the J-HMDB and UCF-101 datasets, in particular at high overlap

thresholds.

A Deep Learning Perspective on the Origin of Facial Expressions

Ran Breuer , Ron Kimmel Subjects : Computer Vision and Pattern Recognition (cs.CV)

Facial expressions play a significant role in human communication and

behavior. Psychologists have long studied the relationship between facial

expressions and emotions. Paul Ekman et al., devised the Facial Action Coding

System (FACS) to taxonomize human facial expressions and model their behavior.

The ability to recognize facial expressions automatically, enables novel

applications in fields like human-computer interaction, social gaming, and

psychological research. There has been a tremendously active research in this

field, with several recent papers utilizing convolutional neural networks (CNN)

for feature extraction and inference. In this paper, we employ CNN

understanding methods to study the relation between the features these

computational networks are using, the FACS and Action Units (AU). We verify our

findings on the Extended Cohn-Kanade (CK+), NovaEmotions and FER2013 datasets.

We apply these models to various tasks and tests using transfer learning,

including cross-dataset validation and cross-task performance. Finally, we

exploit the nature of the FER based CNN models for the detection of

micro-expressions and achieve state-of-the-art accuracy using a simple

long-short-term-memory (LSTM) recurrent neural network (RNN).

Pixel Normalization from Numeric Data as Input to Neural Networks

Parth Sane , Ravindra Agrawal

Comments: IEEE WiSPNET 2017 conference in Chennai

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Neural and Evolutionary Computing (cs.NE)

Text to image transformation for input to neural networks requires

intermediate steps. This paper attempts to present a new approach to pixel

normalization so as to convert textual data into image, suitable as input for

neural networks. This method can be further improved by its Graphics Processing

Unit (GPU) implementation to provide significant speedup in computational time.

From Zero-shot Learning to Conventional Supervised Classification: Unseen Visual Data Synthesis

Yang Long , Li Liu , Ling Shao , Fumin Shen , Guiguang Ding , Jungong Han Subjects : Computer Vision and Pattern Recognition (cs.CV)

Robust object recognition systems usually rely on powerful feature extraction

mechanisms from a large number of real images. However, in many realistic

applications, collecting sufficient images for ever-growing new classes is

unattainable. In this paper, we propose a new Zero-shot learning (ZSL)

framework that can synthesise visual features for unseen classes without

acquiring real images. Using the proposed Unseen Visual Data Synthesis (UVDS)

algorithm, semantic attributes are effectively utilised as an intermediate clue

to synthesise unseen visual features at the training stage. Hereafter, ZSL

recognition is converted into the conventional supervised problem, i.e. the

synthesised visual features can be straightforwardly fed to typical classifiers

such as SVM. On four benchmark datasets, we demonstrate the benefit of using

synthesised unseen data. Extensive experimental results suggest that our

proposed approach significantly improve the state-of-the-art results.

Am I Done? Predicting Action Progress in Videos

Federico Becattini , Tiberio Uricchio , Lamberto Ballan , Lorenzo Seidenari , Alberto Del Bimbo

Comments: Submitted to BMVC 2017

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

In this paper we introduce the problem of predicting action progress in

untrimmed videos. We argue that this is an extremely important task because, on

the one hand, it can be valuable for a wide range of applications and, on the

other hand, it facilitates better action detection results. To solve this

problem we introduce a novel approach, named ProgressNet, capable of predicting

when an action takes place in a video, where it is located within the frames,

and how far it has progressed during its execution. Motivated by the recent

success obtained from the interaction of Convolutional and Recurrent Neural

Networks, our model is based on a combination of the well known Faster R-CNN

framework, to make framewise predictions, and LSTM networks, to estimate action

progress through time. After introducing two evaluation protocols for the task

at hand, we demonstrate the capability of our model to effectively predict

action progress on a subset of 11 classes from UCF-101, all of which exhibit

strong temporal structure. Moreover, we show that this leads to

state-of-the-art spatio-temporal localization results.

Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video

Hou-Ning Hu , Yen-Chen Lin , Ming-Yu Liu , Hsien-Tzu Cheng , Yung-Ju Chang , Min Sun

Comments: 13 pages, 8 figures, To appear in CVPR 2017 as an Oral paper. The first two authors contributed equally to this work. this https URL

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Graphics (cs.GR); Multimedia (cs.MM)

Watching a 360{deg} sports video requires a viewer to continuously select a

viewing angle, either through a sequence of mouse clicks or head movements. To

relieve the viewer from this “360 piloting” task, we propose “deep 360 pilot”

— a deep learning-based agent for piloting through 360{deg} sports videos

automatically. At each frame, the agent observes a panoramic image and has the

knowledge of previously selected viewing angles. The task of the agent is to

shift the current viewing angle (i.e. action) to the next preferred one (i.e.,

goal). We propose to directly learn an online policy of the agent from data. We

use the policy gradient technique to jointly train our pipeline: by minimizing

(1) a regression loss measuring the distance between the selected and ground

truth viewing angles, (2) a smoothness loss encouraging smooth transition in

viewing angle, and (3) maximizing an expected reward of focusing on a

foreground object. To evaluate our method, we build a new 360-Sports video

dataset consisting of five sports domains. We train domain-specific agents and

achieve the best performance on viewing angle selection accuracy and transition

smoothness compared to [51] and other baselines.

Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning

Berkan Demirel , Ramazan Gokberk Cinbis , Nazli Ikizler Cinbis Subjects : Computer Vision and Pattern Recognition (cs.CV)

We propose a novel approach for unsupervised zero-shot learning (ZSL) of

classes based on their names. Most existing unsupervised ZSL methods aim to

learn a model for directly comparing image features and class names. However,

this proves to be a difficult task due to dominance of non-visual semantics in

underlying vector-space embeddings of class names. To address this issue, we

discriminatively learn a word representation such that the similarities between

class and combination of attribute names fall in line with the visual

similarity. Contrary to the traditional zero-shot learning approaches that are

built upon attribute presence, our approach avoids the laborious

attribute-class relation annotations for unseen classes. In addition, our

proposed approach renders text-only training possible, hence, the training can

be augmented without the need to collect additional image data. The

experimental results show that our method yields state-of-the-art results for

unsupervised ZSL in three benchmark datasets.

Generative Convolutional Networks for Latent Fingerprint Reconstruction

Jan Svoboda , Federico Monti , Michael M. Bronstein Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG)

Performance of fingerprint recognition depends heavily on the extraction of

minutiae points. Enhancement of the fingerprint ridge pattern is thus an

essential pre-processing step that noticeably reduces false positive and

negative detection rates. A particularly challenging setting is when the

fingerprint images are corrupted or partially missing. In this work, we apply

generative convolutional networks to denoise visible minutiae and predict the

missing parts of the ridge pattern. The proposed enhancement approach is tested

as a pre-processing step in combination with several standard feature

extraction methods such as MINDTCT, followed by biometric comparison using MCC

and BOZORTH3. We evaluate our method on several publicly available latent

fingerprint datasets captured using different sensors.

VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

Dushyant Mehta , Srinath Sridhar , Oleksandr Sotnychenko , Helge Rhodin , Mohammad Shafiei , Hans-Peter Seidel , Weipeng Xu , Dan Casas , Christian Theobalt

Comments: Accepted to SIGGRAPH 2017

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Graphics (cs.GR)

We present the first real-time method to capture the full global 3D skeletal

pose of a human in a stable, temporally consistent manner using a single RGB

camera. Our method combines a new convolutional neural network (CNN) based pose

regressor with kinematic skeleton fitting. Our novel fully-convolutional pose

formulation regresses 2D and 3D joint positions jointly in real time and does

not require tightly cropped input frames. A real-time kinematic skeleton

fitting method uses the CNN output to yield temporally stable 3D global pose

reconstructions on the basis of a coherent kinematic skeleton. This makes our

approach the first monocular RGB method usable in real-time applications such

as 3D character control—thus far, the only monocular methods for such

applications employed specialized RGB-D cameras. Our method’s accuracy is

quantitatively on par with the best offline 3D monocular RGB pose estimation

methods. Our results are qualitatively comparable to, and sometimes better

than, results from monocular RGB-D approaches, such as the Kinect. However, we

show that our approach is more broadly applicable than RGB-D solutions, i.e. it

works for outdoor scenes, community videos, and low quality commodity RGB

cameras.

Toward Open Set Face Recognition

Manuel Günther , Steve Cruz , Ethan M. Rudd , Terrance E. Boult Subjects : Computer Vision and Pattern Recognition (cs.CV)

Much research has been conducted on both face identification and face

verification problems, with greater focus on the latter. Research on face

identification has mostly focused on using closed-set protocols, which assume

that all probe images used in evaluation contain identities of subjects that

are enrolled in the gallery. Real systems, however, where only a fraction of

probe sample identities are enrolled in the gallery, cannot make this

closed-set assumption. Instead, they must assume an open set of probe samples

and be able to reject/ignore those that correspond to unknown identities. In

this paper, we address the widespread misconception that thresholding

verification-like scores is sufficient to solve the open-set face

identification problem, by formulating an open-set face identification protocol

and evaluating different strategies for assessing similarity. Our open-set

identification protocol is based on the canonical labeled faces in the wild

(LFW) dataset. We compare three algorithms for assessing similarity in a deep

feature space under an open-set protocol: thresholded verification-like scores,

linear discriminant analysis (LDA) scores, and an extreme value machine (EVM)

output probabilities. Our findings suggest that thresholding similarity

measures that are open-set by design outperforms verification-like score level

thresholding.

Fast k-means based on KNN Graph

Cheng-Hao Deng , Wan-Lei Zhao Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

In the era of big data, k-means clustering has been widely adopted as a basic

processing tool in various contexts. However, its computational cost could be

prohibitively high as the data size and the cluster number are large. It is

well known that the processing bottleneck of k-means lies in the operation of

seeking closest centroid in each iteration. In this paper, a novel solution

towards the scalability issue of k-means is presented. In the proposal, k-means

is supported by an approximate k-nearest neighbors graph. In the k-means

iteration, each data sample is only compared to clusters that its nearest

neighbors reside. Since the number of nearest neighbors we consider is much

less than k, the processing cost in this step becomes minor and irrelevant to

k. The processing bottleneck is therefore overcome. The most interesting thing

is that k-nearest neighbor graph is constructed by iteratively calling the fast

(k)-means itself. Comparing with existing fast k-means variants, the proposed

algorithm achieves hundreds to thousands times speed-up while maintaining high

clustering quality. As it is tested on 10 million 512-dimensional data, it

takes only 5.2 hours to produce 1 million clusters. In contrast, to fulfill the

same scale of clustering, it would take 3 years for traditional k-means.

Artificial Intelligence

Semi-supervised model-based clustering with controlled clusters leakage

Marek Śmieja , Łukasz Struski , Jacek Tabor Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we focus on finding clusters in partially categorized data

sets. We propose a semi-supervised version of Gaussian mixture model, called

C3L, which retrieves natural subgroups of given categories. In contrast to

other semi-supervised models, C3L is parametrized by user-defined leakage

level, which controls maximal inconsistency between initial categorization and

resulting clustering. Our method can be implemented as a module in practical

expert systems to detect clusters, which combine expert knowledge with true

distribution of data. Moreover, it can be used for improving the results of

less flexible clustering techniques, such as projection pursuit clustering. The

paper presents extensive theoretical analysis of the model and fast algorithm

for its efficient optimization. Experimental results show that C3L finds high

quality clustering model, which can be applied in discovering meaningful groups

in partially classified data.

A Reasoning System for a First-Order Logic of Limited Belief

Christoph Schwering

Comments: 22 pages, 0 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Subjects

:

Artificial Intelligence (cs.AI)

Logics of limited belief aim at enabling computationally feasible reasoning

in highly expressive representation languages. These languages are often

dialects of first-order logic with a weaker form of logical entailment that

keeps reasoning decidable or even tractable. While a number of such logics have

been proposed in the past, they tend to remain for theoretical analysis only

and their practical relevance is very limited. In this paper, we aim to go

beyond the theory. Building on earlier work by Liu, Lakemeyer, and Levesque, we

develop a logic of limited belief that is highly expressive while remaining

decidable in the first-order and tractable in the propositional case and

exhibits some characteristics that make it attractive for an implementation. We

introduce a reasoning system that employs this logic as representation language

and present experimental results that showcase the benefit of limited belief.

Tramp Ship Scheduling Problem with Berth Allocation Considerations and Time-dependent Constraints

Francisco López-Ramos , Armando Guarnaschelli , José-Fernando Camacho-Vallejo , Laura Hervert-Escobar , Rosa G. González-Ramírez

Comments: 16 pages, 3 figures, 5 tables, proceedings paper of Mexican International Conference on Artificial Intelligence (MICAI) 2016

Subjects

:

Artificial Intelligence (cs.AI)

This work presents a model for the Tramp Ship Scheduling problem including

berth allocation considerations, motivated by a real case of a shipping

company. The aim is to determine the travel schedule for each vessel

considering multiple docking and multiple time windows at the berths. This work

is innovative due to the consideration of both spatial and temporal attributes

during the scheduling process. The resulting model is formulated as a

mixed-integer linear programming problem, and a heuristic method to deal with

multiple vessel schedules is also presented. Numerical experimentation is

performed to highlight the benefits of the proposed approach and the

applicability of the heuristic. Conclusions and recommendations for further

research are provided.

Semi-supervised cross-entropy clustering with information bottleneck constraint

Marek Śmieja , Bernhard C. Geiger Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we propose a semi-supervised clustering method, CEC-IB, that

models data with a set of Gaussian distributions and that retrieves clusters

based on a partial labeling provided by the user (partition-level side

information). By combining the ideas from cross-entropy clustering (CEC) with

those from the information bottleneck method (IB), our method trades between

three conflicting goals: the accuracy with which the data set is modeled, the

simplicity of the model, and the consistency of the clustering with side

information. Experiments demonstrate that CEC-IB has a performance comparable

to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but

is faster, more robust to noisy labels, automatically determines the optimal

number of clusters, and performs well when not all classes are present in the

side information. Moreover, in contrast to other semi-supervised models, it can

be successfully applied in discovering natural subgroups if the partition-level

side information is derived from the top levels of a hierarchical clustering.

Fast k-means based on KNN Graph

Cheng-Hao Deng , Wan-Lei Zhao Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

In the era of big data, k-means clustering has been widely adopted as a basic

processing tool in various contexts. However, its computational cost could be

prohibitively high as the data size and the cluster number are large. It is

well known that the processing bottleneck of k-means lies in the operation of

seeking closest centroid in each iteration. In this paper, a novel solution

towards the scalability issue of k-means is presented. In the proposal, k-means

is supported by an approximate k-nearest neighbors graph. In the k-means

iteration, each data sample is only compared to clusters that its nearest

neighbors reside. Since the number of nearest neighbors we consider is much

less than k, the processing cost in this step becomes minor and irrelevant to

k. The processing bottleneck is therefore overcome. The most interesting thing

is that k-nearest neighbor graph is constructed by iteratively calling the fast

(k)-means itself. Comparing with existing fast k-means variants, the proposed

algorithm achieves hundreds to thousands times speed-up while maintaining high

clustering quality. As it is tested on 10 million 512-dimensional data, it

takes only 5.2 hours to produce 1 million clusters. In contrast, to fulfill the

same scale of clustering, it would take 3 years for traditional k-means.

Of the People: Voting Is More Effective with Representative Candidates

Yu Cheng , Shaddin Dughmi , David Kempe Subjects : Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI)

In light of the classic impossibility results of Arrow and Gibbard and

Satterthwaite regarding voting with ordinal rules, there has been recent

interest in characterizing how well common voting rules approximate the social

optimum. In order to quantify the quality of approximation, it is natural to

consider the candidates and voters as embedded within a common metric space,

and to ask how much further the chosen candidate is from the population as

compared to the socially optimal one. We use this metric preference model to

explore a fundamental and timely question: does the social welfare of a

population improve when candidates are representative of the population? If so,

then by how much, and how does the answer depend on the complexity of the

metric space?

We restrict attention to the most fundamental and common social choice

setting: a population of voters, two independently drawn candidates, and a

majority rule election. When candidates are not representative of the

population, it is known that the candidate selected by the majority rule can be

thrice as far from the population as the socially optimal one. We examine how

this ratio improves when candidates are drawn independently from the population

of voters. Our results are two-fold: When the metric is a line, the ratio

improves from 3 to 4-2 sqrt{2}, roughly 1.1716; this bound is tight. When the

metric is arbitrary, we show a lower bound of 1.5 and a constant upper bound

strictly better than 2 on the approximation ratio of the majority rule.

The positive result depends in part on the assumption that candidates are

independent and identically distributed. However, we show that independence

alone is not enough to achieve the upper bound: even when candidates are drawn

independently, if the population of candidates can be different from the

voters, then an upper bound of 2 on the approximation is tight.

Gait Pattern Recognition Using Accelerometers

Vahid Alizadeh

Comments: 6 pages, project report

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Artificial Intelligence (cs.AI)

Motion ability is one of the most important human properties, including gait

as a basis of human transitional movement. Gait, as a biometric for recognizing

human identities, can be non-intrusively captured signals using wearable or

portable smart devices. In this study gait patterns is collected using a

wireless platform of two sensors located at chest and right ankle of the

subjects. Then the raw data has undergone some preprocessing methods and

segmented into 5 seconds windows. Some time and frequency domain features is

extracted and the performance evaluated by 5 different classifiers. Decision

Tree (with all features) and K-Nearest Neighbors (with 10 selected features)

classifiers reached 99.4% and 100% respectively.

Computation and Language

A Finite State and Rule-based Akshara to Prosodeme (A2P) Converter in Hindi

Somnath Roy

Comments: If you need software (A2P Converter), you have to write for the same at “somnathroy86@gmail.com” or “somnat75_llh@jnu.ac.in”

Subjects

:

Computation and Language (cs.CL)

This article describes a software module called Akshara to Prosodeme (A2P)

converter in Hindi. It converts an input grapheme into prosedeme (sequence of

phonemes with the specification of syllable boundaries and prosodic labels).

The software is based on two proposed finite state machines extemdash one for

the syllabification and another for the syllable labeling. In addition to that,

it also uses a set of nonlinear phonological rules proposed for foot formation

in Hindi, which encompass solutions to schwa-deletion in simple, compound,

derived and inflected words. The nonlinear phonological rules are based on

metrical phonology with the provision of recursive foot structure. A software

module is implemented in Python. The testing of the software for

syllabification, syllable labeling, schwa deletion and prosodic labeling yield

an accuracy of more than 99% on a lexicon of size 28664 words.

Probabilistic Typology: Deep Generative Models of Vowel Inventories

Ryan Cotterell , Jason Eisner

Comments: ACL 2017

Subjects

:

Computation and Language (cs.CL)

Linguistic typology studies the range of structures present in human

language. The main goal of the field is to discover which sets of possible

phenomena are universal, and which are merely frequent. For example, all

languages have vowels, while most—but not all—languages have an /u/ sound.

In this paper we present the first probabilistic treatment of a basic question

in phonological typology: What makes a natural vowel inventory? We introduce a

series of deep stochastic point processes, and contrast them with previous

computational, simulation-based approaches. We provide a comprehensive suite of

experiments on over 200 distinct languages.

Distributed, Parallel, and Cluster Computing

Execution Templates: Caching Control Plane Decisions for Strong Scaling of Data Analytics

Omid Mashayekhi , Hang Qu , Chinmayee Shah , Philip Levis

Comments: To appear at USENIX ATC 2017

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

Control planes of cloud frameworks trade off between scheduling granularity

and performance. Centralized systems schedule at task granularity, but only

schedule a few thousand tasks per second. Distributed systems schedule hundreds

of thousands of tasks per second but changing the schedule is costly.

We present execution templates, a control plane abstraction that can schedule

hundreds of thousands of tasks per second while supporting fine-grained,

per-task scheduling decisions. Execution templates leverage a program’s

repetitive control flow to cache blocks of frequently-executed tasks. Executing

a task in a template requires sending a single message. Large-scale scheduling

changes install new templates, while small changes apply edits to existing

templates.

Evaluations of execution templates in Nimbus, a data analytics framework,

find that they provide the fine-grained scheduling flexibility of centralized

control planes while matching the strong scaling of distributed ones. Execution

templates support complex, real-world applications, such as a fluid simulation

with a triply nested loop and data dependent branches.

Learning

Fast k-means based on KNN Graph

Cheng-Hao Deng , Wan-Lei Zhao Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

In the era of big data, k-means clustering has been widely adopted as a basic

processing tool in various contexts. However, its computational cost could be

prohibitively high as the data size and the cluster number are large. It is

well known that the processing bottleneck of k-means lies in the operation of

seeking closest centroid in each iteration. In this paper, a novel solution

towards the scalability issue of k-means is presented. In the proposal, k-means

is supported by an approximate k-nearest neighbors graph. In the k-means

iteration, each data sample is only compared to clusters that its nearest

neighbors reside. Since the number of nearest neighbors we consider is much

less than k, the processing cost in this step becomes minor and irrelevant to

k. The processing bottleneck is therefore overcome. The most interesting thing

is that k-nearest neighbor graph is constructed by iteratively calling the fast

(k)-means itself. Comparing with existing fast k-means variants, the proposed

algorithm achieves hundreds to thousands times speed-up while maintaining high

clustering quality. As it is tested on 10 million 512-dimensional data, it

takes only 5.2 hours to produce 1 million clusters. In contrast, to fulfill the

same scale of clustering, it would take 3 years for traditional k-means.

Optimal Approximation with Sparsely Connected Deep Neural Networks

Helmut Bölcskei , Philipp Grohs , Gitta Kutyniok , Philipp Petersen Subjects : Learning (cs.LG) ; Functional Analysis (math.FA)

We derive fundamental lower bounds on the connectivity and the memory

requirements of deep neural networks guaranteeing uniform approximation rates

for arbitrary function classes in (L^2(mathbb{R}^d)). In other words, we

establish a connection between the complexity of a function class and the

complexity of deep neural networks approximating functions from this class to

within a prescribed accuracy.

Additionally, we prove that our lower bounds are achievable for a broad

family of function classes. Specifically, all function classes that are

optimally approximated by a general class of representation systems—so-called

emph{affine systems}—can be approximated by deep neural networks with

minimal connectivity and memory requirements. Affine systems encompass a wealth

of representation systems from applied harmonic analysis such as wavelets,

ridgelets, curvelets, shearlets, (alpha)-shearlets, and more generally

(alpha)-molecules. This result elucidates a remarkable universality property

of neural networks and shows that they achieve the optimum approximation

properties of all affine systems combined. As a specific example, we consider

the class of (1/alpha)-cartoon-like functions, which is approximated optimally

by (alpha)-shearlets.

We also explain how our results can be extended to the case of functions on

low-dimensional immersed manifolds.

Finally, we present numerical experiments demonstrating that the standard

stochastic gradient descent algorithm generates deep neural networks providing

close-to-optimal approximation rates at minimal connectivity. Moreover, these

results show that stochastic gradient descent actually learns approximations

that are sparse in the representation systems optimally sparsifying the

function class the network is trained on.

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

Minsoo Rhu , Mike O'Connor , Niladrish Chatterjee , Jeff Pool , Stephen W. Keckler Subjects : Learning (cs.LG) ; Hardware Architecture (cs.AR)

Popular deep learning frameworks require users to fine-tune their memory

usage so that the training data of a deep neural network (DNN) fits within the

GPU physical memory. Prior work tries to address this restriction by

virtualizing the memory usage of DNNs, enabling both CPU and GPU memory to be

utilized for memory allocations. Despite its merits, virtualizing memory can

incur significant performance overheads when the time needed to copy data back

and forth from CPU memory is higher than the latency to perform the

computations required for DNN forward and backward propagation. We introduce a

high-performance virtualization strategy based on a “compressing DMA engine”

(cDMA) that drastically reduces the size of the data structures that are

targeted for CPU-side allocations. The cDMA engine offers an average 2.6x

(maximum 13.8x) compression ratio by exploiting the sparsity inherent in

offloaded data, improving the performance of virtualized DNNs by an average 32%

(maximum 61%).

Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels

Curtis G. Northcutt , Tailin Wu , Isaac L. Chuang Subjects : Machine Learning (stat.ML) ; Learning (cs.LG)

Noisy PN learning is the problem of binary classification when training

examples may be mislabeled (flipped) uniformly with noise rate rho1 for

positive examples and rho0 for negative examples. We propose Rank Pruning (RP)

to solve noisy PN learning and the open problem of estimating the noise rates.

Unlike prior solutions, RP is efficient and general, requiring O(T) for any

unrestricted choice of probabilistic classifier with T fitting time. We prove

RP achieves consistent noise estimation and equivalent empirical risk as

learning with uncorrupted labels in ideal conditions, and derive closed-form

solutions when conditions are non-ideal. RP achieves state-of-the-art noise

rate estimation and F1, error, and AUC-PR on the MNIST and CIFAR datasets,

regardless of noise rates. To highlight, RP with a CNN classifier can predict

if a MNIST digit is a “1” or “not 1” with only 0.25% error, and 0.46% error

across all digits, even when 50% of positive examples are mislabeled and 50% of

observed positive labels are mislabeled negative examples.

Semi-supervised model-based clustering with controlled clusters leakage

Marek Śmieja , Łukasz Struski , Jacek Tabor Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we focus on finding clusters in partially categorized data

sets. We propose a semi-supervised version of Gaussian mixture model, called

C3L, which retrieves natural subgroups of given categories. In contrast to

other semi-supervised models, C3L is parametrized by user-defined leakage

level, which controls maximal inconsistency between initial categorization and

resulting clustering. Our method can be implemented as a module in practical

expert systems to detect clusters, which combine expert knowledge with true

distribution of data. Moreover, it can be used for improving the results of

less flexible clustering techniques, such as projection pursuit clustering. The

paper presents extensive theoretical analysis of the model and fast algorithm

for its efficient optimization. Experimental results show that C3L finds high

quality clustering model, which can be applied in discovering meaningful groups

in partially classified data.

Near-optimal linear decision trees for k-SUM and related problems

Daniel M. Kane , Shachar Lovett , Shay Moran

Comments: 18 paged, 1 figure

Subjects

:

Computational Geometry (cs.CG)

; Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Learning (cs.LG); Combinatorics (math.CO)

We construct near optimal linear decision trees for a variety of decision

problems in combinatorics and discrete geometry. For example, for any constant

(k), we construct linear decision trees that solve the (k)-SUM problem on (n)

elements using (O(n log^2 n)) linear queries. Moreover, the queries we use are

comparison queries, which compare the sums of two (k)-subsets; when viewed as

linear queries, comparison queries are (2k)-sparse and have only ({-1,0,1})

coefficients. We give similar constructions for sorting sumsets (A+B) and for

solving the SUBSET-SUM problem, both with optimal number of queries, up to

poly-logarithmic terms.

Our constructions are based on the notion of “inference dimension”, recently

introduced by the authors in the context of active classification with

comparison queries. This can be viewed as another contribution to the fruitful

link between machine learning and discrete geometry, which goes back to the

discovery of the VC dimension.

Semi-Supervised AUC Optimization based on Positive-Unlabeled Learning

Tomoya Sakai , Gang Niu , Masashi Sugiyama Subjects : Machine Learning (stat.ML) ; Learning (cs.LG)

Maximizing the area under the receiver operating characteristic curve (AUC)

is a standard approach to imbalanced classification. So far, various supervised

AUC optimization methods have been developed and they are also extended to

semi-supervised scenarios to cope with small sample problems. However, existing

semi-supervised AUC optimization methods rely on strong distributional

assumptions, which are rarely satisfied in real-world problems. In this paper,

we propose a novel semi-supervised AUC optimization method that does not

require such restrictive assumptions. We first develop an AUC optimization

method based only on positive and unlabeled data (PU-AUC) and then extend it to

semi-supervised learning by combining it with a supervised AUC optimization

method. We theoretically prove that, without the restrictive distributional

assumptions, unlabeled data contribute to improving the generalization

performance in PU and semi-supervised AUC optimization methods. Finally, we

demonstrate the practical usefulness of the proposed methods through

experiments.

Generative Convolutional Networks for Latent Fingerprint Reconstruction

Jan Svoboda , Federico Monti , Michael M. Bronstein Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG)

Performance of fingerprint recognition depends heavily on the extraction of

minutiae points. Enhancement of the fingerprint ridge pattern is thus an

essential pre-processing step that noticeably reduces false positive and

negative detection rates. A particularly challenging setting is when the

fingerprint images are corrupted or partially missing. In this work, we apply

generative convolutional networks to denoise visible minutiae and predict the

missing parts of the ridge pattern. The proposed enhancement approach is tested

as a pre-processing step in combination with several standard feature

extraction methods such as MINDTCT, followed by biometric comparison using MCC

and BOZORTH3. We evaluate our method on several publicly available latent

fingerprint datasets captured using different sensors.

Semi-supervised cross-entropy clustering with information bottleneck constraint

Marek Śmieja , Bernhard C. Geiger Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we propose a semi-supervised clustering method, CEC-IB, that

models data with a set of Gaussian distributions and that retrieves clusters

based on a partial labeling provided by the user (partition-level side

information). By combining the ideas from cross-entropy clustering (CEC) with

those from the information bottleneck method (IB), our method trades between

three conflicting goals: the accuracy with which the data set is modeled, the

simplicity of the model, and the consistency of the clustering with side

information. Experiments demonstrate that CEC-IB has a performance comparable

to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but

is faster, more robust to noisy labels, automatically determines the optimal

number of clusters, and performs well when not all classes are present in the

side information. Moreover, in contrast to other semi-supervised models, it can

be successfully applied in discovering natural subgroups if the partition-level

side information is derived from the top levels of a hierarchical clustering.

Information Theory

Blind Detection with Polar Codes

Carlo Condo , Seyyed Ali Hashemi , Warren J. Gross Subjects : Information Theory (cs.IT)

In blind detection, a set of candidates has to be decoded within a strict

time constraint, to identify which transmissions are directed at the user

equipment. Blind detection is an operation required by the 3GPP

LTE/LTE-Advanced standard, and it will be required in the 5th generation

wireless communication standard (5G) as well. We propose a blind detection

scheme based on polar codes, where the radio network temporary identifier

(RNTI) is transmitted instead of some of the frozen bits. A low-complexity

decoding stage decodes all candidates, selecting a subset that is decoded by a

high-performance algorithm. Simulations results show good missed detection and

false alarm rates, that meet the system specifications. We also propose an

early stopping criterion for the second decoding stage that can reduce the

number of operations performed, improving both average latency and energy

consumption. The detection speed is analyzed and different system parameter

combinations are shown to meet the stringent timing requirements, leading to

various implementation trade-offs.

PER Approximation for Cross-Layer Optimization under Reliability and Energy Constraints

Aamir Mahmood , Mikael Gidlund , M M Aftab Hossain Subjects : Information Theory (cs.IT)

The vision of connecting billions of battery operated devices to be used for

diverse emerging applications calls for a wireless communication system that

can support stringent reliability and latency requirements. Both reliability

and energy efficiency are critical for many of these applications that involve

communication with short packets which undermine the coding gain achievable

from large packets. In this paper, we first revisit the packet error rate (PER)

performance of uncoded schemes in block fading channels and derive a simple and

accurate PER expression. Specifically, we show that the waterfall threshold in

the PER upper bound in Nakagami-(m) block fading channels is tightly

approximated by the (m)-th moment of an asymptotic distribution of PER in AWGN

channel. This PER expression gives an explicit connection between the

parameters of both the physical and link layers and the PER. We utilize this

connection for cross-layer design and optimization of communication links. To

this end, we optimize signal-to-noise ratio (SNR) and modulation order at

physical layer, and the packet length and number of retransmissions at link

layer with respect to distance under the prescribed delay and reliability

constraint.

On the Design of Matched Filters for Molecule Counting Receivers

Vahid Jamali , Arman Ahmadzadeh , Robert Schober

Comments: To appear in IEEE Communications Letter

Subjects

:

Information Theory (cs.IT)

In this paper, we design matched filters for diffusive molecular

communication systems taking into account the following impairments:

signal-dependent diffusion noise, inter-symbol interference (ISI), and external

interfering molecules. The receiver counts the number of observed molecules

several times within one symbol interval and employs linear filtering to detect

the transmitted data. We derive the optimal matched filter by maximizing the

expected signal-to-interference-plus-noise ratio of the decision variable.

Moreover, we show that for the special case of an ISI-free channel, the matched

filter reduces to a simple sum detector and a correlator for the channel

impulse response for the diffusion noise-limited and (external)

interference-limited regimes, respectively. Our simulation results reveal that

the proposed matched filter considerably outperforms the benchmark schemes

available in literature, especially when ISI is severe.

Wireless Channel Modeling Perspectives for Ultra-Reliable Communications

Patrick C. F. Eggers , Petar Popovski

Comments: Submitted to IEEE Transactions on Wireless Communications

Subjects

:

Information Theory (cs.IT)

; Networking and Internet Architecture (cs.NI)

Ultra-Reliable Communication (URC) is one of the distinctive features of the

upcoming 5G wireless communication. The level of reliability, going down to

packet error rates (PER) of (10^{-9}), should be sufficiently convincing in

order to remove cables in an industrial setting or provide remote control of

robots with mission-critical function. In this paper we present elements of

physical and statistical modeling of the wireless channel that are relevant for

characterization of the lower tail of the channel Cumulative Distribution

Function (CDF). There are channel models, such as Two-Wave with Diffuse Power

(TWDP) or Suzuki, where finding the full CDF is not tractable. We show that,

for a wide range of channel models, the outage probability at URC levels can be

approximated by a simple expression, whose exponent depends on the actual

channel model. Furthermore, it is seen that the two-wave model leads to

pessimistic predictions of the fading in the region of ultra-reliable

communications, while the CDFs of models that contain diffuse components have

slopes that correspond to the slope of a Rayleigh fading. We provide analysis

of the receive antenna diversity schemes for URC-relevant statistics and obtain

a new expression for Maximum Ratio Combining (MRC) in Weibull channels.

3GPP-inspired HetNet Model using Poisson Cluster Process: Sum-product Functionals and Downlink Coverage

Chiranjib Saha , Mehrnaz Afshang , Harpreet S. Dhillon

Comments: Submitted to IEEE Transactions on Communications. A part of this paper appeared in 2017 ITA Workshop. It is available at arXiv:1702.05706

Subjects

:

Information Theory (cs.IT)

; Networking and Internet Architecture (cs.NI)

The growing complexity of heterogeneous cellular networks (HetNets) has

necessitated a variety of user and base station (BS) configurations to be

considered for realistic performance evaluation and system design. This is

directly reflected in the HetNet simulation models proposed by standardization

bodies, such as the third generation partnership project (3GPP). Complementary

to these simulation models, stochastic geometry-based approach, modeling the

locations of the users and the K tiers of BSs as independent and homogeneous

Poisson point processes (PPPs), has gained prominence in the past few years.

Despite its success in revealing useful insights, this PPP-based K-tier HetNet

model is not rich enough to capture spatial coupling between user and BS

locations that exists in real-world HetNet deployments and is included in 3GPP

simulation models. In this paper, we demonstrate that modeling a fraction of

users and arbitrary number of BS tiers alternatively with a Poisson cluster

process (PCP) captures the aforementioned coupling, thus bridging the gap

between the 3GPP simulation models and the PPP-based analytic model for

HetNets. We further show that the downlink coverage probability of a typical

user under maximum signal-to-interference-ratio association can be expressed in

terms of the sum-product functionals over PPP, PCP, and its associated

offspring point process, which are all characterized as a part of our analysis.

We also show that the proposed model converges to the PPP-based HetNet model as

the cluster size of the PCPs tends to infinity. Finally, we specialize our

analysis based on general PCPs for Thomas and Matern cluster processes. Special

instances of the proposed model closely resemble the different configurations

for BS and user locations considered in 3GPP simulations.

State-Dependent Gaussian Multiple Access Channels: New Outer Bounds and Capacity Results

Wei Yang , Yingbin Liang , Shlomo Shamai (Shitz), H. Vincent Poor

Comments: The material of this paper will be presented in part at the 2017 International Symposium on Information Theory (ISIT)

Subjects

:

Information Theory (cs.IT)

This paper studies a two-user state-dependent Gaussian multiple-access

channel (MAC) with state noncausally known at one encoder. Two scenarios are

considered: i) each user wishes to communicate an independent message to the

common receiver, and ii) the two encoders send a common message to the receiver

and the non-cognitive encoder (i.e., the encoder that does not know the state)

sends an independent individual message (this model is also known as the MAC

with degraded message sets). For both scenarios, new outer bounds on the

capacity region are derived, which improve uniformly over the best known outer

bounds. In the first scenario, the two corner points of the capacity region as

well as the sum rate capacity are established, and it is shown that a

single-letter solution is adequate to achieve both the corner points and the

sum rate capacity. Furthermore, the full capacity region is characterized in

situations in which the sum rate capacity is equal to the capacity of the

helper problem. The proof exploits the optimal-transportation idea of

Polyanskiy and Wu (which was used previously to establish an outer bound on the

capacity region of the interference channel) and the worst-case Gaussian noise

result for the case in which the input and the noise are dependent.

Capacity of Burst Noise-Erasure Channels With and Without Feedback and Input Cost

Lin Song , Fady Alajaji , Tamás Linder

Comments: Parts of this work will be presented at the 2017 IEEE International Symposium on Information Theory

Subjects

:

Information Theory (cs.IT)

A class of burst noise-erasure channels which incorporate both errors and

erasures during transmission is studied. The channel, whose output is

explicitly expressed in terms of its input and a stationary ergodic

noise-erasure process, is shown to have a so-called “quasi-symmetry” property

under certain invertibility conditions. As a result, it is proved that a

uniformly distributed input process maximizes the channel’s block mutual

information, resulting in a closed-form formula for its non-feedback capacity

in terms of the noise-erasure entropy rate and the entropy rate of an auxiliary

erasure process. The feedback channel capacity is also characterized, showing

that feedback does not increase capacity and generalizing prior related

results. The capacity-cost function of the channel with and without feedback is

also investigated. A sequence of finite-letter upper bounds for the

capacity-cost function without feedback is derived. Finite-letter lower bonds

for the capacity-cost function with feedback are obtained using a specific

encoding rule. Based on these bounds, it is demonstrated both numerically and

analytically that feedback can increase the capacity-cost function for a class

of channels with Markov noise-erasure processes.

Fourth-order Tensors with Multidimensional Discrete Transforms

Xiao-Yang Liu , Xiaodong Wang Subjects : Numerical Analysis (cs.NA) ; Information Theory (cs.IT)

The big data era is swamping areas including data analysis, machine/deep

learning, signal processing, statistics, scientific computing, and cloud

computing. The multidimensional feature and huge volume of big data put urgent

requirements to the development of multilinear modeling tools and efficient

algorithms. In this paper, we build a novel multilinear tensor space that

supports useful algorithms such as SVD and QR, while generalizing the matrix

space to fourth-order tensors was believed to be challenging. Specifically,

given any multidimensional discrete transform, we show that fourth-order

tensors are bilinear operators on a space of matrices. First, we take a

transform-based approach to construct a new tensor space by defining a new

multiplication operation and tensor products, and accordingly the analogous

concepts: identity, inverse, transpose, linear combinations, and orthogonality.

Secondly, we define the (mathcal{L})-SVD for fourth-order tensors and present

an efficient algorithm, where the tensor case requires a stronger condition for

unique decomposition than the matrix case. Thirdly, we define the tensor

(mathcal{L})-QR decomposition and propose a Householder QR algorithm to avoid

the catastrophic cancellation problem associated with the conventional

Gram-Schmidt process. Finally, we validate our schemes on video compression and

one-shot face recognition. For video compression, compared with the existing

tSVD, the proposed (mathcal{L})-SVD achieves (3sim 10)dB gains in RSE, while

the running time is reduced by about (50\%) and (87.5\%), respectively. For

one-shot face recognition, the recognition rate is increased by about (10\%

sim 20\%).

欢迎加入我爱机器学习QQ11群:191401275

arXiv Paper Daily: Fri, 5 May 2017

微信扫一扫,关注我爱机器学习公众号

微博:我爱机器学习


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

计算机图形学原理及实践:C语言描述(原书第2版) (平装)

计算机图形学原理及实践:C语言描述(原书第2版) (平装)

福利 / 唐泽圣 / 机械工业出版社 / 2004-3 / 95.0

《计算机图形学原理及实践:C语言描述(原书第2版)》:这是计算机图形学领域的一部经典之作,作者Fley、va Dam等是国际图形学界的著名学者、学术带头人,而且《计算机图形学原理及实践:C语言描述(原书第2版)》英文版自出版以来,一直是各国大学计算机图形学课程的主要教科书。来自清华大学、北京大学、中国科学院计算技术研究所、中国科学院软件研究所的多位图形学领域的专家和精英花费了大量的时间和精力进行翻......一起来看看 《计算机图形学原理及实践:C语言描述(原书第2版) (平装)》 这本书的介绍吧!

HTML 压缩/解压工具
HTML 压缩/解压工具

在线压缩/解压 HTML 代码

随机密码生成器
随机密码生成器

多种字符组合密码

XML 在线格式化
XML 在线格式化

在线 XML 格式化压缩工具