内容简介:arXiv Paper Daily: Mon, 19 Jun 2017
Neural and Evolutionary Computing
The Evolution of Neural Network-Based Chart Patterns: A Preliminary Study
Comments: 8 pages, In proceedings of Genetic and Evolutionary Computation Conference (GECCO 2017), Berlin, Germany
Subjects:
Neural and Evolutionary Computing (cs.NE)
A neural network-based chart pattern represents adaptive parametric features,
including non-linear transformations, and a template that can be applied in the
feature space. The search of neural network-based chart patterns has been
unexplored despite its potential expressiveness. In this paper, we formulate a
general chart pattern search problem to enable cross-representational
quantitative comparison of various search schemes. We suggest a HyperNEAT
framework applying state-of-the-art deep neural network techniques to find
attractive neural network-based chart patterns; These techniques enable a fast
evaluation and search of robust patterns, as well as bringing a performance
gain. The proposed framework successfully found attractive patterns on the
Korean stock market. We compared newly found patterns with those found by
different search schemes, showing the proposed approach has potential.
Evaluating Noisy Optimisation Algorithms: First Hitting Time is Problematic
Comments: 4 pages, 4 figurs, 1 table
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI)
A key part of any evolutionary algorithm is fitness evaluation. When fitness
evaluations are corrupted by noise, as happens in many real-world problems as a
consequence of various types of uncertainty, a strategy is needed in order to
cope with this. Resampling is one of the most common strategies, whereby each
solution is evaluated many times in order to reduce the variance of the fitness
estimates. When evaluating the performance of a noisy optimisation algorithm, a
key consideration is the stopping condition for the algorithm. A frequently
used stopping condition in runtime analysis, known as “First Hitting Time”, is
to stop the algorithm as soon as it encounters the optimal solution. However,
this is unrealistic for real-world problems, as if the optimal solution were
already known, there would be no need to search for it. This paper argues that
the use of First Hitting Time, despite being a commonly used approach, is
significantly flawed and overestimates the quality of many algorithms in
real-world cases, where the optimum is not known in advance and has to be
genuinely searched for. A better alternative is to measure the quality of the
solution an algorithm returns after a fixed evaluation budget, i.e., to focus
on final solution quality. This paper argues that focussing on final solution
quality is more realistic and demonstrates cases where the results produced by
each algorithm evaluation method lead to very different conclusions regarding
the quality of each noisy optimisation algorithm.
Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder
Comments: Accepted to Rep4NLP 2017 Workshop at ACL 2017 Conference
Subjects:
Computation and Language (cs.CL)
; Neural and Evolutionary Computing (cs.NE)
We investigate the integration of a planning mechanism into an
encoder-decoder architecture with attention for character-level machine
translation. We develop a model that plans ahead when it computes alignments
between the source and target sequences, constructing a matrix of proposed
future alignments and a commitment vector that governs whether to follow or
recompute the plan. This mechanism is inspired by the strategic attentive
reader and writer (STRAW) model. Our proposed model is end-to-end trainable
with fully differentiable operations. We show that it outperforms a strong
baseline on three character-level decoder neural machine translation on WMT’15
corpus. Our analysis demonstrates that our model can compute qualitatively
intuitive alignments and achieves superior performance with fewer parameters.
Gradient Descent for Spiking Neural Networks
Dongsung Huh , Terrence J. Sejnowski Subjects : Neurons and Cognition (q-bio.NC) ; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Much of studies on neural computation are based on network models of static
neurons that produce analog output, despite the fact that information
processing in the brain is predominantly carried out by dynamic neurons that
produce discrete pulses called spikes. Research in spike-based computation has
been impeded by the lack of efficient supervised learning algorithm for spiking
networks. Here, we present a gradient descent method for optimizing spiking
network models by introducing a differentiable formulation of spiking networks
and deriving the exact gradient calculation. For demonstration, we trained
recurrent spiking networks on two dynamic tasks: one that requires optimizing
fast (~millisecond) spike-based interactions for efficient encoding of
information, and a delayed memory XOR task over extended duration (~second).
The results show that our method indeed optimizes the spiking network dynamics
on the time scale of individual spikes as well as behavioral time scales. In
conclusion, our result offers a general purpose supervised learning algorithm
for spiking neural networks, thus advancing further investigations on
spike-based computation.
Computer Vision and Pattern Recognition
Perceptual Generative Adversarial Networks for Small Object Detection
Jianan Li , Xiaodan Liang , Yunchao Wei , Tingfa Xu , Jiashi Feng , Shuicheng Yan Subjects : Computer Vision and Pattern Recognition (cs.CV)
Detecting small objects is notoriously challenging due to their low
resolution and noisy representation. Existing object detection pipelines
usually detect small objects through learning representations of all the
objects at multiple scales. However, the performance gain of such ad hoc
architectures is usually limited to pay off the computational cost. In this
work, we address the small object detection problem by developing a single
architecture that internally lifts representations of small objects to
“super-resolved” ones, achieving similar characteristics as large objects and
thus more discriminative for detection. For this purpose, we propose a new
Perceptual Generative Adversarial Network (Perceptual GAN) model that improves
small object detection through narrowing representation difference of small
objects from the large ones. Specifically, its generator learns to transfer
perceived poor representations of the small objects to super-resolved ones that
are similar enough to real large objects to fool a competing discriminator.
Meanwhile its discriminator competes with the generator to identify the
generated representation and imposes an additional perceptual requirement –
generated representations of small objects must be beneficial for detection
purpose – on the generator. Extensive evaluations on the challenging
Tsinghua-Tencent 100K and the Caltech benchmark well demonstrate the
superiority of Perceptual GAN in detecting small objects, including traffic
signs and pedestrians, over well-established state-of-the-arts.
Multispectral and Hyperspectral Image Fusion Using a 3-D-Convolutional Neural Network
Frosti Palsson , Johannes R. Sveinsson , Magnus O. Ulfarsson Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (stat.ML)
In this paper, we propose a method using a three dimensional convolutional
neural network (3-D-CNN) to fuse together multispectral (MS) and hyperspectral
(HS) images to obtain a high resolution hyperspectral image. Dimensionality
reduction of the hyperspectral image is performed prior to fusion in order to
significantly reduce the computational time and make the method more robust to
noise. Experiments are performed on a data set simulated using a real
hyperspectral image. The results obtained show that the proposed approach is
very promising when compared to conventional methods. This is especially true
when the hyperspectral image is corrupted by additive noise.
Self-ensembling for domain adaptation
Comments: 15 pages, 1 figure, submitted to BMVC 2017
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This paper explores the use of self-ensembling with random image augmentation
— a technique that has achieved impressive results in the area of
semi-supervised learning — for visual domain adaptation problems. We modify
the approach of Laine et al. to improve stability and ease of use. Our approach
demonstrates state of the art results when performing adaptation between the
following pairs of datasets: MNIST and USPS, CIFAR-10 and STL, SVHN and MNIST,
Syn-Digits to SVHN and Syn-Signs to GTSRB. We also explore the use of richer
data augmentation to solve the challenging MNIST to SVHN adaptation path.
Dynamic Filters in Graph Convolutional Networks
Nitika Verma , Edmond Boyer , Jakob Verbeek Subjects : Computer Vision and Pattern Recognition (cs.CV)
Convolutional neural networks (CNNs) have massively impacted visual
recognition in 2D images, and are now ubiquitous in state-of-the-art
approaches. While CNNs naturally extend to other domains, such as audio and
video, where data is also organized in rectangular grids, they do not easily
generalize to other types of data such as 3D shape meshes, social network
graphs or molecular graphs. To handle such data, we propose a novel
graph-convolutional network architecture that builds on a generic formulation
that relaxes the 1-to-1 correspondence between filter weights and data elements
around the center of the convolution. The main novelty of our architecture is
that the shape of the filter is a function of the features in the previous
network layer, which is learned as an integral part of the neural network.
Experimental evaluations on digit recognition, semi-supervised document
classification, and 3D shape correspondence yield state-of-the-art results,
significantly improving over previous work for shape correspondence.
Interactive 3D Modeling with a Generative Adversarial Network
Jerry Liu , Fisher Yu , Thomas Funkhouser Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Graphics (cs.GR)
This paper proposes the idea of using a generative adversarial network (GAN)
to assist a novice user in designing real-world shapes with a simple interface.
The user edits a voxel grid with a painting interface (like Minecraft). Yet, at
any time, he/she can execute a SNAP command, which projects the current voxel
grid onto a latent shape manifold with a learned projection operator and then
generates a similar, but more realistic, shape using a learned generator
network. Then the user can edit the resulting shape and snap again until he/she
is satisfied with the result. The main advantage of this approach is that the
projection and generation operators assist novice users to create 3D models
characteristic of a background distribution of object shapes, but without
having to specify all the details. The core new research idea is to use a GAN
to support this application. 3D GANs have previously been used for shape
generation, interpolation, and completion, but never for interactive modeling.
The new challenge for this application is to learn a projection operator that
takes an arbitrary 3D voxel model and produces a latent vector on the shape
manifold from which a similar and realistic shape can be generated. We develop
algorithms for this and other steps of the SNAP processing pipeline and
integrate them into a simple modeling tool. Experiments with these algorithms
and tool suggest that GANs provide a promising approach to computer-assisted
interactive modeling.
A Fully Trainable Network with RNN-based Pooling
Comments: 17 pages, 5 figures, 4 tables
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Pooling is an important component in convolutional neural networks (CNNs) for
aggregating features and reducing computational burden. Compared with other
components such as convolutional layers and fully connected layers which are
completely learned from data, the pooling component is still handcrafted such
as max pooling and average pooling. This paper proposes a learnable pooling
function using recurrent neural networks (RNN) so that the pooling can be fully
adapted to data and other components of the network, leading to an improved
performance. Such a network with learnable pooling function is referred to as a
fully trainable network (FTN). Experimental results have demonstrated that the
proposed RNN-based pooling can well approximate the existing pooling functions
and improve the performance of the network. Especially for small networks, the
proposed FTN can improve the performance by seven percentage points in terms of
error rate on the CIFAR-10 dataset compared with the traditional CNN.
The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge
Comments: Submitted to the CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This article describes the final solution of team monkeytyping, who finished
in second place in the YouTube-8M video understanding challenge. The dataset
used in this challenge is a large-scale benchmark for multi-label video
classification. We extend the work in [1] and propose several improvements for
frame sequence modeling. We propose a network structure called Chaining that
can better capture the interactions between labels. Also, we report our
approaches in dealing with multi-scale information and attention pooling. In
addition, We find that using the output of model ensemble as a side target in
training can boost single model performance. We report our experiments in
bagging, boosting, cascade, and stacking, and propose a stacking algorithm
called attention weighted stacking. Our final submission is an ensemble that
consists of 74 sub models, all of which are listed in the appendix.
Symplectomorphic registration with phase space regularization by entropy spectrum pathways
Comments: 26 pages, 7 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
The ability to register image data to a common coordinate system is a
critical feature of virtually all imaging studies that require multiple subject
analysis, combining single subject data from multiple modalities, or both.
However, in spite of the abundance of literature on the subject and the
existence of several variants of registration algorithms, their practical
utility remains problematic, as commonly acknowledged even by developers of
these methods because the complexity of the problem has resisted a general,
flexible, and robust theoretical and computational framework.
To address this issue, we present a new registration method that is similar
in spirit to the current state-of-the-art technique of diffeomorphic mapping,
but is more general and flexible. The method utilizes a Hamiltonian formalism
and constructs registration as a sequence of symplectomorphic maps in
conjunction with a novel phase space regularization based on the powerful
entropy spectrum pathways (ESP) framework.
The method is demonstrated on the three different magnetic resonance imaging
(MRI) modalities routinely used for human neuroimaging applications by mapping
between high resolution anatomical (HRA) volumes, medium resolution diffusion
weighted MRI (DW-MRI) and HRA volumes, and low resolution functional MRI (fMRI)
and HRA volumes. The typical processing time for high quality mapping ranges
from less than a minute to several minutes on a modern multi core CPU for
typical high resolution anatomical (~256x256x256 voxels) MRI volumes.
Face Clustering: Representation and Pairwise Constraints
Comments: 13 pages, journal paper
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Clustering face images according to their identity has two important
applications: (i) grouping a collection of face images when no external labels
are associated with images, and (ii) indexing for efficient large scale face
retrieval. The clustering problem is composed of two key parts: face
representation and choice of similarity for grouping faces. We first propose a
representation based on ResNet, which has been shown to perform very well in
image classification problems. Given this representation, we design a
clustering algorithm, Conditional Pairwise Clustering (ConPaC), which directly
estimates the adjacency matrix only based on the similarity between face
images. This allows a dynamic selection of number of clusters and retains
pairwise similarity between faces. ConPaC formulates the clustering problem as
a Conditional Random Field (CRF) model and uses Loopy Belief Propagation to
find an approximate solution for maximizing the posterior probability of the
adjacency matrix. Experimental results on two benchmark face datasets (LFW and
IJB-B) show that ConPaC outperforms well known clustering algorithms such as
k-means, spectral clustering and approximate rank-order. Additionally, our
algorithm can naturally incorporate pairwise constraints to obtain a
semi-supervised version that leads to improved clustering performance. We also
propose an k-NN variant of ConPaC, which has a linear time complexity given a
k-NN graph, suitable for large datasets.
Hierarchical Label Inference for Video Classification
Nelson Nauata , Jonathan Smith , Greg Mori Subjects : Computer Vision and Pattern Recognition (cs.CV)
Videos are a rich source of high-dimensional structured data, with a wide
range of interacting components at varying levels of granularity. In order to
improve understanding of unconstrained internet videos, it is important to
consider the role of labels at separate levels of abstraction. In this paper,
we consider the use of the Bidirectional Inference Neural Network (BINN) for
performing graph-based inference in label space for the task of video
classification. We take advantage of the inherent hierarchy between labels at
increasing granularity. The BINN is evaluated on the first and second release
of the YouTube-8M large scale multilabel video dataset. Our results demonstrate
the effectiveness of BINN, achieving significant improvements against baseline
models.
Robotic Ironing with 3D Perception and Force/Torque Feedback in Household Environments
Comments: Accepted and to be published on the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017) that will be held in Vancouver, Canada, September 24-28, 2017
Subjects:
Robotics (cs.RO)
; Computer Vision and Pattern Recognition (cs.CV)
As robotic systems become more popular in household environments, the
complexity of required tasks also increases. In this work we focus on a
domestic chore deemed dull by a majority of the population, the task of
ironing. The presented algorithm improves on the limited number of previous
works by joining 3D perception with force/torque sensing, with emphasis on
finding a practical solution with a feasible implementation in a domestic
setting. Our algorithm obtains a point cloud representation of the working
environment. From this point cloud, the garment is segmented and a custom
Wrinkleness Local Descriptor (WiLD) is computed to determine the location of
the present wrinkles. Using this descriptor, the most suitable ironing path is
computed and, based on it, the manipulation algorithm performs the
force-controlled ironing operation. Experiments have been performed with a
humanoid robot platform, proving that our algorithm is able to detect
successfully wrinkles present in garments and iteratively reduce the
wrinkleness using an unmodified iron.
A new look at clustering through the lens of deep convolutional neural networks
Ali Borji , Aysegul Dundar Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV)
Classification and clustering have been studied separately in machine
learning and computer vision. Inspired by the recent success of deep learning
models in solving various vision problems (e.g., object recognition, semantic
segmentation) and the fact that humans serve as the gold standard in assessing
clustering algorithms, here, we advocate for a unified treatment of the two
problems and suggest that hierarchical frameworks that progressively build
complex patterns on top of the simpler ones (e.g., convolutional neural
networks) offer a promising solution. We do not dwell much on the learning
mechanisms in these frameworks as they are still a matter of debate, with
respect to biological constraints. Instead, we emphasize on the
compositionality of the real world structures and objects. In particular, we
show that CNNs, trained end to end using back propagation with noisy labels,
are able to cluster data points belonging to several overlapping shapes, and do
so much better than the state of the art algorithms. The main takeaway lesson
from our study is that mechanisms of human vision, particularly the hierarchal
organization of the visual ventral stream should be taken into account in
clustering algorithms (e.g., for learning representations in an unsupervised
manner or with minimum supervision) to reach human level clustering
performance. This, by no means, suggests that other methods do not hold merits.
For example, methods relying on pairwise affinities (e.g., spectral clustering)
have been very successful in many cases but still fail in some cases (e.g.,
overlapping clusters).
Distance weighted discrimination of face images for gender classification
Comments: 9 pages, 4 figures, 1 table
Subjects:
Applications (stat.AP)
; Computer Vision and Pattern Recognition (cs.CV); Methodology (stat.ME)
We illustrate the advantages of distance weighted discrimination for
classification and feature extraction in a High Dimension Low Sample Size
(HDLSS) situation. The HDLSS context is a gender classification problem of face
images in which the dimension of the data is several orders of magnitude larger
than the sample size. We compare distance weighted discrimination with Fisher’s
linear discriminant, support vector machines, and principal component analysis
by exploring their classification interpretation through insightful
visuanimations and by examining the classifiers’ discriminant errors. This
analysis enables us to make new contributions to the understanding of the
drivers of human discrimination between males and females.
Artificial Intelligence
Value-Decomposition Networks For Cooperative Multi-Agent Learning
Peter Sunehag , Guy Lever , Audrunas Gruslys , Wojciech Marian Czarnecki , Vinicius Zambaldi , Max Jaderberg , Marc Lanctot , Nicolas Sonnerat , Joel Z. Leibo , Karl Tuyls , Thore Graepel Subjects : Artificial Intelligence (cs.AI)
We study the problem of cooperative multi-agent reinforcement learning with a
single joint reward signal. This class of learning problems is difficult
because of the often large combined action and observation spaces. In the fully
centralized and decentralized approaches, we find the problem of spurious
rewards and a phenomenon we call the “lazy agent” problem, which arises due to
partial observability. We address these problems by training individual agents
with a novel value decomposition network architecture, which learns to
decompose the team value function into agent-wise value functions. We perform
an experimental evaluation across a range of partially-observable multi-agent
domains and show that learning such value-decompositions leads to superior
results, in particular when combined with weight sharing, role information and
information channels.
From Propositional Logic to Plausible Reasoning: A Uniqueness Theorem
Comments: Submitted to Int’l Journal of Approximate Reasoning
Subjects:
Artificial Intelligence (cs.AI)
; Logic in Computer Science (cs.LO)
We consider the question of extending propositional logic to a logic of
plausible reasoning, and posit four requirements that any such extension should
satisfy. Each is a requirement that some property of classical propositional
logic be preserved in the extended logic; as such, the requirements are simpler
and less problematic than those used in Cox’s Theorem and its variants. As with
Cox’s Theorem, our requirements imply that the extended logic must be
isomorphic to (finite-set) probability theory. We also obtain specific
numerical values for the probabilities, recovering the classical definition of
probability as a theorem, with truth assignments that satisfy the premise
playing the role of the “possible cases.”
Improving Scalability of Inductive Logic Programming via Pruning and Best-Effort Optimisation
Comments: 24 pages, preprint of article accepted at Expert Systems With Applications
Subjects:
Artificial Intelligence (cs.AI)
Inductive Logic Programming (ILP) combines rule-based and statistical
artificial intelligence methods, by learning a hypothesis comprising a set of
rules given background knowledge and constraints for the search space. We focus
on extending the XHAIL algorithm for ILP which is based on Answer Set
Programming and we evaluate our extensions using the Natural Language
Processing application of sentence chunking. With respect to processing natural
language, ILP can cater for the constant change in how we use language on a
daily basis. At the same time, ILP does not require huge amounts of training
examples such as other statistical methods and produces interpretable results,
that means a set of rules, which can be analysed and tweaked if necessary. As
contributions we extend XHAIL with (i) a pruning mechanism within the
hypothesis generalisation algorithm which enables learning from larger
datasets, (ii) a better usage of modern solver technology using recently
developed optimisation methods, and (iii) a time budget that permits the usage
of suboptimal results. We evaluate these improvements on the task of sentence
chunking using three datasets from a recent SemEval competition. Results show
that our improvements allow for learning on bigger datasets with results that
are of similar quality to state-of-the-art systems on the same task. Moreover,
we compare the hypotheses obtained on datasets to gain insights on the
structure of each dataset.
Deal or No Deal? End-to-End Learning for Negotiation Dialogues
Mike Lewis , Denis Yarats , Yann N. Dauphin , Devi Parikh , Dhruv Batra Subjects : Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Much of human dialogue occurs in semi-cooperative settings, where agents with
different goals attempt to agree on common decisions. Negotiations require
complex communication and reasoning skills, but success is easy to measure,
making this an interesting task for AI. We gather a large dataset of
human-human negotiations on a multi-issue bargaining task, where agents who
cannot observe each other’s reward functions must reach an agreement (or a
deal) via natural language dialogue. For the first time, we show it is possible
to train end-to-end models for negotiation, which must learn both linguistic
and reasoning skills with no annotated dialogue states. We also introduce
dialogue rollouts, in which the model plans ahead by simulating possible
complete continuations of the conversation, and find that this technique
dramatically improves performance. Our code and dataset are publicly available
( this https URL ).
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
Comments: ICML 2017
Subjects:
Artificial Intelligence (cs.AI)
; Learning (cs.LG)
As a step towards developing zero-shot task generalization capabilities in
reinforcement learning (RL), we introduce a new RL problem where the agent
should learn to execute sequences of instructions after learning useful skills
that solve subtasks. In this problem, we consider two types of generalizations:
to previously unseen instructions and to longer sequences of instructions. For
generalization over unseen instructions, we propose a new objective which
encourages learning correspondences between similar subtasks by making
analogies. For generalization over sequential instructions, we present a
hierarchical architecture where a meta controller learns to use the acquired
skills for executing the instructions. To deal with delayed reward, we propose
a new neural architecture in the meta controller that learns when to update the
subtask, which makes learning more efficient. Experimental results on a
stochastic 3D domain show that the proposed ideas are crucial for
generalization to longer instructions as well as unseen instructions.
Conjunctions of Among Constraints
Comments: 15 pages plus appendix
Subjects:
Artificial Intelligence (cs.AI)
; Logic in Computer Science (cs.LO)
Many existing global constraints can be encoded as a conjunction of among
constraints. An among constraint holds if the number of the variables in its
scope whose value belongs to a prespecified set, which we call its range, is
within some given bounds. It is known that domain filtering algorithms can
benefit from reasoning about the interaction of among constraints so that
values can be filtered out taking into consideration several among constraints
simultaneously. The present pa- per embarks into a systematic investigation on
the circumstances under which it is possible to obtain efficient and complete
domain filtering algorithms for conjunctions of among constraints. We start by
observing that restrictions on both the scope and the range of the among
constraints are necessary to obtain meaningful results. Then, we derive a
domain flow-based filtering algorithm and present several applications. In
particular, it is shown that the algorithm unifies and generalizes several
previous existing results.
Collaborative vehicle routing: a survey
Margaretha Gansterer , Richard F. Hartl Subjects : Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Optimization and Control (math.OC); Physics and Society (physics.soc-ph)
In horizontal collaborations, carriers form coalitions in order to perform
parts of their logistics operations jointly. By exchanging transportation
requests among each other, they can operate more efficiently and in a more
sustainable way. Collaborative vehicle routing has been extensively discussed
in the literature. We identify three major streams of research: (i) centralized
collaborative planning, (ii) decentralized planning without auctions, and (ii)
auction-based decentralized planning. For each of them we give a structured
overview on the state of knowledge and discuss future research directions.
Structured Best Arm Identification with Fixed Confidence
Ruitong Huang , Mohammad M. Ajallooeian , Csaba Szepesvári , Martin Müller Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI)
We study the problem of identifying the best action among a set of possible
options when the value of each action is given by a mapping from a number of
noisy micro-observables in the so-called fixed confidence setting. Our main
motivation is the application to the minimax game search, which has been a
major topic of interest in artificial intelligence. In this paper we introduce
an abstract setting to clearly describe the essential properties of the
problem. While previous work only considered a two-move game tree search
problem, our abstract setting can be applied to the general minimax games where
the depth can be non-uniform and arbitrary, and transpositions are allowed. We
introduce a new algorithm (LUCB-micro) for the abstract setting, and give its
lower and upper sample complexity results. Our bounds recover some previous
results, which were only available in more limited settings, while they also
shed further light on how the structure of minimax problems influence sample
complexity.
AI-Powered Social Bots
Comments: 2 figures
Subjects:
Social and Information Networks (cs.SI)
; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Multimedia (cs.MM)
This paper gives an overview of impersonation bots that generate output in
one, or possibly, multiple modalities. We also discuss rapidly advancing areas
of machine learning and artificial intelligence that could lead to
frighteningly powerful new multi-modal social bots. Our main conclusion is that
most commonly known bots are one dimensional (i.e., chatterbot), and far from
deceiving serious interrogators. However, using recent advances in machine
learning, it is possible to unleash incredibly powerful, human-like armies of
social bots, in potentially well coordinated campaigns of deception and
influence.
Bib2vec: An Embedding-based Search System for Bibliographic Information
Comments: EACL2017 extended version
Journal-ref: Proceedings of the EACL 2017 Software Demonstrations, Valencia,
Spain, April 3-7 2017, pages 112-115
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
We propose a novel embedding model that represents relationships among
several elements in bibliographic information with high representation ability
and flexibility. Based on this model, we present a novel search system that
shows the relationships among the elements in the ACL Anthology Reference
Corpus. The evaluation results show that our model can achieve a high
prediction ability and produce reasonable search results.
An Overview of Multi-Task Learning in Deep Neural Networks
Comments: 14 pages, 8 figures
Subjects:
Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Multi-task learning (MTL) has led to successes in many applications of
machine learning, from natural language processing and speech recognition to
computer vision and drug discovery. This article aims to give a general
overview of MTL, particularly in deep neural networks. It introduces the two
most common methods for MTL in Deep Learning, gives an overview of the
literature, and discusses recent advances. In particular, it seeks to help ML
practitioners apply MTL by shedding light on how MTL works and providing
guidelines for choosing appropriate auxiliary tasks.
Evaluating Noisy Optimisation Algorithms: First Hitting Time is Problematic
Comments: 4 pages, 4 figurs, 1 table
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI)
A key part of any evolutionary algorithm is fitness evaluation. When fitness
evaluations are corrupted by noise, as happens in many real-world problems as a
consequence of various types of uncertainty, a strategy is needed in order to
cope with this. Resampling is one of the most common strategies, whereby each
solution is evaluated many times in order to reduce the variance of the fitness
estimates. When evaluating the performance of a noisy optimisation algorithm, a
key consideration is the stopping condition for the algorithm. A frequently
used stopping condition in runtime analysis, known as “First Hitting Time”, is
to stop the algorithm as soon as it encounters the optimal solution. However,
this is unrealistic for real-world problems, as if the optimal solution were
already known, there would be no need to search for it. This paper argues that
the use of First Hitting Time, despite being a commonly used approach, is
significantly flawed and overestimates the quality of many algorithms in
real-world cases, where the optimum is not known in advance and has to be
genuinely searched for. A better alternative is to measure the quality of the
solution an algorithm returns after a fixed evaluation budget, i.e., to focus
on final solution quality. This paper argues that focussing on final solution
quality is more realistic and demonstrates cases where the results produced by
each algorithm evaluation method lead to very different conclusions regarding
the quality of each noisy optimisation algorithm.
Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
Suncong Zheng , Feng Wang , Hongyun Bao , Yuexing Hao , Peng Zhou , Bo Xu Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Learning (cs.LG)
Joint extraction of entities and relations is an important task in
information extraction. To tackle this problem, we firstly propose a novel
tagging scheme that can convert the joint extraction task to a tagging problem.
Then, based on our tagging scheme, we study different end-to-end models to
extract entities and their relations directly, without identifying entities and
relations separately. We conduct experiments on a public dataset produced by
distant supervision method and the experimental results show that the tagging
based methods are better than most of the existing pipelined and joint learning
methods. What’s more, the end-to-end model proposed in this paper, achieves the
best results on the public dataset.
Information Retrieval
Twigraph: Discovering and Visualizing Influential Words between Twitter Profiles
Dhanasekar S , Sudharshan Srinivasan Subjects : Social and Information Networks (cs.SI) ; Information Retrieval (cs.IR)
The social media craze is on an ever increasing spree, and people are
connected with each other like never before, but these vast connections are
visually unexplored. We propose a methodology Twigraph to explore the
connections between persons using their Twitter profiles. First, we propose a
hybrid approach of recommending social media profiles, articles, and
advertisements to a user.The profiles are recommended based on the similarity
score between the user profile, and profile under evaluation. The similarity
between a set of profiles is investigated by finding the top influential words
thus causing a high similarity through an Influence Term Metric for each word.
Then, we group profiles of various domains such as politics, sports, and
entertainment based on the similarity score through a novel clustering
algorithm. The connectivity between profiles is envisaged using word graphs
that help in finding the words that connect a set of profiles and the profiles
that are connected to a word. Finally, we analyze the top influential words
over a set of profiles through clustering by finding the similarity of that
profiles enabling to break down a Twitter profile with a lot of followers to
fine level word connections using word graphs. The proposed method was
implemented on datasets comprising 1.1 M Tweets obtained from Twitter.
Experimental results show that the resultant influential words were highly
representative of the relationship between two profiles or a set of profiles
Bib2vec: An Embedding-based Search System for Bibliographic Information
Comments: EACL2017 extended version
Journal-ref: Proceedings of the EACL 2017 Software Demonstrations, Valencia,
Spain, April 3-7 2017, pages 112-115
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
We propose a novel embedding model that represents relationships among
several elements in bibliographic information with high representation ability
and flexibility. Based on this model, we present a novel search system that
shows the relationships among the elements in the ACL Anthology Reference
Corpus. The evaluation results show that our model can achieve a high
prediction ability and produce reasonable search results.
Topic supervised non-negative matrix factorization
Comments: 20 pages, 7 figures
Subjects:
Computation and Language (cs.CL)
; Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
Topic models have been extensively used to organize and interpret the
contents of large, unstructured corpora of text documents. Although topic
models often perform well on traditional training vs. test set evaluations, it
is often the case that the results of a topic model do not align with human
interpretation. This interpretability fallacy is largely due to the
unsupervised nature of topic models, which prohibits any user guidance on the
results of a model. In this paper, we introduce a semi-supervised method called
topic supervised non-negative matrix factorization (TS-NMF) that enables the
user to provide labeled example documents to promote the discovery of more
meaningful semantic structure of a corpus. In this way, the results of TS-NMF
better match the intuition and desired labeling of the user. The core of TS-NMF
relies on solving a non-convex optimization problem for which we derive an
iterative algorithm that is shown to be monotonic and convergent to a local
optimum. We demonstrate the practical utility of TS-NMF on the Reuters and
PubMed corpora, and find that TS-NMF is especially useful for conceptual or
broad topics, where topic key terms are not well understood. Although
identifying an optimal latent structure for the data is not a primary objective
of the proposed approach, we find that TS-NMF achieves higher weighted Jaccard
similarity scores than the contemporary methods, (unsupervised) NMF and latent
Dirichlet allocation, at supervision rates as low as 10% to 20%.
Measuring Personalization of Web Search
Anikó Hannák , Piotr Sapieżyński , Arash Molavi Khaki , David Lazer , Alan Mislove , Christo Wilson Subjects : Computers and Society (cs.CY) ; Information Retrieval (cs.IR)
Web search is an integral part of our daily lives. Recently, there has been a
trend of personalization in Web search, where different users receive different
results for the same search query. The increasing level of personalization is
leading to concerns about Filter Bubble effects, where certain users are simply
unable to access information that the search engines’ algorithm decides is
irrelevant. Despite these concerns, there has been little quantification of the
extent of personalization in Web search today, or the user attributes that
cause it.
In light of this situation, we make three contributions. First, we develop a
methodology for measuring personalization in Web search results. While
conceptually simple, there are numerous details that our methodology must
handle in order to accurately attribute differences in search results to
personalization. Second, we apply our methodology to 200 users on Google Web
Search and 100 users on Bing. We find that, on average, 11.7% of results show
differences due to personalization on Google, while 15.8% of results are
personalized on Bing, but that this varies widely by search query and by result
ranking. Third, we investigate the user features used to personalize on Google
Web Search and Bing. Surprisingly, we only find measurable personalization as a
result of searching with a logged in account and the IP address of the
searching user. Our results are a first step towards understanding the extent
and effects of personalization on Web search engines today.
Computation and Language
An Automatic Approach for Document-level Topic Model Evaluation
Comments: 10 pages; accepted for the Twenty First Conference on Computational Natural Language Learning (CoNLL 2017)
Subjects:
Computation and Language (cs.CL)
Topic models jointly learn topics and document-level topic distribution.
Extrinsic evaluation of topic models tends to focus exclusively on topic-level
evaluation, e.g. by assessing the coherence of topics. We demonstrate that
there can be large discrepancies between topic- and document-level model
quality, and that basing model evaluation on topic-level analysis can be highly
misleading. We propose a method for automatically predicting topic model
quality based on analysis of document-level topic allocations, and provide
empirical evidence for its robustness.
Bib2vec: An Embedding-based Search System for Bibliographic Information
Comments: EACL2017 extended version
Journal-ref: Proceedings of the EACL 2017 Software Demonstrations, Valencia,
Spain, April 3-7 2017, pages 112-115
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
We propose a novel embedding model that represents relationships among
several elements in bibliographic information with high representation ability
and flexibility. Based on this model, we present a novel search system that
shows the relationships among the elements in the ACL Anthology Reference
Corpus. The evaluation results show that our model can achieve a high
prediction ability and produce reasonable search results.
A Mixture Model for Learning Multi-Sense Word Embeddings
Comments: *SEM 2017
Subjects:
Computation and Language (cs.CL)
Word embeddings are now a standard technique for inducing meaning
representations for words. For getting good representations, it is important to
take into account different senses of a word. In this paper, we propose a
mixture model for learning multi-sense word embeddings. Our model generalizes
the previous works in that it allows to induce different weights of different
senses of a word. The experimental results show that our model outperforms
previous models on standard evaluation tasks.
Number game
Go Sugimoto (ACDH-ÖAW) Subjects : Computation and Language (cs.CL) ; Digital Libraries (cs.DL)
CLARIN (Common Language Resources and Technology Infrastructure) is regarded
as one of the most important European research infrastructures, offering and
promoting a wide array of useful services for (digital) research in linguistics
and humanities. However, the assessment of the users for its core technical
development has been highly limited, therefore, it is unclear if the community
is thoroughly aware of the status-quo of the growing infrastructure. In
addition, CLARIN does not seem to be fully materialised marketing and business
plans and strategies despite its strong technical assets. This article analyses
the web traffic of the Virtual Language Observatory, one of the main web
applications of CLARIN and a symbol of pan-European re-search cooperation, to
evaluate the users and performance of the service in a transparent and
scientific way. It is envisaged that the paper can raise awareness of the
pressing issues on objective and transparent operation of the infrastructure
though Open Evaluation, and the synergy between marketing and technical
development. It also investigates the “science of web analytics” in an attempt
to document the research process for the purpose of reusability and
reproducibility, thus to find universal lessons for the use of a web analytics,
rather than to merely produce a statistical report of a particular website
which loses its value outside its context.
Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder
Comments: Accepted to Rep4NLP 2017 Workshop at ACL 2017 Conference
Subjects:
Computation and Language (cs.CL)
; Neural and Evolutionary Computing (cs.NE)
We investigate the integration of a planning mechanism into an
encoder-decoder architecture with attention for character-level machine
translation. We develop a model that plans ahead when it computes alignments
between the source and target sequences, constructing a matrix of proposed
future alignments and a commitment vector that governs whether to follow or
recompute the plan. This mechanism is inspired by the strategic attentive
reader and writer (STRAW) model. Our proposed model is end-to-end trainable
with fully differentiable operations. We show that it outperforms a strong
baseline on three character-level decoder neural machine translation on WMT’15
corpus. Our analysis demonstrates that our model can compute qualitatively
intuitive alignments and achieves superior performance with fewer parameters.
Topic supervised non-negative matrix factorization
Comments: 20 pages, 7 figures
Subjects:
Computation and Language (cs.CL)
; Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
Topic models have been extensively used to organize and interpret the
contents of large, unstructured corpora of text documents. Although topic
models often perform well on traditional training vs. test set evaluations, it
is often the case that the results of a topic model do not align with human
interpretation. This interpretability fallacy is largely due to the
unsupervised nature of topic models, which prohibits any user guidance on the
results of a model. In this paper, we introduce a semi-supervised method called
topic supervised non-negative matrix factorization (TS-NMF) that enables the
user to provide labeled example documents to promote the discovery of more
meaningful semantic structure of a corpus. In this way, the results of TS-NMF
better match the intuition and desired labeling of the user. The core of TS-NMF
relies on solving a non-convex optimization problem for which we derive an
iterative algorithm that is shown to be monotonic and convergent to a local
optimum. We demonstrate the practical utility of TS-NMF on the Reuters and
PubMed corpora, and find that TS-NMF is especially useful for conceptual or
broad topics, where topic key terms are not well understood. Although
identifying an optimal latent structure for the data is not a primary objective
of the proposed approach, we find that TS-NMF achieves higher weighted Jaccard
similarity scores than the contemporary methods, (unsupervised) NMF and latent
Dirichlet allocation, at supervision rates as low as 10% to 20%.
Comments: APE/QE System Description Paper for WMT/CMT 2017
Subjects:
Computation and Language (cs.CL)
This work presents a novel approach to jointly tackling Automatic
Post-Editing (APE) and Word-Level Quality Estimation (QE) using ensembles of
specialized Neural Machine Translation (NMT) systems. Word-level features which
have proven effective for QE are included as input factors, expanding the
representation of the original source and the machine translation hypothesis,
which are used to generate an automatically post-edited hypothesis. We train a
suite of NMT models which use different input representations, but share the
same output space. These models are then ensembled together, and tuned for both
the APE and the QE task. We thus attempt to connect the state-of-the-art
approaches to APE and QE within a single framework. Our models achieve
state-of-the-art results in both tasks, with the only difference in the tuning
step which learns weights for each component of the ensemble.
Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
Suncong Zheng , Feng Wang , Hongyun Bao , Yuexing Hao , Peng Zhou , Bo Xu Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Learning (cs.LG)
Joint extraction of entities and relations is an important task in
information extraction. To tackle this problem, we firstly propose a novel
tagging scheme that can convert the joint extraction task to a tagging problem.
Then, based on our tagging scheme, we study different end-to-end models to
extract entities and their relations directly, without identifying entities and
relations separately. We conduct experiments on a public dataset produced by
distant supervision method and the experimental results show that the tagging
based methods are better than most of the existing pipelined and joint learning
methods. What’s more, the end-to-end model proposed in this paper, achieves the
best results on the public dataset.
Extracting Formal Models from Normative Texts
Comments: Extended version of conference paper at the 21st International Conference on Applications of Natural Language to Information Systems (NLDB 2016). arXiv admin note: substantial text overlap with arXiv:1607.01485
Subjects:
Computation and Language (cs.CL)
We are concerned with the analysis of normative texts – documents based on
the deontic notions of obligation, permission, and prohibition. Our goal is to
make queries about these notions and verify that a text satisfies certain
properties concerning causality of actions and timing constraints. This
requires taking the original text and building a representation (model) of it
in a formal language, in our case the C-O Diagram formalism. We present an
experimental, semi-automatic aid that helps to bridge the gap between a
normative text in natural language and its C-O Diagram representation. Our
approach consists of using dependency structures obtained from the
state-of-the-art Stanford Parser, and applying our own rules and heuristics in
order to extract the relevant components. The result is a tabular data
structure where each sentence is split into suitable fields, which can then be
converted into a C-O Diagram. The process is not fully automatic however, and
some post-editing is generally required of the user. We apply our tool and
perform experiments on documents from different domains, and report an initial
evaluation of the accuracy and feasibility of our approach.
Active learning in annotating micro-blogs dealing with e-reputation
Jean-Valèere Cossu , Alejandro Molina-Villegas , Mariana Tello-Signoret Subjects : Social and Information Networks (cs.SI) ; Computation and Language (cs.CL)
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on micro blogs dealing
with politics has recently attracted researchers in several fields including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches are limited by the amount and
quality of data available, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper intends to develop a
so-called active learning process for automatically annotating French language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinion from two French politicians over time. We
therefore review state of the art NLP-based ML algorithms to automatically
annotate tweets using a manual initiation step as bootstrap. This paper focuses
on key issues about active learning while building a large annotated data set
from noise. This will be introduced by human annotators, abundance of data and
the label distribution across data and entities. In turn, we show that Twitter
characteristics such as the author’s name or hashtags can be considered as the
bearing point to not only improve automatic systems for Opinion Mining (OM) and
Topic Classification but also to reduce noise in human annotations. However, a
later thorough analysis shows that reducing noise might induce the loss of
crucial information.
Deal or No Deal? End-to-End Learning for Negotiation Dialogues
Mike Lewis , Denis Yarats , Yann N. Dauphin , Devi Parikh , Dhruv Batra Subjects : Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Much of human dialogue occurs in semi-cooperative settings, where agents with
different goals attempt to agree on common decisions. Negotiations require
complex communication and reasoning skills, but success is easy to measure,
making this an interesting task for AI. We gather a large dataset of
human-human negotiations on a multi-issue bargaining task, where agents who
cannot observe each other’s reward functions must reach an agreement (or a
deal) via natural language dialogue. For the first time, we show it is possible
to train end-to-end models for negotiation, which must learn both linguistic
and reasoning skills with no annotated dialogue states. We also introduce
dialogue rollouts, in which the model plans ahead by simulating possible
complete continuations of the conversation, and find that this technique
dramatically improves performance. Our code and dataset are publicly available
( this https URL ).
Distributed, Parallel, and Cluster Computing
Declarative Modeling for Building a Cloud Federation and Cloud Applications
Giuseppe Attardi , Alex Barchiesi , Alberto Colla , Fulvio Galeazzi , Giovanni Marzulli , Mario Reale Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
The paper illustrates how we built a federated cloud computing platform
dedicated to the Italian research community. Building a cloud platform is a
daunting task, that requires coordinating the deployment of many services,
interrelated and dependent on each other. Provisioning, servicing and
maintaining the platform must be automated. For our deployment, we chose a
declarative modeling tool, that allows describing the parts that compose the
system and their relations of supplier/consumer of specific interfaces. The
tool arranges the steps to bring the deployment to convergence by transforming
the state of the system until it reaches a configuration that satisfies all
constraints. We chose a declarative service modeling approach for orchestrating
both the deployment of the platform by the administrators and the deployment of
applications by users. The cloud platform has been designed so that it can be
managed by this kind of automation, facilitating the deployment of federated
regions by anyone wishing to join and to contribute resources to the
federation. Federated resources are integrated into a single cloud platform
available to any user of the federation. The federation can also seamlessly
include public clouds. We describe the architectural choices, how we adapted
the OpenStack basic facilities to the needs of a federation of multiple
independent organizations, how we control resource allocation according to
committed plans and correspondingly how we handle accounting and billing of
resource usage. Besides providing traditional IaaS services, the cloud supports
self-service deployment of cloud applications. The cloud thus addresses the
long tail of science, allowing researchers of any discipline, without expertise
in system or cloud administration, to deploy applications readily available for
their perusal.
Set-Constrained Delivery Broadcast: Definition, Abstraction Power, and Computability Limits
Comments: arXiv admin note: substantial text overlap with arXiv:1702.08176
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
This paper introduces a new communication abstraction, called Set-Constrained
Delivery Broadcast (SCD-broadcast), whose aim is to provide its users with an
appropriate abstraction level when they have to implement objects or
distributed tasks in an asynchronous message-passing system prone to process
crash failures. This abstraction allows each process to broadcast messages and
deliver a sequence of sets of messages in such a way that, if a process
delivers a set of messages including a message m and later delivers a set of
messages including a message m ‘ , no process delivers first a set of messages
including m ‘ and later a set of message including m. After having presented an
algorithm implementing SCD-broadcast, the paper investigates its programming
power and its computability limits. On the “power” side it presents
SCD-broadcast-based algorithms, which are both simple and efficient, building
objects (such as snapshot and conflict-free replicated data), and distributed
tasks. On the “computability limits” side it shows that SCD-broadcast and
read/write registers are computationally equivalent.
Concurrent Geometric Multicasting
Jordan Adamek , Mikhail Nesterenko , James Robinson , Sébastien Tixeuil (NPA, IUF, LINCS) Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Computational Geometry (cs.CG); Discrete Mathematics (cs.DM); Networking and Internet Architecture (cs.NI)
We present MCFR, a multicasting concurrent face routing algorithm that uses
geometric routing to deliver a message from source to multiple targets. We
describe the algorithm’s operation, prove it correct, estimate its performance
bounds and evaluate its performance using simulation. Our estimate shows that
MCFR is the first geometric multicast routing algorithm whose message delivery
latency is independent of network size and only proportional to the distance
between the source and the targets. Our simulation indicates that MCFR has
significantly better reliability than existing algorithms.
Parameterized Verification of Algorithms for Oblivious Robots on a Ring
Arnaud Sangnier (IRIF), Nathalie Sznajder (MoVe), Maria Potop-Butucaru (NPA, LINCS), Sébastien Tixeuil (NPA, IUF, LINCS) Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Logic in Computer Science (cs.LO); Robotics (cs.RO)
We study verification problems for autonomous swarms of mobile robots that
self-organize and cooperate to solve global objectives. In particular, we focus
in this paper on the model proposed by Suzuki and Yamashita of anonymous robots
evolving in a discrete space with a finite number of locations (here, a ring).
A large number of algorithms have been proposed working for rings whose size is
not a priori fixed and can be hence considered as a parameter. Handmade
correctness proofs of these algorithms have been shown to be error-prone, and
recent attention had been given to the application of formal methods to
automatically prove those. Our work is the first to study the verification
problem of such algorithms in the parameter-ized case. We show that safety and
reachability problems are undecidable for robots evolving asynchronously. On
the positive side, we show that safety properties are decidable in the
synchronous case, as well as in the asynchronous case for a particular class of
algorithms. Several properties on the protocol can be decided as well. Decision
procedures rely on an encoding in Presburger arithmetics formulae that can be
verified by an SMT-solver. Feasibility of our approach is demonstrated by the
encoding of several case studies.
Distributed-Memory Parallel Algorithms for Counting and Listing Triangles in Big Graphs
Comments: 30 pages
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
Big graphs (networks) arising in numerous application areas pose significant
challenges for graph analysts as these graphs grow to billions of nodes and
edges and are prohibitively large to fit in the main memory. Finding the number
of triangles in a graph is an important problem in the mining and analysis of
graphs. In this paper, we present two efficient MPI-based distributed memory
parallel algorithms for counting triangles in big graphs. The first algorithm
employs overlapping partitioning and efficient load balancing schemes to
provide a very fast parallel algorithm. The algorithm scales well to networks
with billions of nodes and can compute the exact number of triangles in a
network with 10 billion edges in 16 minutes. The second algorithm divides the
network into non-overlapping partitions leading to a space-efficient algorithm.
Our results on both artificial and real-world networks demonstrate a
significant space saving with this algorithm. We also present a novel approach
that reduces communication cost drastically leading the algorithm to both a
space- and runtime-efficient algorithm. Further, we demonstrate how our
algorithms can be used to list all triangles in a graph and compute clustering
coefficients of nodes. Our algorithm can also be adapted to a parallel
approximation algorithm using an edge sparsification method.
The Price of Anarchy in Hypergraph Coloring Games
Rann Smorodinsky , Shakhar Smorodinsky Subjects : Combinatorics (math.CO) ; Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM); Computer Science and Game Theory (cs.GT); Social and Information Networks (cs.SI)
The price of anarchy was introduced to measure the loss incurred by a society
of agents who take actions in a decentralized manner instead of through a
central authority. Hypergraph coloring has traditionally been studied in the
context of a central designer who chooses colors. In this paper we study the
price of anarchy when the choice of color is delegated to each of the vertices
which are assumed self-interested.
Distributed Transfer Linear Support Vector Machines
Rui Zhang , Quanyan Zhu Subjects : Learning (cs.LG) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Transfer learning has been developed to improve the performances of different
but related tasks in machine learning. However, such processes become less
efficient with the increase of the size of training data and the number of
tasks. Moreover, privacy can be violated as some tasks may contain sensitive
and private data, which are communicated between nodes and tasks. We propose a
consensus-based distributed transfer learning framework, where several tasks
aim to find the best linear support vector machine (SVM) classifiers in a
distributed network. With alternating direction method of multipliers, tasks
can achieve better classification accuracies more efficiently and privately, as
each node and each task train with their own data, and only decision variables
are transferred between different tasks and nodes. Numerical experiments on
MNIST datasets show that the knowledge transferred from the source tasks can be
used to decrease the risks of the target tasks that lack training data or have
unbalanced training labels. We show that the risks of the target tasks in the
nodes without the data of the source tasks can also be reduced using the
information transferred from the nodes who contain the data of the source
tasks. We also show that the target tasks can enter and leave in real-time
without rerunning the whole algorithm.
Learning
Local Feature Descriptor Learning with Adaptive Siamese Network
Chong Huang , Qiong Liu , Yan-Ying Chen , Kwang-Ting (Tim)
Comments: 4 pages, 2 tables, 1 figure
Subjects:
Learning (cs.LG)
; Machine Learning (stat.ML)
Although the recent progress in the deep neural network has led to the
development of learnable local feature descriptors, there is no explicit answer
for estimation of the necessary size of a neural network. Specifically, the
local feature is represented in a low dimensional space, so the neural network
should have more compact structure. The small networks required for local
feature descriptor learning may be sensitive to initial conditions and learning
parameters and more likely to become trapped in local minima. In order to
address the above problem, we introduce an adaptive pruning Siamese
Architecture based on neuron activation to learn local feature descriptors,
making the network more computationally efficient with an improved recognition
rate over more complex networks. Our experiments demonstrate that our learned
local feature descriptors outperform the state-of-art methods in patch
matching.
L2 Regularization versus Batch and Weight Normalization
Twan van Laarhoven Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
Batch Normalization is a commonly used trick to improve the training of deep
neural networks. These neural networks use L2 regularization, also called
weight decay, ostensibly to prevent overfitting. However, we show that L2
regularization has no regularizing effect when combined with normalization.
Instead, regularization has an influence on the scale of weights, and thereby
on the effective learning rate. We investigate this dependence, both in theory,
and experimentally. We show that popular optimization methods such as ADAM only
partially eliminate the influence of normalization on the learning rate. This
leads to a discussion on other ways to mitigate this issue.
Learning with Feature Evolvable Streams
Bo-Jian Hou , Lijun Zhang , Zhi-Hua Zhou Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
Learning with streaming data has attracted much attention during the past few
years. Though most studies consider data stream with fixed features, in real
practice the features may be evolvable. For example, features of data gathered
by limited-lifespan sensors will change when these sensors are substituted by
new ones. In this paper, we propose a novel learning paradigm: Feature
Evolvable Streaming Learning where old features would vanish and new features
will occur. Rather than relying on only the current features, we attempt to
recover the vanished features and exploit it to improve performance.
Specifically, we learn two models from the recovered features and the current
features, respectively. To benefit from the recovered features, we develop two
ensemble methods. In the first method, we combine the predictions from two
models and theoretically show that with assistance of old features, the
performance on new features can be improved. In the second approach, we
dynamically select the best single prediction and establish a better
performance guarantee when the best model switches. Experiments on both
synthetic and real data validate the effectiveness of our proposal.
Structured Best Arm Identification with Fixed Confidence
Ruitong Huang , Mohammad M. Ajallooeian , Csaba Szepesvári , Martin Müller Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI)
We study the problem of identifying the best action among a set of possible
options when the value of each action is given by a mapping from a number of
noisy micro-observables in the so-called fixed confidence setting. Our main
motivation is the application to the minimax game search, which has been a
major topic of interest in artificial intelligence. In this paper we introduce
an abstract setting to clearly describe the essential properties of the
problem. While previous work only considered a two-move game tree search
problem, our abstract setting can be applied to the general minimax games where
the depth can be non-uniform and arbitrary, and transpositions are allowed. We
introduce a new algorithm (LUCB-micro) for the abstract setting, and give its
lower and upper sample complexity results. Our bounds recover some previous
results, which were only available in more limited settings, while they also
shed further light on how the structure of minimax problems influence sample
complexity.
Veiled Attributes of the Variational Autoencoder
Bin Dai , Yu Wang , John Aston , Gang Hua , David Wipf Subjects : Learning (cs.LG)
Variational autoencoders (VAE) represent a popular, flexible form of deep
generative model that can be stochastically fit to samples from a given random
process using an information-theoretic variational bound on the true underlying
distribution. Once so-obtained, the model can be putatively used to generate
new samples from this distribution, or to provide a low-dimensional latent
representation of existing samples. While quite effective in numerous
application domains, certain important mechanisms which govern the behavior of
the VAE are obfuscated by the intractable integrals and resulting stochastic
approximations involved. Moreover, as a highly non-convex model, it remains
unclear exactly how minima of the underlying energy relate to original design
purposes. We attempt to better quantify these issues by analyzing a series of
tractable special cases of increasing complexity. In doing so, we unveil
interesting connections with more traditional dimensionality reduction models,
as well as an intrinsic yet underappreciated propensity for robustly dismissing
outliers when estimating latent manifolds. With respect to the latter, we
demonstrate that the VAE can be viewed as the natural evolution of recent
robust PCA models, capable of learning nonlinear manifolds obscured by gross
corruptions. However, this previously unexplored feature comes with the cost of
potential model collapse to a degenerate distribution that may be less suitable
as the basis for generating new samples.
One Model To Learn Them All
Lukasz Kaiser , Aidan N. Gomez , Noam Shazeer , Ashish Vaswani , Niki Parmar , Llion Jones , Jakob Uszkoreit Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
Deep learning yields great results across many fields, from speech
recognition, image classification, to translation. But for each problem,
getting a deep model to work well involves research into the architecture and a
long period of tuning. We present a single model that yields good results on a
number of problems spanning multiple domains. In particular, this single model
is trained concurrently on ImageNet, multiple translation tasks, image
captioning (COCO dataset), a speech recognition corpus, and an English parsing
task. Our model architecture incorporates building blocks from multiple
domains. It contains convolutional layers, an attention mechanism, and
sparsely-gated layers. Each of these computational blocks is crucial for a
subset of the tasks we train on. Interestingly, even if a block is not crucial
for a task, we observe that adding it never hurts performance and in most cases
improves it on all tasks. We also show that tasks with less data benefit
largely from joint training with other tasks, while performance on large tasks
degrades only slightly if at all.
Deriving Compact Laws Based on Algebraic Formulation of a Data Set
In various subjects, there exist compact and consistent relationships between
input and output parameters. Discovering the relationships, or namely compact
laws, in a data set is of great interest in many fields, such as physics,
chemistry, and finance. While data discovery has made great progress in
practice thanks to the success of machine learning in recent years, the
development of analytical approaches in finding the theory behind the data is
relatively slow. In this paper, we develop an innovative approach in
discovering compact laws from a data set. By proposing a novel algebraic
equation formulation, we convert the problem of deriving meaning from data into
formulating a linear algebra model and searching for relationships that fit the
data. Rigorous proof is presented in validating the approach. The algebraic
formulation allows the search of equation candidates in an explicit
mathematical manner. Searching algorithms are also proposed for finding the
governing equations with improved efficiency. For a certain type of compact
theory, our approach assures convergence and the discovery is computationally
efficient and mathematically precise.
An Overview of Multi-Task Learning in Deep Neural Networks
Comments: 14 pages, 8 figures
Subjects:
Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Multi-task learning (MTL) has led to successes in many applications of
machine learning, from natural language processing and speech recognition to
computer vision and drug discovery. This article aims to give a general
overview of MTL, particularly in deep neural networks. It introduces the two
most common methods for MTL in Deep Learning, gives an overview of the
literature, and discusses recent advances. In particular, it seeks to help ML
practitioners apply MTL by shedding light on how MTL works and providing
guidelines for choosing appropriate auxiliary tasks.
Learning Disjunctions of Predicates
Nader H. Bshouty , Dana Drachsler-Cohen , Martin Vechev , Eran Yahav Subjects : Learning (cs.LG)
Let (F) be a set of boolean functions. We present an algorithm for learning
(F_vee := {vee_{fin S} f mid S subseteq F}) from membership queries. Our
algorithm asks at most (|F| cdot OPT(F_vee)) membership queries where
(OPT(F_vee)) is the minimum worst case number of membership queries for
learning (F_vee). When (F) is a set of halfspaces over a constant dimension
space or a set of variable inequalities, our algorithm runs in polynomial time.
The problem we address has practical importance in the field of program
synthesis, where the goal is to synthesize a program that meets some
requirements. Program synthesis has become popular especially in settings
aiming to help end users. In such settings, the requirements are not provided
upfront and the synthesizer can only learn them by posing membership queries to
the end user. Our work enables such synthesizers to learn the exact
requirements while bounding the number of membership queries.
Generalization for Adaptively-chosen Estimators via Stable Median
Comments: To appear in Conference on Learning Theory (COLT) 2017
Subjects:
Learning (cs.LG)
; Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Datasets are often reused to perform multiple statistical analyses in an
adaptive way, in which each analysis may depend on the outcomes of previous
analyses on the same dataset. Standard statistical guarantees do not account
for these dependencies and little is known about how to provably avoid
overfitting and false discovery in the adaptive setting. We consider a natural
formalization of this problem in which the goal is to design an algorithm that,
given a limited number of i.i.d.~samples from an unknown distribution, can
answer adaptively-chosen queries about that distribution.
We present an algorithm that estimates the expectations of (k) arbitrary
adaptively-chosen real-valued estimators using a number of samples that scales
as (sqrt{k}). The answers given by our algorithm are essentially as accurate
as if fresh samples were used to evaluate each estimator. In contrast, prior
work yields error guarantees that scale with the worst-case sensitivity of each
estimator. We also give a version of our algorithm that can be used to verify
answers to such queries where the sample complexity depends logarithmically on
the number of queries (k) (as in the reusable holdout technique).
Our algorithm is based on a simple approximate median algorithm that
satisfies the strong stability guarantees of differential privacy. Our
techniques provide a new approach for analyzing the generalization guarantees
of differentially private algorithms.
A new look at clustering through the lens of deep convolutional neural networks
Ali Borji , Aysegul Dundar Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV)
Classification and clustering have been studied separately in machine
learning and computer vision. Inspired by the recent success of deep learning
models in solving various vision problems (e.g., object recognition, semantic
segmentation) and the fact that humans serve as the gold standard in assessing
clustering algorithms, here, we advocate for a unified treatment of the two
problems and suggest that hierarchical frameworks that progressively build
complex patterns on top of the simpler ones (e.g., convolutional neural
networks) offer a promising solution. We do not dwell much on the learning
mechanisms in these frameworks as they are still a matter of debate, with
respect to biological constraints. Instead, we emphasize on the
compositionality of the real world structures and objects. In particular, we
show that CNNs, trained end to end using back propagation with noisy labels,
are able to cluster data points belonging to several overlapping shapes, and do
so much better than the state of the art algorithms. The main takeaway lesson
from our study is that mechanisms of human vision, particularly the hierarchal
organization of the visual ventral stream should be taken into account in
clustering algorithms (e.g., for learning representations in an unsupervised
manner or with minimum supervision) to reach human level clustering
performance. This, by no means, suggests that other methods do not hold merits.
For example, methods relying on pairwise affinities (e.g., spectral clustering)
have been very successful in many cases but still fail in some cases (e.g.,
overlapping clusters).
Distributed Transfer Linear Support Vector Machines
Rui Zhang , Quanyan Zhu Subjects : Learning (cs.LG) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Transfer learning has been developed to improve the performances of different
but related tasks in machine learning. However, such processes become less
efficient with the increase of the size of training data and the number of
tasks. Moreover, privacy can be violated as some tasks may contain sensitive
and private data, which are communicated between nodes and tasks. We propose a
consensus-based distributed transfer learning framework, where several tasks
aim to find the best linear support vector machine (SVM) classifiers in a
distributed network. With alternating direction method of multipliers, tasks
can achieve better classification accuracies more efficiently and privately, as
each node and each task train with their own data, and only decision variables
are transferred between different tasks and nodes. Numerical experiments on
MNIST datasets show that the knowledge transferred from the source tasks can be
used to decrease the risks of the target tasks that lack training data or have
unbalanced training labels. We show that the risks of the target tasks in the
nodes without the data of the source tasks can also be reduced using the
information transferred from the nodes who contain the data of the source
tasks. We also show that the target tasks can enter and leave in real-time
without rerunning the whole algorithm.
Biased Bagging for Unsupervised Domain Adaptation
Comments: Source code to reproduce our experiments is available at this http URL
Subjects:
Machine Learning (stat.ML)
; Learning (cs.LG)
Unsupervised domain adaptation (DA) is an active research area whose
development and applications have been boosted by the explosion of data without
annotation. Manual labeling is tedious, error prone and expensive, therefore
unsupervised DA is needed more and more to automatize this task: an unlabeled
dataset (target) is annotated using a labeled dataset (source) from a related
domain. The majority of DA methods try to directly match the distributions of
the source and target data. Nevertheless, recent DA methods still suffer from
issues such as the incapability to scale to high dimensional data or the
sensitivity to the (hyper-)parameters of the adaptation procedure.
We propose to overcome these issues by using bagging. The main idea is to
directly embed a given source hypothesis inside a bagging procedure in order to
generate a sequence of good target hypotheses related to the source, which are
then used in a voting method for predicting labels of target data. Our method
is extremely easy to implement and apply. Its effectiveness is demonstrated by
a recent theoretical study on the generalization performance of voting methods
in terms of large mean and low variance of the margin distribution. A
qualitative analysis of our method shows that it tends to increase the mean and
to reduce the variance of the target margin distribution, which is beneficial
for generalization. We report state-of-the-art performance on benchmark
datasets for adaptation tasks using text and image datasets with diverse
characteristics, such as high number of features, large number of classes, and
based on deep input features.
Topic supervised non-negative matrix factorization
Comments: 20 pages, 7 figures
Subjects:
Computation and Language (cs.CL)
; Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
Topic models have been extensively used to organize and interpret the
contents of large, unstructured corpora of text documents. Although topic
models often perform well on traditional training vs. test set evaluations, it
is often the case that the results of a topic model do not align with human
interpretation. This interpretability fallacy is largely due to the
unsupervised nature of topic models, which prohibits any user guidance on the
results of a model. In this paper, we introduce a semi-supervised method called
topic supervised non-negative matrix factorization (TS-NMF) that enables the
user to provide labeled example documents to promote the discovery of more
meaningful semantic structure of a corpus. In this way, the results of TS-NMF
better match the intuition and desired labeling of the user. The core of TS-NMF
relies on solving a non-convex optimization problem for which we derive an
iterative algorithm that is shown to be monotonic and convergent to a local
optimum. We demonstrate the practical utility of TS-NMF on the Reuters and
PubMed corpora, and find that TS-NMF is especially useful for conceptual or
broad topics, where topic key terms are not well understood. Although
identifying an optimal latent structure for the data is not a primary objective
of the proposed approach, we find that TS-NMF achieves higher weighted Jaccard
similarity scores than the contemporary methods, (unsupervised) NMF and latent
Dirichlet allocation, at supervision rates as low as 10% to 20%.
Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
Suncong Zheng , Feng Wang , Hongyun Bao , Yuexing Hao , Peng Zhou , Bo Xu Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Learning (cs.LG)
Joint extraction of entities and relations is an important task in
information extraction. To tackle this problem, we firstly propose a novel
tagging scheme that can convert the joint extraction task to a tagging problem.
Then, based on our tagging scheme, we study different end-to-end models to
extract entities and their relations directly, without identifying entities and
relations separately. We conduct experiments on a public dataset produced by
distant supervision method and the experimental results show that the tagging
based methods are better than most of the existing pipelined and joint learning
methods. What’s more, the end-to-end model proposed in this paper, achieves the
best results on the public dataset.
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
Comments: ICML 2017
Subjects:
Artificial Intelligence (cs.AI)
; Learning (cs.LG)
As a step towards developing zero-shot task generalization capabilities in
reinforcement learning (RL), we introduce a new RL problem where the agent
should learn to execute sequences of instructions after learning useful skills
that solve subtasks. In this problem, we consider two types of generalizations:
to previously unseen instructions and to longer sequences of instructions. For
generalization over unseen instructions, we propose a new objective which
encourages learning correspondences between similar subtasks by making
analogies. For generalization over sequential instructions, we present a
hierarchical architecture where a meta controller learns to use the acquired
skills for executing the instructions. To deal with delayed reward, we propose
a new neural architecture in the meta controller that learns when to update the
subtask, which makes learning more efficient. Experimental results on a
stochastic 3D domain show that the proposed ideas are crucial for
generalization to longer instructions as well as unseen instructions.
Gradient Descent for Spiking Neural Networks
Dongsung Huh , Terrence J. Sejnowski Subjects : Neurons and Cognition (q-bio.NC) ; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Much of studies on neural computation are based on network models of static
neurons that produce analog output, despite the fact that information
processing in the brain is predominantly carried out by dynamic neurons that
produce discrete pulses called spikes. Research in spike-based computation has
been impeded by the lack of efficient supervised learning algorithm for spiking
networks. Here, we present a gradient descent method for optimizing spiking
network models by introducing a differentiable formulation of spiking networks
and deriving the exact gradient calculation. For demonstration, we trained
recurrent spiking networks on two dynamic tasks: one that requires optimizing
fast (~millisecond) spike-based interactions for efficient encoding of
information, and a delayed memory XOR task over extended duration (~second).
The results show that our method indeed optimizes the spiking network dynamics
on the time scale of individual spikes as well as behavioral time scales. In
conclusion, our result offers a general purpose supervised learning algorithm
for spiking neural networks, thus advancing further investigations on
spike-based computation.
Information Theory
A Survey on Non-Orthogonal Multiple Access for 5G Networks: Research Challenges and Future Trends
Comments: to appear in IEEE JSAC, 2017
Subjects:
Information Theory (cs.IT)
Non-orthogonal multiple access (NOMA) is an essential enabling technology for
the fifth generation (5G) wireless networks to meet the heterogeneous demands
on low latency, high reliability, massive connectivity, improved fairness, and
high throughput. The key idea behind NOMA is to serve multiple users in the
same resource block, such as a time slot, subcarrier, or spreading code. The
NOMA principle is a general framework, and several recently proposed 5G
multiple access schemes can be viewed as special cases. This survey provides an
overview of the latest NOMA research and innovations as well as their
applications. Thereby, the papers published in this special issue are put into
the content of the existing literature. Future research challenges regarding
NOMA in 5G and beyond are also discussed.
NOMA in Distributed Antenna System for Max-Min Fairness and Max-Sum-Rate
Dong-Jun Han , Minseok Choi , Jaekyun Moon Subjects : Information Theory (cs.IT)
Distributed antenna system (DAS) has been deployed for over a decade. DAS has
advantages in capacity especially for the cell edge users, in both single-cell
and multi-cell environments. In this paper, non-orthogonal multiple access
(NOMA) is suggested in single-cell DAS to maximize user fairness and sumrate.
Two transmission strategies are considered: NOMA with single selection
transmission and NOMA with blanket transmission. In a two-user scenario, the
center base station (BS) uses NOMA to serve both users and the remote radio
units (RRUs) serve only the weak user based on the transmission scheme. The
signals sent from the RRUs are not considered as interference at both user
sides. At one side, the signals from the RRUs are used to detect the required
data and at the other side, the signals are used to perform successive
interference cancellation (SIC). The max-min performance is studied when
instantaneous channel gain information (CGI) or channel distribution
information (CDI) is known at the transmitter. A closed-form expression of the
upper bound is derived for the data rate of each user with only CDI. In
addition, the sum-rate is studied with a minimum rate constraint when
instantaneous CGI known at the transmitter. The results show that NOMA with
blanket transmission gives the best performance compared to other schemes;
conventional-NOMA, conventional single selection scheme and joint-transmission
(JT) NOMA. It is also shown that with less transmit power, NOMA with single
selection transmission gives better performance than conventional-NOMA.
Precoded Chebyshev-NLMS based pre-distorter for nonlinear LED compensation in NOMA-VLC
Comments: R. Mitra and V. Bhatia are with Indian Institute of Technology Indore, Indore-453552, India, Email:phd1301202010@iiti.ac.in, vbhatia@iiti.ac.in. This work was submitted to IEEE Transactions on Communications on October 26, 2016, decisioned on March 3, 2017, and revised on April 25, 2017, and is currently under review in IEEE Transactions on Communications
Subjects:
Information Theory (cs.IT)
Visible light communication (VLC) is one of the main technologies driving the
future 5G communication systems due to its ability to support high data rates
with low power consumption, thereby facilitating high speed green
communications. To further increase the capacity of VLC systems, a technique
called non-orthogonal multiple access (NOMA) has been suggested to cater to
increasing demand for bandwidth, whereby users’ signals are superimposed prior
to transmission and detected at each user equipment using successive
interference cancellation (SIC). Some recent results on NOMA exist which
greatly enhance the achievable capacity as compared to orthogonal multiple
access techniques. However, one of the performance-limiting factors affecting
VLC systems is the nonlinear characteristics of a light emitting diode (LED).
This paper considers the nonlinear LED characteristics in the design of
pre-distorter for cognitive radio inspired NOMA in VLC, and proposes singular
value decomposition based Chebyshev precoding to improve performance of
nonlinear multiple-input multiple output NOMA-VLC. A novel and generalized
power allocation strategy is also derived in this work, which is valid even in
scenarios when users experience similar channels. Additionally, in this work,
analytical upper bounds for the bit error rate of the proposed detector are
derived for square (M)-quadrature amplitude modulation.
Sparsity Order Estimation from a Single Compressed Observation Vector
Sebastian Semper , Florian Römer , Thomas Hotz , Giovanni DelGaldo Subjects : Information Theory (cs.IT)
We investigate the problem of estimating the unknown degree of sparsity from
compressive measurements without the need to carry out a sparse recovery step.
While the sparsity order can be directly inferred from the effective rank of
the observation matrix in the multiple snapshot case, this appears to be
impossible in the more challenging single snapshot case. We show that specially
designed measurement matrices allow to rearrange the measurement vector into a
matrix such that its effective rank coincides with the effective sparsity
order. In fact, we prove that matrices which are composed of a Khatri-Rao
product of smaller matrices generate measurements that allow to infer the
sparsity order. Moreover, if some samples are used more than once, one of the
matrices needs to be Vandermonde. These structural constraints reduce the
degrees of freedom in choosing the measurement matrix which may incur in a
degradation in the achievable coherence. We thus also address suitable choices
of the measurement matrices. In particular, we analyze Khatri-Rao and
Vandermonde matrices in terms of their coherence and provide a new design for
Vandermonde matrices that achieves a low coherence.
Wireless Link Capacity under Shadowing and Fading
Comments: 20 pages. In Mobihoc’17
Subjects:
Information Theory (cs.IT)
; Networking and Internet Architecture (cs.NI)
We consider the following basic link capacity (a.k.a., one-shot scheduling)
problem in wireless networks: Given a set of communication links, find a
maximum subset of links that can successfully transmit simultaneously. Good
performance guarantees are known only for deterministic models, such as the
physical model with geometric (log-distance) pathloss. We treat this problem
under stochastic shadowing under general distributions, bound the effects of
shadowing on optimal capacity, and derive constant approximation algorithms. We
also consider temporal fading under Rayleigh distribution, and show that it
affects non-fading solutions only by a constant-factor. These can be combined
into a constant approximation link capacity algorithm under both time-invariant
shadowing and temporal fading.
Successive Cancellation Decoding of Single Parity-Check Product Codes
Mustafa Cemil Coşkun , Gianluigi Liva , Alexandre Graell i Amat , Michael Lentmaier Subjects : Information Theory (cs.IT)
We introduce successive cancellation (SC) decoding of product codes (PCs)
with single parity-check (SPC) component codes. Recursive formulas are derived,
which resemble the SC decoding algorithm of polar codes. We analyze the error
probability of SPC-PCs over the binary erasure channel under SC decoding. A
bridge with the analysis of PCs introduced by Elias in 1954 is also
established. Furthermore, bounds on the block error probability under SC
decoding are provided, and compared to the bounds under the original decoding
algorithm proposed by Elias. It is shown that SC decoding of SPC-PCs achieves a
lower block error probability than Elias’ decoding.
Conditions for Unique Reconstruction of Sparse Signals Using Compressive Sensing Methods
Comments: 32 pages, 3 figures, 4 MATLAB programs
Subjects:
Information Theory (cs.IT)
A signal is sparse in one of its representation domain if the number of
nonzero coefficients in that domain is much smaller than the total number of
coefficients. Sparse signals can be reconstructed from a very reduced set of
measurements/observations. The topic of this paper are conditions for the
unique reconstruction of sparse signals from a reduced set of observations.
After the basic definitions are introduced, the unique reconstruction
conditions are reviewed using the spark, restricted isometry, and coherence of
the measurement matrix. Uniqueness of the reconstruction of signals sparse in
the discrete Fourier domain (DFT), as the most important signal transformation
domain, is considered as well.
Base Station Selection for Massive MIMO Networks with Two-stage Precoding
Comments: Submitted to IEEE Wireless Communications Letters
Subjects:
Information Theory (cs.IT)
The two-stage precoding has been proposed to reduce the overhead of both the
channel training and the channel state information (CSI) feedback for the
massive multiple-input multiple-output (MIMO) system. But the overlap of the
angle-spreading-ranges (ASR) for different user clusters may seriously degrade
the performance of the two-stage precoding. In this letter, we propose one ASR
overlap mitigating scheme through the base station (BS) selection. Firstly, the
BS selection is formulated as a sum signal-to-interference-plus-noise ratio
(SINR) maximization problem. Then, the problem is solved by a low-complex
algorithm through maximizing signal-to-leakage-plus-noise ratio (SLNR). In
addition, we propose one low-overhead algorithm with the lower bound on the
average SLNR as the objective function. Finally, we demonstrate the efficacy of
the proposed schemes through the numerical simulations.
Comments: Submitted to IEEE Transactions on Wireless Communications
Subjects:
Information Theory (cs.IT)
As a revolutionary wireless transmission strategy, interference alignment
(IA) can improve the capacity of the cell-edge users. However, the acquisition
of the global channel state information (CSI) for IA leads to unacceptable
overhead in the massive MIMO systems. To tackle this problem, in this paper, we
propose an IA and soft-space-reuse (IA-SSR) based cooperative transmission
scheme under the two-stage precoding framework. Specifically, the cell-center
and the cell-edge users are separately treated to fully exploit the spatial
degrees of freedoms (DoF). Then, the optimal power allocation policy is
developed to maximize the sum-capacity of the network. Next, a low-cost channel
estimator is designed for the proposed IA-SSR framework. Some practical issues
in IA-SSR implementation are also discussed. Finally, plenty of numerical
results are presented to show the efficiency of the proposed algorithm.
Spectral Domain Sampling of Graph Signals
Comments: Submitted to IEEE Transactions on Signal Processing
Subjects:
Information Theory (cs.IT)
Sampling methods for graph signals in the graph spectral domain are proposed.
Conventional sampling of graph signals can be regarded as sampling in the graph
vertex domain, but it does not have desired characteristics in regard to the
graph spectral domain. Down- and upsampled graph signals by using our methods
inherit the frequency domain characteristics of sampled signals defined in the
time/spatial domain. Properties of the sampling effects are studied
theoretically and experimentally in comparison with the conventional sampling
method in the vertex domain. Fractional sampling and Laplacian pyramid of graph
signals are also shown as possible applications.
On M-ary Distributed Detection for Power Constraint Wireless Sensor Networks
Comments: arXiv admin note: text overlap with arXiv:cs/0703046 by other authors
Subjects:
Information Theory (cs.IT)
We consider a wireless sensor network (WSN), consisting of several sensors
and a fusion center (FC), which is tasked with solving an M-ary hypothesis
testing problem. Sensors make M-ary decisions and transmit their digitally
modulated decisions over orthogonal channels, which are subject to Rayleigh
fading and noise, to the FC. Adopting Bayesian optimality criterion, we
consider training and non-training based distributed detection systems and
investigate the effect of imperfect channel state information (CSI) on the
optimal maximum a posteriori probability (MAP) fusion rules and optimal power
allocation between sensors, when the sum of training and data symbol transmit
powers is fixed. We consider J-divergence criteria to do power allocation
between sensors. The theoretical results show that J-divergence for coherent
reception will be maximized if total training power be half of total power,
however for non coherent reception, optimal training power which maximize
J-divergence is zero. The simulated results also show that probability of error
will be minimized if training power be half of total power for coherent
reception and zero for non coherent reception.
Comments: accepted in IEEE TVT
Subjects:
Information Theory (cs.IT)
Given the critical dependence of broadcast channels by the accuracy of
channel state information at the transmitter (CSIT), we develop a general
downlink model with zero-forcing (ZF) precoding, applied in realistic
heterogeneous cellular systems with multiple antenna base stations (BSs).
Specifically, we take into consideration imperfect CSIT due to pilot
contamination, channel aging due to users relative movement, and unavoidable
residual additive transceiver hardware impairments (RATHIs). Assuming that the
BSs are Poisson distributed, the main contributions focus on the derivations of
the upper bound of the coverage probability and the achievable user rate for
this general model. We show that both the coverage probability and the user
rate are dependent on the imperfect CSIT and RATHIs. More concretely, we
quantify the resultant performance loss of the network due to these effects. We
depict that the uplink RATHIs have equal impact, but the downlink transmit BS
distortion has a greater impact than the receive hardware impairment of the
user. Thus, the transmit BS hardware should be of better quality than user’s
receive hardware. Furthermore, we characterise both the coverage probability
and user rate in terms of the time variation of the channel. It is shown that
both of them decrease with increasing user mobility, but after a specific value
of the normalised Doppler shift, they increase again. Actually, the time
variation, following the Jakes autocorrelation function, mirrors this effect on
coverage probability and user rate. Finally, we consider space division
multiple access (SDMA), single user beamforming (SU-BF), and baseline
single-input single-output (SISO) transmission. A comparison among these
schemes reveals that the coverage by means of SU-BF outperforms SDMA in terms
of coverage.
Phaseless Reconstruction from Space-Time Samples
Comments: 23 pages, 4 figures
Subjects:
Functional Analysis (math.FA)
; Information Theory (cs.IT)
Phaseless reconstruction from space-time samples is a nonlinear problem of
recovering a function (x) in a Hilbert space (mathcal{H}) from the modulus of
linear measurements ({lvert langle x, phi_i
angle
vert), ( ldots),
(lvert langle A^{L_i}x, phi_i
angle
vert : i inmathscr I}), where
({phi_i; i inmathscr I}subset mathcal{H}) is a set of functionals on
(mathcal{H}), and (A) is a bounded operator on (mathcal{H}) that acts as an
evolution operator. In this paper, we provide various sufficient or necessary
conditions for solving this problem, which has connections to (X)-ray
crystallography, the scattering transform, and deep learning.
Nonbacktracking Bounds on the Influence in Independent Cascade Models
Emmanuel Abbe , Sanjeev Kulkarni , Eun Jee Lee Subjects : Social and Information Networks (cs.SI) ; Computational Complexity (cs.CC); Information Theory (cs.IT); Probability (math.PR); Machine Learning (stat.ML)
This paper develops upper and lower bounds on the influence measure in a
network, more precisely, the expected number of nodes that a seed set can
influence in the independent cascade model. In particular, our bounds exploit
nonbacktracking walks, Fortuin-Kasteleyn-Ginibre (FKG) type inequalities, and
are computed by message passing implementation. Nonbacktracking walks have
recently allowed for headways in community detection, and this paper shows that
their use can also impact the influence computation. Further, we provide a knob
to control the trade-off between the efficiency and the accuracy of the bounds.
Finally, the tightness of the bounds is illustrated with simulations on various
network models.
Spatial Coding Techniques for Molecular MIMO
Martin Damrath , H. Birkan Yilmaz , Chan-Byoung Chae , Peter Adam Hoeher Subjects : Emerging Technologies (cs.ET) ; Information Theory (cs.IT)
This paper studies spatial diversity techniques applied to multiple-input
multiple-output (MIMO) diffusion-based molecular communications (DBMC). Two
types of spatial coding techniques, namely Alamouti-type coding and repetition
MIMO coding are suggested and analyzed. In addition, we consider receiver-side
equal-gain combining, which is equivalent to maximum-ratio combining in
symmetrical scenarios. For numerical analysis, the channel impulse responses of
a symmetrical (2 imes 2) MIMO-DBMC system are acquired by a trained
artificial neural network. It is demonstrated that spatial diversity has the
potential to improve the system performance and that repetition MIMO coding
outperforms Alamouti-type coding.
欢迎加入我爱机器学习QQ11群:191401275
微信扫一扫,关注我爱机器学习公众号
微博:我爱机器学习
以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。