Neural and Evolutionary Computing
DLGA-PDE: Discovery of PDEs with Incomplete Candidate Library via Combination of Deep Learning and Genetic Algorithm
Hao Xu, Haibin Chang, Dongxiao Zhang. Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Data-driven methods have recently been developed to discover underlying
partial differential equations (PDEs) of physical problems. However, these
methods usually require a complete candidate library of potential terms in a
PDE. To overcome this limitation, we propose a novel framework combining deep
learning and a genetic algorithm, called DLGA-PDE, for discovering PDEs. In
the proposed framework, a deep neural network that is trained with available
data of a physical problem is utilized to generate meta-data and calculate
derivatives, and the genetic algorithm is then employed to discover the
underlying PDE. Owing to the merits of the genetic algorithm, such as mutation
and crossover, DLGA-PDE can work with an incomplete candidate library. The
proposed DLGA-PDE is tested on the discovery of the Korteweg-de Vries (KdV)
equation, the Burgers equation, the wave equation, and the Chaffee-Infante
equation as a proof of concept. Satisfactory results are obtained
without the need for a complete candidate library, even in the presence of
noisy and limited data.
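
The genetic-algorithm half of such a pipeline is easy to isolate. Below is a minimal sketch, assuming the time derivative u_t and the candidate term columns have already been evaluated (e.g., by differentiating a trained network); the library, coefficients, and GA settings are illustrative, not the paper's.

```python
# Sketch: genetic-algorithm selection of PDE terms from a candidate library.
# Assumes u_t and the term columns were precomputed, e.g. via automatic
# differentiation of a trained network. All values here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Hypothetical precomputed columns: [u, u_x, u_xx, u*u_x, u^2]
theta = rng.standard_normal((n, 5))
u_t = -1.0 * theta[:, 3] + 0.1 * theta[:, 2]     # "true" PDE: u_t = -u u_x + 0.1 u_xx

def fitness(mask):
    """Least-squares residual of u_t on the selected terms, plus a size penalty."""
    if not mask.any():
        return np.inf
    coef, *_ = np.linalg.lstsq(theta[:, mask], u_t, rcond=None)
    resid = u_t - theta[:, mask] @ coef
    return np.mean(resid**2) + 1e-3 * mask.sum()  # parsimony pressure

pop = rng.integers(0, 2, size=(40, 5)).astype(bool)
for gen in range(100):
    scores = np.array([fitness(m) for m in pop])
    pop = pop[np.argsort(scores)][:20]            # truncation selection
    children = []
    for _ in range(20):
        a, b = pop[rng.integers(20)], pop[rng.integers(20)]
        cut = rng.integers(1, 5)
        child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
        children.append(child ^ (rng.random(5) < 0.1))  # bit-flip mutation
    pop = np.vstack([pop, children])

best = pop[np.argmin([fitness(m) for m in pop])]
print("selected terms:", best)                    # expect terms 2 and 3
```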
Multi-factorial Optimization for Large-scale Virtual Machine Placement in Cloud Computing
Zhengping Liang, Jian Zhang, Liang Feng, Zexuan Zhu. Subjects: Neural and Evolutionary Computing (cs.NE)
The placement scheme of virtual machines (VMs) to physical servers (PSs) is
crucial to lowering the operational cost of cloud providers. Evolutionary
algorithms (EAs) have shown promise in solving virtual machine placement (VMP)
problems in the past. However, as the demand for cloud services grows, existing
EAs fail to scale to the large-scale virtual machine placement (LVMP) problem
due to their high time complexity and poor scalability.
Recently, the multi-factorial optimization (MFO) technology has surfaced as a
new search paradigm in evolutionary computing. It offers the ability to evolve
multiple optimization tasks simultaneously during the evolutionary process.
This paper aims to apply the MFO technology to the LVMP problem in a
heterogeneous environment. Firstly, we formulate a deployment-cost-based VMP
problem in the form of an MFO problem. Then, a multi-factorial evolutionary
algorithm (MFEA) embedded with a greedy-based allocation operator is developed
to address the established MFO problem. After that, a re-migration and merge
operator is designed to derive an integrated solution to the LVMP problem from
the solutions of the MFO problem. To assess the effectiveness of the proposed
method, simulation experiments are carried out on large-scale and
extra-large-scale VM test data sets. The results show that, compared with
various heuristic methods, our method shortens optimization time significantly
and offers a competitive placement solution for the LVMP problem in a
heterogeneous environment.
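
The greedy-based allocation operator is the component easiest to illustrate in isolation. The sketch below decodes a list of VM demands into a placement with a first-fit-decreasing heuristic; the two-resource model, capacities, and names are assumptions, not the paper's exact operator.

```python
# Sketch: a greedy first-fit-decreasing allocation for VM placement, the kind
# of operator an MFEA can embed to decode VM demands into a feasible plan.
# Resource model (CPU, memory) and capacities are assumed for illustration.
from typing import List, Tuple

def greedy_place(vms: List[Tuple[float, float]],
                 ps_capacity: Tuple[float, float]) -> List[List[int]]:
    """Place VMs (cpu, mem demands) onto identical PSs, opening servers lazily."""
    order = sorted(range(len(vms)), key=lambda i: max(vms[i]), reverse=True)
    servers: List[List[int]] = []   # indices of VMs per physical server
    free: List[List[float]] = []    # remaining (cpu, mem) per server
    for i in order:
        cpu, mem = vms[i]
        for s, (fc, fm) in enumerate(free):
            if cpu <= fc and mem <= fm:          # first server that fits
                servers[s].append(i)
                free[s] = [fc - cpu, fm - mem]
                break
        else:                                     # open a new server
            servers.append([i])
            free.append([ps_capacity[0] - cpu, ps_capacity[1] - mem])
    return servers

plan = greedy_place([(0.5, 0.3), (0.4, 0.6), (0.2, 0.2), (0.7, 0.5)], (1.0, 1.0))
print(len(plan), "servers used:", plan)
```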
Population-based metaheuristics for Association Rule Text Mining
Iztok Fister Jr., Suash Deb, Iztok Fister. Subjects: Neural and Evolutionary Computing (cs.NE)
Nowadays, the majority of data on the Internet is held in an unstructured
format, like websites and e-mails. The importance of analyzing these data has
been growing day by day. Similar to data mining on structured data, text mining
methods for handling unstructured data have also received increasing attention
from the research community. The paper deals with the problem of Association
Rule Text Mining. To solve this problem, the PSO-ARTM method is proposed,
consisting of three steps: text preprocessing, Association Rule Text Mining
using population-based metaheuristics, and text postprocessing. The method was
applied to a transaction database obtained from professional triathlon
athletes’ blogs and news posted on their websites. The obtained results reveal
that the proposed method is suitable for Association Rule Text Mining and,
therefore, offers a promising way for further development.
EdgeNets: Edge Varying Graph Neural Networks
Elvin Isufi, Fernando Gama, Alejandro Ribeiro. Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP); Machine Learning (stat.ML)
Driven by the outstanding performance of neural networks in the structured
Euclidean domain, recent years have seen a surge of interest in developing
neural networks for graphs and data supported on graphs. The graph is leveraged
as a parameterization to capture detail at the node level with a reduced number
of parameters and complexity. Following this rationale, this paper puts forth a
general framework that unifies state-of-the-art graph neural networks (GNNs)
through the concept of EdgeNet. An EdgeNet is a GNN architecture that allows
different nodes to use different parameters to weigh the information of
different neighbors. By extrapolating this strategy to more iterations between
neighboring nodes, the EdgeNet learns edge- and neighbor-dependent weights to
capture local detail. This is the most general local operation that a node can
do and encompasses under one formulation all graph convolutional neural
networks (GCNNs) as well as graph attention networks (GATs). In writing
different GNN architectures with a common language, EdgeNets highlight specific
architecture advantages and limitations, while providing guidelines to improve
their capacity without compromising their local implementation. For instance,
we show that GCNNs have a parameter sharing structure that induces permutation
equivariance. This can be an advantage or a limitation, depending on the
application. When it is a limitation, we propose hybrid approaches and provide
insights to develop several other solutions that promote parameter sharing
without enforcing permutation equivariance. Another interesting conclusion is
the unification of GCNNs and GATs, approaches that have so far been perceived
as separate. In particular, we show that GATs are GCNNs on a graph that is
learned from the features. This particularization opens the doors to develop
alternative attention mechanisms for improving discriminatory power.
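
To make the edge-varying idea concrete, here is a minimal numpy sketch of the core EdgeNet operation, with random weights standing in for learned ones; sizes and initialization are illustrative.

```python
# Sketch: an edge-varying graph filter, the core EdgeNet operation. Each of the
# K hops applies its own weight per edge (support masked by A + I), so every
# node weighs every neighbor differently. Shapes and init are illustrative.
import numpy as np

rng = np.random.default_rng(1)
N, K = 6, 3
A = (rng.random((N, N)) < 0.4).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 1.0)                        # symmetric support + self-loops

x = rng.standard_normal(N)
Phi = [rng.standard_normal((N, N)) * A for _ in range(K)]  # per-hop edge weights

y, z = np.zeros(N), x.copy()
for k in range(K):
    z = Phi[k] @ z   # k-hop neighborhood, with edge- and hop-dependent weights
    y += z           # aggregate multi-hop contributions
# A GCNN is the special case Phi[k] = h_k * S for a shared graph shift S,
# which is the parameter sharing that induces its permutation equivariance.
print(y)
```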
Ensemble Genetic Programming
Comments: EuroGP 2020 submission
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Ensemble learning is a powerful paradigm that has been used in the top
state-of-the-art machine learning methods like Random Forests and XGBoost.
Inspired by the success of such methods, we have developed a new Genetic
Programming method called Ensemble GP. The evolutionary cycle of Ensemble GP
follows the same steps as other Genetic Programming systems, but with
differences in the population structure, fitness evaluation and genetic
operators. We have tested this method on eight binary classification problems,
achieving results significantly better than standard GP, with much smaller
models. Although other methods like M3GP and XGBoost were the best overall,
Ensemble GP was able to achieve exceptionally good generalization results on a
particularly hard problem where none of the other methods was able to succeed.
An Efficient Framework for Automated Screening of Clinically Significant Macular Edema
Renoh Johnson Chalakkal, Faizal Hafiz, Waleed Abdulla, Akshya Swain. Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The present study proposes a new approach to automated screening of
Clinically Significant Macular Edema (CSME) and addresses two major challenges
associated with such screenings, i.e., exudate segmentation and imbalanced
datasets. The proposed approach replaces the conventional exudate segmentation
based feature extraction by combining a pre-trained deep neural network with
meta-heuristic feature selection. A feature-space over-sampling technique is
used to overcome the effects of skewed datasets, and the screening is
accomplished by a k-NN based classifier. The role of each data-processing step
(e.g., class balancing, feature selection) and the effects of limiting the
region-of-interest to fovea on the classification performance are critically
analyzed. Finally, the selection and implications of the operating point on the
Receiver Operating Characteristic curve are discussed. The results of this study
convincingly demonstrate that by following these fundamental practices of
machine learning, a basic k-NN based classifier could effectively accomplish
the CSME screening.
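
A hedged sketch of these data-processing steps, with random features standing in for the pre-trained CNN embedding and naive jitter-based duplication standing in for the paper's over-sampling technique; all names and values here are assumptions.

```python
# Sketch of the screening pipeline's data-processing steps: deep features from
# a pre-trained CNN (stubbed with random features), minority over-sampling in
# feature space, and a k-NN classifier. Feature selection is omitted.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 128))          # stand-in for pre-trained CNN features
y = (rng.random(300) < 0.15).astype(int)     # imbalanced labels: few CSME positives

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Feature-space over-sampling: duplicate minority samples with small jitter.
minority = np.flatnonzero(y_tr == 1)
need = (y_tr == 0).sum() - minority.size
extra = rng.choice(minority, size=need, replace=True)
X_bal = np.vstack([X_tr, X_tr[extra] + 0.01 * rng.standard_normal((need, 128))])
y_bal = np.concatenate([y_tr, np.ones(need, dtype=int)])

clf = KNeighborsClassifier(n_neighbors=5).fit(X_bal, y_bal)
print("test accuracy:", clf.score(X_te, y_te))
```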
Memory capacity of neural networks with threshold and ReLU activations
Comments: 25 pages
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Overwhelming theoretical and empirical evidence shows that mildly
overparametrized neural networks — those with more connections than the size
of the training data — are often able to memorize the training data with
100% accuracy. This was rigorously proved for networks with sigmoid
activation functions and, very recently, for ReLU activations. Addressing a
1988 open question of Baum, we prove that this phenomenon holds for general
multilayered perceptrons, i.e. neural networks with threshold activation
functions, or with any mix of threshold and ReLU activations. Our construction
is probabilistic and exploits sparsity.
Computer Vision and Pattern Recognition
SAUNet: Shape Attentive U-Net for Interpretable Medical Image Segmentation
Jesse Sun, Fatemeh Darbeha, Mark Zaidi, Bo Wang. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Medical image segmentation is a difficult but important task for many
clinical operations such as cardiac bi-ventricular volume estimation. More
recently, there has been a shift to utilizing deep learning and fully
convolutional neural networks (CNNs) to perform image segmentation that has
yielded state-of-the-art results in many public benchmark datasets. Despite the
progress of deep learning in medical image segmentation, standard CNNs are
still not fully adopted in clinical settings as they lack robustness and
interpretability. Shapes are generally more meaningful features than textures
alone, yet textures are the features regular CNNs tend to learn, causing a lack
of robustness. Likewise, previous works surrounding model interpretability have
been focused on post hoc gradient-based saliency methods. However,
gradient-based saliency methods typically require additional computations post
hoc and have been shown to be unreliable for interpretability. Thus, we present
a new architecture called Shape Attentive U-Net (SAUNet) which focuses on model
interpretability and robustness. The proposed architecture attempts to address
these limitations by the use of a secondary shape stream that captures rich
shape-dependent information in parallel with the regular texture stream.
Furthermore, we suggest that multi-resolution saliency maps can be learned using our
dual-attention decoder module which allows for multi-level interpretability and
mitigates the need for additional computations post hoc. Our method also
achieves state-of-the-art results on the two large public cardiac MRI image
segmentation datasets of SUN09 and AC17.
PatchPerPix for Instance Segmentation
Peter Hirsch, Lisa Mais, Dagmar Kainmueller. Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we present a novel method for proposal free instance
segmentation that can handle sophisticated object shapes that span large parts
of an image and form dense object clusters with crossovers. Our method is based
on predicting dense local shape descriptors, which we assemble to form
instances. All instances are assembled simultaneously in one go. To our
knowledge, our method is the first non-iterative method that guarantees
instances to be composed of learnt shape patches. We evaluate our method on a
variety of data domains, where it defines the new state of the art on two
challenging benchmarks, namely the ISBI 2012 EM segmentation benchmark, and the
BBBC010 C. elegans dataset. We furthermore show that our method also performs
well on 3D image data and can handle even extreme cases of complex shape
clusters.
Geometric Proxies for Live RGB-D Stream Enhancement and Consolidation
Comments: extension of our ECCV 2018 paper at this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a geometric superstructure for unified real-time processing of
RGB-D data. Modern RGB-D sensors are widely used for indoor 3D capture, with
applications ranging from modeling to robotics, through augmented reality.
Nevertheless, their use is limited by their low resolution, with frames often
corrupted by noise, missing data and temporal inconsistencies. Our approach
consists of generating and updating through time a single set of compact local
statistics parameterized over detected geometric proxies, which are fed from
raw RGB-D data. Our proxies provide several processing primitives, which
improve the quality of the RGB-D stream on the fly or lighten further
operations. Experimental results confirm that our lightweight analysis
framework copes well with embedded execution as well as moderate memory and
computational capabilities compared to state-of-the-art methods. Processing
RGB-D data with our proxies allows noise and temporal flickering removal, hole
filling and resampling. As a substitute of the observed scene, our proxies can
additionally be applied to compression and scene reconstruction. We present
experiments performed with our framework in indoor scenes of different natures
within a recent open RGB-D dataset.
Multimodal Deep Unfolding for Guided Image Super-Resolution
Iman Marivani, Evaggelia Tsiligianni, Bruno Cornelis, Nikos Deligiannis. Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The reconstruction of a high resolution image given a low resolution
observation is an ill-posed inverse problem in imaging. Deep learning methods
rely on training data to learn an end-to-end mapping from a low-resolution
input to a high-resolution output. Unlike existing deep multimodal models that
do not incorporate domain knowledge about the problem, we propose a multimodal
deep learning design that incorporates sparse priors and allows the effective
integration of information from another image modality into the network
architecture. Our solution relies on a novel deep unfolding operator,
performing steps similar to an iterative algorithm for convolutional sparse
coding with side information; therefore, the proposed neural network is
interpretable by design. The deep unfolding architecture is used as a core
component of a multimodal framework for guided image super-resolution. An
alternative multimodal design is investigated by employing residual learning to
improve the training efficiency. The presented multimodal approach is applied
to super-resolution of near-infrared and multi-spectral images as well as depth
upsampling using RGB images as side information. Experimental results show that
our model outperforms state-of-the-art methods.
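
A minimal sketch of the unfolding idea, assuming a dense ISTA-style iteration in place of the paper's convolutional sparse coding; the per-layer matrices would be learned, and the side-information term is the extra input being fused.

```python
# Sketch: unfolding an ISTA-style algorithm with side information into network
# layers. Each "layer" is one iteration with its own matrices (random here,
# learnable in practice). Dense matrices stand in for convolutional operators.
import numpy as np

def soft(v, lam):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

rng = np.random.default_rng(0)
m, n = 64, 128
x = rng.standard_normal(m)        # low-resolution observation (flattened)
z = rng.standard_normal(n)        # sparse code of the guidance modality
u = np.zeros(n)

for layer in range(5):            # unfolded iterations == network depth
    W = rng.standard_normal((n, m)) * 0.1   # per-layer learnable "encoder"
    S = rng.standard_normal((n, n)) * 0.05  # per-layer learnable "mutual inhibition"
    G = rng.standard_normal((n, n)) * 0.05  # per-layer fusion of side information
    u = soft(W @ x + S @ u + G @ z, lam=0.1)

print("nonzeros in the estimated code:", np.count_nonzero(u))
```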
P^2-GAN: Efficient Style Transfer Using Single Style Image
Comments: 5 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Style transfer is a useful image synthesis technique that can re-render a given
image into another artistic style while preserving its content information.
Generative Adversarial Network (GAN) is a widely adopted framework toward this
task for its better representation ability on local style patterns than the
traditional Gram-matrix based methods. However, most previous methods rely on a
sufficient amount of pre-collected style images to train the model. In this
paper, a novel Patch Permutation GAN (P^2-GAN) network that can efficiently
learn the stroke style from a single style image is proposed. We use patch
permutation to generate multiple training samples from the given style image. A
patch discriminator that can simultaneously process patch-wise images and
natural images seamlessly is designed. We also propose a local texture
descriptor based criterion to quantitatively evaluate the style transfer
quality. Experimental results show that our method can produce finer-quality
re-renderings from a single style image, with improved computational efficiency
compared with many state-of-the-art methods.
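
The patch-permutation step can be sketched directly; the patch size, image, and batch size below are illustrative.

```python
# Sketch: building a batch of training crops from a single style image by patch
# permutation, as the abstract describes. Inputs here are placeholders.
import numpy as np

def patch_permute(style: np.ndarray, p: int, rng) -> np.ndarray:
    """Cut the image into p x p patches and reassemble them in random order."""
    h, w, c = style.shape
    gh, gw = h // p, w // p
    patches = [style[i*p:(i+1)*p, j*p:(j+1)*p] for i in range(gh) for j in range(gw)]
    perm = rng.permutation(len(patches))
    rows = [np.concatenate([patches[perm[r*gw + j]] for j in range(gw)], axis=1)
            for r in range(gh)]
    return np.concatenate(rows, axis=0)

rng = np.random.default_rng(0)
style = rng.random((128, 128, 3))
batch = np.stack([patch_permute(style, 32, rng) for _ in range(8)])
print(batch.shape)   # (8, 128, 128, 3): many samples from one style image
```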
Detecting Face2Face Facial Reenactment in Videos
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Visual content has become the primary source of information, as evident in
the billions of images and videos shared and uploaded on the Internet every
single day. This has led to an increase in alterations in images and videos to
make them more informative and eye-catching for the viewers worldwide. Some of
these alterations are simple, like copy-move, and are easily detectable, while
other sophisticated alterations like reenactment based DeepFakes are hard to
detect. Reenactment alterations allow the source to change the target
expressions and create photo-realistic images and videos. While the technology
can potentially be used for several applications, the malicious usage of
automatic reenactment has very large social implications. It is therefore
important to develop detection techniques to distinguish real images and videos
from altered ones. This research proposes a learning-based algorithm for detecting
reenactment based alterations. The proposed algorithm uses a multi-stream
network that learns regional artifacts and provides a robust performance at
various compression levels. We also propose a loss function for the balanced
learning of the streams for the proposed network. The performance is evaluated
on the publicly available FaceForensics dataset. The results show
state-of-the-art classification accuracy of 99.96%, 99.10%, and 91.20% for no,
easy, and hard compression factors, respectively.
Learning Diverse Features with Part-Level Resolution for Person Re-Identification
Comments: 8 pages, 5 figures, submitted to IEEE TCSVT
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Learning diverse features is key to the success of person re-identification.
Various part-based methods have been extensively proposed for learning local
representations, which, however, are still inferior to the best-performing
methods for person re-identification. This paper proposes to construct a strong
lightweight network architecture, termed PLR-OSNet, based on the idea of
Part-Level feature Resolution over the Omni-Scale Network (OSNet) for achieving
feature diversity. The proposed PLR-OSNet has two branches, one branch for
global feature representation and the other branch for local feature
representation. The local branch employs a uniform partition strategy for
part-level feature resolution but produces only a single identity-prediction
loss, which is in sharp contrast to the existing part-based methods. Empirical
evidence demonstrates that the proposed PLR-OSNet achieves state-of-the-art
performance on popular person Re-ID datasets, including Market1501,
DukeMTMC-reID and CUHK03, despite its small model size.
Evaluating Weakly Supervised Object Localization Methods Right
Junsuk Choe, Seong Joon Oh, Seungho Lee, Sanghyuk Chun, Zeynep Akata, Hyunjung Shim. Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Weakly-supervised object localization (WSOL) has gained popularity over recent
years for its promise to train localization models with only image-level
labels. Since the seminal WSOL work of class activation mapping (CAM), the
field has focused on how to expand the attention regions to cover objects more
broadly and localize them better. However, these strategies rely on full
localization supervision to validate hyperparameters and for model selection,
which is in principle prohibited under the WSOL setup. In this paper, we argue
that the WSOL task is ill-posed with only image-level labels, and propose a new
evaluation protocol where full supervision is limited to only a small held-out
set not overlapping with the test set. We observe that, under our protocol, the
five most recent WSOL methods have not made a major improvement over the CAM
baseline. Moreover, we report that existing WSOL methods have not reached the
few-shot learning baseline, where the full supervision at validation time is
used for model training instead. Based on our findings, we discuss some future
directions for WSOL.
An End-to-end Deep Learning Approach for Landmark Detection and Matching in Medical Images
Comments: SPIE Medical Imaging Conference – 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Anatomical landmark correspondences in medical images can provide additional
guidance information for the alignment of two images, which, in turn, is
crucial for many medical applications. However, manual landmark annotation is
labor-intensive. Therefore, we propose an end-to-end deep learning approach to
automatically detect landmark correspondences in pairs of two-dimensional (2D)
images. Our approach consists of a Siamese neural network, which is trained to
identify salient locations in images as landmarks and predict matching
probabilities for landmark pairs from two different images. We trained our
approach on 2D transverse slices from 168 lower abdominal Computed Tomography
(CT) scans. We tested the approach on 22,206 pairs of 2D slices with varying
levels of intensity, affine, and elastic transformations. The proposed approach
finds an average of 639, 466, and 370 landmark matches per image pair for
intensity, affine, and elastic transformations, respectively, with spatial
matching errors of at most 1 mm. Further, more than 99% of the landmark pairs
are within a spatial matching error of 2 mm, 4 mm, and 8 mm for image pairs
with intensity, affine, and elastic transformations, respectively. To
investigate the utility of our developed approach in a clinical setting, we
also tested our approach on pairs of transverse slices selected from follow-up
CT scans of three patients. Visual inspection of the results revealed landmark
matches in both bony anatomical regions as well as in soft tissues lacking
prominent intensity gradients.
From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds
Comments: Accepted to IEEE Robotics and Automation Letters 2020 | Video: this https URL | Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We propose a new method for segmentation-free joint estimation of orthogonal
planes, their intersection lines, relationship graph and corners lying at the
intersection of three orthogonal planes. Such unified scene exploration under
orthogonality allows for multitudes of applications such as semantic plane
detection or local and global scan alignment, which in turn can aid robot
localization or grasping tasks. Our two-stage pipeline involves a rough yet
joint estimation of orthogonal planes followed by a subsequent joint refinement
of plane parameters respecting their orthogonality relations. We form a graph
of these primitives, paving the way to the extraction of further reliable
features: lines and corners. Our experiments demonstrate the validity of our
approach in numerous scenarios from wall detection to 6D tracking, both on
synthetic and real data.
VMRFANet: View-Specific Multi-Receptive Field Attention Network for Person Re-identification
Comments: Accepted by ICAART2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Person re-identification (re-ID) aims to retrieve the same person across
different cameras. In practice, it remains a challenging task due to
background clutter, variations in body pose and view conditions, inaccurate
bounding box detection, etc. To tackle these issues, in this paper, we propose
a novel multi-receptive field attention (MRFA) module that utilizes filters of
various sizes to help the network focus on informative pixels. Besides, we
present a view-specific mechanism that guides the attention module to handle
the variation in view conditions. Moreover, we introduce a Gaussian horizontal
random cropping/padding method which further improves the robustness of our
proposed network. Comprehensive experiments demonstrate the effectiveness of
each component. Our method achieves 95.5% / 88.1% in rank-1 / mAP on
Market-1501, 88.9% / 80.0% on DukeMTMC-reID, 81.1% / 78.8% on CUHK03 labeled
dataset and 78.9% / 75.3% on CUHK03 detected dataset, outperforming current
state-of-the-art methods.
Transfer Learning using Neural Ordinary Differential Equations
Rajath S, Sumukh Aithal K, Natarajan Subramanyam. Subjects: Computer Vision and Pattern Recognition (cs.CV)
A concept of using Neural Ordinary Differential Equations (NODE) for transfer
learning is introduced. In this paper, we use EfficientNets to explore transfer
learning on the CIFAR-10 dataset, and we use NODE for fine-tuning our model.
Using NODE for fine-tuning provides more stability during training and
validation. These continuous-depth blocks can also trade off between numerical
precision and speed. Using Neural ODEs for transfer learning has resulted in
much more stable convergence of the loss function.
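
A hedged sketch of a continuous-depth block used as a fine-tuning head; a fixed-step RK4 integrator keeps it dependency-free (an adaptive solver such as torchdiffeq would expose the precision/speed trade-off mentioned above), and the feature width and class count are assumptions.

```python
# Sketch: a continuous-depth block usable as a fine-tuning head, in the spirit
# of the abstract. Fixed-step RK4 avoids external solver dependencies.
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
    def forward(self, t, h):
        return self.net(h)          # dh/dt = f(h); t unused in this toy dynamics

class ODEBlock(nn.Module):
    def __init__(self, func, steps=4):
        super().__init__()
        self.func, self.steps = func, steps
    def forward(self, h):
        dt = 1.0 / self.steps       # integrate from t=0 to t=1 with RK4
        t = 0.0
        for _ in range(self.steps):
            k1 = self.func(t, h)
            k2 = self.func(t + dt/2, h + dt/2 * k1)
            k3 = self.func(t + dt/2, h + dt/2 * k2)
            k4 = self.func(t + dt, h + dt * k3)
            h = h + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
            t += dt
        return h

# Hypothetical fine-tuning head on frozen backbone features of width 1280.
head = nn.Sequential(ODEBlock(ODEFunc(1280)), nn.Linear(1280, 10))
print(head(torch.randn(4, 1280)).shape)   # torch.Size([4, 10])
```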
Face Verification via learning the kernel matrix
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The kernel function is introduced to solve nonlinear pattern recognition
problems. The advantage of a kernel method often depends critically on a proper
choice of the kernel function. A promising approach is to learn the kernel from
data automatically. Methods proposed over the past few years to learn the
kernel have limitations, such as only learning the parameters of a prespecified
kernel function. In this paper, nonlinear face verification via learning the
kernel matrix is proposed. A new criterion is used in the new algorithm to
avoid inverting the possibly singular within-class scatter matrix, which is a
computational problem. The experimental results obtained on the facial database
XM2VTS using the Lausanne protocol show that the verification performance of
the new method is superior to that of the primary method, Client Specific
Kernel Discriminant Analysis (CSKDA). CSKDA needs to choose a proper kernel
function through many experiments, while the new method can learn the kernel
from data automatically, which saves a lot of time and yields robust
performance.
Neural Style Difference Transfer and Its Application to Font Generation
Comments: Submitted to DAS2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Designing fonts requires a great deal of time and effort. It requires
professional skills, such as sketching, vectorizing, and image editing.
Additionally, each letter has to be designed individually. In this paper, we
will introduce a method to create fonts automatically. In our proposed method,
the difference of font styles between two different fonts is found and
transferred to another font using neural style transfer. Neural style transfer
is a method of stylizing the contents of an image with the styles of another
image. We propose a novel neural style difference and content difference loss
for neural style transfer. With these losses, new fonts can be generated by
adding or removing font styles from a font. We provide experimental results
with various combinations of input fonts and discuss limitations and future
development of the proposed method.
Recovering Geometric Information with Learned Texture Perturbations
Jane Wu, Yongxu Jin, Zhenglin Geng, Hui Zhou, Ronald Fedkiw. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Regularization is used to avoid overfitting when training a neural network;
unfortunately, this reduces the attainable level of detail hindering the
ability to capture high-frequency information present in the training data.
Even though various approaches may be used to re-introduce high-frequency
detail, it typically does not match the training data and is often not time
coherent. In the case of network-inferred cloth, these shortcomings manifest
themselves via either a lack of detailed wrinkles or unnaturally appearing
and/or time-incoherent surrogate wrinkles. Thus, we propose a general strategy
whereby high-frequency information is procedurally embedded into low-frequency
data so that when the latter is smeared out by the network the former still
retains its high-frequency detail. We illustrate this approach by learning
texture coordinates which when smeared do not in turn smear out the
high-frequency detail in the texture itself but merely smoothly distort it.
Notably, we prescribe perturbed texture coordinates that are subsequently used
to correct the over-smoothed appearance of inferred cloth, and correcting the
appearance from multiple camera views naturally recovers lost geometric
information.
Tsun-Yi Yang, Duy-Kien Nguyen, Huub Heijnen, Vassileios Balntas. Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In this paper, we explore how three related tasks, namely keypoint detection,
description, and image retrieval can be jointly tackled using a single unified
framework, which is trained without the need for training data with
point-to-point correspondences. By leveraging diverse information from sequential layers
of a standard ResNet-based architecture, we are able to extract keypoints and
descriptors that encode local information using generic techniques such as
local activation norms, channel grouping and dropping, and self-distillation.
Subsequently, global information for image retrieval is encoded in an
end-to-end pipeline, based on pooling of the aforementioned local responses. In
contrast to previous methods in local matching, our method does not depend on
pointwise/pixelwise correspondences, and requires no such supervision at all,
i.e., no depth maps from an SfM model nor manually created synthetic affine
transformations. We illustrate that this simple and direct paradigm is able to
achieve very competitive results against the state-of-the-art methods in
various challenging benchmark conditions such as viewpoint changes, scale
changes, and day-night shifting localization.
Autocamera Calibration for traffic surveillance cameras with wide angle lenses
Aman Gajendra Jain, Nicolas Saunier. Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We propose a method for automatic calibration of a traffic surveillance
camera with wide-angle lenses. Video footage of a few minutes is sufficient for
the entire calibration process to take place. This method takes in the height
of the camera from the ground plane as the only user input to overcome the
scale ambiguity. The calibration is performed in two stages: (1) intrinsic
calibration and (2) extrinsic calibration. Intrinsic calibration is achieved by
assuming an equidistant fisheye distortion and an ideal camera model. Extrinsic
calibration is accomplished by estimating the two vanishing points, on the
ground plane, from the motion of vehicles at perpendicular intersections. The
first stage of intrinsic calibration is also valid for thermal cameras.
Experiments have been conducted to demonstrate the effectiveness of this
approach on visible as well as thermal cameras.
Index Terms: fish-eye, calibration, thermal camera, intelligent
transportation systems, vanishing points
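
The equidistant model assumed in the intrinsic stage maps a ray at angle theta from the optical axis to image radius r = f*theta. A small sketch with made-up intrinsics:

```python
# Sketch of the equidistant fisheye model: an incoming ray at angle theta from
# the optical axis lands at radius r = f*theta on the image plane. The focal
# length and principal point below are assumed values, not calibration results.
import numpy as np

f, cx, cy = 350.0, 640.0, 360.0            # assumed intrinsics (pixels)

def project(X):
    """World ray (x, y, z) in camera coords -> equidistant fisheye pixel."""
    x, y, z = X
    theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
    phi = np.arctan2(y, x)
    r = f * theta                            # equidistant: r grows linearly
    return cx + r * np.cos(phi), cy + r * np.sin(phi)

def unproject(u, v):
    """Pixel -> unit ray; inverting r = f*theta needs no iteration."""
    du, dv = u - cx, v - cy
    r = np.hypot(du, dv)
    theta, phi = r / f, np.arctan2(dv, du)
    s = np.sin(theta)
    return np.array([s * np.cos(phi), s * np.sin(phi), np.cos(theta)])

u, v = project((0.3, -0.2, 1.0))
print(unproject(u, v))                       # unit ray parallel to (0.3, -0.2, 1.0)
```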
Spectral Pyramid Graph Attention Network for Hyperspectral Image Classification
Comments: 7 pages, 6 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional neural networks (CNN) have made significant advances in
hyperspectral image (HSI) classification. However, the standard convolutional
kernel neglects the intrinsic connections between data points, resulting in
poor region delineation and small spurious predictions. Furthermore, HSIs have
a unique continuous data distribution along the high-dimensional spectral
domain; much remains to be addressed in characterizing the spectral contexts
given the prohibitively high dimensionality, and in improving reasoning
capability in light of the limited amount of labelled data. This paper presents
a novel architecture which explicitly addresses these two issues. Specifically,
we design an architecture to encode the multiple spectral contextual
information in the form of spectral pyramid of multiple embedding spaces. In
each spectral embedding space, we propose a graph attention mechanism to
explicitly perform interpretable reasoning in the spatial domain based on the
connection in spectral feature space. Experiments on three HSI datasets
demonstrate that the proposed architecture can significantly improve the
classification accuracy compared with the existing methods.
Active and Incremental Learning with Weak Supervision
Comments: Accepted for publication in KI – Künstliche Intelligenz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Large amounts of labeled training data are one of the main contributors to
the great success that deep models have achieved in the past. Label acquisition
for tasks other than benchmarks can pose a challenge due to requirements of
both funding and expertise. By selecting unlabeled examples that are promising
in terms of model improvement and only asking for respective labels, active
learning can increase the efficiency of the labeling process in terms of time
and cost.
In this work, we describe combinations of an incremental learning scheme and
methods of active learning. These allow for continuous exploration of newly
observed unlabeled data. We describe selection criteria based on model
uncertainty as well as expected model output change (EMOC). An object detection
task is evaluated in a continuous exploration context on the PASCAL VOC
dataset. We also validate a weakly supervised system based on active and
incremental learning in a real-world biodiversity application where images from
camera traps are analyzed. Labeling only 32 images by accepting or rejecting
proposals generated by our method yields an increase in accuracy from 25.4% to
42.6%.
Zhen-Liang Ni, Gui-Bin Bian, Guan-An Wang, Xiao-Hu Zhou, Zeng-Guang Hou, Xiao-Liang Xie, Zhen Li, Yu-Han Wang. Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Image and Video Processing (eess.IV)
Surgical instrument segmentation is extremely important for computer-assisted
surgery. Different from common object segmentation, it is more challenging due
to the large illumination and scale variation caused by the special surgical
scenes. In this paper, we propose a novel bilinear attention network with
adaptive receptive field to solve these two challenges. For the illumination
variation, the bilinear attention module can capture second-order statistics to
encode global contexts and semantic dependencies between local pixels. With
them, semantic features in challenging areas can be inferred from their
neighbors and the distinction of various semantics can be boosted. For the
scale variation, our adaptive receptive field module aggregates multi-scale
features and automatically fuses them with different weights. Specifically, it
encodes the semantic relationship between channels to emphasize feature maps
with appropriate scales, changing the receptive field of subsequent
convolutions. The proposed network achieves the best performance, 97.47% mean
IOU, on Cata7, and takes first place on EndoVis 2017, overtaking the
second-ranking method by 10.10% IOU.
Comments: submitted to International Journal of Machine Learning and Cybernetics
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Representation based classification methods have become a hot research topic
during the past few years, and the two most prominent approaches are sparse
representation based classification (SRC) and collaborative representation
based classification (CRC). CRC reveals that it is the collaborative
representation rather than the sparsity that makes SRC successful.
Nevertheless, the dense representation of CRC may not be discriminative, which
will degrade its performance on classification tasks. To alleviate this
problem to some extent, we propose a new method called sparse and
collaborative-competitive representation based classification (SCCRC) for image
classification. Firstly, the coefficients of the test sample are obtained by
SRC and CCRC, respectively. Then the fused coefficient is derived by
multiplying the coefficients of SRC and CCRC. Finally, the test sample is
designated to the class that has the minimum residual. Experimental results on
several benchmark databases demonstrate the efficacy of our proposed SCCRC. The
source code of SCCRC is accessible at this https URL.
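
A toy sketch of the fusion rule on synthetic data, with ISTA standing in for the SRC solver and plain ridge regression standing in for CCRC's collaborative-competitive coding; regularization weights are illustrative.

```python
# Sketch of the coefficient-fusion idea: get one coding vector from an l1
# (SRC-style) solver and one from a ridge-regularized collaborative solver,
# multiply them element-wise, then pick the class with the smallest residual.
import numpy as np

rng = np.random.default_rng(0)
n_class, per, d = 5, 10, 30
D = rng.standard_normal((d, n_class * per))          # dictionary: columns = train samples
D /= np.linalg.norm(D, axis=0)
labels = np.repeat(np.arange(n_class), per)
y = D[:, 7] + 0.05 * rng.standard_normal(d)          # test sample from class 0

def ista(D, y, lam=0.05, iters=200):                 # SRC coding via ISTA
    L = np.linalg.norm(D, 2) ** 2                    # Lipschitz constant of gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        g = a + D.T @ (y - D @ a) / L
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return a

a_src = ista(D, y)
a_crc = np.linalg.solve(D.T @ D + 0.1 * np.eye(D.shape[1]), D.T @ y)
a = a_src * a_crc                                    # fused coefficients

residuals = [np.linalg.norm(y - D[:, labels == c] @ a[labels == c])
             for c in range(n_class)]
print("predicted class:", int(np.argmin(residuals))) # expect 0
```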
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
Moshiur R. Farazi, Salman H. Khan, Nick Barnes. Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Complexity (cs.CC)
Visual Question Answering (VQA) has emerged as a Visual Turing Test to
validate the reasoning ability of AI agents. The pivot of existing VQA models
is the joint embedding that is learned by combining the visual features from an
image and the semantic features from a given question. Consequently, a large
body of literature has focused on developing complex joint embedding strategies
coupled with visual attention mechanisms to effectively capture the interplay
between these two modalities. However, modelling the visual and semantic
features in a high dimensional (joint embedding) space is computationally
expensive, and more complex models often result in trivial improvements in the
VQA accuracy. In this work, we systematically study the trade-off between the
model complexity and the performance on the VQA task. VQA models have a diverse
architecture comprising of pre-processing, feature extraction, multimodal
fusion, attention and final classification stages. We specifically focus on the
effect of “multi-modal fusion” in VQA models that is typically the most
expensive step in a VQA pipeline. Our thorough experimental evaluation leads us
to two proposals, one optimized for minimal complexity and the other one
optimized for state-of-the-art VQA performance.
Plane Pair Matching for Efficient 3D View Registration
Adrien Kaiser, José Alonso Ybanez Zepeda, Tamy Boubekeur. Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a novel method to estimate the motion matrix between overlapping
pairs of 3D views in the context of indoor scenes. We use the Manhattan world
assumption to introduce lightweight geometric constraints under the form of
planes into the problem, which reduces complexity by taking into account the
structure of the scene. In particular, we define a stochastic framework to
categorize planes as vertical or horizontal and parallel or non-parallel. We
leverage this classification to match pairs of planes in overlapping views with
point-of-view agnostic structural metrics. We propose to split the motion
computation using the classification and estimate separately the rotation and
translation of the sensor, using a quadric minimizer. We validate our approach
on a toy example and present quantitative experiments on a public RGB-D
dataset, comparing against recent state-of-the-art methods. Our evaluation
shows that planar constraints only add low computational overhead while
improving results in precision when applied after a prior coarse estimate. We
conclude by giving hints towards extensions and improvements of current
results.
Comments: 2018 International Conference on Smart Multimedia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
High Dynamic Range (HDR) imaging is gaining increased attention due to its
realistic content, for not only regular displays but also smartphones. Before
sufficient HDR content is distributed, HDR visualization still relies mostly on
converting Standard Dynamic Range (SDR) content. SDR images are often
quantized, or bit depth reduced, before SDR-to-HDR conversion, e.g. for video
transmission. Quantization can easily lead to banding artefacts. In some
computing and/or memory I/O limited environments, the traditional solution using
spatial neighborhood information is not feasible. Our method includes noise
generation (offline) and noise injection (online), and operates on pixels of
the quantized image. We vary the magnitude and structure of the noise pattern
adaptively based on the luma of the quantized pixel and the slope of the
inverse-tone mapping function. Subjective user evaluations confirm the superior
performance of our technique.
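
A hedged sketch of luma-adaptive noise injection on a quantized ramp; the inverse tone-mapping curve and scaling constants are placeholders, not the paper's.

```python
# Sketch: per-pixel dither whose magnitude scales with the local slope of an
# (assumed) inverse tone-mapping curve, so flat dark regions get enough noise
# to break up banding. Curve x**2.2 and constants are illustrative only.
import numpy as np

def inject(quantized: np.ndarray, rng, k: float = 0.5) -> np.ndarray:
    """quantized: luma in [0, 1] after bit-depth reduction."""
    slope = 2.2 * np.maximum(quantized, 1e-3) ** 1.2   # d/dx of x**2.2, toy ITM
    noise = rng.uniform(-0.5, 0.5, quantized.shape) / 255.0
    return np.clip(quantized + k * noise * slope, 0.0, 1.0)

rng = np.random.default_rng(0)
banded = np.round(np.linspace(0, 1, 256).reshape(16, 16) * 31) / 31  # 5-bit ramp
print(inject(banded, rng).round(3)[:2])
```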
FD-GAN: Generative Adversarial Networks with Fusion-discriminator for Single Image Dehazing
Comments: Accepted by AAAI2020 (with supplementary files)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, convolutional neural networks (CNNs) have achieved great
improvements in single image dehazing and attracted much attention in research.
Most existing learning-based dehazing methods are not fully end-to-end, which
still follow the traditional dehazing procedure: first estimate the medium
transmission and the atmospheric light, then recover the haze-free image based
on the atmospheric scattering model. However, in practice, due to lack of
priors and constraints, it is hard to precisely estimate these intermediate
parameters. Inaccurate estimation further degrades the performance of dehazing,
resulting in artifacts, color distortion and insufficient haze removal. To
address this, we propose a fully end-to-end Generative Adversarial Network
with a Fusion-discriminator (FD-GAN) for image dehazing. With the proposed
Fusion-discriminator, which takes frequency information as additional priors,
our model can generate more natural and realistic dehazed images with less
color distortion and fewer artifacts. Moreover, we synthesize a large-scale
training dataset including various indoor and outdoor hazy images to boost the
performance and we reveal that for learning-based dehazing methods, the
performance is strictly influenced by the training data. Experiments have shown
that our method reaches state-of-the-art performance on both public synthetic
datasets and real-world images with more visually pleasing dehazed results.
A hybrid algorithm for disparity calculation from sparse disparity estimates based on stereo vision
Comments: 2014 SPCOM
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a novel method for stereo disparity estimation
that combines the existing methods of block-based and region-based stereo
matching. Our method can generate dense disparity maps from disparity
measurements of only 18% of the pixels of either the left or the right image of
a stereo image pair. It works by segmenting the lightness values of image pixels
using a fast implementation of K-Means clustering. It then refines those
segment boundaries by morphological filtering and connected components
analysis, thus removing a lot of redundant boundary pixels. This is followed by
determining the boundaries’ disparities by the SAD cost function. Lastly, we
reconstruct the entire disparity map of the scene from the boundaries’
disparities through disparity propagation along the scan lines and disparity
prediction of regions of uncertainty by considering disparities of the
neighboring regions. Experimental results on the Middlebury stereo vision
dataset demonstrate that the proposed method outperforms traditional disparity
determination methods like SAD and NCC by up to 30% and achieves an improvement
of 2.6% when compared to a recent approach based on absolute difference (AD)
cost function for disparity calculations [1].
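
The SAD cost used for the boundary pixels can be sketched as plain block matching; the window size and search range below are assumptions.

```python
# Sketch of the SAD cost: for a boundary pixel, slide a block along the
# epipolar line and keep the disparity with the lowest sum of absolute
# differences. Window size and search range are assumed for illustration.
import numpy as np

def sad_disparity(left, right, x, y, half=3, max_disp=32):
    """Disparity at (x, y) of the left image by block matching (SAD cost)."""
    patch = left[y-half:y+half+1, x-half:x+half+1]
    best, best_d = np.inf, 0
    for d in range(min(max_disp, x - half) + 1):      # stay inside the image
        cand = right[y-half:y+half+1, x-d-half:x-d+half+1]
        cost = np.abs(patch - cand).sum()             # sum of absolute differences
        if cost < best:
            best, best_d = cost, d
    return best_d

rng = np.random.default_rng(0)
right = rng.random((100, 160))
left = np.roll(right, 7, axis=1)                      # synthetic shift of 7 pixels
print(sad_disparity(left, right, x=80, y=50))         # expect 7
```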
G2MF-WA: Geometric Multi-Model Fitting with Weakly Annotated Data
Chao Zhang, Xuequan Lu, Katsuya Hotta, Xi Yang. Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we attempt to address the problem of geometric multi-model
fitting by resorting to a few weakly annotated (WA) data points, which has
been sparsely studied so far. In weak annotating, most of the manual
annotations are supposed to be correct, yet they are inevitably mixed with
incorrect ones. The WA data can be naturally obtained in an interactive way for
specific tasks; for example, in the case of homography estimation, one can easily
annotate points on the same plane/object with a single label by observing the
image. Motivated by this, we propose a novel method to make full use of the WA
data to boost the multi-model fitting performance. Specifically, a graph for
model proposal sampling is first constructed using the WA data, given the prior
that WA data annotated with the same weak label have a high probability of
being assigned to the same model. By incorporating this prior knowledge into
the calculation of edge probabilities, vertices (i.e., data points) lying
on/near the latent model are likely to connect together and further form a
subset/cluster for effective proposal generation. With the proposals
generated, α-expansion is adopted for labeling, and our method in
return updates the proposals. This works in an iterative way. Extensive
experiments validate our method and show that the proposed method produces
noticeably better results than state-of-the-art techniques in most cases.
A Novel Image Dehazing and Assessment Method
Comments: Accepted in IBA-ICICT 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Images captured in hazy weather conditions often suffer from reduced color
contrast and color fidelity. This degradation is represented by the
transmission map, which captures the amount of attenuation, and the airlight,
which represents the color of the additive noise. In this paper, we propose a
method to estimate the transmission map using haze levels instead of the
airlight color, since there are ambiguities in the estimation of airlight.
Qualitative and quantitative results show the competitiveness of the proposed
method. In addition, we propose two metrics, based on statistics of natural
outdoor images, for the assessment of haze removal algorithms.
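
The underlying atmospheric scattering model makes the roles of the two quantities concrete: a hazy pixel is I = J*t + A*(1 - t), so with estimates of t and A the scene radiance J can be recovered. A toy sketch (t and A assumed known here):

```python
# Sketch of inverting the atmospheric scattering model I = J*t + A*(1 - t):
# given a transmission map t and airlight A (both assumed), recover
# J = (I - A) / max(t, t0) + A, where t0 keeps the division well-conditioned.
import numpy as np

def dehaze(I: np.ndarray, t: np.ndarray, A: np.ndarray, t0: float = 0.1):
    """Invert the scattering model for an RGB image I with per-pixel t."""
    t = np.clip(t, t0, 1.0)[..., None]      # broadcast t over color channels
    return np.clip((I - A) / t + A, 0.0, 1.0)

rng = np.random.default_rng(0)
J_true = rng.random((4, 4, 3))              # toy scene radiance
t_map = np.full((4, 4), 0.6)                # toy transmission map
A = np.array([0.9, 0.9, 0.92])              # toy airlight color
I_hazy = J_true * t_map[..., None] + A * (1 - t_map[..., None])
print(np.allclose(dehaze(I_hazy, t_map, A), J_true))   # True
```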
SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions
Ramprasaath R. Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Ribeiro, Besmira Nushi, Ece Kamar. Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Existing VQA datasets contain questions with varying levels of complexity.
While the majority of questions in these datasets require perception for
recognizing existence, properties, and spatial relationships of entities, a
significant portion of questions pose challenges that correspond to reasoning
tasks — tasks that can only be answered through a synthesis of perception and
knowledge about the world, logic and/or reasoning. This distinction allows us
to notice when existing VQA models have consistency issues — they answer the
reasoning question correctly but fail on associated low-level perception
questions. For example, models answer the complex reasoning question “Is the
banana ripe enough to eat?” correctly, but fail on the associated perception
question “Are the bananas mostly green or yellow?” indicating that the model
likely answered the reasoning question correctly but for the wrong reason. We
quantify the extent to which this phenomenon occurs by creating a new Reasoning
split of the VQA dataset and collecting Sub-VQA, a new dataset consisting of
200K new perception questions which serve as sub-questions corresponding to the
set of perceptual tasks needed to effectively answer the complex reasoning
questions in the Reasoning split. Additionally, we propose an approach called
Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the
model to attend to the same parts of the image when answering the reasoning
question and the perception sub-questions. We show that SQuINT improves model
consistency by 7.8%, also marginally improving its performance on the Reasoning
questions in VQA, while also displaying qualitatively better attention maps.
MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning
Simon Vandenhende, Stamatios Georgoulis, Luc Van Gool. Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we highlight the importance of considering task interactions
at multiple scales when distilling task information in a multi-task learning
setup. In contrast to common belief, we show that tasks with high pattern
affinity at a certain scale are not guaranteed to retain this behaviour at
other scales, and vice versa. We propose a novel architecture, MTI-Net, that
builds upon this finding in three ways. First, it explicitly models task
interactions at every scale via a multi-scale multi-modal distillation unit.
Second, it propagates distilled task information from lower to higher scales
via a feature propagation module. Third, it aggregates the refined task
features from all scales via a feature aggregation unit to produce the final
per-task predictions.
Extensive experiments on two multi-task dense labeling datasets show that,
unlike prior work, our multi-task model delivers on the full potential of
multi-task learning, that is, smaller memory footprint, reduced number of
calculations, and better performance w.r.t. single-task learning.
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences
Zhu Zhang, Zhou Zhao, Yang Zhao, Qi Wang, Huasheng Liu, Lianli Gao. Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we consider a novel task, Spatio-Temporal Video Grounding for
Multi-Form Sentences (STVG). Given an untrimmed video and a
declarative/interrogative sentence depicting an object, STVG aims to localize
the spatiotemporal tube of the queried object. STVG has two challenging
settings: (1) We need to localize spatio-temporal object tubes from untrimmed
videos, where the object may only exist in a very small segment of the video;
(2) We deal with multi-form sentences, including the declarative sentences with
explicit objects and interrogative sentences with unknown objects. Existing
methods cannot tackle the STVG task due to ineffective tube pre-generation
and the lack of object relationship modeling. Thus, we propose a novel
Spatio-Temporal Graph Reasoning Network (STGRN) for this task. First, we build
a spatio-temporal region graph to capture the region relationships with
temporal object dynamics, which involves the implicit and explicit spatial
subgraphs in each frame and the temporal dynamic subgraph across frames. We
then incorporate textual clues into the graph and develop the multi-step
cross-modal graph reasoning. Next, we introduce a spatio-temporal localizer
with a dynamic selection method to directly retrieve the spatiotemporal tubes
without tube pre-generation. Moreover, we contribute a large-scale video
grounding dataset VidSTG based on video relation dataset VidOR. The extensive
experiments demonstrate the effectiveness of our method.
RGB-D Odometry and SLAM
Comments: This is the pre-submission version of the manuscript that was later edited and published as a chapter in RGB-D Image Analysis and Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
The emergence of modern RGB-D sensors had a significant impact in many
application fields, including robotics, augmented reality (AR) and 3D scanning.
They are low-cost, low-power and low-size alternatives to traditional range
sensors such as LiDAR. Moreover, unlike RGB cameras, RGB-D sensors provide the
additional depth information that removes the need for frame-by-frame
triangulation for 3D scene reconstruction. These merits have made them very
popular in mobile robotics and AR, where it is of great interest to estimate
ego-motion and 3D scene structure. Such spatial understanding can enable robots
to navigate autonomously without collisions and allow users to insert virtual
entities consistent with the image stream. In this chapter, we review common
formulations of odometry and Simultaneous Localization and Mapping (known by
its acronym SLAM) using RGB-D stream input. The two topics are closely related,
as the former aims to track the incremental camera motion with respect to a
local map of the scene, and the latter to jointly estimate the camera
trajectory and the global map with consistency. In both cases, the standard
approaches minimize a cost function using nonlinear optimization techniques.
This chapter consists of three main parts: In the first part, we introduce the
basic concept of odometry and SLAM and motivate the use of RGB-D sensors. We
also give mathematical preliminaries relevant to most odometry and SLAM
algorithms. In the second part, we detail the three main components of SLAM
systems: camera pose tracking, scene mapping and loop closing. For each
component, we describe different approaches proposed in the literature. In the
final part, we provide a brief discussion on advanced research topics with the
references to the state-of-the-art.
Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization
Comments: Published in ICLR2020; code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Batch Normalization (BN) is one of the most widely used techniques in the deep
learning field. But its performance can degrade badly with insufficient batch
size. This weakness limits the usage of BN in many computer vision tasks, like
detection or segmentation, where the batch size is usually small due to the
constraint of memory consumption. Therefore, many modified normalization
techniques have been proposed, which either fail to restore the performance of
BN completely, or have to introduce additional nonlinear operations in the
inference procedure, incurring substantial overhead. In this paper, we reveal
that there are two extra batch statistics involved in the backward propagation
of BN, which have never been well discussed before. The extra batch statistics
associated with gradients can also severely affect the training of deep neural
networks. Based on our analysis, we propose a novel normalization method, named
Moving Average Batch Normalization (MABN). MABN can completely restore the
performance of vanilla BN in small-batch cases, without introducing any
additional nonlinear operations in the inference procedure. We prove the
benefits of MABN by both theoretical analysis and experiments. Our experiments
demonstrate the effectiveness of MABN in multiple computer vision tasks,
including ImageNet and COCO. The code has been released at this https URL.
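
A toy forward-pass sketch of the moving-average idea; MABN's actual contribution also stabilizes the extra statistics in the backward pass, which this sketch omits, and the momentum value is illustrative.

```python
# Sketch: replace per-batch statistics with moving averages so tiny batches
# stop destabilizing normalization. Forward pass only; MABN additionally
# stabilizes the two batch statistics in BN's backward pass, omitted here.
import numpy as np

class MovingAvgNorm:
    def __init__(self, dim, momentum=0.98, eps=1e-5):
        self.mu = np.zeros(dim)            # EMA of batch mean
        self.var = np.ones(dim)            # EMA of batch variance
        self.m, self.eps = momentum, eps
    def __call__(self, x):                 # x: (batch, dim), possibly tiny batch
        self.mu = self.m * self.mu + (1 - self.m) * x.mean(axis=0)
        self.var = self.m * self.var + (1 - self.m) * x.var(axis=0)
        return (x - self.mu) / np.sqrt(self.var + self.eps)

norm = MovingAvgNorm(8)
rng = np.random.default_rng(0)
for _ in range(100):                       # batch size 2: vanilla BN would be noisy
    out = norm(rng.standard_normal((2, 8)) * 3.0 + 1.0)
print(out.mean(), out.std())               # roughly 0 and 1 after warm-up
```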
Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement
Chunle Guo, Chongyi Li, Jichang Guo, Chen Change Loy, Junhui Hou, Sam Kwong, Runmin Cong. Subjects: Computer Vision and Pattern Recognition (cs.CV)
The paper presents a novel method, Zero-Reference Deep Curve Estimation
(Zero-DCE), which formulates light enhancement as a task of image-specific
curve estimation with a deep network. Our method trains a lightweight deep
network, DCE-Net, to estimate pixel-wise and high-order curves for dynamic
range adjustment of a given image. The curve estimation is specially designed,
considering pixel value range, monotonicity, and differentiability. Zero-DCE is
appealing in its relaxed assumption on reference images, i.e., it does not
require any paired or unpaired data during training. This is achieved through a
set of carefully formulated non-reference loss functions, which implicitly
measure the enhancement quality and drive the learning of the network. Our
method is efficient as image enhancement can be achieved by an intuitive and
simple nonlinear curve mapping. Despite its simplicity, we show that it
generalizes well to diverse lighting conditions. Extensive experiments on
various benchmarks demonstrate the advantages of our method over
state-of-the-art methods qualitatively and quantitatively. Furthermore, the
potential benefits of our Zero-DCE to face detection in the dark are discussed.
Code and model will be available at this https URL.
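
The curve family is simple enough to sketch: each step applies LE(x) = x + α·x·(1 − x) per pixel, and stacking steps yields the higher-order curve. In the paper, DCE-Net predicts per-pixel α maps; the constant α values below are placeholders.

```python
# Sketch of the image-specific curve family Zero-DCE estimates: each iteration
# applies LE(x) = x + alpha * x * (1 - x), which is monotonic and maps [0, 1]
# to [0, 1] for alpha in [-1, 1]. Fixed alphas stand in for DCE-Net's output.
import numpy as np

def enhance(img: np.ndarray, alphas) -> np.ndarray:
    """Apply the quadratic curve iteratively (a higher-order curve overall)."""
    x = img
    for a in alphas:
        x = x + a * x * (1.0 - x)
    return x

rng = np.random.default_rng(0)
low_light = rng.random((8, 8, 3)) * 0.2          # toy under-exposed image
bright = enhance(low_light, alphas=[0.8] * 8)    # 8 curve iterations
print(low_light.mean(), "->", bright.mean())     # brightness increases
```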
SlideImages: A Dataset for Educational Image Classification
Comments: 8 pages, 2 figures, to be presented at ECIR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
In the past few years, convolutional neural networks (CNNs) have achieved
impressive results in computer vision tasks, which however mainly focus on
photos with natural scene content. Besides, non-sensor derived images such as
illustrations, data visualizations, figures, etc. are typically used to convey
complex information or to explore large datasets. However, this kind of image
has received little attention in computer vision. CNNs and similar techniques
require large volumes of training data. Currently, many document analysis systems
are trained in part on scene images due to the lack of large datasets of
educational image data. In this paper, we address this issue and present
SlideImages, a dataset for the task of classifying educational illustrations.
SlideImages contains training data collected from various sources, e.g.,
Wikimedia Commons and the AI2D dataset, and test data collected from
educational slides. We have reserved all the actual educational images as a
test dataset in order to ensure that the approaches using this dataset
generalize well to new educational images, and potentially other domains.
Furthermore, we present a baseline system using a standard deep neural
architecture and discuss dealing with the challenge of limited training data.
Deep Semantic Face Deblurring
Comments: Submitted to International Journal of Computer Vision (IJCV). arXiv admin note: text overlap with arXiv:1803.03345
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose an effective and efficient face deblurring
algorithm by exploiting semantic cues via deep convolutional neural networks.
As human faces are highly structured and share common facial components
(e.g., eyes and mouths), such semantic information provides a strong prior for
restoration. We incorporate face semantic labels as input priors and propose an
adaptive structural loss to regularize facial local structures within an
end-to-end deep convolutional neural network. Specifically, we first use a
coarse deblurring network to reduce the motion blur on the input face image. We
then adopt a parsing network to extract the semantic features from the coarse
deblurred image. Finally, the fine deblurring network utilizes the semantic
information to restore a clear face image. We train the network with perceptual
and adversarial losses to generate photo-realistic results. The proposed method
restores sharp images with more accurate facial features and details.
Quantitative and qualitative evaluations demonstrate that the proposed face
deblurring algorithm performs favorably against the state-of-the-art methods in
terms of restoration quality, face recognition and execution speed.
Gated Path Selection Network for Semantic Segmentation
Qichuan Geng , Hong Zhang , Xiaojuan Qi , Ruigang Yang , Zhong Zhou , Gao Huang Subjects : Computer Vision and Pattern Recognition (cs.CV)
Semantic segmentation is a challenging task that needs to handle large scale
variations, deformations and different viewpoints. In this paper, we develop a
novel network named Gated Path Selection Network (GPSNet), which aims to learn
adaptive receptive fields. In GPSNet, we first design a two-dimensional
multi-scale network – SuperNet, which densely incorporates features from
growing receptive fields. To dynamically select desirable semantic context, a
gate prediction module is further introduced. In contrast to previous works
that focus on optimizing sample positions on the regular grids, GPSNet can
adaptively capture free-form dense semantic contexts. The derived adaptive
receptive fields are data-dependent and flexible enough to model different
geometric transformations of objects. On two representative semantic segmentation
datasets, i.e., Cityscapes, and ADE20K, we show that the proposed approach
consistently outperforms previous methods and achieves competitive performance
without bells and whistles.
Human-Aware Motion Deblurring
Comments: ICCV2019 paper. Website: this https URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a human-aware deblurring model that disentangles the
motion blur between foreground (FG) humans and background (BG). The proposed
model is based on a triple-branch encoder-decoder architecture. The first two
branches are learned for sharpening FG humans and BG details, respectively;
while the third one produces global, harmonious results by comprehensively
fusing multi-scale deblurring information from the two domains. The proposed
model is further endowed with a supervised, human-aware attention mechanism in
an end-to-end fashion. It learns a soft mask that encodes FG human information
and explicitly drives the FG/BG decoder-branches to focus on their specific
domains. To further benefit the research towards Human-aware Image Deblurring,
we introduce a large-scale dataset, named HIDE, which consists of 8,422 blurry
and sharp image pairs with 65,784 densely annotated FG human bounding boxes.
HIDE is specifically built to span a broad range of scenes, human object sizes,
motion patterns, and background complexities. Extensive experiments on public
benchmarks and our dataset demonstrate that our model performs favorably
against the state-of-the-art motion deblurring methods, especially in capturing
semantic details.
GTNet: Generative Transfer Network for Zero-Shot Object Detection
Comments: Accepted by AAAI 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We propose a Generative Transfer Network (GTNet) for zero-shot object
detection (ZSD). GTNet consists of an Object Detection Module and a Knowledge
Transfer Module. The Object Detection Module can learn large-scale seen domain
knowledge. The Knowledge Transfer Module leverages a feature synthesizer to
generate unseen class features, which are applied to train a new classification
layer for the Object Detection Module. In order to synthesize features for each
unseen class with both the intra-class variance and the IoU variance, we design
an IoU-Aware Generative Adversarial Network (IoUGAN) as the feature
synthesizer, which can be easily integrated into GTNet. Specifically, IoUGAN
consists of three unit models: Class Feature Generating Unit (CFU), Foreground
Feature Generating Unit (FFU), and Background Feature Generating Unit (BFU).
CFU generates unseen features with the intra-class variance conditioned on the
class semantic embeddings. FFU and BFU add the IoU variance to the results of
CFU, yielding class-specific foreground and background features, respectively.
We evaluate our method on three public datasets and the results demonstrate
that our method performs favorably against the state-of-the-art ZSD approaches.
See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks
Comments: CVPR2019. Weblink: this https URL
Journal-ref: CVPR2019
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We introduce a novel network, called CO-attention Siamese Network (COSNet),
to address the unsupervised video object segmentation task from a holistic
view. We emphasize the importance of inherent correlation among video frames
and incorporate a global co-attention mechanism to further improve the
state-of-the-art deep learning based solutions that primarily focus on learning
discriminative foreground representations over appearance and motion in
short-term temporal segments. The co-attention layers in our network provide
efficient and competent stages for capturing global correlations and scene
context by jointly computing and appending co-attention responses into a joint
feature space. We train COSNet with pairs of video frames, which naturally
augments training data and allows increased learning capacity. During the
segmentation stage, the co-attention model encodes useful information by
processing multiple reference frames together, which is leveraged to infer the
frequently reappearing and salient foreground objects better. We propose a
unified and end-to-end trainable framework where different co-attention
variants can be derived for mining the rich context within videos. Our
extensive experiments over three large benchmarks demonstrate that COSNet
outperforms the current alternatives by a large margin.
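As a generic illustration of the co-attention computation described above, a
bilinear affinity between two frames' features followed by mutual softmax
attention can be sketched as follows; the bilinear form and the concatenation
are common choices and not necessarily COSNet's exact variant:

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    """Sketch of a co-attention layer between two frame feature maps."""

    def __init__(self, channels):
        super().__init__()
        self.w = nn.Linear(channels, channels, bias=False)  # bilinear weight W

    def forward(self, fa, fb):
        b, c, h, w = fa.shape
        a = fa.flatten(2).transpose(1, 2)           # (B, HW, C)
        bb = fb.flatten(2).transpose(1, 2)          # (B, HW, C)
        s = self.w(a) @ bb.transpose(1, 2)          # (B, HW, HW) affinity
        a_att = torch.softmax(s, dim=2) @ bb        # frame a attends to frame b
        b_att = torch.softmax(s, dim=1).transpose(1, 2) @ a
        za = torch.cat([a, a_att], dim=2).transpose(1, 2).view(b, 2 * c, h, w)
        zb = torch.cat([bb, b_att], dim=2).transpose(1, 2).view(b, 2 * c, h, w)
        return za, zb
```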
Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks
Comments: ICCV2019(Oral). Website: this https URL
Journal-ref: ICCV2019(Oral)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This work proposes a novel attentive graph neural network (AGNN) for
zero-shot video object segmentation (ZVOS). The suggested AGNN recasts this
task as a process of iterative information fusion over video graphs.
Specifically, AGNN builds a fully connected graph to efficiently represent
frames as nodes, and relations between arbitrary frame pairs as edges. The
underlying pair-wise relations are described by a differentiable attention
mechanism. Through parametric message passing, AGNN is able to efficiently
capture and mine much richer and higher-order relations between video frames,
thus enabling a more complete understanding of video content and more accurate
foreground estimation. Experimental results on three video segmentation
datasets show that AGNN sets a new state-of-the-art in each case. To further
demonstrate the generalizability of our framework, we extend AGNN to an
additional task: image object co-segmentation (IOCS). We perform experiments on
two well-known IOCS datasets and again observe the superiority of our AGNN model.
The extensive experiments verify that AGNN is able to learn the underlying
semantic/appearance relationships among video frames or related images, and
discover the common objects.
Learning Compositional Neural Information Fusion for Human Parsing
Comments: ICCV2019. Website: this https URL
Journal-ref: ICCV2019
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This work proposes to combine neural networks with the compositional
hierarchy of human bodies for efficient and complete human parsing. We
formulate the approach as a neural information fusion framework. Our model
assembles the information from three inference processes over the hierarchy:
direct inference (directly predicting each part of a human body using image
information), bottom-up inference (assembling knowledge from constituent
parts), and top-down inference (leveraging context from parent nodes). The
bottom-up and top-down inferences explicitly model the compositional and
decompositional relations in human bodies, respectively. In addition, the
fusion of multi-source information is conditioned on the inputs, i.e., by
estimating and considering the confidence of the sources. The whole model is
end-to-end differentiable, explicitly modeling information flows and
structures. Our approach is extensively evaluated on four popular datasets,
outperforming the state of the art in all cases, with a fast processing speed
of 23 fps. Our code and results have been released to help ease future research
in this direction.
Image denoising via K-SVD with primal-dual active set algorithm
Comments: 9 pages, 6 figures. The paper was accepted by IEEE WACV 2020 and will be placed in IEEE Xplore
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (stat.ML)
The K-SVD algorithm has been successfully applied to image denoising for many
years, but its bottlenecks in speed and accuracy still need to be broken. For
the sparse coding stage in K-SVD, which involves an (ell_{0}) constraint,
prevailing methods usually seek approximate solutions greedily but are less
effective once the noise level is high. The alternative (ell_{1}) optimization
has proven more powerful than (ell_{0}); however, its time consumption prevents
practical implementation. In this paper, we propose a new K-SVD framework
called K-SVD(_P) by applying the primal-dual active set (PDAS) algorithm to it.
Different from greedy-algorithm-based K-SVD, the K-SVD(_P) algorithm develops a
selection strategy motivated by the Karush-Kuhn-Tucker (KKT) conditions and
yields an efficient update in the sparse coding stage. Since the K-SVD(_P)
algorithm iteratively seeks an equivalent solution to the dual problem with a
simple explicit expression in this denoising problem, denoising speed and
quality can be achieved simultaneously. Experiments demonstrate that our
K-SVD(_P) achieves denoising performance comparable to state-of-the-art
methods.
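A schematic PDAS-style iteration for the (ell_{0})-penalized least-squares
subproblem is sketched below, assuming unit-norm dictionary columns; the
KKT-motivated threshold sqrt(2 * lambda) follows the PDAS literature and is an
illustration, not necessarily K-SVD(_P)'s exact rule:

```python
import numpy as np

def pdas_l0_sparse_code(D, y, lam, max_iter=20):
    """Schematic primal-dual active set iteration for
    min_x 0.5 * ||y - D x||^2 + lam * ||x||_0  (unit-norm columns assumed).

    A coordinate is active when |x_i + d_i| exceeds sqrt(2 * lam), where
    d = D^T (y - D x) is the dual variable; the primal is then refit by
    least squares on the active set.
    """
    n = D.shape[1]
    x = np.zeros(n)
    active = np.zeros(n, dtype=bool)
    for _ in range(max_iter):
        d = D.T @ (y - D @ x)
        new_active = np.abs(x + d) > np.sqrt(2.0 * lam)
        if np.array_equal(new_active, active):
            break  # active set stabilized: KKT conditions hold
        active = new_active
        x = np.zeros(n)
        if active.any():  # least-squares fit restricted to the active set
            x[active], *_ = np.linalg.lstsq(D[:, active], y, rcond=None)
    return x
```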
Towards More Efficient and Effective Inference: The Joint Decision of Multi-Participants
Hui Zhu , Zhulin An , Kaiqiang Xu , Xiaolong Hu , Yongjun Xu Subjects : Computer Vision and Pattern Recognition (cs.CV)
Existing approaches to improve the performances of convolutional neural
networks by optimizing the local architectures or deepening the networks tend
to increase the size of models significantly. In order to deploy and apply
neural networks to edge devices, which are in great demand, reducing the scale
of networks is quite crucial. However, compressing networks easily degrades
image processing performance. In this paper, we propose a
method that is suitable for edge devices while improving the efficiency and
effectiveness of inference. The joint decision of multiple participants, mainly
multiple layers and multiple networks, can achieve higher classification
accuracy (0.26% on CIFAR-10 and 4.49% on CIFAR-100 at most) with similar total
number of parameters for classical convolutional neural networks.
MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition
Comments: Under Review
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
To efficiently extract spatiotemporal features of video for action
recognition, most state-of-the-art methods integrate 1D temporal convolution
into a conventional 2D CNN backbone. However, they all exploit 1D temporal
convolution with a fixed kernel size (i.e., 3) in the network building block,
and thus have suboptimal temporal modeling capability to handle both long-term and
short-term actions. To address this problem, we first investigate the impacts
of different kernel sizes for the 1D temporal convolutional filters. Then, we
propose a simple yet efficient operation called Mixed Temporal Convolution
(MixTConv), which consists of multiple depthwise 1D convolutional filters with
different kernel sizes. By plugging MixTConv into the conventional 2D CNN
backbone ResNet-50, we further propose an efficient and effective network
architecture named MSTNet for action recognition, and achieve state-of-the-art
results on multiple benchmarks.
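The operation described, multiple depthwise 1D temporal convolutions with
different kernel sizes over channel groups, can be sketched as follows; the
equal channel split and the kernel set (1, 3, 5, 7) are assumptions based on
the abstract:

```python
import torch
import torch.nn as nn

class MixTConv(nn.Module):
    """Sketch: split channels into groups, apply depthwise 1D temporal convs
    with different kernel sizes, and concatenate the results."""

    def __init__(self, channels, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        splits[0] += channels - sum(splits)  # absorb any remainder
        self.splits = splits
        self.convs = nn.ModuleList(
            nn.Conv1d(c, c, k, padding=k // 2, groups=c)  # depthwise, same length
            for c, k in zip(splits, kernel_sizes)
        )

    def forward(self, x):  # x: (batch, channels, time)
        chunks = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)], dim=1)
```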
NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection
Comments: 10 pages, 8 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Due to the advantages of real-time detection and improved performance,
single-shot detectors have gained great attention recently. To handle
complex scale variations, single-shot detectors make scale-aware predictions
based on multiple pyramid layers. However, the features in the pyramid are not
scale-aware enough, which limits the detection performance. Two common problems
in single-shot detectors caused by object scale variations can be observed: (1)
small objects are easily missed; (2) the salient part of a large object is
sometimes detected as an object. With this observation, we propose a new
Neighbor Erasing and Transferring (NET) mechanism to reconfigure the pyramid
features and explore scale-aware features. In NET, a Neighbor Erasing Module
(NEM) is designed to erase the salient features of large objects and emphasize
the features of small objects in shallow layers. A Neighbor Transferring Module
(NTM) is introduced to transfer the erased features and highlight large objects
in deep layers. With this mechanism, a single-shot network called NETNet is
constructed for scale-aware object detection. In addition, we propose to
aggregate nearest neighboring pyramid features to enhance our NET. NETNet
achieves 38.5% AP at a speed of 27 FPS and 32.0% AP at a speed of 55 FPS on the
MS COCO dataset. As a result, NETNet achieves a better trade-off between
real-time and accurate object detection.
Comments: To appear in AAAI2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Temporal language grounding in untrimmed videos is a newly-raised task in
video understanding. Most of the existing methods suffer from inferior
efficiency, lack interpretability, and deviate from the human perception
mechanism. Inspired by humans' coarse-to-fine decision-making paradigm, we
formulate a novel Tree-Structured Policy based Progressive Reinforcement
Learning (TSP-PRL) framework to sequentially regulate the temporal boundary via
an iterative refinement process. Semantic concepts are explicitly
represented as branches in the policy, which contributes to efficiently
decomposing complex policies into interpretable primitive actions.
Progressive reinforcement learning provides correct credit assignment via two
task-oriented rewards that encourage mutual promotion within the
tree-structured policy. We extensively evaluate TSP-PRL on the Charades-STA and
ActivityNet datasets, and experimental results show that TSP-PRL achieves
competitive performance over existing state-of-the-art methods.
ENAS U-Net: Evolutionary Neural Architecture Search for Retinal Vessel Segmentation
Zhun Fan , Jiahong Wei , Guijie Zhu , Jiajie Mo , Wenji Li Subjects : Computer Vision and Pattern Recognition (cs.CV)
Accurate retinal vessel segmentation (RVS) is of great significance for
assisting doctors in the diagnosis of ophthalmic and other systemic
diseases, yet manually designing a valid neural network architecture for
retinal vessel segmentation requires high expertise and a large workload. In
order to further improve the performance of vessel segmentation and reduce the
workload of manually designing neural networks, we propose a specific search
space based on an encoder-decoder framework and apply neural architecture
search (NAS) to retinal vessel segmentation. The search space covers the
macro-architecture, involving operations on and adjustments to the entire
network topology. For the architecture optimization, we adopt a modified
evolution strategy that can search architectures with limited computing
resources. During the evolution, we select the elite architectures for the
next generation based on their performance. After the evolution, the
searched model is evaluated on three mainstream datasets, namely DRIVE, STARE,
and CHASE_DB1. The searched model achieves top performance on all three
datasets with fewer parameters (about 2.3M). Moreover, the results of
cross-training between the above three datasets show that the searched model
has considerable generalizability, which indicates its potential for clinical
disease diagnosis.
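The search loop admits a compact sketch, assuming user-supplied `evaluate`
(e.g. validation Dice score of a trained candidate) and `mutate` operators over
the architecture encoding; the population size, elite fraction, and budget are
placeholders, not the paper's settings:

```python
import random

def evolve_architectures(init_population, evaluate, mutate,
                         generations=20, elite_frac=0.25):
    """Minimal elitist evolution-strategy sketch for architecture search."""
    population = list(init_population)
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        elites = scored[: max(1, int(elite_frac * len(scored)))]
        # Refill the population by mutating randomly chosen elites.
        population = elites + [mutate(random.choice(elites))
                               for _ in range(len(population) - len(elites))]
    return max(population, key=evaluate)
```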
Min Li , Zhenglong Zhou , Zhe Wu , Boxin Shi , Changyu Diao , Ping Tan Subjects : Computer Vision and Pattern Recognition (cs.CV)
We present a method to capture both 3D shape and spatially varying
reflectance with a multi-view photometric stereo (MVPS) technique that works
for general isotropic materials. Our algorithm is suitable for perspective
cameras and nearby point light sources. Our data capture setup is simple,
consisting of only a digital camera, some LED lights, and an optional automatic
turntable. From a single viewpoint, we use a set of photometric stereo images
to identify surface points with the same distance to the camera. We collect
this information from multiple viewpoints and combine it with
structure-from-motion to obtain a precise reconstruction of the complete 3D
shape. The spatially varying isotropic bidirectional reflectance distribution
function (BRDF) is captured by simultaneously inferring a set of basis BRDFs
and their mixing weights at each surface point. In experiments, we demonstrate
our algorithm with two different setups: a studio setup for highest precision
and a desktop setup for best usability. According to our experiments, under the
studio setting, the captured shapes are accurate to 0.5 millimeters and the
captured reflectance has a relative root-mean-square error (RMSE) of 9%. We
also quantitatively evaluate state-of-the-art MVPS on a newly collected
benchmark dataset, which is publicly available for inspiring future research.
Text-to-Image Generation with Attention Based Recurrent Neural Networks
Tehseen Zia , Shahan Arif , Shakeeb Murtaza , Mirza Ahsan Ullah Subjects : Computer Vision and Pattern Recognition (cs.CV)
Conditional image modeling based on textual descriptions is a relatively new
domain in unsupervised learning. Previous approaches use a latent variable
model and generative adversarial networks. While the former are approximated
by using variational auto-encoders and rely on an intractable inference that
can hamper their performance, the latter are unstable to train due to their
Nash-equilibrium-based objective function. We develop a tractable and stable
caption-based image generation model. The model uses an attention-based encoder
to learn word-to-pixel dependencies. A conditional autoregressive decoder
is used for learning pixel-to-pixel dependencies and generating images.
Experiments are performed on the Microsoft COCO and MNIST-with-captions
datasets, and performance is evaluated using the Structural Similarity Index.
Results show that the proposed model performs better than contemporary
approaches and generates better-quality images. Keywords: Generative image
modeling, autoregressive image modeling, caption-based image generation, neural
attention, recurrent neural networks.
Stacked Adversarial Network for Zero-Shot Sketch based Image Retrieval
Comments: Accepted in WACV’2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that
the data of all the classes are available during training. The assumption may
not always be practical since the data of a few classes may be unavailable, or
the classes may not appear at the time of training. Zero-Shot Sketch-Based
Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to
handle previously unseen classes at test time. This paper proposes a
generative approach based on the Stacked Adversarial Network (SAN) and the
advantage of Siamese Network (SN) for ZS-SBIR. While SAN generates a
high-quality sample, SN learns a better distance metric compared to that of the
nearest neighbor search. The capability of the generative model to synthesize
image features based on the sketch reduces the SBIR problem to that of an
image-to-image retrieval problem. We evaluate the efficacy of our proposed
approach on the TU-Berlin and Sketchy datasets in both the standard ZSL and
generalized ZSL settings. The proposed method yields a significant improvement
in standard ZSL as well as in a more challenging generalized ZSL setting (GZSL)
for SBIR.
Deep Metric Structured Learning For Facial Expression Recognition
Pedro D. Marrero Fernandez , Tsang Ing Ren , Tsang Ing Jyh , Fidel A. Guerrero Peña , Alexandre Cunha Subjects : Computer Vision and Pattern Recognition (cs.CV)
We propose a deep metric learning model to create embedded sub-spaces with a
well defined structure. A new loss function that imposes Gaussian structures on
the output space is introduced to create these sub-spaces thus shaping the
distribution of the data. Having a mixture of Gaussians solution space is
advantageous given its simplified and well-established structure. It allows
fast discovery of classes within classes and the identification of mean
representatives at the centroids of individual classes. We also propose a new
semi-supervised method to create sub-classes. We illustrate our methods on the
facial expression recognition problem and validate results on the FER+,
AffectNet, Extended Cohn-Kanade (CK+), BU-3DFE, and JAFFE datasets. We
experimentally demonstrate that the learned embedding can be successfully used
for various applications including expression retrieval and emotion
recognition.
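One way to impose the Gaussian structure described above is a centroid-based
negative log-likelihood that pulls each embedding toward its class mean; this
is a hedged illustration, not the paper's exact loss, and `centroids` and
`sigma` are assumptions:

```python
import torch

def gaussian_structure_loss(embeddings, labels, centroids, sigma=1.0):
    """Pull embeddings toward class centroids under an isotropic Gaussian
    likelihood, so classes form Gaussian clusters in the output space."""
    mu = centroids[labels]                        # (B, D) class means
    sq_dist = ((embeddings - mu) ** 2).sum(dim=1)
    # Negative log-likelihood of an isotropic Gaussian, up to a constant.
    return (sq_dist / (2.0 * sigma ** 2)).mean()
```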
A Foreground-background Parallel Compression with Residual Encoding for Surveillance Video
Lirong Wu , Kejie Huang , Haibin Shen , Lianli Gao Subjects : Computer Vision and Pattern Recognition (cs.CV)
Data storage has been one of the bottlenecks in surveillance systems. The
conventional video compression algorithms such as H.264 and H.265 do not fully
utilize the low information density characteristic of the surveillance video.
In this paper, we propose a video compression method that extracts and
compresses the foreground and background of the video separately. The
compression ratio is greatly improved by sharing background information among
multiple adjacent frames through an adaptive background updating and
interpolation module. Besides, we present two different schemes to compress the
foreground and compare their performance in the ablation study to show the
importance of temporal information for video compression. At the decoding end,
a coarse-to-fine two-stage module is applied to achieve the composition of the
foreground and background and the enhancements of frame quality. Furthermore,
an adaptive sampling method for surveillance cameras is proposed, and we have
shown its effects through software simulation. The experimental results show
that our proposed method requires 69.5% less bpp (bits per pixel) than the
conventional algorithm H.265 to achieve the same PSNR (36 dB) on the HEVC
dataset.
A GAN-based Tunable Image Compression System
Lirong Wu , Kejie Huang , Haibin Shen Subjects : Computer Vision and Pattern Recognition (cs.CV)
The method of importance map has been widely adopted in DNN-based lossy image
compression to achieve bit allocation according to the importance of image
contents. However, insufficient allocation of bits in non-important regions
often leads to severe distortion at low bpp (bits per pixel), which hampers the
development of efficient content-weighted image compression systems. This paper
rethinks content-based compression by using a Generative Adversarial Network
(GAN) to reconstruct the non-important regions. Moreover, multiscale pyramid
decomposition is applied to both the encoder and the discriminator to achieve
global compression of high-resolution images. A tunable compression scheme is
also proposed in this paper to compress an image to any specific compression
ratio without retraining the model. The experimental results show that our
proposed method improves MS-SSIM by more than 10.3% compared to the recently
reported GAN-based method to achieve the same low bpp (0.05) on the Kodak
dataset.
Harmonic Convolutional Networks based on Discrete Cosine Transform
Comments: arXiv admin note: substantial text overlap with arXiv:1812.03205
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG)
Convolutional neural networks (CNNs) learn filters in order to capture local
correlation patterns in feature space. In this paper we propose to revert to
learning combinations of preset spectral filters by switching to CNNs with
harmonic blocks. We rely on Discrete Cosine Transform (DCT)
filters which have excellent energy compaction properties and are widely used
for image compression. The proposed harmonic blocks rely on DCT-modeling and
replace conventional convolutional layers to produce partially or fully
harmonic versions of new or existing CNN architectures. We demonstrate how the
harmonic networks can be efficiently compressed in a straightforward manner by
truncating high-frequency information in harmonic blocks which is possible due
to the redundancies in the spectral domain. We report extensive experimental
validation demonstrating the benefits of the introduction of harmonic blocks
into state-of-the-art CNN models in image classification, segmentation and edge
detection applications.
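A minimal sketch of a harmonic block, fixed DCT basis filters followed by a
learned 1x1 combination across channel-frequency responses, is given below;
normalization details and the exact compression-by-truncation scheme are
omitted:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def dct_filter_bank(k=3):
    """Build the k*k 2D DCT-II basis as fixed conv filters, shape (k*k, 1, k, k)."""
    basis = torch.zeros(k * k, 1, k, k)
    for u in range(k):
        for v in range(k):
            for x in range(k):
                for y in range(k):
                    basis[u * k + v, 0, x, y] = (
                        math.cos(math.pi * u * (2 * x + 1) / (2 * k))
                        * math.cos(math.pi * v * (2 * y + 1) / (2 * k))
                    )
    return basis

class HarmonicBlock(nn.Module):
    """Sketch: filter each channel with the fixed DCT basis, then learn a
    1x1 combination; replaces a conventional convolutional layer."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.register_buffer("dct", dct_filter_bank(k))
        self.k = k
        self.mix = nn.Conv2d(in_ch * k * k, out_ch, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        spec = F.conv2d(x.reshape(b * c, 1, h, w), self.dct, padding=self.k // 2)
        return self.mix(spec.reshape(b, c * self.k * self.k, h, w))
```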
Media Forensics and DeepFakes: an overview
Luisa Verdoliva Subjects : Computer Vision and Pattern Recognition (cs.CV)
With the rapid progress of recent years, techniques that generate and
manipulate multimedia content can now guarantee a very advanced level of
realism. The boundary between real and synthetic media has become very thin. On
the one hand, this opens the door to a series of exciting applications in
different fields such as creative arts, advertising, film production, and video
games. On the other hand, it poses enormous security threats. Software packages
freely available on the web allow any individual, without special skills, to
create very realistic fake images and videos. So-called deepfakes can be used
to manipulate public opinion during elections, commit fraud, discredit or
blackmail people. Potential abuses are limited only by human imagination.
Therefore, there is an urgent need for automated tools capable of detecting
false multimedia content and avoiding the spread of dangerous false
information. This review paper aims to present an analysis of the methods for
visual media integrity verification, that is, the detection of manipulated
images and videos. Special emphasis will be placed on the emerging phenomenon
of deepfakes and, from the point of view of the forensic analyst, on modern
data-driven forensic methods. The analysis will help to highlight the limits of
current forensic tools, the most relevant issues, the upcoming challenges, and
suggest future directions for research.
Adapting Grad-CAM for Embedding Networks
Comments: WACV 2020 camera ready
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
The gradient-weighted class activation mapping (Grad-CAM) method can
faithfully highlight important regions in images for deep model prediction in
image classification, image captioning and many other tasks. It uses the
gradients in back-propagation as weights (grad-weights) to explain network
decisions. However, applying Grad-CAM to embedding networks raises significant
challenges because embedding networks are trained by millions of dynamically
paired examples (e.g. triplets). To overcome these challenges, we propose an
adaptation of the Grad-CAM method for embedding networks. First, we aggregate
grad-weights from multiple training examples to improve the stability of
Grad-CAM. Then, we develop an efficient weight-transfer method to explain
decisions for any image without back-propagation. We extensively validate the
method on the standard CUB200 dataset, where our method produces more
accurate visual attention than the original Grad-CAM method. We also apply the
method to a house price estimation application using images. The method
produces convincing qualitative results, showcasing the practicality of our
approach.
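For reference, the vanilla Grad-CAM computation that the paper adapts can be
sketched as follows; the embedding-specific grad-weight aggregation and the
weight-transfer method proposed in the paper are omitted:

```python
import torch
import torch.nn.functional as F

def grad_cam(feature_maps, score):
    """Core Grad-CAM: weight each activation map by the global average of its
    gradient w.r.t. the score, sum over channels, and apply ReLU.

    feature_maps: (1, C, H, W) activations from a chosen conv layer
                  (obtained with a forward hook, requires_grad enabled).
    score:        scalar model output (a class logit, or a similarity score
                  in the embedding-network setting).
    """
    grads, = torch.autograd.grad(score, feature_maps, retain_graph=True)
    weights = grads.mean(dim=(2, 3), keepdim=True)         # the grad-weights
    cam = F.relu((weights * feature_maps).sum(dim=1, keepdim=True))
    return cam / (cam.max() + 1e-8)                        # normalize to [0, 1]
```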
Temporal Interlacing Network
Comments: Accepted to AAAI 2020. Winning entry of ICCV Multi-Moments in Time Challenge 2019. Code is available at this https URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
For a long time, the vision community has tried to learn spatio-temporal
representations by combining convolutional neural networks with various
temporal models, such as the families of Markov chains, optical flow, RNNs and
temporal convolution. However, these pipelines consume enormous computing
resources due to the alternating learning process for spatial and temporal
information. One natural question is whether we can embed the temporal
information into the spatial one so the information in the two domains can be
jointly learned once-only. In this work, we answer this question by presenting
a simple yet powerful operator — temporal interlacing network (TIN). Instead
of learning the temporal features, TIN fuses the two kinds of information by
interlacing spatial representations from the past to the future, and vice
versa. A differentiable interlacing target can be learned to control the
interlacing process. In this way, a heavy temporal model is replaced by a
simple interlacing operator. We theoretically prove that with a learnable
interlacing target, TIN performs equivalently to the regularized temporal
convolution network (r-TCN), but gains 4% more accuracy with 6x less latency on
6 challenging benchmarks. These results push the state-of-the-art performances
of video understanding by a considerable margin. Not surprisingly, the ensemble
model of the proposed TIN won the (1^{st}) place in the ICCV19 – Multi Moments
in Time challenge. Code is made available to facilitate further research at
this https URL
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Kihyuk Sohn , David Berthelot , Chun-Liang Li , Zizhao Zhang , Nicholas Carlini , Ekin D. Cubuk , Alex Kurakin , Han Zhang , Colin Raffel Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Semi-supervised learning (SSL) provides an effective means of leveraging
unlabeled data to improve a model’s performance. In this paper, we demonstrate
the power of a simple combination of two common SSL methods: consistency
regularization and pseudo-labeling. Our algorithm, FixMatch, first generates
pseudo-labels using the model’s predictions on weakly-augmented unlabeled
images. For a given image, the pseudo-label is only retained if the model
produces a high-confidence prediction. The model is then trained to predict the
pseudo-label when fed a strongly-augmented version of the same image. Despite
its simplicity, we show that FixMatch achieves state-of-the-art performance
across a variety of standard semi-supervised learning benchmarks, including
94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 — just
4 labels per class. Since FixMatch bears many similarities to existing SSL
methods that achieve worse performance, we carry out an extensive ablation
study to tease apart the experimental factors that are most important to
FixMatch’s success. We make our code available at this https URL.
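The unlabeled-data objective can be sketched directly from the description
above; the augmentation pipelines and supervised term are omitted, and the
0.95 cutoff stands in for the method's confidence threshold:

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, weak_batch, strong_batch, threshold=0.95):
    """Sketch of FixMatch's unlabeled loss: pseudo-label weakly augmented
    images, keep only confident predictions, and train the model to predict
    those labels on strongly augmented views of the same images."""
    with torch.no_grad():
        probs = torch.softmax(model(weak_batch), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= threshold).float()     # retain high-confidence only
    logits_s = model(strong_batch)
    loss = F.cross_entropy(logits_s, pseudo, reduction="none")
    return (mask * loss).mean()
```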
Generate High-Resolution Adversarial Samples by Identifying Effective Features
Sizhe Chen , Peidong Zhang , Chengjin Sun , Jia Cai , Xiaolin Huang Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
With the prevalence of deep learning in computer vision, adversarial samples
that fool neural networks have emerged in large numbers, revealing their
deep-rooted defects. Most adversarial attacks calculate an imperceptible
perturbation in image space to fool the DNNs. In this strategy, the
perturbation looks like noise and thus can be mitigated. Attacks in feature
space produce semantic perturbations, but they can only deal with
low-resolution samples. The reason lies in the great number of coupled features
required to express a high-resolution image. In this paper, we propose Attack
by Identifying Effective Features (AIEF), which learns different weights for
features to attack. Effective features, i.e., those with large weights,
strongly influence the victim model while distorting the image little, and thus
are more effective for the attack. By concentrating the attack on them, AIEF
produces high-resolution adversarial samples with acceptable distortions. We
demonstrate the effectiveness of AIEF by attacking different tasks with
different generative models.
batchboost: regularization for stabilizing training with resistance to underfitting & overfitting
Comments: 6 pages; 5 figures
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Overfitting & underfitting and stable training are important challenges in
machine learning. Current approaches for these issues are mixup, SamplePairing
and BC learning. In our work, we state the hypothesis that mixing many images
together can be more effective than just two. The batchboost pipeline has three
stages: (a) pairing: a method of selecting two samples; (b) mixing: how to
create a new sample from two; (c) feeding: combining mixed samples with new
ones from the dataset into the batch (with ratio (gamma)). Note that a sample
that appears in our batch propagates through subsequent iterations with less
and less importance until the end of training. The pairing stage calculates the
error per sample, sorts the samples, and pairs them with the strategy of
hardest with easiest; the mixing stage then merges two samples using mixup,
(lambda x_1 + (1-lambda) x_2). Finally, the feeding stage combines new samples
with mixed ones at a ratio of 1:1. Batchboost has 0.5-3%
better accuracy than the current state-of-the-art mixup regularization on
CIFAR-10 & Fashion-MNIST. Our method is slightly better than the SamplePairing
technique on small datasets (up to 5%). Batchboost provides stable training
with untuned parameters (like weight decay), thus it is a good method for
testing the performance of different architectures. Source code is at:
this https URL
Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data
Comments: 8 pages, 3 figures, Association for the Advancement of Artificial Intelligence (AAAI2020). arXiv admin note: substantial text overlap with arXiv:1907.01709
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Conventional sequential learning methods such as Recurrent Neural Networks
(RNNs) focus on interactions between consecutive inputs, i.e. first-order
Markovian dependency. However, most sequential data, such as videos,
have complex dependency structures that imply variable-length semantic flows
and their compositions, which are hard to capture with conventional
methods. Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for
learning video data by discovering these complex structures of the video. The
CB-GLNs represent video data as a graph, with nodes and edges corresponding to
frames of the video and their dependencies respectively. The CB-GLNs find
compositional dependencies of the data in multilevel graph forms via a
parameterized kernel with graph-cut and a message passing framework. We
evaluate the proposed method on two different tasks for video
understanding: video theme classification (YouTube-8M dataset) and video
question answering (TVQA dataset). The experimental results show that our
model efficiently learns the semantic compositional structure of video data.
Furthermore, our model achieves the highest performance in comparison to other
baseline methods.
Yadong Zhang , Xin Chen Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Time series motifs play an important role in time series analysis.
Motif-based time series clustering is used for the discovery of higher-order
patterns or structures in time series data. Inspired by the convolutional
neural network (CNN) classifier based on image representations of time
series, the motif difference field (MDF) is proposed. Compared to other image
representations of time series, MDF is simple and easy to construct. With a
Fully Convolutional Network (FCN) as the classifier, MDF demonstrates
superior performance on the UCR time series dataset in benchmarks against other
time series classification methods. It is interesting to find that the triadic
time series motifs give the best result in the test. Due to the motif
clustering reflected in MDF, the significant motifs are detected with the help
of the Gradient-weighted Class Activation Mapping (Grad-CAM). The areas in MDF
with high weight in Grad-CAM have a high contribution from the significant
motifs with the desired ordinal patterns associated with the signature patterns
in time series. However, the signature patterns cannot be identified with the
neural network classifiers directly based on the time series.
Comments: 7 pages, 13 figures, IEEE International Conference on Robotics and Automation (ICRA) 2019
Subjects:
Robotics (cs.RO)
; Computer Vision and Pattern Recognition (cs.CV)
We present joint learning of instance and semantic segmentation for visible
and occluded region masks. Sharing the feature extractor with instance
occlusion segmentation, we introduce semantic occlusion segmentation into the
instance segmentation model. This joint learning fuses the instance- and
image-level reasoning of the mask prediction on the different segmentation
tasks, which was missing in previous work that learned instance segmentation
only (instance-only). In the experiments, we evaluated the proposed joint
learning against instance-only learning on the test dataset. We also
applied the joint learning model to two different types of robotic pick-and-place
tasks (random and target picking) and evaluated its effectiveness to achieve
real-world robotic tasks.
Breast lesion segmentation in ultrasound images with limited annotated data
Comments: Accepted to ISBI 2020
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Ultrasound (US) is one of the most commonly used imaging modalities in both
diagnosis and surgical interventions due to its low cost, safety, and
non-invasive characteristics. US image segmentation is currently a unique
challenge because of the presence of speckle noise. As manual segmentation
requires considerable effort and time, the development of automatic
segmentation algorithms has attracted researchers' attention. Although recent
methodologies based on convolutional neural networks have shown promising
performances, their success relies on the availability of a large number of
training data, which is prohibitively difficult for many applications.
Therefore, in this study we propose the use of simulated US images and natural
images as auxiliary datasets in order to pre-train our segmentation network,
and then to fine-tune with limited in vivo data. We show that with as little as
19 in vivo images, fine-tuning the pre-trained network improves the dice score
by 21% compared to training from scratch. We also demonstrate that if the same
number of natural and simulation US images is available, pre-training on
simulation data is preferable.
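A minimal sketch of the pre-train/fine-tune recipe described above, assuming a
segmentation model with an `encoder` attribute (a hypothetical name) and a
small in vivo data loader, might look like:

```python
import torch

def finetune(model, in_vivo_loader, loss_fn, lr=1e-4,
             freeze_encoder=True, epochs=50):
    """Fine-tune a network pre-trained on simulated US (or natural) images
    using the few available in vivo images."""
    if freeze_encoder:
        for p in model.encoder.parameters():
            p.requires_grad = False
    opt = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=lr)
    for _ in range(epochs):
        for img, mask in in_vivo_loader:
            opt.zero_grad()
            loss = loss_fn(model(img), mask)
            loss.backward()
            opt.step()
```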
Comments: 19 pages, 5 figures, 2 tables
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph); Quantitative Methods (q-bio.QM)
Histological staining is a vital step used to diagnose various diseases and
has been used for more than a century to provide contrast to tissue sections,
rendering the tissue constituents visible for microscopic analysis by medical
experts. However, this process is time-consuming, labor-intensive, expensive
and destructive to the specimen. Recently, the ability to virtually-stain
unlabeled tissue sections, entirely avoiding the histochemical staining step,
has been demonstrated using tissue-stain specific deep neural networks. Here,
we present a new deep learning-based framework which generates
virtually-stained images using label-free tissue, where different stains are
merged following a micro-structure map defined by the user. This approach uses
a single deep neural network that receives two different sources of information
at its input: (1) autofluorescence images of the label-free tissue sample, and
(2) a digital staining matrix which represents the desired microscopic map of
different stains to be virtually generated at the same tissue section. This
digital staining matrix is also used to virtually blend existing stains,
digitally synthesizing new histological stains. We trained and blindly tested
this virtual-staining network using unlabeled kidney tissue sections to
generate micro-structured combinations of Hematoxylin and Eosin (H&E), Jones
silver stain, and Masson’s Trichrome stain. Using a single network, this
approach multiplexes virtual staining of label-free tissue with multiple types
of stains and paves the way for synthesizing new digital histological stains
that can be created on the same tissue cross-section, which is currently not
feasible with standard histochemical staining methods.
Recommending Themes for Ad Creative Design via Visual-Linguistic Representations
Comments: 7 pages, 8 figures, 2 tables, accepted by The Web Conference 2020
Subjects:
Computation and Language (cs.CL)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
There is a perennial need in the online advertising industry to refresh ad
creatives, i.e., images and text used for enticing online users towards a
brand. Such refreshes are required to reduce the likelihood of ad fatigue among
online users, and to incorporate insights from other successful campaigns in
related product categories. Given a brand, coming up with themes for a new ad
is a painstaking and time-consuming process for creative strategists.
Strategists typically draw inspiration from the images and text used for past
ad campaigns, as well as world knowledge on the brands. To automatically infer
ad themes via such multimodal sources of information in past ad campaigns, we
propose a theme (keyphrase) recommender system for ad creative strategists. The
theme recommender is based on aggregating results from a visual question
answering (VQA) task, which ingests the following: (i) ad images, (ii) text
associated with the ads as well as Wikipedia pages on the brands in the ads,
and (iii) questions around the ad. We leverage transformer based cross-modality
encoders to train visual-linguistic representations for our VQA task. We study
two formulations for the VQA task along the lines of classification and
ranking; via experiments on a public dataset, we show that cross-modal
representations lead to significantly better classification accuracy and
ranking precision-recall metrics. Cross-modal representations show better
performance compared to separate image and text representations. In addition,
the use of multimodal information shows a significant lift over using only
textual or visual information.
Learning Deformable Registration of Medical Images with Anatomical Constraints
Comments: Accepted for publication in Neural Networks (Elsevier). Source code and resulting segmentation masks for the NIH Chest-XRay14 dataset with estimated quality index available at this https URL
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV)
Deformable image registration is a fundamental problem in the field of
medical image analysis. During the last years, we have witnessed the advent of
deep learning-based image registration methods which achieve state-of-the-art
performance, and drastically reduce the required computational time. However,
little work has been done regarding how we can encourage our models to produce
not only accurate, but also anatomically plausible results, which is still an
open question in the field. In this work, we argue that incorporating
anatomical priors in the form of global constraints into the learning process
of these models, will further improve their performance and boost the realism
of the warped images after registration. We learn global non-linear
representations of image anatomy using segmentation masks, and employ them to
constrain the registration process. The proposed AC-RegNet architecture is
evaluated in the context of chest X-ray image registration using three
different datasets, where the high anatomical variability makes the task
extremely challenging. Our experiments show that the proposed anatomically
constrained registration model produces more realistic and accurate results
than state-of-the-art methods, demonstrating the potential of this approach.
A deep network for sinogram and CT image reconstruction
Wei Wang , Xiang-Gen Xia , Chuanjiang He , Zemin Ren , Jian Lu , Tianfu Wang , Baiying Lei Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
A CT image can be well reconstructed when the sampling rate of the sinogram
satisfies the Nyquist criteria and the sampled signal is noise-free. However,
in practice, the sinogram is usually contaminated by noise, which degrades the
quality of a reconstructed CT image. In this paper, we design a deep network
for sinogram and CT image reconstruction. The network consists of two cascaded
blocks that are linked by a filtered backprojection (FBP) layer, where the
former block is responsible for denoising and completing the sinograms while
the latter removes the noise and artifacts from the CT images.
Experimental results show that the CT images reconstructed by our method have
the highest PSNR and SSIM on average compared to state-of-the-art methods.
Deep Image Clustering with Tensor Kernels and Unsupervised Companion Objectives
Comments: Submitted to IEEE Transactions on Neural Networks and Learning Systems
Subjects:
Machine Learning (stat.ML)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In this paper we develop a new model for deep image clustering, using
convolutional neural networks and tensor kernels. The proposed Deep Tensor
Kernel Clustering (DTKC) consists of a convolutional neural network (CNN),
which is trained to reflect a common cluster structure at the output of its
intermediate layers. Encouraging a consistent cluster structure throughout the
network has the potential to guide it towards meaningful clusters, even though
these clusters might appear to be nonlinear in the input space. The cluster
structure is enforced through the idea of unsupervised companion objectives,
where separate loss functions are attached to layers in the network. These
unsupervised companion objectives are constructed based on a proposed
generalization of the Cauchy-Schwarz (CS) divergence, from vectors to tensors
of arbitrary rank. Generalizing the CS divergence to tensor-valued data is a
crucial step, due to the tensorial nature of the intermediate representations
in the CNN. Several experiments are conducted to thoroughly assess the
performance of the proposed DTKC model. The results indicate that the model
outperforms, or performs comparably to, a wide range of baseline algorithms. We
also empirically demonstrate that our model does not suffer from objective
function mismatch, which can be a problematic artifact in autoencoder-based
clustering models.
An Efficient Framework for Automated Screening of Clinically Significant Macular Edema
Renoh Johnson Chalakkal , Faizal Hafiz , Waleed Abdulla , Akshya Swain Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The present study proposes a new approach to automated screening of
Clinically Significant Macular Edema (CSME) and addresses two major challenges
associated with such screenings, i.e., exudate segmentation and imbalanced
datasets. The proposed approach replaces the conventional exudate segmentation
based feature extraction by combining a pre-trained deep neural network with
meta-heuristic feature selection. A feature-space over-sampling technique is
used to overcome the effects of skewed datasets, and the screening is
accomplished by a k-NN based classifier. The role of each data-processing step
(e.g., class balancing, feature selection) and the effects of limiting the
region-of-interest to fovea on the classification performance are critically
analyzed. Finally, the selection and implication of operating point on Receiver
Operating Characteristic curve are discussed. The results of this study
convincingly demonstrate that by following these fundamental practices of
machine learning, a basic k-NN based classifier could effectively accomplish
the CSME screening.
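An illustrative sketch of the screening pipeline: assume `X` holds deep
features already extracted by a pre-trained CNN and reduced by the
meta-heuristic selector; SMOTE is a common feature-space over-sampler and
stands in for the paper's exact choice, and the data here are random stand-ins:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))           # stand-in for selected deep features
y = np.array([0] * 170 + [1] * 30)       # skewed classes, as in screening data

# Balance the classes in feature space, then classify with k-NN.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_bal, y_bal)
print(clf.predict(X[:5]))
```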
Towards Augmented Reality-based Suturing in Monocular Laparoscopic Training
Comments: Accepted for SPIE Medical Imaging 2020
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV)
Minimally Invasive Surgery (MIS) techniques have gained rapid popularity
among surgeons since they offer significant clinical benefits including reduced
recovery time and diminished post-operative adverse effects. However,
conventional endoscopic systems output monocular video which compromises depth
perception, spatial orientation and field of view. Suturing is one of the most
complex tasks performed under these circumstances. Key components of this task
are the interplay between the needle holder and the surgical needle. Reliable 3D
localization of needle and instruments in real time could be used to augment
the scene with additional parameters that describe their quantitative geometric
relation, e.g. the relation between the estimated needle plane and its rotation
center and the instrument. This could contribute towards standardization and
training of basic skills and operative techniques, enhance overall surgical
performance, and reduce the risk of complications. The paper proposes an
Augmented Reality environment with quantitative and qualitative visual
representations to enhance laparoscopic training outcomes performed on a
silicone pad. This is enabled by a multi-task supervised deep neural network
which performs multi-class segmentation and depth map prediction. Scarcity of
labels has been overcome by creating a virtual environment that resembles the
surgical training scenario to generate dense depth maps and segmentation maps.
The proposed convolutional neural network was tested on real surgical training
scenarios and shown to be robust to occlusion of the needle. The network
achieves a dice score of 0.67 for surgical needle segmentation, 0.81 for needle
holder instrument segmentation and a mean absolute error of 6.5 mm for depth
estimation.
Gradient Surgery for Multi-Task Learning
Tianhe Yu , Saurabh Kumar , Abhishek Gupta , Sergey Levine , Karol Hausman , Chelsea Finn Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)
While deep learning and deep reinforcement learning (RL) systems have
demonstrated impressive results in domains such as image classification, game
playing, and robotic control, data efficiency remains a major challenge.
Multi-task learning has emerged as a promising approach for sharing structure
across multiple tasks to enable more efficient learning. However, the
multi-task setting presents a number of optimization challenges, making it
difficult to realize large efficiency gains compared to learning tasks
independently. The reasons why multi-task learning is so challenging compared
to single-task learning are not fully understood. In this work, we identify a
set of three conditions of the multi-task optimization landscape that cause
detrimental gradient interference, and develop a simple yet general approach
for avoiding such interference between task gradients. We propose a form of
gradient surgery that projects a task’s gradient onto the normal plane of the
gradient of any other task that has a conflicting gradient. On a series of
challenging multi-task supervised and multi-task RL problems, this approach
leads to substantial gains in efficiency and performance. Further, it is
model-agnostic and can be combined with previously-proposed multi-task
architectures for enhanced performance.
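The projection step described above admits a compact sketch; the order in
which conflicting tasks are visited is an implementation detail not fixed
here:

```python
import torch

def gradient_surgery(grads):
    """Whenever two task gradients conflict (negative dot product), project
    one onto the normal plane of the other. `grads` is a list of flattened
    per-task gradient vectors; returns the surgically altered gradients."""
    out = [g.clone() for g in grads]
    for i, gi in enumerate(out):
        for j, gj in enumerate(grads):
            if i == j:
                continue
            dot = torch.dot(gi, gj)
            if dot < 0:  # conflicting: remove the component along g_j
                gi -= dot / gj.norm() ** 2 * gj
    return out
```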
Journal-ref: IEEE Transactions on Robotics ( Volume: 35 , Issue: 4 , Aug. 2019
)
Subjects:
Robotics (cs.RO)
; Computer Vision and Pattern Recognition (cs.CV)
In this work, we introduce the problem of cross-modal visuo-tactile object
recognition with robotic active exploration. With this term, we mean that the
robot observes a set of objects with visual perception and, later on, it is
able to recognize such objects only with tactile exploration, without having
touched any object before. Using machine learning terminology, in our
application we have a visual training set and a tactile test set, or vice
versa. To tackle this problem, we propose an approach consisting of four
steps: finding a visuo-tactile common representation, defining a suitable set
of features, transferring the features across the domains, and classifying the
objects. We show the results of our approach using a set of 15 objects,
collecting 40 visual examples and five tactile examples for each object. The
proposed approach achieves an accuracy of 94.7%, which is comparable with the
accuracy of the monomodal case, i.e., when using visual data both as training
set and test set. Moreover, it performs well compared to human ability,
which we roughly estimated by carrying out an experiment with ten
participants.
OIAD: One-for-all Image Anomaly Detection with Disentanglement Learning
Shuo Wang , Tianle Chen , Shangyu Chen , Carsten Rudolph , Surya Nepal , Marthie Grobler Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Machine Learning (stat.ML)
Anomaly detection aims to recognize samples with anomalous and unusual
patterns with respect to a set of normal data, which is significant for
numerous domain applications, e.g. in industrial inspection, medical imaging,
and security enforcement. There are two key research challenges associated
with existing anomaly detection approaches: (1) many of them perform well on
low-dimensional problems, but their performance on high-dimensional instances,
such as images, is limited; (2) many of them still rely on traditional
supervised approaches and manual feature engineering, while the topic has not
yet been fully explored with modern deep learning approaches, even when
well-labeled samples are limited. In this paper, we propose a
One-for-all Image Anomaly Detection system (OIAD) based on disentangled
learning using only clean samples. Our key insight is that the impact of small
perturbation on the latent representation can be bounded for normal samples
while anomaly images are usually outside such bounded intervals, called
structure consistency. We implement this idea and evaluate its performance for
anomaly detection. Our experiments with three datasets show that OIAD can
detect over 90% of anomalies while maintaining a low false alarm rate. It can
also detect suspicious samples among samples labeled as clean, coinciding with
what humans would deem unusual.
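The structure-consistency test, as we read it, admits a short sketch (all
names are hypothetical; `encoder` stands for a disentangled encoder trained on
clean samples only): perturb an input slightly, measure the shift of its
latent representation, and flag samples whose shift exceeds a bound calibrated
on normal data.

    import numpy as np

    def latent_shift(encoder, x, sigma=0.01, n_trials=8):
        # Average latent displacement under small input perturbations.
        z = encoder(x)
        shifts = [np.linalg.norm(encoder(x + sigma * np.random.randn(*x.shape)) - z)
                  for _ in range(n_trials)]
        return float(np.mean(shifts))

    def is_anomalous(encoder, x, bound):
        # Flag x when its latent shift leaves the interval calibrated on
        # clean samples -- the structure-consistency idea sketched above.
        return latent_shift(encoder, x) > bound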
Accelerating the Registration of Image Sequences by Spatio-temporal Multilevel Strategies
Comments: Accepted at ISBI 2020
Subjects:
Signal Processing (eess.SP)
; Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Multilevel strategies are an integral part of many image registration
algorithms. These strategies are very well-known for avoiding undesirable local
minima, providing an outstanding initial guess, and reducing overall
computation time. State-of-the-art multilevel strategies build a hierarchy of
discretization in the spatial dimensions. In this paper, we present a
spatio-temporal strategy, where we introduce a hierarchical discretization in
the temporal dimension at each spatial level. This strategy is suitable for a
motion estimation problem where the motion is assumed smooth over time. Our
strategy exploits the temporal smoothness among image frames by following a
predictor-corrector approach. The strategy predicts the motion by a novel
interpolation method and later corrects it by registration. The prediction
step provides a good initial guess for the correction step and hence reduces
the overall computational time for registration. On three examined optical
coherence tomography datasets, we achieve an average speed-up factor of 2.5
over state-of-the-art multilevel methods.
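Schematically, the predictor-corrector idea could look as follows (a sketch
under our own assumptions; `register(fixed, moving, init)` stands for any
spatial multilevel registration routine returning a motion field):

    import numpy as np

    def spatio_temporal_register(frames, register, step=4):
        # Coarse temporal level: register a sparse set of key frames
        # from scratch. Finer temporal level: predict in-between motions
        # by temporal interpolation, then correct them by registration.
        n = len(frames)
        motions = [None] * n
        keys = sorted(set(range(0, n, step)) | {n - 1})
        for t in keys:
            motions[t] = register(frames[0], frames[t], init=None)
        for t in range(n):
            if motions[t] is None:
                lo = max(k for k in keys if k < t)
                hi = min(k for k in keys if k > t)
                w = (t - lo) / (hi - lo)
                predicted = (1 - w) * motions[lo] + w * motions[hi]
                # The warm-started correction step is much cheaper than
                # registering from scratch.
                motions[t] = register(frames[0], frames[t], init=predicted)
        return motions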
CHAOS Challenge — Combined (CT-MR) Healthy Abdominal Organ Segmentation
Comments: 10 pages, 2 figures
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV)
Segmentation of abdominal organs has been a comprehensive, yet unresolved,
research field for many years. In the last decade, intensive developments in
deep learning (DL) have introduced new state-of-the-art segmentation systems.
Although these systems outperform existing ones in overall accuracy, the
effects of DL model properties and parameters on performance are hard to
interpret. This makes comparative analysis a necessary tool for achieving
explainable studies and systems. Moreover, the performance of DL on emerging
learning approaches such as cross-modality and multi-modal tasks has rarely
been discussed. In
order to expand the knowledge in these topics, the CHAOS — Combined (CT-MR)
Healthy Abdominal Organ Segmentation challenge was organized at the IEEE
International Symposium on Biomedical Imaging (ISBI) 2019 in Venice, Italy.
Unlike the large number of previous abdomen-related challenges, the majority
of which focus on tumor/lesion detection and/or classification with a single
modality, CHAOS provides both abdominal CT and MR data from healthy
subjects. Five different and complementary tasks have been designed to analyze
the capabilities of the current approaches from multiple perspectives. The
results are investigated thoroughly, compared with manual annotations and
interactive methods. The outcomes are reported in detail to reflect the latest
advancements in the field. CHAOS challenge and data will be available online to
provide a continuous benchmark resource for segmentation.
Artificial Intelligence
Stochastic Finite State Control of POMDPs with LTL Specifications
Mohamadreza Ahmadi , Rangoli Sharan , Joel W. Burdick Subjects : Artificial Intelligence (cs.AI) ; Formal Languages and Automata Theory (cs.FL); Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
Partially observable Markov decision processes (POMDPs) provide a modeling
framework for autonomous decision making under uncertainty and imperfect
sensing, e.g. robot manipulation and self-driving cars. However, optimal
control of POMDPs is notoriously intractable. This paper considers the
quantitative problem of synthesizing sub-optimal stochastic finite state
controllers (sFSCs) for POMDPs such that the probability of satisfying a set of
high-level specifications in terms of linear temporal logic (LTL) formulae is
maximized. We begin by casting the latter problem as an optimization problem
and using relaxations based on the Poisson equation and McCormick envelopes.
Then, we propose a stochastic bounded policy iteration algorithm, leading to a
controlled growth in sFSC size and an anytime algorithm, where the performance
of the controller improves with successive iterations, but can be stopped by
the user based on time or memory considerations. We illustrate the proposed
method by a robot navigation case study.
Adequate and fair explanations
Nicholas Asher , Soumya Paul , Chris Russell Subjects : Artificial Intelligence (cs.AI)
Explaining sophisticated machine-learning based systems is an important issue
at the foundations of AI. Recent efforts have shown various methods for
providing explanations. These approaches can be broadly divided into two
schools: those that provide a local and human-interpretable approximation of a
machine learning algorithm, and logical approaches that exactly characterise
one aspect of the decision. In this paper we focus upon the second school of
exact explanations with a rigorous logical foundation. There is an
epistemological problem with these exact methods. While they can furnish
complete explanations, such explanations may be too complex for humans to
understand or even to write down in human readable form. Interpretability
requires epistemically accessible explanations, explanations humans can grasp.
Yet what is a sufficiently complete epistemically accessible explanation still
needs clarification. We do this here in terms of counterfactuals, following
[Wachter et al., 2017]. With counterfactual explanations, many of the
assumptions needed to provide a complete explanation are left implicit. To do
so, counterfactual explanations exploit the properties of a particular data
point or sample, and as such are also local as well as partial explanations. We
explore how to move from local partial explanations to what we call complete
local explanations and then to global ones. But to preserve accessibility we
argue for the need for partiality. This partiality makes it possible to hide
explicit biases present in the algorithm that may be injurious or unfair. We
investigate how easy it is to uncover these biases when providing complete and
fair explanations by exploiting the structure of the set of counterfactuals
providing a complete local explanation.
Implementations in Machine Ethics: A Survey
Comments: 37 pages, 7 tables, 4 figures, draft version
Subjects:
Artificial Intelligence (cs.AI)
Increasingly complex and autonomous systems require machine ethics to
maximize the benefits and minimize the risks to society arising from the new
technology. It is challenging to decide which type of ethical theory to employ
and how to implement it effectively. This survey provides a threefold
contribution. Firstly, it introduces a taxonomy to analyze the field of machine
ethics from an ethical, implementational, and technical perspective. Secondly,
an exhaustive selection and description of relevant works is presented.
Thirdly, applying the new taxonomy to the selected works, dominant research
patterns and lessons for the field are identified, and future directions for
research are suggested.
Channels' Confirmation and Predictions' Confirmation: from the Medical Test to the Raven Paradox
Comments: 12 tables, 7 figures
Subjects:
Artificial Intelligence (cs.AI)
; Information Theory (cs.IT); Logic (math.LO)
After long arguments between positivism and falsificationism, the
verification of universal hypotheses was replaced with the confirmation of
uncertain major premises. Unfortunately, Hempel discovered the Raven Paradox
(RP). Then, Carnap used the logical probability increment as the confirmation
measure. Many confirmation measures have since been proposed. Among them,
measure F, proposed by Kemeny and Oppenheim, possesses the symmetries and
asymmetries proposed by Eells and Fitelson, the monotonicity proposed by Greco
et al., and the normalizing property suggested by many researchers. Based on
the semantic
information theory, a measure b* similar to F is derived from the medical test.
Like the likelihood ratio, b* and F can only indicate the quality of channels
or of the testing means, rather than the quality of probability predictions.
Moreover, it is still not easy to use b*, F, or any other measure to clarify
the RP. For this reason, a measure c*, similar to the correct rate, is
derived. The c* has the simple form (a-c)/max(a, c); it supports the Nicod
Criterion and undermines the Equivalence Condition, and hence can be used to
eliminate the RP. Some examples are provided to show why it is difficult to
use any of the popular confirmation measures to eliminate the RP. Measures F,
b*, and c* indicate that the absence of counterexamples is more essential than
the existence of more positive examples, and hence are compatible with
Popper’s falsification thought.
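Since c* is given in closed form, it can be transcribed directly; in the
sketch below, a and c count positive examples and counterexamples
respectively, which is our reading of the notation.

    def c_star(a, c):
        # Confirmation measure c* = (a - c) / max(a, c). A single
        # counterexample outweighs many positive examples, in line
        # with the falsificationist reading above.
        if max(a, c) == 0:
            return 0.0  # no evidence either way (our convention)
        return (a - c) / max(a, c)

    print(c_star(100, 0))   # 1.0
    print(c_star(100, 10))  # 0.9
    print(c_star(10, 100))  # -0.9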
AI Trust in business processes: The need for process-aware explanations
Steve T.K. Jan , Vatche Ishakian , Vinod Muthusamy Subjects : Artificial Intelligence (cs.AI)
Business processes underpin a large number of enterprise operations including
processing loan applications, managing invoices, and insurance claims. There is
a large opportunity for infusing AI to reduce cost or provide better customer
experience, and the business process management (BPM) literature is rich in
machine learning solutions, including unsupervised learning to gain insights
on clusters of process traces, classification models to predict the outcomes,
duration, or paths of partial process traces, models for extracting business
processes from documents, and models to recommend how to optimize a business
process or
navigate decision points. More recently, deep learning models including those
from the NLP domain have been applied to process predictions.
Unfortunately, very few of these innovations have been applied and adopted
by enterprise companies. We assert that a large reason for the lack of adoption
of AI models in BPM is that business users are risk-averse and do not
implicitly trust AI models. There has, unfortunately, been little attention
paid to explaining model predictions to business users with process context. We
challenge the BPM community to build on the AI interpretability literature, and
the AI Trust community to understand
Nicolas Aussel (INF, ACMES-SAMOVAR, IP Paris), Sophie Chabridon (IP Paris, INF, ACMES-SAMOVAR), Yohan Petetin (TIPIC-SAMOVAR, CITI, IP Paris) Subjects : Artificial Intelligence (cs.AI) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Machine Learning has proven useful in recent years as a way to achieve
failure prediction for industrial systems. However, the high computational
resources necessary to run learning algorithms are an obstacle to its
widespread application. The sub-field of Distributed Learning offers a solution
to this problem by enabling the use of remote resources but at the expense of
introducing communication costs in the application that are not always
acceptable. In this paper, we propose a distributed learning approach able to
optimize the use of computational and communication resources to achieve
excellent learning model performance through a centralized architecture. To
achieve this, we present a new centralized distributed learning algorithm that
relies on the learning paradigms of Active Learning and Federated Learning to
provide a communication-efficient method with guarantees of model precision on
both the clients and the central server. We evaluate this method on a public
benchmark and show that its precision is very close to the state-of-the-art
performance of non-distributed learning despite the additional constraints.
A multi-agent ontologies-based clinical decision support system
Comments: in French
Journal-ref: AMINA’2012, Jan 2012, Mahdia, Tunisie
Subjects:
Artificial Intelligence (cs.AI)
Clinical decision support systems combine knowledge and data from a variety
of sources, represented either by quantitative models based on stochastic
methods or by qualitative models based rather on expert heuristics and
deductive reasoning. At the
same time, case-based reasoning (CBR) memorizes and returns the experience of
solving similar problems. The cooperation of heterogeneous clinical knowledge
bases (knowledge objects, semantic distances, evaluation functions, logical
rules, databases…) is based on medical ontologies. A multi-agent decision
support system (MADSS) enables the integration and cooperation of agents
specialized in different fields of knowledge (semiology, pharmacology, clinical
cases, etc.). Each specialist agent operates a knowledge base defining the
conduct to be maintained in conformity with the state of the art associated
with an ontological basis that expresses the semantic relationships between the
terms of the domain in question. Our approach is based on the specialization of
agents adapted to the knowledge models used during the clinical steps and
ontologies. This modular approach is suitable for the realization of MADSS in
many areas.
On Algorithmic Decision Procedures in Emergency Response Systems in Smart and Connected Communities
Comments: Accepted at AAMAS 2020 (International Conference on Autonomous Agents and Multiagent Systems)
Subjects:
Artificial Intelligence (cs.AI)
Emergency Response Management (ERM) is a critical problem faced by
communities across the globe. Despite its importance, it is common for ERM
systems to follow myopic and straightforward decision policies in the real
world. Principled approaches to aid decision-making under uncertainty have been
explored in this context but have failed to be accepted into real systems. We
identify a key issue impeding their adoption – algorithmic approaches to
emergency response focus on reactive, post-incident dispatching actions, i.e.
optimally dispatching a responder after incidents occur. However, the critical
nature of emergency response dictates that when an incident occurs, first
responders always dispatch the closest available responder to the incident. We
argue that the crucial period of planning for ERM systems is not post-incident,
but between incidents. However, this is not a trivial planning problem – a
major challenge with dynamically balancing the spatial distribution of
responders is the complexity of the problem. An orthogonal problem in ERM
systems is to plan under limited communication, which is particularly important
in disaster scenarios that affect communication networks. We address both the
problems by proposing two partially decentralized multi-agent planning
algorithms that utilize heuristics and the structure of the dispatch problem.
We evaluate our proposed approach using real-world data, and find that in
several contexts, dynamically re-balancing the spatial distribution of
emergency responders reduces both the average response time and its variance.
Sampling and Learning for Boolean Function
Chuyu Xiong Subjects : Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
In this article, we continue our study of the universal learning machine by
introducing new tools. We first discuss Boolean functions and Boolean
circuits, and we establish one set of tools, namely, the fitting extremum and
the proper sampling set. We prove the fundamental relationship between the
proper sampling set and the complexity of a Boolean circuit. Armed with this
set of tools, we then introduce much more effective learning strategies. We
show that with such learning strategies and learning dynamics, universal
learning can be achieved while requiring much less data.
AutoMATES: Automated Model Assembly from Text, Equations, and Software
Comments: 8 pages, 6 figures, accepted to Modeling the World’s Systems 2019
Subjects:
Artificial Intelligence (cs.AI)
; Multimedia (cs.MM); Software Engineering (cs.SE)
Models of complicated systems can be represented in different ways – in
scientific papers, they are represented using natural language text as well as
equations. But to be of real use, they must also be implemented as software,
thus making code a third form of representing models. We introduce the
AutoMATES project, which aims to build semantically-rich unified
representations of models from scientific code and publications to facilitate
the integration of computational models from different domains and allow for
modeling large, complicated systems that span multiple domains and levels of
abstraction.
Towards Social Identity in Socio-Cognitive Agents
Diogo Rato , Samuel Mascarenhas , Rui Prada Subjects : Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Current architectures for social agents are designed around some specific
units of social behaviour that address particular challenges. Although their
performance might be adequate for controlled environments, deploying these
agents in the wild is difficult. Moreover, the increasing demand for autonomous
agents capable of living alongside humans calls for the design of more robust
social agents that can cope with diverse social situations. We believe that to
design such agents, their sociality and cognition should be conceived as one.
This includes creating mechanisms for constructing social reality as an
interpretation of the physical world with social meanings and selective
deployment of cognitive resources adequate to the situation. We identify
several design principles that should be considered while designing agent
architectures for socio-cognitive systems. Taking these remarks into account,
we propose a socio-cognitive agent model based on the concept of Cognitive
Social Frames that allow the adaptation of an agent’s cognition based on its
interpretation of its surroundings, its Social Context. Our approach supports
an agent’s reasoning about other social actors and its relationship with them.
Cognitive Social Frames can be built around social groups, and form the basis
for social group dynamics mechanisms and the construction of Social Identity.
The Incentives that Shape Behaviour
Comments: 12 pages, 7 figures, accepted to SafeAI workshop at AAAI
Subjects:
Artificial Intelligence (cs.AI)
; Machine Learning (cs.LG)
Which variables does an agent have an incentive to control with its decision,
and which variables does it have an incentive to respond to? We formalise these
incentives, and demonstrate unique graphical criteria for detecting them in any
single decision causal influence diagram. To this end, we introduce structural
causal influence models, a hybrid of the influence diagram and structural
causal model frameworks. Finally, we illustrate how these incentives predict
agent incentives in both fairness and AI safety applications.
Measuring Diversity of Artificial Intelligence Conferences
Ana Freire , Lorenzo Porcaro , Emilia Gómez Subjects : Artificial Intelligence (cs.AI)
The lack of diversity of the Artificial Intelligence (AI) field is nowadays a
concern, and several initiatives such as funding schemes and mentoring programs
have been designed to fight against it. However, there is no indication of how
these initiatives actually impact AI diversity in the short and long term. This
work studies the concept of diversity in this particular context and proposes a
small set of diversity indicators (i.e. indexes) of AI scientific events. These
indicators are designed to quantify the lack of diversity of the AI field and
monitor its evolution. We consider diversity in terms of gender, geographical
location and business (understood as the presence of academia versus industry).
We compute these indicators for the different communities of a conference:
authors, keynote speakers and organizing committee. From these components we
compute a summarized diversity indicator for each AI event. We evaluate the
proposed indexes for a set of recent major AI conferences and we discuss their
values and limitations.
MOEA/D with Random Partial Update Strategy
Yuri Lavinas , Claus Aranha , Marcelo Ladeira , Felipe Campelo Subjects : Artificial Intelligence (cs.AI)
Recent studies on resource allocation suggest that some subproblems are more
important than others in the context of the MOEA/D, and that focusing on the
most relevant ones can consistently improve the performance of that algorithm.
These studies share the common characteristic of updating only a fraction of
the population at any given iteration of the algorithm. In this work we
investigate a new, simpler partial update strategy, in which a random subset of
solutions is selected at every iteration. The performance of the MOEA/D using
this new resource allocation approach is compared experimentally against that
of the standard MOEA/D-DE and the MOEA/D with relative improvement-based
resource allocation. The results indicate that using the MOEA/D with this new
partial update strategy results in improved HV and IGD values, and a much
higher proportion of non-dominated solutions, particularly as the number of
updated solutions at every iteration is reduced.
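The random partial update is simple enough to sketch; the fragment below
(names hypothetical, not the authors’ code) shows only the selection step
inside a generic MOEA/D generation loop.

    import random

    def moead_generation(population, n_subproblems, evolve_one,
                         update_ratio=0.5):
        # Random Partial Update: evolve and update only a random subset
        # of subproblems this generation, re-drawn every generation.
        # evolve_one(i, population) stands for the usual MOEA/D-DE
        # variation + neighbourhood-update step for subproblem i.
        k = max(1, int(update_ratio * n_subproblems))
        for i in random.sample(range(n_subproblems), k):
            evolve_one(i, population)
        return population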
A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions
Amit Kumar Mondal , Nadeem Jamali Subjects : Artificial Intelligence (cs.AI)
Reinforcement learning is one of the core components in designing an
artificial intelligent system emphasizing real-time response. Reinforcement
learning influences the system to take actions within an arbitrary environment
either having previous knowledge about the environment model or not. In this
paper, we present a comprehensive study on Reinforcement Learning focusing on
various dimensions including challenges, the recent development of different
state-of-the-art techniques, and future directions. The fundamental objective
of this paper is to provide a framework for presenting the available methods
of reinforcement learning that is informative and simple to follow for new
researchers and academics in this domain, considering the latest concerns.
First, we illustrate the core techniques of reinforcement learning in an
easily understandable and comparable way. We then analyze and depict the
recent developments in reinforcement learning approaches. Our analysis points
out that most of the models focus on tuning policy values rather than tuning
other elements in a particular state of reasoning.
Correcting Knowledge Base Assertions
Comments: Accepted by The Web Conference (WWW) 2020
Subjects:
Artificial Intelligence (cs.AI)
The usefulness and usability of knowledge bases (KBs) is often limited by
quality issues. One common issue is the presence of erroneous assertions, often
caused by lexical or semantic confusion. We study the problem of correcting
such assertions, and present a general correction framework which combines
lexical matching, semantic embedding, soft constraint mining and semantic
consistency checking. The framework is evaluated using DBpedia and an
enterprise medical KB.
FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using Human Feedback
Comments: Accepted as Full Paper to International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2020
Subjects:
Artificial Intelligence (cs.AI)
Reinforcement learning has been successful in training autonomous agents to
accomplish goals in complex environments. Although this has been adapted to
multiple settings, including robotics and computer games, human players often
find it easier to obtain higher rewards in some environments than reinforcement
learning algorithms. This is especially true of high-dimensional state spaces
where the reward obtained by the agent is sparse or extremely delayed. In this
paper, we seek to effectively integrate feedback signals supplied by a human
operator with deep reinforcement learning algorithms in high-dimensional state
spaces. We call this FRESH (Feedback-based REward SHaping). During training, a
human operator is presented with trajectories from a replay buffer and then
provides feedback on states and actions in the trajectory. In order to
generalize feedback signals provided by the human operator to previously unseen
states and actions at test-time, we use a feedback neural network. We use an
ensemble of neural networks with a shared network architecture to represent
model uncertainty and the confidence of the neural network in its output. The
output of the feedback neural network is converted to a shaping reward that is
added to the reward provided by the environment. We evaluate our approach
on the Bowling and Skiing Atari games in the arcade learning environment.
Although human experts have been able to achieve high scores in these
environments, state-of-the-art deep learning algorithms perform poorly. We
observe that FRESH is able to achieve much higher scores than state-of-the-art
deep learning algorithms in both environments. FRESH also achieves a 21.4%
higher score than a human expert in Bowling and does as well as a human expert
in Skiing.
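A hedged sketch of how the ensemble’s output could be folded into the reward,
following the description above (the agreement measure and the scaling are our
own assumptions):

    import numpy as np

    def shaped_reward(env_reward, state, action, ensemble,
                      conf_threshold=0.5):
        # Each feedback network maps (state, action) to a predicted
        # human-feedback value in [-1, 1]; ensemble agreement serves
        # as a crude confidence estimate (assumption).
        preds = np.array([net(state, action) for net in ensemble])
        confidence = 1.0 - preds.std()
        if confidence < conf_threshold:
            return env_reward              # ignore low-confidence feedback
        return env_reward + preds.mean()   # add the shaping term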
Fair Transfer of Multiple Style Attributes in Text
Karan Dabas , Nishtha Madan , Vijay Arya , Sameep Mehta , Gautam Singh , Tanmoy Chakraborty Subjects : Artificial Intelligence (cs.AI)
To preserve anonymity and obfuscate their identity on online platforms users
may morph their text and portray themselves as a different gender or
demographic. Similarly, a chatbot may need to customize its communication style
to improve engagement with its audience. This manner of changing the style of
written text has gained significant attention in recent years. Yet these past
research works largely cater to the transfer of single style attributes. The
disadvantage of focusing on a single style alone is that this often results in
target text where other existing style attributes behave unpredictably or are
unfairly dominated by the new style. To counteract this behavior, a style
transfer mechanism is needed that can transfer or control multiple styles
simultaneously and fairly. Through such an approach, one could obtain
obfuscated or rewritten text incorporating a desired degree of multiple soft
styles such as female-quality, politeness, or formalness.
In this work, we demonstrate that the transfer of multiple styles cannot be
achieved by sequentially performing multiple single-style transfers. This is
because each single style-transfer step often reverses or dominates over the
style incorporated by a previous transfer step. We then propose a neural
network architecture for fairly transferring multiple style attributes in a
given text. We test our architecture on the Yelp data set to demonstrate its
superior performance as compared with existing single-style transfers
performed in sequence.
Qi Zhou , Jingjie Zhu , Junwen Zhang , Zhensheng Jia , Bernardo Huberman , Gee-Kung Chang Subjects : Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)
A novel intelligent bandwidth allocation scheme in NG-EPON using
reinforcement learning is proposed and demonstrated for latency management. We
verify the capability of the proposed scheme to achieve <1 ms average latency
under both fixed and dynamic traffic load scenarios. The RL agent
demonstrates an efficient intelligent mechanism to manage the latency, which
provides a promising IBA solution for the next-generation access network.
Deceptive AI Explanations: Creation and Detection
Johannes Schneider , Joshua Handali , Michalis Vlachos , Christian Meske Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Artificial intelligence comes with great opportunities but also great
risks. We investigate to what extent deep learning can be used to create and
detect deceptive explanations that either aim to lure a human into believing a
decision that is not truthful to the model or provide reasoning that is
non-faithful to the decision. Our theoretical insights show some limits of
deception and detection in the absence of domain knowledge. For empirical
evaluation, we focus on text classification. To create deceptive explanations,
we alter explanations originating from GradCAM, a state-of-the-art technique
for creating explanations in neural networks. We evaluate the effectiveness of
deceptive explanations on 200 participants. Our findings indicate that
deceptive explanations can indeed fool humans. Our classifier can detect even
seemingly minor attempts at deception with accuracy that exceeds 80% given
sufficient domain knowledge encoded in the form of training data.
Improving Interaction Quality Estimation with BiLSTMs and the Impact on Dialogue Policy Learning
Comments: Published at SIGDIAL 2019
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Learning suitable and well-performing dialogue behaviour in statistical
spoken dialogue systems has been the focus of research for many years. While
most work which is based on reinforcement learning employs an objective measure
like task success for modelling the reward signal, we use a reward based on
user satisfaction estimation. We propose a novel estimator and show that it
outperforms all previous estimators while learning temporal dependencies
implicitly. Furthermore, we apply this novel user satisfaction estimation model
live in simulated experiments where the satisfaction estimation model is
trained on one domain and applied in many other domains which cover a similar
task. We show that applying this model results in higher estimated
satisfaction, similar task success rates and a higher robustness to noise.
Turing analogues of Gödel statements and computability of intelligence
Comments: This is rewrite from a completely mathematical viewpoint of arXiv:1810.06985 , which is to be withdrawn
Subjects:
Logic in Computer Science (cs.LO)
; Artificial Intelligence (cs.AI)
We show that there is a mathematical obstruction to complete Turing
computability of intelligence. This obstruction can be circumvented only if
human reasoning is fundamentally unsound. The most compelling original argument
for existence of such an obstruction was proposed by Penrose, however Gödel,
Turing and Lucas have also proposed such arguments. We first partially
reformulate the argument of Penrose. In this formulation we argue that his
argument works up to possibility of construction of a certain Gödel
statement. We then completely re-frame the argument in the language of Turing
machines, and by partially defining our subject just enough, we show that a
certain analogue of a Gödel statement, or a Gödel string as we call it in
the language of Turing machines, can be readily constructed directly, without
appeal to the Gödel incompleteness theorem, and thus removing the final
objection.
Provenance for the Description Logic ELHr
Comments: 24 pages
Subjects:
Logic in Computer Science (cs.LO)
; Artificial Intelligence (cs.AI)
We address the problem of handling provenance information in ELHr ontologies.
We consider a setting recently introduced for ontology-based data access, based
on semirings and extending classical data provenance, in which ontology axioms
are annotated with provenance tokens. A consequence inherits the provenance of
the axioms involved in deriving it, yielding a provenance polynomial as an
annotation. We analyse the semantics for the ELHr case and show that the
presence of conjunctions poses various difficulties for handling provenance,
some of which are mitigated by assuming multiplicative idempotency of the
semiring. Under this assumption, we study three problems: ontology completion
with provenance, computing the set of relevant axioms for a consequence, and
query answering.
Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping
Eugenio Bargiacchi , Timothy Verstraeten , Diederik M. Roijers , Ann Nowé Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
We present a new model-based reinforcement learning algorithm, Cooperative
Prioritized Sweeping, for efficient learning in multi-agent Markov decision
processes. The algorithm allows for sample-efficient learning on large problems
by exploiting a factorization to approximate the value function. Our approach
only requires knowledge about the structure of the problem in the form of a
dynamic decision network. Using this information, our method learns a model of
the environment and performs temporal difference updates which affect multiple
joint states and actions at once. Batch updates are additionally performed
which efficiently back-propagate knowledge throughout the factored Q-function.
Our method outperforms the state-of-the-art sparse cooperative Q-learning
algorithm, both on the well-known SysAdmin benchmark and on randomized
environments.
Domain-Aware Dialogue State Tracker for Multi-Domain Dialogue Systems
Vevake Balaraman , Bernardo Magnini Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
In task-oriented dialogue systems the dialogue state tracker (DST) component
is responsible for predicting the state of the dialogue based on the dialogue
history. Current DST approaches rely on a predefined domain ontology, a fact
that limits their effective usage for large scale conversational agents, where
the DST constantly needs to be interfaced with ever-increasing services and
APIs. To overcome this drawback, we propose a domain-aware dialogue state
tracker that is completely data-driven and is modeled to predict dynamic
service schemas. The proposed model utilizes domain and
slot information to extract both domain and slot specific representations for a
given dialogue, and then uses such representations to predict the values of the
corresponding slot. Integrating this mechanism with a pretrained language model
(i.e. BERT), our approach can effectively learn semantic relations.
Node Masking: Making Graph Neural Networks Generalize and Scale Better
Pushkar Mishra , Aleksandra Piktus , Gerard Goossen , Fabrizio Silvestri Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Graph Neural Networks (GNNs) have received a lot of interest in recent times.
From the early spectral architectures that could only operate on undirected
graphs under a transductive learning paradigm to the current state-of-the-art
spatial ones that can be applied inductively to arbitrary graphs, GNNs have
seen significant contributions from the research community. In this paper, we
discuss some theoretical tools to better visualize the operations performed by
state of the art spatial GNNs. We analyze the inner workings of these
architectures and introduce a simple concept, node masking, that allows them to
generalize and scale better. To empirically validate the theory, we perform
several experiments on two widely used benchmark datasets for node
classification in both transductive and inductive settings.
Engineering AI Systems: A Research Agenda
Comments: 13 pages, 3 figures, highlights section
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Deploying machine-learning, and in particular deep-learning, (ML/DL)
solutions in industry-strength, production-quality contexts proves to be
challenging. This requires a structured engineering approach to constructing
and evolving systems that contain ML/DL components. In this paper, we provide
a conceptualization of the typical evolution patterns that companies
experience when employing ML/DL, as well as a framework for integrating ML/DL
components in systems consisting of
multiple types of components. In addition, we provide an overview of the
engineering challenges surrounding AI/ML/DL solutions and, based on that, we
provide a research agenda and overview of open items that need to be addressed
by the research community at large.
Unsupervisedly Learned Representations: Should the Quest be Over?
Comments: submitted to Pattern Recognition Letters
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI)
There exists a Classification accuracy gap of about 20% between our best
methods of generating Unsupervisedly Learned Representations and the accuracy
rates achieved by (naturally Unsupervisedly Learning) humans. We have been
searching for this class of paradigms for at least four decades. It thus may
well be that we are looking in the wrong direction. We present in this paper a
possible solution to this puzzle. We demonstrate that Reinforcement Learning
schemes can learn representations, which may be used for Pattern Recognition
tasks such as Classification, achieving practically the same accuracy as that
of humans. Our main modest contribution lies in the observations that: a. when
applied to a real world environment (e.g. nature itself) Reinforcement Learning
does not require labels, and thus may be considered a natural candidate for the
long sought, accuracy competitive Unsupervised Learning method, and b. in
contrast, when Reinforcement Learning is applied in a simulated or symbolic
processing environment (e.g. a computer program) it does inherently require
labels and should thus be generally classified, with some exceptions, as
Supervised Learning. The corollary of these observations is that further search
for Unsupervised Learning competitive paradigms which may be trained in
simulated environments like many of those found in research and applications
may be futile.
Fast Sequence-Based Embedding with Diffusion Graphs
Comments: Source code available at: this https URL
Journal-ref: CompleNet 2018
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
A graph embedding is a representation of graph vertices in a low-dimensional
space, which approximately preserves properties such as distances between
nodes. Vertex sequence-based embedding procedures use features extracted from
linear sequences of nodes to create embeddings using a neural network. In this
paper, we propose diffusion graphs as a method to rapidly generate vertex
sequences for network embedding. Its computational efficiency is superior to
previous methods due to simpler sequence generation, and it produces more
accurate results. In experiments, we found that the performance relative to
other methods improves with increasing edge density in the graph. In a
community detection task, clustering nodes in the embedding space produces
better results compared to other sequence-based embedding methods.
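One plausible reading of diffusion-based sequence generation (a sketch, not
the paper’s exact procedure) grows a diffusion set outward from a start vertex
and emits vertices in the order they are reached; such sequences can then be
fed to a word2vec-style skip-gram model to obtain the embeddings.

    import random

    def diffusion_sequence(adjacency, start, length):
        # adjacency: dict mapping each vertex to a list of neighbours.
        visited, sequence, frontier = {start}, [start], [start]
        while len(sequence) < length and frontier:
            v = random.choice(frontier)
            unseen = [u for u in adjacency[v] if u not in visited]
            if not unseen:
                frontier.remove(v)   # v is exhausted
                continue
            u = random.choice(unseen)
            visited.add(u)
            frontier.append(u)
            sequence.append(u)
        return sequence

    # Example on a small graph:
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    print(diffusion_sequence(graph, 0, 4))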
Designing for the Long Tail of Machine Learning
Comments: Accepted for presentation in poster format for the ACM CHI’19 Workshop <Emerging Perspectives in Human-Centered Machine Learning>
Subjects:
Human-Computer Interaction (cs.HC)
; Artificial Intelligence (cs.AI)
Recent technical advances have made machine learning (ML) a promising
component to include in end-user-facing systems. However, user experience (UX)
practitioners face challenges in relating ML to existing user-centered design
processes and in navigating the possibilities and constraints of this design
space. Drawing on our own experience, we characterize designing within this
space as navigating trade-offs between data gathering, model development and
designing valuable interactions for a given model performance. We suggest
that the theoretical description of how machine learning performance scales
with training data can guide designers in these trade-offs and also has
implications for prototyping. We exemplify the learning curve’s usage by
arguing that a useful pattern is to design an initial system in a bootstrap
phase that aims to exploit the training effect of data collected at increasing
orders of magnitude.
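The learning-curve reasoning can be made concrete with the common power-law
form error(n) = a + b * n^(-c) (our choice of parameterization, not the
paper’s), which lets a designer extrapolate how much data a target performance
would require.

    import numpy as np

    def fit_learning_curve(sizes, errors):
        # Fit error(n) = a + b * n**(-c) by a grid search over c with
        # linear least squares for (a, b). A crude design-time sketch,
        # not a statistically rigorous procedure.
        best = None
        for c in np.linspace(0.1, 1.0, 10):
            X = np.column_stack([np.ones_like(sizes), sizes ** (-c)])
            (a, b), res, *_ = np.linalg.lstsq(X, errors, rcond=None)
            sse = float(res[0]) if res.size else 0.0
            if best is None or sse < best[0]:
                best = (sse, a, b, c)
        return best[1:]

    sizes = np.array([100.0, 1000.0, 10000.0, 100000.0])
    errors = np.array([0.40, 0.25, 0.18, 0.15])
    a, b, c = fit_learning_curve(sizes, errors)
    print(f"predicted error at n=1e6: {a + b * 1e6 ** (-c):.3f}")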
Learning Diverse Features with Part-Level Resolution for Person Re-Identification
Comments: 8 pages, 5 figures, submitted to IEEE TCSVT
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
Learning diverse features is key to the success of person re-identification.
Various part-based methods have been extensively proposed for learning local
representations, which, however, are still inferior to the best-performing
methods for person re-identification. This paper proposes to construct a strong
lightweight network architecture, termed PLR-OSNet, based on the idea of
Part-Level feature Resolution over the Omni-Scale Network (OSNet) for achieving
feature diversity. The proposed PLR-OSNet has two branches, one branch for
global feature representation and the other branch for local feature
representation. The local branch employs a uniform partition strategy for
part-level feature resolution but produces only a single identity-prediction
loss, which is in sharp contrast to the existing part-based methods. Empirical
evidence demonstrates that the proposed PLR-OSNet achieves state-of-the-art
performance on popular person Re-ID datasets, including Market1501,
DukeMTMC-reID and CUHK03, despite its small model size.
Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach
Carlos Fernandez , Foster Provost , Xintian Han Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Lack of understanding of the decisions made by model-based AI systems is an
important barrier for their adoption. We examine counterfactual explanations as
an alternative for explaining AI decisions. The counterfactual approach defines
an explanation as a set of the system’s data inputs that causally drives the
decision (meaning that removing them changes the decision) and is irreducible
(meaning that removing any subset of the inputs in the explanation does not
change the decision). We generalize previous work on counterfactual
explanations, resulting in a framework that (a) is model-agnostic, (b) can
address features with arbitrary data types, (c) is able to explain decisions
made by complex AI systems that incorporate multiple models, and (d) is
scalable to
large numbers of features. We also propose a heuristic procedure to find the
most useful explanations depending on the context. We contrast counterfactual
explanations with another alternative: methods that explain model predictions
by weighting features according to their importance (e.g., SHAP, LIME). This
paper presents two fundamental reasons why explaining model predictions is not
the same as explaining the decisions made using those predictions, suggesting
we should carefully consider whether importance-weight explanations are
well-suited to explain decisions made by AI systems. Specifically, we show that
(1) features that have a large importance weight for a model prediction may not
actually affect the corresponding decision, and (2) importance weights are
insufficient to communicate whether and how features influence system
decisions. We demonstrate this using several examples, including three detailed
studies using real-world data that compare the counterfactual approach with
SHAP and illustrate various conditions under which counterfactual explanations
explain data-driven decisions better than feature importance weights.
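The causal-drive and irreducibility conditions suggest a simple greedy search,
sketched here under our own assumptions (a black-box `decision` callable, and
‘removing’ a feature meaning replacing it with a baseline value):

    def counterfactual_explanation(x, decision, baseline):
        # Greedy sketch: start from removing all features (assumed to
        # flip the decision), then try putting each feature back,
        # keeping in the explanation only those whose removal is needed
        # to preserve the flip. A production version would verify
        # irreducibility explicitly for non-monotone decision functions.
        original = decision(x)
        removed = set(x)
        for f in sorted(removed):
            trial = removed - {f}
            x_trial = {g: (baseline[g] if g in trial else x[g]) for g in x}
            if decision(x_trial) != original:
                removed = trial  # still flips without removing f
        return removed           # features that causally drive the decision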
Lyceum: An efficient and scalable ecosystem for robot learning
Colin Summers , Kendall Lowrey , Aravind Rajeswaran , Siddhartha Srinivasa , Emanuel Todorov Subjects : Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
We introduce Lyceum, a high-performance computational ecosystem for robot
learning. Lyceum is built on top of the Julia programming language and the
MuJoCo physics simulator, combining the ease-of-use of a high-level programming
language with the performance of native C. In addition, Lyceum has a
straightforward API to support parallel computation across multiple cores and
machines. Overall, depending on the complexity of the environment, Lyceum is
5-30x faster compared to other popular abstractions like OpenAI’s Gym and
DeepMind’s dm-control. This substantially reduces training time for various
reinforcement learning algorithms; and is also fast enough to support real-time
model predictive control through MuJoCo. The code, tutorials, and demonstration
videos can be found at: this http URL .
Dynamic Epistemic Logic Games with Epistemic Temporal Goals
Bastien Maubert , Aniello Murano , Sophie Pinchinat , François Schwarzentruber , Silvia Stranieri Subjects : Logic in Computer Science (cs.LO) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Dynamic Epistemic Logic (DEL) is a logical framework in which one can
describe in great detail how actions are perceived by the agents, and how they
affect the world. DEL games were recently introduced as a way to define classes
of games with imperfect information where the actions available to the players
are described very precisely. This framework makes it possible to define
easily, for instance, classes of games where players can only use public
actions or public announcements. These games have been studied for reachability
objectives, where the aim is to reach a situation satisfying some epistemic
property expressed in epistemic logic; several (un)decidability results have
been established. In this work we show that the decidability results obtained
for reachability objectives extend to a much more general class of winning
conditions, namely those expressible in the epistemic temporal logic LTLK. To
do so we establish that the infinite game structures generated by DEL public
actions are regular, and we describe how to obtain finite representations on
which we rely to solve them.
An interpretable neural network model through piecewise linear approximation
Mengzhuo Guo , Qingpeng Zhang , Xiuwu Liao , Daniel Dajun Zeng Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Most existing interpretable methods explain a black-box model in a post-hoc
manner, which uses simpler models or data analysis techniques to interpret the
predictions after the model is learned. However, they (a) may derive
contradictory explanations on the same predictions given different methods and
data samples, and (b) focus on using simpler models to provide higher
descriptive accuracy at the expense of prediction accuracy. To address these
issues, we propose a hybrid interpretable model that combines a piecewise
linear component and a nonlinear component. The first component describes the
explicit feature contributions by piecewise linear approximation to increase
the expressiveness of the model. The other component uses a multi-layer
perceptron to capture feature interactions and implicit nonlinearity, and
increase the prediction performance. Different from the post-hoc approaches,
the interpretability is obtained once the model is learned in the form of
feature shapes. We also provide a variant to explore higher-order interactions
among features to demonstrate that the proposed model is flexible for
adaptation. Experiments demonstrate that the proposed model can achieve good
interpretability by describing feature shapes while maintaining
state-of-the-art accuracy.
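A toy PyTorch rendering of the hybrid (our construction; bin counts and layer
sizes are illustrative): a per-feature piecewise linear term, whose learned
slopes give the feature shapes, plus an MLP capturing implicit interactions,
summed into the prediction.

    import torch
    import torch.nn as nn

    class HybridInterpretableNet(nn.Module):
        def __init__(self, n_features, n_bins=8, hidden=32):
            super().__init__()
            # Piecewise linear component: one slope per (feature, bin).
            self.register_buffer("bin_edges",
                                 torch.linspace(0, 1, n_bins + 1)[1:-1])
            self.register_buffer("offsets",
                                 torch.arange(n_features) * n_bins)
            self.slopes = nn.Embedding(n_features * n_bins, 1)
            nn.init.zeros_(self.slopes.weight)
            # Nonlinear component for implicit feature interactions.
            self.mlp = nn.Sequential(nn.Linear(n_features, hidden),
                                     nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, x):  # x: (batch, n_features) scaled to [0, 1]
            bins = torch.bucketize(x, self.bin_edges)
            w = self.slopes(bins + self.offsets).squeeze(-1)
            explicit = (w * x).sum(dim=1, keepdim=True)  # feature shapes
            return explicit + self.mlp(x)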
Comments: submitted; 31 pages, 10 tables and 29 figures
Subjects:
Software Engineering (cs.SE)
; Artificial Intelligence (cs.AI)
Architectural patterns provide a reusable architectural solution for commonly
recurring problems that can assist in designing software systems. In this
regard, self-awareness architectural patterns are specialized patterns that
leverage good engineering practices and experiences to help in designing
self-awareness and self-adaptation of a software system. However, domain
knowledge and engineers’ expertise that is built over time are not explicitly
linked to these patterns and the self-aware process. This linkage is important,
as it can enrich the design patterns of these systems, which consequently leads
to more effective and efficient self-aware and self-adaptive behaviours. This
paper is an introductory work that highlights the importance of synergizing
domain expertise into the self-awareness in software systems, relying on
well-defined underlying approaches. In particular, we present a holistic
framework that classifies widely known representations used to obtain and
maintain domain expertise, documenting their nature and the specific rules
that permit different levels of synergy with self-awareness. Drawing on these,
we describe mechanisms that can enrich existing patterns with engineers’
expertise and knowledge of the domain. This, together with the framework,
allows us to codify an intuitive step-by-step methodology that guides
engineers in making design decisions when synergizing domain expertise into
self-awareness and reveals its importance, in an attempt to keep
‘engineers-in-the-loop’.
Through three case studies, we demonstrate how the enriched patterns, the
proposed framework and methodology can be applied in different domains, within
which we quantitatively compare the actual benefits of incorporating engineers’
expertise into self-awareness, at alternative levels of synergies.
A point-wise linear model reveals reasons for 30-day readmission of heart failure patients
Comments: 8 pages, 3 figures
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Heart failure in the United States costs an estimated 30.7 billion dollars
annually, and predictive analysis can decrease the costs due to readmission of
heart failure patients. Deep learning can predict readmissions but does not
give
reasons for its predictions. Ours is the first study on a deep-learning
approach to explaining decisions behind readmission predictions. Additionally,
it provides an automatic patient stratification to explain cohorts of
readmitted patients. The new deep-learning model called a point-wise linear
model is a meta-learning machine of linear models. It generates a logistic
regression model to predict early readmission for each patient. The custom-made
prediction models allow us to analyze feature importance. We evaluated the
approach using a dataset of heart-failure patients readmitted within 30 days.
This study has been submitted to PLOS ONE; in advance, we would like to share
the theoretical aspects of the point-wise linear model as a part of our study.
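The core construction, a meta-learner that emits a per-patient logistic
regression, can be sketched as follows (a schematic of our own; layer sizes
are placeholders):

    import torch
    import torch.nn as nn

    class PointWiseLinearModel(nn.Module):
        # Generates, for each input x, the weights and bias of a
        # logistic regression applied to that same x; the generated
        # weights double as per-patient feature importances.
        def __init__(self, n_features, hidden=64):
            super().__init__()
            self.generator = nn.Sequential(
                nn.Linear(n_features, hidden), nn.ReLU(),
                nn.Linear(hidden, n_features + 1))

        def forward(self, x):  # x: (batch, n_features)
            params = self.generator(x)
            w, b = params[:, :-1], params[:, -1]
            logit = (w * x).sum(dim=1) + b
            return torch.sigmoid(logit)  # readmission probability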
SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions
Ramprasaath R. Selvaraju , Purva Tendulkar , Devi Parikh , Eric Horvitz , Marco Ribeiro , Besmira Nushi , Ece Kamar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Existing VQA datasets contain questions with varying levels of complexity.
While the majority of questions in these datasets require perception for
recognizing existence, properties, and spatial relationships of entities, a
significant portion of questions pose challenges that correspond to reasoning
tasks — tasks that can only be answered through a synthesis of perception and
knowledge about the world, logic and/or reasoning. This distinction allows us
to notice when existing VQA models have consistency issues — they answer the
reasoning question correctly but fail on associated low-level perception
questions. For example, models answer the complex reasoning question “Is the
banana ripe enough to eat?” correctly, but fail on the associated perception
question “Are the bananas mostly green or yellow?” indicating that the model
likely answered the reasoning question correctly but for the wrong reason. We
quantify the extent to which this phenomenon occurs by creating a new Reasoning
split of the VQA dataset and collecting Sub-VQA, a new dataset consisting of
200K new perception questions which serve as sub questions corresponding to the
set of perceptual tasks needed to effectively answer the complex reasoning
questions in the Reasoning split. Additionally, we propose an approach called
Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the
model to attend to the same parts of the image when answering the reasoning
question and the perception sub-questions. We show that SQuINT improves model
consistency by 7.8%, marginally improves performance on the Reasoning
questions in VQA, and displays qualitatively better attention maps.
Teaching Software Engineering for AI-Enabled Systems
Comments: to be published in ICSE-SEET 2020
Subjects:
Software Engineering (cs.SE)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Software engineers have significant expertise to offer when building
intelligent systems, drawing on decades of experience and methods for building
systems that are scalable, responsive and robust, even when built on unreliable
components. Systems with artificial-intelligence or machine-learning (ML)
components raise new challenges and require careful engineering. We designed a
new course to teach software-engineering skills to students with a background
in ML. We specifically go beyond traditional ML courses that teach modeling
techniques under artificial conditions and focus, in lecture and assignments,
on realism with large and changing datasets, robust and evolvable
infrastructure, and purposeful requirements engineering that considers ethics
and fairness as well. We describe the course and our infrastructure and share
experience and all material from teaching the course for the first time.
How do Data Science Workers Collaborate? Roles, Workflows, and Tools
Comments: CSCW’2020
Subjects:
Human-Computer Interaction (cs.HC)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE); Machine Learning (stat.ML)
Today, the prominence of data science within organizations has given rise to
teams of data science workers collaborating on extracting insights from data,
as opposed to individual data scientists working alone. However, we still lack
a deep understanding of how data science workers collaborate in practice. In
this work, we conducted an online survey with 183 participants who work in
various aspects of data science. We focused on their reported interactions with
each other (e.g., managers with engineers) and with different tools (e.g.,
Jupyter Notebook). We found that data science teams are extremely collaborative
and work with a variety of stakeholders and tools during the six common steps
of a data science workflow (e.g., clean data and train model). We also found
that the collaborative practices workers employ, such as documentation, vary
according to the kinds of tools they use. Based on these findings, we discuss
design implications for supporting data science team collaborations and future
research directions.
Learning to See Analogies: A Connectionist Exploration
Comments: 191 pages, PhD thesis
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
This dissertation explores the integration of learning and analogy-making
through the development of a computer program, called Analogator, that learns
to make analogies by example. By “seeing” many different analogy problems,
along with possible solutions, Analogator gradually develops an ability to make
new analogies. That is, it learns to make analogies by analogy. This approach
stands in contrast to most existing research on analogy-making, in which
typically the a priori existence of analogical mechanisms within a model is
assumed. The present research extends standard connectionist methodologies by
developing a specialized associative training procedure for a recurrent network
architecture. The network is trained to divide input scenes (or situations)
into appropriate figure and ground components. Seeing one scene in terms of a
particular figure and ground provides the context for seeing another in an
analogous fashion. After training, the model is able to make new analogies
between novel situations. Analogator has much in common with lower-level
perceptual models of categorization and recognition; it thus serves as a
unifying framework encompassing both high-level analogical learning and
low-level perception. This approach is compared and contrasted with other
computational models of analogy-making. The model’s training and generalization
performance is examined, and limitations are discussed.
Graph Ordering: Towards the Optimal by Learning
Comments: 11 pages
Subjects:
Social and Information Networks (cs.SI)
; Artificial Intelligence (cs.AI)
Graph representation learning has achieved remarkable success in many
graph-based applications, such as node classification, link prediction, and
community detection. These models are usually designed to preserve vertex
information at different granularities and to reduce problems in discrete
space to machine learning tasks in continuous space. However, despite this
fruitful progress, some graph applications, such as graph compression and edge
partitioning, are hard to reduce to graph representation learning tasks.
Moreover, these problems are closely related to finding a global layout for a
specific graph, which is an important NP-hard combinatorial optimization
problem: graph ordering. In this paper, we propose to attack the graph
ordering problem behind such applications with a novel learning approach. In
contrast to greedy algorithms based on predefined heuristics, we propose a
neural network model, the Deep Order Network (DON), to capture the hidden
locality structure from partial vertex order sets. Supervised by sampled
partial orders, DON learns to infer unseen combinations. Furthermore, to
alleviate the combinatorial explosion in the training space of DON and to make
partial vertex order sampling efficient, we employ a reinforcement learning
model, the Policy Network, to automatically adjust the partial order sampling
probabilities during the training phase of DON. In this way, the Policy
Network improves training efficiency and automatically guides DON towards a
more effective model. Comprehensive experiments on both synthetic and real data
validate that DON-RL outperforms the current state-of-the-art heuristic
algorithm consistently. Two case studies on graph compression and edge
partitioning demonstrate the potential power of DON-RL in real applications.
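To make the greedy decoding idea concrete, the following is a minimal PyTorch
sketch of ordering with a learned pairwise scorer. The scorer stands in for
DON, and the window size w plays the role of the locality the model is meant
to capture; the feature function, architecture, and w are illustrative
assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class PairScorer(nn.Module):
    """Scores a pair of vertex feature vectors; a stand-in for DON."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, x_u, x_v):
        return self.net(torch.cat([x_u, x_v], dim=-1)).squeeze(-1)

def greedy_order(features, scorer, w=3):
    """Greedily append the vertex with the highest learned affinity to the
    last w placed vertices (a locality-preserving heuristic)."""
    n = features.size(0)
    order, remaining = [0], set(range(1, n))
    while remaining:
        recent = order[-w:]
        best = max(remaining, key=lambda v: sum(
            scorer(features[u], features[v]).item() for u in recent))
        order.append(best)
        remaining.remove(best)
    return order

features = torch.randn(8, 16)            # toy vertex features
print(greedy_order(features, PairScorer(16)))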
Multi-agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning
Samaneh Hosseini Semnani , Hugh Liu , Michael Everett , Anton de Ruiter , Jonathan P. How Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
This paper introduces a hybrid algorithm of deep reinforcement learning (RL)
and force-based motion planning (FMP) to solve the distributed motion planning
problem in dense and dynamic environments. Individually, RL and FMP each have
their own limitations: FMP cannot produce time-optimal paths, and existing RL
solutions cannot produce collision-free paths in dense environments. We
therefore first improve the performance of recent RL approaches by introducing
a new reward function that not only eliminates the requirement for a preceding
supervised learning (SL) step but also decreases the chance of collision in
crowded environments. This improves performance, but many failure cases
remain. We thus develop a hybrid approach that falls back on the simpler FMP
method in stuck, simple, and high-risk cases, and continues using RL for the
normal cases in which FMP cannot produce time-optimal paths. We also extend
the GA3C-CADRL algorithm to 3D environments. Simulation results show that the
proposed algorithm outperforms both deep RL and FMP alone, producing up to 50%
more successful scenarios than deep RL and requiring up to 75% less extra time
to reach the goal than FMP.
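The switching logic can be illustrated with a short, hedged sketch; the
planner stub, thresholds, and state features below are assumptions for
illustration, and the paper's FMP and GA3C-CADRL components are far richer.

import numpy as np

def fmp_action(pos, goal, neighbors, k_att=1.0, k_rep=0.5):
    """Toy force-based planner: attraction to goal, repulsion from agents."""
    force = k_att * (goal - pos)
    for n in neighbors:
        d = pos - n
        dist = np.linalg.norm(d) + 1e-6
        force += k_rep * d / dist**3
    return force / (np.linalg.norm(force) + 1e-6)

def hybrid_action(rl_policy, pos, goal, neighbors, recent_progress,
                  risk_radius=1.0, stuck_eps=0.05):
    """Choose FMP in stuck or high-risk cases, RL otherwise."""
    min_dist = min((np.linalg.norm(pos - n) for n in neighbors),
                   default=np.inf)
    stuck = recent_progress < stuck_eps     # barely moved toward goal lately
    high_risk = min_dist < risk_radius      # another agent dangerously close
    if stuck or high_risk:
        return fmp_action(pos, goal, neighbors)
    return rl_policy(pos, goal, neighbors)

# usage with a dummy RL policy that heads straight for the goal
rl_policy = lambda p, g, ns: (g - p) / (np.linalg.norm(g - p) + 1e-6)
print(hybrid_action(rl_policy, np.zeros(2), np.array([5.0, 0.0]),
                    [np.array([0.4, 0.1])], recent_progress=0.2))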
The Risk to Population Health Equity Posed by Automated Decision Systems: A Narrative Review
Comments: 22 pages (12 pages excluding references), 1 figure
Subjects:
Computers and Society (cs.CY)
; Artificial Intelligence (cs.AI)
Artificial intelligence is already ubiquitous, and is increasingly being used
to autonomously make ever more consequential decisions. However, there has been
relatively little research into the consequences for equity of the use of
narrow AI and automated decision systems in medicine and public health. A
narrative review using a hermeneutic approach was undertaken to explore current
and future uses of AI in medicine and public health, issues that have emerged,
and longer-term implications for population health. Accounts in the literature
reveal a tremendous expectation on AI to transform medical and public health
practices, especially regarding precision medicine and precision public health.
Automated decisions being made about disease detection, diagnosis, treatment,
and health funding allocation have significant consequences for individual and
population health and wellbeing. Meanwhile, it is evident that issues of bias,
incontestability, and erosion of privacy have emerged in sensitive domains
where narrow AI and automated decision systems are in common use. As the use of
automated decision systems expands, it is probable that these same issues will
manifest widely in medicine and public health applications. Bias,
incontestability, and erosion of privacy are mechanisms by which existing
social, economic and health disparities are perpetuated and amplified. The
implication is that there is a significant risk that use of automated decision
systems in health will exacerbate existing population health inequities. The
industrial scale and rapidity with which automated decision systems can be
applied to whole populations heightens the risk to population health equity.
There is a need therefore to design and implement automated decision systems
with care, monitor their impact over time, and develop capacities to respond to
issues as they emerge.
Siamese Graph Neural Networks for Data Integration
Evgeny Krivosheev , Mattia Atzeni , Katsiaryna Mirylenka , Paolo Scotton , Fabio Casati Subjects : Databases (cs.DB) ; Artificial Intelligence (cs.AI)
Data integration has been studied extensively for decades and approached from
different angles. However, this domain still remains largely rule-driven and
lacks universal automation. Recent developments in machine learning, and in
particular deep learning, have opened the way to more general and more efficient
solutions to data integration problems. In this work, we propose a general
approach to modeling and integrating entities from structured data, such as
relational databases, as well as unstructured sources, such as free text from
news articles. Our approach is designed to explicitly model and leverage
relations between entities, thereby using all available information and
preserving as much context as possible. This is achieved by combining siamese
and graph neural networks to propagate information between connected entities
and support high scalability. We evaluate our method on the task of integrating
data about business entities, and we demonstrate that it outperforms standard
rule-based systems, as well as other deep learning approaches that do not use
graph-based representations.
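As a rough illustration of the siamese half of this design, the following
sketch trains a weight-shared encoder with a contrastive loss so that records
of the same entity land close in embedding space. The paper additionally
propagates information between connected entities with graph neural networks,
which this toy (a plain MLP on random data) omits.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, in_dim, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, emb_dim))

    def forward(self, a, b):
        # The same weight-shared encoder embeds both entity records.
        return self.encoder(a), self.encoder(b)

def contrastive_loss(za, zb, match, margin=1.0):
    """match=1 for records describing the same entity, 0 otherwise."""
    d = F.pairwise_distance(za, zb)
    return (match * d.pow(2) +
            (1 - match) * F.relu(margin - d).pow(2)).mean()

enc = SiameseEncoder(in_dim=16)
a, b = torch.randn(4, 16), torch.randn(4, 16)
match = torch.tensor([1.0, 0.0, 1.0, 0.0])
za, zb = enc(a, b)
loss = contrastive_loss(za, zb, match)
loss.backward()
print(float(loss))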
Activism by the AI Community: Analysing Recent Achievements and Future Prospects
Comments: Forthcoming in Proceedings of the 2020 AAAI/ACM Conference on Artificial Intelligence, Ethics and Society. 7 pages
Subjects:
Computers and Society (cs.CY)
; Artificial Intelligence (cs.AI)
The artificial intelligence (AI) community has recently engaged in activism
in relation to its employers, other members of the community, and its
governments in order to shape the societal and ethical implications of AI. It
has achieved some notable successes, but prospects for further political
organising and activism are uncertain. We survey activism by the AI community
over the last six years; apply two analytical frameworks drawing upon the
literature on epistemic communities, and worker organising and bargaining; and
explore what they imply for the future prospects of the AI community. Success
thus far has hinged on a coherent shared culture, and high bargaining power due
to the high demand for a limited supply of AI talent. Both are crucial to the
future of AI activism and worthy of sustained attention.
Comments: 14 pages, review. arXiv admin note: text overlap with arXiv:1810.05587 by other authors
Subjects:
Computer Science and Game Theory (cs.GT)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Deep reinforcement learning (RL) has achieved outstanding results in recent
years, which has led to a dramatic increase in the number of methods and
applications. Recent works explore learning beyond single-agent scenarios and
consider multi-agent settings. However, these works face many challenges and
seek help from traditional game-theoretic algorithms, which, in turn, show
bright promise when combined with modern algorithms and growing computing
power. In this survey, we first introduce basic concepts and algorithms in
single-agent RL and multi-agent systems; then, we summarize
the related algorithms from three aspects. Solution concepts from game theory
give inspiration to algorithms which try to evaluate the agents or find better
solutions in multi-agent systems. Fictitious self-play has become popular and
has had a great impact on multi-agent reinforcement learning algorithms.
Counterfactual regret minimization is an important tool to solve games with
incomplete information, and has shown great strength when combined with deep
learning.
A Survey on the Use of Preferences for Virtual Machine Placement in Cloud Data Centers
Comments: 40 pages, 5 figures, 6 tables
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
With the rapid development of virtualization techniques, cloud data centers
allow for cost-effective, flexible, and customizable deployments of
applications on virtualized infrastructure. Virtual machine (VM) placement aims
to assign each virtual machine to a server in the cloud environment. VM
Placement is of paramount importance to the design of cloud data centers.
Typically, VM placement involves complex relations and multiple design factors
as well as local policies that govern the assignment decisions. It also
involves different constituents including cloud administrators and customers
that might have disparate preferences while opting for a placement solution.
Thus, it is often valuable to not only return an optimized solution to the VM
placement problem but also a solution that reflects the given preferences of
the constituents. In this paper, we provide a detailed review on the role of
preferences in the recent literature on VM placement. We further discuss key
challenges and identify possible research opportunities to better incorporate
preferences within the context of VM placement.
Information Retrieval
Hybrid Semantic Recommender System for Chemical Compounds
Comments: 8 pages, 3 figures, accepted as short-paper to the 42nd European Conference on Information Retrieval (ECIR), Lisbon 2020
Subjects:
Information Retrieval (cs.IR)
Recommending chemical compounds of interest to a particular researcher is a
poorly explored field. The few existing datasets with information about the
preferences of researchers use implicit feedback. The lack of recommender
systems in this particular field presents a challenge for the development of
new recommendation models. In this work, we propose a hybrid recommender model
for recommending chemical compounds. The model integrates
collaborative-filtering algorithms for implicit feedback (Alternating Least
Squares (ALS) and Bayesian Personalized Ranking (BPR)) and semantic similarity
between the chemical compounds in the ChEBI ontology (ONTO). We evaluated the
model on an implicit-feedback dataset of chemical compounds, CheRM. The hybrid
model was able to improve the results of state-of-the-art
collaborative-filtering algorithms, especially for Mean Reciprocal Rank, with
an increase of 6.7% when comparing the collaborative-filtering ALS and the
hybrid ALS_ONTO.
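A hedged sketch of the hybrid scoring idea follows: a collaborative-filtering
score from latent factors (as ALS would produce on implicit feedback) is
blended with an ontology-based similarity to compounds the researcher already
interacted with. The blend weight alpha, the max-similarity aggregation, and
the stub similarity table are illustrative assumptions, not the paper's exact
formulation.

import numpy as np

def als_score(user_factors, item_factors, user, item):
    """Dot product of latent factors, as in implicit-feedback ALS."""
    return user_factors[user] @ item_factors[item]

def hybrid_score(user, item, user_factors, item_factors,
                 user_profile_items, onto_sim, alpha=0.7):
    """alpha * CF score + (1 - alpha) * max ontology similarity to the
    compounds the user already interacted with."""
    cf = als_score(user_factors, item_factors, user, item)
    sem = max((onto_sim[(i, item)] for i in user_profile_items), default=0.0)
    return alpha * cf + (1 - alpha) * sem

rng = np.random.default_rng(0)
U, I = rng.normal(size=(5, 8)), rng.normal(size=(10, 8))
onto_sim = {(i, j): 0.5 for i in range(10) for j in range(10)}  # stub
print(hybrid_score(user=0, item=3, user_factors=U, item_factors=I,
                   user_profile_items=[1, 2], onto_sim=onto_sim))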
BiOnt: Deep Learning using Multiple Biomedical Ontologies for Relation Extraction
Comments: Accepted as ECIR 2020 Short Paper
Subjects:
Information Retrieval (cs.IR)
Successful biomedical relation extraction can provide evidence to researchers
and clinicians about possible unknown associations between biomedical entities,
advancing the current knowledge we have about those entities and their inherent
mechanisms. Most biomedical relation extraction systems do not resort to
external sources of knowledge, such as domain-specific ontologies. However,
using deep learning methods, along with biomedical ontologies, has been
recently shown to effectively advance the biomedical relation extraction field.
To perform relation extraction, our deep learning system, BiOnt, employs four
types of biomedical ontologies, namely, the Gene Ontology, the Human Phenotype
Ontology, the Human Disease Ontology, and the Chemical Entities of Biological
Interest, regarding gene-products, phenotypes, diseases, and chemical
compounds, respectively. We tested our system with three data sets that
represent three different types of relations of biomedical entities. BiOnt
achieved, in F-score, an improvement of 4.93 percentage points for drug-drug
interactions (DDI corpus), 4.99 percentage points for phenotype-gene relations
(PGR corpus), and 2.21 percentage points for chemical-induced disease relations
(BC5CDR corpus), relative to the state of the art. The code supporting this
system is available at this https URL.
Quantum-like Structure in Multidimensional Relevance Judgements
Comments: To be published at 42nd European Conference on Research in IR, ECIR 2020
Subjects:
Information Retrieval (cs.IR)
A large number of studies in cognitive science have revealed that
probabilistic outcomes of certain human decisions do not agree with the axioms
of classical probability theory. The field of Quantum Cognition provides an
alternative probabilistic model to explain such paradoxical findings. It posits
that cognitive systems have an underlying quantum-like structure, especially in
decision-making under uncertainty. In this paper, we hypothesise that relevance
judgement, being a multidimensional, cognitive concept, can be used to probe
the quantum-like structure for modelling users’ cognitive states in information
seeking. Building on an experimental protocol inspired by the Stern-Gerlach
experiment in Quantum Physics, we design a crowd-sourced user study to show
violation of the Kolmogorovian probability axioms as a proof of the
quantum-like structure, and provide a comparison between a quantum
probabilistic model and a Bayesian model for predictions of relevance.
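A minimal formal illustration of the kind of violation at stake (not the
paper's data): in Kolmogorovian probability, conjunction is commutative, so
the order in which two relevance dimensions A and B are judged cannot matter,

  P(A and B) = P(B and A).

A quantum-like model instead represents the two judgements as projectors
Pi_A and Pi_B acting on a cognitive state psi, and sequential judgement
probabilities may exhibit order effects whenever the projectors do not
commute:

  || Pi_B Pi_A psi ||^2  !=  || Pi_A Pi_B psi ||^2
  whenever Pi_A Pi_B != Pi_B Pi_A.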
Common Conversational Community Prototype: Scholarly Conversational Assistant
Krisztian Balog , Lucie Flekova , Matthias Hagen , Rosie Jones , Martin Potthast , Filip Radlinski , Mark Sanderson , Svitlana Vakulenko , Hamed Zamani Subjects : Information Retrieval (cs.IR)
This paper discusses the potential for creating academic resources (tools,
data, and evaluation approaches) to support research in conversational search,
by focusing on realistic information needs and conversational interactions.
Specifically, we propose to develop and operate a prototype conversational
search system for scholarly activities. This Scholarly Conversational Assistant
would serve as a useful tool, a means to create datasets, and a platform for
running evaluation challenges by groups across the community. This article
results from discussions of a working group at Dagstuhl Seminar 19461 on
Conversational Search.
On the Minimum Achievable Age of Information for General Service-Time Distributions
Jaya Prakash Champati , Ramana R. Avula , Tobias J. Oechtering , James Gross Subjects : Information Retrieval (cs.IR) ; Information Theory (cs.IT)
There is a growing interest in analysing the freshness of data in networked
systems. Age of Information (AoI) has emerged as a popular metric to quantify
this freshness at a given destination. There has been a significant research
effort in optimizing this metric in communication and networking systems under
different settings. In contrast to previous works, we are interested in a
fundamental question: what is the minimum achievable AoI in any
single-server-single-source queuing system for a given service-time
distribution? To address this question, we study a problem of optimizing AoI
under service preemptions. Our main result is on the characterization of the
minimum achievable average peak AoI (PAoI). We obtain this result by showing
that a fixed-threshold policy is optimal in the set of all randomized-threshold
causal policies. We use the characterization to provide necessary and
sufficient conditions for the service-time distributions under which
preemptions are beneficial.
Information Foraging for Enhancing Implicit Feedback in Content-based Image Recommendation
Comments: FIRE ’19: Proceedings of the 11th Forum for Information Retrieval Evaluation
Subjects:
Information Retrieval (cs.IR)
; Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
User implicit feedback plays an important role in recommender systems.
However, finding implicit features is a tedious task. This paper aims to
identify users’ preferences through implicit behavioural signals for image
recommendation based on the Information Scent Model of Information Foraging
Theory. In the first part, we hypothesise that the users’ perception is
improved with visual cues in the images as behavioural signals that provide
users’ information scent during information seeking. We designed a
content-based image recommendation system to explore which image attributes
(i.e., visual cues or bookmarks) help users find their desired image. We found
that users prefer recommendations predicated by visual cues and therefore
consider the visual cues as good information scent for their information
seeking. In the second part, we investigated whether visual cues in the images
together with the images themselves can be better perceived by users than
either on its own. We evaluated the information scent artifacts in image
recommendation on the Pinterest image collection and the WikiArt dataset. We
find that our proposed image recommendation system supports implicit signals
through the information scent model of Information Foraging Theory.
Ranking Significant Discrepancies in Clinical Reports
Comments: ECIR 2020 (short)
Subjects:
Information Retrieval (cs.IR)
; Computation and Language (cs.CL)
Medical errors are a major public health concern and a leading cause of death
worldwide. Many healthcare centers and hospitals use reporting systems where
medical practitioners write a preliminary medical report and the report is
later reviewed, revised, and finalized by a more experienced physician. The
revisions range from stylistic to corrections of critical errors or
misinterpretations of the case. Due to the large quantity of reports written
daily, it is often difficult to manually and thoroughly review all the
finalized reports to find such errors and learn from them. To address this
challenge, we propose a novel ranking approach, consisting of textual and
ontological overlaps between the preliminary and final versions of reports. The
approach learns to rank the reports based on the degree of discrepancy between
the versions. This allows medical practitioners to easily identify and learn
from the reports in which their interpretation most substantially differed from
that of the attending physician (who finalized the report). This is a crucial
step towards uncovering potential errors and helping medical practitioners to
learn from such errors, thus improving patient-care in the long run. We
evaluate our model on a dataset of radiology reports and show that our approach
outperforms both previously-proposed approaches and more recent language models
by 4.5% to 15.4%.
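The core signal can be illustrated with a hedged toy: the paper learns a
ranking model over richer textual and ontological features, whereas the
tokenizer and the 1 - Jaccard score here are illustrative assumptions.

import re

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def discrepancy(preliminary, final):
    """1 - Jaccard overlap of the two reports' token sets: lower overlap
    means a larger revision, hence a higher rank."""
    a, b = tokens(preliminary), tokens(final)
    return 1.0 - len(a & b) / max(len(a | b), 1)

reports = [
    ("no acute findings", "no acute cardiopulmonary findings"),
    ("possible small effusion",
     "large left pleural effusion with compression"),
]
ranked = sorted(reports, key=lambda pf: discrepancy(*pf), reverse=True)
for pre, fin in ranked:
    print(f"{discrepancy(pre, fin):.2f}  {pre!r} -> {fin!r}")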
Random-walk Based Generative Model for Classifying Document Networks
Comments: 7pages, 3 figures
Subjects:
Physics and Society (physics.soc-ph)
; Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Document networks are found in various collections of real-world data, such
as citation networks, hyperlinked web pages, and online social networks. A
large number of generative models have been proposed because they offer
intuitive and useful pictures for analyzing document networks. Prominent
examples are relational topic models, where documents are linked according to
their topic similarities. However, existing generative models do not make full
use of network structures because they are largely dependent on topic modeling
of documents. In particular, centrality of graph nodes is missing in generative
processes of previous models. In this paper, we propose a novel generative
model for document networks by introducing random walkers on networks to
integrate the node centrality into link generation processes. The developed
method is evaluated in semi-supervised classification tasks with real-world
citation networks. We show that the proposed model outperforms existing
probabilistic approaches especially in detecting communities in connected
networks.
Finding temporal patterns using algebraic fingerprints
Suhas Thejaswi , Aristides Gionis Subjects : Data Structures and Algorithms (cs.DS) ; Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)
In this paper we study a family of pattern-detection problems in
vertex-colored temporal graphs. In particular, given a vertex-colored temporal
graph and a multi-set of colors as a query, we search for temporal paths in the
graph that contain the colors specified in the query. These types of problems
have several interesting applications, for example, recommending tours for
tourists, or searching for abnormal behavior in a network of financial
transactions.
For the family of pattern-detection problems we define, we establish
complexity results and design an algebraic-algorithmic framework based on
constrained multilinear sieving. We demonstrate that our solution can scale to
massive graphs with up to hundred million edges, despite the problems being
NP-hard. Our implementation, which is publicly available, is highly optimized
and exhibits practical edge-linear scalability. For example, in a real-world
graph dataset with more than six million edges and a multi-set query with ten
colors, we can extract an optimal solution in less than eight minutes on a
Haswell desktop with four cores.
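For intuition about the problem itself, here is a hedged brute-force sketch;
the paper's constrained multilinear sieving is what makes the problem
tractable at scale, so this exponential DFS only illustrates what is being
searched for.

from collections import Counter, defaultdict

def find_temporal_path(edges, colors, query):
    """edges: (u, v, t) triples; colors: vertex -> color; query: a color
    multiset. Returns a time-respecting simple path whose vertex colors
    match the query exactly, or None."""
    need = Counter(query)
    adj = defaultdict(list)
    for u, v, t in edges:
        adj[u].append((v, t))

    def dfs(v, t, remaining, path):
        if remaining[colors[v]] == 0:
            return None
        remaining[colors[v]] -= 1
        path.append(v)
        if sum(remaining.values()) == 0:
            return list(path)
        for w, t2 in adj[v]:
            if t2 > t and w not in path:      # strictly increasing times
                found = dfs(w, t2, remaining, path)
                if found:
                    return found
        path.pop()
        remaining[colors[v]] += 1
        return None

    for start in colors:
        found = dfs(start, float("-inf"), Counter(need), [])
        if found:
            return found
    return None

edges = [(1, 2, 5), (2, 3, 7), (1, 3, 4)]
colors = {1: "red", 2: "blue", 3: "red"}
print(find_temporal_path(edges, colors, ["red", "blue"]))  # e.g. [1, 2]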
Audio Summarization with Audio Features and Probability Distribution Divergence
Comments: 20th International Conference on Computational Linguistics and Intelligent Text Processing
Subjects:
Computation and Language (cs.CL)
; Information Retrieval (cs.IR)
The automatic summarization of multimedia sources is an important task that
facilitates a user's understanding by condensing the source while maintaining
relevant information. In this paper we focus on audio summarization based on
audio features and probability distribution divergence. Our method, based on
an extractive summarization approach, aims to select the most relevant
segments until a time threshold is reached. It takes into account each
segment's length, position, and informativeness value. The informativeness of
each segment is obtained by mapping a set of audio features, derived from its
Mel-frequency cepstral coefficients, to a corresponding Jensen-Shannon
divergence score. Results over a multi-evaluator scheme show that our approach
provides understandable and informative summaries.
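A hedged sketch of the informativeness computation: how MFCCs are turned into
probability vectors, the segment length, and the synthetic input are
illustrative assumptions, not the authors' exact pipeline.

import numpy as np
import librosa
from scipy.spatial.distance import jensenshannon

def mfcc_distribution(y, sr):
    """Mean MFCC vector, shifted and normalised into a probability vector."""
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    m = m - m.min() + 1e-9
    return m / m.sum()

def segment_scores(y, sr, seg_seconds=5.0):
    """Score segments by Jensen-Shannon divergence from the whole recording;
    higher divergence = more distinctive, hence more informative."""
    global_dist = mfcc_distribution(y, sr)
    hop = int(seg_seconds * sr)
    return [(start / sr,
             jensenshannon(mfcc_distribution(y[start:start + hop], sr),
                           global_dist))
            for start in range(0, len(y) - hop + 1, hop)]

sr = 22050
t = np.linspace(0, 20, 20 * sr, endpoint=False)
y = np.sin(2 * np.pi * 220 * t) + 0.1 * np.random.randn(t.size)  # toy audio
for start, score in sorted(segment_scores(y, sr), key=lambda x: -x[1])[:3]:
    print(f"segment at {start:.1f}s: divergence {score:.3f}")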
SlideImages: A Dataset for Educational Image Classification
Comments: 8 pages, 2 figures, to be presented at ECIR 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Information Retrieval (cs.IR)
In the past few years, convolutional neural networks (CNNs) have achieved
impressive results in computer vision tasks, which however mainly focus on
photos with natural scene content. Besides, non-sensor derived images such as
illustrations, data visualizations, figures, etc. are typically used to convey
complex information or to explore large datasets. However, this kind of image
has received little attention in computer vision. CNNs and similar techniques
use large volumes of training data. Currently, many document analysis systems
are trained in part on scene images due to the lack of large datasets of
educational image data. In this paper, we address this issue and present
SlideImages, a dataset for the task of classifying educational illustrations.
SlideImages contains training data collected from various sources, e.g.,
Wikimedia Commons and the AI2D dataset, and test data collected from
educational slides. We have reserved all the actual educational images as a
test dataset in order to ensure that the approaches using this dataset
generalize well to new educational images, and potentially other domains.
Furthermore, we present a baseline system using a standard deep neural
architecture and discuss dealing with the challenge of limited training data.
Stacked Adversarial Network for Zero-Shot Sketch based Image Retrieval
Comments: Accepted in WACV’2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that
the data of all the classes are available during training. The assumption may
not always be practical since the data of a few classes may be unavailable, or
the classes may not appear at the time of training. Zero-Shot Sketch-Based
Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to
handle previously unseen classes at test time. This paper proposes a
generative approach based on a Stacked Adversarial Network (SAN), combined
with the advantages of a Siamese Network (SN), for ZS-SBIR. While the SAN
generates high-quality samples, the SN learns a better distance metric than
that of a nearest neighbor search. The capability of the generative model to
synthesize image features based on the sketch reduces the SBIR problem to an
image-to-image retrieval problem. We evaluate the efficacy of our proposed
approach on the TU-Berlin and Sketchy databases in both the standard ZSL and
generalized ZSL setting. The proposed method yields a significant improvement
in standard ZSL as well as in a more challenging generalized ZSL setting (GZSL)
for SBIR.
Adaptive Parameterization for Neural Dialogue Generation
Comments: Published as a long paper in EMNLP 2019
Subjects:
Computation and Language (cs.CL)
; Information Retrieval (cs.IR); Machine Learning (cs.LG)
Neural conversation systems generate responses based on the
sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with
a single set of learned parameters to generate responses for given input
contexts. When confronting diverse conversations, its adaptability is rather
limited and the model is hence prone to generate generic responses. In this
work, we propose an {f Ada}ptive {f N}eural {f D}ialogue generation
model, extsc{AdaND}, which manages various conversations with
conversation-specific parameterization. For each conversation, the model
generates parameters of the encoder-decoder by referring to the input context.
In particular, we propose two adaptive parameterization mechanisms: a
context-aware and a topic-aware parameterization mechanism. The context-aware
parameterization directly generates the parameters by capturing local semantics
of the given context. The topic-aware parameterization enables parameter
sharing among conversations with similar topics by first inferring the latent
topics of the given context and then generating the parameters with respect to
the distributional topics. Extensive experiments conducted on a large-scale
real-world conversational dataset show that our model achieves superior
performance in terms of both quantitative metrics and human evaluations.
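The context-aware parameterization idea can be sketched as a small
hypernetwork: an encoder of the conversation context emits the weights of a
layer, so each conversation effectively gets its own parameters. All sizes
below are illustrative assumptions; this is not the AdaND architecture.

import torch
import torch.nn as nn

class ContextHyperLayer(nn.Module):
    def __init__(self, ctx_dim=32, in_dim=16, out_dim=16):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        # Hypernetwork: maps a context vector to a layer's weights + bias.
        self.hyper = nn.Linear(ctx_dim, in_dim * out_dim + out_dim)

    def forward(self, x, ctx):
        params = self.hyper(ctx)                   # (batch, in*out + out)
        W = params[:, : self.in_dim * self.out_dim]
        b = params[:, self.in_dim * self.out_dim :]
        W = W.view(-1, self.out_dim, self.in_dim)  # per-example weights
        return torch.bmm(W, x.unsqueeze(-1)).squeeze(-1) + b

layer = ContextHyperLayer()
x = torch.randn(4, 16)       # token/state features
ctx = torch.randn(4, 32)     # encoded conversation context
print(layer(x, ctx).shape)   # torch.Size([4, 16])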
SEPT: Improving Scientific Named Entity Recognition with Span Representation
Tan Yan , Heyan Huang , Xian-Ling Mao Subjects : Computation and Language (cs.CL) ; Information Retrieval (cs.IR)
We introduce a new scientific named entity recognizer called SEPT, which
stands for Span Extractor with Pre-trained Transformers. In recent papers, span
extractors have been demonstrated to be a powerful model compared with sequence
labeling models. However, we discover that with the development of pre-trained
language models, the performance of span extractors appears to become similar
to sequence labeling models. To keep the advantages of span representation, we
modified the model by under-sampling to balance the positive and negative
samples and reduce the search space. Furthermore, we simplify the original
network architecture to combine the span extractor with BERT. Experiments
demonstrate that even the simplified architecture achieves the same
performance, and SEPT achieves a new state-of-the-art result in scientific
named entity recognition even without relation information involved.
Computation and Language
Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference
Timo Schick , Hinrich Schütze Subjects : Computation and Language (cs.CL)
Some NLP tasks can be solved in a fully unsupervised fashion by providing a
pretrained language model with “task descriptions” in natural language (e.g.,
Radford et al., 2019). While this approach underperforms its supervised
counterpart, we show in this work that the two ideas can be combined: We
introduce Pattern-Exploiting Training (PET), a semi-supervised training
procedure that reformulates input examples as cloze-style phrases which help
the language model understand the given task. These phrases are then used to
assign soft labels to a large set of unlabeled examples. Finally, regular
supervised training is performed on the resulting training set. On several
tasks, we show that PET outperforms both supervised training and unsupervised
approaches in low-resource settings by a large margin.
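The cloze-reformulation step can be sketched with HuggingFace's fill-mask
pipeline. The pattern, the verbalizer mapping tokens to labels, and the model
choice are illustrative assumptions; PET additionally distills the resulting
soft labels into a regular supervised model.

from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
VERBALIZER = {"great": "positive", "terrible": "negative"}

def classify(text):
    # Wrap the input in a cloze pattern and score only verbalizer tokens.
    cloze = f"{text} All in all, it was [MASK]."
    preds = unmasker(cloze, targets=list(VERBALIZER))
    best = max(preds, key=lambda p: p["score"])
    return VERBALIZER[best["token_str"].strip()], best["score"]

print(classify("The plot was gripping and the acting superb."))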
Improving Interaction Quality Estimation with BiLSTMs and the Impact on Dialogue Policy Learning
Comments: Published at SIGDIAL 2019
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Learning suitable and well-performing dialogue behaviour in statistical
spoken dialogue systems has been the focus of research for many years. While
most work which is based on reinforcement learning employs an objective measure
like task success for modelling the reward signal, we use a reward based on
user satisfaction estimation. We propose a novel estimator and show that it
outperforms all previous estimators while learning temporal dependencies
implicitly. Furthermore, we apply this novel user satisfaction estimation model
live in simulated experiments where the satisfaction estimation model is
trained on one domain and applied in many other domains which cover a similar
task. We show that applying this model results in higher estimated
satisfaction, similar task success rates and a higher robustness to noise.
Generating Sense Embeddings for Syntactic and Semantic Analogy for Portuguese
Comments: 14 pages, STIL 2019 Full paper
Journal-ref: STIL XII (2019) 104-113
Subjects:
Computation and Language (cs.CL)
Word embeddings are numerical vectors which can represent words or concepts
in a low-dimensional continuous space. These vectors are able to capture useful
syntactic and semantic information. The traditional approaches like Word2Vec,
GloVe, and FastText have a serious drawback: they produce a single vector
representation per word ignoring the fact that ambiguous words can assume
different meanings. In this paper we use techniques to generate sense
embeddings and present the first experiments carried out for Portuguese. Our
experiments show that sense vectors outperform traditional word vectors in
syntactic and semantic analogy tasks, showing that the language resource
generated here can improve the performance of NLP tasks in Portuguese.
Domain-Aware Dialogue State Tracker for Multi-Domain Dialogue Systems
Vevake Balaraman , Bernardo Magnini Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
In task-oriented dialogue systems the dialogue state tracker (DST) component
is responsible for predicting the state of the dialogue based on the dialogue
history. Current DST approaches rely on a predefined domain ontology, a fact
that limits their effective usage for large scale conversational agents, where
the DST constantly needs to be interfaced with ever-increasing services and
APIs. To overcome this drawback, we propose a domain-aware
dialogue state tracker that is completely data-driven and is modeled to
predict dynamic service schemas. The proposed model utilizes domain and
slot information to extract both domain and slot specific representations for a
given dialogue, and then uses such representations to predict the values of the
corresponding slot. Integrating this mechanism with a pretrained language model
(i.e. BERT), our approach can effectively learn semantic relations.
A Physical Embedding Model for Knowledge Graphs
Comments: 9th Joint International Conference, JIST 2019, Hangzhou, China
Subjects:
Computation and Language (cs.CL)
Knowledge graph embedding methods learn continuous vector representations for
entities in knowledge graphs and have been used successfully in a large number
of applications. We present a novel and scalable paradigm for the computation
of knowledge graph embeddings, which we dub PYKE. Our approach combines a
physical model based on Hooke’s law and its inverse with ideas from simulated
annealing to compute embeddings for knowledge graphs efficiently. We prove that
PYKE achieves a linear space complexity. While the time complexity for the
initialization of our approach is quadratic, the time complexity of each of its
iterations is linear in the size of the input knowledge graph. Hence, PYKE’s
overall runtime is close to linear. Consequently, our approach easily scales up
to knowledge graphs containing millions of triples. We evaluate our approach
against six state-of-the-art embedding approaches on the DrugBank and DBpedia
datasets in two series of experiments. The first series shows that the cluster
purity achieved by PYKE is up to 26% (absolute) better than that of the state
of the art. In addition, PYKE is more than 22 times faster than existing embedding
solutions in the best case. The results of our second series of experiments
show that PYKE is up to 23% (absolute) better than the state of the art on the
task of type prediction while maintaining its superior scalability. Our
implementation and results are open-source and are available at
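A hedged toy of the physical intuition (not the released implementation):
connected entities attract like Hookean springs, sampled unconnected pairs
repel, and an annealing-style decay shrinks the step size over iterations.
All constants here are assumptions.

import numpy as np

def embed(n, edges, dim=2, iters=200, k_att=0.05, k_rep=0.05, decay=0.98):
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n, dim))
    step = 1.0
    edge_set = set(edges)
    for _ in range(iters):
        for u, v in edges:                   # Hooke's-law attraction
            d = X[v] - X[u]
            X[u] += step * k_att * d
            X[v] -= step * k_att * d
        for _ in range(len(edges)):          # sampled repulsion
            u, v = rng.integers(n), rng.integers(n)
            if u != v and (u, v) not in edge_set:
                d = X[v] - X[u]
                norm2 = (d @ d) + 1e-9
                X[u] -= step * k_rep * d / norm2
                X[v] += step * k_rep * d / norm2
        step *= decay                        # annealing-style cooling
    return X

X = embed(5, [(0, 1), (1, 2), (3, 4)])
# connected pair (0, 1) should end up closer than unconnected (0, 3)
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(X[0] - X[3]))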
Length-controllable Abstractive Summarization by Guiding with Summary Prototype
Itsumi Saito , Kyosuke Nishida , Kosuke Nishida , Atsushi Otsuka , Hisako Asano , Junji Tomita , Hiroyuki Shindo , Yuji Matsumoto Subjects : Computation and Language (cs.CL)
We propose a new length-controllable abstractive summarization model. Recent
state-of-the-art abstractive summarization models based on encoder-decoder
models generate only one summary per source text. However, controllable
summarization, especially of the length, is an important aspect for practical
applications. Previous studies on length-controllable abstractive summarization
incorporate length embeddings in the decoder module for controlling the summary
length. Although the length embeddings can control where to stop decoding, they
do not decide which information should be included in the summary within the
length constraint. Unlike the previous models, our length-controllable
abstractive summarization model incorporates a word-level extractive module in
the encoder-decoder model instead of length embeddings. Our model generates a
summary in two steps. First, our word-level extractor extracts a sequence of
important words (we call it the “prototype text”) from the source text
according to the word-level importance scores and the length constraint.
Second, the prototype text is used as additional input to the encoder-decoder
model, which generates a summary by jointly encoding and copying words from
both the prototype text and source text. Since the prototype text is a guide to
both the content and length of the summary, our model can generate an
informative and length-controlled summary. Experiments with the CNN/Daily Mail
dataset and the NEWSROOM dataset show that our model outperformed previous
models in length-controlled settings.
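The prototype-extraction step can be sketched as follows, with a toy
frequency heuristic standing in for the learned word-level importance scores;
the extracted prototype would then condition the encoder-decoder.

from collections import Counter

def extract_prototype(source_tokens, budget):
    """Keep the highest-scoring words, in original order, until the length
    budget is reached (toy scores: raw word frequency)."""
    freq = Counter(source_tokens)
    scored = sorted(range(len(source_tokens)),
                    key=lambda i: freq[source_tokens[i]], reverse=True)
    keep = sorted(scored[:budget])           # restore original order
    return [source_tokens[i] for i in keep]

src = ("the model generates a summary in two steps the model first "
       "extracts important words then the model copies words").split()
print(extract_prototype(src, budget=6))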
A Hierarchical Location Normalization System for Text
Comments: 7 pages, submitted to conference
Subjects:
Computation and Language (cs.CL)
Nowadays, people naturally learn about local events from massive collections
of documents. Many texts contain location information, such as city or road
names, which is often incomplete or latent. It is therefore important to
extract the administrative area of a text and organize the hierarchy of areas,
a task called location normalization. Existing location detection systems
either exclude hierarchical normalization or cover only a few specific
regions. We propose a system named ROIBase that normalizes text against the
Chinese hierarchical administrative divisions. ROIBase adopts a co-occurrence
constraint as its basic framework to score candidate administrative areas,
performs inference with special embeddings, and expands recall via ROIs
(regions of interest). It has high efficiency and interpretability because it
is mainly built on definite knowledge and has less complex logic than
supervised models. We demonstrate that ROIBase achieves better performance
than feasible alternative solutions and is useful as a strong support system
for location normalization.
Multi-level Head-wise Match and Aggregation in Transformer for Textual Sequence Matching
Comments: AAAI 2020, 8 pages
Subjects:
Computation and Language (cs.CL)
Transformer has been successfully applied to many natural language processing
tasks. However, for textual sequence matching, simple matching between the
representation of a pair of sequences might bring in unnecessary noise. In this
paper, we propose a new approach to sequence pair matching with Transformer, by
learning head-wise matching representations on multiple levels. Experiments
show that our proposed approach can achieve new state-of-the-art performance on
multiple tasks that rely only on pre-computed sequence-vector-representation,
such as SNLI, MNLI-match, MNLI-mismatch, QQP, and SQuAD-binary.
Text-based inference of moral sentiment change
Comments: In Proceedings of EMNLP 2019
Subjects:
Computation and Language (cs.CL)
We present a text-based framework for investigating moral sentiment change of
the public via longitudinal corpora. Our framework is based on the premise that
language use can inform people’s moral perception toward right or wrong, and we
build our methodology by exploring moral biases learned from diachronic word
embeddings. We demonstrate how a parameter-free model supports inference of
historical shifts in moral sentiment toward concepts such as slavery and
democracy over centuries at three incremental levels: moral relevance, moral
polarity, and fine-grained moral dimensions. We apply this methodology to
visualizing moral time courses of individual concepts and analyzing the
relations between psycholinguistic variables and rates of moral sentiment
change at scale. Our work offers opportunities for applying natural language
processing toward characterizing moral sentiment change in society.
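One simple way to realize the polarity level of such a framework is to
project a concept's vector, in each decade's embedding space, onto an axis
between moral seed centroids. The seed sets and the random stand-in vectors
below are illustrative assumptions, and the paper's parameter-free models are
richer (covering relevance, polarity, and fine-grained dimensions).

import numpy as np

def moral_polarity(word, vectors, pos_seeds, neg_seeds):
    """Cosine projection of a word onto the negative-to-positive moral axis."""
    axis = (np.mean([vectors[w] for w in pos_seeds], axis=0)
            - np.mean([vectors[w] for w in neg_seeds], axis=0))
    v = vectors[word]
    return float(v @ axis /
                 (np.linalg.norm(v) * np.linalg.norm(axis) + 1e-9))

rng = np.random.default_rng(1)
vocab = ["good", "kind", "bad", "cruel", "slavery"]
vectors = {w: rng.normal(size=50) for w in vocab}   # stand-in embeddings
score = moral_polarity("slavery", vectors, ["good", "kind"],
                       ["bad", "cruel"])
print(f"polarity: {score:+.3f}")   # repeat per decade to trace change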
Recommending Themes for Ad Creative Design via Visual-Linguistic Representations
Comments: 7 pages, 8 figures, 2 tables, accepted by The Web Conference 2020
Subjects:
Computation and Language (cs.CL)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
There is a perennial need in the online advertising industry to refresh ad
creatives, i.e., images and text used for enticing online users towards a
brand. Such refreshes are required to reduce the likelihood of ad fatigue among
online users, and to incorporate insights from other successful campaigns in
related product categories. For a given brand, coming up with themes for a new
ad is a painstaking and time-consuming process for creative strategists.
Strategists typically draw inspiration from the images and text used for past
ad campaigns, as well as world knowledge on the brands. To automatically infer
ad themes via such multimodal sources of information in past ad campaigns, we
propose a theme (keyphrase) recommender system for ad creative strategists. The
theme recommender is based on aggregating results from a visual question
answering (VQA) task, which ingests the following: (i) ad images, (ii) text
associated with the ads as well as Wikipedia pages on the brands in the ads,
and (iii) questions around the ad. We leverage transformer-based cross-modality
encoders to train visual-linguistic representations for our VQA task. We study
two formulations for the VQA task along the lines of classification and
ranking; via experiments on a public dataset, we show that cross-modal
representations lead to significantly better classification accuracy and
ranking precision-recall metrics. Cross-modal representations show better
performance compared to separate image and text representations. In addition,
the use of multimodal information shows a significant lift over using only
textual or visual information.
Audio Summarization with Audio Features and Probability Distribution Divergence
Comments: 20th International Conference on Computational Linguistics and Intelligent Text Processing
Subjects:
Computation and Language (cs.CL)
; Information Retrieval (cs.IR)
The automatic summarization of multimedia sources is an important task that
facilitates a user's understanding by condensing the source while maintaining
relevant information. In this paper we focus on audio summarization based on
audio features and probability distribution divergence. Our method, based on
an extractive summarization approach, aims to select the most relevant
segments until a time threshold is reached. It takes into account each
segment's length, position, and informativeness value. The informativeness of
each segment is obtained by mapping a set of audio features, derived from its
Mel-frequency cepstral coefficients, to a corresponding Jensen-Shannon
divergence score. Results over a multi-evaluator scheme show that our approach
provides understandable and informative summaries.
Nested-Wasserstein Self-Imitation Learning for Sequence Generation
Comments: Accepted by AISTATS2020
Subjects:
Computation and Language (cs.CL)
; Machine Learning (cs.LG)
Reinforcement learning (RL) has been widely studied for improving
sequence-generation models. However, the conventional rewards used for RL
training typically cannot capture sufficient semantic information and
therefore induce model bias. Further, the sparse and delayed rewards make RL
exploration
inefficient. To alleviate these issues, we propose the concept of
nested-Wasserstein distance for distributional semantic matching. To further
exploit it, a novel nested-Wasserstein self-imitation learning framework is
developed, encouraging the model to exploit historical high-rewarded sequences
for enhanced exploration and better semantic matching. Our solution can be
understood as approximately executing proximal policy optimization with
Wasserstein trust-regions. Experiments on a variety of unconditional and
conditional sequence-generation tasks demonstrate the proposed approach
consistently leads to improved performance.
A multimodal deep learning approach for named entity recognition from social media
Meysam Asgari-Chenaghlu , M.Reza Feizi-Derakhshi , Leili Farzinvash , Cina Motamed Subjects : Computation and Language (cs.CL) ; Machine Learning (cs.LG); Multimedia (cs.MM); Social and Information Networks (cs.SI)
Named Entity Recognition (NER) from social media posts is a challenging task.
User-generated content, which forms the nature of social media, is noisy and
contains grammatical and linguistic errors. This noisy content makes tasks
such as named entity recognition much harder. However, some applications, like
automatic journalism or information retrieval from social media, require more
information about the entities mentioned in groups of social media posts.
Conventional methods provide acceptable results when applied to structured
and well-typed documents, but they are not satisfactory on new user-generated
media. One valuable piece of information about an entity is the image related
to the text. Combining this multimodal data reduces ambiguity and provides
wider information about the entities mentioned. To address this issue, we
propose a novel approach utilizing multimodal deep learning. Our solution is
able to provide more accurate results on the named entity
recognition task. Experimental results, namely the precision, recall and F1
score metrics show the superiority of our work compared to other
state-of-the-art NER solutions.
From Speech-to-Speech Translation to Automatic Dubbing
Comments: 5 pages, 4 figures
Subjects:
Computation and Language (cs.CL)
; Sound (cs.SD); Audio and Speech Processing (eess.AS)
We present enhancements to a speech-to-speech translation pipeline in order
to perform automatic dubbing. Our architecture features neural machine
translation generating output of preferred length, prosodic alignment of the
translation with the original speech segments, neural text-to-speech with fine
tuning of the duration of each utterance, and, finally, audio rendering that
enriches the text-to-speech output with background noise and reverberation
extracted from the original audio. We report on a subjective evaluation of
automatic dubbing of excerpts of TED Talks from English into Italian, which
measures the perceived naturalness of automatic dubbing and the relative
importance of each proposed enhancement.
Capturing Evolution in Word Usage: Just Add More Clusters?
Comments: 7 pages
Subjects:
Computation and Language (cs.CL)
The way words are used evolves through time, mirroring the cultural or
technological evolution of society. Semantic change detection is the task of
detecting and analysing word evolution in textual data, even over short
periods of time. This task has recently become popular in the NLP community. In
this paper we focus on a new set of methods relying on contextualised
embedding, a type of semantic modelling that revolutionised the field recently.
We leverage the ability of the transformer-based BERT model to generate
contextualised embeddings suitable to detect semantic change of words across
time. We compare our results to other approaches from the literature in a
common setting in order to establish strengths and weaknesses for each of them.
We also propose several ideas for improvements, managing to drastically improve
the performance of existing approaches.
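The clustering recipe can be sketched as follows, with random vectors
standing in for a word's BERT token embeddings from two periods: cluster the
pooled vectors, then compare how each period distributes its usages over the
clusters. The cluster count and the Jensen-Shannon comparison are illustrative
assumptions.

import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

def usage_shift(emb_t1, emb_t2, k=5, seed=0):
    pooled = np.vstack([emb_t1, emb_t2])
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=seed).fit_predict(pooled)
    l1, l2 = labels[: len(emb_t1)], labels[len(emb_t1):]
    p = np.bincount(l1, minlength=k) / len(l1)
    q = np.bincount(l2, minlength=k) / len(l2)
    return jensenshannon(p, q)

rng = np.random.default_rng(0)
emb_1960s = rng.normal(0.0, 1.0, size=(200, 16))
emb_2010s = rng.normal(0.5, 1.0, size=(200, 16))   # shifted usage
print(f"JSD usage shift: {usage_shift(emb_1960s, emb_2010s):.3f}")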
Adaptive Parameterization for Neural Dialogue Generation
Comments: Published as a long paper in EMNLP 2019
Subjects:
Computation and Language (cs.CL)
; Information Retrieval (cs.IR); Machine Learning (cs.LG)
Neural conversation systems generate responses based on the
sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with
a single set of learned parameters to generate responses for given input
contexts. When confronting diverse conversations, its adaptability is rather
limited and the model is hence prone to generate generic responses. In this
work, we propose an Adaptive Neural Dialogue generation
model, AdaND, which manages various conversations with
conversation-specific parameterization. For each conversation, the model
generates parameters of the encoder-decoder by referring to the input context.
In particular, we propose two adaptive parameterization mechanisms: a
context-aware and a topic-aware parameterization mechanism. The context-aware
parameterization directly generates the parameters by capturing local semantics
of the given context. The topic-aware parameterization enables parameter
sharing among conversations with similar topics by first inferring the latent
topics of the given context and then generating the parameters with respect to
the distributional topics. Extensive experiments conducted on a large-scale
real-world conversational dataset show that our model achieves superior
performance in terms of both quantitative metrics and human evaluations.
R2DE: a NLP approach to estimating IRT parameters of newly generated questions
Luca Benedetto , Andrea Cappelli , Roberto Turrin , Paolo Cremonesi Subjects : Machine Learning (cs.LG) ; Computation and Language (cs.CL); Machine Learning (stat.ML)
The main objective of exams is to assess students’
expertise on a specific subject. Such expertise, also referred to as skill or
knowledge level, can then be leveraged in different ways (e.g., to assign a
grade to the students, to understand whether a student might need some support,
etc.). Similarly, the questions appearing in the exams have to be assessed in
some way before being used to evaluate students. Standard approaches to
questions’ assessment are either subjective (e.g., assessment by human experts)
or introduce a long delay in the process of question generation (e.g.,
pretesting with real students). In this work we introduce R2DE (which is a
Regressor for Difficulty and Discrimination Estimation), a model capable of
assessing newly generated multiple-choice questions by looking at the text of
the question and the text of the possible choices. In particular, it can
estimate the difficulty and the discrimination of each question, as they are
defined in Item Response Theory. We also present the results of extensive
experiments we carried out on a real world large scale dataset coming from an
e-learning platform, showing that our model can be used to perform an initial
assessment of newly created questions and ease some of the problems that arise
in question generation.
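A hedged, minimal version of such an estimator follows (TF-IDF features plus
an off-the-shelf regressor; the toy data, labels, and model choice are
illustrative assumptions, not R2DE itself).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

# question text concatenated with its answer options, as one string each
questions = [
    "What is 2 + 2? | 3 | 4 | 5",
    "State and prove the spectral theorem. | ... | ... | ...",
    "Which planet is closest to the sun? | Mercury | Venus | Mars",
    "Derive the posterior for a Gaussian with unknown variance. | ... | ...",
]
difficulty = [-2.0, 2.5, -1.5, 2.0]   # toy IRT difficulty labels

model = make_pipeline(TfidfVectorizer(),
                      RandomForestRegressor(random_state=0))
model.fit(questions, difficulty)
print(model.predict(["What is 3 + 5? | 7 | 8 | 9"]))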
Comments: 5 pages, 2 figures
Subjects:
Audio and Speech Processing (eess.AS)
; Computation and Language (cs.CL)
It is generally believed that direct sequence-to-sequence (seq2seq) speech
recognition models are competitive with hybrid models only when a large amount
of data, at least a thousand hours, is available for training. In this paper,
we show that state-of-the-art recognition performance can be achieved on the
Switchboard-300 database using a single-headed-attention, LSTM-based model.
Using a cross-utterance language model, our single-pass speaker independent
system reaches 6.4% and 12.5% word error rate (WER) on the Switchboard and
CallHome subsets of Hub5’00, without a pronunciation lexicon. While careful
regularization and data augmentation are crucial in achieving this level of
performance, experiments on Switchboard-2000 show that nothing is more useful
than more data.
SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions
Ramprasaath R. Selvaraju , Purva Tendulkar , Devi Parikh , Eric Horvitz , Marco Ribeiro , Besmira Nushi , Ece Kamar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Existing VQA datasets contain questions with varying levels of complexity.
While the majority of questions in these datasets require perception for
recognizing existence, properties, and spatial relationships of entities, a
significant portion of questions pose challenges that correspond to reasoning
tasks — tasks that can only be answered through a synthesis of perception and
knowledge about the world, logic, and/or reasoning. This distinction allows us
to notice when existing VQA models have consistency issues — they answer the
reasoning question correctly but fail on associated low-level perception
questions. For example, models answer the complex reasoning question “Is the
banana ripe enough to eat?” correctly, but fail on the associated perception
question “Are the bananas mostly green or yellow?” indicating that the model
likely answered the reasoning question correctly but for the wrong reason. We
quantify the extent to which this phenomenon occurs by creating a new Reasoning
split of the VQA dataset and collecting Sub-VQA, a new dataset consisting of
200K new perception questions which serve as sub-questions corresponding to the
set of perceptual tasks needed to effectively answer the complex reasoning
questions in the Reasoning split. Additionally, we propose an approach called
Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the
model to attend to the same parts of the image when answering the reasoning
question and the perception sub-questions. We show that SQuINT improves model
consistency by 7.8% and marginally improves its performance on the Reasoning
questions in VQA, while also displaying qualitatively better attention maps.
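The consistency measure discussed above can be sketched as follows (the
paper's exact definition may differ): among reasoning questions the model
answers correctly, count how often all associated perception sub-questions
are also answered correctly.

def consistency(results):
    """results: list of (reasoning_correct, [sub_question_correct, ...])."""
    answered = [(r, subs) for r, subs in results if r]
    if not answered:
        return 0.0
    consistent = sum(all(subs) for _, subs in answered)
    return consistent / len(answered)

results = [
    (True,  [True, True]),    # right answer, right perception
    (True,  [False, True]),   # right answer for the wrong reason
    (False, [True, True]),    # ignored: reasoning answer was wrong
]
print(f"consistency: {consistency(results):.2f}")   # 0.50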
Ranking Significant Discrepancies in Clinical Reports
Comments: ECIR 2020 (short)
Subjects:
Information Retrieval (cs.IR)
; Computation and Language (cs.CL)
Medical errors are a major public health concern and a leading cause of death
worldwide. Many healthcare centers and hospitals use reporting systems where
medical practitioners write a preliminary medical report and the report is
later reviewed, revised, and finalized by a more experienced physician. The
revisions range from stylistic to corrections of critical errors or
misinterpretations of the case. Due to the large quantity of reports written
daily, it is often difficult to manually and thoroughly review all the
finalized reports to find such errors and learn from them. To address this
challenge, we propose a novel ranking approach, consisting of textual and
ontological overlaps between the preliminary and final versions of reports. The
approach learns to rank the reports based on the degree of discrepancy between
the versions. This allows medical practitioners to easily identify and learn
from the reports in which their interpretation most substantially differed from
that of the attending physician (who finalized the report). This is a crucial
step towards uncovering potential errors and helping medical practitioners to
learn from such errors, thus improving patient-care in the long run. We
evaluate our model on a dataset of radiology reports and show that our approach
outperforms both previously-proposed approaches and more recent language models
by 4.5% to 15.4%.
Distributed, Parallel, and Cluster Computing
Lattice QCD on a novel vector architecture
Comments: Proceedings of Lattice 2019, 7 pages, 6 colorful figures
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; High Energy Physics – Lattice (hep-lat)
The SX-Aurora TSUBASA PCIe accelerator card is the newest model of NEC’s SX
architecture family. Its multi-core vector processor features a vector length
of 16 kbits and interfaces with up to 48 GB of HBM2 memory in the current
models, available since 2018. The compute performance is up to 2.45 TFlop/s
peak in double precision, and the memory throughput is up to 1.2 TB/s peak. New
models with improved performance characteristics are announced for the near
future. In this contribution we discuss key aspects of the SX-Aurora and
describe how we enabled the architecture in the Grid Lattice QCD framework.
An IoT Platform-as-a-service for NFV Based — Hybrid Cloud / Fog Systems
Carla Mouradian , Fereshteh Ebrahimnezhad , Yassine Jebbar , Jasmeen Kaur Ahluwalia , Seyedeh Negar Afrasiabi , Roch H. Glitho , Ashok Moghe Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Software Engineering (cs.SE)
Cloud computing, despite its inherent advantages (e.g., resource efficiency),
still faces several challenges. The wide area network used to connect the
cloud to end-users can cause high latency, which may not be tolerable for some
applications, especially Internet of Things (IoT) applications. Fog computing
can reduce this latency by extending the traditional cloud architecture to the
edge of the network and by enabling the deployment of some application
components on fog nodes. Application providers use Platform-as-a-Service
(PaaS) to provision (i.e., develop, deploy, manage, and orchestrate)
applications in the cloud. However, existing PaaS solutions (including IoT
PaaS) usually focus on the cloud and do not enable provisioning of
applications with components spanning cloud and fog. Provisioning such
applications requires novel functions, such as application graph generation,
that are absent from existing PaaS. Furthermore, several functions offered by
existing PaaS (e.g., publication/discovery) need to be significantly extended
in order to fit a hybrid cloud/fog environment. In this paper, we propose a
novel PaaS architecture for hybrid cloud/fog systems. It is IoT
use-case-driven, and its applications' components are implemented as Virtual
Network Functions (VNFs) with execution sequences modeled as graphs with
sub-structures such as selections and loops. It automates the provisioning of
applications with components spanning cloud and fog. In addition, it enables
the discovery of existing cloud and fog nodes and generates application
graphs. A proof of concept is built on the open-source Cloudify platform.
Feasibility is demonstrated by evaluating its performance when PaaS modules
and application components are placed in clouds and fogs in different
geographical locations.
Self Organization Agent Oriented Dynamic Resource Allocation on Open Federated Clouds Environment
Journal-ref: International Conference on Cloud Computing Technologies and
Applications, CLOUDTECH 2016, May 2016, Marrakech, Morocco
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
To ensure uninterrupted services to cloud clients from federated cloud
providers, it is important to guarantee an efficient allocation of cloud
resources to users, to improve the rate of client satisfaction and the quality
of service provisioning, and to acquire as many computing and storage
resources as possible. In the cloud domain, several Multi-Agent Resource
Allocation methods have been proposed to address the problem of dynamic
resource allocation. However, the problem is still open and much work remains
to be done in this field. Robustness is important in cloud computing, so in
this paper we focus on an auto-adaptive method to deal with changes in an open
federated cloud computing environment. Our approach is hybrid: we first adopt
an existing organization-optimization approach for self-organization in the
broker agent organization, and combine it with an existing Multi-Agent
Resource Allocation approach on federated clouds. We consider an open
cloud-federation environment that is dynamic and in constant evolution, where
new cloud operators can join the federation or leave it. At the same time, our
approach is multi-criterion and can take into account various parameters
(e.g., the computing load balance of the mediator agent, or the geographical
distance (network delay) between customer and provider).
Serverless Straggler Mitigation using Local Error-Correcting Codes
Vipul Gupta , Dominic Carrano , Yaoqing Yang , Vaishaal Shankar , Thomas Courtade , Kannan Ramchandran Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Information Theory (cs.IT)
Inexpensive cloud services, such as serverless computing, are often
vulnerable to straggling nodes that increase end-to-end latency for distributed
computation. We propose and implement simple yet principled approaches for
straggler mitigation in serverless systems for matrix multiplication and
evaluate them on several common applications from machine learning and
high-performance computing. The proposed schemes are inspired by
error-correcting codes and employ parallel encoding and decoding over the data
stored in the cloud using serverless workers. This creates a fully distributed
computing framework without using a master node to conduct encoding or
decoding, which removes the computation, communication and storage bottleneck
at the master. On the theory side, we establish that our proposed scheme is
asymptotically optimal in terms of decoding time and provide a lower bound on
the number of stragglers it can tolerate with high probability. Through
extensive experiments, we show that our scheme outperforms existing schemes
such as speculative execution and other coding theoretic methods by at least
25%.
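The core coding idea can be illustrated with a single-parity code in numpy; this toy is a simplification under stated assumptions, since the paper's scheme encodes and decodes in parallel across serverless workers rather than at a master.

```python
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.random((4, 3)), rng.random((3, 2))

# Encode: split A row-wise and add one parity block (simplest erasure code).
A1, A2 = A[:2], A[2:]
tasks = {"w1": A1, "w2": A2, "parity": A1 + A2}

# Each worker computes its block times B; suppose worker w2 straggles.
results = {k: v @ B for k, v in tasks.items() if k != "w2"}

# Decode without waiting for w2: recover A2 @ B from the parity result.
A2B = results["parity"] - results["w1"]
C = np.vstack([results["w1"], A2B])
assert np.allclose(C, A @ B)
```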
Transparently Capturing Request Execution Path for Anomaly Detection
Comments: 13 pages, 7 figures
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
With the increasing scale and complexity of cloud systems and big data
analytics platforms, it is becoming more and more challenging to understand and
diagnose the processing of a service request in such distributed platforms. One
way that helps to deal with this problem is to capture the complete end-to-end
execution path of service requests among all involved components accurately.
This paper presents REPTrace, a generic methodology for capturing such
execution paths in a transparent fashion. We analyze a comprehensive list of
execution scenarios, and propose principles and algorithms for generating the
end-to-end request execution path for all the scenarios. Moreover, this paper
presents an anomaly detection approach exploiting request execution paths to
detect anomalies of the execution during request processing. The experiments on
four popular distributed platforms with different workloads show that REPTrace
can transparently capture the accurate request execution path with reasonable
latency and negligible network overhead. Fault injection experiments show that
execution anomalies are detected with high recall (96%).
Lorenz Braun , Sotirios Nikas , Chen Song , Vincent Heuveline , Holger Fröning Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Machine Learning (cs.LG); Performance (cs.PF)
Characterizing compute kernel execution behavior on GPUs for efficient task
scheduling is a non-trivial task. We address this with a simple model enabling
portable and fast predictions among different GPUs using only
hardware-independent features. This model is built based on random
forests using 189 individual compute kernels from benchmarks such as Parboil,
Rodinia, Polybench-GPU and SHOC. Evaluation of the model performance using
cross-validation yields a median Mean Average Percentage Error (MAPE) of
[13.45%, 44.56%] for time and [1.81%, 2.91%] for power prediction on
five different GPUs, while latency for a single prediction varies between 0.1
and 0.2 seconds.
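A hedged sketch of such a model in scikit-learn, with synthetic stand-ins for the kernel features and timings (the real feature set and data differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for hardware-independent kernel features
# (e.g., instruction and memory-op counts); the paper's features differ.
rng = np.random.default_rng(2)
X = rng.random((189, 10))                       # 189 kernels, 10 features
y = X @ rng.random(10) + 0.1 * rng.random(189)  # synthetic execution times

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_mean_absolute_percentage_error")
print(f"median MAPE: {np.median(-scores):.2%}")
```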
OpenMP Parallelization of Dynamic Programming and Greedy Algorithms
Claude Tadonki Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
Multicore has emerged as a typical architecture model since its advent and
now stands as a standard. The trend is to increase the number of cores and
improve the performance of the memory system. Providing an efficient multicore
implementation for an important algorithmic kernel is a genuine contribution.
From a methodology standpoint, this should be done at the level of the
underlying paradigm, if any. In this paper, we study the cases of dynamic
programming and greedy algorithms, which are two major algorithmic
paradigms. We exclusively consider directives-based loop parallelization
through OpenMP and investigate necessary pre-transformations to reach a regular
parallel form. We evaluate our methodology with a selection of well-known
combinatorial optimization problems on an Intel Broadwell processor. Key points
for scalability are discussed before and after experimental results. Our
immediate perspective is to extend our study to the manycore case, with a
special focus on NUMA configurations.
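One classic pre-transformation for dynamic programming is the wavefront (anti-diagonal) reordering, which turns dependent loops into a regular parallel loop over each anti-diagonal. The Python sketch below only illustrates that transformation; the paper targets OpenMP in a compiled setting, and Python threads add no real speedup here.

```python
from concurrent.futures import ThreadPoolExecutor

def edit_distance_wavefront(a, b):
    """Edit-distance DP evaluated anti-diagonal by anti-diagonal.

    Cells on one anti-diagonal have no mutual dependencies, so each
    diagonal becomes a regular parallel loop (the form OpenMP needs).
    """
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j

    def cell(i, j):
        D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1,
                      D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))

    with ThreadPoolExecutor() as pool:
        for d in range(2, n + m + 1):   # d = i + j indexes an anti-diagonal
            cells = [(i, d - i) for i in range(max(1, d - m), min(n, d - 1) + 1)]
            list(pool.map(lambda ij: cell(*ij), cells))
    return D[n][m]

print(edit_distance_wavefront("kitten", "sitting"))  # 3
```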
Blockchain Consensus Algorithms: A Survey
Md Sadek Ferdous , Mohammad Jabed Morshed Chowdhury , Mohammad A. Hoque , Alan Colman Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
In recent years, blockchain technology has received unparalleled attention
from academia, industry, and governments all around the world. It is considered
a technological breakthrough anticipated to disrupt several application
domains. This has resulted in a plethora of blockchain systems for various
purposes. However, many of these blockchain systems suffer from serious
shortcomings related to their performance and security, which need to be
addressed before any wide-scale adoption can be achieved. A crucial component
of any blockchain system is its underlying consensus algorithm, which in many
ways, determines its performance and security. Therefore, to address the
limitations of different blockchain systems, several existing as well as novel
consensus algorithms have been introduced. A systematic analysis of these
algorithms will help to understand how and why any particular blockchain
performs the way it does. However, the existing studies of consensus
algorithms are not comprehensive. Those studies have incomplete discussions on
the properties of the algorithms and fail to analyse several major blockchain
consensus algorithms in terms of their scopes. This article fills this gap by
analysing a wide range of consensus algorithms using a comprehensive taxonomy
of properties and by examining the implications of different issues still
prevalent in consensus algorithms in detail. The result of the analysis is
presented in tabular formats, which provide a visual illustration of these
algorithms in a meaningful way. We have also analysed more than a hundred top
crypto-currencies belonging to different categories of consensus algorithms to
understand their properties and to identify different trends in these
crypto-currencies. Finally, we have presented a decision tree of algorithms to
be used as a tool to test the suitability of consensus algorithms under
different criteria.
2PS: High-Quality Edge Partitioning with Two-Phase Streaming
Comments: in submission
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
Graph partitioning is an important preprocessing step to distributed graph
processing. In edge partitioning, the edge set of a given graph is split into
(k) equally-sized partitions, such that the replication of vertices across
partitions is minimized. Streaming is a viable approach to partition graphs
that exceed the memory capacities of a single server. The graph is ingested as
a stream of edges, and one edge at a time is immediately and irrevocably
assigned to a partition based on a scoring function. However, streaming
partitioning suffers from the uninformed assignment problem: At the time of
partitioning early edges in the stream, there is no information available about
the rest of the edges. As a consequence, edge assignments are often driven by
balancing considerations, and the achieved replication factor is comparably
high. In this paper, we propose 2PS, a novel two-phase streaming algorithm for
high-quality edge partitioning. In the first phase, vertices are separated into
clusters by a lightweight streaming clustering algorithm. In the second phase,
the graph is re-streamed and edge partitioning is performed while taking into
account the clustering of the vertices from the first phase. Our evaluations
show that 2PS can achieve a replication factor that is comparable to
heavy-weight random access partitioners while inducing orders of magnitude
lower memory overhead.
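A toy Python sketch of the two phases, with heavily simplified clustering and scoring (the actual 2PS score, merge rules, and balance constraints are more elaborate):

```python
from collections import defaultdict

def stream_cluster(edges, max_size):
    """Phase 1: lightweight streaming clustering of vertices."""
    cluster, size = {}, defaultdict(int)
    for u, v in edges:
        for x in (u, v):
            if x not in cluster:
                cluster[x] = x          # start as a singleton cluster
                size[x] += 1
        cu, cv = cluster[u], cluster[v]
        if cu != cv:
            # Move the endpoint in the smaller cluster into the larger one.
            small, big = (u, cv) if size[cu] <= size[cv] else (v, cu)
            if size[big] < max_size:
                size[cluster[small]] -= 1
                cluster[small] = big
                size[big] += 1
    return cluster

def partition(edges, cluster, k):
    """Phase 2: re-stream edges; prefer the partition hosting the cluster."""
    load = [0] * k
    cap = 2 * len(edges) // k + 1
    home = {}                           # cluster -> preferred partition
    assignment = []
    for u, v in edges:
        c = cluster[u]
        p = home.setdefault(c, min(range(k), key=load.__getitem__))
        if load[p] >= cap:              # balance constraint overrides affinity
            p = min(range(k), key=load.__getitem__)
        load[p] += 1
        assignment.append(((u, v), p))
    return assignment

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cl = stream_cluster(edges, max_size=4)
for e, p in partition(edges, cl, k=2):
    print(e, "->", p)
```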
Distributed Vehicular Computing at the Dawn of 5G: a Survey
Comments: 34 pages, 10 figures, Journal
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
Recent advances in information technology have revolutionized the automotive
industry, paving the way for next-generation smart and connected vehicles.
Connected vehicles can collaborate to deliver novel services and applications.
These services and applications require 1) massive volumes of data that
perceive ambient environments, 2) ultra-reliable and low-latency communication
networks, and 3) real-time data processing that provides decision support under
application-specific constraints. Addressing such constraints introduces
significant challenges with current communication and computation technologies.
Coincidentally, the fifth generation of cellular networks (5G) was developed to
respond to communication challenges by providing an infrastructure for
low-latency, high-reliability, and high bandwidth communication. At the core of
this infrastructure, edge computing allows data offloading and computation at
the edge of the network, ensuring low-latency and context-awareness, and
pushing the utilization efficiency of 5G to its limit. In this paper, we aim at
providing a comprehensive overview of the state of research on vehicular
computing in the emerging age of 5G. After reviewing the main vehicular
applications requirements and challenges, we follow a bottom-up approach,
starting with the promising technologies for vehicular communications, all the
way up to Artificial Intelligence (AI) solutions. We explore the various
architectures for vehicular computing, including centralized Cloud Computing,
Vehicular Cloud Computing, and Vehicular Edge computing, and investigate the
potential data analytics technologies and their integration on top of the
vehicular computing architectures. We finally discuss several future research
directions and applications for vehicular computation systems.
BAASH: Enabling Blockchain-as-a-Service on High-Performance Computing Systems
Abdullah Al-Mamun , Dongfang Zhao Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
The state-of-the-art approach to manage blockchains is to process blocks of
transactions in a shared-nothing environment. Although blockchains have the
potential to provide various services for high-performance computing (HPC)
systems, HPC will not be able to embrace blockchains before the following two
missing pieces become available: (i) new consensus protocols being aware of the
shared-storage architecture in HPC, and (ii) new fault-tolerant mechanisms
compensating for HPC’s programming model—the message passing interface
(MPI)—that is vulnerable to blockchain-like workloads. To this end, we
design a new set of consensus protocols crafted for the HPC platforms and a new
fault-tolerance subsystem compensating for the failures caused by faulty MPI
processes. Built on top of the new protocols and fault-tolerance mechanism, a
prototype system is implemented and evaluated with two million transactions on
a 500-core HPC cluster, showing (6\times), (12\times), and (75\times) higher
throughput than Hyperledger, Ethereum, and Parity, respectively.
Contract-connection: An efficient communication protocol for Distributed Ledger Technology
Journal-ref: 2019 IEEE 38th International Performance Computing and
Communications Conference (IPCCC)
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Cryptography and Security (cs.CR)
Distributed Ledger Technology (DLT) is promising to become the foundation of
many decentralised systems. However, the unbalanced and unregulated network
layout contributes to the inefficiency of DLT especially in the Internet of
Things (IoT) environments, where nodes connect to only a limited number of
peers. The data communication speed globally is unbalanced and does not meet
the constraints of efficient real-time distributed systems. In this paper,
we introduce a new communication protocol, which enables nodes to calculate the
tradeoff between connecting/disconnecting a peer in a completely decentralised
manner. The network layout globally is continuously re-balancing and optimising
along with nodes adjusting their peers. This communication protocol weakens
the inequality of the communication network. Experiments suggest that this
communication protocol is stable and efficient.
The Parallelism Motifs of Genomic Data Analysis
Katherine Yelick , Aydin Buluc , Muaaz Awan , Ariful Azad , Benjamin Brock , Rob Egan , Saliya Ekanayake , Marquita Ellis , Evangelos Georganas , Giulia Guidi , Steven Hofmeyr , Oguz Selvitopi , Cristina Teodoropol , Leonid Oliker Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high-end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
75,000,000,000 Streaming Inserts/Second Using Hierarchical Hypersparse GraphBLAS Matrices
Comments: 3 pages, 2 figures, 28 references, accepted to Northeast Database Day (NEDB) 2020. arXiv admin note: substantial text overlap with arXiv:1907.04217
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Databases (cs.DB); Data Structures and Algorithms (cs.DS); Performance (cs.PF); Social and Information Networks (cs.SI)
The SuiteSparse GraphBLAS C-library implements high performance hypersparse
matrices with bindings to a variety of languages (Python, Julia, and
Matlab/Octave). GraphBLAS provides a lightweight in-memory database
implementation of hypersparse matrices that are ideal for analyzing many types
of network data, while providing rigorous mathematical guarantees, such as
linearity. Streaming updates of hypersparse matrices put enormous pressure on
the memory hierarchy. This work benchmarks an implementation of hierarchical
hypersparse matrices that reduces memory pressure and dramatically increases
the update rate into a hypersparse matrix. The parameters of hierarchical
hypersparse matrices rely on controlling the number of entries in each level in
the hierarchy before an update is cascaded. The parameters are easily tunable
to achieve optimal performance for a variety of applications. Hierarchical
hypersparse matrices achieve over 1,000,000 updates per second in a single
instance. Scaling to 31,000 instances of hierarchical hypersparse matrix
arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update
rate of 75,000,000,000 updates per second. This capability allows the MIT
SuperCloud to analyze extremely large streaming network data sets.
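A minimal Python sketch of the cascading idea, using counters as stand-ins for hypersparse matrices (the real implementation is the SuiteSparse GraphBLAS C library; the cutoff values here are arbitrary):

```python
from collections import Counter

class HierarchicalHypersparse:
    """Stack of associative arrays; a level cascades into the next
    when its entry count exceeds a tunable cutoff."""

    def __init__(self, cuts=(4, 16, 64)):
        self.levels = [Counter() for _ in cuts] + [Counter()]
        self.cuts = cuts

    def update(self, key, value=1):
        self.levels[0][key] += value
        for lvl, cut in enumerate(self.cuts):
            if len(self.levels[lvl]) > cut:        # too many entries: cascade
                self.levels[lvl + 1].update(self.levels[lvl])
                self.levels[lvl].clear()

    def total(self):
        out = Counter()
        for lvl in self.levels:
            out.update(lvl)
        return out

h = HierarchicalHypersparse()
for i in range(1000):
    h.update((i % 37, i % 53))     # (row, col) coordinates of a sparse update
print(sum(h.total().values()))     # all 1000 updates preserved
```

Frequent updates land in the small top level, which fits in fast memory; only occasional cascades touch the larger, slower levels.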
CycLedger: A Scalable and Secure Parallel Protocol for Distributed Ledger via Sharding
Comments: short version in IPDPS 2020
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Cryptography and Security (cs.CR)
Traditional public distributed ledgers have not been able to scale out well
and work efficiently. Sharding is deemed a promising way to solve this
problem. By partitioning all nodes into small committees and letting them work
in parallel, we can significantly lower the amount of communication and
computation, reduce the overhead on each node’s storage, and enhance the
throughput of the distributed ledger. Existing sharding-based protocols still
suffer from several serious drawbacks. First, all honest nodes must be well
connected with each other, which demands a huge number of communication
channels in the network. Moreover, previous protocols suffer a great loss in
efficiency when the honesty of each committee’s leader is in question. At the
same time, no explicit incentive is provided for nodes to actively participate
in the protocol.
We present CycLedger, a scalable and secure parallel protocol for distributed
ledger via sharding. Our protocol selects a leader and a partial set for each
committee, who are in charge of maintaining intra-shard consensus and
communicating with other committees, to reduce the amortized complexity of
communication, computation and storage on all nodes. We introduce a novel
commitment scheme between committees and a recovery procedure to prevent the
system from crashing even when leaders of committees are malicious. To add
incentive for the network, we use the concept of reputation, which measures
each node’s computing power. As nodes with higher reputation receive more
rewards, there is an encouragement for nodes with strong computing ability to
work honestly so as to gain reputation. In this way, we strike out a new path
to establish scalability, security and incentive for the sharding-based
distributed ledger.
Nicolas Aussel (INF, ACMES-SAMOVAR, IP Paris), Sophie Chabridon (IP Paris, INF, ACMES-SAMOVAR), Yohan Petetin (TIPIC-SAMOVAR, CITI, IP Paris) Subjects : Artificial Intelligence (cs.AI) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Machine Learning has proven useful in recent years as a way to achieve
failure prediction for industrial systems. However, the high computational
resources necessary to run learning algorithms are an obstacle to its
widespread application. The sub-field of Distributed Learning offers a solution
to this problem by enabling the use of remote resources but at the expense of
introducing communication costs in the application that are not always
acceptable. In this paper, we propose a distributed learning approach able to
optimize the use of computational and communication resources to achieve
excellent learning-model performance through a centralized architecture. To
achieve this, we present a new centralized distributed learning algorithm that
relies on the learning paradigms of Active Learning and Federated Learning to
offer a communication-efficient method that offers guarantees of model
precision on both the clients and the central server. We evaluate this method
on a public benchmark and show that its performance in terms of precision is
very close to the state-of-the-art level of non-distributed learning
despite additional constraints.
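A hedged numpy sketch of combining federated averaging with an active-learning style confidence filter; the threshold, local model, and update rule are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def client_round(model_w, X, y, threshold=0.8):
    """One client step: train locally, forward only low-confidence samples.

    Sending just the weight update plus uncertain samples (active learning)
    keeps communication low while letting the server refine the global model.
    """
    # One gradient-descent step of local logistic regression.
    p = 1 / (1 + np.exp(-(X @ model_w)))
    grad = X.T @ (p - y) / len(y)
    new_w = model_w - 0.1 * grad

    confidence = np.maximum(p, 1 - p)
    uncertain = X[confidence < threshold]   # queried for server-side use
    return new_w, uncertain

rng = np.random.default_rng(3)
w = np.zeros(5)
clients = [(rng.random((50, 5)), rng.integers(0, 2, 50)) for _ in range(4)]
updates, queried = zip(*(client_round(w, X, y) for X, y in clients))
w = np.mean(updates, axis=0)                # federated averaging
print(w.shape, sum(len(q) for q in queried), "uncertain samples sent")
```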
Fast Sequence-Based Embedding with Diffusion Graphs
Comments: Source code available at: this https URL
Journal-ref: CompleNet 2018
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
A graph embedding is a representation of graph vertices in a low-dimensional
space, which approximately preserves properties such as distances between
nodes. Vertex sequence-based embedding procedures use features extracted from
linear sequences of nodes to create embeddings using a neural network. In this
paper, we propose diffusion graphs as a method to rapidly generate vertex
sequences for network embedding. Its computational efficiency is superior to
previous methods due to simpler sequence generation, and it produces more
accurate results. In experiments, we found that the performance relative to
other methods improves with increasing edge density in the graph. In a
community detection task, clustering nodes in the embedding space produces
better results compared to other sequence-based embedding methods.
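A small Python sketch of generating vertex sequences by growing a diffusion subgraph; the exact traversal in the paper differs, so treat this as a conceptual stand-in:

```python
import random

def diffusion_sequence(adj, start, length, rng=random.Random(0)):
    """Grow a diffusion subgraph from `start`; the order in which
    vertices are absorbed is the sequence fed to the embedder."""
    seq, frontier = [start], {start}
    while len(seq) < length:
        candidates = [v for u in frontier for v in adj[u] if v not in frontier]
        if not candidates:
            break
        v = rng.choice(candidates)   # expand the diffusion by one vertex
        frontier.add(v)
        seq.append(v)
    return seq

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
walks = [diffusion_sequence(adj, v, 5) for v in adj]
print(walks)  # sequences would be fed to a word2vec-style model
```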
Comments: 26 pages, 18 figures
Subjects:
Cryptography and Security (cs.CR)
; Distributed, Parallel, and Cluster Computing (cs.DC)
In today’s connected world, resource-constrained devices are deployed for
sensing and decision-making applications, ranging from smart cities to
environmental monitoring. These resource-constrained devices are connected to
create real-time distributed networks popularly known as the Internet of Things
(IoT), fog computing and edge computing. Blockchain is gaining a lot of
interest in these domains for securing systems without centralized
dependencies, where proof-of-work (PoW) plays a vital role in making the whole
security solution decentralized. Due to the resource limitations of the
devices, PoW is not suitable for blockchain-based security solutions. This
paper presents a novel consensus algorithm called Proof-of-Authentication
(PoAh), which introduces a cryptographic authentication mechanism to replace
PoW for resource-constrained devices, and to make the blockchain
application-specific. PoAh is thus suitable for private as well as permissioned
blockchains. Further, PoAh not only secures the systems, but also maintains
system sustainability and scalability. The proposed consensus algorithm is
evaluated theoretically in simulation scenarios, and in real-time hardware
testbeds to validate its performance. Finally, PoAh and its integration with
the blockchain in the IoT and edge computing scenarios is discussed. The
proposed PoAh, while running on limited computing resources (e.g., single-board
computing devices like the Raspberry Pi), has a latency on the order of 3 seconds.
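A minimal Python sketch of the validation flow, using an HMAC as a stand-in for the cryptographic authentication step (the concrete primitives and trust setup in PoAh differ):

```python
import hashlib, hmac, json, time

KEY = b"shared-secret-of-trusted-node"   # placeholder credential

def create_block(transactions, prev_hash):
    block = {"tx": transactions, "prev": prev_hash, "ts": time.time()}
    payload = json.dumps(block, sort_keys=True).encode()
    block["auth"] = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return block

def validate(block):
    """Proof-of-Authentication: verify a MAC instead of solving a puzzle."""
    payload = json.dumps({k: block[k] for k in ("tx", "prev", "ts")},
                         sort_keys=True).encode()
    expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(block["auth"], expected)

b = create_block(["alice->bob:1"], prev_hash="0" * 64)
print(validate(b))  # True: cheap to verify on a Raspberry Pi class device
```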
Bivariate Polynomial Coding for Exploiting Stragglers in Heterogeneous Coded Computing Systems
Burak Hasircioglu , Jesus Gomez-Vilardebo , Deniz Gunduz Subjects : Information Theory (cs.IT) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Polynomial coding has been proposed as a solution to the straggler mitigation
problem in distributed matrix multiplication. Previous works in the literature
employ univariate polynomials to encode matrix partitions. Such schemes greatly
improve the speed of distributed computing systems by making the task
completion time depend only on the fastest workers. However, the work done
by the slowest workers, who fail to finish the tasks assigned to them, is
completely ignored. In order to exploit the partial computations of the slower
workers, we further decompose the overall matrix multiplication task into even
smaller subtasks to better fit workers’ storage and computation capacities. In
this work, we show that univariate schemes fail to make an efficient use of the
storage capacity and we propose bivariate polynomial codes. We show that
bivariate polynomial codes are a more natural choice to accommodate the
additional decomposition of subtasks, as well as heterogeneous storage and
computation resources at workers. However, in contrast to univariate polynomial
decoding, guaranteeing decodability for multivariate interpolation is much
harder. We propose two bivariate polynomial schemes. The first scheme exploits
the fact that bivariate interpolation is always possible on a rectangular grid
of points. We obtain the rectangular grid of points at the cost of allowing
some redundant computations. For the second scheme, we relax the decoding
constraint, and require decodability for almost all choices of evaluation
points. We present interpolation sets satisfying the almost decodability
conditions for certain storage configurations of workers. Our numerical results
show that bivariate polynomial coding considerably reduces the completion time
of distributed matrix multiplication.
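A worked 2x2 instance of the rectangular-grid idea in numpy; the partition counts and evaluation points are illustrative, and the general codes handle many partitions and straggling workers:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.random((4, 3))
B = rng.random((3, 4))
A1, A2 = A[:2], A[2:]          # row partitions of A
B1, B2 = B[:, :2], B[:, 2:]    # column partitions of B

# Encoders: A(x) = A1 + A2*x,  B(y) = B1 + B2*y
Aenc = lambda x: A1 + A2 * x
Benc = lambda y: B1 + B2 * y

# Workers evaluate A(x_i) @ B(y_j) on a 2x2 rectangular grid of points.
xs, ys = [1.0, 2.0], [1.0, 3.0]
W = {(x, y): Aenc(x) @ Benc(y) for x in xs for y in ys}

def interp2(p0, p1, t0, t1):
    """Recover coefficients (c0, c1) of c0 + c1*t from values at t0, t1."""
    c1 = (p1 - p0) / (t1 - t0)
    return p0 - c1 * t0, c1

# Decode: interpolate in y for each x, then in x, recovering blocks Ak @ Bl.
rows = [interp2(W[(x, ys[0])], W[(x, ys[1])], *ys) for x in xs]
blocks = [interp2(rows[0][l], rows[1][l], *xs) for l in range(2)]
C = np.block([[blocks[0][0], blocks[1][0]],
              [blocks[0][1], blocks[1][1]]])
assert np.allclose(C, A @ B)
```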
Finding temporal patterns using algebraic fingerprints
Suhas Thejaswi , Aristides Gionis Subjects : Data Structures and Algorithms (cs.DS) ; Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)
In this paper we study a family of pattern-detection problems in
vertex-colored temporal graphs. In particular, given a vertex-colored temporal
graph and a multi-set of colors as a query, we search for temporal paths in the
graph that contain the colors specified in the query. These types of problems
have several interesting applications, for example, recommending tours for
tourists, or searching for abnormal behavior in a network of financial
transactions.
For the family of pattern-detection problems we define, we establish
complexity results and design an algebraic-algorithmic framework based on
constrained multilinear sieving. We demonstrate that our solution can scale to
massive graphs with up to a hundred million edges, despite the problems being
NP-hard. Our implementation, which is publicly available, exhibits practical
edge-linear scalability and is highly optimized. For example, in a real-world
graph dataset with more than six million edges and a multi-set query with ten
colors, we can extract an optimal solution in less than eight minutes on a
Haswell desktop with four cores.
High-Quality Hierarchical Process Mapping
Marcelo Fonseca Faraj , Alexander van der Grinten , Henning Meyerhenke , Jesper Larsson Träff , Christian Schulz Subjects : Data Structures and Algorithms (cs.DS) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Partitioning graphs into blocks of roughly equal size such that few edges run
between blocks is a frequently needed operation when processing graphs on a
parallel computer. When the topology of a distributed system is known, an
important task is then to map the blocks of the partition onto the processors
such that the overall communication cost is reduced. We present novel
multilevel algorithms that integrate graph partitioning and process mapping.
Important ingredients of our algorithm include fast label propagation, more
localized local search, initial partitioning, as well as a compressed data
structure to compute processor distances without storing a distance matrix.
Experiments indicate that our algorithms speed up the overall mapping process
and, due to the integrated multilevel approach, also find much better solutions
in practice. For example, one configuration of our algorithm yields better
solutions than the previous state-of-the-art in terms of mapping quality while
being a factor of 62 faster. Compared to the currently fastest iterated multilevel
mapping algorithm Scotch, we obtain 16% better solutions while investing
slightly more running time.
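A sketch of how processor distances can be computed from ranks in a hierarchy without a stored distance matrix; the level sizes are hypothetical, and the paper's compressed structure is more general:

```python
def processor_distance(r1, r2, level_sizes=(8, 4)):
    """Distance between two ranks in a machine hierarchy.

    level_sizes = (cores per node, nodes per rack, ...); no distance
    matrix is stored, the distance falls out of the ranks themselves.
    """
    if r1 == r2:
        return 0
    dist = 1                        # different cores on the same node
    for size in level_sizes:
        r1, r2 = r1 // size, r2 // size
        if r1 != r2:
            dist += 1               # they also diverge at this level
    return dist

print(processor_distance(0, 1))    # same node: 1
print(processor_distance(0, 9))    # same rack, different node: 2
print(processor_distance(0, 63))   # different rack: 3
```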
Segment blockchain: A size reduced storage mechanism for blockchain
Journal-ref: IEEE Access,2020. https://ieeexplore.ieee.org/document/8957450/
Subjects:
Cryptography and Security (cs.CR)
; Distributed, Parallel, and Cluster Computing (cs.DC)
The exponential growth of the blockchain size has become a major contributing
factor that hinders the decentralisation of blockchain and its potential
implementations in data-heavy applications. In this paper, we propose segment
blockchain, an approach that segmentises blockchain and enables nodes to only
store a copy of one blockchain segment. We use PoW as a membership
threshold to limit the number of nodes taken by an Adversary—the Adversary
can only gain at most (n/2) of nodes in a network of (n) nodes when it has
(50\%) of the calculation power in the system (the Nakamoto blockchain security
threshold). A segment blockchain system fails when an Adversary stores all
copies of a segment, because the Adversary can then leave the system, causing a
permanent loss of the segment. We theoretically prove that segment blockchain
can sustain a ((AD/n)^m) failure probability when the Adversary holds no more
than (AD) nodes and every segment is stored by (m) nodes; for example, with
(n=1000) nodes, (AD=100) adversarial nodes, and (m=5) copies per segment, the
failure probability is ((0.1)^5 = 10^{-5}). The storage requirement is greatly
reduced compared to the traditional design, making the blockchain more
suitable for data-heavy applications.
BlockHouse: Blockchain-based Distributed Storehouse System
Comments: Published in “9TH Latin-American Symposium on Dependable Computing”, 2019
Subjects:
Cryptography and Security (cs.CR)
; Distributed, Parallel, and Cluster Computing (cs.DC)
We propose in this paper BlockHouse, a decentralized/P2P storage system fully
based on private blockchains. Each participant can rent out their unused storage in
order to host data of other members. This system uses a dual Smart Contract and
Proof of Retrievability system to automatically check at a fixed frequency if
the file is still hosted. In addition to transparency, the blockchain allows
better integration with all payments associated with this type of system
(regular payments, escrow to ensure good behavior of users, etc.). Except for
the data transferred between the client and the server, all actions go through
a smart contract in the blockchain in order to log, pay for, and secure the
entire storage process.
Learning
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Kihyuk Sohn , David Berthelot , Chun-Liang Li , Zizhao Zhang , Nicholas Carlini , Ekin D. Cubuk , Alex Kurakin , Han Zhang , Colin Raffel Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Semi-supervised learning (SSL) provides an effective means of leveraging
unlabeled data to improve a model’s performance. In this paper, we demonstrate
the power of a simple combination of two common SSL methods: consistency
regularization and pseudo-labeling. Our algorithm, FixMatch, first generates
pseudo-labels using the model’s predictions on weakly-augmented unlabeled
images. For a given image, the pseudo-label is only retained if the model
produces a high-confidence prediction. The model is then trained to predict the
pseudo-label when fed a strongly-augmented version of the same image. Despite
its simplicity, we show that FixMatch achieves state-of-the-art performance
across a variety of standard semi-supervised learning benchmarks, including
94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 — just
4 labels per class. Since FixMatch bears many similarities to existing SSL
methods that achieve worse performance, we carry out an extensive ablation
study to tease apart the experimental factors that are most important to
FixMatch’s success. We make our code available at
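A numpy sketch of the unlabeled-data loss at the heart of this recipe; the hyperparameters are illustrative, and the released implementation combines this term with a standard supervised loss:

```python
import numpy as np

def fixmatch_loss(logits_weak, logits_strong, tau=0.95):
    """Pseudo-label from the weak view, train the strong view.

    Only predictions whose max probability exceeds `tau` contribute.
    """
    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    p_weak = softmax(logits_weak)
    conf, pseudo = p_weak.max(axis=1), p_weak.argmax(axis=1)
    mask = conf >= tau                # keep only confident pseudo-labels
    if not mask.any():
        return 0.0
    p_strong = softmax(logits_strong)
    nll = -np.log(p_strong[mask, pseudo[mask]] + 1e-12)
    return float(nll.mean())

rng = np.random.default_rng(5)
lw = rng.normal(size=(8, 10)) * 5     # weak-augmentation logits
ls = lw + rng.normal(size=(8, 10))    # strong view: perturbed logits
print(fixmatch_loss(lw, ls))
```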
Deceptive AI Explanations: Creation and Detection
Johannes Schneider , Joshua Handali , Michalis Vlachos , Christian Meske Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Artificial intelligence comes with great opportunities but also great
risks. We investigate to what extent deep learning can be used to create and
detect deceptive explanations that either aim to lure a human into believing a
decision that is not truthful to the model or provide reasoning that is
non-faithful to the decision. Our theoretical insights show some limits of
deception and detection in the absence of domain knowledge. For empirical
evaluation, we focus on text classification. To create deceptive explanations,
we alter explanations originating from GradCAM, a state-of-the-art technique for
creating explanations in neural networks. We evaluate the effectiveness of
deceptive explanations on 200 participants. Our findings indicate that
deceptive explanations can indeed fool humans. Our classifier can detect even
seemingly minor attempts of deception with accuracy that exceeds 80\% given
sufficient domain knowledge encoded in the form of training data.
Mobility Inference on Long-Tailed Sparse Trajectory
Lei Shi Subjects : Machine Learning (cs.LG) ; Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Analyzing the urban trajectory in cities has become an important topic in
data mining. How can we model the human mobility consisting of stay and travel
from the raw trajectory data? How can we infer such a mobility model from the
single trajectory information? How can we further generalize the mobility
inference to accommodate the real-world trajectory data that is sparsely
sampled over time?
In this paper, based on formal and rigorous definitions of the stay/travel
mobility, we propose a single trajectory inference algorithm that utilizes a
generic long-tailed sparsity pattern in the large-scale trajectory data. The
algorithm guarantees a 100\% precision in the stay/travel inference with a
provable lower-bound in the recall. Furthermore, we introduce an
encoder-decoder learning architecture that admits multiple trajectories as
inputs. The architecture is optimized for the mobility inference problem
through customized embedding and learning mechanism. Evaluations with three
trajectory data sets of 40 million urban users validate the performance
guarantees of the proposed inference algorithm and demonstrate the superiority
of our deep learning model, in comparison to well-known sequence learning
methods. On extremely sparse trajectories, the deep learning model achieves a
2(\times) overall accuracy improvement from the single trajectory inference
algorithm, through proven scalability and generalizability to large-scale
versatile training data.
Generate High-Resolution Adversarial Samples by Identifying Effective Features
Sizhe Chen , Peidong Zhang , Chengjin Sun , Jia Cai , Xiaolin Huang Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
With the prevalence of deep learning in computer vision, adversarial samples
that weaken neural networks have emerged in large numbers, revealing their
deep-rooted defects. Most adversarial attacks calculate an imperceptible
perturbation in image space to fool the DNNs. In this strategy, the
perturbation looks like noise and thus could be mitigated. Attacks in feature
space produce semantic perturbation, but they could only deal with low
resolution samples. The reason lies in the great number of coupled features to
express a high-resolution image. In this paper, we propose Attack by
Identifying Effective Features (AIEF), which learns different weights for
features to attack. Effective features, those with great weights, influence the
victim model much but distort the image little, and thus are more effective for
attack. By attacking mostly on them, AIEF produces high resolution adversarial
samples with acceptable distortions. We demonstrate the effectiveness of AIEF
by attacking different tasks with different generative models.
batchboost: regularization for stabilizing training with resistance to underfitting & overfitting
Comments: 6 pages; 5 figures
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Overfitting, underfitting, and unstable training are important challenges in
machine learning. Current approaches for these issues are mixup, SamplePairing
and BC learning. In our work, we state the hypothesis that mixing many images
together can be more effective than just two. Batchboost pipeline has three
stages: (a) pairing: method of selecting two samples. (b) mixing: how to create
a new one from two samples. (c) feeding: combining mixed samples with new ones
from the dataset into the batch (with ratio (gamma)). Note that a sample that
appears in our batch propagates through subsequent iterations with less and
less importance until the end of training. The pairing stage calculates the
error per sample, sorts the samples, and pairs them with the strategy of
hardest with easiest; the mixing stage then merges two samples using mixup,
(lambda x_1 + (1-lambda)x_2). Finally, the feeding stage combines new samples
with mixed ones at a ratio of 1:1. Batchboost has 0.5-3%
better accuracy than the current state-of-the-art mixup regularization on
CIFAR-10 & Fashion-MNIST. Our method is slightly better than SamplePairing
technique on small datasets (up to 5%). Batchboost provides stable training on
untuned parameters (like weight decay); thus it is a good method to test the
performance of different architectures. Source code is at:
this https URL
EdgeNets: Edge Varying Graph Neural Networks
Elvin Isufi , Fernando Gama , Alejandro Ribeiro Subjects : Machine Learning (cs.LG) ; Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP); Machine Learning (stat.ML)
Driven by the outstanding performance of neural networks in the structured
Euclidean domain, recent years have seen a surge of interest in developing
neural networks for graphs and data supported on graphs. The graph is leveraged
as a parameterization to capture detail at the node level with a reduced number
of parameters and complexity. Following this rationale, this paper puts forth a
general framework that unifies state-of-the-art graph neural networks (GNNs)
through the concept of EdgeNet. An EdgeNet is a GNN architecture that allows
different nodes to use different parameters to weigh the information of
different neighbors. By extrapolating this strategy to more iterations between
neighboring nodes, the EdgeNet learns edge- and neighbor-dependent weights to
capture local detail. This is the most general local operation that a node can
do and encompasses under one formulation all graph convolutional neural
networks (GCNNs) as well as graph attention networks (GATs). In writing
different GNN architectures with a common language, EdgeNets highlight specific
architecture advantages and limitations, while providing guidelines to improve
their capacity without compromising their local implementation. For instance,
we show that GCNNs have a parameter sharing structure that induces permutation
equivariance. This can be an advantage or a limitation, depending on the
application. When it is a limitation, we propose hybrid approaches and provide
insights to develop several other solutions that promote parameter sharing
without enforcing permutation equivariance. Another interesting conclusion is
the unification of GCNNs and GATs, approaches that have been so far perceived
as separate. In particular, we show that GATs are GCNNs on a graph that is
learned from the features. This particularization opens the doors to develop
alternative attention mechanisms for improving discriminatory power.
Simple and Effective Graph Autoencoders with One-Hop Linear Models
Comments: Under review. A preliminary version of this work has previously been presented at a NeurIPS 2019 workshop without proceedings: arXiv:1910.00942
Subjects:
Machine Learning (cs.LG)
; Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged
as powerful node embedding methods, with promising performances on challenging
tasks such as link prediction and node clustering. Graph AE, VAE and most of
their extensions rely on graph convolutional networks (GCN) encoders to learn
vector space representations of nodes. In this paper, we propose to replace the
GCN encoder by a significantly simpler linear model w.r.t. the direct
neighborhood (one-hop) adjacency matrix of the graph. For the two
aforementioned tasks, we show that this approach consistently reaches
competitive performances w.r.t. GCN-based models for numerous real-world
graphs, including all benchmark datasets commonly used to evaluate graph AE and
VAE. We question the relevance of repeatedly using these datasets to compare
complex graph AE and VAE. We also emphasize the effectiveness of the proposed
encoding scheme, that appears as a simpler and faster alternative to GCN
encoders for many real-world applications.
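A numpy sketch of a one-hop linear encoder with an inner-product decoder; the weights here are random rather than trained, so this only shows the model's shape:

```python
import numpy as np

def linear_gae(A, dim=2, seed=0):
    """One-hop linear encoder Z = A_norm @ W, decoder sigma(Z Z^T)."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                       # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))    # symmetric normalization
    W = np.random.default_rng(seed).normal(size=(n, dim)) * 0.1
    Z = A_norm @ W                              # the whole encoder
    A_rec = 1 / (1 + np.exp(-Z @ Z.T))          # inner-product decoder
    return Z, A_rec

# Two triangles joined by one edge.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
Z, A_rec = linear_gae(A)
print(Z.round(2))  # W would be trained with a reconstruction loss in practice
```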
Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data
Comments: 8 pages, 3 figures, Association for the Advancement of Artificial Intelligence (AAAI2020). arXiv admin note: substantial text overlap with arXiv:1907.01709
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Conventional sequential learning methods such as Recurrent Neural Networks
(RNNs) focus on interactions between consecutive inputs, i.e. first-order
Markovian dependency. However, most sequential data, such as videos, have
complex dependency structures that imply variable-length semantic flows and
their compositions, which are hard to capture with conventional
methods. Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for
learning video data by discovering these complex structures of the video. The
CB-GLNs represent video data as a graph, with nodes and edges corresponding to
frames of the video and their dependencies respectively. The CB-GLNs find
compositional dependencies of the data in multilevel graph forms via a
parameterized kernel with graph-cut and a message passing framework. We
evaluate the proposed method on two different tasks for video
understanding: Video theme classification (Youtube-8M dataset) and Video
Question and Answering (TVQA dataset). The experimental results show that our
model efficiently learns the semantic compositional structure of video data.
Furthermore, our model achieves the highest performance in comparison to other
baseline methods.
Analytic Properties of Trackable Weak Models
Comments: 10 pages, 9 figures
Subjects:
Machine Learning (cs.LG)
; Combinatorics (math.CO); Machine Learning (stat.ML)
We present several new results on the feasibility of inferring the hidden
states in strongly-connected trackable weak models. Here, a weak model is a
directed graph in which each node is assigned a set of colors which may be
emitted when that node is visited. A hypothesis is a node sequence which is
consistent with a given color sequence. A weak model is said to be trackable if
the worst case number of such hypotheses grows as a polynomial in the sequence
length. We show that the number of hypotheses in strongly-connected trackable
models is bounded by a constant and give an expression for this constant. We
also consider the problem of reconstructing which branch was taken at a node
with same-colored out-neighbors, and show that it is always eventually possible
to identify which branch was taken if the model is strongly connected and
trackable. We illustrate these properties by assigning transition probabilities
and employing standard tools for analyzing Markov chains. In addition, we
present new results for the entropy rates of weak models according to whether
they are trackable or not. These theorems indicate that the combination of
trackability and strong connectivity dramatically simplifies the task of
reconstructing which nodes were visited. This work has implications for any
problem which can be described in terms of an agent traversing a colored graph,
such as the reconstruction of hidden states in a hidden Markov model (HMM).
Understanding the Limitations of Network Online Learning
Timothy LaRock , Timothy Sakharov , Sahely Bhadra , Tina Eliassi-Rad Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Studies of networked phenomena, such as interactions in online social media,
often rely on incomplete data, either because these phenomena are partially
observed, or because the data is too large or expensive to acquire all at once.
Analysis of incomplete data leads to skewed or misleading results. In this
paper, we investigate limitations of learning to complete partially observed
networks via node querying. Concretely, we study the following problem: given
(i) a partially observed network, (ii) the ability to query nodes for their
connections (e.g., by accessing an API), and (iii) a budget on the number of
such queries, sequentially learn which nodes to query in order to maximally
increase observability. We call this querying process Network Online Learning
and present a family of algorithms called NOL*. These algorithms learn to
choose which partially observed node to query next based on a parameterized
model that is trained online through a process of exploration and exploitation.
Extensive experiments on both synthetic and real world networks show that (i)
it is possible to sequentially learn to choose which nodes are best to query in
a network and (ii) some macroscopic properties of networks, such as the degree
distribution and modular structure, impact the potential for learning and the
optimal amount of random exploration.
Yadong Zhang , Xin Chen Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Time series motifs play an important role in the time series analysis. The
motif-based time series clustering is used for the discovery of higher-order
patterns or structures in time series data. Inspired by the convolutional
neural network (CNN) classifier based on the image representations of time
series, motif difference field (MDF) is proposed. Compared to other image
representations of time series, MDF is simple and easy to construct. With the
Fully Convolutional Network (FCN) as the classifier, MDF demonstrates
superior performance on the UCR time series dataset in benchmarks against other
time series classification methods. It is interesting to find that the triadic
time series motifs give the best result in the test. Due to the motif
clustering reflected in MDF, the significant motifs are detected with the help
of the Gradient-weighted Class Activation Mapping (Grad-CAM). The areas in MDF
with high weight in Grad-CAM have a high contribution from the significant
motifs with the desired ordinal patterns associated with the signature patterns
in time series. However, the signature patterns cannot be identified with the
neural network classifiers directly based on the time series.
R2DE: an NLP approach to estimating IRT parameters of newly generated questions
Luca Benedetto , Andrea Cappelli , Roberto Turrin , Paolo Cremonesi Subjects : Machine Learning (cs.LG) ; Computation and Language (cs.CL); Machine Learning (stat.ML)
The main objective of exams consists in performing an assessment of students’
expertise on a specific subject. Such expertise, also referred to as skill or
knowledge level, can then be leveraged in different ways (e.g., to assign a
grade to the students, to understand whether a student might need some support,
etc.). Similarly, the questions appearing in the exams have to be assessed in
some way before being used to evaluate students. Standard approaches to
questions’ assessment are either subjective (e.g., assessment by human experts)
or introduce a long delay in the process of question generation (e.g.,
pretesting with real students). In this work we introduce R2DE (which is a
Regressor for Difficulty and Discrimination Estimation), a model capable of
assessing newly generated multiple-choice questions by looking at the text of
the question and the text of the possible choices. In particular, it can
estimate the difficulty and the discrimination of each question, as they are
defined in Item Response Theory. We also present the results of extensive
experiments we carried out on a real world large scale dataset coming from an
e-learning platform, showing that our model can be used to perform an initial
assessment of newly created questions and ease some of the problems that arise
in question generation.
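A hedged scikit-learn sketch of a text-to-difficulty regressor in this spirit; the toy data, features, and model choice are assumptions, and R2DE additionally estimates discrimination:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

# Toy question texts (question plus choices concatenated) with IRT
# difficulty targets; a real dataset would come from pretesting logs.
texts = [
    "What is 2 + 2? | 3 | 4 | 5 | 6",
    "Define eigenvalue of a matrix | scalar | vector | matrix | none",
    "Capital of France? | Paris | Rome | Berlin | Madrid",
    "Solve the heat equation on a disk | separation | fourier | greens | laplace",
]
difficulty = [-1.5, 1.2, -2.0, 2.1]   # IRT b-parameters (illustrative values)

model = make_pipeline(TfidfVectorizer(), RandomForestRegressor(random_state=0))
model.fit(texts, difficulty)
print(model.predict(["Integrate x^2 from 0 to 1 | 1/3 | 1/2 | 1 | 2"]))
```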
Classifying Wikipedia in a fine-grained hierarchy: what graphs can contribute
Comments: 7 pages
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Wikipedia is a huge opportunity for machine learning, being the largest
semi-structured base of knowledge available. Because of this, countless works
examine its contents, and focus on structuring it in order to make it usable in
learning tasks, for example by classifying it into an ontology. Beyond its
textual contents, Wikipedia also displays a typical graph structure, where
pages are linked together through citations. In this paper, we address the task
of integrating graph (i.e. structure) information to classify Wikipedia into a
fine-grained named entity ontology (NE), the Extended Named Entity hierarchy.
To address this task, we first start by assessing the relevance of the graph
structure for NE classification. We then explore two directions, one related to
feature vectors using graph descriptors commonly used in large-scale network
analysis, and one extending flat classification to a weighted model taking into
account semantic similarity. We conduct at-scale practical experiments, on a
manually labeled subset of 22,000 pages extracted from the Japanese Wikipedia.
Our results show that integrating graph information succeeds at reducing
sparsity of the input feature space, and yields classification results that are
comparable or better than previous works.
Ensemble Genetic Programming
Comments: eurogp 2020 submission
Subjects:
Machine Learning (cs.LG)
; Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Ensemble learning is a powerful paradigm that has been used in the top
state-of-the-art machine learning methods like Random Forests and XGBoost.
Inspired by the success of such methods, we have developed a new Genetic
Programming method called Ensemble GP. The evolutionary cycle of Ensemble GP
follows the same steps as other Genetic Programming systems, but with
differences in the population structure, fitness evaluation and genetic
operators. We have tested this method on eight binary classification problems,
achieving results significantly better than standard GP, with much smaller
models. Although other methods like M3GP and XGBoost were the best overall,
Ensemble GP was able to achieve exceptionally good generalization results on a
particularly hard problem where none of the other methods was able to succeed.
Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping
Eugenio Bargiacchi , Timothy Verstraeten , Diederik M. Roijers , Ann Nowé Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
We present a new model-based reinforcement learning algorithm, Cooperative
Prioritized Sweeping, for efficient learning in multi-agent Markov decision
processes. The algorithm allows for sample-efficient learning on large problems
by exploiting a factorization to approximate the value function. Our approach
only requires knowledge about the structure of the problem in the form of a
dynamic decision network. Using this information, our method learns a model of
the environment and performs temporal difference updates which affect multiple
joint states and actions at once. Batch updates are additionally performed
which efficiently back-propagate knowledge throughout the factored Q-function.
Our method outperforms the state-of-the-art sparse cooperative Q-learning
algorithm, both on the well-known SysAdmin benchmark and on randomized
environments.
Node Masking: Making Graph Neural Networks Generalize and Scale Better
Pushkar Mishra , Aleksandra Piktus , Gerard Goossen , Fabrizio Silvestri Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Graph Neural Networks (GNNs) have received a lot of interest in recent
times. From the early spectral architectures that could only operate on
undirected graphs per a transductive learning paradigm to the current state of
the art spatial ones that can apply inductively to arbitrary graphs, GNNs have
seen significant contributions from the research community. In this paper, we
discuss some theoretical tools to better visualize the operations performed by
state of the art spatial GNNs. We analyze the inner workings of these
architectures and introduce a simple concept, node masking, that allows them to
generalize and scale better. To empirically validate the theory, we perform
several experiments on two widely used benchmark datasets for node
classification in both transductive and inductive settings.
The gap between theory and practice in function approximation with deep neural networks
Ben Adcock , Nick Dexter Subjects : Machine Learning (cs.LG) ; Numerical Analysis (math.NA); Machine Learning (stat.ML)
Deep learning (DL) is transforming whole industries as complicated
decision-making processes are being automated by Deep Neural Networks (DNNs)
trained on real-world data. Driven in part by a rapidly-expanding literature on
DNN approximation theory showing that DNNs can approximate a rich variety of
functions, these tools are increasingly being considered for problems in
scientific computing. Yet, unlike more traditional algorithms in this field,
relatively little is known about DNNs from the principles of numerical
analysis, namely, stability, accuracy, computational efficiency and sample
complexity. In this paper we introduce a computational framework for examining
DNNs in practice, and use it to study their empirical performance with regard
to these issues. We examine the performance of DNNs of different widths and
depths on a variety of test functions in various dimensions, including smooth
and piecewise smooth functions. We also compare DL against best-in-class
methods for smooth function approximation based on compressed sensing. Our main
conclusion is that there is a crucial gap between the approximation theory of
DNNs and their practical performance, with trained DNNs performing relatively
poorly on functions for which there are strong approximation results (e.g.
smooth functions), yet performing well in comparison to best-in-class methods
for other functions. Finally, we present a novel practical existence theorem,
which asserts the existence of a DNN architecture and training procedure which
offers the same performance as current best-in-class schemes. This result
indicates the potential for practical DNN approximation, and the need for
future research into practical architecture design and training strategies.
Engineering AI Systems: A Research Agenda
Comments: 13 pages, 3 figures, highlights section
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Deploying machine learning and, in particular, deep learning (ML/DL) solutions
in industry-strength, production-quality contexts proves to be challenging.
This requires a structured engineering approach to constructing and evolving
systems that contain ML/DL components. In this paper, we provide a
conceptualization of the typical evolution patterns that companies experience
when employing ML/DL, as well as a framework for integrating ML/DL components
in systems consisting of
multiple types of components. In addition, we provide an overview of the
engineering challenges surrounding AI/ML/DL solutions and, based on that, we
provide a research agenda and overview of open items that need to be addressed
by the research community at large.
Unsupervisedly Learned Representations: Should the Quest be Over?
Comments: submitted to Pattern Recognition Letters
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI)
There exists a Classification accuracy gap of about 20% between our best
methods of generating Unsupervisedly Learned Representations and the accuracy
rates achieved by (naturally Unsupervisedly Learning) humans. We are at least
in our fourth decade of searching for this class of paradigms. It thus may well
be that we are looking in the wrong direction. We present in this paper a
possible solution to this puzzle. We demonstrate that Reinforcement Learning
schemes can learn representations, which may be used for Pattern Recognition
tasks such as Classification, achieving practically the same accuracy as that
of humans. Our main modest contribution lies in the observations that: a. when
applied to a real world environment (e.g. nature itself) Reinforcement Learning
does not require labels, and thus may be considered a natural candidate for the
long sought, accuracy competitive Unsupervised Learning method, and b. in
contrast, when Reinforcement Learning is applied in a simulated or symbolic
processing environment (e.g. a computer program) it does inherently require
labels and should thus be generally classified, with some exceptions, as
Supervised Learning. The corollary of these observations is that further search
for Unsupervised Learning competitive paradigms which may be trained in
simulated environments like many of those found in research and applications
may be futile.
Fast Sequence-Based Embedding with Diffusion Graphs
Comments: Source code available at: this https URL
Journal-ref: CompleNet 2018
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
A graph embedding is a representation of graph vertices in a low-dimensional
space, which approximately preserves properties such as distances between
nodes. Vertex sequence-based embedding procedures use features extracted from
linear sequences of nodes to create embeddings using a neural network. In this
paper, we propose diffusion graphs as a method to rapidly generate vertex
sequences for network embedding. Its computational efficiency is superior to
previous methods due to simpler sequence generation, and it produces more
accurate results. In experiments, we found that the performance relative to
other methods improves with increasing edge density in the graph. In a
community detection task, clustering nodes in the embedding space produces
better results compared to other sequence-based embedding methods.
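The sequence-generation step can be pictured as follows. This is a hedged sketch assuming a plain random-walk style diffusion from each source vertex; the paper's exact diffusion-graph construction may differ, and all names are illustrative.
    import random
    import networkx as nx

    def vertex_sequences(graph, walk_length=10, walks_per_node=5, seed=0):
        """Generate linear vertex sequences by randomly diffusing from each node.

        The sequences can then be fed to a word2vec-style model exactly as
        sentences of tokens to produce the embedding.
        """
        rng = random.Random(seed)
        sequences = []
        for source in graph.nodes():
            for _ in range(walks_per_node):
                walk = [source]
                while len(walk) < walk_length:
                    nbrs = list(graph.neighbors(walk[-1]))
                    if not nbrs:
                        break
                    walk.append(rng.choice(nbrs))
                sequences.append([str(v) for v in walk])
        return sequences

    g = nx.karate_club_graph()
    seqs = vertex_sequences(g)
    print(len(seqs), seqs[0])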
Learning to Control PDEs with Differentiable Physics
Comments: Published as a conference paper at ICLR 2020. Main text: 10 pages, 6 figures, 3 tables. Total: 28 pages, 18 figures
Subjects:
Machine Learning (cs.LG)
; Fluid Dynamics (physics.flu-dyn); Machine Learning (stat.ML)
Predicting outcomes and planning interactions with the physical world are
long-standing goals for machine learning. A variety of such tasks involve
continuous physical systems, which can be described by partial differential
equations (PDEs) with many degrees of freedom. Existing methods that aim to
control the dynamics of such systems are typically limited to relatively short
time frames or a small number of interaction parameters. We present a novel
hierarchical predictor-corrector scheme which enables neural networks to learn
to understand and control complex nonlinear physical systems over long time
frames. We propose to split the problem into two distinct tasks: planning and
control. To this end, we introduce a predictor network that plans optimal
trajectories and a control network that infers the corresponding control
parameters. Both stages are trained end-to-end using a differentiable PDE
solver. We demonstrate that our method successfully develops an understanding
of complex physical systems and learns to control them for tasks involving PDEs
such as the incompressible Navier-Stokes equations.
Fredrik D. Johansson , Uri Shalit , Nathan Kallus , David Sontag Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Practitioners in diverse fields such as healthcare, economics and education
are eager to apply machine learning to improve decision making. The cost and
impracticality of performing experiments and a recent monumental increase in
electronic record keeping have brought attention to the problem of evaluating
decisions based on non-experimental observational data. This is the setting of
this work. In particular, we study estimation of individual-level causal
effects, such as a single patient’s response to alternative medication, from
recorded contexts, decisions and outcomes. We give generalization bounds on the
error in estimated effects based on distance measures between groups receiving
different treatments, allowing for sample re-weighting. We provide conditions
under which our bound is tight and show how it relates to results for
unsupervised domain adaptation. Led by our theoretical results, we devise
representation learning algorithms that minimize our bound, by regularizing the
representation’s induced treatment group distance, and encourage sharing of
information between treatment groups. We extend these algorithms to
simultaneously learn a weighted representation to further reduce treatment
group distances. Finally, an experimental evaluation on real and synthetic data
shows the value of our proposed representation architecture and regularization
scheme.
Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach
Carlos Fernandez , Foster Provost , Xintian Han Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Lack of understanding of the decisions made by model-based AI systems is an
important barrier for their adoption. We examine counterfactual explanations as
an alternative for explaining AI decisions. The counterfactual approach defines
an explanation as a set of the system’s data inputs that causally drives the
decision (meaning that removing them changes the decision) and is irreducible
(meaning that removing any subset of the inputs in the explanation does not
change the decision). We generalize previous work on counterfactual
explanations, resulting in a framework that (a) is model-agnostic, (b) can
address features with arbitrary data types, (c) is able to explain decisions made
by complex AI systems that incorporate multiple models, and (d) is scalable to
large numbers of features. We also propose a heuristic procedure to find the
most useful explanations depending on the context. We contrast counterfactual
explanations with another alternative: methods that explain model predictions
by weighting features according to their importance (e.g., SHAP, LIME). This
paper presents two fundamental reasons why explaining model predictions is not
the same as explaining the decisions made using those predictions, suggesting
we should carefully consider whether importance-weight explanations are
well-suited to explain decisions made by AI systems. Specifically, we show that
(1) features that have a large importance weight for a model prediction may not
actually affect the corresponding decision, and (2) importance weights are
insufficient to communicate whether and how features influence system
decisions. We demonstrate this using several examples, including three detailed
studies using real-world data that compare the counterfactual approach with
SHAP and illustrate various conditions under which counterfactual explanations
explain data-driven decisions better than feature importance weights.
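To make the notion of an irreducible explanation concrete, here is a minimal sketch of one plausible search strategy: greedily remove (zero out) input features until the decision flips, then prune the removed set until it is irreducible. The paper's heuristic may differ; the decision function and the zeroing replacement value are illustrative assumptions.
    import numpy as np

    def counterfactual_explanation(decide, x, candidate_order=None):
        """Find an irreducible set of features whose removal flips decide(x).

        decide: function mapping a feature vector to a decision (e.g. 0/1).
        Features are 'removed' by zeroing them, a stand-in for any suitable
        replacement value (mean, reference class, etc.).
        """
        base = decide(x)
        order = candidate_order or range(len(x))
        removed = []
        x_work = x.copy()
        for i in order:                        # grow until the decision flips
            x_work[i] = 0.0
            removed.append(i)
            if decide(x_work) != base:
                break
        else:
            return None                        # no counterfactual found
        # Prune: put features back one at a time; keep only those needed.
        for i in list(removed):
            x_try = x_work.copy()
            x_try[i] = x[i]
            if decide(x_try) != base:          # still flipped without removing i
                x_work = x_try
                removed.remove(i)
        return removed

    # Toy decision: flag if the sum of features exceeds a threshold.
    decide = lambda v: int(v.sum() > 1.0)
    print(counterfactual_explanation(decide, np.array([0.9, 0.4, 0.1])))  # [0]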
Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
Comments: 14 pages, 8 figures, ICLR2020 accepted as spotlight presentation
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
As deep neural networks (DNNs) achieve tremendous success across many
application domains, researchers have explored from many angles why they
generalize well. In this paper, we provide a novel perspective on these issues
using the gradient signal-to-noise ratio (GSNR) of parameters during the training
process of DNNs. The GSNR of a parameter is defined as the ratio between its
gradient’s squared mean and variance, over the data distribution. Based on
several approximations, we establish a quantitative relationship between model
parameters’ GSNR and the generalization gap. This relationship indicates that
a larger GSNR during the training process leads to better generalization
performance. Moreover, we show that, unlike for shallow models (e.g. logistic
regression, support vector machines), the gradient descent optimization
dynamics of DNNs naturally produces large GSNR during training, which is
probably the key to DNNs’ remarkable generalization ability.
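The quantity itself is straightforward to compute. Given per-sample gradients over the data, the GSNR of each parameter is its gradient's squared mean divided by its variance; a minimal numpy sketch (illustrative, not the authors' code):
    import numpy as np

    def gsnr(per_sample_grads, eps=1e-12):
        """Gradient signal-to-noise ratio per parameter.

        per_sample_grads: array of shape (num_samples, num_params), where row i
        holds the gradient of the loss on sample i w.r.t. each parameter.
        GSNR_j = E[g_j]^2 / Var[g_j], estimated over the data distribution.
        """
        mean = per_sample_grads.mean(axis=0)
        var = per_sample_grads.var(axis=0)
        return mean ** 2 / (var + eps)

    # Toy check: a parameter whose gradient agrees across samples (high GSNR)
    # versus one whose gradient is mostly noise (low GSNR).
    g = np.stack([np.array([1.0, 1.0]) + np.random.randn(2) * np.array([0.1, 5.0])
                  for _ in range(1000)])
    print(gsnr(g))  # first entry large, second entry small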
On the infinite width limit of neural networks with a standard parameterization
Jascha Sohl-Dickstein , Roman Novak , Samuel S. Schoenholz , Jaehoon Lee Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
There are currently two parameterizations used to derive fixed kernels
corresponding to infinite width neural networks, the NTK (Neural Tangent
Kernel) parameterization and the naive standard parameterization. However, the
extrapolation of both of these parameterizations to infinite width is
problematic. The standard parameterization leads to a divergent neural tangent
kernel while the NTK parameterization fails to capture crucial aspects of
finite width networks such as: the dependence of training dynamics on relative
layer widths, the relative training dynamics of weights and biases, and a
nonstandard learning rate scale. Here we propose an improved extrapolation of
the standard parameterization that preserves all of these properties as width
is taken to infinity and yields a well-defined neural tangent kernel. We show
experimentally that the resulting kernels typically achieve similar accuracy to
those resulting from an NTK parameterization, but with better correspondence to
the parameterization of typical finite width networks. Additionally, with
careful tuning of width parameters, the improved standard parameterization
kernels can outperform those stemming from an NTK parameterization. We release
code implementing this improved standard parameterization as part of the Neural
Tangents library at this https URL .
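The difference between the two conventions is easy to state in code. In the sketch below (an illustration under common definitions, not the paper's improved interpolation), the standard parameterization puts the 1/sqrt(fan_in) scaling into the weight initialization, while the NTK parameterization keeps O(1) weights and scales the pre-activation instead:
    import numpy as np

    rng = np.random.default_rng(0)

    def dense_standard(x, fan_in, fan_out, sigma_w=1.0):
        """Standard parameterization: variance scaling lives in the init."""
        W = rng.normal(0.0, sigma_w / np.sqrt(fan_in), size=(fan_in, fan_out))
        return x @ W

    def dense_ntk(x, fan_in, fan_out, sigma_w=1.0):
        """NTK parameterization: O(1) weights, explicit 1/sqrt(fan_in) factor
        in the forward pass, keeping per-weight gradients comparable across
        widths."""
        W = rng.normal(0.0, 1.0, size=(fan_in, fan_out))
        return (sigma_w / np.sqrt(fan_in)) * (x @ W)

    x = rng.normal(size=(8, 512))
    # Both produce pre-activations with the same O(1) scale...
    print(dense_standard(x, 512, 256).std(), dense_ntk(x, 512, 256).std())
    # ...but the weight magnitudes differ greatly, which changes gradient
    # descent dynamics as the width grows.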
Mixed integer programming formulation of unsupervised learning
Arturo Berrones-Santos Subjects : Machine Learning (cs.LG) ; Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)
A novel formulation and training procedure for full Boltzmann machines in
terms of a mixed binary quadratic feasibility problem is given. As a proof of
concept, the theory is analytically and numerically tested on XOR patterns.
SGLB: Stochastic Gradient Langevin Boosting
Aleksei Ustimenko , Liudmila Prokhorenkova Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
In this paper, we introduce Stochastic Gradient Langevin Boosting (SGLB) – a
powerful and efficient machine learning framework, which may deal with a wide
range of loss functions and has provable generalization guarantees. The method
is based on a special form of Langevin Diffusion equation specifically designed
for gradient boosting. This allows us to guarantee the global convergence,
while standard gradient boosting algorithms can guarantee only local optima,
which is a problem for multimodal loss functions. To illustrate the advantages
of SGLB, we apply it to a classification task with 0-1 loss function, which is
known to be multimodal, and to a standard logistic regression task that is
convex. The algorithm is implemented as a part of the CatBoost gradient
boosting library and outperforms classic gradient boosting methods.
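The Langevin ingredient can be illustrated generically: each boosting step is perturbed with Gaussian noise whose scale is tied to the learning rate and an inverse temperature. A minimal sketch with a tree weak learner, under these stated assumptions (not the CatBoost implementation; all names are illustrative):
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def langevin_boosting_step(F, grad_loss, fit_weak_learner, X, y,
                               lr=0.1, beta=1e4, rng=None):
        """One gradient-boosting step with Langevin-style noise.

        F: current ensemble predictions on X; grad_loss: d loss / d F;
        fit_weak_learner: fits a regressor to pseudo-residuals and returns its
        predictions on X. The injected Gaussian noise (scale sqrt(2*lr/beta))
        is what turns plain gradient boosting into a Langevin-type scheme.
        """
        rng = rng or np.random.default_rng()
        residuals = -grad_loss(F, y)                    # pseudo-residuals
        h = fit_weak_learner(X, residuals)              # weak learner output
        noise = rng.normal(0.0, np.sqrt(2.0 * lr / beta), size=F.shape)
        return F + lr * h + noise

    # Toy usage with squared loss, whose gradient w.r.t. F is (F - y).
    def fit_tree(X, r):
        return DecisionTreeRegressor(max_depth=3).fit(X, r).predict(X)

    X = np.random.randn(200, 4)
    y = X[:, 0] ** 2 + np.random.randn(200) * 0.1
    F = np.zeros(200)
    for _ in range(50):
        F = langevin_boosting_step(F, lambda F, y: F - y, fit_tree, X, y)
    print(np.mean((F - y) ** 2))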
Heterogeneous Transfer Learning in Ensemble Clustering
Comments: 10 pages, 5 figures
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
This work proposes an ensemble clustering method using a transfer learning
approach. We consider a clustering problem, in which in addition to data under
consideration, “similar” labeled data are available. The datasets can be
described with different features. The method is based on constructing
meta-features which describe structural characteristics of data, and their
transfer from source to target domain. An experimental study of the method
using Monte Carlo modeling has confirmed its efficiency. In comparison with
other similar methods, the proposed one is able to work under arbitrary feature
descriptions of source and target domains, and it has lower complexity.
Model Reuse with Reduced Kernel Mean Embedding Specification
Xi-Zhu Wu , Wenkai Xu , Song Liu , Zhi-Hua Zhou Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Given a publicly available pool of machine learning models constructed for
various tasks, when a user plans to build a model for her own machine learning
application, is it possible to build upon models in the pool such that the
previous efforts on these existing models can be reused rather than starting
from scratch? Here, a grand challenge is how to find models that are helpful
for the current application, without accessing the raw training data for the
models in the pool. In this paper, we present a two-phase framework. In the
upload phase, when a model is uploaded into the pool, we construct a reduced
kernel mean embedding (RKME) as a specification for the model. Then in the
deployment phase, the relatedness of the current task and pre-trained models
will be measured based on the value of the RKME specification. Theoretical
results and extensive experiments validate the effectiveness of our approach.
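The kernel mean embedding machinery behind such a specification can be sketched briefly: a model's training data is summarized by a small weighted point set whose kernel mean approximates the data's embedding, and task-model relatedness can then be scored by the maximum mean discrepancy (MMD) between embeddings. All names below are illustrative assumptions, not the authors' code:
    import numpy as np

    def rbf(A, B, gamma=1.0):
        """RBF kernel matrix between rows of A and rows of B."""
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def mmd_to_specification(X_task, Z, beta, gamma=1.0):
        """Squared MMD between a task's empirical kernel mean embedding and a
        reduced specification (points Z with weights beta summing to 1)."""
        n = X_task.shape[0]
        t1 = rbf(X_task, X_task, gamma).sum() / n**2      # <mu_task, mu_task>
        t2 = beta @ rbf(Z, Z, gamma) @ beta               # <mu_spec, mu_spec>
        t3 = rbf(X_task, Z, gamma).mean(0) @ beta         # <mu_task, mu_spec>
        return t1 + t2 - 2 * t3

    # Pick the most related pre-trained model: smallest MMD to the current task.
    task = np.random.randn(100, 2)
    spec_a = (np.random.randn(8, 2), np.ones(8) / 8)          # centered spec
    spec_b = (np.random.randn(8, 2) + 5.0, np.ones(8) / 8)    # shifted spec
    scores = [mmd_to_specification(task, Z, b) for Z, b in (spec_a, spec_b)]
    print(scores)  # spec_a should score lower (more related)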
An interpretable neural network model through piecewise linear approximation
Mengzhuo Guo , Qingpeng Zhang , Xiuwu Liao , Daniel Dajun Zeng Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Most existing interpretable methods explain a black-box model in a post-hoc
manner, which uses simpler models or data analysis techniques to interpret the
predictions after the model is learned. However, they (a) may derive
contradictory explanations on the same predictions given different methods and
data samples, and (b) focus on using simpler models to provide higher
descriptive accuracy at the expense of prediction accuracy. To address these
issues, we propose a hybrid interpretable model that combines a piecewise
linear component and a nonlinear component. The first component describes the
explicit feature contributions by piecewise linear approximation to increase
the expressiveness of the model. The other component uses a multi-layer
perceptron to capture feature interactions and implicit nonlinearity, and
increase the prediction performance. Different from the post-hoc approaches,
the interpretability is obtained once the model is learned in the form of
feature shapes. We also provide a variant to explore higher-order interactions
among features to demonstrate that the proposed model is flexible for
adaptation. Experiments demonstrate that the proposed model can achieve good
interpretability by describing feature shapes while maintaining
state-of-the-art accuracy.
Projection based Active Gaussian Process Regression for Pareto Front Modeling
Zhengqi Gao , Jun Tao , Yangfeng Su , Dian Zhou , Xuan Zeng Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Pareto Front (PF) modeling is essential in decision-making problems across
many domains, such as economics, medicine, and engineering. In the Operations
Research literature, this task has been addressed with multi-objective
optimization algorithms. However, without learned models of the PF, these
methods cannot examine whether a newly provided point lies on the PF or not.
In this paper, we reconsider the task from a data mining perspective. A novel
projection-based active Gaussian process regression (P-aGPR) method is
proposed for efficient PF modeling. First, P-aGPR chooses a series of
projection spaces with dimensionalities ranging from low to high. Next, in
each projection space, a Gaussian process regression (GPR) model is trained to
represent the constraint that the PF should satisfy in that space. Moreover,
to improve modeling efficacy and stability, an active learning framework is
developed by exploiting the uncertainty information obtained from the GPR
models. Different from existing methods, the proposed P-aGPR method can not
only provide a generative PF model, but also quickly examine whether a given
point lies on the PF or not. The numerical results demonstrate that, compared
to state-of-the-art passive learning methods, the proposed P-aGPR method
achieves higher modeling accuracy and stability.
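The active learning component can be pictured with a generic maximum-uncertainty acquisition loop for GPR; the sketch below (using scikit-learn) illustrates only this general idea, not the paper's projection-space machinery, and all names are illustrative.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def active_gpr(f, X_pool, n_init=5, n_queries=15, seed=0):
        """Fit a GPR by iteratively querying the pool point with the largest
        predictive standard deviation (maximum-uncertainty acquisition)."""
        rng = np.random.default_rng(seed)
        idx = list(rng.choice(len(X_pool), n_init, replace=False))
        for _ in range(n_queries):
            X = X_pool[idx]
            y = np.array([f(x) for x in X])
            gp = GaussianProcessRegressor(kernel=RBF(1.0)).fit(X, y)
            _, std = gp.predict(X_pool, return_std=True)
            std[idx] = -np.inf                 # don't re-query known points
            idx.append(int(np.argmax(std)))    # most uncertain point next
        return gp, idx

    # Toy 1-D target on a dense pool of candidate locations.
    X_pool = np.linspace(-3, 3, 200)[:, None]
    gp, queried = active_gpr(lambda x: np.sin(2 * x[0]), X_pool)
    print(len(queried), "points queried")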
A point-wise linear model reveals reasons for 30-day readmission of heart failure patients
Comments: 8 pages, 3 figures
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Heart failure in the United States costs an estimated 30.7 billion dollars
annually and predictive analysis can decrease costs due to readmission of heart
failure patients. Deep learning can predict readmissions but does not give
reasons for its predictions. Ours is the first study on a deep-learning
approach to explaining decisions behind readmission predictions. Additionally,
it provides an automatic patient stratification to explain cohorts of
readmitted patients. The new deep-learning model called a point-wise linear
model is a meta-learning machine of linear models. It generates a logistic
regression model to predict early readmission for each patient. The custom-made
prediction models allow us to analyze feature importance. We evaluated the
approach using a dataset of heart failure patients readmitted within 30 days.
This study has been submitted to PLOS ONE; in advance, we share here the
theoretical aspects of the point-wise linear model as a part of our study.
Finding the Sparsest Vectors in a Subspace: Theory, Algorithms, and Applications
Comments: QQ and ZZ contributed equally to the work. Invited review paper for IEEE Signal Processing Magazine Special Issue on non-convex optimization for signal processing and machine learning. This article contains 26 pages with 11 figures
Subjects:
Machine Learning (cs.LG)
; Information Theory (cs.IT); Image and Video Processing (eess.IV); Optimization and Control (math.OC); Machine Learning (stat.ML)
The problem of finding the sparsest vector (direction) in a low-dimensional
subspace can be considered as a homogeneous variant of the sparse recovery
problem, which finds applications in robust subspace recovery, dictionary
learning, sparse blind deconvolution, and many other problems in signal
processing and machine learning. However, in contrast to the classical sparse
recovery problem, the most natural formulation for finding the sparsest vector
in a subspace is usually nonconvex. In this paper, we overview recent advances
on global nonconvex optimization theory for solving this problem, ranging from
geometric analysis of its optimization landscapes, to efficient optimization
algorithms for solving the associated nonconvex optimization problem, to
applications in machine intelligence, representation learning, and imaging
sciences. Finally, we conclude this review by pointing out several interesting
open problems for future research.
Reinforcement Learning with Probabilistically Complete Exploration
Philippe Morere , Gilad Francis , Tom Blau , Fabio Ramos Subjects : Machine Learning (cs.LG) ; Robotics (cs.RO); Machine Learning (stat.ML)
Balancing exploration and exploitation remains a key challenge in
reinforcement learning (RL). State-of-the-art RL algorithms suffer from high
sample complexity, particularly in the sparse reward case, where they can do no
better than to explore in all directions until the first positive rewards are
found. To mitigate this, we propose Rapidly Randomly-exploring Reinforcement
Learning (R3L). We formulate exploration as a search problem and leverage
widely-used planning algorithms such as Rapidly-exploring Random Tree (RRT) to
find initial solutions. These solutions are used as demonstrations to
initialize a policy, then refined by a generic RL algorithm, leading to faster
and more stable convergence. We provide theoretical guarantees of R3L
exploration finding successful solutions, as well as bounds for its sampling
complexity. We experimentally demonstrate that the method outperforms classic and
intrinsic exploration techniques, requiring only a fraction of exploration
samples and achieving better asymptotic performance.
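To ground the exploration idea, here is a minimal RRT in a 2-D unit square; in R3L the tree would instead be grown through environment transitions and the resulting path used as a demonstration for policy initialization. Names and parameters are illustrative:
    import numpy as np

    def rrt(start, goal, step=0.1, goal_tol=0.15, iters=5000, seed=0):
        """Minimal RRT in the unit square: grow a tree toward random samples
        until a node lands within goal_tol of the goal."""
        rng = np.random.default_rng(seed)
        nodes = [np.asarray(start, dtype=float)]
        parent = {0: None}
        for _ in range(iters):
            sample = rng.random(2)                       # random target state
            i = int(np.argmin([np.linalg.norm(n - sample) for n in nodes]))
            direction = sample - nodes[i]
            new = nodes[i] + step * direction / (np.linalg.norm(direction) + 1e-9)
            parent[len(nodes)] = i
            nodes.append(new)
            if np.linalg.norm(new - goal) < goal_tol:    # reached the goal region
                path, j = [], len(nodes) - 1
                while j is not None:                     # backtrack to the root
                    path.append(nodes[j])
                    j = parent[j]
                return path[::-1]
        return None

    path = rrt([0.05, 0.05], np.array([0.9, 0.9]))
    print(None if path is None else len(path))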
Memory capacity of neural networks with threshold and ReLU activations
Comments: 25 pages
Subjects:
Machine Learning (cs.LG)
; Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Overwhelming theoretical and empirical evidence shows that mildly
overparametrized neural networks — those with more connections than the size
of the training data — are often able to memorize the training data with
(100\%) accuracy. This was rigorously proved for networks with sigmoid
activation functions and, very recently, for ReLU activations. Addressing a
1988 open question of Baum, we prove that this phenomenon holds for general
multilayered perceptrons, i.e. neural networks with threshold activation
functions, or with any mix of threshold and ReLU activations. Our construction
is probabilistic and exploits sparsity.
A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications
Jie Gui , Zhenan Sun , Yonggang Wen , Dacheng Tao , Jieping Ye Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Generative adversarial networks (GANs) have recently been a hot research
topic. GANs have been widely studied since 2014, and a large number of
algorithms have been proposed. However, few comprehensive studies explain the
connections among different GAN variants and how they have evolved. In this
paper, we attempt to provide a review of various GAN methods from the
perspectives of algorithms, theory, and applications. Firstly, the motivations,
mathematical representations, and structures of most GAN algorithms are
introduced in detail. Furthermore, GANs have been combined with other machine
learning algorithms for specific applications, such as semi-supervised
learning, transfer learning, and reinforcement learning. This paper compares
the commonalities and differences of these GAN methods. Secondly, theoretical
issues related to GANs are investigated. Thirdly, typical applications of GANs
in image processing and computer vision, natural language processing, music,
speech and audio, medical field, and data science are illustrated. Finally, the
future open research problems for GANs are pointed out.
Infrequent adverse event prediction in low carbon energy production using machine learning
Stefano Coniglio , Anthony J. Dunn , Alain B. Zemkoho Subjects : Machine Learning (cs.LG) ; Optimization and Control (math.OC)
Machine Learning is one of the fastest growing fields in academia. Many
industries are aiming to incorporate machine learning tools into their
day-to-day operation. However, the keystone of doing so is recognising when
one has a problem which can be solved using machine learning. Adverse event
prediction is one such problem. There is a wide range of methods for the
production of sustainable energy, and in many of them adverse events can occur
which impede energy production and even damage equipment. The two examples of adverse event
prediction in sustainable energy production we examine in this paper are foam
formation in anaerobic digestion and condenser fouling in steam turbines as
used in nuclear power stations. In this paper we will propose a framework for:
formalising a classification problem based around adverse event prediction,
building predictive maintenance models capable of predicting these events
before they occur and testing the reliability of these models.
Distributionally Robust Bayesian Quadrature Optimization
Comments: AISTATS2020
Journal-ref: AISTATS2020
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Bayesian quadrature optimization (BQO) maximizes the expectation of an
expensive black-box integrand taken over a known probability distribution. In
this work, we study BQO under distributional uncertainty in which the
underlying probability distribution is unknown except for a limited set of its
i.i.d. samples. A standard BQO approach maximizes the Monte Carlo estimate of
the true expected objective given the fixed sample set. Though the Monte Carlo
estimate is unbiased, it has high variance given a small set of samples, and
thus can result in a spurious objective function. We adopt the distributionally
robust optimization perspective to this problem by maximizing the expected
objective under the most adversarial distribution. In particular, we propose a
novel posterior sampling based algorithm, namely distributionally robust BQO
(DRBQO) for this purpose. We demonstrate the empirical effectiveness of our
proposed framework in synthetic and real-world problems, and characterize its
theoretical convergence via Bayesian regret.
Discriminator Soft Actor Critic without Extrinsic Rewards
Daichi Nishio , Daiki Kuyoshi , Toi Tsuneda , Satoshi Yamane Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
It is difficult to imitate well in unknown states from a small amount of
expert data and sampled data. Supervised learning methods such as Behavioral
Cloning do not require sampled data, but usually suffer from
distribution shift. The methods based on reinforcement learning, such as
inverse reinforcement learning and generative adversarial imitation learning
(GAIL), can learn from only a few expert data. However, they often need to
interact with the environment. Soft Q imitation learning addressed these
problems, and it was shown that it could learn efficiently by combining
Behavioral Cloning and soft Q-learning with constant rewards. In order to make
this algorithm more robust to distribution shift, we propose Discriminator Soft
Actor Critic (DSAC). It uses a reward function based on adversarial inverse
reinforcement learning instead of constant rewards. We evaluated it on PyBullet
environments with only four expert trajectories.
Learning Options from Demonstration using Skill Segmentation
Comments: To be published in SAUPEC/RobMech/PRASA 2020. Consists of 6 pages, 5 figures
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
We present a method for learning options from segmented demonstration
trajectories. The trajectories are first segmented into skills using
nonparametric Bayesian clustering and a reward function for each segment is
then learned using inverse reinforcement learning. From this, a set of inferred
trajectories for the demonstration are generated. Option initiation sets and
termination conditions are learned from these trajectories using the one-class
support vector machine clustering algorithm. We demonstrate our method in the
four rooms domain, where an agent is able to autonomously discover usable
options from human demonstration. Our results show that these inferred options
can then be used to improve learning and planning.
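The initiation-set step can be illustrated with scikit-learn's one-class SVM: states from which a segmented skill was executed form the positive class, and the learned decision function marks where the option may be initiated. A small sketch with synthetic states (parameters are illustrative):
    import numpy as np
    from sklearn.svm import OneClassSVM

    # Synthetic 2-D states observed at the start of one segmented skill.
    rng = np.random.default_rng(0)
    initiation_states = rng.normal(loc=[2.0, -1.0], scale=0.3, size=(200, 2))

    # Fit the one-class SVM; its decision boundary is the initiation set.
    clf = OneClassSVM(kernel="rbf", gamma=2.0, nu=0.05).fit(initiation_states)

    # Query: can the option be initiated from these states?
    queries = np.array([[2.1, -0.9],     # near the demonstrated region
                        [-3.0, 3.0]])    # far away
    print(clf.predict(queries))          # +1 = inside initiation set, -1 = outside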
Gradient Surgery for Multi-Task Learning
Tianhe Yu , Saurabh Kumar , Abhishek Gupta , Sergey Levine , Karol Hausman , Chelsea Finn Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)
While deep learning and deep reinforcement learning (RL) systems have
demonstrated impressive results in domains such as image classification, game
playing, and robotic control, data efficiency remains a major challenge.
Multi-task learning has emerged as a promising approach for sharing structure
across multiple tasks to enable more efficient learning. However, the
multi-task setting presents a number of optimization challenges, making it
difficult to realize large efficiency gains compared to learning tasks
independently. The reasons why multi-task learning is so challenging compared
to single-task learning are not fully understood. In this work, we identify a
set of three conditions of the multi-task optimization landscape that cause
detrimental gradient interference, and develop a simple yet general approach
for avoiding such interference between task gradients. We propose a form of
gradient surgery that projects a task’s gradient onto the normal plane of the
gradient of any other task that has a conflicting gradient. On a series of
challenging multi-task supervised and multi-task RL problems, this approach
leads to substantial gains in efficiency and performance. Further, it is
model-agnostic and can be combined with previously-proposed multi-task
architectures for enhanced performance.
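The projection rule is compact enough to state directly: when two task gradients conflict (negative inner product), each is projected onto the normal plane of the other before the updates are combined. A minimal sketch consistent with this description (not necessarily the authors' exact implementation):
    import numpy as np

    def gradient_surgery(grads, rng=None):
        """Project each task gradient onto the normal plane of any other task
        gradient it conflicts with (negative dot product), then sum.

        grads: list of flattened per-task gradient vectors.
        """
        rng = rng or np.random.default_rng()
        projected = []
        for i, g in enumerate(grads):
            g = g.copy()
            others = [j for j in range(len(grads)) if j != i]
            rng.shuffle(others)                   # consider conflicts in random order
            for j in others:
                dot = g @ grads[j]
                if dot < 0:                       # conflicting gradient
                    g -= dot / (grads[j] @ grads[j]) * grads[j]
            projected.append(g)
        return np.sum(projected, axis=0)

    # Two conflicting task gradients: the combined update no longer fights itself.
    g1 = np.array([1.0, 0.0])
    g2 = np.array([-1.0, 1.0])
    print(gradient_surgery([g1, g2]))  # [0.5, 1.5]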
Algebraic and Analytic Approaches for Parameter Learning in Mixture Models
Comments: 22 pages, Accepted at Algorithmic Learning Theory (ALT) 2020
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
We present two different approaches for parameter learning in several mixture
models in one dimension. Our first approach uses complex-analytic methods and
applies to Gaussian mixtures with shared variance, binomial mixtures with
shared success probability, and Poisson mixtures, among others. An example
result is that (exp(O(N^{1/3}))) samples suffice to exactly learn a mixture of
(k) components. Our second approach uses algebraic and combinatorial tools and
applies to binomial mixtures with shared trial parameter (N) and differing
success parameters, as well as to mixtures of geometric distributions. Again,
as an example, for binomial mixtures with (k) components and success
parameters discretized to resolution (epsilon),
(O(k^2(N/epsilon)^{8/sqrt{epsilon}})) samples suffice to exactly recover the
parameters. For some of these distributions, our results represent the first
guarantees for parameter estimation.
Comments: 10 pages, 5 figures
Subjects:
Machine Learning (cs.LG)
; Robotics (cs.RO); Machine Learning (stat.ML)
Actor-critic methods with sparse rewards in model-based deep reinforcement
learning typically require a deterministic binary reward function that
reflects only two possible outcomes: whether, for each step, the goal has been
achieved or not. Our hypothesis is that we can influence an agent to learn
faster by applying an external environmental pressure during training, which
adversely impacts its ability to get higher rewards. As such, we deviate from
the classical paradigm of sparse rewards and add a uniformly sampled reward
value to the baseline reward to show that (1) sample efficiency of the
training process can be correlated to the adversity experienced during
training, (2) it is possible to achieve higher performance in less time and
with fewer resources, (3) we can reduce the performance variability
experienced from seed to seed, (4) there is a maximum point after which more
pressure will not generate better results, and (5) random positive incentives
have an adverse effect when using a negative reward strategy, making an agent
under those conditions learn poorly and more slowly. These results have been
shown to be valid for Deep Deterministic Policy Gradients using Hindsight
Experience Replay in a well-known MuJoCo environment, but we argue that they
could be generalized to other methods and environments as well.
Marek Śmieja , Łukasz Struski , Mário A. T. Figueiredo Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
In this paper, we introduce a neural network framework for semi-supervised
clustering (SSC) with pairwise (must-link or cannot-link) constraints. In
contrast to existing approaches, we decompose SSC into two simpler
classification tasks/stages: the first stage uses a pair of Siamese neural
networks to label the unlabeled pairs of points as must-link or cannot-link;
the second stage uses the fully pairwise-labeled dataset produced by the first
stage in a supervised neural-network-based clustering method. The proposed
approach, S3C2 (Semi-Supervised Siamese Classifiers for Clustering), is
motivated by the observation that binary classification (such as assigning
pairwise relations) is usually easier than multi-class clustering with partial
supervision. On the other hand, being classification-based, our method solves
only well-defined classification problems, rather than less well specified
clustering tasks. Extensive experiments on various datasets demonstrate the
high performance of the proposed method.
Comments: 191 pages, PhD thesis
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
This dissertation explores the integration of learning and analogy-making
through the development of a computer program, called Analogator, that learns
to make analogies by example. By “seeing” many different analogy problems,
along with possible solutions, Analogator gradually develops an ability to
make new analogies.
That is, it learns to make analogies by analogy. This approach stands in
contrast to most existing research on analogy-making, in which the a priori
existence of analogical mechanisms within a model is typically assumed. The
present research extends standard connectionist methodologies by developing a
specialized associative training procedure for a recurrent network
architecture. The network is trained to divide input scenes (or situations)
into appropriate figure and ground components. Seeing one scene in terms of a
particular figure and ground provides the context for seeing another in an
analogous fashion. After training, the model is able to make new analogies
between novel situations. Analogator has much in common with lower-level
perceptual models of categorization and recognition; it thus serves as a
unifying framework encompassing both high-level analogical learning and
low-level perception. This approach is compared and contrasted with other
computational models of analogy-making. The model’s training and
generalization performance is examined, and limitations are discussed.
Shuo Wang , Tianle Chen , Shangyu Chen , Carsten Rudolph , Surya Nepal , Marthie Grobler Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Machine Learning (stat.ML)
Anomaly detection aims to recognize samples with anomalous and unusual
patterns with respect to a set of normal data. This is significant for
numerous domain applications, e.g. industrial inspection, medical imaging, and
security enforcement. There are two key research challenges associated with
existing anomaly detection approaches: (1) many of them perform well on
low-dimensional problems, but their performance on high-dimensional instances,
such as images, is limited; (2) many of them still rely on traditional
supervised approaches and manual engineering of features, while the topic has
not yet been fully explored using modern deep learning approaches, even when
well-labeled samples are limited. In this paper, we propose a One-for-all
Image Anomaly Detection system (OIAD) based on disentangled learning using
only clean samples. Our key insight is that the impact of a small perturbation
on the latent representation can be bounded for normal samples, while
anomalous images usually fall outside such bounded intervals; we call this
structure consistency. We implement this idea and evaluate its performance for
anomaly detection. Our experiments with three datasets show that OIAD can
detect over (90\%) of anomalies while maintaining a low false alarm rate. It
can also detect suspicious samples among samples labeled as clean, coinciding
with what humans would deem unusual.
Multi-agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning
Samaneh Hosseini Semnani , Hugh Liu , Michael Everett , Anton de Ruiter , Jonathan P. How Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
This paper introduces a hybrid algorithm of deep reinforcement learning (RL)
and force-based motion planning (FMP) to solve the distributed motion planning
problem in dense and dynamic environments. Individually, RL and FMP algorithms
each have their own limitations. FMP is not able to produce time-optimal
paths, and existing RL solutions are not able to produce collision-free paths
in dense environments.
Therefore, we first tried improving the performance of recent RL approaches by
introducing a new reward function that not only eliminates the requirement of
a prior supervised learning (SL) step but also decreases the chance of
collision in crowded environments. That improved things, but there were still
many failure cases. We therefore developed a hybrid approach that falls back
on the simpler FMP method in stuck, simple, and high-risk cases, and continues
using RL in normal cases, where FMP cannot produce optimal paths. We also
extend the GA3C-CADRL algorithm to 3D environments. Simulation results show
that the proposed algorithm outperforms both the deep RL and FMP algorithms,
producing up to 50% more successful scenarios than deep RL and up to 75% less
extra time to reach the goal than FMP.
Comments: the 24th European Conference on Artificial Intelligence (ECAI 2020)
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
In this paper, we investigate algorithms for anomaly detection. Previous
anomaly detection methods focus on modeling the distribution of non-anomalous
data provided during training. However, this does not necessarily ensure the
correct detection of anomalous data. We propose a new Regularized Cycle
Consistent Generative Adversarial Network (RCGAN) in which deep neural
networks are adversarially trained to better recognize anomalous samples. This
approach is based on leveraging a penalty distribution with a new definition
of the loss function and a novel use of discriminator networks. It is based on
a solid mathematical foundation, and proofs show that our approach has
stronger guarantees for detecting anomalous examples compared to the current
state-of-the-art. Experimental results on both real-world and synthetic data
show that our model leads to significant and consistent improvements on
previous anomaly detection benchmarks. Notably, RCGAN improves on the
state-of-the-art on the KDDCUP, Arrhythmia, Thyroid, Musk and CIFAR10
datasets.
Comments: 19 pages
Subjects:
Machine Learning (cs.LG)
; Optimization and Control (math.OC); Machine Learning (stat.ML)
One of the key challenges in designing machine learning systems is to
determine the right balance among several objectives, which are oftentimes
incommensurable and conflicting. For example, when designing deep neural
networks (DNNs), one often has to trade off between multiple objectives, such
as accuracy, energy consumption, and inference time. Typically, there is no
single configuration that performs equally well for all objectives.
Consequently, one is interested in identifying Pareto-optimal designs.
Although different multi-objective optimization algorithms have been developed
to identify Pareto-optimal configurations, state-of-the-art multi-objective
optimization methods do not consider the different evaluation costs of the
objectives under consideration. This is particularly important for optimizing
DNNs: the cost arising from assessing the accuracy of DNNs is orders of
magnitude higher than that of measuring the energy consumption of pre-trained
DNNs. We propose FlexiBO, a flexible Bayesian optimization method, to address
this issue. We formulate a new acquisition function based on the improvement
of the Pareto hyper-volume weighted by the measurement cost of each objective.
Our acquisition function selects the next sample and objective that provide
the maximum information gain per unit of cost. We evaluated FlexiBO on 7
state-of-the-art DNNs for object detection, natural language processing, and
speech recognition.
Our results indicate that, compared to other state-of-the-art methods across
the 7 architectures we tested, the Pareto front obtained using FlexiBO has, on
average, a 28.44% higher contribution to the true Pareto front and achieves
25.64% better diversity.
Comments: Appeared in ECML-PKDD 2019
Subjects:
Machine Learning (cs.LG)
; Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
In programmatic advertising, ad slots are usually sold using second-price (SP)
auctions in real-time. The highest-bidding advertiser wins but pays only the
second-highest bid (known as the winning price). In SP, for a single item, the
dominant strategy of each bidder is to bid the true value from the bidder’s
perspective. However, in a practical setting, with budget constraints, bidding
the true value is a sub-optimal strategy. Hence, to devise an optimal bidding
strategy, it is of utmost importance to learn the winning price distribution
accurately. Moreover, a demand-side platform (DSP), which bids on behalf of
advertisers, observes the winning price only if it wins the auction. For
losing auctions, a DSP can only treat its bidding price as a lower bound for
the unknown winning price. In the literature, censored regression is typically
used to model such partially observed data. A common assumption in censored
regression is that the winning price is drawn from a fixed-variance
(homoscedastic) uni-modal distribution (most often Gaussian). However, in
reality, these assumptions are often violated. We relax these assumptions and
propose a heteroscedastic fully parametric censored regression approach, as
well as a mixture density censored network. Our approach not only generalizes
censored regression but also provides the flexibility to model arbitrarily
distributed real-world data. Experimental evaluation on the publicly available
dataset for winning price estimation demonstrates the effectiveness of our
method. Furthermore, we evaluate our algorithm on one of the largest
demand-side platforms, where significant improvement has been achieved in
comparison with the baseline solutions.
Inference for Network Structure and Dynamics from Time Series Data via Graph Neural Network
Mengyuan Chen , Jiang Zhang , Zhang Zhang , Lun Du , Qiao Hu , Shuo Wang , Jiaqi Zhu Subjects : Machine Learning (cs.LG) ; Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Network structures in various backgrounds play important roles in social,
technological, and biological systems. However, the observable network
structures in real cases are often incomplete or unavailable due to
measurement errors or privacy protection issues. Therefore, inferring the
complete network structure is useful for understanding complex systems.
Existing studies have not fully solved the problem of inferring network
structure with partial or no information about connections or nodes. In this
paper, we tackle the problem by utilizing time series data generated by
network dynamics. We regard the network inference problem based on dynamical
time series data as a problem of minimizing errors in predicting future
states, and propose a novel data-driven deep learning model called Gumbel
Graph Network (GGN) to solve two kinds of network inference problems: network
reconstruction and network completion. For the network reconstruction problem,
the GGN framework includes two modules: the dynamics learner and the network
generator.
For the network completion problem, GGN adds a new module called the States
Learner to infer missing parts of the network. We carried out experiments on
discrete and continuous time series data. The experiments show that our method
can reconstruct up to 100% of the network structure on the network
reconstruction task, and the model can also infer the unknown parts of the
structure with up to 90% accuracy when some nodes are missing, with accuracy
decaying as the fraction of missing nodes increases. Our framework may have
wide application in areas where the network structure is hard to obtain and
time series data is rich.
Imtiaz Ahmed , Xia Ben Hu , Mithun P. Acharya , Yu Ding Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Dimensionality reduction is considered an important step for ensuring
competitive performance in unsupervised learning such as anomaly detection.
Non-negative matrix factorization (NMF) is a popular and widely used method to
accomplish this goal. But NMF, together with its recent enhanced versions,
such as graph regularized NMF or symmetric NMF, does not have the provision to
include neighborhood structure information and, as a result, may fail to
provide satisfactory performance in the presence of nonlinear manifold
structure. To address that shortcoming, we propose to consider and incorporate
the neighborhood structural similarity information within the NMF framework by
modeling the data through a minimum spanning tree. What motivates our choice
is the understanding that, in the presence of complicated data structure, a
minimum spanning tree can approximate the intrinsic distance between two data
points better than a simple Euclidean distance does, and consequently, it
constitutes a more reasonable basis for differentiating anomalies from the
normal class data. We label the resulting method the neighborhood structure
assisted NMF. By comparing the formulation and properties of the neighborhood
structure assisted NMF with other versions of NMF, including graph regularized
NMF and symmetric NMF, it is apparent that the inclusion of the neighborhood
structure information using a minimum spanning tree makes a key difference. We
further devise both offline and online algorithmic versions of the proposed
method. Empirical comparisons using twenty benchmark datasets as well as an
industrial dataset extracted from a hydropower plant demonstrate the
superiority of the neighborhood structure assisted NMF and support our claim
of merit.
Comments: IUI 2020
Subjects:
Machine Learning (cs.LG)
; Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
We explore trust in a relatively new area of data science: Automated Machine
Learning (AutoML). In AutoML, AI methods are used to generate and optimize
machine learning models by automatically engineering features, selecting
models, and optimizing hyperparameters. In this paper, we seek to understand
what kinds of information influence data scientists’ trust in the models
produced by AutoML. We operationalize trust as a willingness to deploy a model
produced using automated methods. We report results from three studies —
qualitative interviews, a controlled experiment, and a card-sorting task — to
understand the information needs of data scientists for establishing trust in
AutoML systems.
We find that including transparency features in an AutoML tool increased user
trust and understandability of the tool, and that, of all the proposed
features, model performance metrics and visualizations are the most important
information for data scientists when establishing their trust in an AutoML
tool.
Comments: arXiv admin note: substantial text overlap with arXiv:1902.03055
Subjects:
Machine Learning (cs.LG)
; Statistics Theory (math.ST); Machine Learning (stat.ML)
There is a large body of work on convergence rates in both passive and active
learning. Here we first outline some of the main results that have been
obtained, more specifically in a nonparametric setting under assumptions about
the smoothness of the regression function (or the boundary between classes)
and the margin noise. We discuss the relative merits of these underlying
assumptions by putting active learning in perspective with recent work on
passive learning. We design an active learning algorithm with a rate of
convergence better than in passive learning, using a particular smoothness
assumption customized for k-nearest neighbors. Unlike previous active learning
algorithms, ours uses a smoothness assumption that provides a dependence on
the marginal distribution of the instance space. Additionally, our algorithm
avoids the strong density assumption that supposes the existence of the
density function of the marginal distribution of the instance space, and is
therefore more generally applicable.
Victor de la Pena , Haolin Zou Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
Online learning to rank is a core problem in machine learning. In Lattimore
et al. (2018), a novel online learning algorithm was proposed based on
topological sorting. In that paper, the authors provided a set of
self-normalized inequalities (a) used in the algorithm as a criterion in
iterations and (b) used to provide an upper bound for cumulative regret, which
is a measure of algorithm performance. In this work, we utilize the method of
mixtures and asymptotic expansions of certain implicit functions to provide a
tighter, iterated-log-like bound for the inequalities, and as a consequence
improve both the algorithm itself and its performance estimation.
Iman Marivani , Evaggelia Tsiligianni , Bruno Cornelis , Nikos Deligiannis Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
The reconstruction of a high-resolution image given a low-resolution
observation is an ill-posed inverse problem in imaging. Deep learning methods
rely on training data to learn an end-to-end mapping from a low-resolution
input to a high-resolution output. Unlike existing deep multimodal models that
do not incorporate domain knowledge about the problem, we propose a multimodal
deep learning design that incorporates sparse priors and allows the effective
integration of information from another image modality into the network
architecture. Our solution relies on a novel deep unfolding operator,
performing steps similar to an iterative algorithm for convolutional sparse
coding with side information; therefore, the proposed neural network is
interpretable by design. The deep unfolding architecture is used as a core
component of a multimodal framework for guided image super-resolution. An
alternative multimodal design is investigated by employing residual learning
to improve the training efficiency.
The presented multimodal approach is applied to super-resolution of
near-infrared and multi-spectral images as well as depth upsampling using RGB
images as side information. Experimental results show that our model
outperforms state-of-the-art methods.
Wen Wang , Xiaojiang Peng , Yu Qiao , Jian Cheng Subjects : Human-Computer Interaction (cs.HC) ; Machine Learning (cs.LG)
Online action detection (OAD) is a practical yet challenging task, which has
attracted increasing attention in recent years. A typical OAD system mainly
consists of three modules: a frame-level feature extractor, usually based on
pre-trained deep Convolutional Neural Networks (CNNs); a temporal modeling
module; and an action classifier. Among them, the temporal modeling module is
crucial, as it aggregates discriminative information from historical and
current features. Though many temporal modeling methods have been developed
for OAD and other topics, their effects have not been fairly investigated for
OAD. This paper aims to provide a comprehensive study of temporal modeling for
OAD, covering four meta types of temporal modeling methods, i.e. temporal
pooling, temporal convolution, recurrent neural networks, and temporal
attention, and to uncover some good practices for producing a state-of-the-art
OAD system. Many of these methods are explored for OAD for the first time, and
extensively evaluated with various hyper-parameters. Furthermore, based on our
comprehensive study, we present several hybrid temporal modeling methods,
which outperform the recent state-of-the-art methods by sizable margins on
THUMOS-14 and TVSeries.
Artem Ryzhikov , Denis Derkach , Mikhail Hushchyn Subjects : Data Analysis, Statistics and Probability (physics.data-an) ; Machine Learning (cs.LG)
Accurate particle identification (PID) is one of the most important aspects of
the LHCb experiment. Modern machine learning techniques such as neural
networks (NNs) are efficiently applied to this problem and are integrated into
the LHCb software. In this research, we discuss novel applications of neural
network speed-up techniques to achieve faster PID in LHC upgrade conditions.
We show that the best results are obtained using variational dropout
sparsification, which provides a prediction (feedforward pass) speed increase
of up to a factor of sixteen even when compared to a model with shallow
networks.
Andreas Buchberger , Christian Häger , Henry D. Pfister , Laurent Schmalen , Alexandre Graell i Amat Subjects : Information Theory (cs.IT) ; Machine Learning (cs.LG)
We consider near maximum-likelihood (ML) decoding of short linear block codes
based on neural belief propagation (BP) decoding recently introduced by
Nachmani et al. While this method significantly outperforms conventional BP
decoding, the underlying parity-check matrix may still limit the overall
performance. In this paper, we introduce a method to tailor an overcomplete
parity-check matrix to (neural) BP decoding using machine learning. We
consider the weights in the Tanner graph as an indication of the importance of
the connected check nodes (CNs) to decoding and use them to prune unimportant
CNs. As the pruning is not tied over iterations, the final decoder uses a
different parity-check matrix in each iteration. For Reed-Muller and short
low-density parity-check codes, we achieve performance within 0.27 dB and
1.5 dB of the ML performance while reducing the complexity of the decoder.
Junsuk Choe , Seong Joon Oh , Seungho Lee , Sanghyuk Chun , Zeynep Akata , Hyunjung Shim Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
Weakly-supervised object localization (WSOL) has gained popularity in recent
years for its promise to train localization models with only image-level
labels. Since the seminal WSOL work on class activation mapping (CAM), the
field has focused on how to expand the attention regions to cover objects more
broadly and localize them better. However, these strategies rely on full
localization supervision to validate hyperparameters and for model selection,
which is in principle prohibited under the WSOL setup. In this paper, we argue
that the WSOL task is ill-posed with only image-level labels, and propose a
new evaluation protocol where full supervision is limited to only a small
held-out set not overlapping with the test set. We observe that, under our
protocol, the five most recent WSOL methods have not made a major improvement
over the CAM baseline. Moreover, we report that existing WSOL methods have not
reached the few-shot learning baseline, where the full supervision at
validation time is used for model training instead. Based on our findings, we
discuss some future directions for WSOL.
Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities
Kaixuan Chen , Dalin Zhang , Lina Yao , Bin Guo , Zhiwen Yu , Yunhao Liu Subjects : Human-Computer Interaction (cs.HC) ; Machine Learning (cs.LG)
The vast proliferation of sensor devices and the Internet of Things enables
applications of sensor-based activity recognition. However, there exist
substantial challenges that could influence the performance of the recognition
system in practical scenarios. Recently, as deep learning has demonstrated its
effectiveness in many areas, plenty of deep methods have been investigated to
address the challenges in activity recognition. In this study, we present a
survey of the state-of-the-art deep learning methods for sensor-based human
activity recognition. We first introduce the multi-modality of the sensory
data and provide information on public datasets that can be used for
evaluation in different challenge tasks. We then propose a new taxonomy to
structure the deep methods by challenges. Challenges and challenge-related
deep methods are summarized and analyzed to form an overview of the current
research progress. At the end of this work, we discuss the open issues and
provide some insights for future directions.
Comments: 8 pages, 2 figures
Subjects:
Machine Learning (stat.ML)
; Machine Learning (cs.LG); Statistics Theory (math.ST)
The problem of maximizing (or minimizing) the agreement between clusterings,
subject to given marginals, can be formally posed under a common framework for
several agreement measures. Until now, it was possible to find its solution
only through numerical algorithms. Here, an explicit solution is shown for the
case where the two clusterings have two clusters each.
Comments: 17 pages, 7 figures
Subjects:
Machine Learning (stat.ML)
; Machine Learning (cs.LG)
Transport demand is highly dependent on supply, especially for shared
transport services where availability is often limited. As observed demand
cannot be higher than available supply, historical transport data typically
represents a biased, or censored, version of the true underlying demand
pattern.
Without explicitly accounting for this inherent distinction, predictive models of demand would necessarily represent a biased version of the true demand, and would thus predict the needs of service users less effectively. To counter this problem, we propose a general method for censorship-aware demand modeling, for which we devise a censored likelihood function. We apply this method to the task of shared mobility demand prediction by incorporating the censored likelihood within a Gaussian Process model, which can flexibly approximate arbitrary functional forms. Experiments on artificial and real-world datasets show how taking into account the limiting effect of supply on demand is essential for obtaining an unbiased predictive model of user demand behavior.
Colin Summers , Kendall Lowrey , Aravind Rajeswaran , Siddhartha Srinivasa , Emanuel Todorov Subjects : Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
We introduce Lyceum, a high-performance computational ecosystem for robot learning. Lyceum is built on top of the Julia programming language and the MuJoCo physics simulator, combining the ease of use of a high-level programming language with the performance of native C. In addition, Lyceum has a straightforward API to support parallel computation across multiple cores and machines. Overall, depending on the complexity of the environment, Lyceum is 5-30x faster compared to other popular abstractions like OpenAI’s Gym and DeepMind’s dm-control. This substantially reduces training time for various reinforcement learning algorithms and is also fast enough to support real-time model predictive control through MuJoCo. The code, tutorials, and demonstration videos can be found at: this http URL .
Comments: Accepted to ISBI 2020 : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Ultrasound (US) is one of the most commonly used imaging modalities in both diagnosis and surgical interventions due to its low cost, safety, and non-invasive characteristic. US image segmentation is currently a unique challenge because of the presence of speckle noise. As manual segmentation requires considerable effort and time, the development of automatic segmentation algorithms has attracted researchers' attention. Although recent methodologies based on convolutional neural networks have shown promising performance, their success relies on the availability of a large number of training data, which is prohibitively difficult to obtain for many applications. Therefore, in this study we propose the use of simulated US images and natural images as auxiliary datasets in order to pre-train our segmentation network, which is then fine-tuned with limited in vivo data. We show that with as few as 19 in vivo images, fine-tuning the pre-trained network improves the Dice score by 21% compared to training from scratch. We also demonstrate that if the same number of natural and simulated US images is available, pre-training on simulation data is preferable.
Comments: Submitted to DAS2020 : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
Designing fonts requires a great deal of time and effort. It requires professional skills, such as sketching, vectorizing, and image editing. Additionally, each letter has to be designed individually. In this paper, we introduce a method to create fonts automatically.
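The pre-train-then-fine-tune recipe from the ultrasound segmentation study above follows a standard pattern. A minimal PyTorch sketch, with a toy stand-in for the segmentation network and hypothetical data loaders (simulated_loader, in_vivo_loader):

    import torch
    import torch.nn as nn

    net = nn.Sequential(                  # toy stand-in for a segmentation CNN
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1),
    )
    loss_fn = nn.BCEWithLogitsLoss()

    def run_epoch(loader, lr):
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for img, mask in loader:
            opt.zero_grad()
            loss_fn(net(img), mask).backward()
            opt.step()

    # run_epoch(simulated_loader, lr=1e-3)  # pre-train on simulated US images
    # run_epoch(in_vivo_loader, lr=1e-4)    # fine-tune on ~19 in vivo images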
In our proposed method, the difference in font styles between two different fonts is found and transferred to another font using neural style transfer. Neural style transfer is a method of stylizing the contents of an image with the styles of another image. We propose novel neural style difference and content difference losses for the neural style transfer. With these losses, new fonts can be generated by adding or removing font styles from a font. We provide experimental results with various combinations of input fonts and discuss limitations and future development of the proposed method.
Comments: 19 pages, 5 figures, 2 tables : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph); Quantitative Methods (q-bio.QM)
Histological staining is a vital step used to diagnose various diseases and has been used for more than a century to provide contrast to tissue sections, rendering the tissue constituents visible for microscopic analysis by medical experts. However, this process is time-consuming, labor-intensive, expensive, and destructive to the specimen. Recently, the ability to virtually stain unlabeled tissue sections, entirely avoiding the histochemical staining step, has been demonstrated using tissue-stain-specific deep neural networks. Here, we present a new deep learning-based framework that generates virtually stained images using label-free tissue, where different stains are merged following a micro-structure map defined by the user. This approach uses a single deep neural network that receives two different sources of information at its input: (1) autofluorescence images of the label-free tissue sample, and (2) a digital staining matrix which represents the desired microscopic map of the different stains to be virtually generated in the same tissue section. This digital staining matrix is also used to virtually blend existing stains, digitally synthesizing new histological stains. We trained and blindly tested this virtual-staining network using unlabeled kidney tissue sections to generate micro-structured combinations of Hematoxylin and Eosin (H&E), Jones silver stain, and Masson’s Trichrome stain.
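The two-input design of the virtual-staining network described above can be sketched as a network that consumes the autofluorescence channels concatenated with the user-defined digital staining matrix. A minimal PyTorch rendering (the generic layer choices are assumptions; the paper's architecture is more elaborate):

    import torch
    import torch.nn as nn

    class VirtualStainer(nn.Module):
        def __init__(self, af_ch=1, stain_ch=3):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(af_ch + stain_ch, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 3, 3, padding=1),  # RGB stained-image output
            )

        def forward(self, autofluor, stain_matrix):
            # stain_matrix encodes, pixel by pixel, which stain(s) to render
            return self.body(torch.cat([autofluor, stain_matrix], dim=1))

    out = VirtualStainer()(torch.rand(1, 1, 64, 64), torch.rand(1, 3, 64, 64))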
Using a single network, this approach multiplexes the virtual staining of label-free tissue with multiple types of stains and paves the way for synthesizing new digital histological stains that can be created on the same tissue cross-section, which is currently not feasible with standard histochemical staining methods.
Tsun-Yi Yang , Duy-Kien Nguyen , Huub Heijnen , Vassileios Balntas Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
In this paper, we explore how three related tasks, namely keypoint detection, description, and image retrieval, can be jointly tackled using a single unified framework that is trained without the need for training data with point-to-point correspondences. By leveraging diverse information from sequential layers of a standard ResNet-based architecture, we are able to extract keypoints and descriptors that encode local information using generic techniques such as local activation norms, channel grouping and dropping, and self-distillation. Subsequently, global information for image retrieval is encoded in an end-to-end pipeline based on pooling of the aforementioned local responses. In contrast to previous methods in local matching, our method does not depend on pointwise/pixelwise correspondences and requires no such supervision at all, i.e., no depth maps from an SfM model nor manually created synthetic affine transformations. We illustrate that this simple and direct paradigm is able to achieve very competitive results against the state-of-the-art methods in various challenging benchmark conditions such as viewpoint changes, scale changes, and day-night shifting localization.
Yuxiao Chen , Sumanth Dathathri , Tung Phan-Minh , Richard M. Murray Subjects : Systems and Control (eess.SY) ; Machine Learning (cs.LG); Robotics (cs.RO)
There is a growing interest in building autonomous systems that interact with complex environments. The difficulty associated with obtaining an accurate model for such environments poses a challenge to the task of assessing and guaranteeing the system's performance. We present a data-driven solution that allows a system to be evaluated for specification conformance without an accurate model of the environment. The approach begins by learning, from data and a specification of the system's desired behavior, a conservative reactive bound on the environment's actions that captures its possible behaviors with high probability. This bound is then used to assist verification; if verification fails under this bound, the algorithm returns counter-examples that show how the failure occurs and then uses these to refine the bound. We demonstrate the applicability of the approach through two case studies: (i) verifying controllers for a toy multi-robot system, and (ii) verifying an instance of human-robot interaction during a lane-change maneuver given real-world human driving data.
Comments: 7 pages, 8 figures, 2 tables, accepted by The Web Conference 2020 : Computation and Language (cs.CL) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
There is a perennial need in the online advertising industry to refresh ad creatives, i.e., the images and text used for enticing online users towards a brand. Such refreshes are required to reduce the likelihood of ad fatigue among online users, and to incorporate insights from other successful campaigns in related product categories.
Given a brand, coming up with themes for a new ad is a painstaking and time-consuming process for creative strategists. Strategists typically draw inspiration from the images and text used in past ad campaigns, as well as from world knowledge of the brands. To automatically infer ad themes via such multimodal sources of information from past ad campaigns, we propose a theme (keyphrase) recommender system for ad creative strategists. The theme recommender is based on aggregating results from a visual question answering (VQA) task, which ingests the following: (i) ad images, (ii) text associated with the ads as well as Wikipedia pages on the brands in the ads, and (iii) questions around the ad. We leverage transformer-based cross-modality encoders to train visual-linguistic representations for our VQA task. We study two formulations for the VQA task, along the lines of classification and ranking; via experiments on a public dataset, we show that cross-modal representations lead to significantly better classification accuracy and ranking precision-recall metrics. Cross-modal representations show better performance compared to separate image and text representations. In addition, the use of multimodal information shows a significant lift over using only textual or visual information.
Wei Wang , Xiang-Gen Xia , Chuanjiang He , Zemin Ren , Jian Lu , Tianfu Wang , Baiying Lei Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
A CT image can be well reconstructed when the sampling rate of the sinogram satisfies the Nyquist criteria and the sampled signal is noise-free. However, in practice, the sinogram is usually contaminated by noise, which degrades the quality of the reconstructed CT image. In this paper, we design a deep network for sinogram and CT image reconstruction. The network consists of two cascaded blocks that are linked by a filtered backprojection (FBP) layer, where the former block is responsible for denoising and completing the sinograms while the latter removes the noise and artifacts of the CT images. Experimental results show that the CT images reconstructed by our method have the highest PSNR and SSIM on average compared to state-of-the-art methods.
Comments: 12 pages, 7 figures, 2 tables, accepted to SafeAI workshop at AAAI : Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to? We formalise these incentives and demonstrate unique graphical criteria for detecting them in any single-decision causal influence diagram. To this end, we introduce structural causal influence models, a hybrid of the influence diagram and structural causal model frameworks. Finally, we illustrate how these criteria predict agent incentives in both fairness and AI safety applications.
Lorenz Braun , Sotirios Nikas , Chen Song , Vincent Heuveline , Holger Fröning Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Machine Learning (cs.LG); Performance (cs.PF)
Characterizing compute kernel execution behavior on GPUs for efficient task scheduling is a non-trivial task. We address this with a simple model enabling portable and fast predictions among different GPUs using only extracted hardware-independent features. This model is built using random forests on 189 individual compute kernels from benchmarks such as Parboil, Rodinia, Polybench-GPU and SHOC.
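The GPU performance model just described reduces, at its core, to fitting random forests on per-kernel features. A minimal scikit-learn sketch with placeholder features and labels (the real study extracts hardware-independent kernel features and trains separate models for time and power):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.random((189, 10))   # per-kernel features, e.g. instruction/memory mix
    t = rng.random(189)         # measured execution times (placeholders)

    time_model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, t)
    pred = time_model.predict(X[:5])   # fast prediction for unseen kernels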
Evaluation of the model performance using cross-validation yields a median Mean Average Percentage Error (MAPE) of [13.45%, 44.56%] for time prediction and [1.81%, 2.91%] for power prediction on five different GPUs, while the latency of a single prediction varies between 0.1 and 0.2 seconds.
Zhen-Liang Ni , Gui-Bin Bian , Guan-An Wang , Xiao-Hu Zhou , Zeng-Guang Hou , Xiao-Liang Xie , Zhen Li , Yu-Han Wang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG); Robotics (cs.RO); Image and Video Processing (eess.IV)
Surgical instrument segmentation is extremely important for computer-assisted surgery. Unlike common object segmentation, it is more challenging due to the large illumination and scale variation caused by the special surgical scenes. In this paper, we propose a novel bilinear attention network with an adaptive receptive field to solve these two challenges. For the illumination variation, the bilinear attention module can capture second-order statistics to encode global contexts and semantic dependencies between local pixels. With them, semantic features in challenging areas can be inferred from their neighbors, and the distinction between various semantics can be boosted. For the scale variation, our adaptive receptive field module aggregates multi-scale features and automatically fuses them with different weights. Specifically, it encodes the semantic relationship between channels to emphasize feature maps with appropriate scales, changing the receptive field of subsequent convolutions. The proposed network achieves the best performance, 97.47% mean IOU, on Cata7 and takes first place on EndoVis 2017, overtaking the second-ranking method by 10.10% IOU.
Comments: Submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing 2020 (ICASSP 2020). Link to GitHub Repository: this https URL : Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
The state-of-the-art approach to speaker verification involves the extraction of discriminative embeddings like x-vectors, followed by a generative model back-end using probabilistic linear discriminant analysis (PLDA). In this paper, we propose a pairwise neural discriminative model for the task of speaker verification, which operates on a pair of speaker embeddings such as x-vectors/i-vectors and outputs a score that can be considered a scaled log-likelihood ratio. We construct a differentiable cost function that approximates the speaker verification loss, namely the minimum detection cost. The pre-processing steps of linear discriminant analysis (LDA), unit-length normalization, and within-class covariance normalization are all modeled as layers of a neural model, and the speaker verification cost function can be back-propagated through these layers during training. We also explore regularization techniques to prevent overfitting, which is a major concern when using discriminative back-end models for verification tasks. The experiments are performed on the NIST SRE 2018 development and evaluation datasets. We observe average relative improvements of 8% in the CMN2 condition and 30% in the VAST condition over the PLDA baseline system.
Comments: Submitted to IEEE Transactions on Neural Networks and Learning Systems : Machine Learning (stat.ML) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In this paper we develop a new model for deep image clustering, using convolutional neural networks and tensor kernels.
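The pairwise discriminative back-end described in the speaker-verification paper above can be pictured as differentiable pre-processing layers followed by a learned pairwise score. A minimal PyTorch sketch (the LDA-like projection and bilinear score below are generic stand-ins, not the authors' exact parameterization):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PairwiseBackend(nn.Module):
        def __init__(self, dim=512, proj=150):
            super().__init__()
            self.lda = nn.Linear(dim, proj, bias=False)  # trainable LDA-like layer
            self.W = nn.Parameter(torch.eye(proj))       # bilinear scoring matrix

        def forward(self, x1, x2):
            e1 = F.normalize(self.lda(x1), dim=-1)       # unit-length normalization
            e2 = F.normalize(self.lda(x2), dim=-1)
            return torch.sum((e1 @ self.W) * e2, dim=-1) # scaled LLR-like score

    score = PairwiseBackend()(torch.randn(4, 512), torch.randn(4, 512))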
The proposed Deep Tensor Kernel Clustering (DTKC) consists of a convolutional neural network (CNN) that is trained to reflect a common cluster structure at the output of its intermediate layers. Encouraging a consistent cluster structure throughout the network has the potential to guide it towards meaningful clusters, even though these clusters might appear nonlinear in the input space. The cluster structure is enforced through the idea of unsupervised companion objectives, where separate loss functions are attached to layers in the network. These unsupervised companion objectives are constructed based on a proposed generalization of the Cauchy-Schwarz (CS) divergence from vectors to tensors of arbitrary rank. Generalizing the CS divergence to tensor-valued data is a crucial step, due to the tensorial nature of the intermediate representations in the CNN. Several experiments are conducted to thoroughly assess the performance of the proposed DTKC model. The results indicate that the model outperforms, or performs comparably to, a wide range of baseline algorithms. We also empirically demonstrate that our model does not suffer from objective function mismatch, which can be a problematic artifact in autoencoder-based clustering models.
Renoh Johnson Chalakkal , Faizal Hafiz , Waleed Abdulla , Akshya Swain Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The present study proposes a new approach to the automated screening of Clinically Significant Macular Edema (CSME) and addresses two major challenges associated with such screenings, i.e., exudate segmentation and imbalanced datasets. The proposed approach replaces conventional exudate-segmentation-based feature extraction by combining a pre-trained deep neural network with meta-heuristic feature selection. A feature-space over-sampling technique is used to overcome the effects of skewed datasets, and the screening is accomplished by a k-NN based classifier. The role of each data-processing step (e.g., class balancing, feature selection) and the effect of limiting the region of interest to the fovea on the classification performance are critically analyzed. Finally, the selection and implications of the operating point on the Receiver Operating Characteristic curve are discussed. The results of this study convincingly demonstrate that by following these fundamental practices of machine learning, a basic k-NN based classifier can effectively accomplish CSME screening.
Comments: 2019 International Conference on Image Analysis and Recognition : Image and Video Processing (eess.IV) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose a novel direction for improving the denoising quality of filtering-based denoising algorithms in real time by predicting the best filter parameter value using a Convolutional Neural Network (CNN). We take the use case of BM3D, the state-of-the-art filtering-based denoising algorithm, to demonstrate and validate our approach. We propose and train a simple, shallow CNN to predict, in real time, the optimum filter parameter value given the input noisy image. Each training example consists of a noisy input image (training data) and the filter parameter value that produces the best output (training label).
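The parameter-predicting CNN described above amounts to a small image-to-scalar regressor. A minimal PyTorch sketch (toy layer sizes assumed; bm3d below stands for a hypothetical call to an external denoiser):

    import torch
    import torch.nn as nn

    param_net = nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(8, 1),                 # predicted filter parameter
    )
    noisy = torch.rand(1, 1, 64, 64)
    sigma_hat = param_net(noisy)
    # denoised = bm3d(noisy, sigma_hat.item())  # hand the prediction to the filter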
Both qualitative and quantitative results, using the widely used PSNR and SSIM metrics on the popular BSD68 dataset, show that the CNN-guided BM3D outperforms the original, unguided BM3D across different noise levels. Thus, our proposed method is a CNN-based improvement on the original BM3D, which uses a fixed, default parameter value for all images.
Comments: 2018 IEEE SENSORS : Image and Video Processing (eess.IV) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Interferometric Synthetic Aperture Radar (InSAR) imagery, based on microwaves reflected off ground targets, is becoming increasingly important in remote sensing for ground movement estimation. However, the reflections are contaminated by noise, which distorts the signal's wrapped phase. Demarcation of image regions based on their degree of contamination ("coherence") is an important component of the InSAR processing pipeline. We introduce Convolutional Neural Networks (CNNs) to this problem domain and show their effectiveness in improving coherence-based demarcation and reducing misclassifications in completely incoherent regions through intelligent preprocessing of the training data. Quantitative and qualitative comparisons prove the superiority of the proposed method over three established methods.
Comments: 2018 IEEE SENSORS : Image and Video Processing (eess.IV) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Interferometric Synthetic Aperture Radar (InSAR) imagery for estimating ground movement, based on microwaves reflected off ground targets, is gaining increasing importance in remote sensing. However, noise corrupts the microwave reflections received at the satellite and contaminates the signal's wrapped phase. We introduce Convolutional Neural Networks (CNNs) to this problem domain and show the effectiveness of autoencoder CNN architectures in learning InSAR image denoising filters in the absence of clean ground-truth images, and in reducing artefacts in estimated coherence through intelligent preprocessing of the training data. We compare our results with four established methods to illustrate the superiority of the proposed method.
Comments: Accepted by AISTATS2020 : Computation and Language (cs.CL) ; Machine Learning (cs.LG)
Reinforcement learning (RL) has been widely studied for improving sequence-generation models. However, the conventional rewards used for RL training typically cannot capture sufficient semantic information and therefore induce model bias. Further, sparse and delayed rewards make RL exploration inefficient. To alleviate these issues, we propose the concept of nested-Wasserstein distance for distributional semantic matching. To exploit it further, a novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-reward sequences for enhanced exploration and better semantic matching. Our solution can be understood as approximately executing proximal policy optimization with Wasserstein trust regions. Experiments on a variety of unconditional and conditional sequence-generation tasks demonstrate that the proposed approach consistently leads to improved performance.
Shun-ichi Amari Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
It is known that any target function is realized in a sufficiently small neighborhood of any randomly connected deep network, provided the width (the number of neurons in a layer) is sufficiently large. There are sophisticated theories and discussions concerning this striking fact, but rigorous theories are very complicated.
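The high-dimensional fact this note builds on is easy to verify numerically: each coordinate of a uniform point on the unit sphere in R^n is close to Gaussian with variance 1/n, so low-dimensional projections of the sphere look like small-covariance Gaussians. A quick numpy check:

    import numpy as np

    n = 1_000
    x = np.random.randn(10_000, n)
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # uniform points on S^{n-1}
    coord = x[:, 0] * np.sqrt(n)                   # rescaled 1-D projection
    print(coord.mean(), coord.var())               # ~0 and ~1: standard Gaussian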
We give an elementary geometrical proof using a simple model, for the purpose of elucidating the structure of this result. We show that high-dimensional geometry plays a magical role: when we project a high-dimensional sphere of radius 1 to a low-dimensional subspace, the uniform distribution over the sphere reduces to a Gaussian distribution of negligibly small covariances.
Nan Wu , Adrien Vincent , Dmitri Strukov , Yuan Xie Subjects : Emerging Technologies (cs.ET) ; Machine Learning (cs.LG)
Recently, significant progress has been made in solving sophisticated problems across various domains by using reinforcement learning (RL), which allows machines or agents to learn from interactions with environments rather than from explicit supervision. As the end of Moore's law seems to be imminent, emerging technologies that enable high-performance neuromorphic hardware systems are attracting increasing attention. Namely, neuromorphic architectures that leverage memristors, programmable and nonvolatile two-terminal devices, as synaptic weights in hardware neural networks are candidates of choice for realizing such highly energy-efficient and complex nervous systems. However, one of the challenges for memristive hardware with integrated learning capabilities is the prohibitively large number of write cycles that might be required during the learning process, and this situation is even exacerbated in RL settings. In this work we propose a memristive neuromorphic hardware implementation of the actor-critic algorithm in RL. By introducing a two-fold training procedure (i.e., ex-situ pre-training and in-situ re-training) and several training techniques, the number of weight updates can be significantly reduced, making the approach suitable for efficient in-situ learning implementations. As a case study, we consider the task of balancing an inverted pendulum, a classical problem in both RL and control theory. We believe that this study shows the promise of using memristor-based hardware neural networks for handling complex tasks through in-situ reinforcement learning.
Ramprasaath R. Selvaraju , Purva Tendulkar , Devi Parikh , Eric Horvitz , Marco Ribeiro , Besmira Nushi , Ece Kamar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Existing VQA datasets contain questions with varying levels of complexity. While the majority of questions in these datasets require perception for recognizing the existence, properties, and spatial relationships of entities, a significant portion of questions pose challenges that correspond to reasoning tasks, i.e., tasks that can only be answered through a synthesis of perception and knowledge about the world, logic and/or reasoning. This distinction allows us to notice when existing VQA models have consistency issues: they answer the reasoning question correctly but fail on associated low-level perception questions. For example, models answer the complex reasoning question "Is the banana ripe enough to eat?" correctly, but fail on the associated perception question "Are the bananas mostly green or yellow?", indicating that the model likely answered the reasoning question correctly but for the wrong reason.
We quantify the extent to which this phenomenon occurs by creating a new Reasoning split of the VQA dataset and collecting Sub-VQA, a new dataset consisting of 200K new perception questions that serve as sub-questions corresponding to the set of perceptual tasks needed to effectively answer the complex reasoning questions in the Reasoning split. Additionally, we propose an approach called Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the model to attend to the same parts of the image when answering the reasoning question and the perception sub-questions. We show that SQuINT improves model consistency by 7.8% and marginally improves performance on the Reasoning questions in VQA, while also producing qualitatively better attention maps.
Optimal Rate of Convergence for Deep Neural Network Classifiers under the Teacher-Student Setting
Tianyang Hu , Zuofeng Shang , Guang Cheng Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
Classifiers built with neural networks handle large-scale, high-dimensional data, such as facial images from computer vision, extremely well, while traditional statistical methods often fail miserably. In this paper, we attempt to understand this empirical success in high-dimensional classification by deriving the convergence rates of the excess risk. In particular, a teacher-student framework is proposed that assumes the Bayes classifier to be expressed as a ReLU neural network. In this setup, we obtain a dimension-independent and un-improvable rate of convergence, i.e., (O(n^{-2/3})), for classifiers trained with either the 0-1 loss or the hinge loss. This rate can be further improved to (O(n^{-1})) when the data are separable. Here, (n) denotes the sample size.
Meysam Asgari-Chenaghlu , M.Reza Feizi-Derakhshi , Leili Farzinvash , Cina Motamed Subjects : Computation and Language (cs.CL) ; Machine Learning (cs.LG); Multimedia (cs.MM); Social and Information Networks (cs.SI)
Named Entity Recognition (NER) from social media posts is a challenging task. User-generated content, which forms the nature of social media, is noisy and contains grammatical and linguistic errors. This noisy content makes tasks such as named entity recognition much harder. However, some applications, like automatic journalism or information retrieval from social media, require more information about the entities mentioned in groups of social media posts. Conventional methods applied to structured and well-typed documents provide acceptable results, but when applied to new user-generated media, these methods are not satisfactory. One valuable piece of information about an entity is the image related to the text. Combining this multimodal data reduces ambiguity and provides wider information about the entities mentioned. In order to address this issue, we propose a novel deep learning approach utilizing multimodal deep learning. Our solution is able to provide more accurate results on the named entity recognition task. Experimental results, namely the precision, recall and F1 score metrics, show the superiority of our work compared to other state-of-the-art NER solutions.
Comments: MPhil Thesis (Scientific Computing, Physics, Machine Learning) : Machine Learning (stat.ML) ; Machine Learning (cs.LG); Applications (stat.AP)
The aim of this work is to propose a meta-algorithm for automatic classification in the presence of discrete binary classes. Classifier learning in the presence of overlapping class distributions is a challenging problem in machine learning.
Overlapping classes are characterized by ambiguous areas in the feature space with a high density of points belonging to both classes. This often occurs in real-world datasets; one such example is numeric data denoting properties of particle decays derived from high-energy accelerators like the Large Hadron Collider (LHC). A significant body of research targeting the class overlap problem uses ensemble classifiers to boost the performance of algorithms, either by using them iteratively in multiple stages or by using multiple copies of the same model on different subsets of the input training data. The former is called boosting and the latter is called bagging. The algorithm proposed in this thesis targets a challenging classification problem in high-energy physics: improving the statistical significance of the Higgs discovery. The underlying dataset used to train the algorithm is experimental data built from the official ATLAS full-detector simulation, with Higgs events (signal) mixed with different background events (background) that closely mimic the statistical properties of the signal, generating class overlap. The proposed algorithm is a variant of the classical boosted decision tree, which is known to be one of the most successful analysis techniques in experimental physics. The algorithm utilizes a unified framework that combines two meta-learning techniques, bagging and boosting. The results show that this combination only works in the presence of a randomization trick in the base learners.
Comments: 35 pages, 5 figures, 5 tables : Machine Learning (stat.ML) ; Machine Learning (cs.LG); Methodology (stat.ME)
Global optimization of expensive functions has important applications in physical and computer experiments. It is a challenging problem to develop an efficient optimization scheme, because each function evaluation can be costly and the derivative information of the function is often not available. We propose a novel global optimization framework using an adaptive Radial Basis Function (RBF) based surrogate model via uncertainty quantification. The framework consists of two iterative steps. It first employs an RBF-based Bayesian surrogate model to approximate the true function, where the parameters of the RBFs can be adaptively estimated and updated each time a new point is explored. Then it utilizes a model-guided selection criterion to identify a new point from a candidate set for function evaluation. The selection criterion adopted here is a sample version of the expected improvement (EI) criterion; a sketch of the standard EI rule is given below. We conduct simulation studies with standard test functions, which show that the proposed method has some advantages, especially when the true surface is not very smooth. In addition, we also propose modified approaches to improve the search performance for identifying global optimal points and to deal with higher-dimensional scenarios.
Yi Wang , Yang Yang , Weiguo Zhu , Yi Wu , Xu Yan , Yongfeng Liu , Yu Wang , Liang Xie , Ziyao Gao , Wenjing Zhu , Xiang Chen , Wei Yan , Mingjie Tang , Yuan Tang Subjects : Databases (cs.DB) ; Machine Learning (cs.LG)
Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs focusing on the purpose (what) and ignoring the procedure (how).
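The EI rule referenced in the surrogate-optimization abstract above is the textbook expected-improvement formula for minimization, shown here in numpy/scipy (the paper itself uses a sample version of this criterion):

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, f_best):
        """EI at candidate points with surrogate mean mu and std sigma."""
        z = (f_best - mu) / np.maximum(sigma, 1e-12)
        return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    mu, sigma = np.array([0.2, 0.5]), np.array([0.1, 0.3])
    next_point = np.argmax(expected_improvement(mu, sigma, f_best=0.3))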
Previous database systems extended their SQL dialects to support ML. SQLFlow ( this https URL ) takes another strategy: it works as a bridge over various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines such as TensorFlow, XGBoost, and scikit-learn. We extended the SQL syntax carefully to make the extension work with various SQL dialects. We implement the extension by inventing a collaborative parsing algorithm. SQLFlow is efficient and expressive for a wide variety of ML techniques: supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.
Comments: 16 pages : Optimization and Control (math.OC) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Although theoretically appealing, Stochastic Natural Gradient Descent (SNGD) is computationally expensive; it has been shown to be highly sensitive to the learning rate, and it is not guaranteed to be convergent. Convergent Stochastic Natural Gradient Descent (CSNGD) aims at solving the last two problems. However, the computational expense of CSNGD is still unacceptable when the number of parameters is large. In this paper we introduce Dual Stochastic Natural Gradient Descent (DSNGD), in which we take advantage of dually flat manifolds to obtain a robust alternative to SNGD that is also computationally feasible.
Comments: 159 pages, 46 figures : Materials Science (cond-mat.mtrl-sci) ; Machine Learning (cs.LG)
By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal-organic frameworks (MOFs). At present, we have libraries of over ten thousand synthesized materials and millions of in-silico predicted materials. The fact that we have so many materials opens many exciting avenues to tailor-make a material that is optimal for a given application. However, from an experimental and computational point of view we simply have too many materials to screen using brute-force techniques. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We emphasize the importance of data collection, methods to augment small data sets, and how to select appropriate training sets. An important part of this review is the different approaches that are used to represent these materials in feature space. The review also includes a general overview of the different ML techniques, but as most applications in porous materials use supervised ML, our review is focused on the different approaches for supervised ML. In particular, we review the different methods to optimize the ML process and how to quantify the performance of the different methods. In the second part, we review how the different approaches of ML have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. The range of topics illustrates the large variety of topics that can be studied with big-data science.
Given the increasing interest of the scientific community in ML, we expect this list to rapidly expand in the coming years.
Frank E. Curtis , Katya Scheinberg Subjects : Optimization and Control (math.OC) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Optimization lies at the heart of machine learning and signal processing. Contemporary approaches based on the stochastic gradient method are non-adaptive in the sense that their implementation employs prescribed parameter values that need to be tuned for each application. This article summarizes recent research and motivates future work on adaptive stochastic optimization methods, which have the potential to offer significant computational savings when training large-scale systems.
Comments: to be published in ICSE-SEET 2020 : Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Software engineers have significant expertise to offer when building intelligent systems, drawing on decades of experience and methods for building systems that are scalable, responsive and robust, even when built on unreliable components. Systems with artificial-intelligence or machine-learning (ML) components raise new challenges and require careful engineering. We designed a new course to teach software-engineering skills to students with a background in ML. We specifically go beyond traditional ML courses that teach modeling techniques under artificial conditions and focus, in lecture and assignments, on realism with large and changing datasets, robust and evolvable infrastructure, and purposeful requirements engineering that considers ethics and fairness as well. We describe the course and our infrastructure and share experience and all material from teaching the course for the first time.
Comments: CSCW'2020 : Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE); Machine Learning (stat.ML)
Today, the prominence of data science within organizations has given rise to teams of data science workers collaborating on extracting insights from data, as opposed to individual data scientists working alone. However, we still lack a deep understanding of how data science workers collaborate in practice. In this work, we conducted an online survey with 183 participants who work in various aspects of data science. We focused on their reported interactions with each other (e.g., managers with engineers) and with different tools (e.g., Jupyter Notebook). We found that data science teams are extremely collaborative and work with a variety of stakeholders and tools during the six common steps of a data science workflow (e.g., clean data and train model). We also found that the collaborative practices workers employ, such as documentation, vary according to the kinds of tools they use. Based on these findings, we discuss design implications for supporting data science team collaborations and future research directions.
Comments: 8 pages, 2 figures, 2 tables : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
Efficient Neural Architecture Search (ENAS) achieves novel efficiency for learning architecture with high performance via parameter sharing, but suffers from an issue of slow propagation speed of the search model with deep topology.
In this paper, we propose a Broad version of ENAS (BENAS) to solve the above issue by learning a broad architecture whose propagation speed is fast, using the reinforcement learning and parameter sharing employed in ENAS, thereby achieving higher search efficiency. In particular, we elaborately design the Broad Convolutional Neural Network (BCNN), the search paradigm of BENAS, which can obtain satisfactory performance with a broad topology, i.e., fast forward and backward propagation speed. The proposed BCNN extracts multi-scale features and enhancement representations and feeds them into a global average pooling layer to yield more reasonable and comprehensive representations, so that the performance achieved by a BCNN with shallow topology can be promised. To verify the effectiveness of BENAS, several experiments are performed, and the experimental results show that 1) BENAS delivers a search cost of 0.23 days, 2x less than ENAS; 2) architectures learned by BENAS based on small-size BCNNs with 0.5 and 1.1 million parameters obtain state-of-the-art performance, with 3.63% and 3.40% test error on CIFAR-10; and 3) the learned-architecture-based BCNN achieves 25.3% top-1 error on ImageNet using just 3.9 million parameters.
Journal-ref: Knowledge-Based Systems, 2020, 105502 : Social and Information Networks (cs.SI) ; Machine Learning (cs.LG)
Recently, information cascade prediction has attracted increasing interest from researchers, but it is far from being well solved, partly due to three defects of the existing works. First, the existing works often assume an underlying information diffusion model, which is impractical in the real world due to the complexity of information diffusion. Second, the existing works often ignore the prediction of the infection order, which also plays an important role in social network analysis. Finally, the existing works often depend on the requirement of underlying diffusion networks, which are likely unobservable in practice. In this paper, we aim at the prediction of both node infection and infection order without any requirement of knowledge about the underlying diffusion mechanism or the diffusion network, where the challenges are two-fold. The first is which cascading characteristics of nodes should be captured and how to capture them, and the second is how to model the non-linear features of nodes in information cascades. To address these challenges, we propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction, which can capture not only the node structural property but also two kinds of node cascading characteristics. We propose an auto-encoder based collaborative embedding framework to learn the node embeddings with cascade collaboration and node collaboration, in which way the non-linearity of information cascades can be effectively captured. The results of extensive experiments conducted on real-world datasets verify the effectiveness of our approach.
Comments: Accepted in WACV'2020 : Computer Vision and Pattern Recognition (cs.CV) ; Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that the data of all classes are available during training. This assumption may not always be practical, since the data of a few classes may be unavailable, or the classes may not appear at the time of training.
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to handle previously unseen classes at test time. This paper proposes a generative approach based on a Stacked Adversarial Network (SAN) combined with the advantages of a Siamese Network (SN) for ZS-SBIR. While the SAN generates high-quality samples, the SN learns a better distance metric compared to that of a nearest-neighbor search. The capability of the generative model to synthesize image features based on the sketch reduces the SBIR problem to an image-to-image retrieval problem. We evaluate the efficacy of our proposed approach on the TU-Berlin and Sketchy databases in both the standard ZSL and the generalized ZSL settings. The proposed method yields a significant improvement in the standard ZSL setting as well as in the more challenging generalized ZSL setting (GZSL) for SBIR.
Comments: Published as a long paper in EMNLP 2019 : Computation and Language (cs.CL) ; Information Retrieval (cs.IR); Machine Learning (cs.LG)
Neural conversation systems generate responses based on the sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with a single set of learned parameters to generate responses for the given input contexts. When confronting diverse conversations, its adaptability is rather limited and the model is hence prone to generating generic responses. In this work, we propose an Adaptive Neural Dialogue generation model, AdaND, which manages various conversations with conversation-specific parameterization. For each conversation, the model generates the parameters of the encoder-decoder by referring to the input context. In particular, we propose two adaptive parameterization mechanisms: a context-aware and a topic-aware parameterization mechanism. The context-aware parameterization directly generates the parameters by capturing local semantics of the given context. The topic-aware parameterization enables parameter sharing among conversations with similar topics by first inferring the latent topics of the given context and then generating the parameters with respect to the distributional topics. Extensive experiments conducted on a large-scale real-world conversational dataset show that our model achieves superior performance in terms of both quantitative metrics and human evaluations.
Comments: 25 pages, 2 tables : Image and Video Processing (eess.IV) ; Machine Learning (cs.LG); Medical Physics (physics.med-ph); Machine Learning (stat.ML)
This paper reviews machine learning-based studies for quantitative positron emission tomography (PET). Specifically, we summarize recent developments of machine learning-based methods in PET attenuation correction and low-count PET reconstruction by listing and comparing the proposed methods, study designs and reported performances of the currently published studies, with brief discussion of representative studies. The contributions and challenges among the reviewed studies are summarized and highlighted in the discussion.
Comments: arXiv admin note: substantial text overlap with arXiv:1812.03205 : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
Convolutional neural networks (CNNs) learn filters in order to capture local correlation patterns in feature space. In this paper we propose instead to revert to learning combinations of preset spectral filters by switching to CNNs with harmonic blocks.
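A harmonic block of this kind can be sketched as a fixed bank of 2-D DCT basis filters whose responses are mixed by a learned 1x1 convolution. A minimal PyTorch rendering under that assumption (a generic reading of the idea, not the authors' code):

    import math
    import torch
    import torch.nn as nn

    def dct_filters(k=3):
        """k*k fixed 2-D DCT-II basis filters, shape (k*k, 1, k, k)."""
        f = torch.zeros(k * k, 1, k, k)
        for u in range(k):
            for v in range(k):
                for x in range(k):
                    for y in range(k):
                        f[u * k + v, 0, x, y] = (
                            math.cos(math.pi * u * (2 * x + 1) / (2 * k))
                            * math.cos(math.pi * v * (2 * y + 1) / (2 * k)))
        return f

    class HarmonicBlock(nn.Module):
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.register_buffer("dct", dct_filters(k))    # preset, not learned
            self.mix = nn.Conv2d(in_ch * k * k, out_ch, 1) # learned combination

        def forward(self, x):
            b, c, h, w = x.shape
            spec = nn.functional.conv2d(
                x.reshape(b * c, 1, h, w), self.dct, padding=1)
            return self.mix(spec.reshape(b, c * self.dct.shape[0], h, w))

    y = HarmonicBlock(3, 16)(torch.rand(1, 3, 32, 32))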
We rely on the use of Discrete Cosine Transform (DCT) filters, which have excellent energy compaction properties and are widely used for image compression. The proposed harmonic blocks rely on DCT modeling and replace conventional convolutional layers to produce partially or fully harmonic versions of new or existing CNN architectures. We demonstrate how harmonic networks can be efficiently compressed in a straightforward manner by truncating the high-frequency information in harmonic blocks, which is possible due to the redundancies in the spectral domain. We report extensive experimental validation demonstrating the benefits of introducing harmonic blocks into state-of-the-art CNN models for image classification, segmentation and edge detection applications.
Comments: Submitted for publication : Information Theory (cs.IT) ; Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
We investigate the framework of privacy amplification by iteration, recently proposed by Feldman et al., through an information-theoretic lens. We demonstrate that differential privacy guarantees of iterative mappings can be determined by a direct application of contraction coefficients derived from strong data processing inequalities for (f)-divergences. In particular, by generalizing Dobrushin's contraction coefficient for the total variation distance to an (f)-divergence known as the (E_{\gamma})-divergence, we derive tighter bounds on the differential privacy parameters of the projected noisy stochastic gradient descent algorithm with hidden intermediate updates.
Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition
Sebastian Raschka , Benjamin Kaufman Subjects : Biomolecules (q-bio.BM) ; Machine Learning (cs.LG)
In the last decade, machine learning and artificial intelligence applications have received a significant boost in performance and attention in both academic research and industry. The success behind most of the recent state-of-the-art methods can be attributed to the latest developments in deep learning. When applied to various scientific domains that are concerned with the processing of non-tabular data, for example, images or text, deep learning has been shown to outperform not only conventional machine learning but also highly specialized tools developed by domain experts. This review aims to summarize AI-based research for GPCR bioactive ligand discovery, with a particular focus on the most recent achievements and research trends. To make this article accessible to a broad audience of computational scientists, we provide instructive explanations of the underlying methodology, including overviews of the most commonly used deep learning architectures and feature representations of molecular data. We highlight the latest AI-based research that has led to the successful discovery of GPCR bioactive ligands. However, an equal focus of this review is the discussion of machine learning-based technology that has been applied to ligand discovery in general and has the potential to pave the way for successful GPCR bioactive ligand discovery in the future. This review concludes with a brief outlook highlighting recent research trends in deep learning, such as active learning and semi-supervised learning, which have great potential for advancing bioactive ligand discovery.
Comments: 14 pages, review.
arXiv admin note: text overlap with arXiv:1810.05587 by other authors : Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Deep reinforcement learning (RL) has achieved outstanding results in recent years, which has led to a dramatic increase in the number of methods and applications. Recent works explore learning beyond single-agent scenarios and consider multi-agent scenarios. However, they face many challenges and seek help from traditional game-theoretic algorithms, which, in turn, show bright application promise when combined with modern algorithms and growing computing power. In this survey, we first introduce basic concepts and algorithms in single-agent RL and multi-agent systems; then, we summarize the related algorithms from three aspects. Solution concepts from game theory give inspiration to algorithms that try to evaluate the agents or find better solutions in multi-agent systems. Fictitious self-play has become popular and has had a great impact on algorithms for multi-agent reinforcement learning. Counterfactual regret minimization is an important tool for solving games with incomplete information, and has shown great strength when combined with deep learning.
Comments: Accepted in PSIVT2019 Conference : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
Sharing images online poses security threats to a wide range of users due to unawareness of privacy information. Deep features have been demonstrated to be a powerful representation for images. However, deep features usually suffer from the issues of a large size and the requirement of a huge amount of data for fine-tuning. In contrast to normal images (e.g., scene images), privacy images are often limited because of the sensitive information they contain. In this paper, we propose a novel approach that can work on limited data and generate deep features of smaller size. For the training images, we first extract the initial deep features from a pre-trained model and then employ the K-means clustering algorithm to learn the centroids of these initial deep features. We use the centroids learned from the training features to extract the final features for each testing image and encode our final features with the triangle encoding. To improve the discriminability of the features, we further perform fusion of the two proposed unsupervised deep features obtained from different layers. Experimental results show that the proposed features outperform state-of-the-art deep features in terms of both classification accuracy and testing time.
Comments: 14 pages, 8 figures, accepted by IEEE Transactions on Wireless Communications : Information Theory (cs.IT)
Cloud radio access network (C-RAN) has been recognized as a promising architecture for next-generation wireless systems to support the rapidly increasing demand for higher data rates. However, the performance of C-RAN is limited by the backhaul capacities, especially for wireless deployments. While C-RAN with fixed BS caching has been demonstrated to reduce backhaul consumption, it is more challenging to further optimize the cache allocation at BSs with multi-cluster multicast backhaul, where the inter-cluster interference induces additional non-convexity in the cache optimization problem.
Despite the challenges, we propose an accelerated first-order algorithm, which achieves a much higher content downloading sum-rate than a second-order algorithm running for the same amount of time. Simulation results demonstrate that, by simultaneously delivering the required contents to different multicast clusters, the proposed algorithm achieves a significantly higher downloading sum-rate than those of time-division single-cluster transmission schemes. Moreover, it is found that the proposed algorithm allocates larger cache sizes to the farther BSs within the nearer clusters, which provides insight into the superiority of the proposed cache allocation.
Luca Barletta , Stefano Rini Subjects : Information Theory (cs.IT)
In this paper, the capacity of the oversampled Wiener phase noise (OWPN) channel is investigated. The OWPN channel is a discrete-time point-to-point channel with a multi-sample receiver in which the channel output is affected by both additive and multiplicative noise. The additive noise is a white standard Gaussian process while the multiplicative noise is a Wiener phase noise process. This channel generalizes a number of channel models previously studied in the literature which investigate the effects of phase noise on the channel capacity, such as the Wiener phase noise channel and the non-coherent channel. We derive upper and inner bounds to the capacity of the OWPN channel: (i) an upper bound is derived through the I-MMSE relationship by bounding the Fisher information when estimating a phase noise sample given the past channel outputs and phase noise realizations; then (ii) two inner bounds are shown: one relying on coherent combining of the oversampled channel outputs and one relying on non-coherent combining of the samples. After capacity, we study the generalized degrees of freedom (GDoF) of the OWPN channel for the case in which the oversampling factor grows with the average transmit power (P) as (P^{\eta}) and the frequency noise variance grows as (P^{\alpha}). Using our new capacity bounds, we derive the GDoF region in three regimes: one (i) in which the GDoF region equals that of the classic additive white Gaussian noise channel (for (\eta \leq 1)), one (ii) in which the GDoF region reduces to that of the non-coherent channel (for (\eta \geq \min\{\alpha,1\})) and, finally, one (iii) in which partially-coherent combining of the over-samples is asymptotically optimal (for (2\alpha-1 \leq \eta \leq 1)). Overall, our results are the first to identify the regimes in which different oversampling strategies are asymptotically optimal.
Asmaa Abdallah , Mohammad M. Mansour Subjects : Information Theory (cs.IT)
Cell-free massive MIMO is an emerging network technology for 5G wireless communications wherein distributed multi-antenna access points (APs) serve many users simultaneously. Most prior work on cell-free massive MIMO systems assumes a time-division duplexing mode, although frequency-division duplexing (FDD) systems dominate current wireless standards. The key challenges in FDD massive MIMO systems are channel-state information (CSI) acquisition and feedback overhead. To address these challenges, we exploit the so-called angle reciprocity of multipath components in the uplink and downlink, so that the required CSI acquisition overhead scales only with the number of served users, and not with the number of AP antennas or APs. We propose a low-complexity multipath component estimation technique and present linear angle-of-arrival (AoA)-based beamforming/combining schemes for FDD-based cell-free massive MIMO systems. We analyze the performance of these schemes by deriving closed-form expressions for the mean-square error of the estimated multipath components, as well as expressions for the uplink and downlink spectral efficiency. Using semi-definite programming, we solve a max-min power allocation problem that maximizes the minimum user rate under per-user power constraints. Furthermore, we present a user-centric (UC) AP selection scheme in which each user chooses a subset of APs to improve the overall energy efficiency of the system. Simulation results demonstrate that the proposed multipath component estimation technique outperforms conventional subspace-based and gradient-descent-based techniques. We also show that the proposed beamforming and combining techniques, along with the proposed power control scheme, substantially enhance the spectral and energy efficiencies with an adequate number of antennas at the APs.
Yuhua Sun , Tongjiang Yan Subjects : Information Theory (cs.IT)
In 2008, a class of binary sequences of period \(N=4(2^k-1)(2^k+1)\) with optimal autocorrelation magnitude was presented by Yu and Gong, based on an \(m\)-sequence, the perfect sequence \((0,1,1,1)\) of period \(4\), and the interleaving technique. In this paper, we study the 2-adic complexities of these sequences. Our results show that they are larger than \(N-2\lceil \log_2 N \rceil\) (which is far larger than \(N/2\)) and can attain the maximum value \(N\) if suitable parameters are chosen; i.e., the 2-adic complexity of this class of interleaved sequences is large enough to resist the Rational Approximation Algorithm.
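For readers unfamiliar with the quantity being bounded: a common way to compute the 2-adic complexity of a period-\(N\) binary sequence goes through \(\gcd(S(2), 2^N-1)\), where \(S(2)=\sum_i s_i 2^i\). The sketch below follows this standard Klapper-Goresky-style definition; taking the floor of the (generally non-integer) logarithm is an approximation, which suffices for checking bounds like the one above.

```python
# Hedged sketch: integer part of the 2-adic complexity
# log2((2^N - 1) / gcd(S(2), 2^N - 1)) of one period s of a binary sequence.
from math import gcd

def two_adic_complexity(s):
    N = len(s)
    S2 = sum(bit << i for i, bit in enumerate(s))  # S(2) = sum_i s_i * 2^i
    d = (2**N - 1) // gcd(S2, 2**N - 1)
    return d.bit_length() - 1                      # floor(log2(d))

print(two_adic_complexity([1, 1, 1, 0, 1, 0, 0]))  # m-sequence of period 7 -> 6
```

A small gcd means a large complexity; the bound above says that for these interleaved sequences the gcd is small enough that the complexity exceeds \(N-2\lceil \log_2 N \rceil\).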
Comments: 14 pages, 11 figures, accepted by IEEE Transactions on Wireless Communications : Information Theory (cs.IT)
Coordinated multi-point (CoMP) transmission is a cooperation technique among base stations (BSs) in a cellular network, with outstanding capability for inter-cell interference (ICI) mitigation. ICI is a dominant source of error and has detrimental effects on system performance if not managed properly. Based on the theory of Poisson-Delaunay triangulation, this paper proposes a novel analytical model for CoMP operation in cellular networks. Unlike conventional CoMP operation, which is dynamic and occasionally needs on-line updating, the proposed approach enables the cooperating BS set of a user equipment (UE) to be fixed and determined off-line according to the location information of the BSs. By using the theory of stochastic geometry, the coverage probability and spectral efficiency of a typical UE are analyzed, and simulation results corroborate the effectiveness of the proposed CoMP scheme and the developed performance analysis.
Haiquan Lu , Yong Zeng , Shi Jin , Rui Zhang Subjects : Information Theory (cs.IT)
This paper proposes a new three-dimensional (3D) networking architecture enabled by an aerial intelligent reflecting surface (AIRS) to achieve panoramic signal reflection from the sky. Compared to the conventional terrestrial IRS, an AIRS not only enjoys higher deployment flexibility, but is also able to achieve \(360^{\circ}\) panoramic full-angle reflection and in general requires fewer reflections, due to its higher likelihood of having line-of-sight (LoS) links with the ground nodes. We focus on the problem of maximizing the worst-case signal-to-noise ratio (SNR) in a given coverage area by jointly optimizing the transmit beamforming, the AIRS placement, and the phase shifts. The formulated problem is non-convex and the optimization variables are coupled with each other in an intricate manner. To tackle this problem, we first consider the special case of single-location SNR maximization to gain useful insights, for which the optimal solution is obtained in closed form. Then, for the general case of area coverage, an efficient suboptimal solution is proposed by exploiting the similarity between phase shift optimization for an IRS and analog beamforming for a conventional phased array. Numerical results show that the proposed design achieves significant performance gains over heuristic AIRS deployment schemes.
Comments: 6 pages; submitted to IEEE International Conference on Communications 2020 : Information Theory (cs.IT)
Index coding is concerned with efficient broadcast of a set of messages to receivers in the presence of receiver side information. In this paper, we study the secure index coding problem with security constraints on the receivers themselves. That is, for each receiver there is a single legitimate message it needs to decode and a list of prohibited messages, none of which should be decoded by that receiver. To this end, our contributions are threefold. We first introduce a secure linear coding scheme, which is an extended version of the fractional local partial clique covering scheme that was originally devised for non-secure index coding. We then develop two information-theoretic bounds on the performance of any valid secure index code, namely a secure polymatroidal outer bound (on the capacity region) and a secure maximum acyclic induced subgraph lower bound (on the broadcast rate). The structure of these bounds leads us to further develop two necessary conditions for a given index coding problem to be securely feasible (i.e., to have nonzero rates).
Bivariate Polynomial Coding for Exploiting Stragglers in Heterogeneous Coded Computing Systems
Burak Hasircioglu , Jesus Gomez-Vilardebo , Deniz Gunduz Subjects : Information Theory (cs.IT) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Polynomial coding has been proposed as a solution to the straggler mitigation problem in distributed matrix multiplication. Previous works in the literature employ univariate polynomials to encode matrix partitions. Such schemes greatly improve the speed of distributed computing systems by making the task completion time depend only on the fastest workers. However, the work done by the slowest workers, which fail to finish the tasks assigned to them, is completely ignored. In order to exploit the partial computations of the slower workers, we further decompose the overall matrix multiplication task into even smaller subtasks to better fit the workers' storage and computation capacities. In this work, we show that univariate schemes fail to make efficient use of the storage capacity, and we propose bivariate polynomial codes. We show that bivariate polynomial codes are a more natural choice to accommodate the additional decomposition of subtasks, as well as heterogeneous storage and computation resources at the workers. However, in contrast to univariate polynomial decoding, guaranteeing decodability for multivariate interpolation is much harder. We propose two bivariate polynomial schemes. The first scheme exploits the fact that bivariate interpolation is always possible on a rectangular grid of points. We obtain the rectangular grid of points at the cost of allowing some redundant computations. For the second scheme, we relax the decoding constraint and require decodability for almost all choices of evaluation points. We present interpolation sets satisfying the almost-decodability conditions for certain storage configurations of workers. Our numerical results show that bivariate polynomial coding considerably reduces the completion time of distributed matrix multiplication.
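For context, the univariate baseline that the bivariate scheme generalizes can be sketched in a few lines: split \(A\) into \(m\) row blocks and \(B\) into \(n\) column blocks, have each worker multiply the encoded blocks evaluated at its own point \(x_i\), and interpolate the degree-\(mn-1\) product polynomial from any \(mn\) finished workers. The code below is a toy, single-process illustration of this principle (in the spirit of Yu et al.'s polynomial codes), not the paper's bivariate construction.

```python
# Toy univariate polynomial coding for distributed matrix multiplication:
# any m*n of the `workers` evaluation results suffice to recover A @ B.
import numpy as np

m, n, workers = 2, 2, 6                      # m*n = 4 results needed
A = np.arange(16.0).reshape(4, 4)
B = np.arange(16.0, 32.0).reshape(4, 4)
A_blk = np.split(A, m, axis=0)               # row blocks A_0, A_1
B_blk = np.split(B, n, axis=1)               # column blocks B_0, B_1

xs = np.arange(1.0, workers + 1)             # distinct evaluation points
results = []
for x in xs:                                 # each worker's encoded product
    pA = sum(Ai * x**j for j, Ai in enumerate(A_blk))        # degree m-1
    pB = sum(Bk * x**(m * k) for k, Bk in enumerate(B_blk))  # degrees 0, m, ...
    results.append((x, pA @ pB))

pts = results[: m * n]                       # pretend only these finished
V = np.vander([x for x, _ in pts], m * n, increasing=True)
coeffs = np.linalg.solve(V, np.stack([r.ravel() for _, r in pts]))
# The coefficient of x^(j + m*k) is the block A_j @ B_k of A @ B.
assert np.allclose(coeffs[0].reshape(2, 2), A_blk[0] @ B_blk[0])
assert np.allclose(coeffs[3].reshape(2, 2), A_blk[1] @ B_blk[1])
```

The bivariate schemes above refine exactly this picture so that partially finished workers also contribute usable interpolation points.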
Paulo Almeida , Umberto Martínez-Penas , Diego Napp Subjects : Information Theory (cs.IT)
In the last decade there has been great interest in extending results for codes equipped with the Hamming metric to analogous results for codes endowed with the rank metric. This work follows this thread of research and studies the characterization of systematic generator matrices (encoders) of codes with maximum rank distance. In the context of the Hamming distance these codes are the so-called Maximum Distance Separable (MDS) codes, for which systematic encoders have been fully investigated. In this paper we investigate the algebraic properties and representation of encoders in systematic form of Maximum Rank Distance (MRD) codes and Maximum Sum Rank Distance (MSRD) codes. We address block codes and convolutional codes separately and present necessary and sufficient conditions for an encoder in systematic form to generate a code with maximum (sum) rank distance. These characterizations are given in terms of certain matrices that must be superregular in an extension field and that preserve superregularity after some transformations performed over the base field. We conclude by presenting some examples of Maximum Sum Rank convolutional codes over small fields; for the given parameters, our examples are over smaller fields than those obtained by other authors.
Zitan Chen , Min Ye , Alexander Barg Subjects : Information Theory (cs.IT)
Recently, Reed-Solomon (RS) codes were shown to possess a repair scheme that supports repair of failed nodes with optimal repair bandwidth. In this paper, we extend this result in two directions. First, we propose a new repair scheme for the RS codes constructed in [Tamo-Ye-Barg, IEEE Transactions on Information Theory, vol. 65, May 2019] and show that our new scheme is robust to erroneous information provided by the helper nodes while maintaining the optimal repair bandwidth.
Second, we construct a new family of RS codes with optimal access for the repair of any single failed node. We also show that the constructed codes can accommodate both features, supporting optimal-access repair with optimal error-correction capability. Going beyond RS codes, we also prove that any scalar MDS code with optimal repair bandwidth allows for a repair scheme with the optimal access property.
Mahdi Shakiba-Herfeh , Arsenia Chorti , H. Vince Poor Subjects : Information Theory (cs.IT)
The goal of physical layer security (PLS) is to make use of the properties of the physical layer, including the wireless communication medium and the transceiver hardware, to enable critical aspects of secure communications. In particular, PLS can be employed to provide (i) node authentication, (ii) message authentication, and (iii) message confidentiality. Unlike the corresponding classical cryptographic approaches, which are all based on computational security, PLS's added strength is that it is based on information-theoretic security, in which no limitation on the opponent's computational power is assumed; PLS is therefore inherently quantum resistant. In this survey, we review the aforementioned fundamental aspects of PLS, starting with node authentication, moving to the information-theoretic characterization of message integrity, and finally discussing message confidentiality, both from the point of view of secret key generation from shared randomness and from the wiretap channel point of view. The aim of this review is to provide a comprehensive roadmap of important relevant results by the authors and other contributors, and to discuss open issues on the applicability of PLS in sixth generation systems.
Lukas Holzbaur , Camilla Hollanti , Antonia Wachter-Zeh Subjects : Information Theory (cs.IT)
A new computational private information retrieval (PIR) scheme based on random linear codes is presented. A matrix of messages from a McEliece scheme is used to query the server with carefully chosen errors. The server responds with the sum of scalar multiples of the rows of the query matrix and the files. The user recovers the desired file by erasure decoding the response. Contrary to code-based cryptographic systems, the scheme presented here enables the use of truly random codes, not only codes disguised as such. Further, we show the relation to the so-called error subspace search problem and quotient error search problem, which we assume to be difficult, and show that the scheme is secure against attacks based on solving these problems.
Comments: 6 pages, no figures. Conference paper : Information Theory (cs.IT)
A growing number of works have, in recent years, been concerned with in-vivo DNA as a medium for data storage. This paper extends the concept of reconstruction codes for uniform-tandem-duplication noise to the model of associative memory, by finding the uncertainty associated with \(m>2\) strings (where a previous paper considered \(m=2\)). That uncertainty is found as an asymptotic expression, assuming codewords belong to a typical set of strings that constitutes a growing fraction of the space, converging to \(1\).
Wentu Song , Kui Cai , Long Shi Subjects : Information Theory (cs.IT)
Consider a centralized caching network with a single server and \(K\) users. The server has a database of \(N\) files, with each file divided into \(F\) packets (\(F\) is known as the subpacketization), and each user owns a local cache that can store a \(\frac{M}{N}\) fraction of the \(N\) files. We construct a family of centralized coded caching schemes with polynomial subpacketization. Specifically, given \(M\), \(N\), and an integer \(n\geq 0\), we construct a family of coded caching schemes for any \((K,M,N)\) caching system with \(F=O(K^{n+1})\). More generally, for any \(t\in\{1,2,\cdots,K-2\}\) and any integer \(n\) such that \(0\leq n\leq t\), we construct a coded caching scheme with \(\frac{M}{N}=\frac{t}{K}\) and \(F\leq K\binom{\left(1-\frac{M}{N}\right)K+n}{n}\).
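To make the last bound concrete, the snippet below evaluates \(F\leq K\binom{(1-M/N)K+n}{n}\) for a few illustrative parameter choices (the values are arbitrary examples, not taken from the paper) and compares it with the exponential subpacketization \(\binom{K}{t}\) of the classical Maddah-Ali-Niesen scheme at the same cache ratio \(M/N=t/K\).

```python
# Numeric check of the polynomial subpacketization bound quoted above.
from math import comb

def subpacketization_bound(K, t, n):
    # F <= K * C((1 - M/N)*K + n, n), with M/N = t/K so (1 - M/N)*K = K - t
    assert 1 <= t <= K - 2 and 0 <= n <= t
    return K * comb(K - t + n, n)

for K, t, n in [(20, 5, 0), (20, 5, 2), (50, 10, 3)]:
    print(K, t, n, subpacketization_bound(K, t, n), comb(K, t))
# e.g. K=20, t=5, n=2: bound 2720 versus C(20,5) = 15504 for the classic scheme
```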
Farhad Shirani , Siddharth Garg , Elza Erkip Subjects : Information Theory (cs.IT)
Permutations of correlated sequences of random variables appear naturally in a variety of applications such as graph matching and asynchronous communications. In this paper, the asymptotic statistical behavior of such permuted sequences is studied. It is assumed that a collection of random vectors is produced based on an arbitrary joint distribution, and the vectors undergo a permutation operation. The joint typicality of the resulting permuted vectors with respect to the original distribution is investigated. As an initial step, permutations of pairs of correlated random vectors are considered. It is shown that the probability of joint typicality of the permuted vectors depends only on the number and length of the disjoint cycles of the permutation. Consequently, it suffices to study typicality for a class of permutations called 'standard permutations', for which upper bounds on the probability of joint typicality are derived. The notion of standard permutations is extended to a class of permutation vectors called 'Bell permutation vectors'. By investigating Bell permutation vectors, upper bounds on the probability of joint typicality of permutations of arbitrary collections of random sequences are derived.
Sanjeev Gurugopinath , Sami Muhaidat , Yousof Al-Hammadi , Paschalis C. Sofotasios , Octavia A. Dobre Subjects : Information Theory (cs.IT) ; Signal Processing (eess.SP)
The proliferation of connected vehicles, along with the high demand for rich multimedia services, poses key challenges for the emerging 5G-enabled vehicular networks. These challenges include, but are not limited to, high spectral efficiency and low latency requirements. Recently, the integration of cache-enabled networks with non-orthogonal multiple access (NOMA) has been shown to reduce the content delivery time and traffic congestion in wireless networks. Accordingly, in this article, we envisage cache-aided NOMA as a technology facilitator for 5G-enabled vehicular networks. In particular, we present a cache-aided NOMA architecture, which can address some of the aforementioned challenges in these networks. We demonstrate that the spectral efficiency gain of the proposed architecture, which depends largely on the cached contents, significantly exceeds that of conventional vehicular networks. Finally, we provide deep insights into the challenges, opportunities, and future research trends that will enable the practical realization of cache-aided NOMA in 5G-enabled vehicular networks.
Comments: 13 pages, 13 figures : Information Theory (cs.IT)
We consider a fully-loaded ground wireless network supporting unmanned aerial vehicle (UAV) transmission services.
To enable the overload transmissions to a ground user (GU) and a UAV, two transmission schemes are employed, namely non-orthogonal multiple access (NOMA) and relaying, depending on whether or not the GU and the UAV are served simultaneously. Under the assumption of the system operating with infinite-blocklength (IBL) codes, the IBL throughputs of both the GU and the UAV are derived under the two schemes. More importantly, we also consider the scenario in which data packets are transmitted via finite-blocklength (FBL) codes, i.e., data transmission to both the UAV and the GU is performed under low-latency and high-reliability constraints. In this setting, the FBL throughputs are characterized, again considering the two schemes of NOMA and relaying. Following the IBL and FBL throughput characterizations, optimal resource allocation designs are subsequently proposed to maximize the UAV throughput while guaranteeing the throughput of the cellular user. Moreover, we prove that the relaying scheme is able to provide transmission service to the UAV while improving the GU's performance, and that the relaying scheme potentially offers a higher throughput to the UAV in the FBL regime than in the IBL regime. On the other hand, the NOMA scheme provides a higher UAV throughput (than relaying) by slightly sacrificing the GU's performance.
Comments: 6 pages; submitted to IEEE International Symposium on Information Theory 2020 : Information Theory (cs.IT)
This paper studies the tradeoff between privacy and utility in a single-trial multi-terminal guessing (estimation) framework, using a system model that is inspired by index coding. There are \(n\) independent discrete sources at a data curator. There are \(m\) legitimate users and one adversary, each with some side information about the sources. The data curator broadcasts a distorted function of the sources to the legitimate users, which is also overheard by the adversary. In terms of utility, each legitimate user wishes to perfectly reconstruct some of the unknown sources and attain a certain gain in estimation correctness for the remaining unknown sources. In terms of privacy, the data curator wishes to minimize the maximal leakage: the worst-case guessing gain of the adversary in estimating any target function of its unknown sources after receiving the broadcast data. Given the system settings, we derive fundamental performance lower bounds on the maximal leakage to the adversary, which are inspired by the notion of the confusion graph and by performance bounds for the index coding problem. We also detail a greedy privacy-enhancing mechanism, which is inspired by the agglomerative clustering algorithms in the information bottleneck and privacy funnel problems.
Comments: 6 pages : Information Theory (cs.IT)
The Gray and Wyner lossy source coding for a simple network, for sources that generate a tuple of jointly Gaussian random variables (RVs) \(X_1 : \Omega \rightarrow \mathbb{R}^{p_1}\) and \(X_2 : \Omega \rightarrow \mathbb{R}^{p_2}\), with respect to square-error distortion at the two decoders, is re-examined using (1) Hotelling's geometric approach to Gaussian RVs, the canonical variable form, and (2) van Putten's and van Schuppen's parametrization of joint distributions \(\mathbf{P}_{X_1, X_2, W}\) by Gaussian RVs \(W : \Omega \rightarrow \mathbb{R}^n\) which make \((X_1,X_2)\) conditionally independent, together with the weak stochastic realization of \((X_1, X_2)\). Item (2) is used to parametrize the lossy rate region of the Gray and Wyner source coding problem for joint decoding with mean-square error distortions \(\mathbf{E}\big\{\|X_i-\hat{X}_i\|_{\mathbb{R}^{p_i}}^2\big\} \leq \Delta_i \in [0,\infty]\), \(i=1,2\), by the covariance matrix of the RV \(W\). From this it follows that Wyner's common information \(C_W(X_1,X_2)\) (information definition) is achieved by a \(W\) with identity covariance matrix, while a formula for Wyner's lossy common information (operational definition) is derived, given by \(C_{WL}(X_1,X_2)=C_W(X_1,X_2)=\frac{1}{2}\sum_{j=1}^n \ln\left(\frac{1+d_j}{1-d_j}\right)\) for the distortion region \(0\leq \Delta_1 \leq \sum_{j=1}^n(1-d_j)\), \(0\leq \Delta_2 \leq \sum_{j=1}^n(1-d_j)\), where \(1 > d_1 \geq d_2 \geq \ldots \geq d_n > 0\) are the canonical correlation coefficients in \((0,1)\) computed from the canonical variable form of the tuple \((X_1, X_2)\). The methods are of fundamental importance to other problems of multi-user communication, where conditional independence is imposed as a constraint.
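Since the quoted formula reduces Wyner's (lossy) common information to the canonical correlation coefficients, it is easy to evaluate numerically: the \(d_j\) are the singular values of \(\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}\). The sketch below, using an arbitrary example covariance, illustrates that computation only; it is not the paper's derivation.

```python
# Evaluate C_W = 1/2 * sum_j ln((1 + d_j)/(1 - d_j)) from an example
# joint Gaussian covariance, via canonical correlation coefficients d_j.
import numpy as np

def inv_sqrt(S):
    # S^{-1/2} for a symmetric positive-definite matrix, via eigendecomposition
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

S11, S22 = np.eye(2), np.eye(2)               # per-source covariances (example)
S12 = np.array([[0.6, 0.1], [0.0, 0.3]])      # cross-covariance (example)

# Canonical correlation coefficients d_1 >= ... >= d_n, all in (0, 1) here
d = np.linalg.svd(inv_sqrt(S11) @ S12 @ inv_sqrt(S22), compute_uv=False)
C_W = 0.5 * np.sum(np.log((1 + d) / (1 - d)))  # formula quoted above
print(d, C_W)
```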
3D Beamforming in Reconfigurable Intelligent Surfaces-assisted Wireless Communication Networks
S. Mohammad Razavizadeh , Tommy Svensson Subjects : Information Theory (cs.IT)
Reconfigurable Intelligent Surfaces (RISs) or Intelligent Reflecting Surfaces (IRSs) are metasurfaces that can be deployed in various places in wireless environments to make these environments controllable and reconfigurable. In this paper, we investigate the problem of using 3D beamforming in RIS-empowered wireless networks and propose a new scheme that provides more degrees of freedom in designing and deploying RIS-based networks. In the proposed scheme, a base station (BS) equipped with a full-dimensional antenna array optimizes its radiation pattern in three-dimensional space to maximize the received signal-to-noise ratio at a target user. We also study the effect of the angle of incidence of the signal received by the RIS on its reflecting properties and find a relation between this angle and the tilt and elevation angles of the BS antenna array. The user receives the signal over a reflected path from the RIS as well as over a direct path from the BS, both of which depend on the BS antenna array's tilt and elevation angles. These angles, together with the phase shifts of the RIS elements, are jointly optimized numerically. Our simulation results show that RIS-assisted 3D beamforming with optimized phase shifts and radiation angles can considerably improve the performance of wireless networks.
In this paper, we present an efficiently encodable and decodable code construction that is capable of correcting a burst of deletions of length at most \(k\). The redundancy of this code is \(\log n + \frac{k(k+1)}{2}\log\log n + c_k\) for some constant \(c_k\) that depends only on \(k\), and the construction is thus scaling-optimal. The code can be split into two main components. First, we impose a constraint that allows us to locate the burst of deletions up to an interval of size roughly \(\log n\). Then, with the knowledge of the approximate location of the burst, we use several shifted Varshamov-Tenengolts codes to correct the burst of deletions, which requires only a small amount of redundancy since the location is already known up to an interval of small size. Finally, we show how to efficiently encode and decode the code.
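The shifted Varshamov-Tenengolts codes mentioned above build on the classical VT construction. As a self-contained illustration of the core mechanism (not the paper's burst-of-\(k\) decoder), here is Levenshtein's decoder for a single deletion in \(VT_a(n)=\{x : \sum_i i\,x_i \equiv a \pmod{n+1}\}\):

```python
def vt_decode_one_deletion(y, n, a):
    """Reinsert the single deleted bit into y (a list of n-1 bits)."""
    s = sum((i + 1) * bit for i, bit in enumerate(y)) % (n + 1)
    D = (a - s) % (n + 1)          # syndrome deficiency
    w = sum(y)                     # weight of the received word
    if D <= w:
        # A 0 was deleted, with exactly D ones to its right.
        ones_seen = 0
        for pos in range(len(y), -1, -1):
            if ones_seen == D:
                return y[:pos] + [0] + y[pos:]
            if pos > 0 and y[pos - 1] == 1:
                ones_seen += 1
    # Otherwise a 1 was deleted, with exactly D - w - 1 zeros to its left.
    zeros_seen = 0
    for pos in range(len(y) + 1):
        if zeros_seen == D - w - 1:
            return y[:pos] + [1] + y[pos:]
        if pos < len(y) and y[pos] == 0:
            zeros_seen += 1

x = [1, 0, 1, 1, 0, 1, 0, 1]       # sum of i*x_i = 1+3+4+6+8 = 22, so x is in VT_4(8)
assert vt_decode_one_deletion(x[:3] + x[4:], n=8, a=22 % 9) == x
```

Burst constructions of this kind essentially run such decoders on interleaved subsequences (a burst of length \(k\) deletes one bit from each of \(k\) interleaved substrings) once the burst has been localized to a small interval.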
Abbas Khalili , Shahram Shahsavari , Mohammad A. (Amir) Khojastepour , Elza Erkip Subjects : Information Theory (cs.IT)
Directional transmission patterns (a.k.a. narrow beams) are the key to wireless communications in millimeter wave (mmWave) frequency bands, which suffer from high path loss and severe shadowing. In addition, the propagation channel at mmWave frequencies incorporates only a small number of spatial clusters, requiring a procedure to align the corresponding narrow beams with the angle of departure (AoD) of the channel clusters. The objective of this procedure, called beam alignment (BA), is to increase the beamforming gain for subsequent data communication. Several prior studies consider optimizing the BA procedure to achieve various objectives, such as reducing the BA overhead, increasing throughput, and reducing power consumption. While these studies mostly provide optimized BA schemes for scenarios with a single active user, there are often multiple active users in practical networks. Consequently, it is more efficient in terms of BA overhead and delay to design multi-user BA schemes which can perform beam management for multiple users collectively. This paper considers a class of multi-user BA schemes in which the base station performs a one-shot scan of the angular domain to simultaneously localize multiple users. The objective is to minimize the average expected width of the remaining uncertainty regions (URs) on the AoDs after receiving the users' feedback. Fundamental bounds on the optimal performance are analyzed using information-theoretic tools. Furthermore, a beam design optimization problem is formulated, and a practical BA scheme is proposed which provides significant gains compared to the beam sweeping used in the 5G standard.
Comments: Submitted for publication : Information Theory (cs.IT) ; Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
We investigate the framework of privacy amplification by iteration, recently proposed by Feldman et al., through an information-theoretic lens. We demonstrate that the differential privacy guarantees of iterative mappings can be determined by a direct application of contraction coefficients derived from strong data processing inequalities for \(f\)-divergences. In particular, by generalizing Dobrushin's contraction coefficient for the total variation distance to an \(f\)-divergence known as the \(E_{\gamma}\)-divergence, we derive tighter bounds on the differential privacy parameters of the projected noisy stochastic gradient descent algorithm with hidden intermediate updates.
Mohammadamin Baniasadi , Ertem Tuncel Subjects : Information Theory (cs.IT)
The minimum power required to achieve a distortion-noise profile, i.e., a function indicating the maximum allowed distortion value for each noise level, is studied for the transmission of Gaussian sources over Gaussian channels in a regime of bandwidth approaching zero. A simple but instrumental lower bound on the minimum required power for a given profile is presented. For an upper bound, a dirty-paper-based coding scheme is proposed and its power-distortion tradeoff is analyzed. Finally, the upper and lower bounds on the minimum power are analyzed and compared for specific distortion-noise profiles, namely rational profiles of order one and two.
Channels' Confirmation and Predictions' Confirmation: from the Medical Test to the Raven Paradox
Comments: 12 tables, 7 figures : Artificial Intelligence (cs.AI) ; Information Theory (cs.IT); Logic (math.LO)
After long arguments between positivism and falsificationism, the verification of universal hypotheses was replaced with the confirmation of uncertain major premises. Unfortunately, Hempel discovered the Raven Paradox (RP). Then, Carnap used the logical probability increment as the confirmation measure. So far, many confirmation measures have been proposed. Among them, measure F, proposed by Kemeny and Oppenheim, possesses the symmetries and asymmetries proposed by Eells and Fitelson, the monotonicity proposed by Greco et al., and the normalizing property suggested by many researchers. Based on the semantic information theory, a measure b* similar to F is derived from the medical test. Like the likelihood ratio, b* and F can only indicate the quality of channels or testing means, rather than the quality of probability predictions. Moreover, it is still not easy to use b*, F, or another measure to clarify the RP. For this reason, measure c*, similar to the correct rate, is derived. The measure c* has the simple form (a-c)/max(a, c); it supports the Nicod Criterion and undermines the Equivalence Condition, and hence can be used to eliminate the RP. Some examples are provided to show why it is difficult to use any of the popular confirmation measures to eliminate the RP. Measures F, b*, and c* indicate that the existence of fewer counterexamples is more essential than the existence of more positive examples, and hence are compatible with Popper's falsification thought.
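To make the quantities concrete, here is a toy calculator for two of the measures discussed above, for a 2x2 table with counts a = #(e, h), b = #(e, not-h), c = #(not-e, h), d = #(not-e, not-h); this conventional cell naming is an assumption and may differ from the paper's. F is Kemeny and Oppenheim's measure; c* uses the closed form quoted in the abstract.

```python
def kemeny_oppenheim_F(a, b, c, d):
    p_e_h = a / (a + c)            # P(e | h)
    p_e_nh = b / (b + d)           # P(e | not-h)
    return (p_e_h - p_e_nh) / (p_e_h + p_e_nh)

def c_star(a, c):
    return (a - c) / max(a, c)     # closed form quoted in the abstract

# Medical-test style example: 90 true positives, 5 false positives,
# 10 false negatives, 95 true negatives.
print(kemeny_oppenheim_F(90, 5, 10, 95))  # ~0.89: strong confirmation
print(c_star(90, 10))                     # ~0.89: few counterexamples
```

Note how c* depends only on the positive examples a and the counterexamples c, which is exactly the "fewer counterexamples matter more" behavior the abstract highlights.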
Comments: Submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible : Signal Processing (eess.SP) ; Information Theory (cs.IT)
We describe three new high-performance receivers suitable for symbol detection in large-scale and overloaded multidimensional wireless communication systems, which are designed under the usual assumption of perfect channel state information (CSI) at the receiver. Under this common assumption, the maximum likelihood (ML) detection problem is first formulated as an \(\ell_0\)-norm-based optimization problem, and subsequently transformed, using a recently proposed fractional programming (FP) technique referred to as the quadratic transform (QT), in which the \(\ell_0\)-norm is not relaxed into an \(\ell_1\)-norm, in three distinct ways so as to offer different performance-complexity trade-offs. The first algorithm, dubbed the discreteness-aware penalized zero-forcing (DAPZF) receiver, aims to outperform the state of the art (SotA) while minimizing the computational complexity. The second solution, referred to as the discreteness-aware probabilistic soft-quantization detector (DAPSD), is designed to improve recovery performance via a soft-quantization method, and is found via numerical simulations to achieve the best performance of the three. Finally, the third scheme, named the discreteness-aware generalized eigenvalue detector (DAGED), not only offers a trade-off between performance and complexity compared to the others, but also differs from them by not requiring a penalization parameter to be optimized offline. Simulation results demonstrate that all three methods outperform state-of-the-art receivers, with the DAPZF exhibiting significantly lower complexity.
Vipul Gupta , Dominic Carrano , Yaoqing Yang , Vaishaal Shankar , Thomas Courtade , Kannan Ramchandran Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Information Theory (cs.IT)
Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase the end-to-end latency of distributed computation.
We propose and implement simple yet principled approaches for straggler mitigation in serverless systems for matrix multiplication and evaluate them on several common applications from machine learning and high-performance computing. The proposed schemes are inspired by error-correcting codes and employ parallel encoding and decoding over the data stored in the cloud using serverless workers. This creates a fully distributed computing framework without using a master node to conduct encoding or decoding, which removes the computation, communication, and storage bottleneck at the master. On the theory side, we establish that our proposed scheme is asymptotically optimal in terms of decoding time and provide a lower bound on the number of stragglers it can tolerate with high probability. Through extensive experiments, we show that our scheme outperforms existing schemes, such as speculative execution and other coding-theoretic methods, by at least 25%.
Comments: 5 pages, 2 figures : Signal Processing (eess.SP) ; Information Theory (cs.IT)
In this paper, we propose a novel orthogonal frequency division multiplexing with index modulation (OFDM-IM) scheme, which we call Q-ary multi-mode OFDM-IM (Q-MM-OFDM-IM). In the proposed scheme, Q disjoint M-ary constellations are used repeatedly on each subcarrier, and a maximum-distance separable code is applied to the indices of these constellations to achieve the highest number of index symbols. A low-complexity subcarrier-wise detection is shown to be possible for the proposed scheme. The spectral efficiency (SE) and the error rate performance of the proposed scheme are further analyzed. It is shown that the proposed scheme exhibits a very flexible structure that is capable of encompassing conventional OFDM as a special case. It is also shown that the proposed scheme is capable of considerably outperforming other OFDM-IM schemes and conventional OFDM in terms of error and SE performance while preserving a low-complexity structure.
Comments: QQ and ZZ contributed equally to the work. Invited review paper for IEEE Signal Processing Magazine Special Issue on non-convex optimization for signal processing and machine learning. This article contains 26 pages with 11 figures : Machine Learning (cs.LG) ; Information Theory (cs.IT); Image and Video Processing (eess.IV); Optimization and Control (math.OC); Machine Learning (stat.ML)
The problem of finding the sparsest vector (direction) in a low-dimensional subspace can be considered a homogeneous variant of the sparse recovery problem, and it finds applications in robust subspace recovery, dictionary learning, sparse blind deconvolution, and many other problems in signal processing and machine learning. However, in contrast to the classical sparse recovery problem, the most natural formulation for finding the sparsest vector in a subspace is usually nonconvex. In this paper, we overview recent advances in global nonconvex optimization theory for solving this problem, ranging from geometric analysis of its optimization landscape, to efficient optimization algorithms for solving the associated nonconvex optimization problem, to applications in machine intelligence, representation learning, and imaging sciences. Finally, we conclude this review by pointing out several interesting open problems for future research.
Quantitative Aspects of Programming Languages and Systems over the past \(2^4\) years and beyond
Comments: In Proceedings QAPL 2019, arXiv:2001.06163 Journal-ref: EPTCS 312, 2020, pp. 1-19 : Programming Languages (cs.PL) ; Information Theory (cs.IT); Logic in Computer Science (cs.LO)
Quantitative aspects of computation are related to the use of both physical and mathematical quantities, including time, performance metrics, probability, and measures for reliability and security. They are essential in characterizing the behaviour of many critical systems and in estimating their properties. Hence, they need to be integrated both at the level of system modeling and within the verification methodologies and tools. Over the last two decades, a variety of theoretical achievements and automated techniques have contributed to making quantitative modeling and verification mainstream in the research community. In the same period, they represented the central theme of the series of workshops entitled Quantitative Aspects of Programming Languages and Systems (QAPL), born in 2001. The aim of this survey is to revisit these achievements and results from the standpoint of QAPL and its community.
Comments: 25 pages : Machine Learning (cs.LG) ; Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Overwhelming theoretical and empirical evidence shows that mildly overparametrized neural networks (those with more connections than the size of the training data) are often able to memorize the training data with \(100\%\) accuracy. This was rigorously proved for networks with sigmoid activation functions and, very recently, for ReLU activations. Addressing a 1988 open question of Baum, we prove that this phenomenon holds for general multilayered perceptrons, i.e., neural networks with threshold activation functions, or with any mix of threshold and ReLU activations. Our construction is probabilistic and exploits sparsity.
Comments: 8 pages, 5 figures, concise version submitted to 2020 IEEE International Symposium on Information Theory (ISIT) : Cryptography and Security (cs.CR) ; Information Theory (cs.IT)
Recent works have considered the ability of transmitter Alice to communicate reliably with receiver Bob without being detected by warden Willie. These works generally assume a standard discrete-time model. But the assumption of a discrete-time model in standard communication scenarios is often predicated on its equivalence to a continuous-time model, which has not been established for the covert communications problem. Here, we consider the continuous-time channel directly and study whether efficient covert communication can still be achieved. We assume that an uninformed jammer is present to assist Alice, and we consider additive white Gaussian noise (AWGN) channels between all parties. For a channel with approximate bandwidth W, we establish constructions such that O(WT) information bits can be transmitted covertly and reliably from Alice to Bob in T seconds for two separate scenarios: 1) when the path loss between Alice and Willie is known; and 2) when the path loss between Alice and Willie is unknown.
Minimum Symbol-Error Probability Symbol-Level Precoding with Intelligent Reflecting Surface
Mingjie Shao , Qiang Li , Wing-Kin Ma Subjects : Signal Processing (eess.SP) ; Information Theory (cs.IT)
Recently, the use of intelligent reflecting surfaces (IRSs) has gained considerable attention in wireless communications. By intelligently adjusting its passive reflection angles, an IRS is able to assist the base station (BS) in extending coverage and improving spectral efficiency.
This paper considers a joint symbol-level precoding (SLP) and IRS reflection design to minimize the symbol-error probability (SEP) of the intended users in an IRS-aided multiuser MISO downlink. We formulate SEP minimization problems that pursue uniformly good performance for all users, for both QAM and PSK constellations. The resulting problem is non-convex, and we resort to alternating minimization to obtain a stationary solution. Simulation results demonstrate that, with the aid of the IRS, our proposed design indeed enhances the bit-error rate performance. In particular, the performance improvement is significant when the number of IRS elements is large.
Jaya Prakash Champati , Ramana R. Avula , Tobias J. Oechtering , James Gross Subjects : Information Retrieval (cs.IR) ; Information Theory (cs.IT)
There is growing interest in analyzing the freshness of data in networked systems. Age of Information (AoI) has emerged as a popular metric to quantify this freshness at a given destination, and there has been a significant research effort in optimizing this metric in communication and networking systems under different settings. In contrast to previous works, we are interested in a fundamental question: what is the minimum achievable AoI in any single-server-single-source queuing system for a given service-time distribution? To address this question, we study the problem of optimizing AoI under service preemptions. Our main result is a characterization of the minimum achievable average peak AoI (PAoI). We obtain this result by showing that a fixed-threshold policy is optimal within the set of all randomized-threshold causal policies. We use the characterization to provide necessary and sufficient conditions on the service-time distributions under which preemptions are beneficial.
Andrés Abeliuk , Zhishen Huang , Emilio Ferrara , Kristina Lerman Subjects : Computers and Society (cs.CY) ; Information Theory (cs.IT)
Applications from finance to epidemiology and cyber-security require accurate forecasts of dynamic phenomena, which are often only partially observed. We demonstrate that a system's predictability degrades as a function of temporal sampling, regardless of the adopted forecasting model. We quantify the loss of predictability due to sampling and show that it cannot be recovered by using external signals. We validate the generality of our theoretical findings in real-world partially observed systems representing infectious disease outbreaks, online discussions, and software development projects. On a variety of prediction tasks (forecasting new infections, the popularity of topics in online discussions, or interest in cryptocurrency projects), predictability irrecoverably decays as a function of sampling, unveiling fundamental predictability limits in partially observed systems.
A Classification-Based Approach to Semi-Supervised Clustering with Pairwise Constraints
Learning to See Analogies: A Connectionist Exploration
OIAD: One-for-all Image Anomaly Detection with Disentanglement Learning
Regularized Cycle Consistent Generative Adversarial Network for Anomaly Detection
FlexiBO: Cost-Aware Multi-Objective Optimization of Deep Neural Networks
Scalable Bid Landscape Forecasting in Real-time Bidding
K-NN active learning under local smoothness assumption
TopRank+: A Refinement of TopRank Algorithm
Multimodal Deep Unfolding for Guided Image Super-Resolution
A Comprehensive Study on Temporal Modeling for Online Action Detection
Variational Dropout Sparsification for Particle Identification speed-up
Pruning Neural Belief Propagation Decoders
Evaluating Weakly Supervised Object Localization Methods Right
Explicit agreement extremes for a \(2 \times 2\) table with given marginals
Estimating Latent Demand of Shared Mobility through Censored Gaussian Processes
Lyceum: An efficient and scalable ecosystem for robot learning
Breast lesion segmentation in ultrasound images with limited annotated data
Neural Style Difference Transfer and Its Application to Font Generation
Counter-example Guided Learning of Bounds on Environment Behavior
Recommending Themes for Ad Creative Design via Visual-Linguistic Representations
A deep network for sinogram and CT image reconstruction
The Incentives that Shape Behaviour
Pairwise Discriminative Neural PLDA for Speaker Verification
Deep Image Clustering with Tensor Kernels and Unsupervised Companion Objectives
An Efficient Framework for Automated Screening of Clinically Significant Macular Edema
CNN-Based Real-Time Parameter Tuning for Optimizing Denoising Filter Performance
CNN-based InSAR Coherence Classification
CNN-based InSAR Denoising and Coherence Metric
Nested-Wasserstein Self-Imitation Learning for Sequence Generation
Memristor Hardware-Friendly Reinforcement Learning
SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions
A multimodal deep learning approach for named entity recognition from social media
SQLFlow: A Bridge between SQL and Machine Learning
Dual Stochastic Natural Gradient Descent
Big-Data Science in Porous Materials: Materials Genomics and Machine Learning
Adaptive Stochastic Optimization
Teaching Software Engineering for AI-Enabled Systems
How do Data Science Workers Collaborate? Roles, Workflows, and Tools
Efficient Neural Architecture Search: A Broad Version
Deep Collaborative Embedding for information cascade prediction
Stacked Adversarial Network for Zero-Shot Sketch based Image Retrieval
Adaptive Parameterization for Neural Dialogue Generation
Machine Learning in Quantitative PET Imaging
Harmonic Convolutional Networks based on Discrete Cosine Transform
Privacy Amplification of Iterative Algorithms via Contraction Coefficients
Unsupervised Deep Features for Privacy Image Classification
Information Theory
On the Capacity of the Oversampled Wiener Phase Noise Channel
Pruning Neural Belief Propagation Decoders
Efficient Angle-Domain Processing for FDD-based Cell-free Massive MIMO Systems
Coordinated Multi-Point Transmission: A Poisson-Delaunay Triangulation Based Approach
Enabling Panoramic Full-Angle Reflection via Aerial Intelligent Reflecting Surface
Secure Index Coding with Security Constraints on Receivers
Systematic Maximum Sum Rank Codes
Enabling optimal access and error correction for the repair of Reed-Solomon codes
Physical Layer Security: Authentication, Integrity and Confidentiality
Computational Code-Based Single-Server Private Information Retrieval
Uncertainty of Reconstructing Multiple Messages from Uniform-Tandem-Duplication Noise
Coded Caching with Polynomial Subpacketization
On the Joint Typicality of Permutations of Sequences of Random Variables
Non-Orthogonal Multiple Access with Wireless Caching for 5G-Enabled Vehicular Networks
Privacy-Utility Tradeoff in a Guessing Framework Inspired by Index Coding
Optimal Codes Correcting a Burst of Deletions of Variable Length
On Optimal Multi-user Beam Alignment in Millimeter Wave Wireless Systems
Privacy Amplification of Iterative Algorithms via Contraction Coefficients
Robust Gaussian Joint Source-Channel Coding Under the Near-Zero Bandwidth Regime
Discreteness-aware Receivers for Overloaded MIMO Systems
Serverless Straggler Mitigation using Local Error-Correcting Codes
Q-ary Multi-Mode OFDM with Index Modulation
Finding the Sparsest Vectors in a Subspace: Theory, Algorithms, and Applications
Memory capacity of neural networks with threshold and ReLU activations
Covert Communication in Continuous-Time Systems
On the Minimum Achievable Age of Information for General Service-Time Distributions
Predictability limit of partially observed systems