Computer Vision and Pattern Recognition
Unsupervised Learning of Camera Pose with Compositional Re-estimation
Comments: Accepted to WACV 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We consider the problem of unsupervised camera pose estimation. Given an
input video sequence, our goal is to estimate the camera pose (i.e. the camera
motion) between consecutive frames. Traditionally, this problem is tackled by
placing strict constraints on the transformation vector or by incorporating
optical flow through a complex pipeline. We propose an alternative approach
that utilizes a compositional re-estimation process for camera pose estimation.
Given an input, we first estimate a depth map. Our method then iteratively
estimates the camera motion based on the estimated depth map. Our approach
significantly improves the predicted camera motion both quantitatively and
visually. Furthermore, the re-estimation resolves the problem of
out-of-boundary pixels in a novel and simple way. Another advantage of our
approach is that it is adaptable to other camera pose estimation approaches.
Experimental analysis on KITTI benchmark dataset demonstrates that our method
outperforms existing state-of-the-art approaches in unsupervised camera
ego-motion estimation.
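A minimal structural sketch of the compositional re-estimation loop, as we read it from the abstract, is given below; the depth network, pose network, and warping function are placeholder callables (our assumptions, not the authors' implementation):

import numpy as np

def compose(T_accum, T_delta):
    # Compose two 4x4 homogeneous camera motions.
    return T_delta @ T_accum

def compositional_pose(frame_t, frame_t1, depth_net, pose_net, warp, n_iters=3):
    depth = depth_net(frame_t)                     # single depth estimate, reused each iteration
    T = np.eye(4)                                  # start from the identity motion
    for _ in range(n_iters):
        synth = warp(frame_t, depth, T)            # view frame_t under the current motion estimate
        T = compose(T, pose_net(synth, frame_t1))  # fold the residual motion into the estimate
    return T

# Toy usage with trivial placeholder components (not the paper's networks).
depth_net = lambda img: np.ones(img.shape[:2])
pose_net = lambda a, b: np.eye(4)
warp = lambda img, depth, T: img
print(compositional_pose(np.zeros((4, 4, 3)), np.zeros((4, 4, 3)), depth_net, pose_net, warp))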
Combining PRNU and noiseprint for robust and efficient device source identification
Davide Cozzolino , Francesco Marra , Diego Gragnaniello , Giovanni Poggi , Luisa Verdoliva Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Image and Video Processing (eess.IV)
PRNU-based image processing is a key asset in digital multimedia forensics.
It allows for reliable device identification and effective detection and
localization of image forgeries, in very general conditions. However,
performance degrades significantly in challenging conditions involving low
quality and quantity of data. These include working on compressed and cropped
images, or estimating the camera PRNU pattern based on only a few images. To
boost the performance of PRNU-based analyses in such conditions we propose to
leverage the image noiseprint, a recently proposed camera-model fingerprint
that has proved effective for several forensic tasks. Numerical experiments on
datasets widely used for source identification prove that the proposed method
ensures a significant performance improvement in a wide range of challenging
situations.
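For context, a hedged sketch of the basic PRNU matching step such pipelines build on: correlate a test image's noise residual with the camera's reference pattern. The residual extractor below is a crude Gaussian-filter placeholder, and the paper's fusion with noiseprints is not reproduced:

import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(image):
    # Stand-in denoiser: residual = image minus a low-pass version of itself.
    return image - gaussian_filter(image, sigma=1.0)

def prnu_score(image, reference_pattern):
    # Normalized cross-correlation between the residual and the reference PRNU pattern.
    r = noise_residual(image)
    r = r - r.mean()
    k = reference_pattern - reference_pattern.mean()
    return float((r * k).sum() / (np.linalg.norm(r) * np.linalg.norm(k) + 1e-12))

rng = np.random.default_rng(0)
img, ref = rng.random((128, 128)), rng.random((128, 128))
print(prnu_score(img, ref))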
TailorGAN: Making User-Defined Fashion Designs
Comments: fashion
Journal-ref: 2020 Winter Conference on Applications of Computer Vision
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Attribute editing has become an important and emerging topic of computer
vision. In this paper, we consider a task: given a reference garment image A
and another image B with target attribute (collar/sleeve), generate a
photo-realistic image which combines the texture from reference A and the new
attribute from reference B. The highly convoluted attributes and the lack of
paired data are the main challenges to the task. To overcome those limitations,
we propose a novel self-supervised model to synthesize garment images with
disentangled attributes (e.g., collar and sleeves) without paired data. Our
method consists of a reconstruction learning step and an adversarial learning
step. The model learns texture and location information through reconstruction
learning, and adversarial learning then generalizes this capability to
single-attribute manipulation. Meanwhile, we compose a
new dataset, named GarmentSet, with annotation of landmarks of collars and
sleeves on clean garment images. Extensive experiments on this dataset and
real-world samples demonstrate that our method can synthesize much better
results than the state-of-the-art methods in both quantitative and qualitative
comparisons.
Subjective Annotation for a Frame Interpolation Benchmark using Artifact Amplification
Comments: arXiv admin note: text overlap with arXiv:1901.05362
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Current benchmarks for optical flow algorithms evaluate the estimation either
directly by comparing the predicted flow fields with the ground truth or
indirectly by using the predicted flow fields for frame interpolation and then
comparing the interpolated frames with the actual frames. In the latter case,
objective quality measures such as the mean squared error are typically
employed. However, it is well known that for image quality assessment, the
actual quality experienced by the user cannot be fully deduced from such simple
measures. Hence, we conducted a subjective quality assessment crowdsourcing
study for the interpolated frames provided by one of the optical flow
benchmarks, the Middlebury benchmark. It contains interpolated frames from 155
methods applied to each of 8 contents. We collected forced choice paired
comparisons between interpolated images and corresponding ground truth. To
increase the sensitivity of observers when judging minute differences in paired
comparisons we introduced a new method to the field of full-reference quality
assessment, called artifact amplification. From the crowdsourcing data we
reconstructed absolute quality scale values according to Thurstone’s model. As
a result, we obtained a re-ranking of the 155 participating algorithms w.r.t.
the visual quality of the interpolated frames. This re-ranking not only shows
the necessity of visual quality assessment as another evaluation metric for
optical flow and frame interpolation benchmarks, the results also provide the
ground truth for designing novel image quality assessment (IQA) methods
dedicated to perceptual quality of interpolated images. As a first step, we
proposed such a new full-reference method, called WAE-IQA. By weighting the
local differences between an interpolated image and its ground truth, WAE-IQA
performed slightly better than the currently best FR-IQA approach from the
literature.
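A small sketch of Thurstone Case V reconstruction from a paired-comparison count matrix, one standard way to obtain such scale values (the study's exact reconstruction procedure may differ):

import numpy as np
from scipy.stats import norm

def thurstone_case_v(wins):
    # wins[i, j] = number of times option i was preferred over option j.
    trials = wins + wins.T
    p = np.where(trials > 0, wins / np.maximum(trials, 1), 0.5)  # preference proportions
    p = np.clip(p, 0.01, 0.99)                                   # avoid infinite z-scores
    z = norm.ppf(p)                                               # probit of each proportion
    np.fill_diagonal(z, 0.0)
    return z.mean(axis=1)                                         # scale value per option

# Toy example with three options.
wins = np.array([[0, 8, 9],
                 [2, 0, 6],
                 [1, 4, 0]])
print(thurstone_case_v(wins))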
GraphBGS: Background Subtraction via Recovery of Graph Signals
Jhony H. Giraldo , Thierry Bouwmans Subjects : Computer Vision and Pattern Recognition (cs.CV)
Graph-based algorithms have been successful in approaching the problems of
unsupervised and semi-supervised learning. Recently, the theory of graph signal
processing and semi-supervised learning have been combined leading to new
developments and insights in the field of machine learning. In this paper,
concepts of recovery of graph signals and semi-supervised learning are
introduced in the problem of background subtraction. We propose a new algorithm
named GraphBGS. This method uses Mask R-CNN for instance segmentation;
temporal median filter for background initialization; motion, texture, color,
and structural features for representing the nodes of a graph; k-nearest
neighbors for the construction of the graph; and finally a semi-supervised
method inspired by the theory of recovery of graph signals to solve the
problem of background subtraction. The method is evaluated on the publicly
available change detection, and scene background initialization databases.
Experimental results show that GraphBGS outperforms unsupervised background
subtraction algorithms in some challenges of the change detection dataset. Most
significantly, this method outperforms generative adversarial networks in
unseen videos in some sequences of the scene background initialization
database.
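A minimal sketch of the semi-supervised recovery step over a k-NN graph, in the spirit of the pipeline above; the instance-segmentation seeds, feature extraction, and GraphBGS's exact recovery formulation are not reproduced, and the smoothness-regularized solver below is our own illustrative choice:

import numpy as np
from sklearn.neighbors import kneighbors_graph

def recover_labels(features, seed_idx, seed_vals, k=5, mu=1.0):
    # Recover a smooth label signal over the graph given a few labeled nodes (seeds).
    n = features.shape[0]
    W = kneighbors_graph(features, k, mode="connectivity", include_self=False)
    W = 0.5 * (W + W.T).toarray()                    # symmetrized adjacency
    L = np.diag(W.sum(axis=1)) - W                   # combinatorial graph Laplacian
    M = np.zeros(n); M[seed_idx] = 1.0               # sampling mask on labeled nodes
    y = np.zeros(n); y[seed_idx] = seed_vals
    A = np.diag(M) + mu * L + 1e-6 * np.eye(n)       # smoothness-regularized system
    return np.linalg.solve(A, M * y)                 # threshold, e.g. at 0.5, for fg/bg

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 8))                     # per-node motion/texture/color features
print(recover_labels(feats, seed_idx=[0, 1, 2], seed_vals=[1.0, 1.0, 0.0])[:5])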
Latency-Aware Differentiable Neural Architecture Search
Comments: 11 pages, 7 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Differentiable neural architecture search methods became popular in automated
machine learning, mainly due to their low search costs and flexibility in
designing the search space. However, these methods suffer from difficulty in
network optimization, so that the searched network is often unfriendly to
hardware. This paper deals with this problem by adding a differentiable latency
loss term into the optimization, so that the search process can trade off between
accuracy and latency with a balancing coefficient. The core of latency
prediction is to encode each network architecture and feed it into a
multi-layer regressor, with the training data being collected from randomly
sampling a number of architectures and evaluating them on the hardware. We
evaluate our approach on NVIDIA Tesla-P100 GPUs. With 100K sampled
architectures (requiring a few hours), the latency prediction module arrives at
a relative error lower than 10%. Equipped with this module, the search
method can reduce the latency by 20% while preserving the accuracy. Our
approach can also be transplanted to a wide range of
hardware platforms with very little effort, or used to optimize other
non-differentiable factors such as power consumption.
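The latency term can be sketched as follows: a small multi-layer regressor maps an architecture encoding to a predicted latency, and a balancing coefficient trades it off against the accuracy loss (layer sizes and the encoding dimension here are illustrative assumptions):

import torch
import torch.nn as nn

latency_predictor = nn.Sequential(        # multi-layer regressor over architecture encodings
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

def total_loss(accuracy_loss, arch_encoding, lam=0.1):
    # Accuracy term plus a differentiable latency term weighted by the balancing coefficient.
    predicted_latency = latency_predictor(arch_encoding).mean()
    return accuracy_loss + lam * predicted_latency

enc = torch.randn(1, 64, requires_grad=True)         # toy architecture encoding
loss = total_loss(torch.tensor(2.3), enc)
loss.backward()                                      # gradients flow to the architecture parameters
print(float(loss))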
Comments: Submitted to IEEE Geoscience and Remote Sensing Magazine
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Success of deep neural networks in the framework of remote sensing (RS) image
analysis depends on the availability of a high number of annotated images.
BigEarthNet is a new large-scale Sentinel-2 benchmark archive that has been
recently introduced in RS to advance deep learning (DL) studies. Each image
patch in BigEarthNet is annotated with multi-labels provided by the CORINE Land
Cover (CLC) map of 2018 based on its thematically most detailed Level-3 class
nomenclature. BigEarthNet has enabled data-hungry DL algorithms to reach high
performance in the context of multi-label RS image retrieval and
classification. However, initial research demonstrates that some CLC classes
are challenging to describe accurately by considering only (single-date)
Sentinel-2 images. To further increase the effectiveness of BigEarthNet, in
this paper we introduce an alternative class-nomenclature that allows DL models
to better learn and describe the complex spatial and spectral information
content of the Sentinel-2 images. This is achieved by interpreting and
arranging the CLC Level-3 nomenclature based on the properties of Sentinel-2
images in a new nomenclature of 19 classes. Then, the new class-nomenclature of
BigEarthNet is used within state-of-the-art DL models (namely VGG model at the
depth of 16 and 19 layers [VGG16 and VGG19] and ResNet model at the depth of
50, 101 and 152 layers [ResNet50, ResNet101, ResNet152] as well as K-Branch CNN
model) in the context of multi-label classification. Experimental results show
that the models trained from scratch on BigEarthNet outperform those
pre-trained on ImageNet, especially in relation to some complex classes
including agriculture and other vegetated and natural environments. All DL
models are made publicly available, offering an important resource to guide
future progress on content based image retrieval and scene classification
problems in RS.
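As a rough illustration of the multi-label setup with the 19-class nomenclature, one could train a standard backbone from scratch with a 19-way sigmoid output and binary cross-entropy over multi-hot label vectors; the snippet below uses ResNet-50 on RGB-only patches and is not the authors' exact configuration:

import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights=None)                  # trained from scratch, no ImageNet pre-training
model.fc = nn.Linear(model.fc.in_features, 19)  # one logit per class of the 19-class nomenclature
criterion = nn.BCEWithLogitsLoss()

images = torch.randn(4, 3, 120, 120)            # Sentinel-2 patches (RGB bands only here)
labels = torch.randint(0, 2, (4, 19)).float()   # multi-hot label vectors
loss = criterion(model(images), labels)
loss.backward()
print(float(loss))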
Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks
Comments: Accepted at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 1-1, New York, USA
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Machine Learning (stat.ML)
Ensemble methods, traditionally built with independently trained
de-correlated models, have proven to be efficient methods for reducing the
remaining residual generalization error, which results in robust and accurate
methods for real-world applications. In the context of deep learning, however,
training an ensemble of deep networks is costly and generates high redundancy
which is inefficient. In this paper, we present experiments on Ensembles with
Shared Representations (ESRs) based on convolutional networks to demonstrate,
quantitatively and qualitatively, their data processing efficiency and
scalability to large-scale datasets of facial expressions. We show that
redundancy and computational load can be dramatically reduced by varying the
branching level of the ESR without loss of diversity and generalization power,
which are both important for ensemble performance. Experiments on large-scale
datasets suggest that ESRs reduce the remaining residual generalization error
on the AffectNet and FER+ datasets, reach human-level performance, and
outperform state-of-the-art methods on facial expression recognition in the
wild using emotion and affect concepts.
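A minimal structural sketch of an ensemble with shared representations: one shared convolutional trunk feeds several lightweight branches whose predictions are averaged. The branching level (how deep the shared trunk goes) is the knob discussed above; all sizes here are illustrative:

import torch
import torch.nn as nn

class ESR(nn.Module):
    def __init__(self, n_branches=4, n_classes=8):
        super().__init__()
        self.shared = nn.Sequential(                       # shared low-level trunk
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.branches = nn.ModuleList([
            nn.Sequential(                                 # lightweight branch-specific head
                nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_classes),
            )
            for _ in range(n_branches)
        ])

    def forward(self, x):
        h = self.shared(x)                                 # computed once for all branches
        logits = torch.stack([branch(h) for branch in self.branches])
        return logits.mean(dim=0)                          # ensemble prediction

print(ESR()(torch.randn(2, 3, 96, 96)).shape)              # torch.Size([2, 8])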
Vision Meets Drones: Past, Present and Future
Comments: arXiv admin note: text overlap with arXiv:1804.07437
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Drones, or general UAVs, equipped with cameras have been fast deployed with a
wide range of applications, including agriculture, aerial photography, fast
delivery, and surveillance. Consequently, automatic understanding of visual
data collected from drones becomes highly demanding, bringing computer vision
and drones more and more closely. To promote and track the developments of
object detection and tracking algorithms, we have organized two challenge
workshops in conjunction with European Conference on Computer Vision (ECCV)
2018, and IEEE International Conference on Computer Vision (ICCV) 2019,
attracting more than 100 teams around the world. We provide a large-scale
drone-captured dataset, VisDrone, which includes four tracks, i.e., (1) image object
detection, (2) video object detection, (3) single object tracking, and (4)
multi-object tracking. This paper first presents a thorough review of object
detection and tracking datasets and benchmarks, and discusses the challenges of
collecting large-scale drone-based object detection and tracking datasets with
fully manual annotations. After that, we describe our VisDrone dataset, which
is captured over various urban/suburban areas of 14 different cities across
China from North to South. Being the largest such dataset ever published,
VisDrone enables extensive evaluation and investigation of visual analysis
algorithms on the drone platform. We provide a detailed analysis of the current
state of the field of large-scale object detection and tracking on drones, and
conclude the challenge as well as propose future directions and improvements.
We expect the benchmark to largely boost research and development in video
analysis on drone platforms. All the datasets and experimental results can be
downloaded from the website: this https URL .
Predicting the Physical Dynamics of Unseen 3D Objects
Comments: In Proceedings of Winter Conference on Applications of Computer Vision (WACV) 2020. arXiv admin note: text overlap with arXiv:1901.00466
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG)
Machines that can predict the effect of physical interactions on the dynamics
of previously unseen object instances are important for creating better robots
and interactive virtual worlds. In this work, we focus on predicting the
dynamics of 3D objects on a plane that have just been subjected to an impulsive
force. In particular, we predict the changes in state – 3D position, rotation,
velocities, and stability. Different from previous work, our approach can
generalize dynamics predictions to object shapes and initial conditions that
were unseen during training. Our method takes the 3D object’s shape as a point
cloud and its initial linear and angular velocities as input. We extract shape
features and use a recurrent neural network to predict the full change in state
at each time step. Our model supports training with data from either a physics
engine or the real world. Experiments show that we can accurately predict the
changes in state for unseen object geometries and initial conditions.
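A structural sketch of the described pipeline, assuming a PointNet-style shape encoder and a GRU that emits per-step state changes from shape features and the initial velocities (feature sizes, the state layout, and the rollout length are our assumptions):

import torch
import torch.nn as nn

class DynamicsPredictor(nn.Module):
    def __init__(self, state_dim=13):          # e.g. position (3) + rotation (4) + velocities (6)
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))
        self.rnn = nn.GRU(input_size=128 + 6, hidden_size=128, batch_first=True)
        self.head = nn.Linear(128, state_dim)

    def forward(self, points, init_vel, steps=10):
        shape_feat = self.point_mlp(points).max(dim=1).values   # order-invariant pooling
        inp = torch.cat([shape_feat, init_vel], dim=-1)
        seq = inp.unsqueeze(1).expand(-1, steps, -1)             # repeated per time step
        out, _ = self.rnn(seq)
        return self.head(out)                                    # change in state at each step

pts = torch.randn(2, 1024, 3)                  # one point cloud per object
vel = torch.randn(2, 6)                        # initial linear + angular velocities
print(DynamicsPredictor()(pts, vel).shape)     # torch.Size([2, 10, 13])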
Review: deep learning on 3D point clouds
Saifullahi Aminu Bello , Shangshu Yu , Cheng Wang Subjects : Computer Vision and Pattern Recognition (cs.CV)
A point cloud is a set of points defined in 3D metric space. Point clouds have become
one of the most significant data formats for 3D representation. They are gaining
popularity as a result of the increased availability of acquisition
devices, such as LiDAR, as well as increased application in areas such as
robotics, autonomous driving, augmented and virtual reality. Deep learning is
now the most powerful tool for data processing in computer vision, becoming the
most preferred technique for tasks such as classification, segmentation, and
detection. While deep learning techniques are mainly applied to data with a
structured grid, point clouds are unstructured, which makes applying deep
learning to them directly very challenging. Earlier approaches overcome this challenge by
preprocessing the point cloud into a structured grid format at the cost of
increased computation or loss of depth information. Recently, however,
many state-of-the-art deep learning techniques that operate directly on point
clouds are being developed. This paper contains a survey of the recent
state-of-the-art deep learning techniques that focus mainly on point cloud
data. We first briefly discuss the major challenges faced when using deep
learning directly on point clouds, and we briefly discuss earlier approaches
that overcome these challenges by preprocessing the point cloud into a
structured grid. We then review the various state-of-the-art deep
learning approaches that directly process point clouds in their unstructured form.
We introduce the popular 3D point cloud benchmark datasets and further
discuss the application of deep learning to popular 3D vision tasks,
including classification, segmentation and detection.
Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network
Comments: 11 pages, 3 figures, 16 tables
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Recent studies in image classification have demonstrated a variety of
techniques for improving the performance of Convolutional Neural Networks
(CNNs). However, attempts to combine existing techniques to create a practical
model are still uncommon. In this study, we carry out extensive experiments to
validate that carefully assembling these techniques and applying them to a
basic CNN model in combination can improve the accuracy and robustness of the
model while minimizing the loss of throughput. For example, our proposed
ResNet-50 shows an improvement in top-1 accuracy from 76.3% to 82.78%, and an
mCE improvement from 76.0% to 48.9%, on the ImageNet ILSVRC2012 validation set.
With these improvements, inference throughput only decreases from 536 to 312.
The resulting model significantly outperforms state-of-the-art models with
similar accuracy in terms of mCE and inference throughput. To verify the
performance improvement in transfer learning, fine-grained classification and
image retrieval tasks were tested on several open datasets and showed that the
improvement to backbone network performance boosted transfer learning
performance significantly. Our approach achieved 1st place in the iFood
Competition Fine-Grained Visual Recognition at CVPR 2019, and the source code
and trained models are available at this https URL
SieveNet: A Unified Framework for Robust Image-Based Virtual Try-On
Comments: Accepted at IEEE WACV 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Image-based virtual try-on for fashion has gained considerable attention
recently. The task requires trying on a clothing item on a target model image.
An efficient framework for this is composed of two stages: (1) warping
(transforming) the try-on cloth to align with the pose and shape of the target
model, and (2) a texture transfer module to seamlessly integrate the warped
try-on cloth onto the target model image. Existing methods suffer from
artifacts and distortions in their try-on output. In this work, we present
SieveNet, a framework for robust image-based virtual try-on. Firstly, we
introduce a multi-stage coarse-to-fine warping network to better model
fine-grained intricacies (while transforming the try-on cloth) and train it
with a novel perceptual geometric matching loss. Next, we introduce a try-on
cloth conditioned segmentation mask prior to improve the texture transfer
network. Finally, we also introduce a dueling triplet loss strategy for
training the texture translation network which further improves the quality of
the generated try-on results. We present extensive qualitative and quantitative
evaluations of each component of the proposed pipeline and show significant
performance improvements against the current state-of-the-art method.
Two-Phase Object-Based Deep Learning for Multi-temporal SAR Image Change Detection
Xinzheng Zhang , Guo Liu , Ce Zhang , Peter M Atkinson , Xiaoheng Tan , Xin Jian , Xichuan Zhou , Yongming Li Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Image and Video Processing (eess.IV)
Change detection is one of the fundamental applications of synthetic aperture
radar (SAR) images. However, speckle noise present in SAR images has a strongly
negative effect on change detection. In this research, a novel two-phase
object-based deep learning approach is proposed for multi-temporal SAR image
change detection. Compared with traditional methods, the proposed approach
brings two main innovations. One is to classify all pixels into three
categories rather than two categories: unchanged pixels, changed pixels caused
by strong speckle (false changes), and changed pixels formed by real terrain
variation (real changes). The other is to group neighboring pixels into
superpixel objects so as to exploit local spatial context. Two phases are
designed in the methodology: 1) Generate
objects based on the simple linear iterative clustering (SLIC) algorithm, and
discriminate these objects into changed and unchanged classes using fuzzy
c-means (FCM) clustering and a deep PCANet. The prediction of this phase is the
set of changed and unchanged superpixels. 2) Deep learning on the pixel sets
over the changed superpixels only, obtained in the first phase, to discriminate
real changes from false changes. SLIC is employed again to achieve new
superpixels in the second phase. Low rank and sparse decomposition are applied
to these new superpixels to suppress speckle noise significantly. A further
clustering step is applied to these new superpixels via FCM. A new PCANet is
then trained to classify two kinds of changed superpixels to achieve the final
change maps. Numerical experiments demonstrate that, compared with benchmark
methods, the proposed approach can distinguish real changes from false changes
effectively with significantly reduced false alarm rates, and achieve up to
99.71% change detection accuracy using multi-temporal SAR imagery.
Registration made easy — standalone orthopedic navigation with HoloLens
Comments: 6 pages, 5 figures, accepted at CVPR 2019 workshop on Computer Vision Applications for Mixed Reality Headsets ( this https URL )
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In surgical navigation, finding correspondence between preoperative plan and
intraoperative anatomy, the so-called registration task, is imperative. One
promising approach is to intraoperatively digitize anatomy and register it with
the preoperative plan. State-of-the-art commercial navigation systems implement
such approaches for pedicle screw placement in spinal fusion surgery. Although
these systems improve surgical accuracy, they are not gold standard in clinical
practice. Besides economic reasons, this may be due to their difficult
integration into clinical workflows and unintuitive navigation feedback.
Augmented Reality has the potential to overcome these limitations.
Consequently, we propose a surgical navigation approach comprising
intraoperative surface digitization for registration and intuitive holographic
navigation for pedicle screw placement that runs entirely on the Microsoft
HoloLens. Preliminary results from phantom experiments suggest that the method
may meet clinical accuracy requirements.
Comments: 8 pages, 8 figures, 6 tables
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Optical flow estimation is an important yet challenging problem in the field
of video analytics. The features of different semantics levels/layers of a
convolutional neural network can provide information of different granularity.
To exploit such flexible and comprehensive information, we propose a
semi-supervised Feature Pyramidal Correlation and Residual Reconstruction
Network (FPCR-Net) for optical flow estimation from frame pairs. It consists of
two main modules: pyramid correlation mapping and residual reconstruction. The
pyramid correlation mapping module takes advantage of the multi-scale
correlations of global/local patches by aggregating features of different
scales to form a multi-level cost volume. The residual reconstruction module
aims to reconstruct the sub-band high-frequency residuals of finer optical flow
in each stage. Based on the pyramid correlation mapping, we further propose a
correlation-warping-normalization (CWN) module to efficiently exploit the
correlation dependency. Experimental results show that the proposed scheme
achieves state-of-the-art performance, with improvements of 0.80, 1.15 and
0.10 in average end-point error (AEE) over the competing baseline
methods FlowNet2, LiteFlowNet and PWC-Net, respectively, on the Final pass
of the Sintel dataset.
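For reference, the basic correlation cost volume that pyramid correlation mapping builds on can be sketched as below, with per-pixel dot products over a small displacement window (the displacement range and normalization are illustrative choices, not the paper's exact settings):

import torch
import torch.nn.functional as F

def correlation_volume(f1, f2, max_disp=3):
    # Per-pixel correlations of f1 with f2 shifted over a (2*max_disp+1)^2 window.
    b, c, h, w = f1.shape
    f2 = F.pad(f2, [max_disp] * 4)
    vols = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = f2[:, :, dy:dy + h, dx:dx + w]
            vols.append((f1 * shifted).sum(dim=1, keepdim=True) / c)  # channel-normalized dot product
    return torch.cat(vols, dim=1)               # (B, (2*max_disp+1)^2, H, W) cost volume

f1, f2 = torch.randn(1, 32, 24, 32), torch.randn(1, 32, 24, 32)
print(correlation_volume(f1, f2).shape)         # torch.Size([1, 49, 24, 32])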
Interpreting Galaxy Deblender GAN from the Discriminator's Perspective
Comments: 5 pages, 4 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Image and Video Processing (eess.IV)
Generative adversarial networks (GANs) are well known for their unsupervised
learning capabilities. A recent success in the field of astronomy is deblending
two overlapping galaxy images via a branched GAN model. However, it remains a
significant challenge to comprehend how the network works, which is
particularly difficult for non-expert users. This research focuses on behaviors
of one of the network’s major components, the Discriminator, which plays a
vital role but is often overlooked. Specifically, we enhance the Layer-wise
Relevance Propagation (LRP) scheme to generate a heatmap-based visualization.
We call this technique Polarized-LRP; it consists of two parts, i.e., positive
contribution heatmaps for ground truth images and negative contribution
heatmaps for generated images. Using the Galaxy Zoo dataset we demonstrate that
our method clearly reveals attention areas of the Discriminator when
differentiating generated galaxy images from ground truth images. To connect
the Discriminator’s impact on the Generator, we visualize the gradual changes
of the Generator across the training process. An interesting result we
obtained there is the detection of a problematic data augmentation procedure
that would otherwise have remained hidden. We find that our proposed method serves
as a useful visual analytical tool for a deeper understanding of GAN models.
Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition
Comments: 17 pages, 18 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Affective computing and cognitive theory are widely used in modern
human-computer interaction scenarios. Human faces, as the most prominent and
easily accessible features, have attracted great attention from researchers.
Since humans have rich emotions and developed musculature, there exist a lot of
fine-grained expressions in real-world applications. However, it is extremely
time-consuming to collect and annotate a large number of facial images,
which may even require psychologists to correctly categorize. To the best
of our knowledge, the existing expression datasets are only limited to several
basic facial expressions, which are not sufficient to support our ambitions in
developing successful human-computer interaction systems. To this end, a novel
Fine-grained Facial Expression Database – F2ED is contributed in this paper,
and it includes more than 200k images with 54 facial expressions from 119
persons. Considering that uneven data distribution and lack of
samples are common in real-world scenarios, we further evaluate several tasks of
few-shot expression learning on our F2ED, which are to recognize
facial expressions given only a few training instances. These tasks mimic human
performance to learn robust and general representation from few examples. To
address such few-shot tasks, we propose a unified task-driven framework –
Compositional Generative Adversarial Network (Comp-GAN) learning to synthesize
facial images and thus augmenting the instances of few-shot expression classes.
Extensive experiments are conducted on F2ED and existing facial expression
datasets, i.e., JAFFE and FER2013, to validate the efficacy of our F2ED in
pre-training facial expression recognition network and the effectiveness of our
proposed approach Comp-GAN to improve the performance of few-shot recognition
tasks.
Spatio-Temporal Ranked-Attention Networks for Video Captioning
Anoop Cherian , Jue Wang , Chiori Hori , Tim K. Marks Subjects : Computer Vision and Pattern Recognition (cs.CV)
Generating video descriptions automatically is a challenging task that
involves a complex interplay between spatio-temporal visual features and
language models. Given that videos consist of spatial (frame-level) features
and their temporal evolutions, an effective captioning model should be able to
attend to these different cues selectively. To this end, we propose a
Spatio-Temporal and Temporo-Spatial (STaTS) attention model which, conditioned
on the language state, hierarchically combines spatial and temporal attention
to videos in two different orders: (i) a spatio-temporal (ST) sub-model, which
first attends to regions that have temporal evolution, then temporally pools
the features from these regions; and (ii) a temporo-spatial (TS) sub-model,
which first decides a single frame to attend to, then applies spatial attention
within that frame. We propose a novel LSTM-based temporal ranking function,
which we call ranked attention, for the ST model to capture action dynamics.
Our entire framework is trained end-to-end. We provide experiments on two
benchmark datasets: MSVD and MSR-VTT. Our results demonstrate the synergy
between the ST and TS modules, outperforming recent state-of-the-art methods.
Automatic Discovery of Political Meme Genres with Diverse Appearances
Comments: 16 pages, 10 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Social and Information Networks (cs.SI)
Forms of human communication are not static — we expect some evolution in
the way information is conveyed over time because of advances in technology.
One example of this phenomenon is the image-based meme, which has emerged as a
dominant form of political messaging in the past decade. While originally used
to spread jokes on social media, memes are now having an outsized impact on
public perception of world events. A significant challenge in automatic meme
analysis has been the development of a strategy to match memes from within a
single genre when the appearances of the images vary. Such variation is
especially common in memes exhibiting mimicry, for example when voters perform
a common hand gesture to signal their support for a candidate. In this paper we
introduce a scalable automated visual recognition pipeline for discovering
political meme genres of diverse appearance. This pipeline can ingest meme
images from a social network, apply computer vision-based techniques to extract
local features and index new images into a database, and then organize the
memes into related genres. To validate this approach, we perform a large case
study on the 2019 Indonesian Presidential Election using a new dataset of over
two million images collected from Twitter and Instagram. Results show that this
approach can discover new meme genres with visually diverse images that share
common stylistic elements, paving the way forward for further work in semantic
analysis and content attribution.
On-Device Information Extraction from Screenshots in form of tags
Sumit Kumar , Gopi Ramena , Manoj Goyal , Debi Mohanty , Ankur Agarwal , Benu Changmai , Sukumar Moharana Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Computation and Language (cs.CL); Information Retrieval (cs.IR)
We propose a method to make mobile screenshots easily searchable. In this
paper, we present the workflow in which we: 1) preprocessed a collection of
screenshots, 2) identified the script present in the image, 3) extracted unstructured
text from images, 4) identified the language of the extracted text, 5) extracted
keywords from the text, 6) identified tags based on image features, 7) expanded
tag set by identifying related keywords, 8) inserted image tags with relevant
images after ranking and indexed them to make them searchable on device. We built
the pipeline to support multiple languages and executed it on-device, which
addresses privacy concerns. We developed novel architectures for components in
the pipeline, optimized performance and memory for on-device computation. We
observed from experimentation that the developed solution can reduce overall
user effort and improve the end-user experience while searching, and we report
the corresponding results.
Tracking of Micro Unmanned Aerial Vehicles: A Comparative Study
Comments: In proceedings of the International Conference on Artificial Intelligence and Applied Mathematics in Engineering (ICAIAME 2019), 13 pages, 9 Figures
Journal-ref: F. Gökçe. Tracking of Micro Unmanned Aerial Vehicles: A
Comparative Study. In Proceedings of the International Conference on
Artificial Intelligence and Applied Mathematics in Engineering, Antalya,
Turkey, 20-22 Apr. 2019, pp.374-386
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Robotics (cs.RO)
Micro unmanned aerial vehicles (mUAVs) have become very common in recent years. As
a result of their widespread use, when they are flown illegally by hobbyists,
they pose crucial risks and need to be sensed by security systems.
Furthermore, sensing mUAVs is also essential for swarm robotics
research, where the individuals in a flock of robots require systems to sense
and localize each other for coordinated operation. In order to obtain such
systems, there are studies to detect mUAVs utilizing different sensing mediums,
such as vision, infrared and sound signals, and small-scale radars. However,
there are still challenges that await handling in this field, such as
integrating tracking approaches into vision-based detection systems to
improve accuracy and computational complexity. For this reason, in this study,
we combine various tracking approaches with a vision-based mUAV detection system
available in the literature, in order to evaluate the different tracking approaches
in terms of accuracy, as well as to investigate the effect of such integration
on the computational cost.
Increasing the robustness of DNNs against image corruptions by playing the Game of Noise
Evgenia Rusak , Lukas Schott , Roland Zimmermann , Julian Bitterwolf , Oliver Bringmann , Matthias Bethge , Wieland Brendel Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
The human visual system is remarkably robust against a wide range of
naturally occurring variations and corruptions like rain or snow. In contrast,
the performance of modern image recognition models strongly degrades when
evaluated on previously unseen corruptions. Here, we demonstrate that a simple
but properly tuned training with additive Gaussian and Speckle noise
generalizes surprisingly well to unseen corruptions, easily reaching the
previous state of the art on the corruption benchmark ImageNet-C (with
ResNet50) and on MNIST-C. We build on top of these strong baseline results and
show that an adversarial training of the recognition model against uncorrelated
worst-case noise distributions leads to an additional increase in performance.
This regularization can be combined with previously proposed defense methods
for further improvement.
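The baseline described above can be sketched as a simple training-time augmentation that randomly applies additive Gaussian or multiplicative speckle noise (the noise levels below are illustrative; the paper tunes them carefully):

import numpy as np

def augment_with_noise(image, rng, sigma_gauss=0.1, sigma_speckle=0.2):
    # image: float array scaled to [0, 1]; apply either Gaussian or speckle noise.
    if rng.random() < 0.5:
        noisy = image + rng.normal(0.0, sigma_gauss, image.shape)             # additive Gaussian
    else:
        noisy = image * (1.0 + rng.normal(0.0, sigma_speckle, image.shape))   # multiplicative speckle
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
print(augment_with_noise(img, rng).shape)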
Modality-Balanced Models for Visual Dialogue
Comments: AAAI 2020 (11 pages)
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
The Visual Dialog task requires a model to exploit both image and
conversational context information to generate the next response to the
dialogue. However, via manual analysis, we find that a large number of
conversational questions can be answered by only looking at the image without
any access to the context history, while others still need the conversation
context to predict the correct answers. We demonstrate that due to this reason,
previous joint-modality (history and image) models over-rely on and are more
prone to memorizing the dialogue history (e.g., by extracting certain keywords
or patterns in the context information), whereas image-only models are more
generalizable (because they cannot memorize or extract keywords from history)
and perform substantially better at the primary normalized discounted
cumulative gain (NDCG) task metric which allows multiple correct answers.
Hence, this observation encourages us to explicitly maintain two models, i.e.,
an image-only model and an image-history joint model, and combine their
complementary abilities for a more balanced multimodal model. We present
multiple methods for this integration of the two models, via ensemble and
consensus dropout fusion with shared parameters. Empirically, our models
achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and
high balance across metrics), and substantially outperform the winner of the
Visual Dialog challenge 2018 on most metrics.
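A hedged sketch of the simplest of the described combinations: averaging the per-candidate answer scores of the image-only model and the image-history joint model at inference time (the consensus dropout fusion variant with shared parameters is not reproduced here):

import torch

def ensemble_scores(image_only_logits, joint_logits, weight=0.5):
    # Blend per-candidate answer scores from the two models.
    p_img = torch.softmax(image_only_logits, dim=-1)
    p_joint = torch.softmax(joint_logits, dim=-1)
    return weight * p_img + (1.0 - weight) * p_joint

scores = ensemble_scores(torch.randn(2, 100), torch.randn(2, 100))
print(scores.argsort(dim=-1, descending=True)[:, :5])   # top-5 ranked candidate answers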
Tethered Aerial Visual Assistance
Comments: Submitted to special issue of “Field and Service Robotics” of the Journal of Field Robotics (JFR). arXiv admin note: text overlap with arXiv:1904.00078
Subjects:
Robotics (cs.RO)
; Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
In this paper, an autonomous tethered Unmanned Aerial Vehicle (UAV) is
developed into a visual assistant in a marsupial co-robots team, collaborating
with a tele-operated Unmanned Ground Vehicle (UGV) for robot operations in
unstructured or confined environments. These environments pose extreme
challenges to the remote tele-operator due to the lack of sufficient
situational awareness, mostly caused by the unstructuredness and confinement,
stationary and limited field-of-view and lack of depth perception from the
robot’s onboard cameras. To overcome these problems, a secondary tele-operated
robot is used in current practice, which acts as a visual assistant and provides
external viewpoints to overcome the perceptual limitations of the primary
robot’s onboard sensors. However, a second tele-operated robot requires extra
manpower and teamwork demand between primary and secondary operators. The
manually chosen viewpoints tend to be subjective and sub-optimal. Considering
these intricacies, we develop an autonomous tethered aerial visual assistant in
place of the secondary tele-operated robot and operator, reducing the human-robot
ratio from 2:2 to 1:2. Using a fundamental viewpoint quality theory, a formal
risk reasoning framework, and a newly developed tethered motion suite, our
visual assistant is able to autonomously navigate to good-quality viewpoints in
a risk-aware manner through unstructured or confined spaces with a tether. The
developed marsupial co-robots team could improve tele-operation efficiency in
nuclear operations, bomb squad, disaster robots, and other domains with novel
tasks or highly occluded environments, by reducing manpower and teamwork
demand, and achieving better visual assistance quality with trustworthy
risk-aware motion.
DeepSUM++: Non-local Deep Neural Network for Super-Resolution of Unregistered Multitemporal Images
Comments: arXiv admin note: text overlap with arXiv:1907.06490
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Deep learning methods for super-resolution of a remote sensing scene from
multiple unregistered low-resolution images have recently gained attention
thanks to a challenge proposed by the European Space Agency. This paper
presents an evolution of the winner of the challenge, showing how incorporating
non-local information in a convolutional neural network makes it possible to exploit
self-similar patterns that provide enhanced regularization of the
super-resolution problem. Experiments on the dataset of the challenge show
improved performance over the state-of-the-art, which does not exploit
non-local information.
Zhenbing Zhao , Hongyu Qi , Yincheng Qi , Ke Zhang , Yongjie Zhai , Wenqing Zhao Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV)
Bolts are the most numerous fasteners in transmission lines and are prone to
losing their split pins. How to realize the automatic pin-missing defect
detection of bolts in transmission lines so as to achieve timely and efficient
troubleshooting is a difficult problem and a long-term research target of
power systems. In this paper, an automatic detection model called Automatic
Visual Shape Clustering Network (AVSCNet) for pin-missing defect is
constructed. Firstly, an unsupervised clustering method for the visual shapes
of bolts is proposed and applied to construct a defect detection model which
can learn the difference of visual shape. Next, three deep convolutional neural
network optimization methods are used in the model: the feature enhancement,
feature fusion and region feature extraction. The defect detection results are
obtained by applying the regression calculation and classification to the
regional features. In this paper, object detection models based on different
networks are used to test a pin-missing defect dataset constructed from
aerial images of transmission lines from multiple locations, and they are
evaluated with various indicators and fully verified. The results show that
our method achieves considerably satisfactory detection performance.
Sideways: Depth-Parallel Training of Video Models
Mateusz Malinowski , Grzegorz Swirszcz , Joao Carreira , Viorica Patraucean Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We propose Sideways, an approximate backpropagation scheme for training video
models. In standard backpropagation, the gradients and activations at every
computation step through the model are temporally synchronized. The forward
activations need to be stored until the backward pass is executed, preventing
inter-layer (depth) parallelization. However, can we leverage smooth, redundant
input streams such as videos to develop a more efficient training scheme? Here,
we explore an alternative to backpropagation; we overwrite network activations
whenever new ones, i.e., from new frames, become available. Such a more gradual
accumulation of information from both passes breaks the precise correspondence
between gradients and activations, leading to theoretically more noisy weight
updates. Counter-intuitively, we show that Sideways training of deep
convolutional video networks not only still converges, but can also potentially
exhibit better generalization compared to standard synchronized
backpropagation.
FedVision: An Online Visual Object Detection Platform Powered by Federated Learning
Yang Liu , Anbu Huang , Yun Luo , He Huang , Youzhi Liu , Yuanyuan Chen , Lican Feng , Tianjian Chen , Han Yu , Qiang Yang Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Visual object detection is a computer vision-based artificial intelligence
(AI) technique which has many practical applications (e.g., fire hazard
monitoring). However, due to privacy concerns and the high cost of transmitting
video data, it is highly challenging to build object detection models on
centrally stored large training datasets following the current approach.
Federated learning (FL) is a promising approach to resolve this challenge.
Nevertheless, there is currently no easy-to-use tool to enable computer
vision application developers who are not experts in federated learning to
conveniently leverage this technology and apply it in their systems. In this
paper, we report FedVision – a machine learning engineering platform to support
the development of federated learning powered computer vision applications. The
platform has been deployed through a collaboration between WeBank and Extreme
Vision to help customers develop computer vision-based safety monitoring
solutions in smart city applications. Over four months of usage, it has
achieved significant efficiency improvement and cost reduction while removing
the need to transmit sensitive data for three major corporate customers. To the
best of our knowledge, this is the first real application of FL in computer
vision-based tasks.
Spatiotemporal Camera-LiDAR Calibration: A Targetless and Structureless Approach
Comments: 8 pages, To appear, IEEE Robotics and Automation Letters 2020
Subjects:
Robotics (cs.RO)
; Computer Vision and Pattern Recognition (cs.CV)
The demand for multimodal sensing systems for robotics is growing due to the
increase in robustness, reliability and accuracy offered by these systems.
These systems also need to be spatially and temporally co-registered to be
effective. In this paper, we propose a targetless and structureless
spatiotemporal camera-LiDAR calibration method. Our method combines a
closed-form solution with a modified structureless bundle adjustment where the
coarse-to-fine approach does not {require} an initial guess on the
spatiotemporal parameters. Also, as 3D features (structure) are calculated from
triangulation only, there is no need to have a calibration target or to match
2D features with the 3D point cloud which provides flexibility in the
calibration process and sensor configuration. We demonstrate the accuracy and
robustness of the proposed method through both simulation and real data
experiments using multiple sensor payload configurations mounted to hand-held,
aerial and legged robot systems. Also, qualitative results are given in the
form of a colorized point cloud visualization.
An adversarial learning framework for preserving users' anonymity in face-based emotion recognition
Vansh Narula , Zhangyang (Atlas) Wang , Theodora Chaspari Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Image and video-capturing technologies have permeated our everyday life.
Such technologies can continuously monitor individuals’ expressions in
real-life settings, affording us new insights into their emotional states and
transitions, thus paving the way to novel well-being and healthcare
applications. Yet, due to the strong privacy concerns, the use of such
technologies is met with strong skepticism, since current face-based emotion
recognition systems relying on deep learning techniques tend to preserve
substantial information related to the identity of the user, apart from the
emotion-specific information. This paper proposes an adversarial learning
framework which relies on a convolutional neural network (CNN) architecture
trained through an iterative procedure for minimizing identity-specific
information and maximizing emotion-dependent information. The proposed approach
is evaluated through emotion classification and face identification metrics,
and is compared against two CNNs, one trained solely for emotion recognition
and the other trained solely for face identification. Experiments are performed
using the Yale Face Dataset and Japanese Female Facial Expression Database.
Results indicate that the proposed approach can learn a convolutional
transformation for preserving emotion recognition accuracy and degrading face
identity recognition, providing a foundation toward privacy-aware emotion
recognition technologies.
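A structural sketch of the adversarial objective: a shared encoder is pushed to retain emotion information while discarding identity information, written here as a weighted difference of the two cross-entropy terms (the paper trains this with an iterative procedure; the architecture and class counts below are illustrative):

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
emotion_head = nn.Linear(16, 7)     # e.g. 7 emotion classes
identity_head = nn.Linear(16, 15)   # e.g. 15 subjects
ce = nn.CrossEntropyLoss()

def encoder_loss(images, emotions, identities, lam=1.0):
    # Reward emotion prediction and penalize identity prediction on the shared representation.
    z = encoder(images)
    return ce(emotion_head(z), emotions) - lam * ce(identity_head(z), identities)

imgs = torch.randn(8, 1, 64, 64)
loss = encoder_loss(imgs, torch.randint(0, 7, (8,)), torch.randint(0, 15, (8,)))
loss.backward()
print(float(loss))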
Comments: 6 pages, Accepted and to appear in ISQED 2020
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
In this paper, we propose Code-Bridged Classifier (CBC), a framework for
making a Convolutional Neural Network (CNN) robust against adversarial attacks
without increasing, or even while decreasing, the model’s overall computational
complexity. More specifically, we propose a stacked encoder-convolutional
model, in which the input image is first encoded by the encoder module of a
denoising auto-encoder, and then the resulting latent representation (without
being decoded) is fed to a reduced complexity CNN for image classification. We
illustrate that this network not only is more robust to adversarial examples
but also has a significantly lower computational complexity when compared to
the prior art defenses.
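The stacked encoder-classifier idea can be sketched as follows: the encoder half of a denoising auto-encoder maps the input to a latent code, and a reduced-complexity CNN classifies that code directly, without decoding (layer sizes are illustrative assumptions):

import torch
import torch.nn as nn

encoder = nn.Sequential(                                   # encoder half of a denoising auto-encoder
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
)
classifier = nn.Sequential(                                # reduced-complexity CNN over the latent code
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
)

x = torch.randn(4, 3, 32, 32)
logits = classifier(encoder(x))                            # the latent code is fed directly, never decoded
print(logits.shape)                                        # torch.Size([4, 10])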
Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning
Paola Cascante-Bonilla , Fuwen Tan , Yanjun Qi , Vicente Ordonez Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Semi-supervised learning aims to take advantage of a large amount of
unlabeled data to improve the accuracy of a model that only has access to a
small number of labeled examples. We propose curriculum labeling, an approach
that exploits pseudo-labeling for propagating labels to unlabeled samples in an
iterative and self-paced fashion. This approach is surprisingly simple and
effective and surpasses or is comparable with the best methods proposed in the
recent literature across all the standard benchmarks for image classification.
Notably, we obtain 94.91% accuracy on CIFAR-10 using only 4,000 labeled
samples, and 88.56% top-5 accuracy on ImageNet-ILSVRC using 128,000 labeled
samples. In contrast to prior works, our approach shows improvements even in a
more realistic scenario that leverages out-of-distribution unlabeled data
samples.
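A hedged sketch of the self-paced pseudo-labeling loop: each round retrains the model, then promotes only the most confident unlabeled predictions (a growing percentile) to pseudo-labels; the thresholding schedule and the scikit-learn stand-in model are our assumptions:

import numpy as np
from sklearn.linear_model import LogisticRegression

def curriculum_labeling(X_lab, y_lab, X_unlab, fit, predict_proba, rounds=5):
    X_cur, y_cur = X_lab.copy(), y_lab.copy()
    for r in range(1, rounds + 1):
        model = fit(X_cur, y_cur)                               # retrain from scratch each round
        proba = predict_proba(model, X_unlab)
        conf, pseudo = proba.max(axis=1), proba.argmax(axis=1)
        thresh = np.percentile(conf, 100 - 100 * r / rounds)    # admit a growing fraction of samples
        keep = conf >= thresh
        X_cur = np.concatenate([X_lab, X_unlab[keep]])          # labeled data plus confident pseudo-labels
        y_cur = np.concatenate([y_lab, pseudo[keep]])
    return fit(X_cur, y_cur)

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(20, 5)), np.array([0, 1] * 10)
X_unlab = rng.normal(size=(200, 5))
fit = lambda X, y: LogisticRegression(max_iter=200).fit(X, y)
final = curriculum_labeling(X_lab, y_lab, X_unlab, fit, lambda m, X: m.predict_proba(X))
print(final.score(X_lab, y_lab))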
Artificial Intelligence
Fast Compliance Checking with General Vocabularies
Comments: arXiv admin note: substantial text overlap with arXiv:2001.05390
Subjects:
Artificial Intelligence (cs.AI)
We address the problem of complying with the GDPR while processing and
transferring personal data on the web. For this purpose we introduce an
extensible profile of OWL2 for representing data protection policies. With this
language, a company’s data usage policy can be checked for compliance with data
subjects’ consent and with a formalized fragment of the GDPR by means of
subsumption queries. The outer structure of the policies is restricted in order
to make compliance checking highly scalable, as required when processing
high-frequency data streams or large data volumes. However, the vocabularies
for specifying policy properties can be chosen rather freely from expressive
Horn fragments of OWL2. We exploit IBQ reasoning to integrate specialized
reasoners for the policy language and the vocabulary’s language. Our
experiments show that this approach significantly improves performance.
Visual Simplified Characters' Emotion Emulator Implementing OCC Model
Comments: 7 pages, 14 figures, 2 tables
Journal-ref: CGST Conference on Computer Science and Engineering, Istanbul,
Turkey, 19-21 December 2011
Subjects:
Artificial Intelligence (cs.AI)
In this paper, we present a visual emulator of the emotions seen in
characters in stories. This system is based on a simplified view of the
cognitive structure of emotions proposed by Ortony, Clore and Collins (OCC
Model). The goal of this paper is to provide a visual platform that allows us
to observe changes in the characters’ different emotions, and the intricate
interrelationships between: 1) each character’s emotions, 2) their affective
relationships and actions, 3) the events that take place in the development of
a plot, and 4) the objects of desire that make up the emotional map of any
story. This tool was tested on stories with a contrasting variety of emotional
and affective environments: Othello, Twilight, and Harry Potter, behaving
sensibly and in keeping with the atmosphere in which the characters were
immersed.
A Critical Look at the Applicability of Markov Logic Networks for Music Signal Analysis
Comments: Accepted for presentation at the Ninth International Workshop on Statistical Relational AI (StarAI 2020) at the 34th AAAI Conference on Artificial Intelligence (AAAI) in New York, on February 7th 2020
Subjects:
Artificial Intelligence (cs.AI)
; Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
In recent years, Markov logic networks (MLNs) have been proposed as a
potentially useful paradigm for music signal analysis. Because all hidden
Markov models can be reformulated as MLNs, the latter can provide an
all-encompassing framework that reuses and extends previous work in the field.
However, the fact that it is theoretically possible to reformulate previous work
as MLNs does not mean that doing so is advantageous. In this paper, we analyse some
proposed examples of MLNs for musical analysis and consider their practical
disadvantages when compared to formulating the same musical dependence
relationships as (dynamic) Bayesian networks. We argue that a number of
practical hurdles such as the lack of support for sequences and for arbitrary
continuous probability distributions make MLNs less than ideal for the proposed
musical applications, both in terms of ease of formulation and computational
requirements due to their required inference algorithms. These conclusions are
not specific to music, but apply to other fields as well, especially when
sequential data with continuous observations is involved. Finally, we show that
the ideas underlying the proposed examples can be expressed perfectly well in
the more commonly used framework of (dynamic) Bayesian networks.
Plato Dialogue System: A Flexible Conversational AI Research Platform
Alexandros Papangelis , Mahdi Namazifar , Chandra Khatri , Yi-Chia Wang , Piero Molino , Gokhan Tur Subjects : Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
As the field of Spoken Dialogue Systems and Conversational AI grows, so does
the need for tools and environments that abstract away implementation details
in order to expedite the development process, lower the barrier of entry to the
field, and offer a common test-bed for new ideas. In this paper, we present
Plato, a flexible Conversational AI platform written in Python that supports
any kind of conversational agent architecture, from standard architectures to
architectures with jointly-trained components, single- or multi-party
interactions, and offline or online training of any conversational agent
component. Plato has been designed to be easy to understand and debug and is
agnostic to the underlying learning frameworks that train each component.
Comments: 6 pages; 1 figure
Subjects:
Software Engineering (cs.SE)
; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Robotics (cs.RO)
When developing autonomous systems, engineers and other stakeholders make
great effort to prepare the system for all foreseeable events and conditions.
However, these systems are still bound to encounter events and conditions that
were not considered at design time. For reasons like safety, cost, or ethics,
it is often highly desired that these new situations be handled correctly upon
first encounter. In this paper we first justify our position that there will
always exist unpredicted events and conditions, driven among others by: new
inventions in the real world; the diversity of world-wide system deployments
and uses; and, the non-negligible probability that multiple seemingly unlikely
events, which may be neglected at design time, will not only occur, but occur
together. We then argue that despite this unpredictability property, handling
these events and conditions is indeed possible. Hence, we offer and exemplify
design principles that when applied in advance, can enable systems to deal, in
the future, with unpredicted circumstances. We conclude with a discussion of
how this work and a broader theoretical study of the unexpected can contribute
toward a foundation of engineering principles for developing trustworthy
next-generation autonomous systems.
User-in-the-loop Adaptive Intent Detection for Instructable Digital Assistant
Comments: To be published as a conference paper in the proceedings of IUI’20
Journal-ref: 25th International Conference on Intelligent User Interfaces (IUI
’20), March 17–20, 2020, Cagliari, Italy
Subjects:
Human-Computer Interaction (cs.HC)
; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
People are becoming increasingly comfortable using Digital Assistants (DAs)
to interact with services or connected objects. However, for non-programming
users, the available possibilities for customizing their DA are limited and do
not include the possibility of teaching the assistant new tasks. To make the
most of the potential of DAs, users should be able to customize assistants by
instructing them through Natural Language (NL). To provide such
functionalities, NL interpretation in traditional assistants should be
improved: (1) The intent identification system should be able to recognize new
forms of known intents, and to acquire new intents as they are expressed by the
user. (2) In order to be adaptive to novel intents, the Natural Language
Understanding module should be sample efficient, and should not rely on a
pretrained model. Rather, the system should continuously collect the training
data as it learns new intents from the user. In this work, we propose AidMe
(Adaptive Intent Detection in Multi-Domain Environments), a user-in-the-loop
adaptive intent detection framework that allows the assistant to adapt to its
user by learning their intents as the interaction progresses. AidMe builds its
repertoire of intents and collects data to train a model of semantic similarity
evaluation that can discriminate between the learned intents and autonomously
discover new forms of known intents. AidMe addresses two major issues – intent
learning and user adaptation – for instructable digital assistants. We
demonstrate the capabilities of AidMe as a standalone system by comparing it
with a one-shot learning system and a pretrained NLU module through simulations
of interactions with a user. We also show how AidMe can smoothly integrate into
an existing instructable digital assistant.
Information Retrieval
On-Device Information Extraction from Screenshots in form of tags
Sumit Kumar , Gopi Ramena , Manoj Goyal , Debi Mohanty , Ankur Agarwal , Benu Changmai , Sukumar Moharana Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Computation and Language (cs.CL); Information Retrieval (cs.IR)
We propose a method to make mobile screenshots easily searchable. In this
paper, we present a workflow in which we: 1) preprocess a collection of
screenshots, 2) identify the script present in the image, 3) extract
unstructured text from the images, 4) identify the language of the extracted
text, 5) extract keywords from the text, 6) identify tags based on image
features, 7) expand the tag set by identifying related keywords, and 8) insert
image tags with relevant images after ranking and index them to make them
searchable on device. The pipeline supports multiple languages and executes
on-device, which addresses privacy concerns. We developed novel architectures
for the components in the pipeline and optimized performance and memory for
on-device computation. Our experiments show that the developed solution reduces
overall user effort and improves the end-user experience while searching.
A Critical Look at the Applicability of Markov Logic Networks for Music Signal Analysis
Comments: Accepted for presentation at the Ninth International Workshop on Statistical Relational AI (StarAI 2020) at the 34th AAAI Conference on Artificial Intelligence (AAAI) in New York, on February 7th 2020
Subjects:
Artificial Intelligence (cs.AI)
; Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
In recent years, Markov logic networks (MLNs) have been proposed as a
potentially useful paradigm for music signal analysis. Because all hidden
Markov models can be reformulated as MLNs, the latter can provide an
all-encompassing framework that reuses and extends previous work in the field.
However, the fact that it is theoretically possible to reformulate previous
work as MLNs does not mean that doing so is advantageous. In this paper, we analyse some
proposed examples of MLNs for musical analysis and consider their practical
disadvantages when compared to formulating the same musical dependence
relationships as (dynamic) Bayesian networks. We argue that a number of
practical hurdles such as the lack of support for sequences and for arbitrary
continuous probability distributions make MLNs less than ideal for the proposed
musical applications, both in terms of ease of formulation and computational
requirements due to their required inference algorithms. These conclusions are
not specific to music, but apply to other fields as well, especially when
sequential data with continuous observations is involved. Finally, we show that
the ideas underlying the proposed examples can be expressed perfectly well in
the more commonly used framework of (dynamic) Bayesian networks.
Computation and Language
A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings
Iker García , Rodrigo Agerri , German Rigau Subjects : Computation and Language (cs.CL)
This paper presents a new technique for creating monolingual and
cross-lingual meta-embeddings. Our method integrates multiple word embeddings
created from complementary techniques, textual sources, knowledge bases and
languages. Existing word vectors are projected to a common semantic space using
linear transformations and averaging. With our method the resulting
meta-embeddings maintain the dimensionality of the original embeddings without
losing information while dealing with the out-of-vocabulary problem. An
extensive empirical evaluation demonstrates the effectiveness of our technique
with respect to previous work on various intrinsic and extrinsic multilingual
evaluations, obtaining competitive results for Semantic Textual Similarity and
state-of-the-art performance for word similarity and POS tagging (English and
Spanish). The resulting cross-lingual meta-embeddings also exhibit excellent
cross-lingual transfer learning capabilities. In other words, we can leverage
pre-trained source embeddings from a resource-rich language in order to improve
the word representations for under-resourced languages.
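A minimal sketch of the projection-and-averaging idea described above (not the authors' exact procedure): one embedding space is mapped into the other with a least-squares linear transformation fitted on the shared vocabulary, and the aligned vectors are averaged. The fitting step, variable names, and toy data are assumptions made only for this illustration.

    import numpy as np

    def fit_projection(src, ref):
        # Least-squares linear map W such that src @ W approximates ref,
        # fitted on the vocabulary shared by both embeddings.
        W, _, _, _ = np.linalg.lstsq(src, ref, rcond=None)
        return W

    def meta_embed(emb_a, emb_b, shared_idx_a, shared_idx_b):
        # Project embedding B into the space of embedding A using the shared
        # vocabulary, then average where both embeddings are available.
        W = fit_projection(emb_b[shared_idx_b], emb_a[shared_idx_a])
        proj_b = emb_b @ W
        meta = 0.5 * (emb_a[shared_idx_a] + proj_b[shared_idx_b])
        return meta  # same dimensionality as the original embeddings

    # Toy usage: two 200-word vocabularies with the first 100 words shared.
    rng = np.random.default_rng(0)
    emb_a = rng.normal(size=(200, 50))
    emb_b = rng.normal(size=(200, 50))
    shared = np.arange(100)
    print(meta_embed(emb_a, emb_b, shared, shared).shape)  # (100, 50)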
Modality-Balanced Models for Visual Dialogue
Comments: AAAI 2020 (11 pages)
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
The Visual Dialog task requires a model to exploit both image and
conversational context information to generate the next response to the
dialogue. However, via manual analysis, we find that a large number of
conversational questions can be answered by only looking at the image without
any access to the context history, while others still need the conversation
context to predict the correct answers. We demonstrate that due to this reason,
previous joint-modality (history and image) models over-rely on and are more
prone to memorizing the dialogue history (e.g., by extracting certain keywords
or patterns in the context information), whereas image-only models are more
generalizable (because they cannot memorize or extract keywords from history)
and perform substantially better at the primary normalized discounted
cumulative gain (NDCG) task metric which allows multiple correct answers.
Hence, this observation encourages us to explicitly maintain two models, i.e.,
an image-only model and an image-history joint model, and combine their
complementary abilities for a more balanced multimodal model. We present
multiple methods for this integration of the two models, via ensemble and
consensus dropout fusion with shared parameters. Empirically, our models
achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and
high balance across metrics), and substantially outperform the winner of the
Visual Dialog challenge 2018 on most metrics.
A Hybrid Solution to Learn Turn-Taking in Multi-Party Service-based Chat Groups
Comments: arXiv admin note: text overlap with arXiv:1907.02090
Subjects:
Computation and Language (cs.CL)
; Formal Languages and Automata Theory (cs.FL)
To predict the next most likely participant to interact in a multi-party
conversation is a difficult problem. In a text-based chat group, the only
information available is the sender, the content of the text and the dialogue
history. In this paper we present our study on how this information can be
used on the prediction task through a corpus and architecture that integrates
turn-taking classifiers based on Maximum Likelihood Expectation (MLE),
Convolutional Neural Networks (CNN) and Finite State Automata (FSA). The corpus
is a synthetic adaptation of the Multi-Domain Wizard-of-Oz dataset (MultiWOZ)
to a scenario with multiple travel service-based bots and dialogue errors, and
was created to simulate user interactions and evaluate the architecture. We
present experimental results which show that the CNN approach achieves better
performance than the baseline, with an accuracy of 92.34%, while the integrated
solution combining MLE, CNN and FSA performs even better, reaching 95.65%.
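To make the Maximum Likelihood Expectation component concrete (the CNN and FSA parts are omitted), a simple bigram formulation can estimate, from the dialogue history, the most likely next participant given the current sender; the bigram form and speaker names below are assumptions for illustration only.

    from collections import Counter, defaultdict

    def fit_mle_turn_taking(dialogues):
        # dialogues: list of speaker-id sequences, e.g. [["user", "hotel_bot", ...], ...]
        counts = defaultdict(Counter)
        for seq in dialogues:
            for prev, nxt in zip(seq, seq[1:]):
                counts[prev][nxt] += 1
        return counts

    def predict_next_speaker(counts, current_speaker):
        # Most likely next participant given the current sender.
        options = counts.get(current_speaker)
        return options.most_common(1)[0][0] if options else None

    dialogues = [["user", "hotel_bot", "user", "taxi_bot"],
                 ["user", "hotel_bot", "user", "hotel_bot"]]
    model = fit_mle_turn_taking(dialogues)
    print(predict_next_speaker(model, "user"))  # 'hotel_bot'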
RobBERT: a Dutch RoBERTa-based Language Model
Comments: 7 pages, 2 tables
Subjects:
Computation and Language (cs.CL)
; Machine Learning (cs.LG)
Pre-trained language models have been dominating the field of natural
language processing in recent years, and have led to significant performance
gains for various complex natural language tasks. One of the most prominent
pre-trained language models is BERT (Bi-directional Encoders for Transformers),
which was released as an English as well as a multilingual version. Although
multilingual BERT performs well on many tasks, recent studies showed that BERT
models trained on a single language significantly outperform the multilingual
results. Training a Dutch BERT model thus has a lot of potential for a wide
range of Dutch NLP tasks. While previous approaches have used earlier
implementations of BERT to train their Dutch BERT, we used RoBERTa, a robustly
optimized BERT approach, to train a Dutch language model called RobBERT. We
show that RobBERT improves state of the art results in Dutch-specific language
tasks, and also outperforms other existing Dutch BERT-based models in sentiment
analysis. These results indicate that RobBERT is a powerful pre-trained model
for fine-tuning for a large variety of Dutch language tasks. We publicly
release this pre-trained model in hope of supporting further downstream Dutch
NLP applications.
Multi-step Joint-Modality Attention Network for Scene-Aware Dialogue System
Comments: DSTC8 collocated with AAAI2020
Subjects:
Computation and Language (cs.CL)
Understanding dynamic scenes and dialogue contexts in order to converse with
users has been challenging for multimodal dialogue systems. The 8-th Dialog
System Technology Challenge (DSTC8) proposed an Audio Visual Scene-Aware Dialog
(AVSD) task, which contains multiple modalities including audio, vision, and
language, to evaluate how dialogue systems understand different modalities and
respond to users. In this paper, we propose a multi-step joint-modality
attention network (JMAN) based on recurrent neural network (RNN) to reason on
videos. Our model performs a multi-step attention mechanism and jointly
considers both visual and textual representations in each reasoning process to
better integrate information from the two different modalities. Compared to the
baseline released by the AVSD organizers, our model achieves relative
improvements of 12.1% and 22.4% on the ROUGE-L and CIDEr scores, respectively.
Plato Dialogue System: A Flexible Conversational AI Research Platform
Alexandros Papangelis , Mahdi Namazifar , Chandra Khatri , Yi-Chia Wang , Piero Molino , Gokhan Tur Subjects : Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
As the field of Spoken Dialogue Systems and Conversational AI grows, so does
the need for tools and environments that abstract away implementation details
in order to expedite the development process, lower the barrier of entry to the
field, and offer a common test-bed for new ideas. In this paper, we present
Plato, a flexible Conversational AI platform written in Python that supports
any kind of conversational agent architecture, from standard architectures to
architectures with jointly-trained components, single- or multi-party
interactions, and offline or online training of any conversational agent
component. Plato has been designed to be easy to understand and debug and is
agnostic to the underlying learning frameworks that train each component.
Supervised Speaker Embedding De-Mixing in Two-Speaker Environment
Comments: Submitted to Odyssey 2020
Subjects:
Sound (cs.SD)
; Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
In this work, a speaker embedding de-mixing approach is proposed. Instead of
separating two-speaker signal in signal space like speech source separation,
the proposed approach separates different speaker properties from two-speaker
signal in embedding space. The proposed approach contains two steps. In step
one, the clean speaker embeddings are learned and collected by a residual TDNN
based network. In step two, the two-speaker signal and the embedding of one of
the speakers are input to a speaker embedding de-mixing network. The de-mixing
network is trained to generate the embedding of the other speaker through a
reconstruction loss. Speaker identification accuracy on the de-mixed speaker
embeddings is used to evaluate the quality of the obtained embeddings.
Experiments are done on two kinds of data: artificially augmented two-speaker
data (TIMIT) and real-world recordings of two-speaker data (MC-WSJ). Six
different speaker embedding de-mixing architectures are investigated. Compared
with the speaker identification accuracy on the clean speaker embeddings
(98.5%), the results show that one of the de-mixing architectures obtains close
performance, reaching 96.9% test accuracy on TIMIT when the SNR
between the target speaker and interfering speaker is 5 dB. More surprisingly,
we found that choosing a simple subtraction as the embedding de-mixing function
achieves the second-best performance, reaching 95.2% test accuracy.
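The subtraction-based de-mixing function can be illustrated with a toy sketch: the unknown speaker's embedding is estimated as the residual of the mixed embedding after removing the known speaker's embedding. Treating the mixed embedding as an additive combination is an assumption made only for this example.

    import numpy as np

    def demix_by_subtraction(mixed_emb, known_speaker_emb):
        # Simplest de-mixing function discussed above: the unknown speaker's
        # embedding is estimated as the residual of the mixture.
        return mixed_emb - known_speaker_emb

    rng = np.random.default_rng(0)
    spk_a, spk_b = rng.normal(size=128), rng.normal(size=128)
    mixed = spk_a + spk_b                      # hypothetical additive mixture
    est_b = demix_by_subtraction(mixed, spk_a)
    cos = est_b @ spk_b / (np.linalg.norm(est_b) * np.linalg.norm(spk_b))
    print(round(float(cos), 3))                # close to 1.0 in this toy case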
On-Device Information Extraction from Screenshots in form of tags
Sumit Kumar , Gopi Ramena , Manoj Goyal , Debi Mohanty , Ankur Agarwal , Benu Changmai , Sukumar Moharana Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Computation and Language (cs.CL); Information Retrieval (cs.IR)
We propose a method to make mobile screenshots easily searchable. In this
paper, we present a workflow in which we: 1) preprocess a collection of
screenshots, 2) identify the script present in the image, 3) extract
unstructured text from the images, 4) identify the language of the extracted
text, 5) extract keywords from the text, 6) identify tags based on image
features, 7) expand the tag set by identifying related keywords, and 8) insert
image tags with relevant images after ranking and index them to make them
searchable on device. The pipeline supports multiple languages and executes
on-device, which addresses privacy concerns. We developed novel architectures
for the components in the pipeline and optimized performance and memory for
on-device computation. Our experiments show that the developed solution reduces
overall user effort and improves the end-user experience while searching.
User-in-the-loop Adaptive Intent Detection for Instructable Digital Assistant
Comments: To be published as a conference paper in the proceedings of IUI’20
Journal-ref: 25th International Conference on Intelligent User Interfaces (IUI
’20), March 17–20, 2020, Cagliari, Italy
Subjects:
Human-Computer Interaction (cs.HC)
; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
People are becoming increasingly comfortable using Digital Assistants (DAs)
to interact with services or connected objects. However, for non-programming
users, the available possibilities for customizing their DA are limited and do
not include the possibility of teaching the assistant new tasks. To make the
most of the potential of DAs, users should be able to customize assistants by
instructing them through Natural Language (NL). To provide such
functionalities, NL interpretation in traditional assistants should be
improved: (1) The intent identification system should be able to recognize new
forms of known intents, and to acquire new intents as they are expressed by the
user. (2) In order to be adaptive to novel intents, the Natural Language
Understanding module should be sample efficient, and should not rely on a
pretrained model. Rather, the system should continuously collect the training
data as it learns new intents from the user. In this work, we propose AidMe
(Adaptive Intent Detection in Multi-Domain Environments), a user-in-the-loop
adaptive intent detection framework that allows the assistant to adapt to its
user by learning their intents as the interaction progresses. AidMe builds its
repertoire of intents and collects data to train a model of semantic similarity
evaluation that can discriminate between the learned intents and autonomously
discover new forms of known intents. AidMe addresses two major issues – intent
learning and user adaptation – for instructable digital assistants. We
demonstrate the capabilities of AidMe as a standalone system by comparing it
with a one-shot learning system and a pretrained NLU module through simulations
of interactions with a user. We also show how AidMe can smoothly integrate into
an existing instructable digital assistant.
Distributed, Parallel, and Cluster Computing
Consistency of Proof-of-Stake Blockchains with Concurrent Honest Slot Leaders
Comments: Initial submission. arXiv admin note: text overlap with arXiv:1911.10187
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Cryptography and Security (cs.CR); Discrete Mathematics (cs.DM)
We improve the fundamental security threshold of Proof-of-Stake (PoS)
blockchain protocols, reflecting for the first time the positive effect of
rounds with multiple honest leaders. Current analyses of the longest-chain rule
in PoS blockchain protocols reduce consistency to the dynamics of an abstract,
round-based block creation process determined by three probabilities: (p_A),
the probability that a round has at least one adversarial leader; (p_h), the
probability that a round has a single honest leader; and (p_H), the probability
that a round has multiple, but honest, leaders. We present a consistency
analysis that achieves the optimal threshold (p_h + p_H > p_A). This is a first
in the literature and can be applied to both the simple synchronous setting and
the setting with bounded delays. We also achieve the optimal consistency error
(e^{-Theta(k)}), (k) being the confirmation time.
The consistency analyses in Ouroboros Praos (Eurocrypt 2018) and Genesis (CCS
2018) assume that (p_h - p_H > p_A); the analyses in Sleepy Consensus
(Asiacrypt 2017) and Snow White (Fin. Crypto 2019) assume that (p_h > p_A).
Thus existing analyses either incur a penalty for multiply-honest rounds, or
treat them neutrally. In addition, previous analyses completely break down when
(p_h < p_A). Our new results can be directly applied to improve the consistency
of these existing protocols. We emphasize that these thresholds determine the
critical tradeoff between honest majority, network delays, and consistency
error.
We complement our results with a consistency analysis in the setting where
uniquely honest slots are rare, even letting (p_h = 0), under the added
assumption that honest players adopt a consistent chain selection rule. Our
analysis provides a direct connection between the Ouroboros analysis focusing
on “relative margin” and the Sleepy analysis focusing on “strong pivots.”
Dynamic Byzantine Reliable Broadcast [Technical Report]
Comments: This work has been supported in part by a grant from Interchain Foundation
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
Reliable broadcast is a powerful primitive guaranteeing that, intuitively,
all processes in a distributed system deliver the same set of messages. There
is a twofold reason why this primitive is appealing: (i) we can implement it
deterministically in a completely asynchronous environment, unlike stronger
primitives like consensus and total-order broadcast, and yet (ii) it is
powerful enough to implement numerous useful applications like payment systems.
The problem we tackle in this paper is that of dynamic reliable broadcast,
i.e., enabling processes to join or leave the system. This is a desirable
property for long-lived applications that are expected to be highly available,
yet it has been precluded in previous asynchronous reliable broadcast protocols.
We introduce the first specification of a dynamic Byzantine reliable
broadcast (DBRB) primitive that is amenable to an asynchronous implementation.
Indeed, we present an algorithm that implements this specification in an
asynchronous environment. Our algorithm ensures that if any correct process in
the system broadcasts (resp. delivers) a message, then every correct process in
the system delivers that message, or leaves the system. We assume that, at any
point in time, 2/3 of the processes in the system are correct, which is tight.
We also prove that even if only one process in the system can fail—and it can
fail by merely crashing—then it is impossible to implement a stronger
primitive, ensuring that if any correct process in the system broadcasts (resp.
delivers) a message, then every correct process in the system delivers that
message, including those that eventually leave.
Comments: 12 pages
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
With ever-increasing volumes of scientific floating-point data being produced
by high-performance computing applications, significantly reducing scientific
floating-point data size is critical, and error-controlled lossy compressors
have been developed for years. None of the existing scientific floating-point
lossy data compressors, however, support effective fixed-ratio lossy
compression. Yet fixed-ratio lossy compression for scientific floating-point
data not only compresses to the requested ratio but also respects a
user-specified error bound with higher fidelity. In this paper, we present
FRaZ: a generic fixed-ratio lossy compression framework respecting
user-specified error constraints. The contribution is twofold. (1) We develop
an efficient iterative approach to accurately determine the appropriate error
settings for different lossy compressors based on target compression ratios.
(2) We perform a thorough performance and accuracy evaluation for our proposed
fixed-ratio compression framework with multiple state-of-the-art
error-controlled lossy compressors, using several real-world scientific
floating-point datasets from different domains. Experiments show that FRaZ
effectively identifies the optimum error setting in the entire error setting
space of any given lossy compressor. While fixed-ratio lossy compression is
slower than fixed-error compression, it provides an important new lossy
compression technique for users of very large scientific floating-point
datasets.
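The iterative error-setting search described above can be sketched as a bracketing search over the error bound of an arbitrary error-bounded compressor: larger error bounds generally give larger compression ratios, so the target ratio can be bisected. The compressor interface, tolerance, and geometric midpoint below are assumptions, not the authors' implementation.

    def find_error_bound(compress, data, target_ratio, lo=1e-8, hi=1.0,
                         tol=0.01, max_iter=32):
        # compress(data, error_bound) -> achieved compression ratio (assumed
        # interface). Larger error bounds are assumed to give larger ratios,
        # so the target ratio can be bracketed with a bisection-style search.
        for _ in range(max_iter):
            mid = (lo * hi) ** 0.5            # geometric midpoint on a log scale
            ratio = compress(data, mid)
            if abs(ratio - target_ratio) / target_ratio <= tol:
                return mid
            if ratio < target_ratio:
                lo = mid                       # need a looser error bound
            else:
                hi = mid                       # need a tighter error bound
        return mid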
Comments: 25 pages, 11 figures
Subjects:
Methodology (stat.ME)
; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Distributed statistical inference has recently attracted immense attention.
Herein, we study the asymptotic efficiency of the maximum likelihood estimator
(MLE), the one-step MLE, and the aggregated estimating equation estimator for
generalized linear models with a diverging number of covariates. Then a novel
method is proposed to obtain an asymptotically efficient estimator for
large-scale distributed data by two rounds of communication between local
machines and the central server. The assumption on the number of machines in
this paper is more relaxed and thus practical for real-world applications.
Simulations and a case study demonstrate the satisfactory finite-sample
performance of the proposed estimators.
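For reference, the one-step MLE mentioned above refines an initial estimator with a single Newton step using the score and observed information; this is the standard textbook form and not necessarily the exact communication scheme proposed in the paper.

    \hat{\theta}_{\mathrm{os}}
      = \hat{\theta}_{0}
      + I_n(\hat{\theta}_{0})^{-1} \, S_n(\hat{\theta}_{0}),
    \qquad
    S_n(\theta) = \sum_{i=1}^{n} \nabla_\theta \log f(y_i \mid x_i, \theta),
    \quad
    I_n(\theta) = -\sum_{i=1}^{n} \nabla^2_\theta \log f(y_i \mid x_i, \theta).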
Learning
Gradient descent with momentum — to accelerate or to super-accelerate?
Comments: 19 pages + references, 8 figures. A variant of Nesterov acceleration is proposed and studied
Subjects:
Machine Learning (cs.LG)
; Optimization and Control (math.OC); Machine Learning (stat.ML)
We consider gradient descent with 'momentum', a widely used method for loss
function minimization in machine learning. This method is often used with
'Nesterov acceleration', meaning that the gradient is evaluated not at the
current position in parameter space, but at the estimated position after one
step. In this work, we show that the algorithm can be improved by extending
this 'acceleration': we use the gradient at an estimated position several
steps ahead rather than just one step ahead. How far one looks ahead in this
'super-acceleration' algorithm is determined by a new hyperparameter.
Considering a one-parameter quadratic loss function, the optimal value of the
super-acceleration can be exactly calculated and analytically estimated. We
show explicitly that super-accelerating the momentum algorithm is beneficial,
not only for this idealized problem, but also for several synthetic loss
landscapes and for the MNIST classification task with neural networks.
Super-acceleration is also easy to incorporate into adaptive algorithms like
RMSProp or Adam, and is shown to improve these algorithms.
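A minimal sketch of the look-ahead idea: classical Nesterov momentum evaluates the gradient at the position reached after one momentum step, and the 'super-accelerated' variant described above evaluates it several momentum steps ahead. The specific look-ahead rule (an integer number of momentum steps controlled by k) and the hyperparameters below are assumptions for illustration, not taken from the paper.

    import numpy as np

    def super_accelerated_momentum(grad, theta0, lr=0.1, mu=0.9, k=3, steps=100):
        # grad(theta) returns the gradient of the loss at theta.
        # k = 1 recovers classical Nesterov acceleration; k > 1 evaluates the
        # gradient further along the momentum direction (the "look-ahead").
        theta = np.asarray(theta0, dtype=float).copy()
        v = np.zeros_like(theta)
        for _ in range(steps):
            lookahead = theta + k * mu * v      # estimated position k steps ahead
            v = mu * v - lr * grad(lookahead)
            theta = theta + v
        return theta

    # Toy quadratic loss L(theta) = 0.5 * theta^2, whose gradient is theta.
    print(super_accelerated_momentum(lambda t: t, theta0=[5.0]))  # close to 0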
Lynton Ardizzone , Radek Mackowiak , Ullrich Köthe , Carsten Rother Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
The Information Bottleneck (IB) principle offers a unified approach to many
learning and prediction problems. Although optimal in an information-theoretic
sense, practical applications of IB are hampered by a lack of accurate
high-dimensional estimators of mutual information, its main constituent. We
propose to combine IB with invertible neural networks (INNs), which for the
first time allows exact calculation of the required mutual information. Applied
to classification, our proposed method results in a generative classifier we
call IB-INN. It accurately models the class conditional likelihoods,
generalizes well to unseen data and reliably recognizes out-of-distribution
examples. In contrast to existing generative classifiers, these advantages
incur only minor reductions in classification accuracy in comparison to
corresponding discriminative methods such as feed-forward networks.
Furthermore, we provide insight into why IB-INNs are superior to other
generative architectures and training procedures and show experimentally that
our method outperforms alternative models of comparable complexity.
Mikhail Hushchyn , Andrey Ustyuzhanin Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
The goal of change-point detection is to discover changes in the distribution
of a time series. Some of the state-of-the-art approaches to change-point
detection are based on direct density ratio estimation. In this work we show
how existing algorithms can be generalized using various binary classification
and regression models. In particular, we show that Gradient Boosting over
Decision Trees and Neural Networks can be used for this purpose. The algorithms
are tested on several synthetic and real-world datasets. The results show that
the proposed methods outperform the classical RuLSIF algorithm. A discussion of
cases where the proposed algorithms have advantages over existing methods is
also provided.
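The classifier-based view of density ratio estimation can be sketched as follows: a binary classifier is trained to separate a reference window from a test window, and its predicted probabilities yield a density ratio estimate whose average serves as a change score. The window handling and the choice of gradient boosting here are assumptions made only for this illustration.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    def change_score(reference, test):
        # Label reference samples 0 and test samples 1, train a classifier,
        # and convert its probabilities into a density-ratio estimate
        # r(x) = p_test(x) / p_ref(x) ~ f(x) / (1 - f(x)).
        X = np.concatenate([reference, test])
        y = np.concatenate([np.zeros(len(reference), dtype=int),
                            np.ones(len(test), dtype=int)])
        clf = GradientBoostingClassifier().fit(X, y)
        f = np.clip(clf.predict_proba(test)[:, 1], 1e-6, 1 - 1e-6)
        return np.log(f / (1.0 - f)).mean()  # large values suggest a change

    rng = np.random.default_rng(0)
    before = rng.normal(0.0, 1.0, size=(200, 1))
    after = rng.normal(2.0, 1.0, size=(100, 1))
    print(change_score(before[:100], before[100:]))  # near zero: no change
    print(change_score(before[:100], after))         # clearly positive: change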
Approximating Activation Functions
Comments: 10 Pages, 5 Figures, 1 Table
Subjects:
Machine Learning (cs.LG)
; Performance (cs.PF); Machine Learning (stat.ML)
ReLU is widely seen as the default choice for activation functions in neural
networks. However, there are cases where more complicated functions are
required. In particular, recurrent neural networks (such as LSTMs) make
extensive use of both hyperbolic tangent and sigmoid functions. These functions
are expensive to compute. We used function approximation techniques to develop
replacements for these functions and evaluated them empirically on three
popular network configurations. We find safe approximations that yield a 10% to
37% improvement in training times on the CPU. These approximations were
suitable for all cases we considered and we believe are appropriate
replacements for all networks using these activation functions. We also develop
ranged approximations which only apply in some cases due to restrictions on
their input domain. Our ranged approximations yield a performance improvement
of 20% to 53% in network training time. Our functions also match or
considerably outperform the ad-hoc approximations used in Theano and in the
implementation of Word2Vec.
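To illustrate the kind of ranged approximation discussed above (not necessarily one of the functions used in the paper), the sketch below replaces tanh with a clipped rational approximation that is accurate on a bounded input range and avoids exponentials entirely.

    import numpy as np

    def tanh_ranged_approx(x):
        # Simple rational approximation of tanh, valid after clamping the
        # input to [-3, 3]; no exponentials are needed.
        x = np.clip(x, -3.0, 3.0)
        x2 = x * x
        return x * (27.0 + x2) / (27.0 + 9.0 * x2)

    xs = np.linspace(-4, 4, 9)
    print(np.max(np.abs(tanh_ranged_approx(xs) - np.tanh(xs))))  # roughly 0.02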
Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet
Comments: arXiv admin note: text overlap with arXiv:1912.07160
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Adversarial attacks on deep neural networks (DNNs) have been found for
several years. However, the existing adversarial attacks have high success
rates only when the information of the attacked DNN is well-known or could be
estimated by structure similarity or massive queries. In this paper, we propose
an Attack on Attention (AoA), which targets attention, a semantic feature
commonly shared by DNNs. The transferability of AoA is quite high. With no more
than 10 queries of the decision only, AoA can achieve an almost 100% success
rate when attacking many popular DNNs. Even without queries, AoA keeps a
surprisingly high attack performance. We apply AoA to generate 96020
adversarial samples from ImageNet to defeat many neural networks, and thus name
the dataset DAmageNet. 20 well-trained DNNs are tested on DAmageNet. Without
adversarial training, most of the tested DNNs have an error rate of over 90%.
DAmageNet is the first universal adversarial dataset and it could serve as a
benchmark for robustness testing and adversarial training.
Cyber Attack Detection thanks to Machine Learning Algorithms
Comments: 46 pages, 38 figures, project report
Subjects:
Machine Learning (cs.LG)
; Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Machine Learning (stat.ML)
Cybersecurity attacks are growing both in frequency and sophistication over
the years. This increasing sophistication and complexity call for more
advancement and continuous innovation in defensive strategies. Traditional
methods of intrusion detection and deep packet inspection, while still largely
used and recommended, are no longer sufficient to meet the demands of growing
security threats. As computing power increases and cost drops, Machine Learning
is seen as an alternative method or an additional mechanism to defend against
malware, botnets, and other attacks. This paper explores Machine Learning as a
viable solution by examining its capabilities to classify malicious traffic in
a network.
First, a thorough data analysis is performed, resulting in 22 features
extracted from the initial NetFlow datasets. All these features are then
compared with one another through a feature selection process. Then, our
approach evaluates five different machine learning algorithms against a NetFlow
dataset containing common botnets. The Random Forest Classifier succeeds in
detecting more than
95% of the botnets in 8 out of 13 scenarios and more than 55% in the most
difficult datasets. Finally, insight is given to improve and generalize the
results, especially through a bootstrapping technique.
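A minimal sketch of the classification step described above; the per-flow features and labels below are synthetic placeholders, not the paper's actual NetFlow data.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Hypothetical per-flow features (e.g., duration, bytes, packets, ...).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 22))                  # 22 features, as in the study
    y = (X[:, 0] + 0.5 * X[:, 3] > 1.0).astype(int)  # synthetic "botnet" label

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("detection accuracy:", clf.score(X_te, y_te))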
Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant
Shayan Aziznejad , Harshit Gupta , Joaquim Campos , Michael Unser Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
We introduce a variational framework to learn the activation functions of
deep neural networks. The main motivation is to control the Lipschitz
regularity of the input-output relation. To that end, we first establish a
global bound for the Lipschitz constant of neural networks. Based on the
obtained bound, we then formulate a variational problem for learning activation
functions. Our variational problem is infinite-dimensional and is not
computationally tractable. However, we prove that there always exists a
solution that has continuous and piecewise-linear (linear-spline) activations.
This reduces the original problem to a finite-dimensional minimization. We
numerically compare our scheme with standard ReLU network and its variations,
PReLU and LeakyReLU.
Comments: preprint for TII: SS on Applications of Artificial Intelligence in Industrial Power Electronics and Systems
Subjects:
Machine Learning (cs.LG)
; Systems and Control (eess.SY); Machine Learning (stat.ML)
Monitoring the magnet temperature in permanent magnet synchronous motors
(PMSMs) for automotive applications is a challenging task for several decades
now, as signal injection or sensor-based methods still prove unfeasible in a
commercial context. Overheating results in severe motor deterioration and is
thus of high concern for the machine’s control strategy and its design. Lack of
precise temperature estimation leads to lower device utilization and higher
material cost. In this work, several machine learning (ML) models are
empirically evaluated on their estimation accuracy for the task of predicting
latent high-dynamic magnet temperature profiles. The range of selected
algorithms covers as diverse approaches as possible with ordinary and weighted
least squares, support vector regression, (k)-nearest neighbors, randomized
trees and neural networks. Having test bench data available, it is shown that
ML approaches relying merely on collected data meet the estimation performance
of classical thermal models built on thermodynamic theory, yet not all kinds of
models make efficient use of large datasets or offer sufficient modeling
capacity. In particular, linear regression and simple feed-forward neural
networks with optimized hyperparameters show strong predictive quality at low
to moderate model sizes.
Sideways: Depth-Parallel Training of Video Models
Mateusz Malinowski , Grzegorz Swirszcz , Joao Carreira , Viorica Patraucean Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We propose Sideways, an approximate backpropagation scheme for training video
models. In standard backpropagation, the gradients and activations at every
computation step through the model are temporally synchronized. The forward
activations need to be stored until the backward pass is executed, preventing
inter-layer (depth) parallelization. However, can we leverage smooth, redundant
input streams such as videos to develop a more efficient training scheme? Here,
we explore an alternative to backpropagation; we overwrite network activations
whenever new ones, i.e., from new frames, become available. Such a more gradual
accumulation of information from both passes breaks the precise correspondence
between gradients and activations, leading to theoretically more noisy weight
updates. Counter-intuitively, we show that Sideways training of deep
convolutional video networks not only still converges, but can also potentially
exhibit better generalization compared to standard synchronized
backpropagation.
GraphLIME: Local Interpretable Model Explanations for Graph Neural Networks
Qiang Huang , Makoto Yamada , Yuan Tian , Dinesh Singh , Dawei Yin , Yi Chang Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Graph structured data has wide applicability in various domains such as
physics, chemistry, biology, computer vision, and social networks, to name a
few. Recently, graph neural networks (GNN) were shown to be successful in
effectively representing graph structured data because of their good
performance and generalization ability. GNN is a deep learning based method
that learns a node representation by combining specific nodes and the
structural/topological information of a graph. However, like other deep models,
explaining the effectiveness of GNN models is a challenging task because of the
complex nonlinear transformations made over the iterations. In this paper, we
propose GraphLIME, a local interpretable model explanation for graphs using the
Hilbert-Schmidt Independence Criterion (HSIC) Lasso, which is a nonlinear
feature selection method. GraphLIME is a generic GNN-model explanation
framework that learns a nonlinear interpretable model locally in the subgraph
of the node being explained. More specifically, to explain a node, we generate
a nonlinear interpretable model from its (N)-hop neighborhood and then compute
the K most representative features as the explanations of its prediction using
HSIC Lasso. Through experiments on two real-world datasets, the explanations
produced by GraphLIME are found to be substantially more descriptive than those
of existing explanation methods.
FedVision: An Online Visual Object Detection Platform Powered by Federated Learning
Yang Liu , Anbu Huang , Yun Luo , He Huang , Youzhi Liu , Yuanyuan Chen , Lican Feng , Tianjian Chen , Han Yu , Qiang Yang Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Visual object detection is a computer vision-based artificial intelligence
(AI) technique which has many practical applications (e.g., fire hazard
monitoring). However, due to privacy concerns and the high cost of transmitting
video data, it is highly challenging to build object detection models on
centrally stored large training datasets following the current approach.
Federated learning (FL) is a promising approach to resolve this challenge.
Nevertheless, there is currently no easy-to-use tool that enables computer
vision application developers who are not experts in federated learning to
conveniently leverage this technology and apply it in their systems. In this
paper, we report FedVision – a machine learning engineering platform to support
the development of federated learning powered computer vision applications. The
platform has been deployed through a collaboration between WeBank and Extreme
Vision to help customers develop computer vision-based safety monitoring
solutions in smart city applications. Over four months of usage, it has
achieved significant efficiency improvement and cost reduction while removing
the need to transmit sensitive data for three major corporate customers. To the
best of our knowledge, this is the first real application of FL in computer
vision-based tasks.
DNNs as Layers of Cooperating Classifiers
Comments: Accepted at AAAI-2020. The preprint contains additional figures and an appendix not included in the conference version. Main text remains unchanged
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
A robust theoretical framework that can describe and predict the
generalization ability of deep neural networks (DNNs) in general circumstances
remains elusive. Classical attempts have produced complexity metrics that rely
heavily on global measures of compactness and capacity with little
investigation into the effects of sub-component collaboration. We demonstrate
intriguing regularities in the activation patterns of the hidden nodes within
fully-connected feedforward networks. By tracing the origin of these patterns,
we show how such networks can be viewed as the combination of two information
processing systems: one continuous and one discrete. We describe how these two
systems arise naturally from the gradient-based optimization process, and
demonstrate the classification ability of the two systems, individually and in
collaboration. This perspective on DNN classification offers a novel way to
think about generalization, in which different subsets of the training data are
used to train distinct classifiers; those classifiers are then combined to
perform the classification task, and their consistency is crucial for accurate
classification.
Comments: 25 pages, 4 figures
Subjects:
Machine Learning (cs.LG)
; Probability (math.PR); Machine Learning (stat.ML)
We introduce a deep neural network based method for solving a class of
elliptic partial differential equations. We approximate the solution of the PDE
with a deep neural network which is trained under the guidance of a
probabilistic representation of the PDE in the spirit of the Feynman-Kac
formula. The solution is given by an expectation of a martingale process driven
by a Brownian motion. As Brownian walkers explore the domain, the deep neural
network is iteratively trained using a form of reinforcement learning. Our
method is a ‘Derivative-Free Loss Method’ since it does not require the
explicit calculation of the derivatives of the neural network with respect to
the input neurons in order to compute the training loss. The advantages of our
method are showcased in a series of test problems: a corner singularity
problem, an interface problem, and an application to a chemotaxis population
model.
Graph Inference Learning for Semi-supervised Classification
Comments: 11 pages
Journal-ref: International Conference on Learning Representations (ICLR), 2020
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
In this work, we address semi-supervised classification of graph data, where
the categories of those unlabeled nodes are inferred from labeled nodes as well
as graph structures. Recent works often solve this problem via advanced graph
convolution in a conventionally supervised manner, but the performance could
degrade significantly when labeled data is scarce. To this end, we propose a
Graph Inference Learning (GIL) framework to boost the performance of
semi-supervised node classification by learning the inference of node labels on
graph topology. To bridge the connection between two nodes, we formally define
a structure relation by encapsulating node attributes, between-node paths, and
local topological structures together, which can make the inference
conveniently deduced from one node to another node. For learning the inference
process, we further introduce meta-optimization on structure relations from
training nodes to validation nodes, such that the learnt graph inference
capability can be better self-adapted to testing nodes. Comprehensive
evaluations on four benchmark datasets (including Cora, Citeseer, Pubmed, and
NELL) demonstrate the superiority of our proposed GIL when compared against
state-of-the-art methods on the semi-supervised node classification task.
ADAMT: A Stochastic Optimization with Trend Correction Scheme
Bingxin Zhou , Xuebin Zheng , Junbin Gao Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Adam-type optimizers, as a class of adaptive moment estimation methods with
the exponential moving average scheme, have been successfully used in many
applications of deep learning. Such methods are appealing for capability on
large-scale sparse datasets with high computational efficiency. In this paper,
we present a new framework for adapting Adam-type methods, namely AdamT.
Instead of applying a simple exponential weighted average, AdamT also includes
the trend information when updating the parameters with the adaptive step size
and gradients. The additional terms promise an efficient movement on the
complex cost surface, and thus the loss would converge more rapidly. We show
empirically the importance of adding the trend component, where AdamT
outperforms the vanilla Adam method constantly with state-of-the-art models on
several classical real-world datasets.
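One way to read the trend-correction idea is to track both a level and a trend for the first moment, Holt-style, and use their sum when forming the step; the exact update in the paper may differ, and the smoothing constants and decomposition below are assumptions for illustration only.

    import numpy as np

    def adamt_step(theta, grad, state, lr=0.05, b1=0.9, b2=0.999,
                   gamma=0.9, eps=1e-8):
        # state = (m, t, v): first-moment level, its trend, second moment.
        m, t, v = state
        m_new = b1 * (m + t) + (1 - b1) * grad         # Holt-style level update
        t_new = gamma * t + (1 - gamma) * (m_new - m)  # trend of the first moment
        v_new = b2 * v + (1 - b2) * grad ** 2
        theta_new = theta - lr * (m_new + t_new) / (np.sqrt(v_new) + eps)
        return theta_new, (m_new, t_new, v_new)

    # Toy run on L(x) = 0.5 * x^2 (gradient is x).
    theta = np.array([5.0])
    state = (np.zeros(1), np.zeros(1), np.zeros(1))
    for _ in range(1000):
        theta, state = adamt_step(theta, theta, state)
    print(theta)   # near the minimizer at 0, up to oscillations of order lr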
Learning Stable Deep Dynamics Models
Comments: NeurIPS 2019
Subjects:
Machine Learning (cs.LG)
; Dynamical Systems (math.DS); Machine Learning (stat.ML)
Deep networks are commonly used to model dynamical systems, predicting how
the state of a system will evolve over time (either autonomously or in response
to control inputs). Despite the predictive power of these systems, it has been
difficult to make formal claims about the basic properties of the learned
systems. In this paper, we propose an approach for learning dynamical systems
that are guaranteed to be stable over the entire state space. The approach
works by jointly learning a dynamics model and Lyapunov function that
guarantees non-expansiveness of the dynamics under the learned Lyapunov
function. We show that such learning systems are able to model simple dynamical
systems and can be combined with additional deep generative models to learn
complex dynamics, such as video textures, in a fully end-to-end fashion.
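The joint construction can be sketched as a projection step: given a nominal dynamics network f_hat and a Lyapunov candidate V, the prediction is corrected just enough that V decreases along trajectories. The specific projection formula below is a commonly used form of this construction and is shown here with a fixed quadratic V for brevity; it is a sketch, not the authors' exact implementation.

    import numpy as np

    def stable_dynamics(x, f_hat, grad_V, V, alpha=0.1):
        # Correct the nominal prediction f_hat(x) just enough that V decreases:
        #   f(x) = f_hat(x) - grad_V(x) * relu(grad_V(x).f_hat(x) + alpha*V(x))
        #                       / ||grad_V(x)||^2
        fh, g = f_hat(x), grad_V(x)
        violation = max(0.0, float(g @ fh) + alpha * V(x))
        return fh - g * violation / (g @ g + 1e-12)

    # Toy example: V(x) = ||x||^2 and an (unstable) nominal model f_hat(x) = x.
    V = lambda x: float(x @ x)
    grad_V = lambda x: 2.0 * x
    f_hat = lambda x: x.copy()

    x = np.array([1.0, -2.0])
    f = stable_dynamics(x, f_hat, grad_V, V)
    print(grad_V(x) @ f <= -0.1 * V(x) + 1e-9)   # True: V decreases along f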
Better Boosting with Bandits for Online Learning
Comments: 44 pages, 6 figures
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Probability estimates generated by boosting ensembles are poorly calibrated
because of the margin maximization nature of the algorithm. The outputs of the
ensemble need to be properly calibrated before they can be used as probability
estimates. In this work, we demonstrate that online boosting is also prone to
producing distorted probability estimates. In batch learning, calibration is
achieved by reserving part of the training data for training the calibrator
function. In the online setting, a decision needs to be made on each round:
should the new example(s) be used to update the parameters of the ensemble or
those of the calibrator? We proceed to resolve this decision with the aid of
bandit optimization algorithms. We demonstrate superior performance to
uncalibrated and naively-calibrated on-line boosting ensembles in terms of
probability estimation. Our proposed mechanism can be easily adapted to other
tasks (e.g., cost-sensitive classification) and is robust to the choice of
hyperparameters of both the calibrator and the ensemble.
An adversarial learning framework for preserving users' anonymity in face-based emotion recognition
Vansh Narula , Zhangyang (Atlas) Wang , Theodora Chaspari Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Image and video-capturing technologies have permeated our every-day life.
Such technologies can continuously monitor individuals’ expressions in
real-life settings, affording us new insights into their emotional states and
transitions, thus paving the way to novel well-being and healthcare
applications. Yet, due to the strong privacy concerns, the use of such
technologies is met with strong skepticism, since current face-based emotion
recognition systems relying on deep learning techniques tend to preserve
substantial information related to the identity of the user, apart from the
emotion-specific information. This paper proposes an adversarial learning
framework which relies on a convolutional neural network (CNN) architecture
trained through an iterative procedure for minimizing identity-specific
information and maximizing emotion-dependent information. The proposed approach
is evaluated through emotion classification and face identification metrics,
and is compared against two CNNs, one trained solely for emotion recognition
and the other trained solely for face identification. Experiments are performed
using the Yale Face Dataset and Japanese Female Facial Expression Database.
Results indicate that the proposed approach can learn a convolutional
transformation for preserving emotion recognition accuracy and degrading face
identity recognition, providing a foundation toward privacy-aware emotion
recognition technologies.
Comments: 6 pages, Accepted and to appear in ISQED 2020
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
In this paper, we propose Code-Bridged Classifier (CBC), a framework for
making a Convolutional Neural Network (CNN) robust against adversarial attacks
without increasing, or even while decreasing, the overall model's computational
complexity. More specifically, we propose a stacked encoder-convolutional
model, in which the input image is first encoded by the encoder module of a
denoising auto-encoder, and then the resulting latent representation (without
being decoded) is fed to a reduced complexity CNN for image classification. We
illustrate that this network not only is more robust to adversarial examples
but also has a significantly lower computational complexity when compared to
the prior art defenses.
Fairness Measures for Regression via Probabilistic Classification
Daniel Steinberg , Alistair Reid , Simon O'Callaghan Subjects : Machine Learning (cs.LG) ; Computers and Society (cs.CY); Machine Learning (stat.ML)
Algorithmic fairness involves expressing notions such as equity, or
reasonable treatment, as quantifiable measures that a machine learning
algorithm can optimise. Most work in the literature to date has focused on
classification problems where the prediction is categorical, such as accepting
or rejecting a loan application. This is in part because classification
fairness measures are easily computed by comparing the rates of outcomes,
leading to behaviours such as ensuring that the same fraction of eligible men
are selected as eligible women. But such measures are computationally difficult
to generalise to the continuous regression setting for problems such as
pricing, or allocating payments. The difficulty arises from estimating
conditional densities (such as the probability density that a system will
over-charge by a certain amount). For the regression setting we introduce
tractable approximations of the independence, separation and sufficiency
criteria by observing that they factorise as ratios of different conditional
probabilities of the protected attributes. We introduce and train machine
learning classifiers, distinct from the predictor, as a mechanism to estimate
these probabilities from the data. This naturally leads to model agnostic,
tractable approximations of the criteria, which we explore experimentally.
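To make the factorisation idea concrete, the sketch below illustrates estimating one such ratio for the independence criterion: P(A = 1 | f(X)) is estimated by an auxiliary probabilistic classifier trained to predict the protected attribute from the prediction, and is compared to the marginal P(A = 1). This is an illustration of one ratio, not the authors' full set of criteria, and the data and classifier choice are assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def independence_ratio(predictions, protected):
        # Estimate P(A = 1 | f(X)) with a probabilistic classifier and compare
        # it to the marginal P(A = 1); ratios near 1 everywhere indicate
        # (approximate) independence of the prediction from the attribute.
        clf = LogisticRegression().fit(predictions.reshape(-1, 1), protected)
        cond = clf.predict_proba(predictions.reshape(-1, 1))[:, 1]
        return cond / protected.mean()

    rng = np.random.default_rng(0)
    protected = rng.integers(0, 2, size=2000)
    fair_pred = rng.normal(size=2000)                        # independent of A
    unfair_pred = fair_pred + 1.5 * protected                # depends on A
    print(independence_ratio(fair_pred, protected).std())    # near 0
    print(independence_ratio(unfair_pred, protected).std())  # clearly larger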
Fourier Transform Approach to Machine Learning III: Fourier Classification
Soheil Mehrabkhani Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
We propose a Fourier-based learning algorithm for highly nonlinear multiclass
classification. The algorithm is based on a smoothing technique to calculate
the probability distribution of all classes. To obtain the probability
distribution, the density distribution of each class is smoothed by a low-pass
filter separately. The advantage of the Fourier representation is capturing the
nonlinearities of the data distribution without defining any kernel function.
Furthermore, in contrast to support vector machines, it makes a probabilistic
explanation of the classification possible. Moreover, it can also handle
overlapping classes. Compared to logistic regression, it does not require
feature engineering. Its computational performance is also very good for large
data sets and, in contrast to other algorithms, the typical overfitting problem
does not occur. The capability of the algorithm is demonstrated for multiclass
classification with overlapping classes and highly nonlinear class
distributions.
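A minimal sketch of the smoothing step described above, for one-dimensional features: each class's empirical density is smoothed by a low-pass filter in the Fourier domain, and a point is assigned to the class with the largest smoothed density at that point. The grid, cut-off frequency, and histogram-based density below are assumptions made only for this illustration.

    import numpy as np

    def smoothed_class_densities(X, y, grid, cutoff=10):
        # One low-pass-filtered density estimate per class on a common grid.
        densities = {}
        for c in np.unique(y):
            hist, edges = np.histogram(X[y == c], bins=grid, density=True)
            spectrum = np.fft.rfft(hist)
            spectrum[cutoff:] = 0.0              # low-pass filter
            densities[c] = np.fft.irfft(spectrum, n=len(hist))
        return densities, edges

    def classify(x, densities, edges):
        idx = np.clip(np.searchsorted(edges, x) - 1, 0, len(edges) - 2)
        return max(densities, key=lambda c: densities[c][idx])

    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
    y = np.array([0] * 500 + [1] * 500)
    grid = np.linspace(-6, 6, 129)               # 128 bins
    dens, edges = smoothed_class_densities(X, y, grid)
    print(classify(-1.5, dens, edges), classify(2.5, dens, edges))  # 0 1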
Understanding the Power of Persistence Pairing via Permutation Test
Comments: 20 pages, 6 graphs
Subjects:
Machine Learning (cs.LG)
; Computational Geometry (cs.CG); Machine Learning (stat.ML)
Recently many efforts have been made to incorporate persistence diagrams, one
of the major tools in topological data analysis (TDA), into machine learning
pipelines. To better understand the power and limitation of persistence
diagrams, we carry out a range of experiments on both graph data and shape
data, aiming to decouple and inspect the effects of different factors involved.
To this end, we also propose the so-called emph{permutation test} for
persistence diagrams to delineate critical values and pairings of critical
values. For graph classification tasks, we note that while persistence pairing
yields consistent improvement over various benchmark datasets, it appears that
for various filtration functions tested, most discriminative power comes from
critical values. For shape segmentation and classification, however, we note
that persistence pairing shows significant power on most of the benchmark
datasets, and improves over both summaries based on merely critical values, and
those based on permutation tests. Our results help provide insights on when
persistence diagram based summaries could be more suitable.
Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning
Paola Cascante-Bonilla , Fuwen Tan , Yanjun Qi , Vicente Ordonez Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Semi-supervised learning aims to take advantage of a large amount of
unlabeled data to improve the accuracy of a model that only has access to a
small number of labeled examples. We propose curriculum labeling, an approach
that exploits pseudo-labeling for propagating labels to unlabeled samples in an
iterative and self-paced fashion. This approach is surprisingly simple and
effective and surpasses or is comparable with the best methods proposed in the
recent literature across all the standard benchmarks for image classification.
Notably, we obtain 94.91% accuracy on CIFAR-10 using only 4,000 labeled
samples, and 88.56% top-5 accuracy on Imagenet-ILSVRC using 128,000 labeled
samples. In contrast to prior works, our approach shows improvements even in a
more realistic scenario that leverages out-of-distribution unlabeled data
samples.
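The self-paced pseudo-labeling loop can be sketched as follows: in each round, a model is retrained from scratch on the labeled data plus the pseudo-labeled samples whose confidence exceeds a percentile threshold that is gradually relaxed. The percentile schedule and the scikit-learn classifier here are placeholders for illustration, not the authors' exact setup.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def curriculum_labeling(X_lab, y_lab, X_unlab, rounds=5):
        X_cur, y_cur = X_lab, y_lab
        for r in range(1, rounds + 1):
            # Re-train from scratch each round on labeled + selected pseudo-labels.
            model = LogisticRegression(max_iter=1000).fit(X_cur, y_cur)
            proba = model.predict_proba(X_unlab)
            conf, pseudo = proba.max(axis=1), proba.argmax(axis=1)
            # Keep only the most confident pseudo-labels; relax the threshold.
            threshold = np.percentile(conf, 100 - r * (100 // rounds))
            keep = conf >= threshold
            X_cur = np.concatenate([X_lab, X_unlab[keep]])
            y_cur = np.concatenate([y_lab, pseudo[keep]])
        return model

    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(-2, 1, (500, 2)), rng.normal(2, 1, (500, 2))])
    y = np.array([0] * 500 + [1] * 500)
    lab = rng.choice(1000, size=20, replace=False)          # only 20 labeled points
    unlab = np.setdiff1d(np.arange(1000), lab)
    model = curriculum_labeling(X[lab], y[lab], X[unlab])
    print(model.score(X, y))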
Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives
Antoine Dedieu , Hussein Hazimeh , Rahul Mazumder Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO)
We consider a discrete optimization based approach for learning sparse
classifiers, where the outcome depends upon a linear combination of a small
subset of features. Recent work has shown that mixed integer programming (MIP)
can be used to solve (to optimality) (ell_0)-regularized problems at scales
much larger than what was conventionally considered possible in the statistics
and machine learning communities. Despite their usefulness, MIP-based
approaches are significantly slower compared to relatively mature algorithms
based on (ell_1)-regularization and relatives. We aim to bridge this
computational gap by developing new MIP-based algorithms for
(ell_0)-regularized classification. We propose two classes of scalable
algorithms: an exact algorithm that can handle (p approx 50,000) features in a
few minutes, and approximate algorithms that can address instances with
(p approx 10^6) features in times comparable to fast (ell_1)-based algorithms.
Our exact algorithm is based on the novel idea of integrality generation,
which solves the original problem (with (p) binary variables) via a sequence of
mixed integer programs that involve a small number of binary variables. Our
approximate algorithms are based on coordinate descent and local combinatorial
search. In addition, we present new estimation error bounds for a class of
$\ell_0$-regularized estimators. Experiments on real and synthetic data
demonstrate that our approach leads to models with considerably improved
statistical performance (especially, variable selection) when compared to
competing toolkits.
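For orientation, the sketch below shows a much simpler baseline for $\ell_0$-constrained classification, iterative hard thresholding on the logistic loss; it is not the paper's MIP or integrality-generation algorithm, only a reference point for what a k-sparse classifier means operationally.

import numpy as np

def iht_logistic(X, y, k, steps=200, lr=0.1):
    # Gradient step on the logistic loss followed by hard thresholding to the
    # k largest-magnitude coefficients; y is assumed to take values in {0, 1}.
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(steps):
        z = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
        w -= lr * X.T @ (z - y) / n           # gradient of the average logistic loss
        support = np.argsort(-np.abs(w))[:k]  # keep only the k strongest coefficients
        mask = np.zeros(p, dtype=bool)
        mask[support] = True
        w[~mask] = 0.0
    return w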
Robust Generalization via $\alpha$-Mutual Information
Comments: Accepted to IZS2020. arXiv admin note: substantial text overlap with arXiv:1912.01439
Subjects:
Information Theory (cs.IT)
; Machine Learning (cs.LG)
The aim of this work is to provide bounds connecting two probability measures
of the same event using Rényi $\alpha$-Divergences and Sibson’s
$\alpha$-Mutual Information, which generalize the Kullback-Leibler Divergence
and Shannon’s Mutual Information, respectively. A particular case
of interest can be found when the two probability measures considered are a
joint distribution and the corresponding product of marginals (representing the
statistically independent scenario). In this case, a bound using Sibson’s
$\alpha$-Mutual Information is retrieved, extending a result involving Maximal
Leakage to general alphabets. These results have broad applications, from
bounding the generalization error of learning algorithms to the more general
framework of adaptive data analysis, provided that the divergences and/or
information measures used are amenable to such an analysis (\emph{i.e.,} are
robust to post-processing and compose adaptively). The generalization error
bounds are derived with respect to high-probability events but a corresponding
bound on expected generalization error is also retrieved.
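For reference, the standard discrete-case definitions of the two quantities named above are given below; this is textbook material rather than part of the paper's contribution, and both expressions recover the Kullback-Leibler Divergence and Shannon's Mutual Information as $\alpha \to 1$:

D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log \sum_x P(x)^{\alpha} Q(x)^{1-\alpha},

I_\alpha(X;Y) = \min_{Q_Y} D_\alpha(P_{XY} \| P_X \times Q_Y)
             = \frac{\alpha}{\alpha - 1} \log \sum_y \Big( \sum_x P_X(x) \, P_{Y|X}(y \mid x)^{\alpha} \Big)^{1/\alpha}.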
Supervised Speaker Embedding De-Mixing in Two-Speaker Environment
Comments: Submitted to Odyssey 2020
Subjects:
Sound (cs.SD)
; Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
In this work, a speaker embedding de-mixing approach is proposed. Instead of
separating two-speaker signal in signal space like speech source separation,
the proposed approach separates different speaker properties from two-speaker
signal in embedding space. The proposed approach contains two steps. In step
one, the clean speaker embeddings are learned and collected by a residual TDNN
based network. In step two, the two-speaker signal and the embedding of one of
the speakers are input to a speaker embedding de-mixing network. The de-mixing
network is trained to generate the embedding of the other speaker by a
reconstruction loss. Speaker identification accuracy on the de-mixed speaker
embeddings is used to evaluate the quality of the obtained embeddings.
Experiments are done on two kinds of data: artificially augmented two-speaker data
(TIMIT) and real-world recordings of two-speaker data (MC-WSJ). Six different
speaker embedding de-mixing architectures are investigated. Compared with the
speaker identification accuracy on the clean speaker embeddings (98.5%), the
obtained results show that one of the speaker embedding de-mixing architectures
obtains comparable performance, reaching 96.9% test accuracy on TIMIT when the SNR
between the target speaker and interfering speaker is 5 dB. More surprisingly,
we found that choosing a simple subtraction as the embedding de-mixing function
obtains the second-best performance, reaching 95.2% test accuracy.
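A minimal sketch of the subtraction-based de-mixing idea reported above: the embedding of the unknown speaker is estimated by subtracting the known speaker's embedding from a mapping of the two-speaker embedding. The linear map W and the toy vectors are assumptions for illustration, not the paper's trained networks.

import numpy as np

def demix_by_subtraction(mixture_embedding, known_speaker_embedding, W=None):
    # W stands for a learned map from the mixture embedding to the sum of the
    # individual speaker embeddings; the identity is used here purely as a placeholder.
    if W is None:
        W = np.eye(mixture_embedding.shape[0])
    return W @ mixture_embedding - known_speaker_embedding

e_known = np.array([0.2, -0.1, 0.5, 0.3])
e_mix = np.array([0.7, 0.2, 0.4, 0.1])
print(demix_by_subtraction(e_mix, e_known))   # estimate of the other speaker's embedding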
Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks
Comments: 8 pages, 4 figures, AAAI 2020
Subjects:
Social and Information Networks (cs.SI)
; Machine Learning (cs.LG)
Social media has been developing rapidly thanks to its ability to spread new
information quickly, which also allows rumors to circulate widely. Meanwhile,
detecting rumors among such massive amounts of information on social media has
become an arduous challenge. Therefore, deep learning methods such as the
Recursive Neural Network (RvNN) have been applied to detect rumors through the
way they spread. However, these deep learning methods only take into account
the patterns of deep propagation but ignore the structures of wide dispersion
in rumor detection. Actually, propagation and dispersion are two crucial
characteristics of rumors. In this paper, we propose a novel bi-directional
graph model, named Bi-Directional Graph Convolutional Networks (Bi-GCN), to
explore both characteristics by operating on both top-down and bottom-up
propagation of rumors. It leverages a GCN with a top-down directed graph of
rumor spreading to learn the patterns of rumor propagation, and a GCN with an
opposite directed graph of rumor diffusion to capture the structures of rumor
dispersion. Moreover, the information from the source post is involved in each
layer of GCN to enhance the influences from the roots of rumors. Encouraging
empirical results on several benchmarks confirm the superiority of the proposed
method over the state-of-the-art approaches.
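A structural sketch of the bi-directional idea, assuming the propagation tree is given as a top-down adjacency matrix: one GCN runs on that matrix and one on its transpose (bottom-up), and the node features are concatenated. Root-feature enhancement and pooling from the paper are omitted, and the weights here are random placeholders.

import numpy as np

def gcn_layer(A, X, W):
    # one graph-convolution layer: add self-loops, symmetrically normalise, then ReLU(A_hat X W)
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

def bi_gcn_features(A_topdown, X, W_td, W_bu):
    h_td = gcn_layer(A_topdown, X, W_td)      # rumor propagation (top-down)
    h_bu = gcn_layer(A_topdown.T, X, W_bu)    # rumor dispersion (bottom-up)
    return np.concatenate([h_td, h_bu], axis=1)

# toy usage: a 4-node propagation tree with 5-dimensional node features
A = np.array([[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
print(bi_gcn_features(A, X, rng.normal(size=(5, 8)), rng.normal(size=(5, 8))).shape)  # (4, 16)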
DeepSUM++: Non-local Deep Neural Network for Super-Resolution of Unregistered Multitemporal Images
Comments: arXiv admin note: text overlap with arXiv:1907.06490
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Deep learning methods for super-resolution of a remote sensing scene from
multiple unregistered low-resolution images have recently gained attention
thanks to a challenge proposed by the European Space Agency. This paper
presents an evolution of the winner of the challenge, showing how incorporating
non-local information in a convolutional neural network allows it to exploit
self-similar patterns that provide enhanced regularization of the
super-resolution problem. Experiments on the dataset of the challenge show
improved performance over the state-of-the-art, which does not exploit
non-local information.
Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks
Comments: Accepted at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 1-1, New York, USA
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Machine Learning (stat.ML)
Ensemble methods, traditionally built with independently trained
de-correlated models, have proven to be efficient methods for reducing the
remaining residual generalization error, which results in robust and accurate
methods for real-world applications. In the context of deep learning, however,
training an ensemble of deep networks is costly and generates high redundancy
which is inefficient. In this paper, we present experiments on Ensembles with
Shared Representations (ESRs) based on convolutional networks to demonstrate,
quantitatively and qualitatively, their data processing efficiency and
scalability to large-scale datasets of facial expressions. We show that
redundancy and computational load can be dramatically reduced by varying the
branching level of the ESR without loss of diversity and generalization power,
which are both important for ensemble performance. Experiments on large-scale
datasets suggest that ESRs reduce the remaining residual generalization error
on the AffectNet and FER+ datasets, reach human-level performance, and
outperform state-of-the-art methods on facial expression recognition in the
wild using emotion and affect concepts.
Journal-ref: LWT, Volume 108, 2019, Pages 377-384
Subjects:
Signal Processing (eess.SP)
; Machine Learning (cs.LG); Machine Learning (stat.ML)
It is crucial for the wine industry to have methods like electronic nose
systems (E-Noses) for real-time monitoring thresholds of acetic acid in wines,
preventing its spoilage or determining its quality. In this paper, we prove
that the portable and compact self-developed E-Nose, based on thin film
semiconductor (SnO2) sensors and trained with a deep Multilayer Perceptron (MLP)
neural network, can perform early detection of wine spoilage thresholds in
routine tasks of wine quality control. To obtain rapid and online detection, we
propose a rising-window method focused on raw data
processing to find an early portion of the sensor signals with the best
recognition performance. Our approach was compared with the conventional
approach employed in E-Noses for gas recognition that involves feature
extraction and selection techniques for preprocessing data, followed by a
Support Vector Machine (SVM) classifier. The results show that it is possible
to classify three wine spoilage levels within 2.7 seconds of the gas injection
point, a methodology 63 times faster than the conventional approach in our
experimental setup.
Gilles Vandewiele , Isabelle Dehaene , György Kovács , Lucas Sterckx , Olivier Janssens , Femke Ongenae , Femke De Backere , Filip De Turck , Kristien Roelens , Johan Decruyenaere , Sofie Van Hoecke , Thomas Demeester Subjects : Signal Processing (eess.SP) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Information extracted from electrohysterography recordings could potentially
prove to be an interesting additional source of information to estimate the
risk of preterm birth. Recently, a large number of studies have reported
near-perfect results to distinguish between recordings of patients that will
deliver term or preterm using a public resource, called the Term/Preterm
Electrohysterogram database. However, we argue that these results are overly
optimistic due to a methodological flaw. In this work, we focus on
one specific type of methodological flaw: applying over-sampling before
partitioning the data into mutually exclusive training and testing sets. We
show how this causes the results to be biased using two artificial datasets and
reproduce results of studies in which this flaw was identified. Moreover, we
evaluate the actual impact of over-sampling on predictive performance, when
applied prior to data partitioning, using the same methodologies of related
studies, to provide a realistic view of these methodologies’ generalization
capabilities. We make our research reproducible by providing all the code under
an open license.
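The methodological point above can be made concrete in a few lines: partition the data first, then over-sample only the training fold, so that duplicated minority samples cannot leak into the test set. The naive duplicating over-sampler below is an illustrative stand-in for whatever over-sampling method a study uses.

import numpy as np
from sklearn.model_selection import train_test_split

def random_oversample(X, y, rng=np.random.default_rng(0)):
    # duplicate rows of each class at random until all classes reach the majority count
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), size=n_max, replace=True)
                          for c in classes])
    return X[idx], y[idx]

# correct ordering (the flaw criticised above is doing these two steps the other way around):
# X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
# X_tr_bal, y_tr_bal = random_oversample(X_tr, y_tr)   # over-sample the training fold only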
Predicting the Physical Dynamics of Unseen 3D Objects
Comments: In Proceedings of Winter Conference on Applications of Computer Vision (WACV) 2020. arXiv admin note: text overlap with arXiv:1901.00466
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG)
Machines that can predict the effect of physical interactions on the dynamics
of previously unseen object instances are important for creating better robots
and interactive virtual worlds. In this work, we focus on predicting the
dynamics of 3D objects on a plane that have just been subjected to an impulsive
force. In particular, we predict the changes in state – 3D position, rotation,
velocities, and stability. Different from previous work, our approach can
generalize dynamics predictions to object shapes and initial conditions that
were unseen during training. Our method takes the 3D object’s shape as a point
cloud and its initial linear and angular velocities as input. We extract shape
features and use a recurrent neural network to predict the full change in state
at each time step. Our model can support training with data from both a physics
engine and the real world. Experiments show that we can accurately predict the
changes in state for unseen object geometries and initial conditions.
RobBERT: a Dutch RoBERTa-based Language Model
Comments: 7 pages, 2 tables
Subjects:
Computation and Language (cs.CL)
; Machine Learning (cs.LG)
Pre-trained language models have been dominating the field of natural
language processing in recent years, and have led to significant performance
gains for various complex natural language tasks. One of the most prominent
pre-trained language models is BERT (Bi-directional Encoders for Transformers),
which was released as an English as well as a multilingual version. Although
multilingual BERT performs well on many tasks, recent studies showed that BERT
models trained on a single language significantly outperform the multilingual
results. Training a Dutch BERT model thus has a lot of potential for a wide
range of Dutch NLP tasks. While previous approaches have used earlier
implementations of BERT to train their Dutch BERT, we used RoBERTa, a robustly
optimized BERT approach, to train a Dutch language model called RobBERT. We
show that RobBERT improves state-of-the-art results in Dutch-specific language
tasks, and also outperforms other existing Dutch BERT-based models in sentiment
analysis. These results indicate that RobBERT is a powerful pre-trained model
for fine-tuning for a large variety of Dutch language tasks. We publicly
release this pre-trained model in hope of supporting further downstream Dutch
NLP applications.
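A hedged sketch of how such a pre-trained model is typically fine-tuned for sentiment analysis with the Hugging Face transformers library; the checkpoint name below is an assumption for illustration and should be replaced by the identifier of the authors' released model.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "pdelobelle/robbert-v2-dutch-base"   # assumed identifier; check the official release
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# forward pass on one Dutch sentence; the classification head is randomly initialised
# and must be fine-tuned on labelled data before the logits are meaningful
inputs = tokenizer("Dit product is fantastisch!", return_tensors="pt")
print(model(**inputs).logits)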
Epileptic Seizure Classification with Symmetric and Hybrid Bilinear Models
Comments: 9 pages, 4 figures, 3 tables
Subjects:
Signal Processing (eess.SP)
; Machine Learning (cs.LG); Machine Learning (stat.ML)
Epilepsy affects nearly 1% of the global population, of which two thirds can
be treated by anti-epileptic drugs and a much lower percentage by surgery.
Diagnostic and monitoring procedures for epilepsy are highly specialized and
labour-intensive. The accuracy of the diagnosis is also complicated by
overlapping medical symptoms, varying levels of experience, and inter-observer
variability among clinical professionals. This paper proposes a novel hybrid
bilinear deep learning network with an application in the clinical procedures
of epilepsy classification diagnosis, where the use of surface
electroencephalogram (sEEG) and audiovisual monitoring is standard practice.
Hybrid bilinear models based on two types of feature extractors, namely
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), are
trained using Short-Time Fourier Transform (STFT) of one-second sEEG. In the
proposed hybrid models, CNNs extract spatio-temporal patterns, while RNNs focus
on the characteristics of temporal dynamics in relatively longer intervals
given the same input data. Second-order features, based on interactions between
these spatio-temporal features are further explored by bilinear pooling and
used for epilepsy classification. Our proposed methods obtain an F1-score of
97.4% on the Temple University Hospital Seizure Corpus and 97.2% on the
EPILEPSIAE dataset, comparing favourably to existing benchmarks for sEEG-based
seizure type classification. The open-source implementation of this study is
available at this https URL
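For readers unfamiliar with bilinear pooling, the sketch below shows the generic second-order feature construction from two descriptors (here standing in for the CNN and RNN outputs): an outer product followed by the usual signed square root and l2 normalisation. This is a generic illustration, not the authors' exact network.

import numpy as np

def bilinear_pool(cnn_feat, rnn_feat, eps=1e-12):
    b = np.outer(cnn_feat, rnn_feat).ravel()   # all pairwise feature interactions
    b = np.sign(b) * np.sqrt(np.abs(b))        # signed square root
    return b / (np.linalg.norm(b) + eps)       # l2 normalisation

rng = np.random.default_rng(0)
print(bilinear_pool(rng.normal(size=64), rng.normal(size=32)).shape)  # (2048,)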
Marc Bocquet , Julien Brajard , Alberto Carrassi , Laurent Bertino Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
The reconstruction from observations of high-dimensional chaotic dynamics
such as geophysical flows is hampered by (i) the partial and noisy observations
that can realistically be obtained, (ii) the need to learn from long time
series of data, and (iii) the unstable nature of the dynamics. To achieve such
inference from the observations over long time series, it has been suggested to
combine data assimilation and machine learning in several ways. We show how to
unify these approaches from a Bayesian perspective using
expectation-maximization and coordinate descents. Implementations and
approximations of these methods are also discussed. Finally, we numerically and
successfully test the approach on two relevant low-order chaotic models with
distinct identifiability.
SieveNet: A Unified Framework for Robust Image-Based Virtual Try-On
Comments: Accepted at IEEE WACV 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Image-based virtual try-on for fashion has gained considerable attention
recently. The task requires trying on a clothing item on a target model image.
An efficient framework for this is composed of two stages: (1) warping
(transforming) the try-on cloth to align with the pose and shape of the target
model, and (2) a texture transfer module to seamlessly integrate the warped
try-on cloth onto the target model image. Existing methods suffer from
artifacts and distortions in their try-on output. In this work, we present
SieveNet, a framework for robust image-based virtual try-on. Firstly, we
introduce a multi-stage coarse-to-fine warping network to better model
fine-grained intricacies (while transforming the try-on cloth) and train it
with a novel perceptual geometric matching loss. Next, we introduce a try-on
cloth conditioned segmentation mask prior to improve the texture transfer
network. Finally, we also introduce a dueling triplet loss strategy for
training the texture translation network which further improves the quality of
the generated try-on results. We present extensive qualitative and quantitative
evaluations of each component of the proposed pipeline and show significant
performance improvements against the current state-of-the-art method.
Performance of Statistical and Machine Learning Techniques for Physical Layer Authentication
Comments: 15 pages, 13 figures, submitted to IEEE Transactions on Information Forensics and Security. arXiv admin note: text overlap with arXiv:1909.07969
Subjects:
Cryptography and Security (cs.CR)
; Information Theory (cs.IT); Machine Learning (cs.LG)
In this paper we consider authentication at the physical layer, in which the
authenticator aims at distinguishing a legitimate supplicant from an attacker
on the basis of the characteristics of the communication channel.
Authentication is performed over a set of parallel wireless channels affected
by time-varying fading in the presence of a malicious attacker, whose channel
has a spatial correlation with the supplicant’s one. We first propose the use
of two different statistical decision methods, and we prove that using a large
number of references (in the form of channel estimates) affected by different
levels of time-varying fading is not beneficial from a security point of view.
We then propose to exploit classification methods based on machine learning. In
order to face the worst case of an authenticator provided with no forged
messages during training, we consider one-class classifiers. When instead the
training set includes some forged messages, we resort to more conventional
binary classifiers, considering the cases in which such messages are either
labelled or not. For the latter case, we exploit clustering algorithms to label
the training set. The performance of both nearest neighbor (NN) and support
vector machine (SVM) classification techniques is assessed. Through numerical
examples, we show that under the same probability of false alarm, one-class
classification (OCC) algorithms achieve the lowest probability of missed
detection when a small spatial correlation exists between the main channel and
the adversary one, while statistical methods are advantageous when the spatial
correlation between the two channels is large.
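A small sketch of the one-class setting described above, using scikit-learn's OneClassSVM on synthetic channel-estimate features; the feature construction and the hyper-parameters are illustrative assumptions, not the paper's setup.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
legit_channels = rng.normal(loc=1.0, scale=0.1, size=(200, 16))  # synthetic stand-in for supplicant channel estimates
probe = rng.normal(loc=1.0, scale=0.1, size=(1, 16))             # channel estimate of a message to authenticate

occ = OneClassSVM(kernel="rbf", nu=0.05).fit(legit_channels)     # trained without any forged messages
print("accepted" if occ.predict(probe)[0] == 1 else "rejected")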
Comments: 25 pages, 11 figures
Subjects:
Methodology (stat.ME)
; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Distributed statistical inference has recently attracted immense attention.
Herein, we study the asymptotic efficiency of the maximum likelihood estimator
(MLE), the one-step MLE, and the aggregated estimating equation estimator for
generalized linear models with a diverging number of covariates. Then a novel
method is proposed to obtain an asymptotically efficient estimator for
large-scale distributed data by two rounds of communication between local
machines and the central server. The assumption on the number of machines in
this paper is more relaxed and thus practical for real-world applications.
Simulations and a case study demonstrate the satisfactory finite-sample
performance of the proposed estimators.
A Critical Look at the Applicability of Markov Logic Networks for Music Signal Analysis
Comments: Accepted for presentation at the Ninth International Workshop on Statistical Relational AI (StarAI 2020) at the 34th AAAI Conference on Artificial Intelligence (AAAI) in New York, on February 7th 2020
Subjects:
Artificial Intelligence (cs.AI)
; Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
In recent years, Markov logic networks (MLNs) have been proposed as a
potentially useful paradigm for music signal analysis. Because all hidden
Markov models can be reformulated as MLNs, the latter can provide an
all-encompassing framework that reuses and extends previous work in the field.
However, the fact that it is theoretically possible to reformulate previous work
as MLNs does not mean that it is advantageous to do so. In this paper, we analyse some
proposed examples of MLNs for musical analysis and consider their practical
disadvantages when compared to formulating the same musical dependence
relationships as (dynamic) Bayesian networks. We argue that a number of
practical hurdles such as the lack of support for sequences and for arbitrary
continuous probability distributions make MLNs less than ideal for the proposed
musical applications, both in terms of ease of formulation and computational
requirements of their inference algorithms. These conclusions are
not specific to music, but apply to other fields as well, especially when
sequential data with continuous observations is involved. Finally, we show that
the ideas underlying the proposed examples can be expressed perfectly well in
the more commonly used framework of (dynamic) Bayesian networks.
Increasing the robustness of DNNs against image corruptions by playing the Game of Noise
Evgenia Rusak , Lukas Schott , Roland Zimmermann , Julian Bitterwolf , Oliver Bringmann , Matthias Bethge , Wieland Brendel Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
The human visual system is remarkably robust against a wide range of
naturally occurring variations and corruptions like rain or snow. In contrast,
the performance of modern image recognition models strongly degrades when
evaluated on previously unseen corruptions. Here, we demonstrate that a simple
but properly tuned training with additive Gaussian and Speckle noise
generalizes surprisingly well to unseen corruptions, easily reaching the
previous state of the art on the corruption benchmark ImageNet-C (with
ResNet50) and on MNIST-C. We build on top of these strong baseline results and
show that an adversarial training of the recognition model against uncorrelated
worst-case noise distributions leads to an additional increase in performance.
This regularization can be combined with previously proposed defense methods
for further improvement.
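A minimal sketch of the kind of noise augmentation described above, for images scaled to [0, 1]; the noise level is an illustrative choice, whereas the paper's point is that carefully tuning it is what makes the baseline strong.

import numpy as np

def noise_augment(image, kind="gaussian", sigma=0.1, rng=np.random.default_rng(0)):
    noise = rng.normal(0.0, sigma, size=image.shape)
    if kind == "gaussian":
        noisy = image + noise            # additive Gaussian noise
    elif kind == "speckle":
        noisy = image + image * noise    # multiplicative (speckle) noise
    else:
        raise ValueError(kind)
    return np.clip(noisy, 0.0, 1.0)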
Comments: 6 pages; 1 figure
Subjects:
Software Engineering (cs.SE)
; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Robotics (cs.RO)
When developing autonomous systems, engineers and other stakeholders make
great effort to prepare the system for all foreseeable events and conditions.
However, these systems are still bound to encounter events and conditions that
were not considered at design time. For reasons like safety, cost, or ethics,
it is often highly desired that these new situations be handled correctly upon
first encounter. In this paper we first justify our position that there will
always exist unpredicted events and conditions, driven among others by: new
inventions in the real world; the diversity of world-wide system deployments
and uses; and, the non-negligible probability that multiple seemingly unlikely
events, which may be neglected at design time, will not only occur, but occur
together. We then argue that despite this unpredictability property, handling
these events and conditions is indeed possible. Hence, we offer and exemplify
design principles that when applied in advance, can enable systems to deal, in
the future, with unpredicted circumstances. We conclude with a discussion of
how this work and a broader theoretical study of the unexpected can contribute
toward a foundation of engineering principles for developing trustworthy
next-generation autonomous systems.
Extracting more from boosted decision trees: A high energy physics case study
Comments: Second Workshop on Machine Learning and the Physical Sciences (NeurIPS 2019), Vancouver, Canada
Subjects:
Machine Learning (stat.ML)
; Machine Learning (cs.LG); Applications (stat.AP)
Particle identification is one of the core tasks in the data analysis
pipeline at the Large Hadron Collider (LHC). Statistically, this entails the
identification of rare signal events buried in immense backgrounds that mimic
the properties of the former. In machine learning parlance, particle
identification represents a classification problem characterized by overlapping
and imbalanced classes. Boosted decision trees (BDTs) have had tremendous
success in the particle identification domain but more recently have been
overshadowed by deep learning (DNNs) approaches. This work proposes an
algorithm to extract more out of standard boosted decision trees by targeting
their main weakness, susceptibility to overfitting. This novel construction
harnesses the meta-learning techniques of boosting and bagging simultaneously
and performs remarkably well on the ATLAS Higgs (H) to tau-tau data set (ATLAS
et al., 2014) which was the subject of the 2014 Higgs ML Challenge
(Adam-Bourdarios et al., 2015). While the decay of Higgs to a pair of tau
leptons was established in 2018 (CMS collaboration et al., 2017) at the
$4.9\sigma$ significance based on the 2016 data taking period, the 2014 public
data set continues to serve as a benchmark data set to test the performance of
supervised classification schemes. We show that the score achieved by the
proposed algorithm is very close to the published winning score which leverages
an ensemble of deep neural networks (DNNs). Although this paper focuses on a
single application, it is expected that this simple and robust technique will
find wider applications in high energy physics.
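In the spirit of "boosting and bagging simultaneously", the sketch below bags several independently boosted tree ensembles with scikit-learn; the data and hyper-parameters are placeholders, and this is not claimed to be the authors' algorithm.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier

# imbalanced synthetic data standing in for signal-vs-background classification
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)

# bagging several boosted ensembles reduces the variance (overfitting) of any single BDT
bagged_bdt = BaggingClassifier(
    GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1),
    n_estimators=10, max_samples=0.8, random_state=0,
)
bagged_bdt.fit(X, y)
print(bagged_bdt.score(X, y))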
User-in-the-loop Adaptive Intent Detection for Instructable Digital Assistant
Comments: To be published as a conference paper in the proceedings of IUI’20
Journal-ref: 25th International Conference on Intelligent User Interfaces (IUI
’20), March 17–20, 2020, Cagliari, Italy
Subjects:
Human-Computer Interaction (cs.HC)
; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
People are becoming increasingly comfortable using Digital Assistants (DAs)
to interact with services or connected objects. However, for non-programming
users, the available possibilities for customizing their DA are limited and do
not include the possibility of teaching the assistant new tasks. To make the
most of the potential of DAs, users should be able to customize assistants by
instructing them through Natural Language (NL). To provide such
functionalities, NL interpretation in traditional assistants should be
improved: (1) The intent identification system should be able to recognize new
forms of known intents, and to acquire new intents as they are expressed by the
user. (2) In order to be adaptive to novel intents, the Natural Language
Understanding module should be sample efficient, and should not rely on a
pretrained model. Rather, the system should continuously collect the training
data as it learns new intents from the user. In this work, we propose AidMe
(Adaptive Intent Detection in Multi-Domain Environments), a user-in-the-loop
adaptive intent detection framework that allows the assistant to adapt to its
user by learning their intents as the interaction progresses. AidMe builds its
repertoire of intents and collects data to train a model of semantic similarity
evaluation that can discriminate between the learned intents and autonomously
discover new forms of known intents. AidMe addresses two major issues – intent
learning and user adaptation – for instructable digital assistants. We
demonstrate the capabilities of AidMe as a standalone system by comparing it
with a one-shot learning system and a pretrained NLU module through simulations
of interactions with a user. We also show how AidMe can smoothly integrate to
an existing instructable digital assistant.
Information Theory
Design and Analysis of Online Fountain Codes for Intermediate Performance
Jingxuan Huang , Zesong Fei , Congzhe Cao , Ming Xiao Subjects : Information Theory (cs.IT)
Owing to their improved intermediate performance, online fountain codes have
recently attracted much research attention. However, there is a trade-off
between the intermediate performance and the full recovery overhead of online
fountain codes, which prevents both from being improved simultaneously. We
analyze this trade-off and propose to improve both aspects of performance. We first
propose a method called Online Fountain Codes without Build-up phase (OFCNB)
where the degree-1 coded symbols are transmitted at first and the build-up
phase is removed to improve the intermediate performance. Then we analyze the
performance of OFCNB theoretically. Motivated by the analysis results, we
propose Systematic Online Fountain Codes (SOFC) to further reduce the full
recovery overhead. Theoretical analysis shows that SOFC has better intermediate
performance, and it also requires lower full recovery overhead when the channel
erasure rate is lower than a constant. Simulation results verify the analyses
and demonstrate the superior performance of OFCNB and SOFC in comparison to
other online fountain codes.
Robust Generalization via $\alpha$-Mutual Information
Comments: Accepted to IZS2020. arXiv admin note: substantial text overlap with arXiv:1912.01439
Subjects:
Information Theory (cs.IT)
; Machine Learning (cs.LG)
The aim of this work is to provide bounds connecting two probability measures
of the same event using Rényi $\alpha$-Divergences and Sibson’s
$\alpha$-Mutual Information, which generalize the Kullback-Leibler Divergence
and Shannon’s Mutual Information, respectively. A particular case
of interest can be found when the two probability measures considered are a
joint distribution and the corresponding product of marginals (representing the
statistically independent scenario). In this case, a bound using Sibson’s
$\alpha$-Mutual Information is retrieved, extending a result involving Maximal
Leakage to general alphabets. These results have broad applications, from
bounding the generalization error of learning algorithms to the more general
framework of adaptive data analysis, provided that the divergences and/or
information measures used are amenable to such an analysis (\emph{i.e.,} are
robust to post-processing and compose adaptively). The generalization error
bounds are derived with respect to high-probability events but a corresponding
bound on expected generalization error is also retrieved.
On the Capacity of Private Monomial Computation
Comments: Accepted for 2020 International Zurich Seminar on Information and Communication
Subjects:
Information Theory (cs.IT)
In this work, we consider private monomial computation (PMC) for replicated
noncolluding databases. In PMC, a user wishes to privately retrieve an
arbitrary multivariate monomial from a candidate set of monomials in $f$
messages over a finite field $\mathbb{F}_q$, where $q=p^k$ is a power of a prime
$p$ and $k \ge 1$, replicated over $n$ databases. We derive the PMC capacity
under a technical condition on $p$ and for asymptotically large $q$. The
condition on $p$ is satisfied, e.g., for large enough $p$. Also, we present a
novel PMC scheme for arbitrary $q$ that is capacity-achieving in the asymptotic
case above. Moreover, we present formulas for the entropy of a multivariate
monomial and for a set of monomials in uniformly distributed random variables
over a finite field, which are used in the derivation of the capacity
expression.
DNA-Based Storage: Models and Fundamental Limits
Comments: Submitted to IEEE Transaction of Information Theory; in parts presented at ISIT 2017 and ISIT 2019. arXiv admin note: text overlap with arXiv:1705.04732
Subjects:
Information Theory (cs.IT)
Due to its longevity and enormous information density, DNA is an attractive
medium for archival storage. In this work, we study the fundamental limits and
trade-offs of DNA-based storage systems by introducing a new channel model,
which we call the noisy shuffling-sampling channel. Motivated by current
technological constraints on DNA synthesis and sequencing, this model captures
three key distinctive aspects of DNA storage systems: (1) the data is written
onto many short DNA molecules; (2) the molecules are corrupted by noise during
synthesis and sequencing; and (3) the data is read by randomly sampling from the
DNA pool. We provide capacity results for this channel under specific noise and
sampling assumptions and show that, in many scenarios, a simple index-based
coding scheme is optimal.
Point-line incidence on Grassmannians and majority logic decoding of Grassmann codes
Peter Beelen , Prasant Singh Subjects : Information Theory (cs.IT) ; Algebraic Geometry (math.AG)
In this article, we consider the decoding problem of Grassmann codes using
majority logic. We show that for two points of the Grassmannian, there exists a
canonical path between these points once a complete flag is fixed. These paths
are used to construct a large set of parity checks orthogonal on a coordinate
of the code, resulting in a majority decoding algorithm.
Data-Driven Ensembles for Deep and Hard-Decision Hybrid Decoding
Tomer Raviv , Nir Raviv , Yair Be'ery Subjects : Information Theory (cs.IT)
Ensemble models are widely used to solve complex tasks by their decomposition
into multiple simpler tasks, each one solved locally by a single member of the
ensemble. Decoding of error-correction codes is a hard problem due to the curse
of dimensionality, leading one to consider ensembles-of-decoders as a possible
solution. Nonetheless, one must take complexity into account, especially in
decoding. We suggest a low-complexity scheme where a single member participates
in the decoding of each word. First, the distribution of feasible words is
partitioned into non-overlapping regions. Thereafter, specialized experts are
formed by independently training each member on a single region. A classical
hard-decision decoder (HDD) is employed to map every word to a single expert in
an injective manner. FER gains of up to 0.4dB at the waterfall region, and of
1.25dB at the error floor region are achieved for two BCH(63,36) and (63,45)
codes with cycle-reduced parity-check matrices, compared to the previous best
result of the paper “Active Deep Decoding of Linear Codes”.
Duplication with transposition distance to the root for (q)-ary strings
Comments: 6 pages, 1 table, submitted to International Symposium on Information Theory (ISIT) 2020
Subjects:
Information Theory (cs.IT)
We study the duplication with transposition distance between strings of
length $n$ over a $q$-ary alphabet and their roots. In other words, we
investigate the number of duplication operations of the form $x = (abcd) \to y
= (abcbd)$, where $x$ and $y$ are strings and $a$, $b$, $c$ and $d$ are their
substrings, needed to get a $q$-ary string of length $n$ starting from the set
of strings without duplications. For exact duplication, we prove that the
maximal distance between a string of length at most $n$ and its root has the
asymptotic order $n/\log n$. For approximate duplication, where an
$\eta$-fraction of symbols may be duplicated incorrectly, we show that the
maximal distance has a sharp transition from the order $n/\log n$ to $\log n$
at $\eta=(q-1)/q$. The motivation for this problem comes from genomics, where
such duplications represent a special kind of mutation and the distance between
a given biological sequence and its root is the smallest number of
transposition mutations required to generate the sequence.
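To make the operation concrete, the toy function below performs one duplication-with-transposition step on a string: a block is copied and re-inserted at another position, so that $x = (abcd)$ becomes $y = (abcbd)$ when the block $b$ is copied between $c$ and $d$. The indices are illustrative.

def duplicate_with_transposition(x, block_start, block_end, insert_at):
    # copy the block x[block_start:block_end] and insert it at position insert_at
    return x[:insert_at] + x[block_start:block_end] + x[insert_at:]

print(duplicate_with_transposition("abcd", 1, 2, 3))  # prints "abcbd"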
Chebyshev Inertial Landweber Algorithm for Linear Inverse Problems
Comments: 5 pages
Subjects:
Information Theory (cs.IT)
; Numerical Analysis (math.NA)
The Landweber algorithm defined on complex/real Hilbert spaces is a gradient
descent algorithm for linear inverse problems. Our contribution is to present a
novel method for accelerating convergence of the Landweber algorithm. In this
paper, we first extend the theory of the Chebyshev inertial iteration to the
Landweber algorithm on Hilbert spaces. An upper bound on the convergence rate
clarifies the speed of global convergence of the proposed method. The Chebyshev
inertial Landweber algorithm can be applied to a wide class of signal recovery
problems on a Hilbert space including deconvolution for continuous signals. The
theoretical discussion developed in this paper naturally leads to a novel
practical signal recovery algorithm. As a demonstration, a MIMO detection
algorithm based on the projected Landweber algorithm is derived. The proposed
MIMO detection algorithm achieves much smaller symbol error rate compared with
the MMSE detector.
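A hedged sketch of one way to read the acceleration: plain Landweber iteration whose relaxation factors cycle through the reciprocals of Chebyshev nodes mapped onto the spectrum of A^T A. This is an interpretation offered for illustration, not the authors' exact algorithm, and the toy problem below is synthetic.

import numpy as np

def chebyshev_landweber(A, b, T=8, n_cycles=10):
    lam = np.linalg.eigvalsh(A.T @ A)
    lam_min, lam_max = lam.min(), lam.max()
    ks = np.arange(T)
    nodes = (lam_max + lam_min) / 2 + (lam_max - lam_min) / 2 * np.cos((2 * ks + 1) * np.pi / (2 * T))
    steps = 1.0 / nodes                       # periodic step-size schedule
    x = np.zeros(A.shape[1])
    for _ in range(n_cycles):
        for w in steps:
            x = x + w * A.T @ (b - A @ x)     # Landweber update with a varying step
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 10))
x_true = rng.normal(size=10)
print(np.linalg.norm(chebyshev_landweber(A, A @ x_true) - x_true))  # should be close to 0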
Yanjun Han Subjects : Information Theory (cs.IT) ; Functional Analysis (math.FA)
We show a general phenomenon of the constrained functional value for
densities satisfying general convexity conditions, which generalizes the
observation in Bobkov and Madiman (2011) that the entropy per coordinate in a
log-concave random vector in any dimension with given density at the mode has a
range of just 1. Specifically, for general functions $\phi$ and $\psi$, we
derive upper and lower bounds of density functionals taking the form $I_\phi(f)
= \int_{\mathbb{R}^n} \phi(f(x))\,dx$ assuming the convexity of $\psi^{-1}(f(x))$
for the density, and establish the tightness of these bounds under mild
conditions satisfied by most examples. We apply this result to the distributed
simulation of continuous random variables, and establish an upper bound of the
exact common information for $\eta$-concave joint densities, which is a
generalization of the log-concave densities in Li and El Gamal (2017).
Low Complexity Algorithms for Transmission of Short Blocks over the BSC with Full Feedback
Comments: Submitted to ISIT 2020; comments welcome!
Subjects:
Information Theory (cs.IT)
Building on the work of Horstein, Shayevitz and Feder, and Naghshvar et al.,
this paper presents algorithms for low-complexity sequential transmission of a
$k$-bit message over the binary symmetric channel (BSC) with full, noiseless
feedback. To lower complexity, this paper shows that the initial $k$ binary
transmissions can be sent before any feedback is required and groups the
messages with equal posteriors to reduce the number of posterior updates from
exponential in $k$ to linear in $k$. Simulation results demonstrate that the
achievable rates for this full, noiseless feedback system approach capacity
rapidly as a function of $k$, significantly faster than the achievable rate
curve of Polyanskiy et al. for a stop feedback system.
An Efficient Algorithm for Designing Optimal CRCs for Tail-Biting Convolutional Codes
Comments: Submitted to ISIT 2020; comments welcome!
Subjects:
Information Theory (cs.IT)
This paper proposes an efficient algorithm for designing the
distance-spectrum-optimal (DSO) cyclic redundancy check (CRC) polynomial for a
given tail-biting convolutional code (TBCC). Lou et al. proposed DSO CRC design
methodology for a given zero-terminated convolutional code (ZTCC), in which the
fundamental design principle is to maximize the minimum distance at which an
undetectable error event of ZTCC first occurs. This paper applies the same
principle to design the DSO CRC for a given TBCC. Our algorithm is based on
partitioning the tail-biting trellis into several disjoint sets of tail-biting
paths that are closed under cyclic shifts. This paper shows that the
tail-biting path in each set can be constructed by concatenating the
irreducible error events and/or circularly shifting the resultant path. This
motivates an efficient collection algorithm that aims at gathering irreducible
error events, and a search algorithm that reconstructs the full list of error
events in the order of increasing distance, which can be used to find the DSO
CRC for a given TBCC.
GSSMD: New metric for robust and interpretable assay quality assessment and hit selection
Comments: Submitted to Research Synthesis Methods
Subjects:
Applications (stat.AP)
; Information Theory (cs.IT); Quantitative Methods (q-bio.QM); Methodology (stat.ME)
In the high-throughput screening (HTS) campaigns, the Z’-factor and strictly
standardized mean difference (SSMD) are commonly used to assess the quality of
assays and to select hits. However, these measures are vulnerable to outliers
and their performances are highly sensitive to background distributions. Here,
we propose an alternative measure for assay quality assessment and hit
selection. The proposed method is a non-parametric generalized variant of SSMD
(GSSMD). In this paper, we have shown that the proposed method provides a more
robust and intuitive way of assay quality assessment and hit selection.
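For context, the classical SSMD between a positive and a negative control group is sketched below together with a simple median/MAD-based robust analogue; the latter is only illustrative and is not the GSSMD definition proposed in the paper.

import numpy as np

def ssmd(positive, negative):
    # strictly standardized mean difference between two control groups
    return (np.mean(positive) - np.mean(negative)) / np.sqrt(np.var(positive) + np.var(negative))

def robust_ssmd_like(positive, negative):
    # illustrative outlier-resistant analogue using medians and MADs (not the paper's GSSMD)
    mad = lambda v: 1.4826 * np.median(np.abs(v - np.median(v)))
    return (np.median(positive) - np.median(negative)) / np.sqrt(mad(positive) ** 2 + mad(negative) ** 2)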
Performance of Statistical and Machine Learning Techniques for Physical Layer Authentication
Comments: 15 pages, 13 figures, submitted to IEEE Transactions on Information Forensics and Security. arXiv admin note: text overlap with arXiv:1909.07969
Subjects:
Cryptography and Security (cs.CR)
; Information Theory (cs.IT); Machine Learning (cs.LG)
In this paper we consider authentication at the physical layer, in which the
authenticator aims at distinguishing a legitimate supplicant from an attacker
on the basis of the characteristics of the communication channel.
Authentication is performed over a set of parallel wireless channels affected
by time-varying fading in the presence of a malicious attacker, whose channel
has a spatial correlation with the supplicant’s one. We first propose the use
of two different statistical decision methods, and we prove that using a large
number of references (in the form of channel estimates) affected by different
levels of time-varying fading is not beneficial from a security point of view.
We then propose to exploit classification methods based on machine learning. In
order to face the worst case of an authenticator provided with no forged
messages during training, we consider one-class classifiers. When instead the
training set includes some forged messages, we resort to more conventional
binary classifiers, considering the cases in which such messages are either
labelled or not. For the latter case, we exploit clustering algorithms to label
the training set. The performance of both nearest neighbor (NN) and support
vector machine (SVM) classification techniques is assessed. Through numerical
examples, we show that under the same probability of false alarm, one-class
classification (OCC) algorithms achieve the lowest probability of missed
detection when a small spatial correlation exists between the main channel and
the adversary one, while statistical methods are advantageous when the spatial
correlation between the two channels is large.
Proceedings 16th Workshop on Quantitative Aspects of Programming Languages and Systems
Journal-ref: EPTCS 312, 2020
Subjects:
Programming Languages (cs.PL)
; Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Logic in Computer Science (cs.LO)
This EPTCS volume contains the proceedings of the 16th Workshop on
Quantitative Aspects of Programming Languages and Systems (QAPL 2019) held in
Prague, Czech Republic, on Sunday 7 April 2019. QAPL 2019 was a satellite event
of the European Joint Conferences on Theory and Practice of Software (ETAPS
2019).
QAPL focuses on quantitative aspects of computations, which may refer to the
use of physical quantities (time, bandwidth, etc.) as well as mathematical
quantities (e.g., probabilities) for the characterisation of the behaviour and
for determining the properties of systems. Such quantities play a central role
in defining both the model of systems (architecture, language design,
semantics) and the methodologies and tools for the analysis and verification of
system properties. The aim of the QAPL workshop series is to discuss the
explicit use of time and probability and general quantities either directly in
the model or as a tool for the analysis or synthesis of systems.
The 16th edition of QAPL also focuses on discussing the developments,
challenges and results in this area covered by our workshop in its nearly
20-year history.
Comments: 33 pages, 20 figures
Subjects:
Functional Analysis (math.FA)
; Information Theory (cs.IT)
Framelets (a.k.a. wavelet frames) are of interest in both theory and
applications. Quite often, tight or dual framelets with high vanishing moments
are constructed through the popular oblique extension principle (OEP). Though
OEP can increase vanishing moments for improved sparsity, it has a serious
shortcoming for scalar framelets: the associated discrete framelet transform is
often not compact and deconvolution is unavoidable. Here we say that a framelet
transform is compact if it can be implemented by convolution using only
finitely supported filters. On the other hand, in sharp contrast to the
extensively studied scalar framelets, multiframelets (a.k.a. vector framelets)
derived through OEP from refinable vector functions are much less studied and
are far from well understood. Also, most constructed multiframelets often lack
the balancing property, which reduces sparsity. In this paper, we are particularly
interested in quasi-tight multiframelets, which are special dual multiframelets
but behave almost identically as tight multiframelets. From any compactly
supported \emph{refinable vector function having at least two entries}, we
prove that we can always construct through OEP a compactly supported
quasi-tight multiframelet such that (1) its associated discrete framelet
transform is compact and has the highest possible balancing order; (2) all
compactly supported framelet generators have the highest possible order of
vanishing moments, matching the approximation/accuracy order of its underlying
refinable vector function. This result demonstrates great advantages of OEP for
multiframelets (retaining all the desired properties) over scalar framelets.