arXiv Paper Daily: Mon, 20 Jan 2020



Computer Vision and Pattern Recognition

Unsupervised Learning of Camera Pose with Compositional Re-estimation

Seyed Shahabeddin Nabavi , Mehrdad Hosseinzadeh , Ramin Fahimi , Yang Wang

Comments: Accepted to WACV 2020



Computer Vision and Pattern Recognition (cs.CV)

We consider the problem of unsupervised camera pose estimation. Given an

input video sequence, our goal is to estimate the camera pose (i.e. the camera

motion) between consecutive frames. Traditionally, this problem is tackled by

placing strict constraints on the transformation vector or by incorporating

optical flow through a complex pipeline. We propose an alternative approach

that utilizes a compositional re-estimation process for camera pose estimation.

Given an input, we first estimate a depth map. Our method then iteratively

estimates the camera motion based on the estimated depth map. Our approach

significantly improves the predicted camera motion both quantitatively and

visually. Furthermore, the re-estimation resolves the problem of

out-of-boundaries pixels in a novel and simple way. Another advantage of our

approach is that it is adaptable to other camera pose estimation approaches.

Experimental analysis on KITTI benchmark dataset demonstrates that our method

outperforms existing state-of-the-art approaches in unsupervised camera

ego-motion estimation.
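The compositional re-estimation idea can be sketched as repeatedly predicting a small corrective motion and composing it with the running pose estimate. The snippet below is only an illustration under stated assumptions, not the authors' implementation: `predict_increment` is a hypothetical stand-in for the pose network (here it simply returns shrinking corrections), and poses are assumed to be 4x4 homogeneous rigid transforms.

```python
import math

def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def identity4():
    """Return the 4x4 identity transform (no motion)."""
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def yaw_translation(yaw, tx, tz):
    """Rigid transform: rotation about the y axis by `yaw`, translation (tx, 0, tz)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, 0.0, s, tx],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, tz],
            [0.0, 0.0, 0.0, 1.0]]

def predict_increment(step):
    """Hypothetical placeholder for a pose network: corrections halve each step."""
    scale = 0.5 ** step
    return yaw_translation(0.1 * scale, 0.0, 1.0 * scale)

def compose_pose(num_iters):
    """Start from the identity and left-compose each predicted correction."""
    pose = identity4()
    for step in range(num_iters):
        delta = predict_increment(step)
        pose = matmul4(delta, pose)  # new estimate = correction applied after current pose
    return pose
```

In a real pipeline the correction would presumably be predicted from the frame warped with the current estimate; that warping step is omitted here.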

Combining PRNU and noiseprint for robust and efficient device source identification

Davide Cozzolino , Francesco Marra , Diego Gragnaniello , Giovanni Poggi , Luisa Verdoliva

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

PRNU-based image processing is a key asset in digital multimedia forensics.

It allows for reliable device identification and effective detection and

localization of image forgeries, in very general conditions. However,

performance degrades significantly in challenging conditions involving low

quality and quantity of data. These include working on compressed and cropped

images, or estimating the camera PRNU pattern based on only a few images. To

boost the performance of PRNU-based analyses in such conditions we propose to

leverage the image noiseprint, a recently proposed camera-model fingerprint

that has proved effective for several forensic tasks. Numerical experiments on

datasets widely used for source identification prove that the proposed method

ensures a significant performance improvement in a wide range of challenging


TailorGAN: Making User-Defined Fashion Designs

Lele Chen , Justin Tian , Guo Li , Cheng-Haw Wu , Erh-Kan King , Kuan-Ting Chen , Shao-Hang Hsieh

Comments: fashion

Journal-ref: 2020 Winter Conference on Applications of Computer Vision



Computer Vision and Pattern Recognition (cs.CV)

Attribute editing has become an important and emerging topic of computer

vision. In this paper, we consider a task: given a reference garment image A

and another image B with target attribute (collar/sleeve), generate a

photo-realistic image which combines the texture from reference A and the new

attribute from reference B. The highly convoluted attributes and the lack of

paired data are the main challenges to the task. To overcome those limitations,

we propose a novel self-supervised model to synthesize garment images with

disentangled attributes (e.g., collar and sleeves) without paired data. Our

method consists of a reconstruction learning step and an adversarial learning

step. The model learns texture and location information through reconstruction

learning. The model’s capability is then generalized to achieve

single-attribute manipulation by adversarial learning. Meanwhile, we compose a

new dataset, named GarmentSet, with annotation of landmarks of collars and

sleeves on clean garment images. Extensive experiments on this dataset and

real-world samples demonstrate that our method can synthesize much better

results than the state-of-the-art methods in both quantitative and qualitative


Subjective Annotation for a Frame Interpolation Benchmark using Artifact Amplification

Hui Men , Vlad Hosu , Hanhe Lin , Andrés Bruhn , Dietmar Saupe

Comments: arXiv admin note: text overlap with arXiv:1901.05362



Computer Vision and Pattern Recognition (cs.CV)

Current benchmarks for optical flow algorithms evaluate the estimation either

directly by comparing the predicted flow fields with the ground truth or

indirectly by using the predicted flow fields for frame interpolation and then

comparing the interpolated frames with the actual frames. In the latter case,

objective quality measures such as the mean squared error are typically

employed. However, it is well known that for image quality assessment, the

actual quality experienced by the user cannot be fully deduced from such simple

measures. Hence, we conducted a subjective quality assessment crowdsourcing

study for the interpolated frames provided by one of the optical flow

benchmarks, the Middlebury benchmark. It contains interpolated frames from 155

methods applied to each of 8 contents. We collected forced choice paired

comparisons between interpolated images and corresponding ground truth. To

increase the sensitivity of observers when judging minute differences in paired

comparisons, we introduced a new method to the field of full-reference quality

assessment, called artifact amplification. From the crowdsourcing data we

reconstructed absolute quality scale values according to Thurstone’s model. As

a result, we obtained a re-ranking of the 155 participating algorithms w.r.t.

the visual quality of the interpolated frames. This re-ranking not only shows

the necessity of visual quality assessment as another evaluation metric for

optical flow and frame interpolation benchmarks, the results also provide the

ground truth for designing novel image quality assessment (IQA) methods

dedicated to perceptual quality of interpolated images. As a first step, we

proposed such a new full-reference method, called WAE-IQA. By weighing the

local differences between an interpolated image and its ground truth WAE-IQA

performed slightly better than the currently best FR-IQA approach from the
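The reconstruction of scale values from paired comparisons can be illustrated with Thurstone's Case V model, in which each item's score is the mean z-score of its win proportions. This is a minimal sketch only: the abstract does not say which Thurstone case was used, and the `wins` matrix layout and the +0.5 smoothing (to keep proportions away from 0 and 1) are assumptions made for this example.

```python
from statistics import NormalDist

def thurstone_case_v(wins):
    """Estimate scale values from a paired-comparison count matrix.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns one score per item; higher means preferred more often.
    """
    n = len(wins)
    inv_cdf = NormalDist().inv_cdf  # standard normal quantile function
    scores = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue  # an item is never compared with itself (z = 0)
            total = wins[i][j] + wins[j][i]
            # Assumed smoothing so that the proportion stays strictly in (0, 1).
            p = (wins[i][j] + 0.5) / (total + 1.0)
            z_sum += inv_cdf(p)
        scores.append(z_sum / n)
    return scores

# Toy data: item 0 usually wins, item 2 usually loses.
example_scores = thurstone_case_v([[0, 8, 9], [2, 0, 7], [1, 3, 0]])
```

The scores are only defined up to a shift; with this construction they sum to approximately zero.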


GraphBGS: Background Subtraction via Recovery of Graph Signals

Jhony H. Giraldo , Thierry Bouwmans

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Graph-based algorithms have been successful in approaching the problems of

unsupervised and semi-supervised learning. Recently, the theory of graph signal

processing and semi-supervised learning have been combined leading to new

developments and insights in the field of machine learning. In this paper,

concepts of recovery of graph signals and semi-supervised learning are

introduced in the problem of background subtraction. We propose a new algorithm

named GraphBGS. This method uses Mask R-CNN for instance segmentation;

temporal median filter for background initialization; motion, texture, color,

and structural features for representing the nodes of a graph; k-nearest

neighbors for the construction of the graph; and finally a semi-supervised

method inspired from the theory of recovery of graph signals to solve the

problem of background subtraction. The method is evaluated on the publicly

available change detection and scene background initialization databases.

Experimental results show that GraphBGS outperforms unsupervised background

subtraction algorithms in some challenges of the change detection dataset. And

most significantly, this method outperforms generative adversarial networks in

unseen videos in some sequences of the scene background initialization


Latency-Aware Differentiable Neural Architecture Search

Yuhui Xu , Lingxi Xie , Xiaopeng Zhang , Xin Chen , Bowen Shi , Qi Tian , Hongkai Xiong

Comments: 11 pages, 7 figures



Computer Vision and Pattern Recognition (cs.CV)

Differentiable neural architecture search methods have become popular in automated

machine learning, mainly due to their low search costs and flexibility in

designing the search space. However, these methods have difficulty

optimizing the network, so the searched network is often unfriendly to

hardware. This paper deals with this problem by adding a differentiable latency

loss term to the optimization, so that the search process can trade off between

accuracy and latency with a balancing coefficient. The core of latency

prediction is to encode each network architecture and feed it into a

multi-layer regressor, with the training data being collected from randomly

sampling a number of architectures and evaluating them on the hardware. We

evaluate our approach on NVIDIA Tesla-P100 GPUs. With 100K sampled

architectures (requiring a few hours), the latency prediction module achieves

a relative error below 10%. Equipped with this module, the search

method can reduce latency by 20% while preserving accuracy. Our

approach can also be transplanted to a wide range of

hardware platforms with very little effort, or be used to optimize other

non-differentiable factors such as power consumption.
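The trade-off described above amounts to adding a weighted latency term to the usual training loss. The sketch below illustrates only that objective. As a deliberate simplification, the learned multi-layer latency regressor is replaced by a softmax-weighted sum of per-operation latencies; the function names, the `balance` parameter and the latency values are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_latency(arch_logits, op_latencies_ms):
    """Expected latency of one edge: operation latencies weighted by softmax(arch_logits).

    Stand-in for a learned latency predictor; it is smooth in arch_logits,
    so gradients could flow through it in an autodiff framework.
    """
    weights = softmax(arch_logits)
    return sum(w * lat for w, lat in zip(weights, op_latencies_ms))

def total_loss(accuracy_loss, arch_logits, op_latencies_ms, balance):
    """Accuracy loss plus `balance` times the predicted latency."""
    return accuracy_loss + balance * expected_latency(arch_logits, op_latencies_ms)
```

Setting `balance` to zero recovers the plain accuracy objective, while larger values push the search toward faster operations.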

BigEarthNet Deep Learning Models with A New Class-Nomenclature for Remote Sensing Image Understanding

Gencer Sumbul , Jian Kang , Tristan Kreuziger , Filipe Marcelino , Hugo Costa , Pedro Benevides , Mario Caetano , Begüm Demir

Comments: Submitted to IEEE Geoscience and Remote Sensing Magazine



Computer Vision and Pattern Recognition (cs.CV)

Success of deep neural networks in the framework of remote sensing (RS) image

analysis depends on the availability of a high number of annotated images.

BigEarthNet is a new large-scale Sentinel-2 benchmark archive that has been

recently introduced in RS to advance deep learning (DL) studies. Each image

patch in BigEarthNet is annotated with multi-labels provided by the CORINE Land

Cover (CLC) map of 2018 based on its thematically most detailed Level-3 class

nomenclature. BigEarthNet has enabled data-hungry DL algorithms to reach high

performance in the context of multi-label RS image retrieval and

classification. However, initial research demonstrates that some CLC classes

are challenging to describe accurately by considering only (single-date)

Sentinel-2 images. To further increase the effectiveness of BigEarthNet, in

this paper we introduce an alternative class-nomenclature to allow DL models

to better learn and describe the complex spatial and spectral information

content of the Sentinel-2 images. This is achieved by interpreting and

arranging the CLC Level-3 nomenclature based on the properties of Sentinel-2

images in a new nomenclature of 19 classes. Then, the new class-nomenclature of

BigEarthNet is used within state-of-the-art DL models (namely VGG model at the

depth of 16 and 19 layers [VGG16 and VGG19] and ResNet model at the depth of

50, 101 and 152 layers [ResNet50, ResNet101, ResNet152] as well as K-Branch CNN

model) in the context of multi-label classification. Experimental results show

that the models trained from scratch on BigEarthNet outperform those

pre-trained on ImageNet, especially in relation to some complex classes

including agriculture and other vegetated and natural environments. All DL

models are made publicly available, offering an important resource to guide

future progress on content based image retrieval and scene classification

problems in RS.

Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks

Henrique Siqueira , Sven Magg , Stefan Wermter

Comments: Accepted at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 1-1, New York, USA



Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Ensemble methods, traditionally built with independently trained

de-correlated models, have proven to be efficient methods for reducing the

remaining residual generalization error, which results in robust and accurate

methods for real-world applications. In the context of deep learning, however,

training an ensemble of deep networks is costly and generates high redundancy

which is inefficient. In this paper, we present experiments on Ensembles with

Shared Representations (ESRs) based on convolutional networks to demonstrate,

quantitatively and qualitatively, their data processing efficiency and

scalability to large-scale datasets of facial expressions. We show that

redundancy and computational load can be dramatically reduced by varying the

branching level of the ESR without loss of diversity and generalization power,

which are both important for ensemble performance. Experiments on large-scale

datasets suggest that ESRs reduce the remaining residual generalization error

on the AffectNet and FER+ datasets, reach human-level performance, and

outperform state-of-the-art methods on facial expression recognition in the

wild using emotion and affect concepts.

Vision Meets Drones: Past, Present and Future

Pengfei Zhu , Longyin Wen , Dawei Du , Xiao Bian , Qinghua Hu , Haibin Ling

Comments: arXiv admin note: text overlap with arXiv:1804.07437



Computer Vision and Pattern Recognition (cs.CV)

Drones, or general UAVs, equipped with cameras have been rapidly deployed in a

wide range of applications, including agriculture, aerial photography, fast

delivery, and surveillance. Consequently, automatic understanding of visual

data collected from drones is in high demand, bringing computer vision

and drones closer and closer together. To promote and track the developments of

object detection and tracking algorithms, we have organized two challenge

workshops in conjunction with European Conference on Computer Vision (ECCV)

2018, and IEEE International Conference on Computer Vision (ICCV) 2019,

attracting more than 100 teams around the world. We provide a large-scale drone

captured dataset, VisDrone, which includes four tracks, i.e., (1) image object

detection, (2) video object detection, (3) single object tracking, and (4)

multi-object tracking. This paper first presents a thorough review of object

detection and tracking datasets and benchmarks, and discusses the challenges of

collecting large-scale drone-based object detection and tracking datasets with

fully manual annotations. After that, we describe our VisDrone dataset, which

is captured over various urban/suburban areas of 14 different cities across

China from North to South. Being the largest such dataset ever published,

VisDrone enables extensive evaluation and investigation of visual analysis

algorithms on the drone platform. We provide a detailed analysis of the current

state of the field of large-scale object detection and tracking on drones, and

conclude the challenge as well as propose future directions and improvements.

We expect the benchmark to greatly boost research and development in video

analysis on drone platforms. All the datasets and experimental results can be

downloaded from the website: this https URL .

Predicting the Physical Dynamics of Unseen 3D Objects

Davis Rempe , Srinath Sridhar , He Wang , Leonidas J. Guibas

Comments: In Proceedings of Winter Conference on Applications of Computer Vision (WACV) 2020. arXiv admin note: text overlap with arXiv:1901.00466



Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Machines that can predict the effect of physical interactions on the dynamics

of previously unseen object instances are important for creating better robots

and interactive virtual worlds. In this work, we focus on predicting the

dynamics of 3D objects on a plane that have just been subjected to an impulsive

force. In particular, we predict the changes in state – 3D position, rotation,

velocities, and stability. Different from previous work, our approach can

generalize dynamics predictions to object shapes and initial conditions that

were unseen during training. Our method takes the 3D object’s shape as a point

cloud and its initial linear and angular velocities as input. We extract shape

features and use a recurrent neural network to predict the full change in state

at each time step. Our model can support training with data from either a physics

engine or the real world. Experiments show that we can accurately predict the

changes in state for unseen object geometries and initial conditions.

Review: deep learning on 3D point clouds

Saifullahi Aminu Bello , Shangshu Yu , Cheng Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

A point cloud is a set of points defined in a 3D metric space. Point clouds have become

one of the most significant data formats for 3D representation. They are gaining

increased popularity as a result of increased availability of acquisition

devices, such as LiDAR, as well as increased application in areas such as

robotics, autonomous driving, augmented and virtual reality. Deep learning is

now the most powerful tool for data processing in computer vision, becoming the

most preferred technique for tasks such as classification, segmentation, and

detection. While deep learning techniques are mainly applied to data with a

structured grid, point clouds, on the other hand, are unstructured. This

lack of structure makes it very challenging to apply deep learning to them

directly. Earlier approaches overcome this challenge by

preprocessing the point cloud into a structured grid format at the cost of

increased computational cost or loss of depth information. Recently, however,

many state-of-the-art deep learning techniques that directly operate on point

cloud are being developed. This paper contains a survey of the recent

state-of-the-art deep learning techniques that mainly focus on point cloud

data. We first briefly discuss the major challenges faced when using deep

learning directly on point clouds; we also briefly discuss earlier approaches

which overcome the challenges by preprocessing the point cloud into a

structured grid. We then review the various state-of-the-art deep

learning approaches that directly process point clouds in their unstructured form.

We introduce the popular 3D point cloud benchmark datasets. We also

further discuss the application of deep learning in popular 3D vision tasks

including classification, segmentation and detection.
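The preprocessing into a structured grid mentioned in this abstract can be illustrated with a simple occupancy voxelization. This is a minimal sketch assuming axis-aligned cubic voxels; `voxelize` and `voxel_size` are names chosen for this example and do not come from the survey.

```python
import math

def voxelize(points, voxel_size):
    """Map 3D points to the set of occupied voxel indices.

    points: iterable of (x, y, z) tuples.
    voxel_size: edge length of each cubic voxel, in the same units as the points.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    occupied = set()
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct cell
        index = (math.floor(x / voxel_size),
                 math.floor(y / voxel_size),
                 math.floor(z / voxel_size))
        occupied.add(index)
    return occupied

# Three points, but two of them fall into the same voxel.
cloud = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.5, 0.0, 0.0)]
grid = voxelize(cloud, voxel_size=1.0)
```

Points that share a voxel become indistinguishable, which is one way detail is lost; a dense grid also grows cubically with resolution, which is one source of the added computational cost.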

Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network

Jungkyu Lee , Taeryun Won , Kiho Hong

Comments: 11 pages, 3 figures, 16 tables



Computer Vision and Pattern Recognition (cs.CV)

Recent studies in image classification have demonstrated a variety of

techniques for improving the performance of Convolutional Neural Networks

(CNNs). However, attempts to combine existing techniques to create a practical

model are still uncommon. In this study, we carry out extensive experiments to

validate that carefully assembling these techniques and applying them to a

basic CNN model in combination can improve the accuracy and robustness of the

model while minimizing the loss of throughput. For example, our proposed

ResNet-50 shows an improvement in top-1 accuracy from 76.3% to 82.78%, and an

mCE improvement from 76.0% to 48.9%, on the ImageNet ILSVRC2012 validation set.

With these improvements, inference throughput only decreases from 536 to 312.

The resulting model significantly outperforms state-of-the-art models with

similar accuracy in terms of mCE and inference throughput. To verify the

performance improvement in transfer learning, fine-grained classification and

image retrieval tasks were tested on several open datasets and showed that the

improvement to backbone network performance boosted transfer learning

performance significantly. Our approach achieved 1st place in the iFood

Competition Fine-Grained Visual Recognition at CVPR 2019, and the source code

and trained models are available at this https URL

SieveNet: A Unified Framework for Robust Image-Based Virtual Try-On

Surgan Jandial , Ayush Chopra , Kumar Ayush , Mayur Hemani , Abhijeet Kumar , Balaji Krishnamurthy

Comments: Accepted at IEEE WACV 2020



Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Image-based virtual try-on for fashion has gained considerable attention

recently. The task requires trying on a clothing item on a target model image.

An efficient framework for this is composed of two stages: (1) warping

(transforming) the try-on cloth to align with the pose and shape of the target

model, and (2) a texture transfer module to seamlessly integrate the warped

try-on cloth onto the target model image. Existing methods suffer from

artifacts and distortions in their try-on output. In this work, we present

SieveNet, a framework for robust image-based virtual try-on. Firstly, we

introduce a multi-stage coarse-to-fine warping network to better model

fine-grained intricacies (while transforming the try-on cloth) and train it

with a novel perceptual geometric matching loss. Next, we introduce a try-on

cloth conditioned segmentation mask prior to improve the texture transfer

network. Finally, we also introduce a dueling triplet loss strategy for

training the texture translation network which further improves the quality of

the generated try-on results. We present extensive qualitative and quantitative

evaluations of each component of the proposed pipeline and show significant

performance improvements against the current state-of-the-art method.

Two-Phase Object-Based Deep Learning for Multi-temporal SAR Image Change Detection

Xinzheng Zhang , Guo Liu , Ce Zhang , Peter M Atkinson , Xiaoheng Tan , Xin Jian , Xichuan Zhou , Yongming Li

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Change detection is one of the fundamental applications of synthetic aperture

radar (SAR) images. However, speckle noise present in SAR images has a strongly

negative effect on change detection. In this research, a novel two-phase

object-based deep learning approach is proposed for multi-temporal SAR image

change detection. Compared with traditional methods, the proposed approach

brings two main innovations. One is to classify all pixels into three

categories rather than two categories: unchanged pixels, changed pixels caused

by strong speckle (false changes), and changed pixels formed by real terrain

variation (real changes). The other is to group neighboring pixels into

superpixel objects so as to exploit local

spatial context. Two phases are designed in the methodology: 1) Generate

objects based on the simple linear iterative clustering (SLIC) algorithm, and

discriminate these objects into changed and unchanged classes using fuzzy

c-means (FCM) clustering and a deep PCANet. The prediction of this phase is the

set of changed and unchanged superpixels. 2) Deep learning on the pixel sets

over the changed superpixels only, obtained in the first phase, to discriminate

real changes from false changes. SLIC is employed again to obtain new

superpixels in the second phase. Low rank and sparse decomposition are applied

to these new superpixels to suppress speckle noise significantly. A further

clustering step is applied to these new superpixels via FCM. A new PCANet is

then trained to classify two kinds of changed superpixels to achieve the final

change maps. Numerical experiments demonstrate that, compared with benchmark

methods, the proposed approach can distinguish real changes from false changes

effectively with significantly reduced false alarm rates, and achieve up to

99.71% change detection accuracy using multi-temporal SAR imagery.

Registration made easy — standalone orthopedic navigation with HoloLens

Florentin Liebmann , Simon Roner , Marco von Atzigen , Florian Wanivenhaus , Caroline Neuhaus , José Spirig , Davide Scaramuzza , Reto Sutter , Jess Snedeker , Mazda Farshad , Philipp Fürnstahl

Comments: 6 pages, 5 figures, accepted at CVPR 2019 workshop on Computer Vision Applications for Mixed Reality Headsets ( this https URL )



Computer Vision and Pattern Recognition (cs.CV)

In surgical navigation, finding correspondence between preoperative plan and

intraoperative anatomy, the so-called registration task, is imperative. One

promising approach is to intraoperatively digitize anatomy and register it with

the preoperative plan. State-of-the-art commercial navigation systems implement

such approaches for pedicle screw placement in spinal fusion surgery. Although

these systems improve surgical accuracy, they are not the gold standard in clinical

practice. Besides economic reasons, this may be due to their difficult

integration into clinical workflows and unintuitive navigation feedback.

Augmented Reality has the potential to overcome these limitations.

Consequently, we propose a surgical navigation approach comprising

intraoperative surface digitization for registration and intuitive holographic

navigation for pedicle screw placement that runs entirely on the Microsoft

HoloLens. Preliminary results from phantom experiments suggest that the method

may meet clinical accuracy requirements.

FPCR-Net: Feature Pyramidal Correlation and Residual Reconstruction for Semi-supervised Optical Flow Estimation

Xiaolin Song , Jingyu Yang , Cuiling Lan , Wenjun Zeng

Comments: 8 pages, 8 figures, 6 tables



Computer Vision and Pattern Recognition (cs.CV)

Optical flow estimation is an important yet challenging problem in the field

of video analytics. The features of different semantic levels/layers of a

convolutional neural network can provide information of different granularity.

To exploit such flexible and comprehensive information, we propose a

semi-supervised Feature Pyramidal Correlation and Residual Reconstruction

Network (FPCR-Net) for optical flow estimation from frame pairs. It consists of

two main modules: pyramid correlation mapping and residual reconstruction. The

pyramid correlation mapping module takes advantage of the multi-scale

correlations of global/local patches by aggregating features of different

scales to form a multi-level cost volume. The residual reconstruction module

aims to reconstruct the sub-band high-frequency residuals of finer optical flow

in each stage. Based on the pyramid correlation mapping, we further propose a

correlation-warping-normalization (CWN) module to efficiently exploit the

correlation dependency. Experimental results show that the proposed scheme

achieves state-of-the-art performance, with improvements of 0.80, 1.15 and

0.10 in terms of average end-point error (AEE) against competing baseline

methods – FlowNet2, LiteFlowNet and PWC-Net on the Final pass of Sintel

dataset, respectively.
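The average end-point error (AEE) reported above is the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal sketch of that metric follows, assuming each flow field is given as a flat list of (u, v) displacement pairs; the function name and data layout are choices made for this example.

```python
import math

def average_endpoint_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    pred_flow, gt_flow: equal-length, non-empty lists of (u, v) tuples,
    one per pixel.
    """
    if len(pred_flow) != len(gt_flow) or not pred_flow:
        raise ValueError("flow fields must be non-empty and of equal length")
    total = sum(math.hypot(pu - gu, pv - gv)
                for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow))
    return total / len(pred_flow)

# One pixel is off by the vector (3, 4), the other is exact.
aee = average_endpoint_error([(3.0, 4.0), (0.0, 0.0)],
                             [(0.0, 0.0), (0.0, 0.0)])
```

Lower values are better, so an "improvement by 0.80" in AEE means a reduction of the error by that amount.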

Interpreting Galaxy Deblender GAN from the Discriminator's Perspective

Heyi Li , Yuewei Lin , Klaus Mueller , Wei Xu

Comments: 5 pages, 4 figures



Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Generative adversarial networks (GANs) are well known for their unsupervised

learning capabilities. A recent success in the field of astronomy is deblending

two overlapping galaxy images via a branched GAN model. However, it remains a

significant challenge to comprehend how the network works, which is

particularly difficult for non-expert users. This research focuses on behaviors

of one of the network’s major components, the Discriminator, which plays a

vital role but is often overlooked. Specifically, we enhance the Layer-wise

Relevance Propagation (LRP) scheme to generate a heatmap-based visualization.

We call this technique Polarized-LRP; it consists of two parts, i.e., positive

contribution heatmaps for ground truth images and negative contribution

heatmaps for generated images. Using the Galaxy Zoo dataset we demonstrate that

our method clearly reveals attention areas of the Discriminator when

differentiating generated galaxy images from ground truth images. To connect

the Discriminator’s impact on the Generator, we visualize the gradual changes

of the Generator across the training process. An interesting result we have

achieved there is the detection of a problematic data augmentation procedure

that would otherwise have remained hidden. We find that our proposed method serves

as a useful visual analytical tool for a deeper understanding of GAN models.

Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition

Wenxuan Wang , Yanwei Fu , Qiang Sun , Tao Chen , Chenjie Cao , Ziqi Zheng , Guoqiang Xu , Han Qiu , Yu-Gang Jiang , Xiangyang Xue

Comments: 17 pages, 18 figures



Computer Vision and Pattern Recognition (cs.CV)

Affective computing and cognitive theory are widely used in modern

human-computer interaction scenarios. Human faces, as the most prominent and

easily accessible features, have attracted great attention from researchers.

Since humans have rich emotions and developed musculature, there exist a lot of

fine-grained expressions in real-world applications. However, it is extremely

time-consuming to collect and annotate a large number of facial images,

which may even require psychologists to correctly categorize them. To the best

of our knowledge, the existing expression datasets are only limited to several

basic facial expressions, which are not sufficient to support our ambitions in

developing successful human-computer interaction systems. To this end, a novel

Fine-grained Facial Expression Database – F2ED is contributed in this paper,

and it includes more than 200k images with 54 facial expressions from 119

persons. Considering that uneven data distribution and a lack of

samples are common in real-world scenarios, we further evaluate several tasks of

few-shot expression learning by virtue of our F2ED, which are to recognize the

facial expressions given only a few training instances. These tasks mimic the human

ability to learn robust and general representations from few examples. To

address such few-shot tasks, we propose a unified task-driven framework –

Compositional Generative Adversarial Network (Comp-GAN), which learns to synthesize

facial images and thus augment the instances of few-shot expression classes.

Extensive experiments are conducted on F2ED and existing facial expression

datasets, i.e., JAFFE and FER2013, to validate the efficacy of our F2ED in

pre-training facial expression recognition network and the effectiveness of our

proposed approach Comp-GAN to improve the performance of few-shot recognition


Spatio-Temporal Ranked-Attention Networks for Video Captioning

Anoop Cherian , Jue Wang , Chiori Hori , Tim K. Marks

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generating video descriptions automatically is a challenging task that

involves a complex interplay between spatio-temporal visual features and

language models. Given that videos consist of spatial (frame-level) features

and their temporal evolutions, an effective captioning model should be able to

attend to these different cues selectively. To this end, we propose a

Spatio-Temporal and Temporo-Spatial (STaTS) attention model which, conditioned

on the language state, hierarchically combines spatial and temporal attention

to videos in two different orders: (i) a spatio-temporal (ST) sub-model, which

first attends to regions that have temporal evolution, then temporally pools

the features from these regions; and (ii) a temporo-spatial (TS) sub-model,

which first decides a single frame to attend to, then applies spatial attention

within that frame. We propose a novel LSTM-based temporal ranking function,

which we call ranked attention, for the ST model to capture action dynamics.

Our entire framework is trained end-to-end. We provide experiments on two

benchmark datasets: MSVD and MSR-VTT. Our results demonstrate the synergy

between the ST and TS modules, outperforming recent state-of-the-art methods.

Automatic Discovery of Political Meme Genres with Diverse Appearances

William Theisen , Joel Brogan , Pamela Bilo Thomas , Daniel Moreira , Pascal Phoa , Tim Weninger , Walter Scheirer

Comments: 16 pages, 10 figures



Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)

Forms of human communication are not static — we expect some evolution in

the way information is conveyed over time because of advances in technology.

One example of this phenomenon is the image-based meme, which has emerged as a

dominant form of political messaging in the past decade. While originally used

to spread jokes on social media, memes are now having an outsized impact on

public perception of world events. A significant challenge in automatic meme

analysis has been the development of a strategy to match memes from within a

single genre when the appearances of the images vary. Such variation is

especially common in memes exhibiting mimicry, for example when voters perform

a common hand gesture to signal their support for a candidate. In this paper we

introduce a scalable automated visual recognition pipeline for discovering

political meme genres of diverse appearance. This pipeline can ingest meme

images from a social network, apply computer vision-based techniques to extract

local features and index new images into a database, and then organize the

memes into related genres. To validate this approach, we perform a large case

study on the 2019 Indonesian Presidential Election using a new dataset of over

two million images collected from Twitter and Instagram. Results show that this

approach can discover new meme genres with visually diverse images that share

common stylistic elements, paving the way forward for further work in semantic

analysis and content attribution.

On-Device Information Extraction from Screenshots in form of tags

Sumit Kumar , Gopi Ramena , Manoj Goyal , Debi Mohanty , Ankur Agarwal , Benu Changmai , Sukumar Moharana Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Computation and Language (cs.CL); Information Retrieval (cs.IR)

We propose a method to make mobile screenshots easily searchable. In this

paper, we present the workflow in which we: 1) preprocessed a collection of

screenshots, 2) identified script present in image, 3) extracted unstructured

text from images, 4) identified language of the extracted text, 5) extracted

keywords from the text, 6) identified tags based on image features, 7) expanded

tag set by identifying related keywords, 8) inserted image tags with relevant

images after ranking and indexed them to make it searchable on device. We made

the pipeline which supports multiple languages and executed it on-device, which

addressed privacy concerns. We developed novel architectures for components in

the pipeline, optimized performance and memory for on-device computation. We

observed from experimentation that the solution developed can reduce overall

user effort and improve end user experience while searching, whose results are


Tracking of Micro Unmanned Aerial Vehicles: A Comparative Study

Fatih Gökçe

Comments: In proceedings of the International Conference on Artificial Intelligence and Applied Mathematics in Engineering (ICAIAME 2019), 13 pages, 9 Figures

Journal-ref: F. Gökçe. Tracking of Micro Unmanned Aerial Vehicles: A

Comparative Study. In Proceedings of the International Conference on

Artificial Intelligence and Applied Mathematics in Engineering, Antalya,

Turkey, 20-22 Apr. 2019, pp.374-386



Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Micro unmanned aerial vehicles (mUAVs) have become very common in recent years. As

a result of their widespread usage, when they are flown by hobbyists illegally,

crucial risks are imposed and such mUAVs need to be sensed by security systems.

Furthermore, the sensing of mUAVs is also essential for swarm robotics

research where the individuals in a flock of robots require systems to sense

and localize each other for coordinated operation. In order to obtain such

systems, there are studies to detect mUAVs utilizing different sensing mediums,

such as vision, infrared and sound signals, and small-scale radars. However,

there are still challenges that await handling in this field, such as

integrating tracking approaches into vision-based detection systems to

enhance accuracy and computational complexity. For this reason, in this study,

we combine various tracking approaches to a vision-based mUAV detection system

available in the literature, in order to evaluate different tracking approaches

in terms of accuracy, as well as to investigate the effect of such integration

on the computational cost.

Increasing the robustness of DNNs against image corruptions by playing the Game of Noise

Evgenia Rusak , Lukas Schott , Roland Zimmermann , Julian Bitterwolf , Oliver Bringmann , Matthias Bethge , Wieland Brendel Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG); Machine Learning (stat.ML)

The human visual system is remarkably robust against a wide range of

naturally occurring variations and corruptions like rain or snow. In contrast,

the performance of modern image recognition models strongly degrades when

evaluated on previously unseen corruptions. Here, we demonstrate that a simple

but properly tuned training with additive Gaussian and Speckle noise

generalizes surprisingly well to unseen corruptions, easily reaching the

previous state of the art on the corruption benchmark ImageNet-C (with

ResNet50) and on MNIST-C. We build on top of these strong baseline results and

show that an adversarial training of the recognition model against uncorrelated

worst-case noise distributions leads to an additional increase in performance.

This regularization can be combined with previously proposed defense methods

for further improvement.
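
The two noise types named in the abstract above can be illustrated with a minimal sketch. It assumes pixel intensities scaled to [0, 1] and uses the common definitions of additive Gaussian noise (x + n) and speckle noise (x + x·n) with n drawn from a zero-mean Gaussian. The function names, the toy image, and the value of `sigma` are hypothetical choices for illustration; the tuning used by the authors is not given in the abstract.

```python
import random


def add_gaussian_noise(pixels, sigma, rng):
    """Additive Gaussian noise: x + n with n ~ N(0, sigma^2), clipped to [0, 1]."""
    return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in pixels]


def add_speckle_noise(pixels, sigma, rng):
    """Speckle (multiplicative) noise: x + x * n with n ~ N(0, sigma^2), clipped to [0, 1]."""
    return [min(1.0, max(0.0, x + x * rng.gauss(0.0, sigma))) for x in pixels]


# Toy flattened image (hypothetical data), intensities in [0, 1].
rng = random.Random(0)
image = [0.0, 0.25, 0.5, 0.75, 1.0]
noisy_gauss = add_gaussian_noise(image, sigma=0.1, rng=rng)
noisy_speckle = add_speckle_noise(image, sigma=0.1, rng=rng)
```

Because speckle noise scales with the pixel value, a black pixel (0.0) is left unchanged by it, whereas additive Gaussian noise perturbs every pixel by the same distribution.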

Modality-Balanced Models for Visual Dialogue

Hyounghun Kim , Hao Tan , Mohit Bansal

Comments: AAAI 2020 (11 pages)



Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The Visual Dialog task requires a model to exploit both image and

conversational context information to generate the next response to the

dialogue. However, via manual analysis, we find that a large number of

conversational questions can be answered by only looking at the image without

any access to the context history, while others still need the conversation

context to predict the correct answers. We demonstrate that due to this reason,

previous joint-modality (history and image) models over-rely on and are more

prone to memorizing the dialogue history (e.g., by extracting certain keywords

or patterns in the context information), whereas image-only models are more

generalizable (because they cannot memorize or extract keywords from history)

and perform substantially better at the primary normalized discounted

cumulative gain (NDCG) task metric which allows multiple correct answers.

Hence, this observation encourages us to explicitly maintain two models, i.e.,

an image-only model and an image-history joint model, and combine their

complementary abilities for a more balanced multimodal model. We present

multiple methods for this integration of the two models, via ensemble and

consensus dropout fusion with shared parameters. Empirically, our models

achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and

high balance across metrics), and substantially outperform the winner of the

Visual Dialog challenge 2018 on most metrics.
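
For readers unfamiliar with the NDCG metric mentioned in the abstract above, the following is a minimal sketch of the textbook definition, in which graded relevance lets several candidate answers count as correct. The function names and the toy scores are assumptions for illustration, and the exact variant used by the Visual Dialog challenge (for example, any cut-off on the ranked list) may differ. The last lines show one simple way to ensemble two models by averaging their scores; this is only an illustrative stand-in, not the authors' fusion method.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a relevance list given in ranked order."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(scores, relevances):
    """NDCG of the ranking induced by `scores` against graded `relevances`."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical candidate-answer scores from two models and graded relevance labels.
image_only_scores = [0.2, 0.7, 0.1]
joint_scores = [0.6, 0.3, 0.1]
relevance = [1.0, 0.5, 0.0]

# Simple score-averaging ensemble (an assumption, not the paper's method).
ensemble_scores = [(a + b) / 2 for a, b in zip(image_only_scores, joint_scores)]
ensemble_ndcg = ndcg(ensemble_scores, relevance)
```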

Tethered Aerial Visual Assistance

Xuesu Xiao , Jan Dufek , Robin R. Murphy

Comments: Submitted to special issue of “Field and Service Robotics” of the Journal of Field Robotics (JFR). arXiv admin note: text overlap with arXiv:1904.00078



Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

In this paper, an autonomous tethered Unmanned Aerial Vehicle (UAV) is

developed into a visual assistant in a marsupial co-robots team, collaborating

with a tele-operated Unmanned Ground Vehicle (UGV) for robot operations in

unstructured or confined environments. These environments pose extreme

challenges to the remote tele-operator due to the lack of sufficient

situational awareness, mostly caused by the unstructuredness and confinement,

stationary and limited field-of-view and lack of depth perception from the

robot’s onboard cameras. To overcome these problems, a secondary tele-operated

robot is used in current practice, which acts as a visual assistant and provides

external viewpoints to overcome the perceptual limitations of the primary

robot’s onboard sensors. However, a second tele-operated robot requires extra

manpower and teamwork demand between primary and secondary operators. The

manually chosen viewpoints tend to be subjective and sub-optimal. Considering

these intricacies, we develop an autonomous tethered aerial visual assistant in

place of the secondary tele-operated robot and operator, to reduce the human-robot

ratio from 2:2 to 1:2. Using a fundamental viewpoint quality theory, a formal

risk reasoning framework, and a newly developed tethered motion suite, our

visual assistant is able to autonomously navigate to good-quality viewpoints in

a risk-aware manner through unstructured or confined spaces with a tether. The

developed marsupial co-robots team could improve tele-operation efficiency in

nuclear operations, bomb squad, disaster robots, and other domains with novel

tasks or highly occluded environments, by reducing manpower and teamwork

demand, and achieving better visual assistance quality with trustworthy

risk-aware motion.

DeepSUM++: Non-local Deep Neural Network for Super-Resolution of Unregistered Multitemporal Images

Andrea Bordone Molini , Diego Valsesia , Giulia Fracastoro , Enrico Magli

Comments: arXiv admin note: text overlap with arXiv:1907.06490



Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep learning methods for super-resolution of a remote sensing scene from

multiple unregistered low-resolution images have recently gained attention

thanks to a challenge proposed by the European Space Agency. This paper

presents an evolution of the winner of the challenge, showing how incorporating

non-local information in a convolutional neural network makes it possible to exploit

self-similar patterns that provide enhanced regularization of the

super-resolution problem. Experiments on the dataset of the challenge show

improved performance over the state-of-the-art, which does not exploit

non-local information.

Detection Method Based on Automatic Visual Shape Clustering for Pin-Missing Defect in Transmission Lines

Zhenbing Zhao , Hongyu Qi , Yincheng Qi , Ke Zhang , Yongjie Zhai , Wenqing Zhao Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV)

Bolts are the most numerous fasteners in transmission lines and are prone to

losing their split pins. How to realize the automatic pin-missing defect

detection for bolts in transmission lines so as to achieve timely and efficient

trouble shooting is a difficult problem and the long-term research target of

power systems. In this paper, an automatic detection model called Automatic

Visual Shape Clustering Network (AVSCNet) for pin-missing defect is

constructed. Firstly, an unsupervised clustering method for the visual shapes

of bolts is proposed and applied to construct a defect detection model which

can learn the difference of visual shape. Next, three deep convolutional neural

network optimization methods are used in the model: the feature enhancement,

feature fusion and region feature extraction. The defect detection results are

obtained by applying the regression calculation and classification to the

regional features. In this paper, the object detection model of different

networks is used to test the dataset of pin-missing defect constructed by the

aerial images of transmission lines from multiple locations, and it is

evaluated by various indicators and is fully verified. The results show that

our method achieves satisfactory detection performance.

Sideways: Depth-Parallel Training of Video Models

Mateusz Malinowski , Grzegorz Swirszcz , Joao Carreira , Viorica Patraucean Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We propose Sideways, an approximate backpropagation scheme for training video

models. In standard backpropagation, the gradients and activations at every

computation step through the model are temporally synchronized. The forward

activations need to be stored until the backward pass is executed, preventing

inter-layer (depth) parallelization. However, can we leverage smooth, redundant

input streams such as videos to develop a more efficient training scheme? Here,

we explore an alternative to backpropagation; we overwrite network activations

whenever new ones, i.e., from new frames, become available. Such a more gradual

accumulation of information from both passes breaks the precise correspondence

between gradients and activations, leading to theoretically more noisy weight

updates. Counter-intuitively, we show that Sideways training of deep

convolutional video networks not only still converges, but can also potentially

exhibit better generalization compared to standard synchronized


FedVision: An Online Visual Object Detection Platform Powered by Federated Learning

Yang Liu , Anbu Huang , Yun Luo , He Huang , Youzhi Liu , Yuanyuan Chen , Lican Feng , Tianjian Chen , Han Yu , Qiang Yang Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Visual object detection is a computer vision-based artificial intelligence

(AI) technique which has many practical applications (e.g., fire hazard

monitoring). However, due to privacy concerns and the high cost of transmitting

video data, it is highly challenging to build object detection models on

centrally stored large training datasets following the current approach.

Federated learning (FL) is a promising approach to resolve this challenge.

Nevertheless, there is currently no easy-to-use tool to enable computer

vision application developers who are not experts in federated learning to

conveniently leverage this technology and apply it in their systems. In this

paper, we report FedVision – a machine learning engineering platform to support

the development of federated learning powered computer vision applications. The

platform has been deployed through a collaboration between WeBank and Extreme

Vision to help customers develop computer vision-based safety monitoring

solutions in smart city applications. Over four months of usage, it has

achieved significant efficiency improvement and cost reduction while removing

the need to transmit sensitive data for three major corporate customers. To the

best of our knowledge, this is the first real application of FL in computer

vision-based tasks.

Spatiotemporal Camera-LiDAR Calibration: A Targetless and Structureless Approach

Chanoh Park , Peyman Moghadam , Soohwan Kim , Sridha Sridharan , Clinton Fookes

Comments: 8 pages, To appear, IEEE Robotics and Automation Letters 2020



Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

The demand for multimodal sensing systems for robotics is growing due to the

increase in robustness, reliability and accuracy offered by these systems.

These systems also need to be spatially and temporally co-registered to be

effective. In this paper, we propose a targetless and structureless

spatiotemporal camera-LiDAR calibration method. Our method combines a

closed-form solution with a modified structureless bundle adjustment where the

coarse-to-fine approach does not require an initial guess on the

spatiotemporal parameters. Also, as 3D features (structure) are calculated from

triangulation only, there is no need to have a calibration target or to match

2D features with the 3D point cloud which provides flexibility in the

calibration process and sensor configuration. We demonstrate the accuracy and

robustness of the proposed method through both simulation and real data

experiments using multiple sensor payload configurations mounted to hand-held,

aerial and legged robot systems. Also, qualitative results are given in the

form of a colorized point cloud visualization.

An adversarial learning framework for preserving users' anonymity in face-based emotion recognition

Vansh Narula , Zhangyang (Atlas) Wang , Theodora Chaspari Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Image and video-capturing technologies have permeated our every-day life.

Such technologies can continuously monitor individuals’ expressions in

real-life settings, affording us new insights into their emotional states and

transitions, thus paving the way to novel well-being and healthcare

applications. Yet, due to the strong privacy concerns, the use of such

technologies is met with strong skepticism, since current face-based emotion

recognition systems relying on deep learning techniques tend to preserve

substantial information related to the identity of the user, apart from the

emotion-specific information. This paper proposes an adversarial learning

framework which relies on a convolutional neural network (CNN) architecture

trained through an iterative procedure for minimizing identity-specific

information and maximizing emotion-dependent information. The proposed approach

is evaluated through emotion classification and face identification metrics,

and is compared against two CNNs, one trained solely for emotion recognition

and the other trained solely for face identification. Experiments are performed

using the Yale Face Dataset and Japanese Female Facial Expression Database.

Results indicate that the proposed approach can learn a convolutional

transformation for preserving emotion recognition accuracy and degrading face

identity recognition, providing a foundation toward privacy-aware emotion

recognition technologies.

Code-Bridged Classifier (CBC): A Low or Negative Overhead Defense for Making a CNN Classifier Robust Against Adversarial Attacks

Farnaz Behnia , Ali Mirzaeian , Mohammad Sabokrou , Sai Manoj , Tinoosh Mohsenin , Khaled N. Khasawneh , Liang Zhao , Houman Homayoun , Avesta Sasan

Comments: 6 pages, Accepted and to appear in ISQED 2020



Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

In this paper, we propose Code-Bridged Classifier (CBC), a framework for

making Convolutional Neural Networks (CNNs) robust against adversarial attacks

without increasing, or even while decreasing, the model's overall computational

complexity. More specifically, we propose a stacked encoder-convolutional

model, in which the input image is first encoded by the encoder module of a

denoising auto-encoder, and then the resulting latent representation (without

being decoded) is fed to a reduced complexity CNN for image classification. We

illustrate that this network not only is more robust to adversarial examples

but also has a significantly lower computational complexity when compared to

the prior art defenses.

Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning

Paola Cascante-Bonilla , Fuwen Tan , Yanjun Qi , Vicente Ordonez Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Semi-supervised learning aims to take advantage of a large amount of

unlabeled data to improve the accuracy of a model that only has access to a

small number of labeled examples. We propose curriculum labeling, an approach

that exploits pseudo-labeling for propagating labels to unlabeled samples in an

iterative and self-paced fashion. This approach is surprisingly simple and

effective and surpasses or is comparable with the best methods proposed in the

recent literature across all the standard benchmarks for image classification.

Notably, we obtain 94.91% accuracy on CIFAR-10 using only 4,000 labeled

samples, and 88.56% top-5 accuracy on Imagenet-ILSVRC using 128,000 labeled

samples. In contrast to prior works, our approach shows improvements even in a

more realistic scenario that leverages out-of-distribution unlabeled data


Artificial Intelligence

Fast Compliance Checking with General Vocabularies

P. A. Bonatti , L. Ioffredo , I. M. Petrova , L. Sauro

Comments: arXiv admin note: substantial text overlap with arXiv:2001.05390



Artificial Intelligence (cs.AI)

We address the problem of complying with the GDPR while processing and

transferring personal data on the web. For this purpose we introduce an

extensible profile of OWL2 for representing data protection policies. With this

language, a company’s data usage policy can be checked for compliance with data

subjects’ consent and with a formalized fragment of the GDPR by means of

subsumption queries. The outer structure of the policies is restricted in order

to make compliance checking highly scalable, as required when processing

high-frequency data streams or large data volumes. However, the vocabularies

for specifying policy properties can be chosen rather freely from expressive

Horn fragments of OWL2. We exploit IBQ reasoning to integrate specialized

reasoners for the policy language and the vocabulary’s language. Our

experiments show that this approach significantly improves performance.

Visual Simplified Characters' Emotion Emulator Implementing OCC Model

Ana Lilia Laureano-Cruces , Laura Hernández-Domínguez , Martha Mora-Torres , Juan-Manuel Torres-Moreno , Jaime Enrique Cabrera-López

Comments: 7 pages, 14 figures, 2 tables

Journal-ref: CGST Conference on Computer Science and Engineering, Istanbul,

Turkey, 19-21 December 2011



Artificial Intelligence (cs.AI)

In this paper, we present a visual emulator of the emotions seen in

characters in stories. This system is based on a simplified view of the

cognitive structure of emotions proposed by Ortony, Clore and Collins (OCC

Model). The goal of this paper is to provide a visual platform that allows us

to observe changes in the characters’ different emotions, and the intricate

interrelationships between: 1) each character’s emotions, 2) their affective

relationships and actions, 3) the events that take place in the development of

a plot, and 4) the objects of desire that make up the emotional map of any

story. This tool was tested on stories with a contrasting variety of emotional

and affective environments: Othello, Twilight, and Harry Potter, behaving

sensibly and in keeping with the atmosphere in which the characters were


A Critical Look at the Applicability of Markov Logic Networks for Music Signal Analysis

Johan Pauwels , György Fazekas , Mark B. Sandler

Comments: Accepted for presentation at the Ninth International Workshop on Statistical Relational AI (StarAI 2020) at the 34th AAAI Conference on Artificial Intelligence (AAAI) in New York, on February 7th 2020



Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In recent years, Markov logic networks (MLNs) have been proposed as a

potentially useful paradigm for music signal analysis. Because all hidden

Markov models can be reformulated as MLNs, the latter can provide an

all-encompassing framework that reuses and extends previous work in the field.

However, just because it is theoretically possible to reformulate previous work

as MLNs, does not mean that it is advantageous. In this paper, we analyse some

proposed examples of MLNs for musical analysis and consider their practical

disadvantages when compared to formulating the same musical dependence

relationships as (dynamic) Bayesian networks. We argue that a number of

practical hurdles such as the lack of support for sequences and for arbitrary

continuous probability distributions make MLNs less than ideal for the proposed

musical applications, both in terms of ease of formulation and computational

requirements due to their required inference algorithms. These conclusions are

not specific to music, but apply to other fields as well, especially when

sequential data with continuous observations is involved. Finally, we show that

the ideas underlying the proposed examples can be expressed perfectly well in

the more commonly used framework of (dynamic) Bayesian networks.

Plato Dialogue System: A Flexible Conversational AI Research Platform

Alexandros Papangelis , Mahdi Namazifar , Chandra Khatri , Yi-Chia Wang , Piero Molino , Gokhan Tur Subjects : Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

As the field of Spoken Dialogue Systems and Conversational AI grows, so does

the need for tools and environments that abstract away implementation details

in order to expedite the development process, lower the barrier of entry to the

field, and offer a common test-bed for new ideas. In this paper, we present

Plato, a flexible Conversational AI platform written in Python that supports

any kind of conversational agent architecture, from standard architectures to

architectures with jointly-trained components, single- or multi-party

interactions, and offline or online training of any conversational agent

component. Plato has been designed to be easy to understand and debug and is

agnostic to the underlying learning frameworks that train each component.

Modality-Balanced Models for Visual Dialogue

Hyounghun Kim , Hao Tan , Mohit Bansal

Comments: AAAI 2020 (11 pages)



Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The Visual Dialog task requires a model to exploit both image and

conversational context information to generate the next response to the

dialogue. However, via manual analysis, we find that a large number of

conversational questions can be answered by only looking at the image without

any access to the context history, while others still need the conversation

context to predict the correct answers. We demonstrate that due to this reason,

previous joint-modality (history and image) models over-rely on and are more

prone to memorizing the dialogue history (e.g., by extracting certain keywords

or patterns in the context information), whereas image-only models are more

generalizable (because they cannot memorize or extract keywords from history)

and perform substantially better at the primary normalized discounted

cumulative gain (NDCG) task metric which allows multiple correct answers.

Hence, this observation encourages us to explicitly maintain two models, i.e.,

an image-only model and an image-history joint model, and combine their

complementary abilities for a more balanced multimodal model. We present

multiple methods for this integration of the two models, via ensemble and

consensus dropout fusion with shared parameters. Empirically, our models

achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and

high balance across metrics), and substantially outperform the winner of the

Visual Dialog challenge 2018 on most metrics.

Expecting the Unexpected: Developing Autonomous-System Design Principles for Reacting to Unpredicted Events and Conditions

Assaf Marron , Lior Limonad , Sarah Pollack , David Harel

Comments: 6 pages; 1 figure



Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Robotics (cs.RO)

When developing autonomous systems, engineers and other stakeholders make

great effort to prepare the system for all foreseeable events and conditions.

However, these systems are still bound to encounter events and conditions that

were not considered at design time. For reasons like safety, cost, or ethics,

it is often highly desired that these new situations be handled correctly upon

first encounter. In this paper we first justify our position that there will

always exist unpredicted events and conditions, driven among others by: new

inventions in the real world; the diversity of world-wide system deployments

and uses; and, the non-negligible probability that multiple seemingly unlikely

events, which may be neglected at design time, will not only occur, but occur

together. We then argue that despite this unpredictability property, handling

these events and conditions is indeed possible. Hence, we offer and exemplify

design principles that when applied in advance, can enable systems to deal, in

the future, with unpredicted circumstances. We conclude with a discussion of

how this work and a broader theoretical study of the unexpected can contribute

toward a foundation of engineering principles for developing trustworthy

next-generation autonomous systems.

User-in-the-loop Adaptive Intent Detection for Instructable Digital Assistant

Nicolas Lair , Clément Delgrange , David Mugisha , Jean-Michel Dussoux , Pierre-Yves Oudeyer , Peter Ford Dominey

Comments: To be published as a conference paper in the proceedings of IUI’20

Journal-ref: 25th International Conference on Intelligent User Interfaces (IUI

’20), March 17–20, 2020, Cagliari, Italy



Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

People are becoming increasingly comfortable using Digital Assistants (DAs)

to interact with services or connected objects. However, for non-programming

users, the available possibilities for customizing their DA are limited and do

not include the possibility of teaching the assistant new tasks. To make the

most of the potential of DAs, users should be able to customize assistants by

instructing them through Natural Language (NL). To provide such

functionalities, NL interpretation in traditional assistants should be

improved: (1) The intent identification system should be able to recognize new

forms of known intents, and to acquire new intents as they are expressed by the

user. (2) In order to be adaptive to novel intents, the Natural Language

Understanding module should be sample efficient, and should not rely on a

pretrained model. Rather, the system should continuously collect the training

data as it learns new intents from the user. In this work, we propose AidMe

(Adaptive Intent Detection in Multi-Domain Environments), a user-in-the-loop

adaptive intent detection framework that allows the assistant to adapt to its

user by learning his intents as their interaction progresses. AidMe builds its

repertoire of intents and collects data to train a model of semantic similarity

evaluation that can discriminate between the learned intents and autonomously

discover new forms of known intents. AidMe addresses two major issues – intent

learning and user adaptation – for instructable digital assistants. We

demonstrate the capabilities of AidMe as a standalone system by comparing it

with a one-shot learning system and a pretrained NLU module through simulations

of interactions with a user. We also show how AidMe can smoothly integrate into

an existing instructable digital assistant.
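
As a rough illustration of the user-in-the-loop idea described above, the sketch below keeps a growing store of taught utterances per intent, matches a new utterance to the most similar stored one, and returns nothing when no stored example is similar enough, so that the user can be asked. Token-overlap (Jaccard) similarity, the 0.5 threshold, and the names `IntentMemory`, `teach`, and `detect` are assumptions made for this example. AidMe itself trains a semantic similarity model, which is not reproduced here.

```python
def tokens(utterance):
    """Lower-cased set of whitespace-separated tokens."""
    return set(utterance.lower().split())


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for a learned model."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class IntentMemory:
    """Hypothetical intent store that grows as the user teaches new intents."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold   # assumed cut-off for "known intent"
        self.examples = {}           # intent name -> list of token sets

    def teach(self, utterance, intent):
        """Record an utterance as an example of the given intent."""
        self.examples.setdefault(intent, []).append(tokens(utterance))

    def detect(self, utterance):
        """Return the best-matching intent, or None if nothing is close enough."""
        query = tokens(utterance)
        best_intent, best_score = None, 0.0
        for intent, stored in self.examples.items():
            for example in stored:
                score = jaccard(query, example)
                if score > best_score:
                    best_intent, best_score = intent, score
        return best_intent if best_score >= self.threshold else None


memory = IntentMemory()
memory.teach("turn on the kitchen light", "light_on")
known = memory.detect("please turn on the light")   # matches the taught intent
unknown = memory.detect("play some jazz")            # None: ask the user, then teach
```

Returning `None` is the point at which a real assistant would ask the user what was meant and call `teach` with the answer, which is how the repertoire of intents grows over time.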

Information Retrieval

On-Device Information Extraction from Screenshots in form of tags

Sumit Kumar , Gopi Ramena , Manoj Goyal , Debi Mohanty , Ankur Agarwal , Benu Changmai , Sukumar Moharana Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Computation and Language (cs.CL); Information Retrieval (cs.IR)

We propose a method to make mobile screenshots easily searchable. In this

paper, we present the workflow in which we: 1) preprocessed a collection of

screenshots, 2) identified the script present in the image, 3) extracted unstructured

text from images, 4) identified the language of the extracted text, 5) extracted

keywords from the text, 6) identified tags based on image features, 7) expanded

tag set by identifying related keywords, 8) inserted image tags with relevant

images after ranking and indexed them to make it searchable on device. We made

the pipeline which supports multiple languages and executed it on-device, which

addressed privacy concerns. We developed novel architectures for components in

the pipeline, optimized performance and memory for on-device computation. We

observed from experimentation that the solution developed can reduce overall

user effort and improve end user experience while searching, whose results are


A Critical Look at the Applicability of Markov Logic Networks for Music Signal Analysis

Johan Pauwels , György Fazekas , Mark B. Sandler

Comments: Accepted for presentation at the Ninth International Workshop on Statistical Relational AI (StarAI 2020) at the 34th AAAI Conference on Artificial Intelligence (AAAI) in New York, on February 7th 2020



Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In recent years, Markov logic networks (MLNs) have been proposed as a

potentially useful paradigm for music signal analysis. Because all hidden

Markov models can be reformulated as MLNs, the latter can provide an

all-encompassing framework that reuses and extends previous work in the field.

However, just because it is theoretically possible to reformulate previous work

as MLNs, does not mean that it is advantageous. In this paper, we analyse some

proposed examples of MLNs for musical analysis and consider their practical

disadvantages when compared to formulating the same musical dependence

relationships as (dynamic) Bayesian networks. We argue that a number of

practical hurdles such as the lack of support for sequences and for arbitrary

continuous probability distributions make MLNs less than ideal for the proposed

musical applications, both in terms of ease of formulation and computational

requirements due to their required inference algorithms. These conclusions are

not specific to music, but apply to other fields as well, especially when

sequential data with continuous observations is involved. Finally, we show that

the ideas underlying the proposed examples can be expressed perfectly well in

the more commonly used framework of (dynamic) Bayesian networks.

Computation and Language

A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings

Iker García , Rodrigo Agerri , German Rigau Subjects : Computation and Language (cs.CL)

This paper presents a new technique for creating monolingual and

cross-lingual meta-embeddings. Our method integrates multiple word embeddings

created from complementary techniques, textual sources, knowledge bases and

languages. Existing word vectors are projected to a common semantic space using

linear transformations and averaging. With our method the resulting

meta-embeddings maintain the dimensionality of the original embeddings without

losing information while dealing with the out-of-vocabulary problem. An

extensive empirical evaluation demonstrates the effectiveness of our technique

with respect to previous work on various intrinsic and extrinsic multilingual

evaluations, obtaining competitive results for Semantic Textual Similarity and

state-of-the-art performance for word similarity and POS tagging (English and

Spanish). The resulting cross-lingual meta-embeddings also exhibit excellent

cross-lingual transfer learning capabilities. In other words, we can leverage

pre-trained source embeddings from a resource-rich language in order to improve

the word representations for under-resourced languages.
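
The projection-and-averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each source embedding already comes with a linear map into the common space (how those maps are learned is not shown), and the names `apply_linear_map` and `meta_embedding` are hypothetical. Averaging only over the sources that contain a word is one simple way to handle out-of-vocabulary words.

```python
def apply_linear_map(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def meta_embedding(word, sources):
    """Average the projected vectors of `word` over all sources that contain it.

    sources: list of (embedding_dict, projection_matrix) pairs, where each
    projection matrix maps that source into the shared space (assumed given).
    """
    projected = [
        apply_linear_map(matrix, embedding[word])
        for embedding, matrix in sources
        if word in embedding
    ]
    if not projected:
        raise KeyError(word)  # the word is missing from every source
    dim = len(projected[0])
    return [sum(vec[i] for vec in projected) / len(projected) for i in range(dim)]


# Toy data: two 2-dimensional sources with different vocabularies.
identity = [[1.0, 0.0], [0.0, 1.0]]
swap_axes = [[0.0, 1.0], [1.0, 0.0]]
source_a = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
source_b = {"cat": [0.0, 3.0]}  # "dog" is out of vocabulary here
sources = [(source_a, identity), (source_b, swap_axes)]

cat_vector = meta_embedding("cat", sources)  # averaged over both sources
dog_vector = meta_embedding("dog", sources)  # falls back to source_a only
```

The output keeps the dimensionality of the common space, in line with the abstract's claim that the meta-embeddings maintain the dimensionality of the original embeddings.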

Modality-Balanced Models for Visual Dialogue

Hyounghun Kim , Hao Tan , Mohit Bansal

Comments: AAAI 2020 (11 pages)



Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The Visual Dialog task requires a model to exploit both image and

conversational context information to generate the next response to the

dialogue. However, via manual analysis, we find that a large number of

conversational questions can be answered by only looking at the image without

any access to the context history, while others still need the conversation

context to predict the correct answers. We demonstrate that due to this reason,

previous joint-modality (history and image) models over-rely on and are more

prone to memorizing the dialogue history (e.g., by extracting certain keywords

or patterns in the context information), whereas image-only models are more

generalizable (because they cannot memorize or extract keywords from history)

and perform substantially better at the primary normalized discounted

cumulative gain (NDCG) task metric which allows multiple correct answers.

Hence, this observation encourages us to explicitly maintain two models, i.e.,

an image-only model and an image-history joint model, and combine their

complementary abilities for a more balanced multimodal model. We present

multiple methods for this integration of the two models, via ensemble and

consensus dropout fusion with shared parameters. Empirically, our models

achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and

high balance across metrics), and substantially outperform the winner of the

Visual Dialog challenge 2018 on most metrics.
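
For readers unfamiliar with the NDCG metric mentioned above, the following sketch shows the standard textbook computation for one ranked list of candidate answers with graded relevance scores. It is a generic version: the Visual Dialog challenge may apply its own truncation of the ranked list, which is not modelled here, and the function names are chosen for this example.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores given in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(ranked_relevances):
    """DCG normalized by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


# Several answers can be partly correct, hence graded relevance values.
perfect_ranking = ndcg([1.0, 0.5, 0.0])
swapped_ranking = ndcg([0.0, 1.0])  # the only relevant answer is ranked second
```

Because relevance is graded rather than binary, a model can score well by ranking any of several acceptable answers highly, which is the property the abstract refers to.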

A Hybrid Solution to Learn Turn-Taking in Multi-Party Service-based Chat Groups

Maira Gatti de Bayser , Melina Alberio Guerra , Paulo Cavalin , Claudio Pinhanez

Comments: arXiv admin note: text overlap with arXiv:1907.02090



Subjects: Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL)

To predict the next most likely participant to interact in a multi-party

conversation is a difficult problem. In a text-based chat group, the only

information available is the sender, the content of the text and the dialogue

history. In this paper we present our study on how this information can be

used for the prediction task through a corpus and an architecture that integrates

turn-taking classifiers based on Maximum Likelihood Expectation (MLE),

Convolutional Neural Networks (CNN) and Finite State Automata (FSA). The corpus

is a synthetic adaptation of the Multi-Domain Wizard-of-Oz dataset (MultiWOZ)

to a multiple travel service-based bots scenario with dialogue errors and was

created to simulate user’s interaction and evaluate the architecture. We

present experimental results which show that the CNN approach achieves better

performance than the baseline with an accuracy of 92.34%, but the integrated

solution with MLE, CNN and FSA achieves even better performance, with 95.65%.
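
To make the count-based end of such a pipeline concrete, here is a minimal sketch of a maximum-likelihood next-speaker predictor. The abstract does not say how the MLE classifier is built, so conditioning only on the previous sender is an assumption, as are the function names and the toy speaker labels. The CNN and FSA components are not shown.

```python
from collections import Counter, defaultdict


def fit_next_speaker(dialogues):
    """Count how often each speaker follows each other speaker."""
    counts = defaultdict(Counter)
    for speakers in dialogues:
        for previous, following in zip(speakers, speakers[1:]):
            counts[previous][following] += 1
    return counts


def next_speaker_probability(counts, previous, following):
    """Relative-frequency (maximum-likelihood) estimate of P(following | previous)."""
    total = sum(counts.get(previous, Counter()).values())
    return counts[previous][following] / total if total else 0.0


def predict_next(counts, previous):
    """Most likely next speaker, or None if `previous` was never followed by anyone."""
    if previous not in counts:
        return None
    return counts[previous].most_common(1)[0][0]


# Toy sender sequences from two hypothetical chat groups.
dialogues = [
    ["user", "hotel_bot", "user", "taxi_bot"],
    ["user", "hotel_bot", "user", "hotel_bot"],
]
counts = fit_next_speaker(dialogues)
likely_next = predict_next(counts, "user")
```

A predictor like this ignores message content entirely, which is presumably why the abstract combines it with content-based and rule-based components.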

RobBERT: a Dutch RoBERTa-based Language Model

Pieter Delobelle , Thomas Winters , Bettina Berendt

Comments: 7 pages, 2 tables



Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Pre-trained language models have been dominating the field of natural

language processing in recent years, and have led to significant performance

gains for various complex natural language tasks. One of the most prominent

pre-trained language models is BERT (Bidirectional Encoder Representations from Transformers),

which was released as an English as well as a multilingual version. Although

multilingual BERT performs well on many tasks, recent studies showed that BERT

models trained on a single language significantly outperform the multilingual

results. Training a Dutch BERT model thus has a lot of potential for a wide

range of Dutch NLP tasks. While previous approaches have used earlier

implementations of BERT to train their Dutch BERT, we used RoBERTa, a robustly

optimized BERT approach, to train a Dutch language model called RobBERT. We

show that RobBERT improves state of the art results in Dutch-specific language

tasks, and also outperforms other existing Dutch BERT-based models in sentiment

analysis. These results indicate that RobBERT is a powerful pre-trained model

for fine-tuning for a large variety of Dutch language tasks. We publicly

release this pre-trained model in hope of supporting further downstream Dutch

NLP applications.

Multi-step Joint-Modality Attention Network for Scene-Aware Dialogue System

Yun-Wei Chu , Kuan-Yen Lin , Chao-Chun Hsu , Lun-Wei Ku

Comments: DSTC8 collocated with AAAI2020



Computation and Language (cs.CL)

Understanding dynamic scenes and dialogue contexts in order to converse with

users has been challenging for multimodal dialogue systems. The 8-th Dialog

System Technology Challenge (DSTC8) proposed an Audio Visual Scene-Aware Dialog

(AVSD) task, which contains multiple modalities including audio, vision, and

language, to evaluate how dialogue systems understand different modalities and

respond to users. In this paper, we propose a multi-step joint-modality

attention network (JMAN) based on a recurrent neural network (RNN) to reason on

videos. Our model performs a multi-step attention mechanism and jointly

considers both visual and textual representations in each reasoning process to

better integrate information from the two different modalities. Compared to the

baseline released by the AVSD organizers, our model achieves relative improvements of

12.1% and 22.4% on the ROUGE-L and CIDEr scores, respectively.

Plato Dialogue System: A Flexible Conversational AI Research Platform

Alexandros Papangelis , Mahdi Namazifar , Chandra Khatri , Yi-Chia Wang , Piero Molino , Gokhan Tur Subjects : Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

As the field of Spoken Dialogue Systems and Conversational AI grows, so does

the need for tools and environments that abstract away implementation details

in order to expedite the development process, lower the barrier of entry to the

field, and offer a common test-bed for new ideas. In this paper, we present

Plato, a flexible Conversational AI platform written in Python that supports

any kind of conversational agent architecture, from standard architectures to

architectures with jointly-trained components, single- or multi-party

interactions, and offline or online training of any conversational agent

component. Plato has been designed to be easy to understand and debug and is

agnostic to the underlying learning frameworks that train each component.

Supervised Speaker Embedding De-Mixing in Two-Speaker Environment

Yanpei Shi , Thomas Hain

Comments: Submitted to Odyssey 2020



Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

In this work, a speaker embedding de-mixing approach is proposed. Instead of

separating the two-speaker signal in signal space, as speech source separation does,

the proposed approach separates the different speaker properties of a two-speaker

signal in embedding space. The proposed approach contains two steps. In step

one, the clean speaker embeddings are learned and collected by a residual TDNN

based network. In step two, the two-speaker signal and the embedding of one of

the speakers are input to a speaker embedding de-mixing network. The de-mixing

network is trained with a reconstruction loss to generate the embedding of the

other speaker. Speaker identification accuracy on the de-mixed speaker

embeddings is used to evaluate the quality of the obtained embeddings.

Experiments are done on two kinds of data: artificially augmented two-speaker data

(TIMIT) and real-world recordings of two-speaker data (MC-WSJ). Six different

speaker embedding de-mixing architectures are investigated. Compared with the

speaker identification accuracy on the clean speaker embeddings (98.5%), the

obtained results show that one of the speaker embedding de-mixing architectures

obtains close performance, reaching 96.9% test accuracy on TIMIT when the SNR

between the target speaker and interfering speaker is 5 dB. More surprisingly,

we found choosing a simple subtraction as the embedding de-mixing function

could obtain the second best performance, reaching 95.2% test accuracy.
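
The subtraction variant mentioned above is simple enough to sketch. The example assumes, purely for illustration, that the embedding of a two-speaker signal is roughly the sum of the two clean speaker embeddings plus noise; the paper's embeddings come from trained networks and need not behave this way. All names and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def demix_by_subtraction(mixed, known):
    """Estimate the other speaker's embedding by subtracting the known one."""
    return [m - k for m, k in zip(mixed, known)]


def identify(embedding, enrolled):
    """Name of the enrolled speaker whose embedding is most similar."""
    return max(enrolled, key=lambda name: cosine(embedding, enrolled[name]))


# Toy clean embeddings for three enrolled speakers.
enrolled = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
    "carol": [0.0, 0.0, 1.0],
}
# Assumed embedding of an alice + bob mixture, with a little noise.
mixed = [0.9, 1.1, 0.1]
other_speaker = identify(demix_by_subtraction(mixed, enrolled["alice"]), enrolled)
```

Identifying the speaker from the de-mixed vector mirrors the evaluation described in the abstract, where identification accuracy on the de-mixed embeddings is the quality measure.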

Distributed, Parallel, and Cluster Computing

Consistency of Proof-of-Stake Blockchains with Concurrent Honest Slot Leaders

Aggelos Kiayias , Saad Quader , Alexander Russell

Comments: Initial submission. arXiv admin note: text overlap with arXiv:1911.10187



Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Discrete Mathematics (cs.DM)

We improve the fundamental security threshold of Proof-of-Stake (PoS)

blockchain protocols, reflecting for the first time the positive effect of

rounds with multiple honest leaders. Current analyses of the longest-chain rule

in PoS blockchain protocols reduce consistency to the dynamics of an abstract,

round-based block creation process determined by three probabilities: $p_A$,

the probability that a round has at least one adversarial leader; $p_h$, the

probability that a round has a single honest leader; and $p_H$, the probability

that a round has multiple, but honest, leaders. We present a consistency

analysis that achieves the optimal threshold $p_h + p_H > p_A$. This is a first

in the literature and can be applied to both the simple synchronous setting and

the setting with bounded delays. We also achieve the optimal consistency error

$e^{-\Theta(k)}$, $k$ being the confirmation time.

The consistency analyses in Ouroboros Praos (Eurocrypt 2018) and Genesis (CCS

2018) assume that $p_h - p_H > p_A$; the analyses in Sleepy Consensus

(Asiacrypt 2017) and Snow White (Fin. Crypto 2019) assume that $p_h > p_A$.

Thus existing analyses either incur a penalty for multiply-honest rounds, or

treat them neutrally. In addition, previous analyses completely break down when

$p_h < p_A$. Our new results can be directly applied to improve the consistency

of these existing protocols. We emphasize that these thresholds determine the

critical tradeoff between honest majority, network delays, and consistency


We complement our results with a consistency analysis in the setting where

uniquely honest slots are rare, even letting $p_h = 0$, under the added

assumption that honest players adopt a consistent chain selection rule. Our

analysis provides a direct connection between the Ouroboros analysis focusing

on “relative margin” and the Sleepy analysis focusing on “strong pivots.”
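
The three threshold conditions quoted in the abstract are easy to compare on concrete numbers. The sketch below only evaluates the stated inequalities for given round probabilities; it says nothing about how those probabilities arise from stake distribution or network delay. The function name, the dictionary keys, and the example values are made up for illustration.

```python
def satisfied_thresholds(p_A, p_h, p_H):
    """Evaluate the threshold conditions quoted in the abstract.

    p_A: probability that a round has at least one adversarial leader
    p_h: probability that a round has a single honest leader
    p_H: probability that a round has multiple, all honest, leaders
    """
    return {
        "this_work": p_h + p_H > p_A,          # multiply-honest rounds help
        "praos_genesis": p_h - p_H > p_A,      # multiply-honest rounds are penalized
        "sleepy_snow_white": p_h > p_A,        # multiply-honest rounds are neutral
    }


# Example values (assumed): many rounds have several honest leaders.
result = satisfied_thresholds(p_A=0.3, p_h=0.25, p_H=0.2)
```

With these example values only the new condition holds, which illustrates the regime $p_h < p_A$ where, according to the abstract, previous analyses break down.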

Dynamic Byzantine Reliable Broadcast [Technical Report]

Rachid Guerraoui , Jovan Komatovic , Dragos-Adrian Seredinschi

Comments: This work has been supported in part by a grant from Interchain Foundation



Distributed, Parallel, and Cluster Computing (cs.DC)

Reliable broadcast is a powerful primitive guaranteeing that, intuitively,

all processes in a distributed system deliver the same set of messages. There

is a twofold reason why this primitive is appealing: (i) we can implement it

deterministically in a completely asynchronous environment, unlike stronger

primitives like consensus and total-order broadcast, and yet (ii) it is

powerful enough to implement numerous useful applications like payment systems.

The problem we tackle in this paper is that of dynamic reliable broadcast,

i.e., enabling processes to join or leave the system. This is a desirable

property for long-lived applications supposed to be highly available, yet has

been precluded in previous asynchronous reliable broadcast protocols.

We introduce the first specification of a dynamic Byzantine reliable

broadcast (DBRB) primitive that is amenable to an asynchronous implementation.

Indeed, we present an algorithm that implements this specification in an

asynchronous environment. Our algorithm ensures that if any correct process in

the system broadcasts (resp. delivers) a message, then every correct process in

the system delivers that message, or leaves the system. We assume that, at any

point in time, 2/3 of the processes in the system are correct, which is tight.

We also prove that even if only one process in the system can fail—and it can

fail by merely crashing—then it is impossible to implement a stronger

primitive, ensuring that if any correct process in the system broadcasts (resp.

delivers) a message, then every correct process in the system delivers that

message, including those that eventually leave.

FRaZ: A Generic High-Fidelity Fixed-Ratio Lossy Compression Framework for Scientific Floating-point Data

Robert Underwood , Sheng Di , Jon C. Calhoun , Franck Cappello

Comments: 12 pages



Distributed, Parallel, and Cluster Computing (cs.DC)

With ever-increasing volumes of scientific floating-point data being produced

by high-performance computing applications, significantly reducing scientific

floating-point data size is critical, and error-controlled lossy compressors

have been developed for years. None of the existing scientific floating-point

lossy data compressors, however, support effective fixed-ratio lossy

compression. Yet fixed-ratio lossy compression for scientific floating-point

data not only compresses to the requested ratio but also respects a

user-specified error bound with higher fidelity. In this paper, we present

FRaZ: a generic fixed-ratio lossy compression framework respecting

user-specified error constraints. The contribution is twofold. (1) We develop

an efficient iterative approach to accurately determine the appropriate error

settings for different lossy compressors based on target compression ratios.

(2) We perform a thorough performance and accuracy evaluation for our proposed

fixed-ratio compression framework with multiple state-of-the-art

error-controlled lossy compressors, using several real-world scientific

floating-point datasets from different domains. Experiments show that FRaZ

effectively identifies the optimum error setting in the entire error setting

space of any given lossy compressor. While fixed-ratio lossy compression is

slower than fixed-error compression, it provides an important new lossy

compression technique for users of very large scientific floating-point
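
The idea of searching for the error setting that yields a requested compression ratio can be illustrated with a toy example. Everything below is an assumption made for the sketch: the "compressor" is uniform quantization followed by `zlib`, and the search is a plain bisection in log space, which presumes the ratio grows roughly monotonically with the error bound. Neither is FRaZ's compressor interface or search procedure.

```python
import math
import struct
import zlib


def compress(values, error_bound):
    """Toy error-bounded compressor: quantize to a grid of width 2*eb, then deflate."""
    step = 2.0 * error_bound
    quantized = [round(v / step) for v in values]
    return zlib.compress(struct.pack(f"{len(quantized)}q", *quantized))


def compression_ratio(values, error_bound):
    """Size of the raw 8-byte doubles divided by the compressed size."""
    return (8 * len(values)) / len(compress(values, error_bound))


def find_error_bound(values, target_ratio, lo=1e-8, hi=1.0, steps=30):
    """Bisection (in log space) for an error bound whose ratio is near the target."""
    def gap(eb):
        return abs(compression_ratio(values, eb) - target_ratio)

    best = min((lo, hi), key=gap)
    for _ in range(steps):
        mid = math.sqrt(lo * hi)          # geometric midpoint
        if gap(mid) < gap(best):
            best = mid
        if compression_ratio(values, mid) < target_ratio:
            lo = mid                      # need a looser bound for a higher ratio
        else:
            hi = mid
    return best


data = [math.sin(0.01 * i) for i in range(2000)]
error_bound = find_error_bound(data, target_ratio=10.0)
achieved = compression_ratio(data, error_bound)
```

Each probe requires a full compression pass, which is consistent with the abstract's remark that fixed-ratio compression is slower than fixed-error compression.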


Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates

Ping Zhou (1), Zhen Yu (2), Jingyi Ma (3), Maozai Tian (2) ((1) Beijing Information Science and Technology University, (2) Renmin University of China, (3) Central University of Finance and Economics)

Comments: 25 pages, 11 figures



Subjects: Methodology (stat.ME); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Machine Learning (stat.ML)

Distributed statistical inference has recently attracted immense attention.

Herein, we study the asymptotic efficiency of the maximum likelihood estimator

(MLE), the one-step MLE, and the aggregated estimating equation estimator for

generalized linear models with a diverging number of covariates. Then a novel

method is proposed to obtain an asymptotically efficient estimator for

large-scale distributed data by two rounds of communication between local

machines and the central server. The assumption on the number of machines in

this paper is more relaxed and thus practical for real-world applications.

Simulations and a case study demonstrate the satisfactory finite-sample

performance of the proposed estimators.


Gradient descent with momentum — to accelerate or to super-accelerate?

Goran Nakerst , John Brennan , Masudul Haque

Comments: 19 pages + references, 8 figures. A variant of Nesterov acceleration is proposed and studied



Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We consider gradient descent with 'momentum', a widely used method for loss

function minimization in machine learning. This method is often used with

'Nesterov acceleration', meaning that the gradient is evaluated not at the

current position in parameter space, but at the estimated position after one

step. In this work, we show that the algorithm can be improved by extending

this 'acceleration': the gradient is taken at an estimated position several

steps ahead rather than just one step ahead. How far one looks ahead in this

'super-acceleration' algorithm is determined by a new hyperparameter.

Considering a one-parameter quadratic loss function, the optimal value of the

super-acceleration can be exactly calculated and analytically estimated. We

show explicitly that super-accelerating the momentum algorithm is beneficial,

not only for this idealized problem, but also for several synthetic loss

landscapes and for the MNIST classification task with neural networks.

Super-acceleration is also easy to incorporate into adaptive algorithms like

RMSProp or Adam, and is shown to improve these algorithms.
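
A one-parameter version of the look-ahead idea can be sketched in a few lines. The parameterization is an assumption: the gradient is evaluated at `x + sigma * mu * v`, so that `sigma = 0` gives plain momentum, `sigma = 1` gives the usual Nesterov look-ahead, and `sigma > 1` looks further ahead. The paper's exact update rule and hyperparameter name may differ.

```python
def momentum_descent(grad, x0, lr=0.1, mu=0.9, sigma=1.0, steps=100):
    """Gradient descent with momentum and an adjustable look-ahead.

    grad:  function returning the gradient at a point
    lr:    learning rate
    mu:    momentum coefficient
    sigma: look-ahead factor (assumed parameterization, see lead-in)
    """
    x, v = x0, 0.0
    for _ in range(steps):
        lookahead = x + sigma * mu * v      # estimated future position
        v = mu * v - lr * grad(lookahead)   # velocity update
        x = x + v                           # position update
    return x


def quadratic_grad(x):
    """Gradient of the one-parameter quadratic loss f(x) = x**2."""
    return 2.0 * x


plain = momentum_descent(quadratic_grad, 1.0, sigma=0.0, steps=200)
nesterov = momentum_descent(quadratic_grad, 1.0, sigma=1.0, steps=200)
further = momentum_descent(quadratic_grad, 1.0, sigma=3.0, steps=200)
```

On this quadratic with `lr = 0.1` and `mu = 0.9`, the linearized update has a per-step contraction factor of `sqrt(mu * (1 - 0.2 * sigma))` while its eigenvalues are complex, so a larger look-ahead damps the oscillation faster in this toy setting. The abstract notes that an optimal value exists, so more look-ahead is not always better.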

Exact Information Bottleneck with Invertible Neural Networks: Getting the Best of Discriminative and Generative Modeling

Lynton Ardizzone , Radek Mackowiak , Ullrich Köthe , Carsten Rother Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

The Information Bottleneck (IB) principle offers a unified approach to many

learning and prediction problems. Although optimal in an information-theoretic

sense, practical applications of IB are hampered by a lack of accurate

high-dimensional estimators of mutual information, its main constituent. We

propose to combine IB with invertible neural networks (INNs), which for the

first time allows exact calculation of the required mutual information. Applied

to classification, our proposed method results in a generative classifier we

call IB-INN. It accurately models the class conditional likelihoods,

generalizes well to unseen data and reliably recognizes out-of-distribution

examples. In contrast to existing generative classifiers, these advantages

incur only minor reductions in classification accuracy in comparison to

corresponding discriminative methods such as feed-forward networks.

Furthermore, we provide insight into why IB-INNs are superior to other

generative architectures and training procedures and show experimentally that

our method outperforms alternative models of comparable complexity.

Generalization of Change-Point Detection in Time Series Data Based on Direct Density Ratio Estimation

Mikhail Hushchyn , Andrey Ustyuzhanin Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

The goal of change-point detection is to discover changes in the distribution

of a time series. Some state-of-the-art approaches to change-point

detection are based on direct density ratio estimation. In this work we show

how existing algorithms can be generalized using various binary classification

and regression models. In particular, we show that the Gradient Boosting over

Decision Trees and Neural Networks can be used for this purpose. The algorithms

are tested on several synthetic and real-world datasets. The results show that

the proposed methods outperform the classical RuLSIF algorithm. Discussion of cases

where the proposed algorithms have advantages over existing methods are also


Approximating Activation Functions

Nicholas Gerard Timmons , Andrew Rice

Comments: 10 Pages, 5 Figures, 1 Table



Subjects: Machine Learning (cs.LG); Performance (cs.PF); Machine Learning (stat.ML)

ReLU is widely seen as the default choice for activation functions in neural

networks. However, there are cases where more complicated functions are

required. In particular, recurrent neural networks (such as LSTMs) make

extensive use of both hyperbolic tangent and sigmoid functions. These functions

are expensive to compute. We used function approximation techniques to develop

replacements for these functions and evaluated them empirically on three

popular network configurations. We find safe approximations that yield a 10% to

37% improvement in training times on the CPU. These approximations were

suitable for all cases we considered and we believe are appropriate

replacements for all networks using these activation functions. We also develop

ranged approximations which only apply in some cases due to restrictions on

their input domain. Our ranged approximations yield a performance improvement

of 20% to 53% in network training time. Our functions also match or

considerably outperform the ad-hoc approximations used in Theano and the

implementation of Word2Vec.
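
To give a feel for what a cheap replacement for an activation function looks like, the sketch below uses a well-known rational (Padé-style) approximation of the hyperbolic tangent, clamped to ±1 outside [-3, 3]. It is a textbook formula chosen for illustration and is not claimed to be one of the approximations developed in the paper. The name `fast_tanh` is hypothetical.

```python
import math


def fast_tanh(x):
    """Rational approximation of tanh: x * (27 + x^2) / (27 + 9 * x^2).

    The formula reaches exactly +/-1 at x = +/-3, so clamping outside that
    range keeps the function continuous and bounded like the real tanh.
    """
    if x <= -3.0:
        return -1.0
    if x >= 3.0:
        return 1.0
    x2 = x * x
    return x * (27.0 + x2) / (27.0 + 9.0 * x2)


# Largest absolute deviation from math.tanh on a grid over [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
max_error = max(abs(fast_tanh(x) - math.tanh(x)) for x in grid)
```

The approximation needs only a few multiplications and one division instead of an exponential. In pure Python the interpreter overhead hides any speed difference, so the benefit would only show in compiled or vectorized code, and whether the accuracy loss is acceptable for training has to be checked empirically, as the paper does for its own approximations.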

Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet

Sizhe Chen , Zhengbao He , Chengjin Sun , Xiaolin Huang

Comments: arXiv admin note: text overlap with arXiv:1912.07160



Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Adversarial attacks on deep neural networks (DNNs) have been known for

several years. However, the existing adversarial attacks have high success

rates only when the information of the attacked DNN is well-known or could be

estimated by structure similarity or massive queries. In this paper, we propose

an Attack on Attention (AoA), attention being a semantic feature commonly shared by

DNNs. The transferability of AoA is quite high. With no more than 10 queries of

the decision only, AoA can achieve almost 100% success rate when attacking

many popular DNNs. Even without query, AoA could keep a surprisingly high

attack performance. We apply AoA to generate 96020 adversarial samples from

ImageNet to defeat many neural networks, and thus name the dataset as

DAmageNet. 20 well-trained DNNs are tested on DAmageNet. Without

adversarial training, most of the tested DNNs have an error rate over 90%.

DAmageNet is the first universal adversarial dataset and it could serve as a

benchmark for robustness testing and adversarial training.

Cyber Attack Detection thanks to Machine Learning Algorithms

Antoine Delplace , Sheryl Hermoso , Kristofer Anandita

Comments: 46 pages, 38 figures, project report



Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Machine Learning (stat.ML)

Cybersecurity attacks have grown in both frequency and sophistication over

the years. This increasing sophistication and complexity call for more

advancement and continuous innovation in defensive strategies. Traditional

methods of intrusion detection and deep packet inspection, while still largely

used and recommended, are no longer sufficient to meet the demands of growing

security threats. As computing power increases and cost drops, Machine Learning

is seen as an alternative method or an additional mechanism to defend against

malware, botnets, and other attacks. This paper explores Machine Learning as a

viable solution by examining its capabilities to classify malicious traffic in

a network.

First, a thorough data analysis is performed, resulting in 22 features extracted

from the initial NetFlow datasets. All these features are then compared with

one another through a feature selection process. Then, our approach analyzes

five different machine learning algorithms against a NetFlow dataset containing

common botnets. The Random Forest Classifier succeeds in detecting more than

95% of the botnets in 8 out of 13 scenarios and more than 55% in the most

difficult datasets. Finally, insight is given to improve and generalize the

results, especially through a bootstrapping technique.

Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant

Shayan Aziznejad , Harshit Gupta , Joaquim Campos , Michael Unser Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

We introduce a variational framework to learn the activation functions of

deep neural networks. The main motivation is to control the Lipschitz

regularity of the input-output relation. To that end, we first establish a

global bound for the Lipschitz constant of neural networks. Based on the

obtained bound, we then formulate a variational problem for learning activation

functions. Our variational problem is infinite-dimensional and is not

computationally tractable. However, we prove that there always exists a

solution that has continuous and piecewise-linear (linear-spline) activations.

This reduces the original problem to a finite-dimensional minimization. We

numerically compare our scheme with the standard ReLU network and its variations,

PReLU and LeakyReLU.
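
As a rough illustration of what a global Lipschitz bound can look like, the sketch below computes the classical product-of-norms bound for a small feed-forward network with linear-spline activations. This is a standard textbook bound and not necessarily the one derived in the paper. The weights and spline slopes are made-up values, and the l-infinity norm is chosen only so that the code needs no external libraries.

```python
def inf_norm(matrix):
    """Induced l-infinity operator norm: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in matrix)


def spline_lipschitz(slopes):
    """A continuous piecewise-linear function is Lipschitz with constant max |slope|."""
    return max(abs(s) for s in slopes)


def lipschitz_upper_bound(weights, activation_slopes):
    """Product-of-norms bound for x -> W_L s_{L-1}( ... s_1(W_1 x)).

    weights: list of weight matrices (lists of rows), one per layer.
    activation_slopes: one list of spline slopes per hidden layer.
    Biases do not change the bound and are omitted.
    """
    bound = 1.0
    for w in weights:
        bound *= inf_norm(w)
    for slopes in activation_slopes:
        bound *= spline_lipschitz(slopes)
    return bound


# Hypothetical two-layer network with one hidden activation layer.
W1 = [[1.0, -2.0], [0.5, 0.5]]   # largest absolute row sum is 3.0
W2 = [[0.5, -0.5]]               # largest absolute row sum is 1.0
leaky_relu_slopes = [0.1, 1.0]   # LeakyReLU viewed as a two-piece linear spline
bound = lipschitz_upper_bound([W1, W2], [leaky_relu_slopes])
```

The bound is usually loose, since it ignores how consecutive layers interact, but it shows why constraining the slopes of a learned spline activation directly constrains the network's Lipschitz constant.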

Data-Driven Permanent Magnet Temperature Estimation in Synchronous Motors with Supervised Machine Learning

Wilhelm Kirchgässner, Oliver Wallscheid, Joachim Böcker

Comments: preprint for TII: SS on Applications of Artificial Intelligence in Industrial Power Electronics and Systems

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)

Monitoring the magnet temperature in permanent magnet synchronous motors

(PMSMs) for automotive applications has been a challenging task for several decades,

as signal injection or sensor-based methods still prove unfeasible in a

commercial context. Overheating results in severe motor deterioration and is

thus of high concern for the machine’s control strategy and its design. Lack of

precise temperature estimations leads to lower device utilization and higher

material cost. In this work, several machine learning (ML) models are

empirically evaluated on their estimation accuracy for the task of predicting

latent high-dynamic magnet temperature profiles. The range of selected

algorithms covers as diverse approaches as possible with ordinary and weighted

least squares, support vector regression, k-nearest neighbors, randomized

trees and neural networks. Having test bench data available, it is shown that

ML approaches relying merely on collected data meet the estimation performance

of classical thermal models built on thermodynamic theory, yet not all kinds of

models render efficient use of large datasets or sufficient modeling

capacities. In particular, linear regression and simple feed-forward neural

networks with optimized hyperparameters show strong predictive quality at low

to moderate model sizes.

Sideways: Depth-Parallel Training of Video Models

Mateusz Malinowski, Grzegorz Swirszcz, Joao Carreira, Viorica Patraucean

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We propose Sideways, an approximate backpropagation scheme for training video

models. In standard backpropagation, the gradients and activations at every

computation step through the model are temporally synchronized. The forward

activations need to be stored until the backward pass is executed, preventing

inter-layer (depth) parallelization. However, can we leverage smooth, redundant

input streams such as videos to develop a more efficient training scheme? Here,

we explore an alternative to backpropagation; we overwrite network activations

whenever new ones, i.e., from new frames, become available. Such a more gradual

accumulation of information from both passes breaks the precise correspondence

between gradients and activations, leading to theoretically more noisy weight

updates. Counter-intuitively, we show that Sideways training of deep

convolutional video networks not only still converges, but can also potentially

exhibit better generalization compared to standard synchronized


GraphLIME: Local Interpretable Model Explanations for Graph Neural Networks

Qiang Huang, Makoto Yamada, Yuan Tian, Dinesh Singh, Dawei Yin, Yi Chang

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Graph structured data has wide applicability in various domains such as

physics, chemistry, biology, computer vision, and social networks, to name a

few. Recently, graph neural networks (GNN) were shown to be successful in

effectively representing graph structured data because of their good

performance and generalization ability. GNN is a deep learning based method

that learns a node representation by combining specific nodes and the

structural/topological information of a graph. However, like other deep models,

explaining the effectiveness of GNN models is a challenging task because of the

complex nonlinear transformations made over the iterations. In this paper, we

propose GraphLIME, a local interpretable model explanation for graphs using the

Hilbert-Schmidt Independence Criterion (HSIC) Lasso, which is a nonlinear

feature selection method. GraphLIME is a generic GNN-model explanation

framework that learns a nonlinear interpretable model locally in the subgraph

of the node being explained. More specifically, to explain a node, we generate

a nonlinear interpretable model from its N-hop neighborhood and then compute

the K most representative features as the explanations of its prediction using

HSIC Lasso. Through experiments on two real-world datasets, the explanations of

GraphLIME are found to be of extraordinary degree and more descriptive in

comparison to the existing explanation methods.

FedVision: An Online Visual Object Detection Platform Powered by Federated Learning

Yang Liu, Anbu Huang, Yun Luo, He Huang, Youzhi Liu, Yuanyuan Chen, Lican Feng, Tianjian Chen, Han Yu, Qiang Yang

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Visual object detection is a computer vision-based artificial intelligence

(AI) technique which has many practical applications (e.g., fire hazard

monitoring). However, due to privacy concerns and the high cost of transmitting

video data, it is highly challenging to build object detection models on

centrally stored large training datasets following the current approach.

Federated learning (FL) is a promising approach to resolve this challenge.

Nevertheless, there is currently no easy-to-use tool to enable computer

vision application developers who are not experts in federated learning to

conveniently leverage this technology and apply it in their systems. In this

paper, we report FedVision – a machine learning engineering platform to support

the development of federated learning powered computer vision applications. The

platform has been deployed through a collaboration between WeBank and Extreme

Vision to help customers develop computer vision-based safety monitoring

solutions in smart city applications. Over four months of usage, it has

achieved significant efficiency improvement and cost reduction while removing

the need to transmit sensitive data for three major corporate customers. To the

best of our knowledge, this is the first real application of FL in computer

vision-based tasks.

DNNs as Layers of Cooperating Classifiers

Marelie H. Davel, Marthinus W. Theunissen, Arnold M. Pretorius, Etienne Barnard

Comments: Accepted at AAAI-2020. The preprint contains additional figures and an appendix not included in the conference version. Main text remains unchanged

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A robust theoretical framework that can describe and predict the

generalization ability of deep neural networks (DNNs) in general circumstances

remains elusive. Classical attempts have produced complexity metrics that rely

heavily on global measures of compactness and capacity with little

investigation into the effects of sub-component collaboration. We demonstrate

intriguing regularities in the activation patterns of the hidden nodes within

fully-connected feedforward networks. By tracing the origin of these patterns,

we show how such networks can be viewed as the combination of two information

processing systems: one continuous and one discrete. We describe how these two

systems arise naturally from the gradient-based optimization process, and

demonstrate the classification ability of the two systems, individually and in

collaboration. This perspective on DNN classification offers a novel way to

think about generalization, in which different subsets of the training data are

used to train distinct classifiers; those classifiers are then combined to

perform the classification task, and their consistency is crucial for accurate


A Derivative-Free Method for Solving Elliptic Partial Differential Equations with Deep Neural Networks

Jihun Han, Mihai Nica, Adam R Stinchcombe

Comments: 25 pages, 4 figures

Subjects: Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

We introduce a deep neural network based method for solving a class of

elliptic partial differential equations. We approximate the solution of the PDE

with a deep neural network which is trained under the guidance of a

probabilistic representation of the PDE in the spirit of the Feynman-Kac

formula. The solution is given by an expectation of a martingale process driven

by a Brownian motion. As Brownian walkers explore the domain, the deep neural

network is iteratively trained using a form of reinforcement learning. Our

method is a ‘Derivative-Free Loss Method’ since it does not require the

explicit calculation of the derivatives of the neural network with respect to

the input neurons in order to compute the training loss. The advantages of our

method are showcased in a series of test problems: a corner singularity

problem, an interface problem, and an application to a chemotaxis population


Graph Inference Learning for Semi-supervised Classification

Chunyan Xu, Zhen Cui, Xiaobin Hong, Tong Zhang, Jian Yang, Wei Liu

Comments: 11 pages

Journal-ref: International Conference on Learning Representations (ICLR), 2020

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In this work, we address semi-supervised classification of graph data, where

the categories of those unlabeled nodes are inferred from labeled nodes as well

as graph structures. Recent works often solve this problem via advanced graph

convolution in a conventionally supervised manner, but the performance could

degrade significantly when labeled data is scarce. To this end, we propose a

Graph Inference Learning (GIL) framework to boost the performance of

semi-supervised node classification by learning the inference of node labels on

graph topology. To bridge the connection between two nodes, we formally define

a structure relation by encapsulating node attributes, between-node paths, and

local topological structures together, which can make the inference

conveniently deduced from one node to another node. For learning the inference

process, we further introduce meta-optimization on structure relations from

training nodes to validation nodes, such that the learnt graph inference

capability can be better self-adapted to testing nodes. Comprehensive

evaluations on four benchmark datasets (including Cora, Citeseer, Pubmed, and

NELL) demonstrate the superiority of our proposed GIL when compared against

state-of-the-art methods on the semi-supervised node classification task.

AdamT: A Stochastic Optimization with Trend Correction Scheme

Bingxin Zhou, Xuebin Zheng, Junbin Gao

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Adam-type optimizers, as a class of adaptive moment estimation methods with

the exponential moving average scheme, have been successfully used in many

applications of deep learning. Such methods are appealing for their capability on

large-scale sparse datasets with high computational efficiency. In this paper,

we present a new framework for adapting Adam-type methods, namely AdamT.

Instead of applying a simple exponential weighted average, AdamT also includes

the trend information when updating the parameters with the adaptive step size

and gradients. The additional terms promise an efficient movement on the

complex cost surface, and thus the loss would converge more rapidly. We show

empirically the importance of adding the trend component, where AdamT

outperforms the vanilla Adam method consistently with state-of-the-art models on

several classical real-world datasets.
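
For reference, the baseline that AdamT builds on can be sketched as follows. This is plain Adam with its two exponential moving averages. The trend-correction terms are not shown, since the abstract does not give their form. The hyperparameters are the commonly used defaults, and the quadratic objective is a toy assumption.

```python
import math


def adam_minimize(grad, x0, steps=3000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain Adam on a single scalar parameter."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

A trend-aware variant would replace the two moving-average lines with estimates that also track how the averages themselves are changing.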

Learning Stable Deep Dynamics Models

Gaurav Manek, J. Zico Kolter

Comments: NeurIPS 2019

Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)

Deep networks are commonly used to model dynamical systems, predicting how

the state of a system will evolve over time (either autonomously or in response

to control inputs). Despite the predictive power of these systems, it has been

difficult to make formal claims about the basic properties of the learned

systems. In this paper, we propose an approach for learning dynamical systems

that are guaranteed to be stable over the entire state space. The approach

works by jointly learning a dynamics model and Lyapunov function that

guarantees non-expansiveness of the dynamics under the learned Lyapunov

function. We show that such learning systems are able to model simple dynamical

systems and can be combined with additional deep generative models to learn

complex dynamics, such as video textures, in a fully end-to-end fashion.
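
One simple way to enforce a decrease condition of this kind is to project a nominal vector field onto the directions along which a Lyapunov candidate decays. The sketch below does this for the hand-picked candidate V(x) = |x|^2. In the paper V is learned jointly with the dynamics, so V, the decay rate `alpha` and the nominal field `unstable_field` are all illustrative assumptions, not the authors' construction.

```python
def project_to_stable(f_hat, x, alpha=0.1):
    """Return f(x) with grad V(x) . f(x) <= -alpha * V(x), where V(x) = |x|^2."""
    v = sum(xi * xi for xi in x)               # V(x)
    grad_v = [2.0 * xi for xi in x]            # gradient of V at x
    fx = f_hat(x)
    violation = sum(g * f for g, f in zip(grad_v, fx)) + alpha * v
    grad_sq = sum(g * g for g in grad_v)
    if violation <= 0.0 or grad_sq == 0.0:
        return list(fx)                        # already decreasing fast enough
    scale = violation / grad_sq
    # Remove just enough of the component along grad V to meet the condition.
    return [f - scale * g for f, g in zip(fx, grad_v)]


def unstable_field(x):
    """Hypothetical nominal dynamics that push the state away from the origin."""
    return [x[0] + x[1], x[1] - x[0]]


x = [1.0, 2.0]
f = project_to_stable(unstable_field, x)
decrease = sum(2.0 * xi * fi for xi, fi in zip(x, f))  # dV/dt along the projected field
```

In this example the nominal field increases V, so the projection is active and the projected field satisfies the condition with equality.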

Better Boosting with Bandits for Online Learning

Nikolaos Nikolaou, Joseph Mellor, Nikunj C. Oza, Gavin Brown

Comments: 44 pages, 6 figures

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Probability estimates generated by boosting ensembles are poorly calibrated

because of the margin maximization nature of the algorithm. The outputs of the

ensemble need to be properly calibrated before they can be used as probability

estimates. In this work, we demonstrate that online boosting is also prone to

producing distorted probability estimates. In batch learning, calibration is

achieved by reserving part of the training data for training the calibrator

function. In the online setting, a decision needs to be made on each round:

shall the new example(s) be used to update the parameters of the ensemble or

those of the calibrator. We proceed to resolve this decision with the aid of

bandit optimization algorithms. We demonstrate superior performance to

uncalibrated and naively-calibrated online boosting ensembles in terms of

probability estimation. Our proposed mechanism can be easily adapted to other

tasks (e.g., cost-sensitive classification) and is robust to the choice of

hyperparameters of both the calibrator and the ensemble.

An adversarial learning framework for preserving users' anonymity in face-based emotion recognition

Vansh Narula, Zhangyang (Atlas) Wang, Theodora Chaspari

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Image and video-capturing technologies have permeated our everyday life.

Such technologies can continuously monitor individuals’ expressions in

real-life settings, affording us new insights into their emotional states and

transitions, thus paving the way to novel well-being and healthcare

applications. Yet, due to strong privacy concerns, the use of such

technologies is met with strong skepticism, since current face-based emotion

recognition systems relying on deep learning techniques tend to preserve

substantial information related to the identity of the user, apart from the

emotion-specific information. This paper proposes an adversarial learning

framework which relies on a convolutional neural network (CNN) architecture

trained through an iterative procedure for minimizing identity-specific

information and maximizing emotion-dependent information. The proposed approach

is evaluated through emotion classification and face identification metrics,

and is compared against two CNNs, one trained solely for emotion recognition

and the other trained solely for face identification. Experiments are performed

using the Yale Face Dataset and Japanese Female Facial Expression Database.

Results indicate that the proposed approach can learn a convolutional

transformation for preserving emotion recognition accuracy and degrading face

identity recognition, providing a foundation toward privacy-aware emotion

recognition technologies.

Code-Bridged Classifier (CBC): A Low or Negative Overhead Defense for Making a CNN Classifier Robust Against Adversarial Attacks

Farnaz Behnia, Ali Mirzaeian, Mohammad Sabokrou, Sai Manoj, Tinoosh Mohsenin, Khaled N. Khasawneh, Liang Zhao, Houman Homayoun, Avesta Sasan

Comments: 6 pages, Accepted and to appear in ISQED 2020

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

In this paper, we propose Code-Bridged Classifier (CBC), a framework for

making Convolutional Neural Networks (CNNs) robust against adversarial attacks

without increasing, or even while decreasing, the overall model’s computational

complexity. More specifically, we propose a stacked encoder-convolutional

model, in which the input image is first encoded by the encoder module of a

denoising auto-encoder, and then the resulting latent representation (without

being decoded) is fed to a reduced complexity CNN for image classification. We

illustrate that this network not only is more robust to adversarial examples

but also has a significantly lower computational complexity when compared to

prior-art defenses.

Fairness Measures for Regression via Probabilistic Classification

Daniel Steinberg, Alistair Reid, Simon O'Callaghan

Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)

Algorithmic fairness involves expressing notions such as equity, or

reasonable treatment, as quantifiable measures that a machine learning

algorithm can optimise. Most work in the literature to date has focused on

classification problems where the prediction is categorical, such as accepting

or rejecting a loan application. This is in part because classification

fairness measures are easily computed by comparing the rates of outcomes,

leading to behaviours such as ensuring that the same fraction of eligible men

are selected as eligible women. But such measures are computationally difficult

to generalise to the continuous regression setting for problems such as

pricing, or allocating payments. The difficulty arises from estimating

conditional densities (such as the probability density that a system will

over-charge by a certain amount). For the regression setting we introduce

tractable approximations of the independence, separation and sufficiency

criteria by observing that they factorise as ratios of different conditional

probabilities of the protected attributes. We introduce and train machine

learning classifiers, distinct from the predictor, as a mechanism to estimate

these probabilities from the data. This naturally leads to model agnostic,

tractable approximations of the criteria, which we explore experimentally.
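
The factorisation described above can be checked on a toy discrete example. By Bayes' rule, p(s | a=1) / p(s | a=0) = [P(a=1 | s) / P(a=0 | s)] * [P(a=0) / P(a=1)], so a probabilistic classifier that predicts the protected attribute a from the score s yields the density ratio without estimating either density. The counts below are invented, and empirical frequencies stand in for a trained classifier.

```python
from collections import Counter

# Invented (score_bin, protected_attribute) observations.
data = [(0, 0)] * 30 + [(1, 0)] * 20 + [(0, 1)] * 10 + [(1, 1)] * 30

count_a = Counter(a for _, a in data)    # how often each attribute value occurs
count_s = Counter(s for s, _ in data)    # how often each score bin occurs
count_sa = Counter(data)                 # joint counts


def ratio_direct(s):
    """p(s | a=1) / p(s | a=0), computed from the conditional distributions."""
    return (count_sa[(s, 1)] / count_a[1]) / (count_sa[(s, 0)] / count_a[0])


def ratio_via_classifier(s):
    """The same ratio from 'classifier' outputs P(a | s) and the prior odds."""
    p_a1_given_s = count_sa[(s, 1)] / count_s[s]
    p_a0_given_s = count_sa[(s, 0)] / count_s[s]
    prior_odds = count_a[0] / count_a[1]
    return (p_a1_given_s / p_a0_given_s) * prior_odds


ratios = {s: (ratio_direct(s), ratio_via_classifier(s)) for s in (0, 1)}
```

Independence of the score from the protected attribute corresponds to this ratio being 1 for every score value. Here it is not, so the toy scores would fail that criterion.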

Fourier Transform Approach to Machine Learning III: Fourier Classification

Soheil Mehrabkhani

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a Fourier-based learning algorithm for highly nonlinear multiclass

classification. The algorithm is based on a smoothing technique to calculate

the probability distribution of all classes. To obtain the probability

distribution, the density distribution of each class is smoothed by a low-pass

filter separately. The advantage of the Fourier representation is capturing the

nonlinearities of the data distribution without defining any kernel function.

Furthermore, in contrast to support vector machines, it makes a probabilistic

explanation for the classification possible. Moreover, it can treat overlapped

classes as well. Compared to logistic regression, it does not require

feature engineering. In general, its computational performance is also very

good for large data sets and, in contrast to other algorithms, the typical

overfitting problem does not happen at all. The capability of the algorithm is

demonstrated for multiclass classification with overlapped classes and very

high nonlinearity of the class distributions.

Understanding the Power of Persistence Pairing via Permutation Test

Chen Cai, Yusu Wang

Comments: 20 pages, 6 graphs

Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Machine Learning (stat.ML)

Recently many efforts have been made to incorporate persistence diagrams, one

of the major tools in topological data analysis (TDA), into machine learning

pipelines. To better understand the power and limitations of persistence

diagrams, we carry out a range of experiments on both graph data and shape

data, aiming to decouple and inspect the effects of different factors involved.

To this end, we also propose the so-called permutation test for

persistence diagrams to delineate critical values and pairings of critical

values. For graph classification tasks, we note that while persistence pairing

yields consistent improvement over various benchmark datasets, it appears that

for various filtration functions tested, most discriminative power comes from

critical values. For shape segmentation and classification, however, we note

that persistence pairing shows significant power on most of the benchmark

datasets, and improves over both summaries based on merely critical values, and

those based on permutation tests. Our results help provide insights on when

persistence diagram based summaries could be more suitable.

Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning

Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Semi-supervised learning aims to take advantage of a large amount of

unlabeled data to improve the accuracy of a model that only has access to a

small number of labeled examples. We propose curriculum labeling, an approach

that exploits pseudo-labeling for propagating labels to unlabeled samples in an

iterative and self-paced fashion. This approach is surprisingly simple and

effective and surpasses or is comparable with the best methods proposed in the

recent literature across all the standard benchmarks for image classification.

Notably, we obtain 94.91% accuracy on CIFAR-10 using only 4,000 labeled

samples, and 88.56% top-5 accuracy on ImageNet-ILSVRC using 128,000 labeled

samples. In contrast to prior works, our approach shows improvements even in a

more realistic scenario that leverages out-of-distribution unlabeled data


Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO)

We consider a discrete optimization based approach for learning sparse

classifiers, where the outcome depends upon a linear combination of a small

subset of features. Recent work has shown that mixed integer programming (MIP)

can be used to solve (to optimality) \(\ell_0\)-regularized problems at scales

much larger than what was conventionally considered possible in the statistics

and machine learning communities. Despite their usefulness, MIP-based

approaches are significantly slower compared to relatively mature algorithms

based on \(\ell_1\)-regularization and relatives. We aim to bridge this

computational gap by developing new MIP-based algorithms for

\(\ell_0\)-regularized classification. We propose two classes of scalable

algorithms: an exact algorithm that can handle \(p \approx 50{,}000\) features in a

few minutes, and approximate algorithms that can address instances with

\(p \approx 10^6\) in times comparable to fast \(\ell_1\)-based algorithms. Our

exact algorithm is based on the novel idea of integrality generation,

which solves the original problem (with \(p\) binary variables) via a sequence of

mixed integer programs that involve a small number of binary variables. Our

approximate algorithms are based on coordinate descent and local combinatorial

search. In addition, we present new estimation error bounds for a class of

\(\ell_0\)-regularized estimators. Experiments on real and synthetic data

demonstrate that our approach leads to models with considerably improved

statistical performance (especially, variable selection) when compared to

competing toolkits.

Robust Generalization via α-Mutual Information

Amedeo Roberto Esposito, Michael Gastpar, Ibrahim Issa

Comments: Accepted to IZS2020. arXiv admin note: substantial text overlap with arXiv:1912.01439

Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)

The aim of this work is to provide bounds connecting two probability measures

of the same event using Rényi \(\alpha\)-Divergences and Sibson’s

\(\alpha\)-Mutual Information, generalizations of, respectively, the

Kullback-Leibler Divergence and Shannon’s Mutual Information. A particular case

of interest can be found when the two probability measures considered are a

joint distribution and the corresponding product of marginals (representing the

statistically independent scenario). In this case, a bound using Sibson’s

\(\alpha\)-Mutual Information is retrieved, extending a result involving Maximal

Leakage to general alphabets. These results have broad applications, from

bounding the generalization error of learning algorithms to the more general

framework of adaptive data analysis, provided that the divergences and/or

information measures used are amenable to such an analysis (i.e., are

robust to post-processing and compose adaptively). The generalization error

bounds are derived with respect to high-probability events but a corresponding

bound on expected generalization error is also retrieved.
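
To make the relationship between the Rényi divergence and the Kullback-Leibler divergence concrete, the sketch below evaluates both on a pair of discrete distributions. It uses the standard definition D_alpha(P || Q) = log(sum_i p_i^alpha q_i^(1 - alpha)) / (alpha - 1). The two distributions are arbitrary examples, and Sibson's mutual information and the paper's bounds are not implemented here.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha (alpha > 0, alpha != 1) in nats."""
    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1)


p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
kl = kl_divergence(p, q)
near_one = renyi_divergence(p, q, 1.0001)   # approaches the KL divergence as alpha -> 1
order_two = renyi_divergence(p, q, 2.0)     # Renyi divergence is non-decreasing in alpha
```

As the order approaches 1 the Rényi divergence recovers the Kullback-Leibler divergence, which is the sense in which it generalizes it.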

Supervised Speaker Embedding De-Mixing in Two-Speaker Environment

Yanpei Shi, Thomas Hain

Comments: Submitted to Odyssey 2020

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

In this work, a speaker embedding de-mixing approach is proposed. Instead of

separating a two-speaker signal in signal space like speech source separation,

the proposed approach separates different speaker properties from a two-speaker

signal in embedding space. The proposed approach contains two steps. In step

one, the clean speaker embeddings are learned and collected by a residual TDNN

based network. In step two, the two-speaker signal and the embedding of one of

the speakers are input to a speaker embedding de-mixing network. The de-mixing

network is trained to generate the embedding of the other speaker with a

reconstruction loss. Speaker identification accuracy on the de-mixed speaker

embeddings is used to evaluate the quality of the obtained embeddings.

Experiments are done on two kinds of data: artificially augmented two-speaker data

(TIMIT) and real-world recordings of two-speaker data (MC-WSJ). Six different

speaker embedding de-mixing architectures are investigated. Compared with the

speaker identification accuracy on the clean speaker embeddings (98.5%), the

obtained results show that one of the speaker embedding de-mixing architectures

obtains close performance, reaching 96.9% test accuracy on TIMIT when the SNR

between the target speaker and interfering speaker is 5 dB. More surprisingly,

we found that choosing a simple subtraction as the embedding de-mixing function

could obtain the second best performance, reaching 95.2% test accuracy.

Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks

Tian Bian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang, Yu Rong, Junzhou Huang

Comments: 8 pages, 4 figures, AAAI 2020

Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)

Social media has been developing rapidly in public due to its nature of

spreading new information, which leads to rumors being circulated. Meanwhile,

detecting rumors from such massive information in social media is becoming an

arduous challenge. Therefore, some deep learning methods are applied to

discover rumors through the way they spread, such as Recursive Neural Network

(RvNN) and so on. However, these deep learning methods only take into account

the patterns of deep propagation but ignore the structures of wide dispersion

in rumor detection. Actually, propagation and dispersion are two crucial

characteristics of rumors. In this paper, we propose a novel bi-directional

graph model, named Bi-Directional Graph Convolutional Networks (Bi-GCN), to

explore both characteristics by operating on both top-down and bottom-up

propagation of rumors. It leverages a GCN with a top-down directed graph of

rumor spreading to learn the patterns of rumor propagation, and a GCN with an

opposite directed graph of rumor diffusion to capture the structures of rumor

dispersion. Moreover, the information from the source post is involved in each

layer of GCN to enhance the influences from the roots of rumors. Encouraging

empirical results on several benchmarks confirm the superiority of the proposed

method over the state-of-the-art approaches.
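
The two directed graphs that the model operates on can be sketched as below: a top-down adjacency matrix built from the reply edges of a propagation tree, and a bottom-up matrix obtained by reversing every edge. The tree is a made-up example, and no graph convolution layers are implemented here.

```python
def adjacency(num_nodes, edges):
    """Dense 0/1 adjacency matrix with A[i][j] = 1 for each directed edge i -> j."""
    a = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        a[src][dst] = 1
    return a


def transpose(matrix):
    """Reverse every edge of the graph by transposing its adjacency matrix."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical propagation tree: node 0 is the source post,
# nodes 1 and 2 reply to it, and node 3 replies to node 1.
reply_edges = [(0, 1), (0, 2), (1, 3)]
top_down = adjacency(4, reply_edges)   # edges follow the spread of the rumor
bottom_up = transpose(top_down)        # same edges reversed, pointing back towards the source
```

Feeding each matrix to its own GCN branch gives one view along the propagation direction and one along the reverse direction.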

DeepSUM++: Non-local Deep Neural Network for Super-Resolution of Unregistered Multitemporal Images

Andrea Bordone Molini, Diego Valsesia, Giulia Fracastoro, Enrico Magli

Comments: arXiv admin note: text overlap with arXiv:1907.06490

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep learning methods for super-resolution of a remote sensing scene from

multiple unregistered low-resolution images have recently gained attention

thanks to a challenge proposed by the European Space Agency. This paper

presents an evolution of the winner of the challenge, showing how incorporating

non-local information in a convolutional neural network makes it possible to exploit

self-similar patterns that provide enhanced regularization of the

super-resolution problem. Experiments on the dataset of the challenge show

improved performance over the state-of-the-art, which does not exploit

non-local information.

Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks

Henrique Siqueira, Sven Magg, Stefan Wermter

Comments: Accepted at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 1-1, New York, USA

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Ensemble methods, traditionally built with independently trained

de-correlated models, have proven to be efficient methods for reducing the

remaining residual generalization error, which results in robust and accurate

methods for real-world applications. In the context of deep learning, however,

training an ensemble of deep networks is costly and generates high redundancy

which is inefficient. In this paper, we present experiments on Ensembles with

Shared Representations (ESRs) based on convolutional networks to demonstrate,

quantitatively and qualitatively, their data processing efficiency and

scalability to large-scale datasets of facial expressions. We show that

redundancy and computational load can be dramatically reduced by varying the

branching level of the ESR without loss of diversity and generalization power,

which are both important for ensemble performance. Experiments on large-scale

datasets suggest that ESRs reduce the remaining residual generalization error

on the AffectNet and FER+ datasets, reach human-level performance, and

outperform state-of-the-art methods on facial expression recognition in the

wild using emotion and affect concepts.

Wine quality rapid detection using a compact electronic nose system: application focused on spoilage thresholds by acetic acid

Juan C. Rodriguez Gamboa , Eva Susana Albarracin E. , Adenilton J. da Silva , Luciana Leite , Tiago A. E. Ferreira

Journal-ref: LWT, Volume 108, 2019, Pages 377-384



Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)

It is crucial for the wine industry to have methods like electronic nose

systems (E-Noses) for real-time monitoring of acetic acid thresholds in wines,

to prevent spoilage or determine quality. In this paper, we show

that the portable and compact self-developed E-Nose, based on thin film

semiconductor (SnO2) sensors and trained with an approach that uses deep

Multilayer Perceptron (MLP) neural network, can perform early detection of wine

spoilage thresholds in routine tasks of wine quality control. To obtain rapid

and online detection, we propose a method of rising-window focused on raw data

processing to find an early portion of the sensor signals with the best

recognition performance. Our approach was compared with the conventional

approach employed in E-Noses for gas recognition that involves feature

extraction and selection techniques for preprocessing data, succeeded by a

Support Vector Machine (SVM) classifier. The results show that it is possible

to classify three wine spoilage levels in 2.7 seconds after the gas injection

point, implying a methodology 63 times faster than the results obtained with

the conventional approach in our experimental setup.

Overly Optimistic Prediction Results on Imbalanced Data: Flaws and Benefits of Applying Over-sampling

Gilles Vandewiele , Isabelle Dehaene , György Kovács , Lucas Sterckx , Olivier Janssens , Femke Ongenae , Femke De Backere , Filip De Turck , Kristien Roelens , Johan Decruyenaere , Sofie Van Hoecke , Thomas Demeester Subjects : Signal Processing (eess.SP) ; Machine Learning (cs.LG); Machine Learning (stat.ML)

Information extracted from electrohysterography recordings could potentially

prove to be an interesting additional source of information to estimate the

risk of preterm birth. Recently, a large number of studies have reported

near-perfect results to distinguish between recordings of patients that will

deliver term or preterm using a public resource, called the Term/Preterm

Electrohysterogram database. However, we argue that these results are overly

optimistic due to a methodological flaw being made. In this work, we focus on

one specific type of methodological flaw: applying over-sampling before

partitioning the data into mutually exclusive training and testing sets. We

show how this causes the results to be biased using two artificial datasets and

reproduce results of studies in which this flaw was identified. Moreover, we

evaluate the actual impact of over-sampling on predictive performance, when

applied prior to data partitioning, using the same methodologies of related

studies, to provide a realistic view of these methodologies’ generalization

capabilities. We make our research reproducible by providing all the code under

an open license.
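The flaw described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' released code: the toy data, the helper names (`oversample`, `split`, `leaked`), and the use of plain duplication as the over-sampling method are assumptions made here for illustration. The sketch counts how many test rows have an exact copy in the training set under each ordering of the two steps.

```python
import random

def oversample(rows, rng):
    """Random over-sampling: duplicate minority rows until classes balance."""
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    if not minority:
        return rows[:]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra

def split(rows, rng, test_fraction=0.3):
    """Shuffle and cut into mutually exclusive training and test lists."""
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def leaked(train, test):
    """Count test rows whose exact feature vector also occurs in training."""
    seen = {features for features, _ in train}
    return sum(1 for features, _ in test if features in seen)

rng = random.Random(0)
# Hypothetical toy data: two noise features, 20 minority rows out of 200.
data = [((rng.random(), rng.random()), 1 if i < 20 else 0) for i in range(200)]

# Flawed order: over-sample first, then partition.
train_bad, test_bad = split(oversample(data, rng), rng)

# Sound order: partition first, then over-sample the training part only.
train_raw, test_ok = split(data, rng)
train_ok = oversample(train_raw, rng)

print("leaked rows, flawed order:", leaked(train_bad, test_bad))
print("leaked rows, sound order: ", leaked(train_ok, test_ok))
```

With the flawed order, copies of the same minority row land on both sides of the split, so a classifier can score well on the test set simply by memorizing training rows.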

Predicting the Physical Dynamics of Unseen 3D Objects

Davis Rempe , Srinath Sridhar , He Wang , Leonidas J. Guibas

Comments: In Proceedings of Winter Conference on Applications of Computer Vision (WACV) 2020. arXiv admin note: text overlap with arXiv:1901.00466



Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Machines that can predict the effect of physical interactions on the dynamics

of previously unseen object instances are important for creating better robots

and interactive virtual worlds. In this work, we focus on predicting the

dynamics of 3D objects on a plane that have just been subjected to an impulsive

force. In particular, we predict the changes in state – 3D position, rotation,

velocities, and stability. Different from previous work, our approach can

generalize dynamics predictions to object shapes and initial conditions that

were unseen during training. Our method takes the 3D object’s shape as a point

cloud and its initial linear and angular velocities as input. We extract shape

features and use a recurrent neural network to predict the full change in state

at each time step. Our model can support training with data from either a physics

engine or the real world. Experiments show that we can accurately predict the

changes in state for unseen object geometries and initial conditions.

RobBERT: a Dutch RoBERTa-based Language Model

Pieter Delobelle , Thomas Winters , Bettina Berendt

Comments: 7 pages, 2 tables



Computation and Language (cs.CL); Machine Learning (cs.LG)

Pre-trained language models have been dominating the field of natural

language processing in recent years, and have led to significant performance

gains for various complex natural language tasks. One of the most prominent

pre-trained language models is BERT (Bidirectional Encoder Representations from Transformers),

which was released as an English as well as a multilingual version. Although

multilingual BERT performs well on many tasks, recent studies showed that BERT

models trained on a single language significantly outperform the multilingual

results. Training a Dutch BERT model thus has a lot of potential for a wide

range of Dutch NLP tasks. While previous approaches have used earlier

implementations of BERT to train their Dutch BERT, we used RoBERTa, a robustly

optimized BERT approach, to train a Dutch language model called RobBERT. We

show that RobBERT improves state of the art results in Dutch-specific language

tasks, and also outperforms other existing Dutch BERT-based models in sentiment

analysis. These results indicate that RobBERT is a powerful pre-trained model

for fine-tuning for a large variety of Dutch language tasks. We publicly

release this pre-trained model in hope of supporting further downstream Dutch

NLP applications.

Epileptic Seizure Classification with Symmetric and Hybrid Bilinear Models

Tennison Liu , Nhan Duy Truong , Armin Nikpour , Luping Zhou , Omid Kavehei

Comments: 9 pages, 4 figures, 3 tables



Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)

Epilepsy affects nearly 1% of the global population, of which two thirds can

be treated by anti-epileptic drugs and a much lower percentage by surgery.

Diagnostic procedures for epilepsy and monitoring are highly specialized and

labour-intensive. The accuracy of the diagnosis is also complicated by

overlapping medical symptoms, varying levels of experience and inter-observer

variability among clinical professionals. This paper proposes a novel hybrid

bilinear deep learning network with an application in the clinical procedures

of epilepsy classification diagnosis, where the use of surface

electroencephalogram (sEEG) and audiovisual monitoring is standard practice.

Hybrid bilinear models based on two types of feature extractors, namely

Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), are

trained using Short-Time Fourier Transform (STFT) of one-second sEEG. In the

proposed hybrid models, CNNs extract spatio-temporal patterns, while RNNs focus

on the characteristics of temporal dynamics in relatively longer intervals

given the same input data. Second-order features, based on interactions between

these spatio-temporal features are further explored by bilinear pooling and

used for epilepsy classification. Our proposed methods obtain an F1-score of

97.4% on the Temple University Hospital Seizure Corpus and 97.2% on the

EPILEPSIAE dataset, comparing favourably to existing benchmarks for sEEG-based

seizure type classification. The open-source implementation of this study is

available at this https URL
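As a rough illustration of the bilinear pooling step mentioned in this abstract, the following Python sketch combines two feature vectors through their outer product. It is an assumption-laden toy, not the authors' implementation: the function name `bilinear_pool` is hypothetical, and the signed square root followed by L2 normalization is a common post-processing choice in the bilinear pooling literature that the abstract does not confirm for this paper.

```python
import math

def bilinear_pool(u, v):
    """Second-order interaction features from two first-order feature vectors."""
    # Outer product, flattened row by row: every pairwise product u[i] * v[j].
    outer = [a * b for a in u for b in v]
    # Signed square root compresses large magnitudes (assumed post-processing).
    signed = [math.copysign(math.sqrt(abs(z)), z) for z in outer]
    # L2 normalization; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(z * z for z in signed)) or 1.0
    return [z / norm for z in signed]

# Hypothetical stand-ins for a CNN feature vector and an RNN feature vector.
cnn_features = [1.0, -2.0]
rnn_features = [3.0, 0.5, 4.0]
pooled = bilinear_pool(cnn_features, rnn_features)
print(len(pooled))  # one entry per (CNN feature, RNN feature) pair
```

The pooled vector has length `len(u) * len(v)`, which is why bilinear models are usually applied to compact feature vectors.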

Bayesian inference of dynamics from partial and noisy observations using data assimilation and machine learning

Marc Bocquet , Julien Brajard , Alberto Carrassi , Laurent Bertino Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG); Atmospheric and Oceanic Physics (

The reconstruction from observations of high-dimensional chaotic dynamics

such as geophysical flows is hampered by (i) the partial and noisy observations

that can realistically be obtained, (ii) the need to learn from long time

series of data, and (iii) the unstable nature of the dynamics. To achieve such

inference from the observations over long time series, it has been suggested to

combine data assimilation and machine learning in several ways. We show how to

unify these approaches from a Bayesian perspective using

expectation-maximization and coordinate descents. Implementations and

approximations of these methods are also discussed. Finally, we numerically and

successfully test the approach on two relevant low-order chaotic models with

distinct identifiability.

SieveNet: A Unified Framework for Robust Image-Based Virtual Try-On

Surgan Jandial , Ayush Chopra , Kumar Ayush , Mayur Hemani , Abhijeet Kumar , Balaji Krishnamurthy

Comments: Accepted at IEEE WACV 2020



Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Image-based virtual try-on for fashion has gained considerable attention

recently. The task requires trying on a clothing item on a target model image.

An efficient framework for this is composed of two stages: (1) warping

(transforming) the try-on cloth to align with the pose and shape of the target

model, and (2) a texture transfer module to seamlessly integrate the warped

try-on cloth onto the target model image. Existing methods suffer from

artifacts and distortions in their try-on output. In this work, we present

SieveNet, a framework for robust image-based virtual try-on. Firstly, we

introduce a multi-stage coarse-to-fine warping network to better model

fine-grained intricacies (while transforming the try-on cloth) and train it

with a novel perceptual geometric matching loss. Next, we introduce a try-on

cloth conditioned segmentation mask prior to improve the texture transfer

network. Finally, we also introduce a dueling triplet loss strategy for

training the texture translation network which further improves the quality of

the generated try-on results. We present extensive qualitative and quantitative

evaluations of each component of the proposed pipeline and show significant

performance improvements against the current state-of-the-art method.

Performance of Statistical and Machine Learning Techniques for Physical Layer Authentication

Linda Senigagliesi , Marco Baldi , Ennio Gambi

Comments: 15 pages, 13 figures, submitted to IEEE Transactions on Information Forensics and Security. arXiv admin note: text overlap with arXiv:1909.07969



Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG)

In this paper we consider authentication at the physical layer, in which the

authenticator aims at distinguishing a legitimate supplicant from an attacker

on the basis of the characteristics of the communication channel.

Authentication is performed over a set of parallel wireless channels affected

by time-varying fading in the presence of a malicious attacker, whose channel

is spatially correlated with the supplicant’s channel. We first propose the use

of two different statistical decision methods, and we prove that using a large

number of references (in the form of channel estimates) affected by different

levels of time-varying fading is not beneficial from a security point of view.

We then propose to exploit classification methods based on machine learning. In

order to face the worst case of an authenticator provided with no forged

messages during training, we consider one-class classifiers. When instead the

training set includes some forged messages, we resort to more conventional

binary classifiers, considering the cases in which such messages are either

labelled or not. For the latter case, we exploit clustering algorithms to label

the training set. The performance of both nearest neighbor (NN) and support

vector machine (SVM) classification techniques is assessed. Through numerical

examples, we show that under the same probability of false alarm, one-class

classification (OCC) algorithms achieve the lowest probability of missed

detection when a small spatial correlation exists between the main channel and

the adversary's channel, while statistical methods are advantageous when the spatial

correlation between the two channels is large.

Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates

Ping Zhou (1), Zhen Yu (2), Jingyi Ma (3), Maozai Tian (2) ((1) Beijing Information Science and Technology University, (2) Renmin University of China, (3) Central University of Finance and Economics)

Comments: 25 pages, 11 figures



Methodology (stat.ME); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Machine Learning (stat.ML)

Distributed statistical inference has recently attracted immense attention.

Herein, we study the asymptotic efficiency of the maximum likelihood estimator

(MLE), the one-step MLE, and the aggregated estimating equation estimator for

generalized linear models with a diverging number of covariates. Then a novel

method is proposed to obtain an asymptotically efficient estimator for

large-scale distributed data by two rounds of communication between local

machines and the central server. The assumption on the number of machines in

this paper is more relaxed and thus practical for real-world applications.

Simulations and a case study demonstrate the satisfactory finite-sample

performance of the proposed estimators.

A Critical Look at the Applicability of Markov Logic Networks for Music Signal Analysis

Johan Pauwels , György Fazekas , Mark B. Sandler

Comments: Accepted for presentation at the Ninth International Workshop on Statistical Relational AI (StarAI 2020) at the 34th AAAI Conference on Artificial Intelligence (AAAI) in New York, on February 7th 2020



Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In recent years, Markov logic networks (MLNs) have been proposed as a

potentially useful paradigm for music signal analysis. Because all hidden

Markov models can be reformulated as MLNs, the latter can provide an

all-encompassing framework that reuses and extends previous work in the field.

However, just because it is theoretically possible to reformulate previous work

as MLNs, does not mean that it is advantageous. In this paper, we analyse some

proposed examples of MLNs for musical analysis and consider their practical

disadvantages when compared to formulating the same musical dependence

relationships as (dynamic) Bayesian networks. We argue that a number of

practical hurdles such as the lack of support for sequences and for arbitrary

continuous probability distributions make MLNs less than ideal for the proposed

musical applications, both in terms of ease of formulation and computational

requirements due to their required inference algorithms. These conclusions are

not specific to music, but apply to other fields as well, especially when

sequential data with continuous observations is involved. Finally, we show that

the ideas underlying the proposed examples can be expressed perfectly well in

the more commonly used framework of (dynamic) Bayesian networks.

Increasing the robustness of DNNs against image corruptions by playing the Game of Noise

Evgenia Rusak , Lukas Schott , Roland Zimmermann , Julian Bitterwolf , Oliver Bringmann , Matthias Bethge , Wieland Brendel Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG); Machine Learning (stat.ML)

The human visual system is remarkably robust against a wide range of

naturally occurring variations and corruptions like rain or snow. In contrast,

the performance of modern image recognition models strongly degrades when

evaluated on previously unseen corruptions. Here, we demonstrate that a simple

but properly tuned training with additive Gaussian and Speckle noise

generalizes surprisingly well to unseen corruptions, easily reaching the

previous state of the art on the corruption benchmark ImageNet-C (with

ResNet50) and on MNIST-C. We build on top of these strong baseline results and

show that an adversarial training of the recognition model against uncorrelated

worst-case noise distributions leads to an additional increase in performance.

This regularization can be combined with previously proposed defense methods

for further improvement.
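The two noise types named in this abstract can be sketched in a few lines of Python. This is only an illustration under assumptions: the image is a flat list of pixel intensities in [0, 1], speckle noise is modelled as multiplicative Gaussian noise (x + x * n), and the function names and the value of `sigma` are chosen here rather than taken from the paper.

```python
import random

def clip01(value):
    """Keep a pixel intensity inside the valid range [0, 1]."""
    return min(1.0, max(0.0, value))

def gaussian_noise(pixels, sigma, rng):
    """Additive Gaussian noise: x + n with n ~ N(0, sigma^2)."""
    return [clip01(p + rng.gauss(0.0, sigma)) for p in pixels]

def speckle_noise(pixels, sigma, rng):
    """Speckle noise, assumed multiplicative: x + x * n with n ~ N(0, sigma^2)."""
    return [clip01(p + p * rng.gauss(0.0, sigma)) for p in pixels]

rng = random.Random(1)
image = [0.0, 0.25, 0.5, 1.0]  # hypothetical four-pixel image
print(gaussian_noise(image, 0.1, rng))
print(speckle_noise(image, 0.1, rng))
```

Under this model, speckle noise leaves black pixels untouched and perturbs bright pixels the most, whereas additive Gaussian noise perturbs all pixels equally before clipping.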

Expecting the Unexpected: Developing Autonomous-System Design Principles for Reacting to Unpredicted Events and Conditions

Assaf Marron , Lior Limonad , Sarah Pollack , David Harel

Comments: 6 pages; 1 figure



Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Robotics (cs.RO)

When developing autonomous systems, engineers and other stakeholders make

great effort to prepare the system for all foreseeable events and conditions.

However, these systems are still bound to encounter events and conditions that

were not considered at design time. For reasons like safety, cost, or ethics,

it is often highly desired that these new situations be handled correctly upon

first encounter. In this paper we first justify our position that there will

always exist unpredicted events and conditions, driven among others by: new

inventions in the real world; the diversity of world-wide system deployments

and uses; and, the non-negligible probability that multiple seemingly unlikely

events, which may be neglected at design time, will not only occur, but occur

together. We then argue that despite this unpredictability property, handling

these events and conditions is indeed possible. Hence, we offer and exemplify

design principles that when applied in advance, can enable systems to deal, in

the future, with unpredicted circumstances. We conclude with a discussion of

how this work and a broader theoretical study of the unexpected can contribute

toward a foundation of engineering principles for developing trustworthy

next-generation autonomous systems.

Extracting more from boosted decision trees: A high energy physics case study

Vidhi Lalchand

Comments: Second Workshop on Machine Learning and the Physical Sciences (NeurIPS 2019), Vancouver, Canada



Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)

Particle identification is one of the core tasks in the data analysis

pipeline at the Large Hadron Collider (LHC). Statistically, this entails the

identification of rare signal events buried in immense backgrounds that mimic

the properties of the former. In machine learning parlance, particle

identification represents a classification problem characterized by overlapping

and imbalanced classes. Boosted decision trees (BDTs) have had tremendous

success in the particle identification domain but more recently have been

overshadowed by deep learning (DNNs) approaches. This work proposes an

algorithm to extract more out of standard boosted decision trees by targeting

their main weakness, susceptibility to overfitting. This novel construction

harnesses the meta-learning techniques of boosting and bagging simultaneously

and performs remarkably well on the ATLAS Higgs (H) to tau-tau data set (ATLAS

et al., 2014) which was the subject of the 2014 Higgs ML Challenge

(Adam-Bourdarios et al., 2015). While the decay of Higgs to a pair of tau

leptons was established in 2018 (CMS collaboration et al., 2017) at the

4.9\(\sigma\) significance based on the 2016 data taking period, the 2014 public

data set continues to serve as a benchmark data set to test the performance of

supervised classification schemes. We show that the score achieved by the

proposed algorithm is very close to the published winning score which leverages

an ensemble of deep neural networks (DNNs). Although this paper focuses on a

single application, it is expected that this simple and robust technique will

find wider applications in high energy physics.

User-in-the-loop Adaptive Intent Detection for Instructable Digital Assistant

Nicolas Lair , Clément Delgrange , David Mugisha , Jean-Michel Dussoux , Pierre-Yves Oudeyer , Peter Ford Dominey

Comments: To be published as a conference paper in the proceedings of IUI’20

Journal-ref: 25th International Conference on Intelligent User Interfaces (IUI

’20), March 17–20, 2020, Cagliari, Italy



Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

People are becoming increasingly comfortable using Digital Assistants (DAs)

to interact with services or connected objects. However, for non-programming

users, the available possibilities for customizing their DA are limited and do

not include the possibility of teaching the assistant new tasks. To make the

most of the potential of DAs, users should be able to customize assistants by

instructing them through Natural Language (NL). To provide such

functionalities, NL interpretation in traditional assistants should be

improved: (1) The intent identification system should be able to recognize new

forms of known intents, and to acquire new intents as they are expressed by the

user. (2) In order to be adaptive to novel intents, the Natural Language

Understanding module should be sample efficient, and should not rely on a

pretrained model. Rather, the system should continuously collect the training

data as it learns new intents from the user. In this work, we propose AidMe

(Adaptive Intent Detection in Multi-Domain Environments), a user-in-the-loop

adaptive intent detection framework that allows the assistant to adapt to its

user by learning the user's intents as their interaction progresses. AidMe builds its

repertoire of intents and collects data to train a model of semantic similarity

evaluation that can discriminate between the learned intents and autonomously

discover new forms of known intents. AidMe addresses two major issues – intent

learning and user adaptation – for instructable digital assistants. We

demonstrate the capabilities of AidMe as a standalone system by comparing it

with a one-shot learning system and a pretrained NLU module through simulations

of interactions with a user. We also show how AidMe can smoothly integrate into

an existing instructable digital assistant.

Information Theory

Design and Analysis of Online Fountain Codes for Intermediate Performance

Jingxuan Huang , Zesong Fei , Congzhe Cao , Ming Xiao Subjects : Information Theory (cs.IT)

Online fountain codes have recently attracted much research attention because

of their improved intermediate performance. However, there is a trade-off

between the intermediate performance and the full recovery overhead for online

fountain codes, which prevents both from being improved simultaneously. We analyze

this trade-off and propose methods to improve both performance measures. We first

propose a method called Online Fountain Codes without Build-up phase (OFCNB)

where the degree-1 coded symbols are transmitted at first and the build-up

phase is removed to improve the intermediate performance. Then we analyze the

performance of OFCNB theoretically. Motivated by the analysis results, we

propose Systematic Online Fountain Codes (SOFC) to further reduce the full

recovery overhead. Theoretical analysis shows that SOFC has better intermediate

performance, and it also requires lower full recovery overhead when the channel

erasure rate is lower than a constant. Simulation results verify the analyses

and demonstrate the superior performance of OFCNB and SOFC in comparison to

other online fountain codes.

Robust Generalization via \(\alpha\)-Mutual Information

Amedeo Roberto Esposito , Michael Gastpar , Ibrahim Issa

Comments: Accepted to IZS2020. arXiv admin note: substantial text overlap with arXiv:1912.01439



Information Theory (cs.IT); Machine Learning (cs.LG)

The aim of this work is to provide bounds connecting two probability measures

of the same event using Rényi \(\alpha\)-Divergences and Sibson’s

\(\alpha\)-Mutual Information, generalizations of, respectively, the

Kullback-Leibler Divergence and Shannon’s Mutual Information. A particular case

of interest can be found when the two probability measures considered are a

joint distribution and the corresponding product of marginals (representing the

statistically independent scenario). In this case, a bound using Sibson’s

\(\alpha\)-Mutual Information is retrieved, extending a result involving Maximal

Leakage to general alphabets. These results have broad applications, from

bounding the generalization error of learning algorithms to the more general

framework of adaptive data analysis, provided that the divergences and/or

information measures used are amenable to such an analysis (i.e., are

robust to post-processing and compose adaptively). The generalization error

bounds are derived with respect to high-probability events but a corresponding

bound on expected generalization error is also retrieved.
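For readers unfamiliar with the Rényi divergence that this abstract builds on, the Python sketch below evaluates the standard discrete definition \(D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\sum_i p_i^{\alpha} q_i^{1-\alpha}\) and compares it with the Kullback-Leibler divergence, its limit as \(\alpha \to 1\). The distributions and function names are illustrative assumptions, and the sketch assumes \(q_i > 0\) wherever \(p_i > 0\); it does not reproduce any bound from the paper.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P||Q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha > 0 in nats; alpha = 1 falls back to KL."""
    if abs(alpha - 1.0) < 1e-12:
        return kl_divergence(p, q)
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1.0)

# Hypothetical pair of distributions on a binary alphabet.
p = [0.5, 0.5]
q = [0.9, 0.1]
for alpha in (0.5, 1.0, 2.0):
    print(alpha, renyi_divergence(p, q, alpha))
```

The printed values increase with the order, which reflects the known fact that the Rényi divergence is non-decreasing in \(\alpha\).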

On the Capacity of Private Monomial Computation

Yauhen Yakimenka , Hsuan-Yin Lin , Eirik Rosnes

Comments: Accepted for 2020 International Zurich Seminar on Information and Communication



Information Theory (cs.IT)

In this work, we consider private monomial computation (PMC) for replicated

noncolluding databases. In PMC, a user wishes to privately retrieve an

arbitrary multivariate monomial from a candidate set of monomials in \(f\)

messages over a finite field \(\mathbb{F}_q\), where \(q=p^k\) is a power of a prime

\(p\) and \(k \ge 1\), replicated over \(n\) databases. We derive the PMC capacity

under a technical condition on \(p\) and for asymptotically large \(q\). The

condition on \(p\) is satisfied, e.g., for large enough \(p\). Also, we present a

novel PMC scheme for arbitrary \(q\) that is capacity-achieving in the asymptotic

case above. Moreover, we present formulas for the entropy of a multivariate

monomial and for a set of monomials in uniformly distributed random variables

over a finite field, which are used in the derivation of the capacity.


DNA-Based Storage: Models and Fundamental Limits

Ilan Shomorony , Reinhard Heckel

Comments: Submitted to IEEE Transactions on Information Theory; in parts presented at ISIT 2017 and ISIT 2019. arXiv admin note: text overlap with arXiv:1705.04732



Information Theory (cs.IT)

Due to its longevity and enormous information density, DNA is an attractive

medium for archival storage. In this work, we study the fundamental limits and

trade-offs of DNA-based storage systems by introducing a new channel model,

which we call the noisy shuffling-sampling channel. Motivated by current

technological constraints on DNA synthesis and sequencing, this model captures

three key distinctive aspects of DNA storage systems: (1) the data is written

onto many short DNA molecules; (2) the molecules are corrupted by noise during

synthesis and sequencing and (3) the data is read by randomly sampling from the

DNA pool. We provide capacity results for this channel under specific noise and

sampling assumptions and show that, in many scenarios, a simple index-based

coding scheme is optimal.
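A toy Python simulation can make the three aspects of the channel and the index-based scheme more concrete. Everything specific in it is an assumption for illustration: bit strings stand in for DNA sequences, noise is modelled as independent bit flips applied per read, and the helper names `encode`, `channel`, and `decode` are hypothetical rather than taken from the paper.

```python
import random

def encode(bits, payload_len):
    """Split data into short fragments and prefix each with a binary index."""
    frags = [bits[i:i + payload_len] for i in range(0, len(bits), payload_len)]
    width = max(1, (len(frags) - 1).bit_length())
    return [format(i, "0{}b".format(width)) + f for i, f in enumerate(frags)], width

def channel(molecules, n_reads, flip_prob, rng):
    """Draw reads at random (order is lost) and flip each bit with flip_prob."""
    reads = []
    for _ in range(n_reads):
        molecule = rng.choice(molecules)  # sampling with replacement
        noisy = "".join(b if rng.random() >= flip_prob else "10"[int(b)]
                        for b in molecule)
        reads.append(noisy)
    return reads

def decode(reads, width, n_frags):
    """Use the index prefix to put payloads back in order; None if never read."""
    out = [None] * n_frags
    for read in reads:
        idx = int(read[:width], 2)
        if idx < n_frags and out[idx] is None:
            out[idx] = read[width:]
    return out

rng = random.Random(0)
molecules, width = encode("0110100111010010", payload_len=4)
reads = channel(molecules, n_reads=12, flip_prob=0.0, rng=rng)
print(decode(reads, width, len(molecules)))
```

Fragments that are never sampled stay `None`, which shows why the number of reads matters even when the channel is noiseless.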

Point-line incidence on Grassmannians and majority logic decoding of Grassmann codes

Peter Beelen , Prasant Singh Subjects : Information Theory (cs.IT) ; Algebraic Geometry (math.AG)

In this article, we consider the decoding problem of Grassmann codes using

majority logic. We show that for two points of the Grassmannian, there exists a

canonical path between these points once a complete flag is fixed. These paths

are used to construct a large set of parity checks orthogonal on a coordinate

of the code, resulting in a majority decoding algorithm.

Data-Driven Ensembles for Deep and Hard-Decision Hybrid Decoding

Tomer Raviv , Nir Raviv , Yair Be'ery Subjects : Information Theory (cs.IT)

Ensemble models are widely used to solve complex tasks by their decomposition

into multiple simpler tasks, each one solved locally by a single member of the

ensemble. Decoding of error-correction codes is a hard problem due to the curse

of dimensionality, leading one to consider ensembles-of-decoders as a possible

solution. Nonetheless, one must take complexity into account, especially in

decoding. We suggest a low-complexity scheme where a single member participates

in the decoding of each word. First, the distribution of feasible words is

partitioned into non-overlapping regions. Thereafter, specialized experts are

formed by independently training each member on a single region. A classical

hard-decision decoder (HDD) is employed to map every word to a single expert in

an injective manner. FER gains of up to 0.4dB at the waterfall region, and of

1.25dB at the error floor region are achieved for two BCH(63,36) and (63,45)

codes with cycle-reduced parity-check matrices, compared to the previous best

result of the paper “Active Deep Decoding of Linear Codes”.

Duplication with transposition distance to the root for \(q\)-ary strings

Nikita Polyanskii , Ilya Vorobyev

Comments: 6 pages, 1 table, submitted to International Symposium on Information Theory (ISIT) 2020



Information Theory (cs.IT)

We study the duplication with transposition distance between strings of

length \(n\) over a \(q\)-ary alphabet and their roots. In other words, we

investigate the number of duplication operations of the form \(x = (abcd) \to y

= (abcbd)\), where \(x\) and \(y\) are strings and \(a\), \(b\), \(c\) and \(d\) are their

substrings, needed to get a \(q\)-ary string of length \(n\) starting from the set

of strings without duplications. For exact duplication, we prove that the

maximal distance between a string of length at most \(n\) and its root has the

asymptotic order \(n/\log n\). For approximate duplication, where a

\(\beta\)-fraction of symbols may be duplicated incorrectly, we show that the

maximal distance has a sharp transition from the order \(n/\log n\) to \(\log n\)

at \(\beta=(q-1)/q\). The motivation for this problem comes from genomics, where

such duplications represent a special kind of mutation and the distance between

a given biological sequence and its root is the smallest number of

transposition mutations required to generate the sequence.
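The duplication operation from this abstract, which turns a string with substrings a, b, c, d into a, b, c, b, d, is easy to state in code. The Python sketch below applies one exact duplication; the function name and the index convention (b = x[i:j], c = x[j:k]) are choices made here for illustration.

```python
def duplicate_with_transposition(x, i, j, k):
    """Map x = a b c d to a b c b d, where b = x[i:j] and c = x[j:k]."""
    if not 0 <= i <= j <= k <= len(x):
        raise ValueError("indices must satisfy 0 <= i <= j <= k <= len(x)")
    # x[:k] is a b c, x[i:j] is the copied block b, x[k:] is d.
    return x[:k] + x[i:j] + x[k:]

# a = "A", b = "C", c = "G", d = "T"
print(duplicate_with_transposition("ACGT", 1, 2, 3))
# Empty c (j == k) gives an ordinary tandem duplication: a b b d.
print(duplicate_with_transposition("ACGT", 1, 3, 3))
```

The distance studied in the abstract is the number of such steps needed to reach a string from a duplication-free root.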

Chebyshev Inertial Landweber Algorithm for Linear Inverse Problems

Tadashi Wadayama , Satoshi Takabe

Comments: 5 pages



Information Theory (cs.IT); Numerical Analysis (math.NA)

The Landweber algorithm defined on complex/real Hilbert spaces is a gradient

descent algorithm for linear inverse problems. Our contribution is to present a

novel method for accelerating convergence of the Landweber algorithm. In this

paper, we first extend the theory of the Chebyshev inertial iteration to the

Landweber algorithm on Hilbert spaces. An upper bound on the convergence rate

clarifies the speed of global convergence of the proposed method. The Chebyshev

inertial Landweber algorithm can be applied to a wide class of signal recovery

problems on a Hilbert space including deconvolution for continuous signals. The

theoretical discussion developed in this paper naturally leads to a novel

practical signal recovery algorithm. As a demonstration, a MIMO detection

algorithm based on the projected Landweber algorithm is derived. The proposed

MIMO detection algorithm achieves a much smaller symbol error rate than

the MMSE detector.
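For context, the plain (non-accelerated) Landweber iteration for a real linear system y = Ax is x_{k+1} = x_k + ω Aᵀ(y − A x_k), which converges when 0 < ω < 2/σ_max(A)². The Python sketch below implements only this baseline on a small, hypothetical 2x2 example; it does not include the Chebyshev inertial step sizes proposed in the paper, and the matrix, step size, and function names are assumptions.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def landweber(A, y, omega, iterations):
    """Gradient descent on 0.5 * ||y - A x||^2 with constant step size omega."""
    At = transpose(A)
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        residual = [yi - ri for yi, ri in zip(y, matvec(A, x))]
        gradient_step = matvec(At, residual)
        x = [xi + omega * gi for xi, gi in zip(x, gradient_step)]
    return x

A = [[2.0, 0.0], [0.0, 1.0]]  # largest singular value 2, so omega must be < 0.5
y = [2.0, 3.0]                # exact solution is x = [1, 3]
x_hat = landweber(A, y, omega=0.4, iterations=100)
print(x_hat)
```

With a constant step size, the per-iteration error contraction is governed by the spread of the singular values of A, which is the quantity that acceleration schemes aim to exploit.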

Constrained Functional Value under General Convexity Conditions with Applications to Distributed Simulation

Yanjun Han Subjects : Information Theory (cs.IT) ; Functional Analysis (math.FA)

We show a general phenomenon of the constrained functional value for

densities satisfying general convexity conditions, which generalizes the

observation in Bobkov and Madiman (2011) that the entropy per coordinate in a

log-concave random vector in any dimension with given density at the mode has a

range of just 1. Specifically, for general functions \(\phi\) and \(\psi\), we

derive upper and lower bounds of density functionals taking the form

\(I_\phi(f) = \int_{\mathbb{R}^n} \phi(f(x))dx\), assuming the convexity of \(\psi^{-1}(f(x))\)

for the density, and establish the tightness of these bounds under mild

conditions satisfied by most examples. We apply this result to the distributed

simulation of continuous random variables, and establish an upper bound of the

exact common information for \(\beta\)-concave joint densities, which is a

generalization of the log-concave densities in Li and El Gamal (2017).

Low Complexity Algorithms for Transmission of Short Blocks over the BSC with Full Feedback

Amaael Antonini , Hengjie Yang , Richard D. Wesel

Comments: Submitted to ISIT 2020; comments welcome!



Subjects: Information Theory (cs.IT)

Building on the work of Horstein, Shayevitz and Feder, and Naghshvar et al.,

this paper presents algorithms for low-complexity sequential transmission of a

\(k\)-bit message over the binary symmetric channel (BSC) with full, noiseless

feedback. To lower complexity, this paper shows that the initial \(k\) binary

transmissions can be sent before any feedback is required and groups the

messages with equal posteriors to reduce the number of posterior updates from

exponential in \(k\) to linear in \(k\). Simulation results demonstrate that the

achievable rates for this full, noiseless feedback system approach capacity

rapidly as a function of \(k\), significantly faster than the achievable rate

curve of Polyanskiy et al. for a stop feedback system.
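
The reduction from exponential to linear can be made concrete under one simplifying assumption that the abstract does not spell out: the first k transmissions are the k message bits themselves, and the prior over the 2^k messages is uniform. Every message at the same Hamming distance d from the received word then has the same posterior, so k + 1 groups suffice. The sketch below uses hypothetical names and parameter values throughout.

```python
from math import comb


def grouped_posteriors(k, p):
    """Group the 2**k messages by Hamming distance to the received word.

    Assumes (for illustration only) that the k message bits were sent
    uncoded over a BSC with crossover probability p and a uniform prior.
    Returns a list of (distance, group_size, posterior_per_message).
    """
    groups = []
    for d in range(k + 1):
        # With a uniform prior the normaliser is (p + (1 - p))**k = 1,
        # so the likelihood is already the posterior of a single message.
        posterior = (p ** d) * ((1 - p) ** (k - d))
        groups.append((d, comb(k, d), posterior))
    return groups


groups = grouped_posteriors(k=8, p=0.05)
total_messages = sum(size for _, size, _ in groups)
total_probability = sum(size * post for _, size, post in groups)
```

Only k + 1 = 9 posterior values are tracked here instead of 2^8 = 256 individual ones; the group sizes still account for every message.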

An Efficient Algorithm for Designing Optimal CRCs for Tail-Biting Convolutional Codes

Hengjie Yang , Linfang Wang , Vincent Lau , Richard D. Wesel

Comments: Submitted to ISIT 2020; comments welcome!



Subjects: Information Theory (cs.IT)

This paper proposes an efficient algorithm for designing the

distance-spectrum-optimal (DSO) cyclic redundancy check (CRC) polynomial for a

given tail-biting convolutional code (TBCC). Lou et al. proposed a DSO CRC design

methodology for a given zero-terminated convolutional code (ZTCC), in which the

fundamental design principle is to maximize the minimum distance at which an

undetectable error event of the ZTCC first occurs. This paper applies the same

principle to design the DSO CRC for a given TBCC. Our algorithm is based on

partitioning the tail-biting trellis into several disjoint sets of tail-biting

paths that are closed under cyclic shifts. This paper shows that the

tail-biting path in each set can be constructed by concatenating the

irreducible error events and/or circularly shifting the resultant path. This

motivates an efficient collection algorithm that aims at gathering irreducible

error events, and a search algorithm that reconstructs the full list of error

events in the order of increasing distance, which can be used to find the DSO

CRC for a given TBCC.
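
As background for the term "undetectable error", a CRC with generator polynomial g(x) misses an error pattern e(x) exactly when g(x) divides e(x) over GF(2). The sketch below checks only that divisibility condition; the polynomial, the error patterns and the function names are assumptions chosen for illustration, and the trellis partitioning and search algorithms of the paper are not reproduced.

```python
def gf2_mod(dividend, divisor):
    """Remainder of polynomial division over GF(2).

    Polynomials are integers whose bit i is the coefficient of x**i.
    """
    if divisor == 0:
        raise ValueError("divisor polynomial must be non-zero")
    divisor_len = divisor.bit_length()
    while dividend.bit_length() >= divisor_len:
        # Cancel the leading term of the dividend.
        shift = dividend.bit_length() - divisor_len
        dividend ^= divisor << shift
    return dividend


def is_undetectable(error_pattern, crc_poly):
    """True if a non-zero error pattern is a multiple of the CRC polynomial."""
    return error_pattern != 0 and gf2_mod(error_pattern, crc_poly) == 0


crc_poly = 0b1011                         # x^3 + x + 1 (hypothetical choice)
multiple = crc_poly ^ (crc_poly << 1)     # (x + 1) * g(x) over GF(2)
single_bit = 0b100                        # x^2, a single-bit error
results = (is_undetectable(multiple, crc_poly),
           is_undetectable(single_bit, crc_poly))
```

In this toy setting the multiple of g(x) slips through the check while the single-bit error is caught. A distance-spectrum-optimal design, as described above, chooses the polynomial so that undetectable patterns first appear at as large a distance as possible.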

GSSMD: New metric for robust and interpretable assay quality assessment and hit selection

Seongyong Park , Shujaat Khan

Comments: Submitted to Research Synthesis Methods



Subjects: Applications (stat.AP); Information Theory (cs.IT); Quantitative Methods (q-bio.QM); Methodology (stat.ME)

In high-throughput screening (HTS) campaigns, the Z’-factor and the strictly

standardized mean difference (SSMD) are commonly used to assess the quality of

assays and to select hits. However, these measures are vulnerable to outliers

and their performance is highly sensitive to background distributions. Here,

we propose an alternative measure for assay quality assessment and hit

selection. The proposed method is a non-parametric generalized variant of SSMD

(GSSMD). In this paper, we show that the proposed method provides a more

robust and intuitive way of assay quality assessment and hit selection.
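
The two baseline measures named in this abstract have simple closed forms in their commonly used versions: Z' = 1 - 3(sigma_p + sigma_n) / |mu_p - mu_n| and SSMD = (mu_p - mu_n) / sqrt(sigma_p^2 + sigma_n^2) for independent positive and negative controls. The following sketch, with made-up control readings, shows how a single outlier degrades both. It does not implement GSSMD, whose definition is not given in the abstract.

```python
from math import sqrt
from statistics import mean, stdev


def z_prime_factor(pos, neg):
    """Z'-factor from positive and negative control readings."""
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))


def ssmd(pos, neg):
    """Strictly standardized mean difference for independent controls."""
    return (mean(pos) - mean(neg)) / sqrt(stdev(pos) ** 2 + stdev(neg) ** 2)


# Hypothetical plate readings, chosen only for illustration.
negative = [9.0, 10.0, 11.0, 10.0]
positive_clean = [98.0, 100.0, 102.0, 100.0]
positive_outlier = [98.0, 100.0, 102.0, 40.0]   # one corrupted well

z_clean = z_prime_factor(positive_clean, negative)
z_outlier = z_prime_factor(positive_outlier, negative)
ssmd_clean = ssmd(positive_clean, negative)
ssmd_outlier = ssmd(positive_outlier, negative)
```

With these numbers one bad well is enough to push the Z'-factor from above 0.9 to below zero, which is the kind of outlier sensitivity the abstract refers to.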

Performance of Statistical and Machine Learning Techniques for Physical Layer Authentication

Linda Senigagliesi , Marco Baldi , Ennio Gambi

Comments: 15 pages, 13 figures, submitted to IEEE Transactions on Information Forensics and Security. arXiv admin note: text overlap with arXiv:1909.07969



Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG)

In this paper we consider authentication at the physical layer, in which the

authenticator aims at distinguishing a legitimate supplicant from an attacker

on the basis of the characteristics of the communication channel.

Authentication is performed over a set of parallel wireless channels affected

by time-varying fading in the presence of a malicious attacker, whose channel

is spatially correlated with that of the supplicant. We first propose the use

of two different statistical decision methods, and we prove that using a large

number of references (in the form of channel estimates) affected by different

levels of time-varying fading is not beneficial from a security point of view.

We then propose to exploit classification methods based on machine learning. In

order to face the worst case of an authenticator provided with no forged

messages during training, we consider one-class classifiers. When instead the

training set includes some forged messages, we resort to more conventional

binary classifiers, considering the cases in which such messages are either

labelled or not. For the latter case, we exploit clustering algorithms to label

the training set. The performance of both nearest neighbor (NN) and support

vector machine (SVM) classification techniques is assessed. Through numerical

examples, we show that under the same probability of false alarm, one-class

classification (OCC) algorithms achieve the lowest probability of missed

detection when a small spatial correlation exists between the main channel and

the adversary's channel, while statistical methods are advantageous when the spatial

correlation between the two channels is large.

Proceedings 16th Workshop on Quantitative Aspects of Programming Languages and Systems

Alessandro Aldini (University of Urbino), Herbert Wiklicky (Imperial College London)

Journal-ref: EPTCS 312, 2020



Subjects: Programming Languages (cs.PL); Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Logic in Computer Science (cs.LO)

This EPTCS volume contains the proceedings of the 16th Workshop on

Quantitative Aspects of Programming Languages and Systems (QAPL 2019) held in

Prague, Czech Republic, on Sunday 7 April 2019. QAPL 2019 was a satellite event

of the European Joint Conferences on Theory and Practice of Software (ETAPS


QAPL focuses on quantitative aspects of computations, which may refer to the

use of physical quantities (time, bandwidth, etc.) as well as mathematical

quantities (e.g., probabilities) for the characterisation of the behaviour and

for determining the properties of systems. Such quantities play a central role

in defining both the model of systems (architecture, language design,

semantics) and the methodologies and tools for the analysis and verification of

system properties. The aim of the QAPL workshop series is to discuss the

explicit use of time and probability and general quantities either directly in

the model or as a tool for the analysis or synthesis of systems.

The 16th edition of QAPL also focuses on discussing the developments,

challenges and results in this area covered by our workshop in its nearly

20-year history.

Compactly Supported Quasi-tight Multiframelets with High Balancing Orders and Compact Framelet Transforms

Bin Han , Ran Lu

Comments: 33 pages, 20 figures



Subjects: Functional Analysis (math.FA); Information Theory (cs.IT)

Framelets (a.k.a. wavelet frames) are of interest in both theory and

applications. Quite often, tight or dual framelets with high vanishing moments

are constructed through the popular oblique extension principle (OEP). Though

OEP can increase vanishing moments for improved sparsity, it has a serious

shortcoming for scalar framelets: the associated discrete framelet transform is

often not compact and deconvolution is unavoidable. Here we say that a framelet

transform is compact if it can be implemented by convolution using only

finitely supported filters. On the other hand, in sharp contrast to the

extensively studied scalar framelets, multiframelets (a.k.a. vector framelets)

derived through OEP from refinable vector functions are much less studied and

are far from well understood. Also, most constructed multiframelets lack the

balancing property, which reduces sparsity. In this paper, we are particularly

interested in quasi-tight multiframelets, which are special dual multiframelets

but behave almost identically to tight multiframelets. From any compactly

supported refinable vector function having at least two entries, we

prove that we can always construct through OEP a compactly supported

quasi-tight multiframelet such that (1) its associated discrete framelet

transform is compact and has the highest possible balancing order; (2) all

compactly supported framelet generators have the highest possible order of

vanishing moments, matching the approximation/accuracy order of its underlying

refinable vector function. This result demonstrates great advantages of OEP for

multiframelets (retaining all the desired properties) over scalar framelets.

