Sentiment Analyzer with BERT (build, tune, deploy)


A brief description of how I developed a sentiment analyzer. It covers text preprocessing, model building and tuning, the API, frontend creation, and containerization.


Dataset

I used the dataset published by The Stanford NLP Group. I merged two files: ‘dictionary.txt’, which contains 239,232 text fragments, and ‘sentiment_labels.txt’, which contains the sentiment scores assigned to those fragments.

Text preprocessing with regular expressions

To clean the text, I usually use a bunch of functions built on regular expressions. You can find all of them in common.py; one example is remove_nonwords, described below.
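
A minimal sketch of what remove_nonwords does (the exact regex in common.py may differ):

```python
import re

def remove_nonwords(text):
    # Replace every character that is not a letter with a space.
    return re.sub(r'[^a-zA-Z]', ' ', text)

print(remove_nonwords('BERT-based model no. 1!'))  # punctuation and digits blanked out
```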

Similar functions were used to remove empty rows, special characters, numbers, and HTML code.

After text cleaning, it’s time to create the BERT embeddings. For that purpose, I used bert-as-service. It is very simple and consists of only 3 steps: download a pre-trained model, start the BERT service, and use the client to encode sentences into fixed-length vectors.
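
A minimal sketch of the client side, assuming the server has already been started (the model directory and the max_seq_len value are placeholders):

```python
# Prerequisites (run once, outside this script):
#   pip install bert-serving-server bert-serving-client
#   bert-serving-start -model_dir ./uncased_L-12_H-768_A-12 -max_seq_len=38
from bert_serving.client import BertClient

bc = BertClient()  # connects to a local bert-as-service instance by default
embeddings = bc.encode(['what a great movie', 'the plot was rather dull'])
print(embeddings.shape)  # (2, 768) for a BERT-Base model
```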

There are multiple parameters that can be set when running the service. For example, to define max_seq_len, I calculated the 0.9 quantile of the text lengths in the training data.
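
A sketch of that calculation (the column name 'text' and tokenization by whitespace are my assumptions):

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the cleaned training data.
train = pd.DataFrame({'text': ['what a great movie', 'the plot was rather dull']})

lengths = train['text'].str.split().str.len()   # words per fragment
max_seq_len = int(np.quantile(lengths, 0.9))    # 0.9 quantile of text lengths
```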

The preprocessed data takes the form of a data frame containing 768 features. For the full code, please see nlp_preprocess.py.

Model building with Keras

In this part, we build and train the model with different parameter settings. Let’s assume we want a 5-layer neural network as sketched below. We will parametrize the batch size, the number of epochs, the number of nodes in the first 4 dense layers, and the rates of the 5 dropout layers.
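
A sketch of such a network in Keras (layer sizes, activations, and the placement of the dropout layers are my assumptions, not necessarily the article’s exact architecture):

```python
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

def build_model(nodes=(512, 256, 128, 64),
                dropouts=(0.2, 0.2, 0.2, 0.2, 0.2)):
    model = Sequential()
    model.add(Dropout(dropouts[0], input_shape=(768,)))  # 768 BERT features
    for n, d in zip(nodes, dropouts[1:]):
        model.add(Dense(n, activation='relu'))
        model.add(Dropout(d))
    model.add(Dense(1, activation='sigmoid'))  # normalized sentiment score
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    return model
```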

Model tuning with Sacred

Now we can tune the parameters. We will use the sacred module. The key points here are:

1. Create an Experiment and add Observer

First we need to create an experiment and an observer that logs all kinds of information. It’s very simple!
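
A sketch, assuming a local MongoDB instance (the experiment and database names are placeholders; older Sacred versions may need MongoObserver.create instead):

```python
from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('sentiment_analyzer')
ex.observers.append(MongoObserver(url='localhost:27017', db_name='sacred'))
```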

2. Define the main function

The @ex.automain decorator defines and runs the main function of the experiment when we run the Python script.
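
Roughly like this (the parameter names and the training call are my assumptions, based on the earlier sections):

```python
@ex.automain
def run(_run, batch_size, epochs, nodes, dropouts):
    # Parameters are injected from the config scope (step 3); _run is
    # injected by Sacred and is used for metrics logging in step 4.
    # X_train/y_train come from the preprocessing step (nlp_preprocess.py).
    model = build_model(nodes, dropouts)
    model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
              validation_split=0.1)
```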

3. Add the Configuration parameters

We will define them through a Config Scope.
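
For example (the default values are placeholders; Sacred turns the local variables of this function into configuration entries):

```python
@ex.config
def cfg():
    batch_size = 64
    epochs = 20
    nodes = (512, 256, 128, 64)
    dropouts = (0.2, 0.2, 0.2, 0.2, 0.2)
```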

4. Add metrics

In our case I want to track the MAE and MSE. We can use the Metrics API for that.
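
Inside the main function, both metrics can be logged through the injected _run object (the evaluation call assumes the model was compiled with loss='mse' and metrics=['mae']):

```python
mse, mae = model.evaluate(X_test, y_test)  # returns [loss, metric] = [mse, mae]
_run.log_scalar('mse', mse)
_run.log_scalar('mae', mae)
```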

5. Run the experiment

The functions from the previous steps are stored in the model_experiment.py script. To run our experiment for a bunch of parameter combinations, we create and run run_sacred.py. For all possible permutations, the MAE and MSE will be saved in MongoDB.
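
A sketch of run_sacred.py (the grid values are placeholders; I assume the permutations are built with itertools.product):

```python
from itertools import product

from model_experiment import ex

batch_sizes = [32, 64]
epoch_counts = [10, 20]

# Run the experiment once per parameter combination; each run is
# recorded in MongoDB by the observer attached in step 1.
for batch_size, epochs in product(batch_sizes, epoch_counts):
    ex.run(config_updates={'batch_size': batch_size, 'epochs': epochs})
```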

The best result I got is an MAE of 9%. That means our sentiment analyzer works pretty well. We can check it with the model_inference function.
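
A hypothetical call (the signature of model_inference is my assumption):

```python
score = model_inference('What a wonderful, touching film!')
print(score)  # a normalized sentiment score; higher means more positive
```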

Note that the score is normalized, so values outside the usual range can also occur. Once the model is saved, we can build a web API!

Web API creation with Flask

Now we want to create an API that runs the scoring function and displays the returned result in the browser.

The syntax @app.route('/score', methods=['PUT']) lets Flask know that the function score should be mapped to the endpoint /score. The methods list is a keyword argument that tells Flask which kinds of HTTP requests are allowed. We’ll be using PUT requests to receive sentences from a user. In the score function, we return the score in dictionary form, since it can easily be converted to a JSON string. The full code is available in api.py.
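
A minimal sketch of api.py (the request payload key and the use of model_inference are my assumptions based on the description above):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/score', methods=['PUT'])
def score():
    # model_inference comes from the model section; 'sentence' key is assumed.
    sentence = request.get_json()['sentence']
    return jsonify({'score': float(model_inference(sentence))})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```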

Frontend

For the web interface, three files were created:

index.html
style.css
index.js

For the gradient, the HSV colour model was used. Saturation and value are constant; hue corresponds to the score. Varying the hue in the range [0, 120] yields a smooth colour change from red through yellow to green.
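
The mapping itself lives in index.js; here is the same idea as a Python sketch (the clamping is my assumption, since the normalized score can fall outside [0, 1]):

```python
def score_to_hue(score):
    # Clamp the normalized score to [0, 1], then scale to a hue in [0, 120]:
    # 0 -> red, 60 -> yellow, 120 -> green (saturation and value stay fixed).
    return round(120 * min(max(score, 0.0), 1.0))
```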

Docker containerization

The brilliance of Docker is that once you package an application and all its dependencies into a container, you ensure it will run in any environment. It is generally recommended to separate areas of concern by using one service per container. In my small app there are 3 parts that should be combined: bert-as-service, the application, and the frontend. The tool that helps you build the Docker images and run the containers together is Docker Compose.


The steps we need to take to dockerize our code:

  • Create separate folders for bert-as-service, the api, and the frontend,
  • Put the relevant files in each folder,
  • Add requirements.txt and a Dockerfile to each folder. The first file should list all the needed libraries, which will be installed via a command in the Dockerfile; the Dockerfile format is described in the Docker documentation,
  • Create docker-compose.yaml in the directory containing the 3 folders. Define the 3 services that make up the app in this file so they can be run together in an isolated environment, as sketched below.
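
A hypothetical docker-compose.yaml along those lines (the service names, build contexts, and published ports are placeholders; 5555/5556 are bert-as-service’s default ports):

```yaml
version: "3"
services:
  bert:
    build: ./bert-as-service
    ports:
      - "5555:5555"
      - "5556:5556"
  api:
    build: ./api
    ports:
      - "5000:5000"
    depends_on:
      - bert
  frontend:
    build: ./frontend
    ports:
      - "80:80"
    depends_on:
      - api
```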

Now we are ready to build and run our application! Please see the sample outputs below.

[Screenshot: sample outputs of the running application]

As usual, please feel free to view the full code on my GitLab.

