Full-Stack Web Scraper for ML Using Node.js and MySQL

Category: IT Technology · Published: 6 years ago

This repository documents a full-stack web scraping platform for use in machine learning.

Architecture

We first break the architecture into four distinct components: the front end, the API, the scrapers and the database. The user submits input, such as a YouTube URL, through a form on the front end, which sends it to the API. The API then calls the scrapers, which pull the necessary data and save it to the database. After that, the data is served back to the front end.

The tech stack is as follows:

Front end - JavaScript
API - Express
Scraper - Puppeteer
Database - MySQL (TypeORM)

We also need Node.js, npm and MySQL installed.

The architecture consists of the following components:

Front End

For the front end we will have a header, an input box and a button. Below them are render boxes that display the relevant information from the JSON returned by the API. The form sends the user's input to the API.
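As a rough illustration of the front-end logic, the sketch below builds the request that the form would send and turns the returned JSON into render boxes. The `/creators` path, the `card` class and the helper names are assumptions made for this example. The field names follow the Database section.

```javascript
// Escape text before placing it into HTML so scraped values cannot inject markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one render box per record returned by the API.
// The record shape (name, avatar, channelURL) is taken from the Database section.
function renderCreators(creators) {
  return creators
    .map(
      (c) =>
        `<div class="card">` +
        `<img src="${escapeHtml(c.avatar)}" alt="">` +
        `<a href="${escapeHtml(c.channelURL)}">${escapeHtml(c.name)}</a>` +
        `</div>`
    )
    .join("\n");
}

// Describe the POST request the form sends to the API.
// The "/creators" path is a hypothetical route name.
function buildSubmitRequest(channelURL) {
  return {
    url: "/creators",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ channelURL }),
    },
  };
}
```

In the browser, the button's click handler could pass the result to `fetch(request.url, request.options)` and then write the output of `renderCreators` into the container element below the form.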

API

We create a single route with two methods, GET and POST. We use Node.js and Express, a simple back-end framework.
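A minimal sketch of such a route might look like the following. The `/creators` path, the handler names and the in-memory array (standing in for the MySQL table) are assumptions for illustration, not the repository's actual code. The Express and body-parser modules are passed in as arguments so the handlers can be read and exercised on their own.

```javascript
// In-memory stand-in for the database table (assumption, for illustration only).
const creators = [];

// GET handler: return everything stored so far.
function listCreators(req, res) {
  res.json(creators);
}

// POST handler: validate the submitted URL and store a record.
// A full implementation would call the scraper here before saving.
function addCreator(req, res) {
  const channelURL = req.body && req.body.channelURL;
  if (typeof channelURL !== "string" || channelURL.length === 0) {
    return res.status(400).json({ error: "channelURL is required" });
  }
  const creator = { id: creators.length + 1, channelURL };
  creators.push(creator);
  return res.status(201).json(creator);
}

// Wire both methods onto one route.
// `express` and `bodyParser` are the modules installed with npm.
function createApp(express, bodyParser) {
  const app = express();
  app.use(bodyParser.json()); // parse JSON request bodies into req.body
  app.route("/creators").get(listCreators).post(addCreator);
  return app;
}
```

With the packages installed, `createApp(require("express"), require("body-parser")).listen(3000)` would start the server. The port number is likewise an assumption.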

Scraper

This function takes a URL, reaches out to YouTube, fetches the relevant data and then stores it in the database.
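The fetching half of that function could be sketched as below. The function takes an already launched Puppeteer browser as an argument. Reading the channel name and avatar from the page's Open Graph meta tags is an assumption about YouTube's markup, and `isYouTubeUrl` and `scrapeChannel` are hypothetical names. Saving the returned record to the database is left to the caller.

```javascript
// Accept only http(s) URLs on youtube.com so the scraper is not pointed elsewhere.
function isYouTubeUrl(value) {
  try {
    const { protocol, hostname } = new URL(value);
    const isHttp = protocol === "https:" || protocol === "http:";
    const isYouTube = hostname === "youtube.com" || hostname.endsWith(".youtube.com");
    return isHttp && isYouTube;
  } catch {
    return false; // not a parseable URL
  }
}

// `browser` is a Puppeteer Browser instance, e.g. the result of puppeteer.launch().
async function scrapeChannel(browser, channelURL) {
  if (!isYouTubeUrl(channelURL)) {
    throw new Error(`Not a YouTube URL: ${channelURL}`);
  }
  const page = await browser.newPage();
  try {
    await page.goto(channelURL, { waitUntil: "networkidle2" });
    // This callback runs inside the page. The meta tag names are an
    // assumption about what YouTube exposes, not a documented contract.
    const { name, avatar } = await page.evaluate(() => {
      const meta = (prop) => {
        const el = document.querySelector(`meta[property="${prop}"]`);
        return el ? el.getAttribute("content") : null;
      };
      return { name: meta("og:title"), avatar: meta("og:image") };
    });
    return { name, avatar, channelURL };
  } finally {
    await page.close(); // always release the tab, even if scraping fails
  }
}
```

A caller would launch the browser once with `puppeteer.launch()`, call `scrapeChannel` for each submitted URL, and close the browser when finished.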

Database

We use MySQL here. Each record stores id, name, avatar and channelURL.
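Since the stack lists TypeORM, one way to describe this table from plain JavaScript is TypeORM's `EntitySchema`. The sketch below is an assumption about how the entity might be declared: the entity name `Creator`, the table name `creators` and the column types are not given in the original. The `EntitySchema` class is passed in as an argument so the definition itself stays a plain object.

```javascript
// Options object in the shape accepted by TypeORM's EntitySchema constructor.
// Entity name, table name and column types are assumptions for illustration.
const creatorSchemaOptions = {
  name: "Creator",
  tableName: "creators",
  columns: {
    id: { primary: true, type: "int", generated: true }, // auto-increment key
    name: { type: "varchar" },
    avatar: { type: "varchar" },
    channelURL: { type: "varchar" },
  },
};

// Create the entity from TypeORM's EntitySchema class.
function defineCreator(EntitySchema) {
  return new EntitySchema(creatorSchemaOptions);
}
```

With TypeORM installed, `defineCreator(require("typeorm").EntitySchema)` would yield the entity to register with the MySQL connection.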

To run the program

First go into the server directory and initialize the project

$ npm init

Install all the necessary packages

$ npm install express
$ npm install body-parser

Run the index.js script

$ node index.js

Thanks to Aron from Uber
