Our AI/ML Startup’s Tech Stack



Some insight into how we’re building our technology.

Luke Posey

Mar 23 · 4 min read

Our AI/ML Startup’s Tech Stack — Photo Creds: Unsplash

Here’s some transparency into the technical decisions that have driven our technology, along with some insight into what our stack has made easy and what our tech choices have made hard.

Hopefully, this gives you an idea of some of the relevant technologies in our field. From talking to fellow startup founders, this stack is pretty similar to what a lot of other machine-learning-focused data teams use, with some variation by industry and personal circumstance.

Overview of our stack:

Spawner API:

Languages: Python, C++, SQL
AI/ML: TensorFlow (our toolkit of choice for DL), Scikit-Learn (our go-to for most non-DL tasks)
Other Libraries: Pandas, NumPy, fbprophet, NLTK, SciPy, ffn, pyodbc, APScheduler
Database: SQL Server (considering a migration to PostgreSQL)
Warehouse: N/A for now (slowly moving into Databricks)
ETL: Python, Airflow
Visualizations: Streamlit, Plotly (visualizing app performance), Altair (viz and dashboarding for new ideas), Tableau (internal business intelligence)
Hosting: Azure (core), Heroku (side projects & demos)
Tracking & SC: GitHub, Notion (keeping engineering, PMs and marketing synced up)

Spawner Portal:

Languages & Frameworks: (FE) React + Next.js, (BE) Python
Database: SQL Server
Hosting: Azure

Languages

We use Python for basically everything. When something isn’t serving efficiently or wasn’t built well enough to keep up, we think about converting it to C++, with the Python version serving merely as a reference implementation. We use Python most heavily for our modeling and ETL. We’re very much a data company, so of course there’s SQL everywhere.
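
One way to picture the "reference implementation" idea: keep the simple, obviously correct Python version around and check any faster rewrite against it on random inputs. The following is a minimal sketch under stated assumptions, not code from our stack. Both function names are hypothetical, and the "fast" version is still Python that only stands in for a C++ port.

```python
import random


def moving_average_reference(values, window):
    """Slow but obviously correct: re-sum every window from scratch."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def moving_average_fast(values, window):
    """Running-sum version; stands in for a hypothetical C++ port."""
    if len(values) < window:
        return []
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


def agrees_with_reference(trials=200, seed=0):
    """Compare the two implementations on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.uniform(-100, 100) for _ in range(rng.randint(0, 50))]
        window = rng.randint(1, 10)
        expected = moving_average_reference(values, window)
        actual = moving_average_fast(values, window)
        if len(expected) != len(actual):
            return False
        # Allow a small tolerance for floating-point drift in the running sum.
        if any(abs(e - a) > 1e-9 for e, a in zip(expected, actual)):
            return False
    return True
```

In a real port, the fast function would be a binding to compiled code, and a check like `agrees_with_reference()` would run in the test suite so the two versions cannot silently diverge.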

AI/ML

We like TensorFlow for its great documentation and the large number of devs familiar with it. That said, PyTorch is starting to make real headway, especially with all the great work Facebook Research has done recently. For now, TensorFlow makes up the majority of our stack, but I see nothing wrong with TF and PyTorch mingling in the future.

Scikit-Learn pops up all over the place. Its ease of use is undeniable, and it’s in production at companies all over the industry. It’s really the bread and butter of the non-deep-learning work we do on the ML side.

Frontends & Frameworks

Quite frankly, I’m not a frontend dev, so I won’t waste your time on this section. Our first hire liked Vue.js, so we went that direction originally. He thought React/Next.js made more sense for another part of the codebase, so that’s how that happened. We’re incredibly pleased with the work our devs have done. We love Next.js for its SEO friendliness.

Database

Our stack lives on Azure, so SQL Server made the most sense up front: the two are tightly integrated, which helps on both cost and ease of use. Other than the social pressure of “you’re not using MySQL or PostgreSQL???” it’s doing everything we need for now. We’re eyeing a potential move to PostgreSQL.

Data Warehouse

We’re slowly moving into Databricks as our data volume grows. I’m a huge Databricks fan and a big Snowflake fan, and the two partnering up is awesome. I expect to bring Databricks fully online very soon.

Extract, Transform, Load (ETL)

Until recently, I was almost exclusively using Python + SQL and SQLAlchemy for most ETL tasks. Someone I know pushed me to check out Airflow, and all of a sudden it’s becoming part of our stack. Scheduling workflows feels a little more natural in Airflow than stringing together cron jobs.
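
To illustrate why declared dependencies feel more natural than cron timing, here is a minimal standard-library sketch. It does not use Airflow: `graphlib` stands in for the scheduler, and an in-memory SQLite database stands in for SQL Server. The table name, sample rows, and task names are all made up for the example.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+


def run_pipeline():
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    state = {}

    def extract():
        # Hypothetical raw rows: (ticker, price as text).
        state["raw"] = [("AAA", "10.5"), ("BBB", "n/a"), ("CCC", "7")]

    def transform():
        # Keep only rows whose price parses as a number.
        clean = []
        for ticker, price in state["raw"]:
            try:
                clean.append((ticker, float(price)))
            except ValueError:
                continue
        state["clean"] = clean

    def load():
        conn.execute("CREATE TABLE prices (ticker TEXT, price REAL)")
        conn.executemany("INSERT INTO prices VALUES (?, ?)", state["clean"])
        conn.commit()

    tasks = {"extract": extract, "transform": transform, "load": load}
    # Each task maps to the set of tasks that must finish before it runs.
    dependencies = {
        "extract": set(),
        "transform": {"extract"},
        "load": {"transform"},
    }

    # Resolve a valid run order from the dependencies, then execute it.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()

    rows = conn.execute(
        "SELECT ticker, price FROM prices ORDER BY ticker"
    ).fetchall()
    conn.close()
    return order, rows
```

With cron, the same ordering would have to be encoded as start times plus a guess at how long each step takes. Here the run order falls out of the declared dependencies, which is the part of the workflow-scheduler model that this sketch tries to capture.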

Tracking & Source Control

We use GitHub. Shocker.

We use Notion, and I find myself using it for more than just project management and tracking. I also use it for personal accountability and as a technical diary. I’m able to keep track of what I do on a daily basis, make sure I’m allocating time efficiently, and track what everyone on the team is up to and where I can help out or make someone’s life easier.

Visualizations

We love Streamlit; it helps us demo models and API endpoints in very little time. I dig Plotly and Altair at the moment for their ease of use on non-public projects. Plotly gives us a bunch of flexibility and features without much effort. Altair gives us more in-depth features and customization for extra effort.

For business metrics and keeping track of revenue, churn, GA, etc., we love throwing stuff into Tableau. It’s an easy way for our non-technical folks to dig straight into the analytics.

You can check out our product here.

Let’s continue the conversation on Twitter!
