Our AI/ML Startup’s Tech Stack

Category: IT Technology · Published: 4 years ago

Some insight into how we’re building our technology.

Mar 23 · 4 min read

Photo Creds: Unsplash

Here’s some transparency into the technical decisions that have driven our technology, along with some insight into what our stack has made easy and what our tech choices have made hard.

Hopefully, this gives you an idea of some of the relevant technologies in our field. From what I hear talking to fellow startup founders, this stack is pretty similar to what a lot of other machine-learning-focused data teams use, with some variation by industry and personal circumstance.

Overview of our stack:

Spawner API :

  • Languages: Python, C++, SQL
  • AI/ML: TensorFlow (our toolkit of choice for DL), Scikit-Learn (our go-to for most non-DL tasks)
  • Other Libraries: Pandas, NumPy, fbprophet, NLTK, SciPy, ffn, pyodbc, APScheduler
  • Database: SQL Server (eyeing a move to PostgreSQL)
  • Warehouse: N/A for now (moving into Databricks)
  • ETL: Python, Airflow
  • Visualizations: Streamlit, Plotly (visualizing app performance), Altair (viz and dashboarding for new ideas), Tableau (internal business intelligence)
  • Hosting: Azure (core), Heroku (side projects & demos)
  • Tracking & SC: GitHub, Notion (keeping engineering, PMs and marketing synced up)

Spawner Portal:

  • Languages & Frameworks: (FE) React + Next.js, (BE) Python
  • Database: SQL Server
  • Hosting: Azure

Languages

We use Python for basically everything. When a piece of code isn’t serving efficiently, or wasn’t built well enough to keep up, we think about converting it to C++, with the Python version serving merely as a reference implementation. We use Python most heavily for our modeling and ETL. We’re very much a data company, so of course there’s SQL everywhere.
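
One way a Python reference implementation can earn its keep during a port is as the yardstick the faster version is checked against. The sketch below is only an illustration under stated assumptions, not our production code: the function names are hypothetical, the moving average is an arbitrary example workload, and the "fast" version is plain Python standing in for a C++ port exposed through a Python binding.

```python
import math
import random


def moving_average_reference(values, window):
    """Slow but obviously correct: re-sum every window from scratch."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def moving_average_fast(values, window):
    """Candidate replacement: running sum, O(n) instead of O(n * window).

    Hypothetical stand-in for an optimized port (e.g. C++ behind a binding).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


def agrees_with_reference(candidate, reference, trials=200, seed=0):
    """Compare candidate against reference on seeded random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.uniform(-100.0, 100.0) for _ in range(rng.randint(0, 50))]
        window = rng.randint(1, 10)
        expected = reference(values, window)
        actual = candidate(values, window)
        if len(expected) != len(actual):
            return False
        # Floating-point sums differ slightly, so compare with a tolerance.
        if not all(math.isclose(a, e, rel_tol=1e-9, abs_tol=1e-9)
                   for a, e in zip(actual, expected)):
            return False
    return True
```

Calling `agrees_with_reference(moving_average_fast, moving_average_reference)` returns `True` when the two versions match on every generated input. Seeding the random generator keeps any failure reproducible.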

AI/ML

We like TensorFlow for its great documentation and the large number of devs who are familiar with it. That said, PyTorch is starting to make some real headway, especially with all the great work Facebook Research has done recently. For now, TensorFlow makes up the majority of our stack, but I see nothing wrong with TF and PyTorch mingling in the future.

Scikit-Learn pops up all over the place. Its ease of use is undeniable, and it’s in production at companies all over the industry. It’s really the bread and butter of the non-deep-learning work we do on the ML side.

Frontends & Frameworks

Quite frankly, I’m not a frontend dev, so I won’t waste any of your time on this section. Our first hire liked Vue.js, so we went that direction originally. He thought React/Next.js made more sense for another part of the codebase, so that’s how that happened. We’re incredibly pleased with the work our devs have done. We love Next.js for its SEO friendliness.

Database

Our stack lives on Azure, so SQL Server seemed to make the most sense up front. The two are tightly integrated, which helps on both cost and ease of use. Other than the social pressure of “you’re not using MySQL or PostgreSQL???”, it’s doing everything we need for now. We’re eyeing a potential move to PostgreSQL.

Data Warehouse

We’re slowly moving into Databricks as the volume of data grows. I’m a huge Databricks fan and also a big Snowflake fan. The two partnering up is awesome. Databricks is spectacular, and I expect to bring it fully online very soon.

Extract, Transform, Load (ETL)

Until recently, I was using Python + SQL and SQLAlchemy almost exclusively for ETL tasks. Someone I know pushed me to check out Airflow, and all of a sudden it’s becoming part of our stack. Scheduling workflows feels a little more natural using Airflow than stringing together cron jobs.
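
To make the contrast with cron a bit more concrete: with cron, ordering between jobs is usually implied by start times, while a workflow scheduler lets each task declare what it depends on and works out the order itself. The sketch below illustrates only that idea, using the Python standard library (`graphlib`, Python 3.9+). It does not use Airflow’s API, and the `run_pipeline` helper, the task names, and the toy data are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task only after all of the tasks it depends on.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    Returns the order in which the tasks ran.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    order = []
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical ETL steps sharing an in-memory "staging area".
staging = {}


def extract():
    staging["raw"] = [" 3", "4 ", "five", "6"]


def transform():
    # Keep only values that are clean integers once whitespace is stripped.
    staging["clean"] = [int(x) for x in staging["raw"] if x.strip().isdigit()]


def load():
    staging["total"] = sum(staging["clean"])


# Tasks are registered in a scrambled order on purpose; the declared
# dependencies, not the registration order, decide what runs first.
order = run_pipeline(
    tasks={"load": load, "transform": transform, "extract": extract},
    dependencies={"transform": {"extract"}, "load": {"transform"}},
)
print(order)             # ['extract', 'transform', 'load']
print(staging["total"])  # 13
```

A real scheduler adds the parts this sketch leaves out, such as retries, run history, and time-based triggers.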

Tracking & Source Control

We use GitHub. Shocker.

We use Notion, and I find myself using it for more than just project management and tracking. I use it for personal accountability and, really, as a technical diary. I’m able to keep track of what I do on a daily basis, make sure I’m allocating time efficiently, and see what everyone on the team is up to and where I can help out or make someone’s life easier.

Visualizations

We love Streamlit; it helps us demo models and API endpoints in very little time. I dig Plotly and Altair at the moment for their ease of use on non-public projects. Plotly gives us a bunch of flexibility and features without much effort. Altair gives us more in-depth features and customization for extra effort.

For business metrics and keeping track of revenue, churn, GA, etc., we love throwing stuff into Tableau. It’s an easy way for our non-technical folks to dig straight into the analytics.

You can check out our product here.

Let’s continue the conversation on Twitter!

