On GraphQL-to-SQL



GraphQL has a reputation for its N+1 problem which can often happen when implemented naively. This leads to a lot of us trying to solve the issue of data fetching with GraphQL in the most efficient way possible. Besides the popular Dataloader approach, another very common way of tackling this problem is by taking a GraphQL query, and coming up with the exact SQL needed to resolve it:

// example from Join Monster
{                           SELECT                  
  user(id: 1) {               "user"."id",          
    idEncoded                 "user"."first_name",  
    fullName        ==>       "user"."last_name",    
    email                     "user"."email_address"
  }                         FROM "accounts" AS "user"
}                           WHERE "user"."id" = 1

Looking at this query, the SQL on the right is just about the most efficient way of getting the data for it. This is a really interesting idea, and it's not surprising many people are aiming for something like this :point_down:

All these tools are slightly different, and the core problems they solve are not all the same. However, they all rely heavily on compiling a GraphQL AST into a SQL statement that will hopefully be more performant than the combined SQL statements a naive GraphQL implementation would execute. I see a lot of effort going into these tools, but they never really seem to solve the GraphQL data-fetching issues I see day-to-day.
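For contrast, the Dataloader approach mentioned earlier doesn't compile SQL ahead of time; it batches individual lookups at runtime. Here's a minimal sketch of that batching idea — this is not the real dataloader API, and the batch function here just stands in for a hypothetical `SELECT ... WHERE id IN (...)` query:

```javascript
// Collect every load(key) made during one tick of execution,
// then resolve them all with a single batched fetch.
function createLoader(batchFn) {
  let queue = [];
  return function load(key) {
    return new Promise((resolve) => {
      queue.push({ key, resolve });
      if (queue.length === 1) {
        // Flush on the next microtask, after all resolvers in
        // this tick have had a chance to enqueue their keys.
        queueMicrotask(async () => {
          const batch = queue;
          queue = [];
          const results = await batchFn(batch.map((item) => item.key));
          batch.forEach((item, i) => item.resolve(results[i]));
        });
      }
    });
  };
}

// Fake batch function: one "query" for all requested ids,
// standing in for SELECT * FROM users WHERE id IN (...)
const queries = [];
const loadUser = createLoader(async (ids) => {
  queries.push(ids);
  return ids.map((id) => ({ id, name: `user-${id}` }));
});

// Three resolvers ask for users independently...
Promise.all([loadUser(1), loadUser(2), loadUser(1)]).then(() => {
  // ...but only a single batched query was issued
  console.log(queries.length); // → 1
});
```

The point is that resolvers stay naive — each one just asks for the record it needs — and the batching layer turns N individual lookups into one query per tick.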

Some are tied to a particular style of development

Hasura and PostGraphile are really solid products, and are different from a plain GraphQL-to-SQL compiler in the sense that they aim to be a complete engine for building applications. In both of these tools the database drives a lot of the application logic, although Hasura allows you to "stitch" your own custom resolvers into the database-backed schema, and PostGraphile handles custom use cases with PostgreSQL functions.

Honestly, both of these tools are an amazing fit for getting an "instant GraphQL API", especially when you don't want to deal with maintaining your own GraphQL server. That's the tradeoff here. However, for existing server applications interested in offering a GraphQL interface, it's less of a sweet spot. Same thing with Super Graph. It takes an existing database, introspects it, and generates a full GraphQL schema for it, including sorting and ordering. Useful if you want to scaffold something quickly, but definitely not that useful for large existing codebases with a lot of logic outside of the database.

Coupling

Other tools like Join Monster and SqlMancer take a slightly different approach. They don't necessarily need to introspect an existing database, decoupling the GraphQL schema definition from the database schema definition.

Tools like Join Monster (and Prisma Tools to a certain degree) need to somehow look at a GraphQL query, and decide what to fetch from the database.

For Join Monster, a requirement is that our object types map to database tables:

const User = new GraphQLObjectType({
  name: 'User',
  sqlTable: 'accounts',
  uniqueKey: 'id',
  fields: () => ({ /*...*/ })
})

The biggest hurdle for these tools is going beyond a 1:1 mapping between GraphQL fields and SQL columns. For example, how would you handle a GraphQL field name, which actually relies on logic concatenating firstName and lastName? It would be very hard to make that happen magically, so Join Monster, for example, allows you to configure this behavior:

const User = new GraphQLObjectType({
  //...
  fields: () => ({
    fullName: {
      description: 'A user\'s first and last name',
      type: GraphQLString,
      // perhaps there is no 1-to-1 mapping of field to column
      // this field depends on multiple columns
      sqlDeps: [ 'first_name', 'last_name' ],
      resolve: user => `${user.first_name} ${user.last_name}`
    }
  })
})

The Join Monster API comes with quite a few constraints. Our types must map to database tables, and we must know the SQL dependencies of our fields. Imagine the field PullRequest.mergeable on GitHub's GraphQL API. We can imagine this field probably needs to load data coming from git, from various "checks" running on the pull request, and from pull request reviews. Not only is that already more than a single table, but it's likely to evolve as more data is needed over time. This makes it really hard to come up with a sqlDeps configuration that would not be brittle.

It's quite difficult for an API as complex as GitHub's, or really any large application, to provide the exact database requirements to fulfill a use case. Sometimes data is loaded deep within existing logic, wrapped in conditionals. Not only that, but this also doesn't take into consideration application caching, or fields that make calls to other data sources or other services. Don't get me wrong, this is also really challenging with a lazy loading approach. But at least we don't have to maintain SQL configuration at our interface layer, which does seem better to me.
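To make that concrete, here's what a hypothetical lazy resolver for PullRequest.mergeable could look like. Every service hanging off ctx here is made up — the point is that the data dependencies live in plain application code (and can branch, short-circuit, or hit a cache), not in schema-level SQL configuration:

```javascript
// Hypothetical resolver: each ctx service could be backed by a
// different table, a cache, or another service entirely.
const PullRequest = {
  mergeable: async (pr, _args, ctx) => {
    // may hit git directly, not the relational database at all
    if (await ctx.git.hasConflicts(pr.headSha, pr.baseSha)) return false;
    // may call out to a separate checks service
    const checks = await ctx.checks.forCommit(pr.headSha);
    if (checks.some((c) => c.required && c.state !== 'success')) return false;
    // may come from yet another store, ideally via a batched loader
    const reviews = await ctx.reviews.forPullRequest(pr.id);
    return reviews.every((r) => r.state !== 'CHANGES_REQUESTED');
  },
};
```

Note how the set of data sources can grow over time without any change to the schema layer — exactly the kind of evolution a static sqlDeps list struggles with.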

Beware of tools that make this seem too easy. Join Monster is really good at what it does, but as you can see it requires complex configuration to make things work outside the happy path. If a library doesn't have these configuration options and relies only on naming conventions, either it's an auto-generated GraphQL API from a database schema, or it's probably too naive to handle more complex cases (or maybe it has some AI to analyze your data dependencies, who knows :speak_no_evil:).

Is one large query always better?

Is generating one giant SQL statement always faster than a few smaller SQL statements? That's far from a new question, and it's obviously very hard to answer generically like this. However, we can certainly say it is not always the case. For questions like these, I always turn to High Performance MySQL. I don't want to spoil the book here, but it highlights many reasons why a giant SQL query is not always the better choice. A lot of applications can benefit from "join decomposition": splitting one multi-join query into several smaller queries and joining the results in the application. This potentially gives us a few advantages:

  • More potential for cache hits
  • Sometimes reduces lock contention
  • IN() queries can sometimes be faster than complex JOINs
  • And a lot more things that are explained in that book!
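To sketch what join decomposition looks like (the table names here are illustrative, in the spirit of the book's examples), one multi-join query becomes a few smaller ones whose results are joined in the application:

```sql
-- One large query:
SELECT * FROM posts
JOIN tags_posts ON tags_posts.post_id = posts.id
JOIN tags ON tags.id = tags_posts.tag_id
WHERE tags.name = 'graphql';

-- Decomposed, with the join done in the application:
SELECT id FROM tags WHERE name = 'graphql';
SELECT post_id FROM tags_posts WHERE tag_id = 1234;
SELECT * FROM posts WHERE id IN (1, 3, 5);
```

Each decomposed query is cache-friendly on its own — a cached tag lookup can be reused across many requests — and the final lookup hits the primary key directly instead of going through a join.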

It's probably possible for some GraphQL-to-SQL tools to be smart enough to know how to do this, but it definitely is more of a black box than optimizing your dataloaders. All that to say that database performance is not always as simple as making one large SQL statement.

Data fetching remains something to improve

While I would not suggest most of these tools for those who have existing and complex codebases, I do think there is room for a lot of improvement in GraphQL execution. While GraphQL is datastore agnostic, the nature of its execution does impose certain constraints on how we structure application logic. Ideally, things would Just Work™. Prisma client is exciting on that side of things, if it keeps getting better and better. Maybe another solution we'll see is some kind of proxy à la ProxySQL, where the application can use the database without worrying about GraphQL execution, and the proxy makes sense of the incoming queries and turns them into performant ones.

If you're interested in bootstrapping a new API and you're comfortable with using the database server as the centerpiece, I can only recommend Hasura and PostGraphile; they're both great. For existing codebases, and especially for large/complex ones, I'm sticking with my recommendation of a Dataloader/lazy/asynchronous loading approach for GraphQL APIs, over tools that aim for ahead-of-time SQL generation.

If you've used some of these tools on existing complex codebases, I'd love to hear from you.

