GraphQL vs. REST


A comparison for fetching data from GitHub's GraphQL and REST APIs

Posted 2 months ago

Introduction

This past summer I interned in San Francisco as a software engineering intern at a company called Doximity, a medical-based tech startup focused on doctors and clinicians. In this article, I will share an experience I had at the company by detailing an engineering decision I made, where I designed an internal tool based on an analysis of the GraphQL and REST APIs for GitHub.

Technical Background

Before diving into the problem statement and practical examples, I'd like to take some time to go over some technical terminology that will be used throughout the article.

CRUD


Create, read, update, and delete, commonly known by the acronym CRUD, is a set of actions you may perform on a resource in persistent storage applications.

HTTP verbs, which are often used to build RESTful endpoints, may be expressed by the CRUD acronym as follows:

C:   PUT/POST         # Creates a resource (if the resource does not exist)
R:   GET              # Fetches an existing resource
U:   PUT/POST/PATCH   # Updates an existing resource
D:   DELETE           # Deletes an existing resource

Figure 1: HTTP methods, and the CRUD actions which they map to

For problem scope and brevity's sake, I will only be focusing on the R in CRUD (the READ action), and how it pertained to a solution I devised at my last internship.

REST


"REST (Representational State Transfer) is an application program interface (API) that uses HTTP requests to GET, PUT, POST and DELETE data." [1]

REST is the most common way to build CRUD APIs, and it is most often implemented over HTTP. Other styles of web API exist, such as the Simple Object Access Protocol (SOAP), but for the purpose of this article, I will focus only on RESTful endpoint architectures that use HTTP verbs.

Regarding CRUD, the READ action is equivalent to the GET HTTP verb for a particular Uniform Resource Identifier (URI) (see figure 1). Making a GET request for a particular endpoint will return a representation of a resource (or list of resources), which is commonly in the form of some JavaScript Object Notation (JSON) schema.

Example Request-Response Cycle

Request:

GET /users                 # Perform GET request on /users URI
Accept: application/json   # Asks the server to respond in JSON format

Response:

200 OK
Content-Type: application/json

{
  "users": [
    {
      "id": 1,
      "first_name": "Andrew"
    },
    {
      "id": 2,
      "first_name": "Robert"
    }
  ]
}

Figure 2: example of a GET request-response cycle for the /users endpoint

In this example, the URI /users corresponds to the User resource. When a request is made, the endpoint returns an array of JSON objects with the User object schema.
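
As a minimal sketch of the cycle in figure 2, the Python below builds the GET request without sending it, then parses the example response body. The host api.example.com is a placeholder I made up for illustration; only the /users path and the JSON body come from the example above.

```python
import json
from urllib.request import Request

# Hypothetical host: only the /users path comes from the example above.
request = Request(
    "https://api.example.com/users",
    headers={"Accept": "application/json"},  # ask for a JSON representation
    method="GET",
)

# Response body copied from figure 2; a real client would read it from the network.
body = '{"users": [{"id": 1, "first_name": "Andrew"}, {"id": 2, "first_name": "Robert"}]}'
users = json.loads(body)["users"]
first_names = [user["first_name"] for user in users]
```

Nothing is sent over the network here; passing the request to urllib.request.urlopen would perform the actual call.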

GraphQL


"GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data." [2]

GraphQL was developed by Facebook in 2012 as an alternative to conventional REST endpoints for fetching and mutating data on a web server. Like REST, it supports create, read, update, and delete (CRUD) actions for persistent storage applications. It was designed as a specification, not an implementation, with a reference implementation provided by Facebook and written in JavaScript (Node.js).

By definition, GraphQL is a query language. To achieve the READ CRUD action functionality in GraphQL, a user may perform a query. A GraphQL query, in its most rudimentary form, is a way to ask a given object for its fields.

{
  user {
    id     # 'id' returns an 'Int' type
    name   # 'name' returns a 'String' type
  }
}

Figure 3: rudimentary query asking to fetch a user's id and name

Figure 4: an example of a GraphQL query and response in GitHub's GraphQL (v4) API

Since being open-sourced in 2015, GraphQL has gained popularity, and now has implementations in over 10 languages, including the popular programming languages Elixir, Java, Python, Clojure, and Ruby.

import sangria.schema._
import sangria.execution._
import sangria.macros._

// Executor.execute returns a Future, which needs an ExecutionContext in scope
import scala.concurrent.ExecutionContext.Implicits.global

// Defining a potential query type with a value for the field "hello"
val QueryType = ObjectType("Query", fields[Unit, Unit](
  Field("hello", StringType, resolve = _ => "Hello world!")
))

// Creating a GraphQL schema for the given query type
val schema = Schema(QueryType)

// Executing a query on the schema with the given potential query type
val query = graphql"{ hello }"
Executor.execute(schema, query) map println

Figure 5: an example usage of Sangria, a GraphQL library written in Scala

GraphQL is gaining traction in the developer community due to its ease of use and its performance benefits, such as reducing the number of requests required to fetch the desired data. Several larger tech companies such as GitHub, Shopify, and Pinterest have adopted GraphQL into their stacks and have started to write versions of their APIs against the GraphQL specification to take advantage of these benefits.

Problem Statement

For one of my projects at Doximity, I was assigned a task to investigate and subsequently build a tool to help recruiters research potential software engineering candidates using GitHub as a means of finding them.

Business Requirements


Initial Specifications

The initial specifications for the tool were as follows:

  • Specification 1: A recruiter at Doximity should be able to find software engineers to reach out to for potential engineering outsourcing

  • Specification 2: The said recruiter should be able to search candidates based on location

  • Specification 3: The said recruiter should be able to search candidates based on skills (e.g., rails, docker, neo4j)

Data to collect from candidates

In total, there were seven requirements for the data collected about a candidate:

  • Requirement 1: Name

  • Requirement 2: Count of projects in each language

  • Requirement 3: Email

  • Requirement 4: List of organizations they belong to

  • Requirement 5: Number of followers

  • Requirement 6: If they are looking for a job

  • Requirement 7: Location

Research


I initially started researching ways I could leverage GitHub's REST API (v3) to find this information. However, I noticed a new API for GitHub in alpha stage utilizing the GraphQL specification (v4). I was curious about the benefits attained from using GraphQL over conventional REST architecture, and so I decided to compare the two in the context of my use case.

Comparing REST and GraphQL

When investigating the differences between the APIs, I noticed some shortcomings with the REST implementation and some advantages of the GraphQL one when leveraging the GitHub API for my use case.

Shortcomings of REST


Shortcoming: Numerous requests needed to fetch desired data

In our situation, multiple calls were necessary to retrieve all of the data required by our recruiters. First, an initial request was made with a query string composed of a list of locations for potential candidates, combined with a list of languages said candidates had used in their repositories. In GitHub's search syntax, qualifiers are separated by + (a URL-encoded space), and a result must match all of them.

GET: "https://api.github.com/search/users?q=location:san-francisco+language:ruby"

Figure 6: example of the initial GET request, with a specified location and language

This example would return an array of User objects comprised of the following JSON object schema:

{
  "login": "mojombo",
  "id": 1,
  "avatar_url": "https://avatars0.githubusercontent.com/u/1?v=4",
  "gravatar_id": "",
  "url": "https://api.github.com/users/mojombo",
  "html_url": "https://github.com/mojombo",
  "followers_url": "https://api.github.com/users/mojombo/followers",
  "following_url": "https://api.github.com/users/mojombo/following{/other_user}",
  "gists_url": "https://api.github.com/users/mojombo/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/mojombo/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/mojombo/subscriptions",
  "organizations_url": "https://api.github.com/users/mojombo/orgs",
  "repos_url": "https://api.github.com/users/mojombo/repos",
  "events_url": "https://api.github.com/users/mojombo/events{/privacy}",
  "received_events_url": "https://api.github.com/users/mojombo/received_events",
  "type": "User",
  "site_admin": false,
  "score": 1.0
}

Figure 7: JSON object schema for GitHub v3 REST API's search/users endpoint

As you can see from the response schema, none of the required information in the business requirements is present in this response. However, some other API URLs exist, which can be queried to find the desired information.

Endpoints for api.github.com:

GET: "search/users?q=location:#{locations}+language:#{languages}" # fetches array of users including urls and 'user_name'
GET: "search/code?q=#{skill}+user:#{user_name}"                   # search for a particular skill or key term
GET: "users/#{user_name}"                                         # name, email, location, hireable, number of followers
GET: "users/#{user_name}/orgs"                                    # organizations they belong to
GET: "users/#{user_name}/repos"                                   # gets variable 'repo_name'
GET: "repos/#{user_name}/#{repo_name}/languages"                  # gets languages used in each repository

Figure 8: all the endpoints required in a tentative RESTful solution, using GitHub's v3 REST API

In total, to fetch all the desired information using the traditional REST architecture, we had to make one request to get $n$ candidates, and then $3 + x + R_m$ calls for the $m$-th candidate to pull all the required data: three fixed calls (profile, organizations, and repository list), one code search per skill, and one languages call per repository.

let $n$ be the total number of users fetched in one query

let $x$ be the number of skills you wish to check for a given candidate

let $R_m$ be the number of repositories the $m$-th candidate has

Formula for number of requests: $1 + \sum_{m=1}^n (3 + x + R_m)$
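
To make the count concrete, here is a small sketch that adds up the per-candidate cost described above (three fixed calls, one per skill, one per repository) on top of the single search request. The function name and the sample numbers are my own assumptions for illustration, not figures from the project.

```python
def rest_request_count(skills: int, repos_per_candidate: list[int]) -> int:
    """One search request, then 3 + x + R_m requests for each candidate m."""
    per_candidate = sum(3 + skills + repos for repos in repos_per_candidate)
    return 1 + per_candidate

# Hypothetical workload: 3 candidates with 10, 25 and 4 repositories, 2 skills each.
total = rest_request_count(skills=2, repos_per_candidate=[10, 25, 4])
print(total)  # 1 + (15 + 30 + 9) = 55
```

The count grows with the number of repositories each candidate owns, which is the term that dominates for prolific users.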

Shortcoming: Excess response information

Moreover, a significant shortcoming I found with RESTful requests arose when I only wanted one field of data from a request but the endpoint returned many by default, sending more data over the network than was required. This shortcoming will be covered in depth when highlighting the advantages of GraphQL in this regard.

Advantages of GraphQL


Advantage: Inline fragments

GraphQL lets you benefit from a programming concept known as union types. In short, a union type in the context of GraphQL allows you to specify the set of concrete types a value may take. When a field returns an abstract type, you may fragment on any of its concrete types. This concept may be illustrated through an example with an abstract base class and derived concrete classes.

Consider the following class hierarchy example with abstract base class Animal, and derived concrete classes Dog, Cat, and Bird:

Figure 9: UML of simple class hierarchy with an abstract base class and three derived child classes

In the query example below, pet returns an abstract type Animal, which may have any of the concrete types Dog, Cat, or Bird. For the sake of argument, say I want to use a GraphQL API to get the common fields of every pet, plus the type-specific fields of Dog and Bird, but not those of Cat. I could fragment on the concrete types as follows:

query PetsInStore($store: Store!) {
  pet(store: $store) {
    age
    name
    ... on Dog {
      fur_colour
    }
    ... on Bird {
      feather_colour
    }
  }
}

Figure 10: An example of fragmenting on two concrete types for a union type comprised of derived types of an abstract type [3]

This query would return an array of objects of abstract type Animal. Every object carries the common fields age and name; objects of concrete type Dog additionally carry fur_colour, and objects of concrete type Bird additionally carry feather_colour. Inline fragments select fields rather than filter results, so objects of type Cat are still returned, with the common fields only.
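
The way inline fragments pick fields by concrete type can be imitated in a few lines of plain Python. This is a hand-rolled sketch, not a GraphQL library, and the pets are invented sample data. As I understand the GraphQL specification, a fragment only contributes fields when the object's runtime type matches, and objects of other types still appear with the fields selected outside any fragment.

```python
# Fields selected on every Animal, plus fields selected per concrete type.
COMMON_FIELDS = ["age", "name"]
FRAGMENTS = {"Dog": ["fur_colour"], "Bird": ["feather_colour"]}

def resolve_pet(pet: dict) -> dict:
    """Return the common fields plus those of the fragment matching the pet's type."""
    fields = COMMON_FIELDS + FRAGMENTS.get(pet["__typename"], [])
    return {field: pet[field] for field in fields}

# Hypothetical store contents.
pets = [
    {"__typename": "Dog", "age": 3, "name": "Rex", "fur_colour": "brown"},
    {"__typename": "Cat", "age": 5, "name": "Tom", "fur_colour": "grey"},
    {"__typename": "Bird", "age": 1, "name": "Kiwi", "feather_colour": "green"},
]
result = [resolve_pet(pet) for pet in pets]
```

In this sketch the Cat is still present in result, but only with age and name, because no fragment matches its type.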

Advantage: Tell GraphQL what you need

In conventional REST architectures, you often end up fetching more data than is desired for your use case. Say you wanted to fetch a list of user ids from a given endpoint with the following schema:

{
  "users": [
    {
      "id": 1,
      "first_name": "Andrew",
      "last_name": "McBurney",
      "age": 20,
      "citizenship": "Canadian",
      "birthday": "October 13th, 1996",
      "hometown": "Niagara Falls, Ontario"
    },
    ...
  ]
}

Figure 11: REST response for /users endpoint on an arbitrary API

In this example, the only data you're concerned about is the id field. However, when you perform a GET request on the /users endpoint, you end up fetching a list of objects of type User, with the fields id, first_name, last_name, age, citizenship, birthday, and hometown.

In GraphQL, you can avoid this simply by specifying the fields of interest to you. The above example may be simplified as follows:

query {
  users {
    id
  }
}

Figure 12: GraphQL solution for business requirements entailing only user id field

This query will return a JSON response of the following form:

{
  "data": {
    "users": [
      {
        "id": 1
      },
      {
        "id": 2
      },
      ...
    ]
  }
}

Figure 13: GraphQL response for objects of type User, limiting the fields to id only

In GraphQL, you simply tell the server what data you want through a query, and it returns only that information. The advantage over REST, in this case, is that less data is sent over the network, since you're only fetching the data you care about.

In summary:

REST    # potential for extra data sent over the network (fixed response)
GraphQL # tell it what you want, restrict data to what you need (dynamic response)
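
The difference between a fixed and a dynamic response can be sketched by trimming a full REST-style payload down to a selection, roughly the way a GraphQL server only resolves the fields a query names. The select helper and the sample users below are hypothetical; a real GraphQL server does considerably more (type checking, arguments, resolvers).

```python
def select(value, selection: dict):
    """Keep only the fields named in `selection`, recursing into objects and lists."""
    if isinstance(value, list):
        return [select(item, selection) for item in value]
    return {
        field: select(value[field], sub) if sub else value[field]
        for field, sub in selection.items()
    }

# Hypothetical full payload, as a fixed REST response might return it.
full_response = {"users": [
    {"id": 1, "first_name": "Andrew", "age": 20},
    {"id": 2, "first_name": "Robert", "age": 31},
]}

# Mirrors the query `{ users { id } }`; an empty dict marks a leaf field.
trimmed = select(full_response, {"users": {"id": {}}})
```

Here trimmed keeps only the id of each user, while the REST-style payload carries every field whether or not the client needs it.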

Advantage: One endpoint

Furthermore, there's no concept of multiple endpoints in GraphQL as there is with the conventional REST architecture. You can access all the information you require from an API by nesting the fields you need in a single query (fragmenting on union types where necessary), rather than making multiple requests to attain the desired results.

Figure 14: in GraphQL, you can get the same information in fewer requests, from one endpoint rather than multiple like in REST architecture

Design Decision

Constraints and Limitations


Rate Limits

GitHub uses rate limits to restrict the number of requests a user may make to their API. Rate limits are commonly used as security measures to prevent malicious scraping of information.

For the v3 REST API, GitHub had a rate limit of 30 authenticated requests per minute [4] and 10 unauthenticated requests per minute [4] for their search API api.github.com/search.

For the v4 GraphQL API and the other v3 REST endpoints, the total number of authenticated requests you're allowed to make in an hour is 5000 (approximately 83 requests per minute).

Due to this limitation, the information had to be fetched in the fewest requests possible to avoid being throttled for exceeding the API's rate limit. This was an important factor in the design of the tool I implemented.
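
A rough back-of-the-envelope sketch shows why the request count matters under these limits. It uses the two authenticated limits quoted above and a deliberately simplified model (requests spread evenly, no retries), so the result is a lower bound on elapsed time. The workload of 100 candidates and 3 skills is a made-up example.

```python
import math

SEARCH_LIMIT_PER_MINUTE = 30  # authenticated v3 search requests per minute
CORE_LIMIT_PER_HOUR = 5000    # authenticated v4 and other v3 requests per hour

def minutes_for_search(requests: int) -> int:
    """Whole minutes needed to issue `requests` search calls within the limit."""
    return math.ceil(requests / SEARCH_LIMIT_PER_MINUTE)

# Hypothetical workload: 100 candidates, 3 skills each -> 300 code searches.
search_minutes = minutes_for_search(100 * 3)

# The hourly limit expressed per minute, rounded down.
core_per_minute = CORE_LIMIT_PER_HOUR // 60
```

Under these assumptions the code searches alone take at least ten minutes, which is why every request saved elsewhere helps.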

Compatibility Issues

Unfortunately, GitHub's v4 GraphQL API is still in its alpha stage [5], and does not yet cover everything in the v3 REST API. A limitation I encountered concerned the code search feature of the v3 endpoint: there was no viable alternative to this endpoint in the GraphQL API at the time I implemented the solution.

Figure 15: code search issue opened up on GitHub's GraphQL API discussion forums

Final Decision


As was previously mentioned, when I was researching the GraphQL (v4) and REST (v3) APIs, I found two shortcomings with the REST implementation: numerous requests were needed to pull all the desired data, and excess response information was returned from the server. The GraphQL API mitigated these two shortcomings with the REST architecture for this use case. It reduced the problem I had with many requests, by enabling me to write a query that encompassed what would have been multiple requests to the REST API. Furthermore, it allowed me to specify only the fields relevant to my business logic, thus removing data I wasn't concerned with from the server response.

Because the v4 GraphQL API offered no equivalent of the v3 REST search/code endpoint (api.github.com/search/code), I was tempted to implement the entire solution using the v3 REST API. However, I didn't want to lose out on the benefits GraphQL provides, and ended up creating a hybrid solution which leveraged both the v3 REST API and the v4 GraphQL one.

Hybrid Solution

I ended up using the v3 REST search API to search the code for specific keywords, or skills in files. An example was searching for the term rails in the file Gemfile, to see if the candidate had rails projects on their GitHub.

GET: "https://api.github.com/search/code?q=rails+filename:Gemfile+user:AndrewMcBurney"

Figure 16: example using v3 REST API search/code endpoint to search for the keyword rails in files named Gemfile for user AndrewMcBurney
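
Building this URL by hand is error-prone because the spaces between search qualifiers must be encoded. The sketch below assembles the query from its parts with Python's standard library. The helper name is hypothetical, and the filename: and user: qualifiers reflect my understanding of GitHub's code search syntax.

```python
from urllib.parse import urlencode

def code_search_url(term: str, filename: str, user: str) -> str:
    """Build a v3 search/code URL for `term` in files named `filename` owned by `user`."""
    query = f"{term} filename:{filename} user:{user}"
    # safe=":" keeps the qualifier colons readable; spaces become "+".
    return "https://api.github.com/search/code?" + urlencode({"q": query}, safe=":")

url = code_search_url("rails", "Gemfile", "AndrewMcBurney")
```

One such URL is needed per skill and per candidate, which is where the $x$ term in the request count comes from.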

For all other business requirements, I used the following GraphQL query from the GitHub v4 API:


query($query_string: String!, $cursor: String, $m: Int!) {
  search(query: $query_string, type: USER, first: $m, after: $cursor) {
    userCount
    edges {
      cursor
      node {
        # Fragmenting on concrete class User
        ... on User {
          # Requirement 1: Name
          name

          # Requirement 3: Email
          email

          # Requirement 6: If they are looking for a job
          isHireable

          # Requirement 7: Location
          location

          # Requirement 5: Number of followers
          followers {
            totalCount
          }

          # Requirement 4: List of organizations they belong to
          organizations(first: 10) {
            nodes {
              name
            }
          }

          # Requirement 2: Count of projects in each language
          repositories(first: 100, orderBy: { field: PUSHED_AT, direction: DESC }) {
            nodes {
              languages(first: 10, orderBy: { field: SIZE, direction: DESC }) {
                nodes {
                  name
                }
              }
            }
          }
        }
      }
    }
  }
}

Figure 17: GraphQL query to fetch all required data (except code searching)
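
The query returns the languages of each repository, so Requirement 2 (count of projects in each language) still has to be computed on the client. A minimal sketch of that step follows, assuming a user node shaped like the repositories selection in the query. The function name and the sample node are invented for illustration.

```python
from collections import Counter

def count_languages(user_node: dict) -> Counter:
    """Count how many of a user's repositories use each language."""
    counts = Counter()
    for repo in user_node["repositories"]["nodes"]:
        # Each language is counted once per repository it appears in.
        counts.update(language["name"] for language in repo["languages"]["nodes"])
    return counts

# Hypothetical node, shaped like one `node` of the search response.
sample_node = {"repositories": {"nodes": [
    {"languages": {"nodes": [{"name": "Ruby"}, {"name": "JavaScript"}]}},
    {"languages": {"nodes": [{"name": "Ruby"}]}},
]}}
language_counts = count_languages(sample_node)
```

Because the query caps repositories at 100 and languages at 10 per repository, the counts cover at most the 100 most recently pushed repositories.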

While this query is certainly more complicated than the tentative REST solution in figure 8, it provides several advantages, such as reducing the number of requests needed to fetch all desired data and limiting the response to only contain the fields relevant to my business requirements.

let $m$ be the total number of users fetched in one query

let $x$ be the number of skills you wish to check for a given candidate

Formula for number of requests: $1 + mx$
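
As a worked comparison, take the purely hypothetical values of 30 candidates per query ($n = m = 30$), $x = 3$ skills, and, for simplicity, $R = 20$ repositories for every candidate. The REST-only count uses the per-candidate cost of $3 + x + R$ calls described earlier:

```latex
\begin{align*}
\text{REST only:} \quad & 1 + n\,(3 + x + R) = 1 + 30\,(3 + 3 + 20) = 781 \\
\text{Hybrid:}    \quad & 1 + m x = 1 + 30 \cdot 3 = 91
\end{align*}
```

Under these assumptions the hybrid approach needs roughly one eighth of the requests, and the gap widens as candidates own more repositories.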

Conclusion

Lessons Learned


Looking back, there were plenty of lessons to be learned from the design decisions I made. Here are some of the lessons I'd like to share:

GraphQL isn't a silver bullet, and won't take over API design overnight

There are plenty of legacy REST APIs in production today, and only a handful of companies using GraphQL to design their new APIs. It will take a long time before GraphQL gains enough traction as a query language for it to take up a large portion of the market.

Tools in alpha stage are volatile and not always feature complete

As was discussed in the limitations section, the GitHub v4 API does not yet cover every feature of the v3 REST API, so the new API is not feature complete with regard to its predecessor. More generally, a GraphQL API may not be compatible with its legacy counterpart (e.g., GitHub v3 to v4 compatibility).

Furthermore, GitHub has warned users that API features are volatile and subject to change before the beta release. Thus, breaking changes are a possibility, which would lead to further maintenance for developers using the API.

The chances of GitHub modifying the fields I used from the objects I searched are very small, but there is still a possibility that things will change before the final release of the v4 API.

In many situations, developer time is more valuable than your program runtime

While the hybrid solution I implemented was faster and used fewer requests than a solution built solely on the v3 REST API, it added extra complexity. Given the volatile nature of the API, perhaps it wasn't the best decision to implement the tool partially using the v4 GraphQL API.

After completing the project, I found myself asking questions such as "Was the benefit gained from using part of the GraphQL API worth the developer time spent to implement both GraphQL and REST clients, and the added complexity of scraping from two distinct APIs?"

Final Words


GraphQL is on the rise, and it's exciting to see how the developer community grows and responds to the new technology — as it is a relatively young query language.

If you're interested in learning more about GraphQL, and have a GitHub account, I would recommend playing around with the GitHub GraphQL API explorer. It's a great tool if you're interested in learning more about the query language, and the GitHub API itself.

As always, thank you for reading my article, and have a fantastic day!


— Andrew Robert

References

[1] http://searchcloudstorage.techtarget.com/definition/RESTful-API

[2] http://graphql.org/

[3] http://graphql.org/learn/queries/#inline-fragments

[4] https://developer.github.com/v3/search/#rate-limit

[5] https://developer.github.com/v4/

[6] https://medium.com/chute-engineering/graphql-in-the-age-of-rest-apis-b10f2bf09bba

GraphQL logos are trademarked by Facebook — all rights reserved.