After 20 years building software, the most important principle I’ve learned is: don’t repeat yourself (DRY).
Unfortunately, there’s no standard way to avoid repeating type information multiple times as data gets passed through a modern data stack. Here are is what that stack might look like from start to finish, along with how types are represented at each layer:
This means that in a typical setup, there may be five copies of your data type schema, all of which you need to maintain separately since there may be small differences.
Copying of course takes more time, but the real impact is far greater.
If you’re an experienced software engineer, then you know what happens next: if developers can make mistakes, they will make them. What’s worse, it can be hard to tell the difference between a mistake and an intentional change.
The most common problem we have seen at minware is types becoming more permissive unnecessarily (e.g., allowing them to be null even if they are never null). This can be a headache to trace back and figure out how to handle on the front end.
While more permissive types don’t directly cause bugs, they cause developers to start adding non-null type assertions or any
types on the front end. This effectively bypasses the type system, which does cause bugs when developers inevitably make mistakes or the back end changes.
And, if you think developers are going to keep documentation comments up-to-date everywhere, you obviously have never worked with a developer.
At minware, this has been wasting a lot of time and causing us a lot of anxiety. Here we explain how to fix it.
##Why Direct Translation from Typescript to GraphQL is Hard
Since this is a common problem, you would think that there are a lot of easy solutions out there already.
After a lot of searching, I began to understand why directly translating types is hard, at least out of Typescript.
The problem with Typescript is that the typing information is not actually available when the code runs.
This means that you cannot write code that will directly translate the type of a Typescript object into, let’s say, a GraphQL Nexus type definition, because the Typescript object does not exist at runtime.
If you want to do that, then you have to use a code generator to actually parse the Typescript type definitions and write GraphQL definitions to a file.
If you have experience with this type of thing, then you understand why other people aren’t doing it. Aside from the clunkiness of having to run the code generator before running the code itself, maintaining a robust translator from Typescript to GraphQL would be a lot of work.
After concluding that type extraction from Typescript was a bad idea, the next thought was to translate from database types into X, and then from X into both Typescript and GraphQL independently. The only question is: what is the best choice for X?
Since the first step in the stack (reading from an external API) uses JSON schema as an input, and it is widely supported, this seemed like the way to go.
With this approach, you only need to get JSON schema files from somewhere, drop them into your application, and then automatically produce both your Typescript and GraphQL definitions.
We won’t cover how to obtain a JSON schema file for your database/data warehouse schema here, but there are tools to produce a JSON schema file from SQL statements. dbt’s output catalog isn’t in JSON schema format, but writing a translator is straightforward since it is already in a JSON format.
Once you have a source JSON schema in place, the next question is how to translate it into Typescript.
For this there are two popular packages, each of which takes a different approach. First, json-schema-to-typescript uses code generation to read a JSON schema file and emit a corresponding Typescript file.
The nice thing about this approach is that it produces doc strings on the type definitions using the descriptions from the JSON schema.
The downside is that any transformations you want to make from one JSON schema to another (e.g., going from database to application types by filling in object references) will also have to use code generation. Having multiple layers of code generation tends to make development slower and more error prone.
The other solution, json-schema-to-ts, works purely in typescript by taking a constant JSON schema definition and converting it directly into a type.
This gets around the translation problem because you can just define a new JSON schema that extends an earlier one, and then translate that into Typescript as well.
The main limitation of this is that the base JSON schema definition has to be in a Typescript file, not a JSON file (importing JSON as a typed value in Typescript is not yet supported). However, this didn’t seem like a big deal because we had to generate the original JSON schema anyway – this just necessitated wrapping const schema = ... as const;
around it and putting it in a .ts file.
There was not an existing package that we could find for translating the JSON schema into GraphQL object definitions, but it was fairly easy to create one.
We are using GraphQL Nexus, which allows you to define your GraphQL objects using functions.
So, we imported all of the JSON schema objects and produced object and union definitions dynamically using Nexus, including descriptions. We were able to define scalars (like dates) using the “format” property, and infer nullability based on the presence of “null” in the type list.
To make this easier, we wrote a simple code generator that runs before Nexus. It scans files in a directory and creates an index file that imports and re-exports all the schema definitions, so you can just do something like:
import * as schemas from 'schema-index';
The code we wrote is a bit hacky and only has basic error checking, but it all fit within 100 lines.
There were a few hiccups that we ran into along the way, and this section talks about how we dealt with them.
First, our code uses enums rather than types that are unions of string values. This has various advantages, and we didn’t want to lose our enums.
When defining types using json-schema-to-ts, it will do the right thing and reference the enum type if you do the following:
type: 'string',
enum: Object.values(EnumType),
However, for this to work, the enum needs to be defined outside of the JSON schema declaration. Though this makes generating your root JSON schema definition files a little less straightforward because you have to break out the enum definitions, we felt it was worth it and just went ahead and did it, producing enum definitions in a separate file and importing them into the JSON schema definition files.
The next issue you run into with this is that the name of the enum type won’t be there when you read the JSON schema to translate it into GraphQL. To get around this, you can add an extra property like the following to the definition above, which you can then reference in the GraphQL translator:
typeName: 'EnumType',
Finally, the GraphQL translator needs to import all of the enums themselves and declare them as enum types in Nexus. We also updated the generation step that runs before Nexus to import and re-export all the enums to make this part easy.
References to other types need to be handled properly both when translating to Typescript, and when translating to GraphQL.
In json-schema-to-ts, one documented way of doing object references is to put a ref in the referring type like so:
$ref: "user.json"
Then, in the declaration of the user type, you would put:
$id: "user.json"
Finally, when you instantiate the Typescript type, you can put:
type Org = FromSchema<
typeof usersSchema,
{ references: [typeof userSchema] }
>;
However, there are two problems with this. The first is that the referencing object Org
will define the type inline rather than knowing that it should be another typescript type User
, which makes type definitions confusing when you try to read them.
Second, it requires listing all of the other schemas including references of User
, their references, etc. in the references
list, which can become unwieldy if you have a complex type tree.
To get around these issues, we used another hack instead, which is to define the types using the deserialize
parameter of FromSchema
. With this method, you declare a reference to another type like this:
type: 'string',
format: 'type:User',
Then, you put this in the deserialize parameter of FromSchema
:
type Org = FromSchema<
typeof usersSchema,
{ deserialize: [
pattern: {
type: 'string';
format: type:User';
};
output: User;
] }
>;
And voila, the Org
type references the User
Typescript type without inlining it or needing to know about its references.
The Nexus type generation can then look for formats matching the type:X pattern and declare them properly as well.
This does require abusing the JSON schema format a bit, but we found that acceptable because the original source type definitions came from a database and only had IDs anyway, so we didn’t need to create these references until we produced derived types inside of our code.
One important step when going from database types to application types is transforming objects in various ways.
One type of translation we needed to do was going from snake case database type names like org_member
into camel case names like orgMember
.
Luckily, Typescript can do this, so a straight property name translation can happen as follows (see the code for the conversion functions in the appendix):
export const OrgSchema = {
...OrgSchemaSql,
title: 'Org',
properties: snakeToCamelProps(OrgSchemaSql.properties),
required: snakeToCamelArray(OrgSchemaSql.required),
} as const;
However, you may also want to do more complex translations, like picking or excluding specific properties. This can be done in a similar way:
export const UserSchema = {
...UserSchemaSql,
title: 'User',
properties: excludeObjectProps(
UserSchemaSql.properties,
['pwhash'],
),
required: excludeArrayValues(
UserSchemaSql.required,
['pwhash'],
),
} as const;
You can also merge two types together, like if you want to union the properties from a base table with a version table:
const MergedSchema = {
...BaseSchema,
...VersionSchema,
properties: {
...BaseSchema.properties,
...VersionSchema.properties,
},
required: [
...BaseSchema.required,
...VersionSchema.required,
],
...overrides,
} as const;
The possibilities are endless. You can exclude and add new properties as well, like if you have a JSON-encoded string in your database and you want the application to pass it around as typed object.
If your schema is sufficiently complex, you probably have union types somewhere, like Report = ReportA | ReportB
.
When you declare a union type in Typescript, it will magically figure out which type you are referring to based on a union key property within all the union members, like reportType: ReportType.A
.
JSON schemas also allow union types with the anyOf
keyword. However, when you are building a union in Nexus, you need to explicitly define a function that determines the type using the properties in an object.
While you could probably write code to infer this automatically, we found it easier to just declare the union key property and value explicitly in the JSON schema as follows:
anyOfKeyProp: 'reportType',
anyOf: [
{
type: 'string',
format: 'type:ReportA',
keyValue: ReportType.A,
},
{
type: 'string',
format: 'type:ReportB',
keyValue: ReportType.B,
},
],
In the definition of the ReportA
schema, you would have a property:
reportType: {
type: 'string',
enum: [ReportType.A],
typeName: 'ReportType',
},
Then, when creating the unionType
in Nexus, you can use those values to implement a resolveType
function to make Nexus happy.
Here we showed how to conquer code repetition for type declarations between your database and GraphQL API. In one step, any database type definition changes (including comments!) will automatically flow through to your GraphQL API and server Typescript types.
While we couldn’t find an existing standard way to do this in any examples or packages, it ended up not being too complicated using json-schema-to-ts and GraphQL Nexus.
Our hope is that you too can follow this guide and make your code more DRY.
Contact us if you would like to see more of the code, and we may publish a package at some point in the future if there is interest.
Here is the code that we referenced above for applying basic transformations in Typescript. All the code in this post is free to use under the MIT license.
The SnakeToCamelCase
definition can be found in this post.
export type SnakeToCamelCaseLower<
S extends string | number | symbol
> = S extends string
? SnakeToCamelCase<Lowercase<S>>
: S;
export function snakeToCamelString<S extends string>(
str: S,
): SnakeToCamelCase<S> {
const snakeSplit = str.split(/_+/)
.filter(splitStr => splitStr)
.map((splitStr: string, idx: number) =>
(idx === 0
? splitStr.toLowerCase()
: `${splitStr[0].toUpperCase()}$
splitStr.substring(1)}`))
.join('');
return snakeSplit as SnakeToCamelCase<S>;
}
export function snakeToCamelProps<T extends object>(
obj: T,
): { [K in keyof T as SnakeToCamelCaseLower<K>]: T[K] } {
const camelObject = {} as any;
for (const k in obj) {
const camelKey = snakeToCamelString(k);
camelObject[camelKey] = obj[k];
}
return camelObject;
}
export function snakeToCamelArray<T extends string>(
strList: readonly T[],
) {
return strList.map(str => snakeToCamelString(str));
}
export function excludeArray<
T extends V | string, V extends string
>(
strList: readonly T[],
omitStrList: readonly V[],
): Exclude<T, V>[] {
return strList.filter(
(str: T): str is Exclude<T, V> =>
!(omitStrList as T[]).includes(str),
);
}
export function omitObjectProps<
T extends { [k in V]: any },
V extends string
>(
obj: T,
omitPropList: readonly V[],
): Omit<T, V> {
const omitObject = {} as Omit<T, V>;
const keys = Object.keys(obj);
for (const k of keys) {
if (!omitPropList.includes(k as V)) {
omitObject[k] = obj[k];
}
}
return omitObject;
}