Deploying Javascript functions on Google Big Query

Google BigQuery supports user-defined functions (UDFs) written in Javascript as well as SQL, this opens up a whole new world of capabilities that UDFs could provide.

Let's start by running through some basics for using Javascript in UDFs in BigQuery. The examples here uses standard SQL mode as that's the preferred syntax.

In the examples below, temporary UDFs are created but this just makes testing and development easier.

Return types

First, a simple Javascript UDF to return a single boolean value:

CREATE TEMP FUNCTION booleanExample()
 RETURNS BOOLEAN
 LANGUAGE js AS r"""
   return true;
 """
;

SELECT booleanExample() AS result;

Next a UDF that returns an array of strings:

CREATE TEMP FUNCTION arrayOfStringsExample()
  RETURNS ARRAY<STRING>
  LANGUAGE js AS r"""
    return ['one', 'two', 'three'];
  """
;

SELECT arrayOfStringsExample() AS result;

Now a UDF that returns a Javascript object:

CREATE TEMP FUNCTION objectExample()
  RETURNS STRUCT<one INT64, two INT64, three INT64>
  LANGUAGE js AS r"""
    return {one: 1, two: 2, three: 3};
  """
;

SELECT objectExample() AS result;

If you were to specify additional fields in the STRUCT which are not present in the Javascript object they will simply be set as null. The reverse is also true, so you can omit fields from the resulting STRUCT, any other fields in the Javascript object will not be returned.

Next, a UDF that returns an array of objects:

CREATE TEMP FUNCTION arrayOfObjectsExample()
  RETURNS ARRAY<STRUCT<one INT64, two INT64, three INT64>>
  LANGUAGE js AS r"""
    return [
      {one: 1, two: 2, three: 3},
      {one: 1, two: 2, three: 3}
    ];
  """
;

SELECT arrayOfObjectsExample() AS result;

Finally, a UDF that takes a string as input:

CREATE TEMP FUNCTION echo(word STRING)
  RETURNS STRING
    LANGUAGE js
    AS r"""
      return word;
    """
;

SELECT echo('echo') AS result;

Using these above examples you should be able to create a Javascript function to return the data types you need. A full list of supported data types is detailed in the Standard SQL user-defined functions documentation.

Bundling an asset

Using inline Javascript is fine for small or simple UDFs, but to leverage Javascript's wide variety of open source modules we will have to create a deployment asset that contains all our logic and code.

A common choice for this process is to use webpack in a similar way that you might for a web application.

The required webpack.config.js is very simple, we just define an entry point and and a resulting output filename:

module.exports = {
  entry: './src/index.js',
  mode: 'production',
  output: {
    filename: 'dist.js'
  }
}

Entry point

We'll create a index.js file which will allow all of our functionality to be surfaced and usable in our BigQuery UDFs:

const typeExamples = require('./typeExamples')

const index = module.exports = {}

index.typeexamples = typeExamples
index.helloWorld = () => 'Hello World!'

// Put functionality you need in the global scope for BigQuery usage
Object.assign(global, index)

In the example above we are including our example functions from another file and defining a simple hello world function. If you have a larger amount of functions you can use npm modules such as requireAll.

Your javascript code can be unit tested in the usual way like any other node.js application.

Deployment

To use a prebuilt Javascript asset as a function we will need to first deploy this to Google Cloud Storage (GCS). The simplest solution here is to manually copy the dist.js that webpack creates to a GCS bucket.

For a more formal and orchestrated process, we can create a Javascript script to deploy our asset to GCS. The code below is a quick example of this.

const { Storage } = require('@google-cloud/storage')

const upload = async () => {
    const storage = new Storage()
  try {
    await storage.createBucket('YOUR_BUCKET')
  } catch (e) {  // Don't worry if the bucket already exists
    if (e.message !== 'Sorry, that name is not available. Please try a different one.') throw e
  }
  return storage.bucket('YOUR_BUCKET')
    .upload('dist/dist.js', {
      destination: 'bigquery-js-udf-example/dist/dist.js'
    })
}

upload()
  .catch((e) => {
    console.error(e.message)
    process.exit(1)
  })
  .then(() => {
    console.log('...Success')
  })

Authentication

All scripts that use GCP features will require that your terminal is already authenticated with GCP.

Consult Google's docs for installing the SDK, then the following commands can be used to configure authentication.

gcloud auth login
gclud auth application-default login

Once the Google Cloud SDK has been installed authentication can easily be checked with the command:

gcloud auth list

Check your CI provider's documentation for how to configure GCP authentication when deploying it's normally a case of configuring environment variables with a previously created service account.

Testing

Now we are ready to test our new asset with BigQuery. The simplest manual approach would be to call the UDF in the GCP BigQuery console as follows:

CREATE TEMP FUNCTION booleanExample()
  RETURNS BOOLEAN
    LANGUAGE js
    OPTIONS (
      library=["gs://YOUR_BUCKET/bigquery-js-udf-example/dist/dist.js"]
    )
    AS r"""
      return typeExamples.boolean();
    """
;

SELECT booleanExample() AS result;

We can automate testing using the same framework that we wrote the unit tests in.

const { BigQuery } = require('@google-cloud/bigquery')

const runUdf = async (client, returnType, js) => {
  const query = `
    CREATE TEMP FUNCTION testFunction()
      RETURNS ${returnType}
        LANGUAGE js
        OPTIONS (
          library=["gs://YOUR_BUCKET/bigquery-js-udf-example/dist/dist.js"]
        )
        AS r"""
          ${js}
        """
    ;

    SELECT testFunction() AS result;`
  const [job] = await client.createQueryJob({
    query
  })
  const [rows] = await job.getQueryResults()
  return rows[0].result
}

describe('BigQuery tests', () => {
  let bigquery

  beforeAll(() => {
    bigquery = new BigQuery({
      projectId: process.env.GCP_PROJECT_ID
    })
  })

  it('typeExamples.boolean()', async () => {
    const result = await runUdf(bigquery, 'BOOLEAN', 'return typeExamples.boolean();')
    expect(result).toBe(true)
  })

NPM Modules

Now that we have our JS asset being built with webpack and a method of deploying our code to GCS, we can include other modules as required. This can be included in the same way as in any other client-side project. However, there are some Limitations to what you can use.

The example below uses the sillyname NPM module, showing that including 3rd party code is just like with any other node.js application:

const generateName = require('sillyname')
const index = module.exports = {}
index.sillyName = generateName
Object.assign(global, index)

The sillyName function can now be accessed in your Javascript UDFs in the same way as the previous functions.

Wrapping up

Javascript UDF's allows you to leverage the large pool of open source modules into your UDF's. These functions can be easily unit and system tested using popular JS testing frameworks such as Jest or Mocha.

For full working examples, tests, and deployment scripts can be found on the accompanying GitHub Project at:

https://github.com/thedumbterminal/bigquery-js-udf-example

Last updated: 24/03/2021