Google BigQuery

target-bigquery (z3z1ma variant)

BigQuery loader

The target-bigquery loader sends data into Google BigQuery after it has been pulled from a source using an extractor.

Getting Started

Prerequisites

If you haven't already, follow the initial steps of the Getting Started guide:

  1. Install Meltano
  2. Create your Meltano project

Installation and configuration

  1. Add the target-bigquery loader to your project using meltano add:

     meltano add loader target-bigquery

  2. Configure the target-bigquery settings using meltano config (a sample meltano.yml entry follows these steps):

     meltano config target-bigquery set --interactive
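
After these steps, your meltano.yml will contain an entry for the loader. A minimal sketch of what that entry might look like is below; the config values are placeholders, and meltano add records the variant and pip_url for you:

  plugins:
    loaders:
      - name: target-bigquery
        variant: z3z1ma
        config:
          project: my-gcp-project   # placeholder GCP project ID
          dataset: raw_data         # placeholder dataset name
          credentials_path: ${MELTANO_PROJECT_ROOT}/client_secrets.json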

Next steps

If you run into any issues, learn how to get help.

Capabilities

The current capabilities for target-bigquery may have been automatically set when originally added to the Hub. Please review the capabilities when using this loader. If you find they are out of date, please consider updating them by making a pull request to the YAML file that defines the capabilities for this loader.

This plugin has the following capabilities:

  • about
  • schema-flattening
  • stream-maps

You can override these capabilities or specify additional ones in your meltano.yml by adding the capabilities key.
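
For example, to restate or extend the capabilities in meltano.yml (this sketch simply repeats the defaults listed above):

  plugins:
    loaders:
      - name: target-bigquery
        capabilities:
          - about
          - schema-flattening
          - stream-maps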

Settings

The target-bigquery settings that are known to Meltano are documented below.

You can also list these settings using meltano config with the list subcommand:

meltano config target-bigquery list

You can override these settings or specify additional ones in your meltano.yml by adding the settings key.
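
For example, a sketch of overriding a setting definition in meltano.yml, here re-declaring batch_size with an explicit kind (the label is illustrative):

  plugins:
    loaders:
      - name: target-bigquery
        settings:
          - name: batch_size
            kind: integer
            label: Batch Size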

Please consider adding any settings you have defined locally to this definition on MeltanoHub by making a pull request to the YAML file that defines the settings for this plugin.

Credentials Path (credentials_path)

  • Environment variable: TARGET_BIGQUERY_CREDENTIALS_PATH

The path to a GCP credentials JSON file.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set credentials_path [value]

Credentials Json (credentials_json)

  • Environment variable: TARGET_BIGQUERY_CREDENTIALS_JSON

The contents of your service account JSON file, as a JSON string.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set credentials_json [value]

Project (project)

  • Environment variable: TARGET_BIGQUERY_PROJECT

The target GCP project to materialize data into.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set project [value]

Dataset (dataset)

  • Environment variable: TARGET_BIGQUERY_DATASET

The target dataset to materialize data into.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set dataset [value]

Location (location)

  • Environment variable: TARGET_BIGQUERY_LOCATION

The target dataset/bucket location to materialize data into.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set location [value]

Batch Size (batch_size)

  • Environment variable: TARGET_BIGQUERY_BATCH_SIZE

The maximum number of rows to send in a single batch or commit.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set batch_size [value]

Fail Fast (fail_fast)

  • Environment variable: TARGET_BIGQUERY_FAIL_FAST

Fail the entire load job if any row fails to insert.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set fail_fast [value]

Timeout (timeout)

  • Environment variable: TARGET_BIGQUERY_TIMEOUT

Default timeout for LoadJobs created by the batch_job and gcs_stage methods.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set timeout [value]

Denormalized (denormalized)

  • Environment variable: TARGET_BIGQUERY_DENORMALIZED

Determines whether to denormalize the data before writing to BigQuery. When false, data is written using a fixed JSON-column-based schema; when true, data is written using a dynamic schema derived from the tap.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set denormalized [value]

Method (method)

  • Environment variable: TARGET_BIGQUERY_METHOD

The method to use for writing to BigQuery. The methods referenced by other settings on this page are storage_write_api, batch_job, and gcs_stage.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set method [value]

Generate View (generate_view)

  • Environment variable: TARGET_BIGQUERY_GENERATE_VIEW

Determines whether to generate a view based on the SCHEMA message parsed from the tap. Only valid if denormalized=false, meaning you are using the fixed JSON-column-based schema.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set generate_view [value]

Bucket (bucket)

  • Environment variable: TARGET_BIGQUERY_BUCKET

The GCS bucket to use for staging data. Only used if method is gcs_stage.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set bucket [value]
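
For example, a sketch of pairing the bucket with the gcs_stage method under the target-bigquery entry in meltano.yml (the bucket name is a placeholder):

  config:
    method: gcs_stage
    bucket: my-staging-bucket   # placeholder GCS bucket name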

Partition Granularity (partition_granularity)

  • Environment variable: TARGET_BIGQUERY_PARTITION_GRANULARITY

The granularity of the partitioning strategy. Defaults to month.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set partition_granularity [value]

Cluster On Key Properties (cluster_on_key_properties)

  • Environment variable: TARGET_BIGQUERY_CLUSTER_ON_KEY_PROPERTIES

Determines whether to cluster on the key properties from the tap. Defaults to false. When false, clustering will be based on _sdc_batched_at instead.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set cluster_on_key_properties [value]
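
As a sketch, a denormalized load partitioned by day and clustered on the tap's key properties might be configured like this (day is assumed to be an accepted granularity alongside the month default):

  config:
    denormalized: true
    partition_granularity: day    # assumed valid value; defaults to month
    cluster_on_key_properties: true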

Column Name Transforms Lower (column_name_transforms.lower)

  • Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_LOWER

Lowercase column names


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set column_name_transforms lower [value]

Column Name Transforms Quote (column_name_transforms.quote)

  • Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_QUOTE

Quote columns during DDL generation


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set column_name_transforms quote [value]

Column Name Transforms Add Underscore When Invalid (column_name_transforms.add_underscore_when_invalid)

  • Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_ADD_UNDERSCORE_WHEN_INVALID

Add an underscore when a column starts with a digit


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set column_name_transforms add_underscore_when_invalid [value]

Column Name Transforms Snake Case (column_name_transforms.snake_case)

  • Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_SNAKE_CASE

Convert columns to snake case


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set column_name_transforms snake_case [value]
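
Putting the four transforms together, a meltano.yml sketch might look like this (the values are illustrative; enable only the transforms you need):

  config:
    column_name_transforms:
      lower: true
      quote: false
      add_underscore_when_invalid: true
      snake_case: true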

Options Storage Write Batch Mode (options.storage_write_batch_mode)

  • Environment variable: TARGET_BIGQUERY_OPTIONS_STORAGE_WRITE_BATCH_MODE

By default, the storage_write_api load method uses the default stream (Committed mode), which streams records so they are immediately available and is generally the fastest option. If this is set to true, application-created streams are used instead to transactionally batch data on STATE messages and at the end of the pipe.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set options storage_write_batch_mode [value]

Options Process Pool (options.process_pool)

  • Environment variable: TARGET_BIGQUERY_OPTIONS_PROCESS_POOL

By default, an autoscaling thread pool is used to write to BigQuery. If set to true, a process pool is used instead.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set options process_pool [value]

Options Max Workers (options.max_workers)

  • Environment variable: TARGET_BIGQUERY_OPTIONS_MAX_WORKERS

By default, each sink type has a preconfigured max worker pool limit. This setting overrides the maximum number of workers in the pool.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set options max_workers [value]
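
Combining the three options with the storage_write_api method referenced above, a sketch might look like this (the worker count is illustrative):

  config:
    method: storage_write_api
    options:
      storage_write_batch_mode: true   # batch transactionally on STATE messages
      process_pool: false              # keep the default thread pool
      max_workers: 8                   # illustrative override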

Upsert (upsert)

  • Environment variable: TARGET_BIGQUERY_UPSERT

Determines if we should upsert. Defaults to false. A value of true will write to a temporary table and then merge into the target table (upsert); this requires the target table to be unique on the key properties. A value of false will write to the target table directly (append). A value of an array of strings is evaluated in order using fnmatch; the last match determines the behavior, and if nothing matches, the default of false (append) applies.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set upsert [value]
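
For example, a sketch that upserts two streams and appends everything else (the stream names and patterns are placeholders, evaluated with fnmatch as described above):

  config:
    upsert:
      - orders        # placeholder stream name
      - customers_*   # placeholder fnmatch pattern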

Overwrite (overwrite)

  • Environment variable: TARGET_BIGQUERY_OVERWRITE

Determines if the target table should be overwritten on load. Defaults to false. A value of true will write to a temporary table and then overwrite the target table inside a transaction (so it is safe). A value of false will write to the target table directly (append). A value of an array of strings is evaluated in order using fnmatch; the last match determines the behavior, and if nothing matches, the default of false applies. This option is mutually exclusive with upsert; if both are set, upsert takes precedence.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set overwrite [value]

Dedupe Before Upsert (dedupe_before_upsert)

  • Environment variable: TARGET_BIGQUERY_DEDUPE_BEFORE_UPSERT

This option is only used if upsert is enabled for a stream; the selection criteria for a stream's candidacy are the same as for upsert. If the stream is marked for deduping before upsert, a session-scoped temporary table is created during the merge transaction to dedupe the ingested records. This is useful for streams that are not unique on the key properties during an ingest but are unique in the source system. Data lake ingestion is a good example, where the same unique record may exist in the lake at different points in time from different extracts.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set dedupe_before_upsert [value]
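
A sketch combining the two settings, deduping records inside the merge transaction:

  config:
    upsert: true
    dedupe_before_upsert: true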

Stream Maps (stream_maps)

  • Environment variable: TARGET_BIGQUERY_STREAM_MAPS

Config object for the stream maps capability. For more information, check out Stream Maps.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set stream_maps [value]
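
A sketch of a stream map that replaces one column with a hashed version (the stream and column names are placeholders; the expression syntax comes from the Meltano SDK's stream maps feature):

  config:
    stream_maps:
      customers:                 # placeholder stream name
        email_hash: md5(email)   # md5 is a built-in stream-map function
        email: __NULL__          # drop the raw email column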

Stream Map Config (stream_map_config)

  • Environment variable: TARGET_BIGQUERY_STREAM_MAP_CONFIG

User-defined config values to be used within map expressions.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set stream_map_config [value]

Flattening Enabled (flattening_enabled)

  • Environment variable: TARGET_BIGQUERY_FLATTENING_ENABLED

Set to true to enable schema flattening, automatically expanding nested properties.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set flattening_enabled [value]

Flattening Max Depth (flattening_max_depth)

  • Environment variable: TARGET_BIGQUERY_FLATTENING_MAX_DEPTH

The maximum depth to which schemas are flattened.


Configure this setting directly using the following Meltano command:

meltano config target-bigquery set flattening_max_depth [value]
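
A sketch enabling flattening one level deep:

  config:
    flattening_enabled: true
    flattening_max_depth: 1   # expand only the top level of nested properties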

Something missing?

This page is generated from a YAML file that you can contribute changes to.

Edit it on GitHub!

Looking for help?

If you're having trouble getting the target-bigquery loader to work, look for an existing issue in its repository, file a new issue, or join the Meltano Slack community and ask for help in the #plugins-general channel.

Install

meltano add loader target-bigquery

Maintenance Status

  • Built with the Meltano SDK

Repo

https://github.com/z3z1ma/target-bigquery

Maintainer

  • Alex Butler

Keywords

  • meltano_sdk
  • google
  • database