The tap-rest-api-msdk extractor pulls data from a REST API. A loader can then send that data to a destination.
Getting Started
Prerequisites
If you haven't already, follow the initial steps of the Getting Started guide:
Installation and configuration
- Add the tap-rest-api-msdk extractor to your project using meltano add:
  meltano add extractor tap-rest-api-msdk
- Configure the tap-rest-api-msdk settings using meltano config:
  meltano config tap-rest-api-msdk set --interactive
- Test that the extractor settings are valid using meltano config:
  meltano config tap-rest-api-msdk test
Next steps
Follow the remaining steps of the Getting Started guide:
If you run into any issues, learn how to get help.
Capabilities
The current capabilities for tap-rest-api-msdk may have been set automatically when the extractor was originally added to the Hub. Please review the capabilities when using this extractor. If you find they are out of date, please consider updating them by making a pull request to the YAML file that defines the capabilities for this extractor.
This plugin has the following capabilities:
- catalog
- state
- discover
- about
- stream-maps
- schema-flattening
You can override these capabilities or specify additional ones in your meltano.yml by adding the capabilities key.
Settings
The tap-rest-api-msdk settings that are known to Meltano are documented below. The available settings are:
api_url
next_page_token_path
pagination_request_style
pagination_response_style
pagination_page_size
path
params
headers
records_path
primary_keys
replication_key
except_keys
num_inference_records
streams
stream_maps
stream_map_config
flattening_enabled
flattening_max_depth
You can also list these settings using meltano config with the list subcommand:
meltano config tap-rest-api-msdk list
You can override these settings or specify additional ones in your meltano.yml by adding the settings key.
Please consider adding any settings you have defined locally to this definition on MeltanoHub by making a pull request to the YAML file that defines the settings for this plugin.
API URL (api_url)
- Environment variable: TAP_REST_API_MSDK_API_URL
The base URL (endpoint) of the desired API.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set api_url [value]
Next Page Token Path (next_page_token_path)
- Environment variable: TAP_REST_API_MSDK_NEXT_PAGE_TOKEN_PATH
A JSONPath string representing the path to the "next page" token. Defaults to $.next_page.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set next_page_token_path [value]
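To illustrate how a JSONPath such as the default $.next_page locates the token, here is a minimal Python sketch. It supports only simple dotted paths (no filters, wildcards, or indexes). The helper names and the fake responses are hypothetical; the tap itself presumably relies on a full JSONPath implementation, and the loop is not its actual pagination code.

```python
from typing import Any, Optional


def resolve_dotted_jsonpath(document: Any, path: str) -> Optional[Any]:
    """Resolve a simple JSONPath such as '$.next_page' or '$.meta.next'.

    Hypothetical helper: only '$' followed by dot-separated object keys is
    supported, unlike a real JSONPath library.
    """
    if not path.startswith("$"):
        raise ValueError(f"JSONPath must start with '$': {path!r}")
    current = document
    for key in filter(None, path[1:].split(".")):
        if not isinstance(current, dict) or key not in current:
            return None  # Missing key: treat as "no next page".
        current = current[key]
    return current


def collect_pages(pages: dict, first_token: str, token_path: str = "$.next_page") -> list:
    """Follow next-page tokens through fake responses keyed by token (hypothetical)."""
    records = []
    token: Optional[str] = first_token
    while token is not None:
        response = pages[token]
        records.extend(response["results"])
        token = resolve_dotted_jsonpath(response, token_path)
    return records


# Fake API responses, keyed by the token that requests them (made-up data).
fake_pages = {
    "start": {"results": [1, 2], "next_page": "abc"},
    "abc": {"results": [3], "next_page": None},
}
print(collect_pages(fake_pages, "start"))  # [1, 2, 3]
```

Pagination stops as soon as the path resolves to nothing (a null or missing token), which is why an incorrect next_page_token_path typically yields only the first page.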
Pagination Request Style (pagination_request_style)
- Environment variable: TAP_REST_API_MSDK_PAGINATION_REQUEST_STYLE
The pagination style to use for requests. Defaults to default.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set pagination_request_style [value]
Pagination Response Style (pagination_response_style)
- Environment variable: TAP_REST_API_MSDK_PAGINATION_RESPONSE_STYLE
The pagination style to use for responses. Defaults to default.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set pagination_response_style [value]
Pagination Page Size (pagination_page_size)
- Environment variable: TAP_REST_API_MSDK_PAGINATION_PAGE_SIZE
The size of each page, in records. Defaults to None.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set pagination_page_size [value]
Path (path)
- Environment variable: TAP_REST_API_MSDK_PATH
The path appended to the api_url. A stream-level path overrides the top-level path.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set path [value]
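The relationship between api_url, the top-level path, and a stream-level path can be sketched as follows. This assumes that the request URL is formed by plain string concatenation, which is how the Meltano SDK's REST streams combine a base URL and a path. The function name is hypothetical.

```python
from typing import Optional


def build_request_url(api_url: str, top_level_path: Optional[str] = None,
                      stream_path: Optional[str] = None) -> str:
    """Hypothetical helper: a stream-level path, if set, replaces the
    top-level path, and the result is appended to api_url unchanged."""
    path = stream_path if stream_path is not None else top_level_path
    # Assumed plain concatenation: no slash is added or removed, so the
    # path should normally begin with "/".
    return api_url + (path or "")


print(build_request_url("https://earthquake.usgs.gov/fdsnws", "/ignored", "/event/1/query"))
# https://earthquake.usgs.gov/fdsnws/event/1/query
```

Under this assumption, nothing is inserted between the two values. A path without a leading slash, such as event/1/query, would therefore produce a malformed URL.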
Params (params)
- Environment variable: TAP_REST_API_MSDK_PARAMS
An object providing the params passed to the requests.get method. Stream-level params are merged with top-level params, with stream-level params overwriting top-level params that have the same key.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set params [value]
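The merge rule above amounts to a dictionary merge in which stream-level keys win. The following stdlib-only Python sketch assumes the merge is shallow, which the description does not state explicitly. The helper name is hypothetical, and urllib.parse.urlencode stands in for the query-string encoding that requests.get performs on its params argument.

```python
from urllib.parse import urlencode


def merge_params(top_level: dict, stream_level: dict) -> dict:
    """Hypothetical helper: shallow merge where stream-level values win on key collisions."""
    return {**top_level, **stream_level}


top_level_params = {"format": "geojson", "limit": 100}
stream_params = {"limit": 20, "minmagnitude": 1}

merged = merge_params(top_level_params, stream_params)
print(merged)             # {'format': 'geojson', 'limit': 20, 'minmagnitude': 1}
print(urlencode(merged))  # format=geojson&limit=20&minmagnitude=1
```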
Headers (headers)
- Environment variable: TAP_REST_API_MSDK_HEADERS
An object of headers to pass into the API calls. Stream-level headers are merged with top-level headers, with stream-level headers overwriting top-level headers that have the same key.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set headers [value]
Records Path (records_path)
- Environment variable: TAP_REST_API_MSDK_RECORDS_PATH
A JSONPath string representing the path in the request's response that contains the records to process. Defaults to $[*]. A stream-level records_path overrides the top-level records_path.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set records_path [value]
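The following minimal Python sketch shows what records_path selects. It handles only the default $[*] and paths of the form $.key[*], such as the $.features[*] used in the earthquake example. The helper and the sample response are hypothetical, and real JSONPath supports far more syntax.

```python
from typing import Any, List


def extract_records(response: Any, records_path: str = "$[*]") -> List[Any]:
    """Hypothetical helper supporting only '$[*]' and '$.key1.key2[*]' paths."""
    if not (records_path.startswith("$") and records_path.endswith("[*]")):
        raise ValueError(f"Unsupported records_path: {records_path!r}")
    container = response
    inner = records_path[1:-3]  # Strip the leading '$' and the trailing '[*]'.
    for key in filter(None, inner.split(".")):
        container = container[key]
    if not isinstance(container, list):
        raise TypeError("records_path must point at a JSON array")
    return list(container)


geojson_like = {"type": "FeatureCollection",
                "features": [{"id": "a1"}, {"id": "b2"}]}
print(extract_records(geojson_like, "$.features[*]"))  # [{'id': 'a1'}, {'id': 'b2'}]
print(extract_records([{"id": 1}]))                    # [{'id': 1}]
```

The default $[*] only fits APIs whose response body is itself a JSON array. Wrapped responses like the GeoJSON one need an explicit path.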
Primary Keys (primary_keys)
- Environment variable: TAP_REST_API_MSDK_PRIMARY_KEYS
A list of the JSON keys that make up the primary key for the stream.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set primary_keys [value]
Replication Key (replication_key)
- Environment variable: TAP_REST_API_MSDK_REPLICATION_KEY
The JSON key of the replication key. This should be an incrementing integer or datetime value.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set replication_key [value]
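A simplified model of incremental replication shows why the replication key should be incrementing. The bookmark is the largest value seen so far, and a later run emits only records at or beyond it. This Python sketch is a hypothetical simplification. It does not reproduce the Singer state message format or the tap's actual filtering, and all names and data are made up.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def incremental_sync(records: Iterable[dict], replication_key: str,
                     bookmark: Optional[datetime]) -> Tuple[List[dict], Optional[datetime]]:
    """Hypothetical model: emit records whose replication-key value is >= the
    bookmark, and return the new bookmark (the maximum value seen)."""
    emitted = []
    new_bookmark = bookmark
    for record in records:
        value = datetime.fromisoformat(record[replication_key])
        if bookmark is not None and value < bookmark:
            continue  # Assumed already replicated in a previous run.
        emitted.append(record)
        if new_bookmark is None or value > new_bookmark:
            new_bookmark = value
    return emitted, new_bookmark


rows = [
    {"id": 1, "updated_at": "2022-12-07T10:00:00"},
    {"id": 2, "updated_at": "2022-12-07T12:00:00"},
]
first, state = incremental_sync(rows, "updated_at", None)
rows.append({"id": 3, "updated_at": "2022-12-08T09:00:00"})
second, state = incremental_sync(rows, "updated_at", state)
print([r["id"] for r in first], [r["id"] for r in second])  # [1, 2] [2, 3]
```

The >= comparison re-emits the boundary record, so records sharing the bookmark value are not lost. If the key could decrease, newer records carrying smaller values would be skipped.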
Except Keys (except_keys)
- Environment variable: TAP_REST_API_MSDK_EXCEPT_KEYS
This tap automatically flattens the entire JSON structure and builds keys based on the corresponding paths. Keys, whether composite or otherwise, that are listed in this setting are not recursively flattened. Instead, their values are turned into a JSON string and processed in that format. The same is done automatically for any lists within the records, so records are not duplicated for each item in a list.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set except_keys [value]
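The flattening behaviour described for except_keys can be approximated with the Python sketch below. It rests on two assumptions: nested keys are joined with an underscore, and excluded keys are matched against the flattened key name. The tap's actual separator and matching rules may differ, and flatten_record is a hypothetical name.

```python
import json
from typing import Any, Dict, Iterable


def flatten_record(record: Dict[str, Any], except_keys: Iterable[str] = (),
                   parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Hypothetical sketch: recursively flatten nested objects; lists and
    except_keys (matched on the flattened name) become JSON strings."""
    excluded = set(except_keys)
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if new_key in excluded or isinstance(value, list):
            # Keep the whole value as one JSON string instead of expanding it,
            # so a list never multiplies the record into one row per item.
            flat[new_key] = json.dumps(value)
        elif isinstance(value, dict):
            flat.update(flatten_record(value, excluded, new_key, sep))
        else:
            flat[new_key] = value
    return flat


record = {"id": 7, "geometry": {"type": "Point", "coordinates": [1.5, 2.5]},
          "properties": {"mag": 1.2, "extra": {"a": 1}}}
print(flatten_record(record, except_keys=["properties_extra"]))
# {'id': 7, 'geometry_type': 'Point', 'geometry_coordinates': '[1.5, 2.5]', 'properties_mag': 1.2, 'properties_extra': '{"a": 1}'}
```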
Num Inference Records (num_inference_records)
- Environment variable: TAP_REST_API_MSDK_NUM_INFERENCE_RECORDS
The number of records used to infer the stream's schema. Defaults to 50.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set num_inference_records [value]
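Conceptually, num_inference_records bounds how many records are inspected before the schema is fixed. The sketch below infers JSON Schema types from only the first N records of a list of flat records. It is a hypothetical simplification, since the tap may use a dedicated inference library, but it shows the trade-off: a field that first appears after record N is missing from the inferred schema.

```python
from itertools import islice
from typing import Any, Dict, Iterable

# Mapping from the Python types produced by json.loads to JSON Schema types.
JSON_TYPES = {bool: "boolean", int: "integer", float: "number", str: "string",
              type(None): "null", list: "array", dict: "object"}


def infer_schema(records: Iterable[Dict[str, Any]], num_inference_records: int = 50) -> Dict[str, Any]:
    """Hypothetical sketch: collect the JSON types seen per property in the first N records."""
    seen: Dict[str, set] = {}
    for record in islice(records, num_inference_records):
        for key, value in record.items():
            seen.setdefault(key, set()).add(JSON_TYPES[type(value)])
    properties = {key: {"type": sorted(types)} for key, types in seen.items()}
    return {"type": "object", "properties": properties}


rows = [{"id": 1, "mag": 1.5}, {"id": 2, "mag": None}, {"id": 3, "depth": 10.0}]
print(infer_schema(rows, num_inference_records=2))
# {'type': 'object', 'properties': {'id': {'type': ['integer']}, 'mag': {'type': ['null', 'number']}}}
```

Raising num_inference_records makes a late-appearing field like depth more likely to be captured, at the cost of reading more records up front. Alternatively, supplying an explicit schema avoids inference altogether.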
Streams (streams)
- Environment variable: TAP_REST_API_MSDK_STREAMS
An array of streams, designed for separate paths that share the same base URL.
Stream-level config options.
Parameters that appear at the stream level overwrite their top-level counterparts, except where noted below:
- name: required: name of the stream.
- path: optional: the path appended to the api_url.
- params: optional: an object providing the params passed to the requests.get method. Stream-level params are merged with top-level params, with stream-level params overwriting top-level params that have the same key.
- headers: optional: an object of headers to pass into the API calls. Stream-level headers are merged with top-level headers, with stream-level headers overwriting top-level headers that have the same key.
- records_path: optional: a JSONPath string representing the path in the request's response that contains the records to process. Defaults to $[*].
- primary_keys: required: a list of the JSON keys that make up the primary key for the stream.
- replication_key: optional: the JSON key of the replication key. This should be an incrementing integer or datetime value.
- except_keys: this tap automatically flattens the entire JSON structure and builds keys based on the corresponding paths. Keys, whether composite or otherwise, that are listed here are not recursively flattened. Instead, their values are turned into a JSON string and processed in that format. The same is done automatically for any lists within the records, so records are not duplicated for each item in a list.
- num_inference_records: optional: number of records used to infer the stream's schema. Defaults to 50.
- schema: optional: a valid Singer schema, or a path-like string pointing to a .json file that contains a valid Singer schema. If provided, the schema is not inferred from the results of an API call.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set streams [value]
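The override rules listed above can be summarised in a small Python sketch. Each stream-level option replaces its top-level counterpart, except params and headers, which are merged with stream-level keys winning. The function name, the list of inheritable keys, and the name check are assumptions made for illustration, not the tap's own code. The api.example.com URL is a placeholder.

```python
from typing import Any, Dict, List

# Options a stream may inherit from the top level (assumed list, for illustration).
INHERITABLE = ["path", "params", "headers", "records_path", "primary_keys",
               "replication_key", "except_keys", "num_inference_records"]
MERGED = {"params", "headers"}  # Merged rather than overwritten.


def effective_stream_configs(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: resolve each stream's effective options."""
    resolved = []
    for stream in config.get("streams", []):
        if "name" not in stream:
            raise ValueError("every stream needs a name")
        effective: Dict[str, Any] = {"name": stream["name"]}
        for key in INHERITABLE:
            top, own = config.get(key), stream.get(key)
            if key in MERGED:
                effective[key] = {**(top or {}), **(own or {})}
            elif own is not None:
                effective[key] = own  # Stream-level value overwrites top level.
            elif top is not None:
                effective[key] = top  # Inherited from the top level.
        if "schema" in stream:
            effective["schema"] = stream["schema"]
        resolved.append(effective)
    return resolved


config = {
    "api_url": "https://api.example.com",
    "path": "/items",
    "headers": {"Accept": "application/json"},
    "records_path": "$.data[*]",
    "streams": [{"name": "users", "path": "/users",
                 "headers": {"X-Trace": "1"}, "primary_keys": ["id"]}],
}
print(effective_stream_configs(config)[0])
```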
Stream Maps (stream_maps)
- Environment variable: TAP_REST_API_MSDK_STREAM_MAPS
Config object for stream maps capability.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set stream_maps [value]
Stream Map Config (stream_map_config)
- Environment variable: TAP_REST_API_MSDK_STREAM_MAP_CONFIG
User-defined config values to be used within map expressions.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set stream_map_config [value]
Flattening Enabled (flattening_enabled)
- Environment variable: TAP_REST_API_MSDK_FLATTENING_ENABLED
'True' to enable schema flattening and automatically expand nested properties.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set flattening_enabled [value]
Flattening Max Depth (flattening_max_depth)
- Environment variable: TAP_REST_API_MSDK_FLATTENING_MAX_DEPTH
The max depth to flatten schemas.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set flattening_max_depth [value]
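The interaction of flattening_enabled and flattening_max_depth can be pictured with the sketch below. Nested objects are expanded into separate properties down to the maximum depth, and anything deeper, like any list, is kept as a JSON string. The double-underscore separator reflects the Meltano SDK's flattening convention as understood here. Treat the separator, the handling of lists, and the function name as assumptions.

```python
import json
from typing import Any, Dict


def flatten_to_depth(record: Dict[str, Any], max_depth: int,
                     prefix: str = "", level: int = 0, sep: str = "__") -> Dict[str, Any]:
    """Hypothetical sketch: expand nested objects until max_depth is reached;
    JSON-encode deeper objects and all lists."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and level < max_depth:
            flat.update(flatten_to_depth(value, max_depth, name, level + 1, sep))
        elif isinstance(value, (dict, list)):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


record = {"id": 1, "geo": {"lat": 1.0, "meta": {"src": "usgs"}}}
print(flatten_to_depth(record, max_depth=1))
# {'id': 1, 'geo__lat': 1.0, 'geo__meta': '{"src": "usgs"}'}
```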
Examples
An example that retrieves publicly available earthquake data is described in this blog. The YAML configuration for that example should look like the following:
- name: tap-rest-api-msdk
  variant: widen
  pip_url: tap-rest-api-msdk
  config:
    api_url: https://earthquake.usgs.gov/fdsnws
    streams:
    - name: us_earthquakes
      params:
        format: geojson
        starttime: '2022-12-07'
        endtime: '2022-12-08'
        minmagnitude: 1
      path: /event/1/query
      primary_keys:
      - id
      records_path: $.features[*]
      num_inference_records: 200
  select:
  - '*.*'
Something missing?
This page is generated from a YAML file that you can contribute changes to.
Looking for help? Ask in the #plugins-general channel.