DataHubGc
CLI-based Ingestion
Install the Plugin
The datahub-gc source works out of the box with acryl-datahub; no additional plugin needs to be installed.
Config Details
Note that a dot (.) is used to denote nested fields in the YAML recipe.
| Field | Type | Description | Default |
|---|---|---|---|
| cleanup_expired_tokens | boolean | Whether to clean up expired tokens. | True |
| truncate_index_older_than_days | integer | Indices older than this number of days are truncated. | 30 |
| truncate_indices | boolean | Whether to truncate the Elasticsearch indices that can be safely truncated. | True |
| truncation_sleep_between_seconds | integer | Number of seconds to sleep between truncation monitoring checks. | 30 |
| truncation_watch_until | integer | Keep waiting for index truncation until this number of documents remains. | 10000 |
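The descriptions above do not spell out how the three truncation settings interact. The sketch below assumes a simple reading of them: an index is a truncation candidate when it is older than truncate_index_older_than_days, and once truncation starts, a remaining-document count is polled every truncation_sleep_between_seconds until it drops to truncation_watch_until or below. The function names and the count_remaining callable are hypothetical illustrations, not the DataHubGcSource implementation.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def is_truncation_candidate(
    index_date: datetime,
    truncate_index_older_than_days: int = 30,
    now: Optional[datetime] = None,
) -> bool:
    """Hypothetical helper: True if index_date falls before the N-day cutoff."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=truncate_index_older_than_days)
    return index_date < cutoff


def watch_truncation(
    count_remaining: Callable[[], int],
    truncation_watch_until: int = 10000,
    truncation_sleep_between_seconds: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical monitor: poll until at most truncation_watch_until
    documents remain, sleeping between checks. Returns the number of polls.

    count_remaining stands in for whatever document count the real source
    checks; sleep is injectable so the loop can be exercised without waiting.
    """
    polls = 0
    while True:
        polls += 1
        if count_remaining() <= truncation_watch_until:
            return polls
        sleep(truncation_sleep_between_seconds)


if __name__ == "__main__":
    counts = iter([50000, 20000, 8000])
    # Two checks are above the default threshold, the third is below it.
    print(watch_truncation(lambda: next(counts), sleep=lambda _s: None))  # prints 3
```

Injecting the sleep function keeps the polling logic testable; with the defaults, the loop would wait 30 seconds between checks.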
The JSONSchema for this configuration is inlined below.
```json
{
  "title": "DataHubGcSourceConfig",
  "type": "object",
  "properties": {
    "cleanup_expired_tokens": {
      "title": "Cleanup Expired Tokens",
      "description": "Whether to clean up expired tokens or not",
      "default": true,
      "type": "boolean"
    },
    "truncate_indices": {
      "title": "Truncate Indices",
      "description": "Whether to truncate elasticsearch indices or not which can be safely truncated",
      "default": true,
      "type": "boolean"
    },
    "truncate_index_older_than_days": {
      "title": "Truncate Index Older Than Days",
      "description": "Indices older than this number of days will be truncated",
      "default": 30,
      "type": "integer"
    },
    "truncation_watch_until": {
      "title": "Truncation Watch Until",
      "description": "Wait for truncation of indices until this number of documents are left",
      "default": 10000,
      "type": "integer"
    },
    "truncation_sleep_between_seconds": {
      "title": "Truncation Sleep Between Seconds",
      "description": "Sleep between truncation monitoring.",
      "default": 30,
      "type": "integer"
    }
  },
  "additionalProperties": false
}
```
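As a rough illustration of what this schema implies, the sketch below fills in the documented defaults, rejects unknown keys (the effect of "additionalProperties": false), and checks the declared boolean and integer types. It is a hypothetical stdlib-only helper written for this page, not DataHub's own config validation, which may accept or coerce values differently.

```python
from typing import Any, Dict, Tuple

# Types and defaults copied from the DataHubGcSourceConfig JSON Schema.
GC_CONFIG_SCHEMA: Dict[str, Tuple[type, Any]] = {
    "cleanup_expired_tokens": (bool, True),
    "truncate_indices": (bool, True),
    "truncate_index_older_than_days": (int, 30),
    "truncation_watch_until": (int, 10000),
    "truncation_sleep_between_seconds": (int, 30),
}


def resolve_gc_config(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: merge user values over the schema defaults."""
    # "additionalProperties": false means any key not listed is an error.
    unknown = set(user_config) - set(GC_CONFIG_SCHEMA)
    if unknown:
        raise ValueError(f"Unknown config fields: {sorted(unknown)}")

    resolved: Dict[str, Any] = {}
    for field, (expected_type, default) in GC_CONFIG_SCHEMA.items():
        value = user_config.get(field, default)
        # bool is a subclass of int in Python, but JSON Schema treats
        # booleans and integers as distinct types, so reject True/False here.
        if expected_type is int and isinstance(value, bool):
            raise TypeError(f"{field} must be an integer, got a boolean")
        if not isinstance(value, expected_type):
            raise TypeError(f"{field} must be of type {expected_type.__name__}")
        resolved[field] = value
    return resolved


if __name__ == "__main__":
    cfg = resolve_gc_config({"truncate_index_older_than_days": 7})
    print(cfg["truncate_index_older_than_days"], cfg["truncation_watch_until"])  # prints 7 10000
```

Because every field has a default, an empty config is valid and yields the values listed in the table above; a misspelled field name fails fast instead of being silently ignored.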
Code Coordinates
- Class Name: datahub.ingestion.source.gc.datahub_gc.DataHubGcSource
Questions
If you have any questions about configuring ingestion for DataHubGc, feel free to ping us on our Slack.