Metadata-Version: 2.4
Name: acryl-datahub-airflow-plugin
Version: 1.6.0.1rc1
Summary: Datahub Airflow plugin to capture executions and send to Datahub
Home-page: https://docs.datahub.com/
License: Apache-2.0
Project-URL: Documentation, https://docs.datahub.com/docs/
Project-URL: Source, https://github.com/datahub-project/datahub
Project-URL: Changelog, https://github.com/datahub-project/datahub/releases
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX :: Linux
Classifier: Environment :: Console
Classifier: Environment :: MacOS X
Classifier: Topic :: Software Development
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: apache-airflow-providers-openlineage>=2.1.0
Requires-Dist: acryl-datahub[datahub-rest,sql-parser]==1.6.0.1rc1
Requires-Dist: acryl-datahub[datahub-rest]==1.6.0.1rc1
Requires-Dist: apache-airflow<4.0.0,>=3.0.0
Requires-Dist: pydantic>=2.4.0
Provides-Extra: ignore
Provides-Extra: airflow3
Provides-Extra: datahub-rest
Requires-Dist: acryl-datahub[datahub-rest]==1.6.0.1rc1; extra == "datahub-rest"
Provides-Extra: datahub-kafka
Requires-Dist: acryl-datahub[datahub-kafka]==1.6.0.1rc1; extra == "datahub-kafka"
Provides-Extra: datahub-file
Requires-Dist: acryl-datahub[sync-file-emitter]==1.6.0.1rc1; extra == "datahub-file"
Provides-Extra: dev
Requires-Dist: acryl-datahub[datahub-rest,sql-parser]==1.6.0.1rc1; extra == "dev"
Requires-Dist: pytest>=6.2.2; extra == "dev"
Requires-Dist: ruff==0.11.7; extra == "dev"
Requires-Dist: types-dataclasses; extra == "dev"
Requires-Dist: packaging; extra == "dev"
Requires-Dist: mypy==1.17.1; extra == "dev"
Requires-Dist: tenacity; extra == "dev"
Requires-Dist: tox-uv; extra == "dev"
Requires-Dist: acryl-datahub[datahub-rest]==1.6.0.1rc1; extra == "dev"
Requires-Dist: types-click==0.1.12; extra == "dev"
Requires-Dist: sqlalchemy-stubs; extra == "dev"
Requires-Dist: types-tabulate; extra == "dev"
Requires-Dist: tox; extra == "dev"
Requires-Dist: types-toml; extra == "dev"
Requires-Dist: apache-airflow-providers-openlineage>=2.1.0; extra == "dev"
Requires-Dist: apache-airflow<4.0.0,>=3.0.0; extra == "dev"
Requires-Dist: types-cachetools; extra == "dev"
Requires-Dist: types-python-dateutil; extra == "dev"
Requires-Dist: pydantic>=2.4.0; extra == "dev"
Requires-Dist: pytest-cov>=2.8.1; extra == "dev"
Requires-Dist: types-setuptools; extra == "dev"
Requires-Dist: deepdiff!=8.0.0; extra == "dev"
Requires-Dist: coverage>=5.1; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: types-requests; extra == "dev"
Requires-Dist: types-PyYAML; extra == "dev"
Requires-Dist: types-six; extra == "dev"
Provides-Extra: integration-tests
Requires-Dist: apache-airflow-providers-sqlite; extra == "integration-tests"
Requires-Dist: apache-airflow-providers-snowflake; extra == "integration-tests"
Requires-Dist: virtualenv; extra == "integration-tests"
Requires-Dist: apache-airflow-providers-amazon; extra == "integration-tests"
Requires-Dist: apache-airflow-providers-teradata; extra == "integration-tests"
Requires-Dist: acryl-datahub[testing-utils]==1.6.0.1rc1; extra == "integration-tests"
Requires-Dist: acryl-datahub[datahub-kafka]==1.6.0.1rc1; extra == "integration-tests"
Requires-Dist: apache-airflow-providers-google; extra == "integration-tests"
Requires-Dist: acryl-datahub[sync-file-emitter]==1.6.0.1rc1; extra == "integration-tests"
Requires-Dist: snowflake-connector-python>=2.7.10; extra == "integration-tests"
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# DataHub Airflow Plugin

See [the DataHub Airflow docs](https://docs.datahub.com/docs/lineage/airflow) for details.

## Version Compatibility

The plugin supports Apache Airflow 3.x (`>=3.0.0,<4.0.0`). Airflow 2.x is not
supported. If you need to integrate with Airflow 2, pin
`acryl-datahub-airflow-plugin <= 1.6.0`, the last release with Airflow 2 support.

| Airflow Version | Status             | Notes                                         |
| --------------- | ------------------ | --------------------------------------------- |
| 2.x             | ❌ Unsupported     | Use `acryl-datahub-airflow-plugin` <= 1.6.0   |
| 3.0+            | ✅ Fully Supported | Declared dependency range is `>=3.0.0,<4.0.0` |
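
The compatibility rule above can be checked from Python before installing. This is a minimal sketch, not part of the plugin: `is_supported_airflow` and `installed_airflow_is_supported` are hypothetical helper names, and the check looks only at the major version, so it ignores pre-release ordering.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def is_supported_airflow(airflow_version: str) -> bool:
    """Return True if the major version is 3, i.e. roughly >=3.0.0,<4.0.0.

    Simplification (assumption): pre-release suffixes such as "rc1" are ignored.
    """
    match = re.match(r"(\d+)\.", airflow_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {airflow_version!r}")
    return int(match.group(1)) == 3


def installed_airflow_is_supported() -> Optional[bool]:
    """Check the installed apache-airflow distribution; None if it is absent."""
    try:
        return is_supported_airflow(version("apache-airflow"))
    except PackageNotFoundError:
        return None
```

Calling `installed_airflow_is_supported()` in the target environment returns `None` when Airflow is not installed, so the three outcomes (supported, unsupported, missing) stay distinguishable.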

## Installation

```bash
pip install acryl-datahub-airflow-plugin
```

This installs:

- `acryl-datahub[sql-parser,datahub-rest]`, pinned to the same version as the plugin: the DataHub SDK with SQL parsing and the REST emitter
- `pydantic>=2.4.0`
- `apache-airflow>=3.0.0,<4.0.0`
- `apache-airflow-providers-openlineage>=2.1.0`

### Optional extras

```bash
pip install 'acryl-datahub-airflow-plugin[datahub-kafka]'   # Kafka emitter
pip install 'acryl-datahub-airflow-plugin[datahub-file]'    # File emitter (testing)
```

## Configuration

The plugin can be configured via `airflow.cfg` under the `[datahub]` section. Below are the key configuration options:

### Extractor Patching (OpenLineage Enhancements)

When `enable_extractors=True` (the default), the DataHub plugin patches the OpenLineage extractors to produce more accurate lineage, including column-level lineage. You can fine-tune these enhancements:

```ini
[datahub]
# Enable/disable all OpenLineage extractors
enable_extractors = True  # Default: True

# Enable multi-statement SQL parsing (resolves temp tables, merges lineage)
enable_multi_statement_sql_parsing = False  # Default: False

# Patch SQLParser to use DataHub's advanced SQL parser (enables column-level lineage)
patch_sql_parser = True  # Default: True

# Use DataHub's enhancements for specific operators
extract_athena_operator = True              # Default: True
extract_bigquery_insert_job_operator = True # Default: True
extract_teradata_operator = True            # Default: True
```
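
To see how the documented defaults combine with explicit overrides, the following sketch resolves the options above with Python's standard `configparser`. It is an illustration only: Airflow and the plugin use their own configuration loader, and `DEFAULTS` and `resolve_datahub_options` are hypothetical names introduced here. The parser is created with `inline_comment_prefixes` so that trailing `# Default: ...` comments are ignored.

```python
import configparser

# Defaults as documented in the [datahub] section above.
DEFAULTS = {
    "enable_extractors": True,
    "enable_multi_statement_sql_parsing": False,
    "patch_sql_parser": True,
    "extract_athena_operator": True,
    "extract_bigquery_insert_job_operator": True,
    "extract_teradata_operator": True,
}


def resolve_datahub_options(cfg_text: str) -> dict:
    """Parse INI text and fall back to the documented default for unset keys."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(cfg_text)
    return {
        key: parser.getboolean("datahub", key, fallback=default)
        for key, default in DEFAULTS.items()
    }


example = """
[datahub]
enable_extractors = True
patch_sql_parser = False  # Default: True
"""
options = resolve_datahub_options(example)
```

Here `options["patch_sql_parser"]` is `False` because it is set explicitly, while every key missing from the file keeps its documented default.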

**Multi-Statement SQL Parsing:**

When `enable_multi_statement_sql_parsing=True` and a task executes multiple SQL statements (e.g., `CREATE TEMP TABLE ...; INSERT ... FROM temp_table;`), DataHub parses all statements together and resolves temporary table dependencies within that task. With the default (`False`), only the first statement is parsed.
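
To illustrate what resolving temporary tables means, here is a toy model of merging per-statement lineage. It is not DataHub's parser: the `StatementLineage` type and `merge_lineage` function are hypothetical, and each statement's inputs and outputs are supplied by hand rather than parsed from SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatementLineage:
    """Hand-written lineage for one SQL statement (hypothetical type)."""
    inputs: frozenset
    outputs: frozenset
    temp_outputs: frozenset = frozenset()  # temp tables this statement creates


def merge_lineage(statements):
    """Merge statement lineage, replacing temp tables with their real sources."""
    temp_sources = {}  # temp table name -> set of real upstream tables
    inputs, outputs = set(), set()
    for stmt in statements:
        # Swap any temp-table input for the real tables it was built from.
        resolved = set()
        for table in stmt.inputs:
            resolved |= temp_sources.get(table, {table})
        for temp in stmt.temp_outputs:
            temp_sources[temp] = set(resolved)
        # Only statements that write a persistent table contribute lineage.
        if stmt.outputs:
            inputs |= resolved
            outputs |= stmt.outputs
    return inputs, outputs


# CREATE TEMP TABLE tmp_orders AS SELECT ... FROM raw.orders;
# INSERT INTO mart.sales SELECT ... FROM tmp_orders JOIN dim.customers ...;
statements = [
    StatementLineage(
        inputs=frozenset({"raw.orders"}),
        outputs=frozenset(),
        temp_outputs=frozenset({"tmp_orders"}),
    ),
    StatementLineage(
        inputs=frozenset({"tmp_orders", "dim.customers"}),
        outputs=frozenset({"mart.sales"}),
    ),
]
merged_inputs, merged_outputs = merge_lineage(statements)
```

In this model the merged inputs are `raw.orders` and `dim.customers`, and `tmp_orders` no longer appears, which is the effect the option describes at the task level.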

**How patches work:**

The DataHub plugin monkey-patches OpenLineage extractors at runtime:

- `patch_sql_parser=True` patches `SQLParser.generate_openlineage_metadata_from_sql()` to use DataHub's parser, enabling more accurate lineage and column-level lineage.
- `extract_athena_operator` / `extract_bigquery_insert_job_operator` / `extract_teradata_operator` patch the corresponding operator's `get_openlineage_facets_on_complete()` method with DataHub's enhanced implementation.
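
The general monkey-patching technique described above can be sketched as follows. Everything in this example is a stand-in: `StandInSQLParser`, `generate_metadata`, `replacement_generate_metadata`, and `apply_patches` are hypothetical names, not the OpenLineage provider's classes or the plugin's actual patching code.

```python
class StandInSQLParser:
    """Hypothetical stand-in for a third-party parser class."""

    def generate_metadata(self, sql: str) -> dict:
        return {"parsed_by": "original", "sql": sql}


def replacement_generate_metadata(self, sql: str) -> dict:
    """Replacement method with the same signature as the original."""
    return {"parsed_by": "replacement", "sql": sql, "column_lineage": True}


def apply_patches(patch_sql_parser: bool) -> None:
    """Swap the method on the class at runtime when the flag is enabled."""
    if patch_sql_parser:
        StandInSQLParser.generate_metadata = replacement_generate_metadata


parser = StandInSQLParser()
before = parser.generate_metadata("SELECT 1")["parsed_by"]
apply_patches(patch_sql_parser=True)
after = parser.generate_metadata("SELECT 1")["parsed_by"]
```

Because the method is replaced on the class rather than on an instance, objects created before the patch also pick up the new behavior. Leaving the flag off leaves the original method untouched.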

### Example: disable DataHub's SQL parser

```ini
[datahub]
enable_extractors = True
patch_sql_parser = False
```

### Other Configuration Options

For a complete list of configuration options, see the [DataHub Airflow documentation](https://docs.datahub.com/docs/lineage/airflow#configuration).

## Developing

See the [developing docs](../../metadata-ingestion/developing.md).
