Apache Airflow
Databricks Provider
Apache Airflow can orchestrate an RTDIP Pipeline that has been deployed as a Databricks Job. For further information on how to deploy an RTDIP Pipeline as a Databricks Job, please see here.
Databricks also provides more information about running Databricks jobs from Apache Airflow here.
Prerequisites
- An Apache Airflow instance must be running.
- Authentication between Apache Airflow and Databricks must be configured (a minimal connection sketch follows this list).
- The Python packages `apache-airflow` and `apache-airflow-providers-databricks` must be installed.
- You have created an RTDIP Pipeline and deployed it to Databricks.
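
As a minimal sketch of the authentication prerequisite, the Databricks connection can be supplied to Airflow as an environment variable. The connection id `databricks_default` matches the example further down; the workspace URL and personal access token are placeholders, not values from the RTDIP example itself.

```python
import os

# A minimal sketch, assuming Airflow 2.3+ (which accepts JSON-serialised connections
# in AIRFLOW_CONN_* environment variables) and personal access token authentication.
# The workspace URL and token are placeholders; in practice this variable is usually
# set in the shell environment of the Airflow scheduler and workers rather than in Python.
os.environ["AIRFLOW_CONN_DATABRICKS_DEFAULT"] = (
    '{"conn_type": "databricks", '
    '"host": "https://<your-workspace>.cloud.databricks.com", '
    '"password": "<your-personal-access-token>"}'
)
```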
Example
The `JOB_ID` in the example below can be obtained from the Databricks Job.
```python
from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow'
}

with DAG('databricks_dag',
    start_date = days_ago(2),
    schedule_interval = None,
    default_args = default_args
    ) as dag:

    opr_run_now = DatabricksRunNowOperator(
        task_id = 'run_now',
        databricks_conn_id = 'databricks_default',
        job_id = JOB_ID
    )
```
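
If the deployed RTDIP Pipeline job expects parameters, `DatabricksRunNowOperator` can forward them on each run. The sketch below is a hypothetical variation of the example above; the `notebook_params` key and value, the DAG name and the `JOB_ID` value are illustrative and depend on how the job was defined in Databricks.

```python
from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator
from airflow.utils.dates import days_ago

JOB_ID = 1234  # hypothetical value; use the id of your own Databricks Job

with DAG('databricks_dag_with_params',
    start_date = days_ago(2),
    schedule_interval = None,
    default_args = {'owner': 'airflow'}
    ) as dag:

    opr_run_now = DatabricksRunNowOperator(
        task_id = 'run_now',
        databricks_conn_id = 'databricks_default',
        job_id = JOB_ID,
        # notebook_params are passed through to the job run;
        # the key and value below are illustrative only
        notebook_params = {'environment': 'dev'}
    )
```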