Databricks
The Databricks piece in ZBrain Flow provides integration with Databricks' unified data analytics platform, allowing you to execute SQL queries and manage data processing jobs directly from your workflows. This powerful connector enables you to interact with Databricks workspaces without manual intervention. With Databricks integration, you can build automations that run data transformations, query data lakes, monitor job executions, and incorporate big data processing into your business processes. This piece is ideal for data teams looking to automate their analytics workflows, integrate data processing with other business systems, and create consistent, repeatable data pipelines.
Click on the '+' button in the Flow and search for Databricks.
Select Databricks.
Decide on the action you need, then select it. ZBrain Flow provides several options:
Run SQL Command – Execute SQL queries against Databricks warehouses.
Create Databricks Job – Define and configure new data processing jobs.
Get Job Status – Monitor the execution status of Databricks jobs.
Run Job – Trigger the execution of Databricks jobs.
Before using any Databricks actions in ZBrain Flow, you'll need to set up a connection to your Databricks environment. This is a one-time setup that will allow you to access your analytics platform securely.
To create your Databricks connection:
From any Databricks action, click on the connection dropdown and select 'Create connection'.
In the popup window that appears, you'll need to:
Enter a descriptive 'Connection Name' to identify this Databricks connection
In the 'Instance Name' field, enter your Databricks workspace URL (e.g., https://your-workspace.cloud.databricks.com)
From the 'Grant Type' dropdown, select 'Client Credentials' as the authorization method
In the 'Client Id' field, enter your Databricks application client ID
In the 'Client Secret' field, enter your Databricks application client secret
Click 'Save' to store this connection
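For reference, the 'Client Credentials' grant type corresponds to Databricks' standard OAuth machine-to-machine token exchange, which ZBrain Flow performs for you behind the scenes. A minimal Python sketch of that exchange is shown below; the workspace URL, client ID, and client secret are placeholder values you would replace with your own.

```python
import requests

# Placeholders - replace with your own workspace URL and service principal credentials.
WORKSPACE_URL = "https://your-workspace.cloud.databricks.com"
CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"

# Databricks OAuth machine-to-machine (client credentials) token endpoint.
resp = requests.post(
    f"{WORKSPACE_URL}/oidc/v1/token",
    auth=(CLIENT_ID, CLIENT_SECRET),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
resp.raise_for_status()
access_token = resp.json()["access_token"]
print("Token acquired, expires in", resp.json().get("expires_in"), "seconds")
```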
Configuration Steps for Run SQL Command:
Connect to your Databricks workspace following the steps in the "How to Connect to Your Databricks Workspace" section.
In the 'Warehouse Path' field, enter the HTTP path to your SQL warehouse. This typically looks like: "/sql/warehouses/abc12345"
In the 'Query' field, enter the SQL statement you want to execute. You can run any SQL command supported by your Databricks SQL endpoint.
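As an illustration of what this action does, the sketch below submits a statement to the Databricks SQL Statement Execution API, using the warehouse ID taken from the end of the warehouse HTTP path. The access token, warehouse path, and query are placeholders; ZBrain Flow makes the equivalent call for you.

```python
import requests

WORKSPACE_URL = "https://your-workspace.cloud.databricks.com"
ACCESS_TOKEN = "..."  # placeholder: token obtained via the Databricks connection

# The warehouse ID is the last segment of the warehouse HTTP path.
warehouse_id = "/sql/warehouses/abc12345".rstrip("/").split("/")[-1]

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={
        "warehouse_id": warehouse_id,
        "statement": "SELECT order_id, amount FROM sales.orders LIMIT 10",  # example query
        "wait_timeout": "30s",  # wait synchronously for up to 30 seconds
    },
)
resp.raise_for_status()
result = resp.json()
print(result["status"]["state"])                    # e.g. SUCCEEDED
print(result.get("result", {}).get("data_array"))   # returned rows, if any
```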
Configuration Steps for Create Databricks Job:
Connect to your Databricks workspace following the steps in the "How to Connect to Your Databricks Workspace" section.
From the 'Task Types' dropdown, select the kind of processing you want to perform:
Notebook Task – to run a Databricks notebook
Python Wheel Task – to execute Python code packaged as wheel files
In the 'Job Name' field, enter a descriptive name for your job.
In the 'Cluster ID' field, optionally specify an existing cluster to run the job. Leave empty for Databricks to create a job cluster automatically.
In the 'Cron Schedule' field, optionally enter a Quartz cron expression (e.g., 0 0 6 * * ? for a daily run at 6:00 AM) to schedule recurring job runs.
From the 'Timezone' dropdown, select the time zone for scheduled job execution. This affects when cron-scheduled jobs will run.
In the 'Max Concurrent Runs' field, set the maximum number of job instances that can run simultaneously. The default is 1, which prevents multiple instances of the same job from running at once.
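Taken together, these fields map onto a Databricks Jobs API 2.1 job definition roughly like the Python sketch below. The notebook path, cluster ID, schedule, and token are placeholder values, and ZBrain Flow assembles and submits the payload on your behalf.

```python
import requests

WORKSPACE_URL = "https://your-workspace.cloud.databricks.com"
ACCESS_TOKEN = "..."  # placeholder

job_definition = {
    "name": "nightly-sales-refresh",              # 'Job Name'
    "max_concurrent_runs": 1,                     # 'Max Concurrent Runs'
    "schedule": {                                 # optional 'Cron Schedule' + 'Timezone'
        "quartz_cron_expression": "0 0 6 * * ?",
        "timezone_id": "UTC",
    },
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Repos/analytics/refresh_sales"},  # Notebook Task
            "existing_cluster_id": "0123-456789-abcde123",  # 'Cluster ID'; omit to let Databricks create a job cluster
        }
    ],
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=job_definition,
)
resp.raise_for_status()
print("Created job with ID:", resp.json()["job_id"])
```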
To get a job status, first connect to your Databricks workspace following the steps in the "How to Connect to Your Databricks Workspace" section. Then, provide the ID of the job whose status you want to check.
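Under the hood, checking a job's status corresponds to listing its recent runs through the Databricks Jobs API. A minimal sketch, with placeholder token and job ID, looks like this:

```python
import requests

WORKSPACE_URL = "https://your-workspace.cloud.databricks.com"
ACCESS_TOKEN = "..."  # placeholder
JOB_ID = 123456       # placeholder job ID

# Fetch the most recent run of the job and inspect its state.
resp = requests.get(
    f"{WORKSPACE_URL}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={"job_id": JOB_ID, "limit": 1},
)
resp.raise_for_status()
runs = resp.json().get("runs", [])
if runs:
    state = runs[0]["state"]
    print(state["life_cycle_state"], state.get("result_state"))
```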
To run a job, first connect to your Databricks workspace following the steps in the "How to Connect to Your Databricks Workspace" section. Next, specify the ID of the job you want to run.
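Triggering a job maps to the Jobs API run-now call; the sketch below, using the same placeholder token and job ID, shows the equivalent request and captures the run ID for later status checks.

```python
import requests

WORKSPACE_URL = "https://your-workspace.cloud.databricks.com"
ACCESS_TOKEN = "..."  # placeholder
JOB_ID = 123456       # placeholder job ID

# Start a new run of the job.
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"job_id": JOB_ID},
)
resp.raise_for_status()
print("Started run:", resp.json()["run_id"])
```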