Databricks
The Databricks piece in ZBrain Flow provides integration with Databricks' unified data analytics platform, allowing you to execute SQL queries and manage data processing jobs directly from your workflows. This powerful connector enables you to interact with Databricks workspaces without manual intervention. With Databricks integration, you can build automations that run data transformations, query data lakes, monitor job executions, and incorporate big data processing into your business processes. This piece is ideal for data teams looking to automate their analytics workflows, integrate data processing with other business systems, and create consistent, repeatable data pipelines.
Click on the '+' button in the Flow and search for Databricks.
Select Databricks.
Decide on the action you need, then select it. ZBrain Flow provides several options:
Run SQL Command – Execute SQL queries against Databricks warehouses.
Create Databricks Job – Define and configure new data processing jobs.
Get Job Status – Monitor the execution status of Databricks jobs.
Run Job – Trigger the execution of Databricks jobs.
Before using any Databricks actions in ZBrain Flow, you'll need to set up a connection to your Databricks environment. This is a one-time setup that will allow you to access your analytics platform securely.
To create your Databricks connection:
From any Databricks action, click on the connection dropdown and select 'Create connection'.
In the popup window that appears, you'll need to:
Enter a descriptive 'Connection Name' to identify this Databricks connection
In the 'Instance Name' field, enter your Databricks workspace URL (e.g., https://your-workspace.cloud.databricks.com)
From the 'Grant Type' dropdown, select 'Client Credentials' as the authorization method
In the 'Client Id' field, enter your Databricks application client ID
In the 'Client Secret' field, enter your Databricks application client secret
Click 'Save' to store this connection
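For reference, the 'Client Credentials' grant type corresponds to Databricks' standard OAuth machine-to-machine token exchange, which ZBrain Flow performs for you behind the scenes. A minimal Python sketch of that exchange is shown below; the workspace URL, client ID, and client secret are placeholder values you would replace with your own.

```python
import requests

# Placeholders - replace with your own workspace URL and service principal credentials.
WORKSPACE_URL = "https://your-workspace.cloud.databricks.com"
CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"

# Databricks OAuth machine-to-machine (client credentials) token endpoint.
resp = requests.post(
    f"{WORKSPACE_URL}/oidc/v1/token",
    auth=(CLIENT_ID, CLIENT_SECRET),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
resp.raise_for_status()
access_token = resp.json()["access_token"]
print("Token acquired, expires in", resp.json().get("expires_in"), "seconds")
```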
Configuration Steps for Run SQL Command:
Connect to your Databricks workspace following the steps in the "How to Connect to Your Databricks Workspace" section.
In the 'Warehouse Path' field, enter the HTTP path to your SQL warehouse. This typically looks like: "/sql/warehouses/abc12345"
In the 'Query' field, enter the SQL statement you want to execute. You can run any SQL command supported by your Databricks SQL endpoint.
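As an illustration of what this action does, the sketch below submits a statement to the Databricks SQL Statement Execution API, using the warehouse ID taken from the end of the warehouse HTTP path. The access token, warehouse path, and query are placeholders; ZBrain Flow makes the equivalent call for you.

```python
import requests

WORKSPACE_URL = "https://your-workspace.cloud.databricks.com"
ACCESS_TOKEN = "..."  # placeholder: token obtained via the Databricks connection

# The warehouse ID is the last segment of the warehouse HTTP path.
warehouse_id = "/sql/warehouses/abc12345".rstrip("/").split("/")[-1]

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={
        "warehouse_id": warehouse_id,
        "statement": "SELECT order_id, amount FROM sales.orders LIMIT 10",  # example query
        "wait_timeout": "30s",  # wait synchronously for up to 30 seconds
    },
)
resp.raise_for_status()
result = resp.json()
print(result["status"]["state"])                    # e.g. SUCCEEDED
print(result.get("result", {}).get("data_array"))   # returned rows, if any
```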
Configuration Steps for Create Databricks Job:
Connect to your Databricks workspace following the steps in the "How to Connect to Your Databricks Workspace" section.
From the 'Task Types' dropdown, select the kind of processing you want to perform:
Notebook Task – to run a Databricks notebook
Python Wheel Task – to execute Python code packaged as wheel files
In the 'Job Name' field, enter a descriptive name for your job.
In the 'Cluster ID' field, optionally specify an existing cluster to run the job. Leave empty for Databricks to create a job cluster automatically.
In the 'Cron Schedule' field, optionally enter a Quartz cron expression (e.g., 0 0 6 * * ? for a daily run at 6:00 AM) to schedule recurring job runs.
From the 'Timezone' dropdown, select the time zone for scheduled job execution. This affects when cron-scheduled jobs will run.
In the 'Max Concurrent Runs' field, set the maximum number of job instances that can run simultaneously. The default is 1, which prevents multiple instances of the same job from running at once.
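Taken together, these fields map onto a Databricks Jobs API 2.1 job definition roughly like the Python sketch below. The notebook path, cluster ID, schedule, and token are placeholder values, and ZBrain Flow assembles and submits the payload on your behalf.

```python
import requests

WORKSPACE_URL = "https://your-workspace.cloud.databricks.com"
ACCESS_TOKEN = "..."  # placeholder

job_definition = {
    "name": "nightly-sales-refresh",              # 'Job Name'
    "max_concurrent_runs": 1,                     # 'Max Concurrent Runs'
    "schedule": {                                 # optional 'Cron Schedule' + 'Timezone'
        "quartz_cron_expression": "0 0 6 * * ?",
        "timezone_id": "UTC",
    },
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Repos/analytics/refresh_sales"},  # Notebook Task
            "existing_cluster_id": "0123-456789-abcde123",  # 'Cluster ID'; omit to let Databricks create a job cluster
        }
    ],
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=job_definition,
)
resp.raise_for_status()
print("Created job with ID:", resp.json()["job_id"])
```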
To get a job status, first connect to your Databricks workspace following the steps in the "How to Connect to Your Databricks Workspace" section. Then, provide the ID of the job whose status you want to check.
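Under the hood, checking a job's status corresponds to listing its recent runs through the Databricks Jobs API. A minimal sketch, with placeholder token and job ID, looks like this:

```python
import requests

WORKSPACE_URL = "https://your-workspace.cloud.databricks.com"
ACCESS_TOKEN = "..."  # placeholder
JOB_ID = 123456       # placeholder job ID

# Fetch the most recent run of the job and inspect its state.
resp = requests.get(
    f"{WORKSPACE_URL}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={"job_id": JOB_ID, "limit": 1},
)
resp.raise_for_status()
runs = resp.json().get("runs", [])
if runs:
    state = runs[0]["state"]
    print(state["life_cycle_state"], state.get("result_state"))
```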
To run a job, first connect to your Databricks workspace following the steps in the "How to Connect to Your Databricks Workspace" section. Next, specify the ID of the job you want to run.
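Triggering a job maps to the Jobs API run-now call; the sketch below, using the same placeholder token and job ID, shows the equivalent request and captures the run ID for later status checks.

```python
import requests

WORKSPACE_URL = "https://your-workspace.cloud.databricks.com"
ACCESS_TOKEN = "..."  # placeholder
JOB_ID = 123456       # placeholder job ID

# Start a new run of the job.
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"job_id": JOB_ID},
)
resp.raise_for_status()
print("Started run:", resp.json()["run_id"])
```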