Connectors
RTDIP SDK provides functionality to connect to and query its data using connectors. Below is a list of the available connectors.
ODBC
Databricks SQL Connector
Enables connectivity to Databricks using the Databricks SQL Connector, which does not require any ODBC installation.
For more information, refer to this documentation; for the specific implementation within the RTDIP SDK, refer to this link.
from rtdip_sdk.connectors import DatabricksSQLConnection

# Connection details for your Databricks SQL Warehouse or Cluster
server_hostname = "server_hostname"
http_path = "http_path"
access_token = "token"

connection = DatabricksSQLConnection(server_hostname, http_path, access_token)
Replace server_hostname, http_path and access_token with your own information.
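Once a connection is created, it can be used to run SQL queries. The following is a minimal sketch, assuming the connector exposes RTDIP's cursor interface (cursor, execute, fetch_all, close) and using a hypothetical table named sensors; the same pattern applies to the PYODBC, TURBODBC and Spark connectors below.

# Open a cursor, run a query and fetch the results
cursor = connection.cursor()
cursor.execute("SELECT * FROM sensors LIMIT 10")  # "sensors" is a hypothetical table name
results = cursor.fetch_all()  # assumes the RTDIP cursor's fetch_all method
cursor.close()
connection.close()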
PYODBC SQL Connector
PyODBC is a popular Python package for querying data using ODBC. Refer to its documentation for more information about pyodbc, how to install it and how you can leverage it in your code.
Warning
The RTDIP SDK does not specify pyodbc as one of its package dependencies. It will need to be installed into your environment separately, for example with pip install pyodbc.
View information about how pyodbc is implemented in the RTDIP SDK here.
from rtdip_sdk.connectors import PYODBCSQLConnection

server_hostname = "server_hostname"
http_path = "http_path"
access_token = "token"
# Default location of the Simba Spark ODBC driver on macOS; adjust for your system
driver_path = "/Library/simba/spark/lib/libsparkodbc_sbu.dylib"

connection = PYODBCSQLConnection(driver_path, server_hostname, http_path, access_token)
Replace driver_path, server_hostname, http_path and access_token with your own information.
TURBODBC SQL Connector
Turbodbc is a powerful Python ODBC package with advanced options for query performance. Find out more about installing it on your operating system and what Turbodbc can do here, and refer to this documentation for more information about how it is implemented in the RTDIP SDK.
Warning
The RTDIP SDK does not specify turbodbc as one of its package dependencies. It will need to be installed into your environment separately, for example with pip install turbodbc.
from rtdip_sdk.connectors import TURBODBCSQLConnection
server_hostname = "server_hostname"
http_path = "http_path"
access_token = "token"
connection = TURBODBCSQLConnection(server_hostname, http_path, access_token)
Replace server_hostname, http_path and access_token with your own information.
Spark
Spark Connector
The Spark Connector enables querying of data using a Spark Session. This is useful for querying local instances of Spark or Delta. However, the most useful application of this connector is to leverage Spark Connect to connect to a remote Spark Cluster, which provides the compute for queries run from a local machine.
from rtdip_sdk.connectors import SparkConnection

spark_server = "spark_server"
access_token = "my_token"
# Spark Connect remote URL in the form sc://<host>:<port>;token=<token>
spark_remote = "sc://{}:443;token={}".format(spark_server, access_token)

connection = SparkConnection(spark_remote=spark_remote)
Replace spark_server and access_token with your own server and authentication token.
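The connector can also be used against a local instance of Spark or Delta, as mentioned above. The following is a minimal sketch, assuming SparkConnection creates or reuses a local Spark Session when no spark_remote is supplied (check the SDK reference for the exact constructor signature), and using a hypothetical table name.

from rtdip_sdk.connectors import SparkConnection

# No spark_remote supplied: assumes the connector falls back to a local Spark Session
connection = SparkConnection()

cursor = connection.cursor()
cursor.execute("SELECT * FROM sensors LIMIT 10")  # "sensors" is a hypothetical table name
results = cursor.fetch_all()  # assumes the RTDIP cursor's fetch_all method
cursor.close()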
LLMs
Chat Open AI Databricks Connector
The Chat Open AI Databricks Connector enables querying of Databricks data using ChatGPT.
Warning
This is experimental and you will likely experience variable responses to your questions depending on the complexity of the data you use in this setup. Start small, with only 2-3 tables, before scaling up.
from rtdip_sdk.connectors import ChatOpenAIDatabricksConnection
agent = ChatOpenAIDatabricksConnection(
    catalog="<databricks catalog>",
    schema="<databricks schema>",
    server_hostname="<databricks host name>",
    http_path="<databricks http path>",
    access_token="<Azure AD token or databricks PAT token>",
    openai_api_key="<Open AI API key>",
    openai_model="gpt-4",
    sample_rows_in_table_info=5,
    verbose_logging=True,
)
response = agent.run("What was the average actual power generated by Turbine 1 at ACME Wind Farm on 6 May?")
print(response)
Some notes on the above:

- server_hostname and http_path can be obtained from your Databricks SQL Warehouse or Databricks Cluster.
- access_token can be either a Databricks PAT Token or an Azure AD Token. To obtain an Azure AD token, please refer to this documentation.
- openai_model defaults to gpt-4, but gpt-4 is not easily available at the time of writing. Alternatively, gpt-3.5-turbo-16k-0613 has worked well in our tests.
- sample_rows_in_table_info limits the number of rows queried in a table when the SQL Database Agent is looking for context in the data. Be careful not to increase this too much, as it is then possible to exceed the token limits of the GPT models.