Connectors
RTDIP SDK provides functionality to connect to and query its data using connectors. Below is a list of the available connectors.
ODBC
Databricks SQL Connector
Enables connectivity to Databricks using the Databricks SQL Connector, which does not require any ODBC installation.
For more information, refer to this documentation; for the specific implementation within the RTDIP SDK, refer to this link.
from rtdip_sdk.connectors import DatabricksSQLConnection

# Connection details for your Databricks SQL Warehouse or Cluster
server_hostname = "server_hostname"
http_path = "http_path"
access_token = "token"

connection = DatabricksSQLConnection(server_hostname, http_path, access_token)
Replace server_hostname, http_path and access_token with your own information.
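Once a connection is created, it can be used to run SQL queries. The following is a minimal sketch, assuming the connector exposes RTDIP's cursor interface (cursor, execute, fetch_all, close) and using a hypothetical table named sensors; the same pattern applies to the PYODBC, TURBODBC and Spark connectors below.

# Open a cursor, run a query and fetch the results
cursor = connection.cursor()
cursor.execute("SELECT * FROM sensors LIMIT 10")  # "sensors" is a hypothetical table name
results = cursor.fetch_all()  # assumes the RTDIP cursor's fetch_all method
cursor.close()
connection.close()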
PYODBC SQL Connector
PyODBC is a popular Python package for querying data using ODBC. Refer to its documentation for more information about pyodbc, how to install it and how you can leverage it in your code.
Warning
The RTDIP SDK does not specify pyodbc as one of its package dependencies. It will need to be installed into your environment separately, for example with pip install pyodbc.
View information about how pyodbc is implemented in the RTDIP SDK here.
from rtdip_sdk.connectors import PYODBCSQLConnection

server_hostname = "server_hostname"
http_path = "http_path"
access_token = "token"
# Default location of the Simba Spark ODBC driver on macOS; adjust for your system
driver_path = "/Library/simba/spark/lib/libsparkodbc_sbu.dylib"

connection = PYODBCSQLConnection(driver_path, server_hostname, http_path, access_token)
Replace driver_path, server_hostname, http_path and access_token with your own information.
TURBODBC SQL Connector
Turbodbc is a powerful Python ODBC package with advanced options for query performance. Find out more about installing it on your operating system and what Turbodbc can do here, and refer to this documentation for more information about how it is implemented in the RTDIP SDK.
Warning
The RTDIP SDK does not specify turbodbc as one of its package dependencies. It will need to be installed into your environment separately, for example with pip install turbodbc.
from rtdip_sdk.connectors import TURBODBCSQLConnection
server_hostname = "server_hostname"
http_path = "http_path"
access_token = "token"
connection = TURBODBCSQLConnection(server_hostname, http_path, access_token)
Replace server_hostname, http_path and access_token with your own information.
Spark
Spark Connector
The Spark Connector enables querying of data using a Spark Session. This is useful for querying local instances of Spark or Delta. However, the most useful application of this connector is to leverage Spark Connect to connect to a remote Spark Cluster, which provides the compute for queries run from a local machine.
from rtdip_sdk.connectors import SparkConnection

spark_server = "spark_server"
access_token = "my_token"
# Spark Connect remote URL in the form sc://<host>:<port>;token=<token>
spark_remote = "sc://{}:443;token={}".format(spark_server, access_token)

connection = SparkConnection(spark_remote=spark_remote)
Replace spark_server and access_token with your own server and authentication token.
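The connector can also be used against a local instance of Spark or Delta, as mentioned above. The following is a minimal sketch, assuming SparkConnection creates or reuses a local Spark Session when no spark_remote is supplied (check the SDK reference for the exact constructor signature), and using a hypothetical table name.

from rtdip_sdk.connectors import SparkConnection

# No spark_remote supplied: assumes the connector falls back to a local Spark Session
connection = SparkConnection()

cursor = connection.cursor()
cursor.execute("SELECT * FROM sensors LIMIT 10")  # "sensors" is a hypothetical table name
results = cursor.fetch_all()  # assumes the RTDIP cursor's fetch_all method
cursor.close()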
LLMs
Chat Open AI Databricks Connector
The Chat Open AI Databricks Connector enables querying of Databricks data using ChatGPT.
Warning
This is experimental and you will likely experience variable responses to your questions depending on the complexity of the data you use in this setup. Start small, with only 2-3 tables, before scaling up.
from rtdip_sdk.connectors import ChatOpenAIDatabricksConnection
agent = ChatOpenAIDatabricksConnection(
    catalog="<databricks catalog>",
    schema="<databricks schema>",
    server_hostname="<databricks host name>",
    http_path="<databricks http path>",
    access_token="<Azure AD token or databricks PAT token>",
    openai_api_key="<Open AI API key>",
    openai_model="gpt-4",
    sample_rows_in_table_info=5,
    verbose_logging=True,
)
response = agent.run("What was the average actual power generated by Turbine 1 at ACME Wind Farm on 6 May?")
print(response)
Some notes on the above:

- server_hostname and http_path can be obtained from your Databricks SQL Warehouse or Databricks Cluster.
- access_token can be either a Databricks PAT Token or an Azure AD Token. To obtain an Azure AD token, please refer to this documentation.
- openai_model defaults to gpt-4, but gpt-4 is not easily available at the time of writing. Alternatively, gpt-3.5-turbo-16k-0613 has worked well in our tests.
- sample_rows_in_table_info limits the number of rows queried in a table when the SQL Database Agent is looking for context in the data. Be careful not to increase this too much, as it is then possible to exceed the token limits of the GPT models.