Write to Delta
SparkDeltaDestination
Bases: DestinationInterface
The Spark Delta Destination is used to write data to a Delta table.
Examples
#Delta Destination for Streaming Queries
from rtdip_sdk.pipelines.destinations import SparkDeltaDestination

delta_destination = SparkDeltaDestination(
    data=df,
    options={
        "checkpointLocation": "/{CHECKPOINT-LOCATION}/"
    },
    destination="DELTA-TABLE-PATH",
    mode="append",
    trigger="10 seconds",
    query_name="DeltaDestination",
    query_wait_interval=None
)

delta_destination.write_stream()
#Delta Destination for Batch Queries
from rtdip_sdk.pipelines.destinations import SparkDeltaDestination

delta_destination = SparkDeltaDestination(
    data=df,
    options={
        "overwriteSchema": True
    },
    destination="DELTA-TABLE-PATH",
    mode="append",
    trigger="10 seconds",
    query_name="DeltaDestination",
    query_wait_interval=None
)

delta_destination.write_batch()
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | DataFrame to be written to Delta | required |
options | dict | Options for the Delta write operation (see the Attributes table below) | required |
destination | str | Either the name of the Hive Metastore or Unity Catalog Delta table, or the path to the Delta table | required |
mode | optional str | Method of writing to the Delta table: append/overwrite (batch), append/update/complete (stream) | 'append' |
trigger | optional str | Frequency of the write operation (stream). Specify "availableNow" to execute a trigger once; otherwise specify a time period such as "30 seconds" or "5 minutes". Set to "0 seconds" if you do not want to use a trigger | '10 seconds' |
query_name | optional str | Unique name for the query in the associated SparkSession (stream) | 'DeltaDestination' |
query_wait_interval | optional int | If set, waits for the streaming query to complete before returning (stream) | None |
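To make the `trigger` values concrete, the following is a minimal sketch of how such a string could be translated into keyword arguments for PySpark's `DataStreamWriter.trigger()`. The helper name `trigger_kwargs` is hypothetical and this is not necessarily how the SDK does it internally; it only assumes the two forms described in the table.

```python
def trigger_kwargs(trigger: str) -> dict:
    """Map a trigger string to keyword arguments for DataStreamWriter.trigger().

    Hypothetical helper for illustration; not part of rtdip_sdk.
    """
    if trigger == "availableNow":
        # Process everything currently available, then stop.
        return {"availableNow": True}
    # Any other value is treated as a processing-time interval,
    # for example "10 seconds" or "5 minutes".
    return {"processingTime": trigger}


print(trigger_kwargs("availableNow"))
print(trigger_kwargs("10 seconds"))
```

With a streaming DataFrame, the result could be applied as `df.writeStream.trigger(**trigger_kwargs("10 seconds"))`.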
Attributes:
Name | Type | Description |
---|---|---|
checkpointLocation | str | Path to checkpoint files. (Streaming) |
txnAppId | str | A unique string that you can pass on each DataFrame write. (Batch & Streaming) |
txnVersion | str | A monotonically increasing number that acts as transaction version. (Batch & Streaming) |
maxRecordsPerFile | int or str | Specify the maximum number of records to write to a single file for a Delta Lake table. (Batch) |
replaceWhere | str | Condition(s) for overwriting. (Batch) |
partitionOverwriteMode | str | When set to dynamic, overwrites all existing data in each logical partition for which the write will commit new data. Default is static. (Batch) |
overwriteSchema | bool or str | If True, overwrites the schema as well as the table data. (Batch) |
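As an illustration of how `txnAppId` and `txnVersion` fit together, here is a small sketch that builds an `options` dict for a write that is safe to retry. Delta Lake uses this pair to skip a write it has already committed for the same application id. The helper name `idempotent_write_options` and the example values are assumptions made for this sketch, not part of the SDK.

```python
from typing import Optional


def idempotent_write_options(
    app_id: str, version: int, checkpoint_location: Optional[str] = None
) -> dict:
    """Build an options dict carrying txnAppId and txnVersion.

    Hypothetical helper for illustration; not part of rtdip_sdk.
    """
    if version < 0:
        raise ValueError("version must be a non-negative, increasing number")
    # txnVersion is passed as a string, matching the type in the table above.
    options = {"txnAppId": app_id, "txnVersion": str(version)}
    if checkpoint_location is not None:
        # Only streaming writes need a checkpoint location.
        options["checkpointLocation"] = checkpoint_location
    return options


# "nightly-load" and 42 are made-up example values.
print(idempotent_write_options("nightly-load", 42))
```

The resulting dict can be passed as the `options` argument of `SparkDeltaDestination`; the caller is responsible for increasing the version on each new write.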
Source code in src/sdk/python/rtdip_sdk/pipelines/destinations/spark/delta.py
system_type()
staticmethod
Attributes:
Name | Type | Description |
---|---|---|
SystemType | Environment | Requires PYSPARK |
Source code in src/sdk/python/rtdip_sdk/pipelines/destinations/spark/delta.py
write_batch()
Writes batch data to Delta. Most of the options provided by the Apache Spark DataFrame write API are supported for performing batch writes on tables.
Source code in src/sdk/python/rtdip_sdk/pipelines/destinations/spark/delta.py
write_stream()
Writes streaming data to Delta. Exactly-once processing is guaranteed.
Source code in src/sdk/python/rtdip_sdk/pipelines/destinations/spark/delta.py
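The effect of `query_wait_interval` can be sketched as a polling loop: while the streaming query is active, sleep for the interval and check again. The code below is an assumption about the general pattern, not a copy of the SDK source. `FakeQuery` is a hypothetical stand-in that mimics only the `isActive` property of a PySpark `StreamingQuery`, so the sketch runs without Spark.

```python
import time


class FakeQuery:
    """Hypothetical stand-in for a StreamingQuery.

    Reports itself as active for a fixed number of checks, then inactive.
    """

    def __init__(self, active_checks: int):
        self._remaining = active_checks

    @property
    def isActive(self) -> bool:
        # Each check consumes one of the remaining "active" answers.
        self._remaining -= 1
        return self._remaining >= 0


def wait_for_query(query, query_wait_interval):
    """Block until the query stops, polling every query_wait_interval seconds.

    Returns the number of polls made; returns 0 without waiting when the
    interval is None.
    """
    polls = 0
    if query_wait_interval is None:
        return polls
    while query.isActive:
        polls += 1
        time.sleep(query_wait_interval)
    return polls


print(wait_for_query(FakeQuery(3), 0))
```

With `query_wait_interval=None` the call returns immediately, which corresponds to starting the stream and handing control back to the caller.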