Note
In response to Microsoft’s retirement of Azure Data Lake Storage (ADLS) Gen 1 on February 29, 2024, Syniti Replicate discontinues support for using ADLS Gen 1 for replication services. Refer to the retirement announcement here.
If you are using ADLS Gen 1, Syniti recommends to use ADLS Gen 2.
Syniti Replicate supports the use of Microsoft ADLS Gen 2 for:
Microsoft ADLS Gen 2
This topic describes how to set up your Syniti Replicate environment to replicate data to a Microsoft Azure Data Lake Storage (ADLS) Gen2 as target connection. The replicated data is stored in flat file format in ADLS Gen2. When replicating from a relational database to ADLS Gen2, Syniti Replicate allows you to replicate data in two different modes:
Refresh: A one-time complete replication from source table to ADLS Gen2, according to replication settings and scripts. You can schedule the replication, identify the columns to be replicated, and add scripts to transform data during replication.
One-Way Mirroring: A continuous update of a replicated table to ADLS Gen2 based on changes to the source database that have been recorded in the database server log. You can define the replication settings to check the transaction log on the source database at regular intervals. Any changes found in the log would be applied to the target.
Prerequisites
For ADLS Gen2 targets, the .NET provider and plugin are included with the Syniti Replicate installation package.
Ensure that you create an application ID in Azure Active Directory, this application ID must have write access to the specified ADLS Gen2 target container.
In the Access Control (IAM) settings for the ADLS Gen2 file system, assign the Storage Blob Data Contributor permissions to the role used in Syniti Replicate.
Make sure you have database connections via a .NET data provider to the supported databases for source database tables.
For Parquet Serialization, complete the following steps:
Download the Parquet .Net v4.10.1 provider from the Nuget site.
Create a local directory to deploy the required files (for example, C:\tmp\adls) and save those files in it.
From the command prompt with admin permissions, execute the below command. Make sure to execute this command from the right directory.
nuget install Parquet.Net -Version 4.10.1 -Framework net480 -OutputDirectory C:\tmp\adlsThis command will download the entire provider into the specified output directory.
Copy the below files from the downloaded packages and paste it into Replicate’s Plugins\ParquetSerializer directory. For example, C:\Program Files\Syniti\Syniti Replicate\Plugins\ParquetSerializer.
D:\Temp\Parquet\Apache.Arrow.2.0.0\lib\netcoreapp1.3\Apache.Arrow.dll
D:\Temp\Parquet\IronCompress.1.3.0\lib\netstandard2.0\IronCompress.dll
D:\Temp\Parquet\Microsoft.Data.Analysis.0.20.1\lib\netstandard2.0\Microsoft.Data.Analysis.dll
D:\Temp\Parquet\Microsoft.IO.RecyclableMemoryStream.2.3.2\lib\netstandard2.0\Microsoft.IO.RecyclableMemoryStream.dll
D:\Temp\Parquet\Microsoft.ML.DataView.2.0.1\lib\netstandard2.0\Microsoft.ML.DataView.dll
D:\Temp\Parquet\Parquet.Net.4.10.1\lib\netstandard2.0\Parquet.dll
D:\Temp\Parquet\Snappier.1.0.0\lib\netstandard2.0\Snappier.dll
D:\Temp\Parquet\System.Buffers.4.5.1\lib\netstandard2.0\System.Buffers.dll
D:\Temp\Parquet\System.Collections.Immutable.1.5.0\lib\netstandard2.0\System.Collections.Immutable.dll
D:\Temp\Parquet\System.Memory.4.5.5\lib\netstandard2.0\System.Memory.dll
D:\Temp\Parquet\System.Numerics.Vectors.4.5.0\lib\netstandard2.0\System.Numeric.Vectors.dll
D:\Temp\Parquet\System.Runtime.CompilerServices.Unsafe.6.0.0\lib\netstandard2.0\System.Runtime.CompilerServices.Unsafe.dll
D:\Temp\Parquet\System.Threading.Tasks.Extensions.4.5.4\lib\netstandard2.0\System.Threading.Tasks.Extensions.dll
Note
It is possible that the packages are a higher version than mentioned above. For example, instead of System.Threading.Tasks.Extensions.4.5.4, the package can be System.Threading.Tasks.Extensions.4.5.5. In this case, copy the version that has been downloaded.
Limitations
Azure Data Lake is supported only as target system in Refresh and Mirroring mode.
Access via Proxy is not supported. Syniti Replicate needs to connect directly with Azure Data Lake.
Data Encryption is not supported for Azure Data Lake.
Data Masking is not supported in Syniti Replicate.
Key Vault is not supported in Syniti Replicate.
Database and schema names containing special characters like slash or backslash are not supported.
Configure ADLS Gen2 in Syniti Replicate
Using ADLS Gen2, you can:
Set a replication using Refresh Mode, or
Set a replication using One-way Mirroring Mode
If your replication is Mirroring mode (Change Data Capture), you can define your Transactional Strategy.
Set Up a Target Connection to ADLS Gen2
This section assumes that you have selected the Enable Transactional Replication wizard option to launch the Add Target Connection wizard. If not, to open the wizard:
In the Management Center, select Targets in the Metadata Explorer.
Right-click and select Add New Connection. The Add Target Connection Wizard is displayed.
To set up a Target Connection to ADLS Gen2:
In the Add Target Connection Wizard, select the Microsoft Azure Data Lake Gen2 option from the Provider field.
Click Next to move to the Set connection string page.
Set the following values to the respective fields accordingly:
Field Name | Description |
Output Folder | The location of the schema for config files for the ADLS Gen2 objects |
Client Id | Client ID in GUID format that identifies the client application. (Application added in App registrations in the Active Directory) |
Tenant Id | TenantID (Directory) in grid format representing the Active Directory Tenant. |
Client Secret | Client secret token used from the client application. It can be generated by clicking Certificates & Secrets for the App added in App registrations. |
Storage Account Name | The name of the ADLS Gen2 account used to load data. |
Target Filesystem Name | Target FileSystem or Container name to use in loading data in ADLS Gen2. |
Optional Properties
Set additional properties for the following fields accordingly:
Field Name | Description |
Use Column Names | Determinates if the output CSV files generated during replication will contain the list of column names in the first row. |
Add Transactional Info | Determinates if the output CSV files generated during replication will contain transactional info columns (operation type, transaction id, transaction timestamp, transaction user, and so on) and if these information will appear at the beginning of the record. |
Transactional Column Prefix | Prefix to add to transactional fields (if selected). This option will allow using a custom format for the transactional fields to distinguish them from the data fields |
Serialization | Define the type of serialization required for the Azure Data Lake:
|
Column Separator | [CSV Only] Indicates the character used as column separator in the CSV file.
|
Column Escape | [CSV Only] Indicates the escape character used for the column separator. Default backslash (\). Editable value. |
String Separator | [CSV Only] Indicates the character used as string separator in the CSV file. Default quotation marks ("). Editable value. |
String Escape | [CSV Only] Indicates the escape character used for the string separator. Default backslash (\). Editable value. |
Compression | [Parquet Only] The Parquet compression method used.
|
Use Nullable Fields | [Parquet Only] Allow Nullable fields. |
Use DateTime Fields | [Parquet Only] Allow DataTime fields. |
Use One File Per Group | Indicates if each replication group generates a common CSV file or one file for each replication in the group. |
Extended Properties | Additional properties not listed in the grid. Enter the properties as a list of "property=value" string separated by (;). |
File Structure
Data files replicated to ADLS Gen2 using mirroring have the following structure. Files have a .mir extension when used for mirroring and a .ref extension when used for snapshot or refresh replications.
Column | Value |
1 | Sequence number |
2 | Timestamp of transaction |
3 | User, if provided by the database |
4 | Table name |
5 | Operation code – one of I (Insert), U (Update), or D (Delete) |
6 | Previous primary key value for the database column. This is used for update operations to identify the correct record |
7 | Next primary key value. This is used for update operations to identify the correct record in cases where the primary key is changed during the update operation |
8 | Database column value |
Example File
Example filename: corp_20150115T161123155064_1.mir
10,2015-01-22 15:32:39.000000,,"glb","I",,2,"test insert"
11,2015-01-22 15:32:40.000000,," glb ","D",10,,
12,2015-01-22 15:32:42.000000,," glb ","U",104,104,"my updated"
13,2015-01-22 15:32:44.000000,," glb ","U",1,12,"sd"Partitions for Refresh Replications
Use this dialog box to set up partitions for refresh replications to improve replication efficiency.
Refresh Partitions allow you to define multiple partitions or subsets within a replication during the refresh operation, so that multiple parallel threads move a specific subset of the data.
Syniti Replicate will create same number of files according the number of partitions defined in the source system, for example, if the user creates 10 partitions per table, Syniti Replicate will create 10 CSV or Parquet files into the directory folder in ADLS Gen2.
Data Type Mapping
When ADLS Gen2 is used as a target system, the mapping of Syniti Replicate data types to the corresponding data type in ADLS Gen2 is listed in the following table:
Syniti Replicate Mapping | Azure Data Lake |
AutoDecimal | decimal |
AutoInteger | integer |
AutoNumeric | decimal |
AutoSmallint | smallint |
AutoTinyint | tinyint |
Bigint | bigint |
Binary | binary |
Bit | bit |
Blob | binary |
Char | string |
Clob | string |
Date | date |
Decimal | decimal |
Double | double |
Float | float |
GUID | string |
Integer | integer |
LongVarbinary | binary |
LongVarchar | string |
Money | decimal |
NChar | string |
NText | string |
Number | decimal |
Numeric | decimal |
NVarchar | string |
Real | real |
Smalldatetime | datetime |
Smallint | smallint |
Smallmoney | decimal |
Sysname | string |
Time | time |
Timestamp | timestamp |
TimestampZ | timestamptz |
Tinyint | tinyint |
Varbinary | binary |
Varchar | string |
Xml | string |
