AWS S3 Storage
DLH.io documentation for AWS S3 Storage
Amazon S3 (Amazon Simple Storage Service) is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network. Amazon S3 can store any type of object, which enables use cases such as storage for Internet applications, backups, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage.
Our AWS S3 Storage DLH.io integration supports S3 as both a source and a target connection:
- replicates AWS S3 Storage data to your Cloud Data Warehouse target
- synchronizes to your target destination at a scheduled frequency
- replicates many connections directly to your AWS S3 storage bucket
It allows you to replicate/synchronize your S3 data files, including capturing snapshots of data at any point in time, and keep them up to date with little to no configuration effort. You don't even need to prepare the target schema: DLH.io automatically handles all the heavy lifting for you.
All you need to do is specify the connection to your S3 bucket, point to your target system (or use a DLH.io managed Data Warehouse), and DLH.io does the rest. Our support team can even set it up for you during a short technical onboarding session.
Setup Instructions
Follow the setup steps guide for configuring your AWS S3 Storage connection to enable data to flow into your destination via DLH.
Supported Features
Here are key features supported for this connector.
| Sync Feature | Supported | Details |
|---|---|---|
| Custom Data and Development | ✓ | Ability to enhance connector upon request |
| Historical Re-Load/Load | ✓ | |
| Incremental/Delta Load | ✓ | Gets most recent records and changes |
| Column Selection | ✓ | |
| Column Hashing | ✓ | |
| Re-Sync Table/Entity | ✓ | Select at the table level to reload data history (on next Sync Bridge run) |
| Custom Queries | | Utilizing SQL Data Query Connector |
| Custom Data | | |
| Captures Deleted Rows | ✓ | On all supported tables |
| API Sync Bridge Initiation | ✓ | |
| Priority Scheduling | ✓ | |
| Private VPC/Link | ✓ | |
| DLH Data Model Available | - | |
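The Column Hashing feature in the table above masks selected column values before they land in your target. As an illustration of the concept only (this is not necessarily the algorithm DLH.io uses internally), a deterministic digest keeps a value joinable across tables while hiding the raw data:

```python
import hashlib

def hash_column_value(value: str, salt: str = "") -> str:
    """Return a deterministic SHA-256 hex digest for a column value.

    Deterministic hashing masks the raw data while preserving equality:
    the same input always produces the same digest, so hashed columns
    can still be used as join keys in the target warehouse.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

digest = hash_column_value("jane.doe@example.com")
assert len(digest) == 64  # hex-encoded SHA-256 is 64 characters
assert digest == hash_column_value("jane.doe@example.com")  # deterministic
```

An optional salt (shown here as a hypothetical parameter) prevents dictionary attacks against well-known values such as email addresses.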
If you have any questions about these supported features, please reach out to our Customer Support team.
Details on Sync Processing
For this connector, we believe the sync processing is straightforward. We've provided a number of details, steps, and other guidance here and in the setup steps guide. Be sure to also check the change log and notes page from time to time for any changes.
GZIP and Compressed File Handling
The compression option supports several formats:
- GZIP
  - Can contain multiple files.
  - The archived files may be all one format or a mix, as long as their structures are compatible with the selected file type (JSON, CSV, etc.); only files matching the selected file type will be parsed.
- GZ
  - A compressed version of a single file (for example, a single JSON file).
  - If the selected file type is JSON, the archive must decompress to a single JSON file.
- ZIP
  - Handled the same way as GZIP, as described above.
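To make the GZ case above concrete, here is a minimal sketch of decompressing a `.gz` payload into a single JSON document, using only the Python standard library (illustrative only; DLH.io performs this step for you during the sync):

```python
import gzip
import io
import json

def read_gz_json(data: bytes) -> dict:
    """Decompress a .gz payload and parse it as a single JSON document."""
    with gzip.open(io.BytesIO(data), mode="rt", encoding="utf-8") as f:
        return json.load(f)

# Round-trip: compress a record, then read it back the way a GZ/JSON
# sync would see it arriving from the bucket.
record = {"id": 1, "name": "example"}
payload = gzip.compress(json.dumps(record).encode("utf-8"))
assert read_gz_json(payload) == record
```

If the decompressed content is not valid JSON of the selected file type, parsing fails, which mirrors the file-type matching rule described above.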
Control Each Column Data Type
SQL Transformations allow logic to be executed against a target connection on a scheduled frequency, or triggered by the arrival of new data on tables updated via DLH.io. This especially helps when you want to control the data types set in your Target Connection.
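The exact SQL depends on your warehouse; as a minimal sketch of the idea, the following uses sqlite3 as a stand-in target and casts a text column to a numeric type in a transformation step (table and column names are hypothetical):

```python
import sqlite3

# Stand-in target connection; in practice this is your warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT)")
conn.execute("INSERT INTO raw_orders VALUES (1, '19.99')")

# SQL transformation: materialize a typed table, controlling the
# data type of `amount` instead of accepting the loaded TEXT type.
conn.execute("""
    CREATE TABLE orders AS
    SELECT id, CAST(amount AS REAL) AS amount
    FROM raw_orders
""")

row = conn.execute("SELECT amount FROM orders").fetchone()
assert row[0] == 19.99  # now a numeric value, not the string '19.99'
```

In DLH.io you would attach equivalent SQL to the transformation schedule or to the new-data trigger, rather than running it by hand.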
Security and Other Considerations
S3 can be used as a source or a target destination. However, use it with careful consideration of the impact on other systems that may conflict. Restrictions prevent one bucket from syncing to another bucket; this integration is not meant for Big Data transfer scenarios and will fail if used that way.
The connection test uses the GET and PUT privileges to connect to and verify the S3 storage bucket. In many cases, one of the first tests checks whether your access allows listing the bucket and/or all files in the bucket; this uses the ListBucket privilege, s3:ListBucket.
Missing any of these privileges in your policy may cause your integration to fail; an error will appear in the logs if that is the case. If you need special permissions due to security constraints or otherwise, please contact Customer Support.
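Before attaching a policy to the role, you can sanity-check it offline against the privileges described above. A minimal sketch (stdlib only; the helper name and required-action set reflect this document, not an official DLH.io tool):

```python
import json

# Actions this document says the connection test relies on.
REQUIRED_ACTIONS = {"s3:GetObject", "s3:PutObject", "s3:ListBucket"}

def missing_actions(policy_json: str) -> set:
    """Return the required S3 actions not granted by any Allow statement."""
    policy = json.loads(policy_json)
    granted = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") == "Allow":
            actions = stmt.get("Action", [])
            if isinstance(actions, str):  # Action may be a single string
                actions = [actions]
            granted.update(actions)
    return REQUIRED_ACTIONS - granted

policy = """{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
  }]
}"""
assert missing_actions(policy) == set()  # nothing missing
```

This only inspects the policy document itself; it cannot detect denials applied elsewhere (bucket policies, SCPs), so the 'Save & Test' check in DLH.io remains the authoritative test.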
The recommendation is that your role or policy appears as follows if you have full control over the S3 or S3-compatible bucket:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObjectAcl",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>*",
        "arn:aws:s3:::<bucket_name>*/*"
      ]
    }
  ]
}
```
Issue Handling
If any issues occur with authorization, simply return to the Sources page in DLH.io, edit the source details, and click the 'Save & Test' button to confirm connectivity. If any issues persist, please contact our support team via the DLH.io Support Portal.