Google Cloud Storage
DLH.io documentation for Google Cloud Storage
Google Cloud Storage, also known as GCP Storage, is Google Cloud's object (blob) storage service for storing files and other objects in the cloud.
DLH.io provides this connector as a direct way to work with data and files, both as a source and as a target (i.e., backup) conduit.
GCP Storage is mainly used for synchronizing data into BigQuery but can also be used for other general synchronization data flows and pipelines.
GCP Storage Prerequisites:
- Name of your GCP Project
- Name of your GCP Storage Bucket
- Service Account Key JSON
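Before pasting the Service Account Key JSON into the connection form, it can help to confirm that the file is complete. The following is a minimal sketch, not part of DLH.io: the function name `check_service_account_key` is hypothetical, and the field list reflects the fields found in a standard Google service account key file rather than anything DLH.io is known to validate.

```python
import json

# Fields found in a standard Google service account key file.
# Assumption: DLH.io may check more or fewer fields than these.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in the key JSON (empty if none)."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    # Report every required field that is absent or empty.
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_KEY_FIELDS
        if not key.get(name)
    ]
    # A key downloaded for a service account carries this exact type value.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

Reading the downloaded key file and passing its contents to `check_service_account_key` returns an empty list when the expected fields are present. The `project_id` field in the file should also match the GCP Project name listed above.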
Setup Instructions
Follow the setup steps guide to configure your Google Cloud Storage connection so that data can flow into your destination via DLH.io.
Supported Features
Here are key features supported for this connector.
| Sync Feature | Supported | Details |
|---|---|---|
| Custom Data and Development | ✓ | Ability to enhance connector upon request |
| Historical Re-Load/Load | ✓ | |
| Incremental/Delta Load | ✓ | Gets most recent records and changes |
| Column Selection | ✓ | |
| Column Hashing | ✓ | |
| Re-Sync Table/Entity | ✓ | Select at the table level to reload data history (on next Sync Bridge run) |
| Custom Queries | Utilizing SQL Data Query Connector | |
| Custom Data | | |
| Captures Deleted Rows | ✓ | On all supported tables |
| API Sync Bridge Initiation | ✓ | |
| Priority Scheduling | ✓ | |
| Private VPC/Link | ☂ | |
| DLH Data Model Available | - | |
If you have any questions about these supported features, please reach out to our Customer Support team.
Details on Sync Processing
For this connector, we believe the sync processing is straightforward. We've provided a number of details, steps, and other guidance here and in the setup steps guide. Be sure to also check the change log and notes page from time to time for any changes.
FAQs & Troubleshooting
Why am I getting a storage.buckets.get error?
This error relates to how DLH.io retrieves file information from your bucket. If you see a warning or error message containing storage.buckets.get, update the permissions of the Service Account and/or the DLH.io user shown in the connection: either grant the Storage Admin role, or create a custom role in your GCP project that includes this permission. See this Serverfault.com answer for some general direction, if needed.
DLH.io runs a test when you create this GCP Storage source connector. The test lists the files in the bucket and performs some other steps to ensure that DLH.io can act as the conduit to work with your bucket. If any portion of the test fails, a notification of the permission issue should appear in the logs, in the alerts, or on the page where the test is conducted.
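To check the Service Account's permissions yourself before re-running the connection test, something like the following sketch may help. It assumes the `google-cloud-storage` Python client is installed and uses its `Bucket.test_iam_permissions` call. The permission list is an assumption: `storage.buckets.get` comes from the error message, and `storage.objects.list` is inferred from the file-listing step of the test. DLH.io may require other permissions as well, and the function names here are hypothetical.

```python
# Assumed permission set: storage.buckets.get comes from the error message;
# storage.objects.list is inferred from the file-listing step of the test.
ASSUMED_REQUIRED = ("storage.buckets.get", "storage.objects.list")


def missing_permissions(granted, required=ASSUMED_REQUIRED):
    """Return the required permissions absent from `granted`, in order."""
    granted_set = set(granted)
    return [perm for perm in required if perm not in granted_set]


def check_bucket(bucket_name, key_path):
    """Ask GCS which of the assumed permissions the service account holds."""
    # Imported lazily so missing_permissions works without the client library.
    from google.cloud import storage

    client = storage.Client.from_service_account_json(key_path)
    bucket = client.bucket(bucket_name)
    # test_iam_permissions returns the subset of permissions the caller has.
    granted = bucket.test_iam_permissions(list(ASSUMED_REQUIRED))
    return missing_permissions(granted)
```

Calling `check_bucket` with your bucket name and the path to your key file returns the permissions from the assumed list that the Service Account lacks. An empty list suggests the role assignment covers at least these two.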
Can we use the GCP Storage Option for non-BigQuery DWs?
No. As of 01/2023, it is only available for BigQuery processing.
Control Each Column Data Type
SQL Transformations allow logic to be executed against a target connection, either on a scheduled frequency or when triggered by new data arriving in tables updated via DLH.io. This is especially helpful when you want to control the data types set in your Target Connection, since all columns are loaded as VARCHAR(16777216).
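As an illustration of the kind of SQL such a transformation might run, the sketch below builds a view that casts each VARCHAR column to a chosen type. It is only an example under stated assumptions: the helper `build_cast_view`, the table and view names, and the column types are hypothetical, type names vary by warehouse, and the identifiers are inserted without quoting, so they should come from a trusted source.

```python
def build_cast_view(source_table, view_name, column_types):
    """Build a CREATE VIEW statement that casts each column to a chosen type.

    column_types maps column name -> SQL type name (warehouse-specific).
    """
    if not column_types:
        raise ValueError("column_types must name at least one column")
    # One CAST expression per column, keeping the original column name.
    select_list = ",\n".join(
        f"    CAST({column} AS {sql_type}) AS {column}"
        for column, sql_type in column_types.items()
    )
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM {source_table};"
    )


# Hypothetical table, view, and column names for illustration only.
print(build_cast_view(
    "raw.orders",
    "analytics.orders_typed",
    {"order_id": "INTEGER", "order_date": "DATE"},
))
```

The generated statement could then be pasted into a SQL Transformation. Keeping the casts in a view leaves the loaded VARCHAR table untouched, so later syncs are not affected.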
Issue Handling
If any issues occur with the authorization, return to the sources page in DLH.io, edit the source details, and click the Save & Test, Authorize Your Account, or Re-Authorize Account button to confirm connectivity. If issues persist, please contact our support team via the DLH.io Support Portal.