DLH.io Documentation

Google Cloud Storage

DLH.io documentation for Google Cloud Storage

Google Cloud Storage, also known as GCP Storage, is Google Cloud's object (blob) storage service for storing files and other objects in the cloud.

DLH.io provides this connector as a direct way to work with data and files, both as a source and as a target (for example, backup) conduit.

GCP Storage is mainly used for synchronizing data into BigQuery but can also be used for other general synchronization data flows and pipelines.

GCP Storage Prerequisites:

  • Name of your GCP Project
  • Name of your GCP Storage Bucket
  • Service Account Key JSON
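Before pasting the Service Account Key into the form, it can help to confirm that the JSON is structurally complete. The following is a minimal sketch, not part of DLH.io: the function name `check_service_account_key` is hypothetical, and the field list covers only fields commonly present in a Google service account key file. It checks structure only and does not verify that the key is valid or has any permissions.

```python
import json
from typing import List

# Fields normally present in a Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw: str) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # A key of another credential type (for example a user credential) is not a service account key.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


# Hypothetical, incomplete key used only for illustration.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "client_email": "loader@my-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample))  # ['missing field: private_key']
```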

Setup Instructions

DLH.io securely connects to your Google Cloud Storage bucket. Using the form in the DLH.io portal, complete the following basic steps.

  1. Enter a Name or Alias for this connection in the Name/Alias field. It must be unique among your connectors

  2. Enter a Target Schema Prefix, which is the prefix of the schema at the target that your data files will sync into

  3. Enter a Bucket name, where your files are stored

    • Typically just the name of the bucket. No http:// or gs:// prefix is required.
  4. Select your 'Region'

  5. Enter any other optional details in the available fields (see the setup video or contact support if you need help)

    • Folder Path is a path, relative to the bucket root, from which the desired files will be retrieved
    • File Pattern is a regular expression (RegEx) used to isolate only certain files to be retrieved
    • File Type restricts retrieval to a predetermined file extension
  6. Enter your Service Account Key, which should be a JSON string

  7. Click the **Save & Test** button. Once your credentials are accepted, you should see a successful connection.
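To illustrate how the optional Folder Path and File Pattern fields might narrow the set of files, here is a minimal sketch. It is an assumption about the general idea, not DLH.io's actual matching logic: the function `select_files`, the choice to apply the pattern to the part of the object name after the folder path, and the use of `re.search` are all hypothetical, and the object names are made up.

```python
import re


def select_files(object_names, folder_path="", file_pattern=".*"):
    """Keep object names under folder_path whose remaining path matches file_pattern."""
    prefix = folder_path.strip("/")
    prefix = prefix + "/" if prefix else ""
    pattern = re.compile(file_pattern)

    selected = []
    for name in object_names:
        if not name.startswith(prefix):
            continue  # outside the requested folder
        relative = name[len(prefix):]
        if pattern.search(relative):
            selected.append(name)
    return selected


# Hypothetical object names in a bucket.
names = [
    "exports/2023/orders_2023-01-01.csv",
    "exports/2023/orders_2023-01-02.csv",
    "exports/2023/customers.csv",
    "archive/orders_2022-12-31.csv",
]

matches = select_files(names, "exports/2023", r"^orders_\d{4}-\d{2}-\d{2}\.csv$")
print(matches)
# ['exports/2023/orders_2023-01-01.csv', 'exports/2023/orders_2023-01-02.csv']
```

Testing a pattern locally against a few known file names in this way can catch an overly broad or overly narrow expression before the connection is saved.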

FAQs & Troubleshooting

Why am I getting a storage.buckets.get error?

This error relates to how DLH.io retrieves file information from your bucket. If you see a message containing this warning or error, update the permissions of the Service Account and/or the DLH.io user shown in the connection: either grant the Storage Admin role, or create a custom role in your GCP project that includes the storage.buckets.get permission. See this Serverfault.com answer for some general direction, if needed.

DLH.io runs a test when you create this GCP Storage source connector. The test lists the files in the bucket and performs some other steps to ensure that DLH.io can act as the conduit to work with your bucket. If any portion of the test fails, a notification of the permission issue should appear in the logs, in alerts, or on the page where the test is conducted.
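If you want to check permissions yourself before running the connection test, the sketch below shows one possible approach using the `test_iam_permissions` method of a bucket object from the `google-cloud-storage` Python client. Only `storage.buckets.get` comes from the error discussed above; the other two permissions are assumptions about what listing and reading files would need, and the helper `missing_permissions` is hypothetical. A stub bucket stands in for a real one so the example runs without credentials.

```python
REQUIRED_PERMISSIONS = (
    "storage.buckets.get",   # named in the error discussed above
    "storage.objects.list",  # assumption: needed to list files in the bucket
    "storage.objects.get",   # assumption: needed to read file contents
)


def missing_permissions(bucket, required=REQUIRED_PERMISSIONS):
    """Return the required permissions that the caller does NOT hold on the bucket."""
    granted = set(bucket.test_iam_permissions(list(required)))
    return [perm for perm in required if perm not in granted]


class StubBucket:
    """Stand-in for a real bucket object, used only to make this example runnable offline."""

    def __init__(self, granted):
        self._granted = set(granted)

    def test_iam_permissions(self, permissions):
        return [perm for perm in permissions if perm in self._granted]


print(missing_permissions(StubBucket(["storage.objects.list", "storage.objects.get"])))
# ['storage.buckets.get']

# With the real client (requires the google-cloud-storage package and a key file):
#   from google.cloud import storage
#   client = storage.Client.from_service_account_json("key.json")
#   print(missing_permissions(client.bucket("your-bucket-name")))
```

An empty list suggests the Service Account holds every permission in the list; any names returned are candidates to add to the custom role.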

Can we use the GCP Storage Option for non-BigQuery DWs?

No. As of 01/2023, it is available only for BigQuery processing.

Control Each Column Data Type

SQL Transformations allow logic to be executed against a target connection on a scheduled frequency or when triggered by new data arriving in tables updated via DLH.io. This is especially helpful when you want to control the data types set in your Target Connection, since all columns are set as VARCHAR(16777216).
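As a rough illustration of the kind of statement such a transformation might run, the sketch below generates a SELECT that casts text columns to chosen types. Everything here is an assumption for illustration: the helper `build_cast_select`, the table and column names, and the type names, which should be replaced with the types your target warehouse supports. It is not DLH.io's own transformation syntax.

```python
import re

# Simple identifier check so that only plain (optionally dotted) names are interpolated.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def build_cast_select(source_table, column_types):
    """Build a SELECT statement that casts each listed column to its desired type."""
    for identifier in (source_table, *column_types):
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")

    select_list = ",\n".join(
        f"    CAST({column} AS {sql_type}) AS {column}"
        for column, sql_type in column_types.items()
    )
    return f"SELECT\n{select_list}\nFROM {source_table};"


# Hypothetical table and columns.
sql = build_cast_select(
    "gcs_sales.orders",
    {"order_id": "INTEGER", "order_date": "DATE"},
)
print(sql)
# SELECT
#     CAST(order_id AS INTEGER) AS order_id,
#     CAST(order_date AS DATE) AS order_date
# FROM gcs_sales.orders;
```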

Issue Handling

If any issues occur with the authorization, return to the sources page in DLH.io, edit the source details, and click the Save & Test, Authorize Your Account, or Re-Authorize Account button to confirm connectivity. If issues persist, please contact our support team via the DLH.io Support Portal.