Introduction

What are Data Connectors?

A data connector in Labellerr is a saved set of cloud storage credentials (for AWS S3 or Google Cloud Storage) that authorises Labellerr to access your bucket securely, so you do not have to supply credentials manually for every operation. Connectors are reusable: once a connector credential is saved, it can be used across multiple datasets and projects. The SDK provides full lifecycle management:
  • Create connector credentials for AWS S3 or GCP GCS
  • Test connector credentials before saving them
  • List all existing connector credentials by provider and type
  • Delete connector credentials that are no longer needed

Connector Types

Type | Enum Value | Description
Import | ConnectionType._IMPORT | Pull files from cloud storage into Labellerr
Export | ConnectionType._EXPORT | Push annotation results back to cloud storage

Provider Types

Provider | Enum Value | Description
AWS S3 | ConnectorType._S3 | Amazon Simple Storage Service
GCP GCS | ConnectorType._GCS | Google Cloud Storage

Required Imports

from labellerr.client import LabellerrClient
from labellerr.core.schemas import (
    ConnectionType,
    ConnectorType,
    AWSConnectionParams,
    AWSConnectionTestParams,
    GCSConnectionParams,
    GCSConnectionTestParams
)
from labellerr.core.connectors import (
    LabellerrS3Connection,
    LabellerrGCSConnection,
    list_connections,
    delete_connection
)
from labellerr.core.exceptions import LabellerrError

Connect to AWS S3

Connect AWS S3 Bucket

Use LabellerrS3Connection to create a connection to your Amazon S3 bucket. It is strongly recommended to test the connection before saving it, to confirm that your IAM credentials and path are valid.
1. Initialize the Client

Create a LabellerrClient instance with your API credentials.

2. Test the Connection (Recommended)

Call LabellerrS3Connection.test_connection() with a specific S3 path to validate credentials and permissions.

3. Create the Connection

Call LabellerrS3Connection.create_connection() to persist the connection for future use.
Connect to AWS S3
from labellerr.client import LabellerrClient
from labellerr.core.connectors import LabellerrS3Connection
from labellerr.core.schemas import AWSConnectionParams, AWSConnectionTestParams, ConnectionType
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # Step 1: Test the connection first (recommended)
    test_params = AWSConnectionTestParams(
        aws_access_key="your_aws_access_key",
        aws_secrets_key="your_aws_secret_key",
        path="s3://your-bucket-name/path/to/data/",
        data_type="image",
        connection_type="import"
    )

    test_result = LabellerrS3Connection.test_connection(client, test_params)
    print(f"✓ Connection test result: {test_result}")

    # Step 2: Create the connection
    connection_params = AWSConnectionParams(
        aws_access_key="your_aws_access_key",
        aws_secrets_key="your_aws_secret_key",
        path="s3://your-bucket-name/path/to/data/",
        data_type="image",
        connection_type=ConnectionType._IMPORT,
        name="My S3 Import Connection",
        description="Production AWS S3 bucket for image datasets"
    )

    s3_connection = LabellerrS3Connection.create_connection(client, connection_params)

    # Print connection details
    print(f"✓ S3 Connection created successfully!")
    print(f"  Connection ID   : {s3_connection.connection_id}")
    print(f"  Connection Name : {s3_connection.name}")
    print(f"  Provider        : AWS S3")
    print(f"  Type            : Import")
    print(f"  Path            : s3://your-bucket-name/path/to/data/")

except LabellerrError as e:
    print(f"✗ Connection failed: {str(e)}")
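The example above uses placeholder strings for every credential. One way to keep real values out of source code is to read them from environment variables. The sketch below is a minimal illustration, not part of the SDK: the LABELLERR_* variable names are assumptions made up for this example, while AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY follow the usual AWS naming.

```python
import os

# Assumption: the LABELLERR_* names are hypothetical; use whatever your deployment defines.
REQUIRED_VARS = (
    "LABELLERR_API_KEY",
    "LABELLERR_API_SECRET",
    "LABELLERR_CLIENT_ID",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def load_credentials(env=None):
    """Return a dict of credentials, raising if any variable is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Demonstration with an explicit mapping instead of the real environment
demo_env = {name: "placeholder" for name in REQUIRED_VARS}
creds = load_credentials(demo_env)
print(sorted(creds))
```

The returned values could then be passed to LabellerrClient and AWSConnectionParams in place of the hardcoded strings.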
Required IAM Permissions for AWS S3

Your IAM user must have the following permissions on the target bucket:

Use Case | Required Permissions
Import (read data) | s3:GetObject, s3:ListBucket, s3:GetBucketCors (optional), s3:GetBucketLocation, s3:PutBucketCors (optional)
Export (write results) | All import permissions + s3:PutObject, s3:DeleteObject
For a step-by-step IAM setup guide, see Connect AWS S3.
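To make the permission list concrete, the following sketch builds an IAM policy document containing the required import actions and, optionally, the export actions. It is an illustration under assumptions rather than the policy Labellerr prescribes: the helper name build_policy is hypothetical, the optional CORS actions are left out, and the linked setup guide remains the authoritative source. Bucket-level actions are attached to the bucket ARN and object-level actions to the objects under it, which is how IAM scopes S3 permissions.

```python
import json

# Actions taken from the permissions table; optional CORS actions are omitted here.
BUCKET_ACTIONS = ["s3:ListBucket", "s3:GetBucketLocation"]
IMPORT_OBJECT_ACTIONS = ["s3:GetObject"]
EXPORT_OBJECT_ACTIONS = ["s3:PutObject", "s3:DeleteObject"]


def build_policy(bucket, export=False):
    """Build an IAM policy dict for import access, plus export access if requested."""
    object_actions = list(IMPORT_OBJECT_ACTIONS)
    if export:
        object_actions += EXPORT_OBJECT_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(BUCKET_ACTIONS),
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(build_policy("your-bucket-name", export=True), indent=2))
```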
S3 Path Format: Always use the s3://bucket-name/folder/subfolder/ format. Include a trailing slash for folder paths.
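A quick local check can catch path mistakes before calling test_connection(). The helper below is a sketch that uses only the Python standard library; check_storage_path is a hypothetical name, and it only checks the shape of the path (scheme, bucket, trailing slash), not whether the bucket exists or is accessible.

```python
from urllib.parse import urlparse


def check_storage_path(path, scheme="s3"):
    """Return a list of problems found in a folder path (empty list means it looks fine)."""
    problems = []
    parsed = urlparse(path)
    if parsed.scheme != scheme:
        problems.append(f"path must start with {scheme}://")
    if not parsed.netloc:
        problems.append("bucket name is missing")
    if not path.endswith("/"):
        problems.append("folder paths should end with a trailing slash")
    return problems


for candidate in ("s3://your-bucket-name/path/to/data/", "s3://your-bucket-name/path"):
    print(candidate, "->", check_storage_path(candidate) or "ok")
```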

Connect to GCP (Google Cloud Storage)

Connect GCP GCS Bucket

Use LabellerrGCSConnection to create a connection to your Google Cloud Storage bucket. You will need a service account JSON key file with the appropriate permissions.
1. Initialize the Client

Create a LabellerrClient instance with your API credentials.

2. Test the Connection (Recommended)

Call LabellerrGCSConnection.test_connection() to validate your service account credentials and path access.

3. Create the Connection

Call LabellerrGCSConnection.create_connection() to save the connection for future use.
Connect to GCP GCS
from labellerr.client import LabellerrClient
from labellerr.core.connectors import LabellerrGCSConnection
from labellerr.core.schemas import GCSConnectionParams, GCSConnectionTestParams, ConnectionType
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # Step 1: Test the connection first (recommended)
    test_params = GCSConnectionTestParams(
        svc_account_json="/path/to/service-account-key.json",
        path="gs://your-bucket-name/path/to/data/",
        data_type="image",
        connection_type="import"
    )

    test_result = LabellerrGCSConnection.test_connection(client, test_params)
    print(f"✓ Connection test result: {test_result}")

    # Step 2: Create the connection
    connection_params = GCSConnectionParams(
        svc_account_json="/path/to/service-account-key.json",
        path="gs://your-bucket-name/path/to/data/",
        data_type="image",
        connection_type=ConnectionType._IMPORT,
        name="My GCS Import Connection",
        description="Production GCS bucket for image datasets"
    )

    gcs_connection = LabellerrGCSConnection.create_connection(client, connection_params)

    # Print connection details
    print(f"✓ GCS Connection created successfully!")
    print(f"  Connection ID   : {gcs_connection.connection_id}")
    print(f"  Connection Name : {gcs_connection.name}")
    print(f"  Provider        : Google Cloud Storage")
    print(f"  Type            : Import")
    print(f"  Path            : gs://your-bucket-name/path/to/data/")

except LabellerrError as e:
    print(f"✗ Connection failed: {str(e)}")
Required GCS Service Account Permissions

Your service account must have the following roles/permissions:

Use Case | Required Permissions
Import (read data) | storage.objects.get, storage.objects.list, storage.buckets.get, storage.buckets.update (optional)
Export (write results) | All import permissions + storage.objects.create, storage.objects.delete
For a step-by-step service account setup guide, see Connect GCS.
GCS Path Format: Always use the gs://bucket-name/folder/subfolder/ format. Include a trailing slash for folder paths. Your service account JSON key file must be readable from the machine running the SDK.
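Because the key file must be readable from the machine running the SDK, it can help to verify that locally before testing the connection. The sketch below is a standalone check, not an SDK feature: check_service_account_file is a hypothetical helper, and the set of expected keys is an assumption based on the fields normally found in a Google service account key file.

```python
import json
from pathlib import Path

# Assumption: fields commonly present in a service account key file.
EXPECTED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_file(path):
    """Return a list of problems with the key file (empty list means it looks usable)."""
    key_file = Path(path)
    if not key_file.is_file():
        return [f"{path} does not exist or is not a file"]
    try:
        data = json.loads(key_file.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:  # unreadable file or invalid JSON
        return [f"could not read {path} as JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = []
    missing = sorted(EXPECTED_KEYS - data.keys())
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


print(check_service_account_file("/path/to/service-account-key.json"))
```

An empty list only means the file is present and shaped like a key file; permissions on the bucket are still verified by test_connection().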

List Connections

List All Connections

Use list_connections() to retrieve all saved connections for a given cloud provider and connection type. This is useful to inspect existing connections, find a connection ID to reuse, or audit what connections are configured.
List Connections
from labellerr.client import LabellerrClient
from labellerr.core.schemas import ConnectionType, ConnectorType
from labellerr.core.connectors import list_connections
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # List all S3 import connections
    connections = list_connections(client, ConnectorType._S3, ConnectionType._IMPORT)

    for conn in connections:
        print(f"  Connection ID   : {conn.connection_id}")
        print(f"  Connection Name : {conn.name}")
        print(f"  Description     : {conn.description}")
        print(f"  Provider        : {conn.connector_type}")
        print(f"  Connection Type : {conn.connection_type}")
        print(f"  Created At      : {conn.created_at}")
        print("-" * 50)

except LabellerrError as e:
    print(f"✗ Failed to list connections: {str(e)}")
You can filter connections by swapping ConnectorType._S3 for ConnectorType._GCS to list GCS connections, and ConnectionType._IMPORT for ConnectionType._EXPORT to list export connections.
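When the goal is to find a connection ID to reuse, a small lookup by name over the listed connections is often enough. The sketch below runs without the SDK: ConnectionStub is a stand-in for the objects returned by list_connections() (only the connection_id and name attributes shown above are assumed), and find_connection_id is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class ConnectionStub:
    """Stand-in for an object returned by list_connections(); not an SDK class."""
    connection_id: str
    name: str


def find_connection_id(connections, name):
    """Return the ID of the single connection with this name, or None if absent."""
    matches = [c.connection_id for c in connections if c.name == name]
    if len(matches) > 1:
        raise ValueError(f"{len(matches)} connections are named {name!r}; pick one by ID")
    return matches[0] if matches else None


stubs = [
    ConnectionStub("conn-1", "My S3 Import Connection"),
    ConnectionStub("conn-2", "Archive bucket"),
]
print(find_connection_id(stubs, "My S3 Import Connection"))
```

Raising on duplicate names is a design choice for this sketch: the source does not say whether connection names are unique, so the helper refuses to guess.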

Delete a Connection

Delete Connection

Use delete_connection() to permanently remove a saved connection by its connection ID. This is useful for cleaning up unused or outdated connections.
Delete Connection
from labellerr.client import LabellerrClient
from labellerr.core.connectors import delete_connection
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

# The connection ID you want to delete
connection_id_to_delete = "your_connection_id_here"

try:
    response = delete_connection(client, connection_id=connection_id_to_delete)

    print(f"✓ Connection deleted successfully!")
    print(f"  Deleted Connection ID : {connection_id_to_delete}")
    print(f"  Response              : {response}")

except LabellerrError as e:
    print(f"✗ Failed to delete connection: {str(e)}")
Caution: Deleting a connection is permanent. Any datasets that were linked to this connection will lose access to the cloud storage path. Ensure no active datasets or projects depend on this connection before deleting.
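One way to act on this caution is to route deletions through a guard that refuses to delete an ID still known to be in use. The sketch below is hypothetical: the source does not describe an SDK call for discovering which connections are in use, so ids_in_use is something you would assemble yourself, and delete_fn is any callable that performs the actual deletion.

```python
def delete_if_unused(connection_id, ids_in_use, delete_fn):
    """Call delete_fn(connection_id) only when the ID is not in ids_in_use.

    Returns True if the deletion was attempted, False if it was skipped.
    """
    if connection_id in ids_in_use:
        return False
    delete_fn(connection_id)
    return True


# Demonstration with a list standing in for the real deletion call
deleted = []
print(delete_if_unused("conn-old", {"conn-live"}, deleted.append))
print(delete_if_unused("conn-live", {"conn-live"}, deleted.append))
print(deleted)
```

With the SDK, delete_fn could be a small wrapper such as lambda cid: delete_connection(client, connection_id=cid).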

Error Handling

Best Practices for Error Handling

Always wrap connection operations in try-except blocks that catch LabellerrError, so that failures are handled gracefully.
Error Handling Example
from labellerr.client import LabellerrClient
from labellerr.core.connectors import LabellerrS3Connection, list_connections, delete_connection
from labellerr.core.schemas import AWSConnectionParams, ConnectionType, ConnectorType
from labellerr.core.exceptions import LabellerrError

client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

# Create connection with error handling
try:
    params = AWSConnectionParams(
        aws_access_key="your_aws_access_key",
        aws_secrets_key="your_aws_secret_key",
        path="s3://your-bucket/data/",
        data_type="image",
        connection_type=ConnectionType._IMPORT,
        name="My S3 Connection"
    )
    conn = LabellerrS3Connection.create_connection(client, params)
    print(f"✓ Created: {conn.connection_id}")

except LabellerrError as e:
    print(f"✗ Create failed: {str(e)}")

# List connections with error handling
try:
    connections = list_connections(client, ConnectorType._S3, ConnectionType._IMPORT)
    for conn in connections:
        print(f"  ID: {conn.connection_id}, Name: {conn.name}")

except LabellerrError as e:
    print(f"✗ List failed: {str(e)}")

# Delete connection with error handling
try:
    response = delete_connection(client, connection_id="your_connection_id_here")
    print(f"✓ Deleted: {response}")

except LabellerrError as e:
    print(f"✗ Delete failed: {str(e)}")
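The three try-except blocks above share the same shape, so the pattern can be factored into one helper. The following is a sketch under assumptions, not an SDK utility: attempt is a hypothetical name, and SdkErrorStub stands in for LabellerrError so the example runs without the SDK installed. Only the named error type is caught; any other exception still propagates.

```python
import logging

logger = logging.getLogger("connectors")


class SdkErrorStub(Exception):
    """Stand-in for LabellerrError so this sketch runs without the SDK."""


def attempt(label, operation, error_type=SdkErrorStub):
    """Run operation(); return (True, result) on success or (False, message) on error_type."""
    try:
        return True, operation()
    except error_type as exc:
        logger.error("%s failed: %s", label, exc)
        return False, str(exc)


def failing_operation():
    raise SdkErrorStub("invalid credentials")


print(attempt("create", lambda: "conn-123"))
print(attempt("delete", failing_operation))
```

In real use, error_type would be LabellerrError and operation would wrap a call such as LabellerrS3Connection.create_connection(client, params).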

API Reference

from labellerr.core.connectors import LabellerrS3Connection
from labellerr.core.schemas import AWSConnectionParams, ConnectionType

connection = LabellerrS3Connection.create_connection(
    client=client,
    params=AWSConnectionParams(
        aws_access_key="...",
        aws_secrets_key="...",
        path="s3://bucket/path/",
        data_type="image",
        connection_type=ConnectionType._IMPORT,
        name="Connection Name",
        description="Optional description"
    )
)
Returns: LabellerrS3Connection object with properties:
  • connection_id (str): Unique identifier for the connection
  • name (str): Display name of the connection
  • description (str): Optional description
  • connector_type (str): Always "s3" for S3 connections
  • connection_type (str): "import" or "export"
  • created_at (datetime): Timestamp of creation
from labellerr.core.connectors import LabellerrGCSConnection
from labellerr.core.schemas import GCSConnectionParams, ConnectionType

connection = LabellerrGCSConnection.create_connection(
    client=client,
    params=GCSConnectionParams(
        svc_account_json="/path/to/service-account.json",
        path="gs://bucket/path/",
        data_type="image",
        connection_type=ConnectionType._IMPORT,
        name="Connection Name",
        description="Optional description"
    )
)
Returns: LabellerrGCSConnection object with properties:
  • connection_id (str): Unique identifier for the connection
  • name (str): Display name of the connection
  • description (str): Optional description
  • connector_type (str): Always "gcs" for GCS connections
  • connection_type (str): "import" or "export"
  • created_at (datetime): Timestamp of creation
from labellerr.core.connectors import list_connections
from labellerr.core.schemas import ConnectorType, ConnectionType

connections = list_connections(
    client=client,
    connector=ConnectorType._S3,        # or ConnectorType._GCS
    connection_type=ConnectionType._IMPORT  # or ConnectionType._EXPORT
)

for conn in connections:
    print(conn.connection_id)
Parameters:
Parameter | Type | Description
client | LabellerrClient | Authenticated client instance
connector | ConnectorType | ConnectorType._S3 or ConnectorType._GCS
connection_type | ConnectionType | ConnectionType._IMPORT or ConnectionType._EXPORT
Returns: Iterable of connection objects, each with connection_id, name, description, connector_type, connection_type, created_at.
from labellerr.core.connectors import delete_connection

response = delete_connection(
    client=client,
    connection_id="your_connection_id_here"
)
print(response)
Parameters:
Parameter | Type | Description
client | LabellerrClient | Authenticated client instance
connection_id | str | ID of the connection to delete
Returns: Response object or confirmation message from the API.

Common Use Cases

Reuse Across Datasets

Create one S3 or GCS connection and use the same connection_id across multiple datasets, avoiding repeated credential entry.

Import Large Datasets

Connect directly to cloud storage buckets containing thousands of files, bypassing local upload limits (2,500 files / 2.5 GB).

Export Annotations

Create an export-type connection to automatically push completed annotation exports back to your S3 or GCS bucket.

Audit & Cleanup

Use list_connections() periodically to audit all active connections and delete_connection() to remove unused ones.
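For the large-dataset case above, a rough local check can indicate whether a folder is over the local upload limits and therefore better served by a connector. This is a sketch with assumptions: exceeds_local_upload_limits is a hypothetical helper, and 2.5 GB is interpreted as binary gigabytes (2.5 × 1024³ bytes), which the source does not specify.

```python
import os

MAX_FILES = 2500
MAX_BYTES = int(2.5 * 1024**3)  # assumption: binary gigabytes


def exceeds_local_upload_limits(folder, max_files=MAX_FILES, max_bytes=MAX_BYTES):
    """Return True if the folder tree holds more files or bytes than the given limits."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            count += 1
            total += os.path.getsize(os.path.join(root, name))
    return count > max_files or total > max_bytes
```

If the function returns True for a dataset folder, uploading that data to a bucket and importing it through an S3 or GCS connection is likely the more practical route.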

Create Datasets

Use your connections to create datasets directly from cloud storage

Connect AWS S3

Step-by-step guide to configure AWS IAM permissions for S3

Connect GCS

Step-by-step guide to configure a GCS service account
For technical support, contact support@tensormatics.com