Retrieving All Projects for a Client

You can retrieve all projects associated with a specific client ID using the SDK’s project listing functionality:

Example Usage:

Retrieve All Projects
from labellerr.client import LabellerrClient
from labellerr.core.projects import list_projects
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # List all projects - returns list of LabellerrProject objects
    projects = list_projects(client)
    
    print(f"Found {len(projects)} projects:")
    for project in projects:
        print(f"- Project ID: {project.project_id}")
        print(f"  Data Type: {project.data_type}")
        print(f"  Attached Datasets: {len(project.attached_datasets)}")
        print(f"  Created By: {project.created_by}")
        print(f"  Status Code: {project.status_code}")
        
except LabellerrError as e:
    print(f"Failed to retrieve projects: {str(e)}")
This method is useful when you need to:
  • List all projects for a client
  • Find specific project IDs
  • Check project statuses and configurations
  • Get an overview of a client’s work
  • Access project properties programmatically
This returns a list of LabellerrProject objects, each with access to properties like project_id, data_type, attached_datasets, created_by, and more.
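
For example, here is a minimal sketch of filtering that list to find specific project IDs by data type. The 'image' value and the filtering logic are illustrative and rely only on the properties documented above:

from labellerr.client import LabellerrClient
from labellerr.core.projects import list_projects

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

# Keep only image projects and collect their IDs
image_projects = [p for p in list_projects(client) if p.data_type == 'image']
print(f"Found {len(image_projects)} image projects:")
for p in image_projects:
    print(f"- {p.project_id} (status code: {p.status_code})")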

Retrieving All Datasets

You can retrieve both linked and unlinked datasets associated with a client using the SDK’s dataset listing capabilities:

Example Usage:

Retrieve All Datasets
from labellerr.client import LabellerrClient
from labellerr.core.datasets import list_datasets
from labellerr.core.schemas import DataSetScope
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # Get all datasets using list_datasets with auto-pagination
    datasets_generator = list_datasets(
        client=client,
        datatype='image',
        scope=DataSetScope.client,  # or DataSetScope.project
        page_size=-1  # Auto-paginate through all datasets
    )
    
    # Iterate through the generator
    print("Datasets:")
    for dataset in datasets_generator:
        print(f"- Dataset ID: {dataset.get('dataset_id')}")
        print(f"  Name: {dataset.get('name')}")
        print(f"  Description: {dataset.get('description')}")
        print(f"  Data Type: {dataset.get('data_type')}")
        print(f"  Files Count: {dataset.get('files_count', 0)}")

except LabellerrError as e:
    print(f"Failed to retrieve datasets: {str(e)}")
Pagination Support:
  • Returns a generator that yields individual datasets
  • Use page_size=-1 to automatically paginate through all datasets (recommended)
  • Use specific page_size to limit results (e.g., page_size=20)
  • Use last_dataset_id for manual pagination across multiple requests (see the sketch after this list)
  • Generator approach is memory-efficient for large numbers of datasets
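
A short sketch of manual pagination using the documented last_dataset_id parameter. It assumes each yielded dataset dictionary includes the dataset_id key shown in the example above; the page size of 20 is illustrative:

from labellerr.client import LabellerrClient
from labellerr.core.datasets import list_datasets
from labellerr.core.schemas import DataSetScope

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

# First page of up to 20 image datasets
first_page = list(list_datasets(
    client=client,
    datatype='image',
    scope=DataSetScope.client,
    page_size=20
))

# Fetch the next page, resuming after the last dataset of the previous page
if first_page:
    next_page = list(list_datasets(
        client=client,
        datatype='image',
        scope=DataSetScope.client,
        page_size=20,
        last_dataset_id=first_page[-1].get('dataset_id')
    ))
    print(f"Retrieved {len(first_page) + len(next_page)} datasets across two pages")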
Available Scope Options:
  • DataSetScope.client - Datasets with client-level permissions
  • DataSetScope.project - Datasets with project-level permissions
This method is useful when you need to:
  • Get a comprehensive list of all datasets in your workspace
  • Filter datasets by specific data types (image, video, audio, document, text)
  • Organize datasets by scope (client-level or project-level permissions)
  • Efficiently iterate through large numbers of datasets with pagination
  • Memory-efficient processing of datasets using generators

Advanced Pagination Examples

Retrieving Files from a Dataset

You can fetch all files from a specific dataset using the fetch_files() method:

Example Usage:

Fetch Files from Dataset
from labellerr.client import LabellerrClient
from labellerr.core.datasets import LabellerrDataset
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # Get dataset instance
    dataset = LabellerrDataset(client=client, dataset_id="your_dataset_id")
    
    # Fetch all files
    files = dataset.fetch_files()
    
    print(f"Dataset contains {len(files)} files:")
    for file in files:
        print(f"- File ID: {file.get('file_id')}")
        print(f"  File Name: {file.get('file_name')}")
        print(f"  Status: {file.get('status')}")
        
except LabellerrError as e:
    print(f"Failed to fetch files: {str(e)}")
This method is useful when you need to:
  • Get a list of all files in a dataset
  • Check file statuses before creating a project (see the sketch after this list)
  • Verify dataset contents
  • Build custom file processing workflows
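
As an illustration of checking file statuses, here is a small sketch that tallies files by the documented status key before you create a project:

from collections import Counter

from labellerr.client import LabellerrClient
from labellerr.core.datasets import LabellerrDataset
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    dataset = LabellerrDataset(client=client, dataset_id="your_dataset_id")

    # Tally files by their 'status' field
    status_counts = Counter(f.get('status') for f in dataset.fetch_files())
    for status, count in status_counts.items():
        print(f"{status}: {count} files")

except LabellerrError as e:
    print(f"Failed to summarize file statuses: {str(e)}")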

Working with Projects and Datasets

Get Project Information

Get Project Details
from labellerr.client import LabellerrClient
from labellerr.core.projects import LabellerrProject
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # Get project instance
    project = LabellerrProject(client=client, project_id="your_project_id")
    
    # Access project properties
    print(f"Project ID: {project.project_id}")
    print(f"Data Type: {project.data_type}")
    print(f"Attached Datasets: {project.attached_datasets}")
    
except LabellerrError as e:
    print(f"Failed to retrieve project: {str(e)}")

Attach Datasets to a Project

Attach Datasets
from labellerr.client import LabellerrClient
from labellerr.core.projects import LabellerrProject
from labellerr.core.exceptions import LabellerrError

client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    project = LabellerrProject(client=client, project_id="your_project_id")
    
    # Attach a single dataset
    result = project.attach_dataset_to_project(dataset_id="dataset_id_to_attach")
    
    # Or attach multiple datasets at once
    # result = project.attach_dataset_to_project(dataset_ids=["dataset_1", "dataset_2"])
    
    print(f"Dataset(s) attached successfully: {result}")
    
except LabellerrError as e:
    print(f"Failed to attach dataset: {str(e)}")

Detach Datasets from a Project

Detach Datasets
from labellerr.client import LabellerrClient
from labellerr.core.projects import LabellerrProject
from labellerr.core.exceptions import LabellerrError

client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    project = LabellerrProject(client=client, project_id="your_project_id")
    
    # Detach a single dataset
    result = project.detach_dataset_from_project(dataset_id="dataset_id_to_detach")
    
    # Or detach multiple datasets at once
    # result = project.detach_dataset_from_project(dataset_ids=["dataset_1", "dataset_2"])
    
    print(f"Dataset(s) detached successfully: {result}")
    
except LabellerrError as e:
    print(f"Failed to detach dataset: {str(e)}")
Important: When detaching datasets from a project, the method no longer requires client_id and project_id parameters as these are automatically derived from the project instance.

Bulk Assign Files

You can bulk assign multiple files to a new status in a project, optionally assigning them to a specific user.

Example Usage:

Bulk Assign Files
from labellerr.client import LabellerrClient
from labellerr.core.projects import LabellerrProject
from labellerr.core.exceptions import LabellerrError

client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    project = LabellerrProject(client=client, project_id="your_project_id")
    
    # Bulk assign files to review status
    result = project.bulk_assign_files(
        file_ids=["file_id_1", "file_id_2", "file_id_3"],
        new_status="review",
        assign_to="[email protected]"  # optional
    )
    
    print(f"Files assigned successfully: {result}")
    
except LabellerrError as e:
    print(f"Failed to bulk assign files: {str(e)}")

Acceptable Status Values

  • annotation - Assign files for annotation
  • review - Assign files for review after annotation
  • client_review - Assign files for client review
  • accepted - Mark files as accepted
  • rejected - Mark files as rejected

When to Use Bulk Assign

Bulk assign is essential for automating annotation workflows:

Common Use Cases

  • Batch Assignment - Assign 100 images to an annotator at once instead of clicking 100 times
  • Status Progression - Move all "annotation" files to "review" status after completion
  • Team Management - Distribute files across multiple team members efficiently
  • Workflow Automation - Automate the flow: annotation → review → client_review → accepted
  • Onboarding - Assign an initial set of files to new annotators
  • Quality Control - Send all files to a senior reviewer before client submission
Example Scenario: Imagine you have 500 images that need annotation. After the annotators finish, you want to move them all to the review status. Without bulk assign, you’d click 500 times. With bulk_assign_files(), it’s one API call:
# After annotation is complete
result = project.bulk_assign_files(
    file_ids=all_500_file_ids,
    new_status="review",
    assign_to="[email protected]"
)

Error Handling

The Labellerr SDK uses a custom exception class, LabellerrError, to indicate issues during API interactions. Always wrap your function calls in try-except blocks to gracefully handle errors.

Example:

Error Handling Example
from labellerr.core.exceptions import LabellerrError
from labellerr.core.projects import LabellerrProject

try:
    # Example function call
    project = LabellerrProject(client=client, project_id="project_id")
    datasets = project.attached_datasets
except LabellerrError as e:
    print(f"An error occurred: {str(e)}")

API Reference

Function Signatures

from labellerr.core.projects import list_projects

def list_projects(client: LabellerrClient) -> list[LabellerrProject]:
    """
    Retrieves a list of projects associated with a client ID.
    
    Parameters:
        client: LabellerrClient instance
        
    Returns:
        List of LabellerrProject objects with properties:
        - project_id: str
        - data_type: str
        - attached_datasets: list
        - created_by: str
        - created_at: datetime
        - annotation_template_id: str
        - status_code: int
    """
Example:
projects = list_projects(client)
for project in projects:
    print(f"{project.project_id}: {project.data_type}")
from labellerr.core.datasets import list_datasets
from labellerr.core.schemas import DataSetScope

def list_datasets(
    client: LabellerrClient,
    datatype: str,
    scope: DataSetScope,
    page_size: int = 10,
    last_dataset_id: str = None
) -> Generator:
    """
    Retrieves datasets by parameters with pagination support.
    
    Parameters:
        client: LabellerrClient instance
        datatype: Type of data ('image', 'video', 'audio', 'document', 'text')
        scope: DataSetScope.client or DataSetScope.project
        page_size: Number of datasets per page (default: 10, -1 for auto-pagination)
        last_dataset_id: ID of last dataset from previous page (for manual pagination)
        
    Returns:
        Generator yielding dataset dictionaries with keys:
        - dataset_id: str
        - name: str
        - description: str
        - data_type: str
        - files_count: int
        - created_at: datetime
        - created_by: str
    """
Examples:
# Auto-paginate through all
for dataset in list_datasets(client, 'image', DataSetScope.client, page_size=-1):
    print(dataset['name'])

# Get first 20 only
datasets = list(list_datasets(client, 'video', DataSetScope.client, page_size=20))
from labellerr.core.datasets import LabellerrDataset

dataset = LabellerrDataset(client=client, dataset_id="dataset_id")
files = dataset.fetch_files()
Returns: List of file dictionaries with metadata including:
  • file_id: str
  • file_name: str
  • status: str
  • file_type: str
  • file_size: int
from labellerr.core.projects import LabellerrProject

project = LabellerrProject(client=client, project_id="project_id")

# Available properties:
project.project_id           # str: Project identifier
project.data_type            # str: Data type (image, video, etc.)
project.attached_datasets    # list: List of dataset IDs
project.created_by           # str: Creator email
project.created_at           # datetime: Creation timestamp
project.annotation_template_id  # str: Template ID
project.status_code          # int: Project status code
Methods:
  • attach_dataset_to_project(dataset_id=None, dataset_ids=None)
  • detach_dataset_from_project(dataset_id=None, dataset_ids=None)
  • bulk_assign_files(file_ids, new_status, assign_to=None)
  • upload_preannotations(annotation_format, annotation_file, conf_bucket=None, _async=False) (sketched below)
  • create_export(export_config)
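
For illustration, here is a hedged sketch of calling upload_preannotations with the signature listed above. The 'coco_json' format value and the file path are placeholder assumptions, not confirmed values, so check the SDK for the formats it actually accepts:

from labellerr.client import LabellerrClient
from labellerr.core.projects import LabellerrProject
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    project = LabellerrProject(client=client, project_id="your_project_id")

    # Upload pre-annotations synchronously.
    # 'coco_json' and the file path are illustrative placeholders;
    # verify the accepted annotation_format values in the SDK.
    result = project.upload_preannotations(
        annotation_format='coco_json',
        annotation_file='path/to/annotations.json'
    )
    print(f"Pre-annotations uploaded: {result}")

except LabellerrError as e:
    print(f"Failed to upload pre-annotations: {str(e)}")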

Best Practices

Use Auto-Pagination

Set page_size=-1 for list_datasets() to automatically handle all pages without manual intervention

Leverage Generators

Process datasets as they’re retrieved instead of loading all into memory at once

Filter by Scope

Use DataSetScope.client for workspace-level datasets or DataSetScope.project for project-specific ones

Error Handling

Always wrap API calls in try-except blocks to catch LabellerrError exceptions
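
Putting these practices together, here is a brief sketch that combines auto-pagination, lazy generator processing, scope filtering, and error handling, using only the calls documented above:

from labellerr.client import LabellerrClient
from labellerr.core.datasets import list_datasets
from labellerr.core.schemas import DataSetScope
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # Auto-paginate through all client-scoped image datasets,
    # processing each one as it is yielded instead of building a full list
    for dataset in list_datasets(client=client, datatype='image',
                                 scope=DataSetScope.client, page_size=-1):
        print(f"{dataset.get('dataset_id')}: {dataset.get('name')}")
except LabellerrError as e:
    print(f"Failed to list datasets: {str(e)}")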

The Labellerr SDK is a fast and reliable solution for managing annotation workflows. Want to try it end-to-end? Refer to this Google Colab Cookbook for a ready-to-run tutorial. For more related cookbooks and examples, please visit our repository: Labellerr Hands-On Learning