Retrieving All Projects for a Client

You can retrieve all projects associated with a specific client ID using the SDK’s project listing functionality:

Example Usage:

Retrieve All Projects
from labellerr.client import LabellerrClient
from labellerr.core.projects import list_projects
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # List all projects - returns list of LabellerrProject objects
    projects = list_projects(client)
    
    print(f"Found {len(projects)} projects:")
    for project in projects:
        print(f"- Project ID: {project.project_id}")
        print(f"  Data Type: {project.data_type}")
        print(f"  Attached Datasets: {len(project.attached_datasets)}")
        print(f"  Created By: {project.created_by}")
        print(f"  Status Code: {project.status_code}")
        
except LabellerrError as e:
    print(f"Failed to retrieve projects: {str(e)}")
This method is useful when you need to:
  • List all projects for a client
  • Find specific project IDs
  • Check project statuses and configurations
  • Get an overview of a client’s work
  • Access project properties programmatically
This returns a list of LabellerrProject objects, each with access to properties like project_id, data_type, attached_datasets, created_by, and more.
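
For example, here is a minimal sketch of filtering that list to find specific project IDs by data type. The 'image' value and the filtering logic are illustrative and rely only on the properties documented above:

from labellerr.client import LabellerrClient
from labellerr.core.projects import list_projects

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

# Keep only image projects and collect their IDs
image_projects = [p for p in list_projects(client) if p.data_type == 'image']
print(f"Found {len(image_projects)} image projects:")
for p in image_projects:
    print(f"- {p.project_id} (status code: {p.status_code})")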

Retrieving All Datasets

You can retrieve both linked and unlinked datasets associated with a client using the SDK’s dataset listing capabilities:

Example Usage:

Retrieve All Datasets
from labellerr.client import LabellerrClient
from labellerr.core.datasets import list_datasets
from labellerr.core.schemas import DataSetScope
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # Get all datasets using list_datasets with auto-pagination
    datasets_generator = list_datasets(
        client=client,
        datatype='image',
        scope=DataSetScope.client,  # or DataSetScope.project
        page_size=-1  # Auto-paginate through all datasets
    )
    
    # Iterate through the generator
    print("Datasets:")
    for dataset in datasets_generator:
        print(f"- Dataset ID: {dataset.get('dataset_id')}")
        print(f"  Name: {dataset.get('name')}")
        print(f"  Description: {dataset.get('description')}")
        print(f"  Data Type: {dataset.get('data_type')}")
        print(f"  Files Count: {dataset.get('files_count', 0)}")

except LabellerrError as e:
    print(f"Failed to retrieve datasets: {str(e)}")
Pagination Support:
  • Returns a generator that yields individual datasets
  • Use page_size=-1 to automatically paginate through all datasets (recommended)
  • Use specific page_size to limit results (e.g., page_size=20)
  • Use last_dataset_id for manual pagination across multiple requests (see the sketch after this list)
  • Generator approach is memory-efficient for large numbers of datasets
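
A short sketch of manual pagination using the documented last_dataset_id parameter. It assumes each yielded dataset dictionary includes the dataset_id key shown in the example above; the page size of 20 is illustrative:

from labellerr.client import LabellerrClient
from labellerr.core.datasets import list_datasets
from labellerr.core.schemas import DataSetScope

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

# First page of up to 20 image datasets
first_page = list(list_datasets(
    client=client,
    datatype='image',
    scope=DataSetScope.client,
    page_size=20
))

# Fetch the next page, resuming after the last dataset of the previous page
if first_page:
    next_page = list(list_datasets(
        client=client,
        datatype='image',
        scope=DataSetScope.client,
        page_size=20,
        last_dataset_id=first_page[-1].get('dataset_id')
    ))
    print(f"Retrieved {len(first_page) + len(next_page)} datasets across two pages")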
Available Scope Options:
  • DataSetScope.client - Datasets with client-level permissions
  • DataSetScope.project - Datasets with project-level permissions
This method is useful when you need to:
  • Get a comprehensive list of all datasets in your workspace
  • Filter datasets by specific data types (image, video, audio, document, text)
  • Organize datasets by scope (client-level or project-level permissions)
  • Efficiently iterate through large numbers of datasets with pagination
  • Memory-efficient processing of datasets using generators

Advanced Pagination Examples

Retrieving Files from a Dataset

You can fetch all files from a specific dataset using the fetch_files() method:

Example Usage:

Fetch Files from Dataset
from labellerr.client import LabellerrClient
from labellerr.core.datasets import LabellerrDataset
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # Get dataset instance
    dataset = LabellerrDataset(client=client, dataset_id="your_dataset_id")
    
    # Fetch all files
    files = dataset.fetch_files()
    
    print(f"Dataset contains {len(files)} files:")
    for file in files:
        print(f"- File ID: {file.get('file_id')}")
        print(f"  File Name: {file.get('file_name')}")
        print(f"  Status: {file.get('status')}")
        
except LabellerrError as e:
    print(f"Failed to fetch files: {str(e)}")
This method is useful when you need to:
  • Get a list of all files in a dataset
  • Check file statuses before creating a project (see the sketch after this list)
  • Verify dataset contents
  • Build custom file processing workflows
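
As an illustration of checking file statuses, here is a small sketch that tallies files by the documented status key before you create a project:

from collections import Counter

from labellerr.client import LabellerrClient
from labellerr.core.datasets import LabellerrDataset
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    dataset = LabellerrDataset(client=client, dataset_id="your_dataset_id")

    # Tally files by their 'status' field
    status_counts = Counter(f.get('status') for f in dataset.fetch_files())
    for status, count in status_counts.items():
        print(f"{status}: {count} files")

except LabellerrError as e:
    print(f"Failed to summarize file statuses: {str(e)}")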

Working with Projects and Datasets

Get Project Information

Get Project Details
from labellerr.client import LabellerrClient
from labellerr.core.projects import LabellerrProject
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # Get project instance
    project = LabellerrProject(client=client, project_id="your_project_id")
    
    # Access project properties
    print(f"Project ID: {project.project_id}")
    print(f"Data Type: {project.data_type}")
    print(f"Attached Datasets: {project.attached_datasets}")
    
except LabellerrError as e:
    print(f"Failed to retrieve project: {str(e)}")

Attach Datasets to a Project

Attach Datasets
from labellerr.client import LabellerrClient
from labellerr.core.projects import LabellerrProject
from labellerr.core.exceptions import LabellerrError

client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    project = LabellerrProject(client=client, project_id="your_project_id")
    
    # Attach a single dataset
    result = project.attach_dataset_to_project(dataset_id="dataset_id_to_attach")
    
    # Or attach multiple datasets at once
    # result = project.attach_dataset_to_project(dataset_ids=["dataset_1", "dataset_2"])
    
    print(f"Dataset(s) attached successfully: {result}")
    
except LabellerrError as e:
    print(f"Failed to attach dataset: {str(e)}")

Detach Datasets from a Project

Detach Datasets
from labellerr.client import LabellerrClient
from labellerr.core.projects import LabellerrProject
from labellerr.core.exceptions import LabellerrError

client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    project = LabellerrProject(client=client, project_id="your_project_id")
    
    # Detach a single dataset
    result = project.detach_dataset_from_project(dataset_id="dataset_id_to_detach")
    
    # Or detach multiple datasets at once
    # result = project.detach_dataset_from_project(dataset_ids=["dataset_1", "dataset_2"])
    
    print(f"Dataset(s) detached successfully: {result}")
    
except LabellerrError as e:
    print(f"Failed to detach dataset: {str(e)}")
Important: When detaching datasets from a project, the method no longer requires client_id and project_id parameters as these are automatically derived from the project instance.

Bulk Assign Files

You can bulk assign multiple files to a new status in a project, optionally assigning them to a specific user.

Example Usage:

Bulk Assign Files
from labellerr.client import LabellerrClient
from labellerr.core.projects import LabellerrProject
from labellerr.core.exceptions import LabellerrError

client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    project = LabellerrProject(client=client, project_id="your_project_id")
    
    # Bulk assign files to review status
    result = project.bulk_assign_files(
        file_ids=["file_id_1", "file_id_2", "file_id_3"],
        new_status="review",
        assign_to="[email protected]"  # optional
    )
    
    print(f"Files assigned successfully: {result}")
    
except LabellerrError as e:
    print(f"Failed to bulk assign files: {str(e)}")

Acceptable Status Values

  • annotation - Assign files for annotation
  • review - Assign files for review after annotation
  • client_review - Assign files for client review
  • accepted - Mark files as accepted
  • rejected - Mark files as rejected

When to Use Bulk Assign

Bulk assign is essential for automating annotation workflows:

Common Use Cases

  • Batch Assignment - Assign 100 images to an annotator at once instead of clicking 100 times
  • Status Progression - Move all "annotation" files to "review" status after completion
  • Team Management - Distribute files across multiple team members efficiently
  • Workflow Automation - Automate the flow: annotation → review → client_review → accepted
  • Onboarding - Assign an initial set of files to new annotators
  • Quality Control - Send all files to a senior reviewer before client submission
Example Scenario: Imagine you have 500 images that need annotation. After the annotators finish, you want to move them all to the review status. Without bulk assign, you’d click 500 times. With bulk_assign_files(), it’s one API call:
# After annotation is complete
result = project.bulk_assign_files(
    file_ids=all_500_file_ids,
    new_status="review",
    assign_to="[email protected]"
)

Error Handling

The Labellerr SDK uses a custom exception class, LabellerrError, to indicate issues during API interactions. Always wrap your function calls in try-except blocks to gracefully handle errors.

Example:

Error Handling Example
from labellerr.core.exceptions import LabellerrError
from labellerr.core.projects import LabellerrProject

try:
    # Example function call
    project = LabellerrProject(client=client, project_id="project_id")
    datasets = project.attached_datasets
except LabellerrError as e:
    print(f"An error occurred: {str(e)}")

API Reference

Function Signatures

from labellerr.core.projects import list_projects

def list_projects(client: LabellerrClient) -> list[LabellerrProject]:
    """
    Retrieves a list of projects associated with a client ID.
    
    Parameters:
        client: LabellerrClient instance
        
    Returns:
        List of LabellerrProject objects with properties:
        - project_id: str
        - data_type: str
        - attached_datasets: list
        - created_by: str
        - created_at: datetime
        - annotation_template_id: str
        - status_code: int
    """
Example:
projects = list_projects(client)
for project in projects:
    print(f"{project.project_id}: {project.data_type}")
from labellerr.core.datasets import list_datasets
from labellerr.core.schemas import DataSetScope

def list_datasets(
    client: LabellerrClient,
    datatype: str,
    scope: DataSetScope,
    page_size: int = 10,
    last_dataset_id: str = None
) -> Generator:
    """
    Retrieves datasets by parameters with pagination support.
    
    Parameters:
        client: LabellerrClient instance
        datatype: Type of data ('image', 'video', 'audio', 'document', 'text')
        scope: DataSetScope.client or DataSetScope.project
        page_size: Number of datasets per page (default: 10, -1 for auto-pagination)
        last_dataset_id: ID of last dataset from previous page (for manual pagination)
        
    Returns:
        Generator yielding dataset dictionaries with keys:
        - dataset_id: str
        - name: str
        - description: str
        - data_type: str
        - files_count: int
        - created_at: datetime
        - created_by: str
    """
Examples:
# Auto-paginate through all
for dataset in list_datasets(client, 'image', DataSetScope.client, page_size=-1):
    print(dataset['name'])

# Get first 20 only
datasets = list(list_datasets(client, 'video', DataSetScope.client, page_size=20))
from labellerr.core.datasets import LabellerrDataset

dataset = LabellerrDataset(client=client, dataset_id="dataset_id")
files = dataset.fetch_files()
Returns: List of file dictionaries with metadata including:
  • file_id: str
  • file_name: str
  • status: str
  • file_type: str
  • file_size: int
from labellerr.core.projects import LabellerrProject

project = LabellerrProject(client=client, project_id="project_id")

# Available properties:
project.project_id           # str: Project identifier
project.data_type            # str: Data type (image, video, etc.)
project.attached_datasets    # list: List of dataset IDs
project.created_by           # str: Creator email
project.created_at           # datetime: Creation timestamp
project.annotation_template_id  # str: Template ID
project.status_code          # int: Project status code
Methods:
  • attach_dataset_to_project(dataset_id=None, dataset_ids=None)
  • detach_dataset_from_project(dataset_id=None, dataset_ids=None)
  • bulk_assign_files(file_ids, new_status, assign_to=None)
  • upload_preannotations(annotation_format, annotation_file, conf_bucket=None, _async=False) (sketched below)
  • create_export(export_config)
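
For illustration, here is a hedged sketch of calling upload_preannotations with the signature listed above. The 'coco_json' format value and the file path are placeholder assumptions, not confirmed values, so check the SDK for the formats it actually accepts:

from labellerr.client import LabellerrClient
from labellerr.core.projects import LabellerrProject
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    project = LabellerrProject(client=client, project_id="your_project_id")

    # Upload pre-annotations synchronously.
    # 'coco_json' and the file path are illustrative placeholders;
    # verify the accepted annotation_format values in the SDK.
    result = project.upload_preannotations(
        annotation_format='coco_json',
        annotation_file='path/to/annotations.json'
    )
    print(f"Pre-annotations uploaded: {result}")

except LabellerrError as e:
    print(f"Failed to upload pre-annotations: {str(e)}")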

Best Practices

Use Auto-Pagination

Set page_size=-1 for list_datasets() to automatically handle all pages without manual intervention

Leverage Generators

Process datasets as they’re retrieved instead of loading all into memory at once

Filter by Scope

Use DataSetScope.client for workspace-level datasets or DataSetScope.project for project-specific ones

Error Handling

Always wrap API calls in try-except blocks to catch LabellerrError exceptions
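
Putting these practices together, here is a brief sketch that combines auto-pagination, lazy generator processing, scope filtering, and error handling, using only the calls documented above:

from labellerr.client import LabellerrClient
from labellerr.core.datasets import list_datasets
from labellerr.core.schemas import DataSetScope
from labellerr.core.exceptions import LabellerrError

# Initialize the client with your API credentials
client = LabellerrClient(
    api_key='your_api_key',
    api_secret='your_api_secret',
    client_id='your_client_id'
)

try:
    # Auto-paginate through all client-scoped image datasets,
    # processing each one as it is yielded instead of building a full list
    for dataset in list_datasets(client=client, datatype='image',
                                 scope=DataSetScope.client, page_size=-1):
        print(f"{dataset.get('dataset_id')}: {dataset.get('name')}")
except LabellerrError as e:
    print(f"Failed to list datasets: {str(e)}")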

The Labellerr SDK is a fast and reliable solution for managing annotation workflows. Want to try it end-to-end? Refer to this Google Colab Cookbook for a ready-to-run tutorial. For more related cookbooks and examples, please visit our repository: Labellerr Hands-On Learning