
BEOD API Tutorial: Uploading Data

Introduction

BrightEarth On Demand (BEOD) can be used through the Web application or accessed via an API.

This API provides programmatic access to all functionalities of the BEOD platform.

The BEOD extraction process can be executed on basemaps, provider images, or your own custom images.

In this tutorial you will learn how to use the BEOD API to upload your own images and metadata to use as the source image for an extraction process.

The same procedure applies to uploading vector data (for example, to be used as fine-tuning ground truth).

Prerequisites to upload an image

Once uploaded, a user image will appear as a user raster asset. An asset is always associated with a project, which means that images must be uploaded to an existing project.

The API endpoints for authentication, project creation, and AOI setup are covered in the introduction tutorial and will not be explained here. Please refer to https://landing.brightearth.ai/beod-api-tutorial for detailed explanations of these prerequisite steps.

Upload Process Overview

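The BEOD API uses a multi-part upload mechanism to handle large files efficiently. The upload process relies on three main API calls, with the file parts themselves uploaded to the presigned URLs between the second and third calls: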

Initialize: Prepare the upload by specifying the files and their characteristics
Get Multi-part URLs: Obtain presigned URLs for uploading file chunks
Complete: Finalize the upload after all parts have been uploaded

This approach allows for:
- Uploading large files in smaller, manageable chunks
- Resuming interrupted uploads
- Parallel upload of multiple parts (for better performance)
- Uploading multiple files as a single asset (e.g., image + metadata files)

Detailed Upload Process

Step 1: Initialize the Upload

The first step is to initialize the multi-part upload by calling the /users/me/projects/{project_id}/assets/upload/initialize endpoint.

Input Parameters:
- asset_name: A descriptive name for your asset
- asset_type: The type of asset ("raster", "vector", or "aoi")
- asset_files: An array of file information objects

Each file object in asset_files should contain:
- name: The filename
- size: File size in bytes
- parts_count: Number of chunks the file will be split into

Python Example:

import os
from typing import Dict, List

import requests

# We use 10 MB chunks (S3 multipart parts must be between 5 MiB and 5 GiB,
# except the last part, which may be smaller)
CHUNK_SIZE = 10 * 1024 * 1024

def get_local_files_chunks(files: List[str], chunk_size: int = CHUNK_SIZE) -> List[Dict]:
    """
    Get the local files and their chunks for upload.
    """
    upload_files = []
    for file in files:
        size = os.path.getsize(file)
        parts_count = size // chunk_size
        if size % chunk_size > 0:
            parts_count += 1
        upload_files.append({
            "name": file,
            "size": size,
            "parts_count": parts_count,
        })
    return upload_files

# Prepare file information
files = ["image.tif", "image.imd"]
files_chunks = get_local_files_chunks(files)

# Initialize the upload
headers = {"Authorization": f"Bearer {token}"}
initialize_request = {
    "asset_name": "my uploaded image",
    "asset_type": "raster",
    "asset_files": files_chunks
}

response = requests.post(
    f"{API_URL}/users/me/projects/{project_id}/assets/upload/initialize",
    json=initialize_request,
    headers=headers,
)

Response: The response contains:
- asset_id: The ID of the created asset
- activity_id: ID to track the upload progress
- uploads: A dictionary mapping filenames to their upload information (file_key and upload_id)

Example response:

 {
    "asset_id": 12345,
    "activity_id": 67890,
    "uploads": {
        "image.tif": ["file_key_123", "upload_id_456"],
        "image.imd": ["file_key_789", "upload_id_012"]
    }
}
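
The later steps need each file's file_key and upload_id together with the parts_count that was sent in asset_files. The following is a minimal sketch of one way to join the two by filename, assuming the keys of uploads are exactly the name values sent in the request (as in the example response above). The helper name index_uploads is hypothetical and not part of the BEOD API.

```python
from typing import Dict, List, Tuple


def index_uploads(
    initialize_response: Dict,
    files_chunks: List[Dict],
) -> Dict[str, Tuple[str, str, int]]:
    """Map each filename to (file_key, upload_id, parts_count)."""
    parts_by_name = {f["name"]: f["parts_count"] for f in files_chunks}
    indexed = {}
    for name, (file_key, upload_id) in initialize_response["uploads"].items():
        if name not in parts_by_name:
            # Assumption: the API echoes back the names we sent
            raise KeyError(f"Unexpected file in initialize response: {name}")
        indexed[name] = (file_key, upload_id, parts_by_name[name])
    return indexed
```

Looking files up by name avoids depending on the order in which the uploads dictionary lists them.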

Step 2: Get Multi-part Upload URLs

For each file, you need to get presigned URLs for uploading each part. This is done by calling /users/me/projects/{project_id}/assets/{asset_id}/upload/multi-part-urls.

Input Parameters:
- upload_id: The upload ID from the initialize response
- file_key: The file key from the initialize response
- parts: Array of part numbers (starting from 1)

Python Example:

initialize_response = response.json()
uploaded_asset_id = initialize_response["asset_id"]

for file_index, upload_file in enumerate(list(initialize_response['uploads'].keys())):
    file_key, upload_id = initialize_response["uploads"][upload_file]
    parts = list(range(1, files_chunks[file_index]["parts_count"] + 1))

    # Get upload URLs for the file parts
    get_multi_part_urls_request = {
        "upload_id": upload_id,
        "file_key": file_key,
        "parts": parts,
    }
    
    response = requests.post(
        f"{API_URL}/users/me/projects/{project_id}/assets/"
        f"{uploaded_asset_id}/upload/multi-part-urls",
        json=get_multi_part_urls_request,
        headers=headers,
    )
    
    get_multi_part_urls_response = response.json()
    # This response contains the URLs array for uploading each part

Response: The response contains an array of presigned URLs, one for each requested part:

 {
    "urls": [
        "https://s3.amazonaws.com/bucket/key?uploadId=...&partNumber=1",
        "https://s3.amazonaws.com/bucket/key?uploadId=...&partNumber=2",
        "..."
    ]
}
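
The tutorial code matches URLs to parts by position, so a quick consistency check before uploading can catch a mismatch early. This is a small sketch, assuming the urls array comes back in the same order as the requested parts. The helper name pair_parts_with_urls is hypothetical.

```python
from typing import Dict, List


def pair_parts_with_urls(parts: List[int], urls: List[str]) -> Dict[int, str]:
    """Return {part_number: presigned_url}, failing fast on a count mismatch."""
    if len(parts) != len(urls):
        raise ValueError(
            f"Requested {len(parts)} parts but received {len(urls)} URLs"
        )
    # Assumption: URLs are returned in the order the parts were requested
    return dict(zip(parts, urls))
```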

Step 3: Upload File Parts

Now you can upload each part of the file to its corresponding presigned URL using HTTP PUT requests.

Python Example:

from typing import Dict, List, Union

def upload_one_file(
    filename: str,
    parts: List[int],
    urls: List[str],
) -> List[Dict[str, Union[int, str]]]:
    """
    Upload a single file in multiple parts to the given URLs.
    """
    uploaded_parts = []
    with open(filename, "rb") as f:
        for part_index, part_number in enumerate(parts):
            # Read the part
            offset = (part_number - 1) * CHUNK_SIZE
            f.seek(offset)
            data = f.read(CHUNK_SIZE)

            # Upload the part
            print(f"uploading part {part_number} of {filename}...")
            response = requests.put(urls[part_index], data=data)

            if response.status_code != 200:
                print(f"Failed to upload part {part_number}")
                return []

            # Keep track of uploaded parts for completion step
            etag = response.headers["ETag"]
            uploaded_parts.append({"ETag": etag, "PartNumber": part_number})

    return uploaded_parts

# Upload the file parts
uploaded_parts = upload_one_file(
    upload_file,
    get_multi_part_urls_request["parts"],
    get_multi_part_urls_response["urls"]
)

Important Notes:
- Each part must be uploaded using HTTP PUT
- Save the ETag from each successful upload response as we have to provide it when completing the upload
- Parts can be uploaded in parallel for better performance
- Each part (except the last) should be exactly CHUNK_SIZE bytes
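
The note on parallel uploads can be sketched with a thread pool from the standard library. This is one possible approach, not the script shipped with the tutorial: the names read_part, upload_parts_in_parallel, put, and max_workers are hypothetical, and the put argument exists only so the function can be exercised without a network (it defaults to requests.put).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Union

CHUNK_SIZE = 10 * 1024 * 1024  # same value as in the tutorial


def read_part(filename: str, part_number: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read the bytes of one part (part numbers start at 1)."""
    with open(filename, "rb") as f:
        f.seek((part_number - 1) * chunk_size)
        return f.read(chunk_size)


def upload_parts_in_parallel(
    filename: str,
    parts: List[int],
    urls: List[str],
    put: Optional[Callable] = None,
    chunk_size: int = CHUNK_SIZE,
    max_workers: int = 4,
) -> List[Dict[str, Union[int, str]]]:
    """Upload all parts concurrently and return the ETag/PartNumber list."""
    if put is None:
        import requests  # imported lazily so the sketch can be tested offline
        put = requests.put

    def upload_one(part_number: int, url: str) -> Dict[str, Union[int, str]]:
        # Each worker reads only its own chunk, so memory use stays bounded
        data = read_part(filename, part_number, chunk_size)
        response = put(url, data=data)
        if response.status_code != 200:
            raise RuntimeError(
                f"Part {part_number} failed with status {response.status_code}"
            )
        return {"ETag": response.headers["ETag"], "PartNumber": part_number}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so parts stay sorted
        return list(pool.map(upload_one, parts, urls))
```

Unlike upload_one_file, this sketch raises on the first failed part instead of returning an empty list, so the caller cannot accidentally complete an upload with missing parts.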

Step 4: Complete the Upload

After all parts of a file are uploaded, you must complete the upload by calling /users/me/projects/{project_id}/assets/{asset_id}/upload/complete.

Input Parameters:
- upload_id: The upload ID from the initialize response
- file_key: The file key from the initialize response
- parts: Array of objects containing ETag and PartNumber for each uploaded part

Python Example:

 # Complete the upload for each file
complete_upload_request = {
    "upload_id": upload_id,
    "file_key": file_key,
    "parts": uploaded_parts  # Array of {"ETag": "...", "PartNumber": 1} objects
}

response = requests.post(
    f"{API_URL}/users/me/projects/{project_id}/assets/"
    f"{uploaded_asset_id}/upload/complete",
    json=complete_upload_request,
    headers=headers,
)

if response.status_code == 201:
    complete_response = response.json()
    print(f"File {file_key} - status: {complete_response['status']}")

Response: The response confirms the completion with a status boolean:

 {
    "status": true
}
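
S3 expects the parts of a completed multipart upload to be listed in ascending part-number order. The tutorial does not say whether the BEOD complete endpoint reorders them, so sorting on the client is a cautious assumption rather than a documented requirement. The sketch below also checks that no part is missing; build_complete_request is a hypothetical helper name.

```python
from typing import Dict, List, Union

Part = Dict[str, Union[int, str]]


def build_complete_request(
    upload_id: str,
    file_key: str,
    uploaded_parts: List[Part],
    parts_count: int,
) -> Dict:
    """Build the body for the complete call, with parts sorted and verified."""
    numbers = sorted(p["PartNumber"] for p in uploaded_parts)
    if numbers != list(range(1, parts_count + 1)):
        raise ValueError(
            f"Expected parts 1..{parts_count}, got {numbers}"
        )
    return {
        "upload_id": upload_id,
        "file_key": file_key,
        "parts": sorted(uploaded_parts, key=lambda p: p["PartNumber"]),
    }
```

This matters mostly when parts are uploaded in parallel or retried, since results may then be collected out of order.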

Step 5: Monitor Upload Processing

After all files are uploaded and completed, the system will post-process your asset. You can monitor this using the activity ID returned in the initialize step.

For a raster asset, the processing typically includes the creation of a displayable version of the image (as a COG).

Python Example:

import time

def follow_activity(token: str, project_id: str, activity_id: int) -> bool:
    """
    Follow an activity until it terminates.
    """
    url = f"{API_URL}/users/me/projects/{project_id}/activities/{activity_id}"
    headers = {"Authorization": f"Bearer {token}"}
    
    while True:
        response = requests.get(url, headers=headers)
        if response.status_code != 200:
            print('Unable to track the activity')
            return False
        
        json_response = response.json()
        state = json_response["state"]
        progress = json_response["progress"]
        
        print(f'Activity state: {state}, progress: {progress}')
        
        if state == "SUCCESSFUL":
            return True
        elif state == "FAILED":
            return False
        
        time.sleep(2)  # Check every 2 seconds

# Monitor the processing
activity_id = initialize_response["activity_id"]
success = follow_activity(token, project_id, activity_id)
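
The polling loop above never exits if the activity stays in a state other than SUCCESSFUL or FAILED. A variant with a deadline might look like the sketch below. The state names come from the tutorial; the function name wait_for_activity and its parameters are hypothetical, and fetch_activity stands for any callable that returns the activity JSON as a dictionary.

```python
import time
from typing import Callable, Dict


def wait_for_activity(
    fetch_activity: Callable[[], Dict],
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the activity succeeds or fails, or raise after timeout_s."""
    deadline = clock() + timeout_s
    while True:
        activity = fetch_activity()
        if activity["state"] == "SUCCESSFUL":
            return True
        if activity["state"] == "FAILED":
            return False
        if clock() >= deadline:
            raise TimeoutError(
                f"Activity still in state {activity['state']} after {timeout_s}s"
            )
        sleep(interval_s)
```

With the tutorial's variables, fetch_activity could be `lambda: requests.get(url, headers=headers).json()`, where url is the activity URL built in follow_activity.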

Complete Upload Example

Here's the complete beod_upload_asset.py script that demonstrates the entire upload process:

"""
This sample script shows how to upload a TIF image and its associated IMD metadata file
to the BEOD platform using the API.

The upload creates a user-image raster asset that can be used as the source for an extraction.

It performs the following steps:
1. Logs in the user and gets the access token.
2. Creates a project.
3. Creates an AOI (Area of Interest) from a GeoJSON file.
4. Uploads the TIF image and its associated IMD metadata file in multiple parts.
5. Follows the activity to see the progress of the post-processing of the uploaded asset.
"""

import json
import os
import time
from datetime import datetime
from typing import Dict, List, Union

import requests

# We use 10 MB chunks (S3 multipart parts must be between 5 MiB and 5 GiB,
# except the last part, which may be smaller)
CHUNK_SIZE = 10 * 1024 * 1024

# dev API URL
# API_URL = "https://dev.api.beod.luxcarta.cloud/v1"

# production API URL
API_URL = "https://api.beod.luxcarta.cloud/v1"


def check_response(expected_code: int, response) -> bool:
    """
    Check if the response status code is as expected.
    If not, print the status code, reason, and text of the response.
    """
    if response.status_code != expected_code:
        print("FAILED")
        print("STATUS CODE:", response.status_code, " - EXPECTED:", expected_code)
        print("REASON:", response.reason)
        print("TEXT:", response.text)
        return False
    return True


def get_aoi_polygon(geojson_file) -> str:
    """
    Get the WKT polygon from a GeoJSON file.
    The geometry of first feature in the GeoJSON is used.
    """
    with open(geojson_file, "r") as f:
        geojson_data = json.load(f)
    
    if not geojson_data or "features" not in geojson_data or not geojson_data["features"]:
        raise ValueError("Invalid GeoJSON file")
    
    first_feature = geojson_data["features"][0]
    if "geometry" not in first_feature or "coordinates" not in first_feature["geometry"]:
        raise ValueError("No geometry found in GeoJSON feature")
           
    coordinates = first_feature["geometry"]["coordinates"]
    if first_feature["geometry"]["type"] == "Polygon":
        coords_str = ", ".join([f"{coord[0]} {coord[1]}" for coord in coordinates[0]])
        wkt_polygon = f"POLYGON (({coords_str}))"
    else:
        raise ValueError("Only Polygon geometry is supported")
    
    return wkt_polygon


def login(username: str, password: str) -> dict:
    """
    This function logs in the user and returns the access token
    """
    url = API_URL + "/auth/login"
    data = {
        "email": username,
        "password": password
    }
    response = requests.post(url, json=data)
    if response.status_code == 200:
        return response.json()
    else:
        return None


def create_project(token: str, project_name: str) -> int:
    """
    Create a project with the given name.
    """
    url = API_URL + "/users/me/projects"
    headers = {"Authorization": f"Bearer {token}"}
    data = {"name": project_name}
    response = requests.post(url, json=data, headers=headers)
    
    if response.status_code == 201:
        return response.json()["project_id"]
    else:
        return None


def create_aoi(token: str, project_id: str, wkt_geometry: str) -> int:
    """
    Create an aoi with the given geometry for the specified project.
    """
    url = API_URL + f"/users/me/projects/{project_id}/assets/aoi"
    headers = {"Authorization": f"Bearer {token}"}
    data = {"geometry": wkt_geometry}
    response = requests.post(url, json=data, headers=headers)
    
    if response.status_code == 201:
        return response.json()["asset_id"]
    else:
        return None


def follow_activity(token: str, project_id: str, activity_id: int) -> bool:
    """
    Follow an activity in a project until it terminates.
    """
    url = API_URL + f"/users/me/projects/{project_id}/activities/{activity_id}"
    headers = {"Authorization": f"Bearer {token}"}
    
    while True:
        response = requests.get(url, headers=headers)

        if response.status_code != 200:
            print('Unable to track the activity')
            return False
        
        json_response = response.json()
        state = json_response["state"]
        progress = json_response["progress"]
        details = f", details: {json_response.get('details', '')}" if 'details' in json_response else ''
        print(f'> Activity state: {state}{details}, progress: {progress}')
        
        if state == "SUCCESSFUL":
            return True
        elif state == "FAILED":
            return False
        
        time.sleep(1)


def upload_one_file(
    filename: str,
    parts: List[int],
    urls: List[str],
) -> List[Dict[str, Union[int, str]]]:
    """ 
    Upload a single file in multiple parts to the given URLs.
    """
    uploaded_parts = []
    with open(filename, "rb") as f:
        for part_index, part_number in enumerate(parts):
            # read the part
            offset = (part_number - 1) * CHUNK_SIZE
            f.seek(offset)
            data = f.read(CHUNK_SIZE)

            # upload the part
            print(f"uploading part {part_number} of {filename}...")
            headers = {}
            response = requests.put(urls[part_index], data=data, headers=headers)

            if not check_response(200, response):
                return []

            # keep track of the uploaded parts as we will need them to complete the upload
            etag = response.headers["ETag"]
            uploaded_parts.append({"ETag": etag, "PartNumber": part_number})

    return uploaded_parts


def get_local_files_chunks(files: List[str], chunk_size: int = CHUNK_SIZE) -> List[Dict]:
    """
    Get the local files and their chunks for upload.
    """
    upload_files = []
    for file in files:
        size = os.path.getsize(file)
        parts_count = size // chunk_size
        if size % chunk_size > 0:
            parts_count += 1
        upload_files.append({
            "name": file,
            "size": size,
            "parts_count": parts_count,
        })
        
    return upload_files


def upload_image_asset(token: str, project_id: int, files: List[str]):
    """
    Upload multiple files as a single raster asset.
    """
    # Step 1: Initialize the upload
    files_chunks = get_local_files_chunks(files)
    headers = {"Authorization": f"Bearer {token}"}

    initialize_request = {
        "asset_name": "my uploaded image",
        "asset_type": "raster",
        "asset_files": files_chunks
    }

    response = requests.post(
        f"{API_URL}/users/me/projects/{project_id}/assets/upload/initialize",
        json=initialize_request,
        headers=headers,
    )

    if not check_response(201, response):
        return

    initialize_response = response.json()
    uploaded_asset_id = initialize_response["asset_id"]
    activity_id = initialize_response["activity_id"]

    # Step 2-4: For each file, get URLs, upload parts, and complete
    for file_index, upload_file in enumerate(list(initialize_response['uploads'].keys())):
        file_key, upload_id = initialize_response["uploads"][upload_file]
        parts = list(range(1, files_chunks[file_index]["parts_count"] + 1))
 
        # Get the upload urls for the file parts
        get_multi_part_urls_request = {
            "upload_id": upload_id,
            "file_key": file_key,
            "parts": parts,
        }
        
        response = requests.post(
            f"{API_URL}/users/me/projects/{project_id}/assets/"
            f"{uploaded_asset_id}/upload/multi-part-urls",
            json=get_multi_part_urls_request,
            headers=headers,
        )

        if not check_response(200, response):
            return

        get_multi_part_urls_response = response.json()

        # Upload the file parts
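        uploaded_parts = upload_one_file(
            upload_file,
            get_multi_part_urls_request["parts"],
            get_multi_part_urls_response["urls"]
        )

        # upload_one_file returns an empty list when a part fails:
        # do not try to complete an upload with missing parts
        if not uploaded_parts:
            print(f"Upload of {upload_file} failed, aborting")
            return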

        # Complete the upload
        complete_upload_request = {
            "upload_id": upload_id,
            "file_key": file_key,
            "parts": uploaded_parts
        }

        response = requests.post(
            f"{API_URL}/users/me/projects/{project_id}/assets/"
            f"{uploaded_asset_id}/upload/complete",
            json=complete_upload_request,
            headers=headers,
        )

        if not check_response(201, response):
            return

        complete_response = response.json()
        print(f"File {file_key} - status = {complete_response['status']}")

    # Step 5: Follow the activity to monitor processing
    print("Following the activity to see the progress of the post-processing...")
    follow_activity(token, project_id, activity_id)


if __name__ == "__main__":
    # Local data files
    aoi_file = "data/aoi.geojson"
    tif_file = "data/image.tif"
    imd_file = "data/image.imd"
    
    # Credentials must be defined in the environment variables
    if 'TEST_USER' not in os.environ or 'TEST_PASSWORD' not in os.environ:
        print("Please set the TEST_USER and TEST_PASSWORD environment variables.")
        exit(1)
        
    # Get the user credentials from the environment variables
    user = os.environ['TEST_USER']
    password = os.environ['TEST_PASSWORD']
    
    # First setup a project with an AOI which fits the uploaded image
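    aoi_wkt_polygon = get_aoi_polygon(geojson_file=aoi_file)
    # login() returns None on failure, so check before reading the token
    login_response = login(user, password)
    assert login_response is not None, "Login failed, please check your credentials"
    token = login_response["access_token"]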
    
    project_id = create_project(token, f"Upload Tutorial project {datetime.now().strftime('%Y-%m-%d/%H:%M:%S')}")
    assert project_id is not None, "Project creation failed"
    
    aoi_id = create_aoi(token, project_id, aoi_wkt_polygon)
    assert aoi_id is not None, "AOI creation failed"
    
    # Upload the TIF image and its associated IMD metadata file
    files = [tif_file, imd_file]
    upload_image_asset(token, project_id, files)

Error Handling and Best Practices

Common Issues and Solutions

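Large File Handling: S3 multipart uploads allow at most 10,000 parts per file, so 10 MB chunks cover files up to roughly 100 GB. Larger files need a larger chunk size (each part can be up to 5 GB). These limits assume the BEOD storage backend follows the standard S3 rules.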
Network Interruptions: Implement retry logic for failed part uploads
Parallel Uploads: For better performance, upload parts in parallel using threading
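
For very large files, the chunk size can be derived from the file size. The sketch below assumes the presigned URLs are subject to the standard S3 multipart limits (at most 10,000 parts, each between 5 MiB and 5 GiB except the last), which the tutorial does not state explicitly. The names choose_chunk_size and count_parts are hypothetical.

```python
MIN_PART_SIZE = 5 * 1024 * 1024         # S3 minimum part size (except the last part)
MAX_PART_SIZE = 5 * 1024 * 1024 * 1024  # S3 maximum part size
MAX_PARTS = 10_000                      # S3 maximum number of parts per upload
DEFAULT_CHUNK_SIZE = 10 * 1024 * 1024   # value used throughout the tutorial


def count_parts(file_size: int, chunk_size: int) -> int:
    """Number of chunks needed for file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


def choose_chunk_size(file_size: int, preferred: int = DEFAULT_CHUNK_SIZE) -> int:
    """Return the preferred chunk size, enlarged if the file would need too many parts."""
    chunk_size = max(preferred, MIN_PART_SIZE)
    # Smallest chunk size that fits the whole file into MAX_PARTS parts
    needed = count_parts(file_size, MAX_PARTS)
    chunk_size = max(chunk_size, needed)
    if chunk_size > MAX_PART_SIZE:
        raise ValueError("File is too large for a single multipart upload")
    return chunk_size
```

Whatever value is chosen must be used consistently: the same chunk size has to drive both the parts_count sent at initialization and the offsets used when reading each part.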

Recommended Practices

Use appropriate chunk sizes (10MB works well for most cases)
Implement exponential backoff for retries
Monitor activity status to ensure processing completes successfully
Keep track of upload progress for user feedback
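
The retry and backoff recommendations can be combined in a small wrapper. This is a generic sketch, not something provided by the BEOD API: retry_with_backoff and its parameters are hypothetical names, and the broad `except Exception` should be narrowed to network-related errors in real code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds, doubling the delay bound after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential bound: base, 2*base, 4*base, ... capped at max_delay_s
            bound = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            # Random jitter spreads out retries from concurrent workers
            sleep(random.uniform(0, bound))
    raise AssertionError("unreachable")
```

A part upload could then be wrapped as `retry_with_backoff(lambda: upload_part(url, data))`, where upload_part is any function that raises when the PUT request fails.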

Summary

This tutorial has covered the three main steps of the BEOD image upload process:

Initialize: Set up the multi-part upload with file information
Get Multi-part URLs: Obtain presigned URLs for each file chunk
Upload and Complete: Upload file parts and finalize the process

The multi-part upload approach provides robustness and efficiency for handling large image files and their associated metadata. Once uploaded, your images become available as user raster assets within your BEOD project and can be used as source images for extraction scenarios.

Remember to always monitor the upload activity to ensure successful processing of your uploaded assets before proceeding with any extraction workflows.

Download Complete Example

To get started quickly with the BEOD upload process, you can download a complete working example that includes:

beod_upload_asset.py: The complete Python script shown above
Sample data files: Example TIF image, IMD metadata, and GeoJSON AOI files

Download BEOD Upload Tutorial Package

The package contains everything you need to test the upload functionality with your own BEOD account. Simply extract the files, set your environment variables, and run the script.

Conclusion

This tutorial has provided a comprehensive guide to uploading images and metadata to the BrightEarth On Demand platform using the BEOD API. By following the outlined steps, you can efficiently manage large file uploads, ensuring your data is ready for extraction and analysis.