Skip to main content

File Utilities

File operations and handling utilities for common tasks.

Module Overview

The file utilities are organized into specialized modules:

ModulePurpose
synapse_sdk.utils.file.archiveZIP archive creation and extraction
synapse_sdk.utils.file.checksumFile hash calculations and verification
synapse_sdk.utils.file.downloadFile downloading with async support
synapse_sdk.utils.file.ioJSON/YAML file operations
synapse_sdk.utils.file.requirementsPython requirements parsing

All functions are accessible from the main module:

from synapse_sdk.utils.file import (
create_archive,
calculate_checksum,
download_file,
get_dict_from_file,
)

Archive Operations

Create and extract ZIP archives with filtering and progress tracking.

create_archive

Create a ZIP archive from a directory.

from synapse_sdk.utils.file import create_archive
from pathlib import Path

# Basic usage
archive_path = create_archive(
source_path=Path("./my_plugin"),
archive_path=Path("./my_plugin.zip")
)

create_archive_from_git

Create an archive respecting .gitignore rules.

from synapse_sdk.utils.file import create_archive_from_git
from pathlib import Path

# Archive only git-tracked files
archive_path = create_archive_from_git(
source_path=Path("./my_project"),
archive_path=Path("./release.zip")
)

extract_archive

Extract a ZIP archive.

from synapse_sdk.utils.file import extract_archive
from pathlib import Path

extract_archive(
archive_path=Path("./archive.zip"),
output_path=Path("./extracted")
)

Progress Callback

Monitor archive progress with a callback function:

from synapse_sdk.utils.file import create_archive
from pathlib import Path

def on_progress(current: int, total: int) -> None:
percent = (current / total) * 100
print(f"Progress: {percent:.1f}% ({current}/{total} files)")

create_archive(
source_path=Path("./data"),
archive_path=Path("./data.zip"),
progress_callback=on_progress
)

Checksum Functions

Calculate and verify file hashes using various algorithms.

calculate_checksum

Calculate checksum for a file.

from synapse_sdk.utils.file import calculate_checksum
from pathlib import Path

# Default: MD5
checksum = calculate_checksum(Path("./file.bin"))

# Specify algorithm (use string literals)
sha256_checksum = calculate_checksum(
Path("./file.bin"),
algorithm='sha256'
)

Supported Algorithms

HashAlgorithm is a Literal type that can be imported for type hints:

from synapse_sdk.utils.file import HashAlgorithm

def hash_file(path: str, algo: HashAlgorithm = 'sha256') -> str:
return calculate_checksum(path, algorithm=algo)
AlgorithmUse Case
'md5'Default, legacy compatibility, fast
'sha1'Git compatibility
'sha256'Secure
'sha512'Maximum security

Note: Pass algorithm values as string literals (e.g., algorithm='sha256'). The HashAlgorithm type provides IDE autocompletion and type checking.

verify_checksum

Verify a file against an expected checksum.

from synapse_sdk.utils.file import verify_checksum
from pathlib import Path

is_valid = verify_checksum(
file_path=Path("./download.zip"),
expected="a1b2c3d4e5f6..."
)

if not is_valid:
raise ValueError("Checksum mismatch - file may be corrupted")

calculate_checksum_from_file_object

Calculate checksum from file-like objects.

from synapse_sdk.utils.file import calculate_checksum_from_file_object
from io import BytesIO

data = BytesIO(b"Hello, world!")
checksum = calculate_checksum_from_file_object(data)

Good to know: Checksum functions read in chunks (1MB by default) for memory efficiency with large files.


Download Functions

Download files from URLs with both synchronous and asynchronous support.

download_file

Synchronous file download. The filename is automatically derived from the URL (or generated from URL hash).

from synapse_sdk.utils.file import download_file
from pathlib import Path

# Download to directory (filename derived from URL)
local_path = download_file(
url="https://example.com/file.zip",
path_download=Path("./downloads")
)

# With custom filename
local_path = download_file(
url="https://example.com/file.zip",
path_download=Path("./downloads"),
name="my_file" # Results in my_file.zip
)

adownload_file

Asynchronous file download.

import asyncio
from synapse_sdk.utils.file import adownload_file
from pathlib import Path

async def main():
path = await adownload_file(
url="https://example.com/file.zip",
path_download=Path("./downloads")
)
print(f"Downloaded to: {path}")

asyncio.run(main())

Batch Downloads

Convert file URLs to local paths by downloading them. The function modifies the dictionary in-place.

from synapse_sdk.utils.file import files_url_to_path

# Simple URL values - replaced with local Paths
files = {
'image1': 'https://example.com/file1.jpg',
'image2': 'https://example.com/file2.jpg',
}
files_url_to_path(files) # In-place modification
# files['image1'] is now a local Path object

# Dict values with 'url' key - 'url' is replaced with 'path'
files = {
'video': {'url': 'https://example.com/vid.mp4', 'size': 1024},
}
files_url_to_path(files)
# files['video'] is now {'path': Path(...), 'size': 1024}

For async batch downloads, use afiles_url_to_path:

import asyncio
from synapse_sdk.utils.file import afiles_url_to_path

async def main():
files = {
'image1': 'https://example.com/file1.jpg',
'image2': 'https://example.com/file2.jpg',
}
await afiles_url_to_path(files)
print(files) # URLs replaced with local Paths

asyncio.run(main())

I/O Functions

Read and write structured data files.

get_dict_from_file

Load a dictionary from JSON or YAML files.

from synapse_sdk.utils.file import get_dict_from_file

# JSON file
config = get_dict_from_file("/path/to/config.json")

# YAML file
settings = get_dict_from_file("/path/to/settings.yaml")

get_temp_path

Get a temporary directory path for SDK operations.

from synapse_sdk.utils.file import get_temp_path

# Get base temp directory: /tmp/datamaker
temp_dir = get_temp_path()

# Get subdirectory path: /tmp/datamaker/media
media_dir = get_temp_path("media")

Requirements Parsing

Parse Python requirements files.

read_requirements

Parse a requirements.txt file into a list. Returns None if the file doesn't exist or contains no valid requirements.

from synapse_sdk.utils.file import read_requirements

requirements = read_requirements("./requirements.txt")

if requirements:
print(requirements)
# ['numpy>=1.20.0', 'pandas>=1.3.0', 'requests>=2.25.0']
else:
print("No requirements found or file doesn't exist")

See Also