File Utilities
Utilities for common file operations, split into specialized modules so that each concern (archiving, checksums, chunked reads, downloads, encoding, I/O, video) can be imported and maintained on its own.
Module Overview
The file utilities have been refactored into a modular structure with specialized modules for different operations:
- synapse_sdk.utils.file.archive - ZIP archive creation and extraction
- synapse_sdk.utils.file.checksum - File hash calculations and verification
- synapse_sdk.utils.file.chunking - Memory-efficient file reading in chunks
- synapse_sdk.utils.file.download - File downloading utilities with async support
- synapse_sdk.utils.file.encoding - Base64 encoding and file format handling
- synapse_sdk.utils.file.io - General I/O operations for JSON/YAML files
- synapse_sdk.utils.file.video - Video transcoding and format conversion
Backward Compatibility
All functions remain accessible from the main module import:
# Both approaches work identically
from synapse_sdk.utils.file import read_file_in_chunks, download_file
from synapse_sdk.utils.file.chunking import read_file_in_chunks
from synapse_sdk.utils.file.download import download_file
Archive Operations
Functions for creating and extracting ZIP archives.
from synapse_sdk.utils.file.archive import archive, unarchive
# Create archive
archive('/path/to/directory', '/path/to/output.zip')
# Extract archive
unarchive('/path/to/archive.zip', '/path/to/extract/directory')
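For readers who want to see what helpers of this kind typically do, here is a minimal sketch built only on the standard library's zipfile module. The names zip_directory and unzip_to are hypothetical and are not part of synapse_sdk; the SDK's own archive and unarchive may differ, for example in how they treat the top-level directory.

```python
import zipfile
from pathlib import Path


def zip_directory(source_dir, zip_path):
    """Hypothetical helper: write every file under source_dir into a ZIP."""
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob('*')):
            if path.is_file():
                # Store paths relative to source_dir so extraction is relocatable
                zf.write(path, path.relative_to(source).as_posix())
    return Path(zip_path)


def unzip_to(zip_path, target_dir):
    """Hypothetical helper: extract a ZIP into target_dir, creating it if needed."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    return target
```

The output ZIP should live outside the directory being archived; otherwise the sketch would try to add the archive to itself.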
Chunked File Operations
read_file_in_chunks
Reads a file in fixed-size chunks so that memory usage stays bounded regardless of file size. This is useful for large files and for uploading or hashing a file piece by piece.
from synapse_sdk.utils.file.chunking import read_file_in_chunks
# Read a file in default 50MB chunks
for chunk in read_file_in_chunks('/path/to/large_file.bin'):
    process_chunk(chunk)
# Read with custom chunk size (10MB)
for chunk in read_file_in_chunks('/path/to/file.bin', chunk_size=1024*1024*10):
    upload_chunk(chunk)
Parameters:
- file_path (str | Path): Path to the file to read
- chunk_size (int, optional): Size of each chunk in bytes. Defaults to 50MB (52,428,800 bytes)
Returns:
- Generator yielding file content chunks as bytes
Raises:
- FileNotFoundError: If the file doesn't exist
- PermissionError: If the file can't be read due to permissions
- OSError: If there's an OS-level error reading the file
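To make the generator behavior concrete, the following is a rough sketch of how a chunked reader with this interface could be written. The name iter_file_chunks is hypothetical; it is a stand-in for illustration, not the SDK's actual implementation.

```python
from pathlib import Path

DEFAULT_CHUNK_SIZE = 1024 * 1024 * 50  # 50MB, matching the documented default


def iter_file_chunks(file_path, chunk_size=DEFAULT_CHUNK_SIZE):
    """Hypothetical stand-in for read_file_in_chunks: yield bytes chunks."""
    with Path(file_path).open('rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # An empty read means end of file
                break
            yield chunk
```

Because this sketch is a generator, the file is opened on the first iteration rather than at call time, so a FileNotFoundError would surface when the loop starts. Every chunk except possibly the last has exactly chunk_size bytes.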
Use Cases
Large File Processing: Efficiently process files that are too large to fit in memory:
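import hashlib
from synapse_sdk.utils.file.chunking import read_file_in_chunks
def calculate_hash_for_large_file(file_path):
    # Feed MD5 one chunk at a time so the whole file is never held in memory
    hash_md5 = hashlib.md5()
    for chunk in read_file_in_chunks(file_path):
        hash_md5.update(chunk)
    return hash_md5.hexdigest()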
Chunked Upload Integration: The function integrates seamlessly with the CoreClientMixin.create_chunked_upload method:
from synapse_sdk.clients.backend.core import CoreClientMixin
client = CoreClientMixin(base_url='https://api.example.com')
result = client.create_chunked_upload('/path/to/large_file.zip')
Best Practices:
- Use default chunk size (50MB) for optimal upload performance
- Adjust chunk size based on available memory and network conditions
- For very large files (>1GB), consider using smaller chunks for better progress tracking
- Always handle exceptions when working with file operations
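The trade-off between chunk size and progress granularity comes down to simple arithmetic. The helper below, chunk_count, is a hypothetical illustration and not an SDK function.

```python
import math

MB = 1024 * 1024


def chunk_count(file_size, chunk_size=50 * MB):
    """Number of chunks needed to cover file_size bytes (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    return math.ceil(file_size / chunk_size)


two_gb = 2 * 1024 * MB
print(chunk_count(two_gb))           # 41 chunks at the default 50MB
print(chunk_count(two_gb, 10 * MB))  # 205 chunks at 10MB
```

With one progress update per chunk, a 2GB file advances in steps of roughly 2.4% at the default size and roughly 0.5% with 10MB chunks.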
Checksum Functions
calculate_checksum
Calculate checksum for regular files:
from synapse_sdk.utils.file.checksum import calculate_checksum
checksum = calculate_checksum('/path/to/file.bin')
get_checksum_from_file
Calculate checksum for file-like objects without requiring Django dependencies. This function works with any file-like object that has a read() method, making it compatible with Django's File objects, BytesIO, StringIO, and regular file objects.
import hashlib
from io import BytesIO
from synapse_sdk.utils.file.checksum import get_checksum_from_file
# Basic usage with BytesIO (defaults to SHA1)
data = BytesIO(b'Hello, world!')
checksum = get_checksum_from_file(data)
print(checksum) # SHA1 hash as hexadecimal string
# Using different hash algorithms
checksum_md5 = get_checksum_from_file(data, digest_mod=hashlib.md5)
checksum_sha256 = get_checksum_from_file(data, digest_mod=hashlib.sha256)
# With real file objects
with open('/path/to/file.txt', 'rb') as f:
    checksum = get_checksum_from_file(f)
Parameters:
- file (IO[Any]): File-like object with read() method that supports reading in chunks
- digest_mod (Callable[[], Any], optional): Hash algorithm from hashlib. Defaults to hashlib.sha1
Returns:
str: Hexadecimal digest of the file contents
Key Features:
- Memory Efficient: Reads files in 4KB chunks to handle large files
- Automatic File Pointer Reset: Resets to beginning if the file object supports seeking
- Text/Binary Agnostic: Handles both text (StringIO) and binary (BytesIO) file objects
- No Django Dependency: Works without Django while being compatible with Django File objects
- Flexible Hash Algorithms: Supports any hashlib algorithm (SHA1, SHA256, MD5, etc.)
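The features listed above can be sketched in a few lines. This is an illustrative stand-in under stated assumptions, not the SDK source: the name checksum_from_filelike is hypothetical, and encoding text chunks as UTF-8 is an assumption about how str input would be handled.

```python
import hashlib
from io import BytesIO, StringIO


def checksum_from_filelike(file, digest_mod=hashlib.sha1, chunk_size=4096):
    """Hypothetical stand-in for get_checksum_from_file."""
    digest = digest_mod()
    # Rewind when the object supports it, so repeated calls hash everything
    seekable = getattr(file, 'seekable', None)
    if callable(seekable) and seekable():
        file.seek(0)
    while True:
        chunk = file.read(chunk_size)
        if not chunk:
            break
        # Text-mode objects return str; encode before hashing (UTF-8 assumed)
        if isinstance(chunk, str):
            chunk = chunk.encode('utf-8')
        digest.update(chunk)
    return digest.hexdigest()


data = BytesIO(b'Hello, world!')
first = checksum_from_filelike(data)
second = checksum_from_filelike(data)  # same result thanks to the rewind
text = checksum_from_filelike(StringIO('Hello, world!'))
```

The rewind step is what makes it safe to hash the same object with several algorithms in a row, as in the usage example earlier in this section.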
Download Functions
Utilities for downloading files from URLs with both synchronous and asynchronous support.
from synapse_sdk.utils.file.download import download_file, adownload_file
# Synchronous download
local_path = download_file(url, destination)
# Asynchronous download (await is only valid inside a coroutine)
import asyncio
async def main():
    return await adownload_file(url, destination)
local_path = asyncio.run(main())
# URL to path conversion for multiple files
from synapse_sdk.utils.file.download import files_url_to_path
paths = files_url_to_path(url_list, destination_directory)
Encoding Functions
Base64 encoding utilities for files.
from synapse_sdk.utils.file.encoding import convert_file_to_base64
# Convert file to base64
base64_data = convert_file_to_base64('/path/to/file.jpg')
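As a point of reference, plain Base64 encoding of a file takes only a few lines with the standard library. The helper file_to_base64 below is hypothetical; the SDK's convert_file_to_base64 may return a different shape (for example a data URI with a MIME prefix), so treat this only as a sketch of the underlying idea.

```python
import base64
from pathlib import Path


def file_to_base64(file_path):
    """Hypothetical helper: return the file's bytes as a Base64 ASCII string."""
    raw = Path(file_path).read_bytes()
    return base64.b64encode(raw).decode('ascii')
```

Base64 output is about a third larger than the input, and this sketch reads the whole file into memory, so it suits small assets such as images rather than large media files.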
I/O Functions
General I/O operations for structured data files.
from synapse_sdk.utils.file.io import get_dict_from_file, get_temp_path
# Load dictionary from JSON or YAML file
config = get_dict_from_file('/path/to/config.json')
settings = get_dict_from_file('/path/to/settings.yaml')
# Get temporary file path
temp_path = get_temp_path()
temp_subpath = get_temp_path('subdir/file.tmp')
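A loader that accepts both JSON and YAML usually dispatches on the file extension. The sketch below shows that pattern; load_dict is a hypothetical name, the dispatch rule is an assumption about how get_dict_from_file chooses a parser, and the YAML branch assumes PyYAML is installed.

```python
import json
from pathlib import Path


def load_dict(file_path):
    """Hypothetical loader: pick a parser from the file extension."""
    path = Path(file_path)
    suffix = path.suffix.lower()
    text = path.read_text(encoding='utf-8')
    if suffix == '.json':
        return json.loads(text)
    if suffix in ('.yaml', '.yml'):
        import yaml  # PyYAML, imported lazily so JSON-only use needs no extra package
        return yaml.safe_load(text)
    raise ValueError(f'Unsupported file type: {suffix!r}')
```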
Video Transcoding
Advanced video transcoding capabilities using FFmpeg for format conversion, compression, and optimization.
Requirements
- ffmpeg-python: pip install ffmpeg-python
- FFmpeg: Must be installed on the system and available in PATH
Supported Video Formats
The video module supports a wide range of input formats:
- MP4 (.mp4, .m4v)
- AVI (.avi)
- MOV (.mov)
- MKV (.mkv)
- WebM (.webm)
- FLV (.flv)
- WMV (.wmv)
- MPEG (.mpeg, .mpg)
- 3GP (.3gp)
- OGV (.ogv)
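A format check over this list can be as simple as a case-insensitive suffix lookup. The sketch below is illustrative: has_supported_extension and SUPPORTED_EXTENSIONS are hypothetical names, and checking only the extension (rather than inspecting file contents) is an assumption.

```python
from pathlib import Path

# Extensions copied from the list above
SUPPORTED_EXTENSIONS = {
    '.mp4', '.m4v', '.avi', '.mov', '.mkv', '.webm',
    '.flv', '.wmv', '.mpeg', '.mpg', '.3gp', '.ogv',
}


def has_supported_extension(file_path):
    """Hypothetical check: compare the lowercased suffix against the set."""
    return Path(file_path).suffix.lower() in SUPPORTED_EXTENSIONS
```

An extension check is cheap but says nothing about whether the file actually decodes; a renamed text file would still pass.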
Core Functions
validate_video_format
Check if a file has a supported video format:
from synapse_sdk.utils.file.video.transcode import validate_video_format
if validate_video_format('video.mp4'):
    print("Supported format")
else:
    print("Unsupported format")
get_video_info
Extract metadata from video files:
from synapse_sdk.utils.file.video.transcode import get_video_info
info = get_video_info('input.mp4')
print(f"Duration: {info['duration']} seconds")
print(f"Resolution: {info['width']}x{info['height']}")
print(f"Video Codec: {info['video_codec']}")
print(f"Audio Codec: {info['audio_codec']}")
print(f"FPS: {info['fps']}")
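ffprobe reports frame rates as rational strings such as 30000/1001 (in fields like r_frame_rate), so any tool that exposes a numeric FPS has to convert them. The sketch below shows one way to do that; parse_frame_rate is a hypothetical name, and whether get_video_info performs this conversion internally is an assumption.

```python
from fractions import Fraction


def parse_frame_rate(rate):
    """Convert an ffprobe-style rate string like '30000/1001' to a float."""
    try:
        return float(Fraction(rate))
    except (ValueError, ZeroDivisionError):
        # ffprobe can emit '0/0' for streams with no known rate
        return 0.0
```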
transcode_video
Main transcoding function with extensive configuration options:
from synapse_sdk.utils.file.video.transcode import transcode_video, TranscodeConfig
from pathlib import Path
# Basic transcoding with default settings
output_path = transcode_video('input.avi', 'output.mp4')
# Custom configuration
config = TranscodeConfig(
    vcodec='libx264',        # Video codec
    preset='fast',           # Encoding speed vs quality
    crf=20,                  # Quality (lower = better quality)
    acodec='aac',            # Audio codec
    audio_bitrate='128k',    # Audio bitrate
    resolution='1920x1080',  # Output resolution
    fps=30,                  # Frame rate
    start_time=10.0,         # Start from 10 seconds
    duration=60.0            # Only process 60 seconds
)
output_path = transcode_video('input.mkv', 'output.mp4', config)
TranscodeConfig Options
from dataclasses import dataclass
from typing import Optional
@dataclass
class TranscodeConfig:
    vcodec: str = 'libx264'            # Video codec (libx264, libx265, etc.)
    preset: str = 'medium'             # Encoding preset (fast, medium, slow)
    crf: int = 28                      # Quality factor (0-51, lower = better)
    acodec: str = 'aac'                # Audio codec (aac, opus, etc.)
    audio_bitrate: str = '128k'        # Audio bitrate
    movflags: str = '+faststart'       # MP4 optimization flags
    resolution: Optional[str] = None   # Output resolution (e.g., '1920x1080')
    fps: Optional[int] = None          # Output frame rate
    start_time: Optional[float] = None # Start time in seconds
    duration: Optional[float] = None   # Duration to process in seconds
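To show how options like these relate to FFmpeg itself, the sketch below maps each field to the corresponding ffmpeg command-line flag. It is an illustration only: SketchConfig and build_ffmpeg_args are hypothetical names, and the SDK goes through ffmpeg-python, so its real argument handling may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchConfig:
    """Stand-in with the same fields as documented above, for illustration."""
    vcodec: str = 'libx264'
    preset: str = 'medium'
    crf: int = 28
    acodec: str = 'aac'
    audio_bitrate: str = '128k'
    movflags: str = '+faststart'
    resolution: Optional[str] = None
    fps: Optional[int] = None
    start_time: Optional[float] = None
    duration: Optional[float] = None


def build_ffmpeg_args(src: str, dst: str, cfg: SketchConfig) -> List[str]:
    """Hypothetical mapping from config fields to ffmpeg CLI flags."""
    args = ['ffmpeg', '-y']
    if cfg.start_time is not None:
        args += ['-ss', str(cfg.start_time)]  # placed before -i: seek the input
    args += ['-i', src]
    if cfg.duration is not None:
        args += ['-t', str(cfg.duration)]     # limit the output duration
    args += ['-c:v', cfg.vcodec, '-preset', cfg.preset, '-crf', str(cfg.crf)]
    args += ['-c:a', cfg.acodec, '-b:a', cfg.audio_bitrate]
    if cfg.resolution:
        args += ['-s', cfg.resolution]
    if cfg.fps:
        args += ['-r', str(cfg.fps)]
    args += ['-movflags', cfg.movflags, dst]
    return args
```

Optional fields left as None simply contribute no flag, which is why the defaults produce a short command and each extra option adds one flag pair.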
Progress Callback Support
Monitor transcoding progress with callback functions:
def progress_callback(progress_percent):
    print(f"Progress: {progress_percent:.1f}%")
output_path = transcode_video(
    'input.mp4',
    'output.mp4',
    progress_callback=progress_callback
)
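One way a percentage like this can be derived: ffmpeg's -progress option emits key=value lines, including out_time_us (elapsed output time in microseconds), which can be compared against the clip's total duration. The function name below is hypothetical, and whether transcode_video computes progress this way is an assumption.

```python
def percent_from_progress_line(line, total_seconds):
    """Return progress in percent for an 'out_time_us=' line, else None."""
    key, _, value = line.strip().partition('=')
    if key != 'out_time_us' or total_seconds <= 0:
        return None
    try:
        elapsed = int(value) / 1_000_000  # microseconds -> seconds
    except ValueError:
        return None  # the value can be non-numeric, e.g. 'N/A'
    # Clamp so rounding or container quirks never report outside 0-100
    return max(0.0, min(100.0, elapsed / total_seconds * 100.0))
```

A caller would feed each progress line through this function and invoke the callback whenever the result is not None.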
optimize_for_web
Quick web optimization with predefined settings:
from synapse_sdk.utils.file.video.transcode import optimize_for_web
# Optimized for web streaming with fast start
web_video = optimize_for_web('input.mov', 'web_output.mp4')
This function uses optimized settings:
- Fast encoding preset
- Web-friendly compression (CRF 23)
- Fast start flag for streaming
- Fragment keyframes for better web compatibility
Error Handling
The video module provides specific exceptions:
from synapse_sdk.utils.file.video.transcode import (
    transcode_video,
    VideoTranscodeError,
    UnsupportedFormatError,
    FFmpegNotFoundError,
    TranscodingFailedError
)
try:
    transcode_video('input.xyz', 'output.mp4')
except UnsupportedFormatError:
    print("Input format not supported")
except FFmpegNotFoundError:
    print("FFmpeg not installed")
except TranscodingFailedError as e:
    print(f"Transcoding failed: {e}")
Advanced Usage Examples
Batch Processing:
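from pathlib import Path
from synapse_sdk.utils.file.video.transcode import VideoTranscodeError, transcode_video, validate_video_format
input_dir = Path('/path/to/videos')
output_dir = Path('/path/to/output')
output_dir.mkdir(parents=True, exist_ok=True)  # Make sure the target directory exists
for video_file in input_dir.glob('*'):
    if validate_video_format(video_file):
        output_file = output_dir / f"{video_file.stem}.mp4"
        try:
            transcode_video(video_file, output_file)
            print(f"Processed: {video_file.name}")
        except VideoTranscodeError as e:
            # Log the failure and continue with the remaining files
            print(f"Failed to process {video_file.name}: {e}")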
Quality Optimization:
# High quality for archival
archive_config = TranscodeConfig(
    preset='slow',
    crf=18,
    audio_bitrate='256k'
)
# Small size for mobile
mobile_config = TranscodeConfig(
    preset='fast',
    crf=28,
    resolution='1280x720',
    audio_bitrate='96k'
)
# Apply different configs
archive_output = transcode_video(input_file, 'archive.mp4', archive_config)
mobile_output = transcode_video(input_file, 'mobile.mp4', mobile_config)
Video Clipping:
# Extract 30-second clip starting from 1 minute
clip_config = TranscodeConfig(
    start_time=60.0,  # Start at 1 minute
    duration=30.0,    # Extract 30 seconds
    crf=20            # High quality
)
clip = transcode_video('long_video.mp4', 'clip.mp4', clip_config)