
Dataset Formats & Converters

Pydantic models for dataset formats and bidirectional converters between Datamaker and external formats.

Overview

The dataset annotation models and converters live in two modules:

  • Annotation models: synapse_sdk.utils.annotation_models provides Pydantic models for Datamaker (v1/v2), YOLO, and other formats

  • Converters: synapse_sdk.utils.converters provides bidirectional conversion between Datamaker and external formats

# Annotation models
from synapse_sdk.utils.annotation_models import (
    DMVersion,
    DMDataset,
    DMImageItem,
    YOLODataset,
    YOLOImage,
    COCODataset,
    PascalAnnotation,
)

# Converters
from synapse_sdk.utils.converters import (
    DatasetFormat,
    FromDMToYOLOConverter,
    YOLOToDMConverter,
    FromDMToCOCOConverter,
    FromDMToPascalConverter,
    get_converter,
)

Dataset Formats

DatasetFormat Enum

from synapse_sdk.utils.converters import DatasetFormat

DatasetFormat.DM_V1 # Datamaker v1 format
DatasetFormat.DM_V2 # Datamaker v2 format (default)
DatasetFormat.YOLO # YOLO format
DatasetFormat.COCO # COCO format
DatasetFormat.PASCAL # Pascal VOC format

DMVersion Enum

from synapse_sdk.utils.annotation_models import DMVersion

DMVersion.V1 # Datamaker schema v1
DMVersion.V2 # Datamaker schema v2 (current)

Converting Between Formats

DM to YOLO

from synapse_sdk.utils.annotation_models import DMVersion
from synapse_sdk.utils.converters import FromDMToYOLOConverter

# Convert categorized dataset (train/valid/test splits)
converter = FromDMToYOLOConverter(
    root_dir='/data/dm_dataset',
    is_categorized=True,
    dm_version=DMVersion.V2,
)

# Run conversion
result = converter.convert()

# Save to output directory
converter.save_to_folder('/data/yolo_output')

Source structure (categorized):

dm_dataset/
├── train/
│   ├── json/
│   │   └── *.json
│   └── original_files/
│       └── *.jpg
├── valid/
│   ├── json/
│   └── original_files/
└── test/
    ├── json/
    └── original_files/

Output structure:

yolo_output/
├── data.yaml
├── train/
│   ├── images/
│   └── labels/
├── valid/
│   ├── images/
│   └── labels/
└── test/
    ├── images/
    └── labels/
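
Before running a categorized conversion, it can help to confirm that the source directory matches the layout shown above. The following is a minimal sketch using only the standard library; `missing_dm_dirs` is a hypothetical helper, not part of synapse_sdk, and it assumes every split must contain both `json/` and `original_files/`.

```python
from pathlib import Path

# Split and subfolder names are taken from the source structure shown above.
SPLITS = ("train", "valid", "test")
SUBDIRS = ("json", "original_files")


def missing_dm_dirs(root_dir: str) -> list[str]:
    """Return the expected split subdirectories that are absent under root_dir.

    Hypothetical helper for illustration; not part of synapse_sdk.
    """
    root = Path(root_dir)
    missing = []
    for split in SPLITS:
        for sub in SUBDIRS:
            if not (root / split / sub).is_dir():
                missing.append(f"{split}/{sub}")
    return missing
```

An empty list means every expected folder exists; otherwise the returned entries (for example `valid/json`) name what to fix before calling `convert()`.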

YOLO to DM

from synapse_sdk.utils.annotation_models import DMVersion
from synapse_sdk.utils.converters import YOLOToDMConverter

converter = YOLOToDMConverter(
    root_dir='/data/yolo_dataset',
    is_categorized=True,
    dm_version=DMVersion.V2,
)

result = converter.convert()
converter.save_to_folder('/data/dm_output')
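
To illustrate the kind of coordinate change a YOLO-to-DM conversion involves, here is a rough sketch that turns one YOLO label line into a pixel-space box. It is not the SDK's implementation: `yolo_line_to_pixel_box` is a hypothetical name, and the sketch assumes the target box is expressed as a top-left corner plus width and height in pixels.

```python
def yolo_line_to_pixel_box(line: str, image_width: int, image_height: int):
    """Parse 'class_id x_center y_center width height' into a pixel-space box.

    Returns (class_id, x, y, width, height), where (x, y) is assumed to be
    the top-left corner. Hypothetical helper, not part of synapse_sdk.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:])
    # Scale the normalized size back to pixels.
    box_width = width * image_width
    box_height = height * image_height
    # Shift from the box center to the top-left corner.
    x = x_center * image_width - box_width / 2
    y = y_center * image_height - box_height / 2
    return class_id, x, y, box_width, box_height
```

The image dimensions are required because YOLO coordinates are normalized to the range 0 to 1, so the same label line describes different pixel boxes on differently sized images.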

Using get_converter Factory

from synapse_sdk.utils.converters import get_converter, DatasetFormat

# Get appropriate converter
converter = get_converter(
    source_format=DatasetFormat.DM_V2,
    target_format=DatasetFormat.YOLO,
    root_dir='/data/source',
    is_categorized=True,
)

converter.convert()
converter.save_to_folder('/data/output')
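
One common way to build a factory like this is a registry keyed by (source, target) pairs. The sketch below shows that pattern with stand-in classes so it runs without the SDK; `Format`, `StubConverter`, and `pick_converter` are all hypothetical names, and nothing here is taken from how `get_converter` is actually implemented.

```python
from enum import Enum


class Format(Enum):
    """Stand-in for DatasetFormat; member values are illustrative."""
    DM_V2 = "dm_v2"
    YOLO = "yolo"


class StubConverter:
    """Placeholder base class that only stores its constructor arguments."""

    def __init__(self, root_dir: str, is_categorized: bool = False):
        self.root_dir = root_dir
        self.is_categorized = is_categorized


class StubDMToYOLO(StubConverter):
    """Placeholder for the DM -> YOLO direction."""


class StubYOLOToDM(StubConverter):
    """Placeholder for the YOLO -> DM direction."""


# Map each (source, target) pair to the class that handles it.
_REGISTRY = {
    (Format.DM_V2, Format.YOLO): StubDMToYOLO,
    (Format.YOLO, Format.DM_V2): StubYOLOToDM,
}


def pick_converter(source_format: Format, target_format: Format, **kwargs):
    """Instantiate the converter registered for this format pair."""
    try:
        converter_cls = _REGISTRY[(source_format, target_format)]
    except KeyError:
        raise ValueError(
            f"no converter from {source_format.value} to {target_format.value}"
        ) from None
    return converter_cls(**kwargs)
```

With a registry, supporting a new format pair means adding one dictionary entry, and unsupported pairs fail early with a clear error instead of at conversion time.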

Datamaker Format Models

DMv2 Models (Current)

from synapse_sdk.utils.annotation_models import (
    DMDataset,  # Alias for DMv2Dataset
    DMImageItem,  # Alias for DMv2ImageItem
    DMBoundingBox,
    DMPolygon,
    DMKeypoint,
    DMPolyline,
    DMRelation,
    DMGroup,
    DMAttribute,
)

DMDataset Structure

from synapse_sdk.utils.annotation_models import DMDataset

# Load from JSON (json_data is the parsed content of a DM v2 JSON file)
dataset = DMDataset.model_validate(json_data)

# Access properties
print(dataset.version) # "2.0"
print(dataset.item.name) # Image filename
print(dataset.item.width) # Image width
print(dataset.item.height) # Image height

# Access annotations
for annotation in dataset.annotations:
    print(annotation.category)  # e.g., "object_detection"
    print(annotation.data)  # Annotation-specific data

Annotation Types

from synapse_sdk.utils.annotation_models import (
    DMBoundingBox,
    DMPolygon,
    DMKeypoint,
)

# Bounding box
bbox = DMBoundingBox(
    x=100,
    y=100,
    width=200,
    height=150,
    label="car",
)

# Polygon
polygon = DMPolygon(
    points=[[100, 100], [200, 100], [200, 200], [100, 200]],
    label="building",
)

# Keypoint
keypoint = DMKeypoint(
    x=150,
    y=150,
    label="nose",
    visible=True,
)

DMv1 Models (Legacy)

from synapse_sdk.utils.annotation_models import (
    DMv1Dataset,
    DMv1AnnotationBase,
    DMv1Classification,
)

YOLO Format Models

from synapse_sdk.utils.annotation_models import (
    YOLODataset,
    YOLODatasetConfig,
    YOLOImage,
    YOLOAnnotation,
)

YOLODatasetConfig

from synapse_sdk.utils.annotation_models import YOLODatasetConfig

config = YOLODatasetConfig(
    path='/data/yolo_dataset',
    train='train/images',
    val='valid/images',
    test='test/images',
    names=['person', 'car', 'bicycle'],
)

# Save to data.yaml
config.save('/data/yolo_dataset/data.yaml')
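
For orientation, the sketch below renders the same fields as a data.yaml in the layout commonly used by YOLO tooling, with class names keyed by index. The exact text that `config.save()` writes is not specified here, so treat this as an assumption; `render_data_yaml` is a hypothetical helper that uses plain string formatting rather than a YAML library.

```python
def render_data_yaml(path, train, val, test, names):
    """Build data.yaml text with class names keyed by their index.

    Hypothetical helper, not part of synapse_sdk. Values are written
    unquoted, so names containing YAML special characters (such as ':')
    would need quoting that this sketch does not perform.
    """
    lines = [
        f"path: {path}",
        f"train: {train}",
        f"val: {val}",
        f"test: {test}",
        "names:",
    ]
    # Class ids are the positions in the names list.
    lines += [f"  {index}: {name}" for index, name in enumerate(names)]
    return "\n".join(lines) + "\n"
```

The order of `names` matters: the `class_id` in every label file refers to a position in this list, so reordering it silently relabels the dataset.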

YOLOAnnotation

from synapse_sdk.utils.annotation_models import YOLOAnnotation

# Standard YOLO format: class_id x_center y_center width height (normalized)
annotation = YOLOAnnotation(
    class_id=0,
    x_center=0.5,
    y_center=0.5,
    width=0.3,
    height=0.2,
)

# Convert to YOLO line format
line = annotation.to_line() # "0 0.5 0.5 0.3 0.2"
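
The normalized values come from dividing pixel coordinates by the image size. As a rough sketch of that arithmetic, the function below converts a pixel box into a YOLO label line. It assumes the pixel box is given as a top-left corner plus width and height; `pixel_box_to_yolo_line` is a hypothetical name, and the fixed six-decimal formatting is a choice made for this example, not necessarily what `to_line()` produces.

```python
def pixel_box_to_yolo_line(class_id, x, y, width, height, image_width, image_height):
    """Convert a pixel box (top-left x, y plus size) into a YOLO label line.

    Hypothetical helper, not part of synapse_sdk.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    # Move from the top-left corner to the box center, then normalize.
    x_center = (x + width / 2) / image_width
    y_center = (y + height / 2) / image_height
    norm_width = width / image_width
    norm_height = height / image_height
    return (
        f"{class_id} {x_center:.6f} {y_center:.6f} "
        f"{norm_width:.6f} {norm_height:.6f}"
    )
```

For a 200 by 150 box at (100, 100) in a 1000 by 1000 image, the center is (200, 175) in pixels, which normalizes to 0.2 and 0.175.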

BaseConverter Class

All converters extend BaseConverter with common functionality.

Properties

| Property | Type | Description |
| --- | --- | --- |
| root_dir | Path | Root directory containing source data |
| is_categorized | bool | Whether dataset has train/valid/test splits |
| is_single_conversion | bool | Whether converting single files |
| converted_data | Any | Holds converted data after convert() |

Methods

convert()

Convert data in-memory.

result = converter.convert()

save_to_folder()

Save converted data to output directory.

converter.save_to_folder('/output/path')

convert_single_file()

Convert a single data object (requires is_single_conversion=True).

from synapse_sdk.utils.converters import FromDMToYOLOConverter

converter = FromDMToYOLOConverter(is_single_conversion=True)
result = converter.convert_single_file(dm_json, image_file)

Utility Methods

# Ensure directory exists
path = converter.ensure_dir('/some/path')

# Get image dimensions
width, height = converter.get_image_size('/path/to/image.jpg')

# Find image for label file
image_path = converter.find_image_for_label('image001', image_dir)
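
Matching a label to its image usually comes down to trying the label's stem with a set of known image extensions. The sketch below shows that idea with the standard library only. It is not the SDK's `find_image_for_label`; the name `find_image_by_stem` and the extension list are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# Extensions to try, in priority order. Assumed for this sketch; the SDK's
# own list may differ.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def find_image_by_stem(stem: str, image_dir: str) -> Optional[Path]:
    """Return the first file named stem + extension in image_dir, or None.

    Hypothetical helper, not part of synapse_sdk. Only the lowercase
    extensions listed above are tried.
    """
    for ext in IMAGE_EXTENSIONS:
        candidate = Path(image_dir) / f"{stem}{ext}"
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` rather than raising lets the caller decide whether a label without an image should be skipped or treated as an error.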

Pipeline Integration

Use with DatasetAction for automated workflows:

from synapse_sdk.plugins.actions import DatasetAction, DatasetParams
from synapse_sdk.plugins.pipelines import ActionPipeline

# TrainAction must also be imported; its module path is not shown here.
# Pipeline: Download -> Convert -> Train
pipeline = ActionPipeline([
    DatasetAction,  # Download dataset
    DatasetAction,  # Convert to YOLO
    TrainAction,  # Train model
])

Supported Formats

| Source | Target | Converter Class | Status |
| --- | --- | --- | --- |
| DM v1/v2 | YOLO | FromDMToYOLOConverter | Stable |
| YOLO | DM v1/v2 | YOLOToDMConverter | Stable |
| DM v1/v2 | COCO | FromDMToCOCOConverter | Stable |
| COCO | DM v1/v2 | COCOToDMConverter | Stable |
| DM v1/v2 | Pascal VOC | FromDMToPascalConverter | Stable |
| Pascal VOC | DM v1/v2 | PascalToDMConverter | Stable |

Note: All converters now return Pydantic models for type safety. See the Migration Guide for details.