ADR 2025009: YAML Filter Specification (OmegaT)
Status
Accepted - Date: 2025-11-20
Implemented - Date: 2025-12-22
Pull Request: #1832
Context
OmegaT needs a filter to read and translate YAML files, which are commonly used for configuration and localization. YAML is a superset of JSON and supports nested mappings, sequences, and scalar values, which makes robust parsing preferable to ad-hoc text processing.
The project already depends on Jackson’s YAML binding (via com.fasterxml.jackson.dataformat.yaml.YAMLFactory).
OmegaT filters are pluggable. For consistent deployment and lifecycle management, the YAML filter shall be implemented as a plugin with explicit load/unload hooks and registered as an internal filter through Plugins.properties in the project root.
Prior art exists: Okapi Framework provides a third‑party OmegaT plugin that includes a YAML filter. While this offers immediate YAML support for some workflows, relying on an external bundle limits OmegaT’s ability to define default extraction rules consistent with other built‑in filters, align UX and configuration, and guarantee availability without extra installation steps. This ADR documents a native plugin to improve integration while remaining mindful of interoperability with Okapi.
Decision
Implement a new YAML filter as a plugin class org.omegat.filters2.text.yaml.YamlFilter located under src/org/omegat/filters2/text/yaml/ with the following properties:
Only scalar values (string scalars) are translation targets.
Section titles and mapping keys are not translation targets.
Alignment functionality is not required for the initial version.
Use Jackson YAML (ObjectMapper with YAMLFactory) to parse and serialize.
Provide plugin lifecycle: static loadPlugins() and unloadPlugins() methods on the filter class.
Register the internal filter via an entry in the root Plugins.properties.
Goals and Scope
Support reading and translating YAML documents and emitting a translated YAML document.
Support nested mappings and sequences. Only extract string scalar values; numeric/boolean/null values are ignored for translation.
Basic extraction only; advanced selection by path is out of scope for v1 but considered for future enhancements.
Non-Goals (v1)
Alignment support.
Preserving comments and exact whitespace/formatting styles (block/folded style, quoting style). YAML comment round‑tripping is not guaranteed by Jackson.
Extraction Rules (v1)
Translation targets: string scalar values anywhere in the document tree (mapping values, sequence items), excluding mapping keys.
Non-targets: mapping keys, anchor and alias names, tags, numeric/boolean/null scalars, dates, and other non-string nodes.
Nested structures: traverse the full tree depth-first; whenever a node is a string scalar value, create a text unit.
Key sequences and IDs: For interoperability with Okapi, each extracted text unit is assigned an ID and a comment.
Path: uses / as a separator for nested keys and [n] for sequence indices.
ID format: <path>_<index>, where index is the occurrence count of the path (starting at 0).
Comment format: name=<path>.
Sequences: traverse elements; for each element, append [n] to the path, where n is the zero-based index. Extract elements that are string scalars; ignore non-string elements.
Multi-line scalars: include the full scalar content as a single text unit.
Aliases/anchors: treat aliases as separate occurrences at their locations; underlying value content is extracted where it appears.
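The traversal and ID rules above can be sketched as follows. This is a minimal illustration, not the filter's actual code: it assumes the document has already been parsed into plain java.util maps, lists, and scalars (the shape Jackson produces when binding to Object) rather than JsonNode, so that the sketch has no dependencies. The class name YamlExtractionSketch and its TextUnit type are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not OmegaT code: collects text units from a parsed YAML tree. */
final class YamlExtractionSketch {

    /** One extracted unit: the source text plus its Okapi-style ID and comment. */
    static final class TextUnit {
        final String text;
        final String id;
        final String comment;

        TextUnit(String text, String id, String comment) {
            this.text = text;
            this.id = id;
            this.comment = comment;
        }
    }

    private final List<TextUnit> units = new ArrayList<>();
    // Occurrence count per path; the first occurrence of a path gets index 0.
    private final Map<String, Integer> occurrences = new HashMap<>();

    /** Walks the tree depth-first, following the iteration order of each map and list. */
    static List<TextUnit> extract(Object root) {
        YamlExtractionSketch sketch = new YamlExtractionSketch();
        sketch.visit(root, "");
        return sketch.units;
    }

    private void visit(Object node, String path) {
        if (node instanceof Map) {
            // Mapping: keys extend the path but are never extracted themselves.
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(entry.getKey());
                visit(entry.getValue(), path.isEmpty() ? key : path + "/" + key);
            }
        } else if (node instanceof List) {
            // Sequence: append the zero-based index as [n].
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                visit(items.get(i), path + "[" + i + "]");
            }
        } else if (node instanceof String) {
            int index = occurrences.merge(path, 1, Integer::sum) - 1;
            units.add(new TextUnit((String) node, path + "_" + index, "name=" + path));
        }
        // Numbers, booleans, and nulls are not strings and are skipped.
    }
}
```

With this sketch, a top-level mapping entry title: Welcome yields the ID title_0 and the comment name=title, and the first item of a sequence under menu/items yields menu/items[0]_0.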
Writing/Generation Rules
The translated document is produced by replacing the original string scalar values with their translations and serializing back with the same ObjectMapper/YAMLFactory.
Limitations:
Comments and exact original whitespace/indent/quoting style may not be preserved by Jackson’s serializer.
Key order: best-effort preservation (Jackson can preserve insertion order for objects/maps, but this is not guaranteed for all sources).
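The replacement step can be illustrated with a small sketch. It is not the filter's actual code: it assumes the parsed tree is held as mutable java.util maps and lists instead of JsonNode, and the class name YamlInjectionSketch and the translator callback are hypothetical stand-ins for whatever OmegaT's filter API supplies.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch, not OmegaT code: swaps string scalars for their translations. */
final class YamlInjectionSketch {

    /** Returns the node to store at the current position; containers are updated in place. */
    @SuppressWarnings("unchecked")
    static Object translate(Object node, UnaryOperator<String> translator) {
        if (node instanceof Map) {
            Map<Object, Object> map = (Map<Object, Object>) node;
            // Only values are replaced, so keys and their order stay untouched.
            for (Map.Entry<Object, Object> entry : map.entrySet()) {
                entry.setValue(translate(entry.getValue(), translator));
            }
            return map;
        }
        if (node instanceof List) {
            List<Object> items = (List<Object>) node;
            for (int i = 0; i < items.size(); i++) {
                items.set(i, translate(items.get(i), translator));
            }
            return items;
        }
        if (node instanceof String) {
            return translator.apply((String) node);
        }
        // Numbers, booleans, and nulls pass through unchanged.
        return node;
    }
}
```

Because only values are overwritten, a map that keeps insertion order (such as LinkedHashMap) keeps its key order through this step; whether the serializer then reproduces the original formatting is a separate matter, as noted in the limitations.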
Supported File Extensions and Encoding
Extensions: .yml, .yaml
Default encoding: UTF‑8. Detect BOM when present; otherwise treat as UTF‑8.
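One way to implement the BOM check is sketched below. The ADR does not say which BOMs are recognized, so this sketch assumes UTF-8 and UTF-16 only and does not handle UTF-32; the class name BomSketch is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch, not OmegaT code: BOM-based charset detection with a UTF-8 default. */
final class BomSketch {

    /** Picks a charset from the leading bytes; falls back to UTF-8 when no BOM is found. */
    static Charset detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return StandardCharsets.UTF_8;
    }

    /** Number of BOM bytes to skip before decoding the rest of the file. */
    static int bomLength(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return 3;
        if (startsWith(head, 0xFE, 0xFF) || startsWith(head, 0xFF, 0xFE)) return 2;
        return 0;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller would read the first few bytes of the file, pass them to detect, and skip bomLength bytes before decoding the remainder with the chosen charset.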
Error Handling
If the YAML cannot be parsed, the filter should:
Log a descriptive error including the file path and parsing location if available.
Fail the document processing with a clear message to the user.
Defensive parsing for large or deeply nested documents: optionally cap maximum depth and document size if the framework supports it; otherwise rely on standard memory limits.
Examples
Input YAML:
title: Welcome
menu:
  items:
    - Home
    - About
    - Contact
footer:
  copyright: "(c) 2025 Example Co."
  links:
    help: /help
    terms: /terms
Extracted translation units (order illustrative):
term: Welcome, id: title_0, comment: name=title
term: Home, id: menu/items[0]_0, comment: name=menu/items[0]
term: About, id: menu/items[1]_0, comment: name=menu/items[1]
term: Contact, id: menu/items[2]_0, comment: name=menu/items[2]
term: (c) 2025 Example Co., id: footer/copyright_0, comment: name=footer/copyright
Non-extracted:
Keys: title, menu, items, footer, copyright, links, help, terms
Path-like values: in the example, /help and /terms are string scalars, so v1 includes them as extractable strings. Whether to translate such values can be controlled by future path-based include/exclude rules.
Implementation Notes
The filter should follow OmegaT’s existing filter interfaces and patterns used by other text-based filters under src/org/omegat/filters2/.
Use a single ObjectMapper instance per filter instance; call findAndRegisterModules() once in the constructor.
Tree traversal: operate on JsonNode to collect and replace string scalars. Maintain a path stack (e.g., YAML Pointer-like) for debugging and potential future configuration.
Round-trip: after translation, write the modified node tree back via the same mapper.
Registration (Internal Build)
Add an entry to Plugins.properties in the project root to load the filter as an internal plugin. The key/value should reference the filter’s FQN org.omegat.filters2.text.yaml.YamlFilter according to the file’s established convention. Ensure loadPlugins() registers the filter with the internal filter registry.
Testing Strategy
Unit tests for extraction logic on representative YAML structures:
Flat mappings, nested mappings, sequences, mixed types.
Multi-line block scalars (| and >) and quoted scalars.
Anchors and aliases.
Non-string values, verified to be ignored.
Round-trip tests: parse -> extract -> inject translations -> write -> re-parse and verify scalar values are replaced appropriately.
Backward Compatibility and Migration
This is a new filter; no migration is required. Existing projects remain unaffected until the filter is selected for relevant file types.
Security and Performance Considerations
YAML is complex; avoid enabling features that execute arbitrary code (not applicable with Jackson YAML in default configuration).
Consider guarding against extremely deep documents to avoid stack overflows during traversal.
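A depth guard of this kind could look like the sketch below. It is an illustration under stated assumptions, not the filter's actual code: the tree is modelled with java.util maps and lists instead of JsonNode, the class name DepthGuardSketch and the maxDepth parameter are hypothetical, and the check uses an explicit stack so that the guard itself cannot overflow the call stack.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not OmegaT code: rejects trees nested deeper than a limit. */
final class DepthGuardSketch {

    /** Returns true if no map or list is nested deeper than maxDepth (the root container is depth 1). */
    static boolean withinDepth(Object root, int maxDepth) {
        if (!isContainer(root)) {
            return true; // A bare scalar has no nesting.
        }
        // Two parallel stacks: pending containers and the depth of each.
        Deque<Object> nodes = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        nodes.push(root);
        depths.push(1);
        while (!nodes.isEmpty()) {
            Object node = nodes.pop();
            int depth = depths.pop();
            if (depth > maxDepth) {
                return false;
            }
            Collection<?> children;
            if (node instanceof Map) {
                children = ((Map<?, ?>) node).values();
            } else {
                children = (List<?>) node;
            }
            for (Object child : children) {
                // Only containers are pushed; scalars and nulls cannot add depth.
                if (isContainer(child)) {
                    nodes.push(child);
                    depths.push(depth + 1);
                }
            }
        }
        return true;
    }

    private static boolean isContainer(Object node) {
        return node instanceof Map || node instanceof List;
    }
}
```

A filter could run such a check after parsing and before a recursive traversal, failing the document with a clear message when the limit is exceeded. The limit value itself is a policy choice that this ADR leaves open.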
Alternatives Considered
Okapi Framework YAML via Okapi OmegaT plugin: Reusing the Okapi plugin would avoid implementing a new filter and would benefit from Okapi’s mature parsing and wide coverage. However, it introduces an external dependency, a separate release cadence, and configuration semantics that differ from OmegaT’s native filters. It also prevents us from tailoring default extraction to OmegaT’s conventions (e.g., “keys not translatable, only string scalars”), integrating tightly with project settings, and covering the filter in internal QA/testing pipelines. Decision: acknowledge the plugin and remain compatible where feasible (e.g., similar extraction defaults and extension handling), but proceed with a native plugin for first‑class integration.
SnakeYAML directly: viable, but Jackson YAML offers a consistent API already used within OmegaT and integrates well with JsonNode processing.
Open Questions / Future Enhancements
Path-based include/exclude rules (e.g., YAML Pointer or JSONPath-like) to skip non-localizable values such as URLs or keys matching patterns.
Optional preservation of scalar style (block vs folded vs quoted).
Comment preservation by switching to or augmenting with a comment-aware YAML library.
References:
Okapi Framework and OmegaT plugin (prior art): https://okapiframework.org/