Build and Dependency Security
Motivation
OmegaT is a mature free/libre and open-source project with a large dependency graph managed by Gradle. Like any software that pulls artifacts from public repositories, it is exposed to supply chain attacks, a threat category that has grown significantly in recent years and continues to accelerate.
A supply chain attack targets the build process rather than the application source code directly. If an attacker can substitute or tamper with a dependency artifact served by a repository, malicious code can enter a release without any modification to the OmegaT source tree and without triggering code review processes.
The risk is not theoretical. High-profile incidents in the broader open-source ecosystem have demonstrated that:
- Package repositories can be compromised or have individual packages hijacked.
- Maintainers of widely used libraries can be targeted or impersonated.
- Typosquatting and dependency confusion attacks introduce malicious packages that are silently resolved by build tools.
- Build infrastructure itself (CI runners, caches, mirrors) is a target.
The Evolving Threat Landscape
AI-Assisted Vulnerability Discovery
The increasing availability of AI coding assistants has a dual effect on security. While these tools can help developers write safer code, they also lower the barrier for adversaries to discover and exploit vulnerabilities at scale. AI tools can:
- Rapidly analyze large codebases for patterns associated with known vulnerability classes (null dereferences, unsafe deserialization, injection points).
- Generate targeted exploit code for identified weaknesses.
- Automate the discovery of supply chain targets by identifying widely depended-upon packages maintained by a small number of contributors.
For a project like OmegaT, this means that even obscure transitive dependencies can become viable attack surfaces if they are poorly maintained upstream.
Incremental and Persistent Improvement
No single measure eliminates supply chain risk. The appropriate response is a layered, incremental approach that improves the security posture over time without requiring large one-time efforts that are challenging to sustain.
OmegaT’s security hardening follows this principle: each improvement is self-contained, reviewable, and does not block ongoing development work.
Dependency Verification
What It Does
Gradle’s built-in dependency verification validates the SHA-256 checksum of
every downloaded artifact against a known-good value recorded in
gradle/verification-metadata.xml. If any artifact does not match its
expected hash, the build fails immediately — before any code is compiled or
executed.
This provides a checksum-based safety net against:
- Compromised or hijacked packages on Maven Central or other repositories.
- Man-in-the-middle substitution during artifact download.
- Cache poisoning attacks on local or CI artifact caches.
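The check described above can be pictured with a few lines of shell. This is only a conceptual sketch, not Gradle's implementation: `verify_sha256` is a hypothetical helper, the temporary file stands in for a downloaded artifact, and the `sha256sum` utility from GNU coreutils is assumed to be available (on macOS, `shasum -a 256` plays the same role).

```shell
# Hypothetical helper: compare a file's SHA-256 digest with a recorded value.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" | cut -d' ' -f1)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}

# Stand-in for a downloaded artifact (illustrative content only).
artifact="$(mktemp)"
printf 'original bytes' > "$artifact"

# The "known-good" value, recorded while the artifact is trusted.
recorded="$(sha256sum "$artifact" | cut -d' ' -f1)"
verify_sha256 "$artifact" "$recorded"

# Simulate substitution: the bytes change, the recorded value does not.
printf 'tampered bytes' > "$artifact"
verify_sha256 "$artifact" "$recorded" || echo "verification failed; a build would stop here"

rm -f "$artifact"
```

The second call fails because the file content no longer matches the recorded digest, which is the situation in which Gradle aborts the build.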
Configuration
The verification metadata file is located at:
gradle/verification-metadata.xml
Key configuration decisions in the current setup:
verify-metadata: true
POM files and Gradle module metadata (.module files) are also verified, not
only JAR artifacts. This prevents attacks that manipulate dependency resolution
graphs without changing JAR content.
verify-signatures: false
PGP signature verification is intentionally deferred. Enabling it requires
maintaining a keyring for all upstream publishers, which introduces significant
ongoing maintenance cost. It can be added incrementally in the future.
Source and Javadoc JARs are globally trusted
-sources.jar and -javadoc.jar artifacts are excluded from checksum
verification. These files contain no executable code and are resolved by IDE
synchronization outside the normal build resolution path, making them
impractical to track in the verification metadata.
Note: Gradle's syntax for trusted artifact patterns requires group=".*"; the
artifact attribute alone is not accepted. In addition, many dependencies do not
publish source or Javadoc JARs at all, so attempting to verify absent artifacts
would produce spurious build errors.
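To confirm which of these settings are active in a checkout, the relevant elements can be pulled out of the metadata file with standard tools. The sketch below assumes the usual layout of Gradle's verification metadata, in which the switches appear as `<verify-metadata>` and `<verify-signatures>` elements and each trusted pattern as a `<trust .../>` element on a line of its own; `show_verification_config` is a hypothetical helper name.

```shell
# Hypothetical helper: print the verification switches and trusted-artifact
# rules found in a Gradle verification metadata file, one per line.
show_verification_config() {
  # Match the two switches and each <trust .../> rule, then strip indentation.
  grep -E '<(verify-metadata>|verify-signatures>|trust )' "$1" \
    | sed -E 's/^[[:space:]]+//'
}

# Run against the project file when it is present in the current directory.
if [ -f gradle/verification-metadata.xml ]; then
  show_verification_config gradle/verification-metadata.xml
fi
```

With the configuration described above, the output would be expected to show `verify-metadata` set to `true`, `verify-signatures` set to `false`, and the trust rules for source and Javadoc JARs.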
Updating the Verification Metadata
When dependencies are added, removed, or upgraded, the verification metadata must be regenerated. Run:
./gradlew --write-verification-metadata sha256 dependencies
This resolves all dependency configurations and rewrites
gradle/verification-metadata.xml with updated checksums. Review the diff
before committing to confirm that only expected artifacts changed.
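Because the regenerated file can produce a long diff, it may help to reduce it to the component entries that were added or removed before reading it in full. The following is a minimal sketch, assuming the file is tracked in Git and that each `<component ...>` element sits on its own line, as in files written by Gradle; `changed_components` is a hypothetical helper name.

```shell
# Hypothetical helper: read a unified diff on stdin and keep only the
# added (+) or removed (-) <component ...> lines, with indentation collapsed.
changed_components() {
  grep -E '^[+-][[:space:]]*<component ' \
    | sed -E 's/^([+-])[[:space:]]*/\1 /'
}

# Summarize uncommitted changes to the metadata file when inside a Git checkout.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git diff -- gradle/verification-metadata.xml | changed_components
fi
```

Each line of output names a group, artifact, and version. An entry that does not correspond to an intended dependency change is a reason to inspect the full diff before committing.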
When running in a CI environment without network access to the expected repositories, dependency verification failures indicate either a configuration problem or a potential artifact substitution — both warrant investigation.
Relationship to SBOM
Dependency verification establishes a verified, reproducible artifact baseline. This is a prerequisite for meaningful Software Bill of Materials (SBOM) generation: an SBOM is only trustworthy if the artifacts it enumerates have been verified against known-good checksums.
SBOM generation tooling will be added in later steps, building on this foundation.
Maintaining the Dependency Graph: Forked Projects
Dependency verification confirms that artifacts match their expected checksums, but it does not by itself ensure that those artifacts are safe or up to date. A dependency pinned to a years-old version may contain known vulnerabilities, even if its checksum is correct. Managing the full dependency graph — including transitive dependencies — is an ongoing responsibility.
OmegaT uses unmodified upstream packages wherever possible. However, some
dependencies are either no longer actively maintained, or have diverged from
OmegaT’s requirements in ways that cannot wait on upstream release schedules.
In these cases, the project maintains forks under the
omegat-org umbrella on GitHub. Forks are
published on Maven Central under the org.omegat group for reproducible
consumption by the build:
https://search.maven.org/search?q=g:org.omegat*
Keeping these forks current — updating their own dependency graphs, targeting supported Java runtimes, and tracking upstream security fixes — is part of OmegaT’s supply chain maintenance work.
LanguageTool
OmegaT bundles LanguageTool for grammar and style checking. LanguageTool depends on Apache Lucene for indexing and linguistic analysis.
The upstream LanguageTool project ships with Lucene 6.x. OmegaT’s fork has bumped this to Lucene 8.11.x, and the change has been accepted in the upstream PR discussion. However, it has not been merged, owing to upstream prioritization decisions unrelated to the technical merit of the change.
The situation has since become more urgent: the Apache Lucene project now develops on the 10.x line, and only Lucene 9.x receives security updates. OmegaT’s fork already has a branch tracking Lucene 9.x, and this will be the basis for the next dependency update cycle. Staying on Lucene 6.x indefinitely is not an option from a security standpoint.
The fork also replaces the language-detector dependency bundled by upstream
LanguageTool with OmegaT’s own fork of that library (see below), ensuring the
entire LanguageTool dependency subtree is under controlled supply chain
management.
Fork: omegat-org/languagetool
Complete Fork Inventory
The table below lists all forks maintained under the omegat-org umbrella, including both runtime dependencies and build/test tooling.
| Fork | Role | Reason for fork |
|---|---|---|
| omegat-org/languagetool | Grammar and style checker | Lucene version uplift (6.x → 9.x); language-detector subtree replacement |
| language-detector | Language identification | Upstream unmaintained; Java runtime and dependency updates |
|  | Docking window framework for Swing UI | Upstream unmaintained; Java compatibility and dependency updates |
|  | HTML parsing for the HTML filter | Upstream unmaintained (originally CVS-hosted); bug fixes and build modernization |
|  | Diff algorithm for translation memory matching | Upstream unmaintained; Java runtime and build updates |
|  | Character encoding detection | UTF-8 + emoji detection fix not merged upstream |
|  | Keyboard mnemonic management for menus | Upstream unmaintained; Java and build updates |
|  | Localized GUI labels for OpenJDK Swing (Arabic, Catalan, Russian, Ukrainian, and others) | Supplements missing locale data not provided by the JDK |
|  | Japanese morphological analysis for Lucene tokenizer | Lucene API compatibility tracking |
|  | Windows executable wrapper and installer builder | Upstream unmaintained; ARM64 support added |
|  | SRX-based text segmentation | Upstream unmaintained; Java runtime, dependency updates, SLF4J migration |
|  | Text alignment algorithms | Upstream unmaintained; Java runtime, dependency updates, SLF4J migration |
|  | Swing GUI test assertions (test tooling) | Upstream has stopped maintenance and is calling for a new maintainer |
|  | Snowball stemming algorithms for tokenizers | Java runtime and build updates |
|  | Trie data structure library for dictionary lookup | Java runtime and build updates |
|  | Help compiler tooling for documentation builds | Provide the library on Maven Central |
Maintenance Cadence
Fork maintenance is treated as ongoing security work, not a one-time migration. Each fork is reviewed when:
- A new CVE is disclosed against a dependency it carries.
- A dependency version reaches end-of-life for security updates.
- A new OmegaT release is prepared and the dependency graph is reviewed as part of the release checklist.
This incremental cadence keeps each individual update small and reviewable, avoiding the risk of large, infrequent upgrades that are harder to test and more likely to introduce regressions.