Build and Dependency Security

Motivation

OmegaT is a mature free, libre, and open-source project with a large dependency graph managed by Gradle. Like any software that pulls artifacts from public repositories, it is exposed to supply chain attacks, a threat category that has grown significantly in recent years and continues to accelerate.

A supply chain attack targets the build process rather than the application source code directly. If an attacker can substitute or tamper with a dependency artifact served by a repository, malicious code can enter a release without any modification to the OmegaT source tree and without triggering code review processes.

The risk is not theoretical. High-profile incidents in the broader open-source ecosystem have demonstrated that:

  • Package repositories can be compromised or have individual packages hijacked.

  • Maintainers of widely used libraries can be targeted or impersonated.

  • Typosquatting and dependency confusion attacks introduce malicious packages that are silently resolved by build tools.

  • Build infrastructure itself (CI runners, caches, mirrors) is a target.

The Evolving Threat Landscape

AI-Assisted Vulnerability Discovery

The increasing availability of AI coding assistants has a dual effect on security. While it can help developers write safer code, it also lowers the barrier for adversaries to discover and exploit vulnerabilities at scale. AI tools can:

  • Rapidly analyze large codebases for patterns associated with known vulnerability classes (null dereferences, unsafe deserialization, injection points).

  • Generate targeted exploit code for identified weaknesses.

  • Automate the discovery of supply chain targets, such as widely depended-upon packages maintained by only a few contributors.

For a project like OmegaT, this means that even obscure transitive dependencies can become viable attack surfaces if they are poorly maintained upstream.

Incremental and Persistent Improvement

No single measure eliminates supply chain risk. The appropriate response is a layered, incremental approach that improves the security posture over time without requiring large one-time efforts that are challenging to sustain.

OmegaT’s security hardening follows this principle: each improvement is self-contained, reviewable, and does not block ongoing development work.

Dependency Verification

What It Does

Gradle’s built-in dependency verification validates the SHA-256 checksum of every downloaded artifact against a known-good value recorded in gradle/verification-metadata.xml. If any artifact does not match its expected hash, the build fails immediately — before any code is compiled or executed.

This provides a checksum-based safety net against:

  • Compromised or hijacked packages on Maven Central or other repositories.

  • Man-in-the-middle substitution during artifact download.

  • Cache poisoning attacks on local or CI artifact caches.
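As an illustration of what is checked, a recorded entry in gradle/verification-metadata.xml looks roughly like the fragment below. This is a sketch only: the com.example coordinates and the PLACEHOLDER values are hypothetical and are not taken from OmegaT's actual metadata file. Real entries carry 64-character hexadecimal SHA-256 digests written by Gradle.

```xml
<!-- Hypothetical fragment of gradle/verification-metadata.xml. -->
<!-- The coordinates and checksum values are placeholders, not real data. -->
<components>
   <component group="com.example" name="example-lib" version="1.0.0">
      <!-- The JAR that ends up on the build classpath. -->
      <artifact name="example-lib-1.0.0.jar">
         <sha256 value="PLACEHOLDER_SHA256_OF_JAR" origin="Generated by Gradle"/>
      </artifact>
      <!-- Metadata files are recorded as well when metadata verification is enabled. -->
      <artifact name="example-lib-1.0.0.pom">
         <sha256 value="PLACEHOLDER_SHA256_OF_POM" origin="Generated by Gradle"/>
      </artifact>
   </component>
</components>
```

At resolution time Gradle hashes each downloaded file and compares the result with the recorded value; any mismatch fails the build.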

Configuration

The verification metadata file is located at:

gradle/verification-metadata.xml

Key configuration decisions in the current setup:

verify-metadata: true
POM files and Gradle module metadata (.module files) are also verified, not only JAR artifacts. This prevents attacks that manipulate dependency resolution graphs without changing JAR content.

verify-signatures: false
PGP signature verification is intentionally deferred. Enabling it requires maintaining a keyring for all upstream publishers, which introduces significant ongoing maintenance cost. It can be added incrementally in the future.

Source and Javadoc JARs are globally trusted
-sources.jar and -javadoc.jar artifacts are excluded from checksum verification. These files contain no executable code and are resolved by IDE synchronization outside the normal build resolution path, making them impractical to track in the verification metadata.

Note: Gradle syntax requires group=".*" for trusted artifact patterns; the artifact attribute alone is not accepted. Additionally, many dependencies do not publish source or Javadoc JARs at all; attempting to verify absent artifacts would produce spurious build errors.
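Taken together, the decisions above correspond to a configuration block along the following lines. This is a minimal sketch rather than a copy of OmegaT's file: the element names follow Gradle's dependency verification format, and the two trust patterns are an assumption based on the note above (a group=".*" attribute combined with a file regular expression).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<verification-metadata xmlns="https://schema.gradle.org/dependency-verification">
   <configuration>
      <!-- Verify POM and .module files in addition to JAR artifacts. -->
      <verify-metadata>true</verify-metadata>
      <!-- PGP signature verification is deferred for now. -->
      <verify-signatures>false</verify-signatures>
      <trusted-artifacts>
         <!-- Assumed patterns: skip checksum verification for source and Javadoc JARs. -->
         <trust group=".*" file=".*-sources[.]jar" regex="true"/>
         <trust group=".*" file=".*-javadoc[.]jar" regex="true"/>
      </trusted-artifacts>
   </configuration>
   <components>
      <!-- One component entry per verified dependency is generated here. -->
   </components>
</verification-metadata>
```

With regex="true", the attributes of a trust element are interpreted as regular expressions, which is why the dot before "jar" is written as [.] in this sketch.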

Updating the Verification Metadata

When dependencies are added, removed, or upgraded, the verification metadata must be regenerated. Run:

./gradlew --write-verification-metadata sha256 dependencies

This resolves all dependency configurations and rewrites gradle/verification-metadata.xml with updated checksums. Review the diff before committing to confirm that only expected artifacts changed.

When running in a CI environment without network access to the expected repositories, dependency verification failures indicate either a configuration problem or a potential artifact substitution — both warrant investigation.

Relationship to SBOM

Dependency verification establishes a verified, reproducible artifact baseline. This is a prerequisite for meaningful Software Bill of Materials (SBOM) generation: an SBOM is only trustworthy if the artifacts it enumerates have been verified against known-good checksums.

SBOM generation tooling will be added in a later step, building on this foundation.

Maintaining the Dependency Graph: Forked Projects

Dependency verification confirms that artifacts match their expected checksums, but it does not by itself ensure that those artifacts are safe or up to date. A dependency pinned to a years-old version may contain known vulnerabilities, even if its checksum is correct. Managing the full dependency graph — including transitive dependencies — is an ongoing responsibility.

OmegaT uses unmodified upstream packages wherever possible. However, some dependencies are either no longer actively maintained, or have diverged from OmegaT’s requirements in ways that cannot wait on upstream release schedules. In these cases, the project maintains forks under the omegat-org umbrella on GitHub. Forks are published on Maven Central under the org.omegat group for reproducible consumption by the build:

https://search.maven.org/search?q=g:org.omegat*

Keeping these forks current — updating their own dependency graphs, targeting supported Java runtimes, and tracking upstream security fixes — is part of OmegaT’s supply chain maintenance work.

LanguageTool

OmegaT bundles LanguageTool for grammar and style checking. LanguageTool depends on Apache Lucene for indexing and linguistic analysis.

The upstream LanguageTool project ships with Lucene 6.x. OmegaT’s fork has bumped this to Lucene 8.11.x, and the change has been accepted in upstream PR discussion. However, it has not been merged due to upstream prioritization decisions unrelated to the technical merit of the change.

The situation has since become more urgent: the Apache Lucene project now develops on the 10.x line, and only Lucene 9.x receives security updates. OmegaT’s fork already has a branch tracking Lucene 9.x, and this will be the basis for the next dependency update cycle. Staying on Lucene 6.x indefinitely is not an option from a security standpoint.

The fork also replaces the language-detector dependency bundled by upstream LanguageTool with OmegaT’s own fork of that library (see below), ensuring the entire LanguageTool dependency subtree is under controlled supply chain management.

Fork: omegat-org/languagetool

Complete Fork Inventory

The table below lists all forks maintained under the omegat-org umbrella, including both runtime dependencies and build/test tooling.

| Fork | Role | Reason for fork |
|------|------|-----------------|
| languagetool | Grammar and style checker | Lucene version uplift (6.x → 9.x); language-detector subtree replacement |
| language-detector | Language identification | Upstream unmaintained; Java runtime and dependency updates |
| vldocking | Docking window framework for Swing UI | Upstream unmaintained; Java compatibility and dependency updates |
| htmlparser | HTML parsing for the HTML filter | Upstream unmaintained (originally CVS-hosted); bug fixes and build modernization |
| gnudiff4j | Diff algorithm for translation memory matching | Upstream unmaintained; Java runtime and build updates |
| juniversalchardet | Character encoding detection | UTF-8 + emoji detection fix not merged upstream |
| lib-mnemonics | Keyboard mnemonic management for menus | Upstream unmaintained; Java and build updates |
| swing-extra-locales | Localized GUI labels for OpenJDK Swing (Arabic, Catalan, Russian, Ukrainian, and others) | Supplements locale data not provided by the JDK |
| lucene-gosen | Japanese morphological analysis for Lucene tokenizer | Lucene API compatibility tracking |
| launch4j | Windows executable wrapper and installer builder | Upstream unmaintained; ARM64 support added |
| segment | SRX-based text segmentation | Upstream unmaintained; Java runtime, dependency updates, SLF4J migration |
| maligna | Text alignment algorithms | Upstream unmaintained; Java runtime, dependency updates, SLF4J migration |
| assertj-swing | Swing GUI test assertions (test tooling) | Upstream ceased maintenance and issued a call for maintainers |
| libstemmer | Snowball stemming algorithms for tokenizers | Java runtime and build updates |
| trie4j | Trie data structure library for dictionary lookup | Java runtime and build updates |
| whc | Help compiler tooling for documentation builds | Makes the library available on Maven Central |

Maintenance Cadence

Fork maintenance is treated as ongoing security work, not a one-time migration. Each fork is reviewed when:

  • A new CVE is disclosed against a dependency it carries.

  • A dependency version reaches end-of-life for security updates.

  • A new OmegaT release is prepared and the dependency graph is reviewed as part of the release checklist.

This incremental cadence keeps each individual update small and reviewable, avoiding the risk of large, infrequent upgrades that are harder to test and more likely to introduce regressions.

Further Reading