How to create a file filter plugin

This manual describes how to write a file filter plugin for OmegaT.

What is a file filter plugin

OmegaT can be extended with plugins. A plugin is just a .jar file, which is stored in OMEGAT_INSTALLATION/plugins/ dir or OMEGAT_USERPEFERENCES/plugins/ (see StaticUtils.getConfigDir() for details) If a plugin needs to use additional jars, they can be placed in the same directory.

One type of plugin is the file filter plugin. File filters can read files of a specific format and extract the text that needs to be translated and pass it to OmegaT so the user can translate the text in OmegaT. The filter can get the translation back from OmegaT, and produce the translated file in the same format.

requirements

To write a file filter plugin, you need to * implement the org/omegat/filters2/IFilter.java interface * create a .jar file that contains the implementation and a manifest file that indicates that the jar file is an OmegaT file filter plugin.

manifest

There must be a manifest file that indicates that it is an OmegaT plugin. There are two flavors, see below. A plugin should be declared in META-INF/MANIFEST.MF:

plugins for OmegaT 2.1.3 and up

OmegaT-Plugin: true
Plugin-Name: …
Plugin-Version: x.y.z
Plugin-Author: …
Plugin-Description: …
Plugin-Link: …

Name: my.Class
OmegaT-Plugin: filter

[Name: my.optional.other.Class
OmegaT-Plugin: filter]

plugins for OmegaT 3.0.1 and up

Plugin-Name: …
Plugin-Version: x.y.z
Plugin-Author: …
Plugin-Description: …
Plugin-Link: https://..
Plugin-Category: filter
OmegaT-Plugins: <classname>

where <classname> is the fully qualified classname of the plugin's initialization class. Multiple classnames can be defined, like in “Class-Path” attribute, i.e., space separated. This class should contain the following methods:

public static void loadPlugins() {}
public static void unloadPlugins() {}

The loadPlugins() method is executed on application startup before any GUI initialization. The plugin initialization class should analyze OmegaT version and register classes for filters:

Core.registerFilterClass(MyFilter.class);

Further advanced interface is described in Loading Plugins.

Set up your development project

When you develop your plugin, you need the OmegaT dependencies. OmegaT project publish OmegaT.jar on Maven Central, so you can add OmegaT as dependency for Gradle/Maven project.

Here is an example of how to configure the plugin project in a Gradle project. You can use skeleton project as a project template forked from https://github.com/omegat-org/plugin-skeleton

build.gradle

plugin {
    id('java')
    id('distribution')
    id 'org.omegat.gradle' version '1.5.11'
}
version='1.0.0'
group='your.group.id'
omegat {
    version='5.7.1'
    pluginClass='org.myorganization.MyFilter'
}

gradle.properties

plugin.name=My Filter Name
plugin.category=filter
plugin.link=https://github.com/omegat-org/plugin-skeleton
plugin.author=My name here.
plugin.description=I describe my plugin here. This plugin does amazing things
plugin.license=GNU General Public License version 3

More details are described in Setup Plugin Project.

Implementation

Now you've set up your project, it's time to implement the filter. As said before, you need to extend the IFilter interface.

Most functions are self-explanatory. There are three functions that are used to actually parse a file: * parseFile: used when OmegaT reads a file to show the texts that need to be translated to the user. * translateFile: used when OmegaT writes translated documents * alignFile: used only for the console mode align function. NB: it has nothing to do with the align function that you can see in the OmegaT UI; it is only for automatic alignment in console mode.

All three functions work with a callback. When you implement it, your function must call the callback for each text fragment (segment) from a file that need to be translated. NB: this is before OmegaT applies segmentation rules.

translation

On translation, three properties are involved, as you can see in the ITranslateCallback.getTranslation function: id, source and path.

ID: some file formats like properties files or key=value files have keys/ids to uniquely identify a segment. You see this often in localizing software. The ID is not shown on screen, only the value, the text itself, is. The ID field is optional. Only use it if your file format has IDs. The translator can see the ID in OmegaT in the 'segment properties' window.
source: the actual text that needs translation. This is what the translator sees in the normal translation window in OmegaT. NB: if an ID is specified, the source can be empty.
path: something that additionally makes a segment unique. The path filed is optional. The only usage so far is in the PO filter. A text can occur multiple times, but with a different 'context', or with a different number for plural alternatives.

OmegaT allows to give some source text different alternative translations, based on the ID, source and path, and additionally on filename, and optionally on previous/next segments. - Filename is known by OmegaT, the filter doesn't need to provide it. - previous/next segments are useful to determine the context of a segment, in 'normal' text files where one sentence follows another. Other formats, like key=value, do not have relationships between segments, and thus this should not be used. The previous/next segments are linked by calling linkPrevNextSegments() function of the IParseCallback interface at the end of processing a file. When translating a file, the previous/next segments are not known yet till at the end the segments are linked by calling linkPrevNextSegments() function of the ITranslateCallback interface. Therefore, another pass is needed. On the second pass you can fetch the correct translations. To indicate the pass, call setPass(). See for an example the AbstractFilter.

parsing files

For loading files into OmegaT, the IParseCallback interface is used, and the functions to call have more arguments than you see on ITranslateCallback.getTranslation. Some are related to bilingual files, to give the existing translation to OmegaT: translation, isFuzzy, and one of the possible 'properties'. And there is place for comments and for protected parts (a.k.a. tags).

parsing bilingual files

On parsing files, you can set the translation as it was found in the source, if your file format has it. This translation will show in the comments pane and can be automatically filled in as translation (if it is not fuzzy and if translation != source or 'Allow translation to be equal to source' is true)

If you mark the translation to be fuzzy, it will only show in the Fuzzy Matches pane and in the comments, but not as translation.

Additionally, if you implement the function isBilingual returning true, then the filter can be used to read files in the /tm/ folder of the project as external TMs.

properties

for each segment, you can add extra properties, which are key=value pairs.

One of the possible keys is SegmentProperties.COMMENT. Comments will show on the comment pane. There is also a callback function that has the comment as separate argument, which is easier if you don't have other properties.

Another property is SegmentProperties.REFERENCE, useful for bi-lingual files. When set to "true", it means that the segment (source+translation) will be used as reference TM (and not added to the project as a segment to be translated). The PO filter uses this.

All properties will show on the segment properties pane and can be searched for via the search function. (and the values also on the comment pane, but layout is not as good as on segment properties pane)

protected parts

When the file format contains formatting tags, placeholders or anything else you don't want to be altered in any way in the translation, you have two options - your filter replaces the text parts with 'OmegaT tags' before it sends the text to the callback function (and on translation: does the reverse after the translation is fetched from the callback function). An OmegaT tag looks like <x#> (see PatternConsts.OMEGAT_TAG for the regex pattern). The HTML and XML filters use this technique. See for example org.omegat.filters2.html2.FilterVisitor.shortcut() - you use the protected parts argument of the callback function. This function exists since OmegaT 3.0.6 and is used by Java properties files and PO files for example. The differences: - OmegaT tags are more complex to implement. They hide the meaning / original text for the translator. They are
possibly paired (i.e. open and close tags, and it can be checked if tag pairs do not partially overlap with other sets (e.g. <a1><b1></a1></b1> instead of <a1><b1></b1></a1>.). - protected parts are easier to implement. They can show the exact text to the translator, or whatever you want. In both cases, the tags show greyed to the translator, and depending on the OmegaT config, they can(not) be modified or order changed and errors show on 'tools->check issues' command.

The easiest way to specify protected parts is using List<ProtectedPart> protectedParts = TagUtil.applyCustomProtectedParts(source, java.util.regex.Pattern.compile("myregularexpression"), null);
which will find tags according a regular expression, and the text shown to the translator is the tag text itself without modification.

align file

The alignFile function is used when starting OmegaT from the command line using argument --mode=console-align. In this mode, OmegaT will create a TMX file with the source and translation as found by the filter. The resulting TMX is stored in the /omegat/ folder under the name align.tmx.

The arguments for the callback function are identical to the parse function. isFuzzy results in the fuzzy mark to be added to the translation.

Plugin options

OmegaT by default lets the user specify the filename pattern and the encoding for the files used by a filter, if the filter does not auto-detect it. Other options can be programmed in the filter, by implementing changeOptions(). You can show dialogs etc (using the parent Dialog as parent for your dialogs). Saving the options is handled by OmegaT. You only need to return the set op options (key/values).

Head start

The AbstractFilter class gives you a head start in dealing with many tasks like linking segments. So you better extend the AbstractFilter instead of implementing IFilter from scratch.

And if your file format is close to a format of an other filter, or if you need some inspiration, then you might want to copy or look at the code of one of the other filters and adapt it. You can find the filters under org.omegat.filters2 and the XML filters under org.omegat.filters3.

Testing

Every good piece of code comes with unit tests. It can be hard to create a test for every function, especially where code is relying on other classes, like an instantiated OmegaT project (RealProject), FilterBase, config etc. OmegaT source code is not very DependencyInjection ready. The class org/omegat/filters/TestFilterBase.java will help you set up a suitable test environment, and provides some handy functions to test if the filter extracts the correct segments, and if the translation file is what it should look like.

When you start from a plugin-skeleton samle project, test example is also bundled. It will be good start point to create your own test cases.

debugging and running in OmegaT

When you select Gradle build system and use gradle-omegat-plugin, it is very handy to debug and run your plugin in OmegaT.

The gradle-omegat plugin provide you handy command to run the tasks.

$ ./gradlew runOmegaT

Above command on a plugin project root, the gradle-omegat plugin setup test user configuration, build your plugin, and install your plugin into test user provisioning. Your home directory is still clean but it run a clean OmegaT instance specified by omegat { version="5.7.1" } directive.

When you want to use Java debugger, you can run ;

$ ./gradlew debugOmegaT

Above command on a plugin project root, the gradle-omegat plugin setup test user configuration, build your plugin, and install your plugin into test user provisioning as same as 'runOmegaT' command. It also does open Java monitor port 5566 and wait connection from java debugger.

When you don't want to use these features, you can also run debug session with manual operation.

To run your plugin, you need to compile a .jar file, copy it to the right OmegaT folder (see begin of this document) and start OmegaT. For debugging and testing, you best write unit tests, and debug by running them.

If you really need to debug in the context of a running OmegaT instance (for some other plugin types this might be more relevant), you can 'run' org.omegat.Main. Make sure all dependent 3rd party libraries are in the classpath. Since you did not compile a .jar file, you have to make sure there is a correct META-INF/MANIFEST.MF file (which is missing if you rely on e.g. maven-jar plugin to generate it for you)