How to create a file filter plugin
This manual describes how to write a file filter plugin for OmegaT.
What is a file filter plugin
OmegaT can be extended with plugins.
A plugin is just a .jar file, which is stored in the OMEGAT_INSTALLATION/plugins/ directory or in OMEGAT_USERPREFERENCES/plugins/ (see StaticUtils.getConfigDir() for details).
If a plugin needs to use additional jars, they can be placed in the same directory.
One type of plugin is the file filter plugin. A file filter reads files of a specific format, extracts the text that needs to be translated, and passes it to OmegaT so the user can translate it there. The filter can then get the translation back from OmegaT and produce the translated file in the same format.
requirements
To write a file filter plugin, you need to:
* implement the org/omegat/filters2/IFilter.java interface
* create a .jar file that contains the implementation and a manifest file that indicates that the jar file is an OmegaT file filter plugin.
manifest
There must be a manifest file that indicates that the jar is an OmegaT plugin. There are two flavors, described below.
A plugin should be declared in META-INF/MANIFEST.MF:
plugins for OmegaT 2.1.3 and up
OmegaT-Plugin: true
Plugin-Name: …
Plugin-Version: x.y.z
Plugin-Author: …
Plugin-Description: …
Plugin-Link: …
Name: my.Class
OmegaT-Plugin: filter
[Name: my.optional.other.Class
OmegaT-Plugin: filter]
plugins for OmegaT 3.0.1 and up
Plugin-Name: …
Plugin-Version: x.y.z
Plugin-Author: …
Plugin-Description: …
Plugin-Link: https://..
Plugin-Category: filter
OmegaT-Plugins: <classname>
where <classname> is the fully qualified class name of the plugin's initialization class. Multiple class names can be defined, space separated, as in the “Class-Path” attribute.
This class should contain the following methods:
public static void loadPlugins() {}
public static void unloadPlugins() {}
The loadPlugins() method is executed on application startup, before any GUI initialization.
The plugin initialization class should check the OmegaT version and register the filter classes:
Core.registerFilterClass(MyFilter.class);
A more advanced interface is described in Loading Plugins.
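To make the 3.0.1+ manifest format more concrete, here is a minimal sketch that parses manifest text with the standard java.util.jar.Manifest class and splits the OmegaT-Plugins attribute on whitespace. It only illustrates how space-separated class names can be read; it is not OmegaT's own loader code, and the org.example class names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

class ManifestSketch {
    /** Returns the class names listed in the OmegaT-Plugins main attribute. */
    static List<String> pluginClasses(String manifestText) {
        Manifest manifest;
        try {
            manifest = new Manifest(new ByteArrayInputStream(
                    manifestText.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = manifest.getMainAttributes().getValue("OmegaT-Plugins");
        List<String> result = new ArrayList<>();
        if (value != null) {
            // Class names are separated by whitespace, as in Class-Path.
            for (String name : value.trim().split("\\s+")) {
                if (!name.isEmpty()) {
                    result.add(name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical manifest content; the class names are placeholders.
        String text = "Manifest-Version: 1.0\n"
                + "Plugin-Name: My Filter\n"
                + "Plugin-Category: filter\n"
                + "OmegaT-Plugins: org.example.FilterA org.example.FilterB\n";
        System.out.println(pluginClasses(text));
    }
}
```

A check like this can be handy in a build or unit test to confirm that the generated manifest names the classes you expect.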
Set up your development project
When you develop your plugin, you need the OmegaT dependencies. The OmegaT project publishes OmegaT.jar on Maven Central, so you can add OmegaT as a dependency in a Gradle or Maven project.
Here is an example of how to configure a plugin project with Gradle. You can fork the skeleton project at https://github.com/omegat-org/plugin-skeleton and use it as a project template.
build.gradle
plugins {
    id('java')
    id('distribution')
    id('org.omegat.gradle') version '1.5.11'
}

version = '1.0.0'
group = 'your.group.id'

omegat {
    version = '5.7.1'
    pluginClass = 'org.myorganization.MyFilter'
}
gradle.properties
plugin.name=My Filter Name
plugin.category=filter
plugin.link=https://github.com/omegat-org/plugin-skeleton
plugin.author=My name here.
plugin.description=I describe my plugin here. This plugin does amazing things
plugin.license=GNU General Public License version 3
More details are described in Setup Plugin Project.
Implementation
Now that you have set up your project, it is time to implement the filter. As stated before, you need to implement the IFilter interface.
Most functions are self-explanatory. There are three functions that are used to actually parse a file:
* parseFile: used when OmegaT reads a file to show the user the texts that need to be translated.
* translateFile: used when OmegaT writes translated documents.
* alignFile: used only for the console mode align function. NB: it has nothing to do with the align function that you can see in the OmegaT UI; it is only for automatic alignment in console mode.
All three functions work with a callback. When you implement them, your function must call the callback for each text fragment (segment) from the file that needs to be translated. NB: this happens before OmegaT applies segmentation rules.
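The callback pattern can be sketched as follows for a simple key=value format. The EntryCallback interface is a simplified, hypothetical stand-in for the OmegaT parse callback (the real interface takes more arguments), so only the control flow is meant to be illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class KeyValueParseSketch {
    /** Hypothetical stand-in for the parse callback; not the OmegaT interface. */
    interface EntryCallback {
        void addEntry(String id, String source);
    }

    /** Reads key=value lines and reports each non-empty value to the callback once. */
    static void parse(String content, EntryCallback callback) {
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                // Skip comments, blank lines and lines without a key.
                if (line.startsWith("#") || eq <= 0) {
                    continue;
                }
                String key = line.substring(0, eq).trim();
                String value = line.substring(eq + 1).trim();
                if (!value.isEmpty()) {
                    callback.addEntry(key, value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        parse("# comment\nhello=Hello world\nbye=Goodbye\n",
                (id, source) -> seen.add(id + " -> " + source));
        System.out.println(seen);
    }
}
```

The filter hands over whole values and leaves sentence splitting to OmegaT, in line with the note that the callback is called before segmentation rules are applied.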
translation
On translation, three properties are involved, as you can see in the ITranslateCallback.getTranslation function: id, source and path.
- ID: some file formats like properties files or key=value files have keys/ids to uniquely identify a segment. You see this often in localizing software. The ID is not shown on screen, only the value, the text itself, is. The ID field is optional. Only use it if your file format has IDs. The translator can see the ID in OmegaT in the 'segment properties' window.
- source: the actual text that needs translation. This is what the translator sees in the normal translation window in OmegaT. NB: if an ID is specified, the source can be empty.
- path: something that additionally makes a segment unique. The path field is optional. The only usage so far is in the PO filter. A text can occur multiple times, but with a different 'context', or with a different number for plural alternatives.
OmegaT allows a source text to have several alternative translations, based on the ID, source and path, and additionally on the filename and, optionally, on the previous/next segments.
- The filename is known by OmegaT; the filter does not need to provide it.
- Previous/next segments are useful to determine the context of a segment in 'normal' text files, where one sentence follows another. Other formats, like key=value, do not have relationships between segments, so previous/next segments should not be used there.
The previous/next segments are linked by calling the linkPrevNextSegments() function of the IParseCallback interface at the end of processing a file. When translating a file, the previous/next segments are not known until the whole file has been processed and the segments have been linked by calling the linkPrevNextSegments() function of the ITranslateCallback interface. Therefore, a second pass is needed, in which you can fetch the correct translations. To indicate the pass, call setPass(). See AbstractFilter for an example.
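The need for a second pass can be illustrated with a small self-contained sketch. The Callback class is a hypothetical stand-in that only borrows the method names setPass(), linkPrevNextSegments() and getTranslation() from the text. Its internals, including the "prev|source|next" lookup key, are assumptions made for this illustration and do not reflect how OmegaT stores alternative translations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoPassSketch {
    /** Hypothetical stand-in for a translate callback; not the OmegaT interface. */
    static class Callback {
        final Map<String, String> defaults = new HashMap<>();
        // Context-specific translations, keyed by prev + "|" + source + "|" + next.
        final Map<String, String> alternatives = new HashMap<>();
        private final List<String> seen = new ArrayList<>();
        private boolean linked;
        private int pass = 1;
        private int index;

        void setPass(int pass) {
            this.pass = pass;
            this.index = 0;
        }

        void linkPrevNextSegments() {
            linked = true; // from now on, neighbours can be looked up in 'seen'
        }

        String getTranslation(String source) {
            if (pass == 1) {
                seen.add(source); // neighbours are not known yet
                return defaults.getOrDefault(source, source);
            }
            int i = index++;
            String prev = i > 0 ? seen.get(i - 1) : "";
            String next = i + 1 < seen.size() ? seen.get(i + 1) : "";
            String alt = linked ? alternatives.get(prev + "|" + source + "|" + next) : null;
            return alt != null ? alt : defaults.getOrDefault(source, source);
        }
    }

    /** Processes the segments twice; only the second pass can use the context. */
    static List<String> translate(List<String> segments, Callback callback) {
        callback.setPass(1);
        for (String s : segments) {
            callback.getTranslation(s);
        }
        callback.linkPrevNextSegments();
        callback.setPass(2);
        List<String> output = new ArrayList<>();
        for (String s : segments) {
            output.add(callback.getTranslation(s));
        }
        return output;
    }

    public static void main(String[] args) {
        Callback cb = new Callback();
        cb.defaults.put("Open", "Ouvrir");
        cb.defaults.put("File", "Fichier");
        cb.alternatives.put("File|Open|", "Ouvert"); // only for "Open" after "File" at the end
        System.out.println(translate(Arrays.asList("Open", "File", "Open"), cb));
    }
}
```

In this sketch the first "Open" gets the default translation, while the last one gets the alternative, because its neighbours are only known during the second pass.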
parsing files
For loading files into OmegaT, the IParseCallback interface is used, and the functions to call have more arguments than you see on ITranslateCallback.getTranslation.
Some are related to bilingual files, to give the existing translation to OmegaT: translation, isFuzzy, and one of the possible 'properties'.
There is also room for comments and for protected parts (a.k.a. tags).
parsing bilingual files
On parsing files, you can set the translation as it was found in the source file, if your file format has it. This translation will show in the comments pane and can be automatically filled in as the translation (if it is not fuzzy, and if translation != source or 'Allow translation to be equal to source' is true).
If you mark the translation as fuzzy, it will only show in the Fuzzy Matches pane and in the comments, but not as the translation.
Additionally, if you implement the function isBilingual so that it returns true, the filter can be used to read files in the /tm/ folder of the project as external TMs.
properties
For each segment, you can add extra properties, which are key=value pairs.
One of the possible keys is SegmentProperties.COMMENT. Comments are shown in the comment pane. There is also a callback function that takes the comment as a separate argument, which is easier if you don't have other properties.
Another property is SegmentProperties.REFERENCE, useful for bilingual files. When set to "true", it means that the segment (source + translation) will be used as reference TM (and not added to the project as a segment to be translated). The PO filter uses this.
All properties are shown in the segment properties pane and can be searched for via the search function. The values are also shown in the comment pane, but the layout there is not as good as in the segment properties pane.
protected parts
When the file format contains formatting tags, placeholders or anything else you don't want to be altered in any way in the translation, you have two options:
- your filter replaces the text parts with 'OmegaT tags' before it sends the text to the callback function (and on translation does the reverse after the translation is fetched from the callback function). An OmegaT tag looks like <x#> (see PatternConsts.OMEGAT_TAG for the regex pattern). The HTML and XML filters use this technique. See for example org.omegat.filters2.html2.FilterVisitor.shortcut().
- you use the protected parts argument of the callback function. This argument has existed since OmegaT 3.0.6 and is used by Java properties files and PO files, for example.
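The first option can be sketched roughly as follows. This is a simplified illustration of replacing inline tags with numbered shortcuts and restoring them afterwards. It is not the algorithm used in FilterVisitor.shortcut(), and both regular expressions are assumptions made for this example rather than PatternConsts.OMEGAT_TAG. It also assumes the tags in the source are properly nested.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagShortcutSketch {
    // Simple opening, closing and self-closing tags such as <b>, </b>, <br/>.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*?)(/?)>");
    // Shortcuts produced by this sketch, such as <b0>, </b0>, <b2/>.
    private static final Pattern SHORTCUT = Pattern.compile("</?[a-z][0-9]+/?>");

    /** Maps each shortcut back to the original tag text. */
    private final Map<String, String> originals = new HashMap<>();

    String toShortcuts(String text) {
        Deque<Integer> open = new ArrayDeque<>();
        int counter = 0;
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String letter = m.group(2).substring(0, 1).toLowerCase();
            String shortcut;
            if (!m.group(1).isEmpty()) {
                // Closing tag: reuse the number of the most recently opened tag.
                int n = open.isEmpty() ? counter++ : open.pop();
                shortcut = "</" + letter + n + ">";
            } else if (!m.group(4).isEmpty()) {
                shortcut = "<" + letter + counter++ + "/>";
            } else {
                open.push(counter);
                shortcut = "<" + letter + counter++ + ">";
            }
            originals.put(shortcut, m.group());
            out.append(shortcut);
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    /** Restores the original tags in a single pass over the translated text. */
    String fromShortcuts(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = SHORTCUT.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            out.append(originals.getOrDefault(m.group(), m.group()));
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }
}
```

The restore step uses a single pass with a map lookup instead of repeated string replacement, so a restored tag such as <h1> is never mistaken for a shortcut that happens to look the same.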
The differences:
- OmegaT tags are more complex to implement. They hide the meaning / original text from the translator. They can be paired (i.e. open and close tags), and it can be checked that tag pairs do not partially overlap with other pairs (e.g. <a1><b1></a1></b1> instead of <a1><b1></b1></a1>).
- protected parts are easier to implement. They can show the exact text to the translator, or whatever you want.
In both cases, the tags are shown greyed out to the translator. Depending on the OmegaT configuration, they can or cannot be modified or reordered, and errors are reported by the 'tools->check issues' command.
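The overlap check mentioned for paired tags can be illustrated with a stack. The sketch below is an independent example, not OmegaT's own tag validation code, and the pattern it uses for paired tags is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagNestingSketch {
    // Simplified pattern for paired tags such as <a1> and </a1>.
    // Standalone tags like <b2/> do not match and are ignored.
    private static final Pattern PAIRED = Pattern.compile("<(/?)([a-zA-Z]+[0-9]+)>");

    /** Returns true if every closing tag matches the most recently opened tag. */
    static boolean isWellNested(String text) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = PAIRED.matcher(text);
        while (m.find()) {
            String name = m.group(2);
            if (m.group(1).isEmpty()) {
                open.push(name);
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                return false;
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isWellNested("<a1><b1></b1></a1>")); // true
        System.out.println(isWellNested("<a1><b1></a1></b1>")); // false
    }
}
```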
The easiest way to specify protected parts is using:
List<ProtectedPart> protectedParts =
TagUtil.applyCustomProtectedParts(source, java.util.regex.Pattern.compile("myregularexpression"), null);
which will find tags according to a regular expression; the text shown to the translator is the tag text itself, without modification.
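Before passing a regular expression to the filter, it can help to check with plain java.util.regex which substrings it would match. The sketch below does not call TagUtil at all, and the placeholder pattern is only an example chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProtectedPatternSketch {
    // Example pattern (an assumption): %s / %d and {0}-style placeholders.
    static final Pattern PLACEHOLDER = Pattern.compile("%[sd]|\\{\\d+\\}");

    /** Lists the substrings that the pattern would mark as protected. */
    static List<String> findProtected(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findProtected("Copied {0} of {1} files to %s"));
        // prints [{0}, {1}, %s]
    }
}
```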
align file
The alignFile function is used when starting OmegaT from the command line with the argument --mode=console-align.
In this mode, OmegaT will create a TMX file with the source and translation as found by the filter.
The resulting TMX is stored in the /omegat/ folder under the name align.tmx.
The arguments for the callback function are identical to those of the parse function. Setting isFuzzy causes the fuzzy mark to be added to the translation.
Plugin options
By default, OmegaT lets the user specify the filename pattern and the encoding for the files used by a filter, if the filter does not auto-detect them.
Other options can be programmed in the filter by implementing changeOptions(). You can show dialogs etc. (using the parent Dialog as parent for your dialogs). Saving the options is handled by OmegaT. You only need to return the set of options (key/values).
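Since options are plain key/value pairs, a filter typically needs small helpers to read a setting with a default and to produce an updated set. The following is a minimal sketch of such helpers using only java.util.Map. The option key "trimValues" is hypothetical, and the sketch shows neither the dialog code nor the changeOptions() signature.

```java
import java.util.HashMap;
import java.util.Map;

class FilterOptionsSketch {
    // Hypothetical option key, used only for this illustration.
    static final String TRIM_VALUES = "trimValues";

    /** Reads a boolean option, falling back to true when it is not set. */
    static boolean isTrimValues(Map<String, String> config) {
        String value = config == null ? null : config.get(TRIM_VALUES);
        return value == null || Boolean.parseBoolean(value);
    }

    /** Returns a copy of the options with the new setting; the input is left untouched. */
    static Map<String, String> withTrimValues(Map<String, String> config, boolean trim) {
        Map<String, String> updated = new HashMap<String, String>();
        if (config != null) {
            updated.putAll(config);
        }
        updated.put(TRIM_VALUES, Boolean.toString(trim));
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> options = withTrimValues(null, false);
        System.out.println(options + " -> " + isTrimValues(options));
    }
}
```

Returning a fresh map rather than modifying the one passed in keeps the current settings intact if the user cancels a dialog.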
Head start
The AbstractFilter class gives you a head start in dealing with many tasks, like linking segments. It is therefore better to extend AbstractFilter than to implement IFilter from scratch.
If your file format is close to the format of another filter, or if you need some inspiration, you might want to look at or copy the code of one of the other filters and adapt it. You can find the filters under org.omegat.filters2 and the XML filters under org.omegat.filters3.
Testing
Every good piece of code comes with unit tests. It can be hard to create a test for every function, especially where code relies on other classes, like an instantiated OmegaT project (RealProject), FilterBase, config etc. The OmegaT source code is not very dependency-injection friendly. The class org/omegat/filters/TestFilterBase.java will help you set up a suitable test environment, and provides some handy functions to test whether the filter extracts the correct segments and whether the translated file looks as it should.
When you start from the plugin-skeleton sample project, a test example is bundled with it. It is a good starting point for creating your own test cases.
debugging and running in OmegaT
When you use the Gradle build system with gradle-omegat-plugin, it is easy to debug and run your plugin in OmegaT.
The gradle-omegat plugin provides handy tasks for this.
$ ./gradlew runOmegaT
When you run the above command in the plugin project root, the gradle-omegat plugin sets up a test user configuration, builds your plugin, and installs it into that test configuration. Your home directory stays clean, and the task runs a clean OmegaT instance of the version specified by the omegat { version="5.7.1" } directive.
When you want to use a Java debugger, you can run:
$ ./gradlew debugOmegaT
When you run the above command in the plugin project root, the gradle-omegat plugin sets up the test user configuration, builds your plugin, and installs it in the same way as the 'runOmegaT' command. It also opens Java debug port 5566 and waits for a connection from a Java debugger.
If you don't want to use these features, you can also run a debug session manually.
To run your plugin, you need to compile a .jar file, copy it to the right OmegaT folder (see the beginning of this document) and start OmegaT.
For debugging and testing, it is best to write unit tests and debug by running them.
If you really need to debug in the context of a running OmegaT instance (for some other plugin types this might be more relevant), you can 'run' org.omegat.Main. Make sure all dependent 3rd party libraries are on the classpath.
Since you did not compile a .jar file, you have to make sure there is a correct META-INF/MANIFEST.MF file (which is missing if you rely on e.g. the maven-jar plugin to generate it for you).