Changelog History

  • v0.3.0 Changes

    January 16, 2019

    This major release offers a lot of new features, including new languages. Finally! :-)


    • added 18 languages: Arabic, Belarusian, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Hungarian, Latvian, Lithuanian, Polish, Persian, Romanian, Russian, Swedish, Turkish

    Features

    • Language models can now be cached by MapDB to reduce memory usage and speed up loading times.

    Improvements

    • In the standalone app, you can now choose which language models to load in order to compare detection accuracy between strongly related languages.
    • For test report generation using Maven, you can now select a specific language using the attribute language and no longer need to run the reports for all languages: mvn test -P accuracy-reports -D detector=lingua -D language=German.

    API changes

    • Lingua's package structure has been simplified. The public API intended for end users now lives in com.github.pemistahl.lingua.api. Breaking changes to this package will be kept to a minimum in 0.*.* versions and will no longer be made starting from version 1.0.0. All other code is stored in com.github.pemistahl.lingua.internal and is subject to change without further notice.
    • added new class com.github.pemistahl.lingua.api.LanguageDetectorBuilder, which is now responsible for building and configuring instances of com.github.pemistahl.lingua.api.LanguageDetector

    Test Coverage

    • Test coverage of the public API has been extended from 6% to 23%.

    Documentation

    • In addition to the test reports, graphical plots have been created to make it easier to compare the detection results of the different classifiers. The code for the plots is written in Python and stored in an IPython notebook under /accuracy-reports/accuracy-reports-analysis-notebook.ipynb.
  • v0.2.2 Changes

    December 28, 2018

    This minor version update provides the following:

    Improvements

    • The included language model JSON files now use more efficient formatting, saving approximately 25% disk space in uncompressed format compared to version 0.2.1.

    Bug Fixes

    • The version of the Jacoco test coverage Maven plugin was incorrectly specified, leading to download errors. The most recent snapshot version of Jacoco is now used, which provides enhancements for Kotlin test coverage measurement.
  • v0.2.1 Changes

    December 20, 2018

    This minor version update provides the following:

    Performance Improvements

    • Lingua's language detection has been sped up. It is now approximately 25% faster for large data sets.

    Comparison with Apache Tika

    • Accuracy report test classes have been written for Apache Tika to compare its language detection performance with Lingua's. Lingua outperforms Tika for short paragraphs of text by up to 15% in accuracy. A detailed comparison table can be found in the README.
  • v0.2.0 Changes

    December 17, 2018

    This release provides both new features and bug fixes. It is the first release that has been published to JCenter. Publication on Maven Central will follow soon.

    • added detection support for Portuguese

    Features

    • extended language models for already existing languages to provide for more accurate detection results
    • the larger language models are now lazy-loaded to reduce waiting times during start-up, especially when starting the lingua REPL
    • added some unit tests for the LanguageDetector class that cover the most basic functionality (will be extended in upcoming versions)
    • added accuracy reports and test data for each supported language, in order to measure language detection accuracy (can be generated with mvn test -P accuracy-reports)
    • added accuracy statistics summary of the current implementation to README
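
The lazy-loading item above can be illustrated with a minimal, self-contained Java sketch. LazyModel is a hypothetical helper written for this example, not a class from Lingua, and the library's own loading code may look quite different. The idea is only that an expensive model is loaded on first access rather than at start-up.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Lingua's API.
class LazyModel<T> {
    private final Supplier<T> loader; // performs the expensive load
    private T value;                  // cached result after the first load
    private boolean loaded;           // tracks whether the loader has run

    LazyModel(Supplier<T> loader) {
        this.loader = loader;
    }

    // Runs the loader on the first call only; later calls return the cached value.
    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}
```

Constructing a LazyModel costs nothing up front, so a program such as a REPL can start immediately and pay the loading cost only for the models it actually touches.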

    API changes

    • renamed method LanguageDetector.detectLanguageFrom() to LanguageDetector.detectLanguageOf() to use the grammatically correct English preposition
    • in version 0.1.0, the method now named LanguageDetector.detectLanguageOf() returned null for strings whose language could not be detected reliably. Language.UNKNOWN is now returned in those cases instead, to prevent NullPointerExceptions, especially in Java code.
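
To show why a sentinel value is safer than null for Java callers, here is a small self-contained sketch. SketchLanguage, SentinelSketch and the toy lookup table are hypothetical stand-ins invented for this example. They only mimic the two return contracts described above and are not Lingua's actual classes.

```java
import java.util.Map;

// Hypothetical stand-in for a language enum; not Lingua's Language class.
enum SketchLanguage { ENGLISH, GERMAN, UNKNOWN }

class SentinelSketch {
    // Toy lookup table standing in for a real detector.
    private static final Map<String, SketchLanguage> KNOWN = Map.of(
        "hello", SketchLanguage.ENGLISH,
        "hallo", SketchLanguage.GERMAN
    );

    // 0.1.0-style contract: null signals "could not detect".
    static SketchLanguage detectOrNull(String text) {
        return KNOWN.get(text);
    }

    // 0.2.0-style contract: a sentinel value is returned instead of null.
    static SketchLanguage detectOrUnknown(String text) {
        return KNOWN.getOrDefault(text, SketchLanguage.UNKNOWN);
    }

    // A classic switch throws NullPointerException when given null,
    // but handles the UNKNOWN sentinel like any other constant.
    static String describe(SketchLanguage language) {
        switch (language) {
            case ENGLISH: return "English";
            case GERMAN:  return "German";
            default:      return "unknown";
        }
    }
}
```

With the null contract, every caller has to remember a null check before calling describe. With the sentinel contract, the undetected case flows through the same code path as any other result.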

    Bug Fixes

    • fixed a bug in lingua's REPL that caused non-ASCII characters to get broken in consoles which do not use UTF-8 encoding by default, especially on Windows systems
  • v0.1.0 Changes

    November 16, 2018

    This is the very first release of Lingua. It aims at accurate language detection results for both long and especially short text. Detection on short text fragments such as Twitter messages is a weak spot of many similar libraries.

    Supported languages so far:

    • English
    • French
    • German
    • Italian
    • Latin
    • Spanish