lingua v0.3.0 Release Notes

Release Date: 2019-01-16
  • This major release offers a lot of new features, including new languages. Finally! :-)

    Languages

    • Added 18 languages: Arabic, Belarusian, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Hungarian, Latvian, Lithuanian, Polish, Persian, Romanian, Russian, Swedish, Turkish

    Features

    • Language models can now be cached by MapDB to reduce memory usage and speed up loading times.

    Improvements

    • In the standalone app, you can now choose which language models to load, which makes it possible to compare detection accuracy between closely related languages.
    • For test report generation with Maven, you can now select a single language via the language property and no longer need to run the reports for all languages. Example: mvn test -P accuracy-reports -D detector=lingua -D language=German

    API changes

    • Lingua's package structure has been simplified. The public API intended for end users now lives in com.github.pemistahl.lingua.api. Breaking changes to this package will be kept to a minimum in 0.*.* versions and will no longer be made starting from version 1.0.0. All other code is stored in com.github.pemistahl.lingua.internal and is subject to change without further notice.
    • Added the new class com.github.pemistahl.lingua.api.LanguageDetectorBuilder, which is now responsible for building and configuring instances of com.github.pemistahl.lingua.api.LanguageDetector.
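
    As a rough illustration of what "a builder responsible for building and configuring detector instances" can look like, here is a minimal, self-contained Java sketch. Every type and method name in it (SketchLanguage, SketchDetector, SketchDetectorBuilder, fromAllLanguages, fromLanguages, withCache) is a hypothetical stand-in chosen for this example. It does not reproduce Lingua's actual classes or signatures, so consult the project's README for the real API.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are NOT Lingua's classes.
enum SketchLanguage { ENGLISH, GERMAN, POLISH }

// Immutable detector; in this sketch it is meant to be obtained from the builder below.
final class SketchDetector {
    private final Set<SketchLanguage> languages;
    private final boolean cacheEnabled;

    SketchDetector(Set<SketchLanguage> languages, boolean cacheEnabled) {
        // Defensive copy so later changes to the builder cannot affect this instance.
        this.languages = Collections.unmodifiableSet(EnumSet.copyOf(languages));
        this.cacheEnabled = cacheEnabled;
    }

    Set<SketchLanguage> languages() { return languages; }
    boolean isCacheEnabled() { return cacheEnabled; }
}

// The builder collects the configuration and is the one place where detectors get constructed.
final class SketchDetectorBuilder {
    private final Set<SketchLanguage> languages;
    private boolean cacheEnabled = false;

    private SketchDetectorBuilder(Set<SketchLanguage> languages) {
        this.languages = languages;
    }

    // Start from every language known to this sketch.
    static SketchDetectorBuilder fromAllLanguages() {
        return new SketchDetectorBuilder(EnumSet.allOf(SketchLanguage.class));
    }

    // Start from an explicit subset, e.g. to compare closely related languages.
    static SketchDetectorBuilder fromLanguages(SketchLanguage first, SketchLanguage... rest) {
        return new SketchDetectorBuilder(EnumSet.of(first, rest));
    }

    // Optional configuration step; returns this so calls can be chained.
    SketchDetectorBuilder withCache() {
        this.cacheEnabled = true;
        return this;
    }

    SketchDetector build() {
        return new SketchDetector(languages, cacheEnabled);
    }
}

class BuilderSketchDemo {
    public static void main(String[] args) {
        SketchDetector detector = SketchDetectorBuilder
            .fromLanguages(SketchLanguage.GERMAN, SketchLanguage.POLISH)
            .withCache()
            .build();
        System.out.println(detector.languages() + " cache=" + detector.isCacheEnabled());
    }
}
```

    The point of the pattern is that callers never assemble a detector by hand: all configuration choices go through the builder, so the detector class itself can stay immutable and its constructor can change without affecting user code.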

    Test Coverage

    • Test coverage of the public API has been extended from 6% to 23%.

    Documentation

    • In addition to the test reports, graphical plots have been created to make it easier to compare detection results between the different classifiers. The plotting code is written in Python and stored in an IPython notebook at /accuracy-reports/accuracy-reports-analysis-notebook.ipynb.