lingua/CHANGELOG and lingua Releases

Latest Version: 1.0.3

Avg Release Cycle: 50 days

Latest Release: 1289 days ago

Changelog History

v1.0.3 Changes
October 15, 2020
🐛 Bug Fixes
- When two languages had exactly the same confidence value, one of them was erroneously removed from the result map.
  Thanks to @mmedek for reporting this bug. (#72)
- There was still a problem with the classification of texts consisting of certain alphabets.
  Thanks to @nicolabertoldi for reporting this bug. (#76)
- The language detection for Spanish did not take the rarely used accented characters á, é, í, ó, ú and ü into account.
  Thanks to @joeporter for reporting this bug. (#73)
- 🛠 A bug in the rule engine led to weak detection accuracy for Macedonian and Serbian. This has been fixed.
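The confidence-value fix (#72) is easier to picture with a small example. The changelog does not show the faulty code, so the Java sketch below is only one plausible way such a tie can silently drop an entry, not Lingua's actual implementation. The class and method names are hypothetical, and languages are plain strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Lingua.
final class ConfidenceTieSketch {

    // Lossy variant: re-keying the map by confidence value means that two
    // languages with the same value collide, and the later one overwrites
    // the earlier one.
    static Map<Double, String> rankByInvertedMap(Map<String, Double> confidences) {
        TreeMap<Double, String> ranked = new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, Double> entry : confidences.entrySet()) {
            ranked.put(entry.getValue(), entry.getKey());
        }
        return ranked;
    }

    // Tie-safe variant: sort the entries themselves (highest confidence first,
    // then by language name) and keep the language as the key, so equal
    // confidence values can coexist.
    static Map<String, Double> rankBySortedEntries(Map<String, Double> confidences) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(confidences.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        Map<String, Double> ranked = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : entries) {
            ranked.put(entry.getKey(), entry.getValue());
        }
        return ranked;
    }
}
```

For the input ENGLISH=1.0, GERMAN=0.8, DUTCH=0.8, the lossy variant returns two entries while the tie-safe variant returns all three, in the order ENGLISH, DUTCH, GERMAN.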
Other Changes
- 🚀 The Kotlin compiler and runtime have been updated to version 1.4. This includes the current stable release 1.0.0 of the kotlinx-serialization framework.
- 🚚 The accuracy report files have been moved to their own Gradle source set. This allows for separate compilation of unit tests and accuracy report tests, leading to more flexible and slightly faster compilation.
v1.0.2 Changes
August 09, 2020
🐛 Bug Fixes
- The language mapping for the character ë was incorrect. This has been fixed.
  Thanks to @sandernugterenedia for reporting this bug. (#66)
- The implementation of LanguageDetector made use of functionality that was
  introduced in Java 8, which made the library unusable with Java 6 and 7.
  Thanks to @levant916 for reporting this bug. (#69)
- The Gradle shadow plugin has been added so that ./gradlew jarWithDependencies
  produces a jar file whose dependencies no longer conflict with different versions
  of the same dependencies in the same project. (#67)
v1.0.1 Changes
July 04, 2020
🐛 Bug Fixes
- If no ngram probabilities were found for a given input text, a NullPointerException would be thrown.
  Thanks to @fsonntag for finding and fixing this bug. (#63)
v1.0.0 Changes
June 24, 2020
Languages
- ➕ added 9 new languages, this time with a focus on Africa: Ganda, Shona, Sotho, Swahili, Tsonga, Tswana, Xhosa, Yoruba, Zulu
- ✂ removed language Norwegian in favor of Bokmal and Nynorsk (#59)
🔋 Features
- LanguageDetector can now provide confidence scores for each evaluated language. (#11)
- ✅ The public API for creating language model (LanguageModelFilesWriter) and test data files (TestDataFilesWriter) has been stabilized. (#37)
- 🆕 New convenience methods have been added to LanguageDetectorBuilder in order to build LanguageDetector from languages written in a certain script. (#61)
👌 Improvements
- The rule-based detection algorithm has been made less sensitive so that single words in a different language cannot mislead the algorithm so easily.
- The fastutil library has been added again to reduce memory consumption. (#58)
- ⚡️ The language model-based algorithm has been optimized so that language detection performs approximately 25% faster now. (#58)
- 👌 Support for the Kotlin linter ktlint has been added to help with a consistent coding style. (#47)
- ⚡️ Third-party dependencies have been updated to their latest versions. (#36)
🐛 Bug Fixes
- Incorrect regex character classes caused the library to not work properly on Android. (#32)
✅ Test Coverage
- ✅ Test coverage has been extended from 59% to 72%.
📚 Documentation
- The README contains a new section describing how users can add their own languages to Lingua.
Other changes

🚀 There is a breaking change in this release:
- Methods with the prefix fromAllBuiltIn... have been renamed to fromAll... to make them more succinct and clear. (#61)
v0.6.1 Changes
February 06, 2020
🐛 Bug Fixes
- The rule-based engine did not take language subset filtering from the public API into account (#23).
- It was possible to pass Language.UNKNOWN through the public API (#24).
- 🛠 Fixed a bug in the rule-based engine's alphabet detection algorithm which could be misled by single characters (#25).
v0.6.0 Changes
January 05, 2020
Languages
- ➕ added 11 new languages: Armenian, Bosnian, Azerbaijani, Esperanto, Georgian, Kazakh, Macedonian, Marathi, Mongolian, Serbian, Ukrainian
🔋 Features

🚀 There are some breaking changes in this release:
- 🚚 The support for MapDB has been removed. It did not provide enough advantages over Kotlin's lazy loading of language models. It used a lot of disk space and language detection became slow. With the long-term goal of creating a multiplatform library, only those features that support JavaScript as well will be implemented in the future.
- 🚚 The dependency on the fastutil library has been removed. It did not provide enough advantages over Kotlin's lazy loading of language models.
- 🚚 The method LanguageDetector.detectLanguagesOf(text: Iterable<String>) has been removed because the sorting order of the returned languages was undefined for input collections such as a HashSet. From now on, the method LanguageDetector.detectLanguageOf(text: String) will be the only one to be used.
- The LanguageDetector can now be built with the following additional methods:
  - LanguageDetectorBuilder.fromIsoCodes639_1(vararg isoCodes: IsoCode639_1)
  - LanguageDetectorBuilder.fromIsoCodes639_3(vararg isoCodes: IsoCode639_3)
- The following method has been removed: LanguageDetectorBuilder.fromIsoCodes(isoCode: String, vararg isoCodes: String)
- 🚚 The Gson library has been replaced with kotlinx-serialization for the loading of the json language models. This results in a significant reduction of code and makes reflection obsolete, so the dependency on kotlin-reflect could be removed.
👌 Improvements
- The overall detection algorithm has been refined in several places to fix a number of detection bugs.
v0.5.0 Changes
August 12, 2019
Languages
- ➕ added 12 new languages: Bengali, Chinese (not differentiated between traditional and simplified, as of now), Gujarati, Hebrew, Hindi, Japanese, Korean, Punjabi, Tamil, Telugu, Thai, Urdu
🔋 Features
- The LanguageDetectorBuilder now supports the additional method withMinimumRelativeDistance(), which specifies the minimum distance between the logarithmized and summed-up probabilities of the possible languages. If two or more languages yield nearly the same probability for a given input text, the wrong language is likely to be returned. With a higher minimum relative distance, Language.UNKNOWN is returned instead of risking false positives.
- Test report generation can now use multiple CPU cores, so that as many reports can run in parallel as CPU cores are available. This has been implemented as an additional attribute for the respective Gradle task: ./gradlew writeAccuracyReports -PcpuCores=...
- The REPL now lets you freely specify the languages you want to try out by entering the desired ISO 639-1 codes. Previously, it was only possible to choose between certain language combinations.
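
The minimum relative distance check can be sketched roughly as follows. This changelog does not state how Lingua normalizes the distance, so the formula below (gap between the two best scores divided by the magnitude of the runner-up's score) is an assumption made for illustration. The class name, the method name and the use of strings for languages are hypothetical as well.

```java
import java.util.Map;

// Hypothetical illustration only; the normalization is an assumption,
// not Lingua's actual formula.
final class RelativeDistanceSketch {

    static final String UNKNOWN = "UNKNOWN";

    // Picks the language with the highest summed log probability, unless the
    // runner-up is too close, in which case UNKNOWN is returned.
    static String pick(Map<String, Double> summedLogProbabilities,
                       double minimumRelativeDistance) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        double secondScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : summedLogProbabilities.entrySet()) {
            double score = entry.getValue();
            if (score > bestScore) {
                secondScore = bestScore;
                bestScore = score;
                best = entry.getKey();
            } else if (score > secondScore) {
                secondScore = score;
            }
        }
        if (best == null) {
            return UNKNOWN; // no candidates at all
        }
        if (secondScore == Double.NEGATIVE_INFINITY) {
            return best; // a single candidate has no competitor
        }
        double gap = bestScore - secondScore;
        double scale = Math.abs(secondScore);
        // Guard against division by zero when the runner-up's score is 0.
        double relativeDistance;
        if (scale == 0.0) {
            relativeDistance = (gap == 0.0) ? 0.0 : Double.POSITIVE_INFINITY;
        } else {
            relativeDistance = gap / scale;
        }
        return relativeDistance < minimumRelativeDistance ? UNKNOWN : best;
    }
}
```

Under this assumed formula, scores of -10.0 and -10.5 give a relative distance of about 0.048, so a threshold of 0.1 yields UNKNOWN while a threshold of 0.0 returns the top-scoring language.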

👌 Improvements
- The overall detection algorithm has been improved, yielding slightly more accurate results for those languages that are based on the Latin alphabet.
🐛 Bug Fixes

🛠 Thanks to the great work of contributor Bernhard Geisberger, two bugs have been fixed:
- The fix in pull request #8 makes it possible to recreate the MapDB cache files automatically if the data has been corrupted.
- The fix in pull request #9 makes the class LanguageDetector completely thread-safe. Previously, in rare cases two threads could mutate one of the internal variables at the same time, yielding inaccurate language detection results.

Thank you, Bernhard.
v0.4.0 Changes
May 07, 2019
🚀 This release took some time, but here it is.

Languages
- ➕ added 18 new languages: Afrikaans, Albanian, Basque, Bokmal, Catalan, Greek, Icelandic, Indonesian, Irish, Malay, Norwegian, Nynorsk, Slovak, Slovene, Somali, Tagalog, Vietnamese, Welsh
🔋 Features
- Language models are now lazy-loaded into memory upon first access rather than when an instance of LanguageDetector is created. If the rule-based engine can filter out some unlikely languages, their language models are not loaded into memory at all, which further reduces the overall memory consumption.
- The fastutil library is used to compress the probability values of the language models in memory. They are now stored as primitive data types (double) instead of objects (Double), which reduces memory consumption by approximately 500 MB if all language models are selected.

👌 Improvements
- 🔧 The overall code quality has been improved significantly. This allows for easier unit testing, configuration and extensibility.
🐛 Bug Fixes
- 🛠 Reported bug #3, which prevented certain character classes from being used on Android, has been fixed.
👷 Build system
- 👷 Starting from this version, Gradle is used as this library's build system instead of Maven. This allows for more customizations, such as in test report generation, and is a first step towards multiplatform support. Please take a look at this project's README to read about the available Gradle tasks.
✅ Test Coverage
- ✅ Test coverage has been extended from 24% to 55%.
v0.3.2 Changes
February 08, 2019
🚑 This minor update fixes a critical bug reported in issue #1.

🐛 Bug Fixes
- An attempt to detect the language of a string containing only characters that do not occur in any of the supported languages threw a kotlin.KotlinNullPointerException. This has been fixed in this release: Language.UNKNOWN is now returned as expected.
⚡️ Dependency Updates
- ⚡️ The Kotlin compiler, standard library and runtime have been updated from version 1.3.20 to 1.3.21
v0.3.1 Changes
January 24, 2019
⚡️ This minor update contains some significant detection accuracy improvements.

Accuracy Improvements
- ➕ added new detection rules to improve accuracy especially for single words and word pairs
- accuracy for single words has been increased from 78% to 82% on average
- accuracy for word pairs has been increased from 92% to 94% on average
- accuracy for sentences has been increased from 98% to 99% on average
- overall accuracy has been increased from 90% to 91% on average
- overall standard deviation has been reduced from 6.01 to 5.35
API changes
- LanguageDetectorBuilder.fromIsoCodes() now accepts vararg arguments instead of a List in order to have a consistent API with the other methods of LanguageDetectorBuilder
- If an ISO 639-1 code that does not exist is passed to LanguageDetectorBuilder.fromIsoCodes(), an IllegalArgumentException is now thrown. Previously, Language.UNKNOWN was returned, which could lead to bugs because a LanguageDetector containing Language.UNKNOWN was built. This is now prevented.
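
The fail-fast behavior described in the second item can be illustrated with a small Java sketch. It does not use Lingua's classes: the Lang enum, its three entries and the fromIsoCodes helper are hypothetical stand-ins that only show why rejecting an unknown code early is safer than quietly substituting a placeholder value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lingua's Language enum or builder.
final class IsoCodeLookupSketch {

    // Minimal stand-in for a language enum carrying ISO 639-1 codes.
    enum Lang {
        ENGLISH("en"), GERMAN("de"), FRENCH("fr");

        final String isoCode;

        Lang(String isoCode) {
            this.isoCode = isoCode;
        }
    }

    // Accepts varargs, mirroring the switch from a List parameter to varargs.
    static List<Lang> fromIsoCodes(String... isoCodes) {
        List<Lang> languages = new ArrayList<>();
        for (String code : isoCodes) {
            languages.add(lookup(code));
        }
        return languages;
    }

    // Fails fast on an unknown code instead of returning a placeholder that
    // would otherwise end up in the configured language set.
    private static Lang lookup(String code) {
        for (Lang lang : Lang.values()) {
            if (lang.isoCode.equals(code)) {
                return lang;
            }
        }
        throw new IllegalArgumentException("Unknown ISO 639-1 code: " + code);
    }
}
```

Calling fromIsoCodes("en", "de") returns the two matching entries, whereas fromIsoCodes("en", "xx") throws an IllegalArgumentException before any list is handed back to the caller.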
⚡️ Dependency Updates
- ⚡️ The Kotlin compiler, standard library and runtime have been updated from version 1.3.11 to 1.3.20

The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
