All Versions: 6
Latest Version: v0.4.4
Avg Release Cycle: 218 days
Latest Release: 1549 days ago

Changelog History

  • v0.4.4 Changes

    January 29, 2020
    • ⬆️ Upgrade Kotlin to 1.3.61
    • ⚡️ Upgrade kotlinx.coroutines. This required an update to some of the places where coroutine builders were called internally.
    • ⬆️ Upgrade Gradle wrapper
  • v0.4.3 Changes

    November 26, 2017
    • ➕ Added ability to clear crawl queues by RequestId and Age; see Krawler#removeUrlsByRootPage
      and Krawler#removeUrlsByAge
    • ➕ Added config option to prevent crawler shutdown on empty queues
    • ➕ Added a new single-byte priority field to KrawlQueueEntry. Queues always attempt to pop the lowest-priority
      entry available. Priority can be assigned by overriding the Krawler#assignQueuePriorty method.
    • ⚡️ Update dependencies
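
The priority behavior described for v0.4.3 can be illustrated with a minimal in-memory sketch. It assumes "lowest priority" means the smallest byte value. `QueueEntry` and `PriorityCrawlQueue` are hypothetical stand-ins, not Krawler's actual `KrawlQueueEntry` or queue classes.

```kotlin
import java.util.PriorityQueue

// Hypothetical stand-in for a queue entry carrying a single-byte priority.
data class QueueEntry(val url: String, val priority: Byte = 0)

// In-memory illustration only; pop() always returns the entry with the smallest priority value.
class PriorityCrawlQueue {
    private val entries = PriorityQueue<QueueEntry>(compareBy<QueueEntry> { it.priority })

    fun push(entry: QueueEntry) {
        entries.add(entry)
    }

    // Returns null when the queue is empty.
    fun pop(): QueueEntry? = entries.poll()
}
```

Entries pushed with priorities 5, 1, and 3 would come back out in the order 1, 3, 5, regardless of insertion order.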
  • v0.4.1 Changes

    August 16, 2017

    0.4.1 (2017-8-15)

    • ✂ Removed logging implementation from dependencies to prevent logging conflicts when used as a library.
    • ⚡️ Updated Kotlin version to 1.1.4
    • ⚡️ Updated kotlinx.coroutines to 0.17
  • v0.4.0 Changes

    May 16, 2017

    0.4.0 (2017-5-17)

    🚚 Rewrote the core crawl loop to use Kotlin 1.1 coroutines. This effectively turns the crawl process into a multi-stage pipeline. The new architecture removes the need for some locking, because multiple threads no longer contend for the same resources.

    ⚡️ Updated the build file to build the simple example as a runnable jar

    Minor bug fixes in the KrawlUrl class.
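
To show what a multi-stage coroutine pipeline can look like, here is a rough sketch. It uses the current stable kotlinx.coroutines channel API rather than the pre-1.0 API available in 2017, and the `Fetched` type, the `crawlPipeline` function, and the `fetch` parameter are all hypothetical. It is not Krawler's actual crawl loop.

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical stage payload; not one of Krawler's real types.
data class Fetched(val url: String, val body: String)

fun crawlPipeline(seeds: List<String>, fetch: (String) -> String): List<String> = runBlocking {
    val urls = Channel<String>()
    val pages = Channel<Fetched>()

    // Stage 1: feed URLs into the pipeline, then signal completion.
    launch {
        for (url in seeds) urls.send(url)
        urls.close()
    }

    // Stage 2: fetch each URL and hand the result downstream.
    launch {
        for (url in urls) pages.send(Fetched(url, fetch(url)))
        pages.close()
    }

    // Stage 3: consume fetched pages. Data moves between stages through channels,
    // so no stage shares mutable state with another and no locks are needed.
    val visited = mutableListOf<String>()
    for (page in pages) visited += "${page.url} (${page.body.length} chars)"
    visited
}
```

Each stage only touches the data it receives, which is the property that lets a design like this drop some explicit locking.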

  • v0.3.2 Changes

    March 03, 2017

    🛠 Fixed a number of bugs that would crash a worker thread, resulting in an incorrect count of crawled pages
    and in slowdowns due to the reduced number of worker threads.

    ➕ Added a new utility function to wrap doCrawl and log any uncaught exceptions during crawling.
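
A wrapper of the kind described for v0.3.2 might look roughly like the sketch below. The name `runLoggingFailures`, its parameters, and its Boolean return value are assumptions made for illustration; the changelog does not give the real utility's name or signature.

```kotlin
// Hypothetical helper; the name and signature are illustrative, not Krawler's API.
fun runLoggingFailures(
    taskName: String,
    log: (String) -> Unit = { System.err.println(it) },
    task: () -> Unit
): Boolean =
    try {
        task()
        true
    } catch (e: Exception) {
        // Log instead of rethrowing so the worker thread stays alive.
        log("$taskName failed: ${e.message}")
        false
    }
```

Catching the exception at the boundary of each crawl task keeps one bad page from silently killing a worker, which is the failure mode the v0.3.2 fixes address.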

  • v0.3.1 Changes

    February 02, 2017
    • Created a 1:1 mapping between worker threads and the queues used to serve URLs to visit. URLs have an
      affinity for a particular queue based on their domain: all URLs from that domain end up in the same
      queue. This improves parallel crawl performance by reducing how often the politeness delay
      affects requests. For crawls bound to fewer domains than queues, the excess queues are not used.
    • 🛠 Many bug fixes, including a fix that eliminates accidental over-crawling.
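
One simple way to give URLs a domain-based queue affinity is to hash the host name into a queue index, as sketched below. The changelog does not say how Krawler actually maps domains to queues, so the hashing scheme and the `queueIndexFor` function are assumptions.

```kotlin
import java.net.URI

// Hypothetical mapping: every URL with the same host lands on the same queue index.
fun queueIndexFor(url: String, queueCount: Int): Int {
    require(queueCount > 0) { "queueCount must be positive" }
    // Host names are case-insensitive, so normalize before hashing.
    val host = URI(url).host?.lowercase() ?: ""
    // floorMod keeps the index non-negative even when hashCode() is negative.
    return Math.floorMod(host.hashCode(), queueCount)
}
```

With a scheme like this, a crawl restricted to two domains can occupy at most two queues, so any additional queues stay idle, which matches the behavior noted above.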