Popularity: 7.9 (Growing)

Activity: 6.4

Stars 752

Watchers 12

Forks 56

Last Commit about 2 months ago

Programming language: Kotlin

License: MIT License

Tags: Web

skrape.it alternatives and similar libraries

Based on the "Web" category.
Alternatively, view skrape.it alternatives based on common mentions on social networks and blogs.

ktor

9.9 9.4 skrape.it VS ktor

Framework for quickly creating connected applications in Kotlin with minimal effort
javalin

9.7 9.1 skrape.it VS javalin

DISCONTINUED. A simple and modern Java and Kotlin web framework [Moved to: https://github.com/javalin/javalin]

apollo-android

9.4 9.8 skrape.it VS apollo-android

:robot: A strongly-typed, caching GraphQL client for the JVM, Android, and Kotlin multiplatform.
http4k

9.1 9.8 skrape.it VS http4k

The Functional toolkit for Kotlin HTTP applications. http4k provides a simple and uniform way to serve, consume, and test HTTP services.
GraphQL Kotlin

8.9 7.9 skrape.it VS GraphQL Kotlin

Libraries for running GraphQL in Kotlin
jooby

8.8 9.7 L3 skrape.it VS jooby

The modular web framework for Java and Kotlin
kotlinx.html

8.6 7.4 skrape.it VS kotlinx.html

Kotlin DSL for HTML
KVision

8.5 8.8 skrape.it VS KVision

Object oriented web framework for Kotlin/JS
kotless

8.4 0.0 skrape.it VS kotless

Kotlin Serverless Framework
core

8.3 8.4 skrape.it VS core

A Kotlin web framework
spark-kotlin

8.3 0.0 skrape.it VS spark-kotlin

A Spark DSL in idiomatic kotlin // dependency: com.sparkjava:spark-kotlin:1.0.0-alpha
hexagon

7.8 9.5 skrape.it VS hexagon

Hexagon is a microservices toolkit written in Kotlin. Its purpose is to ease the building of services (Web applications or APIs) that run inside a cloud platform.
kara

7.6 0.0 skrape.it VS kara

Kotlin Web Framework for the JVM
wasabi

7.5 0.0 skrape.it VS wasabi

An HTTP Framework
A pure Kotlin, UI framework

7.4 8.9 skrape.it VS A pure Kotlin, UI framework

A pure Kotlin UI framework for the Web (and desktop).
firefly

7.3 6.0 skrape.it VS firefly

Firefly is an asynchronous web framework for rapid development of high-performance web application.
fritz2

7.3 8.6 skrape.it VS fritz2

Easily build reactive web-apps in Kotlin based on flows and coroutines.
KGraphQL

7.1 0.0 skrape.it VS KGraphQL

DISCONTINUED. A GraphQL implementation written in Kotlin
vertx-lang-kotlin

6.8 7.2 skrape.it VS vertx-lang-kotlin

Vert.x for Kotlin
Kanary

6.6 0.0 skrape.it VS Kanary

A minimalist web framework for building REST APIs in Kotlin/Java.
vaadin-on-kotlin

5.8 8.7 skrape.it VS vaadin-on-kotlin

Writing full-stack statically-typed web apps on JVM at its simplest
alpas

5.7 0.0 skrape.it VS alpas

🚀 The Rapid and Delightful Kotlin Web Framework. Easy, elegant, and productive!
kraph

5.5 0.0 skrape.it VS kraph

GraphQL request string builder written in Kotlin
kovert

5.4 0.0 skrape.it VS kovert

The invisible REST and web framework
ShapeShift️

5.2 0.8 skrape.it VS ShapeShift️

A Kotlin/Java library for intelligent object mapping and conversion between objects.
krawler

5.0 0.0 skrape.it VS krawler

A web crawling framework written in Kotlin
lambda-kotlin-request-router

4.5 8.1 skrape.it VS lambda-kotlin-request-router

A REST request routing layer for AWS lambda handlers written in Kotlin
yested

4.5 0.0 skrape.it VS yested

A Kotlin framework for building web applications in Javascript.
KotlinPrimavera

4.5 0.0 skrape.it VS KotlinPrimavera

Spring support libraries for Kotlin
kottpd

4.2 0.0 skrape.it VS kottpd

REST framework written in pure Kotlin
kog

3.1 0.0 skrape.it VS kog

🌶 A simple Kotlin web framework inspired by Clojure's Ring.
tekniq

2.9 7.8 skrape.it VS tekniq

A framework designed around Kotlin providing Restful HTTP Client, JDBC DSL, Loading Cache, Configurations, Validations, and more
Pellet

2.3 4.1 skrape.it VS Pellet

An opinionated, Kotlin-first web framework that helps you write fast, concise, and correct backend services 🚀.
bootique-kotlin

2.2 7.2 skrape.it VS bootique-kotlin

RETIRED. Provides extension functions and features for smooth development with Bootique and Kotlin.
kotlin

2.2 0.0 skrape.it VS kotlin

DISCONTINUED. Starter project for Kotlin
h

2.0 0.0 skrape.it VS h

Html templating library for kotlin
Zeko-RestApi

1.5 0.0 skrape.it VS Zeko-RestApi

Asynchronous web framework for Kotlin. Create REST APIs in Kotlin easily with automatic Swagger/OpenAPI doc generation
graphql-kotlin-toolkit

0.9 0.0 skrape.it VS graphql-kotlin-toolkit

GraphQL toolkit for Kotlin.
komock

0.8 0.0 skrape.it VS komock

KoMock - Simple HTTP/Consul/SpringConfig http server framework written in Kotlin. Wiremock use cases
voyager-server-spring-boot-starter

0.5 0.0 skrape.it VS voyager-server-spring-boot-starter

Easily create REST endpoints with permissions (access control level) and hooks includeded
sponge

0.4 4.7 skrape.it VS sponge

sponge is a website crawler and links downloader command-line tool
kweb-core

- skrape.it VS kweb-core

Build rich live-updating web apps in pure server-side Kotlin.
graphql-kotlin

- skrape.it VS graphql-kotlin

Code-only GraphQL schema generation for Kotlin

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

skrape{it}

skrape{it} is a Kotlin-based HTML/XML testing and web scraping library that can be used seamlessly in Spring Boot, Ktor, Android or other Kotlin/JVM projects. What sets it apart is the ability to analyze and extract HTML, including client-side rendered DOM trees, as well as any other XML-related markup specification such as SVG, UML or RSS. It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. First and foremost skrape{it} aims to be a testing tool (not tied to a particular test runner), but it can also be used to scrape websites in a convenient fashion.

Features

Parsing

[x] Deserialization of HTML/XML from websites, local HTML files and HTML strings to data classes / POJOs.
[x] Designed to deserialize HTML but can handle any XML-related markup specification such as SVG, UML, RSS or XML itself.
[x] DSL to select HTML elements, plus support for CSS query-selector syntax by string invocation.

Http-Client

[x] Http-Client without verbosity and ceremony: make requests and set request options such as headers and cookies through a fluent-style interface.
[x] Pre-configure the client regarding auth and other request settings.
[x] Can handle client-side rendered web pages. JavaScript execution results can optionally be considered in the response body.

Idiomatic

[x] Easy-to-use, idiomatic and type-safe DSL to ensure a high level of readability.
[x] Built-in matchers/assertions based on infix functions to achieve a very high level of readability.
[x] The DSL behaves like a fluent API to make data extraction/scraping as comfortable as possible.

Compatibility

[x] Not bound to a specific test runner or framework.
[x] Open to use with any other assertion library of your choice.
[x] Open to implementing your own fetcher.
[x] Supports non-blocking fetching / coroutines.

Extensions

In addition, extensions for well-known testing libraries are provided to extend them with the mentioned skrape{it} functionality. Currently available:

skrape{it} MockMvc extension
skrape{it} Ktor extension

Quick Start

Read the Docs

You'll always find the latest documentation, release notes and examples for official releases at https://docs.skrape.it. The README file you are reading right now provides examples related to the latest master. Use it only if you don't want to wait for the latest changes to be released. If you don't want to read that much or just want a rough overview of how to use skrape{it}, have a look at the Documentation by Example section, which refers to the current master.

Installation

All official/stable releases are published to Maven Central.

Add dependency

Gradle

dependencies {
    implementation("it.skrape:skrapeit:1.2.2")
}

Maven

<dependency>
    <groupId>it.skrape</groupId>
    <artifactId>skrapeit</artifactId>
    <version>1.2.2</version>
</dependency>

Using bleeding-edge features before official release

We offer snapshot releases by publishing every successful build of a commit that has been pushed to the master branch. This lets you install the latest implementation of skrape{it}. Be careful: these are non-official releases, they may be unstable, and breaking changes can occur at any time.

Add experimental stuff

Gradle

repositories {
    maven { url = uri("https://oss.sonatype.org/content/repositories/snapshots/") }
}
dependencies {
    implementation("it.skrape:skrapeit:0-SNAPSHOT") { isChanging = true } // version number will stay - implementation may change ...
}

// optional
configurations.all {
    resolutionStrategy {
        cacheChangingModulesFor(0, "seconds")
    }
}

Maven

<repositories>
    <repository>
        <id>snapshot</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
    </repository>
</repositories>

...

<dependency>
    <groupId>it.skrape</groupId>
    <artifactId>skrapeit</artifactId>
    <version>0-SNAPSHOT</version>
</dependency>

Documentation by Example

(referring to current master)

You can find further examples in the project's integration tests.

Android

We have a working Android sample using Jetpack Compose in our example projects as living documentation.

Parse and verify HTML from String

@Test
fun `can read and return html from String`() {
    htmlDocument("""
        <html>
            <body>
                <h1>welcome</h1>
                <div>
                    <p>first p-element</p>
                    <p class="foo">some p-element</p>
                    <p class="foo">last p-element</p>
                </div>
            </body>
        </html>""") {

        h1 {
            findFirst {
                text toBe "welcome"
            }
        }
        p {
            withClass = "foo"
            findFirst {
                text toBe "some p-element"
                className toBe "foo"
            }
        }
        p {
            findAll {
                text toContain "p-element"
            }
            findLast {
                text toBe "last p-element"
            }
        }
    }
}

Parse HTML and extract

data class MySimpleDataClass(
    val httpStatusCode: Int,
    val httpStatusMessage: String,
    val paragraph: String,
    val allParagraphs: List<String>,
    val allLinks: List<String>
)

class HtmlExtractionService {

    fun extract() {
        val extracted = skrape(HttpFetcher) {
            request {
                url = "http://localhost:8080"
            }

            response {
                MySimpleDataClass(
                    httpStatusCode = status { code },
                    httpStatusMessage = status { message },
                    allParagraphs = document.p { findAll { eachText } },
                    paragraph = document.p { findFirst { text } },
                    allLinks = document.a { findAll { eachHref } }
                )
            }
        }
        print(extracted)
        // will print:
        // MySimpleDataClass(httpStatusCode=200, httpStatusMessage=OK, paragraph=i'm a paragraph, allParagraphs=[i'm a paragraph, i'm a second paragraph], allLinks=[http://some.url, http://some-other.url])
    }
}

Parse HTML and extract it

data class MyDataClass(
        var httpStatusCode: Int = 0,
        var httpStatusMessage: String = "",
        var paragraph: String = "",
        var allParagraphs: List<String> = emptyList(),
        var allLinks: List<String> = emptyList()
)

class HtmlExtractionService {

    fun extract() {
        val extracted = skrape(HttpFetcher) {
            request {
                url = "http://localhost:8080"
            }           

            extractIt<MyDataClass> {
                it.httpStatusCode = statusCode
                it.httpStatusMessage = statusMessage.toString()
                htmlDocument {
                    it.allParagraphs = p { findAll { eachText }}
                    it.paragraph = p { findFirst { text }}
                    it.allLinks = a { findAll { eachHref }}
                }
            }
        }
        print(extracted)
        // will print:
        // MyDataClass(httpStatusCode=200, httpStatusMessage=OK, paragraph=i'm a paragraph, allParagraphs=[i'm a paragraph, i'm a second paragraph], allLinks=[http://some.url, http://some-other.url])
    }
}
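
The data class in the extractIt example differs from the previous one: every property is a var with a default value. A plausible reason is that the target object has to be created first and is then filled in by the lambda. The following standalone sketch illustrates that pattern with the Kotlin standard library only. PageSummary, fillIn and examplePageSummary are hypothetical names chosen for illustration; this is an assumption about the general technique, not skrape{it}'s actual implementation.

```kotlin
// Same shape as MyDataClass above: mutable properties, all with defaults.
// On the JVM, Kotlin generates a no-arg constructor for such a class.
data class PageSummary(
    var statusCode: Int = 0,
    var paragraph: String = "",
    var links: List<String> = emptyList()
)

// Hypothetical helper (not part of skrape{it}): create T through its
// no-arg constructor, let the block populate it, then return it.
inline fun <reified T : Any> fillIn(block: (T) -> Unit): T {
    val instance = T::class.java.getDeclaredConstructor().newInstance()
    block(instance)
    return instance
}

// Usage: the lambda receives the fresh instance as `it`.
fun examplePageSummary(): PageSummary = fillIn<PageSummary> {
    it.statusCode = 200
    it.paragraph = "i'm a paragraph"
    it.links = listOf("http://some.url")
}
```

A class with val properties or without defaults has no no-arg constructor, so this kind of helper would fail at runtime for it; in that case building the object directly inside response { }, as in the first extraction example, is the simpler route.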

Testing HTML responses:

@Test
fun `dsl can skrape by url`() {
    skrape(HttpFetcher) {
        request {
            url = "http://localhost:8080/example"
        }       
        response {
            htmlDocument {
                // all official html and html5 elements are supported by the DSL
                div {
                    withClass = "foo" and "bar" and "fizz" and "buzz"

                    findFirst {
                        text toBe "div with class foo"

                        // it's possible to search for elements from former search results
                        // e.g. search all matching span elements within the above div with class foo etc...
                        span {
                            findAll {
                                // do something
                            }                       
                        }                   
                    }

                    findAll {
                        toBePresentExactlyTwice
                    }
                }
                // can handle custom tags as well
                "a-custom-tag" {
                    findFirst {
                        toBePresentExactlyOnce
                        text toBe "i'm a custom html5 tag"
                        text
                    }
                }
                // can handle custom tags written in css selector query syntax
                "div.foo.bar.fizz.buzz" {
                    findFirst {
                        text toBe "div with class foo"
                    }
                }

                // can handle custom tags and add selector specifics via DSL
                "div.foo" {

                    withClass = "bar" and "fizz" and "buzz"

                    findFirst {
                        text toBe "div with class foo"
                    }
                }
            }
        }
    }
}
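
The matchers used above (text toBe "...", text toContain "...") read like plain sentences because they are infix functions. The sketch below shows how such matchers can be written with nothing but the Kotlin standard library. It reuses the names toBe and toContain for readability, but it is a re-implementation for illustration only and not skrape{it}'s own code; verifyHeadline is a hypothetical helper.

```kotlin
// Fails with an AssertionError when the values differ; returns the
// receiver otherwise so that matchers can be chained.
infix fun <T> T.toBe(expected: T): T {
    if (this != expected) throw AssertionError("expected <$expected> but was <$this>")
    return this
}

// Fails when the receiver does not contain the given substring.
infix fun String.toContain(part: String): String {
    if (!contains(part)) throw AssertionError("expected <$this> to contain <$part>")
    return this
}

// Hypothetical helper showing how the matchers read at the call site.
// Infix calls are left-associative: (text toBe "welcome") toContain "come".
fun verifyHeadline(text: String): String = text toBe "welcome" toContain "come"
```

Because the matchers throw AssertionError, a sketch like this works with any test runner, which matches the README's point that the library is not tied to a specific one.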

Scrape a client side rendered page:

fun getDocumentByUrl(urlToScrape: String) = skrape(BrowserFetcher) { // <--- pass BrowserFetcher to include rendered JS
    request { url = urlToScrape }
    response { htmlDocument { this } }
}


fun main() {
    // do stuff with the document
    println(getDocumentByUrl("https://docs.skrape.it").eachLink)
}

Scrape async

skrape{it}'s `AsyncFetcher` provides coroutine support

suspend fun getAllLinks(): Map<String, String> = skrape(AsyncFetcher) {
    request {
        url = "https://my-fancy.website"
    }
    response {
        htmlDocument { eachLink }
    }
}

Configure HTTP-Client:

class ExampleTest {
    val myPreConfiguredClient = skrape(HttpFetcher) {
        // url can be a plain url as string or built by #urlBuilder
        request {
            method = Method.POST // defaults to GET

            url = "" // you can either pass the url as a String (defaults to 'http://localhost:8080')
            url { // or build the url (will respect the value from the url String param)
                // thereby you can pass a url and just override or add parts
                protocol = UrlBuilder.Protocol.HTTPS // defaults to given scheme from url param (HTTP if not set)
                host = "skrape.it" // defaults to given host from url param (localhost if not set)
                port = 12345  // defaults to given port from url param (8080 if not set explicitly; no port if the given url param value does not have one) - set to -1 to remove port
                path = "/foo" // defaults to given path from url param (no path if not set)
                queryParam { // can handle adding query parameters of several types (defaults to none)
                    "foo" to "bar" // add query parameter foo=bar
                    "aaa" to false // add query parameter aaa=false
                    "bbb" to .4711 // add query parameter bbb=0.4711
                    "ccc" to 42    // add query parameter ccc=42
                    "ddd" to listOf("a", 1, null) // add query parameter ddd=a,1,null
                    +"xxx"         // add query parameter xxx (just key, no value)
                }
            }
            timeout = 5000 // optional -> defaults to 5000ms
            followRedirects = true // optional -> defaults to true
            userAgent = "some custom user agent" // optional -> defaults to "Mozilla/5.0 skrape.it"
            cookies = mapOf("some-cookie-name" to "some-value") // optional
            headers = mapOf("some-custom-header" to "some-value") // optional
        }
    }

    @Test
    fun `can use preconfigured client`() {

        myPreConfiguredClient.response {
            status { code toBe 200 }
            // do more stuff
        }

        // slightly modify preconfigured client
        myPreConfiguredClient.apply {
            request {
                followRedirects = false
            }
        }.response {
            status { code toBe 301 }
            // do more stuff
        }
    }
}
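
The queryParam block above relies on two Kotlin features: an infix to that records a key/value pair instead of creating a Pair, and a unary plus on String for key-only parameters. The following standalone sketch shows one way such a builder could work. QueryParamSketch and queryString are hypothetical names, values are not URL-encoded here, and none of this is taken from skrape{it}'s source.

```kotlin
// Hypothetical builder: collects rendered "key=value" parts in call order.
class QueryParamSketch {
    private val parts = mutableListOf<String>()

    // Inside the builder this member extension takes precedence over the
    // standard library's `to`, so `"foo" to "bar"` records a parameter.
    infix fun String.to(value: Any?) {
        val rendered = if (value is Iterable<*>) value.joinToString(",") else value.toString()
        parts += "$this=$rendered"
    }

    // `+"xxx"` adds a key without a value.
    operator fun String.unaryPlus() {
        parts += this
    }

    fun build(): String = parts.joinToString("&")
}

// Runs the block against a fresh builder and returns the query string.
fun queryString(init: QueryParamSketch.() -> Unit): String =
    QueryParamSketch().apply(init).build()
```

Called with the same entries as the README example, queryString returns the parameters joined with & in the order they were declared.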

send request body

1) plain as string

The most low-level option: the content-type header has to be set "by hand".

skrape(HttpFetcher) {
    request {
        url = "https://www.my-fancy.url"
        method = Method.GET
        headers = mapOf("Content-Type" to "application/json")
        body = """{"foo":"bar"}"""
    }
    response {
        htmlDocument {
            ...

2) plain text with auto added content-type header that can be optionally overwritten

skrape(HttpFetcher) {
    request {
        url = "https://www.my-fancy.url"
        method = Method.POST
        body {
            data = "just a plain text" // content-type header will automatically set to "text/plain"
            contentType = "your-custom/content" // can optionally override content-type
        }
    }
    response {
        htmlDocument {
            ...

3) with helper functions for json or xml bodies

Supports JSON and XML autocompletion when using IntelliJ.

skrape(HttpFetcher) {
    request {
        url = "https://www.my-fancy.url"
        method = Method.POST
        body {
            json("""{"foo":"bar"}""") // will automatically set content-type header to "application/json" 
            // or
            xml("<foo>bar</foo>") // will automatically set content-type header to "text/xml" 
            // or
            form("foo=bar") // will automatically set content-type header to "application/x-www-form-urlencoded" 
        }
    }
    response {
        htmlDocument {
            ...

4) with on the fly created json via dsl

skrape(HttpFetcher) {
    request {
        url = "https://www.my-fancy.url"
        method = Method.POST
        body {
            // will automatically set content-type header to "application/json"
            // will create {"foo":"bar","xxx":{"a":"b","c":[1,"d"]}} as request body
            json {
                "foo" to "bar"
                "xxx" to json {
                    "a" to "b"
                    "c" to listOf(1, "d")
                }
            }
        }
    }
    response {
        htmlDocument {
            ...
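
To make the mapping from the json { } block to the resulting body more concrete, here is a small standalone sketch of a JSON-building DSL. RawJson, JsonSketch and renderJson are hypothetical names, strings are not escaped, and the code only illustrates the builder technique; it is not skrape{it}'s implementation.

```kotlin
// Wrapper marking text that is already rendered JSON (used for nesting).
class RawJson(val text: String)

// Renders a single value; no string escaping in this sketch.
fun renderJson(value: Any?): String = when (value) {
    null -> "null"
    is RawJson -> value.text
    is String -> "\"$value\""
    is Iterable<*> -> value.joinToString(",", "[", "]") { renderJson(it) }
    else -> value.toString()
}

// Hypothetical builder: `"key" to value` records one field of the object.
class JsonSketch {
    private val fields = mutableListOf<String>()

    infix fun String.to(value: Any?) {
        fields += "\"$this\":${renderJson(value)}"
    }

    fun build(): String = fields.joinToString(",", "{", "}")
}

// Entry point; can be nested because it returns RawJson.
fun json(init: JsonSketch.() -> Unit): RawJson =
    RawJson(JsonSketch().apply(init).build())
```

With this sketch, the block from the README example renders to the same string that its comment describes, with fields kept in declaration order.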

5) with on the fly created form via dsl

skrape(HttpFetcher) {
    request {
        url = "https://www.my-fancy.url"
        method = Method.POST
        body {
            // will automatically set content-type header to "application/x-www-form-urlencoded"
            // will create foo=bar&xxx=1.5 as request body
            form {
                "foo" to "bar"
                "xxx" to 1.5
            }
        }
    }
    response {
        htmlDocument {
            ...
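
For reference, an application/x-www-form-urlencoded body is a list of percent-encoded key=value pairs joined with &. The sketch below builds such a body with java.net.URLEncoder from the JDK. formEncode is a hypothetical helper for illustration and not part of skrape{it}.

```kotlin
import java.net.URLEncoder

// Hypothetical helper: encodes keys and values and joins them the way an
// application/x-www-form-urlencoded body expects.
fun formEncode(vararg fields: Pair<String, Any?>): String =
    fields.joinToString("&") { (key, value) ->
        val encodedKey = URLEncoder.encode(key, "UTF-8")
        val encodedValue = URLEncoder.encode(value.toString(), "UTF-8")
        "$encodedKey=$encodedValue"
    }
```

For the fields of the README example, formEncode("foo" to "bar", "xxx" to 1.5) yields foo=bar&xxx=1.5. Reserved characters are escaped, so a value such as "a b&c" becomes a+b%26c.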

Get in touch

If you need help, have questions on how to use skrape{it} or want to discuss features, please don't hesitate to use the project's discussions section on GitHub, or raise an issue if you found a bug.

Issues: You can raise issues on GitHub.
Discussions / Questions: Use the Discussions section or join the #skrape-it channel on the Kotlin Slack.
Twitter: Follow @skrape_it on Twitter for updates and release notifications.
Stack Overflow: post or search issues on Stack Overflow.

:sparkling_heart: Support the project

Skrape{it} is and always will be free and open-source. I try to reply to everyone who needs help using this project. Naturally, development and maintenance take time.

However, if you are using this project and are happy with it, or just want to encourage me to continue creating stuff or fund the caffeine and pizzas that fuel its development, there are a few ways you can do it:

Starring and sharing the project :rocket: to help make it more popular
Giving proper credit when you use skrape{it}, tell your friends and others about it :smiley:
Sponsoring skrape{it} with a one-time donation via PayPal, or using the GitHub Sponsors program to support it on a monthly basis :sparkling_heart:
