30 July 2018 - Big Data & Data Science

OwnSearch: Web Scraper Software Development

Many IT Svit customers needed to find specific information on their corporate websites quickly. Platform-specific search engines fell short, so we decided to create a bespoke web scraper tool that can be added to any website and quickly build a custom search index for it.

Project requirements

IT Svit needed to overcome the following challenges:

  • Web crawlers must be lightweight and simple, yet efficient
  • The search index must be built and processed quickly
  • The tools must have a convenient user interface
  • The tools must have low hardware requirements

Project results

IT Svit developed the required web scrapers and other Big Data solutions so that our customer could assemble the data set for training their search engine. Toweya provided the basic specifications, and we helped them create an easy-to-use, performant search engine platform that supports incremental web search and returns precise results.

 

Location: Kharkiv, Ukraine

Partnership period: August 2015 – February 2018

Team size: 2 – 4 people

Team location: Kharkiv, Ukraine

Services: Cloud infrastructure design and development, Python development, Data Science, Big Data solutions, Machine Learning algorithms

Expertise delivered: Cloud infrastructure design and implementation, Python development, Big Data architecture design and management

Technologies: Python, asyncio, aiohttp

 

Product Overview

Client’s goals

The main challenge was that built-in search tools were either absent or too rigid. We decided to build the web scraper solution from scratch and ensure it can easily interact with any type of CMS or website builder platform.

We wanted this tool to have the following characteristics:

  • High performance
  • Low system resource consumption
  • Ease of configuration
  • Simplicity of usage

Implementation and challenges resolved

The scraper was built in Python using the asyncio and aiohttp libraries and meets all the aforementioned requirements:

  • The scraper comes with a built-in webserver to ensure the simplicity of launching it
  • The tool can be easily integrated into any website
  • The search index results can be viewed through any browser
  • The scraper has low hardware requirements
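
To make the approach more concrete, here is a minimal sketch of how an asyncio-based scraper could download pages concurrently and build a simple word-to-URL search index. It is not the OwnSearch code: the names build_index, search, tokenize and crawl_with_aiohttp, the tag-stripping regular expression and the max_concurrency limit are all assumptions made for illustration.

```python
import asyncio
import re
from collections import defaultdict
from typing import Awaitable, Callable, Dict, Iterable, Set

# Hypothetical type for any coroutine that turns a URL into page text.
Fetcher = Callable[[str], Awaitable[str]]

TAG = re.compile(r"<[^>]+>")      # crude HTML tag stripper (assumption)
TOKEN = re.compile(r"[a-z0-9]+")  # lowercase alphanumeric words


def tokenize(html: str) -> Set[str]:
    """Strip tags and return the set of lowercase words on a page."""
    return set(TOKEN.findall(TAG.sub(" ", html).lower()))


async def build_index(
    urls: Iterable[str], fetch: Fetcher, max_concurrency: int = 10
) -> Dict[str, Set[str]]:
    """Fetch all pages concurrently and map each word to the URLs containing it."""
    semaphore = asyncio.Semaphore(max_concurrency)  # caps simultaneous downloads
    index: Dict[str, Set[str]] = defaultdict(set)

    async def process(url: str) -> None:
        async with semaphore:
            html = await fetch(url)
        for word in tokenize(html):
            index[word].add(url)

    await asyncio.gather(*(process(url) for url in urls))
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the URLs that contain every word of the query."""
    words = TOKEN.findall(query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


async def crawl_with_aiohttp(urls: Iterable[str]) -> Dict[str, Set[str]]:
    """Build the index over real HTTP using one shared aiohttp session."""
    import aiohttp  # third-party; imported lazily so the code above runs without it

    async with aiohttp.ClientSession() as session:

        async def fetch(url: str) -> str:
            async with session.get(url) as response:
                response.raise_for_status()
                return await response.text()

        return await build_index(urls, fetch)
```

Passing the fetcher in as a parameter keeps the indexing logic independent of the network layer, so it can be exercised with an in-memory stub. With aiohttp installed, something like asyncio.run(crawl_with_aiohttp(["https://example.com/"])) would produce an index that search can query.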

Because it is written in Python on top of asyncio and aiohttp, the tool handles many page requests concurrently and works quickly.
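
The speed of an asyncio-based tool comes largely from overlapping the time spent waiting on the network. The following sketch illustrates that effect only: fake_download stands in for a real page request (an assumption, no network is involved), and the helper names sequential and concurrent are hypothetical.

```python
import asyncio
import time


async def fake_download(delay: float) -> None:
    """Simulate waiting on a network response for `delay` seconds."""
    await asyncio.sleep(delay)


async def sequential(count: int, delay: float) -> float:
    """Run the downloads one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        await fake_download(delay)
    return time.perf_counter() - start


async def concurrent(count: int, delay: float) -> float:
    """Run the downloads together with asyncio.gather and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_download(delay) for _ in range(count)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential:", asyncio.run(sequential(5, 0.05)))
    print("concurrent:", asyncio.run(concurrent(5, 0.05)))
```

In the sequential case the waits add up, while in the concurrent case they overlap, so the total time stays close to that of a single request. Real gains depend on the target server and the network.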
