Personalized News Aggregator

One of IT Svit's customers needed a user-friendly news aggregator platform built from scratch. Each user of the platform had to be able to form a unique, personalized news feed based on their preferences and search filters.

Project requirements

IT Svit had to provide the following features:

The news data must be relevant, meaning the data set must be frequently updated
The news must be categorized to simplify the navigation and personalize the user experience
To be valuable, the feed must list only the primary source of each news item, not its duplicates

Project results

IT Svit built a powerful, flexible, high-performance news aggregation platform. Subscribers could embed the news feed code into their websites and show the latest niche news related to their industry there.

Location: Netherlands
Partnership period: December 2016 – July 2017
Team size: 2 people
Team location: Kharkiv, Ukraine
Services: Big Data Development, Machine Learning, Web Development, Data Science, Cloud Architecture, Python Development
Expertise delivered: Big Data solutions, training ML models, a rapid detection system for checking news source updates
Technologies: Python, Django, Docker, jQuery, Theano, aiohttp, MongoDB, SphinxSearch

Product Overview

Client’s Goals

The customer needed a platform for automated collection, verification, deduplication and classification of industry-specific news from a variety of credible sources. The resulting news feeds would be shown on a web portal and could also be embedded into any website.

The system had to provide the following features:

The data set must be frequently updated and kept relevant
The news must be categorized to simplify the navigation and personalize the user experience
Only the primary source of the news must be listed, not the duplicates
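
The "primary source only" requirement can be illustrated with a minimal sketch. It rests on two loud assumptions that do not come from the project itself: articles count as duplicates when their headlines match after normalization, and the earliest published copy is treated as the primary source. The names `Article`, `title_key` and `keep_primary_sources` are hypothetical, and a production system would likely need fuzzier matching than this.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Article:
    # Hypothetical minimal record; a real article would carry more fields.
    url: str
    title: str
    published: datetime


def title_key(title: str) -> str:
    """Normalize a headline: lowercase, drop punctuation, collapse whitespace."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(words)


def keep_primary_sources(articles: Iterable[Article]) -> List[Article]:
    """For each group of same-headline articles, keep the earliest one.

    Assumption: the earliest publication time marks the primary source.
    """
    earliest: Dict[str, Article] = {}
    for article in articles:
        key = title_key(article.title)
        current = earliest.get(key)
        if current is None or article.published < current.published:
            earliest[key] = article
    # Return the survivors in chronological order.
    return sorted(earliest.values(), key=lambda a: a.published)
```

Grouping by a normalized key keeps the pass linear in the number of articles, which matters when the source base is large and refreshed often.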

Implementation and challenges resolved

The IT Svit team started by building a high-performance web scraper using the aiohttp library. The gathered data was stored in MongoDB. We created several machine learning models using Theano and selected the one that performed best. The selected model was used to filter out irrelevant or unwanted content.
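
The scraping step can be illustrated with a minimal sketch of bounded-concurrency fetching. It uses only the standard library's asyncio so that it runs without network access; the `fetch` callable is a stand-in for a real HTTP request (made with aiohttp in the project), and the names `crawl`, `fake_fetch` and `demo` as well as the concurrency limit are hypothetical, not taken from the actual code base.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable

# Hypothetical type: any coroutine function that takes a URL and returns page text.
Fetch = Callable[[str], Awaitable[str]]


async def crawl(urls: Iterable[str], fetch: Fetch, max_concurrency: int = 10) -> Dict[str, str]:
    """Fetch all URLs concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    pages: Dict[str, str] = {}

    async def worker(url: str) -> None:
        async with semaphore:
            try:
                pages[url] = await fetch(url)
            except Exception:
                # A failing source is skipped so it cannot abort the whole crawl.
                pass

    await asyncio.gather(*(worker(url) for url in urls))
    return pages


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request; returns canned text after a short delay."""
    await asyncio.sleep(0.01)
    if "broken" in url:
        raise ConnectionError(url)
    return f"<html>content of {url}</html>"


def demo() -> Dict[str, str]:
    urls = [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/broken",
    ]
    return asyncio.run(crawl(urls, fake_fetch, max_concurrency=2))
```

The semaphore caps how many requests are open at once, so a large source list does not overwhelm either the scraper or the sites it visits, while one unreachable source does not stop the rest.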

The platform front end was built with jQuery. The back-end instances were containerized and automatically scaled using Docker. We used SphinxSearch to let customers create their personalized news feeds.

As the product was intended to be used as a plugin on any website, we had to make it both simple and highly configurable.

We created a huge base of news sources worldwide, split by industries, locations and spheres of interest
This base was updated frequently and quickly by a high-performance tool that analyzed article content to detect whether it had changed
The platform was able to create a personalized news feed for any customer, based on their preferences and settings
The aggregator could be easily embedded into any website as a plugin with simple configuration
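
The update check mentioned in the list above can be sketched as content fingerprinting. This is an assumption about one common way to detect changes, not a description of the tool that was actually built; `content_fingerprint` and `UpdateTracker` are hypothetical names, and the in-memory dictionary stands in for whatever persistent store a real system would use.

```python
import hashlib
import re
from typing import Dict


def content_fingerprint(text: str) -> str:
    """Hash the article text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class UpdateTracker:
    """Remembers the last fingerprint seen for each URL (in memory only)."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def has_changed(self, url: str, text: str) -> bool:
        """Return True if the URL is new or its content differs from last time."""
        fingerprint = content_fingerprint(text)
        changed = self._seen.get(url) != fingerprint
        self._seen[url] = fingerprint
        return changed
```

Normalizing before hashing means that cosmetic differences such as extra whitespace or letter case do not count as an update, so only real content changes trigger reprocessing.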