Showcase: NewsHub

NewsHub is a news aggregation service that provides users with the latest digests in PDF from more than 450 major sources all over the world.

NewsHub uses the crawling and scraping platform of TagsReaper.com, which processes more than 60,000 pages daily and generates PDF digests, split by language and subject, that are ready to download.

NewsHub now not only delivers the freshest PDF digests to subscribers but also provides a large, up-to-date database of news articles that external apps can search, select from, and analyze over an API.

Showcase: Tags Reaper

TagsReaper is a user-friendly data mining service that lets you crawl and extract web content of different kinds (dynamic or static) from multiple IPs and locations, scrape web pages, validate the data, save it to an archive (JSON, XML, CSV), and transmit it to an external application or database.

Users can monitor running spiders, schedule new jobs, and stop running ones, all from a single page.

It only takes a few minutes to start using our powerful, custom-built system to collect data from the web.
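
As a rough illustration of the validate-and-archive step described above, the sketch below checks a few scraped records and serializes the valid ones to JSON and CSV. It is not TagsReaper's actual code or API: the record fields, the `is_valid` rule, and the function names are all assumptions made up for this example, and only the Python standard library is used.

```python
import csv
import io
import json

# Hypothetical scraped records; the field names are assumptions for illustration.
records = [
    {"url": "https://example.com/a", "headline": "First item", "price": "19.99"},
    {"url": "https://example.com/b", "headline": "", "price": "5.00"},
    {"url": "https://example.com/c", "headline": "Third item", "price": "7.50"},
]

def is_valid(record):
    """Assumed validation rule: every field must be a non-empty string."""
    return all(isinstance(v, str) and v.strip() for v in record.values())

def to_json(rows):
    """Serialize records to a JSON array."""
    return json.dumps(rows, indent=2, ensure_ascii=False)

def to_csv(rows):
    """Serialize records to CSV with a header row taken from the first record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

valid_records = [r for r in records if is_valid(r)]
json_archive = to_json(valid_records)
csv_archive = to_csv(valid_records)
print(json_archive)
print(csv_archive)
```

The second record is dropped because its headline is empty; the remaining two are written to both formats.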

Showcase: Chaika

The Chaika project is a hierarchical cluster engine (HCE), in development since 2006, used to build hierarchical networking and parallel data processing systems. Chaika is implemented as a binary executable written in pure C++ using modern patterns and libraries such as the POCO C++ Framework, the C++11 STL, and ZMQ, and is distributed under the GPL2 license.

Delivering useful content

We're focused on data mining technologies, personalization, artificial intelligence algorithms, self-learning systems, clustering and distributed data processing.

News & Updates

Latest projects overview

Showcase: TagsReaper

Posted by admin on Feb 29, 2016

Data extraction has never been easier! Our new revolutionary web service provides 3 powerful ways to scrape data from potentially ANY website.

Need data for market research, your e-commerce business, analysis, marketing strategy, or content for your website? You can spend weeks collecting data manually. Or you can use a scraping tool: an automated web service that extracts data into a structured format.

But which one is the best? Firstly, consider simplicity of setup. Starting a project should take just three steps: add a URL, make a template, and specify the results format. No coding, no skills, no time-consuming operations. Tags Reaper is JUST AS SIMPLE.

A website can be scraped in 2 general ways: automated, which works well in most cases, and template-based, which is used to extract websites with a complicated structure.

Template creation is usually tricky: it requires selecting parts of a web page and telling the scraping tool which one stands for the Headline, Body, Picture, or Price. Tags Reaper makes it simple! With the built-in Visual Tag Picker you can add a new template in just a few clicks by selecting the elements of content you want to extract. Click on it, name it, and you're done.

For even better extraction, Tags Reaper has an option to add Multiple Rules and Multiple Templates for the same portion of a page to detect which one works better, or even to put the results of several templates together. Templates are based on XPath, which is much more robust against page modifications than the style-based or DOM-based methods used by other services.

But how many pages can I extract with Tags Reaper daily? Using truly distributed and hierarchical cluster structure...
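
To make the idea of XPath-based templates with multiple rules more concrete, here is a minimal sketch. It is not Tags Reaper's implementation or template format: the `TEMPLATE` structure, the sample page, and the `extract` function are assumptions invented for this example. It uses the limited XPath subset in Python's standard `xml.etree.ElementTree`, so it assumes well-formed markup; real-world HTML would normally need a tolerant HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: each field maps to a list of XPath rules.
# Rules are tried in order and the first one that matches wins.
TEMPLATE = {
    "headline": [".//h1[@class='title']", ".//h1"],
    "body": [".//div[@class='article-body']/p"],
    "price": [".//span[@class='cost']", ".//span[@class='price']"],
}

# Sample well-formed page, made up for this example.
PAGE = """<html><body>
  <div id="wrapper">
    <h1 class="title">Sample headline</h1>
    <div class="article-body"><p>First paragraph.</p><p>Second paragraph.</p></div>
    <span class="price">19.99</span>
  </div>
</body></html>"""

def extract(page_source, template):
    """Apply each field's XPath rules in order; keep the first non-empty result."""
    root = ET.fromstring(page_source)
    record = {}
    for field, rules in template.items():
        record[field] = ""
        for xpath in rules:
            nodes = root.findall(xpath)
            texts = ["".join(node.itertext()).strip() for node in nodes]
            value = " ".join(t for t in texts if t)
            if value:
                record[field] = value
                break
    return record

record = extract(PAGE, TEMPLATE)
print(record)
```

In this sample the first `price` rule matches nothing, so the second rule supplies the value. That fallback is one simple way to picture how several rules for the same portion of a page can be compared.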

NewsHub

Posted by admin on Feb 29, 2016

NewsHub is a news aggregation service that provides users with the latest digests in PDF from more than 450 major sources all over the world. NewsHub uses the crawling and scraping platform of TagsReaper.com, which processes more than 60,000 pages daily and generates PDF digests, split by language and subject, that are ready to download. NewsHub now not only delivers the freshest PDF digests to subscribers but also provides a large, up-to-date database of news articles for search, selection, and analysis over an API by external...

SNATZ: trends in focus

Posted by admin on Feb 26, 2014

SNATZ is a personalized news discovery experience, delivering the latest in tech-related news. We help you discover the articles that you want to read from your PC, tablet, or smartphone, without the intensive searching.

Start by reading the latest technology news on our Trending page. To experience the full power of SNATZ, register and connect your Facebook and/or Twitter accounts. If you're migrating from a standard newsreader, you can also import your subscriptions by uploading an OPML or XML file, or an exported ZIP file.

Why We Built SNATZ

As technology news consumers ourselves, we were unsatisfied with the existing options. On the one hand, traditional RSS readers were overwhelming, leading to information overload; on the other, many personalized news services felt too narrow, focusing too much on one type of news. What if we could do one better and create a service that delivered tech news covering a broader, yet still interesting, range of topics? The cherry on top: tailoring it to the individual reader.

The problem of information overload is, like any problem, just one side of a bright new opportunity. As the quantity of news and information grows by the hour, we need new means and approaches to organize and make sense of it all. It becomes increasingly hard to find what matters to us. Many tools use content filters to try to solve this problem, which media theorist Clay Shirky once described as "not information overload, but filter failure."

SNATZ uses original Snatch technology to address the most common filtering problems: the Long Tail and the Filter Bubble. Read more about the new approach to content personalization...
