The challenge

Our client TES undertook a full digital transformation with the help of the GuideSmiths team. In the past they hadn’t made the most of their data, so they decided to invest in building an ETL / Big Data platform to better understand their audience and to keep track of business KPIs using dashboards.

The solution

GuideSmiths created an ETL (extract, transform and load) platform to collect data from multiple sources. We normalised and transformed the data and stored it in a Data Warehouse and other databases, making it available to applications, APIs and data scientists. To achieve this, we created more than twenty microservices, a Hadoop-based transformation layer and an integration layer to move data around.
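
To make the normalisation step more concrete, here is a minimal TypeScript sketch of mapping records from two differently shaped sources into one common shape before loading. It is an illustration only, not the platform's actual code: the event shapes, field names and function names are all hypothetical.

```typescript
// Hypothetical raw shapes from two sources; the field names are illustrative only.
interface WebEvent { userId: string; action: string; ts: number }          // ts: epoch milliseconds
interface AppEvent { user_id: string; event_type: string; time: string }   // time: ISO 8601 string

// Common shape to load into the warehouse (an assumption made for this sketch).
interface NormalisedEvent {
  userId: string;
  action: string;
  occurredAt: string; // always ISO 8601 in UTC
  source: "web" | "app";
}

// Map a web event onto the common shape.
function fromWeb(e: WebEvent): NormalisedEvent {
  return {
    userId: e.userId,
    action: e.action.toLowerCase(),
    occurredAt: new Date(e.ts).toISOString(),
    source: "web",
  };
}

// Map an app event onto the common shape.
function fromApp(e: AppEvent): NormalisedEvent {
  return {
    userId: e.user_id,
    action: e.event_type.toLowerCase(),
    occurredAt: new Date(e.time).toISOString(),
    source: "app",
  };
}

// Merge both sources into one list, ordered by time of occurrence.
function transform(web: WebEvent[], app: AppEvent[]): NormalisedEvent[] {
  return [...web.map(fromWeb), ...app.map(fromApp)].sort((a, b) =>
    a.occurredAt < b.occurredAt ? -1 : a.occurredAt > b.occurredAt ? 1 : 0
  );
}
```

Keeping one small mapping function per source means a new source only needs its own mapper, while everything downstream works with a single record shape.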


This is just the tip of the iceberg: we also use the Hadoop ecosystem to aggregate and process data, producing datasets far richer than the original subsets. This is where the line is drawn between data, analytics and insight. When you have insight, you have power, and only then can you make thoughtful decisions. If you don’t have insights extracted from your data, all you have is an opinion.

All data and processes are monitored using Datadog dashboards, and alerts are raised when something abnormal happens. More than 20 GB of data is ingested every day from user interactions alone.

The tech

  • We implemented our microservices in Node.js, hosted on AWS and delivered through a CI/CD pipeline in Jenkins.
  • For the Data Warehouse we used Amazon Redshift. AWS was also our first choice for the Hadoop ecosystem: EMR let us start Hive / Spark clusters for overnight batch processing. We have written separately about how we migrated from Hive to Spark.
  • Some of the databases we used to digest and store data are PostgreSQL, Redis, MySQL and MongoDB.
  • Finally, for our integration layer we used RabbitMQ and Amazon Kinesis.

