Digital Marketing in India has grown phenomenally over the last 5 years as a direct result of growth in eCommerce. It is not just the big eComm players who have been at the forefront; small and medium businesses (the long tail) have been leveraging Digital Marketing aggressively as well. What this means for a company such as Sokrati is that, in a short span of a few months, the number of clients has grown from a few hundred to a few thousand. Along with the client growth, data such as merchandise, web events, campaigns and CRMs has grown exponentially as well.
While this is good news for the business, it has created many interesting challenges for the Data team. One such challenge is having to define standards around the data. Ultimately, it’s not just the architecture that needs to scale; the nomenclature and organization of the data must also be in place for the right value to be extracted. In this post, we will go over some of the interesting experiences from our Data Standardization journey at Sokrati.
Let us begin by understanding the problem in more detail, focusing on the data from web events.
Problem: How do you standardize information from web clicks collected across different websites? For a common set of events, each website uses its own definitions, locations and forms. How do you derive useful information from the chaos?
If you look at a typical eComm funnel, there are multiple steps along the way – Search, View Product, Add to Cart, Checkout. It is important to capture details around these events, not only to be able to correctly attribute the conversions to the ads, but also to be able to re-market effectively and support many more such use cases.
Step 1: Define Gold Standard
It is quite obvious that the initial step has to be about defining the standards. Without a gold standard, the result would be chaos. For example, there could be several different ways to name a product category: Category, Product_category, Cats, and so on. In such a world, it would be difficult even to run a basic query, let alone run any analytics on top. Imagine trying to get the top 5 best-selling categories for your clients!
One of the first things that happened at Sokrati was the funnel parameter standardization. We defined a set of events that were interesting from a marketer’s perspective (search, sort, add to cart, checkout, etc.) and also defined a set of attributes, with a naming convention, for each event. In all, the gold standard was defined for 50+ attributes. While it might all look straightforward from a technical perspective, getting consensus from various stakeholders is not that easy. Things always get interesting when you put businessmen and engineers in the same room. Luckily for us, good sense prevailed, and the different views converged quickly.
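To make the naming problem concrete, here is a minimal Python sketch of mapping attribute variants onto one standard name. The standard name `product_category`, the alias table and the sample events are all assumptions made up for illustration; they are not Sokrati's actual gold standard.

```python
from collections import Counter

# Hypothetical alias table: the key is an assumed gold-standard name and the
# set holds the variants seen in raw client data (compared in lowercase).
GOLD_STANDARD_ALIASES = {
    "product_category": {"category", "product_category", "cats"},
}

# Invert the table once so that each lookup is a single dict access.
ALIAS_TO_STANDARD = {
    alias: standard
    for standard, aliases in GOLD_STANDARD_ALIASES.items()
    for alias in aliases
}


def standardize_keys(event):
    """Return a copy of `event` with known attribute aliases renamed.

    Keys are matched case-insensitively; unknown keys are kept unchanged.
    """
    return {
        ALIAS_TO_STANDARD.get(key.strip().lower(), key): value
        for key, value in event.items()
    }


# Made-up raw events from three websites that name the same attribute differently.
raw_events = [
    {"Category": "Footwear", "price": 499},
    {"Product_category": "Footwear", "price": 799},
    {"Cats": "Apparel", "price": 299},
]

standardized = [standardize_keys(event) for event in raw_events]

# With one attribute name, "top 5 categories" becomes a one-line query.
top_categories = Counter(
    event["product_category"] for event in standardized
).most_common(5)
```

Once every event carries the same attribute name, the category count at the end needs no per-client special cases, which is the whole point of the gold standard.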
Step 2: Roll Out Changes
There is a lot of truth in the saying that success is about 10% inspiration and 90% perspiration. While getting consensus was difficult enough, rolling out the changes turned out to be even more so.
Firstly, we had to understand the impact of the changes on our existing reporting and analytics apps. We had to create some customizations for the applications that could not migrate overnight for one reason or another. For example, we had an intelligent bidding system that was live and could not be touched, so we had to maintain backward compatibility for some fields in the data. Welcome to the real world!
Secondly, any data changes need to be planned with extreme caution. The roll-out has to be automated and tested thoroughly, and any potential client impact needs to be measured and communicated. The data team had to come up with a plan that was detailed enough and, at the same time, flexible enough. Frankly, it comes down to your ability to persist with your intent and then execute. It took us almost 8 weeks to roll out the changes to a majority of our clients.
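One common way to keep such a consumer working is to emit the old field names alongside the new ones until it can migrate. The sketch below shows the idea only; the field names and the mapping are hypothetical, and this is not a description of how our bidding system was actually handled.

```python
# Hypothetical mapping from assumed gold-standard field names to the legacy
# names that a not-yet-migrated consumer still reads.
LEGACY_FIELD_NAMES = {
    "product_category": "Category",
    "order_value": "amt",
}


def with_legacy_fields(record):
    """Return a copy of `record` that also carries the legacy field names."""
    compatible = dict(record)  # copy, so the caller's record is not mutated
    for standard_name, legacy_name in LEGACY_FIELD_NAMES.items():
        if standard_name in record:
            # setdefault leaves an already-present legacy field untouched.
            compatible.setdefault(legacy_name, record[standard_name])
    return compatible


# Made-up standardized record.
record = {"event": "checkout", "product_category": "Footwear", "order_value": 499}
compatible_record = with_legacy_fields(record)
```

The duplication is deliberate and temporary: new applications read only the standard names, and the legacy mapping can be deleted once the last old consumer has migrated.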
Step 3: Post-Processing
Logically, once you define the standard and roll it out, your job is done. In reality, things are not quite that simple. Multiple things could go wrong: the product information you expected on a checkout may not be in the standardized format (someone calls it Footwear and someone else calls it Chappals), or the required currency may be missing completely on Add To Carts.
There could also be unannounced changes to the upstream systems. For example, a client might introduce a new checkout step in between, throwing your calculations off completely. Or the client might start listing products in multiple currencies. You have to be able to monitor these changes, and then process your data accordingly. We had to document at least the use cases with a major impact and develop post-processing logic for them. This part of the work is still ongoing.
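A post-processing pass for these two cases could look roughly like the Python sketch below. The synonym table, the event names and the required-field rules are assumptions for illustration; real rules would come from the documented use cases.

```python
# Hypothetical synonym table (lowercase raw value -> canonical category).
CATEGORY_SYNONYMS = {
    "footwear": "Footwear",
    "chappals": "Footwear",
}

# Hypothetical rules: which attributes each event type must carry.
REQUIRED_FIELDS = {
    "add_to_cart": ["currency"],
    "checkout": ["currency", "product_category"],
}


def post_process(event):
    """Return (cleaned_event, issues) for one standardized event.

    Category values are mapped to a canonical spelling, and any required
    attribute that is missing or empty is reported in `issues`.
    """
    cleaned = dict(event)
    issues = []

    category = cleaned.get("product_category")
    if isinstance(category, str):
        canonical = CATEGORY_SYNONYMS.get(category.strip().lower())
        if canonical is None:
            issues.append(f"unknown category: {category}")
        else:
            cleaned["product_category"] = canonical

    for field in REQUIRED_FIELDS.get(cleaned.get("event"), []):
        if not cleaned.get(field):
            issues.append(f"missing {field}")

    return cleaned, issues


# Made-up event: a non-standard category value and no currency.
cleaned_event, found_issues = post_process(
    {"event": "add_to_cart", "product_category": "Chappals"}
)
```

Returning the issues instead of raising keeps the pipeline running while still leaving a trail that can be measured and reported per client.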
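Monitoring for such upstream changes can start very simply: compare what arrives against what you expect. The following is a minimal sketch under assumed event names and an assumed `currency` attribute, not our production monitoring.

```python
# Hypothetical set of funnel steps expected for a client.
EXPECTED_STEPS = frozenset({"search", "view_product", "add_to_cart", "checkout"})


def detect_upstream_changes(events, expected_steps=EXPECTED_STEPS):
    """Report funnel steps we did not expect and mixed currencies.

    `events` is an iterable of dicts for a single client. The result lists
    unexpected event names, and the currencies seen if there is more than one.
    """
    events = list(events)
    seen_steps = {e["event"] for e in events if e.get("event")}
    currencies = {e["currency"] for e in events if e.get("currency")}
    return {
        "new_steps": sorted(seen_steps - expected_steps),
        "multiple_currencies": sorted(currencies) if len(currencies) > 1 else [],
    }


# Made-up batch: the client added a step and started using a second currency.
batch = [
    {"event": "add_to_cart", "currency": "INR"},
    {"event": "address_confirmation", "currency": "INR"},
    {"event": "checkout", "currency": "USD"},
]
report = detect_upstream_changes(batch)
```

A non-empty report does not fix anything by itself, but it tells you which clients need new post-processing rules before the numbers drift.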
So, where are we today and how did we fare? The data standardization journey started about a year back, and the results are evident today. Most of the data is now more or less standardized. It’s not just that 80% of the 25+ million daily web events are standardized; the majority of campaign, merchandise, vendor performance and social data, which is well over 2 TB, is now being collected in a standard format as well. This makes the queries and services built on top of it uniform and much more extensible (more about these in future posts).
A picture is worth a thousand words. The dashboard, which would have taken many weeks to build before this exercise, was built in literally a couple of hours after standardization.