Our telemetry service is built on the Sunbird telemetry SDK. mSeva's frontend React app ships the telemetry JS file along with every response it sends. Whenever a user interacts with an mSeva page in any way, for example by entering a value in a text box, or when the window loads, an event is triggered and recorded.
The telemetry payload contains an events array. DIGIT uses only three event types: START, SUMMARY and END. START signals that event collection has begun for a particular page. SUMMARY carries all the data that needs to be collected for that page, for example the time the user spent on it and the times at which the user entered the page or switched to another tab. END signals that event collection for the page has finished. Events keep accumulating until either the URL changes or an END event occurs, at which point they are bundled and sent.
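The collection rule above can be pictured as a small client-side buffer. This is a minimal Python sketch under stated assumptions, not the Sunbird SDK's implementation (which is JavaScript): `TelemetryBuffer`, `record`, `flush`, the `send` callback and the page paths are hypothetical names, and each event carries only a simplified subset of fields. It shows just the two flush triggers: a URL change and an END event.

```python
import time
import uuid
from typing import Any, Callable, Dict, List, Optional

# Simplified, hypothetical event shape: real Sunbird telemetry events carry
# more fields (actor, context, object, ...). Only the three DIGIT types are allowed.
ALLOWED_EIDS = {"START", "SUMMARY", "END"}


class TelemetryBuffer:
    """Collects events for the current page and sends them as one bundle."""

    def __init__(self, send: Callable[[Dict[str, Any]], None]) -> None:
        self._send = send  # hypothetical transport, e.g. a POST to the telemetry API
        self._events: List[Dict[str, Any]] = []
        self._url: Optional[str] = None

    def record(self, eid: str, url: str, edata: Dict[str, Any]) -> None:
        if eid not in ALLOWED_EIDS:
            raise ValueError(f"unsupported event type: {eid}")
        # Trigger 1: a URL change flushes the events collected for the previous page.
        if self._url is not None and url != self._url:
            self.flush()
        self._url = url
        self._events.append({
            "eid": eid,
            "ets": int(time.time() * 1000),  # event timestamp in milliseconds
            "mid": str(uuid.uuid4()),        # unique message id
            "edata": edata,                  # event-specific data, e.g. time spent
        })
        # Trigger 2: an END event closes the current bundle.
        if eid == "END":
            self.flush()

    def flush(self) -> None:
        if self._events:
            self._send({"events": self._events})
            self._events = []


# Usage: collect the sent bundles in a list instead of calling a real endpoint.
sent: List[Dict[str, Any]] = []
buf = TelemetryBuffer(sent.append)
buf.record("START", "/complaints", {"type": "view"})
buf.record("SUMMARY", "/complaints", {"timeSpent": 42})
buf.record("START", "/property-tax", {"type": "view"})  # URL changed: first bundle sent
buf.record("END", "/property-tax", {})                   # END: second bundle sent
print([len(b["events"]) for b in sent])  # [2, 2]
```

Bundling per page keeps the number of network calls low while still separating the data of different pages.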
The event data is then pushed onto a Kafka topic and enters a processing pipeline made up of topic-to-topic transfers: each stage picks messages from one topic, processes them and pushes the result to the next. First, the format of the events payload is validated; the payload is then pushed to another topic for de-duplication. The messages are unbundled and enriched in the same way. These topic-to-topic transfers are implemented with Kafka Streams (KStreams), which is essentially a consumer and a producer coupled together.
The processed data ends up in two sinks: an Amazon S3 bucket and an Elasticsearch (ES) index. To push data to S3, we use Secor, a service developed by Pinterest that picks up JSON data from Kafka and persists it to the configured S3 buckets. Secor does not create a new JSON file for every incoming message; it writes a file only when one of two triggers fires: the buffered data reaches a size threshold, or a time threshold elapses. To push data to ES, Kafka Connect is used. Instead of making one API call per message, the messages are again combined and written to the ES index via bulk insert.
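Each pipeline stage follows the same consume-transform-produce pattern. The Python sketch below imitates that pattern with in-memory queues standing in for Kafka topics; it does not use the Kafka Streams API. The stage functions, the SHA-256 de-duplication key, the `tenantId` field and the city lookup used for enrichment are all hypothetical, chosen only to make each step concrete.

```python
import hashlib
import json
from collections import deque
from typing import Any, Callable, Deque, Dict, Iterator

Message = Dict[str, Any]
Topic = Deque[str]  # in-memory stand-in for a Kafka topic of JSON strings


def run_stage(src: Topic, dst: Topic, fn: Callable[[Message], Iterator[Message]]) -> None:
    """Consume every message from src, transform it, and produce the results to dst."""
    while src:
        msg = json.loads(src.popleft())
        for out in fn(msg):
            dst.append(json.dumps(out, sort_keys=True))


def validate(bundle: Message) -> Iterator[Message]:
    # Simplified format check: the payload must carry a non-empty "events" list.
    if isinstance(bundle.get("events"), list) and bundle["events"]:
        yield bundle


def make_dedup() -> Callable[[Message], Iterator[Message]]:
    seen = set()  # a real service would need a persistent store with expiry

    def dedup(bundle: Message) -> Iterator[Message]:
        key = hashlib.sha256(json.dumps(bundle, sort_keys=True).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            yield bundle

    return dedup


def unbundle(bundle: Message) -> Iterator[Message]:
    # One bundle in, one message per event out.
    yield from bundle["events"]


CITY_BY_TENANT = {"pb.amritsar": "Amritsar"}  # hypothetical enrichment lookup


def enrich(event: Message) -> Iterator[Message]:
    yield {**event, "city": CITY_BY_TENANT.get(event.get("tenantId", ""), "unknown")}


raw, valid, unique, single, enriched = (deque() for _ in range(5))
bundle = {"events": [{"eid": "START", "tenantId": "pb.amritsar"},
                     {"eid": "END", "tenantId": "pb.amritsar"}]}
raw.extend([json.dumps(bundle), json.dumps(bundle), json.dumps({"events": []})])
run_stage(raw, valid, validate)          # drops the empty payload
run_stage(valid, unique, make_dedup())   # drops the duplicate bundle
run_stage(unique, single, unbundle)      # one message per event
run_stage(single, enriched, enrich)      # adds the looked-up city
print(len(enriched))  # 2
```

Because every stage only knows its input and output topic, the stages stay decoupled from one another.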
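Secor's rule of writing a file only when a size or an age threshold is reached can be illustrated with a small buffer. This is a hypothetical Python sketch, not Secor's code: `ThresholdUploader` and its parameters are illustrative names, and the `upload` callback stands in for writing an object to S3. Secor itself exposes comparable limits through properties such as `secor.max.file.size.bytes` and `secor.max.file.age.seconds`. An injectable clock keeps the example deterministic.

```python
import json
import time
from typing import Callable, List, Optional


class ThresholdUploader:
    """Buffers JSON lines and uploads them as one file once a size or age limit is hit."""

    def __init__(self, upload: Callable[[str], None], max_bytes: int,
                 max_age_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._upload = upload        # stand-in for writing an object to S3
        self._max_bytes = max_bytes
        self._max_age_s = max_age_s
        self._clock = clock
        self._lines: List[str] = []
        self._size = 0
        self._opened_at: Optional[float] = None  # when the current file was started

    def add(self, json_line: str) -> None:
        if self._opened_at is None:
            self._opened_at = self._clock()
        self._lines.append(json_line)
        self._size += len(json_line.encode("utf-8")) + 1  # +1 for the newline
        self.maybe_upload()

    def maybe_upload(self) -> None:
        """Also call this periodically so the age trigger fires without new data."""
        if self._opened_at is None:
            return
        too_big = self._size >= self._max_bytes
        too_old = self._clock() - self._opened_at >= self._max_age_s
        if too_big or too_old:
            self._upload("\n".join(self._lines) + "\n")
            self._lines, self._size, self._opened_at = [], 0, None


now = [0.0]  # fake clock value, in seconds
files: List[str] = []
up = ThresholdUploader(files.append, max_bytes=64, max_age_s=60, clock=lambda: now[0])
up.add('{"eid": "START"}')
up.add('{"eid": "END"}')                 # 32 bytes so far: no upload yet
now[0] = 61.0
up.maybe_upload()                        # age trigger: first file uploaded
up.add(json.dumps({"pad": "x" * 70}))    # size trigger: second file uploaded at once
print(len(files))  # 2
```

The two triggers balance each other: the size limit avoids huge files under heavy load, and the age limit keeps data from sitting unwritten when traffic is light.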
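The bulk insert on the ES side can be sketched by grouping documents into batches and rendering each batch in the newline-delimited format of Elasticsearch's `_bulk` API, where every document line is preceded by an action line. This Python sketch only builds the request bodies and does not call Kafka Connect or Elasticsearch; the `telemetry` index name, the batch size and the use of `mid` as the document `_id` are assumptions.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]


def batches(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Group documents into lists of at most `size` items."""
    batch: List[Doc] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch
        yield batch


def to_bulk_body(docs: List[Doc], index: str) -> str:
    """Render one batch as an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        action: Dict[str, Any] = {"_index": index}
        if "mid" in doc:
            action["_id"] = doc["mid"]  # stable id: a retried message overwrites, not duplicates
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


events = [{"mid": f"m{i}", "eid": "SUMMARY"} for i in range(5)]
bodies = [to_bulk_body(b, "telemetry") for b in batches(events, size=2)]
print(len(bodies))  # 3
```

Each body would be sent as a single POST to `/_bulk` with `Content-Type: application/x-ndjson`, so five messages cost three requests here instead of five.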