Isn't this a flawed approach? It seems like Khan Academy is trying to reconstruct a record of user behaviours across their business by stitching together:

1. Parsing web logs for web page views and API accesses

2. Exporting "some client-side events" from Mixpanel

3. Mining their transactional databases for state changes

On #1 - web caching and client-side events have long invalidated web-log-based analytics approaches: cached pages never hit the server, and client-side interactions never appear in its logs. How is Khan different?

On #3 - this is reverse engineering your user behaviours by mining state changes in your transactional systems. This is typically a ton of work, it breaks when you change your data models, and your operational systems aren't designed to reveal user behaviours anyway.

Has Khan explored alternative approaches? Typically: defining, with the analyst team, a set of events you want to monitor; making sure all of your systems (client-side, mobile, server-side, whatever) emit immutable streams of these events; and then collecting, storing, enriching and analyzing them at your leisure.
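
To make that concrete, here is a minimal sketch of the event-stream idea in Python. Everything in it is hypothetical and for illustration only: the `Event` fields, the `ALLOWED_EVENTS` vocabulary, the `emit` helper and the in-memory `stream` list (a stand-in for a log file or message queue) are my assumptions, not anything Khan Academy actually runs.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Event:
    name: str      # taken from the vocabulary agreed with the analysts
    source: str    # emitting system, e.g. "web", "mobile", "server"
    user_id: str
    properties: dict = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# Hypothetical event vocabulary, defined up front with the analyst team.
ALLOWED_EVENTS = {"video_played", "exercise_completed"}


def emit(event, stream):
    """Append one event to an append-only stream as a JSON line."""
    if event.name not in ALLOWED_EVENTS:
        raise ValueError(f"unknown event: {event.name}")
    stream.append(json.dumps(asdict(event), sort_keys=True))


# Every system emits the same event shape into the same kind of stream.
stream = []  # stand-in for a log file or message queue
emit(Event("video_played", "web", "u1", {"video": "intro"}), stream)
emit(Event("exercise_completed", "mobile", "u1", {"score": 4}), stream)

# Analysis happens later, by reading the stream back.
events = [json.loads(line) for line in stream]
per_source = {}
for e in events:
    per_source[e["source"]] = per_source.get(e["source"], 0) + 1
print(per_source)
```

The point of the design is that the analysis code only depends on the agreed event shape, so changing a transactional data model doesn't break it, and nothing has to be inferred from state changes after the fact.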