Complexity of Inputs
Today's modern data stack cannot handle the complex input data that AI systems require.
In fact, current practice in modern data stacks is geared toward human-readable metrics: most outputs are column aggregates that typically require only one to three input columns.
AI, however, has specific needs that the current data stack does not meet. Each output in an AI system typically requires from ten to several hundred input columns, and the inputs themselves demand computationally complex data transformations. To make things even more challenging, feature lists need to be updated regularly as the world changes.
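To make the contrast concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names are illustrative assumptions, not a real schema: a metrics-style query touches one column and returns one aggregate, while even a tiny feature query fans out into multiple derived columns per entity, and a real model would join dozens or hundreds of these.

```python
import sqlite3

# Illustrative events table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, ts TEXT, amount REAL);
INSERT INTO events VALUES
  (1, '2024-01-01', 10.0),
  (1, '2024-01-05', 20.0),
  (1, '2024-01-20', 5.0),
  (2, '2024-01-02', 7.5);
""")

# A metrics-style query: one input column, one human-readable aggregate.
metric = conn.execute("SELECT SUM(amount) FROM events").fetchone()[0]
print(metric)  # 42.5

# A feature-style query: several derived columns per entity, ready to be
# joined with many similar columns into a model's training set.
features = conn.execute("""
SELECT user_id,
       COUNT(*)    AS n_events,
       AVG(amount) AS avg_amount,
       MAX(ts)     AS last_seen
FROM events
GROUP BY user_id
ORDER BY user_id
""").fetchall()
print(features)
```

Even this toy example already produces three feature columns per user; a production model multiplies that by tens or hundreds of transformations, each a potential point of failure.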
The consequences of these limitations are significant. AI pipelines have far more potential points of failure, and far greater volumes of data need to be moved around. Worse, this complexity is often built on inefficient, unreadable SQL spaghetti code that is almost impossible to maintain and update.
It’s time we started building AI-ready data stacks. To address these challenges, we need to minimize data movement and do feature engineering inside the database. We also need automatically generated and optimized SQL, as well as a human-readable source of truth with version control. By doing so, we can create a modern data stack that can handle the complex input data required to unlock the full potential of AI.
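One way such a stack could work is to compile a declarative, human-readable feature spec into SQL that runs inside the database. The sketch below is a hypothetical illustration of the idea, not a real tool: the spec format and the generate_sql helper are assumptions, but they show how a version-controlled source of truth can replace hand-written spaghetti SQL.

```python
# Hypothetical feature spec: plain data that could live in version
# control and serve as the human-readable source of truth.
FEATURE_SPEC = {
    "entity": "user_id",
    "table": "events",
    "features": {
        "n_events": "COUNT(*)",
        "avg_amount": "AVG(amount)",
        "max_amount": "MAX(amount)",
    },
}

def generate_sql(spec):
    """Compile the spec into a single GROUP BY query (illustrative)."""
    cols = ",\n       ".join(
        f"{expr} AS {name}" for name, expr in spec["features"].items()
    )
    return (
        f"SELECT {spec['entity']},\n       {cols}\n"
        f"FROM {spec['table']}\n"
        f"GROUP BY {spec['entity']}"
    )

print(generate_sql(FEATURE_SPEC))
```

Because the SQL is generated rather than hand-written, adding or retiring a feature is a one-line change to the spec, and the data never leaves the database.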