Does Data-Centric AI Apply to Tabular Data Too?


February 21, 2023

The key to unlocking value in AI lies in a data-centric approach, according to Andrew Ng. Data-centric AI, in his opinion, is based on the following principles:

It’s time to focus on the data: after all the progress achieved in algorithms, the larger gains now come from spending more time on the data.
Inconsistent data labels are common since reasonable, well-trained people can see things differently.
Data that has errors and is messy is often fixed by ad hoc data engineering that relies on luck or individual data scientists’ skills.
Making data engineering more systematic through principles and tools will be key to making AI algorithms work.
Smaller amounts of high-quality data might be sufficient for industries without access to tons of data.

The examples Ng uses to explain data-centric AI have one thing in common: they come from his experience developing deep learning applications on unstructured data such as images. Although tabular data less often needs to be labeled, his other points still apply, because tabular data frequently contains errors, is messy, and is limited in volume. Feature engineering for tabular data also demands considerable manual effort, which makes tabular data preparation even more dependent on luck or the data scientist’s skill set.
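To make "systematic rather than ad hoc" a little more concrete, here is a minimal sketch of a repeatable quality check for a small table. It is only an illustration, not a method from Ng or from any particular tool: the sample rows, the column names, the `VALID_RANGES` rule, and the `quality_report` function are all hypothetical, and it uses only the Python standard library.

```python
from collections import Counter

# Hypothetical toy table; column names and values are illustrative only.
ROWS = [
    {"customer_id": 1, "age": 34, "country": "US"},
    {"customer_id": 2, "age": None, "country": "us"},
    {"customer_id": 3, "age": 212, "country": "DE"},
    {"customer_id": 3, "age": 212, "country": "DE"},
]

# Assumed plausible range per numeric column; a real project would
# agree on and document these rules instead of hard-coding them.
VALID_RANGES = {"age": (0, 120)}


def quality_report(rows, valid_ranges):
    """Count missing values, out-of-range values, and exact duplicate rows."""
    missing = Counter()
    out_of_range = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                missing[column] += 1
            elif column in valid_ranges:
                low, high = valid_ranges[column]
                if not low <= value <= high:
                    out_of_range[column] += 1
    # Every repeat of an identical row beyond its first occurrence is a duplicate.
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    return {
        "missing": dict(missing),
        "out_of_range": dict(out_of_range),
        "duplicate_rows": duplicates,
    }


report = quality_report(ROWS, VALID_RANGES)
print(report)
```

The point of writing the rules down as data (here, `VALID_RANGES`) is that the same checks can be rerun on every new batch, so finding errors depends less on one person's intuition. Note that this sketch does not catch inconsistent encodings such as "US" versus "us"; that would need an additional rule.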

Data-centric approaches to AI are as valuable for tabular data as for unstructured data. How are you applying data-centric AI to your tabular data?
