Is Tabular Data Easy?

February 06, 2023

Are tabular data and unstructured data equally tricky for AI/ML projects? ChatGPT says tabular data is “easier to use,” but there’s much more to it than that.

Tabular data IS easier to work with… if you keep it simple. Think of a single, not-too-large, immutable table with no data issues.

However, in the real world, tabular data isn’t so neat and simple. It is typically more difficult to use because it:

Has one-to-many relationships across tables in a database
Often has missing or incorrect values
Spans time with potential leakage and structural changes
Requires context and domain knowledge
Requires feature engineering before becoming AI-ready
Benefits far less from pre-trained models or transformers than text and images do
Is sourced from an almost infinite set of schemas without well-defined AI-specific semantics
Underperforms without feature selection
Can contain collinearities that destabilize many algorithms (for example, coefficient estimates in linear models)
Changes quite frequently
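
To make a few of these points concrete, here is a minimal sketch in plain Python. It collapses a one-to-many relationship into one row per entity, imputes a missing value while keeping a flag for it, and checks two engineered features for collinearity. The tables, the column names (customer_id, age, amount) and the helper functions are all hypothetical and exist only for illustration; a real project would more likely use a dataframe library.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy tables: each customer can have many orders (one-to-many).
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": None},  # missing value
    {"customer_id": 3, "age": 51},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 10.0},
    {"customer_id": 3, "amount": 5.0},
    {"customer_id": 3, "amount": 15.0},
    {"customer_id": 3, "amount": 40.0},
]


def aggregate_orders(order_rows):
    """Collapse the 'many' side into one summary row per customer."""
    amounts = defaultdict(list)
    for row in order_rows:
        amounts[row["customer_id"]].append(row["amount"])
    return {
        cid: {"order_count": len(vals), "order_total": sum(vals)}
        for cid, vals in amounts.items()
    }


def build_features(customer_rows, order_rows):
    """Join the aggregates onto customers and impute missing ages."""
    agg = aggregate_orders(order_rows)
    known_ages = [c["age"] for c in customer_rows if c["age"] is not None]
    mean_age = sum(known_ages) / len(known_ages)
    rows = []
    for c in customer_rows:
        stats = agg.get(c["customer_id"], {"order_count": 0, "order_total": 0.0})
        rows.append({
            "customer_id": c["customer_id"],
            # Mean imputation, plus a flag so the model can still see the gap.
            "age": c["age"] if c["age"] is not None else mean_age,
            "age_was_missing": c["age"] is None,
            **stats,
        })
    return rows


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


features = build_features(customers, orders)

# A correlation near 1 between two features hints at collinearity.
r = pearson(
    [f["order_count"] for f in features],
    [f["order_total"] for f in features],
)
print(features)
print(f"order_count vs order_total correlation: {r:.2f}")
```

Even this toy version hides a trap from the list above: the imputation mean is computed over every row, so in a real project it should be fitted on the training split only to avoid leaking information from later or held-out data.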

So, if you’re working on an AI project, it pays to understand the tabular data you’re working with and to put in the effort to prepare it correctly. What has your experience been with tabular data?