Is Tabular Data Easy?
Are tabular data and unstructured data equally tricky for AI/ML projects? ChatGPT says tabular data is “easier to use,” but there’s much more to it than that.
Tabular data IS easier to work with…if you keep it simple. Think of a single, not-too-large, immutable table with no data issues.
However, in the real world, tabular data isn’t so neat and simple. It is typically more difficult to use because it:
- Has one-to-many relationships across tables in a database
- Often has missing or incorrect values
- Spans time, which invites leakage of future information and structural changes
- Requires context and domain knowledge
- Requires feature engineering before becoming AI-ready
- Rarely benefits from pre-trained models or transformers the way text and images do
- Is sourced from an almost infinite set of schemas without well-defined AI-specific semantics
- Underperforms without feature selection
- Can contain collinear features that destabilize many algorithms
- Changes quite frequently
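To make a few of these points concrete, here is a minimal sketch in plain Python that touches three items from the list: a one-to-many relationship, missing values, and time leakage. The `customers` and `orders` tables, their column names, and the `build_features` helper are all hypothetical and invented for illustration. In practice you would likely use a dataframe library, but the logic is the same.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical parent table: one row per customer, with the date on which
# we want to make a prediction.
customers = [
    {"customer_id": 1, "prediction_date": date(2024, 3, 1)},
    {"customer_id": 2, "prediction_date": date(2024, 3, 1)},
    {"customer_id": 3, "prediction_date": date(2024, 3, 1)},
]

# Hypothetical child table: many orders per customer, one with a missing amount.
orders = [
    {"customer_id": 1, "order_date": date(2024, 1, 10), "amount": 20.0},
    {"customer_id": 1, "order_date": date(2024, 2, 5), "amount": None},
    {"customer_id": 1, "order_date": date(2024, 3, 15), "amount": 500.0},  # after the cutoff
    {"customer_id": 2, "order_date": date(2024, 2, 20), "amount": 40.0},
]


def build_features(customers, orders):
    """Flatten the one-to-many orders table into one feature row per customer."""
    orders_by_customer = defaultdict(list)
    for order in orders:
        orders_by_customer[order["customer_id"]].append(order)

    rows = []
    for customer in customers:
        cutoff = customer["prediction_date"]
        # Keep only orders placed before the prediction date, so that
        # information from the future does not leak into the features.
        past = [
            o for o in orders_by_customer[customer["customer_id"]]
            if o["order_date"] < cutoff
        ]
        known = [o["amount"] for o in past if o["amount"] is not None]
        rows.append({
            "customer_id": customer["customer_id"],
            "n_orders": len(past),
            "n_missing_amounts": len(past) - len(known),
            # None marks "no known amount" instead of silently using 0.
            "mean_amount": mean(known) if known else None,
        })
    return rows


features = build_features(customers, orders)
for row in features:
    print(row)
```

Even in this toy case, three decisions had to be made by hand: how to aggregate many orders into one row, what to do with a missing amount, and where to draw the time cutoff. None of them can be read off the schema alone, which is where context and domain knowledge come in.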
So, if you’re working on an AI project, it pays to have a good understanding of the tabular data you’re working with – and to put in the effort to prepare it correctly. What has your experience been with tabular data?