Is Tabular Data Easy?

February 06, 2023

Are tabular data and unstructured data equally tricky for AI/ML projects? ChatGPT says tabular data is “easier to use,” but there’s much more to it than that.

Tabular data IS easier to work with…if you keep it simple. Think of a single, not-too-large, immutable table with no data issues.

However, in the real world, tabular data isn’t so neat and simple. It is typically more difficult to use because it:

  • Has one-to-many relationships across tables in a database
  • Often has missing or incorrect values
  • Spans time with potential leakage and structural changes
  • Requires context and domain knowledge
  • Requires feature engineering before becoming AI-ready
  • Rarely benefits from pre-trained models or transformers the way text and images do
  • Is sourced from an almost infinite set of schemas without well-defined AI-specific semantics
  • Leads to underperforming models without feature selection
  • Can contain collinearities that break many algorithms
  • Changes quite frequently
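
To make a few of these points concrete, here is a minimal sketch of what "preparing" tabular data can involve: collapsing a one-to-many relationship into one row per entity, counting missing values instead of silently dropping them, and using only records from before a cutoff date to avoid leakage. The table, the column names (`customer_id`, `date`, `amount`), and the `build_features` helper are all hypothetical and exist only for illustration. It uses plain Python so it runs anywhere; in practice this step is usually done in SQL or a dataframe library.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical child table: many transactions per customer (one-to-many).
transactions = [
    {"customer_id": 1, "date": date(2023, 1, 5), "amount": 20.0},
    {"customer_id": 1, "date": date(2023, 1, 20), "amount": None},  # missing value
    {"customer_id": 1, "date": date(2023, 2, 10), "amount": 50.0},  # on/after the cutoff
    {"customer_id": 2, "date": date(2023, 1, 15), "amount": 30.0},
]


def build_features(rows, cutoff):
    """Aggregate transactions to one row per customer, using only rows dated before `cutoff`."""
    amounts = defaultdict(list)  # known amounts per customer
    missing = defaultdict(int)   # count of missing amounts per customer
    for row in rows:
        if row["date"] >= cutoff:
            continue  # skip rows the model could not have seen yet (avoids leakage)
        if row["amount"] is None:
            missing[row["customer_id"]] += 1
        else:
            amounts[row["customer_id"]].append(row["amount"])

    features = {}
    for cid in sorted(set(amounts) | set(missing)):
        vals = amounts.get(cid, [])
        features[cid] = {
            "txn_count": len(vals) + missing.get(cid, 0),
            "missing_amounts": missing.get(cid, 0),
            "mean_amount": mean(vals) if vals else None,
        }
    return features


features = build_features(transactions, cutoff=date(2023, 2, 1))
for customer_id, row in features.items():
    print(customer_id, row)
```

Even in this toy case, three decisions had to be made by hand: how to aggregate, what to do with the missing amount, and where to draw the time boundary. Real schemas multiply those decisions across many tables.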

So, if you’re working on an AI project, it pays to understand the tabular data you’re working with and to put in the effort to prepare it correctly. What has your experience been with tabular data?

© 2024 FeatureByte All Rights Reserved