From Idea to Deployment: 7 Considerations for Implementing Gen AI in ML Pipelines

February 16, 2024

In a recent episode of The Machine Learning Podcast, Colin Priest, Advisor to FeatureByte, sat down with host Tobias Macey. They shared insights on applying generative AI to AI data preparation and to building and deploying AI and ML pipelines.

We’ve boiled down their conversation into seven key considerations for the successful use of a Gen AI-based copilot.

1) Properly Define the Problem

Colin emphasized the importance of clearly defining the business problem rather than diving straight into the data science work: “Coming from more of a business background than an engineering background, I always like to start with properly defining the problem. I’ve seen too many people solve the wrong problem.” Unclear business objectives are part of the reason 85% of AI projects fail, so it’s critical to tie the AI model in question back to business value.

2) Unconstrained Feature Ideation

One unconventional approach Colin advocates is to start with feature ideation even before analyzing the data. He explained, “I like to go: what data, if I could have any data in the world, what would I like to get?” This method keeps data scientists from feeling boxed in and biased by the data they already have, and encourages them to think creatively and consider all possible features before evaluating the data.

3) More Features ≠ Better Models

When it comes to feature ideation and creation, more features do not necessarily translate into better ML models. Colin dives into some feature-generation tools that identify every possible feature in a dataset. “I’ve seen a number of tools out there that brute force feature generation. And you literally end up with thousands of features. And that’s not a good thing.” It’s important to look at the context of the use case and avoid tools that cause feature explosion, which can lead to lower-performing models.
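To make the scale of feature explosion concrete, here is a rough back-of-the-envelope sketch. The column count, aggregations, and time windows are made-up assumptions for illustration, not figures from the episode or from any particular tool.

```python
from itertools import product
from math import comb

# Hypothetical inputs: 20 numeric columns, 5 aggregations, 3 time windows.
columns = [f"col_{i}" for i in range(20)]
aggregations = ["sum", "mean", "min", "max", "std"]
windows_days = [7, 30, 90]

# Brute force: every column x aggregation x window combination.
base_features = [
    f"{col}_{agg}_{window}d"
    for col, agg, window in product(columns, aggregations, windows_days)
]

# Adding one ratio feature per unordered pair of base features.
pair_count = comb(len(base_features), 2)

print(len(base_features))               # 300
print(pair_count)                       # 44850
print(len(base_features) + pair_count)  # 45150
```

Even with these modest assumptions, a single extra layer of pairwise combinations pushes the candidate count into the tens of thousands, most of which will have no connection to the use case.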

4) Iterative Development

Feature engineering, experimentation, and optimization must be iterative throughout model development. “There’s a lot of iterating back and forth…you discover more things about the data that you hadn’t realized.” This iterative approach allows for continuous improvement and refinement of features based on insights gained from experimentation.

5) Opportunities for Standardization

While every problem is unique, Colin notes that many repetitive tasks in feature pipeline development are ideal for automation. “Because every problem is different, you’re going to come up with a different set of features to solve it. If you’re going to be efficient about this, you’re going to try and reuse some of those features and reuse some code.” Standardizing feature creation, feature descriptions, and pipeline generation can help streamline the workflow. “It’s just too hard to deploy if everything is bespoke.”
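One way to picture this kind of standardization is a small set of reusable feature definitions, each carrying its own description, that any pipeline can run. The sketch below is purely illustrative: `FeatureDef`, `make_sum_feature`, `make_mean_feature`, and `run_pipeline` are hypothetical names, not part of any product discussed in the episode.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass(frozen=True)
class FeatureDef:
    """A reusable feature: a name, a human-readable description, and its logic."""
    name: str
    description: str
    compute: Callable[[List[Record]], float]


def make_sum_feature(column: str) -> FeatureDef:
    # The same template yields a consistently named and described feature
    # for any numeric column.
    return FeatureDef(
        name=f"{column}_sum",
        description=f"Sum of {column} across the entity's records",
        compute=lambda rows: sum(r[column] for r in rows),
    )


def make_mean_feature(column: str) -> FeatureDef:
    return FeatureDef(
        name=f"{column}_mean",
        description=f"Mean of {column} across the entity's records",
        # Return 0.0 for an empty history instead of dividing by zero.
        compute=lambda rows: sum(r[column] for r in rows) / len(rows) if rows else 0.0,
    )


def run_pipeline(features: List[FeatureDef], rows: List[Record]) -> Dict[str, float]:
    """Apply every feature definition to the same records."""
    return {f.name: f.compute(rows) for f in features}


rows = [{"amount": 10.0}, {"amount": 30.0}]
features = [make_sum_feature("amount"), make_mean_feature("amount")]
result = run_pipeline(features, rows)
print(result)  # {'amount_sum': 40.0, 'amount_mean': 20.0}
```

Because naming, descriptions, and execution all follow one pattern, the same definitions can be reused across use cases instead of being rewritten as bespoke code each time.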

6) Validation and Verification

Ensuring the accuracy and reliability of AI models requires rigorous validation and verification processes, especially when it comes to generative AI. To mitigate risks, Colin suggests applying unit testing, independent validation, and skepticism when interpreting generative AI outputs. He stressed, “You should always treat it with suspicion…It’s designed to look like and behave like a human, and we tend to trust humans more than we should. So, we’re likely to trust the Gen AI more than we should. I’m a great fan of putting in validation rules.”
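As a minimal sketch of what such validation rules might look like, the example below checks a feature suggestion produced by a Gen AI copilot before it is accepted. Everything here is an assumption for illustration: the `FeatureSuggestion` structure, the `SCHEMA` table, and the `ALLOWED_AGGS` rules are hypothetical and do not describe a specific tool.

```python
from dataclasses import dataclass

# Hypothetical schema of the source table: column name -> column type.
SCHEMA = {"amount": "numeric", "merchant": "categorical", "timestamp": "datetime"}

# Hypothetical rule set: which aggregations make sense for each column type.
ALLOWED_AGGS = {
    "numeric": {"sum", "mean", "min", "max", "count"},
    "categorical": {"count", "nunique"},
}


@dataclass(frozen=True)
class FeatureSuggestion:
    """A feature proposed by the model, expressed as structured fields."""
    name: str
    column: str
    aggregation: str
    window_days: int


def validate(suggestion: FeatureSuggestion) -> list[str]:
    """Return a list of rule violations; an empty list means the suggestion passes."""
    errors = []
    col_type = SCHEMA.get(suggestion.column)
    if col_type is None:
        # Catches hallucinated columns that do not exist in the data.
        errors.append(f"unknown column: {suggestion.column}")
    elif suggestion.aggregation not in ALLOWED_AGGS.get(col_type, set()):
        errors.append(
            f"{suggestion.aggregation} is not allowed on "
            f"{col_type} column {suggestion.column}"
        )
    if suggestion.window_days <= 0:
        errors.append("window_days must be positive")
    return errors


good = FeatureSuggestion("avg_amount_30d", "amount", "mean", 30)
bad = FeatureSuggestion("avg_price_30d", "price", "mean", 30)
print(validate(good))  # []
print(validate(bad))   # ['unknown column: price']
```

Rules like these are cheap to run as unit tests, so every suggestion is checked automatically rather than trusted because it reads plausibly.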

7) Gen AI as a Domain Expert

Beneath the layer of necessary skepticism that comes with Gen AI, Colin notes that it can be an incredibly useful tool for unlocking domain expertise. Truly great data scientists need a wealth of domain knowledge to build high-performing ML models, but domain expertise typically comes only after years of working with the same types of use cases and problems. Gen AI is particularly useful for adding industry knowledge and domain expertise to a data science problem. “Gen AI can actually teach you something about the domain you work in at times because it’s had access to what people have published elsewhere.”

AI data prep and deployment is a long-standing problem with plenty of room to become faster and more accurate. Data scientists have spent years dealing with the same challenges, with no credible way to address them. As Colin outlines in this episode, Gen AI can be an essential tool for data scientists to streamline and accelerate the entire feature lifecycle.

To listen to Colin and Tobias’s entire conversation, tune in to The Machine Learning Podcast episode below.
