
Like folks at many other companies, I’ve been discussing and planning many Generative AI (I’ll call it GenAI here for short) projects and how they’d integrate into a user-facing product. I’ve also learned a lot from peers who were generous enough to share their knowledge with me through conferences and informal chats. I was inspired to write down common threads of discussion at the Toronto Machine Learning Summit. Here’s what I’ve learned so far, and I hope this gives you some insight or resonates with how you’ve been approaching GenAI projects.
Disclaimer: this post is my personal opinion, and does not represent any employer, past or present.
What’s your goal with GenAI?
For a company in the tech space, it’s important to get into the “new” competitive space. Companies whose longevity doesn’t necessarily depend on AI/ML can afford to be slower following new trends, but should still be open to trying out new technologies. We’ve all seen what happens when some technologies get completely disregarded (e.g. Blockbuster). Even if examples that extreme are rare, companies generally want to avoid that risk and invest some into new technology. This has been happening since before GenAI; I’m used to seeing companies invest in new areas since the “Big Data” and “Machine Learning” booms (remember those?), and I’ve been part of that tech investment leading ML/AI projects.
I’m going to assume that the goal for a company investing in GenAI is “to stay competitive” in their respective industry and space. For a user product in the ML/AI space, it’s very easy on the business side to extrapolate that earlier entry into the GenAI space could make or break the company’s position a few years later; being slow now could mean playing catch-up for years.
For a product, this could mean the following:
1. Going straight to market with a user-facing product
2. Developing a PoC (proof of concept) first
Option 2 can lead back to option 1, but deciding on option 1 from the get-go means a faster development loop.
How mature is your current org with ML?
Back to the earlier point about entering the GenAI space as soon as possible to gain a competitive edge, for some companies, this means going straight to market. Is this your company or organization?
For a smaller group of relevant companies, this will be easy, since ML/AI has been among their core investments for ages (think most big tech companies). They have the web platform, DevOps, MLOps, and infrastructure to handle large amounts of data, and they already have ML in production. Implementing LLM-powered (large language model) features or products, or other types of GenAI (e.g. image or animation generation), won’t be too different from a typical new large-scale ML project.
If your company has not been investing in machine learning for the last few years, expect to do extra legwork to get to the place of companies that have.
A bit worried about the hype?
What I am seeing, however, is teams bypassing some of the usual safeguards of ML/AI projects in this rush to production. Usually there would be a proof of concept first, or longer experiments (such as A/B testing ML algorithms against non-ML methods).
Of course, despite the rush to ship GenAI into the product, the result should still clear a baseline: it shouldn’t be unethical, and it shouldn’t perform so poorly that it breaks customer trust. That could cost the company its competitive edge and produce the opposite effect of what investing in GenAI was meant to achieve. Hence, I do see some caution in that sense, by thinking in the longer term, and not just unbridled hastiness to ship something out in GenAI.
How can you bring GenAI to production?
In this article I will focus on GenAI powered by LLMs (large language models) for text generation.
There are many ways to harness LLMs, for example:
On demand, in production: a chat interface
Batch, in production: using LLMs to do classifications that feed into a recommender system, to tag large corpora of text with taxonomy tags, etc.
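As a sketch of that batch pattern, here is a minimal tagging loop. Everything here is invented for illustration: the taxonomy is made up, and `call_llm` is a placeholder that tags by keyword match just so the sketch runs; a real implementation would send the prompt to an actual model.

```python
# Illustrative taxonomy; a real one would come from your product domain.
TAXONOMY = ["billing", "shipping", "returns"]

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call. To keep this sketch runnable,
    # it tags by keyword match on the text after the final colon.
    text = prompt.rsplit(":", 1)[-1].lower()
    matches = [tag for tag in TAXONOMY if tag in text]
    return matches[0] if matches else "other"

def tag_corpus(documents: list[str]) -> dict[str, str]:
    """Tag every document offline; the results could later feed
    a recommender system or search index."""
    prompt_template = "Assign one tag from {tags} to this text: {text}"
    return {
        doc: call_llm(prompt_template.format(tags=TAXONOMY, text=doc))
        for doc in documents
    }

tags = tag_corpus(["My shipping was delayed", "Question about billing"])
```

The key property of the batch setup is that latency barely matters: the tagging job can run overnight, which makes it a gentler first production use of LLMs than a live chat interface.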
I’m going to focus on the chat interface or being able to use GenAI capabilities for product marketing (e.g. OpenAI driven or other types of language models). Here are some things on my mind to approach GenAI for production:
Trust
If you must add AI/LLM-powered functionality to your product, will it increase trust with your users, or the opposite? Entering the GenAI space early can establish technical credibility, but once users start using your AI-powered app, will it stand up to scrutiny? A few screenshots of your AI’s responses containing misinformation might lose you some trust, unless you’re OpenAI.
Personally, as a developer, I’m more understanding that the tech needs more work at this moment and isn’t flawless. But if the product promises to save me time, yet makes me do more fact-checking because of hallucinations/misinformation, then I will lose trust in what the product promises to accomplish.
Using RAG (retrieval-augmented generation) can help reduce misinformation by grounding the model’s answers in retrieved source documents.
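A minimal sketch of the RAG idea: retrieve the most relevant documents for a query, then build a prompt that instructs the model to answer only from that context. The word-overlap retriever below is a toy stand-in for the embeddings and vector store a real system would use.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query. A real system
    would use embeddings and a vector store instead of this toy scorer."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Ground the model in retrieved context to reduce hallucinations."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is in Toronto.",
    "Shipping takes 3 days within Canada.",
]
prompt = build_rag_prompt("How long do refunds take?", docs)
```

The prompt the model finally sees contains your own documents, so its answers can be checked against a known source rather than taken on faith.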
User delight vs. user utility
Depending on your product messaging, if you’re overpromising on the AI’s results and guaranteeing accuracy, you risk breaking trust, since you can’t control the probabilistic output of a model. For example, submitting the same prompt to ChatGPT multiple times will give you different results. However, if the marketing communication is more about introducing an AI functionality that is still a work in progress, users will be more forgiving.
Sometimes, the value is in the delight of being able to use a chat interface. (h/t Wendy Foster and Denys Linkov for discussion related to this topic.) As a developer, I have my own preferences, such as not often using interactive or oral formats for information. However, it was pointed out that typing a question into TikTok to get an explanation is a common usage of the app. So there are modes of interaction I don’t prefer, but that are growing common.
Does your PoC introduce unacceptable latency?
If your PoC (proof of concept) takes a long time to load the AI responses, you need to discuss with your product team whether this is acceptable. It might remain an issue even after you optimize as best you can, since LLMs themselves take time to generate the full result.
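One common mitigation is streaming: if users see the response start appearing immediately, the metric that matters becomes time-to-first-token rather than total generation time. A rough sketch of measuring that, with a fake generator standing in for a real streaming LLM API:

```python
import time
from typing import Iterator

def fake_llm_stream(answer: str, delay: float = 0.01) -> Iterator[str]:
    # Stand-in for a streaming LLM API: yields one token at a time.
    for token in answer.split():
        time.sleep(delay)
        yield token + " "

def first_token_latency(stream: Iterator[str]) -> tuple[float, str]:
    """Measure time-to-first-token, the latency users actually feel,
    then drain the rest of the stream into the full response."""
    start = time.perf_counter()
    first = next(stream)
    ttft = time.perf_counter() - start
    full = first + "".join(stream)
    return ttft, full.strip()

ttft, text = first_token_latency(fake_llm_stream("Hello from the model"))
```

Even when the full answer takes several seconds, a sub-second time-to-first-token often makes the wait feel acceptable in a chat interface.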
Proof of concepts – what’s next?
For prototyping, LangChain seems to be the tool of choice: it’s the one I’ve heard about most, and I’ve personally used it for two projects. I’ve also met the creator, Harrison Chase, in person!
For the next stage in production, I’ve seen teams eventually build more custom tooling so they can tailor it heavily to their use case. For example, I’ve spoken to a legaltech/fintech company and a GenAI gaming company that each built their own tools for production. This isn’t new; as with other ML tools, companies might start with open tools first, and then build more custom stuff around or on top of them.
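The core pattern these tools wrap is fairly small, which is part of why custom tooling is feasible. Here is a hedged plain-Python sketch of the prompt-template “chain” idea (not LangChain’s actual API; the model call is a stub you would swap for a real one):

```python
class PromptChain:
    """Toy version of the template -> LLM chain pattern that tools like
    LangChain provide. The llm callable is injected, so any model works."""

    def __init__(self, template: str, llm):
        self.template = template
        self.llm = llm

    def run(self, **variables) -> str:
        # Fill the template, then hand the finished prompt to the model.
        return self.llm(self.template.format(**variables))

# Stub model that echoes its prompt; swap in a real API call in practice.
def echo_llm(prompt: str) -> str:
    return f"LLM saw: {prompt}"

chain = PromptChain("Summarize for a {audience}: {text}", echo_llm)
result = chain.run(audience="lawyer", text="The contract renews yearly.")
```

Starting from something this small, teams can add the retries, logging, and domain-specific prompt logic their use case needs, which is usually what “custom tooling” ends up meaning.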
Privacy, governance
Does the legal team know you’re shipping GenAI in your product? Are you in any way passing users’ prompts or information back to a third party or cloud? For example, with a vanilla OpenAI integration, users’ prompts could be made accessible to OpenAI. Is that acceptable in your product or organization? There are ways to access LLMs with more security and privacy, such as hosting a private version of a Hugging Face model, but is your organization ready to do this? Azure OpenAI also offers more privacy and security features (Link 1, Link 2), though you will need to check whether that is enough.
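One small mitigation, if prompts must leave your infrastructure, is scrubbing obvious PII before they go out. A toy regex-based sketch of the idea (the patterns are illustrative only; real deployments would use a dedicated PII-detection service):

```python
import re

# Illustrative patterns only; production systems use proper PII tooling.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(prompt: str) -> str:
    """Replace matched PII with placeholders before the prompt
    leaves your infrastructure for a third-party API."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label.upper()}]", prompt)
    return prompt

safe = redact("Email jane@example.com or call 416-555-1234 about my order.")
```

Redaction doesn’t answer the governance questions above, but it narrows what a third party could ever see, which makes the legal conversation easier.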
(I’ve been learning more about this area chatting to a friend/mentor, so I will put more on this in any follow-ups.)
Evaluation
For GenAI, I will admit that there is less structured evaluation going on: teams rely heavily on qualitative feedback at the moment, or bypass the usual evaluation that other types of ML go through. One thing you can’t go wrong with is saving as many screenshots as possible when testing. For more quantitative evaluation, one starting point is OpenAI’s Evals framework, though I’m sure many more are being developed at the moment.
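Even before adopting a framework, a small harness of prompt/check pairs gets you past screenshots. A sketch, with a stubbed model standing in for a real LLM call (the cases and checks are invented for illustration):

```python
def run_evals(model, cases: list[dict]) -> float:
    """Run each prompt through the model, apply the case's check
    predicate to the output, and return the pass rate."""
    passed = 0
    for case in cases:
        output = model(case["prompt"])
        if case["check"](output):
            passed += 1
        else:
            print(f"FAIL: {case['prompt']!r} -> {output!r}")
    return passed / len(cases)

# Stub model for illustration; replace with a real LLM call.
def stub_model(prompt: str) -> str:
    return "Paris" if "capital of France" in prompt else "I don't know"

cases = [
    {"prompt": "What is the capital of France?", "check": lambda o: "Paris" in o},
    {"prompt": "What is 2+2?", "check": lambda o: "4" in o},
]
pass_rate = run_evals(stub_model, cases)
```

Because the model’s output is probabilistic, checks work better as predicates (“does the answer contain Paris?”) than exact string matches, and tracking the pass rate over time tells you whether prompt or model changes actually helped.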
There are more thoughts... but that’s it for now. There might be a part two, so feel free to subscribe so you don’t miss it!