ChatGPT has taken the world by storm. It’s no wonder that business owners across industries are eager to put it to work. They’ve likely read a few articles, tried it out on a simple example, then gotten super excited about an idea and said, “Let’s do it! Let’s plug GPT into our system!”
Sounds reasonable, right? Sales and marketing teams are probably the most excited, rubbing their hands together in anticipation of all the new leads pouring in, but wait: what if you’re the developer now tasked with deploying GPT?
Your boss has tested it. And they think everyone’s doing it, so “it must be a piece of cake.” But what if I told you that to make ChatGPT work how you want and deliver real value to the end user, it isn’t such a walk in the park? How would you relay that message to your boss?
Fortunately, we’ve written this article to help you do just that, showing how training ChatGPT online is a far cry from preparing a GPT model for a business. And to make the read that bit more accessible, we’ve made the Cookie Monster our main character.
The Cookie Monster is that lovable blue creature from the children’s TV show Sesame Street. His love for cookies is much like ChatGPT’s love for prompts, and both aim to spark joy. So bear with us; let’s see if we can make the analogy work.
But be warned: if you think using ChatGPT in your business will feel as joyous as a kid’s cartoon, it could quickly become a horror movie.
Buckle up and find out why.
5 Cookie Monster (ahem, ChatGPT!) Limitations: Crucial Insights for Your Boss’s Consideration
1. GPT often loses the context of the conversation
First and foremost, we need to emphasize one crucial thing: ChatGPT learns from us during our conversation. Still, it often feels like you’re talking to a forgetful grandpa who can’t remember what you said in your last sentence, so how do you avoid this?
You need to train the model to stop with the memory lapses. You see, GPT works in ‘prompt-completion’ mode, meaning if you want to teach it something, you provide a prompt (i.e., the question), and the model generates the completion (i.e., the answer).
But when the Cookie Monster (that is, GPT) gobbles up too many cookies (that is, your prompts), he can quickly get full and forget about the first cookie he ate. That’s why, although you might have had a good talk with ChatGPT, the Cookie Monster might not remember how he’s meant to behave.
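In practice, the model’s “memory” is just the conversation history you resend with every request, and that history has to fit inside the model’s context window. A common workaround is to trim the oldest messages before each call. Here’s a minimal sketch; the message format mirrors the typical role/content chat structure, and the roughly-4-characters-per-token ratio is a crude heuristic, not an exact tokenizer:

```python
# Rough sketch: keep the most recent messages within a token budget.
# The ~4 characters-per-token estimate is a heuristic, not a real tokenizer.

def estimate_tokens(text):
    return max(1, len(text) // 4)

def trim_history(messages, max_tokens=3000):
    """Keep the system message (if any) plus the newest messages that
    fit inside max_tokens, dropping the oldest messages first."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))
```

With an approach like this, the Cookie Monster at least forgets the *oldest* cookies first, and your system prompt (his table manners) always survives the trim.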
2. Preparing training data takes time
How you prepare a model for training depends on what you want to train it for. Input data can be a set of texts, which is relatively simple to pull together.
However, remember that you can’t feed the model an extensive text all at once (like an entire book or even a long article). Each GPT version has a token limit on how much text it can accept, so you must cut your text into pieces that fit.
Let’s return to the Cookie Monster, who, by this stage, can tell you how your cookies taste.
When it comes to inference, it’s harder than summarization because the model requires not only data but also logic. So while GPT can accept any text, there’s a high risk that it misunderstands contradictory or unclear instructions in that text.
Therefore, we must show the Cookie Monster what good cookies look like. He’ll then eat them and say, “Oh! So that’s how you want to bake cookies? Alright, from now on, we’ll bake them like that” — et voila, ChatGPT will start baking cookies exactly as instructed.
Still, the taste might be a little off. This is normally down to one of two reasons: (1) an error in the input data or too few “question: answer” pairs, or (2) cramming multiple tasks into a single prompt (like asking the model to answer several different questions about one text at once).
Let’s use an example to show you what we mean.
Suppose we want to train ChatGPT on what meal we should have for breakfast. We want ChatGPT to choose for us based on our dietary preferences. How many rules do you think we should create? The answer is — the more, the better (that is, unless you want to eat eggs and bacon for breakfast every day!).
The good news is that you don’t have to generate all this data. You can always use ChatGPT to create a data set for you. For example, you can give it a list of ingredients and then ask what meals you can make from them.
You can then select the ones you genuinely like and ask which ones are suitable for breakfast, lunch, and dinner. Afterward, you can manually divide them into those that meet your criteria and then ask ChatGPT to indicate a set of questions whose answers would be specific meals.
Just remember to thoroughly check the entire dataset to ensure it generates data in line with your preferences and how you want it to answer specific questions. And beware that the task becomes increasingly complicated if you want GPT to be an expert in a specialized field (say, law or medicine).
In such an instance, you’d need someone with domain expertise to verify your dataset.
3. The provider might up the price
Your boss needs to remember that the price of third-party tools is rarely fixed. OpenAI can always raise its prices, so even if access costs $20 a month today, that could soon change. At the same time, the long-term costs depend on how you want to use GPT.
You might have to pay for a set of tokens to use for specific prompts or completions, and prices can vary depending on the model you choose.
Check the current OpenAI pricing for an idea of the rates.
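Because billing is per token, a quick back-of-envelope estimate helps before committing. Here’s a sketch; the per-token rates below are placeholders we made up for illustration, so substitute the real numbers from OpenAI’s pricing page:

```python
# Back-of-envelope monthly cost estimate.
# These rates are PLACEHOLDERS — check OpenAI's current pricing page.
PRICE_PER_1K_PROMPT = 0.01       # assumed USD per 1,000 prompt tokens
PRICE_PER_1K_COMPLETION = 0.03   # assumed USD per 1,000 completion tokens

def monthly_cost(requests_per_day, prompt_tokens, completion_tokens, days=30):
    per_request = (prompt_tokens / 1000 * PRICE_PER_1K_PROMPT
                   + completion_tokens / 1000 * PRICE_PER_1K_COMPLETION)
    return requests_per_day * per_request * days

# e.g. 1,000 requests/day, ~500 prompt + ~200 completion tokens each
print(round(monthly_cost(1000, 500, 200), 2))  # → 330.0 (at the assumed rates)
```

Even a rough model like this makes it easier to show your boss how costs scale with traffic, not just with the subscription fee.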
4. The service might *temporarily* drop offline
You also have no control over downtime, so factor this into your own service plan. Downtime is an unavoidable risk, but you can create a contingency plan so that when users encounter it, they see something more than just an error message.
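One simple contingency is to wrap the API call in a retry loop with a friendly fallback message. The sketch below assumes `call_model` is whatever function your app uses to reach GPT; it isn’t a real OpenAI client call:

```python
import time

FALLBACK = ("Our assistant is temporarily unavailable. "
            "Please try again in a few minutes.")

def answer_with_fallback(call_model, prompt, retries=3, delay=2.0):
    """Try the model a few times; on repeated failure, return a
    friendly fallback instead of surfacing a raw error to the user."""
    for attempt in range(retries):
        try:
            return call_model(prompt)
        except Exception:
            if attempt < retries - 1:
                time.sleep(delay * (attempt + 1))  # simple linear backoff
    return FALLBACK
```

It won’t bring the service back online, but it turns an outage into a graceful message instead of a stack trace.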
5. ChatGPT is known to make things up
Whatever you do: do not trust everything ChatGPT says.
While the model can use prior knowledge to answer questions unrelated to your training data (like ‘Who’s the president of the USA?’), if you ask, “Give me a recipe for cookies with soap,” you might be surprised by the response. Want to see what we mean?
Check the response below:
In truth, this kind of response isn’t unexpected, as pointed out by Tomasz Maćkowiak, machine learning engineer at DLabs.AI: “Validating the accuracy of Large Language Models (LLMs) is difficult because verifying an LLM is a daunting task.”
“The model itself is very general and can be used for many downstream tasks. Those models are trained on huge volumes of data and human-in-the-loop interactions, and thus to properly verify the quality of such a model, we would need to get the number of evaluation samples on the same order of magnitude and with a similar process. This is likely not achievable for small enterprises.”
The Cookie Monster’s ‘GPT Complexity Estimator’: A Checklist to Measure Your Workload
So you want to use GPT, but you’re not sure how much work is involved. Well, you’re in luck. We’ve prepared a multiple-choice quiz for your manager to take to better understand the task at hand. Each response shows how much work is required, measured in ‘developer cookies.’
The more cookies you get, the more time, effort, and possibly support you’ll need to get the job done. Just make sure your boss knows exactly what they need and what business goals they want to achieve.
This will make sure you focus on the right aspects.
Baking Up a Storm: A Recipe for GPT Success
Right, so what’s next? Well, it all depends on the complexity of your project (aka: the number of cookies resulting from the test).
Check how many you got, then consider the following:
5-10 cookies: lowest complexity
If a simple solution is all you need, lucky you! Just hook yourself up to GPT-4 and learn as you go! But be warned: this solution is only suitable for internal testing. If you want a client-facing solution, we suggest you think twice!
11-15 cookies: medium complexity
Well, the project isn’t as easy as it first seemed, but there’s a good chance you can handle it. Before you start work, get familiar with the API, read the docs, do some fine-tuning, and test. And when you find a bug, fix it, then test again.
16-25 cookies: highly complex
Woah, this will be a challenge! Start by familiarizing yourself with the API, read the docs, then analyze and prepare your data (remembering the ‘prompt-completion’ rule). Be sure to consider the risks mentioned above (like price changes and downtime), and maybe, just maybe, consider building your own LLM.
Whatever you do: don’t forget to test all edge cases, and don’t blindly trust the model after just one good prediction. Ah, and if you encounter any problems, just consult a machine-learning specialist; we’re always happy to help 😀.
Good luck!
Key Takeaways for GPT Implementation
We hope we’ve helped you see how deploying GPT isn’t child’s play. Still, that’s not to say it’s something to shy away from.
Regardless of your dataset and your team’s experience, always “start small” and limit the risk of something going wrong. Also, forget about the UX/UI until you know how well the fine-tuned model will perform (or if it will even meet your expectations).
Instead, focus on determining your goals before fine-tuning the model so you know the KPIs you’re working towards — and as ever: feel free to get in touch with DLabs.AI if you’d like support when using this powerful tool.
We’re already working with several clients on similar projects, and we’d be delighted to start working with your company, too.