Does your enterprise plan to try GPT-3? Here’s what you need to know

Join Transform 2021 for the most important themes in enterprise AI & data. Learn more.


In a previous article, I discussed the business benefits enterprises can reap by building applications on OpenAI’s GPT-3 natural language model. Here I want to provide a primer for companies taking a first look at the technology.

There is currently a waiting list to access the GPT-3 API, but I had the opportunity to play around with the system. For those who haven’t tried it yet, here are a few things to be prepared for:

1. Context is everything

The input you provide to GPT-3 is some seed text; this is the context you are setting for GPT-3’s response. But you also give the response a “prefix.” This prefix is a directive that steers the text the model generates, and it is marked with a colon at the end. For example, you can provide a paragraph as context and use a prefix such as “Explain to a 5-year-old:” to generate a simple explanation. (It is recommended not to place anything after the prefix.) Below is a sample response from GPT-3.

As you can see in the example above, your prefix does not need to follow any complex machine encoding. It’s just a simple phrase that a human can read.

You can use several prefixes to describe a larger or extended context, as in a chatbot example, where you provide the chat history to help the bot generate responses. This context is used to condition GPT-3’s output and generate the response. For example, you could make the chatbot helpful and friendly, or you could make it sarcastic and unfriendly. In the example below, I have given GPT-3 four prefixes. I provided a sample result for the first three and then left GPT-3 to continue from there.

Since the result you get from the model depends entirely on the context you provide, it is important to choose these elements carefully.

2. Tune your settings carefully or burn through your tokens

Configurations are the options shown on the right in the examples above. These are parameters you pass along with your API call that help fine-tune the response. For example, you can control how random the responses are with the Temperature setting, which ranges from 0 to 1. If Temperature is set to 0, every call with a given context returns the same answer. If Temperature is 1, the answers will be highly random.

Another configuration you can tweak is Response Length, which caps the text returned by the API. Keep in mind that OpenAI charges for use of the platform per token rather than per word, and a token typically covers about four characters. So at the experimentation stage, be sure to adjust your response length so that you don’t burn through your tokens right away.

With the free three-month trial of GPT-3 you get $18 worth of tokens. I ended up eating through almost 75% of my allotment with just a little experimentation with the API. There are actually four different versions of the GPT-3 model available as “engines,” each with its own pricing. The current cost for the DaVinci engine, the best performer of the four, is $0.06 per thousand tokens. The less powerful engines, Curie, Babbage, and Ada, cost $0.006, $0.0012, and $0.0008 per thousand tokens, respectively.

3. MLaaS will be bigger than SaaS

GPT-3 is perhaps the most well-known example of an advanced natural language processing API, but it is likely to be one of many as the NLP ecosystem matures. Machine learning as a service (MLaaS) is a powerful business model: you can either spend the time and money to pre-train a model yourself (for context, pre-training GPT-3 cost OpenAI close to $12 million), or you can use a pre-trained model for pennies on the dollar.

In the case of GPT-3, all calls you make to the API are routed to a shared instance of the GPT-3 model running in the OpenAI cloud. As mentioned before, the DaVinci engine performs best, but you should try each engine for yourself on your specific use cases.

DaVinci is forgiving of spelling errors or extra/missing spaces in your input context, and it provides a very easy-to-read response. You can tell that it was trained on a larger corpus and is less prone to mistakes. The cheaper engines require you to do more work designing the context and usually require tuning to get exactly the answer you expect. Below is an example of classifying a company whose name is spelled “FedExt” in the context. DaVinci gets the correct answer while Ada gets it wrong.

Again, when we ask about a specific example of a drug interaction, DaVinci gets to the point and answers the question much better than Ada or Babbage:

4. Models are built on top of each other like Russian dolls

GPT-3 is a stateless language model, which means it can’t remember or learn from your previous requests. It relies entirely on its original training (which encompasses pretty much all the text on the internet) and the context and configuration you provide.

This is the main barrier to enterprise adoption. You can generate some very interesting demos, but for GPT-3 to be a serious contender for real-world use cases in banking, healthcare, industry, etc., we need models trained on domain-specific data. For example, you would want a model trained on the policy documents within your company, or on patient health records, or on machine manuals.

Therefore, applications built directly on top of GPT-3 may not be of much use to enterprises. A more viable monetization scheme would host GPT-3-like models behind APIs specialized for specific problems such as drug discovery, insurance policy recommendations, financial report summarization, maintenance planning for equipment, etc.

The end state is likely to be an application built on a model that is itself built on top of another model. A specialized model built by an enterprise on its proprietary data must be able to adapt to new knowledge captured in business documents to remain relevant. In the future, we will see more domain-specific language models with active learning capability. And we may eventually see an active-learning business model from OpenAI, too, where organizations will be able to incrementally train an instance of GPT-3 on their proprietary data. However, this would come at a hefty price point, since OpenAI would have to host a dedicated instance for that customer.

Dattaraj Rao is an innovation and R&D architect at Persistent Systems and author of the book Keras to Kubernetes: The Journey of a Machine Learning Model to Production. At Persistent Systems, he leads the AI Research Lab. He holds 11 patents in machine learning and computer vision.
