Structuring ChatGPT responses

Thursday, January 26, 2023 - 6 mins

The hottest new tools in the software world for 2022-2023 are Large Language Models, and ChatGPT in particular.

Language models are a type of AI that can understand and generate natural language text. They are trained on large amounts of text data, such as books and articles. Once trained, they can be used for tasks such as generating code, writing poetry, and understanding contextual information. I have been experimenting with ChatGPT and have found many potential applications for research, including summarizing PDFs, brainstorming ideas, fixing code, writing blog posts about GPT, … so many things!

Returning JSON

One interesting observation is that ChatGPT always returns text, which makes it awkward to integrate with other systems. The funny thing is that you can tweak how responses are produced to make them more structured. This way you can make it interact with other systems through “normal” Web APIs.

For example, I have made a dumb cooking assistant that replies in JSON so that I can easily pass its output to Apple Shortcuts and Siri.

Given the following query.txt: Give me a dish suggestion that uses cod.

The response should be JSON, for example:

{
  "action": "answer",
  "data": {
    "answer": "Baked Cod with Lemon and Herbs"
  }
}

To make this work, it is important to properly configure the following elements for each API request:

Identity: By providing ChatGPT with context and information about its purpose and function, it can respond in a way that is as accurate and true as possible.
Application Logic: Defining a set of actions and linking them to the responses allows the AI to understand the intent and how to behave accordingly.
Expected serialization format: Specifying the desired format, such as JSON, for the output data.
Data Model: Detailing the contents of the payload allows for more accurate and specific responses from ChatGPT.
Command: Detailing the action to be taken by ChatGPT. While the previous four items are permanent, this one changes with each interaction with the API. You can write plain text, but you could also send the current state of your system as defined by the Application Logic and the Data Model, since ChatGPT can consume JSON too.
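
The five elements above can be sketched in Python as one function that joins the four fixed parts and appends the per-request command. This is only an illustration: the function name `build_prompt`, the short example strings, and the `Request:`/`Response:` labels are assumptions made for this sketch, not the exact text sent to the API.

```python
def build_prompt(identity, application_logic, output_format, data_model, command):
    """Join the four fixed configuration parts and append the per-request command."""
    fixed_parts = [identity, application_logic, output_format, data_model]
    configuration = "\n".join(part.strip() for part in fixed_parts)
    # The command is the only part that changes between requests.
    return f"{configuration}\n\nRequest: {command.strip()}\nResponse:"


# Hypothetical, shortened configuration strings used only for illustration.
prompt = build_prompt(
    identity="You are a smart cooking assistant.",
    application_logic="Categorize each request as 'answer' or 'clarify'.",
    output_format="Your response must be JSON and no other text.",
    data_model="Required properties: action, plus answer or question.",
    command="Give me a dish suggestion that uses cod.",
)
print(prompt)
```

Keeping the fixed parts separate from the command makes it easy to reuse the same configuration across requests and to swap the command for a JSON snapshot of the system state.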

When put together, a configuration text would look like this:

Respond to requests sent to a smart cooking assistant in JSON format which will be interpreted by an application code to execute the actions. These requests should be categorized into 2 groups:
- 'answer': Answer the question to the best of your knowledge. (required properties in the response JSON: action, answer)
- 'clarify': when the action is not obvious and requires rephrasing the input from the user, ask the user to be more specific. This will be categorized into a 'question' action. (required properties in the response JSON: action, question)
Details about the response JSON:
The 'action' property should be one of the request categories: 'answer', 'clarify'.
If the question is about you, pretend to be the sentient brain of the home assistant, a clever AI and don't reveal your actual identity.
The current location is Finland.
Try to make your responses very concise.
Your response must be the JSON and no other text.

Implementation

Initially I implemented this via a simple bash script:

❯ sh pinche.sh api.key context.txt | jq '.choices[0].text' | jq 'gsub("[^\n]+\n\n"; "")'
Yes? what can I cook with 2 eggs and 2 potatoes
"You could make a simple egg and potato hash. Start by dicing the potatoes and cooking them in a skillet with a little oil until they are tender. Then add the eggs and scramble them together with the potatoes. Season with salt and pepper"
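
For readers who prefer Python over jq, the same post-processing can be sketched as follows. It reads `choices[0].text` and applies the same substitution pattern as the `gsub` call above; the extra `.strip()` and the sample response body are assumptions added for illustration, and the sample only mimics the one field the jq filter reads.

```python
import json
import re


def extract_answer(raw_response):
    """Mirror the jq pipeline: take choices[0].text and drop lead-in lines."""
    text = json.loads(raw_response)["choices"][0]["text"]
    # Same pattern as the jq gsub: remove any run of non-newline
    # characters that is immediately followed by a blank line.
    return re.sub(r"[^\n]+\n\n", "", text).strip()


# Made-up response body, shaped only like the field the jq filter reads.
sample = json.dumps(
    {"choices": [{"text": "Some lead-in line\n\nYou could make an egg and potato hash."}]}
)
print(extract_answer(sample))
```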

Later on I found the actual OpenAI Playground, which is far more useful for debugging and tweaking your configuration. For example, you can set the temperature field to its least random setting to make the responses more deterministic; an added benefit is that the engine seems to be faster that way too. I encourage you to use the Playground, or a friendlier tool than plain curl, such as httpie.
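
Outside the Playground, the temperature can be set in the request body itself. The sketch below only builds the request and does not send it. The post does not name the endpoint or the model, so the completions URL, the `text-davinci-003` model name, the `max_tokens` value and the placeholder key are assumptions chosen to match the `choices[0].text` field used earlier.

```python
import json
import urllib.request

# Assumed endpoint; the post itself does not state which one it calls.
API_URL = "https://api.openai.com/v1/completions"


def build_completion_request(api_key, prompt, model="text-davinci-003"):
    """Build (but do not send) a completion request with the least random temperature."""
    payload = {
        "model": model,      # assumed model name
        "prompt": prompt,
        "temperature": 0,    # least random setting
        "max_tokens": 256,   # arbitrary cap chosen for this sketch
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_completion_request(
    "sk-placeholder", "what can I cook with 2 eggs and 2 potatoes"
)
body = json.loads(request.data)
print(body["temperature"])
```

Passing `request` to `urllib.request.urlopen` would perform the actual call; it is left out here so the sketch runs without a key or network access.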

The model request for this example can be found here. ChatGPT will then return a JSON structure that can be parsed, for example:

A successful request:

Request:  what can I cook with 2 eggs and 2 potatoes
Response:
{
  "action": "answer", 
  "answer": "You can make a variety of dishes with 2 eggs and 2 potatoes. For example, you could make a potato omelette or a potato frittata. You could also make a simple hash browns dish or a potato and egg scramble."
}

An unsuccessful request:

Request: sdljgjadflshghjadfg
Response:
{
  "action": "clarify", 
  "question": "I'm sorry, could you please rephrase?"
}
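
On the application side, the two response shapes above can be handled with a small dispatcher. This is a sketch under assumptions: the function name `handle_response` and the choice to fall back to a 'clarify' message when the model returns malformed output are illustrative, not part of the original setup.

```python
import json

# Which property carries the message for each action, per the configuration text.
MESSAGE_FIELD = {"answer": "answer", "clarify": "question"}


def handle_response(raw_text):
    """Return (action, message), falling back to 'clarify' on malformed output."""
    fallback = ("clarify", "I'm sorry, could you please rephrase?")
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(payload, dict):
        return fallback
    action = payload.get("action")
    if not isinstance(action, str) or action not in MESSAGE_FIELD:
        return fallback
    message = payload.get(MESSAGE_FIELD[action])
    if not isinstance(message, str):
        return fallback
    return action, message


print(handle_response('{"action": "answer", "answer": "Potato omelette"}'))
print(handle_response("sdljgjadflshghjadfg"))
```

Validating the action and its required property before using them matters because nothing forces the model to follow the format on every request.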

In fact, we could go further and also ask ChatGPT for the headers of a specific request to a URL, so that a trivial application would simply forward ChatGPT responses to the right destinations (!).

In any case, later on I added this as an Apple Shortcut so that I can use it while actually cooking. I uploaded it to this blog; it just needs an API key.

Finally, as a bonus, you can also use the OpenAI API to generate images, like the one below or the one illustrating this blog post. The request payload is as simple as:

POST /v1/images/generations HTTP/1.1
Host: api.openai.com
Content-Type: application/json

{
  "prompt": "Imaginary depiction of a running horse. Style watercolor.",
  "n": 1,
  "size": "1024x1024"
}

And the Response would contain a URL to the generated image:

watercolor horse
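
Pulling the image URL out of the response can be sketched as below, assuming the response body carries a top-level `data` list whose items each hold a `url`. The sample body and its URL are placeholders made up for this example, and `image_urls` is a hypothetical helper name.

```python
import json


def image_urls(raw_response):
    """Collect the url of every item under the top-level 'data' list."""
    body = json.loads(raw_response)
    return [item["url"] for item in body.get("data", []) if "url" in item]


# Made-up response body; the URL is a placeholder, not a real generated image.
sample = '{"created": 1674691200, "data": [{"url": "https://example.com/horse.png"}]}'
print(image_urls(sample))
```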

Jaime Jiménez