Prompt Design
As previously discussed, ChatGPT has few-shot learning capabilities, which means it can learn new patterns from a small number of examples. In the case of converting unstructured human commands into JSON-serialized commands, ChatGPT can be trained on a set of sample prompts that map the human command format to the JSON-serialized format. These examples can be used to teach ChatGPT how to identify and extract the relevant information from the unstructured commands and how to transform it into the appropriate JSON structure. With this training, ChatGPT can then generate structured commands for completely new and unseen commands, increasing its flexibility and usefulness in human-robot interaction scenarios.
Here comes the importance of prompt engineering and design, which addresses the question of how to write prompts that optimally guide ChatGPT to infer the proper structured command. There is no single way to do this; it is usually an iterative process that requires common sense and expertise in the application context.
Figure 5 illustrates a sample of few-shot training prompts. There are multiple ways to frame these prompts, and the more prompts that are added, the more accurate the results are expected to be.
Using this ontology-based approach, the prompts generated for ChatGPT are designed to guide the model in providing the correct robotic action in a structured format, enhancing the overall performance and interpretability of the system.
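To make this concrete, the following minimal sketch shows how such few-shot examples might be assembled into a single priming prompt sent to ChatGPT. The instruction text, the example command/JSON pairs, and the helper name build_priming_prompt are illustrative assumptions and are not the exact prompts used in ROSGPT (Figure 5 shows the actual examples):

# Minimal sketch: assembling few-shot examples into one priming prompt.
# The instruction wording and the sample pairs below are illustrative,
# not the exact ROSGPT prompts shown in Figure 5.
def build_priming_prompt() -> str:
    examples = [
        # (unstructured human command, expected JSON-serialized command)
        ("Move forward 1 meter at 0.5 meters per second",
         '{"action": "move", "params": {"linear_speed": 0.5, "distance": 1, '
         '"is_forward": true, "unit": "meter"}}'),
        ("Move backward 3 meters at 1 meter per second",
         '{"action": "move", "params": {"linear_speed": -1.0, "distance": 3, '
         '"is_forward": false, "unit": "meter"}}'),
    ]
    parts = ["Convert the following human commands into JSON commands "
             "for a ROS 2 robot. Reply with the JSON only."]
    for human_command, json_command in examples:
        parts.append("Command: " + human_command + "\nJSON: " + json_command)
    return "\n\n".join(parts)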
Prompt Validation
To ensure ChatGPT generates the expected results, it is crucial to conduct extensive testing after designing the prompts. Initial tests can be performed on ChatGPT before integrating it into the application.
Figure 6 shows ChatGPT’s response to a human command prompt without prior training. The response is completely different from the expected output in the context of ROS 2 navigation. However, by providing ChatGPT with a few few-shot training prompts, it can learn to generate the JSON patterns that we expect from unstructured human commands. The accuracy of ChatGPT’s performance improves as more prompts are added.
To evaluate the effectiveness of prompt engineering and design, we conducted a series of experiments on ChatGPT. Using only three carefully crafted sample prompts, we trained ChatGPT to produce the desired context-specific output. Detailed excerpts from ChatGPT’s prompt training and validation are provided in the Appendices section to demonstrate the effectiveness of our ontology-based approach.
Case 1: Without using ontology keywords in the prompts: Figure 7 shows a few-shot training example where the keyword "ontology" is not used (refer to [Appendix 1: ROSGPT:ChatGPT]).
Case 2: With ontology keywords used in the prompts: Figure 8 shows a few-shot training example where the keyword "ontology" is used (refer to [Appendix 2: ROSGPT:ChatGPT Ontology]).
Figure 7. Example of Few-Shot Learning (without ontology).
Figure 8. Example of Few-Shot Learning with Ontology.
Observation 1: Elicitation and Adaptability Capabilities. Our analysis of the outputs in the Appendices confirms that ChatGPT exhibits strong elicitation ability, as it accurately learned the required output patterns from a sample of only three prompt examples. This finding highlights the remarkable capacity of the model to generalize from limited training data and adapt to new and diverse natural language commands. In particular, we observed that ChatGPT consistently generated correct JSON commands in response to various novel and distinct human commands, indicating its robustness and versatility as a language model.
Consider the following two examples of unseen prompts that demonstrate ChatGPT’s technical strength and adaptability in generating JSON-serialized commands for human-robot interaction:
1. Prompt: Move 2.4 meters for 2 seconds
Response:
{
"action": "move",
"params": {
"linear_speed": 1.2,
"distance": 2.4,
"is_forward": true,
"unit": "meter",
"duration": 2
}
}
In this example, ChatGPT infers the linear speed as 1.2 m/s, which is not explicitly mentioned in the prompt. This demonstrates the model’s ability to make appropriate calculations to generate accurate JSON structures based on the given information, making human-robot interaction more flexible.
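The inferred value follows from a simple speed = distance / duration calculation, which can be checked directly (a trivial verification in Python):

# Speed implied by the prompt: 2.4 meters covered in 2 seconds.
distance_m = 2.4
duration_s = 2
linear_speed = distance_m / duration_s   # 1.2 m/s, matching the JSON output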
2. Prompt: Move backward 15 meters at 20 km per hour
Response:
{
"action": "move",
"params": {
"linear_speed": -5.56,
"distance": 15,
"is_forward": false,
"unit": "meter"
}
}
In this example, ChatGPT converts the speed from km/h to m/s and makes the linear speed negative to account for the backward movement. This demonstrates the model’s understanding of the underlying concepts and its ability to adapt to different types of human prompts, even when the input uses different units.
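The conversion can be reproduced directly: 20 km/h corresponds to 20 * 1000 / 3600 ≈ 5.56 m/s, negated here because the motion is backward:

# Reproducing the unit conversion performed by the model.
speed_kmh = 20
speed_ms = speed_kmh * 1000 / 3600       # ≈ 5.556 m/s
linear_speed = -round(speed_ms, 2)       # -5.56, matching the JSON output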
These examples illustrate the technical strength and flexibility of ChatGPT in processing and responding to human prompts, making human-robot interaction more efficient and user-friendly.
Observation 2: Ontology-Enhanced Contextual Accuracy. In this scenario, we examined the output produced by ChatGPT in response to a human command without incorporating the ontology keyword in the few-shot learning sample. This test aimed to assess how well the language model could generate structured robotic commands without the guidance provided by an ontology.
In Figure 9, the results show that ChatGPT generated the action "take_picture," even though this specific action was not defined within the learning sample. This outcome highlights the potential limitations of the model in interpreting and generating contextually accurate commands when not guided by a structured framework like an ontology. Without the ontology keyword, ChatGPT generates actions based on its own understanding, which may not always align with the desired actions defined in the learning sample. Consequently, this may result in outputs that do not adhere to the specific constraints and requirements defined in the learning sample.
On the other hand, Figure 10 depicts the output generated by ChatGPT in response to a human command that utilizes the ontology keyword within the few-shot learning sample. In this scenario, the model refrains from generating the action "take_picture," since it is not defined as a valid action in the learning sample. Incorporating the ontology effectively constrains the model’s output, ensuring that it aligns with the desired patterns and adheres to the context-specific requirements.
This experimental study illustrates the important role of ontology in guiding large language models such as ChatGPT to generate contextually relevant and accurate structured robotic commands. By incorporating ontology and other structured frameworks in the training and fine-tuning processes, we can significantly enhance the model’s ability to generate outputs that are both consistent with the application context and compliant with the predefined constraints and requirements.
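As a complement to ontology-based prompting, the generated commands can also be checked programmatically before they are forwarded to the robot. The sketch below illustrates such a post-hoc check; the set of allowed actions and the function name validate_command are assumptions for illustration, not the actual ROSGPT implementation:

import json

# Actions permitted by the ontology; the concrete set here is an assumed example.
ALLOWED_ACTIONS = {"move", "rotate", "go_to_goal"}

def validate_command(raw_response):
    # Reject responses that are not valid JSON or whose action is outside the ontology.
    try:
        command = json.loads(raw_response)
    except json.JSONDecodeError:
        return None
    if command.get("action") not in ALLOWED_ACTIONS:
        return None  # e.g., an unexpected "take_picture" action would be rejected here
    return command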
Observation 3: Unpredictable Hallucinations Limitation. Our experiments revealed a limitation of ChatGPT when using ontology-based prompting: the model can sometimes be unexpectedly confused by the ontology, leading to errors or omissions in its responses. This phenomenon is known as "hallucination" in the language-modeling literature.
In one of our experiments, we observed a clear example of hallucination, as shown in Figure 11. When prompted to "go to the bathroom," ChatGPT mistakenly followed the ontology statement that the target location could only be "Kitchen," which happened to be an example value in the training sample. As a result, the model either generated no response or provided an incorrect response due to the deviation from the ontology.
This limitation results from the fact that ChatGPT, like all language models, is trained on a finite dataset and is, therefore, prone to biases and inaccuracies in its understanding of language. In this case, the model’s training on the ontology led it to expect only specific values for the target location, which caused it to fail when faced with an unexpected value.
However, it is worth noting that ChatGPT was able to behave correctly when presented with the same ontology and human prompt, as shown in Figure 12. This suggests that the model can understand and follow ontologies that are properly specified and not subject to biases or anomalies.
This observation underscores the importance of carefully considering the potential for unexpected behavior in ChatGPT when developing human-robot interaction models that rely on these language models.
Our findings highlight the need for careful design and validation of ontologies when using them to prompt language models like ChatGPT. The limitations of training data and potential biases in ontologies can significantly impact the performance of these models and must be taken into account in their use and interpretation.