1. Introduction
In the design of the regional development plan of the Gorenjska region, the brainstorming process plays an important role in the initial steps with the goal of providing innovative ideas that would benefit the community [
1,
2,
3,
4]. New tools such as ChatGPT [
2] present the opportunity to enhance group brainstorming sessions with the aid of AI. In present paper, we do not use ChatGPT interface directly, but rather consider the integration of the existing software tools for brainstorming, in our case tool “kresilnik” with OpenAI API which enables advanced use of the Generative Pre-trained Transformer (GPT) models. In the previous cycle of the design of the regional development plan of the Gorenjska region [
3] the tool »kresilnik« was used to gather, categorize, and prioritize the innovative ideas in the initial stage of the regional development plan design. In present research, we propose the concept of hybrid brainstorming sessions as well as the concept of realization of the hybrid brainstorming tools.
Classical brainstorming sessions will never be the same since the invention of ChatGPT [
5,
6,
7,
8,
9,
10]. This has a profound impact on the brainstorming methodology[
11,
12]; therefore, it is our intention to provide a novel, hybrid framework for the idea-generation process.
To test the feasibility of the augmentation of the »kresilnik« brainstorming tool, the software system design will be defined, which will enable integration of the tool with the OpenAI-GPT-3.5-turbo model via web socket over a secure API-key encrypted connection.
In each brainstorming session, we start with the initial question or call for ideas. When using OpenAI-GPT-3.5-turbo API, the proper prompt should be formed.
The initial question that was posed to the OpenAI GPT-3.5-turbo model was the following: “In the period up to 2034, with what activities will we take advantage of the strengths and eliminate the weaknesses of the Gorenjska region in the field of human resources development? Please give one idea according to the principles of brainstorming.” Stated initial question was similar to the one posed to the human participants’ group in 2018, only the year was changed from 2027 to 2034, and the statement “Please give one idea according to the principles of brainstorming.” was added, which was actually also stated to the real committee participants in 2018. Therefore one could consider that the same question was posed here.
The question, stated in the above text in English, analyzed by the OpenAI tokenizer [
13] of the original Slovenian language, yielded 84 tokens, 203 characters. The tokens are marked with different colors, which is shown in
Figure 1.
Figure 1.
Tokenization of the initial brainstorming question.
Figure 1.
Tokenization of the initial brainstorming question.
“The GPT family of models process text using tokens, which are common sequences of characters found in text. The models understand the statistical relationships between these tokens, and excel at producing the next token in a sequence of tokens.” [
13]
We should also mention that this initial question was stated in Slovene language, which is exceptional. The real, i.e. human group counted 8 members, which generated 95 ideas in 24 minutes. In our experiments, the sets of 95 ideas were generated under different GPT temperature parameters.
Generated ideas by the OpenAI-GPT-3.5-turbo model will be examined by the generation of the word cloud at different model temperature parameters. Temperature governs the randomness and, thus, the creativity of the responses. Large language models predict the next best word when the initial prompt is provided, one word at a time. The model assigns a probability to each word in the model vocabulary and picks among those words. With temperature 0, the variation of the word selection would be small; the algorithm would try to pick the word with the highest probability. A higher temperature would result in the selection of words with slightly lower probability which would lead to more variation, randomness, and creativity [
14]. If one wants to experiment and create many variations quickly, a high temperature would be better.
With the developed “kresilnik” system, the 8x95=760 ideas were generated at eight different gpt-3.5-turbo model temperatures (0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75) in order to examine the functioning of the gpt-3.5-turbo model and the appropriateness of the generated results at the different temperatures. We examined how different temperatures influence the innovativeness of the ideas and whether ideas might be applicable to the design of the Gorenjska regional development plan.
In order to distinguish between the process of idea generation by human group and AI, the entropy of generated ideas has been determined. In our previous research, the time needed to generate idea by participants were recorded, enabling us to compare the human vs. artificial intelligence generation process. We have also observed the frequency distributions of the entropy H as well as the correlation between the length of the generated ideas and time needed and corresponding distributions.
In the present research unique comparison has been provided between the human group and AI system in the process of generating innovative ideas. Provided results will enable the development of novel information systems, enhance the brainstorming methodology and contribute to the means of the detection of human-generated and artificially generated ideas.
2. Methodology and Tools
Figure 2 shows a part of the brainstorming process flowchart that is modified with the addition of the possibility of generating new ideas via OpenAI API exploiting the
gpt-3.5-turbo model. Initial brainstorming steps stay the same i.e. inviting the participants, stating the rules etc. However, after the Call for ideas, one could generate ideas with the aid of OpenAI GPT model. These ideas can, later on, be checked, augmented, and corrected by the participants. When the idea is edited, one should check whether it is unique and then passed on to the next classic steps of brainstorming.
Figure 2.
A flowchart for conducting a brainstorming session with the aid of OpenAI. Only a fragment of an entire brainstorming session is shown, including the OpenAI application.
Figure 2.
A flowchart for conducting a brainstorming session with the aid of OpenAI. Only a fragment of an entire brainstorming session is shown, including the OpenAI application.
An important addition to the classic brainstorming process is the possibility of acquiring innovative ideas with the OpenAI ChatGPT API (Cahan & Treutlein, 2023) providing the methodological framework for hybrid brainstorming process. One should note, that in the real world, the participants of the professional committees might generate only one idea in the timeframe of 30 minutes [
1,
4].
Figure 3 shows the design of the hybrid [
15,
16,
17,
18,
19] human-OpenAI tool for idea generation named “kresilnik”. The system consists of the server-side infrastructure, OpenAI API and User interface. The server runs Ubuntu Linux with node.js. The user can generate ideas with the help of OpenAI by pressing the corresponding button (1). The request is transmitted via web socket to the server. The server then issues an asynchronous call of the OpenAI API function (2). OpenAI API provides the response via the secure connection encrypted by the secret OpenAI API key (3). After the receipt of the results, i.e., generated idea, the idea is transmitted to the client that has requested it via web socket (4). Generated idea is displayed in the user interface (5), where the user can edit the generated idea and add or remove a particular part of the generated idea. After the idea is checked by the user, the user can send the idea to the server (6). The server then distributes the generated and approved idea to all participants in the brainstorming group. Users can also provide their own ideas or work in hybrid mode, merging the generated and own ideas.
Figure 3.
Design of the hybrid human-OpenAI tool for idea generation.
Figure 3.
Design of the hybrid human-OpenAI tool for idea generation.
In order to compare the set of ideas generated by the members of the real regional development committee and ideas generated with the aid of GPT model at different values of temperature parameter, the entropy of the particular idea was determined by the following equation:
where
represents the frequency of the i-th character occurrence within the idea text and
the length of the idea.
The connection to the OpenAI GPT system was realized by the asynchronous API call to OpenAI GPT-2.5-turbo model. The code of the function
createChatCompletion with defined model and messages is shown in
Figure 4.
Figure 4.
OpenAI GPT-3.5-turbo API call code.
Figure 4.
OpenAI GPT-3.5-turbo API call code.
Figure 5 shows user interface mockup on iPhone of the “kresilnik” prepared for the resolution 375 x 667 px in Slovene language. This is a prototype, a fully functional version of the system with added OpenAI GPT functionality. One can observe the number “19593” in front of the generated idea, which is the number of milliseconds passed from pressing the button “Generiraj idejo” ~ “Generate Idea” until the idea is displayed in the above text box.
Figure 5.
User interface mockup on iPhone 375 x 667 px in Slovene language. “Vnesite idejo” = “Enter idea”, “Povezava s strežnikom je vzpostavljena” = “Connection to the server is established”, “Pošlji idejo” = “Send idea”, “Generiraj idejo” = “Generate idea”.
Figure 5.
User interface mockup on iPhone 375 x 667 px in Slovene language. “Vnesite idejo” = “Enter idea”, “Povezava s strežnikom je vzpostavljena” = “Connection to the server is established”, “Pošlji idejo” = “Send idea”, “Generiraj idejo” = “Generate idea”.
The user interface was developed with html and JavaScript for mobile and stationary devices, which is convenient for the users.
3. Results
Figure 6 shows the word cloud for 95 ideas generated by Gorenjska regional development committee in the year 2018. One can observe that the real committee has proposed “elderly” while ideas generated by GPT API did not emphasize this topic. One should also consider that the “Human Resources” topic could be understood specifically for the Gorenjska region in particular timespan, i.e., year 2018. The presented word cloud in
Figure 6 can be applied as the reference point to compare the ideas generated by OpenAI GPT API.
Figure 6.
Word cloud for 95 ideas generated by Gorenjska regional development committee in the year 2018.
Figure 6.
Word cloud for 95 ideas generated by Gorenjska regional development committee in the year 2018.
Figure 7 shows word clouds for 95 ideas generated by OpenAI GPT API for temperatures 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, here top left is for temperature 0. Picture in row 1, column 2 shows temperature 0.25 etc. At the start, where the temperature is low, several keywords are emphasized. The larger the text, the higher the frequency of specific keywords. With increasing temperatures, the importance of the keywords is distributed more evenly. Interestingly, word cloud extracts meaningful keywords even at the temperature of 1.75, where generated text is in a significant portion rather random. Here word cloud might be considered as the proper filtering method to extract helpful suggestions from the OpenAI GPT API.
One could observe, that at the temperature=1.5, the GPT API sometimes returns the “halucinational” results [
20], providing some English text passes:
(Our instrumentalization however, may call for introducing mechanisms that bolostequally sitintparenthood and We),
some Chinese simplified text:
(此公海冬请OA原图帝劣因 不只有斢果),
and fragments of the computer code:
(if(this.href.indexOf(’#ghost’)>-1){bref=location.href.split(’#’)[0];anchor=csv) and apparently random characters (ingmethodamaupzerðrrazmislekitoliko).
Nevertheless, some ideas generated at the temperature=1.5 are still impressively innovative and original, such as: “Creating events and study programs at Gorenjska public schools that focus on the 21st century with the aim of creating a new generation force that can stay in the region and promote the sustainability and development of the wider community.”
At the temperature=1.75 the parts of generated ideas are hardly usefull. Here, the level of “halucination” is rather very high, mixing different languages and apparently random strings of characters. Example of such an idea (“Opening a local training center or support system for small and medium-sized enterprises. A joint program would be developed as IQSEJA”):
“Odpiranje lokalnega izobraževalnega centra ali podpornega sistema za mala in srednja podjetja.Razviti bi bil skupni program kot IQSEJA-lta karton kurslarī bu verstandrevalktu Karivgenlassalirdizilani na ovo prodručb iz voulksi teridas tyukeućih. Toringg treneniiisisk ….”
Similar for the following case (We should take advantage of Gorenjska mountains because of winte123):
“Izkoristiti bi morali vrline gorenjsKIH gor zaradi zims123_k_o nivojsanju griDen motena na lv2cjvh rjugin vsxt-g mq523 kvula.zros bo232 @hm5276 zag917ki sod zb GAIM327-LKO.(Naše verige besedic "---ccc---, "---rjuga(r/m/i.P)###imetnik bokeškrb --- z (#mrzlice)).EN: COVID KILLS". Lemma=GIBREL 326Q|#Y;;ISO-Sr(FSK/T31089/O2384ISR, osalz4M332%)”
One could observe, that starting text of the idea, marked with bold, could even be useful, however, the following text appears rather random. System could also generate emojis such as 🐴,✨ and ❌ at temperature 1.75:
“(TES(DSV29tkzaapmaEn🐴zypoBenBAcrahslAdSh | Chelah!✨ | Neveljavana Id❌).”
Besides, the randomness in the proposed ideas might trigger positive associations with interacting subjects providing new, innovative ideas.
At the temperature=1.75, even a useful idea might be generated, such as: “Establishment of ‘mondene’ Gorenjska market place.” Or for example: “Digital knowledge for Gorenjska competitiveness. To enable the introduction of informatics as a more comprehensive subject already in primary schools and the planned orientation of educated computer scientists…” following, again with the random text.
One could anticipate a post idea generation filter, that would extract useful text. Here, the real human subjects in the brainstorming session might filter out the good parts of the text. Certainly, the algorithm could be useful here, but somehow challenging to extract the good parts. Here the possibility for the useful application of AI with the aid of human actors could be usefully exploited.
At the temperature=0, the system returns the ideas that are in part or even as a whole completely the same; in the next example marked in bold. Some parts of the idea might still be different, for example:
The establishment of a regional centre for career development, which will provide free counseling and education for various target groups, from school and university students to the unemployed and employed, and connect employers with job seekers.
The establishment of a regional centre for career development and education, which will connect educational institutions, companies and other organizations and provide training, mentoring and opportunities to gain experience for young and experienced workers.
However, at the temperature=0, all 95 ideas start with the “The establishment of a regional center…” This might be some hint from the previous regional development plans, which might include such ideas.
One could observe that the ideas somehow repeat, and the innovativeness might be questionable. The members of the real committees might know all the areas. However, if you would compare one member to GPT set of ideas, it could happen that a particular, important area would be omitted. Therefore, the GPT could be perceived as a useful tool for participants to consider wider scope of the initial brainstorming question.
Figure 7.
Word cloud for 95 ideas generated by GPT API for temperatures 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, here top left is for temp. 0, row 1, col 2 is for temp 0.25.
Figure 7.
Word cloud for 95 ideas generated by GPT API for temperatures 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, here top left is for temp. 0, row 1, col 2 is for temp 0.25.
For the set of generated ideas, the entropy of each idea was determined by the Equation (1). On the left side of
Figure 8, the increase of the average Entropy H computed by Equation (1) is shown. One could observe an increase in entropy at values 1.5 and 1.75. This is, in most part, due to the addition of languages other than Slovene in the results as well as lengthier generated text. On the right side of
Figure 8, one can observe the corresponding average time in milliseconds needed to generate a particular idea. From the temperature of 0.75, one could observe a steady, somewhat exponential increase in the time needed to generate a particular idea.
Figure 8.
Left) Entropy H at different GPT-3 model temperatures Right) Average time of idea generation in ms, N=95.
Figure 8.
Left) Entropy H at different GPT-3 model temperatures Right) Average time of idea generation in ms, N=95.
The average entropy of generated ideas by GPT API ranged from H = 4.247 bit up to H = 4.579 bit, while the entropy of a human committee was H = 4.567 bit. In general, one could conclude, that the entropy H increases with the temperature, which is more apparent at temperatures higher than 1. With the correlation coefficient r2 = 0.97 one could conclude that the ideas with the higher entropy, i.e. more innovative, take up more time to generate.
Figure 9 shows entropy
H histograms of the Human Group (humanGroup) and OpenAI- generated sets of ideas from temperature 0 (temp0) to temperature 1.75 (temp1.75). On the y-axis, the absolute frequency of the H bins is shown. One could observe different shapes of the distributions.
Figure 9.
Histograms of the entropy H for Human Group (humanGroup) and OpenAI generated sets of ideas from temperature 0 (temp0) to temperature 1.75 (temp1.75).
Figure 9.
Histograms of the entropy H for Human Group (humanGroup) and OpenAI generated sets of ideas from temperature 0 (temp0) to temperature 1.75 (temp1.75).
Table 1 shows the results of the Shapiro-Wilk test for Human group (humanGroup) and OpenAI results for temperatures from 0 to 1.75 with the step 0.25. W statistics represent a measure of how well the ordered and standardized sample quantiles fit the standard normal quantiles. One could observe that entropy distributions at the temperatures 0.5, 1 and 1.25 fulfil the criteria of normality.
Table 1.
Results of the Shapiro-Wilk test for Human group (humanGroup) and OpenAI results for temperatures from 0 to 1.75 with the step 0.25.
Table 1.
Results of the Shapiro-Wilk test for Human group (humanGroup) and OpenAI results for temperatures from 0 to 1.75 with the step 0.25.
Group | temp. |
W |
pValue |
humanGroup |
0.958 |
0.004 |
temp0 |
0.704 |
0.000 |
temp0.25 |
0.973 |
0.048 |
temp0.5 |
0.996 |
0.995 |
temp0.75 |
0.934 |
0.000 |
temp1 |
0.985 |
0.364 |
temp1.25 |
0.992 |
0.824 |
temp1.5 |
0.810 |
0.000 |
temp1.75 |
0.902 |
0.000 |
While the human group distribution of generated ideas entropy was not Gaussian (normal), this might be an indicator that the set of ideas was artificially generated.
The non-parametric Mann-Whitney U-test was conducted to assess the similarity between the entropy of ideas (H) within the human group and the ideas generated by OpenAI. At the level of p=0.001 none of the entropy sets matched the Human Group entropy (U-stat: temp0=2353.0, temp0.25=2341.0, temp0.5=2114.0, temp0.75=2228.0, temp1=1851.0, temp1.25=1809.0, temp1.5=1253.0, temp1.75=708.0).
Figure 10 shows the correlation between the length of generated ideas measured by the number of characters and the time needed to generate ideas in milliseconds [ms]. The linear trendline is added as well as r
2 (r2). The first correlation subplot shows the human group. Here the r
2 = 0.11. The r
2 for temperature T0 is 0.01 which might be attributed to the fact, that the results of OpenAI API were most deterministic and that the time needed to generate the idea was in part dependent on the network latency. The times to generate the ideas at T0 were also the shortest. A similar situation might be speculated at T0.25 however, here clear trend with r
2 = 0.47 could be observed. For other temperatures, from T0.5 to T1.75 higher values of correlation coefficient could be observed.
Figure 10.
Correlation between length of generated ideas (number of characters) and time to generate ideas in milliseconds [ms]. r2 represents the squared value r2.
Figure 10.
Correlation between length of generated ideas (number of characters) and time to generate ideas in milliseconds [ms]. r2 represents the squared value r2.
By observing the correlation plots, distinction could be made between the process where the human group is involved and OpenAI on the other hand.
If we look further into the differences of the idea generation process, we might inspect the distribution of time needed for generating ideas in milliseconds [ms] which are shown in the
Figure 11. On the x-axis of the graphs the time needed to generate ideas is shown while on the y-axis the absolute frequency is represented. One could observe that the distribution of times is not symmetrical for the human group. One could anticipate an exponential distribution of the times needed to generate ideas. For temperatures from T0 to T1.25 the distributions are more symmetrical. For temperatures T1.5 and T1.75, the distributions become asymmetrical again. Combined with the analysis of the correlations between ideas’ length and times with the distribution, the distinction between human and artificial processes might become more precise.
Figure 11.
Distribution of time needed for generating ideas in milliseconds [ms].
Figure 11.
Distribution of time needed for generating ideas in milliseconds [ms].
A similar situation might be observed if we consider the distributions of the number of characters in generated ideas which is shown in
Figure 12. Again, the distribution of the human group is distinctively asymmetrical.
With the combined analysis of the correlation between times needed to generate ideas and the length of ideas in addition to described distribution analysis of, times, lengths and entropy, the distinction could be provided in order to determine, weather the underlying process is human or artificial.
Figure 12.
Distribution of the number of characters in generated ideas.
Figure 12.
Distribution of the number of characters in generated ideas.
With the rise of the AI, the identification of the idea generation process is important not only for the determination of the idea’s origin [
21,
22,
23,
24,
25,
26,
27] but also for better understanding of the human innovation group processes.