Using Machine Learning to Information Classification Related on Child-Rearing of Infants from Twitter

It is difficult to obtain necessary information accurately from Social Networking Service (SNS) while raising children, and it is thought that there is a certain demand for the development of a system that presents appropriate information to users according to the child's developmental stage. There are still few examples of research on knowledge extraction that focuses on childcare. This research aims to develop a system that extracts and presents useful knowledge for people who are actually raising children, using texts about childcare posted on Twitter. In many systems numbers in text data are just strings like words and are normalized to zero or simply ignored. In this paper, we created a set of tweet texts and a set of profiles created according to the developmental stages of infants from "0-year-old child" to "6-year-old child". For each set, we used Support Vector Machine (SVM), and Bidirectional Encoder Representations from Transformers (BERT), a neural language model, to construct a classification model that predicts numbers from "0" to "6" from sentences. The accuracy rate predicted by the BERT classifier was slightly higher than that of the SVM classifier, indicating that the BERT classification method was better.

Keywords:

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

This research aims to develop a system that extracts and presents useful knowledge to people who are actually raising children, using texts on childcare posted on SNS. The information is necessary for childcare changes from moment to moment according to the developmental stage of the child. Even in conventional media such as books and magazines, a lot of information about childcare is provided, but it is difficult to accurately obtain the necessary information at that time, and a system that presents appropriate information to the used is needed. For this problem, we aim to develop a system that collects information on childcare and appropriately presents it according to the user's situation, targeting SNS, especially Twitter. On Facebook, there are many long-form posts about business information and more detailed status reports. Compared to Facebook, many people use Twitter for the purpose of catching the latest information because the freshness of information is higher on Twitter. Users in their 20s to 40s, who are particularly sensitive to trends, use it because they can get real-time information. Since it is easy to post short 140-character posts that you can tweet as soon as you think of it, it seems that there is a lot of real-time information about infants posted by parents who are raising children in their 20s to 40s. Many live voices from parents who are actually raising children are posted on SNS, and it is thought that many posts are in the same situation as the user. It is thought that more useful knowledge can be presented to users by using the actual experiences of parents in similar circumstances.

There are still few examples of research on knowledge extraction focusing on child care, for example, there [1] is research to predicting life events from SNS. They [2] contributed a codebook to identify life event disclosures and build regression models on the factors that explain life event disclosures for self-reported life events in a Facebook dataset of 14,000 posts. Choudhur et al. [3] collected the users posted about their "engagement" on Twitter and analyzed the changes in words and posts used. Burke et al. [4] also analyzed users who have experienced "unemployed" by advertising or email on Facebook, and analyzed the activities on Facebook before changing stress and taking new jobs. but in this research, we collect texts specialized in childcare. We aim to develop a more accurate method by conducting the analysis. As a method, we mainly use natural language processing technology using neural networks, which have been rapidly developing in recent years, especially.

Our contribution can be summarized as follows.

(a) We aim to develop a method that can perform more semantically accurate analysis by using techniques that can accurately handle numerical expressions related to childcare (“2-year-old child”, “37 ° C”, “100 ml”, etc.).

(b) By using the profile information of the user who posted the text together with the text and grasping the attribute information of the poster, we aim to develop a method that emphasizes the text that is closer to the user's situation.

In these two points, we think that the research will be highly novel in terms of method.

2. Related Work

The relationship between numerals and words in text data has received less attention than other areas of natural language processing. Both words and numerals are tokens found in almost every document, but each has different characteristics. However, less attention is paid to numbers in texts. In many systems, numbers treated in an ad-hoc way, documents are just strings like words, normalized to zero, or simply ignored them.

Information Extraction (IE) [5] is a question-answering task that asks the a priori question, "Extract all dates and event locations in a given document." Since much of the information extracted is numeric, special treatment of numbers often improves performance on IE systems. For example, Anton Bakalov and Fuxman [6] proposed a system to extract numerical attributes of objects given attribute names, seed entities, and related Web pages and properly distinguish the attributes having similar values.

BERT [7] is one of the pre-learning models in natural language processing for the large text corpus using the neural network called Pre-training of Bidirectional Trans-formers to fine-tune for each task. The pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. The method of outputting a fixed-dimensional vector regardless of sentence length has the advantage that the more words input has, the more of each word will be in the output vector. In recent years, it has been attracting attention as a versatile language model because it has demonstrated the highest level of results in a wide range of natural language processing tasks. BERT is based on Transformer. The transformer is a neural machine learning model using the attention mechanism. Manually building large dataset with human attribute labels is ex-pensive, BERT does not need such a human label [8]. Zhang et al. [9] identify contextual information in pre-training and numeracy as two key factors affecting their performance, the simple method of canonicalizing numbers can have a significant effect on the results.

3. Proposed Method

The method proposed in this research classifies tweets about child-rearing posted on Twitter into classes. The procedure is shown below.

●Acquire a large amount of Japanese text data from Twitter.

●Terms related to child care are selected, and two types of texts (tweets) containing those terms and profiles are collected. We created a set of texts divided into children's developmental stages.

●Using SVMs and the neural language model BERT, we build a classification model that predicts numbers from "0" to "6" from sentences.

Figure 1 shows a simple flow for classifying tweets related to child-rearing.

3.1. Data collection

In this research, we use Twitter's API to obtain texts that target child-rearing information from Twitter. Next, we select terms related to parenting and screen texts (tweets) containing those terms. Parenting terms are shown in Table 1.

Two types of text collections are formed, one for containing terms in tweets and the other for containing terms in profiles.　Search for the character string "*-year-old child" and create a set of texts classified by the developmental stage of infants from "0-year-old child" to "6-year-old child". Table 2 and Table 3 show part of the tweet text set and profile set created according to the developmental stages of infants.

3.2. Preprocessing

For the data, the tweet text and profile are used as they are. We have not done any processing such as removing hashtags. Also, information such as retweets and mentions are not used.

Data pre-processing is performed in order to perform machine learning on the set of created tweet texts and profiles. In preprocessing, morphological analysis is performed with MeCab, a tool specialized for Japanese language analysis, and the document is vectorized using the Scikit-learn library TfidfVectorizer. we used Unigram features.

Morphological analysis is the task of breaking down the words that make up a sentence into the smallest units and dividing and writing the sentences. Given a set of documents, TfidfVectorizer converts each document into a vector based on the TF-IDF values. TF (Term Frequency) represents the frequency of occurrence of words in a document. IDF (Inverse Document Frequency) is a score that lowers the importance of a word that appears in a large number of documents. TF-IDF is a metric that is a product of TF and IDF [2].

3.3. Classification Method

BERT+fine tuning is used in this research, so it is neural supervised learning. We used the supervised SVM model to compare with the pre-trained model with fine tuning BERT model. We used SVM because it is a standard non-neural algorithm.

Classifier A and classifier B are created using two classification methods, SVM and BERT.

3.3.1. SVM classification method

The classifier A is constructed using the data divided for each developmental stage of infants. For the text categorization task, we used the SVM algorithm and the implementation used the Python machine learning library Scikit-learn to predict the numbers "0" to "6" from the sentences to build a classifier A. Classifier A using SVM creates two types: Task Ts (tweet), which uses tweet sentences as data, and Task Ps (profile), which uses profiles.

3.3.2. Classification method by BERT

As a language model, the neural language model BERT has been actively studied in recent years. For the constructing classification model B, we used neural language model BERT to classify from "0" to "6”. Two types of classifier B using BERT are created: task Tb (tweet), which uses tweet text as data, and task Pb (Profile), which uses profiles. The BERT model is required to create a classification model. However, it is difficult to prepare a sufficient amount of data set for pre-training to create a model specialized for numerical classification from "0" to "6", so it's a need to fine-tune the pre-trained model. A classifier is created by fine-tuning a pre-trained model.

In BERT, a model specialized for a specific task can be configured by fine-tuning using supervised data for each task, so performance improvement can be expected compared to applying a pre-trained model as it is. As a pretrained model uses a BERT-Base model with 110M parameters and a large model size. The data used for fine-tuning are a set of tweet texts and profile sets created by the developmental stages of infants from 0 to 6 years old. The Task Tb uses the tweet text set, and the Task Pb uses the profile set for fine-tuning.

4. Experimental setup and results

4.1. Classification method by BERT

4.1.1. Data used in experiments

For the experiment, we used tweets about child-rearing collected using Twitter's API. The collection period is from July 19, 2017 to October 31, 2017. At that time, we collected tweet texts and profiles containing any of the words shown in Table 1.

Regarding the data size, the profile data is 1575 items and the size is 376㎅, The vocabulary size of profile is 78720 words. The data of the tweet text is 953 items, and the size is 238㎅, The vocabulary size of tweet is 55378 words. Items of each category of Tweet data profile data are shown in the Table 4.

4.1.2. Experiment details

In the classification experiment, Task Tb and Pb described in Section 3.3.2 are used to classify tweet texts or profiles described in Section 4.1.1, and the performance is compared with classifiers other than BERT.

Classifier B, which uses BERT, classifies the data as described in the classifier. The Task Pb is fine-tuned on the profile set and classifies the profiles of users who are raising infants aged 0 to 6 years. One of the eight parts of the training data is used for verification and the rest for fine-tuning. We checked the classification results in each case. For each classifier output, the result output by the BERT classifier is the normalized probabilities ranging from 0 to 1 for each label, summing to 1. As for the classification results, the one with the highest probability of each label for the input is taken as the output.

We also use SVM classifier A for performance comparison. Nonlinear SVM is to map nonlinear data to a space that becomes linearly separable and linearly separable on a hyperplane. In order to process the Japanese sentences of the tweets to be classified, the input text is converted to vectors. When we classify with scikit-learn's SVM module, we first normalized with StandardScaler and then used the default parameters. The nonlinear SVM classifies each label from "0" to "6" in the identification space according to which region it belongs.

We performed a 5-fold cross-validation on the train set using the train-test split on the profile data and the tweet body data. 80% of the tweets and profile used in the experiment were used as learning/verification data, and 20% as test data. Then, using stratified 5-fold cross-validation, the training data is divided into 5 so that the ratio of labels in each division is the same as the overall ratio, and 4 of the 5 are training data, 1 is used as validation data. to train and evaluate the model.

Once evaluation and training are completed, one of the four training data and validation data are replaced, and the model is trained and evaluated again. By doing this five times and obtaining the average classification accuracy of the five times, and using that value as the classification accuracy, it is possible to perform a robust evaluation that does not depend on the division of the data.

Regarding the adjustment of the parameters of each machine learning method, the above-mentioned cross-validation is performed for all combinations of parameters specified in advance using grid search, and the parameter model that shows the best classification accuracy is generated. Finally, we tested how well each generated model could classify the test data.

4.2. Experimental results

4.2.1. The classification results of the created classifier A.

We performed a 5-fold cross-validation on the train set using the train-test split on the profile data and the tweet body data. Finally, here are the test set results:

Profile data test set results:

Best score on validation set: 0.5834370306801115

Best parameters: {'gamma': 0.01, 'C': 100}

Test set score with best parameters: 0.5736040609137056

Results from a test set of tweet body data:

Best score on validation set: 0.27586920122131386

Best parameters: {'gamma': 0.01, 'C': 100}

Test set score with best parameters: 0.30962343096234307

The confusion matrix is shown in Table 5.

Regarding the evaluation of test data, we used three values: accuracy, precision, and recall. accuracy=

\frac{A + D}{A + B + C + D}

, recall=

\frac{A}{A + C}

, precision=

\frac{A}{A + B}

Table 6 shows the classification results of the created classifier A. The accuracy of the resulting model can be measured using the entire test data. We classified the profile set with classifier A and found that the average accuracy rate of Task Ts was 30%. After classifying the tweet set with classifier A, we found that the average accuracy rate of Task Ps was 57%. classification results of the classifier A created. The results show on blow Table 6.

4.2.2. Results of classifier B using BERT

Table 7 shows the classification results of the created classifier B. Classifier B training data, validation data, and evaluation data are randomly divided, so the results change each time the program is run. Table 7 shows the results obtained after many runs. The accuracy of the resulting model can be measured using the entire test data. Classification of the profile set was performed with classifier B; the average accuracy rate of Task Tb was 60%. We classified the tweet set with classifier B and found that the average accuracy rate of Task Pb was 32.6%.

From these results, it can be seen that the accuracy rate of classifier B is slightly higher than that of classifier A.

5. Conclusions

In this study, we find that the results obtained by classifying the set of tweet texts with BERT are higher than the results obtained by classifying with SVM. the results obtained by classifying the set of profile with SVM are higher than the results obtained by classifying with BERT.

In order to improve the accuracy rate, we increase the amount of data and remove noise in the data preprocessing.

In recent years, there has been a lot of research into visualizing the learning results of neural network models (research on the explain ability of models) to solve the problem of how to present language models that have actually been obtained to users. By incorporating these research results, we plan to present learning results to users.

Future plans are as follows. Collect information about infants from Twitter to increase the size of the data. Data preprocessing is to denoise text data and to remove stopwords. We will try to apply more sophisticated models such as GPT for comparison in the future.

References

Maryam, Khodabakhsh.; Fattane, Zarrinkalam.; Hossein, Fani.; Ebrahim, Bagheri.; Predicting Personal Life Events from Streaming Social Content, October 2018, the 27th ACM International Conference.
Koustuv, Saha.; Jordyn, Seybolt.; Stephen, M, Mattingly.; Talayeh, Aledavood.; Chaitanya, Konjeti.; Gonzalo, J, Martinez.; Ted, Grover.; Gloria, Mark.; Munmun De Choudhury.; What Life Events are Disclosed on Social Media, How, When, and By Whom? CHI '21: CHI Conference on Human Factors in Computing Systems, Yokohama Japan, May 8 - 13, 2021.
Choudhury, M.D.; and Massimi, M.; “She said yes!” Liminality and Engagement Announcements on Twitter, Proc. iConference 2015, Newport Beach, CA. USA, pp.1-13 (2015).
Burke, M.; and Kraut, R.; Using Facebook after Losing a Job: Differential Benefits of Strong and Weak Ties, Proc. 2013 Conf. Computer Supported Cooperative Work and Social Computing (CSCW 2013), San Antonio, TX, USA, pp.1419-1430 (2013).
Minoru, Yoshida.; Kenji, Kita.; Mining Numbers in Text: A Survey;2021. [CrossRef]
Anton, Bakalov.; Ariel, Fuxman.; Partha, Pratim, Talukdar.; Soumen, Chakrabarti.; SCAD: collective discovery of attribute values. WWW 2011: 447-456.
Jacob, Devlin.; Ming-Wei, Chang.; Kenton, Lee.; and Kristina, Toutanova.; Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint 2018. arXiv:1810.04805.
Kohei, Yamamoto.; Kazutaka, Shimada.; Acquisition of Periodic Events with Person Attributes. IALP 2020, Kuala Lumpur, Dec 4-6, 2020.
Xikun, Zhang.; Deepak, Ramachandran.; Ian, Tenney.; Yan, Elazar.; Dan, Roth: Do Language Embeddings capture Scales? EMNLP(Findings) 2020: 4889-4896.

Figure 1. Flow of proposed method.

Table 1. The Parenting terms.

Search words used to collect text	Search terms used to collect profiles
child,0-year-old child One year old,1-year-old,Two years old,２-years old,Three years old,３-years old,Four years old,４-years old,Five years old,５-years old, Six years old,6 -years old,zero-year-old	raising children, Dad, Mother, Father, my dad, my mum, raising children, child

Table 2. part of the profile text set collection from 0 to 6 years old.

Old	Part of profile data collection
0	I am a housewife raising a *-year-old child.
6	Full support for Hiroshima Cup in Tokyo. Takuya Kimura was my all-time favorite player. Mother of a * -year-old child. She is from Hiroshima Prefecture. She lives in Tokyo.
2	around four-working mother who is raising *-year-old child. I like music, Hanshin Tigers, Daichi Miura, and animals.
4	I live with Sunhora and Ayanogo.Adult.I'm quietly doing cosplay. I'm a mom of -year-old child. RT is too much.\n Archive ID: 52993
1	An adventurer who explores cognitive science and sign language with a focus on linguistics.Specializes in cognitive linguistics.Postdoc in R.With *-year-old child. Working mama first grader. http:\/\/ask.fm\/rhetorico
３	With * year-old daughter,Shinma who is taking care of her family.\nTwitter is still not working.
5	Daily mumblings of an unfortunate rotten adult who likes anime, manga and sometimes games. Childcare (*Year-old child) Mutters here and there. Currently pregnant with second child. (Scheduled for the second half of October) Gestational diabetes, hospitalization for threatened premature labor, etc. I'm already full_ (:3”∠) _

Table 3. part of the tweet text set collection from 0 to 6 years old.

Old	Part of original tweet texts
4	I'm coming Tokyo Disney, and when I saw the Coconut tree*year old boy said, "Mom! This is Hawaii!" and I laughed.
5	*Year-old boy often shakes his head. Isn't the ex-nurse mother-in-law a tic ... https:\/\/t.co\/KstYS2lcJ1
0	My daughter's album has stopped until *-year-old 5 months ...that's bad... After all, I haven't had time since I returned to work.
3	@0246monpe in May become -year-old! In the pitch-black room, I can hear the year-olds humming (groaning?) (´ｰ｀)\n I'm depressed that it will be crowded, but I'm looking forward to it and I have to take him (^◇^;)
1	@Alice_ssni*-year-old \n started crawling. he can get over my mother's body too. \n Even when my mother was lying on her back, I was able to come up to π and drink. \n thank you for the meal. Itadakimasu has a high probability of being done by yourself (when you wearing an apron then understand).
2	RT @FururiMama98: Parenting is hard and painful. Now, I have a *-year-old daughter, but it's really hard. It's really hard to kill myself and keep looking at others and hugging them. Even after chasing, cuteness: frustration = about 1:9. He is also an unbalanced eater, and is always overturned with toys around the house. How can I rest and comfort myself...
6	RT @unikunmama:🎉happy birthday unnie🎂\n I turned *-year-old today (o^^o) Let's have fun together from now on, https:\/\/t.co\/ybW9AcS4oP

Table 4. Items of each category of Tweet data and profile data.

data	age_0	age_1	age_2	age_3	age_4	age_5	age_6	total
Profile	206	572	255	170	80	85	207	1575
Tweet-	46	200	212	205	105	129	56	953

Table 5. The confusion matrix.

		True label
		positive	negative
Prediction label by SVM	positive	(A)True positive	(B)False positive
Prediction label by SVM	negative	(C)False negative	(D)True negative

True positive (A): True label positive and prediction positive (correct answer). False positive (B): True label negative and prediction positive (wrong answer). False Negative (C): True label positive and prediction negative (wrong answer). True negative (D): True label negative and prediction negative (correct answer).

Table 6. Classification results of the classifier A.

	Accuracy	precision	recall
Task Ts	0.30	0.24	0.18
Task Ps	0.57	0.83	0.42

Table 7. Classification results of the classifier B.

	Accuracy rate
Task Tb	32.6％
Task Pb	60％

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

MDPI Initiatives

Important Links

Choose an area of interest and we will send you notifications of new preprints at your preferred frequency.

Disclaimer