Questioning is a significant ability for both humans and intelligent engines. It is the key to gaining more information and is useful in many applications: we use questions to ask for information or to seek answers. The desired answers can be found in many resources, such as textbooks, encyclopedias, and Wikipedia, although a reader seeking an answer must engage much more deeply with the problem of extracting the meaning of a text in a rich sense. Readers typically look for an answer based on the type of question encountered: because a question and its corresponding answer are related through the question type, readers can often answer a question based on a keyword. By contrast, questions that use words with the same meaning make it complicated to train a text model to understand language the way humans do.

Question classification (QC) is an essential part of many applications, such as Question Answering (QA), Information Retrieval (IR), e-learning systems, and question generation. Question classification learns to match a question to one class or to multiple classes and thereby helps identify a text's answer types. It has two main approaches: the first is manual classification, which uses handmade rules to identify expected answer types; the second is automatic classification. Wh-questions help the reader identify information: for example, "Who" identifies the characters of a narrative story, "When" identifies the time at which something happens, and "Where" identifies the location. Therefore, if readers understand how to use wh-questions, they can improve their reading comprehension. Most studies focus on question classification in English; however, there is some research on other languages, such as Chinese, Arabic, Indonesian, and Thai. For the Thai language, work in this area is still at an early stage.
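The manual approach described above can be sketched as a small rule-based classifier that maps the leading wh-word of a question to an expected answer type. The categories and rules below are illustrative assumptions for this sketch, not the actual rule set of any cited system.

```python
# Minimal sketch of manual (rule-based) question classification:
# map the wh-word at the start of a question to an expected answer type.
# The rule table here is an invented, illustrative example.

WH_RULES = {
    "who": "PERSON",      # e.g. characters of a narrative story
    "when": "TIME",       # e.g. the time something happens
    "where": "LOCATION",  # e.g. the place of an event
    "how": "MANNER",
}

def classify_question(question: str) -> str:
    """Return an expected answer type based on the leading wh-word."""
    first_word = question.strip().lower().split()[0]
    return WH_RULES.get(first_word, "OTHER")

print(classify_question("Who wrote the story?"))       # PERSON
print(classify_question("Where does it take place?"))  # LOCATION
```

Handmade rules like these are precise for the patterns they cover but do not generalize, which is what motivates the automatic, learning-based approach.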
Question classification is a crucial task for answer selection. It can help define the structure of question sentences by extracting features from a sentence, such as who, when, where, and how. In recent years, large amounts of information have been required to retrieve answers via question answering applications. In this paper, we proposed a methodology to improve question classification from texts using feature selection and word embedding techniques. We conducted several experiments to evaluate the performance of the proposed methodology on two datasets (the TREC-6 dataset and a Thai sentence dataset), using term frequency and term frequency-inverse document frequency (TF-IDF) features over Unigram, Unigram + Bigram, and Unigram + Trigram representations. Machine learning models based on both traditional and deep learning classifiers were used: the traditional classification models were Multinomial Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM), and the deep learning techniques were Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Networks (CNN), and a hybrid model combining CNN and BiLSTM. The experimental results showed that our methodology based on Part-of-Speech (POS) tagging was the most effective at improving question classification accuracy: classifying question categories achieved an average micro F1-score of 0.98 with the SVM model when all POS tags were added in the TREC-6 dataset, and the highest average micro F1-score reached 0.80 when GloVe embeddings were applied with the CNN model on the Thai sentence dataset with focusing tags added.
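The classical setup described above (n-gram TF-IDF features with an SVM, plus POS tags appended to the text) can be sketched as follows. The tiny dataset, its labels, and the toy POS tagger are invented for illustration only; the real experiments used the TREC-6 dataset and a Thai sentence corpus with a proper POS tagger.

```python
# Hedged sketch of the automatic approach: TF-IDF over Unigram + Bigram
# features fed to a linear SVM, with POS tags appended to the words.
# All data and the tagger below are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy tagger standing in for a real POS tagger (assumption for this sketch).
TOY_TAGS = {"who": "WP", "when": "WRB", "where": "WRB"}

def add_pos_tags(question: str) -> str:
    """Append a POS tag to each word it is known for, e.g. 'Who/WP'."""
    return " ".join(
        f"{w}/{TOY_TAGS[w.lower()]}" if w.lower() in TOY_TAGS else w
        for w in question.split()
    )

# Invented toy training data; the paper used TREC-6 and a Thai corpus.
questions = [
    "Who discovered penicillin ?",
    "Who painted the ceiling ?",
    "When did the war end ?",
    "When was the bridge built ?",
    "Where is the museum located ?",
    "Where does the river begin ?",
]
labels = ["HUMAN", "HUMAN", "TIME", "TIME", "LOCATION", "LOCATION"]

clf = Pipeline([
    # Unigram + Bigram term weighting, one of the feature sets compared.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("svm", LinearSVC()),
])
clf.fit([add_pos_tags(q) for q in questions], labels)

print(clf.predict([add_pos_tags("When was the museum opened ?")])[0])
```

Appending POS tags gives the vectorizer extra tokens (e.g. `Who/WP` yields both `who` and `WP`), which is one simple way the tag information can reach an n-gram model.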