Chatbot Designing Information Service for New Student Registration Based on AIML and Machine Learning

Abstract


Introduction
Admitting new students is a routine activity carried out by higher education providers. Because it is the first step of the academic process, student admission must be carried out well and optimally. One such effort is providing a new student registration website so that prospective students can get information about the registration process and flow, as well as any other information they need. Institutions also provide consultation and information services that answer prospective students' questions, meeting their information needs and increasing their confidence in the tertiary institution.
Consultation and information services are usually provided in person at a booth, by telephone, or through the live chat support available on the college website. Because these services involve a conversation between two people that may lead to further discussion, an increase in service users lengthens queues and waiting times once demand exceeds the capacity of the available staff, which lowers the satisfaction of prospective new students; moreover, these services are only available during campus operating hours. Hiring more officers could solve this problem, but it is not the best choice because of the long-term cost (salaries). Since educational providers must always deliver the best service to ensure the satisfaction of prospective new students, a chatbot offers an alternative solution: the questions raised by prospective students are usually similar and repetitive, so they can be categorized as Frequently Asked Questions (FAQ). According to wartaekonomi.co.id, Georgia State University's chatbot, Pounce, handled up to 50,000 words of conversation with 3,114 admitted new students to help them register at Georgia State University. Of all these conversations, Pounce referred fewer than 1% of student messages to campus administration staff, for questions that required immediate assistance from experts.
A chatbot is a computer program that aims to simulate a conversation between two people. Chatbots have recently become popular because of the widespread use of messaging services and advances in Natural Language Understanding. Chatbot technology dates back to 1965 and continues to develop today. Chatbots can be built with various techniques, including rule-based AI, Artificial Intelligence Markup Language (AIML), Chatfuel, ChatScript, Unstructured Information Management Architecture (UIMA), Language Understanding Intelligent Service (LUIS), and Google Dialogflow. The most popular of these is AIML, although it has major weaknesses: it cannot always produce an appropriate response, it has no reasoning ability, and it cannot respond like a human [1]. Nevertheless, AIML still has a large user base, for example Pandorabots.com, a free, open-source-based website that can be used to develop and publish chatbots on the web. Pandorabots is the largest chatbot community on the internet; as of February 2012, the free Pandorabots community service was home to more than 166,000 botmasters and 201,000 chatbots in various languages [2].
JAIA -Journal Of Artificial Intelligence And Applications Vol. 1, No. 1, October 2020, pp. 01-10 ISSSN: XXXX-XXXX, DOI: XX.XXXXX
Other researchers have also used AIML in their chatbots. Indah implemented AIML for the admission of new students at Malang University in 2015, and in the same year Bahartyan integrated a chatbot into an e-commerce website. In 2016, Maskur designed a student information center chatbot and, to overcome the shortcomings of AIML, added the ability to crawl website data when the user input was not available in the AIML database. In 2017, Ronaliya created a chatbot to help Manipal University answer FAQ questions, and in the same year Suryani and Amalia created a chatbot application to introduce tourist attractions in East Java. Therefore, the writer uses AIML for the chatbot developed in this study. To overcome the shortcomings of AIML, the writer adds a learning system focused on supervised learning: if the user input is not available in the database, the chatbot predicts the answer from the patterns available in the knowledge base using the selective neural conversational model (commonly called the Deep Semantic Similarity Model, DSSM) [3] developed by Microsoft. The selective model learns a similarity function between an answer and the context of the question, and the similarity between the question context vector and the answer vector is computed using cosine similarity. The system then chooses the answer with the maximum similarity as its output. This output is forwarded to an officer for verification; if the answer given by the system is wrong, the operator must provide the correct answer. The verification result is saved in the chatbot database and becomes part of the chatbot's learning process for answering similar questions.
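The fallback-and-verification loop described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the system's PHP/MySQL implementation: the knowledge base is an in-memory dictionary, the names `KnowledgeBase`, `answer`, `verify`, and `overlap_predictor` are hypothetical, and the simple word-overlap predictor only stands in for the similarity model (DSSM) so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A predictor receives the normalized question and the known pairs.
Predictor = Callable[[str, Dict[str, str]], Optional[str]]


@dataclass
class PendingAnswer:
    """A predicted answer waiting for an officer to verify it."""
    question: str
    predicted: str


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory stand-in for the chatbot database."""
    pairs: Dict[str, str] = field(default_factory=dict)
    pending: List[PendingAnswer] = field(default_factory=list)

    def answer(self, question: str, predict: Predictor) -> Optional[str]:
        key = " ".join(question.lower().split())
        # Known question: answer directly from the knowledge base.
        if key in self.pairs:
            return self.pairs[key]
        # Unknown question: predict from existing patterns and queue the
        # prediction so that an officer can verify it.
        predicted = predict(key, self.pairs)
        if predicted is not None:
            self.pending.append(PendingAnswer(key, predicted))
        return predicted

    def verify(self, item: PendingAnswer, correction: Optional[str] = None) -> None:
        """Store the officer's verdict: the prediction if correct, else the correction."""
        self.pending.remove(item)
        self.pairs[item.question] = item.predicted if correction is None else correction


def overlap_predictor(question: str, pairs: Dict[str, str]) -> Optional[str]:
    """Hypothetical stand-in model: pick the answer whose question shares the most words."""
    words = set(question.split())
    best, best_score = None, 0
    for known_question, known_answer in pairs.items():
        score = len(words & set(known_question.split()))
        if score > best_score:
            best, best_score = known_answer, score
    return best


kb = KnowledgeBase({"when does registration open": "(answer about the opening date)"})
print(kb.answer("When does registration close", overlap_predictor))
# → (answer about the opening date)
kb.verify(kb.pending[0], "(officer's answer about the closing date)")
print(kb.answer("When does registration close", overlap_predictor))
# → (officer's answer about the closing date)
```

Passing the predictor in as a function keeps the verification logic independent of the model, so a trained similarity model could replace the word-overlap stand-in without changing the loop.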
Chatbot performance is measured using a confusion matrix, a method for evaluating the performance of Machine Learning (ML) algorithms, especially supervised learning. A confusion matrix represents the predicted and actual conditions of the data produced by the ML algorithm. From the confusion matrix, we can determine Accuracy, Precision, and Recall, which are important measures of a system's performance. Basically, the confusion matrix compares the classification results produced by the system with the classification results that should have been produced [4].
Based on the description above, the writer conducts research entitled "Designing Chatbot Information Services for New Student Registration Based on AIML and Machine Learning". It is hoped that the results can later be applied to the STMIK Amik Riau registration website to improve the institution's services.

A. Chatbot
A chatbot is a conversational software system designed to mimic human communication skills and interact automatically with users [1]. Chatbots are based on AI techniques that understand natural language, identify meaning and emotion, and generate meaningful responses. For example, a chatbot makes it easy for customers to get answers to their questions conveniently, without waiting in a telephone queue or repeatedly sending emails. Chatbots can reduce the number of customer calls, the average handling time, and customer service costs. However, these functions are not easy to achieve because they require various complex interactions between systems. Note that the terms 'AI chatbot application system' and 'AI chatbot' are used in this study as synonyms for conversational agents or advanced dialogue systems.

B. Machine Learning
Machine Learning is a sub-field of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the construction and study of algorithms that can learn from data and make predictions on data. Such algorithms operate by building a model from sample inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions [5].
Machine Learning is usually classified into three broad categories, depending on the nature of the learning "signal" or "feedback" available to the learning system.
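1. Supervised Learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.
2. Unsupervised Learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input.
3. Reinforcement Learning: A computer program interacts with a dynamic environment in which it must achieve a certain goal, receiving feedback in the form of rewards and punishments as it navigates its problem space.

C. Natural Language Processing
Natural Language Processing (NLP) combines Artificial Intelligence and Linguistics with the aim of making computers understand statements or words written in human language. NLP emerged to make users' work easier and to fulfill the desire to communicate with computers in natural language [6].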

D. Relationship Between Chatbot, Machine Learning and Natural Language Processing
A chatbot is essentially a robotic assistant that holds a conversation in order to get a task done. The conversation can take place via text messages or voice commands. However, the important point is that what enables software or machines to communicate with humans and perform simple to complex tasks is artificial intelligence (AI) technology.
The effectiveness of AI rests on machine learning and Natural Language Processing (NLP). While machine learning can be used to build several types of chatbot algorithms, NLP acts as a sensor that detects patterns in human speech and can even imitate them. Combined, these two main AI components help chatbots become more sensitive to human emotions and behavior in real chats and can therefore provide a better user experience.

E. AIML
AIML stands for Artificial Intelligence Markup Language, a derivative of the Extensible Markup Language (XML) [7]. AIML objects are made up of units called topics and categories, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data and some of which form AIML elements. AIML elements encapsulate the stimulus-response knowledge contained in the document. AIML thus contains a collection of patterns and responses that a chatbot can use to find an answer to each sentence it receives. An AIML interpreter is needed to receive the input and trace the answer in the AIML documents. AIML interpreters are already available in various programming languages, so building a chatbot can focus on compiling AIML documents. In addition, many ready-to-use AIML documents are currently available for various conversation domains.
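To make the stimulus-response structure concrete, the sketch below parses a small AIML document with Python's standard `xml.etree.ElementTree` module and answers an input by matching it against each category's `<pattern>`. It is a simplified illustration rather than a full AIML interpreter: the two categories are hypothetical, only the `*` wildcard is supported, categories are tried in document order (a real interpreter ranks matches by pattern priority), and elements such as `<topic>`, `<that>`, and `<srai>` are ignored.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical AIML document with two categories.
AIML_DOC = """<aiml version="1.0.1">
  <category>
    <pattern>HELLO</pattern>
    <template>Hello! How can I help you with registration?</template>
  </category>
  <category>
    <pattern>HOW DO I REGISTER *</pattern>
    <template>Registration is done through the registration website.</template>
  </category>
</aiml>"""


def load_categories(doc: str) -> List[Tuple[str, str]]:
    """Return the (pattern, template) pair of every <category> element."""
    root = ET.fromstring(doc)
    return [
        (cat.findtext("pattern", "").strip(), cat.findtext("template", "").strip())
        for cat in root.iter("category")
    ]


def normalize(text: str) -> str:
    """AIML-style input normalization: upper-case and replace punctuation with spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.upper()).split())


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate an AIML pattern into a regex; '*' matches one or more words."""
    parts = [r"\S+(?: \S+)*" if word == "*" else re.escape(word) for word in pattern.split()]
    return re.compile("^" + " ".join(parts) + "$")


def respond(user_input: str, categories: List[Tuple[str, str]]) -> Optional[str]:
    """Return the template of the first matching category, or None if nothing matches."""
    text = normalize(user_input)
    for pattern, template in categories:
        if pattern_to_regex(pattern).match(text):
            return template
    return None  # no match: the chatbot would fall back to answer prediction


categories = load_categories(AIML_DOC)
print(respond("How do I register online?", categories))
```

When `respond` returns `None`, no category matches the input; in the system described here, that is the point at which the answer-prediction step takes over.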

F. Selective Neural Conversational Model
The Selective Neural Conversational Model, commonly called the Deep Semantic Similarity Model (DSSM), was developed by Microsoft [3]. The selective model learns a similarity function between an answer (reply) and the context of the question (context), where the answer is one element of a predefined set of possible answers (see the illustration below). The intuition is that the network takes the context and a candidate answer as input and returns a confidence score indicating how well they fit each other. The selective network has two "towers": the first for the context and the second for the answer. Each tower can use any architecture. Each tower takes its input and maps it into a semantic vector space (vectors R and C in the illustration). The similarity between the context vector and the answer vector is then computed using cosine similarity. At inference time, the similarity between the given context and every possible answer is calculated, and the answer with the maximum similarity is chosen.
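The scoring and selection step can be sketched in Python as follows. In an actual DSSM the two towers are trained neural encoders; here, as an assumption made only so the example runs without a trained model, both towers are replaced by a plain bag-of-words count vector. The sketch therefore illustrates the cosine-similarity scoring and the maximum-similarity choice, not the learned semantic representation. Function names and candidate answers are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Stand-in for a tower: bag-of-words counts (a real DSSM tower is a trained network)."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either vector is empty."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_answer(context: str, candidates: List[str]) -> Tuple[str, float]:
    """Score every candidate answer against the context and return the most similar one."""
    context_vec = embed(context)  # "context tower"
    scored = [(answer, cosine(context_vec, embed(answer))) for answer in candidates]  # "answer tower"
    return max(scored, key=lambda pair: pair[1])


candidates = [
    "the registration fee can be paid at the bank",
    "the campus library opens at eight",
]
print(select_answer("how can the registration fee be paid", candidates))
```

Swapping `embed` for a trained encoder leaves `cosine` and `select_answer` unchanged, which mirrors the two-tower design: the towers can have any architecture as long as they map into the same vector space.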

G. Confusion Matrix
The confusion matrix is a method for evaluating the performance of Machine Learning (ML) algorithms, especially supervised learning. It represents the predicted and actual conditions of the data produced by the ML algorithm. From the confusion matrix, we can determine Accuracy, Precision, and Recall, which are important measures of a system's performance. Basically, the confusion matrix compares the classification results produced by the system with the classification results that should have been produced [4].
The confusion matrix measures performance on a classification problem by arranging the results in a table with four combinations of predicted and actual values: True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN). The following analogy makes these combinations easier to understand.
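True Positive (TP):
• Interpretation: You predict positive, and that is true.
• Example: You predict that a woman is pregnant, and she actually is.
False Positive (FP):
• Interpretation: You predict positive, and that is false.
• Example: You predict that a man is pregnant, but he actually is not.
False Negative (FN):
• Interpretation: You predict negative, and that is false.
• Example: You predict that a woman is not pregnant, but she actually is.
True Negative (TN):
• Interpretation: You predict negative, and that is true.
• Example: You predict that a man is not pregnant, and he actually is not.
In general, the formulas used to calculate Precision, Recall, and Accuracy are as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

H. Black Box Testing
Black Box Testing, commonly known as functional testing, is a software testing method that tests software without knowledge of the internal structure of its code. In this test, the tester knows what the program must do but not how it does it. Testing is done by providing valid and invalid inputs to the software and checking whether its output meets the specification.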
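The idea can be illustrated with a small Python harness. It is a hypothetical sketch: `chatbot_reply` is a trivial stub standing in for the system under test so that the example runs, and the test cases are invented. The harness only calls the system and compares its output with the specified output for valid and invalid inputs; it never inspects the system's internals, which is what makes the test black-box.

```python
from typing import Callable, List, NamedTuple, Optional


class Case(NamedTuple):
    """One black-box test case: an input and the output the specification expects."""
    description: str
    user_input: str
    expected: Optional[str]  # None means the bot is expected to give no answer


def run_black_box(system: Callable[[str], Optional[str]], cases: List[Case]) -> List[str]:
    """Feed each input to the system and compare only its output with the expectation."""
    report = []
    for case in cases:
        actual = system(case.user_input)
        status = "PASS" if actual == case.expected else "FAIL"
        report.append(f"{status}: {case.description} (got {actual!r})")
    return report


def chatbot_reply(text: str) -> Optional[str]:
    """Hypothetical stub standing in for the chatbot under test."""
    if not text.strip():
        return None
    if "register" in text.lower():
        return "Registration is done through the registration website."
    return None


CASES = [
    Case("valid registration question", "How do I register?",
         "Registration is done through the registration website."),
    Case("invalid input: empty message", "   ", None),
    Case("off-topic question", "What is the weather today?", None),
]

for line in run_black_box(chatbot_reply, CASES):
    print(line)
```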

Research Methodology
The framework in this study uses the waterfall model, which consists of the five stages described below.

A. Requirement Analysis
This stage aims to understand the software the user expects and its limitations. At this stage, information is obtained through data collection in the form of direct observation of the research object, STMIK Amik Riau.

B. Design
The system design stage is the second stage. In this phase, the requirements specification from the previous stage is studied and the system design is prepared, with the architecture worked out in detail. The design includes Use Case Diagrams, Activity Diagrams, and Class Diagrams of the chatbot system in UML, along with several flowcharts that explain the workflow of the system, namely the ChatBot flowchart, the AIML flowchart, and the Answer Prediction (Machine Learning) flowchart, all created with Microsoft Visio.

C. Implementation
The third stage is implementation. At this stage, the system is first developed as small programs called units, which are integrated in the next stage. Coding starts from the smallest unit, and each unit is developed and tested for its functionality; this is called unit testing. The chatbot system is built with the PHP programming language and a MySQL database, using the following hardware and software specifications:

D. Testing
After successful implementation, the next stage is testing, in which the system is tested using the Black Box Testing technique. Testing is done by giving the system a number of inputs to check whether the implemented functionality meets the expected behavior.

E. Operation and Maintenance
This stage is the final stage of the waterfall model. Once all stages have been completed and the system is in place, it is put into operation and maintained. Maintenance includes fixing errors that were not found in the previous testing step. Maintenance is prioritized at this stage because this trial determines whether or not the system succeeds in meeting the needs. The success of the system is measured with the Confusion Matrix, using the precision, recall, and accuracy of the system.

A. Research Data
Data collection on the questions frequently asked to STMIK Amik Riau yielded 62 questions, collected from June 2016 to August 2019 from the register.stmik-amik-riau.ac.id system through the STMIK Amik Riau sisfo (information systems) team. The following is an example of the data obtained:

The use case diagram shows the relationships between the actors and the use cases in the system. Its purpose is to describe the relationship between the system's users and the software. Three actors are involved in this system: Admin, Operator, and Prospective Students.

E. Flowchart Prediction Answer
The answer prediction flowchart describes the answer prediction subprocess performed within the Chatbot flowchart. The flowchart is as follows:

F. Implementation
The chatbot is built with the PHP programming language, using XAMPP (MySQL) for database processing. The interface of the chatbot system is as follows:
1. Admin Page
The admin page consists of several views:
• Knowledge Base Input Display
The knowledge base input display is the interface the admin uses to enter the knowledge base for the chatbot system. Its appearance is as follows:
• Abbreviation Input Display
The abbreviation input display is the interface the admin uses to enter abbreviations for the chatbot system's abbreviation module. Its appearance is as follows:
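The text does not describe how the abbreviation module is applied. One plausible reading, stated here as an assumption, is that abbreviations in a user's message are expanded before the message is matched against the knowledge base. A minimal Python sketch of that reading follows; the table entries and the function name `expand_abbreviations` are hypothetical.

```python
import re
from typing import Dict

# Hypothetical abbreviation table, as an admin might enter it.
ABBREVIATIONS: Dict[str, str] = {
    "pmb": "new student admission",
    "reg": "registration",
    "info": "information",
}


def expand_abbreviations(text: str, table: Dict[str, str]) -> str:
    """Replace whole-word abbreviations (case-insensitive) with their full forms."""
    def replace(match: "re.Match") -> str:
        word = match.group(0)
        return table.get(word.lower(), word)  # leave unknown words unchanged

    return re.sub(r"\b\w+\b", replace, text)


print(expand_abbreviations("When does PMB reg open?", ABBREVIATIONS))
# → When does new student admission registration open?
```

Matching whole words only means that a longer word such as "regular" is left untouched even though it starts with the abbreviation "reg".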

2. Prospective Students Page
The prospective students page consists of a single view, the ChatBot view, which is the interface prospective students use to interact with the chatbot. Its appearance is as follows:

The chatbot was tested using Black Box Testing, in which the tester asked the chatbot random questions about registration at STMIK Amik Riau. The trial was carried out by users who know the functionality of the system, and the conversation test consisted of giving 100 questions to the chatbot. Each answer was classified as follows, where PMB refers to new student admission:
TP: A question about PMB was answered correctly by the chatbot.
FN: A question about PMB was answered by the chatbot, but incorrectly.
FP: A question not about PMB was still answered by the chatbot.
TN: A question not about PMB was not answered by the chatbot.
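Given the outcome label assigned to each test question, the three metrics follow directly from the four counts: Precision = TP/(TP+FP), Recall = TP/(TP+FN), and Accuracy = (TP+TN)/(TP+TN+FP+FN). The Python sketch below assumes the labels are recorded as the strings "TP", "FP", "FN", and "TN" following the definitions above; the function name is hypothetical, and the sample labels are illustrative only, not the results of this study.

```python
from collections import Counter
from typing import Dict, Iterable


def confusion_metrics(outcomes: Iterable[str]) -> Dict[str, float]:
    """Compute precision, recall and accuracy from per-question outcome labels."""
    counts = Counter(outcomes)
    unknown = set(counts) - {"TP", "FP", "FN", "TN"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    tp, fp, fn, tn = (counts[k] for k in ("TP", "FP", "FN", "TN"))

    def safe_div(a: int, b: int) -> float:
        # Avoid division by zero when a denominator has no cases.
        return a / b if b else 0.0

    return {
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
    }


# Illustrative labels only (not the study's data).
print(confusion_metrics(["TP", "TP", "TP", "FP", "FN", "TN", "TN", "TP"]))
# → {'precision': 0.8, 'recall': 0.8, 'accuracy': 0.75}
```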

A. Conclusions
Based on the results of the implementation, trials, and evaluations carried out in the previous chapter, it can be concluded that:
1. The chatbot system is able to answer questions posed by prospective students properly and correctly, as long as the question is available in the chatbot knowledge base.
2. The success rate of the answers predicted by the chatbot system depends heavily on the size of the chatbot's knowledge base.

B. Suggestions
The following suggestions, drawn from the implementation, trials, and evaluations, can serve as input for further research:
1. The chatbot system is still limited to answering questions raised by prospective students and cannot contribute input of its own, so the conversation feels one-directional.
2. The chatbot system still requires human assistance in its learning process. It is hoped that in the future it can learn on its own without human assistance.
3. The chatbot system can only accept text input and produce text output. It is hoped that in the future it can also accept voice input and produce voice output.