Sentiment Analysis of Technology Utilization by Pekanbaru City Government Based on Community Interaction in Social Media

Government services for the public in Pekanbaru City increasingly rely on technology. The city government has centralized all public services, both online and offline, in public service malls. Technology-based services, especially online ones, have drawn criticism from netizens on social media such as Twitter. To improve its services, the government must listen to what the people want. To address this, this study uses sentiment analysis to classify comments as positive, negative, or neutral, and measures the accuracy achieved by the Naïve Bayes Classifier (NBC). The Naïve Bayes Classifier is a statistical classifier that predicts the probability that a data tuple belongs to a particular class, based on probability calculations. With the data split into training and testing sets in a 70%:30% ratio, the classifier achieves an accuracy of 55.56%, a precision of 64%, a recall of 80%, and an F-score of 71.2%.


Introduction
Pekanbaru City is currently developing fairly rapidly, from infrastructure to technology [1] [2]. Advances in information technology require the Pekanbaru City Government to provide services that are faster, more effective, and more efficient. For this reason, technology-based systems such as e-money have been introduced at Sultan Syarif Kasim II Airport [3]. However, the Pekanbaru City Government's use of technology still draws complaints from the public: only some residents know about the command center public service [4], and the CCTV installed at traffic lights [5] is not working properly. The government's use of technology can be observed on several web portals, including its official portals, while netizens' views of it can be observed on Twitter. Netizens' comments on the government are a form of community participation in regional progress [6].
This research proceeds through web scraping, preprocessing (case folding, cleaning, tokenizing, and stemming), word weighting, and sentiment analysis, and the sentiment results are then analyzed using the Naïve Bayes classifier method. Web scraping is the process of retrieving the documents or data needed for analysis from web pages [7]. It can be done manually or automatically, and the retrieved data are stored in a spreadsheet such as Microsoft Excel. In this study, web scraping collected data from several top-ranked web portals as well as from official portals discussing the Pekanbaru City Government. These results serve as reference material on how the Pekanbaru City Government uses technology. After web scraping, netizens' comments were collected from social media, specifically Twitter. The comments were crawled using the Twitter API and then classified through sentiment analysis into three types: positive, negative, and neutral. Sentiment analysis, or opinion mining, is the process of automatically understanding, extracting, and processing textual data to obtain the sentiment expressed in an opinion sentence [8]. In this study, sentiment analysis was used to identify whether netizens' opinions about the Pekanbaru City Government's use of technology are positive, negative, or neutral, and the sentiment results were then analyzed using the Naïve Bayes Classifier method.
Naïve Bayes is a data mining algorithm for classification that relies on probability and statistical methods [9]. In this study, the Naïve Bayes Classifier is used to classify netizens' comments about the application of technology after they have gone through the sentiment analysis process.
Several methods can be used for sentiment analysis, including the Naïve Bayes Classifier, Support Vector Machine (SVM), Neural Network, K-Nearest Neighbor, and K-Means [10] [11]. Previous research [12] [13] [14] [15] found that the Naïve Bayes Classifier is fast and accurate when applied to large databases and diverse data, and that its advantages are simplicity and fairly high accuracy, reaching 90%. For these reasons, this study uses the Naïve Bayes Classifier method. The results of this study are compared with a previous study that used three methods with fairly high accuracy [16].

Research Methods
The stages of the research process carried out in this study are described in the flow of the research methodology as follows:

Crawling Data
The following figure shows the data crawling process, in which tweets are retrieved through the Twitter API using RapidMiner.

Figure 2. Crawling data
After the Twitter data crawling process is carried out as shown in Figure 2, the next step is to pre-process the data obtained.

Preprocessing
The preprocessing in this study consists of the following stages:
A. Case folding: all document text is converted to lowercase [17].
B. Cleaning: characters other than letters are removed and treated as delimiters, and URLs, mentions, and hashtags are deleted [18].
C. Tokenizing: the text is split into tokens. Tokenizing breaks a sequence of characters into words and determines which characters are treated as word separators [19].
D. Stemming: affixes are removed from each word so that it becomes a base word; this stage is also intended to normalize improperly spelled words [20].
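The four stages can be sketched in Python as below. This is a minimal illustration, not the RapidMiner process used in the study: the regular expressions and the hypothetical `SUFFIXES` list are assumptions, and the naive suffix stripping only stands in for a real Indonesian stemmer, which also removes prefixes and checks candidates against a root-word dictionary.

```python
import re

# Hypothetical, deliberately small list of Indonesian suffixes (an assumption).
# A real stemmer also strips prefixes and validates roots against a dictionary;
# this toy version removes at most one suffix.
SUFFIXES = ("nya", "lah", "kah", "kan", "an")


def case_fold(text: str) -> str:
    """A. Case folding: convert all text to lowercase."""
    return text.lower()


def clean(text: str) -> str:
    """B. Cleaning: drop URLs, mentions and hashtags, then treat every
    non-letter character as a delimiter."""
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"[@#]\w+", " ", text)         # mentions and hashtags
    text = re.sub(r"[^a-z]+", " ", text)          # non-letters become spaces
    return text.strip()


def tokenize(text: str) -> list[str]:
    """C. Tokenizing: split the cleaned text on whitespace."""
    return text.split()


def stem(token: str) -> str:
    """D. Stemming (toy): strip the first matching suffix if a root of at
    least three letters remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Run the stages in order: case folding, cleaning, tokenizing, stemming."""
    return [stem(tok) for tok in tokenize(clean(case_fold(text)))]


print(preprocess("Pelayanan CCTV di lampu merah tidak berfungsi https://t.co/x @pemkot #pekanbaru"))
# ['pelayan', 'cctv', 'di', 'lampu', 'merah', 'tidak', 'berfungsi']
```

The example tweet is invented for illustration. Note that the hashtags used to collect the data are themselves removed during cleaning, as the stage description requires.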

Word Weighting
Data that have gone through preprocessing must be converted into numeric form, which is done with the TF-IDF weighting method. The Term Frequency-Inverse Document Frequency (TF-IDF) method determines how strongly terms are related to a document by assigning each word a weight [21] [22]. It combines two concepts: the frequency with which a word occurs in a document, and the inverse frequency of the documents that contain the word [23].
To calculate the TF-IDF weights, the TF value of each word is calculated first, with each word occurrence given a weight of 1. The IDF value is given by Equation (1):
$$IDF(word) = \log\left(\frac{td}{df}\right) \qquad (1)$$
where IDF(word) is the IDF value of the word being searched, td is the total number of documents, and df is the number of documents in which the word appears.
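A minimal sketch of this weighting in Python, under stated assumptions: TF is taken to be the raw count of a word in a document (each occurrence contributing 1), the final weight is TF × IDF, and IDF is computed as log10(td / df). The paper does not state the logarithm base, and RapidMiner's own TF-IDF option may normalize the vectors differently.

```python
import math
from collections import Counter


def idf(documents: list[list[str]]) -> dict[str, float]:
    """IDF(word) = log10(td / df): td is the number of documents and
    df is the number of documents containing the word (base 10 assumed)."""
    td = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log10(td / count) for term, count in df.items()}


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term in each document as raw term frequency times IDF."""
    idf_values = idf(documents)
    weights = []
    for doc in documents:
        tf = Counter(doc)  # every occurrence of a term adds 1
        weights.append({term: count * idf_values[term] for term, count in tf.items()})
    return weights


# Hypothetical preprocessed comments, not the study's data.
docs = [["layanan", "bagus"], ["layanan", "lambat"], ["cctv", "rusak"]]
for w in tf_idf(docs):
    print({term: round(value, 4) for term, value in w.items()})
```

In this toy corpus, "layanan" appears in two of the three documents and receives a lower weight (about 0.1761) than words that appear in only one document (about 0.4771); a word that appears in every document would receive a weight of 0.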

Classification Process with Naïve Bayes
Based on the data requirements identified in the preceding text mining process, this section explains the steps used to classify the data.

A. Training Data
At this stage, the weighted data whose classes are known are used as training data, which serve as the reference for building the classification model. From the training data, the probability of each category and the probability of each term in each class are then calculated.

Prior Determination Process
First, the probability of each category (the prior) is calculated. This study has three categories: positive, negative, and neutral.
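In Naïve Bayes the prior of a category is usually estimated as the fraction of training documents labelled with it, P(c) = N_c / N. The sketch below assumes that standard estimate; the labels are hypothetical and do not reflect the study's class distribution.

```python
from collections import Counter


def priors(labels: list[str]) -> dict[str, float]:
    """P(c) = (number of training documents labelled c) / (total documents)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical training labels for illustration only.
train_labels = ["positive", "negative", "neutral", "neutral", "negative", "neutral"]
print(priors(train_labels))
```

Here the neutral class receives a prior of 0.5 because three of the six hypothetical comments are neutral.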

Probability
Next, the probability of each term is calculated over all documents. The total number of terms used in this calculation is 35 (Table 3.7): 24 positive class terms, 27 negative class terms, and 23 neutral class terms. The number of terms depends on the results of preprocessing. The probability of each term is calculated from the following known values: |vocabulary| = 74, positive terms = 24, negative terms = 27, neutral terms = 23. For example, the probability of the word "ada" is calculated in this way, and the same calculation is repeated for every term. After the probability values for all words are obtained, the test data are processed.
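The formula for the term probabilities did not survive in the text. A common form consistent with the quantities listed (|vocabulary| and the number of terms per class) is Laplace (add-one) smoothing, P(w | c) = (n_wc + 1) / (n_c + |vocabulary|); the sketch below assumes that form. Under this assumption, with |vocabulary| = 74 and 24 positive terms, a word occurring k times in the positive class would get (k + 1) / 98.

```python
from collections import Counter, defaultdict


def likelihoods(docs: list[list[str]], labels: list[str]) -> dict[str, dict[str, float]]:
    """Laplace-smoothed term probabilities per class (assumed formula):
    P(w | c) = (n_wc + 1) / (n_c + |vocabulary|),
    where n_wc counts occurrences of w in class c and n_c counts all
    term occurrences in class c."""
    vocabulary = {term for doc in docs for term in doc}
    term_counts: dict[str, Counter] = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    table = {}
    for label, counts in term_counts.items():
        n_c = sum(counts.values())
        denominator = n_c + len(vocabulary)
        # Counter returns 0 for unseen words, so they still get probability 1/denominator.
        table[label] = {w: (counts[w] + 1) / denominator for w in vocabulary}
    return table


# Hypothetical training data for illustration only.
table = likelihoods([["layanan", "bagus"], ["layanan", "lambat"]], ["positive", "negative"])
print(table["positive"])
```

Smoothing keeps every probability above zero, so a word never seen in one class cannot force that class's whole product to zero during testing.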

B. Test Data
In this process, the test data are classified using the probability of each term obtained from the training data. Before the calculation, the test data first go through the same text mining (preprocessing) steps. Table 2 shows the test data after preprocessing.

For example, the probability of the word "agar"
The following are the results of the probability calculation for the test data. The probability of the test data is computed from the probability value of each term: with the Naïve Bayes method, the prior probability of each category is multiplied by the probabilities of the test data's terms in that category. The highest of the resulting values is P(uji|Net) = 3.27161E-19, so the comment in the test data is classified in the NEUTRAL category.
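The decision rule can be sketched as follows: each category's prior is multiplied by the probabilities of the test document's terms, and the category with the largest product wins. This sketch sums logarithms instead of multiplying raw probabilities, because products of many small probabilities (one short comment already yields 3.27161E-19) can underflow to zero for long texts; the argmax is unchanged. Skipping tokens absent from the training vocabulary is an assumption, and the priors and probabilities in the example are hypothetical.

```python
import math


def classify(tokens, priors, likelihoods):
    """Return the class with the largest P(c) * prod P(w | c), computed in log space.

    Tokens missing from the training vocabulary are skipped (an assumption).
    Probabilities must be non-zero, which Laplace smoothing guarantees.
    """
    scores = {}
    for label, prior in priors.items():
        score = math.log(prior)
        for token in tokens:
            if token in likelihoods[label]:
                score += math.log(likelihoods[label][token])
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical model values for illustration only.
p = {"positive": 0.5, "negative": 0.5}
lk = {
    "positive": {"layanan": 0.4, "bagus": 0.4, "lambat": 0.2},
    "negative": {"layanan": 0.4, "bagus": 0.2, "lambat": 0.4},
}
print(classify(["layanan", "bagus"], p, lk))
print(classify(["lambat", "agar"], p, lk))
```

In the second call, "agar" is not in the hypothetical vocabulary and is ignored, so "lambat" decides the class.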

Data Analysis
After the manual calculations, the data are analyzed in RapidMiner to obtain accuracy, recall, and precision.

Results and Discussion
The process carried out in this study is almost the same as in a previous study [16], except that this study did not use cross-validation. The previous study achieved 100% accuracy with the Naïve Bayes method. Figure 3 shows the design of the process flow, whose components are as follows:
a. Apply Model applies the model learned from the training data to the testing data.
b. Naïve Bayes performs the Naïve Bayes classification in RapidMiner.
c. Performance reports the accuracy of the results as a percentage and as confidence values.
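The difference between the single 70%:30% split used here and the cross-validation used in the previous study can be sketched as below. This is an illustrative Python sketch, not RapidMiner's Split Data or Cross Validation operators; the function names, the shuffling with a fixed seed, and k = 10 are assumptions.

```python
import random


def holdout_split(items, train_ratio=0.7, seed=0):
    """Single train/test split, as in this study (70%:30%)."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def k_fold_splits(items, k=10, seed=0):
    """k-fold cross-validation: each item lands in the test fold exactly once."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


data = list(range(100))
train, test = holdout_split(data)
print(len(train), len(test))
print(sum(1 for _ in k_fold_splits(data)))
```

With a holdout split the reported accuracy depends on which comments happen to land in the test set, whereas cross-validation averages the accuracy over k different test folds.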

RapidMiner Analysis Results
The accuracy obtained in RapidMiner is shown in the following figure. The Naïve Bayes classification achieves an accuracy of 55.56%. RapidMiner also plots the confidence values from the Naïve Bayes classification; for the NEUTRAL data, the graph also tends toward NEUTRAL, so it can be concluded that the sentiment of the tweet data is NEUTRAL.
Figure 6. Display of analysis graph results

Testing
The testing stage uses the results RapidMiner produced for the test data, shown in the table below. In this test, 70 comments are used as training data and 45 comments as test data. Manual testing and RapidMiner testing give the same result, an accuracy of 55.56%.
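Accuracy is the share of test comments whose predicted label matches the actual label. With 45 test comments, the reported 55.56% corresponds to 25 correct predictions (25 / 45 ≈ 0.5556). The sketch below reproduces that arithmetic with placeholder labels; the label lists are hypothetical and are not the study's data.

```python
def accuracy(actual: list[str], predicted: list[str]) -> float:
    """Fraction of positions where the predicted label equals the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Placeholder labels: 45 test comments, 25 of them predicted correctly.
actual = ["neutral"] * 45
predicted = ["neutral"] * 25 + ["positive"] * 20
print(round(accuracy(actual, predicted) * 100, 2))  # 55.56
```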

Precision and Recall Calculation
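Precision is the ratio of true positive predictions to all positive predictions, and recall is the ratio of true positive predictions to all data that are actually positive:
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
where TP (true positives) is the number of positive data predicted as positive, FP (false positives) is the number of other data predicted as positive, and FN (false negatives) is the number of positive data predicted as another class.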
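Because this study has three classes, precision and recall are computed for one class at a time, treating that class as "positive" and the other two as "not positive" (one-vs-rest). Which class the reported figures refer to, and whether they are averaged over classes, is not stated, so the sketch below is an assumption-labelled illustration with hypothetical labels. The F-score is the harmonic mean of precision and recall.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_f1(actual, predicted, positive_class):
    """One-vs-rest precision, recall and F1 for a single class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive_class and p == positive_class for a, p in pairs)
    fp = sum(a != positive_class and p == positive_class for a, p in pairs)
    fn = sum(a == positive_class and p != positive_class for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


# Hypothetical labels for illustration only.
actual = ["pos", "pos", "neg", "neu"]
predicted = ["pos", "neg", "pos", "neu"]
print(precision_recall_f1(actual, predicted, "pos"))  # (0.5, 0.5, 0.5)
print(round(f_score(0.64, 0.80), 4))                  # 0.7111
```

Plugging the reported precision of 64% and recall of 80% into the F-score formula gives approximately 0.711.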

Conclusion
After completing the analysis of sentiment data on Twitter with the hashtags #pemerintahanpku, #teknologipku, #smartcitypekanbaru, #pekanbaru, and #infopku regarding the Pekanbaru City Government's use of technology, several conclusions can be drawn. The graph data show that neutral sentiment has the largest share in responses to the Pekanbaru City Government's application of technology, with a percentage of 55.56%.
The graph also shows that netizens, or the public, are not yet fully engaged with the technology applied by the Pekanbaru City Government, as indicated by the predominantly neutral sentiment. Compared with the previous study, the results show a significant decrease in accuracy, which may be because this study did not use cross-validation in the classification process.