Sentiment Analysis about Electric Motorbikes in Indonesia Using Twitter Data

ABSTRACT


Introduction
One of Indonesia's efforts to reduce air pollution is the use of environmentally friendly transportation. Although bicycles are a mode of transportation that can positively impact the environment, they have yet to become the primary choice for long-distance travel. Most bicycles still rely on human power, which tires many users when travelling long distances. Air pollution in Indonesia is exacerbated by the growing number of non-environmentally friendly motorized vehicles on the roads (Sukarno et al., 2016; Istiqomah & Marleni, 2020; Lestari et al., 2022; Abidin et al., 2024).
Given these conditions, many researchers have attempted to design vehicles with environmentally friendly engine operation. One vehicle believed to reduce environmental pollution is the electric vehicle (Masruroh et al., 2023; Fitrianto, 2023; Ravi et al., 2023; Sadiq O. A. & Chidi O. M., 2024). The development of electric vehicles is growing rapidly around the world, including in Indonesia. Electric vehicles developed around the world range from electric cars, electric trains, and electric trucks to electric motorcycles. The type of electric vehicle now starting to be found in Indonesia is the electric motorcycle. The Ministry of Transportation (Kemenhub) recorded the number of electric vehicles in Indonesia at 14,400 units as of mid-November 2021. These electric vehicles comprise 1,656 passenger cars, 262 three-wheeled vehicles, 12,464 electric motorbikes, 13 buses, and five freight cars (Kemenhub records the number of new RI electric vehicles at 14,400 units.). Electric motorcycles are therefore more numerous in Indonesia than any other type of electric vehicle. The acceptance of electric vehicles in Indonesia has prompted further research on electric vehicles, including electric motorbikes. Research on the development of electric motorbikes in Indonesia includes Nurhadi (2018), Pratiwi et al. (2020), Miftachul U. et al. (2021), and Gustiana et al. (2022). Many more studies aim to develop electric motorbikes, ranging from battery innovation to body design.
The development of electric vehicles in Indonesia opened up further when the President enacted Presidential Regulation of the Republic of Indonesia Number 55 of 2019 on the acceleration of the Battery-Based Electric Motorized Vehicle program (Indonesia, P.P., 2019). The Provincial Government of Bali followed suit by issuing Bali Governor Regulation Number 48 of 2019 concerning the Use of Battery-Based Electric Motorized Vehicles. Over time, the local industry has produced several electric vehicles, especially electric motorcycles. To support a consistent level of production, the Ministry of Industry has set a target of 400,000 electric cars and 1.76 million electric motorcycles for 2025. The next target, for 2030, is 600,000 electric cars and 2.45 million electric motorcycles (Indonesia, C., 2021).
Indonesian people often express their opinions or feelings on social media. The social media platforms Indonesians use include Facebook, Instagram, and Twitter. Indonesians are active on Twitter and have used it to post both positive and negative comments about the presence of electric vehicles. Kurniawan, R. & Apriliani, A. (2020) stated that in 2019, Indonesia experienced an increase in the number of daily active Twitter users. Therefore, Twitter is a medium that researchers often use for collecting data and conducting analysis. Sentiment analysis is a widely chosen method for observing topics that are currently viral, commonly referred to as "Trending Topics." Garcia & Berton (2021), Malik et al. (2021), Neogi et al. (2021), Olabanjo et al. (2023), Persadaa et al. (2024), and many other studies have used Twitter as a means of collecting people's responses. This method is an effective alternative to distributing questionnaires.
According to Liu (2011), sentiment analysis refers to the broad field of natural language processing, computation, and text mining that is used to analyze the opinions, evaluations, attitudes, judgments, and emotions of a speaker or writer toward a particular object, topic, service, or activity. At the sentiment analysis stage, data mining is needed. Text mining is a data mining method defined as a process to obtain and collect information from a database system. A user can later take advantage of this information as material for predictive analysis. The data obtained from the text mining process is semi-structured or unstructured. This semi-finished data still needs to be corrected and formatted consistently so that it does not degrade the output quality.
The study by Ashari et al. (2023), which analyzed the Indonesian public's response to the presence of electric vehicles in general, reported 55% positive responses and 45% negative responses. However, that study did not consider neutral opinions, even though not every word in a tweet falls into the positive or negative category. In addition, it only discusses electric vehicles in general. The same applies to the research conducted by Salsabila et al. (2023), Pratama et al. (2023), and Merdiansah et al. (2024), which also relies on sentiment analysis of Twitter data to identify public responses to electric vehicles in general. This research specifically conducts sentiment analysis on the Indonesian public's response to electric motorcycles, which are currently being promoted in Indonesia. Therefore, this study can provide an understanding of the acceptance of, and the challenges faced by, the electric motorcycle industry in Indonesia. The purpose of this study is to gauge the level of public enthusiasm for electric motorcycle products on Twitter, based on mentions, replies, likes, and retweets. A sentiment analysis is then performed to categorize the polarity of the text data into positive, negative, and neutral opinion classifications (Fanissa et al., 2018). Finally, this study contributes an analysis of why people disagree with the presence of electric motorcycles. Developers can later use the results of this research to create products, especially electric motorcycles, that suit the community's needs.

Method
The research stages form the sequence of steps undertaken to achieve the research objectives. The sequence of research stages is shown in Fig. 1. The research steps are explained below.

a. Literature Review
The literature review is conducted to gain a comprehensive understanding of the research topic, from data collection and analysis to applying the data in tools within the system. Referring to relevant theories is important for understanding the research topic deeply. These theories can be drawn from journals, articles, and other online scholarly sources, including book references.

b. Collecting Data
Data collection in this study consists of scraping all tweets, retweets, mentions, replies, and other posts in Indonesian using predetermined keywords. In this study, the keyword used to collect data was "electric motorcycles".

c. Text Pre-Processing
At this stage, the collected data is cleaned. Twitter users often use emoticons or excessive punctuation in a comment; these components are unnecessary for the analysis and may interfere with grouping comments into positive, negative, and neutral classifications. Therefore, these components need to be removed first. Some commonly used text preprocessing steps are:

1) Case Folding
The process of converting the entire text to a standard, uniform form (all lowercase letters).

2) Tokenization
The process of breaking a sentence down into its individual words while removing unwanted characters from the sentence structure (for example, automatically stripping a comma or period attached to a word).

3) Filters
The process of keeping important words (words that fit the sentence structure) and removing unimportant words (words that carry no meaning).

d. Labeling Data Process
The data labeling process is conducted to identify categories within a dataset. This labeling is useful for determining whether the data falls into the positive or negative category.
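The three preprocessing steps of stage c (case folding, tokenization, and filtering) can be illustrated with a minimal base R sketch. The sample tweets and the list of unimportant words are hypothetical and are not taken from the study's data; the study's own cleaning code is not shown in the paper, so this is only one plausible way to carry out the steps.

```r
# Hypothetical sample tweets (not taken from the study's data)
tweets <- c("Motor listrik KEREN banget!!! :)",
            "Harga motor listrik masih mahal, sih...")

# 1) Case folding: convert all text to lowercase
folded <- tolower(tweets)

# 2) Tokenization: replace every character that is not a lowercase letter
#    or a space with a space, then split each tweet on whitespace
cleaned <- gsub("[^a-z ]", " ", folded)
tokens <- strsplit(trimws(cleaned), "\\s+")

# 3) Filtering: drop words found in a (hypothetical) list of unimportant words
unimportant <- c("banget", "sih", "masih")
filtered <- lapply(tokens, function(words) words[!words %in% unimportant])

print(filtered)
```

Each element of `filtered` is the vector of words that remains for one tweet, which is the form the later scoring step would consume.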

e. Data Processing
The data processing technique used in this study is data scraping from Twitter through the API. After the data is collected, the next step is to analyze it using text mining. Text mining is an activity of extracting information in which a user interacts with a set of documents using an analysis tool (Imam & Fajtriab, 2015). The benefit of the text mining method is that comments are classified, or grouped, as negative, positive, or neutral based on the similarity of their words to predetermined input.

f. Classification
At the classification stage, the data that has passed text preprocessing is ready to be analyzed by class to determine the polarity of the existing texts, that is, whether they belong to the positive, negative, or neutral opinion group (Indrayuni, E., 2019).

g. Interpreting the Results
Interpreting the results is an explanation of the findings obtained from the sentiment data analysis conducted previously.

h. Conclusion and Recommendation
The conclusion will briefly present the key points from the data analysis conducted in this study. The recommendations will include suggestions related to data processing using different methods and media.

Results and Discussion
This research was conducted by collecting data scraped from Twitter with the keyword "electric motor". The syntax used to perform this scraping process is:
    Motor <- search_tweets(q = "electric motor", n = 1000)
The author attempted to scrape 1000 tweets, but the scraping process only collected 970 tweets. This suggests that only 970 tweets matching the keyword were available on Twitter at the time of collection. The scraping results were then saved as data in Excel, which, when entered into a table, appear as in Table 1.

Create Corpus
The corpus contains the texts obtained that will be used in the research discussion. The corpus keeps all tweets clean, in the sense that the tweets no longer contain images or videos, because the authors only use text as research material. The syntax for creating the corpus is given below, and the processed results are displayed in Fig. 2.
    tweet_document <- Corpus(VectorSource(textdata))

Create Stopwords
A stopword list is a collection of conjunctions and other words that are unnecessary in an opinion sentence. Once these words are removed, the core of each opinion sentence remains, which makes grouping into positive, negative, and neutral opinions more effective. The MasDevid account on GitHub uploaded a file that contains these stopwords, which is available at the following link: https://github.com/masdevid/ID-Stopwords.
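As a hedged illustration of stopword removal, the base R sketch below reads a stopword list from a plain-text file with one word per line and strips those words from a sentence. The temporary file, its three words, and the helper name `remove_stopwords` are assumptions made for this example; the layout of the actual stopword file is not described in the paper.

```r
# Stand-in for a downloaded stopword file: one word per line.
# The three words below are hypothetical examples, not the real list.
stopword_file <- tempfile(fileext = ".txt")
writeLines(c("yang", "dan", "di"), stopword_file)

# Load the stopword list into a character vector
stopwords_id <- readLines(stopword_file)

# Hypothetical helper: remove every stopword from a single sentence
remove_stopwords <- function(text, stopwords) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  paste(words[!words %in% stopwords], collapse = " ")
}

result <- remove_stopwords("motor listrik yang murah dan irit", stopwords_id)
print(result)
```

In practice `stopword_file` would point to the downloaded list instead of a temporary file.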

Input Positive and Negative Words
Classifying the opinions obtained from scraping helps produce more useful data. The opinions are divided into positive, negative, and neutral classes. The positive and negative words were collected from KBBI and recorded manually with the help of sources from the internet. In total, 1,473 positive words and 2,960 negative words were found. Opinion sentences containing positive words are included in the positive opinion classification. Opinion sentences containing negative words are included in the negative opinion classification. If an opinion sentence contains neither positive nor negative words, it is included in the neutral opinion classification. Examples of terms used as a reference to separate positive, negative, and neutral opinions can be seen in Table 2.
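The classification rule described above can be sketched in base R as a simple lexicon count. The function names `score_tweet` and `classify_score`, the mini-lexicons, and the sample sentences are hypothetical; the sketch assumes the score is the number of positive words minus the number of negative words, which is one common convention and may differ in detail from the scoring function used in the study.

```r
# Hypothetical mini-lexicons; the study's lists hold 1,473 and 2,960 words
pos_words <- c("bagus", "hemat", "nyaman")
neg_words <- c("mahal", "lambat", "rusak")

# Score = number of positive words minus number of negative words
score_tweet <- function(text, pos, neg) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  sum(words %in% pos) - sum(words %in% neg)
}

# Map a score to one of the three opinion classes
classify_score <- function(score) {
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

tweets <- c("motor listrik hemat dan nyaman",
            "motor listrik mahal",
            "motor listrik baru")
scores <- vapply(tweets, score_tweet, integer(1),
                 pos = pos_words, neg = neg_words)
classes <- vapply(scores, classify_score, character(1))
print(data.frame(score = scores, class = classes, row.names = NULL))
```

Under this convention a sentence with equally many positive and negative words also scores zero and would be treated as neutral.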

Running Results
After text preprocessing, the data can be run to show the opinion classification results for the 970 tweets obtained through the previous scraping process. In the 970 cleaned tweets, there were 2213 repetitions of words related to electric motorcycle opinions. The ten most frequently written words are shown in Fig. 3. Based on Fig. 3, the word "electricity" ranks first among the most written words in tweets about electric motorcycles, with 1246 occurrences across the 970 tweets. Next comes the word "motor" with 858 occurrences, followed by "rush", "force", and "why" with 534 occurrences each, and so on. When the ten most repeated words are depicted in a bar chart, the result looks like Fig. 4.
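A word-frequency ranking of this kind can be sketched in base R as follows. The cleaned tweets are hypothetical stand-ins for the study's data, and the commented `barplot()` call only indicates how a chart similar to Fig. 4 might be drawn.

```r
# Hypothetical cleaned tweets (not the study's data)
clean_tweets <- c("motor listrik hemat",
                  "motor listrik mahal",
                  "listrik hemat")

# Split every tweet into words and pool them into one vector
all_words <- unlist(strsplit(clean_tweets, " ", fixed = TRUE))

# Count occurrences and sort from most to least frequent
word_freq <- sort(table(all_words), decreasing = TRUE)

# Keep the ten most frequent words (fewer if the vocabulary is small)
top_words <- head(word_freq, 10)
print(top_words)

# A bar chart of the ranking could then be drawn with:
# barplot(top_words, las = 2, ylab = "Frequency")
```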
Next, the sentiment results (Table 4 and Fig. 5) show how the Twitter opinions are divided into three classifications: positive, negative, and neutral. The syntax used to display the sentiment values and sentiment analysis results is as follows:
    analysis <- score.sentiment(cleartext, pos.words, neg.words)
    # sentiment score frequency table
    table(analysis$score)
Based on the results of the sentiment analysis, the 970 tweets produce scores that vary greatly. As detailed in Table 4, 142 tweets fall into the neutral classification. The negative classification contains a total of 208 opinions spread from column -1 to column -7. The positive classification contains 620 opinions spread from column 1 to column 4. So, in general, the positive opinion classification has a higher number than the negative and neutral opinion classifications.
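Collapsing a score frequency table into the three classes can be sketched in base R using the sign of each score. The score vector below is hypothetical; in the study the scores would come from `analysis$score`.

```r
# Hypothetical sentiment scores for eight tweets
scores <- c(-2, -1, 0, 0, 1, 1, 1, 3)

# Frequency of each individual score (the analogue of a per-score table)
score_table <- table(scores)
print(score_table)

# sign() maps negative scores to -1, zero to 0, and positive scores to 1,
# which lines up with the negative, neutral, and positive classes
classes <- factor(sign(scores),
                  levels = c(-1, 0, 1),
                  labels = c("negative", "neutral", "positive"))
class_counts <- table(classes)
print(class_counts)
```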

Word Cloud
A word cloud is a visualization often used to illustrate how frequently words recur across opinions. The words written in the tweets are displayed as coloured text whose size depends on how many times each word was written. The more often a word is used, the larger it is displayed in the word cloud. The word cloud of the sentiment classification in this study is shown in Fig. 6.
Based on the word cloud in Fig. 6, the words "bicycle", "motorcycle", and "electricity" are displayed largest; in other words, these words appear most often. The other words tend to have similar sizes, meaning they are written with roughly the same frequency.

Conclusion
The sentiment analysis carried out in this study shows that Twitter users still hold diverse opinions about electric motorcycle products. Based on the sentiment analysis calculations and tables, 620 tweets, or about 64% of opinions, are classified as positive, 208 tweets, or about 21%, are classified as negative, and the remaining 142 tweets, or about 15%, are classified as neutral. So, the majority of the 970 tweets by Twitter users about electric motorcycles are positive. This condition suggests that electric motorcycle product innovation can continue and become an alternative vehicle acceptable to the people of Indonesia. Suggestions for future research include expanding the set of keywords and analyzing the data using other algorithms such as Support Vector Machine (SVM), Naive Bayes, and others.
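The class shares can be recomputed from the reported counts with a few lines of base R. The counts are those stated in the study; the variable names are chosen only for this illustration.

```r
# Tweet counts per class as reported in the study
counts <- c(positive = 620, negative = 208, neutral = 142)

# Total number of classified tweets
total <- sum(counts)

# Share of each class in percent, rounded to one decimal place
shares <- round(100 * counts / total, 1)
print(total)
print(shares)
```

The three counts add up to the 970 collected tweets, so the shares together cover all classified opinions.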

Fig. 1. Research Flowchart

Fig. 3. The 10 Words That Appear the Most and Their Number

Table 2. Examples of Positive and Negative Words

Table 3. Text that has Passed the Text Preprocessing Stage

Table 4. Sentiment Analysis Results (Table of Sentiment Analysis in the Software)