1. Introduction
The use of e-learning platforms has grown from year to year. Coursera had 21 million registered users in 2016, 28 million in 2017, and 35 million in 2018, an increase of 7 million per year. In 2020, with the Covid-19 pandemic, the number of people participating in online learning increased dramatically. Registration data from Coursera show this clearly: there were 76 million registrants in 2019 and 143 million in 2020, so the number of registrations nearly doubled. From 2020 to 2021 the figure rose by roughly a further 30%, to 189 million registrants. According to Coursera data, Indonesia is among the 10 countries with the largest growth in learners on the platform, ranking fifth with an increase of 69% and 789,000 learners in 2021 (WEF, 2022).
After the WHO officially declared Covid-19 an international pandemic, the impact was felt in all fields, especially education. Face-to-face learning could not be carried out during the pandemic, so many countries changed from traditional learning to online learning in order to stop the transmission of Covid-19. Online learning was adopted quickly in many countries so that teaching and learning could continue even without face-to-face contact.
These data show that online learning platforms are beginning to be trusted and can be an option for gaining expertise in various fields. In Indonesia, the use of online learning platforms such as Coursera has also increased, and many startups have adopted the same concept. One of them is Zenius, a company that provides online tutoring in video format for elementary, junior high, and high school students and for students preparing for college entrance. Zenius.net-based learning can motivate students and increase their conceptual understanding by 60% on average ideal scores [1]. Ruangguru is another company that provides online learning for elementary, junior high, and high school students and for college entrance preparation. The system provided by Ruangguru creates new learning behaviors in Indonesia and a technology-based learning experience for the future; through Ruangguru products, students and teachers can learn critical thinking, creativity, collaboration, and communication [2]. In addition, there are online learning platforms such as Dicoding, which is used for education in the field of technology, Foodizz, which is used for culinary business education, and many others.
The potential of online learning has grown significantly, especially during the Covid-19 pandemic, which changed the way people learn. However, providing personalized content for learners remains a problem in online learning. Helping learners find the right material according to their needs and interests is an important problem for platform providers to solve [3]. A Recommendation System can help distance learners get personalized material, which can increase learning efficiency, satisfy the needs of learners, and offer learner-centered services [4].
In the era of Industrial Revolution 4.0, Machine Learning has become popular in various fields because of its ability to learn from past data and make smart decisions [5]. The Recommendation System, which is part of machine learning, is one solution for providing personalized material in online learning. Recommendation Systems have been widely developed in many fields, such as music, film, news, and products in general, and have been widely applied by large companies to meet the needs of customers. LinkedIn, Amazon, and Netflix are just a few examples. LinkedIn recommends relevant connections: people the user might know and who fit their profile. Amazon suggests products that are relevant to what customers like. Netflix recommends movies or series that match what customers watch and like [6].
Recommendation Systems have a big impact on business. At Netflix, 75% of what customers watch comes from the Recommendation System; YouTube reports that 60% of clicks on its homepage come from the Recommendation System; and in 2006 the CEO of Amazon stated that 35% of its sales came from cross-selling, that is, from the Recommendation System. Researchers estimate the business value of recommendations and personalization at more than $1 billion per year. The potential of a Recommendation System for personalized online learning can therefore also have a big business impact on a company. For this reason, the researchers chose a Recommendation System using Content-Based Filtering with the Jaccard Similarity and Cosine Similarity algorithms as the subject of this research [7].
Content-Based Filtering aims to group products with similar attributes, based on the preferences of users. A Recommendation System with Content-Based Filtering then suggests different items that have the same attributes [8]. The Jaccard Similarity and Cosine Similarity algorithms were used in this study because they are widely used in building Recommendation Systems. The Cosine Similarity algorithm measures the similarity of two vectors by the cosine of the angle between them. The closer the cosine value is to 1, the smaller the angle between the two vectors in space. In a previous study, however, the accuracy of the Recommendation System was very low. The researchers attributed this to a data set that was too limited, since it used only titles and descriptions from Netflix. They suggest using more complete attributes in the data set, such as show duration, Netflix ratings, prominent cast, and others, so that the Recommendation System can be more accurate [9].
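The cosine measure described above can be sketched in a few lines of plain Python. This is an illustration only, not the implementation used in the study: the function name `cosine_similarity`, the handling of all-zero vectors, and the three example vectors are assumptions introduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention assumed for this sketch: an all-zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical binary attribute vectors for three items.
item_a = [1, 1, 0, 1]
item_b = [1, 1, 0, 1]  # same attributes as item_a
item_c = [0, 0, 1, 0]  # no attribute in common with item_a

print(cosine_similarity(item_a, item_b))  # close to 1: angle near zero
print(cosine_similarity(item_a, item_c))  # 0.0: vectors are orthogonal
```

Items with identical attributes give a value near 1, and items that share no attribute give 0, which matches the reading that a value closer to 1 means a smaller angle.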
The Jaccard Similarity algorithm is also widely used in building Recommendation Systems. Jaccard Similarity is a popular method for calculating similarities between users or items. The calculation considers only the number of ratings that the two users have in common, so the method is most beneficial when the number of common ratings is large [10]. Another study built a Recommendation System for a Learning Management System (LMS) using a data set collected over 4 months from undergraduate students with computer science backgrounds. The data set consists of 480 students, 468 descriptions of learning objects, and 8200 student ratings. Four algorithms were compared, namely Pearson Correlation Similarity (PCC), Cosine Vector Similarity (CVS), Jaccard Similarity Correlation (JSC), and Euclidean Distance Similarity (EDS), using Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) as evaluation metrics. The best algorithm in that study was Cosine Vector Similarity (CVS), followed by Jaccard Similarity Correlation (JSC), Euclidean Distance Similarity (EDS), and finally Pearson Correlation Similarity (PCC) [11]. Based on these previous studies, the researchers chose the Jaccard Similarity and Cosine Similarity algorithms to build the Recommendation System.
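As a small illustration of the Jaccard measure, the sketch below uses its usual set form: the size of the intersection divided by the size of the union. The function name `jaccard_similarity`, the choice to return 0.0 for two empty sets, and the example skill sets are assumptions made for this example and are not taken from the cited studies.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention assumed for this sketch: two empty sets are not similar.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical attribute sets for two items.
course_x = {"python", "statistics", "machine learning"}
course_y = {"python", "machine learning", "sql", "data visualization"}

# 2 shared attributes out of 5 distinct attributes overall.
print(jaccard_similarity(course_x, course_y))  # 0.4
```

Because only shared and distinct elements are counted, the score becomes more informative as the two sets have more elements in common.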
Data sets are also an important part of building a Recommendation System. The amount of data has increased dramatically with the number of internet users, but not all data available on the internet is useful or gives satisfactory results for users. Researchers create Recommendation Systems to provide relevant information to users by taking their past preferences into account, so that data is filtered and customized to each user's needs. With a large amount of data, the Recommendation System also becomes more accurate [6]. In this study, the researchers used a qualitative data set obtained from Kaggle, namely the data set of the free courses available on Coursera. It consists of 8 variables: URL, price, institution, title, skill you will gain, ratings, reviews, and level type duration. The data consists of 975 instances. Two variables are used to create the Recommendation System: title and skill you will gain. The "skill you will gain" variable is used as the attribute in this study and is vectorized into a binary format using Python. The data is processed in the Python programming language through Google Colaboratory.
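The binary vectorization step might look like the following sketch. The storage format of the "skill you will gain" variable is not described above, so this example assumes a comma-separated string per course; the helper names `build_vocabulary` and `to_binary_vector` and the three sample courses are hypothetical.

```python
def build_vocabulary(skill_strings, sep=","):
    """Sorted list of every distinct skill found in the input strings."""
    skills = set()
    for text in skill_strings:
        for skill in text.split(sep):
            skill = skill.strip().lower()
            if skill:
                skills.add(skill)
    return sorted(skills)


def to_binary_vector(text, vocab, sep=","):
    """1 where the course lists the skill, 0 otherwise, in vocabulary order."""
    present = {s.strip().lower() for s in text.split(sep) if s.strip()}
    return [1 if skill in present else 0 for skill in vocab]


# Hypothetical rows: course title mapped to its skills string.
courses = {
    "Intro to Data Analysis": "Python, Statistics, Data Visualization",
    "Machine Learning Basics": "Python, Machine Learning, Statistics",
    "Digital Marketing": "SEO, Social Media",
}

vocab = build_vocabulary(courses.values())
vectors = {title: to_binary_vector(skills, vocab)
           for title, skills in courses.items()}

print(vocab)
for title, vector in vectors.items():
    print(title, vector)
```

Each course becomes a fixed-length vector of 0s and 1s over the same vocabulary, which is the form that both Jaccard Similarity and Cosine Similarity can compare directly.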
The Jaccard Similarity and Cosine Similarity algorithms are applied by importing Scikit-Learn, a machine learning library for the Python programming language. The test results are assessed from distance measurement values obtained with SciPy, another Python library, which provides algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics, and many other classes of problems.
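For the distance values, one possible sketch uses `scipy.spatial.distance`, whose `jaccard` and `cosine` functions return dissimilarities, so that similarity is one minus the returned value. The exact calls used in this study are not stated, so the choice of these two functions and the two example vectors are assumptions; the example requires NumPy and SciPy to be installed.

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical binary attribute vectors for two courses.
u = np.array([1, 1, 0, 1], dtype=bool)
v = np.array([1, 0, 0, 1], dtype=bool)

# Both functions return a distance (dissimilarity), not a similarity.
jaccard_dist = distance.jaccard(u, v)
cosine_dist = distance.cosine(u.astype(float), v.astype(float))

# Convert to similarities: a smaller distance means more similar items.
jaccard_sim = 1.0 - jaccard_dist
cosine_sim = 1.0 - cosine_dist

print("Jaccard distance:", jaccard_dist, "similarity:", jaccard_sim)
print("Cosine distance:", cosine_dist, "similarity:", cosine_sim)
```

Here the two vectors share two of three active attributes, so the Jaccard similarity is 2/3, while the cosine similarity is 2 divided by the product of the vector lengths.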
Research using Content-Based Filtering has most often been applied to film recommendation. One study used the MovieLens data set, which contained 9126 films classified into a total of 11 genres, with movie ratings collected from 671 users. The algorithm used in that study was Euclidean distance: distances to other users were calculated, and recommendations were drawn from those with the minimum values. The results show the various movies recommended to users based on their previous behavior patterns [6]. Another study aims to offer general recommendations for each user, based on the popularity and/or genre of the film. The algorithm used is Cosine Similarity, which is beneficial because it helps in finding the similarity of objects. After the Cosine Similarity matrix was computed, the researchers used a KNN function to find the nearest neighbors of a film to recommend to the user. The researchers state that the KNN algorithm is implemented in the model along with Cosine Similarity because it provides more accuracy than other distance metrics and its complexity is relatively low as well [12].
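The combination of Cosine Similarity with a nearest-neighbor lookup can be illustrated with a brute-force search in plain Python. This is a sketch under stated assumptions, not the cited study's model: the function names, the tie-breaking by title, and the four film vectors are hypothetical, and a library routine such as a KNN search would normally replace the explicit loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for all-zero vectors
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_title, vectors, k=2):
    """Return the k items most similar to query_title as (title, score) pairs."""
    query = vectors[query_title]
    scored = [
        (cosine_similarity(query, vector), title)
        for title, vector in vectors.items()
        if title != query_title  # never recommend the item itself
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(title, score) for score, title in scored[:k]]


# Hypothetical binary genre vectors for four films.
films = {
    "Film A": [1, 1, 0, 0],
    "Film B": [1, 1, 1, 0],
    "Film C": [0, 0, 1, 1],
    "Film D": [1, 0, 0, 0],
}

for title, score in nearest_neighbors("Film A", films, k=2):
    print(title, round(score, 3))
```

For "Film A", the search returns "Film B" and then "Film D", since those two share the most genre attributes with the query film, while "Film C" shares none.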