
A Simplified Comparative Study of Machine Learning Classifiers

Abstract

Machine Learning is an emerging field that aims to enable machines to:

  • make predictions about the future
  • classify information so that people can make better decisions.

Several machine learning algorithms are available. An ML algorithm learns from past experience by analyzing historical data; once it has been trained well enough, it can make good predictions about new data [1]. Classification is a supervised machine learning approach in which each data point is assigned to one of a set of predefined classes. In other words, classification predicts the class of each given data point and labels it accordingly.

Several classification algorithms are available for classifying data, for example Logistic Regression, Naïve Bayes, KNN and Decision Trees. In this paper, I have applied seven classification algorithms to the same dataset and then compared the results using performance evaluation measures such as accuracy, precision, recall and the number of incorrect predictions.

Introduction

Whenever a problem is to be solved using machine learning, we cannot know in advance which ML algorithm will perform best. By observing the nature of the problem, one can usually identify which type of algorithm is needed: regression or classification. Choosing the specific regression or classification algorithm that will perform best, however, is difficult. The practical way to choose is to evaluate the performance of candidate algorithms on the problem in advance and then select the most promising ones to move forward with.

The main purpose of this paper is to simplify the task of choosing the best algorithm(s) for a given problem. Seven classification ML algorithms are compared here:

  1. Logistic Regression
  2. K-Nearest Neighbors
  3. Support Vector Machine (SVM)
  4. Kernel SVM
  5. Naive Bayes
  6. Decision Trees
  7. Random Forest

The problem is a standard binary classification problem based on the Loan Dataset for loan prediction. This dataset has 615 instances, 13 attributes and 2 classes.

Related Work

Most research comparing classifiers falls into two main categories:

  1. Relatively few classifiers are compared and validated in order to justify the need for a new approach (e.g. […]).
  2. Many classifiers are compared systematically, both qualitatively and quantitatively.

For example, […] performed qualitative analyses of many different classifiers, describing the advantages and limitations of each method, while […] performed a quantitative analysis of classifiers.

Most researchers have tried to find out which classifiers are most suitable for the problems under discussion (see e.g. […]), but very few of these studies compare the performance of the classifiers quantitatively.

Moreover, these classifiers are mostly analyzed and compared across multiple datasets. This paper simplifies the analysis by applying the different classifiers to the same dataset, so that ML beginners can easily get an overview of how these algorithms perform.

Research Methodology

Selected Dataset

I selected a loan classification dataset from Kaggle. It has 615 instances, 13 attributes and 2 classes.

This dataset can be downloaded from this link:

Loan Data Set

This dataset was originally provided by Dream Housing Finance, a company that deals in home loans.

Selected Classification Algorithms

I applied the following algorithms to the same loan dataset:

  1. Logistic Regression
  2. K-Nearest Neighbors
  3. Support Vector Machine (SVM)
  4. Kernel SVM
  5. Naive Bayes
  6. Decision Trees
  7. Random Forest

Kaggle is a platform for analytics and predictive modeling that also hosts data science competitions from around the world. It provides powerful tools and resources for testing projects. The project code is written in Python.

The project steps are as follows:

Step 1: Importing the required libraries and loading the dataset

Step 2: Data preprocessing (filling missing values, converting non-numeric attributes to numeric, etc.)
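The paper does not show its loading or preprocessing code, so the following is only a minimal, dependency-free sketch of Steps 1 and 2. It assumes a CSV file whose column names follow the public loan-prediction data (`Gender`, `LoanAmount`, `Credit_History`, `Loan_Status`, etc.); those column lists, the helper names `load_rows` and `preprocess`, and the sample rows are all hypothetical. Missing numeric values are filled with the column median, missing categorical values with the most frequent value, and each category is then mapped to an integer code. The original project may well have used a library such as pandas and a different imputation strategy.

```python
import csv
import io
import statistics

# Hypothetical column lists, assumed to match the public loan-prediction CSV.
NUMERIC = ["ApplicantIncome", "LoanAmount", "Loan_Amount_Term"]
CATEGORICAL = ["Gender", "Married", "Credit_History", "Loan_Status"]


def load_rows(file_obj):
    """Step 1: read CSV rows into a list of dicts; empty strings mark missing values."""
    return list(csv.DictReader(file_obj))


def preprocess(rows):
    """Step 2: fill missing values, then convert categorical text to integer codes."""
    for col in NUMERIC:
        present = [float(r[col]) for r in rows if r[col] != ""]
        median = statistics.median(present)  # less sensitive to outliers than the mean
        for r in rows:
            r[col] = float(r[col]) if r[col] != "" else median
    encodings = {}
    for col in CATEGORICAL:
        present = [r[col] for r in rows if r[col] != ""]
        mode = statistics.mode(present)  # most frequent category fills the gaps
        codes = {value: i for i, value in enumerate(sorted(set(present)))}
        encodings[col] = codes
        for r in rows:
            r[col] = codes[r[col] if r[col] != "" else mode]
    return rows, encodings


# Made-up sample rows for illustration only (not taken from the real dataset).
SAMPLE_CSV = """Gender,Married,ApplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Loan_Status
Male,No,5000,,360,1,Y
Male,Yes,4500,130,360,1,N
,Yes,3000,66,360,1,Y
Female,No,2500,120,360,,Y
Male,Yes,6000,141,,0,N
"""

rows, encodings = preprocess(load_rows(io.StringIO(SAMPLE_CSV)))
print(rows[0]["LoanAmount"])  # 125.0 (median of 66, 120, 130, 141)
print(encodings["Loan_Status"])  # {'N': 0, 'Y': 1}
```

Sorting the distinct categories before numbering them makes the integer codes deterministic, so the same text value always maps to the same code across runs.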

Step 3: Visualizing the dataset
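The paper does not say which plots were drawn. One useful first check is the class balance, because it affects how accuracy should be interpreted. A plotting library such as matplotlib would normally be used here; as a dependency-free stand-in, the hypothetical helper `text_bar_chart` below prints the class counts as a text bar chart. The labels in the example are made up.

```python
from collections import Counter


def text_bar_chart(labels, width=40):
    """Return one text line per class: label, a bar scaled to the largest class, and its count."""
    counts = Counter(labels)
    largest = max(counts.values())
    lines = []
    for label, count in sorted(counts.items()):
        bar = "#" * round(width * count / largest)
        lines.append(f"{label:>3} | {bar} {count}")
    return lines


# Made-up labels for illustration; these are not the dataset's real class counts.
example_labels = ["Y", "Y", "N", "Y", "N", "Y"]
for line in text_bar_chart(example_labels, width=8):
    print(line)
# prints:
#   N | #### 2
#   Y | ######## 4
```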

Step 4: Splitting the dataset

This dataset has around 615 records. I used 80% of them to train the models and the remaining 20% to evaluate each chosen classifier in turn. Because the dataset has many columns, I trained the models only on the most influential fields: the income fields, loan amount, loan duration and credit history.
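A minimal sketch of this step, assuming the rows have already been preprocessed into dicts: shuffle with a fixed seed, hold out 20% for testing, and keep only the selected fields. The names in `FEATURES` are an assumed mapping of "income fields, loan amount, loan duration and credit history" onto the public dataset's columns, and `split_train_test` and `to_xy` are hypothetical helpers, not the author's code.

```python
import random

# Assumed mapping of the selected fields onto the public dataset's column names.
FEATURES = ["ApplicantIncome", "CoapplicantIncome", "LoanAmount",
            "Loan_Amount_Term", "Credit_History"]
TARGET = "Loan_Status"


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows with a fixed seed and hold out test_fraction of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def to_xy(rows):
    """Keep only the selected feature columns (X) and the class label (y)."""
    X = [[row[f] for f in FEATURES] for row in rows]
    y = [row[TARGET] for row in rows]
    return X, y


# Synthetic stand-in rows, one per record mentioned in the text (values are meaningless).
rows = [{"Loan_ID": i, **{f: float(i) for f in FEATURES}, TARGET: i % 2}
        for i in range(615)]
train, test = split_train_test(rows)
X_train, y_train = to_xy(train)
X_test, y_test = to_xy(test)
print(len(train), len(test))  # 492 123
```

Fixing the seed makes the split reproducible, so every classifier is evaluated on exactly the same 20% of records. A stratified split, which keeps the ratio of the two classes equal in both parts, is a common alternative when the labels are imbalanced.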

Step 5: Applying the classifiers one by one. In this step I applied each of the classifiers listed above and then evaluated its performance using a confusion matrix.
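The paper names the seven classifiers but not the library used to run them. The sketch below only illustrates the "one by one" pattern: every classifier receives the same training and test data, and its incorrect predictions are counted. To stay dependency-free it uses two stand-ins: a hand-written k-nearest-neighbours classifier (one of the seven) and a majority-class baseline, which is not part of the study and is included only as a reference point. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def majority_classifier(X_train, y_train, X_test):
    """Reference baseline: always predict the most frequent training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common for _ in X_test]


def knn_classifier(X_train, y_train, X_test, k=3):
    """Plain k-nearest neighbours: Euclidean distance, majority vote of the k closest."""
    predictions = []
    for x in X_test:
        nearest = sorted(range(len(X_train)),
                         key=lambda i: math.dist(x, X_train[i]))[:k]
        votes = Counter(y_train[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def evaluate_all(classifiers, X_train, y_train, X_test, y_test):
    """Run each classifier on the same split and count its incorrect predictions."""
    report = {}
    for name, classify in classifiers.items():
        y_pred = classify(X_train, y_train, X_test)
        report[name] = sum(p != t for p, t in zip(y_pred, y_test))
    return report


# Toy two-feature data, made up for illustration.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [6, 6]]
y_train = [0, 0, 0, 1, 1, 1, 1]
X_test = [[0.5, 0.5], [5.5, 5.5]]
y_test = [0, 1]

classifiers = {"Majority baseline": majority_classifier, "k-NN (k=3)": knn_classifier}
print(evaluate_all(classifiers, X_train, y_train, X_test, y_test))
# {'Majority baseline': 1, 'k-NN (k=3)': 0}
```

Because k-NN relies on raw Euclidean distances, features on large scales (such as income) would dominate small-scale ones (such as credit history) unless the features are scaled first.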

Results

In classification problems, the predicted results can be compared with the actual results using a confusion matrix. In simple terms, a confusion matrix gives the counts of correct and incorrect predictions for each class.
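The four counts of a binary confusion matrix are enough to derive every measure used in the comparison. The sketch below assumes labels encoded as 0/1, with 1 (loan approved) treated as the positive class; the paper does not state which class it treated as positive, and precision and recall change if the other class is chosen. The function names `binary_confusion_counts` and `metrics` and the example labels are hypothetical.

```python
def binary_confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) counts for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Derive the comparison measures from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "incorrect": fp + fn,
    }


# Made-up labels for illustration.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
counts = binary_confusion_counts(y_true, y_pred)
print(counts)            # (4, 1, 1, 2)
print(metrics(*counts))  # {'accuracy': 0.75, 'precision': 0.8, 'recall': 0.8, 'incorrect': 2}
```

Incorrect predictions are simply FP + FN, so on a fixed test set, ranking classifiers by fewest incorrect predictions gives the same order as ranking them by accuracy.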

Comparative Analysis

The bar charts below show the comparison parameters from Table 2. Logistic Regression outperforms the other classifiers, as it has higher accuracy and fewer incorrect predictions.

Conclusion

Classification is a major field of machine learning that deals with categorizing given data into different classes.

There are a large number of classification algorithms, so finding out in advance which algorithm will solve a given machine learning problem effectively is a difficult and technical task, especially for ML beginners. I applied seven different classifiers to a single dataset to get an idea of which classifier is best for this dataset.

Based on quality measures such as accuracy, precision, recall and the number of incorrect predictions, I concluded that Logistic Regression fits this dataset best.

References

  1. Expert System Team (2020) What is Machine Learning? A definition. Expert System blog, 6 May 2020.
  2. Aasim O. Machine Learning Project 17 – Compare Classification Algorithms.
  3. Yang J, Frangi AF, Yang JY, Zhang D, Jin Z (2005) KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 27: 230–244.
  4. Bezdek JC, Chuah SK, Leep D (1986) Generalized k-nearest neighbor rules. Fuzzy Sets and Systems 18: 237–256.
  5. Seetha H, Narasimha MM, Saravanan R (2011) On improving the generalization of SVM classifier. Communications in Computer and Information Science 157: 11–20.
  6. Fan L, Poh K-L (1999) Improving the Naïve Bayes classifier. Encyclopedia of Artificial Intelligence 879–883.
  7. Tsang IW, Kwok JT, Cheung P-K (2005) Core vector machines: fast SVM training on very large data sets. Journal of Machine Learning Research 6: 363–392.
  8. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22: 4–37.
  9. Wu X, Kumar V, Quinlan JR, Ghosh J, et al. (2007) Top 10 algorithms in data mining. Springer-Verlag.
  10. Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31: 249–268.
  11. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7: 1–30.
  12. Howell AJ, Buxton H (2002) RBF network methods for face detection and attentional frames. Neural Processing Letters 15: 197–211.
  13. Darrell T, Indyk P, Shakhnarovich G (2006) Nearest-neighbor methods in learning and vision: theory and practice. MIT Press.
  14. Huang L-C, Hsu S-E, Lin E (2009) A comparison of classification methods for predicting Chronic Fatigue Syndrome based on genetic data. Journal of Translational Medicine 7: 81.
