    A Simplified Comparative Study of Machine Learning Classifiers


    Machine Learning is an emerging field that aims to make machines:

    • make predictions about the future
    • classify information so that people can make better decisions

    Several machine learning algorithms are available. An ML algorithm learns from past experience by analyzing historical data; once trained in this way, it can make good predictions about future cases [1]. Classification is a supervised machine learning approach in which each data point is assigned to one of a set of predefined classes. In other words, classification predicts the class of, and labels, the given data points.

    Several classification algorithms are available for classifying a given dataset, for example Logistic Regression, Naïve Bayes, KNN, and Decision Trees. In this paper I apply seven classification algorithms to the same dataset and then compare the results using performance evaluation measures such as precision, accuracy, recall, and the number of incorrect predictions.


    Whenever a problem is to be solved using machine learning, we cannot be sure in advance which ML algorithm will perform best [2]. By observing the nature of the problem, one can easily identify the type of algorithm to use, i.e. whether it is a regression task or a classification task. But choosing the exact regression or classification algorithm that will outperform the others is difficult. The only reliable way is to check the performance of several candidate algorithms in advance and then select the best one(s) to move forward with.

    The main purpose of this paper is to simplify the task of choosing the best algorithm(s) for your problem. Here seven classification ML algorithms are compared:

    1. Logistic Regression
    2. K-Nearest Neighbors
    3. Support Vector Machine (SVM)
    4. Kernel SVM
    5. Naive Bayes
    6. Decision Trees
    7. Random Forest

    The problem is loan prediction, a standard binary classification task on a dataset called the Loan Dataset. This dataset has 615 instances, 13 attributes, and 2 classes.

    Related Work

    Most research work dealing with classifier comparison falls into two main categories:

    1. Relatively few classifiers are compared and validated in order to justify the need for a new approach (e.g. [3]-[7]).
    2. Many classifiers are compared in a systematic way, both qualitatively and quantitatively.

    For example, [8], [9], and [10] have carried out qualitative analyses of many different classifiers, describing the advantages and limitations of each method, while [11] has carried out a quantitative analysis of classifiers.

    Most researchers conduct such studies in order to find out which classifiers are more suitable for the problems under discussion (see e.g. [12]-[14]), but very few compare the performance of these classifiers quantitatively.

    Moreover, these classifiers are mostly analyzed and compared across multiple datasets. This paper simplifies the analysis by applying the different classifiers to a single shared dataset, so that ML beginners can easily get an overview of how these algorithms perform.

    Research Methodology

    Selected Dataset

    I selected a loan classification dataset from Kaggle. This dataset has 615 instances, 13 attributes, and 2 classes.

    The dataset was originally provided by Dream Housing Finance, a company dealing in all kinds of home loans.

    Selected Classification Algorithms

    I used the following algorithms on the same loan dataset:

    1. Logistic Regression
    2. K-Nearest Neighbors
    3. Support Vector Machine (SVM)
    4. Kernel SVM
    5. Naive Bayes
    6. Decision Trees
    7. Random Forest

    Kaggle is a platform for analytics and predictive modeling that hosts data science competitions around the world; it also provides powerful tools and resources for testing projects. Python is used for the project code.

    The project steps are as follows:

    Step 1: Import required libraries and Load the dataset

    Step 2: Data preprocessing (filling missing values, converting non-numeric attributes to numeric, etc.).
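    As a minimal sketch of this preprocessing step using pandas (the column names and values below are illustrative stand-ins, not the actual Loan Dataset schema):

    ```python
    import pandas as pd

    # Small stand-in frame with the kinds of gaps the real dataset has
    df = pd.DataFrame({
        "Gender": ["Male", "Female", None, "Male"],
        "LoanAmount": [120.0, None, 66.0, 141.0],
        "Loan_Status": ["Y", "N", "Y", "Y"],
    })

    # Fill missing values: mode for categorical, median for numeric
    df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
    df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())

    # Convert non-numeric attributes to numeric codes
    df["Gender"] = df["Gender"].map({"Male": 1, "Female": 0})
    df["Loan_Status"] = df["Loan_Status"].map({"Y": 1, "N": 0})

    print(df.isna().sum().sum())  # 0 missing values remain
    ```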

    Step 3: Visualizing the dataset

    Step 4: Splitting the dataset

    This dataset has around 615 records. I used 80% of them to train each model and the remaining 20% to evaluate the chosen classifiers one by one. As the dataset has many columns, I trained the models on the most influential fields: the income fields, loan amount, loan duration, and credit history.
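    The 80/20 split can be sketched with scikit-learn's `train_test_split` (the feature matrix below is a synthetic stand-in, since the real feature columns are not reproduced here):

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split

    # Stand-in for the 615 records x 4 selected features
    X = np.arange(615 * 4).reshape(615, 4)
    y = np.random.RandomState(0).randint(0, 2, size=615)

    # 80% for training, 20% held out for evaluation
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, random_state=0)

    print(len(X_train), len(X_test))  # 492 123
    ```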

    Step 5: Applying the classifiers one by one. In this step I applied each of the classifiers listed above and then evaluated model performance using a confusion matrix.
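    The seven classifiers can be fitted in one loop over a common train/test split. The sketch below assumes scikit-learn with default-ish hyperparameters and synthetic stand-in data, since the paper's actual code and settings are not shown:

    ```python
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in for the 615-record, 4-feature loan data
    X, y = make_classification(n_samples=615, n_features=4, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, random_state=0)

    classifiers = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "K-Nearest Neighbors": KNeighborsClassifier(),
        "SVM (linear)": SVC(kernel="linear"),
        "Kernel SVM (RBF)": SVC(kernel="rbf"),
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(random_state=0),
    }

    # Fit each classifier on the same split and record test accuracy
    scores = {}
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        scores[name] = clf.score(X_test, y_test)
    ```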


    In classification problems, the predicted results can be compared with the actual results using a confusion matrix. In simple words, the confusion matrix counts the correct and incorrect predictions.
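    For a binary problem, the confusion matrix and the quality measures used in this paper can be read off as follows (the labels here are a small illustrative example, not the paper's actual predictions):

    ```python
    from sklearn.metrics import (confusion_matrix, accuracy_score,
                                 precision_score, recall_score)

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # actual classes
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # predicted classes

    # Counts of true negatives, false positives, false negatives, true positives
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    incorrect = fp + fn  # number of incorrect predictions

    print(tn, fp, fn, tp)                   # 3 1 1 3
    print(accuracy_score(y_true, y_pred))   # 0.75
    print(precision_score(y_true, y_pred))  # 0.75
    print(recall_score(y_true, y_pred))     # 0.75
    ```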

    Comparative Analysis

    Bar charts of the comparison parameters from Table 2 are given below. Logistic Regression outperforms the other classifiers, as it has the highest accuracy and the fewest incorrect predictions.


    Classification is a major field of machine learning, dealing with the categorization of given data into different classes.

    A large number of classification algorithms exist, so it is a difficult and technical task (especially for ML beginners) to determine in advance which algorithm will solve a given machine learning problem effectively. I applied seven different classifiers to a single dataset in order to find out which classifier is best for that dataset.

    Based on different quality measures (accuracy, precision, recall, and the number of incorrect predictions), I concluded that Logistic Regression fits this dataset best.


    1. Expert System Team (2020) What is Machine Learning? A definition. Machine learning blog, 6 May 2020.
    2. Aasim O. Machine Learning Project 17: Compare Classification Algorithms.
    3. Yang J, Frangi AF, Yang JY, Zhang D, Jin Z (2005) KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 27: 230–244.
    4. Bezdek JC, Chuah SK, Leep D (1986) Generalized k-nearest neighbor rules. Fuzzy Sets and Systems 18: 237–256.
    5. Seetha H, Narasimha MM, Saravanan R (2011) On improving the generalization of SVM classifier. Communications in Computer and Information Science 157: 11–20.
    6. Fan L, Poh K-L (1999) Improving the Naïve Bayes classifier. Encyclopedia of Artificial Intelligence 879–883.
    7. Tsang IW, Kwok JT, Cheung P-K (2005) Core vector machines: fast SVM training on very large data sets. Journal of Machine Learning Research 6: 363–392.
    8. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22: 4–37.
    9. Wu X, Kumar V, Quinlan JR, Ghosh J (2007) Top 10 algorithms in data mining. Springer-Verlag.
    10. Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31: 249–268.
    11. Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7: 1–30.
    12. Howell AJ, Buxton H (2002) RBF network methods for face detection and attentional frames. Neural Processing Letters 15: 197–211.
    13. Darrell T, Indyk P, Shakhnarovich F (2006) Nearest neighbor methods in learning and vision: theory and practice. MIT Press.
    14. Huang L-C, Hsu S-E, Lin E (2009) A comparison of classification methods for predicting Chronic Fatigue Syndrome based on genetic. Journal of Translational Medicine 7: 81.
