Abstract
New mobile phones are released almost every day, and prices vary widely across the products offered in the market, so there is a need for a system that predicts mobile phone prices. A mobile price classifier can help a customer estimate the selling price of a phone. Mobile price classifiers already exist; this work aims to apply more sophisticated artificial intelligence techniques to maximize accuracy. To the same end, more instances have been added to the data set. This work uses Random Forest Regression, Logistic Regression, the Naive Bayes classifier and Support Vector Machines, with the aim of providing better prediction accuracy. Regression is a machine learning tool that makes predictions by learning, from existing data, the relationships between a target parameter and a set of other parameters.
Keywords: Machine Learning, Logistic Regression, Random Forest Regression, Decision Tree, Naïve Bayes, Support Vector Machines.
Introduction
Mobile phones are portable, wireless electronic devices used for communication. A few years ago, when mobile phones were not very common, they were expensive and communication was costly for the user. Over the last two decades, as the use of phones has increased, their cost has decreased considerably. Mobile phones are now cheaper, easy to use, and equipped with almost every feature we desire. Factors that decide the price of a mobile phone include its specifications, such as the brand, the speed of the processor, the display and the operating system. This project focuses on predicting mobile phone prices from the specifications provided by the user. Machine learning techniques such as Logistic Regression, Random Forest Regression and Naive Bayes classification are used. The idea of the project is to classify mobile phones according to their prices. Prices fall into four categories, or classes: Class 0, Class 1, Class 2 and Class 3. Class 0 has the lowest prices and Class 3 the highest. Using the techniques stated above, we can predict the class of a mobile phone.
Related Work
Researchers face two major challenges when predicting mobile prices. The first is finding the features best suited to predicting the price; examples include the battery, 4G connectivity and RAM specifications. The second is finding classification algorithms that predict the price class of a phone accurately. This work uses various classification algorithms such as Logistic Regression, Decision Trees, Random Forest and Neural Networks.
Work Progress
The whole process is divided into four parts. They are feature definition, feature extraction, application of machine learning techniques and performance improvement.
- Feature Definition
The dataset was taken from the website Kaggle.com.
- Feature extraction
Feature extraction involves scaling the variables to a common range so that the magnitude of one variable does not dominate the others.
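The scaling step can be illustrated with min-max scaling, which is one common choice and is assumed here; the source does not say which scaling method was used. The column names and values below are hypothetical.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical feature columns with very different magnitudes.
battery_mah = [1000, 2500, 4000]
ram_gb = [2, 4, 8]

scaled_battery = min_max_scale(battery_mah)
scaled_ram = min_max_scale(ram_gb)
```

After scaling, both columns lie in [0, 1], so the battery capacity no longer outweighs the RAM size purely because of its larger units.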
- Data preprocessing
Data preprocessing cleans the raw data and converts it into a clean data set. It involves removing outliers and null values.
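A minimal sketch of these two cleaning steps follows. It assumes rows are stored as dictionaries and uses the 1.5 × IQR rule to flag outliers; both the data layout and the outlier rule are assumptions for illustration, not the procedure used in this work.

```python
import statistics


def drop_nulls(rows):
    """Keep only rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def drop_outliers(rows, key, k=1.5):
    """Drop rows whose `key` value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [row[key] for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[key] <= high]


# Hypothetical raw data: one missing battery value, one implausible RAM value.
raw = [
    {"ram": 2, "battery": 1500},
    {"ram": 3, "battery": None},
    {"ram": 3, "battery": 1800},
    {"ram": 4, "battery": 2000},
    {"ram": 4, "battery": 2500},
    {"ram": 6, "battery": 3000},
    {"ram": 64, "battery": 3200},
]

cleaned = drop_outliers(drop_nulls(raw), key="ram")
```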
- Machine Learning Techniques
Logistic Regression – A popular classification algorithm used to classify data into two or more classes. It applies the sigmoid function to a linear combination of the features to obtain a class probability, so the resulting decision boundary is linear. The sigmoid function is h(x) = 1/(1 + e^(-x)). Logistic regression can also classify data into more than two classes; two methods are generally used for this: one-vs-all and multinomial. The types of logistic regression are:
- Ordinal logistic regression
- Multinomial logistic regression
- Binomial logistic regression
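The sigmoid function and the one-vs-all scheme can be sketched as follows. The weights and biases are hypothetical hand-picked numbers for four price classes, not parameters learned from the dataset.

```python
import math


def sigmoid(z):
    """h(z) = 1 / (1 + e^(-z)), mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_one_vs_all(features, weights_per_class, bias_per_class):
    """Score each class with its own binary model and return the best class index."""
    scores = []
    for weights, bias in zip(weights_per_class, bias_per_class):
        z = sum(w * x for w, x in zip(weights, features)) + bias
        scores.append(sigmoid(z))
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical parameters for Class 0 to Class 3 over two scaled features.
weights = [[-4.0, -4.0], [-1.0, -1.0], [1.0, 1.0], [4.0, 4.0]]
biases = [2.0, 0.5, -0.5, -2.0]

high_end = predict_one_vs_all([0.9, 0.8], weights, biases)
low_end = predict_one_vs_all([0.1, 0.1], weights, biases)
```

Each binary model answers "this class or any other?", and the class whose model is most confident is chosen.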
Random Forest Regression – Random forest methods are generally used to reduce the overfitting of decision trees. A random forest combines the predictions of a collection of decision trees and, for classification, predicts the majority result obtained from them. The decision trees are fitted with randomness. We can write this class of models as:
g(x) = f0(x) + f1(x) + f2(x) +…
where the final model, g, is the sum of simple base models fi. Here, each base model is a simple decision tree. Random forests are very useful for tabular data with numerical features. Unlike linear models, they can capture non-linear interactions between the features and the target.
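The voting step can be sketched as below. The three "trees" are hypothetical hand-written rules standing in for fitted decision trees, and the feature names are assumptions; a real forest would learn each tree from a random sample of the data.

```python
from collections import Counter


# Hypothetical stand-ins for fitted decision trees; each returns a price class.
def tree_ram(phone):
    return 3 if phone["ram_mb"] >= 3000 else 1


def tree_battery(phone):
    return 3 if phone["battery_mah"] >= 4000 else 2


def tree_4g(phone):
    return 3 if phone["four_g"] else 0


def forest_predict(trees, phone):
    """Return the class predicted by the majority of the trees.

    On a tie, the class that received its first vote earliest wins.
    """
    votes = Counter(tree(phone) for tree in trees)
    return votes.most_common(1)[0][0]


phone = {"ram_mb": 3500, "battery_mah": 3000, "four_g": True}
predicted_class = forest_predict([tree_ram, tree_battery, tree_4g], phone)
```

Two of the three trees vote for Class 3, so the single dissenting tree does not change the outcome; this averaging over trees is what dampens the overfitting of any individual tree.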
Naive Bayes – A simple technique for constructing classifiers. Naive Bayes models assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. There is not a single algorithm for training such classifiers, but a family of algorithms based on a common principle: all naive Bayes classifiers assume that the value of a feature is independent of the value of any other feature, given the class. For example, a fruit may be considered to be an apple if it is red, round and 12 cm in width. A naive Bayes classifier considers each of these features to contribute independently to the probability that the fruit is an apple, regardless of any possible correlations between the colour, roundness and diameter features. For some types of probability models, naive Bayes classifiers can be trained very efficiently in a supervised learning setting.
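A minimal sketch of a naive Bayes classifier over categorical features is given below, following the fruit example. The training data is made up for illustration, and add-one (Laplace) smoothing is an assumption added so that unseen feature values do not produce zero probabilities.

```python
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count classes and per-class feature values from categorical data."""
    class_counts = Counter(labels)
    # feature_counts[label][i][value]: times feature i took `value` in class `label`
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    distinct_values = defaultdict(set)
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
            distinct_values[i].add(value)
    return class_counts, feature_counts, distinct_values


def predict(model, features):
    """Return the class maximizing P(y) * product of P(x_i | y)."""
    class_counts, feature_counts, distinct_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(y)
        for i, value in enumerate(features):
            seen = feature_counts[label][i][value]
            # Add-one smoothing over the distinct values of feature i.
            score *= (seen + 1) / (count + len(distinct_values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data: (colour, shape) -> fruit.
samples = [("red", "round"), ("red", "round"), ("green", "round"),
           ("yellow", "long"), ("yellow", "long"), ("green", "long")]
labels = ["apple", "apple", "apple", "banana", "banana", "banana"]

model = train_naive_bayes(samples, labels)
```

Calling `predict(model, ("red", "round"))` multiplies the colour and shape likelihoods separately for each class, which is exactly the independence assumption described above.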
Bayes’ Theorem
Bayes' theorem gives the probability of an event given that another event has already occurred. It is written as follows:
P(y|X) = P(X|y) P(y) / P(X)
where y is the class variable and X is a feature vector (of size n):
X = (x1, x2, …, xn)
By substituting for X and expanding using the chain rule, together with the naive independence assumption, we get
P(y|x1, …, xn) = P(x1|y) P(x2|y) … P(xn|y) P(y) / (P(x1) P(x2) … P(xn))
Support Vector Machines – In machine learning, support-vector machines (SVMs), also known as support-vector networks, are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. However, methods such as Platt scaling exist to use SVMs in a probabilistic classification setting. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.
In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
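The kernel trick can be made concrete with a degree-2 polynomial kernel on two-dimensional inputs, chosen here only as an illustration. The sketch checks that evaluating the kernel in the original space gives the same value as an inner product in an explicit three-dimensional feature space, without that space ever being constructed by the kernel itself.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def poly_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def feature_map(x):
    """Explicit map for 2-D input: phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


x, z = (1.0, 2.0), (3.0, 4.0)
implicit = poly_kernel(x, z)                     # computed in the input space
explicit = dot(feature_map(x), feature_map(z))   # computed in the feature space
```

For higher degrees or other kernels the explicit feature space grows very large or becomes infinite-dimensional, while the kernel evaluation stays cheap; this is why the implicit form is the one used in practice.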
When data is unlabeled, supervised learning is not possible and an unsupervised learning approach is required, which attempts to find a natural clustering of the data into groups and then maps new data to these groups.
Literature Survey
- [1] Muhammad Asim, Zafar Khan (2018). Mobile Price Class prediction using Machine Learning Techniques. International Journal of Computer Applications. This work implemented various classification techniques, such as Decision Trees and Naïve Bayes, to improve the accuracy of mobile price class prediction. It used parameters such as display size, weight, thickness, internal memory, camera, video quality, RAM and battery, and improved on the accuracy of previous works. The work aims to find the best marketing strategy for finding the optimal product. It suggested that more sophisticated machine learning techniques could be applied to maximize accuracy, and that more instances could be added to the data set to improve accuracy.
- Debanjan Banerjee, Suchibrota Dutta (2017). Predicting the Housing Price Direction using Machine Learning Techniques. IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI-2017). House prices vary over time, and machine learning can be used to predict whether the price of a house will rise or fall in the near future. The problem is framed with two classes, 0 and 1, which distinguish a price rise from a price fall. Banerjee and Dutta consider the factors involved in this rise and fall of prices, using information value, principal component analysis and data transformation techniques such as outlier and missing value treatment and the Box-Cox transformation. The techniques are then judged by their accuracy, precision, specificity and sensitivity.
- Panigrahi Srikanth, Dharmaiah Deverapalli (2016). A Critical Study of Classification Algorithms Using Diabetes Diagnosis. 2016 IEEE 6th International Conference on Advanced Computing. Srikanth and Deverapalli apply machine learning in the medical field, aiming to make doctors' work easier by classifying diabetes patient datasets using a decision tree algorithm, a Bayes algorithm and a rule-based algorithm, and evaluating their error rates in order to identify patients. The work also draws on data mining, which supports appropriate classification. Data mining is the process of extracting useful information from a data-rich environment; methods that extract patterns from a dataset can be termed data mining methods. The book Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber is a good starting point for those interested in this field. Data mining is a pivotal step in KDD (Knowledge Discovery in Databases). The book explains data preprocessing simply, without dwelling on implementation details, so that the reader can follow it easily. Data preprocessing involves removing unnecessary data and reducing missing values, noise and irregularities before machine learning techniques are applied. Decision trees and Bayesian classifiers are considered popular classification algorithms, and the chapter on classification also covers other important algorithms such as k-NN. Regression is presented as an extension of classification problems: classification divides the data into categories, whereas regression predicts continuous values.
- Yi Wang, Hui Zang, Pravallika Devineni, Michalis Faloutsos, Krishna Janakiraman and Sara Motahari (2014). Which phone will you get next: observing trends and predicting the choice. 2014 IEEE. This work uses machine learning to predict mobile phone sales, demand and price growth, based on datasets covering about 3 million subscribers of a real network. Phone usage patterns were related to customers' demographic information such as income level, age and so on. The results showed Android phones to be in the greatest demand. Predictions were based on factors such as the type of the previous phone, social influence and the demographics of the user. The effectiveness of the prediction tool was shown by a reduction of one third in the prediction error for the number of phones and for phone costs, which makes it well suited to the prediction tasks of telecom operators.
- Tri Doan, Jugal Kalita (2015). Selecting Machine Learning Algorithms using Regression Models. 2015 IEEE 15th International Conference on Data Mining Workshops. Selecting machine learning techniques for data mining requires choosing algorithms appropriate to the information being retrieved. For some data a given technique may be impractical, whereas for other data it may be the only option. Doan and Kalita use meta-knowledge, gathered from previous machine learning experience, for supervised learning. Datasets are first summarized statistically to build this meta-knowledge, and the high-dimensional feature space is reduced to a smaller dimension used for modeling algorithm performance. The technique works quite well for both numerical and nominal data obtained from real-world environments.
Conclusion
Mobile phones are becoming more popular day by day and are now a basic necessity of life. Mobile phone usage is on the rise, and users want phones at a reasonable price. By implementing Random Forest Regression, Logistic Regression, the Naive Bayes classifier and Support Vector Machines, we can improve the accuracy of our prediction of mobile phone prices.
References
- Muhammad Asim, Zafar Khan (2018). Mobile Price Class prediction using Machine Learning Techniques. International Journal of Computer Applications.
- Debanjan Banerjee, Suchibrota Dutta (2017). Predicting the Housing Price Direction using Machine Learning Techniques. IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI-2017).
- Panigrahi Srikanth, Dharmaiah Deverapalli (2016). A Critical Study of Classification Algorithms Using Diabetes Diagnosis. 2016 IEEE 6th International Conference on Advanced Computing.
- Yi Wang, Hui Zang, Pravallika Devineni, Michalis Faloutsos, Krishna Janakiraman and Sara Motahari (2014). Which phone will you get next: observing trends and predicting the choice. 2014 IEEE.
- Tri Doan, Jugal Kalita (2015). Selecting Machine Learning Algorithms using Regression Models. 2015 IEEE 15th International Conference on Data Mining Workshops.