Outline:
- Problem statement
- What is the dataset
- Workstream
- Heatmap
- Pair plot and other EDA
- Machine learning
- Compared models
- Future work
Problem Statement
I am always thinking about starting a business: if I were to open one, how would I run it, and what would I need to know before I begin?
Technology and electronics are my passion. In 2019, I started a small business offering electronics services, including hardware/software (HW/SW) support, at one of Najran University's bazaars. During that time I faced many challenges and overcame many barriers. The business started successfully, but the bazaars ran for a limited time, and finding high-quality phone parts was difficult because I did not know the various suppliers. I now realize that I should have studied the market first.
While searching for a dataset related to my business, I found Mr. Bob on Kaggle asking users to classify the price ranges of a set of devices based on various features. With the skills I have acquired over the past 14 weeks, I set out to solve this problem with data science.
What is the dataset
In this project, we explore and analyze a dataset that holds the specifications of 2,000 mobile phones, and we attempt to predict the price range of phones in the market by applying various machine learning algorithms.
Target:
Our target is the price range, which takes one of four values [0, 1, 2, 3]:
- 0 (low cost)
- 1 (medium cost)
- 2 (high cost)
- 3 (very high cost)
Because there are four discrete classes, the problem can be treated as a multi-class classification problem.
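Before choosing a metric, it helps to map the four codes to their names and check how many phones fall into each class. The sketch below assumes the data are loaded into a pandas DataFrame with the target in a column named `price_range`; the column name and the helper `class_distribution` are illustrative assumptions, and the sample values are made up.

```python
import pandas as pd

# Names for the four price_range codes, as listed in the target description.
PRICE_LABELS = {0: "low cost", 1: "medium cost", 2: "high cost", 3: "very high cost"}


def class_distribution(df: pd.DataFrame, target: str = "price_range") -> pd.Series:
    """Count the phones in each price class and label the counts by class name."""
    counts = df[target].value_counts().sort_index()
    return counts.rename(index=PRICE_LABELS)


if __name__ == "__main__":
    # Tiny made-up sample; the real data would come from the Kaggle CSV.
    sample = pd.DataFrame({"price_range": [0, 1, 1, 3, 2, 2, 2]})
    print(class_distribution(sample))
```

If the four counts are roughly equal, plain accuracy is a reasonable score to compare models with.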
Workstream
Heatmap
To understand the relationships between the features.
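A heatmap colours the pairwise correlation matrix, so the most useful numbers can also be read off directly. This is a minimal sketch, assuming a pandas DataFrame with a `price_range` target column; the synthetic `ram` and `noise` columns are invented for the demo, and the plotting helper assumes seaborn and matplotlib are installed.

```python
import numpy as np
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str = "price_range") -> pd.Series:
    """Pearson correlation of each feature with the target, strongest first.

    These are the values in the target's row/column of the heatmap.
    """
    corr = df.corr(numeric_only=True)
    return corr[target].drop(target).sort_values(key=np.abs, ascending=False)


def plot_heatmap(df: pd.DataFrame) -> None:
    """Draw the full feature-by-feature correlation heatmap."""
    # Imported here so the numeric helper above works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    plt.figure(figsize=(12, 10))
    sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f", cmap="coolwarm")
    plt.tight_layout()
    plt.show()


if __name__ == "__main__":
    # Synthetic stand-in: price class driven by RAM, plus one pure-noise column.
    rng = np.random.default_rng(0)
    ram = rng.uniform(256, 4000, size=500)
    demo = pd.DataFrame({
        "ram": ram,
        "noise": rng.normal(size=500),
        "price_range": np.digitize(ram, [1200, 2100, 3000]),
    })
    print(target_correlations(demo))
```

Sorting by absolute value keeps strong negative correlations near the top as well as positive ones.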
Pair plot
To explore how the data change across the different price ranges.
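A pair plot shows visually how features shift between price classes; a per-class table of means gives the same comparison in numbers. The sketch below assumes a pandas DataFrame with a `price_range` column; the feature names and values in the demo are invented, and `plot_pairs` assumes seaborn and matplotlib are available.

```python
import pandas as pd


def feature_means_by_class(df, features, target="price_range"):
    """Mean of each feature within each price class (one row per class)."""
    return df.groupby(target)[features].mean()


def plot_pairs(df, features, target="price_range"):
    """Pair plot of the chosen features, coloured by price class."""
    # Imported lazily so the summary helper works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.pairplot(df, vars=features, hue=target)
    plt.show()


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = pd.DataFrame({
        "ram": [500, 700, 1800, 2200, 3900],
        "battery_power": [900, 1100, 1200, 1000, 1300],
        "price_range": [0, 0, 1, 1, 3],
    })
    print(feature_means_by_class(sample, ["ram", "battery_power"]))
```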
Other EDA
Machine learning
We use machine learning to build a system that predicts a phone's price range from its specifications.
Applied Models
- Random forest model (rf)
- K nearest neighbor model (knn)
- Decision tree model (dt)
- Stacking model (stacked)
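The four models can be compared on a single held-out split with scikit-learn. This is a sketch under stated assumptions, not the project's exact setup: the hyperparameters, the feature scaling in front of KNN, and the logistic-regression meta-learner for the stack are all choices made for illustration, and synthetic data from `make_classification` stands in for the phone dataset, so the accuracies will not match the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=42):
    """The four models from the list above, with assumed hyperparameters."""
    rf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    # KNN works on distances, so features are standardised first.
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    dt = DecisionTreeClassifier(random_state=random_state)
    # Assumed meta-learner: logistic regression on the base models' predicted probabilities.
    stacked = StackingClassifier(
        estimators=[("rf", rf), ("knn", knn), ("dt", dt)],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    return {"rf": rf, "knn": knn, "dt": dt, "stacked": stacked}


def compare_models(X, y, random_state=42):
    """Fit each model on one stratified 80/20 split and return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    scores = {}
    for name, model in build_models(random_state).items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


if __name__ == "__main__":
    # Synthetic 4-class data standing in for the 2,000-phone dataset.
    X, y = make_classification(
        n_samples=400, n_features=20, n_informative=6, n_classes=4, random_state=0
    )
    for name, acc in compare_models(X, y).items():
        print(f"{name}: {acc:.3f}")
```

`StackingClassifier` clones its base estimators, so reusing the same `rf`, `knn`, and `dt` objects inside the stack does not interfere with fitting them on their own.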
Improving the stacked model
We optimized the score using grid search and achieved the accuracy shown in the plot below.
We split the data and applied the stacking model.
The score rose from 92% to 94%, which indicates that the stacking model performs better after tuning with the best parameters.
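Grid search over a stacking model can be sketched with scikit-learn's `GridSearchCV`, using the `<estimator name>__<parameter>` syntax to reach the base models' parameters. The write-up does not say which parameters were tuned, so the grid below is hypothetical, as are the base-model settings and the logistic-regression meta-learner; the synthetic data will not reproduce the 92% and 94% figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def tune_stacking(X, y, random_state=42):
    """Split the data, grid-search a stacking model, and score it on the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    stacked = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(random_state=random_state)),
            ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
            ("dt", DecisionTreeClassifier(random_state=random_state)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    # Hypothetical grid; make_pipeline names its KNN step "kneighborsclassifier".
    param_grid = {
        "rf__n_estimators": [25, 50],
        "knn__kneighborsclassifier__n_neighbors": [5, 11],
    }
    search = GridSearchCV(stacked, param_grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    # best_score_ is mean cross-validated accuracy; score() uses the refitted best model.
    return search.best_params_, search.best_score_, search.score(X_test, y_test)


if __name__ == "__main__":
    X, y = make_classification(
        n_samples=300, n_features=10, n_informative=5, n_classes=4, random_state=0
    )
    params, cv_acc, test_acc = tune_stacking(X, y)
    print(params, f"cv={cv_acc:.3f}", f"test={test_acc:.3f}")
```

Comparing the test accuracy before and after tuning on the same split is what makes a before/after jump like the one reported meaningful.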
Future work
Tech stack
Conclusion
This was the final project of the Data Science Bootcamp with Coding Dojo Academy & Saudi Digital Academy. You can find the whole project on my GitHub at the link below.