Complete Machine Learning Project for Beginners
Iris Classification: A Multi-class Classification Problem
Hi! In this blog, I am going to show you how to build a complete Machine Learning (ML) project for beginners.
The problem we are going to solve today is Iris flower classification, a multi-class classification problem that is often called the “Hello World” of Machine Learning.
The agenda for today:
- Machine Learning
- Template of a Predictive Analytics project
Machine Learning
Machine Learning is a way to enable computers to solve specific tasks without being explicitly programmed.
In reality, there is no such thing as learning in machines: Machine Learning (ML) uses statistical models to make decisions (ref: The Hundred-Page Machine Learning Book).
I assume here that you are somewhat familiar with the basics of machine learning.
Let’s look at the template for solving real-world Machine Learning (ML) projects.
Template of ML (Predictive Analytics) project
(Inspired by Jason Brownlee)
1. Prepare Problem
- Load libraries
- Load dataset
2. Summarize Data
- Descriptive statistics
- Data Visualizations
3. Prepare Data
- Data Cleaning
- Feature Selection
- Data Transform
4. Evaluate Algorithms
- Split-out validation dataset
- Test options and evaluation metric
- Spot Check Algorithms
- Compare Algorithms
5. Improve Accuracy
- Algorithm Tuning
- Ensembles
6. Finalize Model
- Predictions on the validation dataset
- Create a standalone model on the entire training dataset
- Save the model for later use
It’s recommended to use Jupyter Notebook, although you can use any IDE. In this blog, I am going to use Jupyter Notebook code.
Prepare Problem
— Load Libraries
Before proceeding with model development, first load the important libraries.
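The original import cell is not reproduced here, so the following is a minimal sketch of the libraries the rest of this walkthrough relies on. The exact list is an assumption: pandas and NumPy for data handling, matplotlib for plots, scikit-learn for splitting, cross-validation, models and metrics, and the standard-library pickle module for saving the model.

```python
# Data handling
import numpy as np
import pandas as pd

# Plotting (used for the pair plot / scatter matrix)
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# scikit-learn: splitting, cross-validation, candidate models and metrics
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Standard library: persisting the trained model to disk
import pickle
```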
— Load dataset
The dataset file should be in the same folder as your Python file (or notebook).
To understand the dataset in detail, just click here, then continue with this blog.
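A minimal loading sketch follows. It assumes the post uses a Kaggle-style `Iris.csv` with the columns `Id`, `SepalLengthCm`, `SepalWidthCm`, `PetalLengthCm`, `PetalWidthCm` and `Species`; that file name and layout are assumptions. So that the sketch runs without the file, it rebuilds the same 6-column table from scikit-learn's bundled copy of the Iris data; the helper `load_iris_frame` is hypothetical and exists only for this illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

# With the CSV next to your notebook you would simply write:
#     dataset = pd.read_csv("Iris.csv")   # assumed file name
# Hypothetical helper: rebuild the same 6-column layout from scikit-learn's
# bundled Iris data so this example runs without the file.
def load_iris_frame():
    iris = load_iris()
    df = pd.DataFrame(
        iris.data,
        columns=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
    )
    df.insert(0, "Id", range(1, len(df) + 1))  # 1-based row counter, like the CSV
    df["Species"] = ["Iris-" + iris.target_names[t] for t in iris.target]
    return df

dataset = load_iris_frame()
print(dataset.shape)  # (150, 6)
```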
Summarize Data
— The dimension of the dataset
Fig. 3 above shows that there are 150 rows and 6 columns in the dataset.
— Peek at the Data
— Bottom of the Data
— Description
— Class Distribution
The class distribution shows how many classes there are in the dataset and how many instances belong to each class.
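The summary steps above (dimensions, peek, bottom, description and class distribution) map onto a handful of pandas calls. This is a sketch, not the author's exact cell; it reuses the hypothetical `load_iris_frame` helper so it runs on its own, and it drops `Id` before `describe()` because a row counter has no meaningful statistics.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Hypothetical helper: same 6-column layout as the assumed Iris.csv
def load_iris_frame():
    iris = load_iris()
    df = pd.DataFrame(
        iris.data,
        columns=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
    )
    df.insert(0, "Id", range(1, len(df) + 1))
    df["Species"] = ["Iris-" + iris.target_names[t] for t in iris.target]
    return df

dataset = load_iris_frame()

# Dimensions: (rows, columns)
print(dataset.shape)  # (150, 6)

# Peek at the first rows, then the bottom of the data
print(dataset.head(10))
print(dataset.tail(10))

# Descriptive statistics (count, mean, std, min, quartiles, max) of the measurements
print(dataset.drop(columns="Id").describe())

# Class distribution: number of instances per species
class_counts = dataset.groupby("Species").size()
print(class_counts)  # 50 flowers of each species
```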
Data Visualization
— Pair Plot
The pair plot above makes it clear that two features, petal_length and petal_width, are the most important ones: they separate the three species much more cleanly than the sepal measurements do.
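A pair plot is a grid of pairwise scatter plots with a histogram per feature on the diagonal. The post does not show which library drew it (seaborn's `pairplot` is a common choice); this sketch uses pandas' `scatter_matrix` instead, colouring points by species, and then checks the visual impression numerically. The colour choices and the hypothetical `load_iris_frame` helper are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripts; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

# Hypothetical helper: same 6-column layout as the assumed Iris.csv
def load_iris_frame():
    iris = load_iris()
    df = pd.DataFrame(
        iris.data,
        columns=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
    )
    df.insert(0, "Id", range(1, len(df) + 1))
    df["Species"] = ["Iris-" + iris.target_names[t] for t in iris.target]
    return df

dataset = load_iris_frame()
features = dataset.drop(columns=["Id", "Species"])

# One colour per species so the class separation is visible
colour_map = {"Iris-setosa": "tab:blue", "Iris-versicolor": "tab:orange", "Iris-virginica": "tab:green"}
colours = dataset["Species"].map(colour_map).tolist()

axes = scatter_matrix(features, c=colours, figsize=(9, 9), diagonal="hist")
# In Jupyter the figure is shown automatically; in a script use plt.savefig(...)

# Numeric check: per-species ranges of the two petal features
print(dataset.groupby("Species")[["PetalLengthCm", "PetalWidthCm"]].agg(["min", "max"]))
```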
Evaluate Some Algorithms
Now let’s create some models of the data and estimate their accuracy on unseen data.
Steps to Evaluate Algorithms
- Separate out a validation dataset.
- Set up the test harness to use 10-fold cross-validation.
- Build 5 different models to predict species from flower measurements.
- Select the best model.
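Please drop the ‘Id’ column before going further, using dataset = dataset.drop(columns='Id'). Note that drop() returns a new DataFrame, so the result must be assigned back.
Question: Why did I use an array instead of a pandas DataFrame?
Answer: Because a plain NumPy array is lighter-weight than a pandas DataFrame, so indexing and slicing it is faster; scikit-learn converts its inputs to NumPy arrays internally anyway.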
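Separating out the validation dataset might look like the sketch below: drop `Id`, convert to a NumPy array, then hold back part of the data. The 20% validation size, `random_state=1` and stratification by species are assumptions, not values taken from the post; `load_iris_frame` is the same hypothetical helper as before.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Hypothetical helper: same 6-column layout as the assumed Iris.csv
def load_iris_frame():
    iris = load_iris()
    df = pd.DataFrame(
        iris.data,
        columns=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
    )
    df.insert(0, "Id", range(1, len(df) + 1))
    df["Species"] = ["Iris-" + iris.target_names[t] for t in iris.target]
    return df

dataset = load_iris_frame()
dataset = dataset.drop(columns="Id")  # drop() returns a new frame, so reassign

# Plain NumPy array: first 4 columns are features, the 5th is the label
array = dataset.values
X = array[:, 0:4].astype(float)
y = array[:, 4]

# Hold back 20% for validation; stratify keeps the species balanced in both parts
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, y, test_size=0.20, random_state=1, stratify=y
)
print(X_train.shape, X_validation.shape)  # (120, 4) (30, 4)
```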
Spot-Check Algorithms
The picture above shows that SVM is the best choice among the algorithms tested, so we select it to build our model.
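A spot-check like the one in that picture can be sketched as follows: five candidate models, each scored with 10-fold stratified cross-validation on the training portion only. The post does not name its five models, so this selection (Logistic Regression, k-NN, a decision tree, Gaussian Naive Bayes and SVM) is an assumption, and the scores you get may differ from the figure. To stay short, the sketch uses `iris.data` directly, which holds the same four measurement columns as the CSV after dropping `Id` and `Species`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

iris = load_iris()
X_train, X_validation, Y_train, Y_validation = train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=1, stratify=iris.target
)

# Five candidate models (assumed choice)
models = [
    ("LR", LogisticRegression(max_iter=1000)),
    ("KNN", KNeighborsClassifier()),
    ("CART", DecisionTreeClassifier(random_state=1)),
    ("NB", GaussianNB()),
    ("SVM", SVC()),
]

# 10-fold stratified cross-validation, scored by accuracy
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
results = {}
for name, model in models:
    scores = cross_val_score(model, X_train, Y_train, cv=kfold, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

Only the training portion is cross-validated, so the validation set stays untouched until the final check of the chosen model.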
Make predictions
SVM was the most accurate model that we tested, so I am going to make predictions using the Support Vector Machine (SVM).
— Let’s create a model
— Let’s fit the model
— Let’s make predictions
— Let’s check the accuracy of this model
— Let’s see the confusion matrix of the predicted results
— Let’s see the classification report
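The six steps above can be sketched in one cell: create an SVM, fit it on the training data, predict the validation data, then report accuracy, the confusion matrix and the classification report. Default `SVC()` settings and the 80/20 split with `random_state=1` are assumptions; the author's exact hyperparameters are not shown.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

iris = load_iris()
X_train, X_validation, Y_train, Y_validation = train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=1, stratify=iris.target
)

# Create the model (default RBF-kernel support vector classifier)
model = SVC()
# Fit the model on the training data
model.fit(X_train, Y_train)
# Make predictions on the held-out validation data
predictions = model.predict(X_validation)

# Accuracy: fraction of validation flowers classified correctly
accuracy = accuracy_score(Y_validation, predictions)
print(f"Accuracy: {accuracy:.3f}")

# Confusion matrix: rows are true species, columns are predicted species
cm = confusion_matrix(Y_validation, predictions)
print(cm)

# Per-class precision, recall, F1-score and support
print(classification_report(Y_validation, predictions, target_names=iris.target_names))
```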
Save the model for later use
— Save the model to disk
— Some time later, load the model back from disk
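Saving and reloading can be done with Python's standard `pickle` module, as sketched below. The file name `finalized_model.sav` is hypothetical, and it is placed in the system temp directory only to keep the example self-contained; the post does not say which serialization method it used.

```python
import os
import pickle
import tempfile
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_validation, Y_train, Y_validation = train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=1, stratify=iris.target
)
model = SVC().fit(X_train, Y_train)  # fit() returns the fitted estimator

# Save the model to disk (hypothetical file name)
filename = os.path.join(tempfile.gettempdir(), "finalized_model.sav")
with open(filename, "wb") as f:
    pickle.dump(model, f)

# Some time later: load the model and reuse it without retraining
with open(filename, "rb") as f:
    loaded_model = pickle.load(f)
result = loaded_model.score(X_validation, Y_validation)
print(result)
```

Only unpickle files you trust, and load the model with the same scikit-learn version that saved it.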
Please check the link below.
You can use the template above to solve other real-world classification problems.
Feel free to reach out with any queries or questions.