Complete Machine Learning Project for Beginner

Muhammad Iqbal bazmi

Published in Analytics Vidhya

5 min read, Oct 15, 2019

Iris Classification: A Multi-class Classification

Hi! In this blog, I am going to show you how to build a complete Machine Learning (ML) project for beginners.

The problem that we are going to solve today is Iris Flower Classification. It is a multiclass classification problem, and it is also called the “Hello World” of Machine Learning.

The Agenda for today:

Machine Learning
Template of predictive Analytics project

Machine Learning

Machine Learning is a way of making a computer solve specific tasks without being explicitly programmed for them.

In reality, there is no actual learning happening inside the machine. Machine Learning (ML) uses a statistical model to make decisions (ref: The Hundred-Page Machine Learning Book).

Machine learning

Machine learning ( ML) is the scientific study of algorithms and statistical models that computer systems use to…

en.wikipedia.org

I assume here that you are somewhat familiar with the basics of machine learning.

Let’s see the template to solve real-world Machine Learning(ML) projects.

Template of ML (Predictive Analytics) project

(Inspired by Jason Brownlee)

Machine Learning Mastery

Making developers awesome at machine learning.

machinelearningmastery.com

1. Prepare Problem

Load libraries
Load dataset

2. Summarize Data

Descriptive statistics
Data Visualizations

3. Prepare Data

Data Cleaning
Feature Selection
Data Transform

4. Evaluate Algorithms

Split-out validation dataset
Test options and evaluation metric
Spot Check Algorithms
Compare Algorithms

5. Improve Accuracy

Algorithm Tuning
Ensembles

6. Finalize Model

Predictions on the validation dataset
Create Standalone model on an entire training dataset
Save the model for later use

I recommend using a Jupyter notebook, although you can use any IDE. The code in this blog is written in a Jupyter notebook.

Prepare Problem

— Load Libraries

Before moving on to model development, first load the important libraries.

fig 1. Load important libraries ( Inspired by Jason Brownlee)
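Since the screenshot is not reproduced here, this is a rough sketch of what the imports could look like. It assumes pandas and scikit-learn are installed, and the choice of classifiers is my assumption based on the steps that follow. Plotting libraries (such as Seaborn for the pair plot) would be imported the same way.

```python
# Data handling
import pandas as pd

# Splitting, cross-validation and scoring helpers
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Candidate classifiers (an assumed selection, not necessarily the original one)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
```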

— Load dataset

The dataset should be in the same folder as your Python file.

To understand the dataset in detail, just click here, then continue with this blog.

fig 2. load dataset
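Loading the file could look roughly like the sketch below. The file name `Iris.csv` and the column names are assumptions (the original screenshot is not available), and `load_dataset` is a hypothetical helper. To keep the example runnable without the file, it reads three sample rows from an in-memory string instead.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read the Iris CSV from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Stand-in for the real file: three rows in the assumed 6-column layout.
sample_csv = io.StringIO(
    "Id,sepal_length,sepal_width,petal_length,petal_width,species\n"
    "1,5.1,3.5,1.4,0.2,Iris-setosa\n"
    "51,7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "101,6.3,3.3,6.0,2.5,Iris-virginica\n"
)
dataset = load_dataset(sample_csv)
print(dataset.shape)

# With the real file in the same folder (hypothetical file name):
# dataset = load_dataset("Iris.csv")
```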

Summarize Data

— The dimension of the dataset

Fig 3 above shows that there are 150 rows and 6 columns in the given dataset.

— Peek at the Data

— Bottom of the Data

— Description

— Class Distribution

The class distribution shows how many classes there are in the given dataset and how many instances belong to each class.
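The summary steps above can be sketched in a few lines of pandas. To keep the example self-contained, it rebuilds a 150 x 6 table from scikit-learn's bundled copy of Iris instead of reading the CSV. The column names (including `Id` and `species`) are assumptions that mimic the layout described in this blog.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild an Iris table in the assumed 6-column layout (Id, 4 features, species).
iris = load_iris()
dataset = pd.DataFrame(
    iris.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)
dataset.insert(0, "Id", range(1, len(dataset) + 1))
dataset["species"] = iris.target_names[iris.target]

print(dataset.shape)       # dimension of the dataset: (rows, columns)
print(dataset.head())      # peek at the first rows
print(dataset.tail())      # bottom of the data
print(dataset.describe())  # count, mean, std, min, quartiles, max per numeric column

# Class distribution: number of instances per class
class_counts = dataset.groupby("species").size()
print(class_counts)
```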

Data Visualization

— Pair Plot

fig 8. code for pair plot using Seaborn

Looking at the pair plot above, it is clear that two features, petal_length and petal_width, are the important ones: they separate the species most clearly.

Evaluate Some Algorithms

Now let’s build some models on the given data and estimate their accuracy on unseen data.

Steps to Evaluate Algorithms

Separate out a validation dataset.
Set up the test harness to use 10-fold cross-validation.
Build 5 different models to predict species from flower measurements.
Select the best model.

A Gentle Introduction to k-fold Cross-Validation

Cross-validation is a statistical method used to estimate the skill of machine learning models. It is commonly used in…

machinelearningmastery.com

Before going further, please drop the ‘Id’ column using dataset = dataset.drop(columns='Id'). Note that drop() returns a new DataFrame, so the result has to be assigned back.

Question: Why did I use an array instead of a pandas DataFrame?

Answer: Because a plain NumPy array is computationally lighter than a pandas DataFrame, and scikit-learn converts its input to NumPy arrays internally anyway.
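Dropping the ‘Id’ column, converting to arrays and separating out a validation set might look like the sketch below. The 80/20 split, the `random_state` value, the `stratify` option and the column names are all my assumptions, since the original code screenshot is not available.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in for the CSV data, with assumed column names.
iris = load_iris()
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
dataset = pd.DataFrame(iris.data, columns=feature_names)
dataset.insert(0, "Id", range(1, len(dataset) + 1))
dataset["species"] = iris.target_names[iris.target]

# drop() returns a new DataFrame, so assign the result back.
dataset = dataset.drop(columns="Id")

# Convert to plain NumPy arrays: X holds the four measurements, y the labels.
X = dataset[feature_names].to_numpy()
y = dataset["species"].to_numpy()

# Hold back 20% of the rows as a validation set (assumed split size).
# stratify=y keeps the class proportions equal in both parts (my addition).
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)
print(X_train.shape, X_validation.shape)
```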

Spot-Check Algorithms

The picture above shows that, among all the algorithms tested, SVM is the best choice for building our model.
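A spot check with 10-fold cross-validation could be sketched as follows. The five algorithms listed here are my assumption (the blog only says five models were built), and the hyperparameters and random seeds are illustrative. Scores will vary with the split, so no particular winner is implied by this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# scikit-learn's bundled Iris data stands in for the CSV used in this blog.
iris = load_iris()
X, y = iris.data, iris.target_names[iris.target]
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)

# Assumed set of five candidate models.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
    "NB": GaussianNB(),
    "SVM": SVC(gamma="auto"),
}

# 10-fold cross-validation on the training part only.
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")

# Pick the model with the highest mean accuracy.
best_name = max(results, key=lambda n: results[n].mean())
print("Best:", best_name)
```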

Make predictions

SVM was the most accurate model that we tested, so I am going to make predictions using the Support Vector Machine (SVM).

— let’s create a model

fig 12. Creating a model

— let’s fit the model

fig 13. fitting the model using fit() method

— Let’s make predictions

fig 14. predicting result on unseen data

— Let’s check what is the accuracy of this model

— Let’s see the confusion matrix of the predicted result

— let’s see the classification report
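The steps above (create, fit, predict, then score) can be sketched together as follows. The `gamma="auto"` setting and the split parameters are assumptions, and the bundled Iris data again stands in for the CSV.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data and an assumed 80/20 split.
iris = load_iris()
X, y = iris.data, iris.target_names[iris.target]
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)

# Create the model
model = SVC(gamma="auto")

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the unseen validation data
predictions = model.predict(X_validation)

# Accuracy: fraction of validation rows predicted correctly
accuracy = accuracy_score(y_validation, predictions)
print(accuracy)

# Confusion matrix: rows are true classes, columns are predicted classes
matrix = confusion_matrix(y_validation, predictions)
print(matrix)

# Precision, recall and F1-score per class
print(classification_report(y_validation, predictions))
```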

Save the model for later use

— save the model to the disk

— sometime later
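Saving and reloading the model could be done with Python's built-in pickle module, roughly as sketched below. The file name `finalized_model.sav` is hypothetical, and the temporary folder is only used to keep the example self-contained.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Train a model to save (stand-in data, assumed parameters).
iris = load_iris()
X, y = iris.data, iris.target_names[iris.target]
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)
model = SVC(gamma="auto").fit(X_train, y_train)

# Save the model to the disk (hypothetical file name in a temporary folder)
model_path = os.path.join(tempfile.mkdtemp(), "finalized_model.sav")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Sometime later: load the model back and use it
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
print(loaded_model.score(X_validation, y_validation))
```

Only load pickle files that you created yourself or that come from a source you trust, because unpickling can execute arbitrary code.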

For more details, please check the link below.

Save and Load Machine Learning Models in Python with scikit-learn

Finding an accurate machine learning model is not the end of the project. In this post, you will discover how to save…

machinelearningmastery.com

You can use the above template as a starting point for solving other real-world classification problems.

Feel free to reach out with any queries or questions.

Written by Muhammad Iqbal bazmi