
# Logistic Regression with Scikit-Learn

Blog 3 in Scikit-Learn series

# Introduction

In my previous blog, I explained Linear Regression with Scikit-Learn and how it works. Let's see why Logistic Regression is such an important topic to understand.

Here’s the **link to my previous article on Linear Regression** in case you missed it.

### Logistic Regression in Python with Scikit-Learn

Logistic Regression is a popular statistical model used for binary classification, that is, for predictions of the type this or that, yes or no, and so on. It can also be used for multiclass classification, but the theory here focuses on the simplest, binary case. It is one of the most frequently used machine learning algorithms for binary classification, mapping the input to 0 or 1.

For example: 0 for negative and 1 for positive.

Some applications of classification are:

- Email: spam / not spam
- Online transactions: fraudulent / not fraudulent
- Tumor: malignant / not malignant

Linear regression is not well suited to predicting probabilities. If you use linear regression to model a binary response variable, for example, the resulting model may not restrict the predicted Y values to the range between 0 and 1. This is where logistic regression comes into play: it gives you a score between 0 and 1 that reflects the probability of the event occurring.

In the formula of the logistic model,

**p = 1 / (1 + e^(-(b0+b1X)))**

when **b0+b1X == 0**, p is exactly 0.5,

when **b0+b1X > 0**, p is above 0.5 and moves towards 1 as the value grows, and

when **b0+b1X < 0**, p is below 0.5 and moves towards 0 as the value becomes more negative.
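The three cases can be checked numerically. The sketch below assumes the standard logistic (sigmoid) function; the coefficients `b0` and `b1` are hypothetical values picked only for illustration.

```python
import math

def logistic(z):
    """Map a linear score z = b0 + b1*X to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -3.0, 1.5

for x in (0.0, 2.0, 6.0):
    z = b0 + b1 * x
    print(f"X={x}: b0+b1X={z:+.1f}, p={logistic(z):.3f}")
```

With these values, X = 2 gives b0+b1X = 0 and therefore p = 0.5, while the smaller and larger X land below and above 0.5.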

### Logistic Regression on Digits Dataset

**Loading the Data**

The digits dataset is one of the datasets that ship with scikit-learn, so you do not need to download any file from an external website. The code below loads it.

```
from sklearn.datasets import load_digits
digits = load_digits()
```
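Before going further, it can help to look at what was loaded. A quick check, using only attributes of the object returned by `load_digits`:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Each row of `data` is one 8x8 image flattened into 64 pixel values.
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)
print(digits.target.shape)  # (1797,)

# The labels are the ten digits 0 through 9.
print(np.unique(digits.target).tolist())
```

Note that with ten labels this is a multiclass problem, not a binary one.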

**Showing the Images and the Labels**

This section is just to show what the images and labels look like. It usually helps to visualize your data to see what you are working with.

```
import numpy as np
import matplotlib.pyplot as plt

# Plot the first five digits with their labels
plt.figure(figsize=(20, 4))
for index, (image, label) in enumerate(zip(digits.data[0:5], digits.target[0:5])):
    plt.subplot(1, 5, index + 1)
    # Each row of digits.data holds 64 pixel values; reshape to 8x8 to display
    plt.imshow(np.reshape(image, (8, 8)), cmap=plt.cm.gray)
    plt.title('Training: %i\n' % label, fontsize=20)
plt.show()
```

**Splitting Data into Training and Test Sets**

We make training and test sets to make sure that after we train our classification algorithm, it is able to generalize well to new data.

```
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=0)
```
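To see what `test_size=0.25` actually does, you can print the shapes of the resulting arrays. This is just a sanity check on the split shown above:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Roughly three quarters of the 1797 images go to training, the rest to testing.
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
```

Fixing `random_state` makes the split reproducible, so the same images end up in the test set on every run.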

## Scikit-learn Modeling

**Import the model you want to use**
In sklearn, all machine learning models are implemented as Python classes.

```
from sklearn.linear_model import LogisticRegression
```

**Make an instance of the Model**

```
logisticRegr = LogisticRegression()
```

**Training the model on the data, storing the information learned from the data**
The model learns the relationship between the digit images (`x_train`) and their labels (`y_train`).

```
logisticRegr.fit(x_train, y_train)
```
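With the default settings, recent scikit-learn versions may emit a `ConvergenceWarning` on these unscaled pixel values. One possible way around it, shown here as a sketch, is to give the solver more iterations. The value `max_iter=10000` is my own assumption, not something tuned for this dataset.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# max_iter=10000 is an assumed value; it only gives the solver more room to converge.
logisticRegr = LogisticRegression(max_iter=10000)
logisticRegr.fit(x_train, y_train)

print(logisticRegr.classes_)     # the labels the model learned
print(logisticRegr.coef_.shape)  # one row of 64 pixel weights per class
```

After fitting, the learned information lives in attributes such as `coef_` and `intercept_`.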

**Predict labels for new data (new images)**
This step uses the information the model learned during training.

```
#predict for one image
logisticRegr.predict(x_test[0].reshape(1,-1))
#predict for multiple images
logisticRegr.predict(x_test[0:10])
#for the entire dataset
predictions = logisticRegr.predict(x_test)
```
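Since logistic regression produces probabilities, you can also ask the model for them with `predict_proba` instead of only the final label. A minimal sketch (again with an assumed `max_iter=10000` to keep the solver quiet):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# max_iter=10000 is an assumed value, not part of the original walkthrough.
logisticRegr = LogisticRegression(max_iter=10000).fit(x_train, y_train)

# One row per image, one column per class (digits 0 to 9).
proba = logisticRegr.predict_proba(x_test[0:5])
labels = logisticRegr.predict(x_test[0:5])

print(proba.shape)
print(np.round(proba.sum(axis=1), 6))  # each row sums to 1
print(proba.argmax(axis=1), labels)    # most probable class vs. predicted label
```

The predicted label is simply the class with the highest probability in each row.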

**Model Performance**

`Accuracy = correct predictions / total number of data points`

```
#Use score method to get accuracy of model
score = logisticRegr.score(x_test, y_test)
print(score)
```
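To connect the formula to the `score` method, here is a small sketch that computes the accuracy by hand and compares it with what `score` returns. The `max_iter=10000` setting is an assumption of mine to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# max_iter=10000 is an assumed value, not part of the original walkthrough.
logisticRegr = LogisticRegression(max_iter=10000).fit(x_train, y_train)
predictions = logisticRegr.predict(x_test)

# Accuracy = correct predictions / total number of data points
correct = np.sum(predictions == y_test)
manual_accuracy = correct / len(y_test)

score = logisticRegr.score(x_test, y_test)
print(manual_accuracy, score)
```

Both numbers should agree, because `score` reports the mean accuracy for classifiers.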

**Confusion Matrix**
A confusion matrix is a table that describes the performance of a classification model on a set of test data for which the true values are known. To make this more concrete, the code below prints the confusion matrix for our model.

```
from sklearn import metrics
cm = metrics.confusion_matrix(y_test, predictions)
print(cm)
```
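In scikit-learn's confusion matrix, rows correspond to the true labels and columns to the predicted labels, so correct predictions sit on the diagonal. The sketch below checks this against the accuracy; `max_iter=10000` is again an assumed setting.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# max_iter=10000 is an assumed value, not part of the original walkthrough.
logisticRegr = LogisticRegression(max_iter=10000).fit(x_train, y_train)
predictions = logisticRegr.predict(x_test)

cm = metrics.confusion_matrix(y_test, predictions)

# Diagonal entries count the images whose predicted digit matches the true digit.
accuracy_from_cm = np.trace(cm) / cm.sum()
print(cm.shape)
print(accuracy_from_cm, logisticRegr.score(x_test, y_test))
```

Any off-diagonal entry tells you which digit was mistaken for which, something the single accuracy number cannot show.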

In the end, I hope you have learned how to use simple logistic regression techniques. You can also find the full project in the GitHub repository.

### Continued Next Week

This was a simple logistic regression example. Over the next few weeks, we will discuss how to improve the model using more independent variables. Stay tuned.
