import pandas as pd
import plotly.express as px

n_ind = 250
data = dict(doses=range(1,7), deaths=[28,53,93,126,172,197])
df = pd.DataFrame.from_dict(data)
df['p_dying'] = df['deaths']/n_ind
Logistic regression
1 Binary classification
In this section, we consider the classification task. In particular, we focus here on binary classification, which means that a datapoint can be either in class \(0\) or in class \(1\).
Let us consider a concrete dataset, the toxicity dataset. In this dataset, scientists were studying the toxicity of a substance on a population of insects. They therefore administered a series of toxic doses and measured the number of deaths in a population of \(250\) individuals. The results of the study are shown in Figure 1.
Code
px.bar(data_frame=df, x='doses', y='deaths')
Here, the goal of the classification would be to predict, given a dose, whether an individual would be dead (class \(1\)) or alive (class \(0\)).
2 Logistic regression
We can here define a probability of dying for each value of the dose by dividing the number of deaths by the total number of individuals (here \(250\)). We can therefore see the classification task as fitting this probability distribution. A simple trial function is the sigmoid function
\[ \sigma(x) =\frac{1}{1+e^{-(ax+b)}},\] and the goal is to find the values of \(a\) and \(b\) that best fit the data.
When we divide the number of deaths by the number of individuals, we are effectively evaluating the estimator of the parameter \(p\) of the Binomial distribution. As a reminder, the Binomial distribution characterizes the probability of having \(k\) successes over \(n\) independent samples of a Bernoulli distribution with success probability \(p\).
Notice how we transformed our classification problem into a regression problem. We will come back to this probabilistic view of machine learning in the next lecture.
import numpy as np

def sigmoid(x, a, b):
    return 1/(1 + np.exp(-(a*x + b)))
Figure 2 shows a trial of the fit for the values \(a=0.5\) and \(b=-2\). You can play around with the values of \(a\) and \(b\) to find a better fit in the next figure (only in the notebook).
Code
import plotly.graph_objects as go

a, b = 0.5, -2
vecd = np.arange(0, 7, 0.1)

Fig = go.Figure()
Fig.add_bar(x=df['doses'], y=df['p_dying'], name='toxicity dataset')
Fig.add_trace(go.Scatter(name=f'sigmoid: a={a}, b={b}', x=vecd, y=sigmoid(vecd, a, b)))
Fig.update_layout(xaxis_title='doses', yaxis_title='p')
Show that, for the logistic regression with \(p=\sigma(x)\),
\[\log\left(\frac{p}{1-p}\right)=ax+b.\]
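Hint: starting from \(p=\sigma(x)\), one possible derivation is
\[1-p=\frac{e^{-(ax+b)}}{1+e^{-(ax+b)}},\qquad \frac{p}{1-p}=e^{ax+b},\qquad \log\left(\frac{p}{1-p}\right)=ax+b.\]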
3 The dataset
Until now, we have considered the viewpoint of the whole population. But generally, the dataset is built from the data of each individual, to which is associated a tuple \((x,y)\), where \(x\) is the dose and \(y\) is the class (\(1\): dead, \(0\): alive). To this end, we construct a dataset with \(n_\text{ind}\times n_\text{doses}\) individuals.
for i, d in zip(df['doses'], df['deaths']):
    # column 0: dose, column 1: label (1 = dead, 0 = alive)
    vec1 = np.zeros((n_ind,2))
    vec1[:,0], vec1[:d,1] = i, 1
    vec = vec1 if i == 1 else np.concatenate((vec,vec1))

np.random.shuffle(vec)
x, y = vec[:,0], vec[:,1]
print(x[:20])
print(y[:20])
[5. 2. 5. 2. 2. 3. 1. 1. 1. 6. 6. 3. 3. 6. 6. 6. 2. 6. 3. 4.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 1. 1. 0. 1. 1. 1. 1.]
4 Loss function: Binary Cross Entropy
We now want to automatically find the values of \(a\) and \(b\). To this end, we have to define a loss function. We will here use the binary cross entropy
\[ BCE = -\frac{1}{N}\sum_{i=1}^N \left\{ y_i \log [\sigma(x_i,a,b)] +(1-y_i)\log[1-\sigma(x_i,a,b)]\right\}.\]
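The helper BCE(x, y, fun, params) used to generate the loss-landscape plot below is not shown in this section; a minimal sketch consistent with that call signature and with the formula above could be (the small clipping constant is an assumption to avoid \(\log 0\)):

def BCE(x, y, fun, params):
    # model probabilities for the whole dataset
    probs = fun(x, params['a'], params['b'])
    eps = 1e-12   # avoids log(0); value chosen for illustration
    return -np.mean(y*np.log(probs + eps) + (1 - y)*np.log(1 - probs + eps))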
Let us have a look at the loss landscape of the binary cross entropy with respect to the parameters \(a\) and \(b\), as depicted in Figure 3. We observe that the landscape is convex for this choice of the Loss function. In this particular case, the convexity can actually be proven.
Code to generate the data for the plot
vec_a = np.arange(-5,5,0.1)
vec_b = np.arange(-5,5,0.1)
matz = np.zeros((vec_a.size, vec_b.size))

for i, a1 in enumerate(vec_a):
    for j, b1 in enumerate(vec_b):
        p = dict(a=a1, b=b1)
        matz[i,j] = BCE(x, y, sigmoid, params=p)
Code
fig = go.Figure()
fig.add_contour(z=matz, x=vec_b, y=vec_a,
                hovertemplate='a:%{y:.2f}'
                +'<br>b:%{x:.2f}</br>'
                +'f:%{z:.2f}<extra></extra>')
d = dict(width=600,
         height=600,
         xaxis={'title':'b'},
         yaxis={'title':'a'}
        )
fig.update_layout(d)
fig.show()
What is the value of the loss for your guess? Is it close to the minimum?
5 Gradient of the Loss function
Let us compute the gradient with respect to \(a\) and \(b\).
What is the gradient of the sigmoid function \(\sigma(x)\)?
\(\sigma'(x) = \sigma(x) (1-\sigma(x))\)
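For the standard sigmoid \(\sigma(z)=1/(1+e^{-z})\), this follows directly from the quotient rule:
\[\sigma'(z)=\frac{e^{-z}}{(1+e^{-z})^2}=\frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}}=\sigma(z)\left(1-\sigma(z)\right).\]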
We find
\[\begin{aligned} \partial_a BCE &= \frac{1}{N}\sum_{i=1}^N (\sigma_i -y_i)x_i\\ \partial_b BCE &= \frac{1}{N}\sum_{i=1}^N (\sigma_i -y_i),\\ \end{aligned}\]where we have defined \(\sigma_i\equiv \sigma(ax_i+b)\).
Although the problem is convex, there is no easy way to derive an analytical closed-form solution. This comes from the non-linearity introduced by the sigmoid. Libraries like scikit-learn therefore propose different algorithms, such as e.g. conjugate gradient. In the next section, we will consider the stochastic gradient descent approach.
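As an illustration (not used in the rest of this section), fitting the same data with scikit-learn could look like the following sketch; the choice of solver='newton-cg' is just one of the library's available options:

from sklearn.linear_model import LogisticRegression

# scikit-learn expects a 2D feature matrix, hence the reshape
clf = LogisticRegression(solver='newton-cg').fit(x.reshape(-1, 1), y)
print(clf.coef_, clf.intercept_)   # estimates of a and b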
6 Stochastic gradient descent
Implement the function grad_BCE(x, y, p), where p=dict(a=a, b=b) is a dictionary containing the parameters \(a\) and \(b\). The function should return an array of the form np.array([grad_a, grad_b]).
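One possible implementation, directly following the gradient expressions above (a sketch; the notebook's own solution may differ):

def grad_BCE(x, y, p):
    # sigma_i = sigmoid(a*x_i + b)
    s = sigmoid(x, p['a'], p['b'])
    grad_a = np.mean((s - y) * x)
    grad_b = np.mean(s - y)
    return np.array([grad_a, grad_b])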
Having computed the gradients, we can now apply the stochastic gradient descent algorithm.
a0, b0 = 2, 1

pini = dict(a=a0, b=b0)
ll = dict(loss=BCE, grads=grad_BCE, fun=sigmoid)

trackers = sgd(x, y, pini, ll, niter=int(1.5E2))
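The sgd helper itself is not shown in this section. A minimal sketch consistent with the call above and with the trackers dictionary used below (keys 'a', 'b' and 'loss') could look as follows; the learning rate, the epoch structure, and tracking the full-dataset loss after every update are assumptions:

def sgd(x, y, pini, ll, niter=100, lr=0.1):
    # plain SGD: one update per sample, niter passes over the data
    p = dict(pini)
    trackers = dict(a=[], b=[], loss=[])
    for _ in range(niter):
        for i in range(x.size):
            ga, gb = ll['grads'](x[i:i+1], y[i:i+1], p)
            p['a'] -= lr * ga
            p['b'] -= lr * gb
            trackers['a'].append(p['a'])
            trackers['b'].append(p['b'])
            trackers['loss'].append(ll['loss'](x, y, ll['fun'], params=p))
    return trackers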
Figure 4 shows the trajectory of the stochastic gradient descent algorithm.
Code
amin, amax = np.min(trackers['a']), np.max(trackers['a'])
bmin, bmax = np.min(trackers['b']), np.max(trackers['b'])

n = 100
stepa, stepb = (amax-amin)/n, (bmax-bmin)/n

vec_a = np.arange(amin-19*stepa, amax+20*stepa, stepa)
vec_b = np.arange(bmin-19*stepb, bmax+20*stepb, stepb)
matz = np.zeros((vec_a.size, vec_b.size))

for i, a1 in enumerate(vec_a):
    for j, b1 in enumerate(vec_b):
        p = dict(a=a1, b=b1)
        matz[i,j] = BCE(x, y, sigmoid, params=p)

fig = go.Figure()
fig.add_contour(z=matz, x=vec_b, y=vec_a,
                hovertemplate='a:%{y:.2f}'
                +'<br>b:%{x:.2f}</br>'
                +'f:%{z:.2f}<extra></extra>')

mask = np.arange(0, len(trackers['a']), 100)
veca, vecb, vecl = np.array(trackers['a'])[mask], np.array(trackers['b'])[mask], np.array(trackers['loss'])[mask]
fig.add_scatter(x=vecb, y=veca, name='SGD', text=vecl, mode='lines+markers',
                hovertemplate='a:%{y:.2f}'
                +'<br>b:%{x:.2f}</br>'
                +'f:%{text:.2f}<extra></extra>')
fig.show()
7 Accuracy and other metrics
So far, we did not discuss the choice of the metric. The first metric that comes to mind is the accuracy, i.e.
\[ \text{acc}= \frac{\# \text{correct predictions}}{\#\text{dataset}},\]
which considers the ratio between the number of correct predictions and the number of elements in our training set.
Let us compute the accuracy as a function of the training iterations.
veca, vecb, vecl = np.array(trackers['a']), np.array(trackers['b']), np.array(trackers['loss'])
mask = np.arange(0, len(veca), 100)
veca, vecb, vecl = veca[mask], vecb[mask], vecl[mask]
Write the function accuracy(x,y,a,b), which returns the accuracy for a given dataset and for the parameters \(a\) and \(b\). Choose the label with the following rule: \(0\) if \(\sigma_i<0.5\) else \(1\).
Code
def accuracy(x, y, a, b):
    yp = sigmoid(x, a, b)
    yp[yp>0.5] = 1
    yp[yp<=0.5] = 0
    return np.sum(yp==y)/y.size
We then compute the accuracy during the training.
vec_acc = np.zeros_like(veca)

for i, (a, b) in enumerate(zip(veca, vecb)):
    vec_acc[i] = accuracy(x, y, a, b)
Figure 5 shows the value of the Loss function and the accuracy during the training. It is interesting to note that, while the Loss function is smooth, the accuracy is not, and that the variations of the Loss function do not directly correspond to the variations of the accuracy. It is therefore important to monitor not only the Loss function but also the figure of merit of the problem, which can be, for example, the accuracy.
Code
from plotly.subplots import make_subplots

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(
    go.Scatter(y=vecl, name="Loss"),
    secondary_y=False,
)
fig.add_trace(
    go.Scatter(y=vec_acc, name="Accuracy"),
    secondary_y=True,
)

# Set x-axis title
fig.update_xaxes(title_text="iterations")

# Set y-axes titles
fig.update_yaxes(title_text="Loss", secondary_y=False)
fig.update_yaxes(title_text="Accuracy", secondary_y=True)

fig.show()
Depending on the application, it might be advantageous to define different figures of merit. The latter can be done with the help of the confusion matrix:
|  | Positive (Prediction) | Negative (Prediction) |
|---|---|---|
| Positive (Ground Truth) | True positive | False negative |
| Negative (Ground Truth) | False positive | True negative |
Let us now construct the confusion matrix for our model.
from sklearn.metrics import confusion_matrix

yp = sigmoid(x, a, b)
yp[yp>0.5] = 1
yp[yp<=0.5] = 0
cm = confusion_matrix(y, yp)
Figure 6 shows the confusion matrix for the logistic regression.
Code
= ["Alive(P)", "Dead(P)"], ["Alive(GT)", "Dead(GT)"]
X, Y = px.imshow(cm, x=X, y=Y, text_auto=True,color_continuous_scale='Blues')
fig
fig.show()
We readily observe that the accuracy corresponds to the trace of this matrix normalized by the number of individuals.
print(f'{np.trace(cm)/np.sum(cm):.3f}')
0.714
There are other metrics that one can use for the predictions:

| Metric | Description | Formula |
|---|---|---|
| Precision | Correct positive predictions over the total of positive predictions | \(\frac{TP}{TP+FP}\) |
| Recall | Correct positive predictions over the total of actual positives | \(\frac{TP}{TP+FN}\) |
| F1 score | Harmonic mean of precision and recall | \(\frac{2\,\text{Recall}\cdot\text{Precision}}{\text{Recall}+\text{Precision}}\) |
Compute the different metrics for the trained logistic regression model.
# class 0 (alive) is taken as the positive class, matching the labels above
precision = cm[0,0]/(cm[0,0]+cm[1,0])
recall = cm[0,0]/(cm[0,0]+cm[0,1])
F1_score = 2*recall*precision/(precision+recall)
print(f'Precision: {precision:.3f}')
print(f'Recall: {recall:.3f}')
print(f'F1 score: {F1_score:.3f}')
Let us perform the same analysis on another dataset, the breast cancer dataset of scikit-learn.
from sklearn.datasets import load_breast_cancer

x1, y1 = load_breast_cancer(return_X_y=True)
x1 = x1[:,3]   # keep a single feature (column 3, the mean area)
print(x1[:10])
print(y1[:10])
[1001. 1326. 1203. 386.1 1297. 477.1 1040. 577.9 519.8 475.9]
[0 0 0 0 0 0 0 0 0 0]
from sklearn.linear_model import LogisticRegression

# scikit-learn expects a 2D feature matrix
x1 = x1.reshape([np.size(x1), 1])

clf = LogisticRegression().fit(x1, y1)
yp1 = clf.predict(x1)

cm = confusion_matrix(y1, yp1)
Code
= ["Malignous(P)", "Benign(P)"], ["Malignous(GT)", "Benign(GT)"]
X1, Y1 = px.imshow(cm, x=X1, y=Y1, text_auto=True,color_continuous_scale='Blues')
fig
fig.show()
Compute the different metrics for the trained logistic regression model.
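One possible solution mirrors the computation above, treating class \(0\) (malignant) as the positive class so that the indices match the confusion matrix; the variable names are just for illustration:

# same index convention as before: cm[0,0] counts correctly predicted malignant cases
precision1 = cm[0,0]/(cm[0,0]+cm[1,0])
recall1 = cm[0,0]/(cm[0,0]+cm[0,1])
F1_score1 = 2*recall1*precision1/(precision1+recall1)
print(f'Precision: {precision1:.3f}')
print(f'Recall: {recall1:.3f}')
print(f'F1 score: {F1_score1:.3f}')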