Alexandre Dauphin
In this section we discuss yet another linear classifier, the perceptron. The perceptron is a machine learning algorithm that separates a linearly separable dataset with a hyperplane.
To illustrate this, we use the function make_blobs of scikit-learn to generate a dataset with two labelled clusters in two dimensions.
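import numpy as np
from sklearn.datasets import make_blobs
# Two Gaussian clusters centred at (3, 3) and (0, 0); make_blobs labels them 0 and 1
x, y = make_blobs(centers=np.array([[3, 3], [0, 0]]), cluster_std=0.5)
y[y == 0] = -1  # relabel class 0 as -1 so that the labels are -1 and +1
print(x[:4])
print(y[:4])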
[[-1.18004026 -0.61965393]
[-0.57032133 0.96112062]
[ 3.67974571 2.09263418]
[ 0.16344673 -0.22585435]]
[ 1 1 -1 1]
Figure 1 shows the generated dataset.
The goal of the perceptron algorithm is to find a hyperplane that separates these two clusters. In two dimensions a hyperplane is a line, so let us consider the line
\[f(\mathbf{x}) = w_0+w_1x_1+w_2 x_2=0\]
The unit normal vector to this line is given by \(\mathbf{w}^*=\mathbf{w}/\parallel \mathbf{w} \parallel\), where \(\mathbf{w}=(w_1,w_2)^T\).
Show that
for any point \(\mathbf{x}_0\) on the line \[ \mathbf{w}^T \mathbf{x}_0 = -w_0.\]
for any two points \(\mathbf{x}_1\) and \(\mathbf{x}_2\) on the line \[ {\mathbf{w}^*}^T (\mathbf{x}_1-\mathbf{x}_2)=0.\]
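The two properties can be checked numerically before proving them. The snippet below is only a sanity check, not a proof. The line coefficients and the helper `point_on_line` are hypothetical choices made for illustration. It verifies that \(\mathbf{w}^T\mathbf{x}=-w_0\) for a point on the line and that \(\mathbf{w}\) is orthogonal to the difference of two points on the line. Dividing \(\mathbf{w}\) by its norm rescales both sides of the first identity and leaves the orthogonality unchanged.

```python
import numpy as np

# Hypothetical line: w0 + w1*x1 + w2*x2 = 0 with w = (1, 2) and w0 = -4
w = np.array([1.0, 2.0])
w0 = -4.0

def point_on_line(t):
    """Return the point on the line whose first coordinate is t (needs w[1] != 0)."""
    return np.array([t, -(w0 + w[0] * t) / w[1]])

xa, xb = point_on_line(0.0), point_on_line(2.0)

print(w @ xa)         # equals -w0 for any point on the line
print(w @ (xa - xb))  # equals 0: w is orthogonal to the line
```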
The signed distance between the line and a point \(\mathbf{x}\), where \(\mathbf{x}_0\) is any point on the line, is given by
\[{\mathbf{w}^*}^T(\mathbf{x}-\mathbf{x}_0)=\frac{\mathbf{w}^T\mathbf{x}+w_0}{\parallel \mathbf{w}\parallel}=\frac{f(\mathbf{x})}{\parallel \mathbf{w}\parallel}\]
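The signed-distance formula translates directly into NumPy. The following is a minimal sketch. The function name `signed_distance` and the example line \(x_1+x_2-3=0\) are assumptions made for illustration and do not come from the text.

```python
import numpy as np

def signed_distance(X, w, w0):
    """Signed distance from each row of X to the line w^T x + w0 = 0."""
    w = np.asarray(w, dtype=float)
    return (np.asarray(X, dtype=float) @ w + w0) / np.linalg.norm(w)

# Hypothetical example: the line x_1 + x_2 - 3 = 0
w, w0 = np.array([1.0, 1.0]), -3.0
points = np.array([[3.0, 3.0], [0.0, 0.0], [1.5, 1.5]])
d = signed_distance(points, w, w0)
print(d)  # positive on the side w points to, negative on the other, zero on the line
```

The sign of the result tells on which side of the line a point lies, which is what the classifier uses to assign a label.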
The signed distance between the line and a point discussed in the previous section offers a natural way to define a loss function. For each data point \((\mathbf{x}_i,y_i)\), we would like to maximize the product
\[y_i (\mathbf{w}^T\mathbf{x}_i+w_0).\]
Indeed, when \(y_i\) and \(\mathbf{w}^T\mathbf{x}_i+w_0\) have the same sign, i.e. when the point is correctly classified, this quantity is positive.
Therefore, we define the loss function to minimize as a sum over the misclassified examples
\[L =-\sum_{i=1}^{N_\text{misclassified}} y_i(\mathbf{w}^T\mathbf{x}_i+w_0)\]
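One possible way to minimize this loss is batch gradient descent, using \(\partial L/\partial \mathbf{w}=-\sum_i y_i\mathbf{x}_i\) and \(\partial L/\partial w_0=-\sum_i y_i\), with both sums running over the misclassified points. The sketch below is an assumption about how the training could be done, not necessarily the procedure used to produce the animation in this section. The function names, the learning rate `lr`, the random initialization and the stopping rule are all hypothetical.

```python
import numpy as np

def perceptron_loss(X, y, w, w0):
    """Return L = -sum of y_i (w^T x_i + w0) over misclassified points, and their mask."""
    margins = y * (X @ w + w0)
    mis = margins <= 0  # wrong side of the line, or exactly on it
    return -margins[mis].sum(), mis

def train_perceptron(X, y, lr=0.1, max_steps=1000, seed=0):
    """Batch gradient descent on the perceptron loss (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])  # random initial weights
    w0 = 0.0
    losses = []
    for _ in range(max_steps):
        loss, mis = perceptron_loss(X, y, w, w0)
        losses.append(loss)
        if not mis.any():  # every point is correctly classified
            break
        # Step against the gradient, which only involves misclassified points
        w = w + lr * (y[mis] @ X[mis])
        w0 = w0 + lr * y[mis].sum()
    return w, w0, np.array(losses)
```

With the dataset generated above, `w, w0, losses = train_perceptron(x, y)` returns the final parameters and the loss at every step. For linearly separable data the loop stops once no point is misclassified.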
import plotly.graph_objects as go

# x1 (x-coordinates of the line), vec (y-values of the line, one row per training step)
# and loss (loss at each training step) are assumed to come from the training loop.

# Colour each data point according to its label
color = y.astype(str)
color[y > 0] = "blue"
color[y < 0] = "red"

# One animation frame per training step, showing the current separating line
frames = [go.Frame(data=[go.Scatter(x=x1, y=vec[i, :], mode='lines')],
                   layout=go.Layout(title_text=f'step:{i}, Loss:{loss[i]:.2f}'))
          for i in range(loss.size)]

# Play, pause and restart controls for the animation
buttons = [dict(label="Play", method="animate",
                args=[None, {"frame": {"duration": 100, "redraw": True},
                             "fromcurrent": True,
                             "transition": {"duration": 300, "easing": "quadratic-in-out"}}]),
           dict(label="Pause", method="animate",
                args=[[None], {"frame": {"duration": 0, "redraw": False},
                               "mode": "immediate",
                               "transition": {"duration": 0}}]),
           dict(label="Restart", method="animate",
                args=[None])]

# Initial line plus the data points, with fixed axis ranges so the view does not jump
Fig = go.Figure(
    data=[go.Scatter(x=x1, y=vec[0, :], mode='lines', name='line'),
          go.Scatter(x=x[:, 0], y=x[:, 1], mode="markers", marker_color=color, name='data',
                     hovertemplate='x:%{x:.2f}'
                                   + '<br>y:%{y:.2f}</br><extra></extra>')],
    layout=go.Layout(
        xaxis=dict(range=[x[:, 0].min() - 2, x[:, 0].max() + 2], autorange=False),
        yaxis=dict(range=[x[:, 1].min() - 2, x[:, 1].max() + 2], autorange=False),
        updatemenus=[dict(type="buttons", buttons=buttons)]
    ),
    frames=frames
)
Fig.show()