Perceptron

Perceptron

Perceptron is invented before the loss function

Linear classifier: Notation
• The observation xT = [x1, … , xn] is a real-valued vector (d is the number of feature dimensions)
• The class label y ∈ Y is drawn from some finite set of class labels.
• Usually the output vocabulary, Y, is some set of strings. For
convenience, though, we usually map the class labels to a sequence
of integers, Y = [1, … , v} , where � is the vocabulary size

Linear classifier: Definition

A linear classifier is defined by

$$
f(x) = \text{argmax } Wx + b
$$

where:

$w_k, b_k$ are the weight vector and bias corresponding to class $k$, and the argmax function finds the element of the vector $wx$ with the largest value.

There are a total of $v(d + 1)$ trainable parameters: the elements of the matrix $w$.

Example

Notice that in the two-class case, the equation

$$
f(x) = \text{argmax } Wx + b
$$

Simplifies to

The class boundary is the line whose equation is

$$
(w_2 - w_1)^T x + (b_2 - b_1) = 0
$$

Gradient descent

Gradient descent

Suppose we have training tokens $(x_i, y_i)$, and we have some initial class vectors $w_1$ and $w_2$. We want to update them as

$$
w_1 \leftarrow w_1 - \eta \frac{\partial \mathcal{L}}{\partial w_1}
$$

$$
w_2 \leftarrow w_2 - \eta \frac{\partial \mathcal{L}}{\partial w_2}
$$

…where $\mathcal{L}$ is some loss function. What loss function makes sense?

Zero-one loss function

Zero-one loss function

The most obvious loss function for a classifier is its classification error rate,

$$
\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i), y_i)
$$

Where $\ell(\hat{y}, y)$ is the zero-one loss function,

$$
\ell(f(x), y) =
\begin{cases}
0 & \text{if } f(x) = y \
1 & \text{if } f(x) \neq y
\end{cases}

Non-differentiable!

The problem with the zero -one loss function is that it’s not differentiable:

Integer vectors: One-hot vectors, A one-hot vector is a binary vector in which all elements are 0 except for a single element that’s equal to 1.

Drive the perceptron

The perceptron learning algorithm

a mistake happens here (function)

Author

Karobben

Posted on

2024-02-07

Updated on

2024-02-14

Licensed under

Comments