Multi-layer Neural Nets

From linear to nonlinear classifiers

  • Linear classifier
    • A linear classifier computes $f(x) = \operatorname{argmax}(Wx)$
    • The resulting classifier divides the x-space into Voronoi regions: convex regions with piece-wise linear boundaries
  • Nonlinear classifier
    • Not all classification problems have convex decision regions with PWL boundaries!
    • Here's an example problem in which class 0 (blue) includes values of x near $[0.8, 0]^T$, but it also includes some values of x near $[0.4, 0.9]^T$
    • You can't compute this function using $f(x) = \operatorname{argmax}(Wx)$
  • The solution: Piece-wise linear functions
    • Nonlinear classifiers can be learned using piece-wise linear classification boundaries
    • Nonlinear regression problems can be solved using piece-wise linear regression
    • In the limit, as the number of pieces goes to infinity, the approximation approaches the desired solution (see the sketch after this list)
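
To make the last point concrete, here is a minimal sketch (assuming NumPy; the target function and the number of hinge points are illustrative) that fits a sum of shifted ReLUs to $\sin(x)$ by least squares; adding more pieces drives the error down:

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)
target = np.sin(x)                              # the nonlinear function to approximate

knots = np.linspace(0.0, 2.0 * np.pi, 10)       # hinge locations: 10 linear pieces
pieces = np.maximum(0.0, x[:, None] - knots)    # one shifted ReLU per hinge
design = np.hstack([pieces, np.ones((x.size, 1))])  # plus a bias column

# Least-squares fit of the piece-wise linear sum to the target
coef, *_ = np.linalg.lstsq(design, target, rcond=None)
approx = design @ coef

print("max |error| with 10 pieces:", np.abs(approx - target).max())
```

Rerunning with more knots shrinks the maximum error, which is the "pieces go to infinity" argument in miniature.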

Multi-layer network

A piece-wise linear function $f(x)$ can be represented by a two-layer neural network. First, the hidden nodes compute:

$$
h_j(x) = \max(0, w_j^{(1)T} x + b_j^{(1)})
$$
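
As a concrete sketch (assuming NumPy; the input and hidden sizes are hypothetical), the hidden layer is one matrix multiply followed by an element-wise ReLU:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, n_hidden = 2, 8                        # hypothetical sizes
W1 = rng.normal(size=(n_hidden, d_in))       # row j holds w_j^{(1)T}
b1 = rng.normal(size=n_hidden)               # b_j^{(1)}

def hidden(x):
    # h_j(x) = max(0, w_j^{(1)T} x + b_j^{(1)}), computed for all j at once
    return np.maximum(0.0, W1 @ x + b1)
```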

Then for PWL regression, the output is a weighted sum of the hidden nodes:

$$
f(x) = w^{(2)T} h(x) + b^{(2)}
$$

…while for PWL classification, the output is the softmax or argmax of such a sum:

$$
f(x) = \text{softmax}(W^{(2)T} h(x) + b^{(2)})
$$
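
Continuing the sketch above (the weights and sizes remain hypothetical), the two output heads differ only in what they do with the hidden vector $h(x)$:

```python
n_out = 3                                    # hypothetical number of classes
w2 = rng.normal(size=n_hidden)               # w^{(2)} for scalar regression
b2 = rng.normal()                            # b^{(2)}
W2 = rng.normal(size=(n_out, n_hidden))      # W^{(2)} for classification
c2 = rng.normal(size=n_out)                  # classification bias

def f_regression(x):
    # PWL regression: an affine function of the hidden activations
    return w2 @ hidden(x) + b2

def f_classify(x):
    # PWL classification: softmax over an affine map of the hidden activations
    logits = W2 @ hidden(x) + c2
    e = np.exp(logits - logits.max())        # subtract the max for numerical stability
    return e / e.sum()
```

Replacing the softmax with an argmax over `logits` gives hard class decisions at test time.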

Training a two-layer network: Back-propagation
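
As a hedged sketch of one training step (assuming a squared-error loss on the regression head and the NumPy setup from the snippets above; the function name and learning rate are illustrative), the gradients follow the chain rule back through the ReLU:

```python
def backprop_step(x, y, W1, b1, w2, b2, lr=1e-2):
    # Forward pass
    z = W1 @ x + b1                  # hidden pre-activations
    h = np.maximum(0.0, z)           # hidden activations
    y_hat = w2 @ h + b2              # regression output
    err = y_hat - y                  # dL/dy_hat for L = 0.5 * (y_hat - y)**2

    # Backward pass (chain rule)
    grad_w2 = err * h
    grad_b2 = err
    dh = err * w2                    # back through the second-layer weights
    dz = dh * (z > 0)                # ReLU passes gradient only where z > 0
    grad_W1 = np.outer(dz, x)
    grad_b1 = dz

    # Gradient-descent update
    return (W1 - lr * grad_W1, b1 - lr * grad_b1,
            w2 - lr * grad_w2, b2 - lr * grad_b2)
```

Looping this step over training pairs $(x, y)$ is plain stochastic gradient descent; the classification head trains the same way with a cross-entropy loss in place of squared error.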
