Comprehensive CNN in Math

Math for CNN

Related reading: Piotr Skalski on Medium

Main Formula

$$
y_{i,j,k'} = b_{k'} + \sum_{k=1}^{K} \sum_{p=1}^{P} \sum_{q=1}^{Q} W_{p,q,k,k'} \; x_{i+p-1,\, j+q-1,\, k}
$$

Why does it look so confusing?

The triple summation is simply a dot product written out element by element: for each output position $(i, j)$ and output channel $k'$, the kernel is multiplied element-wise with the local $P \times Q \times K$ patch of the input, and the products are summed. In practice, the whole operation is a generalized (tensor) dot product between the kernel and each sliding window of the input.

where:

  • $ x_{i,j,k} $: input feature map at position $(i, j)$, channel $k$
  • $ W_{p,q,k,k'} $: kernel weight at position $(p, q)$, from input channel $k$ to output channel $k'$
  • $ b_{k'} $: bias for output channel $k'$
  • $ y_{i,j,k'} $: output feature map at position $(i, j)$, channel $k'$
  • $ K $: number of input channels; $ P \times Q $: spatial size of the kernel

With stride $ S $:

$$
y_{i,j,k'} = b_{k'} + \sum_{k=1}^{K} \sum_{p=1}^{P} \sum_{q=1}^{Q} W_{p,q,k,k'} \; x_{S \cdot i+p-1,\, S \cdot j+q-1,\, k}
$$
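To make the indexing concrete, here is a minimal NumPy sketch that translates the triple sum literally; the function name `conv2d_naive` and the "valid" (no padding) convention are illustrative choices, not from any particular library. Setting `stride=1` recovers the first formula.

```python
import numpy as np

def conv2d_naive(x, W, b, stride=1):
    """Literal translation of the summation formula (zero-based indexing).

    x : input feature map, shape (H, W_in, K)       -> x[i, j, k]
    W : kernel weights,    shape (P, Q, K, K_out)   -> W[p, q, k, k_out]
    b : biases,            shape (K_out,)           -> b[k_out]
    Returns y with shape (H_out, W_out, K_out), "valid" (no padding).
    """
    H, W_in, K = x.shape
    P, Q, _, K_out = W.shape
    H_out = (H - P) // stride + 1
    W_out = (W_in - Q) // stride + 1
    y = np.zeros((H_out, W_out, K_out))
    for i in range(H_out):
        for j in range(W_out):
            # the window of x that the kernel currently covers
            patch = x[i * stride:i * stride + P, j * stride:j * stride + Q, :]
            for k_out in range(K_out):
                # the triple sum over p, q, k collapses into one element-wise
                # product followed by a sum -- a generalized dot product
                y[i, j, k_out] = b[k_out] + np.sum(patch * W[:, :, :, k_out])
    return y
```

The inner `np.sum(patch * W[:, :, :, k_out])` is exactly the "generalized dot product" described above: the kernel against one sliding window of the input.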

General CNN layer in vector form:
$$
z^{(\ell)} = W^{(\ell)} * a^{(\ell-1)} + b^{(\ell)}
$$
$$
a^{(\ell)} = g(z^{(\ell)})
$$

In the context of CNNs, $g(\cdot)$ is the activation function (for example ReLU or sigmoid), applied element-wise to the pre-activation $z^{(\ell)}$ to produce $a^{(\ell)}$.
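As a rough illustration of the vector form, the sketch below reuses `conv2d_naive` from the example above and takes ReLU as one common (assumed) choice of $g$; the shapes are arbitrary.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# One CNN layer in the vector form above:
#   z^(l) = W^(l) * a^(l-1) + b^(l),   a^(l) = g(z^(l))
a_prev = np.random.rand(32, 32, 3)        # a^(l-1): a 32x32 map with 3 channels
W_l = np.random.randn(3, 3, 3, 8) * 0.1   # eight 3x3 kernels over 3 input channels
b_l = np.zeros(8)                         # one bias per output channel

z_l = conv2d_naive(a_prev, W_l, b_l)      # convolution plus bias: z^(l)
a_l = relu(z_l)                           # activation g, element-wise: a^(l)
```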

Kernel Functions

Common Kernel Types in CNNs

There are different types of kernels (also called filters) in CNNs, each serving a specific purpose to extract important features from input data. Here are some main types and why they’re preferred for different tasks:

1. Edge Detection Kernels

  • Examples: Sobel, Prewitt, Scharr, Roberts
  • Why They’re Good: They highlight edges in images by calculating differences in pixel intensities. Useful for identifying boundaries, shapes, and object outlines.
  • Typical Use: Early convolutional layers, to capture structure and shapes.
| Name | Kernel Example | Detects |
| --- | --- | --- |
| Sobel X | $\begin{bmatrix}-1 & 0 & 1\\ -2 & 0 & 2\\ -1 & 0 & 1\end{bmatrix}$ | Vertical edges (horizontal intensity changes) |
| Sobel Y | $\begin{bmatrix}-1 & -2 & -1\\ 0 & 0 & 0\\ 1 & 2 & 1\end{bmatrix}$ | Horizontal edges (vertical intensity changes) |
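Below is a small sketch of both Sobel kernels in use, assuming SciPy is available; `img` is just a random stand-in for a grayscale image.

```python
import numpy as np
from scipy.signal import convolve2d

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
sobel_y = sobel_x.T                          # the Sobel Y kernel is the transpose

img = np.random.rand(64, 64)                 # stand-in for a grayscale image

gx = convolve2d(img, sobel_x, mode="same")   # strong response at vertical edges
gy = convolve2d(img, sobel_y, mode="same")   # strong response at horizontal edges
edge_magnitude = np.hypot(gx, gy)            # combined edge strength per pixel
```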

2. Blur/Smoothing Kernels

  • Examples: Average (Box), Gaussian Blur

  • Why They’re Good: Reduce noise and detail, making feature extraction more robust by focusing on larger-scale patterns rather than tiny irrelevant details.

  • Typical Use: Preprocessing, denoising, sometimes within networks before strong feature detection.

  • Average Blur: Each output value is the average of the values inside the kernel window.

  • Gaussian Blur: Weights the window values by a Gaussian function of their distance from the center.
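A minimal sketch of both blur kernels on a stand-in grayscale image; the 3x3 Gaussian weights shown are the common integer approximation, normalized to sum to 1.

```python
import numpy as np
from scipy.signal import convolve2d

img = np.random.rand(64, 64)             # stand-in for a grayscale image

# Box (average) kernel: every value in the 3x3 window weighted equally
box_kernel = np.ones((3, 3)) / 9.0

# 3x3 Gaussian kernel: weights fall off with distance from the center
gaussian_kernel = np.array([[1, 2, 1],
                            [2, 4, 2],
                            [1, 2, 1]]) / 16.0

blurred_box = convolve2d(img, box_kernel, mode="same")
blurred_gauss = convolve2d(img, gaussian_kernel, mode="same")
```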

3. Sharpening Kernels

  • Examples: Laplacian, Unsharp Mask
  • Why They’re Good: Emphasize transitions in intensity. Useful for enhancing details and features so the model can find fine structure.
  • Typical Use: Image enhancement, sometimes as part of feature engineering.
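A short sketch of the Laplacian kernel and the common 3x3 sharpening kernel derived from it (identity plus Laplacian); the image is again a random stand-in.

```python
import numpy as np
from scipy.signal import convolve2d

img = np.random.rand(64, 64)

# Laplacian kernel: responds to rapid intensity changes in any direction
laplacian = np.array([[ 0, -1,  0],
                      [-1,  4, -1],
                      [ 0, -1,  0]])

# Sharpening kernel: identity plus the Laplacian, so the detected edges
# are added back onto the original image
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])

edges = convolve2d(img, laplacian, mode="same")
sharpened = convolve2d(img, sharpen, mode="same")
```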

4. Emboss/Outline Kernels

  • Why They’re Good: Emphasize specific patterns (e.g., outlines, textures), helpful for texture analysis, stylization, and making geometric features stand out.
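One commonly used 3x3 emboss kernel, as a sketch; the exact values vary between implementations.

```python
import numpy as np
from scipy.signal import convolve2d

img = np.random.rand(64, 64)

# Emboss kernel: asymmetric weights give a "raised", directional relief effect
emboss = np.array([[-2, -1, 0],
                   [-1,  1, 1],
                   [ 0,  1, 2]])

embossed = convolve2d(img, emboss, mode="same")
```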

5. Learned Kernels in CNNs

  • Why They’re Good: During training, CNNs automatically learn the kernels/filters best suited for the task:
    • Early layers: tend to learn edge/texture-like filters (as above).
    • Deeper layers: learn more complex patterns, such as parts of objects, motifs, textures, or even whole objects.
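For a concrete sense of what a learned kernel is, the snippet below (assuming PyTorch, which the post does not otherwise use) shows that a convolutional layer simply stores a kernel tensor that backpropagation updates during training.

```python
import torch.nn as nn

layer = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
print(layer.weight.shape)   # torch.Size([16, 3, 3, 3]): 16 learned 3x3 kernels over 3 channels
print(layer.bias.shape)     # torch.Size([16]): one learned bias per output channel
# These parameters start at random values and are updated by gradient descent.
```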

6. Specialized Kernels

  • Dilated Kernels: Increase receptive field without increasing computation, good for context aggregation (e.g., semantic segmentation).
  • Depthwise Separable Kernels: Used in efficient architectures (e.g., MobileNet), separate spatial and depth-wise filtering to save computation.
  • Grouped Convolutions: Split the channels into groups that are processed independently; used in ResNeXt and AlexNet.
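A sketch of these specialized convolutions in PyTorch (again an assumption; the layer sizes are arbitrary).

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)   # (batch, channels, height, width)

# Dilated convolution: a 3x3 kernel spread out to cover a 5x5 area
dilated = nn.Conv2d(32, 32, kernel_size=3, padding=2, dilation=2)

# Depthwise separable convolution (MobileNet-style): a per-channel 3x3
# depthwise step followed by a 1x1 pointwise mix across channels
depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)
pointwise = nn.Conv2d(32, 64, kernel_size=1)

# Grouped convolution (ResNeXt/AlexNet-style): channels split into 4 groups
grouped = nn.Conv2d(32, 64, kernel_size=3, padding=1, groups=4)

y = pointwise(depthwise(x))      # shape (1, 64, 56, 56): same spatial size, more channels
print(y.shape)
```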

Summary Table:

| Kernel Type | Typical Purpose | Example Layer/Use |
| --- | --- | --- |
| Edge Detection | Outline objects/features | Early feature extraction |
| Blur (Smoothing) | Noise reduction, abstraction | Preprocessing, denoising |
| Sharpening | Feature enhancement | Enhancement, texture extraction |
| Emboss/Outline | Make outlines/textures stand out | Texture/style analysis |
| Dilated | Large context with fewer parameters | Segmentation, ASPP modules |
| Depthwise/Grouped | Computational efficiency | Mobile/efficient CNNs |
| Learned (Generic) | Task-adaptive (via training) | All CNN layers (esp. deep ones) |

In summary:

  • Early CNN layers often end up learning kernels similar to classic edge or blur kernels, because these are useful primitives for interpreting raw visual data.
  • Deeper layers learn more complex, task-specific kernels.
  • Kernel choice/learning enables CNNs to adapt to different tasks in computer vision—object recognition, segmentation, style transfer, etc.