Comprehensive CNN in Math
Math for CNN
Related reading: Piotr Skalski on Medium
Main Formula
$$
y_{i,j,k'} = b_{k'} + \sum_{k=1}^{K} \sum_{p=1}^{P} \sum_{q=1}^{Q} W_{p,q,k,k'} \, x_{i+p-1,\,j+q-1,\,k}
$$
Why does it look so confusing?
The triple summation is simply an expanded form of a matrix multiplication (dot product), written out to show every element-wise operation. In practice, this whole operation is just a generalized matrix (or tensor) dot product.
where:
- $ x_{i,j,k} $: input feature map at position $(i, j)$, channel $k$
- $ W_{p,q,k,k'} $: kernel weight at position $(p, q)$, from input channel $k$ to output channel $k'$
- $ b_{k'} $: bias for output channel $k'$
- $ y_{i,j,k'} $: output feature map at position $(i, j)$, channel $k'$
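To make the indexing concrete, here is a minimal NumPy sketch of the triple summation for one output element, alongside the equivalent "flattened" dot product mentioned above. The function names and array shapes are illustrative assumptions, not part of the formula itself:

```python
import numpy as np

def conv_single_output(x, W, b, i, j, k_out):
    """Literal triple summation for one output element y[i, j, k_out].

    Assumed (hypothetical) shapes:
      x: (H, W_in, K)       input feature map
      W: (P, Q, K, K_out)   kernel weights
      b: (K_out,)           biases, one per output channel
    Python is 0-indexed, so the formula's (i + p - 1) becomes (i + p).
    """
    P, Q, K, _ = W.shape
    y = b[k_out]
    for k in range(K):
        for p in range(P):
            for q in range(Q):
                y += W[p, q, k, k_out] * x[i + p, j + q, k]
    return y

def conv_single_output_dot(x, W, b, i, j, k_out):
    """Same result: the triple sum collapses into a single dot product
    between the flattened input patch and the flattened kernel slice."""
    P, Q, K, _ = W.shape
    patch = x[i:i + P, j:j + Q, :]                # (P, Q, K) window of the input
    return b[k_out] + np.dot(patch.ravel(), W[..., k_out].ravel())
```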
With stride $ S $:
$$
y_{i,j,k'} = b_{k'} + \sum_{k=1}^{K} \sum_{p=1}^{P} \sum_{q=1}^{Q} W_{p,q,k,k'} \, x_{S \cdot i+p-1,\; S \cdot j+q-1,\; k}
$$
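The stride only changes which input window each output position reads. Below is a sketch of the full strided output map under the same assumed shapes, with 0-based indexing and "valid" padding (no border handling):

```python
import numpy as np

def conv_output_map_strided(x, W, b, S=1):
    """Full output feature map with stride S (a sketch, 'valid' padding).
    Output position (i, j) reads the input window starting at (S*i, S*j),
    which is the 0-indexed version of the strided formula above."""
    H, W_in, K = x.shape
    P, Q, _, K_out = W.shape
    H_out = (H - P) // S + 1
    W_out = (W_in - Q) // S + 1
    y = np.zeros((H_out, W_out, K_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = x[S * i:S * i + P, S * j:S * j + Q, :]
            for k_out in range(K_out):
                y[i, j, k_out] = b[k_out] + np.sum(patch * W[..., k_out])
    return y
```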
General CNN layer in vector form:
$$
z^{(\ell)} = W^{(\ell)} * a^{(\ell-1)} + b^{(\ell)}
$$
$$
a^{(\ell)} = g(z^{(\ell)})
$$
In the context of CNNs and neural networks, $g$ is the activation function (for example ReLU), applied element-wise to the pre-activation $z^{(\ell)}$.
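Putting the two equations together, a single forward pass through one layer could be sketched as below. It reuses the hypothetical conv_output_map_strided helper from the stride example and picks ReLU as $g$; neither choice is dictated by the formulas themselves:

```python
import numpy as np

def relu(z):
    """One common choice for the activation g; any nonlinearity works here."""
    return np.maximum(0.0, z)

def cnn_layer_forward(a_prev, W, b, S=1):
    """z^(l) = W^(l) * a^(l-1) + b^(l), then a^(l) = g(z^(l)).
    Relies on the conv_output_map_strided sketch defined earlier."""
    z = conv_output_map_strided(a_prev, W, b, S)   # pre-activation z^(l)
    a = relu(z)                                    # activation a^(l)
    return a, z
```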
Kernel Functions
Common Kernel Types in CNNs
There are different types of kernels (also called filters) in CNNs, each serving a specific purpose to extract important features from input data. Here are some main types and why they’re preferred for different tasks:
1. Edge Detection Kernels
- Examples: Sobel, Prewitt, Scharr, Roberts
- Why They’re Good: They highlight edges in images by calculating differences in pixel intensities. Useful for identifying boundaries, shapes, and object outlines.
- Typical Use: Early convolutional layers, to capture structure and shapes.
| Name | Kernel Example | Detects |
|---|---|---|
| Sobel X | $\begin{bmatrix}-1 & 0 & 1\\ -2 & 0 & 2\\ -1 & 0 & 1\end{bmatrix}$ | Vertical edges (horizontal intensity changes) |
| Sobel Y | $\begin{bmatrix}-1 & -2 & -1\\ 0 & 0 & 0\\ 1 & 2 & 1\end{bmatrix}$ | Horizontal edges (vertical intensity changes) |
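As a concrete illustration, the two Sobel kernels from the table can be applied to a grayscale image with a plain 2-D convolution. The sketch below assumes a 2-D NumPy array as input and uses SciPy's convolve2d:

```python
import numpy as np
from scipy.signal import convolve2d

# Hand-crafted edge kernels from the table above.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

def edge_maps(image):
    """Return the two gradient responses and the combined edge magnitude.
    mode='same' keeps the output the same size as the input."""
    gx = convolve2d(image, sobel_x, mode="same", boundary="symm")
    gy = convolve2d(image, sobel_y, mode="same", boundary="symm")
    return gx, gy, np.hypot(gx, gy)
```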
2. Blur/Smoothing Kernels
- Examples: Average (Box), Gaussian Blur
- Why They’re Good: Reduce noise and detail, making feature extraction more robust by focusing on larger-scale patterns rather than tiny irrelevant details.
- Typical Use: Preprocessing, denoising, sometimes within networks before strong feature detection.
- Average Blur: Each value is replaced by the average inside the kernel window.
- Gaussian Blur: Weights values by a Gaussian function (both kernels are sketched in code after this list).
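A minimal sketch of how these two blur kernels are typically built (the kernel size and sigma below are arbitrary illustrative defaults):

```python
import numpy as np

def box_kernel(size=3):
    """Average (box) blur: all weights equal and summing to 1."""
    return np.full((size, size), 1.0 / (size * size))

def gaussian_kernel(size=5, sigma=1.0):
    """Gaussian blur: weights follow a 2-D Gaussian, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()
```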
3. Sharpening Kernels
- Examples: Laplacian, Unsharp Mask
- Why They’re Good: Emphasize transitions in intensity, enhancing details so the model can pick out fine structure (a small example follows this list).
- Typical Use: Image enhancement, sometimes as part of feature engineering.
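A small sketch of sharpening with a common Laplacian-style kernel, assuming an 8-bit grayscale NumPy array:

```python
import numpy as np
from scipy.signal import convolve2d

# A common Laplacian-based sharpening kernel (center-weighted variant).
sharpen_kernel = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]])

def sharpen_image(image):
    """Emphasize intensity transitions; clipping keeps values in 0..255."""
    out = convolve2d(image, sharpen_kernel, mode="same", boundary="symm")
    return np.clip(out, 0, 255)
```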
4. Emboss/Outline Kernels
- Why They’re Good: Emphasize specific patterns (e.g., outlines, textures), helpful for texture analysis, stylization, and making geometric features stand out.
5. Learned Kernels in CNNs
- Why They’re Good: During training, CNNs automatically learn the kernels/filters best suited for the task:
- Early layers: tend to learn edge/texture-like filters (as above).
- Deeper layers: learn more complex patterns, such as parts of objects, motifs, textures, or even whole objects.
6. Specialized Kernels
- Dilated Kernels: Increase receptive field without increasing computation, good for context aggregation (e.g., semantic segmentation).
- Depthwise Separable Kernels: Used in efficient architectures (e.g., MobileNet), separate spatial and depth-wise filtering to save computation.
- Grouped Convolutions: Split feature processing across groups of channels; used in ResNeXt and AlexNet. (A parameter-count sketch for these specialized kernels follows this list.)
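The efficiency claims above can be sanity-checked with simple arithmetic. The sketch below compares weight counts for a standard versus a depthwise separable convolution and computes the effective extent of a dilated kernel; the layer sizes are hypothetical:

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise (k x k per input channel) followed by pointwise (1 x 1)."""
    return k * k * c_in + c_in * c_out

def dilated_extent(k, dilation):
    """Spatial extent covered by a k x k kernel at a given dilation rate;
    the number of weights stays k*k."""
    return dilation * (k - 1) + 1

# Hypothetical layer sizes:
print(standard_conv_params(3, 128, 256))        # 294912
print(depthwise_separable_params(3, 128, 256))  # 33920  (~8.7x fewer weights)
print(dilated_extent(3, 2))                     # 5
```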
Summary Table:
| Kernel Type | Typical Purpose | Example Layer/Use |
|---|---|---|
| Edge Detection | Outline objects/features | Early feature extraction |
| Blur (Smoothing) | Noise reduction, abstraction | Preprocessing, denoising |
| Sharpening | Feature enhancement | Enhancement, texture extraction |
| Emboss/Outline | Make outlines/textures stand out | Texture/style analysis |
| Dilated | Large context with fewer parameters | Segmentation, ASPP modules |
| Depthwise/Grouped | Computational efficiency | Mobile/efficient CNNs |
| Learned (Generic) | Task-adaptive (via training) | All CNN layers (esp. deep ones) |
In summary:
- Early CNN layers often end up learning kernels similar to classic edge or blur kernels, because these are useful primitives for interpreting raw visual data.
- Deeper layers learn more complex, task-specific kernels.
- Kernel choice/learning enables CNNs to adapt to different tasks in computer vision—object recognition, segmentation, style transfer, etc.