Deep Learning
Deep Learning for Computer Vision
Learning types:
- Reinforcement learning - learn to select an action of maximum payoff
- Supervised learning - given input, predict output. Two types: regression (continuous values) and classification (labels)
- Unsupervised learning - discover an internal representation of the input (also includes self-supervised learning)
Artificial neuron representation:
- $\mathbf{x}$ - input vector (“synapses”) is just the given features
- $\mathbf{w}$ - weight vector which regulates the importance of each input
- $b$ - bias which adjusts the weighted values, i.e., shifts them
- $z$ - net input $\mathbf{w}^{\top}\mathbf{x}+b$, a linear combination of the inputs shifted by the bias
- $g(\cdot)$ - activation function through which the net input is passed to introduce non-linearity
- $a$ - the activation $g(z)$, which is the neuron's output
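The components above can be sketched as a single-neuron forward pass in NumPy (a minimal illustration; the concrete feature, weight, and bias values are made up, and a sigmoid is used as the activation):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up example values: three input features, their weights, and a bias.
x = np.array([0.5, -1.0, 2.0])   # input vector ("synapses")
w = np.array([0.1, 0.4, -0.2])   # weight vector regulating input importance
b = 0.3                          # bias shifting the weighted sum

z = w @ x + b        # net input: linear combination of inputs plus bias
a = sigmoid(z)       # activation: non-linearity applied to the net input
print(z, a)
```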
Artificial neural network representation:
- Each neuron receives activations from its input neurons and sends its activation to its output neurons
- There are multiple layers of neurons, and the more layers there are, the more powerful the network (usually)
- The weights learn to adapt to the required task to produce the best results based on some loss function
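A minimal two-layer forward pass through such a network (illustrative sketch: the layer sizes are arbitrary, the weights are random rather than learned, and it uses a ReLU hidden layer and a softmax output, both defined in the next section):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

# Arbitrary layer sizes for illustration: 4 inputs -> 5 hidden -> 3 classes.
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)

x = rng.normal(size=4)               # input feature vector
h = relu(W1 @ x + b1)                # hidden-layer activations
y_hat = softmax(W2 @ h + b2)         # output probabilities over 3 classes
print(y_hat)
```

In a real network the weight matrices would be adapted via the loss function and backpropagation rather than left random.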
Popular activation functions - ReLU, sigmoid and softmax, the last two of which are mainly used in the last layer before the error function:
\[\text{ReLU}(x)=\max(0, x)\]
\[\sigma(x)=\frac{1}{1+\exp(-x)}\]
\[\text{softmax}(\mathbf{x})_i=\frac{\exp(x_i)}{\sum_{j}\exp(x_j)}\]
Popular error functions - MSE (for regression), Cross Entropy (for classification):
\[\text{MSE}(\hat{\mathbf{y}}, \mathbf{y})=\frac{1}{N}\sum_{n=1}^N(y_n-\hat{y}_n)^2\]
\[\text{CE}(\hat{\mathbf{y}}, \mathbf{y})=-\sum_{n=1}^N y_n \log \hat{y}_n\]
Backpropagation - the weight-update algorithm in which the gradient of the error function with respect to the parameters, $\nabla_{\mathbf{w}}E$, is computed to find the update direction, so that the iteratively updated weights $\mathbf{w}$ minimize the error ($\alpha$ below is the learning rate):
\[\mathbf{w}\leftarrow \mathbf{w} - \alpha \nabla_{\mathbf{w}} E(\mathbf{w})\]
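The two error functions and the gradient-descent update rule can be sketched on a toy linear-regression problem (the data and learning rate are made up, and the MSE gradient is derived analytically for the linear model rather than by full backpropagation):

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error over N predictions."""
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y_hat, y):
    """Cross entropy; y is a one-hot label vector, y_hat a probability vector."""
    return -np.sum(y * np.log(y_hat))

# Toy data generated from y = 2x + 1 with no noise.
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = 2.0 * X + 1.0

w, b = 0.0, 0.0        # parameters to learn
alpha = 0.1            # learning rate

for _ in range(500):
    y_hat = w * X + b
    # Gradient of MSE w.r.t. w and b for the linear model y_hat = w*x + b.
    grad_w = np.mean(2 * (y_hat - Y) * X)
    grad_b = np.mean(2 * (y_hat - Y))
    # Update rule: w <- w - alpha * grad_E(w), likewise for b.
    w -= alpha * grad_w
    b -= alpha * grad_b

print(round(w, 2), round(b, 2))   # converges toward w = 2, b = 1

# Cross entropy of a predicted distribution against a one-hot label: -ln(0.7).
print(round(cross_entropy(np.array([0.7, 0.2, 0.1]),
                          np.array([1.0, 0.0, 0.0])), 3))
```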