Models

Most modern deep learning models are based on artificial neural networks, in particular convolutional neural networks (CNNs). During training, the algorithms use unknown elements in the input distribution to extract features, group objects, and discover useful data patterns. Much like training machines for self-learning, this occurs at multiple levels, and the trained model is then used to run inference on an image that has no annotation.

While no single network is considered perfect, some architectures are better suited to specific tasks or to extracting specific patterns.

Here are some of the models available in GDL.

Segmentation

UNet

UNet is a fully convolutional neural network for semantic image segmentation. It consists of an encoder and a decoder connected by skip connections. The encoder extracts features at different spatial resolutions; these features are passed through the skip connections to the decoder, which uses them to define an accurate segmentation mask. Concatenation is used to fuse the decoder blocks with the skip connections.
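The encoder/decoder structure described above can be followed by tracing tensor shapes through the network. The sketch below is only an illustration in plain Python: the function name `trace_unet_shapes`, the base width of 64 channels, the doubling of channels per level, and the depth of 4 are assumptions borrowed from the common UNet layout, not values read from `models.unet`.

```python
def trace_unet_shapes(in_channels, classes, height, width, base=64, depth=4):
    """Trace (channels, height, width) through a UNet-style network.

    Hypothetical helper: `base` and `depth` are illustrative assumptions,
    not the settings used by the GDL implementations.
    """
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")

    shapes = [("input", (in_channels, height, width))]
    skips = []
    h, w = height, width
    c = base

    # Encoder: each level produces features, keeps them as a skip
    # connection, then halves the spatial resolution.
    for level in range(depth):
        shapes.append((f"enc{level}", (c, h, w)))
        skips.append((c, h, w))
        h, w = h // 2, w // 2
        c *= 2

    shapes.append(("bottleneck", (c, h, w)))

    # Decoder: upsample, concatenate with the matching skip connection
    # along the channel axis, then reduce back to the skip's channel count.
    for level in reversed(range(depth)):
        skip_c, skip_h, skip_w = skips[level]
        h, w = h * 2, w * 2
        up_c = c // 2
        assert (h, w) == (skip_h, skip_w)
        shapes.append((f"dec{level}_concat", (up_c + skip_c, h, w)))
        c = skip_c
        shapes.append((f"dec{level}", (c, h, w)))

    # Final layer: one output channel per class.
    shapes.append(("mask", (classes, h, w)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_unet_shapes(in_channels=3, classes=5,
                                         height=256, width=256):
        print(f"{name:>12}: {shape}")
```

Concatenation is why every `dec*_concat` entry has the channels of the upsampled features plus the channels of the skip connection, while the spatial size matches the encoder level it is fused with.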

Here are some implementations found in the config model folder.

class models.unet.UNetSmall(*args: Any, **kwargs: Any)[source]

Main UNetSmall architecture, a shallower version of UNet.

__init__(classes, in_channels, dropout=False, prob=0.5)[source]

Initialize the UNetSmall.

Parameters:
  • classes (int) – number of classes in the output mask (equivalently, the number of channels of the output mask).

  • in_channels (int) – number of input channels for the model, for example 3 for RGB images.

  • dropout (bool, optional) – whether to apply spatial dropout. Defaults to False.

  • prob (float, optional) – dropout probability. Defaults to 0.5.

forward(input_data)[source]

Forward function used during training.

Parameters:

input_data (Tensor) – tensor containing the image.

Returns:

tensor containing the result from the model.

Return type:

Tensor

class models.unet.UNet(*args: Any, **kwargs: Any)[source]

Main UNet architecture

__init__(classes, in_channels, dropout=False, prob=0.5)[source]

Initialize the UNet.

Parameters:
  • classes (int) – number of classes in the output mask (equivalently, the number of channels of the output mask).

  • in_channels (int) – number of input channels for the model, for example 3 for RGB images.

  • dropout (bool, optional) – whether to apply spatial dropout. Defaults to False.

  • prob (float, optional) – dropout probability. Defaults to 0.5.

forward(input_data)[source]

Forward function used during training.

Parameters:

input_data (Tensor) – tensor containing the image.

Returns:

tensor containing the result from the model.

Return type:

Tensor
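Since `classes` sets the number of channels of the output mask, a per-pixel label map can be obtained by taking, for each pixel, the channel with the highest score. The sketch below shows this in plain Python on nested lists. It assumes the result of `forward` for a single image can be read as one score map per class, indexed as `scores[class][row][col]`; that layout and the helper name `logits_to_mask` are assumptions made for illustration, not something stated in this page.

```python
def logits_to_mask(scores):
    """Convert per-class score maps into a label mask.

    scores[c][y][x] is the score of class c at pixel (y, x).
    Returns mask[y][x] = index of the highest-scoring class.
    Hypothetical helper, shown with lists instead of tensors.
    """
    n_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])

    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # argmax over the class (channel) axis for this pixel
            best = max(range(n_classes), key=lambda c: scores[c][y][x])
            row.append(best)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Two classes, 2x2 image.
    scores = [
        [[0.1, 2.0], [0.3, 0.0]],  # class 0
        [[1.5, 0.2], [0.1, 0.9]],  # class 1
    ]
    print(logits_to_mask(scores))
```

With tensors, the same operation is usually a single argmax over the channel dimension.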

An implementation from the smp model library is also available. In addition, the folder contains some specific combinations of the smp model, such as UNet++, UNet pretrained on ImageNet, UNet with a senet154 encoder, UNet with a resnext101 encoder, and more. See the config model folder for the complete list of combinations.

DeepLabV3

DeepLabV3 implementation, based on the paper Rethinking Atrous Convolution for Semantic Image Segmentation, from the smp model library.

Also from the same library, another version of DeepLabV3, named DeepLabV3+, based on the paper Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.
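Both DeepLabV3 papers build on atrous (dilated) convolution, where the kernel taps are spaced `rate` samples apart so the receptive field grows without adding weights. The following is a minimal 1D sketch in plain Python to show the idea; the function names are hypothetical and this is not the smp implementation, which operates on 2D feature maps.

```python
def receptive_field(kernel_size, rate):
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate=1):
    """1D 'valid' convolution (cross-correlation form) with dilation `rate`.

    rate=1 is an ordinary convolution; rate=2 skips every other sample.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")

    span = receptive_field(len(kernel), rate)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel tap i reads the input at offset i * rate.
        out.append(sum(k * signal[start + i * rate]
                       for i, k in enumerate(kernel)))
    return out


if __name__ == "__main__":
    signal = list(range(10))
    kernel = [1, 0, -1]
    for rate in (1, 2, 3):
        print(rate, receptive_field(len(kernel), rate),
              atrous_conv1d(signal, kernel, rate))
```

The kernel keeps three weights at every rate, while the span of input it covers grows from 3 to 5 to 7 samples.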

Segformer

The Segformer model implementation is based on the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. The encoder is called from SMP. For more code implementation details, check this repo.

class models.segformer.SegFormer(*args: Any, **kwargs: Any)[source]

Segformer Model

Parameters:
  • encoder (str) – encoder name

  • in_channels (int) – number of bands/channels

  • classes (int) – number of classes

HRNet + OCR

The HRNet + OCR model implementation is based on the HRNet paper and the OCR paper. For more code implementation details, check this repo.

class models.hrnet.hrnet_ocr.HRNet(*args: Any, **kwargs: Any)[source]

High Resolution Network (hrnet_w48_v2) with Object Contextual Representation module

Parameters:
  • pretrained (bool) – use pretrained weights

  • in_channels (int) – number of bands/channels

  • classes (int) – number of classes