

To get started, install glasses with pip:

pip install git+


Segmentation models can be found in glasses.models.segmentation. Using one takes only a few lines:

import torch
from glasses.models.segmentation import UNet

x = torch.randn((1, 1, 384, 384))
model = UNet(n_classes=2)
out = model(x)
print(out.shape)
# torch.Size([1, 2, 384, 384])

Change Encoder

In glasses you can swap the encoder of any segmentation model on the fly. Each segmentation model inherits from SegmentationModule and expects an Encoder instance.

All glasses classification models are composed of an encoder and a head, so changing the encoder is as easy as passing it as a parameter:

from glasses.models.classification.resnet import ResNetEncoder

x = torch.randn((1, 1, 384, 384))
model = UNet(encoder=ResNetEncoder, n_classes=2)
out = model(x)
print(out.shape)
# torch.Size([1, 2, 192, 192])

Notice that the output is half the size of the standard U-Net output. This is because ResNet has 4 stages, i.e. four layers that each halve the input resolution. To match the input shape we have to upsample one more time. In other words, we need to extend the widths of the decoder.
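The resolution bookkeeping behind this can be sketched in plain Python. This is only an illustration, not glasses code: `output_size` is a hypothetical helper, and the stage counts (five halvings in the encoder, four or five doublings in the decoder) are assumptions chosen to reproduce the shapes printed in this section.

```python
def output_size(input_size: int, n_down: int, n_up: int) -> int:
    """Spatial size after n_down halvings followed by n_up doublings."""
    size = input_size
    for _ in range(n_down):
        size //= 2  # each downsampling step halves the resolution
    for _ in range(n_up):
        size *= 2  # each decoder stage doubles it again
    return size


# Assumed counts: the encoder halves 5 times, the default decoder doubles 4 times.
print(output_size(384, n_down=5, n_up=4))  # 192
# One extra decoder stage restores the input resolution.
print(output_size(384, n_down=5, n_up=5))  # 384
```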

Like classification models, each segmentation model is composed of three sub-modules: encoder, decoder and head. We can therefore compose them to create any custom model.

from glasses.models.segmentation.unet import UNetDecoder
from functools import partial

x = torch.randn((1, 1, 384, 384))
model = UNet(encoder=ResNetEncoder, decoder=partial(UNetDecoder, widths=[512, 256, 128, 64, 32]), n_classes=2)
out = model(x)
print(out.shape)
# torch.Size([1, 2, 384, 384])

We used partial to change the widths parameter of the decoder to match the encoder’s stages. Each segmentation model also has a .from_encoder method that takes a model as input and automatically builds the segmentation model with that model’s encoder.

from glasses.models import AutoModel

x = torch.randn((1, 1, 384, 384))
model = UNet.from_encoder(model=partial(AutoModel.from_name, 'efficientnet_b1'), n_classes=2)
out = model(x)
print(out.shape)
# torch.Size([1, 2, 192, 192])
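
Both the custom decoder and the from_encoder examples rely on functools.partial to pass a factory with some arguments already bound, rather than a finished object. A minimal sketch of that pattern follows, using a hypothetical `ToyDecoder` class that stands in for a real decoder. Its name, parameters, and defaults are invented for illustration and are not part of glasses.

```python
from functools import partial


class ToyDecoder:
    """Hypothetical stand-in for a decoder; not a glasses class."""

    def __init__(self, start_features, widths=(256, 128, 64, 32)):
        self.start_features = start_features
        self.widths = list(widths)


# partial returns a callable with `widths` pre-bound; nothing is built yet.
decoder_factory = partial(ToyDecoder, widths=[512, 256, 128, 64, 32])

# Whoever receives the factory supplies the remaining arguments later.
decoder = decoder_factory(start_features=2048)
print(decoder.widths)  # [512, 256, 128, 64, 32]
print(len(ToyDecoder(start_features=2048).widths))  # 4 (the toy default)
```

Passing a factory instead of an instance lets the receiving model decide when to build the sub-module and which remaining arguments to give it.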

Pretrained encoders

We can just as easily pass a pretrained network using AutoModel. Note that models pretrained on ImageNet expect an input with 3 channels.

from glasses.models import AutoModel

x = torch.randn((1, 3, 384, 384))
model = UNet.from_encoder(model=partial(AutoModel.from_pretrained, 'efficientnet_b1'), in_channels=3, n_classes=2)
out = model(x)
print(out.shape)
# INFO:root:Loaded efficientnet_b1 pretrained weights.
# torch.Size([1, 2, 192, 192])

What if we would like to use an input with a number of channels other than 3? We need to replace the stem.

I am working on a way to load only a specific subset of weights, so that we can directly create a model with a different stem while keeping all the other weights pretrained.

from glasses.models.classification.resnet import ResNetStem


def get_encoder(*args, **kwargs):
    # load the pretrained classification model
    model = AutoModel.from_pretrained('resnet50')
    # replace the stem with a fresh one (1 input channel here)
    model.encoder.stem = ResNetStem(1, model.encoder.start_features)
    return model.encoder


x = torch.randn((1, 1, 384, 384))
model = UNet(encoder=get_encoder, in_channels=1, n_classes=2)
out = model(x)
print(out.shape)
# INFO:root:Loaded resnet50 pretrained weights.
# torch.Size([1, 2, 192, 192])
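
For the subset-of-weights idea mentioned above, a common general PyTorch pattern is to filter the state dict by key prefix before loading it. The sketch below shows only the filtering step on a plain dictionary, so it runs without any dependencies. The key names and the `drop_prefix` helper are hypothetical, and this is not how glasses itself loads weights.

```python
def drop_prefix(state_dict: dict, prefix: str) -> dict:
    """Return a copy of state_dict without the entries under `prefix`."""
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}


# Hypothetical checkpoint: the values would be tensors in a real model.
pretrained = {
    "stem.conv.weight": "3-channel stem weights",
    "layers.0.conv.weight": "stage weights",
    "layers.1.conv.weight": "stage weights",
}

kept = drop_prefix(pretrained, "stem.")
print(sorted(kept))  # ['layers.0.conv.weight', 'layers.1.conv.weight']
```

With real tensors, the filtered dictionary can be passed to torch.nn.Module.load_state_dict with strict=False, so the new stem keeps its fresh initialization while the remaining layers receive the pretrained weights.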

The API is shared across all segmentation models. For example, we can import PFPN (Panoptic Feature Pyramid Networks) and keep the same code:

from glasses.models import AutoModel
from glasses.models.segmentation import PFPN

x = torch.randn((1, 1, 384, 384))
model = PFPN.from_encoder(model=partial(AutoModel.from_name, 'efficientnet_b1'), n_classes=2)
out = model(x)
print(out.shape)
# torch.Size([1, 2, 384, 384])

In this case the output always matches the input; this is due to how PFPN works.