Implement nnU-Net with Custom Dataset using Google Colab

Motivation:

During my Master's degree in Medical Imaging and Applications, one of my final projects was on brain tissue segmentation using the well-known IBSR18 dataset. The task was to segment three tissue classes (white matter, gray matter and CSF) plus the background, so it was a multi-class segmentation problem. I won't go into the details of the dataset as it is not the concern of this blog. I will jump right into the main point: nnUNet.

Since you are here, I am guessing you already know what nnUNet is. nnUNet is the first segmentation method designed to deal with the dataset diversity found in the medical imaging domain: it condenses and automates the key decisions needed to build a successful segmentation pipeline for any given dataset. Feel free to go through the research paper that I have put in the references below. And if you are not a fan of reading research papers, you can watch the YouTube video I have also referenced below (it is a very good video!).

So, let's get back to our implementation. When I first wanted to implement nnUNet, I planned on doing it in Google Colab, as I already had a Colab Pro subscription and nnUNet needs a pretty good GPU to run. Just a heads up: Colab Pro assigned me a Tesla P100-PCIE-16GB GPU and up to 26.30 GB of RAM. However, it was very difficult for me to find an easy and straightforward implementation of nnUNet with a custom dataset for a beginner like me. So, after I successfully managed to implement the algorithm, I thought of sharing it with you through this blog.

Hopefully by the end of the blog you will have your nnUNet code ready and running.

Step 1 : Setting up Colab for nnUNet

First of all, install nnUNet in your Colab notebook and import the necessary libraries.

# Install nnUNet
!pip install nnunet

# Import Libraries
from urllib import request
import pathlib
import zipfile
import os
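
By the way, if you want to check which GPU and how much RAM your session actually got (the numbers I quoted above), you can run a quick check like the one below. This is just a sanity check; the output depends on what Colab assigns you.

# check the assigned GPU
!nvidia-smi

# check the total RAM (in GB)
import psutil
print(round(psutil.virtual_memory().total / 1e9, 2), 'GB')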

Step 2 : Prepare the Dataset

Any custom dataset needs to be organized in a specific format for nnUNet to be able to process the data. You need to arrange the images and the labels in different folders and also add a json file that describes your data. Make sure you follow these steps:

  • Naming the main Folder : The name of the folder must follow the format "TaskXXX_TaskName". XXX can be any number from 101 upwards, since custom dataset IDs start at 101; you will be using this number for the rest of the code. The TaskName can be anything; in my case I named it IBSR.
  • Naming the Subfolders : The training images should go into the “imagesTr” folder, the training ground truths in the “labelsTr” folder and the test images in the “imagesTs” folder as shown in the image below (on the left).
  • Creating the json File : You then need to add a json file containing the name, description, reference, licence, release, image dimensionality, modalities, classes, the number of training and test cases, and the name and location of each of them. You need to list every single training image, each accompanied by its corresponding ground truth, and you also need to list all of your test images. I have provided a sample of the json file that I made below; the fields should be pretty self-explanatory.
  • Renaming the Images : The images and their corresponding labels should have exactly the same name. In addition, the names of the training and test images (but not the labels) must end with the suffix "_0000" in the case of a single modality, as shown in the image below (on the right). Leave the label names as they are. If your dataset has multiple modalities, refer to the official GitHub repo of nnUNet for the exact naming format. A small renaming sketch is shown after the json sample below.
{
  "name": "BrainTissue",
  "description": "IBSR 18 Dataset",
  "reference": "National Institute of Neurological Disorders and Stroke (NINDS)",
  "licence": "Grant number 1 R01 NS34189-01",
  "release": "1.0 05/07/2013",
  "tensorImageSize": "3D",
  "modality": {
    "0": "T1-w"
  },
  "labels": {
    "0": "background",
    "1": "CSF",
    "2": "GM",
    "3": "WM"
  },
  "numTraining": 3,
  "numTest": 2,
  "training": [
    {"image": "./imagesTr/IBSR_01.nii.gz", "label": "./labelsTr/IBSR_01.nii.gz"},
    {"image": "./imagesTr/IBSR_03.nii.gz", "label": "./labelsTr/IBSR_03.nii.gz"},
    {"image": "./imagesTr/IBSR_20.nii.gz", "label": "./labelsTr/IBSR_20.nii.gz"}
  ],
  "test": [
    "./imagesTs/IBSR_12.nii.gz",
    "./imagesTs/IBSR_13.nii.gz"
  ]
}
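
If you have many images, renaming everything by hand quickly gets tedious. Here is a minimal sketch of how the "_0000" suffix could be added with Python; the task_dir path is just a placeholder, so point it at your own dataset folder before zipping it.

# add the "_0000" suffix to the images in imagesTr and imagesTs
# (labels in labelsTr keep their original names)
import os

task_dir = 'TaskXXX_TaskName'  # placeholder: path to your dataset folder
for subfolder in ['imagesTr', 'imagesTs']:
    folder = os.path.join(task_dir, subfolder)
    for fname in os.listdir(folder):
        if fname.endswith('.nii.gz') and '_0000' not in fname:
            new_name = fname.replace('.nii.gz', '_0000.nii.gz')
            os.rename(os.path.join(folder, fname), os.path.join(folder, new_name))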

Step 3: Setting up the Environments

After my dataset was ready, I compressed it and put it in my drive so that I could extract it into the required folder during execution. Then I set up all the required directories accordingly.

First of all, connect your Colab notebook to your Google Drive. I personally used Google Drive because nnUNet takes a very long time to execute and there is a high risk of losing data if you save it locally in the notebook environment. (It has happened to me and it was heartbreaking.)

# mount Google Drive in the notebook
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Then set up the base directory, the directory where your raw data and the preprocessed data will be saved. Also set up the path to the results directory.

# base directory
base_dir = '/content/drive/MyDrive/'
# directories for the data
nnUNet_raw_data_base = os.path.join(base_dir, 'nnUNet_raw_data_base')
nnUNet_preprocessed = os.path.join(base_dir, 'nnUNet_preprocessed')
# directory where all the results will be saved
results_folder = os.path.join(base_dir, 'results')
raw_data_dir = os.path.join(nnUNet_raw_data_base, 'nnUNet_raw_data')

os.environ["nnUNet_raw_data_base"] = str(nnUNet_raw_data_base)
os.environ["nnUNet_preprocessed"] = str(nnUNet_preprocessed)
os.environ["RESULTS_FOLDER"] = str(results_folder)

Extract your dataset into the raw data directory. You only need to do this once.

raw_data_dir = os.path.join(nnUNet_raw_data_base, 'nnUNet_raw_data')
# url_path is the path to the zipped dataset folder I had saved in my Drive
with zipfile.ZipFile(url_path, 'r') as zip_ref:
    zip_ref.extractall(raw_data_dir)

In the last step of your setup, verify that your dataset is organised properly using the command below, putting your custom task number in place of XXX. This command also runs all of nnUNet's preprocessing on your data.

!nnUNet_plan_and_preprocess -t XXX --verify_dataset_integrity

In this step you will also be able to verify that all the labels in your dataset are being recognised properly. All the labels should appear while the preprocessing is being done, as in the image below. If you get an error at this stage, you will need to go back and recheck your dataset.
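
Once the command finishes without errors, you can also take a quick look at the preprocessed folder to confirm that files were actually written there. A small check:

# list what nnUNet wrote into the preprocessed folder
for root, dirs, files in os.walk(nnUNet_preprocessed):
    print(root, '-', len(files), 'files')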

Step 4: Training

Now let's start training the model. nnUNet has several configurations (2D U-Net, 3D full-resolution U-Net, 3D low-resolution U-Net and the 3D cascade); I will only show the commands for the 2D U-Net configuration, as the commands for the others are the same. For each configuration, you need to train the model for folds 0, 1, 2, 3 and 4. The number of epochs is set to 1000 by default; I have yet to try a customised number of epochs.

For the 2D configuration, each fold took around 25 hours to complete on Google Colab Pro, which means it took more than 5 days to train the model. In other words, you need to be very patient. The upside is that you only need to type one line of command to train this powerful algorithm.

!nnUNet_train CONFIGURATION TRAINER_CLASS_NAME TASK_NAME_OR_ID FOLD

where CONFIGURATION is the U-Net configuration (2d in our case), TRAINER_CLASS_NAME is the trainer class (nnUNetTrainerV2 by default), TASK_NAME_OR_ID is the dataset name you have assigned (or its number), and FOLD is a number from 0 to 4 (one fold at a time).

!nnUNet_train 2d nnUNetTrainerV2 TaskXXX_TaskName 0 --npz
!nnUNet_train 2d nnUNetTrainerV2 TaskXXX_TaskName 1 --npz
!nnUNet_train 2d nnUNetTrainerV2 TaskXXX_TaskName 2 --npz
!nnUNet_train 2d nnUNetTrainerV2 TaskXXX_TaskName 3 --npz
!nnUNet_train 2d nnUNetTrainerV2 TaskXXX_TaskName 4 --npz

The --npz flag makes nnUNet also save the softmax outputs of the validation cases; these are needed later when finding the best configuration, and they give you the per-class probability maps of your multi-class segmentation. To verify that the segmentation is being done as multi-class, look at the Dice scores printed during training: if you have 3 foreground classes, 3 numbers should appear there. If you see a single number, the training is effectively being done in binary. On Colab Pro, each epoch took around 80 seconds for me. (If you overuse Colab Pro like I did, a less powerful GPU will be assigned to you and each epoch will take more time to execute.)
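
While a fold is training, nnUNet keeps updating a progress plot (loss curves and the evaluation metric) inside the results folder, so you can peek at it from another cell. The path below is only a sketch that assumes the default trainer and plans names plus my folder layout, so adjust it to your own setup.

# display nnUNet's training progress plot for fold 0
# (path is an assumption - adjust task name, trainer, plans and fold to your setup)
from IPython.display import Image, display

progress_png = os.path.join(results_folder, 'nnUNet', '2d', 'TaskXXX_TaskName',
                            'nnUNetTrainerV2__nnUNetPlansv2.1', 'fold_0', 'progress.png')
display(Image(filename=progress_png))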

Another fun thing you can do with nnUNet is resume your training if it stops somewhere. You just need to add -c at the end of the training command, like so:

!nnUNet_train 2d nnUNetTrainerV2 TaskXXX_TaskName 0 --npz -c

Now sit back and relax while nnUNet does the entire training.

Step 5: Prediction

After you have run all 5 folds, it is time to make predictions using the trained model. You just need to type the following command and nnUNet will find the best configuration and also print the exact command for making predictions on the test data.

# find the best configuration
!nnUNet_find_best_configuration -m 2d -t XXX

To get the segmentation of the test dataset, run the following command. In place of "FOLDER_WITH_TEST_CASES", put the full path to the folder containing the test images, and in place of "OUTPUT_FOLDER", put the full path of the location where you want to save the final predictions. Note that "FOLDER_WITH_TEST_CASES" can be the path to any folder containing images you want to predict (not necessarily the test images you initially uploaded and listed in the json file).

The following command is the command that my nnUNet model generated for me to do the predictions:

# test prediction
!nnUNet_predict -i FOLDER_WITH_TEST_CASES -o OUTPUT_FOLDER -tr nnUNetTrainerV2 -ctr nnUNetTrainerV2CascadeFullRes -m 2d -p nnUNetPlansv2.1 -t TaskXXX_TaskName

In case you want to see the prediction of a particular fold, you can add "-f" followed by the fold number. Adding the "--save_npz" option makes the model also save the softmax outputs, i.e. the per-class probability maps.

# test prediction for fold 0
!nnUNet_predict -i FOLDER_WITH_TEST_CASES -o OUTPUT_FOLDER -m 2d -f 0 -t TaskXXX_TaskName --save_npz
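
If you want to double check that the predictions really contain all of your classes, you can open one of the predicted files and print the unique label values. A small sketch using nibabel (the file path is just an example; replace it with an actual file from your OUTPUT_FOLDER):

# inspect the label values in one predicted segmentation
import numpy as np
import nibabel as nib

pred = nib.load('/content/drive/MyDrive/predictions/IBSR_12.nii.gz')  # example path
print(np.unique(pred.get_fdata()))  # expect [0. 1. 2. 3.] for background, CSF, GM and WM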

Final Remarks:

Congratulations, you have successfully trained nnUNet on your custom dataset. I hope this blog was helpful for you. I will update this blog along the way when I learn something new.

Leave comments below if you have any questions. Happy Coding!

References:
