Commits

46 commits
cfd7a46
Added initial files before swapping to rangpur
larsmoan Oct 5, 2023
9bec4a7
skeleton for readme
larsmoan Oct 5, 2023
17090b1
Added dataset and fixed gitignore
larsmoan Oct 5, 2023
af1e6c7
Almost finished preprocessing the ISIC dataset so that it can be used…
larsmoan Oct 11, 2023
148231b
Added yolov7 as a submodule
larsmoan Oct 11, 2023
0f7f0a3
Removed unnecessary poetry-dem
larsmoan Oct 11, 2023
60dc40a
Working cfg and data.yaml files for custom training with ISIC2017 dat…
larsmoan Oct 12, 2023
9d079ce
Removing Ds store
larsmoan Oct 12, 2023
61185e5
Removing Ds store
larsmoan Oct 12, 2023
e41ad7e
Added bash training script and ability to download downsampled datase…
larsmoan Oct 13, 2023
2c5318f
Added stat function for downsampled dataset and easier visualizing of…
larsmoan Oct 16, 2023
5c713bd
Added slurm script for cluster
larsmoan Oct 16, 2023
94d8464
Updated gitignore
larsmoan Oct 16, 2023
1e48599
Created using Colaboratory
larsmoan Oct 16, 2023
642ff41
Started working on description of dataset for the reeport
larsmoan Oct 17, 2023
699eb28
Merge branch 'main' of github.com:larsmoan/PatternAnalysis-2023
larsmoan Oct 17, 2023
c19b1f9
Refactored the dataset utils needed to create YOLO version of the dat…
larsmoan Oct 17, 2023
29d8a79
Merge branch 'main' of github.com:larsmoan/PatternAnalysis-2023
larsmoan Oct 17, 2023
1e8ea00
Added link to downsampled dataset
larsmoan Oct 17, 2023
efef3f2
Added working rangpur scripts and requirements.txt
larsmoan Oct 18, 2023
b01e8e3
gitignore change
larsmoan Oct 18, 2023
b6a62a0
merge
larsmoan Oct 18, 2023
0482f05
Changed training file to correct batch size, utilizes the GPU better
larsmoan Oct 18, 2023
ef84a78
Removed some unneccesary stuff in readme
larsmoan Oct 18, 2023
5d757a9
readme work
larsmoan Oct 18, 2023
a90f08e
documentation
larsmoan Oct 18, 2023
3d29c3d
Changed some hyperparams for augmenting the data more during training
larsmoan Oct 19, 2023
e89b65b
Altered gitignore to ignore output train files
larsmoan Oct 19, 2023
73c304a
Finished training on the a100
larsmoan Oct 20, 2023
9bdc810
Added results from test on p100 and batch 32
larsmoan Oct 21, 2023
933f9dd
Cleanup of old and unnecessary files + test script for model
larsmoan Oct 21, 2023
bbd7b0d
Finished working test script, now it automatically downloads the weig…
larsmoan Oct 21, 2023
652e789
commented on the training hyp params
larsmoan Oct 21, 2023
96c4008
Added functionality for downloading transfer learning weights if not …
larsmoan Oct 21, 2023
a22da64
Removed unnecessary notebook file for training on google colab + docu…
larsmoan Nov 9, 2023
4e3f4ce
Merge branch 'topic-recognition' of github.com:larsmoan/PatternAnalys…
larsmoan Nov 9, 2023
134fdf7
Added more explenation about the YOLO architecture
larsmoan Nov 9, 2023
59b91a0
Added in the training run plots for loss and performance from Wandb
larsmoan Nov 9, 2023
65b0330
Added metric plots for mAP, precision and recall
larsmoan Nov 9, 2023
46a67cf
Added commenting on the training plots and the result plots
larsmoan Nov 9, 2023
13c9e2c
Did a new test run on batch size 9 for readme
larsmoan Nov 9, 2023
c1793f4
Results from new test run
larsmoan Nov 9, 2023
0dc5463
Added plot for showing the misalignement with the anchor boxes
larsmoan Nov 9, 2023
2289417
Minor readme changes
larsmoan Nov 10, 2023
1e76280
Added reflection on downsampling the dataset#
larsmoan Nov 10, 2023
89caaf8
Minor
larsmoan Nov 10, 2023
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
.venv
.DS_Store
5 changes: 5 additions & 0 deletions .gitmodules
@@ -0,0 +1,5 @@
[submodule "lesion_detection_larsmoan/yolov7"]
path = lesion_detection_larsmoan/yolov7
url = https://github.com/WongKinYiu/yolov7


17 changes: 17 additions & 0 deletions lesion_detection_larsmoan/.gitignore
@@ -0,0 +1,17 @@
data/
__pycache__
out.txt
err.txt
err_p6_p100.txt
out_p6_p100.txt

runs/
wandb

.DS_Store
.venv
.env
*.pt

err_test.txt
out_test.txt
259 changes: 259 additions & 0 deletions lesion_detection_larsmoan/README.md
@@ -0,0 +1,259 @@
## Lesion detection and classification using YOLOV7 on the ISIC2017 dataset

### Table of Contents
- [Lesion detection and classification using YOLOV7 on the ISIC2017 dataset](#lesion-detection-and-classification-using-yolov7-on-the-isic2017-dataset)
- [Table of Contents](#table-of-contents)
- [Installation](#installation)
- [Dataset](#dataset)
- [Overview](#overview)
- [Preprocessing](#preprocessing)
- [Usage](#usage)
- [Model Architecture: open source YOLOV7 Model](#model-architecture-open-source-yolov7-model)
- [Core ideas used in the YOLOV1 paper:](#core-ideas-used-in-the-yolov1-paper)
- [Output](#output)
- [Training](#training)
- [Training discussion](#training-discussion)
- [Results](#results)
- [Confusion Matrix](#confusion-matrix)
- [F1 - curve](#f1---curve)
- [Precision - Recall curve](#precision---recall-curve)
- [Results discussion](#results-discussion)
- [Downsampling](#downsampling)
- [Anchor boxes misaligned](#anchor-boxes-misaligned)
- [Example outputs](#example-outputs)


### Installation
- Prerequisites: Python 3.10.12 and CUDA 11.7

**A GPU cluster is used for this project, specifically Rangpur at UQ, so most of the training and inference scripts are built around Slurm jobs. If needed, they can easily be adapted to run locally.**

```
git clone git@github.com:larsmoan/PatternAnalysis-2023.git
git submodule init
git submodule update
pip install -r requirements.txt
```

### Dataset
**Source**: [ISIC 2017 Dataset](https://challenge.isic-archive.com/data/#2017)

### Overview
Each image comes with a corresponding label and a segmentation mask highlighting the lesion.
- **Training set**: 2000 images
- **Validation set**: 600 images
- **Test set**: 150 images

**Lesion Classes**:
- `Melanoma`
- `Seborrheic Keratosis`
- `Nevi / Unknown`: technically a benign skin lesion, commonly referred to as a mole.


### Preprocessing
Since the dataset provides segmentation masks rather than bounding boxes, preprocessing is needed to convert these labels into YOLO bounding-box labels.

Steps include:
1. Identify the maximum and minimum coordinates within the segmentation area.
2. Fit a bounding box around this region.
3. Assign the class based on the label provided in the associated CSV file.

More information can be found in the file: [dataset_utils.py](./dataset_utils.py)
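As a rough sketch of these steps (the helper below is made up for illustration; the actual implementation is in [dataset_utils.py](./dataset_utils.py)):

```python
import numpy as np
from PIL import Image

def mask_to_yolo_label(mask_path: str, class_id: int) -> str:
    """Convert a binary segmentation mask into one YOLO label line."""
    mask = np.array(Image.open(mask_path).convert("L"))
    ys, xs = np.nonzero(mask)  # pixels belonging to the lesion
    h, w = mask.shape
    x_min, x_max, y_min, y_max = xs.min(), xs.max(), ys.min(), ys.max()
    # YOLO format: class x_center y_center width height, normalized to [0, 1]
    x_c = (x_min + x_max) / 2 / w
    y_c = (y_min + y_max) / 2 / h
    bw = (x_max - x_min) / w
    bh = (y_max - y_min) / h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {bw:.6f} {bh:.6f}"
```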

The dataset also needs to be restructured a bit to work with YOLOV7, so it is reorganized as follows:
```
dataset/
├── train/
│ ├── img_1.jpg
│ ├── ...
│ ├── img_n.jpg
│ ├── img_1.txt
│ ├── ...
│ └── img_n.txt
├── val/
│ ├── img_1.jpg
│ ├── ...
│ ├── img_n.jpg
│ ├── img_1.txt
│ ├── ...
│ └── img_n.txt
└── test/
├── img_1.jpg
├── ...
├── img_n.jpg
├── img_1.txt
├── ...
└── img_n.txt
```

The preprocessed dataset can be downloaded from this link:
https://drive.google.com/uc?id=1YI3pwanX35i7NCIxKnfXBozXiyQZcGbL or via [dataset_utils.py](./dataset_utils.py)
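For reference, a minimal sketch of downloading it programmatically, assuming the `gdown` package (the actual logic in [dataset_utils.py](./dataset_utils.py) may differ):

```python
import gdown

url = "https://drive.google.com/uc?id=1YI3pwanX35i7NCIxKnfXBozXiyQZcGbL"
gdown.download(url, "isic2017_yolo.zip", quiet=False)  # output filename is arbitrary
```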



### Usage
- Download the dataset and pretrained yolov7 weights:
```
python dataset_utils.py
```
- Train the model:
Using the Rangpur cluster:
```
sbatch run_custom_train.sh
```
Or using Google Colab:
[isic_train.ipynb](./isic_train.ipynb)
- Run inference on the test set:
```
sbatch run_test.sh
```

### Model Architecture: open source [YOLOV7 Model](https://github.com/WongKinYiu/yolov7)

YOLOV7 is based on the original YOLO paper, [YOLOV1](https://arxiv.org/abs/1506.02640), which was published in 2015 and presented a leap in inference speed for object detection models. The main reason was that it was one of the first models to perform object detection in a single stage, hence the name YOLO (you only look once), in contrast to the two-stage models that were popular at the time. Some single-stage models did exist, such as SSD, but their accuracy was comparatively poor.

#### Core ideas used in the YOLOV1 paper:

The original model was trained on input images of size 448x448, and each image was divided into a 7x7 grid of cells.
<figure style="margin-right: 10px; display: inline-block;">
<img src="./figures/grid_cell_yolo.png" alt="Example Image" width="460" height="340">
  <figcaption>Original grid cells on 448x448 image. Source: <a href="http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf">"You Only Look Once" by Joseph Redmon et al., CVPR 2016</a>.</figcaption>
</figure>

The idea was that each grid cell is responsible for predicting an object if the center of that object falls within that cell.
**In this example the red grid cell would be responsible for detecting the car in the top right corner.**
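As a toy illustration (not code from this repo), the responsible cell is found by scaling the normalized object center by the grid size:

```python
# Which of the S x S grid cells is responsible for an object whose
# normalized center is (x_c, y_c)? (toy example, values made up)
S = 7
x_c, y_c = 0.82, 0.15                  # object center, normalized to [0, 1]
col, row = int(x_c * S), int(y_c * S)  # -> col 5, row 1
```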

The paper also proposed that each grid cell's output was two bounding boxes, each with its own confidence / objectness score, plus a single class probability vector. In other words, each grid cell could only produce one detection, although it predicted two candidate boxes. The bounding box with the highest objectness score was chosen, together with the class with the highest probability.

<figure style="margin-right: 10px; display: inline-block;">
<img src="./figures/yolov1.png" alt="Example Image" width="710" height="340">
  <figcaption>YOLOv1 output tensor. Source: <a href="http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf">"You Only Look Once" by Joseph Redmon et al., CVPR 2016</a>.</figcaption>
</figure>

The figure above presents the core idea: each grid cell proposes two bounding boxes (in black) but only one class. The "best" box and class above a certain threshold are then used as the final prediction -> single-stage object detection and classification!

### Output
Because the model uses a 7x7 grid, the output is a tensor of shape (batch_size, 7, 7, 30), where 30 is the number of parameters predicted for each grid cell:
[objectness_score, box_x, box_y, width, height] + [objectness_score, box_x, box_y, width, height] + [class_probabilities].
That is, two bounding boxes plus a vector of class probabilities. In the case of the original YOLO paper, the model was trained on the [PASCAL dataset](http://host.robots.ox.ac.uk/pascal/VOC/), which has 20 classes, hence 2 x 5 + 20 = 30 parameters per grid cell.


<figure style="margin-right: 10px; display: inline-block;">
<img src="./figures/output-tensor-yolov1.png.webp" alt="YOLOv1 Output Tensor Visualization" width="580" height="340">
  <figcaption>YOLOv1 output tensor. Source: <a href="http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf">"You Only Look Once" by Joseph Redmon et al., CVPR 2016</a>.</figcaption>
</figure>
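A toy sketch of how such an output tensor could be decoded (illustrative only, not the yolov7 implementation; the threshold and random input are assumptions):

```python
import numpy as np

S, B, C = 7, 2, 20                       # grid size, boxes per cell, classes
pred = np.random.rand(S, S, B * 5 + C)   # stand-in for one image's output

detections = []
for row in range(S):
    for col in range(S):
        cell = pred[row, col]
        boxes = cell[: B * 5].reshape(B, 5)   # [objectness, x, y, w, h] per box
        best = boxes[np.argmax(boxes[:, 0])]  # keep the box with highest objectness
        class_probs = cell[B * 5:]
        cls = int(np.argmax(class_probs))
        score = best[0] * class_probs[cls]    # objectness * class probability
        if score > 0.25:                      # arbitrary confidence threshold
            detections.append((row, col, cls, float(score), best[1:]))
```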


### Training
Training was done on the Rangpur cluster using a variety of GPUs; the two main training runs shown in the plots below were done on an A100 and a P100 (the A100 for batch size 8 and the P100 for batch size 32).

- The hyperparameters used are described in [hyp.scratch.p6.yaml](./hyp.scratch.p6.yaml)

***In the plots below, the term "step" refers to the epoch number.***

<figure style="margin-right: 10px; display: inline-block;">
<img src="./results/train-boxloss.png" alt="Example Image" width="600" height="300">
<figcaption>Box loss.</figcaption>
</figure>
<figure style="margin-right: 10px; display: inline-block;">
<img src="./results/train-clsloss.png" alt="Example Image" width="600" height="300">
<figcaption>Class loss.</figcaption>
</figure>
<figure style="margin-right: 10px; display: inline-block;">
<img src="./results/train-objloss.png" alt="Example Image" width="600" height="300">
<figcaption>Object loss.</figcaption>
</figure>


#### Training discussion
From the plots showing the box, class and object loss it is clear that all three were indeed still decreasing when training was halted, although by small, almost negligible amounts. The A100 training run took about 48 hours.

An interesting observation is that during both runs the loss dropped drastically for the first 10 or so epochs, after which the reduction was almost zero. This may indicate that the models fitted the presented data very quickly, but that the data itself has structure too complex for the models to learn much more from.


### Results
<figure style="margin-right: 10px; display: inline-block;">
<img src="./results/metrics-mAP0.5_0.95.png" alt="Example Image" width="600" height="300">
<figcaption>mAP 0.5:0.95.</figcaption>
</figure>
<figure style="margin-right: 10px; display: inline-block;">
<img src="./results/metrics-mAP_0.5.png" alt="Example Image" width="600" height="300">
<figcaption>mAP@0.5.</figcaption>
</figure>
<figure style="margin-right: 10px; display: inline-block;">
<img src="./results/metrics-precision.png" alt="Example Image" width="600" height="300">
<figcaption>Precision</figcaption>
</figure>
<figure style="margin-right: 10px; display: inline-block;">
<img src="./results/metrics-recall.png" alt="Example Image" width="600" height="300">
<figcaption>Recall</figcaption>
</figure>


#### Confusion Matrix
<img src="results/yolov7_b32_p100/confusion_matrix.png" alt="Description" >


#### F1 - curve
<img src="results/yolov7_b32_p100/F1_curve.png" alt="Description">

#### Precision - Recall curve
<img src="results/yolov7_b32_p100/PR_curve.png" alt="Description">

### Results discussion
The results are not very good, and far from what I anticipated when first embarking on this project. The best result was an mAP@0.5 of about 0.718, which is quite poor.

An important observation is the training instability in precision and recall: the models don't improve much on these metrics past the first 10 epochs, but they vary wildly from epoch to epoch.

Initially I thought these issues were related to the small batch size, which was the main driver for increasing it to 32 in the second training run.
However, this did not improve the results drastically, although it had a significant effect on mAP@0.5.

I think the reason the results are as poor as they are is the inherent complexity of the dataset: when going through the data personally, I often found it really challenging to make sense of the labels in the ISIC2017 dataset.

The unknown / benign label is especially tricky for a human to interpret, because the dataset contains many artifacts that are not labeled as benign skin lesions.

#### Downsampling
The original dataset was too large to fit on a student node at Rangpur, where the storage limit is 16 GB, so I downsampled it by a factor of 2 and used the downsampled version for both training and testing. Originally I thought this wouldn't pose a problem, since the original images are of really high resolution, often around 5 MB per image.
That said, downsampling is in itself a loss of information, since the dimensionality is reduced, so it may have contributed to the poor results.
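For illustration, a minimal sketch of what the factor-2 downsampling could look like (the actual code may differ):

```python
from pathlib import Path
from PIL import Image

def downsample_by_2(src_dir: str, dst_dir: str) -> None:
    """Halve the resolution of every .jpg in src_dir and save to dst_dir."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img_path in Path(src_dir).glob("*.jpg"):
        img = Image.open(img_path)
        img = img.resize((img.width // 2, img.height // 2), Image.LANCZOS)
        img.save(out / img_path.name, quality=90)
```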

#### Anchor boxes misaligned
After diving deeper into the internals of the YOLO architecture I have found one major factor that I suspect could improve performance substantially: using a different set of anchor boxes!
The standard anchor boxes used by YOLOV7 do not fit the labels in the ISIC dataset well, where a bounding box often takes up almost the complete image.

To mitigate this I should have performed a custom anchor box analysis by clustering the boxes present in the ISIC dataset, thereby "allowing" the model to make bigger predictions. This is something I will try to implement in the future; a sketch of such an analysis is shown below.
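A sketch of what such an anchor analysis could look like (illustrative, assumes scikit-learn; not part of this repo):

```python
from pathlib import Path
import numpy as np
from sklearn.cluster import KMeans

def compute_anchors(label_dir: str, n_anchors: int = 9) -> np.ndarray:
    """k-means over normalized label widths/heights -> dataset-specific anchors."""
    wh = []
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            _, _, _, w, h = map(float, line.split())  # class x_c y_c w h
            wh.append((w, h))
    km = KMeans(n_clusters=n_anchors, n_init=10).fit(np.array(wh))
    anchors = km.cluster_centers_                      # normalized (w, h) pairs
    return anchors[np.argsort(anchors.prod(axis=1))]   # sort small -> large
```

Note that the resulting anchors are normalized, so they would need scaling to the training resolution before being used.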

### Example outputs
<figure style="margin-right: 10px; display: inline-block;">
<img src="./results/yolov7_b32_testing8/test_batch0_labels.jpg" alt="Example Image" width="600" height="350">
<figcaption>Batch 0 - Labels</figcaption>
</figure>
<figure style="margin-right: 10px; display: inline-block;">
<img src="./results/yolov7_b32_testing8/test_batch0_pred.jpg" alt="Example Image" width="600" height="350">
<figcaption>Batch 0 - Predictions</figcaption>
</figure>

<figure style="margin-right: 10px; display: inline-block;">
<img src="./results/yolov7_b32_testing8/test_batch1_labels.jpg" alt="Example Image" width="600" height="350">
<figcaption>Batch 1 - Labels</figcaption>
</figure>
<figure style="margin-right: 10px; display: inline-block;">
<img src="./results/yolov7_b32_testing8/test_batch1_pred.jpg" alt="Example Image" width="600" height="350">
<figcaption>Batch 1 - Predictions</figcaption>
</figure>


These outputs clearly show the anchor box issue discussed above: many of the labels take up almost all of the image, quite far from the standard YOLO anchor boxes.

Another way of visualizing this is the plot below, which clearly shows how the anchor boxes are misaligned with the labels in the ISIC dataset.
<figure style="margin-right: 10px; display: inline-block;">
<img src="./figures/anchor_comparison.png" alt="Example Image" width="600" height="350">
<figcaption>Anchor comparison</figcaption>
</figure>