Configuration
How to use Hydra config files
Since GDL use the Hydra library, here’s how
GDL configuration work. Through a main yaml
file
gdl_config_template.yaml
located in the config
folder which handles additional yaml
files located in other subfolders.
We recommend starting with the
gdl_config_template.yaml
as the configuration entrypoint. To run your code with the proper parameters, see the
Running GDL section for an example with a semantic segmentation task.
There are other examples at the end of this document in the Examples section.
The config
folder is structured as depicted below.
It is important to remember that the
gdl_config_template.yaml
file contains every parameter necessary for executing the command.
Other yaml
files in subfolders handle specific categories of parameters.
The code is currently executed with
gdl_config_template.yaml
as a default configuration, the Examples section have an example on how to run the script with
an other config yaml
.
Each configuration file need to follow the structure show bellow, and each of those sections have their functionality that will be explain.
defaults:
...
general:
...
inference:
...
print_config: ...
mode: ...
debug: ...
Defaults Parameters
The defaults section is where all default yaml
files are loaded as input values for each category of parameters,
you don’t have to specify all of them.
For example, model: gdl_unet
means the config
folder contains a subfolder called model
which have a gdl_unet.yaml
file.
So for every time that GDL code call the parameter model
, it will have all the variables set in the gdl_unet.yaml
.
If you want to run GDL with another model like smp_deeplabv3
, you only have to change model: gdl_unet
to model: smp_deeplabv3
.
Just be sure that you have smp_deeplabv3.yaml
in your model
folder
(the Examples section have an example on how to change this parameter in the command line).
Options for each category of parameters are found in config subfolder by the same name.
defaults:
- model: gdl_unet
- verify: default_verify
- tiling: default_tiling
- training: default_training
- loss: binary/softbce
- optimizer: adamw
- callbacks: default_callbacks
- scheduler: plateau
- dataset: test_ci_segmentation_binary
- augmentation: basic_augmentation_segmentation
- tracker: # set logger here or use command line (e.g. `python GDL.py tracker=mlflow`)
- visualization: default_visualization
- inference: default_binary
- hydra: default
- override hydra/hydra_logging: colorlog # enable color logging to make it pretty
- override hydra/job_logging: colorlog # enable color logging to make it pretty
- _self_
All the files in the defaults section can be overwritten on the command line,
go to Examples section too see how to do.
The main goal of the structure is to organize all the parameters in meaningful and logical categories.
If you want to add new options for a category,
you’ll need to include # @package _global_
at the beginning of each yaml
added.
By doing so, the code in python will read model.parameters_name
as a directory.
If you accidentally omit the prefix # @package _global_
,
the python code will read model.unet.parameters_name
(as set by default currently),
so to be more versatile we want to read model.parameters_name
.
For example if you created new_model.yaml
to be read as a model and you don’t want
to change the main code to read this file each time you change model.
For more information about packages in Hydra,
see Hydra’s documentation on Packages.
For the tiling
parameter, you can find more information in the Data Tiling
containing the information to execute the this job.
Same for the training
and inference
parameter, the information can be found at
the Training and Inference section respectively.
When training the inference
part doesn’t need to be filled
and vice versa.
The tracker
is set to nothing by default, but will still log the information in the log folder.
If you want to set a tracker you can change the value in the config file or add the tracker
parameter at execution time via the command line python GDL.py tracker=mlflow mode=train
.
We recommend to use mlflow
, since the development team use it, but you can use whatever you want and
create a yaml
for it.
General Parameters
This section contains general parameters information that will be read by the code,
normally contain parameters often changed or paths to important file.
Other yaml
files from the defaults section will read parameters from the general section.
task: segmentation
work_dir: ${hydra:runtime.cwd} # where the code is executed
config_name: ${hydra:job.config_name}
config_override_dirname: ${hydra:job.override_dirname}
config_path: ${hydra:runtime.config_sources}
project_name: template_project
workspace: your_name
max_epochs: 2 # for train only
min_epochs: 1 # for train only
raw_data_dir: data
raw_data_csv: tests/tiling/tiling_segmentation_binary_ci.csv
tiling_data_dir: ${general.raw_data_dir}/patches # where the patches will be saved
save_weights_dir: saved_model/${general.project_name}
Note
The task
parameter have multiple options, see the Task section.
Print Config Parameter
If True
, this will save the config inder the run subfolder generated in the log folder.
Mode Parameter
mode: {verify, tiling, train, inference, evaluate}
- For GDL, the modes available are:
verify, verify the given data and generate an
csv
with infos and stats on those images.tiling, generates tiles from each source aoi (image & ground truth).
train, will train the model specified with all the parameters in the configuration file.
inference, generate the inference for the given images.
evaluate, generate statistics on the given images, unlike the inference, this mode need the images to be link to a ground truth.
Note
Each of those modes will be different for all the tasks, for further information on the well being of those modes, see the Mode section.
Debug Parameter
If True
, this will print the complete yaml config at the beginning
plus run a validation test on the dataloader before the training.
Examples
Here some examples on how to run GDL with Hydra.
Basic usage
Run the code with all the defaults value in the gdl_config_template.yaml .
(geo_deep_env) $ python GDL.py mode=train
Overwritting only one parameter
Changing only one parameter in the configuration.
# Changing the number of max epochs for training
(geo_deep_env) $ python GDL.py mode=train general.max_epochs=100
# Changing the dropout for the chosen model
(geo_deep_env) $ python GDL.py mode=train model.dropout=True
Adding a new parameters
Adding a new parameters in the config without having to write it in the yaml
.
(geo_deep_env) $ python GDL.py mode=train +new.params=1.0
The configuration that will be save for this run will look like that:
defaults:
...
general:
...
print_config: ...
mode: ...
debug: ...
new:
params: 1.0
Using an other configuration file
How to using a new gdl_config.yaml
file that has the same structure as the template yaml
but have different values. The usecase for that is, for example, you have a certain configuration
for your pipline that is different form your testing one, you dont want to change your parameters
each time. So you create a new yaml
for your pipline and when you are ready to run it, you
only have to run it like that:
(geo_deep_env) $ python GDL.py --config-name=/path/to/new/gdl_pipline_config.yaml mode=train
Other Hydra parameters to overwrite
See Hydra’s documentation on command line flags page for more informations.