IPA Project Structure
The IPA project structure aims to support the whole development cycle of your project. It is designed to be flexible and scalable, allowing your project to grow in size and complexity without sacrificing clarity.
The project template is organized to support clear separation between developing a workflow and running a workflow on data. Furthermore, there is a clear separation between raw_data, processed_data, and results. Separating these concepts helps to keep the project organized and supports reproducibility.
The project structure is meant to be used with a version control system such as git, but the template is configured so that all data directories are excluded from version control.
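As a rough orientation, the top-level layout could look like the sketch below; the exact names and contents depend on the version of the template, and the data directories are the ones excluded from git.

    my_project/
    ├── raw_data/          # raw acquisition data (not under version control)
    ├── processed_data/    # outputs of scripts in source/ (not under version control)
    ├── results/           # figures, tables, reports (not under version control)
    ├── sandbox/           # exploratory scripts, not used for final outputs
    ├── source/            # documented, configurable processing scripts
    ├── runs/              # per-dataset config files, committed after each run
    └── docs/              # mkdocs-material documentation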
Data
The raw_data directory is intended to hold the raw data as saved by the acquisition systems. The processed_data directory stores outputs generated by scripts in the source directory, for example segmentation masks or tables of extracted measurements. The results directory usually contains a condensed version of processed_data, such as figures, tables, and final reports.
Note
If you are working on a unix-based system, we recommend creating symbolic links to your raw data in the raw_data directory. This way you can keep the original data in its original location and still have it accessible in your project.
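On Linux or macOS this can be done with ln -s; the source path below is only a placeholder for wherever your acquisition system stores the data:

    ln -s /path/to/acquisition/experiment_01 raw_data/experiment_01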
Workflow Development
When developing a new image processing or analysis step you most likely need to test things out first. The sandbox directory is meant for this purpose. It is a place to test out new ideas and develop new processing steps. Anything inside the sandbox is not considered final and is not used to generate processed_data or results.
Once a processing step is stable and you want to use it to generate reproducible outputs, you move it to the source directory. The source directory contains all scripts that are used to generate processed_data and results. Moving a script from sandbox to source also means that you should start documenting the script and making it configurable. While you can hard-code paths and parameters in sandbox, you should use configuration files in source to facilitate reusing the same script on different data.
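In practice this often means that the script reads its paths and parameters from a config file passed on the command line. A hypothetical invocation, assuming a Python script and a config file with the names used here, could look like this:

    python source/segment_cells.py --config runs/experiment_01/segment_cells.yaml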
Applying a Workflow
With a versioned processing script ready in source, it is time to apply it to your data. To keep track of the configuration files and the outputs generated by the script, we use the runs directory, in which you create a new sub-directory for each dataset. Inside this sub-directory the config files for the scripts are stored and executed from. The outputs are stored in a sub-directory of processed_data or results with the same name as the run directory.
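As an illustration (all names are hypothetical), a run directory and the matching output location could be laid out like this:

    runs/
    └── experiment_01/
        └── segment_cells.yaml      # config the script was executed with
    processed_data/
    └── experiment_01/
        └── ...                     # outputs written by the script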
Info
The runs directory is included in version control, but it is up to you to commit the config files after each run. This way you can keep track of which config file was used with which version of the code to produce a given output.
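A minimal way to do this, reusing the hypothetical run directory from above, is to commit the config right after the run finishes:

    git add runs/experiment_01/
    git commit -m "Run segment_cells on experiment_01"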
Documentation
In docs, a mkdocs-material project is set up to render a website with your documentation. The documentation is meant to describe the project and the processing steps in detail, in particular how to execute them.
Info
To render the documentation locally you can run pixi run show_docs. This will start a local server and you can view the documentation in your browser. To render a static version, use the command pixi run build_docs, and your documentation will be available in the root_dir/site directory.