Containerizing the Tool
To integrate a tool with Stage2, we need to containerize it using Docker. This process involves several components working together to ensure that the tool can interact seamlessly with Stage2.
Overview of the Containerization Process
- Build the Scripts: Start by creating and testing scripts locally that perform the required tasks with your selected tool. This ensures the logic and functionality are sound before containerizing.
- Create the stage2_interface.py: This Python script acts as the entrypoint for the container. It receives inputs, parameters, and outputs from Stage2 and orchestrates the execution of the tool.
- Write a Dockerfile: Use Docker to define the container, specifying the base image, dependencies, and stage2_interface.py as the entrypoint.
Step 1: Building the Scripts
The first step is to write scripts that perform the actual tasks using the selected tool. For our example, we need a script that preprocesses images using ImageMagick.
Example Script
For this operation, the script will:
- Convert images to grayscale.
- Crop images (manually or automatically based on parameters).
- Adjust contrast.
- Normalize pixel values.
Start by testing these scripts locally to ensure they handle edge cases and parameter inputs effectively.
The example below is a local script that uses ImageMagick to complete our desired task.
- Notice that the parameters we plan to source from our operation interface are simply hard-coded at the top of the file.
- Also notice that the main() method is very minimal; this is not strictly required, but it will simplify usage within stage2_interface.py.
Ultimately, this local script's goal is strictly to ensure our ImageMagick implementation works before moving on to Dockerizing and connecting to Stage2.
Local Script Example
import subprocess
import os
import shutil
import sys

# -------------------------------
# Hardcoded Parameters
# -------------------------------

# Paths
INPUT_PATH = "wild-beast.jpg"  # Path to the raw input image
OUTPUT_PATH = "output.jpg"  # Path to save the processed image

# ROI Configuration
AUTO_ROI = True  # Set to True to enable automatic ROI cropping
ROI_COORDINATES = [100, 100, 400, 400]  # [x1, y1, x2, y2] for manual cropping

# Image Processing Parameters
CONTRAST_FACTOR = 10  # Percentage to adjust contrast (e.g., 10 means 10%)
NORMALIZATION_RANGE = [0, 1]  # [min, max] normalization range (e.g., [0, 1] or [-1, 1])

# -------------------------------
# Helper Functions
# -------------------------------
def run_command(command):
    try:
        subprocess.run(command, check=True)
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Error executing command: {' '.join(command)}\n{e}")

def convert_to_grayscale(input_path, output_path):
    command = ["convert", input_path, "-colorspace", "Gray", output_path]
    run_command(command)
    print(f"Converted to grayscale: {output_path}")

def crop_image(input_path, output_path, roi, auto_roi=False):
    if auto_roi:
        # ImageMagick's 'trim' removes uniform borders, which can be a form of automatic cropping
        command = ["convert", input_path, "-trim", "+repage", output_path]
        run_command(command)
        print(f"Automatically cropped image: {output_path}")
    else:
        x1, y1, x2, y2 = roi
        width = x2 - x1
        height = y2 - y1
        crop_geometry = f"{width}x{height}+{x1}+{y1}"
        command = [
            "convert",
            input_path,
            "-crop",
            crop_geometry,
            "+repage",
            output_path,
        ]
        run_command(command)
        print(f"Cropped image with ROI {roi}: {output_path}")

def adjust_contrast(input_path, output_path, contrast_factor):
    # contrast_factor is used as a percentage
    command = [
        "convert",
        input_path,
        "-contrast-stretch",
        f"{contrast_factor}%",
        output_path,
    ]
    run_command(command)
    print(f"Adjusted contrast by {contrast_factor}%: {output_path}")

def normalize_pixels(input_path, output_path, normalization_range):
    """Normalizes pixel values to the specified range."""
    min_val, max_val = normalization_range
    if min_val == 0 and max_val == 1:
        # Scale pixel values to [0, 1]
        command = ["convert", input_path, "-normalize", output_path]
        run_command(command)
        print(f"Normalized pixel values to [{min_val}, {max_val}]: {output_path}")
    elif min_val == -1 and max_val == 1:
        # Placeholder: currently behaves the same as [0, 1], since standard
        # integer image formats cannot store negative pixel values
        command = ["convert", input_path, "-normalize", output_path]
        run_command(command)
        print(
            f"Normalized pixel values to [{min_val}, {max_val}] (Placeholder): {output_path}"
        )
    else:
        print(f"Unsupported normalization range: [{min_val}, {max_val}]")
        sys.exit(1)

# -------------------------------
# Main Processing Workflow
# -------------------------------
def preprocess_image(
    input_path, output_path, auto_roi, roi, contrast_factor, normalization_range
):
    # Step 1: Convert to Grayscale
    grayscale_path = "grayscale_" + os.path.basename(input_path)
    convert_to_grayscale(input_path, grayscale_path)

    # Step 2: Crop Image
    cropped_path = "cropped_" + os.path.basename(grayscale_path)
    crop_image(grayscale_path, cropped_path, roi, auto_roi)

    # Step 3: Adjust Contrast
    contrast_path = "contrast_" + os.path.basename(cropped_path)
    adjust_contrast(cropped_path, contrast_path, contrast_factor)

    # Step 4: Normalize Pixel Values
    normalize_path = "normalized_" + os.path.basename(contrast_path)
    normalize_pixels(contrast_path, normalize_path, normalization_range)

    # Step 5: Copy the final image to output_path
    shutil.copyfile(normalize_path, output_path)
    print(f"Final processed image saved to {output_path}")

    # Step 6 (optional): Clean up intermediate files
    intermediate_files = [grayscale_path, cropped_path, contrast_path, normalize_path]
    for file in intermediate_files:
        try:
            os.remove(file)
            print(f"Deleted intermediate file: {file}")
        except OSError as e:
            print(f"Warning: Could not delete intermediate file {file}: {e}")

def main():
    preprocess_image(
        INPUT_PATH,
        OUTPUT_PATH,
        AUTO_ROI,
        ROI_COORDINATES,
        CONTRAST_FACTOR,
        NORMALIZATION_RANGE,
    )

if __name__ == "__main__":
    main()
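A quick note on the [-1, 1] branch above: it is deliberately a placeholder, because standard integer image formats cannot store negative pixel values. If you do need a true [-1, 1] result, one possible approach (an untested sketch, assuming an HDRI-enabled ImageMagick build and a float-capable output format such as 32-bit TIFF) is to remap the normalized values with -function Polynomial:

import subprocess

def normalize_to_signed_range(input_path, output_path):
    # Sketch only: -normalize rescales to [0, 1], then the polynomial
    # 2v - 1 maps that to [-1, 1]. Without an HDRI build and a float
    # output format (e.g., output.tif), negative values clamp to 0.
    command = [
        "convert", input_path,
        "-normalize",
        "-function", "Polynomial", "2,-1",
        "-define", "quantum:format=floating-point",
        "-depth", "32",
        output_path,
    ]
    subprocess.run(command, check=True)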
TIP:
If you have taken the time to define your interface, choose your parameters, and select your tool, LLMs are a great resource for building these scripts quickly. That in mind, if your operation is extremely complex or interfaces with a lesser-known tool, the result may be more of a starting point than a full solution.
Step 2: Creating the stage2_interface.py
The stage2_interface.py serves as the bridge between Stage2 and the tool. It is responsible for:
- Parsing JSON inputs (operation interface) from Stage2.
- Validating inputs and parameters.
- Executing the scripts with the specified tool.
- Saving outputs to output.
Key Elements of the stage2_interface.py:
- Argument Parsing: Uses argparse to parse the JSON object passed from Stage2.
  - arguments.inputs contains the input(s) data
  - arguments.parameters contains the parameter object
- Validation and Error Handling: Ensures that all required inputs and parameters are present and valid.
  - This step is optional, but if the final operation fails, logs and errors from stage2_interface.py will be accessible in the Stage2 web client. These logs are especially valuable for workflows with multiple operations, helping diagnose errors. See Logging and Node Timeline Chart for more info.
- Execution: Runs the scripts with the appropriate inputs and parameters.
- Store Results: Output should always be saved to output with no file extension.
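Taken together, these responsibilities give the script the following rough shape (a trimmed sketch; the complete, validated version appears later in this section):

import argparse
import json

# Trimmed sketch of the interface script's overall flow.
parser = argparse.ArgumentParser(prog="Stage2 Interface")
parser.add_argument("stage2_input", help="JSON string passed by Stage2")
args = parser.parse_args()

arguments = json.loads(args.stage2_input)  # 1. parse the JSON payload
inputs = arguments["inputs"]               # 2. validate inputs/parameters
parameters = arguments["parameters"]
output_path = arguments["output"]
# 3. execute the tool, then 4. save the final result to output_path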
JSON input from Stage2
When the operation executes, Stage2 is going to pass a JSON object as a command line argument to this script. This JSON object is a condensed derivative of the operation interface we defined in Designing an Operation Interface. It looks like the following:
stage2_inputs = {
    "context": {...},
    "parameters": {
        "Automatic ROI": true,
        "Contrast Factor": 1.4,
        "Normalization Range": [-1, 1],
        "Region of Interest": [2, 2, 1, 1]
    },
    "inputs": {
        "Raw Image File": ["<path_to_input_file>"]
    },
    "output": "/stage2/output/data"
}
You can see how this JSON object is parsed and passed along to our custom script in the example of stage2_interface.py below.
Note on the inputs section
In our example case, the inputs section is very basic since we have one input (Raw Image File). Furthermore, we determined it should only accept one file, so the array will always have exactly one value in it.
Let's say we instead wanted two inputs, each accepting any number of files. This section of the JSON would look like this:
...
"inputs": {
    "Raw Image File": ["<file_1_path>", "<file_2_path>", "<file_3_path>", ...],
    "Other Input": ["<other_path_1>", "<other_path_2>", ...]
}
...
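If you go that route, the interface script simply iterates over each array rather than taking element [0]. A minimal sketch (the helper name here is ours for illustration, not part of Stage2):

import os
import sys

def collect_input_files(stage2_arguments, input_name):
    # Hypothetical helper: return every file supplied to a named input,
    # validating that each path exists before processing begins.
    files = stage2_arguments.get("inputs", {}).get(input_name, [])
    if not files:
        print(f"No files provided for input '{input_name}'.")
        sys.exit(1)
    for path in files:
        if not os.path.exists(path):
            print(f"Input file does not exist: {path}")
            sys.exit(1)
    return files

# Usage: files = collect_input_files(stage2_arguments, "Raw Image File")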
stage2_interface.py
import json
import argparse
import os
import sys

# Import the preprocess_image function from the local script above
# (saved here as preprocess_image_doc.py)
from preprocess_image_doc import preprocess_image

def main():
    # Parse command-line arguments
    parser = argparse.ArgumentParser(
        prog="ImageMagick Stage2 Interface",
        description="Interface to preprocess images using ImageMagick within Stage2.",
    )
    parser.add_argument(
        "stage2_input",
        help="JSON string with inputs, parameters, and output path.",
    )
    args = parser.parse_args()
    print("\n\n args:", args, "\n\n")

    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Parse the JSON input
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    try:
        stage2_arguments = json.loads(args.stage2_input)
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON input: {e}")
        sys.exit(1)

    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Extract inputs, parameters, and output
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    inputs = stage2_arguments.get("inputs")
    parameters = stage2_arguments.get("parameters")
    output_path = stage2_arguments.get("output")

    # Array of all inputs to the input named "Raw Image File"
    input_files = inputs.get("Raw Image File", [])
    print(input_files)

    if not input_files:
        print("No 'Raw Image File' provided.")
        sys.exit(1)

    # In our case the Raw Image File input has a multiplicity of 1:1, so the array only has 1 element
    input_file = input_files[0]

    if not os.path.exists(input_file):
        print(f"Input file does not exist: {input_file}")
        sys.exit(1)

    # Automatic ROI Validation
    auto_roi = parameters.get("Automatic ROI")
    if auto_roi is None:
        print("'Automatic ROI' parameter is missing.")
        sys.exit(1)
    if not isinstance(auto_roi, bool):
        print("'Automatic ROI' must be a boolean value.")
        sys.exit(1)

    # Region of Interest Validation
    roi = parameters.get("Region of Interest")
    if not auto_roi:
        if roi is None:
            print("Automatic ROI is disabled, but no 'Region of Interest' provided.")
            sys.exit(1)
        if not isinstance(roi, list) or len(roi) != 4:
            print(
                "Invalid 'Region of Interest'. It must be a list of four integers: [x1, y1, x2, y2]."
            )
            sys.exit(1)
        if any(not isinstance(coord, int) or coord < 0 for coord in roi):
            print("ROI coordinates must be non-negative integers.")
            sys.exit(1)
        x1, y1, x2, y2 = roi
        if x2 <= x1 or y2 <= y1:
            print("ROI coordinates must satisfy x2 > x1 and y2 > y1.")
            sys.exit(1)

    # Contrast Factor Validation
    contrast_factor = parameters.get("Contrast Factor")
    if contrast_factor is None:
        print("'Contrast Factor' parameter is missing.")
        sys.exit(1)
    if not isinstance(contrast_factor, (int, float)) or contrast_factor < 0.1:
        print(
            "'Contrast Factor' must be a positive number with a minimum value of 0.1."
        )
        sys.exit(1)

    # Normalization Range Validation
    normalization_range = parameters.get("Normalization Range")
    if normalization_range is None:
        print("'Normalization Range' parameter is missing.")
        sys.exit(1)
    if (
        not isinstance(normalization_range, list)
        or len(normalization_range) != 2
        or not all(isinstance(n, (int, float)) for n in normalization_range)
    ):
        print("'Normalization Range' must be a list of two numbers: [min, max].")
        sys.exit(1)
    min_val, max_val = normalization_range
    if min_val >= max_val:
        print("'Normalization Range' must have min < max.")
        sys.exit(1)
    if normalization_range not in ([0, 1], [-1, 1]):
        print("'Normalization Range' must be either [0, 1] or [-1, 1].")
        sys.exit(1)

    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Invoke the preprocess_image function
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    try:
        preprocess_image(
            input_path=input_file,
            output_path=output_path,
            auto_roi=auto_roi,
            roi=roi if not auto_roi else None,
            contrast_factor=contrast_factor,
            normalization_range=normalization_range,
        )
    except Exception as e:
        print(f"Error during image preprocessing: {e}")
        sys.exit(1)

    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Final output is saved to output_path in the preprocess_image method
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    print(f"Image preprocessing completed successfully. Output saved to: {output_path}")

if __name__ == "__main__":
    main()
Testing our stage2_interface.py script
The final step before moving on to Docker is to ensure that our stage2_interface.py script is working as expected. This can be done by passing a JSON object, similar to the one Stage2 will pass, as a command-line argument:
# make sure the input file (in my case, wild-beast.jpg) is in the same directory:
# LINUX
python stage2_interface.py '{"inputs": {"Raw Image File": ["wild-beast.jpg"]}, "parameters": {"Automatic ROI": true, "Contrast Factor": 10, "Normalization Range": [0, 1]}, "output": "output"}'
# WINDOWS
python stage2_interface.py "{\"inputs\": {\"Raw Image File\": [\"wild-beast.jpg\"]}, \"parameters\": {\"Automatic ROI\": true, \"Contrast Factor\": 10, \"Normalization Range\": [0, 1]}, \"output\": \"output\"}"
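If the shell escaping becomes a hassle (particularly on Windows), one platform-neutral alternative is a small driver script that builds the payload with json.dumps and invokes the interface directly. This is just a convenience sketch, not part of Stage2:

import json
import subprocess
import sys

# Build the same payload programmatically to sidestep shell quoting.
payload = {
    "inputs": {"Raw Image File": ["wild-beast.jpg"]},
    "parameters": {
        "Automatic ROI": True,
        "Contrast Factor": 10,
        "Normalization Range": [0, 1],
    },
    "output": "output",
}

# Invoke the interface script exactly as Stage2 would: one JSON argument.
subprocess.run(
    [sys.executable, "stage2_interface.py", json.dumps(payload)],
    check=True,
)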
Step 3: Writing the Dockerfile
Containerizing your tool with Docker is the final step in integrating it with Stage2. The Dockerfile serves as a blueprint for building a Docker image that encapsulates your application, ensuring a consistent and isolated environment for execution. This section will guide you through the essential components and considerations for writing an effective Dockerfile tailored to our ImageMagick-based image preprocessing tool.
Prerequisites: Required Docker Knowledge
Before proceeding, it's beneficial to have a foundational understanding of Docker concepts and commands. Familiarity with the following topics will help you navigate this step smoothly:
Docker Basics:
- Images: Read-only templates used to create containers.
- Containers: Running instances of Docker images.
- Dockerfile: A script containing a series of instructions to build a Docker image.
- Build Context: The set of files accessible to the Dockerfile during the build process.
Docker Commands:
- docker build: Builds an image from a Dockerfile.
- docker run: Runs a container from a Docker image.
- docker pull: Downloads an image from a registry.
- docker push: Uploads an image to a registry.
Dockerfile Instructions:
- FROM: Specifies the base image.
- RUN: Executes commands in the shell.
- COPY/ADD: Copies files/directories into the image.
- ENV: Sets environment variables.
- WORKDIR: Sets the working directory.
- ENTRYPOINT/CMD: Defines the default command to run when a container starts.
Overview of the Dockerfile
A Dockerfile orchestrates the following key tasks:
- Selecting a Base Image: Choosing a minimal and appropriate base image that includes the necessary runtime environments and tools.
- Installing System Dependencies: Installing any required system packages or tools your application depends on.
- Setting Up the Application Environment: Configuring environment variables and working directories.
- Copying Application Code: Transferring your application scripts and related files into the Docker image.
- Installing Application Dependencies: Installing required dependencies specific to your application (e.g., Python packages in a requirements.txt).
- Defining the Entrypoint: Specifying the script or command that should run when the container starts.
Each of these components ensures that your tool operates seamlessly within the Dockerized environment, maintaining consistency across different deployments.
Example Context: In our example, we use ImageMagick with Python scripts. However, the principles outlined here apply to any tool or application you wish to containerize.
Dockerfile for Preprocess Image Example
# Use an official Python runtime as the base image
FROM python:3.9-slim

# Set environment variables to ensure Python outputs are logged in real-time
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Create and set the working directory inside the container
WORKDIR /app

# Install system dependencies required by your tool
# Replace the below example packages with those needed for your specific tool
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    # Example system dependencies:
    imagemagick \
    && rm -rf /var/lib/apt/lists/*

# Copy the Python dependencies file into the container
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of your application code into the container
COPY . .

# Define the entrypoint to execute your interface script
# This ensures that when the container starts, it runs `stage2_interface.py`
ENTRYPOINT ["python", "stage2_interface.py"]
Next, we'll break down each layer of this Dockerfile and outline some targeted recommendations to consider when building your own Dockerfile.
Selecting a Base Image
The base image provides the foundational operating system and runtime environment for your application. Selecting the right base image ensures compatibility and efficiency.
Recommendations:
- Official Images: Use official images from Docker Hub that closely match your application's requirements.
  - Example: For Python applications, python:3.9-slim-buster offers a lightweight Debian-based image with Python 3.9 installed.
- Slim or Alpine Variants: Opt for slim or alpine variants to reduce image size.
  - Pros: Smaller footprint, faster downloads.
  - Cons: May require additional dependencies not present in standard images.
Installing System Dependencies
System dependencies are essential packages and tools that your application requires to run correctly. These may include libraries, binaries, or other utilities that are not part of the base image.
Recommendations:
- Minimal Installation: Install only the necessary packages to keep the image lightweight.
- Use apt-get Efficiently: Combine RUN commands where possible to reduce the number of layers, and clean up unnecessary files to minimize image size.
- Pin Versions: Specify exact versions of packages to ensure consistency across builds.
Example Reference:
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    imagemagick \
    && rm -rf /var/lib/apt/lists/*
This command updates the package list, installs ImageMagick without recommended additional packages, and then cleans up the package lists to reduce the image size.
Setting Up the Application Environment
Configuring the environment ensures that your application has the necessary settings and directories to operate correctly within the container.
Key Elements:
- Environment Variables (ENV): Define configuration parameters that your application can access.
- Working Directory (WORKDIR): Sets the directory where subsequent commands are run, ensuring consistency in file paths.
Recommendations:
- Use Environment Variables for Configuration: This allows for easy customization without altering the codebase.
- Consistent Working Directory: Choose a standard directory (e.g., /app) to organize your application files.
# Set environment variables to ensure Python outputs are logged in real-time
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
# Create and set the working directory inside the container
WORKDIR /app
These settings ensure that Python outputs are not buffered and that bytecode files are not written, which can be useful for debugging and maintaining a clean environment.
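As a small illustration of the first recommendation, any script running in the container can read configuration from the environment. The variable name here is purely hypothetical, not something Stage2 defines:

import os

# Read an optional setting from the environment, with a sensible default.
# Could be set in the Dockerfile via: ENV STAGE2_LOG_LEVEL=DEBUG
LOG_LEVEL = os.environ.get("STAGE2_LOG_LEVEL", "INFO")
print(f"Running with log level: {LOG_LEVEL}")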
Copying Application Code
Transferring your application’s source code and related files into the Docker image is crucial for the container to run your application.
Methods:
- COPY: Copies files and directories from the build context into the image.
- ADD: Similar to COPY but with additional features like extracting compressed files.
Recommendations:
- Use COPY Over ADD: Unless you need the extra functionality of ADD, prefer COPY for its simplicity and predictability.
- Leverage .dockerignore: Exclude unnecessary files from the build context to speed up the build process and reduce image size.
# Copy the Python dependencies file into the container
COPY requirements.txt .
# Copy the rest of your application code into the container
COPY . .
First, the requirements.txt is copied to leverage Docker's caching mechanism, ensuring that dependencies are only reinstalled when requirements.txt changes. Then, the rest of the application code is copied into the image.
Note: in our example, we don't use any external dependencies, so requirements.txt is not a necessary file to include in this process; it is just an empty .txt file.
Installing Application Dependencies
After copying your application code, you need to install any dependencies that your application requires to function.
Common Practices:
- Use Dependency Managers: Utilize tools like pip for Python, npm for Node.js, or gem for Ruby to manage and install dependencies.
- Leverage Caching: Structure Dockerfile commands to take advantage of Docker’s layer caching, reducing build times for subsequent builds.
Recommendations:
- Install Dependencies Early: Place dependency installation steps before copying the full application code to maximize cache reuse.
- Avoid Cache Busting: Ensure that commands which can change frequently (like copying source code) come after less frequently changing steps.
# Copy the Python dependencies file into the container
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
By copying and installing requirements.txt before the rest of the code, Docker can cache the installed dependencies unless requirements.txt changes, speeding up future builds.
Again, in our example the pip install step actually does nothing (our requirements.txt is empty), but it is included for reference.
Defining the Entrypoint
The entrypoint specifies the default command that runs when a container starts. In the case of building an operation, this is always going to be the stage2_interface.py script. In general, the entrypoint defines the primary process of the container.
Instructions:
- ENTRYPOINT: Sets the container's executable.
- CMD: Provides default arguments to the entrypoint.
Recommendations:
- Use ENTRYPOINT for Primary Commands: Ensures that the specified command always runs.
- Combine with CMD for Flexibility: Allows users to override or append arguments when running the container.
# Define the entrypoint to execute your interface script
ENTRYPOINT ["python", "stage2_interface.py"]
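Because the exec-form ENTRYPOINT is used, anything appended after the image name in a docker run command is forwarded to stage2_interface.py as command-line arguments; this is exactly how Stage2 delivers its JSON payload. A quick way to confirm this from inside a container is a throwaway sketch like:

import sys

# With ENTRYPOINT ["python", "stage2_interface.py"], the arguments given
# to `docker run your-image-name ...` land in sys.argv[1:], just as they
# do when running the script locally.
print(sys.argv)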
Building and Downloading Your Docker Image
The hard work is done! All that is left is to build our Docker image and embed it into Stage2.
- Make sure your Docker daemon is running.
- Run the following command in the directory where your Dockerfile and stage2_interface.py live:
docker build -t your-image-name .
Optionally, you can verify that your Dockerfile is working as expected by running the Docker image in a similar fashion to how you ran your stage2_interface.py script. This process is a bit more involved and deals with some more advanced Docker commands; it is covered in more detail here: Testing your Docker Image.
To download your image, list your Docker images with:
docker images
Note the IMAGE ID (e.g., 5d833c3dac14) associated with your newly built image and run:
docker save <image id> > your-image-name.tar
Congratulations! You now have the meat and potatoes of your very own custom Stage2 Operation, ready to be embedded into a workflow!