Core Concept:
Stage2 enables you to package complex tasks into reproducible, configurable steps within a larger automated workflow. Each operation encapsulates a tool and its associated logic within a self-contained Docker image. When an operation runs, Stage2 provides structured JSON as a command-line argument, defining input and output paths, as well as parameters that dictate how the operation is executed.
1. JSON-Driven Interfaces:
Each Stage2 operation defines parameters, inputs, and outputs through a JSON-based interface. This provides a flexible yet standardized way to configure tools:
- Parameters (e.g., boolean flags, numeric ranges, arrays, strings) let you adjust how the operation runs without modifying code.
- Inputs/Outputs map files or datasets into and out of the container, ensuring that each node in the workflow can consume and produce the right data for subsequent steps.
When an operation is executed, Stage2 passes a JSON object like the following to the main method of stage2_interface.py:
stage2_inputs={
    "context": {...},
    "parameters": {
        "Automatic ROI": true,
        "Contrast Factor": 1.4,
        "Normalization Range": [-1, 1],
        "Region of Interest": [2, 2, 1, 1]
    },
    "inputs": {
        "Raw Image File": ["<path_to_input_file>"]
    },
    "outputs": "/stage2/output/data"
}
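As a minimal sketch of how a script might read this structure, the snippet below round-trips a payload shaped like the one above through the standard json module. The empty "context" object and the input path /tmp/example.png are placeholders chosen for illustration, not values Stage2 is known to send.

```python
import json

# Payload mirroring the structure shown above. The empty "context" and the
# input path are hypothetical placeholders.
raw = json.dumps({
    "context": {},
    "parameters": {
        "Automatic ROI": True,
        "Contrast Factor": 1.4,
        "Normalization Range": [-1, 1],
        "Region of Interest": [2, 2, 1, 1],
    },
    "inputs": {"Raw Image File": ["/tmp/example.png"]},
    "outputs": "/stage2/output/data",
})

# Decode the JSON string and pull out the pieces an operation typically needs.
payload = json.loads(raw)
parameters = payload["parameters"]
first_input = payload["inputs"]["Raw Image File"][0]
print(parameters["Contrast Factor"], first_input, payload["outputs"])
```

Note that each input name maps to a list of paths, so even a single-file input is read with an index.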
2. stage2_interface.py as the Operation Bridge:
Rather than running your tool directly, Stage2 calls an interface script, stage2_interface.py, inside the container:
- Orchestration Logic: This script parses the JSON parameters passed in by Stage2, validates them, and then executes the chosen tool (e.g., ImageMagick, a Python script, or another CLI) with those parameters.
- Central Control Point: By serving as the container’s entrypoint, stage2_interface.py manages error handling, logging, and output formatting in a consistent manner.
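The parse, validate, execute sequence described above can be sketched as a single function. This is an illustration under assumptions rather than the actual interface script: build_command and its output_file argument are hypothetical names, and the function only returns the argument list for the tool instead of running it.

```python
import json


def build_command(raw_json, output_file):
    """Parse the payload, validate it, and return the tool's argument list.

    Hypothetical helper: a real interface script would pass the returned
    list to subprocess.run instead of just returning it.
    """
    payload = json.loads(raw_json)
    params = payload.get("parameters") or {}
    inputs = payload.get("inputs") or {}

    # Each input name maps to a list of file paths.
    files = inputs.get("Raw Image File") or []
    if not files:
        raise ValueError("no 'Raw Image File' input provided")

    # bool is a subclass of int in Python, so it is rejected explicitly.
    factor = params.get("Contrast Factor")
    if isinstance(factor, bool) or not isinstance(factor, (int, float)):
        raise ValueError("'Contrast Factor' must be a number")

    return ["convert", files[0], "-contrast-stretch", f"{factor}%", output_file]
```

Keeping command construction separate from execution makes the validation logic testable without ImageMagick installed.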
3. Containerization with Docker:
Stage2 leverages Docker to create and run isolated environments:
- Stable Environments: Your chosen tools, libraries, and dependencies are baked into the Docker image. Each run is predictable, regardless of the underlying infrastructure.
- Flexible Scaling: By defining everything in a Dockerfile, you can easily scale operations, share images across teams, or run them in different environments (dev, staging, production) without drifting configurations.
4. Parameterization and Reusability:
The JSON-based interface and Dockerized environment mean that the same operation can be reused across multiple workflows just by changing parameter values. For example:
- Dynamic Inputs: You might feed in different files or directories without changing code.
- Adjustable Behavior: Tweak parameters (e.g., toggling an ROI or adjusting normalization ranges) to handle varied use cases or tune performance on the fly.
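To illustrate reuse through parameter changes alone, the sketch below derives two payload variants from one base configuration. The helper with_overrides and the file paths are hypothetical and exist only for this example.

```python
import copy


def with_overrides(base_payload, parameter_overrides, input_files):
    """Return a new payload with some parameters and the input files replaced.

    Hypothetical helper: the base payload is deep-copied so it stays unchanged.
    """
    payload = copy.deepcopy(base_payload)
    payload["parameters"].update(parameter_overrides)
    payload["inputs"]["Raw Image File"] = list(input_files)
    return payload


base = {
    "parameters": {
        "Automatic ROI": True,
        "Contrast Factor": 1.4,
        "Normalization Range": [-1, 1],
    },
    "inputs": {"Raw Image File": ["/data/a.png"]},  # placeholder path
    "outputs": "/stage2/output/data",
}

# Same operation, different behavior: manual ROI on a different file.
manual_roi = with_overrides(
    base,
    {"Automatic ROI": False, "Region of Interest": [0, 0, 64, 64]},
    ["/data/b.png"],
)
# Same operation, only the normalization range changes.
unit_range = with_overrides(base, {"Normalization Range": [0, 1]}, ["/data/a.png"])
```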
5. Workflow Integration and Chaining:
In Stage2, operations are often chained together to form complex workflows. The output of one operation’s container feeds directly into the next, all governed by a consistent JSON schema. This design:
- Reduces Complexity: Instead of managing multiple scripts and their dependencies manually, each operation is a discrete, managed step.
- Enhances Maintainability: Operations can be versioned, tested in isolation, and swapped out without affecting upstream or downstream steps.
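Stage2 handles the wiring between operations itself, so the following is only a conceptual sketch of what chaining means: each step's output path becomes the next step's input. The step dictionaries, their keys (input_name, output_path), and the paths are assumptions made for this illustration.

```python
def build_chain(steps, initial_files):
    """Build one payload per step, feeding each output into the next input.

    Hypothetical illustration; it is not how Stage2 represents workflows.
    """
    payloads = []
    current_files = list(initial_files)
    for step in steps:
        payloads.append({
            "parameters": step["parameters"],
            "inputs": {step["input_name"]: current_files},
            "outputs": step["output_path"],
        })
        # The next step consumes what this step produced.
        current_files = [step["output_path"]]
    return payloads


steps = [
    {"parameters": {"Contrast Factor": 1.4}, "input_name": "Raw Image File",
     "output_path": "/work/preprocessed.png"},
    {"parameters": {}, "input_name": "Preprocessed Image",
     "output_path": "/work/result.png"},
]
chain = build_chain(steps, ["/data/raw.png"])
```

Because every step sees the same payload shape, a step can be replaced without touching its neighbors as long as the input and output names still line up.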
6. Logging and Debugging:
Because all operations run inside containers and are triggered with structured JSON, logs and results can be centrally collected and inspected within Stage2:
- Unified Logging: Errors, warnings, and process outputs are surfaced directly in the Stage2 UI.
- Easy Diagnostics: If something fails, you know exactly which container, script, and parameter set caused the issue, which speeds up troubleshooting and iteration.
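One way an operation could emit consistent log lines is Python's standard logging module writing to standard output. The sketch assumes Stage2 collects the container's stdout, which the text above implies but does not spell out; make_logger and the logger name are hypothetical.

```python
import logging
import sys


def make_logger(name="stage2_operation"):
    """Return a logger that writes timestamped lines to stdout.

    Hypothetical helper; the handler is attached only once per logger name.
    """
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


logger = make_logger()
logger.info("Starting preprocessing")
logger.warning("Could not delete intermediate file")
```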
In Summary:
By defining interfaces in JSON, implementing a stage2_interface.py script as the control layer, and encapsulating everything in a Docker image, Stage2 provides a robust framework for creating modular, reproducible operations. You can flexibly tune parameters, easily integrate various tools, and build scalable pipelines that remain predictable and maintainable over time.
Relevant Files
Tool Interaction Script (preprocess_image.py):
import os
import shutil
import subprocess
import sys


def run_command(command):
    """Run an external command given as an argument list; raise RuntimeError on failure."""
    try:
        subprocess.run(command, check=True)
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Error executing command: {' '.join(command)}\n{e}") from e


def convert_to_grayscale(input_path, output_path):
    command = ["convert", input_path, "-colorspace", "Gray", output_path]
    run_command(command)
    print(f"Converted to grayscale: {output_path}")


def crop_image(input_path, output_path, roi, auto_roi=False):
    if auto_roi:
        # ImageMagick's 'trim' removes uniform borders, which can be a form of automatic cropping
        command = ["convert", input_path, "-trim", "+repage", output_path]
        run_command(command)
        print(f"Automatically cropped image: {output_path}")
    else:
        # roi is [x1, y1, x2, y2]; ImageMagick expects WIDTHxHEIGHT+X+Y
        x1, y1, x2, y2 = roi
        width = x2 - x1
        height = y2 - y1
        crop_geometry = f"{width}x{height}+{x1}+{y1}"
        command = [
            "convert",
            input_path,
            "-crop",
            crop_geometry,
            "+repage",
            output_path,
        ]
        run_command(command)
        print(f"Cropped image with ROI {roi}: {output_path}")


def adjust_contrast(input_path, output_path, contrast_factor):
    # contrast_factor is passed to -contrast-stretch as a percentage
    command = [
        "convert",
        input_path,
        "-contrast-stretch",
        f"{contrast_factor}%",
        output_path,
    ]
    run_command(command)
    print(f"Adjusted contrast by {contrast_factor}%: {output_path}")


def normalize_pixels(input_path, output_path, normalization_range):
    """Normalizes pixel values to the specified range."""
    min_val, max_val = normalization_range
    if min_val == 0 and max_val == 1:
        # Scale pixel values to [0,1]
        command = ["convert", input_path, "-normalize", output_path]
        run_command(command)
        print(f"Normalized pixel values to [{min_val}, {max_val}]: {output_path}")
    elif min_val == -1 and max_val == 1:
        # Placeholder: this branch runs the same -normalize command as [0,1];
        # it does not yet rescale pixel values to [-1,1]
        command = ["convert", input_path, "-normalize", output_path]
        run_command(command)
        print(
            f"Normalized pixel values to [{min_val}, {max_val}] (Placeholder): {output_path}"
        )
    else:
        print(f"Unsupported normalization range: [{min_val}, {max_val}]")
        sys.exit(1)


# -------------------------------
# Main Processing Workflow
# -------------------------------
def preprocess_image(
    input_path, output_path, auto_roi, roi, contrast_factor, normalization_range
):
    # Step 1: Convert to Grayscale
    grayscale_path = "grayscale_" + os.path.basename(input_path)
    convert_to_grayscale(input_path, grayscale_path)

    # Step 2: Crop Image
    cropped_path = "cropped_" + os.path.basename(grayscale_path)
    crop_image(grayscale_path, cropped_path, roi, auto_roi)

    # Step 3: Adjust Contrast
    contrast_path = "contrast_" + os.path.basename(cropped_path)
    adjust_contrast(cropped_path, contrast_path, contrast_factor)

    # Step 4: Normalize Pixel Values
    normalize_path = "normalized_" + os.path.basename(contrast_path)
    normalize_pixels(contrast_path, normalize_path, normalization_range)

    # Step 5: Copy the final image to output_path
    shutil.copyfile(normalize_path, output_path)
    print(f"Final processed image saved to {output_path}")

    # Step 6 (optional): Clean up intermediate files
    intermediate_files = [grayscale_path, cropped_path, contrast_path, normalize_path]
    for file in intermediate_files:
        try:
            os.remove(file)
            print(f"Deleted intermediate file: {file}")
        except OSError as e:
            print(f"Warning: Could not delete intermediate file {file}: {e}")
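As a worked example of the manual-ROI branch in crop_image, the snippet below isolates the arithmetic that turns an [x1, y1, x2, y2] region into the WIDTHxHEIGHT+X+Y geometry string passed to -crop. The standalone function name crop_geometry is introduced here only for illustration.

```python
def crop_geometry(roi):
    """Convert [x1, y1, x2, y2] corner coordinates into a WxH+X+Y string."""
    x1, y1, x2, y2 = roi
    width = x2 - x1
    height = y2 - y1
    return f"{width}x{height}+{x1}+{y1}"


# A region from (10, 20) to (110, 220) is 100 pixels wide and 200 pixels tall.
print(crop_geometry([10, 20, 110, 220]))  # 100x200+10+20
```

The formula only produces a valid geometry when x2 > x1 and y2 > y1, which is why the interface script checks those conditions before calling preprocess_image.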
stage2_interface.py
import argparse
import json
import os
import sys

# Import the preprocess_image function from preprocess_image.py
from preprocess_image import preprocess_image


def main():
    # Parse command-line arguments
    parser = argparse.ArgumentParser(
        prog="ImageMagick Stage2 Interface",
        description="Interface to preprocess images using ImageMagick within Stage2.",
    )
    parser.add_argument(
        "stage2_input",
        help="JSON string with inputs, parameters, and output path.",
    )
    args = parser.parse_args()
    print("\n\n args:", args, "\n\n")

    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Parse the JSON input
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    try:
        stage2_arguments = json.loads(args.stage2_input)
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON input: {e}")
        sys.exit(1)

    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Extract inputs, parameters, and output
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    inputs = stage2_arguments.get("inputs") or {}
    parameters = stage2_arguments.get("parameters") or {}
    # The sample payload uses the key "outputs"; the singular "output" is kept
    # as a fallback because the original script read that key
    output_path = stage2_arguments.get("outputs") or stage2_arguments.get("output")
    if not output_path:
        print("No output path provided.")
        sys.exit(1)

    # array of all inputs to the input named "Raw Image File"
    input_files = inputs.get("Raw Image File", [])
    print(input_files)
    # in our case the Raw Image File input only has a multiplicity of 1:1, so the array only has 1 element
    if not input_files or not input_files[0]:
        print("No 'Raw Image File' provided.")
        sys.exit(1)
    input_file = input_files[0]
    if not os.path.exists(input_file):
        print(f"Input file does not exist: {input_file}")
        sys.exit(1)

    # Automatic ROI Validation
    auto_roi = parameters.get("Automatic ROI")
    if auto_roi is None:
        print("'Automatic ROI' parameter is missing.")
        sys.exit(1)
    if not isinstance(auto_roi, bool):
        print("'Automatic ROI' must be a boolean value.")
        sys.exit(1)

    # Region of Interest Validation (only required when Automatic ROI is off)
    roi = parameters.get("Region of Interest")
    if not auto_roi:
        if roi is None:
            print("Automatic ROI is disabled, but no 'Region of Interest' provided.")
            sys.exit(1)
        if not isinstance(roi, list) or len(roi) != 4:
            print(
                "Invalid 'Region of Interest'. It must be a list of four integers: [x1, y1, x2, y2]."
            )
            sys.exit(1)
        if any(not isinstance(coord, int) or coord < 0 for coord in roi):
            print("ROI coordinates must be non-negative integers.")
            sys.exit(1)
        x1, y1, x2, y2 = roi
        if x2 <= x1 or y2 <= y1:
            print("ROI coordinates must satisfy x2 > x1 and y2 > y1.")
            sys.exit(1)

    # Contrast Factor Validation
    contrast_factor = parameters.get("Contrast Factor")
    if contrast_factor is None:
        print("'Contrast Factor' parameter is missing.")
        sys.exit(1)
    if not isinstance(contrast_factor, (int, float)) or contrast_factor < 0.1:
        print(
            "'Contrast Factor' must be a positive number with a minimum value of 0.1."
        )
        sys.exit(1)

    # Normalization Range Validation
    normalization_range = parameters.get("Normalization Range")
    if normalization_range is None:
        print("'Normalization Range' parameter is missing.")
        sys.exit(1)
    if (
        not isinstance(normalization_range, list)
        or len(normalization_range) != 2
        or not all(isinstance(n, (int, float)) for n in normalization_range)
    ):
        print("'Normalization Range' must be a list of two numbers: [min, max].")
        sys.exit(1)
    min_val, max_val = normalization_range
    if min_val >= max_val:
        print("'Normalization Range' must have min < max.")
        sys.exit(1)
    if normalization_range not in ([0, 1], [-1, 1]):
        print("'Normalization Range' must be either [0, 1] or [-1, 1].")
        sys.exit(1)

    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Invoke the preprocess_image function
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    try:
        preprocess_image(
            input_path=input_file,
            output_path=output_path,
            auto_roi=auto_roi,
            roi=roi if not auto_roi else None,
            contrast_factor=contrast_factor,
            normalization_range=normalization_range,
        )
    except Exception as e:
        print(f"Error during image preprocessing: {e}")
        sys.exit(1)

    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Final output is saved to output_path in the preprocess_image function
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    print(f"Image preprocessing completed successfully. Output saved to: {output_path}")


if __name__ == "__main__":
    main()
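One subtlety in the numeric checks of the interface script: in Python, bool is a subclass of int, so a JSON true passes an isinstance(value, (int, float)) test. If that matters for your operation, a stricter check could look like the sketch below; is_number is a hypothetical helper and not part of the script.

```python
def is_number(value):
    """Return True for int or float values, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


print(isinstance(True, int))  # True, because bool subclasses int
print(is_number(True))        # False
print(is_number(1.4))         # True
```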
Example Dockerfile
# Use an official Python runtime as the base image
FROM python:3.9-slim
# Set environment variables to ensure Python outputs are logged in real-time
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
# Create and set the working directory inside the container
WORKDIR /app
# Install system dependencies required by your tool
# Replace the example package below (imagemagick) with those needed for your specific tool
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        imagemagick \
    && rm -rf /var/lib/apt/lists/*
# Copy the Python dependencies file into the container
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of your application code into the container
COPY . .
# Define the entrypoint to execute your interface script
# This ensures that when the container starts, it runs `stage2_interface.py`
ENTRYPOINT ["python", "stage2_interface.py"]
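For a local smoke test outside Stage2, the image could be run by hand with the JSON payload as its only argument, since arguments after the image name are appended to the ENTRYPOINT. The sketch below only assembles and prints such a command. The image tag stage2-imagemagick-example, the mounted directories, and the file paths are hypothetical, and the way Stage2 itself mounts data into the container may differ.

```python
import json
import shlex


def docker_run_command(image, payload, host_dir, container_dir):
    """Build a `docker run` argument list that passes the payload as one argument."""
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{container_dir}",  # bind-mount test data into the container
        image,
        json.dumps(payload),  # appended to ENTRYPOINT ["python", "stage2_interface.py"]
    ]


payload = {
    "parameters": {
        "Automatic ROI": True,
        "Contrast Factor": 1.4,
        "Normalization Range": [0, 1],
    },
    "inputs": {"Raw Image File": ["/data/raw.png"]},  # placeholder path
    "outputs": "/data/processed.png",                 # placeholder path
}
cmd = docker_run_command("stage2-imagemagick-example", payload, "/tmp/data", "/data")

# shlex.join quotes the JSON so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

The image would first need to be built from the Dockerfile above, for example with docker build -t and the same tag.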