Tutorial 1: Merge Multi-ROI Chromatin Trace Data

This tutorial walks through the standard workflow for merging chromatin tracing data from multiple regions of interest (ROIs):

  1. Collect trace files from all ROIs into one folder

  2. Assess quality by computing Pearson correlations between ROIs

  3. Remove ROIs with poor correlation (outliers)

  4. Re-assess the correlation matrix after removal

  5. Merge the remaining trace files into a single table

  6. Compute statistics on the merged dataset

  7. Next steps: continue with Tutorial 2 for quality control

Step 0: Set up your data and output paths

[2]:
import os
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

Set the path to your data folder, i.e. the folder that contains one subdirectory per ROI:

[3]:
data_path = "/home/devos/Documents/data_to_compare_pdx1/PDX1"

Set the destination folder for the output (here, the data folder itself):

[6]:
dest_path = f"{data_path}/"
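Because dest_path ends with a slash, f-strings such as f"{dest_path}/merged_traces.ecsv" produce a double slash (visible in the trace_stats output of Step 6). This is harmless on Linux and macOS, but if you prefer clean paths, a minimal alternative is to build them with pathlib. The names raw_traces_dir and merged_file are illustrative only.

```python
from pathlib import Path

# Illustrative alternative to the string-based paths of Step 0
data_path = Path("/home/devos/Documents/data_to_compare_pdx1/PDX1")
dest_path = data_path  # outputs live next to the ROI folders, as in this tutorial

# The / operator inserts exactly one separator between path components
raw_traces_dir = dest_path / "raw_traces"
merged_file = dest_path / "merged_traces.ecsv"

print(raw_traces_dir.as_posix())
print(merged_file.as_posix())
```

Since str(dest_path) then has no trailing slash, the f-strings used in the later cells should produce single-slash paths without further changes.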

List the ROI subdirectories detected in the data folder:

[7]:
print(f"Data path: {data_path}")
print(f"Output path: {dest_path}")
print("\nAvailable ROIs:")
for d in sorted(Path(data_path).iterdir()):
    if d.is_dir() and "ROI" in d.name:
        print(f"  {d.name}")
Data path: /home/devos/Documents/data_to_compare_pdx1/PDX1
Output path: /home/devos/Documents/data_to_compare_pdx1/PDX1/

Available ROIs:
  016_ROI
  017_ROI
  018_ROI
  019_ROI
  020_ROI
  021_ROI
  022_ROI
  023_ROI
  024_ROI
  025_ROI
  026_ROI
  027_ROI
  028_ROI
  029_ROI
  030_ROI
  031_ROI

Step 1: Collect trace files from all ROIs

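collect_files scans each subdirectory of --root for a file matching --example-file and copies the matches to --copy-to. --variable-part "18" marks the part of the example filename that changes from one ROI to the next (here, the ROI number). Because matching relies on the fixed length of the example filename, files with a different name length, such as the Matrix files, are excluded automatically.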

[22]:
!collect_files --root {data_path} --example-file "Trace_3D_barcode_mask-mask0_ROI-18_Pdx1_filtered_Pdx1.ecsv" --variable-part "18" --copy-to {dest_path}/raw_traces --force
Matched (16):
  016_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/016_ROI/Trace_3D_barcode_mask-mask0_ROI-16_Pdx1_filtered_Pdx1.ecsv
  017_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/017_ROI/Trace_3D_barcode_mask-mask0_ROI-17_Pdx1_filtered_Pdx1.ecsv
  018_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/018_ROI/Trace_3D_barcode_mask-mask0_ROI-18_Pdx1_filtered_Pdx1.ecsv
  019_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/019_ROI/Trace_3D_barcode_mask-mask0_ROI-19_Pdx1_filtered_Pdx1.ecsv
  020_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/020_ROI/Trace_3D_barcode_mask-mask0_ROI-20_Pdx1_filtered_Pdx1.ecsv
  021_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/021_ROI/Trace_3D_barcode_mask-mask0_ROI-21_Pdx1_filtered_Pdx1.ecsv
  022_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/022_ROI/Trace_3D_barcode_mask-mask0_ROI-22_Pdx1_filtered_Pdx1.ecsv
  023_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/023_ROI/Trace_3D_barcode_mask-mask0_ROI-23_Pdx1_filtered_Pdx1.ecsv
  024_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/024_ROI/Trace_3D_barcode_mask-mask0_ROI-24_Pdx1_filtered_Pdx1.ecsv
  025_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/025_ROI/Trace_3D_barcode_mask-mask0_ROI-25_Pdx1_filtered_Pdx1.ecsv
  026_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/026_ROI/Trace_3D_barcode_mask-mask0_ROI-26_Pdx1_filtered_Pdx1.ecsv
  027_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/027_ROI/Trace_3D_barcode_mask-mask0_ROI-27_Pdx1_filtered_Pdx1.ecsv
  028_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/028_ROI/Trace_3D_barcode_mask-mask0_ROI-28_Pdx1_filtered_Pdx1.ecsv
  029_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/029_ROI/Trace_3D_barcode_mask-mask0_ROI-29_Pdx1_filtered_Pdx1.ecsv
  030_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/030_ROI/Trace_3D_barcode_mask-mask0_ROI-30_Pdx1_filtered_Pdx1.ecsv
  031_ROI -> /home/devos/Documents/data_to_compare_pdx1/PDX1/031_ROI/Trace_3D_barcode_mask-mask0_ROI-31_Pdx1_filtered_Pdx1.ecsv

Copied 16 file(s) to /home/devos/Documents/data_to_compare_pdx1/PDX1/raw_traces
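The matching rule described above can be illustrated with a short sketch. This is a hypothetical re-implementation, not collect_files' actual code: the example filename becomes a regular expression with a literal prefix and suffix and exactly as many digits as the variable part, so a name with a different length or layout (such as a Matrix file) cannot match. The function names build_pattern and find_matches are made up for this example.

```python
import re
from pathlib import Path


def build_pattern(example_file: str, variable_part: str) -> re.Pattern:
    """Literal prefix + same number of digits + literal suffix (hypothetical rule)."""
    start = example_file.index(variable_part)  # first occurrence of the variable part
    end = start + len(variable_part)
    return re.compile(
        re.escape(example_file[:start])
        + rf"\d{{{len(variable_part)}}}"  # e.g. \d{2} for "18"
        + re.escape(example_file[end:])
    )


def find_matches(root, example_file: str, variable_part: str) -> dict[str, Path]:
    """Map each subdirectory of `root` to its single matching file."""
    pattern = build_pattern(example_file, variable_part)
    matches = {}
    for subdir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        hits = [f for f in sorted(subdir.iterdir()) if pattern.fullmatch(f.name)]
        if len(hits) == 1:  # skip folders with no match or an ambiguous match
            matches[subdir.name] = hits[0]
    return matches
```

For instance, find_matches(data_path, "Trace_3D_barcode_mask-mask0_ROI-18_Pdx1_filtered_Pdx1.ecsv", "18") would be expected to list the same 16 ROI folders as the output above, provided the real tool follows the same rule.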

Step 2: Compute Pearson correlations between ROIs

trace_pearsons computes a pairwise distance map for each ROI (the median 3D distance between every pair of barcodes), then calculates the Pearson correlation between every pair of maps.

A high correlation between two ROIs indicates that they share a similar chromatin organization. An ROI with low correlation against all others is likely an outlier (imaging artifact, poor segmentation, etc.).

[15]:
!ls {dest_path}/raw_traces/*.ecsv | trace_pearsons --pipe -O {dest_path}

# Display the correlation matrix
img = mpimg.imread(f"{dest_path}/trace_correlation_matrix.png")
fig, ax = plt.subplots(figsize=(10, 10))
ax.imshow(img)
ax.axis('off')
plt.tight_layout()
plt.show()
Analyzing 16 trace files...
Processing Trace_3D_barcode_mask-mask0_ROI-16_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-17_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-18_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-19_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-20_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-21_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-22_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-23_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-24_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-25_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-26_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-27_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-28_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-29_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-30_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-31_Pdx1_filtered_Pdx1.ecsv
$ Saved correlation matrix as /home/devos/Documents/data_to_compare_pdx1/PDX1/trace_correlation_matrix.png
$ Saved correlation matrix data in NPY format: /home/devos/Documents/data_to_compare_pdx1/PDX1/trace_correlation_matrix.npy
[Figure: Pearson correlation matrix of the 16 ROIs (trace_correlation_matrix.png)]
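To make the two stages concrete, here is a minimal NumPy sketch, assuming each ROI's traces are stored as an array of shape (n_traces, n_barcodes, 3) with NaN for undetected barcodes. The functions median_distance_map, map_pearson and correlation_matrix are illustrative; trace_pearsons may handle missing data and barcode ordering differently.

```python
import numpy as np


def median_distance_map(traces: np.ndarray) -> np.ndarray:
    """(n_traces, n_barcodes, 3) coordinates -> (n_barcodes, n_barcodes) median distances."""
    diffs = traces[:, :, None, :] - traces[:, None, :, :]  # (n_traces, n_bc, n_bc, 3)
    dists = np.sqrt(np.sum(diffs**2, axis=-1))  # NaN wherever either barcode is missing
    return np.nanmedian(dists, axis=0)  # pairs never co-detected stay NaN


def map_pearson(map_a: np.ndarray, map_b: np.ndarray) -> float:
    """Pearson correlation of the upper triangles, using pairs finite in both maps."""
    iu = np.triu_indices_from(map_a, k=1)
    a, b = map_a[iu], map_b[iu]
    ok = np.isfinite(a) & np.isfinite(b)
    if ok.sum() < 2:
        return float("nan")
    return float(np.corrcoef(a[ok], b[ok])[0, 1])


def correlation_matrix(maps) -> np.ndarray:
    """Symmetric ROI x ROI matrix of map_pearson values, ones on the diagonal."""
    n = len(maps)
    corr = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            corr[i, j] = corr[j, i] = map_pearson(maps[i], maps[j])
    return corr


# Synthetic example: two ROIs share one fold, a third ROI has an unrelated fold
rng = np.random.default_rng(0)
fold = np.cumsum(rng.normal(size=(10, 3)), axis=0)   # 10 barcodes
other = np.cumsum(rng.normal(size=(10, 3)), axis=0)


def simulate(structure, n_traces=50, noise=0.1, missing=0.2):
    traces = structure + rng.normal(scale=noise, size=(n_traces, *structure.shape))
    traces[rng.random((n_traces, structure.shape[0])) < missing] = np.nan  # drop barcodes
    return traces


maps = [median_distance_map(simulate(s)) for s in (fold, fold, other)]
corr = correlation_matrix(maps)
print(np.round(corr, 2))
```

In the printed matrix, the first two ROIs correlate almost perfectly, which is the pattern expected for ROIs sharing the same organization.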

Step 3: Remove poorly correlated ROIs

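For each ROI, we compute its mean Pearson correlation with all other ROIs. ROIs whose mean falls below the threshold, or whose correlations could not be computed at all (mean = nan, which also triggers the "Mean of empty slice" warning), are removed from raw_traces/ before merging. The code pairs the matrix rows with the sorted file names, which assumes that trace_pearsons keeps the order of the files it receives from ls; re-run trace_pearsons whenever the contents of raw_traces/ change, otherwise names and values no longer line up.

A threshold of 0.10 is permissive: it removes only clear outliers while preserving the vast majority of the data. Raise it for stricter filtering.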

[16]:
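# Load the correlation matrix saved by trace_pearsons
corr = np.load(f"{dest_path}/trace_correlation_matrix.npy")
# Assumes the matrix rows follow the sorted file order produced by `ls` in Step 2
traces = sorted(Path(f"{dest_path}/raw_traces").glob("*.ecsv"))
if len(traces) != corr.shape[0]:
    raise ValueError(f"{len(traces)} trace files but {corr.shape[0]} matrix rows: re-run trace_pearsons first")

# Mean correlation per ROI (exclude self-correlation on the diagonal).
# An ROI whose correlations are all NaN gets NaN here and triggers a RuntimeWarning.
np.fill_diagonal(corr, np.nan)
mean_corr = np.nanmean(corr, axis=1)

# Remove ROIs below the threshold, and ROIs with a NaN mean (NaN < threshold is False)
threshold = 0.10
for trace, mc in zip(traces, mean_corr):
    if mc < threshold or np.isnan(mc):
        trace.unlink()  # deletes the copy in raw_traces/; the original ROI file is kept
        print(f"Removed {trace.name}  (mean Pearson = {mc:.3f})")
    else:
        print(f"Kept    {trace.name}  (mean Pearson = {mc:.3f})")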
Kept    Trace_3D_barcode_mask-mask0_ROI-16_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.457)
Kept    Trace_3D_barcode_mask-mask0_ROI-17_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.140)
Kept    Trace_3D_barcode_mask-mask0_ROI-18_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.475)
Kept    Trace_3D_barcode_mask-mask0_ROI-19_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.377)
Kept    Trace_3D_barcode_mask-mask0_ROI-20_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.504)
Kept    Trace_3D_barcode_mask-mask0_ROI-21_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.534)
Kept    Trace_3D_barcode_mask-mask0_ROI-22_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.432)
Removed Trace_3D_barcode_mask-mask0_ROI-24_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = nan)
Kept    Trace_3D_barcode_mask-mask0_ROI-25_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.347)
Kept    Trace_3D_barcode_mask-mask0_ROI-26_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.393)
Kept    Trace_3D_barcode_mask-mask0_ROI-27_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.106)
Kept    Trace_3D_barcode_mask-mask0_ROI-28_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.414)
Kept    Trace_3D_barcode_mask-mask0_ROI-29_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.150)
Kept    Trace_3D_barcode_mask-mask0_ROI-30_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.452)
Kept    Trace_3D_barcode_mask-mask0_ROI-31_Pdx1_filtered_Pdx1.ecsv  (mean Pearson = 0.494)
/tmp/ipykernel_85188/2373448959.py:7: RuntimeWarning: Mean of empty slice
  mean_corr = np.nanmean(corr, axis=1)
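Because trace.unlink() deletes files, it can be useful to preview the split before running the deletion cell. The hypothetical helper split_rois below applies the same rule (mean off-diagonal Pearson below the threshold, or NaN) without touching the disk, works on a copy of the matrix, and silences the "Mean of empty slice" warning produced by an all-NaN row.

```python
import warnings

import numpy as np


def split_rois(corr, names, threshold=0.10):
    """Return (kept, removed) lists of (name, mean Pearson); no files are modified."""
    corr = np.array(corr, dtype=float)  # work on a copy; the caller's matrix is untouched
    if corr.shape != (len(names), len(names)):
        raise ValueError(f"matrix shape {corr.shape} does not match {len(names)} names")
    np.fill_diagonal(corr, np.nan)  # ignore self-correlation
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN rows
        mean_corr = np.nanmean(corr, axis=1)
    kept, removed = [], []
    for name, mc in zip(names, mean_corr):
        target = removed if (np.isnan(mc) or mc < threshold) else kept
        target.append((name, float(mc)))
    return kept, removed
```

For example: kept, removed = split_rois(np.load(f"{dest_path}/trace_correlation_matrix.npy"), [t.name for t in traces]).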

Step 4: Re-run trace_pearsons to verify the improvement

After removing the outlier ROIs, the correlation matrix should show higher overall values.

[19]:
!ls {dest_path}/raw_traces/*.ecsv | trace_pearsons --pipe -O {dest_path}

# Display the updated correlation matrix
img = mpimg.imread(f"{dest_path}/trace_correlation_matrix.png")
fig, ax = plt.subplots(figsize=(10, 10))
ax.imshow(img)
ax.axis('off')
plt.tight_layout()
plt.show()
Analyzing 12 trace files...
Processing Trace_3D_barcode_mask-mask0_ROI-16_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-18_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-19_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-20_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-21_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-22_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-24_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-25_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-27_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-29_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-30_Pdx1_filtered_Pdx1.ecsv
Processing Trace_3D_barcode_mask-mask0_ROI-31_Pdx1_filtered_Pdx1.ecsv
$ Saved correlation matrix as /home/devos/Documents/data_to_compare_pdx1/PDX1/trace_correlation_matrix.png
$ Saved correlation matrix data in NPY format: /home/devos/Documents/data_to_compare_pdx1/PDX1/trace_correlation_matrix.npy
[Figure: Pearson correlation matrix of the 12 remaining ROIs (trace_correlation_matrix.png)]
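trace_pearsons overwrites trace_correlation_matrix.npy on every run, so to quantify the improvement rather than judge it by eye, you would need to keep a copy of the first matrix (for example with shutil.copy before re-running; the name trace_correlation_matrix_before.npy is only a suggestion). The hypothetical helper summarize_corr then reduces each matrix to the number, mean and minimum of its off-diagonal correlations.

```python
import numpy as np


def summarize_corr(corr) -> dict:
    """Count, mean and minimum of the off-diagonal Pearson values (NaNs ignored)."""
    corr = np.asarray(corr, dtype=float)
    iu = np.triu_indices_from(corr, k=1)  # each ROI pair once, diagonal excluded
    vals = corr[iu]
    vals = vals[np.isfinite(vals)]
    if vals.size == 0:
        return {"pairs": 0, "mean": float("nan"), "min": float("nan")}
    return {"pairs": int(vals.size), "mean": float(vals.mean()), "min": float(vals.min())}
```

Comparing summarize_corr(np.load(f"{dest_path}/trace_correlation_matrix_before.npy")) with summarize_corr(np.load(f"{dest_path}/trace_correlation_matrix.npy")) should show a higher mean and minimum after filtering.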

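Step 5: Merge trace files

The remaining trace files are piped into trace_merge, which combines them into a single table. -F sets the output folder and -N the output file name.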

[20]:
!ls {dest_path}/raw_traces/*.ecsv | trace_merge -F {dest_path} -N merged_traces.ecsv
Number of trace files to merge: 12
 $ Merged trace file will contain 31407 traces
Read and accumulated 12 trace files
$ Saving output table as /home/devos/Documents/data_to_compare_pdx1/PDX1/merged_traces.ecsv ...
Finished execution
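Conceptually, merging stacks the rows of tables that share the same columns. The stdlib-only sketch below illustrates this on ECSV text, assuming every file has identical column names and that keeping the first file's commented YAML header is acceptable; trace_merge may reconcile metadata differently, so in practice use trace_merge itself. split_ecsv and merge_ecsv_texts are hypothetical names.

```python
def split_ecsv(text: str):
    """Split ECSV text into (comment_header_lines, column_line, data_lines)."""
    lines = text.splitlines()
    header = [ln for ln in lines if ln.startswith("#")]
    body = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    if not body:
        raise ValueError("no column-name line found")
    return header, body[0], body[1:]


def merge_ecsv_texts(texts):
    """Concatenate the data rows of ECSV tables that share the same columns.
    The commented header (YAML metadata) of the first table is kept as-is."""
    header, columns, rows = split_ecsv(texts[0])
    for text in texts[1:]:
        _, other_columns, other_rows = split_ecsv(text)
        if other_columns != columns:
            raise ValueError(f"column mismatch: {other_columns!r} != {columns!r}")
        rows.extend(other_rows)
    return "\n".join(header + [columns] + rows) + "\n"
```

Applied to the files of this tutorial, this would be merge_ecsv_texts([p.read_text() for p in sorted(Path(dest_path, "raw_traces").glob("*.ecsv"))]).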

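Step 6: Compute basic statistics

trace_stats reports the number of ROIs, unique chromatin traces and barcodes in the merged table. The 31407 "traces" announced by trace_merge in Step 5 most likely counts table rows (one per detected barcode) rather than unique traces, which would explain why it is much larger than the 3388 traces reported here.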

[21]:
!trace_stats --input {dest_path}/merged_traces.ecsv
$ Importing table from pyHiM format
Successfully loaded trace table: /home/devos/Documents/data_to_compare_pdx1/PDX1//merged_traces.ecsv
Statistics for /home/devos/Documents/data_to_compare_pdx1/PDX1//merged_traces.ecsv:
- Number of unique ROIs: 12
- Number of unique chromatin traces: 3388
- Number of unique barcodes: 23
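The same counts can be derived from the table rows. A minimal sketch, assuming the pyHiM column names "ROI #", "Trace_ID" and "Barcode #", Trace_ID values that are unique across ROIs, and one row per detected barcode; the helper trace_table_stats is hypothetical. It also reports the row count, which helps explain why the number printed by trace_merge is larger than the number of unique traces.

```python
def trace_table_stats(rows) -> dict:
    """Count rows, ROIs, traces and barcodes in a trace table given as an iterable
    of mappings keyed by column name (pyHiM column names assumed)."""
    rows = list(rows)
    n_traces = len({r["Trace_ID"] for r in rows})
    return {
        "rows": len(rows),
        "rois": len({r["ROI #"] for r in rows}),
        "traces": n_traces,
        "barcodes": len({r["Barcode #"] for r in rows}),
        # assuming one row per detected barcode: mean barcodes detected per trace
        "rows_per_trace": len(rows) / n_traces if n_traces else float("nan"),
    }
```

With astropy installed, trace_table_stats(Table.read(f"{dest_path}/merged_traces.ecsv", format="ascii.ecsv")) should work, since astropy table rows can be indexed by column name.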

Next steps

The merged trace file is ready for downstream analysis.

Continue with Tutorial 2 — Quality Control to:

  • Generate detailed quality metrics with trace_analyzer

  • Interpret barcode detection, neighbor distances, and barcode frequency plots

  • Decide on filtering thresholds for Tutorial 3

Summary

Step            Script            Output
Collect         collect_files     raw_traces/ (one .ecsv per ROI)
Correlations    trace_pearsons    trace_correlation_matrix.png + .npy
Filter ROIs     Python (numpy)    outlier files removed from raw_traces/
Merge           trace_merge       merged_traces.ecsv
Statistics      trace_stats       printed to stdout

Output location: dest_path, which in this tutorial is the data folder itself (/home/devos/Documents/data_to_compare_pdx1/PDX1/).