[1]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Data path from Tutorial 1
data_path = "/home/devos/Documents/data_to_compare_pdx1/PDX1"
dest_path = f"{data_path}/"

# Input file from Tutorial 1
input_trace = f"{dest_path}/merged_traces.ecsv"

print(f"Input file: {input_trace}")
print(f"Output folder: {dest_path}")
Input file: /home/devos/Documents/data_to_compare_pdx1/PDX1//merged_traces.ecsv
Output folder: /home/devos/Documents/data_to_compare_pdx1/PDX1/

Tutorial 2: Quality Control Analysis of Merged Chromatin Traces

Before filtering or downstream analysis, assess your data quality through comprehensive statistical analysis.

trace_analyzer computes:

  • Trace Statistics: Distribution of barcode counts per trace

  • Barcode Detection: How reliably each barcode is detected (bootstrap analysis)

  • Neighbor Distances: Spacing between consecutive barcodes (ΔX, ΔY, ΔZ)

  • Barcode Frequencies: How often individual barcodes appear

  • Spatial KDE Projections: Density heatmaps showing where spots are located in X, Y, Z

These metrics guide filtering decisions in Tutorial 3.

Step 1: Run Quality Control Analysis

[2]:
# Run trace_analyzer for detailed quality metrics
!trace_analyzer --input {input_trace}

1 trace files to process= /home/devos/Documents/data_to_compare_pdx1/PDX1//merged_traces.ecsv
$ Importing table from pyHiM format
Successfully loaded trace table: /home/devos/Documents/data_to_compare_pdx1/PDX1//merged_traces.ecsv
> Analyzing traces for /home/devos/Documents/data_to_compare_pdx1/PDX1//merged_traces.ecsv
$ Number of spots in trace file: 31407
$ Calculating overall barcode detection across 3388 traces...
$ Exporting barcode detection plot to: /home/devos/Documents/data_to_compare_pdx1/PDX1//merged_traces_barcode_detection.png
$ Saved neighbor distances plot: /home/devos/Documents/data_to_compare_pdx1/PDX1//merged_traces_first_neighbor_distances.png
$ Mean distances between neighboring barcodes: X=-0.000, Y=-0.001, Z=-0.021
$ Calculating barcode stats...
$ Exporting relative barcode frequencies figure to: /home/devos/Documents/data_to_compare_pdx1/PDX1//merged_traces_relative_barcode_frequencies.png
$ Saved KDE projection plot: /home/devos/Documents/data_to_compare_pdx1/PDX1//merged_traces_kde_projections.png
Finished execution

Step 2: Barcode Detection Efficiency

[3]:
plot_file = f"{dest_path}/merged_traces_barcode_detection.png"
img = mpimg.imread(plot_file)
fig, ax = plt.subplots(figsize=(14, 6))
ax.imshow(img)
ax.axis('off')
plt.tight_layout()
plt.show()
../_images/tutorials_tutorial_02_quality_control_5_0.png

This plot shows how often each individual barcode is detected across all traces. Each barcode is represented by a violin plot (bootstrap distribution with 1000 iterations), which shows:

  • Height of the violin: Range of detection frequencies (0-100%)

  • Median line: Most common detection frequency

  • Shape: Distribution shape indicates consistency

Interpretation:

  • Narrow violin = reliable barcode (consistent detection)

  • Wide violin = unreliable barcode (variable detection)

  • Barcodes with very low median detection may be candidates for --remove_barcode in Tutorial 3

Step 3: Neighbor Distance Distribution

[4]:
plot_file = f"{dest_path}/merged_traces_first_neighbor_distances.png"
img = mpimg.imread(plot_file)
fig, ax = plt.subplots(figsize=(14, 6))
ax.imshow(img)
ax.axis('off')
plt.tight_layout()
plt.show()
../_images/tutorials_tutorial_02_quality_control_8_0.png

This plot shows distances between consecutive barcodes (barcode_i+1 - barcode_i) for all traces. Three distributions are shown:

  • ΔX (blue): Distance in X dimension

  • ΔY (green): Distance in Y dimension

  • ΔZ (red): Distance in Z dimension (optical axis)

Interpretation:

  • Bell-shaped, centered near 0 = uniform spacing between barcodes (expected)

  • Large spread or multiple peaks = inconsistent spacing or detection issues

  • ΔZ much larger than ΔX/ΔY = potential focusing problems

Step 4: Barcode Repetition Patterns

[5]:
plot_file = f"{dest_path}/merged_traces_relative_barcode_frequencies.png"
img = mpimg.imread(plot_file)
fig, ax = plt.subplots(figsize=(14, 6))
ax.imshow(img)
ax.axis('off')
plt.tight_layout()
plt.show()
../_images/tutorials_tutorial_02_quality_control_11_0.png

This plot shows how often individual barcodes are repeated within single traces:

  • Y-axis: Number of times each barcode appears per trace (repetition count)

  • Near 1 = barcode appears once per trace (ideal)

  • Toward 2+ = barcode frequently duplicated (optical artifacts or detection noise)

Duplicated barcodes are handled in Tutorial 4 (--clean_spots).

Step 5: Trace Statistics Summary

[6]:
plot_file = f"{dest_path}/merged_traces_trace_statistics.png"
img = mpimg.imread(plot_file)
fig, ax = plt.subplots(figsize=(16, 6))
ax.imshow(img)
ax.axis('off')
plt.tight_layout()
plt.show()
../_images/tutorials_tutorial_02_quality_control_14_0.png

Three distributions across all traces:

  • N_barcodes (left): Number of barcode detections per trace

  • N_unique_barcodes (middle): Number of distinct barcodes per trace

  • N_repeated_barcodes (right): Number of barcodes appearing more than once per trace

Interpretation:

  • N_barcodes median indicates trace completeness → guides --n_barcodes threshold in Tutorial 3

  • N_repeated_barcodes > 0 indicates duplicates → handled in Tutorial 4

Step 6: Spatial Distribution (KDE Projections)

[7]:
plot_file = f"{dest_path}/merged_traces_kde_projections.png"
img = mpimg.imread(plot_file)
fig, ax = plt.subplots(figsize=(14, 10))
ax.imshow(img)
ax.axis('off')
plt.tight_layout()
plt.show()
../_images/tutorials_tutorial_02_quality_control_17_0.png

This figure shows kernel density estimation (KDE) heatmaps of all spot positions projected onto three planes:

  • XY projection (left): Top-down view of the sample. Bright regions = high spot density.

  • XZ projection (top-right): Side view along Y. Shows the Z range where spots are concentrated.

  • YZ projection (bottom-right): Side view along X.

How this helps for filtering:

  • The XZ and YZ panels reveal the usable Z range. If spots concentrate between Z = 3 and Z = 11 µm, you can set --z_min 3.0 --z_max 11.0 in Tutorial 3 to remove out-of-focus outliers.

  • If the XY panel shows spots concentrated in a sub-region, you could use --x_min / --x_max / --y_min / --y_max to exclude border artifacts.

  • A uniform XY distribution is normal; clusters may indicate imaging artifacts or biological structure.

Step 7: Using QC Insights for Filtering

From the QC plots above, you can extract key metrics to guide filtering in Tutorial 3:

Plot

What to Look For

Filter Option

Barcode Detection

Barcodes with very low median

--remove_barcode

Trace Statistics

N_barcodes median

--n_barcodes

Trace Statistics

N_repeated_barcodes > 0

--clean_spots (Tutorial 4)

KDE Projections

Z range with spot concentration

--z_min / --z_max

KDE Projections

XY border artifacts

--x_min / --y_max etc.

Summary

Quality control revealed important characteristics of your merged dataset:

  • Trace completeness — How many barcodes detected per trace

  • Barcode reliability — Detection consistency for each barcode

  • Spatial distribution — Where spots concentrate in X, Y, Z (KDE projections)

  • Neighbor spacing — Distance characteristics (ΔX, ΔY, ΔZ)

  • Duplicate barcodes — Repetition patterns across traces

Next: Tutorial 3 — Filter Traces using insights from these QC metrics