1 Introduction

SSPA [1-2]

2 Perform differential analyses

Hereafter we will present 3 datasets for which we will determine differential peaks and the consequent power calculations for these analyses:

tamoxifen (DESeq2-mode): ER\(\alpha\) ChIP-seq in human breast cancer cell lines responsive or not to tamoxifen treatment (estrogen receptor inhibitor), from Ross-Innes et al. (Nature, 2012) [3];
endometrium (edgeR-mode): ER\(\alpha\) ChIP-seq in human endometrial tissues derived from healthy donors of endometrial cancer patients, from Gregoricchio et al. (Genome Biology, 2025) [4];
atac_mm (DESeq2-mode): ATAC-seq data from T. Bruno, M.C. Cappelletto, C. Cortile et al. (Blood, 2025) [5] include 11 Monoclonal Gammopathy of Undetermined Significance (MGUS) and 11 Multiple Myeloma (MM) human samples. Of note, we selected the 11 MM samples (out of 55) with the highest number of peaks (see Figure 1B from the original article) in order to have a similar sample size between the two groups.

The final differential analysis results of each dataset are available in power4peaks, and accessible as follows:

data("tamoxifen", package = "power4peaks")
data("endometrium", package = "power4peaks")
data("atac_mm", package = "power4peaks")

However, hereafter we will show how to obtain these results.

2.1 Differential results

The input of power4peaks are DBA objected obtained from differential peaks analyses performed usinf DiffBind.
Hereafter we show all the steps required to processes the data and perform differential analyses.

2.1.1 Tamoxifen

The “tamoxifen” dataset is available in the Biocondutor package DiffBind [2], and data can be downloaded as follow:

## load DiffBind
require(DiffBind)

data_url <- "https://content.cruk.cam.ac.uk/bioinformatics/software/DiffBind/DiffBind_vignette_data.tar.gz"
file <- basename(url)
tmpdir <- tempdir()
options(timeout=600)
download.file(url, file.path(tmpdir,file))
untar(file.path(tmpdir,file), exdir = tmpdir)

Analyses can be performed in one single step:

setwd(file.path(tmpdir,"DiffBind_Vignette"))
tamoxifen <- dba.analyze(file.path(tmpdir,"DiffBind_Vignette/tamoxifen.csv"))

tamoxifen

> 11 Samples, 2795 sites in matrix:
>        ID Tissue Factor  Condition  Treatment Replicate   Reads FRiP
> 1  BT4741  BT474     ER  Resistant Full-Media         1  652697 0.15
> 2  BT4742  BT474     ER  Resistant Full-Media         2  663370 0.14
> 3   MCF71   MCF7     ER Responsive Full-Media         1  346429 0.30
> 4   MCF72   MCF7     ER Responsive Full-Media         2  368052 0.18
> 5   MCF73   MCF7     ER Responsive Full-Media         3  466273 0.24
> 6   T47D1   T47D     ER Responsive Full-Media         1  399879 0.11
> 7   T47D2   T47D     ER Responsive Full-Media         2 1475415 0.06
> 8  MCF7r1   MCF7     ER  Resistant Full-Media         1  616630 0.21
> 9  MCF7r2   MCF7     ER  Resistant Full-Media         2  593224 0.13
> 10  ZR751   ZR75     ER Responsive Full-Media         1  706836 0.32
> 11  ZR752   ZR75     ER Responsive Full-Media         2 2575408 0.21
> 
> Design: [~Condition] | 1 Contrast:
>      Factor      Group Samples    Group2 Samples2 DB.DESeq2
> 1 Condition Responsive       7 Resistant        4       246

2.1.2 Endomtrium

For this data set, the BAM files (GRCh37/Hg19) are deposited on EGA under accession number EGAS00001007240 > EGAD00001010896, while the peak files can be downloaded from GEO under accession number GSE235241 > GSE253900.

Here after we will show how to generate the DiffBind object (class dba) available in the power4peaks package with the name of “endometrium”.

2.1.2.1 Load the data in `DiffBind`

## Load endometrium metadata from power4omics
data("endometrium_metadata", package = "power4peaks")
endometrium_metadata

require(DiffBind)

## Setup DiffBind object/experiment
dba.endometrium =
  dba(sampleSheet = endometrium_metadata,
      config = data.frame(AnalysisMethod = DBA_EDGER,
                          th = 0.05,
                          design = TRUE,
                          cores = 4,
                          RunParallel = TRUE,
                          doBlacklist = TRUE,
                          doGreylist = FALSE))

H147 Endometrium ERa Normal Normal NA bed
H12 Endometrium ERa Normal Normal NA bed
HA Endometrium ERa Normal Normal NA bed
HB Endometrium ERa Normal Normal NA bed
T5 Endometrium ERa Tumor Tumor NA bed
T55 Endometrium ERa Tumor Tumor NA bed
T33 Endometrium ERa Tumor Tumor NA bed
T102 Endometrium ERa Tumor Tumor NA bed

# Apply black list
dba.endometrium = dba.blacklist(dba.endometrium, blacklist = DBA_BLACKLIST_GRCH37)

Genome detected: Hsapiens.NCBI.GRCh37
Applying blacklist...
Removed: 137 of 135444 intervals.
Removed: 54 merged (of 67512) and 34 (of 30155) consensus.

## Compute counts at peaks
dba.endometrium = dba.count(dba.endometrium, bParallel = TRUE, fragmentSize = 200)

## Normalize counts
dba.endometrium = dba.normalize(dba.endometrium) 

## Define contrast
dba.endometrium = dba.contrast(dba.endometrium, contrast=c("Condition", "Tumor", "Normal"))

## Differential binding
endometrium = dba.analyze(dba.endometrium, method = DBA_EDGER)
endometrium

> 8 Samples, 30044 sites in matrix:
>     ID      Tissue Factor Condition Treatment    Reads FRiP
> 1 H147 Endometrium    ERa    Normal    Normal 15210712 0.02
> 2  H12 Endometrium    ERa    Normal    Normal 17667796 0.04
> 3   HA Endometrium    ERa    Normal    Normal 22560840 0.03
> 4   HB Endometrium    ERa    Normal    Normal 21147647 0.02
> 5   T5 Endometrium    ERa     Tumor     Tumor 16648072 0.04
> 6  T55 Endometrium    ERa     Tumor     Tumor 14075912 0.04
> 7  T33 Endometrium    ERa     Tumor     Tumor 15513829 0.03
> 8 T102 Endometrium    ERa     Tumor     Tumor 20545150 0.06
> 
> Design: [~Condition] | 1 Contrast:
>      Factor Group Samples Group2 Samples2 DB.edgeR
> 1 Condition Tumor       4 Normal        4    11235

2.1.3 ATAC-seq

For this data set we re-analyzed the data starting from the FASTQ data as follows.

2.1.3.1 Fastq download

For the download we use the multiDUMP pipeline.
In power4peaks we provide a SRA run table for the download, namely atac_mm_SraRunTable. We selected the 11 MM samples (out of 55) with the highest number of peaks (see Figure 1B from the original article) in order to have a similar sample size between the two groups.

2.1.3.1.1 Extracting data information

require(dplyr)

## MM samples to keep
MM_sample_selection = c("022", "025", "059", '069', "027", "009", "054", "073", "033", "013", "028")

## Read and filter runs table
sra = 
  power4peaks::atac_mm_SraRunTable %>%
  dplyr::filter(Sample_Alias %in% paste0("ATAC_seq_tumour_", MM_sample_selection, "_MM") | 
                grepl("MGUS", Sample_Alias)) %>%
  dplyr::select(Run, Sample_Alias)

sra

2.1.3.1.2 Download the data

snakemake \
-s </target/folder>/multiDUMP/workflow/multiDUMP.snakefile  \
--cores 5 \
--config \
TABLE="./tables/download.run_config.txt" \
OUTDIR="./sources/fastq" \
SUFFIX="['_R1','_R2']" \
EXTENSION=".fastq.gz" \
--keep-going

2.1.3.2 Data pre-processing

For the mapping and analysis of the FASTQ data we use the snakeATAC pipeline.

2.1.3.2.1 Mapping ATAC to Hg38

snakemake \
--cores 20 \
-s </target/folder>/snakeATAC/workflow/snakeATAC_mapping.snakefile \
--configfile ./sources/snakeATAC_mapping_configfile.yaml \
--config \
fastq_directory="./sources/fastq" \
output_directory="./sources/mapping" \
genome_fasta="/path/to/GRCh38.d1.vd1.fa" \
--keep-going

The mapping configuration file can be find here after (click on “Show” to reveal the code).

# This .yaml configuration file contains all variables used by the snakemake pipeline
# DO NOT CHANGE parameter names without changing it in Snakefile as well
# On the other hand, some parameter values have to be inevitably modifed
# *************************************************************************************

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ DNA MAPPING @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

### 0. General workflow parameters
fastq_directory:
output_directory:
genome_fasta:
umi_present: False
fastq_suffix: ".fastq.gz"
read_suffix: ['_R1', '_R2']


### 1. FASTQ trimming
cutadapt_trimm_options: ''
fw_adapter_sequence: "CTGTCTCTTATACACATCT"
rv_adapter_sequence: "CTGTCTCTTATACACATCT"
run_fastq_qc: True

### 2. BWA mapping
bwa_options: ''


### 2. BAM filtering
remove_duplicates: True
MAPQ_threshold: 20
remove_other_chromosomes_pattern: "CMV|HBV|HTLV|HPV|SV40|MCV|KSHV|chrUn|random|HCV|HIV|EBV"

2.1.3.2.2 Peak calling and analyses

snakemake \
--cores 20 \
-s </target/folder>/snakeATAC/workflow/snakeATAC_analyses.snakefile \
--configfile ./sources/snakeATAC_analyses_config_MM.vs.MGUS.yaml \
--keep-going

The analyses configuration file can be find here after (click on “Show” to reveal the code). Notice that paths to genome and blacklist must be adapted to user’s file location.

# This .yaml configuration file contains all variables used by the snakemake pipeline
# DO NOT CHANGE parameter names without changing it in Snakefile as well
# On the other hand, some parameter values have to be inevitably modifed
# ****************************************************************************************

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ ESSENTIAL PARAMETERS @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
### 1. General workflow parameters  ======================================================
workflow_configuration:
  runs_directory: "./sources/mapping/02_BAM"
  output_directory: "./sources/peak_calling"

bam_features:
  bam_suffix: "_mapq20_sorted_woMT_dedup.bam"
  skip_bam_filtering: True
  remove_other_chromosomes_pattern: "^CMV|^HBV|^HTLV|^HPV|^SV40|^MCV|^KSHV|^chrUn|random|^HCV|^HIV|EBV"
  umi_present: False
  remove_duplicates: True
  MAPQ_threshold: 20
  minFragmentLength: 0
  maxFragmentLength: 2000

genomic_annotations:
  genome_id: "hg38"
  genome_fasta: "/path/to/GRCh38.d1.vd1.fa"
  blacklist: "/path/to/hg38-blacklist.v2.bed"
  effective_genomeSize: 2900338458
  ignore_for_normalization: "X Y MT M KI270728.1 KI270727.1 KI270442.1 KI270729.1 GL000225.1 KI270743.1 GL000008.2 GL000009.2 KI270747.1 KI270722.1 GL000194.1 KI270742.1 GL000205.2 GL000195.1 KI270736.1 KI270733.1 GL000224.1 GL000219.1 KI270719.1 GL000216.2 KI270712.1 KI270706.1 KI270725.1 KI270744.1 KI270734.1 GL000213.1 GL000220.1 KI270715.1 GL000218.1 KI270749.1 KI270741.1 GL000221.1 KI270716.1 KI270731.1 KI270751.1 KI270750.1 KI270519.1 GL000214.1 KI270708.1 KI270730.1 KI270438.1 KI270737.1 KI270721.1 KI270738.1 KI270748.1 KI270435.1 GL000208.1 KI270538.1 KI270756.1 KI270739.1 KI270757.1 KI270709.1 KI270746.1 KI270753.1 KI270589.1 KI270726.1 KI270735.1 KI270711.1 KI270745.1 KI270714.1 KI270732.1 KI270713.1 KI270754.1 KI270710.1 KI270717.1 KI270724.1 KI270720.1 KI270723.1 KI270718.1 KI270317.1 KI270740.1 KI270755.1 KI270707.1 KI270579.1 KI270752.1 KI270512.1 KI270322.1 GL000226.1 KI270311.1 KI270366.1 KI270511.1 KI270448.1 KI270521.1 KI270581.1 KI270582.1 KI270515.1 KI270588.1 KI270591.1 KI270522.1 KI270507.1 KI270590.1 KI270584.1 KI270320.1 KI270382.1 KI270468.1 KI270467.1 KI270362.1 KI270517.1 KI270593.1 KI270528.1 KI270587.1 KI270364.1 KI270371.1 KI270333.1 KI270374.1 KI270411.1 KI270414.1 KI270510.1 KI270390.1 KI270375.1 KI270420.1 KI270509.1 KI270315.1 KI270302.1 KI270518.1 KI270530.1 KI270304.1 KI270418.1 KI270424.1 KI270417.1 KI270508.1 KI270303.1 KI270381.1 KI270529.1 KI270425.1 KI270396.1 KI270363.1 KI270386.1 KI270465.1 KI270383.1 KI270384.1 KI270330.1 KI270372.1 KI270548.1 KI270580.1 KI270387.1 KI270391.1 KI270305.1 KI270373.1 KI270422.1 KI270316.1 KI270338.1 KI270340.1 KI270583.1 KI270334.1 KI270429.1 KI270393.1 KI270516.1 KI270389.1 KI270466.1 KI270388.1 KI270544.1 KI270310.1 KI270412.1 KI270395.1 KI270376.1 KI270337.1 KI270335.1 KI270378.1 KI270379.1 KI270329.1 KI270419.1 KI270336.1 KI270312.1 KI270539.1 KI270385.1 KI270423.1 KI270392.1 KI270394.1"



### 2. MACS3 peak calling ----------------------------------------------------------------
peak_calling:
  qValue_cutoff: 0.001
  call_summits: True
  FRiP_threshold: 20 #%
  peak_score_heatmap:
    plot_heatmap: False
    raw_heatmap_color: "Blues"
    zScore_heatmap_color: "seismic"



## 3. Quality controls -------------------------------------------------------------------
quality_controls:
  fragmentSize_window_length: 1000 #bp
  multiBigwigSummary_binning_window_size: 10000 #bp
  correlation_heatmap_color: "Blues"
  plotFingerprint:
    binSize: 500 #bp
    sampledRegions: 500000
    extra_parameters: ""



### 4. Differential TF-binding analyses --------------------------------------------------
differential_TF_binding:
  perform_differential_analyses: False
  sample_groups_table: ""
  merged_bigwig_binSize: 10
  group_comparisons: [['MM','MGUS']]
  motifs_file: ""
  whitelist: ""
  flanking_bp: 60



### 5. Somatic Variants ------------------------------------------------------------------
somatic_variants:
  skip_base_quality_recalibration: False
  call_SNPs: False
  call_indels: False
  dbsnp_file: ""
  # To make dbSNP without 'chr' -->  awk '{gsub(/\chr/, "")}1' your.vcf > withoutchr.vcf
  DP_snp_threshold: 20
  QUAL_snp_threshold: 0
  DP_indel_threshold: 20
  QUAL_indel_threshold: 0
  SnpSift_vcf_fields_to_extract: [ "CHROM", "POS", "ID", "REF", "ALT", "QUAL", "DP", "AF", "FILTER", "FORMAT", "GEN[*].GT", "GEN[*].AD", "GEN[*].AF" ]



### 6. Copy Number Variations ------------------------------------------------------------
copy_number_variation:
  call_CNV: False
  chromosome_prefix: ""
  kb_bin_resolution: 50
  CNA_threshold: 2
  CNA_plot_line_colors: "red"
  CNA_plot_point_size: 0.5
  CNA_plot_point_transparency: 0.5
  corrected_bigWig_binSize: 10 #bp

2.1.3.3 Differential analyses

2.1.3.3.1 Load data in R

We perform differential analyses using the DiffBind R-package.
It requires a metadata table in which we indicate the path ot the BAMs and to the peaks (.narrowPeak file) and additional info about the sample groups.

## Get BAM list
bams <- list.files(path = "./sources/mapping/02_BAM",
                   pattern = "_sorted_woMT_dedup.bam$",
                   full.names = TRUE)

## Get peak list
peaks <- list.files(path = "./sources/peak_calling/04_MACS3_peaks",
                    pattern = "_peaks_chr.narrowPeak",
                    full.names = TRUE)

metadata <-
  data.frame(SampleID = gsub("_mapq20_sorted_woMT_dedup.bam|ATAC_seq_|_MGUS|_MM", "", basename(bams)),
             Tissue = "bone marrow",
             Factor = "ATAC-seq",
             Condition = ifelse(test = grepl("MGUS", basename(bams)), yes = "MGUS", no = "MM"),
             bamReads = bams,
             Peaks = peaks,
             PeakCaller = "narrow")

metadata

Now the DiffBind analyses object (dba) can be created.

dba <- dba(sampleSheet = metadata,
           config = data.frame(AnalysisMethod = DBA_DESEQ2,
                               th = 0.01, # FDR threshold
                               design = TRUE,
                               cores = 8,
                               RunParallel = TRUE,
                               doBlacklist = TRUE,
                               doGreylist = FALSE))

CB01 bone marrow ATAC-seq MGUS  NA narrow
CB02 bone marrow ATAC-seq MGUS  NA narrow
CB09 bone marrow ATAC-seq MGUS  NA narrow
CB14 bone marrow ATAC-seq MGUS  NA narrow
CB16 bone marrow ATAC-seq MGUS  NA narrow
CB18 bone marrow ATAC-seq MGUS  NA narrow
CB19 bone marrow ATAC-seq MGUS  NA narrow
CB20 bone marrow ATAC-seq MGUS  NA narrow
CB21 bone marrow ATAC-seq MGUS  NA narrow
CB22 bone marrow ATAC-seq MGUS  NA narrow
CB23 bone marrow ATAC-seq MGUS  NA narrow
tumour_009 bone marrow ATAC-seq MM  NA narrow
tumour_013 bone marrow ATAC-seq MM  NA narrow
tumour_022 bone marrow ATAC-seq MM  NA narrow
tumour_025 bone marrow ATAC-seq MM  NA narrow
tumour_027 bone marrow ATAC-seq MM  NA narrow
tumour_028 bone marrow ATAC-seq MM  NA narrow
tumour_033 bone marrow ATAC-seq MM  NA narrow
tumour_054 bone marrow ATAC-seq MM  NA narrow
tumour_059 bone marrow ATAC-seq MM  NA narrow
tumour_069 bone marrow ATAC-seq MM  NA narrow
tumour_073 bone marrow ATAC-seq MM  NA narrow

2.1.3.3.2 Generating counts

## Apply black list
dba <- dba.blacklist(dba, blacklist = DiffBind::DBA_BLACKLIST_HG38)

Genome detected: Hsapiens.UCSC.hg38
Applying blacklist...
Removed: 171 of 726291 intervals.
Removed: 45 merged (of 110154) and 19 (of 65137) consensus.

## Counting reads in peaks
dba <- dba.count(dba, bParallel = TRUE)

## Count normalization (library size)
dba <- dba.normalize(dba)

2.1.3.3.3 Differential binding analysis

First step is to set up the contrast: which group needs to be compare to which one. In this case will be MM-vs-MGUS.
The format is the following: contrast = c("metadata_column_id", "group1", "group2"), where the fold change is computed as group1/group2.

atac_mm <- dba.contrast(dba, contrast = c("Condition", "MM", "MGUS"))

The second step instead involves the differential analyses itself (in our case we set to use DESeq2).

atac_mm <- dba.analyze(dba_diff, design = "~ Condition")
atac_mm

The DBA object just obtained is equivalent ot the one provided in the power4peaks package:

> 22 Samples, 65114 sites in matrix:
>            ID      Tissue   Factor Condition    Reads FRiP
> 1        CB01 bone marrow ATAC-seq      MGUS 12434542 0.07
> 2        CB02 bone marrow ATAC-seq      MGUS 18907594 0.05
> 3        CB09 bone marrow ATAC-seq      MGUS 16055994 0.04
> 4        CB14 bone marrow ATAC-seq      MGUS 14305402 0.04
> 5        CB16 bone marrow ATAC-seq      MGUS 14236732 0.06
> 6        CB18 bone marrow ATAC-seq      MGUS 18661788 0.05
> 7        CB19 bone marrow ATAC-seq      MGUS 51886033 0.03
> 8        CB20 bone marrow ATAC-seq      MGUS 76842486 0.02
> 9        CB21 bone marrow ATAC-seq      MGUS 38030820 0.03
> 10       CB22 bone marrow ATAC-seq      MGUS 23325610 0.11
> 11       CB23 bone marrow ATAC-seq      MGUS 31237318 0.03
> 12 tumour_009 bone marrow ATAC-seq        MM 19793596 0.25
> 13 tumour_013 bone marrow ATAC-seq        MM 25973984 0.17
> 14 tumour_022 bone marrow ATAC-seq        MM 22603893 0.30
> 15 tumour_025 bone marrow ATAC-seq        MM 32636820 0.25
> 16 tumour_027 bone marrow ATAC-seq        MM 33225198 0.15
> 17 tumour_028 bone marrow ATAC-seq        MM 33654094 0.13
> 18 tumour_033 bone marrow ATAC-seq        MM 28665828 0.12
> 19 tumour_054 bone marrow ATAC-seq        MM 19275562 0.16
> 20 tumour_059 bone marrow ATAC-seq        MM 18243697 0.23
> 21 tumour_069 bone marrow ATAC-seq        MM 21873494 0.16
> 22 tumour_073 bone marrow ATAC-seq        MM 18104410 0.22
> 
> Design: [~ Condition] | 1 Contrast:
>      Factor Group Samples Group2 Samples2 DB.DESeq2
> 1 Condition    MM      11   MGUS       11     53005

Note that all the data sets can be loaded as:

data("tamoxifen", package = "power4peaks")
data("endometrium", package = "power4peaks")
data("atac_mm", package = "power4peaks")

3 Build `power4peaks` objects

The function as.power4peaks can automatically convert a DBA object (from DiffBind) in a power4peaks.stats one.
A power4peaks.stats object is an S4 vector containing the following slots:

Slot id	Description
dba.object	original DBA object derived from `DiffBind`
contrast	contrast that has been used
design	design matrix obtained from the analyses
diff.method	differential peaks analyses method used (DEseq2 or edgeR)
p.adjust.method	p-value adjustment method used
diff.object	differential peak analyses object obtained from the DBA object
results	differential peak analyses results
statistics	values of the statistics obtained from the differential peak analyses
stat.distribution	type of distribution of the statistics (e.g., normal, chi-squared, t, …)
df1	values of the first degree of freedom
df2	values of the second degree of freedom

3.1 Tamoxifen

library(power4peaks)

p4p_tamoxifen = as.power4peaks(tamoxifen)
p4p_tamoxifen
>           Contrast | Condition: Responsive vs Resistant
>             Method | DESeq2
> Stat. distribution | norm
>                df1 | 2
>                df2 |

3.2 Endometrium

library(power4peaks)

p4p_endometrium = as.power4peaks(endometrium)
p4p_endometrium
>           Contrast | Condition: Tumor vs Normal
>             Method | edgeR
> Stat. distribution | chisq
>                df1 | 1
>                df2 |

3.3 ATAC-seq

library(power4peaks)

p4p_atac_mm = as.power4peaks(atac_mm)
p4p_atac_mm
>           Contrast | Condition: MM vs MGUS
>             Method | DESeq2
> Stat. distribution | norm
>                df1 | 2
>                df2 |

3.4 Validate statistics distribution

The most important parameters for the computation of the statistical power of the differential analyses using SSPA are the distribution of the statistics and the degrees of freedom.

Therefore, to validate the distribution type of the statistics, it is possible to visualize the empirical statistics distribution to the theoretical ones using the function plot.stat.distribution:

3.4.1 Tamoxifen

plot.stat.distribution(p4p_tamoxifen)

3.4.2 Endometrium

plot.stat.distribution(p4p_endometrium)

3.4.3 ATAC-seq

plot.stat.distribution(p4p_atac_mm)

4 Compute statistical power

The function compute.power will allow for the power and sample size calculation using a power4peaks.stats object as input.
This will produce and object of class power4peaks.power which is an S4 vector containing the following slots:

Slot id	Description
pilot.data	output of `SSPA:::pilotData`, collects the info relative to statistics, p-value and sample size
sample.size	output of `SSPA:::sampleSize`, contains the estimation of the proportion of non-differentially expressed genes and the density of the effect sizes.
power	output of `SSPA:::predictpower`
effect.size.plot	ggplot object depicting the effect as function of the group size
power.plot	ggplot object depicting the power as function of the group size

4.1 Tamoxifen

tamoxifen_power <- compute.power(p4p_tamoxifen)
tamoxifen_power

4.2 Endometrium

endometrium_power <- compute.power(p4p_endometrium)
endometrium_power

4.3 ATAC-seq

atac_mm_power <- compute.power(p4p_atac_mm)
atac_mm_power

The top panel (effect size plot) depicts the distribution of the effect size. The highest the density the highest is the impact of the group/sample size on the power of the differential analyses. The \(\pi\)⁰ indicates the estimated proportion of non-differential peaks (proportion of test under the null hypothesis).
The lower panel (power plot) depicts the power of the statistics that is achieved with a certain amount of samples per group. The red horizontal line indicates the power threshold indicated by the user (default: 0.80), while the gray dashed vertical line indicates the average group size present in the data set tested.

5 References

van Iterson M, ’t Hoen PA, Pedotti P, Hooiveld GJ, den Dunnen JT, van Ommen GJ, Boer JM, Menezes RX. Relative power and sample size analysis on gene expression profiling data. BMC Genomics. 2009 Sep 17;10:439. doi: 10.1186/1471-2164-10-439. PMID: 19758461; PMCID: PMC2759969.
van Iterson M, van de Wiel MA, Boer JM, de Menezes RX. General power and sample size calculations for high-dimensional genomic data. Stat Appl Genet Mol Biol. 2013 Aug;12(4):449-67. doi: 10.1515/sagmb-2012-0046. PMID: 23934609.
Ross-Innes CS, Stark R, Teschendorff AE, Holmes KA, Ali HR, Dunning MJ, Brown GD, Gojis O, Ellis IO, Green AR, Ali S, Chin SF, Palmieri C, Caldas C, Carroll JS. Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature. 2012 Jan 4;481(7381):389-93. doi: 10.1038/nature10730. PMID: 22217937; PMCID: PMC3272464.
Gregoricchio S, Kojic A, Hoogstraat M, Schuurman K, Stelloo S, Severson TM, O’Mara TA, Droog M, Singh AA, Glubb DM, Wessels LFA, Vermeulen M, van Leeuwen FE, Zwart W. Endometrial tumorigenesis involves epigenetic plasticity demarcating non-coding somatic mutations and 3D-genome alterations. Genome Biol. 2025 May 9;26(1):124. doi: 10.1186/s13059-025-03596-5. PMID: 40346709; PMCID: PMC12063248.
Bruno T, Cappelletto MC, Cortile C, Di Giovenale S, Amadio B, De Nicola F, Falcone I, Giuliani S, Palermo B, Catena V, Ciuffreda L, Cerruti F, Cascio P, Merola R, Masi S, De Pascale V, Annibali O, Ferraro S +39, Gumenyuk S, Pisani F, Marchesi F, Mengarelli A, Fanciulli M, Corleone G. Nuclear respiratory factor 1 promotes cell survival in multiple myeloma under proteasome inhibition therapy. Blood. 2025 Sep 8:blood.2025028441. doi: 10.1182/blood.2025028441. Epub ahead of print. PMID: 40920573.

6 Session info

> $`R version:`
> [1] R version 4.4.3 (2025-02-28)
> 
> $`Base packages:`
> [1] base      datasets  graphics  grDevices methods   stats     utils    
> 
> $`Other attached packages:`
> [1] dplyr_1.1.4       power4peaks_0.1.0 SSPA_2.30.0      
> 
> $`Loaded via a namespace (and not attached):`
>   [1] abind_1.4-8                 amap_0.8-20                
>   [3] apeglm_1.26.1               ashr_2.2-63                
>   [5] backports_1.5.0             bbmle_1.0.25.1             
>   [7] bdsmatrix_1.3-7             Biobase_2.64.0             
>   [9] BiocGenerics_0.50.0         BiocIO_1.14.0              
>  [11] BiocManager_1.30.26         BiocParallel_1.38.0        
>  [13] Biostrings_2.72.1           bitops_1.0-9               
>  [15] broom_1.0.10                BSgenome_1.72.0            
>  [17] bslib_0.9.0                 cachem_1.1.0               
>  [19] car_3.1-3                   carData_3.0-5              
>  [21] caTools_1.18.3              cli_3.6.5                  
>  [23] coda_0.19-4.1               codetools_0.2-20           
>  [25] colorspace_2.1-1            commonmark_2.0.0           
>  [27] compiler_4.4.3              crayon_1.5.3               
>  [29] curl_7.0.0                  DelayedArray_0.30.1        
>  [31] deldir_2.0-4                DESeq2_1.44.0              
>  [33] dichromat_2.0-0.1           DiffBind_3.14.0            
>  [35] digest_0.6.37               edgeR_4.2.2                
>  [37] emdbook_1.3.14              evaluate_1.0.5             
>  [39] farver_2.1.2                fastmap_1.2.0              
>  [41] fitdistrplus_1.2-4          Formula_1.2-5              
>  [43] fs_1.6.6                    generics_0.1.4             
>  [45] GenomeInfoDb_1.40.1         GenomeInfoDbData_1.2.12    
>  [47] GenomicAlignments_1.40.0    GenomicRanges_1.56.2       
>  [49] ggplot2_4.0.0               ggpubr_0.6.1               
>  [51] ggrepel_0.9.6               ggsignif_0.6.4             
>  [53] ggtext_0.1.2                glue_1.8.0                 
>  [55] gplots_3.2.0                GreyListChIP_1.36.0        
>  [57] grid_4.4.3                  gridtext_0.1.5             
>  [59] gtable_0.3.6                gtools_3.9.5               
>  [61] htmltools_0.5.8.1           htmlwidgets_1.6.4          
>  [63] httr_1.4.7                  hwriter_1.3.2.1            
>  [65] interp_1.1-6                invgamma_1.2               
>  [67] IRanges_2.38.1              irlba_2.3.5.1              
>  [69] jpeg_0.1-11                 jquerylib_0.1.4            
>  [71] jsonlite_2.0.0              KernSmooth_2.23-26         
>  [73] knitr_1.50                  labeling_0.4.3             
>  [75] lattice_0.22-6              latticeExtra_0.6-31        
>  [77] lifecycle_1.0.4             limma_3.60.6               
>  [79] litedown_0.7                locfit_1.5-9.12            
>  [81] magrittr_2.0.4              markdown_2.0               
>  [83] MASS_7.3-64                 Matrix_1.7-2               
>  [85] MatrixGenerics_1.16.0       matrixStats_1.5.0          
>  [87] mixsqp_0.3-54               mvtnorm_1.3-3              
>  [89] numDeriv_2016.8-1.1         parallel_4.4.3             
>  [91] patchwork_1.3.2             pillar_1.11.0              
>  [93] pkgconfig_2.0.3             plyr_1.8.9                 
>  [95] png_0.1-8                   purrr_1.1.0                
>  [97] pwalign_1.0.0               qvalue_2.36.0              
>  [99] R6_2.6.1                    rappdirs_0.3.3             
> [101] RColorBrewer_1.1-3          Rcpp_1.1.0                 
> [103] RCurl_1.98-1.17             reshape2_1.4.4             
> [105] restfulr_0.0.16             rjson_0.2.23               
> [107] rlang_1.1.6                 rmarkdown_2.29             
> [109] Rsamtools_2.20.0            rstatix_0.7.2              
> [111] rstudioapi_0.17.1           rtracklayer_1.64.0         
> [113] rvcheck_0.2.1               S4Arrays_1.4.1             
> [115] S4Vectors_0.42.1            S7_0.2.0                   
> [117] sass_0.4.10                 scales_1.4.0               
> [119] ShortRead_1.62.0            SparseArray_1.4.8          
> [121] splines_4.4.3               SQUAREM_2021.1             
> [123] statmod_1.5.0               stats4_4.4.3               
> [125] stringi_1.8.7               stringr_1.5.2              
> [127] SummarizedExperiment_1.34.0 survival_3.8-3             
> [129] systemPipeR_2.10.0          tibble_3.3.0               
> [131] tidyr_1.3.1                 tidyselect_1.2.1           
> [133] tools_4.4.3                 truncnorm_1.0-9            
> [135] UCSC.utils_1.0.0            vctrs_0.6.5                
> [137] withr_3.0.2                 xfun_0.53                  
> [139] XML_3.99-0.19               xml2_1.4.0                 
> [141] XVector_0.44.0              yaml_2.3.10                
> [143] yulab.utils_0.2.1           zlibbioc_1.50.0

`power4peaks` tutorial

Sebastian Gregoricchio

14 September, 2025

1 Introduction

2 Perform differential analyses

2.1 Differential results

2.1.1 Tamoxifen

2.1.2 Endomtrium

2.1.2.1 Load the data in `DiffBind`

2.1.3 ATAC-seq

2.1.3.1 Fastq download

2.1.3.1.1 Extracting data information

2.1.3.1.2 Download the data

2.1.3.2 Data pre-processing

2.1.3.2.1 Mapping ATAC to Hg38

2.1.3.2.2 Peak calling and analyses

2.1.3.3 Differential analyses

2.1.3.3.1 Load data in R

2.1.3.3.2 Generating counts

2.1.3.3.3 Differential binding analysis

3 Build `power4peaks` objects

3.1 Tamoxifen

3.2 Endometrium

3.3 ATAC-seq

3.4 Validate statistics distribution

3.4.1 Tamoxifen

3.4.2 Endometrium

3.4.3 ATAC-seq

4 Compute statistical power

4.1 Tamoxifen

4.2 Endometrium

4.3 ATAC-seq

5 References

6 Session info

power4peaks tutorial

Sebastian Gregoricchio

14 September, 2025

1 Introduction

2 Perform differential analyses

2.1 Differential results

2.1.1 Tamoxifen

2.1.2 Endomtrium

2.1.2.1 Load the data in DiffBind

2.1.3 ATAC-seq

2.1.3.1 Fastq download

2.1.3.1.1 Extracting data information

2.1.3.1.2 Download the data

2.1.3.2 Data pre-processing

2.1.3.2.1 Mapping ATAC to Hg38

2.1.3.2.2 Peak calling and analyses

2.1.3.3 Differential analyses

2.1.3.3.1 Load data in R

2.1.3.3.2 Generating counts

2.1.3.3.3 Differential binding analysis

3 Build power4peaks objects

3.1 Tamoxifen

3.2 Endometrium

3.3 ATAC-seq

3.4 Validate statistics distribution

3.4.1 Tamoxifen

3.4.2 Endometrium

3.4.3 ATAC-seq

4 Compute statistical power

4.1 Tamoxifen

4.2 Endometrium

4.3 ATAC-seq

5 References

6 Session info

`power4peaks` tutorial

2.1.2.1 Load the data in `DiffBind`

3 Build `power4peaks` objects