
1 Introduction

The concept behind Rseb (R-package for Simplified End-to-end data Build-up) is to provide a toolkit that automates different types of tasks, avoiding code retyping and loss of time. A further advantage is that most of the functions are written in R, which makes the package usable on all operating systems.

From a more biological point of view, this package simplifies many downstream analyses of high-throughput data that would otherwise require many hours of coding and case-by-case adaptation. Moreover, most of the functions for visualizing this kind of data are designed to be highly customizable, exposing a larger number of graphical parameters than the commonly used tools already available. Another advantage of this package is that it offers multiple methods, each with a corresponding visualization, to quantify the difference in signal between samples, qualitatively and/or quantitatively, without any additional coding.

The guide is divided into four parts: 1) the first explores some analyses and visualizations of RNA-seq data; 2) the second covers the representation and quantification of targeted sequencing data (ChIP-seq, ATAC-seq, etc.); 3) the third focuses on the analysis of RNA expression by RT-qPCR; 4) the last focuses on some of the “general” tools available in this package.

1.1 Citation

If you use this package, please cite:

Citation

… The publication is under revision …
Gregoricchio S., et al., “Title”.
Journal (Year).
doi: ABCD


citation("Rseb")
> 
> To cite package 'Rseb' in publications use:
> 
>   Sebastian Gregoricchio (2022). Rseb: An R-package for NGS data
>   managing and visualization. R package version 0.3.0.
>   https://sebastian-gregoricchio.github.io/Rseb/
>   https://github.com/sebastian-gregoricchio/Rseb/
>   https://sebastian-gregoricchio.github.io/
> 
> A BibTeX entry for LaTeX users is
> 
>   @Manual{,
>     title = {Rseb: An R-package for NGS data managing and visualization},
>     author = {Sebastian Gregoricchio},
>     year = {2022},
>     note = {R package version 0.3.0},
>     url = {https://sebastian-gregoricchio.github.io/Rseb/
> https://github.com/sebastian-gregoricchio/Rseb/
> https://sebastian-gregoricchio.github.io/},
>   }
> 
> ATTENTION: This citation information has been auto-generated from the
> package DESCRIPTION file and may need manual editing, see
> 'help("citation")'.


2 RNA-seq data

A common analysis performed on RNA-seq data is the evaluation of the differentially expressed genes between two different conditions (e.g., untreated vs treated cells).

For this analysis it is common to use the R package DESeq2 [1], which returns a table like the following one:

data("RNAseq", package = "Rseb")
RNAseq
DESeq2 results table example

geneName       baseMean      log2FC      lfcSE      stat        pvalue     padj
Gm7831         1.013385      0.0017493   0.1547084  0.0113069   0.9909786  0.9996304
5330411O13Rik  1.697606      -0.0597426  0.1847122  -0.3234363  0.7463649  0.9996304
Lingo3         272.604552    0.4821823   0.1545217  3.1204835   0.0018055  0.0126236
Trim26         1233.292431   -0.1201515  0.0915186  -1.3128644  0.1892287  0.5989217
Uso1           10686.826190  -0.0070269  0.0678742  -0.1035278  0.9175441  0.9996304
2810454H06Rik  34.838593     0.1106764   0.2768170  0.3998179   0.6892906  0.9996304
Armcx4         1.525542      0.0317658   0.1800692  0.1764088   0.8599728  0.9996304
Gm26621        80.667209     0.1554742   0.2309429  0.6732148   0.5008107  0.9996304
Elf3           1.358421      -0.0638278  0.1718065  -0.3715098  0.7102579  0.9996304
Gm7442         4.913945      -0.1194980  0.2568927  -0.4651668  0.6418120  0.9996304


2.1 Differentially expressed genes

Differentially expressed genes are defined by the Fold Change (FC) of expression and the adjusted p-value (Padj). The function DE.status helps with this definition. It takes as input the log2 fold change and the adjusted p-value; thresholds on these two parameters then define four statuses:

  • UP-regulated (FC greater than threshold, significant p-value);
  • DOWN-regulated (FC lower than threshold, significant p-value);
  • UNRESPONSIVE/NoResp (FC within the range defined by the unresponsive FC threshold, not significant p-value);
  • NULL (all the other genes).

All the labels and thresholds can be customized. We can now add a column with the differential expression status to the original table.

library(dplyr)

# Add a DE.status column classifying each gene from its fold change and adjusted p-value
RNAseq <-
  RNAseq %>%
  mutate(DE.status = Rseb::DE.status(log2FC = log2FC,
                                     p.value.adjusted = padj,
                                     FC_threshold = 2, # Linear value
                                     FC_NoResp_left = 0.9, # Automatically 0.9 <= FC <= 1/0.9
                                     p.value_threshold = 0.05,
                                     low.FC.status.label = "DOWN",
                                     high.FC.status.label = "UP",
                                     unresponsive.label = "UNRESPONSIVE",
                                     null.label = "NULL"))
RNA-seq table with differential expression status

geneName       baseMean      log2FC      lfcSE      stat        pvalue     padj       DE.status
Gm7831         1.013385      0.0017493   0.1547084  0.0113069   0.9909786  0.9996304  UNRESPONSIVE
5330411O13Rik  1.697606      -0.0597426  0.1847122  -0.3234363  0.7463649  0.9996304  UNRESPONSIVE
Lingo3         272.604552    0.4821823   0.1545217  3.1204835   0.0018055  0.0126236  NULL
Trim26         1233.292431   -0.1201515  0.0915186  -1.3128644  0.1892287  0.5989217  UNRESPONSIVE
Uso1           10686.826190  -0.0070269  0.0678742  -0.1035278  0.9175441  0.9996304  UNRESPONSIVE
2810454H06Rik  34.838593     0.1106764   0.2768170  0.3998179   0.6892906  0.9996304  UNRESPONSIVE
Armcx4         1.525542      0.0317658   0.1800692  0.1764088   0.8599728  0.9996304  UNRESPONSIVE
Gm26621        80.667209     0.1554742   0.2309429  0.6732148   0.5008107  0.9996304  NULL
Elf3           1.358421      -0.0638278  0.1718065  -0.3715098  0.7102579  0.9996304  UNRESPONSIVE
Gm7442         4.913945      -0.1194980  0.2568927  -0.4651668  0.6418120  0.9996304  UNRESPONSIVE
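
To make the four-status rule concrete, the classification can be sketched in base R as below. This is only an illustration, not the Rseb implementation: the helper name `classify_de` and its argument names are hypothetical, and whether the real function treats values exactly on a threshold as inside or outside a class is an assumption here. The first three inputs are the log2FC and padj values of Gm7831, Lingo3 and Gm26621 from the table; the last two are made-up values added to show the UP and DOWN classes.

```r
# Hypothetical helper (not part of Rseb): assign a status to each gene
# from its log2 fold change and adjusted p-value.
classify_de <- function(log2FC, padj,
                        fc_threshold = 2,      # linear fold change
                        fc_noresp_left = 0.9,  # lower bound of the unresponsive range
                        p_threshold = 0.05) {
  fc <- 2^log2FC                          # back to the linear scale
  fc_noresp_right <- 1 / fc_noresp_left   # symmetric upper bound (about 1.11)

  significant     <- !is.na(padj) & padj <  p_threshold
  not_significant <- !is.na(padj) & padj >= p_threshold

  # Everything starts as "NULL"; the three other classes overwrite it.
  # Boundary handling (>=, <=) is an assumption of this sketch.
  status <- rep("NULL", length(fc))
  status[which(significant & fc >= fc_threshold)]     <- "UP"
  status[which(significant & fc <= 1 / fc_threshold)] <- "DOWN"
  status[which(not_significant &
               fc >= fc_noresp_left &
               fc <= fc_noresp_right)]                <- "UNRESPONSIVE"
  status
}

example_status <- classify_de(
  log2FC = c(0.0017493, 0.4821823, 0.1554742, 1.5, -1.5),
  padj   = c(0.9996304, 0.0126236, 0.9996304, 0.001, 0.001)
)
example_status
# "UNRESPONSIVE" "NULL" "NULL" "UP" "DOWN"
```

Lingo3 is significant but its linear fold change (about 1.4) is below 2, so it falls in the NULL class; Gm26621 is not significant but its fold change (about 1.114) lies just outside the 0.9 to 1/0.9 range, so it is also NULL rather than UNRESPONSIVE.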

The DE.status column can now be used to count the number of genes in each group:

# Count the genes in each status group, then append a row with the overall total
RNAseq.summary.table <-
  RNAseq %>%
  group_by(DE.status) %>%
  summarise(N = n()) %>%
  rbind(c("Total", nrow(RNAseq)))
RNA-seq differential expression summary

DE.status     N
DOWN          38
NULL          650
UNRESPONSIVE  1292
UP            20
Total         2000
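
For readers who prefer to avoid the dplyr dependency, a comparable tally can be sketched in base R. The status vector below is a toy example with made-up values, not the package data set, and the object names are hypothetical.

```r
# Toy status vector (illustrative values only)
status <- c("UP", "DOWN", "NULL", "NULL",
            "UNRESPONSIVE", "UNRESPONSIVE", "UNRESPONSIVE")

# table() counts the occurrences of each distinct status
counts <- table(status)

# Assemble the summary and append the overall total as a last row,
# keeping N as an integer column
summary_table <- data.frame(
  DE.status = c(names(counts), "Total"),
  N = c(as.integer(counts), length(status)),
  stringsAsFactors = FALSE
)
summary_table
```

Building the total into the vectors before creating the data frame keeps the N column numeric, which is convenient if the counts are reused in later computations.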


2.2 Representation of the RNA-seq data

Two simple representations of RNA-seq data are the MA-plot and the volcano-plot. The first shows the fold change of expression as a function of the basal expression of each gene, while the second shows the same fold change as a function of its statistical significance.
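
The difference between the two plots comes down to which quantities are placed on each axis. The base R sketch below computes both sets of coordinates for three genes copied from the DESeq2 example table. Using -log10 of the adjusted p-value for the volcano y-axis is a common convention and an assumption here; the plotting functions of the package may scale it differently.

```r
# Three rows copied from the DESeq2 example table
genes <- data.frame(
  geneName = c("Lingo3", "Trim26", "Uso1"),
  baseMean = c(272.604552, 1233.292431, 10686.826190),
  log2FC   = c(0.4821823, -0.1201515, -0.0070269),
  padj     = c(0.0126236, 0.5989217, 0.9996304)
)

# MA-plot: mean expression on x, fold change on y
genes$ma_x <- log2(genes$baseMean)
genes$ma_y <- genes$log2FC

# Volcano-plot: fold change on x, significance on y;
# -log10 places the most significant genes at the top
genes$volcano_x <- genes$log2FC
genes$volcano_y <- -log10(genes$padj)

genes[, c("geneName", "ma_x", "ma_y", "volcano_x", "volcano_y")]
```

In this toy subset Lingo3 has the smallest adjusted p-value, so it would sit highest on the volcano-plot, while Uso1 has the largest mean expression and would sit furthest right on the MA-plot.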

2.2.1 MA-plot

The MA-plot helps to estimate the difference between two samples by plotting the fold change of expression of each gene as a function of its mean expression across all the samples (all the replicates of both conditions compared, given by “baseMean” in the DESeq2 output table). Different colors can be used depending on the “DE.status” column that we just added to the RNA-seq table.

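library(ggplot2)

# MA-plot: mean expression (log2 baseMean) on x, log2 fold change on y,
# points colored by differential expression status
MA.plot <-
  ggplot(data = RNAseq,
         aes(x = log2(baseMean),
             y = log2FC,
             col = DE.status)) +
  geom_point(size = 2) +
  # Colors are assigned to the DE.status values in level order
  # (alphabetical for a character column: DOWN, NULL, UNRESPONSIVE, UP)
  scale_color_manual(values = c("#F8766D", "gray30", "#00A5CF", "#00BA38")) +
  ggtitle("MA-plot") +
  theme_classic()

MA.plot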