1 Introduction
The concept behind Rseb (R-package for Simplified End-to-end data Build-up) is to provide a toolkit that automates different types of tasks, avoiding code retyping and loss of time. Furthermore, most of the functions are written in the R language, which makes the package suitable for all operating systems.
From a more biological point of view, this package simplifies many downstream analyses of high-throughput data that would otherwise require many hours of coding and case-by-case adaptation. Moreover, most of the functions for visualizing this kind of data are designed to be highly customizable, exposing a larger number of graphical parameters than commonly used existing tools. Another advantage of this package is that it offers multiple methods, each with a corresponding visualization, to quantify the difference in signal between samples, qualitatively and/or quantitatively, without any additional coding.
The guide is divided into four parts: 1) the first explores some analyses and visualizations of RNA-seq data; 2) the second covers the representation and quantification of targeted sequencing data (ChIP-seq, ATAC-seq, etc.); 3) the third focuses on the analysis of RNA expression by RT-qPCR; 4) the last focuses on some of the “general” tools available in this package.
1.1 Citation
If you use this package, please cite:
… The publication is under revision …
Gregoricchio S., et al., “Title”. Journal (Year). doi: ABCD
citation("Rseb")
>
> To cite package 'Rseb' in publications use:
>
> Sebastian Gregoricchio (2022). Rseb: An R-package for NGS data
> managing and visualization. R package version 0.3.0.
> https://sebastian-gregoricchio.github.io/Rseb/
> https://github.com/sebastian-gregoricchio/Rseb/
> https://sebastian-gregoricchio.github.io/
>
> A BibTeX entry for LaTeX users is
>
> @Manual{,
> title = {Rseb: An R-package for NGS data managing and visualization},
> author = {Sebastian Gregoricchio},
> year = {2022},
> note = {R package version 0.3.0},
> url = {https://sebastian-gregoricchio.github.io/Rseb/
> https://github.com/sebastian-gregoricchio/Rseb/
> https://sebastian-gregoricchio.github.io/},
> }
>
> ATTENTION: This citation information has been auto-generated from the
> package DESCRIPTION file and may need manual editing, see
> 'help("citation")'.
2 RNA-seq data
A common analysis performed on RNA-seq data is the evaluation of the differentially expressed genes between two different conditions (e.g., untreated vs treated cells).
For this analysis it is common to use the R package DESeq2¹, which returns a table like the following one:
data("RNAseq", package = "Rseb")
RNAseq
geneName | baseMean | log2FC | lfcSE | stat | pvalue | padj |
---|---|---|---|---|---|---|
Gm7831 | 1.013385 | 0.0017493 | 0.1547084 | 0.0113069 | 0.9909786 | 0.9996304 |
5330411O13Rik | 1.697606 | -0.0597426 | 0.1847122 | -0.3234363 | 0.7463649 | 0.9996304 |
Lingo3 | 272.604552 | 0.4821823 | 0.1545217 | 3.1204835 | 0.0018055 | 0.0126236 |
Trim26 | 1233.292431 | -0.1201515 | 0.0915186 | -1.3128644 | 0.1892287 | 0.5989217 |
Uso1 | 10686.826190 | -0.0070269 | 0.0678742 | -0.1035278 | 0.9175441 | 0.9996304 |
2810454H06Rik | 34.838593 | 0.1106764 | 0.2768170 | 0.3998179 | 0.6892906 | 0.9996304 |
Armcx4 | 1.525542 | 0.0317658 | 0.1800692 | 0.1764088 | 0.8599728 | 0.9996304 |
Gm26621 | 80.667209 | 0.1554742 | 0.2309429 | 0.6732148 | 0.5008107 | 0.9996304 |
Elf3 | 1.358421 | -0.0638278 | 0.1718065 | -0.3715098 | 0.7102579 | 0.9996304 |
Gm7442 | 4.913945 | -0.1194980 | 0.2568927 | -0.4651668 | 0.6418120 | 0.9996304 |
2.1 Differentially expressed genes
Differentially expressed genes are defined by the Fold Change (FC) of expression and the adjusted p-value (Padj). The function DE.status helps with this definition. It takes as input the log2 fold change of expression and the adjusted p-value; thresholds on these two parameters then define four statuses:
- UP-regulated (FC greater than threshold, significant p-value);
- DOWN-regulated (FC lower than threshold, significant p-value);
- UNRESPONSIVE/NoResp (FC within the range defined by the unresponsive FC threshold, not significant p-value);
- NULL (all the other genes).
All labels and thresholds can be customized. We can now add a column with the differential expression status to the original table.
require(dplyr)

RNAseq <-
  RNAseq %>%
  mutate(DE.status = Rseb::DE.status(log2FC = RNAseq$log2FC,
                                     p.value.adjusted = RNAseq$padj,
                                     FC_threshold = 2,      # Linear value
                                     FC_NoResp_left = 0.9,  # Automatically 0.9 <= FC <= 1/0.9
                                     p.value_threshold = 0.05,
                                     low.FC.status.label = "DOWN",
                                     high.FC.status.label = "UP",
                                     unresponsive.label = "UNRESPONSIVE",
                                     null.label = "NULL"))
geneName | baseMean | log2FC | lfcSE | stat | pvalue | padj | DE.status |
---|---|---|---|---|---|---|---|
Gm7831 | 1.013385 | 0.0017493 | 0.1547084 | 0.0113069 | 0.9909786 | 0.9996304 | UNRESPONSIVE |
5330411O13Rik | 1.697606 | -0.0597426 | 0.1847122 | -0.3234363 | 0.7463649 | 0.9996304 | UNRESPONSIVE |
Lingo3 | 272.604552 | 0.4821823 | 0.1545217 | 3.1204835 | 0.0018055 | 0.0126236 | NULL |
Trim26 | 1233.292431 | -0.1201515 | 0.0915186 | -1.3128644 | 0.1892287 | 0.5989217 | UNRESPONSIVE |
Uso1 | 10686.826190 | -0.0070269 | 0.0678742 | -0.1035278 | 0.9175441 | 0.9996304 | UNRESPONSIVE |
2810454H06Rik | 34.838593 | 0.1106764 | 0.2768170 | 0.3998179 | 0.6892906 | 0.9996304 | UNRESPONSIVE |
Armcx4 | 1.525542 | 0.0317658 | 0.1800692 | 0.1764088 | 0.8599728 | 0.9996304 | UNRESPONSIVE |
Gm26621 | 80.667209 | 0.1554742 | 0.2309429 | 0.6732148 | 0.5008107 | 0.9996304 | NULL |
Elf3 | 1.358421 | -0.0638278 | 0.1718065 | -0.3715098 | 0.7102579 | 0.9996304 | UNRESPONSIVE |
Gm7442 | 4.913945 | -0.1194980 | 0.2568927 | -0.4651668 | 0.6418120 | 0.9996304 | UNRESPONSIVE |
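The rule behind these labels can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the Rseb implementation: `classify_de` is a hypothetical helper, DOWN is assumed to mean a linear FC below 1/FC_threshold, and genes with a missing adjusted p-value are assumed to stay NULL.

```r
# Hypothetical helper illustrating the classification rule.
# It is NOT the source code of Rseb::DE.status.
classify_de <- function(log2FC, padj,
                        FC_threshold = 2,      # linear fold change
                        FC_NoResp_left = 0.9,  # lower bound of the unresponsive range
                        p_threshold = 0.05) {
  FC <- 2^log2FC                               # convert log2 FC to linear scale
  status <- rep("NULL", length(FC))            # default label

  significant     <- !is.na(padj) & padj <  p_threshold
  not_significant <- !is.na(padj) & padj >= p_threshold

  # Assumption: UP/DOWN use strict inequalities on the linear FC
  status[significant & FC > FC_threshold]     <- "UP"
  status[significant & FC < 1 / FC_threshold] <- "DOWN"

  # Unresponsive range is symmetric on the linear scale: 0.9 <= FC <= 1/0.9
  in_range <- FC >= FC_NoResp_left & FC <= 1 / FC_NoResp_left
  status[not_significant & in_range] <- "UNRESPONSIVE"

  status
}

# Four genes taken from the table above
genes <- data.frame(
  geneName = c("Gm7831", "Lingo3", "Gm26621", "Gm7442"),
  log2FC   = c(0.0017493, 0.4821823, 0.1554742, -0.1194980),
  padj     = c(0.9996304, 0.0126236, 0.9996304, 0.9996304)
)
genes$DE.status <- classify_de(genes$log2FC, genes$padj)
print(genes)
```

Lingo3 is significant but its linear FC (about 1.4) is below 2, and Gm26621 is not significant but its linear FC (about 1.114) lies just outside the 0.9 to 1/0.9 range, so both end up as NULL, as in the table above.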
It is now possible to use the DE.status column to count the number of genes in each group:
RNAseq.summary.table <-
  RNAseq %>%
  group_by(DE.status) %>%
  summarise(N = n()) %>%
  rbind(c("Total", nrow(RNAseq)))
DE.status | N |
---|---|
DOWN | 38 |
NULL | 650 |
UNRESPONSIVE | 1292 |
UP | 20 |
Total | 2000 |
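Appending the "Total" row with rbind() and a character vector is likely to turn the N column into text. If numeric counts are needed downstream, a minimal base-R sketch such as the following keeps them as integers. Here `status` is a toy vector standing in for RNAseq$DE.status, and `summary_table` is an illustrative name.

```r
# Toy vector standing in for RNAseq$DE.status (made-up values)
status <- c("UP", "DOWN", "NULL", "NULL", "UNRESPONSIVE", "UNRESPONSIVE")

# Count the genes per status
counts <- table(status)

# Build the summary with an explicit integer "Total" row
summary_table <- data.frame(
  DE.status = c(names(counts), "Total"),
  N         = c(as.integer(counts), length(status))
)

print(summary_table)
```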
2.2 Representation of the RNA-seq data
Two simple representations of RNA-seq data are the MA-plot and the volcano plot. The first shows the Fold Change of expression as a function of the basal expression of each gene, while the second also shows the FC, but as a function of the significance of the FC computation.
2.2.1 MA-plot
The MA-plot helps to estimate the difference between two samples by plotting the Fold Change of expression of a gene as a function of its expression across all the samples (all the replicates of both conditions compared, given by “baseMean” in the DESeq2 output table). Different colors can be used depending on the “DE.status” column that we just added to the RNA-seq table.
require(ggplot2)

MA.plot <-
  ggplot(data = RNAseq,
         aes(x = log2(baseMean),
             y = log2FC,
             col = DE.status)) +
  geom_point(size = 2) +
  scale_color_manual(values = c("#F8766D", "gray30", "#00A5CF", "#00BA38")) +
  ggtitle("MA-plot") +
  theme_classic()

MA.plot
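In the call above the colors are passed as an unnamed vector, so ggplot2 matches them to the DE.status groups in order (alphabetical for a character column). A possible variant, sketched here on made-up data, is to name the color vector so that each status keeps its color even when a group is absent. The names `toy`, `status.colors` and `MA.plot.named` are illustrative and not part of Rseb.

```r
library(ggplot2)

# Made-up data with the same columns used by the MA-plot above
toy <- data.frame(
  baseMean  = c(10, 200, 1500, 40),
  log2FC    = c(1.8, -1.5, 0.05, 0.4),
  DE.status = c("UP", "DOWN", "UNRESPONSIVE", "NULL")
)

# Named vector: each status is tied explicitly to one color
status.colors <- c("DOWN"         = "#F8766D",
                   "NULL"         = "gray30",
                   "UNRESPONSIVE" = "#00A5CF",
                   "UP"           = "#00BA38")

MA.plot.named <-
  ggplot(data = toy,
         aes(x = log2(baseMean),
             y = log2FC,
             col = DE.status)) +
  geom_point(size = 2) +
  scale_color_manual(values = status.colors) +
  ggtitle("MA-plot") +
  theme_classic()

MA.plot.named
```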