Imputes missing values using one of the available algorithms: missForest, kNN, tkNN, corkNN, LLS, SVD, BPCA, PPCA, or RegImpute.

impute.counts(
  DEprot.object,
  method = "missForest",
  use.normalized.data = TRUE,
  overwrite.imputation = FALSE,
  missForest.max.iterations = 100,
  missForest.variable.wise.OOBerror = TRUE,
  missForest.cores = 1,
  missForest.parallel.mode = "variables",
  kNN.n.nearest.neighbours = 10,
  LLS.k = 2,
  pcaMethods.nPCs.to.test = 5,
  RegImpute.max.iterations = 10,
  RegImpute.fillmethod = "row_mean",
  seed = NULL,
  verbose = FALSE
)

Arguments

DEprot.object

A DEprot object, as generated by load.counts.

method

String indicating the imputation method to use. One among: 'missForest', 'kNN' (VIM), 'tkNN' (imputomics), 'corkNN' (imputomics), 'LLS' (pcaMethods), 'SVD' (a.k.a svdImpute, pcaMethods), 'BPCA' (pcaMethods), 'PPCA' (pcaMethods), 'RegImpute' (DreamAI). Default: "missForest".

use.normalized.data

Logical value indicating whether the imputation should be performed on the normalized data. Default: TRUE.

overwrite.imputation

Logical value indicating whether the table of imputed counts should be overwritten if one is already available. Default: FALSE.

missForest.max.iterations

Maximum number of iterations for the missForest algorithm. Default: 100.

missForest.variable.wise.OOBerror

Logical value to define whether the OOB error is returned for each variable separately. Default: TRUE.

missForest.cores

Number of cores used to run the missForest algorithm. If missForest.cores is greater than 1, the imputation will be run in parallel; with 1 (or lower) it is run sequentially. Two parallel modes are possible and can be defined by the parameter missForest.parallel.mode. Default: 1.

missForest.parallel.mode

Defines the mode to use for the parallelization; ignored when missForest.cores is 1 or lower. One among: 'variables', 'forests'. Default: "variables". See also the documentation of the missForest function.
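
As a rough illustration of the two missForest parallelization arguments, the call below requests more than one core and selects the 'forests' mode. It is only a sketch: it assumes the DEprot package and its test.toolbox example data are installed, the object name dpo.rf is arbitrary, and whether a parallel backend has to be registered beforehand is not stated on this page.

```r
# Only run when DEprot is available (assumption: test.toolbox ships with it,
# as in the Examples section of this page).
if (requireNamespace("DEprot", quietly = TRUE)) {
  dpo.rf <- DEprot::impute.counts(
    DEprot.object = DEprot::test.toolbox$dpo.norm,
    method = "missForest",
    missForest.cores = 2,                 # more than one core requested
    missForest.parallel.mode = "forests"  # alternative to the default "variables"
  )
}
```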

kNN.n.nearest.neighbours

Numeric value indicating the number of nearest neighbors to use to perform the kNN imputation. Default: 10.
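
To give an intuition for what the number of neighbours controls, here is a minimal base-R sketch of nearest-neighbour imputation on a made-up matrix. The helper knn_impute_sketch is hypothetical and is not the routine used by the package; the VIM implementation may differ in details such as the distance measure and how neighbour values are aggregated.

```r
# Toy intensity matrix: rows = proteins, columns = samples (values are made up).
mat <- rbind(
  p1 = c(10, 11, 12),
  p2 = c(10, NA, 12),
  p3 = c(20, 21, 22),
  p4 = c(11, 12, 13)
)

# Hypothetical helper: fill each NA with the mean of that column taken over
# the k complete rows closest to the incomplete row.
knn_impute_sketch <- function(m, k = 2) {
  out <- m
  complete.rows <- which(rowSums(is.na(m)) == 0)
  for (i in which(rowSums(is.na(m)) > 0)) {
    obs <- !is.na(m[i, ])                 # columns observed in row i
    donors <- setdiff(complete.rows, i)   # candidate neighbours
    # Euclidean distance computed only on the columns observed in row i
    d <- sapply(donors, function(j) sqrt(sum((m[i, obs] - m[j, obs])^2)))
    nearest <- donors[order(d)][seq_len(min(k, length(donors)))]
    out[i, !obs] <- colMeans(m[nearest, !obs, drop = FALSE])
  }
  out
}

imputed <- knn_impute_sketch(mat, k = 2)
imputed["p2", 2]  # mean of the second column of the two nearest rows (p1, p4)
```

With a larger k, more distant rows contribute to each filled value, which smooths the imputation at the cost of local detail.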

LLS.k

Cluster size, i.e. the number of similar genes used for regression. Default: 2.

pcaMethods.nPCs.to.test

Numeric value indicating the number of principal components (PCs) to test in order to find the optimal number of PCs to use in the imputation methods from the pcaMethods package. This includes: 'LLS', 'SVD' (a.k.a. 'svdImpute'), 'BPCA', and 'PPCA'. Default: 5.

RegImpute.max.iterations

Numeric value indicating the maximum number of iterations for the imputation with RegImpute (from DreamAI). Default: 10.

RegImpute.fillmethod

String identifying the fill method to be used in the RegImpute method (from DreamAI). One among "row_mean" and "zeros". Default: "row_mean". A warning is thrown if "row_median" is used.

seed

Numeric value indicating the seed to use for the randomization. Default: NULL, in which case a seed is generated automatically (it is saved in the seed element of the final imputation list).
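
The role of the seed can be illustrated in base R: re-using the same seed reproduces the same random draws, which is why storing an automatically generated seed makes a run repeatable. The way the seed is drawn here (sample.int) is an assumption made for illustration, not necessarily what the function does internally.

```r
seed <- NULL

# Hypothetical fallback: draw a seed when none is supplied.
if (is.null(seed)) seed <- sample.int(1e6, 1)

set.seed(seed)
first <- runif(3)

set.seed(seed)   # same seed again
second <- runif(3)

identical(first, second)  # the two draws match
```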

verbose

Logical value indicating whether processing messages should be printed. Default: FALSE.

Value

A DEprot object. The boxplot showing the distribution of the protein intensities is regenerated and stored in the boxplot.imputed slot. A list with the parameters and other information about the imputation is also added to the imputation slot.
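
The returned elements could be inspected along these lines. This is a sketch under assumptions: the DEprot package and its test.toolbox data are installed, the object is an S4 object whose slots are reachable with `@`, and the slot and element names (imputation, boxplot.imputed, seed) are those given in the descriptions on this page.

```r
if (requireNamespace("DEprot", quietly = TRUE)) {
  dpo <- DEprot::impute.counts(DEprot.object = DEprot::test.toolbox$dpo.norm)

  # List with the parameters and other information about the imputation
  imputation.info <- dpo@imputation
  str(imputation.info, max.level = 1)

  # Seed used for the run (generated automatically when seed = NULL)
  print(imputation.info$seed)

  # Boxplot of the protein intensities after imputation
  print(dpo@boxplot.imputed)
}
```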

See also

Author

Sebastian Gregoricchio

Examples

dpo <- impute.counts(DEprot.object = DEprot::test.toolbox$dpo.norm,
                     method = "BPCA")
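
A further call, selecting the kNN method with a non-default number of neighbours and a fixed seed, might look as follows. It uses only arguments listed in the usage section; the specific values (5 neighbours, seed 1234) and the object name dpo.knn are arbitrary choices for illustration, and the example assumes DEprot and its test.toolbox data are installed.

```r
if (requireNamespace("DEprot", quietly = TRUE)) {
  dpo.knn <- DEprot::impute.counts(
    DEprot.object = DEprot::test.toolbox$dpo.norm,
    method = "kNN",
    kNN.n.nearest.neighbours = 5,  # default is 10
    seed = 1234,                   # fixed seed for a repeatable run
    verbose = TRUE                 # print processing messages
  )
}
```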