Computes the Root Mean Squared Error (RMSE) for each of the four available imputation algorithms: missForest, kNN, LLS and SVD. A new dataset containing only proteins with known values is created, and NAs are then artificially introduced into it (optionally respecting the "pattern" of the missing values). The percentage of NAs introduced equals the percentage of missing values in the original dataset. Finally, the imputed values in the new dataset are compared with the measured (expected) ones.
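The benchmarking idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by compare.imp.methods: the toy matrix, the 10% masking fraction and the row-mean imputation (a stand-in for missForest, kNN, LLS or SVD) are all hypothetical.

```r
# Illustrative sketch only: base R, no DEprot functions involved.
set.seed(42)

# Toy "complete" matrix: 50 proteins (rows) x 6 samples (columns)
complete <- matrix(rnorm(50 * 6, mean = 20, sd = 2), nrow = 50,
                   dimnames = list(paste0("prot", 1:50), paste0("s", 1:6)))

# Mask a fraction of the known values (assumed here: 10%)
na.fraction <- 0.10
masked.idx <- sample(length(complete), size = round(na.fraction * length(complete)))
masked <- complete
masked[masked.idx] <- NA

# Stand-in imputation (assumption): replace each NA with the mean of its row
row.means <- rowMeans(masked, na.rm = TRUE)
imputed <- masked
na.pos <- which(is.na(masked), arr.ind = TRUE)
imputed[na.pos] <- row.means[na.pos[, "row"]]

# RMSE between imputed and measured values, at the masked positions only
residuals <- imputed[masked.idx] - complete[masked.idx]
rmse <- sqrt(mean(residuals^2))
rmse
```

A lower RMSE means the imputed values lie closer to the measured ones. Repeating the last two steps with each imputation method on the same masked matrix gives comparable scores.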

compare.imp.methods(
  DEprot.object,
  percentage.test = 30,
  sample.group.column = NULL,
  use.normalized.data = TRUE,
  run.missForest = TRUE,
  run.kNN = TRUE,
  run.LLS = TRUE,
  run.SVD = TRUE,
  missForest.max.iterations = 100,
  missForest.variable.wise.OOBerror = TRUE,
  missForest.cores = 1,
  missForest.parallel.mode = "variables",
  kNN.n.nearest.neighbours = 10,
  LLS.k = 2,
  normalize.color.bar = TRUE,
  low.residual.color = "firebrick",
  zero.residual.color = "white",
  high.residual.color = "steelblue4",
  seed = NULL,
  verbose = FALSE
)

Arguments

DEprot.object

A DEprot object, as generated by load.counts or load.counts2.

percentage.test

Numeric value between 0 (excluded) and 100 indicating the percentage of proteins to use for the test dataset. Default: 30.

sample.group.column

String indicating the ID of a column of the metadata table. This column is used to reproduce the observed frequencies of n-missing values per protein, so that the NAs are not introduced completely at random in the dataset. Default: NULL, NAs are assigned randomly (with the same percentage of NAs present in the original table).
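One possible reading of this pattern-aware masking is sketched below in base R. It is an assumption about the general idea, not the DEprot implementation: the metadata table, the column name condition and the sampling scheme (drawing, for each protein and group, a number of NAs from the counts observed in the original table) are all hypothetical.

```r
# Illustrative sketch only; not the DEprot implementation.
set.seed(1)

# Hypothetical metadata: 6 samples in 2 groups
metadata <- data.frame(sample = paste0("s", 1:6),
                       condition = rep(c("ctrl", "treated"), each = 3),
                       stringsAsFactors = FALSE)

# Toy original table containing missing values
original <- matrix(rnorm(40 * 6), nrow = 40,
                   dimnames = list(NULL, metadata$sample))
original[sample(length(original), 30)] <- NA

# Test table: only fully measured proteins
test.set <- original[complete.cases(original), , drop = FALSE]
masked <- test.set

for (grp in unique(metadata$condition)) {
  cols <- metadata$sample[metadata$condition == grp]
  # Observed number of missing values per protein within this group
  n.missing <- rowSums(is.na(original[, cols, drop = FALSE]))
  for (i in seq_len(nrow(masked))) {
    # Draw an n-missing count from the observed frequencies
    n <- n.missing[sample.int(length(n.missing), 1)]
    if (n > 0) masked[i, sample(cols, n)] <- NA
  }
}

# Frequencies of n-missing values per protein in the masked test table
table(rowSums(is.na(masked)))
```

With a purely random assignment, by contrast, every cell of the test table would have the same probability of being masked, regardless of the group it belongs to.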

use.normalized.data

Logical value indicating whether the imputation should be performed on the normalized data. Default: TRUE.

run.missForest

Logical value indicating whether the test for the missForest imputation should be performed. Default: TRUE.

run.kNN

Logical value indicating whether the test for the kNN imputation should be performed. Default: TRUE.

run.LLS

Logical value indicating whether the test for the LLS imputation should be performed. Default: TRUE.

run.SVD

Logical value indicating whether the test for the SVD imputation should be performed. Default: TRUE.

missForest.max.iterations

Maximum number of iterations for the missForest algorithm. Default: 100.

missForest.variable.wise.OOBerror

Logical value to define whether the OOB error is returned for each variable separately. Default: TRUE.

missForest.cores

Number of cores used to run the missForest algorithm. If missForest.cores is 1 (or lower), the imputation will not be run in parallel. With more than one core, two parallelization modes are possible and can be selected with the parameter missForest.parallel.mode. Default: 1.

missForest.parallel.mode

Defines the mode to use for the parallelization; ignored when missForest.cores is 1 (or lower). One among: 'variables', 'forests'. Default: "variables". See also the documentation of the missForest function.

kNN.n.nearest.neighbours

Numeric value indicating the number of nearest neighbors to use to perform the kNN imputation. Default: 10.

LLS.k

Cluster size, i.e. the number of similar genes (here, proteins) used for the regression. Default: 2.

normalize.color.bar

Logical value indicating whether the color bar limits for the residuals in the correlation plots should be normalized among the methods. Default: TRUE, the absolute maximum of the residual color bar is set to the maximum of all the residuals identified across all the methods.

low.residual.color

String indicating any R-supported color that must be used for the negative values of the residuals color bar in the correlation plots. Default: "firebrick".

zero.residual.color

String indicating any R-supported color that must be used for the null residuals (zero, the mid-gradient color) of the residuals color bar in the correlation plots. Default: "white".

high.residual.color

String indicating any R-supported color that must be used for the positive values of the residuals color bar in the correlation plots. Default: "steelblue4".
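How the three colors and a shared color bar limit might fit together can be sketched with base R's grDevices. This is an illustration only: the residual values are made up, and the assumption that the color bar is symmetric around zero is an interpretation of the normalize.color.bar description, not confirmed behaviour of the package's plotting code.

```r
# Illustrative sketch only; residual values are made up for the example.
residuals.by.method <- list(
  kNN = c(-0.8, 0.1, 0.4),
  SVD = c(-0.2, 0.05, 1.5)
)

# One limit shared by all methods (assumed symmetric around zero)
shared.limit <- max(abs(unlist(residuals.by.method)))
limits <- c(-shared.limit, shared.limit)

# Diverging palette: low -> zero -> high residual colors
palette.fun <- grDevices::colorRampPalette(c("firebrick", "white", "steelblue4"))
colors <- palette.fun(101)

# Map a residual onto one of the 101 palette entries
residual.to.color <- function(x) {
  idx <- round((x - limits[1]) / diff(limits) * 100) + 1
  colors[idx]
}

residual.to.color(0)             # mid-gradient color
residual.to.color(shared.limit)  # color at the positive end of the bar
```

With a shared limit, the same color corresponds to the same residual in every method's plot, which makes the panels directly comparable. With per-method limits, each plot would use its own residual range instead.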

seed

Numeric value indicating the seed to use for the randomization. Default: NULL, a seed is generated automatically (and saved in the seed slot of the final object).

verbose

Logical value indicating whether processing messages should be printed. Default: FALSE.

overwrite.imputation

Logical value indicating whether the table of imputed counts should be overwritten if one is already available. Default: FALSE.

Value

A DEprot.RMSE object.