Applies flexible quality control filters to an object of class SNPDataLong. Supported filters are SNP call rate, minor allele frequency (MAF), Hardy-Weinberg equilibrium (HWE), removal of monomorphic SNPs, and exclusion of specific chromosomes. Optionally, SNPs without positions and SNPs that share a genomic position (keeping the one with the highest MAF) can also be removed.

qcSNPs(x, ...)

# S4 method for class 'SNPDataLong'
qcSNPs(
  x,
  missing_ind = NULL,
  missing_snp = NULL,
  min_snp_cr = NULL,
  min_maf = NULL,
  hwe = NULL,
  snp_position = NULL,
  no_position = NULL,
  snp_mono = FALSE,
  remove_chr = NULL,
  action = c("report", "filter", "both")
)

Arguments

x

An object of class SNPDataLong.

...

Additional optional arguments.

missing_ind

Maximum allowed proportion of missing data per individual (currently not implemented).

missing_snp

Maximum allowed proportion of missing data per SNP (currently not implemented).

min_snp_cr

Minimum acceptable call rate for SNPs (e.g., 0.95). SNPs below this threshold are removed.
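
To make the call rate threshold concrete, here is a minimal sketch in base R. It assumes genotypes coded 0/1/2 with NA for missing calls, individuals in rows and SNPs in columns (as in the example at the end of this page). The matrix and variable names are hypothetical, and this is not necessarily how qcSNPs computes call rate internally.

```r
# Hypothetical genotype matrix: rows are individuals, columns are SNPs,
# coded 0/1/2 with NA for missing calls (an assumption of this sketch).
geno <- matrix(c(0, 1, NA, 2,
                 1, NA, NA, 0,
                 2, 1, 0, 1),
               nrow = 4,
               dimnames = list(paste0("ind", 1:4), paste0("snp", 1:3)))

# Call rate per SNP: proportion of non-missing genotypes in each column.
call_rate <- colMeans(!is.na(geno))

# SNPs whose call rate falls below a threshold of 0.75.
low_cr <- colnames(geno)[call_rate < 0.75]
```

Here snp1 has a call rate of exactly 0.75 and is kept under a strict "below the threshold" rule, while snp2 (call rate 0.5) is flagged.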

min_maf

Minimum minor allele frequency allowed for SNPs (e.g., 0.05). SNPs with lower MAF are removed.
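
As an illustration of what a MAF threshold acts on, the sketch below computes MAF from a 0/1/2 genotype matrix in base R. The coding (count of one allele per individual, NA for missing) and all names are assumptions of this sketch, not the package's confirmed internals.

```r
# Hypothetical genotypes: each entry counts copies of one allele (0, 1 or 2).
geno <- matrix(c(0, 0, 1, 2,
                 0, 0, 0, NA,
                 2, 2, 1, 2),
               nrow = 4,
               dimnames = list(paste0("ind", 1:4), paste0("snp", 1:3)))

# Frequency of the counted allele: mean allele count divided by 2,
# ignoring missing genotypes.
p <- colMeans(geno, na.rm = TRUE) / 2

# The minor allele frequency is the smaller of p and 1 - p.
maf <- pmin(p, 1 - p)

# SNPs that fall below a threshold of 0.05.
low_maf <- colnames(geno)[maf < 0.05]
```

In this toy data snp2 has a MAF of 0 and is the only SNP flagged.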

hwe

p-value threshold for the Hardy-Weinberg equilibrium test (e.g., 1e-6). SNPs with a test p-value below this threshold are removed.
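
The page does not state which HWE test is used. As one common possibility, the sketch below implements a chi-square goodness-of-fit test with one degree of freedom for a single SNP coded 0/1/2. The helper name hwe_pvalue is hypothetical, and the package may use a different test (for example an exact test).

```r
# Hypothetical helper: chi-square HWE test for one SNP coded 0/1/2.
hwe_pvalue <- function(g) {
  g <- g[!is.na(g)]                      # drop missing genotypes
  n <- length(g)
  if (n == 0) return(NA_real_)           # no called genotypes
  obs <- c(sum(g == 0), sum(g == 1), sum(g == 2))
  p <- (2 * obs[3] + obs[2]) / (2 * n)   # frequency of the counted allele
  q <- 1 - p
  expd <- n * c(q^2, 2 * p * q, p^2)     # expected counts under HWE
  if (any(expd == 0)) return(NA_real_)   # monomorphic: test is undefined
  chi2 <- sum((obs - expd)^2 / expd)
  pchisq(chi2, df = 1, lower.tail = FALSE)
}

in_hwe  <- rep(c(0, 1, 2), times = c(25, 50, 25))  # matches HWE exactly
all_het <- rep(1, 100)                             # heterozygote excess

p_in_hwe  <- hwe_pvalue(in_hwe)
p_all_het <- hwe_pvalue(all_het)
flagged   <- p_all_het < 1e-6
```

With a threshold of 1e-6, the all-heterozygote SNP would be flagged, while the SNP whose counts match the HWE expectation would be kept.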

snp_position

Logical. If TRUE, removes SNPs mapped to the same genomic position, retaining only the one with the highest MAF.
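
One way to picture the "keep the SNP with the highest MAF per position" rule is the base R sketch below. The map columns Name, Chromosome and Position follow the example on this page. The maf column, the treatment of a position as a chromosome/position pair, and the tie-breaking by row order are assumptions of this sketch.

```r
# Hypothetical map with a precomputed MAF per SNP.
map <- data.frame(Name = paste0("snp", 1:5),
                  Chromosome = c(1, 1, 1, 2, 2),
                  Position = c(100, 100, 250, 100, 100),
                  maf = c(0.10, 0.30, 0.20, 0.40, 0.05),
                  stringsAsFactors = FALSE)

# Sort by decreasing MAF so the first SNP seen at each position
# is the one with the highest MAF.
ord <- map[order(-map$maf), ]

# Keep the first occurrence of each chromosome/position pair.
keep <- ord[!duplicated(ord[, c("Chromosome", "Position")]), ]
keep <- keep[order(keep$Chromosome, keep$Position), ]

# SNPs dropped as lower-MAF duplicates.
removed <- setdiff(map$Name, keep$Name)
```

Here snp1 and snp5 are dropped because snp2 and snp4 share their positions and have higher MAF.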

no_position

Logical. If TRUE, removes SNPs without defined genomic positions.

snp_mono

Logical. If TRUE, removes monomorphic SNPs (with no variation).
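
A minimal sketch of one way to detect monomorphic SNPs, assuming 0/1/2 coding with NA for missing calls. It treats a SNP as monomorphic when all called genotypes are identical, which is an interpretation of "no variation" and may differ from the rule the package applies (for instance, a MAF of 0).

```r
# Hypothetical genotype matrix: rows are individuals, columns are SNPs.
geno <- matrix(c(0, 0, 0, 0,
                 0, 1, 2, 1,
                 2, 2, NA, 2),
               nrow = 4,
               dimnames = list(paste0("ind", 1:4), paste0("snp", 1:3)))

# Number of distinct non-missing genotypes observed for each SNP.
n_distinct <- apply(geno, 2, function(g) length(unique(g[!is.na(g)])))

# SNPs with at most one distinct genotype show no variation.
mono <- colnames(geno)[n_distinct <= 1]
```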

remove_chr

Character vector of chromosomes to exclude (e.g., c("X", "Y")).
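
The map-level filters (remove_chr and no_position) can be pictured with the base R sketch below. The map columns follow the example on this page. Treating an NA Position as "no defined position" is an assumption, since the page does not say how undefined positions are encoded.

```r
# Hypothetical map; chromosomes stored as character so "X" and "Y" fit.
map <- data.frame(Name = paste0("snp", 1:5),
                  Chromosome = c("1", "X", "2", "Y", "3"),
                  Position = c(100, 200, NA, 400, 500),
                  stringsAsFactors = FALSE)

remove_chr <- c("X", "Y")

# SNPs located on an excluded chromosome.
on_excluded_chr <- map$Chromosome %in% remove_chr

# SNPs without a position (assumed here to be encoded as NA).
no_pos <- is.na(map$Position)

# SNPs that pass both filters.
kept <- map$Name[!on_excluded_chr & !no_pos]
```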

action

One of "report" (returns a list of the SNPs removed by each filter and the SNPs retained), "filter" (returns the filtered SNPDataLong object), or "both" (returns both).

Value

Depending on the action argument:

- "report": a list of the SNPs removed by each filter and the SNPs retained.
- "filter": the filtered SNPDataLong object.
- "both": a list containing the filtered object and the detailed report.

Examples

if (FALSE) { # \dontrun{
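# Simulate a 10 x 10 genotype matrix coded 0/1/2, with about 5% missing calls
set.seed(123)
mat <- matrix(sample(c(0, 1, 2, NA), 100,
              replace = TRUE, prob = c(0.4, 0.4, 0.15, 0.05)),
              nrow = 10, ncol = 10)
colnames(mat) <- paste0("snp", 1:10)   # SNPs in columns
rownames(mat) <- paste0("ind", 1:10)   # individuals in rows

# SNP map: one row per SNP, all on chromosome 1 at distinct positions
map <- data.frame(Name = colnames(mat), Chromosome = 1, Position = 1:10)

# Build the SNPDataLong object from the genotypes and the map
x <- new("SNPDataLong",
         geno = mat,
         map = map,
         path = "dummy_path",
         xref_path = rep("chip1", 10))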

# Example using multiple filters
qcSNPs(x,
       min_snp_cr = 0.8,
       min_maf = 0.05,
       snp_mono = TRUE,
       no_position = TRUE,
       snp_position = TRUE,
       action = "filter")
} # }