R/mat2SnpMatrix.R
This function converts a genotype matrix coded as 0/1/2/NA or AA/AB/BB to a
snpStats::SnpMatrix object. It includes checks for coding validity,
missing values, and duplicate sample or SNP IDs, and preserves row and column
names from the input.
geno: A samples x SNPs matrix or data.frame of genotypes, either
numeric/integer (0, 1, 2, or NA) or character (see coding). rownames =
sample IDs, colnames = SNP IDs.
coding: One of "012" or "AAABBB". Used for character inputs only.
"012" expects "0", "1", "2", and missing_codes.
"AAABBB" expects "AA", "AB", "BB", and missing_codes.
missing_codes: Character values to treat as missing (only used when
geno is character), e.g., c("NA","-9",".").
If TRUE, verifies that row and column names are unique
(recommended).
A snpStats::SnpMatrix with the same dimnames as geno.
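The recoding and uniqueness checks described above can be sketched in base R. This is a minimal illustration and not the package's actual implementation: the helper names recode_genotypes() and check_unique_ids() are hypothetical, and the error messages are invented for the example. Only the geno, coding, and missing_codes semantics are taken from the argument descriptions.

```r
# Hypothetical helper: map character genotypes to 0/1/2, with NA for missing codes.
recode_genotypes <- function(x, coding = c("012", "AAABBB"),
                             missing_codes = c("NA", "-9", ".")) {
  coding <- match.arg(coding)
  valid <- if (coding == "012") c("0", "1", "2") else c("AA", "AB", "BB")
  out <- match(x, valid) - 1L                 # 0/1/2; NA for anything unmatched
  # Unmatched values must be true NAs or listed missing codes.
  bad <- is.na(out) & !is.na(x) & !(x %in% missing_codes)
  if (any(bad)) {
    stop("Unexpected genotype code(s): ",
         paste(unique(x[bad]), collapse = ", "))
  }
  dim(out) <- dim(x)                          # match() drops dimensions
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical helper: fail if sample or SNP IDs are duplicated.
check_unique_ids <- function(x) {
  for (axis in c("row", "column")) {
    ids <- if (axis == "row") rownames(x) else colnames(x)
    dup <- unique(ids[duplicated(ids)])
    if (length(dup) > 0) {
      stop(sprintf("Duplicate %s names: %s", axis, paste(dup, collapse = ", ")))
    }
  }
  invisible(TRUE)
}

g <- matrix(c("AA", "AB", ".", "BB"), nrow = 2,
            dimnames = list(c("ind1", "ind2"), c("snp1", "snp2")))
check_unique_ids(g)
r <- recode_genotypes(g, coding = "AAABBB", missing_codes = ".")
r
```

Validating before conversion keeps unexpected codes from being silently turned into missing genotypes.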
The function accepts both matrix and data.frame inputs. For
data.frame objects, all columns are coerced to a common type using
as.matrix(), which preserves rownames and colnames.
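The as.matrix() behavior mentioned above can be checked directly in base R. The data frames below are made-up toy inputs, used only to show that the column types are unified and the dimnames carry over.

```r
# All-integer columns: the result is an integer matrix with dimnames kept.
df <- data.frame(
  snp1 = c(0L, 1L, 2L),
  snp2 = c(2L, NA, 0L),
  row.names = c("ind1", "ind2", "ind3")
)
m <- as.matrix(df)
typeof(m)      # "integer"
dimnames(m)

# Character columns: the result is a character matrix.
df_chr <- data.frame(
  snp1 = c("AA", "AB", "BB"),
  snp2 = c("BB", ".", "AA"),
  row.names = rownames(df)
)
m_chr <- as.matrix(df_chr)
typeof(m_chr)  # "character"
```

Mixing numeric and character columns in one data.frame yields a character matrix, so it is safer to keep a single coding per input.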
The returned SnpMatrix object stores each genotype as a single byte,
which is memory-efficient compared to integer storage. However, large datasets
still require substantial RAM. For very large genotype sets, consider on-disk
storage such as GDS files (via SNPRelate) or the file-backed matrices in bigsnpr.
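As a rough back-of-the-envelope check of the memory claim, the sketch below compares one byte per genotype with R's 4-byte integers. The helper estimate_genotype_bytes() and the dataset size are hypothetical, and the estimate ignores object overhead and dimnames.

```r
# Hypothetical helper: approximate storage for an n_samples x n_snps genotype matrix.
estimate_genotype_bytes <- function(n_samples, n_snps) {
  cells <- as.numeric(n_samples) * as.numeric(n_snps)  # avoid integer overflow
  c(one_byte_per_genotype = cells * 1,  # single-byte storage, as in SnpMatrix
    integer_matrix        = cells * 4)  # R integers take 4 bytes each
}

est <- estimate_genotype_bytes(10000, 500000)
round(est / 1024^3, 1)  # in GiB: about 4.7 vs 18.6
```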
# Numeric 0/1/2 with NAs
set.seed(1)
geno <- matrix(sample(c(0L,1L,2L,NA), 20, replace=TRUE), nrow=5)
rownames(geno) <- paste0("ind", 1:5)
colnames(geno) <- paste0("snp", 1:4)
SM <- as_snpmatrix(geno)
# Character AA/AB/BB
geno_c <- matrix(sample(c("AA","AB","BB","."), 20, replace=TRUE,
                        prob=c(.35,.3,.3,.05)), nrow=5)
rownames(geno_c) <- rownames(geno)
colnames(geno_c) <- colnames(geno)
SMc <- as_snpmatrix(geno_c, coding="AAABBB", missing_codes=".")