A wrapper around the count_kmers function that computes several types of k-mers in a single call.

Multiple k-mer configurations are supplied as parallel sequences. Each configuration-related parameter (i.e., k_vector, positional_vector, and kmer_gaps_list) is a vector or a list, and the i-th entry of each one describes the i-th configuration.
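The pairing of entries across the configuration parameters can be sketched in base R. The helper `describe_configurations` below is hypothetical (it is not part of seqR), and its `FALSE` default for `positional_vector` is a placeholder rather than the package's actual option value. It only shows how the i-th entries of the three sequences combine into one configuration.

```r
# Hypothetical helper, not part of seqR: unpacks the parallel
# configuration sequences into one list entry per configuration.
describe_configurations <- function(k_vector,
                                    positional_vector = rep(FALSE, length(k_vector)),
                                    kmer_gaps_list = rep(list(c()), length(k_vector))) {
  # All three sequences must describe the same number of configurations.
  stopifnot(length(positional_vector) == length(k_vector),
            length(kmer_gaps_list) == length(k_vector))
  lapply(seq_along(k_vector), function(i) {
    list(k = k_vector[i],
         positional = positional_vector[i],
         kmer_gaps = kmer_gaps_list[[i]])
  })
}

# Three configurations: 3-mers, positional 3-mers,
# and positional gapped 2-mers with a gap of length 1.
configs <- describe_configurations(
  k_vector = c(3, 3, 2),
  positional_vector = c(FALSE, TRUE, TRUE),
  kmer_gaps_list = list(NULL, NULL, c(1))
)
str(configs[[3]])
```

Keeping the sequences the same length is what makes the index i meaningful, which is why the sketch checks the lengths up front.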

count_multimers(
  sequences,
  k_vector,
  kmer_alphabet = getOption("seqR_kmer_alphabet_default"),
  positional_vector = rep(getOption("seqR_positional_default"), length(k_vector)),
  kmer_gaps_list = rep(list(c()), length(k_vector)),
  with_kmer_counts = getOption("seqR_with_kmer_counts_default"),
  with_kmer_names = getOption("seqR_with_kmer_names_default"),
  batch_size = getOption("seqR_batch_size_default"),
  hash_dim = getOption("seqR_hash_dim_default"),
  verbose = getOption("seqR_verbose_default")
)

Arguments

sequences

input sequences of one of two supported types: a string vector or a list of string vectors

k_vector

an integer vector that represents the lengths of k-mers. The i-th element corresponds to the value of k for the i-th k-mer configuration.

kmer_alphabet

a string vector representing the elements that should be used during the construction of k-mers. By default, all elements that are present in sequences are taken into account

positional_vector

a logical vector that indicates, for each k-mer configuration, whether the k-mers are positional. The i-th element corresponds to the i-th k-mer configuration

kmer_gaps_list

a list of integer vectors that represents the lengths of k-mer gaps for each configuration separately. The i-th element of the list corresponds to the lengths of gaps of the i-th k-mer configuration

with_kmer_counts

a single logical value that determines whether the result should contain k-mer frequencies

with_kmer_names

a single logical value that determines whether the result should contain human-readable k-mer names

batch_size

a single integer value that represents the number of sequences processed in a single step

hash_dim

a single integer value (1 <= hash_dim <= 500) representing the length of a hash vector that is internally used in the algorithm

verbose

a single logical value that denotes whether a user wants to get extra information on the current state of computations
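To make the positional and gap settings concrete, here is a minimal base-R sketch that enumerates the k-mers of one sequence. It is an illustration only, not seqR's implementation: `enumerate_kmers` is a hypothetical helper, it assumes a gaps vector holds k - 1 entries (one per pair of consecutive k-mer elements), and the `position_kmer` labels it builds for positional k-mers are an assumed format, not necessarily the column names seqR produces.

```r
# Hypothetical illustration, not seqR's algorithm.
# kmer_gaps[j] is the number of skipped elements between the
# j-th and (j + 1)-th element of the k-mer (0 means contiguous).
enumerate_kmers <- function(sequence, k,
                            kmer_gaps = rep(0, k - 1),
                            positional = FALSE) {
  stopifnot(length(kmer_gaps) == k - 1)
  chars <- strsplit(sequence, "")[[1]]
  # Offsets of the k-mer elements relative to the window start.
  offsets <- c(0, cumsum(kmer_gaps + 1))
  n_windows <- length(chars) - offsets[k]
  if (n_windows < 1) return(character(0))
  vapply(seq_len(n_windows), function(start) {
    kmer <- paste(chars[start + offsets], collapse = ".")
    # A positional k-mer is distinguished by where it starts.
    if (positional) paste(start, kmer, sep = "_") else kmer
  }, character(1))
}

# Gapped 2-mers with a gap of length 1: elements at positions i and i + 2.
table(enumerate_kmers("AAAAD", k = 2, kmer_gaps = c(1)))

# Positional 3-mers: the same content at different positions stays distinct.
enumerate_kmers("AAAAAA", k = 3, positional = TRUE)
```

Under these assumptions, "AAAAD" yields the gapped 2-mer A.A twice and A.D once, while "AAAAAA" yields four distinct positional 3-mers instead of a single 3-mer counted four times.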

Value

a sparse matrix (a Matrix object from the Matrix package) that holds the resulting k-mer matrix. A sparse representation is used to reduce memory consumption. The i-th row of the matrix represents k-mers found in the i-th input sequence. Each column represents a distinct k-mer. Column names follow a human-readable schema for k-mers if with_kmer_names = TRUE
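As a sketch of how such a result can be handled, the block below rebuilds by hand the 3 x 5 matrix printed in the first example of this page, using `sparseMatrix` from the Matrix package, and then summarises it. The matrix is constructed manually for illustration, so no seqR call is involved; the variable names are arbitrary.

```r
library(Matrix)

# Hand-built copy of the 1-mer result for
# c("AAAACFVV", "AAAAAA", "AAAAD"); only non-zero cells are stored.
kmer_matrix <- sparseMatrix(
  i = c(1, 1, 1, 1, 2, 3, 3),          # row (sequence) indices
  j = c(1, 2, 3, 4, 2, 2, 5),          # column (k-mer) indices
  x = c(1, 4, 1, 2, 6, 4, 1),          # counts
  dims = c(3, 5),
  dimnames = list(NULL, c("C", "A", "F", "V", "D"))
)

total_per_kmer <- colSums(kmer_matrix)  # occurrences of each k-mer overall
stored_entries <- nnzero(kmer_matrix)   # number of non-zero cells
dense_copy <- as.matrix(kmer_matrix)    # dense form, for small results only
```

Converting to a dense matrix gives up the memory savings, so it is best reserved for small results or for functions that do not accept sparse input.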

Details

The comprehensive description of supported features is available in vignette("features-overview", package = "seqR").

See also

Function that counts k-mers of one type: count_kmers

Function that merges several k-mer matrices (rbind): rbind_columnwise

Examples

batch_size <- 1
# Counting 1-mers
count_multimers(
  c("AAAACFVV", "AAAAAA", "AAAAD"),
  k_vector = c(1),
  batch_size = batch_size)
#> Single-threaded mode enabled. In order to speed up computations, increase defined batch_size or use a default value
#> 3 x 5 sparse Matrix of class "dgCMatrix"
#>      C A F V D
#> [1,] 1 4 1 2 .
#> [2,] . 6 . . .
#> [3,] . 4 . . 1
# Counting 1-mers and 2-mers
count_multimers(
  c("AAAACFVV", "AAAAAA", "AAAAD"),
  k_vector = c(1, 2),
  batch_size = batch_size)
#> Single-threaded mode enabled. In order to speed up computations, increase defined batch_size or use a default value
#> 3 x 11 sparse Matrix of class "dgCMatrix"
#> [[ suppressing 11 column names ‘C’, ‘A’, ‘F’ ... ]]
#>
#> [1,] 1 4 1 2 . 3 1 1 1 1 .
#> [2,] . 6 . . . 5 . . . . .
#> [3,] . 4 . . 1 3 . . . . 1
# Counting 1-mers, 2-mers, and gapped 2-mers with the length of the gap = 1
count_multimers(
  c("AAAACFVV", "AAAAAA", "AAAAD"),
  k_vector = c(1, 2, 2),
  kmer_gaps_list = list(NULL, NULL, c(1)),
  batch_size = batch_size)
#> Single-threaded mode enabled. In order to speed up computations, increase defined batch_size or use a default value
#> 3 x 17 sparse Matrix of class "dgCMatrix"
#> [[ suppressing 17 column names ‘C’, ‘A’, ‘F’ ... ]]
#>
#> [1,] 1 4 1 2 . 3 1 1 1 1 . 2 1 1 1 1 .
#> [2,] . 6 . . . 5 . . . . . 4 . . . . .
#> [3,] . 4 . . 1 3 . . . . 1 2 . . . . 1
# Counting 3-mers, positional 3-mers, and positional gapped 2-mers with the length of the gap = 1
count_multimers(
  c("AAAACFVV", "AAAAAA", "AAAAD"),
  k_vector = c(3, 3, 2),
  kmer_gaps_list = list(NULL, NULL, c(1)),
  positional_vector = c(FALSE, TRUE, TRUE),
  batch_size = batch_size)
#> Single-threaded mode enabled. In order to speed up computations, increase defined batch_size or use a default value
#> 3 x 24 sparse Matrix of class "dgCMatrix"
#> [[ suppressing 24 column names ‘A.C.F_0.0’, ‘F.V.V_0.0’, ‘A.A.A_0.0’ ... ]]
#>
#> [1,] 1 1 2 1 1 . 1 1 1 1 1 1 . . . 1 1 1 1 1 1 . . .
#> [2,] . . 4 . . . 1 1 . . . . 1 1 . . . . . 1 1 1 1 .
#> [3,] . . 2 . . 1 1 1 . . . . . . 1 . . . . 1 1 . . 1