Unseen Words

unseen_words(
  code,
  unweighted = TRUE,
  include_handcoded = FALSE,
  exclude_matched_by = c("excerpts", "word")
)

Arguments

code

Code object

unweighted

logical. If TRUE (default), the document matrix is binarized, so multiple occurrences of a single word in one line are counted only once
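
As a rough illustration of what binarizing a document matrix means, the base R sketch below uses a made-up count matrix. The object names (`wdm`, `wdm_binary`) and the counts are assumptions for this example only, not the package's internals.

```r
# Toy document matrix: rows are excerpts, columns are words.
# All names and counts here are hypothetical.
wdm <- matrix(
  c(2, 0, 1,
    0, 3, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("excerpt_1", "excerpt_2"), c("apple", "pear", "plum"))
)

# Binarize: any positive count becomes 1, so a word repeated within
# one excerpt is counted once.
wdm_binary <- (wdm > 0) * 1

rowSums(wdm)         # totals using raw counts
rowSums(wdm_binary)  # totals after binarizing
```

With raw counts both excerpts total 3; after binarizing, the totals reflect only how many distinct words each excerpt contains.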

include_handcoded

logical. If FALSE (default), words from the handcoded set are not used. If TRUE, the handcoded set is used

exclude_matched_by

character, either "excerpts" (default) or "word". "excerpts" removes entire excerpts that match an expression; "word" removes the matching words from the WDM so that they are not included in the excerpt sums
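
The difference between the two modes can be sketched in base R on toy data. This is one reading of the description above, not the package's implementation. The excerpts, the pattern, and all object names are hypothetical.

```r
# Hypothetical excerpts and a hypothetical expression to exclude.
excerpts <- c(e1 = "the cat sat", e2 = "a dog ran", e3 = "the cat ran")
pattern <- "cat"

# Build a small binary document matrix by hand (rows: excerpts, columns: words).
words <- c("the", "cat", "sat", "a", "dog", "ran")
wdm <- t(sapply(excerpts, function(x) {
  tokens <- strsplit(x, " ", fixed = TRUE)[[1]]
  as.integer(words %in% tokens)
}))
colnames(wdm) <- words

# "excerpts"-style exclusion: drop every excerpt whose text matches the pattern.
keep_rows <- !grepl(pattern, excerpts)
wdm_by_excerpt <- wdm[keep_rows, , drop = FALSE]

# "word"-style exclusion: keep all excerpts, but drop the matching word
# columns so they no longer contribute to each excerpt's sum.
keep_cols <- !grepl(pattern, colnames(wdm))
wdm_by_word <- wdm[, keep_cols, drop = FALSE]

rownames(wdm_by_excerpt)  # excerpts that survive row-wise exclusion
rowSums(wdm_by_word)      # per-excerpt sums without the matched word
```

In this sketch the row-wise mode discards two of the three excerpts outright, while the column-wise mode keeps every excerpt and only lowers the sums of those that contained the matched word.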

Value

Excerpts that contain unseen words