String manipulation with stringr

The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.

library(stringr)

Detect Matches

str_detect(string, pattern, negate = FALSE): Detect the presence of a pattern match in a string. Also str_like().
```
str_detect(fruit, "a")
```
str_starts(string, pattern, negate = FALSE): Detect the presence of a pattern match at the beginning of a string. Also str_ends().
```
str_starts(fruit, "a")
```
str_which(string, pattern, negate = FALSE): Find the indexes of strings that contain a pattern match.
```
str_which(fruit, "a")
```
str_locate(string, pattern): Locate the positions of pattern matches in a string. Also str_locate_all().
```
str_locate(fruit, "a")
```
str_count(string, pattern): Count the number of matches in a string.
```
str_count(fruit, "a")
```
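To see how the detection functions relate, here is a minimal sketch on a small hypothetical vector x, used instead of fruit so every result can be checked by hand. The commented values are what these calls should return for this input.

```r
library(stringr)

# Hypothetical input, chosen so the results are easy to verify by hand
x <- c("apple", "banana", "cherry")

has_a   <- str_detect(x, "a")                 # TRUE TRUE FALSE
no_a    <- str_detect(x, "a", negate = TRUE)  # FALSE FALSE TRUE
b_start <- str_starts(x, "b")                 # FALSE TRUE FALSE
where_a <- str_which(x, "a")                  # 1 2
n_a     <- str_count(x, "a")                  # 1 3 0

# Start and end of the first "an" in each string; NA where there is no match
pos_an <- str_locate(x, "an")
pos_an
```

str_which() gives the same positions as which(str_detect(...)), and str_count() reports 0 rather than NA for strings without a match.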

Mutate Strings

str_sub() <- value: Replace substrings by identifying the substrings with str_sub() and assigning into the results.
```
str_sub(fruit, 1, 3) <- "str"
```
str_replace(string, pattern, replacement): Replace the first matched pattern in each string. Also str_remove().
```
str_replace(fruit, "p", "-")
```
str_replace_all(string, pattern, replacement): Replace all matched patterns in each string. Also str_remove_all().
```
str_replace_all(fruit, "p", "-")
```
str_to_lower(string, locale = "en")¹: Convert strings to lower case.
```
str_to_lower(sentences)
```
str_to_upper(string, locale = "en")¹: Convert strings to upper case.
```
str_to_upper(sentences)
```
str_to_title(string, locale = "en")¹: Convert strings to title case. Also str_to_sentence().
```
str_to_title(sentences)
```
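The difference between replacing the first match and replacing every match, and the assignment form of str_sub(), is easiest to see side by side. A minimal sketch on a small hypothetical vector x; the str_sub() assignment modifies a copy y so that x stays unchanged.

```r
library(stringr)

x <- c("apple", "banana")  # hypothetical input

# Assignment form: characters 1 to 3 of each string are replaced by "XY"
y <- x
str_sub(y, 1, 3) <- "XY"                           # "XYle" "XYana"

first_only <- str_replace(x, "a", "-")             # "-pple" "b-nana"
every_one  <- str_replace_all(x, "a", "-")         # "-pple" "b-n-n-"
removed    <- str_remove_all(x, "a")               # "pple" "bnn"
title      <- str_to_title("the quick brown fox")  # "The Quick Brown Fox"
```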

Subset Strings

str_sub(string, start = 1L, end = -1L): Extract substrings from a character vector.
```
str_sub(fruit, 1, 3)
str_sub(fruit, -2)
```
str_subset(string, pattern, negate = FALSE): Return only the strings that contain a pattern match.
```
str_subset(fruit, "p")
```
str_extract(string, pattern): Return the first pattern match found in each string, as a vector. Also str_extract_all() to return every pattern match.
```
str_extract(fruit, "[aeiou]")
```
str_match(string, pattern): Return the first pattern match found in each string, as a matrix with a column for each ( ) group in pattern. Also str_match_all().
```
str_match(sentences, "(a|the) ([^ ]+)")
```
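A minimal sketch of the subsetting functions on a small hypothetical vector x. It shows that a negative start in str_sub() counts from the end, and that str_match() returns one column for the whole match followed by one column per ( ) group.

```r
library(stringr)

x <- c("apple pie", "banana", "cherry tart")  # hypothetical input

first3 <- str_sub(x, 1, 3)           # "app" "ban" "che"
last3  <- str_sub(x, -3)             # negative start counts from the end: "pie" "ana" "art"
spaced <- str_subset(x, " ")         # only the strings that contain a space
vowel  <- str_extract(x, "[aeiou]")  # first vowel in each string: "a" "a" "e"

# Column 1 holds the whole match; columns 2 and 3 hold the two groups.
# "banana" contains no space, so its row is all NA.
m <- str_match(x, "(\\w+) (\\w+)")
m
```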

Join and Split

str_c(..., sep = "", collapse = NULL): Join multiple vectors of strings element-wise; supply collapse to combine the result into a single string.
```
str_c(letters, LETTERS)
```
str_flatten(string, collapse = ""): Combines into a single string, separated by collapse.
```
str_flatten(fruit, ", ")
```
str_dup(string, times): Repeat each string the number of times given by times. Also str_unique() to remove duplicates.
```
str_dup(fruit, times = 2)
```
str_split_fixed(string, pattern, n): Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match). Also str_split() to return a list of substrings and str_split_i() to return the ith substring.
```
str_split_fixed(sentences, " ", n = 3)
```
str_glue(..., .sep = "", .envir = parent.frame()): Create a string from strings and {expressions} to evaluate.
```
str_glue("Pi is {pi}")
```
str_glue_data(.x, ..., .sep = "", .envir = parent.frame(), .na = "NA"): Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate.
```
str_glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")
```
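A minimal sketch contrasting sep and collapse in str_c() and showing how n limits str_split_fixed(). The variable name is hypothetical, and str_split_i() assumes stringr 1.5.0 or later (the version noted at the end of this sheet is 1.5.1).

```r
library(stringr)

joined    <- str_c("a", "b", sep = "-")                       # "a-b"
pairwise  <- str_c(c("x", "y"), c("1", "2"))                  # element-wise: "x1" "y2"
collapsed <- str_c(c("x", "y"), c("1", "2"), collapse = "+")  # "x1+y2"
flat      <- str_flatten(c("x", "y", "z"), ", ")              # "x, y, z"
doubled   <- str_dup("ab", 3)                                 # "ababab"

# n = 3 caps the result at three pieces; the remainder stays in the last one
parts  <- str_split_fixed("a b c d", " ", n = 3)  # 1 x 3 matrix: "a" "b" "c d"
second <- str_split_i("a b c d", " ", 2)          # "b"

name     <- "stringr"                         # hypothetical variable read by str_glue()
greeting <- str_glue("Hello from {name}")     # "Hello from stringr"
```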

Manage Lengths

str_length(string): The width of strings (i.e. number of code points, which generally equals the number of characters).
```
str_length(fruit)
```
str_pad(string, width, side = c("left", "right", "both"), pad = " "): Pad strings to constant width.
```
str_pad(fruit, 17)
```
str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "..."): Truncate the width of strings, replacing content with ellipsis.
```
str_trunc(sentences, 6)
```
str_trim(string, side = c("both", "left", "right")): Trim whitespace from the start and/or end of a string.
```
str_trim(str_pad(fruit, 17))
```
str_squish(string): Trim whitespace from each end and collapse internal runs of whitespace into single spaces.
```
str_squish(str_pad(fruit, 17, "both"))
```
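A minimal sketch of the length helpers on a small hypothetical vector x. Note that str_pad() pads on the left by default, and that the ellipsis counts toward the width in str_trunc().

```r
library(stringr)

x <- c("kiwi", "fig")  # hypothetical input

len      <- str_length(x)                             # 4 3
padded   <- str_pad(x, 6)                             # "  kiwi" "   fig"
dotted   <- str_pad(x, 6, side = "right", pad = ".")  # "kiwi.." "fig..."
trimmed  <- str_trim(padded)                          # "kiwi" "fig"
short    <- str_trunc("stringr is handy", 10)         # 7 characters + "...": "stringr..."
squished <- str_squish("  too   many    spaces ")     # "too many spaces"
```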

Order Strings

str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹: Return the vector of indexes that sorts a character vector.
```
fruit[str_order(fruit)]
```
str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹: Sort a character vector.
```
str_sort(fruit)
```
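The numeric argument matters when strings embed numbers. A minimal sketch on a hypothetical vector x: without numeric = TRUE the digits are compared character by character, so "item10" sorts before "item2".

```r
library(stringr)

x <- c("item10", "item2", "item1")  # hypothetical input

alpha   <- str_sort(x)                   # "item1" "item10" "item2"
natural <- str_sort(x, numeric = TRUE)   # digit runs compared as numbers: "item1" "item2" "item10"
idx     <- str_order(x, numeric = TRUE)  # 3 2 1, so x[idx] gives the same order as natural
```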

Helpers

str_conv(string, encoding): Override the encoding of a string.
```
str_conv(fruit, "ISO-8859-1")
```
str_view(string, pattern, match = NA): View a rendering of all regex matches, with each match highlighted.
```
str_view(sentences, "[aeiou]")
```
str_equal(x, y, locale = "en", ignore_case = FALSE, ...)¹: Determine if two strings are equivalent.
```
str_equal(c("a", "b"), c("a", "c"))
```
str_wrap(string, width = 80, indent = 0, exdent = 0): Wrap strings into nicely formatted paragraphs.
```
str_wrap(sentences, 20)
```
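A minimal sketch of two of the helpers: the ignore_case argument of str_equal(), and str_wrap() breaking a hypothetical sentence into lines no wider than the requested width. The exact line breaks are left to the wrapping algorithm, so only the line widths are checked here.

```r
library(stringr)

same_exact <- str_equal("Apple", "apple")                      # FALSE
same_icase <- str_equal("Apple", "apple", ignore_case = TRUE)  # TRUE

# Break a long string into lines of at most 20 characters
wrapped <- str_wrap("The quick brown fox jumps over the lazy dog", width = 20)
cat(wrapped, "\n")
line_lengths <- str_length(str_split(wrapped, "\n")[[1]])
```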

¹ See http://bit.ly/ISO639-1 for a complete list of locales.

Regular Expressions

Regular expressions, or regexps, are a concise language for describing patterns in strings.

Need to Know

Pattern arguments in stringr are interpreted as regular expressions after any special characters have been parsed.

In R, you write regular expressions as strings, sequences of characters surrounded by double quotes ("") or single quotes ('').

Some characters cannot be directly represented in an R string. These must be represented as special characters, sequences of characters that have a specific meaning, e.g. \\ represents \, \" represents ", and \n represents a new line. Run ?"'" to see a complete list.

Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression.

Use writeLines() to see how R views your string after all special characters have been parsed.

For example, writeLines("\\.") prints \. and writeLines("\\ is a backslash") prints \ is a backslash.
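A minimal sketch of why the doubled backslash matters, using a hypothetical vector x. The R string "\\." holds two characters, which the regex engine reads as \. (a literal dot), while an unescaped . matches any character.

```r
library(stringr)

# The R string "\\." holds two characters: a backslash and a dot
writeLines("\\.")   # prints \.
n <- nchar("\\.")   # 2

x <- c("a.b", "axb")  # hypothetical input

any_char <- str_detect(x, "a.b")    # regex a.b, "." matches any character: TRUE TRUE
literal  <- str_detect(x, "a\\.b")  # regex a\.b, only a literal dot: TRUE FALSE
```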

Interpretation

Patterns in stringr are interpreted as regular expressions by default. To change this, wrap the pattern in one of:

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...): Modifies a regex to ignore case, let ^ and $ match the start and end of each line as well as of the whole string, allow whitespace and # comments within the regex, and/or have . match everything, including \n.
```
str_detect("I", regex("i", TRUE))
```
fixed(): Matches raw bytes but will miss some characters that can be represented in multiple ways (fast).
```
str_detect("\u0130", fixed("i"))
```
coll(): Compares text using locale-specific collation rules, so it recognizes characters that can be represented in multiple ways (slow).
```
str_detect("\u0130", coll("i", TRUE, locale = "tr"))
```
boundary(): Matches boundaries between characters, line breaks, sentences, or words.
```
str_split(sentences, boundary("word"))
```
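A minimal sketch of how the wrapper changes the meaning of the same pattern text, on a hypothetical vector x. As a regex, "1+1" means "one or more 1s followed by a 1", so it finds "11"; with fixed() it is literal text.

```r
library(stringr)

x <- c("1+1=2", "11=2")  # hypothetical input

as_regex <- str_detect(x, "1+1")         # looks for "11": FALSE TRUE
as_fixed <- str_detect(x, fixed("1+1"))  # literal "1+1": TRUE FALSE
icase    <- str_detect("STRINGR", regex("stringr", ignore_case = TRUE))  # TRUE

# Word boundaries skip punctuation and spaces
words <- str_split("Hello, world! Bye.", boundary("word"))[[1]]  # "Hello" "world" "Bye"
```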

Match Characters

see <- function(rx) str_view("abc ABC 123\t.!?\\(){}\n", rx)

¹ Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]].
string (type this)	regex (to mean this)	matches (which matches this)	example	example output (highlighted characters are in <>)
	`a (etc.)`	`a (etc.)`	`see("a")`	`<a>bc ABC 123\t.!?\(){}\n`
`\\.`	`\.`	`.`	`see("\\.")`	`abc ABC 123\t<.>!?\(){}\n`
`\\!`	`\!`	`!`	`see("\\!")`	`abc ABC 123\t.<!>?\(){}\n`
`\\?`	`\?`	`?`	`see("\\?")`	`abc ABC 123\t.!<?>\(){}\n`
`\\\\`	`\\`	`\`	`see("\\\\")`	`abc ABC 123\t.!?<\>(){}\n`
`\\(`	`\(`	`(`	`see("\\(")`	`abc ABC 123\t.!?\<(>){}\n`
`\\)`	`\)`	`)`	`see("\\)")`	`abc ABC 123\t.!?\(<)>{}\n`
`\\{`	`\{`	`{`	`see("\\{")`	`abc ABC 123\t.!?\()<{>}\n`
`\\}`	`\}`	`}`	`see("\\}")`	`abc ABC 123\t.!?\(){<}>\n`
`\\n`	`\n`	new line (return)	`see("\\n")`	`abc ABC 123\t.!?\(){}<\n>`
`\\t`	`\t`	tab	`see("\\t")`	`abc ABC 123<\t>.!?\(){}\n`
`\\s`	`\s`	any whitespace (`\S` for non-whitespaces)	`see("\\s")`	`abc< >ABC< >123<\t>.!?\(){}<\n>`
`\\d`	`\d`	any digit (`\D` for non-digits)	`see("\\d")`	`abc ABC <1><2><3>\t.!?\(){}\n`
`\\w`	`\w`	any word character (`\W` for non-word characters)	`see("\\w")`	`<a><b><c> <A><B><C> <1><2><3>\t.!?\(){}\n`
`\\b`	`\b`	word boundaries	`see("\\b")`	`<>abc<> <>ABC<> <>123<>\t.!?\(){}\n`
	`[:digit:]`¹	digits	`see("[:digit:]")`	`abc ABC <1><2><3>\t.!?\(){}\n`
	`[:alpha:]`¹	letters	`see("[:alpha:]")`	`<a><b><c> <A><B><C> 123\t.!?\(){}\n`
	`[:lower:]`¹	lowercase letters	`see("[:lower:]")`	`<a><b><c> ABC 123\t.!?\(){}\n`
	`[:upper:]`¹	uppercase letters	`see("[:upper:]")`	`abc <A><B><C> 123\t.!?\(){}\n`
	`[:alnum:]`¹	letters and numbers	`see("[:alnum:]")`	`<a><b><c> <A><B><C> <1><2><3>\t.!?\(){}\n`
	`[:punct:]`¹	punctuation	`see("[:punct:]")`	`abc ABC 123\t<.><!><?><\><(><)><{><}>\n`
	`[:graph:]`¹	letters, numbers, and punctuation	`see("[:graph:]")`	`<a><b><c> <A><B><C> <1><2><3>\t<.><!><?><\><(><)><{><}>\n`
	`[:space:]`¹	space characters (i.e. `\s`)	`see("[:space:]")`	`abc< >ABC< >123<\t>.!?\(){}<\n>`
	`[:blank:]`¹	space and tab (but not new line)	`see("[:blank:]")`	`abc< >ABC< >123<\t>.!?\(){}\n`
	`.`	every character except a new line	`see(".")`	`<a><b><c>< ><A><B><C>< ><1><2><3><\t><.><!><?><\><(><)><{><}>\n`

Classes

- The [:space:] class includes new line and the [:blank:] class
  - The [:blank:] class includes space and tab (\t)
- The [:graph:] class contains all non-space characters, including [:punct:], [:symbol:], [:alnum:], [:digit:], [:alpha:], [:lower:], and [:upper:]
  - [:punct:] contains punctuation: . , : ; ? ! / * @ # - _ " [ ] { } ( )
  - [:symbol:] contains symbols: | ` = + ^ ~ < > $
  - [:alnum:] contains alphanumeric characters, including [:digit:], [:alpha:], [:lower:], and [:upper:]
    - [:digit:] contains the digits 0 through 9
    - [:alpha:] contains letters, including [:upper:] and [:lower:]
      - [:upper:] contains uppercase letters and [:lower:] contains lowercase letters
- The regex . contains all characters in the above classes, except new line.
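A minimal sketch that checks the class boundaries above on three whitespace characters (space, tab, new line), plus one punctuation and one symbol character. The dotall flag of regex() is what lets . match a new line.

```r
library(stringr)

ws <- c(" ", "\t", "\n")  # space, tab, new line

in_space <- str_detect(ws, "[:space:]")  # TRUE TRUE TRUE: [:space:] includes new line
in_blank <- str_detect(ws, "[:blank:]")  # TRUE TRUE FALSE: [:blank:] is only space and tab
in_dot   <- str_detect(ws, ".")          # TRUE TRUE FALSE: . skips new line by default
dot_all  <- str_detect("\n", regex(".", dotall = TRUE))  # TRUE

# "+" belongs to [:symbol:], not [:punct:]
is_punct <- str_detect(c("!", "+"), "[:punct:]")  # TRUE FALSE
```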

Alternates

alt <- function(rx) str_view("abcde", rx)

regexp	matches	example	example output (highlighted characters are in <>)
`ab|d`	or	`alt("ab|d")`	`<ab>c<d>e`
`[abe]`	one of	`alt("[abe]")`	`<a><b>cd<e>`
`[^abe]`	anything but	`alt("[^abe]")`	`ab<c><d>e`
`[a-c]`	range	`alt("[a-c]")`	`<a><b><c>de`
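A minimal sketch of alternation, sets, negated sets, and ranges on a hypothetical vector x, using str_detect(), str_extract(), and str_count() instead of alt() so the results can be checked programmatically.

```r
library(stringr)

x <- c("cat", "dog", "cow")  # hypothetical input

either <- str_detect(x, "cat|dog")  # TRUE TRUE FALSE
one_of <- str_extract(x, "c[ao]")   # "ca" NA "co"
not_c  <- str_extract(x, "[^c]o")   # an "o" preceded by anything but "c": NA "do" NA
in_a_c <- str_count(x, "[a-c]")     # letters a to c in each string: 2 0 1
```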

Anchors

anchor <- function(rx) str_view("aaa", rx)

regexp	matches	example	example output (highlighted characters are in <>)
`^a`	start of string	`anchor("^a")`	`<a>aa`
`a$`	end of string	`anchor("a$")`	`aa<a>`
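A minimal sketch of anchors on a hypothetical vector x. Using both anchors forces the whole string to match. The last two lines show the multiline option of regex(), described above, which lets ^ match at the start of every line.

```r
library(stringr)

x <- c("apple", "banana", "papaya")  # hypothetical input

starts_a <- str_detect(x, "^a")        # TRUE FALSE FALSE
ends_a   <- str_detect(x, "a$")        # FALSE TRUE TRUE
exact    <- str_detect(x, "^papaya$")  # whole string must match: FALSE FALSE TRUE

one_line  <- str_count("a\na", "^a")                           # 1: start of string only
per_line  <- str_count("a\na", regex("^a", multiline = TRUE))  # 2: start of each line
```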

Look Arounds

look <- function(rx) str_view("bacad", rx)

regexp	matches	example	example output (highlighted characters are in <>)
`a(?=c)`	followed by	`look("a(?=c)")`	`b<a>cad`
`a(?!c)`	not followed by	`look("a(?!c)")`	`bac<a>d`
`(?<=b)a`	preceded by	`look("(?<=b)a")`	`b<a>cad`
`(?<!b)a`	not preceded by	`look("(?<!b)a")`	`bac<a>d`
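Look arounds are handy for extracting a value without its label, because the look-around part is checked but not returned. A minimal sketch on hypothetical price and weight strings:

```r
library(stringr)

prices <- c("$10", "USD 20", "$5 and $7")  # hypothetical input

# Digits preceded by "$"; the "$" itself is not part of the match
after_dollar <- str_extract(prices, "(?<=\\$)\\d+")              # "10" NA "5"
all_third    <- str_extract_all(prices[3], "(?<=\\$)\\d+")[[1]]  # "5" "7"

# Digits followed by "kg"; again only the digits are returned
weight <- str_extract("Weight: 100kg", "\\d+(?=kg)")  # "100"
```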

Quantifiers

quant <- function(rx) str_view(".a.aa.aaa", rx)

regexp	matches	example	example output (highlighted characters are in <>)
`a?`	zero or one	`quant("a?")`	`<>.<a><>.<a><a><>.<a><a><a><>`
`a*`	zero or more	`quant("a*")`	`<>.<a><>.<aa><>.<aaa><>`
`a+`	one or more	`quant("a+")`	`.<a>.<aa>.<aaa>`
`a{n}`	exactly `n`	`quant("a{2}")`	`.a.<aa>.<aa>a`
`a{n,}`	`n` or more	`quant("a{2,}")`	`.a.<aa>.<aaa>`
`a{n,m}`	between `n` and `m`	`quant("a{2,4}")`	`.a.<aa>.<aaa>`
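A minimal sketch that returns the quantifier matches as vectors instead of highlighting them, using the same string as quant(), plus a hypothetical spelling check where ? makes one letter optional.

```r
library(stringr)

x <- ".a.aa.aaa"  # same string as quant()

runs     <- str_extract_all(x, "a+")[[1]]     # "a" "aa" "aaa"
exactly2 <- str_extract_all(x, "a{2}")[[1]]   # "aa" "aa"
two_plus <- str_extract_all(x, "a{2,}")[[1]]  # "aa" "aaa"

# ? makes the preceding "u" optional
spelling <- str_detect(c("color", "colour", "colouur"), "^colou?r$")  # TRUE TRUE FALSE
```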

Groups

ref <- function(rx) str_view("abbaab", rx)

Use parentheses to set precedence (order of evaluation) and create groups.

regexp	matches	example	example output (highlighted characters are in <>)
`(ab|d)e`	sets precedence	`alt("(ab|d)e")`	`abc<de>`

Use an escaped number to refer to, and duplicate, parenthesized groups that occur earlier in a pattern. Refer to each group by its order of appearance.

More groups
string (type this)	regexp (to mean this)	matches (which matches this)	example (the result is the same as `ref("abba")`)	example output (highlighted characters are in <>)
`\\1`	`\1` (etc.)	first () group, etc.	`ref("(a)(b)\\2\\1")`	`<abba>ab`
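A common use of a back reference is finding doubled letters. A minimal sketch on a hypothetical vector of words; the last line shows that the same \\1, \\2 syntax also works in the replacement string of str_replace().

```r
library(stringr)

words <- c("apple", "banana", "letter", "kiwi")  # hypothetical input

# (.)\\1 : any character followed immediately by the same character
doubled <- str_subset(words, "(.)\\1")   # "apple" "letter"
pairs   <- str_extract(words, "(.)\\1")  # "pp" NA "tt" NA

# Groups reused in the replacement, here to swap two words
swapped <- str_replace("hello world", "(\\w+) (\\w+)", "\\2 \\1")  # "world hello"
```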

CC BY SA Posit Software, PBC • info@posit.co • posit.co

Learn more at stringr.tidyverse.org.

Updated: 2024-06.

packageVersion("stringr")

[1] '1.5.1'
