Splitting a string into words in Python
Overview
The Python split() method splits a string into a list of substrings based on a delimiter. By default it splits on runs of whitespace, but you can pass any non-empty string as the delimiter.
Syntax
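str.split(sep=None, maxsplit=-1)
Arguments
split() is a method, so the string to be split is the object it is called on rather than an argument.
sep: The delimiter string. If it is omitted or None, the string is split on runs of whitespace, and leading and trailing whitespace is ignored.
maxsplit: The maximum number of splits to make. If maxsplit is -1 (the default), the string is split as many times as possible.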
Return value
A list of substrings, separated by the specified delimiter.
Example
string = "This is a sample string."
words = string.split()
print(words)
Output:
['This', 'is', 'a', 'sample', 'string.']
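The effect of the delimiter argument and maxsplit is easiest to see side by side. The following is a small sketch; the sample text and variable names are made up for illustration.

```python
# Illustrative sample text: two spaces after "alpha", commas elsewhere.
text = "alpha  beta,gamma,delta"

# No arguments: split on runs of whitespace, never producing empty strings.
by_whitespace = text.split()
print(by_whitespace)   # ['alpha', 'beta,gamma,delta']

# Explicit " " delimiter: every single space splits, so the double
# space yields an empty string between the two pieces.
by_space = text.split(" ")
print(by_space)        # ['alpha', '', 'beta,gamma,delta']

# maxsplit=1 stops after the first comma; the remainder stays intact.
by_comma_once = text.split(",", maxsplit=1)
print(by_comma_once)   # ['alpha  beta', 'gamma,delta']
```

Note that split() and split(" ") are not equivalent: only the argument-free form collapses consecutive whitespace.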
Splitting a string into words using the re.split() function
The re.split() function, from the standard re module, is a more flexible way to split a string into words. It allows you to specify a regular expression as the delimiter, so a single call can split on several different delimiters.
Syntax
re.split(pattern, string, maxsplit=0, flags=0)
Arguments
pattern: A regular expression pattern to use as the delimiter.
string: The string to be split.
maxsplit: The maximum number of splits to make. If maxsplit is 0 (the default), the string is split as many times as possible.
flags: Optional regular expression flags, such as re.IGNORECASE.
Return value
A list of substrings, separated by the specified regular expression pattern.
Example
import re
string = "This is a sample string."
words = re.split(r'\s+', string)
print(words)
Output:
['This', 'is', 'a', 'sample', 'string.']
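For plain whitespace, re.split() gives the same result as split(). Its advantage shows when several delimiters are mixed. The sketch below uses an invented sample string and illustrative variable names.

```python
import re

# Illustrative input mixing commas, semicolons, and spaces as delimiters.
record = "one, two;three   four"

# A character class followed by + treats any run of these characters
# as a single delimiter.
parts = re.split(r"[,;\s]+", record)
print(parts)       # ['one', 'two', 'three', 'four']

# maxsplit limits the number of splits; the rest is returned unsplit.
first_two = re.split(r"[,;\s]+", record, maxsplit=2)
print(first_two)   # ['one', 'two', 'three   four']

# A capturing group in the pattern keeps the delimiters in the result.
with_delims = re.split(r"([,;])", "a,b;c")
print(with_delims) # ['a', ',', 'b', ';', 'c']
```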
Splitting a string into words by punctuation
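To split a string at punctuation, you can use the following regular expression pattern:
r'[^\w\s]'
This pattern matches any single character that is neither a word character (a letter, digit, or underscore) nor a whitespace character. Whitespace is not a delimiter here, so the text between punctuation marks keeps its spaces, and punctuation at the end of the string leaves a trailing empty string.
Example
import re
string = "This is a sample string with punctuation."
words = re.split(r'[^\w\s]', string)
print(words)
Output:
['This is a sample string with punctuation', '']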
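If the goal is a list of words with the punctuation stripped, one common alternative is to match the words themselves with re.findall() instead of splitting on the delimiters. This is a sketch with an invented sentence; the \w+ pattern treats digits and underscores as word characters and would break a contraction such as "it's" into two pieces.

```python
import re

# Illustrative sentence containing punctuation and spaces.
sentence = "Hello, world! This is a test."

# Splitting on punctuation alone leaves spaces and a trailing empty string.
pieces = re.split(r"[^\w\s]", sentence)
print(pieces)   # ['Hello', ' world', ' This is a test', '']

# Matching runs of word characters returns just the words.
words = re.findall(r"\w+", sentence)
print(words)    # ['Hello', 'world', 'This', 'is', 'a', 'test']
```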
Splitting a string into words by a specific character
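To split a string at a specific character, you can use that character as the regular expression pattern:
r'CHARACTER'
Replace CHARACTER with the specific character that you want to use as the delimiter. If the character has a special meaning in regular expressions (for example . or |), escape it with a backslash or with re.escape(). For a fixed delimiter like this, the plain split() method gives the same result without a regular expression.
Example
import re
string = "This is a sample string, with commas."
words = re.split(r',', string)
print(words)
Output:
['This is a sample string', ' with commas.']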
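Characters with a special meaning in regular expressions need care when used as the delimiter. The sketch below, using an invented dotted string, shows what goes wrong with an unescaped dot and two ways to avoid it.

```python
import re

# Illustrative dotted string.
dotted = "a.b.c"

# Unescaped, "." matches ANY character, so every character is treated
# as a delimiter and only empty strings remain.
unescaped = re.split(r".", dotted)
print(unescaped)   # ['', '', '', '', '', '']

# re.escape() turns the delimiter into a literal pattern.
escaped = re.split(re.escape("."), dotted)
print(escaped)     # ['a', 'b', 'c']

# For a fixed delimiter, str.split() avoids regular expressions entirely.
plain = dotted.split(".")
print(plain)       # ['a', 'b', 'c']
```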
Splitting a string into words and removing empty strings
To split a string into words and remove empty strings, you can use the following code:
import re

def split_string_into_words_and_remove_empty_strings(string):
    """Splits a string into words and removes empty strings.

    Args:
        string: A string to be split.

    Returns:
        A list of words, with empty strings removed.
    """
    # Leading or trailing whitespace makes re.split() return empty strings.
    words = re.split(r'\s+', string)
    # Keep only the non-empty pieces.
    words = [word for word in words if word]
    return words

# Example usage (note the leading and trailing spaces):
string = "  This is a sample string with empty strings.  "
words = split_string_into_words_and_remove_empty_strings(string)
print(words)
Output:
['This', 'is', 'a', 'sample', 'string', 'with', 'empty', 'strings.']
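When whitespace is the only delimiter, the filtering step may be unnecessary, because split() with no arguments never returns empty strings. A short comparison, using an invented padded string:

```python
import re

# Illustrative string with leading, trailing, and repeated spaces.
padded = "  spaced   out text  "

# re.split() reports the empty pieces before and after the outer whitespace.
with_empties = re.split(r"\s+", padded)
print(with_empties)   # ['', 'spaced', 'out', 'text', '']

# str.split() with no arguments discards them automatically.
without_empties = padded.split()
print(without_empties)  # ['spaced', 'out', 'text']
```

The list comprehension filter remains useful for other patterns, such as splitting on punctuation, where re.split() can still produce empty strings.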
Conclusion
The Python split() method and the re.split() function are powerful tools for splitting a string into words. You can use them to split a string on whitespace, punctuation, or any other character or string that you specify.