Write a Python Function to Split a String Into Words

Splitting a string into words in Python

Overview

The Python str.split() method splits a string into a list of substrings based on a delimiter. When no delimiter is given, it splits on runs of whitespace and ignores leading and trailing whitespace; you can also pass any non-empty string as the delimiter.

Syntax

Python
str.split(sep=None, maxsplit=-1)

Arguments

split() is a method of the string being split, so the string itself is not passed as an argument.

  • sep: The delimiter string. If sep is omitted or None, the string is split on runs of whitespace.
  • maxsplit: The maximum number of splits to make. If maxsplit is -1 (the default), the string is split as many times as possible.

Return value

A list of substrings, separated by the specified delimiter.

Example

Python
string = "This is a sample string."
words = string.split()

print(words)

Output:

['This', 'is', 'a', 'sample', 'string.']
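
The effect of the separator and maxsplit arguments is easiest to see side by side. The snippet below is a small sketch; the sample text and variable names are made up for illustration.

```python
text = "one  two three"  # note the two spaces after "one"

# No separator: a run of whitespace counts as a single delimiter.
default_split = text.split()

# Explicit single-space separator: every space is a delimiter,
# so the double space produces an empty string.
space_split = text.split(" ")

# maxsplit=1 performs at most one split and leaves the rest intact.
limited_split = text.split(maxsplit=1)

print(default_split)  # ['one', 'two', 'three']
print(space_split)    # ['one', '', 'two', 'three']
print(limited_split)  # ['one', 'two three']
```

For splitting ordinary text into words, calling split() with no arguments is usually the safer choice, because repeated spaces do not create empty entries.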

Splitting a string into words using the re.split() function

The re.split() function, from the standard library's re module, is a more flexible way to split a string into words. It lets you specify a regular expression as the delimiter, so several different delimiters can be handled in one call.

Syntax

Python
re.split(pattern, string, maxsplit=0, flags=0)

Arguments

  • pattern: A regular expression pattern to use as the delimiter.
  • string: The string to be split.
  • maxsplit: The maximum number of splits to make. If maxsplit is 0 (the default), the string is split as many times as possible.
  • flags: Optional regular expression flags, such as re.IGNORECASE.

Return value

A list of substrings, separated by the specified regular expression pattern.

Example

Python
import re

string = "This is a sample string."
words = re.split(r'\s+', string)

print(words)

Output:

['This', 'is', 'a', 'sample', 'string.']
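
As a further illustration, a character class lets re.split() treat several delimiters as one, and maxsplit limits how many splits are made. This is a sketch with made-up sample text, not part of the original example.

```python
import re

text = "alpha, beta;gamma   delta"

# Split on commas, semicolons, or whitespace, in runs of one or more.
parts = re.split(r"[,;\s]+", text)

# maxsplit=2 stops after two splits; the remainder is returned unsplit.
limited = re.split(r"[,;\s]+", text, maxsplit=2)

print(parts)    # ['alpha', 'beta', 'gamma', 'delta']
print(limited)  # ['alpha', 'beta', 'gamma   delta']
```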

Splitting a string into words by punctuation

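To split a string into words and drop the punctuation, you can use the following regular expression pattern:

r'\W+'

This pattern matches one or more consecutive characters that are not word characters, so whitespace and punctuation both act as delimiters. (A pattern such as r'[^\w\s]' would split only at the punctuation marks and leave the spaces inside each piece.)

Example

Python
import re

string = "This is a sample string with punctuation."
words = re.split(r'\W+', string)

print(words)

Output:

['This', 'is', 'a', 'sample', 'string', 'with', 'punctuation', '']

The final empty string appears because the input ends with a delimiter (the period).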
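
An alternative to splitting on punctuation is to match the words directly with re.findall(). This is a different technique from re.split(), shown here only as a sketch; the sample text and the second pattern, which keeps apostrophes inside words, are illustrative assumptions.

```python
import re

text = "Hello, world! It's a test."

# Match runs of word characters; punctuation and spaces are skipped.
words = re.findall(r"\w+", text)

# Allow apostrophes inside a word so contractions stay together.
words_with_apostrophes = re.findall(r"[\w']+", text)

print(words)                   # ['Hello', 'world', 'It', 's', 'a', 'test']
print(words_with_apostrophes)  # ['Hello', 'world', "It's", 'a', 'test']
```

Because findall() returns only the matches, the result never contains empty strings.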

Splitting a string into words by a specific character

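To split a string by a specific character, you can use that character as the regular expression pattern:

r'CHARACTER'

Replace CHARACTER with the specific character that you want to use as the delimiter. If the character has a special meaning in regular expressions (such as . or |), escape it with a backslash or pass it through re.escape(). For a fixed delimiter, string.split(CHARACTER) gives the same result without a regular expression.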

Example

Python
import re

string = "This is a sample string, with commas."
words = re.split(r',', string)

print(words)

Output:

['This is a sample string', ' with commas.']
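
Some characters, such as the period, have a special meaning in regular expressions and must be escaped before they can serve as a literal delimiter. The sketch below shows one way to do that with re.escape(); the helper name split_on_literal is hypothetical.

```python
import re

def split_on_literal(text, delimiter):
    """Split text on a literal delimiter, escaping regex metacharacters."""
    return re.split(re.escape(delimiter), text)

# Escaped: the period is treated as an ordinary character.
parts = split_on_literal("a.b.c", ".")

# Unescaped: "." matches every character, so only empty strings remain.
naive = re.split(r".", "a.b.c")

print(parts)  # ['a', 'b', 'c']
print(naive)  # ['', '', '', '', '', '']
```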

Splitting a string into words and removing empty strings

re.split() returns empty strings when the input begins or ends with a delimiter. To split a string into words and remove those empty strings, you can use the following code:

Python
import re

def split_string_into_words_and_remove_empty_strings(string):
  """Splits a string into words and removes empty strings.

  Args:
    string: A string to be split.

  Returns:
    A list of words, with empty strings removed.
  """

  words = re.split(r'\s+', string)
  words = [word for word in words if word]
  return words

# Example usage:

string = "  This is a sample string with empty strings.  "
words = split_string_into_words_and_remove_empty_strings(string)

print(words)

Output:

['This', 'is', 'a', 'sample', 'string', 'with', 'empty', 'strings.']
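
When the delimiter is plain whitespace, the built-in split() with no arguments already discards empty strings, so the filtering step is only needed for regular expression splits. The comparison below is a sketch with made-up sample text.

```python
import re

text = "  padded   with   spaces  "

# re.split() yields empty strings at both ends, so they are filtered out.
regex_words = [word for word in re.split(r"\s+", text) if word]

# str.split() with no arguments skips leading and trailing whitespace.
builtin_words = text.split()

print(regex_words)    # ['padded', 'with', 'spaces']
print(builtin_words)  # ['padded', 'with', 'spaces']
```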

Conclusion

The Python split() method and the re.split() function are powerful tools for splitting a string into words. You can use these functions to split a string into words based on white space characters, punctuation, or any other character or string that you specify.
