Write A Python Function To Count The Occurrences Of Each Word In A Given Text File

The Word Counter Function

Let's start by writing a Python function that counts word occurrences. The function, count_words(filename), takes the name of a text file as input and returns a dictionary in which each key is a word and each value is the number of times that word appears.

Python
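import re


def count_words(filename):
    """Return a dict mapping each lowercase word in the file to its count."""
    # Dictionary that maps each word to the number of times it appears
    word_counts = {}

    # Open the text file for reading (UTF-8 encoding is assumed here)
    with open(filename, 'r', encoding='utf-8') as file:
        # Read the entire file content into one string
        text = file.read()

    # Lowercase the text, then extract runs of word characters
    # (letters, digits and underscores); punctuation and spaces are skipped
    words = re.findall(r'\w+', text.lower())

    # Count the occurrences of each word
    for word in words:
        if word not in word_counts:
            word_counts[word] = 0
        word_counts[word] += 1

    return word_counts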

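This function uses the regular expression \w+ to extract words from the lowercased text, so "The" and "the" are counted as the same word. \w matches letters, digits, and underscores, which means punctuation and whitespace are skipped, but numbers such as "2024" are counted as words and a contraction such as "don't" is split into "don" and "t". The function then iterates through the extracted words and increments the corresponding count in the dictionary each time a word appears.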
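To see exactly what the regular expression keeps and drops, here is a small sketch that runs the same tokenization on an in-memory string instead of a file. The helper names tokenize and count_words_in_text are hypothetical, introduced only for this example, and the counting step swaps the explicit dictionary loop for collections.Counter from the standard library, which builds the same word-to-count mapping.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of word characters (letters, digits, underscores)."""
    return re.findall(r'\w+', text.lower())


def count_words_in_text(text):
    """Counter-based equivalent of the dictionary loop, applied to a string."""
    return Counter(tokenize(text))


# Contractions are split, while digits and underscores are kept
print(tokenize("Don't stop: it's 2024, my_var!"))
# ['don', 't', 'stop', 'it', 's', '2024', 'my_var']

counts = count_words_in_text("The cat saw the other cat.")
# A Counter returns 0 for words it has never seen
print(counts["cat"], counts["the"], counts["dog"])
# 2 2 0
```

If the contraction splitting matters for your text, a pattern such as r"[\w']+" keeps apostrophes inside words, at the cost of also keeping stray quote marks.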

 

Unleashing the Word Counter

With the function in place, we can analyze the word frequencies in a text file. Let's test it on a sample file named 'sample.txt':

Python
word_counts = count_words('sample.txt')

# Print the top 10 most frequently occurring words
for word, count in sorted(word_counts.items(), key=lambda item: item[1], reverse=True)[:10]:
    print(f"Word: {word}, Count: {count}")

This snippet counts the words in 'sample.txt', sorts the (word, count) pairs by count in descending order, and prints the ten most frequent words along with their counts.

 

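Output:

Suppose sample.txt contains the single sentence "This is a sample text to demonstrate the word counting program." Every word appears exactly once, so all eleven counts are 1. Python's sort is stable and dictionaries keep insertion order, so tied words are printed in the order they first appear, and the [:10] slice leaves out the eleventh word, "program". The snippet prints:

Word: this, Count: 1
Word: is, Count: 1
Word: a, Count: 1
Word: sample, Count: 1
Word: text, Count: 1
Word: to, Count: 1
Word: demonstrate, Count: 1
Word: the, Count: 1
Word: word, Count: 1
Word: counting, Count: 1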
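When several words share the same count, as in the sample sentence above, their relative order depends on how ties are broken. The sketch below uses a hypothetical helper, top_words, that sorts by descending count and then alphabetically, so the ranking no longer depends on the order in which words first appear. It also shows collections.Counter.most_common, a standard-library alternative that keeps first-seen order for ties.

```python
from collections import Counter


def top_words(word_counts, n=10):
    """Return the n most frequent (word, count) pairs.

    Ties are broken alphabetically, so the result does not depend on
    the order in which words were first seen.
    """
    # Negating the count sorts high counts first; the word breaks ties A-Z
    return sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))[:n]


counts = {"this": 1, "is": 2, "a": 2, "test": 1}
print(top_words(counts, 3))
# [('a', 2), ('is', 2), ('test', 1)]

# most_common ranks by count too, but tied words keep insertion order
print(Counter(counts).most_common(3))
# [('is', 2), ('a', 2), ('this', 1)]
```

Which tie-breaking rule is better depends on the use case. Alphabetical order gives reproducible output, while first-seen order reflects where words occur in the text.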


Exploring Word Frequency Variations

Python makes it easy to extend the word counter for other kinds of analysis. Consider the following variations:

  • Case Sensitivity: To count "The" and "the" as different words, remove the .lower() call so that each word keeps its original capitalization in the dictionary.
  • Excluding Stop Words: To ignore common words such as 'the', 'a', and 'an', keep a set of stop words and skip any word in that set while counting.
  • Word N-grams: To analyze how often word sequences occur, join each run of n consecutive words (for example, word pairs, or bigrams, for n = 2) into a single term and count those terms instead of individual words.
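The three variations can be combined in one function. The following is a minimal sketch rather than the only way to do it: count_terms and its parameters are hypothetical names, it works on a string rather than a file to stay self-contained, and the small STOP_WORDS set is an illustrative assumption, not a standard list.

```python
import re
from collections import Counter

# Hypothetical stop-word set for illustration only (assumed lowercase);
# real projects usually rely on a larger, curated list
STOP_WORDS = {"the", "a", "an", "is", "to"}


def count_terms(text, case_sensitive=False, stop_words=None, n=1):
    """Count words (n=1) or word n-grams (n>1) in a string.

    case_sensitive: keep the original capitalization instead of lowercasing.
    stop_words: optional set of lowercase words to drop before counting.
    n: number of consecutive words in each counted term (must be >= 1).
    """
    if not case_sensitive:
        text = text.lower()
    words = re.findall(r'\w+', text)

    if stop_words:
        # Compare in lowercase so "The" is filtered even in case-sensitive mode
        words = [w for w in words if w.lower() not in stop_words]

    # Build n-grams by zipping n staggered copies of the word list
    ngrams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)


sample = "The cat sat. The cat ran."
print(count_terms(sample)["the"])                         # 2
print(count_terms(sample, case_sensitive=True)["The"])    # 2
print(count_terms(sample, stop_words=STOP_WORDS)["the"])  # 0
print(count_terms(sample, n=2)["the cat"])                # 2
```

One design choice to be aware of: stop words are removed before the n-grams are built, so an n-gram can join words that were not adjacent in the original text. Dropping any n-gram that contains a stop word after building them is an equally reasonable alternative.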


 

These extensions demonstrate the flexibility of Python in adapting to diverse text analysis requirements.

Conclusion

In this post we wrote a function that counts word occurrences in a text file, used it to list the most frequent words, and looked at ways to extend it with case sensitivity, stop-word filtering, and n-grams. Word frequency counts are a simple first step toward understanding what a body of text is about.
