Welcome to your first text pre-processing assignment! In this project, you’ll apply your knowledge of tokenization to process a short story using Python and the Natural Language Toolkit (NLTK).
Read a short story from a `.txt` file and:

- Tokenize it into sentences using `sent_tokenize()`
- Tokenize it into words using `word_tokenize()`
- (Optional) Clean the story using `re.sub()` to remove unwanted characters
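To get a quick feel for the difference between the two tokenizers, here's a minimal sketch (the example string and printed output are just illustrations; exact splits can vary slightly between NLTK versions):

```python
import nltk
nltk.download('punkt')  # tokenizer models used by both functions below

from nltk.tokenize import sent_tokenize, word_tokenize

text = "Hello, world! Coding is fun."
print(sent_tokenize(text))  # ['Hello, world!', 'Coding is fun.']
print(word_tokenize(text))  # ['Hello', ',', 'world', '!', 'Coding', 'is', 'fun', '.']
```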
This project includes:

- `story.txt`: A short story about a beginner coder.
- `main.py`: A starter Python script where you will write your code.
- `README.md`: You’re reading it now!
1. Install NLTK if you haven’t already:

   ```bash
   pip install nltk
   ```

2. Download NLTK resources in `main.py`: the script already includes the `nltk.download('punkt')` command, which fetches the tokenizer models.

3. Read the file: use Python’s file-reading tools to load `story.txt`.

4. Clean the text (optional): use `re.sub()` to remove any characters you don’t want included in your analysis.

5. Tokenize the story:
   - Use `sent_tokenize()` to break the story into sentences.
   - Use `word_tokenize()` to break the story into individual words.

6. Print the results to compare and see the difference between sentence and word tokenization.
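Putting the steps together, here is one possible sketch of what your `main.py` could look like (the cleaning pattern and the printed slices are assumptions for illustration; adapt them to your story):

```python
import re

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

# Step 2: tokenizer models (newer NLTK versions may also need 'punkt_tab')
nltk.download('punkt')

# Step 3: read the story from disk
with open('story.txt', encoding='utf-8') as f:
    story = f.read()

# Step 4 (optional): keep letters, digits, whitespace, and basic
# punctuation -- this pattern is only an example; adjust it as needed
story = re.sub(r"[^\w\s.,!?'\"-]", "", story)

# Step 5: tokenize into sentences and into words
sentences = sent_tokenize(story)
words = word_tokenize(story)

# Step 6: compare the two views of the same text
print(f"{len(sentences)} sentences, {len(words)} word tokens")
print("First sentences:", sentences[:2])
print("First ten tokens:", words[:10])
```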
A few questions to reflect on:

- How does `sent_tokenize()` determine where a sentence ends?
- How does punctuation affect `word_tokenize()`?
- How could this process help a chatbot better understand user input?
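As a concrete nudge on the punctuation question: contractions and terminal punctuation become separate tokens. A small sketch (exact splits may vary by NLTK version):

```python
import nltk
nltk.download('punkt')

from nltk.tokenize import word_tokenize

print(word_tokenize("Don't panic!"))
# ['Do', "n't", 'panic', '!'] -- the contraction splits and '!' stands alone
```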
For an extra challenge:

- Count how many sentences and how many words are in the story.
- Print the most frequent word (ignoring common stopwords like "the", "and", etc.).
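One possible sketch for the bonus tasks (assumes `story.txt` is in the working directory; the stopword list needs one extra download):

```python
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download('punkt')
nltk.download('stopwords')  # common English filler words to ignore

with open('story.txt', encoding='utf-8') as f:
    story = f.read()

sentences = sent_tokenize(story)
words = word_tokenize(story)
print(f"Sentences: {len(sentences)}  Words: {len(words)}")

# Keep alphabetic tokens that are not stopwords, then count them
stop = set(stopwords.words('english'))
content = [w.lower() for w in words if w.isalpha() and w.lower() not in stop]
word, count = Counter(content).most_common(1)[0]
print(f"Most frequent word: {word!r} ({count} times)")
```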
Happy tokenizing! 🧠💬