Assignment 3 #3
Open
alf-99 wants to merge 3 commits into
Aditya-k-23
approved these changes
Jan 14, 2026
Aditya-k-23
left a comment
Well written code and complete submission. Good job!
What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)
I completed Assignment 3 on clustering and bootstrapping with the Wine dataset. The main changes were:
-Loading and exploring the wine chemical composition data.
-Creating scatter plots to visualize feature relationships.
-Standardizing the data for K-means clustering.
-Applying K-means with 3 clusters and labeling the data.
-Implementing bootstrapping to calculate a confidence interval for color intensity.
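The clustering steps listed above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: it assumes the data is scikit-learn's bundled `load_wine` dataset, and `n_init=10` and `random_state=42` are illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Assumption: the assignment's Wine data matches scikit-learn's bundled copy.
wine = load_wine()
X = wine.data

# Standardize so that every feature contributes equally to Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

# K-means with 3 clusters; n_init and random_state are illustrative values.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

print(X.shape)                       # (n_samples, n_features)
print(sorted(set(labels.tolist())))  # cluster ids that were assigned
```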
What did you learn from the changes you have made?
This assignment helped me understand:
-How to prepare data for clustering algorithms (especially scaling).
-The importance of standardization when using distance-based methods like K-means.
-How bootstrapping works to estimate confidence intervals without needing more data.
-How to interpret cluster patterns in multidimensional data.
-The elbow method for choosing the number of clusters.
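As an illustration of the elbow method mentioned above, the sketch below fits K-means for several values of k and records the inertia (within-cluster sum of squares). The range of k, `n_init`, and `random_state` are assumptions made for the example, and the data is again taken from scikit-learn's `load_wine`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_wine().data)

# Fit one model per candidate k and keep its inertia.
inertias = {}
for k in range(1, 11):  # candidate range is an illustrative choice
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = model.inertia_

# The "elbow" is the k after which inertia stops dropping sharply.
for k, inertia in inertias.items():
    print(f"k={k:2d}  inertia={inertia:.1f}")
```

Plotting `inertias` against k (for example with matplotlib) makes the bend easier to see than the printed table.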
Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?
For the clustering part, I considered trying different values of k (not just 3) and using the elbow method to pick the best one. For bootstrapping, I thought about comparing different confidence levels (95% vs 90%) to see how the interval width changes.
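The confidence-level comparison described here can be sketched with a percentile bootstrap of the mean. The number of resamples, the seed, and the helper name `percentile_ci` are hypothetical choices for this example, and the notebook may compute its interval differently.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
col = wine.feature_names.index("color_intensity")
color_intensity = wine.data[:, col]

rng = np.random.default_rng(123)  # seed chosen only for reproducibility
n_boot = 10_000                   # number of resamples is an assumption
n = len(color_intensity)

# Resample with replacement and take the mean of each bootstrap sample.
idx = rng.integers(0, n, size=(n_boot, n))
boot_means = color_intensity[idx].mean(axis=1)

def percentile_ci(samples, level):
    """Two-sided percentile interval at the given confidence level."""
    alpha = (1.0 - level) / 2.0
    lo, hi = np.percentile(samples, [100 * alpha, 100 * (1 - alpha)])
    return float(lo), float(hi)

ci90 = percentile_ci(boot_means, 0.90)
ci95 = percentile_ci(boot_means, 0.95)
print("90% CI:", ci90, "width:", ci90[1] - ci90[0])
print("95% CI:", ci95, "width:", ci95[1] - ci95[0])
```

A higher confidence level has to cover more of the bootstrap distribution, so the 95% interval comes out wider than the 90% one.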
Were there any challenges? If so, what issue(s) did you face? How did you overcome it?
The main challenge was a K-means warning about a memory leak on Windows. The sklearn documentation describes this as a known issue with MKL on Windows. Since it is only a warning and does not affect the clustering results, I decided to proceed. Creating all of the pairwise scatter plots (78 of them, one for each pair of the 13 features) was also computationally heavy, but it helped visualize the patterns.
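For reference, the warning can usually be avoided by limiting OpenMP threads before scikit-learn is imported, and the plot count follows from the number of feature pairs. The sketch below uses only the standard library. The value `"1"` is an assumption, since the warning message itself typically names the value to use.

```python
import os
from itertools import combinations

# Assumption: set this before importing sklearn so the thread limit takes effect.
os.environ.setdefault("OMP_NUM_THREADS", "1")

# Every unordered pair of the 13 features gets one scatter plot.
n_features = 13
feature_pairs = list(combinations(range(n_features), 2))
print(len(feature_pairs))  # 13 * 12 / 2 = 78
```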
How were these changes tested?
-Verified the dataset loaded correctly (178 wines, 13 features).
-Checked that standardization gave mean=0, std=1 for all features.
-Confirmed clustering assigned all points to one of 3 clusters.
-Validated the bootstrap results by checking that the sample mean of color intensity (5.058) fell within the 90% CI (4.78 to 5.35).
-Ran all code cells sequentially to ensure no errors.
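The standardization check in this list could be written as explicit assertions, as in the sketch below. It assumes scikit-learn's `StandardScaler` and `load_wine`, and the tolerance is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X = load_wine().data
X_scaled = StandardScaler().fit_transform(X)

means = X_scaled.mean(axis=0)
stds = X_scaled.std(axis=0)  # NumPy default ddof=0, matching StandardScaler

# Compare with a tolerance; exact equality fails on floating-point rounding.
mean_ok = bool(np.allclose(means, 0.0, atol=1e-9))
std_ok = bool(np.allclose(stds, 1.0, atol=1e-9))
print(mean_ok, std_ok)
```

If the check is done with pandas instead, note that `DataFrame.std()` defaults to the sample standard deviation (`ddof=1`), which gives values slightly above 1 for data scaled this way.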
A reference to a related issue in your repository (if applicable)
N/A - This is for Assignment 3 submission.
Checklist