86 changes: 86 additions & 0 deletions task-1/output/clean_users.json
@@ -0,0 +1,86 @@
[
{
"id": 1,
"name": "Alice Johnson",
"email": "alice.johnson@company.com",
"department": "Engineering",
"salary": 85000
},
{
"id": 2,
"name": "Bob Smith",
"email": "bob.smith@company.com",
"department": "Unknown",
"salary": 72000
},
{
"id": 3,
"name": "Carol Williams",
"email": "carol.williams@company.com",
"department": "Engineering",
"salary": null
},
{
"id": 4,
"name": "David, Jr.",
"email": "david.brown@company.com",
"department": "Sales",
"salary": 68000
},
{
"id": 5,
"name": "Caf\u00e9 Owner",
"email": "eva@company.com",
"department": "Engineering",
"salary": 88000
},
{
"id": 6,
"name": "FRANK WILSON",
"email": "frank@company.com",
"department": "marketing",
"salary": 95000
},
{
"id": 7,
"name": "Grace Lee",
"email": "grace.lee@company.com",
"department": "Engineering",
"salary": null
},
{
"id": 9,
"name": "Henry Davis",
"email": "henry.davis@company.com",
"department": "Sales",
"salary": 82000
},
{
"id": 11,
"name": "Linda Taylor",
"email": "linda.t@company.com",
"department": "HR",
"salary": 55000
},
{
"id": 12,
"name": "Mike Brown",
"email": "mike.b@company.com",
"department": "Sales",
"salary": null
},
{
"id": 13,
"name": "Sarah Connor",
"email": "s.connor@sky.net",
"department": "Unknown",
"salary": -1
},
{
"id": 15,
"name": "John Doe",
"email": "john.doe@company.net",
"department": "Engineering",
"salary": 100000
}
]
Binary file added task-1/src/__pycache__/utils.cpython-312.pyc
Binary file not shown.
24 changes: 20 additions & 4 deletions task-1/src/utils.py
@@ -13,23 +13,28 @@ def clean_name(raw: str) -> str:

     Returns the cleaned string. An empty input returns "".
     """
-    raise NotImplementedError("Implement clean_name (Task 1)")
+    return raw.strip()


 def clean_email(raw: str) -> str:
     """Lowercase the email, strip surrounding whitespace.

     Returns the cleaned string. An empty input returns "".
     """
-    raise NotImplementedError("Implement clean_email (Task 1)")
+    return raw.strip().lower()


 def clean_department(raw: str) -> str:
     """Return the department, or 'Unknown' if missing/empty.

     Strip whitespace; treat empty string as missing.
     """
-    raise NotImplementedError("Implement clean_department (Task 1)")
+    department = raw.strip()
+
+    if department == "":
+        return "Unknown"
+
+    return department


 def clean_salary(raw: str) -> int | None:
@@ -38,4 +43,15 @@ def clean_salary(raw: str) -> int | None:
     Handles inputs like "85000", " 95000", '"68,000"', "N/A", "".
     Returns None when the value cannot be parsed (missing or "N/A").
     """
-    raise NotImplementedError("Implement clean_salary (Task 1)")
+    salary = raw.strip()
+    if salary in ("", "N/A"):
+        return None
+
+    salary = salary.replace('"', "")
+    salary = salary.replace(",", "")
+    salary = salary.strip()
+
+    try:
+        return int(salary)
+    except ValueError:
+        return None
105 changes: 105 additions & 0 deletions task-2/AI_DEBUG.md
@@ -11,19 +11,124 @@ Document one debugging session you had during Task 1 where you used an LLM

<!-- Paste the full traceback or describe the wrong behaviour. Include the
exact error message and the line of your code that triggered it. -->
While working on Task 1, my first version of `clean_salary()` tried to convert the salary directly to an integer after removing commas and quotes. It worked for values like `85000` and `"68,000"`, but it crashed when the CSV contained an unexpected salary format with a period inside the value.

My first version was:

    def clean_salary(raw: str) -> int | None:
        """Parse a messy salary cell into an int.

        Handles inputs like "85000", " 95000", '"68,000"', "N/A", "".
        Returns None when the value cannot be parsed (missing or "N/A").
        """
        salary = raw.strip()
        if salary in ("", "N/A"):
            return None

        salary = salary.replace('"', "")
        salary = salary.replace(",", "")
        salary = salary.strip()

        return int(salary)

When I ran the cleaner:

    python3 src/cleaner.py --input data/messy_users.csv --output output/clean_users.json

I got this traceback:

    Traceback (most recent call last):
      File "G:\Halyna_work\traineeship_Amsterdam\Data-track\week-1\c55-data-week1\task-1\src\cleaner.py", line 58, in <module>
        main(args.input, args.output)
      File "G:\Halyna_work\traineeship_Amsterdam\Data-track\week-1\c55-data-week1\task-1\src\cleaner.py", line 46, in main
        cleaned = [c for row in reader if (c := clean_row(row)) is not None]
                                                ^^^^^^^^^^^^^^
      File "G:\Halyna_work\traineeship_Amsterdam\Data-track\week-1\c55-data-week1\task-1\src\cleaner.py", line 38, in clean_row
        "salary": clean_salary(row.get("salary", "")),
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Halyna_work\traineeship_Amsterdam\Data-track\week-1\c55-data-week1\task-1\src\utils.py", line 55, in clean_salary
        return int(salary)
               ^^^^^^^^^^^
    ValueError: invalid literal for int() with base 10: '62.000'

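The problem was that `int()` only parses strings that are valid integer literals, for example "62000" (a leading sign and surrounding whitespace are also accepted). "62.000" contains a period, so `int()` raises `ValueError` instead of returning a number.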
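
A minimal sketch of that behaviour, using only the built-in `int()`. The helper name `accepted_by_int` is hypothetical and exists only for this illustration; it is not part of `utils.py`.

```python
def accepted_by_int(text: str) -> bool:
    """Report whether the built-in int() can parse `text` (illustration only)."""
    try:
        int(text)
    except ValueError:
        return False
    return True


# Surrounding whitespace and a leading sign are fine; separators are not.
for sample in ["62000", " 95000 ", "-1", "62.000", "68,000"]:
    print(f"{sample!r}: {accepted_by_int(sample)}")
```

This is also why the helper has to strip commas and quotes before the conversion: `int()` does no cleaning of its own beyond ignoring surrounding whitespace.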

## The Prompt

<!-- The exact text you sent to the LLM. Include the code you pasted with
it. -->
I am working on a beginner Python CSV cleaning task.

The cleaner reads rows from messy_users.csv and calls this helper function:

    def clean_salary(raw: str) -> int | None:
        salary = raw.strip()

        if salary == "":
            return None

        if salary.lower() == "n/a":
            return None

        salary = salary.replace('"', "")
        salary = salary.replace(",", "")

        return int(salary)

The assignment says:
- salaries like "85000", " 95000", and '"68,000"' should become integers
- empty values and "N/A" should become None
- if the value cannot be parsed, the function should return None instead of crashing

When I run the script, I get this error:

ValueError: invalid literal for int() with base 10: '62.000'

How should I fix clean_salary so the script keeps running?

## The Solution

<!-- What did the LLM suggest? Did it work on the first try, or did you
need a follow-up? -->
The AI suggested wrapping the final int(salary) conversion in a try/except ValueError block. It also suggested keeping the cleaning steps for whitespace, quotes, and commas before the conversion.

The fixed function was:

    def clean_salary(raw: str) -> int | None:
        salary = raw.strip()
        if salary in ("", "N/A"):
            return None

        salary = salary.replace('"', "")
        salary = salary.replace(",", "")
        salary = salary.strip()

        try:
            return int(salary)
        except ValueError:
            return None

This worked after I reran the command:

    python3 src/cleaner.py --input data/messy_users.csv --output output/clean_users.json

The script no longer crashed. Instead, rows with invalid salary formats were still included in the output, but their "salary" value became null in the JSON file.
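
As a small illustration of why the output shows `null`: Python's standard `json` module serialises `None` as JSON `null`. The row below is made-up sample data, not a line from `messy_users.csv`, and the snippet does not claim to reproduce how `cleaner.py` writes its file.

```python
import json

# Hypothetical cleaned row whose salary could not be parsed.
row = {"id": 99, "name": "Example User", "salary": None}

encoded = json.dumps(row)
print(encoded)  # {"id": 99, "name": "Example User", "salary": null}
```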

## Reflection

<!-- A few sentences on: did you understand WHY the original code was
broken, or did you just accept the fix? What would you do differently next
time? -->
I understood why the original code was broken. The mistake was assuming that every non-empty and non-N/A salary could be converted with int(). That assumption was too strong because real CSV data can contain unexpected formats.

I did not need to change cleaner.py, because the orchestration logic was already correct. The bug belonged inside the helper function that parsed one field.

Next time, I would test helper functions separately before running the whole script. For example, I would manually check:

    clean_salary("85000")
    clean_salary('"68,000"')
    clean_salary("N/A")
    clean_salary("")
    clean_salary("62.000")

That would catch the parsing problem earlier and make the traceback easier to understand.
Binary file added task-3/azure_proof.png