Skip to content

Migrate sheets reading to use serde #39

@illicitonion

Description

@illicitonion

We have a bunch of structs which we fill from Google Sheets, one per row, e.g.

trainee-tracker/src/prs.rs

Lines 217 to 224 in aa0d578

#[derive(Debug, PartialEq, Eq, Serialize)]
pub(crate) struct ReviewerStaffOnlyDetails {
pub(crate) name: String,
pub(crate) attended_training: bool,
pub(crate) checked: CheckStatus,
pub(crate) quality: String,
pub(crate) notes: String,
}

We currently do this in an annoyingly manual and inflexible way.

For this particular struct, we populate it here:

for (row_index, cells) in sheet.rows.iter().enumerate() {
if row_index == 0 {
continue;
}
if cells.len() < 6 {
continue;
}
let github_login = GithubLogin::from(cell_string(&cells[0]));
let notes = match cells.get(6) {
Some(cell) => cell_string(cell),
None => String::new(),
};
let checked = match (cell_bool(&cells[3]), cell_bool(&cells[4])) {
(true, false) => CheckStatus::CheckedAndOk,
(true, true) => CheckStatus::CheckedAndCheckAgain,
(false, _) => CheckStatus::Unchecked,
};
reviewers.insert(
github_login.clone(),
ReviewerStaffOnlyDetails {
name: cell_string(&cells[1]),
attended_training: cell_bool(&cells[2]),
checked,
quality: cell_string(&cells[5]),
notes,
},
);
}

We ignore the headings completely, so if someone changed the sheet, we would start reading incorrect values. Then we hard-code the field order and types of each cell.

In other places, we manually check headings, and just error if they're wrong:

for (row_number, cells) in sheet_data.into_iter().enumerate() {
// Some sheets have documentation or pivot table
if row_number == 0 && !cells.is_empty() && cell_string(&cells[0]) != "Name" {
continue;
}
if cells.len() < 7 {
return Err(anyhow::anyhow!(
"Not enough columns for row {} - expected at least 7, got {} containing: {}",
row_number,
cells.len(),
format!("{:#?}", cells),
));
}
if row_number == 0 {
let headings = cells.iter().take(7).map(cell_string).collect::<Vec<_>>();
if headings
!= [
"Name",
"Email",
"Timestamp",
"Course",
"Module",
"Day",
"Location",
]
{
return Err(anyhow::anyhow!(
"Register sheet contained wrong headings: {}",
headings.join(", ")
));
}
} else {
if cells[0].effective_value.is_none() {
break;
}
let (sprint_number, attendance) = read_row(&cells, register_url.clone())
.with_context(|| format!("Failed to read attendance from row {}", row_number))?;
if attendance.timestamp.date_naive() <= start_date
|| attendance.timestamp.date_naive() >= end_date
{
continue;
}
let sprint_index = sprint_number - 1;
while sprints.len() < sprint_number {
sprints.push(IndexMap::new());
}
if sprints[sprint_index].contains_key(&attendance.email) {
warn!(
"Register sheet contained duplicate entry for sprint {} trainee {}",
sprint_number, attendance.email
);
} else {
sprints[sprint_index].insert(attendance.email.clone(), attendance);
}
}
}

We also handle partial rows poorly.

I would love instead for us to just use serde. We could derive Deserialize for each struct we populate, and use a serde adaptor to read the sheet into a Vec<T>.

There is an existing serde_sheets crate for doing this, but it has some issues. If it runs into issues with a row, it just prints an error to stdout and continues (https://docs.rs/serde_sheets/latest/src/serde_sheets/lib.rs.html#200).

Ideally we would either improve this crate, or write our own one. I'm imagining supporting a function along the lines of:

pub async fn read_all<T: DeserializeOwned>(
    sheets: &mut Sheets,
    document_id: &str,
    tab_name: &str,
) -> SheetReadResult<T>;

enum SheetReadResult<T> {
    AllOk(Vec<T>),
    SomeOk {
        ok: Vec<T>,
        partial_records: Vec<Record>,
        deserialization_error_records: Vec<(Record, serde::Error)>,
    },
    Err(SheetsError),
}

struct Record {
    row_number: usize,
    cells: Vec<CellData>,
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions