@tristanpwdennis
Collaborator

I've drafted the user guide and terms of use for the Adir1.0 data.

The links to the partner study pages and to GCP will break: the former don't exist (yet), and the latter needs to be reorganised / put into a release bucket.

The text of the ToU is also still to be finalised.

@jonbrenas
Collaborator

In general, it looks pretty good to me.

A few small things:

  • docs/adir1/adir1.0.ipynb:
    • It looks like it is templated more on the small release pages (e.g., Af1.1) than on the initial release pages (e.g., Af1.0). The differences are small but I think it is worth it to have a more detailed introduction of what the resource is. I guess it is a little different here as the project 'Vector Observatory - Asia' covers several resources but I think it would be better to introduce the project more than once than not at all.
    • The title is 'Adir.0' instead of 'Adir1.0' (or, to me, the even better 'Adir1.0 (Vector Observatory - Asia Project Anopheles dirus Phase 1 Data Release)').
    • The first link 'https://github.com/malariagen/vector-data/blob/4162d060bf46912c7d56f3528ce74604d10b36cf/docs/adir1/adir.0' looks wrong
    • The 'Partner study' section should contain one entry per partner study and not per sample set.
    • It would probably be a good idea to have a 'Sequencing and variant calling methods' page and refer to it in the similarly named section (but we didn't do it for funestus for some reason, so what do I know).
    • The 'also' in 'The SNP data have also been uploaded to Google Cloud' is not needed (at least not until the data are also uploaded to ENA and a paragraph is added).
  • docs/adir1/cloud.ipynb:
    • I think the data should end up being stored in vo_adir_release_master_us_central1 for consistency but it doesn't exist yet and we have not moved the minimus data so I guess it is fine (at least for now). I think you also moved the data to vo_adir_release_us_central_1 so it is not accurate anymore.
    • The text mentions and links to the Af1 API instead of the Adir1 API.
    • There is an 'Adir.0' in the "Sample metadata" section.
    • The section "SNP sites and alleles" contains the text '(e.g., 2RL)' which doesn't apply here.
    • This is not a problem specific to this V/PUG (it is present in the other V/PUGs as well), but "Values coded as integers, where -1 represents a missing value, 0 represents the reference allele, and 1, 2, and 3 represent alternate alleles." is not a proper sentence.
  • docs/adir1/download.ipynb:
    • The title has a typo
    • Are the BAMs on ENA? The missing paragraph in adir1.0.ipynb led me to believe that it was not the case yet.
    • An 'e' is missing in "thes data using wget.". It is also the case for the other V/PUGs, btw.
    • The section "Specimen collection metadata" uses both the bucket vo_adir_production_us_central1 and vo_adir_release_us_central1
    • There is a typo in "Site filters": "For An. dirus, theyu are only available as a Zarr array (see below)"
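
For reference, the genotype coding quoted above (-1 missing, 0 reference, 1 to 3 alternate) can be illustrated with a small sketch. It is not taken from the notebooks; the function name, the diploid call layout and the toy data are all assumptions made for illustration.

```python
from collections import Counter

# Coding as quoted in the V/PUG text: -1 missing, 0 reference, 1-3 alternate.
MISSING = -1


def count_alleles(calls):
    """Count allele codes across calls, skipping missing values.

    `calls` is assumed to be an iterable of per-sample tuples of
    integer allele codes (hypothetical layout, diploid in the toy data).
    """
    counts = Counter()
    for call in calls:
        for allele in call:
            if allele != MISSING:
                counts[allele] += 1
    return counts


# Toy data, not from Adir1.0: four diploid calls at a single site.
toy_calls = [(0, 0), (0, 1), (-1, -1), (2, 1)]
print(count_alleles(toy_calls))
```

Spelling the coding out like this may also help when rewording that sentence.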

@tristanpwdennis
Collaborator Author

Hi Jon - thank you for your eyes on this!

I (think I) have amended all the typos you highlighted. Answers to a few of your questions...

  • In Adir1.0.ipynb, I've added some more about the project to the preamble.
  • The blob link: I can't see this link in my code - perhaps I am missing something?
  • On the variant calling, I added a bit more info here on the QC.
  • BAMs on ENA - let's check with Anastasia next week.

Let me know if any more modifications are needed!

@ahernank
Collaborator

@tristanpwdennis Apologies, following @jonbrenas' check above, I now realise I made a mistake with the release bucket naming -- would you mind using vo_adir_release_master_us_central1 instead? Very sorry about the extra work.

@ahernank
Collaborator

On BAMs on ENA: the raw reads should be in there, but we still need to extract the accessions. The aligned BAMs are not in there yet, but we should be able to upload them in the next couple of weeks, together with the minimus batch.

Collaborator

@jonbrenas left a comment

LGTM

@ahernank
Collaborator

Just to add that download.ipynb will need updating with the "master" GCS path but looks great otherwise.

And, @tristanpwdennis just to flag that all webpages on the MGEN website are up now so please give us a ping if you think any edits are needed on those.

@tristanpwdennis
Collaborator Author

tristanpwdennis commented Oct 21, 2025

Great - thanks guys. I've updated the GCS paths. I can't approve / merge my own PR, so let me know if there is anything else I can do here :)

@jonbrenas
Collaborator

I think there are a few *.ipynb.ipynb files that shouldn't be here. I guess your last commit didn't really do what you hoped it would.
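
In case it helps, here is a minimal sketch for listing such files before committing. The helper name is hypothetical, and the only assumption is that the stray files carry a doubled `.ipynb.ipynb` suffix somewhere under the directory you pass in.

```python
from pathlib import Path


def find_doubled_notebooks(root):
    """Return the files under `root` whose names end in '.ipynb.ipynb'."""
    return sorted(
        path for path in Path(root).rglob("*.ipynb.ipynb") if path.is_file()
    )
```

Calling `find_doubled_notebooks("docs")` from the repository root should return the paths to review; nothing is deleted or renamed.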

@ahernank
Collaborator

We just need to remove the Af1.0 file here, and all good to go!

@tristanpwdennis merged commit 86f2664 into malariagen:master Oct 22, 2025
1 check failed
@tristanpwdennis deleted the adir1-terms-of-use branch October 22, 2025 07:45
3 participants