Replies: 1 comment
I haven't tested anything, but I'm getting more and more interested in using SQLite for quick BAM queries, because it could also scale to larger queries using BigQuery or be parallelized per library. I'm curious about your thoughts on using Levenshtein distance from within SQLite. I guess it depends on how long the sequence is, and on whether you're matching very small subsequences or the majority of each read. For single-cell and adjacent work, it could be nice to match cell barcodes more accurately, since some programs just use simple Hamming distance, which doesn't handle indels at all. Or maybe a variant of this.
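To make the Hamming vs. Levenshtein point concrete, here is a minimal sketch using Python's standard `sqlite3` module. It assumes the distance functions are registered as user-defined SQL functions (this is not a built-in SQLite feature here), and the `reads` table, its columns, and the barcodes are hypothetical. A single deleted base shifts the rest of the barcode, so Hamming distance blows up while edit distance stays small.

```python
import sqlite3

def hamming(a, b):
    """Mismatch count; only defined for equal-length strings."""
    if a is None or b is None or len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (substitutions + indels)."""
    if a is None or b is None:
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match / substitution
        prev = cur
    return prev[-1]

conn = sqlite3.connect(":memory:")
# Register the Python functions so they can be called from SQL.
conn.create_function("hamming", 2, hamming)
conn.create_function("levenshtein", 2, levenshtein)

# Hypothetical table of reads with an 8-base cell barcode.
conn.execute("CREATE TABLE reads (name TEXT, barcode TEXT)")
conn.executemany("INSERT INTO reads VALUES (?, ?)", [
    ("exact",     "ACGTACGT"),
    ("one_sub",   "ACGTACGA"),  # one substitution
    ("one_del",   "ACTACGTA"),  # one base deleted, next base read in at the end
    ("unrelated", "TTTTGGGG"),
])

known_barcode = "ACGTACGT"
rows = conn.execute(
    """SELECT name, hamming(barcode, ?), levenshtein(barcode, ?)
       FROM reads
       WHERE levenshtein(barcode, ?) <= 2
       ORDER BY rowid""",
    (known_barcode, known_barcode, known_barcode),
).fetchall()

for name, ham, lev in rows:
    print(name, ham, lev)
```

Note that this is a full table scan that calls the Python function once per row, so it only illustrates the matching behavior; it says nothing about how an index should be built.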
We'd like to add sequence content indexing, to answer queries for stored DNA/RNA [sub]sequences with similarity to a given one. I'd like advice from the community on what data structures / algos should implement this.
For a table target_table where each row has a column storing DNA/RNA text, after some sort of indexing we can query it for rows whose sequence is similar to query_sequence, where query_sequence is a literal DNA/RNA text.

Wish list:
Non-goals: