Hi Andrew,
I have a nanopore sequencing library in which a single molecule can produce reads on both the forward and the reverse strand. I would like to know whether preseq regards such a pair as two "distinct" observations or as one "repeated" observation.
The definition of a "distinct" observation is not clearly stated in the documentation, so I tried to find it in the implementation:
Related line:
curr_gr.get_start() != prev_gr.get_start())
Related class:
GenomicRegion.hpp
It seems that get_start() returns the smaller (leftmost) coordinate of a BAM read. Is that correct?
If so, does that mean a forward-mapped and a reverse-mapped read from the same molecule would be treated as the same observation (same start)? That is the desired behavior in my use case.
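To make the question concrete, here is a minimal Python sketch of the behavior I am hoping for. It is not preseq's code: the Read type and count_distinct_by_start are hypothetical names, and I am assuming that only the chromosome and the leftmost start of consecutive sorted reads are compared, with strand ignored.

```python
from typing import Iterable, NamedTuple, Optional


class Read(NamedTuple):
    """Hypothetical stand-in for an aligned read; not a preseq type."""
    chrom: str
    start: int   # leftmost aligned coordinate, regardless of strand
    end: int
    strand: str  # "+" or "-"


def count_distinct_by_start(reads: Iterable[Read]) -> int:
    """Count distinct observations in coordinate-sorted reads.

    Assumption: a read is "distinct" from the previous one only if its
    chromosome or leftmost start differs. Strand is never inspected.
    """
    distinct = 0
    prev: Optional[Read] = None
    for read in reads:
        if prev is None or read.chrom != prev.chrom or read.start != prev.start:
            distinct += 1
        prev = read
    return distinct


# One molecule sequenced on both strands, plus an unrelated molecule.
reads = [
    Read("chr1", 100, 600, "+"),
    Read("chr1", 100, 600, "-"),
    Read("chr1", 250, 900, "+"),
]
print(count_distinct_by_start(reads))
```

Under this assumption the two reads starting at position 100 collapse into one observation even though they are on opposite strands, so the example counts two distinct molecules.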
I would love to hear from you!
The command I used:
./preseq lc_extrap -B -o ./yield-estimates.txt <input-sorted-bam>
Software version:
v2.0 (downloaded as a precompiled binary)
Best,
Li-Ting