Hi all,
I was a bit baffled when opening biopandas PDB output with MDAnalysis: instead of a dozen or so segments, I got thousands. Here's why, and my hacky fix.
Biopandas writes the rows like this:
ATOM 50786 CB ASP q 96 219.123 233.404 332.880 1.00 97.39 C
ATOM 50787 N PRO q 97 222.483 233.701 332.586 1.00 100.66 N
while MDAnalysis expects this format:
ATOM 51419 O UNK r 113 214.624 201.542 285.597 1.00 99.63 O
ATOM 51420 CB UNK r 113 217.297 202.297 286.117 1.00100.32 C
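Because of this formatting, when the B-factor has five digits (100.00 or higher), the biopandas row ends up shifted one column to the right. MDAnalysis then parses the last digit of the B-factor as the segid and uses those digits as segments instead of the chains. See the parser code: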
Line 297:
segids.append(line[66:76].strip())
Lines 304-306:
# If segids not present, try to use chainids
if not any(segids):
    segids = chainids
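To make the column shift concrete, here is a minimal sketch in plain Python. It only mimics the `line[66:76]` slice quoted above and does not call MDAnalysis. The `atom_line` and `segid_field` helpers are made up for illustration, and the extra leading space in the shifted B-factor field is my assumption based on the sample rows above.

```python
def atom_line(bfactor_field: str) -> str:
    """Build an ATOM row: columns 1-60 are fixed-width, the B-factor field is appended as given."""
    head = (
        "ATOM  "      # cols 1-6   record name
        "50787"       # cols 7-11  serial
        " "           # col 12
        " N  "        # cols 13-16 atom name
        " "           # col 17     altLoc
        "PRO"         # cols 18-20 residue name
        " "           # col 21
        "q"           # col 22     chain ID
        "  97"        # cols 23-26 residue number
        " "           # col 27     insertion code
        "   "         # cols 28-30
        " 222.483"    # cols 31-38 x
        " 233.701"    # cols 39-46 y
        " 332.586"    # cols 47-54 z
        "  1.00"      # cols 55-60 occupancy
    )
    assert len(head) == 60
    # Pad up to column 76, then put the element symbol in columns 77-78.
    return (head + bfactor_field).ljust(76) + " N"


def segid_field(line: str) -> str:
    """Same slice as the parser line quoted above."""
    return line[66:76].strip()


standard = atom_line("100.66")   # B-factor fills columns 61-66 exactly
shifted = atom_line(" 100.66")   # assumed extra separator pushes it to columns 62-67

print(repr(segid_field(standard)))  # ''
print(repr(segid_field(shifted)))   # '6'
```

With the shifted row, the trailing digit of the B-factor lands in column 67, which is the first character of the slice the parser reads as the segid.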
As a quick fix, I commented out that last if condition in MDAnalysis, so the chain IDs are always used as segids.
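For anyone wondering why a single stray digit is enough to break things, here is a rough sketch of the fallback logic in isolation. `resolve_segids` and its `always_use_chainids` flag are hypothetical names for this illustration, not part of MDAnalysis. The flag stands in for the effect of the quick fix, assuming it makes the chain-ID assignment unconditional.

```python
def resolve_segids(segids, chainids, always_use_chainids=False):
    # Original behaviour: fall back to chain IDs only when every segid is empty.
    # With always_use_chainids=True the parsed segids are ignored entirely.
    if always_use_chainids or not any(segids):
        return chainids
    return segids


chainids = ["q", "q", "q"]
segids = ["", "6", ""]  # one stray B-factor digit among otherwise empty segids

print(resolve_segids(segids, chainids))                            # ['', '6', '']
print(resolve_segids(segids, chainids, always_use_chainids=True))  # ['q', 'q', 'q']
```

Since `any(segids)` is true as soon as one row has a non-empty segid, a single B-factor of 100.00 or more is enough to switch off the chain-ID fallback for the whole file.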