4 points by zmactep 6 hours ago | 1 comment
  • dalke 5 hours ago
    You might mention it in other forums, like the RDKit mailing list (though that's almost moribund).

    I looked at the SDF reader, since that's what I know best. I see a few things which look like they need revisiting.

    Line 75 has 'if name == "$$$$" {return self.parse_molecule();}' This isn't correct. A first line of "$$$$" means either that the record name is "$$$$" (if you are RDKit), or that the record is in the wrong format (if you are the CTFile specification, which explicitly prohibits that). Neither reading lets you silently skip the line.

    Also, does Rust have tail recursion? If not, the recursive nature of the code makes me think parsing a file containing 1 million lines of the form "$$$$\n" would likely blow the stack.
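
    For reference, Rust does not guarantee tail-call elimination, so a loop is the safer shape here. A minimal sketch, assuming the whole file is already in memory as a string; `split_records` is a hypothetical name, not a function from the project. It treats every bare "$$$$" line as a delimiter, a simplification that ignores the data-item case described further down:

```rust
/// Split SD file text into records with a plain loop, so the stack depth
/// stays constant no matter how many empty records the input contains.
///
/// Simplification for this sketch: every line that is exactly "$$$$" is
/// treated as a record delimiter. A real reader must also track whether it
/// is inside a data item before making that decision.
pub fn split_records(text: &str) -> Vec<Vec<&str>> {
    let mut records = Vec::new();
    let mut current = Vec::new();
    for line in text.lines() {
        if line == "$$$$" {
            // Close the current record (possibly empty) and start a new one.
            records.push(std::mem::take(&mut current));
        } else {
            current.push(line);
        }
    }
    // Keep a trailing record that was not terminated by "$$$$".
    if !current.is_empty() {
        records.push(current);
    }
    records
}
```

    With this shape, a file of one million "$$$$" lines produces one million empty records and never recurses.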

    In principle the version number test for V2000 or V3000 should look at the specific column numbers, and not just check for its presence somewhere in the line. Someone like me might place a "V3000" in the obsolete fields, with a "V2000" in the correct vvvvvv field. ;)
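
    A minimal sketch of a column-based check, assuming the counts line follows the CTFile layout of eleven 3-character fields followed by the 6-character vvvvvv field (columns 34-39, counting from 1). The names `CtabVersion` and `ctab_version` are hypothetical, and trimming the field is a leniency this sketch adds, not something the specification asks for:

```rust
/// Version found in the vvvvvv field of a counts line (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum CtabVersion {
    V2000,
    V3000,
    Unknown,
}

/// Read the version from columns 34-39 (1-based) of the counts line,
/// i.e. bytes 33..39, instead of searching the whole line.
pub fn ctab_version(counts_line: &str) -> CtabVersion {
    // `str::get` returns None if the line is too short or the range does
    // not fall on character boundaries, so this cannot panic.
    match counts_line.get(33..39).map(str::trim) {
        Some("V2000") => CtabVersion::V2000,
        Some("V3000") => CtabVersion::V3000,
        _ => CtabVersion::Unknown,
    }
}
```

    A line carrying "V3000" in the obsolete fields and " V2000" in columns 34-39 is reported as V2000, where a substring search for "V3000" would get it wrong.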

    The "Skip to end of molecule" code will break on real-world datasets. One classic problem is a company which used "$", "$$", "$$$" and "$$$$" to indicate cost, stored as tag data like:

      > <price>
      $$$$
    
      $$$$
    
    where the first "$$$$" is part of the data item, and the second "$$$$" is the end of the SD record. This ended up causing a problem when an SDF reader somewhere in their system didn't parse data items correctly. (Another common failure in data item parsing is to ignore the requirement for a newline after the data item.)
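
    A minimal sketch of data-item parsing that copes with the price example, assuming the caller has already consumed the molblock through its "M  END" line. `parse_data_items` is a hypothetical name; it keeps each header line verbatim rather than extracting the tag, and it silently ignores lines it does not recognize:

```rust
/// Parse the data items of one SD record.
///
/// `lines` must be positioned just after the "M  END" line. Returns
/// (header line, data lines) pairs and consumes input up to and including
/// the "$$$$" record delimiter.
pub fn parse_data_items<'a, I>(lines: &mut I) -> Vec<(String, Vec<String>)>
where
    I: Iterator<Item = &'a str>,
{
    let mut items = Vec::new();
    while let Some(line) = lines.next() {
        if line == "$$$$" {
            // Outside a data item, "$$$$" really is the end of the record.
            break;
        }
        if line.starts_with('>') {
            // Inside a data item, every line up to the terminating blank
            // line is data, including a line that reads "$$$$".
            let mut data = Vec::new();
            while let Some(data_line) = lines.next() {
                if data_line.is_empty() {
                    break;
                }
                data.push(data_line.to_string());
            }
            items.push((line.to_string(), data));
        }
        // Anything else is skipped in this sketch.
    }
    items
}
```

    The point of the design is that the delimiter test only runs between data items, so the first "$$$$" in the example is stored as the price and the second one ends the record.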

    I talk about "$$$$" more at http://www.dalkescientific.com/writings/diary/archive/2020/0... .

    Then there's the "S SKP" field, which you'll almost certainly never see in real life! I've only seen it used in a published example of a JICST extended MOLfile. See http://www.dalkescientific.com/writings/diary/archive/2020/0...

    Please don't let these comments get you down! These details are hard to get, and not obvious. It took me years to learn the rare corner cases.

    I also haven't done molviz since the 1990s, or used PyMol (I was a VMD person), so I can't say anything about the overall project. We started with GL, and had to port to OpenGL. :)

    PS. A bit of history for you. PyMol's and VMD's selection syntaxes look similar because both drew on the syntax in Axel Brunger's X-PLOR. Warren DeLano came out of Brunger's lab, and VMD was from Schulten's group, who were X-PLOR users. (Schulten was Brunger's PhD advisor.)