Front cover image for Protein structure comparison : from contact map overlap maximisation to distance-based alignment search tool

Protein structure comparison : from contact map overlap maximisation to distance-based alignment search tool

In molecular biology, a fruitful assumption is that proteins sharing close three dimensional structures may share a common function and in most cases derive from a same ancestor. Computing the similarity between two protein structures is therefore a crucial task and has been extensively investigated. Among all the proposed methods, we focus on the similarity measure called Contact Map Overlap maximisation (CMO), mainly because it provides scores which can be used for obtaining good automatic classifications of the protein structures. In this thesis, comparing two protein structures is modelled as finding specific sub-graphs in specific k-partite graphs called alignment graphs. Then, we model CMO as a kind of maximum edge induced sub-graph problem in alignment graphs, for which we conceive an exact solver which outperforms the other CMO algorithms from the literature. Even though we succeeded to accelerate CMO, the procedure still stays too much time consuming for large database comparisons. To further accelerate CMO, we propose a hierarchical approach for CMO which is based on the secondary structure of the proteins. Finally, although CMO is a very good scoring scheme, the alignments it provides frequently posses big root mean square deviation values. To overcome this weakness, we propose a new comparison method based on internal distances which we call DAST (for Distance-based Alignment Search Tool). It is modelled as a maximum clique problem in alignment graphs, for which we design a dedicated solver with very good performances
Thesis, Dissertation, English, 2010