
Strengths and Limits of Multiple Sequence Alignment and Filtering Methods

Abstract: Multiple sequence alignment (MSA) is a prerequisite for most phylogenetic analyses. Aligning sequences to unravel residue homology is a challenging task that has received much attention in recent decades. Research in this field has been extremely active from both theoretical and practical standpoints. Numerous tools have been developed to align sequences and, more recently, to post-process those alignments and filter out their most dubious parts. Whether including alignment filtering in a phylogenetic pipeline improves the quality of the inferred phylogenies is still debated. The goal of this chapter is not to provide an exhaustive list of all tools available to produce or filter an MSA, but rather to cover the limitations of current alignment methods and their causes, to highlight key differences among MSA filtering methods, and to provide some practical MSA filtering guidelines. We consider that filtering methods can be subdivided into two main categories. The first includes methods that filter an MSA by entirely removing some sites or sequences from it. The second contains methods that mask individual residues and can therefore extract some information from a site or sequence while disregarding the rest; we call these picky-filtering methods. In our benchmark, the best-performing filtering methods are, as expected, in the picky category. When inferring phylogenies, MSA filtering impacts the inferred tree topology, but it also seems to significantly improve branch length estimations, especially when a picky-filtering method is used.
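To make the distinction between the two categories concrete, here is a minimal toy sketch, not the method of any specific tool discussed in the chapter. It assumes a hypothetical per-residue "dubious" flag has already been computed by some upstream reliability score; the function names `remove_sites` and `mask_residues`, the threshold `max_frac`, and the mask character `N` (a protein alignment would typically use `X`) are all illustrative assumptions. Site removal drops a whole column when too many of its residues are dubious, whereas picky filtering masks only the flagged residues and keeps the reliable ones in every column.

```python
from typing import List


def remove_sites(msa: List[str], dubious: List[List[bool]], max_frac: float = 0.5) -> List[str]:
    """Site-removal filtering (illustrative): drop every column whose
    fraction of dubious residues exceeds max_frac; other columns are kept whole."""
    n_seq = len(msa)
    n_col = len(msa[0])
    keep = [
        j for j in range(n_col)
        if sum(dubious[i][j] for i in range(n_seq)) / n_seq <= max_frac
    ]
    return ["".join(seq[j] for j in keep) for seq in msa]


def mask_residues(msa: List[str], dubious: List[List[bool]], mask_char: str = "N") -> List[str]:
    """Picky filtering (illustrative): replace only the dubious residues
    with mask_char, so every column and every reliable residue is retained."""
    return [
        "".join(mask_char if dubious[i][j] else c for j, c in enumerate(seq))
        for i, seq in enumerate(msa)
    ]


if __name__ == "__main__":
    # Toy DNA alignment and hypothetical reliability flags (True = dubious).
    msa = ["ACGTA", "ACCTA", "TGGTA"]
    dubious = [
        [False, False, True, False, False],
        [False, False, True, False, False],
        [True, False, False, False, False],
    ]
    print(remove_sites(msa, dubious))   # ['ACTA', 'ACTA', 'TGTA']
    print(mask_residues(msa, dubious))  # ['ACNTA', 'ACNTA', 'NGGTA']
```

In this toy case, site removal discards column 3 entirely, losing the reliable `G` of the third sequence, yet it keeps that sequence's dubious `T` in column 1 because the column as a whole passes the threshold. The masking variant keeps the `G` and hides the `T`, which illustrates why a picky approach can retain more usable signal.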

https://hal.archives-ouvertes.fr/hal-02535389
Contributor: Christine Bibal
Submitted on: Thursday, November 26, 2020 - 2:44:32 PM
Last modification on: Friday, December 4, 2020 - 12:40:21 PM

File

chapter_2.2_ranwez_v2.pdf
Publisher files allowed on an open archive

Licence


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

Identifiers

  • HAL Id: hal-02535389, version 2


Citation

Vincent Ranwez, Nathalie Chantret. Strengths and Limits of Multiple Sequence Alignment and Filtering Methods. Scornavacca, Celine; Delsuc, Frédéric; Galtier, Nicolas. Phylogenetics in the Genomic Era, No commercial publisher | Authors open access book, pp.2.2:1-2.2:36, 2020. ⟨hal-02535389v2⟩
