Recent phylogenetic studies in historical linguistics have focused on lexical data. However, the way that such data are coded into characters for phylogenetic analysis has been approached in different ways, without investigating how coding methods may affect the results. In this paper, we compare three different coding methods for lexical data (multistate meaning-based characters, binary root-meaning characters, and binary cognate characters) in a Bayesian framework, using data from the Tupí-Guaraní and Chapacuran language families as case studies. We show that, contrary to prior expectations, different coding methods can have a significant impact on the topology of the resulting trees.
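To make the contrast between the three coding methods concrete, the sketch below builds all three character matrices from one toy lexicon. It rests on a common reading of the terms, which may differ in detail from the procedure used in the paper: a multistate meaning-based character has one column per meaning with the root as its state, a binary root-meaning character has one column per (meaning, root) pair, and a binary cognate character has one column per root regardless of meaning. The language names, meanings, root labels, and function names are all hypothetical and are not taken from the Tupí-Guaraní or Chapacuran datasets.

```python
# Hypothetical toy data, invented for illustration only.
# Maps (language, meaning) to the label of the cognate root attested there.
LEXICON = {
    ("LangA", "fire"): "r1",
    ("LangB", "fire"): "r1",
    ("LangC", "fire"): "r2",
    ("LangA", "sun"): "r3",
    ("LangB", "sun"): "r2",  # r2 also occurs under "fire" in LangC
    ("LangC", "sun"): "r3",
}


def languages(lexicon):
    """Return the sorted list of language names in the lexicon."""
    return sorted({lang for lang, _ in lexicon})


def multistate_meaning(lexicon):
    """One character per meaning; the state is the root used for it."""
    meanings = sorted({meaning for _, meaning in lexicon})
    return {
        lang: [lexicon.get((lang, m), "?") for m in meanings]
        for lang in languages(lexicon)
    }


def binary_root_meaning(lexicon):
    """One binary character per (meaning, root) pair.

    Missing entries are coded 0 here for simplicity; a real analysis
    would likely code them as unknown instead.
    """
    chars = sorted({(m, root) for (_, m), root in lexicon.items()})
    return {
        lang: [int(lexicon.get((lang, m)) == root) for m, root in chars]
        for lang in languages(lexicon)
    }


def binary_cognate(lexicon):
    """One binary character per root, present if attested in any meaning."""
    roots = sorted(set(lexicon.values()))
    matrix = {}
    for lang in languages(lexicon):
        attested = {root for (l, _), root in lexicon.items() if l == lang}
        matrix[lang] = [int(root in attested) for root in roots]
    return matrix


for name, coder in [
    ("multistate meaning-based", multistate_meaning),
    ("binary root-meaning", binary_root_meaning),
    ("binary cognate", binary_cognate),
]:
    print(name, coder(LEXICON))
```

In this toy case the root `r2` appears under different meanings in LangB and LangC. The root-meaning coding treats those as two unrelated characters, while the cognate coding records them as a shared trait, which is one way the choice of coding could plausibly change the signal available to a tree search.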

Natalia Chousou-Polydouri
Joshua Birchall
Sérgio Meira
Zachary O’Hagan
Lev Michael
Publication date: January 1, 2016
Publication type: 
Chousou-Polydouri, Natalia, Joshua Birchall, Sérgio Meira, Zachary O’Hagan, and Lev Michael. 2016. A test of coding procedures for lexical data with Tupí-Guaraní and Chapacuran languages. Proceedings of the Leiden Workshop on Capturing Phylogenetic Algorithms for Linguistics. University of Tübingen.