Recent phylogenetic studies in historical linguistics have focused on lexical data. However, such data have been coded into characters for phylogenetic analysis in different ways, without investigation of how the choice of coding method may affect the results. In this paper, we compare three coding methods for lexical data (multistate meaning-based characters, binary root-meaning characters, and binary cognate characters) in a Bayesian framework, using data from the Tupí-Guaraní and Chapacuran language families as case studies. We show that, contrary to prior expectations, different coding methods can have a significant impact on the topology of the resulting trees.
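The three coding methods named above are not defined in this abstract, so the following Python sketch shows only one plausible reading of them, applied to an invented toy lexicon. The language names, meanings, root labels, and function names are all hypothetical and are not taken from the Tupí-Guaraní or Chapacuran datasets. The sketch assumes that a multistate meaning-based character is one character per meaning with the root as its state, that a binary root-meaning character records presence or absence of a root in a specific meaning, and that a binary cognate character records presence or absence of a root in any meaning.

```python
from collections import defaultdict

# Hypothetical toy lexicon: (language, meaning) -> root (cognate-set) label.
# Every language has exactly one form per meaning, so missing data and
# synonyms are not handled in this sketch.
LEXICON = {
    ("LangA", "hand"): "r1",
    ("LangB", "hand"): "r1",
    ("LangC", "hand"): "r2",
    ("LangA", "arm"): "r3",
    ("LangB", "arm"): "r2",
    ("LangC", "arm"): "r3",
}


def multistate_meaning(lexicon):
    """One character per meaning; the state is the root filling that meaning."""
    languages = sorted({lang for lang, _ in lexicon})
    meanings = sorted({meaning for _, meaning in lexicon})
    return {
        lang: {meaning: lexicon[(lang, meaning)] for meaning in meanings}
        for lang in languages
    }


def binary_root_meaning(lexicon):
    """One binary character per attested (root, meaning) pair."""
    languages = sorted({lang for lang, _ in lexicon})
    pairs = sorted({(root, meaning) for (_, meaning), root in lexicon.items()})
    return {
        lang: {
            (root, meaning): int(lexicon[(lang, meaning)] == root)
            for root, meaning in pairs
        }
        for lang in languages
    }


def binary_cognate(lexicon):
    """One binary character per root, present if attested in any meaning."""
    roots = sorted(set(lexicon.values()))
    roots_by_language = defaultdict(set)
    for (lang, _), root in lexicon.items():
        roots_by_language[lang].add(root)
    return {
        lang: {root: int(root in roots_by_language[lang]) for root in roots}
        for lang in sorted(roots_by_language)
    }


if __name__ == "__main__":
    for name, coder in [
        ("multistate meaning-based", multistate_meaning),
        ("binary root-meaning", binary_root_meaning),
        ("binary cognate", binary_cognate),
    ]:
        print(name)
        for lang, row in coder(LEXICON).items():
            print(" ", lang, row)
```

In this toy case the root `r2` means "arm" in LangB and "hand" in LangC. Under the assumed binary cognate coding the two languages share the character `r2`, whereas under the assumed binary root-meaning coding they share no `r2` character, which illustrates how the choice of coding can change the signal passed to a tree-inference method.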
January 1, 2016
Chousou-Polydouri, Natalia, Joshua Birchall, Sergio Meira, Zachary O'Hagan, and Lev Michael. 2016. A test of coding procedures for lexical data with Tupí-Guaraní and Chapacuran languages. Proceedings of the Leiden Workshop on Capturing Phylogenetic Algorithms for Linguistics. University of Tübingen.