Only organisms with completely sequenced genomes were chosen to avoid poor or incomplete sequence data from shotgun or partial genome sequencing projects. For each set of homologous matches,
there were four proteins: the two duplicated genes and one ortholog match for each copy, because only the best and most complete hit to each gene in a pair was selected. For these duplicate pairs, two alternative phylogenetic relationships were predicted. The Type-A relationship was predicted when a protein sequence branched with a homolog (ortholog) from a closely related species rather than with its counterpart protein (paralog) within the R. sphaeroides genome, whereas the Type-B relationship was predicted when the duplicate protein copies within R. sphaeroides branched with each other [28, 33]. Additionally, four example phylogenetic analyses, two exhibiting Type-A phylogeny and two exhibiting Type-B phylogeny, were carried out with gene duplications common among the four R. sphaeroides strains. Protein sequence alignments were carried out using MUSCLE [34], a program known for
its accuracy and speed. Phylogenetic analysis was performed using PhyML [35] with the WAG model [36] to generate unrooted maximum likelihood trees. Bootstrap values were calculated using 100 replications for the trees where topology was being determined. Maximum likelihood trees were constructed for all protein pairs to ascertain the tree topology (Type-A or Type-B). If both genes in a duplicated pair had their highest match to the same ortholog, then the next highest ortholog match, if available, for one of the genes was used in the tree construction so that the duplication topology could be ascertained accurately.

Functional Constraints Analysis

For the functional constraints analysis, comparisons were conducted within all four R. sphaeroides strains. More specifically, the 28 common
gene pairs among the four strains were used for the functional constraints analysis, in which the two genes in a given pair were compared against one another. The synonymous and nonsynonymous substitution rates, along with the nonsynonymous/synonymous substitution rate ratio, were calculated using the modified Yang-Nielsen (MYN) algorithm [37, 38]. MUSCLE was used to align the amino acid sequences [34]. The aligned amino acid sequences were then converted back into their original DNA sequences, after which KaKs_Calculator [39] was used on each pair of DNA sequences to calculate the synonymous substitution rate (Ks), the nonsynonymous substitution rate (Ka), and the nonsynonymous/synonymous rate ratio (ω = Ka/Ks). Under the MYN model, ω = 0.3, 1, and 3 were used for negative (purifying), neutral, and positive selection, respectively [37, 38]. A one-way ANOVA was used to test whether the distributions of ω differed among the four strains.
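
The conversion of an aligned amino acid sequence back into its DNA sequence can be illustrated with a minimal sketch. This is not the script used in the study. The function names `thread_codons` and `codon_align_pair` are hypothetical, and the sketch assumes that each coding sequence is in frame and encodes exactly the aligned protein, optionally followed by one stop codon. Each gap in the protein alignment becomes a three-base gap in the DNA, so codon boundaries are preserved for the Ka/Ks calculation.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map a gapped protein sequence back onto its coding DNA, codon by codon."""
    residues = aligned_protein.replace(gap, "")
    # Assumption: a CDS exactly one codon longer than the protein ends in a stop codon.
    if len(cds) == 3 * (len(residues) + 1):
        cds = cds[:-3]
    if len(cds) != 3 * len(residues):
        raise ValueError("CDS length does not match the ungapped protein length")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    # Emit a three-base gap for each alignment gap, otherwise the next codon.
    return "".join(gap * 3 if ch == gap else next(codons) for ch in aligned_protein)


def codon_align_pair(prot_a: str, prot_b: str, cds_a: str, cds_b: str) -> tuple[str, str]:
    """Build a pairwise codon alignment from a pairwise protein alignment."""
    if len(prot_a) != len(prot_b):
        raise ValueError("aligned proteins must have equal length")
    return thread_codons(prot_a, cds_a), thread_codons(prot_b, cds_b)


if __name__ == "__main__":
    # Toy duplicate pair: the second copy lacks one residue relative to the first.
    dna_a, dna_b = codon_align_pair("MKL", "M-L", "ATGAAACTG", "ATGCTT")
    print(dna_a)
    print(dna_b)
```

The two returned strings have equal length and are aligned codon by codon, which is the form of input that pairwise Ka/Ks estimation requires.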