
RT-PCR was carried out according to a previously described procedure. The primers used in this study are listed in Additional file 2: Table S30. Fold changes in gene expression were calculated with the 2^−ΔΔCt method, with the Ct values at each developmental stage normalized against an endogenous control.

Mapping of reads and calculation of gene expression level
Reads obtained by SOLiD sequencing were aligned to version 9 of the soybean genome assembly using the LifeScope software package, which maps reads to the reference with a seed-and-extend approach. Normalized gene expression levels were calculated as reads per kilobase of mRNA length per million mapped reads (RPKM) with GFOLD v1.0.7. To set a threshold for detectable expression above background, we compared the expression levels of annotated genes with those of intergenic regions.
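The two quantities above, the RT-PCR fold change and RPKM, each reduce to a short formula. The Python sketch below shows the standard 2^−ΔΔCt calculation and the textbook RPKM definition. The function and argument names are hypothetical, treating one developmental stage as the calibrator is an assumption, and GFOLD's own normalization may differ in detail from this raw RPKM formula.

```python
def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """Relative expression by the 2^-ddCt method.

    ct_ref_* are Ct values of the endogenous control gene; the calibrator is an
    assumed reference stage against which the other stages are compared.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample              # normalize to control
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator          # compare to calibrator
    return 2.0 ** (-delta_delta_ct)


def rpkm(reads_on_gene: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of mRNA length per million mapped reads."""
    # reads / (length in kb * mapped reads in millions) == reads * 1e9 / (length * total)
    return reads_on_gene * 1e9 / (gene_length_bp * total_mapped_reads)


print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))  # 8.0
print(rpkm(500, 2000, 10_000_000))               # 25.0
```

In the first call, ΔCt is 4 for the sample and 7 for the calibrator, so ΔΔCt = −3 and the target is 2^3 = 8 times more abundant in the sample.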

A value of 0.25 RPKM separated the annotated genes into two large clusters and was therefore used as the threshold between expressed and unexpressed genes. Differentially expressed genes (DEGs) were then identified with the GFOLD diff program. A gene was defined as preferentially expressed in a given tissue if it had RPKM ≥ 4 in that tissue and GFOLD ≥ 1 relative to each of the other tissues.

Identification of putative paralogs and differential expression analysis
We used the MCScanX software to identify potential paralogous clusters; whole-genome duplication (WGD) and tandem duplication (TD) genes were detected with default parameters. Differential expression of paralogs was analyzed from log2-normalized RPKM values across the 11 samples, and statistical significance was assessed with a t-test.
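The expressed/unexpressed cut-off and the tissue-preferential rule above can be written as two small predicates. This is a minimal sketch, assuming the GFOLD values have already been computed with GFOLD diff for the tissue of interest against each other tissue; GFOLD itself is not reimplemented. Reading "GFOLD 1 and RPKM 4" as lower bounds (≥) is an assumption, and the function names, dictionary layout and tissue names are hypothetical.

```python
from typing import Mapping

# Thresholds from the text; treating them as inclusive lower bounds is an assumption.
EXPRESSED_RPKM = 0.25
PREFERENTIAL_MIN_GFOLD = 1.0
PREFERENTIAL_MIN_RPKM = 4.0


def is_expressed(rpkm: float, threshold: float = EXPRESSED_RPKM) -> bool:
    """Call a gene expressed when its RPKM reaches the background cut-off."""
    return rpkm >= threshold


def is_tissue_preferential(tissue: str,
                           rpkm_by_tissue: Mapping[str, float],
                           gfold_vs_others: Mapping[str, float]) -> bool:
    """True if `tissue` passes the RPKM floor and the GFOLD cut-off versus every other tissue.

    gfold_vs_others[t] is assumed (hypothetical layout) to hold the precomputed
    GFOLD value of `tissue` compared with tissue t.
    """
    if rpkm_by_tissue[tissue] < PREFERENTIAL_MIN_RPKM:
        return False
    others = [t for t in rpkm_by_tissue if t != tissue]
    return all(gfold_vs_others[t] >= PREFERENTIAL_MIN_GFOLD for t in others)


rpkm = {"root": 12.0, "leaf": 3.0, "flower": 0.1}  # hypothetical RPKM values
print(is_expressed(rpkm["flower"]))                                 # False
print(is_tissue_preferential("root", rpkm, {"leaf": 1.8, "flower": 2.5}))  # True
```

Requiring the GFOLD condition against all other tissues, rather than against their average, follows the text's "compared to all the other tissues" wording.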

Correlation analysis
A correlation matrix comparing the whole-transcriptome profiles of the 11 samples was computed in R, with Pearson's correlation coefficient as the statistical metric. Log2-normalized RPKM values from the RNA-seq dataset were used to build the matrix, and R scripts were then used to analyze the correlations among samples. Correlation coefficients were converted into distances to define the height scale of the dendrogram, and the correlation heat map was drawn with the pheatmap function of the pheatmap package.

Discovery of NTRs and RT-PCR validation
We used Cufflinks to assemble transcripts from the high-quality reads mapped by LifeScope and, by comparison with the annotated soybean genome, retained intergenic transcripts (class code "u") that met the following criteria: length greater than 150 bp, at least 10 reads, and detection in at least two tissue samples.

Based on these criteria, we obtained 6,718 high-confidence NTRs. RNA-seq reads were visualized on the soybean genome with the inGap software. Ten randomly selected NTRs were verified by RT-PCR using specific primers designed for this study. In addition, BLAST was used to search the nTUs against the Rfam database.

