PRE-PRINT: Triplet Codon Block Shannon Entropy (TCBShE) in terms of GC(1,2,3)% equates to Napier Constant for Model Organisms, and Harmonically Averages to same approximately: a Penta-Clado-genic Quantitative Survey across ~14.45 million Transcripts Clustered by 1118 Species

Bibliographic details
Title: PRE-PRINT: Triplet Codon Block Shannon Entropy (TCBShE) in terms of GC(1,2,3)% equates to Napier Constant for Model Organisms, and Harmonically Averages to same approximately: a Penta-Clado-genic Quantitative Survey across ~14.45 million Transcripts Clustered by 1118 Species
Authors: Praharshit Sharma, Kuralayanapalya Puttahonnappa Suresh, Divakar Hemadri, Sharanagouda Patil, Anirban Guha
Publication information: Zenodo
Publication year: 2021
Collection: Zenodo
Subject terms: Bioinformatics, Computational Biology, Genetic Coding, Biological Information Theory, Mathematical Induction, Numerical Verification, Entropy Optimization
Description: Background- Several research efforts have so far addressed Triplet Codon Block Shannon Entropy [TCBShE] computations in the context of the genetic codon [6]. Although the dependence of block Shannon entropy values on GC% has been assessed, GC-1%, GC-2% and GC-3% specifically have not yet been taken into consideration. Here, we use datasets from GC.evoBase (Dapeng Wang, 2016) to determine typical TCBShE values and arrive at an interesting mathematical and numerical correlation, worthy of biological interpretation. Results- We carried out a comprehensive survey of the GC-1,2,3% values of 1118 species across 5 clades from the GC.evoBase datasets: 735 Fungi genomes, 68 Metazoa genomes, 44 Plant genomes, 186 Protist genomes and 85 Vertebrate Ensembl-release genomes. To compute TCBShE, we applied the appropriate formula, based on the 64 codon trimers binarily classified into 8 sets of 3 Blocks {000, 001, 010, 011, 100, 101, 110 and 111}. We observe that the harmonic mean (HM) of these entropy values, which in the language of information theory and coding is the ratio of “Mutual Information to complement of Normalized Variation of Information”, and, for many model organisms, the TCBShE values themselves, converge approximately to Napier’s constant, the base of natural logarithms. The HM of TCBShE for “Protists” is nearest to e ~ 2.71828. Conclusions- The approximation to Napier’s constant obtained by considering the HM of TCBShE is a sort of lower bound and is expressed in bits. This may be corroborated by the direct implications of solving the HyperProteoGenomic equation, as follows: https://www.wolframalpha.com/input/?i=Solve+4%5E%284%5Ex%29+%3D+20%5E%2820%5E1%29 where, in the equation above, 4 = number of cDNA nucleotides (A|C|G|T), 20 = number of amino acids, and, interestingly, x is 99.9455% close to e, Napier’s constant.
Moreover, we may envisage “predicting” Modulo-3 (0,1,2) for the 1st, 2nd and 3rd codon positions by ...
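The closeness figure quoted in the description for the equation 4^(4^x) = 20^(20^1) can be checked numerically. The sketch below is not taken from the pre-print. It assumes that "close to e" means the plain ratio x/e, and the function name `solve_x` and its parameter names are hypothetical. Taking logarithms twice gives x = log_4(20 · ln 20 / ln 4).

```python
import math


def solve_x(bases: int = 4, residues: int = 20) -> float:
    """Solve bases**(bases**x) == residues**(residues**1) for x.

    `bases` and `residues` are illustrative names (assumptions), standing for
    the 4 cDNA nucleotides and the 20 amino acids named in the description.
    """
    # First logarithm: bases**x * ln(bases) = residues * ln(residues)
    inner = residues * math.log(residues) / math.log(bases)
    # Second logarithm, base `bases`, isolates x.
    return math.log(inner, bases)


x = solve_x()
# Assumption: "closeness" is read as the simple ratio x / e, in percent.
closeness_pct = 100 * x / math.e
print(f"x = {x:.5f}, e = {math.e:.5f}, x/e = {closeness_pct:.4f}%")
```

Under that reading, the computed ratio agrees with the 99.9455% stated in the description.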
Document type: report
Language: English
Relation: https://doi.org/10.5281/zenodo.5184730; https://doi.org/10.5281/zenodo.5184731; oai:zenodo.org:5184731
DOI: 10.5281/zenodo.5184731
Availability: https://doi.org/10.5281/zenodo.5184731
Rights: info:eu-repo/semantics/openAccess ; Creative Commons Attribution 4.0 International ; https://creativecommons.org/licenses/by/4.0/legalcode
Accession number: edsbas.FA75551
Database: BASE