Genome Annotation
Extract Genes from Non-annotated Genome
For de novo assembled genomes, the `gtf` file from the reference genome does not match the positions in the de novo genome, because the assembly carries many novel deletions, insertions, and other changes that shift the coordinates. So we cannot use it to locate genes, introns, or exons.
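To see why the reference coordinates drift, here is a minimal sketch of how upstream indels move a position. It assumes indels are given as hypothetical `(ref_pos, delta)` pairs in VCF style: `ref_pos` is the 1-based anchor base, `delta > 0` inserts bases right after it and `delta < 0` deletes bases right after it. The position being mapped is assumed not to fall inside a deletion.

```python
def shift_position(ref_pos, indels):
    """Map a 1-based reference position onto the assembly.

    indels: iterable of (anchor_pos, delta); delta > 0 is an insertion of
    delta bases after anchor_pos, delta < 0 a deletion of -delta bases
    after anchor_pos. Assumes ref_pos itself was not deleted.
    """
    # Only indels anchored upstream of ref_pos change its coordinate
    offset = sum(delta for anchor, delta in indels if anchor < ref_pos)
    return ref_pos + offset


# A gene starting at reference position 1000, with a 5 bp insertion after
# position 200 and a 2 bp deletion after position 500, starts at 1003 in
# the assembly, so the reference gtf start (1000) is already off by 3.
print(shift_position(1000, [(200, 5), (500, -2)]))
```

Every indel upstream of a gene shifts it, so on a whole genome the offsets accumulate and the reference `gtf` quickly points at the wrong bases.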
There are a few ways to achieve this goal:
- genome annotation
- align the reference genes to your genome
- call them from the `vcf` file directly
Call genes from VCF file
This is the trickiest way: if we have the variant files, we can call the genes based on their locations in the reference `gtf`, and we don't even need to assemble our genomes.
Cons:
- We only have SNP information.
- Structural variations such as inversions and duplications cannot be detected.
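A minimal sketch of the VCF route, assuming the reference genome is already loaded into a dict of sequences, the reference `gtf` has `gene` rows carrying a `gene_id` attribute, and the sample is haploid or homozygous (e.g. mitochondria), so every ALT allele is applied regardless of genotype. In line with the cons above, only single-base SNPs are used; indels and multi-allelic sites are skipped. The function names are hypothetical. For real data, `bcftools consensus` applies the variants of a VCF to a reference fasta.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_gtf_genes(gtf_text):
    """Return {gene_id: (chrom, start, end, strand)} from the 'gene' rows of a GTF."""
    genes = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, _src, feature, start, end, _score, strand, _frame, attrs = line.split("\t")
        if feature != "gene":
            continue
        # Attributes look like: gene_id "ND1"; gene_name "ND1";
        gene_id = attrs.split('gene_id "')[1].split('"')[0]
        genes[gene_id] = (chrom, int(start), int(end), strand)
    return genes


def parse_vcf_snps(vcf_text):
    """Return {(chrom, pos): alt} for single-base SNP records only."""
    snps = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # Skip indels (len != 1) and multi-allelic sites such as "A,G"
        if len(ref) == 1 and len(alt) == 1:
            snps[(chrom, int(pos))] = alt
    return snps


def call_gene(ref_seqs, gene, snps):
    """Apply SNPs inside the gene's reference interval and return its sequence."""
    chrom, start, end, strand = gene
    # GTF and VCF are both 1-based and inclusive: position p is index p - 1
    bases = list(ref_seqs[chrom][start - 1:end])
    for pos in range(start, end + 1):
        alt = snps.get((chrom, pos))
        if alt is not None:
            bases[pos - start] = alt
    seq = "".join(bases)
    if strand == "-":
        # Report minus-strand genes 5'->3' on their own strand
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq
```

Because the sequence never changes length, the reference `gtf` coordinates stay valid, which is exactly why this shortcut works and also why it cannot see indels or structural variants.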
Genome Annotation
One of the easiest ways to extract genes from a non-annotated genome is to annotate it.
MITOS WebServer.
On this server, you just need to submit your `fasta` file and wait. A quick test on a mitochondrial genome shows that it annotates not only the genes but also the tRNAs:
Name | Start | Stop | Strand | Length | Structure |
---|---|---|---|---|---|
trnI(atc) | 1 | 65 | + | 65 | svg ps |
nad5-1_a | 58 | 120 | + | 63 | |
trnQ(caa) | 97 | 165 | - | 69 | svg ps |
trnM(atg) | 171 | 239 | + | 69 | svg ps |
The `svg` (and `ps`) links in the Structure column are the secondary structures of the tRNAs.
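Judging from the rows shown, Length equals Stop − Start + 1 (e.g. 1 to 65 gives 65), so the table uses 1-based, inclusive coordinates, while BED uses 0-based, half-open intervals. A minimal sketch converting the four rows above to BED6 lines; the chromosome name `chrM`, the score `0`, and the function name `to_bed` are assumptions for illustration.

```python
# Rows copied from the table above: (name, start, stop, strand, length)
ROWS = [
    ("trnI(atc)", 1, 65, "+", 65),
    ("nad5-1_a", 58, 120, "+", 63),
    ("trnQ(caa)", 97, 165, "-", 69),
    ("trnM(atg)", 171, 239, "+", 69),
]


def to_bed(chrom, rows):
    """Convert 1-based inclusive (start, stop) rows to 0-based half-open BED6 lines."""
    lines = []
    for name, start, stop, strand, length in rows:
        # Sanity check of the 1-based inclusive convention
        if stop - start + 1 != length:
            raise ValueError(f"{name}: length does not match start/stop")
        # BED start is shifted down by one; BED end equals the inclusive stop
        lines.append(f"{chrom}\t{start - 1}\t{stop}\t{name}\t0\t{strand}")
    return lines


for line in to_bed("chrM", ROWS):
    print(line)
```

Note that features may overlap (trnI ends at 65 while nad5-1_a starts at 58), so downstream tools should not assume disjoint intervals.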
Compared with the reference `gtf` file, the overall quality of this annotation is good. The tRNA predictions have a very high true-positive ratio, and most of the genes were annotated as well as the tRNAs. You can download the results as a BED file, a GFF file, or in other annotation formats.
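Once the annotation is downloaded, the genes can be cut out of the genome. A minimal sketch, assuming the BED file is standard tab-separated BED6 (chrom, start, end, name, score, strand) and that its chromosome names match the first word of the `fasta` headers; the exact column layout of the MITOS export is an assumption here. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(text):
    """Parse FASTA text into {id: sequence}, using the first word of each header."""
    seqs, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_features(fasta_text, bed_text):
    """Yield (name, sequence) per BED6 record; minus-strand features are reverse-complemented."""
    genome = read_fasta(fasta_text)
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, _score, strand = line.split("\t")[:6]
        # BED intervals are 0-based and half-open, so they slice directly
        seq = genome[chrom][int(start):int(end)]
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        yield name, seq
```

If you use the GFF download instead, remember that GFF coordinates are 1-based and inclusive, so the slice becomes `[start - 1:end]`.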
All annotated genes:
https://karobben.github.io/2022/09/16/Bioinfor/genefromgenome/