Data formats

The FLUX CAPACITOR currently uses the file formats listed here.

BED

The BED file format, used by the UCSC Genome Browser to describe track objects, is used in the FLUX package to describe (possibly spliced) read alignments. BED is usually expanded as "Browser Extensible Data", but it may be easier to remember as "Begin End Discrepancy" because of the following quirk: BED files use 0-based coordinates, although the UCSC visual interface displays 1-based genomic coordinates. Moreover, the position in the "begin" field is included in the described object, whereas the "end" field gives the first excluded position. Converting between BED coordinates and conventional 1-based coordinates with included end positions therefore changes only the start position: +1 from BED to the rest of the world, and -1 the other way around.
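The conversion rule above can be sketched in a few lines of Python. The function names are hypothetical and not part of the FLUX package; the sketch only illustrates that the start position shifts by one while the end position stays the same.

```python
def bed_to_one_based(start, end):
    """Convert a BED interval (0-based start, excluded end)
    to a 1-based interval with an included end."""
    return start + 1, end


def one_based_to_bed(start, end):
    """Convert a 1-based interval with an included end
    to BED coordinates (0-based start, excluded end)."""
    return start - 1, end


# The first 100 bases of a sequence: 0..100 in BED, 1..100 elsewhere.
print(bed_to_one_based(0, 100))   # (1, 100)
print(one_based_to_bed(1, 100))   # (0, 100)
```

In both conventions the interval length is the same number of bases; in BED it is simply end - start, whereas in 1-based included coordinates it is end - start + 1.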

GTF

GTF ("Gene Transfer Format") is a standard for describing exon structures. The FLUX package parser reads GTF as specified in AStalavista. For output, the format is extended with the additional feature "as_event", which is not part of the original specification. In the input, FLUX expects lines like:

AB000381 gene_id exon        150  200  .  +  .  gene_id "AB000381.000"; transcript_id "AB000381.000.1";

As the definition of a gene is currently drifting, the attribute gene_id is not very important for the program run. In contrast, transcript_id is essential for assigning exons correctly. Please also check that the position of the transcript identifier is consistent across your file: for efficiency, the parser expects the transcript ID in the same whitespace-separated token as in the first line of the file. You may use additional features in your input file, but they are ignored. FLUX outputs GTF lines like these:
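A quick way to check the token-position requirement before running FLUX is sketched below in Python. This is an illustration only, not the FLUX parser itself: the function names are hypothetical, and skipping empty lines and lines starting with "#" is an assumption of the sketch.

```python
def transcript_id_position(line):
    """Return the index of the whitespace-separated token that follows
    the 'transcript_id' key, or None if the key is missing."""
    tokens = line.split()
    if "transcript_id" not in tokens:
        return None
    return tokens.index("transcript_id") + 1


def find_inconsistent_lines(lines):
    """Return the 1-based numbers of lines whose transcript ID is not in
    the same token position as in the first record of the file."""
    # Assumption: empty lines and '#' comment lines are not records.
    records = [(number, line) for number, line in enumerate(lines, start=1)
               if line.strip() and not line.startswith("#")]
    if not records:
        return []
    expected = transcript_id_position(records[0][1])
    if expected is None:
        raise ValueError("first record has no transcript_id attribute")
    return [number for number, line in records
            if transcript_id_position(line) != expected]


good = ('AB000381 gene_id exon 150 200 . + . '
        'gene_id "AB000381.000"; transcript_id "AB000381.000.1";')
swapped = ('AB000381 gene_id exon 300 400 . + . '
           'transcript_id "AB000381.000.1"; gene_id "AB000381.000";')

print(find_inconsistent_lines([good, swapped]))   # [2]
```

The second line is reported because its attributes are in a different order, so the transcript ID ends up in a different token than in the first line.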

AB000381 gene_id exon        150  200  .  +  .  gene_id "AB000381.000"; transcript_id "AB000381.000.1";

Additionally, the feature as_event with several new attributes is used in the output:

chr4    Undefined       as_event        328219  357691  1.50000 +      . transcript_id "uc003fzz.2,uc003gaa.2"; gene_id "uc003gab.2"; flanks "328219^,null"; structure "1-2),3-4)"; splice_chain "341997-342339),356453-357691)"; sources "Undefined,Undefined"; dimension "2_4"; degree "4"; exon_RPKM "0.0,8.64427"; transcript_RPKM "8.911525,8.996653"; falsification "0.46898872";

chr4    Undefined       as_event        14618027  14627910  1.66666 +       .      transcript_id "uc003gni.1/uc003gnk.1,uc003gnl.1/uc003gnn.1"; gene_id "uc003gnk.1"; flanks "14618027-,14627910-"; structure "1^3-4^,2^"; splice_chain "14618308^14619060-14619149^,14618317^"; sources "Undefined,Undefined"; dimension "2_3"; degree "4"; exon_RPKM "0.0,0.0"; transcript_RPKM "11.431025,7.674072"; falsification "0.6105551";
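
To work with as_event lines in your own scripts, the attribute column can be read into a dictionary. The following Python sketch is not part of FLUX, and the function name is hypothetical. It assumes that every attribute has the form key "value", as in the examples above, and it makes no claim about what the individual attributes mean. The example line is a shortened version of the second event shown above.

```python
import re

# One attribute: a key, whitespace, then a double-quoted value.
ATTRIBUTE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_as_event(line):
    """Split a GTF line into its eight fixed fields and a dict of attributes."""
    fields = line.split(None, 8)
    if len(fields) < 9:
        raise ValueError("expected 8 fixed fields followed by attributes")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": dict(ATTRIBUTE.findall(attributes)),
    }


line = ('chr4 Undefined as_event 14618027 14627910 1.66666 + . '
        'transcript_id "uc003gni.1/uc003gnk.1,uc003gnl.1/uc003gnn.1"; '
        'gene_id "uc003gnk.1"; dimension "2_3"; degree "4"; '
        'transcript_RPKM "11.431025,7.674072";')

event = parse_as_event(line)
print(event["feature"], event["start"], event["end"])
print(event["attributes"]["gene_id"])

# Assumption based on the examples above: RPKM values are comma-separated.
rpkm = [float(value) for value in event["attributes"]["transcript_RPKM"].split(",")]
print(rpkm)
```

Matching the quoted values with a regular expression, rather than splitting at semicolons, keeps the sketch tolerant of small variations in the spacing between attributes.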