Description

This track shows the RepeatMasker annotations on the 31 Oct 2018 Ecytonucleospora hepatopenaei/GCA_003709115.1_ASM370911v1 genome assembly.

This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences. The program outputs a detailed annotation of the repeats that are present in the query sequence (represented by this track), as well as a modified version of the query sequence in which all the annotated repeats have been masked (generally available on the Downloads page). RepeatMasker uses the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka (2000) in the References section below.

Percent masking of sequence: 5.55%

Assembly size: 2,825,971 bases
Sequence masked: 156,844 bases
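The masking percentage follows directly from the two base counts above. A quick check in Python (the variable names are illustrative, not part of any RepeatMasker output):

```python
# Figures copied from the track description above.
assembly_size = 2_825_971   # total bases in the assembly
masked_bases = 156_844      # bases masked by RepeatMasker

# Fraction of the assembly that was masked, as a percentage.
percent_masked = 100 * masked_bases / assembly_size
print(f"{percent_masked:.2f}%")  # prints 5.55%
```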

RepeatMasker and libraries version

The repeat files provided for this assembly were generated using RepeatMasker.
  Smit, AFA, Hubley, R & Green, P.,
  RepeatMasker Open-3.0.
  1996-2010.

VERSION:
RepeatMasker version open-4.0.8 , sensitive mode
run with blastp version 2.0MP-WashU [01-Jan-2006] [linux24-i786-ILP32F64 2006-01-02T05:13:21]
RepeatMasker Combined Database: Dfam_Consensus-20181026, RepBase-20181026

PARAMETERS:
RepeatMasker -engine wublast -species 'enterocytozoon hepatopenaei' -s -no_is -cutoff 255 -frag 20000

REPEATS:
RepeatMasker Database: RepeatMaskerLib.embl
Version: RepeatMasker Combined Database: Dfam_Consensus-20181026, RepBase-20181026
Species: enterocytozoon hepatopenaei ( enterocytozoon hepatopenaei )
176 ancestral and ubiquitous sequence(s) with a total length of 51400 bp
0 enterocytozoon hepatopenaei specific repeats with a total length of 0 bp
0 lineage specific sequence(s) with a total length of 0 bp
--------------------------------------------------------------------------------
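The PARAMETERS line above can be turned into an argument list for scripting. The sketch below only builds and prints the argument list and does not run RepeatMasker. The input file name `genome.fa` is a hypothetical placeholder, since the description does not name the FASTA file that was masked.

```python
import shlex

# Command line copied from the PARAMETERS section above.
params = (
    "RepeatMasker -engine wublast -species 'enterocytozoon hepatopenaei' "
    "-s -no_is -cutoff 255 -frag 20000"
)

# shlex.split honours the shell quoting, so the two-word species name
# stays a single argument.
argv = shlex.split(params)

# Hypothetical input file (assumption: not stated in the description).
argv.append("genome.fa")

print(argv)
```

Passing a list like this to `subprocess.run` avoids shell quoting problems with the species name.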

Display Conventions and Configuration

Context Sensitive Zooming

This track chooses the appropriate visual representation for the data based on the zoom scale and/or the number of annotations currently in view. The track automatically switches from the most detailed visualization ('Full' mode) to the denser view ('Pack' mode) when the window size is greater than 45kb of sequence. It switches further to the even denser single-line view ('Dense' mode) if more than 500 annotations are present in the current view.
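The switching rule can be sketched as a small function. This is an illustration of the thresholds stated above, not the browser's actual code. It assumes that 45kb means 45,000 bases and that the annotation-count check takes priority over the window-size check; the function and constant names are hypothetical.

```python
FULL_MAX_WINDOW_BP = 45_000   # assumption: "45kb" read as 45,000 bases
PACK_MAX_ANNOTATIONS = 500    # more annotations than this triggers Dense mode


def choose_display_mode(window_bp: int, annotations_in_view: int) -> str:
    """Return 'full', 'pack' or 'dense' for the current view."""
    # Assumption: a crowded view goes Dense regardless of window size.
    if annotations_in_view > PACK_MAX_ANNOTATIONS:
        return "dense"
    # Windows wider than 45kb drop from Full to Pack.
    if window_bp > FULL_MAX_WINDOW_BP:
        return "pack"
    return "full"


print(choose_display_mode(10_000, 20))
print(choose_display_mode(100_000, 20))
print(choose_display_mode(100_000, 750))
```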

Dense Mode Visualization

In dense display mode, a single line is displayed denoting the coverage of repeats using a series of colored boxes. The boxes are colored based on the classification of the repeat (see below for legend).

Pack Mode Visualization

In pack mode, repeats are represented as sets of joined features. These are color coded as above based on the class of the repeat, and further details such as orientation (denoted by chevrons) and a family label are provided. The family label can be turned off in the track configuration.

The pack display mode may also be configured to resemble the original UCSC repeat track. In this visualization, repeat features are grouped by class (see below) and displayed on separate track lines. The repeat ranges are denoted as grayscale boxes, reflecting both the size of the repeat and the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading.

Full Mode Visualization

In the most detailed visualization, repeats are displayed as chevron boxes, indicating the size and orientation of the repeat. The interior grayscale shading represents the divergence of the repeat (see above), while the outline color represents the class of the repeat. Dotted lines above the repeat and extending left or right indicate the length of unaligned repeat model sequence and provide context for where a repeat fragment originates in its consensus or pHMM model. If the length of the unaligned sequence is large, an interruption line and bp size are shown instead of drawing the extension to scale.

For example, the following repeat is a SINE element in the forward orientation with average divergence. Only the 5' proximal fragment of the consensus sequence is aligned to the genome. The 3' unaligned length (384bp) is not drawn to scale and is instead displayed using a set of interruption lines along with the length of the unaligned sequence.

Repeats that have been fragmented by insertions or large internal deletions are now represented by join lines. In the example below, a LINE element is found as two fragments. The solid connection lines indicate that there are no unaligned consensus bases between the two fragments. Also note these fragments form the 3' extremity of the repeat, as there is no unaligned consensus sequence following the last fragment.

In cases where there is unaligned consensus sequence between the fragments, the repeat will look like the following. The dotted line indicates the length of the unaligned sequence between the two fragments. In this case the unaligned consensus is longer than the actual genomic distance between these two fragments.

If there is consensus overlap between the two fragments, the joining lines will be drawn to indicate how much of the left fragment is repeated in the right fragment.
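The three join cases described above (no unaligned consensus, unaligned consensus between fragments, and consensus overlap) can be told apart from the consensus coordinates of the two fragments. The sketch below is an illustration only: the `Fragment` type and `classify_join` function are hypothetical, coordinates are assumed to be 1-based and inclusive, and both fragments are assumed to be in the forward orientation.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    cons_start: int  # first aligned consensus position (1-based, inclusive)
    cons_end: int    # last aligned consensus position (1-based, inclusive)


def classify_join(left: Fragment, right: Fragment) -> tuple[str, int]:
    """Return the join style and the number of consensus bases involved."""
    # Consensus bases lying strictly between the two aligned fragments.
    gap = right.cons_start - left.cons_end - 1
    if gap == 0:
        return ("solid", 0)       # fragments are contiguous in the consensus
    if gap > 0:
        return ("dotted", gap)    # unaligned consensus between the fragments
    return ("overlap", -gap)      # part of the left fragment repeats in the right


print(classify_join(Fragment(1, 100), Fragment(101, 300)))
print(classify_join(Fragment(1, 100), Fragment(151, 300)))
print(classify_join(Fragment(1, 100), Fragment(81, 300)))
```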

The following table lists the repeat class colors:

Color	Repeat Class
	SINE - Short Interspersed Nuclear Element
	LINE - Long Interspersed Nuclear Element
	LTR - Long Terminal Repeat
	DNA - DNA Transposon
	Simple - Single Nucleotide Stretches and Tandem Repeats
	Low_complexity - Low Complexity DNA
	Satellite - Satellite Repeats
	RNA - RNA Repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA)
	Other - Other Repeats (including class RC - Rolling Circle)
	Unknown - Unknown Classification

A "?" at the end of the "Family" or "Class" (for example, DNA?) signifies that the curator was unsure of the classification. At some point in the future, either the "?" will be removed or the classification will be changed.

Methods

The RepeatMasker (www.repeatmasker.org) tool was used to generate the datasets found on this track hub.

Class profiles

1,243 - Simple
450 - Low_complexity
15 - RNA

Detail class profiles

1,243 - Simple_repeat
450 - Low_complexity
15 - rRNA
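Counts like these can be tallied from a RepeatMasker `.out` annotation file by counting the class/family column. The sketch below assumes the standard whitespace-separated `.out` layout, in which class/family is the eleventh field. The sample rows are invented for illustration and are not taken from this assembly, and the function name is hypothetical.

```python
from collections import Counter

# Invented sample in the .out layout (assumption: not real data).
SAMPLE_OUT = """\
   SW   perc perc perc  query     position in query    matching  repeat         position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat    class/family   begin  end  (left)  ID

   14   10.0  0.0  0.0  contig1     101   130  (9870) + (AT)n    Simple_repeat      1   30    (0)   1
   21    5.2  0.0  0.0  contig1     400   452  (9548) + A-rich   Low_complexity     1   53    (0)   2
   18    8.1  0.0  0.0  contig2      10    45  (7955) + (TTA)n   Simple_repeat      1   36    (0)   3
"""


def tally_classes(out_text: str) -> Counter:
    """Count annotations per class/family in .out-formatted text."""
    counts = Counter()
    for line in out_text.splitlines():
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        counts[fields[10]] += 1
    return counts


for name, count in tally_classes(SAMPLE_OUT).most_common():
    print(f"{count:,} - {name}")
```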

Credits

Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track.

References

Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2010.

Repbase Update is described in:

Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. PMID: 10973072

For a discussion of repeats in mammalian genomes, see:

Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. PMID: 10607616

Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. PMID: 8994846