transMapAncRefGene TransMapAncRefGene genePred TransMap via Ancestor Tree RefSeq Gene Predictions 0 1 34 139 34 144 197 144 0 0 0 x 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ itemAttrTbl transMapAncRefAttr\ noInherit on\ subTrack transMapAnc\ transMapRefGene TransMapRefGene genePred TransMap RefSeq Gene Predictions 0 1 34 139 34 144 197 144 0 0 0 x 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ itemAttrTbl transMapRefAttr\ noInherit on\ subTrack transMap\ transMapAncRefAliGene TransMapAncRefAli genePred TransMapAnc via Ancestor Tree RefSeq Alignments 0 2 120 12 120 187 133 187 0 0 0 x 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ itemAttrTbl transMapAncRefAttr\ noInherit on\ subTrack transMapAnc off\ transMapRefAliGene TransMapRefAli genePred TransMap RefSeq Alignments 0 2 120 12 120 187 133 187 0 0 0 x 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ itemAttrTbl transMapRefAttr\ noInherit on\ subTrack transMap off\ mapGenethon STS Markers bed 5 + Various STS Markers 0 2 0 0 0 127 127 127 0 0 0 map 1 transMapAncMRnaGene TransMapAncMRnaGene genePred TransMapAnc via Ancestor Tree Protein-Coding mRNA Predictions 0 3 34 139 34 144 197 144 0 0 0 x 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ itemAttrTbl transMapAncMRnaAttr\ noInherit on\ subTrack transMapAnc\ transMapMRnaGene TransMapMRnaGene genePred TransMap Protein-Coding mRNA Predictions 0 3 34 139 34 144 197 144 0 0 0 x 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ itemAttrTbl transMapMRnaAttr\ noInherit on\ subTrack transMap\ stsMarker STS Markers bed 5 + STS Markers on Genetic (blue), FISH (green) and RH (black) Maps 1 3 0 0 0 128 128 255 0 0 0

Description

\

This track shows locations of Sequence Tagged Site (STS) markers\ along the draft assembly.

\ \

Method

\ These STSs have been mapped using \ either genetic mapping (Genethon and Marshfield maps),\ radiation hybrid mapping (Stanford, Whitehead RH, and GeneMap99 maps) or\ YAC mapping (the Whitehead YAC map) techniques. \ Prior to August 2001, this track also\ showed the approximate positions of fluorescent in situ hybridization (FISH) mapped clones.\ In August 2001 and later assemblies, the FISH clones are displayed in a separate \ track.

\ \

Using the Filter

\

The track filter can be used to change the color or include/exclude a map data set \ within the track. This is helpful when many items are shown in the track\ display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. In the pulldown menu, select the map whose data you would like to highlight or exclude in the display. By default, the "All Genetic" option is selected.\
  2. Choose the color or display characteristic that will be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not\ display data from the map selected in the pulldown list. If "include" is selected, the browser\ will display only data from the selected map.\
  3. When you have finished configuring the filter, click the Submit button.\

\ \

Credits

\

Many thanks to the researchers who worked on these\ maps, and to Greg Schuler, Arek Kasprzyk, Wonhee Jang,\ Terry Furey and Sanja Rogic for helping\ process the data. Additional data on the individual maps can be\ found at the following links:

\ \ \ map 1 transMapAncMRnaAliGene TransMapAncMRnaAli genePred TransMapAnc via Ancestor Tree mRNA Alignments 0 4 134 139 34 194 197 144 0 0 0 x 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ itemAttrTbl transMapAncMRnaAttr\ noInherit on\ subTrack transMapAnc off\ transMapMRnaAliGene TransMapMRnaAli genePred TransMap mRNA Alignments 0 4 134 139 34 194 197 144 0 0 0 x 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ itemAttrTbl transMapMRnaAttr\ noInherit on\ subTrack transMap off\ stsMap STS Markers bed 5 + STS Markers on Genetic (blue) and Radiation Hybrid (black) Maps 1 4 0 0 0 128 128 255 0 0 0

Description

\

\ This track shows locations of Sequence Tagged Site (STS) markers\ along the draft assembly. These markers have been mapped using \ either genetic mapping (Genethon, Marshfield, and deCODE maps),\ radiation hybrid mapping (Stanford, Whitehead RH, and GeneMap99 maps) \ or YAC mapping (the Whitehead YAC map) techniques.

\

\ Genetic map markers are shown in blue; radiation hybrid map markers are shown \ in black. When a marker maps to multiple positions in the genome, it is \ displayed in a lighter color.

\ \

Using the Filter

\

\ This track has a filter that can be used to change the color or \ include/exclude the display of a dataset from an individual lab. This is \ helpful when many items are shown in the track display, especially when only \ some are relevant to the current task. The filter is located at the top of \ the track description page, which is accessed via the small button to the \ left of the track's graphical display or through the link on the track's \ control menu. To use the filter:\

    \
  1. In the pulldown menu, select the map whose data you would like to \ highlight or exclude in the display. By default, the "All Genetic" \ option is selected.\
  2. Choose the color or display characteristic that will be used to highlight \ or include/exclude the filtered items. If "exclude" is chosen, the \ browser will not display data from the map selected in the pulldown list. \ If "include" is selected, the browser will display data only from \ the selected map.\

\

\ When you have finished configuring the filter, click the Submit \ button.

\ \

Credits

\

\ Many thanks to the researchers who worked on these\ maps, and to Greg Schuler, Arek Kasprzyk, Wonhee Jang,\ Terry Furey and Sanja Rogic for helping\ process the data. Additional data on the individual maps can be\ found at the following links:\

\ \ map 1 rgdQtl RGD QTL bed 4 . Quantitative Trait Locus (from RGD) 0 4.5 12 12 120 133 133 187 0 0 0 http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=

Description

\

\ A quantitative trait locus (QTL) is a polymorphic locus that contains alleles\ which differentially affect the expression of a continuously distributed \ phenotypic trait. Usually a QTL is a marker described by statistical \ association to quantitative variation in the particular phenotypic trait that\ is thought to be controlled by the cumulative action of alleles at multiple \ loci.

\

\ For a comprehensive review of QTL mapping techniques in the rat, see Rapp, \ J.P. (2000) in the References section below.

\ \

Credits

\

\ Thanks to the RGD for \ providing this annotation. RGD is funded by grant HL64541 entitled "Rat \ Genome Database", awarded to Dr. Howard J Jacob, Medical College of \ Wisconsin, from the National Heart, Lung, and Blood Institute \ (NHLBI) of the National \ Institutes of Health (NIH).\

\ \

References

\

\ Rapp, J.P. \ Genetic Analysis of Inherited Hypertension in the Rat.\ Physiol. Rev., 80, 135-172 (2000).

\ map 1 stsMapMouse STS Markers bed 5 + STS Markers on Genetic Maps 1 5 0 0 0 128 128 255 0 0 0

This track shows locations of Sequence-Tagged Site (STS) markers along \ the mouse draft assembly. These markers appear on the Mouse Genome Informatics (MGI) consensus mouse genetic \ map. Information about the genetic map and STS marker primer sequences are \ provided by the Mouse Genome Informatics database group at The Jackson \ Laboratory.

\ map 1 fishClones FISH Clones bed 5 + Clones Placed on Cytogenetic Map Using FISH 0 6 0 150 0 127 202 127 0 0 0

Description

\

\ This track shows the location of fluorescent in situ hybridization \ (FISH)-mapped clones along the draft assembly sequence. The locations of \ these clones were contributed as a part of the BAC Consortium paper \ Cheung, V.G. et al. (2001) in the References section below.

\

\ More information about the BAC clones, including how they may be obtained, \ can be found at the \ Human BAC Resource and the \ Clone Registry web sites hosted by \ NCBI.\ To view Clone Registry information for a clone, click on the clone name at \ the top of the details page for that item.

\ \

Using the Filter

\

\ This track has a filter that can be used to change the color or \ include/exclude the display of a dataset from an individual lab. This is \ helpful when many items are shown in the track display, especially when only \ some are relevant to the current task. The filter is located at the top of \ the track description page, which is accessed via the small button to the \ left of the track's graphical display or through the link on the track's \ control menu. To use the filter:\

    \
  1. In the pulldown menu, select the lab whose data you would like to \ highlight or exclude in the display. \
  2. Choose the color or display characteristic that will be used to highlight \ or include/exclude the filtered items. If "exclude" is chosen, the \ browser will not display clones from the lab selected in the pulldown list. \ If "include" is selected, the browser will display clones only \ from the selected lab.\

\

\ When you have finished configuring the filter, click the Submit \ button.

\ \

Credits

\

\ We would like to thank all of the labs that have contributed to this resource:\

\ \

References

\

\ Cheung, V.G. et al. \ Integration of cytogenetic landmarks into the draft sequence of \ the human genome, Nature 409, 953-958 (2001).

\ map 1 genMapDb GenMapDB Clones bed 6 + GenMapDB BAC Clones 0 7 0 0 0 127 127 127 0 0 0

Description

\

BAC clones from GenMapDB\ are placed on the draft sequence using BAC end sequence information\ and confirmed using STS markers by Vivian Cheung's lab at the\ Department of Pediatrics, University of Pennsylvania. Further\ information about each clone can be obtained by clicking on the clone\ name on the track detail page.\

Credits

\ Thanks to Vivian Cheung's lab \ and GenMapDB at the University of Pennsylvania for providing the data used to create this track.\ map 1 recombRate Recomb Rate bed 4 + Recombination Rate from deCODE, Marshfield, or Genethon Maps (deCODE default) 0 8 0 0 0 127 127 127 0 0 0

Description

\

\ The recombination rate track represents\ calculated sex-averaged rates of recombination based on either the\ deCODE, Marshfield, or Genethon genetic maps. By default, the deCODE\ map rates are displayed. Female- and male-specific recombination\ rates, as well as rates from the Marshfield and Genethon maps, can\ also be displayed by choosing the appropriate filter option on the track \ description page.

\ \

Methods

\

\ The deCODE genetic map was created at \ deCODE Genetics and is \ based on 5,136 microsatellite markers for 146 families with a total\ of 1,257 meiotic events. For more information on this map, see\ Kong, A. et al. (2002) in the References section below.

\

\ The Marshfield genetic map was created at the \ Center for Medical Genetics and is based on 8,325 short \ tandem repeat polymorphisms (STRPs) for 8 CEPH families consisting of 134\ individuals with 186 meioses. For more information on this map, see \ Broman, K.W. et al. (1998) in the References section below.

\

\ The Genethon genetic map was created at \ Genethon and is based on 5,264 microsatellites for 8 CEPH \ families consisting of 134 individuals with 186 meioses. For more information \ on this map, see \ Dib, C. et al. (1996) in the References section below.

\

\ Each base is assigned the recombination rate calculated by\ assuming a linear genetic distance across the immediately flanking\ genetic markers. The recombination rate assigned to each 1 Mb window\ is the average recombination rate of the bases contained within the\ window.
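The two steps described above can be sketched as follows. This is a minimal illustration, not the code used to build the track: the function names, the marker list layout, and the example coordinates are assumptions made for readability, and rates are expressed in cM/Mb.

```python
from bisect import bisect_right

def rate_at(pos, markers):
    """Recombination rate (cM/Mb) at one base position.

    markers: list of (physical_position_bp, genetic_position_cM) tuples,
    sorted by physical position (a hypothetical input layout).
    Genetic distance is assumed to grow linearly between the two markers
    that immediately flank the position. Returns None when the position
    is not flanked by markers on both sides.
    """
    positions = [bp for bp, _ in markers]
    i = bisect_right(positions, pos)
    if i == 0 or i == len(markers):
        return None
    (bp0, cm0), (bp1, cm1) = markers[i - 1], markers[i]
    return (cm1 - cm0) / ((bp1 - bp0) / 1_000_000)

def window_rate(start, end, markers):
    """Average per-base rate over the window [start, end).

    Each marker interval contributes its rate weighted by the number of
    window bases it covers, which equals averaging the per-base rates.
    """
    total = 0.0
    covered = 0
    for (bp0, cm0), (bp1, cm1) in zip(markers, markers[1:]):
        overlap = min(end, bp1) - max(start, bp0)
        if overlap > 0:
            rate = (cm1 - cm0) / ((bp1 - bp0) / 1_000_000)
            total += rate * overlap
            covered += overlap
    return total / covered if covered else None

# Three made-up markers: 1 cM over the first Mb, 3 cM over the second.
example_markers = [(0, 0.0), (1_000_000, 1.0), (2_000_000, 4.0)]
print(rate_at(500_000, example_markers))
print(window_rate(500_000, 1_500_000, example_markers))
```

A window that straddles two marker intervals receives a rate between the two interval rates, in proportion to how many of its bases fall in each.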

\ \

Using the Filter

\

\ This track has a filter that can be used to change the map or\ sex-specific rate displayed. The filter is located at the top of the track \ description page, which is accessed via the small button to the left of \ the track's graphical display or through the link on the track's control menu.\ To view a particular map or sex-specific rate, select the corresponding\ option from the "Map Distances" pulldown list. By default, the \ browser displays the deCODE sex-averaged distances.

\

\ When you have finished configuring the filter, click the Submit \ button.

\ \

Credits

\

\ This track was produced at UCSC using data that are freely available for\ the Genethon, Marshfield, and deCODE genetic maps (see above links). Thanks\ to all who played a part in the creation of these maps.

\ \

References

\

\ Broman, K.W., Murray, J.C., Sheffield, V.C., White, R.L. and Weber, J.L.\ Comprehensive human genetic maps: Individual and sex-specific \ variation in recombination, American Journal of Human Genetics\ 63, 861-869 (1998).

\

\ Dib, C., Faure, S., Fizames, C., Samson, D., Drouot, N., Vignal, A., \ Millasseau, P., Marc, S., Hazan, J., Seboun, E., Lathrop, M., Gyapay, G., \ Morissette, J., and Weissenbach, J. \ A comprehensive genetic map of the human genome based on 5,264 \ microsatellites, \ Nature 380(6570), 152-154 (1996).

\

\ Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., \ Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., \ Shlien, A., Palsson, S.T., Frigge, M.L., Thorgeirsson, T.E., Gulcher, J.R., \ and Stefansson, K.\ A high-resolution recombination map of the human genome,\ Nature Genetics, 31(3), 241-247 (2002).

\ map 1 exonArrows off\ ctgPos Map Contigs ctgPos Physical Map Contigs 0 9 150 0 0 202 127 127 0 0 0

Description

\ This track shows the locations of $organism contigs on the physical map. \ The underlying data is derived from the NCBI seq_contig.md file \ that accompanies this assembly. All contigs in this track are oriented to the \ "+" strand.\ \ map 0 gold Assembly bed 3 + Assembly from Fragments 0 10 150 100 30 230 170 40 0 0 0

Description

\

\ This track shows the draft assembly of the $organism genome. \ Whole-genome shotgun reads were assembled into contigs. When possible, \ contigs were grouped into scaffolds (also known as "supercontigs").\ The order, orientation and gap sizes between contigs within a scaffold are\ based on paired-end read evidence.

\

\ In dense mode, this track depicts the contigs that make up the \ currently viewed scaffold. \ Contig boundaries are distinguished by the use of alternating gold and brown \ coloration. Where gaps\ exist between contigs, spaces are shown between the gold and brown\ blocks. The relative order and orientation of the contigs\ within a scaffold is always known; therefore, a line is drawn in the graphical\ display to bridge the blocks.

\

\ All components within this track are of fragment type "W": \ Whole Genome Shotgun contig.

\ \ map 1 gap Gap bed 3 + Gap Locations 1 11 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the position of gaps — represented by Ns — within \ the C. intestinalis assembly. Gaps of 50 or more bases were most \ likely introduced by the JGI JAZZ assembler.

\

\ For a discussion of gaps and the JAZZ assembler see \ Dehal, P. et al. (2002) in the References section below.

\ \

Display Conventions and Configuration

\

\ Gaps are represented by boxes. If the relative order and orientation of \ the contigs on either side of the gap is known from mRNA, ESTs, or paired BAC \ end reads, it is a bridged gap, indicated by a white line drawn \ through the box. The display must be sufficiently zoomed in to view this \ feature. In full display mode, the item label indicates the type of gap and \ whether the gap is bridged.

\ \

References

\

\ Dehal, P. et al. \ The Draft Genome of Ciona intestinalis: Insights into Chordate \ and Vertebrate Origins (Supplemental Materials). \ Science. 298(5601), 2157-67 (2002).

\ map 1 partMrnas Partially Found mRNAs psl . Partially Found RefSeq and MGC mRNAs 0 12 0 0 0 127 127 127 0 0 0 map 1 missingHg Missing Human psl . Unplaced Human RefSeq Genes Blatted against Mouse Translated 0 13 0 100 0 255 240 200 0 0 0 map 1 clonePos Coverage clonePos Clone Coverage/Fragment Position 0 14 0 0 0 180 180 180 0 0 0

Description

\

\ In dense display mode, this track shows the coverage level of \ the genome. Finished regions are depicted in black. Draft regions \ are shown in various shades of gray that correspond to the level of coverage. \

\ In full display mode, this track shows the position of each contig inside each \ draft or finished clone ("fragment") in the assembly. For some \ assemblies, clones in the sequencing center tiling path are displayed with\ blue rather than gray backgrounds.\

\ map 0 bacEndPairs BAC End Pairs bed 6 + BAC End Pairs 0 15 0 0 0 127 127 127 0 0 0

Description

\

\ Bacterial artificial chromosomes (BACs) are a key part of many \ large-scale sequencing projects. A BAC typically consists of 50 - 300 kb of\ DNA. During the early phase of a sequencing project, it is common\ to sequence a single read (approximately 500 bases) off each end of\ a large number of BACs. Later on in the project, these BAC end reads\ can be mapped to the genome sequence.

\

\ This track shows these mappings\ in cases where both ends could be mapped. These BAC end pairs can\ be useful for validating the assembly over relatively long ranges. In some\ cases, the BACs are useful biological reagents. This track can also be\ used for determining which BAC contains a given gene, useful information\ for certain wet lab experiments.

\

\ A valid pair of BAC end sequences must be\ at least 50 kb but no more than 600 kb away from each other. \ The orientation of the first BAC end sequence must be "+" and\ the orientation of the second BAC end sequence must be "-".
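The validity rule above can be written as a small check. This is a sketch under stated assumptions: the `EndAlignment` record and the function name are hypothetical, and the distance between the two ends is measured here from the start of the first end to the end of the second, which the text does not specify.

```python
from typing import NamedTuple

class EndAlignment(NamedTuple):
    """One aligned BAC end read (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    strand: str

def is_valid_pair(first, second, min_span=50_000, max_span=600_000):
    """Return True when two end alignments form a valid pair.

    The first end must be on "+", the second on "-", both on the same
    chromosome, and the span they enclose must lie within
    [min_span, max_span] bases (span definition is an assumption).
    """
    if first.chrom != second.chrom:
        return False
    if first.strand != "+" or second.strand != "-":
        return False
    span = second.end - first.start
    return min_span <= span <= max_span

left = EndAlignment("chr1", 1_000, 1_500, "+")
right = EndAlignment("chr1", 150_500, 151_000, "-")
print(is_valid_pair(left, right))
```

The limits are keyword arguments so that the same check can be reused for other clone types with different size ranges.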

\

\ The scoring scheme used for this annotation assigns 1000 to an alignment \ when the BAC end pair aligns to only one location in the genome (after \ filtering). When a BAC end pair or clone aligns to multiple locations, the \ score is calculated as 1500/(number of alignments).
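The scoring rule above is simple enough to state directly in code. The function name is an assumption, and the sketch covers only the two cases the text describes.

```python
def pair_score(num_alignments):
    """Score for a BAC end pair, given how many genomic locations it
    aligns to after filtering.

    A unique placement scores 1000; otherwise the score is
    1500 / (number of alignments), as described in the text.
    """
    if num_alignments < 1:
        raise ValueError("a scored pair must align at least once")
    if num_alignments == 1:
        return 1000
    return 1500 / num_alignments

for n in (1, 2, 3):
    print(n, pair_score(n))
```

Under this rule a pair that aligns to two locations scores 750 and a pair that aligns to three scores 500, so every ambiguous placement scores below a unique one.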

\ \

Methods

\

\ BAC end sequences are placed on the assembled sequence using Jim Kent's \ blat program.

\ \

Credits

\

\ Additional information about the clone, including how it\ can be obtained, may be found at the \ NCBI Clone Registry. To view the registry entry for a \ specific clone, open the details page for the clone and click on its name at \ the top of the page.

\ map 1 exonArrows off\ bacEndPairsBad Incorrect BAC End Pairs bed 6 + Orphan, Short and Incorrectly Oriented BAC End Pairs 0 16 0 0 0 127 127 127 0 0 0 map 1 exonArrows off\ bacEndPairsLong Long BAC End Pairs bed 6 + Long BAC End Pairs 0 17 0 0 0 127 127 127 0 0 0 map 1 exonArrows off\ fosEndPairs Fosmid End Pairs bed 6 + Fosmid End Pairs 0 18 0 0 0 127 127 127 0 0 0

Description

\

A valid pair of fosmid end sequences must be\ at least 30 kb but no more than 50 kb away from each other. \ The orientation of the first fosmid end sequence must be "+" and\ the orientation of the second fosmid end sequence must be "-".

\ \

Methods

End sequences were trimmed at the NCBI using\ ssahaCLIP written by Jim Mullikin. Trimmed fosmid end sequences were\ placed on the assembled sequence using Jim Kent's \ blat \ program.

\ \

Credits

\

Sequencing of the fosmid ends was done at the \ Eli & Edythe L. Broad Institute of MIT \ and Harvard University.\ Sequences and quality scores are available in the NCBI Trace Repository.\

\ map 1 exonArrows off\ fosEndPairsBad Bad Fosmid End Pairs bed 6 + Orphan, Short and Incorrectly Oriented Fosmid End Pairs 0 19 0 0 0 127 127 127 0 0 0 map 1 exonArrows off\ fosEndPairsLong Long Fosmid End Pairs bed 6 + Long Fosmid End Pairs 0 20 0 0 0 127 127 127 0 0 0 map 1 exonArrows off\ chr18deletions Chr18 Deletions bed 6 + Chromosome 18 Deletions 0 21 0 0 0 127 127 127 0 0 0 map 1 isochores Isochores bed 4 + GC-Rich (dark) and AT-Rich (light) Isochores 0 22 0 0 0 127 127 127 1 0 0

What's an Isochore

\

An isochore is a region of a chromosome where the GC content is\ either higher or lower than the whole-genome average (42%). A GC-rich\ isochore is given a dark color, while a GC-poor isochore is given a light\ color.

\

Isochores were determined by first calculating the GC content of 100,000 bp\ windows across the genome. Each window was labeled H or L\ depending on whether it contained a higher or lower GC content\ than average. A two-state HMM was created in which one state represented\ GC-rich regions, and the other GC-poor. It was trained using the first 12\ chromosomes. The trained HMM was used to generate traces over all chromosomes.\ These traces define the boundaries of the isochores\ and their type (GC-rich or AT-rich).
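The labeling and decoding steps can be illustrated with a small two-state HMM decoded by the Viterbi algorithm. This is only a sketch: the track's HMM was trained on real chromosomes, whereas the `stay` and `agree` probabilities below are fixed, made-up values, and all function names are assumptions.

```python
import math

def label_windows(gc_fractions, genome_average=0.42):
    """Label each window H or L relative to the genome-wide GC average."""
    return ["H" if gc > genome_average else "L" for gc in gc_fractions]

def viterbi(labels, stay=0.9, agree=0.8):
    """Most likely sequence of 'rich'/'poor' states for H/L window labels.

    stay:  assumed probability of keeping the same state between windows.
    agree: assumed probability that a rich state emits H (and poor emits L).
    Both values are illustrative, not trained parameters.
    """
    states = ("rich", "poor")

    def emit(state, label):
        match = (state == "rich") == (label == "H")
        return math.log(agree if match else 1 - agree)

    def trans(a, b):
        return math.log(stay if a == b else 1 - stay)

    if not labels:
        return []
    score = {s: math.log(0.5) + emit(s, labels[0]) for s in states}
    back = []
    for label in labels[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + trans(p, s))
            pointers[s] = best
            score[s] = prev[best] + trans(best, s) + emit(s, label)
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

print(viterbi(list("HHHLHHLLLL")))
```

With these illustrative parameters the isolated L window inside the run of H windows is absorbed into the GC-rich region, which is the smoothing effect an HMM provides over simple per-window thresholding.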

\ map 1 gcPercent GC Percent bed 4 + Percentage GC in 20,000-Base Windows 0 23 0 0 0 127 127 127 1 0 0

Description

\

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in a 20,000 base window. Windows with high GC content are drawn more darkly \ than windows with low GC content. High GC content is typically associated with \ gene-rich areas.\
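A windowed GC percentage of this kind can be computed as sketched below. The function name is an assumption, and so is the treatment of non-ACGT characters such as N, which are counted in the window length here; the track's own handling may differ.

```python
def gc_percent_windows(sequence, window=20_000):
    """Yield (start, end, percent_gc) for consecutive, non-overlapping
    windows. The final window may be shorter than `window`."""
    seq = sequence.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, start + len(chunk), 100.0 * gc / len(chunk)

# Tiny example with a 4-base window so the result is easy to check by eye.
for row in gc_percent_windows("GGCCAATT", window=4):
    print(row)
```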

\

Credits

\

\ This track was generated at UCSC.\ map 1 gc5Base GC Percent wig 0 100 GC Percent in 5-Base Windows 0 23.5 0 0 0 128 128 128 0 0 0

Description

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in 5-base windows. High GC content is typically associated with\ gene-rich areas.\

\

\ This track may be configured in a variety of ways to highlight different aspects \ of the displayed information. Click the "Graph configuration help" link\ for an explanation of the configuration options.\ \

Credits

\

The data and presentation of this graph were prepared by\ Hiram Clawson.\ \ map 0 autoScaleDefault Off\ defaultViewLimits 30:70\ graphTypeDefault Bar\ gridDefault OFF\ maxHeightPixels 128:36:16\ spanList 5\ windowingFunction Mean\ quality Quality Scores wig 0 100 $Organism Sequencing Quality Scores 0 23.6 0 128 255 255 128 0 0 0 0

Description

\

\ The Quality Scores track shows the sequencing quality score \ (range: 0 to 99) of each base in the assembly. \ The height at each position of the track \ indicates the quality of the base. \ When zoomed out to a large range, the heights reflect the averaged scores. \ Scores of 40 or higher reflect high confidence in the sequence (with an error rate of less than \ 1/10,000); scores of 20 or higher reflect reasonable confidence (of working draft \ quality).\
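The relationship between score and error rate quoted above matches the Phred scale, on which a score Q corresponds to an error probability of 10^(-Q/10). The sketch below assumes the scores are Phred-scaled, which the text does not state explicitly; the function names and the "low" label are also assumptions.

```python
def error_probability(score):
    """Chance that a base call is wrong, assuming a Phred-scaled score."""
    return 10 ** (-score / 10)

def confidence_class(score):
    """Map a score to the confidence levels described in the text."""
    if score >= 40:
        return "high"
    if score >= 20:
        return "reasonable"
    return "low"  # label chosen for this sketch; not named in the text

for q in (20, 40):
    print(q, error_probability(q), confidence_class(q))
```

On this scale a score of 40 corresponds to one expected error in 10,000 bases and a score of 20 to one in 100.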

\

\ This track may be configured in a variety of ways to highlight different aspects \ of the displayed information. Click the \ Graph \ configuration help link for an explanation of the configuration options.

\ \

Credits

\

\ The quality scores were provided as part of the $organism assembly. \ The database representation and graphical display code were written by\ Hiram Clawson.\ map 0 autoScaleDefault Off\ graphTypeDefault Bar\ gridDefault OFF\ maxHeightPixels 128:36:16\ spanList 1,1024\ windowingFunction Mean\ gcPercentSmall GC % 100b bed 4 + Percentage GC in 100-Base Windows 0 24 0 0 0 127 127 127 1 0 0 map 1 GCwiggle GC Samples sample GC Percent Sample Track (every 20,000 bases) 0 25 0 0 0 127 127 127 0 0 1 chr22, map 0 pGC GC Samples sample GC Percent Sample Track 0 26 0 0 0 127 127 127 0 0 0 map 0 humanParalog Human Paralog bed 5 + Human Paralogs Using Fgenesh++ Gene Predictions 0 28 0 100 0 255 240 200 1 0 0 map 1 celeraCoverage WSSD Coverage bed 4 . Regions Assayed for SDD 0 29 0 0 0 127 127 127 0 0 0

Description

\

\ This track represents coverage of clones that were assayed for \ segmental duplications using high-depth Celera reads. Absent regions were \ not assessed by this version of the Segmental Duplication Database (SDD). \ For a description of the whole-genome shotgun sequence detection (WSSD)\ "fuguization" method, see Bailey, J.A. et al. (2001) in \ the References section below.

\ \

Credits

\

\ The data were provided by \ Xinwei She \ and Evan Eichler as part of their\ effort to map human paralogy at the \ University of Washington.

\ \

References

\

\ Bailey, J.A., et al., \ Recent segmental duplications in the human genome. \ Science 297(5583), 1003-7 (2002).

\

\ Bailey, J.A., et al., \ Segmental duplications: organization and impact within the \ current human genome project assembly, Genome Res. 11(6), \ 1005-17 (2001).

\

\ She, X., et al., \ Shotgun sequence assembly and recent segmental duplications \ within the human genome. Nature 431(7011), 927-30 (2004).\

\ map 1 celeraDupPositive WSSD Duplication bed 4 + Sequence Identified as Duplicate by High-Depth Celera Reads 0 30 0 0 0 127 127 127 0 0 0

Description

\

\ High-depth sequence reads from the Celera project were used to \ detect paralogy in the human genome reference sequence.\ This track shows confirmed segmental duplications, defined as having \ similarity to sequences in the Segmental Duplication Database (SDD) of\ greater than 90% over more than 250 bp of repeatmasked sequence.\ For a description of the whole-genome shotgun sequence detection (WSSD) \ "fuguization" method, see Bailey, J.A. et al. (2001) in \ the References section below.

\ \

Credits

\

\ The data were provided by \ Xinwei She \ and Evan Eichler as part of their\ efforts to map human paralogy at the \ University of Washington.

\ \

References

\

\ Bailey, J.A., et al., \ Recent segmental duplications in the human genome. \ Science 297(5583), 1003-7 (2002).

\

\ Bailey, J.A., et al., \ Segmental duplications: organization and impact within the \ current human genome project assembly, Genome Res. 11(6), \ 1005-17 (2001).

\

\ She, X., et al., \ Shotgun sequence assembly and recent segmental duplications \ within the human genome. Nature 431(7011), 927-30 (2004).\

\ map 1 celeraOverlay WSSD Overlay bed 4 + Celera WGS Assembly Overlay on Public Assembly 0 30.1 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows regions detected as overlays of Celera\ whole-genome shotgun sequence assembly on the public human \ assembly.

\ \

Credits

\

\ The data were provided by \ Xinwei She \ and Evan Eichler \ as part of their effort to map \ human paralogy at the \ University of Washington.

\ \

References

\

\ Bailey, J.A., et al., \ Recent segmental duplications in the human genome. \ Science 297(5583), 1003-7 (2002).

\

\ Bailey, J.A., et al., \ Segmental duplications: organization and impact within the \ current human genome project assembly, Genome Res. 11(6), \ 1005-17 (2001).

\

\ She, X., et al., \ Shotgun sequence assembly and recent segmental duplications \ within the human genome. Nature 431(7011), 927-30 (2004).\

\ map 1 genomicDups Duplications bed 6 + Duplications of >1000 Bases Sequence 0 31 170 0 0 160 150 0 0 0 0 This region was detected as a genomic duplication within the golden path. \ Duplications of 99% or greater similarity, which are likely missed overlaps, \ are shown in red. Duplications of 98% - 99% similarity are shown in yellow. \ Duplications of 90% - 98% similarity are shown in shades of gray. Cutoff \ values were at least 1 kb of total sequence aligned (containing at least 500 bp \ of non-RepeatMasked sequence) and at least 90% sequence identity. For a \ description of the 'fuguization' detection method see \ Bailey et al. (2001) Genome Res 11:1005-17. \ The data were provided by \ Jeff Bailey \ \ and Evan Eichler.\
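The cutoffs and color classes described for this track can be summarized in a short sketch. The function names are assumptions, identities are given as fractions, and the assignment of the exact 98% and 99% boundaries to the higher class is a choice made here, since the text leaves it open.

```python
def passes_cutoffs(aligned_bp, unmasked_bp, identity):
    """Apply the cutoffs from the text: at least 1 kb aligned, at least
    500 bp of it outside repeat-masked sequence, and at least 90% identity."""
    return aligned_bp >= 1000 and unmasked_bp >= 500 and identity >= 0.90

def duplication_color(identity):
    """Display color class for a duplication with the given identity."""
    if identity >= 0.99:
        return "red"      # likely missed overlaps
    if identity >= 0.98:
        return "yellow"
    if identity >= 0.90:
        return "gray"     # drawn in shades of gray
    return None           # below the 90% cutoff; not shown

print(passes_cutoffs(2500, 800, 0.995), duplication_color(0.995))
```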
\ map 1 dupes Duplications bed 6 . Duplications of >98% Identity >1kb 1 32 0 0 0 127 127 127 0 0 0 map 1 Nregion N Regions bed 4 . N Regions 0 32.5 150 100 30 202 177 142 0 0 0

Description

\

\ This track displays contiguous Ns of 1000 or more.\
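Runs of this kind can be located with a regular expression, as in the sketch below. This illustrates the idea only and makes no claim about how the nibCheck utility works; the function name and the decision to treat lowercase n as N are assumptions.

```python
import re

def n_regions(sequence, min_length=1000):
    """Return (start, end) half-open coordinates of every run of
    contiguous N characters that is at least min_length bases long."""
    pattern = re.compile(r"[Nn]{%d,}" % min_length)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]

# One run of exactly 1000 Ns is reported; a run of 999 is not.
example = "ACGT" + "N" * 1000 + "AC" + "N" * 999
print(n_regions(example))
```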

Credits

\ It was generated with the nibCheck utility.\ \ map 1 genieKnown Known Genes genePred Known Genes (from Full-Length mRNAs) 3 33 20 20 170 137 137 212 0 0 0 genes 1 knownGene Known Genes genePred knownGenePep knownGeneMrna Known Genes Based on SWISS-PROT, TrEMBL, mRNA, and RefSeq 3 34 12 12 120 133 133 187 0 0 0

Description

\

\ The UCSC Known Genes track shows known protein-coding genes based on \ protein data from SWISS-PROT, TrEMBL, and TrEMBL-NEW and their\ corresponding mRNAs from \ GenBank.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for\ gene prediction\ tracks. Black coloring indicates features that have corresponding entries\ in the Protein Data Bank (PDB). Blue indicates features associated with\ mRNAs from NCBI RefSeq or (dark blue) items having associated proteins in\ the SWISS-PROT database. The variation in blue shading of RefSeq items\ corresponds to the level of review the RefSeq record has undergone:\ predicted (light), provisional (medium), or reviewed (dark).

\

\ This track contains an optional codon coloring\ feature that allows users to quickly validate and compare gene predictions.\ To display codon colors, select the genomic codons option from the\ Color track by codons pull-down menu. Click\ here for more\ information about this feature.

\ \

Methods

\

\ mRNA sequences were aligned against the $organism genome using blat. When a \ single mRNA aligned in multiple places, only alignments having at least 98% \ base identity with the genomic sequence were kept. This set of mRNA \ alignments was further reduced by keeping only those mRNAs referenced by a \ protein in SWISS-PROT, TrEMBL, or TrEMBL-NEW.

\

\ Among multiple mRNAs referenced by a single protein, the best mRNA was \ selected, based on a quality score derived from its length, the level of the\ match between its translation and the protein sequence, and its release date.\ The resulting mRNA and protein pairs were further filtered by removing \ short invalid entries and consolidating entries with identical CDS regions.\

\

\ Finally, RefSeq entries derived from DNA sequences instead of \ mRNA sequences were added to produce the final data set shown in this track. \ Disease annotations were obtained from SWISS-PROT.

\ \

Credits

\

\ The Known Genes track was produced at UCSC based primarily on cross-references\ between proteins from \ SWISS-PROT \ (including TrEMBL and TrEMBL-NEW) and mRNAs from \ GenBank\ contributed by scientists worldwide. \ NCBI RefSeq \ data were also included in this track.

\ \

Data Use Restrictions

\

\ The UniProt data have the following terms of use, UniProt copyright(c) 2002 - \ 2004 UniProt consortium:

\

\ For non-commercial use, all databases and documents in the UniProt FTP\ directory may be copied and redistributed freely, without advance\ permission, provided that this copyright statement is reproduced with\ each copy.

\

\ For commercial use, all databases and documents in the UniProt FTP\ directory except the files\

\ may be copied and redistributed freely, without advance permission,\ provided that this copyright statement is reproduced with each copy.\ More information for commercial users can be found \ here.\

\ From January 1, 2005, all databases and documents in the UniProt FTP\ directory may be copied and redistributed freely by all entities,\ without advance permission, provided that this copyright statement is\ reproduced with each copy.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J,\ Wheeler DL.\ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32:D23-6.

\

\ Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D.\ The UCSC Known Genes.\ Bioinformatics. 2006 May 1;22(9):1036-46.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ directUrl /cgi-bin/hgGene?hgg_gene=%s&hgg_chrom=%s&hgg_start=%d&hgg_end=%d&hgg_type=%s&db=%s\ hgGene on\ hgsid on\ ccdsGene CCDS genePred Consensus CDS 0 34.5 12 120 12 133 187 133 0 0 0

Description

\

\ This track shows $organism genome high-confidence gene annotations from the\ Consensus \ Coding DNA Sequence (CCDS) project. This project is a collaborative effort \ to identify a core set of \ $organism protein-coding regions that are consistently annotated and of high \ quality. The long-term goal is to support convergence towards a standard set \ of gene annotations on the $organism genome.\

\

Collaborators include:\

\ \

Methods

\

\ CDS annotations of the $organism genome were obtained from two sources:\ NCBI \ RefSeq and a union of the gene annotations from \ Ensembl and \ Vega, collectively known \ as Hinxton.

\

\ Genes with identical CDS genomic coordinates in both sets become CCDS \ candidates. The genes undergo a quality evaluation, which must be approved by \ all collaborators. The following criteria are currently used to assess each\ gene: \

\

\ A unique CCDS ID is assigned to the CCDS, which links together all gene \ annotations with the same CDS. CCDS gene annotations are under continuous\ review, with periodic updates to this track.\
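The candidate-selection step described above (identical CDS genomic coordinates in both annotation sets) can be sketched as follows. This is only an illustration, not the CCDS project's pipeline: the dictionary fields (`name`, `chrom`, `strand`, `cds_exons`) and the sample identifiers are hypothetical, and the quality evaluation by the collaborators is not modelled.

```python
from collections import defaultdict


def cds_key(annotation):
    """Hashable key identifying a CDS by its genomic coordinates."""
    return (
        annotation["chrom"],
        annotation["strand"],
        tuple(annotation["cds_exons"]),
    )


def ccds_candidates(refseq, hinxton):
    """Map each CDS found in BOTH sets to the annotation names sharing it."""
    by_key = defaultdict(lambda: {"refseq": [], "hinxton": []})
    for source, annotations in (("refseq", refseq), ("hinxton", hinxton)):
        for ann in annotations:
            by_key[cds_key(ann)][source].append(ann["name"])
    return {
        key: names["refseq"] + names["hinxton"]
        for key, names in by_key.items()
        if names["refseq"] and names["hinxton"]
    }


if __name__ == "__main__":
    # Hypothetical records; names are placeholders, not real accessions.
    refseq = [
        {"name": "refseq_A", "chrom": "chr1", "strand": "+",
         "cds_exons": [(100, 200), (300, 400)]},
        {"name": "refseq_B", "chrom": "chr1", "strand": "+",
         "cds_exons": [(500, 600)]},
    ]
    hinxton = [
        {"name": "hinxton_A", "chrom": "chr1", "strand": "+",
         "cds_exons": [(100, 200), (300, 400)]},
        {"name": "hinxton_C", "chrom": "chr1", "strand": "+",
         "cds_exons": [(500, 650)]},
    ]
    for key, names in ccds_candidates(refseq, hinxton).items():
        print(key, names)
```

Keying on the full coordinate tuple means that a single differing exon boundary keeps two annotations apart, which matches the "identical CDS" requirement stated above.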

\ \

Credits

\

\ This track was produced at UCSC from data downloaded from the\ CCDS project \ web site.\

\ \

References

\

\ Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T, et al.\ The Ensembl genome database project. \ Nucl. Acids Res. 2002 Jan 1;30(1):38-41.

\

\ Pruitt KD, Tatusova T, Maglott DR.\ NCBI Reference Sequence (RefSeq): a curated non-redundant \ sequence database of genomes, transcripts and proteins. \ Nucl. Acids Res. 2005 Jan 1;33(Database Issue):D501-D504. \

\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ interPro InterPro psl InterPro Domains 3 34.6 12 12 120 133 133 187 0 0 0

Description

\

\ Description of InterPro goes here.\ \

Methods

\

\ Methods goes here.\

Credits

\

\ Credits goes here.\ \ genes 1 refGene RefSeq Genes genePred refPep refMrna RefSeq Genes 1 35 12 12 120 133 133 187 0 0 0

Description

\

\ The RefSeq Genes track shows known protein-coding genes taken from \ the NCBI mRNA reference sequences collection (RefSeq). On assemblies in \ which incremental GenBank downloads are supported, the data underlying this \ track are updated nightly.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks.\ The color shading indicates the level of review the RefSeq record has \ undergone: predicted (light), provisional (medium), reviewed (dark). \ In some assemblies, non-coding RNA genes are shown in a separate track.

\

\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page. \ This page is accessed via the small button to the left of the track's \ graphical display or through the link on the track's control menu. \

\

\ After you have made your selections, click the Submit button to \ return to the tracks display page.

\ \

Methods

\

\ RefSeq mRNAs were aligned against the $Organism genome using blat; \ those with an alignment of less than 15% were discarded. When a single mRNA \ aligned in multiple places, the alignment having the highest base identity \ was identified. Only alignments having a base identity level within 0.1% of \ the best and at least 96% base identity with the genomic sequence were kept.\
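The filtering rule above can be sketched in a few lines. This is an illustration under stated assumptions rather than the UCSC pipeline itself: the record fields (`query`, `identity`, `coverage`) are hypothetical, "an alignment of less than 15%" is read as the percentage of the mRNA that aligned, and the coverage cut is assumed to be applied before the best identity per mRNA is determined.

```python
def filter_near_best(alignments, min_identity=96.0, near_best=0.1,
                     min_coverage=15.0):
    """Keep alignments close to the best identity seen for each mRNA.

    alignments: list of dicts with 'query' (mRNA id), 'identity' and
    'coverage', both expressed as percentages (assumed field names).
    """
    # Discard alignments covering too little of the mRNA.
    covered = [a for a in alignments if a["coverage"] >= min_coverage]

    # Highest base identity observed for each mRNA.
    best = {}
    for a in covered:
        best[a["query"]] = max(best.get(a["query"], 0.0), a["identity"])

    # Keep alignments within `near_best` of that maximum and above the floor.
    return [
        a for a in covered
        if a["identity"] >= min_identity
        and a["identity"] >= best[a["query"]] - near_best
    ]


if __name__ == "__main__":
    hits = [
        {"query": "mrna1", "identity": 99.9, "coverage": 98.0},
        {"query": "mrna1", "identity": 99.0, "coverage": 99.0},
        {"query": "mrna2", "identity": 95.0, "coverage": 90.0},
    ]
    for hit in filter_near_best(hits):
        print(hit)
```

The same function covers tracks that use other thresholds by changing `near_best` and `min_identity`.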

\ \ \

Credits

\

\ This track was produced at UCSC from mRNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project.

\ \

References

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \

Pruitt, K.D., Tatusova, T., Maglott, D.R. \ NCBI Reference Sequence (RefSeq): a curated non-redundant \ sequence database of genomes, transcripts and proteins. Nucleic Acids \ Res. 33(1), D501-D504 (2005).\

\ \ \ genes 1 xenoRefGene Other RefSeq genePred xenoRefPep xenoRefMrna Non-$Organism RefSeq Genes 1 35.1 12 12 120 133 133 187 0 0 0

Description

\

\ This track shows known protein-coding genes from organisms other than \ $organism, taken from the NCBI mRNA reference sequences collection (RefSeq). \

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks.\ The color shading indicates the level of review the RefSeq record has \ undergone: predicted (light), provisional (medium), reviewed (dark). \ In some assemblies, non-coding RNA genes are shown in a separate track.

\

\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page. \

\ \

Methods

\

\ The mRNAs were aligned against the $organism genome using blat; those\ with an alignment of less than 15% were discarded. When a single mRNA aligned \ in multiple places, the alignment having the highest base identity was \ identified. Only alignments having a base identity level within 0.5% of \ the best and at least 25% base identity with the genomic sequence were kept.\

\ \

Credits

\

\ This track was produced at UCSC from mRNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project.

\ \

References

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ genes 1 rgdGene RGD Genes genePred Rat Genome Database Curated Genes 1 35.5 12 12 120 133 133 187 0 0 0 http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=

Description

\

\ This track shows RefSeq genes curated by the Rat Genome Database (RGD).\ Coding exons are represented by \ blocks connected by horizontal lines representing introns. The 5' and 3' \ untranslated regions (UTRs) are displayed as thinner blocks on the leading \ and trailing ends of the aligning regions. In full display mode, arrowheads \ on the connecting intron lines indicate the direction of transcription.

\ \

Methods

\

\ The annotation data file, \ RGD_curated_genes.gff, was downloaded from the RGD website\ and processed to create this track.

\ \

Credits

\

\ Thanks to the RGD for \ providing this annotation. RGD is funded by grant HL64541 entitled "Rat \ Genome Database", awarded to Dr. Howard J Jacob, Medical College of \ Wisconsin, from the National Heart Lung and Blood Institute \ (NHLBI) of the National \ Institutes of Health (NIH).\

\ genes 1 mgcGenes MGC Genes genePred Mammalian Gene Collection Full ORF mRNAs 3 36 34 139 34 144 197 144 0 0 0

Description

\

\ This track shows alignments of $organism mRNAs from the\ Mammalian Gene Collection \ (MGC) having full-length open reading frames (ORFs) to the genome.

\ \

Display Conventions and Configuration

\

\ The track follows the display conventions for \ gene prediction \ tracks.

\

\ An optional codon coloring feature is available for quick\ validation and comparison of gene predictions.\ To display codon colors, select the genomic codons option from the\ Color track by codons pull-down menu. Click \ here for more \ information about this feature.

\ \

Methods

\

\ GenBank $organism MGC mRNAs identified as having full-length ORFs \ were aligned against the genome using blat. When a single mRNA \ aligned in multiple places, the alignment having the highest base identity was\ found. Only alignments having a base identity level within 1% of\ the best and at least 95% base identity with the genomic sequence \ were kept.

\ \

Credits

\

\ The $organism MGC full-length mRNA track was produced at UCSC from \ mRNA sequence data submitted to \ GenBank by the \ Mammalian Gene Collection project.

\ \

References

\

\ Mammalian Gene Collection project references.

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ genes 1 orfeomeGenes ORFeome Clones genePred . orfeomeMRna ORFeome Collaboration Gene Clones 0 36.1 34 139 34 144 197 144 0 0 0

Description

\

\ This track shows alignments of $organism clones from the\ ORFeome Collaboration to the genome.\

\ \

Display Conventions and Configuration

\

\ The track follows the display conventions for \ gene prediction \ tracks.

\ \

Methods

\

\ ORFeome $organism clones were obtained from GenBank and aligned against the\ genome using the blat program. When a single clone aligned in multiple places,\ the alignment having the highest base identity was found. Only alignments\ having a base identity level within 0.5% of the best and at least 96% base\ identity with the genomic sequence were kept.\

\ \

Credits and references

\

\ Visit the \ ORFeome Collaboration members page for a list of credits and references.\

\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ cdsDrawOptions enabled\ exonArrows on\ transMap TransMap genePred TransMap Genes 0 36.25 0 0 0 127 127 127 0 0 0 genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ compositeTrack on\ transMapAnc TransMap Ancestor genePred TransMap via Ancestor Tree Genes 0 36.26 0 0 0 127 127 127 0 0 0 genes 1 compositeTrack on\ protBlat Protein BLAT psl protein Protein Blatted Against Genome 0 37 0 100 0 255 240 200 0 0 0 genes 1 genieAlt AltGenie genePred genieAltPep Genie Gene Predictions from Affymetrix 1 39 125 0 150 190 127 202 0 0 0

Description

\

Genie predictions are based on \ Affymetrix's \ Genie gene finding software. Genie is a generalized HMM \ which accepts constraints based on mRNA and EST data.

\ genes 1 ensGene Ensembl Genes genePred ensPep Ensembl Gene Predictions 0 40 150 0 0 202 127 127 0 0 0

Description

\

\ These gene predictions were generated by Ensembl.

\ \

Methods

\

\ For a description of the methods used in Ensembl gene prediction, refer to \ Hubbard, T. et al. (2002) in the References section below.

\ \

Credits

\

\ Thanks to Ensembl for providing this annotation.

\ \

References

\

\ Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J,\ Curwen V, Down T, et al. \ The Ensembl genome database project.\ Nucleic Acids Res. 2002 Jan 1;30(1):38-41.

\ \ genes 1 ensEstGene Ensembl EST Genes genePred ensEstPep Ensembl EST Gene Predictions 0 40.5 150 0 0 202 127 127 0 0 0 http://www.ensembl.org/perl/geneview?db=estgene&transcript=$$

Description

\

\ Gene predictions from Ensembl based on ESTs.

\ \

Methods

\

\ ESTs were mapped onto the genome using a combination of Exonerate, Blast \ and Est_Genome, with a threshold defined as an overall percentage identity \ of 90% and at least one exon having a percentage identity of 97% or higher. \ The results were processed by merging the redundant ESTs and setting \ splice sites to the most common ends, resulting in alternative spliced \ forms. This evidence was processed by Genomewise, which finds the longest \ ORF and assigns 5' and 3' UTRs.
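The acceptance threshold in the first sentence above can be written as a small predicate. This is a sketch only: the function and parameter names are hypothetical, and "an overall percentage identity of 90%" is assumed to mean "at least 90%".

```python
def passes_est_threshold(overall_identity, exon_identities,
                         min_overall=90.0, min_best_exon=97.0):
    """True if the EST alignment meets both identity requirements.

    overall_identity: percent identity over the whole alignment.
    exon_identities: percent identity of each aligned exon.
    """
    has_good_exon = any(e >= min_best_exon for e in exon_identities)
    return overall_identity >= min_overall and has_good_exon


if __name__ == "__main__":
    print(passes_est_threshold(92.0, [88.0, 98.5]))
    print(passes_est_threshold(92.0, [95.0, 96.0]))
```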

\ \

Track Configuration

\

\ This track has an optional codon coloring feature that allows users to \ quickly validate and compare gene predictions. To display codon colors, \ select the genomic codons option from the Color track by \ codons pull-down menu at the top of the track description page.\ This page is accessed via the small button to the left of the track's\ graphical display or through the link on the track's control menu. Click \ here for more information about this feature.\

\

\ After you have made your configuration selections, click the \ Submit button to return to the tracks display page.

\ \

Credits

\

\ Thanks to Ensembl \ for providing this annotation.

\ \ genes 1 acembly AceView Genes genePred acemblyPep acemblyMrna AceView Gene Models With Alt-Splicing 0 41 155 0 125 205 127 190 0 0 0 http://www.ncbi.nih.gov/IEB/Research/Acembly/av.cgi?db=human&l=$$

Description

\

\ This track shows AceView gene models constructed from\ mRNA, EST and genomic evidence by Danielle and Jean Thierry-Mieg\ and Vahan Simonyan using the \ Acembly program.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks. Gene models that fall into the "main" prediction class\ are displayed in purple; "putative" \ genes are displayed in pink.

\

\ The track description page offers the following filter and configuration\ options:\

\

\ \

Methods

\

\ AceView attempts to find the best alignment of each mRNA/EST against the\ genome, and clusters the alignments into the least possible number of\ alternatively spliced transcripts. The reconstructed transcripts are then\ clustered into genes by simple transitive contact. To see the evidence that \ supports each transcript, click the "Outside Link" on an individual \ transcript's details page to access the NCBI AceView web site.

\

\ Each AceView transcript model has a gene cluster designation\ (alternate name) that is categorized into a prediction class\ of either main or \ putative.

\

\ Prediction Class: main \
Class of genes that includes the protein coding genes (defined\ here by CDS > 100 amino acids) and all genes with at least one\ well-defined standard intron, i.e., an intron with a GT-AG or GC-AG\ boundary, supported by at least one clone matching exactly, with\ no ambiguous bases, and the 8 bases on either side of the intron \ identical to the genome. Genes with a CDS smaller than 100 amino acids are\ included in this class if they meet one of the following conditions: they \ have an NCBI RefSeq sequence (NM_#) or an OMIM identifier, or they encode a \ protein with BlastP homology (< 1e-3) to a cDNA-supported nematode AceView \ protein.

\

\ Prediction Class: putative\
Class of genes that have no standard intron and do not\ encode CDS of more than 100 amino acids, yet may be sufficiently useful to \ justify not disregarding them completely. Putative genes may be of two\ types: either those supported by more than six cDNA clones or those that\ encode a putative protein with an interesting annotation. Examples include\ a PFAM motif, a BlastP hit to a species other than itself (< 1e-3), \ a transmembrane domain or other rare and meaningful domains\ identified by Psort2, or a highly probable localization in a cell\ compartment (excluding cytoplasm and nucleus).

\ \

Credits

\

\ Thanks to Danielle and Jean \ Thierry-Mieg at NIH for providing this track.

\ \

References

\

\ Thierry-Mieg D, Thierry-Mieg J. \ AceView: a comprehensive cDNA-supported gene and transcripts \ annotation.\ Genome Biol. 2006;7 Suppl 1:S12.1-14.

\ genes 1 ECgene ECgene Genes genePred ECgenePep ECgene Gene Predictions with Alt-Splicing 0 41.5 155 0 125 205 127 190 0 0 0

Description

\

\ ECgene (gene prediction by EST clustering) predicts genes by combining \ genome-based EST clustering and transcript \ assembly methods. The EST clustering is based on genomic alignment of mRNA \ and ESTs similar to that of NCBI's UniGene for the human genome. The \ transcript assembly procedure yields gene models for each cluster that \ include alternative splicing variants. This algorithm was developed by Prof. \ Sanghyuk Lee's Lab of Bioinformatics at Ewha Womans University in Seoul, \ Korea.

\

\ For more detailed information, see the \ ECgene website.\

\ \

Display Conventions

\

\ This track follows the display conventions for \ gene prediction \ tracks.

\ \

Methods

\ The following is a brief summary of the ECgene algorithm: \
    \
  1. \ Genomic alignment of mRNA and ESTs: Input sequences are aligned against the \ genome using the Blat program developed by Jim Kent. Blat alignments are corrected for \ valid splice sites, and the SIM4 program is used for suspicious alignments if necessary.\
  2. \ Sequences that share more than one splice site are clustered together. This produces the \ primary clusters without unspliced sequences (singletons).\
  3. \ The genomic alignment of exons in each spliced sequence is represented as a directed \ acyclic graph (DAG), and all possible gene models are derived by the depth-first-search \ (DFS) method.\
  4. \ Sequences compatible with each gene model are grouped together as sub-clusters. Gene \ models without sufficient evidence are discarded at this stage. Sensitive detection of \ polyA tails is achieved by analyzing genomic alignment of mRNA and EST sequences,\ and specifically used to determine the gene boundary.\
  5. \ Finally, unspliced sequences are added so as not to change the splice sites of the \ existing gene model.\
\ \
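Step 3 of the summary above (exons as a directed acyclic graph, gene models derived by depth-first search) can be illustrated with a minimal path enumeration. This is not the ECgene implementation: the graph encoding (exon label mapped to the exons it splices to) and the exon labels are assumptions made for the example, and the evidence filtering of step 4 is left out.

```python
def enumerate_gene_models(graph, sources, sinks):
    """Yield every source-to-sink path of a DAG using depth-first search.

    graph: dict mapping an exon label to the exons reachable by one
    observed splice junction (assumed encoding).
    """
    sinks = set(sinks)

    def dfs(node, path):
        path.append(node)
        if node in sinks:
            yield list(path)  # copy, because `path` is reused while backtracking
        for nxt in graph.get(node, ()):
            yield from dfs(nxt, path)
        path.pop()

    for source in sources:
        yield from dfs(source, [])


if __name__ == "__main__":
    # Hypothetical cluster: E2 and E3 are alternative middle exons.
    splice_graph = {"E1": ["E2", "E3"], "E2": ["E4"], "E3": ["E4"]}
    for model in enumerate_gene_models(splice_graph, ["E1"], ["E4"]):
        print(" -> ".join(model))
```

Because the graph is acyclic, the search always terminates, and each distinct path corresponds to one candidate splice variant.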

Credits

\

\ The predictions for this track were produced by Namshin Kim and Sanghyuk Lee \ at Ewha Womans University, Seoul, Korea.\ genes 1 ensEst Ensembl ESTs genePred ensEstPep $Organism ESTs From Ensembl 0 42 175 20 125 215 137 190 0 0 0

Description

\

\ Gene predictions from Ensembl based on expressed sequence tags (ESTs).

\ \

Methods

\

\ For a description of the methods used, refer to \ Hubbard, T. et al. (2002) in the References section below.

\ \

Track Configuration

\

\ This track has an optional codon coloring feature that allows users to \ quickly validate and compare gene predictions. To display codon colors, \ select the genomic codons option from the Color track by \ codons pull-down menu at the top of the track description page.\ This page is accessed via the small button to the left of the track's\ graphical display or through the link on the track's control menu. Click \ here for more information about this feature.

\

\ After you have made your configuration selections, click the \ Submit button to return to the tracks display page.

\ \

Credits

\

\ Thanks to Ensembl \ for providing this annotation.

\ \

References

\

\ Hubbard, T. et al. \ The Ensembl genome database project.\ Nucleic Acids Research 30(1), 38-41 (2002).

\ \ genes 1 ncbiGenes NCBI Gene Models genePred ncbiPep $Organism Gene Models from NCBI 0 43 0 0 0 127 127 127 0 0 0

Description & Credits

\ \ Gene predictions from \ NCBI . \ See the human build \ \ release notes \ for a description of the build. \ genes 1 npredGene NCBI Prediction genePred npredPep NCBI Gene Predictions 0 44 170 100 0 212 177 127 0 0 0 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=$$

Description

\

This track shows predictions from NCBI Genome\ Assembly/Annotation Projects.\ \

Methods

\ Methods details goes here.\

Credits

\ Thanks to NCBI.\ \ genes 1 ucscFromMouse UCSC Mm3 genePred UCSC Gene Predictions from Known Mouse Genes Mapped to Human 0 45 0 100 100 0 50 50 0 0 0 genes 1 twinscan Twinscan genePred twinscanPep Twinscan Gene Predictions Using Mouse/Human Homology 0 45 0 100 100 127 177 177 0 0 0

Description

\

\ The Twinscan program predicts genes in a manner similar to Genscan, except \ that Twinscan takes advantage of genome comparisons to improve gene prediction\ accuracy. More information and a web server can be found at\ http://mblab.wustl.edu/.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks.

\

\ The track description page offers the following filter and configuration\ options:\

\ \

Methods

\

\ The Twinscan algorithm is described in Korf, I. et al. (2001) in the\ References section below.

\ \

Credits

\

\ Thanks to Michael Brent's Computational Genomics Group at Washington \ University St. Louis for providing these data.

\ \

References

\

\ Korf I, Flicek P, Duan D, Brent MR.\ Integrating genomic homology into gene structure prediction.\ Bioinformatics. 2001;17 Suppl 1:S140-8.

\ genes 1 genomeScan NCBI GenomeScan genePred genomeScanPep $Organism GenomeScan Models from NCBI 0 46 0 0 0 127 127 127 0 0 0

Description & Credits

\ \ Pure GenomeScan gene predictions from \ NCBI .\ See the human build \ \ release notes \ for a description of the build. \ genes 1 sgpGene SGP Genes genePred sgpPep SGP Gene Predictions Using Mouse/Human Homology 0 47 0 90 100 127 172 177 0 0 0

Description

\

\ This track shows gene predictions from the SGP program, which is being developed at \ the Grup de Recerca en\ Informàtica Biomèdica (GRIB) at Institut Municipal d'Investigació Mèdica (IMIM) in \ Barcelona. To predict genes in a genomic\ query, SGP combines geneid predictions with tblastx comparisons of the genomic query against other genomic sequences.\

\

Credits

\

\ Thanks to GRIB for providing these gene predictions.\

\ \ \ \ genes 1 softberryGene Fgenesh++ Genes genePred softberryPep Fgenesh++ Gene Predictions 0 48 0 100 0 127 177 127 0 0 0

Description

\

\ Fgenesh++ predictions are based on Softberry's gene-finding software.

\ \

Methods

\

\ Fgenesh++ uses both hidden Markov models (HMMs) and protein similarity to \ find genes in a completely automated manner. For more information, see \ Solovyev, V.V. (2001) in the References section below.

\ \

Credits

\

\ The Fgenesh++ gene predictions were produced by \ Softberry Inc. \ Commercial use of these predictions is restricted to viewing in \ this browser. Please contact Softberry Inc. to make arrangements for further \ commercial access.

\ \

References

\

\ Solovyev, V.V. \ "Statistical approaches in Eukaryotic gene prediction" in the \ Handbook of Statistical Genetics (ed. Balding, D. et al.), \ 83-127. John Wiley & Sons, Ltd. (2001).

\ genes 1 geneid Geneid Genes genePred geneidPep Geneid Gene Predictions 0 49 0 90 100 127 172 177 0 0 0

Description

\

\ This track shows gene predictions from the geneid program developed at the \ Grup de Recerca en\ Informàtica Biomèdica (GRIB) at Institut Municipal d'Investigació Mèdica (IMIM) in \ Barcelona. \

\

Methods

\

\ Geneid is a program to predict genes in anonymous genomic sequences designed \ with a hierarchical structure. In the first step, splice sites, start and stop \ codons are predicted and scored along the sequence using Position Weight Arrays \ (PWAs). Next, exons are built from the sites. Exons are scored as the sum of the \ scores of the defining sites, plus the log-likelihood ratio of a \ Markov Model for coding DNA. Finally, from the set of predicted exons, the gene \ structure is assembled, maximizing the sum of the scores of the assembled exons. \
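The last two steps above (exon scoring and score-maximizing assembly) can be sketched with a small dynamic program. This is a deliberately simplified stand-in, not geneid's algorithm: it only requires that chained exons do not overlap, ignores reading frame and site compatibility, and takes the site scores and coding log-likelihood ratio as given numbers. All names are hypothetical.

```python
from bisect import bisect_right


def exon_score(site_scores, coding_llr):
    """Exon score: sum of defining-site scores plus the coding log-likelihood ratio."""
    return sum(site_scores) + coding_llr


def assemble_best_gene(exons):
    """Pick the non-overlapping exon chain with the highest total score.

    exons: list of (start, end, score) with inclusive coordinates.
    Returns (total_score, chain) where chain is ordered by position.
    """
    exons = sorted(exons, key=lambda e: e[1])
    ends = [e[1] for e in exons]
    # best[i] holds the best (score, chain) using only the first i exons.
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # Number of earlier exons that end strictly before this one starts.
        j = bisect_right(ends, start - 1, 0, i)
        take = (best[j][0] + score, best[j][1] + [(start, end, score)])
        best.append(max(best[i], take, key=lambda t: t[0]))
    return best[-1]


if __name__ == "__main__":
    candidates = [
        (100, 200, exon_score([2.0, 1.5], 1.5)),
        (150, 300, 7.0),
        (250, 400, 4.0),
        (450, 500, -1.0),
    ]
    total, chain = assemble_best_gene(candidates)
    print(total, chain)
```

Exons with a negative score are simply skipped, since leaving them out never lowers the total.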

\

Credits

\

\ Thanks to GRIB for providing these data.\

\ genes 1 jgiGene JGI Genes genePred jgiPep JGI Gene Predictions 1 49 0 0 0 127 127 127 0 0 0 genes 1 snapGene SNAP Genes genePred SNAP Gene Predictions 1 49 0 0 0 127 127 127 0 0 0 genes 1 genscan Genscan Genes genePred genscanPep Genscan Gene Predictions 0 50 170 100 0 212 177 127 0 0 0

Description

\

\ This track shows predictions from the \ Genscan program \ written by Chris Burge.\ The predictions are based on transcriptional, \ translational, and donor/acceptor splicing signals, as well as the length \ and compositional distributions of exons, introns and intergenic regions.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks. \

\ The track description page offers the following filter and configuration\ options:\

\ \

Methods

\

\ For a description of the Genscan program and the model that underlies it, \ refer to Burge and Karlin (1997) in the References section below. \ The splice site models used are described in more detail in Burge (1998)\ below.

\ \

Credits

\ Thanks to Chris Burge for providing these data.\ \

References

\

\ Burge C. \ Modeling Dependencies in Pre-mRNA Splicing Signals. \ In Salzberg S, Searls D, Kasif S, eds. \ Computational Methods in Molecular Biology, \ Elsevier Science, Amsterdam. 1998;127-163.

\

\ Burge C, Karlin S. \ Prediction of Complete Gene Structures in Human Genomic DNA.\ J. Mol. Biol. 1997 Apr 25;268(1):78-94.

\ genes 1 genscanExtra Genscan Extra bed 6 . Genscan Extra (Suboptimal) Exon Predictions 0 51 180 90 0 217 172 127 0 0 1 chr22, genes 1 augustus Augustus genePred Augustus Gene Predictions 0 51.7 180 0 0 217 127 127 0 0 0 genes 1 rnaGene RNA Genes bed 6 + Non-coding RNA Genes (dark) and Pseudogenes (light) 0 52 170 80 0 230 180 130 0 0 0

Description

\

\ This track shows the location of non-protein coding RNA genes and\ pseudogenes. \

\ Feature types include:\

\

\ \

Methods

\ \

\ Eddy-tRNAscanSE (tRNA genes, Sean Eddy):
\ tRNAscan-SE 1.23 with default parameters.\ Score field contains tRNAscan-SE bit score; >20 is good, >50 is great.

\

\ Eddy-BLAST-tRNAlib (tRNA pseudogenes, Sean Eddy):
\ Wublast 2.0, with options "-kap wordmask=seg B=50000 W=8 cpus=1".\ Score field contains % identity in blast-aligned region.\ Used each of 602 tRNAs and pseudogenes predicted by tRNAscan-SE\ in the human oo27 assembly as queries. Kept all nonoverlapping\ regions that hit one or more of these with P <= 0.001.

\

\ Eddy-BLAST-snornalib (known snoRNAs and snoRNA pseudogenes, Steve Johnson):
\ Wublastn 2.0, with options "-V=25 -hspmax=5000 -kap wordmask=seg \ B=5000 W=8 cpus=1".\ Score field contains blast score.\ Used each of 104 unique snoRNAs in snorna.lib as a query.\ Any hit >=95% full length and >=90% identity is annotated as a\ "true gene".\ Any other hit with P <= 0.001 is annotated as a "related sequence" \ and interpreted as a putative pseudogene.

\

\ Eddy-BLAST-otherrnalib \ (non-tRNA, non-snoRNA noncoding RNAs with GenBank entries\ for the human gene.):
\ Wublastn 2.0 [15 Apr 2002]\ with options: "-kap -cpus=1 -wordmask=seg -W=8 -E=0.01 -hspmax=0\ -B=50000 -Z=3000000000". Exceptions to this are:\

\

\ The score field contains the blastn score. \ Used 41 unique miRNAs, and 29 other ncRNAs as queries.\ Any hit >=95% full length and >=95% identity is annotated as a \ "true gene".\ Any other hit with P <= 0.001 and >= 65% identity is annotated\ as a "related sequence". An exception to this is: all miRNAs consist \ \ of 16-26 bp sequences in GenBank \ and are only annotated if 100% full length and 100% identity. \ miRNAs consist of Let-7 from Pasquinelli et al., \ Nature (2000) 408:86; 40 from Mourelatos et al., Genes & Dev (2002) \ 16:720.
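The annotation rules stated above for snoRNA hits, other ncRNA hits and miRNAs can be collected into one classifier. This is a sketch of the stated thresholds only: the function name, the `kind` labels and the argument names are hypothetical, and the BLAST search itself is not shown.

```python
# Identity thresholds (percent) taken from the rules described in the text.
RULES = {
    "snoRNA": {"gene_identity": 90.0, "related_identity": 0.0},
    "other": {"gene_identity": 95.0, "related_identity": 65.0},
}


def classify_hit(kind, coverage_pct, identity_pct, p_value):
    """Return 'true gene', 'related sequence', or None for a BLAST hit.

    kind: 'snoRNA', 'other' or 'miRNA' (assumed labels).
    coverage_pct: percentage of the query length covered by the hit.
    """
    if kind == "miRNA":
        # miRNAs are annotated only when full length and identical.
        exact = coverage_pct == 100.0 and identity_pct == 100.0
        return "true gene" if exact else None

    rule = RULES[kind]
    if coverage_pct >= 95.0 and identity_pct >= rule["gene_identity"]:
        return "true gene"
    if p_value <= 0.001 and identity_pct >= rule["related_identity"]:
        return "related sequence"
    return None


if __name__ == "__main__":
    print(classify_hit("snoRNA", 97.0, 92.0, 1e-10))
    print(classify_hit("other", 97.0, 92.0, 1e-10))
    print(classify_hit("miRNA", 100.0, 99.0, 1e-10))
```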

\

Credits

\

\ These data were kindly provided by Sean Eddy at Washington University.

\ genes 1 superfamily Superfamily bed 4 + Superfamily/SCOP: Proteins Having Homologs with Known Structure/Function 0 53 150 0 0 202 127 127 0 0 0 http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/cgi-bin/gene.cgi?genome=

Description

\

\ The \ Superfamily \ track shows proteins having homologs with known structures or functions.

\

\ Each entry on the track shows the coding region of a gene (based on Ensembl gene predictions).\ In full display mode, the label for an entry consists of the names of \ all known protein domains encoded by this gene. This \ usually contains structural and/or functional descriptions that provide valuable \ information to help users get a quick grasp of the biological significance of the \ gene.

\ \

Methods

\

\ Data are downloaded from the Superfamily server.\ Using the cross-reference between Superfamily entries and Ensembl gene prediction \ entries and their alignment to the appropriate genome, the associated data are \ processed to generate a simple BED format track.

\

Credits

\

\ Superfamily was developed by\ Julian\ Gough at the MRC Laboratory\ of Molecular Biology, Cambridge.

\

\ Gough, J., Karplus, K., Hughey, R. and\ Chothia, C. (2001). "Assignment of Homology to Genome Sequences using a\ Library of Hidden Markov Models that Represent all Proteins of Known Structure". \ J. Mol. Biol., 313(4), 903-919.

\ \ genes 1 mrna $Organism mRNAs psl . $Organism mRNAs from GenBank 3 54 0 0 0 127 127 127 1 0 0

Description

\

\ The mRNA track shows alignments between $organism mRNAs\ in GenBank and the genome.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA \ display. For example, to apply the filter to all mRNAs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter \ criteria will be highlighted. If "or" is selected, mRNAs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display mRNAs that match the filter criteria. \ If "include" is selected, the browser will display only those \ mRNAs that match the filter criteria.\
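The combination logic in the steps above ("and"/"or", then "include"/"exclude") can be modelled as follows. This is an illustration of the described behaviour, not the browser's code: the field names are hypothetical, and shell-style wildcards from Python's `fnmatch` module stand in for whatever wildcard syntax the filter actually accepts.

```python
from fnmatch import fnmatchcase


def matches(mrna, criteria, logic="and"):
    """True if the mRNA record satisfies the filter criteria.

    criteria: dict mapping a field name (e.g. 'tissue') to a pattern.
    logic: 'and' requires every criterion, 'or' requires any one.
    """
    hits = [
        fnmatchcase(mrna.get(field, "").lower(), pattern.lower())
        for field, pattern in criteria.items()
    ]
    return all(hits) if logic == "and" else any(hits)


def apply_filter(mrnas, criteria, logic="and", mode="include"):
    """Return the records displayed under 'include' or 'exclude' mode."""
    if mode == "include":
        return [m for m in mrnas if matches(m, criteria, logic)]
    return [m for m in mrnas if not matches(m, criteria, logic)]


if __name__ == "__main__":
    records = [
        {"acc": "A1", "tissue": "liver"},
        {"acc": "A2", "tissue": "fetal liver"},
        {"acc": "A3", "tissue": "brain"},
    ]
    print(apply_filter(records, {"tissue": "*liver"}, mode="include"))
    print(apply_filter(records, {"tissue": "*liver"}, mode="exclude"))
```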

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly validate and compare mRNA. For more \ information about this option, click \ here.\

\ \

Methods

\

\ GenBank $organism mRNAs were aligned against the genome using the \ blat program. When a single mRNA aligned in multiple places, \ the alignment having the highest base identity was found. \ Only alignments having a base identity level within 0.5% of\ the best and at least 96% base identity with the genomic sequence were kept.\

\ \

Credits

\

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J,\ Wheeler DL.\ GenBank: update. Nucleic Acids Res.\ 2004 Jan 1;32(Database issue):D23-6.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ rna 1 baseColorDefault diffCodons\ baseColorUseCds genbank\ baseColorUseSequence genbank\ indelDoubleInsert on\ indelPolyA on\ indelQueryInsert on\ showDiffBasesAllScales .\ all_mrna Ciona mRNAs psl mrna $Organism mRNAs from Genbank 1 54 0 0 0 127 127 127 1 0 0

Description

\

\ The mRNA track shows alignments between $Organism mRNAs\ in GenBank and the genome.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA \ display. For example, to apply the filter to all mRNAs expressed in the \ liver, type "liver" in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter \ criteria will be highlighted. If "or" is selected, mRNAs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display mRNAs that match the filter criteria. \ If "include" is selected, the browser will display only those \ mRNAs that match the filter criteria.\

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly validate and compare mRNA. For more \ information about this option, click \ here.\

\ \

Methods

\

\ GenBank $Organism mRNAs were aligned against the genome using the \ blat program. When a single mRNA aligned in multiple places, \ the alignment having the highest base identity was found. \ Only alignments having a base identity level within 0.5% of\ the best and at least 96% base identity with the genomic sequence were kept.\

\ \

Credits

\

\ This track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and \ Wheeler, D.L. \ GenBank: update. Nucleic Acids Res. 32,\ D23-6 (2004).

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ rna 1 baseColorDefault diffCodons\ indelDoubleInsert on\ indelPolyA on\ indelQueryInsert on\ showDiffBasesAllScales .\ tightMrna Tight mRNAS psl . Tightly Filtered $Organism mRNAs from GenBank 0 55 0 0 0 127 127 127 1 0 0 rna 1 baseColorDefault diffCodons\ baseColorUseCds genbank\ baseColorUseSequence genbank\ indelDoubleInsert on\ indelPolyA on\ indelQueryInsert on\ showDiffBasesAllScales .\ intronEst Spliced ESTs psl est $Organism ESTs That Have Been Spliced 1 56 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows alignments between $Organism expressed sequence tags\ (ESTs) in GenBank and the genome that show signs of splicing when\ aligned against the genome. ESTs are single-read sequences, typically about \ 500 bases in length, that usually represent fragments of transcribed genes.\

\

\ To be considered spliced, an EST must show \ evidence of at least one canonical intron, i.e. one that is at least\ 32 bases in length and has GT/AG ends. By requiring splicing, the level \ of contamination in the EST databases is drastically reduced\ at the expense of eliminating many genuine 3' ESTs.\ For a display of all ESTs (including unspliced), see the \ $Organism EST track.
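
The splicing criterion can be illustrated with a short Python sketch. It assumes the alignment is on the + strand of the genomic sequence and that block coordinates are 0-based and half-open; the function names are hypothetical, and a real implementation would also need to handle the reverse strand, where the same intron reads CT/AC.

```python
def is_canonical_intron(genome_seq, start, end, min_length=32):
    """Check a gap between two aligned blocks on the + strand.

    genome_seq -- genomic sequence as a string
    start, end -- 0-based, half-open coordinates of the candidate intron
    """
    intron = genome_seq[start:end].upper()
    return (len(intron) >= min_length
            and intron.startswith("GT")
            and intron.endswith("AG"))

def looks_spliced(genome_seq, blocks):
    """True if any gap between consecutive blocks is a canonical intron.

    blocks -- list of (start, end) genomic block coordinates, sorted by start
    """
    gaps = zip(blocks, blocks[1:])
    return any(is_canonical_intron(genome_seq, left[1], right[0])
               for left, right in gaps)

# Toy sequence: 10-base exon, 34-base GT...AG intron, 10-base exon.
toy_genome = "A" * 10 + "GT" + "C" * 30 + "AG" + "A" * 10
spliced = looks_spliced(toy_genome, [(0, 10), (44, 54)])
```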

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST\ display. For example, to apply the filter to all ESTs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all filter \ criteria will be highlighted. If "or" is selected, ESTs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display ESTs that match the filter criteria. \ If "include" is selected, the browser will display only those \ ESTs that match the filter criteria.\

\

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those \ that differ from the genomic sequence. For more information about this option,\ click \ here.\

\ \

Methods

\

\ To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector and a read is taken from the 5'\ and/or 3' primer. For most — but not all — ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.

\

\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the \ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.

\

\ To generate this track, $Organism ESTs from GenBank were aligned \ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligned in multiple places, the alignment having the \ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity \ with the genomic sequence are displayed in this track.

\ \

Credits

\

\ This track was produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and \ Wheeler, D.L. \ GenBank: update. Nucleic Acids Res. 32,\ D23-6 (2004).

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ rna 1 indelDoubleInsert on\ indelQueryInsert on\ intronGap 30\ showDiffBasesAllScales .\ est $Organism ESTs psl est $Organism ESTs Including Unspliced 0 57 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows alignments between $organism expressed sequence tags \ (ESTs) in GenBank and the genome. ESTs are single-read sequences, \ typically about 500 bases in length, that usually represent fragments of \ transcribed genes.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST\ display. For example, to apply the filter to all ESTs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all filter \ criteria will be highlighted. If "or" is selected, ESTs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display ESTs that match the filter criteria. \ If "include" is selected, the browser will display only those \ ESTs that match the filter criteria.\

\

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those \ that differ from the genomic sequence. For more information about this option,\ click \ here.\

\ \

Methods

\

\ To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector and a read is taken from the 5'\ and/or 3' primer. For most — but not all — ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.

\

\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the \ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.

\

\ To generate this track, $organism ESTs from GenBank were aligned \ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligned in multiple places, the alignment having the \ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity \ with the genomic sequence were kept.

\ \

Credits

\

\ This track was produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J,\ Wheeler DL.\ GenBank: update. Nucleic Acids Res.\ 2004 Jan 1;32(Database issue):D23-6.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ rna 1 baseColorUseSequence genbank\ indelDoubleInsert on\ indelQueryInsert on\ intronGap 30\ all_est All Ciona ESTs psl est $Organism ESTs Including Unspliced 0 57 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows alignments between $Organism expressed sequence tags\ (ESTs) in GenBank and the genome. ESTs are single-read sequences, \ typically about 500 bases in length, that usually represent fragments of \ transcribed genes.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST\ display. For example, to apply the filter to all ESTs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all filter \ criteria will be highlighted. If "or" is selected, ESTs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display ESTs that match the filter criteria. \ If "include" is selected, the browser will display only those \ ESTs that match the filter criteria.\

\

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those \ that differ from the genomic sequence. For more information about this option,\ click \ here.\

\ \

Methods

\

\ To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector and a read is taken from the 5'\ and/or 3' primer. For most — but not all — ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.

\

\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the \ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.

\

\ To generate this track, $Organism ESTs from GenBank were aligned \ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligned in multiple places, the alignment having the \ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity \ with the genomic sequence are displayed in this track.

\ \

Credits

\

\ This track was produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and \ Wheeler, D.L. \ GenBank: update. Nucleic Acids Res. 32,\ D23-6 (2004).

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ rna 1 baseColorUseSequence genbank\ indelDoubleInsert on\ indelQueryInsert on\ rgdEst RGD EST psl est RGD EST 0 57.5 12 12 120 133 133 187 1 0 0 http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=

Description

\

\ This track shows expressed sequence tags (ESTs) downloaded from the\ Rat Genome Database (RGD). An EST is a partial sequence of a randomly-chosen \ cDNA, obtained from the results of a single DNA sequencing reaction. ESTs \ are used to identify transcribed regions in genomic sequence and to \ characterize patterns of gene expression in the tissue from which the \ cDNA was derived.

\ \

Methods

\

\ The data used to create this annotation were obtained from the file \ RGD_EST.gff downloaded from the RGD website.

\ \

Credits

\

\ Thanks to the RGD for \ providing this annotation. RGD is funded by grant HL64541 entitled \ "Rat Genome Database", awarded to Dr. Howard J Jacob, Medical College of \ Wisconsin, from the National Heart Lung and Blood Institute \ (NHLBI) of the \ National Institutes of Health \ (NIH).

\ \ rna 1 tightEst Tight ESTs psl est Tightly Filtered $Organism ESTs Including Unspliced 0 58 0 0 0 127 127 127 1 0 0 rna 1 baseColorUseSequence genbank\ indelDoubleInsert on\ indelQueryInsert on\ miRNA miRNA bed 8 . MicroRNAs from miRBase 0 63 255 64 64 255 159 159 1 0 0 http://microrna.sanger.ac.uk/cgi-bin/sequences/mirna_entry.pl?id=$$

Description

\

\ The miRNA track shows microRNAs from the\ \ miRBase at The \ Wellcome Trust Sanger Institute.

\ \

Display Conventions and Configuration

\

\ Mature miRNAs (miRs) are represented by \ thick blocks. The predicted stem-loop portions of the primary transcripts\ are indicated by thinner blocks. miRNAs in the sense orientation are shown in\ black; those in the reverse orientation are colored grey. When a single \ precursor produces two mature miRs from its 5' and 3' parts, it is displayed \ twice with the two different positions of the mature miR.

\

\ To display only those items that exceed a specific unnormalized score, enter\ a minimum score between 0 and 1000 in the text box at the top of the track \ description page.\
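
For readers working with the underlying table outside the browser, a comparable score cutoff can be applied to tab-separated BED lines. The sketch below assumes the standard BED layout, in which the score is the fifth column, and treats the cutoff as inclusive; the function name and the example records are hypothetical.

```python
def filter_bed_by_score(lines, min_score):
    """Yield BED lines whose score column (5th field) is >= min_score."""
    if not 0 <= min_score <= 1000:
        raise ValueError("min_score must be between 0 and 1000")
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and int(fields[4]) >= min_score:
            yield line

# Hypothetical BED 8 records, for illustration only.
bed_lines = [
    "chr1\t100\t180\tmir-x\t960\t+\t110\t132",
    "chr1\t500\t590\tmir-y\t480\t-\t520\t542",
]
high_scoring = list(filter_bed_by_score(bed_lines, 900))
```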

\ \

Methods

\

\ Mature and precursor miRNAs from the miRNA Registry were\ aligned against the genome using blat.\ The extents of the precursor sequences were not generally known, and were\ predicted based on base-paired hairpin structure. \ miRBase is described in Griffiths-Jones, S. et al. (2006).\ The miRNA Registry is\ described in Griffiths-Jones, S. (2004) and Weber, M.J. (2005) in the \ References section below.

\ \

Credits

\

\ \ This track was created by Michel Weber of \ Laboratoire de Biologie Moléculaire Eucaryote,\ CNRS Université Paul Sabatier\ (Toulouse, France), Yves Quentin of Laboratoire de Microbiologie et Génétique\ Moléculaires (Toulouse, France) and Sam Griffiths-Jones of\ \ The Wellcome Trust Sanger Institute\ (Cambridge, UK).\

\

References

\

\ When making use of these data, please cite:

\

\ Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ.\ miRBase: microRNA sequences, targets and gene nomenclature.\ Nucl. Acids Res. 34, Database Issue, D140-D144 (2006).

\

\ Griffiths-Jones, S. \ The microRNA Registry,\ Nucl. Acids Res. 32, D109-D111 (2004).

\

\ Weber, M.J. \ New human and mouse microRNA genes found by homology search.\ FEBS J. 272, 59-73 (2005).

\

\ You may also want to cite The Wellcome Trust Sanger Institute \ miRNA Registry.

\

\ The following publication provides guidelines on miRNA annotation:\ Ambros, V. et al., \ A uniform system for microRNA annotation. \ RNA 9(3), 277-279 (2003).

\

\ For more information on blat, see \ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ genes 1 urlLabel miRBase:\ xenoMrna Other mRNAs psl xeno Non-$Organism mRNAs from GenBank 1 63 0 0 0 127 127 127 1 0 0

Description

\

\ This track displays translated blat alignments of vertebrate and\ invertebrate mRNA in \ GenBank from organisms other than $Organism.\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) for this track is in two parts. The\ first + indicates the orientation of the query sequence whose\ translated protein produced the match (here always 5' to 3', hence +).\ The second + or - indicates the orientation of the matching \ translated genomic sequence. Because the two orientations of a DNA \ sequence give different predicted protein sequences, there are four \ combinations. ++ is not the same as --, nor is +- the same as -+.
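
When processing such alignments in a script, the two-part strand can be split into its query and genomic components. The helper below is a hypothetical sketch that assumes the strand is stored as a two-character string such as "+-".

```python
def describe_strand(strand):
    """Split a two-character translated-alignment strand such as "+-"."""
    if len(strand) != 2 or any(c not in "+-" for c in strand):
        raise ValueError(f"expected two of '+'/'-', got {strand!r}")
    query, genome = strand
    return {"query": query, "genome": genome}

# The four combinations are distinct and should not be collapsed.
combos = [describe_strand(s) for s in ("++", "+-", "-+", "--")]
```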

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA \ display. For example, to apply the filter to all mRNAs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter \ criteria will be highlighted. If "or" is selected, mRNAs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display mRNAs that match the filter criteria. \ If "include" is selected, the browser will display only those \ mRNAs that match the filter criteria.\

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly validate and compare mRNAs. For more \ information about this option, click \ here.\

\ \

Methods

\

\ The mRNAs were aligned against the $organism genome using translated \ blat. When a single mRNA aligned in multiple places, the alignment having the\ highest base identity was found. Only those alignments having a base \ identity level within 1% of the best and at least 25% base identity with the \ genomic sequence were kept.

\ \

Credits

\

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and \ Wheeler, D.L. \ GenBank: update. Nucleic Acids Res. 32,\ D23-6 (2004).

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ rna 1 baseColorUseCds genbank\ baseColorUseSequence genbank\ indelDoubleInsert on\ indelQueryInsert on\ showDiffBasesAllScales .\ xenoBestMrna Other Best mRNAs psl xeno Non-$Organism mRNAs from GenBank Best in Genome Alignments 0 64 0 0 0 127 127 127 1 0 0

Description

\

\ This track displays translated blat alignments of vertebrate and\ invertebrate mRNA in \ GenBank from organisms other than $organism. \ Better alignments are indicated by darker coloration in the display.

\ \

Methods

\

\ The mRNAs were aligned against the $organism genome using translated blat. \ When a single mRNA aligned in multiple places, the alignment having the \ highest base identity was found. Only those alignments having a base \ identity level within 1% of the best and at least 25% base identity with the\ genomic sequence were kept.

\ \

Using the Filter

\

\ This track has a filter that can be used to change the display mode, \ change the color, and include/exclude a subset of items within the track.\ This may be helpful when many items are shown in the track display, \ especially when only some are relevant to the current task. \ The filter is located at the top of the track description page, which is \ accessed via the small button to the left of the track's graphical \ display or through the link on the track's control menu. \ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA \ display. For example, to apply the filter to all mRNAs expressed in the \ liver, type "liver" in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter \ criteria will be highlighted. If "or" is selected, mRNAs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display mRNAs that match the filter criteria. \ If "include" is selected, the browser will display only those \ mRNAs that match the filter criteria.\

\

\ When you have finished configuring the filter, click the Submit \ button.

\ \

Credits

\

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and \ Wheeler, D.L. \ GenBank: update. Nucleic Acids Res. 32,\ D23-6 (2004).

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ rna 1 baseColorUseCds genbank\ baseColorUseSequence genbank\ indelDoubleInsert on\ indelQueryInsert on\ showDiffBasesAllScales .\ xenoEst Other ESTs psl xeno Non-$Organism ESTs from GenBank 0 65 0 0 0 127 127 127 1 0 0 http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$

Description

\

\ This track displays translated blat alignments of expressed sequence tags \ (ESTs) in GenBank from organisms other than $organism.\ ESTs are single-read sequences, typically about 500 bases in length, that \ usually represent fragments of transcribed genes.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) for this track is in two parts. The\ first + or - indicates the orientation of the query sequence whose\ translated protein produced the match. The second + or - indicates the\ orientation of the matching translated genomic sequence. Because the two\ orientations of a DNA sequence give different predicted protein sequences,\ there are four combinations. ++ is not the same as --, nor is +- the same\ as -+.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST\ display. For example, to apply the filter to all ESTs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all filter \ criteria will be highlighted. If "or" is selected, ESTs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display ESTs that match the filter criteria. \ If "include" is selected, the browser will display only those \ ESTs that match the filter criteria.\

\

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those \ that differ from the genomic sequence. For more information about this option,\ click \ here.\

\ \

Methods

\

\ To generate this track, the ESTs were aligned against the genome using \ blat. When a single EST aligned in multiple places, the \ alignment having the highest base identity was found. Only alignments \ having a base identity level within 0.5% of the best and at least 96% base \ identity with the genomic sequence were kept.

\ \

Credits

\

\ This track was produced at UCSC from EST sequence data submitted to the \ international public sequence databases by scientists worldwide.

\ \

References

\

\ Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and \ Wheeler, D.L. \ GenBank: update. Nucleic Acids Res. 32,\ D23-6 (2004).

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ rna 1 baseColorUseSequence genbank\ indelDoubleInsert on\ indelQueryInsert on\ anyCovBed mRNA/EST/Pseud bed 3 . Blastz Alignments of GenBank mRNA Including Pseudogenes and All ESTs 0 66 170 128 128 212 191 191 0 0 0 rna 1 anyMrnaCov mRNA/Pseud bed 3 . Blastz Alignments of GenBank mRNA Including Pseudogenes 0 67 170 128 128 212 191 191 0 0 0 rna 1 tigrGeneIndex TIGR Gene Index genePred Alignment of TIGR Gene Index TCs Against the $Organism Genome 0 68 100 0 0 177 127 127 0 0 0 http://www.tigr.org/tigr-scripts/tgi/tc_report.pl?$$

Description

\

This track displays alignments of the TIGR Gene Index (TGI)\ against the $organism genome. The TIGR Gene Index is based\ largely on assemblies of EST sequences in the public databases.\ See \ www.tigr.org for more information about TIGR and the Gene Index.

\ \

Credits

\

Thanks to Foo Cheung and Razvan Sultana of The Institute for Genomic Research for converting these data into a track for the browser.

\ rna 1 autoTranslate 0\ uniGene_3 UniGene psl UniGene Alignments 0 69 0 0 0 127 127 127 1 0 0 http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=

Description

\

\ This track shows the UniGene genes from NCBI.\ Each UniGene entry is a set of transcript sequences that appear to come from the same transcription locus (gene or expressed pseudogene), together with information on protein similarities, gene expression, cDNA clone reagents, and genomic location. \

\

\ Coding exons are represented by \ blocks connected by horizontal lines representing introns. \ In full display mode, arrowheads \ on the connecting intron lines indicate the direction of transcription.

\ \

Methods

\

\ The UniGene sequence file, Hs.seq.uniq.gz, is downloaded from NCBI.\ Sequences are aligned to the base genome using BLAT to create this track.

\

\ When a single UniGene gene aligned in multiple places, \ the alignment having the highest base identity was found. \ Only alignments having a base identity level within 0.2% of the best and \ at least 96.5% base identity with the genomic sequence were kept. \

\

Credits

\

\ Thanks to UniGene for \ providing this annotation. \

\ rna 1 uniGene_2 UniGene bed 12 . UniGene Alignments and SAGE Info 0 69 0 0 0 127 127 127 1 0 0 rna 1 uniGene UniGene psl . UniGene Alignments and SAGE Info 0 70 0 0 0 127 127 127 0 0 0 rna 1 rnaCluster Gene Bounds bed 12 . Gene Boundaries as Defined by RNA and Spliced EST Clusters 0 71 200 0 50 227 127 152 0 0 0

Description

\

\ This track shows the boundaries of genes and the direction of\ transcription as deduced from clustering spliced ESTs and mRNAs\ against the genome. When many spliced variants of the same gene exist, \ this track shows the variant that spans the greatest distance in the \ genome.

\ \

Method

\

\ ESTs and mRNAs from \ GenBank were aligned against the genome using blat.\ Alignments with less than 97.5% base identity within the aligning blocks \ were filtered out. When multiple alignments occurred, only those\ alignments with a percentage identity within 0.2% of the\ best alignment were kept. The following alignments were also discarded: \ ESTs that aligned without any introns, blocks smaller than 10 bases, and \ blocks smaller than 130 bases that were not located next to an intron. \ The orientations of the ESTs and mRNAs were deduced from the GT/AG splice \ sites at the introns; ESTs and mRNAs with overlapping blocks\ on the same strand were merged into clusters. Only the\ extent and orientation of the clusters are shown in this track.
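
The final clustering step can be approximated with a standard interval merge. This sketch is a simplification under stated assumptions: it treats each alignment as a single span instead of comparing individual blocks, it assumes the identity and intron filters above have already been applied, and the tuple layout and function name are hypothetical.

```python
def cluster_alignments(alignments):
    """Merge overlapping same-strand alignments into clusters.

    alignments -- iterable of (chrom, strand, start, end) tuples,
                  0-based half-open; each alignment is treated as one
                  span (a simplification of block-level overlap)
    Returns a list of (chrom, strand, start, end, count) tuples, where
    count is the number of alignments merged into the cluster.
    """
    clusters = []
    # Sorting groups alignments by chromosome and strand, then by start.
    for chrom, strand, start, end in sorted(alignments):
        last = clusters[-1] if clusters else None
        if last and last[0] == chrom and last[1] == strand and start < last[3]:
            last[3] = max(last[3], end)   # extend the current cluster
            last[4] += 1
        else:
            clusters.append([chrom, strand, start, end, 1])
    return [tuple(c) for c in clusters]

# Hypothetical alignments, for illustration only.
bounds = cluster_alignments([
    ("chr1", "+", 100, 500),
    ("chr1", "+", 400, 900),
    ("chr1", "-", 450, 800),
    ("chr1", "+", 1000, 1200),
])
```

Only the extent, strand and member count of each cluster survive, which matches the track's statement that only the extent and orientation of clusters are shown.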

\

\ Scores for individual gene boundaries were assigned based on the number of \ cDNA alignments used:\

\ \

Credits

\

\ This track, which was originally developed by Jim Kent,\ was generated at UCSC and uses data submitted to GenBank by \ scientists worldwide.

\ \

References

\

\ Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and\ Wheeler, D.L.\ GenBank: update. Nucleic Acids Res. 32,\ D23-6 (2004).

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ rna 1 genieBounds Clone Bounds bed 9 . Clone Boundaries from EST Mate Pairs 0 72 178 34 34 216 144 144 0 0 0

Description & Credits

\ \

These clone bounds are based on EST mate pairs from \ Affymetrix's \ Genie gene finding software. \

\ rna 1 exonWalk ExonWalk genePred ExonWalk Alt-Splicing Transcripts 0 72 23 58 58 139 156 156 0 0 0

Description

\ \

The ExonWalk program merges cDNA evidence together to predict full\ length isoforms, including alternative transcripts. To predict\ transcripts that are biologically functional, rather than the result\ of technical or biological noise, ExonWalk requires that every intron\ and exon meet at least one of three conditions: 1) it is present in cDNA libraries of another organism\ (i.e. also present in mouse), 2) it is supported by three separate cDNA GenBank\ entries, or 3) it is evolving like a coding exon as\ determined by Exoniphy.\ Once the transcripts are predicted, an ORF finder (BESTORF from\ Softberry) is used to find the\ best open reading frame. By default, transcripts that are targets for\ nonsense-mediated decay (NMD) are filtered out, as they are less likely\ to be translated into proteins.\ \

Methods

\ \

The input to the ExonWalk program is the AltSplice track, from which\ exons and introns have been filtered out unless they are: 1) present in cDNA\ libraries of another organism (i.e. also present in mouse), 2) supported by\ three separate cDNA GenBank entries, or 3) evolving\ like a coding exon as determined by Exoniphy.\ \

The ExonWalk algorithm takes these filtered sequences and\ constructs a graph where the exons are the nodes and the introns are\ the edges. The goal of the program is to produce all full-length\ transcripts implied by the input transcripts. Full-length transcripts are\ defined as transcripts that are not a subsequence of another\ transcript. The algorithm can be divided into three\ steps, as illustrated in Figure 1 below:\ \

    \
  1. Detection and connection of compatible transcripts (Figure 1B).
  2. Merging of vertices that are identical in terms of splicing (Figure 1C).
  3. Exploration of all paths in the resulting graph (Figure 1D).
\ \ \ \
\ \
\ Different stages of the ExonWalk Program. A. Different\ transcripts for a particular gene have been aligned to the genome to\ give an order and orientation. B. Exons in the overlapping\ section of compatible transcripts are joined to form new\ edges. C. Vertices which are redundant are pruned from the\ graph, being replaced by edges from other, equivalent, vertices. This\ simplifies the initial graph and yet retains splicing specific\ information. D. The maximal paths through the graph are\ explored to produce a set of maximal (full length) transcripts.\
\ \

Initially, each transcript is an independent sub-graph in the\ exon graph. Individual transcripts are then compared pairwise to\ determine if they are compatible. If they are compatible, an edge is\ created between exons of the overlap, called a compatibility edge.\ This results in a directed graph where overlapping exons are connected\ together, and thus compatible transcripts have been connected as well\ (Figure 1B). The algorithm then makes use of the\ implicit order provided by the genome sequence and the fact that\ splicing occurs in order to explore all of the paths present in the\ graph.\ \
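The path-exploration stage can be illustrated with a small sketch. It is a simplification rather than the ExonWalk implementation: exons with identical coordinates are treated as one node, which stands in for the compatibility-edge and vertex-merging steps, and the function name `maximal_paths` and the input layout are hypothetical. Each transcript is assumed to list its exons in genomic order, so the graph has no cycles.

```python
def maximal_paths(transcripts):
    """Enumerate maximal exon paths implied by a set of transcripts.

    transcripts: list of exon lists; each exon is a (start, end) tuple
    and exons appear in genomic order (assumed input layout).
    """
    successors = {}          # exon -> set of exons reachable by one intron
    has_predecessor = set()  # exons that some intron leads into
    for exons in transcripts:
        for exon in exons:
            successors.setdefault(exon, set())
        for upstream, downstream in zip(exons, exons[1:]):
            successors[upstream].add(downstream)
            has_predecessor.add(downstream)

    paths = []

    def walk(exon, path):
        path = path + [exon]
        if not successors[exon]:
            # No outgoing intron: the path cannot be extended further.
            paths.append(path)
            return
        for nxt in sorted(successors[exon]):
            walk(nxt, path)

    # Start only from exons with no incoming intron, so every reported
    # path is maximal at both ends.
    for first in sorted(e for e in successors if e not in has_predecessor):
        walk(first, [])
    return paths


A, B, C, D, E = (100, 200), (300, 400), (320, 400), (500, 600), (700, 800)
isoforms = maximal_paths([[A, B, D], [A, C, D], [B, D, E]])
```

With these three input transcripts the sketch reports A-B-D-E and A-C-D-E. The second path does not appear in any single input, which is the sense in which walking the graph yields transcripts implied by the evidence.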

Comments/Questions? Email sugnet@soe.ucsc.edu\ genes 1 exonWalk2 ExonWalk2 genePred ExonWalk Alt-Splicing Transcripts - take 2 0 72.01 23 58 58 139 156 156 0 0 0 genes 1 exonWalkRna ExonWalkRna genePred ExonWalk Alt-Splicing Transcripts mRNA only, no orthology 0 72.02 23 58 58 139 156 156 0 0 0 genes 1 exonWalkRnaNoCds ExonWalkRnaNoCds bed 12 Exonwalk on Rna only, no orthology, no CDS mapping 0 72.03 23 58 58 139 156 156 0 0 0 genes 1 agxMapped agxMapped bed 12 . Condensed version of AltGraphX Mapped from Mouse 0 72.1 153 26 42 204 140 148 0 0 0 rna 1 orthoIntrons orthoIntrons bed 12 . Bed version of AltGraphX Mapped Inrons from Mouse 0 72.2 107 74 34 181 164 144 0 0 0 rna 1 altGraph AltGraph psl . AltGraph 0 73 0 0 0 127 127 127 0 0 0 rna 1 allenBrainAli Allen Brain psl . Allen Brain Atlas Probes 0 80 50 0 100 152 127 177 0 0 0

Description

\

\ This track provides a link into the \ Allen Brain Atlas (ABA)\ images for this probe. The ABA is an extensive\ database of high resolution in-situ hybridization images of adult\ male mouse brains covering the majority of genes.

\ \

Methods

\

\ The ABA created a platform for high-throughput in situ hybridization \ (ISH) that allows a highly systematic approach to analyzing gene expression in \ the brain. ISH is a technique that allows the cellular localization of mRNA \ transcripts for specific genes. Labeled antisense probes, specific to a \ particular gene, are hybridized to cellular (sense) transcripts and subsequent \ detection of the bound probe produces specific labeling in those cells \ expressing the particular gene. This method involves tagged nucleotides \ detected by colorimetric methods.

\ \

The platform used for the ABA utilizes this non-isotopic approach, with \ digoxigenin-labeled nucleotides incorporated into a riboprobe produced by in\ vitro transcription. This method produces a label that fills the cell body,\ in contrast to autoradiography that produces scattered silver grains surrounding\ each labeled cell. To enhance the ability to detect low level expression, the \ ABA has incorporated a tyramide signal amplification step into the protocol that\ greatly increases sensitivity. The specific methodology is described in detail \ within the ABA Data Production Processes document.

\ \

Credits

\

\ Thanks to the Allen \ Institute for Brain Science in general, and Susan \ Sunkin in particular, for coordinating with UCSC on this annotation.

\ \ regulation 1 rosetta Rosetta bed 15 + Rosetta Experimental Confirmation of Chr22 Exons 0 88 0 0 0 127 127 127 0 0 1 chr22,

Description

\

Expression data from Rosetta Inpharmatics.\ See the paper "Experimental Annotation of the Human Genome Using Microarray Technology"\ Nature Feb. 2001, vol 409 pp 922-7 for more\ information. Briefly, Rosetta created DNA probes for each exon as\ described by the Sanger center for the October 2000 draft of the\ genome and used them to explore expression levels over 69 different\ experiments. As in the original paper, exons are labeled according to\ contig name, relative position in the contig, and whether they were\ predicted (pe) or confirmed (true->te) exons at the time of\ publication. For example, AC000097_256_te is the 256th exon on\ AC000097 predicted by Genscan which was confirmed\ independently. Hybridization names refer to the sources of the two\ mRNA populations used for the experiment.\ Please note: in the browser window the hybridization names\ are too long to fit and have been abbreviated. Also, the ratios\ were inverted as of Feb 12, 2002 to conform with standard microarray\ conventions of having the experimental sample in the red (cy5) channel\ and the reference sample in the green (cy3) channel.\ \
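The exon naming convention described above can be unpacked with a short helper. This is only a sketch: the function name `parse_exon_label` and the returned dictionary keys are made up for illustration, and the label is assumed to always end in `_<position>_<pe|te>`.

```python
def parse_exon_label(label):
    """Split a label such as 'AC000097_256_te' into its three parts."""
    # Split from the right so that a contig name containing an
    # underscore would still be kept whole.
    contig, position, status = label.rsplit("_", 2)
    if status not in ("pe", "te"):
        raise ValueError(f"unexpected exon status: {status!r}")
    return {
        "contig": contig,
        "position": int(position),    # relative position within the contig
        "confirmed": status == "te",  # te = confirmed, pe = predicted only
    }


info = parse_exon_label("AC000097_256_te")
```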

Display Options

\ The track can be configured with a few different options:\ \
Reference Sample: This option is only valid when the track is displayed in\ full. It determines how the 69 different experiments are displayed. The\ options are:\ \ \ Exons Shown: Probes on the microarrays correspond to gene\ predictions on chromosome 22, some of which were confirmed by known\ genes, while others remain predictions. This option determines whether data are\ shown for probes corresponding to confirmed, predicted, or all exons.\ \
Color Scheme: Data are presented using two color false\ display. By default the Brown/Botstein colors of red -> positive log\ ratio, green -> negative log ratio are used. However, blue can be\ substituted for green for those who are color blind. Gray values\ indicate missing data. Please note that due to technical limitations\ the details page will have many more color shades possible than those used\ on the browser image and thus may not match exactly.\ \

Details Page

\ On the details page, the probes presented correspond to those contained\ in the window range seen on the Genome Browser; the exon probe selected is highlighted\ in blue. The detail display table is actually an average of many data\ points. It is possible to see the full data for each experiment\ graphically by selecting the check-boxes for the experiments of interest\ and clicking the submit value button.\ regulation 1 cpgIslandExt CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 0 90 0 100 0 128 228 128 0 0 0

Description

\

\ CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites, and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.\

\ \

Methods

\

\ CpG islands were predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment was then\ evaluated for the following criteria:\

\

\ The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length. The ratio of observed to expected \ CpG is calculated according to the formula cited in \ Gardiner-Garden et al. (1987) in the References section below: \

\
    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)\
\ where N = length of sequence.\
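As a worked illustration of the three quantities defined above, the following sketch computes them for a single sequence. The function name `cpg_stats` and the returned keys are hypothetical, and the input is assumed to be a plain string of bases.

```python
def cpg_stats(seq):
    """Return CpG count, Percentage CpG, and Obs/Exp CpG for a sequence."""
    seq = seq.upper()
    n = len(seq)                 # N = length of sequence
    cpg = seq.count("CG")        # number of CG dinucleotides
    c = seq.count("C")
    g = seq.count("G")
    # Percentage CpG: CpG nucleotide bases (twice the CpG count) over length.
    pct_cpg = 100.0 * 2 * cpg / n if n else 0.0
    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"cpg_count": cpg, "pct_cpg": pct_cpg, "obs_exp": obs_exp}


stats = cpg_stats("ACGCGTTCGA")
```

For the 10-base example there are 3 CG dinucleotides, 3 Cs and 3 Gs, so Percentage CpG is 60.0 and Obs/Exp CpG is 3 * 10 / (3 * 3), roughly 3.33.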

\ \

Credits

\

\ This track was generated using a\ modification of a program developed by G. Miklem and L. Hillier.

\ \

References

\

\ Gardiner-Garden M, Frommer M. \ CpG islands in vertebrate genomes.\ J. Mol. Biol. 1987 Jul 20;196(2):261-282.

\ regulation 1 cpgIsland CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 0 90 0 100 0 128 228 128 0 0 0

Description

\

\ CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites, and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.\

\ \

Methods

\

\ CpG islands are predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment is then\ evaluated for the following criteria:\

\

\ The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length.
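The dinucleotide scoring described in the Methods above (+17 for CG, -1 otherwise) can be sketched with a running-sum scan. This simplified version reports only the single highest-scoring segment, whereas the actual program identifies every maximally scoring segment and then applies further criteria; the function name `best_scoring_segment` is an assumption.

```python
def best_scoring_segment(seq, cg_score=17, other_score=-1):
    """Return (score, start, end) of the best-scoring segment.

    Coordinates are zero-based and half-open, so seq[start:end] is
    the segment itself.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    run_score, run_start = 0, 0
    for i in range(len(seq) - 1):
        # Score the dinucleotide that starts at position i.
        run_score += cg_score if seq[i:i + 2] == "CG" else other_score
        if run_score <= 0:
            # A non-positive running score cannot help a later segment,
            # so restart just after this position.
            run_score, run_start = 0, i + 1
        elif run_score > best[0]:
            best = (run_score, run_start, i + 2)
    return best


score, start, end = best_scoring_segment("TTTTCGCGCGTTTT")
```

On this toy sequence the scan settles on the CGCGCG core: three CG dinucleotides at +17 each and two GC dinucleotides at -1 each give a score of 49.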

\ \

Credits

\

\ This track was generated using a modification of a program developed by \ G. Miklem and L. Hillier.

\ \ regulation 1 cpgIslandGgfAndyMasked CpG Islands (AL) bed 4 + CpG Islands - Andy Law, masked sequence (Islands < 300 Bases are Light Green) 0 90.001 0 100 0 128 228 128 0 0 0

Description

\

\ CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites, and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.\

\

\ The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length.\

\ \

Methods

\

\ The genome sequence was masked using the output of RepeatMasker and\ the Tandem Repeats Finder (period ≤ 12). A sliding-window search\ was performed on the set of CpG locations in the masked genome\ sequence to find the longest spans that met the criteria given in\ Gardiner-Garden, M. and Frommer, M. (1987) in the References section\ below:\

\ The ratio of observed to expected CpGs is calculated as follows:\
\
\
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)\
\

\ \

Credits

\

\ This track was generated using a program written by Andy Law (Roslin \ Institute) with minor modifications by Angie Hinrichs (UCSC).

\ \

References

\

\ Gardiner-Garden, M., Frommer, M.\ CpG islands in vertebrate genomes.\ J. Mol. Biol. 196(2), 261-282 (1987).

\ \ regulation 1 cpgIslandGgfAndy CpG Islands (AL) bed 4 + CpG Islands - Andy Law (Islands < 300 Bases are Light Green) 0 90.01 0 100 0 128 228 128 0 0 0

Description

\

\ CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites, and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.\

\

\ The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length.\

\ \

Methods

\

\ A sliding-window search was performed on the set of CpG locations in \ the genome to find the longest spans that met the criteria given in \ Gardiner-Garden, M. and Frommer, M. (1987) in the References section below:\

\ The ratio of observed to expected CpGs is calculated as follows:\
\
\
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)\
\

\ \

Credits

\

\ This track was generated using a program written by Andy Law (Roslin \ Institute) with minor modifications by Angie Hinrichs (UCSC).

\ \

References

\

\ Gardiner-Garden, M., Frommer, M.\ CpG islands in vertebrate genomes.\ J. Mol. Biol. 196(2), 261-282 (1987).

\ \ regulation 1 firstEF FirstEF bed 6 . FirstEF: First-Exon and Promoter Prediction 0 90.1 0 0 0 127 127 127 1 0 0 http://rulai.cshl.org/tools/FirstEF/Readme/README.html

Description

\ \

This track shows predictions from\ the FirstEF\ (First Exon Finder) program.

\ \

Three types of predictions are displayed: exon, promoter and CpG window. \ If two consecutive predictions are separated by less than 1000 bp, \ FirstEF treats them as one cluster of alternative first exons that may \ belong to the same gene. The cluster number is displayed in parentheses \ in each item name. For example, "exon(405-)" \ represents the exon prediction in cluster number 405 on the minus strand. \ The exon, promoter and CpG-window are interconnected by this cluster number. \ Alternative predictions within the same cluster are denoted by "#N" \ where "N" is the serial number of an alternative prediction in the \ cluster.
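The clustering rule above can be sketched as a single pass over sorted predictions. This is an illustration under assumptions, not FirstEF's own code: predictions are taken to be (start, end) pairs on one strand, the separation is measured from the end of one prediction to the start of the next, and the name `cluster_predictions` is hypothetical.

```python
def cluster_predictions(predictions, max_gap=1000):
    """Group consecutive predictions separated by less than max_gap bp."""
    clusters = []
    for start, end in sorted(predictions):
        # Compare against the end of the last prediction in the
        # current cluster.
        if clusters and start - clusters[-1][-1][1] < max_gap:
            clusters[-1].append((start, end))
        else:
            clusters.append([(start, end)])
    return clusters


groups = cluster_predictions([(5000, 5200), (1000, 1200), (1500, 1700)])
```

The first two predictions are 300 bp apart and share a cluster; the third starts 3300 bp after the second and opens a new one.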

\ \

Each predicted exon is either CpG-related or non-CpG-related, based on\ a score of the frequency of CpG dinucleotides.\ An exon is classified as CpG-related if the CpG score is greater \ than a threshold value, and non-CpG-related if less than the threshold. If an \ exon is CpG-related, \ its associated CpG-window is displayed. The browser displays features with higher\ scores in darker shades of gray/black.

\ \

Method

\ \

FirstEF is a 5' terminal exon and promoter\ prediction program. It consists of different discriminant functions structured\ as a decision tree. The probabilistic models are optimized to find potential\ first donor sites and CpG-related and non-CpG-related promoter regions based on\ discriminant analysis. For every potential first donor site (GT) and an upstream\ promoter region, FirstEF decides whether or not the intermediate region can be\ a potential first exon, based on a set of quadratic discriminant functions.\ FirstEF calculates the a posteriori probabilities of exon, donor, and\ promoter for a given GT and an upstream window of length 570 bp.

\ \

For a description of the FirstEF program and the underlying classification \ models, refer to Davuluri et al., 2001. \ \

Credits

\ \

The predictions for this track are produced by Ramana V.\ Davuluri of Ohio State University and Ivo Grosse and\ Michael Q. Zhang of Cold Spring Harbor Lab.\ \

References

\

\ Davuluri RV, Grosse I, Zhang MQ.\ Computational identification of promoters and first exons in the \ human genome. \ Nat Genet. 2001 Dec;29(4):412-7.

\ regulation 1 scoreMax 1000\ scoreMin 500\ softPromoter TSSW Promoters bed 5 + TSSW Promoter Predictions 0 90.2 0 100 0 127 177 127 0 0 0 regulation 1 transfacHit Transfac Hits bed 6 . Transfac Transcription Factor Binding Sites Near Transcription Start 0 91 0 0 0 127 127 127 1 0 0 regulation 1 eponine Eponine TSS bed 4 + Eponine Predicted Transcription Start Sites 0 91.9 0 100 100 127 177 177 0 0 0

Description

\

\ The Eponine program provides a probabilistic method for detecting \ transcription start sites (TSS) in mammalian genomic sequence, with \ good specificity and excellent positional accuracy.

\ \

Methods

\

\ Eponine models consist of a set of DNA weight matrices recognizing\ specific sequence motifs. Each of these is associated with a position\ distribution relative to the TSS.

\ \

\ Eponine has been tested by comparing the output with annotated mRNAs\ from human chromosome 22. From this work, we estimate that using the\ default threshold (0.999) it detects >50% of transcription start\ sites with approximately 70% specificity. However, it does not always\ predict the direction of transcription correctly—an effect that\ seems to be common among computational TSS finders.

\ \

Credits

\

\ Thanks to Thomas Down at the \ Sanger Institute \ for providing the \ Eponine program (version 2, March 6, 2002) which was run \ at UCSC to produce this track.

\ \

References

\

\ Down TA, Hubbard TJP. \ \ Computational detection and location of transcription start sites \ in mammalian genomic DNA. \ Genome Res. 2002 Mar;12(3):458-61.

\ regulation 1 nibbImageProbes NIBB Frog Images psl xeno Xenopus Laevis In Situ mRNA Probes from NIBB 0 92 50 0 100 152 127 177 0 0 0 regulation 1 triangleSelf Golden Triangle bed 6 . Golden Triangle Possible Transcription Factor Binding Sites 0 92 0 0 0 127 127 127 1 0 0 regulation 1 esRegGeneToMotif Regulatory Module bed 6 + Eran Segal Regulatory Module 1 93 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows predicted transcription factor binding sites \ based on sequence similarities upstream of coordinately expressed genes.\

\ In dense display mode the gold areas indicate the extent of the area\ searched for binding sites; black boxes indicate the actual\ binding sites. In other modes the gold areas disappear and only\ the binding sites are displayed. Clicking on a particular predicted binding \ site displays a page that shows the sequence motif associated with the \ predicted transcription factor and the sequence at the predicted binding site.\ Where known motifs have been identified by this method, they are named;\ otherwise, they are assigned a motif number.\ \

Methods

\

\ This analysis was performed according to \ Genome-wide discovery of transcriptional modules from DNA \ sequence and gene expression on various pre-existing microarray datasets.\ A regulatory module is comprised of a set of genes predicted to be regulated \ by the same combination of DNA sequence motifs. The predictions are based on \ the co-expression of the set of genes in the module and on the appearance of\ common combinations of motifs in the upstream regions of genes assigned to\ the same module. \ \

Credits

\

\ Thanks to Eran Segal for providing the data analysis that forms the \ basis for this track. The display was programmed by \ Jim Kent.\ regulation 1 exonArrows off\ triangle Golden Extra bed 6 . Golden Triangle Motif Matching Sites Near Transcription Start 0 94 0 0 0 127 127 127 1 0 0 regulation 1 transfac Transfac Hits genePred refPep refMrna Transfac Hits 0 95 12 12 120 133 133 187 0 0 0 regulation 1 transfacRatios Transfac Ratios bed 6 . Transfac Likelihood Ratios 0 96 12 12 120 133 133 187 0 0 0 regulation 1 psuReg Known Regulatory bed 4 . Functional Regulatory Elements Compiled by Penn State 0 97 30 130 210 142 192 232 0 0 0

Regulatory Elements


\
\ This list of functional regions contains names and coordinates of the regulatory regions relative to the December version of the Human Genome Browser. \
\ Note these regions have not been trimmed to show the smallest possible functional element with maximum activity. They range in size from 300-4000 bp. \
\

\

\ Details on source of Regulatory Region data
\ \

\ \ Please direct comments or questions to Laura Elnitski at \ elnitski@bio.cse.psu.edu.\ \

\ April 16, 2002\

\ Data made available by Laura Elnitski, Webb Miller, Ross Hardison, Scott Schwartz, Emmanouil Dermitzakis, Andrew Clark, William Krivan and Wyeth Wasserman\ regulation 1 oreganno ORegAnno bed 4 + Regulatory elements from ORegAnno 0 98 102 102 0 178 178 127 0 0 0

Description

\

\ This track displays literature-curated regulatory regions, transcription\ factor binding sites, and regulatory polymorphisms from\ ORegAnno (Open Regulatory Annotation). For more detailed\ information on a particular regulatory element, follow the link to ORegAnno\ from the details page. \ \

\ \

Display Conventions and Configuration

\

\ The display may be filtered to show only selected region types, such as\ regulatory regions, regulatory polymorphisms, or transcription factor\ binding sites. To exclude a region type, check the appropriate box\ in the "Exclude region type" list at the top of the Track Settings\ page. \

\ \

Methods

\

\ An ORegAnno record describes an experimentally proven and published regulatory\ region (promoter, enhancer, etc.), transcription factor binding site, or\ regulatory polymorphism. Each annotation must have the following attributes:\

    \
  • A stable ORegAnno identifier.\
  • A valid taxonomy ID from the NCBI taxonomy database.\
  • A valid PubMed reference. \
  • A target gene that is either user-defined, in Entrez Gene or in EnsEMBL.\
  • A sequence with at least 40 flanking bases (preferably more) to allow the\ site to be mapped to any release of an associated genome.\
  • At least one piece of specific experimental evidence, including the\ biological technique used to discover the regulatory sequence. (Currently\ only the evidence subtypes are supplied with the UCSC track.)\
  • A positive, neutral or negative outcome based on the experimental results\ from the primary reference. (Only records with a positive outcome are currently\ included in the UCSC track.)\
\ The following attributes are optionally included:\
    \
  • A transcription factor that is either user-defined, in Entrez Gene\ or in EnsEMBL.\
  • A specific cell type for each piece of experimental evidence, using the\ eVOC cell type ontology.\
  • A specific dataset identifier (e.g. the REDfly dataset) that allows\ external curators to manage particular annotation sets using ORegAnno's\ curation tools.\
  • A "search space" sequence that specifies the region that was\ assayed, not just the regulatory sequence. \
  • A dbSNP identifier and type of variant (germline, somatic or artificial)\ for regulatory polymorphisms.\
\ Mapping to genome coordinates is performed periodically to current genome\ builds by BLAST sequence alignment. \ The information provided in this track represents an abbreviated summary of the \ details for each ORegAnno record. Please visit the official ORegAnno entry \ (by clicking on the ORegAnno link on the details page of a specific regulatory\ element) for complete details such as evidence descriptions, comments,\ validation score history, etc.\

\ \

Credits

\

\ ORegAnno core team and principal contacts: Stephen Montgomery, Obi Griffith, \ and Steven Jones from Canada's Michael Smith Genome Sciences Centre, Vancouver, \ British Columbia, Canada.

\

\ The ORegAnno community (please see individual citations for various\ features): ORegAnno Citation.\ \

References

\

\ Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED, \ Prychyna Y, Zhang X, Jones SJ. \ ORegAnno: an open access database and curation system for \ literature-derived promoters, transcription factor binding sites and \ regulatory variation.\ Bioinformatics. 2006 Mar 1;22(5):637-40.\

\ regulation 1 snpMap SNPs bed 4 . Simple Nucleotide Polymorphisms (SNPs) 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track consolidates all the Simple Nucleotide Polymorphisms \ into a single track.

\ \

Filtering

\

\ The SNPs in this track include all known polymorphisms that\ can be mapped against the current assembly. These include known point\ mutations (Single Nucleotide Polymorphisms), insertions, deletions,\ and segmental mutations from the current build of \ dbSnp, \ which is shown in the Genome Browser \ release log.\

\

\ There are three major cases that are not mapped and/or annotated:\

    \
  • \ Submissions that are completely masked as repetitive elements. \ These are dropped from any further computations. This set of\ reference SNPs is found in chromosome "rs_chMasked"\ on the dbSNP ftp\ site.\
  • \ Submissions that are defined in a cDNA context with extensive\ splicing. These SNPs are typically annotated on refSeq mRNAs through a\ separate annotation process. Effort is being made to reverse map these\ variations back to contig coordinates, but that has not been\ implemented. For now, you can find this set of variations in\ "rs_chNotOn" on the dbSNP ftp site.\
  • \ Submissions with excessive hits to the genome. Variations with 3+ hits\ to the genome are not included in the tracks, but are available in\ "rs_chMulti" on the dbSNP ftp site.\
\

\

The heuristics for the non-SNP variations (i.e. named elements and\ STRs) are quite conservative; therefore, some of these are probably lost. This\ approach was chosen to avoid false annotation of variation in\ inappropriate locations.

\ \

Supporting Details

\

\ Positional information can be found in the annotations section\ of the Genome Browser \ downloads page, \ which is organized by species and assembly. Non-positional information\ displayed on this page can be found in the \ shared\ data section of the same page, where it is split into tables by\ organism: \ dbSnpRsHg for Human, \ dbSnpRsMm for Mouse, and \ dbSnpRsRn for Rat.\ \

Credits

\

\ Thanks to NIH's dbSNP for providing the public data, which \ are available from dbSnp at the NCBI.

\ \ varRep 1 snp SNPs bed 6 + Simple Nucleotide Polymorphisms (SNPs) 1 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track consolidates all the Simple Nucleotide Polymorphisms (SNPs) into\ a single track. This represents data from dbSnp and commercially-available \ genotyping arrays.\

\

\ Please be aware that some mapping inconsistencies are known to exist in \ the dbSnp data set. If you encounter information that seems incorrect on \ the details page for a variant, we advise you to verify the record information\ on the dbSnp website using the provided link. In some\ known instances, the size of the variant does not match the size of its \ genomic location; UCSC is working with dbSnp to correct these errors in\ the data set. \

\

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\

\

\ When the start coordinate for a SNP is shown as chromStart = chromEnd+1 on \ the SNP's details page, this is generally not an \ error; rather, it indicates that the variant is an insertion at this genomic\ position. In these instances, the location type will be set to \ "between". Note that insertions are represented as chromStart = \ chromEnd in the snp table accessible from the Table Browser \ or downloads server, due to the half-open zero-based representation of\ data in the underlying database. \
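The coordinate convention above can be made concrete with a small sketch. It assumes that the details page shows a one-based start (chromStart + 1) alongside the unchanged chromEnd, which is consistent with the description but is not spelled out in it; the function name `describe_variant` is hypothetical.

```python
def describe_variant(chrom_start, chrom_end):
    """Relate table coordinates to the coordinates shown for a variant.

    chrom_start, chrom_end: zero-based, half-open values as stored in
    the snp table.
    """
    # Half-open storage gives an insertion zero width: start == end.
    is_insertion = chrom_start == chrom_end
    return {
        "display_start": chrom_start + 1,  # one-based start (assumed display rule)
        "display_end": chrom_end,
        "width": chrom_end - chrom_start,
        "is_insertion": is_insertion,
    }


insertion = describe_variant(100, 100)
single_base = describe_variant(100, 101)
```

For the insertion the displayed start (101) is one more than the displayed end (100), matching the chromStart = chromEnd+1 pattern noted above, while the single-base variant displays as 101 to 101.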

\

\ The colors of variants in the display may be changed to highlight\ their source, molecule type, variant class, validation status, or\ functional classification. Variants can be excluded from the display\ based on these same criteria or if they fall below the\ user-specified minimum \ \ average heterozygosity. The track configuration options are\ located at the top of the SNPs track\ description page. By default variants are colored by functional\ classification, with SNPs likely to cause a phenotype in red\ (non-synonymous and splice site mutations).\

\

\ The configuration categories below reflect the definitions given\ in the document type definition (DTD) that describes the \ dbSnp XML format. \

    \
  • \ \ Source: Origin of this data
    \
      \
    • dbSnp - From the current build of dbSnp\
    • Affymetrix Genotyping Array 10K - SNPs on the commercial array\
    • Affymetrix Genotyping Array 10K v2 - SNPs on the commercial array\
    • Affymetrix Genotyping Array 50K HindIII - SNPs on the commercial array\
    • Affymetrix Genotyping Array 50K XbaI - SNPs on the commercial array\
    \
  • \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Unknown - sample type not known\
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Mitochondrial - variant discovered using a mitochondrial template\
    • Chloroplast - variant discovered using a chloroplast template\
    \
  • \
  • \ \ Variant Class: Variant classification
    \
      \
    • Unknown - no classification provided by data contributor\
    • Single Nucleotide Polymorphism - single nucleotide \ \ variation: alleles of length = 1 and from set of {A,T,C,G}\
    • Insertion/deletion - insertion/deletion variation: alleles \ \ of different length or include '-' character\
    • Heterozygous - heterozygous (undetermined) variation: \ \ allele contains string '(heterozygous)'\
    • Microsatellite - microsatellite variation: allele string \ \ contains numbers and '(motif)' pattern\
    • Named - insertion/deletion of named object (length unknown)\
    • No Variation - no variation asserted for sequence\
    • Mixed - mixed class\
    • Multiple Nucleotide Polymorphism - alleles of the same \ \ length, length > 1, and from set of {A,T,C,G}\
    \
  • \
  • \ \ Validation Status: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • Unknown - no validation has been reported for this refSNP\
    • Other Population - at least one ss in cluster was validated\ \ by independent assay\
    • By Frequency - at least one subsnp in cluster has frequency\ \ data submitted\
    • By Cluster - cluster has 2+ submissions, with 1+ \ \ submissions assayed with a non-computational method\
    • By 2 Hit/2 Allele - all alleles have been observed in 2+ \ \ chromosomes\
    • By HapMap - validated by HapMap project\
    • By Genotype - at least one genotype reported for this refSNP\
    \
  • \
  • \ \ Function: Predicted functional role (each \ \ variant may have more than one functional role)
    \
      \
    • Unknown - no known functional classification\
    • Locus Region - variation in region of gene, but not in \ \ transcript\
    • Coding - variation in coding region of gene, assigned if \ \ allele-specific class unknown\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to contig seq\
    • Coding - Non-Synonymous - change in peptide with respect to\ \ contig sequence\
    • mRNA/UTR - variation in transcript, but not in coding \ \ region interval\
    • Intron - variation in intron, but not in first two or last \ \ two bases of intron\
    • Splice Site - variation in first two or last two bases of \ \ intron\
    • Reference - allele observed in reference contig sequence\
    • Exception - variation in coding region with exception \ \ raised on alignment. This occurs when protein with gap in sequence is \ \ aligned back to contig sequence. Variations that are on the 3' side \ \ of the gap have undefined functional inference.\
    \
  • \
  • \ \ Location Type: Describes how a segment of the reference assembly \ \ must be altered to represent the variant SNP allele
    \
      \
    • Unknown - undefined or error\
    • Range - a range of two or more bases in the reference \ \ assembly must be altered. This occurs, for example, when the variant\ \ allele is a deletion of two or more bases relative to the allele \ \ represented by the reference assembly.\
    • Exact - one base in the reference assembly must be altered.\ \ This occurs when the variant allele is a single-base substitution\ \ relative to the reference genome or when the variant allele is a \ \ deletion of a single base.\
    • Between - no reference assembly bases must be altered.\ \ This occurs when the variant allele is an insertion of one or more\ \ bases relative to the allele represented by the reference assembly.\
    \
  • \
\
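The allele-based rules in the Variant Class list above (single nucleotide, insertion/deletion, multiple nucleotide) can be expressed as a small classifier. This is a minimal sketch, not dbSnp's own logic: the function name is hypothetical, the observed alleles are assumed to be given as a slash-separated string such as "A/G", and every other class is simply reported as "unknown".

```python
def classify_observed(observed: str) -> str:
    """Classify a slash-separated allele string using three of the listed rules."""
    alleles = observed.upper().split("/")

    def is_bases(allele: str) -> bool:
        # Non-empty and drawn only from the set {A, C, G, T}.
        return len(allele) > 0 and set(allele) <= set("ACGT")

    # Anything other than plain bases or '-' is outside this sketch.
    if len(alleles) < 2 or not all(a == "-" or is_bases(a) for a in alleles):
        return "unknown"

    lengths = {len(a) for a in alleles if a != "-"}
    if "-" in alleles or len(lengths) > 1:
        # Alleles of different length, or a '-' character.
        return "insertion/deletion"
    if lengths == {1}:
        return "single nucleotide polymorphism"
    # Same length, length > 1.
    return "multiple nucleotide polymorphism"


for example in ("A/G", "-/A", "AT/GC", "(heterozygous)"):
    print(example, "->", classify_observed(example))
```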

\ \ \

Large Scale SNP Annotation at UCSF

\

\ LS-SNP is a database of functional and structural SNP annotations\ with links to protein structure models. Annotations are based on a\ variety of features extracted from protein structure, sequence, and\ evolution. Currently only coding non-synonymous SNPs are included.\ LS-SNP at UCSF.\

\ \

Data Filtering

\

\ The SNPs in this track include all known polymorphisms available in the\ current build of dbSnp that can be mapped against the current assembly. \ The version of dbSnp from which these data were obtained can be found in the\ SNP track entry in the Genome Browser \ release log.\

\

\ There are two reasons that some variants may not be mapped and/or\ annotated in this track:\

    \
  • \ Submissions are completely masked as repetitive elements.\ These are dropped from any further computations. This set of\ reference SNPs is found in chromosome "rs_chMasked" on\ the dbSNP\ ftp site.\
  • \
  • \ Submissions are defined in a cDNA context with extensive\ splicing. These SNPs are typically annotated on refSeq mRNAs\ through a separate annotation process. Effort is being made to\ reverse map these variations back to contig coordinates, but\ that has not been implemented. For now, you can find this set of\ variations in "rs_chNotOn" on the dbSNP ftp\ site. \
  • \
\

\

\ The heuristics for the non-SNP variations (i.e. named elements and\ short tandem repeats (STRs)) are quite conservative; therefore, some of \ these are probably lost. This approach was chosen to avoid false \ annotation of variation in inappropriate locations.\

\ \

Credits and Data Use Restrictions

\

\ Thanks to the SNP\ Consortium and NIH for providing the public data, which are\ available from dbSnp at NCBI.\

\

\ Thanks to Affymetrix, Inc. \ for developing the genotyping arrays. Please see the \ Terms and Conditions page on the Affymetrix\ website for restrictions on the use of their data.\ For more details on the Affymetrix genotyping assay, see the supplemental \ information on the \ Affymetrix 10K SNP and \ Affymetrix Genotyping Array products. Additional \ information, including genotyping data, is available on those pages.\

\

\ Karchin, R., Diekhans, M., Kelly, L., Thomas, D.J., Pieper, U., Eswar, N.,\ Haussler, D. and Sali, A.\ LS-SNP: large-scale annotation of coding non-synonymous SNPs based on \ multiple information sources. \ Bioinformatics 21:2814-2820; April 12, 2005.\

\ varRep 1 snp125 SNPs bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 125) 1 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track contains\ dbSNP\ build 125, available from\ ftp.ncbi.nih.gov/snp.\

\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\

\

\ The configuration categories reflect the following definitions (not all categories apply\ to this assembly):\

    \ \
  • \ \ Location Type: Describes the alignment of the flanking sequence
    \
      \
    • Range - the flank alignments leave a gap of 2 or more bases in the reference assembly\
    • Exact - the flank alignments leave exactly one base between them\
    • Between - the flank alignments are contiguous; the variation is an insertion\
    • RangeInsertion - the flank alignments surround a distinct polymorphism between\ \ the submitted sequence and reference assembly; \ \ \ \ \ the submitted sequence is shorter\
    • RangeSubstitution - the flank alignments surround a distinct polymorphism between\ \ the submitted sequence and reference assembly;\ \ \ \ \ the submitted sequence and the reference assembly sequence are of equal length\
    • RangeDeletion - the flank alignments surround a distinct polymorphism between\ \ the submitted sequence and reference assembly;\ \ \ \ \ the submitted sequence is longer\
    \
  • \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion (applies to RangeInsertion, RangeSubstitution, RangeDeletion)\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name\
    • No Variation - no variation asserted for sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G}\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - validated by HapMap project\
    • Unknown - no validation has been reported for this variant\
    \
  • \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation within 2000 bases of gene, but not in transcript\
    • Coding - Synonymous - no change in peptide for allele with respect to reference assembly\
    • Coding - Non-Synonymous - change in peptide for allele with respect to reference assembly\
    • Untranslated - variation in transcript, but not in coding region interval\
    • Intron - variation in intron, but not in first two or last two bases of intron\
    • Splice Site - variation in first two or last two bases of intron\
    • Reference - allele observed in a coding region of the reference sequence\
    • Unknown - no known functional classification\
    \
  • \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.\
    \
  • \
  • \ \ Weight: Alignment count
    \
      \
    • Weight can be 1, 2, 3 or 10. \
    • Weight = 10 is excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 3.\
    • Alignments to chrN_random are not included.\
    \
  • \
\
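The 0.5 bound on average heterozygosity for bi-allelic variants follows from the standard expected-heterozygosity formula, H = 1 - sum(p_i^2). The sketch below assumes that textbook formula; dbSNP's actual estimator (described at the link above) may differ in detail, and the function name is hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Return 1 - sum(p_i^2) for allele frequencies that sum to 1."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# With two alleles, H = 2p(1 - p), which peaks at p = 0.5.
grid = [i / 100 for i in range(101)]
max_h = max(expected_heterozygosity([p, 1.0 - p]) for p in grid)
print(max_h)
```

With more than two alleles the bound no longer holds (four equally frequent alleles give 0.75), which is why the statement is restricted to bi-allelic substitutions.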

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. This has been split into the 'insertion' and 'deletion' categories, based on location type.\ The location types 'range' and 'exact' are deletions relative to the reference assembly.\ The location type 'between' indicates insertions relative to the reference assembly.\ For the new location types, the class 'in-del' is preserved.\ \
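The splitting rule described above can be sketched as a small function. The function name and the lowercase string values are assumptions made for illustration; only the mapping itself comes from the text.

```python
def split_indel(snp_class: str, loc_type: str) -> str:
    """Refine dbSNP's 'in-del' class using the location type."""
    if snp_class != "in-del":
        return snp_class  # other classes pass through unchanged
    if loc_type in ("range", "exact"):
        return "deletion"  # deletion relative to the reference assembly
    if loc_type == "between":
        return "insertion"  # insertion relative to the reference assembly
    # The newer location types (e.g. rangeInsertion) keep 'in-del'.
    return "in-del"


print(split_indel("in-del", "between"))
```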

UCSC Annotations

\

\ In addition to presenting the dbSNP data, the following annotations are provided:\

    \
  • The size of the dbSNP reference allele is checked to see if it matches the coordinate\ span; exceptions are noted.
  • \
  • The dbSNP reference allele is compared to the UCSC reference allele, and a note is made\ if the dbSNP reference allele is the reverse complement of the UCSC reference allele.
  • \
  • Single-base substitutions are noted where the alignments of the\ flanking sequences are adjacent or have a gap of more than one base.
  • \
  • A note is made if the observed alleles are not available from the rs_fasta files.
  • \
  • Observed alleles with an unexpected format are noted.
  • \
  • The length of the observed alleles is checked for consistency with location types;\ exceptions are noted.
  • \
  • Single-base substitutions are checked to see that one of the observed alleles matches\ the reference allele; exceptions are noted.
  • \
  • Simple deletions are checked to see that the observed allele matches the reference allele;\ exceptions are noted.
  • \
  • Tri-allelic and quad-allelic single-base substitutions are noted.
  • \
  • Variants that have multiple mappings are noted.
  • \
\
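Two of the checks listed above, the reverse-complement comparison and the allele-size check, can be sketched as follows. This is not the UCSC pipeline code: the function names are hypothetical, and the sketch assumes that a "-" reference allele stands for an empty (zero-length) allele.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def compare_ref_alleles(dbsnp_allele: str, ucsc_allele: str) -> str:
    """Compare the dbSNP reference allele with the UCSC reference allele."""
    a, b = dbsnp_allele.upper(), ucsc_allele.upper()
    if a == b:
        return "match"
    if reverse_complement(a) == b:
        return "reverse complement"
    return "mismatch"


def size_matches_span(ref_allele: str, chrom_start: int, chrom_end: int) -> bool:
    """Check the allele length against the 0-based half-open coordinate span."""
    length = 0 if ref_allele == "-" else len(ref_allele)  # assumption: '-' is empty
    return length == chrom_end - chrom_start


print(compare_ref_alleles("AG", "CT"))
print(size_matches_span("AG", 100, 102))
```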

\ \

Data Sources

\

\

    \
  • Coordinates, orientation, location type and dbSNP reference allele \ data were obtained from b125_SNPContigLoc.bcp.gz. \
  • b125_SNPMapInfo.bcp.gz provided the alignment weights; alignments with \ weight = 10 were filtered out.\
  • Functional classification information was obtained from b125_SNPContigLocusId.bcp.gz.\
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.\
  • The header lines in the rs_fasta files were used for class, \ observed polymorphism and molecule type.\
\

\ \ \ \ varRep 1 snp126 SNPs (126) bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 126) 1 100 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track contains\ dbSNP\ build 126, available from\ ftp.ncbi.nih.gov/snp.\

\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\

\ \

\ The configuration categories reflect the following definitions (not all categories apply\ to this assembly):\

    \ \
  • \ \ Location Type: Describes the alignment of the flanking sequence
    \
      \
    • Range - the flank alignments leave a gap of 2 or more bases in the reference assembly\
    • Exact - the flank alignments leave exactly one base between them\
    • Between - the flank alignments are contiguous; the variation is an insertion\
    • RangeInsertion - the flank alignments surround a distinct polymorphism between\ \ the submitted sequence and reference assembly; \ \ \ \ \ the submitted sequence is shorter\
    • RangeSubstitution - the flank alignments surround a distinct polymorphism between\ \ the submitted sequence and reference assembly;\ \ \ \ \ the submitted sequence and the reference assembly sequence are of equal length\
    • RangeDeletion - the flank alignments surround a distinct polymorphism between\ \ the submitted sequence and reference assembly;\ \ \ \ \ the submitted sequence is longer\
    \
  • \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion (applies to RangeInsertion, RangeSubstitution, RangeDeletion)\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name\
    • No Variation - no variation asserted for sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G}\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - validated by HapMap project\
    • Unknown - no validation has been reported for this variant\
    \
  • \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation within 2000 bases of gene, but not in transcript\
    • Coding - Synonymous - no change in peptide for allele with respect to reference assembly\
    • Coding - Non-Synonymous - change in peptide for allele with respect to reference assembly\
    • Untranslated - variation in transcript, but not in coding region interval\
    • Intron - variation in intron, but not in first two or last two bases of intron\
    • Splice Site - variation in first two or last two bases of intron\
    • Reference - allele observed in a coding region of the reference sequence\
    • Unknown - no known functional classification\
    \
  • \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Alignments with weight = 1 are of the highest quality.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 3.\
    \
  • \
\
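The weight rules in the list above amount to a simple filter. The sketch below uses hypothetical names and assumes the weight is available as an integer per alignment; the excluded values and the default maximum come from the text.

```python
EXCLUDED_WEIGHTS = {0, 10}  # never present in the data set


def keep_alignment(weight: int, max_weight: int = 3) -> bool:
    """Return True if an alignment survives exclusion and the max-weight filter."""
    if weight in EXCLUDED_WEIGHTS:
        return False
    return weight <= max_weight


weights = [0, 1, 2, 3, 10]
print([w for w in weights if keep_alignment(w)])
print([w for w in weights if keep_alignment(w, max_weight=1)])
```

Lowering `max_weight` to 1 keeps only the highest-quality alignments.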

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. This has been split into the 'insertion' and \ 'deletion' categories, based on location type. The location types 'range' and 'exact' are deletions\ relative to the reference assembly. The location type 'between' indicates \ insertions relative\ to the reference assembly. For the new location types, the class 'in-del' is preserved.

\ \

UCSC Annotations

\

\ In addition to presenting the dbSNP data, the following annotations are provided:\

    \
  • The dbSNP reference allele is compared to the UCSC reference allele, and a note is made if the \ dbSNP reference allele is the reverse complement of the UCSC reference allele.
  • \
  • Single-base substitutions where the alignments of the flanking sequences are adjacent \ or have a gap of more than one base are noted.
  • \
  • Observed alleles with an unexpected format are noted.
  • \
  • The length of observed alleles is checked for consistency with location types;\ exceptions are noted.
  • \
  • Single-base substitutions are checked to see that one of the observed alleles matches\ the reference allele; exceptions are noted.
  • \
  • Simple deletions are checked to see that the observed allele matches the reference allele;\ exceptions are noted.
  • \
  • Tri-allelic and quad-allelic single-base substitutions are noted.
  • \
  • Variants that have multiple mappings are noted.
  • \
\

\ \

Data Sources

\

\

    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b126_SNPContigLoc_36_1.bcp.gz. \
  • b126_SNPMapInfo_36_1.bcp.gz provided the alignment weights; alignments with\ weight = 0 or weight = 10 were filtered out.\
  • Class and observed polymorphism were obtained from the shared UniVariation.bcp.gz,\ using the univar_id from SNP.bcp.gz as an index.\
  • Functional classification was obtained from b126_SNPContigLocusId_36_1.bcp.gz.\
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.\
  • The header lines in the rs_fasta files were used for molecule type.\
\

\ \

Orthologous Alleles (human only)

\

\ Beginning with the March 2006 human assembly, we provide a related table that \ contains orthologous alleles in the chimpanzee and rhesus macaque assemblies.\ We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are \ a filtered list that meets the following criteria:\

    \
  • class = 'single'\
  • locType = 'exact'\
  • chromEnd = chromStart + 1\
  • align to just one location\
  • are not aligned to a chrN_random chrom\
  • are biallelic (not tri or quad allelic)\
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end \ position to 0 (zero).\ \
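The candidate criteria above can be sketched as a single predicate. The record layout here is hypothetical (the field names, the slash-separated `observed` string, and the `mapping_count` field are assumptions for illustration), and the liftOver step itself is not shown.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Snp:
    chrom: str
    chrom_start: int
    chrom_end: int
    snp_class: str
    loc_type: str
    observed: str       # e.g. "A/G" (assumed format)
    mapping_count: int  # number of locations the SNP aligns to (assumed field)


def is_ortho_candidate(snp: Snp) -> bool:
    """Apply the six listed criteria to one SNP record."""
    return (
        snp.snp_class == "single"
        and snp.loc_type == "exact"
        and snp.chrom_end == snp.chrom_start + 1
        and snp.mapping_count == 1
        and not snp.chrom.endswith("_random")
        and len(snp.observed.split("/")) == 2  # biallelic
    )


def unlifted_placeholder() -> Tuple[str, int, int]:
    """Values recorded when a lift was not possible: allele '?', start 0, end 0."""
    return "?", 0, 0


snp = Snp("chr1", 100, 101, "single", "exact", "A/G", 1)
print(is_ortho_candidate(snp))
```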

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\ \ varRep 1 snp127 SNPs (127) bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 127) 1 100 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track contains\ dbSNP\ build 127, available from\ ftp.ncbi.nih.gov/snp.\

\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\

\ \

\ The configuration categories reflect the following definitions (not all categories apply\ to this assembly):\

    \ \
  • \ \ Location Type: Describes the alignment of the flanking sequence
    \
      \
    • Range - the flank alignments leave a gap of 2 or more bases in the reference assembly\
    • Exact - the flank alignments leave exactly one base between them\
    • Between - the flank alignments are contiguous; the variation is an insertion\
    • RangeInsertion - the flank alignments surround a distinct polymorphism between\ \ the submitted sequence and reference assembly; \ \ \ \ \ the submitted sequence is shorter\
    • RangeSubstitution - the flank alignments surround a distinct polymorphism between\ \ the submitted sequence and reference assembly;\ \ \ \ \ the submitted sequence and the reference assembly sequence are of equal length\
    • RangeDeletion - the flank alignments surround a distinct polymorphism between\ \ the submitted sequence and reference assembly;\ \ \ \ \ the submitted sequence is longer\
    \
  • \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion (applies to RangeInsertion, RangeSubstitution, RangeDeletion)\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name\
    • No Variation - no variation asserted for sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G}\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - validated by HapMap project\
    • Unknown - no validation has been reported for this variant\
    \
  • \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation within 2000 bases of gene, but not in transcript\
    • Coding - Synonymous - no change in peptide for allele with respect to reference assembly\
    • Coding - Non-Synonymous - change in peptide for allele with respect to reference assembly\
    • Untranslated - variation in transcript, but not in coding region interval\
    • Intron - variation in intron, but not in first two or last two bases of intron\
    • Splice Site - variation in first two or last two bases of intron\
    • Reference - allele observed in a coding region of the reference sequence\
    • Unknown - no known functional classification\
    \
  • \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \
  • \ \ Weight: Alignment count
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 3.\
    \
  • \
\
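The three basic location types in the list above are determined by how many reference bases lie between the two flank alignments. A minimal sketch of that mapping follows; the function name is hypothetical, and the newer RangeInsertion, RangeSubstitution and RangeDeletion types are not covered.

```python
def loc_type_from_gap(gap_bases: int) -> str:
    """Map the number of reference bases between the flank alignments to a type."""
    if gap_bases < 0:
        raise ValueError("gap cannot be negative")
    if gap_bases == 0:
        return "between"  # flanks are contiguous: an insertion
    if gap_bases == 1:
        return "exact"    # exactly one base between the flanks
    return "range"        # two or more bases between the flanks


print([loc_type_from_gap(g) for g in (0, 1, 2, 5)])
```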

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. This has been split into the 'insertion' and \ 'deletion' categories, based on location type. The location types 'range' and 'exact' are deletions\ relative to the reference assembly. The location type 'between' indicates \ insertions relative\ to the reference assembly. For the new location types, the class 'in-del' is preserved.

\ \

UCSC Annotations

\

\ In addition to presenting the dbSNP data, the following annotations are provided:\

    \
  • The dbSNP reference allele is compared to the UCSC reference allele, and a note is made if the \ dbSNP reference allele is the reverse complement of the UCSC reference allele.
  • \
  • Single-base substitutions where the alignments of the flanking sequences are adjacent \ or have a gap of more than one base are noted.
  • \
  • Observed alleles with an unexpected format are noted.
  • \
  • The length of observed alleles is checked for consistency with location types;\ exceptions are noted.
  • \
  • Single-base substitutions are checked to see that one of the observed alleles matches\ the reference allele; exceptions are noted.
  • \
  • Simple deletions are checked to see that the observed allele matches the reference allele;\ exceptions are noted.
  • \
  • Tri-allelic and quad-allelic single-base substitutions are noted.
  • \
  • Variants that have multiple mappings are noted.
  • \
\

\ \

Data Sources

\

\

    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b127_SNPContigLoc_36_2.bcp.gz. \
  • b127_SNPMapInfo_36_2.bcp.gz provided the alignment weights; alignments with\ weight = 0 or weight = 10 were filtered out.\
  • Class and observed polymorphism were obtained from the shared UniVariation.bcp.gz,\ using the univar_id from SNP.bcp.gz as an index.\
  • Functional classification was obtained from b127_SNPContigLocusId_36_2.bcp.gz.\
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.\
  • The header lines in the rs_fasta files were used for molecule type.\
\

\ \

Orthologous Alleles (human only)

\

\ Beginning with the March 2006 human assembly, we provide a related table that \ contains orthologous alleles in the chimpanzee and rhesus macaque assemblies.\ We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are \ a filtered list that meets the following criteria:\

    \
  • class = 'single'\
  • locType = 'exact'\
  • chromEnd = chromStart + 1\
  • align to just one location\
  • are not aligned to a chrN_random chrom\
  • are biallelic (not tri or quad allelic)\
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end \ position to 0 (zero).\ \

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\ \ varRep 1 phastConsElements Most Conserved bed 5 . PhastCons Conserved Elements 0 105 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows predictions of conserved elements produced by the phastCons\ program. PhastCons is part of the PHAST (PHylogenetic Analysis with \ Space/Time models) package. The predictions are based on a phylogenetic hidden \ Markov model (phylo-HMM), a type of probabilistic model that describes both \ the process of DNA substitution at each site in a genome and the way this \ process changes from one site to the next.

\ \

Methods

\

\ Best-in-genome pairwise alignments were generated for\ each species using blastz, followed by chaining and netting. A multiple\ alignment was then constructed from these pairwise alignments using multiz.\ Predictions of conserved elements were then obtained by running phastCons\ on the multiple alignments with the --most-conserved option.

\

\ PhastCons constructs a two-state phylo-HMM with a state for conserved\ regions and a state for non-conserved regions. The two states share a\ single phylogenetic model, except that the branch lengths of the tree\ associated with the conserved state are multiplied by a constant scaling\ factor rho (0 <= rho <= 1). The free parameters of the\ phylo-HMM, including the scaling factor rho, are estimated from\ the data by maximum likelihood using an EM algorithm. This procedure is\ subject to certain constraints on the "coverage" of the genome by conserved\ elements and the "smoothness" of the conservation scores. Details can be\ found in Siepel et al. (2005).

\

\ The predicted conserved elements are segments of the alignment that are\ likely to have been "generated" by the conserved state of the phylo-HMM.\ Each element is assigned a log-odds score equal to its log probability\ under the conserved model minus its log probability under the non-conserved\ model. The "score" field associated with this track contains transformed\ log-odds scores, taking values between 0 and 1000. (The scores are\ transformed using a monotonic function of the form a * log(x) + b.) The\ raw log-odds scores are retained in the "name" field and can be seen on the\ details page or in the browser when the track's display mode is set to\ "pack" or "full".
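The scoring described above can be sketched as follows. The constants a and b are not given in this description, so the values used below are purely hypothetical; the clamping to the 0-1000 range and the rounding to an integer are also assumptions, made so that the result fits the stated score range.

```python
import math


def log_odds(logp_conserved: float, logp_nonconserved: float) -> float:
    """Log probability under the conserved model minus the non-conserved one."""
    return logp_conserved - logp_nonconserved


def transform_score(raw: float, a: float, b: float) -> int:
    """Apply a * log(x) + b, then clamp to the 0-1000 score range (assumed)."""
    if raw <= 0:
        raise ValueError("log() requires a positive raw score")
    value = a * math.log(raw) + b
    return int(round(min(1000.0, max(0.0, value))))


# Hypothetical constants, for illustration only.
A, B = 100.0, 200.0
raw = log_odds(-120.0, -150.0)
print(raw, transform_score(raw, A, B))
```

Because the transform is monotonic, ranking elements by the transformed score gives the same order as ranking by the raw log-odds score, apart from ties introduced by rounding and clamping.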

\ \

Credits

\

\ This track was created at UCSC using the following programs:\

    \
  • \ Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the \ Penn State Bioinformatics \ Group. \
  • \ AxtBest, axtChain, chainNet, netSyntenic, and netClass\ by Jim Kent at UCSC. \
  • PhastCons by Adam Siepel at Cornell University. \
\

\ \

References

\ \

PhastCons

\

\ Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, \ K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M., \ Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D. \ Evolutionarily conserved elements in vertebrate, insect, worm, \ and yeast genomes.\ Genome Res. 15, 1034-1050 (2005).

\ \

Chain/Net

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\ \

Multiz

\

\ Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F.A., \ Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., \ Haussler, D., Miller, W. \ Aligning multiple genomic sequences with the threaded blockset\ aligner.\ Genome Res. 14(4), 708-15 (2004).

\ \

Blastz

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ compGeno 1 exonArrows off\ showTopScorers 200\ phastConsElementsPaper Most Cons. (Std) bed 5 . PhastCons Conserved Elements, Standardized Across Species 0 105.1 0 0 0 127 127 127 1 0 0 compGeno 1 exonArrows off\ showTopScorers 200\ phastConsTopPaper phastCons HCE bed 5 . PhastCons Highly Conserved Elements (HCEs) 0 109.25 0 100 0 127 177 127 0 0 0 compGeno 1 HMRConservation HMRConservation sample 0 .466217 Mouse/Human/Rat Evolutionary Conservation Score 0 110 100 50 0 175 150 128 0 0 0 compGeno 0 exoFish Exofish Ecores bed 5 . Exofish Tetraodon/Human Evolutionarily Conserved Regions 1 111 0 60 120 200 220 255 1 0 0

Description

\

The Exofish track shows regions of homology with the \ pufferfish Tetraodon nigroviridis. \ Exofish is described in the following paper: \ 'Estimate of human gene number provided by \ genome-wide analysis using Tetraodon nigroviridis \ DNA sequence', Nature Genetics volume 25 page 235, \ June 2000.

\

Credits

\ This information \ was provided by Olivier Jaillon and Hugues Roest Crollius at Genoscope. \ For further information and other Exofish tools please visit the \ \ Genoscope Exofish web site, or \ email exofish@genoscope.cns.fr\ \ compGeno 1 blatFish Tetraodon Blat psl xeno Tetraodon nigroviridis Translated Blat Alignments 1 112 0 60 120 200 220 255 1 0 0

Description

\

\ This track displays translated alignments of 728 million bases of \ Tetraodon whole genome shotgun reads vs. the $organism genome. \ Areas highlighted by this track are quite likely to be coding regions.

\ \

Methods

\

\ The alignments were made with blat in translated protein mode requiring two \ nearby 4-mer matches to trigger a detailed alignment. The human genome was \ masked with RepeatMasker and Tandem Repeats Finder before running blat.
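The "two nearby 4-mer matches" trigger can be illustrated with a simplified sketch. It is not blat's implementation: the function names, the requirement that both hits lie on the same diagonal, and the `max_distance` value are all assumptions made for illustration.

```python
from collections import defaultdict


def kmer_index(target, k=4):
    """Index every k-mer of a protein string by its start position."""
    index = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    return index


def has_two_nearby_hits(query, target, k=4, max_distance=100):
    """Return True if two non-overlapping k-mer hits share a diagonal.

    max_distance is a hypothetical definition of "nearby"; the track
    description does not state the value blat uses.
    """
    index = kmer_index(target, k)
    hits_by_diagonal = defaultdict(list)
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], ()):
            hits_by_diagonal[t - q].append(t)
    for positions in hits_by_diagonal.values():
        positions.sort()
        for i, first in enumerate(positions):
            for second in positions[i + 1:]:
                if second - first > max_distance:
                    break
                if second - first >= k:  # hits must not overlap
                    return True
    return False


if __name__ == "__main__":
    print(has_two_nearby_hits("MKVLAAGIWQRS", "XXMKVLAAGIWQRSXX"))
    print(has_two_nearby_hits("MKVLGGGG", "MKVLXXXXXXXX"))
```

Requiring two hits rather than one reduces the number of spurious seeds that reach the more expensive detailed alignment stage.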

\ \

Credits

\

\ Many thanks to Genoscope for providing the Tetraodon sequence.

\ \

References

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ compGeno 1 blatFr1 Fugu Blat psl xeno fr1 $o_Organism ($o_date/$o_db) Translated Blat Alignments 0 113 0 60 120 200 220 255 1 0 0

Description

\

\ This track shows blat translated protein alignments of the Fugu \ ($o_date/$o_db) genome assembly to the $organism genome. The \ v3.0 Fugu whole genome shotgun assembly was provided by the\ US \ DOE Joint Genome Institute (JGI). \

\

\ The strand information (+/-) for this track is in two parts. The\ first + or - indicates the orientation of the query sequence whose\ translated protein produced the match. The second + or - indicates the\ orientation of the matching translated genomic sequence. Because the two\ orientations of a DNA sequence give different predicted protein sequences,\ there are four combinations. ++ is not the same as --; nor is +- the same\ as -+.
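A short sketch shows why the strand combinations are not interchangeable: translating a DNA sequence and translating its reverse complement yield different proteins. The helper names are illustrative, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def oriented_translation(dna, strand):
    """Translate the '+' strand as given, or the '-' strand after reverse complementing."""
    return translate(dna if strand == "+" else reverse_complement(dna))


if __name__ == "__main__":
    seq = "ATGGCCAAA"
    print(oriented_translation(seq, "+"))  # MAK
    print(oriented_translation(seq, "-"))  # FGH
```

Since each of the query and the genomic sequence can be read in either orientation, the pairing of the two orientations determines which translated proteins are being compared.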

\ \

Methods

\

\ The alignments were made with blat in translated protein mode requiring two \ nearby 4-mer matches to trigger a detailed alignment. The $organism\ genome was masked with RepeatMasker and Tandem Repeats Finder before \ running blat.

\ \

Credits

\

\ The \ \ 3.0 draft from JGI was used in the\ UCSC Fugu blat alignments. These data were provided freely by the JGI\ for use in this publication only.

\ \

References

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ compGeno 1 colorChromDefault off\ otherDb fr1\ blatCioSav1 C. savignyi Blat psl xeno Ciona savignyi Translated Blat Alignments 1 113 0 60 120 200 220 255 1 0 0 compGeno 1 blatTetra Tetra Blat psl xeno Tetraodon nigroviridis Translated Blat Alignments 1 114 0 60 120 200 220 255 1 0 0

Description

\

\ This track displays translated alignments of 728 million bases of \ Tetraodon whole genome shotgun reads vs. the draft \ $organism genome. Areas highlighted by this track are quite likely to be \ coding regions.

\ \

Methods

\

\ The alignments were made with blat in translated protein mode requiring two \ nearby 4-mer matches to trigger a detailed alignment. The human\ genome was masked with RepeatMasker and \ Tandem \ Repeats Finder before running blat.

\ \

Credits

\

\ Many thanks to Genoscope for providing the Tetraodon sequence.

\ \

References

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ compGeno 1 tet_waba Tetraodon Tetraodon nigroviridis Homologies 0 115 50 100 200 85 170 225 0 0 0 compGeno 0 blatChicken Chicken Blat psl xeno Chicken Translated Blat Alignments 0 116 100 50 0 255 240 200 1 0 0 compGeno 1 netSyntenyCi1 C.intestinalis Synteny netAlign ci1 chainCi1 $o_Organism ($o_date/$o_db) Syntenic Alignment Net 0 124.2 0 100 0 255 240 200 0 0 0 compGeno 0 otherDb ci1\ chainHg16ProtEx chainHg16ProtEx chain hg16 chainHg16ProtEx 0 125 100 50 0 255 240 200 1 0 0 x 1 otherDb hg16\ chainCi1 C. intestinalis chain chain ci1 C. intestinalis chain 0 125 100 50 0 255 240 200 1 0 0 compGeno 1 otherDb ci1\ syntenyHuman Human Synteny bed 4 + Human/Mouse Synteny Using Blastz Single Coverage (100k window) 0 127 0 100 0 255 240 200 0 0 0

Description

\

\ This track shows syntenic (corresponding) regions between human and mouse chromosomes. \

Methods

\

\ We passed a 100 kb non-overlapping window over the genome and, using the blastz best-in-mouse-genome \ alignments, looked for high-scoring regions with at least 40% of the bases aligning \ with the same region in mouse. The 100 kb segments were joined together if they agreed in direction and\ were within 500 kb of each other in the human genome and within 4 Mb of each other in the mouse genome. \ Gaps between syntenic anchors were joined if the bases between the two flanking regions agreed with \ synteny (direction and mouse location). Finally, we extended the syntenic block to include those \ areas.
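The window-joining step can be sketched roughly as below. The thresholds (40% aligned, 500 kb in human, 4 Mb in mouse) come from the description above. The `Window` record, its field names, and the use of a single mouse coordinate per window are hypothetical simplifications, and the gap-filling and extension steps are not modeled.

```python
from typing import NamedTuple


class Window(NamedTuple):
    h_start: int         # start of the 100 kb window on the human chromosome
    m_chrom: str         # mouse chromosome the window mostly aligns to
    m_pos: int           # representative mouse position of the alignment
    strand: str          # '+' or '-' direction of the alignment
    frac_aligned: float  # fraction of window bases aligning to that mouse region


def join_windows(windows, window_size=100_000, min_frac=0.40,
                 max_human_gap=500_000, max_mouse_gap=4_000_000):
    """Merge qualifying windows into syntenic blocks (simplified sketch)."""
    blocks = []
    for w in sorted(windows, key=lambda w: w.h_start):
        if w.frac_aligned < min_frac:
            continue  # too little of the window aligns to one mouse region
        if blocks:
            last = blocks[-1]
            same_direction = (last["m_chrom"] == w.m_chrom
                              and last["strand"] == w.strand)
            near_in_human = w.h_start - last["h_end"] <= max_human_gap
            near_in_mouse = abs(w.m_pos - last["m_last"]) <= max_mouse_gap
            if same_direction and near_in_human and near_in_mouse:
                last["h_end"] = w.h_start + window_size
                last["m_last"] = w.m_pos
                continue
        blocks.append({"h_start": w.h_start, "h_end": w.h_start + window_size,
                       "m_chrom": w.m_chrom, "strand": w.strand,
                       "m_last": w.m_pos})
    return blocks


if __name__ == "__main__":
    demo = [Window(0, "chr5", 1_000_000, "+", 0.8),
            Window(100_000, "chr5", 1_100_000, "+", 0.2),
            Window(200_000, "chr5", 1_200_000, "+", 0.6),
            Window(300_000, "chr9", 50, "+", 0.9)]
    for block in join_windows(demo):
        print(block)
```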

\

Credits

\

\ Contact Robert \ Baertsch at UCSC for more information about this track.\ Thanks to the Mouse Genome Sequencing Consortium for providing the mouse sequence data. \ compGeno 1 BRout BRout psl xeno BRout 0 128 0 0 0 127 127 127 1 0 0 x 1 tblastHg16 tblastHg16 psl xeno tblastHg16 (Hg16 Known Genes tblastn ci1) 0 128 0 0 0 127 127 127 1 0 0 x 1 chainDm2 D. mel. Chain chain dm2 $o_Organism ($o_date/$o_db) Chained Alignments 0 133 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows $o_organism/$organism genomic alignments using\ a gap scoring system that allows longer gaps than traditional\ affine gap scoring systems. It can also tolerate gaps in both $o_organism \ and $organism simultaneously. These "double-sided"\ gaps can be caused by local inversions and overlapping deletions\ in both species. The $o_organism sequence is from the $o_date ($o_db)\ assembly.

\

\ The chain track displays boxes joined together by either single or \ double lines. The boxes represent aligning regions. \ Single lines indicate gaps that are largely due to a deletion in the \ $o_organism assembly or an insertion in the $organism assembly.\ Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one \ species. In cases where there are multiple \ chains over a particular portion of the $organism genome, chains with \ single-lined gaps are often due to processed pseudogenes, while chains \ with double-lined gaps are more often due to paralogs and unprocessed \ pseudogenes. In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and \ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed, and the resulting abbreviated genomes were\ aligned with blastz. The transposons were then put back into the\ alignments. The resulting alignments were converted into axt format\ and the resulting axts fed into axtChain. AxtChain organizes all the \ alignments between a single $o_organism and a single $organism chromosome\ into a group and makes a kd-tree out of all the gapless subsections\ (blocks) of the alignments. Next, maximally scoring chains of these\ blocks were found by running a dynamic program over the kd-tree. Chains\ scoring below a threshold were discarded; the remaining chains are\ displayed here.
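The chaining step can be illustrated with a small dynamic program over gapless blocks. This sketch makes several assumptions: it compares all pairs of blocks in quadratic time instead of using a kd-tree, and `gap_cost` is a hypothetical linear penalty, not the gap scoring system used by axtChain.

```python
from typing import List, Tuple

# (t_start, q_start, size, score): a gapless block on target and query.
Block = Tuple[int, int, int, float]


def gap_cost(dt, dq):
    """Hypothetical gap penalty; axtChain's real gap scoring differs."""
    if dt == 0 and dq == 0:
        return 0.0
    return 10.0 + 0.1 * (dt + dq)


def best_chain(blocks):
    """Return (score, chain) for the maximally scoring chain of blocks."""
    blocks = sorted(blocks)
    n = len(blocks)
    if n == 0:
        return 0.0, []
    best = [b[3] for b in blocks]  # best chain score ending at each block
    prev = [-1] * n                # back-pointer to the preceding block
    for j in range(n):
        tj, qj, _, score_j = blocks[j]
        for i in range(j):
            ti, qi, size_i, _ = blocks[i]
            dt = tj - (ti + size_i)
            dq = qj - (qi + size_i)
            # A block may follow another only if it starts after it in both sequences.
            if dt >= 0 and dq >= 0:
                candidate = best[i] + score_j - gap_cost(dt, dq)
                if candidate > best[j]:
                    best[j] = candidate
                    prev[j] = i
    end = max(range(n), key=best.__getitem__)
    chain = []
    k = end
    while k != -1:
        chain.append(blocks[k])
        k = prev[k]
    return best[end], chain[::-1]


if __name__ == "__main__":
    demo = [(0, 0, 10, 100.0), (20, 20, 10, 100.0), (15, 500, 5, 30.0)]
    print(best_chain(demo))
```

A kd-tree serves the same purpose as the inner loop here: it limits the search for candidate predecessor blocks, which matters when a chromosome pair has very many blocks.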

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his\ program RepeatMasker.

\

\ The axtChain program was developed at the University of California\ at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.\

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ compGeno 1 otherDb dm2\ chainDm1 (dm1) D. mel. Chain chain dm1 $o_Organism ($o_date/$o_db) Chained Alignments 0 133 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows $o_organism/$organism genomic alignments using\ a gap scoring system that allows longer gaps than traditional\ affine gap scoring systems. It can also tolerate gaps in both $o_organism \ and $organism simultaneously. These "double-sided"\ gaps can be caused by local inversions and overlapping deletions\ in both species. The $o_organism sequence is from the $o_date ($o_db)\ assembly.

\

\ The chain track displays boxes joined together by either single or \ double lines. The boxes represent aligning regions. \ Single lines indicate gaps that are largely due to a deletion in the \ $o_organism assembly or an insertion in the $organism assembly.\ Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one \ species. In cases where there are multiple \ chains over a particular portion of the $organism genome, chains with \ single-lined gaps are often due to processed pseudogenes, while chains \ with double-lined gaps are more often due to paralogs and unprocessed \ pseudogenes. In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and \ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed, and the resulting abbreviated genomes were\ aligned with blastz. The transposons were then put back into the\ alignments. The resulting alignments were converted into axt format\ and the resulting axts fed into axtChain. AxtChain organizes all the \ alignments between a single $o_organism and a single $organism chromosome\ into a group and makes a kd-tree out of all the gapless subsections\ (blocks) of the alignments. Next, maximally scoring chains of these\ blocks were found by running a dynamic program over the kd-tree. Chains\ scoring below a threshold were discarded; the remaining chains are\ displayed here.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his\ program RepeatMasker.

\

\ The axtChain program was developed at the University of California\ at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.\

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ compGeno 1 otherDb dm1\ ratChain $o_Organism Chain chain rn2 Chained Rat/$Organism Alignments 0 133 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows rat/$organism genomic alignments using\ a gap scoring system that allows longer gaps than traditional\ affine gap scoring systems. It can also tolerate gaps\ in both rat and $organism simultaneously. These "double-sided"\ gaps can be caused by local inversions and overlapping deletions\ in both species.

\

\ The chain track displays boxes joined together by either single or \ double lines. The boxes represent aligning regions. \ Single lines indicate gaps that are largely due to a deletion in the \ $o_organism assembly or an insertion in the $organism assembly.\ Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one \ species. In cases where there are multiple \ chains over a particular portion of the $organism genome, chains with \ single-lined gaps are often due to processed pseudogenes, while chains \ with double-lined gaps are more often due to paralogs and unprocessed \ pseudogenes.

\ \

Methods

\

\ Transposons that have been inserted since the rat/$organism\ split were removed, and the resulting abbreviated genomes were\ aligned with blastz. The transposons were then put back into the\ alignments. The resulting alignments were converted into axt format\ and the resulting axts were fed into axtChain. AxtChain organizes all \ the alignments between a single rat and a single $organism chromosome\ into a group and creates a kd-tree out of all the gapless subsections\ (blocks) of the alignments. Next, maximally scoring chains of these\ blocks were found by running a dynamic program over the kd-tree. Chains\ scoring below a threshold were discarded; the remaining chains are\ displayed here.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The axtChain program was developed at the University of California\ at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.\

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ compGeno 1 otherDb rn2\ netDm2 (dm2) D. mel. Net netAlign dm2 chainDm2 $o_Organism ($o_date/$o_db) Alignment Net 0 134 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_Organism/$Organism \ chain for every part of the $Organism genome. It is useful for\ finding orthologous regions and for studying genome rearrangement.\ The $o_organism sequence used in this annotation is \ from the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\
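The four categories above can be expressed as a small decision function. This is a sketch of the rules as stated in the list, with hypothetical parameter names; the netSyntenic program may apply additional criteria that are not described here.

```python
def classify_fill(level, chrom, strand, parent_chrom=None, parent_strand=None):
    """Classify a net item as 'top', 'syn', 'inv' or 'nonSyn'.

    level         -- nesting level of the item (1 is the top level)
    chrom, strand -- chromosome and orientation of this item's match
    parent_*      -- the same values for the gap in the level above
    """
    if level == 1:
        return "top"
    if chrom != parent_chrom:
        return "nonSyn"  # match lies on a different chromosome
    if strand != parent_strand:
        return "inv"     # same chromosome, opposite orientation
    return "syn"         # same chromosome, same orientation


if __name__ == "__main__":
    print(classify_fill(1, "chr2L", "+"))
    print(classify_fill(2, "chr2L", "+", "chr2L", "+"))
    print(classify_fill(2, "chr2L", "-", "chr2L", "+"))
    print(classify_fill(3, "chrX", "+", "chr2L", "+"))
```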

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
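The placement step performed by chainNet can be approximated by a greedy procedure: take chains in order of decreasing score and keep only the parts that are not yet covered. The sketch below is a simplification under stated assumptions. Each chain is treated as one solid interval on a single chromosome, so the gaps inside a chain and the resulting level hierarchy are not modeled, and the function names are illustrative.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(start, end, covered):
    """Return the parts of [start, end) not in the sorted, disjoint list `covered`."""
    pieces = []
    pos = start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue
        if c_start >= end:
            break
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
    if pos < end:
        pieces.append((pos, end))
    return pieces


def place_chains(chains):
    """chains: iterable of (score, start, end). Highest score is placed first."""
    covered = []
    placed = []
    for score, start, end in sorted(chains, reverse=True):
        pieces = subtract(start, end, covered)
        if pieces:
            placed.append((score, pieces))  # the chain, trimmed to free sections
            covered = merge(covered + pieces)
    return placed


if __name__ == "__main__":
    demo = [(1000, 0, 100), (500, 50, 150), (200, 20, 80)]
    print(place_chains(demo))
```

In this example the second chain is trimmed to the section the first one left free, and the third chain is dropped because it lies entirely within an already covered region.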

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 otherDb dm2\ netDm1 (dm1) D. mel. Net netAlign dm1 chainDm1 $o_Organism ($o_date/$o_db) Alignment Net 1 134 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_Organism/$Organism \ chain for every part of the $Organism genome. It is useful for\ finding orthologous regions and for studying genome rearrangement.\ The $o_organism sequence used in this annotation is \ from the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb dm1\ netCi1 C.intestinalis Net netAlign ci1 chainCi1 $o_Organism ($o_date/$o_db) Alignment Net 1 134 0 0 0 127 127 127 1 0 0 compGeno 0 otherDb ci1\ blatCi1 Squirt Blat psl xeno Ciona intestinalis Translated Blat Alignments 0 135 0 60 120 200 220 255 1 0 0

Description

\

\ The Ciona genome shotgun assembly was constructed with the DOE Joint \ Genome Institute (JGI) assembler, JAZZ, from paired-end sequencing reads produced at the JGI at a \ coverage of 8.2X. The assembly contains 116.7 million base\ pairs of nonrepetitive sequence in 2,501 scaffolds greater than 3 kb. Half of \ this (60 Mbp) is assembled into 117 scaffolds longer than 190 kb; 85% of the \ assembly (104.1 Mbp) is found in 905 scaffolds longer than 20 kb. Gene \ modeling and analysis were performed at the JGI.

\ \

Methods

\

\ The alignments were made with blat in translated protein mode requiring \ two nearby 4-mer matches to trigger a detailed alignment.

\ \

Credits

\

\ These data were freely provided by the \ JGI\ for use in this publication/correspondence only.

\

\ The 1.0 draft from \ http://genome.jgi-psf.org/ciona4/ciona4.info.htm was used \ in these alignments.

\ \

References

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ compGeno 1 chainBraFlo1 $o_Organism Chain chain braFlo1 $o_Organism ($o_date/$o_db) Chained Alignments 0 137 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz using dynamic masking, and the transposons were then \ added back in. The resulting alignments were converted into psl format \ using the lavToPsl program. The axt alignments were fed into axtChain, which \ organizes all alignments between a single $o_organism chromosome and a \ single $organism chromosome into a group and creates a kd-tree out of the \ gapless subsections (blocks) of the alignments. A dynamic program was then \ run over the kd-trees to find the maximally scoring chains of these blocks.\ \ $matrix\ \ Chains scoring below a threshold of 2000 were discarded; the remaining\ chains are displayed in this track.
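The $matrix placeholder above is filled from this track's matrix setting, which lists 16 substitution scores. The sketch below parses that setting and applies the 2000 score threshold. Reading the values in row-major order over A, C, G, T is an assumption (it is consistent with the matrix being symmetric and with transitions scoring higher than transversions), and the function names are hypothetical.

```python
# Value of the "matrix" setting listed for this track.
MATRIX_SETTING = ("16 91,-90,-25,-100,-90,100,-100,-25,"
                  "-25,-100,100,-90,-100,-25,-90,91")

MIN_CHAIN_SCORE = 2000  # chains scoring below this were discarded


def parse_matrix(setting, alphabet="ACGT"):
    """Parse '<count> v1,v2,...' into a {(base, base): score} dictionary.

    The A, C, G, T row-major ordering is an assumption.
    """
    count, values = setting.split()
    scores = [int(v) for v in values.split(",")]
    n = len(alphabet)
    if len(scores) != int(count) or len(scores) != n * n:
        raise ValueError("unexpected number of matrix entries")
    return {(alphabet[i], alphabet[j]): scores[i * n + j]
            for i in range(n) for j in range(n)}


def score_ungapped(seq_a, seq_b, matrix):
    """Score two equal-length, gap-free aligned sequences."""
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


def keep_chain(chain_score, threshold=MIN_CHAIN_SCORE):
    """Apply the chain score threshold from the track description."""
    return chain_score >= threshold


if __name__ == "__main__":
    matrix = parse_matrix(MATRIX_SETTING)
    print(score_ungapped("ACGT", "ACGT", matrix))
    print(keep_chain(1500), keep_chain(2500))
```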

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Repeat areas were marked in the genome with WindowMasker,\ developed by Morgulis A, Gertz EM, Schäffer AA, and Agarwala R.\

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\

\ Morgulis A, Gertz EM, Schäffer AA, Agarwala R.\ WindowMasker: window-based masker for sequenced genomes. \ Bioinformatics 2006 Jan 15;22(2):134-41. Epub 2005 Nov 15

\ \ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ otherDb braFlo1\ mouseSyn NCBI Synteny bed 4 + Corresponding Chromosome in Mouse (NCBI) 0 137 120 70 30 187 162 142 0 0 0

Description

\

This track shows syntenic (corresponding) regions between human and mouse\ chromosomes.

\

Method

\

This track was created by looking for homology to known mouse genes in the draft \ assembly. The mouse data are provided at the chromosome level (not cytoband).

\

Credits

\

The data for this track were kindly provided by Deanna Church at NCBI. Refer to the \ NCBI Homology site for more\ details.

\ \

Credits

\

This track is produced from mouse sequence data provided by the \ Mouse Genome Sequencing Consortium. \ compGeno 1 netBraFlo1 $o_Organism Net netAlign braFlo1 chainBraFlo1 $o_Organism ($o_date/$o_db) Alignment Net 0 138 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA. 2003;100(20):11484-11489.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ compGeno 0 otherDb braFlo1\ mouseSynWhd Mouse Synteny bed 6 + Whitehead Corresponding Chromosome in Mouse (300k window) 0 139 120 70 30 187 162 142 0 0 0

Description

\

\ This track shows orthologous (syntenic) regions between mouse and human\ chromosomes.\

\ See \ \ http://www-genome.wi.mit.edu/mouse/synteny/index.html \ for genomic dotplots and additional information or the following site for\ an alternative synteny map based on orthologous genes:\ \ http://www.ncbi.nlm.nih.gov/Homology/ .\ \

Credits

\

\ The data for this track are kindly provided by \ Michael Kamal \ at the \ \ Whitehead Institute. \ Mouse sequence data is provided by the \ Mouse Genome Sequencing Consortium. \ \ compGeno 1 chainFr2 $o_Organism Chain chain fr2 $o_Organism ($o_date/$o_db) Chained Alignments 0 139.8 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_Organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_Organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_Organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_Organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_Organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks.\ \ $matrix \ \ Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.
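
As a rough illustration of how a substitution matrix of the kind substituted for $matrix scores the gapless blocks, here is a minimal Python sketch. The sixteen values are copied from this track's "matrix 16" setting and are read row by row in the order given by "matrixHeader" (A, C, G, T). The function names and example sequences are hypothetical, and the gap penalties that axtChain also applies are not modeled.

```python
# Values copied from the track's "matrix 16 ..." setting (row-major; A, C, G, T).
MATRIX_VALUES = "91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91"
HEADER = "ACGT"


def build_matrix(values: str, header: str = HEADER) -> dict:
    """Turn the comma-separated setting into a {(base, base): score} lookup."""
    nums = [int(v) for v in values.split(",")]
    n = len(header)
    if len(nums) != n * n:
        raise ValueError("expected %d values, got %d" % (n * n, len(nums)))
    return {(header[i], header[j]): nums[i * n + j]
            for i in range(n) for j in range(n)}


def score_block(a: str, b: str, matrix: dict) -> int:
    """Sum substitution scores over one gapless block (equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("a gapless block needs sequences of equal length")
    return sum(matrix[(x, y)] for x, y in zip(a.upper(), b.upper()))


matrix = build_matrix(MATRIX_VALUES)
print(score_block("ACGT", "ACGT", matrix))  # 382: four identities
print(score_block("ACGT", "GCGT", matrix))  # 266: one A/G transition
```

Transitions (A/G, C/T) are penalized less than transversions in this matrix, which is why the single A/G mismatch costs relatively little.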

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput. 2002;:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb fr2\ netFr2 $o_Organism Net netAlign fr2 chainFr2 $o_Organism ($o_date/$o_db) Alignment Net 0 139.9 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_Organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_Organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\
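
The four types above can be summarized as a small decision rule. The Python sketch below is a simplification under stated assumptions: the function and argument names are hypothetical, "chromosome" and "strand" refer to the aligned position in the other species, and any further criteria netSyntenic may apply are not modeled.

```python
def classify_fill(level: int, fill_chrom: str, fill_strand: str,
                  parent_chrom: str, parent_strand: str) -> str:
    """Classify a net item relative to the gap it fills in the level above.

    Assumption: level 1 items have no parent, so the parent arguments are ignored.
    """
    if level == 1:
        return "top"      # best, longest match
    if fill_chrom != parent_chrom:
        return "nonSyn"   # match to a different chromosome
    if fill_strand != parent_strand:
        return "inv"      # same chromosome, opposite orientation
    return "syn"          # same chromosome, same orientation


print(classify_fill(2, "chr3", "-", "chr3", "+"))  # inv
```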

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
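
The placement step can be pictured with a short Python sketch. It is not the chainNet implementation: chains are reduced to hypothetical (name, score, blocks) tuples on a single chromosome, levels are not tracked, and no minimum size or score is enforced. It only shows the idea of placing chains from highest to lowest score and trimming each one to the bases not already covered.

```python
def subtract(interval, covered):
    """Return the parts of (start, end) not overlapped by the sorted, disjoint `covered` list."""
    start, end = interval
    pieces, pos = [], start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue
        if c_start >= end:
            break
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
        if pos >= end:
            break
    if pos < end:
        pieces.append((pos, end))
    return pieces


def merge(covered, pieces):
    """Merge intervals into a sorted list of disjoint intervals."""
    merged = []
    for s, e in sorted(covered + pieces):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def place_chains(chains):
    """Place chains best-first; each chain keeps only bases not yet covered."""
    covered, placed = [], {}
    for name, _score, blocks in sorted(chains, key=lambda c: -c[1]):
        pieces = [p for block in blocks for p in subtract(block, covered)]
        if pieces:
            placed[name] = pieces
            covered = merge(covered, pieces)
    return placed


chains = [
    ("top", 1000, [(0, 40), (60, 100)]),  # has a gap at 40-60
    ("fill", 300, [(30, 70)]),            # trimmed to the gap in "top"
    ("low", 100, [(10, 20)]),             # fully covered, so dropped
]
print(place_chains(chains))
# {'top': [(0, 40), (60, 100)], 'fill': [(40, 60)]}
```

In this toy example the lower-scoring chain survives only where it fills the gap in the higher-scoring one, which is the hierarchy described above.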

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ compGeno 0 otherDb fr2\ chainFr1 $o_db Chain chain fr1 $o_Organism ($o_date/$o_db) Chained Alignments 0 140 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_Organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_Organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_Organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_Organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_Organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb fr1\ syntenyRat Rat Synteny bed 4 + $Organism/Rat Synteny Using Blastz Single Coverage (100k window) 0 140 0 100 0 255 240 200 0 0 0

Description

\

\ This track shows syntenic (corresponding) regions between $organism and rat chromosomes. \

Methods

\

\ A 100 kb non-overlapping window was passed over the genome and, using the Blastz best-in-rat-genome \ alignments, high-scoring regions were identified in which at least 40% of the bases aligned \ with the same region in rat. These 100 kb segments were joined if they agreed in direction and\ were within 500 kb of each other in the $organism genome and within 4 Mb of each other in the rat genome. \ Gaps between syntenic anchors were joined if the bases between the two flanking regions agreed with \ the synteny (direction and rat location). Finally, the syntenic blocks were extended to include those \ areas.
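
The window filtering and joining rules can be sketched in Python as follows. This is a simplified illustration, not the code used to build the track: the Window and Block records and their field names are hypothetical, all windows are assumed to lie on one $organism chromosome, and the final gap-joining and extension step is not modeled.

```python
from dataclasses import dataclass

WINDOW = 100_000              # window size
MIN_FRACTION = 0.40           # at least 40% of bases aligning
MAX_GAP_ORGANISM = 500_000    # 500 kb in the $organism genome
MAX_GAP_RAT = 4_000_000       # 4 Mb in rat


@dataclass
class Window:
    start: int        # window start in the $organism genome
    rat_chrom: str
    rat_start: int
    rat_end: int
    strand: str
    fraction: float   # fraction of bases aligning to this rat region


@dataclass
class Block:
    start: int
    end: int
    rat_chrom: str
    rat_start: int
    rat_end: int
    strand: str


def compatible(block: Block, w: Window) -> bool:
    """True if the window agrees in rat chromosome and direction and is close enough."""
    rat_gap = max(w.rat_start - block.rat_end, block.rat_start - w.rat_end, 0)
    return (block.rat_chrom == w.rat_chrom and block.strand == w.strand
            and w.start - block.end <= MAX_GAP_ORGANISM
            and rat_gap <= MAX_GAP_RAT)


def join_windows(windows):
    """Keep windows passing the 40% cutoff and join neighbors into blocks."""
    kept = sorted((w for w in windows if w.fraction >= MIN_FRACTION),
                  key=lambda w: w.start)
    blocks = []
    for w in kept:
        if blocks and compatible(blocks[-1], w):
            last = blocks[-1]
            last.end = w.start + WINDOW
            last.rat_start = min(last.rat_start, w.rat_start)
            last.rat_end = max(last.rat_end, w.rat_end)
        else:
            blocks.append(Block(w.start, w.start + WINDOW, w.rat_chrom,
                                w.rat_start, w.rat_end, w.strand))
    return blocks
```

With this rule, a window that falls below the 40% cutoff does not break a block as long as its neighbors agree and lie within the distance limits.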

\

Credits

\

\ Contact Robert \ Baertsch at UCSC for more information about this track.\ compGeno 1 netFr1 $o_db Net netAlign fr1 chainFr1 $o_Organism ($o_date/$o_db) Alignment Net 0 140.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_Organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_Organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb fr1\ blatChimpWashu Chimp Blat - WashU psl xeno Chimp Blat Alignments - WashU 0 141 100 50 0 255 240 200 1 0 0 compGeno 1 blatHg16KG Human knownGene BLAT psl protein Human knownGene BLAT 0 142 0 0 0 127 127 127 0 0 0 compGeno 1 colorChromDefault off\ chimp Chimp sample Chimp Sample Track 0 142 100 50 0 0 0 255 0 0 1 chr7, compGeno 0 rgdSslp RGD SSLP bed 4 . Rat Genome Database Simple Sequence Length Polymorphisms 0 144.5 12 12 120 133 133 187 0 0 0 http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=

Description

\

\ Simple sequence-length polymorphisms (SSLPs) are \ also known as microsatellite DNA. SSLPs consist of tandem repeats of simple 1-6 nucleotide \ units that are highly polymorphic in repeat length among strains. \ They are often used as genetic markers for genotyping.

\ \

Methods

\

\ The annotation data file, \ RGD_SSLP.gff, was downloaded from the Rat Genome Database\ (RGD) website and processed to create this track.

\ \

Credits

\

\ Thanks to the RGD for \ providing this annotation. RGD is funded by grant HL64541 entitled "Rat \ Genome Database", awarded to Dr. Howard J Jacob, Medical College of \ Wisconsin, from the National Heart, Lung, and Blood Institute \ (NHLBI) of the National \ Institutes of Health (NIH).\

\ \ varRep 1 genomicSuperDups Segmental Dups bed 6 . Duplications of >1000 Bases of Non-RepeatMasked Sequence 0 146 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows regions detected as putative genomic duplications within the\ golden path. The following display conventions are used to distinguish\ levels of similarity:\

    \
  • \ Light to dark gray: 90 - 98% similarity\
  • \ Light to dark yellow: 98 - 99% similarity\
  • \ Light to dark orange: greater than 99% similarity \
  • \ Red: duplications of greater than 98% similarity that lack sufficient \ Segmental Duplication Database evidence (most likely missed overlaps) \
\ For a region to be included in the track, at least 1 Kb of the total \ sequence (containing at least 500 bp of non-RepeatMasked sequence) had to \ align and a sequence identity of at least 90% was required.
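
The inclusion criteria and the similarity shading can be written as two small checks. The Python sketch below is only an illustration: the function names are hypothetical, identity is expressed as a fraction, the handling of values exactly at 98% and 99% is an assumption, and the red category (which depends on Segmental Duplication Database evidence) is not modeled.

```python
def passes_inclusion(aligned_bp: int, unmasked_bp: int, identity: float) -> bool:
    """At least 1 kb aligned, at least 500 bp of it non-RepeatMasked, identity >= 90%."""
    return aligned_bp >= 1000 and unmasked_bp >= 500 and identity >= 0.90


def shade_family(identity: float) -> str:
    """Map sequence identity to the color family used in the display."""
    if identity < 0.90:
        raise ValueError("regions below 90% identity are not in the track")
    if identity > 0.99:
        return "orange"
    if identity >= 0.98:
        return "yellow"
    return "gray"


print(passes_inclusion(1200, 600, 0.95), shade_family(0.95))  # True gray
```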

\ \

Methods

\

\ For a description of the 'fuguization' detection method, see Bailey, J.A. \ et al. (2001) in the References section below.\ \

Credits

\

\ These data were provided by \ Xinwei She \ and Evan Eichler \ at the University of Washington.

\ \

References

\

\ Bailey, J.A., Yavor, A.M., Massa, H.F., Trask, B.J. and Eichler, E.E.\ Segmental duplications: organization and impact within the \ current human genome project assembly.\ Genome Res 11, 1005-17 (2001). \ varRep 1 noScoreFilter .\ lineageMutations LineageMutations sample Lineage Specific Mutations 0 147 0 0 0 0 160 0 0 0 0 varRep 0 olly25 Cross-hyb3 25 sample 0 5 0.199 Cross-hybridization Counts for Off-by-3 25-mers 0 148.5 0 0 0 127 127 127 0 0 0

Description

\

This track shows the number of 25-mers in the genome that\ are the same as the 25-mer centered at the current position \ with up to three mismatches allowed. The current position is\ included. This track is empty over areas masked by RepeatMasker\ or trf at period 12 or less. It is best to design microarray\ probes and PCR primers where the count in this track is only\ one to avoid cross-hybridization.
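
To make the count concrete, here is a brute-force Python sketch. It is not the olly program: the function names are hypothetical, only the forward strand is scanned (an assumption), repeat masking is ignored, and the quadratic scan is only practical for short sequences.

```python
def within_mismatches(a: str, b: str, limit: int) -> bool:
    """True if equal-length strings a and b differ at no more than `limit` positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > limit:
                return False
    return True


def cross_hyb_count(genome: str, center: int, k: int = 25,
                    max_mismatches: int = 3) -> int:
    """Count k-mers in `genome` matching the k-mer centered at `center`.

    The k-mer at the current position is itself included in the count.
    """
    start = center - k // 2
    if start < 0 or start + k > len(genome):
        raise ValueError("k-mer centered here runs off the sequence")
    probe = genome[start:start + k]
    return sum(1 for i in range(len(genome) - k + 1)
               if within_mismatches(probe, genome[i:i + k], max_mismatches))


# Small example with 5-mers and one allowed mismatch.
print(cross_hyb_count("AAAAACCCCCAAAAT", center=2, k=5, max_mismatches=1))  # 4
```

A count of one means that only the k-mer at the current position matches.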

\ \

For best results view this track with Interpolation set to Only Samples.

\ \

Credits

\ This track was computed with the program 'olly' at the default settings.\ Olly was created by Jim Kent.\ x 0 olly2 Cross-hyb2 25 sample 0 5 0.199 Cross-hybridization for Off-by-2 25-mers 0 148.6 0 0 0 127 127 127 0 0 0

Description

\

This track shows the number of 25-mers in the genome that\ are the same as the 25-mer centered at the current position \ with up to two mismatches allowed. The current position is\ included. This track is empty over areas masked by RepeatMasker\ or trf at period 12 or less. It is best to design microarray\ probes and PCR primers where the count in this track is only\ one to avoid cross-hybridization.

\ \

For best results view this track with Interpolation set to Only Samples.

\ \

Credits

\ This track was computed with the program 'olly' at the default settings.\ Olly was created by Jim Kent.\ x 0 gpcr Gpcr genePred Gpcr from softberry and Rachel Karchin's HMM 0 149 0 0 0 127 127 127 0 0 0 x 1 rmsk RepeatMasker rmsk Repeating Elements by RepeatMasker 1 149.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences \ for interspersed repeats and low complexity DNA sequences. The program\ outputs a detailed annotation of the repeats that are present in the \ query sequence, as well as a modified version of the query sequence \ in which all the annotated repeats have been masked. RepeatMasker uses \ the RepBase library of repeats from the \ Genetic \ Information Research Institute (GIRI). \ RepBase is described in Jurka, J. (2000) in the References section below.

\ \

Display Conventions and Configuration

\

\ In full display mode, this track displays nine different classes of repeats:\

    \
  • Short interspersed nuclear elements (SINEs), which include ALUs\
  • Long interspersed nuclear elements (LINEs)\
  • Long terminal repeat elements (LTRs), which include retroposons\
  • DNA repeat elements (DNA)\
  • Simple repeats (micro-satellites)\
  • Low complexity repeats\
  • Satellite repeats\
  • tRNA repeats\
  • Other repeats\

\

\ The level of color shading in the graphical display reflects the amount of \ base mismatch, base deletion, and base insertion associated with a repeat \ element. The higher the combined number of these, the lighter the shading.

\ \

Methods

\

\ UCSC has used the most current versions of the RepeatMasker software \ and repeat libraries available to generate these data. Note that these \ versions may be newer than those that are publicly available on the Internet. \

\

\ Data are generated using the RepeatMasker -s flag. Additional flags\ may be used for certain organisms. Repeats are soft-masked. Alignments may \ extend through repeats, but are not permitted to initiate in them. \ See the \ FAQ for \ more information.

\ \

Credits

\

\ Thanks to Arian Smit and GIRI\ for providing the tools and repeat libraries used to generate this track.

\ \

References

\

\ RepBase is described in \ Jurka, J. \ Repbase update: a database and an electronic journal of \ repetitive elements. \ Trends Genet. 16(9), 418-420 (2000).

\ \ varRep 0 reconRepeat Recon Repeats bed 4 + Repeats Determined with Recon 0 149.2 0 0 0 127 127 127 0 0 0 varRep 1 rmskCensor CENSOR Repeats rmsk Repeating Elements by CENSOR and RepBase 11.6 (Giri Institute) 0 149.2 0 0 0 127 127 127 1 0 0 varRep 0 windowmasker WindowMasker bed 3 Genomic Intervals Masked by WindowMasker 0 149.25 0 0 0 127 127 127 0 0 0 varRep 1 windowmaskerSdust WM + SDust bed 3 Genomic Intervals Masked by WindowMasker + SDust 0 149.26 0 0 0 127 127 127 0 0 0

Description

\ This track depicts masked sequence as determined by WindowMasker. The \ WindowMasker tool is included in the NCBI C++ toolkit. The source code \ for the entire toolkit is available \ here.\ \

Methods

\ WindowMasker was operated with the following parameters:\
\
windowmasker -mk_counts true -input $db.fa -output wm_counts\
windowmasker -ustat wm_counts -sdust true -input $db.fa -output repeats.bed\
\ The repeats.bed (BED3) file was loaded into the "windowmaskerSdust" table for\ this track.\ \

References

\

\ Morgulis A, Gertz EM, Schäffer AA, Agarwala R.\ WindowMasker: window-based masker for sequenced genomes. \ Bioinformatics 2006 Jan 15;22(2):134-41. Epub 2005 Nov 15

\ varRep 1 simpleRepeat Simple Repeats bed 4 + Simple Tandem Repeats by TRF 0 149.3 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays simple tandem repeats (possibly imperfect) located\ by Tandem Repeats\ Finder (TRF), which is specialized for this purpose. These repeats can\ occur within coding regions of genes and may be quite\ polymorphic. Repeat expansions are sometimes associated with specific\ diseases.

\ \

Methods

\

\ For more information about the TRF program, see Benson (1999).\

\ \

Credits

\

\ TRF was written by \ Gary Benson.

\ \

References

\

\ Benson G. \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.

\ varRep 1 microsat Microsatellite bed 4 Microsatellites - Di-nucleotide and Tri-nucleotide Repeats 0 149.4 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays regions that are likely to be useful as microsatellite\ markers. These are sequences of at least 15 perfect di-nucleotide and \ tri-nucleotide repeats, and tend to be highly polymorphic in the\ population.\

\ \

Methods

\

\ The data shown in this track are a subset of the Simple Repeats track, \ selecting only those \ repeats of period 2 and 3, with 100% identity and no indels, and with\ at least 15 copies of the repeat. The Simple Repeats track is\ created using the Tandem Repeats Finder. For more information about this \ program, see Benson (1999).
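
The selection rule amounts to a simple filter over Simple Repeats records. In the Python sketch below, the dictionary keys (period, copyNum, perMatch, perIndel) are assumed field names chosen for illustration, and the example records are made up.

```python
def is_microsatellite(rec: dict) -> bool:
    """Period 2 or 3, 100% identity, no indels, and at least 15 copies."""
    return (rec["period"] in (2, 3)
            and rec["perMatch"] == 100
            and rec["perIndel"] == 0
            and rec["copyNum"] >= 15)


records = [
    {"name": "CA", "period": 2, "copyNum": 18.0, "perMatch": 100, "perIndel": 0},
    {"name": "CAG", "period": 3, "copyNum": 12.0, "perMatch": 100, "perIndel": 0},
    {"name": "GATA", "period": 4, "copyNum": 20.0, "perMatch": 100, "perIndel": 0},
    {"name": "AT", "period": 2, "copyNum": 25.5, "perMatch": 96, "perIndel": 2},
]
print([r["name"] for r in records if is_microsatellite(r)])  # ['CA']
```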

\ \

Credits

\

\ Tandem Repeats Finder was written by \ Gary Benson.

\ \

References

\

\ Benson G. \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.

\ varRep 1 chainTetNig1 $o_Organism Chain chain tetNig1 $o_Organism ($o_date/$o_db) Chained Alignments 0 150 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_Organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_Organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_Organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_Organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_Organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. \ \ $matrix\ \ Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput. 2002;:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ compGeno 1 otherDb tetNig1\ blatFugu Fugu Blat psl xeno Takifugu rubripes Translated Blat Alignments 0 150 0 60 120 200 220 255 1 0 0

Description

\

\ The Fugu v.3.0 whole genome shotgun assembly was provided by the\ US DOE Joint \ Genome Institute (JGI). The assembly was constructed with the JGI\ assembler, JAZZ, from paired end sequencing reads produced at JGI, Myriad \ Genetics, and Celera Genomics, resulting in a sequence coverage of 5.7X. All \ reads are plasmid, cosmid, or BAC end-sequences, with the predominant coverage\ derived from 2 Kb insert plasmids. This assembly contains 20,379\ scaffolds totaling 319 million base pairs. The largest 679 scaffolds\ total 160 million base pairs.

\

\ The strand information (+/-) for this track is in two parts. The\ first + or - indicates the orientation of the query sequence whose\ translated protein produced the match. The second + or - indicates the\ orientation of the matching translated genomic sequence. Because the two\ orientations of a DNA sequence give different predicted protein sequences,\ there are four combinations. ++ is not the same as --; nor is +- the same\ as -+.

\ \

Methods

\

\ The alignments were made with blat in translated protein mode requiring two\ nearby 4-mer matches to trigger a detailed alignment. The $organism\ genome was masked with RepeatMasker and Tandem Repeats Finder before \ running blat.

\ \

Credits

\

\ The 3.0 draft from the\ \ JGI Fugu rubripes website was used in the\ UCSC Genome Browser Fugu blat alignments. These data were freely provided \ by the JGI for use in this publication only.

\ \

References

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ compGeno 1 netTetNig1 $o_Organism Net netAlign tetNig1 chainTetNig1 $o_Organism ($o_date/$o_db) Alignment Net 0 150.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_Organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_Organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California,\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ compGeno 0 otherDb tetNig1\ loweProbes Lowe's Probes bed 6 . Candidate Oligos for Stanford Microarray 0 151 0 0 0 127 127 127 0 0 1 chr22,

Candidate Oligos for every Stanford Oligo Chip track

\

\ Oligos were chosen for every Sanger22 annotation on chr22 as\ well as about 2000 other genes. Two oligos were chosen with\ a 3' bias, two with a 5' bias, and two with no bias. For this\ purpose exons are defined to include 3' and 5' UTRs.

\ \

The strategy

\

\ These oligo selections are based on the following ideas:\

    \
  • Oligos should have minimum secondary structure as\ they must be available for hybridization.
  • \
  • Oligos should be unique in the genome if possible: no\ repeats, and no Blat or Blast hits elsewhere in the genome.
  • \
  • If oligo-dT is used for RT priming, oligos should be near the 3' end\ of the gene transcript (including UTR).
  • \
  • Oligos should have a uniform hybridization temperature if\ possible. All oligos must be hybridized at the same temperature,\ so the goal is to minimize cross-hybridization while maximizing signal.\

\

\ Currently we don't have data to identify which parameters\ are more important than others. Also, some of these scores\ overlap (e.g., if Tm is limited, then high secondary\ structure is less likely). See below for histograms of these\ criteria.

\ \
\ \

The Details:

\ \

The Algorithm

\

\

    \
  • Step through each exon, at a step size proportional to the\ size of the exon, examining possible oligos and excluding areas that\ are RepeatMasked.
  • \
  • Score each oligo for: Tm difference, distance from 3' end,\ secondary structure, and an Affymetrix heuristic.
  • \
  • Look through the candidate probes, recording the maximum\ value of each score.
  • \
  • Each score is then normalized by dividing by the maximum\ and then the normalized scores are combined as an average and oligos\ are sorted to find the best overall score.
  • \
  • Oligos with the best combined normalized scores are blatted\ until one is found that has a blat score below a given \ threshold.
  • \
  • As oligos are chosen, candidate oligos that overlap those\ already chosen are discarded.
  • \
  • If no oligos pass the blat threshold, or not enough oligos have been\ chosen, pick the oligos that have the best combined score.
  • \
\
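
The normalization and ranking steps above can be sketched as follows. This is a minimal illustration, not the original selection program: the function name `rank_oligos` and the dictionary layout are hypothetical, and the sketch assumes every raw score is non-negative and oriented so that larger is better.

```python
from typing import Dict, List


def rank_oligos(candidates: List[Dict], score_keys: List[str]) -> List[Dict]:
    """Sort candidates by the average of their max-normalized scores."""
    if not candidates:
        return []
    # Remember the maximum value seen for each kind of score.
    maxima = {k: max(c[k] for c in candidates) for k in score_keys}

    def combined(c: Dict) -> float:
        # Divide each score by its maximum, then average the results.
        normalized = [c[k] / maxima[k] if maxima[k] else 0.0 for k in score_keys]
        return sum(normalized) / len(normalized)

    return sorted(candidates, key=combined, reverse=True)
```

The best candidates from this ordering would then be checked against the blat threshold, one at a time, as described in the list.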

\

About the scores:

\

\

    \
  • Tm: Formulas for calculating Tm are taken from: "A unified\ view of polymer, dumbbell, and oligonucleotide DNA\ nearest-neighbor thermodynamics" John SantaLucia, Jr. PNAS, Vol\ 95, pp 1460-1465 February 1998. A web version called \ Hyther exists.\
  • Secondary Structure: Calculates the Gibbs free energy of the\ best secondary structure using libraries from the RNAstructure program.
  • \
  • Affy Heuristic: 1 if the oligo satisfies heuristics derived from those published by Affymetrix \ ("Nature Biotechnology" vol. 14, Dec. '96), 0 otherwise. The heuristic\ is as follows:\
    \
       no more than 9 A's in window of 20 \
       no more than 9 T's in window of 20\
       no more than 8 C's in window of 20\
       no more than 8 G's in window of 20\
      \
       no more than 6 A's in window of  8\
       no more than 6 T's in window of  8\
       no more than 5 C's in window of  8 \
       no more than 5 G's in window of  8\
    
    \
  • \
  • 3' Dist: Distance from end of oligo to 3' end of target\ sequence.
  • \
  • Blat Score: Blat score of the second most homologous region\ in the genome. If there are no inserts, this is approximately the number of\ base pairs that match.
  • \
\
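
The Affy heuristic listed above can be expressed as a sliding-window check. This is a sketch under stated assumptions: the function name `affy_heuristic` is hypothetical, and an oligo shorter than a window is treated as a single window, which the description does not specify.

```python
# Maximum count of each base allowed in any window of the given size,
# taken from the limits listed above.
LIMITS = {
    20: {"A": 9, "T": 9, "C": 8, "G": 8},
    8: {"A": 6, "T": 6, "C": 5, "G": 5},
}


def affy_heuristic(oligo: str) -> int:
    """Return 1 if every window respects the base-count limits, else 0."""
    seq = oligo.upper()
    for window, limits in LIMITS.items():
        # Assumption: a sequence shorter than the window is one window.
        last_start = max(len(seq) - window, 0)
        for start in range(last_start + 1):
            chunk = seq[start:start + window]
            for base, limit in limits.items():
                if chunk.count(base) > limit:
                    return 0
    return 1
```

An oligo with a run of seven A's fails the window-of-8 limit, for instance, while a balanced repeat such as `"ACGT" * 6` passes both window sizes.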

\ \

Histograms of Scores

\

\ Histograms are from the Stanford picked gene set.\ \ \ \ \ \ \ \ \ \

\ Secondary structure measured in Gibbs free energy; higher scores are better.

\ Blat (similar to BLAST) histogram; lower scores are better.

\ Melting temperatures; scores over 100 °C do occur in the algorithm.

\ Percentage GC; not used in the algorithm but presented anyway.
\

\

Please note that all coordinates are relative to the '+' strand,\ while all oligo sequences are 5'->3'. This means that all sequences\ displayed are part of the sense strand. For example, if the oligo is represented\ in the database as being on the '-' strand and starts at 1 and ends at\ 5 of 'atgcatgc', the '+' sequence of the probe would be 'tgcat'. Because\ that reads 3'->5' on the '-' strand, the sequence displayed would\ be the reverse complement, 'atgca'.
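
The strand convention can be checked with a short sketch. The helper names `reverse_complement` and `probe_sequence` are hypothetical and only illustrate the rule that a probe on the '-' strand is reported as the reverse complement of its '+' strand sequence.

```python
# Translation table mapping each base to its complement, preserving case.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_sequence(plus_strand_seq: str, strand: str) -> str:
    """Return the 5'->3' oligo sequence given its '+' strand sequence."""
    if strand == "-":
        return reverse_complement(plus_strand_seq)
    return plus_strand_seq
```

Calling `probe_sequence("tgcat", "-")` applies the conversion to the example sequence used in the note.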

\ x 1 exoMouse Exonerate Mouse bed 6 + Mouse/Human Evolutionarily Conserved Regions (Exonerate) 0 152 100 50 0 255 240 200 1 0 0

The Exonerate Mouse track shows regions of homology with the\ mouse, based on Exonerate alignments of mouse random reads\ with the human genome. The data for this track were kindly provided by\ Guy Slater, Michele Clamp, and Ewan Birney at\ Ensembl.

\ x 1 mouseOrtho Mouse Ortholog bed 5 + Mouse Orthology Using Fgenesh++ Gene Predictions (top 4 reciprocal best) 0 156 0 100 0 255 240 200 1 0 0 x 1 mouseOrthoSeed Tight Ortholog bed 5 + Tight Mouse Orthology Using Fgenesh++ Gene Predictions (only reciprocal best) 0 158 0 100 0 255 240 200 1 0 0 x 1 blastzSelfUnmasked Self Unmasked psl xeno Blastz Self Join without Repeats Masked (tandem repeats masked) 0 159 0 0 0 127 127 127 1 0 0 x 1 chainDanRer4 $o_Organism Chain chain danRer4 $o_Organism ($o_date/$o_db) Chained Alignments 0 159.1 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in. The \ resulting alignments were converted into psl format using the lavToPsl program. The psl alignments were fed into axtChain, which organizes all alignments \ between a single $o_organism chromosome and a single $organism chromosome \ into a group and creates a kd-tree out of the gapless subsections (blocks) of \ the alignments. A dynamic program was then run over the kd-trees to find the \ maximally scoring chains of these blocks.\ \ $matrix\ \ Chains scoring below a threshold of 5000 were discarded; the remaining\ chains are displayed in this track.
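
The chaining step can be illustrated with a small dynamic program over gapless blocks. This sketch replaces the kd-tree with a plain quadratic scan over earlier blocks, and the `gap_cost` function is a hypothetical stand-in for the real gap scoring; it is not the axtChain implementation.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    t_start: int   # start on the reference ($organism) sequence
    t_end: int     # end on the reference sequence (exclusive)
    q_start: int   # start on the aligning ($o_organism) sequence
    q_end: int     # end on the aligning sequence (exclusive)
    score: int     # score of the ungapped block


def gap_cost(prev: Block, nxt: Block) -> int:
    """Hypothetical penalty that grows with the larger of the two gaps."""
    return max(nxt.t_start - prev.t_end, nxt.q_start - prev.q_end)


def best_chain(blocks: List[Block]) -> List[Block]:
    """Return the maximally scoring chain of blocks colinear in both sequences."""
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    if not order:
        return []
    best = [b.score for b in order]   # best chain score ending at each block
    back = [-1] * len(order)          # predecessor index, -1 for chain start
    for i, cur in enumerate(order):
        for j in range(i):
            prev = order[j]
            if prev.t_end <= cur.t_start and prev.q_end <= cur.q_start:
                cand = best[j] + cur.score - gap_cost(prev, cur)
                if cand > best[i]:
                    best[i], back[i] = cand, j
    i = max(range(len(order)), key=best.__getitem__)
    chain = []
    while i != -1:                    # walk predecessors back to the start
        chain.append(order[i])
        i = back[i]
    return chain[::-1]
```

A kd-tree serves the same purpose as the inner loop here, finding candidate predecessor blocks, without scanning every earlier block.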

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb danRer4\ netDanRer4 $o_Organism Net netAlign danRer4 chainDanRer4 $o_Organism ($o_date/$o_db) Alignment Net 0 159.2 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program \ chainNet was used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring \ chain. During this process, a natural hierarchy emerged in which a chain \ that filled a gap in a higher-scoring chain was placed underneath that \ chain. The program netSyntenic was used to fill in information about the \ relationship between higher- and lower-level chains, such as whether a \ lower-level chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California,\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ compGeno 0 otherDb danRer4\ chainDanRer1 $o_db Chain chain danRer1 $o_Organism ($o_date/$o_db) Chained Alignments 0 160 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb danRer1\ chainDanRer3 $o_db Chain chain danRer3 $o_Organism ($o_date/$o_db) Chained Alignments 0 160 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in. \ Because the $o_organism chrNA and chrUn chromosomes are composed of unordered \ scaffolds separated by 500 Ns, the blastz alignments and subsequent \ chaining were first performed on the scaffolds (without using lineage-specific \ repeats), \ and the coordinates were then lifted up to the chromosome level. This avoided \ false alignments across the Ns. The\ resulting alignments were converted into psl format using the lavToPsl program. The psl alignments were fed into axtChain, \ which organizes all alignments between a single $o_organism chromosome and a \ single $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks.\ \ $matrix\ \ Chains scoring below a threshold of 5000 were discarded; the remaining\ chains are displayed in this track.
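
The scoring matrix referenced by $matrix is stored in this record as a flat list of 16 values with the header A, C, G, T. The sketch below parses such a field and scores an ungapped block; the function names are hypothetical, and a row-major layout is assumed (the matrix is symmetric, so the assumption does not change the result here).

```python
from typing import Dict, Tuple

MATRIX_FIELD = "91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91"
MATRIX_HEADER = "A, C, G, T"


def parse_matrix(values: str, header: str) -> Dict[Tuple[str, str], int]:
    """Turn the flat value list into a (base, base) -> score lookup."""
    bases = [b.strip() for b in header.split(",")]
    nums = [int(v) for v in values.split(",")]
    n = len(bases)
    if len(nums) != n * n:
        raise ValueError("matrix size does not match header")
    # Assumption: values are stored row by row.
    return {(bases[r], bases[c]): nums[r * n + c]
            for r in range(n) for c in range(n)}


def block_score(a: str, b: str, matrix: Dict[Tuple[str, str], int]) -> int:
    """Sum substitution scores over an ungapped pair of equal-length sequences."""
    return sum(matrix[(x, y)] for x, y in zip(a.upper(), b.upper()))
```

With these values a transition such as A/G costs less than a transversion such as A/T, and a match between C or G scores slightly higher than a match between A or T.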

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb danRer3\ netDanRer3 $o_db Net netAlign danRer3 chainDanRer3 $o_Organism ($o_date/$o_db) Alignment Net 0 160.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. Because\ the $o_organism chrNA and chrUn chromosomes are composed of unordered scaffolds\ separated by 500 Ns, the blastz alignments and subsequent chaining were\ first performed on the scaffolds (without using lineage-specific repeats), and \ the coordinates were then lifted up to the chromosome level. This avoided \ false alignments across the Ns.
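
Lifting scaffold coordinates to the chromosome level amounts to adding each scaffold's offset within the assembled chromosome. The sketch below assumes the scaffolds are concatenated in a known order, starting at position 0, with 500 Ns between consecutive scaffolds; the function names and the example scaffold names are hypothetical.

```python
from typing import Dict, List, Tuple

GAP = 500  # number of Ns between consecutive scaffolds, as stated above


def scaffold_offsets(scaffolds: List[Tuple[str, int]],
                     gap: int = GAP) -> Dict[str, int]:
    """Map each scaffold name to its start position on the chromosome."""
    offsets = {}
    pos = 0
    for name, length in scaffolds:
        offsets[name] = pos
        pos += length + gap   # skip the scaffold and the run of Ns after it
    return offsets


def lift(name: str, start: int, end: int,
         offsets: Dict[str, int]) -> Tuple[int, int]:
    """Convert scaffold coordinates to chromosome coordinates."""
    off = offsets[name]
    return start + off, end + off
```

Because each alignment is computed within one scaffold before lifting, no alignment block can span the Ns that separate two scaffolds.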

\

\ The program chainNet was used to place the chains one at a time, \ trimming them as necessary to fit into sections not already covered by a \ higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that \ chain. The program netSyntenic was used to fill in information about the \ relationship between higher- and lower-level chains, such as whether a \ lower-level chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
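
As a rough illustration of the kind of statistic netClass adds, the fraction of a region made up of Ns can be computed directly from the sequence. The function name `n_fraction` is hypothetical, and this is not the netClass implementation, which also reports transposon content.

```python
def n_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are N (either case)."""
    if not seq:
        return 0.0
    n_count = sum(1 for base in seq if base in "Nn")
    return n_count / len(seq)
```

A gap whose sequence is mostly Ns in one species is more likely to reflect an unsequenced region than a true deletion.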

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California,\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb danRer3\ chainDanRer2 $o_Organism Chain chain danRer2 $o_Organism ($o_date/$o_db) Chained Alignments 0 160.3 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. \ \ $matrix\ \ Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb danRer2\ netDanRer2 $o_Organism Net netAlign danRer2 chainDanRer2 $o_Organism ($o_date/$o_db) Alignment Net 0 160.4 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California,\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb danRer2\ ancientR Ancient Repeats bed 12 . Human/Mouse Ancient Repeats 0 163 0 0 0 127 127 127 1 0 0

Display

\

This track displays alignments of the current mouse assembly (phusion.3)\ against regions of the human genome contained in ancient copies of\ transposable elements. In this case "ancient" means that RepeatMasker's\ annotation indicates that the copy was fixed as an interspersed repeat in\ a common ancestor of human and mouse. These regions are of interest\ because they are more likely than any other region not to have been under\ functional constraint.\ Each block in the alignment is displayed as a colored block on the track\ with a line connecting all the blocks. The color of each alignment\ indicates the percent identity of aligned residues over all blocks of the\ alignment. Identity of 50% and below is lightly colored, and the color gets\ linearly darker as the percent identity approaches 100%.\ In the alignments, lower case letters indicate that RepeatMasker annotated\ them as an interspersed repeat. Because of the high substitution rate in\ the mouse lineage, the element was often recognized only in the human\ genome. The original alignments are often much longer, but only the region\ within the repeat is displayed.\ \
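
The shading rule can be sketched as a linear map from percent identity to a gray level. The gray values 200 (lightest) and 0 (darkest) and the function name `identity_shade` are assumptions made for illustration; the actual colors used by the browser are not given in this description.

```python
def identity_shade(percent_identity: float,
                   lightest: int = 200, darkest: int = 0) -> int:
    """Map percent identity to a gray level; lower values are darker."""
    # Everything at or below 50% gets the lightest shade.
    clamped = min(max(percent_identity, 50.0), 100.0)
    fraction = (clamped - 50.0) / 50.0   # 0.0 at 50%, 1.0 at 100%
    return round(lightest + fraction * (darkest - lightest))
```

Under these assumed endpoints, an alignment at 75% identity lands halfway between the lightest and darkest shades.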

Methods

\

The sequences were aligned with blastz (discontiguous exact seeds,\ ungapped extension, local alignments via dynamic programming) and\ postprocessed for single coverage.\ \

Data

\

Human sequence from:

\ \ http://genome-test.cse.ucsc.edu/gs.8/oo.33/chromFa.zip\

Mouse sequence from phusion.3:

\ \ ftp://ftp.ncbi.nlm.nih.gov/pub/TraceDB/mus_musculus/ClipReads/Assemblies/Sanger_Oct15/\

Repeats from:

\ \ http://genome.ucsc.edu/goldenPath/06aug2001/database/
\ (chrN_rmsk.txt.gz for chromosome N)\
\

Credits

\ Alignments contributed by Scott Schwartz. See \ http://bio.cse.psu.edu/genome/hummus/2001-12-16/aar/README.\ x 1 chainOryLat1 $o_Organism Chain chain oryLat1 $o_Organism ($o_date/$o_db) Chained Alignments 0 165.49 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.
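
As a small illustration of the naming convention above, the following Python sketch builds a label from the chromosome, strand, and start position. The exact layout (for example `chr7 + 12345k`) and the function name are assumptions made for this example.

```python
def chain_label(chrom, strand, start):
    """Build a feature name: chromosome, strand, location in thousands."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if start < 0:
        raise ValueError("start must be non-negative")
    # Integer division drops the last three digits of the position.
    return f"{chrom} {strand} {start // 1000}k"
```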

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz using dynamic masking, and the transposons were then \ added back in. The resulting alignments were converted into psl format \ using the lavToPsl program. The psl alignments were fed into axtChain, which \ organizes all alignments between a single $o_organism chromosome and a \ single $organism chromosome into a group and creates a kd-tree out of the \ gapless subsections (blocks) of the alignments. A dynamic program was then \ run over the kd-trees to find the maximally scoring chains of these blocks.\ \ $matrix\ \ Chains scoring below a threshold of 2000 were discarded; the remaining\ chains are displayed in this track.
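
The chaining step can be illustrated with a much simplified Python sketch. It is not the axtChain implementation: it compares all pairs of blocks in quadratic time instead of using a kd-tree, returns only the single best chain, and takes the gap penalty as a caller-supplied function. The block layout, the function names, and the example gap cost are assumptions made for illustration.

```python
def best_chain(blocks, gap_cost):
    """Return (score, chain) for the highest-scoring chain of blocks.

    blocks: list of (t_start, q_start, length, score) gapless blocks.
    gap_cost: function (t_gap, q_gap) -> penalty for joining two blocks.
    """
    if not blocks:
        return 0, []
    order = sorted(blocks)            # by target start, then query start
    best = [b[3] for b in order]      # best chain score ending at each block
    prev = [None] * len(order)        # back-pointers for the traceback
    for j, (tj, qj, _lj, sj) in enumerate(order):
        for i in range(j):
            ti, qi, li, _si = order[i]
            t_gap = tj - (ti + li)
            q_gap = qj - (qi + li)
            if t_gap < 0 or q_gap < 0:
                continue              # block i does not precede block j
            candidate = best[i] + sj - gap_cost(t_gap, q_gap)
            if candidate > best[j]:
                best[j], prev[j] = candidate, i
    end = max(range(len(order)), key=best.__getitem__)
    chain = []
    k = end
    while k is not None:
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain


def example_gap_cost(t_gap, q_gap):
    """Hypothetical penalty, used only for this example."""
    return 100 + t_gap + q_gap


example_blocks = [(0, 0, 100, 1500), (150, 160, 100, 1200), (120, 50, 30, 300)]
score, chain = best_chain(example_blocks, example_gap_cost)
keep = score >= 2000  # the threshold named in the text
```

In this example the first and second blocks are joined across a gap of 50 bases in the target and 60 in the query. The third block cannot follow the first because it overlaps it in the query coordinate.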

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput. 2002;:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb oryLat1\ netOryLat1 $o_Organism Net netAlign oryLat1 chainOryLat1 $o_Organism ($o_date/$o_db) Alignment Net 0 165.5 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\
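
The four categories above can be expressed as a small decision function. This Python sketch is illustrative only: the argument names are hypothetical, and the real netSyntenic program may apply additional criteria.

```python
def classify_fill(level, chrom, strand, parent_chrom=None, parent_strand=None):
    """Classify a net item as Top, Syn, Inv, or NonSyn.

    level: 1 for top-level chains, 2 or more for chains filling a gap.
    chrom, strand: where the item matches in the other genome.
    parent_chrom, parent_strand: the same, for the gap in the level above.
    """
    if level == 1:
        return "Top"
    if chrom != parent_chrom:
        return "NonSyn"   # match lies on a different chromosome
    if strand != parent_strand:
        return "Inv"      # same chromosome, opposite orientation
    return "Syn"          # same chromosome, same orientation
```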

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
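
The placement step can be sketched in Python as a greedy procedure: chains are visited from highest to lowest score, and each is trimmed to the parts of the genome not yet covered. This is a simplified illustration, not the chainNet implementation. It treats every chain as one solid interval, so it models neither gaps inside chains nor the resulting level hierarchy, and all names are assumptions.

```python
def subtract(interval, covered):
    """Return the parts of (start, end) not overlapped by covered intervals.

    covered must contain non-overlapping (start, end) intervals.
    """
    start, end = interval
    pieces = []
    for c_start, c_end in sorted(covered):
        if c_end <= start or c_start >= end:
            continue                      # no overlap with what remains
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def place_chains(chains):
    """Place (score, start, end) chains, highest score first.

    Returns a list of (score, pieces) for every chain that still
    covers some previously uncovered sequence after trimming.
    """
    covered = []
    placed = []
    for score, start, end in sorted(chains, reverse=True):
        pieces = subtract((start, end), covered)
        if pieces:
            placed.append((score, pieces))
            covered.extend(pieces)
    return placed
```

For example, given chains scoring 5000 over 0-100, 3000 over 50-150, and 1000 over 60-90, the second chain is trimmed to 100-150 and the third is dropped because it is fully covered.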

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA. 2003;100(20):11484-11489.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ compGeno 0 otherDb oryLat1\ chainGasAcu1 $o_Organism Chain chain gasAcu1 $o_Organism ($o_date/$o_db) Chained Alignments 0 165.51 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \
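
To illustrate why the choice of gap scoring matters, the Python sketch below contrasts a traditional affine gap cost with a cost that grows more slowly for long gaps and accepts a gap in both sequences at once. Both functions and all of their constants are hypothetical; they are not the penalties used by axtChain.

```python
import math


def affine_gap_cost(gap_length, open_cost=400, extend_cost=30):
    """Traditional affine cost: a fixed opening cost plus a cost per base."""
    if gap_length < 0:
        raise ValueError("gap length must be non-negative")
    if gap_length == 0:
        return 0
    return open_cost + extend_cost * gap_length


def relaxed_gap_cost(t_gap, q_gap, open_cost=400, scale=300.0):
    """Hypothetical cost that grows logarithmically with total gap size.

    t_gap and q_gap are the gap sizes in the two genomes; both may be
    non-zero at once, which corresponds to a "double-sided" gap.
    """
    if t_gap < 0 or q_gap < 0:
        raise ValueError("gap sizes must be non-negative")
    if t_gap == 0 and q_gap == 0:
        return 0.0
    return open_cost + scale * math.log1p(t_gap + q_gap)
```

With these example constants, a 10,000-base gap costs 300,400 under the affine scheme but only about 3,163 under the relaxed one, so blocks separated by long gaps can still be joined into one chain.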

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were added back in.\ The resulting alignments were converted into psl format using the lavToPsl\ program. The psl alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks.\ \ $matrix\ \ Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 1 otherDb gasAcu1\ netGasAcu1 $o_Organism Net netAlign gasAcu1 chainGasAcu1 $o_Organism ($o_date/$o_db) Alignment Net 0 165.52 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
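
One of the quantities mentioned above, the share of a region made up of Ns, is simple to compute. The Python sketch below shows the idea for a single sequence held in memory; the function name and interface are assumptions and do not reflect how netClass is implemented.

```python
def n_fraction(sequence, start, end):
    """Fraction of bases in sequence[start:end] that are N (sequencing gap)."""
    if not 0 <= start < end <= len(sequence):
        raise ValueError("region must be non-empty and inside the sequence")
    region = sequence[start:end]
    n_count = sum(1 for base in region if base in "Nn")
    return n_count / len(region)
```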

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 0 otherDb gasAcu1\ chainAnoCar1 $o_Organism Chain chain anoCar1 $o_Organism ($o_date/$o_db) Chained Alignments 0 167 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz using dynamic masking, and the transposons were then \ added back in. The resulting alignments were converted into psl format \ using the lavToPsl program. The psl alignments were fed into axtChain, which \ organizes all alignments between a single $o_organism chromosome and a \ single $organism chromosome into a group and creates a kd-tree out of the \ gapless subsections (blocks) of the alignments. A dynamic program was then \ run over the kd-trees to find the maximally scoring chains of these blocks.\ \ $matrix\ \ Chains scoring below a threshold of 2000 were discarded; the remaining\ chains are displayed in this track.
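
The substitution matrix is stored with the track as a flat list of 16 values plus a header naming the bases (the `matrix` and `matrixHeader` settings). The Python sketch below turns that list into a lookup table and scores one gapless block. The row-major reading of the list and the function names are assumptions; because the matrix is symmetric, the reading order does not change the result.

```python
def parse_matrix(values, header):
    """Build a {(base_a, base_b): score} table from the flat settings."""
    bases = [b.strip() for b in header.split(",")]
    numbers = [int(v) for v in values.split(",")]
    n = len(bases)
    if len(numbers) != n * n:
        raise ValueError("matrix size does not match header")
    return {
        (bases[row], bases[col]): numbers[row * n + col]
        for row in range(n)
        for col in range(n)
    }


def score_block(seq_a, seq_b, matrix):
    """Sum substitution scores over one gapless aligned block."""
    if len(seq_a) != len(seq_b):
        raise ValueError("a gapless block needs equal-length sequences")
    return sum(matrix[(a, b)] for a, b in zip(seq_a.upper(), seq_b.upper()))


MATRIX = parse_matrix(
    "91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91",
    "A, C, G, T",
)
```

With this table, `score_block("ACGT", "ACGA", MATRIX)` adds 91, 100, and 100 for the three matches and -100 for the T/A mismatch.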

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Repeat areas were marked in the genome with WindowMasker, as\ developed by Morgulis A, Gertz EM, Schäffer AA, Agarwala R.\

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\

\ Morgulis A, Gertz EM, Schäffer AA, Agarwala R.\ WindowMasker: window-based masker for sequenced genomes. \ Bioinformatics 2006 Jan 15;22(2):134-41. Epub 2005 Nov 15.

\ \ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb anoCar1\ netAnoCar1 $o_Organism Net netAlign anoCar1 chainAnoCar1 $o_Organism ($o_date/$o_db) Alignment Net 0 167.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA. 2003;100(20):11484-11489.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ compGeno 0 otherDb anoCar1\ chainXenTro2 $o_db Chain chain xenTro2 $o_Organism ($o_date/$o_db) Chained Alignments 0 170 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the scaffold, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ The genomes of $o_organism and $organism were aligned with blastz.\ The resulting alignments were converted into psl format using the lavToPsl\ program. The psl alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism scaffold and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. $matrix Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.
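
The grouping step, in which alignments are collected per scaffold/chromosome pair before chaining, might look like the following Python sketch. The dictionary keys `q_name` and `t_name` are hypothetical field names chosen for this example.

```python
from collections import defaultdict


def group_by_sequence_pair(alignments):
    """Group alignments by (query scaffold, target chromosome).

    alignments: iterable of dicts with at least 'q_name' and 't_name'.
    Returns a dict mapping each pair to its list of alignments, so that
    every group can then be chained independently.
    """
    groups = defaultdict(list)
    for aln in alignments:
        groups[(aln["q_name"], aln["t_name"])].append(aln)
    return dict(groups)
```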

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb xenTro2\ encodeRegionsLiftOver liftOver Regions bed 4 . liftOver ENCODE Region Orthologs (Freeze 3) 0 170.1 0 200 0 127 227 127 0 0 0 encode 1 netXenTro2 $o_db Net netAlign xenTro2 chainXenTro2 $o_Organism ($o_date/$o_db) Alignment Net 0 170.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 otherDb xenTro2\ encodeRegionsMercator Mercator Regions bed 4 . Mercator ENCODE Region Orthologs (Freeze 3) 0 170.2 0 0 200 127 127 227 0 0 0 encode 1 chainXenTro1 $o_Organism Chain chain xenTro1 $o_Organism ($o_date/$o_db) Chained Alignments 0 170.2 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_Organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_Organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_Organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the scaffold, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the \ $o_Organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_Organism scaffold and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb xenTro1\ encodeRegionsMercatorMerged Mercator Regions bed 4 . Merged Mercator ENCODE Region Orthologs (Freeze 3) 0 170.3 0 0 200 127 127 227 0 0 0 encode 1 netXenTro1 $o_Organism Net netAlign xenTro1 chainXenTro1 $o_Organism ($o_date/$o_db) Alignment Net 1 170.3 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is \ from the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\
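The four categories above can be read as a small decision rule. The following Python sketch is illustrative only: the function name classify_fill and its parameters are hypothetical and are not taken from netSyntenic. It assumes that "chromosome" means the chromosome of the aligned sequence in the other species, and that the "parent" is the match in the level above whose gap is being filled.

```python
from typing import Optional


def classify_fill(level: int,
                  chrom: str,
                  strand: str,
                  parent_chrom: Optional[str] = None,
                  parent_strand: Optional[str] = None) -> str:
    """Return 'top', 'syn', 'inv' or 'nonSyn' for one item in a net.

    Hypothetical helper: mirrors the four categories in the list above,
    not the actual netSyntenic implementation.
    """
    if level == 1:
        return "top"          # best, longest match; nothing above it
    if chrom != parent_chrom:
        return "nonSyn"       # matches a different chromosome
    if strand != parent_strand:
        return "inv"          # same chromosome, opposite orientation
    return "syn"              # same chromosome, same orientation
```

For example, under these assumptions a level 2 item on chr4 minus strand that fills a gap in a chr4 plus-strand match would be classified as "inv".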

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
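The placement step described above can be sketched as a greedy loop. This is a simplified Python illustration, not the chainNet source: the names subtract and place_chains are hypothetical, each chain is reduced to a score plus a list of aligned blocks on one chromosome, and the level hierarchy is not recorded.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]                 # half-open [start, end)
Chain = Tuple[str, int, List[Interval]]    # (name, score, aligned blocks)


def subtract(block: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of block not overlapped by any covered interval."""
    pieces = [block]
    for c_start, c_end in covered:
        next_pieces = []
        for start, end in pieces:
            if c_end <= start or c_start >= end:   # no overlap, keep whole
                next_pieces.append((start, end))
                continue
            if start < c_start:                    # keep the left remainder
                next_pieces.append((start, c_start))
            if c_end < end:                        # keep the right remainder
                next_pieces.append((c_end, end))
        pieces = next_pieces
    return pieces


def place_chains(chains: List[Chain]) -> Dict[str, List[Interval]]:
    """Place chains from highest to lowest score, trimming each one to
    the sections not already covered by a higher-scoring chain."""
    covered: List[Interval] = []
    placed: Dict[str, List[Interval]] = {}
    for name, _score, blocks in sorted(chains, key=lambda c: c[1], reverse=True):
        kept: List[Interval] = []
        for block in blocks:
            kept.extend(subtract(block, covered))
        if kept:                                   # fully covered chains are dropped
            placed[name] = kept
            covered.extend(kept)
    return placed


example = [
    ("A", 1000, [(0, 100), (200, 300)]),   # top chain with a gap at 100-200
    ("B", 500, [(90, 210)]),               # trimmed to fit A's gap
    ("C", 100, [(50, 80)]),                # entirely covered by A
]
result = place_chains(example)
```

In this toy input, chain B survives only where it fills the gap in chain A, which is the situation that gives rise to the level 2 items in the display.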

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 otherDb xenTro1\ encodeRegionsConsensus ENCODE Region Consensus bed 4 . Consensus Orthology of ENCODE Regions from LiftOver and Mercator (Freeze 3) 0 170.4 150 100 30 202 177 142 0 0 0 encode 1 encodeRegions2 ENCODE Region Consensus (Freeze 2) bed 4 . Consensus Orthology of ENCODE Regions from liftOver and Mercator (Freeze 2) 3 170.5 200 0 0 227 127 127 0 0 0 encode 1 ratNet Rat Net netAlign rn2 ratChain Rat/$Organism Alignment Net 0 171 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best rat/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. \ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
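As a rough illustration of the kind of per-region statistic that the last step reports, the Python sketch below measures what fraction of a region consists of Ns. The function name n_fraction is hypothetical, and this is not the netClass implementation, which also accounts for repeat annotations.

```python
def n_fraction(sequence: str, start: int, end: int) -> float:
    """Fraction of bases in sequence[start:end] that are N (sequencing gap).

    Hypothetical helper for illustration; coordinates are 0-based, half-open.
    """
    region = sequence[start:end]
    if not region:
        return 0.0
    n_count = sum(1 for base in region if base in "Nn")
    return n_count / len(region)


half_gap = n_fraction("ACGTNNNNAC", 2, 10)   # region is "GTNNNNAC"
```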

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ \ compGeno 0 otherDb rn2\ chainGalGal2 $o_Organism Chain chain galGal2 $o_Organism ($o_date/$o_db) Chained Alignments 0 180 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.
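The single-line versus double-line distinction can be expressed as a simple rule on the two gap sizes between consecutive aligned boxes. The Python sketch below is an illustration under stated assumptions: the function name gap_style is hypothetical, any nonzero gap on the other species' side is treated as "substantial", and a gap with no extent on the displayed genome is assumed to draw nothing.

```python
def gap_style(target_gap: int, query_gap: int) -> str:
    """Pick a connector style for the gap between two aligned boxes.

    target_gap: unaligned bases in the displayed genome between the boxes.
    query_gap:  unaligned bases in the other species between the boxes.
    Hypothetical helper; the browser's own thresholds may differ.
    """
    if target_gap <= 0:
        return "none"      # boxes are adjacent on the displayed genome
    if query_gap > 0:
        return "double"    # sequence present on both sides of the gap
    return "single"        # extra sequence only in the displayed genome
```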

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into psl format using the lavToPsl\ program. The psl alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. $matrix Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.
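The chaining step can be illustrated with a small dynamic program. The Python sketch below is not axtChain: it replaces the kd-tree with an all-pairs scan (quadratic time, adequate only for small inputs), and the gap_cost function is a placeholder rather than the gap scoring system described above. The names Block, gap_cost and best_chain are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """One gapless alignment block (0-based, half-open coordinates)."""
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    score: int


def gap_cost(prev: Block, nxt: Block) -> int:
    """Placeholder penalty: one point per gap base on either side."""
    return (nxt.t_start - prev.t_end) + (nxt.q_start - prev.q_end)


def best_chain(blocks: List[Block]) -> Tuple[int, List[Block]]:
    """Return the maximally scoring chain of blocks that are in the
    same order, without overlap, in both sequences."""
    if not blocks:
        return 0, []
    ordered = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in ordered]      # best chain score ending at each block
    back = [-1] * len(ordered)             # predecessor index, -1 for none
    for j, cur in enumerate(ordered):
        for i in range(j):
            prev = ordered[i]
            if prev.t_end <= cur.t_start and prev.q_end <= cur.q_start:
                candidate = best[i] + cur.score - gap_cost(prev, cur)
                if candidate > best[j]:
                    best[j] = candidate
                    back[j] = i
    end = max(range(len(ordered)), key=best.__getitem__)
    chain: List[Block] = []
    k = end
    while k != -1:                         # walk predecessors back to the start
        chain.append(ordered[k])
        k = back[k]
    chain.reverse()
    return best[end], chain


a = Block(0, 100, 0, 100, 100)
b = Block(110, 200, 105, 195, 90)
c = Block(50, 150, 300, 400, 80)           # out of order relative to a and b
total, chain = best_chain([a, b, c])
```

Here blocks a and b are collinear and are joined into one chain, while c cannot be combined with either and would form its own lower-scoring chain.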

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb galGal2\ netGalGal2 $o_Organism Net netAlign galGal2 chainGalGal2 $o_Organism ($o_date/$o_db) Alignment Net 0 180.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 otherDb galGal2\ chainGalGal3 $o_db Chain chain galGal3 $o_Organism ($o_date/$o_db) Chained Alignments 0 181 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.
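A label of this kind could be built as follows. This Python sketch is an assumption about layout only: the function name feature_label is hypothetical, and the exact spacing and rounding used by the browser are not stated in this description.

```python
def feature_label(chrom: str, strand: str, start: int) -> str:
    """Build a label from chromosome, strand, and start position,
    with the position expressed in thousands of bases.

    Hypothetical helper; the spacing and the use of integer division
    are assumptions, not the browser's documented behavior.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return f"{chrom} {strand} {start // 1000}k"


label = feature_label("chr7", "+", 27135400)
```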

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into psl format using the lavToPsl\ program. The psl alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. \ \ $matrix \ \ Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.
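The $matrix placeholder is filled from the "matrix" and "matrixHeader" settings listed for this track. The Python sketch below shows one way to turn those two settings into a lookup table. The function name parse_matrix is hypothetical, and a row-by-row ordering of the values is assumed; because the matrix used here is symmetric, the result is the same either way.

```python
from typing import Dict


def parse_matrix(matrix_setting: str, header_setting: str) -> Dict[str, Dict[str, int]]:
    """Convert 'matrix' and 'matrixHeader' setting values into a nested
    dictionary indexed as scores[base_1][base_2].

    Hypothetical helper; assumes the values are listed row by row.
    """
    size_text, values_text = matrix_setting.split(None, 1)
    values = [int(v) for v in values_text.split(",")]
    bases = [b.strip() for b in header_setting.split(",")]
    if int(size_text) != len(values) or len(bases) ** 2 != len(values):
        raise ValueError("matrix size does not match header")
    n = len(bases)
    return {
        row: {col: values[i * n + j] for j, col in enumerate(bases)}
        for i, row in enumerate(bases)
    }


scores = parse_matrix(
    "16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91",
    "A, C, G, T",
)
```

With the values above, matches score 91 (A or T) or 100 (C or G), and the transitions A/G and C/T are penalized less (-25) than the transversions.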

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput. 2002;:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb galGal3\ netGalGal3 $o_db Net netAlign galGal3 chainGalGal3 $o_Organism ($o_date/$o_db) Alignment Net 0 181.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ compGeno 0 otherDb galGal3\ chainOrnAna1 $o_Organism Chain chain ornAna1 $o_Organism ($o_date/$o_db) Chained Alignments 0 185 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into psl format using the lavToPsl\ program. The psl alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. $matrix Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb ornAna1\ netOrnAna1 $o_Organism Net netAlign ornAna1 chainOrnAna1 $o_Organism ($o_date/$o_db) Alignment Net 0 185.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 otherDb ornAna1\ chainMonDom1 $o_Organism Chain chain monDom1 $o_Organism ($o_date/$o_db) Chained Alignments 0 190 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the scaffold, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism scaffold and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb monDom1\ chainMonDom2 $o_Organism Chain chain monDom2 $o_Organism ($o_date/$o_db) Chained Alignments 0 190 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the scaffold, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism scaffold and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb monDom2\ chainMonDom4 $o_Organism Chain chain monDom4 $o_Organism ($o_date/$o_db) Chained Alignments 0 190 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the scaffold, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ The genomes of $o_organism and $organism were aligned with blastz.\ The resulting alignments were converted into psl format using the lavToPsl\ program. The psl alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism scaffold and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. $matrix Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb monDom4\ netMonDom1 $o_Organism Net netAlign monDom1 chainMonDom1 $o_Organism ($o_date/$o_db) Alignment Net 0 190.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\
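
The four categories above can be read as a simple decision rule. The sketch below is one interpretation of the list, not the netSyntenic code; the function name and its arguments are hypothetical.

```python
from typing import Optional


def classify_fill(level: int,
                  chrom: str,
                  strand: str,
                  parent_chrom: Optional[str] = None,
                  parent_strand: Optional[str] = None) -> str:
    """Classify a net item relative to the gap it fills in the level above."""
    if level == 1:
        return "Top"        # best, longest match; has no parent gap
    if chrom != parent_chrom:
        return "NonSyn"     # matches a different chromosome than the gap above
    if strand != parent_strand:
        return "Inv"        # same chromosome, opposite orientation
    return "Syn"            # same chromosome, same orientation
```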

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
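
The placement step can be sketched as a greedy loop over chains sorted by score. This is a simplified illustration, not the chainNet program: chains are reduced to lists of reference intervals, level assignment is omitted, and the names `subtract` and `place_chains` are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]                    # half-open [start, end) on the reference
Chain = Tuple[int, List[Interval]]            # (score, aligned blocks)


def subtract(interval: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of `interval` not overlapped by any covered interval."""
    pieces = [interval]
    for cs, ce in covered:
        remaining = []
        for s, e in pieces:
            if ce <= s or cs >= e:
                remaining.append((s, e))      # no overlap, keep whole piece
            else:
                if s < cs:
                    remaining.append((s, cs)) # part left of the covered region
                if ce < e:
                    remaining.append((ce, e)) # part right of the covered region
        pieces = remaining
    return pieces


def place_chains(chains: List[Chain]) -> List[Chain]:
    """Place chains from highest to lowest score, trimming each to uncovered bases."""
    covered: List[Interval] = []
    placed: List[Chain] = []
    for score, blocks in sorted(chains, key=lambda c: c[0], reverse=True):
        kept: List[Interval] = []
        for block in blocks:
            kept.extend(subtract(block, covered))
        if kept:                              # chains trimmed to nothing are dropped
            placed.append((score, kept))
            covered.extend(kept)
    return placed
```

Because only aligned blocks count as covered, a lower-scoring chain can land inside a gap of a higher-scoring chain, which is where the hierarchy described above comes from.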

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb monDom1\ netMonDom2 $o_Organism Net netAlign monDom2 chainMonDom2 $o_Organism ($o_date/$o_db) Alignment Net 0 190.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb monDom2\ netMonDom4 $o_Organism Net netAlign monDom4 chainMonDom4 $o_Organism ($o_date/$o_db) Alignment Net 0 190.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb monDom4\ blastSacCer1SG Yeast Proteins psl protein Yeast Proteins from SGD Mapped by Chained tBLASTn 0 200 0 0 0 127 127 127 0 0 0 http://db.yeastgenome.org/cgi-bin/SGD/locus.pl?locus= genes 1 blastRef sacCer1.blastSGRef00\ colorChromDefault off\ pred sacCer1.blastSGPep00\ chainBorEut13 $o_Organism Chain chain borEut13 $o_Organism ($o_date/$o_db) Chained Alignments 0 200 100 50 0 255 240 200 1 0 0 compGeno 1 otherDb borEut13\ chainCanHg12 $o_Organism Chain chain canHg12 $o_Organism ($o_date/$o_db) Chained Alignments 0 200 100 50 0 255 240 200 1 0 0 compGeno 1 otherDb canHg12\ chainRn4 $o_Organism Chain chain rn4 $o_Organism ($o_date/$o_db) Chained Alignments 0 200 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps than \ traditional affine gap scoring systems. It can also tolerate gaps in both \ $o_organism and $organism simultaneously. These "double-sided"\ gaps can be caused by local inversions and overlapping deletions\ in both species.

\

\ The chain track displays boxes joined together by either single or \ double lines. The boxes represent aligning regions. \ Single lines indicate gaps that are largely due to a deletion in the \ $o_organism assembly or an insertion in the $organism assembly.\ Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one \ species. In cases where multiple chains align over a particular region of \ the $organism genome, the chains with single-lined gaps are often due to \ processed pseudogenes, while chains with double-lined gaps are more often \ due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and \ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all \ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program \ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. \ \ $matrix \ \ Chains scoring below a threshold were discarded; the remaining \ chains are displayed in this track.
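
The `$matrix` placeholder above is presumably filled from the `matrix` and `matrixHeader` settings stored with this track. As a sketch of how those settings could be read, the helper below turns the setting strings into a nested lookup table. The function name and the row-major layout are assumptions; the values are the ones listed in this track's settings.

```python
from typing import Dict


def parse_matrix(setting: str, header: str) -> Dict[str, Dict[str, int]]:
    """Parse a 'matrix' setting ("<count> v1,v2,...") and its 'matrixHeader'."""
    count_text, values_text = setting.split(None, 1)
    count = int(count_text)
    values = [int(v) for v in values_text.split(",")]
    labels = [h.strip() for h in header.split(",")]
    n = len(labels)
    if len(values) != count or count != n * n:
        raise ValueError("matrix size does not match its header")
    # Assumed row-major order: row i holds the scores for labels[i] against each column.
    return {
        row: {col: values[i * n + j] for j, col in enumerate(labels)}
        for i, row in enumerate(labels)
    }


scores = parse_matrix(
    "16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91",
    "A, C, G, T",
)
```

The matrix is symmetric, so `scores["A"]["C"]` and `scores["C"]["A"]` agree regardless of whether the values are stored by row or by column.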

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his\ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California\ at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.\

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ compGeno 1 matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ otherDb rn4\ chainRn3 $o_Organism Chain chain rn3 $o_Organism ($o_date/$o_db) Chained Alignments 0 200 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. \ \ $matrix\ \ Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb rn3\ blastDm2FB D. mel. Proteins (dm2) psl protein D. melanogaster Proteins (dm2) Mapped by Chained tBLASTn 1 200 0 0 0 127 127 127 0 0 0 http://flybase.bio.indiana.edu/.bin/fbidq.html?

Description

\

\ This track contains tBLASTn alignments of the peptides\ from the predicted and known genes identified in the D. melanogaster\ FlyBase as of 25 June 2005 to the $organism sequence.\ \

Methods

\

\ First, predicted proteins from the D. melanogaster FlyBase track were\ aligned with the D. melanogaster genome using the blat program to \ discover exon boundaries. \ Next, the amino acid sequences that make up each exon were aligned with the \ $organism sequence using the tBLASTn program.\ Finally, the putative $organism exons were chained together using an \ organism-specific maximum gap size but no gap penalty. The single best exon \ chains extending over more than 60% of the query protein were included. Exon \ chains that extended over 60% of the query and matched at least 60% of the \ protein's amino acids were also included.\ \
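
The inclusion rule in the last two sentences can be sketched as a filter. This is one reading of the text, not the pipeline's code: "best" is assumed to mean the highest chain score per query protein, and the `ExonChain` fields and the function name are hypothetical.

```python
from typing import Dict, List, NamedTuple


class ExonChain(NamedTuple):
    query: str        # query protein identifier
    score: float      # chain score (assumed ranking criterion)
    coverage: float   # fraction of the query protein spanned by the chain
    matched: float    # fraction of the protein's amino acids that match


def select_chains(chains: List[ExonChain],
                  min_coverage: float = 0.6,
                  min_matched: float = 0.6) -> List[ExonChain]:
    """Keep the best chain per query plus any other chain passing both cutoffs."""
    best: Dict[str, ExonChain] = {}
    for chain in chains:
        if chain.coverage > min_coverage:
            current = best.get(chain.query)
            if current is None or chain.score > current.score:
                best[chain.query] = chain
    kept = []
    for chain in chains:
        is_best = best.get(chain.query) is chain
        passes_both = chain.coverage > min_coverage and chain.matched >= min_matched
        if is_best or passes_both:
            kept.append(chain)
    return kept
```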

Credits

\

\ tBLASTn is part of the NCBI Blast tool set. For more information on Blast, see\ Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. \ Basic local alignment search tool. \ J Mol Biol. 1990 Oct 5;215(3):403-410.

\

\ Blat was written by Jim Kent. The remaining utilities \ required to produce this track were written by Jim Kent or Brian Raney.\ genes 1 blastRef dm2.blastFBRef01\ colorChromDefault off\ pred dm2.blastFBPep01\ netRn4 $o_Organism Net netAlign rn4 chainRn4 $o_Organism ($o_date/$o_db) Alignment Net 0 200.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his\ program RepeatMasker.\

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 otherDb rn4\ netRn3 $o_Organism Net netAlign rn3 chainRn3 $o_Organism ($o_date/$o_db) Alignment Net 0 200.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his\ program RepeatMasker.\

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 otherDb rn3\ cgapSage CGAP SAGE bed 8 + CGAP Long SAGE 0 200.1 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays genomic mappings for $organism LongSAGE tags from The Cancer Genome Anatomy\ Project. SAGE (Serial Analysis of Gene Expression) [Velculescu 1995] is a\ quantitative technique for measuring gene expression. For a brief overview\ of SAGE, see the CGAP SAGE\ information page.\

\ \

Display Conventions and Configuration

\

\ Genomic mappings of 17-base LongSAGE tags are displayed. Tag counts are\ normalized to tags per million (TPM) in each library. Tags with higher TPM are\ more darkly shaded. The CATG restriction site before the start of the tag\ is rendered as a thick line; the 17 bases of the tag are drawn as a thinner\ line. Thus the thin end points in the direction of transcription.\ \ The track display modes are:\

    \
  • dense - Draws locations of mapped tags on a single line.\
  • squish - Draws one item per tag per tissue without labels.\
  • pack - Draws one item per tag per tissue with labels. The label\ includes the number of libraries containing the tag.\ Clicking on an item lists the libraries containing the tag, with the libraries\ from the particular tissue in bold. Clicking on a library in the lists\ displays detailed information about that library.\
  • full - Draws one item per tag per library.\ Clicking on an item displays information about the library, along with other\ libraries containing the tag.\
\
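
The tags-per-million normalization mentioned above is a simple rescaling of each tag count by the size of its library. A minimal sketch, with a hypothetical function name and example numbers:

```python
def tags_per_million(tag_count: int, library_total: int) -> float:
    """Scale a raw tag count to tags per million (TPM) within one library."""
    if library_total <= 0:
        raise ValueError("library_total must be positive")
    return tag_count * 1_000_000 / library_total


# A tag seen 25 times in a library of 50,000 tags has a TPM of 500.
example_tpm = tags_per_million(25, 50_000)
```

Normalizing per library makes counts comparable between libraries sequenced to different depths.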

\

\ The track can be configured to display only tags from a selected tissue or a\ selected library.\

\ \

Methods

\

\ Tag and library data, along with genomic mappings, were obtained\ from The Cancer Genome Anatomy Project.\

\

\ Information about the various SAGE libraries, data downloads and other tools\ for exploring and analyzing this data is available from the\ CGAP SAGE Genie web site.\

\ \ \

Human Embryonic Stem Cell library construction

\

\ Detailed information regarding the human ESC lines used in this study can be\ found at http://stemcells.nih.gov and in Hirst et al. 2007.\ The ESC tags were generated from RNA purified from human ESCs cultured under\ conditions that keep them in an undifferentiated state.\

\ \

\ A complete set of embryonic stem cell LongSAGE tags is available through the\ BCCA-GSC embryonic stem cell\ transcriptomes web site and through the CGAP web portal.\

\ \

Credits

\

\ Many thanks to Martin Hirst of Canada's Michael\ Smith Genome Sciences Centre for his assistance in developing this\ track.\

\ \

\ The LongSAGE data and genomic mappings were provided by\ The Cancer Genome Anatomy Project\ of the National Cancer Institute,\ U.S. National Institutes of Health.\

\ \

The Human Embryonic Stem Cell library was supported by funds from the\ National Cancer Institute, National Institutes of Health, under Contract\ No. N01-C0-12400 and by grants from Genome Canada, Genome British Columbia and\ the Canadian Stem Cell Network.\

\ \

References

\ \

\ Velculescu VE, Zhang L, Vogelstein B, and Kinzler KW.\ \ Serial analysis of gene expression.\ Science. 1995 Oct 20;270(5235):484-7. \

\ \

\ Hirst M, Delaney A, Rogers SA, Schnerch A, Persaud DR, O'Connor MD, Zeng T, Moksa M, Fichter K, Mah D, et al.\ Human embryonic stem cell transcriptomes are enriched in novel transcripts and those encoding RNA binding proteins.\ Genome Biology. 2007 submitted\

\ \

\ Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE.\ \ Using the transcriptome to annotate the genome.\ Nat Biotechnol. 2002 May;20(5):508-12. \

\ \

\ Siddiqui AS, Khattra J, Delaney AD, Zhao Y, Astell C, Asano J, Babakaiff R, Barber S, Beland J, Bohacec S, et al.\ \ A mouse atlas of gene expression: Large-scale digital gene-expression profiles from precisely defined developing C57BL/6J mouse tissues and cells.\ Proc Natl Acad Sci U S A. 2005 Dec 20;102(51):18485-90. Epub 2005 Dec 13.\

\ \

\ Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla N, Prabhu AL, Ma K, et al.\ \ Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines.\ Genome Res. 2007 Jan;17(1):108-16. Epub 2006 Nov 29. \

\ \

\ Lal A, Lash AE, Altschul SF, Velculescu V, Zhang L, McLendon RE, Marra MA, Prange C, Morin PJ, Polyak K, et al.\ \ A public database for gene expression in human cancers.\ Cancer Res. 1999 Nov 1;59(21):5403-7.\

\ \

\ Riggins GJ, Strausberg RL. \ \ Genome and genetic resources from the Cancer Genome Anatomy Project.\ Hum Mol Genet. 2001 Apr;10(7):663-7. Review.\

\ \

\ Boon K, Osorio E, Greenhut SF, Schaefer CF, Shoemaker J, Polyak K, Morin PJ, Buetow KH, Strausberg RL, de Souza SJ, Riggins GJ.\ \ An anatomy of normal and malignant gene expression.\ Proc Natl Acad Sci U S A. 2002 Aug 20;99(17):11287-92.\

\ \

\ Liang P.\ \ SAGE Genie: a suite with panoramic view of gene expression.\ Proc Natl Acad Sci U S A. 2002 Sep 3;99(18):11547-8.\

\ rna 1 blastzBestMm3X Mm3X Best psl xeno mm3X Mm3X Blastz Best-in-Genome Alignments 0 200.8 0 0 0 127 127 127 1 0 0 This track is created by running simpleChain on the\ results from running axtBest on the alignments from the\ standard blastz process.\ x 1 otherDb mm3X\ blastDm1FB D. mel. Proteins psl protein D. melanogaster Proteins (dm1) Mapped by Chained tBLASTn 1 201 0 0 0 127 127 127 0 0 0 http://flybase.bio.indiana.edu/.bin/fbidq.html?

Description

\

\ This track contains tBLASTn alignments of the peptides\ from the predicted and known genes identified in the D. melanogaster\ FlyBase as of 24 July 2004 to the $organism sequence.\ \

Methods

\

\ First, predicted proteins from the D. melanogaster FlyBase track were\ aligned with the D. melanogaster genome using the blat program to \ discover exon boundaries. \ Next, the amino acid sequences that make up each exon were aligned with the \ $organism sequence using the tBLASTn program.\ Finally, the putative $organism exons were chained together using an \ organism-specific maximum gap size but no gap penalty. The single best exon \ chains extending over more than 60% of the query protein were included. Exon \ chains that extended over 60% of the query and matched at least 60% of the \ protein's amino acids were also included.\ \
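The two inclusion rules at the end of the paragraph above can be sketched as a small filter. This is an illustration only: the `ExonChain` record, its field names, and the use of `score` to pick the single best chain are assumptions, not the pipeline's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class ExonChain:
    protein: str      # query protein identifier (hypothetical field)
    protein_len: int  # length of the query protein in amino acids
    q_start: int      # first covered protein position, 0-based
    q_end: int        # end of covered protein range, exclusive
    matched: int      # number of matching amino acids
    score: float      # assumed ranking score for choosing the best chain


def select_chains(chains, min_frac=0.60):
    """Keep chains following the two 60% rules described in the text."""
    # Find the best-scoring chain for each protein.
    best = {}
    for c in chains:
        if c.protein not in best or c.score > best[c.protein].score:
            best[c.protein] = c

    kept = []
    for c in chains:
        span = (c.q_end - c.q_start) / c.protein_len
        if span <= min_frac:
            continue  # every kept chain must extend over more than 60% of the query
        match = c.matched / c.protein_len
        # Rule 1: the single best chain; rule 2: any chain matching >= 60% of residues.
        if c is best[c.protein] or match >= min_frac:
            kept.append(c)
    return kept


chains = [
    ExonChain("pA", 100, 0, 70, 30, 900.0),   # best for pA, spans 70%
    ExonChain("pA", 100, 10, 75, 62, 500.0),  # spans 65%, matches 62%
    ExonChain("pA", 100, 10, 75, 50, 400.0),  # spans 65%, matches only 50%
    ExonChain("pB", 100, 0, 50, 50, 800.0),   # best for pB but spans only 50%
]
kept = select_chains(chains)
```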

Credits

\

\ tBLASTn is part of the NCBI Blast tool set. For more information on Blast, see\ Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ.\ Basic local alignment search tool.\ J Mol Biol. 1990 Oct 5;215(3):403-410.

\

\ Blat was written by Jim Kent. The remaining utilities \ required to produce this track were written by Jim Kent or Brian Raney.\ genes 1 blastRef dm1.blastFBRef00\ colorChromDefault off\ pred dm1.blastFBPep00\ blastzBestMm2X Mm2X Best psl xeno mm2X Mm2X Blastz Best-in-Genome Alignments 0 201.8 0 0 0 127 127 127 1 0 0 This track is created by running simpleChain on the\ results from running axtBest on the alignments from the\ standard blastz process.\ x 1 otherDb mm2X\ chainMm7 $o_Organism Chain chain mm7 $o_Organism ($o_date/$o_db) Chained Alignments 0 210 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.
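The single-line versus double-line distinction can be sketched from the coordinates of consecutive aligned blocks. The tuple layout and the rule "any gap on both sides counts as double" are simplifying assumptions; the text only says that double lines involve substantial sequence in both species.

```python
def classify_gaps(blocks):
    """Label the gap between each pair of consecutive blocks in a chain.

    blocks is a list of (t_start, t_end, q_start, q_end) tuples, ordered
    along the chain, with half-open coordinates on the target ($organism)
    and query ($o_organism) sequences.
    """
    gaps = []
    for prev, nxt in zip(blocks, blocks[1:]):
        t_gap = nxt[0] - prev[1]  # unaligned bases in the target
        q_gap = nxt[2] - prev[3]  # unaligned bases in the query
        kind = "double" if t_gap > 0 and q_gap > 0 else "single"
        gaps.append((t_gap, q_gap, kind))
    return gaps


# Hypothetical chain with one single-sided and one double-sided gap.
blocks = [(0, 100, 0, 100), (150, 250, 100, 200), (300, 400, 260, 360)]
gaps = classify_gaps(blocks)
```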

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into psl format using the lavToPsl\ program. The psl alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. $matrix Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.
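The chaining step can be illustrated with a deliberately simplified sketch. It replaces the kd-tree with a plain quadratic scan over candidate predecessors and uses a made-up gap cost of one point per gapped base, so it shows the dynamic-programming idea only, not axtChain's actual scoring.

```python
def best_chain(blocks, gap_cost=lambda dt, dq: dt + dq):
    """Return (score, chain) for the maximally scoring chain of blocks.

    blocks is a list of (t_start, t_end, q_start, q_end, score) tuples with
    half-open coordinates. A block may follow another only if it starts at
    or after the other's end in both target and query.
    """
    if not blocks:
        return 0, []
    order = sorted(range(len(blocks)), key=lambda i: blocks[i][0])
    best = {}  # best chain score ending at block i
    prev = {}  # predecessor of block i in that chain
    for pos, i in enumerate(order):
        ts, te, qs, qe, sc = blocks[i]
        best[i], prev[i] = sc, None
        for j in order[:pos]:
            pts, pte, pqs, pqe, _ = blocks[j]
            if pte <= ts and pqe <= qs:
                cand = best[j] + sc - gap_cost(ts - pte, qs - pqe)
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    end = max(best, key=best.get)
    score = best[end]
    chain = []
    while end is not None:
        chain.append(blocks[end])
        end = prev[end]
    chain.reverse()
    return score, chain


# Hypothetical blocks: a and b are collinear, c conflicts with both.
a = (0, 100, 0, 100, 100)
b = (120, 220, 110, 210, 100)
c = (50, 150, 300, 400, 90)
score, chain = best_chain([a, b, c])
```

A kd-tree serves the same purpose as the inner loop here: it limits the search for good predecessor blocks so that the computation stays tractable on whole chromosomes.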

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 1 otherDb mm7\ chainMm8 $o_Organism Chain chain mm8 $o_Organism ($o_date/$o_db) Chained Alignments 0 210 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.
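As a small illustration of the naming convention just described, the sketch below builds a label from the chromosome, strand, and start position expressed in thousands. The exact separator and the "k" suffix are assumptions made for the example.

```python
def chain_label(chrom, strand, start):
    """Build a feature label from chromosome, strand, and start in thousands."""
    return f"{chrom} {strand} {start // 1000}k"


# Hypothetical match on the plus strand of chr7.
label = chain_label("chr7", "+", 12345678)
```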

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were added back in.\ The resulting alignments were converted into psl format using the lavToPsl\ program. The psl alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks.\ \ $matrix\ \ Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 1 otherDb mm8\ chainMm6 $o_db Chain chain mm6 $o_Organism ($o_date/$o_db) Chained Alignments 0 210 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \
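To see why a non-affine gap scoring system tolerates longer gaps, the sketch below compares an affine cost with a piecewise-linear cost table. All numbers here (open and extend penalties, table positions and costs) are hypothetical values chosen for illustration and are not the parameters used to build this track.

```python
import bisect


def affine_cost(length, gap_open=400, gap_extend=30):
    """Traditional affine gap cost: fixed opening cost plus a per-base cost."""
    return gap_open + gap_extend * length


# Hypothetical table: gap lengths and the cost charged at each length.
POSITIONS = [1, 10, 100, 1000, 10000]
COSTS = [350, 600, 900, 2900, 22900]


def table_cost(length):
    """Piecewise-linear gap cost, interpolated between table entries."""
    if length <= POSITIONS[0]:
        return COSTS[0]
    if length >= POSITIONS[-1]:
        # Beyond the table, continue with the slope of the last segment.
        slope = (COSTS[-1] - COSTS[-2]) / (POSITIONS[-1] - POSITIONS[-2])
        return COSTS[-1] + slope * (length - POSITIONS[-1])
    i = bisect.bisect_right(POSITIONS, length)
    x0, x1 = POSITIONS[i - 1], POSITIONS[i]
    y0, y1 = COSTS[i - 1], COSTS[i]
    return y0 + (y1 - y0) * (length - x0) / (x1 - x0)


long_gap = 1000
affine = affine_cost(long_gap)
tabled = table_cost(long_gap)
```

With these made-up numbers, a 1,000-base gap costs far less under the table than under the affine model, so a chain can extend across it without its score collapsing.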

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 1 otherDb mm6\ chainMm5 $o_db Chain chain mm5 $o_Organism ($o_date/$o_db) Chained Alignments 0 210 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 1 otherDb mm5\ chainMm4 $o_db Chain chain mm4 $o_Organism ($o_date/$o_db) Chained Alignments 0 210 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 1 otherDb mm4\ chainMm3 $o_db Chain chain mm3 $o_Organism ($o_date/$o_db) Chained Alignments 0 210 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 1 otherDb mm3\ netMm7 $o_Organism Net netAlign mm7 chainMm7 $o_Organism ($o_date/$o_db) Alignment Net 0 210.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\
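The four item types listed above can be expressed as a small decision function. This is a sketch of the categories as the list describes them; the argument names are hypothetical, and the real classification may apply further criteria that are not stated here.

```python
def classify_fill(level, chrom, strand, parent_chrom=None, parent_strand=None):
    """Classify a net item relative to the gap it fills in the level above.

    chrom and strand describe where the item aligns in the other species;
    parent_chrom and parent_strand describe the chain containing the gap.
    """
    if level == 1:
        return "Top"      # best, longest match; nothing above it
    if chrom != parent_chrom:
        return "NonSyn"   # aligns to a different chromosome than the gap
    if strand != parent_strand:
        return "Inv"      # same chromosome, opposite orientation
    return "Syn"          # same chromosome, same orientation


# Hypothetical items filling gaps in a chr7 plus-strand chain.
kinds = [
    classify_fill(1, "chr7", "+"),
    classify_fill(2, "chr7", "+", "chr7", "+"),
    classify_fill(2, "chr7", "-", "chr7", "+"),
    classify_fill(3, "chr12", "+", "chr7", "+"),
]
```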

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
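The placement procedure can be sketched as a greedy loop over chains sorted by score. This is a simplified illustration under stated assumptions: chains are reduced to lists of target intervals on one chromosome, and a chain's level is taken to be one below the deepest already-placed chain whose span encloses it. It is not the chainNet implementation.

```python
def subtract(interval, covered):
    """Return the parts of (start, end) not overlapped by covered intervals."""
    start, end = interval
    pieces = []
    for c_start, c_end in sorted(covered):
        if c_end <= start or c_start >= end:
            continue
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def place_chains(chains):
    """Place chains highest score first, trimming each to uncovered sequence.

    chains is a list of (name, score, blocks), where blocks are half-open
    target intervals. Returns a list of (name, level, trimmed_pieces).
    """
    covered = []  # target intervals already claimed by placed chains
    placed = []   # (level, span_start, span_end) of placed chains
    result = []
    for name, score, blocks in sorted(chains, key=lambda c: -c[1]):
        pieces = []
        for block in sorted(blocks):
            pieces.extend(subtract(block, covered))
        if not pieces:
            continue  # fully covered by higher-scoring chains
        lo, hi = pieces[0][0], pieces[-1][1]
        enclosing = [lvl for lvl, s, e in placed if s <= lo and hi <= e]
        level = 1 + max(enclosing, default=0)
        covered.extend(pieces)
        placed.append((level, lo, hi))
        result.append((name, level, pieces))
    return result


# Hypothetical chains: B fills the gap in A after trimming, C stands alone.
net = place_chains([
    ("A", 1000, [(0, 100), (200, 300)]),
    ("B", 500, [(90, 150)]),
    ("C", 100, [(400, 500)]),
])
```

In this example chain B overlaps the first block of A, so it is trimmed to the uncovered part and placed at level 2 inside A's gap, which mirrors the hierarchy described above.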

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 0 otherDb mm7\ netMm8 $o_Organism Net netAlign mm8 chainMm8 $o_Organism ($o_date/$o_db) Alignment Net 0 210.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 0 otherDb mm8\ netMm6 $o_db Net netAlign mm6 chainMm6 $o_Organism ($o_date/$o_db) Alignment Net 0 210.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\
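
The four categories above can be read as a small decision rule. The following is a minimal sketch, assuming each item records the other-species chromosome, the strand, and its level; the `Fill` class and `classify_fill` function are hypothetical names for illustration and are not part of netSyntenic, which may apply further criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fill:
    chrom: str   # chromosome of the aligned sequence in the other species
    strand: str  # '+' or '-'
    level: int   # 1 for top-level items


def classify_fill(fill: Fill, parent: Optional[Fill] = None) -> str:
    """Assign Top, Syn, Inv or NonSyn following the list above (simplified)."""
    if parent is None or fill.level == 1:
        return "Top"
    if fill.chrom != parent.chrom:
        return "NonSyn"  # different chromosome from the gap in the level above
    if fill.strand != parent.strand:
        return "Inv"     # same chromosome, opposite orientation
    return "Syn"         # same chromosome, same orientation
```

For example, a level 2 item on the same chromosome as its level 1 parent but on the opposite strand would be reported as Inv.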

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
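
The placement step described above can be illustrated with a simplified greedy sketch. It assumes each chain is a score plus a list of half-open blocks on the $organism genome; the function names are hypothetical, and the code is not the chainNet implementation (it ignores levels, minimum sizes and the other-species coordinates).

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(start, end, covered):
    """Return the parts of [start, end) not in the sorted, disjoint list covered."""
    pieces = []
    pos = start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue
        if c_start >= end:
            break
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
        if pos >= end:
            break
    if pos < end:
        pieces.append((pos, end))
    return pieces


def place_chains(chains):
    """Place chains highest score first, trimming blocks to uncovered sequence."""
    covered = []
    placed = []
    for score, blocks in sorted(chains, key=lambda chain: -chain[0]):
        pieces = []
        for start, end in blocks:
            pieces.extend(subtract(start, end, covered))
        if pieces:
            placed.append((score, pieces))
            covered = merge(covered + pieces)
    return placed
```

With a top chain of blocks (0, 10) and (30, 40) and a lower-scoring chain spanning (5, 35), the lower chain is trimmed to (10, 30), which is the gap in the top chain.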

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb mm6\ netMm5 $o_db Net netAlign mm5 chainMm5 $o_Organism ($o_date/$o_db) Alignment Net 0 210.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
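
The kind of statistic the last step reports can be illustrated by a short sketch that measures how much of a region consists of Ns. The function name `n_fraction` is hypothetical and the sequence is assumed to be available as a plain string; this is not the netClass implementation.

```python
def n_fraction(sequence: str, start: int, end: int) -> float:
    """Fraction of the half-open region [start, end) made up of N bases."""
    region = sequence[start:end].upper()
    if not region:
        return 0.0
    return region.count("N") / len(region)
```

A gap whose value is close to 1.0 is most likely a sequencing gap rather than a true insertion or deletion.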

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb mm5\ chainBosTau1 $o_Organism Chain chain bosTau1 $o_Organism ($o_date/$o_db) Chained Alignments 0 220 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.
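
The distinction between single and double lines can be sketched from the coordinates of two consecutive blocks. This is a minimal illustration, assuming each block is stored as (t_start, t_end, q_start, q_end) with t on the $organism genome and q on the $o_organism genome; `gap_style` is a hypothetical helper, and the browser's actual drawing code may apply additional thresholds.

```python
def gap_style(prev_block, next_block):
    """Classify the gap between two consecutive blocks of one chain.

    Blocks are (t_start, t_end, q_start, q_end), half-open, in chain order.
    """
    t_gap = next_block[0] - prev_block[1]  # unaligned bases in $organism
    q_gap = next_block[2] - prev_block[3]  # unaligned bases in $o_organism
    if t_gap <= 0:
        return "none"  # nothing to draw along the $organism genome
    # Sequence on both sides gives a double-sided gap, otherwise a single line.
    return "double" if q_gap > 0 else "single"
```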

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the scaffold, strand, and\ location (in thousands) of the match for each matching alignment.
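
A label of this kind could be built as in the sketch below. The exact layout used by the browser is an assumption here, and `chain_label` is a hypothetical helper.

```python
def chain_label(name: str, strand: str, start: int) -> str:
    """Build a label from sequence name, strand and start position in thousands."""
    return f"{name} {strand} {start // 1000}k"
```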

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism scaffold and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.
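
The chaining step can be illustrated with a simplified dynamic program over gapless blocks. This sketch substitutes a plain quadratic scan for the kd-tree and assumes a linear gap cost, neither of which matches axtChain; the block layout (t_start, t_end, q_start, q_end, score) and the name `best_chain` are hypothetical.

```python
def best_chain(blocks, gap_cost=lambda t_gap, q_gap: t_gap + q_gap):
    """Return (score, blocks) for the highest-scoring colinear chain."""
    if not blocks:
        return 0, []
    blocks = sorted(blocks, key=lambda b: (b[0], b[2]))
    best = [b[4] for b in blocks]  # best chain score ending at each block
    back = [-1] * len(blocks)      # predecessor index, -1 for chain start
    for i, cur in enumerate(blocks):
        for j in range(i):
            prev = blocks[j]
            t_gap = cur[0] - prev[1]
            q_gap = cur[2] - prev[3]
            if t_gap < 0 or q_gap < 0:
                continue  # blocks overlap or are out of order in one genome
            candidate = best[j] + cur[4] - gap_cost(t_gap, q_gap)
            if candidate > best[i]:
                best[i] = candidate
                back[i] = j
    i = max(range(len(blocks)), key=best.__getitem__)
    score = best[i]
    chain = []
    while i != -1:
        chain.append(blocks[i])
        i = back[i]
    return score, chain[::-1]
```

A kd-tree makes the search for candidate predecessors much faster than the inner loop shown here, which is why it is used for whole-genome alignments.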

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb bosTau1\ chainBosTau2 $o_db Chain chain bosTau2 $o_Organism ($o_date/$o_db) Chained Alignments 0 220 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the scaffold, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ The genomes of $o_organism and $organism were aligned with blastz.\ The resulting alignments were converted into psl format using the lavToPsl\ program. The psl alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism scaffold and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. $matrix Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.
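
The $matrix placeholder above is filled from the `matrix` and `matrixHeader` settings stored with this track. A minimal sketch of turning those two settings into a substitution-score lookup follows; the row-major ordering and the name `parse_matrix` are assumptions made for illustration.

```python
MATRIX_VALUES = "91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91"
MATRIX_HEADER = "A, C, G, T"


def parse_matrix(values: str, header: str) -> dict:
    """Map each (row base, column base) pair to its score, assuming row-major order."""
    bases = [base.strip() for base in header.split(",")]
    scores = [int(value) for value in values.split(",")]
    size = len(bases)
    if len(scores) != size * size:
        raise ValueError("matrix size does not match header")
    return {
        (bases[row], bases[col]): scores[row * size + col]
        for row in range(size)
        for col in range(size)
    }
```

Because this matrix is symmetric, reading it column-major would give the same lookup.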

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ otherDb bosTau2\ netBosTau1 $o_Organism Net netAlign bosTau1 chainBosTau1 $o_Organism ($o_date/$o_db) Alignment Net 0 220.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb bosTau1\ netBosTau2 $o_db Net netAlign bosTau2 chainBosTau2 $o_Organism ($o_date/$o_db) Alignment Net 0 220.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb bosTau2\ chainCanFam1 $o_Organism Chain chain canFam1 $o_Organism ($o_date/$o_db) Chained Alignments 0 230 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb canFam1\ chainFelCat3 $o_Organism Chain chain felCat3 $o_Organism ($o_date/$o_db) Chained Alignments 0 230 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into psl format using the lavToPsl\ program. The psl alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. $matrix Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA. 2003;100(20):11484-11489.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, \ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 1 matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ otherDb felCat3\ chainCanFam2 $o_Organism Chain chain canFam2 $o_Organism ($o_date/$o_db) Chained Alignments 0 230 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into psl format using the lavToPsl\ program. The psl alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. $matrix Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 1 matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ otherDb canFam2\ netCanFam1 $o_Organism Net netAlign canFam1 chainCanFam1 $o_Organism ($o_date/$o_db) Alignment Net 0 230.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb canFam1\ netCanFam2 $o_Organism Net netAlign canFam2 chainCanFam2 $o_Organism ($o_date/$o_db) Alignment Net 0 230.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.
  • Syn - line-ups on the same chromosome as the gap in the level above it.
  • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
  • NonSyn - a match to a chromosome different from the gap in the level above.
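
The four categories above can be sketched as a small decision rule. This is a simplified illustration, not the netSyntenic implementation: the `Fill` record and its fields are hypothetical names, and the rule compares only chromosome and strand with the enclosing higher-level chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fill:
    """Hypothetical record for one net entry (not the real net file layout)."""
    level: int   # 1 for top-level chains, 2 for chains filling their gaps, ...
    chrom: str   # chromosome of the match in the other genome
    strand: str  # '+' or '-'


def classify(fill: Fill, parent: Optional[Fill] = None) -> str:
    """Label a fill relative to the chain whose gap it sits in."""
    if parent is None or fill.level == 1:
        return "Top"
    if fill.chrom != parent.chrom:
        return "NonSyn"   # different chromosome than the gap above
    if fill.strand != parent.strand:
        return "Inv"      # same chromosome, opposite orientation
    return "Syn"          # same chromosome, same orientation


top = Fill(level=1, chrom="chr5", strand="+")
print(classify(top))
print(classify(Fill(2, "chr5", "-"), parent=top))
```

The ordering of the checks matters: a fill on a different chromosome is reported as NonSyn regardless of its strand.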

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 0 otherDb canFam2\ netFelCat3 $o_Organism Net netAlign felCat3 chainFelCat3 $o_Organism ($o_date/$o_db) Alignment Net 0 230.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.
  • Syn - line-ups on the same chromosome as the gap in the level above it.
  • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
  • NonSyn - a match to a chromosome different from the gap in the level above.

\ \

Methods

\

Chains were derived from blastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
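
The placement step described above can be illustrated with a minimal sketch. It is not the chainNet program: chains are reduced to hypothetical `(name, score, blocks)` tuples on a single reference sequence, coverage is tracked per aligned block, and level assignment is omitted.

```python
def subtract(interval, covered):
    """Return the parts of a half-open interval not overlapped by `covered`."""
    pieces = [interval]
    for c_start, c_end in covered:
        remaining = []
        for p_start, p_end in pieces:
            if c_end <= p_start or c_start >= p_end:
                remaining.append((p_start, p_end))      # no overlap
            else:
                if p_start < c_start:
                    remaining.append((p_start, c_start))  # left remainder
                if c_end < p_end:
                    remaining.append((c_end, p_end))      # right remainder
        pieces = remaining
    return pieces


def place_chains(chains):
    """Place chains in descending score order, trimming each to uncovered sequence."""
    covered = []
    placed = {}
    for name, score, blocks in sorted(chains, key=lambda c: c[1], reverse=True):
        kept = []
        for block in blocks:
            kept.extend(subtract(block, covered))
        if kept:
            placed[name] = kept
            covered.extend(kept)
    return placed


chains = [
    ("A", 100, [(0, 10), (20, 30)]),  # high-scoring chain with a gap at 10-20
    ("B", 50, [(5, 25)]),             # lower-scoring chain overlapping that gap
]
print(place_chains(chains))
```

In this example, chain B is trimmed to the interval left open by the gap in chain A, which is the situation in which a lower-level chain appears underneath a higher-level one.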

\ \

Credits

\

The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 0 otherDb felCat3\ chainPanTro2 $o_Organism Chain chain panTro2 $o_Organism ($o_date/$o_db) Chained Alignments 0 240 100 50 0 255 240 200 1 0 0

Description

\

This track shows alignments of $o_organism ($o_db, $o_date) to the $organism genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both $o_organism and $organism simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the $o_organism assembly or an insertion in the $organism assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the $organism genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.
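
The single-line and double-line convention described above can be expressed as a short rule. This is a sketch under stated assumptions: the function name, its arguments, and the `min_other` cutoff for "substantial" sequence are hypothetical and are not taken from the browser source.

```python
def gap_style(ref_gap, other_gap, min_other=1):
    """Pick the connector drawn between two consecutive aligned blocks.

    ref_gap   -- unaligned bases between the blocks in the $organism genome
    other_gap -- unaligned bases between the blocks in the $o_organism genome
    min_other -- assumed cutoff for counting the other side as a real gap
    """
    if ref_gap <= 0:
        return "none"     # blocks abut in the displayed genome; nothing to draw
    if other_gap >= min_other:
        return "double"   # sequence on both sides: inversion, overlapping deletions, ...
    return "single"       # sequence only in $organism: insertion here or deletion there


print(gap_style(500, 0))
print(gap_style(500, 300))
```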

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

Transposons that have been inserted since the $o_organism/$organism split were removed from the assemblies. The abbreviated genomes were aligned with blastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single $o_organism chromosome and a single $organism chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks.

$matrix

Chains scoring below a threshold were discarded; the remaining chains are displayed in this track.
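
The chaining step can be illustrated with a simplified dynamic program. This sketch is not axtChain: it compares every pair of blocks directly instead of using a kd-tree to limit the search, the block tuple layout is hypothetical, and the default gap cost (one point per skipped base on either sequence) is an arbitrary placeholder rather than the scoring used for this track.

```python
def best_chain(blocks, gap_cost=lambda d_ref, d_other: d_ref + d_other):
    """Find the highest-scoring ordered chain of gapless blocks.

    Each block is (ref_start, ref_end, other_start, other_end, score) with
    half-open coordinates. A block may follow another only if it starts at or
    after the previous block's end on both sequences.
    """
    blocks = sorted(blocks)            # order by start on the reference
    if not blocks:
        return 0, []
    best = [b[4] for b in blocks]      # best chain score ending at each block
    prev = [None] * len(blocks)        # back-pointers for traceback
    for i, (rs, re_, os_, oe, score) in enumerate(blocks):
        for j in range(i):
            p_rs, p_re, p_os, p_oe, _ = blocks[j]
            if p_re <= rs and p_oe <= os_:
                cand = best[j] + score - gap_cost(rs - p_re, os_ - p_oe)
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    i = max(range(len(blocks)), key=best.__getitem__)
    total, chain = best[i], []
    while i is not None:
        chain.append(blocks[i])
        i = prev[i]
    return total, chain[::-1]


blocks = [
    (0, 10, 0, 10, 100),
    (20, 30, 22, 32, 80),
    (15, 25, 100, 110, 50),
]
print(best_chain(blocks))
```

The quadratic pairwise loop is acceptable for a handful of blocks; a spatial index such as the kd-tree mentioned above serves to avoid examining every predecessor when there are many blocks.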

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb panTro2\ chainPanTro1 $o_Organism Chain chain panTro1 $o_Organism ($o_date/$o_db) Chained Alignments 0 240 100 50 0 255 240 200 1 0 0

Description

\

This track shows alignments of $o_organism ($o_db, $o_date) to the $organism genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both $o_organism and $organism simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The $o_organism genomic sequence is from the 13 Nov. 2003 Arachne draft assembly.

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.
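
As an illustration of the naming convention just described, the following sketch builds a label from a chromosome, strand, and start position. The function name and the exact label layout (space-separated, with a trailing "k") are assumptions for this example; the text above specifies only which three pieces of information appear.

```python
def chain_label(chrom, strand, start):
    """Build a feature name from chromosome, strand, and location in thousands."""
    return f"{chrom} {strand} {start // 1000}k"


print(chain_label("chr7", "+", 56054321))
```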

\ \

Methods

\

The alignments were generated by blastz on repeat-masked sequence using the following $organism/$o_organism scoring matrix:

\
          A    C    G    T\
     A   100 -300 -150 -300\
     C  -300  100 -300 -150\
     G  -150 -300  100 -300\
     T  -300 -150 -300  100\
\
     K = 4500, L = 3000,  Y = 3400, H = 2000\

\

The resulting alignments were fed into axtChain, which organizes all alignments between a single $o_organism chromosome and a single $organism chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. Chains scoring below a threshold were discarded; the remaining chains are displayed here.
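
To show how the substitution matrix listed above applies to a gapless block, the following sketch sums the per-column scores of two equal-length sequences. The matrix values are copied from the table in this section; the function name is hypothetical, and the K, L, Y, and H parameters are not modeled.

```python
BASES = "ACGT"

# Rows and columns follow the A, C, G, T order of the table above.
MATRIX = [
    [100, -300, -150, -300],
    [-300, 100, -300, -150],
    [-150, -300, 100, -300],
    [-300, -150, -300, 100],
]


def block_score(seq_a, seq_b):
    """Score an ungapped alignment of two equal-length A/C/G/T sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped blocks must have equal length")
    return sum(
        MATRIX[BASES.index(a)][BASES.index(b)]
        for a, b in zip(seq_a.upper(), seq_b.upper())
    )


print(block_score("ACGT", "ACGT"))  # four matches
print(block_score("ACGT", "GCGT"))  # one A/G transition, three matches
```

Transitions (A/G and C/T) cost 150 while transversions cost 300, so the second call scores lower than the first by 250. Characters other than A, C, G, and T raise a `ValueError` in this sketch.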

\ \

Credits

\

The $o_organism sequence used in this track was obtained from the 13 Nov. 2003 Arachne assembly. We'd like to thank the National Human Genome Research Institute (NHGRI), the Broad Institute at MIT/Harvard, and Washington University St. Louis School of Medicine for providing this sequence.

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ compGeno 1 otherDb panTro1\ netPanTro2 $o_Organism Net netAlign panTro2 chainPanTro2 $o_Organism ($o_date/$o_db) Alignment Net 0 240.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.
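
The level numbering described above follows directly from nesting depth. A minimal sketch, assuming a hypothetical nested structure in which each fill holds a list of gaps and each gap holds the fills placed inside it (this is not the on-disk net format):

```python
def assign_levels(fills, level=1):
    """Return (name, level) pairs for fills nested inside one another's gaps."""
    levels = []
    for fill in fills:
        levels.append((fill["name"], level))
        for gap in fill.get("gaps", []):
            # Chains that fill a gap are displayed one level below their parent.
            levels.extend(assign_levels(gap.get("fills", []), level + 1))
    return levels


net = [
    {
        "name": "chainA",
        "gaps": [
            {"fills": [{"name": "chainB", "gaps": [{"fills": [{"name": "chainC"}]}]}]},
            {"fills": []},  # a gap that no lower-scoring chain could fill
        ],
    }
]
print(assign_levels(net))
```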

\

In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.
  • Syn - line-ups on the same chromosome as the gap in the level above it.
  • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
  • NonSyn - a match to a chromosome different from the gap in the level above.

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 otherDb panTro2\ netPanTro1 $o_Organism Net netAlign panTro1 chainPanTro1 $o_Organism ($o_date/$o_db) Alignment Net 0 240.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.
  • Syn - line-ups on the same chromosome as the gap in the level above it.
  • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
  • NonSyn - a match to a chromosome different from the gap in the level above.

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 otherDb panTro1\ chainHg18 $o_Organism Chain chain hg18 $o_Organism ($o_date/$o_db) Chained Alignments 0 250 100 50 0 255 240 200 1 0 0

Description

\

This track shows alignments of $o_organism ($o_db, $o_date) to the $organism genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both $o_organism and $organism simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.
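
The difference between an affine gap penalty and one that permits longer gaps can be seen in a small numeric sketch. Both functions and all constants here are illustrative assumptions: the logarithmic form is only a stand-in for "a penalty that grows more slowly than linearly" and is not the gap scoring used to build this track.

```python
import math


def affine_gap(length, gap_open=400, gap_extend=30):
    """Traditional affine penalty: fixed opening cost plus a per-base cost."""
    return gap_open + gap_extend * length


def slow_growing_gap(length, gap_open=400, scale=300):
    """Hypothetical penalty that flattens out for long gaps."""
    return gap_open + scale * math.log2(1 + length)


for length in (10, 1000, 10000):
    print(length, affine_gap(length), round(slow_growing_gap(length)))
```

Under the affine form, a 10,000-base gap costs hundreds of thousands of points and would rarely be bridged by the aligned blocks on either side, whereas the slower-growing form keeps the cost within reach of a few well-matched blocks.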

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks.\ \ $matrix\ \ Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 1 otherDb hg18\ chainHg17 $o_db Chain chain hg17 $o_Organism ($o_date/$o_db) Chained Alignments 0 250 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb hg17\ chainHg16 $o_Organism Chain chain hg16 $o_Organism ($o_date/$o_db) Chained Alignments 0 250 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb hg16\ chainHg15 $o_Organism Chain Hg15 chain hg15 $o_Organism ($o_date/$o_db) Chained Alignments 1 250 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands of bases) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.
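The chaining step described above can be illustrated with a minimal sketch. It assumes a plain quadratic dynamic program instead of axtChain's kd-tree search, and a hypothetical linear gap cost in place of axtChain's own gap scoring. The names `Block`, `gap_cost`, and `best_chain` are illustrative only and are not part of axtChain.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    """A gapless aligned block, half-open on target (t) and query (q)."""
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    score: int


def gap_cost(a: Block, b: Block) -> int:
    """Hypothetical linear gap cost (an assumption, not axtChain's scheme)."""
    return (b.t_start - a.t_end) + (b.q_start - a.q_end)


def best_chain(blocks: List[Block]) -> Tuple[int, List[Block]]:
    """Return (score, blocks) for the maximally scoring colinear chain."""
    if not blocks:
        return 0, []
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in order]  # best chain score ending at block i
    prev = [-1] * len(order)         # back-pointer to the previous block
    for i, cur in enumerate(order):
        for j in range(i):
            p = order[j]
            # p may precede cur only if it ends before cur on both sequences
            if p.t_end <= cur.t_start and p.q_end <= cur.q_start:
                cand = best[j] + cur.score - gap_cost(p, cur)
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    end = max(range(len(order)), key=best.__getitem__)
    chain = []
    i = end
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    chain.reverse()
    return best[end], chain
```

The sketch returns only the single best chain for one chromosome pair. A kd-tree, as used by axtChain, serves to find candidate predecessor blocks without comparing every pair.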

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 otherDb hg15\ netHg17 $o_db Net netAlign hg17 chainHg17 $o_Organism ($o_date/$o_db) Alignment Net 1 250.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\
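
The four item types listed above can be expressed as a small decision rule. This is a sketch based only on the definitions in this list. The function name `classify_fill` and its parameters are hypothetical, and the netSyntenic program may apply further criteria that are not described here.

```python
from typing import Optional


def classify_fill(level: int, chrom: str, strand: str,
                  parent_chrom: Optional[str] = None,
                  parent_strand: Optional[str] = None) -> str:
    """Classify a net item from its level and its match location.

    chrom/strand describe where this item matches in the other genome;
    parent_chrom/parent_strand describe the chain whose gap it fills.
    """
    if level == 1:
        return "Top"     # best, longest match, shown on level 1
    if chrom != parent_chrom:
        return "NonSyn"  # matches a different chromosome than the level above
    if strand != parent_strand:
        return "Inv"     # same chromosome, opposite orientation
    return "Syn"         # same chromosome, same orientation
```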

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
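The placement step can be sketched as a greedy pass over chains sorted by score, keeping only the parts of each chain that no higher-scoring chain already covers. This is a simplified illustration, not the chainNet implementation: it tracks coverage on one genome only, omits the level hierarchy, and the names `subtract`, `merge`, and `place_chains` are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference genome


def subtract(span: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of span not overlapped by covered (sorted, disjoint)."""
    start, end = span
    free = []
    for c_start, c_end in covered:
        if c_end <= start:
            continue
        if c_start >= end:
            break
        if c_start > start:
            free.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        free.append((start, end))
    return free


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def place_chains(
    chains: Dict[str, Tuple[int, List[Interval]]]
) -> Dict[str, List[Interval]]:
    """chains maps a name to (score, aligned blocks); returns kept parts."""
    covered: List[Interval] = []
    placed: Dict[str, List[Interval]] = {}
    # Highest-scoring chains are placed first
    for name, (_score, blocks) in sorted(chains.items(), key=lambda kv: -kv[1][0]):
        kept: List[Interval] = []
        for block in blocks:
            kept.extend(subtract(block, covered))
        if kept:
            placed[name] = kept
            covered = merge(covered + kept)
    return placed
```

In a full net, a chain whose kept parts fall inside a gap of a higher-scoring chain would be recorded one level below that chain.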

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 otherDb hg17\ netHg18 $o_Organism Net netAlign hg18 chainHg18 $o_Organism ($o_date/$o_db) Alignment Net 1 250.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 0 otherDb hg18\ blastHg18KG Human Proteins psl protein Human Proteins Mapped by Chained tBLASTn 3 250.2 0 0 0 127 127 127 0 0 0

Description

\

\ This track contains tBLASTn alignments of the peptides from the predicted and \ known genes identified in the hg18 Known Genes track as of 13 Feb 2006.

\ \

Methods

\ First, the predicted proteins from the human Known Genes track were aligned \ with the human genome using the blat program to discover exon boundaries. \ Next, the amino acid sequences that make up each exon were aligned with the \ $organism sequence using the tBLASTn program.\ Finally, the putative $organism exons were chained together using an \ organism-specific maximum gap size but no gap penalty. The single best exon \ chains extending over more than 60% of the query protein were included. Exon \ chains that extended over 60% of the query and matched at least 60% of the \ protein's amino acids were also included.
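The inclusion rule for exon chains can be sketched as a filter over the chains found for one query protein. This sketch reads the rule as follows: the best chain is chosen among those spanning more than 60% of the query, and other chains are kept when they also match at least 60% of the residues. The `ExonChain` fields, the `score` used to rank chains, and the function name `select_chains` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExonChain:
    name: str
    score: float    # hypothetical ranking score used to pick the best chain
    covered: int    # query residues spanned by the chain
    matched: int    # query residues that match in the alignment
    query_len: int  # length of the query protein


def select_chains(chains: List[ExonChain], min_frac: float = 0.60) -> List[ExonChain]:
    """Keep the best long chain plus other long chains that match well."""
    long_enough = [c for c in chains if c.covered / c.query_len > min_frac]
    if not long_enough:
        return []
    best = max(long_enough, key=lambda c: c.score)
    kept = [best]
    for c in long_enough:
        if c is not best and c.matched / c.query_len >= min_frac:
            kept.append(c)
    return kept
```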

\ \

Credits

\

\ tBLASTn is part of the NCBI Blast tool set. For more information on Blast, see\ Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. \ Basic local alignment search tool. \ J Mol Biol. 1990 Oct 5;215(3):403-410.

\

\ Blat was written by Jim Kent. The remaining utilities \ used to produce this track were written by Jim Kent or Brian Raney.

\ genes 1 blastRef hg18.blastKGRef04\ colorChromDefault off\ pred hg18.blastKGPep04\ blastHg17KG Hg17 Proteins psl protein Hg17 Proteins Mapped by Chained tBLASTn 3 250.3 0 0 0 127 127 127 0 0 0

Description

\

\ This track contains tBLASTn alignments of the peptides from the predicted and \ known genes identified in the hg17 Known Genes track as of 30 Aug 2004.

\ \

Methods

\ First, the predicted proteins from the human Known Genes track were aligned \ with the human genome using the blat program to discover exon boundaries. \ Next, the amino acid sequences that make up each exon were aligned with the \ $organism sequence using the tBLASTn program.\ Finally, the putative $organism exons were chained together using an \ organism-specific maximum gap size but no gap penalty. The single best exon \ chains extending over more than 60% of the query protein were included. Exon \ chains that extended over 60% of the query and matched at least 60% of the \ protein's amino acids were also included.

\ \

Credits

\

\ tBLASTn is part of the NCBI Blast tool set. For more information on Blast, see\ Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. \ Basic local alignment search tool. \ J Mol Biol. 1990 Oct 5;215(3):403-410.

\

\ Blat was written by Jim Kent. The remaining utilities \ used to produce this track were written by Jim Kent or Brian Raney.

\ genes 1 blastRef hg17.blastKGRef01\ colorChromDefault off\ pred hg17.blastKGPep01\ blastHg16KG Human Proteins psl protein Human Proteins (hg16) Mapped by Chained tBLASTn 3 250.4 0 0 0 127 127 127 0 0 0

Description

\

\ This track contains tBLASTn alignments of the peptides from the predicted \ and known genes identified in the hg16 Known Genes track as of 27 May 2004.\

\ \

Methods

\

\ First, the predicted proteins from the human Known Genes track were aligned \ with the human genome using the blat program to discover exon boundaries. \ Next, the amino acid sequences that make up each exon were aligned with the \ $organism sequence using the tBLASTn program.\ Finally, the putative $organism exons were chained together using an \ organism-specific maximum gap size but no gap penalty. The single best exon \ chains extending over more than 60% of the query protein were included. Exon \ chains that extended over 60% of the query and matched at least 60% of the \ protein's amino acids were also included.

\ \

Credits

\

\ tBLASTn is part of the NCBI Blast tool set. For more information on Blast, see\ Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ.\ Basic local alignment search tool.\ J Mol Biol. 1990 Oct 5;215(3):403-410.

\

\ Blat was written by Jim Kent. The remaining utilities \ used to produce this track were written by Jim Kent or Brian Raney.

\ genes 1 blastRef hg16.blastKGRef00\ colorChromDefault off\ pred hg16.blastKGPep00\ chainRheMac2 $o_Organism Chain chain rheMac2 $o_Organism ($o_date/$o_db) Chained Alignments 0 257.1 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of $o_organism ($o_db, $o_date) to the\ $organism genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ $o_organism and $organism simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

\ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ $o_organism assembly or an insertion in the $organism \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the $organism genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands of bases) of the match for each matching alignment.

\ \

Methods

\

\ Transposons that have been inserted since the $o_organism/$organism\ split were removed from the assemblies. The abbreviated genomes were\ aligned with blastz, and the transposons were then added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single $o_organism chromosome and a single\ $organism chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. \ \ $matrix\ \ Chains scoring below a threshold were discarded; the remaining\ chains are displayed in this track.
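As an illustration of how a substitution matrix enters the scoring, the sketch below parses the `matrix` and `matrixHeader` settings listed for this track and scores one gapless block. The row-major layout is an assumption, although it makes no difference here because the matrix is symmetric. The function names `parse_matrix` and `score_block` are hypothetical.

```python
from typing import Dict, Tuple

# Values copied from this track's matrix and matrixHeader settings
MATRIX_SETTING = ("16 91,-114,-31,-123,-114,100,-125,-31,"
                  "-31,-125,100,-114,-123,-31,-114,91")
MATRIX_HEADER = "A, C, G, T"


def parse_matrix(setting: str, header: str) -> Dict[Tuple[str, str], int]:
    """Turn 'N v1,v2,...' plus a base header into a pair-score lookup."""
    count, values = setting.split(None, 1)
    scores = [int(v) for v in values.split(",")]
    bases = [b.strip() for b in header.split(",")]
    n = len(bases)
    if len(scores) != int(count) or len(scores) != n * n:
        raise ValueError("matrix size does not match header")
    # Assumed row-major order: row = first base, column = second base
    return {(bases[r], bases[c]): scores[r * n + c]
            for r in range(n) for c in range(n)}


def score_block(a: str, b: str, matrix: Dict[Tuple[str, str], int]) -> int:
    """Sum substitution scores over one gapless block of equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("gapless blocks must have equal length")
    return sum(matrix[(x, y)] for x, y in zip(a.upper(), b.upper()))
```

For example, `score_block("ACGT", "ACGA", parse_matrix(MATRIX_SETTING, MATRIX_HEADER))` sums three match scores and one T/A mismatch score.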

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 1 matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ otherDb rheMac2\ netRheMac2 $o_Organism Net netAlign rheMac2 chainRheMac2 $o_Organism ($o_date/$o_db) Alignment Net 0 257.2 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best $o_organism/$organism chain for \ every part of the $organism genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The $o_organism sequence used in this annotation is from\ the $o_date ($o_db) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - a line-up on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 otherDb rheMac2\ chainTupBel1 $o_Organism Chain chain tupBel1 $o_Organism ($o_date/$o_db) Chained Alignments 0 272 100 50 0 255 240 200 1 0 0 compGeno 1 otherDb tupBel1\ netTupBel1 $o_Organism Net netAlign tupBel1 chainTupBel1 $o_Organism ($o_date/$o_db) Alignment Net 0 272.2 0 0 0 127 127 127 1 0 0 compGeno 0 otherDb tupBel1\ chainOtoGar1 $o_Organism Chain chain otoGar1 $o_Organism ($o_date/$o_db) Chained Alignments 0 274 100 50 0 255 240 200 1 0 0 compGeno 1 otherDb otoGar1\ chainOtoGar1Best $o_Organism Best Chain chain otoGar1 $o_Organism ($o_date/$o_db) Chained Alignments Recip Best 0 274.1 100 50 0 255 240 200 1 0 0 compGeno 1 otherDb otoGar1\ netOtoGar1 $o_Organism Net netAlign otoGar1 chainOtoGar1 $o_Organism ($o_date/$o_db) Alignment Net 0 274.2 0 0 0 127 127 127 1 0 0 compGeno 0 otherDb otoGar1\ chainMm3XSingle Mm3X Best Recip chain mm3X Mm3X Best Reciprocal (best w/no overlap) 0 300 100 50 0 255 240 200 1 0 0 x 1 otherDb mm3X\ chainMm2XSingle Mm2X Best Recip chain mm2X Mm2X Best Reciprocal (best w/no overlap) 0 301 100 50 0 255 240 200 1 0 0 x 1 otherDb mm2X\ chainHg16MergeEx chainHg16MergeEx chain hg16 chainHg16MergeEx 0 303 100 50 0 255 240 200 1 0 0 x 1 otherDb hg16\ chainDm1MergeEx chainDm1MergeEx chain dm1 chainDm1MergeEx 0 304 100 50 0 255 240 200 1 0 0 x 1 otherDb dm1\