This file is from:
    http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/README.txt

This directory contains compressed multiple alignments of the
following assemblies to the human genome (hg19/GRCh37, Feb. 2009):

Assemblies used in these alignments:

  - Human          Homo sapiens                   Feb. 2009  hg19/GRCh37
  - Chimp          Pan troglodytes                Mar. 2006  panTro2
  - Gorilla        Gorilla gorilla gorilla        Oct. 2008  gorGor1
  - Orangutan      Pongo pygmaeus abelii          July 2007  ponAbe2
  - Rhesus         Macaca mulatta                 Jan. 2006  rheMac2
  - Baboon         Papio hamadryas                Nov. 2008  papHam1
  - Marmoset       Callithrix jacchus             June 2007  calJac1
  - Tarsier        Tarsius syrichta               Aug. 2008  tarSyr1
  - Mouse lemur    Microcebus murinus             Jul. 2007  micMur1
  - Bushbaby       Otolemur garnettii             Dec. 2006  otoGar1
  - Tree shrew     Tupaia belangeri               Dec. 2006  tupBel1
  - Mouse          Mus musculus                   July 2007  mm9
  - Rat            Rattus norvegicus              Nov. 2004  rn4
  - Kangaroo rat   Dipodomys ordii                Jul. 2008  dipOrd1
  - Guinea Pig     Cavia porcellus                Feb. 2008  cavPor3
  - Squirrel       Spermophilus tridecemlineatus  Feb. 2008  speTri1
  - Rabbit         Oryctolagus cuniculus          Apr. 2009  oryCun2
  - Pika           Ochotona princeps              Jul. 2008  ochPri2
  - Alpaca         Vicugna pacos                  Jul. 2008  vicPac1
  - Dolphin        Tursiops truncatus             Feb. 2008  turTru1
  - Cow            Bos taurus                     Oct. 2007  bosTau4
  - Horse          Equus caballus                 Sep. 2007  equCab2
  - Cat            Felis catus                    Mar. 2006  felCat3
  - Dog            Canis lupus familiaris         May 2005   canFam2
  - Microbat       Myotis lucifugus               Mar. 2006  myoLuc1
  - Megabat        Pteropus vampyrus              Jul. 2008  pteVam1
  - Hedgehog       Erinaceus europaeus            June 2006  eriEur1
  - Shrew          Sorex araneus                  June 2006  sorAra1
  - Elephant       Loxodonta africana             Jul. 2009  loxAfr3
  - Rock hyrax     Procavia capensis              Jul. 2008  proCap1
  - Tenrec         Echinops telfairi              July 2005  echTel1
  - Armadillo      Dasypus novemcinctus           Jul. 2008  dasNov2
  - Sloth          Choloepus hoffmanni            Jul. 2008  choHof1
  - Wallaby        Macropus eugenii               Nov. 2007  macEug1
  - Opossum        Monodelphis domestica          Oct. 2006  monDom5
  - Platypus       Ornithorhynchus anatinus       Mar. 2007  ornAna1
  - Chicken        Gallus gallus                  May 2006   galGal3
  - Zebra finch    Taeniopygia guttata            Jul. 2008  taeGut1
  - Lizard         Anolis carolinensis            Feb. 2007  anoCar1
  - X. tropicalis  Xenopus tropicalis             Aug. 2005  xenTro2
  - Tetraodon      Tetraodon nigroviridis         Mar. 2007  tetNig2
  - Fugu           Takifugu rubripes              Oct. 2004  fr2
  - Stickleback    Gasterosteus aculeatus         Feb. 2006  gasAcu1
  - Medaka         Oryzias latipes                Oct. 2005  oryLat2
  - Zebrafish      Danio rerio                    Dec. 2008  danRer6
  - Lamprey        Petromyzon marinus             Mar. 2007  petMar1

These alignments were prepared using the methods described in the
track description file:
    http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way
based on the phylogenetic tree: 46way.nh.

Files in this directory:

  - 46way.nh - phylogenetic tree used during the multiz multiple alignment
  - commonNames.46way.nh - same as 46way.nh with the UCSC database name
        replaced by the common name for the species

UPDATE - January 2010 - corrected tree diagram

  - 46way.corrected.nh - Wallaby has moved to be a sister to Opossum
        in the tree diagram from the above 46way.nh
  - commonNames.46way.corrected.nh - Wallaby has moved to be a sister
        to Opossum in the tree diagram from the above commonNames.46way.nh

See also:
    http://genomewiki.ucsc.edu/index.php/Human/hg19/GRCh37_46-way_multiple_alignment

The "alignments" directory contains compressed FASTA alignments
for the CDS regions of the human genome (hg19/GRCh37, Feb. 2009)
aligned to the assemblies.

The maf/chr*.maf.gz files each contain all the alignments to that
particular human chromosome, with additional annotations to indicate
gap context, genomic breaks, and quality scores for the sequence in
the underlying genome assemblies. Beware, the compressed data size
of these files is 31 Gb, uncompressed is more than 250 Gb.

The maf/upstream*.maf.gz files contain alignments in regions upstream
of annotated transcription starts for RefSeq genes with annotated
5' UTRs. These files differ from the standard MAF format: they display
alignments that extend from start to end of the upstream region in
human, whether or not alignments actually exist. In situations where
no alignments exist or the alignments of one or more species are
missing, dot (".") is used as a placeholder. Multiple regions of an
assembly's sequence may align to a single region in human; therefore,
only the species name is displayed in the alignment data and no
position information is recorded. The alignment score is always zero
in these files. These files are updated weekly.

For a description of multiple alignment format (MAF), see
    http://genome.ucsc.edu/goldenPath/help/maf.html.

PhastCons conservation scores for these alignments are available at:
    http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way

PhyloP conservation scores for these alignments are available at:
    http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phyloP46way

---------------------------------------------------------------
To download a large file or multiple files from this directory,
we recommend that you use rsync or ftp rather than downloading the
files via our website. There is approximately 31 Gb of compressed
data in this directory.

Via rsync:

    rsync -avz --progress \
        rsync://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/ ./

Via FTP:

    ftp hgdownload.cse.ucsc.edu
    user name: anonymous
    password:
    go to the directory goldenPath/hg19/multiz46way

To download multiple files from the UNIX command line, use the
"mget" command.

    mget ...
    - or -
    mget -a (to download all the files in the directory)

Use the "prompt" command to toggle the interactive mode if you do
not want to be prompted for each file that you download.

---------------------------------------------------------------
All the files in this directory are freely usable for any purpose.
For data use restrictions regarding the individual genome assemblies,
see http://genome.ucsc.edu/goldenPath/credits.html.
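
---------------------------------------------------------------
Example: reading a maf/chr*.maf.gz file

The following is a minimal sketch, not a UCSC tool, of how the
standard-format maf/chr*.maf.gz files described above might be read
in Python. It assumes the layout given in the MAF description linked
above: each alignment block starts with an "a" line, is followed by
"s" lines with seven whitespace-separated fields, and ends at a blank
line. The extra annotation lines are simply skipped. The function
names and the sample text are hypothetical and made up for
illustration. The sketch is not meant for the maf/upstream*.maf.gz
files, which differ from the standard format.

```python
import gzip
from collections import Counter

# Made-up sample in standard MAF layout (illustrative values only).
SAMPLE = """##maf version=1
a score=1234.0
s hg19.chr21 100 5 + 1000 ACGTA
s panTro2.chr21 200 5 + 2000 ACGTA
i panTro2.chr21 N 0 C 0

a score=50.0
s hg19.chr21 300 4 + 1000 AC-GT
"""


def iter_maf_blocks(lines):
    """Yield each alignment block as a list of dicts, one per 's' line."""
    block = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            # A blank line closes the current block.
            if block:
                yield block
            block = None
        elif fields[0] == "a":
            # An 'a' line opens a new block.
            if block:
                yield block
            block = []
        elif fields[0] == "s" and block is not None and len(fields) == 7:
            src, start, size, strand, src_size, text = fields[1:]
            block.append({
                "src": src,
                "start": int(start),
                "size": int(size),
                "strand": strand,
                "src_size": int(src_size),
                "text": text,
            })
        # Header lines and other annotation lines are ignored here.
    if block:
        yield block


def species_counts(blocks):
    """Count, per assembly name, the number of blocks it appears in."""
    counts = Counter()
    for block in blocks:
        # The assembly name is the part of 'src' before the first dot.
        counts.update({row["src"].split(".", 1)[0] for row in block})
    return counts


def count_blocks_in_file(path):
    """Stream a gzip-compressed MAF file without unpacking it on disk."""
    with gzip.open(path, "rt") as handle:
        return species_counts(iter_maf_blocks(handle))
```

Because the uncompressed data is more than 250 Gb, the sketch reads
the compressed file line by line rather than loading it into memory.
For example, count_blocks_in_file("maf/chr21.maf.gz") would stream one
chromosome, assuming a file of that name has been downloaded.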