This file is from:
  http://hgdownload.cse.ucsc.edu/goldenPath/panTro3/multiz12way/README.txt

This directory contains compressed multiple alignments of the
following assemblies to the chimp genome (panTro3, Oct. 2010).

Assemblies used in these alignments:
  - Chimp      Pan troglodytes          Oct. 2010  panTro3
  - Human      Homo sapiens             Feb. 2009  hg19/GRCh37
  - Orangutan  Pongo pygmaeus abelii    July 2007  ponAbe2
  - Rhesus     Macaca mulatta           Jan. 2006  rheMac2
  - Marmoset   Callithrix jacchus       Mar. 2009  calJac3
  - Mouse      Mus musculus             July 2007  mm9
  - Rat        Rattus norvegicus        Nov. 2004  rn4
  - Horse      Equus caballus           Sep. 2007  equCab2
  - Dog        Canis lupus familiaris   May 2005   canFam2
  - Opossum    Monodelphis domestica    Oct. 2006  monDom5
  - Chicken    Gallus gallus            May 2006   galGal3
  - Zebrafish  Danio rerio              Jul. 2010  danRer7

These alignments were prepared using the methods described in the
track description file:
  http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=panTro3&g=cons12way
based on the phylogenetic tree 12way.nh.

Files in this directory:
  - 12way.nh - phylogenetic tree used during the multiz multiple
    alignment
  - panTro3.commonNames.12way.nh - same as 12way.nh, with the UCSC
    database names replaced by the common names of the species

The "alignments" directory contains compressed FASTA alignments for
the CDS regions of the chimp genome (panTro3, Oct. 2010) aligned to
the assemblies listed above.

The multiz12way.maf.gz file contains all the alignments for all
chromosomes and contigs in the chimp genome. It also includes
additional annotations indicating gap context and genomic breaks for
the sequence in the underlying genome assemblies. Beware: this file
is 11 Gb compressed and more than 56 Gb uncompressed.

The maf/upstream*.maf.gz files contain alignments in regions upstream
of annotated transcription starts for Ensembl genes with annotated
5' UTRs. These files differ from the standard MAF format: they display
alignments that extend from the start to the end of the upstream
region in chimp, whether or not alignments actually exist.
In situations where no alignments exist, or where the alignments of
one or more species are missing, a dot (".") is used as a placeholder.
Multiple regions of an assembly's sequence may align to a single
region in chimp; therefore, only the species name is displayed in the
alignment data and no position information is recorded. The alignment
score is always zero in these files. These files are updated weekly.

For a description of the multiple alignment format (MAF), see:
  http://genome.ucsc.edu/goldenPath/help/maf.html

PhastCons conservation scores for these alignments are available at:
  http://hgdownload.cse.ucsc.edu/goldenPath/panTro3/phastCons12way

PhyloP conservation scores for these alignments are available at:
  http://hgdownload.cse.ucsc.edu/goldenPath/panTro3/phyloP12way

---------------------------------------------------------------
To download a large file or multiple files from this directory, we
recommend that you use rsync or ftp rather than downloading the files
via our website. There is approximately 31 Gb of compressed data in
this directory.

Via rsync:
    rsync -av --progress \
        rsync://hgdownload.cse.ucsc.edu/goldenPath/panTro3/multiz12way/ ./

Via FTP:
    ftp hgdownload.cse.ucsc.edu
    user name: anonymous
    password:
    go to the directory goldenPath/panTro3/multiz12way

To download multiple files from the UNIX command line, use the "mget"
command:
    mget ...
    - or -
    mget -a  (to download all the files in the directory)

Use the "prompt" command to toggle the interactive mode if you do not
want to be prompted for each file that you download.

---------------------------------------------------------------
All the files in this directory are freely usable for any purpose.
For data use restrictions regarding the individual genome assemblies,
see http://genome.ucsc.edu/goldenPath/credits.html.