This file is from: http://hgdownload.soe.ucsc.edu/goldenPath/dm6/multiz124way/README.txt

This directory contains compressed multiple alignments of 123 insect
genome assemblies aligned to the reference D. melanogaster genome
(dm6, Aug. 2014).

See also: assemblyInformation.txt for information about which assemblies
have been used in this 124-way multiple alignment.

Files in this directory:

dm6.124way.sequenceNames.nh  - phylogenetic tree used for the multiz alignment,
                             - with UCSC database names or sequence names
dm6.124way.scientificName.nh - the same phylogenetic tree with strictly
                             - scientific names
dm6.124way.taxId.nh          - the same phylogenetic tree with NCBI taxonomy IDs
                             - there is a duplicate ID in this list: 46245
                             - there are two assembly versions of: D_pseudoobscura
                             - named: D_pseudoobscura_1 and droPse3
nameCrossReference.txt       - tab-separated columns with the different names
                             - for these sequences and databases, columns:
                               1. sequence name - UCSC database or scientific name
                                  - the sequence names used in the MAF files
                               2. NCBI accession
                               3. NCBI taxon identifier
                               4. assembly name
                               5. scientific name
sciNameToUcscDbName.txt      - translation of UCSC database names to abbreviated
                             - scientific names
sequences/*                  - directories with 2bit files and assembly reports
                             - from NCBI for each sequence used that is not
                             - in a UCSC database
sequences/md5sum.txt         - MD5 sums for the files in this directory structure
maf/*.maf.gz                 - alignments referenced to dm6, one MAF file
                             - per chromosome
upstream1000.ncbiRefSeq.maf.gz - alignments in regions upstream, see below
upstream2000.ncbiRefSeq.maf.gz - alignments in regions upstream, see below
upstream5000.ncbiRefSeq.maf.gz - alignments in regions upstream, see below
md5sum.txt                   - MD5 sums of these files to verify transmission

The "alignments" directory contains compressed FASTA alignments for the
NCBI RefSeq Gene CDS regions of the D. melanogaster genome (dm6, Aug. 2014)
aligned to the assemblies.
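Two of the files listed above are meant to be read by scripts: nameCrossReference.txt
(five tab-separated columns) and md5sum.txt. The Python sketch below is not a UCSC
tool; the helper names SequenceName, read_name_cross_reference and verify_md5sums are
hypothetical. It assumes nameCrossReference.txt has one record per line, and that
md5sum.txt uses the standard `md5sum` layout "<hex digest>  <file name>" with names
relative to the directory holding md5sum.txt.

```python
import csv
import hashlib
from pathlib import Path
from typing import Dict, List, NamedTuple


class SequenceName(NamedTuple):
    """One row of nameCrossReference.txt (five tab-separated columns)."""
    maf_name: str         # 1. sequence name used in the MAF files
    accession: str        # 2. NCBI accession
    tax_id: str           # 3. NCBI taxon identifier
    assembly: str         # 4. assembly name
    scientific_name: str  # 5. scientific name


def read_name_cross_reference(path: str) -> Dict[str, SequenceName]:
    """Map each MAF sequence name to its NCBI metadata.

    Assumption: one record per line; lines starting with '#' and lines
    with fewer than five fields (e.g. blank lines) are skipped.
    """
    table: Dict[str, SequenceName] = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 5 or row[0].startswith("#"):
                continue
            record = SequenceName(*row[:5])
            # Keyed by sequence name, not taxon ID: 46245 occurs twice.
            table[record.maf_name] = record
    return table


def verify_md5sums(checksum_file: str) -> List[str]:
    """Return the names listed in checksum_file that are missing or mismatched.

    Assumption: standard `md5sum` output, "<hex digest>  <file name>",
    with names relative to the checksum file's own directory.
    """
    base = Path(checksum_file).parent
    bad: List[str] = []
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # '*' marks binary mode in md5sum output
        digest = hashlib.md5()
        try:
            with open(base / name, "rb") as handle:
                # Hash in 1 MiB chunks so multi-gigabyte files are never
                # read into memory all at once.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
        except FileNotFoundError:
            bad.append(name)
            continue
        if digest.hexdigest() != expected.lower():
            bad.append(name)
    return bad
```

Keying the table by the MAF sequence name rather than the taxon ID avoids the
duplicate 46245 entry (D_pseudoobscura_1 and droPse3). On systems with GNU coreutils,
`md5sum -c md5sum.txt` run inside the download directory performs the same check.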
The upstream*.maf.gz files contain alignments in regions upstream of
annotated transcription starts for the NCBI RefSeq Genes with annotated
5' UTRs. These files differ from the standard MAF format: they display
alignments that extend from start to end of the upstream region in
D. melanogaster, whether or not alignments actually exist. Where no
alignments exist, or where the alignments of one or more species are
missing, a dot (".") is used as a placeholder. Because multiple regions
of an assembly's sequence may align to a single region in the
D. melanogaster sequence, only the species name is displayed in the
alignment data and no position information is recorded. The alignment
score is always zero in these files.

For a description of multiple alignment format (MAF), see
http://genome.ucsc.edu/goldenPath/help/maf.html.

The phastCons data can be found at:
   http://hgdownload.cse.ucsc.edu/goldenPath/dm6/phastCons124way/
The phyloP data can be found at:
   http://hgdownload.cse.ucsc.edu/goldenPath/dm6/phyloP124way/

For more information about this data, see the track description for
the Conservation track:
   http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=dm6&g=cons124way

---------------------------------------------------------------
Note: the maf/*.maf.gz files total approximately 156 Gb uncompressed
and approximately 18 Gb compressed. The entire set of data in this
directory is approximately 20 Gb.
---------------------------------------------------------------

To download a large file or multiple files from this directory,
we recommend that you use rsync or ftp rather than downloading the
files via our website.

Via rsync:
   rsync -avz --progress \
      rsync://hgdownload.cse.ucsc.edu/goldenPath/dm6/multiz124way/ ./

Via FTP:
   ftp hgdownload.cse.ucsc.edu
   user name: anonymous
   password:
   go to the directory goldenPath/dm6/multiz124way

To download multiple files from the UNIX command line, use the "mget"
command.
   mget ...
   - or -
   mget -a   (to download all the files in the directory)

Use the "prompt" command to toggle the interactive mode if you do not
want to be prompted for each file that you download.

---------------------------------------------------------------
All the files in this directory are freely usable for any purpose.
For data use restrictions regarding the individual genome assemblies,
see http://genome.ucsc.edu/goldenPath/credits.html.
---------------------------------------------------------------
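The upstream*.maf.gz layout described in this README (one row per species, dots as
placeholders for missing alignments, no usable position information) can be read with
a small parser. This Python sketch is illustrative only; read_maf_blocks,
aligned_fraction and species_coverage are hypothetical helpers. It assumes blocks start
with an "a ..." line and that sequence rows follow the MAF "s src start size strand
srcSize text" shape, and it relies only on the src name (second field) and the aligned
text (last field); the exact row layout of the upstream files is an assumption.

```python
import gzip
from typing import Dict, Iterator, List, TextIO


def read_maf_blocks(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield each alignment block as {species name: aligned text}.

    Assumption: src is "name" or "name.rest"; only "name" is kept,
    following the usual MAF "database.sequence" convention.
    """
    block: Dict[str, str] = {}
    for line in handle:
        if line.startswith("a ") or line.strip() == "a":
            if block:
                yield block
            block = {}
        elif line.startswith("s "):
            fields = line.split()
            species = fields[1].split(".", 1)[0]
            block[species] = fields[-1]
        # '#' header lines, blank lines and i/e/q lines are ignored.
    if block:
        yield block


def aligned_fraction(text: str) -> float:
    """Fraction of columns holding a base rather than '-' (gap) or
    '.' (placeholder for a missing alignment in the upstream files)."""
    if not text:
        return 0.0
    return sum(ch not in "-." for ch in text) / len(text)


def species_coverage(path: str, species: str) -> List[float]:
    """aligned_fraction for one species in every block of a .maf.gz file;
    0.0 for blocks in which the species has no row."""
    with gzip.open(path, "rt") as handle:
        return [aligned_fraction(block.get(species, ""))
                for block in read_maf_blocks(handle)]
```

For example, `species_coverage("upstream1000.ncbiRefSeq.maf.gz", "droPse3")` returns
one value per upstream region. Streaming through `gzip.open` in text mode avoids
decompressing whole files to disk, which matters for the per-chromosome maf/*.maf.gz
files given their uncompressed size.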