This file is from:
    http://hgdownload.cse.ucsc.edu/goldenPath/hg38/cactus447way/README.txt

This directory contains compressed multiple alignments of the following
assemblies to the human genome (hg38/GRCh38, Dec. 2013):

-----------------------------------------------------------------------------
Assemblies used in these alignments:               (alignment type)

Human - Homo sapiens      Dec. 2013 (GRCh38/hg38)  (reference)

---------------------------------------------------------------
This alignment was created by making three edits (using Cactus) to the
241-way mammalian Zoonomia Cactus alignment
(https://cglgenomics.ucsc.edu/data/cactus/).

1. One additional cat genome, "Felis_catus_fca126" (GCA_018350175.1), was
   added as a sister taxon to the existing "Felis_catus" species.

2. Five additional canine genomes were also added: canFam4,
   "Canis_lupus_dingo" (GCA_003254725.1), "Canis_lupus_orion"
   (GCA_905319855.2), "Nyctereutes_procyonoides" (GCA_905146905.1) and
   "Otocyon_megalotis" (GCA_017311455.1). "Canis_lupus" from the Zoonomia
   alignment was also renamed "Canis_lupus_VD" to reflect the fact that it
   corresponds to a "village dog" sample and not a "wolf" sample.

3. The 43-species primates clade from the Zoonomia alignment was removed
   and replaced with the 243-way primates alignment from
   TODO: [CITE ILLUMINA PAPER], adding 200 primate species to the alignment.

Phylogenetic tree

The phylogenetic tree was established by the research described in the paper:
Lukas F. K. Kuderna, et al. A global catalog of whole-genome diversity from
233 primate species. Science Vol. 380, No.
6648, 906-913 (2023), DOI: 10.1126/science.abn7829

Files in this directory:

  - hg38.447way.nh.txt - phylogenetic tree used to guide the cactus alignment
  - hg38.447way.commonNames.nh.txt - same tree with the common names
  - cactus447wayFrames.bb - the reading frames used to display amino acid
        coding regions at base level
  - cactus447waySummary.bb - a bigBed format file for a pre-calculated display
        of the maf coverage graph when viewing larger areas
  - hg38.cactus447way.bb - the maf file data in bigMaf file format for use
        in the display of the track in the genome browser
  - md5sum.txt - MD5 sum of the files to verify downloads

See also:

The "maf" directory contains the alignments to the human assembly, with
additional annotations to indicate gap context, and genomic breaks for the
sequence in the underlying genome assemblies. Beware, the uncompressed data
size of the files in the 'maf' directory is 7.4 Tb; the compressed files
there are approximately 1.5 Tb.

For a description of multiple alignment format (MAF), see
http://genome.ucsc.edu/goldenPath/help/maf.html.

PhyloP conservation scores for these alignments are available at:
    http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP447way

---------------------------------------------------------------
Note, the maf/*.maf.gz files are 7.4 Tb of data when uncompressed; compressed,
they are approximately 1.5 Tb. The entire set of data in this directory is
approximately 4.1 Tb.

To download a large file or multiple files from this directory, we recommend
that you use rsync or ftp rather than downloading the files via our website.

Via rsync:

    rsync -avz --progress \
        rsync://hgdownload.cse.ucsc.edu/goldenPath/hg38/cactus447way/ ./

Via FTP:

    ftp hgdownload.cse.ucsc.edu
    user name: anonymous
    password:
    go to the directory goldenPath/hg38/cactus447way

To download multiple files from the UNIX command line, use the "mget" command.

    mget ...
    - or -
    mget -a (to download all the files in the directory)

Use the "prompt" command to toggle the interactive mode if you do not want
to be prompted for each file that you download.

---------------------------------------------------------------
All the files in this directory are freely usable for any purpose.
For data use restrictions regarding the individual genome assemblies, see
http://genome.ucsc.edu/goldenPath/credits.html.
---------------------------------------------------------------
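
Verifying downloads (illustrative sketch)

The md5sum.txt file listed above is provided to verify downloads. The
following Python sketch shows one way such a check might be done. It is not
part of the UCSC tooling: the function names md5_of and verify_downloads are
hypothetical, and the code assumes that md5sum.txt uses the usual md5sum
output layout (a hex digest, whitespace, then a file name on each line) and
that the files were downloaded into a single local directory.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(listing="md5sum.txt", directory="."):
    """Compare local files against a checksum listing.

    Assumption: each non-empty line of the listing is
    "<hex digest> <file name>", as written by the md5sum utility.
    Returns a dict mapping file name to "ok", "mismatch" or "missing".
    """
    directory = Path(directory)
    results = {}
    for line in (directory / listing).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(None, 1)
        # md5sum prefixes the name with '*' when run in binary mode.
        name = name.strip().lstrip("*")
        target = directory / name
        if not target.is_file():
            results[name] = "missing"
        elif md5_of(target) == expected.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling verify_downloads(directory="cactus447way") after an rsync into a
local directory of that name would report the status of each listed file.
Files are read in chunks because the alignment files are far too large to
load into memory at once. Where GNU coreutils is installed, running
"md5sum -c md5sum.txt" in the download directory is an alternative.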