This file is from:
    http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/multiz7way/README.txt

This directory contains compressed multiple alignments of 7 virus sequences.
These 7 sequences represent coronavirus strains in human populations.

The 'reference' sequence for this collection is the sequence:
    NC_045512v2 - 2019-12-30 - Wuhan-Hu-1
    https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2

Description files in this directory:

md5sum.txt - md5 sums to verify copied files
wuhCor1.7way.nameList.txt - relates each accession name to its sequence
    name and sample collection date
wuhCor1.7way.nh - phylogenetic tree used for the multiz alignment.
    The tree was calculated from 31-mer frequency similarity; the
    resulting distance matrix was joined by neighbor joining with the
    PHYLIP toolset:
        http://evolution.genetics.washington.edu/phylip.html
    'neighbor' command:
        http://evolution.genetics.washington.edu/phylip/progs.data.dist.html
wuhCor1.multiz7way.maf.gz - alignments with gap annotation; sequences
    are named by accession identifier
dnaFasta7.fa.tgz - gzipped tar file of the DNA fasta, 7 sequences
    - to extract sequences: tar xvzf dnaFasta7.fa.tgz
    - creates seven files:

#  -rw-rw-r-- 1 27718 May 13 08:44 CoV229E.fa
#  -rw-rw-r-- 1 30361 May 13 08:44 HKU1.fa
#  -rw-rw-r-- 1 30557 May 13 08:44 MERS.fa
#  -rw-rw-r-- 1 27954 May 13 08:44 NL63.fa
#  -rw-rw-r-- 1 31188 May 13 08:44 OC43.fa
#  -rw-rw-r-- 1 30190 May 13 09:37 SARS_CoV_1.fa
#  -rw-rw-r-- 1 30515 May 13 08:49 wuhCor1.fa

Example measurement of the sequences with the 'faCount' command:

#  faCount *.fa
#
#  #seq           len       A       C       G       T   N   cpg
#  CoV229E      27317    7420    4549    5903    9445   0   488
#  HKU1         29926    8331    3895    5699   12001   0   340
#  MERS         30119    7900    6116    6304    9799   0   711
#  NL63         27553    7253    3979    5516   10805   0   332
#  OC43         30741    8502    4660    6649   10930   0   485
#  SARS_CoV_1   29751    8481    5940    6187    9143   0   568
#  NC_045512v2  29903    8954    5492    5863    9594   0   439
#  total       205310   56841   34631   42121   71717   0  3363

For a description of multiple alignment format (MAF), see
http://genome.ucsc.edu/goldenPath/help/maf.html.
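
As an illustration of the MAF layout, the following is a minimal Python
sketch, not a UCSC tool, that groups the 's' (sequence) lines of a MAF
stream into alignment blocks. It relies only on the 's' line fields given
in the MAF description (src, start, size, strand, srcSize, text). The
function names read_maf_blocks and open_maf are hypothetical, and the
sample block is made up for the example; it is not taken from
wuhCor1.multiz7way.maf.gz.

```python
import gzip
import io


def open_maf(path):
    """Open a MAF file as text, transparently handling a .gz suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_maf_blocks(handle):
    """Yield each alignment block as a list of dicts, one per 's' line."""
    block = []
    for line in handle:
        fields = line.split()
        if not fields:
            # A blank line terminates the current block.
            if block:
                yield block
            block = []
        elif fields[0] == "a":
            # An 'a' line opens a new block; flush any unterminated one.
            if block:
                yield block
            block = []
        elif fields[0] == "s" and len(fields) >= 7:
            block.append({
                "src": fields[1],
                "start": int(fields[2]),      # zero-based start
                "size": int(fields[3]),       # aligned bases, gaps excluded
                "strand": fields[4],
                "src_size": int(fields[5]),   # full length of the source
                "text": fields[6],            # bases with '-' for gaps
            })
        # Other line types (comments, 'i', 'e', 'q') are skipped here.
    if block:
        yield block


# Made-up sample for illustration only; "example_seq" is hypothetical.
SAMPLE = """##maf version=1
a score=100.0
s NC_045512v2 0 10 + 29903 ATTAAAGGTT
s example_seq 0 8 + 29751 ATATTAGG--

"""

for blk in read_maf_blocks(io.StringIO(SAMPLE)):
    for row in blk:
        print(row["src"], row["start"], row["size"], row["text"])
```

To run the same loop over the downloaded alignment, pass
open_maf("wuhCor1.multiz7way.maf.gz") to read_maf_blocks in place of the
in-memory sample.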
---------------------------------------------------------------
To download a large file or multiple files from this directory, we
recommend that you use rsync or ftp rather than downloading the files
via our website.

Via rsync:

    rsync -avz --progress \
        rsync://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/multiz7way/ ./

Via FTP:

    ftp hgdownload.soe.ucsc.edu
    user name: anonymous
    password: <your email address>
    go to the directory goldenPath/wuhCor1/multiz7way

To download multiple files from the UNIX command line, use the "mget"
command.

    mget <filename1> <filename2> ...
    - or -
    mget -a (to download all the files in the directory)

Use the "prompt" command to toggle the interactive mode if you do not
want to be prompted for each file that you download.

---------------------------------------------------------------
All the files in this directory are freely usable for any purpose.
For data use restrictions regarding the individual genome assemblies,
see http://genome.ucsc.edu/goldenPath/credits.html.
---------------------------------------------------------------
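After copying files from this directory, they can be checked against
md5sum.txt. The following is a minimal Python sketch of such a check,
not a UCSC tool. It assumes md5sum.txt uses the layout written by the
md5sum utility (hex digest, whitespace, file name on each line); the
function names md5_of, parse_md5_listing and verify are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text):
    """Map file name -> expected digest.

    Assumption: each line is '<hex digest> <file name>', as md5sum writes.
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary mode with a leading '*' on the name.
            expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify(directory, listing_name="md5sum.txt"):
    """Return {file name: True/False} for every entry in the listing."""
    directory = Path(directory)
    expected = parse_md5_listing((directory / listing_name).read_text())
    results = {}
    for name, digest in expected.items():
        target = directory / name
        # A missing file counts as a failed check.
        results[name] = target.is_file() and md5_of(target) == digest
    return results


if __name__ == "__main__" and Path("md5sum.txt").is_file():
    for name, ok in sorted(verify(".").items()):
        print("OK     " if ok else "FAILED ", name)
```

Run from the directory that holds the downloaded files, this prints one
OK or FAILED line per entry in md5sum.txt.
---------------------------------------------------------------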