This directory contains FASTA files holding a modified version of the
Dec. 2013 (GRCh38/hg38) reference human genome assembly. The chromosomal
sequences were assembled by the International Human Genome Project
sequencing centers.

The assembly sequence was changed to use IUPAC ambiguous nucleotide
characters at each base covered by a stringently filtered subset of
single-base substitutions annotated by dbSNP build 149. For example, if
the assembly has an 'A' at a position where dbSNP has annotated an A/C/T
substitution SNP, the 'A' is replaced by 'H' in the FASTA file here.

dbSNP single-base substitutions were excluded from masking in the
following cases:
- UCSC tagged the dbSNP item with any of these exceptions (see also the
  exceptions field of the hg38.snp149 database table as well as the
  hg38.snp149ExceptionDesc table):
  - MultipleAlignments: dbSNP mapped the item to multiple locations.
  - ObservedMismatch: the reference allele does not appear in the item's
    observed alleles.
  - ObservedWrongFormat: the observed sequence has an unexpected format.
- dbSNP item class is not "single".
- dbSNP item length is not exactly one base.
- dbSNP item weight is greater than 1 (lower weight = higher confidence).

The remaining single-base substitutions were used to mask the genomic
sequence.

Files included in this directory:

  chr*.subst.fa.gz - FASTA files with IUPAC characters for substitution SNPs

  md5sum.txt - checksums of files in this directory

------------------------------------------------------------------
If you plan to download a large file or multiple files from this
directory, we recommend that you use ftp rather than downloading the
files via our website. To do so, ftp to hgdownload.cse.ucsc.edu
[username: anonymous, password: your email address], then cd to the
directory goldenPath/hg38/snp149Mask. To download multiple files, use
the "mget" command:

    mget <filename1> <filename2> ...
    - or -
    mget -a (to download all the files in the directory)

Alternative methods to ftp access:
Using an rsync command to download the entire directory:
    rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/hg38/snp149Mask/ .
For a single file, e.g. chr1.subst.fa.gz:
    rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/hg38/snp149Mask/chr1.subst.fa.gz .

Or with wget, all files:
    wget --timestamping 'ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/snp149Mask/*'
With wget, a single file:
    wget --timestamping 'ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/snp149Mask/chr1.subst.fa.gz' -O chr1.subst.fa.gz

To uncompress the fa.gz files:
    gunzip <file>.fa.gz
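
------------------------------------------------------------------
Illustration of the masking rule

The substitution of IUPAC characters described at the top of this file
can be sketched in a few lines of Python. This is only an illustration
of the rule, not the code UCSC used to build these files. The function
names iupac_for_alleles and mask_base are hypothetical, the observed
alleles are assumed to be given as a slash-separated string such as
"A/C/T", and handling of lowercase (soft-masked) bases is not addressed.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases each represents.
IUPAC_CODES = {
    frozenset("A"): "A",
    frozenset("C"): "C",
    frozenset("G"): "G",
    frozenset("T"): "T",
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
    frozenset("CGT"): "B",
    frozenset("AGT"): "D",
    frozenset("ACT"): "H",
    frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_for_alleles(observed):
    """Return the IUPAC code for alleles written as e.g. 'A/C/T'."""
    bases = frozenset(observed.upper().split("/"))
    return IUPAC_CODES[bases]


def mask_base(ref_base, observed):
    """Hypothetical helper: mask one reference base with its IUPAC code.

    If the reference base is not among the observed alleles (the
    ObservedMismatch case), the base is returned unchanged.
    """
    alleles = set(observed.upper().split("/"))
    if ref_base.upper() not in alleles:
        return ref_base
    return iupac_for_alleles(observed)


if __name__ == "__main__":
    # The example from this README: 'A' with an A/C/T substitution SNP.
    print(mask_base("A", "A/C/T"))
```

Under these assumptions, the README's example (an 'A' in the assembly
with an A/C/T substitution) yields 'H', and a base whose SNP fails the
ObservedMismatch check is left as it was.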