This is the human assembly CHM13 v2.0, made by the Telomere-to-Telomere (T2T) project. It contains sequences for centromeres and other difficult-to-sequence regions, and it has no gaps.

If you are new to genomics or unsure which assembly to use, note that the improvements concern regions of the genome that are not yet important for most analyses or diagnostic purposes. This is not the human reference genome sequence released by the Genome Reference Consortium, GRCh38 (aka hg38), which is currently used by most databases, aligners and diagnostic labs. If you are looking for the current human reference sequence, you may want to use https://hgdownload.ucsc.edu/goldenPath/hg38/bigZips/ rather than basing your analysis on CHM13.

Files:

hs1.fa.gz - "Soft-masked" CHM13 assembly sequence in one file.
    Repeats from RepeatMasker and Tandem Repeats Finder (with period of 12
    or less) are shown in lower case; non-repeating sequence is shown in
    upper case.

hs1.2bit - contains the complete human CHM13 genome sequence in the 2bit
    file format. Repeats from RepeatMasker and Tandem Repeats Finder (with
    period of 12 or less) are shown in lower case; non-repeating sequence is
    shown in upper case. The utility program twoBitToFa (available from the
    kent src tree) can be used to extract .fa file(s) from this file.
    A pre-compiled version of the command-line tool can be found at:
        http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
    See also:
        http://genome.ucsc.edu/admin/git.html
        http://genome.ucsc.edu/admin/jk-install.html

hs1.repeatMasker.out - RepeatMasker .out file. RepeatMasker was run with
    the -s (sensitive) setting.

hs1.repeatMasker.version.txt - version of RepeatMasker that was used.

genes/hs1.ncbiRefSeq.gp.gz - gene annotations made by NCBI RefSeq in UCSC
    genePred format.

genes/hs1.ncbiRefSeq.gtf.gz - gene annotations made by NCBI RefSeq in
    GFF/GTF format.
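Because hs1.fa.gz is soft-masked, the share of repeat sequence in each chromosome can be read directly from the letter case. The sketch below is a minimal illustration, not a UCSC tool: `softmask_fraction` is a hypothetical helper that assumes a plain FASTA layout and counts only lower-case a/c/g/t as masked (N bases are excluded from the total).

```python
import gzip
import sys
from typing import Dict, Iterable, List


def softmask_fraction(lines: Iterable[str]) -> Dict[str, float]:
    """Fraction of non-N bases written in lower case (soft-masked), per sequence.

    Simplification: only a/c/g/t count as masked; any lower-case IUPAC
    ambiguity codes are treated as unmasked.
    """
    counts: Dict[str, List[int]] = {}  # name -> [masked, non_N]
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            counts[name] = [0, 0]
            continue
        if name is None:
            raise ValueError("sequence data before the first '>' header")
        n_bases = line.count("N") + line.count("n")
        counts[name][0] += sum(line.count(b) for b in "acgt")
        counts[name][1] += len(line) - n_bases
    return {n: (m / t if t else 0.0) for n, (m, t) in counts.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python softmask.py hs1.fa.gz
    with gzip.open(sys.argv[1], "rt") as fh:
        for chrom, frac in softmask_fraction(fh).items():
            print(f"{chrom}\t{frac:.3f}")
```

Counting with `str.count` per line avoids a per-character Python loop, which matters for a file of about 3 Gb.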
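To show what twoBitToFa does with hs1.2bit, here is a small pure-Python decoder for the .2bit layout: a 16-byte header (signature 0x1A412743, version, sequence count, reserved), an index of names and offsets, then per sequence its length, N blocks, mask (lower-case) blocks, and DNA packed four bases per byte with T=0, C=1, A=2, G=3. `read_2bit` is a hypothetical helper; it loads the whole file into memory, and the 64-bit offsets for version 1 files are an assumption of this sketch. For the full genome, twoBitToFa is the practical tool.

```python
import struct
from typing import Dict, List, Tuple

TWOBIT_SIGNATURE = 0x1A412743
CODE_TO_BASE = "TCAG"  # 2-bit codes 0, 1, 2, 3


def _u32s(data: bytes, pos: int, n: int, endian: str) -> Tuple[List[int], int]:
    """Read n unsigned 32-bit ints at pos; return them and the new position."""
    end = pos + 4 * n
    return list(struct.unpack(f"{endian}{n}I", data[pos:end])), end


def read_2bit(data: bytes) -> Dict[str, str]:
    """Decode every sequence in an in-memory .2bit file, keeping soft-masking."""
    # The signature tells us the byte order the file was written in.
    for endian in ("<", ">"):
        sig, version, count, _reserved = struct.unpack(endian + "4I", data[:16])
        if sig == TWOBIT_SIGNATURE:
            break
    else:
        raise ValueError("not a .2bit file (bad signature)")

    # Index: name length (1 byte), name, offset of the sequence record.
    # Assumption: version 1 uses 64-bit offsets, version 0 uses 32-bit.
    off_fmt = endian + ("Q" if version == 1 else "I")
    off_size = struct.calcsize(off_fmt)
    pos = 16
    index = []
    for _ in range(count):
        name_len = data[pos]
        name = data[pos + 1:pos + 1 + name_len].decode("ascii")
        pos += 1 + name_len
        (offset,) = struct.unpack(off_fmt, data[pos:pos + off_size])
        pos += off_size
        index.append((name, offset))

    seqs = {}
    for name, offset in index:
        (dna_size,), p = _u32s(data, offset, 1, endian)
        (n_count,), p = _u32s(data, p, 1, endian)
        n_starts, p = _u32s(data, p, n_count, endian)
        n_sizes, p = _u32s(data, p, n_count, endian)
        (m_count,), p = _u32s(data, p, 1, endian)
        m_starts, p = _u32s(data, p, m_count, endian)
        m_sizes, p = _u32s(data, p, m_count, endian)
        p += 4  # reserved word
        packed = data[p:p + (dna_size + 3) // 4]
        # Four bases per byte, most significant bits first; drop the padding.
        bases = [CODE_TO_BASE[(byte >> shift) & 3]
                 for byte in packed for shift in (6, 4, 2, 0)][:dna_size]
        for start, size in zip(n_starts, n_sizes):
            bases[start:start + size] = "N" * size
        for start, size in zip(m_starts, m_sizes):
            bases[start:start + size] = [b.lower() for b in bases[start:start + size]]
        seqs[name] = "".join(bases)
    return seqs
```

Usage on a small file: `with open("small.2bit", "rb") as fh: seqs = read_2bit(fh.read())`. On hs1.2bit, running `twoBitToFa hs1.2bit hs1.fa` is far more memory-efficient than this sketch.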
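The hs1.repeatMasker.out file is whitespace-separated, with three header lines followed by one row per repeat: SW score, percent divergence/deletion/insertion, query sequence, begin, end, (left), strand ("+" or "C" for complement), repeat name, class/family, repeat positions, and ID. The sketch below is a hypothetical parser under that column layout; note that .out coordinates are 1-based and inclusive, so converting to BED only shifts the start.

```python
from typing import Iterable, Iterator, NamedTuple


class RepeatHit(NamedTuple):
    chrom: str
    start: int         # 1-based, inclusive (as written in the .out file)
    end: int           # 1-based, inclusive
    strand: str        # "+" or "-" ("C" in the file means complement)
    repeat_name: str
    repeat_class: str  # class/family, e.g. "LINE/L1"
    divergence: float  # percent divergence from the consensus


def parse_rmsk_out(lines: Iterable[str]) -> Iterator[RepeatHit]:
    for line in lines:
        fields = line.split()
        # Header and blank lines do not start with a numeric SW score.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        yield RepeatHit(
            chrom=fields[4],
            start=int(fields[5]),
            end=int(fields[6]),
            strand="+" if fields[8] == "+" else "-",
            repeat_name=fields[9],
            repeat_class=fields[10],
            divergence=float(fields[1]),
        )


def to_bed(hit: RepeatHit) -> str:
    """BED is 0-based, half-open, so only the start moves by one."""
    return f"{hit.chrom}\t{hit.start - 1}\t{hit.end}\t{hit.repeat_name}\t0\t{hit.strand}"
```

Rows flagged with a trailing `*` (overlapping a higher-scoring match) still have at least 15 fields and are kept by this sketch; filter on `len(fields) == 16` if you want to drop them.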
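Records in genes/hs1.ncbiRefSeq.gp.gz follow UCSC genePred conventions: 0-based, half-open coordinates, with exon starts and ends stored as comma-separated lists. The sketch below assumes tab-separated lines whose first ten columns are the standard genePred fields (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with no leading bin column; extra genePredExt columns are ignored. `parse_genepred`, `exonic_length` and `coding_length` are hypothetical helpers.

```python
from typing import List, NamedTuple, Tuple


class GenePred(NamedTuple):
    name: str
    chrom: str
    strand: str
    tx_start: int  # 0-based
    tx_end: int    # exclusive
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]  # (start, end), 0-based, half-open


def parse_genepred(line: str) -> GenePred:
    """Parse the first 10 genePred columns; assumes no leading 'bin' column."""
    f = line.rstrip("\n").split("\t")
    starts = [int(x) for x in f[8].split(",") if x]  # lists end with a comma
    ends = [int(x) for x in f[9].split(",") if x]
    if not (len(starts) == len(ends) == int(f[7])):
        raise ValueError(f"exonCount mismatch in {f[0]}")
    return GenePred(f[0], f[1], f[2], int(f[3]), int(f[4]),
                    int(f[5]), int(f[6]), list(zip(starts, ends)))


def exonic_length(gp: GenePred) -> int:
    """Total transcript length: sum of exon lengths (no +1 needed, half-open)."""
    return sum(end - start for start, end in gp.exons)


def coding_length(gp: GenePred) -> int:
    """Exonic bases inside [cdsStart, cdsEnd); 0 when cdsStart == cdsEnd."""
    return sum(max(0, min(end, gp.cds_end) - max(start, gp.cds_start))
               for start, end in gp.exons)
```

Because genePred intervals are half-open, lengths are plain differences; this is the main practical difference from the 1-based, inclusive coordinates in the GTF file.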
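The same annotations in genes/hs1.ncbiRefSeq.gtf.gz use nine tab-separated GTF columns (seqname, source, feature, start, end, score, strand, frame, attributes) with 1-based, inclusive coordinates and `key "value";` pairs in the last column. The sketch below is a hypothetical reader that assumes attribute values are quoted; unquoted values would be skipped by its regular expression. It counts exons per transcript as an example.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional

# key "value" pairs; the key must start with a letter or underscore.
ATTR_RE = re.compile(r'([A-Za-z_]\w*)\s+"([^"]*)"')


def parse_gtf_line(line: str) -> Optional[dict]:
    """Return a dict for one GTF record, or None for comments/malformed lines."""
    if line.startswith("#"):
        return None
    f = line.rstrip("\n").split("\t")
    if len(f) != 9:
        return None
    attrs: Dict[str, str] = {}
    for key, value in ATTR_RE.findall(f[8]):
        attrs.setdefault(key, value)  # keep the first value of repeated keys
    return {"chrom": f[0], "source": f[1], "feature": f[2],
            "start": int(f[3]), "end": int(f[4]),  # 1-based, inclusive
            "strand": f[6], "attrs": attrs}


def exons_per_transcript(lines: Iterable[str]) -> Dict[str, int]:
    counts: Dict[str, int] = defaultdict(int)
    for line in lines:
        rec = parse_gtf_line(line)
        if rec and rec["feature"] == "exon":
            counts[rec["attrs"]["transcript_id"]] += 1
    return dict(counts)
```

Usage: `with gzip.open("genes/hs1.ncbiRefSeq.gtf.gz", "rt") as fh: counts = exons_per_transcript(fh)` (after `import gzip`).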