================================================================ ======== addCols ==================================== ================================================================ addCols - Sum columns in a text file. usage: addCols <fileName> adds all columns (up to 16 columns) in the given file and outputs the sum of each column. <fileName> can be the name: stdin to accept input from stdin. ================================================================ ======== ameme ==================================== ================================================================ ameme - find common patterns in DNA usage: ameme good=goodIn.fa [bad=badIn.fa] [numMotifs=2] [background=m1] [maxOcc=2] [motifOutput=fileName] [html=output.html] [gif=output.gif] [rcToo=on] [controlRun=on] [startScanLimit=20] [outputLogo] [constrainer=1] where goodIn.fa is a multi-sequence fa file containing instances of the motif you want to find, badIn.fa is a file containing similar sequences but lacking the motif, numMotifs is the number of motifs to scan for, background is m0, m1, or m2 for various levels of Markov models, maxOcc is the maximum number of occurrences of the motif you expect to find in a single sequence, and motifOutput is the name of a file to store just the motifs in. rcToo=on searches both strands. If you include controlRun=on in the command line, a random set of sequences will be generated that matches your foreground data set in size and your background data set in nucleotide probabilities. The program will then look for motifs in this random set. If the scores you get in a real run are about the same as those you get in a control run, then the motifs Improbizer has found are probably not significant. ================================================================ ======== autoDtd ==================================== ================================================================ autoDtd - Give this an XML document to look at and it will come up with a DTD to describe it. usage: autoDtd in.xml out.dtd out.stats options: -tree=out.tree - Output tag tree. -atree=out.atree - Output attributed tag tree. ================================================================ ======== autoSql ==================================== ================================================================ autoSql - create SQL and C code for permanently storing a structure in a database and loading it back into memory based on a specification file usage: autoSql specFile outRoot {optional: -dbLink -withNull -json} This will create outRoot.sql, outRoot.c and outRoot.h based on the contents of specFile. options: -dbLink - optionally generates code to execute queries and updates of the table. -addBin - Add an initial bin field and index it as (chrom,bin) -withNull - optionally generates code and .sql to enable applications to accept and load data into objects with potential 'missing data' (NULL in SQL) situations. -defaultZeros - will put zero and/or empty string as default value -django - generate method to output object as django model Python code -json - generate method to output the object in JSON (JavaScript) format. ================================================================ ======== autoXml ==================================== ================================================================ autoXml - Generate structure code and a parser for an XML file from a DTD-like spec usage: autoXml file.dtdx root This will generate root.c and root.h options: -textField=xxx what to name text between start/end tags.
Default 'text' -comment=xxx Comment to appear at top of generated code files -picky Generate parser that rejects stuff it doesn't understand -main Put in a main routine that's a test harness -prefix=xxx Prefix to add to structure names. By default same as root -positive Don't write out optional attributes with negative values ================================================================ ======== ave ==================================== ================================================================ ave - Compute average and basic stats usage: ave file options: -col=N Which column to use. Default 1 -tableOut - output by columns (default output in rows) -noQuartiles - only calculate min,max,mean,standard deviation - for large data sets that will not fit in memory. ================================================================ ======== aveCols ==================================== ================================================================ aveCols - average together columns usage: aveCols file adds all columns (up to 16 columns) in the given file, outputs the average (sum/#ofRows) of each column. can be the name: stdin to accept input from stdin. ================================================================ ======== axtChain ==================================== ================================================================ axtChain - Chain together axt alignments. usage: axtChain -linearGap=loose in.axt tNibDir qNibDir out.chain Where tNibDir/qNibDir are either directories full of nib files, or the name of a .2bit file options: -psl Use psl instead of axt format for input -faQ qNibDir is a fasta file with multiple sequences for query -faT tNibDir is a fasta file with multiple sequences for target -minScore=N Minimum score for chain, default 1000 -details=fileName Output some additional chain details -scoreScheme=fileName Read the scoring matrix from a blastz-format file -linearGap= Specify type of linearGap to use. *Must* specify this argument to one of these choices. loose is chicken/human linear gap costs. medium is mouse/human linear gap costs. Or specify a piecewise linearGap tab delimited file. sample linearGap file (loose) tablesize 11 smallSize 111 position 1 2 3 11 111 2111 12111 32111 72111 152111 252111 qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600 tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600 bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000 ================================================================ ======== axtSort ==================================== ================================================================ axtSort - Sort axt files usage: axtSort in.axt out.axt options: -query - Sort by query position, not target -byScore - Sort by score ================================================================ ======== axtSwap ==================================== ================================================================ axtSwap - Swap source and query in an axt file usage: axtSwap source.axt target.sizes query.sizes dest.axt options: -xxx=XXX ================================================================ ======== axtToMaf ==================================== ================================================================ axtToMaf - Convert from axt to maf format usage: axtToMaf in.axt tSizes qSizes out.maf Where tSizes and qSizes is a file that contains the sizes of the target and query sequences. Very often this with be a chrom.sizes file Options: -qPrefix=XX. - add XX. 
to start of query sequence name in maf -tPrefex=YY. - add YY. to start of target sequence name in maf -tSplit Create a separate maf file for each target sequence. In this case output is a dir rather than a file In this case in.maf must be sorted by target. -score - recalculate score -scoreZero - recalculate score if zero ================================================================ ======== axtToPsl ==================================== ================================================================ axtToPsl - Convert axt to psl format usage: axtToPsl in.axt tSizes qSizes out.psl Where tSizes and qSizes are tab-delimited files with columns. options: -xxx=XXX ================================================================ ======== bedClip ==================================== ================================================================ bedClip - Remove lines from bed file that refer to off-chromosome places. usage: bedClip input.bed chrom.sizes output.bed options: -verbose=2 - set to get list of lines clipped and why ================================================================ ======== bedCommonRegions ==================================== ================================================================ bedCommonRegions - Create a bed file (just bed3) that contains the regions common to all inputs. Regions are common only if exactly the same chromosome, starts, and end. Overlap is not enough. Each region must be in each input at most once. Output is stdout. usage: bedCommonRegions file1 file2 file3 ... fileN ================================================================ ======== bedCoverage ==================================== ================================================================ bedCoverage - Analyse coverage by bed files - chromosome by chromosome and genome-wide. usage: bedCoverage database bedFile Note bed file must be sorted by chromosome -restrict=restrict.bed Restrict to parts in restrict.bed ================================================================ ======== bedExtendRanges ==================================== ================================================================ bedExtendRanges - extend length of entries in bed 6+ data to be at least the given length, taking strand directionality into account. usage: bedExtendRanges database length files(s) options: -host mysql host -user mysql user -password mysql password -tab Separate by tabs rather than space -verbose=N - verbose level for extra information to STDERR example: bedExtendRanges hg18 250 stdin bedExtendRanges -user=genome -host=genome-mysql.cse.ucsc.edu hg18 250 stdin will transform: chr1 500 525 . 100 + chr1 1000 1025 . 100 - to: chr1 500 750 . 100 + chr1 775 1025 . 100 - ================================================================ ======== bedGeneParts ==================================== ================================================================ bedGeneParts - Given a bed, spit out promoter, first exon, or all introns. usage: bedGeneParts part in.bed out.bed Where part is either 'exons' or 'firstExon' or 'introns' or 'promoter' or 'firstCodingSplice' or 'secondCodingSplice' options: -proStart=NN - start of promoter relative to txStart, default -100 -proEnd=NN - end of promoter relative to txStart, default 50 ================================================================ ======== bedGraphToBigWig ==================================== ================================================================ bedGraphToBigWig v 4 - Convert a bedGraph file to bigWig format. 
usage: bedGraphToBigWig in.bedGraph chrom.sizes out.bw where in.bedGraph is a four column file in the format: <chrom> <start> <end> <value> and chrom.sizes is two column: <chromosome name> <size> and out.bw is the output indexed big wig file. Use the script: fetchChromSizes to obtain the actual chrom.sizes information from UCSC; please do not make up chrom sizes from your own information. The input bedGraph file must be sorted; use the unix sort command: sort -k1,1 -k2,2n unsorted.bedGraph > sorted.bedGraph options: -blockSize=N - Number of items to bundle in r-tree. Default 256 -itemsPerSlot=N - Number of data points bundled at lowest level. Default 1024 -unc - If set, do not use compression. ================================================================ ======== bedIntersect ==================================== ================================================================ bedIntersect - Intersect two bed files usage: bedIntersect a.bed b.bed output.bed options: -aHitAny output all of a if any of it is hit by b -minCoverage=0.N min coverage of b to output match (or if -aHitAny, of a). Not applied to 0-length items. Default 0.000010 -bScore output score from b.bed (must be at least 5 field bed) -tab chop input at tabs not spaces -allowStartEqualEnd Don't discard 0-length items of a or b (e.g. point insertions) ================================================================ ======== bedItemOverlapCount ==================================== ================================================================ bedItemOverlapCount - count number of times a base is overlapped by the items in a bed file. Output is bedGraph 4 to stdout. usage: sort bedFile.bed | bedItemOverlapCount [options] stdin To create a bigWig file from this data to use in a custom track: sort -k1,1 bedFile.bed | bedItemOverlapCount [options] stdin \ > bedFile.bedGraph bedGraphToBigWig bedFile.bedGraph chrom.sizes bedFile.bw where the chrom.sizes is obtained with the script: fetchChromSizes See also: http://genome-test.cse.ucsc.edu/~kent/src/unzipped/utils/userApps/fetchChromSizes options: -zero add blocks with zero count, normally these are omitted -bed12 expect bed12 and count based on blocks Without this option, only the first three fields are used. -max if counts per base overflow, set to max (4294967295) instead of exiting -outBounds output min/max to stderr -chromSize=sizefile Read chrom sizes from file instead of database sizefile contains two white space separated fields per line: chrom name and size -host=hostname mysql host used to get chrom sizes -user=username mysql user -password=password mysql password Notes: * You may want to separate your + and - strand items before sending into this program as it only looks at the chrom, start and end columns of the bed file. * Program requires a database connection to lookup chrom sizes for a sanity check of the incoming data. Even when the -chromSize argument is used the database must be present, but it will not be used. * The bed file *must* be sorted by chrom * Maximum count per base is 4294967295. Recompile with new unitSize to increase this ================================================================ ======== bedPileUps ==================================== ================================================================ bedPileUps - Find (exact) overlaps if any in bed input usage: bedPileUps in.bed Where in.bed is in one of the ascii bed formats.
The in.bed file must be sorted by chromosome,start, to sort a bed file, use the unix sort command: sort -k1,1 -k2,2n unsorted.bed > sorted.bed Options: -name - include BED name field 4 when evaluating uniqueness -tab - use tabs to parse fields -verbose=2 - show the location and size of each pileUp ================================================================ ======== bedRemoveOverlap ==================================== ================================================================ bedRemoveOverlap - Remove overlapping records from a (sorted) bed file. Gets rid of `the smaller of overlapping records. usage: bedRemoveOverlap in.bed out.bed options: -xxx=XXX ================================================================ ======== bedRestrictToPositions ==================================== ================================================================ bedRestrictToPositions - Filter bed file, restricting to only ones that match chrom/start/ends specified in restrict.bed file. usage: bedRestrictToPositions in.bed restrict.bed out.bed options: -xxx=XXX ================================================================ ======== bedSort ==================================== ================================================================ bedSort - Sort a .bed file by chrom,chromStart usage: bedSort in.bed out.bed in.bed and out.bed may be the same. ================================================================ ======== bedToBigBed ==================================== ================================================================ bedToBigBed v. 2.5 - Convert bed file to bigBed. (BigBed version: 4) usage: bedToBigBed in.bed chrom.sizes out.bb Where in.bed is in one of the ascii bed formats, but not including track lines and chrom.sizes is two column: and out.bb is the output indexed big bed file. Use the script: fetchChromSizes to obtain the actual chrom.sizes information from UCSC, please do not make up a chrom sizes from your own information. The in.bed file must be sorted by chromosome,start, to sort a bed file, use the unix sort command: sort -k1,1 -k2,2n unsorted.bed > sorted.bed options: -type=bedN[+[P]] : N is between 3 and 15, optional (+) if extra "bedPlus" fields, optional P specifies the number of extra fields. Not required, but preferred. Examples: -type=bed6 or -type=bed6+ or -type=bed6+3 (see http://genome.ucsc.edu/FAQ/FAQformat.html#format1) -as=fields.as - If you have non-standard "bedPlus" fields, it's great to put a definition of each field in a row in AutoSql format here. -blockSize=N - Number of items to bundle in r-tree. Default 256 -itemsPerSlot=N - Number of data points bundled at lowest level. Default 512 -unc - If set, do not use compression. -tab - If set, expect fields to be tab separated, normally expects white space separator. -extraIndex=fieldList - If set, make an index on each field in a comma separated list extraIndex=name and extraIndex=name,id are commonly used. ================================================================ ======== bedToExons ==================================== ================================================================ bedToExons - Split a bed up into individual beds. One for each internal exon. usage: bedToExons originalBeds.bed splitBeds.bed options: -cdsOnly - Only output the coding portions of exons. 
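Illustrative example of bedToExons (the file names here are invented for the sketch, not taken from the tool's help):
  bedToExons refGenes.bed refGeneExons.bed
  bedToExons -cdsOnly refGenes.bed refGeneCodingExons.bed
The second form writes only the coding portion of each exon, per the -cdsOnly option above.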
================================================================ ======== bedToGenePred ==================================== ================================================================ bedToGenePred - convert bed format files to genePred format usage: bedToGenePred bedFile genePredFile Convert a bed file to a genePred file. If BED has at least 12 columns, then a genePred with blocks is created. Otherwise single-exon genePreds are created. ================================================================ ======== bedToPsl ==================================== ================================================================ bedToPsl - convert bed format files to psl format usage: bedToPsl chromSizes bedFile pslFile Convert a BED file to a PSL file. The result is an alignment. It is intended to allow processing by tools that operate on PSL. If the BED has at least 12 columns, then a PSL with blocks is created. Otherwise single-exon PSLs are created. Options: -keepQuery - instead of creating a fake query, create PSL with identical query and target specs. Useful if bed features are to be lifted with pslMap and one wants to keep the source location in the lift result. ================================================================ ======== bedWeedOverlapping ==================================== ================================================================ bedWeedOverlapping - Filter out beds that overlap a 'weed.bed' file. usage: bedWeedOverlapping weeds.bed input.bed output.bed options: -maxOverlap=0.N - maximum overlapping ratio, default 0 (any overlap) -invert - keep the overlapping and get rid of everything else ================================================================ ======== bigBedInfo ==================================== ================================================================ bigBedInfo - Show information about a bigBed file. usage: bigBedInfo file.bb options: -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs -chroms - list all chromosomes and their sizes -zooms - list all zoom levels and their sizes -as - get autoSql spec -extraIndex - list all the extra indexes ================================================================ ======== bigBedNamedItems ==================================== ================================================================ bigBedNamedItems - Extract item of given name from bigBed usage: bigBedNamedItems file.bb name output.bed options: -nameFile - if set, treat name parameter as file full of space delimited names -field=fieldName - use index on field name, default is "name" ================================================================ ======== bigBedSummary ==================================== ================================================================ bigBedSummary - Extract summary information from a bigBed file. usage: bigBedSummary file.bb chrom start end dataPoints Get summary data from bigBed for indicated region, broken into dataPoints equal parts. (Use dataPoints=1 for simple summary.) options: -type=X where X is one of: coverage - % of region that is covered (default) mean - average depth of covered regions min - minimum depth of covered regions max - maximum depth of covered regions -fields - print out information on fields in file. If the -fields option is used, the chrom, start, end, dataPoints parameters may be omitted -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
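A usage sketch for bigBedSummary (the file name and coordinates are made up for illustration):
  bigBedSummary -type=coverage annotations.bb chr1 1000000 2000000 10
This would report, for ten equal windows across chr1:1000000-2000000, the fraction of bases covered by items in annotations.bb.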
================================================================ ======== bigBedToBed ==================================== ================================================================ bigBedToBed - Convert from bigBed to ascii bed format. usage: bigBedToBed input.bb output.bed options: -chrom=chr1 - if set, restrict output to given chromosome -start=N - if set, restrict output to only that over start -end=N - if set, restrict output to only that under end -maxItems=N - if set, restrict output to first N items -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs ================================================================ ======== bigWigAverageOverBed ==================================== ================================================================ bigWigAverageOverBed - Compute average score of big wig over each bed, which may have introns. usage: bigWigAverageOverBed in.bw in.bed out.tab The output columns are: name - name field from bed, which should be unique size - size of bed (sum of exon sizes) covered - # bases within exons covered by bigWig sum - sum of values over all bases covered mean0 - average over bases with non-covered bases counting as zeroes mean - average over just covered bases Options: -bedOut=out.bed - Make output bed that is echo of input bed but with mean column appended -sampleAroundCenter=N - Take sample at region N bases wide centered around bed item, rather than the usual sample in the bed item. ================================================================ ======== bigWigCorrelate ==================================== ================================================================ bigWigCorrelate - Correlate bigWig files, optionally only on target regions. usage: bigWigCorrelate a.bigWig b.bigWig options: -restrict=restrict.bigBed - restrict correlation to parts covered by this file -threshold=N.N - clip values to this threshold ================================================================ ======== bigWigInfo ==================================== ================================================================ bigWigInfo - Print out information about bigWig file. usage: bigWigInfo file.bw options: -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs -chroms - list all chromosomes and their sizes -zooms - list all zoom levels and their sizes -minMax - list the min and max on a single line ================================================================ ======== bigWigMerge ==================================== ================================================================ bigWigMerge - Merge together multiple bigWigs into a single output bedGraph. You'll have to run bedGraphToBigWig to make the output bigWig. The signal values are just added together to merge them. usage: bigWigMerge in1.bw in2.bw .. inN.bw out.bedGraph options: -threshold=0.N - don't output values at or below this threshold. Default is 0.0 -adjust=0.N - add adjustment to each value -clip=NNN.N - values higher than this are clipped to this value ================================================================ ======== bigWigSummary ==================================== ================================================================ bigWigSummary - Extract summary information from a bigWig file.
usage: bigWigSummary file.bigWig chrom start end dataPoints Get summary data from bigWig for indicated region, broken into dataPoints equal parts. (Use dataPoints=1 for simple summary.) NOTE: start and end coordinates are in BED format (0-based) options: -type=X where X is one of: mean - average value in region (default) min - minimum value in region max - maximum value in region std - standard deviation in region coverage - % of region that is covered -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs ================================================================ ======== bigWigToBedGraph ==================================== ================================================================ bigWigToBedGraph - Convert from bigWig to bedGraph format. usage: bigWigToBedGraph in.bigWig out.bedGraph options: -chrom=chr1 - if set restrict output to given chromosome -start=N - if set, restrict output to only that over start -end=N - if set, restict output to only that under end -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs ================================================================ ======== bigWigToWig ==================================== ================================================================ bigWigToWig - Convert bigWig to wig. This will keep more of the same structure of the original wig than bigWigToBedGraph does, but still will break up large stepped sections into smaller ones. usage: bigWigToWig in.bigWig out.wig options: -chrom=chr1 - if set restrict output to given chromosome -start=N - if set, restrict output to only that over start -end=N - if set, restict output to only that under end -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs ================================================================ ======== blastToPsl ==================================== ================================================================ blastToPsl - Convert blast alignments to PSLs. usage: blastToPsl [options] blastOutput psl Options: -scores=file - Write score information to this file. Format is: strands qName qStart qEnd tName tStart tEnd bitscore eVal -verbose=n - n >= 3 prints each line of file after parsing. n >= 4 dumps the result of each query -eVal=n n is e-value threshold to filter results. Format can be either an integer, double or 1e-10. Default is no filter. -pslx - create PSLX output (includes sequences for blocks) Output only results of last round from PSI BLAST ================================================================ ======== blastXmlToPsl ==================================== ================================================================ blastXmlToPsl - convert blast XML output to PSLs usage: blastXmlToPsl [options] blastXml psl options: -scores=file - Write score information to this file. Format is: strands qName qStart qEnd tName tStart tEnd bitscore eVal qDef tDef -verbose=n - n >= 3 prints each line of file after parsing. n >= 4 dumps the result of each query -eVal=n n is e-value threshold to filter results. Format can be either an integer, double or 1e-10. Default is no filter. -pslx - create PSLX output (includes sequences for blocks) -convertToNucCoords - convert protein to nucleic alignments to nucleic to nucleic coordinates -qName=src - define element used to obtain the qName. The following values are support: o query-ID - use contents of the element if it exists, otherwise use o query-def0 - use the first white-space separated word of the element if it exists, otherwise the first word of . 
Default is query-def0. -tName=src - define element used to obtain the tName. The following values are supported: o Hit_id - use contents of the <Hit_id> element. o Hit_def0 - use the first white-space separated word of the <Hit_def> element. o Hit_accession - contents of the <Hit_accession> element. Default is Hit_def0. -forcePsiBlast - treat as output of PSI-BLAST. blast-2.2.16 and maybe others identify psiblast as blastp. Output only results of last round from PSI BLAST ================================================================ ======== blat ==================================== ================================================================ blat - Standalone BLAT v. 35x1 fast sequence search command line tool usage: blat database query [-ooc=11.ooc] output.psl where: database and query are each either a .fa, .nib or .2bit file, or a list of these files, one file name per line. -ooc=11.ooc tells the program to load over-occurring 11-mers from an external file. This will increase the speed by a factor of 40 in many cases, but is not required. output.psl is where to put the output. Subranges of nib and .2bit files may be specified using the syntax: /path/file.nib:seqid:start-end or /path/file.2bit:seqid:start-end or /path/file.nib:start-end With the second form, a sequence id of file:start-end will be used. options: -t=type Database type. Type is one of: dna - DNA sequence prot - protein sequence dnax - DNA sequence translated in six frames to protein The default is dna -q=type Query type. Type is one of: dna - DNA sequence rna - RNA sequence prot - protein sequence dnax - DNA sequence translated in six frames to protein rnax - DNA sequence translated in three frames to protein The default is dna -prot Synonymous with -t=prot -q=prot -ooc=N.ooc Use overused tile file N.ooc. N should correspond to the tileSize -tileSize=N sets the size of match that triggers an alignment. Usually between 8 and 12. Default is 11 for DNA and 5 for protein. -stepSize=N spacing between tiles. Default is tileSize. -oneOff=N If set to 1 this allows one mismatch in a tile and still triggers an alignment. Default is 0. -minMatch=N sets the number of tile matches. Usually set from 2 to 4. Default is 2 for nucleotide, 1 for protein. -minScore=N sets minimum score. This is the matches minus the mismatches minus some sort of gap penalty. Default is 30 -minIdentity=N Sets minimum sequence identity (in percent). Default is 90 for nucleotide searches, 25 for protein or translated protein searches. -maxGap=N sets the size of maximum gap between tiles in a clump. Usually set from 0 to 3. Default is 2. Only relevant for minMatch > 1. -noHead suppress .psl header (so it's just a tab-separated file) -makeOoc=N.ooc Make overused tile file. Target needs to be complete genome. -repMatch=N sets the number of repetitions of a tile allowed before it is marked as overused. Typically this is 256 for tileSize 12, 1024 for tile size 11, 4096 for tile size 10. Default is 1024. Typically only comes into play with makeOoc. Also affected by stepSize. When stepSize is halved repMatch is doubled to compensate. -mask=type Mask out repeats. Alignments won't be started in masked region but may extend through it in nucleotide searches. Masked areas are ignored entirely in protein or translated searches. Types are: lower - mask out lower cased sequence upper - mask out upper cased sequence out - mask according to database.out RepeatMasker .out file file.out - mask database according to RepeatMasker file.out -qMask=type Mask out repeats in query sequence.
Similar to -mask above but for query rather than target sequence. -repeats=type Type is same as mask types above. Repeat bases will not be masked in any way, but matches in repeat areas will be reported separately from matches in other areas in the psl output. -minRepDivergence=NN - minimum percent divergence of repeats to allow them to be unmasked. Default is 15. Only relevant for masking using RepeatMasker .out files. -dots=N Output dot every N sequences to show program's progress -trimT Trim leading poly-T -noTrimA Don't trim trailing poly-A -trimHardA Remove poly-A tail from qSize as well as alignments in psl output -fastMap Run for fast DNA/DNA remapping - not allowing introns, requiring high %ID. Query sizes must not exceed 5000. -out=type Controls output file format. Type is one of: psl - Default. Tab separated format, no sequence pslx - Tab separated format with sequence axt - blastz-associated axt format maf - multiz-associated maf format sim4 - similar to sim4 format wublast - similar to wublast format blast - similar to NCBI blast format blast8- NCBI blast tabular format blast9 - NCBI blast tabular format with comments -fine For high quality mRNAs look harder for small initial and terminal exons. Not recommended for ESTs -maxIntron=N Sets maximum intron size. Default is 750000 -extendThroughN - Allows extension of alignment through large blocks of N's ================================================================ ======== calc ==================================== ================================================================ calc - Little command line calculator usage: calc this + that * theOther / (a + b) ================================================================ ======== catDir ==================================== ================================================================ catDir - concatenate files in directory to stdout. For those times when too many files for cat to handle. usage: catDir dir(s) options: -r Recurse into subdirectories -suffix=.suf This will restrict things to files ending in .suf '-wild=*.???' This will match wildcards. -nonz Prints file name of non-zero length files ================================================================ ======== catUncomment ==================================== ================================================================ catUncomment - Concatenate input removing lines that start with '#' Output goes to stdout usage: catUncomment file(s) ================================================================ ======== chainAntiRepeat ==================================== ================================================================ chainAntiRepeat - Get rid of chains that are primarily the results of repeats and degenerate DNA usage: chainAntiRepeat tNibDir qNibDir inChain outChain options: -minScore=N - minimum score (after repeat stuff) to pass -noCheckScore=N - score that will pass without checks (speed tweak) ================================================================ ======== chainFilter ==================================== ================================================================ chainFilter - Filter chain files. Output goes to standard out. 
usage: chainFilter file(s) options: -q=chr1,chr2 - restrict query side sequence to those named -notQ=chr1,chr2 - restrict query side sequence to those not named -t=chr1,chr2 - restrict target side sequence to those named -notT=chr1,chr2 - restrict target side sequence to those not named -id=N - only get one with ID number matching N -minScore=N - restrict to those scoring at least N -maxScore=N - restrict to those scoring less than N -qStartMin=N - restrict to those with qStart at least N -qStartMax=N - restrict to those with qStart less than N -qEndMin=N - restrict to those with qEnd at least N -qEndMax=N - restrict to those with qEnd less than N -tStartMin=N - restrict to those with tStart at least N -tStartMax=N - restrict to those with tStart less than N -tEndMin=N - restrict to those with tEnd at least N -tEndMax=N - restrict to those with tEnd less than N -qOverlapStart=N - restrict to those where the query overlaps a region starting here -qOverlapEnd=N - restrict to those where the query overlaps a region ending here -tOverlapStart=N - restrict to those where the target overlaps a region starting here -tOverlapEnd=N - restrict to those where the target overlaps a region ending here -strand=? -restrict strand (to + or -) -long -output in long format -zeroGap -get rid of gaps of length zero -minGapless=N - pass those with minimum gapless block of at least N -qMinGap=N - pass those with minimum gap size of at least N -tMinGap=N - pass those with minimum gap size of at least N -qMaxGap=N - pass those with maximum gap size no larger than N -tMaxGap=N - pass those with maximum gap size no larger than N -qMinSize=N - minimum size of spanned query region -qMaxSize=N - maximum size of spanned query region -tMinSize=N - minimum size of spanned target region -tMaxSize=N - maximum size of spanned target region -noRandom - suppress chains involving '_random' chromosomes -noHap - suppress chains involving '_hap' chromosomes ================================================================ ======== chainMergeSort ==================================== ================================================================ chainMergeSort - Combine sorted files into larger sorted file usage: chainMergeSort file(s) Output goes to standard output options: -saveId - keep the existing chain ids. -inputList=somefile - somefile contains list of input chain files. -tempDir=somedir/ - somedir has space for temporary sorting data, default ./ ================================================================ ======== chainNet ==================================== ================================================================ chainNet - Make alignment nets out of chains usage: chainNet in.chain target.sizes query.sizes target.net query.net where: in.chain is the chain file sorted by score target.sizes contains the size of the target sequences query.sizes contains the size of the query sequences target.net is the output over the target genome query.net is the output over the query genome options: -minSpace=N - minimum gap size to fill, default 25 -minFill=N - default half of minSpace -minScore=N - minimum chain score to consider, default 2000.0 -verbose=N - Alter verbosity (default 1) -inclHap - include query sequences name in the form *_hap*. 
Normally these are excluded from nets as being haplotype pseudochromosomes ================================================================ ======== chainPreNet ==================================== ================================================================ chainPreNet - Remove chains that don't have a chance of being netted usage: chainPreNet in.chain target.sizes query.sizes out.chain options: -dots=N - output a dot every so often -pad=N - extra to pad around blocks to decrease trash (default 1) -inclHap - include query sequences name in the form *_hap*. Normally these are excluded from nets as being haplotype pseudochromosomes ================================================================ ======== chainSort ==================================== ================================================================ chainSort - Sort chains. By default sorts by score. Note this loads all chains into memory, so it is not suitable for large sets. Instead, run chainSort on multiple small files, followed by chainMergeSort. usage: chainSort inFile outFile Note that inFile and outFile can be the same options: -target sort on target start rather than score -query sort on query start rather than score -index=out.tab build simple two column index file where is score, target, or query depending on the sort. ================================================================ ======== chainSplit ==================================== ================================================================ chainSplit - Split chains up by target or query sequence usage: chainSplit outDir inChain(s) options: -q - Split on query (default is on target) -lump=N Lump together so have only N split files. ================================================================ ======== chainStitchId ==================================== ================================================================ chainStitchId - Join chain fragments with the same chain ID into a single chain per ID. Chain fragments must be from same original chain but must not overlap. Chain fragment scores are summed. usage: chainStitchId in.chain out.chain ================================================================ ======== chainSwap ==================================== ================================================================ chainSwap - Swap target and query in chain usage: chainSwap in.chain out.chain ================================================================ ======== chainToAxt ==================================== ================================================================ chainToAxt - Convert from chain to axt file usage: chainToAxt in.chain tNibDirOr2bit qNibDirOr2bit out.axt options: -maxGap=maximum gap sized allowed without breaking, default 100 -maxChain=maximum chain size allowed without breaking, default 1073741823 -minScore=minimum score of chain -minId=minimum percentage ID within blocks -bed Output bed instead of axt ================================================================ ======== chainToPsl ==================================== ================================================================ chainToPsl - Convert chain file to psl format usage: chainToPsl in.chain tSizes qSizes target.lst query.lst out.psl Where tSizes and qSizes are tab-delimited files with columns. 
The target and query lists can either be fasta files, nib files, 2bit files or a list of fasta, 2bit and/or nib files one per line options: -tMasked - If specified, the target is soft-masked and the repMatch counts are computed ================================================================ ======== checkAgpAndFa ==================================== ================================================================ checkAgpAndFa - takes a .agp file and .fa file and ensures that they are in synch usage: checkAgpAndFa in.agp in.fa options: -exclude=seq - Ignore seq (e.g. chrM for which we usually get sequence from GenBank but don't have AGP) in.fa can be a .2bit file. If it is .fa then sequences must appear in the same order in .agp and .fa. ================================================================ ======== checkCoverageGaps ==================================== ================================================================ checkCoverageGaps - Check for biggest gap in coverage for a list of tracks. For most tracks coverage of 10,000,000 or more will indicate that there was a mistake in generating the track. usage: checkCoverageGaps database track1 ... trackN Note: for bigWig and bigBeds, the biggest gap is rounded to the nearest 10,000 or so options: -allParts If set then include _hap and _random and other wierd chroms -female If set then don't check chrY -noComma - Don't put commas in biggest gap output ================================================================ ======== checkHgFindSpec ==================================== ================================================================ checkHgFindSpec - test and describe search specs in hgFindSpec tables. usage: checkHgFindSpec database [options | termToSearch] If given a termToSearch, displays the list of tables that will be searched and how long it took to figure that out; then performs the search and the time it took. options: -showSearches Show the order in which tables will be searched in general. [This will be done anyway if no termToSearch or options are specified.] -checkTermRegex For each search spec that includes a regular expression for terms, make sure that all values of the table field to be searched match the regex. (If not, some of them could be excluded from searches.) -checkIndexes Make sure that an index is defined on each field to be searched. ================================================================ ======== checkTableCoords ==================================== ================================================================ checkTableCoords - check invariants on genomic coords in table(s). usage: checkTableCoords database [tableName] Searches for illegal genomic coordinates in all tables in database unless narrowed down using options. Uses ~/.hg.conf to determine genome database connection info. For psl/alignment tables, checks target coords only. options: -table=tableName Check this table only. (Default: all tables) -daysOld=N Check tables that have been modified at most N days ago. -hoursOld=N Check tables that have been modified at most N hours ago. (days and hours are additive) -exclude=patList Exclude tables matching any pattern in comma-separated patList. patList can contain wildcards (*?) but should be escaped or single-quoted if it does. patList can contain "genbank" which will be expanded to all tables generated by the automated genbank build process. -ignoreBlocks To save time (but lose coverage), skip block coord checks. 
-verboseBlocks Print out more details about illegal block coords, since they can't be found by simple SQL queries. ================================================================ ======== chopFaLines ==================================== ================================================================ chopFaLines - Read in an FA file with long lines and rewrite it with shorter lines usage: chopFaLines in.fa out.fa ================================================================ ======== chromGraphFromBin ==================================== ================================================================ chromGraphFromBin - Convert chromGraph binary to ascii format. usage: chromGraphFromBin in.chromGraph out.tab options: -chrom=chrX - restrict output to single chromosome ================================================================ ======== chromGraphToBin ==================================== ================================================================ chromGraphToBin - Make binary version of chromGraph. usage: chromGraphToBin in.tab out.chromGraph options: -xxx=XXX ================================================================ ======== colTransform ==================================== ================================================================ colTransform - Add and/or multiply column by constant. usage: colTransform column input.tab addFactor mulFactor output.tab where: column is the column to transform, starting with 1 input.tab is the tab delimited input file addFactor is what to add. Use 0 here to not change anything mulFactor is what to multiply by. Use 1 here to not change anything output.tab is the tab delimited output file ================================================================ ======== countChars ==================================== ================================================================ countChars - Count the number of occurrences of a particular char usage: countChars char file(s) Char can either be a two digit hexadecimal value or a single letter literal character ================================================================ ======== crTreeIndexBed ==================================== ================================================================ crTreeIndexBed - Create an index for a bed file. usage: crTreeIndexBed in.bed out.cr options: -blockSize=N - number of children per node in index tree. Default 1024 -itemsPerSlot=N - number of items per index slot. Default is half block size -noCheckSort - Don't check sorting order of in.bed ================================================================ ======== crTreeSearchBed ==================================== ================================================================ crTreeSearchBed - Search a crTree indexed bed file and print all items that overlap query. usage: crTreeSearchBed file.bed index.cr chrom start end
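A possible round trip with the two crTree tools above (file names and coordinates are placeholders):
  crTreeIndexBed sorted.bed sorted.cr
  crTreeSearchBed sorted.bed sorted.cr chr2 500000 600000
The first command builds the index; the second prints every item in sorted.bed that overlaps chr2:500000-600000.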
================================================================ ======== dbSnoop ==================================== ================================================================ dbSnoop - Produce an overview of a database. usage: dbSnoop database output options: -unsplit - if set will merge together tables split by chromosome -noNumberCommas - if set will leave out commas in big numbers ================================================================ ======== dbTrash ==================================== ================================================================ dbTrash - drop tables from a database older than specified N hours usage: dbTrash -age=N [-drop] [-historyToo] [-db=<name>] [-verbose=N] options: -age=N - number of hours old to qualify for drop. N can be a float. -drop - actually drop the tables, default is merely to display tables. -db=<name> - Specify a database to work with, default is customTrash. -historyToo - also consider the table called 'history' for deletion. - default is to leave 'history' alone no matter how old. - this applies to the table 'metaInfo' also. -extFile - check extFile for lines that reference files - no longer in trash -extDel - delete lines in extFile that fail file check - otherwise just verbose(2) lines that would be deleted -topDir - directory name to prepend to file names in extFile - default is /usr/local/apache/trash - file names in extFile are typically: "../trash/ct/..." -tableStatus - use 'show table status' to get size data, very inefficient -delLostTable - delete tables that exist but are missing from metaInfo - this operation can be even slower than -tableStatus - if there are many tables to check. -verbose=N - 2 == show arguments, dates, and dropped tables, - 3 == show date information for all tables. ================================================================ ======== estOrient ==================================== ================================================================ usage: estOrient [options] db estTable outPsl Read ESTs from a database and determine orientation based on estOrientInfo table or direction in gbCdnaInfo table. Update PSLs so that the strand reflects the direction of transcription. By default, PSLs where the direction can't be determined are dropped. Options: -chrom=chr - process this chromosome, may be repeated -keepDisoriented - don't drop ESTs where orientation can't be determined. -disoriented=psl - output ESTs whose orientation can't be determined to this file. -inclVer - add NCBI version number to accession if not already present. -fileInput - estTable is a psl file -estOrientInfo=file - instead of getting the orientation information from the estOrientInfo table, load it from this file. This data is the output of the polyInfo command. If this option is specified, the direction will not be looked up in the gbCdnaInfo table and db can be `no'. -info=infoFile - write information about each EST to this tab separated file qName tName tStart tEnd origStrand newStrand orient where orient is < 0 if PSL was reversed, > 0 if it was left unchanged and 0 if the orientation couldn't be determined (and was left unchanged). ================================================================ ======== faCmp ==================================== ================================================================ faCmp - Compare two .fa files usage: faCmp [options] a.fa b.fa options: -softMask - use the soft masking information during the compare Differences will be noted if the masking is different. -sortName - sort input files by name before comparing -peptide - read as peptide sequences default: no masking information is used during compare. It is as if both sequences were not masked. Exit codes: - 0 if files are the same - 1 if files differ - 255 on an error
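Example faCmp invocation (file names are placeholders) that uses the exit codes listed above in a shell check:
  faCmp -softMask build1.fa build2.fa
  echo $?    # 0 = same, 1 = differ, 255 = error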
================================================================ ======== faCount ==================================== ================================================================ faCount - count base statistics and CpGs in FA files. usage: faCount file(s).fa -summary show only summary statistics -dinuc include statistics on dinucleotide frequencies -strands count bases on both strands ================================================================ ======== faFilter ==================================== ================================================================ faFilter - Filter fa records, selecting ones that match the specified conditions usage: faFilter [options] in.fa out.fa Options: -name=wildCard - Only pass records where name matches wildcard * matches any string or no character. ? matches any single character. Any other character must match exactly (these will need to be quoted for the shell) -namePatList=filename - A list of regular expressions, one per line, that will be applied to the fasta name the same as -name -v - invert match, select non-matching records. -minSize=N - Only pass sequences at least this big. -maxSize=N - Only pass sequences this size or smaller. -maxN=N Only pass sequences with fewer than this number of N's -uniq - Removes duplicate sequence ids, keeping the first. -i - make -uniq ignore case so sequence IDs ABC and abc count as dupes. All specified conditions must pass to pass a sequence. If no conditions are specified, all records will be passed. ================================================================ ======== faFilterN ==================================== ================================================================ faFilterN - Get rid of sequences with too many N's usage: faFilterN in.fa out.fa maxPercentN options: -out=in.fa.out -uniq=self.psl ================================================================ ======== faFrag ==================================== ================================================================ faFrag - Extract a piece of DNA from a .fa file.
usage: faFrag in.fa start end out.fa options: -mixed - preserve mixed-case in FASTA file ================================================================ ======== faNoise ==================================== ================================================================ faNoise - Add noise to .fa file usage: faNoise inName outName transitionPpt transversionPpt insertPpt deletePpt chimeraPpt options: -upper - output in upper case ================================================================ ======== faOneRecord ==================================== ================================================================ faOneRecord - Extract a single record from a .FA file usage: faOneRecord in.fa recordName ================================================================ ======== faPolyASizes ==================================== ================================================================ faPolyASizes - get poly A sizes usage: faPolyASizes in.fa out.tab output file has four columns: id seqSize tailPolyASize headPolyTSize options: ================================================================ ======== faRandomize ==================================== ================================================================ faRandomize - Program to create random fasta records usage: faRandomize [-seed=N] in.fa randomized.fa Use optional -seed argument to specify seed (integer) for random number generator (rand). Generated sequence has the same base frequency as seen in original fasta records. ================================================================ ======== faRc ==================================== ================================================================ faRc - Reverse complement a FA file usage: faRc in.fa out.fa In.fa and out.fa may be the same file. options: -keepName - keep name identical (don't prepend RC) -keepCase - works well for ACGTUN in either case. bizarre for other letters. without it bases are turned to lower, all else to n's -justReverse - prepends R unless asked to keep name -justComplement - prepends C unless asked to keep name (cannot appear together with -justReverse) ================================================================ ======== faSize ==================================== ================================================================ faSize - print total base count in fa files. usage: faSize file(s).fa Command flags -detailed outputs name and size of each record has the side effect of printing nothing else -tab output statistics in a tab separated format ================================================================ ======== faSomeRecords ==================================== ================================================================ faSomeRecords - Extract multiple fa records usage: faSomeRecords in.fa listFile out.fa options: -exclude - output sequences not in the list file. ================================================================ ======== faSplit ==================================== ================================================================ faSplit - Split an fa file into several files. usage: faSplit how input.fa count outRoot where how is either 'about' 'byname' 'base' 'gap' 'sequence' or 'size'. Files split by sequence will be broken at the nearest fa record boundary. Files split by base will be broken at any base. Files broken by size will be broken every count bases. Examples: faSplit sequence estAll.fa 100 est This will break up estAll.fa into 100 files (numbered est001.fa est002.fa, ... 
est100.fa Files will only be broken at fa record boundaries faSplit base chr1.fa 10 1_ This will break up chr1.fa into 10 files faSplit size input.fa 2000 outRoot This breaks up input.fa into 2000 base chunks faSplit about est.fa 20000 outRoot This will break up est.fa into files of about 20000 bytes each by record. faSplit byname scaffolds.fa outRoot/ This breaks up scaffolds.fa using sequence names as file names. Use the terminating / on the outRoot to get it to work correctly. faSplit gap chrN.fa 20000 outRoot This breaks up chrN.fa into files of at most 20000 bases each, at gap boundaries if possible. If the sequence ends in N's, the last piece, if larger than 20000, will be all one piece. Options: -verbose=2 - Write names of each file created (=3 more details) -maxN=N - Suppress pieces with more than maxN n's. Only used with size. default is size-1 (only suppresses pieces that are all N). -oneFile - Put output in one file. Only used with size -extra=N - Add N extra bytes at the end to form overlapping pieces. Only used with size. -out=outFile Get masking from outfile. Only used with size. -lift=file.lft Put info on how to reconstruct sequence from pieces in file.lft. Only used with size and gap. -minGapSize=X Consider a block of Ns to be a gap if block size >= X. Default value 1000. Only used with gap. -noGapDrops - include all N's when splitting by gap. -outDirDepth=N Create N levels of output directory under current dir. This helps prevent NFS problems with a large number of file in a directory. Using -outDirDepth=3 would produce ./1/2/3/outRoot123.fa. -prefixLength=N - used with byname option. create a separate output file for each group of sequences names with same prefix of length N. ================================================================ ======== faToFastq ==================================== ================================================================ faToFastq - Convert fa to fastq format, just faking quality values. usage: faToFastq in.fa out.fastq options: -qual=X quality letter to use. Default is '<' which is good I think.... ================================================================ ======== faToTab ==================================== ================================================================ faToTab - convert fa file to tab separated file usage: faToTab infileName outFileName options: -type=seqType sequence type, dna or protein, default is dna -keepAccSuffix - don't strip dot version off of sequence id, keep as is ================================================================ ======== faToTwoBit ==================================== ================================================================ faToTwoBit - Convert DNA from fasta to 2bit format usage: faToTwoBit in.fa [in2.fa in3.fa ...] out.2bit options: -noMask - Ignore lower-case masking in fa file. -stripVersion - Strip off version number after . for genbank accessions. -ignoreDups - only convert first sequence if there are duplicate sequence - names. Use 'twoBitDup' to find duplicate sequences. ================================================================ ======== faTrans ==================================== ================================================================ faTrans - Translate DNA .fa file to peptide usage: faTrans in.fa out.fa options: -stop stop at first stop codon (otherwise puts in Z for stop codons) -offset=N start at a particular offset. 
================================================================ ======== fastqToFa ==================================== ================================================================ fastqToFa - Convert from fastq to fasta format. usage: fastqToFa [options] in.fastq out.fa options: -nameVerify='string' - for multi-line fastq files, 'string' must match somewhere in the sequence names in order to correctly identify the next sequence block (e.g.: -nameVerify='Supercontig_') -qual=file.qual.fa - output quality scores to specified file (default: quality scores are ignored) -qualSizes=qual.sizes - write sizes file for the quality scores -noErrors - warn only on problems, do not error out (specify -verbose=3 to see warnings) -solexa - use Solexa/Illumina quality score algorithm (instead of Phred quality) -verbose=2 - set warning level to get some stats output during processing ================================================================ ======== featureBits ==================================== ================================================================ featureBits - Correlate tables via bitmap projections. usage: featureBits database table(s) This will return the number of bits in all the tables anded together Pipe warning: output goes to stderr. Options: -bed=output.bed Put intersection into bed format. Can use stdout. -fa=output.fa Put sequence in intersection into .fa file -faMerge For fa output merge overlapping features. -minSize=N Minimum size to output (default 1) -chrom=chrN Restrict to one chromosome -chromSize=sizefile Read chrom sizes from file instead of database. (chromInfo three column format) -or Or tables together instead of anding them -not Output negation of resulting bit set. -countGaps Count gaps in denominator -noRandom Don't include _random (or Un) chromosomes -noHap Don't include _hap chromosomes -dots=N Output dot every N chroms (scaffolds) processed -minFeatureSize=n Don't include bits of the track that are smaller than minFeatureSize, useful for differentiating between alignment gaps and introns. -bin=output.bin Put bin counts in output file -binSize=N Bin size for generating counts in bin file (default 500000) -binOverlap=N Bin overlap for generating counts in bin file (default 250000) -bedRegionIn=input.bed Read in a bed file for bin counts in specific regions and write to bedRegionsOut -bedRegionOut=output.bed Write a bed file of bin counts in specific regions from bedRegionIn -enrichment Calculates coverage and enrichment assuming the first table is a reference gene track and the second track something else. Enrichment is the amount of table1 that covers table2 vs. the amount of table1 that covers the genome. It's how much denser table1 is in table2 than it is genome-wide. '-where=some sql pattern' Restrict to features matching some sql pattern You can include a '!' before a table name to negate it. Some table names can be followed by modifiers such as: :exon:N Break into exons and add N to each end of each exon :cds Break into coding exons :intron:N Break into introns, remove N from each end :utr5, :utr3 Break into 5' or 3' UTRs :upstream:N Consider the region of N bases before region :end:N Consider the region of N bases after region :score:N Consider records with score >= N :upstreamAll:N Like upstream, but doesn't filter out genes that have txStart==cdsStart or txEnd==cdsEnd :endAll:N Like end, but doesn't filter out genes that have txStart==cdsStart or txEnd==cdsEnd The tables can be bed, psl, or chain files, or a directory full of such files as well as actual database tables. To count the bits used in dir/chrN_something*.bed you'd do: featureBits database dir/_something.bed
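As a hedged featureBits illustration (the database and table names are examples, not prescribed by the tool): counting the bases where the coding exons of a gene table overlap a repeat table, and saving the intersection as BED, might look like:
  featureBits -bed=overlap.bed hg19 knownGene:cds rmsk
Because of the pipe warning above, the summary line goes to stderr, so redirect stderr if you want to capture it in a file.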
================================================================ ======== fetchChromSizes ==================================== ================================================================ usage: fetchChromSizes <db> > <db>.chrom.sizes used to fetch chrom.sizes information from UCSC for the given <db> - name of UCSC database, e.g.: hg18, mm9, etc ... This script expects to find one of the following commands: wget, mysql, or ftp in order to fetch information from UCSC. Route the output to the file <db>.chrom.sizes as indicated above. Example: fetchChromSizes hg18 > hg18.chrom.sizes ================================================================ ======== findMotif ==================================== ================================================================ findMotif - find specified motif in sequence usage: findMotif [options] -motif= sequence where: sequence is a .fa, .nib or .2bit file or a file which is a list of sequence files. options: -motif= - search for this specified motif (case ignored, [acgt] only) -chr= - process only this one chrN from the sequence -strand=<+|-> - limit to only one strand. Default is both. -bedOutput - output bed format (this is the default) -wigOutput - output wiggle data format instead of bed file -verbose=N - set information level [1-4] NOTE: motif must be longer than 4 characters, less than 17 -verbose=4 - will display gaps as bed file data lines to stderr ================================================================ ======== gapToLift ==================================== ================================================================ gapToLift - create lift file from gap table(s) usage: gapToLift [options] db liftFile.lft uses gap table(s) from specified db. Writes to liftFile.lft generates lift file segments separated by non-bridged gaps. options: -chr=chrN - work only on given chrom -minGap=M - examine only gaps >= M -insane - do *not* perform coordinate sanity checks on gaps -bedFile=fileName.bed - output segments to fileName.bed -verbose=N - N > 1 shows more information about the procedure ================================================================ ======== genePredCheck ==================================== ================================================================ genePredCheck - validate genePred files or tables usage: genePredCheck [options] fileTbl .. If fileTbl is an existing file, then it is checked. Otherwise, if -db is provided, then a table by this name is checked. options: -db=db - If specified, then this database is used to get chromosome sizes, and perhaps the table to check.
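A short genePredCheck illustration (the database and file names are hypothetical):
  genePredCheck -db=hg19 genes.gp
validates the file genes.gp against the chromosome sizes of hg19, while genePredCheck -db=hg19 refGene would instead check a table named refGene in that database.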
================================================================ ======== genePredHisto ==================================== ================================================================ genePredHisto - get data for generating histograms from a genePred file. usage: genePredHisto [options] what genePredFile histoOut Options: -ids - output a second column with the gene name, useful for finding outliers. The what argument indicates the type of output. The output file is a list of numbers suitable for input to textHistogram or similar. The following values are currently implemented: exonLen - length of exons 5utrExonLen - length of 5'UTR regions of exons cdsExonLen - length of CDS regions of exons 3utrExonLen - length of 3'UTR regions of exons exonCnt - count of exons 5utrExonCnt - count of exons containing 5'UTR cdsExonCnt - count of exons containing CDS 3utrExonCnt - count of exons containing 3'UTR ================================================================ ======== genePredSingleCover ==================================== ================================================================ genePredSingleCover - create single-coverage genePred files genePredSingleCover [options] inGenePred outGenePred Create a genePred file that has single CDS coverage of the genome. UTR is allowed to overlap. The default is to keep the gene with the largest number of CDS bases. Options: -scores=file - read scores used in selecting genes from this file. It consists of tab-separated lines of name chrom txStart score where score is a real or integer number. Higher scoring genes will be chosen over lower scoring ones. Equally scoring genes are chosen by number of CDS bases. If this option is supplied, all genes must be in the file. ================================================================ ======== genePredToBed ==================================== ================================================================ genePredToBed - Convert from genePred to bed format. Does not yet handle genePredExt usage: genePredToBed in.genePred out.bed options: -xxx=XXX ================================================================ ======== genePredToFakePsl ==================================== ================================================================ genePredToFakePsl - Create a psl of fake-mRNA aligned to gene-preds from a file or table. usage: genePredToFakePsl db fileTbl pslOut cdsOut If fileTbl is an existing file, then it is used. Otherwise, the table by this name is used. pslOut specifies the fake-mRNA output psl filename. cdsOut specifies the output cds tab-separated file which contains genbank-style CDS records showing cdsStart..cdsEnd e.g. NM_123456 34..305 ================================================================ ======== genePredToGtf ==================================== ================================================================ genePredToGtf - Convert genePred table or file to gtf. usage: genePredToGtf database genePredTable output.gtf If database is 'file' then track is interpreted as a file rather than a table in database. options: -utr - Add 5UTR and 3UTR features -honorCdsStat - use cdsStartStat/cdsEndStat when defining start/end codon records -source=src set source name to use -addComments - Add comments before each set of transcript records. Allows for easier visual inspection. Note: use a refFlat table or extended genePred table or file to include the gene_name attribute in the output. This will not work with a refFlat table dump file. If you are using a genePred file that starts with a numeric bin column, drop it using the UNIX cut command: cut -f 2- in.gp | genePredToGtf file stdin out.gtf
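A hedged genePredToGtf illustration (database and table names are hypothetical): converting a table from a local database while adding UTR features might look like
  genePredToGtf -utr hg19 refGene refGene.gtf
while genePredToGtf file my.gp my.gtf converts a plain genePred file instead.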
================================================================ ======== genePredToMafFrames ==================================== ================================================================ genePredToMafFrames - create mafFrames tables from genePreds genePredToMafFrames [options] targetDb maf mafFrames geneDb1 genePred1 [geneDb2 genePred2...] Create frame annotations for one or more components of a MAF. It is significantly faster to process multiple gene sets in the same run, as 95% of the CPU time is spent reading the MAF. Arguments: o targetDb - db of target genome o maf - input MAF file o mafFrames - output file o geneDb1 - db in MAF that corresponds to genePred's organism. o genePred1 - genePred file. Overlapping annotations should be removed. This file may optionally include frame annotations Options: -bed=file - output a bed for each mafFrame region, useful for debugging. -verbose=level - enable verbose tracing, the following levels are implemented: 3 - print information about data used to compute each record. 4 - dump information about the gene mappings that were constructed 5 - dump information about the gene mappings after split processing 6 - dump information about the gene mappings after frame linking ================================================================ ======== gfClient ==================================== ================================================================ gfClient v. 35x1 - A client for the genomic finding program that produces a .psl file usage: gfClient host port seqDir in.fa out.psl where host is the name of the machine running the gfServer port is the same as you started the gfServer with seqDir is the path of the .nib or .2bit files relative to the current dir (note these are needed by the client as well as the server) in.fa is a fasta format file. May contain multiple records out.psl where to put the output options: -t=type Database type. Type is one of: dna - DNA sequence prot - protein sequence dnax - DNA sequence translated in six frames to protein The default is dna -q=type Query type. Type is one of: dna - DNA sequence rna - RNA sequence prot - protein sequence dnax - DNA sequence translated in six frames to protein rnax - DNA sequence translated in three frames to protein -prot Synonymous with -t=prot -q=prot -dots=N Output a dot every N query sequences -nohead Suppresses psl five line header -minScore=N Sets minimum score. This is twice the matches minus the mismatches minus some sort of gap penalty. Default is 30 -minIdentity=N Sets minimum sequence identity (in percent). Default is 90 for nucleotide searches, 25 for protein or translated protein searches. -out=type Controls output file format. Type is one of: psl - Default. Tab separated format without actual sequence pslx - Tab separated format with sequence axt - blastz-associated axt format maf - multiz-associated maf format sim4 - similar to sim4 format wublast - similar to wublast format blast - similar to NCBI blast format blast8 - NCBI blast tabular format blast9 - NCBI blast tabular format with comments -maxIntron=N Sets maximum intron size. Default is 750000
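An illustrative gfClient call (host name, port, and paths are hypothetical), assuming a gfServer is already running on myhost port 17777 and the corresponding .2bit file lives in /data/genome:
  gfClient -minIdentity=95 -nohead myhost 17777 /data/genome reads.fa reads.psl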
================================================================ ======== gfServer ==================================== ================================================================ gfServer v 35x1 - Make a server to quickly find where DNA occurs in genome. To set up a server: gfServer start host port file(s) Where the files are .nib or .2bit format files specified relative to the current directory. To remove a server: gfServer stop host port To query a server with DNA sequence: gfServer query host port probe.fa To query a server with protein sequence: gfServer protQuery host port probe.fa To query a server with translated dna sequence: gfServer transQuery host port probe.fa To query a server with PCR primers: gfServer pcr host port fPrimer rPrimer maxDistance To process one probe fa file against a .nib format genome (not starting server): gfServer direct probe.fa file(s).nib To test pcr without starting server: gfServer pcrDirect fPrimer rPrimer file(s).nib To figure out usage level: gfServer status host port To get input file list: gfServer files host port Options: -tileSize=N size of n-mers to index. Default is 11 for nucleotides, 4 for proteins (or translated nucleotides). -stepSize=N spacing between tiles. Default is tileSize. -minMatch=N Number of n-mer matches that trigger detailed alignment. Default is 2 for nucleotides, 3 for proteins. -maxGap=N Number of insertions or deletions allowed between n-mers. Default is 2 for nucleotides, 0 for proteins. -trans Translate database to protein in 6 frames. Note: it is best to run this on RepeatMasked data in this case. -log=logFile keep a log file that records server requests. -seqLog Include sequences in log file (not logged with -syslog) -ipLog Include user's IP in log file (not logged with -syslog) -syslog Log to syslog -logFacility=facility log to the specified syslog facility - default local0. -mask Use masking from nib file. -repMatch=N Number of occurrences of a tile (nmer) that trigger repeat masking of the tile. Default is 1024. -maxDnaHits=N Maximum number of hits for a dna query that are sent from the server. Default is 100. -maxTransHits=N Maximum number of hits for a translated query that are sent from the server. Default is 200. -maxNtSize=N Maximum size of untranslated DNA query sequence. Default is 40000 -maxAaSize=N Maximum size of protein or translated DNA queries. Default is 8000 -canStop If set then a quit message will actually take down the server
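A sketch of a typical gfServer session (host, port, and file names are hypothetical): start an untranslated DNA server with
  gfServer -canStop start myhost 17777 hg19.2bit
check that it is up and see its usage counters with gfServer status myhost 17777, and shut it down with gfServer stop myhost 17777 (per the -canStop option above, the stop message only takes the server down if -canStop was given at start time).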
================================================================ ======== gff3ToGenePred ==================================== ================================================================ gff3ToGenePred - convert a GFF3 file to a genePred file usage: gff3ToGenePred inGff3 outGp options: -maxParseErrors=50 - Maximum number of parsing errors before aborting. A negative value will allow an unlimited number of errors. Default is 50. -maxConvertErrors=50 - Maximum number of conversion errors before aborting. A negative value will allow an unlimited number of errors. Default is 50. -honorStartStopCodons - only set CDS start/stop status to complete if there are corresponding start/stop codon records This converts: - top-level gene records with mRNA records - top-level mRNA records - mRNA records that contain: - exon and CDS - CDS, five_prime_UTR, three_prime_UTR - only exon for non-coding - top-level gene records with transcript records - top-level transcript records - transcript records that contain: - exon The first step is to parse the GFF3 file; up to 50 errors are reported before aborting. If the GFF3 file is successfully parsed, it is converted to gene annotations. Up to 50 conversion errors are reported before aborting. Input file must conform to the GFF3 specification: http://www.sequenceontology.org/gff3.shtml ================================================================ ======== gff3ToPsl ==================================== ================================================================ gff3ToPsl - convert a GFF3 CIGAR file to a PSL file usage: gff3ToPsl mapFile inGff3 out.psl arguments: mapFile mapping of locus names to chroms and sizes. File formatted: locusName chromName chromSize inGff3 GFF3 formatted file with Gap attribute in match records out.psl PSL formatted output options: This converts: The first step is to parse the GFF3 file; up to 50 errors are reported before aborting. If the GFF3 file is successfully parsed, it is converted to PSL. Input file must conform to the GFF3 specification: http://www.sequenceontology.org/gff3.shtml ================================================================ ======== gmtime ==================================== ================================================================ gmtime - convert unix timestamp to date string usage: gmtime