UCSC Genome Bioinformatics
  Procedure for Creating a Mirror Site for the UCSC Genome Browser
 

The following procedure provides you with step-by-step instructions to incrementally create a mirror of the UCSC Genome Browser. You may choose to set up either a full mirror browser or a partial one, depending on your disk space and needs. See also: Minimal Browser Installation instructions on genomewiki.

Space required:

The amount of data available in the Genome Browser is growing constantly. To determine the size of any of the download directories mentioned in these instructions, use the rsync "-n" option on the directory prior to actually transferring the data. For instance, to find the size of the /gbdb directory, run:

rsync -navzP rsync://hgdownload.cse.ucsc.edu/gbdb/
The rsync options used in this command are:
-n, --dry-run         show what would have been transferred
-a, --archive         archive mode, equivalent to -rlptgoD
-v, --verbose         increase verbosity
-z, --compress        compress file data
-P                    equivalent to --partial --progress
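As a rough sketch of how the dry-run output can be reduced to a single number: rsync's --stats option adds a summary that includes a "Total file size" line, which a small helper can extract. The function name total_size_bytes and the scratch directory in the usage comment are illustrative assumptions, not part of the UCSC instructions.

```shell
#!/bin/sh
# Extract the byte count from the "Total file size" line of rsync --stats
# output read on stdin. Commas (printed by newer rsync versions) are removed.
total_size_bytes() {
    awk '/^Total file size:/ { gsub(",", "", $4); print $4 }'
}

# Example use (needs network access; /tmp/gbdb-dryrun is a hypothetical
# scratch directory, and -n means nothing is actually copied):
#   rsync -na --stats rsync://hgdownload.cse.ucsc.edu/gbdb/ /tmp/gbdb-dryrun/ | total_size_bytes
```

The result is in bytes, so compare it against the free space reported by df for the destination file system.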

Mirror site questions may be directed to the mailing list genome-mirror@soe.ucsc.edu. Messages sent to this address will be posted to the moderated genome-mirror mailing list, which is archived on a public Web-accessible pipermail archive. This archive may be indexed by non-UCSC sites such as Google.

Subscribe to the genome-mirror mailing list.




  Step-by-Step Details
 
  1. Install Apache server and MySQL server
  2. Get all the html files and most of the text files
  3. Get the data for each individual genome assembly and install databases
  4. Obtain the /gbdb data file area
  5. Set up database
  6. Get executable files
  7. Set up the "hgcentral" tables
  8. Create a "trash" directory
  9. Create the hgFixed database
  10. Create the protein databases
  11. Create the UniProt databases
  12. Create the visiGene database



1. Install Apache server and MySQL server

  1. We will not provide consulting on the operation or configuration of Apache or MySQL. If you are not familiar with the setup of Apache or MySQL, you will have to find someone at your site who is. We use a reference platform of Redhat 9 for all the steps described here.
  2. Beginning early March 2005, the static html pages for the browser expect Apache to be configured with the XBitHack option enabled.
    #
    # Executable files will be processed for SSI
    #
    XBitHack on
    
  3. Find the location of your web pages. For example, on a Redhat Linux system using a stock apache RPM, the web pages are stored in /var/www/html. We will refer to this directory as WEBROOT in this document and you should substitute your real path for WEBROOT whenever you see that in our write-up. But, before doing this, you MUST be aware of the following:
    • THE FOLLOWING ASSUMES YOU ARE NOT ALREADY SERVING UP DATA USING YOUR APACHE SERVER. BEAR IN MIND THE SELECTION OF THE WEBROOT WILL OVERWRITE ANY EXISTING DATA IN THAT AREA.
  4. Find the location of your cgi-bin directory. It should be under the parent directory of the WEBROOT directory. For example, on a Redhat Linux system using a stock apache RPM, the path of the cgi-bin directory is /var/www/cgi-bin. We will refer to this directory as CGI_BIN in this document and you should substitute your real path for CGI_BIN whenever you see that.
  5. Next, find the location of your MySQL data. For example, on a Redhat Linux system using stock RPM's, this is located in /var/lib/mysql. We will refer to this directory as MYSQLDATA in this document and you should substitute your real path for MYSQLDATA whenever you see that.
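One way to keep the three placeholder paths consistent through the remaining steps is to define them once as shell variables and confirm that the directories exist. This is a minimal sketch: the defaults are the stock Redhat paths mentioned above, and check_dirs is a hypothetical helper name.

```shell
#!/bin/sh
# Defaults are the stock Redhat locations named above; override them in the
# environment if your installation differs.
WEBROOT=${WEBROOT:-/var/www/html}
CGI_BIN=${CGI_BIN:-/var/www/cgi-bin}
MYSQLDATA=${MYSQLDATA:-/var/lib/mysql}

# Print every argument that is not an existing directory.
# Returns non-zero if any directory is missing.
check_dirs() {
    rc=0
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "missing: $d"
            rc=1
        fi
    done
    return $rc
}

# Example: check_dirs "$WEBROOT" "$CGI_BIN" "$MYSQLDATA"
```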

2. Get all the html files and most of the text files

  1. Obtain the software package rsync. If it isn't already installed on your system, obtain it from http://rsync.samba.org
  2. Test the rsync connection:
    rsync -navz --progress rsync://hgdownload.cse.ucsc.edu
    
    This should respond with:
    genome          UCSC Human Genome Downloads
    htdocs          UCSC Human Genome Web Site Htdocs
    goldenPath      UCSC Human Golden Path Downloads
    cgi-bin         UCSC Human Genome Web Site CGI Binaries x86_64
    cgi-bin-i386    UCSC Human Genome Web Site CGI Binaries i386
    gbdb            UCSC Human Genome Browser Gbdb Config Files
    archives        UCSC Human Genome Browser Archived Config Files
    mysql           UCSC Human Genome Raw Mysql Tables
    
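  3. Determine the destination of the copy ($WEBROOT) and start the production copy (about 270 MB).
    The trailing slash on the source path is important: with it, rsync copies the contents of htdocs into $WEBROOT; without it, rsync creates an htdocs subdirectory there.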
    rsync -avzP rsync://hgdownload.cse.ucsc.edu/htdocs/ $WEBROOT/
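Because a missing trailing slash on the source changes what rsync copies, scripted mirrors may want to normalize paths before building the command. The helper below is a small illustrative sketch; the name with_slash is an assumption and not part of the UCSC tooling.

```shell
#!/bin/sh
# Return the argument with exactly one trailing slash appended.
# Only a single existing trailing slash is stripped before appending.
with_slash() {
    printf '%s/\n' "${1%/}"
}

# Example:
#   rsync -avzP "$(with_slash rsync://hgdownload.cse.ucsc.edu/htdocs)" "$(with_slash "$WEBROOT")"
```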
    

3. Get the data for each individual genome assembly and install databases

  1. Determine which of the databases you are going to mirror. To see all available databases, use the "SHOW DATABASES;" command on the public MySQL server.
  2. Get the data for each of the desired databases. For instance, to get the Human March 2006 full data set, do:

    mkdir -p $WEBROOT/goldenPath/hg18/database/
    rsync -avzP --delete --max-delete=20 \
      rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/hg18/database/ \
      $WEBROOT/goldenPath/hg18/database/
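When mirroring more than one assembly, the same two commands can be wrapped in a loop. The following is a sketch under stated assumptions: mirror_assemblies is a hypothetical function name, and the RSYNC variable is an illustrative hook that lets you substitute "echo" to preview the commands before any data is transferred.

```shell
#!/bin/sh
# Mirror the database dump directory of each assembly given as an argument,
# e.g.: mirror_assemblies hg18 mm8
# Set RSYNC=echo to print the rsync arguments instead of transferring.
mirror_assemblies() {
    for asm in "$@"; do
        dest="$WEBROOT/goldenPath/$asm/database/"
        mkdir -p "$dest" || return 1
        ${RSYNC:-rsync} -avzP --delete --max-delete=20 \
            "rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/$asm/database/" \
            "$dest" || return 1
    done
}
```

The function stops at the first failure so that a broken connection does not silently skip an assembly.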
    

4. Obtain the /gbdb data file area

  1. You will need the portions of /gbdb used by the browser:
    rsync -avzP --delete --max-delete=20 \
      rsync://hgdownload.cse.ucsc.edu/gbdb/ /gbdb/
    

5. Set up database

    Use the following table to identify the freeze date ($FREEZEDATE) and the database version ($DBVERSION) for each genome assembly:

    $FREEZEDATE $DBVERSION
    --------------- ---------------
    hg18 hg18
    hg17 hg17
    hg16 hg16
    10april2003 hg15
    panTro2 panTro2
    panTro1 panTro1
    canFam2 canFam2
    canFam1 canFam1
    bosTau2 bosTau2
    bosTau1 bosTau1
    mm8 mm8
    mm7 mm7
    mm6 mm6
    mm5 mm5
    rn4 rn4
    rnJun2003 rn3
    rnJan2003 rn2
    galGal3 galGal3
    galGal2 galGal2
    monDom4 monDom4
    monDom1 monDom1
    xenTro2 xenTro2
    xenTro1 xenTro1
    danRer4 danRer4
    danRer3 danRer3
    danRer2 danRer2
    danRer1 danRer1
    tetNig1 tetNig1
    fr1 fr1
    ci1 ci1
    dm2 dm2
    dm1 dm1
    droYak1 droYak1
    droAna1 droAna1
    dp2 dp2
    droVir1 droVir1
    droMoj1 droMoj1
    apiMel2 apiMel2
    apiMel1 apiMel1
    anoGam1 anoGam1
    ce2 ce2
    ceMay2003 ce1
    cbJul2002 cb1
    sacCer1 sacCer1
    scApr2003 sc1
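In the table above, most directory names equal the database name, and only six older assemblies differ. For scripting, that mapping can be captured in a small lookup; this is a sketch, and dbversion_for is a hypothetical helper name.

```shell
#!/bin/sh
# Map a $FREEZEDATE directory name to its $DBVERSION database name.
# Only the entries that differ in the table above are listed explicitly;
# every other name maps to itself.
dbversion_for() {
    case "$1" in
        10april2003) echo hg15 ;;
        rnJun2003)   echo rn3 ;;
        rnJan2003)   echo rn2 ;;
        ceMay2003)   echo ce1 ;;
        cbJul2002)   echo cb1 ;;
        scApr2003)   echo sc1 ;;
        *)           echo "$1" ;;
    esac
}

# Example: DBVERSION=$(dbversion_for "$FREEZEDATE")
```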

  1. After connecting to the MySQL server, create a database called "$DBVERSION" corresponding to $FREEZEDATE in the table above.
    mysql> create database $DBVERSION;
    
  2. Create tables for the "$DBVERSION" database.
    • Move to $WEBROOT/goldenPath/$FREEZEDATE/database/ directory.
      All the files for creating the "$DBVERSION" database are in this directory.
    • Create and load data into the tables. The script below is an example of this process, which takes a long time: 10 hours or more for one database.
      Optionally, you can download the MySQL tables via rsync directly into your /var/lib/mysql/ database hierarchy.
      Note the rsync source location for these directories is rsync://hgdownload.cse.ucsc.edu/mysql/
      You can see a list of directories to fetch with the command: rsync -v --dry-run rsync://hgdownload.cse.ucsc.edu/mysql/
      #!/bin/sh
      # The directory name ($FREEZEDATE) and the database name ($DBVERSION)
      # can differ, e.g. 10april2003 vs. hg15; see the table above.
      DIR=$FREEZEDATE
      DB=$DBVERSION
      cd "$WEBROOT/goldenPath/${DIR}/database" || exit 1
      for SQL in *.sql
      do
          T_NAME=${SQL%.sql}
          echo "loading table ${T_NAME}"
          # Drop any existing copy of the table; the .sql file recreates it.
          mysql -youraccountoptions -e "DROP TABLE IF EXISTS ${T_NAME};" ${DB} \
              > /dev/null 2>&1
          mysql -youraccountoptions ${DB} < ${SQL}
          zcat "${T_NAME}.txt.gz" | mysql -youraccountoptions --local-infile=1 \
              -e "LOAD DATA LOCAL INFILE \"/dev/stdin\" INTO TABLE ${T_NAME};" ${DB}
      done
      
  3. Issue permission to the "$DBVERSION" database.
    Permission could be issued as follows:
    mysql> grant SELECT, CREATE TEMPORARY TABLES on $DBVERSION.*
       to $USERNAME@$HOSTNAME identified by "$PASSWORD";
    

    To make this command work, you need to set $USERNAME, $HOSTNAME and $PASSWORD properly. In this example, public access to the "$DBVERSION" database is restricted to read-only.
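When several databases need the same read-only grant, the statement can be generated rather than typed each time. The helper below is a sketch that prints the same GRANT form shown above; print_grant and the example values are hypothetical. Newer MySQL releases may require creating the user with a separate CREATE USER statement before granting privileges.

```shell
#!/bin/sh
# Print a read-only GRANT statement in the form used above.
# $1 = database, $2 = user, $3 = host, $4 = password
print_grant() {
    printf 'GRANT SELECT, CREATE TEMPORARY TABLES ON %s.* TO %s@%s IDENTIFIED BY "%s";\n' \
        "$1" "$2" "$3" "$4"
}

# Example (hypothetical account values):
#   print_grant hg18 readonly localhost 'secret' | mysql -youraccountoptions
```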

6. Get executable files

  1. Pre-compiled Red-Hat (2.6.12-1.1381_FC3smp) AMD Opteron x86_64 64-bit binaries can be fetched with the rsync command:

    rsync -avzP rsync://hgdownload.cse.ucsc.edu/cgi-bin/ $CGI_BIN/

    There are a number of data files that are also used in this directory. This rsync will fetch them all. If you need i386 (x86) 32-bit binaries, please use the following rsync in addition to and after the above rsync, to replace the 64 bit binaries:

    rsync -avzP rsync://hgdownload.cse.ucsc.edu/cgi-bin-i386/ $CGI_BIN/

  2. In the CGI_BIN directory, make an hg.conf file that specifies the $USERNAME, $HOSTNAME and $PASSWORD used for the "$DBVERSION" database. Remember to write the actual values; variables should not be used in this file.

  3. Set up environment variables needed by the Makefile to install the binaries for the website in the correct place.
    setenv GLOBAL_CONFIG_FILE $CGI_BIN/hg.conf
    setenv HGCGI $CGI_BIN
    
  4. Download the released zipped version of the source files from here.
    Follow the instructions for how to compile the files.
    The source tree can also be obtained via CVS
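The setenv commands in step 3 are csh syntax. For Bourne-style shells (sh, bash), an equivalent sketch is shown below; the default CGI_BIN path is the stock Redhat location and should be adjusted to your installation.

```shell
#!/bin/sh
# Bourne-shell equivalent of the csh setenv lines in step 3.
CGI_BIN=${CGI_BIN:-/var/www/cgi-bin}
GLOBAL_CONFIG_FILE=$CGI_BIN/hg.conf
HGCGI=$CGI_BIN
export GLOBAL_CONFIG_FILE HGCGI
```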

Existing Mirror Sites: Please note the change history for the hgcentral database. Table structure changes have corresponding structure changes in the source code. The browser version is shown in the window title of the genome browser display page and, since v62, in the source tree file src/hg/inc/versionInfo.h.

Change history for hgcentral database tables:
  • 2005-03-28 table liftOverChain structure change, several fields added. (browser version v102)
  • 2004-12-21 added two new tables: clade and genomeClade to support new pulldown menus in the gateway page. (browser version v93)
  • 2004-08-02 table blatServers has canPcr field added (browser version v74)
  • 2004-06-01 table dbDb has hgNearOk, hgPbOk and sourceName fields added (browser version v66)
  • 2004-04-16 table liftOverChain added (browser version v60)

7. Set up the "hgcentral" tables

  1. Download the schema for the hgcentral database here.
  2. Create a hgcentral database

       mysql> create database hgcentral;
    

  3. Add the hgcentral tables

       mysql -youraccountoptions hgcentral < hgcentral.sql
    
  4. Create a user/password with the ability to update and insert. Many folks start out with the same user that they use for reading browser databases and then later create a separate user with higher privilege. However, to get started, consider trying the same user.
  5. Add that user to the hg.conf. A sample hg.conf is below:
    ###########################################################
    # Config file for the UCSC Human Genome server.
    #	This file specifies the host and user for MySQL
    #	database access.
    #
    # the format is in the form of name/value pairs
    # written 'name=value' (note that there is no space between
    # the name and its value.)
    ###########################################################
    # db.host is the name of the MySQL host to connect to
    db.host=localhost
    # db.user is the username to use when connecting to the host
    #	This user only needs SELECT permissions for read-only access
    db.user=myhguser
    # this is the password to use with the above username
    db.password=myhguserpassword
    db.trackDb=trackDb
    
    ###########################################################
    # central.host is the name of the host of the central MySQL
    # database where items common to all versions of the genome
    # and the user database is stored.  central.db is the name of
    # the database to access on that host for this information.
    central.host=localhost
    central.db=hgcentral
    #
    # The central.user needs SELECT, INSERT, UPDATE, DELETE,
    #	CREATE, DROP and ALTER permissions for hgcentral
    #	to allow maintenance of the session and user Db's
    #
    central.user=myhguser
    central.password=myhguserpassword
    central.domain=.mydomain.edu
    ###########################################################
    # backupcentral is used when the primary central DB fails.
    #	This can be identical to the central entries as above.
    backupcentral.host=localhost
    backupcentral.db=hgcentral
    backupcentral.user=myhguser
    backupcentral.password=myhguserpassword
    backupcentral.domain=.mydomain.edu
    
    # Change this default documentRoot if different in your installation,
    #       to allow some of the browser cgi binaries to find help text
    #       files
    browser.documentRoot=/usr/local/apache/htdocs
    
    #  New browser function as of March 2007, allowing saved genome browser
    #       sessions into genomewiki
    wiki.host=genomewiki.ucsc.edu
    wiki.userNameCookie=wikidb_mw1_UserName
    wiki.loggedInCookie=wikidb_mw1_UserID
    
    # New browser function as of March 2007.  Future browser code will
    #       have this on by default, and can be turned off with =off
    #   Initial release of this function requires it to be turned on here.
    browser.indelOptions=on
    
    #       personalize the background of the browser with a specified jpg
    #       floret.jpg is the standard UCSC default
    browser.background=/images/floret.jpg
    #  new option for track reordering functions, August 2006
    hgTracks.trackReordering=on
    
    #  New browser function as of April 2007, custom track data is kept
    #       in a database instead of in trash files.  This function requires
    #       several other factors to be in place before it will work.
    #  In this first implementation, this is an optional feature, but
    #       approximately by the end of the year 2007, this will be
    #       required.
    #
    #       See also:
    #       http://genomewiki.ucsc.edu/index.php?title=Using_custom_track_database
    #  Uncomment these settings and provide host, user, and password
    #  settings
    # customTracks.host=<your specific host name>
    # customTracks.user=<your specific MySQL user for this function>
    # customTracks.password=<MySQL password for specified user>
    # customTracks.useAll=yes
    # customTracks.tmpdir=/data/tmp
    #       tmpdir of /data/tmp is the default location if not specified
    #       here
    #       Set this to a directory as recommended in the genomewiki
    #       discussion mentioned above.
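Two rules for hg.conf stated in this document are easy to break by accident: there must be no space between a name and its value, and variables should not be used in the file. The following sketch flags lines that violate either rule. The name check_hgconf is hypothetical, and the check is deliberately strict: it also flags empty values and any value that contains a literal dollar sign.

```shell
#!/bin/sh
# Report hg.conf lines that are not plain name=value pairs.
# Comment lines and blank lines are skipped. Returns non-zero if any
# suspicious line is found.
check_hgconf() {
    awk '
        /^[ \t]*#/ { next }
        /^[ \t]*$/ { next }
        !/^[A-Za-z0-9_.]+=[^ \t]/ || /\$/ {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example: check_hgconf "$CGI_BIN/hg.conf"
```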
    

8. Create a "trash" directory

The CGI programs use a temporary area to create and store images used by the browser. By default, they look for this directory at $WEBROOT/trash. Create this directory and give the user that runs the web server write access to it.
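A minimal sketch of this step follows. The function name make_trash is hypothetical, and the web server account in the usage comment ("apache") is an assumption that matches stock Redhat packages; check which user your server actually runs as.

```shell
#!/bin/sh
# Create the trash directory under the given web root.
# $1 = web root, $2 = optional user that should own the directory
make_trash() {
    trash="$1/trash"
    mkdir -p "$trash" || return 1
    if [ -n "$2" ]; then
        # Changing ownership normally requires root privileges.
        chown "$2" "$trash" || return 1
    fi
    chmod u+rwx "$trash"
}

# Example (as root): make_trash /var/www/html apache
```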

9. Create the hgFixed database

  1. Download a copy of the dumped hgFixed tables.
    rsync -avzP --delete --max-delete=20 \
       rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/hgFixed/ \
       $WEBROOT/goldenPath/hgFixed/
    
  2. Create a hgFixed database:
    mysql> create database hgFixed;
    
  3. Import the data in a similar fashion as described in step 5.
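Since the same import is repeated for hgFixed and for the databases in the following steps, the loading script from step 5 can be generalized into a function that takes a dump directory and a database name. This is a sketch under stated assumptions: load_dump_dir, MYSQL and MYSQL_OPTS are hypothetical names, with MYSQL_OPTS standing in for your account options.

```shell
#!/bin/sh
# Load every table dump (NAME.sql plus NAME.txt.gz) found in a directory.
# $1 = directory containing the dump files, $2 = target database
# MYSQL defaults to "mysql"; MYSQL_OPTS holds your account options and is
# left unquoted on purpose so that several options can be passed.
load_dump_dir() {
    dir=$1
    db=$2
    for sql in "$dir"/*.sql; do
        [ -e "$sql" ] || continue
        t=$(basename "$sql" .sql)
        echo "loading table $t"
        # Create the table from its schema file, then stream in the rows.
        ${MYSQL:-mysql} $MYSQL_OPTS "$db" < "$sql"
        gzip -dc "$dir/$t.txt.gz" | ${MYSQL:-mysql} $MYSQL_OPTS --local-infile=1 \
            -e "LOAD DATA LOCAL INFILE '/dev/stdin' INTO TABLE $t;" "$db"
    done
}

# Example: load_dump_dir "$WEBROOT/goldenPath/hgFixed/database" hgFixed
```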

10. Create the protein databases

  1. Download a copy of the dumped protein tables.
    rsync -avzP --delete --max-delete=20 \
        rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/proteinDB/ \
        $WEBROOT/goldenPath/proteinDB/
    
  2. Get a list of all of the protein databases like so:
    mysql> SELECT DISTINCT(proteomeDb) FROM hgcentral.gdbPdb ORDER BY proteomeDb;
  3. Create a database for each one, e.g.:
    mysql> create database proteins080707;
    mysql> create database proteins090821;
    and so on
    
  4. Import the data from proteinDB/proteins*/database in a similar fashion as listed above.
  5. Create a symlink to the most recent proteins database, e.g.:
    	/var/lib/mysql/proteome -> proteins090821
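The symlink can be created with ln -s from inside the MySQL data directory. The sketch below picks the newest database automatically, assuming the numeric suffix is a date that sorts chronologically as plain text (as with 080707 and 090821). The function name link_latest is hypothetical.

```shell
#!/bin/sh
# Point a symlink at the lexically greatest directory with a given prefix.
# $1 = MySQL data directory, $2 = name prefix, $3 = symlink name
link_latest() {
    datadir=$1
    prefix=$2
    linkname=$3
    latest=$(cd "$datadir" && ls -d "$prefix"[0-9]* 2>/dev/null | sort | tail -n 1)
    [ -n "$latest" ] || return 1
    # -n replaces an existing symlink instead of descending into it.
    ln -sfn "$latest" "$datadir/$linkname" || return 1
    echo "$linkname -> $latest"
}

# Example: link_latest /var/lib/mysql proteins proteome
```

The same helper would apply to the UniProt databases with the prefix sp and the link name uniProt.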
    

11. Create the UniProt databases

  1. Please note usage rights for the UniProt database.
  2. Download a copy of the dumped UniProt tables.
    rsync -avzP --delete --max-delete=20 \
      rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/uniProt/ \
      $WEBROOT/goldenPath/uniProt/
    
  3. Get a list of all of the UniProt databases like so:
    mysql> SHOW DATABASES LIKE 'sp%';
  4. Create a database for each one, e.g.:
    mysql> create database sp080707;
    mysql> create database sp090821;
    and so on
    
  5. Import the data from sp*/database in a similar fashion as listed above.
  6. Create a symlink to the most recent uniProt database, e.g.:
    	/var/lib/mysql/uniProt  -> sp090821
    

12. Create the visiGene database

  1. Download a copy of the dumped visiGene tables.
    rsync -avzP --delete --max-delete=20 \
       rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/visiGene/ \
       $WEBROOT/goldenPath/visiGene/
    
  2. Create a visiGene database:
    mysql> create database visiGene;
    
  3. Import the data in a similar fashion as described in step 5.


Should you have any comments or questions, please contact genome-mirror@cse.ucsc.edu.
This page last modified: Thursday, 06-May-2010 11:11:19 PDT.