Procedure for Creating a Mirror Site for the UCSC Genome Browser
The following procedure provides you with step-by-step
instructions to incrementally create a mirror of the
UCSC Genome Browser. You may choose to set up either a
full mirror browser or a partial one, depending on your disk
space and needs. See also the Minimal Browser Installation
instructions on genomewiki.
Space required:
The amount of data available in the Genome Browser is
growing constantly. To determine the size of any of
the download directories mentioned in these instructions,
use the rsync "-n" option on the directory prior
to actually transferring the data. For instance, to find
the size of the /gbdb directory, run:
rsync -navzP rsync://hgdownload.cse.ucsc.edu/gbdb/
The rsync options used in this command are:
-n, --dry-run show what would have been transferred
-a, --archive archive mode, equivalent to -rlptgoD
-v, --verbose increase verbosity
-z, --compress compress file data
-P equivalent to --partial --progress
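As a rough sketch, the size figure can be pulled out of the dry-run output with a small shell helper. The function name rsync_total_size is made up for this example, and it assumes the run ends with rsync's usual "total size is ..." summary line.

```shell
# Print the "total size" figure (in bytes) from rsync's summary line.
# Reads rsync output on stdin; commas used as thousands separators are removed.
rsync_total_size() {
    awk '/^total size is / { gsub(/,/, "", $4); print $4 }'
}

# Example (a dry run only, nothing is transferred):
#   rsync -navz rsync://hgdownload.cse.ucsc.edu/gbdb/ | rsync_total_size
```

The -P option is left out of the example call because progress output is not needed when only the summary line is read.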
Mirror site questions may be directed to the
mailing list genome-mirror@soe.ucsc.edu.
Messages sent to this address will be posted to the
moderated genome-mirror mailing list, which is
archived on a public Web-accessible pipermail
archive. This archive may be indexed by non-UCSC
sites such as Google.
Subscribe to the genome-mirror
mailing list.
- Install Apache server and MySQL server
- Get all the html files and most of the text files
- Get the data for each individual genome assembly and install
databases
- Obtain the /gbdb data file area
- Set up database
- Get executable files
- Set up the hgcentral session preference tables
- Create a $WEBROOT/trash directory, writable by the Apache user, for temp files
- Set up the hgFixed database
- Set up the protein databases
- Set up the UniProt database
- Set up the visiGene database
1. Install Apache server and MySQL server
- We will not provide consulting on the operation or configuration
of Apache or MySQL. If you are not familiar with the setup of
Apache or MySQL, you will have to find someone at your site who
is. We use a reference platform of Red Hat 9 for all the steps
described here.
- Beginning in early March 2005, the static html pages for the browser expect
Apache to be configured with the XBitHack option enabled:
#
# Executable files will be processed for SSI
#
XBitHack on
- Find the location of your web pages. For example, on a Red Hat Linux
system using a stock Apache RPM, the web pages are stored in
/var/www/html. We will refer to this directory as
WEBROOT in this document, and you should substitute your real
path for WEBROOT whenever you see it in our write-up. Before
doing this, however, you MUST be aware of the following:
- THE FOLLOWING ASSUMES YOU ARE NOT ALREADY SERVING UP
DATA USING YOUR APACHE SERVER. BEAR IN MIND THAT COPYING FILES INTO
THE WEBROOT WILL OVERWRITE ANY EXISTING DATA IN THAT AREA.
- Find the location of your cgi-bin directory. It should be under
the parent directory of the WEBROOT directory. For example, on
a Red Hat Linux system using a stock Apache RPM, the path of
the cgi-bin directory is /var/www/cgi-bin. We will refer to
this directory as CGI_BIN in this document, and you should substitute your
real path for CGI_BIN whenever you see it.
- Next, find the location of your MySQL data. For example, on a
Red Hat Linux system using stock RPMs, this is located in
/var/lib/mysql. We will refer to this directory as
MYSQLDATA in this document, and you should substitute
your real path for MYSQLDATA whenever you see it.
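One convenient way to carry these three locations through the later steps is to define them once as shell variables. The sketch below uses the stock Red Hat example paths named above as defaults; they are only examples, and your own paths should replace them.

```shell
# Example locations from a stock Red Hat install; replace with your own paths.
# The ${VAR:-default} form keeps any value already set in the environment.
WEBROOT="${WEBROOT:-/var/www/html}"
CGI_BIN="${CGI_BIN:-/var/www/cgi-bin}"
MYSQLDATA="${MYSQLDATA:-/var/lib/mysql}"
export WEBROOT CGI_BIN MYSQLDATA
```

With these set, the commands in the following steps that mention $WEBROOT or $CGI_BIN can be pasted into the same shell session unchanged.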
2. Get all the html files and most of the text files
- Obtain the software package rsync. If it isn't already
installed on your system, obtain it from
http://rsync.samba.org
- Test the rsync connection:
rsync -navz --progress rsync://hgdownload.cse.ucsc.edu
This should respond with:
genome UCSC Human Genome Downloads
htdocs UCSC Human Genome Web Site Htdocs
goldenPath UCSC Human Golden Path Downloads
cgi-bin UCSC Human Genome Web Site CGI Binaries x86_64
cgi-bin-i386 UCSC Human Genome Web Site CGI Binaries i386
gbdb UCSC Human Genome Browser Gbdb Config Files
archives UCSC Human Genome Browser Archived Config Files
mysql UCSC Human Genome Raw Mysql Tables
- Determine the destination of the copy ($WEBROOT) and start the production copy (about 270 MB).
The trailing slash is important!
rsync -avzP rsync://hgdownload.cse.ucsc.edu/htdocs/ $WEBROOT/
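The trailing slash matters because rsync treats a source path ending in "/" as "copy the contents of this directory" rather than "copy the directory itself". A small helper can make sure a path always ends in exactly one slash before it is handed to rsync. The name with_trailing_slash is hypothetical, chosen for this sketch.

```shell
# Print the given path with exactly one trailing slash, so that rsync copies
# the directory's contents rather than the directory itself.
with_trailing_slash() {
    printf '%s\n' "$1" | sed 's:/*$:/:'
}

# Example:
#   rsync -avzP "$(with_trailing_slash rsync://hgdownload.cse.ucsc.edu/htdocs)" \
#       "$(with_trailing_slash "$WEBROOT")"
```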
3. Get the data for each individual genome assembly and install databases
- Determine which of the databases you are going to mirror. To see all
available databases, use the "SHOW DATABASES;" command on the
public MySQL server.
- Get the data for each of the desired databases. For instance, to get the
Human March 2006 full data set, do:
mkdir -p $WEBROOT/goldenPath/hg18/database/
rsync -avzP --delete --max-delete=20 \
rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/hg18/database/ \
$WEBROOT/goldenPath/hg18/database/
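When several assemblies are mirrored, the same two commands repeat with only the database name changed. The sketch below prints those commands for a list of names so they can be reviewed before being run. The function name print_db_rsync_commands is an assumption of this example, and it expects WEBROOT to be set already.

```shell
# Print the mkdir and rsync commands for each assembly name given as an
# argument. WEBROOT must be set. Nothing is executed; commands are only printed.
print_db_rsync_commands() {
    for db in "$@"; do
        printf 'mkdir -p %s/goldenPath/%s/database/\n' "$WEBROOT" "$db"
        printf 'rsync -avzP --delete --max-delete=20 rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/%s/database/ %s/goldenPath/%s/database/\n' \
            "$db" "$WEBROOT" "$db"
    done
}

# Example: review the output, then pipe it to sh to run it.
#   print_db_rsync_commands hg18 mm8
```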
4. Obtain the /gbdb data file area
- You will need the portions of /gbdb used by the browser:
rsync -avzP --delete --max-delete=20 \
rsync://hgdownload.cse.ucsc.edu/gbdb/ /gbdb/
5. Set up database
Use the following table to identify the freeze
date ($FREEZEDATE) and the database version ($DBVERSION) for each genome
assembly:
$FREEZEDATE | $DBVERSION
----------- | ----------
hg18        | hg18
hg17        | hg17
hg16        | hg16
10april2003 | hg15
panTro2     | panTro2
panTro1     | panTro1
canFam2     | canFam2
canFam1     | canFam1
bosTau2     | bosTau2
bosTau1     | bosTau1
mm8         | mm8
mm7         | mm7
mm6         | mm6
mm5         | mm5
rn4         | rn4
rnJun2003   | rn3
rnJan2003   | rn2
galGal3     | galGal3
galGal2     | galGal2
monDom4     | monDom4
monDom1     | monDom1
xenTro2     | xenTro2
xenTro1     | xenTro1
danRer4     | danRer4
danRer3     | danRer3
danRer2     | danRer2
danRer1     | danRer1
tetNig1     | tetNig1
fr1         | fr1
ci1         | ci1
dm2         | dm2
dm1         | dm1
droYak1     | droYak1
droAna1     | droAna1
dp2         | dp2
droVir1     | droVir1
droMoj1     | droMoj1
apiMel2     | apiMel2
apiMel1     | apiMel1
anoGam1     | anoGam1
ce2         | ce2
ceMay2003   | ce1
cbJul2002   | cb1
sacCer1     | sacCer1
scApr2003   | sc1
- After connecting to the MySQL server, create a database called "$DBVERSION"
corresponding to $FREEZEDATE in the table above.
mysql> create database $DBVERSION;
- Create tables for the "$DBVERSION" database.
- Issue permission to the "$DBVERSION" database.
Permission could be issued as follows:
mysql> grant SELECT, CREATE TEMPORARY TABLES on $DBVERSION.*
to $USERNAME@$HOSTNAME identified by "$PASSWORD";
For this command to work, replace $USERNAME, $HOSTNAME and
$PASSWORD with your own values. In this example, public access to the "$DBVERSION"
database is restricted to read-only.
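The "Create tables" step above does not spell out commands. One possible approach is sketched below, under the assumption that the download directory from step 3 holds, for each table, a <table>.sql schema file and a <table>.txt.gz tab-separated data file. The function name print_load_commands and the use of /tmp for the unpacked files are choices made for this example only. The function prints the commands rather than running them.

```shell
# Print the commands that would load every table found in a download directory.
# Arguments: download directory, database name.
# Assumes each table is a <table>.sql schema file plus a <table>.txt.gz
# tab-separated data file; adjust if your downloads are laid out differently.
print_load_commands() {
    dir=$1
    db=$2
    for sql in "$dir"/*.sql; do
        [ -e "$sql" ] || continue      # the glob matched nothing
        table=$(basename "$sql" .sql)
        printf 'mysql %s < %s\n' "$db" "$sql"
        if [ -e "$dir/$table.txt.gz" ]; then
            # mysqlimport derives the table name from the file name
            txt="/tmp/$table.txt"
            printf 'gunzip -c %s > %s && mysqlimport --local %s %s\n' \
                "$dir/$table.txt.gz" "$txt" "$db" "$txt"
        fi
    done
}

# Example:
#   print_load_commands "$WEBROOT/goldenPath/hg18/database" hg18
```

Add your own MySQL account options (user, password, host) to the printed mysql and mysqlimport commands before running them.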
6. Get executable files
- Pre-compiled Red-Hat (2.6.12-1.1381_FC3smp) AMD Opteron x86_64 64-bit
binaries can be fetched with the rsync command:
rsync -avzP rsync://hgdownload.cse.ucsc.edu/cgi-bin/ $CGI_BIN/
There are a number of data files that are also used in this directory.
This rsync will fetch them all. If you need i386 (x86) 32-bit binaries, please
use the following rsync in addition to and after the above rsync,
to replace the 64 bit binaries:
rsync -avzP rsync://hgdownload.cse.ucsc.edu/cgi-bin-i386/ $CGI_BIN/
- In the CGI_BIN directory, create an hg.conf file that specifies the
$USERNAME, $HOSTNAME and $PASSWORD used for the "$DBVERSION" database.
Remember to write the literal values: variables should not be used in this file.
- Set up environment variables needed by the Makefile to install the binaries for the website in the correct place.
setenv GLOBAL_CONFIG_FILE $CGI_BIN/hg.conf
setenv HGCGI $CGI_BIN
- Download the released zipped version of the source files from
here. Follow the instructions for how to compile the files.
The source tree can also be obtained via CVS.
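The setenv lines in the environment-variable step above use csh/tcsh syntax. In a Bourne-style shell such as sh or bash, the equivalent would look roughly like the following. The fallback path for CGI_BIN is the Red Hat example from step 1, not a required location.

```shell
# Bourne-shell equivalents of the setenv lines. CGI_BIN falls back to the
# example Red Hat path if it is not already set.
CGI_BIN="${CGI_BIN:-/var/www/cgi-bin}"
export GLOBAL_CONFIG_FILE="$CGI_BIN/hg.conf"
export HGCGI="$CGI_BIN"
```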
Existing Mirror Sites: Please note the change history for the
hgcentral database. Table structure changes have corresponding
structure changes in the source code. The browser version can
be seen in the title window decoration of the genome browser display
page or, since v62, in the source tree file src/hg/inc/versionInfo.h.
Change history for hgcentral database tables:
- 2005-03-28 table liftOverChain structure change, several fields added.
(browser version v102)
- 2004-12-21 added two new tables: clade and genomeClade to support
new pulldown menus in the gateway page. (browser version v93)
- 2004-08-02 table blatServers has canPcr field added
(browser version v74)
- 2004-06-01 table dbDb has hgNearOk, hgPbOk and sourceName fields added
(browser version v66)
- 2004-04-16 table liftOverChain added (browser version v60)
7. Set up the "hgcentral" tables
- Download the schema for the hgcentral database here.
- Create an hgcentral database
mysql> create database hgcentral;
- Add the hgcentral tables
mysql -youraccountoptions hgcentral < hgcentral.sql
- Create a user/password with the ability to update and insert. Many folks
start out with the same user that they use for reading browser databases
and then later create a separate user with higher privilege. However, to
get started, consider trying the same user.
- Add that user to the hg.conf. A sample hg.conf is below:
###########################################################
# Config file for the UCSC Human Genome server.
# This file specifies the host and user for MySQL
# database access.
#
# the format is in the form of name/value pairs
# written 'name=value' (note that there is no space between
# the name and its value.)
###########################################################
# db.host is the name of the MySQL host to connect to
db.host=localhost
# db.user is the username to use when connecting to the host
# This user only needs SELECT permissions for read-only access
db.user=myhguser
# this is the password to use with the above username
db.password=myhguserpassword
db.trackDb=trackDb
###########################################################
# central.host is the name of the host of the central MySQL
# database where items common to all versions of the genome
# and the user database is stored. central.db is the name of
# the database to access on that host for this information.
central.host=localhost
central.db=hgcentral
#
# The central.user needs SELECT, INSERT, UPDATE, DELETE,
# CREATE, DROP and ALTER permissions for hgcentral
# to allow maintenance of the session and user Db's
#
central.user=myhguser
central.password=myhguserpassword
central.domain=.mydomain.edu
###########################################################
# backupcentral is used when the primary central DB fails.
# This can be identical to the central entries as above.
backupcentral.host=localhost
backupcentral.db=hgcentral
backupcentral.user=myhguser
backupcentral.password=myhguserpassword
backupcentral.domain=.mydomain.edu
# Change this default documentRoot if different in your installation,
# to allow some of the browser cgi binaries to find help text
# files
browser.documentRoot=/usr/local/apache/htdocs
# New browser function as of March 2007, allowing saved genome browser
# sessions into genomewiki
wiki.host=genomewiki.ucsc.edu
wiki.userNameCookie=wikidb_mw1_UserName
wiki.loggedInCookie=wikidb_mw1_UserID
# New browser function as of March 2007. Future browser code will
# have this on by default, and can be turned off with =off
# Initial release of this function requires it to be turned on here.
browser.indelOptions=on
# personalize the background of the browser with a specified jpg
# floret.jpg is the standard UCSC default
browser.background=/images/floret.jpg
# new option for track reordering functions, August 2006
hgTracks.trackReordering=on
# New browser function as of April 2007, custom track data is kept
# in a database instead of in trash files. This function requires
# several other factors to be in place before it will work.
# In this first implementation, this is an optional feature, but
# approximately by the end of the year 2007, this will be
# required.
#
# See also:
# http://genomewiki.ucsc.edu/index.php?title=Using_custom_track_database
# Uncomment these settings and provide host, user, and password
# settings
# customTracks.host=<your specific host name>
# customTracks.user=<your specific MySQL user for this function>
# customTracks.password=<MySQL password for specified user>
# customTracks.useAll=yes
# customTracks.tmpdir=/data/tmp
# tmpdir of /data/tmp is the default location if not specified
# here
# Set this to a directory as recommended in the genomewiki
# discussion mentioned above.
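For the "Create a user/password" step above, the privileges named in the hg.conf comments can be turned into a GRANT statement. The sketch below only prints the statement. It follows the same "identified by" form as the grant example in step 5, which newer MySQL releases no longer accept inside GRANT (there, create the user with a separate CREATE USER statement first). The function name print_hgcentral_grant is made up for this example.

```shell
# Print a GRANT statement giving a user the hgcentral privileges listed in
# the hg.conf comments. Arguments: user, host, password.
print_hgcentral_grant() {
    printf 'grant SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER on hgcentral.* to %s@%s identified by "%s";\n' \
        "$1" "$2" "$3"
}

# Example, using the placeholder account from the sample hg.conf:
#   print_hgcentral_grant myhguser localhost myhguserpassword
```

The printed statement can be pasted at the mysql> prompt of an account that is allowed to grant privileges.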
8. Create a "trash" directory
The cgi programs use a temporary area to create and store images
used by the browser. This directory is by default looked for in
$WEBROOT/trash. You should make this directory and allow the user
that runs the web server write access to it.
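A minimal sketch of this step follows. The function name make_trash_dir is hypothetical, and the account that runs your web server (often "apache" on Red Hat systems, though this varies by installation) is passed in as an optional second argument. When no owner is given, the sketch falls back to a world-writable directory with the sticky bit set, in the style of /tmp. Either choice is an assumption of this example, not a requirement stated here.

```shell
# Create the trash directory under the given web root.
# If a second argument is given, it is used as the owner (usually needs root);
# otherwise the directory is made world-writable with the sticky bit, like /tmp.
make_trash_dir() {
    trash="$1/trash"
    mkdir -p "$trash" || return 1
    if [ -n "${2:-}" ]; then
        chown "$2" "$trash"
    else
        chmod 1777 "$trash"
    fi
}

# Example (as root, with your web server's user name):
#   make_trash_dir "$WEBROOT" apache
```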
9. Create the hgFixed database
- Download a copy of the dumped hgFixed tables.
rsync -avzP --delete --max-delete=20 \
rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/hgFixed/ \
$WEBROOT/goldenPath/hgFixed/
- Create an hgFixed database
mysql> create database hgFixed;
- Import the data in a similar fashion as listed above.
10. Create the protein databases
- Download a copy of the dumped protein tables.
rsync -avzP --delete --max-delete=20 \
rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/proteinDB/ \
$WEBROOT/goldenPath/proteinDB/
- Get a list of all of the protein databases like so:
mysql> SELECT DISTINCT(proteomeDb) FROM hgcentral.gdbPdb ORDER BY proteomeDb;
- Create a database for each one, e.g.:
mysql> create database proteins080707;
mysql> create database proteins090821;
and so on
- Import the data from proteinDB/proteins*/database in a similar
fashion as listed above.
- Create a symlink to the most recent proteins database, e.g.:
/var/lib/mysql/proteome -> proteins090821
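The symlink can be created by hand with ln -s, or the newest database can be picked automatically. The sketch below assumes each database appears as a directory under the MySQL data directory, named with a prefix followed by a YYMMDD date as in the examples above, so that plain string sorting puts the newest last. The function name link_latest_db is made up for this example.

```shell
# Point <link_name> in the MySQL data directory at the newest database
# directory whose name is <prefix> followed by digits (e.g. proteins090821).
# Arguments: MySQL data directory, prefix, link name.
link_latest_db() {
    mysqldata=$1
    prefix=$2
    link_name=$3
    latest=$(ls "$mysqldata" | grep -E "^${prefix}[0-9]+\$" | sort | tail -n 1)
    [ -n "$latest" ] || return 1       # no matching database directory found
    ln -sfn "$latest" "$mysqldata/$link_name"
}

# Examples (run as a user with write access to the data directory):
#   link_latest_db /var/lib/mysql proteins proteome
#   link_latest_db /var/lib/mysql sp uniProt
```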
11. Create the UniProt databases
- Please note usage rights for the UniProt database.
- Download a copy of the dumped UniProt tables.
rsync -avzP --delete --max-delete=20 \
rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/uniProt/ \
$WEBROOT/goldenPath/uniProt/
- Get a list of all of the UniProt databases like so:
mysql> SHOW DATABASES LIKE 'sp%';
- Create a database for each one, e.g.:
mysql> create database sp080707;
mysql> create database sp090821;
and so on
- Import the data from sp*/database in a similar fashion as listed above.
- Create a symlink to the most recent uniProt database, e.g.:
/var/lib/mysql/uniProt -> sp090821
12. Create the visiGene database
- Download a copy of the dumped visiGene tables.
rsync -avzP --delete --max-delete=20 \
rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/visiGene/ \
$WEBROOT/goldenPath/visiGene/
- Create a visiGene database
mysql> create database visiGene;
- Import the data in a similar fashion as listed above.
Should you have any comments or questions, please contact
genome-mirror@cse.ucsc.edu.
This page last modified: Thursday, 06-May-2010 11:11:19 PDT.