Site installation and setup tips: general considerations
This page refers to the INFNgrid customization, although most of its content may also apply to the plain EGEE gLite release.
Getting the operating system
The current operating system release to be used for gLite3.2 is ScientificLinux 5.5. The
ScientificLinux website lists the mirrors from which you can download the ISO images.
Note (as of 2011-02-10): release SL5.6 is already out, and I expect it to work as well.
It is advised NOT to use the CERN version of ScientificLinux (also known as SLC5.X), as it is tailored to the CERN environment: for example, it sets many CERN-specific environment variables.
Creating a local repository with mrepo
A local repository is very useful when doing multiple installations in parallel, or when you want your system updates to be protected against network instabilities. If these are your needs, then
mrepo is the tool for you.
The mirrored repositories will be served by the mrepo host, which must therefore be configured as an HTTP server.
Download the latest mrepo version from
the CERN DAG repository or from the
mrepo website: the package is available as an RPM. The latest version available at the time of writing is
0.8.7-1. Note that mrepo relies on another package,
createrepo, which can also be downloaded from the DAG repository: on my SL5 host I got
createrepo version
0.4.11. Most likely, however, your machine already "knows" about the DAG repository: if the file
/etc/yum.repos.d/dag.repo exists, you can install
mrepo more simply with
yum --enablerepo=dag install mrepo.
Note for SL4 users: the
mrepo and
createrepo packages you install will be older than the versions listed above. You may still be able to get
yum groupinstall working by following the advice in the next paragraph.
Note for both SL5 and SL4 users: you will need to apply the following change to the
mrepo script, since it does not compose the path to the
comps.xml file properly.
*** /usr/bin/mrepo 2010-03-22 01:57:41.000000000 +0100
--- mrepo.modified 2010-07-04 20:58:02.000000000 +0200
***************
*** 845,851 ****
repoopts = repoopts + ' --cachedir "%s"' % cachedir
if os.path.isdir(os.path.join(self.wwwdir, '.olddata')):
remove(os.path.join(self.wwwdir, '.olddata'))
! groupfile = os.path.join(cf.srcdir, self.dist.nick, self.name + '-comps.xml')
if os.path.isfile(groupfile):
symlink(groupfile, os.path.join(self.wwwdir, 'comps.xml'))
repoopts = repoopts + ' --groupfile "%s"' % groupfilename
--- 845,853 ----
repoopts = repoopts + ' --cachedir "%s"' % cachedir
if os.path.isdir(os.path.join(self.wwwdir, '.olddata')):
remove(os.path.join(self.wwwdir, '.olddata'))
! # groupfile = os.path.join(cf.srcdir, self.dist.nick, self.name + '-comps.xml')
! groupfile = os.path.join(cf.srcdir, self.dist.nick, self.name + '/comps.xml')
! info(2, 'Groupfile: %s' % (groupfile))
if os.path.isfile(groupfile):
symlink(groupfile, os.path.join(self.wwwdir, 'comps.xml'))
repoopts = repoopts + ' --groupfile "%s"' % groupfilename
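To see what the patch changes, the following minimal Python sketch evaluates the original and the patched path expressions side by side. It assumes srcdir = /data2/mrepo, as in the mrepo.conf shown below; the distribution nick sl5-x86_64 and the repository name os are hypothetical values chosen only for illustration.

```python
import os.path


def comps_paths(srcdir, nick, name):
    """Return (original, patched) locations where mrepo looks for comps.xml."""
    # Original mrepo: a file called "<repo>-comps.xml" next to the repository directory
    original = os.path.join(srcdir, nick, name + '-comps.xml')
    # Patched mrepo: a file called "comps.xml" inside the repository directory itself
    patched = os.path.join(srcdir, nick, name + '/comps.xml')
    return original, patched


if __name__ == '__main__':
    # Hypothetical nick and repository name, for illustration only
    for path in comps_paths('/data2/mrepo', 'sl5-x86_64', 'os'):
        print(path)
```

With these values the original expression points to /data2/mrepo/sl5-x86_64/os-comps.xml, while the patched one points to /data2/mrepo/sl5-x86_64/os/comps.xml.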
After installing the package, edit the
/etc/mrepo.conf file so that it reads more or less as follows:
[main]
srcdir = /data2/mrepo
# Make wwwdir to point to a directory served by the webserver
wwwdir = /var/www/mrepo
confdir = /etc/mrepo.conf.d
arch = x86_64
metadata=yum repomd
# option -P=4 will be passed to lftp, to allow parallel downloads
# the -P=4 option sometimes gives trouble, so it is left commented out:
# lftp-mirror-options = -c -P=4 -X "*/Fermi/*" -X "*/example/*"
lftp-mirror-options = -c -X "*/Fermi/*" -X "*/example/*"
# Option -d is used to make 'yum groupinstall' work: it can be used ONLY with createrepo 0.4.11 or later.
# If you have an older version of createrepo, comment out the following line
createrepo-options = -d -p
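The comment about option -d boils down to a version comparison. The sketch below (plain Python, not part of mrepo) picks the right createrepo-options line for a given createrepo version; it assumes you obtain a plain dotted version string, for example with rpm -q --qf '%{VERSION}' createrepo. Comparing integer tuples rather than strings matters here: as strings, '0.4.9' would sort after '0.4.11'.

```python
def parse_version(text):
    """Turn a dotted version such as '0.4.11' into a comparable tuple (0, 4, 11)."""
    return tuple(int(part) for part in text.strip().split('.'))


# According to the mrepo.conf comment, -d requires createrepo 0.4.11 or later
MIN_CREATEREPO_FOR_D = (0, 4, 11)


def createrepo_options(installed_version):
    """Return the createrepo-options line suited to the installed createrepo."""
    if parse_version(installed_version) >= MIN_CREATEREPO_FOR_D:
        return 'createrepo-options = -d -p'
    # Older createrepo: keep the line commented out, as the configuration advises
    return '# createrepo-options = -d -p'
```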
Note that all packages will be downloaded to
srcdir, so make sure there is enough disk space there. Also note that updates are added to the repositories from time to time: allow 50 GB as a bare minimum, but more would be desirable (say 100 GB).
The path pointed to by
wwwdir, on the other hand, will mainly contain compressed xml files, plus symbolic links to files under
srcdir. The -X exclusions in the lftp-mirror-options line are important, to avoid endless recursion when downloading from the ScientificLinux website. Configuration files for the individual repositories to be mirrored go under
confdir.
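A quick way to check the 50 GB minimum (100 GB desirable) guideline before the first run is sketched below. It uses Python's shutil.disk_usage on the srcdir path from the configuration above (the directory must already exist), and it assumes 1 GB means 10^9 bytes, since the text does not say which unit is meant.

```python
import shutil

GB = 10 ** 9  # assumed decimal gigabyte


def space_verdict(free_bytes, minimum_gb=50, desirable_gb=100):
    """Classify free space against the 50 GB minimum / 100 GB desirable guideline."""
    if free_bytes < minimum_gb * GB:
        return 'insufficient'
    if free_bytes < desirable_gb * GB:
        return 'minimum met'
    return 'comfortable'


def check_srcdir(srcdir='/data2/mrepo'):
    """Apply space_verdict to the filesystem holding srcdir (path from mrepo.conf)."""
    return space_verdict(shutil.disk_usage(srcdir).free)
```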
Verify that the mrepo installation created a crontab file under
/etc/cron.d similar to this one:
### Enable this if you want mrepo to daily synchronize
### your distributions and repositories at 0:30am.
30 0 * * * root echo "### Mrepo running on: "`date` >> /var/log/mrepo.log 2>&1 ; /usr/bin/mrepo -ugfv >> /var/log/mrepo.log 2>&1
The first time, however, I advise you to run mrepo interactively.
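In a crontab entry the minute field comes first and the hour second, so the 30 0 in the entry above means 00:30; a daily run at 2:30am would need 30 2. The small helper below is a sketch that only handles plain numeric minute and hour fields, which is all a daily job needs.

```python
def cron_run_time(entry):
    """Return (hour, minute) for a daily crontab line such as '30 0 * * * root cmd'."""
    fields = entry.split()
    # crontab field order: minute, hour, day of month, month, day of week
    minute, hour = int(fields[0]), int(fields[1])
    return hour, minute


if __name__ == '__main__':
    line = '30 0 * * * root /usr/bin/mrepo -ugfv >> /var/log/mrepo.log 2>&1'
    hour, minute = cron_run_time(line)
    print('mrepo runs daily at %02d:%02d' % (hour, minute))
```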
Below you can find a list of useful configuration files for mrepo, to be put under
confdir. Read the content of each file and tailor it to your needs/taste if necessary: if there is something you do not want to mirror, you have two possibilities:
- edit the configuration file and comment out the relevant lines, or
- if you are not interested in any of the archives contained in the file, you can more simply rename it. For example, suppose that after the initial mirroring you no longer want to mirror
jpackage (since, at the time of writing, its repositories are broken): rather than removing jpackage.conf, you can just rename it to jpackage.conf.stop.
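Renaming works because, as far as I know, mrepo only reads the files in confdir whose names end in .conf (treat this as an assumption and check it against your mrepo version). Under that assumption, the sketch below tells which configuration files are active and which are parked.

```python
import os


def split_configs(names):
    """Separate *.conf files (assumed to be read by mrepo) from parked ones such as *.conf.stop."""
    active = sorted(n for n in names if n.endswith('.conf'))
    parked = sorted(n for n in names if not n.endswith('.conf'))
    return active, parked


def mrepo_config_status(confdir='/etc/mrepo.conf.d'):
    """Apply split_configs to the actual content of confdir."""
    return split_configs(os.listdir(confdir))
```

For example, mrepo_config_status() on a host where jpackage has been parked would list jpackage.conf.stop among the parked files.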
At the time of writing, some services still require ScientificLinux4 and gLite31: the relevant files are included in the following archives.
- mrepo_conf_official.tgz: mrepo configuration files for official repositories. This archive contains the configuration files needed to mirror the most common repositories: DAG, lcgCA, Jpackage, SL49, SL57, gLite. If you are creating a national mirror, you will most likely want to mirror all of them. If
- mrepo_conf_localmirror.tgz: mrepo configuration files for national mirror repositories. These files are meant for institutions wishing to mirror from a national mirror. To use them, you just need to replace the server name in each file with that of your national mirror. Your institution will most likely not be interested in mirroring everything (for example, you may just want to mirror the gLite32 repositories), so I suggest you read each configuration file and comment out lines as needed.
The file names should be self-explanatory; in any case:
- dag: mirror of the DAG archive (a bit tautologically)
- egi-trustanchors: mirror of the Certification Authorities certificates
- jpackage: mirror of the JPackage archive, for SL4 only
- sl49 and sl57: mirror of ScientificLinux 4.9 and 5.7 respectively
- glite31.linuxsoft and glite32.linuxsoft: mirror of gLite31 and gLite32 repositories for EGEE-like installations
- glite31.gridit and glite32.gridit: mirror of gLite31 and gLite32 repositories for InfnGrid-like installations
- ig31 and ig32: mirror of InfnGrid additional repositories for gLite31 and gLite32 respectively
Note that, at the time of writing, the official jpackage repositories are broken. I do not yet have a solution for periodic mirroring; however, since this repository is for SL4 only, it is not expected to change much over time. You can try to mirror from GARR (mrepo configuration file in the mrepo_conf_localmirror archive above), but this will not remain possible forever. Alternatively, you can follow this recipe ONCE to create your local (static) mirror:
- download jpackage.conf.stop to /etc/mrepo.conf.d/
- cd /etc/mrepo.conf.d/ ; mv jpackage.conf.stop jpackage.conf
- run mrepo interactively: it will fail on some mirrors (do not worry, this is normal), but it will eventually find a working mirror and download the packages (check this in a separate window). At the time of writing, the free part contains about 2700 packages, while the non-free part contains about 15.
- while mrepo is running, check the content of the free/non-free subdirectories: if the number returned by "ls -l /RPMS.baseurl/ | wc -l" stays constant for more than 5 minutes, go to the window where mrepo is running and press "Ctrl+C" (you will have to do this once for each working mirror, for both the free and non-free downloads)
- after mrepo has finished mirroring all the remaining repositories, it will correctly create the repomd.xml files somewhere under [wwwdir]
- cd /etc/mrepo.conf.d/ ; mv jpackage.conf jpackage.conf.stop so that jpackage is not automatically mirrored
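The "count stays constant for 5 minutes" check in the recipe above can be automated with a small polling loop. This is only a sketch: the directory to watch is the one you would otherwise inspect with ls, and the 30-second polling interval is an arbitrary choice. It only tells you when to press Ctrl+C; it does not interrupt mrepo itself.

```python
import os
import time


def count_entries(path):
    """Number of entries in path, roughly what 'ls path | wc -l' reports."""
    return len(os.listdir(path))


def wait_until_stable(path, stable_for=300, poll=30):
    """Block until the entry count in path has not changed for stable_for seconds.

    The default of 300 seconds mirrors the 'constant for more than 5 minutes' rule.
    Returns the final count.
    """
    last = count_entries(path)
    unchanged_since = time.monotonic()
    while time.monotonic() - unchanged_since < stable_for:
        time.sleep(poll)
        current = count_entries(path)
        if current != last:
            # Still downloading: restart the stability timer
            last, unchanged_since = current, time.monotonic()
    return last
```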
To use your local repositories, you will have to prepare yum configuration files that include your own repositories alongside the official ones. Please find below some archives containing most (or all) of the yum files you may want to use:
In order to use these yum configuration files you need to:
To be very explicit, or in case you prefer to do things by hand, or need something not covered by the above yum archives, here is the idea behind the files contained in the archives. Suppose the installation guide tells you to download the yum configuration file
ig.repo which reads:
#
# INFNGRID repositories
#
[ig_sl5_x86_64]
name = INFNGRID 3.2 x86_64
baseurl = http://grid-it.cnaf.infn.it/mrepo/ig_sl5-x86_64/RPMS.3_2_0/
enabled = 1
protect = 0
[ig_sl5_x86_64_externals]
name = INFNGRID 3.2 x86_64 (externals)
baseurl = http://grid-it.cnaf.infn.it/mrepo/ig_sl5-x86_64/RPMS.3_2_0_externals/
enabled = 1
protect = 0
If you are mirroring the InfnGrid repositories (as per
ig32.conf above), you can modify this yum file so that it reads more or less like:
#
# INFNGRID repositories
#
[ig_sl5_x86_64]
name = INFNGRID 3.2 x86_64
baseurl = http://10.23.23.11/mrepo/ig_sl5-x86_64/RPMS.3_2_0/ http://grid-it.cnaf.infn.it/mrepo/ig_sl5-x86_64/RPMS.3_2_0/
enabled = 1
protect = 0
[ig_sl5_x86_64_externals]
name = INFNGRID 3.2 x86_64 (externals)
baseurl = http://10.23.23.11/mrepo/ig_sl5-x86_64/RPMS.3_2_0_externals/ http://grid-it.cnaf.infn.it/mrepo/ig_sl5-x86_64/RPMS.3_2_0_externals/
enabled = 1
protect = 0
As you can see, we have added the local repository, assumed to be served by the server
http://10.23.23.11/, to the
baseurl option (which accepts a blank-separated list of URLs). You can now save the modified file as
ig_local.repo, store it in a safe place and copy it to
/etc/yum.repos.d/ on each machine or server you are installing (instead of
ig.repo).
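If you have many yum files to adapt, the edit can be scripted. The sketch below assumes, as in the example above, that the local mirror at http://10.23.23.11/ uses the same path layout as grid-it.cnaf.infn.it; it prepends the local URL to every baseurl that points at the upstream server and leaves all other lines untouched.

```python
LOCAL_MIRROR = 'http://10.23.23.11/'        # local mrepo server from the example above
UPSTREAM = 'http://grid-it.cnaf.infn.it/'    # upstream server used in ig.repo


def localize_repo(text, local=LOCAL_MIRROR, upstream=UPSTREAM):
    """Prepend the local mirror to every baseurl pointing at the upstream server."""
    out = []
    for line in text.splitlines():
        key, sep, value = line.partition('=')
        url = value.strip()
        if sep and key.strip() == 'baseurl' and url.startswith(upstream):
            # Same path on the local mirror, listed before the upstream URL
            mirror = local + url[len(upstream):]
            line = 'baseurl = %s %s' % (mirror, url)
        out.append(line)
    return '\n'.join(out) + '\n'
```

For example, write localize_repo(open('ig.repo').read()) to ig_local.repo. Also keep in mind that, according to the yum.conf man page, yum's failovermethod defaults to roundrobin, which picks among the listed URLs at random; adding failovermethod=priority to each section makes yum try the URLs in the listed order, local mirror first.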
Worker nodes on a private network
Having worker nodes in a private network is of course possible. Note, however, that the automatic configuration gets somewhat confused by this setup, so please carefully read the
last part of this page before running
ig_yaim configure.
Assuming you want to use the private network
192.168.106.x, you will need to:
- elect one machine to act as NAT: this can be the CE or the UI, for example, or even a cheap diskless piece of hardware. The NAT host must have two network interfaces; let's assume
eth0 is the public one and eth1 the private one. You need to:
- choose a domain name for your private network, say
cluster.garr. Name resolution can be handled either by configuring DNS appropriately or, if you have just a few machines, by editing the /etc/hosts file (and keeping it in sync across hosts).
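For the /etc/hosts option, the sketch below generates consistent entries for the private network 192.168.106.x and the domain cluster.garr used in the example. The host names (ce01, wn01, ...) are hypothetical, and addresses are assigned sequentially starting from .1, which is an arbitrary choice you can change with the first parameter.

```python
import ipaddress


def private_hosts(names, network='192.168.106.0/24', domain='cluster.garr', first=1):
    """Return /etc/hosts lines giving each name a consecutive private address."""
    net = ipaddress.ip_network(network)
    lines = []
    for offset, name in enumerate(names, start=first):
        addr = net.network_address + offset
        # Refuse addresses outside the subnet and the broadcast address
        if addr not in net or addr == net.broadcast_address:
            raise ValueError('%s does not fit in %s' % (name, network))
        lines.append('%s %s.%s %s' % (addr, name, domain, name))
    return lines


if __name__ == '__main__':
    # Hypothetical host names, for illustration only
    print('\n'.join(private_hosts(['ce01', 'wn01', 'wn02'])))
```

The same generated lines can then be appended to /etc/hosts on every host, which keeps the files in sync.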
Notes for Torque/Maui configuration:
- your CE will have both a private and a public name: make sure the
root user for both names is declared among the "operators" and "managers" in qmgr. If you need to add managers, for example, you will have to: stop PBS, edit the file /var/spool/pbs/server_priv/acl_svr/managers, start PBS.
- if Maui fails to start, you will probably need to change the value of
SERVERHOST in /var/spool/maui/maui.cfg: most likely you will have to use the public name of the CE, but you can try both names and see which one works
- you will most probably want to configure your UI as a WN, too. This is easy: you just need to give your UI a private address (and a private name, to be declared in /etc/hosts for the whole cluster if you don't want to mess with DNS). Remember that before configuring with
yaim, you will have to set the hostname to the private one (and revert to the public one after the configuration has been completed). Another thing to remember: verify that your UI has host-based, password-less ssh access to the CE. Test that password-less access works with both the private and the public CE name, in both the full and the short format: you may have to stop the nscd service to make sure that the content of /etc/hosts is what is actually in effect.
Decide which services you want to deploy
As a bare minimum you should have a computing element (CE), a site BDII, a storage element (SE), some worker-nodes (WN) and a user interface (UI).
Note that starting from gLite 3.2 update 16, released 2010-08-04, the site BDII cannot be installed on the same host as the CE. Since the site BDII is a "light" service, I suggest you set it up as a virtual machine (hosted wherever you like, possibly even on the CE).
Some of the services can be set up as virtual machines. If you are a bit short on resources, you could let the UI also act as a WN: if you do this, remember to tune the PBS "ideal_load" parameter appropriately for this host.
Setup software area and home directories
The VO software areas will normally be shared via NFS. All the configuration will be dealt with during the configuration step: as usual, if your worker nodes are in a private network, read the
last part of this page.
As far as home directories are concerned, you may want to keep generic GRID accounts separate from local accounts or, more generally, put them somewhere other than the default location (which is
/home). This provides a much cleaner environment, and easily allows you to share the home directories of your generic accounts throughout the site: you just need to NFS-export the base home directory to the whole farm. Remember this point while configuring the site for MPI.
General tips on nodes setup
SELinux
Having SELinux enabled often causes a number of mysterious problems. My advice is to disable it completely by editing the file
/etc/selinux/config, setting the variable
SELINUX=disabled and rebooting. If you really like SELinux, I suggest you disable it during service installation and configuration, and possibly re-enable it after you have verified that the service works as expected.
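The edit can also be scripted. The sketch below rewrites the active SELINUX= line of the file content (leaving SELINUXTYPE= and commented lines alone) and appends one if none exists. It only produces the new text: writing it back to /etc/selinux/config and rebooting are still up to you.

```python
import re

# Matches only an uncommented SELINUX= line; SELINUXTYPE= and '# SELINUX=' do not match
SELINUX_LINE = re.compile(r'^[ \t]*SELINUX[ \t]*=.*$', re.MULTILINE)


def disable_selinux(config_text):
    """Return config_text with the SELINUX setting forced to 'disabled'."""
    if SELINUX_LINE.search(config_text):
        return SELINUX_LINE.sub('SELINUX=disabled', config_text)
    # No SELINUX= line at all: append one
    return config_text.rstrip('\n') + '\nSELINUX=disabled\n'
```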
Hostname resolution
Some services require the hostname to resolve to the IP address of the correct network interface. Suppose your hostname is
mysrv01.some.domain.name with public IP address
123.45.67.89, and that your
/etc/hosts file contains a line like:
127.0.0.1 localhost.localdomain localhost mysrv01.some.domain.name mysrv01
Please change it to:
127.0.0.1 localhost.localdomain localhost
123.45.67.89 mysrv01.some.domain.name mysrv01
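The same change, written as a small Python function for when many hosts need fixing: it removes the host's own names from the 127.0.0.1 line and adds a line mapping them to the public address, unless such a line already exists. This is a sketch; note that it re-joins the fields of the 127.0.0.1 line with single spaces, so any tab alignment on that line is lost.

```python
def fix_hosts(text, fqdn, public_ip):
    """Drop the host's own names from the 127.0.0.1 line and map them to public_ip."""
    short = fqdn.split('.')[0]
    out = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == '127.0.0.1':
            # Keep the localhost aliases, remove the real host names
            line = ' '.join(f for f in fields if f not in (fqdn, short))
        out.append(line)
    if not any(l.split()[:1] == [public_ip] for l in out):
        out.append('%s %s %s' % (public_ip, fqdn, short))
    return '\n'.join(out) + '\n'
```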
Disk partitioning
Limiting ourselves to the installation of worker nodes, there is probably no need to think too much about disk partitioning. Provided you have set up an mrepo repository at your site and you install nodes via kickstart (see below), reinstallation and worker-node configuration take 20 minutes or so.
The present setup at INFN-ROMA3 is as easy as the following:
- swap partition: twice the installed RAM
- /usr partition: 5 GB (2.5 GB actually used)
- all the rest to the / partition
- on nodes with two system disks, these are configured as RAID0. We are talking about worker nodes: if anything breaks, well, too bad for those running jobs. Take out the broken disk, reinstall, and in 30 minutes the node is back in service.
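As a worked example of the scheme above, here is the arithmetic as a small Python function. It assumes that with RAID0 the usable capacity is the sum of the system disks; the figures in the usage line (16 GB of RAM, two 250 GB disks) are hypothetical.

```python
def worker_node_layout(ram_gb, disk_gbs, usr_gb=5):
    """Partition sizes (GB) for the simple worker-node scheme described above.

    disk_gbs lists the system disks; with RAID0 their capacities add up.
    """
    usable = sum(disk_gbs)
    swap = 2 * ram_gb                # swap: twice the installed RAM
    root = usable - swap - usr_gb    # everything else goes to /
    if root <= 0:
        raise ValueError('disks too small for this scheme')
    return {'swap': swap, '/usr': usr_gb, '/': root}


if __name__ == '__main__':
    print(worker_node_layout(16, [250, 250]))
```

With these figures the layout is 32 GB of swap, 5 GB for /usr and 463 GB for /.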
If you are thinking about any fancier partitioning scheme, which could indeed be useful for server installation, note for example that:
- some services install under non-standard paths, like
/opt
- the amount of log files produced, typically under
/var, may be rather large
It could thus be useful (or needed) to first make a test installation of the service, and then possibly adjust the partitioning scheme and reinstall.
Time configuration
Time synchronization is very important when using the GRID. You should configure your machines (servers and worker nodes) to adjust their time automatically using NTP; instructions are provided in the gLite32 guides. Edit the file
/etc/ntp.conf and add at least 2 or 3 servers: it is best to use
local or
national servers, but if you don't know which ones to use you can try
- ntp-1.infn.it, ntp-2.infn.it, ntp-3.infn.it: these are the INFN NTP servers
- ntp1.inrim.it, ntp2.inrim.it: these are other Italian NTP servers
Basic gLite Installation
Installed packages, running services
The operating system installation can be streamlined, as all the software which is really
needed will be installed automatically, as per gLite32 instructions. To give you an idea, here is a
list of packages to remove and a
list of packages to add.
This
list of services to be started/stopped could be a reasonable starting point.
Special note for CREAM-CE installation: as of 2010-05-26, I have found that the InfnGrid procedure for installing the CREAM-CE does not fully work. Please follow the generic gLite3.2 CREAM-CE guide (find the link at the beginning of this page), but also note that:
- you can safely add the yum repositories suggested in such guide, together with those suggested in the INFN guide
- some package dependencies were missed: in my case I had to perform the following actions BEFORE installing the gLite metapackages:
- yum remove log4j
- yum install log4j.x86_64, and then proceed as per instructions (yum install xml-commons-apis, then yum install glite-CREAM, ...)
- also note that in order to configure the
blparser-master I also had to set BLPARSER_UPDATER_NOTIFIER = false (default is true) in the file /opt/glite/yaim/defaults/glite-creamce.pre. The default settings should be OK for CREAM-CE version 1.16 and above, but I had version 1.12 installed.
- when you have finished installing your CREAM-CE, please verify its functionality
VO-specific tuning
If you are going to support several VOs, check the relevant
VO Identity Cards to find out specific requirements like minimum free space on worker nodes, minimum required RAM, list of extra RPM packages, and so on.
Note for AFRICACERT VO: if you intend to support the africacert VO, you need to add the relevant lines to the files pointed to by the USERS_CONF and GROUPS_CONF variables (see below). You will also need to install the Cometa VOMS certificate which, at the time of writing, can be downloaded from
this link.
Note for SAGRID VO: if you intend to support the sagrid VO, you need to add the relevant lines to the files pointed to by the USERS_CONF and GROUPS_CONF variables (see below). You will also need to install the VOMS certificate: detailed instructions are available at the
SAGRID wiki page.
Example: supporting sagrid VO
- go to the CIC portal, select
sagrid in the "VO selection" box and read about the minimum hardware requirements for this VO, as well as the packages it requires
- in your
site-info.def file, find the file where yaim expects to find definitions for generic users (the file pointed to by the USERS_CONF variable) and add the users to it:
39401:sagrid001:3940:sagrid:sagrid::
39402:sagrid002:3940:sagrid:sagrid::
39403:sagrid003:3940:sagrid:sagrid::
39404:sagrid004:3940:sagrid:sagrid::
39405:sagrid005:3940:sagrid:sagrid::
39406:sagrid006:3940:sagrid:sagrid::
39407:sagrid007:3940:sagrid:sagrid::
39408:sagrid008:3940:sagrid:sagrid::
39409:sagrid009:3940:sagrid:sagrid::
39410:sagrid010:3940:sagrid:sagrid::
39411:sagrid011:3940:sagrid:sagrid::
39412:sagrid012:3940:sagrid:sagrid::
39413:sagrid013:3940:sagrid:sagrid::
39414:sagrid014:3940:sagrid:sagrid::
39415:sagrid015:3940:sagrid:sagrid::
39416:sagrid016:3940:sagrid:sagrid::
39417:sagrid017:3940:sagrid:sagrid::
39418:sagrid018:3940:sagrid:sagrid::
39419:sagrid019:3940:sagrid:sagrid::
39420:sagrid020:3940:sagrid:sagrid::
39501:sgmsagrid001:3950,3940:sgmsagrid,sagrid:sagrid:sgm:
39502:sgmsagrid002:3950,3940:sgmsagrid,sagrid:sagrid:sgm:
39503:sgmsagrid003:3950,3940:sgmsagrid,sagrid:sagrid:sgm:
The format of each line is
userID:userAccount:primaryGroupID[,SecondaryGroupID[,TertiaryGroupID...]]:primaryGroupName[,SecondaryGroupName[,TertiaryGroupName...]]:VOname:[VOrole]:
Make sure the chosen userID(s) and groupID(s) do not collide with existing local users and groups. The following commands list the existing users and groups, sorted by userID and groupID respectively:
sort -k3 -t: -g /etc/passwd
sort -k3 -t: -g /etc/group
- in your
site-info.def file, find the file where yaim expects to find definitions for generic groups (the file pointed to by the GROUPS_CONF variable) and add the default role and any other special privileged role:
"/sagrid/ROLE=SoftwareManager":::sgm:
"/sagrid"::::
Kickstart installation
(To be completed)
Installation instructions using PXE/tftpboot can easily be found on the web.
This file contains the combination of packages which are installed on the worker-nodes at INFN-ROMA3. The list includes packages which are needed by the Atlas VO, as well as packages needed to ensure backward compatibility with ScientificLinux4: it is a reasonable starting point for creating your own kickstart.
--
FulvioGaleazzi - 2010-11-17