Sites installation and setup tips: general considerations

This page refers to the INFNgrid customization, although most of its content may be applicable to the plain EGEE gLite release.

Getting the operating system

The current release of the operating system to be used for gLite 3.2 is ScientificLinux 5.5. A list of mirrors from which to download the ISOs can be found on the ScientificLinux website. Note (as of 2011-02-10): release SL5.6 is already out; I expect it should also work.

It is advised NOT to use the CERN version of ScientificLinux (also known as SLC5.X), as this version is tailored to the CERN environment. For example, it sets tons of environment variables specific to CERN.

Creating a local repository with mrepo

A local repository turns out to be very useful when doing multiple installations in parallel, or when you want your system updates to be protected against network instabilities. If these are your needs, then mrepo is the tool for you.

The mrepo repository will be served by a host which should be configured as an HTTP server.

Download the latest mrepo version from the CERN DAG repository or from the mrepo website: the package is available as an RPM. The last available version at the time of writing is 0.8.7-1. Note that mrepo relies on another package, createrepo, which can also be downloaded from the DAG repository: on my SL5 host I got createrepo version 0.4.11. Most probably, however, your machine already "knows" about the DAG repository: if you see a file /etc/yum.repos.d/dag.repo you can install mrepo more simply with yum --enablerepo=dag install mrepo.

Note for SL4 users: the installed packages for mrepo and createrepo will be older versions than those listed above. You may still manage to get yum groupinstall working by following the advice in the next paragraph.

Note for both SL5 and SL4 users: you will need to apply the following modification to the mrepo script, since the path to the comps.xml file is not composed properly.

*** /usr/bin/mrepo      2010-03-22 01:57:41.000000000 +0100
--- mrepo.modified      2010-07-04 20:58:02.000000000 +0200
***************
*** 845,851 ****
                  repoopts = repoopts + ' --cachedir "%s"' % cachedir
              if os.path.isdir(os.path.join(self.wwwdir, '.olddata')):
                  remove(os.path.join(self.wwwdir, '.olddata'))
!             groupfile = os.path.join(cf.srcdir, self.dist.nick, self.name + '-comps.xml')
              if os.path.isfile(groupfile):
                  symlink(groupfile, os.path.join(self.wwwdir, 'comps.xml'))
                  repoopts = repoopts + ' --groupfile "%s"' % groupfilename
--- 845,853 ----
                  repoopts = repoopts + ' --cachedir "%s"' % cachedir
              if os.path.isdir(os.path.join(self.wwwdir, '.olddata')):
                  remove(os.path.join(self.wwwdir, '.olddata'))
! #            groupfile = os.path.join(cf.srcdir, self.dist.nick, self.name + '-comps.xml')
!             groupfile = os.path.join(cf.srcdir, self.dist.nick, self.name + '/comps.xml')
!             info(2, 'Groupfile: %s' % (groupfile))
              if os.path.isfile(groupfile):
                  symlink(groupfile, os.path.join(self.wwwdir, 'comps.xml'))
                  repoopts = repoopts + ' --groupfile "%s"' % groupfilename
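
If you prefer not to edit /usr/bin/mrepo by hand, you can save the diff above to a file and apply it with the patch utility. A minimal sketch, assuming you saved it as mrepo-comps.patch (the file name is just an example):

# apply the context diff to the installed mrepo script;
# -b keeps a backup copy of the original file
patch -b /usr/bin/mrepo < mrepo-comps.patch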

After installing the package, edit the /etc/mrepo.conf file so that it reads more or less as follows:

[main]
srcdir = /data2/mrepo
# Make wwwdir to point to a directory served by the webserver
wwwdir = /var/www/mrepo
confdir = /etc/mrepo.conf.d
arch = x86_64
metadata=yum repomd
# option -P=4 would be passed to lftp, to allow parallel downloads,
# but it sometimes gives trouble, so it is commented out here:
# lftp-mirror-options = -c -P=4 -X "*/Fermi/*" -X "*/example/*"
lftp-mirror-options = -c -X "*/Fermi/*" -X "*/example/*"
# Option -d is used to make 'yum groupinstall' work: it can be used ONLY from createrepo 0.4.11 onwards.
# If you have an older version of createrepo, comment out the following line
createrepo-options = -d -p

Note that all packages will be downloaded to srcdir, so allow for enough disk space. Also note that updates are regularly added to the repositories: as a bare minimum allow 50 GB, but more would be desirable (say 100 GB).

On the other hand, the path pointed to by wwwdir will mainly contain compressed xml files, plus symbolic links to files under srcdir. The exclusion options (-X ...) on the lftp-mirror-options line are important to avoid endless recursion when downloading from the ScientificLinux website. Configuration files for the individual repositories to be mirrored go under confdir.

Verify that the mrepo installation created a crontab entry under /etc/cron.d like this one:

### Enable this if you want mrepo to daily synchronize
### your distributions and repositories at 2:30am.
30 0 * * * root echo "### Mrepo running on: "`date` >> /var/log/mrepo.log 2>&1 ; /usr/bin/mrepo -ugfv >> /var/log/mrepo.log 2>&1

The first time, though, I advise you to run mrepo interactively.
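
For example, you can launch it with the same options used in the crontab above (roughly: update the mirrors, generate the repository metadata, force regeneration, be verbose) and watch the output directly on your terminal:

/usr/bin/mrepo -ugfv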

Find below a list of useful configuration files for mrepo, to be put under confdir. Read the content of each file and, if needed, tailor it to your needs/taste: if there is something you do not want to mirror, you have two possibilities:

  • edit the configuration file and comment the relevant lines, or
  • if you are not interested in ALL the archives contained in the file, you can more easily rename it. For example, suppose that after the initial mirroring you no longer want to mirror jpackage (since, at the time of writing, its repositories are broken): rather than removing jpackage.conf you can just rename it to jpackage.conf.stop.

At the time of writing, some services still require ScientificLinux4 and gLite31: the relevant files are included in the following archives.

  • mrepo_conf_official.tgz: mrepo configuration files for the official repositories. This file contains the configuration files needed to mirror the most common repositories: DAG, lcgCA, Jpackage, SL49, SL57, gLite. Most probably, if you are creating a national mirror, you will want to mirror all of them.
  • mrepo_conf_localmirror.tgz: mrepo configuration files for national mirror repositories. These files are to be used by institutions willing to mirror from a national mirror. To use them, you just need to replace (in each of them) the name of the server with that of your national mirror. Most probably, your institution will not be interested in mirroring everything (for example, you may just want to mirror the gLite32 material), so I suggest you read each configuration file and comment out lines as needed.

The name of the files should be self-explanatory, however:

  • dag: mirror of the DAG archive (a little tautologically)
  • egi-trustanchors: mirror of the Certification Authorities certificates
  • jpackage: mirror of the JPackage archive, for SL4 only
  • sl49 and sl57: mirror of ScientificLinux 4.9 and 5.7 respectively
  • glite31.linuxsoft and glite32.linuxsoft: mirror of gLite31 and gLite32 repositories for EGEE-like installations
  • glite31.gridit and glite32.gridit: mirror of gLite31 and gLite32 repositories for InfnGrid-like installations
  • ig31 and ig32: mirror of InfnGrid additional repositories for gLite31 and gLite32 respectively

Note that, at the time of writing, the official jpackage repositories are broken. I do not yet have a solution for the periodic mirroring; however, since this repository is for SL4 only, it is not supposed to change much over time. You can try to mirror from GARR (mrepo configuration file in the mrepo_localmirror archive above), but this is not going to work forever. Alternatively, you can follow this recipe ONCE to create your local (static) mirror:

  • download jpackage.conf.stop to /etc/mrepo.conf.d/
  • cd /etc/mrepo.conf.d/ ; mv jpackage.conf.stop jpackage.conf
  • run mrepo interactively: it will crash on some mirrors (don't worry, it's normal), but it will eventually find a working mirror and download the packages (check this in a separate window). At the time of writing the free part contains about 2700 packages, while the non-free part contains about 15.
  • while mrepo is running, check the content of the free/non-free subdirs: if you find that the number returned by "ls -l /RPMS.baseurl/ | wc -l" stays constant for more than 5 minutes, go to the window where mrepo is running and press "Ctrl+C" (you will have to do this once for each working mirror, for both the free and non-free downloads)
  • after mrepo has finished mirroring all the remaining repositories, it will correctly create the repomd.xml files somewhere under [wwwdir]
  • cd /etc/mrepo.conf.d/ ; mv jpackage.conf jpackage.conf.stop so that jpackage is not automatically mirrored

In order to use your local repositories, you will have to prepare yum configuration files that include, alongside the official repositories, your own ones. Please find below some archives containing most/all of the yum files you may possibly want to use:

In order to use these yum configuration files you need to:

  • untar the appropriate archive
  • replace in each file the string MYSITENAME with the name of your national/local mirror: this is done by running the script changeMYSITENAME.csh (contained in each tarfile):
       ./changeMYSITENAME.csh <inputDir> <outputDir> <webserverName>
       
    for example, if you want to prepare yum files for gLite3.2 InfnGrid repositories, making sure they refer to your newly created repository hosted on machine web.server.it, run:
       ./changeMYSITENAME.csh glite32_ig_MYSITENAME glite32_ig_MyWebserver  web.server.it
       
    and you will find the modified files in subdirectory glite32_ig_MyWebserver
  • copy the modified .repo files to /etc/yum.repos.d/
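
Putting the steps together, the whole procedure could look like the following sketch (the archive name is only an example, use the one you actually downloaded; web.server.it stands for your mirror host):

# unpack the archive containing the template yum files (hypothetical archive name)
tar xzf glite32_ig_MYSITENAME.tgz
# rewrite MYSITENAME so that the files point to your mirror
./changeMYSITENAME.csh glite32_ig_MYSITENAME glite32_ig_MyWebserver web.server.it
# install the resulting files on the node you are configuring
cp glite32_ig_MyWebserver/*.repo /etc/yum.repos.d/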

To be very explicit, in case you prefer to do things by hand, or in case you need something else not covered by the above yum archives, here is the philosophy behind the files contained in the archives. Suppose the installation guide tells you that you need to download the yum configuration file ig.repo, which reads:

#
# INFNGRID repositories
#
[ig_sl5_x86_64]
name    = INFNGRID 3.2 x86_64
baseurl = http://grid-it.cnaf.infn.it/mrepo/ig_sl5-x86_64/RPMS.3_2_0/
enabled = 1
protect = 0

[ig_sl5_x86_64_externals]
name    = INFNGRID 3.2 x86_64 (externals)
baseurl = http://grid-it.cnaf.infn.it/mrepo/ig_sl5-x86_64/RPMS.3_2_0_externals/
enabled = 1
protect = 0

If you are mirroring the InfnGrid repository (as per ig32.conf above), you can modify that yum file so that it reads more or less like this:

#
# INFNGRID repositories
#
[ig_sl5_x86_64]
name    = INFNGRID 3.2 x86_64
baseurl = http://10.23.23.11/mrepo/ig_sl5-x86_64/RPMS.3_2_0/ http://grid-it.cnaf.infn.it/mrepo/ig_sl5-x86_64/RPMS.3_2_0/
enabled = 1
protect = 0

[ig_sl5_x86_64_externals]
name    = INFNGRID 3.2 x86_64 (externals)
baseurl = http://10.23.23.11/mrepo/ig_sl5-x86_64/RPMS.3_2_0_externals/ http://grid-it.cnaf.infn.it/mrepo/ig_sl5-x86_64/RPMS.3_2_0_externals/
enabled = 1
protect = 0

As you see, we have added the local repository, which is assumed to be served by server http://10.23.23.11/, to the baseurl (which accepts a blank-separated list of repositories). You can now save your modified file as ig_local.repo, store it in some safe place and copy it to /etc/yum.repos.d/ on each machine or server you are installing (instead of using ig.repo).

Other things you need to plan

Worker nodes on a private network

Having worker nodes on a private network is of course possible. Note however that the automatic configuration gets somewhat confused by this setup, so please read the last part of this page carefully before running ig_yaim configure.

Assuming you want to use private network 192.168.106.x, you will need to:

  • elect one machine to act as a NAT gateway: this can be the CE or the UI, for example, or even a cheap diskless piece of hardware. The NAT machine must have two network interfaces; let's assume eth0 is the public one and eth1 the private one. You need to:
    • edit /etc/sysctl.conf and set net.ipv4.ip_forward = 1, then restart the network
    • configure iptables for masquerading with these commands:
            iptables -t nat -A POSTROUTING -s 192.168.106.0/24 -o eth0 -j MASQUERADE
            iptables-save > /etc/sysconfig/iptables
            chkconfig iptables on
            service iptables restart
            
      where the second command is for ScientificLinux distributions and should make the rule persistent across reboots
  • choose a domain name for your private network, say cluster.garr. Name resolution can be handled by appropriately configuring DNS or, if you have just a few machines, by editing the /etc/hosts file and keeping it in sync between hosts (a sketch is given below).
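
As a sketch, assuming the cluster.garr domain and the 192.168.106.x network mentioned above (host names and addresses are only examples), the /etc/hosts file to be kept in sync on all nodes could contain lines like:

192.168.106.1    ce01.cluster.garr    ce01    # NAT machine, private interface
192.168.106.11   wn001.cluster.garr   wn001
192.168.106.12   wn002.cluster.garr   wn002
192.168.106.21   ui01.cluster.garr    ui01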

Notes for Torque/Maui configuration:

  • your CE will have both a private and a public name: make sure the root user for both names is declared among the "operators" and "managers" within qmgr (see the sketch after this list). If you need to add managers by hand, for example, you will: stop PBS, edit the file /var/spool/pbs/server_priv/acl_svr/managers, start PBS.
  • if Maui fails to start, you will probably need to change the value for SERVERHOST in /var/spool/maui/maui.cfg: most probably you will have to use the public name of the CE, but you can try both and see which one works
  • most probably you will want to configure your UI as a WN, too. It's easy: you just need to give your UI a private address (and a private name, to be declared in /etc/hosts for the whole cluster if you don't want to mess with DNS). Remember that before configuring with yaim you will have to set the hostname to the private one (and revert to the public one after the configuration has been completed). Another thing to remember: verify that your UI has host-based, password-less ssh access to the CE. Test that password-less access works with both the private and the public CE name, and with both the full and the short format: you may have to stop the nscd service to make sure that what is contained in /etc/hosts is what is actually in effect.
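
For reference, here is a sketch of how the managers/operators lists can be inspected and extended directly with qmgr (the host names are placeholders, use the public and private names of your CE); alternatively, stop PBS and edit the acl files as described above:

# show the current server configuration, including managers and operators
qmgr -c "print server"
# declare root for both the public and the private name of the CE
qmgr -c "set server managers += root@ce01.some.domain.name"
qmgr -c "set server managers += root@ce01.cluster.garr"
qmgr -c "set server operators += root@ce01.some.domain.name"
qmgr -c "set server operators += root@ce01.cluster.garr"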

Decide which services you want to deploy

As a bare minimum you should have a computing element (CE), a site BDII, a storage element (SE), some worker-nodes (WN) and a user interface (UI).

Note that starting from gLite 3.2 update 16, released 2010-08-04, the site BDII cannot be installed on the same host as the CE. Since the site BDII is a "light" service, I suggest you set it up as a virtual machine (hosted wherever you please, maybe even on the CE itself).

Some of the services can be set up as virtual machines. If you are a bit short on resources, you could define the UI to also act as a WN: if you do this, remember to tune the PBS "ideal_load" parameter appropriately for this host.
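
For reference, ideal_load is a pbs_mom parameter set on the node itself. A minimal sketch, assuming Torque keeps its mom configuration under /var/spool/pbs/mom_priv/config and that the host has 4 job slots (the values are only examples); restart the pbs_mom service after changing the file:

# /var/spool/pbs/mom_priv/config
$ideal_load 3.5
$max_load   4.5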

Setup software area and home directories

The VO software areas will normally be shared via NFS. All the configuration will be dealt with during the configuration step: as usual, if your worker nodes are in a private network, read the last part of this page.

As far as home directories are concerned, you may want to keep generic GRID accounts separate from local accounts. Or, more generally, you may want to have them in a different place than the default (which is /home). This gives you a much cleaner environment, and easily allows you to have shared home directories for your generic accounts throughout the site: you just need to NFS-export the base home directory to the whole farm and you are set. Keep this point in mind when configuring the site for MPI.
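
As an illustration only (the paths are hypothetical, the export options depend on your setup, and the network should be replaced by the one your worker nodes actually live on), the NFS server hosting the software area and the generic-account home directories could have /etc/exports lines like the following; run exportfs -ra after editing the file:

# VO software area and grid home directories, exported to the farm
/opt/exp_soft   192.168.106.0/24(rw,sync)
/gridhome       192.168.106.0/24(rw,sync)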

General tips on nodes setup

SElinux

Having SELinux enabled often causes a number of mysterious problems. My advice is to disable it completely, by editing the file /etc/selinux/config, setting SELINUX=disabled and rebooting. If you really like SELinux, I suggest you disable it during service installation and configuration, and maybe re-enable it after you have verified that the service works as expected.
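
In practice, the relevant part of /etc/selinux/config should end up looking like the snippet below (you can check the currently active mode with the getenforce command):

# /etc/selinux/config
SELINUX=disabled
SELINUXTYPE=targeted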

Hostname resolution

Some services require the hostname to be properly matched to the correct ethernet adapter. Suppose your hostname is mysrv01.some.domain.name with public IP address 123.45.67.89; you may have in your /etc/hosts file a line like:

127.0.0.1  localhost.localdomain  localhost  mysrv01.some.domain.name  mysrv01

Please change it to:

127.0.0.1    localhost.localdomain  localhost
123.45.67.89    mysrv01.some.domain.name  mysrv01
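
A quick way to verify that the hostname now resolves to the public address (and not to 127.0.0.1) is:

hostname -f
getent hosts mysrv01.some.domain.name
# the second command should print 123.45.67.89, not 127.0.0.1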

Disk partitioning

Limiting ourselves to the installation of worker-nodes, there is probably no need to think too much about disk partitioning. Provided you have set up an mrepo repository at your site and you install nodes via kickstart (read below), reinstallation and worker-node configuration is a matter of 20 minutes or so.

The present setup at INFN-ROMA3 is as easy as the following:

  • swap partition, twice the size of the installed RAM
  • /usr partition: 5 GB (2.5 GB actually used)
  • all the rest to / partition
  • on nodes with two system disks, these are configured as RAID0. We are talking about worker nodes: if anything breaks, well, too bad for the running jobs; take out the broken disk, reinstall, and in 30 minutes the node is back in service.

If you are thinking about any fancier partitioning scheme, which could indeed be useful for server installation, note for example that:

  • some services install under non-standard paths, like /opt
  • the amount of logfiles being produced, typically under /var, may be rather large

It could thus be useful (or needed) to first make a test installation of the service, and then possibly adjust the partitioning scheme and reinstall.

Time configuration

Time synchronization is very important when using the GRID. You should configure your machines (servers and worker-nodes) to adjust the time automatically using NTP. Instructions are provided in the gLite32 guides. You should edit the file /etc/ntp.conf and add at least 2 or 3 servers: it is best to use local or national servers, but if you don't know which ones to use you can try

  • ntp-1.infn.it, ntp-2.infn.it, ntp-3.infn.it: these are the INFN NTP servers
  • ntp1.inrim.it, ntp2.inrim.it: these are other Italian NTP servers
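
As a sketch, the lines to add to /etc/ntp.conf using the INFN servers listed above would be the following (keep the rest of the distribution's default file); then enable and restart the daemon with "chkconfig ntpd on; service ntpd restart" and check the peers with "ntpq -p":

server ntp-1.infn.it
server ntp-2.infn.it
server ntp-3.infn.it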

Basic gLite Installation

Installed packages, running services

The operating system installation can be streamlined, as all the software which is really needed will be installed automatically, as per gLite32 instructions. To give you an idea, here is a list of packages to remove and a list of packages to add.

This list of services to be started/stopped could be a reasonable starting point.

Special note for CREAM-CE installation: as of 2010-05-26, I have found that the InfnGrid procedure for installing the CREAM-CE is not fully working. Please follow the generic gLite3.2 CREAM-CE guide (see the link at the beginning of this page), but also note that:

  • you can safely add the yum repositories suggested in such guide, together with those suggested in the INFN guide
  • some package dependencies were missed; in my case I had to perform the following actions BEFORE installing the gLite metapackages:
    • yum remove log4j
    • yum install log4j.x86_64 and then proceed as per the instructions (yum install xml-commons-apis, then yum install glite-CREAM, ...)
  • also note that in order to configure the blparser-master I also had to set BLPARSER_UPDATER_NOTIFIER = false (the default is true) in the file /opt/glite/yaim/defaults/glite-creamce.pre. The default setting should be OK for CREAM-CE version 1.16 and above, but I got version 1.12
  • when you have finished installing your CREAM-CE, please verify its functionality

VO-specific tuning

If you are going to support several VOs, check the relevant VO Identity Cards to find out specific requirements like minimum free space on worker nodes, minimum required RAM, list of extra RPM packages, and so on.

Note for AFRICACERT VO: if you intend to support the africacert VO, you need to have the relevant lines in the files pointed to by the USERS_CONF and GROUPS_CONF variables (see below). You will also need to install the Cometa VOMS certificate which, at the time of writing, can be downloaded from this link.

Note for SAGRID VO: if you intend to support the sagrid VO, you need to have the relevant lines in the files pointed to by the USERS_CONF and GROUPS_CONF variables (see below). You will also need to install the VOMS certificate: detailed instructions are available on the SAGRID wiki page

Example: supporting sagrid VO

  • go to the CIC portal, select sagrid in the "VO selection" box and read about the minimum hardware requirements for this VO, as well as the packages required by it
  • look into your site-info.def file to find the file where yaim expects the definitions for generic users (the file pointed to by the USERS_CONF variable) and add the users:
       39401:sagrid001:3940:sagrid:sagrid::
       39402:sagrid002:3940:sagrid:sagrid::
       39403:sagrid003:3940:sagrid:sagrid::
       39404:sagrid004:3940:sagrid:sagrid::
       39405:sagrid005:3940:sagrid:sagrid::
       39406:sagrid006:3940:sagrid:sagrid::
       39407:sagrid007:3940:sagrid:sagrid::
       39408:sagrid008:3940:sagrid:sagrid::
       39409:sagrid009:3940:sagrid:sagrid::
       39410:sagrid010:3940:sagrid:sagrid::
       39411:sagrid011:3940:sagrid:sagrid::
       39412:sagrid012:3940:sagrid:sagrid::
       39413:sagrid013:3940:sagrid:sagrid::
       39414:sagrid014:3940:sagrid:sagrid::
       39415:sagrid015:3940:sagrid:sagrid::
       39416:sagrid016:3940:sagrid:sagrid::
       39417:sagrid017:3940:sagrid:sagrid::
       39418:sagrid018:3940:sagrid:sagrid::
       39419:sagrid019:3940:sagrid:sagrid::
       39420:sagrid020:3940:sagrid:sagrid::
       39501:sgmsagrid001:3950,3940:sgmsagrid,sagrid:sagrid:sgm:
       39502:sgmsagrid002:3950,3940:sgmsagrid,sagrid:sagrid:sgm:
       39503:sgmsagrid003:3950,3940:sgmsagrid,sagrid:sagrid:sgm:
       
    The format of each line is
       userID:userAccount:primaryGroupID[,SecondaryGroupID[,TertiaryGroupID...]]:primaryGroupName[,SecondaryGroupName[,TertiaryGroupName...]]:VOname:[VOrole]:
       
    Make sure the chosen userID(s) and groupID(s) do not collide with existing local users. The following commands return the lists of already existing users and groups sorted by userID and groupID, respectively:
       sort -k3 -t: -g /etc/passwd
       sort -k3 -t: -g /etc/group
       
  • look into your site-info.def file to find the file where yaim expects the definitions for generic groups (the file pointed to by the GROUPS_CONF variable) and add the default role and any other special privileged role:
       "/sagrid/ROLE=SoftwareManager":::sgm:
       "/sagrid"::::
       

Kickstart installation (to be completed)

Installation instructions using PXE/tftpboot can easily be found on the web.

This file contains the combination of packages installed on the worker-nodes at INFN-ROMA3. The list includes packages needed by the Atlas VO, as well as packages needed to ensure backward compatibility with ScientificLinux4: it is a reasonable starting point for creating your own kickstart.
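
As a purely illustrative starting point, the fragment of the kickstart file that points the installation to your local mrepo mirror could look like the sketch below. The URLs are hypothetical and must match the actual layout of your mirror (here assumed to be served by web.server.it, with the InfnGrid path used in the yum examples above); the repo directive requires an anaconda version that supports it.

# network installation from the local SL5 mirror (example URL)
install
url --url http://web.server.it/mrepo/sl5-x86_64/disc1/
# extra repository made available already at installation time (example URL)
repo --name=ig_sl5_x86_64 --baseurl=http://web.server.it/mrepo/ig_sl5-x86_64/RPMS.3_2_0/

%packages
@ Base
ntp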

-- FulvioGaleazzi - 2010-11-17
