====== Allegro Computer Environment ======

===== Allegro Computers =====

==== Desktop Machines ====

At Allegro we have six compute servers that are dedicated to high-performance work with ALMA data:

^ name        ^ processor                 ^ speed  ^ memory ^ alias ^
| tulor       | 40x Intel Xeon E5-2640 v4 | 2.4GHz | 512GiB | TU    |
| helada      | 32x Intel Xeon E5-2640 v3 | 2.6GHz | 512GiB | HE    |
| chaxa       | 32x Intel Xeon E5 2665    | 2.4GHz | 256GiB | CX    |
| cejar       | 16x Intel Xeon E5 2650    | 2.0GHz |  32GiB | CE    |
| tebinquiche | 12x Intel Xeon E5645      | 2.4GHz |  48GiB | TQ    |
| miscanti    | 48x Intel Xeon Gold 6226  | 2.7GHz | 512GiB | MI    |

All of these machines except tebinquiche run a stable Linux operating system, **Red Hat Enterprise Linux Server release 7.9 (Maipo)**. tebinquiche still runs release 6.10 (Santiago) and is kept for legacy purposes (e.g., CASA version 4).

Logging into any of these computers works through, e.g.:

  $ ssh -X username@chaxa

From outside the Leiden Observatory network these computers are not directly accessible, except for miscanti (miscanti.strw.leidenuniv.nl). To reach any other computer from outside, first ssh to miscanti and from there to one of the other Allegro computers.

==== Filesystem ====

The data at Allegro is mainly stored on six large [[https://en.wikipedia.org/wiki/Network_File_System|NFS]] file systems called ''/allegro1'' through ''/allegro6''. These storages are accessible from all Allegro compute servers through a dedicated high-speed network. From other Linux computers at Leiden Observatory they are accessible, at reduced speed, via ''/net/chaxa/allegro1'' through ''/net/chaxa/allegro6''.

Note that the ''/allegro'' storages have different sizes and purposes, so please consult the Allegro staff before using one of them.

You may come across references to [[http://en.wikipedia.org/wiki/Lustre_%28file_system%29|Lustre]] file systems at Allegro. These are the previous storages, which have since been replaced. For backward compatibility, the ''/lustre2'' storage name is still in use, but it is in fact a virtual storage that physically lives on the ''/allegro2'' and ''/allegro3'' storages.

Although your files on the Leiden Observatory home area are backed up on a daily basis, there is **no backup scheme** in place for your files on the Allegro storages. You are therefore advised NOT to store single irreplaceable files on these file systems. Allegro-provided software and the data archive are replicated on all file systems, but actual data is not.

===== Directory Structure =====

The root directory for all relevant files in the work with ALMA data is ''%%<filesystem>/allegro%%'', where ''%%<filesystem>%%'' can be any of the file systems described above. The root directory contains the following directories:

  * ''%%allegro_staff%%'': only accessible for Allegro members
  * ''%%bin%%'': binaries/executables
  * ''%%data%%'': ALMA data
  * ''%%projects%%'': projects
  * ''%%public_data_archive%%'': public data that is accessible for everyone
  * ''%%doc%%'': documentation (e.g., ALMA handbooks)
  * ''%%etc%%'': startup scripts, etc.
  * ''%%home%%'': each Allegro user has a subdirectory under this with their username. Do not be deceived by the name: this is NOT the same as the normal Sterrewacht home area. We mainly use this directory as a place to put links to your data.
  * ''%%lib%%'': libraries (e.g., python modules)
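For orientation, the layout above can be verified with a quick directory listing. A minimal sketch, using ''/allegro1'' as an example and assuming the ''%%<filesystem>/allegro%%'' root pattern described above:

  $ ls /allegro1/allegro
  allegro_staff  bin  data  doc  etc  home  lib  projects  public_data_archive

  # The same storage, at reduced speed, from a non-Allegro Leiden Observatory machine:
  $ ls /net/chaxa/allegro1/allegro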
===== Working on Allegro Computers =====

==== Environment Setup ====

Before using the Allegro computers, it is advised to set up your system to get full access to the system-wide installation of python modules, binaries, etc.

=== Setup scripts ===

To facilitate work on the Allegro computers, we provide a script that sets up all the environment variables (most importantly the environment variable ALLEGRO) needed to use the software, find programs, etc. It also displays important information about changes in the system, scheduled reboots, etc., when you log into any of the Allegro computers. If you do not source this script, you will not be able to find the Allegro software and tools.

Rather than sourcing this script manually, it is advised to add a small statement to your shell rc-file. The following lines check which computer you are currently logged into and source the startup scripts if needed:

  * **Bash** users should add these lines to their ''%%~/.bashrc%%'':

  lustreroot_file='/home/alma/etc/lustre_root'
  if [[ -r $lustreroot_file ]]; then
      lustreroot=`cat $lustreroot_file`
  else
      lustreroot='/lustre1'
  fi
  alg_user_setup=$lustreroot/allegro/bin/bashrc_user.sh
  if [[ -r $alg_user_setup ]]; then
      . $alg_user_setup
  fi

  * **C-Shell** (i.e., tcsh) users should add these lines to their ''%%~/.cshrc%%'':

  set lustre_root_file="/home/alma/etc/lustre_root"
  if ( -r $lustre_root_file ) then
      set lustreroot=`cat $lustre_root_file`
  else
      set lustreroot="/lustre1"
  endif
  set alg_user_setup=$lustreroot/allegro/bin/cshrc_user.csh
  if ( -r $alg_user_setup ) then
      source $alg_user_setup
  endif

If in doubt about which type of shell you are using, type:

  $ echo $SHELL

Note that by sourcing these scripts, your default umask becomes ''%%umask 002%%''. This means that by default the group gets the same permissions as you, and others do not get write permission.

If you do not source the rc-files, you might miss important information about changes in the system or scheduled reboots, which are otherwise displayed in the console every time you log in. If you do not wish to source the rc-files but would still like to see the system messages, please run ''%%display_messages.sh%%''. If you choose not to display the system messages, we do not take any responsibility for eventual data loss due to a reboot.

=== CASA setup ===

#HUIB: These changes to init.py etc should be done by Huib not the user.

**#HUIB: NEEDS UPDATE: see https://trello.com/c/zxQMCTST/100-allegro-users-guide**

To access the user-provided CASA tasks and other modules when you work on the Allegro computers, simply add the following lines to your ''%%~/.casa/init.py%%'':

  import os
  import socket

  # This seems to be the most robust/portable way to obtain the (short) hostname.
  hostName = socket.gethostname().split('.')[0]
  available_hosts = open('/home/alma/etc/available_hosts').read().split('\n')[:-1]
  if hostName in available_hosts:
      allegro = os.getenv('ALLEGRO')
      exec(open('%s/etc/casa/init.py' % allegro).read())

**Note:** This will only work if (as described in the //Setup scripts// section above) you have sourced the rc-file that sets the environment variable ''%%ALLEGRO%%'', which points to the Allegro root directory.

==== Running CASA ====

The Sterrewacht IT department provides a version of CASA on all machines, and this is what you will see if you type ''%%casapy%%'' on Allegro machines too, //if you have not set up your environment the way we recommend above//. There is nothing wrong with doing that if you wish to, but Allegro does not specifically support that installation of CASA.
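Whether your shell currently picks up the Sterrewacht-wide or the Allegro-provided CASA is easy to check by looking at how the command resolves. A quick sketch (the exact path differs per machine):

  # bash:
  $ type casapy
  # tcsh:
  > which casapy

If the reported path points into the Allegro software tree (under ''$ALLEGRO_LOCAL_BIN'', see below), you are using the recommended Allegro setup; otherwise you are using the Sterrewacht-wide installation.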
We provide and recommend our own versions, which is what ''%%casapy%%'' will start if you have followed the setup instructions above. Allegro maintains several versions of CASA, compiled for both Red Hat 6 (only on tebinquiche) and Red Hat 7 (all our other machines). You can find out which versions are present, and the shorthand commands to invoke them, by typing on any Allegro machine:

  ls -l $ALLEGRO_LOCAL_BIN

For example, to run CASA 5.4.0 in the Allegro environment, type:

  casapy-54

at the command line (on all machines except tebinquiche, which does not have it). Typing simply:

  casapy

will start the Allegro default version. NOTE, however, that this is not necessarily the latest version of CASA.

**#HUIB: ADD CASA VERSIONS ON DIFFERENT COMPUTERS: see https://trello.com/c/L0BWiAf6/101-casa-versions-on-various-computers**

==== ALMA Projects ====

The idea of a project is to have a workspace for one or more users where they can collect all necessary information to work on one or more ALMA datasets.

=== Categories ===

There are two kinds of projects for users:

  * **PI Project**: In this kind of project, Allegro supports the PI from the creation of the scheduling blocks up to the point where the data is delivered. The project ID must be the ALMA ID.
  * **Open Project**: This category is dedicated to non-PI users who visit Allegro to work on ALMA data. That can be either archival data or proprietary data that the user brings along. The project ID can be chosen arbitrarily.

=== ID ===

Each project has a unique ID, the so-called //project ID//. It is a combination of the project category and the ALMA ID: ''%%projectID = category + '_' + almaID%%''. The categories are ''%%pi%%'' and ''%%open%%'' for the two project types.

=== Project Access ===

The root directory for projects is ''%%$ALLEGRO/data/projects%%''. Its contents cannot be read by users, which is part of the security-by-obscurity data protection scheme. The root directory contains one subdirectory for each project. The project directory name is an 8-character alphanumeric random string, ensuring that no user can see which other projects are currently active at Allegro.

A project is only accessible to users that are linked to it. Linking a user to a project means that we guide the user through the obscurity layer. This is achieved by setting a symbolic link to the project directory (with its 8-character alphanumeric code) in an area that can //only// be accessed by that user. For this we use a directory under ''%%$ALLEGRO/home%%''. If we grant a user (e.g., with the guest user name ''%%allegro5%%'') access to a project, a directory is created with special [[https://en.wikipedia.org/wiki/Access_control_list|Access Control List (ACL)]] permissions:

  $ ls -lah $ALLEGRO/home
  ...
  drwxrws---+  2 alma allegro 4096 Aug 19 10:26 allegro5
  ...

This directory can only be accessed by Allegro members and the user ''%%allegro5%%''. In this directory, the user will find a soft link named after the project ID that leads them directly into the desired project:

  $ ls -lah $ALLEGRO/home/allegro5
  ...
  lrwxrwxrwx 1 alma allegro 38 Jul  1 13:50 pi_2013.1.12345.S -> /lustre1/allegro/data/projects/5Hnd6mtE
  ...
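The trailing ''+'' in the permission string ''drwxrws---+'' above indicates that additional ACL entries are set. If you are curious, they can be inspected with the standard ''getfacl'' tool; a minimal sketch, using the guest user ''allegro5'' from the example above:

  $ getfacl $ALLEGRO/home/allegro5

The output should contain an entry along the lines of ''user:allegro5:rwx'', i.e., only that user (besides the Allegro staff) is granted access.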
=== Project Directory Structure ===

{{:allegrouserguide:directory_structure.png|Example Directory Structure of a PI project.}}

**Figure 2: Example directory structure of a PI project.**

Every project contains at least two subdirectories:

  * ''%%analysis%%'': This directory contains subdirectories for every user that is part of the project. **This is the dedicated workspace in which to perform your work.** All data (including the delivered data that is usually stored in the ''%%archive%%'' directory) should be copied into this directory.
  * ''%%archive%%'': This is where the ALMA data is stored. This directory is only writeable by the user ''%%alma%%'', but readable by all users. Data in the ''%%archive%%'' directory should never be changed; it should always be copied into personal directories for manipulation.

Depending on the project type, more subdirectories can exist (''%%sb%%'' and ''%%proposal%%'' for PI projects).

==== non-ALMA Projects ====

**#HUIB: UPDATE /lustre1/ PATH**

You can also use the Allegro computing facilities for other, non-ALMA related work. Your dedicated workspace for that is ''%%/lustre1/username%%'' (feel free to create a directory with your username there). However, the general terms of using the Allegro computing facility apply.

**Note:** ALMA-related work takes precedence over non-ALMA projects.

=== High Performance Computing ===

Allegro's computers offer great capacity for high-performance computing. As a rule of thumb, other users should be informed if you need more than 25% of the available cores (i.e., more than 8 cores on CX, or more than 3 cores on TQ). Computing time can be reserved with:

  $> reserve_cpu_time.sh

This opens a text file with some explanations of how to "reserve" computing time. If high-performance computing time is reserved in the current timeslot, this is shown by:

  $> display_cpu_reservations.py

  == High-performance computing on chaxa ==
  username        from            to              purpose
  -------------------------------------------------------------
  alma            26-08-2014      26-08-2014      modelling
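Before starting a heavy job, it is also good practice to check how busy the machine already is. A minimal sketch using standard Linux tools (no Allegro-specific commands involved):

  $ nproc                 # number of cores on this machine
  $ uptime                # load averages; compare against the core count
  $ top -b -n 1 | head    # snapshot of the busiest processes

If the load average is already close to the number of cores, consider switching to another machine or reserving a timeslot as described above.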