ARCHER2 notes for UM

These are some tips for transfering from ARCHER to ARCHER 2 (access via PUMA)

information

NCAS CMS have some usefull information on how to modify your suite.

  • LINUX must be set to background (from at)

  • site/archer.rc and ncas_cray might be required to run

Met office password cahce

mosrs-cache-password

Permission errors

rm -f  ~/.ssh/environment.puma
eval `ssh-agent -s`
ssh-add -D
ssh-add ~/.ssh/id_rsa_archerum

weird FCM make permission error… not always ? Sometimes runs fine ?

ARCHER 2

CMS UM on archer 2 pages

  1. Follow on boarding instructions

  2. run example suite

  3. ~/.bash_profile edit on ARCHER 2

  4. edit or add site/archer2.rc type file or site/ncas_cray_ex/suite-adds.rc

    • --chdir=/work/n02/n02/<your ARCHER2 user name> line must be edited

SLURM COMMANDS

  • sinfo information on nodes available

  • squeue -l | grep $USER check if your jobs are running

  • squeue -l | grep PENDING | wc -l how long is the queue

  • squeue -R shortqos check the short queue

Gotchas

  • WALLCLOCK LIMIT in some suites is in minutes not seconds there 60 is 1 hour not 1 min this can be fixed editing bin/setup_metadata

fp.write("WALL_CLOCK_LIMIT=20\n")
fp.write("range=1:172800\n")

and meta/rose-meta.conf swapping range=60:172800 to range=1:172800

  • Node is 8x8 CPUS (actually 128 cores as hyper threaded)

  • Python mismatches - if loading in cray-python must load after other cyclc tasks and unload in after in pre scripts

  • Nesting Suite archer suite had reference to short partition which does not exist short q is accessed by short reservation on standard partition

  • short q limited to 1 running 1 queuing

Archiving

There are various methods for archiving specifical for JASMIN :

suite u-de766 is an example of a nested suite with auto archiving of data

Here an extra rose task has been added to run at the end of each cycle to archive the data to a GWS folder.

coming soon UM13 RA3 nested suite barebones with archer2 era5 fixes, ants and archiving enabled

I/O issues

If having I/O issues consider using the high performance scratch space, this deletes data automatically so not for any lasting storage.

https://docs.archer2.ac.uk/user-guide/data/#solid-state-nvme-file-system-scratch-storage

to rose-suite.conf add the following lines to the top

root-dir{share}=ln*=/mnt/lustre/a2fs-nvme/work/n02/n02/$USER
root-dir{work}=ln*=/mnt/lustre/a2fs-nvme/work/n02/n02/$USER

Module fails

Job fails with error logs not finding modules in old suites

module restore module yields : Lmod has detected the following error:  User module collection: "2020.12.14" does not exist.

module restore is no longer a command. module load um will load um modules

Transitioning to 23-cab from 4 cab (suites pre-2021)

Main files that need altering:

site/ncas-cray-ex/suite-adds.rc

all module restore commands must be modified to something like:

export MODULEPATH=$MODULEPATH:/work/n02/n02/simon/modulefiles
module load um

the CAP9.1 path must be altered

# replace
# export PATH=/work/n02/n02/grenvill/CAP9.1/build/bin:$PATH
# with
export PATH=/work/y07/shared/umshared/CAP9.1/build/bin:$PATH

CAP9 path must also be altered in:

app/install_cold/opt/rose-app-ncas-cray-ex.conf

# replace
# source=/work/n02/n02/grenvill/CAP9.1/build/bin
# with
source=/work/y07/shared/umshared/CAP9.1/build/bin

UM source code mods

Depending on which version of the UM more or less modifications will be required:

CCE fixes

UKCA CC12 fixes

An easy way to find the required mods is to browse Jeff Coles UM code select the version you’re interested in and search the revision log for the “cce12 fixes” in Jeff Coles code. View the revision log and view the changes related the CCE12

e.g. for vn 11.2 the following files need to be modified

atmosphere/AC_assimilation/getobs.F90 (4 diffs)
atmosphere/UKCA/ukca_activ_mod.F90 (1 diff)
atmosphere/UKCA/ukca_calc_drydiam.F90 (2 diffs)
atmosphere/UKCA/ukca_volume_mode.F90 (6 diffs)
atmosphere/boundary_layer/bdy_expl2.F90 (2 diffs)
utility/qxreconf/rcf_h_int_nearest_mod.F90 (2 diffs)

This is highlighted here: UKCA and other Code fixes examples

CCE 15

make the edits suggested by NCAS CMS https://cms.ncas.ac.uk/archer2/os-upgrade/

coming soon nested suite with CCE15 mods

Troubleshooting:

  1. Access issue ssh login.archer2.ac.uk

returns

PTY allocation request failed on channel 0 bash: /home1/z00/puma-access-route/bin/pumawrapper: No such file or  directory

check the path in

/home1/system/puma-access-route/keys/$USER

/home1/z00/ should be /home1/system/