
How to get Sun ClusterTools 8.2.1 to work on MIT’s AFS Athena cluster

March 9, 2010

At the end of the 1990s MIT's Athena cluster was mostly Sparc/Solaris with a small component of MIPS/IRIX boxes. In the 2000s this shifted to x86/x86_64 Linux, with an ever decreasing proportion of Sparc/Solaris machines. The last of them (quad-processor Sparc IIIi based), which used to offer the old "dialup" service, are supposed to go away in the summer of 2010 – a great pity for us, as we run daily MITgcm validation tests on them.

In any case, I had installed Sun ClusterTools 8.2.1 (based on Open MPI 1.3.4; since version 8.2.1c, which is based on Open MPI 1.4.2, it is called the Oracle Message Passing Toolkit) on my Athena account, for both x86/x86_64 Linux and Sparc/Solaris. Since the installation did not go into the default location, I had to use the OPAL_PREFIX environment variable to point to the new installation root directory. Runs within a single node work fine under both operating systems; running across multiple nodes, however, is a bit more complicated. In all cases one should use Kerberos and the kerberized rsh with the right flags to allow connecting to the other nodes for startup (after having obtained tickets first), and AFS creates additional problems when it comes to the remote nodes seeing files served from AFS (more on that below). In the absence of a queuing system one uses a manual hostfile, and to allow for mixed MPI-OpenMP applications the value of the environment variable OMP_NUM_THREADS also needs to be propagated with the -x flag.
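Concretely, the per-session preparation on the launch node boils down to something like the following minimal tcsh sketch (the ClusterTools installation path below is a placeholder for my locker, and the hostfile contents are just an illustration of the plain list-of-machines format Open MPI expects):

# Kerberos tickets and AFS tokens first
kinit
aklog

# point the Open MPI runtime at the non-default installation root
# (the path below is a placeholder, not the actual locker location)
setenv OPAL_PREFIX /mit/13.715/clustertools-8.2.1
setenv PATH ${OPAL_PREFIX}/bin:${PATH}

# threads per MPI task for the mixed MPI-OpenMP runs; propagated later via -x
setenv OMP_NUM_THREADS 2

# the manual hostfile is simply one machine name per line, e.g.
# no-knife.mit.edu
# mass-toolpike.mit.edu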

Under Linux this works fine (even for the mixed case), both with the GNU compilers (as a quick test I used the mixed MPI-OpenMP benchmark HOMB):

no-knife:benchmark/comms/HOMB-1.0% mpirun -report-bindings -wd $PWD -mca plm_rsh_agent "krb5-rsh -x -F" -hostfile /mit/13.715/ompihosts-linux -np 6 -pernode -display-map -x OPAL_PREFIX -x OMP_NUM_THREADS /tmp/homb.ex.gnu -NRC 3072 -NITER 10 -pc -s

 ======================== JOB MAP ========================

 Data for node: Name: contents-vnder-pressvre   Num procs: 1
        Process OMPI jobid: [35507,1] Process rank: 0
 Data for node: Name: no-knife.mit.edu   Num procs: 1
        Process OMPI jobid: [35507,1] Process rank: 1
 Data for node: Name: mass-toolpike   Num procs: 1
        Process OMPI jobid: [35507,1] Process rank: 2
 Data for node: Name: home-on-the-dome   Num procs: 1
        Process OMPI jobid: [35507,1] Process rank: 3
 Data for node: Name: scrubbing-bubbles   Num procs: 1
        Process OMPI jobid: [35507,1] Process rank: 4
 Data for node: Name: all-night-tool   Num procs: 1
        Process OMPI jobid: [35507,1] Process rank: 5

 =============================================================
This rsh session is encrypting input/output data transmissions.
This rsh session is encrypting input/output data transmissions.
This rsh session is encrypting input/output data transmissions.
This rsh session is encrypting input/output data transmissions.
This rsh session is encrypting input/output data transmissions.
Number of Rows: 3072, Number of Columns: 3072, Number of Iterations: 10
Number of Tasks: 6, Number of Threads per Task: 2
Reduction at end of each iteration.
Summary Standard Ouput with Header
#==========================================================================================================#
#  Tasks  Threads      NR      NC   NITER    meanTime     maxTime     minTime   NstdvTime #
#==========================================================================================================#
      6        2    3072    3072      10    0.376302    0.636142    0.151431    0.350631

and with the Sun Compilers for Linux:

no-knife:benchmark/comms/HOMB-1.0% mpirun -report-bindings -wd $PWD -mca plm_rsh_agent "krb5-rsh -x -F" -hostfile /mit/13.715/ompihosts-linux -np 6 -pernode -display-map -x OPAL_PREFIX -x OMP_NUM_THREADS /tmp/homb.ex.sun.linux.m32 -NRC 3072 -NITER 10 -pc -s

 ======================== JOB MAP ========================

 Data for node: Name: contents-vnder-pressvre   Num procs: 1
        Process OMPI jobid: [34033,1] Process rank: 0
 Data for node: Name: no-knife.mit.edu   Num procs: 1
        Process OMPI jobid: [34033,1] Process rank: 1
 Data for node: Name: mass-toolpike   Num procs: 1
        Process OMPI jobid: [34033,1] Process rank: 2
 Data for node: Name: home-on-the-dome   Num procs: 1
        Process OMPI jobid: [34033,1] Process rank: 3
 Data for node: Name: scrubbing-bubbles   Num procs: 1
        Process OMPI jobid: [34033,1] Process rank: 4
 Data for node: Name: all-night-tool   Num procs: 1
        Process OMPI jobid: [34033,1] Process rank: 5

 =============================================================
This rsh session is encrypting input/output data transmissions.
This rsh session is encrypting input/output data transmissions.
This rsh session is encrypting input/output data transmissions.
This rsh session is encrypting input/output data transmissions.
This rsh session is encrypting input/output data transmissions.
Number of Rows: 3072, Number of Columns: 3072, Number of Iterations: 10
Number of Tasks: 6, Number of Threads per Task: 2
Reduction at end of each iteration.
Summary Standard Ouput with Header
#==========================================================================================================#
#  Tasks  Threads      NR      NC   NITER    meanTime     maxTime     minTime   NstdvTime #
#==========================================================================================================#
      6        2    3072    3072      10    0.040981    0.066348    0.015613    0.448715
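For reference, the two Linux executables above came out of the MPI compiler wrappers shipped with ClusterTools. A rough sketch of such a build (the HOMB source file name, the optimization flags and the use of OMPI_CC to point the wrapper at gcc are my assumptions here, not taken from the HOMB or ClusterTools documentation):

# GNU build: OpenMP enabled with -fopenmp, back-end compiler selected via OMPI_CC
env OMPI_CC=gcc mpicc -fopenmp -O2 -o /tmp/homb.ex.gnu homb.c

# Sun Studio build: 32-bit objects with -m32, OpenMP enabled with -xopenmp
mpicc -m32 -xopenmp -xO3 -o /tmp/homb.ex.sun.linux.m32 homb.c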

Unfortunately, under Solaris things get ugly: one also needs to set LD_LIBRARY_PATH for both ClusterTools and the Sun Compilers

setenv LD_LIBRARY_PATH /mit/sunsoft_v12u1/sunstudio12.1/libdynamic/v9:/mit/sunsoft_v12u1/sunstudio12.1/libdynamic:$OPAL_PREFIX/lib/64:$OPAL_PREFIX/lib/32:/mit/gcc-4.0/lib:/mit/13.715/ActiveTcl-8.4/lib:/mit/13.715/ActiveTcl-8.4/lib/tclx8.4

and then specify -prefix as well as propagate LD_LIBRARY_PATH with the -x flag – and still things fail, in both 32-bit mode:

mpirun -prefix $OPAL_PREFIX -report-bindings -wd $PWD -mca plm_rsh_agent "rsh -x -F" -hostfile /mit/13.715/ompihosts-sunos -np 2 -pernode -display-map -x OPAL_PREFIX -x OMP_NUM_THREADS -x LD_LIBRARY_PATH /tmp/homb.ex.sun.sparc.m32 -NRC 2048 -NITER 10 -pc -s

[department-of-alchemy.mit.edu:14767] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 161
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_plm_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[department-of-alchemy.mit.edu:14767] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[department-of-alchemy.mit.edu:14767] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orterun.c at line 541

and in 64-bit mode:

mpirun -prefix $OPAL_PREFIX -report-bindings -wd $PWD -mca plm_rsh_agent "rsh -x -F" -np 2 -display-map -x OPAL_PREFIX -x OMP_NUM_THREADS -x LD_LIBRARY_PATH /tmp/homb.ex.sun.sparc.m64 -NRC 2048 -NITER 10 -pc -s

[department-of-alchemy.mit.edu:14671] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 161
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_plm_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[department-of-alchemy.mit.edu:14671] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[department-of-alchemy.mit.edu:14671] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orterun.c at line 541

The error messages are rather cryptic, but after some effort one discovers that they are caused by AFS: the partner node has trouble accessing files over AFS (including the ones providing the MPI runtime) because the tokens have not been generated/propagated properly – a problem not seen under Athena-Linux. The solution is to log in manually to all partner nodes first (and run aklog there if this is not done automatically for you); a sketch of how this could be scripted is shown below. With tokens in place on every node, one can then successfully use ClusterTools in 32-bit mode on Sparc/Solaris/Athena across multiple nodes.
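A minimal tcsh sketch of that step (assuming the hostfile simply lists the partner machines, and that the kerberized rsh forwards tickets so that aklog can obtain an AFS token on the far side):

foreach node (`cat /mit/13.715/ompihosts-sunos`)
    # touch each partner node once and obtain an AFS token there
    rsh -x -F $node aklog
end

After that, the run below completes: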

mpirun -prefix $OPAL_PREFIX -report-bindings -wd $PWD -mca plm_rsh_agent "rsh -x -F" -hostfile /mit/13.715/ompihosts-sunos -np 2 -display-map -pernode -x OPAL_PREFIX -x OMP_NUM_THREADS -x LD_LIBRARY_PATH /tmp/homb.ex.sun.sparc.m32 -NRC 2048 -NITER 10 -pc -s

 ======================== JOB MAP ========================

 Data for node: Name: department-of-alchemy.mit.edu   Num procs: 1
        Process OMPI jobid: [2763,1] Process rank: 0
 Data for node: Name: biohazard-cafe   Num procs: 1
        Process OMPI jobid: [2763,1] Process rank: 1

 =============================================================
This rsh session is encrypting input/output data transmissions.
Number of Rows: 2048, Number of Columns: 2048, Number of Iterations: 10
Number of Tasks: 2, Number of Threads per Task: 2
Reduction at end of each iteration.
Summary Standard Ouput with Header
#==========================================================================================================#
#  Tasks  Threads      NR      NC   NITER    meanTime     maxTime     minTime   NstdvTime #
#==========================================================================================================#
      2        2    2048    2048      10    0.054501    0.058747    0.052440    0.036289

Trying to do the same in 64-bit mode still runs into trouble because of problematic resolution of the dynamic libraries: I use Sun Studio 12u1, and the remote node does not want to be forced into loading the proper 64-bit libraries.
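For what it's worth, a quick way to see what the remote runtime linker is unhappy about is to run ldd on the 64-bit binary on one of the partner nodes (this assumes the executable has been staged to /tmp on that node, as for the runs above; biohazard-cafe is one of the Solaris machines from the earlier job map):

# list the shared libraries the 64-bit executable resolves (or fails to) on the far side
rsh -x -F biohazard-cafe ldd /tmp/homb.ex.sun.sparc.m64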