changeset 142:62e974ac1e4d
Eilmer3 user guide and sphinx docs: moved cluster computer notes to sphinx.
author:    Peter Jacobs <peterj@mech.uq.edu.au>
date:      Tue, 27 Mar 2012 22:23:17 +1000
parents:   f8b43ee9c37c
children:  465ba93a7655
files:     doc/sphinx/eilmer3.rst doc/sphinx/getting-started.rst examples/eilmer3/user-guide/eilmer3-user-guide.tex
diffstat:  3 files changed, 93 insertions(+), 90 deletions(-)
--- a/doc/sphinx/eilmer3.rst  Mon Mar 26 22:08:39 2012 +1000
+++ b/doc/sphinx/eilmer3.rst  Tue Mar 27 22:23:17 2012 +1000
@@ -61,6 +61,88 @@
 use the same.
 If not, use whatever hierarchy you like.
 
+Building and running on the Barrine cluster at UQ
+-------------------------------------------------
+The details of running simulations on any cluster computer will be specific
+to the local configuration.
+The Barrine cluster is run by the High-Performance Computing Unit at The University of Queensland
+and is a much larger machine, with a little over 3000 cores, running SUSE Enterprise Linux.
+
+* Set up your environment by adding the following lines to your .bashrc file::
+
+    module load python
+    module load intel-cc-11
+    module load intel-mpi/3.2.2.006
+    export PATH=${PATH}:${HOME}/e3bin
+    export LUA_PATH=${HOME}/e3bin/?.lua
+    export LUA_CPATH=${HOME}/e3bin/?.so
+
+  Note that we load a specific version of the MPI module.
+
+* Get yourself an interactive shell on a compute node so that you don't hammer the login node
+  while compiling. You won't make friends if you keep the login node excessively busy::
+
+    $ qsub -I -A uq-Jacobs
+
+* To compile the MPI version of the code, use the command::
+
+    $ make TARGET=for_intel_mpi install
+
+  from the cfcfd2/app/eilmer3/build/ directory.
+
+* Optionally, clean up after the build::
+
+    $ make clean
+
+* To submit a job to PBS-Pro, which is the batch queue system on Barrine,
+  use the command::
+
+    $ qsub script_name.sh
+
+* An example of a shell script prepared for running on the Barrine cluster::
+
+    #!/bin/bash -l
+    #PBS -S /bin/bash
+    #PBS -N lehr
+    #PBS -q workq
+    #PBS -l select=3:ncpus=8:NodeType=medium:mpiprocs=8 -A uq-Jacobs
+    #PBS -l walltime=6:00:00
+    # Incantations to get bash to behave and the Intel MPI bits in place.
+    . /usr/share/modules/init/bash
+    module load intel-mpi/3.2.2.006
+    echo "Where are my nodes?"
+    echo $PBS_NODEFILE
+    cat $PBS_NODEFILE
+    echo "-------------------------------------------"
+    echo "Begin MPI job..."
+    date
+    cd $PBS_O_WORKDIR
+    mpirun -np 24 $HOME/e3bin/e3mpi.exe --job=lehr --run --max-wall-clock=20000 > LOGFILE
+    echo "End MPI job."
+    date
+    # As we leave the job, make sure that we leave no processes behind.
+    # (The following incantation is from Gerald Hartig.)
+    for i in $(cat $PBS_NODEFILE | grep -v `hostname` | sort -u); do
+        ssh $i pkill -u `whoami`
+    done
+    killall -u `whoami` e3mpi.exe
+
+  This script is provided as examples/eilmer3/2D/lehr-479/run_simulation.sh.
+
+  Here, we ask for 3 nodes with 8 cores each for a set of 24 MPI tasks.
+  The medium nodes have 8 cores available, and we ask for all of them so that we are reasonably sure
+  that our job will not be in competition with another job on the same nodes.
+  Note the -A accounting option.
+  You will have to use an appropriate group name;
+  you can determine which groups you are part of with the "groups" command.
+  Unlike SGE on Blackhole, we seem to need to change to the working directory before running the
+  simulation code.
+  Finally, we have redirected the standard output from the main simulation to the file LOGFILE
+  so that we can monitor progress with the command::
+
+    $ tail -f LOGFILE
+
+
 When things go wrong
 --------------------
 Eilmer3 is a complex piece of software,
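Once a script like the one above has been submitted, the usual monitor-and-control cycle on a PBS-Pro system is sketched below. This is only an illustrative sequence: the job-id shown is made up, and the exact qstat output and queue names depend on the local PBS-Pro configuration::

    $ qsub run_simulation.sh    # PBS prints the job-id, e.g. 123456.barrine
    $ qstat -u $USER            # check whether the job is queued (Q) or running (R)
    $ tail -f LOGFILE           # follow the simulation output once the job starts
    $ qdel 123456               # remove the job from the queue if something has gone wrong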
--- a/doc/sphinx/getting-started.rst  Mon Mar 26 22:08:39 2012 +1000
+++ b/doc/sphinx/getting-started.rst  Tue Mar 27 22:23:17 2012 +1000
@@ -107,6 +107,8 @@
 #. tk
 #. bwidget
 #. gnuplot
+#. tcl-dev (if you want to build IMOC)
+#. maxima (to run the Method-of-Manufactured-Solutions test case for Eilmer3)
 
 Using the codes on MS-Windows
 -----------------------------
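The two packages added to the prerequisite list above can be pulled in with the distribution's package manager. A minimal sketch, assuming a Debian/Ubuntu host; package names may differ elsewhere (for example, tcl-devel on SUSE or Red Hat)::

    $ sudo apt-get install tcl-dev maxima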
--- a/examples/eilmer3/user-guide/eilmer3-user-guide.tex  Mon Mar 26 22:08:39 2012 +1000
+++ b/examples/eilmer3/user-guide/eilmer3-user-guide.tex  Tue Mar 27 22:23:17 2012 +1000
@@ -327,14 +327,17 @@
 \subsection{Running the simulation in parallel (e3mpi.exe)}
 %
-One can build and run the distributed-memory version of the program, \texttt{e3mpi.exe}, on computers with
-the MPI (Message Passing Interface) library\footnote{See, for example, http://www.open-mpi.org/.} and runtime environment.
-The notes in Appendix\,\ref{getting-started-file} show how to build the Eilmer3 executable for OpenMPI. To run
-Eilmer3 across multiple processors on a local machine use the following command\\
+One can build and run the distributed-memory version of the program, \texttt{e3mpi.exe},
+on computers with
+the MPI (Message Passing Interface) library\footnote{See, for example, http://www.open-mpi.org/.}
+and runtime environment.
+The notes in Appendix\,\ref{getting-started-file} show how to build and run
+the Eilmer3 executable for OpenMPI.
+These notes are also available in HTML form at the URL
+\texttt{http://www.mech.uq.edu.au/cfcfd/eilmer3.html}.
+To run Eilmer3 across multiple processors on a local machine use the following command\\
 \texttt{mpirun -np \textit{n} e3mpi.exe --job=name --run}\\
 where \textit{n} is the number of processors to use.
-There are also some notes in Appendix\,\ref{blackhole-notes-sec} on batch commands and
-job output files for the Blackhole and Barrine clusters at UQ.
 
 \subsection{Restarting a simulation}\index{restarting a simulation}
 %
@@ -2315,90 +2318,6 @@
 %\input{../../../lib/gas_models2/tex/scriptnoneq}
 \cleardoublepage
 
-\section{Notes on running MPI jobs on cluster computers}
-\label{blackhole-notes-sec}
-The details of running simulations on any cluster computer will be specific to the local
-configuration.
-The Blackhole cluster computer belongs to the Hypersonics Group at the University of Queensland
-and is a SUN Rack computer consisting of about 66 nodes with dual AMD Opteron processors.
-The Barrine cluster is run by the High-Performance Computing Unit at The University of Queensland
-and is a much larger machine, with a little over 3000 cores, running SUSE Enterprise Linux.
-%
-\subsection{The Blackhole (SUN Rack) cluster}
-%
-\begin{itemize}
-  \item Set the environment up by customizing your \verb .bash_profile ~ file.\\
-    \topbarshort
-    \lstinputlisting[language={}]{./blackhole_bash_profile.sh}
-    \bottombarshort
-  \item Get the source code tree onto Blackhole using the \texttt{rsync} program
-    or whatever you find convenient to use.\\
-    \texttt{rsync -av triton:cfcfd2 .}
-  \item To compile the MPI version of the code, use the command:\\
-    \texttt{make TARGET=for\_openmpi install}\\
-    from the \texttt{cfcfd2/app/eilmer3/build/} directory.
-  \item Optionally, clean up after the build.\\
-    \texttt{make clean}
-  \item To submit a job to Sun Grid Engine (SGE), which is the batch queue system on Blackhole,
-    use the command:\\
-    \texttt{qsub} \textit{script\_name.sh}
-\clearpage
-  \item An example of a shell script prepared for running on the Blackhole cluster.\\
-    \topbarshort
-    \lstinputlisting[language={}]{../3D/finite-cylinder/thermal-eq/run_simulation.sh}
-    \bottombarshort
-  \item When running a job through the batch queue system, the job is identified by
    a number \textit{nnnnn}.
-    Output that would have gone to the console (as standard output) for an interactive job
-    will be collected in the file \textit{job.onnnnn}.
-    Error messages will accumulate in the file \textit{job.ennnnn}.
-  \item To see how the calculation is progressing, follow the content of the output file
-    with the command\\
-    \texttt{tail -f} \textit{job.onnnnn}
-  \item To put a hold on a job while waiting for another to finish, use the command\\
-    \texttt{qsub -hold\_jid} \textit{nnnnn} \textit{script\_name.sh}
-\end{itemize}
-%
-\subsection{The barrine.hpcu.uq.edu.au SGI cluster}
-%
-\begin{itemize}
-  \item Set up your environment by adding the following lines to your \texttt{.bashrc} file.\\
-    \texttt{module load python}\\
-    \texttt{module load intel-cc-11}\\
-    \texttt{module load intel-mpi/3.2.2.006}\\
-    \texttt{export PATH=\$\{PATH\}:\$\{HOME\}/e3bin}\\
-    \texttt{export LUA\_PATH=\$\{HOME\}/e3bin/?.lua}\\
-    \texttt{export LUA\_CPATH=\$\{HOME\}/e3bin/?.so}\\
-    Note that we load a specific version of the MPI module.
-  \item Get yourself an interactive shell on a compute node so that you don't hammer the login node
-    while compiling. You won't make friends if you keep the login node excessively busy.\\
-    \texttt{qsub -I -A uq-Jacobs}
-  \item To compile the MPI version of the code, use the command:\\
-    \texttt{make TARGET=for\_intel\_mpi install}\\
-    from the \texttt{cfcfd2/app/eilmer3/build/} directory.
-  \item Optionally, clean up after the build.\\
-    \texttt{make clean}
-  \item To submit a job to PBS-Pro, which is the batch queue system on Barrine,
-    use the command:\\
-    \texttt{qsub} \textit{script\_name.sh}
-  \item An example of a shell script prepared for running on the Barrine cluster.\\
-    \topbarshort
-    \lstinputlisting[language={}]{../2D/lehr-479/run_simulation.sh}
-    \bottombarshort\\
-    Here, we ask for 3 nodes with 8 cores each for a set of 24 MPI tasks.
-    The medium nodes have 8 cores available, and we ask for all of them so that we are reasonably sure
-    that our job will not be in competition with another job on the same nodes.
-    Note the \texttt{-A} accounting option. You will have to use an appropriate group name
-    and you can determine which groups you are part of with the \texttt{groups} command.
-    Unlike SGE on Blackhole, we seem to need to change to the working directory before running the
-    simulation code.
-    Finally, we have redirected the standard output from the main simulation to the file \texttt{LOGFILE}
-    so that we can monitor progress with the command\\
-    \texttt{tail -f LOGFILE}
-\end{itemize}
-
-
-\cleardoublepage
 \section{cfpylib modules}\index{module!cfpylib}
 There are a number of modules that are useful for the definition of flow simulations
 but are not part of the Eilmer code.
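To make the mpirun command retained in the user guide concrete: for a hypothetical job named cone20 that has been divided into four blocks, a local-machine run would look like the line below. The job name and block count are illustrative only; the number of MPI processes is normally chosen to match the number of blocks in the simulation::

    $ mpirun -np 4 e3mpi.exe --job=cone20 --run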