Manual
- This manual covers the core code, setup procedures and the mathematical model.
- The addition of extra outputs, post-processing, and just about anything else should be done with a utility. See Utilities.
$ \newcommand{\bnabla}{\vec{\nabla}} \newcommand{\Rey}{Re} \def\vechat#1{\hat{\vec{#1}}} \def\mat#1{#1} $
Getting started
Overview of files
- Makefile will require modification for your compiler and libraries (see #Compiling_libraries). Sample commands for other compilers can be found near the top of the file.
- parallel.h Ensure _Np is set to 1 if you do not have MPI. This file contains macros for parallelisation that are only invoked if the number of processes Np is greater than 1.
- program/parameters.f90, where you’ll find parameters. See #Parameters
- utils/ contains a number of utilities. Almost anything can be done in a util, both post-processing and analysing data at runtime. There should be no need to alter the core code. See #Making_utils
- Matlab/, a few scripts. See Matlab/Readme.txt .
Parameters
./parallel.h:
_Np
Number of processors (set 1 for serial)
./program/parameters.f90:
i_N | Number of radial points $n\in[1,N]$ |
i_K | Maximum k (axial), $k\in(-K,\,K)$ |
i_M | Maximum m (azimuthal), $m\in[0,\,M)$ |
i_Mp | Azimuthal periodicity, i.e. $m=0,M_p,2M_p,\dots,(M-1)M_p$ |
(set =1 for no symmetry assumption) | |
d_Re | Reynolds number $Re$ or $Rm_m$ |
d_alpha | Axial wavenumber $\alpha=2\pi/L_z$ |
b_const_flux | Enforce constant flux $U_b=\frac1{2}$. |
i_save_rate1 | Save frequency for snapshot data files |
i_save_rate2 | Save frequency for time-series data |
i_maxtstep | Maximum number of timesteps (no limit if set $<0$) |
d_cpuhours | Maximum number of cpu hours |
d_time | Start time (taken from state.cdf.in if set $<0$) |
d_timestep | Fixed timestep (typically 0.01 or dynamically controlled if set $<0$) |
d_dterr | Maximum corrector norm, $\|f_{corr}\|$ |
d_courant | Courant number $\mathrm{C}$ |
d_implicit | Implicitness $c$ |
Input files
State files are stored in the NetCDF data format and can be transferred across different architectures safely. The program main.out runs with the compiled parameters (see main.info) but will load states of other truncations. Copy any state file e.g. state0018.cdf.dat to state.cdf.in It will interpolate if necessary.
ll
state.cdf.in:
$t$ & – Start time. Overridden by d_time if d_time$\ge$0
$\Delta t$ & – Timestep. Ignored, see parameter d_timestep.
$N, M_p, r_n$ & – Number of radial points, azimuthal periodicity,
& $\quad$ radial values of the input state.
& – If the radial points differ the fields are
& $\quad$ interpolated onto the new points.
$u_r,\,u_\theta,\, u_z$ & – Field vectors.
& – If $K\ne\,$K or $M\ne\,$M, harmonics are truncated or
& $\quad$ zeros appended.
Output
Snapshot data
Data saved every i_save_rate1 timesteps:
state????.cdf.dat vel_spectrum.dat
All output is sent to the current directory, and state files are numbered 0000, 0001, 0002,…. Each file can be renamed state.cdf.in should a restart be necessary. To list times $t$ for each saved state file,
grep state OUT
The spectrum files are overwritten each save as they are retrievable from the state data. To verify sufficient truncation, a quick profile of the energy spectrum can be plotted with
gnuplot> set log
gnuplot> plot 'vel_spectrum.dat' w lp
Time-series data
Data saved every i_save_rate2 timesteps:
tim_step.dat | $t$, $\Delta t$, $\Delta t_{\|f\|}$, $\Delta t_{CFL}$ | current and limiting step sizes |
vel_energy.dat | $t$, $E$ | energies and viscous dissipation |
vel_friction.dat | $t$, $U_b$ or $\beta$, $\langle u_z(r=0)\rangle_z$, $u_\tau$ | bulk velocites, pressure, friction vel. |
Compiling libraries
Take a look at Makefile. The compiler and flags will probably need editing. There are suggested flags for many compilers at the top of this file.
Building the main code is successful if running make produces no errors, but this requires the necessary libraries to be present - LAPACK, netCDF and FFTW. Often it is necessary to build LAPACK and NetCDF with the compiler and compile flags that will be used for the main simulation code.
The default procedure for building a package (FFTW3, netCDF) is
tar -xvvzf package.tar.gz cd package/ [set environment variables if necessary] ./configure --prefix=<path> make make install
– FFTW3. This usually requires no special treatment. Install with your package manager or build with the default settings.
– LAPACK. If using gfortran you might be able to use the binary version supplied for your linux distribution. Otherwise, edit the file make.inc that comes with LAPACK, setting the fortran compiler and flags to those you plan to use. Type ‘make’. Once finished, copy the following binaries into your library path (see Makefile LIBS -L$<$path$>$/lib/)
cp lapack.a <path>/lib/liblapack.a
cp blas.a <path>/lib/libblas.a
– netCDF. If using gfortran you might be able to use the supplied binary version for your linux distribution. Otherwise, typical environment variables required to build netCDF are
CXX=""
FC=/opt/intel/fc/10.1.018/bin/ifort
FFLAGS="-O3 -mcmodel=medium"
export CXX FC FFLAGS
After the ‘make install’, ensure that files netcdf.mod and typesizes.mod appear in your include path (see Makefile COMPFLAGS -I$<$path$>$/include/). Several versions can be found at
http://www.unidata.ucar.edu/downloads/netcdf/current/index.jsp. Version 4.1.3 is relatively straight forward to install, the above flags should be sufficient.
Installation is trickier for more recent versions [2013-12-11 currently netcdf-4.3.0.tar.gz]. First
./configure --disable-netcdf-4 --prefix-<path>
which disables HDF5 support (not currently required in code). Fortran is no longer bundled, so get netcdf-fortran-4.2.tar.gz or a more recent version from here
http://www.unidata.ucar.edu/downloads/netcdf/index.jsp
Build with the above flags, and in addition
CPPFLAGS=-I<path>/include
export CPPFLAGS
Add the link flag -lnetcdff before -lnetcdf in the Makefile.
Compiling and running
See also the Tutorial.
For serial use set _Np to 1 in parallel.h. Otherwise MPI is required and the compile must be updated in the Makefile. First set parameters in program/parameters.f90. Next type
> make > make install
The second command creates the directory install/ and a text file main.info, which is a record of settings at compile time. Next an initial condition state.cdf.in is needed,
> mv install ~/runs/job0001 > cd ~/runs/job0001/ > cp .../state0100.cdf.dat state.cdf.in > nohup ./main.out > OUT 2> OUT.err &
Press enter again to see if main.out stopped. If it has stopped, check OUT.err or OUT for clues why. If it appears to be running, wait a minute or two then
> rm RUNNING
This will terminate the job cleanly.
NOTE: Any output state, e.g. state0012.cdf.dat can be copied to state.cdf.in to be used as an initial condition. If resolutions do not match, they are automatically interpolated or truncated. This is how I generate almost all initial conditions, by taking a state from a run with similar parameters. If there is a mismatch in Mp, use utils/changeMp.f90.
Monitoring a run
Immediately after starting a job, it’s a good idea to check for any warnings
> less OUT
To find out number of timesteps completed, or for possible diagnosis of an early exit,
> tail OUT
The code outputs timeseries data and snapshot data, the latter has a 4-digit number e.g. state0012.cdf.dat.
To see when the in the run each state was saved,
> grep state OUT | less [OR] > head -n 1 vel_spec* | less
I often monitor progress with tail vel_energy.dat or
> gnuplot > plot 'vel_energy.dat' w l
Use rm RUNNING to end the job.
Making utils
The core of the code in program/ rarely needs to be changed. Almost anything can be done by creating a utility instead. There are many examples in utils/ and further information on the Utilities page.
In Makefile, set UTIL = utilname (omitting the .f90 extension), then type 'make util' which builds utilname.out.
It is good practice to do a 'make install' to generate a main.info file to keep alongside the executable.
Spatial representation
Finite differences — $r$
A function $p(r)$ is represented by the values $p_n=p(r_n)$, where $p(r)$ is evaluated on the $N$ radial points $r_n$.
Differentiation
Taylor expansions about central point $x_0$, $k$ neighbouring points each side
$\label{eq:taylorexp} \begin{array}{lclrrrr}
f(x_{-k}) &=& f(x_0) &+& (x_{-k}-x_0)\,f'(x_0) &+& \frac{(x_{-k}-x_0)^2}{2!}\,f(x_0)+\dots \\ & \vdots & \\ f(x_k) &=& f(x_0) &+& (x_k-x_0)\,f'(x_0) &+& \frac{(x_k-x_0)^2}{2!}\,f(x_0)+\dots
\end{array}$
The expansions \ref{eq:taylorexp} can be written
$\vec{f}=\mat{A}\,\vec{df}, \quad
\vec{df}=\mat{A}^{-1}\,\vec{f}, \quad \vec{f}=\left[\begin{array}{c}f(x_{-k})\\ \vdots\\ f(x_k)\end{array}\right], \quad \vec{df}=\left[\begin{array}{c} f(x_0)\\ f'(x_0)\\ f(x_0)\\ \vdots \end{array}\right]$
Derivatives are calculated using weights from the appropriate row of $\mat{A}^{-1}$.
Integration
Integrating \ref{eq:taylorexp} the indefinite integral may be approximated about $x_0$
$\begin{eqnarray*}
\int f(x) \, {\mathrm d}x & = & \begin{array}{c}(x-x_0)\,f(x_0) + \frac{(x-x_0)^2}{2!}\,f'(x_0) + \frac{(x-x_0)^3}{3!}\,f(x_0) + \dots \end{array} \\ & = & [\begin{array}{lll}(x-x_0) & \frac{(x-x_0)^2}{2!} & \dots \end{array}] \,\, \mat{A}^{-1}\vec{f}\end{eqnarray*}$
The total integral is approximated by
$ \frac1{2} \sum_n \int_{x_{n-1}}^{x_{n+1}} f(x) \, {\mathrm d}x$
Matrix representation of linear operators
Linear equations can be written in the matrix form
$\mat{L} \,\vec{p} = \mat{M}\, \vec{q} + \vec{s}.$
$\mat{L}$ and $\mat{M}$ are $N\times N$ matrices representing linear operators. If $\vec{q}$ and $\vec{s}$ are known then the right-hand side is simple to evaluate. Consider the equation
$\label{eq:matmod}
\mat{L}\, \vec{p} = \vec{q},$
where $\vec{p}$ is unknown. If the matrix operators are formed by linear combinations of the weights in #Differentiation, using only $k$ neighbouring points to each side, they are banded. Boundary conditions are formed in the same way and are placed in the end rows. The equation is solved by forward-backward substitution, using the banded LU-factorisation of $\mat{L}$.
figure=LU.eps, scale=0.65
Fourier representaion — $\theta$, $z$
Expansion of variables
$A(\theta,z) =
\sum_{km} A_{km} {\mathrm e}^{{\mathrm i} (\alpha kz + m_0 m\theta)}$
$A$ real implies $A_{km} = A_{-k,-m}^*$ (coefficients are conjugate-symmetric). It is sufficient to keep only $m\ge 0$, and for $m=0$ keep $k\ge 0$.
Orthogonality
Exponentials
$\int_0^{2\pi} {\mathrm e}^{{\mathrm i}(m+n)} \, {\mathrm d}\theta
= 2\pi \, \delta_{m,-n}\, , \qquad \int_0^{\frac{2\pi}{\alpha}} {\mathrm e}^{{\mathrm i}\alpha(k+j)} \, {\mathrm d}z = \frac{2\pi}{\alpha} \, \delta_{k,-j}$
Energy integral
$E \, = \,
{\textstyle \frac1{2}} \int \vec{A} \cdot \vec{A} \, {\mathrm d}V \, = \, \frac{2\pi^2}{\alpha} \, \sum_{km} \int |\vec{A}_{km}|^2 \, r\,{\mathrm d}r$
$E = \sum_{m=0}^{\infty} E_m,
\qquad E_m = \left\{ \begin{array}{ll} \displaystyle \frac{2\pi^2}{\alpha}\int\vec{A}_{00}^2 \, r \, {\mathrm d}r + \frac{4\pi^2}{\alpha} \sum_{k>0} \int|\vec{A}_{k0}^2| \, r \, {\mathrm d}r, & m=0, \\[15pt] \displaystyle \frac{4\pi^2}{\alpha} \sum_k \int |\vec{A}_{km}|^2 \, r \, {\mathrm d}r, & m>0 . \end{array} \right.$
Volume integral
$E \, = \,
{\textstyle \frac1{2}} \int A \, {\mathrm d}V \, = \, \frac{4\pi^2}{\alpha} \, \int A_{00} \, r\,{\mathrm d}r$
Enforced conditions at the axis
The geometry enforces conditions on each variable at the axis when expanded over Fourier modes in $\theta$. Given a vector A, each mode for $A_z$ is even in $r$ if $m$ is even, and is odd if $m$ is odd. Each mode for $A_r$ and $A_\theta$ is even in $r$ if $m$ is odd, and is odd if $m$ is even.
Model
Governing equations
Non-dimensionalisation / scales
- Length $R$ (radius=$D/2$).
- Velocity $2\, U_b$ (fixed flux), or $U_{cl}$ of HPF (fixed pressure).
- For laminar HPF $U_{cl} = 2\,U_b$ (centreline, bulk=mean).
- The non-dimensional radius is 1, laminar centreline speed is 1, and bulk speed is 1/2.
Dimensionless parameters
Reynolds number, fixed flux, $Re_m = 2 U_b R / \nu$
Reynolds number, fixed pressure, $Re = U_{cl} R / \nu$
$1+\beta = Re / Re_m$
$Re_\tau=u_\tau R/\nu = (2\,Re_m\,(1+\beta))^\frac{1}{2}=(2\,Re)^\frac{1}{2}$
Evolution equations
Fixed flux,
$ (\partial_{t} + \vec{u}\cdot\bnabla) \vec{u}
= -\bnabla \hat{p} + \frac{4}{\Rey_m}\,(1+\beta)\vechat{z} + \frac{1}{\Rey_m}\bnabla^2 \vec{u} $
Fixed pressure
$ (\partial_{t} + \vec{u}\cdot\bnabla) \vec{u}
= -\bnabla \hat{p} + \frac{4}{\Rey}\vechat{z} + \frac{1}{\Rey}\bnabla^2 \vec{u} $
Let $\vec{u}=W(r)\vechat{z}+\vec{u}'$. Using the scaling above, $W(r) = 1-r^2$. Then
$ (\partial_{t} - \frac{1}{\Rey_m}\bnabla^2)\,\vec{u}'
= \vec{u}' \wedge (\bnabla \wedge\vec{u}') - \frac{\mathrm{d}W}{\mathrm{d}r}\,u'_z \vechat{z} - W\,\partial_{z}\vec{u}' + \frac{4\,\beta}{\Rey_m}\vechat{z} - \bnabla\hat{p}' \, . $
Decoupling the equations
The equations for $u_r$ and $u_\theta$ are coupled in the Laplacian. They can be separated in a Fourier decompositon by considering
$u_\pm = u_r \pm \mathrm{i} \, u_\theta,$
for which the $\pm$ are considered respectively. Original variables are easily recovered
$u_r = \frac{1}{2} ( u_+ + u_-),
\qquad u_\theta = -\,\frac{\mathrm{i}}{2}(u_+ - u_- ) .$
Governing equations are then
$\begin{eqnarray*}
(\partial_{t} - \nabla^2_\pm)\, u_\pm & = & N_\pm - (\bnabla p)_\pm , \\ (\partial_{t} - \nabla^2 )\, u_z & = & N_z - (\bnabla p)_z ,\end{eqnarray*}$
where
$\nabla^2_\pm = \nabla^2 - \frac{1}{r^2}
\pm \frac{\mathrm{i}}{r^2}\partial_{\theta}$
PPE-formulation with correct boundary conditions
Write the time-discretised Navier–Stokes equations in the form
$\label{eq:NSdisc}
\left\{\begin{array}{rcl} \mat{X}\, \vec{u}^{q+1} & = & \mat{Y}\, \vec{u}^q + \vec{N}^{q+\frac{1}{2}} - \bnabla p \, , \\ \nabla^2 p & = & \bnabla\cdot(\mat{Y}\, \vec{u}^q + \vec{N}^{q+\frac{1}{2}}) , \end{array}\right.$
where $q$ denotes time $t_q$, which is sixth order in $r$ for $\vec{u}^{q+1}$ and second order for $p$, where the solenoidal condition is not explicitly imposed. Symmetry conditions provide the conditions at the axis. The difficulty is in imposing the remaining four — this system should be inverted, in principle, simultaneously for $p$ and $\vec{u}^{q+1}$ with boundary conditions $\vec{u}^{q+1}=\vec{0}$ and $\bnabla\cdot\vec{u}^{q+1}=0$ on $r=R$ (Rempfer 2006). In practice it would be preferable to invert for $p$ first then for $\vec{u}^{q+1}$, but the boundary conditions to not involve $p$ directly.
Note that the $\mat{Y}\,\vec{u}^q$ term has been included in the right-hand side of the pressure-Poisson equation, the divergence of which should be small. Assume that pressure boundary condition is known: the right-hand side of the Navier–Stokes equation is then projected onto the space of solenoidal functions though $p$ and hence after inversion, $\vec{u}^{q+1}$ will be solenoidal.
Consider the ‘bulk’ solution, $\{\bar{\vec{u}},\bar{p}\}$, obtained from solution of the following:
$\label{eq:NSbulk}
\left\{\begin{array}{rcl} \mat{X}\, \bar{\vec{u}} & = & \mat{Y}\, \vec{u}^q + \vec{N}^{q+\frac{1}{2}} - \bnabla \bar{p} \, , \\ \nabla^2 \bar{p} & = & \bnabla\cdot(\mat{Y}\, \vec{u}^q + \vec{N}^{q+\frac{1}{2}}) , \end{array}\right.$
with boundary conditions $\bar{\vec{u}}=\vec{0}$ and $\partial_{r}\bar{p}=0$. Introduce the following systems:
$\label{eq:NSp0}
\left\{\begin{array}{rcl} \vec{u}' & = & -\bnabla p' \, , \\ \nabla^2 p' & = & 0, \end{array}\right.$
with boundary condition $\partial_{r}p'=1$ on $r=R$, and
$\label{eq:NSu0}
\left\{\begin{array}{rcl} \mat{X}\, \vec{u}' & = & \vec{0}, \end{array}\right.$
with boundary conditions $u'_+=1$, $u'_-=1$, $u'_z=\mathrm{i}$ on $r=R$. The system \ref{eq:NSp0} provides a linearly independent function $\vec{u}'_j$ that may be added to $\bar{\vec{u}}$ without affecting the right-hand side in \ref{eq:NSbulk}, but altering (to correct) the boundary condition applied. (Note that $\bnabla^2(\bnabla p')=\bnabla(\nabla^2 p')-\bnabla \wedge\bnabla \wedge\bnabla p'=0$.) Similarly the system \ref{eq:NSu0} provides a further three functions. The superposition
$\label{eq:usuperpos}
\vec{u}^{q+1} = \bar{\vec{u}} + \sum_{j=1}^4 a_j\, \vec{u}'_j \,$
may be formed in order to satisfy the four boundary conditions, $\vec{u}^{q+1}=\vec{0}$ and $\bnabla\cdot\vec{u}^{q+1}=0$ on $r=R$. Substituting \ref{eq:usuperpos} into the boundary conditions, they may be written
$\mat{A}\,\vec{a} = -\vec{g}(\bar{\vec{u}}) ,$
where $\mat{A}=\mat{A}(\vec{g}(\vec{u}'))$ is a 4$\times$4 matrix. The appropriate coefficients required to satisfy the boundary conditions are thereby recovered from solution of this small system for $\vec{a}$. The error in the boundary conditions $g_j(\vec{u}^{q+1})$ using the influence-matrix technique is at the level of the machine epsilon, typically order 1e-18.
The functions $u'_j(r)$, the matrix $\mat{A}$ and its inverse may all be precomputed. The boundary conditions for $\vec{u}'$ have been chosen so that that $u'_\pm$ are pure real, $u'_z$ is pure imaginary, and $\mat{A}$ is real. For each timestep, this application of the influence matrix technique requires only evaluation of the deviation from the boundary condition, multiplication by a 4$\times$4 real matrix, and the addition of only two functions to each component of $\vec{u}$, each either pure real or pure imaginary. Compared to the evaluation of nonlinear terms, the computational overhead is negligible.
Predictor-corrector
The model equation for each Fourier mode is
$\label{eq:harmmod}
(\partial_{t} - \nabla^2) f = N,$
where nonlinear terms have been evaluated on each radial point by the transform method, and is solved as in #Matrix_representation_of_linear_operators. The predictor at time $t_q$, with Euler nonlinear terms and implicitness $c$
$\frac{f_1^{q+1}-f^q}{\Delta t}
- \left( c\nabla^2 f_1^{q+1} + (1-c)\nabla^2 f^q \right) = N^q ,$
$\left( \frac1{\Delta t} - c\nabla^2 \right) f_1^{q+1}
= \left( \frac1{\Delta t} + (1-c)\nabla^2 \right) f^q + N^q .$
Corrector iterations are
$\left( \frac1{\Delta t} - c\nabla^2 \right) f_{j+1}^{q+1}
= \left( \frac1{\Delta t} + (1-c)\nabla^2 \right) f^q + c\,N_j^{q+1} + (1-c)\, N^q ,$
or equivalently, for the correction $f_{corr} = f_{j+1}^{q+1} - f_j^{q+1}$
$\left( \frac1{\Delta t} - c\nabla^2 \right) f_{corr}
= c\,N_j^{q+1} - c\, N_{j-1}^{q+1} ,$
where $j=1,2,\dots$ and $N_0^{q+1}=N^q$. The size of the correction $\|f_{corr}\|$ must reduce at each iteration. For $c=\frac1{2}$ the scheme is second order such that $\|f_{corr}\| \sim \Delta t^2$.
Timestep control
$\Delta t = \mathrm{C} \, \min(\Delta \, / \, |\vec{v}| ) ,
\qquad 0<\mathrm{C}<1,$
where $\mathrm{C}$ is the Courant number.
The timestep should also be small enough such that the corrector norm $\|f_{corr}\|$ is satisfactorily small. This may be important for integrating initial transients, but ought not to be the limiting factor in general.
The code
Naming conventions
Parameters
Parameters defined within parameters.f90 are given a prefix indicating the data type. For example:
$N$ | i_N | integer |
$\Delta t$ | d_timestep | double |
fixed flux? | b_fixed_flux | boolean |
Note that the naming convention frees the variable name for dummy variables, e.g.:
integer :: n do n = 1, i_N ...
Public variables
By default variables declared within a module are accessible to any other module or subroutine that uses the module. To ensure that it is clear where a public variable is declared, variables are given a prefix. Examples:
$\Delta t$ | tim_dt | timestep.f90 |
$u_r$ | vel_r | velocity.f90 |
Ordering the Fourier modes
$A = \sum_{km} A_{km} {\mathrm e}^{{\mathrm i}(\alpha kz + m_0 m\theta)},$
Coefficients are stored as in the following loop, where rather than two indices k and m, the single index nh labels the modes:
nh = -1 do m = 0, M-1 m_ = m*Mp do k = -(K-1), (K-1) if(m==0 .and. k<0) cycle nh = nh+1 ...
NOTE: this loop, and its parallel equivalent, are predefinded macros in parallel.h.
Data types
The data types below consist of logical data groups to facilitate the passing of data between functions. For clarity, in the following text they are defined as though we were working on a single processor. See Parallel regarding subtle differences in the definitions due to parallelisation.
coll, spec — storage of coefficients
The collocated data type (coll) is the principle type used for mode-independent operations.
type coll double precision :: Re(i_N, 0:i_H1) double precision :: Im(i_N, 0:i_H1) end type coll
The value Re(n,nh) is the real part of the coefficient for the nh$^{\mathrm{th}}$ harmonic at $r_n$. H1$=24$ for the example in #Ordering_of_the_Fourier_modes and N is the number of radial points.
The spectral type (spec) is the data type used for operations independent at each radial point. It is principally a transitory data format between the coll and phys types.
type spec double precision :: Re(0:i_H1, i_N) double precision :: Im(0:i_H1, i_N) end type spec
Conversions between the coll and spec types involve only a transpose, var_coll2spec() and var_spec2coll().
phys — real space data
The type (phys) is used for data in real space.
type phys double precision :: Re(0:i_Z-1, 0:i_Th-1, i_N) end type phys
The element Re(k,m,n) refers to the value at $z_k$, $\theta_m$ and $r_n$.
mesh, lumesh — storage of banded matrices
The finite-difference stencil has finite width (Finite_Differences) and so matrix operations involve matrices that are banded. The type (mesh), defined in meshs.f90, is used for matrices on the radial mesh involving $kl$ points to either side of each node.
type mesh double precision :: M(2*i_KL+1, i_N) end type mesh
For a matrix A, the element M(KL+1+n-j, j) = A(n,j). See the man page for the lapack routine dgbtrf. The LU-factorisation of a banded matrix is also banded:
type lumesh integer :: ipiv(i_N) double precision :: M(3*i_KL+1, i_N) end type lumesh
The element M(2*KL+1+n-j, j) = A(n,j). Pivoting information is stored along with the matrix.
rdom — the radial domain
The type (rdom) contains information the radial domain. The public variable mes_D is declared in meshs.f90. Data includes the number of radial points N; powers of $r$, $r_n^p$=r(n,p); weights for integration $r{\mathrm d}r$; matrices for taking the $p^{\mathrm{th}}$ derivative dr(p);…
type rdom integer :: N double precision :: r(i_N,i_rpowmin:i_rpowmax) double precision :: intrdr(i_N) type (mesh) :: dr(i_KL) end type rdom
Modifications for the parallel implementation
The code has been parallelised using the Message Passing Interface (MPI). Sections of the code with special MPI function calls are separated from the serial sections by preprocessor directives. The number of processors is given by the parameter _Np set in parallel.h. The rank of the local processor is given by mpi_rnk declared in mpi.f90, and mpi_sze$=$_Np.
Data type changes for the model equation (see equations \ref{eq:matmod}, \ref{eq:harmmod}) are as follows:
figure=model.eps, scale=0.75
Data is split over the spherical harmonics for the linear parts of the code — evaluating curls, gradients and matrix inversions for the timestepping. These linear operations do not couple modes. Here all radial points for a particular mode are located on the same processor, separate modes may be located on separate processors. In the definition of type (coll) the parameter i_H1 is replaced by i_pH1 = (_Np+i_H1)/_Np-1. The variable var_H has elements to determine which harmonics are on which processor. The elements pH0 and pH1 are the zeroth index and one-minus the number of harmonics on the local processor. The corresponding values for all processors are stored in the arrays pH0_(rank) and pH1_(rank). For a variable of type (coll), valid values for the harmonic index in Re(n,nh) are nh=0..pH1, which refer to the indices (#Ordering_of_the_Fourier_modes) $nh$=pH0..pH0+pH1.
Data is split radially when calculating Fourier transforms and when evaluating products in real space. In the definitions of type (spec) and type (phys) the parameter i_N is replaced by i_pN = (_Np+i_N-1)/_Np. The location of the radial subdomain on the local processor is given by additional elements of the variable mes_D. The elements pNi and pN are the location of the first (inner) radial point and the number of points. The values for all processors are stored in the arrays pNi_(rank) and pN_(rank). For a variable of type (spec), valid values for the radial index in Re(nh,n) are n=1..pN which refer to the indices of $r_n$, where $n$=pNi..pNi+pN-1.
The bulk of communication between processors occurs during the data transposes in the functions var_coll2spec() and var_spec2coll(). For _Np processors the data is transfered in _Np$-1$ steps. At each step each processor sends and receives a block of data, the source and destination are selected systematically as set in the following loop:
do step = 1, mpi_sze-1 dest = modulo(mpi_rnk+step, mpi_sze) src = modulo(mpi_rnk-step+mpi_sze, mpi_sze) ...
figure=transp.eps, scale=0.65
The sketch is for the case _Np$=4$. The size of each block is approximately i_pN$\times($i_pH1$+1)$, and the data within each block must also be transposed.
Differential operators in cylindrical–polar coordinates
[app:diffops]
$(\bnabla f)_r = \partial_{r}f,
\quad (\bnabla f)_\theta = \frac1{r}\, \partial_{\theta} f, \quad (\bnabla f)_z = \partial_{z} f .$
$\nabla^2 f = (\frac1{r}\,partial_{r} + \partial_{rr} ) f
+\frac1{r^2}\,\partial_{\theta\theta} f + \partial_{zz} f .$
$(\nabla^2 \vec{A})_r =
\nabla^2 A_r - \frac{2}{r^2}\,\partial_{\theta}A_\theta - \frac{A_r}{r^2} ,$
$(\nabla^2 \vec{A})_\theta =
\nabla^2 A_\theta + \frac{2}{r^2}\,\partial_{\theta}A_r - \frac{A_\theta}{r^2} ,$
$(\nabla^2 \vec{A})_z =
\nabla^2 A_z .$
$\bnabla \cdot \vec{A} = (\frac1{r} + \partial_{r}) A_r
+ \frac1{r}\,\partial_{\theta}A_\theta + \partial_{z} A_z .$
$(\bnabla \wedge \vec{A})_r =
\frac1{r}\,\partial_{\theta} A_z - \partial_{z} A_\theta ,$
$(\bnabla \wedge \vec{A})_\theta =
\partial_{z} A_r - \partial_{r} A_z ,$
$(\bnabla \wedge \vec{A})_z =
(\frac1{r}+\partial_{r}) A_\theta - \frac1{r}\,\partial_{\theta} A_r .$
$(\vec{A}\cdot\bnabla\,\vec{B})_r =
A_r\,\partial_{r}B_r + \frac{A_\theta}{r}\,\partial_{\theta}B_r + A_z\,\partial_{z}B_r - \frac{A_\theta B_\theta}{r} ,$
$(\vec{A}\cdot\bnabla\,\vec{B})_\theta =
A_r\,\partial_{r}B_\theta + \frac{A_\theta}{r}\,\partial_{\theta}B_\theta + A_z\,\partial_{z}B_\theta + \frac{A_\theta B_r}{r} ,$
$(\vec{A}\cdot\bnabla\,\vec{B})_z =
A_r\,\partial_{r}B_z + \frac{A_\theta}{r}\,\partial_{\theta}B_z + A_z\,\partial_{z}B_z .$
Toroidal-poloidal decomposition, axial:
$\begin{eqnarray*}
& \vec{A} = \bnabla \wedge(\vechat{z}\psi) + \bnabla \wedge\bnabla \wedge(\vechat{z}\phi) , & \\ & \psi=\psi(r,\theta,z), \quad \phi=\phi(r,\theta,z). \nonumber & \end{eqnarray*}$
$\nabla^2_h f =
(\frac1{r}\,\partial_{r}+\partial_{rr}) f + \frac1{r^2}\,\partial_{\theta\theta} f .$
$\vec{A}_r =
\frac1{r}\,\partial_{\theta} \psi + \partial_{rz} \phi ,$
$\vec{A}_\theta =
-\partial_{r} \psi + \frac1{r}\,\partial_{\theta z} \phi ,$
$\vec{A}_z =
- \nabla^2_h \phi .$
Toroidal-poloidal decomposition, radial:
$\begin{eqnarray*}
& \vec{A} = \psi_0\,\vechat{\theta} + \phi_0\,\vechat{z} + \bnabla \wedge(\vec{r}\psi) + \bnabla \wedge\bnabla \wedge(\vec{r}\phi) , & \\ & \psi_0=\psi_0(r), \quad \phi_0=\phi_0(r), \quad \psi=\psi(r,\theta,z), \quad \phi=\phi(r,\theta,z). \nonumber & \end{eqnarray*}$
$\nabla^2_c f = \frac1{r^2}\,\partial_{\theta\theta} f + \partial_{zz} f .$
$\vec{A}_r =
-r \nabla^2_c \phi ,$
$\vec{A}_\theta =
\psi_0 + r \partial_{z} \psi + \partial_{r\theta} \phi ,$
$\vec{A}_z =
\phi_0 - \partial_{\theta} \psi + (2+r\partial_{r})\partial_{z} \phi .$