next up previous contents
Next: 6 Example Up: grape6user Previous: 4 Getting started

Subsections


5 Reference Manual for subroutines

All subroutines are written in C language. However, for simple and unified treatment of Fortran and C application programs, most of routines are written in the form which is directly callable from both Fortran and C. HOWEVER, THIS DUAL FORM IS NOT YET AVAILABLE ON THE BABY-GRAPE6 LIBRARY. On the BABY-6, you need to use the Fortran interface even if you use C/C++. This means that all functions have trailing underscore and all arguments are pointers.

5.1 Overview

Initialization routines

Scaling

Sending data to memory

Setting current time for individual timestep algorithm

Force calculation

Neighbour list

High-level function for shared-timestep algorithm

Low-level functions ... (you need not use these stuff... Well, then why are they listed here?)

Note that functions g6_set_nip through g6_get_force_etc are called internally from g6calc_firsthalf and g6calc_lasthalf, and therefore one need not to use these routines, unless one wants direct access to the low-level functionalities of the GRAPE-6 hardware.

5.2 Initialization

5.2.1 g6_open

int g6_open(int clusterid)

subroutine g6_open(clusterid)
integer clusterid

Initializes the GRAPE-6 hardware and interface package for the specified cluster. If multiple clusters are available on the system, they are numbered from zero. The function which returns the available number of clusters is not implemented yet.... If someone else is using the requested GRAPE-6 cluster, it put the process in the queue and sleep, using the lockf(2) system call.

Note that currently this function takes several seconds to complete when called for the first time.

5.2.2 g6_close

int g6_close(int clusterid)
subroutine  g6_close(clusterid)

This function releases the specified GRAPE-6 cluster and allows other users to acquire it.

5.2.3 g6_reinitialize

int g6_reinitialize(int clusterid)
subroutine  g6_reinitialize(clusterid)

This function performs a hard reset on the GRAPE-6 hardware. The user need to reload all data to GRAPE-6 after this function is called.

5.2.4 g6_initialize_jp_buffer

int g6_initialize_jp_buffer(int clusterid, int size)
subroutine  g6_initialize_jp_buffer(clusterid)

This function initialized the DMA buffer for sending j-particles. The size argument should be something reasonably large, around $10^4$ or so. Note that DMA for sending j-particles is only used when specified in the configuration file. However, the buffer itself is used if this function is called even when DMA is not used. In such cases, the content of the buffer is sent to GRAPE using PIO write.

5.3 Scaling

5.3.1 g6_set_tunit

void g6_set_tunit(int newtunit)

subroutine g6_set_tunit(newtunit)
integer newtunit

GRAPE-6 handles the time for predictor in 64 bit fixed point format. Thus, one need to specify where to put the binary point. The argument newtunit gives the point of the binary point counted from LSB. Thus, the value zero means the time expressed in GRAPE-6 is truncated to integral values, while, say, 50 means the resolution is $2^{-50}$. The default value is 51.

What value should be used depends on the system of units that the application uses. If one uses the ``standard'' or so-called Heggie units, the default value should be fine, but one must be careful that the time would not overflow. With the default value, there are only 13 bits available above the binary point. Also, current implementation of the library software does not use the uppermost 2 bits. So the number of bits available above the binary point is $(62-{\tt tunit})$, which is, with default value, 2048. If the simulation covers the time much longer than $10^3$, the default should be changed.

If you are using GRAPE for, say, galactic simulation and use real physical unit such as Myr as the time unit, you might want to use smaller value for newtunit than the default, since the default does not cover the Hubble time, and the resolution is too high (around 1 milliseconds).

When time unit is changed through the call to this function, the content of the GRAPE-6 memory becomes inconsistent with the time unit, and if individual timestep is used, all ``$j$-particles'' should be resent.

5.3.2 g6_set_xunit

void g6_set_xunit(int newxunit)

subroutine g6_set_xunit(newxunit)
integer newxunit

Similar to g6_set_tunit, but gives the binary point for space coordinate. The default is again 51, which is good if the size of the system is order unity. If your system is much larger or much smaller, you should change the default value accordingly.

5.3.3 g6_set_overflow_flag_test_mode

void g6_set_overflow_flag_test_mode(int force_test_mode,
				    int jerk_test_mode,
				    int pot_test_mode);
subroutine g6_set_overflow_flag_test_mode(force_test_mode,
				          jerk_test_mode,
				          pot_test_mode)
integer force_test_mode, jerk_test_mode, pot_test_mode
This function specifies which overflow flags are checked by the library. To let the library test the overflow for force, you set force_test_mode to one, and to ignore overflow, set it to zero.

The most likely use of this is to set jerk_test_mode to zero if you do not use it for time integration (for example, if you use leapfrog).

5.3.4 g6_get_overflow_flag_test_mode

void g6_get_overflow_flag_test_mode(int *force_test_mode,
				    int *jerk_test_mode,
				    int *pot_test_mode);

subroutine g6_get_overflow_flag_test_mode(force_test_mode,
		         	          jerk_test_mode,
				          pot_test_mode)
integer force_test_mode, jerk_test_mode, pot_test_mode
This function returns the current values of the flags to test overflow. Initially all three flags are set to one.

5.4 Sending data to memory

5.4.1 g6_set_j_particle

int g6_set_j_particle(int  clusterid,
                      int address,
                      int index,
                      double tj, /* particle time */
                      double dtj, /* particle time */
                      double mass,
                      double a2by18[3], /* a2dot divided by 18 */
                      double a1by6[3], /* a1dot divided by 6 */
                      double aby2[3], /* a divided by 2 */
                      double v[3], /* velocity */
                      double x[3] /* position */)

integer function g6_set_j_particle(clusterid, address, index, tj, dtj, mass,
                       a2by18, a1by6, aby2, v, double x)
integer clusterid, address, index
double precision  tj, dtj, mass, a2by18(3), a1by6(3), aby2(3), v(3), x(3)

This function sends the so-called $j$-particle data to the memory unit of GRAPE-6. x, v, aby2, a1by6, a2by18 are position, velocity, half of the acceleration, 1/6 of the first time derivative of the acceleration, and 1/18 of the second time derivative. These must all be the arrays of size three. tj, dtj and mass are the time, timestep and mass of the particle. address is the location in the GRAPE-6 memory to store the particle, starting from index 0, and index is the identifier for the particle itself, which may be different from address above.

The time and timestep must be acceptable for the blockstep algorithm, which means that the timestep must be the powers of two (negative values allowed, but smaller than the time resolution set by g6_set_tunit should result in an error). Also the time must be an exact integer multiple of the timestep.

Note that the GRAPE-6 predictor unit can make use of the second time derivative of the force, unlike the GRAPE-4 predictor unit which can use only up to the first time derivative. If the application program cannot provide the second time derivative, it must give the pointer of the array with size three filled with zeros.

The parameter index is introduced to GRAPE-6 to achieve

The function g6calc_firsthalf also has this index as an argument. Thus, if the indices are the same, the accumulation of the result is skipped on hardware. Thus, effectively, one can achieve something like the following:

C calculate the force on particle i from all other particles
      do j = 1, n
          if (index(i) .ne. index(j)) then
C             do the actual force calculation
          endif
      enddo

The functions for the neighbor list will return the list of particles using this index, and not the location of the particles in the memory as used to be so on older GRAPE systems. This should make the application program somewhat simpler, in particular when the number of particles is larger than what fits on the memory of GRAPE-6.

5.4.2 g6_set_j_particle_mxonly

int g6_set_j_particle_mxonly(int clusterid,
                       int address,
                       int index,
                       double mass,
                       double x[3] /* position */)

integer function g6_set_j_particle_mxonly(clusterid, address, index, mass,x)
integer clusterid, address, index
double precision  mass, x(3)

This function works in the same way as g6_set_j_particle except that it sets only mass and position. This is handy if the application does not uses the predictor.

5.4.3 g6_flush_jp_buffer

int g6_flush_jp_buffer(int clusterid, int size)
subroutine  g6_flush_jp_buffer(clusterid)

This function flushes the data in the DMA buffer for j-particles. Should be called before fasthalf/lasthalf pair, if any $j$-particle was sent using g6_set_j_particle (or its variant) in DMA mode (g6_initialize_jp_buffer called).

5.5 Setting current time for individual timestep algorithm

5.5.1 g6_set_ti

void g6_set_ti(int clusterid, double ti)

subroutine  g6_set_ti_(clusterid, ti)
integer clusterid
double precision ti

This function sets the present time ($t_i$) for the predictor unit.

5.6 Force calculation

5.6.1 g6calc_firsthalf

void g6calc_firsthalf(int clusterid,
                      int nj,
                      int ni,
                      int index[],
                      double xi[][3],
                      double vi[][3],
                      double fold[][3],
                      double j6old[][3],
                      double phiold[],
                      double eps2,
                      double h2[])
subroutine g6calc_firsthalf(clusterid, nj, ni, index, xi, vi, fold,
                      j6old, phiold, eps2, h2)
integer clusterid, nj, ni, index(*),
double precision xi(3,*), vi(3,*), fold(3,*), j6old(3,*), phiold(*),
                 eps2, h2(*)
This function just calls g6_set_ip_scales, g6_set_i_particle, g6_set_nip and g6_set_njp in that order. In other words, it sets the necessary scalings for $i$-particles, sends them to GRAPE-6, set number of particles (both $i$ and $j$) and starts the calculation. Note that fold, j6old and phiold are used to determine the scaling. Therefore, they should have the value corresponding to the particle.

The application program can calculate the force by calling this function and g6calc_lasthalf in that order.

5.6.2 g6calc_firsthalf0

void g6calc_firsthalf0(int clusterid,
                      int nj,
                      int ni,
                      int index[],
                      double xi[][3],
                      double vi[][3],
                      double fold[][3],
                      double j6old[][3],
                      double phiold[],
                      double eps2[],
                      double h2[],
                      int * mode)
subroutine g6calc_firsthalf0(clusterid, nj, ni, index, xi, vi, fold,
                      j6old, phiold, eps2, h2, mode)
integer clusterid, nj, ni, index(*), mode
double precision xi(3,*), vi(3,*), fold(3,*), j6old(3,*), phiold(*),
                 eps2, h2(*)
This function is almost same as the function g6calc_firsthalf, except that the last argument mode is added. If mode=1, it behaves in the same way as g6calc_firsthalf, but if mode=0, eps2 is also an array. Thus, using this function, the user can specify different values of eps2 for different $i$-particles.

5.6.3 g6calc_lasthalf

int g6calc_lasthalf(int clusterid,
                    int nj,
                    int ni,
                    int index[],
                    double xi[][3],
                    double vi[][3],
                    double eps2,
                    double h2[],
                    double acc[][3],
                    double jerk[][3],
                    double pot[])

integer function  g6calc_lasthalf(clusterid, ni, nj, index, xi, vi,
                     eps2, h2, acc, jerk, pot)
integer clusterid, nj, ni, index
double precision xi(3,*), vi(3,*), eps2, h2(*), acc(3,*), jerk(3,*), pot(*)

This function waits for the end of calculation and retrieves the calculated result. In the case of the scaling error, this function tries to adjust the scaling factors appropriately and retries the calculation. If some hardware error occurs and is detected, this function returns non-zero value. Otherwise the return value is zero.

5.6.4 g6calc_lasthalf2

int g6calc_lasthalf2(int clusterid,
                     int nj,
                     int ni,
                     int index[],
                     double xi[][3],
                     double vi[][3],
                     double eps2,
                     double h2[],
                     double acc[][3],
                     double jerk[][3],
                     double pot[],
                     int nnbindex[])
i
integer function  g6calc_lasthalf2(clusterid, ni, nj, index, xi, vi,
                     eps2, h2, acc, jerk, pot, nnbindex)
integer clusterid, nj, ni, index, nnbindex(*)
double precision xi(3,*), vi(3,*), eps2, h2(*), acc(3,*), jerk(3,*), pot(*)

Same as g6calc_lastfalf except that this function returns nnbindex, the indices for the nearest neighbours.

5.6.5 g6calc_lasthalf0

int g6calc_lasthalf0(int clusterid,
                     int nj,
                     int ni,
                     int index[],
                     double xi[][3],
                     double vi[][3],
                     double eps2,
                     double h2[],
                     double acc[][3],
                     double jerk[][3],
                     double pot[],
                     int * mode)
i
integer function  g6calc_lasthalf0(clusterid, ni, nj, index, xi, vi,
                     eps2, h2, acc, jerk, pot, 0)
integer clusterid, nj, ni, index, mode
double precision xi(3,*), vi(3,*), eps2, h2(*), acc(3,*), jerk(3,*), pot(*)

Same as g6calc_lastfalf except that this function should be called if one uses g6calc_firsthalf0

5.6.6 g6calc_lasthalf0a

int g6calc_lasthalf0a(int clusterid,
                     int nj,
                     int ni,
                     int index[],
                     double xi[][3],
                     double vi[][3],
                     double eps2,
                     double h2[],
                     double acc[][3],
                     double jerk[][3],
                     double pot[],
                     int nnbindex[],
                     int * mode)
i
integer function  g6calc_lasthalf0(clusterid, ni, nj, index, xi, vi,
                     eps2, h2, acc, jerk, pot, 0)
integer clusterid, nj, ni, index, nnbindex(*), mode
double precision xi(3,*), vi(3,*), eps2, h2(*), acc(3,*), jerk(3,*), pot(*)

Combines both of lasthalf2 and lasthalf0 functionality.

5.7 Neighbour list

5.7.1 g6_read_neighbour_list

int g6_read_neighbour_list(int clusterid)

integer function g6_read_neighbour_list(clusterid)
integer clusterid

This function transfers the content of the hardware neighbor list of GRAPE-6 to the working memory of the interface library. Return value of zero means successful completion, $-1$ some internal error and $1$ overflow of the hardware neighbour list memory. This function must be called only once after g6_calc (or g6calc_lasthalf) is called.

Zero return value means normal end. Negative values indicate some hardware error (you need to reset the hardware and restart calculation). Positive return values mean overflow (radius too large).

Note that GRAPE-6 always calculates the force, and therefore neighbour lists, for 48 particles, even when you send only one particle with g6calc_firsthalf. For particles not supplied by the present call, the values last set are used. Thus, if you have neighbour list overflow with force calculation for less than 48 particles, that may be caused by the values of the neighbour sphere radius for ``unused'' locations.

If you want to use the neighbour list with less than 48 particles, always make sure that you do not set large values to ``unused'' locations. The simplest way to guarantee this is of course just to copy the last particle to all ``unused'' locations.

5.7.2 g6_get_neighbour_list

int  g6_get_neighbour_list_(int clusterid,
                            int ipipe,
                            int maxlength,
                            int * nblen,
                            int nbl[])

integer function  g6_get_neighbour_list(clusterid, ipipe, maxlength, nblen, nbl)
integer clusterid ipipe maxlength nblen, nbl(*)

This function extracts the actual neighbor list of particle calculated for particle ipipe from the working memory of the interface library prepared by g6_read_neighbour_list. The argument maxlength gives the size of the array nbl. Return value of zero means successful completion and 1 overflow. On successful return, nblen has the length of the neighbour list and nbl actual list (sorted by the indices).

5.7.3 g6_set_neighbour_list_sort_mode

void g6_set_neighbour_list_sort_mode(int mode);

subroutine g6_set_neighbour_list_sort_mode(mode)
integer mode

This function sets the sort option flag for the neighbour list function. When set to 1, the neighbour list returned by g6_get_neighbour_list is sorted by the indices of the neibours. this sorting is achieve by using qsort(3) function, and thus takes rather significant time. If you do not need your neighbour list sorted, you can get some speedup by calling this function with argument 0. Initial value of the flag is 1.

5.7.4 g6_get_neighbour_list_sort_mode

int g6_get_neighbour_list_sort_mode();

integer function g6_get_neighbour_list_sort_mode

This function returns the sort option flag for the neighbour list.

5.8 High-level function for shared-timestep algorithm

5.8.1 calculate_accel_by_grape6_separate_trial_noopen

int calculate_accel_by_grape6_separate_trial_noopen(int clusterid,
                                             int ni,
                                             double xi[][3],
                                             double vi[][3],
                                             int nj,
                                             double xj[][3],
                                             double vj[][3],
                                             double m[],
                                             double a[][3],
                                             double jerk[][3],
                                             double pot[],
                                             double eps2)

This function, for historical compatibility reasons, does not have Fortran interface. This function is designed as a quick shortcut to use GRAPE-6 with simple shared-timestep code, which do not need predictors and time derivatives of accelerations. Ni, xi, vi are the number of particles, position and velocity on which the accelerations are calculated, and here ni can be arbitrary large number (not limited to 48). Nj, xj, vj, m are the number of particles, position, velocity and mass of the particles that exert accelerations. A, jerk, pot are the calculated results and eps2 is the softening parameter.

This routine, as well as calculate_accel_by_grape6_noopen, use g6_guestimate_acc_etc internally. So use them with care. You can control the use of g6_guestimate_acc_etc by calling g6_set_calculate_accel_scaling_mode

5.8.2 calculate_accel_by_grape6_noopen

int calculate_accel_by_grape6_noopen(int clusterid,
                                             int n,
                                             double x[][3],
                                             double v[][3],
                                             double m[],
                                             double a[][3],
                                             double jerk[][3],
                                             double pot[],
                                             double eps2)
This is similar to calculate_accel_by_grape6_separate_trial_noopen, but calculates the self-interaction in direct summation and see if the force symmetry is achieved (if the total acceleration of the system is zero or not). If the test failed, it assumes that there may be some hardware error, and retry the same calculation.

5.9 Low level functions

5.9.1 g6_reset

int g6_reset_(int *clusterid)

subroutine g6_reset(clusterid)
integer clusterid

Performs the hardware reset. Usually one need not use this function, except for error recovery.

5.9.2 g6_reset_fofpga

int g6_reset_fofpga_(int *clusterid)

subroutine g6_reset_fofpga(clusterid)
integer clusterid

Force the reinitialization of all FPGA devices on board at the next call of g6_open. See the section of error recovery for details.

5.9.3 g6_npipes

int g6_npipes_()
integer function g6_npipes

Returns the number of pipelines available on the particular GRAPE-6 system in use. In most cases, it just returns 48, the number of physical pipelines per chip. In future, there may be some version of multi-cluster library which returns different numbers.

5.9.4 g6_getnjmax

int g6_getnjmax(int clusterid)
integer function g6_npipes(clusterid)
integer clusterid

Returns the maximum number of $j$-particles which can be stored in the board memory. The return value depends on the available number of chips. With some new hardware, the returned value may be wrong (too small).

5.9.5 g6_set_nip

void g6_set_nip_(int * clusterid, int * nip)
subroutine g6_set_nip(clusterid, nip)
integer clusterid, nip
Set the number of particles on which the forces (etc) are calculated. The maximum value for nip is 48, which is the number of virtual pipelines per chip. At least currently, the range of the arguments are not checked. So the application programmer has the responsibility to put them within the allowed range.

5.9.6 g6_set_njp

void g6_set_njp_(int * clusterid, int * njp)
subroutine g6_set_njp(clusterid, njp)
integer clusterid, njp

This function sets the number of the particles on the memory to on-chip registers of GRAPE-6. As a side effect, it let GRAPE-6 start the calculation.

5.9.7 g6_set_i_particle_scales_from_real_value

void g6_set_ip_scales_(int * clusterid,
                       int *address,
                       double acc[3],
                       double jerk[3],
                       double *phidouble *jfactor,
                       double *ffactor)

subroutine g6_set_ip_scales(clusterid, address, acc, jerk, phi,
                            jfactor, ffactor)
integer clusterid, address
double precision acc(3), jerk(3), phi, jfactor, ffactor

This function sets the binary point for the internal accumulator for force etc. GRAPE-6 performs the accumulation of the force etc is done in fixed-point format, in order to simplify the hardware and yet extend effective accuracy. Therefore, the hardware should know where to place the binary point before starting the calculation, and in order to do so the library function should know approximate size of force (etc). A good guess is the values at the previous timestep, and therefore within the time integration routine one can just pass the actual values of acceleration, jerk and potential to this routine.

Two additional parameters, jfactor and ffactor, are both used to allow more subtle control on the scaling of jerk. While acceleration and potential are accumulated in 64-bit format, jerk is accumulated in 32 bit. Thus, in order to allow reasonable accuracy, I chose the margin for jerk much smaller than that for acceleration and potential. The parameter jfactor is used to change the margin. The passed values of jerk is multiplied by jfactor before being used to calculate the scale. Thus, suppling jfactor larger than unity significantly reduce the chance of overflow, but might result in slightly less accurate value of jerk. I do not recommend the use of jfactor $> 10$.

The parameter ffactor is used to calculate the scale for jerk from the acceleration. To be specific, the scaling factor is calculated by

jmax = fabs(jerk[k])*(*jfactor) + fabs(acc[k])*(*ffactor);
This parameter is used to further reduce the chance of overflow. Since the jerk is the time derivative of the acceleration, acceleration divided by the timestep would give pretty good upper limit for the jerk.

For the first call, the use of this function is a bit problematic, since the application program might not have good knowledge on how larger is the force on a particle. In this case, I'd recommend to initialize the arrays for acceleration etc with fairly large values and then do the force calculation twice. In the first call, a good guess for the actual value is obtained. However, the calculated force itself should not be used, since the binary point might be inappropriate.

5.9.8 g6_adjust_ip_scales

void g6_adjust_ip_scales_(int * clusterid,
                          int *address,
                          int * flagp)

subroutine g6_adjust_ip_scales(clusterid, address, flagp)
integer clusterid, address, flagp
Even if the scaling factors is set according to the previous values, the calculated result might still overflow in some rare cases, and if you integrate $N$-body system long enough, every ``rare case'' would show up. This function adjust the scaling factors for particular ($i$-)particle, according to the flag returned by the function g6_get_force.

5.9.9 g6_set_i_particle

void g6_set_i_particle_(int * clusterid,
                        int *address,
                        int *index,
                        double x[3], /* position */
                        double v[3], /* velocity */
                        double * eps2,
                        double * h2)

subroutine g6_set_i_particle(clusterid, address, index, x, v, eps2, h2)
integer clusterid, address, index,
double precision x(3), v(3), eps2, h2
T his function sends the data of particles on which the force etc are to be calculated. The argument address is the location within GRAPE-6 register files, and it should be within the range of $[0,47]$. The meaning of the index is described in the section for g6_set_j_particle. Eps2 is the softening parameter squared. Setting this zero would not cause error, if index is correctly handled, since the self-interaction is avoided through this index. H2 is the radius of the neighbor sphere.

5.9.10 g6_get_force

int g6_get_force_(int * clusterid,
                   double acc[][3],
                   double jerk[][3],
                   double phi[],
                   int flag[])

integer function  g6_get_force(clusterid, acc, jerk, phi, flag)
integer clusterid, flag(*)
double precision acc(3,*),jerk(3,*), phi(*)
This function returns the calculated result. If the calculation is not finished, it waits until the end.

If non-zero value is returned, it means some error and that the recalculation is required. The meaning of the non-zero error code is not specified yet. Arguments acc, jerk, phi are the calculated acceleration, jerk and potential, respectively. Flag indicates the status. The function to analyze and report flags will be supplied soon.

5.9.11 g6_get_force_etc

int g6_get_force_etc_(int * clusterid,
                   double acc[][3],
                   double jerk[][3],
                   double phi[],
                   int nnbindex[],
                   int flag[])

integer function  g6_get_force_etc(clusterid, acc, jerk, phi, nnbindex, flag)
integer clusterid, flag(*), nnbindex(*)
double precision acc(3,*),jerk(3,*), phi(*)

Essentially the same as g6_get_force, except that this function returns the indices of the nearest neighbours in nnbindex as well. Note that the returned indices are NOT the memory memory address but the index set by g6_set_i_particle.

5.9.12 g6_guestimate_acc_etc

void g6_guestimate_acc_etc( int n,
                         double eps2,
                         double m[],
                         double a[][NDIM],
                         double j[][NDIM],
                         double p[])
This function is not really for public use, but listed here for completeness. The only thing this function does is to assign some ``reasonable'' values for force, jerk and potential so that they would not cause floating point error when they passed to actual GRAPE-6 library. The use of this function is discouraged. As stated before, the application programmer should provide a good guess for force etc. before passing them to g6calc_firsthalf_.

5.9.13 g6_set_calculate_accel_scaling_mode

int g6_set_calculate_accel_scaling_mode(int mode);
As stated earlier, by default calculate_accel_by_grape6_separate** functions calls g6_guestimate_acc_etc internally. You can suppress the call by calling g6_set_calculate_accel_scaling_mode with mode=0 before calling calculate_accel_by_grape6_separate**. By calling this function with mode=1 you can restore the default mode.


next up previous contents
Next: 6 Example Up: grape6user Previous: 4 Getting started
Jun Makino
平成17年1月31日