
8 Neighbour list

As with previous GRAPE hardware, GRAPE-6 has hardware support for constructing the list of neighbours of each particle. This function, however, is not "fool-proof".

The basic use is quite simple. When calling the firsthalf/lasthalf pair, you specify the radius for the neighbour search. The GRAPE-6 hardware then constructs the list during the force calculation, and the read_neighbour/get_neighbour pair (g6_read_neighbour_list, then g6_get_neighbour_list) returns the actual neighbour list.

The real problem is what to do when the neighbour list overflows. You should check for overflow by testing the return value of g6_read_neighbour_list. If the return value is , the list is probably not correct.

Well, it is difficult to give a universal procedure for recovering from overflow, since overflow can occur for a variety of reasons.

So let me explain here how the GRAPE-6 hardware handles the neighbour list. Although from the application's viewpoint there are only 48 pipelines, a single GRAPE-6 cluster may consist of up to 256 GRAPE-6 chips. Logically, each of these 256 chips has 48 pipelines, and all of them calculate the forces on the same set of 48 particles, but each chip does so from a different subset of the particles you send to the GRAPE-6 memory. In other words, the "GRAPE-6 memory" is actually partitioned into small memories, each local to one GRAPE-6 chip.
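To make the partitioning concrete, here is a toy host-side model in C. It is only a sketch: the round-robin assignment of j-particles to chips (j % nchips) is an assumption made for illustration, not necessarily what the GRAPE-6 library does, and the 1-D pairwise term stands in for the real force. Each simulated chip accumulates a partial sum over its own subset of particles, and adding the partial sums reproduces the direct sum.

```c
#include <stddef.h>

/* Upper bound on simulated chips (a GRAPE-6 cluster has up to 256). */
#define MAX_CHIPS 256

/* Toy pairwise term in 1-D, standing in for the real force. */
static double pair_term(double xi, double xj) { return xj - xi; }

/* Direct sum over all j-particles, as a single memory would see them. */
double total_direct(double xi, const double *xj, size_t nj)
{
    double sum = 0.0;
    for (size_t j = 0; j < nj; j++)
        sum += pair_term(xi, xj[j]);
    return sum;
}

/* Partitioned sum: chip c only "holds" the j-particles with
   j % nchips == c (assumed round-robin layout). Each chip accumulates
   a partial result, and the partial results are added at the end. */
double total_partitioned(double xi, const double *xj, size_t nj, int nchips)
{
    double partial[MAX_CHIPS] = {0.0};
    double sum = 0.0;
    if (nchips < 1 || nchips > MAX_CHIPS)
        return 0.0; /* invalid chip count in this toy model */
    for (size_t j = 0; j < nj; j++)
        partial[j % (size_t)nchips] += pair_term(xi, xj[j]);
    for (int c = 0; c < nchips; c++)
        sum += partial[c];
    return sum;
}
```

The same partitioning applies to the neighbour search: each chip can only find neighbours among the j-particles in its own local memory.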

The storage for the neighbour list is also local to each chip. Each chip has three memory units for the neighbour list, each shared by 16 pipelines and each able to hold up to 256 particles. Thus, if any memory unit on any chip overflows, the lists of (at least) the 16 particles that share that memory unit may be incomplete.
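The mapping from pipelines to neighbour-memory units can be sketched as follows. The assumption that consecutive pipelines share a unit (0-15, 16-31, 32-47) is mine, made for illustration; the text only says that each unit is shared by 16 pipelines. The check shows why a single overflowing unit makes the lists of all 16 sharing pipelines suspect.

```c
/* Layout from the text: 48 pipelines per chip, 3 neighbour-memory
   units per chip, each shared by 16 pipelines, 256 entries per unit. */
#define PIPES_PER_CHIP   48
#define PIPES_PER_UNIT   16
#define UNITS_PER_CHIP   (PIPES_PER_CHIP / PIPES_PER_UNIT) /* = 3 */
#define ENTRIES_PER_UNIT 256

/* Memory unit used by a pipeline. ASSUMPTION: consecutive pipelines
   share a unit (0-15 -> 0, 16-31 -> 1, 32-47 -> 2). */
int unit_of_pipeline(int pipe)
{
    return pipe / PIPES_PER_UNIT;
}

/* entries_per_pipe[p] is the number of neighbour entries pipeline p
   needs on this chip. Returns 1 if the pipelines sharing `unit` need
   more than the unit can hold; then every list in that unit is suspect. */
int unit_overflows(const int *entries_per_pipe, int unit)
{
    int total = 0;
    for (int p = unit * PIPES_PER_UNIT; p < (unit + 1) * PIPES_PER_UNIT; p++)
        total += entries_per_pipe[p];
    return total > ENTRIES_PER_UNIT;
}
```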

In most cases, this is not a very severe limitation. A single board with 32 chips can store 32 × 256 = 8192 neighbours for the 16 particles sharing a memory unit. So even if the neighbour lists of these 16 particles do not overlap at all, each particle can have, on average, up to 8192/16 = 512 neighbours without causing overflow. This holds only if the neighbours are distributed evenly over the chips, which, I hope, is mostly the case.
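The arithmetic above, and the caveat about even distribution, can be checked with a small sketch. Here neighbours_on_chip[c] is a hypothetical count of the entries that the 16 particles sharing one memory unit need on chip c. The total can be well below 32 × 256 = 8192 and still overflow if the neighbours are concentrated on a few chips.

```c
/* Capacity of one neighbour-memory unit on one chip, from the text. */
#define ENTRIES_PER_UNIT 256

/* Total entries available, over all chips, to the 16 particles sharing
   one memory-unit position (32 chips -> 32 * 256 = 8192). */
long unit_capacity_all_chips(int nchips)
{
    return (long)nchips * ENTRIES_PER_UNIT;
}

/* Returns 1 if any single chip must store more than ENTRIES_PER_UNIT
   entries for these particles. This is an overflow even when the sum
   over all chips is within unit_capacity_all_chips(nchips). */
int any_chip_overflows(const int *neighbours_on_chip, int nchips)
{
    for (int c = 0; c < nchips; c++)
        if (neighbours_on_chip[c] > ENTRIES_PER_UNIT)
            return 1;
    return 0;
}
```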

If you need, on average, more than 16n neighbours per particle on an n-chip system, the only safe way is to reduce the number of particles you send with firsthalf. If you send just one particle, it can use all of the memory that is normally shared by 16 particles, so it can have up to 256n neighbours, which should be enough for most applications. If you need even more, well, it is probably faster to construct the neighbour list on the frontend using a tree...
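A sketch of the capacity rule above for an n-chip system: with k particles sharing one memory unit, the average capacity is 256n/k, i.e. 16n for a full unit and 256n for a single particle. batch_size_for is a hypothetical helper, and it assumes that particles fill pipelines in order, so that a batch of at most 16 particles shares a single unit.

```c
#define ENTRIES_PER_UNIT 256
#define PIPES_PER_UNIT   16
#define PIPES_PER_CHIP   48

/* Average neighbour capacity per particle on an nchips-chip system
   when `sharers` (>= 1) particles share one memory unit:
   256 * nchips / sharers, i.e. 16n for 16 sharers, 256n for one. */
long capacity_per_particle(int nchips, int sharers)
{
    return (long)ENTRIES_PER_UNIT * nchips / sharers;
}

/* Hypothetical helper: how many particles to send per firsthalf call
   so that an average requirement of `need` neighbours fits.
   ASSUMPTION: particles fill pipelines in order, so k <= 16 particles
   share one unit. Returns 0 if even one particle does not fit, in
   which case the list has to be built on the frontend. */
int batch_size_for(int nchips, long need)
{
    if (need <= capacity_per_particle(nchips, PIPES_PER_UNIT))
        return PIPES_PER_CHIP; /* full 48-particle batch is fine */
    /* need > 16n here, so k = floor(256n / need) is in 0..15 */
    return (int)((long)ENTRIES_PER_UNIT * nchips / need);
}
```

For a 32-chip board, a requirement of 1000 neighbours per particle gives a batch of 8192/1000 = 8 particles, while anything up to 512 allows the full 48.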

If you need, say, fewer than  neighbours on average, overflow should be very rare. The simplest solution (well, at least for me, but probably also for you) is to construct the neighbour lists on the frontend whenever overflow occurs. If overflow is sufficiently rare, doing this on the host should not cause much of a performance penalty.
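A minimal sketch of the frontend fallback: a brute-force O(N) scan per particle, which is cheap enough if overflow is rare. The function name, the position layout x[3*j+k] and the squared radius h2 are assumptions for illustration. Whether the particle itself is included, and whether the test is < or <=, should be matched to what your code expects from the hardware list.

```c
/* Hypothetical frontend fallback: brute-force search for all j != i
   with |x_j - x_i|^2 < h2. Positions are stored as x[3*j + k].
   Writes at most maxlen indices into list and returns the true number
   of neighbours found (a return value > maxlen means truncation). */
int host_neighbour_list(int i, const double *x, int n, double h2,
                        int *list, int maxlen)
{
    int count = 0;
    for (int j = 0; j < n; j++) {
        if (j == i)
            continue; /* this sketch excludes the particle itself */
        double dx = x[3 * j]     - x[3 * i];
        double dy = x[3 * j + 1] - x[3 * i + 1];
        double dz = x[3 * j + 2] - x[3 * i + 2];
        if (dx * dx + dy * dy + dz * dz < h2) {
            if (count < maxlen)
                list[count] = j;
            count++;
        }
    }
    return count;
}
```

You would call this for each particle of the batch whenever g6_read_neighbour_list reports an overflow, and use the hardware lists otherwise.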

If you want to be smarter, you could write a routine that sets the neighbour-search radius to zero for all particles but one, and then calculates the force and neighbour list for that particle. This way, the probability of overflow is very small, but still not zero, so you still need a next-level backup routine that calculates the neighbour list on the frontend.
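The two-level backup can be sketched as below. hw_pass_fn is a hypothetical callback standing in for one hardware pass (firsthalf/lasthalf followed by the overflow check); it returns nonzero on overflow. toy_pass is a toy stand-in that models overflow as the combined neighbour count of the active particles exceeding 256, ignoring the per-chip partitioning, only so that the logic can be exercised. In real code you would read each particle's list right after its single-particle pass instead of just recording where it came from.

```c
#define MAX_BATCH    48
#define TOY_CAPACITY 256

/* Hypothetical hardware pass over a batch with squared radii h2[];
   returns nonzero if the neighbour memory overflowed. */
typedef int (*hw_pass_fn)(const double *h2, int nb, void *ctx);

/* Where each particle's neighbour list should be taken from. */
enum nb_source { NB_HW_BATCH, NB_HW_SINGLE, NB_HOST };

/* Two-level fallback for a batch of nb (1..48) particles:
   1. run the whole batch with the real radii;
   2. on overflow, rerun once per particle with every other radius set
      to zero, so only that particle's list is collected;
   3. if even that overflows, mark the particle for a host search. */
void resolve_batch(const double *h2, int nb, hw_pass_fn pass, void *ctx,
                   enum nb_source *source)
{
    double tmp[MAX_BATCH];
    if (nb < 1 || nb > MAX_BATCH)
        return; /* outside the pipeline count; source left untouched */
    if (!pass(h2, nb, ctx)) {
        for (int i = 0; i < nb; i++)
            source[i] = NB_HW_BATCH;
        return;
    }
    for (int i = 0; i < nb; i++) {
        for (int k = 0; k < nb; k++)
            tmp[k] = (k == i) ? h2[i] : 0.0;
        source[i] = pass(tmp, nb, ctx) ? NB_HOST : NB_HW_SINGLE;
    }
}

/* Toy stand-in for the hardware: ctx points to the true neighbour
   counts; a pass overflows if the counts of particles with a nonzero
   radius add up to more than TOY_CAPACITY. */
int toy_pass(const double *h2, int nb, void *ctx)
{
    const int *counts = (const int *)ctx;
    int total = 0;
    for (int i = 0; i < nb; i++)
        if (h2[i] > 0.0)
            total += counts[i];
    return total > TOY_CAPACITY;
}
```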

Unless you see a performance problem, I'd recommend the simple solution: construct the neighbour list on the frontend when overflow occurs.


Jun Makino
January 31, 2005