Shared Memory Parallelization of Dynamic Structural Mechanics Codes

R. Diekmann (1,2), G. Gabriel (2), A. Reinefeld (1), W. Sack (2), J. Wiesbaum (1), and J.-M. Wierum (1).

(1) Paderborn Center for Parallel Computing, University of Paderborn
(2) Corporate Research, Hilti AG, Schaan, Principality of Liechtenstein

We propose a thread-based model for parallelizing dynamic iterative scientific codes on CC-NUMA architectures. The model combines the work partitioning of distributed programming with the convenience of global memory access. The locality that distributed programming exploits automatically ensures high cache efficiency, while global memory access avoids having to explicitly identify data for communication. Message-passing parallelizations are relatively easy for static codes, but become considerably more difficult when dynamic, non-linear codes are parallelized. The thread model allows work to be re-assigned to processors very easily, so load balancing ceases to be a problem as long as the locality of the partitions assigned to the processors is preserved.
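
To illustrate the model (this is a hypothetical sketch, not code from the paper), the fragment below shows how a nodal update loop might look when each thread owns a contiguous index range of globally shared arrays. The array names (u, f, mass) and the per-thread range table lo/hi are assumptions for illustration; data of neighbouring partitions is simply read through ordinary loads, with no explicit message passing.

/* Hypothetical sketch: each OpenMP thread updates its own contiguous
 * index range [lo[t], hi[t]) of the globally shared arrays.  lo/hi hold
 * one range per thread; remote data is accessed through shared memory. */
#include <omp.h>

void update_nodes(const double *f, const double *mass, double *u,
                  double dt, const int *lo, const int *hi)
{
    #pragma omp parallel
    {
        int t = omp_get_thread_num();
        for (int i = lo[t]; i < hi[t]; ++i)
            u[i] += dt * f[i] / mass[i];   /* ordinary loads/stores, no messages */
    }
}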

We suggest using space-filling curves to sort the data (nodes and elements) of an FE mesh and to assign consecutive index ranges to individual threads. Load balancing can then be performed by simply shifting the index boundaries. The resulting parallelization is straightforward: it requires only minimal changes to existing codes and delivers almost linear speedup on a large number of processors.
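
A minimal sketch of this idea follows, with hypothetical names throughout and a 2-D Morton (Z-order) key used as a simple stand-in for the space-filling curve actually chosen: elements are keyed by their centroid, sorted along the curve, split into consecutive index ranges (one per thread), and rebalanced by shifting the range boundaries in proportion to measured per-thread times.

/* Illustrative sketch, not the authors' implementation.  Coordinates are
 * assumed to be scaled to [0,1) before quantization. */
#include <stdint.h>
#include <stdlib.h>

typedef struct { double x, y; int id; uint64_t key; } Element;

/* interleave the bits of the quantized x/y coordinates (Morton/Z-order key) */
static uint64_t morton2d(uint32_t x, uint32_t y)
{
    uint64_t key = 0;
    for (int b = 0; b < 32; ++b) {
        key |= (uint64_t)((x >> b) & 1u) << (2 * b);
        key |= (uint64_t)((y >> b) & 1u) << (2 * b + 1);
    }
    return key;
}

static int cmp_key(const void *a, const void *b)
{
    uint64_t ka = ((const Element *)a)->key, kb = ((const Element *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* sort elements along the curve and assign consecutive ranges to nthreads */
void partition_elements(Element *el, int nel, int nthreads, int *lo, int *hi)
{
    for (int i = 0; i < nel; ++i)
        el[i].key = morton2d((uint32_t)(el[i].x * 65536.0),
                             (uint32_t)(el[i].y * 65536.0));
    qsort(el, nel, sizeof(Element), cmp_key);

    for (int t = 0; t < nthreads; ++t) {
        lo[t] = (int)((long long)nel * t / nthreads);
        hi[t] = (int)((long long)nel * (t + 1) / nthreads);
    }
}

/* load balancing by shifting index boundaries: give each thread a new
 * element count proportional to its measured processing speed */
void rebalance(int nel, int nthreads, const double *t_meas, int *lo, int *hi)
{
    double speed[nthreads];              /* elements per second per thread */
    double total = 0.0;
    for (int t = 0; t < nthreads; ++t) {
        speed[t] = (hi[t] - lo[t]) / t_meas[t];
        total += speed[t];
    }
    int start = 0;
    for (int t = 0; t < nthreads; ++t) {
        int cnt = (t == nthreads - 1) ? nel - start
                                      : (int)(nel * speed[t] / total + 0.5);
        if (start + cnt > nel) cnt = nel - start;
        lo[t] = start;
        hi[t] = start + cnt;
        start += cnt;
    }
}

Because the curve keeps elements that are close in space close in the index ordering, shifting a boundary moves only a thin layer of elements between neighbouring threads and preserves the locality of each partition.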

The strategy is applied to a dynamic, explicit structural mechanics code. Measurements show super-linear speedup on up to 16 processors (due to cache effects) and an efficiency of more than 90% on up to 32 processors.

