الوصف: |
Developing parallel numerical simulation programs for multicomputers is a difficult task because of the need to provide non-functional program properties (performance, memory consumption, network load, etc.), as well as dynamic load balancing, fault tolerance and other properties. The paper considers the fragmented programming technology and the LuNA system that supports it, a system for automatic construction of parallel programs with specified non-functional properties. An application algorithm is represented as a set of informationally dependent tasks, which makes it possible to execute them in parallel, to redistribute them dynamically among the nodes of a multicomputer, thus providing dynamic load balancing across the nodes, and to implement other non-functional properties of the program automatically. The possibility of automatically tuning program execution to the configuration of the computer on the basis of profiling is considered. Keywords: fragmented programming technology, automatic construction of parallel programs, high-performance computing, LuNA system. Using supercomputers for numerical simulation entails the complexity of developing distributed parallel programs. Non-functional requirements for such programs include efficiency, economical use of memory and network bandwidth, tuning to available resources, fault tolerance and checkpointing support, dynamic workload balancing, etc. To satisfy these requirements one has to take into account the peculiarities of the application algorithm, the hardware configuration and the data. Such development requires skills and knowledge that supercomputer users are unlikely to have. To reduce the complexity of parallel program development, debugging and modification, diverse systems and tools for automating program construction exist and continue to evolve.
Such tools accept a high-level description of an application algorithm, together with a description of the hardware configuration, and produce and execute a parallel program that implements the algorithm and meets the non-functional requirements. This removes part of the programmer’s burden. Such systems and tools are also necessary to implement active knowledge technology, in which an application algorithm is synthesized automatically from the specification of the application problem and then has to be executed automatically. A system for automatic construction of parallel programs has to rely on a high-level description of the application algorithm. The algorithm representation should be independent of the hardware configuration and should allow the execution of the algorithm to be tuned to the given hardware and data. The representation should consist of independently computable parts that the system can map dynamically to the available resources. To meet these demands, the fragmented programming technology (FPT) is being developed at ICMMG SB RAS, along with the LuNA system for automatic construction of numerical parallel programs (LuNA stands for Language for Numerical Algorithms). In the FPT, computations are represented as a fragmented algorithm (FA): a countable set of computational tasks called computational fragments (CFs). A CF is defined by two finite sets of input and output arguments, which are immutable data objects called data fragments (DFs), and by an operation on the DFs that computes the values of the output DFs from the values of the input DFs. Executing an FA means executing all of its CFs, each one once its input DFs have been computed. Operations are side-effect-free sequential subroutines, so a CF can be executed on any computing node chosen dynamically, provided that the values of its input DFs are transferred to that node. The job of the programming system is to map DFs and CFs dynamically to computing nodes, deliver input DFs to CFs, and execute CFs to produce new DFs.
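The execution model described above can be illustrated with a minimal sketch. It is not LuNA code and does not use the LuNA language: the names `CF` and `run_fa`, the round-robin placement policy and the two-fragment example are all assumptions made for illustration. Computing nodes are only simulated by integer labels, and a real system would transfer DF values between nodes instead of reading a shared dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class CF:
    """Hypothetical computational fragment: a pure operation over named DFs."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    op: Callable[..., tuple]  # side-effect-free; returns one value per output


def run_fa(cfs: List[CF], initial: Dict[str, object], nodes: int = 2):
    """Run every CF once its input DFs exist; return (DF values, CF -> node)."""
    dfs = dict(initial)   # single-assignment store of DF values
    placement = {}        # which simulated node executed each CF
    pending = list(cfs)
    step = 0
    while pending:
        ready = [cf for cf in pending if all(i in dfs for i in cf.inputs)]
        if not ready:
            raise RuntimeError("no CF is ready: some input DFs are never produced")
        for cf in ready:
            node = step % nodes  # placeholder policy (assumption): round-robin
            step += 1
            values = cf.op(*(dfs[i] for i in cf.inputs))
            for name, value in zip(cf.outputs, values):
                if name in dfs:
                    raise ValueError(f"DF {name!r} assigned twice")
                dfs[name] = value  # DFs are immutable: assigned exactly once
            placement[cf.name] = node
        pending = [cf for cf in pending if cf not in ready]
    return dfs, placement


cfs = [
    CF("sum", ("a", "b"), ("s",), lambda a, b: (a + b,)),
    CF("scale", ("s",), ("r",), lambda s: (s * 10,)),
]
# The order in which CFs are listed does not matter: readiness is data-driven.
dfs, placement = run_fa(list(reversed(cfs)), {"a": 1, "b": 2})
print(dfs["r"])  # 30
```

Because operations have no side effects and DFs are assigned only once, the placement policy can be replaced without changing the computed values, which is the property that lets the system choose the mapping.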
During FA execution the system redistributes CFs and DFs to balance the workload, uses replication to provide fault tolerance, saves DF values for checkpointing, and performs other jobs to provide the required non-functional properties of program execution. To improve the non-functional properties of FA execution, the user can supply supplementary information called recommendations. Recommendations are a means of overcoming the fundamental difficulty of efficiently executing an application algorithm represented in a high-level language such as LuNA. With recommendations the user can explicitly declare properties of the algorithm that are hard to derive automatically but can significantly affect execution (e.g. the estimated execution time of operations). The user can also employ recommendations to express their insight into how the computations should be performed to achieve high efficiency (instead of programming it in a low-level language such as C+MPI). The system follows the recommendations in cases where good efficiency is hard to achieve automatically. An experiment on automatic mapping of CFs to resources using profiling is considered. An application (a matrix multiplication test) was run multiple times in profiling mode. After each run the mapping of CFs to computing nodes was adjusted to avoid the load imbalances that had occurred in the previous execution. After a few executions the execution time of the program decreased to that of an efficient hand-written implementation of the same algorithm. This shows that FA execution can be tuned automatically to the hardware configuration. Another experiment shows automatic provision of dynamic load balancing, using the simulation of a self-gravitating dust cloud by the Particle-in-Cell method as the example application. The application is an example of a problem in which the workload distribution cannot be predicted statically and depends on the input data.
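The profiling experiment described above, in which the mapping is adjusted after each run, can be sketched as follows. The abstract does not state the adjustment rule, so the rule here is an assumption: after each run, one CF is moved from the most loaded node to the least loaded one when that reduces the imbalance. The function names `node_loads` and `adjust_mapping` and the profiled times are hypothetical.

```python
from typing import Dict, List


def node_loads(mapping: Dict[str, int], times: Dict[str, float], nodes: int) -> List[float]:
    """Total profiled time per node under the given CF -> node mapping."""
    loads = [0.0] * nodes
    for cf, node in mapping.items():
        loads[node] += times[cf]
    return loads


def adjust_mapping(mapping: Dict[str, int], times: Dict[str, float], nodes: int) -> Dict[str, int]:
    """Move one CF from the busiest node to the idlest one if that helps."""
    loads = node_loads(mapping, times, nodes)
    busiest = max(range(nodes), key=loads.__getitem__)
    idlest = min(range(nodes), key=loads.__getitem__)
    gap = loads[busiest] - loads[idlest]
    # Only a CF cheaper than the gap shrinks the imbalance when moved.
    movable = [cf for cf, n in mapping.items()
               if n == busiest and times[cf] < gap]
    if not movable:
        return dict(mapping)
    # Prefer the CF whose profiled time is closest to half the gap.
    best = min(movable, key=lambda cf: abs(times[cf] - gap / 2))
    new_mapping = dict(mapping)
    new_mapping[best] = idlest
    return new_mapping


# Hypothetical profile; a real run would re-measure the times after each execution.
times = {"c0": 4.0, "c1": 3.0, "c2": 2.0, "c3": 1.0}
mapping = {cf: 0 for cf in times}   # start with every CF on node 0
for _ in range(5):                  # five profiled runs
    mapping = adjust_mapping(mapping, times, nodes=2)
print(node_loads(mapping, times, 2))  # [5.0, 5.0]
```

In this toy setting the loads even out after two adjustments and later runs leave the mapping unchanged, which mirrors the reported behaviour of the execution time settling after a few profiled runs.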
The necessary dynamic workload balancing was provided automatically by the LuNA system on the basis of the Rope of Beads algorithm, which preserves the distributed data structure while balancing. A number of other model and real-life applications have been developed using the LuNA system. The efficiency of a constructed program is generally lower than that of a hand-coded implementation, but the reduction in the effort of program development, debugging and modification is often worth it. In the future it is planned to implement other system algorithms and heuristics to improve the quality of automatically constructed programs, and to use the LuNA system as a basis for constructing an active knowledge system.