Contents
- 2. Goal: Efficient parallelization of complex numerical problems in computational physics HETEROGENEOUS COMPUTATIONS TEAM, HybriLIT Plan of
- 3. … TOP500 List – June 2014
- 4. Source: http://www.top500.org/blog/slides-for-the-43rd-top500-list-now-available/ TOP500 List – June 2014
- 5. Source: http://www.top500.org/blog/slides-for-the-43rd-top500-list-now-available/ TOP500 List – June 2014
- 6. «Lomonosov» Supercomputer, MSU >5000 computation nodes Intel Xeon X5670/X5570/E5630, PowerXCell 8i ~36 Gb DRAM 2
- 7. Custom languages such as CUDA and OpenCL Specifications • 2880 CUDA GPU cores • Peak precision
- 8. «Tornado SUSU» Supercomputer, South Ural State University, Russia 480 computing units (compact and powerful computing blade-modules)
- 9. At the end of 2012, Intel launched the first generation of the Intel Xeon Phi product
- 10. HybriLIT: heterogeneous computation cluster «Lomonosov» Supercomputer, MSU CICC comprises 2582 Cores Disk storage capacity 1800 TB
- 11. 2x Intel Xeon CPU E5-2695v2 3x NVIDIA TESLA K40S 2x Intel Xeon CPU E5-2695v2 NVIDIA TESLA
- 12. Multiple CPU cores with shared memory Multiple GPUs What we see: modern supercomputers are hybrid with
- 13. Parallel technologies: levels of parallelism In the last decade novel computational technologies and facilities have become available:
- 14. In the last decade novel computational facilities and technologies have become available: MPI-OpenMP-CUDA-OpenCL... It is not
- 15. Problem HCE: heat conduction equation Initial boundary value problem for the heat conduction equation: D –
- 16. Problem HCE: computation scheme Locally one-dimensional scheme: reduction of a multidimensional problem to a chain of
- 17. Step 1: Difference equations (Ny-2) on x direction Step 2: Difference equations (Nx-2) on y direction
- 18. Problem HCE: parallelization scheme Parallel Parallel
- 19. Parallel Technologies
- 20. OpenMP realization of parallel algorithm
- 21. OpenMP (Open specifications for Multi-Processing) OpenMP (Open specifications for Multi-Processing) is an API that supports multi-platform
- 22. Compiler directive Library routines OpenMP (Open specifications for Multi-Processing) Use flag -openmp to compile using Intel
- 23. OpenMP realization: Multiple CPU cores that share memory Table 2. OpenMP realization problem 1: execution time
- 24. OpenMP realization: Intel® Xeon Phi™ Coprocessor Compiling: icc -openmp -O3 -vec-report=3 -mmic algLocal_openmp.cc -o alg_openmp_xphi Table
- 25. OpenMP realization: Intel® Xeon Phi™ Coprocessor Optimizations The KMP_AFFINITY Environment Variable: The Intel® OpenMP* runtime library
- 26. CUDA (Compute Unified Device Architecture) programming model, CUDA C
- 27. CUDA (Compute Unified Device Architecture) programming model, CUDA C Source: http://blog.goldenhelix.com/?p=374 Core 1 Core 2 Core
- 28. Source: http://www.realworldtech.com/includes/images/articles/g100-2.gif CUDA (Compute Unified Device Architecture) programming model
- 29. Device Memory Hierarchy Registers are fast, off-chip local memory has high latency Tens of kb per
- 30. Function Type Qualifiers __global__ __host__ CPU GPU __global__ __device__ __global__ void kernel ( void ){ }
- 31. Threads and blocks int tid = threadIdx.x + blockIdx.x * blockDim.x tid
- 32. Program scheme in CUDA C/C++ and C/C++
- 33. nvcc -arch=compute_35 test_CUDA_deviceInfo.cu -o test_CUDA –o deviceInfo Compilation Compilation tools are a part of CUDA SDK
- 34. Source: https://developer.nvidia.com/cuda-education. (Will Ramey ,NVIDIA Corporation) Some GPU-accelerated Libraries
- 35. Problem HCE: parallelization scheme Parallel Parallel
- 36. Problem HCE: CUDA realization Initialization: parameters of the problem and the computational scheme are copied in
- 37. Table 1. CUDA realization: Execution time and Acceleration CUDA realization of parallel algorithm: efficiency of parallelization
- 38. Problem HCE : analysis of results
- 39. Hybrid Programming: MPI+CUDA: on the Example of GIMM FPEIP Complex GIMM FPEIP : package developed for
- 40. To solve a system of coupled equations of heat conductivity which are a basis of the
- 41. GIMM FPEIP: Logical scheme of the complex
- 42. Using Multi-GPUs
- 43. MPI, MPI+CUDA (CICC LIT, K100 KIAM)
- 44. Hybrid Programming: MPI+OpenMP, MPI+OpenMP+CUDA The MultiConfigurational Time-Dependent Hartree (for) Bosons method: PRL 99, 030402 (2007), PRA 77, 033613
- 45. Time-Dependent Schrödinger equation governs the physics of trapped ultra-cold atomic clouds To solve the Time-Dependent Many-Boson
- 46. All the terms of the Hamiltonian are under experimental control and can be manipulated 1D-2D-3D: Control
- 47. Two generic regimes: (i) non-violent (under-a-barrier) and (ii) explosive (over-a-barrier)
- 48. List of Applications Modern development of computer technologies (multi-core processors, GPUs, coprocessors and others) requires
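
The locally one-dimensional scheme named in items 16 to 18 can be illustrated with a minimal C sketch. It is not the authors' code: the function names `implicit_line` and `lod_step`, the constant mesh ratios `rx` and `ry`, the fixed (Dirichlet) boundary values, and the row-major storage `u[j*nx + i]` are all assumptions made for illustration. Step 1 solves Ny-2 independent tridiagonal systems along x, Step 2 solves Nx-2 independent systems along y, and each set is distributed over threads with an OpenMP `parallel for`.

```c
#include <assert.h>
#include <stdlib.h>

/* One implicit 1D sweep along a grid line of n points (Thomas algorithm).
 * Solves -r*u[i-1] + (1 + 2r)*u[i] - r*u[i+1] = u_old[i] for i = 1..n-2,
 * keeping the end values u[0] and u[n-1] fixed (Dirichlet: an assumption).
 * stride = 1 walks a row, stride = nx walks a column.
 * cp and dp are scratch arrays with at least n entries each. */
static void implicit_line(double *u, int n, int stride, double r,
                          double *cp, double *dp)
{
    const double b = 1.0 + 2.0 * r;               /* main diagonal */
    for (int i = 1; i <= n - 2; ++i) {            /* forward elimination */
        double d = u[i * stride];                 /* right-hand side: old value */
        if (i == 1)     d += r * u[0];            /* left boundary term */
        if (i == n - 2) d += r * u[(n - 1) * stride]; /* right boundary term */
        double m = (i == 1) ? b : b + r * cp[i - 1];
        cp[i] = -r / m;
        dp[i] = (i == 1) ? d / m : (d + r * dp[i - 1]) / m;
    }
    u[(n - 2) * stride] = dp[n - 2];              /* back substitution */
    for (int i = n - 3; i >= 1; --i)
        u[i * stride] = dp[i] - cp[i] * u[(i + 1) * stride];
}

/* One time step of the locally one-dimensional scheme on an nx-by-ny grid.
 * rx and ry are hypothetical mesh ratios (diffusivity * dt / h^2).
 * Returns 0 on success, -1 if scratch memory cannot be allocated. */
int lod_step(double *u, int nx, int ny, double rx, double ry)
{
    assert(nx >= 3 && ny >= 3);
    size_t total = (size_t)nx * (size_t)ny;
    double *cp = malloc(total * sizeof *cp);
    double *dp = malloc(total * sizeof *dp);
    if (!cp || !dp) { free(cp); free(dp); return -1; }

    /* Step 1: Ny-2 independent systems in the x direction (one per row). */
    #pragma omp parallel for
    for (int j = 1; j < ny - 1; ++j)
        implicit_line(u + (size_t)j * nx, nx, 1, rx,
                      cp + (size_t)j * nx, dp + (size_t)j * nx);

    /* Step 2: Nx-2 independent systems in the y direction (one per column). */
    #pragma omp parallel for
    for (int i = 1; i < nx - 1; ++i)
        implicit_line(u + i, ny, nx, ry,
                      cp + (size_t)i * ny, dp + (size_t)i * ny);

    free(cp);
    free(dp);
    return 0;
}
```

Each line gets its own slice of the scratch arrays, so threads never write to the same memory. Built with an OpenMP flag (for example `-fopenmp` with GCC, or `-openmp` with the Intel compiler as in item 22), the two loops run in parallel; without the flag the pragmas are ignored and the same code runs serially.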