Huge CPU Memory+Time Expense
Simulation of 214 GeV mu-
jok-tds-gdb : tut_detsim.py simulation time | |
---|---|
Single threaded Geant4 (*) | 41 hrs |
Opticks (TITAN RTX, 1st G) | [AWAIT VALIDATION] |
Need validation to establish equivalent simulation
38M photons, 147k gensteps, Debug build
(*) DELL Precision 7920T Workstation, Intel Xeon Gold 5118, 2.3GHz, 48 cores, 62G
Not a Photo, a Calculation
Much in common : geometry, light sources, optical physics
Many Applications of ray tracing :
ray trace performance : ~2x every ~2 years
Flexible Ray Tracing Pipeline
Green: User Programs, Grey: Fixed function/HW
Analogous to OpenGL rasterization pipeline
OptiX makes GPU ray tracing accessible
OptiX features
User provides (Green):
Latest Release : NVIDIA® OptiX™ 8.0.0 (Aug 2023) NEW:
https://bitbucket.org/simoncblyth/opticks
Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory
CSGFoundry Model
Geant4 Geometry Model (JUNO: 400k PV, deep hierarchy)
Abbrev | Geant4 class | Notes |
---|---|---|
PV | G4VPhysicalVolume | placed, refs LV |
LV | G4LogicalVolume | unplaced, refs SO |
SO | G4VSolid, G4BooleanSolid | binary tree of SO "nodes" |
Opticks CSGFoundry Geometry Model (index references)
struct | Notes | Geant4 Equivalent |
---|---|---|
CSGFoundry | vectors of the below, easily serialized + uploaded + used on GPU | None |
qat4 | 4x4 transform refs CSGSolid using "spare" 4th column (becomes IAS) | Transforms ref from PV |
CSGSolid | refs sequence of CSGPrim | Grouped Vols + Remainder |
CSGPrim | bbox, refs sequence of CSGNode, root of CSG Tree of nodes | root G4VSolid |
CSGNode | CSG node parameters (JUNO: ~23k CSGNode) | node G4VSolid |
NVIDIA OptiX 7/8 Geometry Acceleration Structures (JUNO: 1 IAS + 10 GAS, 2-level hierarchy)
Abbrev | Meaning | JUNO usage |
---|---|---|
IAS | Instance Acceleration Structure | 1 IAS created from vector of ~50k qat4 |
GAS | Geometry Acceleration Structure | 10 GAS created from 10 CSGSolid (which ref CSGPrim, CSGNode) |
JUNO : Geant4 ~400k volumes "factorized" into 1 OptiX IAS referencing ~10 GAS
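The index-reference idea behind the model can be illustrated with a minimal sketch. The struct and field names below (`NodeSketch`, `PrimSketch`, `SolidSketch`, `InstSketch`, `FoundrySketch`) are hypothetical simplifications invented for this illustration; they do not reproduce the real CSGFoundry layouts. The point is only that flat vectors linked by integer offsets, rather than pointers, are trivial to serialize and upload, and that many instances can share one solid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins: names and fields are illustrative
// only and do not reproduce the real Opticks struct layouts.
struct NodeSketch  { float param[6]; };                 // one CSG node (cf. CSGNode)
struct PrimSketch  { int node_offset; int num_node; };  // one CSG tree (cf. CSGPrim)
struct SolidSketch { int prim_offset; int num_prim; };  // group of prims (cf. CSGSolid)
struct InstSketch  { float tr[16]; int solid_idx; };    // placement (cf. qat4)

struct FoundrySketch
{
    std::vector<NodeSketch>  node;
    std::vector<PrimSketch>  prim;
    std::vector<SolidSketch> solid;
    std::vector<InstSketch>  inst;

    // Append a solid whose prims have the given node counts; return its index.
    int addSolid(const std::vector<int>& nodes_per_prim)
    {
        SolidSketch so{ int(prim.size()), int(nodes_per_prim.size()) };
        for (int nn : nodes_per_prim)
        {
            assert(nn > 0);
            prim.push_back({ int(node.size()), nn });
            node.resize(node.size() + nn);
        }
        solid.push_back(so);
        return int(solid.size()) - 1;
    }

    // Place an existing solid with an identity transform.
    void addInstance(int solid_idx)
    {
        assert(solid_idx >= 0 && solid_idx < int(solid.size()));
        InstSketch q{};
        for (int i = 0; i < 4; i++) q.tr[i*4 + i] = 1.f;
        q.solid_idx = solid_idx;
        inst.push_back(q);
    }

    // Count the nodes reachable from one solid by following index references.
    int numNodeOfSolid(int solid_idx) const
    {
        const SolidSketch& so = solid[solid_idx];
        int tot = 0;
        for (int p = so.prim_offset; p < so.prim_offset + so.num_prim; p++)
            tot += prim[p].num_node;
        return tot;
    }
};
```

In this sketch, adding a thousand instances of a solid grows only the `inst` vector; the `prim` and `node` vectors are untouched, which is the sense in which repeated volumes are "factorized".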
Full JUNO, Opticks, OptiX 7.5/8.0
raytrace 2M pixels | |
---|---|
0.0118s (85 fps) |
0.0031s (323 fps) |
Interactive ray traced visualization via OpenGL/OptiX interop
initial viewpoint, geometry exclusions via envvars
WASDQE+mouse 3D navigation
Intersect with torus expensive on GPU
Triangulation using G4Polyhedron
G4Poly..::SetNumberOfRotationSteps
NumberOfRotationSteps | |
---|---|
HepPolyhedron Default | 24 |
Top Right | 48 |
Bottom Right | 480 |
Adjustable: precision of intersect, number of triangles
GPUs evolved for triangles => fast even with many
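The trade-off between rotation steps and precision can be estimated with a short sketch. It assumes, as a simplification not stated in the slides, that each circular cross-section is replaced by a regular polygon whose vertices lie exactly on the circle; the maximum radial deviation is then r(1 - cos(pi/N)). The function name `polygon_sagitta` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Maximum radial deviation between a circle of radius r and the regular
// polygon with num_steps vertices inscribed in it: r*(1 - cos(pi/num_steps)).
// ASSUMPTION: polygon vertices lie exactly on the circle.
inline double polygon_sagitta(double r, int num_steps)
{
    assert(r > 0. && num_steps >= 3);
    const double pi = std::acos(-1.0);
    return r * (1.0 - std::cos(pi / num_steps));
}
```

Under that assumption the deviation is roughly 0.86% of the radius at 24 steps, 0.21% at 48 and 0.002% at 480: doubling the number of steps cuts the error by about a factor of four.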
curand RNG generators
Generator | sizeof state | notes |
---|---|---|
XORWOW | 48 | curand default, expensive init |
Philox4_32_10 | 64 | cheap init, counter based |
 | 32 | slimmed state |
split init from usage by persisting state files
init within simulate kernel
Philox Advantages
Philox Disadvantages
Genstep slices : sslice.h

    struct sslice
    {
        int gs_start ;    // 1st gs idx
        int gs_stop ;     // after last gs idx
        int ph_offset ;   // tot photon before this slice
        int ph_count ;    // tot photon within this slice

        static int TotalPhoton(const std::vector<sslice>& sl );
        static int TotalPhoton(const std::vector<sslice>& sl, int i0, int i1);
        static void SetOffset(std::vector<sslice>& slice);
        ...
    };

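The following is a minimal sketch of how such slices might be built and summed. It uses a stand-in struct `sslice_sketch` with the same four fields; the bodies of `total_photon` and `set_offset` are guesses at the semantics suggested by the field comments, and the greedy `make_slices` helper and its `max_photon` parameter are hypothetical additions, not taken from sslice.h.

```cpp
#include <cassert>
#include <vector>

// Stand-in with the same four fields as sslice. The function bodies below
// are guesses at the intended semantics, not copied from sslice.h.
struct sslice_sketch
{
    int gs_start;    // 1st genstep index
    int gs_stop;     // one past the last genstep index
    int ph_offset;   // total photons before this slice
    int ph_count;    // total photons within this slice
};

// Sum ph_count over slices [i0, i1).
inline int total_photon(const std::vector<sslice_sketch>& sl, int i0, int i1)
{
    assert(0 <= i0 && i0 <= i1 && i1 <= int(sl.size()));
    int tot = 0;
    for (int i = i0; i < i1; i++) tot += sl[i].ph_count;
    return tot;
}

inline int total_photon(const std::vector<sslice_sketch>& sl)
{
    return total_photon(sl, 0, int(sl.size()));
}

// Fill ph_offset so each slice records the photon total preceding it.
inline void set_offset(std::vector<sslice_sketch>& sl)
{
    int offset = 0;
    for (auto& s : sl) { s.ph_offset = offset; offset += s.ph_count; }
}

// HYPOTHETICAL helper: greedily group consecutive gensteps so that no slice
// exceeds max_photon (a single oversized genstep still gets its own slice).
inline std::vector<sslice_sketch> make_slices(const std::vector<int>& gs_photons, int max_photon)
{
    assert(max_photon > 0);
    std::vector<sslice_sketch> sl;
    int start = 0, count = 0;
    for (int i = 0; i < int(gs_photons.size()); i++)
    {
        if (count > 0 && count + gs_photons[i] > max_photon)
        {
            sl.push_back({ start, i, 0, count });
            start = i; count = 0;
        }
        count += gs_photons[i];
    }
    if (count > 0) sl.push_back({ start, int(gs_photons.size()), 0, count });
    set_offset(sl);
    return sl;
}
```

For example, gensteps carrying 30, 30, 50, 20, 100 and 10 photons with a budget of 100 photons per launch would be grouped into four slices in this sketch, each of which could be simulated in a separate kernel launch.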
Approach centered on QSim::simulate
Philox counter based RNG + Out-of-core => Opticks un-limited
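Why a counter-based generator pairs well with out-of-core slicing can be shown with a toy example. The sketch below does NOT implement Philox: it substitutes a SplitMix64-style bit mixer as a stand-in, and all function names (`mix64`, `uniform01`, `simulate_slice`, `simulate_event`) are hypothetical. What it demonstrates is only the property that matters here: each random number is a pure function of the global photon index and a draw counter, so no per-photon state has to be stored or loaded, and the result does not depend on how photons are split into slices.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// SplitMix64-style mixer used as a stand-in for illustration. NOT Philox.
inline uint64_t mix64(uint64_t z)
{
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Counter-based draw: a pure function of (global photon index, draw counter).
inline float uniform01(uint64_t photon_idx, uint32_t counter)
{
    uint64_t bits = mix64(mix64(photon_idx) ^ counter);
    return float(bits >> 40) * (1.0f / 16777216.0f);   // top 24 bits -> [0,1)
}

// "Simulate" photons [ph_offset, ph_offset+ph_count): here the simulation is
// reduced to recording the first random draw of each photon.
inline void simulate_slice(std::vector<float>& out, int ph_offset, int ph_count)
{
    for (int i = 0; i < ph_count; i++)
        out.push_back(uniform01(uint64_t(ph_offset) + i, 0u));
}

// Run a whole event as a sequence of slices of at most max_photon photons.
inline std::vector<float> simulate_event(int num_photon, int max_photon)
{
    assert(num_photon >= 0 && max_photon > 0);
    std::vector<float> out;
    for (int off = 0; off < num_photon; off += max_photon)
    {
        int count = num_photon - off < max_photon ? num_photon - off : max_photon;
        simulate_slice(out, off, count);
    }
    return out;
}
```

In this sketch `simulate_event(1000, 1000)` and `simulate_event(1000, 256)` return identical vectors, because the photon offset of each slice restores the global photon index.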

    [NP::MakeMetaKVS_ranges2_table num_specs 8
    SEvt__Init_RUN_META      ==> CSGFoundry__Load_HEAD            655                   ## init
    CSGFoundry__Load_HEAD    ==> CSGFoundry__Load_TAIL      4,235,189                   ## load_geom
    CSGOptiX__Create_HEAD    ==> CSGOptiX__Create_TAIL        266,810                   ## upload_geom
    A000_QSim__simulate_HEAD ==> A000_QSim__simulate_LBEG         251                   ## slice_genstep
    A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST  23,137,923                   ## simulate slice
    A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN   3,975,867                   ## download slice
    A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST  23,449,227  REP  46,587,150  ## simulate slice
    A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN   3,924,104  REP   7,899,971  ## download slice
    A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST  23,736,442  REP  70,323,592  ## simulate slice
    A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN   4,108,315  REP  12,008,286  ## download slice
    A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST  23,850,920  REP  94,174,512  ## simulate slice
    A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN   4,119,275  REP  16,127,561  ## download slice
    A000_QSim__simulate_LEND ==> A000_QSim__simulate_PCAT  15,900,158                   ## concat slices
    A000_QSim__simulate_BRES ==> A000_QSim__simulate_TAIL 117,551,399                   ## save arrays
    TOTAL: 248,256,535
    ]NP::MakeMetaKVS_ranges2_table num_keys:69

Auto Config based on VRAM
Out-of-core optical simulation | |
---|---|
four kernel executions, total time | 94 s |
four hit slice downloads, total time | 16 s |
saving 216M hits (13GB .npy file) | 117 s |
loading geometry from /cvmfs | 4 s |
total time | 248 s |
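The rounded seconds in this table can be cross-checked against the time deltas in the NP::MakeMetaKVS_ranges2_table log. The sketch below assumes those deltas are in microseconds; the log does not state its unit, but the assumption reproduces the 248 s total. The names `sum_us`, `to_seconds`, `Summary` and `summarize` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// ASSUMPTION: the logged deltas are microseconds.
inline double to_seconds(int64_t us) { return double(us) / 1e6; }

inline int64_t sum_us(const std::vector<int64_t>& v)
{
    return std::accumulate(v.begin(), v.end(), int64_t(0));
}

struct Summary { double simulate_s, download_s, save_s, total_s; };

// Recompute the table rows from the per-phase deltas copied from the log.
inline Summary summarize()
{
    const std::vector<int64_t> simulate = { 23137923, 23449227, 23736442, 23850920 };
    const std::vector<int64_t> download = { 3975867, 3924104, 4108315, 4119275 };
    const int64_t init = 655, load_geom = 4235189, upload_geom = 266810;
    const int64_t slice_genstep = 251, concat = 15900158, save = 117551399;

    const int64_t total = init + load_geom + upload_geom + slice_genstep
                        + sum_us(simulate) + sum_us(download) + concat + save;
    assert(total == 248256535);   // agrees with the TOTAL line of the log

    return { to_seconds(sum_us(simulate)), to_seconds(sum_us(download)),
             to_seconds(save), to_seconds(total) };
}
```

The phases not listed separately in the table (init, geometry upload, genstep slicing and the roughly 16 s slice concatenation) are included in the total, which is why the listed rows sum to less than 248 s.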
TEST=medium_scan ~/opticks/cxs_min.sh
Generate optical-only events with 1M to 100M photons starting from the CD center; gather and save only Hits.

    OPTICKS_RUNNING_MODE=SRM_TORCH   ## "Torch" running enables num_photon scan
    OPTICKS_NUM_PHOTON=M1,10,20,30,40,50,60,70,80,90,100
    OPTICKS_NUM_EVENT=11
    OPTICKS_EVENT_MODE=Hit

Compare simulation scans on two Dell Precision Workstations:
GPU (VRAM) | Arch | GPU Release | CUDA(RT) Cores | RTX Gen | Driver | CUDA | OptiX |
---|---|---|---|---|---|---|---|
NVIDIA TITAN RTX(24G) | Turing | Dec 2018 | 4,608(72) | 1st | 515.43 | 11.7 | 7.5 |
NVIDIA RTX 5000(32G) | Ada | Aug 2023 | 12,800(100) | 3rd | 550.76 | 12.4 | 8.0 |
Photons (M) | G1: TITAN RTX (s) | G3: RTX 5000 Ada (s) | G1/G3 |
---|---|---|---|
1 | 0.47 | 0.14 | 3.28 |
10 | 0.44 | 0.13 | 3.48 |
20 | 4.39 | 1.10 | 3.99 |
30 | 8.87 | 2.26 | 3.93 |
40 | 13.29 | 3.38 | 3.93 |
50 | 18.13 | 4.49 | 4.03 |
60 | 22.64 | 5.70 | 3.97 |
70 | 27.31 | 6.78 | 4.03 |
80 | 32.24 | 7.99 | 4.03 |
90 | 37.92 | 9.33 | 4.06 |
100 | 41.93 | 10.42 | 4.03 |
Optical simulation is 4x faster going from 1st to 3rd gen RTX (3rd gen, Ada: 100M photons simulated in 10 seconds) [TMM PMT model]
BUT requires:
How to handle continuous geometry change? (keeping up with changing geometry is a longstanding problem)
Extra Benefits of Adopting Opticks
=> using Opticks improves CPU simulation too !!
Opticks : state-of-the-art GPU ray traced optical simulation integrated with Geant4, with automated geometry translation into GPU optimized form.
https://bitbucket.org/simoncblyth/opticks | day-to-day code repository |
https://simoncblyth.bitbucket.io | presentations and videos |
https://groups.io/g/opticks | forum/mailing list archive |
email: opticks+subscribe@groups.io | subscribe to mailing list |
simon.c.blyth@gmail.com | any questions |