Huge CPU Memory+Time Expense
EXPT | Reactor neutrino |
Daya Bay | neutrino oscillations |
JUNO | mass hierarchy + oscillations => NVIDIA CN Contacts |
Long baseline neutrino beam | |
DUNE | Fermilab -> Sanford, LAr TPC => Assistance from Fermilab Geant4 Group |
Neutrinoless double beta decay, dark matter, other search | |
LZ | LUX-ZEPLIN dark matter experiment, Sanford => NVIDIA US Contacts |
LEGEND | Large Enriched Germanium Experiment, Gran Sasso/SNOLAB |
SABRE | dark matter direct-detection, Australia |
AMoRE | Mo-based Rare process Experiment, S.Korea |
nEXO | next Enriched Xenon Observatory, LLNL |
NEXT-CRAB0 | High Pressure Gaseous Xenon TPC with a Direct VUV Camera Based Readout |
Neutrino telescope | |
KM3NeT | Cubic Kilometre Neutrino Telescope, Mediterranean |
IceCube | IceCube Neutrino Observatory, South Pole |
Air shower : gamma-ray and cosmic-ray observatory | |
LHAASO | Large High Altitude Air Shower Observatory, Sichuan |
Accelerator | |
LHCb-RICH | LHCb ring imaging Cherenkov sub-detector, CERN => NVIDIA EU Contacts |
Not a Photo, a Calculation
Much in common : geometry, light sources, optical physics
Many Applications of ray tracing :
Flexible Ray Tracing Pipeline
Green: User Programs, Grey: Fixed function/HW
Analogous to OpenGL rasterization pipeline
OptiX makes GPU ray tracing accessible
OptiX features
User provides (Green):
Latest Release : NVIDIA® OptiX™ 8.0.0 (Aug 2023) NEW:
https://bitbucket.org/simoncblyth/opticks
Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory
CSGFoundry Model
Geant4 Geometry Model (JUNO: 400k PV, deep hierarchy)
PV | G4VPhysicalVolume | placed, refs LV |
LV | G4LogicalVolume | unplaced, refs SO |
SO | G4VSolid,G4BooleanSolid | binary tree of SO "nodes" |
Opticks CSGFoundry Geometry Model (index references)
struct | Notes | Geant4 Equivalent |
---|---|---|
CSGFoundry | vectors of the below, easily serialized + uploaded + used on GPU | None |
qat4 | 4x4 transform refs CSGSolid using "spare" 4th column (becomes IAS) | Transforms ref from PV |
CSGSolid | refs sequence of CSGPrim | Grouped Vols + Remainder |
CSGPrim | bbox, refs sequence of CSGNode, root of CSG Tree of nodes | root G4VSolid |
CSGNode | CSG node parameters (JUNO: ~23k CSGNode) | node G4VSolid |
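The index-reference layout in the table above can be sketched in a few lines. This is a minimal illustration, not the Opticks structs: the class `Foundry`, the field names `prim_offset`, `num_prim`, `node_offset`, `num_node` and the helper methods are hypothetical names chosen for the example. The point it shows is that flat vectors plus (offset, count) references replace pointers, which is what makes the model easy to serialize and upload.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One CSG node: a primitive shape or a boolean operator (hypothetical fields)."""
    typecode: str
    params: Tuple[float, ...] = ()


@dataclass
class Prim:
    """References a contiguous run of nodes by index, not by pointer."""
    node_offset: int
    num_node: int


@dataclass
class Solid:
    """References a contiguous run of prims by index."""
    label: str
    prim_offset: int
    num_prim: int


@dataclass
class Foundry:
    """Flat vectors only, so the whole geometry is a handful of arrays."""
    solid: List[Solid] = field(default_factory=list)
    prim: List[Prim] = field(default_factory=list)
    node: List[Node] = field(default_factory=list)

    def prims_of_solid(self, solid_idx: int) -> List[Prim]:
        s = self.solid[solid_idx]
        return self.prim[s.prim_offset:s.prim_offset + s.num_prim]

    def nodes_of_prim(self, prim: Prim) -> List[Node]:
        return self.node[prim.node_offset:prim.node_offset + prim.num_node]


# Example: one solid with two prims, the second being a 3-node boolean tree.
fd = Foundry()
fd.node += [Node("sphere", (0.0, 0.0, 0.0, 100.0)),
            Node("union"), Node("box", (50.0,)), Node("cylinder", (10.0, 40.0))]
fd.prim += [Prim(node_offset=0, num_node=1), Prim(node_offset=1, num_node=3)]
fd.solid += [Solid("demo", prim_offset=0, num_prim=2)]
```

Because every reference is an integer into a flat vector, each vector can be saved or copied to the GPU as a single array without pointer fix-ups.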
NVIDIA OptiX 7/8 Geometry Acceleration Structures (JUNO: 1 IAS + 10 GAS, 2-level hierarchy)
IAS | Instance Acceleration Structure | JUNO: 1 IAS created from vector of ~50k qat4 |
GAS | Geometry Acceleration Structure | JUNO: 10 GAS created from 10 CSGSolid (which ref CSGPrim, CSGNode) |
JUNO : Geant4 ~400k volumes "factorized" into 1 OptiX IAS referencing ~10 GAS
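The idea of carrying a solid reference in the "spare" 4th column of an instance transform can be illustrated with NumPy. This is a sketch under assumptions: it uses a row-vector convention with the translation in the last row, so that the 4th column is always (0, 0, 0, 1) and can be reused, and it stores a single integer in element [0, 3]. The function names `pack_instance` and `unpack_instance` are hypothetical, and the actual qat4 layout may differ.

```python
import numpy as np


def pack_instance(transform, solid_index):
    """Return a float32 4x4 whose [0, 3] element holds solid_index as raw int32 bits."""
    q = np.array(transform, dtype=np.float32)  # contiguous copy
    assert q.shape == (4, 4)
    q.view(np.int32)[0, 3] = solid_index       # reinterpret the bits, no float conversion
    return q


def unpack_instance(q):
    """Recover (clean transform, solid_index) from a packed matrix."""
    solid_index = int(q.view(np.int32)[0, 3])
    t = q.copy()
    t[:, 3] = (0.0, 0.0, 0.0, 1.0)              # restore the standard 4th column
    return t, solid_index


# Example: an instance translated by (1, 2, 3) that refers to solid 7.
t = np.eye(4, dtype=np.float32)
t[3, :3] = (1.0, 2.0, 3.0)
q = pack_instance(t, 7)
t2, idx = unpack_instance(q)
```

A vector of such matrices is all that is needed to describe which solid is placed where, which is the information an instance-level acceleration structure is built from.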
Full JUNO, Opticks, OptiX 7.5/8.0
raytrace 3.7M pixels | time per frame |
---|---|
[render image] | 0.0118s (85 fps) |
[render image] | 0.0031s (323 fps) |
Cutaway ray traced render of JUNO CD
Mostly Analytic CSG
Guide Tube Torus Triangulated
WaterPool HBeam Overlaps FIXED with simpler approach
FastenerAcrylic translated to "list-node"
Testing triangulated GuideTube + XJ + SJ solids
Interactive ray traced visualization via OpenGL/OptiX interop
initial viewpoint, geometry exclusions via envvars
WASDQE+mouse 3D navigation
Render on NVIDIA RTX 5000 Ada Generation in 0.0060 s (not 0.0200 s)
Intersect with torus expensive on GPU
Triangulation using G4Polyhedron
G4Poly..::SetNumberOfRotationSteps
NumberOfRotationSteps | |
---|---|
HepPolyhedron Default | 24 |
Top Right | 48 |
Bottom Right | 480 |
Adjustable: precision of intersect, number of triangles
GPUs evolved for triangles => fast even with many
RTX : Uses "builtin" RT Core triangle intersect
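The trade-off between precision and triangle count can be estimated with simple geometry. The sketch below assumes that the number of rotation steps is the number of straight segments used for a full circle, and computes the largest gap between a circle of given radius and its inscribed polygon. The function name `max_chord_deviation` is hypothetical; this is not a Geant4 call.

```python
import math


def max_chord_deviation(radius, steps):
    """Largest distance between a circle and its inscribed regular polygon of `steps` sides."""
    return radius * (1.0 - math.cos(math.pi / steps))


# Relative deviation (radius = 1) for the three settings in the table above.
for steps in (24, 48, 480):
    print(f"{steps:4d} steps : deviation / radius = {max_chord_deviation(1.0, steps):.2e}")
```

The deviation falls roughly as 1/steps squared, so a tenfold increase in steps buys about a hundredfold improvement in surface precision, at the cost of proportionally more triangles.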
curand XORWOW workaround
XORWOW is default curand generator
Opticks longstanding workaround:
Workaround works, BUT:
Philox4_32_10 is alternative curand generator
Advantages:
Statistical quality of Philox randoms comparable to XORWOW[1]
[1] cuRAND generator tests https://docs.nvidia.com/cuda/curand/testing.html
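The property that makes a counter-based generator attractive here can be shown on the CPU with NumPy's `Philox` bit generator. This is an analogy only: NumPy implements the 64-bit variant (Philox4x64-10), not cuRAND's Philox4_32_10, and using the key to hold (seed, photon index) is an assumption made for this sketch. The function name `photon_randoms` is hypothetical.

```python
import numpy as np


def photon_randoms(seed, photon_index, n):
    """Random numbers for one photon, derived only from (seed, photon_index).

    No per-photon state has to be prepared or stored in advance: the stream
    is fully determined by the key, and the counter starts from zero.
    """
    key = np.array([seed, photon_index], dtype=np.uint64)
    rng = np.random.Generator(np.random.Philox(key=key))
    return rng.random(n)


# The same photon gets the same randoms regardless of the order of evaluation.
a = photon_randoms(42, 1_000_000_007, 4)
b = photon_randoms(42, 5, 4)
a_again = photon_randoms(42, 1_000_000_007, 4)
```

Since nothing needs to be precomputed per photon, the photon index can be arbitrarily large, which is what removes the limit on event size.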
Opticks Unlimited
Philox counter based RNG + Out-of-core:
=> simulate billion-photon evt, no special setup
Use sliced genstep array in: QSim::simulate
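Slicing a genstep array so that each slice fits in the available photon slots can be sketched as a simple greedy pass. This is an illustration, not the code used in QSim::simulate: the function name `slice_gensteps`, the argument `max_slot` and the representation of gensteps as a list of photon counts are all assumptions made for the example.

```python
def slice_gensteps(photon_counts, max_slot):
    """Group consecutive gensteps into slices of at most max_slot photons.

    Returns a list of (start, stop) genstep index ranges, stop exclusive.
    """
    slices = []
    start = 0
    total = 0
    for i, n in enumerate(photon_counts):
        if n > max_slot:
            raise ValueError(f"genstep {i} has {n} photons, more than max_slot {max_slot}")
        if total + n > max_slot:
            slices.append((start, i))  # close the current slice before genstep i
            start = i
            total = 0
        total += n
    if start < len(photon_counts):
        slices.append((start, len(photon_counts)))
    return slices


# Example: four gensteps of 30 photons each with room for 60 photons per launch.
ranges = slice_gensteps([30, 30, 30, 30], max_slot=60)
```

Each range would then correspond to one kernel launch followed by one hit download, with the per-slice hits concatenated at the end.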
[NP::MakeMetaKVS_ranges2_table num_specs 8
SEvt__Init_RUN_META ==> CSGFoundry__Load_HEAD 655 ## init
CSGFoundry__Load_HEAD ==> CSGFoundry__Load_TAIL 4,235,189 ## load_geom
CSGOptiX__Create_HEAD ==> CSGOptiX__Create_TAIL 266,810 ## upload_geom
A000_QSim__simulate_HEAD ==> A000_QSim__simulate_LBEG 251 ## slice_genstep
A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST 23,137,923 ## simulate slice
A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN 3,975,867 ## download slice
A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST 23,449,227 REP 46,587,150 ## simulate slice
A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN 3,924,104 REP 7,899,971 ## download slice
A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST 23,736,442 REP 70,323,592 ## simulate slice
A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN 4,108,315 REP 12,008,286 ## download slice
A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST 23,850,920 REP 94,174,512 ## simulate slice
A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN 4,119,275 REP 16,127,561 ## download slice
A000_QSim__simulate_LEND ==> A000_QSim__simulate_PCAT 15,900,158 ## concat slices
A000_QSim__simulate_BRES ==> A000_QSim__simulate_TAIL 117,551,399 ## save arrays
TOTAL: 248,256,535
]NP::MakeMetaKVS_ranges2_table num_keys:69
Auto Config based on VRAM
[1] SEventConfig::HeuristicMaxSlot_Rounded
Out-of-core optical simulation | |
---|---|
four kernel executions, total time | 94 s |
four hit slice downloads, total time | 16 s |
saving 216M hits (13GB .npy file) | 117 s |
loading geometry from /cvmfs | 4 s |
total time | 248 s |
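The rows of the summary table can be cross-checked against the per-stage values in the timing log. The sketch below copies the logged values and assumes they are in microseconds, which is consistent with the seconds quoted in the table. The variable names are chosen for this example only.

```python
# Per-slice timings copied from the log (assumed to be microseconds).
simulate_us = [23_137_923, 23_449_227, 23_736_442, 23_850_920]
download_us = [3_975_867, 3_924_104, 4_108_315, 4_119_275]

# Remaining stages from the same log.
other_us = {
    "init": 655,
    "load_geom": 4_235_189,
    "upload_geom": 266_810,
    "slice_genstep": 251,
    "concat slices": 15_900_158,
    "save arrays": 117_551_399,
}

kernel_total_us = sum(simulate_us)
download_total_us = sum(download_us)
total_us = kernel_total_us + download_total_us + sum(other_us.values())

print(f"kernel executions : {kernel_total_us / 1e6:6.1f} s")
print(f"hit downloads     : {download_total_us / 1e6:6.1f} s")
print(f"save arrays       : {other_us['save arrays'] / 1e6:6.1f} s")
print(f"total             : {total_us / 1e6:6.1f} s")
```

The sum reproduces the logged TOTAL exactly, and it also shows that concatenating the slices (about 16 s) is a cost not listed separately in the summary table.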
TEST=medium_scan ~/opticks/cxs_min.sh
Generate optical only events with 1M->100M photons starting from CD center, gather and save only Hits.
OPTICKS_RUNNING_MODE=SRM_TORCH ## "Torch" running enables num_photon scan
OPTICKS_NUM_PHOTON=M1,10,20,30,40,50,60,70,80,90,100
OPTICKS_NUM_EVENT=11
OPTICKS_EVENT_MODE=Hit
Compare simulation scans on two Dell Precision Workstations:
GPU (VRAM) | Arch | GPU Release | CUDA(RT) Cores | RTX Gen | Driver | CUDA | OptiX |
---|---|---|---|---|---|---|---|
NVIDIA TITAN RTX(24G) | Turing | Dec 2018 | 4,608(72) | 1st | 515.43 | 11.7 | 7.5 |
NVIDIA RTX 5000(32G) | Ada | Aug 2023 | 12,800(100) | 3rd | 550.76 | 12.4 | 8.0 |
Photons (M) | G1: TITAN RTX, 1st gen RTX, time (s) | G3: RTX 5000 Ada, 3rd gen RTX, time (s) | G1/G3 |
---|---|---|---|
1 | 0.47 | 0.14 | 3.28 |
10 | 0.44 | 0.13 | 3.48 |
20 | 4.39 | 1.10 | 3.99 |
30 | 8.87 | 2.26 | 3.93 |
40 | 13.29 | 3.38 | 3.93 |
50 | 18.13 | 4.49 | 4.03 |
60 | 22.64 | 5.70 | 3.97 |
70 | 27.31 | 6.78 | 4.03 |
80 | 32.24 | 7.99 | 4.03 |
90 | 37.92 | 9.33 | 4.06 |
100 | 41.93 | 10.42 | 4.03 |
Optical simulation is ~4x faster going from 1st to 3rd gen RTX (3rd gen, Ada: 100M photons simulated in ~10 seconds) [TMM PMT model]
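The speedup and throughput implied by the scan table can be recomputed from the tabulated times. The sketch below uses the rounded values from the table, so the ratios it prints can differ in the second decimal place from the G1/G3 column, which was presumably computed from unrounded times. The names `scan` and `summarize` are chosen for this example.

```python
# (photons in millions, 1st gen time in s, 3rd gen time in s), copied from the table.
scan = [
    (1, 0.47, 0.14), (10, 0.44, 0.13), (20, 4.39, 1.10), (30, 8.87, 2.26),
    (40, 13.29, 3.38), (50, 18.13, 4.49), (60, 22.64, 5.70), (70, 27.31, 6.78),
    (80, 32.24, 7.99), (90, 37.92, 9.33), (100, 41.93, 10.42),
]


def summarize(rows):
    """Return (photons_M, speedup, 3rd gen throughput in M photons/s) per row."""
    return [(ph, g1 / g3, ph / g3) for ph, g1, g3 in rows]


for ph, speedup, mps in summarize(scan):
    print(f"{ph:4d}M photons : speedup {speedup:4.2f}, {mps:6.2f} M photons/s on 3rd gen")
```

For the 100M photon event this gives a throughput of about 9.6 million photons per second on the 3rd gen card.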
Amdahl's "Law" : Expected Speedup
Overall speed limited by serial portion
optical photon simulation, P ~ 99% of CPU time
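The limit set by the serial portion can be made concrete with the standard form of Amdahl's law, speedup = 1 / ((1 - P) + P / N), where P is the parallelizable fraction and N is the speedup of that portion. The sketch below evaluates it for P = 0.99 as quoted above; the values of N are illustrative only.

```python
def amdahl_speedup(p, n):
    """Overall speedup when a fraction p of the run time is accelerated by a factor n."""
    return 1.0 / ((1.0 - p) + p / n)


# With 99% of the time in optical photons the overall speedup saturates near 100x.
for n in (10, 100, 1000, 1e9):
    print(f"optical speedup {n:>13,.0f}x -> overall {amdahl_speedup(0.99, n):6.1f}x")
```

However fast the optical photon portion becomes, the remaining 1% of serial work caps the overall gain at 100x.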
Traditional simulation use:
Extra Benefits of Adopting Opticks
=> using Opticks improves CPU simulation too !!
Opticks : state-of-the-art GPU ray traced optical simulation integrated with Geant4, with automated geometry translation into GPU optimized form.
https://bitbucket.org/simoncblyth/opticks | day-to-day code repository |
https://simoncblyth.bitbucket.io | presentations and videos |
https://groups.io/g/opticks | forum/mailing list archive |
email: opticks+subscribe@groups.io | subscribe to mailing list |
simon.c.blyth@gmail.com | any questions |
New active bug reporting (+leak finding/fixing) Opticks user : Ilker Parmaksiz
had contacts with ~5 LZ people
GPU vendor | compute framework | ray trace framework | hardware RT | notes |
---|---|---|---|---|
NVIDIA | CUDA(2007-) | OptiX(2009-) | RTX/RT Core (2018-) | |
Apple | Metal/MPS(2014-) | Metal/MPS(2020-) | From M3 (2023-) | |
AMD | ROCm(2016-) | RadeonRays, HIP-RT(2022-) | From Radeon RX 6000 (2020-) | |
Intel | oneAPI(?2020-) | Embree? | From Arc Alchemist (2022-) | uses SYCL |
Huawei | ? | mobile only | mobile only | |
Cross-vendor | Vulkan compute shaders | Vulkan ray trace extension | NVIDIA/AMD/Intel/? | Depends on vendor drivers |
OpenCL | dead? | |||
OpenMP | new support for GPU offloading |
Other GPU vendors such as Samsung and Qualcomm mostly focussed on mobile.