Opticks : GPU ray trace accelerated optical photon simulation

Open source, https://bitbucket.org/simoncblyth/opticks

Simon C Blyth, IHEP, CAS — Kaiping — 15 January 2025


Outline

newtons-opticks.png
 


(JUNO) Optical Photon Simulation Problem...


ALL0_Debug_Philox_GUN4_mu214gev.png


Optical Photon Simulation ≈ Ray Traced Image Rendering

simulation
photon parameters at sensors (PMTs)
rendering
pixel values at image plane

Much in common : geometry, light sources, optical physics

Many Applications of ray tracing :


NVIDIA RTX Generations

ray trace performance : ~2x every ~2 years


NVIDIA® OptiX™ Ray Tracing Engine -- Accessible GPU Ray Tracing

OptiX makes GPU ray tracing accessible

OptiX features

User provides (Green):

Latest Release : NVIDIA® OptiX™ 8.0.0 (Aug 2023) NEW:


Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow

https://bitbucket.org/simoncblyth/opticks

Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory


Geometry Model Translation : Geant4 => CSGFoundry => NVIDIA OptiX 7/8

Geant4 Geometry Model (JUNO: 400k PV, deep hierarchy)

PV   G4VPhysicalVolume           placed, refs LV
LV   G4LogicalVolume             unplaced, refs SO
SO   G4VSolid, G4BooleanSolid    binary tree of SO "nodes"

Opticks CSGFoundry Geometry Model (index references)

struct       Notes                                                                  Geant4 Equivalent
CSGFoundry   vectors of the below, easily serialized + uploaded + used on GPU       None
qat4         4x4 transform refs CSGSolid using "spare" 4th column (becomes IAS)     Transforms ref from PV
CSGSolid     refs sequence of CSGPrim                                               Grouped Vols + Remainder
CSGPrim      bbox, refs sequence of CSGNode, root of CSG Tree of nodes              root G4VSolid
CSGNode      CSG node parameters (JUNO: ~23k CSGNode)                               node G4VSolid

NVIDIA OptiX 7/8 Geometry Acceleration Structures (JUNO: 1 IAS + 10 GAS, 2-level hierarchy)

IAS   Instance Acceleration Structures   JUNO: 1 IAS created from vector of ~50k qat4
GAS   Geometry Acceleration Structures   JUNO: 10 GAS created from 10 CSGSolid (which reference CSGPrim, CSGNode)

JUNO : Geant4 ~400k volumes "factorized" into 1 OptiX IAS referencing ~10 GAS
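
The index-reference idea behind the tables above can be sketched in a few lines of Python. This is only an illustration of flat vectors that refer to each other by offset and count. The class and field names (Foundry, node_offset, num_node, solid_idx and so on) and the shape parameters are hypothetical, not the actual Opticks struct layouts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Stands in for CSGNode: one node of a CSG tree (hypothetical fields)."""
    typecode: str
    params: tuple


@dataclass
class Prim:
    """Stands in for CSGPrim: refers to a contiguous run of nodes."""
    node_offset: int
    num_node: int


@dataclass
class Solid:
    """Stands in for CSGSolid: refers to a contiguous run of prims."""
    label: str
    prim_offset: int
    num_prim: int


@dataclass
class Instance:
    """Stands in for qat4: a placement plus the index of the solid it places."""
    translation: tuple
    solid_idx: int


@dataclass
class Foundry:
    """Flat vectors only, so the whole model is trivially serializable."""
    node: List[Node] = field(default_factory=list)
    prim: List[Prim] = field(default_factory=list)
    solid: List[Solid] = field(default_factory=list)
    inst: List[Instance] = field(default_factory=list)

    def add_solid(self, label, prims):
        """prims is a list of node lists, one node list per prim."""
        prim_offset = len(self.prim)
        for nodes in prims:
            self.prim.append(Prim(len(self.node), len(nodes)))
            self.node.extend(nodes)
        self.solid.append(Solid(label, prim_offset, len(prims)))
        return len(self.solid) - 1

    def nodes_of(self, solid_idx):
        """Follow solid -> prim -> node references using offsets and counts."""
        so = self.solid[solid_idx]
        out = []
        for pr in self.prim[so.prim_offset:so.prim_offset + so.num_prim]:
            out.extend(self.node[pr.node_offset:pr.node_offset + pr.num_node])
        return out


fd = Foundry()
# one repeated shape: a union of two primitives (made-up dimensions)
pmt = fd.add_solid("pmt", [[Node("union", ()),
                            Node("sphere", (250.0,)),
                            Node("cylinder", (50.0, 100.0))]])
# many placements share the single solid, which is the "factorization"
for i in range(1000):
    fd.inst.append(Instance((float(i), 0.0, 0.0), pmt))

print(len(fd.inst), "instances ->", len(fd.solid), "solid,", len(fd.node), "nodes")
```

In this picture the instance vector plays the role of the IAS input and each solid the role of one GAS, so adding placements grows only the instance vector.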


Ada_cxr_overview_emm_t0_elv_t_moi__ALL.jpg


Analytic + triangulated geometry

NEW FEATURE
Integration of analytic + triangulated geometry

cxr_min__eye_1,0,0__zoom_1__tmin_0.5__sSurftube_0V1_0:0:-1.jpg

Interactive ray traced visualization via OpenGL/OptiX interop

initial viewpoint, geometry exclusions via envvars

WASDQE+mouse 3D navigation


GuideTube : Torus Triangulated

GuideTube (39*2*2 = 156 G4Torus)
split in phi segments, radius breaks

Intersect with torus expensive on GPU

Triangulation using G4Polyhedron

G4Poly..::SetNumberOfRotationSteps

                        NumberOfRotationSteps
HepPolyhedron Default   24
Top Right               48
Bottom Right            480

Adjustable: precision of intersect, number of triangles

GPUs evolved for triangles => fast even with many
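
The trade-off between intersect precision and triangle count can be estimated with plain geometry. The sketch below assumes the rotation steps divide a full revolution evenly and that the polygon vertices lie on the true circle, so the largest gap is at the middle of each chord. It is not G4Polyhedron code, and the function name is made up for the illustration.

```python
import math


def max_relative_deviation(steps):
    """Largest radial gap between a unit circle and an inscribed regular
    polygon with `steps` sides, which is 1 - cos(pi / steps)."""
    return 1.0 - math.cos(math.pi / steps)


# the three settings listed in the table above
for steps in (24, 48, 480):
    print(f"{steps:4d} steps : deviation / radius = {max_relative_deviation(steps):.2e}")
```

Under these assumptions the deviation falls roughly as 1/steps^2, so doubling the step count cuts the error by about a factor of four while only doubling the number of facets around the circle.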


Optimized curand random number generation with Philox4_32_10

Philox Advantages

Philox Disadvantages


Out-of-core optical photon simulation : multi-launch

Out-of-core
simulate more photons than fit VRAM

Approach centered on QSim::simulate

  1. configure max slots, default based on VRAM
  2. collect scintillation + cerenkov gensteps from Geant4
  3. form vector of genstep slices
    • each slice photon count less than max slots
  4. loop over slices:
    • upload genstep array slice
    • kernel launch simulate
    • gather results into NPFold
  5. concatenate results (NPFold::concat)
curand "slot" offset by ph_offset
=> perfect match with any slicing

Philox counter-based RNG + out-of-core => Opticks photon counts not limited by VRAM
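
The slicing and offset steps above can be sketched as follows. This is a toy model, not QSim::simulate. The function names are hypothetical, "fits" is taken to mean at most max_slots photons per slice, and a SHA-256 hash of (seed, photon index, draw) stands in for a counter-based generator, so it is explicitly not Philox.

```python
import hashlib
import struct


def make_slices(genstep_photons, max_slots):
    """Group consecutive gensteps into slices whose photon totals fit max_slots.
    Returns (gs_start, gs_stop, ph_offset, ph_count) tuples."""
    slices = []
    start, count, offset = 0, 0, 0
    for i, n in enumerate(genstep_photons):
        if n > max_slots:
            raise ValueError("a single genstep exceeds max_slots")
        if count + n > max_slots:
            slices.append((start, i, offset, count))
            offset += count
            start, count = i, 0
        count += n
    if count > 0:
        slices.append((start, len(genstep_photons), offset, count))
    return slices


def toy_uniform(seed, photon_index, draw):
    """Stand-in counter-based RNG: the value depends only on its arguments,
    never on how many numbers were generated before it."""
    digest = hashlib.sha256(struct.pack("<QQQ", seed, photon_index, draw)).digest()
    return int.from_bytes(digest[:8], "little") / 2.0 ** 64


def simulate(genstep_photons, max_slots, seed=0):
    """One 'launch' per slice; local slot maps to global index ph_offset + slot."""
    results = []
    for gs_start, gs_stop, ph_offset, ph_count in make_slices(genstep_photons, max_slots):
        for slot in range(ph_count):
            results.append(toy_uniform(seed, ph_offset + slot, 0))
    return results


gensteps = [100] * 40                      # 40 equal gensteps, 4000 photons
one_launch = simulate(gensteps, max_slots=4000)
four_launches = simulate(gensteps, max_slots=1000)
print(len(make_slices(gensteps, 1000)), "slices, identical:", one_launch == four_launches)
```

Because each photon's random numbers are addressed by its global index, the single-launch and four-launch results agree element by element, which is the property the ph_offset bullet describes.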


Simulating One Billion Photons in under 100 sec

cxs_min.sh
Pure optical simulation of 40 torch gensteps from the CD center, totalling 1 billion photons, on a Dell Precision Workstation with NVIDIA RTX 5000 Ada (3rd gen RTX). The sreport table below shows timestamp deltas in microseconds.
 [NP::MakeMetaKVS_ranges2_table num_specs 8
      SEvt__Init_RUN_META ==>    CSGFoundry__Load_HEAD          655                    ## init
    CSGFoundry__Load_HEAD ==>    CSGFoundry__Load_TAIL    4,235,189                    ## load_geom
    CSGOptiX__Create_HEAD ==>    CSGOptiX__Create_TAIL      266,810                    ## upload_geom
 A000_QSim__simulate_HEAD ==> A000_QSim__simulate_LBEG          251                    ## slice_genstep
 A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST   23,137,923                    ## simulate slice
 A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN    3,975,867                    ## download slice
 A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST   23,449,227 REP  46,587,150    ## simulate slice
 A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN    3,924,104 REP   7,899,971    ## download slice
 A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST   23,736,442 REP  70,323,592    ## simulate slice
 A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN    4,108,315 REP  12,008,286    ## download slice
 A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST   23,850,920 REP  94,174,512    ## simulate slice
 A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN    4,119,275 REP  16,127,561    ## download slice
 A000_QSim__simulate_LEND ==> A000_QSim__simulate_PCAT   15,900,158                    ## concat slices
 A000_QSim__simulate_BRES ==> A000_QSim__simulate_TAIL  117,551,399                    ## save arrays
                                                TOTAL:  248,256,535
 ]NP::MakeMetaKVS_ranges2_table num_keys:69
Out-of-core optical simulation
four kernel executions, total time 94 s
four hit slice downloads, total time 16 s
saving 216M hits (13GB .npy file) 117 s
loading geometry from /cvmfs 4 s
total time 248 s
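
The summary figures follow directly from the microsecond deltas in the table. A short check, using only numbers copied from the sreport output above (the dictionary keys are just labels chosen for this sketch):

```python
# microsecond deltas copied from the sreport table
simulate_us = [23_137_923, 23_449_227, 23_736_442, 23_850_920]
download_us = [3_975_867, 3_924_104, 4_108_315, 4_119_275]
other_us = {
    "init": 655,
    "load_geom": 4_235_189,
    "upload_geom": 266_810,
    "slice_genstep": 251,
    "concat_slices": 15_900_158,
    "save_arrays": 117_551_399,
}

total_us = sum(simulate_us) + sum(download_us) + sum(other_us.values())
photons = 1_000_000_000
sim_s = sum(simulate_us) / 1e6

print(f"simulate   {sim_s:.1f} s")
print(f"download   {sum(download_us) / 1e6:.1f} s")
print(f"total      {total_us / 1e6:.1f} s")
print(f"rate       {photons / sim_s / 1e6:.1f} M photons/s during simulate")
print(f"save share {other_us['save_arrays'] / total_us:.0%} of total")
```

The sums reproduce the REP running totals and the TOTAL line, and they show that saving the hit array takes longer than the four kernel launches combined.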

Pure Optical TorchGenstep scan : 1M to 100M photons

TEST=medium_scan ~/opticks/cxs_min.sh

Generate optical only events with 1M->100M photons starting from CD center, gather and save only Hits.

OPTICKS_RUNNING_MODE=SRM_TORCH  ## "Torch" running enables num_photon scan
OPTICKS_NUM_PHOTON=M1,10,20,30,40,50,60,70,80,90,100
OPTICKS_NUM_EVENT=11
OPTICKS_EVENT_MODE=Hit
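
One reading of the OPTICKS_NUM_PHOTON value is that the leading "M" scales every entry by one million, which matches the 1M to 100M description and the event count of 11. The parser below is a hypothetical sketch of that reading, not the Opticks implementation.

```python
def parse_num_photon(spec):
    """Hypothetical parser: a leading 'M' multiplies every comma-separated
    entry by one million (an assumption, not confirmed Opticks behaviour)."""
    scale = 1
    if spec[:1] == "M":
        scale, spec = 1_000_000, spec[1:]
    return [int(tok) * scale for tok in spec.split(",")]


counts = parse_num_photon("M1,10,20,30,40,50,60,70,80,90,100")
print(len(counts), "events, from", counts[0], "to", counts[-1], "photons")
```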

Compare simulation scans on two Dell Precision Workstations:

GPU (VRAM)               Arch     GPU Release   CUDA(RT) Cores   RTX Gen   Driver   CUDA   OptiX
NVIDIA TITAN RTX(24G)    Turing   Dec 2018      4,608(72)        1st       515.43   11.7   7.5
NVIDIA RTX 5000(32G)     Ada      Aug 2023      12,800(100)      3rd       550.76   12.4   8.0

ALL1_scatter_10M_photon_22pc_hit_alt.png

4.5M hits from a 20M photon TorchGenstep:
4.4 s on NVIDIA TITAN RTX (1st gen RTX), 1.1 s on NVIDIA RTX 5000 Ada (3rd gen RTX)

AB_Substamp_ALL_Etime_vs_Photon_rtx_gen1_gen3.png

Event Time(s) vs PH(M)
PH(M)      G1      G3   G1/G3
    1    0.47    0.14    3.28
   10    0.44    0.13    3.48
   20    4.39    1.10    3.99
   30    8.87    2.26    3.93
   40   13.29    3.38    3.93
   50   18.13    4.49    4.03
   60   22.64    5.70    3.97
   70   27.31    6.78    4.03
   80   32.24    7.99    4.03
   90   37.92    9.33    4.06
  100   41.93   10.42    4.03

Optical simulation ~4x faster going from 1st to 3rd gen RTX (3rd gen, Ada: 100M photons simulated in ~10 seconds) [TMM PMT model]
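
The speedup and throughput can be recomputed from the rounded event times in the table. Ratios derived this way may differ in the last digits from the listed G1/G3 column, presumably because that column was computed from unrounded times.

```python
# (photons in millions, G1 seconds, G3 seconds) copied from the table
rows = [
    (1, 0.47, 0.14), (10, 0.44, 0.13), (20, 4.39, 1.10), (30, 8.87, 2.26),
    (40, 13.29, 3.38), (50, 18.13, 4.49), (60, 22.64, 5.70), (70, 27.31, 6.78),
    (80, 32.24, 7.99), (90, 37.92, 9.33), (100, 41.93, 10.42),
]

for ph, g1, g3 in rows:
    print(f"{ph:4d}M  G1/G3 = {g1 / g3:.2f}   G3 rate = {ph / g3:.1f} M photons/s")

# average speedup over the larger events
large = [g1 / g3 for ph, g1, g3 in rows if ph >= 20]
mean_large = sum(large) / len(large)
print(f"mean G1/G3 for >= 20M photons: {mean_large:.2f}")
```

For events of 20M photons and above the recomputed ratio stays close to 4, which is the basis of the "4x" statement.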

        
        

JUNOSW+Opticks

BUT requires:

How to handle continuous geometry change? (Keeping up with changing geometry is a longstanding problem.)


Acknowledgements