http://simoncblyth.bitbucket.io/env/presentation/gpu_optical_photon_simulation.html Dropbox video shares banned
Objective
Make a GPU-accelerated optical photon propagation routine
Introducing Chroma
Fivefold path:
Developed by Stan Seibert, University of Pennsylvania.
Chroma tracks photons through a triangle-mesh detector geometry, simulating processes like diffuse and specular reflection, refraction, Rayleigh scattering and absorption. Using triangle meshes eliminates shape-specific geometry code, as there is just one code path.
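To illustrate why a triangle-only geometry needs just one intersection code path, here is a minimal pure-Python sketch of a ray-triangle test using the Möller-Trumbore method. This is an assumed, illustrative choice of algorithm; the source does not say which intersection routine Chroma's CUDA kernels use, and the function names below are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the distance t along the ray to the triangle, or None on a miss.

    Illustrative Moller-Trumbore test, not Chroma's actual kernel code.
    """
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv         # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # only hits in front of the origin count

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri)
miss = ray_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *tri)
```

Every solid, whatever its original Geant4 shape, reduces to repeated calls of one such routine once it has been triangulated.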
200x performance claim:
With a CUDA GPU Chroma has propagated 2.5M photons per second in a detector with 29k photomultiplier tubes. This is 200x faster than GEANT4.
BUT: Chroma needs: triangles + inside/outside materials
http://on-demand.gputechconf.com/gtc/2013/presentations/S3304-Particle-Physics-With-PyCUDA.pdf
Export any Geant4 Geometry into COLLADA/DAE standard 3D files, including:
G4DAE capable NuWa: ./dybinst -X geant4_with_dae trunk all
Greenfield dybinstallation required
NuWa LCG_builder grabs g4dae from bitbucket see NuWa-trunk/lcgcmt/LCG_Builders/geant4/scripts/geant4_config.sh
Qty | DayaBay | Lingao | Far | Juno x0.5 |
---|---|---|---|---|
Volumes | 12,229 | 12,229 | 18,903 | 25,000 |
Triangles | 2,448,064 | 2,448,064 | 4,189,680 | 21,886,158 |
Vertices | 1,245,996 | 1,245,996 | 2,128,208 | 10,993,079 |
DAE/GDML/WRL (MB) | 6.9/4.0/98 | 6.9/4.0/96 | 8.6/6.0/167 | 6.1M/-/- |
VGDX_20140414 counts using g4daeview.py -g 0: --with-chroma, Juno geometry truncated
[1] | Maximum DAE WRL offset < 0.13 mm, after patching VRML2 export precision bug. Details: http://simoncblyth.bitbucket.io/env/notes/geant4/geometry/collada/dae_cf_wrl/ |
Surface representation simplicity (vertices+triangles) and COLLADA standard file format means many tools/libraries available, allowing high level development. Basis libraries used for developments:
daenode.py : G4DAE => Chroma geometry
- https://github.com/pycollada/pycollada numpy based DAE parsing
- http://www.numpy.org
daeserver.py : 3D WebGL interface to G4DAE geometry
- https://github.com/mrdoob/three.js/ Javascript 3D
g4daeview.py : Fast OpenGL 3D viewer/navigator
- http://pyopengl.sourceforge.net OpenGL from python
- https://code.google.com/p/glumpy/ pyopengl+numpy integration
https://bitbucket.org/simoncblyth/env/src/tip/geant4/geometry/collada/daenode.py http://belle7.nuu.edu.tw/dae/tree/3148.html
[2] | MacBook Pro (2013), NVIDIA GeForce GT 750M 2048 MB ; Workstation GPUs such as NVIDIA Kepler K20 expected at least ~3x faster |
Raycasting exercises slowest part of optical photon propagation: geometry intersection.
[3] | Bounding Volume Hierarchy, a tree of bounding boxes with triangles inside leaf nodes |
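The tree-of-boxes idea in note [3] can be sketched as below. This is a hedged illustration only: the nested-dict node layout and the function names are hypothetical, and a GPU implementation would typically store the tree in flat arrays rather than Python objects. The sketch shows how a ray that misses a bounding box skips every triangle beneath it.

```python
def ray_hits_box(origin, direction, lo, hi):
    """Slab test: does the ray (t >= 0) pass through the axis-aligned box?"""
    tmin, tmax = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray runs parallel to this pair of planes: must start between them.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        tmin = max(tmin, t0)
        tmax = min(tmax, t1)
        if tmin > tmax:
            return False
    return True

def candidate_triangles(node, origin, direction):
    """Collect triangle indices from leaves whose boxes the ray enters.

    node is a hypothetical dict with 'lo', 'hi' and either
    'children' (inner node) or 'triangles' (leaf node).
    """
    if not ray_hits_box(origin, direction, node["lo"], node["hi"]):
        return []
    if "triangles" in node:
        return list(node["triangles"])
    found = []
    for child in node["children"]:
        found.extend(candidate_triangles(child, origin, direction))
    return found

tree = {
    "lo": (0.0, 0.0, 0.0), "hi": (4.0, 1.0, 1.0),
    "children": [
        {"lo": (0.0, 0.0, 0.0), "hi": (1.0, 1.0, 1.0), "triangles": [0, 1]},
        {"lo": (3.0, 0.0, 0.0), "hi": (4.0, 1.0, 1.0), "triangles": [2, 3]},
    ],
}
candidates = candidate_triangles(tree, (3.5, 0.5, 5.0), (0.0, 0.0, -1.0))
```

Only the triangles returned here would then need an exact ray-triangle test.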
Split work into multiple CUDA kernel launches, arranged in 2D pattern [4]
Reduced CPU load by reducing transfers
These improvements were necessary as Chroma Camera [6] with Dayabay geometry had
[4] | https://bitbucket.org/simoncblyth/env/src/tip/cuda/cuda_launch.py |
[5] | Using OpenGL pixel buffer objects (PBO) and CUDA-OpenGL interoperability techniques, https://bitbucket.org/simoncblyth/env/src/tip/pycuda/pycuda_pyopengl_interop/pixel_buffer.py https://bitbucket.org/simoncblyth/env/src/tip/chroma/chroma_camera/pbo_renderer.py |
[6] | Presumably Chroma was developed using fast Linux desktop GPUs where the kernel time limit can be avoided. |
[7] | burning CPU errors in system log |
Render Split into 3x3 CUDA kernel launches, 1 thread per pixel, ~1.8s for 1.23M pixels, 2.4M tris.
g4daeview.py --target=.. --eye="-0.3,-1.1,1.7" --look="0.1,0.4,-0.6" --up="1.2,2.0,1.6" --size=1440,852 --near=30.00000 --far=10000.0 --yfov=50.0 --with-chroma
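As a rough sketch of how a render of this size might be divided into a 3x3 grid of launches, the helper below computes one pixel rectangle per launch. It is an assumption for illustration, not the logic of the cuda_launch.py script cited in note [4]; the function name and tile layout are hypothetical. Presumably the aim of such a split is to keep each launch short enough to stay within the kernel time limit mentioned in note [6].

```python
def launch_tiles(width, height, nx, ny):
    """Return (x0, y0, w, h) rectangles for an nx-by-ny grid of launches.

    Integer arithmetic guarantees the tiles cover every pixel exactly once,
    even when width or height is not divisible by nx or ny.
    """
    tiles = []
    for j in range(ny):
        y0 = j * height // ny
        y1 = (j + 1) * height // ny
        for i in range(nx):
            x0 = i * width // nx
            x1 = (i + 1) * width // nx
            tiles.append((x0, y0, x1 - x0, y1 - y0))
    return tiles

# Window size taken from the --size=1440,852 option above.
tiles = launch_tiles(1440, 852, 3, 3)
pixels = sum(w * h for _, _, w, h in tiles)
```

With one thread per pixel, each of the nine launches here handles 480 x 284 pixels, and together they cover the 1,226,880 pixels (about 1.23M) quoted for the render.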
g4daeview.py --with-chroma --metric tri --flags 1,0 # tri/time/intersect
All triangle intersections made, even when no visible contribution to render. (Potential for optimisation?)
G4DAE visualization [8] and "backbone" application for Chroma testing.
[8] | http://simoncblyth.bitbucket.io/env/notes/geant4/geometry/collada/g4daeview/g4daeview_usage/ |
[9] | (1, 2) Implemented with single OpenGL Vertex Buffer Object (VBO) for entire geometry |
[10] | https://bitbucket.org/simoncblyth/env/src/tip/geant4/geometry/collada/g4daeview/ |
Chroma GPU photon propagation at 12 nanoseconds. The photons are generated by Geant4 simulation of a 100 GeV muon travelling from right to left. Photon colors indicate reemission (green), absorption (red), specular reflection (magenta), scattering (blue), no history (white).
Chroma GPU photon propagation at 14 nanoseconds. The interface provides interactive control of the propagation time allowing any stage of the propagation to be viewed by scrubbing time backwards/forwards. The speed of this visualization is achieved by interoperation of CUDA kernels and OpenGL shaders accessing the same GPU resident photon propagation data.
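One way to picture time scrubbing is as a lookup of each photon's position at the chosen time from its recorded propagation steps. The sketch below assumes, purely for illustration, that step times and positions are stored per photon and that linear interpolation between consecutive steps is used; the source does not describe how the shaders actually compute this, and the function name is hypothetical.

```python
from bisect import bisect_right

def position_at(times, positions, t):
    """Photon position at time t, interpolated between recorded steps.

    times must be strictly increasing; positions[k] is the point reached
    at times[k]. Outside the recorded range the end points are returned.
    """
    if t <= times[0]:
        return positions[0]
    if t >= times[-1]:
        return positions[-1]
    k = bisect_right(times, t) - 1          # step interval containing t
    f = (t - times[k]) / (times[k + 1] - times[k])
    a, b = positions[k], positions[k + 1]
    return tuple(x + f * (y - x) for x, y in zip(a, b))

# Hypothetical photon: straight flight, then a direction change at 10 ns.
step_times = [0.0, 10.0, 20.0]
step_positions = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (100.0, 50.0, 0.0)]
p12 = position_at(step_times, step_positions, 12.0)
```

Because each photon is evaluated independently, this kind of lookup maps naturally onto one GPU thread or vertex per photon.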
Propagation steps OR photons can be selected by materials, propagation history, or special selection by photon identifier. Photons can be selected by clicking their 3D representations allowing inspection of the propagation history of individual photons.
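Selection by propagation history can be sketched with bit flags, where each process sets one bit in a per-photon history word. The flag names and bit values below are hypothetical placeholders chosen for this example, not Chroma's actual constants.

```python
# Hypothetical history bits, for illustration only.
REEMISSION = 1 << 0
ABSORPTION = 1 << 1
SPECULAR_REFLECTION = 1 << 2
SCATTERING = 1 << 3

def select_photons(history, mask):
    """Return indices of photons whose history has any bit of mask set."""
    return [i for i, flags in enumerate(history) if flags & mask]

# One history word per photon; 0 means "no history".
history = [
    0,
    ABSORPTION,
    SPECULAR_REFLECTION | SCATTERING,
    SCATTERING,
]
reflected = select_photons(history, SPECULAR_REFLECTION)
scattered_or_absorbed = select_photons(history, SCATTERING | ABSORPTION)
```

The same pattern extends to selection by material pair or photon identifier by swapping the predicate inside the list comprehension.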
Photon propagation steps with material pair GdDopedLS,Acrylic. The larger squares represent selected photons, providing access to numerical details of propagation history.
Propagation steps of a single photon; the steps on either side of the inner and outer acrylic vessels are visible. The line color represents the photon history, starting white and turning magenta following a specular reflection.
Initial photon positions of a Geant4 simulated muon that crosses between the Dayabay Near hall ADs. Colors represent photon wavelengths.
External view of Juno geometry with cutaway. The extreme size of the Juno geometry (50 million nodes in Chroma representation) provides a challenge for development on mobile GPUs. As my developments operate at the Geant4 level wherever possible, it was relatively straightforward to apply the machinery developed for Dayabay to the Juno detector. In collaboration with Juno simulation experts, the geometry was exported from Geant4 and GPU visualized in under a day's work.
External view of Juno geometry. The extreme size of the Juno geometry (50 million nodes in Chroma representation) provides a challenge for development on mobile GPUs. The black rectangle arises due to aborts to avoid GPU crashes.
ZMQRoot [11] sends/receives ChromaPhotonList, 3 node [12]
[11] | ROOT+ZeroMQ messaging http://dayabay.ihep.ac.cn/tracs/dybsvn/browser/dybgaudi/trunk/Utilities/ZMQRoot |
[12] | Workaround for a network blockage, server/client or threaded also possible |
[13] | http://dayabay.ihep.ac.cn/tracs/dybsvn/browser/dybgaudi/trunk/Simulation/DetSimChroma/src/DsChromaStackAction.cc |
[14] | https://bitbucket.org/simoncblyth/env/src/tip/zeromq/czmq/czmq_broker.c |
[15] | https://bitbucket.org/simoncblyth/env/src/tip/geant4/geometry/collada/g4daeview/daeresponder.py |
Export G4 Geometry as DAE
DAE to Chroma
Chroma Validation
[16] | G4LogicalBorderSurface class comprises an ordered pair of volume references and surface properties |
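The ordered-pair structure in note [16] can be sketched as a lookup keyed on (first volume, second volume). The class and field names below are hypothetical stand-ins for illustration and are not taken from Geant4 or the G4DAE exporter; the point is that reversing the pair does not find the same surface.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class BorderSurface:
    """Illustrative stand-in for a border surface record."""
    name: str
    pv1: str                     # volume the photon is leaving
    pv2: str                     # volume the photon is entering
    properties: dict = field(default_factory=dict)

def index_surfaces(surfaces):
    """Build a lookup keyed by the ordered (pv1, pv2) pair."""
    return {(s.pv1, s.pv2): s for s in surfaces}

# Hypothetical volume names and property values.
surface = BorderSurface("ExampleSurface", "pvA", "pvB", {"REFLECTIVITY": 0.9})
lookup = index_surfaces([surface])

forward = lookup.get(("pvA", "pvB"))    # photon going from pvA into pvB
backward = lookup.get(("pvB", "pvA"))   # opposite direction: no surface defined
```

Since Chroma attaches materials and surfaces to triangles, a translation step would need to carry this direction dependence across in some form.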
Geant4 -> Chroma in development, 1/1000 photon scaledown
Chroma -> Geant4
OSX Preview.app render
Suitable GPUs only ~250-350 USD
e.g. NVIDIA GeForce GTX 680 (1536 CUDA cores, 2048 MB, Compute Capability 3.0) ~330 USD