GPU Timestamp Profiling and Monitor Semantics
This document defines how Gaussian Splatting timing monitors should be interpreted.
Custom monitor coverage
The following custom monitors are exported under the gaussian_splatting/ prefix:
- telemetry_active: lifecycle flag (1 when at least one Gaussian renderer is registered, 0 when telemetry is inactive).
- gpu_time_cull_ms: cull stage duration from stage metrics (fallback: culling summary).
- gpu_time_sort_ms: sort stage duration from stage metrics (fallback: frame sort time).
- gpu_time_binning_ms: tile binning GPU timestamp duration.
- gpu_time_prefix_ms: tile prefix/overlap-count GPU timestamp duration.
- gpu_time_raster_ms: tile raster GPU timestamp duration.
- gpu_time_resolve_ms: tile resolve GPU timestamp duration.
- gpu_time_frame_ms: total GPU frame duration for the tile renderer path.
- route_uid: active render-route UID for the current frame diagnostics.
- sort_route_uid: active sort-route UID for the current frame diagnostics.
Telemetry availability
Use telemetry_active to distinguish unavailable monitor samples from valid runtime zeros.
When telemetry_active == 0, monitor consumers should treat route/timing/stat counters as inactive.
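The gating rule above can be sketched on the consumer side as follows. This is a minimal illustration, not the project's API: it assumes the consumer already holds a snapshot of monitor values in a plain mapping keyed by full monitor name, and `read_timing` is a hypothetical helper name.

```python
from typing import Mapping, Optional

PREFIX = "gaussian_splatting/"


def read_timing(monitors: Mapping[str, float], name: str) -> Optional[float]:
    """Return a monitor value, or None when telemetry is inactive.

    `monitors` is an assumed snapshot of monitor values; how it is
    collected is outside the scope of this sketch.
    """
    # A missing or zero telemetry_active flag means every other counter
    # is inactive, so a 0.0 reading must not be reported as a real timing.
    if monitors.get(PREFIX + "telemetry_active", 0) == 0:
        return None
    return monitors.get(PREFIX + name)
```

Returning `None` instead of `0.0` keeps "no telemetry" distinguishable from a valid runtime zero in downstream dashboards or assertions.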
Non-placeholder behavior
Timing values now preserve the last valid sample until a newer timestamp sample is available.
This prevents per-frame resets to 0.0 from being interpreted as real timings when timestamp
readback is temporarily delayed.
Expected 0.0 cases:
- No active renderer is registered.
- The pass did not run (for example, no visible splats).
- No valid sample has ever been captured yet.
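The hold-last-valid-sample behavior can be pictured with a small sketch. `HeldTiming` is a hypothetical class written for illustration, not the renderer's actual code; it assumes that a delayed readback is signalled as `None` and that a pass that did not run reports an explicit `0.0`.

```python
from typing import Optional


class HeldTiming:
    """Keeps the last valid timing sample across delayed readbacks."""

    def __init__(self) -> None:
        # 0.0 until a valid sample has been captured.
        self.value_ms = 0.0

    def update(self, sample_ms: Optional[float]) -> float:
        # None models a frame whose timestamp readback has not arrived;
        # the previous value is preserved instead of resetting to 0.0.
        if sample_ms is not None:
            self.value_ms = sample_ms
        return self.value_ms

    def reset(self) -> None:
        # Assumed to be called when no active renderer is registered.
        self.value_ms = 0.0
```

With this shape, a `0.0` reading only appears in the expected cases listed above: before the first sample, after a reset, or when a pass explicitly reports zero.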
Freshness fields
Runtime diagnostics expose:
- gpu_timing_frame_serial
- gpu_timing_frames_behind
Use these with timing values to decide whether a sample is fresh for the current frame.
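A consumer-side freshness check might look like the sketch below. The field semantics are assumptions, since this document does not define them: `gpu_timing_frames_behind` is taken to be the number of frames by which the timing sample lags the current frame, and `gpu_timing_frame_serial` to be an identifier of the frame the sample belongs to. Both function names are hypothetical.

```python
from typing import Mapping, Optional


def is_timing_fresh(diag: Mapping[str, int], max_frames_behind: int = 0) -> bool:
    """True when the sample lags the current frame by at most the given budget."""
    # Assumption: 0 means the sample was captured for the current frame.
    return 0 <= diag["gpu_timing_frames_behind"] <= max_frames_behind


def is_new_sample(diag: Mapping[str, int], last_seen_serial: Optional[int]) -> bool:
    """True when the serial differs from the one the consumer last processed."""
    return diag["gpu_timing_frame_serial"] != last_seen_serial
```

A profiler overlay could, for example, tolerate a lag of a few frames, while a regression test might require a lag of zero before recording a value.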
Pipeline Trace Data-Flow Recent Window
When rendering/gaussian_splatting/debug/enable_pipeline_trace is enabled, pipeline trace snapshots now include
data_flow.recent_window alongside lifetime counters.
- capacity: fixed ring-buffer capacity (bounded, no unbounded growth).
- frames_recorded: number of recent per-frame deltas currently retained.
- frame_deltas: ordered per-frame telemetry deltas (for near-term regression localization).
- aggregate counters (pack_sh_samples, pack_range_calls, etc.) over the recent window.
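To make the shape of the recent window concrete, here is a minimal sketch of a bounded ring buffer of per-frame deltas. `RecentWindow` and the nested `aggregates` key are illustrative assumptions; the actual snapshot layout may differ.

```python
from collections import Counter, deque
from typing import Deque, Dict


class RecentWindow:
    """Bounded window of per-frame counter deltas (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # deque(maxlen=...) drops the oldest entry once capacity is reached,
        # so memory use stays bounded.
        self._deltas: Deque[Dict[str, int]] = deque(maxlen=capacity)

    def record(self, frame_delta: Dict[str, int]) -> None:
        self._deltas.append(dict(frame_delta))

    def snapshot(self) -> dict:
        # Sum each counter over the frames currently retained.
        totals: Counter = Counter()
        for delta in self._deltas:
            totals.update(delta)
        return {
            "capacity": self.capacity,
            "frames_recorded": len(self._deltas),
            "frame_deltas": list(self._deltas),  # oldest first
            "aggregates": dict(totals),
        }
```

Because the aggregates cover only retained frames, a sudden jump in a counter shows up in the window even when it is negligible against the lifetime totals.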
Production metrics validation
Production metrics now include:
- gpu_frame_ms
- gpu_binning_ms
- gpu_prefix_ms
- gpu_raster_ms
- gpu_resolve_ms
- gpu_timing_frame_serial
- gpu_timing_frames_behind
- gpu_pass_breakdown_available
Validation checks:
- all timing fields must be finite and non-negative;
- cull/sort/raster stage timings must be non-placeholder for meaningful successful workloads;
- when a GPU timing frame sample is available, GPU pass breakdown must not collapse to all-zero placeholders.
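The first and third checks can be sketched as a small validator. This is an illustration under stated assumptions, not the project's validation code: metrics are assumed to arrive as a flat mapping, a frame sample is assumed to be "available" when gpu_frame_ms is greater than zero, and `validate_metrics` is a hypothetical name. The stage-timing check is left out because the cull and sort fields are not part of the production metrics list above.

```python
import math
from typing import List, Mapping

TIMING_FIELDS = (
    "gpu_frame_ms",
    "gpu_binning_ms",
    "gpu_prefix_ms",
    "gpu_raster_ms",
    "gpu_resolve_ms",
)
PASS_FIELDS = TIMING_FIELDS[1:]  # per-pass breakdown, excluding the frame total


def validate_metrics(metrics: Mapping[str, float]) -> List[str]:
    """Return a list of validation errors; an empty list means the checks pass."""
    errors: List[str] = []
    for name in TIMING_FIELDS:
        value = metrics.get(name)
        if value is None or not math.isfinite(value) or value < 0.0:
            errors.append(f"{name} must be finite and non-negative, got {value!r}")

    # Assumption: a positive frame time means a frame sample is available.
    frame_sample_available = metrics.get("gpu_frame_ms", 0.0) > 0.0
    if frame_sample_available and all(
        metrics.get(name, 0.0) == 0.0 for name in PASS_FIELDS
    ):
        errors.append("GPU pass breakdown collapsed to all-zero placeholders")
    return errors
```

Collecting all errors rather than stopping at the first one makes a failing report easier to triage in a single run.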