GPU · Graphics · High-Performance Computing

Jonathan Liu
Graphics & Systems Engineer

Software engineer specialising in GPU programming, real-time rendering, and low-latency system optimisation. I build high-performance rendering engines, CUDA compute kernels, and systems-level infrastructure — from Vulkan pipelines and warp-synchronous GPU algorithms to SIMD-optimised CPU rasterisers.

01

Core Expertise

GPU & Graphics

Vulkan, OpenGL, GLSL, CUDA, Render Pipelines, Path Tracing, Ray Tracing, Rasterisation

Languages

C++17/20 (primary), C, Python, ARM Assembly

HPC & Perf

SIMD (SSE/AVX2), TBB, OpenMP, Lock-Free, Warp-Level Primitives, perf, Nsight

Systems

Linux, Boost.Asio, gRPC, Docker, AWS, CMake, GDB/LLDB, Valgrind

02

Projects

Real-Time Rendering

Vulkan-Platform

A Vulkan 1.3 real-time rendering engine built from scratch, featuring multi-queue architecture, timeline semaphore synchronisation, compute + graphics pipelines, GPU particle systems, and dynamic rendering — designed for maximum GPU utilisation and minimal CPU overhead.

Vulkan 1.3 · C++17 · GLSL · Compute Shaders · Timeline Semaphores
  • Multi-queue command submission with compute and graphics pipeline coordination
  • GPU particle system driven entirely by compute shaders
  • Resource state machine with Nsight Graphics–validated synchronisation
  • Dynamic rendering (VK_KHR_dynamic_rendering, core in Vulkan 1.3), removing the need for VkRenderPass and VkFramebuffer objects
View on GitHub
Offline Rendering

Software-Rasterizer

A CPU-side rendering engine combining conventional rasterisation, Whitted-style ray tracing, and Monte Carlo path tracing with importance sampling. Optimised with AVX2/SSE vectorisation and TBB multithreading for both real-time (60+ FPS) and offline (1024 SPP) workloads.

C++17 · Path Tracing · AVX2/SSE · TBB · BVH Acceleration
  • Monte Carlo path tracing with unbiased estimation and importance sampling
  • BVH-accelerated ray intersection for interactive preview speeds
  • SIMD-optimised rasterisation delivering a frame-rate improvement of 20+ FPS
  • Cache-aware memory layout and loop restructuring for CPU pipeline efficiency
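
As an illustration of the importance-sampling bullet above (a minimal sketch, not taken from the project), the two estimators below integrate cos θ over the hemisphere, whose exact value is π. The function names `estimateUniform` and `estimateCosineWeighted` and the constant `kPi` are hypothetical. Both estimators are unbiased, but the cosine-weighted one draws samples in proportion to the integrand, so every sample contributes the same value and the variance drops to zero for this particular integrand.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical names; an illustration only, not the project's code.
constexpr double kPi = 3.14159265358979323846;

// Estimate I = integral of cos(theta) over the hemisphere (exact value: pi)
// using directions uniform in solid angle, pdf = 1 / (2 * pi).
double estimateUniform(std::size_t samples, unsigned seed) {
    assert(samples > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    const double pdf = 1.0 / (2.0 * kPi);
    double sum = 0.0;
    for (std::size_t i = 0; i < samples; ++i) {
        // For uniform hemisphere sampling, cos(theta) is uniform on [0, 1).
        const double cosTheta = u01(rng);
        sum += cosTheta / pdf;
    }
    return sum / static_cast<double>(samples);
}

// Same integral with cosine-weighted directions, pdf = cos(theta) / pi.
double estimateCosineWeighted(std::size_t samples, unsigned seed) {
    assert(samples > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < samples; ++i) {
        // Inverse-CDF sampling: cos^2(theta) is uniform, so cos(theta) = sqrt(1 - u).
        const double cosTheta = std::sqrt(1.0 - u01(rng));
        const double pdf = cosTheta / kPi;
        if (pdf <= 0.0) continue;  // skip degenerate samples, as path tracers usually do
        sum += cosTheta / pdf;     // equals pi for every sample, up to rounding
    }
    return sum / static_cast<double>(samples);
}
```

Calling `estimateUniform(200000, 1)` should land close to π with visible noise, while `estimateCosineWeighted(1000, 1)` matches π to rounding error. In a real path tracer the integrand also includes the BRDF and incoming radiance, so the variance reduction is partial rather than total.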
View on GitHub
High-Performance Computing

libHPC

A CUDA and C++ high-performance computing library featuring warp-synchronous algorithms, hierarchical sparse grid frameworks, and GPU kernel profiling infrastructure. Demonstrates design of latency-bounded, high-throughput compute kernels at scale.

CUDA · C++17 · OpenMP · TBB · SIMD · Lock-Free
  • CUDA radix sort processing 500M elements in 360 ms on an RTX 3080 Ti (~1.39B elements/s)
  • Warp-synchronous algorithms with __match_any_sync histogram optimisation
  • Hierarchical sparse 2D grid with lock-free, TBB, and dense composable node types
  • ~35% warp divergence reduction via SASS-level analysis and tiled shared memory
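
For readers unfamiliar with radix sort, the sketch below is a sequential CPU version in C++, not the CUDA kernel from the library. It assumes 32-bit unsigned keys and 8-bit digits, and the names `radixSortLsd` and `elementsPerSecond` are hypothetical. Each pass runs the three phases that GPU radix sorts typically parallelise: a digit histogram, an exclusive prefix sum, and a stable scatter. The small helper reproduces the throughput arithmetic quoted above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sequential least-significant-digit radix sort on 32-bit keys,
// 8 bits per pass (4 passes). Hypothetical illustration, not the library's code.
void radixSortLsd(std::vector<std::uint32_t>& keys) {
    constexpr unsigned kBits = 8;
    constexpr std::size_t kBuckets = std::size_t{1} << kBits;
    std::vector<std::uint32_t> scratch(keys.size());

    for (unsigned shift = 0; shift < 32; shift += kBits) {
        // 1) Histogram: count how many keys fall into each digit bucket.
        std::array<std::size_t, kBuckets> offsets{};
        for (std::uint32_t k : keys) {
            ++offsets[(k >> shift) & (kBuckets - 1)];
        }
        // 2) Exclusive prefix sum: turn counts into starting offsets.
        std::size_t running = 0;
        for (std::size_t& slot : offsets) {
            const std::size_t count = slot;
            slot = running;
            running += count;
        }
        // 3) Stable scatter: place each key at its bucket's next free slot.
        for (std::uint32_t k : keys) {
            scratch[offsets[(k >> shift) & (kBuckets - 1)]++] = k;
        }
        keys.swap(scratch);
    }
}

// Throughput in elements per second for a measured run.
double elementsPerSecond(double elements, double seconds) {
    assert(seconds > 0.0);
    return elements / seconds;
}
```

With the figures from the bullet, `elementsPerSecond(500e6, 0.360)` gives about 1.39e9, which matches the quoted ~1.39B elements/s.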
View on GitHub
Systems Programming

Distributed Instant Messaging

A low-latency, event-driven messaging system sustaining 8,000+ concurrent sessions with jitter control, non-blocking I/O, explicit backpressure, and bounded queues — built with Boost.Asio, gRPC, and Qt6.

C++17 · Boost.Asio · gRPC · Qt6 · Redis · Docker
  • Event-driven architecture with proactor pattern and bounded queue backpressure
  • SQL latency optimised via EXPLAIN + optimizer trace for predictable throughput
  • Deterministic regression harness with ASan, GDB, and concurrency stress testing
  • Cross-platform reproducible builds across Linux, macOS, and Windows
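
The bounded-queue backpressure mentioned above can be sketched in a few lines of standard C++. This is a simplified illustration, not the project's implementation: the class name `BoundedQueue` and its methods are hypothetical, and it assumes the queue is only touched from a single event-loop thread, so it carries no locking. The key idea is that a full queue rejects new work instead of growing, which forces the producer to pause or shed load.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Fixed-capacity FIFO for use on a single event-loop thread (assumption).
// Hypothetical illustration, not the project's code.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns false when the queue is full. The caller is expected to apply
    // backpressure, for example by pausing reads from the sending socket.
    bool tryPush(T item) {
        if (items_.size() >= capacity_) {
            return false;
        }
        items_.push_back(std::move(item));
        return true;
    }

    // Returns the oldest item, or std::nullopt when the queue is empty.
    std::optional<T> tryPop() {
        if (items_.empty()) {
            return std::nullopt;
        }
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t capacity_;
    std::deque<T> items_;
};
```

A `BoundedQueue<std::string>` with capacity 2 accepts two messages, rejects the third, and accepts again once one message has been popped. Because memory use is capped at the capacity, worst-case queueing delay stays bounded as well.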
View on GitHub

Let's Talk

Actively seeking GPU software engineering and graphics roles. Open to opportunities in Shanghai and across the APAC region.