Concurrent Programming
C/C++
Express.js
JavaScript (Node.js)
Linux
Git
CI/CD
OpenGL
UEFI
Boost
Assembly Language (Arm)
AWS
Qt
Software Engineer with strong expertise in C++ (C++17/20), high-performance computing, and low-latency system optimization. My experience spans SIMD-based rendering pipelines, software rasterization, AVX2-optimized computation, multithreading with TBB/OpenMP, and distributed systems using gRPC and Boost.Asio.
Recently, I’ve been focusing on real-time graphics systems, custom rendering engines, and software rasterizers optimized for both CPU and GPU pipelines. My work includes developing high-performance data structures for sparse grid rendering and optimizing rasterization routines leveraging SIMD instructions for maximum computational throughput.
In addition to my experience with HPC systems, federated learning frameworks, and multi-threaded distributed services, I'm deeply interested in graphics rendering engines and real-time paint systems. I'm actively building expertise in low-latency, tile-based rendering architectures and shader programming to push modern hardware to its limits.
Passionate about graphics, rendering, and performance engineering.
EDUCATION
Master of Computer Science
Computer Science
2023-present
The University of Sydney
EXPERIENCE
Software Engineer
CognixAI
12/2024-03/2025
- Designed and built backend services in Go and JavaScript, supporting RESTful APIs and internal tools that streamlined development workflows.
- Built CI/CD pipelines with GitHub Actions, AWS Lambda, and ECS, enabling 2+ releases per week with automated testing and safe rollback support.
- Automated routine data ingestion and updates via SQL stored procedures and Python scripts, improving availability and reliability for 50+ daily users.
Software Engineer
Internet 2.0
10/2024-12/2024
- Integrated AI models into production via gRPC streaming, enabling real-time multi-client synchronization with strong consistency guarantees.
- Built end-to-end CI/CD workflows using GitHub Actions, ensuring reliable model versioning, testing, and deployment in a distributed setup.
- Applied TDD and unit testing practices to validate AI components and maintain stability in real-time backend systems.
Software Engineer
SUIRUI
09/2022-08/2024
- Maintained a multi-threaded messaging protocol serving 8k+ concurrent users in a distributed system; wrote 50+ tests following TDD and used Valgrind and GDB for debugging and memory-leak detection.
- Led the refactoring of legacy C modules into C++17 components using the STL and Boost, and introduced Rust in safety-critical paths to improve memory safety and concurrency.
- Designed and led development of the libHPC high-performance computing library, implementing SIMD-specialized sparse data structures with a redesigned memory layout and SSE/AVX2 vectorization, achieving up to 3× speedup in real-time media pipelines.
Software Engineer
North China University of Technology
03/2023-06/2023
- Ported the UEFI framework to the Raspberry Pi 4B in C, developing bare-metal startup routines and integrating support for its Arm Cortex-A72 cores with memory-efficient code.
- Integrated a lightweight MQTT library into the bare-metal UEFI environment, enabling the Raspberry Pi to control peripheral GPIO and perform LAN-based telemetry and command dispatch through an MQTT broker.
- Designed and set up a QEMU-based virtual UEFI environment for firmware debugging and validation, applying TDD with assertion macros across simulated boot stages.
C++ Developer
Beihang University
01/2022-01/2023
- Collaborated on a 3D rendering simulation engine built with C++17 and GPU acceleration, integrating conventional rasterization, real-time ray tracing, and offline path tracing to support both interactive and high-fidelity rendering workflows.
- Developed CUDA kernels (e.g., matrix transpose, ray-triangle intersection, GEMM) using shared memory and warp-level optimizations, achieving a 12× speedup in computational geometry workloads built on linear algebra and trigonometry.
- Increased frame rate by 20 FPS by integrating SIMD (SSE/AVX2) vectorization and TBB-based multithreading into the CPU rendering path.
PROJECTS
Distributed Chat System Development
gRPC
C++
FFmpeg
Qt
Boost.Asio
Proactor
Designed a scalable real-time instant messaging system prototype with an intuitive Qt UI and a gRPC-based backend for large-scale system integration, managing over 8,000 simultaneous connections and reliable communication for more than 10,000 users.
Impact: Built a distributed chat system handling 8,000+ connections and 10,000 users with optimized multithreading and resource management.
Skills: CMake, gRPC, OpenGL, FFmpeg, STL, Boost, Node.js, GLSL, Proactor
Software Rasterizer
C++
Linear Algebra
OpenCV
STL
glm
Computer Graphics
Authored a 3D rendering engine comprising a conventional rasterization framework, a real-time ray tracing system, and an offline path tracing system.
Applied TBB multithreading and SIMD (SSE/AVX2) vectorization, gaining 20 FPS while maintaining stability under peak load.
Impact: Designed a high-performance software rasterizer for both real-time (over 60 FPS) and offline (1024 SPP) rendering.
Skills: C++17, CMake, Computer Graphics, Eigen, Linear Algebra, Calculus, STL