> less resume.pdf
Summary
Senior Systems Research Engineer specializing in high-performance AI and cloud infrastructure, with a focus on low-level LLM inference optimization (inference engine, custom kernels, networking).
Work Experience
Huawei R&D UK
Edinburgh, Scotland, United Kingdom
05/2023 - Present
Senior Systems Research Engineer
Large-scale LLM inference optimization for Huawei Ascend NPUs.
Led multiple key projects to production integration and supervised two research interns. Currently working on long-context LLM inference and sparse attention.
- Developed a lightweight NPU Peer-to-Peer (P2P) Transfer Library, increasing KV cache transfer bandwidth by 2.3x and significantly outperforming existing NPU libraries for both RoCE and HCCS.
- Wrote high-performance NPU kernels for several critical scenarios, including Mixture of Experts Dispatch/Combine, Large Recommendation Model Embedding Retrieval and KV Cache Transfer.
- Contributed support for LLM Prefill-Decode (PD) Disaggregation and P2P KV Cache Sharing on vLLM-Ascend to the open-source LMCache-Ascend project.
- Improved Ascend 910B point-to-point bandwidth by 5.57x over a single-path baseline by developing a software-based multipath transfer library tailored to its mesh-based topology.
- Developed a QoS-aware NPU-sharing mechanism, improving resource utilization by enabling colocation of smaller models while maintaining SLOs.
Awards: 2x President’s Award - Significant Business Contribution, European Research Institute Excellent Contributor Award, 2012 Labs Outstanding Contributor Award, Quality Star Award
Huawei R&D UK
Edinburgh, Scotland, United Kingdom
11/2021 - 05/2023
Systems Research Engineer
Performance & resource efficiency optimization of Huawei cloud workloads.
- Developed a distributed Kubernetes scheduler optimized for real-time, high-throughput scheduling decisions, utilizing eBPF for fine-grained, low-overhead monitoring.
- Designed and implemented custom scheduling algorithms to maximize resource utilization and ensure performance isolation for colocated cloud workloads.
- Created a comprehensive benchmark suite and load generator to evaluate new algorithms and architectures against representative production scenarios.
Awards: Future Star Award
Imec IDLab
Ghent Area, Belgium
08/2019 - 09/2019
Research Intern
Built web archival and automated quality analysis tools for the Royal Library of Belgium.
Imec IDLab
Ghent Area, Belgium
08/2018
Research Intern
Developed a fragmented R-tree index to enable efficient geospatial querying of linked data.
Education
ETH Zurich
Zurich, Switzerland
Sep 2019 - Sep 2021
Master of Science in Computer Science
Grade: 5.71/6 (Top 10% of class)
Focus on (Distributed) Systems and High Performance Computing.
Master's thesis: "Analysis and Optimization of Serverless Cold Start Latencies through Function Snapshots", at the ETH Efficient Architectures and Systems Lab, under the supervision of Prof. Ana Klimovic.
Ghent University
Ghent, Belgium
Sep 2016 - Jun 2019
Bachelor of Science in Computer Science
Grade: 808/1000 (1st of class)
Minor in Electronics and Telecommunication.
Technical Skills
Languages: Python, C/C++, Go
ML & Inference: vLLM Internals, PyTorch, Kernel Development, RDMA/RoCE, CUDA
Cloud: Kubernetes, Container Runtimes, eBPF, Serverless, DevOps & Observability
Note: this list is non-exhaustive and only includes recently used skills. I am quick and eager to learn new languages, frameworks and technologies as needed.
Interests
Outside of work, I enjoy reading up on recent hardware & infrastructure buildouts, playing tennis, running, hiking and ski touring.
>