Realtime Multiplayer Person Detection
What the Program Does
This program performs real-time enemy detection in multiplayer games by capturing the screen, processing the image through a trained neural network, and highlighting detected players with bounding boxes. Initially prototyped in Python, the application was progressively optimized by transitioning to C++ and leveraging advanced techniques like direct GPU mapping via Windows APIs. The end result is a highly efficient detection system capable of processing frames in as little as 10 ms, making it ideal for real-time gaming scenarios and beyond.
Disclaimer: This project is a technical showcase and is not intended to support any form of cheating software. It demonstrates how advanced technologies can be leveraged to overcome anti-cheat measures—hopefully inspiring anti-cheat developers to up their game!
I was on the lookout for a fun yet practical project to expand my skillset in AI computer vision. The project needed to meet several key requirements:
- Abundant Data: The solution should allow for easy access to a large amount of visual data.
- Dataset Availability: Ideally, there would be pre-existing datasets to jump-start the project.
- Practical Utility: The outcome should have a tangible application.
- Real-Time Performance: The system must process frames quickly (e.g., 2 ms per frame).
My first thought was to implement an enemy detection system for a multiplayer shooter—aiming to fire when an enemy is centered on my screen.
Initial Prototype
I started with a Python prototype, which was straightforward thanks to the vast ecosystem of libraries. Here’s the workflow:
- Data Acquisition: I grabbed a pre-labeled dataset for Counter-Strike 2.
- Model Training: I trained a YOLOv11 bounding-box model.
- Frame Processing: A Python script captures a frame from the screen, runs it through the YOLOv11 model using PyTorch, and outputs the detections.
- Visualization: Another script renders the bounding boxes onto a transparent overlay in the foreground window.
While this prototype successfully detected players, it suffered from Python’s limitations in optimization and memory management. The total processing time was around 52 ms per frame:
| Stage | Time |
| --- | --- |
| Frame Capture & RAM Download | 30 ms |
| GPU Upload, Preprocessing & Model Input | 10 ms |
| Model Inference | 12 ms |
Transition to C++ and TensorFlow for Enhanced Performance
To overcome Python’s performance bottlenecks, I rewrote the application in C++. For frame capturing, I leveraged the Desktop Duplication API to retrieve the screen’s framebuffer directly as a DXGI texture on the GPU (a capture sketch follows the list below). The process involved:
- Transferring the frame from GPU to RAM.
- Registering the texture to CUDA (GPU) and preprocessing directly on the GPU.
- Using TensorFlow for inference.
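Below is a minimal sketch of what the capture-and-download step can look like with the Desktop Duplication API, assuming the D3D11 device, context, and `IDXGIOutputDuplication` instance have already been created. Names like `CaptureFrameToRam` are illustrative, not the project’s exact code:

```cpp
// Minimal sketch: acquire one desktop frame via the Desktop Duplication API
// and download it to RAM through a staging texture. Error handling is reduced
// to asserts; device/context/duplication setup is assumed to exist already.
#include <d3d11.h>
#include <dxgi1_2.h>
#include <cassert>

void CaptureFrameToRam(ID3D11Device* device, ID3D11DeviceContext* context,
                       IDXGIOutputDuplication* duplication) {
    DXGI_OUTDUPL_FRAME_INFO frameInfo;
    IDXGIResource* resource = nullptr;

    // Block until a new frame is available (100 ms timeout).
    HRESULT hr = duplication->AcquireNextFrame(100, &frameInfo, &resource);
    assert(SUCCEEDED(hr));

    ID3D11Texture2D* frameTex = nullptr;
    resource->QueryInterface(__uuidof(ID3D11Texture2D), (void**)&frameTex);

    // The duplicated texture is GPU-only; copy it into a CPU-readable staging
    // texture before mapping. (Created per call for brevity; a real capture
    // loop would create the staging texture once and reuse it.)
    D3D11_TEXTURE2D_DESC desc;
    frameTex->GetDesc(&desc);
    desc.Usage = D3D11_USAGE_STAGING;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.BindFlags = 0;
    desc.MiscFlags = 0;

    ID3D11Texture2D* stagingTex = nullptr;
    device->CreateTexture2D(&desc, nullptr, &stagingTex);
    context->CopyResource(stagingTex, frameTex);

    // Map the staging texture: the pixels are now readable from the CPU.
    D3D11_MAPPED_SUBRESOURCE mapped;
    context->Map(stagingTex, 0, D3D11_MAP_READ, 0, &mapped);
    // ... feed mapped.pData (row pitch = mapped.RowPitch) to preprocessing ...
    context->Unmap(stagingTex, 0);

    stagingTex->Release();
    frameTex->Release();
    resource->Release();
    duplication->ReleaseFrame();
}
```

Acquire, copy to a staging texture, map, read: this round trip through system RAM is exactly the 17 ms line item that the later optimizations go on to eliminate.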
The optimized breakdown was:
| Stage | Time |
| --- | --- |
| Frame Capture & RAM Download | 17 ms |
| GPU Upload, Preprocessing & Model Input | 10 ms |
| Model Inference | 7 ms |
This brought the processing time down to 34 ms per frame. Although a marked improvement, the RAM transfer still posed a bottleneck.
Further Optimization: Reducing Data Transfer Overhead
To reduce latency, I cropped the input to the central 640x640 region, minimizing data transfer and processing overhead. The new benchmarks were:
| Stage | Time |
| --- | --- |
| Frame Capture & RAM Download | 9 ms |
| GPU Upload, Preprocessing & Model Input | 5 ms |
| Model Inference | 4 ms |
This reduced the overall processing time to 18 ms per frame.
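The crop itself can stay entirely on the GPU, before anything is downloaded. Here is a minimal sketch using D3D11’s `CopySubresourceRegion`, assuming a pre-created 640x640 destination texture of the same format as the captured frame; the helper name `CropCenter640` is hypothetical:

```cpp
// Minimal sketch: crop the central 640x640 region of the captured desktop
// texture on the GPU, before any download or preprocessing. The destination
// texture (640x640, matching format) is assumed to exist.
#include <d3d11.h>

void CropCenter640(ID3D11DeviceContext* context,
                   ID3D11Texture2D* src, ID3D11Texture2D* dst,
                   UINT screenWidth, UINT screenHeight) {
    const UINT size = 640;
    const UINT left = (screenWidth - size) / 2;
    const UINT top  = (screenHeight - size) / 2;

    // Source box: the central 640x640 window of the full frame
    // (left, top, front, right, bottom, back).
    D3D11_BOX box = { left, top, 0, left + size, top + size, 1 };

    // GPU-to-GPU copy of just the region of interest; everything outside
    // the crop never has to be transferred or processed.
    context->CopySubresourceRegion(dst, 0, 0, 0, 0, src, 0, &box);
}
```

Because only the 640x640 region ever crosses to system RAM, the download, upload, and preprocessing stages all shrink with it.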
The Final Leap: Direct GPU Mapping with the Windows.Media.Capture API
The ultimate challenge was eliminating the unnecessary memory transfers between VRAM and RAM. I discovered that the Windows.Media.Capture API allows frames to be captured and mapped directly to CUDA. Using a GitHub project as inspiration (link coming soon), I implemented the following streamlined workflow:
| Stage | Time |
| --- | --- |
| Direct Frame Capture to CUDA Array | 3 ms |
| CUDA-based Preprocessing | 3 ms |
| Model Inference | 4 ms |
This approach reduced the total processing time to an astonishing 10 ms per frame.
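The capture setup itself depends on the API, but the core trick (registering the captured D3D11 texture with CUDA so the frame never leaves VRAM) can be sketched with the CUDA/D3D11 interop functions. The function `MapFrameToCuda` and the surrounding structure are an illustration under those assumptions, not the project’s exact code:

```cpp
// Minimal sketch: map a captured D3D11 texture directly into CUDA so the
// frame never leaves VRAM. The capture API hands us `frameTex`; the
// preprocessing kernel mentioned at the end is a placeholder.
#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

cudaArray_t MapFrameToCuda(ID3D11Texture2D* frameTex,
                           cudaGraphicsResource_t* resource,
                           cudaStream_t stream) {
    // One-time registration of the D3D11 texture with CUDA.
    if (*resource == nullptr) {
        cudaGraphicsD3D11RegisterResource(resource, frameTex,
                                          cudaGraphicsRegisterFlagsNone);
    }

    // Per-frame: map the resource and fetch the underlying CUDA array.
    cudaGraphicsMapResources(1, resource, stream);
    cudaArray_t array = nullptr;
    cudaGraphicsSubResourceGetMappedArray(&array, *resource, 0, 0);

    // `array` can now be bound to a CUDA texture object and consumed by a
    // preprocessing kernel (resize/normalize) that writes straight into the
    // model's input buffer, without ever touching system RAM.
    return array;
    // After the kernel finishes:
    // cudaGraphicsUnmapResources(1, resource, stream);
}
```

Once the `cudaArray_t` is in hand, a preprocessing kernel can resize and normalize the pixels straight into the model’s input buffer, which is what makes the 3 ms capture and 3 ms preprocessing figures possible.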
Figures: overall performance evolution diagram and speedup diagram (total time only).
This project showcases how combining state-of-the-art APIs and targeted optimizations can yield substantial performance gains in real-time computer vision: total frame time dropped from 52 ms to 10 ms, a 5.2x speedup. While my journey began with a simple Python prototype, transitioning to C++ and integrating direct GPU capture techniques ultimately pushed the performance into a truly real-time realm.
Happy coding and always keep pushing the boundaries of what’s possible!
What I Learned and Future Applications
Through this project, I expanded my skill set across several areas:
- Deep Learning Deployment: I learned how to train and deploy a convolutional neural network (CNN) with TensorFlow for object detection, which can be further optimized for NVIDIA devices such as the Jetson.
- Graphics Programming: I acquired foundational knowledge of DirectX and the Windows API, and integrated OpenGL for rendering, enabling efficient windowed applications.
- Performance Optimization: Transitioning from Python to C++ and implementing direct GPU mapping helped me master techniques for minimizing memory transfers and achieving real-time performance.
- Data Labeling: I created automatic labeling tools (see my other projects), streamlining the dataset preparation process for training machine learning models.
This project not only demonstrates the application of advanced computer vision techniques in real-time gaming, but it also opens the door to innovative applications in robotics, autonomous vehicles, and other fields where speed and accuracy are critical.