Why Anti Cheats Walk Your Call Stack

When a thread reaches a syscall like NtReadVirtualMemory or a Windows Application Programming Interface (API) entry like GetAsyncKeyState, the x64 call stack already contains every return address from the current frame back to RtlUserThreadStart. Each address identifies the code that placed it there: the module it belongs to, whether that module is file-backed or allocated privately, and whether a call instruction actually precedes the return site. Anti cheats read this chain and pose a single question: does the execution path originate from code that is supposed to exist? Answering that question requires no signature database, no pattern scan, and no hook. The call stack is the densest artifact available at the moment of any API invocation, and forging all of its properties at once is a problem that compounds with every frame. BattlEye, Easy Anti-Cheat (EAC), and most modern Endpoint Detection and Response (EDR) platforms treat stack walking as a primary heuristic for identifying injected, mapped, or otherwise illegitimate code.

This article breaks down the x64 unwinding mechanism that makes stack capture possible, examines each validation anti cheats perform against individual frames, explains how kernel drivers elevate stack walking beyond user mode interference, contrasts BattlEye and EAC implementations, and traces the arms race through return address spoofing, SilentMoonwalk, module stomping, and indirect syscalls.

Background

Cheat development has migrated toward manual mapping, direct syscalls, and kernel mode execution. These techniques strip away the traditional artifacts that module enumeration and memory scanning rely on, but none of them erase the call stack. Anti cheats need a detection primitive that persists even when injected code leaves no loader entry, no disk footprint, and no hooked function. Stack walking provides exactly that. The public evidence base for the mechanisms described here draws primarily from secret.club’s 2020 reverse engineering of BattlEye’s shellcode and from independent analyses of EAC’s kernel driver behavior.

The x64 unwind model

x86 stack unwinding follows frame pointer chains: read the saved EBP, follow it to the previous frame, repeat. Windows x64 replaces this entirely with metadata-driven unwinding. Every non-leaf function compiled into a Portable Executable (PE) image has a corresponding RUNTIME_FUNCTION entry stored in the .pdata section. This entry statically describes the function’s stack layout at compile time. Leaf functions that never modify non-volatile registers or touch RSP do not need one.

typedef struct _RUNTIME_FUNCTION {
    DWORD BeginAddress;   // RVA of function start
    DWORD EndAddress;     // RVA past function end
    DWORD UnwindData;     // RVA to UNWIND_INFO
} RUNTIME_FUNCTION;

The UNWIND_INFO that UnwindData points to describes exactly what the function prologue does. Each prologue operation maps to an UNWIND_CODE entry:

UWOP_PUSH_NONVOL — pushes a non-volatile register
UWOP_ALLOC_SMALL — allocates up to 128 bytes of stack space
UWOP_ALLOC_LARGE — two forms: 136 bytes to 512K-8 (op info 0) or 512K to 4 GB minus 8 (op info 1)
UWOP_SET_FPREG — establishes a frame pointer register

The unwinder replays these codes in reverse order, computing each frame’s exact size and locating the return address without relying on an RBP chain. Because this metadata is baked into the PE at link time, the unwinder never needs to execute code or trust runtime state. It reads .pdata, reconstructs the call history deterministically, and produces results that hold even when the target process has done everything possible to obscure its execution.
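To make the frame-size arithmetic concrete, here is a minimal sketch in portable C that sums the stack adjustments described by a list of unwind codes. It is not the Windows unwinder: the helper names uw_slot and frame_size_from_codes are hypothetical, the slots are modeled as raw 16-bit values following the documented UNWIND_CODE layout, and only the four operations listed above are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Operation numbers as documented for the x64 UNWIND_CODE structure. */
enum {
    UWOP_PUSH_NONVOL = 0,
    UWOP_ALLOC_LARGE = 1,
    UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG   = 3
};

/* Hypothetical helper: pack one slot. Byte 0 is the prologue offset,
 * byte 1 holds the operation (low nibble) and op info (high nibble). */
static uint16_t uw_slot(uint8_t code_offset, uint8_t op, uint8_t op_info)
{
    return (uint16_t)(code_offset | ((op & 0xF) << 8) | ((op_info & 0xF) << 12));
}

/* Returns the distance in bytes from RSP (after the prologue has run) to
 * the return address, or -1 if the array is truncated or contains an
 * operation this sketch does not model. */
static int64_t frame_size_from_codes(const uint16_t *codes, size_t count)
{
    int64_t size = 0;
    size_t i = 0;

    while (i < count) {
        unsigned op   = (codes[i] >> 8) & 0xF;
        unsigned info = (codes[i] >> 12) & 0xF;

        switch (op) {
        case UWOP_PUSH_NONVOL:          /* one 8-byte register push */
            size += 8;
            i += 1;
            break;
        case UWOP_ALLOC_SMALL:          /* 8 to 128 bytes */
            size += (int64_t)info * 8 + 8;
            i += 1;
            break;
        case UWOP_ALLOC_LARGE:
            if (info == 0) {            /* next slot holds size / 8 */
                if (i + 1 >= count) return -1;
                size += (int64_t)codes[i + 1] * 8;
                i += 2;
            } else if (info == 1) {     /* next two slots hold a 32-bit size */
                if (i + 2 >= count) return -1;
                size += (int64_t)(codes[i + 1] | ((uint32_t)codes[i + 2] << 16));
                i += 3;
            } else {
                return -1;
            }
            break;
        case UWOP_SET_FPREG:            /* changes the base register, not the size */
            i += 1;
            break;
        default:
            return -1;
        }
    }
    return size;
}
```

For a prologue of push rbx, push rdi, sub rsp, 0x28, the codes sum to 56 bytes, so the return address sits at RSP + 56. The sum itself does not depend on ordering; the reverse order matters to the real unwinder because it also restores each saved register along the way.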

How the unwinding loop operates

Two functions drive the traversal. RtlLookupFunctionEntry locates the RUNTIME_FUNCTION for a given instruction pointer. RtlVirtualUnwind consumes the associated unwind codes and computes the caller’s register context. When no RUNTIME_FUNCTION exists for an address, the unwinder treats the function as a leaf and reads the return address directly from [RSP].

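#define MAX_FRAMES 64

CONTEXT context;
DWORD64 imageBase, establisherFrame;
PVOID   handlerData;
DWORD64 capturedStack[MAX_FRAMES];
ULONG   depth = 0;

RtlCaptureContext(&context);

while (context.Rip != 0 && depth < MAX_FRAMES) {
    PRUNTIME_FUNCTION fn = RtlLookupFunctionEntry(
        context.Rip, &imageBase, NULL
    );

    if (!fn) {
        // leaf function: return address is at [RSP]
        context.Rip = *(DWORD64 *)context.Rsp;
        context.Rsp += 8;
    } else {
        // replay the unwind codes to recover the caller's context
        RtlVirtualUnwind(
            UNW_FLAG_NHANDLER, imageBase,
            context.Rip, fn, &context,
            &handlerData, &establisherFrame, NULL
        );
    }

    if (context.Rip == 0) break;            // walked past the outermost frame
    capturedStack[depth++] = context.Rip;   // record the caller's return address
}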

Anti cheats that need per-frame detail use this full unwind path rather than relying solely on the convenience wrapper RtlCaptureStackBackTrace, which returns only a flat array of return addresses. The complete traversal yields return addresses, frame sizes, and per-frame module attribution, all of which feed into the validation checks that follow.
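The lookup step can be pictured as a binary search over the .pdata table, which stores its entries sorted by start address. The sketch below is a portable C illustration of that idea under stated assumptions: the runtime_function struct mirrors the three RVA fields shown earlier, and lookup_function_entry is a hypothetical name, not the signature of RtlLookupFunctionEntry.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the three RVA fields of RUNTIME_FUNCTION. */
typedef struct {
    uint32_t begin_rva;    /* RVA of function start */
    uint32_t end_rva;      /* RVA one past the function end */
    uint32_t unwind_rva;   /* RVA of the unwind information */
} runtime_function;

/* Binary search over a table sorted by begin_rva. Returns the entry whose
 * [begin_rva, end_rva) range contains rva, or NULL when no entry covers
 * it, which the unwind loop treats as a leaf function. */
static const runtime_function *lookup_function_entry(
    const runtime_function *table, size_t count, uint32_t rva)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (rva < table[mid].begin_rva) {
            hi = mid;
        } else if (rva >= table[mid].end_rva) {
            lo = mid + 1;
        } else {
            return &table[mid];
        }
    }
    return NULL;
}
```

The NULL result is where unbacked code tends to surface: memory that never came from a PE image has no .pdata table at all unless the attacker registers one dynamically.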

Module backing validation

The simplest frame-level check asks whether a return address falls inside memory that the Windows image loader placed there. The operating system tags loader-backed memory as MEM_IMAGE, meaning it corresponds to a file on disk mapped through standard channels. Memory from VirtualAlloc carries the MEM_PRIVATE type. Memory from MapViewOfFile is MEM_MAPPED. A manually mapped PE that calls VirtualAlloc for its image sections produces MEM_PRIVATE pages.

A return address resolving to MEM_PRIVATE memory signals code that did not arrive through normal loading. Injected payloads, shellcode, and manually mapped modules all fail this check immediately. This single fact is the entire reason module stomping exists as an evasion technique: overwriting the .text section of a legitimately loaded Dynamic Link Library (DLL) forces return addresses to resolve against a real, file-backed module.
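A minimal sketch of the backing check follows, written in portable C so it runs without Windows headers. The region table stands in for what a real implementation would obtain from VirtualQuery or NtQueryVirtualMemory; the struct, the TYPE_* names, and frame_backing are hypothetical, while the numeric values mirror the documented MEM_PRIVATE, MEM_MAPPED, and MEM_IMAGE constants.

```c
#include <stddef.h>
#include <stdint.h>

/* Values mirror the documented MEM_* type constants. */
enum {
    TYPE_PRIVATE = 0x00020000,
    TYPE_MAPPED  = 0x00040000,
    TYPE_IMAGE   = 0x01000000
};

/* Hypothetical stand-in for one memory query result. */
typedef struct {
    uint64_t base;
    uint64_t size;
    uint32_t type;
} region;

/* Returns 1 when addr lies in image-backed memory, 0 when it lies in
 * private or mapped memory, and -1 when no region covers it. */
static int frame_backing(const region *map, size_t count, uint64_t addr)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= map[i].base && addr - map[i].base < map[i].size) {
            return map[i].type == TYPE_IMAGE ? 1 : 0;
        }
    }
    return -1;
}
```

Running frame_backing over every captured return address and flagging any result other than 1 reproduces the check described above.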

Truncated stacks

A well-formed user mode call stack unwinds completely back to RtlUserThreadStart or BaseThreadInitThunk. When the unwinder encounters data it cannot parse and terminates early, the truncation signals one of several problems: corrupted frame data, a stack pivot to attacker-controlled memory, or synthetic frames that fail to chain correctly.

This check specifically punishes spoofing implementations that fix only the immediate return address while leaving the remainder of the stack malformed. A single broken link anywhere in the chain is enough.
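One way to express the completeness test is to require that the outermost captured frame land inside a known thread-start routine. The sketch below assumes the caller has already resolved the address ranges of RtlUserThreadStart and BaseThreadInitThunk; code_range and trace_is_complete are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [begin, end) of an expected root function. */
typedef struct {
    uint64_t begin;
    uint64_t end;
} code_range;

/* Returns 1 when the last (outermost) captured frame lies inside one of
 * the expected thread-start routines, 0 when the walk stopped early. */
static int trace_is_complete(const uint64_t *frames, size_t count,
                             const code_range *roots, size_t nroots)
{
    if (count == 0) {
        return 0;
    }

    uint64_t last = frames[count - 1];
    for (size_t i = 0; i < nroots; i++) {
        if (last >= roots[i].begin && last < roots[i].end) {
            return 1;
        }
    }
    return 0;
}
```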

Gadget signatures

Return address spoofing commonly relies on Jump-Oriented Programming (JOP) gadgets as fake return sites. A popular technique places the address of a jmp qword ptr [rbx] instruction as the return address, so that when the monitored function returns, control flows through the gadget back to the caller’s fixup code via a register.

BattlEye targets this pattern directly. Its exception handler inspects the opcode at each return address, scanning for the byte sequence FF 23:

const auto spoof = *(_WORD *)caller_function == 0x23FF; // jmp qword ptr [rbx]

Switching to a different gadget — a different register, or a different instruction entirely — sidesteps this particular signature. But a more robust detection approach scans the bytes immediately before each return address, looking for a call instruction. Legitimate return addresses always sit right after a call that placed them on the stack. If no call precedes a return site, the frame is almost certainly fabricated. This structural check catches most basic spoofing regardless of which gadget an attacker selects.
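The preceding-call check can be sketched as a small byte matcher. The version below is an assumption-laden illustration rather than any vendor’s code: it recognizes the direct E8 rel32 form and the indirect FF /2 forms by decoding the ModRM byte, and the function names are hypothetical. Because it matches bytes backwards it can produce false positives when unrelated bytes happen to look like a call, so it is a heuristic, not a disassembler.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of an FF /2 (indirect near call) instruction starting at p, or 0
 * if the bytes do not encode one within avail bytes. Prefix bytes such as
 * REX sit before the FF opcode and are not counted. */
static size_t indirect_call_length(const uint8_t *p, size_t avail)
{
    if (avail < 2 || p[0] != 0xFF) return 0;

    uint8_t modrm = p[1];
    if (((modrm >> 3) & 7) != 2) return 0;   /* reg field 2 selects call */

    uint8_t mod = modrm >> 6;
    uint8_t rm  = modrm & 7;
    size_t  len = 2;

    if (mod == 3) return len;                /* call through a register */

    if (rm == 4) {                           /* SIB byte follows */
        if (avail < 3) return 0;
        len += 1;
        if (mod == 0 && (p[2] & 7) == 5) len += 4;   /* disp32, no base */
    } else if (mod == 0 && rm == 5) {
        len += 4;                            /* RIP-relative disp32 */
    }
    if (mod == 1) len += 1;                  /* disp8 */
    if (mod == 2) len += 4;                  /* disp32 */

    return len <= avail ? len : 0;
}

/* Returns 1 when the bytes ending at code[ret_off] encode a call
 * instruction, meaning ret_off is a plausible return address. */
static int is_call_preceded(const uint8_t *code, size_t ret_off)
{
    /* E8 rel32: direct near call, always 5 bytes. */
    if (ret_off >= 5 && code[ret_off - 5] == 0xE8) return 1;

    /* FF /2: try every encoded length from 2 to 7 bytes. */
    for (size_t len = 2; len <= 7; len++) {
        if (ret_off >= len &&
            indirect_call_length(code + ret_off - len, len) == len) {
            return 1;
        }
    }
    return 0;
}
```

A return address that points at a bare FF 23 gadget fails this test regardless of which register the gadget uses, because the bytes in front of it belong to whatever unrelated instruction happens to precede the gadget.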

Impossible call sequences

Passing module backing and gadget checks is necessary but not sufficient. The sequence of functions in the stack also needs to make logical sense. A stack showing KERNELBASE!PathReplaceGreedy calling into KERNELBASE!SystemTimeToTzSpecificLocalTimeEx represents an execution path that no legitimate code produces. Synthetic stacks that pick arbitrary return addresses from real modules risk creating function pairings that are structurally valid but logically impossible.

Deep call graph validation is expensive. It requires building and maintaining knowledge of which functions legitimately call which others. Most anti cheats do not perform this analysis today. Elastic’s EDR does, and anti cheat vendors tend to absorb EDR techniques on a delay of a few years.

Stack pivot detection

The Thread Environment Block (TEB) stores each thread’s stack boundaries in two fields: StackBase and StackLimit. If RSP falls outside these bounds, the thread’s stack has been pivoted to memory the attacker controls, typically heap or global data hosting a Return-Oriented Programming (ROP) chain.

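PTEB teb = NtCurrentTeb();
ULONG_PTR lo = (ULONG_PTR)teb->NtTib.StackLimit;  // lowest address of the stack
ULONG_PTR hi = (ULONG_PTR)teb->NtTib.StackBase;   // highest address of the stack
if ((ULONG_PTR)rsp < lo || (ULONG_PTR)rsp > hi) {
    // stack pivot detected
}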

Windows Defender Exploit Guard hooks VirtualAlloc and VirtualProtect specifically to run this bounds check on every call. The check is cheap, fast, and effective. Bypassing it requires the ROP chain to execute on the thread’s actual stack, which severely constrains the attacker’s options.

APC-based kernel stack walking

Everything described so far operates at user mode privilege. Anti cheats that ship kernel drivers have substantially stronger tools, and this is where evasion becomes genuinely difficult.

The most straightforward kernel approach queues an Asynchronous Procedure Call (APC) to a target thread. When the APC fires, it executes in the thread’s own context and has full access to the thread’s stack. Calling RtlCaptureStackBackTrace from inside the APC captures the call chain as it exists at that moment.

Both BattlEye and EAC use APC-based stack capture. The limitation is that threads can block APCs. A thread running with kernel APCs disabled does not execute the queued APC until re-enablement, and cheat developers discover this quickly.

Non-Maskable Interrupt callbacks

Non-Maskable Interrupts (NMIs) exist precisely because some signals must not be blockable. Anti cheats register an NMI callback through KeRegisterNmiCallback and periodically fire NMIs at each Central Processing Unit (CPU) core. When the interrupt lands, the callback captures the full processor state — RIP, RSP, and the complete register context — of whatever code is executing at that instant. If the interrupted RIP sits outside every loaded kernel module, unsigned code is running.

The technique is probabilistic. A single NMI might not catch the target code mid-execution. But NMIs are cheap to send, and over the course of a match, periodic sampling statistically catches any thread that spends meaningful time in unsigned memory. No flag disables NMIs, no callback unhooks them, and no state manipulation prevents them from firing. This makes hiding kernel mode code execution fundamentally hard.
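The sampling argument can be made concrete with a short calculation. Assuming each NMI is an independent sample and the target code occupies a fixed fraction of a core’s time (both simplifications), the chance of at least one hit after n samples is 1 - (1 - p)^n. The function name below is hypothetical.

```c
/* Probability that at least one of `samples` independent samples lands
 * in code that occupies `fraction` (0.0 to 1.0) of execution time. */
static double detection_probability(double fraction, unsigned samples)
{
    double miss = 1.0;

    for (unsigned i = 0; i < samples; i++) {
        miss *= (1.0 - fraction);   /* chance that every sample so far missed */
    }
    return 1.0 - miss;
}
```

Under these assumptions, code that runs only 1% of the time is caught with roughly 95% probability after 300 samples, which is why a low sampling rate sustained across a whole match is enough.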

Instrumentation callbacks

EAC takes a distinct approach to monitoring user mode syscall origins. The EAC driver sets the process’s InstrumentationCallback field, which lives in the KPROCESS structure embedded at the start of EPROCESS and is the same field that NtSetInformationProcess populates for the ProcessInstrumentationCallback information class. When any syscall returns to user mode, the kernel checks this field. If it is set, execution redirects to the callback address instead of the original return site, and the original return address is placed in R10.

// in the instrumentation callback
PVOID returnAddr = (PVOID)teb->InstrumentationCallbackPreviousPc;  // original return address, saved from R10 by the callback stub

if (!isInModule(returnAddr, "ntdll.dll") &&
    !isInModule(returnAddr, "win32u.dll")) {
    // direct syscall detected
}

The callback validates that R10 points into ntdll.dll or win32u.dll, the only modules that should contain the syscall instruction on a legitimate execution path. This catches direct syscalls without placing inline hooks on ntdll functions and without modifying the System Service Descriptor Table (SSDT). A callback fires on every single syscall return, and bypassing it from user mode is extremely difficult because the redirection occurs at kernel privilege.

ETW Threat Intelligence stack traces

The Event Tracing for Windows (ETW) Threat Intelligence (TI) provider, identified by GUID {F4E1897C-BB5D-5668-F1D8-040F4D8DD344}, is restricted to consumers running as Protected Process Light (PPL). Only processes at the Antimalware PPL level, which a vendor obtains by shipping an Early Launch Anti-Malware (ELAM) signed driver, can consume its events. The provider surfaces kernel level visibility into sensitive operations: executable memory allocation, cross-process memory writes, thread context modification, and APC insertion.

When a consumer enables EVENT_ENABLE_PROPERTY_STACK_TRACE, each event carries the full call stack captured through RtlWalkFrameChain. Every NtWriteVirtualMemory, every NtProtectVirtualMemory, every NtCreateThreadEx arrives with a complete record of the code path that invoked it.

Elastic’s EDR uses this infrastructure to tag over 20 behavioral indicators on ETW events: direct syscalls where ntdll is absent from the stack, ROP gadgets in the call chain, unbacked executable memory, proxy calls routed through unexpected modules, and more. Anti cheat vendors have access to the same infrastructure. Their implementations are less publicly documented, but the underlying mechanism is identical.

BattlEye’s shellcode tripwires

BattlEye’s user mode stack walking is the best-documented implementation, largely thanks to secret.club’s reverse engineering work. During competitive matches, BattlEye streams a roughly 8 KB shellcode payload into the game process — what secret.club refers to as “shellcode8kb.” This shellcode installs a Vectored Exception Handler (VEH) and places INT3 breakpoints on a curated list of functions:

GetAsyncKeyState, GetCursorPos, IsBadReadPtr, NtUserGetAsyncKeyState, GetForegroundWindow, CallWindowProcW, NtUserPeekMessage, NtSetEvent, sqrtf, __stdio_common_vsprintf_s, CDXGIFactory::TakeLock, TppTimerpExecuteCallback

When a breakpoint fires, the VEH reads the caller address from the top of the stack and runs four checks:

NtQueryVirtualMemory fails on the caller address — the memory region is being hidden, suggesting a hook
The memory is uncommitted — Virtual Address Descriptor (VAD) manipulation is in play
The memory is executable but not MEM_IMAGE — manually mapped or injected code
The return address contains FF 23 — namazso’s jmp qword ptr [rbx] gadget

If any check triggers, BattlEye sends report ID 0x31 to its servers, including 32 bytes of the calling function’s code and the associated memory metadata.
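The four checks can be summarized as a single classification function. The sketch below is an illustration of the logic as described by secret.club, not BattlEye’s code: caller_info stands in for the result of a memory query plus the first two bytes read at the caller address, and every name in it is hypothetical. The numeric constants mirror the documented MEM_COMMIT, MEM_IMAGE, and PAGE_EXECUTE* values.

```c
#include <stdint.h>

/* Values mirror the documented Windows memory constants. */
enum {
    STATE_COMMIT = 0x1000,
    KIND_IMAGE   = 0x1000000,
    PROT_EXEC_ANY = 0x10 | 0x20 | 0x40 | 0x80   /* PAGE_EXECUTE* bits */
};

/* One bit per reason a caller address gets reported. */
enum {
    FLAG_QUERY_FAILED   = 1 << 0,
    FLAG_NOT_COMMITTED  = 1 << 1,
    FLAG_EXEC_NON_IMAGE = 1 << 2,
    FLAG_GADGET_FF23    = 1 << 3
};

/* Hypothetical snapshot of what the handler learns about the caller. */
typedef struct {
    int      query_ok;        /* did the memory query succeed */
    uint32_t state;
    uint32_t type;
    uint32_t protect;
    uint8_t  first_bytes[2];  /* bytes at the caller address, if readable */
} caller_info;

static unsigned check_caller(const caller_info *c)
{
    unsigned flags = 0;

    if (!c->query_ok) return FLAG_QUERY_FAILED;
    if (c->state != STATE_COMMIT) return FLAG_NOT_COMMITTED;

    /* Executable memory that the loader did not map as an image. */
    if ((c->protect & PROT_EXEC_ANY) && c->type != KIND_IMAGE) {
        flags |= FLAG_EXEC_NON_IMAGE;
    }
    /* FF 23 encodes jmp qword ptr [rbx]. */
    if (c->first_bytes[0] == 0xFF && c->first_bytes[1] == 0x23) {
        flags |= FLAG_GADGET_FF23;
    }
    return flags;
}
```

Any nonzero result corresponds to the condition under which a report would be sent.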

The function list reveals deliberate profiling. It does not consist solely of input functions. sqrtf and __stdio_common_vsprintf_s appear because cheat code routinely uses math routines and string formatting. CDXGIFactory::TakeLock catches DirectX interaction. The selection reflects BattlEye studying which functions cheats actually invoke and placing breakpoints accordingly.

secret.club also notes that BattlEye’s signature for locating CDXGIFactory::TakeLock includes CC padding bytes from a specific compilation, causing the signature to break across different builds. These kinds of small errors limit the effectiveness of an otherwise strong detection approach.

EAC’s kernel stack copying

EAC goes further than BattlEye on the kernel side. The EAC driver does not limit itself to queuing APCs. It copies the raw contents of kernel thread stacks asynchronously and scans them for code pointers referencing non-paged memory that does not belong to any loaded kernel module.

This bypasses the APC limitation entirely. The driver reads the thread’s kernel stack directly from its KTHREAD structure and walks the data offline. If pointers to unsigned memory appear on the stack — even as saved return addresses from functions that have already returned — the thread is flagged. Disabling APCs or keeping a thread in a non-alertable state does nothing to prevent this. The stack is just memory, and EAC reads it without the thread’s cooperation.
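The offline scan reduces to checking each pointer-sized slot of the copied stack against two range lists. The sketch below assumes the scanner already holds a list of executable kernel regions and a list of loaded module ranges; how EAC gathers those is not covered here, and addr_range, in_ranges, and scan_stack_copy are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [begin, end). */
typedef struct {
    uint64_t begin;
    uint64_t end;
} addr_range;

static int in_ranges(const addr_range *r, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= r[i].begin && addr < r[i].end) return 1;
    }
    return 0;
}

/* Walks a copied stack one 8-byte slot at a time. A slot is suspicious
 * when its value points into executable memory that no loaded module
 * owns. Up to max_hits slot indices are written to hits; the return
 * value is the total number of suspicious slots found. */
static size_t scan_stack_copy(const uint64_t *slots, size_t count,
                              const addr_range *exec, size_t nexec,
                              const addr_range *modules, size_t nmodules,
                              size_t *hits, size_t max_hits)
{
    size_t found = 0;

    for (size_t i = 0; i < count; i++) {
        uint64_t value = slots[i];

        if (in_ranges(exec, nexec, value) &&
            !in_ranges(modules, nmodules, value)) {
            if (found < max_hits) hits[found] = i;
            found++;
        }
    }
    return found;
}
```

Because the scan looks at every slot rather than following frames, it also sees stale return addresses left behind by functions that have already returned.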

Combined with instrumentation callbacks for user mode syscall origin validation and RtlLookupFunctionEntry / RtlVirtualUnwind for proper stack trace construction, EAC achieves coverage across both user and kernel transitions. It is arguably the most thorough stack-based detection implementation in the anti cheat space.

Return address spoofing in practice

The accumulation of detection mechanisms pushes evasion toward increasing sophistication. The most basic spoofing technique works as follows: before calling a monitored function, swap the return address on the stack for a pointer to a JOP gadget located in a legitimate module, and stash the real return address in a non-volatile register. When the call returns, the gadget redirects execution back through the register.

This defeats the module backing check because the return address resolves to ntdll.dll or kernel32.dll. It still fails two other checks: gadget detection, since BattlEye scans for FF 23, and the preceding-call check, since the bytes before a gadget do not encode a call. namazso’s original work demonstrates that switching to a different gadget avoids BattlEye’s specific signature. But the structural problem persists: gadget addresses do not follow call instructions, and any anti cheat checking for a preceding call catches this class of spoofing wholesale.

SilentMoonwalk

klezVirus pushes spoofing substantially further with SilentMoonwalk, which constructs entire synthetic stack frames using .pdata metadata. It operates in two distinct modes.

Synthetic mode builds artificial frames that mimic legitimate function calls. Each frame is sized correctly according to the target function’s UNWIND_INFO, so RtlVirtualUnwind produces a coherent trace. The frames point to real functions in real modules with correct frame geometry.

Desync mode identifies problematic frames in the real stack and replaces them with frames from legitimate functions. It specifically searches for UWOP_PUSH_NONVOL entries to find frames that restore RBP on unwind, keeping the frame pointer chain intact.

Both modes use ROP to restore the original stack state after the monitored call completes. While the call is in progress, the stack shows only legitimate modules and plausible call sequences. SilentMoonwalk is the most complete user mode stack spoofing implementation in public existence.

The weakness is in the word “plausible.” Synthetic mode can produce call sequences that are structurally valid — frame sizes match, return addresses point into real modules — but represent execution paths that never actually occur. If an anti cheat maintains a model of legitimate call graphs, synthetic stacks become distinguishable from real ones through logic rather than structure.

Module stomping

Rather than fixing the call stack, module stomping fixes where code resides. The technique loads a legitimate DLL, overwrites its .text section with the attacker’s payload, and executes from that location. Every return address naturally resolves to the stomped module because the attacker’s code physically occupies that module’s memory.

Detection compares the in-memory module against its on-disk counterpart. PE-sieve performs this comparison byte-by-byte. If more than roughly 10,000 bytes differ in the code section, the module is flagged, since legitimate hooks patch far fewer bytes. More advanced variants restore the original bytes after execution, but memory page metadata — specifically the SharedOriginal flag in the working set entry — still changes when the page is written to. Tools like Moneta detect this modification.
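The comparison itself is simple once both copies of the code section are in hand. The sketch below assumes the on-disk bytes have already been adjusted for relocations so that legitimate differences are limited to hooks and patches; the function names are hypothetical, and the threshold is a parameter so the figure of roughly 10,000 bytes quoted above is not baked in.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the positions at which the two buffers disagree. */
static size_t count_differing_bytes(const uint8_t *in_memory,
                                    const uint8_t *on_disk, size_t len)
{
    size_t diff = 0;

    for (size_t i = 0; i < len; i++) {
        if (in_memory[i] != on_disk[i]) diff++;
    }
    return diff;
}

/* Returns 1 when more than `threshold` bytes of the code section differ,
 * which suggests wholesale replacement rather than a few inline hooks. */
static int looks_stomped(const uint8_t *in_memory, const uint8_t *on_disk,
                         size_t len, size_t threshold)
{
    return count_differing_bytes(in_memory, on_disk, len) > threshold;
}
```

A handful of patched bytes from a legitimate hook stays far below any sensible threshold, while a payload written over the section changes most of it.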

Indirect syscalls

Instead of executing the syscall instruction from attacker-owned code, which leaves an address inside the attacker’s own memory as the syscall return address, an indirect syscall sets up the registers itself and then jumps to the syscall instruction inside ntdll.dll. From the kernel’s perspective, the return address points into ntdll, passing the instrumentation callback check.

The limitation is that the jump lands in the middle of the ntdll stub, skipping the function’s entry point along with any inline hook that an EDR product placed there. If an anti cheat validates that the call enters through a function’s actual prologue, not merely that syscall executes from within ntdll’s address range, indirect syscalls alone are insufficient. Passing one check fails another.

The asymmetry that favors defenders

Every evasion technique leaves a residual signal. Return address spoofing leaves gadget patterns and missing call instructions. SilentMoonwalk creates structurally valid but logically implausible call sequences. Module stomping alters page metadata. Direct syscalls leave non-ntdll return addresses in KTRAP_FRAME.

The fundamental asymmetry is that defenders need to find one anomaly across all frames, all calls, and all threads over an entire session, while attackers need every frame, every call, and every thread to appear perfect every single time. NMI callbacks, instrumentation callbacks, and ETW TI stack traces operate at privilege levels that user mode evasion cannot reach.

Intel Control-flow Enforcement Technology (CET) shadow stacks, which Windows supports as Hardware-enforced Stack Protection on CET-capable processors, maintain a hardware-backed copy of return addresses that user mode software cannot modify. The call stack is not losing relevance as a detection vector. Hardware enforcement is making it considerably stronger, and the window for purely software-based stack spoofing narrows with every generation of CET-capable hardware that reaches critical mass.