A lock-free, high-performance logging system designed for Windows kernel drivers that enables efficient data transfer between kernel and user mode. This implementation provides thread-safe logging with guaranteed ordering while minimizing performance impact.
DbgView is a great tool, but it crashed on me several times, which was frustrating enough to prompt this project.
Key features:
- Lock-Free Design: Per-processor ring buffers eliminate inter-processor contention
- Zero-Copy Architecture: Direct memory sharing between kernel and user mode
- Guaranteed Ordering: A global atomic index gives every entry a unique sequence number, so the reader can reconstruct a single total order across processors
- High Performance: Optimized for minimal overhead in kernel mode
- Robust Error Handling: Comprehensive statistics and wraparound handling
- Thread Safety: Safe for concurrent access from any IRQL
The system allocates a separate ring buffer for each logical processor, backed by shared memory that is mapped into both kernel and user mode:
typedef struct _RING_BUFFER_CONTEXTS {
SIZE_T ProcessorCount;
RING_BUFFER_CONTEXT ProcessorBuffer[32];
} RING_BUFFER_CONTEXTS;
Each buffer is managed through a header containing read/write offsets and statistics:
typedef struct _BUFFER_HEADER {
volatile LONG WriteOffset;
volatile LONG ReadOffset;
SIZE_T BufferSize;
BUFFER_STATS Stats;
} BUFFER_HEADER;
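To make the role of the two offsets concrete, here is a minimal user-mode sketch of how occupancy could be derived from them. It assumes both offsets are byte positions in the range [0, BufferSize) and that one byte is kept unused so a full buffer can be told apart from an empty one; neither detail is stated above. The helper names `ring_used` and `ring_free` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes currently waiting to be read.
   Assumption: both offsets are byte positions in [0, size). */
static size_t ring_used(long write_off, long read_off, size_t size)
{
    assert(size > 0);
    assert(write_off >= 0 && (size_t)write_off < size);
    assert(read_off >= 0 && (size_t)read_off < size);
    if (write_off >= read_off)
        return (size_t)(write_off - read_off);
    /* The writer has wrapped past the end; the reader has not yet. */
    return size - (size_t)(read_off - write_off);
}

/* Bytes a writer could claim without overtaking the reader.
   Assumption: one byte stays unused so that full != empty. */
static size_t ring_free(long write_off, long read_off, size_t size)
{
    return size - 1 - ring_used(write_off, read_off, size);
}
```

A monitoring tool could call these on a snapshot of `WriteOffset` and `ReadOffset` to estimate how close a given processor's buffer is to overwriting unread entries.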
Each log entry includes metadata for ordering and identification:
typedef struct _LOG_ENTRY {
UINT64 Timestamp; // QPC timestamp
UINT64 Index; // Global sequential index
UINT32 Processor; // CPU ID
UINT32 LogLevel; // Log severity
UINT32 Length; // Data length
UINT32 Remarks; // Additional metadata
CHAR Data[1]; // Variable-length log data
} LOG_ENTRY;
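Because `Data` is a variable-length tail, the space an entry occupies depends on `Length`. The sketch below shows one way to compute it, using a user-mode mirror of the structure with fixed-width types. The 8-byte rounding is an assumption made for illustration (it keeps the 64-bit fields of the next entry aligned); the names `LOG_ENTRY_SKETCH`, `ENTRY_HEADER_SIZE`, and `entry_total_size` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-mode mirror of LOG_ENTRY using fixed-width types. */
typedef struct {
    uint64_t Timestamp;
    uint64_t Index;
    uint32_t Processor;
    uint32_t LogLevel;
    uint32_t Length;
    uint32_t Remarks;
    char     Data[1];
} LOG_ENTRY_SKETCH;

/* Header bytes that precede the payload. */
#define ENTRY_HEADER_SIZE offsetof(LOG_ENTRY_SKETCH, Data)

static_assert(ENTRY_HEADER_SIZE == 32, "unexpected padding before Data");

/* Total bytes an entry occupies, rounded up to a multiple of 8.
   The rounding is an assumption, not something the source specifies. */
static size_t entry_total_size(uint32_t data_length)
{
    size_t raw = ENTRY_HEADER_SIZE + (size_t)data_length;
    return (raw + 7u) & ~(size_t)7u;
}
```

Using `offsetof` rather than `sizeof` avoids counting the one-byte `Data` placeholder and any trailing padding the compiler adds to the structure.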
Initialize the logging system with the desired buffer size (10 MiB in this example):
PRING_BUFFER_CONTEXTS contexts = (PRING_BUFFER_CONTEXTS)driver_control::enable_logger((1024 * 1024) * 10);
Simple logging with formatted message:
LogMessage("[Module] Message: %s", message);
Raw data logging with custom type:
SendData(TYPE_CUSTOM, remarks, buffer, length);
Set up a callback function:
void LogCallback(PLOG_ENTRY entry, SIZE_T Length) {
// Process log entry
printf("[CPU%u][%llu] %.*s",
entry->Processor,
(unsigned long long)entry->Index,
(int)entry->Length, // precision for %.*s must be an int
entry->Data);
}
Start reading logs:
ReadLogs(contexts, LogCallback, TRUE); // TRUE for continuous reading
When writing a log entry, the kernel-side writer:
- Raises IRQL to DISPATCH_LEVEL to prevent preemption
- Atomically increments global index
- Handles buffer wraparound if necessary
- Uses InterlockedCompareExchange for thread-safe offset updates
- Updates statistics
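The write steps listed above can be approximated in portable C. This is a user-mode sketch, not the driver's code: it uses C11 atomics where the driver would use `InterlockedCompareExchange` and an interlocked increment, it skips the IRQL handling entirely, and it does not model overwriting unread entries. The names `ring_sketch`, `g_next_index`, and `reserve_entry` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    atomic_long  write_offset;  /* next free byte in this buffer */
    size_t       size;          /* total buffer bytes */
    atomic_llong wrapped_count; /* times the writer wrapped to offset 0 */
} ring_sketch;

static atomic_ullong g_next_index; /* global sequence shared by all buffers */

/* Claims `len` contiguous bytes. Returns the start offset of the claim
   and stores the entry's global index in *index_out. */
static long reserve_entry(ring_sketch *r, size_t len, uint64_t *index_out)
{
    assert(len > 0 && len <= r->size);
    *index_out = atomic_fetch_add(&g_next_index, 1);

    long old_off = atomic_load(&r->write_offset);
    for (;;) {
        long start = old_off;
        int wrapped = 0;
        if ((size_t)start + len > r->size) { /* would run off the end */
            start = 0;
            wrapped = 1;
        }
        long new_off = start + (long)len;
        if (atomic_compare_exchange_weak(&r->write_offset, &old_off, new_off)) {
            if (wrapped)
                atomic_fetch_add(&r->wrapped_count, 1);
            return start;
        }
        /* CAS failed: old_off now holds the current value, so retry. */
    }
}
```

The compare-and-swap loop matters even with per-processor buffers, since it keeps the offset update correct if two writers ever touch the same buffer.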
When reading, the user-mode reader:
- Scans all processor buffers
- Finds entries with next sequential index
- Handles wrapped entries
- Updates read offset after successful processing
- Maintains ordering using global index
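The reader steps above amount to a merge across per-processor queues keyed on the global index. The following sketch models each buffer as a plain array of pending indices so the ordering logic can be seen in isolation; real entries, wraparound, and the callback are left out. The names `cpu_queue`, `find_next`, and `drain_in_order` are hypothetical, and stopping at the first missing index is an assumption about how gaps are treated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint64_t *indices; /* global indices of pending entries, oldest first */
    size_t count;            /* number of pending entries */
    size_t cursor;           /* next unread position (stands in for ReadOffset) */
} cpu_queue;

/* Returns the number of the queue whose next unread entry carries
   `expected`, or -1 if no queue has it yet. */
static int find_next(const cpu_queue *queues, size_t nqueues, uint64_t expected)
{
    for (size_t i = 0; i < nqueues; i++) {
        const cpu_queue *q = &queues[i];
        if (q->cursor < q->count && q->indices[q->cursor] == expected)
            return (int)i;
    }
    return -1;
}

/* Consumes entries in global order, recording the source queue of each
   in out_cpu. Stops at the first gap. Returns the number consumed. */
static size_t drain_in_order(cpu_queue *queues, size_t nqueues,
                             uint64_t first_index, int *out_cpu, size_t out_cap)
{
    assert(queues != NULL && out_cpu != NULL);
    size_t emitted = 0;
    uint64_t expected = first_index;
    while (emitted < out_cap) {
        int i = find_next(queues, nqueues, expected);
        if (i < 0)
            break;               /* entry not written yet, or lost */
        out_cpu[emitted++] = i;  /* a real reader would invoke the callback here */
        queues[i].cursor++;      /* advance only after successful processing */
        expected++;
    }
    return emitted;
}
```

A production reader would also need a policy for gaps that never fill, for example skipping ahead once statistics show that entries were lost or overwritten.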
Performance characteristics:
- Zero contention between processors during writes
- Minimal synchronization overhead
- Direct memory access without system calls
- Efficient handling of buffer wraparound
- No memory allocation during normal operation
The system maintains comprehensive statistics:
typedef struct _BUFFER_STATS {
volatile LONG64 LostLogCount; // Write failures
volatile LONG64 ReadLogCount; // Successfully read
volatile LONG64 WrittenLogCount; // Successfully written
volatile LONG64 OverwrittenCount; // Overwritten logs
volatile LONG64 TooBigErrorCount; // Size limit exceeded
volatile LONG64 WrapedCount; // Buffer wraps
} BUFFER_STATS;
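As one possible way to use these counters, the sketch below turns a snapshot of them into a drop rate per thousand attempted writes. How the counters relate to each other is an assumption here: it treats attempts as written + lost + too-big, and assumes overwritten entries were first counted as written. The names `stats_snapshot` and `dropped_per_mille` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Plain copy of the BUFFER_STATS counters taken at one point in time. */
typedef struct {
    int64_t LostLogCount;
    int64_t ReadLogCount;
    int64_t WrittenLogCount;
    int64_t OverwrittenCount;
    int64_t TooBigErrorCount;
    int64_t WrapedCount;
} stats_snapshot;

/* Entries that never reached the reader, per 1000 attempted writes.
   Assumption: attempts = written + lost + too-big, and overwritten
   entries were counted as written before being dropped. */
static int64_t dropped_per_mille(const stats_snapshot *s)
{
    int64_t attempts = s->WrittenLogCount + s->LostLogCount + s->TooBigErrorCount;
    assert(attempts >= 0);
    if (attempts == 0)
        return 0;
    int64_t dropped = s->LostLogCount + s->TooBigErrorCount + s->OverwrittenCount;
    return dropped * 1000 / attempts;
}
```

A steadily rising value suggests the buffers are too small for the workload or the reader is falling behind.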
This logging system is particularly useful for:
- Kernel-mode debugging and diagnostics
- High-performance event logging
- Kernel-to-user mode data transfer
- Network packet capture
- System monitoring and profiling
Recommended practices:
- Size buffers appropriately for your workload
- Monitor statistics to detect buffer overflow
- Process logs promptly in user mode
- Handle wraparound conditions properly
- Consider log levels for filtering
The system includes robust error handling for:
- Buffer overflow conditions
- Invalid parameters
- Memory allocation failures
- Wraparound scenarios
- Missing or corrupted data
Known limitations:
- Maximum 32 processors supported by default
- Buffer size must be pre-allocated
- Older logs may be overwritten if not read quickly enough
- Memory usage scales with processor count