gh-128002: use per threads tasks linked list in asyncio #128869

Open. Wants to merge 4 commits into base: main

kumaraditya303 (Contributor) commented Jan 15, 2025

Use a per-thread linked list of tasks in asyncio. This design allows lock-free registering and unregistering of tasks for loops running concurrently in different threads. When all_tasks is called, the calling thread uses the stop-the-world pause to traverse the task lists of all threads. Per the benchmarks, this has no performance impact on regular builds and is slightly faster on free-threading builds. The pyperformance benchmarks aren't a good measure here because they use just one thread, so there is little lock contention; the change performs much better when multiple threads are running.
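To make the design concrete, here is a minimal Python sketch of the idea, not the C implementation in this PR. All names in it (`Node`, `ThreadTaskList`, `current_list`, `all_tasks_world_stopped`) are hypothetical and chosen for illustration. Pure Python has no way to pause other threads, so the traversal function simply assumes the caller has already stopped the world; in the sketch that means calling it only after the worker threads have been joined.

```python
import threading


class Node:
    """Intrusive doubly linked list node; stands in for a task object."""

    def __init__(self, name=None):
        self.name = name
        self.prev = self
        self.next = self


class ThreadTaskList:
    """Circular doubly linked list with a sentinel head, owned by one thread."""

    def __init__(self):
        self.head = Node()

    def register(self, node):
        # Only the owning thread calls this, so no lock is taken.
        head = self.head
        node.prev = head
        node.next = head.next
        head.next.prev = node
        head.next = node

    def unregister(self, node):
        # O(1) unlink, again performed only by the owning thread.
        node.prev.next = node.next
        node.next.prev = node.prev
        node.prev = node.next = node

    def __iter__(self):
        cur = self.head.next
        while cur is not self.head:
            yield cur
            cur = cur.next


_local = threading.local()
_all_lists = []
# Assumption of this sketch: a lock is taken once per thread to publish
# its list, never per task.
_all_lists_lock = threading.Lock()


def current_list():
    """Return the calling thread's task list, creating it on first use."""
    lst = getattr(_local, "tasks", None)
    if lst is None:
        lst = _local.tasks = ThreadTaskList()
        with _all_lists_lock:
            _all_lists.append(lst)
    return lst


def all_tasks_world_stopped():
    """Collect tasks from every thread's list.

    Safe only if no other thread is mutating its list, which is what the
    stop-the-world pause guarantees in the real implementation.
    """
    return [node for lst in list(_all_lists) for node in lst]
```

Each worker thread would call `current_list().register(node)` when a task starts and `unregister(node)` when it finishes. The cost of coordination moves entirely onto the comparatively rare `all_tasks` call.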

On free-threading:

| Benchmark | bm-20250111-linux-x86_64-python-3a570c6d58bd5ad7d7c1-3.14.0a3+-3a570c6 | bm-20250112-linux-x86_64-kumaraditya303-per_thread_tasks-3.14.0a3+-cca4057 |
|---|---|---|
| async_tree_cpu_io_mixed_tg | 556 ms | 536 ms: 1.04x faster |
| async_tree_none_tg | 303 ms | 294 ms: 1.03x faster |
| coroutines | 26.2 ms | 25.5 ms: 1.03x faster |
| async_tree_memoization_tg | 397 ms | 387 ms: 1.03x faster |
| async_tree_cpu_io_mixed | 598 ms | 583 ms: 1.03x faster |
| async_generators | 498 ms | 486 ms: 1.02x faster |
| async_tree_io | 748 ms | 733 ms: 1.02x faster |
| async_tree_io_tg | 696 ms | 682 ms: 1.02x faster |
| async_tree_memoization | 442 ms | 436 ms: 1.02x faster |
| async_tree_none | 349 ms | 344 ms: 1.01x faster |
| Geometric mean | (ref) | 1.02x faster |

Benchmark hidden because not significant (1): asyncio_websockets
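As a sanity check on the summary row, the geometric mean can be recomputed from the ten visible rows. This is only a sketch: the hidden asyncio_websockets benchmark is left out because its timings are not shown, and whether the reported figure includes it is not stated here. The variable names are illustrative.

```python
import statistics

# (baseline ms, per-thread-list branch ms), copied from the results above.
timings = {
    "async_tree_cpu_io_mixed_tg": (556, 536),
    "async_tree_none_tg": (303, 294),
    "coroutines": (26.2, 25.5),
    "async_tree_memoization_tg": (397, 387),
    "async_tree_cpu_io_mixed": (598, 583),
    "async_generators": (498, 486),
    "async_tree_io": (748, 733),
    "async_tree_io_tg": (696, 682),
    "async_tree_memoization": (442, 436),
    "async_tree_none": (349, 344),
}

# Speedup is baseline time divided by new time, so values above 1 are faster.
speedups = [base / new for base, new in timings.values()]
geometric_mean = statistics.geometric_mean(speedups)

print(f"{geometric_mean:.2f}x faster")  # prints "1.02x faster"
```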

1st1 (Member) commented Jan 15, 2025

First, great work on this; this is legitimately a cool PR.

That said, I'm feeling really uneasy about _PyEval_StopTheWorld. Maybe instead of this approach we could try spin locks plus a custom mini hash map data structure? That would obviously be a lot more work, but it would be a cleaner and ultimately better-performing approach.

Also please wait for reviews from @pablogsal and @ambv. I'm curious if this would make external introspection harder.
