
Per Script Invocation Lua Memory Limits #903

Open: kevin-montrose wants to merge 29 commits into main

kevin-montrose (Contributor) commented Jan 7, 2025

Another decent sized one, though hopefully this is the last "big" Lua PR - the rest I can foresee should be smaller.

TODOs:

  • Custom allocators working
  • New config options for allocators and memory limits
  • Update benchmarks
  • Get final benchmark numbers
  • Answer open questions
    • Are memory pressure updates necessary? Opened a thread with the .NET GC folks for this. Got our answer: the calls are correct to have here.
    • Behavior when scripts are aborted? Redis is weird here. It's reasonable for writes that happened pre-abort to still happen. We can explore rollback if there's a pressing need, but it's non-trivial.

This introduces the ability to specify maximum memory limits for Lua scripts; currently this is a single config (--lua-script-memory-limit). To enable this we also have to introduce custom allocators for Lua (--lua-memory-management-mode). There are 3 in this PR: Native (the current behavior, where Lua provides the allocator), Tracked (where memory is acquired with NativeMemory and GC pressure is updated), and Managed (where an array on the POH, the Pinned Object Heap, is pre-allocated and memory is obtained from a free list punned over that allocation).

In order to gracefully handle Lua OOMs, more of LuaRunner's operation (things like compilation and the preamble) now runs inside Lua PCalls. This is a necessary change: Lua's default behavior when it runs out of memory outside a protected call is to abort the process, and PCalls prevent that.
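
Why a protected call helps: when Lua is built as C, `lua_pcall` sets a recovery point with `setjmp`, and errors (including `LUA_ERRMEM`) `longjmp` back to it; with no recovery point, Lua calls its panic function and then aborts. The standalone C analogue below mimics that mechanism without Lua itself. `protected_call`, `checked_alloc`, `run_script`, `MEMORY_LIMIT` and the status values are hypothetical stand-ins, not Lua or Garnet APIs.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes, loosely modeled on LUA_OK / LUA_ERRMEM. */
enum { STATUS_OK = 0, STATUS_ERRMEM = 4 };

#define MEMORY_LIMIT 1024               /* hypothetical per-script limit, bytes */

static jmp_buf *current_handler = NULL; /* innermost protected call, if any */
static volatile int raised_status = STATUS_OK;

/* Raise an error: unwind to the nearest protected call if there is one,
   otherwise give up on the whole process, as unprotected Lua does. */
static void raise_error(int status) {
    if (current_handler == NULL) {
        fprintf(stderr, "unprotected error %d, aborting\n", status);
        abort();
    }
    raised_status = status;
    longjmp(*current_handler, 1);
}

/* Allocation that raises instead of returning NULL. */
static void *checked_alloc(size_t n) {
    void *p = (n <= MEMORY_LIMIT) ? malloc(n) : NULL;
    if (p == NULL) {
        raise_error(STATUS_ERRMEM);
    }
    return p;
}

/* Run fn(arg) under a recovery point; returns STATUS_OK or the raised status. */
static int protected_call(void (*fn)(void *), void *arg) {
    jmp_buf handler;
    jmp_buf *previous = current_handler; /* not modified after setjmp */
    current_handler = &handler;
    raised_status = STATUS_OK;
    if (setjmp(handler) == 0) {
        fn(arg);
    }
    current_handler = previous;          /* restore on success and on error */
    return raised_status;
}

/* Hypothetical script body that allocates *(size_t *)arg bytes. */
static void run_script(void *arg) {
    void *buffer = checked_alloc(*(size_t *)arg);
    free(buffer);
}
```

Calling `protected_call(run_script, &n)` with `n = 4096` returns `STATUS_ERRMEM` instead of terminating, and the process can carry on serving the next request; calling `run_script` directly with the same size would abort.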

To make the PCall changes less expensive (and just generally less awful), I introduced some GCHandles (Strong, not Pinned), function pointers, and trampolines. At the end of this, we're basically just using KeraLua to package Lua and define some constants; none of KeraLua's .NET code is really running anymore. If we really wanted, we could build Lua ourselves (maybe even drop down to 5.1 to match Redis) and exploit that tight coupling, but I have no intention of doing so at this time.
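
The handle + function pointer + trampoline arrangement is essentially the standard C callback idiom: a plain function pointer carries no state, so the state travels next to it as an opaque pointer (on the .NET side, a `GCHandle` converted with `GCHandle.ToIntPtr`), and a static trampoline recovers it. In Lua such a pointer can ride along as the allocator's `ud` argument or as a light userdata upvalue. A hypothetical C sketch; `Runner`, `trampoline` and `invoke_callback` are illustrative names, not Garnet or KeraLua APIs.

```c
#include <stddef.h>

/* Hypothetical per-session state the callback needs (in Garnet, the object
   reachable through the GCHandle). */
typedef struct Runner {
    int calls;     /* how many times the callback ran */
    int last_arg;  /* last argument the native side passed in */
} Runner;

/* Shape of callback a C library accepts: no closures, only a context pointer. */
typedef int (*callback_fn)(void *context, int arg);

/* Stand-in for the native side (e.g. Lua) calling back into the host. */
static int invoke_callback(callback_fn fn, void *context, int arg) {
    return fn(context, arg);
}

/* The trampoline: a static, stateless entry point that recovers the real
   receiver from the opaque context and forwards the call to it. */
static int trampoline(void *context, int arg) {
    Runner *runner = (Runner *)context;
    runner->calls++;
    runner->last_arg = arg;
    return arg * 2;
}
```

Because the trampoline is an ordinary static function, its address can be taken once and reused; only the context pointer changes per session.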

When improving the Lua OOM RESP error, I also found a bug around buffer management in a previous PR; it is fixed in this commit.

The Allocators

Native

This is the default.

This just uses the built-in Lua allocator, which is a thin shim over the C runtime's realloc/free. It should perform a bit better than Tracked simply because there isn't any .NET code in the way.
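
For reference, the allocator that `luaL_newstate` installs is essentially the function below, adapted from `l_alloc` in Lua's `lauxlib.c` and renamed `native_alloc` so it compiles without Lua headers. It is the whole `lua_Alloc` contract that the custom allocators also have to honor: `nsize == 0` means free and return NULL, anything else behaves like `realloc`.

```c
#include <stdlib.h>

/* Same shape as lua_Alloc: ud is an opaque user pointer, ptr the block being
   resized (or NULL), osize its old size, nsize the requested size. */
static void *native_alloc(void *ud, void *ptr, size_t osize, size_t nsize) {
    (void)ud;
    (void)osize;      /* the C runtime tracks block sizes itself */
    if (nsize == 0) {
        free(ptr);    /* free(NULL) is a no-op */
        return NULL;
    }
    return realloc(ptr, nsize);  /* realloc(NULL, n) acts like malloc(n) */
}
```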

Native does not support memory limits.

Tracked

A thin wrapper over NativeMemory. It supports memory limits, and will fail once the total requested bytes exceed the configured limit. Since it cannot see the overhead of NativeMemory, the limit is only softly enforced.

This also calls GC.(Add|Remove)MemoryPressure to inform the GC of these (potentially fairly large) allocations.
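
The bookkeeping behind Tracked can be sketched in C with the same `lua_Alloc` shape. This is a sketch under stated assumptions, not Garnet's code: `TrackedState` and `tracked_alloc` are hypothetical, `malloc`/`realloc` stand in for `NativeMemory`, and the `GC.AddMemoryPressure`/`GC.RemoveMemoryPressure` calls have no C counterpart and are left out. It counts only the bytes Lua requests, so allocator overhead escapes the limit, which is what makes it soft.

```c
#include <stdlib.h>

/* Hypothetical per-script bookkeeping, passed to the allocator as `ud`. */
typedef struct TrackedState {
    size_t used;   /* bytes currently requested by Lua */
    size_t limit;  /* 0 means "no limit" */
} TrackedState;

static void *tracked_alloc(void *ud, void *ptr, size_t osize, size_t nsize) {
    TrackedState *state = (TrackedState *)ud;
    /* When ptr is NULL, osize is not a block size (Lua 5.2+ passes the
       object type there), so treat the old size as 0. */
    size_t old_size = (ptr == NULL) ? 0 : osize;

    if (nsize == 0) {
        free(ptr);
        state->used -= old_size;
        return NULL;
    }

    /* Refuse growth past the limit; returning NULL makes Lua raise LUA_ERRMEM. */
    if (state->limit != 0 && nsize > old_size &&
        state->used - old_size + nsize > state->limit) {
        return NULL;
    }

    void *result = realloc(ptr, nsize);
    if (result != NULL) {
        state->used = state->used - old_size + nsize;
    }
    return result;
}
```

Only growth is refused; shrinking and freeing always go through, so a script that hits the limit can still release memory while the error unwinds.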

Managed (w/ and w/o Limits)

A really basic free-list based allocator over a POH array. It pre-allocates the total limit and, if one is configured, strictly enforces it, since the allocator's own overhead is visible to it.

If a limit is not configured, 2MB (or larger, if the requested size exceeds 2MB) arrays are allocated as needed.

We could certainly do a lot better here (I imagine there's something existing in Garnet I could steal or repurpose), but this is mostly a proof that we could get Lua 100% onto the managed heap. That said, I couldn't help but profile it a little, so it shouldn't be awful given Lua's allocation patterns.
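
To make "a free list punned over one pre-allocated buffer" concrete, here is a generic first-fit, address-ordered free list with coalescing in C. It is an assumption about the general technique, not Garnet's implementation (which lives in C# over a POH `byte[]`), and the 2MB on-demand growth of the no-limit case is not shown. `Arena`, `arena_malloc`, `arena_free` and `managed_alloc` are hypothetical names. Because every block header lives inside the buffer, the overhead is paid out of the same budget, which is why this style of allocator can enforce a limit exactly.

```c
#include <stddef.h>
#include <string.h>

/* Header at the start of every block; size includes the header itself. */
typedef struct Block {
    size_t size;
    struct Block *next;   /* next free block, only meaningful while free */
} Block;

/* Hypothetical arena: one pre-allocated buffer plus an address-ordered free list. */
typedef struct Arena {
    Block *free_list;
} Arena;

#define ALIGN _Alignof(max_align_t)
#define ROUND_UP(n) (((n) + ALIGN - 1) & ~(size_t)(ALIGN - 1))
#define HEADER ROUND_UP(sizeof(Block))

/* The whole buffer starts as one free block (buffer must be max_align_t
   aligned and larger than HEADER). */
static void arena_init(Arena *arena, void *buffer, size_t size) {
    Block *first = (Block *)buffer;
    first->size = size & ~(size_t)(ALIGN - 1);
    first->next = NULL;
    arena->free_list = first;
}

/* First fit; split the block when the leftover can hold a header plus payload. */
static void *arena_malloc(Arena *arena, size_t n) {
    size_t need = HEADER + ROUND_UP(n == 0 ? 1 : n);
    Block **link = &arena->free_list;
    for (Block *b = *link; b != NULL; link = &b->next, b = b->next) {
        if (b->size < need) continue;
        if (b->size - need >= HEADER + ALIGN) {
            Block *rest = (Block *)((unsigned char *)b + need);
            rest->size = b->size - need;
            rest->next = b->next;
            *link = rest;
            b->size = need;
        } else {
            *link = b->next;
        }
        return (unsigned char *)b + HEADER;
    }
    return NULL;  /* nothing large enough: the limit is enforced exactly */
}

/* Reinsert in address order and merge with touching neighbours. */
static void arena_free(Arena *arena, void *p) {
    if (p == NULL) return;
    Block *b = (Block *)((unsigned char *)p - HEADER);
    Block *prev = NULL, *cur = arena->free_list;
    while (cur != NULL && cur < b) { prev = cur; cur = cur->next; }
    b->next = cur;
    if (cur != NULL && (unsigned char *)b + b->size == (unsigned char *)cur) {
        b->size += cur->size;
        b->next = cur->next;
    }
    if (prev == NULL) {
        arena->free_list = b;
    } else if ((unsigned char *)prev + prev->size == (unsigned char *)b) {
        prev->size += b->size;
        prev->next = b->next;
    } else {
        prev->next = b;
    }
}

/* lua_Alloc-shaped entry point with the arena passed as ud. */
static void *managed_alloc(void *ud, void *ptr, size_t osize, size_t nsize) {
    Arena *arena = (Arena *)ud;
    (void)osize;                         /* block headers already record sizes */
    if (nsize == 0) { arena_free(arena, ptr); return NULL; }
    size_t old_payload = 0;
    if (ptr != NULL) {
        old_payload = ((Block *)((unsigned char *)ptr - HEADER))->size - HEADER;
        if (nsize <= old_payload) return ptr;  /* shrink in place, never fails */
    }
    void *fresh = arena_malloc(arena, nsize);
    if (fresh != NULL && ptr != NULL) {
        memcpy(fresh, ptr, old_payload);       /* growing: copy the old payload */
        arena_free(arena, ptr);
    }
    return fresh;                        /* NULL leaves ptr untouched, like realloc */
}
```

Coalescing on free matters for Lua's pattern of many short-lived small objects: without it, the buffer fragments into slivers that cannot satisfy a later large table or string.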

Open Questions

Is GC pressure actually needed in the Tracked case?

Docs say:

The AddMemoryPressure and RemoveMemoryPressure methods improve performance only for types that exclusively depend on finalizers to release the unmanaged resources. It's not necessary to use these methods in types that follow the dispose pattern, where finalizers are used to clean up unmanaged resources only in the event that a consumer of the type forgets to call Dispose.

This makes very little sense to me: in a container (like a job) with memory limits, the presence or absence of a finalizer seems irrelevant to whether the GC needs to be informed of native allocations.

Ultimately the .NET GC folks will just have to answer this one; I've opened a thread with them.

The docs are (somewhat) incorrect here and will be updated. It is correct, but not strictly necessary, to have these calls in the Tracked case. I'm leaving them in so the GC can respond more promptly to memory pressure.

What is expected behavior when a script is aborted?

This change introduces a case where a script might be aborted, and I expect future changes (timeouts, and potentially SCRIPT|KILL) to add more.

Redis doesn't allow this - you are expected to let Redis crash, or force a shutdown, if a script goes out of control. That's kind of nuts, IMO, especially for any HA service.

However, by deviating from Redis (with this opt-in switch), we do need to define expected behavior.

Right now, the behavior is "any commands the script already executed stay executed". Individual commands cannot half-execute, but scripts basically can.

Is this acceptable, or do we need some (presumably configurable) rollback behavior?

With transactions enabled we already know the scope of what needs to be rolled back, but the implementation would be non-trivial.

Decision: No rollbacks

Summarizing some discussion:

If we are running in non-transaction mode, then we view Lua scripts as logically no different from a client issuing a sequence of calls, so persisting the commands that happened pre-abort is the only behavior that makes sense.

If we are in transaction mode, some users may expect atomicity, but providing it is harder because we do not have the "before image" of keys stored anywhere. It is perfectly fine to document that there is no rollback in this situation: we will simply unlock the keys and "succeed" the partial transaction.
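
The decided semantics can be modeled in a few lines of C. Everything here is a hypothetical toy (integer slots instead of keys, a command budget standing in for an OOM, one flag standing in for key locks), but it shows the intended behavior: each command applies whole, an abort stops the script where it is, earlier writes persist, and the locks are released either way.

```c
#include <stddef.h>

/* Hypothetical toy store: a fixed array of integer slots standing in for keys. */
#define KEYS 8
typedef struct Store { int values[KEYS]; int locked; } Store;

/* One script command: SET values[key] = value. */
typedef struct Command { int key; int value; } Command;

typedef enum { SCRIPT_OK, SCRIPT_ABORTED } ScriptResult;

/* Run commands in order; `budget` commands may run before a simulated OOM. */
static ScriptResult run_script(Store *store, const Command *cmds, size_t count,
                               size_t budget, size_t *executed) {
    ScriptResult result = SCRIPT_OK;
    store->locked = 1;                 /* transaction mode: lock keys up front */
    *executed = 0;
    for (size_t i = 0; i < count; i++) {
        if (i >= budget) {             /* stand-in for an OOM mid-script */
            result = SCRIPT_ABORTED;
            break;                     /* no before-image, so nothing to undo */
        }
        store->values[cmds[i].key] = cmds[i].value;  /* whole command or nothing */
        (*executed)++;
    }
    store->locked = 0;                 /* unlock whether or not we aborted */
    return result;
}
```

With three SETs and a budget of two, the script reports an abort, the first two writes remain visible, and the third never happens.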

Benchmarks

I changed ScriptOperations to use LuaParams instead of OperationParams, as we were already ignoring most of the operation variants there. Now all Lua-related benchmarks run with each allocator configuration: Native (the old behavior, and current default), Tracked w/ 2M limit, Tracked w/o a limit, Managed w/ 2M limit, and Managed w/o a limit.

main results are as of ce21c248f084744e45bbff08d0ecce0a51326cca.
luaMemoryLimits are as of 4c5a2e9893f25895c4ab33bc4e60425a431e7015.

Broadly speaking, we're giving up a bit of perf for the ability to recover from OOMs (and other runtime errors, technically). There's some work that could be done to claw bits of this back, in theory, but we are actually doing more with this change.

Having now done a fair amount of profiling and cleanup, most benchmark results are either a wash or a slight improvement after this PR.

LuaRunnerOperations

Comparing the baseline (main) and the Native,None case, we're giving up some perf (~17%) in the CompileForSessionSmall case. We're improving everywhere else, and since the small case is so small (literally no Garnet ops, no calculations, just a return nil), I think that's fine.
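
The percentages are relative changes in mean time against main, Δ = (t_new - t_main) / t_main. For the largest regression and the largest improvement in the tables that follow:

```latex
\begin{aligned}
\Delta_{\text{CompileForSessionSmall}} &= \frac{1938.35 - 1663.9}{1663.9} \approx +0.165 && (\approx 17\%\ \text{slower}) \\
\Delta_{\text{ConstructLarge}} &= \frac{90461.10 - 99759.4}{99759.4} \approx -0.093 && (\approx 9\%\ \text{faster})
\end{aligned}
```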

main

| Method | Params | Mean | Error | StdDev | Median | Allocated |
|---|---|---|---|---|---|---|
| ResetParametersSmall | None | 102.5 ns | 0.51 ns | 0.43 ns | 102.6 ns | - |
| ResetParametersLarge | None | 103.4 ns | 0.50 ns | 0.47 ns | 103.4 ns | - |
| ConstructSmall | None | 97,641.9 ns | 609.86 ns | 540.63 ns | 97,877.1 ns | 344 B |
| ConstructLarge | None | 99,759.4 ns | 1,113.26 ns | 1,041.34 ns | 99,650.9 ns | 3408 B |
| CompileForSessionSmall | None | 1,663.9 ns | 32.35 ns | 57.50 ns | 1,689.1 ns | - |
| CompileForSessionLarge | None | 34,445.3 ns | 222.75 ns | 208.36 ns | 34,498.1 ns | - |

luaMemoryLimits

| Method | Params | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| ResetParametersSmall | Managed,Limit | 92.39 ns | 1.688 ns | 2.677 ns | - | - | - | - |
| ResetParametersLarge | Managed,Limit | 88.10 ns | 1.670 ns | 1.989 ns | - | - | - | - |
| ConstructSmall | Managed,Limit | 122,774.18 ns | 2,438.704 ns | 4,756.505 ns | 3.6621 | 3.6621 | 3.6621 | 2097586 B |
| ConstructLarge | Managed,Limit | 125,027.68 ns | 2,391.228 ns | 2,846.587 ns | 3.6621 | 3.6621 | 3.6621 | 2100652 B |
| CompileForSessionSmall | Managed,Limit | NA | NA | NA | NA | NA | NA | NA |
| CompileForSessionLarge | Managed,Limit | 33,452.03 ns | 104.027 ns | 86.867 ns | - | - | - | - |
| ResetParametersSmall | Managed,None | 88.40 ns | 1.399 ns | 1.309 ns | - | - | - | - |
| ResetParametersLarge | Managed,None | 88.20 ns | 0.863 ns | 0.765 ns | - | - | - | - |
| ConstructSmall | Managed,None | 124,249.15 ns | 2,458.123 ns | 4,369.311 ns | 3.4180 | 3.4180 | 3.4180 | 2097658 B |
| ConstructLarge | Managed,None | 126,401.81 ns | 2,483.423 ns | 2,657.235 ns | 3.6621 | 3.6621 | 3.6621 | 2100725 B |
| CompileForSessionSmall | Managed,None | 401,576.56 ns | 68,404.658 ns | 201,692.693 ns | - | - | - | - |
| CompileForSessionLarge | Managed,None | 34,420.78 ns | 616.724 ns | 576.884 ns | - | - | - | - |
| ResetParametersSmall | Native,None | 86.62 ns | 1.191 ns | 1.114 ns | - | - | - | - |
| ResetParametersLarge | Native,None | 84.51 ns | 0.244 ns | 0.216 ns | - | - | - | - |
| ConstructSmall | Native,None | 96,180.62 ns | 930.510 ns | 824.873 ns | - | - | - | 312 B |
| ConstructLarge | Native,None | 90,461.10 ns | 309.843 ns | 241.905 ns | - | - | - | 3376 B |
| CompileForSessionSmall | Native,None | 1,938.35 ns | 8.084 ns | 7.166 ns | - | - | - | - |
| CompileForSessionLarge | Native,None | 33,549.45 ns | 252.168 ns | 210.571 ns | - | - | - | - |
| ResetParametersSmall | Tracked,Limit | 84.94 ns | 0.397 ns | 0.332 ns | - | - | - | - |
| ResetParametersLarge | Tracked,Limit | 90.15 ns | 0.419 ns | 0.392 ns | - | - | - | - |
| ConstructSmall | Tracked,Limit | 143,864.33 ns | 2,872.838 ns | 4,120.139 ns | 0.2441 | 0.2441 | 0.2441 | 386 B |
| ConstructLarge | Tracked,Limit | 137,792.16 ns | 359.674 ns | 300.344 ns | 0.2441 | 0.2441 | 0.2441 | 3450 B |
| CompileForSessionSmall | Tracked,Limit | 4,253.87 ns | 69.031 ns | 64.571 ns | 0.0076 | 0.0076 | 0.0076 | - |
| CompileForSessionLarge | Tracked,Limit | 41,241.62 ns | 178.716 ns | 158.427 ns | 0.1221 | 0.1221 | 0.1221 | - |
| ResetParametersSmall | Tracked,None | 87.71 ns | 0.359 ns | 0.336 ns | - | - | - | - |
| ResetParametersLarge | Tracked,None | 84.27 ns | 0.520 ns | 0.486 ns | - | - | - | - |
| ConstructSmall | Tracked,None | 133,433.83 ns | 1,055.285 ns | 881.211 ns | 0.2441 | 0.2441 | 0.2441 | 346 B |
| ConstructLarge | Tracked,None | 139,226.04 ns | 1,007.826 ns | 942.721 ns | 0.2441 | 0.2441 | 0.2441 | 3409 B |
| CompileForSessionSmall | Tracked,None | 4,212.05 ns | 57.363 ns | 53.658 ns | 0.0076 | 0.0076 | 0.0076 | - |
| CompileForSessionLarge | Tracked,None | 40,261.86 ns | 191.040 ns | 178.699 ns | 0.1221 | 0.1221 | 0.1221 | - |

LuaScriptCacheOperations

Cases where we construct a new LuaRunner are a bit slower, though most of these are within the error bounds. Observationally, these vary a fair amount from run to run.

main

| Method | Params | Mean | Error | StdDev | Median | Allocated |
|---|---|---|---|---|---|---|
| LookupHit | None | 2.855 μs | 0.8448 μs | 2.464 μs | 1.450 μs | 688 B |
| LookupMiss | None | 2.504 μs | 0.6450 μs | 1.882 μs | 3.150 μs | 688 B |
| LoadOuterHit | None | 3.472 μs | 0.8717 μs | 2.543 μs | 3.200 μs | 688 B |
| LoadInnerHit | None | 220.146 μs | 9.0449 μs | 25.806 μs | 213.550 μs | 1056 B |
| LoadMiss | None | 5.845 μs | 0.7413 μs | 2.139 μs | 6.200 μs | 688 B |
| Digest | None | 14.450 μs | 0.6912 μs | 1.994 μs | 13.800 μs | 688 B |

luaMemoryLimits

| Method | Params | Mean | Error | StdDev | Median | Allocated |
|---|---|---|---|---|---|---|
| LookupHit | Managed,Limit | 3.678 μs | 0.6290 μs | 1.805 μs | 4.250 μs | 112 B |
| LookupMiss | Managed,Limit | 3.650 μs | 0.7419 μs | 2.164 μs | 3.700 μs | 64 B |
| LoadOuterHit | Managed,Limit | 7.664 μs | 0.8209 μs | 2.407 μs | 8.100 μs | 352 B |
| LoadInnerHit | Managed,Limit | 217.818 μs | 11.0561 μs | 32.076 μs | 212.000 μs | 2097632 B |
| LoadMiss | Managed,Limit | 5.897 μs | 0.9842 μs | 2.840 μs | 6.050 μs | 64 B |
| Digest | Managed,Limit | 18.413 μs | 1.0777 μs | 3.161 μs | 18.000 μs | 400 B |
| LookupHit | Managed,None | 4.062 μs | 0.7716 μs | 2.239 μs | 3.800 μs | 352 B |
| LookupMiss | Managed,None | 3.709 μs | 0.4843 μs | 1.397 μs | 3.700 μs | 352 B |
| LoadOuterHit | Managed,None | 4.998 μs | 1.0616 μs | 3.080 μs | 5.600 μs | 352 B |
| LoadInnerHit | Managed,None | 216.086 μs | 13.6203 μs | 39.946 μs | 203.600 μs | 2097744 B |
| LoadMiss | Managed,None | 7.680 μs | 0.8379 μs | 2.444 μs | 6.850 μs | 688 B |
| Digest | Managed,None | 15.133 μs | 0.7565 μs | 2.121 μs | 14.600 μs | 400 B |
| LookupHit | Native,None | 3.995 μs | 0.7868 μs | 2.283 μs | 4.300 μs | 64 B |
| LookupMiss | Native,None | 2.765 μs | 0.5855 μs | 1.689 μs | 2.150 μs | 352 B |
| LoadOuterHit | Native,None | 6.717 μs | 0.6914 μs | 2.006 μs | 6.650 μs | 352 B |
| LoadInnerHit | Native,None | 213.109 μs | 9.0029 μs | 25.831 μs | 207.300 μs | 688 B |
| LoadMiss | Native,None | 5.521 μs | 1.1810 μs | 3.388 μs | 5.250 μs | 64 B |
| Digest | Native,None | 17.971 μs | 1.5372 μs | 4.508 μs | 17.400 μs | 400 B |
| LookupHit | Tracked,Limit | 3.850 μs | 0.8121 μs | 2.343 μs | 3.600 μs | 112 B |
| LookupMiss | Tracked,Limit | 3.544 μs | 0.9940 μs | 2.915 μs | 2.950 μs | 64 B |
| LoadOuterHit | Tracked,Limit | 6.147 μs | 1.2957 μs | 3.800 μs | 5.900 μs | 400 B |
| LoadInnerHit | Tracked,Limit | 248.742 μs | 12.2791 μs | 35.819 μs | 239.650 μs | 432 B |
| LoadMiss | Tracked,Limit | 7.053 μs | 1.2035 μs | 3.511 μs | 7.100 μs | 64 B |
| Digest | Tracked,Limit | 17.226 μs | 0.9407 μs | 2.759 μs | 16.900 μs | 64 B |
| LookupHit | Tracked,None | 4.530 μs | 0.8697 μs | 2.537 μs | 5.500 μs | 64 B |
| LookupMiss | Tracked,None | 2.966 μs | 0.7327 μs | 2.149 μs | 2.400 μs | 64 B |
| LoadOuterHit | Tracked,None | 6.071 μs | 1.0243 μs | 2.972 μs | 5.500 μs | 112 B |
| LoadInnerHit | Tracked,None | 269.723 μs | 11.9403 μs | 34.066 μs | 260.300 μs | 480 B |
| LoadMiss | Tracked,None | 5.906 μs | 1.4845 μs | 4.354 μs | 6.100 μs | 112 B |
| Digest | Tracked,None | 14.648 μs | 1.0948 μs | 3.123 μs | 14.350 μs | 400 B |

LuaScripts

Again comparing the baseline to Native,None, we're giving up a bit of perf in Script1 but improving everywhere else. The loss is ~3%; the best improvement is ~20% (for Script2). Given that Script1 is again very minimal (just a return), I think that loss is fine, especially as it's offset by gains for more complicated scripts.
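
Worked out from the Mean column of the tables that follow, with Δ = (t_new - t_main) / t_main:

```latex
\begin{aligned}
\Delta_{\text{Script1}} &= \frac{112.2 - 109.3}{109.3} \approx +0.027 && (\approx 3\%\ \text{slower}) \\
\Delta_{\text{Script2}} &= \frac{140.1 - 174.6}{174.6} \approx -0.198 && (\approx 20\%\ \text{faster})
\end{aligned}
```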

main

| Method | Params | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|---|
| Script1 | None | 109.3 ns | 1.09 ns | 1.02 ns | - | - |
| Script2 | None | 174.6 ns | 1.55 ns | 1.38 ns | 0.0002 | 24 B |
| Script3 | None | 248.1 ns | 1.52 ns | 1.35 ns | 0.0005 | 32 B |
| Script4 | None | 228.1 ns | 2.97 ns | 2.78 ns | - | - |

luaMemoryLimits

| Method | Params | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|---|
| Script1 | Managed,Limit | 111.0 ns | 0.50 ns | 0.44 ns | - | - |
| Script2 | Managed,Limit | 140.9 ns | 0.72 ns | 0.68 ns | 0.0002 | 24 B |
| Script3 | Managed,Limit | 219.6 ns | 0.61 ns | 0.57 ns | 0.0005 | 32 B |
| Script4 | Managed,Limit | 235.9 ns | 1.17 ns | 1.03 ns | - | - |
| Script1 | Managed,None | 110.2 ns | 0.38 ns | 0.36 ns | - | - |
| Script2 | Managed,None | 152.8 ns | 0.67 ns | 0.63 ns | 0.0002 | 24 B |
| Script3 | Managed,None | 221.3 ns | 0.82 ns | 0.73 ns | 0.0005 | 32 B |
| Script4 | Managed,None | 228.0 ns | 0.76 ns | 0.64 ns | - | - |
| Script1 | Native,None | 112.2 ns | 0.41 ns | 0.34 ns | - | - |
| Script2 | Native,None | 140.1 ns | 0.88 ns | 0.82 ns | 0.0002 | 24 B |
| Script3 | Native,None | 217.3 ns | 0.66 ns | 0.61 ns | 0.0005 | 32 B |
| Script4 | Native,None | 226.7 ns | 0.60 ns | 0.53 ns | - | - |
| Script1 | Tracked,Limit | 111.6 ns | 1.01 ns | 0.95 ns | - | - |
| Script2 | Tracked,Limit | 142.5 ns | 0.75 ns | 0.67 ns | 0.0002 | 24 B |
| Script3 | Tracked,Limit | 220.8 ns | 0.53 ns | 0.47 ns | 0.0005 | 32 B |
| Script4 | Tracked,Limit | 224.9 ns | 0.55 ns | 0.52 ns | - | - |
| Script1 | Tracked,None | 112.5 ns | 0.40 ns | 0.34 ns | - | - |
| Script2 | Tracked,None | 145.3 ns | 2.93 ns | 2.74 ns | 0.0002 | 24 B |
| Script3 | Tracked,None | 222.4 ns | 0.86 ns | 0.81 ns | 0.0005 | 32 B |
| Script4 | Tracked,None | 224.9 ns | 1.52 ns | 1.35 ns | - | - |

ScriptOperations

Comparing main to Native,None, most of these are within the margin of error. LargeScript is slightly improved (~5%), and SmallScript more noticeably (~16%).
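
The corresponding relative changes in mean time, from the tables that follow:

```latex
\begin{aligned}
\Delta_{\text{LargeScript}} &= \frac{4099.14 - 4297.098}{4297.098} \approx -0.046 && (\approx 5\%\ \text{faster}) \\
\Delta_{\text{SmallScript}} &= \frac{51.06 - 61.024}{61.024} \approx -0.163 && (\approx 16\%\ \text{faster})
\end{aligned}
```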

main (eliding Params != None)

| Method | Params | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| ScriptLoad | None | 80.452 μs | 0.4009 μs | 0.3554 μs | 9600 B |
| ScriptExistsTrue | None | 18.095 μs | 0.2135 μs | 0.1893 μs | - |
| ScriptExistsFalse | None | 17.289 μs | 0.0655 μs | 0.0547 μs | - |
| Eval | None | 58.513 μs | 0.2955 μs | 0.2468 μs | - |
| EvalSha | None | 24.331 μs | 0.4261 μs | 0.3986 μs | - |
| SmallScript | None | 61.024 μs | 0.2889 μs | 0.2702 μs | - |
| LargeScript | None | 4,297.098 μs | 49.7821 μs | 46.5662 μs | 4 B |
| ArrayReturn | None | 110.093 μs | 0.7220 μs | 0.6754 μs | - |

luaMemoryLimits

| Method | Params | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| ScriptLoad | Managed,Limit | 80.06 μs | 0.759 μs | 0.710 μs | - | - | - | 9600 B |
| ScriptExistsTrue | Managed,Limit | 17.93 μs | 0.057 μs | 0.053 μs | - | - | - | - |
| ScriptExistsFalse | Managed,Limit | 17.70 μs | 0.349 μs | 0.416 μs | - | - | - | - |
| Eval | Managed,Limit | 58.38 μs | 0.242 μs | 0.214 μs | - | - | - | - |
| EvalSha | Managed,Limit | 24.84 μs | 0.077 μs | 0.064 μs | - | - | - | - |
| SmallScript | Managed,Limit | 49.93 μs | 0.214 μs | 0.190 μs | - | - | - | - |
| LargeScript | Managed,Limit | 4,618.86 μs | 33.080 μs | 27.623 μs | - | - | - | 8 B |
| ArrayReturn | Managed,Limit | 142.41 μs | 9.226 μs | 27.204 μs | - | - | - | - |
| ScriptLoad | Managed,None | 85.01 μs | 0.324 μs | 0.303 μs | - | - | - | 9600 B |
| ScriptExistsTrue | Managed,None | 17.80 μs | 0.071 μs | 0.059 μs | - | - | - | - |
| ScriptExistsFalse | Managed,None | 17.62 μs | 0.070 μs | 0.065 μs | - | - | - | - |
| Eval | Managed,None | 63.56 μs | 0.320 μs | 0.299 μs | - | - | - | - |
| EvalSha | Managed,None | 25.42 μs | 0.069 μs | 0.058 μs | - | - | - | - |
| SmallScript | Managed,None | 49.80 μs | 0.171 μs | 0.160 μs | - | - | - | - |
| LargeScript | Managed,None | 4,650.30 μs | 55.005 μs | 51.452 μs | - | - | - | 5 B |
| ArrayReturn | Managed,None | 145.85 μs | 9.691 μs | 28.574 μs | - | - | - | - |
| ScriptLoad | Native,None | 81.34 μs | 0.515 μs | 0.482 μs | - | - | - | 9600 B |
| ScriptExistsTrue | Native,None | 17.51 μs | 0.054 μs | 0.048 μs | - | - | - | - |
| ScriptExistsFalse | Native,None | 17.21 μs | 0.055 μs | 0.049 μs | - | - | - | - |
| Eval | Native,None | 58.91 μs | 0.253 μs | 0.225 μs | - | - | - | - |
| EvalSha | Native,None | 25.53 μs | 0.117 μs | 0.098 μs | - | - | - | - |
| SmallScript | Native,None | 51.06 μs | 0.178 μs | 0.157 μs | - | - | - | - |
| LargeScript | Native,None | 4,099.14 μs | 43.258 μs | 40.463 μs | - | - | - | 6 B |
| ArrayReturn | Native,None | 113.11 μs | 1.086 μs | 1.016 μs | - | - | - | - |
| ScriptLoad | Tracked,Limit | 81.43 μs | 0.543 μs | 0.508 μs | - | - | - | 9600 B |
| ScriptExistsTrue | Tracked,Limit | 17.51 μs | 0.049 μs | 0.044 μs | - | - | - | - |
| ScriptExistsFalse | Tracked,Limit | 17.12 μs | 0.329 μs | 0.352 μs | - | - | - | - |
| Eval | Tracked,Limit | 63.88 μs | 0.406 μs | 0.380 μs | - | - | - | - |
| EvalSha | Tracked,Limit | 25.36 μs | 0.242 μs | 0.189 μs | - | - | - | - |
| SmallScript | Tracked,Limit | 50.17 μs | 0.175 μs | 0.164 μs | - | - | - | - |
| LargeScript | Tracked,Limit | 4,922.04 μs | 27.475 μs | 25.700 μs | 15.6250 | 15.6250 | 15.6250 | 18 B |
| ArrayReturn | Tracked,Limit | 122.29 μs | 1.110 μs | 1.039 μs | - | - | - | - |
| ScriptLoad | Tracked,None | 82.61 μs | 0.715 μs | 0.669 μs | - | - | - | 9600 B |
| ScriptExistsTrue | Tracked,None | 17.49 μs | 0.091 μs | 0.085 μs | - | - | - | - |
| ScriptExistsFalse | Tracked,None | 16.77 μs | 0.061 μs | 0.054 μs | - | - | - | - |
| Eval | Tracked,None | 59.48 μs | 0.342 μs | 0.320 μs | - | - | - | - |
| EvalSha | Tracked,None | 25.27 μs | 0.087 μs | 0.077 μs | - | - | - | - |
| SmallScript | Tracked,None | 50.78 μs | 0.238 μs | 0.223 μs | - | - | - | - |
| LargeScript | Tracked,None | 4,894.27 μs | 18.255 μs | 17.076 μs | 15.6250 | 15.6250 | 15.6250 | 18 B |
| ArrayReturn | Tracked,None | 132.90 μs | 1.033 μs | 0.916 μs | - | - | - | - |

kevin-montrose marked this pull request as ready for review on January 8, 2025 15:12

badrishc (Contributor) commented Jan 9, 2025

> LuaScripts BDN - Giving up ~32% in the worst case

This would be the most concerning part of the PR. What is causing this drop? If it is the trampoline, is there a way to enable an unsafe mode that avoids this overhead?

kevin-montrose and others added 7 commits January 9, 2025 14:26
1) Added a check for NA in results which is an indication that the BDN test failed at run time
2) Added 'Lua.LuaScriptCacheOperations','Lua.LuaRunnerOperations' to BDN Github Action
3) Updated Expected values for the new Lua BDN tests
kevin-montrose (Contributor, Author) commented

> LuaScripts BDN - Giving up ~32% in the worst case
>
> This would be the most concerning for the PR. What is causing this drop, and if it is the trampoline, then is there a way to enable an unsafe mode that avoids this overhead?

After some light profiling, the cause is the extra pcall layer. I'll look at clawing some of this back.

Vijay-Nirmal and others added 2 commits January 10, 2025 18:09
* Added LCS command

* Format fix

* Reverted CommandDocsUpdater.cs

* Fix cluster test

* Fixed wrong change

* Moved to constant

* Review command fixes

* Fixed review comment

* Fixed test issue

---------

Co-authored-by: Vasileios Zois
Co-authored-by: Tal Zaccai
* Configure min and max IO completion threads separately from min and max threads (in the ThreadPool). This is needed as some scenarios may limit number of thread pool threads but require a larger number of IO completion threads.

* nit
kevin-montrose (Contributor, Author) commented

> LuaScripts BDN - Giving up ~32% in the worst case
>
> This would be the most concerning for the PR. What is causing this drop, and if it is the trampoline, then is there a way to enable an unsafe mode that avoids this overhead?

@badrishc I've pushed up some changes that claw back most of the perf loss, even turning a few cases into improvements.

…re, which would cause SendAndReset() to fail thinking the message was too large