ceil/ceilf/floor/floorf seem to do a lot of extra work #219
Comments
Particularly, I don't understand what
FWIW that constant is just 2^23. I don't have it fresh in my mind, but I recall that another trick to make this faster was to avoid moving the fp-register to an integer register and back in
A correct
You can do a FFI call to some code that assumes that the
I think this should work.
Update: I did some Discord chatting with @gnzlbg and we looked up the "spec" for this operation: https://en.cppreference.com/w/c/numeric/math/ceil A key part of the spec is that
In the Rust reference it's specified that `f32 -> i32` is always a round toward zero, aka truncate. Either rustc emits the correct LLVM IR so that

As to setting exception flags properly: we're free to not set the flag during this function, so we can skip it without worrying.

Now to the final issue, jumping between register types: this is an unfortunate drag that can sometimes be avoided if we're willing to go architecture specific. On x86/x86_64: using
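For reference, the truncating cast discussed above can be checked directly. This is a small sketch, assuming a toolchain from Rust 1.45 or later, where float-to-integer `as` casts saturate and map NaN to zero; the helper name `trunc_cast` is hypothetical and only wraps the cast.

```rust
/// Hypothetical helper: wraps the plain `as` cast so its behaviour is easy to probe.
/// Assumes Rust 1.45+, where float-to-int `as` casts saturate instead of being UB.
pub fn trunc_cast(x: f32) -> i32 {
    // Rounds toward zero; out-of-range values clamp to i32::MIN / i32::MAX; NaN becomes 0.
    x as i32
}
```

Because the cast rounds toward zero, `trunc_cast(1.7)` is `1` and `trunc_cast(-1.7)` is `-1`, which is why a ceiling or floor built on it needs a one-step correction on one side of zero.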
I'd say that this is "on hold" until those others get sorted out.
Note that x86 CPUs often have different execution units for floating-point and integer operations, and switching between them takes a couple of cycles (e.g. see Table 13.3), so depending on what the user was doing with the float, and what we do with the integer, we might see a cost for the roundtrip.
The current implementation of floor/ceil is broken on x87; I suspect this is a result of some of the calculations being carried out with excess precision. This is currently blocking the inclusion of rust-libm in Debian bullseye. The approach based on converting to an integer does not seem like it would suffer from that problem, so it seems like a safer default. (round, roundf and rem_pio2f seem to suffer from the same issue; ceilf and floorf do not seem to use the add/sub approach.)
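To illustrate what the "add/sub approach" mentioned above relies on, here is a minimal sketch. It is an assumption about the general technique, not a copy of the libm code: adding and then subtracting 2^52 forces an `f64` to round to a whole number, but only if each intermediate result is rounded to `f64` precision. The names `TWO_POW_52` and `round_nonneg_add_sub` are hypothetical.

```rust
/// 2^52: adding this to a small non-negative f64 pushes all fraction bits out of the mantissa.
pub const TWO_POW_52: f64 = 4_503_599_627_370_496.0;

/// Hypothetical sketch of the add/sub trick for 0.0 <= x < 2^52.
/// Rounds to the nearest whole number (ties to even) under the default rounding mode,
/// PROVIDED the intermediate sum is rounded to f64 precision. With x87 excess
/// precision the sum can keep extra fraction bits, and the trick stops working.
pub fn round_nonneg_add_sub(x: f64) -> f64 {
    (x + TWO_POW_52) - TWO_POW_52
}
```

The cast-based approach has no such dependency on intermediate precision, which matches the suggestion that it is the safer default.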
So I had a look at the ceiling and floor operations recently, and I was eventually pointed to the magical `f32` constant `8388608.0_f32`. This is the smallest `f32` that can't have a fractional part. In other words, `8388607.5` is the biggest `f32` that has a fractional part, and you can have `f32` values greater than that but they'll always be whole number values. The next bit pattern is `8388608` (no fractional bits active). If we step 1 bit higher we have an active fractional bit, but the value is `8388609`, still a whole number.

The source of this magical constant is that you want the exponent part to be 2^[mantissa bits stored], so for `f32` you want the exponent part to be 2^23. The same concept holds with `f64`, you just have an exponent part of 2^52: `4503599627370496.0_f64`
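The neighbouring bit patterns described above can be inspected with `to_bits` and `from_bits`. This is a small sketch; the helper names `next_f32` and `prev_f32` are hypothetical and are only valid for positive, finite, non-maximal inputs.

```rust
/// Hypothetical helper: the next representable f32 above a positive finite `x`.
pub fn next_f32(x: f32) -> f32 {
    f32::from_bits(x.to_bits() + 1)
}

/// Hypothetical helper: the next representable f32 below a positive finite, non-zero `x`.
pub fn prev_f32(x: f32) -> f32 {
    f32::from_bits(x.to_bits() - 1)
}

/// 2^23, computed from the number of stored mantissa bits in an f32.
pub fn f32_whole_threshold() -> f32 {
    (1u32 << 23) as f32
}

/// 2^52, the same idea for f64.
pub fn f64_whole_threshold() -> f64 {
    (1u64 << 52) as f64
}
```

Stepping one pattern below `8388608.0` gives `8388607.5`, the last value with a fractional part, and stepping one above gives `8388609.0`, so the spacing between adjacent values is exactly 1 from that point on.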
This means that we can have a `ceilf` function for `f32` that's really simple:

This will pass the test `assert_eq!(ceilf(val), val.ceil())` for all possible 32-bit patterns a float can take. I haven't done a test with the `f64` version for all possible 64-bit patterns of course, but no value tested so far has shown a different result than the stdlib result.

The current libm implementation of ceilf is, well, a lot more steps than that. Similarly, `ceil`, `floor`, and `floorf` are all doing quite a bit of work.

Is there some sort of spec that the current functions are trying to match with? Or should we consider converting to this simpler style?
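A minimal sketch of the cast-based style this issue describes might look like the following. It is not the snippet originally posted: the names `F32_WHOLE_THRESHOLD`, `ceilf_sketch` and `floorf_sketch` are hypothetical, and the early return for NaN and the `copysign` call (to keep the sign of zero) are assumptions added here.

```rust
/// 2^23: every finite f32 with magnitude at or above this is already a whole number.
pub const F32_WHOLE_THRESHOLD: f32 = 8_388_608.0;

/// Hypothetical cast-based ceiling for f32.
pub fn ceilf_sketch(x: f32) -> f32 {
    // NaN, infinities and large values have no fractional part to remove.
    // Written as a negated `<` so that NaN also takes this branch.
    if !(x.abs() < F32_WHOLE_THRESHOLD) {
        return x;
    }
    // |x| < 2^23 fits in an i32, and the cast truncates toward zero.
    let truncated = x as i32 as f32;
    // Truncation already rounds negatives up; positives with a fraction need one step up.
    let rounded = if truncated < x { truncated + 1.0 } else { truncated };
    // Restore the sign so that inputs in (-1.0, -0.0] produce -0.0.
    rounded.copysign(x)
}

/// Hypothetical cast-based floor for f32, the mirror image of `ceilf_sketch`.
pub fn floorf_sketch(x: f32) -> f32 {
    if !(x.abs() < F32_WHOLE_THRESHOLD) {
        return x;
    }
    let truncated = x as i32 as f32;
    // Truncation already rounds positives down; negatives with a fraction need one step down.
    let rounded = if truncated > x { truncated - 1.0 } else { truncated };
    rounded.copysign(x)
}
```

When checking such a function against `f32::ceil` over every bit pattern, NaN inputs need separate handling (for example with `is_nan`), because `assert_eq!` on two NaN values fails.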