ceil/ceilf/floor/floorf seem to do a lot of extra work #219

Lokathor · 2019-08-11T23:52:31Z

So I had a look at the ceiling and floor operations recently, and I was eventually pointed to the magical f32 constant 8388608.0_f32. This is the smallest f32 that can't have a fractional part. In other words, 8388607.5 is the biggest f32 that has a fractional part, and you can have f32 values greater than that but they'll always be whole number values. The next bit pattern is 8388608 (no fractional bits active). If we step 1 bit higher we have an active fractional bit, but the value is 8388609, still a whole number.

The source of this magical constant is that you want the exponent part to be 2^[mantissa bits stored], so for f32 you want the exponent part to be 2^23. The same concept holds with f64, you just have an exponent part of 2^52: 4503599627370496.0_f64

This means that we can have a ceilf function for f32 that's really simple:

pub fn ceilf(f: f32) -> f32 {
    if absf(f).to_bits() < 8_388_608.0_f32.to_bits() {
      let truncated = f as i32 as f32;
      if truncated < f {
        truncated + 1.0
      } else {
        truncated
      }
    } else {
      f
    }
}

This will pass the test assert_eq!(ceilf(val), val.ceil()) for all possible 32-bit patterns a float can take. I haven't done a test with the f64 version for all possible 64-bit patterns of course, but no value tested so far has shown a different result than the stdlib result.

The current libm implementation of ceilf is, well, a lot more steps than that. Similarly, ceil, floor, and floorf are all doing quite a bit of work.

Is there some sort of spec that the current functions are trying to match with? Or should we consider converting to this simpler style?

The text was updated successfully, but these errors were encountered:

Lokathor · 2019-08-11T23:56:43Z

Particularly, I don't understand what force_eval! is supposed to do except perhaps to set exception state flags in the FPU, which you can't access in safe rust and LLVM optimizes as if there's no floating point state anyway so you basically can't ever rely on particular flags having been set or not.

gnzlbg · 2019-08-12T07:41:05Z

I was eventually pointed to the magical f32 constant 8388608.0_f32.

FWIW that constant is just 2^23. I don't have it fresh in my mind, but I recall that another trick to make this faster was to avoid moving the fp-register to an integer register and back in let truncated = f as i32 as f32; by doing something like if !(x.abs() > 2^23) { if x > 0.0 { x+= 2^23; x-=2^23; } else { x -=2^23; x += 2^23; } } instead, but then the fp-rounding mode needs to be set to +inf in the x > 0.0 branch, which means one needs to save the rounding mode and restore it before returning..

Particularly, I don't understand what force_eval! is supposed to do except perhaps to set exception state flags in the FPU,

A correct ceilf needs to set the exception flags properly.

which you can't access in safe rust and LLVM optimizes as if there's no floating point state anyway so you basically can't ever rely on particular flags having been set or not.

You can do a FFI call to some code that assumes that the ceilf symbol is a correct implementation of ceilf. That code might only be correct if the rounding mode and the exception flags are properly used. Even without a FFI call, one can use a single inline asm! block in unsafe Rust that: changes the rounding mode, calls ceilf, inspects the exception flags and the rounding mode, and resets the rounding mode to Rust's default.

This will pass the test assert_eq!(ceilf(val), val.ceil()) for all possible 32-bit patterns a float can take.

I think this should work.

Lokathor · 2019-08-13T08:30:36Z

Update: I did some Discord chatting with @gnzlbg and we looked up the "spec" for this operation: https://en.cppreference.com/w/c/numeric/math/ceil

A key part of the spec is that

The operation must ignore rounding mode.
Exceptions are optional, meaning that a conforming implementation is free to not raise the FE_INEXACT exception if they choose.

In the Rust reference it's specified that f32 -> i32 is always a round toward zero, aka truncate. Either rustc emits the correct LLVM IR so that f as i32 always truncates regardless of rounding mode, or you've got bigger problems on your hands. Checking goldbolt output, cvttss2si is a forced truncate instruction, so that's correct and we're all good there.

As to setting exception flags properly: we're free to not set the flag during this function, so we basically can just not worry about skipping it.

Now to the final issue, of jumping between register types, an unfortunate drag that can sometimes be avoided if we're willing to go architecture specific. On x86/x86_64: using sse2 instructions we can just truncate __m128 to __m128i but it stays as an xmm register the whole time. If sse4.1 is available we can even just skip all this nonsense and call the ceil instruction directly. Here I'm referring only to what's available in Stable, because on Nightly there's always the inline ASM option for each specific arch, but that's not very practical.

Given all this info, I think we're probably agreed that the function can be simplified without affecting correctness, though there's two possibilities it seems: truncate or + then -.
In New benchmarks #220 we're trying to sort out benchmarks. Obviously we need to get our benchmark story set before we can benchmark possible replacements.
Also, we might need to sort out the issue of having various implementations configured in.

I'd say that this is "on hold" until those others get sorted out.

gnzlbg · 2019-08-13T08:48:57Z

On x86/x86_64: using sse2 instructions we can just truncate __m128 to __m128i but it stays as an xmm register the whole time.

Note that x86 CPUs often have different execution units for floating-point and integer operations, and switching between takes a couple of cycles (e.g. see Table 13.3), so depending on what the user was doing with the float, and what we do with the integer, we might see a cost for the roundtrip.

plugwash · 2021-01-02T04:01:57Z

The current implementation of floor/ceil is broken on x87, I suspect this is a result of some of the calculations being carried out with excess precision. This is currently blocking the inclusion of rust-libm in Debian bullseye.

The approach based on converting to an integer does not seem like it would suffer from that problem. So it seems like a safer default.

(round, roundf and rem_p2iof seem to suffer from the same issue, ceilf and floorf do not seem to use the add/sub approach)

Lokathor added the question Further information is requested label Aug 11, 2019

plugwash mentioned this issue Jan 2, 2021

Incorrect value of sin(13.11250158914107) on x87 #243

Closed

plugwash mentioned this issue Jan 2, 2021

Fix testcases on x87 #249

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ceil/ceilf/floor/floorf seem to do a lot of extra work #219

ceil/ceilf/floor/floorf seem to do a lot of extra work #219

Lokathor commented Aug 11, 2019

Lokathor commented Aug 11, 2019

gnzlbg commented Aug 12, 2019

Lokathor commented Aug 13, 2019

gnzlbg commented Aug 13, 2019 •

edited

Loading

plugwash commented Jan 2, 2021

ceil/ceilf/floor/floorf seem to do a lot of extra work #219

ceil/ceilf/floor/floorf seem to do a lot of extra work #219

Comments

Lokathor commented Aug 11, 2019

Lokathor commented Aug 11, 2019

gnzlbg commented Aug 12, 2019

Lokathor commented Aug 13, 2019

gnzlbg commented Aug 13, 2019 • edited Loading

plugwash commented Jan 2, 2021

gnzlbg commented Aug 13, 2019 •

edited

Loading