LoongArch64: fixed cscal and zscal #5078

Open · wants to merge 2 commits into base: develop

XiWeiGu (Contributor) commented Jan 16, 2025

For the inputs float x[2] = {NaN, NaN} and float alpha[2] = {0.0, 0.0}, the optimized cscal kernel does not simply store 0.0 into x. Instead it still performs the complex multiplication, so the output is {NaN, NaN}.
The optimized zscal kernel has the same issue. The problem was caught by the LAPACK tests, but the existing OpenBLAS test cases do not cover this scenario, so it may be worth adding to future tests.
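The NaN result follows directly from IEEE 754 arithmetic. Writing $\alpha = \alpha_r + \alpha_i i$ and $x = x_r + x_i i$, every term of the expanded complex product has a NaN operand, and $0 \cdot \mathrm{NaN}$ is NaN rather than 0:

```math
\alpha x = (\alpha_r x_r - \alpha_i x_i) + (\alpha_r x_i + \alpha_i x_r)\,i
        = (0\cdot\mathrm{NaN} - 0\cdot\mathrm{NaN}) + (0\cdot\mathrm{NaN} + 0\cdot\mathrm{NaN})\,i
        = \mathrm{NaN} + \mathrm{NaN}\,i
```

Producing {0.0, 0.0} therefore requires an explicit alpha == 0 branch that overwrites x without ever multiplying by it.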

XiWeiGu changed the title from "La64 fixed cscal zscal" to "LoongArch64: fixed cscal and zscal" on Jan 16, 2025
martin-frbg (Collaborator) commented:

I wonder if this will lead us down the same path of adding a special flag for array zeroing vs IEEE compliance as with non-complex SCAL :(

XiWeiGu force-pushed the la64_fixed_cscal_zscal branch from cb8cc3f to 038e0fb on January 17, 2025 01:41
XiWeiGu (Contributor, Author) commented Jan 17, 2025

You reminded me that we also need to add flags for cscal and zscal.
Consider the following program:

```c
#include <math.h>   /* NAN */
#include <stdio.h>
#include "cblas.h"

int main(void) {
    // Vector length (number of complex elements)
    int N = 1;

    // Complex scaling factor alpha = 0 + 0i
    float alpha[2] = {0.0f, 0.0f};

    // Complex vector X, stored as interleaved (real, imag) pairs
    float X[2] = {NAN, NAN};

    // Print the vector before scaling
    printf("Before scaling:\n");
    for (int i = 0; i < N; i++) {
        printf("X[%d] = %f + %f\n", i, X[2 * i], X[2 * i + 1]);
    }

    // Scale the vector: X := alpha * X
    cblas_cscal(N, alpha, X, 1);

    // Print the vector after scaling
    printf("\nAfter scaling:\n");
    for (int i = 0; i < N; i++) {
        printf("X[%d] = %f + %f\n", i, X[2 * i], X[2 * i + 1]);
    }

    return 0;
}
```

Output with MKL 2024.2:

Before scaling:
X[0] = nan + nan

After scaling:
X[0] = nan + nan

Reference BLAS 3.12.0 gives the same output as MKL.
Output with OpenBLAS v0.3.29 (TARGET=HASWELL):

Before scaling:
X[0] = nan + nan

After scaling:
X[0] = 0.000000 + 0.000000

XiWeiGu (Contributor, Author) commented Jan 17, 2025

This PR would introduce new problems in s/zscal and needs to be revised. (Other platforms also seem to need changes to avoid the issue shown above.)

XiWeiGu (Contributor, Author) commented Jan 17, 2025

I submitted PR #5081, which attempts to fix the C implementation.

martin-frbg (Collaborator) commented:

Thanks, I'll try to take a stab at the other implementations over the weekend.

XiWeiGu force-pushed the la64_fixed_cscal_zscal branch from 038e0fb to 6b27f17 on January 20, 2025 06:37