Support for SGEMM_DIRECT Kernel based on SME1 #5084

Open: wants to merge 1 commit into base: develop


@vaiskv vaiskv commented Jan 19, 2025

This PR adds an sgemm_direct kernel for the SME1 architecture.
The sgemm_direct kernel handles the special case of the level 3 cblas_sgemm() API where alpha = 1 and beta = 0.
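
For readers unfamiliar with the special case: general SGEMM computes C = alpha * op(A) * op(B) + beta * C, so with alpha = 1 and beta = 0 it reduces to a plain product C = A * B in which the old contents of C are never read. The following is only a scalar reference sketch of that semantics, assuming row-major, non-transposed inputs; the function name and argument order are hypothetical and are not the kernel interface used in this PR.

```c
#include <stddef.h>

/* Reference semantics only: C = A * B for row-major float matrices,
 * which is what SGEMM reduces to when alpha == 1 and beta == 0.
 * Hypothetical name and signature, not the PR's kernel API. */
void sgemm_direct_reference(size_t m, size_t n, size_t k,
                            const float *a, size_t lda,
                            const float *b, size_t ldb,
                            float *c, size_t ldc)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;              /* beta == 0: old C is never read */
            for (size_t p = 0; p < k; p++)
                acc += a[i * lda + p] * b[p * ldb + j];
            c[i * ldc + j] = acc;          /* alpha == 1: no scaling */
        }
    }
}
```

Skipping the alpha scaling and the read-modify-write of C is what makes a dedicated "direct" path attractive for this case.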

@martin-frbg
Collaborator

Thanks. Conveniently, this special case does not require debugging TRMM and SYMM, unlike the general-case SGEMM kernel in #5011 that I still hope to get to soon. :/
Unguarded -march flags that require a recent compiler are not going to work for all users, however. I think we'll need the same kludge as used on x86_64: ifdef the source on HAVE_SME and use an empty function declaration in the #else branch if it is not available, then rely on the required compiler flags being provided by Makefile.arm64 (or kernel/Makefile in the DYNAMIC_ARCH case, as is currently done for SkylakeX and newer). Also, I don't think one can use HWCAP on Apple; there should be a feature flag HAVE_SME provided by the build scripts (and "eventually" by the runtime detection code in dynamic_arm64.c too).

@martin-frbg
Collaborator

HarmonyOS doesn't seem to support HWCAP either, and AppleClang balks at the "else if" introduced in common_s.h. I'll see if I can unravel and test locally.

@vaiskv
Author

vaiskv commented Jan 20, 2025

Sure, thanks! I am working on restructuring the code to use the HAVE_SME flag, as per your suggestion.

@@ -213,9 +213,9 @@
 #ifdef ARCH_X86_64
 #define SGEMM_DIRECT_PERFORMANT gotoblas -> sgemm_direct_performant
 #define SGEMM_DIRECT gotoblas -> sgemm_direct
-#else
+#else if ARCH_ARM64
Collaborator

#elif ARCH_ARM64

Author

Updated
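
As background to this review exchange: `#else if` is not a C preprocessor directive. The tokens after `#else` are surplus (GCC, for example, warns about extra tokens and treats the line as a plain `#else`), so the condition is never evaluated, which fits the AppleClang complaint mentioned earlier in the thread. A minimal sketch of the conditional chain is shown below; `DIRECT_BACKEND` and `direct_backend()` are hypothetical names used only for illustration, and `defined(...)` is used so the chain also behaves when a macro is defined without a value.

```c
/* ARCH_X86_64 and ARCH_ARM64 stand in for the macros set by the build scripts.
 * DIRECT_BACKEND and direct_backend() are hypothetical, for illustration only. */
#if defined(ARCH_X86_64)
#define DIRECT_BACKEND "x86_64"
#elif defined(ARCH_ARM64)   /* a real second condition, unlike "#else if" */
#define DIRECT_BACKEND "arm64"
#else
#define DIRECT_BACKEND "none"
#endif

/* Reports which branch the preprocessor selected. */
const char *direct_backend(void)
{
    return DIRECT_BACKEND;
}
```

Compiling with `-DARCH_ARM64` selects the middle branch; with neither macro defined, the final `#else` is taken.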

@martin-frbg
Collaborator

Embarrassingly, it looks as if I nuked the HAVE_SME I had put in cpuid_arm64.c back in November (#4971) with some later change... bad things still sometimes happen when I jump between machines that are not always on the internet :(

@vaiskv
Author

vaiskv commented Jan 20, 2025

I have a question:
I am compiling the library with the following command:

make BINARY=64 CC=aarch64-linux-android33-clang ONLY_CBLAS=1 HOSTCC=gcc TARGET=ARMV9SME DYNAMIC_ARCH=1

From the interface/gemm.c file, the SGEMM_DIRECT kernel gets compiled only when DYNAMIC_ARCH=1. So when the library is compiled with DYNAMIC_ARCH=1 and TARGET is set to ARMV8 instead of ARMV9SME, will the function defined in kernel/arm64/sgemm_direct_arm64_sme1.c also be part of the library? I assume it won't be, since we have guarded the file with HAVE_SME. But then how can we ensure the library is supported on all Arm targets (Arm v8, v9, etc.)?

@martin-frbg
Collaborator

With DYNAMIC_ARCH, TARGET is only used for the common code (interface/gemm.c and all the other interfaces, driver/level3 and so on), while the code under kernel/arm64 is compiled in a loop with TARGET_CORE set to each of the individual models (ARMV8, ARMV8SVE, etc.) supported in the DYNAMIC_ARCH configuration. So what I tried to express is that your kernel/arm64/sgemm_direct_arm64_sme1.c should look roughly like

#ifdef HAVE_SME
void CNAME(...)
{
/* your SME code goes here */
}
#else
void CNAME(...)
{}
#endif

so that the compiler finds something to compile (even if it is an empty function) whether it is running for a target with -march=armv9+sme... temporarily defined or not.
And maybe, instead of string-comparing the current CPU name at runtime in the DYNAMIC_ARCH situation, we should have a supports_sme() function in driver/others/dynamic_arm64.c that does the AT_HWCAP call (or its alternatives for other platforms) and returns 0 or 1, like the way support_avx512 is used in interface/gemm.c for x86_64.
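
A rough sketch of how these two suggestions could fit together is shown below. It is not the PR's code: `sgemm_direct_sketch`, `try_sgemm_direct`, and `SGEMM_DIRECT_BUILT` are hypothetical names, the scalar loop is only a placeholder for the real SME kernel, and the Linux probe assumes that the SME capability is reported via `AT_HWCAP2` with `HWCAP2_SME` at bit 23 (as in the Linux arm64 uapi headers, to the best of my knowledge). Other platforms such as Apple or HarmonyOS would need their own probe, so the sketch simply reports 0 there.

```c
#include <stddef.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP2_SME
#define HWCAP2_SME (1UL << 23) /* assumption: value from the Linux arm64 uapi header */
#endif
#endif

/* Runtime probe, returning 0 or 1 as suggested in the comment above. */
int supports_sme(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP2) & HWCAP2_SME) ? 1 : 0;
#else
    return 0; /* Apple, HarmonyOS, etc. need a platform-specific check */
#endif
}

#ifdef HAVE_SME
#define SGEMM_DIRECT_BUILT 1
/* Placeholder body: a scalar loop standing in for the real SME kernel.
 * Dense row-major layout (lda = k, ldb = n, ldc = n) is assumed. */
void sgemm_direct_sketch(size_t m, size_t n, size_t k,
                         const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
}
#else
#define SGEMM_DIRECT_BUILT 0
/* Empty stub so that every TARGET_CORE pass still finds a symbol to compile. */
void sgemm_direct_sketch(size_t m, size_t n, size_t k,
                         const float *a, const float *b, float *c)
{
    (void)m; (void)n; (void)k; (void)a; (void)b; (void)c;
}
#endif

/* Returns 1 if the direct path ran, 0 if the caller must use general SGEMM. */
int try_sgemm_direct(float alpha, float beta, size_t m, size_t n, size_t k,
                     const float *a, const float *b, float *c)
{
    if (SGEMM_DIRECT_BUILT && alpha == 1.0f && beta == 0.0f && supports_sme()) {
        sgemm_direct_sketch(m, n, k, a, b, c);
        return 1;
    }
    return 0;
}
```

The point of the compile-time flag plus the runtime probe is that the empty stub can never be reached: the direct path is only taken when the kernel was actually built and the CPU reports the feature.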

@vaiskv
Author

vaiskv commented Jan 20, 2025

Got it. I will update the code and push the updated patch. Thanks!

@martin-frbg
Collaborator

Umm, TARGET=ARMV9SME tells me you're already building on AymenQ's unmerged #5011? In that case I might refrain from putting back the HAVE_SME, and we could just live with the TARGET name(s) like in the SkylakeX sgemm_direct kernel.
(I'm just unsure whether the Apple M4 is going to be that representative of other SME-capable Arm64 targets to come.)
