Authorization Error Using MLClient with System-Assigned Managed Identity on Compute Cluster #39158

Open
gitgud5000 opened this issue Jan 13, 2025 · 2 comments
Labels
customer-reported Issues that are reported by GitHub users external to the Azure organization. Machine Learning needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team.

gitgud5000 commented Jan 13, 2025

I'm encountering an AuthorizationFailed error when running jobs that use MLClient (SDK v2) on Azure Machine Learning compute clusters with managed identities. When the same code runs on a compute instance, authentication appears to fall back to RBAC successfully. However, when the job runs on a cluster and tries to access resources such as workspaces, storage accounts, Key Vaults, and VNets, the following error occurs:

HttpResponseError: (AuthorizationFailed) The client 'XXXXXXXXXXX' with object id 'XXXXXXXXXXX' does not have authorization to perform action 'Microsoft.MachineLearningServices/workspaces/read' over scope '/subscriptions/YYYYYYYYYYYYY/resourceGroups/RESOURSE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/AML_WORKSPACE' or the scope is invalid. If access was recently granted, please refresh your credentials.
Code: AuthorizationFailed
Message: The client 'XXXXXXXXXXX' with object id 'XXXXXXXXXXX' does not have authorization to perform action 'Microsoft.MachineLearningServices/workspaces/read' over scope '/subscriptions/YYYYYYYYYYYYY/resourceGroups/RESOURSE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/AML_WORKSPACE' or the scope is invalid. If access was recently granted, please refresh your credentials.
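For anyone preparing an access request from this error, the following is a minimal sketch that pulls the client, object id, action, and scope out of the message text. The function name `parse_authorization_failed` and the regular expression are illustrative assumptions based only on the message format quoted above; they are not part of the Azure SDK.

```python
import re
from typing import Dict, Optional

# Pattern modeled on the AuthorizationFailed message quoted in this issue.
# This is an assumption about the message layout, not an SDK contract.
_AUTHZ_PATTERN = re.compile(
    r"The client '(?P<client>[^']*)' with object id '(?P<object_id>[^']*)' "
    r"does not have authorization to perform action '(?P<action>[^']*)' "
    r"over scope '(?P<scope>[^']*)'"
)


def parse_authorization_failed(message: str) -> Optional[Dict[str, str]]:
    """Return the client, object id, action, and scope, or None if no match."""
    match = _AUTHZ_PATTERN.search(message)
    return match.groupdict() if match else None


# Sample text copied from the error above (identifiers are already redacted).
SAMPLE_MESSAGE = (
    "The client 'XXXXXXXXXXX' with object id 'XXXXXXXXXXX' does not have "
    "authorization to perform action "
    "'Microsoft.MachineLearningServices/workspaces/read' over scope "
    "'/subscriptions/YYYYYYYYYYYYY/resourceGroups/RESOURSE_GROUP/providers/"
    "Microsoft.MachineLearningServices/workspaces/AML_WORKSPACE' "
    "or the scope is invalid."
)

details = parse_authorization_failed(SAMPLE_MESSAGE)
```

The `object_id` value identifies the principal that was denied, and `action` and `scope` show which permission was missing and where, which is the information an access control request would typically need.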

Additionally, when running with a managed identity configured in the cluster, I receive the following error:

ClientAuthenticationError: DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
        EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured. Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
        ManagedIdentityCredential: Unexpected content type "text/plain; charset=utf-8"
        SharedTokenCacheCredential: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
        AzureCliCredential: Azure CLI not found on path
        AzurePowerShellCredential: PowerShell is not installed
        AzureDeveloperCliCredential: Azure Developer CLI could not be found. Please visit https://aka.ms/azure-dev for installation instructions and then, once installed, authenticate to your Azure account using 'azd auth login'.
To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.
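To narrow down which link in the chain fails, it may help to call the managed identity credential directly instead of going through DefaultAzureCredential. The sketch below is a diagnostic under assumptions: `probe_token` and `probe_managed_identity` are hypothetical helper names, and the scope `https://management.azure.com/.default` is assumed to be the right one for Azure Resource Manager calls. `ManagedIdentityCredential` and its `get_token` method come from the `azure-identity` package.

```python
from typing import Any, Dict, Optional

# Assumed scope for Azure Resource Manager; adjust for other services.
ARM_SCOPE = "https://management.azure.com/.default"


def probe_token(credential: Any, scope: str = ARM_SCOPE) -> Dict[str, Any]:
    """Request one token and report the outcome instead of raising."""
    try:
        token = credential.get_token(scope)
    except Exception as exc:  # broad on purpose: this is a diagnostic helper
        return {"ok": False, "error_type": type(exc).__name__, "error": str(exc)}
    return {"ok": True, "expires_on": token.expires_on}


def probe_managed_identity(client_id: Optional[str] = None) -> Dict[str, Any]:
    """Probe the managed identity endpoint from inside the job.

    With client_id=None the system-assigned identity is requested; pass the
    client id of a user-assigned identity to target that one instead.
    """
    # Imported lazily so this module loads even where azure-identity is absent.
    from azure.identity import ManagedIdentityCredential

    if client_id:
        credential = ManagedIdentityCredential(client_id=client_id)
    else:
        credential = ManagedIdentityCredential()
    return probe_token(credential)
```

Calling `probe_managed_identity()` at the start of the job and logging the result would show whether token acquisition itself fails (as in the `ManagedIdentityCredential` line above) or whether a token is issued and the failure is an authorization problem further along.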

Steps to Reproduce:
1. Configure a compute cluster with a system-assigned managed identity.
2. Attempt to access the Azure ML workspace using ml_client.workspaces.get() within a job running on the cluster.
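The steps above can be sketched roughly as the script below, run as the job's entry point on the cluster. Several things here are assumptions rather than details from this report: the environment variable names (`AZUREML_ARM_SUBSCRIPTION`, `AZUREML_ARM_RESOURCEGROUP`, `AZUREML_ARM_WORKSPACE_NAME`, `DEFAULT_IDENTITY_CLIENT_ID`) are what Azure ML jobs are understood to expose, and the helper names are hypothetical. The original code may have used `DefaultAzureCredential` instead of `ManagedIdentityCredential`.

```python
import os
from typing import Dict, Mapping, Optional


def workspace_settings_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Collect workspace coordinates from the job environment.

    The variable names are assumptions about what the job runtime sets.
    """
    env = os.environ if env is None else env
    return {
        "subscription_id": env.get("AZUREML_ARM_SUBSCRIPTION"),
        "resource_group_name": env.get("AZUREML_ARM_RESOURCEGROUP"),
        "workspace_name": env.get("AZUREML_ARM_WORKSPACE_NAME"),
        "client_id": env.get("DEFAULT_IDENTITY_CLIENT_ID"),
    }


def get_workspace_with_managed_identity(env: Optional[Mapping[str, str]] = None):
    """Build an MLClient with the cluster's managed identity and read the workspace."""
    # Imported lazily so the helper above can be used without the Azure SDK.
    from azure.ai.ml import MLClient
    from azure.identity import ManagedIdentityCredential

    settings = workspace_settings_from_env(env)
    client_id = settings.pop("client_id")
    if client_id:
        credential = ManagedIdentityCredential(client_id=client_id)
    else:
        # No client id available: request the system-assigned identity.
        credential = ManagedIdentityCredential()

    ml_client = MLClient(credential, **settings)
    # This is the call that returns AuthorizationFailed in the report above.
    return ml_client.workspaces.get(settings["workspace_name"])
```

Running `get_workspace_with_managed_identity()` inside the job would be expected to reproduce the error for as long as the identity has no role that includes 'Microsoft.MachineLearningServices/workspaces/read' on the workspace.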

Expected Behavior:
The job should authenticate using the managed identity and access the specified resources without authorization errors.

Actual Behavior:

The job fails with an AuthorizationFailed error, indicating that the client does not have the necessary permissions to perform the 'Microsoft.MachineLearningServices/workspaces/read' action.

Question:

What is the correct procedure to configure authentication for jobs running on compute clusters using MLClient? Is it necessary to create a managed identity, assign it to the clusters, and grant that identity access to the required resources such as storage accounts, Key Vaults, and VNets?

We have created a managed identity but have not yet assigned any roles to resources. Before proceeding, we would like to know if this is the right approach since configuring these settings involves bureaucratic processes, including access control security requests.

References:

Set up authentication - Azure Machine Learning
Set up service authentication - Azure Machine Learning
Configure managed identities on Azure virtual machines (VMs)

Additionally, I have not been able to find resources on how to address this issue, and it was not a problem with SDK v1.

Any guidance on properly configuring this would be greatly appreciated.

gitgud5000 (Author) commented

Related Issue #25925

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.
