Unable to start worker (cloud-run-v2) on google cloud run service. #16673

Closed
PtrBld opened this issue Jan 10, 2025 · 5 comments · Fixed by #16738

@PtrBld

PtrBld commented Jan 10, 2025

Bug summary

Hi all. I tried to start a worker for my work pool on Google Cloud Run, following the guide here: https://docs.prefect.io/integrations/prefect-gcp/gcp-worker-guide

I ran the following command to deploy the worker:

gcloud run deploy prefect-worker --image=prefecthq/prefect:3-latest \
--set-env-vars PREFECT_API_URL=$PREFECT_API_URL,PREFECT_API_KEY=$PREFECT_API_KEY \
--service-account XXX \
--no-cpu-throttling \
--min-instances 1 \
--args "prefect","worker","start","--install-policy","always","--with-healthcheck","-p","gcp-work-pool","-t","cloud-run-v2"

This creates the following logs:

Deploying container to Cloud Run service [prefect-worker] in project [XXX] region [XXX]
X  Deploying...
  -  Creating Revision...
  .  Routing traffic...
Deployment failed


ERROR: (gcloud.run.deploy) Revision 'prefect-worker-00012-nzh' is not ready and cannot serve traffic. The user-provided container failed to start and listen on the port defined provided by the PORT=8080 environment variable. Logs for this revision might contain more information.

I see this in the Google Cloud Run logs:

/opt/prefect/entrypoint.sh: line 25: exec: prefect worker start --install-policy always --with-healthcheck -p gcp-work-pool -t cloud-run-v2: not found
Container called exit(127).
Default STARTUP TCP probe failed 1 time consecutively for container "prefect-1" on port 8080. The instance was not started
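One possible reading of the `exec: ... not found` line and exit code 127 is that the whole command reached the entrypoint's `exec` as a single argument rather than as separate words. The thread does not confirm this cause, so treat the following as a minimal sketch under that assumption. `run_entrypoint` is a hypothetical stand-in for an entrypoint that ends in `exec "$@"`, and `echo` stands in for the `prefect` command.

```shell
#!/bin/sh
# Hypothetical stand-in for an entrypoint script whose last step is `exec "$@"`.
# The subshell keeps a failed exec from terminating the calling shell.
run_entrypoint() {
  ( exec "$@" ) 2>/dev/null
}

# Whole command passed as ONE argument: the shell looks for a program
# literally named "echo worker started" and cannot find it.
run_entrypoint "echo worker started"
echo "single argument -> exit status $?"

# Same command passed as separate arguments: exec finds `echo` and runs it.
run_entrypoint echo worker started
echo "separate arguments -> exit status $?"
```

The first call exits with status 127, the same code reported in the Cloud Run log; the second prints `worker started` and exits with 0.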

When I start the worker locally, everything works.
Thanks for your help!

Version info

Version:             3.1.5
API version:         0.8.4
Python version:      3.10.11
Git commit:          3c06654e
Built:               Mon, Dec 2, 2024 6:57 PM
OS/Arch:             win32/AMD64
Profile:             ephemeral
Server type:         cloud
Pydantic version:    2.9.2
Integrations:
  prefect-docker:    0.6.2
  prefect-gcp:       0.6.2

Additional context

No response

PtrBld added the bug label on Jan 10, 2025
cicdw added the docs label on Jan 10, 2025
@cicdw
Member

cicdw commented Jan 10, 2025

It looks to me like the default startup probe for Google Cloud Run is a TCP probe, so I think the docs are missing a step: explicitly configuring an HTTP startup probe.

Could you try setting that up and see whether it resolves the issue for you? If so, we can update the docs!

@PtrBld
Author

PtrBld commented Jan 13, 2025

Thanks @cicdw, this seems to be part of the problem. After changing the startup probe to HTTP and setting its initial delay to the maximum (240s), I still get the following errors:

2025-01-11 16:07:32.662 GMT
Successfully installed google-api-core-2.24.0 google-api-python-client-2.158.0 google-auth-2.37.0 google-auth-httplib2-0.2.0 google-cloud-core-2.4.1 google-cloud-secret-manager-2.22.0 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 grpc-google-iam-v1-0.14.0 grpcio-status-1.69.0 httplib2-0.22.0 prefect-gcp-0.6.2 proto-plus-1.25.0 pyasn1-0.6.1 pyasn1-modules-0.4.1 pyparsing-3.2.1 rsa-4.9 tenacity-9.0.0 uritemplate-4.1.1
2025-01-11 16:07:32.662 GMT
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
2025-01-11 16:07:34.951 GMT
Worker 'CloudRunWorkerV2 e1a50837-5317-4af1-bd72-8ad67db620f2' started!
2025-01-11 16:07:35.291 GMT
16:07:35.289 | WARNING | EventsWorker - Still processing items: 1 items remaining...
2025-01-11 16:11:14.533 GMT
STARTUP HTTP probe failed 1 time consecutively for container "prefect-1" on path "/api/health". The instance was not started.

The new start command is:

gcloud beta run deploy prefect-worker --image=prefecthq/prefect:3-latest \
--set-env-vars PREFECT_API_URL=XXX,PREFECT_API_KEY=XXX \
--service-account XXX \
--no-cpu-throttling \
--min-instances 0 \
--max-instances 1 \
--startup-probe httpGet.port=8080,httpGet.path=/api/health,initialDelaySeconds=240,periodSeconds=20,timeoutSeconds=20 \
--args "prefect","worker","start","--install-policy","always","--with-healthcheck","--pool","workpool"

Do you have an idea what could be causing this issue?

@cicdw
Member

cicdw commented Jan 13, 2025

@PtrBld could you try it without the /api prefix, i.e.,

--startup-probe httpGet.port=8080,httpGet.path=/health,initialDelaySeconds=240,periodSeconds=20,timeoutSeconds=20 \

@PtrBld
Author

PtrBld commented Jan 15, 2025

Thank you so much @cicdw. It is working now. The final command was:

gcloud beta run deploy prefect-worker --image=prefecthq/prefect:3-latest \
--set-env-vars PREFECT_API_URL=XXX,PREFECT_API_KEY=XXX \
--service-account XXX \
--no-cpu-throttling \
--min-instances 1 \
--max-instances 1 \
--startup-probe httpGet.port=8080,httpGet.path=/health,initialDelaySeconds=100,periodSeconds=20,timeoutSeconds=20 \
--args "prefect","worker","start","--install-policy","always","--with-healthcheck","--pool","workpool"
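For anyone adapting these settings, the `--startup-probe` value is a comma-separated list of key=value pairs, and the path and initial delay were the two parts that changed over the course of this thread. A small sketch that keeps them in one place follows. `startup_probe` is a hypothetical helper, not part of gcloud or Prefect; the port, period, and timeout are the values from the working command above.

```shell
#!/bin/sh
# Hypothetical helper: builds the value for --startup-probe from a health
# path and an initial delay in seconds. Port 8080, periodSeconds=20 and
# timeoutSeconds=20 are copied from the working command in this thread.
startup_probe() {
  path=$1
  delay=$2
  printf 'httpGet.port=8080,httpGet.path=%s,initialDelaySeconds=%s,periodSeconds=20,timeoutSeconds=20\n' "$path" "$delay"
}

# Prints the probe value used in the final working command.
startup_probe /health 100
```

The output can then be passed as `--startup-probe "$(startup_probe /health 100)"` in the deploy command.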

@cicdw
Member

cicdw commented Jan 15, 2025

Awesome, thank you for being patient and giving those a try! I'll update the docs now.

cicdw removed the bug label on Jan 15, 2025