Downtime after upgrading to 1.12.0 - Open "/tmp/nginx/nginx.pid" failed #12645
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Some security features are enabled out of the box now. Please check the changelogs.
So snippets and other related annotations may be the underlying cause, not visible yet still leading up to the behavior you see. /remove-kind bug Since it cannot be reproduced yet, let's wait until more data is posted here before applying the bug label.
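As a rough sketch of what restoring the pre-1.12 snippet/annotation behavior could look like (the value names here, controller.allowSnippetAnnotations and the annotations-risk-level config key, are assumptions; verify the exact keys against the ingress-nginx chart changelog for your version):

```
# Hypothetical override; confirm key names against the chart changelog before using.
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --reuse-values \
  --set controller.allowSnippetAnnotations=true \
  --set controller.config.annotations-risk-level=Critical
```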
Thanks for the hints! Really appreciate it. Tried with an updated Helm config to set all those values back to the old defaults.
I cannot reproduce it yet. Tomorrow I will try adding some certs, domains, and ingress objects to another cluster to get this into a reproducible state. Currently I can only test nightly in a small time frame, due to the impact this has and it being reproducible only on a cluster with important traffic. Maybe it has something to do with the CDN as well, as I see fewer requests on the ingress after the migration than before when I run some integration tests.
#11821
Keeping this open while our investigation is running. We cannot explain it yet.
Will fill in more details as soon as we understand it more deeply.
As it broke only a few environments, it is harder to debug.
But it is a warning to check your log lines during the upgrade; one way to do that is sketched below.
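A minimal sketch of that kind of log check, assuming the default ingress-nginx namespace and labels (adjust both to your install); it scans for the nginx.pid error from the title and other nginx error levels:

```
# Assumed namespace and label selector; adjust to your deployment.
kubectl logs -n ingress-nginx \
  -l app.kubernetes.io/name=ingress-nginx --tail=500 --timestamps \
  | grep -Ei 'nginx\.pid|emerg|crit|error'
```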
What happened:
Upgraded our ingress-controller via Helm from our previous version to 1.12.0, causing a major outage on 4 of 10 clusters. We cannot yet understand why.
Kubernetes version 1.31.x
What you expected to happen:
The ingress controller continues to work.
I am not sure yet; I will keep this open while we investigate more deeply.
Kubernetes version (use kubectl version): v1.31.3-eks-59bf375
Environment:
- Cloud provider or hardware configuration: AWS / EKS
- OS: AWS
- Kernel (uname -a): not provided
- How/where the cluster was created (kubeadm/kops/minikube/kind etc.): EKS
- kubectl version: see above
- kubectl get nodes -o wide: to follow
Other data will follow after we have done a full breakdown.
How to reproduce this issue:
Hard to reproduce, as it is currently only happening on nodes which we cannot test again.
Update 10.01 - 00:10 - Tested a deployment of the faulty version again. SSL certs were being served as the Kubernetes Fake Certificate on some domains, while the old version served the real Let's Encrypt certs. Looks like a TLS issue after the upgrade; a quick check is sketched below.
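One way to verify which certificate is actually served (a sketch; example.com stands in for an affected domain): the controller's fallback certificate identifies itself as "Kubernetes Ingress Controller Fake Certificate", so the subject line makes the difference obvious.

```
# Replace example.com with an affected domain.
echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null \
  | openssl x509 -noout -subject -issuer
# The fallback cert reports: subject=O=Acme Co, CN=Kubernetes Ingress Controller Fake Certificate
```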