SSH connectivity issues #498

Open
jrudolph opened this issue Dec 13, 2024 · 7 comments
@jrudolph (Contributor)

I recently started having issues with SSH connectivity (again, but this time also with my existing cluster) and looked into it in more detail. This may or may not be a duplicate of or related to #443 and #415, but I wanted to document my findings in case someone else runs into similar problems.

For me the connectivity issues were a complex mix of different things that overlaid each other. All of these led to hetzner-k3s hanging (silently) at "Waiting for successful ssh connectivity".

  1. most importantly, I had recently used another SSH key for another project and ssh-agent offered the wrong one to hetzner-k3s => with use_agent: true, hetzner-k3s ignores the configured keys, uses whatever keys the agent provides, and fails silently (see the sketch after this list for checking what the agent offers)
  2. use_agent: false also failed silently, because the key pair was protected by a passphrase (this is documented somewhere, but it is still a caveat)
  3. with 2.0.x, the master setup did not work out of the box for me because the Hetzner firewall configuration is only created after the master has been set up, so there is no connectivity to port 22 (probably caused by existing firewalls defined in the Hetzner Cloud project)
  4. the retry logic in hetzner-k3s hammers the SSH port with requests (roughly one per second), so the SSH server will by default start blocking your IP pretty quickly
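A quick way to check which keys the agent is offering is to compare fingerprints (a minimal sketch; the key path is only an example, substitute your own):

ssh-add -l                                # fingerprints of all keys currently held by ssh-agent
ssh-keygen -lf ~/.ssh/id_ed25519.pub      # fingerprint of the key you configured for hetzner-k3s

If the fingerprint of the configured key does not show up in the ssh-add -l output, the agent will offer the wrong key(s).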

Here are the workarounds I used:

  1. with use_agent: true, run ssh-add -D to clear the agent and then ssh-add <key> to add the right key
  2. with use_agent: false, use a key without a passphrase (not recommended)
  3. set up a custom firewall rule in Hetzner Cloud that matches on the cluster label to allow port 22 even before hetzner-k3s creates its firewall rules (see the sketch after this list); using a completely clean Hetzner Cloud project might also work
  4. if connecting to the server is broken from the command line as well, use a VPN, restart the server, or restart the sshd process on the nodes (if you can still connect via the console)
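For workaround 3, something like the following with the hcloud CLI should work (a sketch; the firewall name and the label selector cluster=test are assumptions, check which labels hetzner-k3s actually sets on your instances):

hcloud firewall create --name allow-ssh
hcloud firewall add-rule allow-ssh --direction in --protocol tcp --port 22 --source-ips 0.0.0.0/0
hcloud firewall apply-to-resource allow-ssh --type label_selector --label-selector cluster=test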

My main debugging tool in the end was to log in to a node, raise the log level in /etc/ssh/sshd_config, and observe the logs during the connection attempts (this shows blocking, wrong keys, etc.).
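For reference, a minimal sketch of that setup on a node (assuming a systemd-based image such as the default Ubuntu, and no earlier uncommented LogLevel line in the config, since sshd uses the first match):

echo "LogLevel DEBUG3" >> /etc/ssh/sshd_config   # DEBUG3 is the most verbose level
systemctl restart sshd
journalctl -u ssh -f                             # unit is "ssh" on Ubuntu/Debian, "sshd" elsewhere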

Suggestions for hetzner-k3s:

  1. with use_agent: true:
    • warn if ssh-agent provides a different key than the one declared
    • surface more detailed error messages from libssh2 (I tried but failed; I'm not a Crystal developer)
  2. with use_agent: false:
    • see 1.
    • explicitly warn when the provided key is passphrase-protected (e.g. grep the private key file for encryption headers; see the sketch after this list)
  3. Create firewall rules before/with the servers to be more resilient against existing firewall rules in the project
  4. Be less aggressive with retries when trying to contact SSH, or fail early, e.g. on auth issues
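A rough sketch of the passphrase check from suggestion 2 (the key path is an example; the ssh-keygen trick is one possible detection approach, not anything hetzner-k3s does today):

KEY=~/.ssh/id_ed25519                      # example path
grep -q ENCRYPTED "$KEY" && echo "passphrase-protected (classic PEM header)"
# new-format OpenSSH keys have no plain-text header, so instead try deriving
# the public key with an empty passphrase; failure means a passphrase is required
ssh-keygen -y -P "" -f "$KEY" > /dev/null 2>&1 || echo "passphrase-protected"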
@vitobotta (Owner)

Thanks for sharing your experience!

Just to clear things up about the firewall: by default, when there's no firewall attached to a server, all ports are open. So, as long as you don't have an existing firewall blocking the selected SSH port, everything should be fine.
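If you want to verify that nothing in the project is filtering the SSH port, the hcloud CLI can show existing firewalls and what they are attached to (assuming it is configured with the project's token; the server name is an example):

hcloud firewall list
hcloud server describe test-master1    # lists any firewalls attached to the server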

You're right about the SSH keys, though. That said, I create clusters regularly and haven't run into any issues with SSH connectivity yet due to keys or anything else.

I'll check if it's possible to use the SSH shard for Crystal to verify that the key used by the agent matches the one in your config. I'll also see if it can detect whether the key is protected by a passphrase.

Regarding the firewall, you shouldn't set up an additional one in your Hetzner project. Ideally, the project should be solely dedicated to the cluster managed by hetzner-k3s. I'll make sure this information is clearer in the docs.

@MarcelHaldimann

I think I have the same problem with a simple example.

After the master node has been created, I am able to connect with an SSH client without any problem using my private key.

hetzner_token: <<API-KEY>>
cluster_name: test
kubeconfig_path: "./kubeconfig"
k3s_version: v1.30.8+k3s1 

networking:
  ssh:
    port: 22
    use_agent: true # set to true if your key has a passphrase
    public_key_path: "./Documents/ssh/gl-new/gl-root.pub"
    private_key_path: "./Documents/ssh/gl-new/gl-root"
  allowed_networks:
    ssh:
      - 0.0.0.0/0
    api: # this will firewall port 6443 on the nodes
      - 0.0.0.0/0
  public_network:
    ipv4: true
    ipv6: true
  private_network:
    enabled: true
    subnet: 10.0.0.0/16
    existing_network_name: ""
  cni:
    enabled: true
    encryption: false
    mode: flannel

  # cluster_cidr: 10.244.0.0/16 # optional: a custom IPv4/IPv6 network CIDR to use for pod IPs
  # service_cidr: 10.43.0.0/16 # optional: a custom IPv4/IPv6 network CIDR to use for service IPs. Warning, if you change this, you should also change cluster_dns!
  # cluster_dns: 10.43.0.10 # optional: IPv4 Cluster IP for coredns service. Needs to be an address from the service_cidr range


manifests:
  cloud_controller_manager_manifest_url: "https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.21.0/ccm-networks.yaml"
  csi_driver_manifest_url: "https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.11.0/deploy/kubernetes/hcloud-csi.yml"
#   system_upgrade_controller_deployment_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/system-upgrade-controller.yaml"
#   system_upgrade_controller_crd_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/crd.yaml"
#   cluster_autoscaler_manifest_url: "https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/hetzner/examples/cluster-autoscaler-run-on-master.yaml"

datastore:
  mode: etcd # etcd (default) or external
  #external_datastore_endpoint: postgres://....

schedule_workloads_on_masters: false

image: debian-12 # optional: default is ubuntu-24.04
# autoscaling_image: 103908130 # optional, defaults to the `image` setting
# snapshot_os: microos # optional: specifies the OS type when using a custom snapshot

masters_pool:
  instance_type: cx22
  instance_count: 1
  location: nbg1

worker_node_pools:
- name: test-node-pool
  instance_type: cx22
  instance_count: 3
  location: nbg1
  # image: debian-11
  # labels:
  #   - key: purpose
  #     value: blah
  # taints:
  #   - key: something
  #     value: value1:NoSchedule
# - name: medium-autoscaled
#   instance_type: cpx31
#   instance_count: 2
#   location: nbg1
#   autoscaling:
#     enabled: true
#     min_instances: 0
#     max_instances: 3

embedded_registry_mirror:
  enabled: false # Check if your k3s version is compatible before enabling this option. You can find more information at https://docs.k3s.io/installation/registry-mirror

additional_packages:
- htop

post_create_commands:
- apt update
- apt upgrade -y
- apt autoremove -y

Log output:

mh@ione56 hetzner-k8s % hetzner-k3s create --config ./cluster-config.yml
[Configuration] Validating configuration...
[Configuration] ...configuration seems valid.
[Private Network] Creating private network...
[Private Network] ...private network created
[SSH key] Creating SSH key...
[SSH key] ...SSH key created
[Placement groups] Creating placement group test-masters...
[Placement groups] ...placement group test-masters created
[Placement groups] Creating placement group test-test-node-pool-2...
[Placement groups] ...placement group test-test-node-pool-2 created
[Instance test-master1] Creating instance test-master1 (attempt 1)...
[Instance test-master1] Instance status: starting
[Instance test-master1] Powering on instance (attempt 1)
[Instance test-master1] Waiting for instance to be powered on...
[Instance test-master1] Instance status: running
[Instance test-master1] Waiting for successful ssh connectivity with instance test-master1...
[Instance test-master1] Instance test-master1 already exists, skipping create
[Instance test-master1] Instance status: running
[Instance test-master1] Waiting for successful ssh connectivity with instance test-master1...
[Instance test-master1] Instance test-master1 already exists, skipping create
[Instance test-master1] Instance status: running
[Instance test-master1] Waiting for successful ssh connectivity with instance test-master1...
Error creating instance: timeout after 00:01:00
Instance creation for test-master1 failed. Try rerunning the create command.

Let me know if you need more information or more logs.

Environment

I installed it with brew on macOS Sonoma (14.6.1) and upgraded to Sequoia (15.2); the problem is still the same.

Tested with:
image: debian-12
and
image: ubuntu-24.04

Also with different k3s_version values.

mh@ione56 hetzner-k8s % hetzner-k3s --version
2.0.9

Is there something wrong in my config?

thanks in advance
Marcel

@vitobotta (Owner)

@MarcelHaldimann do you have a passphrase on your key?

@MarcelHaldimann

@vitobotta
Yes I do.

@vitobotta (Owner)

> @vitobotta Yes I do.

Did you add the SSH key to Keychain?

@MarcelHaldimann

Wow I am an idiot. Now it works like a charm! Thank you!

ssh-add --apple-use-keychain ~/Documents/ssh/gl-new/gl-root

I thought I would be asked for the password.

Is there more documentation than in this example?

Thanks for the work and the support!

@vitobotta (Owner)

> Is there more documentation than in this example?

Good point, looks like I forgot to add a mention about this for macOS. Would you mind making a small PR for this? :)
