Unable to Create Cluster #515

Open
SasRei opened this issue Jan 17, 2025 · 11 comments

@SasRei

SasRei commented Jan 17, 2025

Hello everyone,

Many thanks for this cool project.

I am trying to create a cluster and I am stuck.

OS: Rocky 8 or Rocky 9 or Ubuntu 22.04 or Docker

I followed the instructions in Installation.md; kubectl and Helm are installed, the dependencies are installed, and hetzner-k3s-linux-amd64 is installed, but I always get the following error:

Unhandled exception: Nil assertion failed (NilAssertionError)
  from /usr/lib/crystal/core/nil.cr:113:7 in 'not_nil!'
  from /usr/lib/crystal/core/nil.cr:109:3 in 'not_nil!'
  from /usr/lib/crystal/core/crystal/system/unix/fiber.cr:11:5 in 'initialize'
  from /usr/lib/crystal/core/gc/boehm.cr:147:5 in 'parse_and_run'
  from /home/runner/work/hetzner-k3s/hetzner-k3s/src/hetzner-k3s.cr:96:1 in '__crystal_main'
  from /usr/lib/crystal/core/crystal/main.cr:129:5 in 'main'
  from src/env/__libc_start_main.c:95:2 in 'libc_start_main_stage2'

The network is created and the SSH key is generated, but then it goes no further.

It also makes no difference which config example I use; the hello-world one fails just like the one from Creating_a_cluster.md.

I'm stumped. What am I doing wrong?

regards
Sascha

@vitobotta
Owner

Is your SSH key protected with a passphrase?
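
As a quick check (a minimal sketch, assuming the key lives at ~/.ssh/id_ed25519 as in the examples): ssh-keygen can only derive the public key with an empty passphrase if the private key is not protected.

# Succeeds only if the private key is NOT passphrase-protected
ssh-keygen -y -P "" -f ~/.ssh/id_ed25519 > /dev/null \
  && echo "no passphrase" \
  || echo "passphrase set (or key unreadable)"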

@SasRei
Author

SasRei commented Jan 18, 2025

No, I just created a new one without a passphrase and even used the same name as in the example.

@vitobotta
Owner

Can you share your config file (minus the token)? I will try creating the same cluster from an Ubuntu VPS.

OS: Rocky 8 or Rocky 9 or Ubuntu 22.04 or Docker

Do you mean that you have been running hetzner-k3s from inside these environments?

@SasRei
Author

SasRei commented Jan 18, 2025

Do you mean that you have been running hetzner-k3s from inside these environments?

Yes.

My default is Rocky 9. It didn't work there, so I thought Rocky 9 might be too new and tried Rocky 8. That didn't work either, so I tried Docker on both operating systems, and when that also failed, I tried Ubuntu.

---
hetzner_token: xxx
cluster_name: test1
kubeconfig_path: "./kubeconfig"
k3s_version: v1.26.7+k3s1

networking:
  ssh:
    port: 22
    use_agent: false # set to true if your key has a passphrase
    public_key_path: "~/.ssh/id_ed25519.pub"
    private_key_path: "~/.ssh/id_ed25519"
  allowed_networks:
    ssh:
      - 0.0.0.0/0
    api: # this will firewall port 6443 on the nodes; it will NOT firewall the API load balancer
      - 0.0.0.0/0
  public_network:
    ipv4: true
    ipv6: true
  private_network:
    enabled: true
    subnet: 10.3.0.0/16
  cni:
    enabled: true
    encryption: false
    mode: flannel

masters_pool:
  instance_type: cpx11
  instance_count: 3
  image: ubuntu-24.04
  location: nbg1

worker_node_pools:
- name: pool-01
  instance_type: cpx11
  instance_count: 1
  image: ubuntu-24.04
  location: nbg1

root@hetzner-k3s:/opt# hetzner-k3s create --config test1.yaml | tee create.log
[Configuration] Validating configuration...
[Configuration] ...configuration seems valid.
[Private Network] Creating private network...
[Private Network] ...private network created
[SSH key] Creating SSH key...
[SSH key] ...SSH key created
Unhandled exception: Nil assertion failed (NilAssertionError)
from /usr/lib/crystal/core/nil.cr:113:7 in 'not_nil!'
from /usr/lib/crystal/core/nil.cr:109:3 in 'not_nil!'
from /usr/lib/crystal/core/crystal/system/unix/fiber.cr:11:5 in 'initialize'
from /usr/lib/crystal/core/gc/boehm.cr:147:5 in 'parse_and_run'
from /home/runner/work/hetzner-k3s/hetzner-k3s/src/hetzner-k3s.cr:96:1 in '__crystal_main'
from /usr/lib/crystal/core/crystal/main.cr:129:5 in 'main'
from src/env/__libc_start_main.c:95:2 in 'libc_start_main_stage2'

---
hetzner_token: xxx
cluster_name: test2
kubeconfig_path: "./kubeconfig"
k3s_version: v1.30.3+k3s1

networking:
  ssh:
    port: 22
    use_agent: false # set to true if your key has a passphrase
    public_key_path: "~/.ssh/id_ed25519.pub"
    private_key_path: "~/.ssh/id_ed25519"
  allowed_networks:
    ssh:
      - 0.0.0.0/0
    api: # this will firewall port 6443 on the nodes
      - 0.0.0.0/0
  public_network:
    ipv4: true
    ipv6: true
  private_network:
    enabled: true
    subnet: 10.3.0.0/16
    existing_network_name: ""
  cni:
    enabled: true
    encryption: false
    mode: flannel

  # cluster_cidr: 10.244.0.0/16 # optional: a custom IPv4/IPv6 network CIDR to use for pod IPs
  # service_cidr: 10.43.0.0/16 # optional: a custom IPv4/IPv6 network CIDR to use for service IPs. Warning, if you change this, you should also change cluster_dns!
  # cluster_dns: 10.43.0.10 # optional: IPv4 Cluster IP for coredns service. Needs to be an address from the service_cidr range


# manifests:
#   cloud_controller_manager_manifest_url: "https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.20.0/ccm-networks.yaml"
#   csi_driver_manifest_url: "https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.9.0/deploy/kubernetes/hcloud-csi.yml"
#   system_upgrade_controller_deployment_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/system-upgrade-controller.yaml"
#   system_upgrade_controller_crd_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/crd.yaml"
#   cluster_autoscaler_manifest_url: "https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/hetzner/examples/cluster-autoscaler-run-on-master.yaml"

datastore:
  mode: etcd # etcd (default) or external
  external_datastore_endpoint: postgres://....

schedule_workloads_on_masters: false

# image: rocky-9 # optional: default is ubuntu-24.04
# autoscaling_image: 103908130 # optional, defaults to the `image` setting
# snapshot_os: microos # optional: specified the os type when using a custom snapshot

masters_pool:
  instance_type: cpx21
  instance_count: 3
  location: nbg1

worker_node_pools:
- name: small-static
  instance_type: cpx21
  instance_count: 4
  location: hel1
  # image: debian-11
  # labels:
  #   - key: purpose
  #     value: blah
  # taints:
  #   - key: something
  #     value: value1:NoSchedule
- name: medium-autoscaled
  instance_type: cpx31
  instance_count: 2
  location: fsn1
  autoscaling:
    enabled: true
    min_instances: 0
    max_instances: 3

embedded_registry_mirror:
  enabled: false # Check if your k3s version is compatible before enabling this option. You can find more information at https://docs.k3s.io/installation/registry-mirror

# additional_packages:
# - somepackage

# post_create_commands:
# - apt update
# - apt upgrade -y
# - apt autoremove -y

# kube_api_server_args:
# - arg1
# - ...
# kube_scheduler_args:
# - arg1
# - ...
# kube_controller_manager_args:
# - arg1
# - ...
# kube_cloud_controller_manager_args:
# - arg1
# - ...
# kubelet_args:
# - arg1
# - ...
# kube_proxy_args:
# - arg1
# - ...
# api_server_hostname: k8s.example.com # optional: DNS for the k8s API LoadBalancer. After the script has run, create a DNS record with the address of the API LoadBalancer.

root@hetzner-k3s:/opt# hetzner-k3s create --config test2.yaml | tee create.log
[Configuration] Validating configuration...
[Configuration] ...configuration seems valid.
[Private Network] Creating private network...
[Private Network] ...private network created
[SSH key] Creating SSH key...
Unhandled exception: Nil assertion failed (NilAssertionError)
[SSH key] ...SSH key created
from /usr/lib/crystal/core/nil.cr:113:7 in 'not_nil!'
from /usr/lib/crystal/core/nil.cr:109:3 in 'not_nil!'
from /usr/lib/crystal/core/crystal/system/unix/fiber.cr:11:5 in 'initialize'
from /usr/lib/crystal/core/gc/boehm.cr:147:5 in 'parse_and_run'
from /home/runner/work/hetzner-k3s/hetzner-k3s/src/hetzner-k3s.cr:96:1 in '__crystal_main'
from /usr/lib/crystal/core/crystal/main.cr:129:5 in 'main'
from src/env/__libc_start_main.c:95:2 in 'libc_start_main_stage2'

@SasRei
Author

SasRei commented Jan 18, 2025

Just for the sake of completeness, here is my Ubuntu setup: a new server at Hetzner with Ubuntu 22.04.

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl
curl -LO https://dl.k8s.io/release/v1.32.0/bin/linux/amd64/kubectl
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
chmod +x kubectl
mkdir -p ~/.local/bin
mv ./kubectl ~/.local/bin/kubectl
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
echo 'source <(kubectl completion bash)' >>~/.bashrc
source ~/.bashrc
echo 'source <(helm completion bash)' >>~/.bashrc
source ~/.bashrc
sudo apt-get install -y libssh2-1 libssh2-1-dev
sudo apt-get install -y libevent-2.1-7 libevent-dev
sudo apt-get install -y libgc1 libgc-dev
sudo apt-get install -y libyaml-0-2 libyaml-dev
sudo apt-get install -y libpcre3 libpcre3-dev
sudo apt-get install -y libgmp10 libgmp-dev
cd /opt
wget https://github.com/vitobotta/hetzner-k3s/releases/download/v2.0.9/hetzner-k3s-linux-amd64
chmod +x hetzner-k3s-linux-amd64
sudo mv hetzner-k3s-linux-amd64 /usr/local/bin/hetzner-k3s
ssh-keygen -t ed25519 -C "[email protected]"

@vitobotta
Owner

Uhm, you are using k3s 1.26.7, which doesn't support the Embedded Registry Mirror.

Can you add this to the config file and try again?

embedded_registry_mirror:
  enabled: false

@vitobotta
Owner

Wait, I see you added that in the second YAML file. Which one are you actually using?

@SasRei
Author

SasRei commented Jan 18, 2025

I don't have "the one" config yet. These are all very recent tests taken from your instructions, which I have just tried out, both with the same result. Your hello-world example does not work either.

Just wanted to show you that different configs produce the same result.

@SasRei
Author

SasRei commented Jan 20, 2025

OK, I could have done this earlier.

In another project with a different API key, the same config works on the same server. So it's not your tool; something in my Hetzner project, which has existed for years, is off. I'll experiment a little more; maybe I'll figure it out.

@SasRei
Author

SasRei commented Jan 20, 2025

Hey there,

I've fixed the issue.

We are using an existing project with hundreds of SSH keys, and it turns out that ssh_key/find.cr is unable to fetch more than 25 SSH keys.

The problem is with how pagination is being handled. It isn't requesting all pages of SSH keys from the Hetzner API.

The fix involved adjusting the pagination handling, ensuring that all keys are fetched correctly, regardless of how many are present.
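
For illustration, a quick way to see this (a minimal sketch, not part of hetzner-k3s; it assumes the token is exported as HCLOUD_TOKEN, that jq is installed, and the API's standard pagination metadata):

# A single request returns only one page (25 keys by default),
# while meta.pagination reports the real total in the project.
curl -s -H "Authorization: Bearer $HCLOUD_TOKEN" \
  "https://api.hetzner.cloud/v1/ssh_keys?page=1&per_page=25" \
  | jq '{returned: (.ssh_keys | length), total: .meta.pagination.total_entries}'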

Everything is working fine now.

I'm not a developer, so a proper fix should come from someone who knows what they're doing. 😉

Here's the working adjustment:

require "../client"
require "../ssh_key"
require "../ssh_keys_list"

class Hetzner::SSHKey::Find
  getter hetzner_client : Hetzner::Client
  getter ssh_key_name : String
  getter public_ssh_key_path : String

  def initialize(@hetzner_client, @ssh_key_name, @public_ssh_key_path)
  end

  def run
    ssh_keys = fetch_ssh_keys
    fingerprint = calculate_fingerprint(public_ssh_key_path)

    # Prefer a match by fingerprint, fall back to a match by the configured key name
    key = ssh_keys.find { |ssh_key| ssh_key.fingerprint == fingerprint }
    key ||= ssh_keys.find { |ssh_key| ssh_key.name == ssh_key_name }
    key
  end

  private def fetch_ssh_keys
    all_ssh_keys = [] of SSHKey
    page = 1
    per_page = 25

    # Keep requesting pages until one comes back with fewer than per_page keys,
    # which means it was the last page.
    loop do
      success, response = hetzner_client.get("/ssh_keys", { :page => page, :per_page => per_page })

      if success
        ssh_keys = SSHKeysList.from_json(response).ssh_keys
        all_ssh_keys.concat(ssh_keys)
        break if ssh_keys.size < per_page
      else
        STDERR.puts "[#{default_log_prefix}] Failed to fetch ssh keys: #{response}"
        STDERR.puts "[#{default_log_prefix}] Retrying to fetch ssh keys in 5 seconds..."
        raise "Failed to fetch ssh keys"
      end

      page += 1
    end

    all_ssh_keys
  end

  private def calculate_fingerprint(public_ssh_key_path)
    # The fingerprint is the MD5 digest of the base64-decoded key material
    # (the second field of the public key file), formatted as colon-separated hex pairs.
    key_material = File.read(public_ssh_key_path).split[1]
    Digest::MD5.hexdigest(Base64.decode(key_material)).chars.each_slice(2).map(&.join).join(":")
  end

  private def default_log_prefix
    "SSH key"
  end
end

regards
Sascha

@SasRei
Author

SasRei commented Jan 20, 2025

There is one thing I don't understand. You write yourself that:

"In a recent test I created a 200 node HA cluster (3 masters, 197 worker nodes) in just under 4 minutes"

Shouldn't you have experienced the same problem? Or has something changed in the Hetzner API in the meantime?
