Ansible deployment of K8s nodes on a different subnet hangs and cannot continue

  • Choerodon platform version: 0.6.0

  • Step where the problem occurs:
    ansible-playbook -i inventory/hosts -e @inventory/vars cluster.yml

After running this command, it gets stuck at the task below:

TASK [node : Join to cluster if needed] *******************************************************************************************************************************************
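(Output omitted here; the description below explains the situation.)
The newly added node is on a different subnet from the master: the master and node 1 are both on 192.168.11.*, while the new node is on 192.168.16.*. The IPs can reach each other and are all unique, yet adding the 16-subnet machine as a node never succeeds. Any help would be appreciated, thanks!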
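To double-check the subnet claim, a minimal sketch like the following compares the first three octets of two addresses. The helper name `same_subnet24` and the /24 netmask are assumptions; the post does not give the actual netmask.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if two IPv4 addresses share the same /24
# prefix (first three octets). Assumes a 255.255.255.0 netmask.
same_subnet24() {
  # ${var%.*} strips the last ".octet", leaving e.g. "192.168.11"
  [ "${1%.*}" = "${2%.*}" ]
}

# Compare the 16-subnet nodes against an address on the 11 subnet
# (192.168.11.168 is taken from the dashboard URL in this thread).
for node in 192.168.16.108 192.168.16.109 192.168.16.110 192.168.16.111; do
  if same_subnet24 192.168.11.168 "$node"; then
    echo "$node: same /24"
  else
    echo "$node: different /24"
  fi
done
```

Being on a different /24 is not a problem by itself as long as routing works; the check only confirms which machines fall on which side.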

Hi, that is not the command for adding nodes. Please check the documentation for the procedure to add a node.

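Sorry, my description was unclear. Some additional details:
1. Earlier I reset and re-ran the Ansible playbook and it stopped at the node step. After removing the IPs on the other subnet, the deployment completed normally; with them included, it does not.
2. After the reset and successful deployment in step 1, I edited hosts again to add the machines on the other subnet, and it got stuck at the node-join step again, as shown: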
TASK [node : Join to cluster if needed] ********************************************************************************************
Monday 12 November 2018 08:52:28 +0800 (0:00:01.433) 0:01:21.593 *******
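3. The machines are VMs in the Qingpu data center. The only difference is the subnet: the nodes that succeeded are on the 11 subnet like the master, while the 16-subnet machines hang. So I still need help with this, thanks.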

Please reboot the VMs on the 16 subnet and try again, thanks.

TASK [node : Join to cluster if needed] ********************************************************************************************
Monday 12 November 2018 09:26:55 +0800 (0:00:01.709) 0:01:11.626 *******


fatal: [k8s-slave-node-108-3]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"192.168.16.108\". Make sure this host can be reached over ssh", "unreachable": true}
fatal: [k8s-slave-node-109-4]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"192.168.16.109\". Make sure this host can be reached over ssh", "unreachable": true}
fatal: [k8s-slave-node-110-5]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"192.168.16.110\". Make sure this host can be reached over ssh", "unreachable": true}
fatal: [k8s-slave-node-111-6]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"192.168.16.111\". Make sure this host can be reached over ssh", "unreachable": true}

NO MORE HOSTS LEFT *****************************************************************************************************************
to retry, use: --limit @/root/kubeadm-ansible/scale.retry

PLAY RECAP *************************************************************************************************************************
k8s-master-node-184-1 : ok=39 changed=1 unreachable=0 failed=0
k8s-slave-node-108-3 : ok=41 changed=9 unreachable=1 failed=0
k8s-slave-node-109-4 : ok=40 changed=6 unreachable=1 failed=0
k8s-slave-node-110-5 : ok=40 changed=6 unreachable=1 failed=0
k8s-slave-node-111-6 : ok=40 changed=6 unreachable=1 failed=0
k8s-slave-node-168-2 : ok=41 changed=1 unreachable=0 failed=0

Monday 12 November 2018 09:34:44 +0800 (0:07:48.690) 0:09:00.316 *******
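With several nodes in play, a small filter can pull the problem hosts out of a PLAY RECAP like the one above. This is a sketch that assumes the `host : ok=N changed=N unreachable=N failed=N` layout shown here; `recap_problem_hosts` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read Ansible output on stdin and print the hosts whose
# PLAY RECAP line has a non-zero "unreachable" or "failed" counter.
recap_problem_hosts() {
  awk '
    / : ok=[0-9]+/ {
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")          # e.g. "unreachable=1" -> kv[1], kv[2]
        if ((kv[1] == "unreachable" || kv[1] == "failed") && kv[2] + 0 > 0) {
          print $1                  # first field is the host name
          break
        }
      }
    }
  '
}
```

For example, saving the run with `ansible-playbook ... | tee run.log` and then running `recap_problem_hosts < run.log` would print the four 16-subnet hosts for the recap above.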

It stopped at the reboot step; after re-running the playbook, it hung again.

Output at the point where it hung again:
TASK [node : Create kubeadm client config] *****************************************************************************************
Monday 12 November 2018 09:41:01 +0800 (0:00:00.664) 0:01:09.753 *******
ok: [k8s-slave-node-108-3]
ok: [k8s-slave-node-168-2]
ok: [k8s-slave-node-110-5]
ok: [k8s-slave-node-109-4]
ok: [k8s-slave-node-111-6]

TASK [node : Join to cluster if needed] ********************************************************************************************
Monday 12 November 2018 09:41:02 +0800 (0:00:01.389) 0:01:11.142 *******

https://192.168.11.168/#!/node?namespace=default

While it is hung, please check the kubelet log on one of the 16-subnet nodes and post it:

journalctl -n 100 -f -u kubelet

Nov 12 13:37:58 k8s-slave-node-108-3 systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 12 13:37:58 k8s-slave-node-108-3 systemd[1]: Unit kubelet.service entered failed state.
Nov 12 13:37:58 k8s-slave-node-108-3 systemd[1]: kubelet.service failed.
Nov 12 13:38:09 k8s-slave-node-108-3 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Nov 12 13:38:09 k8s-slave-node-108-3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 12 13:38:09 k8s-slave-node-108-3 systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Nov 12 13:38:09 k8s-slave-node-108-3 kubelet[15497]: I1112 13:38:09.168891 15497 feature_gate.go:156] feature gates: map[]
Nov 12 13:38:09 k8s-slave-node-108-3 kubelet[15497]: I1112 13:38:09.169024 15497 controller.go:114] kubelet config controller: starting controller
Nov 12 13:38:09 k8s-slave-node-108-3 kubelet[15497]: I1112 13:38:09.169035 15497 controller.go:118] kubelet config controller: validating combination of defaults and flags
Nov 12 13:38:09 k8s-slave-node-108-3 kubelet[15497]: error: unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
Nov 12 13:38:09 k8s-slave-node-108-3 systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 12 13:38:09 k8s-slave-node-108-3 systemd[1]: Unit kubelet.service entered failed state.
Nov 12 13:38:09 k8s-slave-node-108-3 systemd[1]: kubelet.service failed.
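Most of this kubelet output is restart noise; the lines that explain the exit are the klog `E` entries and the final `error:` line. A minimal filter, assuming the journal line format shown above (`kubelet_errors` is a hypothetical helper):

```bash
#!/usr/bin/env bash
# Hypothetical helper: from kubelet journal lines on stdin, keep only klog
# error entries ("E" + MMDD) and plain "error:" lines, which usually explain
# why the kubelet exited.
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+\]: (E[0-9]{4} |error: )'
}
```

For example, `journalctl -u kubelet -n 100 --no-pager | kubelet_errors` would reduce the log above to the single `unable to load client CA file` line.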

Please copy /etc/kubernetes/pki/ca.crt from the master node to the same location on the 16-subnet machines.
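The copy can be scripted from the master roughly as below. This sketch assumes root SSH access from the master to each node; `copy_ca_to_nodes` and the `SSH`/`SCP` overrides are hypothetical, added so the commands can be previewed with `echo`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the cluster CA certificate from this (master) host
# to each node given as an argument. Assumes root SSH access to the nodes.
# Set SSH=echo SCP=echo to print the commands instead of running them.
copy_ca_to_nodes() {
  local ca=/etc/kubernetes/pki/ca.crt node
  for node in "$@"; do
    # make sure the target directory exists on the node
    ${SSH:-ssh} "root@$node" mkdir -p /etc/kubernetes/pki || return 1
    ${SCP:-scp} "$ca" "root@$node:$ca" || return 1
  done
}
```

Usage on the master: `copy_ca_to_nodes 192.168.16.108 192.168.16.109 192.168.16.110 192.168.16.111`. Note that `ca.crt` and `bootstrap-kubelet.conf` are normally written on the node by `kubeadm join`, so when they are missing it suggests the join step itself never ran; copying them by hand is a diagnostic step rather than a fix.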

After copying, it still hangs as before.
The kubelet log on the 16-subnet machine is now:
Nov 12 14:16:13 k8s-slave-node-108-3 systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 12 14:16:13 k8s-slave-node-108-3 systemd[1]: Unit kubelet.service entered failed state.
Nov 12 14:16:13 k8s-slave-node-108-3 systemd[1]: kubelet.service failed.
Nov 12 14:16:23 k8s-slave-node-108-3 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Nov 12 14:16:23 k8s-slave-node-108-3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 12 14:16:23 k8s-slave-node-108-3 systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Nov 12 14:16:23 k8s-slave-node-108-3 kubelet[18462]: I1112 14:16:23.658107 18462 feature_gate.go:156] feature gates: map[]
Nov 12 14:16:23 k8s-slave-node-108-3 kubelet[18462]: I1112 14:16:23.658215 18462 controller.go:114] kubelet config controller: starting controller
Nov 12 14:16:23 k8s-slave-node-108-3 kubelet[18462]: I1112 14:16:23.658221 18462 controller.go:118] kubelet config controller: validating combination of defaults and flags
Nov 12 14:16:23 k8s-slave-node-108-3 kubelet[18462]: I1112 14:16:23.663944 18462 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Nov 12 14:16:23 k8s-slave-node-108-3 kubelet[18462]: I1112 14:16:23.664002 18462 client.go:95] Start docker client with request timeout=2m0s
Nov 12 14:16:23 k8s-slave-node-108-3 kubelet[18462]: E1112 14:16:23.664369 18462 kube_docker_client.go:91] failed to retrieve docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Nov 12 14:16:23 k8s-slave-node-108-3 kubelet[18462]: W1112 14:16:23.664391 18462 kube_docker_client.go:92] Using empty version for docker client, this may sometimes cause compatibility issue.
Nov 12 14:16:23 k8s-slave-node-108-3 kubelet[18462]: W1112 14:16:23.664604 18462 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 12 14:16:23 k8s-slave-node-108-3 kubelet[18462]: I1112 14:16:23.668741 18462 feature_gate.go:156] feature gates: map[]
Nov 12 14:16:23 k8s-slave-node-108-3 kubelet[18462]: W1112 14:16:23.668881 18462 server.go:289] --cloud-provider=auto-detect is deprecated. The desired cloud provider should be set explicitly
Nov 12 14:16:23 k8s-slave-node-108-3 kubelet[18462]: error: failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Nov 12 14:16:23 k8s-slave-node-108-3 systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 12 14:16:23 k8s-slave-node-108-3 systemd[1]: Unit kubelet.service entered failed state.
Nov 12 14:16:23 k8s-slave-node-108-3 systemd[1]: kubelet.service failed.

I've sent you the machine login details by private message.

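After investigating: a different version of Docker was already installed on node k8s-slave-node-108-3, which made the script fail while installing Docker. Docker has now been reinstalled for you.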
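To spot a pre-existing Docker install like this before running the playbook, one option on an RPM-based system is to list the Docker packages on each node and compare them with a node that joined successfully (such as k8s-slave-node-168-2). The OS is not stated in the thread, so the RPM assumption and the `docker_packages` helper are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `rpm -qa` output on stdin and print the
# Docker-related packages, sorted. Assumes an RPM-based distribution.
docker_packages() {
  grep -Ei '^docker' | sort
}
```

For example, run `rpm -qa | docker_packages` on each node and diff the result against the working node; `systemctl is-active docker` also shows whether the daemon is running, which is what the `Cannot connect to the Docker daemon` line in the kubelet log was hinting at.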

Oh, so that was it. Thanks a lot, much appreciated!