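zookeeper installation

  • Choerodon platform version: 0.9.0

  • Step where the problem occurred: deploying zookeeper

  • Documentation URL: http://choerodon.io/zh/docs/installation-configuration/steps/parts/base/zookeeper/

  • Environment information (e.g. node information):

  • Error log: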
    2018-08-25 17:33:30,464 [myid:3] - INFO [WorkerSender[myid=3]:QuorumPeer$QuorumServer@167] - Resolved hostname: zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local to address: zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.65.9
    2018-08-25 17:33:35,469 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumCnxManager@588] - Cannot open channel to 1 at election address zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.65.9:3888
    java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:614)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:913)

  • Cause analysis:
    ping zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local

PING zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local (10.233.65.9) 56(84) bytes of data.
64 bytes from 10.233.65.9 (10.233.65.9): icmp_seq=1 ttl=63 time=0.172 ms
64 bytes from 10.233.65.9 (10.233.65.9): icmp_seq=2 ttl=63 time=0.207 ms
64 bytes from 10.233.65.9 (10.233.65.9): icmp_seq=3 ttl=63 time=0.201 ms
64 bytes from 10.233.65.9 (10.233.65.9): icmp_seq=4 ttl=63 time=0.189 ms

The svc domain name can be pinged. zookeeper starts successfully, but the nodes cannot connect to each other. The error from the log is shown above.
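A successful ping only shows that ICMP reaches the pod IP; the log above reports a TCP connect timeout on port 3888, which is a separate check. A minimal sketch of a TCP probe follows, assuming Python is available on the machine or pod you test from. The function names are illustrative, and the default port list (2181, 2888, 3888) is taken from the ports mentioned in this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        return False


def probe(host, ports=(2181, 2888, 3888)):
    """Probe each port on host and return a {port: reachable} mapping."""
    return {port: can_connect(host, port) for port in ports}


# Example call (hostname copied from the log above):
# probe("zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local")
```

If the probe returns False for a port while ping succeeds, either nothing is listening on that port or something between the pods is dropping TCP traffic.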

  • Question:

Check whether flannel is reporting any errors.

None of the 4 flannel nodes reports any errors.

2018-08-25 18:23:22,120 [myid:1] - WARN [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumCnxManager@588] - Cannot open channel to 3 at election address zookeeper-2.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.66.11:3888
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:614)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:913)
2018-08-25 18:23:22,121 [myid:1] - INFO [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumPeer$QuorumServer@167] - Resolved hostname: zookeeper-2.zookeeper-headless.jmsw-devops.svc.cluster.local to address: zookeeper-2.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.66.11
2018-08-25 18:23:22,121 [myid:1] - INFO [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@852] - Notification time out: 800

The zookeeper cluster keeps reporting this error. It's driving me crazy.

You could try deleting all the existing flannel pods.

Please check whether I wrote this correctly:
net-conf.json: |
Modified:
{
"Network": "10.233.64.0/18",
"Backend": {
"Type": "ali-vpc"
}
}

Original:
{
"Network": "[PodsSubnet]",
"Backend": {
"Type": "ali-vpc"
}
}
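One quick sanity check on a net-conf.json like the one above is to confirm that it parses as JSON (which requires straight double quotes) and that the pod IPs seen in the zookeeper logs fall inside the declared Network range. The sketch below assumes Python is available. The helper name is illustrative, and the IP list is copied from the logs in this thread. It only checks consistency of the address range and says nothing about whether the ali-vpc backend itself is working.

```python
import ipaddress
import json

# The modified net-conf.json from the post above, with straight quotes.
NET_CONF = """
{
  "Network": "10.233.64.0/18",
  "Backend": {
    "Type": "ali-vpc"
  }
}
"""

# Pod IPs copied from the zookeeper logs in this thread.
POD_IPS = ["10.233.65.9", "10.233.66.11", "10.233.64.11", "10.233.65.10"]


def ips_outside_network(net_conf_text, ips):
    """Return the IPs that do not belong to the configured flannel Network."""
    conf = json.loads(net_conf_text)  # raises ValueError on malformed JSON
    network = ipaddress.ip_network(conf["Network"])
    return [ip for ip in ips if ipaddress.ip_address(ip) not in network]


print(ips_outside_network(NET_CONF, POD_IPS))  # prints []
```

An empty list means every listed pod IP lies inside 10.233.64.0/18, so the range itself is not what is blocking the connections.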

The log says it cannot connect to zookeeper-2, so you should first check whether the zookeeper-2 pod itself is reporting errors.

2018-08-26 04:35:18,888 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer@167] - Resolved hostname: zookeeper-1.zookeeper-headless.jmsw-devops.svc.cluster.local to address: zookeeper-1.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.65.10
2018-08-26 04:35:18,888 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@852] - Notification time out: 60000
2018-08-26 04:36:23,888 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumCnxManager@588] - Cannot open channel to 1 at election address zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.64.11:3888
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:614)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:913)
2018-08-26 04:36:23,889 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer@167] - Resolved hostname: zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local to address: zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.64.11
2018-08-26 04:36:28,890 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumCnxManager@588] - Cannot open channel to 2 at election address zookeeper-1.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.65.10:3888

As I already said, the whole zk cluster is reporting errors.
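To see at a glance which peers each node fails to reach, the WARN lines in logs like the ones above can be tallied. This is a rough sketch, assuming Python and a log saved as text. The regular expression is written against the line format shown in this thread and may need adjusting for other zookeeper versions; the function name is illustrative.

```python
import re
from collections import Counter

# Matches lines such as:
# ... [myid:3] - WARN [...] - Cannot open channel to 1 at election address host/ip:port
WARN_RE = re.compile(
    r"\[myid:(?P<src>\d+)\] - WARN .*Cannot open channel to (?P<dst>\d+) "
    r"at election address (?P<host>[^/\s]+)/(?P<ip>[\d.]+):(?P<port>\d+)"
)


def failed_channels(log_text):
    """Count failures per (source myid, target myid, target IP, port)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARN_RE.search(line)
        if match:
            key = (int(match["src"]), int(match["dst"]),
                   match["ip"], int(match["port"]))
            counts[key] += 1
    return counts


# Two WARN lines copied from the log posted above.
SAMPLE = """\
2018-08-26 04:36:23,888 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumCnxManager@588] - Cannot open channel to 1 at election address zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.64.11:3888
2018-08-26 04:36:28,890 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumCnxManager@588] - Cannot open channel to 2 at election address zookeeper-1.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.65.10:3888
"""

for (src, dst, ip, port), n in sorted(failed_channels(SAMPLE).items()):
    print(f"myid {src} -> myid {dst} ({ip}:{port}): {n} failure(s)")
```

Running it over the logs of all three pods shows whether every pair fails (pointing at the network or at ports not listening) or only one target does (pointing at that pod).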

Enter the container and check whether these three ports are listening.
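If tools such as netstat or ss are missing from the image, the listening sockets can also be read from /proc/net/tcp and /proc/net/tcp6 on Linux. The sketch below assumes the three ports are 2181, 2888 and 3888 (the ports named later in this thread) and that Python is available where the file contents are inspected. The zookeeper image may not ship Python, in which case the output of `cat /proc/net/tcp /proc/net/tcp6` from the container can be passed to `listening_ports` as text. The function names are illustrative.

```python
def listening_ports(proc_net_text):
    """Return the set of ports in LISTEN state from /proc/net/tcp-style text."""
    ports = set()
    for line in proc_net_text.splitlines():
        fields = line.split()
        # Data rows look like: "0: 00000000:0885 00000000:0000 0A ..."
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue  # skips the header row and blank lines
        local_address, state = fields[1], fields[3]
        if state == "0A":  # 0A is the kernel's code for TCP LISTEN
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports


def missing_ports(expected=(2181, 2888, 3888),
                  paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Return the expected ports that are not listening on this host."""
    found = set()
    for path in paths:
        try:
            with open(path) as handle:
                found |= listening_ports(handle.read())
        except OSError:
            pass  # file absent (e.g. IPv6 disabled or not Linux)
    return sorted(set(expected) - found)
```

Java processes often bind IPv6 wildcard sockets, so the tcp6 file is worth checking as well as tcp.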

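I just checked, and the ports are not up. The k8s cluster and NFS were both deployed following your documentation. I've spent several days on this deployment. I'm really speechless.

I just set the zookeeper parameter --set persistence.enabled=false and got the same error.

Since the corresponding ports in the container are not listening, connections will naturally fail. Have you deployed zookeeper more than once?

I even reinstalled the Linux system and tried again, and it still fails in the same way.

Did you clear the NFS data? zookeeper fails to start if it picks up data belonging to another node.

Yes, I cleared it. I also installed without NFS and got the same error.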

Try the following steps:
1. First run the command below to stop zookeeper

kubectl scale statefulset zookeeper --replicas=0 -n [namespace]

2. Delete all zookeeper data in NFS (very important; this does not mean deleting the pv and pvc)
3. Restart flannel

kubectl get po -n kube-system -l k8s-app=flannel -o name  | xargs -I {} kubectl delete {} -n kube-system

4. Start zookeeper again

kubectl scale statefulset zookeeper --replicas=3 -n [namespace]

Same error:
2018-08-26 09:21:00,052 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer@167] - Resolved hostname: zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local to address: zookeeper-0.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.67.6
2018-08-26 09:21:05,054 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumCnxManager@588] - Cannot open channel to 2 at election address zookeeper-1.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.65.8:3888
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:614)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:913)
2018-08-26 09:21:05,054 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer@167] - Resolved hostname: zookeeper-1.zookeeper-headless.jmsw-devops.svc.cluster.local to address: zookeeper-1.zookeeper-headless.jmsw-devops.svc.cluster.local/10.233.65.8
2018-08-26 09:21:05,054 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@852] - Notification time out: 25600

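All 3 containers show the same thing.

Have you configured the ECS security group?

Does the security group need to open ports 2181, 2888 and 3888?

Just add them as described in the documentation.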