harbor-jobservice keeps restarting and failing

Chen81147 · December 21, 2021, 08:07

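All other services are running normally. Only Harbor's jobservice keeps restarting, and deleting the pod does not help; the new pod fails with the same error.
Error log from the jobservice pod: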

[root ~]# kubectl logs -n c7n-system  harbor-harbor-jobservice-6dc7955984-vsd9s -f
2021-12-21T07:49:38Z [FATAL] [/jobservice/main.go:80]: load and run worker error: connect to redis server timeout: 
MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk.
 Commands that may modify the data set are disabled,
 because this instance is configured to report errors during writes if RDB snapshotting fails 
(stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.
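The MISCONF message means that Redis's last background RDB save failed and that, with stop-writes-on-bgsave-error enabled, Redis now rejects write commands. A minimal sketch for confirming this from the Redis side, assuming redis-cli is available inside the Redis container, that no Redis password is set, and that the pod is named harbor-harbor-redis-0 as in the commands later in this thread. The helper name rdb_bgsave_status is hypothetical:

```shell
# Extract the rdb_last_bgsave_status field from `redis-cli INFO persistence`
# output supplied on stdin. INFO lines end in CRLF, so strip the carriage return.
rdb_bgsave_status() {
  tr -d '\r' | awk -F: '$1 == "rdb_last_bgsave_status" { print $2 }'
}

# Usage (pod name and namespace taken from this thread):
#   kubectl exec -n c7n-system harbor-harbor-redis-0 -- redis-cli INFO persistence \
#     | rdb_bgsave_status
```

A result of err would match the MISCONF error; ok would suggest the save problem has already cleared.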

Events from kubectl describe for the jobservice pod:

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  27m                   default-scheduler  Successfully assigned c7n-system/harbor-harbor-jobservice-6dc7955984-vsd9s to node1
  Normal   Started    25m (x2 over 27m)     kubelet            Started container jobservice
  Warning  Unhealthy  24m (x15 over 26m)    kubelet            Readiness probe failed: Get "http://192.168.14.200:8080/api/v1/stats": dial tcp 192.168.14.200:8080: connect: connection refused
  Normal   Pulled     23m (x3 over 27m)     kubelet            Container image "goharbor/harbor-jobservice:v2.1.4" already present on machine
  Normal   Created    23m (x3 over 27m)     kubelet            Created container jobservice
  Warning  BackOff    2m13s (x62 over 24m)  kubelet            Back-off restarting failed container

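The Unhealthy event above shows that the connection to 192.168.14.200 was refused.
I listed the CLUSTER-IP of every service with the command below, but this IP is not among them: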

kubectl get service --all-namespaces
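A likely reason the address is missing from the service list: the kubelet sends an HTTP readiness probe to the pod's own IP, not to a Service CLUSTER-IP. A small sketch for looking the address up among pods instead; the helper name pods_with_ip is hypothetical:

```shell
# Print "namespace/name" for every pod whose IP equals $1.
# Expects three whitespace-separated columns on stdin: namespace, name, pod IP.
pods_with_ip() {
  awk -v ip="$1" '$3 == ip { print $1 "/" $2 }'
}

# Usage:
#   kubectl get pods --all-namespaces --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,IP:.status.podIP \
#     | pods_with_ip 192.168.14.200
```

If this prints the jobservice pod itself, the refused connection only says that the container was not listening on port 8080 yet, which is consistent with it crashing on startup.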

The log above points to a Redis RDB problem. Redis itself had one Unhealthy event, but it looked fine after a restart.
Redis log:

1:M 21 Dec 07:03:15.018 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 21 Dec 07:03:15.018 # Server initialized
1:M 21 Dec 07:03:15.018 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 21 Dec 07:03:54.674 * DB loaded from append only file: 39.656 seconds
1:M 21 Dec 07:03:54.674 * Ready to accept connections

Also, every other service was installed with the one-click installer; only Harbor was installed separately, step by step.

Vista · December 21, 2021, 08:22

First, harbor-jobservice restarting repeatedly does not stop Harbor from being used normally.

Please post Harbor's Helm configuration, and remember to hide any sensitive information.

Chen81147 · December 21, 2021, 08:27

Is this what you mean?

[root@env-sanitation-db ~]# helm version
version.BuildInfo{Version:"v3.2.4", GitCommit:"0ad800ef43d3b826f3******fe05d143688", GitTreeState:"clean", GoVersion:"go1.13.12"}

helm list returns nothing:

[root@env-sanitation-db ~]# helm list
NAME	NAMESPACE	REVISION	UPDATED	STATUS	CHART	APP VERSION

Vista · December 21, 2021, 08:53

helm get values -n c7n-system harbor
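The empty helm list output earlier is expected with Helm 3, which lists releases only in the current namespace unless told otherwise. A short sketch for finding the release, assuming it lives in c7n-system and is named harbor as in the command above. The helper name releases_in_ns is hypothetical:

```shell
# Print the release names in namespace $1, given the table printed by
# `helm list --all-namespaces` on stdin. Columns are tab-separated and may be
# padded with spaces, so trim the two fields that are compared or printed.
releases_in_ns() {
  awk -F'\t' -v ns="$1" 'NR > 1 {
    name = $1; n = $2
    gsub(/^ +| +$/, "", name)
    gsub(/^ +| +$/, "", n)
    if (n == ns) print name
  }'
}

# Usage:
#   helm list --all-namespaces | releases_in_ns c7n-system
# or query the namespace directly:
#   helm list -n c7n-system
```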

Chen81147 · December 21, 2021, 08:55

Chen81147 · January 14, 2022, 03:21

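This is resolved now.
Harbor's jobservice kept restarting because it could not connect to Harbor's Redis.
The Redis log showed that Redis had no permission on the dump.rdb snapshot file it was trying to read and reported Permission denied.
So I opened a shell inside the Redis container: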

kubectl exec -it harbor-harbor-redis-0 -n c7n-system -- bash

and ran:

# Change the file permissions
chmod 777 dump.rdb

Then delete the harbor-redis pod so that it is recreated:

kubectl delete pod -n c7n-system harbor-harbor-redis-0

If jobservice still fails to restart, delete the jobservice pod and wait for it to be recreated:

kubectl delete pod -n c7n-system harbor-harbor-jobservice-6dc7955984-svvmr
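Since chmod 777 opens the file to every user in the container, it can be worth checking afterwards that the fix actually holds. A minimal sketch, assuming it is run inside the Redis container as the user Redis runs as, that redis-cli is available there, and that the snapshot file is named dump.rdb as above. The helper name rdb_is_writable is hypothetical:

```shell
# Return 0 if the current user can both read and write the file at $1.
rdb_is_writable() {
  [ -r "$1" ] && [ -w "$1" ]
}

# Usage inside the Redis container:
#   rdb_is_writable "$(redis-cli CONFIG GET dir | tail -n 1)/dump.rdb" \
#     && echo "dump.rdb is readable and writable"
#   redis-cli BGSAVE   # then look at rdb_last_bgsave_status in `redis-cli INFO persistence`
```

Redis also needs write access to the directory itself, because it writes a temporary file there before replacing dump.rdb, so a failing BGSAVE after this check would point at the directory permissions.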