k8s离线部署-使用kubekey部署amd64高可用版k8s1.23.17-ksp3.4.1

一、环境信息介绍

(1) 相关服务器信息:

| 主机名 | IP | root密码 | 规格 | 磁盘 | 操作系统 | 备注 |
| --- | --- | --- | --- | --- | --- | --- |
| prepare | 10.13.15.6 | cloud@2020 | 8c16g | 400G | Ubuntu 20.04.3 LTS -amd64 | 联网,生成artifact等 |
| k8s01-1 | 10.13.15.16 | cloud@2020 | 8c16g | 400G+960G | Ubuntu 20.04.3 LTS -amd64 | K8s节点,未联网 |
| k8s01-2 | 10.13.15.17 | cloud@2020 | 8c16g | 400G+960G | Ubuntu 20.04.3 LTS -amd64 | K8s节点,未联网 |
| k8s01-3 | 10.13.15.161 | cloud@2020 | 8c16g | 400G+960G | Ubuntu 20.04.3 LTS -amd64 | K8s节点,未联网 |
| k8s01-4 | 10.13.15.99 | cloud@2020 | 8c16g | 400G+960G | Ubuntu 20.04.3 LTS -amd64 | K8s节点,未联网 |
| registry | 10.13.15.71 | cloud@2020 | 4c8g | 400G | Ubuntu 20.04.3 LTS -amd64 | Harbor仓库,未联网 |

根据官方文档建议:仓库与集群分离部署,减少相互影响,所以多了一个 registry 服务器。

服务器IP本来想规划成连续的,后来因为服务器不是一次创建的、此文也不是一次写完的,遂没有再关注。

如果是自己搭建着学习或实验,服务器规格减半也是可以的。

(2) 环境涉及软件版本信息:

  • 操作系统:Ubuntu 20.04.3 LTS -amd64
  • KubeSphere:v3.4.1
  • K8s:v1.23.17
  • Docker:24.0.6
  • KubeKey: v3.0.13
  • Harbor:2.5.3

二、离线安装介绍

KubeKey 是一个用于部署 Kubernetes 集群的开源轻量级工具。它提供了一种灵活、快速、便捷的方式来仅安装 Kubernetes/K3s,或同时安装 Kubernetes/K3s 和 KubeSphere,以及其他云原生插件。除此之外,它也是扩展和升级集群的有效工具。

KubeKey v2.1.0 版本新增了清单(manifest)和制品(artifact)的概念,为用户离线部署 Kubernetes 集群提供了一种解决方案。manifest 是一个描述当前 Kubernetes 集群信息和定义 artifact 制品中需要包含哪些内容的文本文件。

使用 KubeKey,用户只需使用清单 manifest 文件来定义将要离线部署的集群环境需要的内容,再通过该 manifest 来导出制品 artifact 文件即可完成准备工作。离线部署时只需要 KubeKey 和 artifact 就可快速、简单的在环境中部署镜像仓库和 Kubernetes 集群。

根据青云官网的离线部署文档,生成 manifest 文件有两种方式:

  • 参考官方文档,在示例的基础上稍作修改(文档中有修改说明),生成满足自己需求的 manifest 文件。
  • 在已有集群中执行 KubeKey 命令生成 manifest 文件,如有需要可再参考官网文档进行修改定制(不修改的话,可想而知生成的制品及最后搭建的环境相关软件版本及镜像跟当前集群一样)。

本文中使用第1种方式生成 manifest 文件以及创建后续需要使用到的制品(artifact)。

整个离线部署K8S+KSP的过程的主线就是制作制品、使用制品,其他操作基本上都是围绕它展开的。

三、服务器的前置操作

建议所有服务器都要执行。

3.1 设置各服务器时区

#所有服务器设置时区
timedatectl set-timezone Asia/Shanghai
#所有未联网服务器关闭网络时间同步
timedatectl set-ntp false
#所有服务器设置北京时间,尽量相同或靠近
timedatectl set-time "11:08:30"

3.2 设置各服务器主机名

hostnamectl set-hostname xxx
#然后退出终端并重新创建一个终端会话

3.3 安装配置时间同步服务

强烈建议提前做好如下准备(否则,安装配置时间同步服务无法采用如下方法完成):

  • 搭建操作系统组件安装源仓库
  • 部署了自己的本地NTP服务器

一般在搭建软件解决方案前就会做好所有服务器的时间同步。如果没有搭建操作系统组件安装源仓库,后面使用kk正式部署k8s+ksp时,也会在所有节点上部署chrony服务,但不会部署本地 NTP服务器,所以还是需要自己想办法提前部署好本地 NTP服务器。
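
下面给出一个把本地NTP服务器(假设其地址为10.13.15.20,与后文示例保持一致)上的 chrony 配置成对内授时服务的最小示例,仅供参考,具体请按自己环境调整:

#在本地NTP服务器(假设为10.13.15.20)上安装 chrony 后,在 /etc/chrony/chrony.conf 中追加如下两行
allow 10.13.15.0/24    # 允许该网段内的服务器向本机同步时间
local stratum 10       # 即使本机未能与上游时间源同步,也以本地时钟对外提供授时
#然后重启服务使配置生效
systemctl restart chrony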

安装时间同步服务

#1.安装 chrony 作为时间同步软件
#正常情况下内网环境下搭建软件解决方案,应该会先自行搭建操作系统组件安装源仓库;
apt-get install chrony -qy

配置时间同步服务

#2.修改配置文件 /etc/chrony/chrony.conf,修改 ntp 服务配置
vi /etc/chrony/chrony.conf
# 将如下默认的 pool 配置删除(离线环境中也用不上)
pool ntp.ubuntu.com iburst maxsources 4
pool 0.ubuntu.pool.ntp.org iburst maxsources 1
pool 1.ubuntu.pool.ntp.org iburst maxsources 1
pool 2.ubuntu.pool.ntp.org iburst maxsources 2
# 然后配置自己本地的NTP服务器(最好是一台可以访问外网的服务器),NTP服务器步骤此文未涉及
# 假如10.13.15.20是本地NTP服务器,则添加如下配置
server 10.13.15.20 iburst
#3.重启 chrony 服务:
systemctl restart chrony

#4.验证 chrony 同步状态:
chronyc sourcestats -v
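
如果希望立即完成一次时间校正并查看更详细的同步状态,还可以参考如下 chronyc 子命令(非必须步骤):

#查看当前同步状态与时间偏移
chronyc tracking
#立即步进校正一次系统时间
chronyc makestep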

3.4 数据盘配置(可选)

此次用来搭建k8s环境的几台k8s节点服务器都配置了一个400G大小的系统盘、一个960G的数据盘。

image-20240927113654261

本文演示每台服务器新增一块数据盘 /dev/vdb(就是说下面的操作要在每个k8s节点服务器上执行)。

3.4.1 方法1-裸盘

将数据盘格式化后挂载到服务器指定目录上(显然,此种方式不支持再次扩容),作为 docker 与 Kubernetes Pod 的专用存储盘(需要说明的是,在笔者接触到的环境中,业务数据存储一般是对接ceph等独立的分布式存储系统,所以只有日志文件会存储在本地)。

下面以在k8s01-1服务器上的操作为例进行阐述,其他节点也需要执行同样的操作。

# mklabel 定义分区表格式(常用的有msdos和gpt分区表格式,msdos不支持2TB以上容量的磁盘,所以大于2TB的磁盘选gpt分区表格式)
# mkpart 创建一个分区,名称为data_k8s,此处不支持声明ext4分区格式(所以下行需要用mkfs.ext4),1表示分区起始位置是距离磁盘起始点1 MB,-1表示分区结束位置是磁盘结束位置
root@k8s01-1:~# parted /dev/vdb -s -- mklabel gpt mkpart data_k8s 1 -1
#将分区格式化为ext4格式
root@k8s01-1:~# mkfs.ext4 /dev/vdb1

#手动挂载磁盘分区
root@k8s01-1:/# mkdir /data/
root@k8s01-1:/# mount /dev/vdb1 /data/

#配置开机自动挂载方法1
root@k8s01-1:/# tail -1 /etc/mtab
root@k8s01-1:/# tail -1 /etc/mtab >> /etc/fstab
#查看某分区的uuid: blkid /dev/vdb1
#配置开机自动挂载的第2种方法:可以手动编辑/etc/fstab,添加如下内容
root@k8s01-1:/# vi /etc/fstab
/dev/vdb1 /data ext4 rw,relatime 0 0
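
如果担心设备名 /dev/vdb1 在重启或磁盘调整后发生变化,也可以改用 UUID 方式写 /etc/fstab,示例如下(UUID 请以 blkid 的实际输出为准):

root@k8s01-1:/# blkid /dev/vdb1
#假设输出的 UUID 为 xxxxxxxx-xxxx-xxxx,则 /etc/fstab 中对应条目可写成:
UUID=xxxxxxxx-xxxx-xxxx /data ext4 defaults 0 0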

3.4.2 方法2-LVM

以LVM(逻辑卷管理)的方式将数据盘挂载到服务器指定目录上(后续可以按需扩容),作为 docker 与 Kubernetes Pod 的专用存储盘(需要说明的是,在笔者接触到的环境中,业务数据存储一般是对接ceph等独立的分布式存储系统,所以只有日志文件会存储在本地)。

下面以在k8s01-1服务器上的操作为例进行阐述,其他节点也需要执行同样的操作。

#创建 PV 物理卷
root@k8s01-1:~# pvcreate /dev/vdb
#创建 VG 卷组vgk8s,将物理卷 /dev/vdb 添加到此卷组中。卷组是一种逻辑存储单元,它可以由一个或多个物理卷组成。
root@k8s01-1:~# vgcreate vgk8s /dev/vdb
#查看所有的卷组
root@k8s01-1:~# vgdisplay
#创建 LV 逻辑卷(使用卷组vgk8s的100%所有空间,vgk8s 是VG 名字,lvk8s 是LV 名字),此处一个逻辑卷对应一个卷组(理论上来说,一个卷组可以被划分为一个或多个逻辑卷)
root@k8s01-1:~# lvcreate -l 100%VG vgk8s -n lvk8s
#查看逻辑卷的挂载目录
root@k8s01-1:~# lvdisplay
...
LV Path /dev/vgk8s/lvk8s
...

#查看并格式化逻辑卷
root@k8s01-1:~# ll /dev/vgk8s/lvk8s
root@k8s01-1:~# mkfs.ext4 /dev/vgk8s/lvk8s
#使用LVM的好处就是后续可以根据需要再向此逻辑卷中增加空间,比如现在需要再增加空间且新增了一块磁盘/dev/vdc
#但此情况一般不会出现,正如前面所述,业务数据存储一般是对接ceph等独立的分布式存储系统,所以只有日志文件会存储在本地,一般不会需要如此大空间、存储这么久的日志信息(至少笔者未遇到)
#首先创建物理卷
root@k8s01-1:~# pvcreate /dev/vdc
root@k8s01-1:~# vgextend vgk8s /dev/vdc
#将逻辑卷 /dev/vgk8s/lvk8s 所在卷组(即vgk8s)中所有(100%)剩余可用容量(FREE)扩容给逻辑卷/dev/vgk8s/lvk8s
root@k8s01-1:~# lvextend -l +100%FREE /dev/vgk8s/lvk8s
#查看逻辑卷
root@k8s01-1:~# lvscan
ACTIVE '/dev/vgk8s/lvk8s' [1.87 TiB] inherit
#或者,使用 lvdisplay 可以看到其更详细的信息
root@k8s01-1:~# lvdisplay
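#(补充)如果此时逻辑卷上已经创建并挂载了 ext4 文件系统,lvextend 之后还需要扩展文件系统本身才能用上新增空间
#(也可以在执行 lvextend 时直接加 -r 参数,一步完成逻辑卷与文件系统的扩展)
root@k8s01-1:~# resize2fs /dev/vgk8s/lvk8s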
#手动挂载逻辑卷
root@k8s01-1:/# mkdir /data/
root@k8s01-1:/# mount /dev/vgk8s/lvk8s /data/

#配置开机自动挂载方法1
root@k8s01-1:/# tail -1 /etc/mtab
root@k8s01-1:/# tail -1 /etc/mtab >> /etc/fstab
#配置开机自动挂载的第2种方法:可以手动编辑/etc/fstab,添加如下内容
root@k8s01-1:/# vi /etc/fstab
/dev/vgk8s/lvk8s /data ext4 rw,relatime 0 0

四、离线部署资源文件制作

制作离线部署资源文件需要一台能连接外网的服务器(本文中是 prepare 服务器)。

在该服务器上下载 KubeKey v3.0.13 文件。

4.1 准备 KubeKey 文件

root@prepare:~# mkdir /opt/prepare
root@prepare:~# cd /opt/prepare
# 选择中文区下载(访问 GitHub 受限时使用)
root@prepare:/opt/prepare# export KKZONE=cn
# 执行下载命令,获取指定版本的 kk(受限于网络,有时需要执行多次)
root@prepare:/opt/prepare# curl -sfL https://get-kk.kubesphere.io | VERSION=v3.0.13 sh -
root@prepare:/opt/prepare# chown root:root kk
#查看kubekey3.0.13版本
root@prepare:/opt/prepare# ./kk version
kk version: &version.Info{Major:"3", Minor:"0", GitVersion:"v3.0.13", GitCommit:"ac75d3ef3c22e6a9d999dcea201234d6651b3e72", GitTreeState:"clean", BuildDate:"2023-11-07T08:42:04Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}

4.2 准备 ubuntu-20.04-debs-amd64.iso

本实验环境使用的操作系统是 x86_64 的 Ubuntu 20.04.3 LTS,所以只下载 Ubuntu 20.04.3 LTS 的操作系统依赖组件包;其他操作系统对应的包可在 KubeKey releases 页面下载。

#执行下面的命令,在能联网的部署服务器上执行下载。网络访问受限时,也可以通过其他方式,将该 ISO 下载后放到制作离线镜像的服务器的 /opt/prepare 目录下。
# KubeKey v3.0.13 的 release 中没有此包,只能在 v3.0.12 的 releases 中下载。
root@prepare:/opt/prepare# wget https://github.com/kubesphere/kubekey/releases/download/v3.0.12/ubuntu-20.04-debs-amd64.iso

# 验证 sha256sum,确保 ISO 在下载过程中没出问题(官方提供的 sha256sum 信息在 https://github.com/kubesphere/kubekey/releases/download/v3.0.12/ubuntu-20.04-debs.iso.sha256sum.txt)
root@prepare:/opt/prepare# sha256sum ubuntu-20.04-debs-amd64.iso
9c35697e4192c57a9195a8d216beda71058a420444c89b895147f27339c369b9 ubuntu-20.04-debs-amd64.iso
root@prepare:/opt/prepare# cat ubuntu-20.04-debs.iso.sha256sum.txt
9c35697e4192c57a9195a8d216beda71058a420444c89b895147f27339c369b9 ubuntu-20.04-debs-amd64.iso
99152a2675d334cf4d17a32bd99ca1fa21616b5bfe45bb5c93f5175ebca472a0 ubuntu-20.04-debs-arm64.iso

4.3 准备 manifest 文件

前文已经说过,本文是在官方示例的基础上经过修改生成 manifest 文件。官方示例目前笔者知道有两个页面可以看到:青云官网的离线部署文档 与 manifest-example。

# ksp v3.4.1 对应的完整镜像列表可以通过如下链接下载得到
root@prepare:/opt/prepare# wget https://github.com/kubesphere/ks-installer/releases/download/v3.4.1/images-list.txt
#但其中包含的是 docker.io 上的镜像;而在上述“青云官网的离线部署文档 https://www.kubesphere.io/zh/docs/v3.4/installing-on-linux/introduction/air-gapped-installation/”中,示例使用的是aliyuncs镜像仓库中的对应镜像。通过参考两处的镜像列表并进行修改,同时修改镜像列表之外的其他配置,最终得到如下k8sV12317-kspV341-manifest.yaml
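#(补充示意)假设 images-list.txt 中每行是一个完整的镜像名,可以先用类似下面的命令把镜像名批量替换为 aliyuncs 仓库前缀,
#再人工核对个别镜像的版本差异(仅为思路示意,并非官方步骤,输出文件名 images-list-aliyun.txt 为自拟):
root@prepare:/opt/prepare# grep -v '^#' images-list.txt | sed 's#^.*/#registry.cn-beijing.aliyuncs.com/kubesphereio/#' > images-list-aliyun.txt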

root@prepare:/opt/prepare# vi k8sV12317-kspV341-manifest.yaml
#修改后的文件内容如下:
---
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Manifest
metadata:
name: sample
spec:
arches:
- amd64
operatingSystems:
- arch: amd64
type: linux
id: ubuntu
version: "20.04"
osImage: Ubuntu 20.04.3 LTS
repository:
iso:
localPath: /opt/prepare/ubuntu-20.04-debs-amd64.iso
url:
kubernetesDistributions:
- type: kubernetes
version: v1.23.17
components:
helm:
version: v3.9.0
cni:
version: v1.2.0
etcd:
version: v3.4.13
calicoctl:
version: v3.26.1
## For now, if your cluster container runtime is containerd, KubeKey will add a docker 20.10.8 container runtime in the below list.
## The reason is KubeKey creates a cluster with containerd by installing a docker first and making kubelet connect the socket file of containerd which docker contained.
containerRuntimes:
- type: docker
version: 24.0.6
crictl:
version: v1.24.0
docker-registry:
version: "2"
harbor:
version: v2.5.3
docker-compose:
version: v2.2.2
images:
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-apiserver:v1.23.17
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controller-manager:v1.23.17
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy:v1.23.17
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-scheduler:v1.23.17
- registry.cn-beijing.aliyuncs.com/kubesphereio/pause:3.6
- registry.cn-beijing.aliyuncs.com/kubesphereio/coredns:1.8.6
- registry.cn-beijing.aliyuncs.com/kubesphereio/cni:v3.26.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controllers:v3.26.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/node:v3.26.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/pod2daemon-flexvol:v3.26.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/typha:v3.23.2
- registry.cn-beijing.aliyuncs.com/kubesphereio/flannel:v0.12.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/linux-utils:3.3.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/haproxy:2.3
- registry.cn-beijing.aliyuncs.com/kubesphereio/nfs-subdir-external-provisioner:v4.0.2
- registry.cn-beijing.aliyuncs.com/kubesphereio/k8s-dns-node-cache:1.15.12
- registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.4.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/ks-apiserver:v3.4.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/ks-console:v3.4.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/ks-controller-manager:v3.4.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/kubectl:v1.22.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/kubectl:v1.21.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/kubectl:v1.20.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/kubefed:v0.8.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/tower:v0.2.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/minio:RELEASE.2019-08-07T01-59-21Z
- registry.cn-beijing.aliyuncs.com/kubesphereio/mc:RELEASE.2019-08-07T23-14-43Z
- registry.cn-beijing.aliyuncs.com/kubesphereio/snapshot-controller:v4.0.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/nginx-ingress-controller:v1.3.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/defaultbackend-amd64:1.4
- registry.cn-beijing.aliyuncs.com/kubesphereio/metrics-server:v0.4.2
- registry.cn-beijing.aliyuncs.com/kubesphereio/redis:5.0.14-alpine
- registry.cn-beijing.aliyuncs.com/kubesphereio/haproxy:2.0.25-alpine
- registry.cn-beijing.aliyuncs.com/kubesphereio/alpine:3.14
- registry.cn-beijing.aliyuncs.com/kubesphereio/openldap:1.3.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/netshoot:v1.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/cloudcore:v1.13.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/iptables-manager:v1.13.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/edgeservice:v0.3.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/gatekeeper:v3.5.2
- registry.cn-beijing.aliyuncs.com/kubesphereio/openpitrix-jobs:v3.3.2
- registry.cn-beijing.aliyuncs.com/kubesphereio/devops-apiserver:ks-v3.4.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/devops-controller:ks-v3.4.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/devops-tools:ks-v3.4.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/ks-jenkins:v3.4.0-2.319.3-1
- registry.cn-beijing.aliyuncs.com/kubesphereio/inbound-agent:4.10-2
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-base:v3.2.2
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-nodejs:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-maven:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-maven:v3.2.1-jdk11
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-python:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-go:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-go:v3.2.2-1.16
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-go:v3.2.2-1.17
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-go:v3.2.2-1.18
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-base:v3.2.2-podman
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-nodejs:v3.2.0-podman
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-maven:v3.2.0-podman
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-maven:v3.2.1-jdk11-podman
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-python:v3.2.0-podman
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-go:v3.2.0-podman
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-go:v3.2.2-1.16-podman
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-go:v3.2.2-1.17-podman
- registry.cn-beijing.aliyuncs.com/kubesphereio/builder-go:v3.2.2-1.18-podman
- registry.cn-beijing.aliyuncs.com/kubesphereio/s2ioperator:v3.2.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/s2irun:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/s2i-binary:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/tomcat85-java11-centos7:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/tomcat85-java11-runtime:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/tomcat85-java8-centos7:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/tomcat85-java8-runtime:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/java-11-centos7:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/java-8-centos7:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/java-8-runtime:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/java-11-runtime:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/nodejs-8-centos7:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/nodejs-6-centos7:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/nodejs-4-centos7:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/python-36-centos7:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/python-35-centos7:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/python-34-centos7:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/python-27-centos7:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/argocd:v2.3.3
- registry.cn-beijing.aliyuncs.com/kubesphereio/argocd-applicationset:v0.4.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/dex:v2.30.2
- registry.cn-beijing.aliyuncs.com/kubesphereio/redis:6.2.6-alpine
- registry.cn-beijing.aliyuncs.com/kubesphereio/configmap-reload:v0.7.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus:v2.39.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus-config-reloader:v0.55.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus-operator:v0.55.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-rbac-proxy:v0.11.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-state-metrics:v2.6.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/node-exporter:v1.3.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/alertmanager:v0.23.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/thanos:v0.31.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/grafana:8.3.3
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-rbac-proxy:v0.11.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/notification-manager-operator:v2.3.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/notification-manager:v2.3.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/notification-tenant-sidecar:v3.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6
- registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-oss:6.8.22
- registry.cn-beijing.aliyuncs.com/kubesphereio/opensearch:2.6.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/opensearch-dashboards:2.6.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/opensearch-curator:v0.0.5
- registry.cn-beijing.aliyuncs.com/kubesphereio/fluentbit-operator:v0.14.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/docker:19.03
- registry.cn-beijing.aliyuncs.com/kubesphereio/fluent-bit:v1.9.4
- registry.cn-beijing.aliyuncs.com/kubesphereio/log-sidecar-injector:v1.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/filebeat:6.7.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-events-operator:v0.6.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-events-exporter:v0.6.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-events-ruler:v0.6.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-auditing-operator:v0.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/kube-auditing-webhook:v0.2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/pilot:1.14.6
- registry.cn-beijing.aliyuncs.com/kubesphereio/proxyv2:1.14.6
- registry.cn-beijing.aliyuncs.com/kubesphereio/jaeger-operator:1.29
- registry.cn-beijing.aliyuncs.com/kubesphereio/jaeger-agent:1.29
- registry.cn-beijing.aliyuncs.com/kubesphereio/jaeger-collector:1.29
- registry.cn-beijing.aliyuncs.com/kubesphereio/jaeger-query:1.29
- registry.cn-beijing.aliyuncs.com/kubesphereio/jaeger-es-index-cleaner:1.29
- registry.cn-beijing.aliyuncs.com/kubesphereio/kiali-operator:v1.50.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/kiali:v1.50
- registry.cn-beijing.aliyuncs.com/kubesphereio/busybox:1.31.1
- registry.cn-beijing.aliyuncs.com/kubesphereio/nginx:1.14-alpine
- registry.cn-beijing.aliyuncs.com/kubesphereio/wget:1.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/hello:plain-text
- registry.cn-beijing.aliyuncs.com/kubesphereio/wordpress:4.8-apache
- registry.cn-beijing.aliyuncs.com/kubesphereio/hpa-example:latest
- registry.cn-beijing.aliyuncs.com/kubesphereio/fluentd:v1.4.2-2.0
- registry.cn-beijing.aliyuncs.com/kubesphereio/perl:latest
- registry.cn-beijing.aliyuncs.com/kubesphereio/examples-bookinfo-productpage-v1:1.16.2
- registry.cn-beijing.aliyuncs.com/kubesphereio/examples-bookinfo-reviews-v1:1.16.2
- registry.cn-beijing.aliyuncs.com/kubesphereio/examples-bookinfo-reviews-v2:1.16.2
- registry.cn-beijing.aliyuncs.com/kubesphereio/examples-bookinfo-details-v1:1.16.2
- registry.cn-beijing.aliyuncs.com/kubesphereio/examples-bookinfo-ratings-v1:1.16.3
- registry.cn-beijing.aliyuncs.com/kubesphereio/scope:1.13.0
kubectl: v1.22.0
registry:
auths: {}

4.4 生成制品文件

制品(artifact)是一个根据指定的 manifest 文件内容导出的包含镜像 tar 包和相关二进制文件的 tgz 包。 在 KubeKey 初始化镜像仓库、创建集群、添加节点和升级集群的命令中均可指定一个 artifact,KubeKey 将自动解包该 artifact 并在执行命令时直接使用解包出来的文件。

注意

  • 导出时请确保网络连接正常(如果中断就需要全部重新来过)。
  • KubeKey 会解析镜像列表中的镜像名,若镜像名中的镜像仓库需要鉴权信息,可在 manifest 文件中的 .registry.auths 字段中进行配置(因为此处使用的是aliyuncs的公共容器镜像,所以无需鉴权)。
#国内一般服务器访问 GitHub/Googleapis 受限,强烈建议使用此环境变量
root@prepare:/opt/prepare# export KKZONE=cn
#根据上述制作的 manifest,在prepare服务器上执行下面的命令制作制品(artifact)
root@prepare:/opt/prepare# ./kk artifact export -m k8sV12317-kspV341-manifest.yaml -o k8sV12317-kspV341-artifact.tar.gz
#上述命令执行时,会在当前目录创建一个 kubekey 子目录

此命令需要较长时间,执行完成时如下图。

image-20250112124715488
# 制作过程中会创建 kubekey/artifact 目录,其中存放临时下载的文件与镜像;制品制作完成后 kubekey/artifact 目录会被清理

#制作完成后,会在当前目录生成一个 k8sV12317-kspV341-artifact.tar.gz 文件
root@prepare:/opt/prepare# ls -alh k8sV12317-kspV341-artifact.tar.gz
-rw-r--r-- 1 root root 13G Jan 12 12:41 k8sV12317-kspV341-artifact.tar.gz

4.5 离线部署资源文件汇总

至此,我们已经准备了如下两个离线部署文件:

  • KubeKey:kubekey-v3.0.13-linux-amd64.tar.gz(35M)
  • 制品artifact:k8sV12317-kspV341-artifact.tar.gz(13G)

五、离线部署执行前准备

5.1 上传离线部署资源文件

将 KubeKey 和制品 artifact ,上传至离线环境部署节点 (此处是 k8s01-1 节点,所有节点信息参考文档最开始处的描述) 的 /opt/ 目录。

  • KubeKey:kubekey-v3.0.13-linux-amd64.tar.gz(35M)
  • 制品artifact:k8sV12317-kspV341-artifact.tar.gz(13G)
# 创建离线资源存放的数据目录
root@k8s01-1:/# mkdir /opt/offline-deployk8sksp
#执行以下命令,解压 KubeKey:
root@k8s01-1:/# mv /opt/k8sV12317-kspV341-artifact.tar.gz /opt/offline-deployk8sksp
root@k8s01-1:/# tar -zxf /opt/kubekey-v3.0.13-linux-amd64.tar.gz -C /opt/offline-deployk8sksp
root@k8s01-1:/# cd /opt/offline-deployk8sksp
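
由于制品文件有10多G,拷贝到离线环境后建议先校验一下完整性(需要事先在 prepare 服务器上记录对应的 sha256 值,以下仅为示例):

#事先在 prepare 服务器上执行 sha256sum k8sV12317-kspV341-artifact.tar.gz 并记录结果
#拷贝完成后在部署节点上计算并比对
root@k8s01-1:/opt/offline-deployk8sksp# sha256sum k8sV12317-kspV341-artifact.tar.gz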

5.2 创建离线集群配置文件

#执行以下命令创建离线集群配置文件
root@k8s01-1:/opt/offline-deployk8sksp# ./kk create config --with-kubesphere v3.4.1 --with-kubernetes v1.23.17 -f k8sV12317-kspV341-offlineconfig.yaml

命令执行成功后,在同目录会生成一个 k8sV12317-kspV341-offlineconfig.yaml 文件。

5.3 修改 K8S 部署相关配置

k8sV12317-kspV341-offlineconfig.yaml 文件中 “kind: Cluster” 部分是部署 k8s 集群的相关配置,只需要修改 spec 内的内容。

root@k8s01-1:/opt/offline-deployk8sksp# vi k8sV12317-kspV341-offlineconfig.yaml
#修改离线集群配置文件 k8sV12317-kspV341-offlineconfig.yaml,修改说明及修改后的截图如下

spec 内需要修改处说明如下(其余未说明处保持默认):

  • hosts部分:指定所有k8s节点的主机名、IP、用户、密码;此外还新增一个 registry 节点的配置
  • roleGroups部分:指定 3 个节点同时作为 etcd、control-plane 和 worker 节点,同时还有一个服务器k8s01-4只做工作节点。另外 registry 部分指定镜像仓库的服务器主机名,用于 KubeKey 部署自建 Harbor 仓库
  • controlPlaneEndpoint部分: 启用并设置内置的负载均衡器(internalLoadbalancer)为 HAProxy
  • 新增了storage.openebs.basePath 部分:指定 openebs 默认存储路径(就是默认创建的sc/local的存储路径)为 /data/openebs/local
  • registry部分:添加 type 类型为 harbor,否则后面执行“init registry”时会安装 docker registry 作为镜像仓库;还涉及auths部分与privateRegistry、namespaceOverride参数(auths部分可参考:kubesphere关于部署配置文件的描述)
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
name: sample
spec:
hosts:
- {name: k8s01-1, address: 10.13.15.16, internalAddress: 10.13.15.16, user: root, password: "cloud@2020"}
- {name: k8s01-2, address: 10.13.15.17, internalAddress: 10.13.15.17, user: root, password: "cloud@2020"}
- {name: k8s01-3, address: 10.13.15.161, internalAddress: 10.13.15.161, user: root, password: "cloud@2020"}
- {name: k8s01-4, address: 10.13.15.99, internalAddress: 10.13.15.99, user: root, password: "cloud@2020"}
- {name: registry, address: 10.13.15.71, internalAddress: 10.13.15.71, user: root, password: "cloud@2020"}
roleGroups:
etcd:
- k8s01-1
- k8s01-2
- k8s01-3
control-plane:
- k8s01-1
- k8s01-2
- k8s01-3
worker:
- k8s01-1
- k8s01-2
- k8s01-3
- k8s01-4
registry: # 如需使用 kk 自动部署镜像仓库,请设置该主机组 (建议仓库与集群分离部署,减少相互影响)
- registry
controlPlaneEndpoint:
## Internal loadbalancer for apiservers
internalLoadbalancer: haproxy

domain: lb.kubesphere.local
address: ""
port: 6443
kubernetes:
version: v1.23.17
clusterName: cluster.local
autoRenewCerts: true
containerManager: docker
etcd:
type: kubekey
network:
plugin: calico
kubePodsCIDR: 10.233.64.0/18
kubeServiceCIDR: 10.233.0.0/18
## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
multusCNI:
enabled: false
storage:
openebs:
basePath: /data/openebs/local # 默认没有的新增配置,base path of the local PV provisioner
registry:
# 如需使用 kk 部署 harbor, 可将该参数设置为 harbor,不设置该参数且需使用 kk 创建容器镜像仓库,将默认使用docker registry。
type: harbor
# 如使用 kk 部署的 harbor 或其他需要登录的仓库,可设置对应仓库的auths,如使用 kk 创建的 docker registry 仓库,则无需配置该参数。
# 注意:如使用 kk 部署 harbor,该参数请于 harbor 启动后设置。
#auths:
# "registry.syjiang.com":
# username: admin
# password: Harbor12345
# certsPath: "/etc/docker/certs.d/registry.syjiang.com"
# skipTLSVerify: false # 即使 TLS校验失败了,仍允许通过HTTPS协议连接registry
# plainHTTP: false # 允许通过HTTP协议连接registry
# 设置集群部署时使用的私有仓库
privateRegistry: "registry.syjiang.com"
namespaceOverride: "kubesphereio"
registryMirrors: []
insecureRegistries: []
addons: []

5.4 修改 KSP 部署相关配置

k8sV12317-kspV341-offlineconfig.yaml 文件中 kind: ClusterConfiguration 部分是关于部署 KubeSphere 及相关插件的配置。

5.4.1 必须要新增的参数

  • 添加ClusterConfiguration.spec.namespace_override参数
82 spec:
83 namespace_override: "kubesphereio"

此参数的意义跟K8S部署配置中的spec.registry.namespaceOverride类似,要确认是配置成 kubesphereio,且要加双引号(笔者先前没加双引号时没生效,但kubesphere官方cnblogs博客文章中是没有加双引号的)。它的作用是让 KubeSphere 本身使用到的镜像只到 registry.syjiang.com/kubesphereio 下面去拉取;如果不配置,有些镜像会到另外的项目下拉取,比如kubesphere-system空间中pod/ks-apiserver-xxx用到的镜像就会是registry.syjiang.com/kubesphere/ks-apiserver:v3.4.1。

据 KubeSphere 的官方博文介绍,2.x 版本的 KubeKey 没有这个问题,3.x 版本直到 v3.1.1 为止都存在这个问题。
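
部署完成后,如果想验证 KubeSphere 相关 Pod 的镜像确实都来自 kubesphereio 项目,可以参考如下命令自行核对(仅为示意):

#列出 kubesphere-system 空间中各 pod 使用的镜像,确认其前缀均为 registry.syjiang.com/kubesphereio/
root@k8s01-1:~# kubectl -n kubesphere-system get pods -o jsonpath="{range .items[*]}{.metadata.name}{'\t'}{.spec.containers[*].image}{'\n'}{end}"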

5.4.2 可修改的参数(可选)

前面准备的制品文件之所以有10多G,就是因为其中包含了相关可选组件的全量镜像。如果不想启用可选插件,这部分直接保持默认配置也可以。

如果有需要或想学习一下,可以启用相关插件配置。以下是启用了除 Kubeedge 、gatekeeper 以外的所有插件后的配置(很多内容笔者并没有用过,只是参考了kubesphere在cnblogs上的博文 一文搞定 KubeKey 3.1.1 离线部署 KubeSphere 3.4.1 和 Kubernetes v1.28)。

  • 启用 etcd 监控
90   etcd:
91 monitoring: true #将false改为true
92 endpointIps: localhost
93 port: 2379
94 tlsEnable: true
  • 启用 KubeSphere 告警系统
162   alerting:
163 enabled: true #将false改为true
164 # thanosruler:
165 # replicas: 1
166 # resources: {}
  • 启用 KubeSphere 审计日志
167   auditing:
168 enabled: true #将false改为true
169 # operator:
170 # resources: {}
171 # webhook:
172 # resources: {}
  • 启用 KubeSphere DevOps 系统
173   devops:
174 enabled: true #将false改为true
175 jenkinsCpuReq: 0.5
176 jenkinsCpuLim: 1
177 jenkinsMemoryReq: 4Gi
178 jenkinsMemoryLim: 4Gi
179 jenkinsVolumeSize: 16Gi
  • 启用 KubeSphere 事件系统
180   events:
181 enabled: true #将false改为true
182 # operator:
183 # resources: {}
184 # exporter:
185 # resources: {}
186 ruler:
187 enabled: true
188 replicas: 2
189 # resources: {}
  • 启用 KubeSphere 日志系统(v3.4.0 开始默认启用 opensearch)
190   logging:
191 enabled: true #将false改为true
192 logsidecar:
193 enabled: true
194 replicas: 2
195 # resources: {}
  • 启用 Metrics Server
196   metrics_server:
197 enabled: true #将false改为true

说明:KubeSphere 支持用于 Deployment 的容器组(Pod)水平弹性伸缩(HPA)。在 KubeSphere 中,Metrics Server 控制着 HPA 是否启用。

  • 启用网络策略、容器组 IP 池,服务拓扑图
228   network:
229 networkpolicy:
230 enabled: true #将false改为true
231 ippool:
232 type: calico #将none改为 calico
233 topology:
234 type: weave-scope #将none 改为 weave-scope
  • 启用应用商店
235   openpitrix:
236 store:
237 enabled: true #将false改为true
  • 启用 KubeSphere 服务网格(Istio)
238   servicemesh:
239 enabled: true #将false改为true
240 istio:
241 components:
242 ingressGateways:
243 - name: istio-ingressgateway
244 enabled: false
245 cni:
246 enabled: false

此时,修改后的 k8sV12317-kspV341-offlineconfig.yaml 文件完整内容如下:


apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
name: sample
spec:
hosts:
- {name: k8s01-1, address: 10.13.15.16, internalAddress: 10.13.15.16, user: root, password: "cloud@2020"}
- {name: k8s01-2, address: 10.13.15.17, internalAddress: 10.13.15.17, user: root, password: "cloud@2020"}
- {name: k8s01-3, address: 10.13.15.161, internalAddress: 10.13.15.161, user: root, password: "cloud@2020"}
- {name: k8s01-4, address: 10.13.15.99, internalAddress: 10.13.15.99, user: root, password: "cloud@2020"}
- {name: registry, address: 10.13.15.71, internalAddress: 10.13.15.71, user: root, password: "cloud@2020"}
roleGroups:
etcd:
- k8s01-1
- k8s01-2
- k8s01-3
control-plane:
- k8s01-1
- k8s01-2
- k8s01-3
worker:
- k8s01-1
- k8s01-2
- k8s01-3
- k8s01-4
registry: # 如需使用 kk 自动部署镜像仓库,请设置该主机组 (建议仓库与集群分离部署,减少相互影响)
- registry
controlPlaneEndpoint:
## Internal loadbalancer for apiservers
internalLoadbalancer: haproxy

domain: lb.kubesphere.local
address: ""
port: 6443
kubernetes:
version: v1.23.17
clusterName: cluster.local
autoRenewCerts: true
containerManager: docker
etcd:
type: kubekey
network:
plugin: calico
kubePodsCIDR: 10.233.64.0/18
kubeServiceCIDR: 10.233.0.0/18
## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
multusCNI:
enabled: false
storage:
openebs:
basePath: /data/openebs/local # 默认没有的新增配置,base path of the local PV provisioner
registry:
# 如需使用 kk 部署 harbor, 可将该参数设置为 harbor,不设置该参数且需使用 kk 创建容器镜像仓库,将默认使用docker registry。
type: harbor
# 如使用 kk 部署的 harbor 或其他需要登录的仓库,可设置对应仓库的auths,如使用 kk 创建的 docker registry 仓库,则无需配置该参数。
# 注意:如使用 kk 部署 harbor,该参数请于 harbor 启动后设置。
#auths:
# "registry.syjiang.com":
# username: admin
# password: Harbor12345
# certsPath: "/etc/docker/certs.d/registry.syjiang.com"
# skipTLSVerify: false # 即使 TLS校验失败了,仍允许通过HTTPS协议连接registry
# plainHTTP: false # 允许通过HTTP协议连接registry
# 设置集群部署时使用的私有仓库
privateRegistry: "registry.syjiang.com"
namespaceOverride: "kubesphereio"
registryMirrors: []
insecureRegistries: []
addons: []



---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
name: ks-installer
namespace: kubesphere-system
labels:
version: v3.4.1
spec:
namespace_override: "kubesphereio"
persistence:
storageClass: ""
authentication:
jwtSecret: ""
local_registry: ""
# dev_tag: ""
etcd:
monitoring: true #将false改为true
endpointIps: localhost
port: 2379
tlsEnable: true
common:
core:
console:
enableMultiLogin: true
port: 30880
type: NodePort
# apiserver:
# resources: {}
# controllerManager:
# resources: {}
redis:
enabled: false
enableHA: false
volumeSize: 2Gi
openldap:
enabled: false
volumeSize: 2Gi
minio:
volumeSize: 20Gi
monitoring:
# type: external
endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
GPUMonitoring:
enabled: false
gpu:
kinds:
- resourceName: "nvidia.com/gpu"
resourceType: "GPU"
default: true
es:
# master:
# volumeSize: 4Gi
# replicas: 1
# resources: {}
# data:
# volumeSize: 20Gi
# replicas: 1
# resources: {}
enabled: false
logMaxAge: 7
elkPrefix: logstash
basicAuth:
enabled: false
username: ""
password: ""
externalElasticsearchHost: ""
externalElasticsearchPort: ""
opensearch:
# master:
# volumeSize: 4Gi
# replicas: 1
# resources: {}
# data:
# volumeSize: 20Gi
# replicas: 1
# resources: {}
enabled: true
logMaxAge: 7
opensearchPrefix: whizard
basicAuth:
enabled: true
username: "admin"
password: "admin"
externalOpensearchHost: ""
externalOpensearchPort: ""
dashboard:
enabled: false
alerting:
enabled: true #将false改为true
# thanosruler:
# replicas: 1
# resources: {}
auditing:
enabled: true #将false改为true
# operator:
# resources: {}
# webhook:
# resources: {}
devops:
enabled: true #将false改为true
jenkinsCpuReq: 0.5
jenkinsCpuLim: 1
jenkinsMemoryReq: 4Gi
jenkinsMemoryLim: 4Gi
jenkinsVolumeSize: 16Gi
events:
enabled: true #将false改为true
# operator:
# resources: {}
# exporter:
# resources: {}
ruler:
enabled: true
replicas: 2
# resources: {}
logging:
enabled: true #将false改为true
logsidecar:
enabled: true
replicas: 2
# resources: {}
metrics_server:
enabled: true #将false改为true
monitoring:
storageClass: ""
node_exporter:
port: 9100
# resources: {}
# kube_rbac_proxy:
# resources: {}
# kube_state_metrics:
# resources: {}
# prometheus:
# replicas: 1
# volumeSize: 20Gi
# resources: {}
# operator:
# resources: {}
# alertmanager:
# replicas: 1
# resources: {}
# notification_manager:
# resources: {}
# operator:
# resources: {}
# proxy:
# resources: {}
gpu:
nvidia_dcgm_exporter:
enabled: false
# resources: {}
multicluster:
clusterRole: none
network:
networkpolicy:
enabled: true #将false改为true
ippool:
type: calico #将none改为 calico
topology:
type: weave-scope #将none 改为 weave-scope
openpitrix:
store:
enabled: true #将false改为true
servicemesh:
enabled: true #将false改为true
istio:
components:
ingressGateways:
- name: istio-ingressgateway
enabled: false
cni:
enabled: false
edgeruntime:
enabled: false
kubeedge:
enabled: false
cloudCore:
cloudHub:
advertiseAddress:
- ""
service:
cloudhubNodePort: "30000"
cloudhubQuicNodePort: "30001"
cloudhubHttpsNodePort: "30002"
cloudstreamNodePort: "30003"
tunnelNodePort: "30004"
# resources: {}
# hostNetWork: false
iptables-manager:
enabled: true
mode: "external"
# resources: {}
# edgeService:
# resources: {}
gatekeeper:
enabled: false
# controller_manager:
# resources: {}
# audit:
# resources: {}
terminal:
timeout: 600

5.5 创建配置数据目录

#集群所有节点都操作
#创建 openebs 本地数据根目录
root@k8s01-1:/# mkdir -p /data/openebs/local

#创建 docker 数据目录
root@k8s01-1:/# mkdir -p /data/docker

#创建 docker数据目录软连接
root@k8s01-1:/# ln -s /data/docker /var/lib/docker
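
除了提前创建软连接,也可以在 kk 安装完 docker 之后,通过 /etc/docker/daemon.json 的 data-root 参数指定数据目录(两种方式二选一即可,以下仅为示意):

#docker 安装完成后,在 /etc/docker/daemon.json 中添加或合并如下内容
{
  "data-root": "/data/docker"
}
#然后重启 docker 使其生效
systemctl restart docker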

5.6 安装依赖组件

使用Kubekey在线方式部署k8s+ksp时,需要在执行“./kk create cluster xxx”之前自行安装操作系统依赖组件,否则部署过程中校验到其未安装就会报错并中断部署。但离线部署时,通过给kk传递“-a k8sV12317-kspV341-artifact.tar.gz”参数可以自动完成这些依赖组件的安装。

所以此处无需手动操作。

5.7 安装配置 Harbor

采用 KubeKey 工具在 registry 服务器上部署 Harbor、创建项目、向其中推送镜像。

请注意,以下操作涉及到两个服务器(k8s01-1是部署节点,registry是Harbor服务所在服务器)且可能会来回切换,请根据命令提示符中的主机名区分具体是在哪个服务器上执行相关命令。

5.7.1 安装 Harbor

#在部署节点执行如下命令,安装镜像仓库(本文中是 Harbor)
root@k8s01-1:/opt/offline-deployk8sksp# ./kk init registry -f k8sV12317-kspV341-offlineconfig.yaml -a k8sV12317-kspV341-artifact.tar.gz

此处一定不要将k8s01-1节点同时作为Harbor服务所在服务器(其他k8s节点笔者没有尝试,不确定是否会报类似错误,可能需要查看kk源码才能弄清具体逻辑),否则将提示找不到“/tmp/kubekey/etc/docker/certs.d/registry.syjiang.com/ca.crt”而中断执行:

image-20250112161653908

可以看到在安装harbor过程中会做如下事情:

  • 为所有节点配置ntp服务器(这个ntp服务器在哪里,还不确定)
  • 推送私有证书到registry服务器
  • 在registry服务器上安装docker、安装docker-compose、启动harbor
image-20250112162718819
#在registry服务器查看(harbor的安装文件与配置文件默认被放置在 /opt/harbor 目录)
root@registry:~# ls -lh /opt/harbor/
total 633M
-rw-r--r-- 1 root root 12K Jul 7 2022 LICENSE
drwxr-xr-x 3 root root 4.0K Jan 12 16:26 common
-rw-r--r-- 1 root root 3.3K Jul 7 2022 common.sh
-rw-r--r-- 1 root root 9.3K Jan 12 16:26 docker-compose.yml
-rw-r--r-- 1 root root 633M Jul 7 2022 harbor.v2.5.3.tar.gz
-rw-r--r-- 1 root root 3.4K Jan 12 16:25 harbor.yml
-rw-r--r-- 1 root root 9.7K Jul 7 2022 harbor.yml.tmpl
-rwxr-xr-x 1 root root 2.5K Jul 7 2022 install.sh
-rwxr-xr-x 1 root root 1.9K Jul 7 2022 prepare

#查看容器列表
root@registry:~# cd /opt/harbor/
root@registry:/opt/harbor# docker-compose ps -a
image-20250112163452340
#查看 /opt/harbor/harbor.yml 配置的域名
root@registry:/opt/harbor# cat /opt/harbor/harbor.yml | grep hostname:
hostname: registry.syjiang.com

#查看 Docker 是否配置了私有证书
root@registry:/opt/harbor# ll /etc/docker/certs.d
total 12
drwxr-xr-x 3 root root 4096 Jan 12 16:24 ./
drwxr-xr-x 3 root root 4096 Jan 12 16:24 ../
drwxrwxr-x 2 root root 4096 Jan 12 16:24 registry.syjiang.com/
root@registry:/opt/harbor# ll /etc/docker/certs.d/registry.syjiang.com/
total 20
drwxrwxr-x 2 root root 4096 Jan 12 16:24 ./
drwxr-xr-x 3 root root 4096 Jan 12 16:24 ../
-rw-r--r-- 1 root root 1103 Jan 12 16:24 ca.crt
-rw-r--r-- 1 root root 1249 Jan 12 16:24 registry.syjiang.com.cert
-rw------- 1 root root 1675 Jan 12 16:24 registry.syjiang.com.key
#KubeKey 部署 Harbor 时,在registry服务器上安装docker、docker-compose、同步私有证书
#k8s集群所有节点此时还不会安装docker,但会创建 /etc/docker/certs.d 目录并同步私有证书
root@k8s01-1:/opt/offline-deployk8sksp# ll /etc/docker/certs.d
total 12
drwxr-xr-x 3 root root 4096 Jan 12 16:24 ./
drwxr-xr-x 3 root root 4096 Jan 12 16:24 ../
drwxrwxr-x 2 root root 4096 Jan 12 16:24 registry.syjiang.com/
root@k8s01-1:/opt/offline-deployk8sksp# ll /etc/docker/certs.d/registry.syjiang.com/
total 20
drwxrwxr-x 2 root root 4096 Jan 12 16:24 ./
drwxr-xr-x 3 root root 4096 Jan 12 16:24 ../
-rw-r--r-- 1 root root 1103 Jan 12 16:24 ca.crt
-rw-r--r-- 1 root root 1249 Jan 12 16:24 registry.syjiang.com.cert
-rw------- 1 root root 1675 Jan 12 16:24 registry.syjiang.com.key

5.7.2 在 Harbor 中创建项目

Harbor 项目存在访问控制(RBAC)限制,只有具备指定角色的用户才能执行某些操作。如果未创建项目,镜像将不能被推送到 Harbor。

Harbor中有两种类型的项目:

  • 公共项目(Public):任何用户都可以从这个项目中拉取镜像。
  • 私有项目(Private):只有作为项目成员的用户可以拉取镜像。

使用 KubeKey 作为工具来部署的 Harbor 镜像仓库的一些默认信息如下:

  • Harbor 管理员账号:admin,密码:Harbor12345
  • Harbor 安装与配置文件在registry服务器的 /opt/harbor 目录, 如需运维 Harbor,可至该目录下。

目前我们只是在registry服务器上部署了Harbor镜像仓库,此时只有一个默认的library项目,其余内容是空的。还需要进一步进行配置以作为离线部署k8s+ksp环境时使用的镜像仓库。

#执行以下命令下载指定脚本,后面用来初始化 Harbor 仓库(但使用时发现其中少了一个项目 kubesphereio )
root@registry:/opt/harbor# wget https://raw.githubusercontent.com/kubesphere/ks-installer/master/scripts/create_project_harbor.sh

#于是结合 https://www.kubesphere.io/zh/docs/v3.4/installing-on-linux/introduction/air-gapped-installation/#%E7%A6%BB%E7%BA%BF%E5%AE%89%E8%A3%85%E9%9B%86%E7%BE%A4 此文档中给出的create_project_harbor.sh 示例,将两者进行合并,修改得到如下 create_project_harbor.sh 文件

修改点包含如下:

  • 修改 url 的值为 https://registry.syjiang.com
  • 需要指定仓库项目名称列表,包含镜像列表(manifest文件的镜像列表)中的项目名称。
  • 脚本末尾 curl 命令加上 -k,表示忽略证书校验,允许连接使用自签名证书的 SSL 站点。
#合并与修改后,最终得到的 create_project_harbor.sh 文件内容如下
root@registry:/opt/harbor# cat create_project_harbor.sh
#!/usr/bin/env bash

url="https://registry.syjiang.com"
user="admin"
passwd="Harbor12345"

#此处其实有点奇怪:后面是将所有镜像只推送到 kubesphereio 这一个项目下,但此处又列出了26个项目,不知道青云的kk工具是不是有其他设计还没有实现
harbor_projects=(
library
kubesphereio
kubesphere
argoproj
calico
coredns
openebs
csiplugin
minio
mirrorgooglecontainers
osixia
prom
thanosio
jimmidyson
grafana
elastic
istio
jaegertracing
jenkins
weaveworks
openpitrix
joosthofman
nginxdemos
fluent
kubeedge
openpolicyagent
)

for project in "${harbor_projects[@]}"; do
echo "creating $project"
curl -u "${user}:${passwd}" -X POST -H "Content-Type: application/json" "${url}/api/v2.0/projects" -d "{ \"project_name\": \"${project}\", \"public\": true}" -k
done
#执行create_project_harbor.sh创建项目
root@registry:/opt/harbor# bash create_project_harbor.sh
#提示“The project named library already exists”可以忽略,因为使用kk部署的Harbor中会创建一个默认的项目library
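
脚本执行完成后,可以直接调用 Harbor API 确认各项目是否创建成功(-k 表示忽略自签名证书校验,仅为示例):

root@registry:/opt/harbor# curl -s -u admin:Harbor12345 -k "https://registry.syjiang.com/api/v2.0/projects?page_size=30" | grep -o '"name": *"[^"]*"'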

5.7.3 再次修改集群配置文件

正如官方离线部署文档中所述,使用kk部署harbor前需要先将auths部分注释掉,部署后再启用并配置好正确的值。其余内容不变。

#修改k8sV12317-kspV341-offlineconfig.yaml 文件
root@k8s01-1:/opt/offline-deployk8sksp# vi k8sV12317-kspV341-offlineconfig.yaml
...
registry:
# 如需使用 kk 部署 harbor, 可将该参数设置为 harbor,不设置该参数且需使用 kk 创建容器镜像仓库,将默认使用docker registry。
type: harbor
# 如使用 kk 部署的 harbor 或其他需要登录的仓库,可设置对应仓库的auths,如使用 kk 创建的 docker registry 仓库,则无需配置该参数。
# 注意:如使用 kk 部署 harbor,该参数请于 harbor 启动后设置。
auths:
"registry.syjiang.com":
username: admin
password: Harbor12345
certsPath: "/etc/docker/certs.d/registry.syjiang.com"
skipTLSVerify: false # 设为 true 时,即使 TLS 校验失败仍允许通过HTTPS协议连接registry;此处保持 false
plainHTTP: false # 设为 true 时,允许通过HTTP协议连接registry;此处保持 false
# 设置集群部署时使用的私有仓库
privateRegistry: "registry.syjiang.com"
namespaceOverride: "kubesphereio"
registryMirrors: []
insecureRegistries: []
...

相关参数的解释如下:

  • 新增 auths 配置增加 registry.syjiang.com 和账号密码。
  • certsPath 配置docker连接 harbor 时私有证书的存放目录。
  • skipTLSVerify 保持 false:该参数设为 true 时表示即使 TLS 校验失败也允许通过HTTPS协议连接registry,此处不跳过校验。
  • plainHTTP 保持 false:该参数设为 true 时表示允许通过HTTP协议连接registry,此处不启用。
  • privateRegistry 确认是配置成 registry.syjiang.com
  • namespaceOverride 确认是配置成 kubesphereio。它的作用是让 K8S 本身使用到的镜像只到 registry.syjiang.com/kubesphereio 下面去拉取;如果不配置,有些镜像会到另外的项目下拉取,比如kube-system空间中pod/snapshot-controller-0用到的镜像就会是registry.syjiang.com/csiplugin/snapshot-controller:v4.0.0。

5.7.4 提前推送镜像到 Harbor 仓库

#执行如下命令推送镜像到 Harbor 仓库(全部会推送到kubesphereio 项目下)
root@k8s01-1:/opt/offline-deployk8sksp# ./kk artifact image push -f k8sV12317-kspV341-offlineconfig.yaml -a k8sV12317-kspV341-artifact.tar.gz

5.7.5 查看Harbor镜像仓库

访问harbor镜像仓库的web管理界面,url是 https://10.13.15.71 ,默认的用户名与密码分别是 admin、Harbor12345。

可以看到如下26个项目,所有镜像都在 kubesphereio 项目下,一共有121个不同名称镜像(标签不同另算)
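
除了 web 界面,也可以用 Harbor API 粗略确认 kubesphereio 项目下的镜像仓库数量(响应头 X-Total-Count 即总数,仅为示例):

root@registry:/opt/harbor# curl -s -D - -o /dev/null -u admin:Harbor12345 -k "https://registry.syjiang.com/api/v2.0/projects/kubesphereio/repositories?page_size=1" | grep -i x-total-count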

image-20250112203033095

六、安装 KubeSphere 和 K8s 集群

6.1 安装 K8S 与 KSP

#如果重复部署,/data/openebs/local 目录非空,请将此目录下的所有文件清空,否则安装可能失败、提示如下
#TASK [common : debug] **********************************************************
#ok: [localhost] => {
# "msg": [
# "1. check the storage configuration and storage server",
# "2. make sure the DNS address in /etc/resolv.conf is available",
# "3. execute 'kubectl logs -n kubesphere-system -l job-name=minio-make-bucket-job' to watch logs",
# "4. execute 'helm -n kubesphere-system uninstall ks-minio && kubectl -n kubesphere-system delete job minio-make-bucket-job'",
# "5. Restart the installer pod in kubesphere-system namespace"
# ]
#}

#执行以下命令,安装 KubeSphere 和 K8s 集群
root@k8s01-1:/opt/offline-deployk8sksp# ./kk create cluster -f k8sV12317-kspV341-offlineconfig.yaml -a k8sV12317-kspV341-artifact.tar.gz --with-packages --skip-push-images

#另外打开一个终端,执行如下命令查看集群部署状态:
root@k8s01-1:~# kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f

参数说明:

  • k8sV12317-kspV341-offlineconfig.yaml:离线环境集群的配置文件
  • k8sV12317-kspV341-artifact.tar.gz:依据 manifest 导出的制品 tar 包(包含镜像与相关二进制文件)
  • --with-packages:表示安装操作系统依赖组件
  • --skip-push-images: 表示跳过推送镜像的操作(因为前文已经操作过了)

安装成功完成后,会输出如下信息:

image-20250113095155884

6.2 配置时间同步服务

集群所有节点都要操作。在正式部署k8s+ksp集群过程中,会在集群所有节点上安装chrony服务并启动与开机自启动,但使用的默认时间同步服务器需要联网才能访问,离线环境肯定访问不了这些时间同步服务器。故需要使用自己本地的时间同步服务器。

#1.查看节点chrony服务状态
root@k8s01-1:~# systemctl status chrony

此时必须要有自己本地部署的 NTP时间同步服务器(可以是k8s集群中任意节点,也可以是其他服务器),需要自行搭建。

#2.修改配置文件 /etc/chrony/chrony.conf,修改 ntp 服务配置
vi /etc/chrony/chrony.conf
# 将如下默认的 pool 配置删除(离线环境中也用不上)
pool ntp.ubuntu.com iburst maxsources 4
pool 0.ubuntu.pool.ntp.org iburst maxsources 1
pool 1.ubuntu.pool.ntp.org iburst maxsources 1
pool 2.ubuntu.pool.ntp.org iburst maxsources 2
# 然后配置自己本地的NTP服务器(最好是一台可以访问外网的服务器,这样它可以跟世界范围内的NTP服务器同步时间),NTP服务器步骤此文未涉及
# 假如10.13.15.20是本地NTP服务器,则添加如下配置(如果没有自己本地的时间同步服务器,最后所有节点是没有做严格时间同步的)
server 10.13.15.20 iburst
#3.重启 chrony 服务:
systemctl restart chrony

#4.验证 chrony 同步状态:
chronyc sourcestats -v

6.3 问题

6.3.1 Task 'monitoring' 执行失败

部署过程中此任务执行失败,但不影响部署主线过程成功完成,所以仍提示部署成功完成。只需要在部署完成后再处理下此任务相关资源对象即可

image-20250113095337670
#具体报错文字形式描述如下  
"stdout": "fatal: [localhost]: FAILED! => {\"attempts\": 3, \"changed\": true, \"cmd\": \"/usr/local/bin/helm upgrade --install notification-manager /kubesphere/kubesphere/notification-manager -f /kubesphere/kubesphere/notification-manager/custom-values-notification.yaml -n kubesphere-monitoring-system --force\\n\", \"delta\": \"0:05:06.023787\", \"end\": \"2025-01-12 23:08:54.785785\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2025-01-12 23:03:48.761998\", \"stderr\": \"Error: UPGRADE FAILED: post-upgrade hooks failed: timed out waiting for the condition\", \"stderr_lines\": [\"Error: UPGRADE FAILED: post-upgrade hooks failed: timed out waiting for the condition\"], \"stdout\": \"\", \"stdout_lines\": []}",
"uuid": "4af3661f-0690-49dd-befe-2b1bdf6d1974"

解决办法:

#当前现状
root@k8s01-1:~# kubectl -n kubesphere-logging-system get pods
...
opensearch-cluster-data-0 0/1 Init:ImagePullBackOff 0 11h
opensearch-cluster-data-1 0/1 Init:ImagePullBackOff 0 11h
opensearch-cluster-master-0 0/1 Init:ImagePullBackOff 0 11h
opensearch-logging-curator-opensearch-curator-28945020-725qp 0/1 Error 0 8h
opensearch-logging-curator-opensearch-curator-28945020-9rprk 0/1 Error 0 8h
opensearch-logging-curator-opensearch-curator-28945020-h2rpk 0/1 Error 0 8h
opensearch-logging-curator-opensearch-curator-28945020-kpm45 0/1 Error 0 8h
opensearch-logging-curator-opensearch-curator-28945020-p2dl9 0/1 Error 0 8h
opensearch-logging-curator-opensearch-curator-28945020-q82qn 0/1 Error 0 8h
opensearch-logging-curator-opensearch-curator-28945020-xkczm 0/1 Error 0 8h

root@k8s01-1:~# kubectl -n kubesphere-logging-system describe pod opensearch-cluster-master-0
#发现此pod中尝试拉取与使用镜像 busybox:latest ,竟然是尝试去docker.io 中拉取镜像
#修改此pod/opensearch-cluster-master-0中使用的镜像为 registry.syjiang.com/kubesphereio/busybox:1.31.1
root@k8s01-1:~# kubectl -n kubesphere-logging-system edit pod opensearch-cluster-master-0
#然后使用同样的方法修改 pod/opensearch-cluster-data-x 中使用的busybox:latest镜像为 registry.syjiang.com/kubesphereio/busybox:1.31.1
root@k8s01-1:~# kubectl -n kubesphere-logging-system edit pod opensearch-cluster-data-xxx

#然后可以看到pod/opensearch-cluster-master与pod/opensearch-cluster-data 都变成正常运行状态
root@k8s01-1:~# kubectl -n kubesphere-logging-system get pods
NAME READY STATUS RESTARTS AGE
...
opensearch-cluster-data-0 1/1 Running 0 11h
opensearch-cluster-data-1 1/1 Running 0 11h
opensearch-cluster-master-0 1/1 Running 0 11h
opensearch-logging-curator-opensearch-curator-28945020-725qp 0/1 Error 0 8h
opensearch-logging-curator-opensearch-curator-28945020-9rprk 0/1 Error 0 8h
opensearch-logging-curator-opensearch-curator-28945020-h2rpk 0/1 Error 0 9h
opensearch-logging-curator-opensearch-curator-28945020-kpm45 0/1 Error 0 9h
opensearch-logging-curator-opensearch-curator-28945020-p2dl9 0/1 Error 0 9h
opensearch-logging-curator-opensearch-curator-28945020-q82qn 0/1 Error 0 8h
opensearch-logging-curator-opensearch-curator-28945020-xkczm 0/1 Error 0 9h
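#(补充)直接 kubectl edit pod 属于临时修复,pod 一旦被重建仍会用回 busybox:latest;
#更稳妥的做法是同时修改其属主 StatefulSet 中引用 busybox 镜像的位置(StatefulSet 名称请以实际环境为准,以下仅为思路示意)
root@k8s01-1:~# kubectl -n kubesphere-logging-system get statefulset
root@k8s01-1:~# kubectl -n kubesphere-logging-system edit statefulset opensearch-cluster-master
root@k8s01-1:~# kubectl -n kubesphere-logging-system edit statefulset opensearch-cluster-data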

######################################################################
##附带问题1:pod/opensearch-logging-curator-opensearch-curator 仍然是 Error 状态
######################################################################
image-20250113100910415
#查看其中一个pod/opensearch-logging-curator-opensearch-curator-28945020-725qp
root@k8s01-1:~# kubectl -n kubesphere-logging-system describe pod opensearch-logging-curator-opensearch-curator-28945020-725qp
...
Controlled By: Job/opensearch-logging-curator-opensearch-curator-28945020
...
#过去了8个小时,此时事件已经为None
...

#发现上述pod/opensearch-logging-curator-opensearch-curator-28945020-725qp由Job/opensearch-logging-curator-opensearch-curator-28945020 控制
#于是查看 Job/opensearch-logging-curator-opensearch-curator-28945020
root@k8s01-1:~# kubectl -n kubesphere-logging-system describe job opensearch-logging-curator-opensearch-curator-28945020
Name: opensearch-logging-curator-opensearch-curator-28945020
Namespace: kubesphere-logging-system
Selector: controller-uid=d759cadd-c7d9-46aa-9e1a-a2dada8a8597
Labels: app=opensearch-curator
release=opensearch-logging-curator
Annotations: revisions:
{"1":{"status":"failed","reasons":["BackoffLimitExceeded"],"messages":["Job has reached the specified backoff limit"],"desire":1,"failed":...
Controlled By: CronJob/opensearch-logging-curator-opensearch-curator
...

#查看CronJob/opensearch-logging-curator-opensearch-curator
root@k8s01-1:~# kubectl -n kubesphere-logging-system describe cronjobs.batch opensearch-logging-curator-opensearch-curator
Name: opensearch-logging-curator-opensearch-curator
Namespace: kubesphere-logging-system
Labels: app=opensearch-curator
app.kubernetes.io/managed-by=Helm
chart=opensearch-curator-0.0.5
heritage=Helm
release=opensearch-logging-curator
Annotations: meta.helm.sh/release-name: opensearch-logging-curator
meta.helm.sh/release-namespace: kubesphere-logging-system
Schedule: 0 1 * * *

#由上可知,上述失败的7个pod由Job/opensearch-logging-curator-opensearch-curator-28945020 控制,而该Job由 CronJob/opensearch-logging-curator-opensearch-curator 控制,后者按计划“0 1 * * *”(即每天凌晨1点)执行一次
#所以暂时可以不用理会上述报错的7个pod,如果想迅速看到成功的记录,可以去 kubesphere的web管理界面重新执行上述job
image-20250113104345346
#可以看到新创建的 pod/opensearch-logging-curator-opensearch-curator-28945020-7r6ql 是Completed状态,表示正常完成
root@k8s01-1:~# kubectl -n kubesphere-logging-system get pods
NAME READY STATUS RESTARTS AGE
...
opensearch-logging-curator-opensearch-curator-28945020-725qp 0/1 Error 0 9h
opensearch-logging-curator-opensearch-curator-28945020-7r6ql 0/1 Completed 0 24s
opensearch-logging-curator-opensearch-curator-28945020-9rprk 0/1 Error 0 9h
opensearch-logging-curator-opensearch-curator-28945020-h2rpk 0/1 Error 0 9h
opensearch-logging-curator-opensearch-curator-28945020-kpm45 0/1 Error 0 9h
opensearch-logging-curator-opensearch-curator-28945020-p2dl9 0/1 Error 0 9h
opensearch-logging-curator-opensearch-curator-28945020-q82qn 0/1 Error 0 9h
opensearch-logging-curator-opensearch-curator-28945020-xkczm 0/1 Error 0 9h

#查看其日志
root@k8s01-1:~# kubectl -n kubesphere-logging-system logs opensearch-logging-curator-opensearch-curator-28945020-7r6ql
#其他7个处于Error 状态的pod/opensearch-logging-curator-opensearch-curator-28945020-xxx,是job在失败后不断重建新pod、直到达到重试上限而产生的(spec.backoffLimit默认为6,即最多重试6次、共7个pod);全部失败后,该job被判定为失败

#此时可以删除上述处于Error 状态的pod/opensearch-logging-curator-opensearch-curator-28945020-xxx,也可以不理会
root@k8s01-1:~# kubectl -n kubesphere-logging-system get pod | grep opensearch-logging-curator-opensearch-curator-28945020 | grep -v "opensearch-logging-curator-opensearch-curator-28945020-7r6ql" | awk '{print $1}' | xargs kubectl -n kubesphere-logging-system delete pod
######################################################################
##附带问题2:istio-system空间中pod/jaeger-es-index-cleaner-xxx 仍然是 Error 状态
######################################################################
#
root@k8s01-1:~# kubectl -n istio-system get pods
NAME READY STATUS RESTARTS AGE
istiod-1-14-6-775dfd88cd-jhc69 1/1 Running 0 12h
jaeger-collector-b4555bcd4-s9b9w 1/1 Running 139 (60m ago) 12h
jaeger-es-index-cleaner-28944955-52ggg 0/1 Error 0 11h
jaeger-es-index-cleaner-28944955-5fdln 0/1 Error 0 11h
jaeger-es-index-cleaner-28944955-6crz4 0/1 Error 0 10h
jaeger-es-index-cleaner-28944955-7hkwt 0/1 Error 0 11h
jaeger-es-index-cleaner-28944955-brdhh 0/1 Error 0 10h
jaeger-es-index-cleaner-28944955-dmrtx 0/1 Error 0 10h
jaeger-es-index-cleaner-28944955-mzx44 0/1 Error 0 11h
...

#通过查看pod/jaeger-es-index-cleaner-28944955-52ggg 日志,发现其也是因为不能正常使用 ns/kubesphere-logging-system 下opensearch-cluster-data 相关svc与pod提供的服务而造成相关job执行失败
#所以同样也可以去kubesphere的web界面重新执行 job/jaeger-es-index-cleaner-28944955 即可
#然后将创建一个新的pod/jaeger-es-index-cleaner-28944955-tznkj 并是处于Completed 状态
root@k8s01-1:~# kubectl -n istio-system get pods
NAME READY STATUS RESTARTS AGE
istiod-1-14-6-775dfd88cd-jhc69 1/1 Running 0 12h
jaeger-collector-b4555bcd4-s9b9w 1/1 Running 139 (62m ago) 12h
jaeger-es-index-cleaner-28944955-52ggg 0/1 Error 0 11h
jaeger-es-index-cleaner-28944955-5fdln 0/1 Error 0 11h
jaeger-es-index-cleaner-28944955-6crz4 0/1 Error 0 11h
jaeger-es-index-cleaner-28944955-7hkwt 0/1 Error 0 11h
jaeger-es-index-cleaner-28944955-brdhh 0/1 Error 0 10h
jaeger-es-index-cleaner-28944955-dmrtx 0/1 Error 0 10h
jaeger-es-index-cleaner-28944955-mzx44 0/1 Error 0 11h
jaeger-es-index-cleaner-28944955-tznkj 0/1 Completed 0 9s
jaeger-operator-846bc67879-hwc7r 1/1 Running 0 12h
jaeger-query-579b5c5f7c-hhwk8 2/2 Running 137 (59m ago) 12h
kiali-6cd665f559-2g8mx 1/1 Running 0 12h
kiali-operator-745866d585-xvfwh 1/1 Running 0 12h
#删除先前遗留下来的垃圾pod(可选)
root@k8s01-1:~# kubectl -n istio-system get pods | grep Error | awk '{print $1}' | xargs kubectl -n istio-system delete pod

#到目前为止 ,集群中所有pod都是正常运行状态或正常结束状态

6.4 部署结果验证

6.4.1 命令行查看

#查看k8s集群节点列表
root@k8s01-1:~# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s01-1 Ready control-plane,master,worker 12h v1.23.17
k8s01-2 Ready control-plane,master,worker 12h v1.23.17
k8s01-3 Ready control-plane,master,worker 12h v1.23.17
k8s01-4 Ready worker 12h v1.23.17

#查看k8s集群组件状态
root@k8s01-1:~# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}

#查看calico 节点
root@k8s01-1:~# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+-------------+
| 10.13.15.17 | node-to-node mesh | up | 14:43:14 | Established |
| 10.13.15.161 | node-to-node mesh | up | 14:43:15 | Established |
| 10.13.15.99 | node-to-node mesh | up | 14:43:13 | Established |
+--------------+-------------------+-------+----------+-------------+
...

#查看所有pod(集群中所有pod都是正常运行状态或正常结束状态)
root@k8s01-1:~# kubectl get pods -A -o wide

6.4.2 查看ksp3.4.1的web管理界面

访问kubesphere3.4.1的web管理界面:http://10.13.15.16:30880/login ,默认用户名:admin,默认密码:P@88w0rd

image-20250113145359197

6.5 控制节点配置keepalived

参考 https://jiangsanyin.github.io/2024/08/10/%E6%90%AD%E5%BB%BA%E4%BA%92%E4%B8%BA%E4%B8%BB%E5%A4%87MySQL5-7%E9%9B%86%E7%BE%A4%E5%B9%B6%E5%AE%9E%E7%8E%B0%E8%87%AA%E5%8A%A8%E5%88%87%E6%8D%A2/#2-3-%E5%AE%89%E8%A3%85%E4%B8%8E%E9%85%8D%E7%BD%AEkeepalived
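
上面链接中是以 MySQL 为例的 keepalived 配置;如果只是想给三个控制节点的 apiserver(6443端口)挂一个 VIP,可以参考下面这个最小化的 keepalived.conf 片段(其中 VIP 10.13.15.200 与网卡名 ens3 均为假设值,请按实际环境修改;BACKUP 节点把 state 改为 BACKUP 并调低 priority 即可):

! /etc/keepalived/keepalived.conf(MASTER 节点示意)
vrrp_instance VI_K8S {
    state MASTER
    interface ens3               # 假设的网卡名,请替换为实际网卡
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8svip01       # 自拟的认证口令(最多8个字符有效)
    }
    virtual_ipaddress {
        10.13.15.200             # 假设的 VIP,请替换为规划好的空闲 IP
    }
}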

七、参考文章

  • kubesphere官网离线部署文档:https://www.kubesphere.io/zh/docs/v3.4/installing-on-linux/introduction/air-gapped-installation

  • kubesphere关于部署配置文件的描述:https://github.com/kubesphere/kubekey/blob/master/docs/config-example.md

  • kubesphere在cnblogs上的博文:https://www.cnblogs.com/kubesphere/p/18208852

  • 运维博主运维有术文章:https://zhuanlan.zhihu.com/p/672059287

