1 Deployment Environment Preparation
1.1 Node Planning
| Hostname | Role | IP | root password | Specs | System disk | OS | Notes |
|---|---|---|---|---|---|---|---|
| controller01 | master + worker | 113.57.37.106 | XXX | 192C1T | 1124G | Ubuntu 22.04.4 LTS (aarch64) | Kunpeng-920 server, Internet access |
1.2 Basic Server Preparation
1.2.1 Set the hostname on each node
hostnamectl set-hostname controller01
1.2.2 Map hostnames to IPs (all nodes)
root@controller01:~# cat /etc/hosts
...
113.57.37.106 controller01
1.2.3 Configure passwordless SSH from the master node to all nodes
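The exact commands were not preserved here; a minimal sketch of the usual approach, assuming RSA keys and that root login over SSH is permitted on all nodes:

# Generate a key pair without a passphrase (run on the master node)
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
# Copy the public key to every node in the cluster (here the single node itself)
ssh-copy-id root@controller01
# Confirm login works without a password
ssh root@controller01 hostname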
1.2.4 Disable the firewall and install common tools
# Disable the firewall
systemctl status ufw
systemctl stop ufw
systemctl disable ufw

# Set the timezone to Asia/Shanghai
timedatectl set-timezone Asia/Shanghai

# Install common tools
apt-get update && apt install -y tmux wget

# Install dependencies required by the Kubernetes services
apt install -y socat conntrack ebtables ipset chrony
systemctl enable chrony.service && systemctl restart chrony.service && systemctl status chrony

# Check time synchronization
chronyc sources
1.2.5 Disable swap and adjust kernel parameters
# Temporarily disable the swap partition
swapoff -a

# Comment out the swap line in /etc/fstab to disable swap permanently
sed -i 's/.*swap.*/#&/' /etc/fstab

# Adjust kernel parameters
echo 'vm.swappiness=0' >> /etc/sysctl.d/k8s.conf
sysctl -p /etc/sysctl.d/k8s.conf
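To confirm swap is really off before moving on, a quick check (a sketch; either command is enough):

swapon --show    # prints nothing when no swap device or file is active
free -h          # the Swap line should read 0B across the board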
1.2.6 Enable IPv4 forwarding and let iptables see bridged traffic
# Configure kernel modules to be loaded at boot
echo 'overlay' >> /etc/modules-load.d/k8s.conf
echo 'br_netfilter' >> /etc/modules-load.d/k8s.conf

# Load the overlay and br_netfilter modules now
modprobe overlay && modprobe br_netfilter
lsmod | grep overlay
lsmod | grep br_netfilter

# Set kernel parameters (persist across reboots)
echo 'net.bridge.bridge-nf-call-iptables = 1' >> /etc/sysctl.d/k8s.conf
echo 'net.bridge.bridge-nf-call-ip6tables = 1' >> /etc/sysctl.d/k8s.conf
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.d/k8s.conf

# bridge-nf lets netfilter filter IPv4/ARP/IPv6 packets traversing a Linux bridge.
# For example, with net.bridge.bridge-nf-call-iptables=1, packets forwarded by a layer-2 bridge are also processed by iptables.
# net.bridge.bridge-nf-call-arptables: filter the bridge's ARP packets in the arptables FORWARD chain
# net.bridge.bridge-nf-call-ip6tables: filter IPv6 packets in the ip6tables chains
# net.bridge.bridge-nf-call-iptables: filter IPv4 packets in the iptables chains
# net.bridge.bridge-nf-filter-vlan-tagged: filter VLAN-tagged packets in iptables/arptables
# iptables is the most widely used firewall tool on Linux; it works with the packet-filtering hooks in the kernel's protocol stack, and those hooks make up the netfilter framework.
# Every packet entering or leaving the network stack triggers these hooks, so programs can register hook functions to handle traffic at key points along its path.

# Apply the kernel parameters without a reboot, then check the values currently in effect
sysctl --system
sysctl net.bridge.bridge-nf-call-iptables
sysctl net.bridge.bridge-nf-call-ip6tables
sysctl net.ipv4.ip_forward
1.3 Docker Environment Preparation
Install Docker Engine using the steps below.
# Refresh the package index
apt-get update

# Install packages that allow apt to fetch repositories over HTTPS
apt install apt-transport-https ca-certificates curl gnupg-agent software-properties-common

# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Add the stable repository (appended to /etc/apt/sources.list)
add-apt-repository "deb [arch=arm64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

# If you see "Could not handshake", just retry a few times
gpg --keyserver keyserver.ubuntu.com --recv 7EA0A9C3F273FCD8
gpg --export --armor 7EA0A9C3F273FCD8 | apt-key add -
# List the docker-ce versions available for installation
apt-get update
apt-cache policy docker-ce

# Install a specific version
apt-get install docker-ce=5:20.10.24~3-0~ubuntu-jammy

# Verify the installation
docker --version
# Update the Docker daemon configuration: set the cgroup driver to systemd
tee /etc/docker/daemon.json <<EOF
{
    "registry-mirrors": [
        "https://hub.rat.dev",
        "https://registry.aliyuncs.com",
        "https://registry.docker-cn.com",
        "https://docker.chenby.cn",
        "https://docker.registry.cyou",
        "https://docker-cf.registry.cyou",
        "https://dockercf.jsdelivr.fyi",
        "https://docker.jsdelivr.fyi",
        "https://dockertest.jsdelivr.fyi",
        "https://dockerproxy.com",
        "https://docker.m.daocloud.io",
        "https://docker.nju.edu.cn",
        "https://docker.mirrors.sjtug.sjtu.edu.cn",
        "https://docker.mirrors.ustc.edu.cn",
        "https://mirror.iscas.ac.cn",
        "https://docker.rainbond.cc"
    ],
    "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF

# Reload the configuration and restart the Docker service
systemctl daemon-reload
systemctl restart docker
systemctl status docker
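To confirm the new daemon settings took effect, a quick check (a sketch):

docker info 2>/dev/null | grep -i "cgroup driver"          # expect: Cgroup Driver: systemd
docker info 2>/dev/null | grep -A 3 -i "registry mirrors"  # the configured mirrors should be listed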
2 Cluster Deployment
2.1 Prepare the Component Package Sources
# Install prerequisites
apt-get update
apt install -y apt-transport-https ca-certificates curl

# Add the GPG key
# curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
curl -s https://mirrors.huaweicloud.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -

# Add the Kubernetes package source
# Neither the Alibaba Cloud mirror (https://mirrors.aliyun.com/kubernetes/apt) nor the Huawei Cloud mirror (https://mirrors.huaweicloud.com/kubernetes/apt/) provides kubernetes-jammy; kubernetes-xenial still works
cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.huaweicloud.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update
2.2 Install the Core Cluster Components
# Install kubelet, kubeadm, and kubectl
apt-get update

# List all available kubelet versions
apt-cache madison kubelet
apt install -y kubelet=1.23.17-00 kubeadm=1.23.17-00 kubectl=1.23.17-00 kubernetes-cni

# At this point the kubelet service sits in a loaded/activating state; this is expected for now
systemctl enable kubelet

# Hold the packages so they are not upgraded automatically
apt-mark hold kubelet kubeadm kubectl
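To confirm the pinned versions landed and are held, a quick check (a sketch):

kubeadm version -o short           # expect v1.23.17
kubectl version --client --short
kubelet --version
apt-mark showhold                  # kubelet, kubeadm, and kubectl should be listed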
2.3 Prepare the Cluster Images
# Pull the images and push them to Harbor
# If you do not have a Harbor server (or do not know what that is), just use registry.aliyuncs.com/google_containers/xxx directly
k8s_ver="1.23.17"
kubeadm config images list --kubernetes-version=${k8s_ver} | awk -F "/" '{print $NF}'
images=$(kubeadm config images list --kubernetes-version=${k8s_ver} | awk -F "/" '{print $NF}')
for i in ${images}; do
    echo $i;
    docker pull registry.aliyuncs.com/google_containers/$i;
done

# If you do have a Harbor server
k8s_ver="1.23.17"
harbor_url="175.6.40.93:8196"
kubeadm config images list --kubernetes-version=${k8s_ver} | awk -F "/" '{print $NF}'
images=$(kubeadm config images list --kubernetes-version=${k8s_ver} | awk -F "/" '{print $NF}')
for i in ${images}; do
    echo $i;
    docker pull registry.aliyuncs.com/google_containers/$i;
    docker tag registry.aliyuncs.com/google_containers/$i ${harbor_url}/google_containers/$i;
    docker push ${harbor_url}/google_containers/$i;
    docker rmi registry.aliyuncs.com/google_containers/$i;
done
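To check that everything was pulled (non-Harbor case), a quick comparison (a sketch):

docker images --filter=reference='registry.aliyuncs.com/google_containers/*'
kubeadm config images list --kubernetes-version=1.23.17 | wc -l    # number of images expected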
2.4 Initialize the Cluster
# If you do not have a Harbor server (or do not know what that is)
kubeadm init \
    --image-repository=registry.aliyuncs.com/google_containers \
    --kubernetes-version=1.23.17 \
    --pod-network-cidr="10.250.0.0/16" \
    --apiserver-advertise-address=113.57.37.106 \
    --service-cidr="10.96.0.0/12" \
    --ignore-preflight-errors=Swap

# If you do have a Harbor server
kubeadm init \
    --image-repository=175.6.40.93:8196/google_containers \
    --kubernetes-version=1.23.17 \
    --pod-network-cidr="10.250.0.0/16" \
    --apiserver-advertise-address=172.20.0.21 \
    --service-cidr="10.96.0.0/12" \
    --ignore-preflight-errors=Swap
# --image-repository              Harbor server address plus repository name (or the public mirror)
# --kubernetes-version            the ${k8s_ver} value used above
# --pod-network-cidr              pod network CIDR of the cluster; customizable
# --apiserver-advertise-address   management IP of the master node
# --service-cidr                  service network CIDR of the cluster; customizable
# --ignore-preflight-errors       list of preflight checks to ignore (here Swap); see kubeadm init --help for the accepted values
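To sanity-check these parameters before the real initialization, kubeadm offers a dry run that prints what would be done without bootstrapping the cluster; a sketch using the non-Harbor values above:

kubeadm init --dry-run \
    --image-repository=registry.aliyuncs.com/google_containers \
    --kubernetes-version=1.23.17 \
    --pod-network-cidr="10.250.0.0/16" \
    --apiserver-advertise-address=113.57.37.106 \
    --service-cidr="10.96.0.0/12"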
2.5 Worker Node Addition (Remove the Master Node Taint)
# Configure the kubeconfig file
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
# We are building a single-node cluster, so the master node is both the control-plane node and the worker node
# By default Kubernetes schedules no pods on master nodes, so for a single-node cluster the control-plane taint must be removed
kubectl taint nodes --all node-role.kubernetes.io/master-

# Once the single-node cluster is up, worker nodes can be added on top of it; see section "3 Adding Nodes"
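To confirm the taint is gone, a quick check (a sketch):

kubectl describe node controller01 | grep -i taints    # expect: Taints: <none>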
# Enable kubectl command completion
cat >> /root/.bashrc <<EOF
source <(kubectl completion bash)
EOF

# Load the updated file
source /root/.bashrc
2.6 Cluster Network Configuration
The cluster network solution used here is flannel; the steps below install the flannel network plugin.
# Other CNI options include Flannel, Calico, Kube-OVN, Canal, Weave Net, and so on; explore them if you are interested
# Container image mirroring site: https://docker.aityp.com/
mkdir -p /data/kubernetes/network/flannel && cd /data/kubernetes/network/flannel
wget --no-check-certificate https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# Edit the downloaded kube-flannel.yml in two places:
# (1) Replace the two images with mirrors reachable from inside China
docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/flannel-io/flannel-cni-plugin:v1.7.1-flannel1-linuxarm64
docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/flannel-io/flannel:v0.27.0-linuxarm64

# (2) Change the net-conf.json -> Network value to match the "--pod-network-cidr" passed to kubeadm init,
#     i.e. 10.250.0.0/16
# The modified file is shown below.
---
kind: Namespace
apiVersion: v1
metadata:
  name: kube-flannel
  labels:
    k8s-app: flannel
    pod-security.kubernetes.io/enforce: privileged
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: flannel
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: flannel
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: flannel
  name: flannel
  namespace: kube-flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
  labels:
    tier: node
    k8s-app: flannel
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.250.0.0/16",
      "EnableNFTables": false,
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
    k8s-app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni-plugin
        image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/flannel-io/flannel-cni-plugin:v1.7.1-flannel1-linuxarm64
        command:
        - cp
        args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        volumeMounts:
        - name: cni-plugin
          mountPath: /opt/cni/bin
      - name: install-cni
        image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/flannel-io/flannel:v0.27.0-linuxarm64
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/flannel-io/flannel:v0.27.0-linuxarm64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        - name: CONT_WHEN_CACHE_NOT_READY
          value: "false"
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
        - name: xtables-lock
          mountPath: /run/xtables.lock
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni-plugin
        hostPath:
          path: /opt/cni/bin
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
# Apply the YAML file above
# root@controller01:/data/kubernetes/network/flannel# kubectl create -f kube-flannel.yml

# Watch all pods in the cluster (refreshes every 2 seconds by default; press Ctrl+C to exit)
watch kubectl get pods -A -o wide
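Once the flannel pod is Running, a quick smoke test of pod networking (a sketch; the pod name net-test and the busybox image are illustrative, not part of the original procedure):

kubectl run net-test --image=busybox:1.36 --restart=Never -- sleep 3600
kubectl get pod net-test -o wide     # the pod IP should fall inside 10.250.0.0/16
kubectl delete pod net-test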
2.7 Prepare the Cluster Client Credentials
For a single-node cluster this step was already done earlier and can be skipped. For a multi-node cluster, the file must be copied to $HOME/.kube on every node; operations are usually performed directly as root, whose HOME directory is /root.
# Configure the kubeconfig file
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
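For the multi-node case, a sketch of copying the file from the master to another node, using the worker ksp-registry from section 3 as the example and relying on the passwordless SSH set up in 1.2.3:

ssh root@ksp-registry 'mkdir -p /root/.kube'
scp /etc/kubernetes/admin.conf root@ksp-registry:/root/.kube/config
ssh root@ksp-registry 'kubectl get nodes'    # confirm kubectl works on that node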
2.8 Verify Cluster Status and Availability
root@controller01:~# arch
aarch64
root@controller01:~# uname -a
Linux controller01 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:26:57 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
root@controller01:~# kubectl get nodes -o wide
NAME           STATUS   ROLES                  AGE   VERSION    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
controller01   Ready    control-plane,master   17m   v1.23.17   113.57.37.106   <none>        Ubuntu 22.04.4 LTS   5.15.0-94-generic   docker://20.10.24
root@controller01:~# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE                         ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true","reason":""}
root@controller01:~# kubectl get pods -A
NAMESPACE      NAME                                    READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-shsdv                   1/1     Running   0          3m9s
kube-system    coredns-6d8c4cb4d-fmjkw                 1/1     Running   0          16m
kube-system    coredns-6d8c4cb4d-nzn5v                 1/1     Running   0          16m
kube-system    etcd-controller01                       1/1     Running   0          17m
kube-system    kube-apiserver-controller01             1/1     Running   0          17m
kube-system    kube-controller-manager-controller01    1/1     Running   0          17m
kube-system    kube-proxy-7cccn                        1/1     Running   0          17m
kube-system    kube-scheduler-controller01             1/1     Running   0          17m
This completes the setup of a single-node Kubernetes 1.23.17 cluster. The sections that follow cover adding worker nodes and uninstalling the cluster.
3 Adding Nodes
Adding nodes here refers purely to adding worker nodes.
A node newly added to the cluster must first complete the steps in sections 1.2 through 2.2, then run the commands below.
When kubeadm initializes a cluster it generates a token that is used to join new nodes, but by default this token is valid for only 24 hours.
If the token has not expired yet:
# Check whether a valid token still exists
root@controller01:~# kubeadm token list

# Get the token
root@controller01:~# kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
4bwou4.h82yx951wezuo08a   23h   2025-02-10T04:14:45Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token

# Get the SHA-256 hash of the CA certificate
root@controller01:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

# Use the token and hash printed above to join the new node (run on the new node)
root@ksp-registry:~# kubeadm join 113.57.37.106:6443 --token 4bwou4.h82yx951wezuo08a --discovery-token-ca-cert-hash sha256:78f2d6dbcc9b8aab7b1edfcbd276fc0c70858dd60f1c6e15666020f6c91ca362
If the token has already expired (you can also regenerate one right away even if it has not), regenerate the token and the join command as follows:
root@controller01:~# kubeadm token create --print-join-command

# The command above prints a single join command; run that command on the new node
root@ksp-registry:~# kubeadm join 113.57.37.106:6443 --token alvy7y.kvvw4xomybxj67o6 --discovery-token-ca-cert-hash sha256:78f2d6dbcc9b8aab7b1edfcbd276fc0c70858dd60f1c6e15666020f6c91ca362
Label the new node:
root@controller01:~# kubectl label node ksp-registry node-role.kubernetes.io/worker=""
root@controller01:/opt/cni/bin# kubectl get nodes
NAME           STATUS   ROLES                         AGE   VERSION
controller01   Ready    control-plane,master,worker   25d   v1.23.17
ksp-registry   Ready    worker                        23m   v1.23.17
4 Uninstalling the Cluster
# Run on all nodes:
systemctl stop kubelet
systemctl stop etcd

# Run on all nodes:
kubeadm reset

# Run on all nodes:
dpkg -l | grep kube | awk '{print $2}' | xargs dpkg --purge
dpkg -l | grep kube

# Run on all nodes:
rm -rf $HOME/.kube
rm -rf ~/.kube/
rm -rf /etc/kubernetes/
rm -rf /etc/systemd/system/kubelet.service.d
rm -rf /etc/systemd/system/kubelet.service
rm -rf /usr/bin/kube*
rm -rf /etc/cni
rm -rf /opt/cni
rm -rf /var/lib/etcd/*
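Optionally, leftover CNI interfaces, CNI state, and iptables rules can be cleared as well; a sketch, assuming the flannel VXLAN setup used above and that no other services depend on the existing iptables rules:

ip link delete cni0 2>/dev/null
ip link delete flannel.1 2>/dev/null
rm -rf /var/lib/cni/
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X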