Setting Up a Kubernetes Cluster on CentOS 7

Creating the Virtual Machines

I created three virtual machines on my Mac with Parallels Desktop. Their details are as follows:

CentOS7-Node1:  
  10.211.55.7 
  parallels/centos-test

CentOS7-Node2:  
  10.211.55.8
  parallels/centos-test

CentOS7-Node3:  
  10.211.55.9
  parallels/centos-test

Installing the Master

CentOS7-Node1 is used as the master node.

Configuring yum

Update the yum sources by adding a repository for Kubernetes:

[parallels@CentOS7-Node1 yum.repos.d]$ cd /etc/yum.repos.d
[parallels@CentOS7-Node1 yum.repos.d]$ sudo touch kubernetes.repo
[kubernetes]
name=Kubernetes  
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64  
enabled=1  
gpgcheck=0  
repo_gpgcheck=0  
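The transcript above creates an empty kubernetes.repo with touch and then lists the contents that belong in it. One way to write the file in a single step is a here-document. The sketch below is only an illustration: the function name write_kubernetes_repo and the idea of passing the target path as an argument are assumptions of mine, not part of the original steps.

```shell
# Sketch: write the Kubernetes yum repo definition shown above to a given path.
# write_kubernetes_repo is a hypothetical helper name.
write_kubernetes_repo() {
  # $1: destination file, e.g. /etc/yum.repos.d/kubernetes.repo
  cat > "$1" <<'EOF'
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
EOF
}
```

Because /etc/yum.repos.d is owned by root, one option is to write to a scratch path first, for example write_kubernetes_repo /tmp/kubernetes.repo, and then copy the result into place with sudo cp /tmp/kubernetes.repo /etc/yum.repos.d/.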
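Installing the Kubernetes Packages

After evaluating the options, kubeadm turned out to be the most widely recommended way to build a cluster, and my company's clusters are built with it too. So I went with kubeadm without hesitation.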

[parallels@CentOS7-Node1 yum.repos.d]$ yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
Loaded plugins: fastestmirror, langpacks  
You need to be root to perform this command.  
[parallels@CentOS7-Node1 yum.repos.d]$ sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
Loaded plugins: fastestmirror, langpacks  
Loading mirror speeds from cached hostfile  
 * base: mirrors.aliyun.com
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
kubernetes                                                                                                                                                         | 1.4 kB  00:00:00  
kubernetes/primary                                                                                                                                                 |  58 kB  00:00:00  
kubernetes                                                                                                                                                                        421/421  
Resolving Dependencies  
--> Running transaction check
...... # a long stretch of uninteresting log output omitted
Dependency Installed:  
  conntrack-tools.x86_64 0:1.4.4-5.el7_7.2             cri-tools.x86_64 0:1.13.0-0                    kubernetes-cni.x86_64 0:0.7.5-0      libnetfilter_cthelper.x86_64 0:1.0.0-10.el7_7.1     
  libnetfilter_cttimeout.x86_64 0:1.0.0-6.el7_7.1      libnetfilter_queue.x86_64 0:1.0.2-2.el7_2      socat.x86_64 0:1.7.3.2-2.el7        

Complete!  
yum Configuration and Updates
yum install -y yum-utils device-mapper-persistent-data lvm2  
yum update  
Starting Docker

Start Docker and enable it at boot:

[parallels@CentOS7-Node1 ~]$ sudo systemctl enable docker && systemctl start docker
[sudo] password for parallels: 
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.  
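Starting kubelet

Start kubelet and enable it at boot. Note that sudo applies only to the first command of an && chain, so it has to be repeated; otherwise systemctl start asks for authentication separately, as the transcript shows:

sudo systemctl enable kubelet && sudo systemctl start kubelet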
[parallels@CentOS7-Node1 ~]$ sudo systemctl enable kubelet && systemctl start kubelet
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.  
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to manage system services or units.  
Authenticating as: Parallels (parallels)  
Password:  
==== AUTHENTICATION COMPLETE ===

kubeadm config

[parallels@CentOS7-Node1 Workspace]$ kubeadm config print init-defaults
apiVersion: kubeadm.k8s.io/v1beta2  
bootstrapTokens:  
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration  
localAPIEndpoint:  
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:  
  criSocket: /var/run/dockershim.sock
  name: centos7-node1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:  
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2  
certificatesDir: /etc/kubernetes/pki  
clusterName: kubernetes  
controllerManager: {}  
dns:  
  type: CoreDNS
etcd:  
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io  
kind: ClusterConfiguration  
kubernetesVersion: v1.16.0  
networking:  
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}  
kubeadm config print init-defaults > /home/parallels/Workspace/init.default.yaml  
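The printed defaults contain placeholders such as advertiseAddress: 1.2.3.4, so the saved file presumably needs editing before kubeadm init can use it. The post does not show that edit. The sketch below assumes a simple in-place substitution of single-line "key: value" entries, using a hypothetical helper named set_yaml_scalar. It is a text substitution rather than a YAML parser, so it only suits flat entries like the ones in this file, and it rewrites every line that carries the given key.

```shell
# Sketch: replace the value of a single-line "key: value" entry in a YAML file,
# keeping the original indentation. set_yaml_scalar is a hypothetical helper.
# Limitation: the value must not contain '|', '&' or '\' (special to sed here).
set_yaml_scalar() (
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || exit 1
  # Write the edited copy to a temp file, then copy it back over the original
  # so that the original file's permissions are preserved.
  sed -E "s|^([[:space:]]*)${key}:.*|\1${key}: ${value}|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  exit "$status"
)
```

For this cluster the call might be set_yaml_scalar /home/parallels/Workspace/init.default.yaml advertiseAddress 10.211.55.7, since 10.211.55.7 is the master's address.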

Configuring Docker

Docker must be installed first; see my earlier post: http://www.cyblogs.com/centos7shang-an-zhuang-docker/

Some Docker-related commands

yum install docker-ce-18.09.9-3.el7 # pin the version to 18.09.9-3.el7

systemctl status docker  
systemctl restart docker  
systemctl daemon-reload  

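Downloading the Kubernetes Images

Configure a registry mirror. This turned out to be of little use, because Docker's registry-mirrors setting only applies to pulls from Docker Hub and not to k8s.gcr.io, so the images still have to be fetched from a mirror inside China later on: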

echo '{"registry-mirrors":["https://docker.mirrors.ustc.edu.cn"]}' > /etc/docker/daemon.json  
# If this fails with a permission error, add the content manually with vim, then restart the Docker service
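The permission error mentioned in the comment typically arises because, in sudo echo ... > file, the redirection is performed by the unprivileged shell rather than by sudo. A common workaround is to pipe into tee, which can itself run under sudo. The sketch below assumes that approach; write_daemon_json and the SUDO variable are hypothetical names introduced for illustration.

```shell
# Sketch: write the registry-mirror setting through tee so that the write itself
# can run with elevated privileges. write_daemon_json and SUDO are hypothetical.
write_daemon_json() {
  # $1: destination file, e.g. /etc/docker/daemon.json
  # SUDO may be set to "sudo" when the destination is root-owned.
  printf '%s\n' '{"registry-mirrors":["https://docker.mirrors.ustc.edu.cn"]}' |
    ${SUDO:-} tee "$1" > /dev/null
}
```

On the node this could be invoked as SUDO=sudo write_daemon_json /etc/docker/daemon.json, followed by sudo systemctl restart docker. Note that it overwrites any existing daemon.json.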

List the names and versions of the images Kubernetes depends on:

[parallels@CentOS7-Node1 Workspace]$ kubeadm config images list
W1022 13:51:12.550171   19704 version.go:101] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)  
W1022 13:51:12.550458   19704 version.go:102] falling back to the local client version: v1.16.2  
k8s.gcr.io/kube-apiserver:v1.16.2  
k8s.gcr.io/kube-controller-manager:v1.16.2  
k8s.gcr.io/kube-scheduler:v1.16.2  
k8s.gcr.io/kube-proxy:v1.16.2  
k8s.gcr.io/pause:3.1  
k8s.gcr.io/etcd:3.3.15-0  
k8s.gcr.io/coredns:1.6.2  

With unrestricted network access, running the following command would be all that is needed, but in practice it fails:

[parallels@CentOS7-Node1 Workspace]$ sudo kubeadm config images pull --config=/home/parallels/Workspace/init.default.yaml

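# Pulling from k8s.gcr.io is essentially impossible on this network, so the only option is to pull from Aliyun first and then retag. The error looks like this: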
[parallels@CentOS7-Node1 Workspace]$ sudo kubeadm config images pull --config=/home/parallels/Workspace/init.default.yaml
[sudo] password for parallels: 
failed to pull image "k8s.gcr.io/kube-apiserver:v1.16.0": output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)  
, error: exit status 1
To see the stack trace of this error execute with --v=5 or higher  
Getting the Images

Fetch the images another way:

touch kubeadm.sh

#!/bin/bash

KUBE_VERSION=v1.16.0  
KUBE_PAUSE_VERSION=3.1  
ETCD_VERSION=3.3.15-0  
CORE_DNS_VERSION=1.6.2

GCR_URL=k8s.gcr.io  
ALIYUN_URL=registry.cn-hangzhou.aliyuncs.com/google_containers

images=(  
    kube-apiserver:${KUBE_VERSION}
    kube-controller-manager:${KUBE_VERSION}
    kube-scheduler:${KUBE_VERSION}
    kube-proxy:${KUBE_VERSION}
    pause:${KUBE_PAUSE_VERSION}
    etcd:${ETCD_VERSION}
    coredns:${CORE_DNS_VERSION}
)

for imageName in "${images[@]}" ; do
  docker pull "$ALIYUN_URL/$imageName"
  docker tag  "$ALIYUN_URL/$imageName" "$GCR_URL/$imageName"
  docker rmi  "$ALIYUN_URL/$imageName"
done
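The script above hard-codes the versions, while kubeadm config images list reported v1.16.2 and the config file pins v1.16.0, which is exactly the kind of mismatch that can make init fail later. One possible variation, sketched here under the assumption that every required image exists under the same name and tag in the Aliyun repository, derives the pull list from kubeadm's own output instead. to_mirror_name and pull_via_mirror are hypothetical helpers.

```shell
# Sketch: map k8s.gcr.io image references to the Aliyun mirror and retag them.
# Assumption: the mirror keeps the same image names and tags.
GCR_URL=k8s.gcr.io
ALIYUN_URL=registry.cn-hangzhou.aliyuncs.com/google_containers

to_mirror_name() {
  # $1: image reference such as k8s.gcr.io/kube-proxy:v1.16.0
  printf '%s/%s\n' "$ALIYUN_URL" "${1#${GCR_URL}/}"
}

# Pull each image listed on stdin from the mirror and retag it for kubeadm.
pull_via_mirror() {
  while IFS= read -r image; do
    # Ignore anything that is not a k8s.gcr.io image reference (e.g. warnings).
    case $image in "$GCR_URL"/*) ;; *) continue ;; esac
    mirror=$(to_mirror_name "$image")
    docker pull "$mirror" && docker tag "$mirror" "$image" && docker rmi "$mirror"
  done
}
```

A possible invocation is kubeadm config images list --config=init.default.yaml | pull_via_mirror, run by a user who is allowed to talk to the Docker daemon. That way the tags always match what the config file asks for.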

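Pulling the Images

chmod u+x kubeadm.sh # make the script executable
sudo ./kubeadm.sh

Then it is just a matter of waiting patiently.

Check the images that are now available locally: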

[root@CentOS7-Node1 Workspace]# docker images
REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE  
k8s.gcr.io/kube-apiserver            v1.16.0             b305571ca60a        4 weeks ago         217MB  
k8s.gcr.io/kube-proxy                v1.16.0             c21b0c7400f9        4 weeks ago         86.1MB  
k8s.gcr.io/kube-controller-manager   v1.16.0             06a629a7e51c        4 weeks ago         163MB  
k8s.gcr.io/kube-scheduler            v1.16.0             301ddc62b80b        4 weeks ago         87.3MB  
k8s.gcr.io/etcd                      3.3.15-0            b2756210eeab        6 weeks ago         247MB  
k8s.gcr.io/coredns                   1.6.2               bf261d157914        2 months ago        44.1MB  
k8s.gcr.io/pause                     3.1                 da86e6ba6ca1        22 months ago       742kB  
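Before running kubeadm init it may be worth confirming that every image kubeadm expects is present locally with exactly the expected tag. The sketch below compares two plain-text lists; missing_images is a hypothetical helper, and both input files are assumed to hold one repository:tag per line.

```shell
# Sketch: print every image named in the "required" list that is absent from
# the "local" list. Both files hold one repository:tag per line.
missing_images() {
  # $1: required list, $2: local list
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
  grep -v -x -F -f "$2" "$1"
}
```

The lists could be produced with kubeadm config images list --config=init.default.yaml > required.txt and docker images --format '{{.Repository}}:{{.Tag}}' > local.txt. Empty output from missing_images required.txt local.txt means nothing is missing.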
[parallels@CentOS7-Node1 Workspace]$ sudo kubeadm init --config=init.default.yaml 
[init] Using Kubernetes version: v1.16.0
[preflight] Running pre-flight checks
        [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.4. Latest validated version: 18.09
error execution phase preflight: [preflight] Some fatal errors occurred:  
        [ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher  
Disabling the Firewall

To resolve the firewall warning, see: http://www.cyblogs.com/centos7cha-kan-he-guan-bi-fang-huo-qiang/

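The cgroupfs Warning

detected "cgroupfs" as the Docker cgroup driver

To switch Docker to the systemd cgroup driver, edit /etc/docker/daemon.json and set native.cgroupdriver=systemd under exec-opts. JSON does not allow comments, so the file must contain only the following:

{
  "registry-mirrors": [
    "https://registry.docker-cn.com"
  ],
  "live-restore": true,
  "exec-opts": [
    "native.cgroupdriver=systemd"
  ]
}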

# Restart Docker
systemctl restart docker  
systemctl status docker  
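After the restart, docker info reports the active cgroup driver, which gives a quick way to confirm that the change took effect. The sketch below parses that output; cgroup_driver is a hypothetical helper that reads docker info text on standard input.

```shell
# Sketch: extract the cgroup driver from `docker info` output supplied on stdin.
# cgroup_driver is a hypothetical helper name.
cgroup_driver() {
  # Strip the "Cgroup Driver:" label (with any leading indentation) and print
  # only the value; other lines are suppressed by -n.
  sed -n 's/^[[:space:]]*Cgroup Driver:[[:space:]]*//p'
}
```

Running docker info 2>/dev/null | cgroup_driver should print systemd once the new daemon.json has been picked up.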
Disabling Swap

It turns out that swap still has to be disabled:

Oct 22 16:35:36 CentOS7-Node1 kubelet[1395]: F1022 16:35:36.065168    1395 server.go:271] failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename                                Type                Size        Used        Priority /dev/dm-1                               partition        2097148        29952        -1]  
swapoff -a  
# To disable the swap partition permanently, open the following file and comment out the swap line
sudo vi /etc/fstab  
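Commenting out the swap entry by hand works; for repeatable setups the same edit can be scripted. The sketch below is one way to do it, assuming the swap entry is an uncommented line that has the word swap as one of its whitespace-separated fields. comment_out_swap is a hypothetical helper, and it prints the result instead of modifying the file so that the output can be reviewed first.

```shell
# Sketch: print an fstab with every active swap entry commented out.
# comment_out_swap is a hypothetical helper name.
comment_out_swap() {
  # $1: path to an fstab-style file
  # Match lines that are not already comments and contain "swap" as a field,
  # then prefix them with '#'. All other lines pass through unchanged.
  sed -E '/^[[:space:]]*[^#[:space:]].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}
```

One cautious way to apply it is comment_out_swap /etc/fstab > /tmp/fstab.new, inspect the result, and then sudo cp /tmp/fstab.new /etc/fstab. swapoff -a is still needed for the running system.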

Running kubeadm init Again

kubeadm init --config=init.default.yaml

[init] Using Kubernetes version: v1.16.2
...
[preflight] Pulling images required for setting up a Kubernetes cluster
...
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
...
[certs] Using certificateDir folder "/etc/kubernetes/pki"
...
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.

That failed. After changing the Docker version I ran it again, and it still reported errors:

 [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
Resetting kubeadm

At this point kubeadm has to be reset, as follows:

kubeadm reset  
echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables  
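Writing to /proc only changes the running kernel, so the setting is lost at the next reboot. A common way to persist it is a sysctl drop-in file. The sketch below assumes the file name k8s.conf, which is a conventional choice rather than something the original post specifies, and write_bridge_sysctl is a hypothetical helper.

```shell
# Sketch: write a sysctl drop-in that keeps bridged traffic visible to iptables.
# write_bridge_sysctl is a hypothetical helper name.
write_bridge_sysctl() {
  # $1: destination file, e.g. /etc/sysctl.d/k8s.conf (file name is an assumption)
  printf '%s\n' 'net.bridge.bridge-nf-call-iptables = 1' > "$1"
}
```

After the file has been placed under /etc/sysctl.d as root, sudo sysctl --system reloads all drop-in files.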
Viewing Logs with journalctl
journalctl -xefu kubelet  

This still fails, because of the earlier configuration:

apiVersion: kubeadm.k8s.io/v1beta2  
kind: ClusterConfiguration  
imageRepository: k8s.gcr.io  
kubernetesVersion: v1.16.0  
networking:  
  dnsDomain: cluster.local
  serviceSubnet: "10.96.0.0/16"

Continue with the init step by running kubeadm init --config=/home/parallels/Workspace/init.default.yaml:

[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:  
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.211.55.7:6443 --token imwj34.ksfiwzj5ga80du0r \  
    --discovery-token-ca-cert-hash sha256:7ffef85880ed43dd539afa045715f9ad5bef15e904cede96213d6cfd4adb0795 

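This really was not easy; I ran these steps over and over. The main trouble spots are image version mismatches and the init step itself.

Verifying the ConfigMaps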
[root@CentOS7-Node1 ~]# kubectl get -n kube-system configmap   
NAME                                 DATA   AGE  
coredns                              1      5m49s  
extension-apiserver-authentication   6      5m53s  
kube-proxy                           2      5m49s  
kubeadm-config                       2      5m50s  
kubelet-config-1.16                  1      5m50s  

Installing a Node and Joining the Cluster

Install the same base environment as on the master (Docker, kubelet, kubeadm, and so on) by repeating the steps above.

scp root@10.211.55.7:/home/parallels/Workspace/init.default.yaml .  
scp root@10.211.55.7:/home/parallels/Workspace/kubeadm.sh .  
yum install docker-ce-18.06.3.ce-3.el7  

Prepare a configuration file for the kubeadm join command: create join-config.yaml with the following content:

apiVersion: kubeadm.k8s.io/v1beta2  
kind: JoinConfiguration  
discovery:  
  bootstrapToken:
    apiServerEndpoint: 10.211.55.7:6443
    token: imwj34.ksfiwzj5ga80du0r
    unsafeSkipCAVerification: true
  tlsBootstrapToken: imwj34.ksfiwzj5ga80du0r

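Here, apiServerEndpoint is the address of the master server, in this case 10.211.55.7 (with port 6443). The values of token and tlsBootstrapToken come from the last lines printed by kubeadm init when the master was installed. Pay close attention to the formatting of the YAML file, otherwise the command reports an error.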
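The join file above sets unsafeSkipCAVerification: true, which skips verification of the master's CA. The kubeadm init output also prints a discovery-token-ca-cert-hash, and the v1beta2 JoinConfiguration accepts such hashes under caCertHashes. The sketch below extracts both values from saved init output and renders a join file that pins the CA. extract_flag and render_join_config are hypothetical helpers, and saving the init output to a file is an assumption on my part.

```shell
# Sketch: pull a flag value out of the "kubeadm join ..." lines printed by
# kubeadm init. extract_flag and render_join_config are hypothetical helpers.
extract_flag() {
  # $1: flag name without the leading dashes, e.g. token
  # Reads the saved init output on stdin and prints the word after the flag.
  sed -n "s/.*--$1 \([^ ]*\).*/\1/p"
}

# Render a JoinConfiguration that verifies the CA instead of skipping the check.
render_join_config() {
  # $1: endpoint (host:port)  $2: bootstrap token  $3: CA cert hash (sha256:...)
  cat <<EOF
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: $1
    token: $2
    caCertHashes:
    - $3
  tlsBootstrapToken: $2
EOF
}
```

With the init output saved as init.log (a hypothetical file name), render_join_config 10.211.55.7:6443 "$(extract_flag token < init.log)" "$(extract_flag discovery-token-ca-cert-hash < init.log)" > join-config.yaml produces the file. The default bootstrap token expires after 24 hours (ttl: 24h0m0s in the printed defaults); kubeadm token create --print-join-command on the master prints a fresh join command when that happens.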

[root@CentOS7-Node2 Workspace]# kubeadm join  --config=join-config.yaml
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:  
        [ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher  
[root@CentOS7-Node2 Workspace]# swapoff -a
[root@CentOS7-Node2 Workspace]# kubeadm join  --config=join-config.yaml
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:  
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.  
Installing the Network Plugin

On the master, run:

[root@CentOS7-Node1 Workspace]# kubectl get nodes
NAME            STATUS     ROLES    AGE     VERSION  
centos7-node1   NotReady   master   154m    v1.16.2  
centos7-node2   NotReady   <none>   2m49s   v1.16.2  

The nodes show NotReady because no CNI network plugin has been installed yet. I chose the Weave plugin.

[root@CentOS7-Node1 Workspace]# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
serviceaccount/weave-net created  
clusterrole.rbac.authorization.k8s.io/weave-net created  
clusterrolebinding.rbac.authorization.k8s.io/weave-net created  
role.rbac.authorization.k8s.io/weave-net created  
rolebinding.rbac.authorization.k8s.io/weave-net created  
daemonset.apps/weave-net created  
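Once the plugin's pods are up, the nodes should move from NotReady to Ready, which can take a little while. Rather than re-running kubectl get nodes by hand, the check can be polled. The sketch below is one way to do that; retry, nodes_ready and all_nodes_ready are hypothetical helper names, and the attempt count and delay are arbitrary.

```shell
# Sketch: run a command until it succeeds, up to a maximum number of attempts.
# retry, nodes_ready and all_nodes_ready are hypothetical helper names.
retry() {
  # $1: max attempts  $2: seconds between attempts  remaining args: the command
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while ! "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Succeeds only if stdin holds at least one node line and every STATUS column
# (second field) is exactly "Ready". Expects input without the header line.
nodes_ready() {
  awk '{ n++ } $2 != "Ready" { bad = 1 } END { exit (bad || n == 0) }'
}

# Reads the live cluster state; --no-headers drops the column header line.
all_nodes_ready() {
  kubectl get nodes --no-headers | nodes_ready
}
```

For example, retry 30 10 all_nodes_ready polls every 10 seconds and gives up after 30 attempts.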

Verifying the Cluster Installation

[root@CentOS7-Node1 Workspace]# kubectl get pods -n kube-system
NAME                                    READY   STATUS              RESTARTS   AGE  
coredns-5644d7b6d9-9fr9p                0/1     ContainerCreating   0          172m  
coredns-5644d7b6d9-pmpkq                0/1     ContainerCreating   0          172m  
etcd-centos7-node1                      1/1     Running             0          171m  
kube-apiserver-centos7-node1            1/1     Running             0          171m  
kube-controller-manager-centos7-node1   1/1     Running             0          171m  
kube-proxy-ccnht                        1/1     Running             0          21m  
kube-proxy-rdq9l                        1/1     Running             0          172m  
kube-scheduler-centos7-node1            1/1     Running             0          171m  
weave-net-6hw26                         2/2     Running             0          8m7s  
weave-net-qv8vz                         2/2     Running             0          8m7s  

The coredns pods stay stuck in ContainerCreating. Take a closer look at the error details:

[root@CentOS7-Node1 Workspace]# kubectl describe pod coredns-5644d7b6d9-9fr9p -n kube-system
Name:                 coredns-5644d7b6d9-9fr9p  
Namespace:            kube-system  
Priority:             2000000000  
Priority Class Name:  system-cluster-critical  
Node:                 centos7-node2/10.211.55.8  
Start Time:           Tue, 22 Oct 2019 20:49:47 +0800  
Labels:               k8s-app=kube-dns  
                      pod-template-hash=5644d7b6d9
.... # some output omitted here
Events:  
  Type     Reason                  Age        From                    Message
  ----     ------                  ----       ----                    -------
  Warning  FailedScheduling        <unknown>  default-scheduler       0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling        <unknown>  default-scheduler       0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled               <unknown>  default-scheduler       Successfully assigned kube-system/coredns-5644d7b6d9-9fr9p to centos7-node2
  Warning  FailedCreatePodSandBox  2m         kubelet, centos7-node2  Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   SandboxChanged          119s       kubelet, centos7-node2  Pod sandbox changed, it will be killed and re-created.

Some errors are visible here:

Oct 22 10:50:15 CentOS7-Node1 kubelet[7649]: F1022 10:50:15.170550    7649 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml",  
Oct 22 10:50:15 CentOS7-Node1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a  

Deleting the pod forces it to be recreated and restarted:

[root@CentOS7-Node1 ~]# kubectl delete pod coredns-5644d7b6d9-9fr9p -n kube-system
pod "coredns-5644d7b6d9-9fr9p" deleted  

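I read a great many articles and blog posts and found that few of them are complete. They describe only the successful path, when in practice all sorts of odd problems crop up along the way. Frankly, k8s is convenient, but the barrier to entry is high and it depends on a great many components. Problems caused by version mismatches are especially hard to resolve.

Finally, a screenshot of the working cluster: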

http://static.cyblogs.com/WX20191023-164029@2x.png

Summary of Common Commands

systemctl daemon-reload

systemctl restart kubelet

kubectl get pods -n kube-system

kubectl describe pod coredns-5644d7b6d9-lqtks -n kube-system

kubectl delete pod coredns-5644d7b6d9-qh4bc -n kube-system  
# Allow pods to be scheduled on the master node
kubectl taint nodes --all node-role.kubernetes.io/master-
# Prevent pods from being scheduled on the master
kubectl taint nodes k8s node-role.kubernetes.io/master=true:NoSchedule

kubeadm reset

systemctl enable docker && systemctl start docker

systemctl enable kubelet && systemctl start kubelet

journalctl -xefu kubelet  
