I created a 5-VM K8s cluster (1 master and 4 slaves, running Ubuntu 16.04.3 LTS) using kubeadm, with flannel providing the networking in the cluster. I was able to successfully deploy an application and then exposed it via a NodePort service. From here things got complicated: the K8s NodePort service is "unreachable by IP" on only 2 of the 4 slaves in the cluster.
Before I began, I disabled the default firewalld service on the master and the nodes.
As I understand from the K8s Services doc, a Service of type NodePort exposes the service on a port on every node in the cluster. However, when I created it, the service was only exposed on 2 of the 4 nodes in the cluster. I'm guessing that's not the expected behaviour (is it?)
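For reference, the service was created along these lines; this is a reconstruction from the pod/service listings below, not the exact command I ran, so treat the flags as an assumption:

```shell
# Hypothetical reconstruction: expose the springboot-helloworld deployment
# as a NodePort service on port 9000 in the playground namespace.
# K8s picks the node port itself (here it chose 30847).
kubectl expose deployment springboot-helloworld \
    --type=NodePort \
    --port=9000 \
    --name=sb-hw-svc \
    -n playground
```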
For troubleshooting, here are some resource specs:
[email protected]:~# kubectl get nodes
NAME STATUS AGE VERSION
vm-deepejai-00b Ready 5m v1.7.3
vm-plashkar-006 Ready 4d v1.7.3
vm-rosnthom-00f Ready 4d v1.7.3
vm-vivekse-003 Ready 4d v1.7.3 //the master
vm-vivekse-004 Ready 16h v1.7.3
[email protected]:~# kubectl get pods -o wide -n playground
NAME READY STATUS RESTARTS AGE IP NODE
kubernetes-bootcamp-2457653786-9qk80 1/1 Running 0 2d 10.244.3.6 vm-rosnthom-00f
springboot-helloworld-2842952983-rw0gc 1/1 Running 0 1d 10.244.3.7 vm-rosnthom-00f
[email protected]:~# kubectl get svc -o wide -n playground
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
sb-hw-svc 10.101.180.19 <nodes> 9000:30847/TCP 5h run=springboot-helloworld
[email protected]:~# kubectl describe svc sb-hw-svc -n playground
Name: sb-hw-svc
Namespace: playground
Labels: <none>
Annotations: <none>
Selector: run=springboot-helloworld
Type: NodePort
IP: 10.101.180.19
Port: <unset> 9000/TCP
NodePort: <unset> 30847/TCP
Endpoints: 10.244.3.7:9000
Session Affinity: None
Events: <none>
[email protected]:~# kubectl get endpoints sb-hw-svc -n playground -o yaml
apiVersion: v1
kind: Endpoints
metadata:
creationTimestamp: 2017-08-09T06:28:06Z
name: sb-hw-svc
namespace: playground
resourceVersion: "588958"
selfLink: /api/v1/namespaces/playground/endpoints/sb-hw-svc
uid: e76d9cc1-7ccb-11e7-bc6a-fa163efaba6b
subsets:
- addresses:
- ip: 10.244.3.7
nodeName: vm-rosnthom-00f
targetRef:
kind: Pod
name: springboot-helloworld-2842952983-rw0gc
namespace: playground
resourceVersion: "473859"
uid: 16d9db68-7c1a-11e7-bc6a-fa163efaba6b
ports:
- port: 9000
protocol: TCP
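The service and endpoints look healthy, so before digging further it's worth confirming that kube-proxy and flannel are actually running on every node. This is a routine check; the label/name filters assume a standard kubeadm + flannel install:

```shell
# Expect one kube-proxy pod and one flannel pod in Running state
# on each of the 5 nodes.
kubectl -n kube-system get pods -o wide | grep -E 'kube-proxy|flannel'

# Tail kube-proxy logs for iptables sync errors (label assumes the
# kubeadm-deployed kube-proxy DaemonSet).
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=50
```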
After some tinkering I realized that on those 2 "faulty" nodes, the service wasn't accessible even from within the host itself.
Node01 (working):
[email protected]:~# curl 127.0.0.1:30847 //<localhost>:<nodeport>
Hello Docker World!!
[email protected]:~# curl 10.101.180.19:9000 //<cluster-ip>:<port>
Hello Docker World!!
[email protected]:~# curl 10.244.3.7:9000 //<pod-ip>:<port>
Hello Docker World!!
Node02 (working):
[email protected]:~# curl 127.0.0.1:30847
Hello Docker World!!
[email protected]:~# curl 10.101.180.19:9000
Hello Docker World!!
[email protected]:~# curl 10.244.3.7:9000
Hello Docker World!!
Node03 (not working):
[email protected]:~# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
[email protected]:~# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
[email protected]:~# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out
Node04 (not working):
[email protected]:/# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
[email protected]:/# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
[email protected]:/# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out
Tried netstat and telnet on all 4 slaves. Here's the output:
Node01 (working host):
[email protected]:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 27808/kube-proxy
[email protected]:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Node02 (working host):
[email protected]:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 11842/kube-proxy
[email protected]:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Node03 (non-working host):
[email protected]:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 7791/kube-proxy
[email protected]:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
Node04 (non-working host):
[email protected]:/# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 689/kube-proxy
[email protected]:/# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
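Since kube-proxy is listening on :30847 on all four hosts yet connections time out on two of them, the packets are presumably being dropped somewhere in iptables or the overlay. One diagnostic I could run (a sketch I haven't tried yet, not a confirmed fix) is to compare the NAT chains kube-proxy programs on a working vs. a non-working node:

```shell
# KUBE-NODEPORTS and KUBE-SERVICES are the chains kube-proxy maintains
# in its (default) iptables mode; the NodePort rule should appear on
# every node. Run on one working and one failing slave and diff.
iptables -t nat -L KUBE-NODEPORTS -n | grep 30847
iptables -t nat -L KUBE-SERVICES  -n | grep 10.101.180.19
```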
Additional info:
From the kubectl get pods output, I can see that the pod is actually deployed on slave vm-rosnthom-00f. I'm able to ping that host from all 5 VMs, and curl vm-rosnthom-00f:30847 works from all the VMs as well.
I can clearly see that the internal cluster networking is messed up, but I'm unsure how to fix it! iptables -L is identical on all the slaves, and even the local loopback (ifconfig lo) is up and running on all of them. I'm completely clueless as to how to fix it!
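Given that even pod-IP traffic (10.244.3.7:9000) times out from the two bad nodes, the flannel overlay itself may be broken there. A sketch of the checks I'd run on a failing node; the interface and port names assume flannel's defaults (VXLAN backend):

```shell
# The flannel interface should exist with a per-node route for each
# pod subnet in 10.244.0.0/16 (flannel0 instead of flannel.1 if the
# UDP backend is in use).
ip -d link show flannel.1
ip route | grep 10.244.
cat /run/flannel/subnet.env   # the pod subnet flannel leased to this host

# Watch for VXLAN traffic (UDP 8472 is flannel's VXLAN default) while
# curling the pod IP from this node; no packets = traffic never leaves.
tcpdump -ni any udp port 8472 &
curl --max-time 5 10.244.3.7:9000
```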
Just to confirm, do all the non-docker interfaces have IP addresses in a separate address space from the docker, pod, and service ones? The command I'd want to see is root@vm-deepejai-00b:/# curl THE_IP_OF_vm-vivekse-004:30847, to make sure vm-deepejai-00b can even route traffic to vm-vivekse-004, since that's what happens under the hood anyway –
Also, for clarity, did you check iptables -t nat -L as well as iptables -L (I can't tell which one you meant) –
@MatthewLDaniel Regarding your first comment, the curl works: root@vm-deepejai-00b:~# curl 173.36.23.4:30847 returns Hello Docker World!!, where 173.36.23.4 is the IP of vm-vivekse-004 –