2017-08-09 54 views

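K8s NodePort service is "unreachable by IP" on only 2 of the 4 slaves in the cluster

I created a K8s cluster of 5 VMs (1 master and 4 slaves, all running Ubuntu 16.04.3 LTS) using kubeadm, and used flannel to set up networking in the cluster. I was able to deploy an application successfully and then exposed it through a NodePort service. From there, things got complicated.

Before I started, I disabled the default firewalld service on the master and the nodes.

As I understand from the K8s Services doc, the NodePort type exposes the service on all nodes in the cluster. However, when I created it, the service was reachable on only 2 of the 4 nodes. I am guessing that is not the expected behavior (right?).

For troubleshooting, here are some resource specs: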

# kubectl get nodes
NAME              STATUS    AGE       VERSION
vm-deepejai-00b   Ready     5m        v1.7.3
vm-plashkar-006   Ready     4d        v1.7.3
vm-rosnthom-00f   Ready     4d        v1.7.3
vm-vivekse-003    Ready     4d        v1.7.3    //the master
vm-vivekse-004    Ready     16h       v1.7.3

# kubectl get pods -o wide -n playground
NAME                                     READY     STATUS    RESTARTS   AGE       IP           NODE
kubernetes-bootcamp-2457653786-9qk80     1/1       Running   0          2d        10.244.3.6   vm-rosnthom-00f
springboot-helloworld-2842952983-rw0gc   1/1       Running   0          1d        10.244.3.7   vm-rosnthom-00f

# kubectl get svc -o wide -n playground
NAME        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE       SELECTOR
sb-hw-svc   10.101.180.19   <nodes>       9000:30847/TCP   5h        run=springboot-helloworld

# kubectl describe svc sb-hw-svc -n playground
Name:              sb-hw-svc
Namespace:         playground
Labels:            <none>
Annotations:       <none>
Selector:          run=springboot-helloworld
Type:              NodePort
IP:                10.101.180.19
Port:              <unset>  9000/TCP
NodePort:          <unset>  30847/TCP
Endpoints:         10.244.3.7:9000
Session Affinity:  None
Events:            <none>
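
For reference, the describe output above corresponds roughly to a manifest like the one below. This is a sketch reconstructed from the fields shown (name, namespace, selector, port 9000, node port 30847). `targetPort: 9000` is inferred from the endpoint `10.244.3.7:9000`, and the helper name `sb_hw_svc_manifest` is made up for illustration. It is not necessarily how the service was originally created.

```shell
# Print a Service manifest matching the `kubectl describe` output above.
# Assumption: targetPort is 9000, inferred from the endpoint 10.244.3.7:9000.
sb_hw_svc_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: sb-hw-svc
  namespace: playground
spec:
  type: NodePort
  selector:
    run: springboot-helloworld
  ports:
  - protocol: TCP
    port: 9000
    targetPort: 9000
    nodePort: 30847
EOF
}

# Usage (needs kubectl and cluster access):
#   sb_hw_svc_manifest | kubectl apply -f -
```

With `type: NodePort`, kube-proxy on every node is expected to accept traffic on port 30847, regardless of which node the pod runs on.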

# kubectl get endpoints sb-hw-svc -n playground -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: 2017-08-09T06:28:06Z
  name: sb-hw-svc
  namespace: playground
  resourceVersion: "588958"
  selfLink: /api/v1/namespaces/playground/endpoints/sb-hw-svc
  uid: e76d9cc1-7ccb-11e7-bc6a-fa163efaba6b
subsets:
- addresses:
  - ip: 10.244.3.7
    nodeName: vm-rosnthom-00f
    targetRef:
      kind: Pod
      name: springboot-helloworld-2842952983-rw0gc
      namespace: playground
      resourceVersion: "473859"
      uid: 16d9db68-7c1a-11e7-bc6a-fa163efaba6b
  ports:
  - port: 9000
    protocol: TCP

After some tinkering, I realized that on those 2 "faulty" nodes the service is not reachable even from within the hosts themselves.

Node01 (working):

# curl 127.0.0.1:30847        //<localhost>:<nodeport>
Hello Docker World!!
# curl 10.101.180.19:9000     //<cluster-ip>:<port>
Hello Docker World!!
# curl 10.244.3.7:9000        //<pod-ip>:<port>
Hello Docker World!!

Node02 (working):

# curl 127.0.0.1:30847
Hello Docker World!!
# curl 10.101.180.19:9000
Hello Docker World!!
# curl 10.244.3.7:9000
Hello Docker World!!

Node03 (not working):

# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out

Node04 (not working):

# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out
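
The per-node checks above can be repeated in one pass with a small loop. This is a minimal sketch: the function name `check_nodeport` and the `PROBE` override are hypothetical, and the host list is whatever node names or IPs you pass in.

```shell
# Command used to probe a URL; defaults to curl with a 3-second limit so a
# blackholed connection fails fast instead of hanging.
PROBE=${PROBE:-curl -s -o /dev/null --max-time 3}

# check_nodeport PORT HOST...
# Prints one line per host saying whether the NodePort answered.
check_nodeport() {
  port=$1
  shift
  for host in "$@"; do
    # $PROBE is left unquoted on purpose so its options split into words.
    if $PROBE "http://$host:$port/"; then
      echo "$host:$port ok"
    else
      echo "$host:$port FAILED"
    fi
  done
}

# Usage, run from any machine that can reach the nodes:
#   check_nodeport 30847 vm-deepejai-00b vm-plashkar-006 vm-rosnthom-00f vm-vivekse-004
```

Running the same loop from inside each node and from outside the cluster helps separate a per-node forwarding problem from a service-level one.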

I tried netstat and telnet on all 4 slaves. Here is the output:

Node01 (working host):

# netstat -tulpn | grep 30847
tcp6       0      0 :::30847      :::*      LISTEN      27808/kube-proxy
# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.

Node02 (working host):

# netstat -tulpn | grep 30847
tcp6       0      0 :::30847      :::*      LISTEN      11842/kube-proxy
# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.

Node03 (non-working host):

# netstat -tulpn | grep 30847
tcp6       0      0 :::30847      :::*      LISTEN      7791/kube-proxy
# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out

Node04 (non-working host):

# netstat -tulpn | grep 30847
tcp6       0      0 :::30847      :::*      LISTEN      689/kube-proxy
# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
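
A LISTEN entry does not by itself mean connections will succeed. Assuming kube-proxy runs in iptables mode (the default in v1.7), it holds the node port open mainly to reserve it, and the traffic is redirected by iptables rules before it reaches that socket. A non-interactive alternative to telnet for scripting the check is sketched below; `tcp_open` is a hypothetical helper name.

```shell
# tcp_open HOST PORT
# Succeeds if a TCP connection can be opened within 3 seconds.
# Assumes bash (for its /dev/tcp redirection) and coreutils `timeout`.
tcp_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage:
#   tcp_open 127.0.0.1 30847 && echo open || echo "closed or timed out"
```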

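Additional info:

From the kubectl get pods output, I can see that the pod is actually deployed on slave vm-rosnthom-00f. I can ping this host from all 5 VMs, and curl vm-rosnthom-00f:30847 also works from all the VMs.

I can clearly see that the internal cluster networking is messed up, but I am unsure how to resolve it! The iptables -L output is identical on all the slaves, and even the local loopback (ifconfig lo) is up and running on all of them. I am completely clueless as to how to fix it!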
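
One thing worth checking here, assuming kube-proxy runs in iptables mode: plain `iptables -L` lists only the filter table, while the NodePort redirection rules live in the nat table. The sketch below filters a rule dump for a given port; `nodeport_rules` is a hypothetical helper name.

```shell
# nodeport_rules PORT
# Reads an iptables-save style dump on stdin and prints the rules that match
# the given destination port exactly.
nodeport_rules() {
  grep -E -- "--dport $1( |\$)"
}

# Usage on each node (needs root); compare the output of a working node
# with that of a non-working one:
#   iptables-save -t nat | nodeport_rules 30847
```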

+0

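Just to confirm: do the IP addresses of all the non-docker interfaces live in a separate IP space from docker, the pods, and the services? The command I would want to see is 'root@vm-deepejai-00b:/# curl THE_IP_OF_vm-vivekse-004:30847', to make sure 'vm-deepejai-00b' can conceivably route traffic to 'vm-vivekse-004', because that is what is happening under the covers anyway.

+0

Also, just for clarity, did you check 'iptables -t nat -L' as well as plain 'iptables -L'? (I could not tell whether that is what you meant.)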

+0

@MatthewLDaniel Regarding your first comment, the curl works: 'root@vm-deepejai-00b:~# curl 173.36.23.4:30847 Hello Docker World!!', where 173.36.23.4 is the IP of vm-vivekse-004.

Answers

-3

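If you want to reach the service from any node in the cluster, you need a service of type ClusterIP. Since you defined the service type as NodePort, you can connect only from the node where the service is running.


My answer above was incorrect. Based on the documentation, we should be able to connect through any NodeIP:NodePort. However, it was not working in my cluster either.

https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services---service-types

NodePort: Exposes the service on each Node's IP at a static port (the NodePort). A ClusterIP service, to which the NodePort service will route, is automatically created. You'll be able to contact the NodePort service, from outside the cluster, by requesting <NodeIP>:<NodePort>.

IP forwarding was not set on one of my nodes. Once I enabled it, I was able to connect to my service using NodeIP:nodePort: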

sysctl -w net.ipv4.ip_forward=1 
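
The `sysctl -w` call above changes the setting only until the next reboot. The sketch below shows one way to check the value and make it persistent. The helper names and the file name `/etc/sysctl.d/99-ip-forward.conf` are illustrative choices, not something from the original answer.

```shell
# ip_forward_enabled [FILE]
# Succeeds if IPv4 forwarding is on. FILE defaults to the kernel's proc entry;
# the parameter exists only so the check can be exercised against a test file.
ip_forward_enabled() {
  [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}" 2>/dev/null)" = "1" ]
}

# enable_ip_forward
# Turns forwarding on now and writes a drop-in so it survives reboots
# (needs root). The drop-in file name is an arbitrary example.
enable_ip_forward() {
  sysctl -w net.ipv4.ip_forward=1 &&
    echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-ip-forward.conf
}

# Usage on each node:
#   ip_forward_enabled || enable_ip_forward
```

Running the check on every node, not just the failing ones, makes it easy to spot which hosts differ.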