I created a 5-VM K8s cluster (1 master and 4 slaves, running Ubuntu 16.04.3 LTS) using kubeadm, with flannel providing the networking in the cluster. I was able to successfully deploy an application and then exposed it via a NodePort service. From here things got complicated: the K8s NodePort service is "unreachable by IP" on only 2 of the 4 slaves in the cluster.
Before I began, I disabled the default firewalld service on the master and the nodes.
As I understand from the K8s Services doc, a Service of type NodePort exposes the service on a port on every node in the cluster. However, when I created it, the service was only exposed on 2 of the 4 nodes in the cluster. I'm guessing that's not the expected behaviour (is it?)
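For reference, the service was created along these lines; this is a reconstruction from the pod/service listings below, not the exact command I ran, so treat the flags as an assumption:

```shell
# Hypothetical reconstruction: expose the springboot-helloworld deployment
# as a NodePort service on port 9000 in the playground namespace.
# K8s picks the node port itself (here it chose 30847).
kubectl expose deployment springboot-helloworld \
    --type=NodePort \
    --port=9000 \
    --name=sb-hw-svc \
    -n playground
```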
For troubleshooting, here are some resource specs:
[email protected]:~# kubectl get nodes
NAME STATUS AGE VERSION
vm-deepejai-00b Ready 5m v1.7.3
vm-plashkar-006 Ready 4d v1.7.3
vm-rosnthom-00f Ready 4d v1.7.3
vm-vivekse-003 Ready 4d v1.7.3 //the master
vm-vivekse-004 Ready 16h v1.7.3
[email protected]:~# kubectl get pods -o wide -n playground
NAME READY STATUS RESTARTS AGE IP NODE
kubernetes-bootcamp-2457653786-9qk80 1/1 Running 0 2d 10.244.3.6 vm-rosnthom-00f
springboot-helloworld-2842952983-rw0gc 1/1 Running 0 1d 10.244.3.7 vm-rosnthom-00f
[email protected]:~# kubectl get svc -o wide -n playground
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
sb-hw-svc 10.101.180.19 <nodes> 9000:30847/TCP 5h run=springboot-helloworld
[email protected]:~# kubectl describe svc sb-hw-svc -n playground
Name: sb-hw-svc
Namespace: playground
Labels: <none>
Annotations: <none>
Selector: run=springboot-helloworld
Type: NodePort
IP: 10.101.180.19
Port: <unset> 9000/TCP
NodePort: <unset> 30847/TCP
Endpoints: 10.244.3.7:9000
Session Affinity: None
Events: <none>
[email protected]:~# kubectl get endpoints sb-hw-svc -n playground -o yaml
apiVersion: v1
kind: Endpoints
metadata:
creationTimestamp: 2017-08-09T06:28:06Z
name: sb-hw-svc
namespace: playground
resourceVersion: "588958"
selfLink: /api/v1/namespaces/playground/endpoints/sb-hw-svc
uid: e76d9cc1-7ccb-11e7-bc6a-fa163efaba6b
subsets:
- addresses:
- ip: 10.244.3.7
nodeName: vm-rosnthom-00f
targetRef:
kind: Pod
name: springboot-helloworld-2842952983-rw0gc
namespace: playground
resourceVersion: "473859"
uid: 16d9db68-7c1a-11e7-bc6a-fa163efaba6b
ports:
- port: 9000
protocol: TCP
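The service and endpoints look healthy, so before digging further it's worth confirming that kube-proxy and flannel are actually running on every node. This is a routine check; the label/name filters assume a standard kubeadm + flannel install:

```shell
# Expect one kube-proxy pod and one flannel pod in Running state
# on each of the 5 nodes.
kubectl -n kube-system get pods -o wide | grep -E 'kube-proxy|flannel'

# Tail kube-proxy logs for iptables sync errors (label assumes the
# kubeadm-deployed kube-proxy DaemonSet).
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=50
```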
After some tinkering I realized that on those 2 "faulty" nodes, the service wasn't accessible even from within the host itself.
Node01 (working):
[email protected]:~# curl 127.0.0.1:30847 //<localhost>:<nodeport>
Hello Docker World!!
[email protected]:~# curl 10.101.180.19:9000 //<cluster-ip>:<port>
Hello Docker World!!
[email protected]:~# curl 10.244.3.7:9000 //<pod-ip>:<port>
Hello Docker World!!
Node02 (working):
[email protected]:~# curl 127.0.0.1:30847
Hello Docker World!!
[email protected]:~# curl 10.101.180.19:9000
Hello Docker World!!
[email protected]:~# curl 10.244.3.7:9000
Hello Docker World!!
Node03 (not working):
[email protected]:~# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
[email protected]:~# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
[email protected]:~# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out
Node04 (not working):
[email protected]:/# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
[email protected]:/# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
[email protected]:/# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out
Tried netstat and telnet on all 4 slaves. Here's the output:
Node01 (working host):
[email protected]:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 27808/kube-proxy
[email protected]:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Node02 (working host):
[email protected]:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 11842/kube-proxy
[email protected]:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Node03 (non-working host):
[email protected]:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 7791/kube-proxy
[email protected]:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
Node04 (non-working host):
[email protected]:/# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 689/kube-proxy
[email protected]:/# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
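Since kube-proxy is listening on :30847 on all four hosts yet connections time out on two of them, the packets are presumably being dropped somewhere in iptables or the overlay. One diagnostic I could run (a sketch I haven't tried yet, not a confirmed fix) is to compare the NAT chains kube-proxy programs on a working vs. a non-working node:

```shell
# KUBE-NODEPORTS and KUBE-SERVICES are the chains kube-proxy maintains
# in its (default) iptables mode; the NodePort rule should appear on
# every node. Run on one working and one failing slave and diff.
iptables -t nat -L KUBE-NODEPORTS -n | grep 30847
iptables -t nat -L KUBE-SERVICES  -n | grep 10.101.180.19
```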
Additional info:
From the kubectl get pods output, I can see that the pod is actually deployed on slave vm-rosnthom-00f. I'm able to ping that host from all 5 VMs, and curl vm-rosnthom-00f:30847 works from all the VMs as well.
I can clearly see that the internal cluster networking is messed up, but I'm unsure how to fix it! iptables -L is identical on all the slaves, and even the local loopback (ifconfig lo) is up and running on all of them. I'm completely clueless as to how to fix it!
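Given that even pod-IP traffic (10.244.3.7:9000) times out from the two bad nodes, the flannel overlay itself may be broken there. A sketch of the checks I'd run on a failing node; the interface and port names assume flannel's defaults (VXLAN backend):

```shell
# The flannel interface should exist with a per-node route for each
# pod subnet in 10.244.0.0/16 (flannel0 instead of flannel.1 if the
# UDP backend is in use).
ip -d link show flannel.1
ip route | grep 10.244.
cat /run/flannel/subnet.env   # the pod subnet flannel leased to this host

# Watch for VXLAN traffic (UDP 8472 is flannel's VXLAN default) while
# curling the pod IP from this node; no packets = traffic never leaves.
tcpdump -ni any udp port 8472 &
curl --max-time 5 10.244.3.7:9000
```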
Just to confirm, do all the non-docker interfaces have IP addresses in a separate address space from the docker, pod, and service ones? The command I'd want to see is root@vm-deepejai-00b:/# curl THE_IP_OF_vm-vivekse-004:30847, to make sure vm-deepejai-00b can even route traffic to vm-vivekse-004, since that's what happens under the hood anyway –
Also, for clarity, did you check iptables -t nat -L as well as iptables -L (I can't tell which one you meant) –
@MatthewLDaniel Regarding your first comment, the curl works: root@vm-deepejai-00b:~# curl 173.36.23.4:30847 returns Hello Docker World!!, where 173.36.23.4 is the IP of vm-vivekse-004 –