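Amazon ELB fails to serve responses

2012-06-14

I have a website running on Amazon Web Services, deployed with Elastic Beanstalk onto a single EC2 micro instance. It is a staging environment and I am the only person with access to it. Using Apache JMeter, I simulate six users browsing the site, who together make one request every 3 seconds on average (images, CSS, JS and other static resources are served by CloudFront and generate no traffic on the EC2 instance).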

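The problem is that after a while (usually 30-60 minutes after the environment is set up), the website stops responding. I'm sure Tomcat is still up and running, because I can see in the log (catalina.out) that a cron job is still being executed. It seems that only the ELB fails to serve responses.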

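When I analyze the logs, there are no errors at all on the Tomcat side (none in /opt/tomcat7/logs/tail_catalina.log or /opt/tomcat7/logs/catalina.out). As soon as the website becomes unavailable, the following errors start appearing in /etc/httpd/logs/elasticbeanstalk-error_log: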

[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:26:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:26:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:27:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:27:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:27:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:27:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:27:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:27:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:28:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:28:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:28:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:28:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:28:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:28:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:29:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:29:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:29:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:29:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:29:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:29:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:30:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:30:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:30:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:30:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:30:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:30:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:31:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:31:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:31:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:31:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:31:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:31:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:32:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:32:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 

...until the EC2 instance is eventually terminated (and a new one is started automatically).
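
To see when the backend first became unreachable and how many times Apache retried, a small bash helper can summarise the log. This is a sketch, not part of the original setup: the function name summarize_refused is hypothetical, and it assumes the error-log line format shown above.

```bash
#!/usr/bin/env bash
# Summarise mod_proxy "Connection refused" errors for one backend port.
# Reads an Apache error log on stdin and prints:
#   <count> <first timestamp> | <last timestamp>
# or just "0" if there are no matching lines.
summarize_refused() {
  local port="${1:-8999}"
  grep -F "Connection refused" | grep -F "127.0.0.1:${port}" |
    awk '
      # Each line starts with "[timestamp] ..."; take the text between the brackets.
      { ts = substr($0, 2, index($0, "]") - 2) }
      NR == 1 { first = ts }   # timestamp of the first failure
      { last = ts; n++ }       # keep updating to end up with the last one
      END { if (n) printf "%d %s | %s\n", n, first, last; else print 0 }'
}
```

Example: summarize_refused 8999 < /etc/httpd/logs/elasticbeanstalk-error_log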

The problem does not occur if I make no requests (or if I reduce the request rate).

Any help is much appreciated.

Thanks!

Unrelated to the question, but for googlability: you can also get "Connection refused" if you try to access port 80 on an ELB that only has 443 configured. – Fuser97381

Answers

Let me start with an assumption:

  • Your Tomcat application is supposed to be listening on 127.0.0.1:8999

If that is true, the log events:

[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 

...indicate that the application listener has died. You can confirm this with:

curl -v http://127.0.0.1:8999/ 

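When the site is working, the curl command should return a valid HTTP response; during an outage it will most likely fail with "Connection refused" or "couldn't connect to host". You can also use the following command to check for a listener on the application port: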

netstat -an | grep LISTEN | grep 8999 
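
If curl or netstat are not installed, bash alone can do a similar check. This is a minimal sketch assuming bash with /dev/tcp support and the coreutils timeout command; port_listening is a made-up helper name.

```bash
#!/usr/bin/env bash
# Report whether anything accepts TCP connections on HOST:PORT.
# Returns 0 if a connection succeeds, 1 otherwise.
port_listening() {
  local host="$1" port="$2"
  # Open fd 3 in a child bash so the descriptor is closed when it exits;
  # timeout guards against hosts that silently drop packets.
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} listening"
    return 0
  fi
  echo "${host}:${port} NOT listening"
  return 1
}
```

For example, port_listening 127.0.0.1 8999 should print the "NOT listening" line and return 1 while the outage is happening.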

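There are many reasons why the application listener might die, including but not limited to:

  • A hard crash of the JVM (use ps to see whether the JVM process is still running)
  • A soft crash of the application (look at the Tomcat application logs)
  • Running out of file descriptors (compare lsof | wc -l with ulimit -n for the application's user)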
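
For the file-descriptor check, note that a bare lsof | wc -l counts open files for every process on the box. On Linux, a narrower sketch reads one process's descriptor count and its soft limit straight from /proc. fd_usage is a hypothetical name, and it needs to run as the process owner or root.

```bash
#!/usr/bin/env bash
# Compare a process's open file descriptors with its soft "Max open files" limit.
# Linux-only: reads /proc/<pid>/fd and /proc/<pid>/limits directly.
fd_usage() {
  local pid="$1" open limit
  # The soft limit is the 4th whitespace-separated field of the "Max open files" line.
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits" 2>/dev/null)
  if [ -z "$limit" ] || [ ! -r "/proc/$pid/fd" ]; then
    echo "no such process or no access: $pid" >&2
    return 1
  fi
  # /proc/<pid>/fd holds one entry per open descriptor.
  open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
  echo "pid ${pid}: ${open} open / ${limit} allowed"
}
```

For example, fd_usage "$(pgrep -f catalina | head -n 1)", assuming the Tomcat JVM's command line contains "catalina". If pgrep finds nothing, that alone points to the hard-crash case.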

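However, most failures that produce an error message should write it to the JVM process's stderr, which is usually logged. That is the best place to look. If all else fails, you may want to try running the Tomcat application in the foreground with debug logging enabled.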

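Thanks a lot for the thorough answer, @gabrtv. I'm just waiting for an instance to go down again, and then I'll use your suggestions to figure out what the problem is. Do you know where stderr is usually logged on Amazon EC2? Thanks. – satoshi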

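stderr is logged on a per-process basis; in this case you care about the stderr of the Tomcat/JVM process. It is usually written to a log file, either catalina.out or a separate "error" log. You should also scour /var/log/syslog and /var/log/messages for any related errors. – gabrtv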
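
One way to do that scouring is to grep both files in one go. This is a sketch: the search patterns are guesses (mentions of the JVM or Tomcat, plus kernel out-of-memory messages, which would appear if the kernel killed the JVM), and scan_system_logs is a hypothetical name.

```bash
#!/usr/bin/env bash
# Search the given system logs for lines mentioning the JVM/Tomcat or the
# kernel killing a process for lack of memory. Missing files are skipped.
scan_system_logs() {
  local pattern='java|tomcat|killed process|out of memory'
  local f
  for f in "$@"; do
    [ -r "$f" ] || continue        # skip missing or unreadable files quietly
    grep -iE -H "$pattern" "$f"    # -H prefixes each hit with the file name
  done
  return 0
}
```

Example: scan_system_logs /var/log/messages /var/log/syslog (whichever of the two does not exist on your AMI is ignored).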

Any update on this? The bounty ends soon ;) – gabrtv

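I just spent a day fighting a similar problem. I have a WAR file deployed to an Amazon Elastic Beanstalk environment. Unlike in your case, the instances launched by the AEBS environment only lasted 5 minutes before AEBS replaced them with new ones.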

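After quite a bit of digging (in 5-minute chunks while my instances were alive) and some light reading, I found that AEBS Tomcat instances run Apache, which receives requests on port 80. Requests sent to /_hostmanager are rerouted to port 8999, and everything else to port 8080 (Tomcat). A Ruby application called "hostmanager", deployed to the instance, listens on port 8999. This application presumably reports statistics back to the AWS Elastic Beanstalk Host Manager, so that the Elastic Beanstalk environment can get a picture of its load and scale the number of instances up or down accordingly.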
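
The routing rule described above can be modelled in a few lines, which makes it easier to map a port in the error log to the service behind it. This is an illustration only, assuming the split "/_hostmanager to 8999, everything else to 8080"; it is not the actual Apache configuration Elastic Beanstalk ships, and backend_for_path is a made-up name.

```bash
#!/usr/bin/env bash
# Toy model of the Apache routing on an AEBS Tomcat instance:
# /_hostmanager goes to the hostmanager app, everything else to Tomcat.
backend_for_path() {
  case "$1" in
    /_hostmanager|/_hostmanager/*) echo "127.0.0.1:8999 (hostmanager)" ;;
    *)                             echo "127.0.0.1:8080 (tomcat)" ;;
  esac
}
```

Read this way, a "Connection refused" to 127.0.0.1:8999 points at hostmanager rather than at Tomcat.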

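If the AWS Elastic Beanstalk Host Manager does not get a response from the instance's hostmanager application, it terminates the instance and launches a new one. This may be why your site lasts 30 minutes and then dies.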

So I suspect the problem here is not your Java application serving on port 8080, but the hostmanager application not listening on port 8999. That is probably what causes:

[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 

Check out /opt/elasticbeanstalk/var/log/hostmanager.log, as it may give you more clues about what is going on and why the hostmanager application is unhappy.

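In my case, it turned out that the hostmanager application was running a wget against an Amazon S3 bucket and getting a 404 response (I found this in the hostmanager.log mentioned above). This kept the hostmanager from starting, so when incoming requests were rerouted to port 8999, nobody was listening. Failure. Instance terminated.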

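Rather than trying to figure out why the hostmanager application was failing, I decided to treat the AMI that the Elastic Beanstalk environment was using as a lost cause. I eventually gave up on it and set up a new Elastic Beanstalk environment with a custom AMI by following these steps: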

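  1. Create a new Elastic Beanstalk environment with my WAR file
  2. Create an AMI from the instance it created
  3. Create a regular EC2 instance from the AMI created in step 2
  4. Add the extra bits I needed (the Tomcat manager, for example)
  5. Create an AMI from the regular instance created in step 3
  6. Apply the AMI to the Elastic Beanstalk environment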
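
The AMI round trip in the steps above could be scripted with today's AWS CLI, which postdates this 2012 thread. This is a sketch rather than the procedure used in this answer. The function names, IDs, image names and instance type are placeholders. Adding the extra bits by hand is not scripted. Setting DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Snapshot an instance into an AMI. Used twice: once for the instance
# Elastic Beanstalk launched, once for the customised regular instance.
ami_from_instance() {  # args: instance-id image-name
  run aws ec2 create-image --instance-id "$1" --name "$2" \
    --query ImageId --output text
}

# Start a regular EC2 instance from an AMI so it can be customised by hand.
instance_from_ami() {  # args: ami-id instance-type
  run aws ec2 run-instances --image-id "$1" --instance-type "$2" \
    --query 'Instances[0].InstanceId' --output text
}

# Point the Elastic Beanstalk environment at the customised AMI.
apply_ami_to_env() {  # args: environment-name ami-id
  run aws elasticbeanstalk update-environment --environment-name "$1" \
    --option-settings "Namespace=aws:autoscaling:launchconfiguration,OptionName=ImageId,Value=$2"
}
```

After each create-image call, aws ec2 wait image-available --image-ids <ami-id> blocks until the new AMI can be used to launch instances.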

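Without knowing your setup, it's hard to help more precisely. Hopefully, though, knowing that hostmanager listens on port 8999 and where hostmanager.log lives, plus a bit of luck, will get you where you need to be!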