我试图从Spark数据框中将一些数据保存到S3存储桶。这很简单:什么是AWSRequestMetricsFullSupport,如何关闭它?
dataframe.saveAsParquetFile("s3://kirk/my_file.parquet")
数据已成功保存,但UI很忙很长一段时间。我得到成千上万的这样的行:
2015-09-04 20:48:19,591 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[5C3211750F4FF5AB], ServiceEndpoint=[https://kirk.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[63.827], HttpRequestTime=[62.919], HttpClientReceiveResponseTime=[61.678], RequestSigningTime=[0.05], ResponseProcessingTime=[0.812], HttpClientSendRequestTime=[0.038],
2015-09-04 20:48:19,610 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[204], ServiceName=[Amazon S3], AWSRequestID=[709DA41540539FE0], ServiceEndpoint=[https://kirk.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[18.064], HttpRequestTime=[17.959], HttpClientReceiveResponseTime=[16.703], RequestSigningTime=[0.06], ResponseProcessingTime=[0.003], HttpClientSendRequestTime=[0.046],
2015-09-04 20:48:19,664 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[204], ServiceName=[Amazon S3], AWSRequestID=[1B1EB812E7982C7A], ServiceEndpoint=[https://kirk.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[54.36], HttpRequestTime=[54.26], HttpClientReceiveResponseTime=[53.006], RequestSigningTime=[0.057], ResponseProcessingTime=[0.002], HttpClientSendRequestTime=[0.034],
2015-09-04 20:48:19,675 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: AF6F960F3B2BF3AB), S3 Extended Request ID: CLs9xY8HAxbEAKEJC4LS1SgpqDcnHeaGocAbdsmYKwGttS64oVjFXJOe314vmb9q], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[AF6F960F3B2BF3AB], ServiceEndpoint=[https://kirk.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[10.111], HttpRequestTime=[10.009], HttpClientReceiveResponseTime=[8.758], RequestSigningTime=[0.043], HttpClientSendRequestTime=[0.044],
2015-09-04 20:48:19,685 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: F2198ACEB4B2CE72), S3 Extended Request ID: J9oWD8ncn6WgfUhHA1yqrBfzFC+N533oD/DK90eiSvQrpGH4OJUc3riG2R4oS1NU], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[F2198ACEB4B2CE72], ServiceEndpoint=[https://kirk.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[9.879], HttpRequestTime=[9.776], HttpClientReceiveResponseTime=[8.537], RequestSigningTime=[0.05], HttpClientSendRequestTime=[0.033],
我可以理解,如果一些用户感兴趣的记录S3操作的等待时间,但有什么办法禁用任何和所有的监测和AWSRequestMetricsFullSupport
登录?
当我检查Spark UI时,它告诉我作业完成得相对较快,但控制台充斥着这些消息很长一段时间。
对于上下文,我保存了1m行和500列的数据帧。大约需要20秒钟才能保存,但延迟警告会出现在我的控制台中> 20分钟。 –