Say you have uncompressed Spark event log files sitting in HDFS, and you want to turn on spark.eventLog.compress true in spark-defaults.conf and also go back and compress the old logs. A MapReduce job would make the most sense, but as a one-off you can also use:
snzip -t hadoop-snappy local_file_will_end_in_dot_snappy
and then upload the result back.
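For reference, the spark-defaults.conf change mentioned above might look like the following sketch. The codec line is an assumption on my part: event logs are compressed with Spark's I/O codec (lz4 by default on newer Spark versions), so if you want the old hadoop-snappy files and the new logs to share one format, you would pin it to snappy.

```
# in spark-defaults.conf
# Compress new event logs as they are written.
spark.eventLog.compress        true
# Assumption: match the codec to the snzip output above; newer Spark
# defaults to lz4, so set snappy explicitly if that is what you want.
spark.io.compression.codec     snappy
```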
Installing snzip might look something like this:
sudo yum install snappy snappy-devel
curl -O https://dl.bintray.com/kubo/generic/snzip-1.0.4.tar.gz
tar -zxvf snzip-1.0.4.tar.gz
cd snzip-1.0.4
./configure
make
sudo make install
Your round trip for a single file might be:
hdfs dfs -copyToLocal /var/log/spark/apps/application_1512353561403_50748_1 .
snzip -t hadoop-snappy application_1512353561403_50748_1
hdfs dfs -copyFromLocal application_1512353561403_50748_1.snappy /var/log/spark/apps/application_1512353561403_50748_1.snappy
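With many old logs, the single-file round trip above can be wrapped in a loop. This is only a sketch, untested against a real cluster; the log directory and the ls-based file listing are assumptions, so adjust them for your layout. It skips files that already end in .snappy, and it relies on snzip's default behavior of deleting the input file once the .snappy output is written.

```shell
#!/bin/sh
# Compress every uncompressed Spark event log under /var/log/spark/apps
# using the same copyToLocal / snzip / copyFromLocal round trip as above.
# Assumes the hdfs CLI and snzip are on PATH. Stop on the first error.
set -e
for f in $(hdfs dfs -ls /var/log/spark/apps \
             | awk '{print $NF}' \
             | grep '^/var/log/spark/apps/' \
             | grep -v '\.snappy$'); do
  name=$(basename "$f")
  hdfs dfs -copyToLocal "$f" "$name"
  snzip -t hadoop-snappy "$name"   # writes $name.snappy, removes $name
  hdfs dfs -copyFromLocal "$name.snappy" "$f.snappy"
  rm -f "$name.snappy"             # clean up the local copy
done
```

Once you have verified the .snappy copies are readable (for example, by loading one in the Spark history server), you could delete the original uncompressed files in a second pass rather than inside the loop.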
Or with gohdfs:
hdfs cat /var/log/spark/apps/application_1512353561403_50748_1 \
| snzip -t hadoop-snappy > zzz
hdfs put zzz /var/log/spark/apps/application_1512353561403_50748_1.snappy
rm zzz
Piping straight into 'put' isn't possible; 'put' just moves data.