
I am new to Apache Flume. I have created my agent as follows (a Flume agent using tail -F):

agent.sources=exec-source 
agent.sinks=hdfs-sink 
agent.channels=ch1 

agent.sources.exec-source.type=exec 
agent.sources.exec-source.command=tail -F /var/log/apache2/access.log 

agent.sinks.hdfs-sink.type=hdfs 
agent.sinks.hdfs-sink.hdfs.path=hdfs://<Host-Name of name node>/ 
agent.sinks.hdfs-sink.hdfs.filePrefix=apacheaccess 

agent.channels.ch1.type=memory 
agent.channels.ch1.capacity=1000 

agent.sources.exec-source.channels=ch1 
agent.sinks.hdfs-sink.channel=ch1 
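
The agent is launched with the standard flume-ng script, something like the following (a sketch; the config file path matches the one shown in the log output below):

    bin/flume-ng agent --conf conf --conf-file conf/flume_exec.conf --name agent -Dflume.root.logger=INFO,console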

The output I get is:

13/01/22 17:31:48 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 1 
13/01/22 17:31:48 INFO node.FlumeNode: Flume node starting - agent 
13/01/22 17:31:48 INFO properties.PropertiesFileConfigurationProvider: Configuration provider starting 
13/01/22 17:31:48 INFO nodemanager.DefaultLogicalNodeManager: Node manager starting 
13/01/22 17:31:48 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 9 
13/01/22 17:31:48 INFO properties.PropertiesFileConfigurationProvider: Reloading configuration file:conf/flume_exec.conf 
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Added sinks: hdfs-sink Agent: agent 
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Processing:hdfs-sink 
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Processing:hdfs-sink 
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Processing:hdfs-sink 
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Processing:hdfs-sink 
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent] 
13/01/22 17:31:48 INFO properties.PropertiesFileConfigurationProvider: Creating channels 
13/01/22 17:31:48 INFO properties.PropertiesFileConfigurationProvider: created channel ch1 
13/01/22 17:31:48 INFO sink.DefaultSinkFactory: Creating instance of sink: hdfs-sink, type: hdfs 
13/01/22 17:31:48 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false 
13/01/22 17:31:48 INFO nodemanager.DefaultLogicalNodeManager: Starting new configuration:{ sourceRunners:{exec-source=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:exec-source,state:IDLE} }} sinkRunners:{hdfs-sink=SinkRunner: { policy:[email protected] counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} } 
13/01/22 17:31:48 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel ch1 
13/01/22 17:31:48 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: ch1, registered successfully. 
13/01/22 17:31:48 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: ch1 started 
13/01/22 17:31:48 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink hdfs-sink 
13/01/22 17:31:48 INFO nodemanager.DefaultLogicalNodeManager: Starting Source exec-source 
13/01/22 17:31:48 INFO source.ExecSource: Exec source starting with command:tail -F /var/log/apache2/access.log 
13/01/22 17:31:48 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: hdfs-sink, registered successfully. 
13/01/22 17:31:48 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs-sink started 

But it does not write the logs to HDFS.

When I run cat /var/log/apache2/access.log instead of tail -F /var/log/apache2/access.log, it works and my file is created on HDFS.
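
To check what, if anything, has landed in HDFS, the sink path can be listed with the standard Hadoop CLI (a sketch, using the placeholder path from the config above):

    hadoop fs -ls hdfs://<Host-Name of name node>/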

Please help me out.


Is this the same as http://stackoverflow.com/questions/13721930/flume-ng-tail-a-file? – Ryan

Answer


By default, tail -F prints only the last 10 lines of a file when it starts. It seems 10 lines are not enough to fill an HDFS block, so you don't see anything written by Flume. You can:

  • Try tail -n $X -F to print the last X lines at startup (the right value of X will depend on the block size of your HDFS installation)
  • Wait until access.log grows big enough while Flume is running (again, how long depends on the block size and on how fast access.log grows; in production it should be fast enough, I think)
  • Add the following lines to your flume.conf; a consolidated sketch of the whole sink section follows this list. This will force Flume to roll a new file every 10 seconds, regardless of how much data has been written (assuming it is not zero):

    agent.sinks.hdfs-sink.hdfs.rollInterval = 10

    agent.sinks.hdfs-sink.hdfs.rollSize = 0
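
For reference, here is the complete sink section from the question with both roll settings applied (a sketch; only the two roll properties are new relative to the original configuration):

    agent.sinks.hdfs-sink.type=hdfs
    agent.sinks.hdfs-sink.hdfs.path=hdfs://<Host-Name of name node>/
    agent.sinks.hdfs-sink.hdfs.filePrefix=apacheaccess
    # roll a new file every 10 seconds ...
    agent.sinks.hdfs-sink.hdfs.rollInterval=10
    # ... and disable size-based rolling so the interval alone decides
    agent.sinks.hdfs-sink.hdfs.rollSize=0

Note that the HDFS sink also rolls after a fixed number of events (hdfs.rollCount, 10 by default); setting it to 0 as well is an extra precaution, not something the original answer calls for.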