2012-12-24 104 views
2

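Python: parsing IP addresses from a file

I have a file containing a number of IP addresses: about 900 IPs spread over 4 lines of a text file. I want the output to have one IP per line. How can I do this? Based on other code I came up with the script below, but it fails because there are multiple IPs on a single line: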

import sys
import re

try:
    if sys.argv[1:]:
        print "File: %s" % (sys.argv[1])
        logfile = sys.argv[1]
    else:
        logfile = raw_input("Please enter a log file to parse, e.g /var/log/secure: ")
    try:
        file = open(logfile, "r")
        ips = []
        for text in file.readlines():
            text = text.rstrip()
            regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
            if regex is not None and regex not in ips:
                ips.append(regex)

        for ip in ips:
            outfile = open("/tmp/list.txt", "a")
            addy = "".join(ip)
            if addy is not '':
                print "IP: %s" % (addy)
                outfile.write(addy)
                outfile.write("\n")
    finally:
        file.close()
        outfile.close()
except IOError, (errno, strerror):
    print "I/O Error(%s) : %s" % (errno, strerror)
+2

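You are looking for the canonical form of IPv4 addresses. Note that even IPv4 addresses have other acceptable forms. For example, try http://2130706433/ if you are running an HTTP server on localhost port 80 (2130706433 == 0x7f000001 == 127.0.0.1). Of course, if you control the format of the file you don't need to worry about these things... but if you can feasibly support IPv6, it would future-proof your script.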
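
The equivalence quoted in the comment above can be checked with the standard `ipaddress` module. This is only a small Python 3 sketch; the variable names are illustrative and not part of the original post.

```python
import ipaddress

# 2130706433 is the 32-bit integer form of the loopback address.
as_int = 2130706433
addr = ipaddress.IPv4Address(as_int)

print(addr)                 # dotted-quad form of the same address
print(hex(as_int))          # hexadecimal form of the same integer
print(int(addr) == as_int)  # converting back gives the original integer
```

A regex for dotted-quad notation will not match the integer or hexadecimal forms, which is the point the comment is making.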

+0

`re.findall()` always returns a list. It is never `None`. – jfs
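
To illustrate the comment above, here is a minimal Python 3 sketch showing what `re.findall()` returns with and without a match. The sample strings are made up for the demonstration.

```python
import re

pattern = r"\d{1,3}(?:\.\d{1,3}){3}"

no_match = re.findall(pattern, "no addresses here")
one_match = re.findall(pattern, "host 10.0.0.1 is up")

print(no_match)   # an empty list, not None
print(one_match)  # a list of matched strings
```

Because the result is always a list, a check such as `regex is not None` is always true; testing the list's truthiness (`if matches:`) is the usual way to detect "no match".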

Answers

2

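The `$` anchor in your expression prevents you from finding anything but the last entry on each line. Remove it, then use the list returned by `.findall()`: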

found = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})',text)
if found:
    ips.extend(found)
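
The effect of the anchor can be seen on a single line that holds several addresses. This is a small Python 3 sketch with made-up sample data, using a shortened but equivalent form of the pattern.

```python
import re

line = "10.0.0.1 10.0.0.2 10.0.0.3"

# With the trailing $, only an address at the very end of the line can match.
anchored = re.findall(r"\d{1,3}(?:\.\d{1,3}){3}$", line)

# Without it, every address on the line is returned.
unanchored = re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", line)

print(anchored)
print(unanchored)
```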
1

The `findall` function returns a list of matches, and you are not iterating over each match.

# Note: with the trailing $, only an address at the end of text can match.
regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
if regex is not None:
    for match in regex:
        if match not in ips:
            ips.append(match)
0

Without the `re.MULTILINE` flag, `$` matches only at the end of the string (or just before a newline that ends the string).
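
A minimal Python 3 sketch of that behaviour, using made-up multi-line input with one address per line:

```python
import re

text = "10.0.0.1\n10.0.0.2\n10.0.0.3"
pattern = r"\d{1,3}(?:\.\d{1,3}){3}$"

# Default mode: $ matches only at the end of the whole string.
default = re.findall(pattern, text)

# MULTILINE mode: $ also matches just before each newline.
multiline = re.findall(pattern, text, re.MULTILINE)

print(default)
print(multiline)
```

Note that `re.MULTILINE` alone would not solve the original problem, since the input there has many addresses on the same line; dropping the anchor does.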

To make debugging easier, split the code into several parts that can be tested independently.

import re

def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)
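
One thing to keep in mind is that this pattern also accepts strings such as 999.1.1.1, since it only checks digit counts. If that matters for your input, a possible Python 3 sketch is to filter the candidates through the standard `ipaddress` module. The helper name `extract_valid_ips` and the sample text are hypothetical, not part of the answer.

```python
import ipaddress
import re

IP_PATTERN = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

def extract_valid_ips(data):
    """Return the candidates in data that parse as IPv4 addresses."""
    valid = []
    for candidate in IP_PATTERN.findall(data):
        try:
            # Raises a ValueError subclass for octets outside 0-255.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        valid.append(candidate)
    return valid

sample = "ok 192.168.1.10, bad 999.1.1.1, ok 8.8.8.8"
result = extract_valid_ips(sample)
print(result)
```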

If the input file is small and you don't need to preserve the original order of the IPs:

with open(filename) as infile, open(outfilename, "w") as outfile: 
    outfile.write("\n".join(set(extract_ips(infile.read())))) 

Otherwise:

with open(filename) as infile, open(outfilename, "w") as outfile:
    seen = set()  # IPs already written
    for line in infile:
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                outfile.write(ip + "\n")  # one IP per line
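
Putting the pieces of this answer together, a self-contained Python 3 sketch might look like the following. The function name `write_unique_ips`, the temporary file names, and the sample data are assumptions made for the demonstration.

```python
import os
import re
import tempfile

def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)

def write_unique_ips(in_path, out_path):
    """Write each distinct IP from in_path to out_path, one per line,
    in the order it was first seen. Returns the number of IPs written."""
    seen = set()
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            for ip in extract_ips(line):
                if ip not in seen:
                    seen.add(ip)
                    outfile.write(ip + "\n")
    return len(seen)

# Demo with a small input file holding several IPs per line.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "ips.txt")
out_path = os.path.join(workdir, "list.txt")
with open(in_path, "w") as f:
    f.write("10.0.0.1 10.0.0.2 10.0.0.1\n10.0.0.3 10.0.0.2\n")

count = write_unique_ips(in_path, out_path)
with open(out_path) as f:
    output = f.read()
print(output, end="")
```

Opening the output file in `"w"` mode, rather than `"a"` as in the question, avoids accumulating duplicates across runs.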
1

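Extracting IP addresses from a file

I answered a similar question in this discussion. In short, the approach is based on one of my ongoing projects for extracting network- and host-based indicators from different types of input data (strings, files, blog posts, etc.): https://github.com/JohnnyWachter/intel

I would import the IPAddresses and Data classes, then use them to accomplish your task as follows: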

#!/usr/bin/env python

"""Extract IPv4 Addresses From Input File."""

from Data import CleanData  # Format and Clean the Input Data.
from IPAddresses import ExtractIPs  # Extract IPs From Input Data.


def get_ip_addresses(input_file_path):
    """
    Read contents of input file and extract IPv4 Addresses.
    :param input_file_path: fully qualified path to input file. Expecting str
    :returns: dictionary of IPv4 and IPv4-like Address lists
    :rtype: dict
    """

    input_data = []  # Empty list to house formatted input data.

    input_data.extend(CleanData(input_file_path).to_list())

    results = ExtractIPs(input_data).get_ipv4_results()

    return results
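
Now that you have a dictionary of lists, you can easily access the data you want and output it however you like. The example below uses the function above; it prints the results to the console and writes them to a specified output file: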

# Extract the desired data using the aforementioned function.
ipv4_list = get_ip_addresses('/path/to/input/file')

# Open your output file in 'append' mode.
with open('/path/to/output/file', 'a') as outfile:

    # Ensure that the list of valid IPv4 Addresses is not empty.
    if ipv4_list['valid_ips']:

        for ip_address in ipv4_list['valid_ips']:

            # Print to console
            print(ip_address)

            # Write to output file, one address per line.
            outfile.write(ip_address + '\n')