2016-08-12

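Split a text file into HTML documents

I have a text file that contains more than 500 HTML pages. Is there a quick way to separate them into individual HTML files?

I figured that identifying the start of each document would work, but I'm not sure how to write a script for this.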


Which programming language? What is the code at the start of each page? Is it '<!DOCTYPE html>'?

Answer


If the HTML documents in your file are delimited by the <!DOCTYPE html> declaration, you can use this script, written in Python:

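# text to html
# Parses a text file and separates the HTML documents in it into
# files named html1.html, html2.html, etc.
# Each HTML document must start with <!DOCTYPE html>!

# Usage: $ python text-to-html.py filename
# Example: $ python text-to-html.py testfile.txt

from sys import argv

filename = argv[1]

counter = 0
new_file = None  # no output file until the first <!DOCTYPE html> is seen

with open(filename) as open_file:
    for line in open_file:
        if "<!DOCTYPE html>" in line:
            # A new document starts here: close the previous output file
            if new_file is not None:
                new_file.close()
            counter += 1
            new_filename = "html%d.html" % counter
            new_file = open(new_filename, "w")
        # Lines before the first <!DOCTYPE html> are skipped
        if new_file is not None:
            new_file.write(line)

# Close the last output file
if new_file is not None:
    new_file.close()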
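
If the doctype is not always written exactly as <!DOCTYPE html> (for example lowercase, or an older HTML 4 doctype), or if several documents share one line, a regex-based variant may be more forgiving. The following is only a sketch under those assumptions: the names split_html_documents and write_documents are made up for illustration, it reads the whole file into memory (which should be fine for a few hundred pages), and it assumes the text is UTF-8.

```python
import re
from pathlib import Path

# Case-insensitive doctype marker; also matches "<!doctype html>" and
# HTML 4 style "<!DOCTYPE HTML PUBLIC ...>" (an assumption about the input).
DOCTYPE = re.compile(r"<!DOCTYPE\s+html", re.IGNORECASE)


def split_html_documents(text):
    """Return a list of strings, each starting at a doctype declaration."""
    starts = [m.start() for m in DOCTYPE.finditer(text)]
    # Each document runs from its doctype to the next one, or to the end.
    ends = starts[1:] + [len(text)]
    # Anything before the first doctype is dropped.
    return [text[s:e] for s, e in zip(starts, ends)]


def write_documents(documents, out_dir="."):
    """Write documents to html1.html, html2.html, ... and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, doc in enumerate(documents, start=1):
        path = out / ("html%d.html" % i)
        path.write_text(doc, encoding="utf-8")
        paths.append(path)
    return paths
```

To use it, read the input with Path("testfile.txt").read_text(encoding="utf-8"), pass the result to split_html_documents, and hand the returned list to write_documents.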

Hope it helps!