2011-08-05 75 views

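How can I split an RTF file?

I want to split an RTF file (using C# or VB.NET) into two or more parts at the string [BreakPage]. For example, I have this file, which contains [BreakPage] and needs to be split into two parts: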

{\rtf1\ansi\ansicpg1251\uc1\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1049\deflangfe1049{\fonttbl{\f0\froman\fcharset204\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f38\froman\fcharset0\fprq2 Times New Roman;}{\f36\froman\fcharset238\fprq2 Times New Roman CE;}{\f39\froman\fcharset161\fprq2 Times New Roman Greek;}{\f40\froman\fcharset162\fprq2 Times New Roman (Hebrew);}{\f42\froman\fcharset178\fprq2 Times New Roman (Arabic);}{\f43\froman\fcharset186\fprq2 Times New Roman Baltic;}{\f44\froman\fcharset163\fprq2 Times New Roman (Vietnamese);}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1049\langfe1049\cgrid\langnp1049\langfenp1049 \snext0 Normal;}{\*\cs10 \additive \ssemihidden Default Paragraph Font;}{\*\ts11\tsrowd\trftsWidthB3\trpaddl108\trpaddr108\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\trcbpat1\trcfpat1\tscellwidthfts0\tsvertalt\tsbrdrt\tsbrdrl\tsbrdrb\tsbrdrr\tsbrdrdgl\tsbrdrdgr\tsbrdrh\tsbrdrv \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs20\lang1024\langfe1024\cgrid\langnp1024\langfenp1024 \snext11 \ssemihidden Normal Table;}}{\*\latentstyles\lsdstimax156\lsdlockeddef0}{\*\rsidtbl \rsid2111663\rsid7154806\rsid15558346}{\*\generator Microsoft Word 11.0.5604;}{\info{\author Programmer}{\operator Programmer}{\creatim\yr2011\mo8\dy2\hr12\min45}{\revtim\yr2011\mo8\dy5\hr12\min34}{\version3}{\edmins1}{\nofpages1}{\nofwords5}{\nofchars34}{\nofcharsws38}{\vern24689}}\margl1701\margr850\margt1134\margb1134 \widowctrl\ftnbj\aenddoc\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\hyphcaps0\horzdoc\dghspace120\dgvspace120\dghorigin1701\dgvorigin1984\dghshow0\dgvshow3\jcompress\viewkind1\viewscale100\nolnhtadjtbl\rsidroot15558346 \fet0\sectd \linex0\sectdefaultcl\sftnbj {\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}\pard\plain \ql \li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0 \fs24\lang1049\langfe1049\cgrid\langnp1049\langfenp1049 {\b\insrsid7154806\charrsid7154806 Line 1\par }{\insrsid7154806 \par }{\i\insrsid7154806\charrsid7154806 Line 3}{\lang1048\langfe1049\langnp1048\insrsid7154806 \par }{\lang1048\langfe1049\langnp1048\insrsid2111663 [BreakPage]\par }{\insrsid7154806 Line 4\par \par Line 5\par }}

Can anyone help me?

Thanks!

Answers


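The problem is that RTF keeps some (but not necessarily all) of its formatting information in a global header. To split RTF text so that each result is valid RTF again, the application doing the split needs to know where the header information is and copy it into every part.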
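To illustrate why a plain string split is not enough, here is a small sketch that counts unescaped braces in each fragment. The helper name `BraceDepth` and the tiny sample document are made up for this illustration (it is not the file from the question), and the check is only a rough indicator, not an RTF validator: it ignores binary data such as `\bin` runs.

```csharp
using System;

static class RtfSplitDemo
{
    // Hypothetical helper: net count of unescaped '{' minus '}' in a fragment.
    // A well-formed RTF document ends at depth 0.
    public static int BraceDepth(string rtf)
    {
        int depth = 0;
        for (int i = 0; i < rtf.Length; i++)
        {
            char c = rtf[i];
            if (c == '\\') { i++; continue; } // skip the character after a backslash (\{ or \})
            if (c == '{') depth++;
            else if (c == '}') depth--;
        }
        return depth;
    }

    static void Main()
    {
        // Tiny illustrative document with a font table in the header.
        string rtf = @"{\rtf1\ansi{\fonttbl{\f0 Times New Roman;}}\f0 Line 1\par [BreakPage]\par Line 4\par }";
        string[] parts = rtf.Split(new[] { "[BreakPage]" }, StringSplitOptions.None);

        Console.WriteLine(BraceDepth(rtf));      // 0: the whole document is balanced
        Console.WriteLine(BraceDepth(parts[0])); // 1: first fragment never closes the root group
        Console.WriteLine(BraceDepth(parts[1])); // -1: second fragment has no header at all
    }
}
```

The first fragment is missing its closing brace and the second one has lost the `{\rtf1 ...` header and font table entirely, so neither is a valid RTF file on its own.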

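There are two ways to do this:

  1. Write an RTF parser
  2. Use an existing RTF parser

(1) is doable, but it takes time. Fortunately, RTF parsers already exist, for example this one on CodeProject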

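Alternatively, you can load the RTF text into a RichTextBox, search for the split text "[BreakPage]" inside the RichTextBox, programmatically select the first and second parts, and use the SelectedRtf property to retrieve the RTF text of each part.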
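A minimal sketch of the RichTextBox approach might look like the following. The class and method names (`RtfSplitter`, `SplitAtMarker`) are hypothetical, and the code assumes a Windows Forms environment (a reference to System.Windows.Forms, ideally called from an STA thread). It also assumes that character offsets in `Text` line up with selection offsets, which is normally the case for plain text content but may not hold for documents with tables or embedded objects.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

static class RtfSplitter
{
    // Splits an RTF document at every occurrence of marker and returns each
    // part as RTF text taken from SelectedRtf, so the control supplies the header.
    public static List<string> SplitAtMarker(string rtf, string marker)
    {
        if (string.IsNullOrEmpty(marker))
            throw new ArgumentException("Marker must not be empty.", "marker");

        var parts = new List<string>();
        using (var box = new RichTextBox())
        {
            box.Rtf = rtf;          // the control parses the RTF for us
            string text = box.Text; // plain text, used only to locate the marker
            int start = 0;
            while (true)
            {
                int hit = text.IndexOf(marker, start, StringComparison.Ordinal);
                int end = hit >= 0 ? hit : text.Length;

                box.Select(start, end - start); // select the part before the marker
                parts.Add(box.SelectedRtf);     // RTF for the selected range only

                if (hit < 0) break;             // no more markers: last part added
                start = hit + marker.Length;    // continue after the marker
            }
        }
        return parts;
    }
}
```

Calling `RtfSplitter.SplitAtMarker(File.ReadAllText("input.rtf"), "[BreakPage]")` on the sample above should give two RTF strings that can each be written out with `File.WriteAllText` (the file name is a placeholder). The paragraph break that follows the marker stays at the start of the second part; trim it if it is not wanted.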