2011-05-29 41 views
5

Possible duplicate:
Remove HTML Tags from an NSString on the iPhone

Strip out HTML tags etc. from an NSString

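I would like to know the best way to strip all HTML/JavaScript etc. tags out of an NSString.

The solution I am currently using leaves comments and some other tags behind; what would be the best way to remove them?

I know of solutions such as LibXML, but I would like some examples to work with.

Current solution: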

- (NSString *)flattenHTML:(NSString *)html trimWhiteSpace:(BOOL)trim {

    NSScanner *theScanner = [NSScanner scannerWithString:html];
    NSString *text = nil;

    while ([theScanner isAtEnd] == NO) {

        // find start of tag
        [theScanner scanUpToString:@"<" intoString:NULL];

        // find end of tag; skip the replacement if nothing was scanned
        text = nil;
        if ([theScanner scanUpToString:@">" intoString:&text] && text != nil) {
            // remove every occurrence of the found tag
            html = [html stringByReplacingOccurrencesOfString:
                       [NSString stringWithFormat:@"%@>", text]
                                                   withString:@""];
        }
    }

    // trim off whitespace
    return trim ? [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] : html;
}
+0

@x3ro then vote to close as a duplicate – Mark 2011-05-29 21:33:42

+2

@Mark, he did; that comment is added automatically (for the poster's benefit) when someone votes to close. – benzado 2011-05-29 21:35:53

+0

Hmm, the close count was still zero when I looked at it – Mark 2011-05-29 21:36:54

Answers

17

Try this method to remove HTML tags from a string:

- (NSString *)stripTags:(NSString *)str
{
    NSMutableString *html = [NSMutableString stringWithCapacity:[str length]];

    NSScanner *scanner = [NSScanner scannerWithString:str];
    // Do not skip whitespace, otherwise words on either side of a tag run together
    scanner.charactersToBeSkipped = nil;
    NSString *tempText = nil;

    while (![scanner isAtEnd])
    {
        // Copy the text that precedes the next tag
        [scanner scanUpToString:@"<" intoString:&tempText];

        if (tempText != nil)
            [html appendString:tempText];

        // Skip the tag itself, up to its closing ">"
        [scanner scanUpToString:@">" intoString:NULL];

        // Step over the ">" unless the input ended first
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation] + 1];

        tempText = nil;
    }

    return html;
}
+0

Well done! – 2012-10-10 17:21:35

+1

I added `scanner.charactersToBeSkipped = NULL` to the code above to avoid words sticking together, as described here: http://stackoverflow.com/questions/2828737/strange-behaviour-of-nsscanner-on-simple-whitespace-removal – 2012-10-10 17:46:58
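To make the effect of that setting concrete, here is a minimal sketch that runs the same scanner loop with a caller-supplied skip set. `StripTagsWithSkipSet` is a hypothetical helper name, not something from the answer; it is written as a plain function only so that both behaviours can be compared side by side.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): the answer's scanner loop,
// with the skip set passed in by the caller.
static NSString *StripTagsWithSkipSet(NSString *str, NSCharacterSet *skipSet) {
    NSMutableString *out = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    scanner.charactersToBeSkipped = skipSet;

    while (![scanner isAtEnd]) {
        NSString *chunk = nil;
        // Copy the text before the next "<", if there is any
        if ([scanner scanUpToString:@"<" intoString:&chunk]) {
            [out appendString:chunk];
        }
        // Skip the tag body, then step over the closing ">"
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd]) {
            [scanner setScanLocation:[scanner scanLocation] + 1];
        }
    }
    return out;
}
```

With the input `<b>Hello</b> <i>world</i>`, passing `nil` should give `Hello world`, whereas passing `[NSCharacterSet whitespaceAndNewlineCharacterSet]` (the scanner's default) should give `Helloworld`, because the scanner silently skips the space that sits between the two tags.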

+0

OK. Thanks. – Dee 2012-10-11 06:08:14
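The question also asks about comments and JavaScript. A plain scan from "<" to ">" can stop early inside a comment or a script body that itself contains ">". The following is one possible sketch of a scanner that treats comments, `<script>` and `<style>` blocks as single units. `StripMarkup` and `SkipPast` are hypothetical names introduced here, and the approach is only an approximation: it does not parse HTML properly, it would also swallow a tag whose name merely starts with `script` or `style`, and it does not decode entities such as `&amp;`. For untrusted or irregular markup a real parser such as the LibXML mentioned in the question is the safer choice.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: move the scanner just past `terminator`,
// or to the end of the string if the terminator never appears.
static void SkipPast(NSScanner *scanner, NSString *terminator) {
    [scanner scanUpToString:terminator intoString:NULL];
    [scanner scanString:terminator intoString:NULL];
}

// Hypothetical helper: drop tags, comments, and script/style blocks.
static NSString *StripMarkup(NSString *html) {
    NSMutableString *out = [NSMutableString stringWithCapacity:[html length]];
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil;   // keep whitespace between words
    // NSScanner is case-insensitive by default, so "<SCRIPT" is matched too.

    while (![scanner isAtEnd]) {
        NSString *chunk = nil;
        // Copy the text before the next "<"
        if ([scanner scanUpToString:@"<" intoString:&chunk]) {
            [out appendString:chunk];
        }
        if ([scanner isAtEnd]) break;

        if ([scanner scanString:@"<!--" intoString:NULL]) {
            SkipPast(scanner, @"-->");        // whole comment, even if it contains ">"
        } else if ([scanner scanString:@"<script" intoString:NULL]) {
            SkipPast(scanner, @"</script>");  // attributes and script body
        } else if ([scanner scanString:@"<style" intoString:NULL]) {
            SkipPast(scanner, @"</style>");   // attributes and style rules
        } else {
            SkipPast(scanner, @">");          // ordinary tag
        }
    }
    return out;
}
```

For example, `StripMarkup(@"a<!-- x > y -->b<script>if (1 < 2) {}</script>c<b>d</b>")` should return `abcd`.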