2011-01-13

I have code like this to convert the characters in some text:

- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock 
{ 
    // autorelease so the alloc'd string is not leaked when someString is reassigned below (manual reference counting)
    NSString *someString = [[[NSString alloc] initWithData:CDATABlock encoding:NSUTF8StringEncoding] autorelease]; 

    // Replace numeric character entities with plain characters, one call per entity
    someString = [ someString stringByReplacingOccurrencesOfString:@"%" withString: @"&" ]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#124;" withString: @"|" ]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#160;" withString: @" " ]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#8211;" withString:@"-"]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#8212;" withString:@"--"]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#8216;" withString:@"'" ]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#8217;" withString:@"'" ]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#8218;" withString:@"," ]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#8220;" withString:@"\"" ]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#8221;" withString:@"\"" ]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#8230;" withString:@"..."]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#38;" withString:@"&"];   // &#38; is "&"
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#39;" withString:@"'"];   // &#39; is an apostrophe
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#8364;" withString:@"€"]; 
    someString = [ someString stringByReplacingOccurrencesOfString:@"&#8594;" withString:@"→"]; 

    if(nil != self.currentItemValue){ 
     [self.currentItemValue appendString:someString]; 
    } 
} 

Is there a function that does this character conversion automatically?


You could also warm people up by providing some answers. – Abizern 2011-01-13 20:08:44

Answers


Rather than hard-coding the replacements like this, there's a better way.

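These entities have the form &# + decimal digits + ;. The digits are the base-10 value of the character's Unicode code point. So you can search for substrings in this format, extract the number, and convert it directly into a character.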
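To make that arithmetic concrete, here is a minimal sketch in plain C (which also compiles as Objective-C) that recognizes one such entity at the start of a NUL-terminated UTF-8 buffer and recovers its code point. The function name parse_decimal_entity is hypothetical, and the sketch only handles the decimal form; hex entities (&#x20AC;) and named entities such as &amp; are rejected.

```c
#include <stddef.h>

/* Hypothetical helper. If s begins with "&#", one or more decimal digits,
   and ";", store the decoded code point in *cp and return the entity's
   length in bytes. Otherwise return 0 and leave *cp untouched. */
size_t parse_decimal_entity(const char *s, unsigned long *cp) {
    if (s[0] != '&' || s[1] != '#') {
        return 0;
    }
    const char *p = s + 2;
    unsigned long value = 0;
    while (*p >= '0' && *p <= '9') {
        /* Base-10 accumulation: "8364" -> ((8*10 + 3)*10 + 6)*10 + 4 */
        value = value * 10 + (unsigned long)(*p - '0');
        if (value > 0x10FFFF) {
            return 0; /* larger than the highest Unicode code point */
        }
        p++;
    }
    if (p == s + 2 || *p != ';') {
        return 0; /* no digits, or the terminating ';' is missing */
    }
    *cp = value;
    return (size_t)(p - s) + 1; /* count the ';' as well */
}
```

For example, parse_decimal_entity("&#8364; euro", &cp) returns 7 (the length of "&#8364;") and sets cp to 8364, which is 0x20AC, the code point of the euro sign.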

Here's one way to do it, using RegexKitLite to find the substrings:

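// Requires the third-party RegexKitLite category (#import "RegexKitLite.h", link against libicucore).
NSString * source = @"&#38; &#39; &#124; &#160; &#8211; &#8212; &#8216; &#8217; &#8218; &#8220; &#8221; &#8230; &#8364; &#8594;"; 

// Capture group 1 holds the decimal digits between "&#" and ";".
NSString * regex = @"&#(\\d+);"; 
NSArray * matches = [source arrayOfCaptureComponentsMatchedByRegex:regex]; 

NSMutableString * decodedSource = [source mutableCopy]; 
for (NSArray * match in matches) { 
    NSString * fullMatch = [match objectAtIndex:0];   // e.g. @"&#8364;"
    NSString * decimalCode = [match objectAtIndex:1]; // e.g. @"8364"

    // unichar is 16 bits wide, so this cast is only correct for code points up to 0xFFFF.
    unichar character = (unichar)[decimalCode intValue]; 
    NSString * replacement = [NSString stringWithFormat:@"%C", character]; 

    // Replaces every occurrence of this entity; a repeat of the same entity later in matches then finds nothing left to replace.
    [decodedSource replaceOccurrencesOfString:fullMatch withString:replacement options:NSLiteralSearch range:NSMakeRange(0, [decodedSource length])]; 
} 

NSLog(@"decoded: %@", decodedSource); 
// mutableCopy returns an owned object, so release it (manual reference counting).
[decodedSource release]; 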

On my machine, this logs:

decoded: & ' |   – — ‘ ’ ‚ “ ” … € → 

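This isn't the most efficient approach (in the worst case it's O(nm), where n is the length of the string and m is the number of entities, because each match triggers another scan of the whole string), but it's a start. :)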


Wow, that's really bad, as well as inefficient. At the very least, please switch to using an NSMutableString and doing the replacements in place.

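In any case, you can do this in a single pass, but you'll have to write the code yourself. You can use NSScanner, or a method like -rangeOfString:options:range:, to find each successive entity and then work out its replacement yourself. If you're using an NSMutableString, you can replace the entity with its replacement in place and continue searching, after adjusting your position (with NSScanner) or your search range to account for the difference in length between the entity and the replacement character.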
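A minimal sketch of that single-pass idea, written in plain C (also valid Objective-C) with a manual two-pointer scan instead of NSScanner. It assumes the text is a NUL-terminated UTF-8 buffer, for example a copy of the CDATA bytes with a terminating NUL appended; decode_decimal_entities and utf8_encode are hypothetical names, not Foundation API. Only decimal entities are decoded; hex entities, named entities, and malformed sequences are copied through unchanged.

```c
#include <stddef.h>

/* Write code point cp (assumed valid: <= 0x10FFFF and not a surrogate)
   to out as UTF-8 and return the number of bytes written (1 to 4). */
static size_t utf8_encode(unsigned long cp, unsigned char *out) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: decode every "&#<decimal digits>;" entity in the
   NUL-terminated UTF-8 string s, in place, in one left-to-right pass. */
void decode_decimal_entities(char *s) {
    const char *read = s;
    char *write = s;

    while (*read != '\0') {
        if (read[0] == '&' && read[1] == '#') {
            const char *p = read + 2;
            unsigned long cp = 0;
            size_t digits = 0;

            /* Accumulate the digits; stop growing cp once it is out of
               range so the arithmetic cannot overflow. */
            while (*p >= '0' && *p <= '9') {
                if (cp <= 0x10FFFF) {
                    cp = cp * 10 + (unsigned long)(*p - '0');
                }
                p++;
                digits++;
            }

            /* Accept only well-formed entities for valid, non-NUL,
               non-surrogate code points. */
            if (digits > 0 && *p == ';' && cp != 0 && cp <= 0x10FFFF &&
                !(cp >= 0xD800 && cp <= 0xDFFF)) {
                write += utf8_encode(cp, (unsigned char *)write);
                read = p + 1; /* continue just past the ';' */
                continue;
            }
        }
        /* Not an entity (or a malformed one): copy this byte unchanged. */
        *write++ = *read++;
    }
    *write = '\0';
}
```

Because the write pointer trails the read pointer, the length difference between each entity and its replacement is absorbed automatically, which is the position bookkeeping the answer describes for the NSScanner/NSMutableString version. Decoding in place is safe because a decimal entity is at least four bytes long (&#N;) and a code point never needs more than four UTF-8 bytes.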