2011-05-03 89 views

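Iteratively parsing inner HTML with the Hpple parser and NSXMLParser

I have been developing a school newspaper app for the iPad. I'm using NSXMLParser to get the title, a short description, and the link of each article. To get the HTML content behind each parsed link, I decided to use the Hpple parser. I think I'm parsing and storing the RSS items correctly, but when I use a for loop to parse the HTML of each parsed link, it tells me I have an empty array of RSS items. However, I can print the contents of the RSS items to the console, so the array is not empty. I'll post part of my code and the console output. Please help me; this project is due soon. Thanks in advance.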

Here is where I start loading my RSS parser (articleParser):

- (void)loadData { 
    [self loadInitData]; 

    //[self loadDataWithLink]; 

} 

- (void)loadInitData { 
    if (sections == nil) { 
     [activityIndicator startAnimating]; 

     NSLog(@"STARTING ARTICLE PARSER FROM MAIN URL!!!"); 

     Parser *articleParser = [[Parser alloc] init]; 
     [articleParser parseRssFeed:@"http://theaggie.org/rss/headlines.xml" withDelegate:self]; 
     [articleParser release]; 
    } else { 

    } 

} 

Below is how I store the received article items in an NSMutableArray called "sections". Then I use a for loop to iterate over each parsed article's link.

- (void)receivedArticleItems:(Article *)theArticle {
    if (sections == nil) {
        sections = [[NSMutableArray alloc] init];
    }
    [sections addObject:theArticle];

    NSLog(@"We recieved the article!");
    NSLog(@"Article: %@", theArticle);
    NSLog(@"What is in sections: %@", sections);

    for (int i = 1; i < 5; i++) {
        NSLog(@"articleItems: %@",[sections objectAtIndex:0]);
        NSLog(@"articleItems at index 0: %@",[[[sections objectAtIndex:0] articleItems] objectAtIndex:0]);

        [self loadDataWithLink:[[[[sections objectAtIndex:0] articleItems] objectAtIndex:0] objectForKey:@"link"]];
    }
    [activityIndicator stopAnimating];
}
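The loop above reads `[sections objectAtIndex:0]` and item 0 of its `articleItems` on every pass, so it requests the same link four times no matter how many items were parsed. The stored links may also carry a trailing newline (the link logged further down ends in `\n`). A minimal sketch of iterating over every item instead, using only Foundation; the function name `ArticleLinks` is hypothetical, and it assumes, as the console output suggests, that `articleItems` is an array of `NSDictionary` objects with a `link` key.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the original project: returns the trimmed
// "link" string of every parsed RSS item, in feed order. Assumes each item is
// an NSDictionary like the one printed in the console output.
NSArray *ArticleLinks(NSArray *articleItems) {
    NSMutableArray *links = [NSMutableArray arrayWithCapacity:[articleItems count]];
    for (id item in articleItems) {
        if (![item isKindOfClass:[NSDictionary class]]) {
            continue; // not a parsed RSS item; skip it
        }
        id link = [item objectForKey:@"link"];
        if (![link isKindOfClass:[NSString class]]) {
            continue; // item has no usable <link> value
        }
        // Strip the leading/trailing whitespace and newlines RSS text nodes can carry.
        NSString *trimmed = [link stringByTrimmingCharactersInSet:
                                [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        if ([trimmed length] > 0) {
            [links addObject:trimmed];
        }
    }
    return links;
}
```

In `receivedArticleItems:` the fixed loop could then become `for (NSString *link in ArticleLinks([theArticle articleItems])) { [self loadDataWithLink:link]; }`, so the number of requests follows the number of parsed items. The helper returns an autoreleased array and makes no `release` calls, so it behaves the same under manual reference counting (as in this project) and ARC.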

Below is how I use the TFHpple parser to get the HTML content of each parsed link:

- (void)loadDataWithLink:(NSString *)urlString {
    NSData *htmlData = [NSData dataWithContentsOfURL:[NSURL URLWithString:urlString]];

    // Create parser
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlData];

    // Get all the cells main body
    htmlElements = [xpathParser search:@"//div[@id='main']/div[@id='mainCol1']/div[@id='main-body']"];

    // Access the first cell
    TFHppleElement *htmlElement = [htmlElements objectAtIndex:0];

    // NSString *title = [htmlElement content];

    NSLog(@"What is in element: %@", htmlElement);

    [xpathParser release];
    //[htmlData release];
}

And this is what I get on the console:

2011-05-02 22:58:35.355 TheCalAggie[2443:207] Parsing started for article! 
2011-05-02 22:58:35.356 TheCalAggie[2443:207] Adding story title: Students say, 'No time for books' 
2011-05-02 22:58:35.356 TheCalAggie[2443:207] From the link: http://theaggie.org/article/2011/05/03/students-say-no-time-for-books 
2011-05-02 22:58:35.357 TheCalAggie[2443:207] Summary: The last book managerial economics major Kiyan Parsa read for fun was The Lord of the Rings. That was in high school. 
2011-05-02 22:58:35.358 TheCalAggie[2443:207] Published on: Tue, 03 May 2011 00:00:00 -0700 
2011-05-02 22:58:35.359 TheCalAggie[2443:207] Parsing started for article! 
2011-05-02 22:58:35.360 TheCalAggie[2443:207] Adding story title: UC Davis craft center one of largest college crafting centers 
2011-05-02 22:58:35.360 TheCalAggie[2443:207] From the link: http://theaggie.org/article/2011/05/02/uc-davis-craft-center-one-of-largest-college-crafting-centers 
2011-05-02 22:58:35.361 TheCalAggie[2443:207] Summary: Hidden away in the South Silo, the UC Davis Craft Center offers 10 craft studios and more than a hundred classes for students looking to learn or perfect their crafting skills. 
2011-05-02 22:58:35.362 TheCalAggie[2443:207] Published on: Mon, 02 May 2011 00:00:00 -0700 
2011-05-02 22:58:35.362 TheCalAggie[2443:207] We recieved the article! 
2011-05-02 22:58:35.363 TheCalAggie[2443:207] Article: *nil description* 
2011-05-02 22:58:35.364 TheCalAggie[2443:207] What is in sections: (
    (null) 
) 
2011-05-02 22:58:35.374 TheCalAggie[2443:207] articleItems: *nil description* 
2011-05-02 22:58:35.375 TheCalAggie[2443:207] articleItems at index 0: { 
    link = "http://theaggie.org/article/2011/05/03/peaceful-rally-held-on-campus-after-killing-of-bin-laden\n"; 
    pubDate = "Tue, 03 May 2011 00:00:00 -0700"; 
    summary = "The announcement of Osama bin Laden's death sent a wave of patriotism across the nation and UC Davis. Bin Laden was the leader of al-Qaeda - the organization allegedly behind the Sept. 11, 2001 attacks that killed over 3,000 Americans.\n"; 
    title = "Peaceful rally held on campus after killing of bin Laden \n"; 
} 
2011-05-02 22:59:35.376 TheCalAggie[2443:207] Unable to parse. 
2011-05-02 22:59:35.379 TheCalAggie[2443:207] *** Terminating app due to uncaught exception 'NSRangeException', reason: '*** -[NSMutableArray objectAtIndex:]: index 0 beyond bounds for empty array' 
*** Call stack at first throw: 

Any help would be greatly appreciated. Thanks again.

Answers


2011-05-02 22:59:35.376 TheCalAggie[2443:207] Unable to parse.

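The parser is having trouble with the HTML. It is not perfect at parsing HTML: running XPath queries over a possibly broken or invalid HTML document is a complicated thing to do.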

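Passing the link you are trying to parse through the W3C validator here throws a number of errors, so it is not fully valid HTML. If it is too broken for the parser, you will have to debug it and work out what is going on. To really get to the bottom of it, set breakpoints inside the parser you are using.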
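Whatever makes the page hard to parse, the crash itself comes from the unguarded `[htmlElements objectAtIndex:0]` in `loadDataWithLink:`. The exception is thrown right after "Unable to parse." is logged, which suggests the XPath search returned an empty array, rather than `sections` being empty. The logged link also ends in `\n`. A minimal sketch of two hypothetical helpers, `ArticleURL` and `FirstElementOrNil`, that trim the link and avoid the out-of-bounds access, using only Foundation:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds an NSURL from a parsed RSS link, trimming the
// trailing newline seen in the console output. Returns nil for nil or blank input.
NSURL *ArticleURL(NSString *rawLink) {
    NSString *trimmed = [rawLink stringByTrimmingCharactersInSet:
                            [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    if ([trimmed length] == 0) {
        return nil; // nothing usable to request
    }
    return [NSURL URLWithString:trimmed];
}

// Hypothetical helper: returns the first element of an array, or nil when the
// array is nil or empty, instead of letting objectAtIndex:0 raise NSRangeException.
id FirstElementOrNil(NSArray *elements) {
    return [elements count] > 0 ? [elements objectAtIndex:0] : nil;
}
```

Inside `loadDataWithLink:` these could be used as `NSURL *url = ArticleURL(urlString);`, skipping the download when `url` or the returned data is nil, and `TFHppleElement *htmlElement = FirstElementOrNil(htmlElements);`, logging and returning when it is nil. This only stops the crash; if the HTML is too broken for the parser, the query will still come back empty.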


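Thanks a lot, Damien! I dug through the HTML source and was finally able to parse what I needed. Now I've also run into another problem with MVC, which I've posted as a separate question. Could you help me with that one too? Here is the link to that question: http://stackoverflow.com/questions/6132894/passing-uitextview-values-to-modalviewcontroller-from-parent-view-controller – SerPiero 2011-05-26 04:32:03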


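Damien is right. First, you have to fix the HTML to get your code working. It parses different data each time, which shows the HTML is buggy, so the code may work in some cases. Try running it a few times; you will see it work occasionally.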


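Yes. I was finally able to parse what I needed, although I had to dig through the HTML source. Thanks for your response. I have another question, if you can help me with it. Here is the link: http://stackoverflow.com/questions/6132894/passing-uitextview-values-to-modalviewcontroller-from-parent-view-controller – SerPiero 2011-05-26 04:34:32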