2014-04-22 49 views

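I'm working on the twitter-text C# library, and Twitter has added a test for double-word Unicode characters (code points outside the Basic Multilingual Plane) to its conformance tests. YamlDotNet doesn't deserialize these double-word Unicode characters properly.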

https://github.com/twitter/twitter-text-conformance/blob/master/validate.yml

Here is an NUnit test method that runs against the file above.

    [Test]
    public void TestDoubleWordUnicodeYamlRetrieval()
    {
        var yamlFile = "validate.yml";
        var yamlPath = Path.Combine(conformanceDir, yamlFile);
        Assert.IsTrue(File.Exists(yamlPath), "Yaml file " + yamlPath + " does not exist.");

        var yaml = new YamlStream();
        // Dispose the reader once the document has been loaded.
        using (var stream = new StreamReader(yamlPath))
        {
            yaml.Load(stream);
        }

        var root = yaml.Documents[0].RootNode as YamlMappingNode;
        var testNode = new YamlScalarNode("tests");
        Assert.IsTrue(root.Children.ContainsKey(testNode), "Document is missing test node.");
        var tests = root.Children[testNode] as YamlMappingNode;
        Assert.IsNotNull(tests, "Test node is not YamlMappingNode");

        var typeNode = new YamlScalarNode("lengths");
        Assert.IsTrue(tests.Children.ContainsKey(typeNode), "Test type lengths not found in tests.");
        var typeTests = tests.Children[typeNode] as YamlSequenceNode;
        Assert.IsNotNull(typeTests, "lengths tests are not YamlSequenceNode");

        var count = 0;
        foreach (YamlMappingNode item in typeTests)
        {
            // ConvertNode<T> is a helper defined elsewhere in the test class (not shown here).
            var text = ConvertNode<string>(item.Children.Single(x => x.Key.ToString() == "text").Value) as string;
            var description = ConvertNode<string>(item.Children.Single(x => x.Key.ToString() == "description").Value) as string;
            // Normalize throws ArgumentException when the string contains invalid Unicode.
            Assert.DoesNotThrow(() => { text.Normalize(NormalizationForm.FormC); }, String.Format("Yaml couldn't parse a double word unicode string at test {0} - {1}.", count, description));
            count++;
        }
    }

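This is the error it produces: Vocus.TwitterText.Tests.ConformanceTest.TestDoubleWordUnicodeYamlRetrieval: Yaml couldn't parse a double word unicode string at test 5 - Count unicode chars outside the basic multilingual plane (double word). Unexpected exception: System.ArgumentException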

Answer


I don't think this is the YAML parser. Can you try reading the file with an explicit UTF-8 encoding:

using (var stream = new StreamReader(path, Encoding.UTF8)) 
{ 
    var yaml = new YamlStream(); 
    yaml.Load(stream); 
    //Do the rest of your code 
} 

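Sorry for the late reply, but this didn't help. The line in question doesn't actually contain UTF-8 characters, but escaped Unicode character representations: text: "\U00010000\U0010ffff". When I use the stream reader and output the file as a string, the characters are correct. When I retrieve the node through the YAML parser, the output is \0.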
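
For anyone trying to narrow this down, here is a minimal sketch of what those two escapes should decode to in a .NET string, next to what happens if a 32-bit code point is narrowed to a single 16-bit char. The narrowing is only an assumption about how a \0 could appear; it has not been checked against the YamlDotNet source. The class and method names (SupplementaryEscapeDemo, Decode, Truncate, Dump) are made up for illustration.

```csharp
using System;
using System.Linq;

public static class SupplementaryEscapeDemo
{
    // Correct decoding: code points above U+FFFF become a surrogate pair (two chars).
    public static string Decode(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint);
    }

    // Lossy narrowing (assumed failure mode): only the low 16 bits survive.
    public static char Truncate(int codePoint)
    {
        return unchecked((char)codePoint);
    }

    // Formats the UTF-16 code units of a string as hex, to inspect what a parser returned.
    public static string Dump(string s)
    {
        return string.Join(" ", s.Select(c => ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        Console.WriteLine(Dump(Decode(0x10000)));                // D800 DC00
        Console.WriteLine(Dump(Decode(0x10FFFF)));               // DBFF DFFF
        Console.WriteLine(Dump(Truncate(0x10000).ToString()));   // 0000
        Console.WriteLine(Dump(Truncate(0x10FFFF).ToString()));  // FFFF
    }
}
```

Calling Dump on the text value returned for test 5 should show whether the node contains the surrogate pairs D800 DC00 DBFF DFFF or something shorter such as 0000.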