1
我正在处理twitter文本c#库,并且Twitter已经在其一致性测试中添加了双字unicode字符测试。YamlDotNet没有正确地反序列化双字unicode字符
https://github.com/twitter/twitter-text-conformance/blob/master/validate.yml
这里有一个NUnit的测试方法对上述文件运行。
[Test]
public void TestDoubleWordUnicodeYamlRetrieval()
{
var yamlFile = "validate.yml";
Assert.IsTrue(File.Exists(conformanceDir + yamlFile), "Yaml file " + conformanceDir + yamlFile + " does not exist.");
var stream = new StreamReader(Path.Combine(conformanceDir, yamlFile));
var yaml = new YamlStream();
yaml.Load(stream);
var root = yaml.Documents[0].RootNode as YamlMappingNode;
var testNode = new YamlScalarNode("tests");
Assert.IsTrue(root.Children.ContainsKey(testNode), "Document is missing test node.");
var tests = root.Children[testNode] as YamlMappingNode;
Assert.IsNotNull(tests, "Test node is not YamlMappingNode");
var typeNode = new YamlScalarNode("lengths");
Assert.IsTrue(tests.Children.ContainsKey(typeNode), "Test type lengths not found in tests.");
var typeTests = tests.Children[typeNode] as YamlSequenceNode;
Assert.IsNotNull(typeTests, "lengths tests are not YamlSequenceNode");
var list = new List<dynamic>();
var count = 0;
foreach (YamlMappingNode item in typeTests)
{
var text = ConvertNode<string>(item.Children.Single(x => x.Key.ToString() == "text").Value) as string;
var description = ConvertNode<string>(item.Children.Single(x => x.Key.ToString() == "description").Value) as string;
Assert.DoesNotThrow(() => {text.Normalize(NormalizationForm.FormC);}, String.Format("Yaml couldn't parse a double word unicode string at test {0} - {1}.", count, description));
count++;
}
}
这是产生的误差: Vocus.TwitterText.Tests.ConformanceTest.TestDoubleWordUnicodeYamlRetrieval: YAML未能在试验5解析一个双字unicode字符串 - 计数基本多语种平面之外的unicode字符(双字)。 意外的异常信息:System.ArgumentException
对不起在回答这么晚了,但是这并没有帮助。 有问题的特定线实际上不是UTF8字符,但unicoded字符表示: 文本:“\ U00010000 \ U0010ffff” 使用流读取器时,输出的文件为一个字符串,字符是正确的。使用yaml检索节点时,输出为\ 0。 –