Parsing with D

I am new to D and would like to parse a biological file of the form

>name1
acgcgcagagatatagctagatcg
aagctctgctcgcgct
>name2
acgggggcttgctagctcgatagatcga
agctctctttctccttcttcttctagagaga
>name3
gag ggagag

so that I can capture the "headers" (name1, name2, name3) together with the corresponding "sequence" data, the ..acgcg... stuff.

Right now I have this, but it only iterates line by line:

import std.stdio;
import std.regex;
import std.exception : ErrnoException;

int main(string[] args) {
    auto filename = args[1];
    auto entry_name = regex(r"^>(.*)");             // captures the header only
    auto fasta_regex = regex(r"(>.+\n)([^>]+\n)");  // captures header and corresponding sequence (not used yet)

    try {
        auto file = File(filename, "r");           // throws ErrnoException if it cannot be opened
        foreach (char[] line; file.byLine()) {
            auto name_capture = matchFirst(line, entry_name);
            if (!name_capture.empty)               // sequence lines do not match, so skip them
                writeln(name_capture[1]);
        }
        file.close();
    }
    catch (ErrnoException xy) {
        writeln("Error reading the file: ", xy.msg);
    }
    catch (Exception xx) {
        writeln("Exception occurred: ", xx.msg);
    }
    return 0;
}

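I would like to know a good way to extract the header and sequence data, so that I can create an associative array in which each item corresponds to an entry in the file: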

[name1:acgcgcagagatatagctagatcgaagctctgctcgcgct,name2:acgggggcttgctagctcgatagatcgaagctctctttctccttcttcttctagagaga,.....]
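The second pattern in the code above, fasta_regex, is meant to match a header together with its sequence, which cannot work while the file is read one line at a time. A minimal sketch of that whole-file idea, assuming the file fits in memory and that '>' only ever starts a header line: read the file with std.file.readText and walk the records with matchAll. The function name parseFastaRegex is hypothetical.

```d
import std.array : replace;
import std.file : readText;
import std.regex : matchAll, regex;
import std.stdio : writeln;
import std.string : strip;

/// Hypothetical helper: parses FASTA-style text into a header -> sequence map
/// with one regex over the whole input, so a sequence may span several lines.
string[string] parseFastaRegex(string text)
{
    // "m" flag: ^ matches at the start of every line.
    // Group 1: the header text after '>' up to the end of its line.
    // Group 2: everything up to the next '>' (the sequence lines).
    auto record = regex(r"^>([^\n]*)\n?([^>]*)", "m");
    string[string] result;
    foreach (m; matchAll(text, record))
    {
        // Join the sequence lines by dropping the line breaks (CR and LF).
        result[m[1].strip] = m[2].replace("\r", "").replace("\n", "");
    }
    return result;
}

int main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: ", args[0], " <file>");
        return 1;
    }
    writeln(parseFastaRegex(readText(args[1])));
    return 0;
}
```

Because group 2 runs up to the next '>', multi-line sequences such as name1's are joined without an appender; the trade-off is that the whole file is held in memory.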

2012-01-24 eastafri

D seems to be popular with bioinformaticians :) – Trass3r

The header is on its own line, right? Then why not check for that and use an appender to collect the values:

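import std.array : appender;
import std.regex : matchFirst, regex;
import std.stdio : File, writeln;

void main(string[] args) {
    auto entry_name = regex(r"^>(.*)");
    string[string] map;                            // header -> sequence
    auto current = appender!(char[]);
    string name;
    foreach (char[] line; File(args[1]).byLine()) {
        auto entry = matchFirst(line, entry_name);
        if (!entry.empty) {                        // we are on a header line
            if (name.length)                       // store the record collected so far
                map[name] = current.data.idup;     // idup: current.data is reused after clear()
            name = entry[1].idup;                  // header without '>'; byLine reuses its buffer
            current.clear();
        } else {
            current.put(line);
        }
    }
    if (name.length)
        map[name] = current.data.idup;             // remember the last record
    writeln(map);
}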

map is where you store the values (a string[string] will do).
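The copy (.dup/.idup) in this answer matters because clear() lets the appender write new data into the same memory that data pointed to before. A small sketch of that behavior, using a hypothetical helper takeAndClear that copies the collected sequence before clearing:

```d
import std.array : Appender, appender;
import std.stdio : writeln;

/// Hypothetical helper: returns an independent copy of what the appender
/// has collected so far, then empties it for the next record.
string takeAndClear(ref Appender!(char[]) buf)
{
    // Copy first: after clear(), later put() calls may overwrite the memory
    // that buf.data currently points to.
    string copy = buf.data.idup;
    buf.clear();
    return copy;
}

void main()
{
    auto current = appender!(char[]);
    current.put("acgcgcagagatatagctagatcg");  // first sequence line of name1
    current.put("aagctctgctcgcgct");          // second sequence line of name1
    string name1 = takeAndClear(current);

    current.put("gag ggagag");                // sequence of a later record
    writeln(name1);         // acgcgcagagatatagctagatcgaagctctgctcgcgct
    writeln(current.data);  // gag ggagag
}
```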

2012-01-24 20:33:50

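Thanks a lot! The header is on its own line. I don't understand where c.hit comes from :) Also, why should we assign entry_name as a match object? (It is supposed to be a regex.) Finally, what is the type of map[name]? Sorry, I'm a n00b at the moment. – eastafri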

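Getting some compile errors (dmd2): Error: expression entry_name of type Regex!(char) does not have a boolean value; read_file.d(37): Error: undefined identifier map, did you mean function main?; read_file.d(39): Error: cannot implicitly convert expression (c.hit()) of type char[] to string; read_file.d(42): Error: no property 'append' for type 'Appender!(char[])'; read_file.d(45): Error: undefined identifier map, did you mean function main?; read_file.d(49): Error: undefined identifier FileExceptio – eastafri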

@eastafri I made an edit that fixes some of those errors –
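The questions in the comments (where hit comes from, why the result of matching is a different object from entry_name, and what type map[name] has) come down to two separate things: regex() returns a compiled pattern, while matchFirst() returns a Captures object for one particular input. A short sketch, assuming a Phobos version that provides matchFirst:

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // regex() compiles the pattern once; entry_name is a Regex!char.
    auto entry_name = regex(r"^>(.*)");

    // matchFirst() applies the pattern to one input and returns a Captures
    // object for that match; it is empty when the input does not match.
    auto c = matchFirst(">name1", entry_name);
    writeln(c.empty);   // false
    writeln(c.hit);     // >name1  (the whole match, same as c[0])
    writeln(c[1]);      // name1   (first capture group only)

    auto miss = matchFirst("acgcgcagagatatagctagatcg", entry_name);
    writeln(miss.empty); // true: sequence lines do not start with '>'

    // With map declared as string[string], map[name] is a string.
    string[string] map;
    map[c[1]] = "acgcgcagagatatagctagatcg";
    writeln(map["name1"]); // acgcgcagagatatagctagatcg
}
```

When the input is a char[] line from byLine, the captures are char[] as well, which is why assigning them to a string needs .idup (the "cannot implicitly convert ... char[] to string" error above).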

Here is my solution without regular expressions (I don't think we need regular expressions for such simple input):

import std.stdio;
import std.exception : ErrnoException;

int main(string[] args) {
    int ret = 0;
    string fileName = args[1];
    string header;
    char[] sequence;
    string[string] content;
    try {
        auto file = File(fileName, "r");   // throws ErrnoException if it cannot be opened
        foreach (char[] line; file.byLine()) {
            if (line.length == 0)
                continue;                  // skip blank lines (line[0] would be out of bounds)
            if (line[0] == '>') {
                if (header.length > 0) {
                    content[header] = sequence.idup;
                }
                sequence.length = 0;
                // we have a new header, and a new sequence will start after it
                header = line[1..$].idup;  // idup: byLine reuses its buffer
                content[header] = "";
            } else {
                sequence ~= line;          // appending copies the characters
            } // else
        } // foreach
        if (header.length > 0)
            content[header] = sequence.idup;
        file.close();
    }
    catch (ErrnoException oe) {
        writeln("Error opening file: ", oe.msg);
        ret = 1;
    }
    catch (Exception e) {
        writeln("Exception: ", e.msg);
        ret = 1;
    }
    writeln(content);
    return ret;
} // main() function

/+ -------------------------- BEGIN OUTPUT ------------------------------- + 
["name3":"gag ggagag", "name1":"acgcgcagagatatagctagatcgaagctctgctcgcgct", "name2":"acgggggcttgctagctcgatagatcgaagctctctttctccttcttcttctagagaga"] 
+ -------------------------- END OUTPUT --------------------------------- +/
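The same no-regex logic can be wrapped in a function over any range of lines, which makes it easy to test without a file. Associative-array iteration order is unspecified (note that name3 is printed first in the output above), so sorting the keys gives the same output on every run. A sketch, with parseFasta as a hypothetical name:

```d
import std.algorithm : sort;
import std.range : isInputRange;
import std.stdio : File, writeln;

/// Hypothetical helper: collects records from any input range of lines
/// (for example File.byLine() or a string[]) into a header -> sequence map.
string[string] parseFasta(R)(R lines)
    if (isInputRange!R)
{
    string[string] content;
    string header;
    char[] sequence;
    bool inRecord = false;
    foreach (line; lines)
    {
        if (line.length == 0)
            continue;                       // ignore blank lines
        if (line[0] == '>')
        {
            if (inRecord)
                content[header] = sequence.idup;
            header = line[1 .. $].idup;     // copy: byLine reuses its buffer
            sequence.length = 0;
            inRecord = true;
        }
        else
            sequence ~= line;
    }
    if (inRecord)
        content[header] = sequence.idup;    // last record
    return content;
}

int main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: ", args[0], " <file>");
        return 1;
    }
    auto content = parseFasta(File(args[1]).byLine());
    // Sort the keys so the records print in a stable order.
    foreach (name; content.keys.sort())
        writeln(name, ": ", content[name]);
    return 0;
}
```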

2012-01-25 10:11:37 DejanLekic

Oh, nice idea! Thanks! – eastafri
