How do I read a text file into MATLAB and turn it into a list?

I have a text file in the following format:

gene   complement(22995..24539) 
       /gene="ppp" 
       /locus_tag="MRA_0020" 
CDS    complement(22995..24539) 
       /gene="ppp" 
       /locus_tag="MRA_0020" 
       /codon_start=1 
       /transl_table=11 
       /product="putative serine/threonine phosphatase Ppp" 
       /protein_id="ABQ71738.1" 
       /db_xref="GI:148503929" 
gene   complement(24628..25095) 
       /locus_tag="MRA_0021" 
CDS    complement(24628..25095) 
       /locus_tag="MRA_0021" 
       /codon_start=1 
       /transl_table=11 
       /product="hypothetical protein" 
       /protein_id="ABQ71739.1" 
       /db_xref="GI:148503930" 
gene   complement(25219..26802) 
       /locus_tag="MRA_0022" 
CDS    complement(25219..26802) 
       /locus_tag="MRA_0022" 
       /codon_start=1 
       /transl_table=11 
       /product="hypothetical protein" 
       /protein_id="ABQ71740.1" 
       /db_xref="GI:148503931"

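I want to read the text file into MATLAB and build a list in which each item starts at a gene line and holds the information the file gives for that gene. For this example, the list would have 3 items. I have tried a few things and cannot get this to work. Does anyone have an idea of what I could do?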

2010-06-13 Ben Fossen

Here is a quick suggestion for an algorithm:

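1. Open the file with fopen.
2. Read lines with fgetl until you find a line that starts with 'CDS'.
3. Keep reading lines until you reach another line that starts with 'gene'.
4. For all the lines between (2) and (3):
   - Find the string between '/' and '='. This is the field name.
   - Find the string between the quotes. This is the value of the field.
5. Increment a counter by one and start again at step 2, until you have finished reading the file.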

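These commands may be helpful:

- To find a string enclosed by specific characters, use for example regexp(lineThatHasBeenRead,'/(.+)=','tokens','once')
- To build the output structure, use dynamic field names, for example output(ct).(fieldname) = value;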
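
As a small illustration of those two commands on a single line taken from the example file, the sketch below extracts the field name and the quoted value and stores them with a dynamic field name. The variable names `s` and `ct` are placeholders, and the pattern uses `\w+` instead of `.+` so that an '=' inside a quoted value cannot be swallowed into the field name; that substitution is my assumption, not part of the suggestion above.

```matlab
% One feature line from the example file
lineThatHasBeenRead = '       /locus_tag="MRA_0020"';

% The token between '/' and '=' is the field name.
% With 'tokens' and 'once', regexp returns a 1x1 cell holding the token.
fieldname = regexp(lineThatHasBeenRead, '/(\w+)=', 'tokens', 'once');

% The token between the double quotes is the value.
value = regexp(lineThatHasBeenRead, '"([^"]*)"', 'tokens', 'once');

% Dynamic field name: same effect as s(ct).locus_tag = 'MRA_0020'
ct = 1;
s = struct();
s(ct).(fieldname{1}) = value{1};

disp(s(ct).locus_tag)   % displays MRA_0020
```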

EDIT

Here is some code. I saved your example as 'test.txt'.

% open file
fid = fopen('test.txt');
if fid == -1
    error('Could not open test.txt');
end

% parse the file
eof = false;
geneCt = 1;
clear output % you cannot reassign output if it exists with different fieldnames already
output(1:1000) = struct; % you may want to initialize fields here
while ~eof
    % read lines till we find one with CDS (or run out of file)
    foundCDS = false;
    while ~foundCDS && ~eof
        currentLine = fgetl(fid);
        % check for eof, then CDS. Allow optional whitespace at the beginning
        if ~ischar(currentLine)
            % end of file: fgetl returns -1 instead of a string
            eof = true;
        elseif ~isempty(regexp(currentLine,'^\s*CDS','match','once'))
            foundCDS = true;
        end
    end % looking for CDS

    if ~eof

        % read (and remember) lines till we find 'gene'
        collectedLines = cell(1,20); % assume no more than 20 lines per gene. Row vector for looping below
        foundGene = false;
        lineCt = 1;
        while ~foundGene
            currentLine = fgetl(fid);
            % check for eof, then gene. Allow optional whitespace at the beginning
            if ~ischar(currentLine)
                % end of file - consider all data has been read
                eof = true;
                foundGene = true;
            elseif ~isempty(regexp(currentLine,'^\s*gene','match','once'))
                foundGene = true;
            else
                collectedLines{lineCt} = currentLine;
                lineCt = lineCt + 1;
            end
        end

        % loop through collectedLines and assign. Do not loop through the
        % gene line
        for lineCell = collectedLines(1:lineCt-1)
            % trim so that trailing blanks do not defeat the '$' anchor
            thisLine = strtrim(lineCell{1});
            fieldname = regexp(thisLine,'^/(\w+)=','tokens','once');
            if isempty(fieldname)
                continue % not a /name=value line
            end
            value = regexp(thisLine,'="?([^"]*)"?$','tokens','once');
            % try converting value to number
            numValue = str2double(value);
            if isfinite(numValue)
                value = numValue;
            else
                value = value{1};
            end
            output(geneCt).(fieldname{1}) = value;
        end
        geneCt = geneCt + 1;
    end
end % while ~eof

% cleanup
fclose(fid);
output(geneCt:end) = [];
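
The result is a struct array with one element per CDS block, which can be treated as the requested list. The sketch below shows a few ways such an array might be queried. To keep it self-contained it builds a small hypothetical stand-in for `output` by hand (two entries, with values copied from the example file) rather than relying on the parser having run; the names `tags`, `hasName` and `named` are placeholders.

```matlab
% Hypothetical stand-in for the parser's result: a 1x2 struct array.
% The second entry has no /gene qualifier, so that field is empty.
output = struct('locus_tag',   {'MRA_0020', 'MRA_0021'}, ...
                'gene',        {'ppp', []}, ...
                'codon_start', {1, 1});

% Collect one field across all entries into a cell array
tags = {output.locus_tag};

% Logical mask of the entries that actually have a gene name
hasName = ~cellfun(@isempty, {output.gene});

% Keep only the named entries
named = output(hasName);

fprintf('%d entries, %d with a gene name\n', numel(output), numel(named));
```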

2010-06-14 04:47:50 Jonas
