2013-08-29 238 views

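VHDL matrix multiplication

Background: I am trying to write a behavioral file that multiplies three matrices. To debug it, I am first checking whether I can read in the input matrix and then output the intermediate matrix.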

Behavioral file:

LIBRARY ieee; 
USE ieee.std_logic_1164.ALL; 

entity DCT_beh is 
    port (
      Clk :   in std_logic; 
      Start :   in std_logic; 
      Din :   in INTEGER; 
      Done :   out std_logic; 
      Dout :   out INTEGER 
     ); 
end DCT_beh; 

architecture behavioral of DCT_beh is 
begin 
    process 
      type RF is array (0 to 7, 0 to 7) of INTEGER; 

      variable i, j, k  : INTEGER; 
      variable InBlock  : RF; 
      variable COSBlock  : RF; 
      variable TempBlock  : RF; 
      variable OutBlock  : RF; 
      variable A, B, P, Sum : INTEGER; 

    begin 

      COSBlock := ( 
    (125, 122, 115, 103, 88,  69,  47,  24 ), 
    (125, 103, 47,  -24, -88, -122, -115, -69 ), 
    (125, 69,  -47, -122, -88, 24,  115, 103 ), 
    (125, 24,  -115, -69, 88,  103, -47, -122 ), 
    (125, -24, -115, 69,  88,  -103, -47, 122 ), 
    (125, -69, -47, 122, -88, -24, 115, -103 ), 
    (125, -103, 47,  24,  -88, 122, -115, 69 ), 
    (125, -122, 115, -103, 88,  -69, 47,  -24 ) 
        ); 

--Starting 
    wait until Start = '1'; 
     Done <= '0'; 

--Read Input Data 
    for i in 0 to 7 loop 
     for j in 0 to 7 loop  
      wait until Clk = '1' and clk'event; 
      InBlock(i,j) := Din; 
     end loop; 
    end loop; 

--TempBlock = COSBLOCK * InBlock 

    for i in 0 to 7 loop 
     for j in 0 to 7 loop 
      Sum := 0; 
      for k in 0 to 7 loop 
       A := COSBlock(i, k); 
       B := InBlock(k, j); 
       P := A * B; 
       Sum := Sum + P; 
       if(k = 7) then 
       TempBlock(i, j) := Sum; 
       end if; 
      end loop; 
     end loop; 
    end loop; 


--Finishing 

    wait until Clk = '1' and Clk'event; 
    Done <= '1'; 

--Output Data 

    for i in 0 to 7 loop 
     for j in 0 to 7 loop 
      wait until Clk = '1' and Clk'event; 
      Done <= '0'; 
      Dout <= tempblock(i,j); 
     end loop; 
    end loop; 
end process;  
end behavioral; 

Testbench file:

LIBRARY ieee; 
USE ieee.std_logic_1164.ALL; 

-- Uncomment the following library declaration if using 
-- arithmetic functions with Signed or Unsigned values 
--USE ieee.numeric_std.ALL; 

ENTITY lab4b_tb IS 
END lab4b_tb; 

ARCHITECTURE behavior OF lab4b_tb IS 

-- Component Declaration for the Unit Under Test (UUT) 

COMPONENT DCT_beh 
PORT(
    Clk : IN std_logic; 
    Start : IN std_logic; 
    Din : IN INTEGER; 
    Done : OUT std_logic; 
    Dout : OUT INTEGER 
    ); 
END COMPONENT; 


    --Inputs 
    signal Clk : std_logic := '0'; 
    signal Start : std_logic := '0'; 
    signal Din : INTEGER; 

--Outputs 
    signal Done : std_logic; 
    signal Dout : INTEGER; 

    -- Clock period definitions 
    constant Clk_period : time := 10 ns; 

BEGIN 

-- Instantiate the Unit Under Test (UUT) 
    uut: DCT_beh PORT MAP (
     Clk => Clk, 
     Start => Start, 
     Din => Din, 
     Done => Done, 
     Dout => Dout 
    ); 

    -- Clock process definitions 
    Clk_process :process 
    begin 
    Clk <= '0'; 
    wait for Clk_period/2; 
    Clk <= '1'; 
    wait for Clk_period/2; 
    end process; 


    -- Stimulus process 
    stim_proc: process 

variable i, j : INTEGER; 
variable cnt : INTEGER; 

    begin  
    -- hold reset state for 100 ns. 

    wait for 100 ns; 

     start <= '1'; 
     wait for clk_period; 
     start <= '0'; 

    for cnt in 0 to 63 loop 
     wait until clk = '1' and clk'event; 
      din <= cnt; 
     end loop; 

     --wait for 100 ns; 

     --start <= '1'; 
     --wait for clk_period; 
     --start <= '0'; 

     --for i in 0 to 63 loop 
      -- wait for clk_period; 
      --if (i < 24) then 
       --din <= 255; 
      --elsif (i > 40) then 
       --din <= 255; 
      --else 
       --din <= 0; 
      --end if; 
     --end loop; 


    wait; 
    end process; 

END; 

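What I am doing: when Start = 1, the matrix is read into the input block. In this case the matrix is simply filled with unique, incrementing values from 0 to 63. Then, when Done = 1, the output block, which is the product matrix, is written out. The problem is that in my simulation I get some of the values that should be in the final matrix, but they are not in the right order. For example, the line below contains the first row of the product matrix, TempBlock: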

14464.000 15157.000 15850.000 16543.000 17236.000 17929.000 18622.000 19315.000 

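As you can see in the picture of my simulation, I get some of these values, but then the signal goes to some strangely large values. It looks as if din(0), din(1), din(2) ... din(n) do not correspond to InBlock(0,0), InBlock(0,1), InBlock(0,2), and so on. However, I have checked my behavioral file thoroughly and found nothing wrong. Is there something wrong with my testbench design?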
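As a cross-check on the expected row quoted above, here is a minimal self-checking sketch. The entity name first_row_check and the constant COS_ROW0 are made up for this illustration and are not part of the original design. It assumes Din counts from 0 to 63 and is stored row by row, so that InBlock(k, j) = 8*k + j, and it recomputes row 0 of COSBlock * InBlock:

```vhdl
entity first_row_check is
end entity first_row_check;

architecture sim of first_row_check is
    type row_t is array (0 to 7) of integer;
    -- Row 0 of COSBlock, copied from the behavioral file above
    constant COS_ROW0 : row_t := (125, 122, 115, 103, 88, 69, 47, 24);
begin
    check : process
        variable sum : integer;
    begin
        for j in 0 to 7 loop
            sum := 0;
            for k in 0 to 7 loop
                -- Assumption: InBlock(k, j) = 8*k + j (Din = 0..63, row-major)
                sum := sum + COS_ROW0(k) * (8 * k + j);
            end loop;
            report "TempBlock(0," & integer'image(j) & ") = " & integer'image(sum);
            -- Row 0 starts at 14464 and grows by 693 (the sum of COS_ROW0) per column
            assert sum = 14464 + 693 * j
                report "unexpected value in column " & integer'image(j)
                severity error;
        end loop;
        wait;
    end process check;
end architecture sim;
```

Under those assumptions every entry works out to 14464 + 693*j, which matches the row quoted above. That suggests the multiplication itself is fine and that the issue lies in how the data are transferred in and out.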

Testbench: bottom signals are unsigned values

EDIT: I need help getting the right output with this stimulus:

 din<=0; 


    for i in 0 to 63 loop 
     wait until clk = '1' and clk'event; 
     if i = 0 then 
      Start <= '1','0' after clk_period; 
      end if; 
      if (i < 24) then 
       din <= 255; 
      elsif (i > 40) then 
       din <= 255; 
      else 
       din <= 0; 
      end if; 

    end loop; 

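I thought this would be similar to the code in the answer, but I run into exactly the same problem. How can this be fixed? Here is a picture of the current output. The correct values are there, but they are offset by one clock cycle.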

FINAL EDIT: Solved it myself. The problem was in the loop bounds.

Impossibly large or negative? – user1155120

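It seems very large compared with the other values in the resulting matrix. The first [0][0] value does appear, but only after the first two clock cycles. That leads me to believe the timing is off by two clock cycles, which causes the whole OutBlock to be displayed incorrectly. – user1766888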

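Note that the image you attached shows wave add -radix unsigned/dout/din. In ghdl, the simulation crashes at the rising clock edge at 735 ns, where your model departs from normal operation. I get a floating point exception, probably range related. I have an experimental VHDL simulator that shows -2147469752 at 765 ns, then 14464, 15157, 15850, 16543 and 17236 on successive clocks up to 820 ns. – user1155120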

Answers

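Here is what appears to be a working version of the model, along with its testbench.

Added (and updated)

If you make the matrix multiply take real time (clocks), you see Done delayed by the number of clocks the multiply takes. I arbitrarily chose two clocks to show the benefit of the added register file.

I will comment on the interesting parts of the code.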

LIBRARY ieee; 
USE ieee.std_logic_1164.ALL; 

ENTITY lab4b_tb IS 
END lab4b_tb; 

ARCHITECTURE behavior OF lab4b_tb IS 

    signal Clk:  std_logic := '0'; -- no reset 
    signal Start: std_logic := '0'; -- no reset 
    signal Din:  INTEGER  := 0;  -- no reset 

    signal Done : std_logic; 
    signal Dout : INTEGER; 

    constant Clk_period : time := 10 ns; 

BEGIN 

    uut: entity work.DCT_beh -- DCT_beh 
     PORT MAP (
      Clk => Clk, 
      Start => Start, 
      Din => Din, 
      Done => Done, 
      Dout => Dout 
    ); 

CLOCK: 
    process 
    begin 
     Clk <= '0'; 
     wait for Clk_period/2; 
     Clk <= '1'; 
     wait for Clk_period/2; 
    end process; 

STIMULUS: 
    process 
     variable i, j : INTEGER; 
     variable cnt : INTEGER; 
    begin  

     wait until clk = '1' and clk'event; -- sync Start to clk 

FIRST_BLOCK_IN: 
     Start <= '1','0' after 11 ns; --issued same time as datum 0 
     for i in 0 to 63 loop 
       if (i < 24) then 
        din <= 255; 
       elsif (i > 40) then 
        din <= 255; 
       else 
        din <= 0; 
       end if; 
       wait until clk = '1' and clk'event; 
     end loop; 
SECOND_BLOCK_IN: 
     Start <= '1','0' after 11 ns; -- with first datum 
     for cnt in 0 to 63 loop 
      din <= cnt; 
      wait until clk = '1' and clk'event; 
     end loop; 
     din <= 0; -- to show the last input datum clearly 

     wait; 
    end process; 

END ARCHITECTURE; 

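The two input blocks are your new block values followed by your original block values; the original values (0 to 63) serve as an index into the output block. The second block also shows the same answers as your original, which validates the Done handshake.

Note that Start is issued in parallel with the first datum of each block.

I also adjusted the input stimulus to start on a clock boundary, so that the first Start does not show up on the falling edge of the clock.

Where pulses are generated asynchronously, I stretched them to just over one clock period (the 11 ns in the code) to make sure they are seen at a clock edge, since they are not generated on a clock edge.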
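To illustrate the pulse stretching described above, here is a small stand-alone sketch. The entity pulse_stretch_demo and its signal names are hypothetical and are not taken from the model. It launches an 11 ns pulse at a time that is not aligned to the 10 ns clock and reports every rising edge that sees the pulse high:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pulse_stretch_demo is
end entity pulse_stretch_demo;

architecture sim of pulse_stretch_demo is
    signal clk   : std_logic := '0';
    signal pulse : std_logic := '0';
begin
    clk_gen : process
    begin
        for n in 1 to 4 loop            -- rising edges at 5, 15, 25 and 35 ns
            clk <= '0'; wait for 5 ns;
            clk <= '1'; wait for 5 ns;
        end loop;
        wait;                           -- stop the clock so the run ends by itself
    end process clk_gen;

    launch : process
    begin
        wait for 7 ns;                  -- deliberately not aligned to a clock edge
        pulse <= '1', '0' after 11 ns;  -- high from 7 ns to 18 ns
        wait;
    end process launch;

    sample : process (clk)
    begin
        if rising_edge(clk) and pulse = '1' then
            report "pulse seen at " & time'image(now);
        end if;
    end process sample;
end architecture sim;
```

Here only the 15 ns edge samples the pulse. Because the pulse is longer than one clock period, a launch just before an edge (for example at 14.5 ns) would be sampled at two edges, so it is safest to launch such pulses right after a clock edge, as the testbench above does.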

LIBRARY ieee; 
USE ieee.std_logic_1164.ALL; 

entity DCT_beh is 
    port (
     Clk :   in std_logic; 
     Start :   in std_logic; 
     Din :   in INTEGER; 
     Done :   out std_logic; 
     Dout :   out INTEGER 
    ); 

end DCT_beh; 

architecture behavioral of DCT_beh is 
    type RF is array (0 to 7, 0 to 7) of INTEGER; 
    signal OutBlock:   RF; 
    signal InBlock:    RF; 
    signal internal_Done:  std_logic := '0'; -- no reset 
    signal Input_Ready:   std_logic := '0'; -- no reset 
    signal done_detected:  std_logic := '0'; -- no reset 
    signal input_rdy_detected: std_logic := '0'; -- no reset 
    signal last_out:   std_logic := '0'; -- no reset 

begin 
INPUT_DATA: 
    process 
    begin 
     wait until Start = '1'; 
     --Read Input Data 
     for i in 0 to 7 loop 
      for j in 0 to 7 loop  
       wait until Clk = '1' and clk'event; 
       InBlock(i,j) <= Din; 
       if i=7 and j=7 then 
        Input_Ready <= '1', '0' after 11 ns; 
       end if; 
      end loop; 
     end loop; 
    end process; 

WAIT_FOR_InBlock: 
    process 
    begin 
     wait until clk = '1' and clk'event; 
     input_rdy_detected <= Input_Ready; 
     --InBlock valid after the following rising edge of clk 
    end process; 

TRANSFORM: 
    process 
      variable InpBlock  : RF; 
      constant COSBlock  : RF := 
      ( 
       (125, 122, 115, 103, 88,  69,  47,  24 ), 
       (125, 103, 47, -24, -88, -122, -115,  -69 ), 
       (125, 69, -47, -122, -88,  24, 115,  103 ), 
       (125, 24, -115, -69, 88, 103, -47, -122 ), 
       (125, -24, -115,  69, 88, -103, -47,  122 ), 
       (125, -69, -47, 122, -88, -24, 115, -103 ), 
       (125, -103, 47,  24, -88, 122, -115,  69 ), 
       (125, -122, 115, -103, 88, -69,  47,  -24 ) 
      ); 
      variable TempBlock  : RF; 
      variable A, B, P, Sum : INTEGER; 
    begin 

     if input_rdy_detected = '0' then 
      wait until input_rdy_detected = '1'; 
     end if; 

     InpBlock := InBlock; -- Broadside dump or swap 

--TempBlock = COSBLOCK * InBlock 

-- arbitrarily make matrix multiply 2 clocks long  
     wait until clk = '1' and clk'event; -- 1st xfm clock 

     for i in 0 to 7 loop 
      for j in 0 to 7 loop 
       Sum := 0; 
       for k in 0 to 7 loop 
        A := COSBlock(i, k); 
        B := InpBlock(k, j); 
        P := A * B; 
        Sum := Sum + P; 
        if(k = 7) then 
         TempBlock(i, j) := Sum; 
        end if; 
       end loop; 
      end loop; 
     end loop; 

    -- Done issued in clk cycle of last TempBlock(i, j) := Sum; 

     internal_Done <= '1', '0' after 11 ns; 
     wait until clk = '1' and clk'event; -- 2nd xfrm clk 
     -- OutBlock available after last TempBlock value stored 

     OutBlock <= TempBlock; -- Broadside dump or swap 
    end process; 

Done_BUFFER: 
    Done <= internal_Done; 


WAIT_FOR_OutBlock: 
    process 
    begin 
     wait until clk = '1' and clk'event; 
     done_detected <= internal_Done; 
     -- Done can come either before the first output_data transfer 
     -- or during the last output data transfer 
     -- this gives us the clock delay to finish the last xfm transfer to 
     -- TempBlock(i, j) 
     -- Technically part of the output process but was too cumbersome to write 
    end process; 

OUTPUT_DATA: 
    process 
    begin 
     -- OutBlock is valid after clock edge when Done is true 
     for i in 0 to 7 loop 
      for j in 0 to 7 loop 

       if i = 0 and j = 0 then 

        if done_detected = '0' then 
         wait until done_detected = '1'; 
        end if; 
       end if; 

       Dout <= OutBlock(i,j);       
       wait until clk = '1' and clk'event; 
      end loop; 
     end loop; 
    end process; 

end behavioral; 

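The type declaration for RF has been moved to the architecture declarative part to allow interprocess communication via signals. The input loop, the matrix multiply, and the output loop each have their own process. I also added interprocess handshaking (Input_Ready and internal_Done (Done), plus the signals input_rdy_detected and done_detected).

Where a process can take 64 clocks, there is a signal that flags its last datum (Input_Ready, and potentially Done). The code would be very messy otherwise, and you would still need the flip-flops.

An extra RF between the input process and the multiply process allows concurrent operation when the matrix multiply takes real time (2 clocks in this example; I did not want to stretch the waveform out too far).

Some of the handshake delay appears to be related to coding style, and is taken care of by the input_rdy_detected and done_detected flip-flops.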
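The interprocess communication described above can be reduced to a much smaller sketch. The names handshake_sketch, vec_t, data_rf and ready are invented for this illustration. The array type and the signals live in the architecture declarative part, one process fills the array one element per clock, and a second process waits for a flag before reading it:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity handshake_sketch is
end entity handshake_sketch;

architecture sim of handshake_sketch is
    -- Declared in the architecture so that every process can see them
    type vec_t is array (0 to 3) of integer;
    signal data_rf : vec_t     := (others => 0);
    signal ready   : std_logic := '0';
    signal clk     : std_logic := '0';
begin
    clk_gen : process
    begin
        for n in 1 to 6 loop            -- finite clock so the run ends by itself
            clk <= '0'; wait for 5 ns;
            clk <= '1'; wait for 5 ns;
        end loop;
        wait;
    end process clk_gen;

    producer : process
    begin
        for i in data_rf'range loop
            wait until rising_edge(clk);
            data_rf(i) <= 10 * i;       -- one element per clock
        end loop;
        ready <= '1';                   -- scheduled together with the last write
        wait;
    end process producer;

    consumer : process
        variable sum : integer := 0;
    begin
        wait until ready = '1';         -- resumes once ready and the last element are updated
        for i in data_rf'range loop
            sum := sum + data_rf(i);
        end loop;
        assert sum = 60
            report "unexpected sum " & integer'image(sum)
            severity error;
        report "sum = " & integer'image(sum);
        wait;
    end process consumer;
end architecture sim;
```

The model above follows the same pattern, with InBlock and OutBlock as the shared register files and Input_Ready and internal_Done as the flags, plus one flip-flop stage on each flag.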

The first waveform shows the first output data after the transform process, which now takes two clocks, shown between the A and B markers.

Two Clock Matrix Multiply

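You can see that the first output datum following Done is 78540, not the 110415 shown in your waveform screenshot. One of us is showing the wrong value. This version of DCT_beh strictly transfers RF values only after the last datum has been loaded.

Before cleaning up the handshake between the input process and the multiply process, I did get the value 110415. Tracing it through TempBlock or OutBlock would take a lot of work.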
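The two candidate values can be compared with a direct calculation. The sketch below is an illustration only; the entity first_datum_check and its names are hypothetical. It assumes the 255/0/255 stimulus is stored row by row, so that InBlock(k, 0) is input datum 8*k, and it computes TempBlock(0,0) from row 0 of COSBlock:

```vhdl
entity first_datum_check is
end entity first_datum_check;

architecture sim of first_datum_check is
    type row_t is array (0 to 7) of integer;
    -- Row 0 of COSBlock, copied from the model above
    constant COS_ROW0 : row_t := (125, 122, 115, 103, 88, 69, 47, 24);
begin
    check : process
        variable idx     : integer;
        variable din_val : integer;
        variable sum     : integer := 0;
    begin
        for k in 0 to 7 loop
            idx := 8 * k;               -- assumed position of InBlock(k, 0) in the input stream
            if idx < 24 or idx > 40 then
                din_val := 255;         -- same rule as the testbench stimulus
            else
                din_val := 0;
            end if;
            sum := sum + COS_ROW0(k) * din_val;
        end loop;
        report "TempBlock(0,0) = " & integer'image(sum);
        assert sum = 110415
            report "expected 255 * 433" severity error;
        -- The same sum with the very first datum read as 0 instead of 255
        assert sum - COS_ROW0(0) * 255 = 78540
            report "expected 255 * 308" severity error;
        wait;
    end process check;
end architecture sim;
```

Under that assumption the straightforward product is 255 * 433 = 110415, and 78540 = 255 * 308 is what results if InBlock(0,0) is taken as 0 rather than 255. This is consistent with the first datum of the first block not being captured, although the sketch alone does not show where in the handshake that would happen.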

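Now the good news. The second input block comes from your original stimulus, whose input values provide a good index for the output transfer. Those output data values all appear correct.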

2nd Block Done and start of 2nd block output

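input_rdy_detected and done_detected happen to be signals that show the first transaction of their respective downstream processes. I added a trailing din signal assignment at the end of the second input block to avoid confusion.

I also produced a waveform that approximates your screenshot; I could not do a selected zoom, so I used successive approximation instead.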

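You only need to run the simulation out to 1955 ns to capture the last datum of the 2nd block being output.

This was done on a Mac running OS X 10.8.4, using Tristan Gingold's ghdl and Tony Bybell's gtkwave.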

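Thanks for the answer. Do you know how I could change the edited code above? I have exactly the same problem, but with a different set of instructions. – user1766888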

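I edited a copy of my model to operate the three stages in a pipeline (and in separate processes). In doing so I could tell that you need at least one clock (in the simulation cycle, Done is synonymous with the last write to the register file). So I added a second clock to show the boundaries at work. – user1155120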