2017-02-03

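I'm looking to make use of the chaos testing feature that ships with Service Fabric. I've set up my code as described in the documentation: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-controlled-chaos. However, running the StartChaosAsync method appears to have no noticeable effect on the Service Fabric cluster.

The problem is that I almost never see any fault events against my cluster. Here is a sample of the console output from running the chaos sample code: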

[StartedEvent], Timestamp=03/02/2017 16:32:21
ChaosParameters: maxClusterStabilizationTimeout=00:00:30, waitTimeBetweenFaults=00:00:20, waitTimeBetweenIterations=00:00:30, maxConcurrentFaults=3, timeToRun=01:00:00, enableMoveReplicas=True, Context:
ClusterHealthPolicy=ClusterHealthPolicy: MaxPercentUnhealthyNodes=0, MaxPercentUnhealthyApplications=0, ConsiderWarningAsError=True

[ExecutingFaultsEvent], Timestamp=03/02/2017 16:32:26
0 Faults:

[ExecutingFaultsEvent], Timestamp=03/02/2017 16:33:00
0 Faults:

[ExecutingFaultsEvent], Timestamp=03/02/2017 16:33:33
0 Faults:

[ExecutingFaultsEvent], Timestamp=03/02/2017 16:34:06
0 Faults:

[ExecutingFaultsEvent], Timestamp=03/02/2017 16:34:40
0 Faults:

[ExecutingFaultsEvent], Timestamp=03/02/2017 16:35:13

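Am I missing any configuration?

I get these results against both my local cluster and my Azure cluster. I have also tried both the C# and PowerShell examples, and they give the same result.

I have only seen this work once (locally), when each [ExecutingFaultsEvent] restarted a node. Should I also be seeing multiple types of fault here?

Thanks in advance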


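Can you also add the current health state of the entities in the cluster? If there are any warnings and ConsiderWarningAsError is set, then Chaos will consider things unhealthy and won't move them. – masnider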


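I have tried setting "ConsiderWarningsAsError" to true and have confirmed that all entities are healthy, but I still see the same problem every time I run this code. Is there anywhere I can see logs to help diagnose this? –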


Did you get anywhere with this? I'm seeing a similar issue with my Azure Service Fabric cluster. – Kramer00
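
The health question raised in the comments above can be checked from code before starting Chaos. The following is a minimal sketch, assuming the project references the System.Fabric client library; the class and method names (`ClusterHealthCheck`, `PrintUnhealthyEntitiesAsync`) are illustrative and do not come from the original post. It prints the aggregated cluster health and lists any node or application that is not in the Ok state, which is the kind of entity that could cause Chaos to skip faults under a strict health policy.

```csharp
using System;
using System.Fabric;
using System.Fabric.Health;
using System.Linq;
using System.Threading.Tasks;

// Illustrative helper, not part of the Service Fabric SDK.
public static class ClusterHealthCheck
{
    // Prints the cluster's aggregated health, then every node and
    // application whose aggregated health state is not Ok.
    public static async Task PrintUnhealthyEntitiesAsync(FabricClient client)
    {
        ClusterHealth health = await client.HealthManager.GetClusterHealthAsync();
        Console.WriteLine($"Cluster: {health.AggregatedHealthState}");

        foreach (var node in health.NodeHealthStates
                     .Where(n => n.AggregatedHealthState != HealthState.Ok))
        {
            Console.WriteLine($"Node {node.NodeName}: {node.AggregatedHealthState}");
        }

        foreach (var app in health.ApplicationHealthStates
                     .Where(a => a.AggregatedHealthState != HealthState.Ok))
        {
            Console.WriteLine($"Application {app.ApplicationName}: {app.AggregatedHealthState}");
        }
    }
}
```

If this prints Warning or Error entries, a policy with ConsiderWarningAsError=True and both MaxPercentUnhealthy values at 0 (as in the console output in the question) would plausibly treat the cluster as unhealthy.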

Answer

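The problem is with the provided sample code (and, more generally, with the lack of useful samples in this area...). Also, for instant gratification (seeing chaos happen without waiting too long...), you need to be more aggressive than the documentation sample (which, again, doesn't really work, as you have found...).

You are better served by using a different overload of the ChaosParameters constructor...

Try this (replace the sample code with this implementation):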

    var startTimeUtc = DateTime.UtcNow;
    var stabilizationTimeout = TimeSpan.FromSeconds(30.0);
    var timeToRun = TimeSpan.FromMinutes(60.0);
    var maxConcurrentFaults = 7;
    var timeBetweenFaults = new TimeSpan(0, 0, 10);
    var timeBetweenIterations = new TimeSpan(0, 0, 10);
    Dictionary<string, string> _context = new Dictionary<string, string>();
    // Aggressive chaos...
    var clusterHealthPolicy = new System.Fabric.Health.ClusterHealthPolicy()
    {
        MaxPercentUnhealthyApplications = 90,
        MaxPercentUnhealthyNodes = 100
    };

    var parameters = new ChaosParameters(
        stabilizationTimeout,
        maxConcurrentFaults,
        true, /* EnableMoveReplicaFault */
        timeToRun,
        _context,
        timeBetweenIterations,
        timeBetweenFaults,
        clusterHealthPolicy);

Note: I suggest you do this in a new static async function that returns a Task...

Full (working) example:

    using System;
    using System.Collections.Generic;
    using System.Fabric;
    using System.Fabric.Chaos.DataStructures;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    // Place this method inside a class of your choice.
    public static async Task RunChaos()
    {
        var clusterConnectionString = "localhost:19000";
        using (var client = new FabricClient(clusterConnectionString))
        {
            var startTimeUtc = DateTime.UtcNow;
            var stabilizationTimeout = TimeSpan.FromSeconds(30.0);
            var timeToRun = TimeSpan.FromMinutes(60.0);
            var maxConcurrentFaults = 7;
            var timeBetweenFaults = new TimeSpan(0, 0, 10);
            var timeBetweenIterations = new TimeSpan(0, 0, 10);
            Dictionary<string, string> _context = new Dictionary<string, string>();
            // Aggressive chaos: a lenient health policy so faults keep being induced.
            var clusterHealthPolicy = new System.Fabric.Health.ClusterHealthPolicy()
            {
                MaxPercentUnhealthyApplications = 90,
                MaxPercentUnhealthyNodes = 100
            };

            var parameters = new ChaosParameters(
                stabilizationTimeout,
                maxConcurrentFaults,
                true, /* EnableMoveReplicaFault */
                timeToRun,
                _context,
                timeBetweenIterations,
                timeBetweenFaults,
                clusterHealthPolicy);

            var token = CancellationToken.None;

            try
            {
                await client.TestManager.StartChaosAsync(parameters, new TimeSpan(0, 30, 0), token);
            }
            catch (FabricChaosAlreadyRunningException)
            {
                Console.WriteLine("An instance of Chaos is already running in the cluster.");
            }

            var filter = new ChaosReportFilter(startTimeUtc, DateTime.MaxValue);

            // ChaosEventComparer is not defined here; it is the
            // IEqualityComparer<ChaosEvent> from the documentation sample.
            var eventSet = new HashSet<ChaosEvent>(new ChaosEventComparer());

            while (true)
            {
                var report = await client.TestManager.GetChaosReportAsync(filter);

                // Print only the events that have not been seen before.
                foreach (var chaosEvent in report.History)
                {
                    if (eventSet.Add(chaosEvent))
                    {
                        Console.WriteLine(chaosEvent);
                    }
                }

                // When Chaos stops, a StoppedEvent is created.
                // If a StoppedEvent is found, exit the loop.
                var lastEvent = report.History.LastOrDefault();

                if (lastEvent is StoppedEvent)
                {
                    break;
                }

                // Await the delay instead of blocking the thread.
                await Task.Delay(TimeSpan.FromSeconds(1.0));
            }
        }
    }
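
The example above passes a `ChaosEventComparer` to the `HashSet` but does not define it. A minimal sketch of such a comparer follows, modelled on the one in the documentation sample and assuming that two events with the same `TimeStampUtc` can be treated as the same event. It is declared here as a top-level internal class for convenience, which is a choice made for this sketch.

```csharp
using System.Collections.Generic;
using System.Fabric.Chaos.DataStructures;

// Treats two Chaos events as equal when their UTC timestamps match,
// so the HashSet in RunChaos prints each event only once.
internal sealed class ChaosEventComparer : IEqualityComparer<ChaosEvent>
{
    public bool Equals(ChaosEvent x, ChaosEvent y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.TimeStampUtc.Equals(y.TimeStampUtc);
    }

    public int GetHashCode(ChaosEvent obj)
    {
        return obj.TimeStampUtc.GetHashCode();
    }
}
```

With this class in the same project, `new HashSet<ChaosEvent>(new ChaosEventComparer())` compiles as written in the example.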