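"DebugInfo for CritSec does not point back to the critical section" when analyzing a deadlock
I am using WinDbg to analyze a deadlock that occurs in a DataSnap application server written in Delphi.
When I run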
!analyze -hang -v
I get this:
0:000:x86> !analyze -hang -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

GetPageUrlData failed, server returned HTTP status 404
URL requested: http://watson.microsoft.com/00000000.htm?Retriage=1

FAULTING_IP:
+6ced240
00000000 ?? ???

EXCEPTION_RECORD: ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 0000000000000000
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 0

FAULTING_THREAD: 0000000000000000
BUGCHECK_STR: HANG
DEFAULT_BUCKET_ID: APPLICATION_HANG
PROCESS_NAME: ********.exe
ERROR_CODE: (NTSTATUS) 0xcfffffff -
EXCEPTION_CODE: (NTSTATUS) 0xcfffffff -
MOD_LIST:
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0

DERIVED_WAIT_CHAIN:
Dl Eid Cid WaitType
-- --- ------- --------------------------
0 c7c.2634 Critical Section

WAIT_CHAIN_COMMAND: ~0s;k;;
BLOCKING_THREAD: 0000000000002634
PRIMARY_PROBLEM_CLASS: APPLICATION_HANG
LAST_CONTROL_TRANSFER: from 0000000077138df4 to 000000007711f8b1

STACK_TEXT:
0018fc50 77138df4 00000c6c 00000000 00000000 ntdll_77100000!NtWaitForSingleObject+0x15
0018fcb4 77138cd8 00000000 00000000 03fe0940 ntdll_77100000!RtlpWaitOnCriticalSection+0x13e
0018fcdc 7369324f 736a3134 00000000 03fe0940 ntdll_77100000!RtlEnterCriticalSection+0x150
WARNING: Stack unwind information not available. Following frames may be wrong.
0018fcec 7369af5f 00000388 00000000 003d1e00 mswsock!GetLspGuid+0x19af
0018fd08 76366958 00000388 0018fd84 0018fd9c mswsock!GetLspGuid+0x96bf
0018fd38 0018fd58 763668cd 00000388 0018fd84 ws2_32!WSAAccept+0x84
00000000 00000000 00000000 00000000 00000000 0x18fd58

FOLLOWUP_IP:
mswsock!GetLspGuid+19af
7369324f 33db xor ebx,ebx

SYMBOL_STACK_INDEX: 3
SYMBOL_NAME: mswsock!GetLspGuid+19af
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: C:\Windows\System32\mswsock
IMAGE_NAME: lld
DEBUG_FLR_IMAGE_TIMESTAMP: 4ce7c83d
STACK_COMMAND: ~0s ; kb
FAILURE_BUCKET_ID: APPLICATION_HANG_cfffffff_lld!Unloaded
BUCKET_ID: X64_HANG_mswsock!GetLspGuid+19af
WATSON_STAGEONE_URL: http://watson.microsoft.com/00000000.htm?Retriage=1
Followup: MachineOwner
---------
Then I ran
!locks -V
to see which locks were being waited on. To my surprise, it returned this:
0:000:x86> !locks -V

CritSec ntdll!RtlCriticalSectionLock+0 at 0000000077057060
LockCount NOT LOCKED
RecursionCount 0
OwningThread 0
EntryCount 0
ContentionCount 0

CritSec ntdll!LdrpLoaderLock+0 at 0000000077057490
LockCount NOT LOCKED
RecursionCount 0
OwningThread 0
EntryCount 0
ContentionCount 0

CritSec ntdll!RtlpDynamicFunctionTableLock+0 at 0000000077057468
LockCount NOT LOCKED
RecursionCount 0
OwningThread 0
EntryCount 0
ContentionCount 0

CritSec ntdll!FastPebLock+0 at 000000007705a900
LockCount NOT LOCKED
RecursionCount 0
OwningThread 0
EntryCount 0
ContentionCount 0

CritSec ntdll!RtlpProcessHeapsListLock+0 at 000000007705a240
LockCount NOT LOCKED
RecursionCount 0
OwningThread 0
EntryCount 0
ContentionCount 0

CritSec +270208 at 0000000000270208
LockCount NOT LOCKED
RecursionCount 0
OwningThread 0
EntryCount 0
ContentionCount 1

CritSec ntdll!EtwProvCritSect+0 at 000000007705a120
LockCount NOT LOCKED
RecursionCount 0
OwningThread 0
EntryCount 0
ContentionCount 0

CritSec ntdll!EtwPrivSessionCritSect+0 at 000000007705a1e0
LockCount NOT LOCKED
RecursionCount 0
OwningThread 0
EntryCount 0
ContentionCount 0

CritSec +10208 at 0000000000010208
LockCount NOT LOCKED
RecursionCount 0
OwningThread 0
EntryCount 0
ContentionCount 0

CritSec +276f40 at 0000000000276f40
LockCount NOT LOCKED
RecursionCount 0
OwningThread 0
EntryCount 0
ContentionCount 0

Scanned 10 critical sections
Looking at the call stack
STACK_TEXT:
0018fc50 77138df4 00000c6c 00000000 00000000 ntdll_77100000!NtWaitForSingleObject+0x15
0018fcb4 77138cd8 00000000 00000000 03fe0940 ntdll_77100000!RtlpWaitOnCriticalSection+0x13e
0018fcdc 7369324f 736a3134 00000000 03fe0940 ntdll_77100000!RtlEnterCriticalSection+0x150
WARNING: Stack unwind information not available. Following frames may be wrong.
0018fcec 7369af5f 00000388 00000000 003d1e00 mswsock!GetLspGuid+0x19af
0018fd08 76366958 00000388 0018fd84 0018fd9c mswsock!GetLspGuid+0x96bf
0018fd38 0018fd58 763668cd 00000388 0018fd84 ws2_32!WSAAccept+0x84
00000000 00000000 00000000 00000000 00000000 0x18fd58
I determined that the thread is waiting on the critical section at address 0x736a3134 (the first parameter passed to RtlEnterCriticalSection), so I ran
!critsec 736a3134
which gave me this output:
0:000:x86> !critsec 736a3134

DebugInfo for CritSec at 00000000736a3134 does not point back to the critical section
NOT an initialized critical section.

CritSec mswsock!WSPStartup+6f64 at 00000000736a3134
WaiterWoken Yes
LockCount -1
RecursionCount 11028
OwningThread c6c
EntryCount 1f49dad6
ContentionCount 88000000
*** Locked
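Now the penny drops: the pointer to the critical section has become unreliable, possibly because of concurrent thread access and a lack of synchronization elsewhere in the code.
My question is: how do I track down where this happens, or find out whether it is a different problem altogether?
PS: This only happens when the application is under heavy load, with possibly on the order of 700 client connections.
(It uses one thread per connection. I know a 32-bit application is limited by default to approximately 2000 threads at the default stack size, and that this is not the best approach.)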
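For reference, the figure of roughly 2000 threads can be sanity-checked with simple arithmetic. The sketch below assumes a 2 GiB user-mode address space for a 32-bit process and a 1 MiB stack reservation per thread; both values are assumptions about typical defaults, not numbers taken from the dumps above, and the function name is made up for illustration.

```python
# Rough upper bound on thread count for a 32-bit process.
# Both constants are assumed typical defaults, not measured values.
USER_ADDRESS_SPACE = 2 * 1024**3      # assumed: 2 GiB of user-mode address space
DEFAULT_STACK_RESERVE = 1 * 1024**2   # assumed: 1 MiB reserved per thread stack


def max_threads(address_space: int, stack_reserve: int) -> int:
    """Upper bound if the whole address space held nothing but thread stacks."""
    return address_space // stack_reserve


upper_bound = max_threads(USER_ADDRESS_SPACE, DEFAULT_STACK_RESERVE)
print(upper_bound)
```

The real limit is lower, since code, heaps, and loaded DLLs share the same address space, so around 700 connection threads is well within the bound but not far from territory where address-space pressure could start to matter.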
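PPS: I have multiple crash dumps of the application hanging while waiting on different critical sections, and in each case the critical section pointer does not appear to point to a critical section.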
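Can you reproduce the error with the Delphi debugger attached? If so, you can inspect the call stack; I think later Delphi versions even have some deadlock information in the IDE. Another thing to mention is that critical sections are very sensitive to incorrect use; for example, a code snippet like: critSect.Leave; critSect.Enter; can have very bad side effects. If nothing else helps, I suggest you try two things: 1.) Use FastMM in full debug mode (heap corruption). 2.) Derive your own critical section class and count all Enter and Leave calls (possibly adding a caller parameter). – mrabat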
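The second suggestion (a wrapper that counts every Enter and Leave and takes a caller parameter) might look roughly like the sketch below. It is written in Python rather than Delphi purely to illustrate the bookkeeping; the class name `CountingLock`, its fields, and the `tag` parameter are all hypothetical and do not come from the application being discussed.

```python
import threading


class CountingLock:
    """Re-entrant lock that counts Enter/Leave calls and records misuse.

    Hypothetical illustration of the 'count all Enter and Leave calls' idea.
    """

    def __init__(self, name: str):
        self.name = name
        self._lock = threading.RLock()   # the lock being instrumented
        self._stats = threading.Lock()   # protects the counters below
        self.enters = 0
        self.leaves = 0
        self.last_tag = None             # caller tag of the most recent Enter
        self.errors = []                 # (thread id, tag) for each bad Leave

    def enter(self, tag: str = "") -> None:
        self._lock.acquire()
        with self._stats:
            self.enters += 1
            self.last_tag = tag

    def leave(self, tag: str = "") -> bool:
        try:
            self._lock.release()
        except RuntimeError:
            # Leave without a matching Enter on this thread: record it
            # instead of letting the lock state be silently damaged.
            with self._stats:
                self.errors.append((threading.get_ident(), tag))
            return False
        with self._stats:
            self.leaves += 1
        return True


lock = CountingLock("sessions")
lock.leave("cleanup")    # Leave before Enter, as in the snippet above
lock.enter("accept")
lock.leave("accept")
print(lock.enters, lock.leaves, lock.errors)
```

With a wrapper like this, a mismatch between the Enter and Leave counts, or any entry in `errors`, points at the call site (via the tag) where the lock was misused, which is the kind of information the crash dumps alone do not provide.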
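We tried to reproduce it in the Delphi debugger, and also running standalone, but could not, hence the use of WinDbg and crash dumps. We suspect that reproducing the error requires a heavy load on the server, and that it is also timing-critical. We have reviewed all the code and cannot readily spot a lock-ordering error, or anywhere that we missed a lock or needed synchronization, but we think there must be something we have missed. – MikeT
The pointer you identified is unlikely to be a pointer to a critical section. Do not rely on parameters remaining correct on the stack, because they may be overwritten and reused. For one thing, the pointer 736a3134 is very close to the code segment in mswsock, so it may be one of the return pointers. –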