Just short of a year ago I built a system that will crash after about an hour of intensive gaming on only two games... those were the only two games I played that were intensive enough to cause it. The reason was that my power supply was inadequate. Upgrading the power supply eliminated all the problems.
Timeout Detection and Recovery
Windows Vista and later operating systems attempt to detect situations in which computers appear to be completely "frozen". They then attempt to dynamically recover from the frozen situations so that their desktops are responsive again. This process of detection and recovery is known as timeout detection and recovery (TDR). In the TDR process, the operating system's GPU scheduler calls the display miniport driver's DxgkDdiResetFromTimeout function to reinitialize the driver and reset the GPU. Therefore, end users are not required to reboot the operating system, which greatly enhances their experience. The only visible artifact from the hang detection to the recovery is a screen flicker. This screen flicker results when the operating system resets some portions of the graphics stack, which causes a screen redraw. Some legacy DirectX applications (for example, those DirectX applications that conform to DirectX versions earlier than 9.0) might render to a black screen at the end of this recovery. The end user would have to restart these applications.
The following sequence briefly describes the TDR process:
1. Timeout detection
The GPU scheduler, which is part of the DirectX graphics kernel subsystem (Dxgkrnl.sys), detects that the GPU is taking more than the permitted amount of time to execute a particular task. The GPU scheduler then tries to preempt this particular task. The preempt operation has a "wait" timeout, which is the actual TDR timeout. This step is thus the timeout detection phase of the process. The default timeout period in Windows Vista and later operating systems is 2 seconds. If the GPU cannot complete or preempt the current task within the TDR timeout period, the operating system diagnoses that the GPU is frozen.
To prevent timeout detection from occurring, hardware vendors should ensure that graphics operations (that is, DMA buffer completion) take no more than 2 seconds in end-user scenarios such as productivity and game play.
2. Preparation for recovery:
The operating system's GPU scheduler calls the display miniport driver's DxgkDdiResetFromTimeout function to inform the driver that the operating system detected a timeout. The driver must then reinitialize itself and reset the GPU. In addition, the driver must stop accessing memory and should not access hardware. The operating system and the driver collect hardware and other state information that could be useful for post-mortem diagnosis.
3. Desktop recovery:
The operating system resets the appropriate state of the graphics stack. The video memory manager, which is also part of Dxgkrnl.sys, purges all allocations from video memory. The display miniport driver resets the GPU hardware state. The graphics stack takes the final actions and restores the desktop to the responsive state. As previously mentioned, some legacy DirectX applications might render just black at the end of this recovery, which requires the end user to restart these applications. Well-written DirectX 9Ex and DirectX 10 and later applications that handle Device Remove technology continue to work correctly. An application must release and then recreate its Direct3D device and all of the device's objects. For more information about how DirectX applications recover, see the Windows SDK.
Beginning with Windows Vista with Service Pack 1 (SP1) and Windows Server 2008, the user experience has been improved in situations where the GPU hangs frequently and rapidly. Repetitive GPU hangs indicate that the graphics hardware has not recovered successfully. In these situations, the end user must shut down and restart the operating system to fully reset the graphics hardware. If the operating system detects that six or more GPU hangs and subsequent recoveries occur within 1 minute, the operating system bug-checks the computer on the next GPU hang.
Throughout the TDR process (that is, the process of detecting and recovering from situations where a GPU stops operating), the desktop is unresponsive and thus unavailable to the end user. In the final stages of recovery, a brief screen flash occurs that is similar to the brief screen flash that occurs when the end user changes the screen resolution. After the operating system has successfully recovered the desktop, the following informational message appears to the end user.
The operating system also logs the preceding message in the Event Viewer application and collects diagnosis information in the form of a debug report. If the end user opted in to provide feedback, the operating system returns this debug report to Microsoft through the Online Crash Analysis (OCA) mechanism.
This is not a regularly monitored account. Please contact Kris@NVIDIA for assistance.
If you're dying to reach me, hit me up at Amorphous@NVIDIA
Advanced Moderator Operations and Recursive Posting Hermetic/Omnigenous User-Simulating AI
Overclocking Hall of Fame - Post your 3DMark scores today!
NVIDIA Focus Group Members receive free software and/or hardware from NVIDIA from time to time to facilitate the evaluation of NVIDIA products. However, the opinions expressed are solely those of the members.
You must Log In to add a comment.
New Private Message
Like Us On
Follow Us On
Copyright © 2015 NVIDIA Corporation