Wednesday, November 14, 2012

Troubleshooting Slow Boot Performance in Windows

It has been a few weeks since my last post, but trust me when I say the wait was worth it. This article would belong in an "advanced Windows desktop troubleshooting" series, if I actually had such a series on this site. Maybe I should set one up in the future? Anyway, on with the article!

If you have been following my recent articles, you know that I had the opportunity to work with my colleagues on resolving slow boot performance in Windows 7. We found at least one major problem: a group policy was forcing an automatic antivirus re-installation at login time. The GPO was removed and that issue was fixed. Unfortunately, the overall slow boot problem was not resolved and required more research, testing, and work. We needed to get boot and logon times down to an acceptable time frame. Our team was a 'task force', if you will, charged with locating the root causes and permanently fixing the problem.

The boot performance issues were on computers in public areas. Each public computer has a very large Windows 7 "image" deployed to it (around 100 GB) that contains a great deal of embedded software, including Faronics DeepFreeze to prevent any type of system changes. Each of these computers is used by our customers every single day. From cold boot to CTRL+ALT+DEL the PC would take around 2 minutes, then from the login screen to a fully usable desktop it would take at least 5 to 6 minutes. Adding these numbers up, it would literally take around 8 minutes for a user to log into a computer and be able to actually use it to do work. You can imagine how angry our customers felt having to wait 8 minutes or more just to use a computer!

First we need to ask the question, "What exactly is an acceptable amount of time for a user to log in to a computer and start working?" 8 minutes is not acceptable by any stretch of the imagination. To get straight to the point without getting into a philosophical discussion, 2 to 4 minutes is a very acceptable boot time for a computer regardless of the type of hardware it has in it.

To troubleshoot this issue and fix the problem, we used several Sysinternals tools (Process Monitor and Autoruns) to see what was launching at startup and which processes were running during boot and winlogon. But the tool that finally helped resolve our issue was the Windows Performance Toolkit and the metrics it collects.

The Windows Performance Toolkit is part of the Windows ADK, or Windows Assessment and Deployment Kit. The ADK is a web installer for a large number of tools and utilities for Windows 7 and Windows 8. A short description of the toolkit says “Windows Performance Toolkit includes tools to record system events and analyze performance data in a graphical user interface. Tools available in this toolkit include Windows Performance Recorder, Windows Performance Analyzer, and Xperf.”

To start troubleshooting, we set up a test computer with the large Windows 7 image deployed to it. We disabled DeepFreeze, then installed the Windows Performance Toolkit on it. The two tools we used from the toolkit are the Windows Performance Recorder and the Windows Performance Analyzer. Using the recorder, we set the following options:

  • Resource Analysis = CPU Usage and Disk I/O Activity only
  • Performance Scenario = Boot
  • Number of Iterations = 1
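
We set these options in the WPR window, but the recorder also ships with a command-line tool, wpr.exe. The sketch below is not what we ran; it only shows how the same three settings might map onto a command line. The built-in profile names (CPU, DiskIO) and the -onoffscenario and -numiterations switches are my assumption of the equivalent options, so check them against `wpr -help start` on your own install before relying on them.

```python
import subprocess


def build_wpr_boot_command(profiles=("CPU", "DiskIO"), iterations=1):
    """Return the argument list for a WPR boot trace without running it.

    Profile names and switches are assumptions; verify with `wpr -help start`.
    """
    args = ["wpr"]
    for profile in profiles:
        # One -start switch per resource analysis profile.
        args += ["-start", profile]
    # Performance Scenario = Boot, Number of Iterations = 1.
    args += ["-onoffscenario", "Boot", "-numiterations", str(iterations)]
    return args


if __name__ == "__main__":
    command = build_wpr_boot_command()
    print(" ".join(command))
    # On a test machine with the toolkit installed, from an elevated prompt
    # (the PC will reboot), you could then run:
    # subprocess.run(command, check=True)
```

Building the argument list separately from running it makes it easy to print and review the command before letting it reboot a machine.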

image

After we clicked Start, the PC rebooted. We logged back in (after it took 8 minutes to build the user profile) and launched the Windows Performance Recorder (WPR) app again, which prompted us to save the boot trace log. The log was saved as an .ETL file in C:\users\%username%\WPR Files\, so now we could use the Windows Performance Analyzer (WPA) on any other computer with the Windows Performance Toolkit installed to view the data.

Opening the boot trace with WPA immediately displayed some very interesting results. Take a look at the screenshot below.

image

The graph right under the Graph Explorer pane immediately displayed the problem: high disk I/O. The hard drive in the computer is literally pegged for the entire first 5 1/2 minutes of the boot and logon process. This brings us to lesson number one in troubleshooting boot performance issues: high disk I/O is the number one culprit of poor boot performance. So now we had an idea of why the computer was taking so long to log in a user. But we still didn’t know exactly what the culprits were.

Using the data graphs in the Windows Performance Analyzer, we broke down the data even further. We knew exactly how long this boot trace took (430 seconds, or about 7.2 minutes) and exactly how long each boot phase was taking to complete. If you look at the display below, you can see that it was taking at least 185 seconds to build a user profile, and an additional 91 seconds to finish post-boot actions. The computer isn’t fully usable until the disk I/O drops off.
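
To put those numbers in perspective, here is a quick back-of-the-envelope sketch that turns the three figures from our trace (430, 185, and 91 seconds) into shares of the total. The "everything else" row is simply the remainder, not a phase reported by WPA, and since the 185 seconds is an "at least" figure, treat the percentages as rough.

```python
def phase_shares(total_seconds, phases):
    """Return (name, seconds, percent of total) rows, plus the remainder."""
    rows = []
    for name, seconds in phases.items():
        rows.append((name, seconds, round(100 * seconds / total_seconds, 1)))
    # Whatever is not covered by the named phases is lumped together.
    remainder = total_seconds - sum(phases.values())
    rows.append(("everything else", remainder,
                 round(100 * remainder / total_seconds, 1)))
    return rows


if __name__ == "__main__":
    trace = {"user profile": 185, "post-boot": 91}
    for name, seconds, percent in phase_shares(430, trace):
        print(f"{name}: {seconds} s ({percent}%)")
```

Building the user profile and the post-boot activity together account for well over half of the trace, which is why those two areas got most of our attention.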

boot

As you may recall from the beginning of this article, we are using Faronics DeepFreeze to prevent system changes on these computers. This means that after the computer reboots, all system changes are lost, including user profiles. A quick look at the size of the default user profile (C:\users\default\) revealed a major part of the problem: it was 450 MB in size! Every time a user logs into the computer for the first time after a reboot, Windows copies all data from C:\users\default\ into the new user profile. We needed to shrink the default profile as much as possible.
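
We checked the folder size by hand, but if you need to check many images, a small script can do the measuring. This is a minimal sketch that uses only the Python standard library; the path in the usage section is the one from this article, and the helper names are my own. Files that cannot be read are skipped, and the default profile contains compatibility junctions that different Python versions may handle differently, so treat the result as an approximation.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping unreadable ones."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # Locked or inaccessible file: ignore it and keep going.
                pass
    return total


def largest_subfolders(root, count=5):
    """Return the biggest immediate subfolders of root as (name, bytes) pairs."""
    sizes = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((entry.name, directory_size_bytes(entry.path)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:count]


if __name__ == "__main__":
    profile = r"C:\users\default"
    print(f"{directory_size_bytes(profile) / (1024 * 1024):.1f} MB in {profile}")
    for name, size in largest_subfolders(profile):
        print(f"  {name}: {size / (1024 * 1024):.1f} MB")
```

Listing the largest subfolders first tells you where to start trimming the profile.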

image

Digging deeper into the threads and processes revealed more issues. One of the largest consumers of disk I/O at startup was something called “MATLABStartupAccelerator.exe”. After some reading, we learned that this was a scheduled task set up by Matlab when it was embedded in the image, and that it could be removed without affecting the application. Matlab would just take longer to launch the first time, but this was a trade-off we were willing to make to get boot performance to an acceptable level. We also saw that the search indexer was running during logon.

image

We looked at the services that started up with the computer as well, and only one stood out as particularly significant: the MS SQL Server service. The full Microsoft Visual Studio 2010 Express suite is installed on these computers, which included SQL Server 2008 R2 Express. MS Visual Studio isn’t used by all users on these computers, so we recommended changing the service's startup setting from automatic to manual.
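
The startup type can be changed in the Services console, which is all that is needed for a single image. For anyone who would rather script it, the sketch below builds the equivalent sc.exe command. The service name MSSQL$SQLEXPRESS is an assumption on my part (it is the usual name for a default SQL Server Express instance), so confirm the real name in the Services console first.

```python
import subprocess


def build_manual_start_command(service_name):
    """Return the sc.exe arguments that set a service to manual (demand) start."""
    # sc expects "start=" and its value as separate tokens.
    return ["sc", "config", service_name, "start=", "demand"]


if __name__ == "__main__":
    # Assumed service name; check it before running anything.
    command = build_manual_start_command("MSSQL$SQLEXPRESS")
    print(" ".join(command))
    # From an elevated prompt on the reference image:
    # subprocess.run(command, check=True)
```

A service set to manual still starts when something asks for it, so users who do need SQL Server are not locked out.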

image

So, using the findings from the boot trace as a guide, our Senior Support Tech and Desktop team went to work. They cleaned up the default user profile, changed some Windows services from “Automatic” to “Manual”, cleaned up the task scheduler, moved the update schedule for our antivirus to maintenance hours (so DeepFreeze would allow the changes to commit), and more. In the meantime, our server team reviewed group policies and removed all old and redundant policies that were causing additional slowdown during boot and Windows logon.

Once we had tested the changes and pushed them to all of the affected computers, the boot and logon time went down from 8 to 10 minutes to 1.5 to 3 minutes! And this is only for the very first logon of the day. As long as the computer was not turned off at any point, subsequent user logins took no longer than 30 seconds each!

Resolving this issue truly was a team effort. A great deal of blood, sweat, and tears went into fixing our boot and logon performance issues and our customers are now much happier. During this process, we learned how to accurately troubleshoot boot performance issues as well as how to avoid them in the future.

This article is only one example of how to use the Windows Performance Toolkit to create a trace for finding performance issues on a Windows computer. The example above shows these tools in use on Windows 7, but they also work in Windows 8. Below is a list of resources if you want to learn more about how to use the WPT. I highly recommend you watch the TechEd 2012 video “How many coffees can you drink while Windows boots?”, in which Microsoft I.T. demonstrates how to use WPT and shows some real-world examples of how they helped resolve customer performance issues. As always, please feel free to leave a comment below.

http://msdn.microsoft.com/en-us/performance/cc825801.aspx
http://blogs.technet.com/b/mniehaus/archive/2012/09/13/using-the-windows-performance-toolkit.aspx
http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/WCL305

- Joe