Wednesday, February 20, 2008

Managing notebook HDD problems with Windows Power Events Monitor (+ hdparm + smartctl)

Do you remember the relatively recent frenzy about Linux supposedly having a very aggressive policy regarding disk drives, resulting in very premature HDD failure? Well - the good news is that Linux was not doing anything wrong; the BAD news is that the same problem is now present with certain HDD models under Windows too! The problem is caused by the huge number of head load/unload cycles, and I had posts about that story here, here, here and here. The fact that the bug can generate some loud noise is the partially good part (since you might become aware of it, but that only happens on about 25% of the drives). The real problem is that after a certain number of head load/unload cycles your HDD will just fail, and that can happen after only a few months of use (more likely after a little over one year, so that many disks will be out of warranty and full of important data). The WORST part is that on some of those HDD models setting 'saner parameters' only works until the HDD is powered down OR SUSPENDED - so even if you manage to disable that infuriating disk clunking, it will be back as soon as you resume your notebook from standby!

Under Linux the fix was not very complex - for instance, under Ubuntu I created a text file /etc/init.d/hdparm-B with content like:

hdparm -B 254 -S 61 -M 254 /dev/sda
echo 30000 > /proc/sys/vm/dirty_writeback_centisecs
echo 8 > /proc/sys/vm/dirty_background_ratio
echo 24000 > /proc/sys/vm/dirty_expire_centisecs

and then created symbolic links to it as /etc/rcS.d/S92hdparm-B and in /etc/acpi/resume.d/, so that it runs both at boot and after every resume (it is simpler than it sounds).
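For anyone who prefers to see those steps spelled out, here is a minimal sketch of the same setup as a shell function. The function name, the optional staging prefix (so you can try it in a scratch folder first) and the link name used inside /etc/acpi/resume.d/ are my assumptions for illustration; only the script content and the S92hdparm-B link come from the description above.

```shell
#!/bin/sh
# Sketch: write the tuning script and link it into the boot and resume hooks.
# $1 is a hypothetical staging prefix for dry runs; leave it empty (and run
# as root) to write into the real /etc.
install_hdparm_b() {
    root="${1:-}"
    mkdir -p "$root/etc/init.d" "$root/etc/rcS.d" "$root/etc/acpi/resume.d"

    # Same commands as in the post: -B 254 keeps APM at its least aggressive
    # level, -S 61 sets the standby timeout, -M 254 selects fast seeks.
    cat > "$root/etc/init.d/hdparm-B" <<'EOF'
#!/bin/sh
hdparm -B 254 -S 61 -M 254 /dev/sda
echo 30000 > /proc/sys/vm/dirty_writeback_centisecs
echo 8 > /proc/sys/vm/dirty_background_ratio
echo 24000 > /proc/sys/vm/dirty_expire_centisecs
EOF
    chmod 755 "$root/etc/init.d/hdparm-B"

    # Run at boot ...
    ln -sf /etc/init.d/hdparm-B "$root/etc/rcS.d/S92hdparm-B"
    # ... and after resume (this link name is an assumption, pick your own).
    ln -sf /etc/init.d/hdparm-B "$root/etc/acpi/resume.d/92-hdparm-B.sh"
}
```

Calling `install_hdparm_b /tmp/stage` lets you inspect the result under /tmp/stage before doing it for real; adjust /dev/sda if your disk has a different name.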

Under Windows, however, things are not as simple, since there is no obvious folder where you can put commands that you want executed under certain conditions. And since the irritating disk clunking (from head loading/unloading) is now also a serious problem under Windows with certain laptop disk models (for instance the Western Digital WDC WD1200BEVE), a small helper program was needed.

Initially I used some of my own programs that are always running on my notebooks, but when two of my friends asked for help with exactly the same problem it was clear that a more generic solution was needed. In about 2-3 hours on Sunday I put together a small program that solves the problem in a way that should be very simple for most technical Windows users - it just took two more days after that to get the project on SourceForge :) (together with the source code, which is GPL v2).

So what you need to do first is go to the SourceForge page for the binary release of Windows Power Events Monitor and download the program (the start page for the project is here, and from there you can also navigate to the page with the source code).

To install the program you just need to unzip the content of the binary release to the root of your C:\ drive. You will get a top-level folder called C:\_smart (which you can rename or move later, but it will be easier to test it this way), and inside that folder you will find the program that you need to run as C:\_smart\bin\pwr_mon.exe. Just start it and a new icon will appear in the system tray: the light bulb will be ON if the computer is on AC and OFF when running on battery, and a right-click brings up the program's main menu, from which the main window can be shown or hidden. If your HDD is one of those that does not retain the settings across power-off or standby (some Western Digital and some Samsung models are certainly in this category) you will also need to create a shortcut to this program in your StartUp folder (or the StartUp folder for all users).

The actual low-level work is done by the Windows version of hdparm (included in the binary release above), but the actual parameters are in three BAT files that are VERY SIMPLE to tweak so that you get the desired results for YOUR configuration! Everything involved is located under C:\_smart\bin\, and the default values are picked for the Western Digital WDC WD1200BEVE, which in my personal experience so far has been among the 'worst offenders'. When I am running plugged in on AC, I set it to values such that Advanced Power Management (the thing that generates the 'clunking') is 'practically disabled' (-B 254 in H.BAT) and Acoustic Management is set to 'fast' (-M 254). On battery, however, another BAT file is called - H_BATT.BAT - and for that one I use slightly more power-friendly settings (-B 253 -M 128) that will generate SOME clunking (but the battery will last longer and the HDD will be slightly better protected if you drop it on a hard surface). If you want to eliminate that residual clunking, just change the values in H_BATT.BAT to the same ones as in H.BAT (-B 254 -M 254).
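To make the AC/battery choice concrete, here is a rough sketch (in POSIX shell rather than BAT, and not the actual content of the BAT files) of how the two sets of flags map to the power state. The function name and the "ac"/"battery" keywords are hypothetical; the flag values and the /dev/hda device name are the ones mentioned in this post.

```shell
#!/bin/sh
# Sketch only: print the hdparm command line for a given power state.
# It only prints the command, it does not run hdparm.
hdparm_cmd_for() {
    state="$1"            # "ac" or "battery" (hypothetical keywords)
    dev="${2:-/dev/hda}"  # device name as used by the Windows hdparm port
    case "$state" in
        ac)      echo "hdparm -B 254 -M 254 $dev" ;;  # APM practically off, fast seeks
        battery) echo "hdparm -B 253 -M 128 $dev" ;;  # slightly more power-friendly
        *)       echo "usage: hdparm_cmd_for ac|battery [device]" >&2; return 1 ;;
    esac
}
```

For example, `hdparm_cmd_for battery /dev/hdb` prints the battery settings for a second drive, which is handy when you are deciding what to type into your own copies of the BAT files.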

There is also a third BAT file that is called just once, when the program is started: H_FREEZE.BAT. It protects your HDD (only until the next restart) from being hardware-locked with a password that you do not know. That action has no effect if you already have a password on the HDD, and you can still set or change the HDD password from the BIOS after a full reset - don't forget to also set a password on the BIOS itself! Some newer BIOS versions already take care of that 'security freeze', but unfortunately not all of them - and certainly very few BIOSes on older computers.

If your notebook has two internal HDDs (a friend of mine has one of those monsters) you will just have to edit the 3 BAT files and add a second line to each of them with /dev/hdb instead of /dev/hda. Other actions that you would like to automate when the computer resumes from standby/hibernate or when the AC/DC status changes can also be added to those BAT files, so feel free to experiment :)

Another very nice thing that you can do with the programs from that folder is to check the 'health' and 'age' of your HDD - just open a command prompt in that folder and run a command like

smartctl.exe -d ata -a /dev/hda > 1.txt

and after that you will have a file called 1.txt where you can see things like the number of hours your HDD has been ON (under Power_On_Hours, although some HDDs report MINUTES here) and the number of head load/unload cycles (under Load_Cycle_Count). If your Load_Cycle_Count is over 100000 you should start worrying (also if Reallocated_Event_Count is bigger than 0 within the first year). And if dividing Load_Cycle_Count by Power_On_Hours gives a number bigger than 30 cycles/hour, you probably need this program badly :)
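If you do not feel like doing that division by hand, the small sketch below pulls the two raw values out of the saved report and prints the cycles-per-hour figure. It assumes a POSIX shell with awk (Linux, or something like Cygwin on Windows), assumes the usual smartctl attribute table where the raw value is the 10th column, and assumes your drive really reports hours (not minutes) in Power_On_Hours; the function name is my own.

```shell
#!/bin/sh
# Sketch: compute Load_Cycle_Count / Power_On_Hours from a saved
# "smartctl -a" report. Assumes the raw value is the 10th column.
load_cycles_per_hour() {
    LC_ALL=C awk '
        $2 == "Power_On_Hours"   { hours  = $10 + 0 }   # "+ 0" keeps only the leading number
        $2 == "Load_Cycle_Count" { cycles = $10 + 0 }
        END {
            if (hours > 0) printf "%.1f\n", cycles / hours
            else { print "n/a"; exit 1 }                # attribute missing or zero
        }
    ' "$1"
}
```

With a report saved as 1.txt you would run `load_cycles_per_hour 1.txt`; a drive showing 45000 load cycles after 1000 hours gives 45.0, which is already above the 30 cycles/hour mark mentioned above.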

So that's it - you can now use your notebook HDD without that annoying noise and without the fear that it will die in 6 months as a result of too many head load/unload cycles! (Also, please add comments to this post if you encounter any problems; the testing on Vista could not be very extensive, so any feedback is welcome.)