NVIDIA Linux drivers, PowerMizer, Coolbits, Performance Levels and GPU fan settings

Whew, I know that’s a long title for a post, but I wanted to make sure that I mentioned every term so that people having the same problem could readily find the post that explains what solved it for me. For some time now (ever since the 346.x series [340.76, which was the last driver that worked for me, was released on 27 January 2015]), I have had a problem with the NVIDIA Linux Display Drivers (known as nvidia-drivers in Gentoo Linux). The problem that I’ve experienced is that the newer drivers would, upon starting an X session, immediately clock up to Performance Level 2 or 3 within PowerMizer.

Before using these newer drivers, the Performance Level would only increase when it was really required (3D rendering, HD video playback, et cetera). I probably wouldn’t have even noticed that the Performance Level was changing, except that it would cause the GPU fan to spin faster, which was noticeably louder in my office.

After scouring the interwebs, I found that I was not the only person to have this problem. For reference, see this article, and this one about locking to certain Performance Levels. However, I wasn’t able to find a solution for the exact problem that I was having. If you look at the screenshot below, you’ll see that the Performance Level is set at 2 which was causing the card to run quite hot (79°C) even when it wasn’t being pushed.

NVIDIA Linux drivers stuck on high Performance Level in Xorg
Click to enlarge

It turns out that I needed to add some options to my X Server Configuration. Unfortunately, I was originally making changes in /etc/X11/xorg.conf, but they weren’t being honoured. I added the following lines to /etc/X11/xorg.conf.d/20-nvidia.conf, and the changes took effect:


Section "Device"
     Identifier    "Device 0"
     Driver        "nvidia"
     VendorName    "NVIDIA Corporation"
     BoardName     "GeForce GTX 470"
     Option        "RegistryDwords" "PowerMizerEnable=0x1; PowerMizerDefaultAC=0x3;"
EndSection

The portion in bold (the RegistryDwords option) was what ultimately fixed the problem for me. More information about the NVIDIA drivers can be found in their README and Installation Guide, and in particular, these settings are described on the X configuration options page. The PowerMizerDefaultAC setting may seem like it is for laptops that are plugged in to AC power, but as this system was a desktop, I found that it was always seen as being “plugged in to AC power.”

As you can see from the screenshots below, these settings did indeed fix the PowerMizer Performance Levels and subsequent temperatures for me:

NVIDIA Linux drivers Performance Level after Xorg settings
Click to enlarge

Whilst I was adding X configuration options, I also noticed that Coolbits (search for “Coolbits” on that page) were supported with the Linux driver. Here’s the excerpt about Coolbits for version 364.19 of the NVIDIA Linux driver:

Option “Coolbits” “integer”
Enables various unsupported features, such as support for GPU clock manipulation in the NV-CONTROL X extension. This option accepts a bit mask of features to enable.

WARNING: this may cause system damage and void warranties. This utility can run your computer system out of the manufacturer’s design specifications, including, but not limited to: higher system voltages, above normal temperatures, excessive frequencies, and changes to BIOS that may corrupt the BIOS. Your computer’s operating system may hang and result in data loss or corrupted images. Depending on the manufacturer of your computer system, the computer system, hardware and software warranties may be voided, and you may not receive any further manufacturer support. NVIDIA does not provide customer service support for the Coolbits option. It is for these reasons that absolutely no warranty or guarantee is either express or implied. Before enabling and using, you should determine the suitability of the utility for your intended use, and you shall assume all responsibility in connection therewith.

When “2” (Bit 1) is set in the “Coolbits” option value, the NVIDIA driver will attempt to initialize SLI when using GPUs with different amounts of video memory.

When “4” (Bit 2) is set in the “Coolbits” option value, the nvidia-settings Thermal Monitor page will allow configuration of GPU fan speed, on graphics boards with programmable fan capability.

When “8” (Bit 3) is set in the “Coolbits” option value, the PowerMizer page in the nvidia-settings control panel will display a table that allows setting per-clock domain and per-performance level offsets to apply to clock values. This is allowed on certain GeForce GPUs. Not all clock domains or performance levels may be modified.

When “16” (Bit 4) is set in the “Coolbits” option value, the nvidia-settings command line interface allows setting GPU overvoltage. This is allowed on certain GeForce GPUs.

When this option is set for an X screen, it will be applied to all X screens running on the same GPU.

The default for this option is 0 (unsupported features are disabled).

I found that I would personally like to have the options enabled by “4” and “8”, and that one can combine Coolbits by simply adding them together. For instance, the ones I wanted (“4” and “8”) added up to “12”, so that’s what I put in my configuration:


Section "Device"
     Identifier    "Device 0"
     Driver        "nvidia"
     VendorName    "NVIDIA Corporation"
     BoardName     "GeForce GTX 470"
     Option        "Coolbits" "12"
     Option        "RegistryDwords" "PowerMizerEnable=0x1; PowerMizerDefaultAC=0x3;"
EndSection

and that resulted in the following options being available within the nvidia-settings utility:

NVIDIA Linux drivers Performance Level after Xorg settings
Click to enlarge

Though the Coolbits portions aren’t required to fix the problems that I was having, I find them to be helpful for maintenance tasks and configurations. I hope, if you’re having problems with the NVIDIA drivers, that these instructions help give you a better understanding of how to workaround any issues you may face. Feel free to comment if you have any questions, and we’ll see if we can work through them.

Cheers,
Zach

2016 Superheroes Race for the National Children’s Advocacy Center (NCAC) in Huntsville, AL

Well, it was that time of year again… time to dress up as one’s favourite Superhero and run for a great cause! The first weekend of April, I made the drive down to the lovely city of Huntsville, Alabama in order to support the National Children’s Advocacy Center by running in the 2016 NCAC Superheros 5K (here’s my post about last year’s race).

This year’s course was the same as last year, so it was a really nice run through parts of the city centre and through more residential areas of Huntsville. Unlike last year, the race started much later in the afternoon, so the temperatures were a lot higher (last year, truthfully, a bit chilly). It was beautifully sunny, and actually bordered on a bit warm, but I would gladly take those conditions over the cold! 🙂

2016 NCAC Superheroes race - pre-race start
Right before the start of the race
Click for full photo

I wasn’t quite sure how this race was going to turn out, seeing as it was my first since the knee injury late last year. I was hopeful that my rehabilitation and training since the injury would help me at least come close to my time last year, but I also doubted that possibility. I came in first place overall with a time of 20:13, which was a little over 30 seconds slower than last year. All things considered, I was pleased with my time. A few other fantastic runners to mention this year were Elliott Kliesner (age 14) who came in about 37 seconds after me, Christian Grant (age 12) with a time of 21:42, and Bud Bettler (age 72) who finished with an outstanding time for his age bracket at 28:16.

2016 NCAC Superheroes race - 5K results
5K Results 1st through 5th place
Click for top 45 results

Years ago, I decided that I wouldn’t run in any races unless they benefited a children’s charity, and I can’t think of any organisation with which my goals more align with their mission than the National Children’s Advocacy Center. According to WAFF News in Huntsville, the race raised over $24,000 for the NCAC! That will make a huge difference in the lives of the children that the NCAC serves! Here’s to hoping that next year’s race (the 7th annual) will raise even more. Hope to see you there!

2016 NCAC Superheroes race - Nathan Zachary award acceptance
Nathan Zachary’s award (and Superhero cape) acceptance
Click to enlarge

Cheers,
Zach

Linux firmware for iwlwifi ucode failed with error -2

A couple weeks ago, I decided to update my primary laptop’s kernel from 4.0 to 4.5. Everything went smoothly with the exception of my wireless networking. This particular laptop uses the a wifi chipset that is controlled by the Intel Wireless DVM Firmware:


#lspci | grep 'Network controller'
03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6205 [Taylor Peak] (rev 34)

According to Intel Linux support for wireless networking page, I need kernel support for the ‘iwlwifi’ driver. I remembered this requirement from building the previous kernel, so I included it in the new 4.5 kernel. The new kernel had some additional options, though, and they were:


[*] Intel devices
...
< > Intel Wireless WiFi Next Gen AGN - Wireless-N/Advanced-N/Ultimate-N (iwlwifi)
< > Intel Wireless WiFi DVM Firmware support
< > Intel Wireless WiFi MVM Firmware support
Debugging Options --->

As previously mentioned, the Kernel page for iwlwifi indicates that I need the DVM module for my particular chipset, so I selected it. Previously, I chose to build support for the driver into the kernel, and then use the firmware for the device. However, this time, I noticed that it wasn’t loading:


[ 3.962521] iwlwifi 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 3.970843] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-6000g2a-6.ucode failed with error -2
[ 3.976457] iwlwifi 0000:03:00.0: loaded firmware version 18.168.6.1 op_mode iwldvm
[ 3.996628] iwlwifi 0000:03:00.0: CONFIG_IWLWIFI_DEBUG enabled
[ 3.996640] iwlwifi 0000:03:00.0: CONFIG_IWLWIFI_DEBUGFS disabled
[ 3.996647] iwlwifi 0000:03:00.0: CONFIG_IWLWIFI_DEVICE_TRACING enabled
[ 3.996656] iwlwifi 0000:03:00.0: Detected Intel(R) Centrino(R) Advanced-N 6205 AGN, REV=0xB0
[ 3.996828] iwlwifi 0000:03:00.0: L1 Enabled - LTR Disabled
[ 4.306206] iwlwifi 0000:03:00.0 wlp3s0: renamed from wlan0
[ 9.632778] iwlwifi 0000:03:00.0: L1 Enabled - LTR Disabled
[ 9.633025] iwlwifi 0000:03:00.0: L1 Enabled - LTR Disabled
[ 9.633133] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
[ 9.898531] iwlwifi 0000:03:00.0: L1 Enabled - LTR Disabled
[ 9.898803] iwlwifi 0000:03:00.0: L1 Enabled - LTR Disabled
[ 9.898906] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
[ 20.605734] iwlwifi 0000:03:00.0: L1 Enabled - LTR Disabled
[ 20.605983] iwlwifi 0000:03:00.0: L1 Enabled - LTR Disabled
[ 20.606082] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
[ 20.873465] iwlwifi 0000:03:00.0: L1 Enabled - LTR Disabled
[ 20.873831] iwlwifi 0000:03:00.0: L1 Enabled - LTR Disabled
[ 20.873971] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0

The strange thing, though, is that the firmware was right where it should be:


# ls -lh /lib/firmware/
total 664K
-rw-r--r-- 1 root root 662K Mar 26 13:30 iwlwifi-6000g2a-6.ucode

After digging around for a while, I finally figured out the problem. The kernel was trying to load the firmware for this device/driver before it was actually available. There are definitely ways to build the firmware into the kernel image as well, but instead of going that route, I just chose to rebuild my kernel with this driver as a module (which is actually the recommended method anyway):


[*] Intel devices
...
Intel Wireless WiFi Next Gen AGN - Wireless-N/Advanced-N/Ultimate-N (iwlwifi)
Intel Wireless WiFi DVM Firmware support
< > Intel Wireless WiFi MVM Firmware support
Debugging Options --->

If I had fully read the page instead of just skimming it, I could have saved myself a lot of time. Hopefully this post will help anyone getting the “Direct firmware load for iwlwifi-6000g2a-6.ucode failed with error -2” error message.

Cheers,
Zach