Power management: a system wide challenge – Peter De Schrijver

This talk describes how power is consumed, starting at the level of the MOSFET, and then gives tips on how to conserve power.

The MOSFET is the basic building block of a circuit. One of its design parameters is L, the channel length (the distance between source and drain), which affects both speed and power consumption. Power is consumed in two ways: dynamic power, spent charging and discharging the circuit capacitance whenever it switches, and static power, due to leakage current that flows even when nothing switches. The most important component of static power is drain leakage, which increases exponentially as the threshold voltage decreases.
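The usual textbook first-order models give a rough feel for the two terms. Switching power is P_dyn = α·C·V²·f. Subthreshold leakage scales roughly as exp(−V_th / (n·kT/q)). The Python sketch below only evaluates those approximations. The activity factor α, the slope factor n and every numeric value are illustrative assumptions, not figures from the talk.

```python
import math

THERMAL_VOLTAGE = 0.026  # kT/q near room temperature, in volts


def dynamic_power(alpha: float, capacitance: float, vdd: float, freq: float) -> float:
    """First-order switching power P = alpha * C * Vdd^2 * f, in watts."""
    return alpha * capacitance * vdd ** 2 * freq


def relative_leakage(vth: float, vth_ref: float, n: float = 1.5) -> float:
    """Subthreshold leakage at threshold vth relative to leakage at vth_ref.

    Leakage scales roughly as exp(-Vth / (n * kT/q)), so every drop in Vth
    multiplies it by a constant factor. n (slope factor) is an assumed value.
    """
    return math.exp((vth_ref - vth) / (n * THERMAL_VOLTAGE))


# Hypothetical block: 10% activity, 1 nF switched capacitance, 500 MHz.
p_full = dynamic_power(alpha=0.1, capacitance=1e-9, vdd=1.2, freq=500e6)
p_low_v = dynamic_power(alpha=0.1, capacitance=1e-9, vdd=0.9, freq=500e6)
print(f"dynamic power at 1.2 V: {p_full:.3f} W, at 0.9 V: {p_low_v:.4f} W")

# Lowering Vth from 0.4 V to 0.3 V raises leakage roughly 13x with n = 1.5.
print(f"leakage ratio: {relative_leakage(0.3, 0.4):.1f}x")
```

The quadratic voltage term is why the voltage knobs in the next paragraph matter so much. The exponential leakage term is why fast, low-threshold transistors are used only where they are needed.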

To reduce power at the circuit level, we can tune the threshold voltage and the supply voltage. The threshold voltage can be chosen to give slow, power-efficient transistors in one part of the system and fast, power-hungry transistors in another. The supply voltage can be varied by having several voltage domains on the chip. For example, the OMAP dynamically lowers the voltage based on the performance of the individual piece of silicon. It contains a ring oscillator that models the critical path the design has to meet, and the voltage is lowered until the ring oscillator just reaches its target frequency. This is used for the ARM core and (in OMAP4) for the interconnect and IVA as well.
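The closed loop described above can be sketched in Python as follows. The ring-oscillator model, the 12.5 mV step and the voltage limits are hypothetical stand-ins for real silicon and a hardware sensor. Only the control loop is the point: step the voltage down while the oscillator still meets its target, and stop at the last voltage that did. The function and helper names are made up for illustration.

```python
from typing import Callable

STEP_UV = 12_500  # assumed regulator step size, in microvolts


def calibrate_voltage(
    ring_osc_hz: Callable[[int], float],  # oscillator frequency measured at a voltage (uV)
    target_hz: float,
    v_start_uv: int,
    v_min_uv: int,
    step_uv: int = STEP_UV,
) -> int:
    """Return the lowest voltage (uV) at which the ring oscillator still meets target_hz.

    Assumes oscillator frequency rises monotonically with supply voltage.
    """
    if ring_osc_hz(v_start_uv) < target_hz:
        raise ValueError("target frequency not met even at the starting voltage")
    v = v_start_uv
    # Step down while the next lower voltage still meets the target.
    while v - step_uv >= v_min_uv and ring_osc_hz(v - step_uv) >= target_hz:
        v -= step_uv
    return v


def make_chip(speed: float, vth_uv: int = 350_000) -> Callable[[int], float]:
    """Hypothetical silicon: oscillator frequency grows linearly above Vth."""
    return lambda v_uv: speed * 1_000 * max(v_uv - vth_uv, 0)


fast_chip = make_chip(speed=1.2)  # a "fast" sample from the process spread
slow_chip = make_chip(speed=0.9)  # a "slow" sample
for name, chip in (("fast", fast_chip), ("slow", slow_chip)):
    v = calibrate_voltage(chip, target_hz=600e6, v_start_uv=1_350_000, v_min_uv=800_000)
    print(f"{name} chip settles at {v / 1e6:.4f} V")
```

Voltages are kept as integer microvolts so that repeated steps do not accumulate floating-point drift. Fast silicon ends up at a noticeably lower voltage than slow silicon for the same target frequency, which is the saving this technique exploits.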

At the architectural level, power can be reduced through concurrency: doing the work in parallel lets both the clock and the voltage be lowered while still delivering the same performance (to the extent that the processing can be parallelised). Specialised hardware blocks also reduce power consumption, because they can be optimised for exactly the required performance.
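The concurrency argument can be checked with the same first-order model, P ∝ C·V²·f. The sketch below compares one core at full speed with two cores at half the clock. The 70% voltage figure is an assumption for illustration. The sketch also ignores the leakage of the extra core and assumes the work splits perfectly.

```python
def relative_power(cores: int, freq_scale: float, volt_scale: float) -> float:
    """Dynamic power relative to one core at full clock and full voltage.

    Uses P ~ C * V^2 * f per core, ignores leakage, and assumes the
    workload splits perfectly across cores.
    """
    return cores * volt_scale ** 2 * freq_scale


single = relative_power(cores=1, freq_scale=1.0, volt_scale=1.0)
# Two cores at half the clock give the same total throughput...
same_voltage = relative_power(cores=2, freq_scale=0.5, volt_scale=1.0)
# ...but the saving only appears if the lower clock also allows a lower
# voltage; 70% of nominal is an assumed figure.
lower_voltage = relative_power(cores=2, freq_scale=0.5, volt_scale=0.7)
print(f"1 core: {single:.2f}, 2 cores same V: {same_voltage:.2f}, 2 cores lower V: {lower_voltage:.2f}")
```

Halving the clock on twice the hardware saves nothing by itself. The gain comes entirely from the V² term, which is why the paragraph stresses lowering both clock and voltage.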

Clock gating eliminates dynamic power: when the clock is shut off, the circuit no longer switches, so there is no dynamic power consumption. Clock gating can be done at several levels, under hardware or software control. For instance, in OMAP3 a peripheral’s clock is automatically shut off when nobody is using it, and complete clock domains can also be shut off by software.
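Automatic gating of unused clocks is usually built on reference counting. The minimal Python model below assumes a simple two-level tree: one clock domain feeding two peripheral clocks, with all names hypothetical. It shows how disabling the last user of a peripheral clock lets the parent domain be gated too. Clock frameworks track enable counts in a similar spirit, but this is not their API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clock:
    """A gateable clock; a parent stays ungated while any child is enabled."""

    name: str
    parent: Optional["Clock"] = None
    users: int = 0

    @property
    def running(self) -> bool:
        return self.users > 0

    def enable(self) -> None:
        # First user: make sure the parent clock is running before this one.
        if self.users == 0 and self.parent is not None:
            self.parent.enable()
        self.users += 1

    def disable(self) -> None:
        if self.users == 0:
            raise RuntimeError(f"unbalanced disable of {self.name}")
        self.users -= 1
        # Last user gone: this clock is gated and releases its parent.
        if self.users == 0 and self.parent is not None:
            self.parent.disable()


domain = Clock("per_domain")              # hypothetical peripheral clock domain
uart = Clock("uart_fck", parent=domain)   # hypothetical functional clocks
i2c = Clock("i2c_fck", parent=domain)

uart.enable()
i2c.enable()
uart.disable()
print(domain.running)  # True: the I2C block still needs the domain
i2c.disable()
print(domain.running)  # False: the whole domain can be gated
```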

Powergating is similar to clock gating, but turns off the supply, so both static and dynamic power are saved. Of course, the state is lost when the power goes away, so it has to be saved first, which brings a performance and power penalty. The circuitry that switches off the power also has static leakage of its own. Powergating a CPU is relatively easy: all you have to do is save the registers and flush (clean and invalidate) the cache. Peripherals are more difficult, because they often have more state to retain and their registers are expensive to read out; sometimes they cannot be read out at all. So instead of reading back all registers, the driver can keep a shadow register bank in memory that is updated on every write, or recompute the state from higher-level variables, e.g. the I2C clock rate.
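The shadow-register approach can be sketched as a thin wrapper around register writes. The class name, the register offsets and the dict standing in for hardware are all hypothetical. A real driver would write through MMIO accessors and might need to restore registers in a device-specific order.

```python
from typing import Callable, Dict


class ShadowedPeripheral:
    """Mirror every register write in memory so state survives power gating.

    Because the shadow is updated on each write, nothing has to be read back
    from the (slow or write-only) hardware before the power is cut.
    """

    def __init__(self, hw_write: Callable[[int, int], None]):
        self._hw_write = hw_write           # writes one value to a register offset
        self._shadow: Dict[int, int] = {}   # offset -> last value written

    def write(self, offset: int, value: int) -> None:
        self._hw_write(offset, value)
        self._shadow[offset] = value

    def restore(self) -> None:
        """Replay the saved values after power-up.

        Registers are replayed in the order they were first written; real
        hardware may require a specific order (e.g. enable bit last).
        """
        for offset, value in self._shadow.items():
            self._hw_write(offset, value)


# Fake hardware for illustration: a dict standing in for the register file.
regs: Dict[int, int] = {}


def fake_write(offset: int, value: int) -> None:
    regs[offset] = value


dev = ShadowedPeripheral(fake_write)
dev.write(0x00, 0x1)   # hypothetical control register
dev.write(0x04, 0x2A)  # hypothetical divider register
regs.clear()           # power gated: hardware contents are lost
dev.restore()
print(regs)            # {0: 1, 4: 42}
```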

Halfway between clock gating and powergating is the retention state. It turns off the clocks and lowers the voltage to a level that is too low for switching but high enough to retain the state. This is tricky in practice, though, because a chip typically has many more power domains than independently regulated supplies; often only the main supply can be regulated. Also, re-regulating a supply takes time and passes through intermediate states that are difficult to manage. OMAP3 has a retention state for the CPU.

In Linux, this is managed by the cpuidle framework. It tracks the idle time and the latency constraints (how quickly the system has to be able to come up again). Platform drivers implement the idle states and announce them together with their entry/exit latency and energy break-even point (how long the CPU needs to stay idle before entering the state actually saves energy). Currently the cpuidle framework is still separate from the scheduler, so the scheduler does not take into account that it might be better to wait a bit before waking up an idle core.
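A cpuidle governor's decision can be sketched as follows. It picks the deepest state whose exit latency fits the current latency constraint and whose break-even point (target residency) is shorter than the predicted idle time. The Python below is a simplified model in that spirit, not the kernel's menu governor. The field names loosely mirror `exit_latency` and `target_residency` in `struct cpuidle_state`. The state table and energy figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IdleState:
    name: str
    exit_latency_us: int      # time needed to get back to running
    target_residency_us: int  # break-even: minimum idle time for the state to save energy


def break_even_s(transition_energy_j: float, shallow_power_w: float, deep_power_w: float) -> float:
    """Idle time after which a deeper state pays back its entry/exit energy.

    Solves transition_energy + deep_power * t = shallow_power * t for t.
    """
    return transition_energy_j / (shallow_power_w - deep_power_w)


def select_state(states: List[IdleState], predicted_idle_us: int, latency_limit_us: int) -> Optional[IdleState]:
    """Pick the deepest state that pays off and respects the latency constraint.

    States are assumed to be ordered from shallowest to deepest, with
    latency and residency growing along the list.
    """
    chosen = None
    for state in states:
        if state.target_residency_us > predicted_idle_us:
            break  # would not stay idle long enough to break even
        if state.exit_latency_us > latency_limit_us:
            break  # would wake up too slowly for the current constraint
        chosen = state
    return chosen


# Hypothetical state table; real numbers come from the platform driver.
STATES = [
    IdleState("wfi", exit_latency_us=1, target_residency_us=1),
    IdleState("retention", exit_latency_us=100, target_residency_us=500),
    IdleState("off", exit_latency_us=1_000, target_residency_us=5_000),
]

print(select_state(STATES, predicted_idle_us=2_000, latency_limit_us=10_000).name)   # retention
print(select_state(STATES, predicted_idle_us=20_000, latency_limit_us=10_000).name)  # off
print(select_state(STATES, predicted_idle_us=20_000, latency_limit_us=300).name)     # retention
# Hypothetical: 50 uJ to enter/exit, 20 mW in the shallow state vs 2 mW in the deep one.
print(f"{break_even_s(50e-6, 0.020, 0.002) * 1e3:.2f} ms")  # about 2.78 ms
```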

For peripherals, there is the runtime PM framework. Drivers indicate when they need the hardware with get() and put() calls, and the framework controls clock gating and powergating through the clock and powerdomain frameworks. There is currently just one chip with powerdomain support in mainline. Android does not do any of this; it only does system suspend. Maemo has an ad-hoc powergating strategy for OMAP that does not use the framework, and 3D drivers also do ad-hoc powerdomain management. Currently, even the suspend framework does not use runtime PM; it has its own mechanisms to save driver state. This is changing, though, and the agreement is that suspend should use runtime PM to shut peripherals down.
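The get()/put() contract boils down to a usage counter. The device is resumed on the first get() and suspended when the last put() drops the count to zero. The toy model below shows only that bookkeeping. In the kernel the corresponding calls are `pm_runtime_get_sync()` and `pm_runtime_put()`, with the driver supplying runtime suspend/resume callbacks. The real put() can suspend asynchronously or after an autosuspend delay, which this sketch ignores. The class and device names are hypothetical.

```python
from typing import List


class RuntimePMDevice:
    """Toy model of runtime PM: a usage counter drives suspend/resume."""

    def __init__(self, name: str):
        self.name = name
        self.usage = 0
        self.active = False
        self.log: List[str] = []

    # Driver callbacks: a real driver would restore/save context and
    # ungate/gate clocks and power domains here.
    def runtime_resume(self) -> None:
        self.active = True
        self.log.append("resume")

    def runtime_suspend(self) -> None:
        self.active = False
        self.log.append("suspend")

    def get(self) -> None:
        """Driver needs the hardware: power it up on the 0 -> 1 transition."""
        self.usage += 1
        if self.usage == 1:
            self.runtime_resume()

    def put(self) -> None:
        """Driver is done: power down when the last user drops its reference."""
        if self.usage == 0:
            raise RuntimeError("unbalanced put")
        self.usage -= 1
        if self.usage == 0:
            self.runtime_suspend()


mmc = RuntimePMDevice("mmc0")  # hypothetical device
mmc.get()          # first user: hardware is resumed
mmc.get()          # second user: already active, nothing happens
mmc.put()
print(mmc.active)  # True: one user left
mmc.put()
print(mmc.log)     # ['resume', 'suspend']
```

Routing system suspend through the same counter, as the paragraph says is planned, would let one set of callbacks handle both cases.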

For dynamic voltage and frequency scaling (DVFS), there is the cpufreq framework. A governor implements the policy for choosing a frequency, and the voltage is then derived from it using the table of operating performance points (OPPs). The talk covered two dynamic governors: ondemand and Android’s interactive. ondemand tries to keep the system load below 80%; when the load goes higher, the clock is bumped up. interactive is similar but distinguishes between long-term and short-term load: it scales up quickly when the short-term load rises, so the system stays responsive. For peripherals, there is the devfreq framework, but it is not widely used.
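A simplified ondemand-style policy, together with the frequency-to-voltage lookup through an OPP table, could look like the Python below. It sketches the idea rather than reproducing the kernel governor. The OPP values are hypothetical, and the 80% threshold follows the figure quoted above.

```python
from typing import List, Tuple

# Hypothetical OPP table: (frequency in MHz, voltage in mV), sorted by frequency.
OPPS: List[Tuple[int, int]] = [(300, 975), (600, 1075), (800, 1200), (1000, 1350)]
UP_THRESHOLD = 80  # percent load above which the clock is bumped up


def opp_for(freq_mhz: float) -> Tuple[int, int]:
    """Lowest OPP with at least freq_mhz; the voltage comes along with it."""
    for freq, volt in OPPS:
        if freq >= freq_mhz:
            return freq, volt
    return OPPS[-1]


def ondemand_like(cur_freq_mhz: int, load_pct: float) -> Tuple[int, int]:
    """Simplified ondemand-style decision for one sampling period.

    Load above the threshold jumps straight to the highest OPP; otherwise
    choose the lowest OPP that would run the observed work at or below
    the threshold.
    """
    if load_pct > UP_THRESHOLD:
        return OPPS[-1]
    busy_mhz = cur_freq_mhz * load_pct / 100  # work actually being done
    return opp_for(busy_mhz * 100 / UP_THRESHOLD)


print(ondemand_like(600, 95))   # (1000, 1350): overloaded, go to max
print(ondemand_like(1000, 30))  # (600, 1075): 300 MHz of work needs >= 375 MHz
print(ondemand_like(600, 40))   # (300, 975): 240 MHz of work fits in 300 MHz
```

Jumping straight to the top OPP on overload favours responsiveness. Stepping down only to the level that keeps the load under the threshold avoids oscillating between OPPs.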

To strike a balance between performance and power, we need PM QoS. [I didn’t follow this part.] It is used on Maemo/OMAP3, but not on Android/Tegra3.
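The following is general background, not a reconstruction of that part of the talk. PM QoS lets drivers and applications register constraints, such as the maximum CPU wakeup latency they can tolerate, and the framework aggregates all active requests. For latency, the strictest (smallest) value wins, and cpuidle then avoids states with a longer exit latency. On Linux, a userspace process can place such a request by opening `/dev/cpu_dma_latency` and writing a value; the request holds while the file stays open. The class below is a hypothetical toy model of the aggregation only.

```python
from typing import Dict


class LatencyQoS:
    """Aggregate latency requests from several users; the strictest one wins."""

    def __init__(self, default_us: float = float("inf")):
        self.default_us = default_us        # no constraint when nobody asks
        self._requests: Dict[str, float] = {}

    def add_request(self, owner: str, max_latency_us: float) -> None:
        self._requests[owner] = max_latency_us  # re-adding updates the request

    def remove_request(self, owner: str) -> None:
        self._requests.pop(owner, None)

    def target_us(self) -> float:
        """Effective constraint that an idle governor would have to honour."""
        return min(self._requests.values(), default=self.default_us)


qos = LatencyQoS()
qos.add_request("audio", 500)     # hypothetical: playback must not underrun
qos.add_request("network", 2_000)
print(qos.target_us())            # 500
qos.remove_request("audio")
print(qos.target_us())            # 2000
```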

To make all of the above possible, userspace has to collaborate.

  • Don’t poll.
  • Aggregate timers to reduce the number of wakeups.
  • Release unused resources: most drivers are smart enough to turn off the peripherals when all fds are closed.
  • Watch the battery level and scale down functionality when the level is low.
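Timer aggregation works when each timer can tolerate some slack. If every timer gives a window [earliest, latest] instead of an exact instant, timers whose windows overlap can share a single wakeup. The greedy sketch below takes the earliest deadline among pending timers as the wakeup time and serves every timer whose window contains that instant. This minimises the number of wakeups for such windows. The timer values are hypothetical. Linux applies a related idea through per-thread timer slack, which can be set with `prctl(PR_SET_TIMERSLACK)`.

```python
from typing import List, Tuple


def coalesce(timers: List[Tuple[int, int]]) -> List[int]:
    """Return wakeup times (ms) that serve every (earliest, latest) timer window.

    Greedy interval stabbing: take timers in order of their latest allowed
    time, wake as late as the first unserved one allows, and let every timer
    whose window contains that instant fire at the same wakeup.
    """
    wakeups: List[int] = []
    for earliest, latest in sorted(timers, key=lambda t: t[1]):
        # Windows are visited by increasing deadline, so the last wakeup is
        # never after this deadline; it only has to be at or after `earliest`.
        if wakeups and earliest <= wakeups[-1]:
            continue
        wakeups.append(latest)
    return wakeups


# Hypothetical timers from several applications, each allowing some slack.
timers = [(0, 50), (10, 60), (40, 100), (55, 120), (200, 250)]
print(len(timers), "timers ->", coalesce(timers))  # 5 timers -> [50, 120, 250]
```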

To optimise power consumption, you first have to measure it. Current is measured via the voltage drop across a small resistor in the power path; with DVS the supply voltage changes, so it has to be measured simultaneously.
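With a shunt resistor R in the supply path, the current is I = V_shunt / R and the power is P = V_supply · I. Energy is the sum of the power samples times the sampling interval. The sketch below uses hypothetical sample values to show why the supply voltage has to be sampled as well under DVS: assuming a constant voltage overstates the energy once the voltage drops. The shunt value and sampling rate are assumptions.

```python
from typing import List, Sequence


def power_samples(shunt_mv: Sequence[float], supply_v: Sequence[float], shunt_ohms: float) -> List[float]:
    """Instantaneous power per sample: I = V_shunt / R, P = V_supply * I (watts)."""
    return [v_sup * (mv / 1000.0) / shunt_ohms for mv, v_sup in zip(shunt_mv, supply_v)]


def energy_j(powers: Sequence[float], dt_s: float) -> float:
    """Energy from equally spaced samples (rectangle rule)."""
    return sum(powers) * dt_s


SHUNT_OHMS = 0.1   # assumed shunt value: small, so it barely disturbs the supply
DT_S = 1e-3        # assumed 1 ms sampling interval
shunt_mv = [20.0, 20.0, 5.0, 5.0]  # hypothetical voltage drops across the shunt
supply_v = [1.2, 1.2, 0.9, 0.9]    # DVS lowered the voltage halfway through

measured = energy_j(power_samples(shunt_mv, supply_v, SHUNT_OHMS), DT_S)
naive = energy_j(power_samples(shunt_mv, [1.2] * 4, SHUNT_OHMS), DT_S)
print(f"with measured voltage: {measured * 1e3:.3f} mJ, assuming 1.2 V: {naive * 1e3:.3f} mJ")
```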

In conclusion, power management is a system-wide problem: there are contributors all over the place and any little piece can make things worse.

