| Commit message (Collapse) | Author | Age |
| |
Change-Id: I126075a330f305c85f8fe1b8c9d408f368be95d1
| |
We should apply the iowait boost only if cpufreq policy has iowait boost
enabled. Also make it a schedutil configuration from sysfs so it can be
turned on/off if needed (by default initialize it to the policy value).
For systems that don't need/want it enabled, such as those on arm64
based mobile devices that are battery operated, it saves energy when the
cpufreq driver policy doesn't have it enabled (details below):
Here are some results for energy measurements collected running a
YouTube video for 30 seconds:
Before: 8.042533 mWh
After: 7.948377 mWh
Energy savings is ~1.2%
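The ~1.2% figure follows from the two measurements above. A short Python check of the arithmetic (the function and variable names are illustrative only):

```python
before_mwh = 8.042533  # energy before the patch, from the measurements above
after_mwh = 7.948377   # energy after the patch


def savings_percent(before, after):
    """Relative energy saving, as a percentage of the 'before' value."""
    return (before - after) / before * 100


print(f"{savings_percent(before_mwh, after_mwh):.2f}%")
```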
Bug: 38010527
Link: https://lkml.org/lkml/2017/5/19/42
Change-Id: If124076ad0c16ade369253840dedfbf870aff927
Signed-off-by: Joel Fernandes <joelaf@google.com>
| |
'cached_raw_freq' is used to get the next frequency quickly but should
always be in sync with sg_policy->next_freq. There are cases where it is
not and in such cases it should be reset to avoid switching to incorrect
frequencies.
Consider this case for example:
- policy->cur is 1.2 GHz (Max)
- New request comes for 780 MHz and we store that in cached_raw_freq.
- Based on 780 MHz, we calculate the effective frequency as 800 MHz.
- We then decide not to update the frequency as
sugov_up_down_rate_limit() returns true.
- Here cached_raw_freq is 780 MHz and sg_policy->next_freq is 1.2 GHz.
- Now if the utilization doesn't change in next request, then the next
target frequency will still be 780 MHz and it will match with
cached_raw_freq and so we will directly return 1.2 GHz instead of 800
MHz.
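The scenario above can be reproduced with a small Python model. This is only a sketch: the Policy class, the resolve() rounding rule, and the reset_cache switch are assumptions made for illustration and do not mirror the kernel code line by line. Frequencies are in MHz.

```python
class Policy:
    def __init__(self, cur):
        self.next_freq = cur          # last frequency actually requested
        self.cached_raw_freq = None   # raw frequency that produced next_freq


def resolve(raw):
    """Hypothetical driver mapping: round up to the next 100 MHz step."""
    return -(-raw // 100) * 100


def get_next_freq(policy, raw):
    # Fast path: same raw request as last time, so reuse the earlier result.
    if raw == policy.cached_raw_freq:
        return policy.next_freq
    policy.cached_raw_freq = raw
    return resolve(raw)


def update(policy, raw, rate_limited, reset_cache):
    freq = get_next_freq(policy, raw)
    if rate_limited:
        if reset_cache:
            # The fix: the cache no longer describes next_freq, so drop it.
            policy.cached_raw_freq = None
        return policy.next_freq
    policy.next_freq = freq
    return freq


buggy = Policy(cur=1200)
update(buggy, 780, rate_limited=True, reset_cache=False)
print(update(buggy, 780, rate_limited=False, reset_cache=False))  # stale value

fixed = Policy(cur=1200)
update(fixed, 780, rate_limited=True, reset_cache=True)
print(update(fixed, 780, rate_limited=False, reset_cache=True))   # resolved value
```

Without the reset, the second request hits the fast path and returns the old 1.2 GHz; with the reset, the request is resolved again and yields 800 MHz.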
BACKPORT of upstream commit 07458f6a5171 ("cpufreq: schedutil: Reset
cached_raw_freq when not in sync with next_freq").
This also updates sugov_update_commit() for handling up/down tunables, which
aren't present in mainline.
Change-Id: I70bca2c5dfdb545a0471d1c9e4c5addb30ab5494
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
| |
The code gets the CPU util by accumulating the utilization of the
different scheduling classes, and when the total util value is larger
than the CPU capacity it clamps util to the CPU maximum capacity. So we
get a correct util value when using the PELT signal, but with the WALT
signal the clamp is missed.
On the other hand, WALT doesn't accumulate the utilization of different
classes, but because a boost margin has to be applied to the WALT signal,
the CPU util value can be larger than the CPU capacity; so this patch
always clamps util to the CPU maximum capacity.
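As a rough Python sketch of the idea (the function names and the example values are assumptions, not the kernel's helpers), both signal paths end in the same clamp:

```python
def clamp_util(util, capacity):
    """Never report more utilization than the CPU can provide."""
    return min(util, capacity)


def pelt_util(cfs_util, rt_util, capacity):
    # PELT path: per-class utilization is summed, so the sum can overshoot.
    return clamp_util(cfs_util + rt_util, capacity)


def walt_util(walt_signal, boost_margin, capacity):
    # WALT path: no per-class sum, but the boost margin can still overshoot.
    return clamp_util(walt_signal + boost_margin, capacity)


print(pelt_util(700, 500, capacity=1024))
print(walt_util(1000, 100, capacity=1024))
```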
Change-Id: I05481ddbf20246bb9be15b6bd21b6ec039015ea8
Signed-off-by: Leo Yan <leo.yan@linaro.org>
| |
policy requests
Currently, sugov_next_freq_shared() uses last_freq_update_time as a
reference to decide when to start considering CPU contributions as
stale.
However, since last_freq_update_time is set by the last CPU that issued
a frequency transition, this might cause problems in certain cases. In
practice, the detection of stale utilization values fails whenever the
CPU with such values was the last to update the policy. For example (and
please note again that the SCHED_CPUFREQ_RT flag is not the problem
here, but only the detection of after how much time that flag has to be
considered stale), suppose a policy with 2 CPUs:
      CPU0                         |               CPU1
                                   |
                                   |     RT task scheduled
                                   |     SCHED_CPUFREQ_RT is set
                                   |     CPU1->last_update = now
                                   |     freq transition to max
                                   |     last_freq_update_time = now
                                   |
                        more than TICK_NSEC nsecs
                                   |
     a small CFS wakes up          |
     CPU0->last_update = now1      |
     delta_ns(CPU0) < TICK_NSEC*   |
     CPU0's util is considered     |
     delta_ns(CPU1) =              |
      last_freq_update_time -      |
      CPU1->last_update = 0        |
      < TICK_NSEC                  |
     CPU1 is still considered      |
     CPU1->SCHED_CPUFREQ_RT is set |
     we stay at max (until CPU1    |
     exits from idle)              |

* delta_ns is actually negative as now1 > last_freq_update_time
While last_freq_update_time is a sensible reference for rate limiting,
it doesn't seem to be useful for working around stale CPU states.
Fix the problem by always considering now (time) as the reference for
deciding when CPUs have stale contributions.
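The difference between the two reference points can be shown with a small Python sketch of the timeline above. The is_stale() helper, the 4 ms tick length and the timestamps are assumptions chosen for illustration.

```python
TICK_NSEC = 4_000_000  # assumed tick length (250 Hz), in nanoseconds


def is_stale(last_update, reference):
    """A CPU's contribution is stale if it is more than one tick old."""
    return reference - last_update > TICK_NSEC


# Timeline from the example (nanoseconds, illustrative values).
cpu1_last_update = 0        # RT task scheduled on CPU1 at "now"
last_freq_update_time = 0   # CPU1 also issued the transition to max
now1 = 10_000_000           # CFS task wakes on CPU0 more than a tick later

# Old reference: delta is 0, so CPU1's RT flag is still honoured.
print(is_stale(cpu1_last_update, last_freq_update_time))
# New reference: delta exceeds a tick, so CPU1's contribution is ignored.
print(is_stale(cpu1_last_update, now1))
```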
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit d86ab9cff8b936aadde444d0e263a8db5ff0349b)
| |
Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).
That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.
Make intel_pstate set transition_delay_us to 500.
BACKPORT: Modified to support the separate up_rate_limit_us and
down_rate_limit_us (upstream just has a single rate_limit_us). Also
dropped the changes for intel_pstate as there's a merge conflict.
Change-Id: I62a8543879a4d8582cdcb31ebd55607705d1c8b1
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
(cherry picked from commit 1b72e7fd304639f1cd49d1e11955c4974936d88c)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| |
Make iowait_boost and iowait_boost_max unsigned int, since their unit is kHz
and this is consistent with struct cpufreq_policy. Also change the local
variables in sugov_iowait_boost to match this.
Change-Id: I6c67ed94c57c4bdb24bada4b97045593fcb95d2e
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
| |
Currently the iowait_boost feature in schedutil makes the frequency go
to max on iowait wakeups. This feature was added to handle a case that
Peter described where the throughput of operations involving continuous
I/O requests [1] is reduced due to running at a lower frequency. However,
the lower throughput itself causes utilization to be low, which in turn
keeps the frequency low, hence it is "stuck".
Instead of going to max, it's also possible to achieve the same effect by
ramping up to max if there are repeated in_iowait wakeups happening.
This patch is an attempt to do that. We start from a lower frequency
(policy->min) and double the boost for every consecutive iowait update
until we reach the maximum iowait boost frequency (iowait_boost_max).
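A minimal Python sketch of the ramp described above, assuming a policy minimum of 800 MHz and a maximum boost of 4 GHz (the function name and the values are illustrative, and the decay of the boost when iowait wakeups stop is left out):

```python
def iowait_boost_step(boost, policy_min, boost_max):
    """Return the boost (kHz) after one more consecutive iowait wakeup."""
    if boost == 0:
        return policy_min               # first iowait wakeup: start low
    return min(boost * 2, boost_max)    # later wakeups: double, capped at max


def ramp(steps, policy_min, boost_max):
    """Collect the boost values seen over a run of consecutive wakeups."""
    boost, values = 0, []
    for _ in range(steps):
        boost = iowait_boost_step(boost, policy_min, boost_max)
        values.append(boost)
    return values


print(ramp(5, policy_min=800_000, boost_max=4_000_000))
```

With these numbers the boost reaches the maximum on the fourth consecutive iowait wakeup and stays there.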
I ran a synthetic test (continuous O_DIRECT writes in a loop) on an x86
machine with intel_pstate in passive mode using schedutil. In this test
the iowait_boost value ramped from 800MHz to 4GHz in 60ms. The patch
achieves the same improved throughput as the existing behavior.
[1] https://patchwork.kernel.org/patch/9735885/
Change-Id: I4a018434a50f4ca29ec15b03465f6dc212e54423
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
| |
sugov_update_commit() calls trace_cpu_frequency() to record the
current CPU frequency if it has not changed in the fast switch case
to prevent utilities from getting confused (they may report that the
CPU is idle if the frequency has not been recorded for too long, for
example).
However, that may cause the tracepoint to be triggered quite often
for no real reason (if the frequency doesn't change, we will not
modify the last update time stamp and governor computations may
run again shortly when that happens), so don't do that (arguably, it
is done to work around a utilities bug anyway).
That allows code duplication in sugov_update_commit() to be reduced
somewhat too.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
(cherry picked from commit 38d4ea229d25d30be6bf41bcd6cd663a587866ca)
(conflicts with sugov_up_down_rate_limit resolved)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ia019dda29b8c1c4cf3553da75c88d066eb5674e9
| |
The way the schedutil governor uses the PELT metric causes it to
underestimate the CPU utilization in some cases.
That can be easily demonstrated by running kernel compilation on
a Sandy Bridge Intel processor, running turbostat in parallel with
it and looking at the values written to the MSR_IA32_PERF_CTL
register. Namely, the expected result would be that when all CPUs
were 100% busy, all of them would be requested to run in the maximum
P-state, but observation shows that this clearly isn't the case.
The CPUs run in the maximum P-state for a while and then are
requested to run slower and go back to the maximum P-state after
a while again. That causes the actual frequency of the processor to
visibly oscillate below the sustainable maximum in a jittery fashion
which clearly is not desirable.
That has been attributed to CPU utilization metric updates on task
migration that cause the total utilization value for the CPU to be
reduced by the utilization of the migrated task. If that happens,
the schedutil governor may see a CPU utilization reduction and will
attempt to reduce the CPU frequency accordingly right away. That
may be premature, though, for example if the system is generally
busy and there are other runnable tasks waiting to be run on that
CPU already.
This is unlikely to be an issue on systems where cpufreq policies are
shared between multiple CPUs, because in those cases the policy
utilization is computed as the maximum of the CPU utilization values
over the whole policy and if that turns out to be low, reducing the
frequency for the policy most likely is a good idea anyway. On
systems with one CPU per policy, however, it may affect performance
adversely and even lead to increased energy consumption in some cases.
On those systems it may be addressed by taking another utilization
metric into consideration, like whether or not the CPU whose
frequency is about to be reduced has been idle recently, because if
that's not the case, the CPU is likely to be busy in the near future
and its frequency should not be reduced.
To that end, use the counter of idle calls in the timekeeping code.
Namely, make the schedutil governor look at that counter for the
current CPU every time before its frequency is about to be reduced.
If the counter has not changed since the previous iteration of the
governor computations for that CPU, the CPU has been busy for all
that time and its frequency should not be decreased, so if the new
frequency would be lower than the one set previously, the governor
will skip the frequency update.
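The check described above can be sketched in Python as follows. The CpuState class and the function names are assumptions for illustration; in the kernel the counter comes from the timekeeping code.

```python
class CpuState:
    def __init__(self, idle_calls=0):
        self.saved_idle_calls = idle_calls  # counter value at the last check


def cpu_is_busy(cpu, idle_calls):
    """True if the idle-calls counter has not moved since the last check."""
    busy = idle_calls == cpu.saved_idle_calls
    cpu.saved_idle_calls = idle_calls
    return busy


def pick_freq(cpu, idle_calls, prev_freq, new_freq):
    # Skip a frequency reduction while the CPU has not been idle at all.
    if cpu_is_busy(cpu, idle_calls) and new_freq < prev_freq:
        return prev_freq
    return new_freq


cpu = CpuState(idle_calls=5)
print(pick_freq(cpu, 5, prev_freq=2000, new_freq=1500))  # counter unchanged
print(pick_freq(cpu, 6, prev_freq=2000, new_freq=1500))  # CPU went idle once
```

Increases are never blocked by this check; only a lower target is held back while the counter stands still.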
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Joel Fernandes <joelaf@google.com>
(cherry picked from commit b7eaf1aab9f8bd2e49fceed77ebc66c1b5800718)
(simple CPUFREQ_RT_DL vs CPUFREQ_DL usage conflicts)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I531ec02c052944ee07a904dc2a25c59948ee762b
| |
The loop in sugov_next_freq_shared() contains an if block to skip the
loop for the current CPU. This turns out to be an unnecessary
conditional in the scheduler's hot-path for every CPU in the policy.
It would be better to drop the conditional and make the loop treat all
the CPUs in the same way. That would eliminate the need to call
sugov_iowait_boost() at the top of the routine.
To keep the code optimized to return early if the current CPU has RT/DL
flags set, move the flags check to sugov_update_shared() instead in
order to avoid the function call entirely.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit cba1dfb57b94c234728b689d9b00d4267fa1a879)
(modified for SCHED_CPUFREQ_DL vs SCHED_CPUFREQ_RT)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ie046fdc8eda46821356750edd0fb6f7d077af363
| |
sugov_start()
sugov_start() only initializes struct sugov_cpu per-CPU structures
for shared policies, but it should do that for single-CPU policies too.
That in particular makes the IO-wait boost mechanism work in the
cases when cpufreq policies correspond to individual CPUs.
Fixes: 21ca6d2c52f8 (cpufreq: schedutil: Add iowait boosting)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
(cherry picked from commit 4296f23ed49a15d36949458adcc66ff993dee2a8)
(we use SCHED_CPUFREQ_DL instead of SCHED_CPUFREQ_RT in cpu->flags)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I5b837a0ee4432115d85caa1a9808ea61e1e1b07f
| |
get_next_freq() uses sg_cpu only to get sg_policy, which the callers of
get_next_freq() already have. Pass sg_policy instead of sg_cpu to
get_next_freq(), to make it more efficient.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 655cb1ebff4b7918fc560502c3297af2d3c7d114)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ia210058da32930a6cdb18258aa679cd1a44a747e
| |
cached_raw_freq applies to the entire cpufreq policy and not individual
CPUs. Apart from wasting per-cpu memory, it is actually wrong to keep it
in struct sugov_cpu as we may end up comparing next_freq with a stale
cached_raw_freq of a random CPU.
Move cached_raw_freq to struct sugov_policy.
Fixes: 5cbea46984d6 (cpufreq: schedutil: map raw required frequency to driver frequency)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry-picked from 6c4f0fa643cb9e775dcc976e3db00d649468ff1d)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ie91420f710819b383947f9031da9be1f3bb7f636
| |
This patch rectifies a comment present in sugov_irq_work() function to
follow proper grammar.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit d06e622d3d9206e6a2cc45a0f9a3256da8773ff4)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Iaf996445d411725639d511432cc424086892a146
| |
Execute the irq-work specific initialization/exit code only when the
fast path isn't available.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 21ef57297b15a49b0c4dd4e7135c1a08e9a29a1c)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Icfd68f455ef71846d799fcd2d8ec6aa1bf59573e
| |
The fast_switch_enabled flag will be used by both sugov_policy_alloc()
and sugov_policy_free() with a later patch.
Prepare for that by moving the calls to enable and disable it to the
beginning of sugov_init() and end of sugov_exit().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 4a71ce4348bb61740d411822357061f8bf870f4c)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ia174f423ca02d59360657ac2e77a5098ce5cf99c
| |
Switch to the more common practice of writing labels.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 8e2ddb03643eb9d0bc4926946d7ce0d308eef0a5)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ida75c99cf3dff5cae24d3866454c83bcdb3385b9
| |
When using WALT we always used boosted cpu util for OPP selection.
This is the primary purpose for boosted cpu util, but we hadn't
changed the PELT utilization check to do the same thing.
Fix that here.
Change-Id: Id5ffb26eac23b25fe754255221f6d21b8cededfd
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| |
The rate_limit_us tunable is intended to reduce the possible overhead
from running the schedutil governor. However, that overhead can be
divided into two separate parts: the governor computations and the
invocation of the scaling driver to set the CPU frequency. The latter
is where the real overhead comes from. The former is much less
expensive in terms of execution time and running it every time the
governor callback is invoked by the scheduler, after rate_limit_us
interval has passed since the last frequency update, would not be a
problem.
For this reason, redefine the rate_limit_us tunable so that it means the
minimum time that has to pass between two consecutive invocations of the
scaling driver by the schedutil governor (to set the CPU frequency).
Change-Id: Iced64116b826c25441ef537c27a3dabfcf81919e
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[pulled from linux-pm linux-next https://patchwork.kernel.org/patch/9583949/ ]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| |
The rate-limit tunable in the schedutil governor applies to transitions
to both lower and higher frequencies. On several platforms it is not the
ideal tunable though, as it is difficult to get best power/performance
figures using the same limit in both directions.
It is common on mobile platforms with demanding user interfaces to want
to increase frequency rapidly for example but decrease slowly.
One example is a case where we have short busy periods
followed by similar or longer idle periods. If we keep the rate-limit
high enough, we will not go to higher frequencies soon enough. On the
other hand, if we keep it too low, we will have too many frequency
transitions, as we will always reduce the frequency after the busy
period.
It would be very useful if we could set a low rate-limit while increasing
the frequency (so that we can respond to the short busy periods quickly)
and a high rate-limit while decreasing frequency (so that we don't reduce
the frequency immediately after the short busy period and that may avoid
frequency transitions before the next busy period).
Implement separate up/down transition rate limits. Note that the
governor avoids frequency recalculations for a period equal to minimum
of up and down rate-limit. A global mutex is also defined to protect
updates to min_rate_limit_us via two separate sysfs files.
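A small Python sketch of the separate limits, using the 10/40 ms up/down values from the test description below. The function names and the boolean convention are assumptions for illustration, not the governor's actual helpers.

```python
MS = 1_000_000  # nanoseconds per millisecond


def may_change_freq(now, last_update, cur_freq, next_freq,
                    up_limit_ns, down_limit_ns):
    """Apply the up limit to increases and the down limit to decreases."""
    delta = now - last_update
    if next_freq > cur_freq:
        return delta >= up_limit_ns
    if next_freq < cur_freq:
        return delta >= down_limit_ns
    return False  # same frequency, nothing to change


def min_rate_limit_ns(up_limit_ns, down_limit_ns):
    """Recalculation is skipped for the shorter of the two limits."""
    return min(up_limit_ns, down_limit_ns)


# 20 ms after the last change: ramping up is allowed, ramping down is not.
print(may_change_freq(20 * MS, 0, 1000, 1200, 10 * MS, 40 * MS))
print(may_change_freq(20 * MS, 0, 1200, 1000, 10 * MS, 40 * MS))
```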
Note that this wouldn't change behavior of the schedutil governor for
the platforms which wish to keep same values for both up and down rate
limits.
This is tested with the rt-app [1] on ARM Exynos, dual A15 processor
platform.
Testcase: Run a SCHED_OTHER thread on CPU0 which will emulate work-load
for X ms of busy period out of the total period of Y ms, i.e. Y - X ms
of idle period. The values of X/Y taken were: 20/40, 20/50, 20/70, i.e.
idle periods of 20, 30 and 50 ms respectively. These were tested against
values of up/down rate limits as: 10/10 ms and 10/40 ms.
For every test we noticed a performance increase of 5-10% with the
schedutil governor, which was very much expected.
[Viresh]: Simplified user interface and introduced min_rate_limit_us +
mutex, rewrote commit log and included test results.
[1] https://github.com/scheduler-tools/rt-app/
Change-Id: I18720a83855b196b8e21dcdc8deae79131635b84
Signed-off-by: Steve Muckle <smuckle.linux@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
(applied from https://marc.info/?l=linux-kernel&m=147936011103832&w=2)
[trivial adaptations]
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
|
| |
If WALT is available and enabled, make schedutil governor use its
utilization signal.
Change-Id: I92bc37989447a76616e9bcc4e9e8616774fb9925
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
[we need to use boosted_cpu_util for schedutil, so make it
not static]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| |
A policy of going to fmax on any RT activity will be detrimental
for power on many platforms. Often RT accounts for only a small amount
of CPU activity so sending the CPU frequency to fmax is overkill. Worse
still, some platforms may not be able to even complete the CPU frequency
change before the RT activity has already completed.
Cpufreq governors have not treated RT activity this way in the past so
it is not part of the expected semantics of the RT scheduling class. The
DL class offers guarantees about task completion and could be used for
this purpose.
Modify the schedutil algorithm to instead use rt_avg as an estimate of
RT utilization of the CPU.
Based on previous work by Vincent Guittot <vincent.guittot@linaro.org>.
Change-Id: I1ed605a3e2512a94d34217a8e57c3fd97cca60be
Signed-off-by: Steve Muckle <smuckle@linaro.org>
| |
If slow path frequency changes are conducted in a SCHED_OTHER context
then they may be delayed for some amount of time, including
indefinitely, when real time or deadline activity is taking place.
Move the slow path to a real time kernel thread. In the future the
thread should be made SCHED_DEADLINE. The RT priority is arbitrarily set
to 50 for now.
Hackbench results on ARM Exynos, dual core A15 platform for 10
iterations:
$ hackbench -s 100 -l 100 -g 10 -f 20
Before          After
---------------------------------
1.808           1.603
1.847           1.251
2.229           1.590
1.952           1.600
1.947           1.257
1.925           1.627
2.694           1.620
1.258           1.621
1.919           1.632
1.250           1.240
Average:
1.8829          1.5041
Based on initial work by Steve Muckle.
Change-Id: I8f53037e94f353960c6d10abf07822d671631ef7
Signed-off-by: Steve Muckle <smuckle.linux@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from 02a7b1ee3baa)
[adapt to the 3.18 kthread interface]
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
|
Add a new cpufreq scaling governor, called "schedutil", that uses
scheduler-provided CPU utilization information as input for making
its decisions.
Doing that is possible after commit 34e2c55 (cpufreq: Add
mechanism for registering utilization update callbacks) that
introduced cpufreq_update_util() called by the scheduler on
utilization changes (from CFS) and RT/DL task status updates.
In particular, CPU frequency scaling decisions may be based on
the utilization data passed to cpufreq_update_util() by CFS.
The new governor is relatively simple.
The frequency selection formula used by it depends on whether or not
the utilization is frequency-invariant. In the frequency-invariant
case the new CPU frequency is given by
next_freq = 1.25 * max_freq * util / max
where util and max are the last two arguments of cpufreq_update_util().
In turn, if util is not frequency-invariant, the maximum frequency in
the above formula is replaced with the current frequency of the CPU:
next_freq = 1.25 * curr_freq * util / max
The coefficient 1.25 corresponds to the frequency tipping point at
(util / max) = 0.8.
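The two formulas above can be tried out with a short Python sketch. The function name, the freq_invariant flag and the example numbers are assumptions for illustration; integer arithmetic is used and frequencies are in kHz.

```python
def next_freq(util, max_cap, max_freq, cur_freq, freq_invariant=True):
    """Return 1.25 * base_freq * util / max_cap using integer math."""
    # Frequency-invariant util scales from max_freq, otherwise from cur_freq.
    base = max_freq if freq_invariant else cur_freq
    # base + base / 4 equals 1.25 * base.
    return (base + (base >> 2)) * util // max_cap


# At the tipping point (util / max == 0.8) the result equals the base frequency.
print(next_freq(util=800, max_cap=1000, max_freq=2_000_000, cur_freq=1_000_000))
```

Above 80% utilization the formula asks for more than the base frequency, which is what pushes the governor toward the next higher operating point.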
All of the computations are carried out in the utilization update
handlers provided by the new governor. One of those handlers is
used for cpufreq policies shared between multiple CPUs and the other
one is for policies with one CPU only (and therefore it doesn't need
to use any extra synchronization means).
The governor supports fast frequency switching if that is supported
by the cpufreq driver in use and possible for the given policy.
In the fast switching case, all operations of the governor take
place in its utilization update handlers. If fast switching cannot
be used, the frequency switch operations are carried out with the
help of a work item which only calls __cpufreq_driver_target()
(under a mutex) to trigger a frequency update (to a value already
computed beforehand in one of the utilization update handlers).
Currently, the governor treats all of the RT and DL tasks as
"unknown utilization" and sets the frequency to the allowed
maximum when updated from the RT or DL sched classes. That
heavy-handed approach should be replaced with something more
subtle and specifically targeted at RT and DL tasks.
The governor shares some tunables management code with the
"ondemand" and "conservative" governors and uses some common
definitions from cpufreq_governor.h, but apart from that it
is stand-alone.
Change-Id: I03876e622768e4b3ee4dc28682af7cce771f2f4c
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
(cherry-picked from 9bdcb44e391da5c41b98573bf0305a0e0b1c9569)
[ Backport the schedutil cpufreq governor from 4.9. Some cpufreq
tunable infrastructure as well as the resolve_freq API is also
backported as those are dependencies]
Signed-off-by: Steve Muckle <smuckle@linaro.org>
[trivial cherry-picking fixes]
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
[fixed default governor machinery]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
|