| Commit message (Collapse) | Author | Age |
| ... | |
| | | |
| | | |
When calculating the impact of placing a task on a specific CPU, we
should take into account any minimum capacity imposed upon that CPU,
since it could change the performance and/or energy cost decisions.
When choosing an idle backup CPU, favour CPUs that won't end up
running at a high OPP due to a min capacity cap imposed by external
actors.
Change-Id: I566623ffb3a7c5b61a23242dcce1cb4147ef8a4a
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
|
| | | |
| | | |
Add the ability to track minimum capacity forced onto a sched_group
by some external actor.
group_max_util returns the highest utilisation inside a sched_group
and is used when we are trying to calculate an energy cost estimate
for a specific scheduling scenario. Minimum capacities imposed from
elsewhere will influence this energy cost so we should reflect it
here.
Change-Id: Ibd537a6dbe6d67b11cc9e9be18f40fcb2c0f13de
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
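A minimal sketch of the idea described in the commit above, not the kernel's implementation: the structure, its fields and the helper name are hypothetical, and utilisation is assumed to be expressed on the usual 0..1024 capacity scale. Each CPU is treated as being at least as busy as the minimum capacity imposed on it before the group maximum is taken.

```c
#include <stddef.h>

/* Hypothetical per-CPU snapshot; field names are illustrative only. */
struct cpu_sample {
    unsigned long util;    /* current utilisation, 0..1024 */
    unsigned long min_cap; /* externally imposed minimum capacity, 0 if none */
};

/*
 * Return the highest utilisation in a group, clamping each CPU's
 * utilisation from below by its imposed minimum capacity.
 */
unsigned long group_max_util_sketch(const struct cpu_sample *cpus, size_t n)
{
    unsigned long max_util = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long u = cpus[i].util;

        if (u < cpus[i].min_cap)
            u = cpus[i].min_cap;
        if (u > max_util)
            max_util = u;
    }
    return max_util;
}
```

With no minimum capacity set (min_cap == 0 everywhere) the helper reduces to a plain maximum over the group.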
|
| | | |
| | | |
We have the ability to track minimum capacity forced onto a CPU by
userspace or external actors. This is provided through a minimum
frequency scale factor exposed by arch_scale_min_freq_capacity.
The use of this information is enabled through the MIN_CAPACITY_CAPPING
feature. If not enabled, the minimum frequency scale factor will
remain 0 and it will not impact energy estimation or scheduling
decisions.
Change-Id: Ibc61f2bf4fddf186695b72b262e602a6e8bfde37
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
|
| | | |
| | | |
If the minimum capacity of a group is capped by userspace or internal
dependencies which are not otherwise visible to the scheduler, we need
a way to see these and integrate this information into the energy
calculations and task placement decisions we make.
Add arch_scale_min_freq_capacity to determine the lowest capacity which
a specific cpu can provide under the current set of known constraints.
Change-Id: Ied4a1dc0982bbf42cb5ea2f27201d4363db59705
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
|
| | | |
| | | |
The max frequency scaling factor is defined as:
max_freq_scale = policy_max_freq / cpuinfo_max_freq
To be able to scale the cpu capacity by this factor introduce a call to
the new arch scaling function arch_scale_max_freq_capacity() in
update_cpu_capacity() and provide a default implementation which returns
SCHED_CAPACITY_SCALE.
Another subsystem (e.g. cpufreq) can overwrite this default implementation,
exactly as for frequency and cpu invariance. It has to be enabled by the
arch by defining arch_scale_max_freq_capacity to the actual
implementation.
Change-Id: I266cd1f4c1c82f54b80063c36aa5f7662599dd28
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
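The scaling factor defined in the commit above can be sketched in fixed-point arithmetic as follows. This is an illustration only: the two helper names are hypothetical, frequencies are assumed to be in kHz, and 1024 (SCHED_CAPACITY_SCALE) is assumed to represent a factor of 1.0.

```c
#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

/* max_freq_scale = policy_max_freq / cpuinfo_max_freq, where 1024 == 1.0 */
unsigned long max_freq_scale_sketch(unsigned long policy_max_khz,
                                    unsigned long cpuinfo_max_khz)
{
    if (cpuinfo_max_khz == 0)
        return SCHED_CAPACITY_SCALE; /* fall back to "no capping" */
    return (policy_max_khz << SCHED_CAPACITY_SHIFT) / cpuinfo_max_khz;
}

/* Scale a CPU capacity by the factor computed above. */
unsigned long scale_capacity_sketch(unsigned long capacity,
                                    unsigned long max_freq_scale)
{
    return (capacity * max_freq_scale) >> SCHED_CAPACITY_SHIFT;
}
```

A policy maximum of half the hardware maximum gives a factor of 512, which halves the reported capacity.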
|
| | | |
| | | |
The SG's energy is obtained by adding busy and idle contributions which
are computed by considering a proper fraction of the
SCHED_CAPACITY_SCALE defined by the SG's utilizations.
By scaling each and every contribution computed we risk accumulating
rounding errors which can result in a non-null energy_delta even in
cases when the same total accumulated utilization is distributed
differently among different CPUs.
To reduce rounding errors, this patch accumulates non-scaled busy/idle
energy contributions for each visited SG, and scales each of them just
once at the end.
Change-Id: Idf8367fee0ac11938c6436096f0c1b2d630210d2
Suggested-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
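The rounding effect described in the commit above can be reproduced with a small sketch. The function names and the flat util/power arrays are hypothetical simplifications, not the kernel's data structures; the point is only that truncating after every contribution loses more than truncating once.

```c
#include <stddef.h>

#define SCHED_CAPACITY_SHIFT 10

/* Scale every contribution individually: each shift truncates. */
unsigned long energy_scaled_each(const unsigned long *util,
                                 const unsigned long *power, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += (util[i] * power[i]) >> SCHED_CAPACITY_SHIFT;
    return total;
}

/* Accumulate unscaled contributions and scale only once at the end. */
unsigned long energy_scaled_once(const unsigned long *util,
                                 const unsigned long *power, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += util[i] * power[i];
    return total >> SCHED_CAPACITY_SHIFT;
}
```

For a total utilization of 512 with a power value of 3, per-contribution scaling yields 1 when the load sits on one CPU but 0 when it is split 256/256, whereas scaling once yields 1 in both cases.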
|
| | | |
| | | |
The energy_env data structure is used to cache values required by multiple
different functions involved in energy_diff computation. Some of these
functions require additional parameters which can be easily embedded
into the energy_env itself. The current implementation of energy_diff
hardcodes the usage of two different energy_env structures to estimate and
compare the energy consumption related to a "before" and an "after" CPU.
Moreover, it does this energy estimation by walking the SDs/SGs data
structures multiple times.
A better design can be envisioned by better using the energy_env structure
to support a more efficient and concurrent evaluation of multiple schedule
candidates. To this purpose, this patch provides a complete re-factoring
of the energy_diff implementation to:
1. use a single energy_env structure for the evaluation of all the
candidate CPUs
2. walk the SDs/SGs just once, thus improving the overall performance
of computing the energy estimation for each CPU candidate specified by
the single energy_env in use
3. simplify the code (at least if you look at the new version and not at
this re-factoring patch), thus providing cleaner code to maintain
and extend for additional features
This patch updates all the clients of energy_env to use only the data
provided by this structure and an index for one of its candidate CPUs.
Embedding everything within the energy env will make it simple to add
tracepoints for this new version, which can easily provide a holistic
view of how energy_diff evaluated the proposed CPU candidates.
The new proposed structure, for both "struct energy_env" and the functions
using it, is designed to easily accommodate further extensions (e.g.
SchedTune filtering) without requiring another big re-factoring of these
core functions.
Finally, the usage of a candidate CPUs array, embedded into the
energy_diff structure, also allows seamlessly extending the exploration
of multiple candidate CPUs, for example to support the comparison of a
spread-vs-packing strategy.
Change-Id: Ic04ffb6848b2c763cf1788767f22c6872eb12bee
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
[reworked find_new_capacity() and enforced the respect of
find_best_target() selection order]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
[@0ctobot: Adapted for kernel.lnx.4.4.r35-rel]
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
|
| | | |
| | | |
The current definition of select_energy_cpu_brute is a bit confusing in
the definition of the value for the target_cpu to be returned as wakeup
CPU for the specified task.
This cleans up the code by ensuring that we always set target_cpu right
before returning it. rcu_read_lock and check on *sd!=NULL are also moved
around to be exactly where they are required.
Change-Id: I70a4b558b3624a13395da1a87ddc0776fd1d6641
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
|
| | | |
| | | |
In preparation for the energy_diff refactoring, let's remove all the
SchedTune specific bits which are used to keep track of the capacity
variations required by the PESpace filtering.
This also removes the energy_normalization function and the wrapper of
energy_diff which is used to trigger a PESpace filtering by
schedtune_accept_deltas().
The remaining code is the "original" energy_diff function which
looks just at the energy variations to compare prev_cpu vs next_cpu.
Change-Id: I4fb1d1c5ba45a364e6db9ab8044969349aba0307
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
|
| | | |
| | | |
The format of the energy_diff tracepoint is going to be changed by the
following energy_diff refactoring patches. Let's remove it now to start from
a clean slate.
Change-Id: Id4f537ed60d90a7ddcca0a29a49944bfacb85c8c
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
|
| | | |
| | | |
This is a simple renaming patch which just aligns with the most common code
convention used in fair.c: task_struct pointers are usually named *p.
Change-Id: Id0769e52b6a271014d89353fdb4be9bb721b5b2f
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
|
| | | |
| | | |
error: a function declaration without a prototype is deprecated in all versions of C
Change-Id: Iea020e1a126d23f5c8056807ac9c02a79493153b
|
| | | |
| | | |
Change-Id: I126075a330f305c85f8fe1b8c9d408f368be95d1
|
| |\| |
| | | |
into lineage-20
7d11b1a7a11c Revert "sched: cpufreq: Use sched_clock instead of rq_clock when updating schedutil"
daaa5da96a74 sched: Take irq_sparse lock during the isolation
217ab2d0ef91 rcu: Speed up calling of RCU tasks callbacks
997b726bc092 kernel: power: Workaround for sensor ipc message causing high power consume
b933e4d37bc0 sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
82d3f23d6dc5 sched/fair: Fix bandwidth timer clock drift condition
629bfed360f9 kernel: power: qos: remove check for core isolation while cluster LPMs
891a63210e1d sched/fair: Fix issue where frequency update not skipped
b775cb29f663 ANDROID: Move schedtune en/dequeue before schedutil update triggers
ebdb82f7b34a sched/fair: Skip frequency updates if CPU about to idle
ff383d94478a FROMLIST: sched: Make iowait_boost optional in schedutil
9539942cb065 FROMLIST: cpufreq: Make iowait boost a policy option
b65c91c9aa14 ARM: dts: msm: add HW CPU's busy-cost-data for additional freqs
72f13941085b ARM: dts: msm: fix CPU's idle-cost-data
ab88411382f7 ARM: dts: msm: fix EM to be monotonically increasing
83dcbae14782 ARM: dts: msm: Fix EAS idle-cost-data property length
33d3b17bfdfb ARM: dts: msm: Add msm8998 energy model
c0fa7577022c sched/walt: Re-add code to allow WALT to function
d5cd35f38616 FROMGIT: binder: use EINTR for interrupted wait for work
db74739c86de sched: Don't fail isolation request for an already isolated CPU
aee7a16e347b sched: WALT: increase WALT minimum window size to 20ms
4dbe44554792 sched: cpufreq: Use per_cpu_ptr instead of this_cpu_ptr when reporting load
ef3fb04c7df4 sched: cpufreq: Use sched_clock instead of rq_clock when updating schedutil
c7128748614a sched/cpupri: Exclude isolated CPUs from the lowest_mask
6adb092856e8 sched: cpufreq: Limit governor updates to WALT changes alone
0fa652ee00f5 sched: walt: Correct WALT window size initialization
41cbb7bc59fb sched: walt: fix window misalignment when HZ=300
43cbf9d6153d sched/tune: Increase the cgroup limit to 6
c71b8fffe6b3 drivers: cpuidle: lpm-levels: Fix KW issues with idle state idx < 0
938e42ca699f drivers: cpuidle: lpm-levels: Correctly check for list empty
8d8a48aecde5 sched/fair: Fix load_balance() affinity redo path
eccc8acbe705 sched/fair: Avoid unnecessary active load balance
0ffdb886996b BACKPORT: sched/core: Fix rules for running on online && !active CPUs
c9999f04236e sched/core: Allow kthreads to fall back to online && !active cpus
b9b6bc6ea3c0 sched: Allow migrating kthreads into online but inactive CPUs
a9314f9d8ad4 sched/fair: Allow load bigger task load balance when nr_running is 2
c0b317c27d44 pinctrl: qcom: Clear status bit on irq_unmask
45df1516d04a UPSTREAM: mm: fix misplaced unlock_page in do_wp_page()
899def5edcd4 UPSTREAM: mm/ksm: Remove reuse_ksm_page()
46c6fbdd185a BACKPORT: mm: do_wp_page() simplification
90dccbae4c04 UPSTREAM: mm: reuse only-pte-mapped KSM page in do_wp_page()
ebf270d24640 sched/fair: vruntime should normalize when switching from fair
cbe0b37059c9 mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
12d40f1995b4 msm: mdss: Fix indentation
620df03a7229 msm: mdss: Treat polling_en as the bool that it is
12af218146a6 msm: mdss: add idle state node
13e661759656 cpuset: Restore tasks affinity while moving across cpusets
602bf4096dab genirq: Honour IRQ's affinity hint during migration
9209b5556f6a power: qos: Use effective affinity mask
f31078b5825f genirq: Introduce effective affinity mask
58c453484f7e sched/cputime: Mitigate performance regression in times()/clock_gettime()
400383059868 kernel: time: Add delay after cpu_relax() in tight loops
1daa7ea39076 pinctrl: qcom: Update irq handle for GPIO pins
07f7c9961c7c power: smb-lib: Fix mutex acquisition deadlock on PD hard reset
094b738f46c8 power: qpnp-smb2: Implement battery charging_enabled node
d6038d6da57f ASoC: msm-pcm-q6-v2: Add dsp buf check
0d7a6c301af8 qcacld-3.0: Fix OOB in wma_scan_roam.c
Change-Id: Ia2e189e37daad6e99bdb359d1204d9133a7916f4
|
| | | |
| | | |
schedutil"
That commit should have changed rq_clock to sched_clock, instead
of sched_ktime_clock, which kept schedutil from making correct
decisions.
This reverts commit ef3fb04c7df43dfa1793e33f764a2581cda96310.
Change-Id: Id4118894388c33bf2b2d3d5ee27eb35e82dc4a96
|
| | | |
| | | |
irq_migrate_all_off_this_cpu() is used to migrate IRQs and this
function checks for all active irq in the allocated_irqs mask.
irq_migrate_all_off_this_cpu() expects the caller to take irq_sparse
lock to avoid race conditions while accessing allocated_irqs
mask variable. Prevent a race between irq alloc/free and irq
migration by adding irq_sparse lock across CPU isolation.
Change-Id: I9edece1ecea45297c8f6529952d88b3133046467
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
|
| | | |
| | | |
Joel Fernandes found that the synchronize_rcu_tasks() was taking a
significant amount of time. He demonstrated it with the following test:
# cd /sys/kernel/tracing
# while [ 1 ]; do x=1; done &
# echo '__schedule_bug:traceon' > set_ftrace_filter
# time echo '!__schedule_bug:traceon' > set_ftrace_filter;
real 0m1.064s
user 0m0.000s
sys 0m0.004s
Where it takes a little over a second to perform the synchronize,
because there's a loop that waits 1 second at a time for tasks to get
through their quiescent points when there's a task that must be waited
for.
After discussion we came up with a simple way to wait for holdouts:
start with a short wait and increase the time for each iteration of the
loop, but to no more than a full second.
With the new patch we have:
# time echo '!__schedule_bug:traceon' > set_ftrace_filter;
real 0m0.131s
user 0m0.000s
sys 0m0.004s
Which drops it down to 13% of what the original wait time was.
Link: http://lkml.kernel.org/r/20180523063815.198302-2-joel@joelfernandes.org
Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Change-Id: I40bcecdfdb2a1cdae7195f1d3b107455ea4b26b1
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
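The back-off described in the commit above could look roughly like the sketch below. It is a user-space illustration under stated assumptions: the helper name is hypothetical, and the starting point of one tenth of a second with a divisor that shrinks by one per iteration is an assumption chosen for illustration, since the message only says that the wait grows per iteration and is capped at a full second.

```c
/*
 * Hypothetical helper: how long to sleep (in milliseconds) before the
 * n-th re-check of the holdout list, with n starting at 0.
 * The wait starts at 1/10 s (assumed) and grows towards a 1 s cap.
 */
unsigned int holdout_wait_ms(unsigned int iteration)
{
    /* Divisor goes 10, 9, 8, ... and never drops below 1. */
    unsigned int fract = (iteration < 9) ? 10 - iteration : 1;

    return 1000 / fract;
}
```

Under these assumptions the first check happens after 100 ms instead of a full second, which is consistent in spirit with the shorter wall-clock time reported in the commit message.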
|
| | | |
| | | |
Sync from Qcom's document KBA-180725024109
To prevent non-wakeup sensor data from breaking the AP sleep flow,
notify the sensor subsystem at the very start of pm_suspend.
Bug: 118418963
Test: measure power consumption after running test case
Change-Id: I2848230d495e30ac462aef148b3f885103d9c24e
Signed-off-by: Frank Luo <luofrank@google.com>
|
| | | |
| | | |
cpu-local slices
commit de53fd7aedb100f03e5d2231cfce0e4993282425 upstream.
It has been observed, that highly-threaded, non-cpu-bound applications
running under cpu.cfs_quota_us constraints can hit a high percentage of
periods throttled while simultaneously not consuming the allocated
amount of quota. This use case is typical of user-interactive non-cpu
bound applications, such as those running in kubernetes or mesos when
run on multiple cpu cores.
This has been root caused to cpu-local run queues being allocated per cpu
bandwidth slices, and then not fully using that slice within the period,
at which point the slice and quota expire. This expiration of unused
slice results in applications not being able to utilize the quota
allocated to them.
The non-expiration of per-cpu slices was recently fixed by
'commit 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift
condition")'. Prior to that it appears that this had been broken since
at least 'commit 51f2176d74ac ("sched/fair: Fix unlocked reads of some
cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That
added the following conditional which resulted in slices never being
expired.
if (cfs_rq->runtime_expires != cfs_b->runtime_expires) {
/* extend local deadline, drift is bounded above by 2 ticks */
cfs_rq->runtime_expires += TICK_NSEC;
Because this was broken for nearly 5 years, and has recently been fixed
and is now being noticed by many users running kubernetes
(https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion
that the mechanisms around expiring runtime should be removed
altogether.
This allows quota already allocated to per-cpu run-queues to live longer
than the period boundary. This allows threads on runqueues that do not
use much CPU to continue to use their remaining slice over a longer
period of time than cpu.cfs_period_us. However, this helps prevent the
above condition of hitting throttling while also not fully utilizing
your cpu quota.
This theoretically allows a machine to use slightly more than its
allotted quota in some periods. This overflow would be bounded by the
remaining quota left on each per-cpu runqueue. This is typically no
more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will
change nothing, as they should theoretically fully utilize all of their
quota in each period. For user-interactive tasks as described above this
provides a much better user/application experience as their cpu
utilization will more closely match the amount they requested when they
hit throttling. This means that cpu limits no longer strictly apply per
period for non-cpu bound applications, but that they are still accurate
over longer timeframes.
This greatly improves performance of high-thread-count, non-cpu bound
applications with low cfs_quota_us allocation on high-core-count
machines. In the case of an artificial testcase (10ms/100ms of quota on
80 CPU machine), this commit resulted in almost 30x performance
improvement, while still maintaining correct cpu quota restrictions.
That testcase is available at https://github.com/indeedeng/fibtest.
Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
Change-Id: I7d7a39fb554ec0c31f9381f492165f43c70b3924
Signed-off-by: Dave Chiluk <chiluk+linux@indeed.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Hammond <jhammond@indeed.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kyle Anderson <kwa@yelp.com>
Cc: Gabriel Munos <gmunoz@netflix.com>
Cc: Peter Oskolkov <posk@posk.io>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Brendan Gregg <bgregg@netflix.com>
Link: https://lkml.kernel.org/r/1563900266-19734-2-git-send-email-chiluk+linux@indeed.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
| | | |
| | | |
commit 512ac999d2755d2b7109e996a76b6fb8b888631d upstream.
I noticed that cgroup task groups constantly get throttled even
if they have low CPU usage, this causes some jitters on the response
time to some of our business containers when enabling CPU quotas.
It's very simple to reproduce:
mkdir /sys/fs/cgroup/cpu/test
cd /sys/fs/cgroup/cpu/test
echo 100000 > cpu.cfs_quota_us
echo $$ > tasks
then repeat:
cat cpu.stat | grep nr_throttled # nr_throttled will increase steadily
After some analysis, we found that cfs_rq::runtime_remaining will
be cleared by expire_cfs_rq_runtime() due to two equal but stale
"cfs_{b|q}->runtime_expires" after period timer is re-armed.
The current condition to judge clock drift in expire_cfs_rq_runtime()
is wrong, the two runtime_expires are actually the same when clock
drift happens, so this condition can never hit. The original design was
correctly done by this commit:
a9cf55b28610 ("sched: Expire invalid runtime")
... but was changed to be the current implementation due to its locking bug.
This patch introduces another way, it adds a new field in both structures
cfs_rq and cfs_bandwidth to record the expiration update sequence, and
uses them to figure out if clock drift happens (true if they are equal).
Change-Id: I8168fe3b45785643536f289ea823d1a62d9d8ab2
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[alakeshh: backport: Fixed merge conflicts:
- sched.h: Fix the indentation and order in which the variables are
declared to match with coding style of the existing code in 4.14
Struct members of same type were declared in separate lines in
upstream patch which has been changed back to having multiple
members of same type in the same line.
e.g. int a; int b; -> int a, b; ]
Signed-off-by: Alakesh Haloi <alakeshh@amazon.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
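The sequence-based check described in the commit above can be sketched as follows. The structures, the helper name and the tick length are simplified, hypothetical stand-ins (the real cfs_rq and cfs_bandwidth structures carry many more fields); only the decision logic is illustrated: equal sequence numbers are treated as clock drift, unequal ones as a stale local deadline.

```c
#include <stdint.h>

#define TICK_NSEC_SKETCH 4000000ULL /* illustrative tick length; an assumption */

/* Hypothetical, trimmed-down stand-ins for the scheduler structures. */
struct cfs_bandwidth_sketch {
    uint64_t runtime_expires;
    int expires_seq;          /* bumped whenever the global deadline is refreshed */
};

struct cfs_rq_sketch {
    uint64_t runtime_expires;
    int64_t runtime_remaining;
    int expires_seq;          /* copy taken when local runtime was assigned */
};

void expire_runtime_sketch(struct cfs_rq_sketch *rq,
                           const struct cfs_bandwidth_sketch *b, uint64_t now)
{
    /* Local deadline not reached yet: nothing to do. */
    if ((int64_t)(now - rq->runtime_expires) < 0)
        return;

    if (rq->expires_seq == b->expires_seq) {
        /* Same refresh generation: treat as clock drift, extend deadline. */
        rq->runtime_expires += TICK_NSEC_SKETCH;
    } else {
        /* The global pool was refreshed since: local runtime is stale. */
        rq->runtime_remaining = 0;
    }
}
```

Comparing sequence numbers rather than the two runtime_expires values avoids the case the message calls out, where both timestamps are equal but stale.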
|
| | | |
| | | |
Since all cores in a cluster are in isolation, the PMQoS latency constraint
set by the clock driver to switch the PLL is ignored. So the cluster enters
L2PC and SPM tries to disable the PLL while, at the same time, the clock
driver tries to switch the PLL from another cluster, which leads to
synchronization issues.
The fix is to honor PMQoS requests for cluster LPMs even when all cores
are in isolation.
Change-Id: I4296e16ef4e9046d1fbe3b7378e9f61a2f11c74d
Signed-off-by: Raghavendra Kakarla <rkakarla@codeaurora.org>
|
| | | |
| | | |
This patch fixes one of the infrequent conditions in
commit 54b6baeca500 ("sched/fair: Skip frequency updates if CPU about to idle")
where we could have skipped a frequency update. The fix is to use the
correct flag which skips freq updates.
Note that this is a rare issue (can show up only during CFS throttling)
and even then we just do an additional frequency update which we were
doing anyway before the above patch.
Bug: 64689959
Change-Id: I0117442f395cea932ad56617065151bdeb9a3b53
Signed-off-by: Joel Fernandes <joelaf@google.com>
|
| | | |
| | | |
CPU rq util updates happen when rq signals are updated as part of
enqueue and dequeue operations. Doing these updates triggers a call to
the registered util update handler, which takes schedtune boosting
into account. Enqueueing the task in the correct schedtune group after
this happens means that we will potentially not see the boost for an
entire throttle period.
Move the enqueue/dequeue operations for schedtune before the signal
updates which can trigger OPP changes.
Change-Id: I4236e6b194bc5daad32ff33067d4be1987996780
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
|
| | | |
| | | |
If the CPU is about to idle, prevent a frequency update. With this, the
number of schedutil governor wake-ups is reduced by more than half on a
test playing Bluetooth audio.
Test: sugov wake ups drop by more than half when playing music with
screen off (476 / 1092)
Bug: 64689959
Change-Id: I400026557b4134c0ac77f51c79610a96eb985b4a
Signed-off-by: Joel Fernandes <joelaf@google.com>
|
| | | |
| | | |
We should apply the iowait boost only if cpufreq policy has iowait boost
enabled. Also make it a schedutil configuration from sysfs so it can be
turned on/off if needed (by default initialize it to the policy value).
For systems that don't need/want it enabled, such as those on arm64
based mobile devices that are battery operated, it saves energy when the
cpufreq driver policy doesn't have it enabled (details below):
Here are some results for energy measurements collected running a
YouTube video for 30 seconds:
Before: 8.042533 mWh
After: 7.948377 mWh
Energy savings is ~1.2%
Bug: 38010527
Link: https://lkml.org/lkml/2017/5/19/42
Change-Id: If124076ad0c16ade369253840dedfbf870aff927
Signed-off-by: Joel Fernandes <joelaf@google.com>
|
| | | |
| | | |
Make iowait boost a cpufreq policy option and enable it for intel_pstate
cpufreq driver. Governors like schedutil can use it to determine if
boosting for tasks that wake up with p->in_iowait set is needed.
Bug: 38010527
Link: https://lkml.org/lkml/2017/5/19/43
Change-Id: Icf59e75fbe731dc67abb28fb837f7bb0cd5ec6cc
Signed-off-by: Joel Fernandes <joelaf@google.com>
|
| | | |
| | | |
The initial Energy Model was calculated with a device exposing fewer
available frequencies. This change adds the missing values; note that
all performance values had to be updated so they would be re-normalized
to 0-1024.
Bug: 64837462
Test: YouTube did not have energy regression
Change-Id: I2b4c62d06e39fe0da524af96568187042664d62a
Signed-off-by: Andres Oportus <andresoportus@google.com>
|
| | | |
| | | |
CPU idle states are mapped into EAS energy model data structures
according to this table:
+ cpu_idle_status
| + idle-cost-data index
| | + meaning
| | | + expected energy cost
| | | |
-1 0: CPU active CPU energy > 0
0 1: CPU WFI CPU energy > 0
1 2: CPU off (cluster on) CPU energy = 0
2 3: CPU off (cluster off) CPU energy = 0
Change-Id: I4b51bb74cb96c265731f3872c95947474db973ac
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
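The table in the commit above amounts to an offset of one between the cpu_idle state and the idle-cost-data index. A minimal sketch, assuming exactly the four rows listed there; the helper name and the -1 error value are hypothetical:

```c
/*
 * Map a cpu_idle state (-1 means "CPU active") to its idle-cost-data
 * index, following the table: -1 -> 0, 0 -> 1, 1 -> 2, 2 -> 3.
 * Returns -1 for a state outside the range covered by the table.
 */
int idle_cost_index_sketch(int cpu_idle_state)
{
    if (cpu_idle_state < -1 || cpu_idle_state > 2)
        return -1;
    return cpu_idle_state + 1;
}
```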
|
| | | |
| | | |
Change-Id: Iad2e3882a2e9d7dbbfd80cf485bbb1f0e664b04f
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
|
| | | |
| | | |
We need 4 idle-cost-data for CPUs, despite cpu_idle supporting
only 3 different idle states. The idle-cost-data property length
should always be one more entry longer than the number of available
cpu_idle states. The idle-cost-data property has to have the same
length for both CLUSTER_COST_N and CPU_COST_N.
Bug: 37641804
Change-Id: Ic14a6a1ef4409e81c5adc23575f7d1157d6eadce
Signed-off-by: Siqi Lin <siqilin@google.com>
|
| | | |
| | | |
Squash of commits:
ed6442938f08: Enable EAS in 8998 MTP
3989a0e22e44: Update Energy Model using Muskie
922c6f4b9e8b: Added idle-cost-data to energy model and fixed
busy-cost-data for big cluster cpus
Change-Id: I717eb88204f5e28a1afd494dc484895cc749e2fc
|
| | | |
| | | |
Change-Id: Ieb1067c5e276f872ed4c722b7d1fabecbdad87e7
|
| | | |
| | | |
When interrupted by a signal, binder_wait_for_work currently returns
-ERESTARTSYS. This error code isn't propagated to user space, but a way
to handle interruption due to signals must be provided to code using
this API.
Replace this instance of -ERESTARTSYS with -EINTR, which is propagated
to user space.
Bug: 180989544
(cherry picked from commit 48f10b7ed0c23e2df7b2c752ad1d3559dad007f9
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-testing)
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Li Li <dualli@google.com>
Test: built, booted, interrupted a worker thread within
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20210316011630.1121213-3-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ie6c7993cab699bc2c1a25a2f9d94b200a1156e5d
|
| | | |
| | | |
When isolating a CPU, a check is performed to see if there is
only 1 active CPU in the system. If that is the case, the
CPU is not isolated. However this check is done before testing
if the requested CPU is already isolated or not. If the
requested CPU is already isolated, there is no need to fail
the isolation even when there is only 1 active CPU in the system.
For example, CPUs 0-6 are isolated on an 8 CPU machine. When
an isolation request comes for CPU6, which is already isolated,
the current code fails the request, thinking we would end up
with no active CPU in the system.
Change-Id: I28fea4ff67ffed82465e5cfa785414069e4a180a
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
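The ordering issue described in the commit above can be illustrated with a small sketch. Everything here is a hypothetical simplification: CPUs are bits in a 32-bit mask, the helper name is invented, and -1 stands in for whatever error code the kernel returns. The already-isolated test comes first, so a repeated request never trips the "last active CPU" guard.

```c
#include <stdint.h>

/*
 * Try to isolate one CPU (0..31). Returns 0 on success or when the CPU
 * is already isolated, -1 when isolating it would leave no active CPU.
 */
int isolate_cpu_sketch(uint32_t *isolated, uint32_t online, int cpu)
{
    uint32_t bit = 1u << cpu;
    uint32_t active;

    /* Check this first: an already isolated CPU needs no further work. */
    if (*isolated & bit)
        return 0;

    /* Refuse if at most one CPU is still active (zero or one bit set). */
    active = online & ~*isolated;
    if ((active & (active - 1)) == 0)
        return -1;

    *isolated |= bit;
    return 0;
}
```

With CPUs 0-6 isolated on an 8 CPU system, a request for CPU6 succeeds as a no-op, while a request for CPU7 is still refused.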
|
| | | |
| | | |
Increase WALT minimum window size to 20ms. 10ms isn't large enough
to capture the workload's pattern.
[beykerykt]: Adapt for HMP
Change-Id: I4d69577fbfeac2bc23db4ff414939cc51ada30d6
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
| | | |
| | | |
We need cpufreq_update_util to report load for the CPU corresponding
to the rq that is passed in as an argument, rather than the CPU executing
cpufreq_update_util.
Change-Id: I8473f230d40928d5920c614760e96fef12745d5a
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
| | | |
| | | |
rq_clock may not be updated often enough for schedutil or other
cpufreq governors to work correctly when it's passed as the
timestamp for a load report. Use sched_clock instead.
[beykerykt]: Switch to sched_ktime_clock()
Change-Id: I745b727870a31da25f766c2c2f37527f568c20da
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
| | | |
| | | |
cpupri_find() returns, in lowest_mask, the candidate CPUs which are
running at a lower priority than the waking RT task. This mask
contains isolated CPUs as well. Since the energy aware CPU selection
skips isolated CPUs, no target CPU may be found if all unisolated CPUs
are running higher priority RT tasks. In that case, we fall back to
the default CPU selection algorithm, which returns an isolated CPU.
select_task_rq() reverses this decision and returns an unisolated
CPU that is busy with other RT tasks. This RT task packing is desired
behavior. However, the RT push mechanism pushes the packed RT task to
an isolated CPU. This can be avoided by excluding isolated CPUs from
the lowest_mask returned by cpupri_find().
Change-Id: I75486b3935caf496a638d0333565beffc47fe249
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
| | | |
| | | |
It's not necessary to keep reporting load to the governor
if it doesn't change in a window. Limit updates to when
we expect load changes - after window rollover and when
we send updates related to intercluster migrations.
[beykerykt]: Adapt for HMP
Change-Id: I3232d40f3d54b0b81cfafdcdb99b534df79327bf
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
| | | |
| | | |
It is preferable that WALT window rollover occurs just
before a tick, since the tick is an opportune moment
to record a complete window's statistics, as well as report
those stats to the cpu frequency governor. When CONFIG_HZ
results in a TICK_NSEC that isn't an integral number, this
requirement may be violated. Account for this by reducing
the WALT window size to the nearest multiple of TICK_NSEC.
Commit d368c6faa19b ("sched: walt: fix window misalignment
when HZ=300") attempted to do this but WALT isn't using
MIN_SCHED_RAVG_WINDOW as the window size and the patch was
doing nothing.
Also, change the type of 'walt_disabled' to bool and warn
if an invalid window size causes WALT to be disabled.
[beykerykt]: Adapt for HMP
Change-Id: Ie3dcfc21a3df4408254ca1165a355bbe391ed5c7
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
| | | |
| | | |
Due to rounding error, the hrtimer tick interval becomes 3333333 ns when
HZ=300. Consequently, the tick timestamp nearest to WALT's default window
size of 20ms is 19999998 ns (3333333 * 6).
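The arithmetic can be checked with a small standalone sketch. The helper names tick_nsec_sketch() and align_window_to_tick() are hypothetical, and the tick length is computed with plain truncating division, which is a simplification of how the kernel derives TICK_NSEC.

```c
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL

/* Tick length in ns for a given HZ, using truncating integer division. */
uint64_t tick_nsec_sketch(unsigned int hz)
{
    return NSEC_PER_SEC_SKETCH / hz;
}

/* Round a window size down to a whole number of ticks. */
uint64_t align_window_to_tick(uint64_t window_ns, unsigned int hz)
{
    uint64_t tick = tick_nsec_sketch(hz);

    return (window_ns / tick) * tick;
}
```

For HZ=300 and a 20ms window, six whole ticks fit, giving 6 * 3333333 = 19999998 ns. For an HZ value that divides one second evenly, such as 250, the window stays at 20000000 ns.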
[beykerykt]: Adapt for HMP
Change-Id: I08f9bd2dbecccbb683e4490d06d8b0da703d3ab2
Suggested-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
| | | |
| | | |
The schedtune cgroup controller allows up to 5 cgroups including the
default/root cgroup. Until now, user space created only 4 additional
cgroups, namely foreground, background, top-app and audio-app.
Recently, another cgroup called rt was added, which is created before
the audio-app cgroup. Since the kernel limits the cgroups to 5, the
creation of the audio-app cgroup fails. Fix this by increasing
the schedtune cgroup controller cgroup limit to 6.
Change-Id: I13252a90dba9b8010324eda29b8901cb0b20bc21
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
| | | |
| | | |
Idle state calculations need to return the chosen state as an
integer. The chosen state is used as an index into the array and as such
cannot be a negative value. Do not return negative errors from the
calculations. By default, the state returned will be zero.
Change-Id: Idb18e933f385cf868fe99fa6a2783f6b8e84c196
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
|
| | | |
| | | |
Correctly check for list empty condition to get least cluster
latency.
Change-Id: I6584a8d6d77794ca506c994d927467e9c1fefa63
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
|
| | | |
| | | |
If load_balance() fails to migrate any tasks because all tasks were
affined, load_balance() removes the source cpu from consideration and
attempts to redo and balance among the new subset of cpus.
There is a bug in this code path where the algorithm considers all active
cpus in the system (minus the source that was just masked out). This is
not valid for two reasons: some active cpus may not be in the current
scheduling domain and one of the active cpus is dst_cpu. These cpus should
not be considered, as we cannot pull load from them.
Instead of failing out of load_balance(), we may end up redoing the search
with no valid cpus and incorrectly concluding the domain is balanced.
Additionally, if the group_imbalance flag was just set, it may also be
incorrectly unset, thus the flag will not be seen by other cpus in future
load_balance() runs as that algorithm intends.
Fix the check by removing cpus not in the current domain and the dst_cpu
from consideration, thus limiting the evaluation to valid remaining cpus
from which load might be migrated.
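The intent of the fix can be sketched with bitmasks standing in for cpumasks. This is not the kernel implementation: cpumask_sketch_t, redo_candidates() and should_redo_sketch() are hypothetical names, and the exact form of the check in load_balance() may differ.

```c
/* Hypothetical stand-in for a cpumask: bit i set means CPU i is a member. */
typedef unsigned int cpumask_sketch_t;

/*
 * Compute the cpus still worth searching after src_cpu turned out to
 * hold only affined tasks. Three filters are applied:
 *   1. drop src_cpu, which was just found to be unusable;
 *   2. keep only cpus that belong to the current scheduling domain;
 *   3. drop dst_cpu, since load cannot be pulled from the destination.
 */
cpumask_sketch_t redo_candidates(cpumask_sketch_t cpus,
                                 cpumask_sketch_t domain_span,
                                 int src_cpu, int dst_cpu)
{
    cpus &= ~(1u << src_cpu);
    cpus &= domain_span;
    cpus &= ~(1u << dst_cpu);
    return cpus;
}

/* Redo the search only if at least one valid source cpu remains. */
int should_redo_sketch(cpumask_sketch_t cpus, cpumask_sketch_t domain_span,
                       int src_cpu, int dst_cpu)
{
    return redo_candidates(cpus, domain_span, src_cpu, dst_cpu) != 0;
}
```

For example, with eight active cpus, a domain spanning cpus 0-3, src_cpu 2 and dst_cpu 0, only cpus 1 and 3 remain as candidates. If the domain spans only cpus 0 and 2, nothing remains and the redo is skipped.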
Co-authored-by: Austin Christ <austinwc@codeaurora.org>
Co-authored-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Change-Id: Ife6701c9c62e7155493d9db9398f08c4474e94b3
|
| | | |
| | | |
When finding the busiest group, load balance is avoided if only
1 task is running on the src cpu. Because different cpus can do
newly idle load balance at the same time, check the src cpu's
nr_running again to avoid an unnecessary active load balance.
See the race condition example here:
1) cpu2 has 2 tasks, so cpu2 rq->nr_running == 2 and cfs.h_nr_running
== 2.
2) cpu4 and cpu5 do newly idle load balance at the same time.
3) cpu4 and cpu5 both see cpu2 sched_load_balance_sg_stats sum_nr_run=2,
so they both see cpu2 as the busiest rq.
4) cpu5 successfully migrates a task from cpu2, so cpu2 has only 1 task
left, cpu2 rq->nr_running == 1 and cfs.h_nr_running == 1.
5) cpu4 surely goes to no_move because currently cpu4 only has 1 task,
which is currently running.
6) cpu4 then gets here and checks whether cpu2 needs active load balance.
Change-Id: Ia9539a43e9769c4936f06ecfcc11864984c50c29
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
|
| | | |
| | | |
As already enforced by the WARN() in __set_cpus_allowed_ptr(), the rules
for running on an online && !active CPU are stricter than just being a
kthread, you need to be a per-cpu kthread.
If you're not strictly per-CPU, you have better CPUs to run on and
don't need the partially booted one to get your work done.
The exception is to allow smpboot threads to bootstrap the CPU itself
and get kernel 'services' initialized before we allow userspace on it.
Change-Id: I515e873a6e5be0cde7771ecedf56101614300fe2
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 955dbdf4ce87 ("sched: Allow migrating kthreads into online but inactive CPUs")
Link: http://lkml.kernel.org/r/20170725165821.cejhb7v2s3kecems@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Backported to 4.4
Signed-off-by: joshuous <joshuous@gmail.com>
|
| | | |
| | | |
During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is
online but not active. A CPU_ONLINE callback may create or bind a
kthread so that its cpus_allowed mask only allows the CPU which is
being brought online. The kthread may start executing before the CPU
is made active and can end up in select_fallback_rq().
In such cases, the expected behavior is selecting the CPU which is
coming online; however, because select_fallback_rq() only chooses from
active CPUs, it determines that the task doesn't have any viable CPU
in its allowed mask and ends up overriding it to cpu_possible_mask.
CPU_ONLINE callbacks should be able to put kthreads on the CPU which
is coming online. Update select_fallback_rq() so that it follows
cpu_online() rather than cpu_active() for kthreads.
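The rule can be sketched as a standalone predicate. This is an illustration under assumptions, not the scheduler code: struct task_sketch, cpu_usable_sketch() and the bitmask representation of the online and active sets are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical task model: bit i of cpus_allowed means CPU i is allowed. */
struct task_sketch {
    bool is_kthread;
    unsigned int cpus_allowed;
};

/*
 * Decide whether a fallback CPU is acceptable for a task.
 * Kthreads may use any allowed CPU that is online, even if it is not
 * yet active; all other tasks still require an active CPU.
 */
bool cpu_usable_sketch(const struct task_sketch *p, int cpu,
                       unsigned int online_mask, unsigned int active_mask)
{
    unsigned int bit = 1u << cpu;

    if (!(p->cpus_allowed & bit))
        return false;
    if (p->is_kthread)
        return (online_mask & bit) != 0;
    return (active_mask & bit) != 0;
}
```

In this model, a kthread bound to a CPU that is online but not yet active keeps that CPU as a valid choice, so its allowed mask does not need to be overridden.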
Reported-by: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Tested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Change-Id: I562dcc53717b1f2f8324abffb652b91592ba8d5c
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@fb.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160616193504.GB3262@mtj.duckdns.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | |
| | | |
Per-cpu workqueues have been tripping CPU affinity sanity checks while
a CPU is being offlined. A per-cpu kworker ends up running on a CPU
which isn't its target CPU while the CPU is online but inactive.
While the scheduler allows kthreads to wake up on an online but
inactive CPU, it doesn't allow a running kthread to be migrated to
such a CPU, which leads to an odd situation where setting affinity on
a sleeping and running kthread leads to different results.
Each mem-reclaim workqueue has one rescuer which guarantees forward
progress and the rescuer needs to bind itself to the CPU which needs
help in making forward progress; however, due to the above issue,
while set_cpus_allowed_ptr() succeeds, the rescuer doesn't end up on
the correct CPU if the CPU is in the process of going offline,
tripping the sanity check and executing the work item on the wrong
CPU.
This patch updates __migrate_task() so that kthreads can be migrated
into an inactive but online CPU.
Change-Id: I38cc3eb3b2ec5b7034cc72a2bcdd32a549314915
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | |
| | | |
When there are only 2 tasks on 1 cpu, allow the task with
the bigger load to be balanced if the other task is
currently running.
Change-Id: I489e9624ba010f9293272a67585e8209a786b787
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
|