| Commit message | Author |
|
Change-Id: Ieb1067c5e276f872ed4c722b7d1fabecbdad87e7
|
|
milliseconds
[ Upstream commit 975e155ed8732cb81f55c021c441ae662dd040b5 ]
We added the 'sched_rr_timeslice_ms' SCHED_RR tuning knob in this commit:
ce0dbbbb30ae ("sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice")
... whose name suggests to users that it's in milliseconds, while in reality
it's being set in milliseconds but the result is shown in jiffies.
This is obviously confusing when HZ is not 1000: it makes it appear as if
setting the value failed. For example, with HZ=100:
root# echo 100 > /proc/sys/kernel/sched_rr_timeslice_ms
root# cat /proc/sys/kernel/sched_rr_timeslice_ms
10
Fix this to be milliseconds all around.
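The arithmetic behind the confusing read-back can be sketched in C. This is a userspace model, not the kernel code: ms_to_jiffies_sketch() and jiffies_to_ms_sketch() are hypothetical helpers that ignore rounding and assume HZ divides 1000 evenly.

```c
/*
 * Userspace model of the ms <-> jiffies round trip. The helper names
 * are hypothetical; rounding is ignored and HZ is assumed to divide
 * 1000 evenly.
 */

/* Convert milliseconds to jiffies (timer ticks) for a given HZ. */
unsigned int ms_to_jiffies_sketch(unsigned int ms, unsigned int hz)
{
	return ms * hz / 1000;
}

/* Convert jiffies back to milliseconds for a given HZ. */
unsigned int jiffies_to_ms_sketch(unsigned int jiffies, unsigned int hz)
{
	return jiffies * (1000 / hz);
}
```

With HZ=100, writing 100 stores 10 jiffies, which is the value the read showed. Converting back on read yields the 100 that the user wrote.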
Signed-off-by: Shile Zhang <shile.zhang@nokia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1485612049-20923-1-git-send-email-shile.zhang@nokia.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
* We'll take the upstream fix
This reverts commit 1b99e92be2574035c814686987f5add55dccfaf5.
Change-Id: I0e522a100091138689f8a4ba8c288ce1a7fc625b
|
|
The definition of sysctl_sched_migration_cost, sysctl_sched_nr_migrate
and sysctl_sched_time_avg includes the attribute const_debug. This
attribute is not part of the extern declaration of these variables in
include/linux/sched/sysctl.h, while it is in kernel/sched/sched.h,
and as a result Clang generates warnings like this:
kernel/sched/sched.h:1618:33: warning: section attribute is specified on redeclared variable [-Wsection]
extern const_debug unsigned int sysctl_sched_time_avg;
^
./include/linux/sched/sysctl.h:42:21: note: previous declaration is here
extern unsigned int sysctl_sched_time_avg;
The header only declares the variables when CONFIG_SCHED_DEBUG is defined,
therefore it is not necessary to duplicate the definition of const_debug.
Instead we can use the attribute __read_mostly, which is the expansion of
const_debug when CONFIG_SCHED_DEBUG=y is set.
Change-Id: I49537778bdb93f0ec1e0ceade94a3d32dd30b09f
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shile Zhang <shile.zhang@nokia.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171030180816.170850-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 264aed7c2d2c7159a8980a3a897a9e118b5a69f1
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
[clingutla@codeaurora.org: Resolved minor merge conflicts]
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>
|
|
../kernel/sched/sched.h:1154:36: warning: section attribute is specified
on redeclared variable [-Wsection]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
|
|
The definition of sysctl_sched_migration_cost, sysctl_sched_nr_migrate
and sysctl_sched_time_avg includes the attribute const_debug. This
attribute is not part of the extern declaration of these variables in
include/linux/sched/sysctl.h, while it is in kernel/sched/sched.h,
and as a result Clang generates warnings like this:
kernel/sched/sched.h:1618:33: warning: section attribute is specified on redeclared variable [-Wsection]
extern const_debug unsigned int sysctl_sched_time_avg;
^
./include/linux/sched/sysctl.h:42:21: note: previous declaration is here
extern unsigned int sysctl_sched_time_avg;
The header only declares the variables when CONFIG_SCHED_DEBUG is defined,
therefore it is not necessary to duplicate the definition of const_debug.
Instead we can use the attribute __read_mostly, which is the expansion of
const_debug when CONFIG_SCHED_DEBUG=y is set.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shile Zhang <shile.zhang@nokia.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171030180816.170850-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://git.kernel.org/linus/a9903f04e0a4ea522d959c2f287cdf0ab029e324
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
|
|
Add a tracepoint to track callers that disable IRQs, to
isolate issues unrelated to the scheduler and improve debug
turnaround time.
Change-Id: Ib1ef45d8bed1fc0e128b5ab2051f0c30e8c50ee7
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
|
|
Add a tracepoint to track callers that disable preemption, to
isolate issues unrelated to the scheduler and improve debug
turnaround time.
Change-Id: If9303b7165167e8f79cd339929daf4afc31a61c4
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
|
|
We all should be using (and improving) the schedutil governor now. Get
rid of the non-upstream governor.
Tested on Hikey.
Change-Id: Ic660756536e5da51952738c3c18b94e31f58cd57
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
With the new wakeup approach this sysctl is not necessary any more.
Change-Id: I52114b3c918791f6a4f9f30f50002919ccbc1a9c
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
(cherry picked from commit 885c0d503bcdf0ef4e9b46822496f16b20aa3bbd)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
|
|
Setting the sched_freq_reporting_policy tunable to an unsupported
value results in a warning from the scheduler. The previous
policy setting is also lost.
As sched_freq_reporting_policy cannot be set to an incorrect
value now, remove the WARN_ON_ONCE from the scheduler.
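A minimal sketch of the kind of write-time validation that makes the warning unnecessary. The function name and the highest supported value (2) are assumptions for illustration, not taken from the patch.

```c
#include <errno.h>

/* Assumed highest supported policy value; illustrative only. */
#define REPORTING_POLICY_MAX_SKETCH 2U

/*
 * Validate a requested policy before storing it. On an unsupported
 * value the current setting is left untouched and -EINVAL is returned,
 * so the scheduler never observes an invalid policy.
 */
int set_reporting_policy_sketch(unsigned int *current_policy,
				unsigned int requested)
{
	if (requested > REPORTING_POLICY_MAX_SKETCH)
		return -EINVAL;

	*current_policy = requested;
	return 0;
}
```

Because a rejected write leaves the previous value in place, the earlier setting is no longer lost.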
Change-Id: I58d7e5dfefb7d11d2309bc05a1dd66acdc11b766
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
Colocation in HMP includes a tunable that turns on or off the feature
globally across all colocation groups. Supporting this tunable correctly
would result in complexity that would outweigh any foreseeable benefits.
For example, disabling the feature globally would involve deleting all
colocation groups one by one while ensuring no placement decisions are
made during the process.
Remove the tunable. Adding or removing a task from a colocation group is
still possible and so we're not losing functionality.
Change-Id: I4cb8bcdbee98d3bdd168baacbac345eca9ea8879
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
|
Since clusters can vary significantly in their power and performance
characteristics, there may be a need to have different CPU selection
policies based on which cluster a task is being placed on. For example,
the placement policy can be more aggressive in using idle CPUs on
clusters that are power efficient and less aggressive on clusters
that are geared towards performance. Add support for a per-cluster
wake_up_idle flag to allow greater flexibility in placement policies.
Change-Id: I18cd3d907cd965db03a13f4655870dc10c07acfe
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
|
|
Low sleep time can be an indication that waking tasks will not receive
any vruntime bonus and hence would suffer from latency when packed.
Short-burst tasks sleeping on average less than sched_short_sleep_ns
are not eligible for packing. This policy covers the case where a
task runs in short bursts and sleeps for short durations in between.
Change-Id: Ib81fa37809b85c267949cd433bc6115dd89f100e
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
Introduce sched_short_burst tunable to classify "short-burst" tasks.
These tasks are eligible for packing to avoid overhead associated with
waking up an idle CPU. select_best_cpu() ignores power-cost and selects
the CPU with least wakeup latency which is not loaded with IRQs and
can accommodate this task without exceeding spill limits. The ties are
broken with load followed by previous CPU.
This policy does not affect cluster selection but only CPU selection
in the selected cluster. The tasks eligible for "wake-up-idle" and
"boost" are not considered for packing. This policy is applied for
both "fair" and "rt" scheduling class tasks.
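The selection order described above (lowest wakeup latency first, then load, then previous CPU) can be sketched as follows. The struct, its fields and the function name are hypothetical. The IRQ-load and spill checks are assumed to be precomputed as booleans rather than derived as the scheduler would.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU summary used only by this sketch. */
struct cpu_candidate_sketch {
	int cpu;
	unsigned int wakeup_latency;	/* lower is better */
	unsigned long load;		/* first tie-breaker */
	bool high_irq_load;		/* CPU is busy with IRQs */
	bool would_spill;		/* task would exceed spill limits */
};

/* Return the chosen CPU id, or -1 if no candidate qualifies. */
int pick_packing_cpu_sketch(const struct cpu_candidate_sketch *c, size_t n,
			    int prev_cpu)
{
	const struct cpu_candidate_sketch *best = NULL;

	for (size_t i = 0; i < n; i++) {
		/* Skip CPUs that are IRQ-loaded or cannot fit the task. */
		if (c[i].high_irq_load || c[i].would_spill)
			continue;

		/* Order: latency, then load, then previous CPU. */
		if (!best ||
		    c[i].wakeup_latency < best->wakeup_latency ||
		    (c[i].wakeup_latency == best->wakeup_latency &&
		     (c[i].load < best->load ||
		      (c[i].load == best->load && c[i].cpu == prev_cpu))))
			best = &c[i];
	}
	return best ? best->cpu : -1;
}
```

Power cost is deliberately absent from the comparison, matching the statement that it is ignored for these tasks.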
Change-Id: I2a05493fde93f58636725f18d0ce8dbce4418a30
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
The recent introduction of the schedtune cgroup controller has provided
the scheduler with added flexibility in terms of some of its placement
features. In particular each cgroup under the schedtune controller can
now specify:
1) Whether it needs co-location along with other cgroups
2) Whether it is eligible for scheduler boost (sched_boost_enabled)
3) Whether the kernel can override the boost eligibility when necessary
(sched_boost_no_override)
The scheduler now creates a reserved co-location group at boot. This
group is used to co-locate all tasks that form part of any one of the
cgroups that have co-location enabled. This reserved group can neither
be destroyed nor reused for other purposes. Furthermore, cgroups are
only allowed to indicate their co-location preference once at boot.
Further updates are disallowed.
Since we are now creating co-location groups for an extended period of
time, there are a few other factors to consider when determining the
preferred cluster for the group. We first exclude any tasks in the
group that have not been observed to be running for a significant
amount of time. Secondly we introduce the notion of group up and down
migrate tunables to allow different migration policies than individual
tasks. Lastly we break co-location if a single task in a group exceeds
up-migrate but the total load of the group does not exceed group
up-migrate.
In terms of sched_boost, the scheduler now supports multiple types of
boost. These are:
1) FULL_THROTTLE : Force up-migrate tasks belonging to any cgroup that
has the sched_boost_enabled flag turned on. Little
CPUs will only be used when big CPUs can no longer
accommodate tasks. Also up-migrate all RT tasks.
2) CONSERVATIVE : Override the sched_boost_enabled flag for all cgroups
except those that have the sched_boost_no_override
flag set. Force up-migrate all tasks belonging to only
those cgroups that still remain eligible for boost.
RT tasks are not force up-migrated.
3) RESTRAINED : Start frequency aggregation for co-located tasks. This
type of boost does not force up-migrate any task.
Finally the boost API removes ref-counting. This means that there can
only be a single entity using boost at any given time. If multiple
entities are managing boost, they are required to be well behaved so
that they don't interfere with one another. Even for a single client,
it is not possible to switch directly from one boost type to another.
Boost must be first turned off before switching over to a new type.
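The switching rule can be modelled as a small state check. This is a sketch under assumptions: the numeric values of the boost types and the function name are illustrative and may differ from the actual interface.

```c
#include <errno.h>

/* Assumed numbering of the boost types; illustrative only. */
enum boost_type_sketch {
	NO_BOOST_SKETCH = 0,
	FULL_THROTTLE_SKETCH = 1,
	CONSERVATIVE_SKETCH = 2,
	RESTRAINED_SKETCH = 3,
};

/*
 * Model of a boost setter without ref-counting: any type may be
 * enabled from the "off" state, but a request for an active type
 * while another boost is already active is rejected. Turning boost
 * off always succeeds.
 */
int set_boost_sketch(int *current_type, int requested)
{
	if (requested < NO_BOOST_SKETCH || requested > RESTRAINED_SKETCH)
		return -EINVAL;
	if (requested != NO_BOOST_SKETCH && *current_type != NO_BOOST_SKETCH)
		return -EINVAL;

	*current_type = requested;
	return 0;
}
```

A client moving from FULL_THROTTLE to CONSERVATIVE therefore needs two calls: one to turn boost off, then one to enable the new type.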
Change-Id: I8d224a70cbef162f27078b62b73acaa22670861d
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
|
|
The previous patches in this series introduce the mechanics of CPU
load tracking without fixups for intra cluster migration and top task
load tracking. Add a tunable that dictates which of the above needs to
be considered when reporting load to the governor. The default policy
is to take the maximum of the CPU load and top task load.
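A sketch of how such a policy might select the value reported to the governor. The enum names and their numbering are assumptions made for illustration. Only the default behaviour (maximum of CPU load and top task load) comes from the text above.

```c
/* Assumed policy names and numbering; illustrative only. */
enum load_report_policy_sketch {
	POLICY_MAX_CPU_LOAD_TOP_TASK = 0,	/* default */
	POLICY_CPU_LOAD = 1,
	POLICY_TOP_TASK = 2,
};

/* Pick the load value to report according to the selected policy. */
unsigned long long
load_for_governor_sketch(enum load_report_policy_sketch policy,
			 unsigned long long cpu_load,
			 unsigned long long top_task_load)
{
	switch (policy) {
	case POLICY_CPU_LOAD:
		return cpu_load;
	case POLICY_TOP_TASK:
		return top_task_load;
	case POLICY_MAX_CPU_LOAD_TOP_TASK:
	default:
		/* Default: the larger of the two signals. */
		return cpu_load > top_task_load ? cpu_load : top_task_load;
	}
}
```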
Change-Id: Ie585a11ed774b929910d04c41471db3a2a102ec5
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
|
|
The current (CFS) scheduler implementation does not allow boosting
task performance by running tasks at a higher OPP than the
minimum required to meet their workload demands.
To support task performance boosting, the scheduler should provide a
"knob" that tunes how much the system is optimised
for energy efficiency vs. performance.
This patch is the first of a series which provides a simple interface to
define a tuning knob. One system-wide "boost" tunable is exposed via:
/proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100], to define a percentage
where:
- 0% boost requests operation in "standard" mode, scheduling
tasks at the minimum capacities required by the workload demand
- 100% boost requests maximum task performance,
"regardless" of the incurred energy consumption
A boost value in between these two boundaries is used to bias the
power/performance trade-off: the higher the boost value, the more the
scheduler is biased toward performance boosting instead of energy
efficiency.
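The message does not say how a percentage between the two boundaries is applied. One plausible reading, shown purely as an assumption, adds the given percentage of the remaining capacity headroom to a task's utilization. The function name and the full-scale capacity of 1024 are hypothetical.

```c
/* Assumed full-scale capacity; illustrative only. */
#define CAPACITY_MAX_SKETCH 1024UL

/*
 * Assumed mapping from a boost percentage in [0..100] to a boosted
 * utilization: 0% leaves the utilization unchanged, 100% raises it
 * to full capacity, and values in between interpolate linearly.
 */
unsigned long boosted_util_sketch(unsigned long util, unsigned int boost_pct)
{
	unsigned long headroom;

	if (util > CAPACITY_MAX_SKETCH)
		util = CAPACITY_MAX_SKETCH;
	if (boost_pct > 100)
		boost_pct = 100;

	headroom = CAPACITY_MAX_SKETCH - util;
	return util + headroom * boost_pct / 100;
}
```

Under this assumed mapping the two documented boundaries hold: 0% reproduces the workload demand, and 100% requests maximum capacity regardless of demand.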
Change-Id: I59a41725e2d8f9238a61dfb0c909071b53560fc0
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Git-commit: 63c8fad2b06805ef88f1220551289f0a3c3529f1
Git-repo: https://source.codeaurora.org/quic/la/kernel/msm-4.4
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
|
The current policy prefers an idle CPU in the waker's cluster over the
waker CPU running only 1 task. Selecting an idle CPU eliminates the
chance of the waker migrating to a different CPU after the wakee
preempts it. This policy is also not susceptible to incorrect "sync"
usage, i.e. the waker not going to sleep after waking up the wakee.
However, the LPM exit latency associated with an idle CPU outweighs the
above benefits on some targets. So add a knob to prefer the waker
CPU having only 1 runnable task over idle CPUs in the waker cluster.
Change-Id: Id974748c07625c1b19112235f426a5d204dfdb33
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
Schedules on a core whose irq count is less than a threshold.
Improves I/O performance of EAS.
Change-Id: I08ff7dd0d22502a0106fc636b1af2e6fe9e758b5
|
|
Use a window-based view of time in order to track task
demand and CPU utilization in the scheduler.
Window Assisted Load Tracking (WALT) implementation credits:
Srivatsa Vaddagiri, Steve Muckle, Syed Rameez Mustafa, Joonwoo Park,
Pavan Kumar Kondeti, Olav Haugan
2016-03-06: Integration with EAS/refactoring by Vikram Mulukutla
and Todd Kjos
Change-Id: I21408236836625d4e7d7de1843d20ed5ff36c708
Includes fixes for issues:
eas/walt: Use walt_ktime_clock() instead of ktime_get_ns() to avoid a
race resulting in watchdog resets
BUG: 29353986
Change-Id: Ic1820e22a136f7c7ebd6f42e15f14d470f6bbbdb
Handle walt accounting anomaly during resume
During resume, there is a corner case where on wakeup, a task's
prev_runnable_sum can go negative. This is a workaround that
fixes the condition and warns (instead of crashing).
BUG: 29464099
Change-Id: I173e7874324b31a3584435530281708145773508
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
[jstultz: fwdported to 4.4]
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
The choice of initial task load upon fork has a large influence
on CPU and OPP selection when scheduler-driven DVFS is in use.
Make this tuneable by adding a new sysctl "sched_initial_task_util".
If the sched governor is not used, the default remains at SCHED_LOAD_SCALE.
Otherwise, the value from the sysctl is used. This defaults to 0.
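The selection rule reads naturally as a two-way branch. In this sketch the function name is hypothetical and SCHED_LOAD_SCALE is assumed to be 1024.

```c
#include <stdbool.h>

/* Assumed value of SCHED_LOAD_SCALE; illustrative only. */
#define SCHED_LOAD_SCALE_SKETCH 1024U

/*
 * Initial utilization given to a newly forked task: the full scale
 * when the sched governor is not in use, otherwise whatever the
 * sysctl holds (0 by default).
 */
unsigned int initial_task_util_sketch(bool sched_governor_in_use,
				      unsigned int sysctl_initial_task_util)
{
	if (!sched_governor_in_use)
		return SCHED_LOAD_SCALE_SKETCH;
	return sysctl_initial_task_util;
}
```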
Signed-off-by: "Todd Kjos <tkjos@google.com>"
|
|
EAS assumes that clusters with smaller capacity cores are more
energy-efficient. This may not be true on non-big-little devices,
so EAS can make incorrect cluster selections when finding a CPU
to wake. The "sched_is_big_little" hint can be used to cause a
cpu-based selection instead of cluster-based selection.
This change incorporates the addition of the sync hint enable patch.
EAS did not honour synchronous wakeup hints; a new sysctl is
created to ask EAS to use this information when selecting a CPU.
The control is called "sched_sync_hint_enable".
Also contains:
EAS: sched/fair: for SMP bias toward idle core with capacity
For SMP devices, on wakeup bias towards idle cores that have capacity
vs busy devices that need a higher OPP
eas: favor idle cpus for boosted tasks
BUG: 29533997
BUG: 29512132
Change-Id: I0cc9a1b1b88fb52916f18bf2d25715bdc3634f9c
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
eas/sched/fair: Favoring busy cpus with low OPPs
BUG: 29533997
BUG: 29512132
Change-Id: I9305b3239698d64278db715a2e277ea0bb4ece79
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
|
|
Introduce a new sysctl for this option, 'sched_cstate_aware'.
When this is enabled, select_idle_sibling in CFS is modified to
choose the idle CPU in the sibling group which has the lowest
idle state index - idle state indexes are assumed to increase
as sleep depth and hence wakeup latency increase. In this way,
we attempt to minimise wakeup latency when an idle CPU is
required.
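A minimal sketch of the selection idea: among the idle CPUs of a sibling group, take the one in the shallowest idle state. The struct and function names are hypothetical, and the idle-state index is assumed to be known for each CPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU view used only by this sketch. */
struct sibling_cpu_sketch {
	int cpu;
	bool idle;
	int idle_state;		/* larger index = deeper sleep, slower wakeup */
};

/* Return the idle CPU with the lowest idle-state index, or -1 if none. */
int shallowest_idle_cpu_sketch(const struct sibling_cpu_sketch *s, size_t n)
{
	int best_cpu = -1;
	int best_state = 0;

	for (size_t i = 0; i < n; i++) {
		if (!s[i].idle)
			continue;
		/* The first idle CPU seeds the search; later ones must be shallower. */
		if (best_cpu < 0 || s[i].idle_state < best_state) {
			best_cpu = s[i].cpu;
			best_state = s[i].idle_state;
		}
	}
	return best_cpu;
}
```

Ties keep the first CPU found, which is an arbitrary choice made for this sketch.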
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
Includes:
sched: EAS: fix select_idle_sibling
When sysctl_sched_cstate_aware is enabled, the best_idle CPU will not be
chosen in the original flow because it jumps to the done label directly.
Bug: 30107557
Change-Id: Ie09c2e3960cafbb976f8d472747faefab3b4d6ac
Signed-off-by: martin_liu <martin_liu@htc.com>
|
|
The current (CFS) scheduler implementation does not allow boosting
task performance by running tasks at a higher OPP than the
minimum required to meet their workload demands.
To support task performance boosting, the scheduler should provide a
"knob" that tunes how much the system is optimised
for energy efficiency vs. performance.
This patch is the first of a series which provides a simple interface to
define a tuning knob. One system-wide "boost" tunable is exposed via:
/proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100], to define a percentage
where:
- 0% boost requests operation in "standard" mode, scheduling
tasks at the minimum capacities required by the workload demand
- 100% boost requests maximum task performance,
"regardless" of the incurred energy consumption
A boost value in between these two boundaries is used to bias the
power/performance trade-off: the higher the boost value, the more the
scheduler is biased toward performance boosting instead of energy
efficiency.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
|
|
Do the aggregation for frequency only when the total group busy time
is above sched_freq_aggregate_threshold. This filtering is especially
needed for the cases where groups are created by including all threads
of an application process. This knob can be tuned to apply aggregation
only for the heavy workload applications.
When this knob is enabled and load is aggregated, the load is not
clipped to 100% @ current frequency to ramp up the frequency faster.
Change-Id: Icfd91c85938def101a989af3597d3dcaa8026d16
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
When sysctl_sched_enable_thread_grouping is set to 1, any new tasks
created are put in the same group as their group leader.
Change-Id: If1837dd7c8120c8b097cfffa1dc52eb4781f1641
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
CONFIG_SCHED_FREQ_INPUT was created to keep parts of the scheduler
dealing with frequency separate from other parts of the scheduler
that deal with task placement. However, over time the two features
have become intricately linked whereby SCHED_FREQ_INPUT cannot be
turned on without having SCHED_HMP turned on as well. Given this
complex inter-dependency and the fact that all old, existing and
future targets use both config options, remove this unnecessary
feature separation. It will aid in making kernel upgrades a lot
simpler and faster.
Change-Id: Ia20e40d8a088d50909cc28f5be758fa3e9a4af6f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
|
|
Migration notifiers were created to help the CPU-boost driver manage
CPU frequencies when tasks migrate from one CPU to another. Over time
with the evolution of scheduler guided frequency, the scheduler now
directly manages load when tasks migrate. Consequently the CPU-boost
driver no longer makes use of this information. Remove unused code
pertaining to this feature.
Change-Id: I3529e4356e15e342a5fcfbcf3654396752a1d7cd
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
|
|
Schedules on a core whose irq count is less than a threshold.
Improves I/O performance of EAS.
Change-Id: I08ff7dd0d22502a0106fc636b1af2e6fe9e758b5
|
|
Use a window-based view of time in order to track task
demand and CPU utilization in the scheduler.
Window Assisted Load Tracking (WALT) implementation credits:
Srivatsa Vaddagiri, Steve Muckle, Syed Rameez Mustafa, Joonwoo Park,
Pavan Kumar Kondeti, Olav Haugan
2016-03-06: Integration with EAS/refactoring by Vikram Mulukutla
and Todd Kjos
Change-Id: I21408236836625d4e7d7de1843d20ed5ff36c708
Includes fixes for issues:
eas/walt: Use walt_ktime_clock() instead of ktime_get_ns() to avoid a
race resulting in watchdog resets
BUG: 29353986
Change-Id: Ic1820e22a136f7c7ebd6f42e15f14d470f6bbbdb
Handle walt accounting anomaly during resume
During resume, there is a corner case where on wakeup, a task's
prev_runnable_sum can go negative. This is a workaround that
fixes the condition and warns (instead of crashing).
BUG: 29464099
Change-Id: I173e7874324b31a3584435530281708145773508
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
[jstultz: fwdported to 4.4]
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
The choice of initial task load upon fork has a large influence
on CPU and OPP selection when scheduler-driven DVFS is in use.
Make this tuneable by adding a new sysctl "sched_initial_task_util".
If the sched governor is not used, the default remains at SCHED_LOAD_SCALE.
Otherwise, the value from the sysctl is used. This defaults to 0.
Signed-off-by: "Todd Kjos <tkjos@google.com>"
|
|
EAS assumes that clusters with smaller capacity cores are more
energy-efficient. This may not be true on non-big-little devices,
so EAS can make incorrect cluster selections when finding a CPU
to wake. The "sched_is_big_little" hint can be used to cause a
cpu-based selection instead of cluster-based selection.
This change incorporates the addition of the sync hint enable patch.
EAS did not honour synchronous wakeup hints; a new sysctl is
created to ask EAS to use this information when selecting a CPU.
The control is called "sched_sync_hint_enable".
Also contains:
EAS: sched/fair: for SMP bias toward idle core with capacity
For SMP devices, on wakeup bias towards idle cores that have capacity
vs busy devices that need a higher OPP
eas: favor idle cpus for boosted tasks
BUG: 29533997
BUG: 29512132
Change-Id: I0cc9a1b1b88fb52916f18bf2d25715bdc3634f9c
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
eas/sched/fair: Favoring busy cpus with low OPPs
BUG: 29533997
BUG: 29512132
Change-Id: I9305b3239698d64278db715a2e277ea0bb4ece79
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
|
|
Introduce a new sysctl for this option, 'sched_cstate_aware'.
When this is enabled, select_idle_sibling in CFS is modified to
choose the idle CPU in the sibling group which has the lowest
idle state index - idle state indexes are assumed to increase
as sleep depth and hence wakeup latency increase. In this way,
we attempt to minimise wakeup latency when an idle CPU is
required.
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
Includes:
sched: EAS: fix select_idle_sibling
When sysctl_sched_cstate_aware is enabled, the best_idle CPU will not be
chosen in the original flow because it jumps to the done label directly.
Bug: 30107557
Change-Id: Ie09c2e3960cafbb976f8d472747faefab3b4d6ac
Signed-off-by: martin_liu <martin_liu@htc.com>
|
|
This reverts commit 8f90803a45d3aa349 ("sched: warn/panic upon excessive
scheduling latency") as this feature is no longer used.
Change-Id: I200d0e9e8dad5047522cd02a68de25d4a70a91a4
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
|
This reverts commit b40bf941f61756bcc ("sched: add scheduling latency
tracking procfs node") as this feature is no longer used.
Change-Id: I5de789b6349e6ea78ae3725af2a3ffa72b7b7f11
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
|
This has always been an unused feature given its limitation of adding
phantom load to the system. Since there are no immediate plans of
using this and the fact that it adds unnecessary complications to
the new load fixup mechanism, remove this feature for now. It can
be revisited later in light of the new mechanism.
Change-Id: Ie9501a898d0f423338293a8dde6bc56f493f1e75
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
|
Kill unused scheduler knob sched_migration_fixup. With this change, the
scheduler always adjusts the CPU's busy time during migration.
Change-Id: I5d59e89d5cc0f2c705c40036cd7b47f5d3f89e58
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
|
Kill unused scheduler knob sched_upmigrate_min_nice.
Change-Id: I53ddfde39c78e78306bd746c1c4da9a94ec67cd8
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
|
Kill unused scheduler knob and parameter sched_enable_power_aware. The HMP
scheduler always takes power cost into account when placing tasks.
Change-Id: Ib26a21df9b903baac26c026862b0a41b4a8834f3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
|
Kill unused scheduler knob sched_freq_account_wait_time.
Change-Id: Ib74123ebd69dfa3f86cf7335099f50c12a6e93c3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
|
Kill unused scheduler knob sched_account_wait_time. With this change, the
scheduler always accounts a task's wait time into demand.
Change-Id: Ifa4bcb5685798f48fd020f3d0c9853220b3f5fdc
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
|
Related threads in a group could execute on different CPUs and hence
present a split-demand picture to cpufreq governor. IOW the governor
fails to see the net cpu demand of all related threads in a given
window if the threads' execution were to be split across CPUs. That
could result in sub-optimal frequency chosen in comparison to the
ideal frequency at which the aggregate work (taken up by related
threads) needs to be run.
This patch aggregates cpu execution stats in a window for all related
threads in a group. This helps present cpu busy time to governor as if
all related threads were part of the same thread and thus help select
the right frequency required by related threads. This aggregation
is done per-cluster.
Change-Id: I71e6047620066323721c6d542034ddd4b2950e7f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: Fixed notify_migration() to hold rcu read
lock as this version of Linux doesn't hold p->pi_lock when the
function gets called while keeping use of rcu_access_pointer() since
we never dereference return value.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
|
The current (CFS) scheduler implementation does not allow boosting
task performance by running tasks at a higher OPP than the
minimum required to meet their workload demands.
To support task performance boosting, the scheduler should provide a
"knob" that tunes how much the system is optimised
for energy efficiency vs. performance.
This patch is the first of a series which provides a simple interface to
define a tuning knob. One system-wide "boost" tunable is exposed via:
/proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100], to define a percentage
where:
- 0% boost requests operation in "standard" mode, scheduling
tasks at the minimum capacities required by the workload demand
- 100% boost requests maximum task performance,
"regardless" of the incurred energy consumption
A boost value in between these two boundaries is used to bias the
power/performance trade-off: the higher the boost value, the more the
scheduler is biased toward performance boosting instead of energy
efficiency.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
|
|
Biasing a sync wakee task towards the waker CPU's cluster makes sense when
the waker's demand is high enough that the wakee can also take advantage
of the high CPU frequency voted because of the waker's load. Placing a sync
wakee on a low-demand waker's CPU can lead to placement imbalance, which can
lead to unnecessary migration.
Introduce a new tunable "sched_big_waker_task_load" that defines the big
waker, so that the scheduler avoids biasing the wakee towards the waker's
cluster when the waker's load is below the tunable.
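Combined with the sched_small_wakee_task_load tunable from the related change, the bias decision can be sketched as a single predicate. The function name and parameter names are hypothetical, and the exact comparison operators are an assumption.

```c
#include <stdbool.h>

/*
 * Decide whether a wakee should be biased towards the waker's
 * cluster. The bias applies only to sync wakeups where the waker is
 * "big" (load not below big_waker_task_load) and the wakee is
 * "small" (load below small_wakee_task_load).
 */
bool bias_to_waker_cluster_sketch(bool sync,
				  unsigned int waker_load,
				  unsigned int wakee_load,
				  unsigned int big_waker_task_load,
				  unsigned int small_wakee_task_load)
{
	return sync &&
	       waker_load >= big_waker_task_load &&
	       wakee_load < small_wakee_task_load;
}
```

Raising the big-waker threshold narrows the set of wakeups that receive the bias, which is the effect this tunable is meant to provide.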
CRs-fixed: 971295
Change-Id: I1550ede0a71ac8c9be74a7daabe164c6a269a3fb
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[joonwoop@codeaurora.org: fixed a minor conflict in
include/linux/sched/sysctl.h.]
|
|
If a sync wakee task's demand is small, it is worth placing the wakee task
on the waker's cluster for better performance: the waker and wakee are
correlated, so the wakee should take advantage of the waker cluster's
frequency, which is voted by the waker, along with the cache locality benefit.
While biasing towards the waker's cluster we want to avoid the waker CPU
as much as possible, as placing the wakee on the waker's CPU can cause the
waker to be preempted and migrated by the load balancer.
Introduce a new tunable 'sched_small_wakee_task_load' that identifies
eligible small wakee tasks and places them on the waker's
cluster.
CRs-fixed: 971295
Change-Id: I96897d9a72a6f63dca4986d9219c2058cd5a7916
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[joonwoop@codeaurora.org: fixed a minor conflict in
include/linux/sched/sysctl.h.]
|
|
Current window-based load tracking only saves history for five
windows. A historically heavy task's heavy load will be completely
forgotten after five windows of light load. Even before the five
windows expire, a heavy task that wakes up on the same CPU it used to
run on won't trigger any frequency change until the end of the window.
It would starve for the entire window. It also adds one "small" load
window to history because it's accumulating load at a low frequency,
further reducing the tracked load for this heavy task.
Ideally, the scheduler should be able to identify such tasks and notify
the governor to increase frequency immediately after they wake up.
Add a histogram for each task to track a much longer load history. A
prediction will be made based on runtime of previous or current
window, histogram data and load tracked in recent windows. The predictions
of all tasks that are currently running or runnable on a CPU are
aggregated and reported to the CPUFreq governor in sched_get_cpus_busy().
sched_get_cpus_busy() now returns predicted busy time in addition
to previous window busy time and new task busy time, scaled to
the CPU maximum possible frequency.
Tunables:
- /proc/sys/kernel/sched_gov_alert_freq (kHz)
  This tunable can be used to further filter the notifications.
  Frequency alert notification is sent only when the predicted
  load exceeds previous window load by sched_gov_alert_freq converted to
  load.
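The filtering condition can be sketched as below. How the frequency is converted to load is not spelled out above, so the conversion here (busy time accumulated over one window at that frequency, scaled to the maximum possible frequency) is an assumption, as are all function and parameter names.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed conversion of a frequency to a load value: the share of one
 * window that corresponds to freq_khz out of max_freq_khz.
 */
uint64_t freq_to_load_sketch(uint32_t freq_khz, uint32_t max_freq_khz,
			     uint64_t window_ns)
{
	if (max_freq_khz == 0)
		return 0;
	return window_ns * freq_khz / max_freq_khz;
}

/*
 * Send an alert only when the predicted load exceeds the previous
 * window's load by more than the converted alert threshold.
 */
bool should_send_freq_alert_sketch(uint64_t pred_load, uint64_t prev_load,
				   uint64_t alert_load)
{
	return pred_load > prev_load + alert_load;
}
```

A larger sched_gov_alert_freq value raises the threshold, so fewer notifications reach the governor.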
Change-Id: If29098cd2c5499163ceaff18668639db76ee8504
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts around __migrate_task()
and removed changes for CONFIG_SCHED_QHMP.]
|
|
The frequency based inter cluster load balance restrictions are not
reliable as frequency does not provide a good estimate of the CPU's
current load. Replace them with the spill_load and spill_nr_run
based checks.
The higher capacity cluster is restricted from pulling the tasks from
the lower capacity cluster unless all of the lower capacity CPUs are
above spill. This behavior can be controlled by a sysctl tunable and
it is disabled by default (i.e. no load balance restrictions).
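The restriction can be sketched as a check over the lower capacity cluster's CPUs. The struct, the function names, and the reading of "above spill" as "either limit exceeded" are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot of a lower capacity cluster CPU. */
struct lower_cpu_sketch {
	unsigned long load;
	unsigned int nr_running;
};

/* Assumed meaning of "above spill": either limit is exceeded. */
static bool above_spill_sketch(const struct lower_cpu_sketch *c,
			       unsigned long spill_load,
			       unsigned int spill_nr_run)
{
	return c->load > spill_load || c->nr_running > spill_nr_run;
}

/*
 * With the restriction enabled, the higher capacity cluster may pull
 * only when every lower capacity CPU is above spill. With it disabled
 * (the default), pulling is always allowed.
 */
bool higher_cluster_may_pull_sketch(const struct lower_cpu_sketch *cpus,
				    size_t n, unsigned long spill_load,
				    unsigned int spill_nr_run,
				    bool restrict_enabled)
{
	if (!restrict_enabled)
		return true;

	for (size_t i = 0; i < n; i++)
		if (!above_spill_sketch(&cpus[i], spill_load, spill_nr_run))
			return false;
	return true;
}
```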
Change-Id: I45c09c8adcb61a8a7d4e08beadf2f97f1805fb42
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes
for CONFIG_SCHED_QHMP.]
|
|
Provide userspace interface for tasks to be grouped together as
"related" threads. For example, all threads involved in updating
display buffer could be tagged as related.
Scheduler will attempt to provide special treatment for group of
related threads such as:
1) Colocation of related threads in same "preferred" cluster
2) Aggregation of demand towards determination of cluster frequency
This patch extends scheduler to provide best-effort colocation support
for a group of related threads.
Change-Id: Ic2cd769faf5da4d03a8f3cb0ada6224d0101a5f5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor merge conflicts. removed ifdefry
for CONFIG_SCHED_QHMP.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|
|
Make use of clusters in the fair and rt scheduling classes. This is
needed as the freq domain mask can no longer be used to do correct
task placement. The freq domain mask was being used to demarcate
clusters.
Change-Id: I57f74147c7006f22d6760256926c10fd0bf50cbd
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes
for CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
|