At present, in order to estimate the power cost of CPU load, the HMP scheduler
converts CPU load to the corresponding frequency on the fly, which can be
avoided.
Optimize and reduce the execution time of select_best_cpu() by precomputing
the CPU load to frequency conversion. This optimization reduces the execution
time of select_best_cpu() by about 20% on average.
Change-Id: I385c57f2ea9a50883b76ba6ca3deb673b827217f
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/sched.h.
stripped out code for CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
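As a rough illustration of the idea (not the actual patch), the conversion can be computed once whenever the CPU's frequency limits change and cached, so the wakeup path only performs a table lookup. The names below (load_to_freq, NUM_LOAD_BUCKETS, precompute_load_to_freq) are hypothetical:

    #define NUM_LOAD_BUCKETS 128

    static unsigned int load_to_freq[NUM_LOAD_BUCKETS];

    /* Rebuild the cache whenever the CPU's frequency table or max freq changes. */
    static void precompute_load_to_freq(unsigned int cpu_max_freq_khz)
    {
        int i;

        for (i = 0; i < NUM_LOAD_BUCKETS; i++)
            load_to_freq[i] = cpu_max_freq_khz * i / (NUM_LOAD_BUCKETS - 1);
    }

    /* Hot path in select_best_cpu(): O(1) lookup instead of per-candidate arithmetic. */
    static inline unsigned int load_bucket_to_freq(unsigned int bucket)
    {
        if (bucket >= NUM_LOAD_BUCKETS)
            bucket = NUM_LOAD_BUCKETS - 1;
        return load_to_freq[bucket];
    }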
Commit 392edf4969d20 ("sched: avoid stale cumulative_runnable_avg
HMP statistics") introduced the callback function fixup_hmp_sched_stats()
so that update_history() can avoid a decrement and increment pair of HMP
stats. However, the commit also made the fixup function do an obscure
p->ravg.demand update, which isn't the cleanest approach.
Revise fixup_hmp_sched_stats() so that the caller can update
p->ravg.demand directly.
Change-Id: Id54667d306495d2109c26362813f80f08a1385ad
[joonwoop@codeaurora.org: stripped out CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Account the amount of load contributed by new tasks within CPU load so that
the governor can apply a different policy when a CPU is loaded by new tasks.
To be able to distinguish new task load, a new tunable,
sched_new_task_windows, is also introduced. The tunable defines tasks as new
when they have been active for fewer than the configured number of windows.
Change-Id: I2e2e62e4103882f7362154b792ab978b181b9f59
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
[joonwoop@codeaurora.org: omitted changes for
drivers/cpufreq/cpufreq_interactive.c. cpufreq changes need to be
applied separately later. fixed conflict in include/linux/sched.h and
include/linux/sched/sysctl.h. omitted changes for qhmp_core.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
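A minimal sketch of the intent, assuming a per-task count of active windows; the field names and the default value are hypothetical, not the actual implementation:

    static unsigned int sched_new_task_windows = 5;     /* hypothetical default */

    struct task_window_stats {
        unsigned int active_windows;    /* windows this task has been active */
        unsigned long long demand;      /* windowed demand of the task */
    };

    static int is_new_task(const struct task_window_stats *ts)
    {
        return ts->active_windows < sched_new_task_windows;
    }

    /* Aggregate per-CPU load, keeping the new-task contribution separate. */
    static void account_task_load(const struct task_window_stats *ts,
                                  unsigned long long *cpu_load,
                                  unsigned long long *new_task_load)
    {
        *cpu_load += ts->demand;
        if (is_new_task(ts))
            *new_task_load += ts->demand;
    }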
Add a per-cpu tunable to set the extra cost of using a CPU that is idle.
Add the same for a cluster.
Change-Id: I4aa53f3c42c963df7abc7480980f747f0413d389
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop@codeaurora.org: omitted changes for qhmp*.[c,h]. stripped out
CONFIG_SCHED_QHMP in drivers/base/cpu.c and include/linux/sched.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Add a new API to the scheduler to allow the low power mode driver to inform
the scheduler about the D-state of a cluster. This can be leveraged by
the scheduler to make an informed decision about the cost of placing a task
on a cluster.
Change-Id: If0fe0fdba7acad1c2eb73654ebccfdb421225e62
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop@codeaurora.org: omitted fixes for qhmp_core.c and qhmp_core.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
At present the HMP scheduler packs tasks onto a busy CPU until the CPU's load
reaches 100%, to avoid waking up an idle CPU as much as possible. Such
aggressive packing leads to unintended CPU frequency increases, as the
governor raises the busy CPU's frequency when its load exceeds the configured
frequency max load, which can be less than 100%.
Fix this by taking the governor's frequency max load into account and packing
tasks only when the CPU's projected load is less than that max load, to avoid
unnecessary frequency increases.
Change-Id: I4447e5e0c2fa5214ae7a9128f04fd7585ed0dcac
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/sched.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
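The packing condition reduces to a simple comparison. A minimal sketch, assuming loads expressed as a percent of the window and hypothetical names:

    /* Pack only if the projected load stays below the governor's max load. */
    static int can_pack_task(unsigned int cpu_load_pct,
                             unsigned int task_load_pct,
                             unsigned int freq_max_load_pct)
    {
        return cpu_load_pct + task_load_pct < freq_max_load_pct;
    }

For example, with a max load of 90%, a CPU at 70% would still accept a 15% task but not a 25% one.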
Add zone awareness to the load balancer. Remove all earlier restrictions
that the load balancer had for inter cluster kicks and migration.
Change-Id: I12ad3d0c2d2e9bb498f49a231810f2ad418b061f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in nohz_kick_needed() due
to its return type change.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
For the fair sched class, update the select_best_cpu() policy to do
power-based placement. The hope is to minimize the voltage at which
the CPU runs.
While RT tasks already do power-based placement, their placement
preference now has to take into account the power cost of all tasks
on a given CPU. Also remove the check for sched_boost, since
sched_boost no longer intends to elevate all tasks to the highest
capacity cluster.
Change-Id: Ic6a7625c97d567254d93b94cec3174a91727cb87
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
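In outline, power-based placement scans the candidate CPUs and keeps the one whose estimated power cost, including the waking task's load, is lowest. The power_cost() callback and its signature below are assumptions used only to illustrate the loop:

    static int select_lowest_power_cpu(const int *cpus, const unsigned int *cpu_load_pct,
                                       int ncpus, unsigned int task_load_pct,
                                       unsigned int (*power_cost)(int cpu, unsigned int load_pct))
    {
        int i, best_cpu = -1;
        unsigned int best_cost = ~0U;

        for (i = 0; i < ncpus; i++) {
            unsigned int cost = power_cost(cpus[i], cpu_load_pct[i] + task_load_pct);

            if (cost < best_cost) {
                best_cost = cost;
                best_cpu = cpus[i];
            }
        }
        return best_cpu;
    }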
Task packing will now be determined solely on the basis of the
power cost of task placement. All tasks are eligible for packing.
Remove the notion of "small" tasks from the scheduler.
Change-Id: I72d52d04b2677c6a8d0bc6aa7d50ff0f1a4f5ebb
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
When a new window starts for a task and the task is on a rq, the scheduler
momentarily decreases the rq's cumulative_runnable_avg, re-accounts the
task's demand and then increases the rq's cumulative_runnable_avg with the
newly accounted demand. Therefore there is a short period during which the
rq's cumulative_runnable_avg is less than what it's supposed to be.
Meanwhile, another CPU may be searching for the best CPU to place a task and
make a suboptimal decision based on the momentarily stale
cumulative_runnable_avg.
Fix this by adding or subtracting only the delta between the task's old and
new demand, instead of decrementing and incrementing the entire task's load.
Change-Id: I3c9329961e6f96e269fa13359e7d1c39c4973ff2
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
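A sketch of the fix with hypothetical structure and helper names: apply the signed difference in one step, so readers of the aggregate never see the task's load momentarily removed.

    struct hmp_stats_sketch {
        unsigned long long cumulative_runnable_avg;
    };

    static void fixup_cumulative_runnable_avg(struct hmp_stats_sketch *stats,
                                              unsigned long long old_demand,
                                              unsigned long long new_demand)
    {
        long long delta = (long long)new_demand - (long long)old_demand;

        /* Single update: no window where the task's contribution is missing. */
        stats->cumulative_runnable_avg += delta;
    }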
Currently RT tasks prefer to go to the lowest power CPU in the
system. This can end up causing contention on the lowest power
CPU. Instead ensure that RT tasks end up on the lowest power
cluster and the least loaded CPU within that cluster.
Change-Id: I363b3d43236924962c67d2fb5d3d2d09800cd994
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Inline the relatively small and frequently used function scale_load_to_cpu().
CRs-fixed: 849655
Change-Id: Id5f60595c394959d78e6da4cc4c18c338fec285b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
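For illustration only: a small, hot helper like this is a natural candidate for a static inline in a header. The fixed-point scaling formula shown (task load scaled by the CPU's load scale factor, with 1024 as unity) is an assumption, not the verbatim kernel code.

    static inline unsigned long long scale_load_to_cpu_sketch(unsigned long long load,
                                                              unsigned int load_scale_factor)
    {
        return (load * load_scale_factor) >> 10;    /* 1024 == unity scale */
    }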
select_best_cpu() is a crucial wakeup routine that determines the
time taken by the scheduler to wake up a task. Optimize this routine
to get higher performance. The following changes have been made as
part of the optimization, listed in the order in which they build on
one another:
* Several routines called by select_best_cpu() recalculate task load
and CPU load even though these are already known quantities. For
example mostly_idle_cpu_sync() calculates CPU load; task_will_fit()
calculates task load before spill_threshold_crossed() recalculates
both. Remove these redundant calculations by moving the task load
and CPU load computations into the select_best_cpu() 'for' loop and
passing the results to any functions that need the information.
* Rewrite best_small_task_cpu() to avoid the existing two-pass
approach. The two-pass approach was only in place to find the
minimum power cluster for small task placement. This information
can easily be established by looking at runqueue capacities: any
cluster that does not have the highest capacity constitutes the
minimum power cluster. A special CPU mask, mpc_mask, is required to
safeguard against undue side effects on SMP systems. Also terminate
the function early if the previous CPU is found to be mostly_idle.
* Reorganize code to ensure that no unnecessary computations or
variable assignments are done. For example there is no need to
compute CPU load if that information does not end up getting used
in any iteration of the 'for' loop.
* The tick logic for EA migrations unnecessarily checks the power
of all CPUs only for skip_cpu() to throw away the result later.
Ensure that for EA we only check CPUs within the same cluster
and avoid running select_best_cpu() whenever possible.
CRs-fixed: 849655
Change-Id: I4e722912fcf3fe4e365a826d4d92a4dd45c05ef3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed cpufreq_notifier_policy() to set mpc_mask.
added a comment about prerequisite of lower_power_cpu_available().
s/struct rq * rq/struct rq *rq/. s/TASK_NICE/task_nice/]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
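The first and third points amount to computing the task load once, and each candidate CPU's load at most once and only when needed, then passing the cached values down. A schematic of the reorganized loop; the helper names follow the commit message but their signatures here are assumptions:

    static int select_best_cpu_sketch(int ncpus, unsigned int task_load,
                                      unsigned int (*cpu_load)(int cpu),
                                      int (*task_will_fit)(unsigned int task_load, int cpu),
                                      int (*spill_threshold_crossed)(unsigned int task_load,
                                                                     unsigned int cpu_load,
                                                                     int cpu))
    {
        int cpu, best_cpu = -1;

        for (cpu = 0; cpu < ncpus; cpu++) {
            unsigned int load;

            if (!task_will_fit(task_load, cpu))
                continue;
            load = cpu_load(cpu);       /* computed once, and only when needed */
            if (spill_threshold_crossed(task_load, load, cpu))
                continue;
            best_cpu = cpu;             /* selection among fitting CPUs elided */
        }
        return best_cpu;
    }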
The busy time of CPUs is adjusted during task migrations. This can
result in reporting a load greater than 100% to the governor and causes
direct jumps to higher frequencies during intra-cluster migrations.
Hence clip the load to 100% when reporting it at the end of the window.
The load is not clipped for load alert notifications, which allows
ramping up the frequency faster for inter-cluster migrations and heavy
task wakeup scenarios.
Change-Id: I7347260aa476287ecfc706d4dd0877f4b75a1089
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
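The reporting rule can be pictured as a small helper; the percent-of-window representation and the names are assumptions:

    static unsigned int reported_load_pct(unsigned long long busy_ns,
                                          unsigned long long window_ns,
                                          int is_load_alert)
    {
        unsigned long long pct = busy_ns * 100 / window_ns;

        /* Clip only for end-of-window reporting, not for load alerts. */
        if (!is_load_alert && pct > 100)
            pct = 100;
        return (unsigned int)pct;
    }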
This reverts commit 0e2092e47488 ("sched: Use only partial wait time as
task demand") as it causes performance regression.
Change-Id: I3917858be98530807c479fc31eb76c0f22b4ea89
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
The load scale factor of a CPU gets boosted when its max freq
is restricted. A task's load at the same frequency is scaled higher
than normal under this scenario. This results in tasks migrating
early to the better capacity CPUs, and their residency there also
increases as their inflated load would be relatively higher than
the downmigrate threshold.
Auto-adjust the upmigrate and downmigrate thresholds by a factor
equal to rq->max_possible_freq/rq->max_freq of a lower capacity CPU.
If the adjusted upmigrate threshold exceeds the window size, it is
clipped to the window size. If the adjusted downmigrate threshold
decreases the difference between the upmigrate and downmigrate
thresholds, it is clipped to a value such that the difference between
the modified and the original thresholds is the same.
Change-Id: Ifa70ee5d4ca5fe02789093c7f070c77629907f04
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
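A sketch of the adjustment under the stated rules, assuming thresholds expressed as a percent of the window; the names are hypothetical:

    static void adjust_migration_thresholds(unsigned int max_possible_freq,
                                            unsigned int max_freq,
                                            unsigned int window_pct,
                                            unsigned int *upmigrate,
                                            unsigned int *downmigrate)
    {
        unsigned int orig_gap = *upmigrate - *downmigrate;
        unsigned int up = *upmigrate * max_possible_freq / max_freq;
        unsigned int down = *downmigrate * max_possible_freq / max_freq;

        if (up > window_pct)
            up = window_pct;                                /* clip to the window size */
        if (down + orig_gap > up)
            down = (up > orig_gap) ? up - orig_gap : 0;     /* keep the original gap */

        *upmigrate = up;
        *downmigrate = down;
    }

For example, with max_possible_freq/max_freq = 2 and an 80/60 pair in a 100% window, the scaled 160/120 is clipped to 100/80, preserving the original gap of 20.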
The scheduler currently either considers a task's entire wait time as
task demand or completely ignores wait time, based on the tunable
sched_account_wait_time. Both approaches have their limitations,
however. The former artificially boosts a task's demand when it may not
actually be justified. With the latter, the scheduler runs the risk
of never being able to recognize true load (consider two CPU hogs on
a single little CPU). To achieve a compromise between these two
extremes, change the load tracking algorithm to only consider part of
a task's wait time as its demand. The portion of wait time accounted
as demand is determined by each task's percent load, i.e. for a task
that waits for 10 ms and has 60% task load, only 6 ms of the wait will
contribute to task demand. This approach is fairer, as the scheduler
now tries to determine how much of its wait time a task would actually
have been using the CPU if it had been executing. It ensures that tasks
with high demand continue to see most of the benefits of accounting
wait time as busy time, while lower-demand tasks don't experience a
disproportionately high boost to demand that triggers unjustified big
CPU usage. Note that this new approach is only applicable to wait time
being considered as task demand and not wait time considered as CPU
busy time.
To achieve the above effect, ensure that any time a task is waiting, its
runtime in every relevant window segment is appropriately adjusted using
its percent load.
Change-Id: I6a698d6cb1adeca49113c3499029b422daf7871f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
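The accounting rule is a proportional scaling of the wait interval. A minimal sketch (names assumed):

    /* Portion of a wait interval that is charged as task demand. */
    static unsigned long long wait_time_as_demand(unsigned long long wait_ns,
                                                  unsigned int task_load_pct)
    {
        return wait_ns * task_load_pct / 100;
    }

For instance, wait_time_as_demand(10000000ULL, 60) returns 6000000, i.e. 6 ms of a 10 ms wait for a 60% task, matching the example above.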
When an entire cluster is hotplugged, the scheduler's notion of
max_capacity can get outdated. This introduces the following
inefficiencies in behavior:
* task_will_fit() does not return true for all tasks. Consequently
all big tasks go through the fallback CPU selection logic, skipping
C-state and power checks in select_best_cpu().
* During boost, migration_needed() returns true unnecessarily,
causing an avoidable rerun of select_best_cpu().
* An unnecessary kick is sent to all little CPUs when boost is set.
* An opportunity for early bailout from nohz_kick_needed() is lost.
Start handling CPUFREQ_REMOVE_POLICY in the policy notifier callback
which indicates the last CPU in a cluster being hotplugged out. Also
modify update_min_max_capacity() to only iterate through online CPUs
instead of possible CPUs. While we can't guarantee the integrity of
the cpu_online_mask in the notifier callback, the scheduler will fix
up all state soon after any changes to the online mask.
The change does have one side effect; early termination from the
notifier callback when min_max_freq or max_possible_freq remain
unchanged is no longer possible. This is because when the last CPU
in a cluster is hot removed, only max_capacity is updated without
affecting min_max_freq or max_possible_freq. Therefore, when the
first CPU in the same cluster gets hot added at a later point
max_capacity must once again be recomputed despite there being no
change in min_max_freq or max_possible_freq.
Change-Id: I9a1256b5c2cd6fcddd85b069faf5e2ace177e122
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
It may be desirable to discourage upmigration of tasks belonging to
some cgroups. Add a per-cgroup flag (upmigrate_discourage) that
discourages upmigration of tasks of a cgroup. Tasks of the cgroup are
allowed to upmigrate only under overcommitted scenarios.
Change-Id: I1780e420af1b6865c5332fb55ee1ee408b74d8ce
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Use new cgroup APIs]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Extend sched_get_nr_running_avg() API to return average nr_big_tasks,
in addition to average nr_running and average nr_io_wait tasks. Also
add a new trace point to record values returned by
sched_get_nr_running_avg() API.
Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
sched_get_nr_running_avg() returns average nr_running and nr_iowait
task count since it was last invoked. Fix several bugs in their
calculation.
* sched_update_nr_prod() needs to consider that nr_running count can
change by more than 1 when CFS_BANDWIDTH feature is used
* sched_get_nr_running_avg() needs to sum up nr_iowait count across
all cpus, rather than just one
* sched_get_nr_running_avg() could race with sched_update_nr_prod(),
as a result of which it could use curr_time which is behind a cpu's
'last_time' value. That would lead to erroneous calculation of
average nr_running or nr_iowait.
While at it, also fix a bug in the BUG_ON() check in the
sched_update_nr_prod() function and remove the unnecessary nr_running
argument to sched_update_nr_prod().
Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
The CFS_BANDWIDTH feature is not currently well supported by the HMP
scheduler. Issues encountered include a kernel panic when the
rq->nr_big_tasks count becomes negative. This patch fixes the HMP
scheduler code to better handle the CFS_BANDWIDTH feature. The most
prominent change introduced is maintenance of HMP stats (nr_big_tasks,
nr_small_tasks, cumulative_runnable_avg) per 'struct cfs_rq' in
addition to being maintained in each 'struct rq'. This allows HMP
stats to be updated easily when a group is throttled on a cpu.
Change-Id: Iad9f378b79ab5d9d76f86d1775913cc1941e266a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in dequeue_task_fair().]
Key HMP stats (nr_big_tasks, nr_small_tasks and
cumulative_runnable_average) are currently maintained per-cpu in
'struct rq'. Merge those stats into their own structure (struct
hmp_sched_stats) and modify the impacted functions to deal with the
newly introduced structure. This cleanup is required for a subsequent
patch which fixes various issues with use of the CFS_BANDWIDTH feature
in the HMP scheduler.
Change-Id: Ieffc10a3b82a102f561331bc385d042c15a33998
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in __update_load_avg().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
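Schematically, the grouping looks like the following (field types are assumptions), so the same aggregate can later be embedded in 'struct cfs_rq' as well and adjusted in one step when a group is throttled:

    struct hmp_sched_stats_sketch {
        int nr_big_tasks;
        int nr_small_tasks;
        unsigned long long cumulative_runnable_avg;
    };

    struct rq_sketch {
        /* One aggregate to adjust instead of three scattered fields. */
        struct hmp_sched_stats_sketch hmp_stats;
    };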
Add code to calculate the run queue depth of a cpu and the iowait
depth of the cpu.
The scheduler calls into sched_update_nr_prod whenever there
is a runqueue change. This function maintains the runqueue average
and the iowait of that cpu over that time interval.
Whoever wants to know the runqueue average is expected to call
sched_get_nr_running_avg periodically to get the accumulated
runqueue and iowait averages for all the cpus.
Change-Id: Id8cb2ecf0ed479f090a83ccb72dd59c53fa73e0c
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
(cherry picked from commit 0299fcaaad80e2c0ac9aa583c95107f6edc27750)
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
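In outline this is a time-weighted average: every runqueue change closes the previous interval, and the periodic reader divides the accumulated product by the elapsed time. The real code does this per cpu under a lock and also tracks iowait; the simplified, single-metric version below uses assumed names:

    struct nr_prod_stats {
        unsigned long long window_start_ns;     /* start of current sampling window */
        unsigned long long last_time_ns;        /* last runqueue-change timestamp */
        unsigned long long weighted_sum;        /* sum of nr_running * dt */
        unsigned int nr_running;                /* current runqueue depth */
    };

    /* Called on every runqueue change. */
    static void update_nr_prod(struct nr_prod_stats *s, unsigned long long now_ns,
                               unsigned int new_nr_running)
    {
        s->weighted_sum += (unsigned long long)s->nr_running * (now_ns - s->last_time_ns);
        s->last_time_ns = now_ns;
        s->nr_running = new_nr_running;
    }

    /* Called periodically by the consumer; returns the average and resets the window. */
    static unsigned int get_nr_running_avg(struct nr_prod_stats *s, unsigned long long now_ns)
    {
        unsigned long long period = now_ns - s->window_start_ns;
        unsigned int avg;

        update_nr_prod(s, now_ns, s->nr_running);       /* close the open interval */
        avg = period ? (unsigned int)(s->weighted_sum / period) : 0;
        s->weighted_sum = 0;
        s->window_start_ns = now_ns;
        return avg;
    }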
Remove the global sysctl_sched_prefer_idle flag and replace it with a
per-cpu prefer_idle flag. The per-cpu flag is expected to be the same
for all cpus in a cluster. It thus provides a convenient means to
disable packing in one cluster while allowing packing in another
cluster.
Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Add sysctl to enable energy awareness at runtime. This is useful for
performance/power tuning/measurements and debugging. In addition this
will match up with the Documentation/scheduler/sched-hmp.txt documentation.
Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
If an irq is raised while sched_irqload() is calculating the irqload
delta, sched_account_irqtime() can update the rq's irqload_ts to a value
greater than the jiffies stored in sched_irqload()'s context, so the
delta can be negative. A negative delta simply means there was a recent
irq occurrence, so remove the improper BUG_ON().
CRs-fixed: 771894
Change-Id: I5bb01b50ec84c14bf9f26dd9c95de82ec2cd19b5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Add the current CPU temperature to the sched_cpu_load trace point.
This will allow us to track the CPU temperature.
CRs-Fixed: 764788
Change-Id: Ib2e3559bbbe3fe07a6b7c8115db606828bc36254
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
It may be desirable to be able to alter the sched_cpu_high_irqload
setting easily, so make it a runtime tunable value.
Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
CPUs with significant IRQ activity will not be able to serve tasks
quickly. Avoid them if possible by disqualifying such CPUs from
being recognized as mostly idle.
Change-Id: I2c09272a4f259f0283b272455147d288fce11982
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
The scheduler currently ignores irq activity when deciding which
CPUs to place tasks on. If a CPU is getting hammered with IRQ activity
but has no tasks it will look attractive to the scheduler as it will
not be in a low power mode.
Track irqload with a decaying average. This quantity can be used
in the task placement logic to avoid CPUs which are under high
irqload. The decay factor is 3/4. Note that with this algorithm the
tracked irqload quantity will be higher than the actual irq time
observed in any single window. Some sample outcomes with steady
irqloads per 10ms window and the 3/4 decay factor (irqload of 10 is
used as a threshold in a subsequent patch):
irqload per window    load value asymptote    # windows to > 10
2ms                   8                       n/a
3ms                   12                      7
4ms                   16                      4
5ms                   20                      3
Of course irqload will not be constant in each window; these are just
simple examples.
Change-Id: I9dba049f5dfdcecc04339f727c8dd4ff554e01a5
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
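A sketch of the per-window update, assuming the tracked value is decayed by 3/4 before the new window's irq time is added (names assumed). With a steady per-window irq time x the tracked value converges to 4x, matching the asymptotes in the table above, and it crosses 10 after 7, 4 and 3 windows for 3 ms, 4 ms and 5 ms respectively:

    /* Called once per window with the irq time observed in that window. */
    static unsigned long long update_irqload(unsigned long long irqload,
                                             unsigned long long window_irq_time)
    {
        return irqload * 3 / 4 + window_irq_time;
    }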
During sched boost RT tasks currently end up going to the lowest
power cluster. This can be a performance bottleneck especially if
the frequency and IPC differences between clusters are high.
Furthermore, when RT tasks go over to the little cluster during
boost, the load balancer keeps attempting to pull work over to the
big cluster. This results in pre-emption of the executing RT task
causing more delays. Finally, containing more work on a single
cluster during boost might help save some power if the little
cluster can then enter deeper low power modes.
Change-Id: I177b2e81be5657c23e7ac43889472561ce9993a9
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Add another dimension for task packing, based on frequency. This patch
adds a per-cpu tunable, rq->mostly_idle_freq, which when set will
result in tasks being packed on a single cpu in the cluster as long as
the cluster frequency is less than the set threshold.
Change-Id: I318e9af6c8788ddf5dfcda407d621449ea5343c0
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help pack
tasks on cpus to some extent. In some cases, it may be desirable to
have different packing limits for different cpus. For example, pack to
a higher limit on high-performance cpus compared to power-efficient
cpus.
This patch removes the global mostly_idle tunables and makes them
per-cpu, thus letting task packing behavior be controlled in a
fine-grained manner.
Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Make the criteria for notifying the governor per-cpu. The governor is
notified of any large change in a cpu's busy time statistics
(rq->prev_runnable_sum) since the last reported value.
Change-Id: I727354d994d909b166d093b94d3dade7c7dddc0d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
rq->curr/prev_runnable_sum counters represent cpu demand from various
tasks that have run on a cpu. Any task that runs on a cpu will have a
representation in rq->curr_runnable_sum: its partial_demand value
will be included in rq->curr_runnable_sum. Since partial_demand is
derived from historical load samples for a task, rq->curr_runnable_sum
could represent "inflated/unrealistic" cpu usage. As an example, let's
say a task with a partial_demand of 10 ms runs for only 1 ms on a cpu.
What is included in rq->curr_runnable_sum is 10 ms (and not the actual
execution time of 1 ms). This leads to cpu busy time being reported on
the high side, causing the frequency to stay higher than necessary.
This patch fixes the cpu busy accounting scheme to strictly represent
actual usage. It also provides for conditional fixup of busy time upon
migration and upon heavy-task wakeup.
CRs-Fixed: 691443
Change-Id: Ic4092627668053934049af4dfef65d9b6b901e6b
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in init_task_load();
se.avg.decay_count has been deprecated.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Currently we send notifications to the governor without taking note of
cpus that are synchronized with regard to their frequency. As a result,
the scheduler could send pointless notifications (notification spam!).
Avoid this by considering synchronized cpus and alerting the governor
only when the highest demand of any cpu within the cluster far exceeds
or falls behind the current frequency.
Change-Id: I74908b5a212404ca56b38eb94548f9b1fbcca33d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
A couple of bugs exist with incorrect use of cpu_online_mask in
pre/post_big_small_task() functions, leading to potentially incorrect
computation of load_scale_factor/capacity/nr_big/small_tasks.
pre/post_big_small_task_count_change() use cpu_online_mask in an
unreliable manner. While local_irq_disable() in
pre_big_small_task_count_change() ensures a cpu won't go away in
cpu_online_mask, nothing prevents a cpu from coming online
concurrently. As a result, cpu_online_mask used in
pre_big_small_task_count_change() can be inconsistent with that used
in post_big_small_task_count_change() which can lead to an attempt to
unlock rq->lock which was not taken before.
Secondly, when either max_possible_freq or min_max_freq is changing,
it needs to trigger recomputation of load_scale_factor and capacity
for *all* cpus, even if some are offline. Otherwise, an offline cpu
could later come online with incorrect load_scale_factor/capacity.
While it should be sufficient to scan online cpus for
updating their nr_big/small_tasks in
post_big_small_task_count_change(), unfortunately it is pretty
hard to provide a stable cpu_online_mask when it's called from
cpufreq_notifier_policy(). The cpufreq framework can trigger a
CPUFREQ_NOTIFY notification in multiple contexts, some in cpu-hotplug
paths, which makes it pretty hard to guess whether get_online_cpus()
can be taken without causing deadlocks or not. To workaround the
insufficient information we have about the hotplug-safety context when
CPUFREQ_NOTIFY is issued, have post_big_small_task_count_change()
traverse all possible cpus in updating nr_big/small_task_count.
CRs-Fixed: 717134
Change-Id: Ife8f3f7cdfd77d5a21eee63627d7a3465930aed5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
The current WINDOW_STATS_AVG policy is actually a misnomer, since it
uses the maximum of the runtime in the recent window and the
average of the past ravg_hist_size windows. Add a policy that only
uses the average and call it the WINDOW_STATS_AVG policy. Rename all
the other policies to make them shorter and unambiguous.
Change-Id: I080a4ea072a84a88858ca9da59a4151dfbdbe62c
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Following commit efcad25cbfb (revert "sched: influence cpu_power based
on max_freq and efficiency"), all CPUs in the system have the same
cpu_power and consequently the same group capacity. Therefore, the
check in bail_inter_cluster_balance() can now no longer be used to
distinguish a higher performance cluster from one with lower
performance. The check is currently broken and always returns true for
every load balancing attempt. Fix this by using runqueue capacity
instead which can still be used as a good measure of cluster
capabilities.
Change-Id: Idecfd1ed221d27d4324b20539e5224a92bf8b751
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Several configuration variable changes will result in
reset_all_window_stats() being called. All of them, except
sched_set_window(), are serialized via policy_mutex. Take
policy_mutex in sched_set_window() as well to serialize use of
the reset_all_window_stats() function.
Change-Id: Iada7ff8ac85caa1517e2adcf6394c5b050e3968a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
"disabled" mode (sched_disble_window_stats = 1) disables all
window-stats related activity. This is useful when changing key
configuration variables associated with window-stats feature (like
policy or window size).
Change-Id: I9e55c9eb7f7e3b1b646079c3aa338db6259a9cfe
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Support legacy mode, which results in busy time being seen by the
governor that is close to what it would have seen via the existing
APIs, i.e. get_cpu_idle_time_us(), get_cpu_iowait_time_us() and
get_cpu_idle_time_jiffy(). In particular, legacy mode means that only
task execution time is counted in rq->curr_runnable_sum and
rq->prev_runnable_sum. Also, task migration does not result in
adjustment of those counters.
Change-Id: If374ccc084aa73f77374b6b3ab4cd0a4ca7b8c90
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Remove code duplication associated with the update of various
window-stats related sysctl tunables.
Change-Id: I64e29ac065172464ba371a03758937999c42a71f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Make RAVG_HIST_SIZE available from /proc/sys/kernel/sched_ravg_hist_size
to allow tuning of the size of the history that is used in computation
of task demand.
CRs-fixed: 706138
Change-Id: Id54c1e4b6e974a62d787070a0af1b4e8ce3b4be6
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in sysctl.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Expand rq->curr_runnable_sum and rq->prev_runnable_sum to be 64-bit
counters as otherwise they can easily overflow when a cpu has many
tasks.
Change-Id: I68ab2658ac6a3174ddb395888ecd6bf70ca70473
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Add a sysctl interface to tune the sched_acct_wait_time variable at runtime.
Change-Id: I38339cdb388a507019e429709a7c28e80b5b3585
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Account cycles spent by idle cpu handling interrupts (irq or softirq)
towards its busy time.
Change-Id: I84cc084ced67502e1cfa7037594f29ed2305b2b1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in core.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
check_for_migration() could run concurrently on multiple cpus,
resulting in multiple tasks wanting to migrate to the same cpu. This
could cause cpus to be underutilized and lead to increased scheduling
latencies for tasks. Fix this by serializing select_best_cpu() calls
from cpus running the check_for_migration() check and marking selected
cpus as reserved, so that subsequent calls to select_best_cpu() from
check_for_migration() will skip reserved cpus.
Change-Id: I73a22cacab32dee3c14267a98b700f572aa3900c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
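The serialization can be pictured as an atomic per-cpu reservation: only the caller that wins the reservation proceeds to target that cpu, and the reservation is cleared once the migration decision is done. The bitmap-style mechanism shown here is an assumption, not the kernel's actual data structure:

    #include <stdatomic.h>

    #define MAX_CPUS 8      /* placeholder for illustration */

    static atomic_int cpu_reserved[MAX_CPUS];   /* 0 = free, 1 = reserved */

    /* Returns 1 if the cpu was free and has now been reserved by this caller. */
    static int try_reserve_cpu(int cpu)
    {
        int expected = 0;

        return atomic_compare_exchange_strong(&cpu_reserved[cpu], &expected, 1);
    }

    static void release_cpu(int cpu)
    {
        atomic_store(&cpu_reserved[cpu], 0);
    }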
Currently turning on boost does not immediately trigger migration of
tasks from lower capacity cpus. Tasks could incur a migration latency
of up to one timer tick (when check_for_migration() is run).
Fix this by triggering a migration check on cpus with lower capacity
as soon as boost is turned on for the first time.
Change-Id: I244649f9cb6608862d87631325967b887b7f4b7e
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>