path: root/kernel/sched
Commit message | Author | Age
...
* | | Revert "sched: Use only partial wait time as task demand"Joonwoo Park2016-03-23
    This reverts commit 0e2092e47488 ("sched: Use only partial wait time
    as task demand") as it causes a performance regression.

    Change-Id: I3917858be98530807c479fc31eb76c0f22b4ea89
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched/deadline: Add basic HMP extensionsSyed Rameez Mustafa2016-03-23
    Some HMP extensions have to be supported by all scheduling classes,
    irrespective of whether they use HMP task placement. Add these basic
    extensions to make deadline scheduling work.

    Also, during the tick, if a deadline task gets throttled, its HMP
    stats get decremented as part of the dequeue. However, the throttled
    task does not update its on_rq flag, causing HMP stats to be double
    decremented when update_history() is called as part of a window
    rollover. Avoid this by checking for throttled deadline tasks before
    subtracting and adding the deadline task's load from the rq
    cumulative runnable avg.

    Change-Id: I9e2ed6675a730f2ec830f764f911e71c00a7d87a
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Fix racy invocation of fixup_busy_time via move_queued_taskVikram Mulukutla2016-03-23
    set_task_cpu() uses fixup_busy_time() to redistribute a task's load
    information between source and destination runqueues.
    fixup_busy_time() assumes that both source and destination runqueue
    locks have been acquired if the task is not being concurrently woken
    up. However, this is no longer true, since move_queued_task() does
    not acquire the destination CPU's runqueue lock due to optimizations
    brought in by recent kernels. Acquire both source and destination
    runqueue locks before invoking set_task_cpu() in move_queued_task().

    Change-Id: I39fadf0508ad42e511db43428e52c8aa8bf9baf6
    Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
    [joonwoop@codeaurora.org: fixed conflict in move_queued_task().]
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: don't inflate the task load when the CPU max freq is restrictedPavankumar Kondeti2016-03-23
    When the CPU max freq is restricted and the CPU is running at the max
    freq, the task load is inflated by a max_possible_freq/max_freq
    factor. This results in tasks migrating early to the better capacity
    CPUs, which makes things worse if the frequency restriction is due to
    a thermal condition.

    Change-Id: Ie0ea405d7005764a6fb852914e88cf97102c138a
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* | | sched: auto adjust the upmigrate and downmigrate thresholdsPavankumar Kondeti2016-03-23
    The load scale factor of a CPU gets boosted when its max freq is
    restricted. A task load at the same frequency is scaled higher than
    normal under this scenario. This results in tasks migrating early to
    the better capacity CPUs, and their residency there also increases,
    as their inflated load would be relatively higher than the
    downmigrate threshold.

    Auto adjust the upmigrate and downmigrate thresholds by a factor
    equal to rq->max_possible_freq/rq->max_freq of a lower capacity CPU.
    If the adjusted upmigrate threshold exceeds the window size, it is
    clipped to the window size. If the adjustment shrinks the gap between
    the upmigrate and downmigrate thresholds, the downmigrate threshold
    is clipped so that the gap between the adjusted thresholds matches
    the original gap.

    Change-Id: Ifa70ee5d4ca5fe02789093c7f070c77629907f04
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
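The scale-then-clip rules described in this commit can be sketched as plain arithmetic. All names, the struct, and the percent-free load units below are illustrative assumptions, not the actual kernel code:

```c
#include <assert.h>

/* Hypothetical sketch of the threshold auto-adjustment described above.
 * Thresholds are in arbitrary load units; the scale factor is
 * max_possible_freq/max_freq of a lower capacity CPU. */
struct hmp_thresholds {
	unsigned int up;	/* upmigrate threshold */
	unsigned int down;	/* downmigrate threshold */
};

static struct hmp_thresholds adjust_thresholds(unsigned int up,
					       unsigned int down,
					       unsigned int max_possible_freq,
					       unsigned int max_freq,
					       unsigned int window)
{
	struct hmp_thresholds t;

	/* Scale both thresholds by max_possible_freq/max_freq. */
	t.up = up * max_possible_freq / max_freq;
	t.down = down * max_possible_freq / max_freq;

	/* Clip the adjusted upmigrate threshold to the window size. */
	if (t.up > window)
		t.up = window;

	/* If clipping shrank the up/down gap, restore the original gap. */
	if (t.up - t.down < up - down)
		t.down = t.up - (up - down);

	return t;
}
```

When no window clipping occurs, both thresholds simply scale by the same factor; only when the upmigrate threshold hits the window does the downmigrate threshold get pulled down to preserve the original gap.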
* | | sched: don't inherit initial task load from the parentPavankumar Kondeti2016-03-23
    A child task is not supposed to inherit the initial task load
    attribute from its parent. Reset the child's init_load_pct attribute
    during fork.

    Change-Id: I458b121f10f996fda364e97b51aaaf6c345c1dbb
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* | | sched/fair: Add irq load awareness to the tick CPU selection logicOlav Haugan2016-03-23
    IRQ load is not taken into account when determining whether a task
    should be migrated to a different CPU. A task that runs for a long
    time could get stuck on a CPU with high IRQ load, causing degraded
    performance. Add irq load awareness to the tick CPU selection logic.

    CRs-fixed: 809119
    Change-Id: I7969f7dd947fb5d66fce0bedbc212bfb2d42c8c1
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
* | | sched: disable IRQs in update_min_max_capacitySteve Muckle2016-03-23
    IRQs must be disabled while locking runqueues since an interrupt may
    cause a runqueue lock to be acquired.

    CRs-fixed: 828598
    Change-Id: Id66f2e25ed067fc4af028482db8c3abd3d10c20f
    Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* | | sched: Use only partial wait time as task demandSyed Rameez Mustafa2016-03-23
    The scheduler currently either considers a task's entire wait time as
    task demand or completely ignores wait time, based on the tunable
    sched_account_wait_time. Both approaches have their limitations,
    however. The former artificially boosts task demand when it may not
    actually be justified. With the latter, the scheduler runs the risk
    of never being able to recognize true load (consider two CPU hogs on
    a single little CPU).

    To achieve a compromise between these two extremes, change the load
    tracking algorithm to only consider part of a task's wait time as its
    demand. The portion of wait time accounted as demand is determined by
    each task's percent load; e.g. for a task that waits for 10 ms and
    has 60% task load, only 6 ms of the wait contributes to task demand.
    This approach is fairer, as the scheduler now tries to determine how
    much of its wait time a task would actually have spent using the CPU
    had it been executing. It ensures that tasks with high demand
    continue to see most of the benefits of accounting wait time as busy
    time, while lower demand tasks don't experience a disproportionately
    high boost to demand that triggers unjustified big CPU usage. Note
    that this new approach only applies to wait time considered as task
    demand, not to wait time considered as CPU busy time.

    To achieve the above effect, ensure that any time a task is waiting,
    its runtime in every relevant window segment is appropriately
    adjusted using its pct load.

    Change-Id: I6a698d6cb1adeca49113c3499029b422daf7871f
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
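The 10 ms / 60% example above is simply a percentage scaling of the waited time. A minimal sketch (the helper name and nanosecond units are illustrative, not from the kernel):

```c
#include <assert.h>

/* Account only pct_load percent of a task's wait time as demand. */
static unsigned long long partial_wait_demand(unsigned long long wait_ns,
					      unsigned int pct_load)
{
	return wait_ns * pct_load / 100;
}
```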
* | | sched: fix race conditions where HMP tunables changeJoonwoo Park2016-03-23
    When multiple threads race to update HMP scheduler tunables, at
    present, a tunable that requires a big/small task count fix-up can be
    updated without the fix-up, which can trigger a BUG_ON(). This
    happens because sched_hmp_proc_update_handler() acquires rq locks and
    performs the fix-up only when a tunable affecting the big/small task
    counts is updated, even though it always calls set_hmp_defaults(),
    which re-calculates all sysctl input data at that point.
    Consequently, a thread updating a tunable that does not affect the
    big/small task counts can call set_hmp_defaults() and update a
    fix-up-requiring tunable without the fix-up, if another thread has
    just set the corresponding sysctl value.

    Example of the problem scenario:

      thread 0: sets sched_small_task (needs fix-up).
      thread 1: sets sched_init_task_load (no fix-up needed).
      thread 0: proc_dointvec_minmax() completes, so
                sysctl_sched_small_task now holds the new value.
      thread 1: calls set_hmp_defaults() without lock/fix-up;
                set_hmp_defaults() still updates sched_small_tasks with
                the new sysctl_sched_small_task value set by thread 0.

    Fix this by wrapping the proc update handler in the already existing
    policy mutex.

    CRs-fixed: 812443
    Change-Id: I7aa4c0efc1ca56e28dc0513480aca3264786d4f7
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: check HMP scheduler tunables validityJoonwoo Park2016-03-23
    Validate tunable input so that only valid values are accepted.

    CRs-fixed: 812443
    Change-Id: Ibb9ec0d6946247068174ab7abe775a6389412d5b
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Update max_capacity when an entire cluster is hotpluggedSyed Rameez Mustafa2016-03-23
    When an entire cluster is hotplugged, the scheduler's notion of
    max_capacity can get outdated. This introduces the following
    inefficiencies in behavior:

    * task_will_fit() does not return true on all tasks. Consequently,
      all big tasks go through the fallback CPU selection logic, skipping
      C-state and power checks in select_best_cpu().
    * During boost, migration_needed() returns true unnecessarily,
      causing an avoidable rerun of select_best_cpu().
    * An unnecessary kick is sent to all little CPUs when boost is set.
    * An opportunity for early bailout from nohz_kick_needed() is lost.

    Start handling CPUFREQ_REMOVE_POLICY in the policy notifier callback,
    which indicates the last CPU in a cluster being hotplugged out. Also
    modify update_min_max_capacity() to only iterate through online CPUs
    instead of possible CPUs. While we can't guarantee the integrity of
    the cpu_online_mask in the notifier callback, the scheduler will fix
    up all state soon after any changes to the online mask.

    The change does have one side effect: early termination from the
    notifier callback when min_max_freq or max_possible_freq remain
    unchanged is no longer possible. This is because when the last CPU in
    a cluster is hot removed, only max_capacity is updated, without
    affecting min_max_freq or max_possible_freq. Therefore, when the
    first CPU in the same cluster gets hot added at a later point,
    max_capacity must once again be recomputed despite there being no
    change in min_max_freq or max_possible_freq.

    Change-Id: I9a1256b5c2cd6fcddd85b069faf5e2ace177e122
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Ensure attempting load balance when HMP active balance flags are setSyed Rameez Mustafa2016-03-23
    find_busiest_group() can end up returning a NULL group due to load
    based checks, even though there are tasks that can be migrated to
    higher capacity CPUs (LBF_BIG_TASK_ACTIVE_BALANCE) or EA core
    rotation is possible (LBF_EA_ACTIVE_BALANCE). To get the best power
    and performance, ensure that load balance does attempt to pull tasks
    when the HMP_ACTIVE_BALANCE flag is set. Since sched boost also falls
    under the same category, club it into the same generic condition.

    Change-Id: I3db7ec200d2a038917b1f2341602eb87b5aed289
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: add scheduling latency tracking procfs nodeJoonwoo Park2016-03-23
    Add a new procfs node /proc/sys/kernel/sched_max_latency_us to track
    the worst scheduling latency. It provides an easier way to identify
    the maximum scheduling latency seen across the CPUs.

    Change-Id: I6e435bbf825c0a4dff2eded4a1256fb93f108d0e
    [joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().]
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: warn/panic upon excessive scheduling latencyJoonwoo Park2016-03-23
    Add new tunables /proc/sys/kernel/sched_latency_warn_threshold_us and
    /proc/sys/kernel/sched_latency_panic_threshold_us to warn or panic
    when a task is runnable but not scheduled for longer than the
    configured time. This helps to find unacceptably high scheduling
    latency more easily.

    Change-Id: If077aba6211062cf26ee289970c5abcd1c218c82
    [joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().]
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched/core: Fix incorrect wait time and wait count statisticsJoonwoo Park2016-03-23
    At present the scheduler resets a task's wait start timestamp when
    the task migrates to another rq. This misleads the scheduler into
    reporting less wait time than actual, by omitting time spent waiting
    prior to migration, and also more wait count than actual, by counting
    migration as a wait end event; this can be seen via trace or
    /proc/<pid>/sched with CONFIG_SCHEDSTATS=y.

    Carry forward a migrating task's wait time from prior to migration
    and don't count migration as a wait end event, to fix such statistics
    errors. In order to determine whether a task is migrating, mark
    task->on_rq with TASK_ON_RQ_MIGRATING while dequeuing and enqueuing
    due to migration.

    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: ohaugan@codeaurora.org
    Link: http://lkml.kernel.org/r/20151113033854.GA4247@codeaurora.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    [joonwoop@codeaurora.org: fixed minor conflict in detach_task().]
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
    Change-Id: I2d7f7d9895815430ad61383e62d28d889cce66c3
* | | sched: Update cur_freq in the cpufreq policy notifier callbackSyed Rameez Mustafa2016-03-23
    At boot, the cpufreq framework sends transition notifiers before
    sending out the policy notifier. Since the scheduler relies on the
    policy notifier to build up the frequency domain masks, the scheduler
    has no frequency domains when the initial set of transition notifiers
    is sent. As a result the scheduler fails to update the cur_freq
    information. Update cur_freq as part of the policy notifier so that
    the scheduler always has the current frequency information.

    Change-Id: I7bd2958dfeb064dd20b9ccebafd372436484e5d6
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: avoid CPUs with high irq activity for non-small tasksJoonwoo Park2016-03-23
    The irq-aware scheduler is meant to achieve better performance by
    avoiding task placement on CPUs with high irq activity. However, the
    current scheduler actually prefers CPUs loaded with irq activity when
    the task is non-small, the opposite of the intended behavior. This is
    suboptimal for both power and performance. Fix the task placement
    algorithm to avoid CPUs with significant irq activity.

    Change-Id: Ifa5a6ac186241bd58fa614e93e3d873a5f5ad4ca
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: actively migrate big tasks on power CPU to idle performance CPUJoonwoo Park2016-03-23
    When a performance CPU runs the idle or newly-idle load balancer to
    pull a task from a power-efficient CPU, the load balancer always
    fails and enters idle mode if the big task on the power-efficient CPU
    is running. This is suboptimal when the running task doesn't fit on
    the power-efficient CPU, as it's quite possible that the big task
    will remain on the power-efficient CPU until it's preempted, while a
    performance CPU sits idle. Revise the load balancer algorithm to
    actively migrate big tasks from a power-efficient CPU to a
    performance CPU when the performance CPU runs the idle or newly-idle
    load balancer.

    Change-Id: Iaf05e0236955fdcc7ded0ff09af0880050a2be32
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
    [rameezmustafa@codeaurora.org: Port to msm-3.18]
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
    [joonwoop@codeaurora.org: fixed minor conflict in group_classify().]
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Add cgroup-based criteria for upmigrationSrivatsa Vaddagiri2016-03-23
    It may be desirable to discourage upmigration of tasks belonging to
    some cgroups. Add a per-cgroup flag (upmigrate_discourage) that
    discourages upmigration of a cgroup's tasks. Tasks of the cgroup are
    allowed to upmigrate only under overcommitted scenarios.

    Change-Id: I1780e420af1b6865c5332fb55ee1ee408b74d8ce
    Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
    [rameezmustafa@codeaurora.org: Use new cgroup APIs]
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: avoid running idle_balance() on behalf of wrong CPUJoonwoo Park2016-03-23
    With EA (Energy Awareness), idle_balance() on a CPU runs on behalf of
    the most power-efficient idle CPU among the CPUs in its sched domain
    level, under the condition that the substitute idle CPU has the same
    capacity as the original idle CPU. It is found that at present
    idle_balance() spans all the CPUs in its sched domain and can run the
    idle balancer on behalf of any CPU within the domain, which could be
    any CPU in the system; consequently the idle balancer on a
    performance CPU always runs on behalf of a power-efficient idle CPU.
    This causes idle performance CPUs to always fail to pull tasks from
    power-efficient CPUs when there is only one online performance CPU.
    Fix this by limiting the search to CPUs that share a cache with the
    original idle CPU, so that the idle balancer still runs on behalf of
    a more power-efficient CPU with the same capacity as the original
    CPU.

    Change-Id: I0575290c24f28db011d9353915186e64df7e57fe
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Keep track of average nr_big_tasksSrivatsa Vaddagiri2016-03-23
    Extend the sched_get_nr_running_avg() API to return average
    nr_big_tasks, in addition to average nr_running and average
    nr_io_wait tasks. Also add a new trace point to record values
    returned by the sched_get_nr_running_avg() API.

    Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a
    Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
    [rameezmustafa@codeaurora.org: Resolve trivial merge conflicts]
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Fix bug in average nr_running and nr_iowait calculationSrivatsa Vaddagiri2016-03-23
    sched_get_nr_running_avg() returns the average nr_running and
    nr_iowait task counts since it was last invoked. Fix several bugs in
    their calculation:

    * sched_update_nr_prod() needs to consider that the nr_running count
      can change by more than 1 when the CFS_BANDWIDTH feature is used.
    * sched_get_nr_running_avg() needs to sum up the nr_iowait count
      across all cpus, rather than just one.
    * sched_get_nr_running_avg() could race with sched_update_nr_prod(),
      as a result of which it could use a curr_time that is behind a
      cpu's 'last_time' value. That would lead to erroneous calculation
      of average nr_running or nr_iowait.

    While at it, also fix a bug in the BUG_ON() check in the
    sched_update_nr_prod() function and remove the unnecessary nr_running
    argument to sched_update_nr_prod().

    Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2
    Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
    [rameezmustafa@codeaurora.org: Port to msm-3.18]
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Avoid pulling all tasks from a CPU during load balanceSyed Rameez Mustafa2016-03-23
    When running load balance, the destination CPU checks the number of
    running tasks on the busiest CPU without holding the busiest CPU's
    runqueue lock. This opens the load balancer to a race whereby a third
    CPU running load balance at the same time, having found the same
    busiest group and queue, may have already pulled one of the waiting
    tasks from the busiest CPU. Under scenarios where the source CPU is
    running the idle task and only a single task remains waiting on the
    busiest runqueue (nr_running = 1), the destination CPU will end up
    pulling the only enqueued task from that CPU, leaving the destination
    CPU with nothing left to run. Fix this race by reconfirming
    nr_running for the busiest CPU after its runqueue lock has been
    obtained.

    Change-Id: I42e132b15f96d9d5d7b32ef4de3fb92d2f837e63
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Avoid pulling big tasks to the little cluster during load balanceSyed Rameez Mustafa2016-03-23
    When a lower capacity CPU attempts to pull work from a higher
    capacity CPU during load balance, it does not distinguish between
    tasks that will fit or not fit on the destination CPU. This causes
    suboptimal load balancing decisions whereby big tasks end up on the
    lower capacity CPUs and little tasks remain on higher capacity CPUs.
    Avoid this behavior by first restricting the search to only include
    tasks that fit on the destination CPU. If no such task can be found,
    remove this restriction so that any task can be pulled over to the
    destination CPU. This behavior is not applicable during sched_boost,
    however, as none of the tasks will fit on a lower capacity CPU.

    Change-Id: I1093420a629a0886fc3375849372ab7cf42e928e
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
    [joonwoop@codeaurora.org: fixed minor conflict in can_migrate_task().]
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: fix rounding error on scaled execution time calculationJoonwoo Park2016-03-23
    It's found that the scaled execution time can be less than the actual
    time due to rounding errors. The HMP scheduler accumulates the scaled
    execution time of tasks to determine whether tasks are in need of
    up-migration, but the rounding error prevents the HMP scheduler from
    accumulating 100% load, which prevents us from ever reaching an
    up-migrate threshold of 100%. Fix the rounding error by rounding the
    quotient up.

    CRs-fixed: 759041
    Change-Id: Ie4d9693593cc3053a292a29078aa56e6de8a2d52
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
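Rounding the quotient up is the standard ceiling-division idiom (the kernel's DIV_ROUND_UP macro works the same way). A standalone sketch, where the function name and the direction of the frequency scaling are illustrative assumptions rather than the actual kernel formula:

```c
#include <assert.h>

/* Ceiling division when scaling execution time: round the quotient up
 * instead of truncating, so repeated scaling cannot systematically
 * under-account a task's load. */
static unsigned long long scale_exec_time(unsigned long long delta,
					  unsigned long long cur_freq,
					  unsigned long long max_freq)
{
	return (delta * cur_freq + max_freq - 1) / max_freq;
}
```

With truncating division, a task running at 999/1000 of max frequency would have each 10-unit slice scaled down to 9, so its accumulated load could never reach 100%.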
* | | sched/fair: Respect wake to idle over sync wakeupOlav Haugan2016-03-23
    Sync wakeup currently takes precedence over the wake-to-idle flag. A
    sync wakeup causes a task to be placed on a non-idle CPU because we
    expect this CPU to become idle very shortly. However, even though the
    sync flag is set, there is no guarantee that the task will go to
    sleep right away. As a consequence, performance suffers. Fix this by
    preferring an idle CPU over a potentially busy CPU when both wake to
    idle and sync wakeup are set.

    Change-Id: I6b40a44e2b4d5b5fa6088e4f16428f9867bd928d
    CRs-fixed: 794424
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
* | | sched: Support CFS_BANDWIDTH feature in HMP schedulerSrivatsa Vaddagiri2016-03-23
    The CFS_BANDWIDTH feature is not currently well-supported by the HMP
    scheduler. Issues encountered include a kernel panic when the
    rq->nr_big_tasks count becomes negative. This patch fixes the HMP
    scheduler code to better handle the CFS_BANDWIDTH feature.

    The most prominent change introduced is maintenance of HMP stats
    (nr_big_tasks, nr_small_tasks, cumulative_runnable_avg) per
    'struct cfs_rq', in addition to being maintained in each 'struct rq'.
    This allows HMP stats to be updated easily when a group is throttled
    on a cpu.

    Change-Id: Iad9f378b79ab5d9d76f86d1775913cc1941e266a
    Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
    [rameezmustafa@codeaurora.org: Port to msm-3.18]
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
    [joonwoop@codeaurora.org: fixed minor conflict in dequeue_task_fair().]
* | | sched: Consolidate hmp stats into their own structSrivatsa Vaddagiri2016-03-23
    Key HMP stats (nr_big_tasks, nr_small_tasks and
    cumulative_runnable_average) are currently maintained per-cpu in
    'struct rq'. Merge those stats into their own structure
    (struct hmp_sched_stats) and modify the impacted functions to deal
    with the newly introduced structure. This cleanup is required for a
    subsequent patch which fixes various issues with the use of the
    CFS_BANDWIDTH feature in the HMP scheduler.

    Change-Id: Ieffc10a3b82a102f561331bc385d042c15a33998
    Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
    [rameezmustafa@codeaurora.org: Port to msm-3.18]
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
    [joonwoop@codeaurora.org: fixed conflict in __update_load_avg().]
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Add userspace interface to set PF_WAKE_UP_IDLESrivatsa Vaddagiri2016-03-23
    The sched_prefer_idle flag controls whether tasks can be woken to any
    available idle cpu. It may be desirable to set sched_prefer_idle to 0
    so that most tasks wake up to non-idle cpus under the mostly_idle
    threshold, and to have specialized tasks override this behavior
    through other means. The per-task PF_WAKE_UP_IDLE flag provides
    exactly that: it lets tasks with PF_WAKE_UP_IDLE set be woken up to
    any available idle cpu independent of the sched_prefer_idle setting.

    Currently only a kernel-space API exists to set the PF_WAKE_UP_IDLE
    flag for a task. This patch adds a user-space API (in the /proc
    filesystem) to set PF_WAKE_UP_IDLE for a given task. The
    /proc/[pid]/sched_wake_up_idle file can be written to set or clear
    the PF_WAKE_UP_IDLE flag for a given task.

    Change-Id: I13a37e740195e503f457ebe291d54e83b230fbeb
    Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
    [rameezmustafa@codeaurora.org: Port to msm-3.18]
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
    [joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/fair.c]
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched_avg: add run queue averagingJeff Ohlstein2016-03-23
    Add code to calculate the run queue depth of a cpu and the iowait
    depth of the cpu. The scheduler calls in to sched_update_nr_prod()
    whenever there is a runqueue change; this function maintains the
    runqueue average and the iowait of that cpu over that time interval.
    Whoever wants to know the runqueue average is expected to call
    sched_get_nr_running_avg() periodically to get the accumulated
    runqueue and iowait averages for all the cpus.

    Change-Id: Id8cb2ecf0ed479f090a83ccb72dd59c53fa73e0c
    Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
    (cherry picked from commit 0299fcaaad80e2c0ac9aa583c95107f6edc27750)
    [rameezmustafa@codeaurora.org: Port to msm-3.18]
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: add sched feature FORCE_CPU_THROTTLING_IMMINENTJoonwoo Park2016-03-23
    Add a new sched feature FORCE_CPU_THROTTLING_IMMINENT to perform
    migration due to EA without checking frequency throttling. This
    option gives us better debugging and verification capability.

    Change-Id: Iba445961a7f9812528b4e3aa9c6ddf47a3aad583
    [joonwoop@codeaurora.org: fixed trivial conflict in
    kernel/sched/features.h]
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: continue to search less power efficient cpu for load balancerJoonwoo Park2016-03-23
    When choosing a CPU to do power-aware active balance from, the load
    balancer currently selects the first eligible CPU it finds, even if
    another eligible CPU is less power-efficient. This can lead to
    suboptimal load balancing behavior and extra migrations, impacting
    both power and performance. Achieve better power and performance by
    continuing to search for the least power-efficient cpu, as long as
    that cpu's load average is higher than or equal to that of the
    busiest cpu found so far.

    CRs-fixed: 777341
    Change-Id: I14eb21ab725bf7dab88b2e1e169aced6f2d712ca
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Update cur_freq for offline CPUs in notifier callbackSyed Rameez Mustafa2016-03-23
    The cpufreq governor does not send frequency change notifications for
    offline CPUs. This means that a hot-removed CPU's cur_freq
    information can get stale if there is a frequency change while that
    CPU is offline. When the offline CPU is hotplugged back in, all
    subsequent load calculations are based off the stale information
    until another frequency change occurs and the corresponding set of
    notifications is sent out. Avoid this incorrect load tracking by
    updating cur_freq for all CPUs in the same frequency domain.

    Change-Id: Ie11ad9a64e7c9b115d01a7c065f22d386eb431d5
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Fix overflow in max possible capacity calculationOlav Haugan2016-03-23
    The max possible capacity calculation might overflow given a large
    enough max possible frequency and capacity. Fix the potential for
    overflow.

    Change-Id: Ie9345bc657988845aeb450d922052550cca48a5f
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
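A typical shape for such a fix is to widen the intermediate product to 64 bits before dividing. The sketch below models the hazard on the commit text; the function name, parameters, and formula are assumptions, not the actual kernel calculation:

```c
#include <assert.h>
#include <stdint.h>

/* Scale a capacity value by a frequency ratio. Widening the product to
 * 64 bits first prevents a 32-bit intermediate from wrapping when
 * capacity * max_possible_freq exceeds UINT32_MAX. */
static uint64_t scale_capacity(uint32_t capacity,
			       uint32_t max_possible_freq,
			       uint32_t min_max_freq)
{
	return (uint64_t)capacity * max_possible_freq / min_max_freq;
}
```

With a capacity of 1024 and frequencies expressed in kHz, the intermediate product easily exceeds the 32-bit range, so the cast matters.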
* | | sched: add preference for prev_cpu in HMP task placementSteve Muckle2016-03-23
    At present the HMP task placement algorithm scans CPUs in numerical
    order and, if two identical options are found, chooses the first one
    encountered, even if it is different from the task's previous CPU.

    Add a bias towards the task's previous CPU in such situations. Any
    time two or more CPUs are considered equivalent (load, C-state, power
    cost), if one of them is the task's previous CPU, bias towards that
    CPU. The algorithm is otherwise unchanged.

    CRs-Fixed: 772033
    Change-Id: I511f5b929c2bfa6fdea9e7433893c27b29ed8026
    Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
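The tie-break described above can be sketched as a scan that keeps the best-scoring CPU and switches to an equally scored candidate only when that candidate is the task's previous CPU. The structure and the single-key ordering below are illustrative simplifications of the real (load, C-state, power cost) comparison:

```c
#include <assert.h>

struct cpu_option {
	int cpu;
	int load;	/* stand-in for the (load, C-state, power) score */
};

/* Pick the lowest-load CPU; among ties, prefer the task's previous CPU. */
static int pick_cpu(const struct cpu_option *opts, int n, int prev_cpu)
{
	int best = 0;

	for (int i = 1; i < n; i++) {
		if (opts[i].load < opts[best].load)
			best = i;
		else if (opts[i].load == opts[best].load &&
			 opts[i].cpu == prev_cpu)
			best = i;	/* bias toward prev_cpu on a tie */
	}
	return opts[best].cpu;
}
```

When no candidate ties with the best option, the scan behaves exactly as before, matching the "otherwise unchanged" note in the commit.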
* | | sched: Per-cpu prefer_idle flagSrivatsa Vaddagiri2016-03-23
    Remove the global sysctl_sched_prefer_idle flag and replace it with a
    per-cpu prefer_idle flag. The per-cpu flag is expected to be the same
    for all cpus in a cluster. It thus provides a convenient means to
    disable packing in one cluster while allowing packing in another.

    Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1
    Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()Srivatsa Vaddagiri2016-03-23
| | | sysctl_sched_prefer_idle controls selection of idle cpus for waking tasks. In some cases, waking to idle cpus helps performance while in other cases it hurts (as tasks incur latency associated with C-state wakeup). It is ideal if the scheduler can adapt prefer_idle behavior based on the task that is waking up, but that is hard for the scheduler to figure out by itself. The PF_WAKE_UP_IDLE hint can be provided by an external module/driver in such cases to guide the scheduler in preferring an idle cpu for select tasks irrespective of the sysctl_sched_prefer_idle flag. This patch enhances select_best_cpu() to consider the PF_WAKE_UP_IDLE hint. A wakeup posted from any task that has PF_WAKE_UP_IDLE set is a hint for the scheduler to prefer an idle cpu for the waking task. Similarly the scheduler will attempt to place any task with PF_WAKE_UP_IDLE set on an idle cpu when it wakes up. CRs-Fixed: 773101 Change-Id: Ia8bf334d98fd9fd2ff9eda875430497d55d64ce6 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
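The decision described above combines the global sysctl with a per-task hint carried by either side of the wakeup. A minimal sketch, assuming a hypothetical flag value (the real PF_WAKE_UP_IDLE bit lives in the kernel's task flags word):

```c
#include <stdbool.h>

#define PF_WAKE_UP_IDLE 0x00000001   /* illustrative value, not the kernel's */

/* Prefer an idle cpu if the global sysctl asks for it, or if either the
 * waker or the wakee carries the PF_WAKE_UP_IDLE hint. */
static bool prefer_idle_cpu(unsigned int waker_flags, unsigned int wakee_flags,
                            int sysctl_prefer_idle)
{
    return sysctl_prefer_idle ||
           (waker_flags & PF_WAKE_UP_IDLE) ||
           (wakee_flags & PF_WAKE_UP_IDLE);
}
```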
* | | sched: Add sysctl to enable power aware schedulingOlav Haugan2016-03-23
| | | Add a sysctl to enable energy awareness at runtime. This is useful for performance/power tuning/measurements and debugging. In addition this will match up with the Documentation/scheduler/sched-hmp.txt documentation. Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [rameezmustafa@codeaurora.org: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Ensure no active EA migration occurs when EA is disabledOlav Haugan2016-03-23
| | | There exists a flag called "sched_enable_power_aware" that is not honored everywhere. Fix this. Change-Id: I62225939b71b25970115565b4e9ccb450e252d7c Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [rameezmustafa@codeaurora.org: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: take account of irq preemption when calculating irqload deltaJoonwoo Park2016-03-23
| | | If an irq is raised while sched_irqload() is calculating the irqload delta, sched_account_irqtime() can update the rq's irqload_ts to a value greater than the jiffies stored in sched_irqload()'s context, so the delta can be negative. A negative delta simply means there was a recent irq occurrence. So remove the improper BUG_ON(). CRs-fixed: 771894 Change-Id: I5bb01b50ec84c14bf9f26dd9c95de82ec2cd19b5 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
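The race can be modeled as a delta computation where the timestamp may be newer than the snapshot of jiffies. A sketch of the tolerant handling (clamp instead of BUG_ON); names are illustrative:

```c
/* Compute "time since last irq activity". If an irq lands between reading
 * jiffies and reading rq->irqload_ts, the timestamp can be newer than our
 * snapshot and the naive delta goes negative -- which just means an irq
 * occurred very recently, not that state is corrupt. */
static long irq_idle_delta(unsigned long now_jiffies, unsigned long irqload_ts)
{
    long delta = (long)(now_jiffies - irqload_ts);

    return delta < 0 ? 0 : delta;   /* negative => treat as "irq just now" */
}
```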
* | | sched: Prevent race conditions where upmigrate_min_nice changesJoonwoo Park2016-03-23
| | | When upmigrate_min_nice is changed, dec_nr_big_small_task() can trigger BUG_ON(rq->nr_big_tasks < 0). This happens when there is a task which was considered a non-big task due to its nice > upmigrate_min_nice and upmigrate_min_nice is later changed to a higher value so the task becomes a big task. In this case the runqueue still incorrectly has nr_big_tasks = 0 with the current implementation. Consequently the next scheduler tick sees a big task to schedule and tries to decrease nr_big_tasks, which is already 0. Introduce sched_upmigrate_min_nice, which is updated atomically, and re-count the number of big and small tasks to fix the BUG_ON() triggering. Change-Id: I6f5fc62ed22bbe5c52ec71613082a6e64f406e58 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
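The repair step is a full re-count of big tasks against the new threshold, so the per-rq counter is never left reflecting the old classification. A simplified sketch of the classification and re-count (the nice-threshold comparison direction is assumed from the message: tasks with nice above the threshold are not big):

```c
#include <stdbool.h>

/* A task is "big" unless its nice exceeds the upmigrate threshold. */
static bool is_big_task(int nice, int upmigrate_min_nice)
{
    return nice <= upmigrate_min_nice;
}

/* After the threshold changes, re-derive the count from scratch instead
 * of trusting increments/decrements made under the old threshold. */
static int count_big_tasks(const int *nice_vals, int n, int upmigrate_min_nice)
{
    int i, big = 0;

    for (i = 0; i < n; i++)
        if (is_big_task(nice_vals[i], upmigrate_min_nice))
            big++;
    return big;
}
```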
* | | sched: Avoid frequent task migration due to EA in lbOlav Haugan2016-03-23
| | | A new tunable exists that allows task migration to be throttled when the scheduler tries to do task migrations due to Energy Awareness (EA). This tunable is only taken into account when migrations occur in the tick path. Extend the usage of the tunable to also take the load balancer (lb) path into account. In addition, ensure that the start of task execution on a CPU is updated correctly. If a task is preempted but still runnable on the same CPU, the start of execution should not be updated. Only update the start of execution when a task wakes up after sleep or moves to a new CPU. Change-Id: I6b2a8e06d8d2df8e0f9f62b7aba3b4ee4b2c1c4d Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [rameezmustafa@codeaurora.org: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in group_classify() and set_task_cpu().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Avoid migrating tasks to little cores due to EAOlav Haugan2016-03-23
| | | If, during the check of whether migration is needed, we find that there is a lower power CPU available, we commence finding a new CPU for this task. However, by the time we search for a new CPU, the lower power CPU might no longer be available. We should abort the attempt to migrate the task in this case. CRs-Fixed: 764788 Change-Id: I867923a82b95c599278b81cd73bb102b6aff4d03 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
* | | sched: Add temperature to cpu_load trace pointOlav Haugan2016-03-23
| | | Add the current CPU temperature to the sched_cpu_load trace point. This will allow us to track the CPU temperature. CRs-Fixed: 764788 Change-Id: Ib2e3559bbbe3fe07a6b7c8115db606828bc36254 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
* | | sched: Only do EA migration when CPU throttling is imminentOlav Haugan2016-03-23
| | | We do not want to migrate tasks unnecessarily, to avoid cache-related and other migration latencies that could affect the performance of the system. Add a check to only try EA migration when CPU frequency throttling is imminent. CRs-Fixed: 764788 Change-Id: I92e86e62da10ce15f1e76a980df3545e93d76348 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [rameezmustafa@codeaurora.org: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Avoid frequent migration of running taskSrivatsa Vaddagiri2016-03-23
| | | Power values for cpus can drop quite considerably when they go idle. As a result, the best choice for running a single task in a cluster can vary quite rapidly. As the task keeps hopping cpus, other cpus go idle and start being seen as more favorable targets for running a task, leading to the task migrating almost every scheduler tick! Prevent this by keeping track of when a task started running on a cpu and allowing task migration in the tick path (migration_needed()) on account of energy efficiency reasons only if the task has run sufficiently long (as determined by the sysctl_sched_min_runtime variable). Note that currently the sysctl_sched_min_runtime setting is considered only in the scheduler_tick()->migration_needed() path and not in the idle_balance() path. In other words, a task could be migrated to another cpu which did an idle_balance(). This limitation should not affect the high-frequency migrations seen typically (when a single high-demand task runs on a high-performance cpu). CRs-Fixed: 756570 Change-Id: I96413b7a81b623193c3bbcec6f3fa9dfec367d99 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in set_task_cpu() and __schedule().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
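The gate described above reduces to a simple "has the task run long enough on this cpu" check in the tick path. A sketch under the assumption that timestamps are in nanoseconds; the names are illustrative, not the exact kernel symbols:

```c
#include <stdbool.h>

/* EA-driven migration in the tick path is allowed only once the task has
 * been running on its current cpu for at least min_runtime_ns. The start
 * timestamp is reset on wakeup-after-sleep or a move to a new cpu, not on
 * preemption while still runnable on the same cpu. */
static bool ea_migration_allowed(unsigned long long now_ns,
                                 unsigned long long run_start_ns,
                                 unsigned long long min_runtime_ns)
{
    return now_ns - run_start_ns >= min_runtime_ns;
}
```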
* | | sched: treat sync waker CPUs with 1 task as idleSteve Muckle2016-03-23
| | | When a CPU with one task performs a sync wakeup, its one task is expected to sleep immediately, so this CPU should be treated as idle for the purposes of CPU selection for the waking task. This is only done when idle CPUs are the preferred targets for non-small task wakeups. When prefer_idle is 0, the CPU is left as non-idle in the selection logic so it is still a preferred candidate for the sync wakeup. Change-Id: I65c6535169293e8ba0c37fb5e88aec336338f7d7 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
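The rule above can be captured as a small predicate used during CPU selection. A simplified sketch with hypothetical names (the real logic lives inside the HMP placement code):

```c
#include <stdbool.h>

/* For placement purposes, a sync-waking cpu whose only runnable task is
 * the waker itself is treated as idle, since that task is about to sleep.
 * The override applies only when prefer_idle is set. */
static bool treat_as_idle(int nr_running, bool is_sync_waker_cpu,
                          bool prefer_idle)
{
    if (nr_running == 0)
        return true;                /* genuinely idle */
    return prefer_idle && is_sync_waker_cpu && nr_running == 1;
}
```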
* | | sched: extend sched_task_load tracepoint to indicate prefer_idleSyed Rameez Mustafa2016-03-23
| | | Prefer idle determines whether or not the scheduler prefers an idle CPU over a busy CPU when waking up a task. Knowing the correct value of this tunable is essential in understanding placement decisions made in select_best_cpu(). Change-Id: I955d7577061abccb65d01f560e1911d9db70298a Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: extend sched_task_load tracepoint to indicate sync wakeupSteve Muckle2016-03-23
| | | Sync wakeups provide a hint to the scheduler about upcoming task activity. Knowing which wakeups are sync wakeups from logs will assist in workload analysis. Change-Id: I6ffe73f2337e56b8234d4097069d5d70ab045eda Signed-off-by: Steve Muckle <smuckle@codeaurora.org>