path: root/kernel/sched (follow)
Commit message (Author, Age)
...
* | | sched: do not balance on exec if SCHED_HMP (Steve Muckle, 2016-03-23)
Rebalancing at exec time will currently undo any beneficial placement that has been done during fork time, since select_best_cpu() will not discount the currently running task. For now just skip re-evaluating task placement at exec.

Change-Id: I1e5e0fcc329b7b53c338c8c73795ebd5e85a118b
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* | | sched: Use historical load for freq governor input (Srivatsa Vaddagiri, 2016-03-23)
Historical load maintained per task can be used to influence cpu frequency better. For example, when a heavy demand task wakes up after prolonged sleep, we could use the historical load information to alert cpufreq governor about the need to raise cpu frequency.

This patch changes CPU busy statistics to be an aggregation of historical task demand. Also, a task's historical load (as defined by sysctl_sched_window_stats_policy) is added to the cpu's busy statistics (rq->curr_runnable_sum) whenever it executes on a cpu.

Change-Id: I2b66136f138b147ba19083b9b044c4feb20d9b57
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: window-stats: apply scaling to full elapsed windows (Srivatsa Vaddagiri, 2016-03-23)
In the event that a full window (or multiple full windows) has elapsed when updating a task's window-based stats, the runtime of those windows needs to be scaled based on the CPU frequency. This is currently missing, causing full windows to be accounted as having elapsed at maximum frequency, erroneously inflating task demand.

Change-Id: I356b4279d44d4f39c8aea881c04327b70ed66183
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
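The scaling described in this message can be illustrated with a minimal sketch. It assumes runtime is scaled by the ratio of the CPU's current frequency to the maximum possible frequency; the function and parameter names are hypothetical and are not taken from the patch.

```python
def scale_exec_time(delta_ns, cur_freq_khz, max_possible_freq_khz):
    """Scale raw runtime to the equivalent runtime at the reference (maximum) frequency."""
    return delta_ns * cur_freq_khz // max_possible_freq_khz


def account_full_windows(nr_full_windows, window_size_ns, cur_freq_khz, max_possible_freq_khz):
    """Return one scaled sample per fully elapsed window (hypothetical helper).

    Without scaling, each full window would be recorded as window_size_ns,
    i.e. as if the CPU had run at maximum frequency throughout.
    """
    scaled = scale_exec_time(window_size_ns, cur_freq_khz, max_possible_freq_khz)
    return [scaled] * nr_full_windows
```

Under these assumptions, two full 10 ms windows spent at half the maximum frequency contribute 5 ms each rather than 10 ms each.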
* | | sched: notify cpufreq on over/underprovisioned CPUs (Steve Muckle, 2016-03-23)
After a migration occurs, the source and destination CPUs may not be running at frequencies which match the new task load on those CPUs. Previously, the scheduler notified cpufreq any time a task greater than a certain size migrated. This is suboptimal, however, since it does not take into account the CPU's current frequency and other task activity that may be present.

Change-Id: I5092bda3a517e1343f97e5a455957c25ee19b549
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* | | sched: Introduce spill threshold tunables to manage overcommitment (Syed Rameez Mustafa, 2016-03-23)
When the number of tasks intended for a cluster exceeds the number of mostly idle CPUs in that cluster, the scheduler currently freely uses CPUs in other clusters if possible. While this is optimal for performance, the power trade-off can be quite significant. Introduce spill threshold tunables that govern the extent to which the scheduler should attempt to contain tasks within a cluster.

Change-Id: I797e6c6b2aa0c3a376dad93758abe1d587663624
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in nohz_kick_needed()]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: add affinity, task load information to sched tracepoints (Steve Muckle, 2016-03-23)
Knowing the affinity mask and CPU usage of a task is helpful in understanding the behavior of the system. Affinity information has been added to the enq_deq trace event, and the migration tracepoint now reports the load of the task migrated.

Change-Id: I29d8a610292b4dfeeb8fe16174e9d4dc196649b7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: add migration load change notifier for frequency guidance (Steve Muckle, 2016-03-23)
When a task moves between CPUs in two different frequency domains the cpufreq governor may wish to immediately modify the frequency of both the source and destination CPUs of the migrating task. A tunable is provided to establish what size task is considered "significant" enough to warrant notifying cpufreq.

Also fix a bug that would cause load to not be accounted properly during wakeup migrations.

Change-Id: Ie8f6b1cc4d43a602840dac18590b42a81327c95a
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
[rameezmustafa@codeaurora.org: Add double rq locking for set_task_cpu()]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched/fair: Limit MAX_PINNED_INTERVAL for more frequent load balancing (Syed Rameez Mustafa, 2016-03-23)
Should the system get stuck in a state where load balancing is failing due to all tasks being pinned, deferring load balancing for up to half a second may cause further performance problems. Eventually not all tasks will be pinned, and load balancing should not be deferred for a great length of time.

Change-Id: I06f93b5448353b5871645b9274ce4419dc9fae0f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched/fair: Help out higher capacity CPUs when they are overcommitted (Syed Rameez Mustafa, 2016-03-23)
If we have a task to schedule, we currently don't consider CPUs where it will not fit, even if they are idle. Instead we choose the previous CPU, which is sub-optimal for performance if an idle CPU is present. This change introduces tracking of any idle CPUs irrespective of whether the task fits on them or not. If we don't have a good place to put the task, prefer the lowest power idle CPU.

Change-Id: I4e8290639ad1602541a44a80ba4b2804068cac0f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
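The fallback described in this message might look like the following sketch. It covers only the case where no CPU fits the task; `select_fallback_cpu` and the `power_cost` mapping are hypothetical names introduced for illustration, not identifiers from the patch.

```python
def select_fallback_cpu(prev_cpu, idle_cpus, power_cost):
    """Pick a CPU when the task fits nowhere (hypothetical helper).

    idle_cpus: idle CPUs tracked regardless of whether the task fits on them.
    power_cost: assumed mapping of CPU id to estimated power cost.
    """
    if not idle_cpus:
        # No idle CPU was seen: stay on the previous CPU, as before the change.
        return prev_cpu
    # Prefer the lowest power idle CPU over the (busy) previous CPU.
    return min(idle_cpus, key=lambda cpu: power_cost[cpu])
```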
* | | sched/rt: Introduce power aware scheduling for real time tasks (Syed Rameez Mustafa, 2016-03-23)
Real Time task scheduling has historically been geared towards performance with a significant attempt to keep higher priority tasks on the same CPU. This is not optimal for power since the task CPU may not be the most power efficient CPU. Also task movement via select_lowest_rq() gives CPU priority the primary consideration before looking at CPU topologies to find a CPU closest to the task CPU in terms of topology. This again is not optimal for power since the closest CPU may be significantly worse for power than CPUs further away.

This patch removes any bias for the task CPU. When the lowest priority CPUs in the system are found we give no consideration to the CPU topology. Instead we find the lowest power CPU within local_cpu_mask. This takes care of select_task_rq_rt() and push_task(). The pull model remains unaffected since we have no room for power optimization there.

Change-Id: I4162ebe2f74be14240e62476f231f9e4a18bd9e8
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: s/__get_cpu_var/this_cpu_ptr/]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: balance power inefficient CPUs with one task (Steve Muckle, 2016-03-23)
Normally the load balancer does not pay attention to CPUs with one task since it is not possible to subdivide that load any further to achieve better balance. With power aware scheduling however it may be desirable to relocate that one task if the CPU it is currently executing on is less power efficient than other CPUs in the system.

Change-Id: Idf3f5e22b88048184323513f0052827b884526b6
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: group_capacity_factor changed to group_no_capacity. group_classify is called by update_sg_lb_stats() too.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: check for power inefficient task placement in tick (Steve Muckle, 2016-03-23)
Although tasks are routed to the most power-efficient CPUs during task wakeup, a CPU-bound task will not go through this decision point. Load balancing can help if it is modified to dislodge a single task from an inefficient CPU. The situation can be further improved if, during the tick, the task placement is checked to see if it is optimal.

This sort of checking is already being done to ensure proper task placement in heterogeneous CPU topologies, so checking for power efficient task placement fits pretty well.

Change-Id: I71e56d406d314702bc26dee1438c0eeda7699027
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: do nohz load balancing in order of power efficiency (Steve Muckle, 2016-03-23)
The nohz load balancer CPU does load balancing on behalf of all idle tickless CPUs. In the interest of power efficiency though, we should do load balancing on the most power efficient idle tickless CPU first, and then work our way towards the least power efficient idle tickless CPU. This will help load find its way to the most power efficient CPUs in the system.

Since it is unknown, when selecting the CPU to balance next, what task load would be pulled, a frequency must be assumed in order to do a comparison of CPU power consumption. The maximum frequency supported by all CPUs is used for this.

Change-Id: I96c7f4300fde2c677c068dc10fc0e57f763eb9b2
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in nohz_idle_balance().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
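The ordering described in this message can be sketched as a sort over the idle tickless CPUs. The sketch assumes a per-CPU power figure evaluated at one common frequency (the maximum frequency supported by all CPUs); the function name and the `power_at_common_freq` mapping are hypothetical.

```python
def nohz_balance_order(idle_tickless_cpus, power_at_common_freq):
    """Return idle tickless CPUs from most to least power efficient.

    power_at_common_freq: assumed mapping of CPU id to power drawn at the
    single reference frequency used for the comparison.
    """
    return sorted(idle_tickless_cpus, key=lambda cpu: power_at_common_freq[cpu])
```

Balancing in this order gives the most efficient idle CPU the first chance to pull load.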
* | | sched: run idle_balance() on most power-efficient CPU (Steve Muckle, 2016-03-23)
When a CPU goes idle, it checks to see whether it can pull any load from other busy CPUs. The CPU going idle may not be the most power-efficient idle CPU in the system however. This patch causes the CPU going idle to check to see whether there is a more power-efficient idle CPU within the same lowest sched domain. If there is, then it runs the load balancer on behalf of that CPU instead of itself.

Since it is unknown at this point what task load would be pulled, a frequency must be assumed for this in order to do a comparison of CPU power consumption. The maximum frequency supported by all CPUs is used for this.

Change-Id: I5eedddc1f7d10df58ecd358f37dba563eeecf4fc
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict around comment.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: add hook for platform-specific CPU power information (Steve Muckle, 2016-03-23)
To enable power-aware scheduling, provide a hook/infrastructure for platforms to communicate CPU power requirements for each supported CPU frequency. This information is then used to estimate the cost of running a task on a given CPU. Currently, an assumption is made that the task will be running by itself on the CPU. Given the current policy tries to spread tasks as much as possible this assumption should not be too far off.

Change-Id: I19f1fa760a0d43222d2880f8aec0508c468b39bb
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: return rq->capacity as power cost with sched_use_pelt=1. se.avg.runnable_avg_period is deprecated and power_cost() will be changed by subsequent change anyway.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: add power aware scheduling sysctl (Steve Muckle, 2016-03-23)
The sched_enable_power_aware sysctl will control whether or not scheduling decisions are influenced by the power consumption of individual CPUs.

Change-Id: I312f892cf76a3fccc4ecc8aa6703908b205267f0
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Extend update_task_ravg() to accept wallclock as argument (Srivatsa Vaddagiri, 2016-03-23)
This will make it easier to account interrupt time on a cpu, introduced in a subsequent patch.

Change-Id: I0e1fb5255c280ca374fd255e7fc19d5de9f8b045
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: add sched_get_busy, sched_set_window APIs (Srivatsa Vaddagiri, 2016-03-23)
sched_get_busy() returns the busy time of a cpu during the most recent completed window. sched_set_window() sets the window size and aligns windows across all CPUs.

Change-Id: Ic53e27f43fd4600109b7b6db979e1c52c7aca103
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in include/linux/sched.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: window-stats: adjust RQ curr, prev sums on task migration (Steve Muckle, 2016-03-23)
Adjust a cpu's busy time in its most recent and previous windows upon task migration. This enables the scheduler to provide better inputs to the cpufreq governor on a cpu's busy time in a given window.

Change-Id: Idec2ca459382e9f46d882da3af53148412d631c6
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
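A minimal model of this adjustment is sketched below. rq->curr_runnable_sum is named elsewhere in this log; `prev_runnable_sum` and the per-task `curr_window` and `prev_window` fields are assumed names used here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RunQueue:
    curr_runnable_sum: int = 0  # busy time in the current window
    prev_runnable_sum: int = 0  # busy time in the previous window (assumed name)


@dataclass
class Task:
    curr_window: int = 0  # task runtime in the current window (assumed name)
    prev_window: int = 0  # task runtime in the previous window (assumed name)


def migrate_window_stats(task, src, dst):
    """Move the task's contribution from the source to the destination run queue."""
    src.curr_runnable_sum -= task.curr_window
    src.prev_runnable_sum -= task.prev_window
    dst.curr_runnable_sum += task.curr_window
    dst.prev_runnable_sum += task.prev_window
```

The total busy time across both run queues is unchanged by the move; only its attribution shifts to the CPU that will run the task.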
* | | sched: window-stats: Add aggregated runqueue windowed stats (Steve Muckle, 2016-03-23)
Add per-cpu counters to track a cpu's busy time in the latest window and the one previous to that. These are needed to track accurate per-cpu busy time that accounts for migrations. Basically, once a task migrates, its execution time in the current window is migrated as well to the new cpu.

The idle task's runtime is not accounted since it should not count towards runqueue busy time.

Change-Id: I4014dd686f95dbbfaa4274269bc36ed716573421
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: window-stats: add prev_window counter per-task (Srivatsa Vaddagiri, 2016-03-23)
Currently windows where tasks had no execution time are ignored. However accurate accounting of cpu busy time that factors in migration would need to know actual utilization of a task in the window previous to the latest one. This would help scheduler guide cpufreq governor on busy time per-cpu that is not subject to migration induced errors.

Change-Id: I5841b1732c83e83d69002139de3bdb93333ce347
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: window-stats: synchronize windows across cpus (Srivatsa Vaddagiri, 2016-03-23)
Synchronizing windows across cpus for task load measurements simplifies cpu busy time accounting during migrations. For task migrations, a task's usage in the current window can be carried over to its new cpu. This lets the cpufreq governor see a correct picture of cpu busy time that is not affected by migrations.

This patch lines up windows across cpus. One of the cpus, sync_cpu, serves as a reference for all others. During bootup sync_cpu would initialize its window_start (from its sched_clock()). Other cpus will synchronize their window_start in reference to sync_cpu.

This patch assumes synchronous sched_clock() across cpus and may need some change to address architectures which do not provide such synchronized sched_clock().

Change-Id: I13381389a72f5f9f85cc2446401d493a55c78ab7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
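One way to picture the alignment is the sketch below, which derives another cpu's window_start from sync_cpu's window_start. The rounding rule and the function name are assumptions made for illustration; the message itself only states that other cpus synchronize in reference to sync_cpu and that sched_clock() is assumed synchronous.

```python
def synced_window_start(sync_window_start_ns, now_ns, window_size_ns):
    """Return the start of the window containing now_ns, aligned to sync_cpu.

    Assumes now_ns >= sync_window_start_ns and a sched_clock() that reads
    the same on every cpu.
    """
    elapsed_windows = (now_ns - sync_window_start_ns) // window_size_ns
    return sync_window_start_ns + elapsed_windows * window_size_ns
```

With this rule, two cpus that read the clock at different points inside the same window arrive at the same window_start.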
* | | sched: window-stats: Do not account wait time (Srivatsa Vaddagiri, 2016-03-23)
Task load statistics are used for two purposes: cpu frequency management and placement. A task's load can't be accurately judged by its wait time. For example, a task could have waited for 10ms and, when given the opportunity to run, could just execute for 1ms. Accounting for 11ms as the task's demand could be over-stating its needs in this example. For now, remove wait time from task demand and instead let task load be derived from its actual exec time. This may need to become a tunable feature.

Change-Id: I47e94c444c6b44c3b0810347287d50f1ee685038
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
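The 10ms/1ms example from this message works out as follows. The function and its `account_wait_time` flag are hypothetical; the flag only stands in for the "tunable feature" the message says may be needed later.

```python
def task_demand_ms(wait_ms, exec_ms, account_wait_time=False):
    """Return task demand in ms, optionally including time spent waiting to run."""
    return exec_ms + (wait_ms if account_wait_time else 0)


# Old behaviour: wait time inflates demand to 11 ms.
demand_with_wait = task_demand_ms(wait_ms=10, exec_ms=1, account_wait_time=True)
# Behaviour after this change: demand reflects the 1 ms actually executed.
demand_exec_only = task_demand_ms(wait_ms=10, exec_ms=1)
```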
* | | sched: window-stats: update during migration and earlier at wakeup (Srivatsa Vaddagiri, 2016-03-23)
During migrations accounting needs to be done in set_task_cpu() to subtract the task activity from the source CPU and add it to the destination CPU. This accounting will require that the task's window based load statistics be up to date.

Unfortunately, the window-based statistics cannot always be updated in set_task_cpu() because they are already being updated in the wakeup path. We cannot update the statistics solely in the wakeup path because not all wakeups are migrations. Those non-migrating wakeups will not enter set_task_cpu().

To ensure the window-based stats are always updated for both wakeup migrations and regular migrations, they are updated earlier in the wakeup path, and also updated in set_task_cpu() if the task is already runnable (this ensures it is not a wakeup migration, but a regular migration).

Change-Id: Ib246028741d0be9bb38ce93679d6e6ba25b10756
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in fair.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: move definition of update_task_ravg() (Srivatsa Vaddagiri, 2016-03-23)
set_task_cpu() will need to call update_task_ravg(). Move the definition up to make that easy.

Change-Id: I95c1c9e009bd1805f28708e8d6fd3b7b2166410e
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Switch to windows based load stats by default (Srivatsa Vaddagiri, 2016-03-23)
Set window-based load stats to be the default mechanism under which tasks get classified (as big/small) and which will drive frequency demand for tasks. The sched_ravg_window kernel parameter can be used to change this default setting to use the PELT (per-entity load tracking) scheme instead.

Change-Id: I626110daa0bb2b53172bedea829d31877255ceaa
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Provide tunable to switch between PELT and window-based stats (Srivatsa Vaddagiri, 2016-03-23)
Provide a runtime tunable to switch between using PELT-based load stats and window-based load stats. This will be needed for runtime analysis of the two load tracking schemes.

Change-Id: I018f6a90b49844bf2c4e5666912621d87acc7217
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Provide scaled load information for tasks in /proc (Srivatsa Vaddagiri, 2016-03-23)
Extend the "sched" file in /proc for every task to provide information on scaled load statistics and percentage-scaled based load (load_avg) for a task. This will be a valuable debug aid.

Change-Id: I6ee0394b409c77c7f79f5b9ac560da03dc879758
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Add additional ftrace events (Srivatsa Vaddagiri, 2016-03-23)
This patch adds two ftrace events:

sched_task_load -> records information of a task, such as scaled demand
sched_cpu_load -> records information of a cpu, such as nr_running, nr_big_tasks etc

This will be useful to debug HMP related task placement decisions by scheduler.

Change-Id: If91587149bcd9bed157b5d2bfdecc3c3bf6652ff
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Extend /proc/sched_debug with additional information (Srivatsa Vaddagiri, 2016-03-23)
Provide additional information in /proc/sched_debug for every cpu. This will be a valuable debug aid.

Change-Id: If22ee530e880cd21719242be7bc2c41308ad4186
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Tighten controls for tasks spillover to idle cluster (Srivatsa Vaddagiri, 2016-03-23)
Several conditions can cause an idle cluster to pick up load from a busy cluster. One such condition is when the busy cluster has a number of tasks that exceeds its capacity (or number of cpus). This patch extends that condition to consider small and big tasks on a cluster. Too many "small" tasks should not cause them to spill over to another idle cluster. Likewise, the presence of big tasks should be considered by a cluster when picking up load from another cluster with lower capacity.

Change-Id: I0545bf2989c37217d84ed18756c6f5c8946d5ae5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in fair.c.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Track number of big and small tasks on a cpu (Srivatsa Vaddagiri, 2016-03-23)
This patch adds 'nr_big_tasks' and 'nr_small_tasks' per-cpu counters that track the number of big and small tasks on a cpu respectively. This will be used in load balance decisions introduced in a subsequent patch.

Change-Id: Ia174904140f81dd6d1946286889a50be3f16ea83
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fix conflicts in fair.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Handle cpu-bound tasks stuck on wrong cpu (Srivatsa Vaddagiri, 2016-03-23)
CPU-bound tasks that don't sleep for long intervals can stay stuck on the wrong cpu, as the selection of "ideal" cpu for tasks largely happens during task wakeup time. This patch adds a check in the scheduler tick for task/cpu mismatch (big task on little cpu OR little task on big cpu) and forces migration of such tasks to their ideal cpu (via select_best_cpu()).

Change-Id: Icac3485b6aa4b558c4ed9df23c2e81fb8f4bb9d9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Extend active balance to accept 'push_task' argument (Srivatsa Vaddagiri, 2016-03-23)
Active balance currently picks one task to migrate from busy cpu to a chosen cpu (push_cpu). This patch extends active load balance to recognize a particular task ('push_task') that needs to be migrated to 'push_cpu'. This capability will be leveraged by HMP-aware task placement in a subsequent patch.

Change-Id: If31320111e6cc7044e617b5c3fd6d8e0c0e16952
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Send NOHZ kick to idle cpu in same cluster (Srivatsa Vaddagiri, 2016-03-23)
A busy cpu will kick (via IPI) one of the idle cpus in tickless state to run load balance and help move tasks off itself. The cpu chosen to receive kick is simply the "first" idle cpu found in nohz.idle_cpus_mask. This could cause unnecessary wakeups of a cluster. A better choice would be to look for an idle cpu that is in the same cluster as busy cpu, which should minimize cluster wakeups.

Change-Id: Ia63038d7c34b416b53c8feef3c3b31dab5200e42
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict about return value.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
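The selection change can be modelled as below. The message does not say what happens when no idle cpu shares the busy cpu's cluster, so returning `None` in that case is purely an assumption; `find_nohz_kick_target` and `cluster_of` are hypothetical names.

```python
def find_nohz_kick_target(busy_cpu, idle_cpus, cluster_of):
    """Return the first idle tickless cpu in the busy cpu's cluster, or None.

    idle_cpus: idle tickless cpus in mask order (models nohz.idle_cpus_mask).
    cluster_of: assumed list mapping cpu id to cluster id.
    """
    for cpu in idle_cpus:
        if cluster_of[cpu] == cluster_of[busy_cpu]:
            return cpu
    return None  # assumption: no kick target outside the busy cpu's cluster
```

Picking the first idle cpu unconditionally would return cpu 0 for `idle_cpus = [0, 2]` even when the busy cpu sits in the other cluster, which is the unnecessary cluster wakeup the message describes.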
* | | sched: Basic task placement support for HMP systems (Srivatsa Vaddagiri, 2016-03-23)
HMP systems have cpus with different power and performance characteristics. Some cpus could offer better power at cost of lower performance while other cpus could offer better performance at cost of higher power. As a result, bandwidth consumed by a task to do some "fixed" amount of work could vary across cpus.

Optimal task placement on HMP would involve placing a task on a cpu where it can meet its performance goals at lowest power cost. Since kernel has little to no awareness of performance goals of applications, we guesstimate whether task is meeting its performance goals or not by looking at its cpu bandwidth consumption. High bandwidth consumption could imply that task's performance can improve by running on cpus with better capacity/performance characteristics.

This patch makes the basic changes to support HMP. It provides a configurable threshold and any task consuming bandwidth in excess of threshold will be placed on a cpu with better capacity.

Change-Id: I3fd98edd430f73342fbef06411e8b2d1cf2f56fa
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict about members of p->se which are not available anymore.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
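The threshold rule in the last paragraph of this message can be sketched as follows. Expressing bandwidth as a percentage, and choosing the smallest capacity step up rather than the largest, are both assumptions made here; the message only says the task is placed on a cpu with better capacity. All names are hypothetical.

```python
def place_task(task_bandwidth_pct, prev_cpu, capacities, threshold_pct):
    """Return the cpu for a task under a simple HMP up-migration rule.

    capacities: assumed list of per-cpu capacity values indexed by cpu id.
    """
    if task_bandwidth_pct <= threshold_pct:
        return prev_cpu  # task is not over the threshold: leave it in place
    better = [cpu for cpu, cap in enumerate(capacities) if cap > capacities[prev_cpu]]
    if not better:
        return prev_cpu  # already on a cpu with the highest capacity
    # Assumption: take the smallest capacity step up to limit power cost.
    return min(better, key=lambda cpu: capacities[cpu])
```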
* | | sched: Use rq->efficiency in scaling load stats (Srivatsa Vaddagiri, 2016-03-23)
Extend task load scaling function to account for cpu efficiency factor. Task load is scaled in reference to "most" efficient cpu.

Change-Id: I7bf829211a6e1293076e8ba0f93b4f6abcf20d92
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Introduce efficiency, load_scale_factor and capacity (Srivatsa Vaddagiri, 2016-03-23)
Efficiency reflects instructions per cycle capability of a cpu.

load_scale_factor reflects magnification factor that is applied for task load when estimating bandwidth it will consume on a cpu. It accounts for the fact that task load is scaled in reference to "best" cpu that has best efficiency factor and also best possible max_freq. Note that there may be no single CPU in the system that has both the best efficiency and best possible max_freq, but that is still the combination that all task load in the system is scaled against.

capacity reflects max_freq and efficiency metric of a cpu. It is defined such that the "least" performing cpu (one with lowest efficiency factor and max_freq) gets capacity of 1024. Again, there may not be a CPU in the system that has both the lowest efficiency and lowest max_freq. This is still the combination that is assigned a capacity of 1024; other CPU capacities are relative to this.

Change-Id: I4a853f1f0f90020721d2a4ee8b10db3d226b287c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
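The two definitions can be made concrete with a small sketch. The exact formulas are an assumption consistent with the description above (capacity relative to the lowest efficiency and lowest max_freq, load_scale_factor relative to the best efficiency and best possible max_freq, both normalised to 1024); the function names and sample values are illustrative.

```python
def capacity(efficiency, max_freq, min_efficiency, min_max_freq):
    """Capacity relative to the "least" performing combination, which maps to 1024."""
    return 1024 * efficiency * max_freq // (min_efficiency * min_max_freq)


def load_scale_factor(efficiency, max_freq, max_efficiency, max_possible_freq):
    """Magnification applied to task load on this cpu; 1024 on the "best" combination."""
    return 1024 * max_efficiency * max_possible_freq // (efficiency * max_freq)


# Hypothetical two-cluster system: (efficiency, max_freq in kHz).
little = (1024, 1_000_000)
big = (2048, 2_000_000)
```

With these sample values, `little` gets a capacity of 1024 and `big` 4096, while a task's load is magnified fourfold (4096) when estimating its bandwidth on `little` and left unchanged (1024) on `big`.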
* | | sched: Add CONFIG_SCHED_HMP Kconfig option (Srivatsa Vaddagiri, 2016-03-23)
Add a compile-time flag to enable or disable scheduler features for HMP (heterogeneous multi-processor) systems. Main feature deals with optimizing task placement for best power/performance tradeoff.

Also extend features currently dependent on CONFIG_SCHED_FREQ_INPUT to be enabled for CONFIG_HMP as well.

Change-Id: I03b3942709a80cc19f7b934a8089e1d84c14d72d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor ifdefry conflict.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Add scaled task load statistics (Srivatsa Vaddagiri, 2016-03-23)
Scheduler guided frequency selection as well as task placement on heterogeneous systems require scaled task load statistics. This patch adds a 'runnable_avg_sum_scaled' metric per task that is a scaled derivative of 'runnable_avg_sum'. Load is scaled in reference to "best" cpu, i.e. one with best possible max_freq.

Change-Id: Ie8ae450d0b02753e9927fb769aee734c6d33190f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: incorporated with change 9d89c257df ("sched/fair: Rewrite runnable load and utilization average tracking"). Used container_of() to get sched_entity.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Introduce CONFIG_SCHED_FREQ_INPUTSrivatsa Vaddagiri2016-03-23
| | | Introduce a compile-time flag to enable scheduler guidance of frequency selection. This flag is also used to turn the window-based load stats feature on or off.
| | |
| | | Having a compile-time flag will let some platforms avoid any overhead that may be present with this scheduler feature.
| | |
| | | Change-Id: Id8dec9839f90dcac82f58ef7e2bd0ccd0b6bd16c
| | | Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
| | | [rameezmustafa@codeaurora.org: Port to msm-3.18]
| | | Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
| | | [joonwoop@codeaurora.org: fixed minor conflict around sysctl_timer_migration.]
| | | Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: window-based load stats improvementsSrivatsa Vaddagiri2016-03-23
| | | The following cleanups and improvements are made to the window-based load stats feature:
| | |
| | | * Add sysctl to pick max, avg or most recent samples as task's demand.
| | | * Fix overflow possibility in calculation of sum for average policy.
| | | * Use unscaled statistics when a task is running on a CPU which is thermally throttled.
| | |
| | | Change-Id: I8293565ca0c2a785dadf8adb6c67f579a445ed29
| | | Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Add min_max_freq and rq->max_possible_freqSrivatsa Vaddagiri2016-03-23
| | | rq->max_possible_freq represents the maximum frequency a cpu is capable of attaining, while rq->max_freq represents the maximum frequency a cpu can attain at a given instant. rq->max_freq includes constraints imposed by the user or the thermal driver, so rq->max_freq <= rq->max_possible_freq.
| | |
| | | max_possible_freq is derived as max(rq->max_possible_freq) and represents the "best" cpu, the one that can attain the best possible frequency. min_max_freq is derived as min(rq->max_possible_freq). For homogeneous systems, max_possible_freq and min_max_freq will be the same, while they could be different on heterogeneous systems.
| | |
| | | Change-Id: Iec485fde35cfd33f55ebf2c2dce4864faa2083c5
| | | Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
| | | [joonwoop@codeaurora.org: fixed conflict around max_possible_freq.]
| | | Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
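The derivation of the two system-wide values described in the commit above can be sketched as a single pass over the per-cpu values. The `struct rq_freq` type and the `sketch_freq_bounds()` helper are hypothetical names used only for illustration; only the field names and the max()/min() derivation come from the commit text.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the two per-runqueue fields in the commit. */
struct rq_freq {
    unsigned int max_freq;          /* attainable now (user/thermal limits) */
    unsigned int max_possible_freq; /* attainable by the hardware at best */
};

/*
 * Derive the system-wide values:
 *   max_possible_freq = max over cpus of rq->max_possible_freq
 *   min_max_freq      = min over cpus of rq->max_possible_freq
 */
void sketch_freq_bounds(const struct rq_freq *rqs, size_t nr_cpus,
                        unsigned int *max_possible_freq,
                        unsigned int *min_max_freq)
{
    size_t i;

    assert(nr_cpus > 0);
    *max_possible_freq = 0;
    *min_max_freq = UINT_MAX;

    for (i = 0; i < nr_cpus; i++) {
        /* Invariant stated in the commit message. */
        assert(rqs[i].max_freq <= rqs[i].max_possible_freq);

        if (rqs[i].max_possible_freq > *max_possible_freq)
            *max_possible_freq = rqs[i].max_possible_freq;
        if (rqs[i].max_possible_freq < *min_max_freq)
            *min_max_freq = rqs[i].max_possible_freq;
    }
}
```

On a homogeneous system every cpu reports the same rq->max_possible_freq, so both outputs are equal. On a heterogeneous system they differ whenever the clusters top out at different frequencies.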
* | | sched: move task load based functionsSteve Muckle2016-03-23
| | | The task load based functions will need to make use of LOAD_AVG_MAX in a subsequent patch, so move them below the definition of that macro.
| | |
| | | Change-Id: I02f18ba069b81033e611f8f8bba6dccd7cd81252
| | | Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* | | sched: fix race between try_to_wake_up() and move_task()Steve Muckle2016-03-23
| | | Until a task's state has been seen as interruptible/uninterruptible and it is no longer on_cpu, it is possible that the task may move to another CPU (load balancing may cause this). Here is an example where the race condition results in incorrect operation:
| | |
| | | - cpu 0 calls put_prev_task on task A, task A's state is TASK_RUNNING
| | | - cpu 0 runs task B, which attempts to wake up A
| | | - cpu 0 begins try_to_wake_up(), recording src_cpu for task A as cpu 0
| | | - cpu 1 then pulls task A (perhaps due to idle balance)
| | | - cpu 1 runs task A, which then sleeps, becoming INTERRUPTIBLE
| | | - cpu 0 continues in try_to_wake_up(), thinking task A's previous cpu is 0, where it is actually 1
| | | - if select_task_rq returns cpu 0, task A will be woken up on cpu 0 without properly updating its cpu to 0 in set_task_cpu()
| | |
| | | CRs-Fixed: 665958
| | | Change-Id: Icee004cb320bd8edfc772d9f74e670a9d4978a99
| | | Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* | | sched: Skip load update for idle taskSrivatsa Vaddagiri2016-03-23
| | | Load statistics for idle tasks are not useful in any manner. Skip the load update for such idle tasks.
| | |
| | | CRs-Fixed: 665706
| | | Change-Id: If3a908bad7fbb42dcb3d0a1d073a3750cf32fcf9
| | | Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Window-based load stat improvementsSrivatsa Vaddagiri2016-03-23
| | | Some tasks can have a sporadic load pattern such that they suddenly start running for longer intervals of time after running for shorter durations. To recognize such a sharp increase in a task's demand, the maximum of the average of 5 window load samples and the most recent sample is chosen as the task demand.
| | |
| | | Make the window size (sched_ravg_window) configurable at boot time. To prevent users from setting inappropriate values for the window size, min and max limits are defined. As the 'ravg' struct tracks load for both real-time and non-real-time tasks, it is moved out of the sched_entity struct.
| | |
| | | In order to avoid changing the function signatures of move_tasks() and move_one_task(), per-cpu variables are defined to track the total load moved. In case multiple tasks are selected to migrate in one load balance operation, loads > 100 could be sent through migration notifiers. Prevent this scenario by setting mnd.load to 100 in such cases.
| | |
| | | Define wrapper functions to compute cpu demands for tasks and to change rq->cumulative_runnable_avg.
| | |
| | | Change-Id: I9abfbf3b5fe23ae615a6acd3db9580cfdeb515b4
| | | Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
| | | Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
| | | [rameezmustafa@codeaurora.org: Port to msm-3.18 and squash "dcf7256 sched: window-stats: Fix overflow bug" into this patch.]
| | | Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
| | | [joonwoop@codeaurora.org: fixed conflict in __migrate_task().]
| | | Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
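The demand rule described in the commit above (the larger of the 5-sample average and the most recent sample) can be sketched as follows. The history layout with the newest sample at index 0, the `SKETCH_HIST_SIZE` macro and the function name are assumptions made for illustration, not details taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HIST_SIZE 5 /* number of window samples, per the commit text */

/*
 * hist[0] is assumed to hold the most recent window sample.
 * Returns max(average of all samples, most recent sample), so a sudden
 * jump in the latest window is reflected immediately instead of being
 * diluted by older, lighter windows.
 */
uint32_t sketch_task_demand(const uint32_t hist[SKETCH_HIST_SIZE])
{
    uint64_t sum = 0; /* 64-bit so summing 32-bit samples cannot overflow */
    uint32_t avg;
    int i;

    assert(hist != 0);
    for (i = 0; i < SKETCH_HIST_SIZE; i++)
        sum += hist[i];
    avg = (uint32_t)(sum / SKETCH_HIST_SIZE);

    return avg > hist[0] ? avg : hist[0];
}
```

With samples {10, 2, 2, 2, 4} the average is 4 but the latest window is 10, so the demand is 10. With {1, 9, 9, 9, 12} the average of 8 wins, which keeps a single light window from dropping the demand abruptly.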
* | | sched: Call the notify_on_migrate notifier chain for wakeups as wellRohit Gupta2016-03-23
| | | Add a change to send notify_on_migrate hints on wakeups of foreground tasks from the scheduler if their load is above sched_wakeup_load_threshold (default value is 60). These hints can be used to choose an appropriate CPU frequency corresponding to the load of the task being woken up.
| | |
| | | By default sched_wakeup_load_threshold is set to 60, and therefore wakeup hints are sent out for those tasks whose loads are higher than that value. This might cause unnecessary wakeup boosts to happen when load based syncing is turned ON for cpu-boost. Disable the wakeup hints by setting sched_wakeup_load_threshold to a value higher than 100 so that wakeup boost doesn't happen unless it is explicitly turned ON from adb shell.
| | |
| | | Change-Id: Ieca413c1a8bd2b14a15a7591e8e15d22925c42ca
| | | Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
| | | [rameezmustafa@codeaurora.org: Squash "a26fcce sched: Disable wakeup hints for foreground tasks by default" into this patch and update commit text.]
| | | Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
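The threshold check described in the commit above amounts to a strict comparison, which is why a threshold above 100 disables the hints. A minimal sketch, assuming task load is expressed as a percentage in the range 0..100; the function name is hypothetical and the value 110 in the usage note is just an example of "higher than 100".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a wakeup hint should be sent for a task.
 * load_pct is assumed to be a percentage in the range 0..100.
 * A hint is sent only when the load is strictly above the threshold,
 * so any threshold above 100 can never be exceeded.
 */
bool sketch_should_send_wakeup_hint(unsigned int load_pct,
                                    unsigned int wakeup_load_threshold)
{
    assert(load_pct <= 100);
    return load_pct > wakeup_load_threshold;
}
```

With a threshold of 60, a task at 70% load triggers a hint and a task at exactly 60% does not. With a threshold of, say, 110, even a task at 100% load does not trigger one.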
* | | cpufreq: cpu-boost: Introduce scheduler assisted load based syncsRohit Gupta2016-03-23
| | | Previously, on getting a migration notification, cpu-boost changed the scaling min of the destination frequency to match that of the source frequency or sync_threshold, whichever was lower. If the scheduler migration notification is extended with task load (cpu demand) information, the cpu boost driver can use this load to compute a suitable frequency for the migrating task. The required frequency for the task is calculated by taking the load percentage of the max frequency, and no sync is performed if the load is less than a particular value (migration_load_threshold). This change is beneficial for both perf and power, as the demand of a task is taken into consideration while making cpufreq decisions and unnecessary syncs for lightweight tasks are avoided.
| | |
| | | The task load information provided by the scheduler comes from a window-based load collection mechanism which also normalizes the load collected by the scheduler to the max possible frequency across all CPUs.
| | |
| | | Change-Id: Id2ba91cc4139c90602557f9b3801fb06b3c38992
| | | Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
| | | [rameezmustafa@codeaurora.org: Port to msm-3.18]
| | | Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
| | | [joonwoop@codeaurora.org: fixed conflict in __migrate_task().]
| | | Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
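The frequency computation described in the commit above (load percentage of the max frequency, with no sync below migration_load_threshold) can be sketched as follows. The function name, the use of a return value of 0 to mean "no sync", and the clamping of loads above 100 are assumptions made for this illustration, not details of the cpu-boost driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the frequency (in kHz) to request for a migrating task.
 * Returns 0 when the load is below the threshold, meaning no sync
 * should be performed for this lightweight task (assumed convention).
 */
unsigned int sketch_sync_freq_khz(unsigned int load_pct,
                                  unsigned int max_freq_khz,
                                  unsigned int migration_load_threshold)
{
    assert(max_freq_khz > 0);

    if (load_pct < migration_load_threshold)
        return 0;

    /* Assumption: treat anything above 100% as a full-load request. */
    if (load_pct > 100)
        load_pct = 100;

    /* 64-bit intermediate avoids overflow of max_freq_khz * load_pct. */
    return (unsigned int)((uint64_t)max_freq_khz * load_pct / 100);
}
```

For a max frequency of 2,000,000 kHz and a threshold of 30, a task with 50% load yields a 1,000,000 kHz request, while a task with 20% load yields no sync at all.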
* | | sched: window-based load stats for tasksSrivatsa Vaddagiri2016-03-23
| | | Provide a metric per task that specifies how cpu bound a task is. Task execution is monitored over several time windows, and the fraction of the window for which the task was found to be executing or wanting to run is recorded as the task's demand. Windows over which the task was sleeping are ignored. We track the 5 most recent windows for every task, and the maximum demand seen in any of those 5 windows (where the task had some activity) drives the freq demand for every task.
| | |
| | | A per-cpu metric (rq->cumulative_runnable_avg) is also provided, which is an aggregation of the cpu demand of all tasks currently enqueued on it. rq->cumulative_runnable_avg will be useful to know if the cpu frequency will need to be changed to match task demand.
| | |
| | | Change-Id: Ib83207b9ba8683cd3304ee8a2290695c34f08fe2
| | | Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
| | | [rameezmustafa@codeaurora.org: Port to msm-3.18]
| | | Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
| | | [joonwoop@codeaurora.org: fixed conflict in ttwu_do_wakeup() to incorporate with changed trace_sched_wakeup() location.]
| | | Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
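The window bookkeeping described in the commit above can be sketched as a small history buffer: windows in which the task slept are skipped, the 5 most recent active windows are kept, and the maximum drives the demand. The struct, the function names and the choice of nanoseconds of busy time as the sample unit are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_WINDOWS 5 /* number of recent active windows tracked */

/* Hypothetical per-task history; hist[0] is the newest active window. */
struct sketch_ravg {
    uint32_t hist[SKETCH_WINDOWS];
};

/*
 * Record the time the task spent running or waiting to run in the
 * window that just ended. A window with no activity is ignored, so
 * sleeping does not push older samples out of the history.
 */
void sketch_record_window(struct sketch_ravg *r, uint32_t busy_ns)
{
    int i;

    assert(r != 0);
    if (busy_ns == 0)
        return;

    for (i = SKETCH_WINDOWS - 1; i > 0; i--)
        r->hist[i] = r->hist[i - 1];
    r->hist[0] = busy_ns;
}

/* Demand is the maximum sample among the tracked windows. */
uint32_t sketch_window_demand(const struct sketch_ravg *r)
{
    uint32_t max = 0;
    int i;

    assert(r != 0);
    for (i = 0; i < SKETCH_WINDOWS; i++)
        if (r->hist[i] > max)
            max = r->hist[i];
    return max;
}
```

A heavy window keeps driving the demand until five newer active windows have displaced it, however long the task sleeps in between. rq->cumulative_runnable_avg would then be the sum of this demand over the tasks enqueued on a cpu.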