path: root/kernel/sched
...
* sched: add sync wakeup recognition in select_best_cpu (Steve Muckle, 2016-03-23)

  If a wakeup is a sync wakeup, we need to discount the currently running task's load from the waker's CPU as we calculate the best CPU for the waking task to land on.

  Change-Id: I00c5df626d17868323d60fb90b4513c0dd314825
  Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
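The discount described above can be sketched in userspace Python (the function and parameter names are illustrative, not the kernel's):

```python
def effective_cpu_load(cpu_load, curr_task_load, sync):
    """For a sync wakeup the waker is about to sleep, so its currently
    running task's load should not count against its CPU as a target."""
    if sync:
        return max(cpu_load - curr_task_load, 0)
    return cpu_load
```

With sync set, a waker CPU carrying a heavy but soon-to-sleep task compares as lightly loaded as it will actually be once the wakee lands.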
* sched: Provide knob to prefer mostly_idle over idle cpus (Srivatsa Vaddagiri, 2016-03-23)

  sysctl_sched_prefer_idle lets the scheduler bias selection of idle cpus over mostly idle cpus for tasks. This knob could be useful to control the balance between power and performance.

  Change-Id: Ide6eef684ef94ac8b9927f53c220ccf94976fe67
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: make sched_cpu_high_irqload a runtime tunable (Steve Muckle, 2016-03-23)

  It may be desirable to be able to alter the sched_cpu_high_irqload setting easily, so make it a runtime tunable value.

  Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d
  Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* sched: trace: extend sched_cpu_load to print irqload (Steve Muckle, 2016-03-23)

  The irqload is used in determining whether CPUs are mostly idle, so it is useful to know this value while viewing scheduler traces.

  Change-Id: Icbb74fc1285be878f254ae54886bdb161b14a270
  Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* sched: avoid CPUs with high irq activity (Steve Muckle, 2016-03-23)

  CPUs with significant IRQ activity will not be able to serve tasks quickly. Avoid them if possible by disqualifying such CPUs from being recognized as mostly idle.

  Change-Id: I2c09272a4f259f0283b272455147d288fce11982
  Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* sched: refresh sched_clock() after acquiring rq lock in irq path (Steve Muckle, 2016-03-23)

  The wallclock time passed to sched_account_irqtime() may be stale after we wait to acquire the runqueue lock. This could cause problems in update_task_ravg because a different CPU may have advanced this CPU's window_start based on a more up-to-date wallclock value, triggering a BUG_ON(window_start > wallclock).

  Change-Id: I316af62d1716e9b59c4a2898a2d9b44d6c7a75d8
  Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* sched: track soft/hard irqload per-RQ with decaying avg (Steve Muckle, 2016-03-23)

  The scheduler currently ignores irq activity when deciding which CPUs to place tasks on. If a CPU is getting hammered with IRQ activity but has no tasks it will look attractive to the scheduler as it will not be in a low power mode.

  Track irqload with a decaying average. This quantity can be used in the task placement logic to avoid CPUs which are under high irqload. The decay factor is 3/4. Note that with this algorithm the tracked irqload quantity will be higher than the actual irq time observed in any single window. Some sample outcomes with steady irqloads per 10ms window and the 3/4 decay factor (irqload of 10 is used as a threshold in a subsequent patch):

    irqload per window    load value asymptote    # windows to > 10
    2ms                   8                       n/a
    3ms                   12                      7
    4ms                   16                      4
    5ms                   20                      3

  Of course irqload will not be constant in each window; these are just given as simple examples.

  Change-Id: I9dba049f5dfdcecc04339f727c8dd4ff554e01a5
  Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
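The decay rule and the table above can be reproduced with a small sketch (Python; function names are illustrative). The asymptote for a steady per-window irqload x is x / (1 - 3/4) = 4x, which matches the table's second column:

```python
DECAY = 0.75  # decay factor 3/4 per window

def update_irqload(avg, irq_this_window):
    # Each window: new_avg = irq_observed + (3/4) * old_avg.
    return irq_this_window + DECAY * avg

def windows_until_above(irq_per_window, threshold, max_windows=100):
    # Number of windows of steady irqload until the tracked value crosses
    # the threshold; None if its asymptote (4 * irq) never reaches it.
    avg = 0.0
    for n in range(1, max_windows + 1):
        avg = update_irqload(avg, irq_per_window)
        if avg > threshold:
            return n
    return None
```

Running this for 3ms, 4ms and 5ms per window against a threshold of 10 reproduces the 7, 4 and 3 window counts in the table, and 2ms (asymptote 8) never crosses.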
* sched: do not set window until sched_clock is fully initialized (Steve Muckle, 2016-03-23)

  The system initially uses a jiffy-based sched clock. When the platform registers a new timer for sched_clock, sched_clock can jump backwards. Once sched_clock_postinit() runs it should be safe to rely on it.

  Also, sched_clock_cpu() relies on completion of sched_clock_init() and until that happens sched_clock_cpu() returns zero. This is used in the irq accounting path which window-based stats relies upon. So do not set window_start until sched_clock_cpu() is working.

  Change-Id: Ided349de8f8554f80a027ace0f63ea52b1c38c68
  Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* sched: Make RT tasks eligible for boost (Syed Rameez Mustafa, 2016-03-23)

  During sched boost RT tasks currently end up going to the lowest power cluster. This can be a performance bottleneck especially if the frequency and IPC differences between clusters are high. Furthermore, when RT tasks go over to the little cluster during boost, the load balancer keeps attempting to pull work over to the big cluster. This results in pre-emption of the executing RT task causing more delays. Finally, containing more work on a single cluster during boost might help save some power if the little cluster can then enter deeper low power modes.

  Change-Id: I177b2e81be5657c23e7ac43889472561ce9993a9
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: Limit LBF_PWR_ACTIVE_BALANCE to within cluster (Srivatsa Vaddagiri, 2016-03-23)

  When the higher power (performance) cluster has only one online cpu, we currently let an idle cpu in the lower power cluster pull a running task from the performance cluster via active balance. Active balance for power-aware reasons is supposed to be restricted to balancing within a cluster, but the check for this is not correctly implemented.

  Change-Id: I5fba7f01ad80c082a9b27e89b7f6b17a6d9cde14
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: Packing support until a frequency threshold (Srivatsa Vaddagiri, 2016-03-23)

  Add another dimension for task packing based on frequency. This patch adds a per-cpu tunable, rq->mostly_idle_freq, which when set will result in tasks being packed on a single cpu in a cluster as long as the cluster frequency is less than the set threshold.

  Change-Id: I318e9af6c8788ddf5dfcda407d621449ea5343c0
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
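The gate described above is a one-line predicate; a sketch (Python, names illustrative; the zero-disables convention is an assumption based on the tunable being described as "when set"):

```python
def packing_allowed(cluster_freq, mostly_idle_freq):
    # Pack onto a single cpu only while the cluster runs below the per-cpu
    # threshold; a threshold of 0 leaves frequency-based packing disabled.
    return mostly_idle_freq != 0 and cluster_freq < mostly_idle_freq
```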
* sched: tighten up jiffy to sched_clock mapping (Steve Muckle, 2016-03-23)

  The tick code already tracks the exact time a tick is expected to arrive. This can be used to eliminate slack in the jiffy to sched_clock mapping that aligns windows between a caller of sched_set_window and the scheduler itself.

  Change-Id: I9d47466658d01e6857d7457405459436d504a2ca
  Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
  [joonwoop@codeaurora.org: fixed minor conflict in include/linux/tick.h]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Avoid unnecessary load balance when tasks don't fit on dst_cpu (Syed Rameez Mustafa, 2016-03-23)

  When considering whether to pull over a task that does not fit on the destination CPU, make sure that the busiest group has exceeded its capacity. While the change is applicable to all groups, the biggest impact will be on migrating big tasks to little CPUs. This should only happen when the big cluster is no longer capable of balancing load within the cluster. This change should have no impact on single cluster systems.

  Change-Id: I6d1ef0e0d878460530f036921ce4a4a9c1e1394b
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: print sched_cpu_load tracepoint for all CPUs (Steve Muckle, 2016-03-23)

  When select_best_cpu() is called because a task is on a suboptimal CPU, certain CPUs are skipped because moving the task there would not make things any better. For the purposes of debugging though it is useful to always see the state of all CPUs.

  Change-Id: I76965663c1feef5c4cfab9909e477b0dcf67272d
  Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* sched: per-cpu mostly_idle threshold (Srivatsa Vaddagiri, 2016-03-23)

  The sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help pack tasks on cpus to some extent. In some cases, it may be desirable to have different packing limits for different cpus. For example, pack to a higher limit on high-performance cpus compared to power-efficient cpus.

  This patch removes the global mostly_idle tunables and makes them per-cpu, thus letting task packing behavior be controlled in a fine-grained manner.

  Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: Add API to set task's initial task load (Srivatsa Vaddagiri, 2016-03-23)

  Add a per-task attribute, init_load_pct, that is used to initialize newly created children's initial task load. This helps important applications launch their child tasks on cpus with the highest capacity.

  Change-Id: Ie9665fd2aeb15203f95fd7f211c50bebbaa18727
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed conflict in init_new_task_load; se.avg.runnable_avg_sum has been deprecated.]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
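A sketch of how a percentage attribute could translate into an initial demand (Python; the 10 ms window size and both names are assumptions for illustration, not taken from the patch):

```python
WINDOW_SIZE_NS = 10_000_000  # assumed 10 ms scheduler window

def initial_task_load(init_load_pct):
    # A child's initial demand is a percentage of one full window,
    # inherited from the parent's init_load_pct attribute.
    return WINDOW_SIZE_NS * init_load_pct // 100
```

A parent that sets init_load_pct high makes its children look busy from birth, steering their first placement toward high-capacity cpus.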
* sched: use C-states in non-small task wakeup placement logic (Syed Rameez Mustafa, 2016-03-23)

  Currently when a non-small task wakes up, the task placement logic first tries to find the least loaded CPU before breaking any ties via the power cost of running the task on those CPUs. When the power cost is also the same, however, the scheduler just selects the first CPU it came across. Use C-states to further break ties when the power cost is the same for multiple CPUs. The scheduler will now pick a CPU in the shallowest C-state.

  Change-Id: Ie1401b305fa02758a2f7b30cfca1afe64459fc2b
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
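The three-level tie-break reads naturally as a lexicographic key (Python sketch; the tuple layout is illustrative, not the kernel's data structure):

```python
def select_cpu(candidates):
    # candidates: (cpu_id, load, power_cost, cstate) tuples.
    # Least loaded wins; ties fall back to power cost, and remaining
    # ties to the shallowest C-state instead of first-found order.
    best = min(candidates, key=lambda c: (c[1], c[2], c[3]))
    return best[0]
```

With two CPUs at equal load and power cost, the one in the shallower C-state is now chosen, avoiding a deep idle-state exit.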
* sched: take rq lock prior to saving idle task's mark_start (Steve Muckle, 2016-03-23)

  When the idle task is being re-initialized during hotplug its mark_start value must be retained. The runqueue lock must be held when reading this value though, to serialize this with other CPUs that could update the idle task's window-based statistics.

  CRs-Fixed: 743991
  Change-Id: I1bca092d9ebc32a808cea2b9fe890cd24dc868cd
  Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* sched: update governor notification logic (Srivatsa Vaddagiri, 2016-03-23)

  Make the criteria for notifying the governor per-cpu. The governor is notified of any large change in a cpu's busy time statistics (rq->prev_runnable_sum) since the last reported value.

  Change-Id: I727354d994d909b166d093b94d3dade7c7dddc0d
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: window-stats: Retain idle thread's mark_start (Srivatsa Vaddagiri, 2016-03-23)

  init_idle() is called on a cpu's idle thread once at bootup and subsequently every time the cpu is hot-added. Since init_idle() calls __sched_fork(), we end up blowing away the idle thread's ravg.mark_start value. As a result we will fail to accurately maintain the cpu's curr/prev_runnable_sum counters. The example below illustrates such a failure (CS = curr_runnable_sum, PS = prev_runnable_sum):

  t0 -> New window starts for CPU2. <after some task activity> CS = X, PS = Y
  t1 -> <cpu2 is hot-removed; idle_task starts running on cpu2> At this time, cpu2_idle_thread.ravg.mark_start = t1
  t1 -> t0 + W. One window elapses. CPU2 still hot-removed. We defer swapping CS and PS until some future task event occurs.
  t2 -> CPU2 hot-added. _cpu_up()->idle_thread_get()->init_idle()->__sched_fork() results in cpu2_idle_thread.ravg.mark_start = 0
  t3 -> Some task wakes on cpu2. Since mark_start = 0, we don't swap CS and PS, which is a BUG!

  Fix this by retaining the idle task's original mark_start value during the init_idle() call.

  Change-Id: I4ac9bfe3a58fb5da8a6c7bc378c79d9930d17942
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: Add checks for frequency change (Olav Haugan, 2016-03-23)

  We need to check for frequency change when a task is migrated due to affinity change and during active balance.

  Change-Id: I96676db04d34b5b91edd83431c236a1c28166985
  Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
  [rameezmustafa@codeaurora.org: Port to msm-3.18]
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed minor conflict in core.c]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Use absolute scale for notifying governor (Srivatsa Vaddagiri, 2016-03-23)

  Make the tunables used for deciding the need for notification be on an absolute scale. The earlier scale (in percent terms relative to cur_freq) does not work well across the available range of frequencies. For example, a 100% tunable value would work well for the lower range of frequencies but not for the higher range. Having the tunable on an absolute scale makes tuning more realistic.

  Change-Id: I35a8c4e2f2e9da57f4ca4462072276d06ad386f1
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
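The unevenness of the percent-relative scale can be shown numerically (Python sketch; the function name and example frequencies are illustrative):

```python
def notify_headroom_khz(cur_freq_khz, pct_tunable):
    # Percent-relative threshold: the absolute frequency change required
    # before notifying the governor grows with the current frequency.
    return cur_freq_khz * pct_tunable // 100
```

The same 100% tunable demands only a 300 MHz swing at a 300 MHz operating point but a 2 GHz swing at 2 GHz, which is why a single percentage cannot be tuned sensibly for the whole range.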
* sched: window-stats: Enhance cpu busy time accounting (Srivatsa Vaddagiri, 2016-03-23)

  The rq->curr/prev_runnable_sum counters represent cpu demand from various tasks that have run on a cpu. Any task that runs on a cpu will have a representation in rq->curr_runnable_sum: its partial_demand value will be included in rq->curr_runnable_sum. Since partial_demand is derived from historical load samples for a task, rq->curr_runnable_sum could represent "inflated/unrealistic" cpu usage. As an example, let's say that a task with a partial_demand of 10ms runs for only 1ms on a cpu. What is included in rq->curr_runnable_sum is 10ms (and not the actual execution time of 1ms). This leads to cpu busy time being reported on the upside, causing frequency to stay higher than necessary.

  This patch fixes the cpu busy accounting scheme to strictly represent actual usage. It also provides for conditional fixup of busy time upon migration and upon heavy-task wakeup.

  CRs-Fixed: 691443
  Change-Id: Ic4092627668053934049af4dfef65d9b6b901e6b
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed conflict in init_task_load(); se.avg.decay_count has been deprecated.]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
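The 10ms-vs-1ms inflation example can be captured in a toy model (Python; `strict` and the function name are illustrative labels for the before/after schemes, not kernel identifiers):

```python
def account_exec(curr_runnable_sum, exec_time_ns, partial_demand_ns, strict):
    # strict=True models the fixed scheme: only actual execution time is
    # added to the window counter. strict=False models the old behavior,
    # where the task's historical partial_demand was added instead.
    return curr_runnable_sum + (exec_time_ns if strict else partial_demand_ns)
```

For a task with 10ms of partial_demand that ran for 1ms, the old scheme adds 10ms to the window counter while the fixed one adds 1ms, removing the upward bias in reported busy time.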
* sched: window-stats: ftrace event improvements (Srivatsa Vaddagiri, 2016-03-23)

  Add two new ftrace events:
  * trace_sched_freq_alert, to log notifications sent to the governor for requesting a change in frequency.
  * trace_sched_get_busy, to log cpu busy time information returned by the scheduler.

  Extend existing ftrace events as follows:
  * sched_update_task_ravg() event to log the irqtime parameter.
  * sched_migration_update_sum() to log the thread id which is being migrated (and thus responsible for the update of the curr_runnable_sum and prev_runnable_sum counters).

  Change-Id: Ia68ce0953a2d21d319a1db7f916c51ff6a91557c
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: improve logic for alerting governor (Srivatsa Vaddagiri, 2016-03-23)

  Currently we send notifications to the governor without taking note of cpus that are synchronized with regard to their frequency. As a result, the scheduler could send pointless notifications (notification spam!). Avoid this by considering synchronized cpus and alerting the governor only when the highest demand of any cpu within the cluster far exceeds or falls behind the current frequency.

  Change-Id: I74908b5a212404ca56b38eb94548f9b1fbcca33d
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
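The cluster-wide check reduces to one comparison against the cluster's peak demand (Python sketch; the margin parameters are illustrative stand-ins for the notification tunables):

```python
def should_notify(cluster_demands_khz, cur_freq_khz, up_margin, down_margin):
    # All cpus in the cluster share one frequency, so only the highest
    # per-cpu demand matters: alert the governor when it far exceeds or
    # falls well behind the current frequency.
    top = max(cluster_demands_khz)
    return top > cur_freq_khz + up_margin or top < cur_freq_khz - down_margin
```

A cpu whose own demand dropped no longer triggers a notification while a sibling in the same cluster still justifies the current frequency.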
* sched: Stop task migration to busy CPUs due to power active balance (Syed Rameez Mustafa, 2016-03-23)

  Power active balance should only be invoked when the destination CPU is calling load balance with either a CPU_IDLE or a CPU_NEWLY_IDLE environment. We do not want to push tasks towards busy CPUs even if they are a more power efficient place to run that task. This can cause higher scheduling latencies due to the resulting load imbalance.

  Change-Id: I8e0f242338887d189e2fc17acfb63586e7c40839
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: window-stats: Fix accounting bug in legacy mode (Srivatsa Vaddagiri, 2016-03-23)

  The TASK_UPDATE event currently does not result in an increment of rq->curr_runnable_sum in legacy mode, which is wrong. As a result, cpu busy time reported under legacy mode could be incorrect.

  Change-Id: Ifa76c735a0ead23062c1a64faf97e7b801b66bf9
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: window-stats: Note legacy mode in fork() and exit() (Srivatsa Vaddagiri, 2016-03-23)

  In legacy mode, mark_task_starting() should avoid adding a (new) task's (initial) demand to rq->curr_runnable_sum and rq->prev_runnable_sum. Similarly, exit() should avoid removing an (exiting) task's demand from rq->curr_runnable_sum and rq->prev_runnable_sum (as those counters don't include task demand and partial_demand values in legacy mode).

  Change-Id: I26820b1ac5885a9d681d363ec53d6866a2ea2e6f
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: Fix reference to stale task_struct in try_to_wake_up() (Srivatsa Vaddagiri, 2016-03-23)

  try_to_wake_up() currently drops p->pi_lock and later checks for the need to notify the cpufreq governor on task migrations or wakeups. However, the woken task could exit between the time p->pi_lock is released and the time the test for notification is run. As a result, the test for notification could refer to an exited task. task_notify_on_migrate(p) could thus lead to an invalid memory reference. Fix this by running the test for notification with the task's pi_lock held.

  Change-Id: I1c7a337473d2d8e79342a015a179174ce00702e1
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: Remove hack to enable/disable HMP scheduling extensions (Syed Rameez Mustafa, 2016-03-23)

  The current method of turning HMP scheduling extensions on or off based on the number of CPUs is inappropriate, as there may be SoCs with 4 or fewer cores that require the use of these extensions. Remove this hack, as HMP extensions will now be enabled/disabled via command line options.

  Change-Id: Id44b53c2c3b3c3b83e1911a834e2c824f3958135
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: fix wrong load_scale_factor/capacity/nr_big/small_tasks (Srivatsa Vaddagiri, 2016-03-23)

  A couple of bugs exist with the incorrect use of cpu_online_mask in the pre/post_big_small_task() functions, leading to potentially incorrect computation of load_scale_factor/capacity/nr_big/small_tasks.

  pre/post_big_small_task_count_change() use cpu_online_mask in an unreliable manner. While local_irq_disable() in pre_big_small_task_count_change() ensures a cpu won't go away in cpu_online_mask, nothing prevents a cpu from coming online concurrently. As a result, the cpu_online_mask used in pre_big_small_task_count_change() can be inconsistent with that used in post_big_small_task_count_change(), which can lead to an attempt to unlock rq->lock which was not taken before.

  Secondly, when either max_possible_freq or min_max_freq is changing, it needs to trigger recomputation of load_scale_factor and capacity for *all* cpus, even if some are offline. Otherwise, an offline cpu could later come online with an incorrect load_scale_factor/capacity.

  While it should be sufficient to scan online cpus for updating their nr_big/small_tasks in post_big_small_task_count_change(), unfortunately it is pretty hard to provide a stable cpu_online_mask when it is called from cpufreq_notifier_policy(). The cpufreq framework can trigger a CPUFREQ_NOTIFY notification in multiple contexts, some in cpu-hotplug paths, which makes it pretty hard to guess whether get_online_cpus() can be taken without causing deadlocks or not.

  To work around the insufficient information we have about the hotplug-safety context when CPUFREQ_NOTIFY is issued, have post_big_small_task_count_change() traverse all possible cpus in updating nr_big/small_task_count.

  CRs-Fixed: 717134
  Change-Id: Ife8f3f7cdfd77d5a21eee63627d7a3465930aed5
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: add check for cpu idleness when using C-state information (Syed Rameez Mustafa, 2016-03-23)

  Task enqueue on a CPU occurs prior to that CPU exiting an idle state. For the time duration between enqueue and idle exit, the CPU C-state information can no longer be relied on for further task placement, since already enqueued/waiting tasks are not taken into account. The small task placement algorithm implicitly assumes that a non-zero C-state implies an idle CPU. Since this assumption is incorrect for the duration described above, make the cpu_idle() check explicit. This problem can lead to task packing beyond the mostly_idle threshold.

  Change-Id: Idb5be85705d6b15f187d011ea2196e1bfe31dbf2
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
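The explicit idleness check amounts to discounting a stale C-state report (Python sketch; the function name is illustrative):

```python
def usable_cstate(reported_cstate, cpu_is_idle):
    # A non-zero C-state only implies an exploitable idle cpu when the cpu
    # is actually idle; between enqueue and idle exit the report is stale,
    # so treat the cpu as not idle (C-state 0) in placement decisions.
    return reported_cstate if cpu_is_idle else 0
```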
* sched: extend sched_task_load tracepoint to indicate small tasks (Syed Rameez Mustafa, 2016-03-23)

  While debugging it's always useful to know whether a task is small or not, to determine the scheduling algorithm being used. Have the sched_task_load tracepoint indicate this information rather than having to do manual calculations for every task placement.

  Change-Id: Ibf390095f05c7da80df1ebfe00f4c5af66c97d12
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: Add C-state tracking to the sched_cpu_load trace event (Syed Rameez Mustafa, 2016-03-23)

  C-state information is used by the scheduler for small task placement decisions. Track this information in the sched_cpu_load trace event. Also add the trace event in best_small_task_cpu(). This will help better understand small task placement decisions.

  Change-Id: Ife5f05bba59f85c968fab999bd13b9fb6b1c184e
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: window-stats: add a new AVG policy (Syed Rameez Mustafa, 2016-03-23)

  The current WINDOW_STATS_AVG policy is actually a misnomer, since it uses the maximum of the runtime in the recent window and the average of the past ravg_hist_size windows. Add a policy that only uses the average and call it the WINDOW_STATS_AVG policy. Rename all the other policies to make them shorter and unambiguous.

  Change-Id: I080a4ea072a84a88858ca9da59a4151dfbdbe62c
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
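The policy split can be sketched as follows (Python; the policy labels here are illustrative shorthands, not necessarily the renamed kernel constants):

```python
def task_demand(recent, history, policy):
    # history: runtimes of the past ravg_hist_size windows;
    # recent: runtime in the most recent window.
    avg = sum(history) / len(history)
    if policy == 'RECENT':
        return recent
    if policy == 'MAX':
        return max(history)
    if policy == 'AVG':          # the new policy: average only
        return avg
    if policy == 'MAX_AVG':      # the old, misnamed "AVG": max(recent, avg)
        return max(recent, avg)
    raise ValueError(policy)
```

With history [4, 8, 6, 2] and a recent window of 10, the old "AVG" behavior yields 10 (the max of 10 and the mean 5.0), while the new AVG policy yields the plain mean 5.0.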
* sched: Fix compile error (Srivatsa Vaddagiri, 2016-03-23)

  sched_get_busy(), sched_set_io_is_busy() and sched_set_window() need to be defined only when CONFIG_SCHED_FREQ_INPUT is defined, otherwise we get a compilation error related to dual definition of those routines.

  Change-Id: Ifd5c9b6675b78d04c2f7ef0e24efeae70f7ce19b
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed minor conflict in include/linux/sched.h]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: update ld_moved for active balance from the load balancer (Syed Rameez Mustafa, 2016-03-23)

  ld_moved is currently left set to 0 when the load balancer calls upon active balance. This behavior is incorrect as it prevents the termination of load balance for parent sched domains. Currently the feature is used quite frequently for power active balance and sched boost. This means that while sched boost is in effect we could run into a scenario where a more power efficient newly idle big CPU first triggers active migration from a less power efficient busy big CPU. It then continues to load balance at the cluster level, causing active migration for a task running on a little CPU. Consequently the more power efficient big CPU ends up with two tasks whereas the less power efficient big CPU may become idle. Fix this problem by updating ld_moved when active migration has been requested.

  Change-Id: I52e84eafb77249fd9378ebe531abe2d694178537
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: actively migrate tasks to idle big CPUs during sched boost (Syed Rameez Mustafa, 2016-03-23)

  The sched boost feature is currently tick driven, i.e. task placement decisions only take place at a tick (or wakeup). The load balancer does not have any knowledge of boost being in effect. Tasks that are woken up on a little CPU when all big CPUs are busy will continue executing there at least until the next tick, even if one of the big CPUs becomes idle. Reduce this latency by adding support for detecting whether boost is in effect or not in the load balancer. If boost is in effect, any big CPU running idle balance will trigger active migration from a little CPU with the highest task load.

  Change-Id: Ib2828809efa0f9857f5009b29931f63b276a59f3
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: always do idle balance with a NEWLY_IDLE idle environment (Syed Rameez Mustafa, 2016-03-23)

  With the introduction of energy aware scheduling, if idle_balance() is to be called on behalf of a different CPU which is idle, CPU_IDLE is used in the environment for load_balance(). This, however, introduces subtle differences in load calculations and policies in the load balancer. For example, there are restrictions on which CPU is permitted to do load balancing during !CPU_NEWLY_IDLE (see update_sg_lb_stats), and find_busiest_group() uses different criteria to detect the presence of a busy group. There are other differences as well. Revert back to using the NEWLY_IDLE environment irrespective of whether idle_balance() is called for the newly idle CPU or on behalf of an already existing idle CPU. This will ensure that task movement logic while doing idle balance remains unaffected.

  Change-Id: I388b0ad9a38ca550667895c8ed19628f3d25ce1a
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: fix bail condition in bail_inter_cluster_balance() (Syed Rameez Mustafa, 2016-03-23)

  Following commit efcad25cbfb (revert "sched: influence cpu_power based on max_freq and efficiency"), all CPUs in the system have the same cpu_power and consequently the same group capacity. Therefore, the check in bail_inter_cluster_balance() can no longer be used to distinguish a higher performance cluster from one with lower performance. The check is currently broken and always returns true for every load balancing attempt. Fix this by using runqueue capacity instead, which can still be used as a good measure of cluster capabilities.

  Change-Id: Idecfd1ed221d27d4324b20539e5224a92bf8b751
  Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: Initialize env->loop variable to 0 (Srivatsa Vaddagiri, 2016-03-23)

  The load_balance() function does not explicitly initialize the env->loop variable to 0. As a result, there is a vague possibility of move_tasks() hitting a very long (unnecessary) loop when it's unable to move tasks from src_cpu. This can lead to unpleasant results like a watchdog bark. Fix this by explicitly initializing the env->loop variable to 0 (in both load_balance() and active_load_balance_cpu_stop()).

  Change-Id: I36b84c91a9753870fa16ef9c9339db7b706527be
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: window-stats: use policy_mutex in sched_set_window() (Srivatsa Vaddagiri, 2016-03-23)

  Several configuration variable changes will result in reset_all_window_stats() being called. All of them, except sched_set_window(), are serialized via policy_mutex. Take policy_mutex in sched_set_window() as well, to serialize use of the reset_all_window_stats() function.

  Change-Id: Iada7ff8ac85caa1517e2adcf6394c5b050e3968a
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: window-stats: Avoid taking all cpus' rq->lock for long (Srivatsa Vaddagiri, 2016-03-23)

  reset_all_window_stats() walks the task list with all cpus' rq->lock held, which can cause spinlock timeouts if the task list is huge (and hence lead to a spinlock bug report). Avoid this by walking the task list without cpus' rq->lock held.

  Change-Id: Id09afd8b730fa32c76cd3bff5da7c0cd7aeb8dfb
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: window_stats: Add "disable" mode support (Srivatsa Vaddagiri, 2016-03-23)

  "disabled" mode (sched_disable_window_stats = 1) disables all window-stats related activity. This is useful when changing key configuration variables associated with the window-stats feature (like policy or window size).

  Change-Id: I9e55c9eb7f7e3b1b646079c3aa338db6259a9cfe
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: window-stats: Fix exit race (Srivatsa Vaddagiri, 2016-03-23)

  Exiting tasks are removed from the tasklist and hence at some point will become invisible to the do_each_thread/for_each_thread task iterators. This breaks the functionality of reset_all_windows_stats(), which *has* to reset stats for *all* tasks. This patch causes exiting tasks' stats to be reset *before* they are removed from the tasklist. The DONT_ACCOUNT bit in an exiting task's ravg.flags is also marked so that their remaining execution time is not accounted in cpu busy time counters (rq->curr/prev_runnable_sum). reset_all_windows_stats() is thus guaranteed to return with all tasks' stats reset to 0.

  Change-Id: I5f101156a4f958c1b3f31eb0db8cd06e621b75e9
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: window-stats: code cleanup (Srivatsa Vaddagiri, 2016-03-23)

  Provide a wrapper function to reset a task's window statistics. This will be reused by a subsequent patch.

  Change-Id: Ied7d32325854088c91285d8fee55d5a5e8a954b3
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: window-stats: legacy mode (Srivatsa Vaddagiri, 2016-03-23)

  Support legacy mode, which results in busy time being seen by the governor that is close to what it would have seen via existing APIs, i.e. get_cpu_idle_time_us(), get_cpu_iowait_time_us() and get_cpu_idle_time_jiffy(). In particular, legacy mode means that only task execution time is counted in rq->curr_runnable_sum and rq->prev_runnable_sum. Also, task migration does not result in adjustment of those counters.

  Change-Id: If374ccc084aa73f77374b6b3ab4cd0a4ca7b8c90
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: window-stats: Code cleanup (Srivatsa Vaddagiri, 2016-03-23)

  Collapse duplicated comments about keeping a few sysctl knobs initialized to the same value as their non-sysctl copies.

  Change-Id: Idc8261d86b9f36e5f2f2ab845213bae268ae9028
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: window-stats: Code cleanup (Srivatsa Vaddagiri, 2016-03-23)

  Remove code duplication associated with the update of various window-stats related sysctl tunables.

  Change-Id: I64e29ac065172464ba371a03758937999c42a71f
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: window-stats: Code cleanup (Srivatsa Vaddagiri, 2016-03-23)

  add_task_demand() and the 'long_sleep' calculation in it are not strictly required. rq_freq_margin() checks for the need to change frequency, which removes the need for the long_sleep calculation. Once that is removed, the need for add_task_demand() vanishes.

  Change-Id: I936540c06072eb8238fc18754aba88789ee3c9f5
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed minor conflict in core.c]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>