path: root/kernel
* sched: fix compile error where !CONFIG_SCHED_FREQ_INPUT (Joonwoo Park, 2016-03-23)
  The sysctl node sched_new_task_windows is only for CONFIG_SCHED_HMP and
  CONFIG_SCHED_FREQ_INPUT.
  Change-Id: I4791e977fa8516fd2cd31198f71103b8d7e874c3
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: fix compile failure where !CONFIG_SCHED_HMP (Joonwoo Park, 2016-03-23)
  Fix a compile failure when the HMP scheduler isn't selected.
  Change-Id: I411fa3501a4c4ac280c037a1698aa3b7278d440f
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: select task's prev_cpu as the best CPU when it was chosen recently (Joonwoo Park, 2016-03-23)
  Select the given task's prev_cpu when the task slept only for a short period,
  to reduce the latency of task placement and migration. A new tunable,
  /proc/sys/kernel/sched_select_prev_cpu_us, is introduced to determine whether
  a task is eligible for this fast path.
  CRs-fixed: 947467
  Change-Id: Ia507665b91f4e9f0e6ee1448d8df8994ead9739a
  [joonwoop@codeaurora.org: fixed conflict in include/linux/sched.h, include/linux/sched/sysctl.h, kernel/sched/core.c and kernel/sysctl.c]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: use ktime instead of sched_clock for load tracking (Joonwoo Park, 2016-03-23)
  At present the HMP scheduler uses sched_clock to set up the window boundary
  aligned with the timer interrupt, to ensure the timer interrupt fires after a
  window rollover. This alignment doesn't last, however, because the timer
  interrupt rearms the next timer based on time measured by ktime, which isn't
  coupled with sched_clock. Convert sched_clock usage to ktime to avoid
  wallclock discrepancy between the scheduler and the timer, so that the
  scheduler's window boundary stays aligned with the timer.
  CRs-fixed: 933330
  Change-Id: I4108819a4382f725b3ce6075eb46aab0cf670b7e
  [joonwoop@codeaurora.org: fixed minor conflict in include/linux/tick.h and kernel/sched/core.c. omitted fixes for kernel/sched/qhmp_core.c]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Update min/max capacity for the CPUFREQ_CREATE_POLICY notifier (Syed Rameez Mustafa, 2016-03-23)
  Following the change "57e2905 sched: Skip resetting HMP stats when max
  frequencies remain unchanged", the scheduler fails to update min/max
  capacities appropriately when CPUs are hot added after being hot removed.
  Fix this problem by handling the CPUFREQ_CREATE_POLICY notification and
  explicitly updating min/max capacities.
  Change-Id: I5dadac3258e18897fa3d505cf128ebe24c091efa
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched/cputime: fix a deadlock on 32bit systems (Pavankumar Kondeti, 2016-03-23)
  cpu_hardirq_time and cpu_softirq_time are protected by a seqlock on 32-bit
  systems. There is a potential deadlock between this seqlock and rq->lock:

      CPU1                                   CPU0
      ----                                   ----
      acquire CPU0 rq->lock                  __irq_enter()
        task enqueue/dequeue                   irqtime_account_irq()
          update_rq_clock()                      irq_time_write_begin()
            irq_time_read()                        sched_account_irqtime()
            (waiting for the seqlock held          (waiting for the CPU0
             in irq_time_write_begin())             rq->lock)

  Fix this issue by dropping the seqlock before calling sched_account_irqtime().
  Change-Id: I29a33876e372f99435a57cc11eada9c8cfd59a3f
  Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* sched: Optimize scheduler trace events to reduce trace buffer usage (Syed Rameez Mustafa, 2016-03-23)
  Scheduler ftrace events currently generate a lot of data when turned on. The
  excessive log messages often end up overflowing trace buffers during long use
  cases, or crowding out other events. Optimize scheduler events so that the
  log spew is smaller and more manageable. To that end, change the variable
  type of some event fields, introduce variants of sched_cpu_load that can be
  turned on/off for separate code paths, and remove unused fields from various
  events.
  Change-Id: I2b313542b39ad5e09a01ad1303b5dfe2c4883b8a
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed conflict in rt.c due to CONFIG_SCHED_QHMP.]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: initialize frequency domain cpumask (Joonwoo Park, 2016-03-23)
  It's possible for select_best_cpu() to be called before the first cpufreq
  notifier call. In that scenario select_best_cpu() can hang forever because
  search_cpus is never cleared. Initialize the frequency domain cpumask with
  the rq's CPU to avoid this.
  CRs-fixed: 931349
  Change-Id: If8d31c5477efe61ad7c6b336ba9e27ca6f556b63
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: print sched_task_load always (Joonwoo Park, 2016-03-23)
  At present select_best_cpu() bails out when the best idle CPU is found,
  without printing the sched_task_load trace event. Print it.
  Change-Id: Ie749239bdb32afa5b1b704c048342b905733647e
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: add preference for prev and sibling CPU in RT task placement (Joonwoo Park, 2016-03-23)
  Add a bias towards the RT task's previous CPU and its sibling CPUs in order
  to avoid cache bouncing and migrations.
  CRs-fixed: 927903
  Change-Id: I45d79d774e65efcb38282130b6692b4c3b03c2f0
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: core: Don't use current task_cpu when migrating with stop_one_cpu (Vikram Mulukutla, 2016-03-23)
  To migrate a running task using stop_one_cpu, one has to give up the pi_lock
  and rq_lock. To safeguard against migration between giving up those locks
  and actually invoking stop_one_cpu, one has to save away task_cpu(p) before
  releasing pi_lock, and use the saved value as the src_cpu argument to
  stop_one_cpu. If the current task_cpu is passed in, the task may have
  already been migrated to that CPU for some other reason. sched_exec invokes
  stop_one_cpu with the source CPU set to task_cpu(task) after dropping the
  pi_lock. While this doesn't result in a functional error, it is useless to
  run the entire migration code when the task is already running on the
  destination CPU.
  Change-Id: I02963ed02c7119a3d707580a191fbc86b94cdfaf
  Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
  [joonwoop@codeaurora.org: omitted changes for qhmp_core.c]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Notify cpufreq governor early about potential big tasks (Syed Rameez Mustafa, 2016-03-23)
  Tasks that are on the runqueue continuously for a certain amount of time
  have the potential to be big tasks at the end of the window in which they
  are runnable. In such scenarios, ramping the CPU frequency early can boost
  performance rather than waiting until the end of a window for the governor
  to query load. Notify the governor early at every tick when a task has been
  observed to execute beyond some percentage of the tick period. The threshold
  beyond which a task is eligible for early detection can be changed via the
  tunable sched_early_detection_duration. The feature itself is enabled only
  when scheduler boost is in effect.
  Change-Id: I528b72bbc79a55b4593d1b8ab45450411c6d70f3
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed conflict in scheduler_tick() in kernel/sched/core.c. fixed minor conflicts in include/linux/sched.h, include/linux/sched/sysctl.h and kernel/sysctl.c due to CONFIG_SCHED_QHMP.]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Skip resetting HMP stats when max frequencies remain unchanged (Syed Rameez Mustafa, 2016-03-23)
  A change in cpufreq policy parameters currently triggers a partial reset of
  HMP stats. This is necessary when the max frequency of any cluster changes,
  since updated load scaling factors necessitate recomputing the number of big
  and small tasks on every CPU. The computation is redundant, however, when
  parameters other than the max frequency change. Optimize the code by
  avoiding the redundant calculations.
  Change-Id: Ib572f5dfdc4ada378e695f328ff81e2ce31132ba
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: update sched_task_load trace event (Joonwoo Park, 2016-03-23)
  Add best_cpu and latency fields to the sched_task_load trace event. The
  latency field represents the combined latency of update_task_ravg() and
  select_best_cpu(), which is useful for analyzing the latency overhead of the
  HMP scheduler.
  Change-Id: Ie6d777c918d0414d361d758490e3cd7d509f5837
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: avoid unnecessary multiplication and division (Joonwoo Park, 2016-03-23)
  Avoid unnecessary multiplication and division when the load scaling factor
  is 1024.
  Change-Id: If3cb63a77feaf49cc69ddec7f41cc3c1cabbfc5a
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: precompute required frequency for CPU load (Joonwoo Park, 2016-03-23)
  At present, in order to estimate the power cost of CPU load, the HMP
  scheduler converts CPU load to the corresponding frequency on the fly, which
  can be avoided. Optimize and reduce the execution time of select_best_cpu()
  by precomputing the CPU-load-to-frequency conversion. This optimization
  reduces the execution time of select_best_cpu() by about 20% on average.
  Change-Id: I385c57f2ea9a50883b76ba6ca3deb673b827217f
  [joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/sched.h. stripped out code for CONFIG_SCHED_QHMP.]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: clean up fixup_hmp_sched_stats() (Joonwoo Park, 2016-03-23)
  Commit 392edf4969d20 ("sched: avoid stale cumulative_runnable_avg HMP
  statistics") introduced the callback function fixup_hmp_sched_stats() so
  that update_history() can avoid a decrement/increment pair on the HMP stats.
  However, the commit also made the fixup function perform an obscure
  p->ravg.demand update, which isn't the cleanest approach. Revise
  fixup_hmp_sched_stats() so the caller can update p->ravg.demand directly.
  Change-Id: Id54667d306495d2109c26362813f80f08a1385ad
  [joonwoop@codeaurora.org: stripped out CONFIG_SCHED_QHMP.]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: account new task load so that governor can apply different policy (Joonwoo Park, 2016-03-23)
  Account the amount of load contributed by new tasks within CPU load, so that
  the governor can apply a different policy when the CPU is loaded by new
  tasks. To be able to distinguish new-task load, a new tunable
  sched_new_task_windows is also introduced. The tunable defines tasks as new
  when they have been active for fewer than the configured number of windows.
  Change-Id: I2e2e62e4103882f7362154b792ab978b181b9f59
  Suggested-by: Saravana Kannan <skannan@codeaurora.org>
  [joonwoop@codeaurora.org: omitted changes for drivers/cpufreq/cpufreq_interactive.c; the cpufreq changes need to be applied separately later. fixed conflict in include/linux/sched.h and include/linux/sched/sysctl.h. omitted changes for qhmp_core.c]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Fix frequency change checks when affined tasks are migrating (Syed Rameez Mustafa, 2016-03-23)
  On kernel versions 3.18 and beyond, when the affinity of an enqueued task
  changes such that migration is required, the rq variable gets updated to the
  destination rq. This means that check_for_freq_change() skips the source CPU
  frequency check and instead checks the destination CPU twice. Fix this by
  using the src_cpu variable instead.
  Change-Id: I14727a34e22c50c9a839007d474802f96a2f49f6
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed conflict in __migrate_task().]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Add tunables for static cpu and cluster cost (Olav Haugan, 2016-03-23)
  Add a per-CPU tunable to set the extra cost of using a CPU that is idle, and
  the same for a cluster.
  Change-Id: I4aa53f3c42c963df7abc7480980f747f0413d389
  Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
  [joonwoop@codeaurora.org: omitted changes for qhmp*.[c,h]. stripped out CONFIG_SCHED_QHMP in drivers/base/cpu.c and include/linux/sched.h]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched/core: Add API to set cluster d-state (Olav Haugan, 2016-03-23)
  Add a new API to the scheduler that allows the low power mode driver to
  inform the scheduler about the d-state of a cluster. The scheduler can
  leverage this to make an informed decision about the cost of placing a task
  on a cluster.
  Change-Id: If0fe0fdba7acad1c2eb73654ebccfdb421225e62
  Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
  [joonwoop@codeaurora.org: omitted fixes for qhmp_core.c and qhmp_core.h]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: take into account of governor's frequency max load (Joonwoo Park, 2016-03-23)
  At present the HMP scheduler packs tasks onto a busy CPU until the CPU's
  load reaches 100%, to avoid waking up idle CPUs as much as possible. Such
  aggressive packing leads to unintended CPU frequency raises, since the
  governor raises the busy CPU's frequency when its load exceeds the
  configured frequency max load, which can be less than 100%. Take the
  governor's frequency max load into account and pack tasks only when the
  CPU's projected load is below that max load, to avoid unnecessary frequency
  raises.
  Change-Id: I4447e5e0c2fa5214ae7a9128f04fd7585ed0dcac
  [joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/sched.h]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: set HMP scheduler's default initial task load to 100% (Joonwoo Park, 2016-03-23)
  Set init_task_load to 100% to allow new tasks to wake up on the best
  performing CPUs.
  Change-Id: Ie762a3f629db554fb5cfa8c1d7b8b2391badf573
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: add preference for prev and sibling CPU in HMP task placement (Joonwoo Park, 2016-03-23)
  At present the HMP task placement algorithm places a waking task on any of
  the lowest-power-cost CPUs in the system, even if the task's previous CPU is
  also one of them. Placing the task on its previous CPU can reduce cache
  bouncing. Add a bias towards the task's previous CPU and the CPUs in the
  same cache domain as the previous CPU.
  Change-Id: Ieab3840432e277048058da76764b3a3f16e20c56
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Update task->on_rq when tasks are moving between runqueues (Olav Haugan, 2016-03-23)
  task->on_rq has three states:
    0                          - Task is not on a runqueue (rq)
    1 (TASK_ON_RQ_QUEUED)      - Task is on an rq
    2 (TASK_ON_RQ_MIGRATING)   - Task is on an rq but in the process of being
                                 migrated to another rq
  When a task is moving between rqs, task->on_rq should be
  TASK_ON_RQ_MIGRATING.
  CRs-fixed: 884720
  Change-Id: I1572aba00a0273d4ad5bc9a3dd60fb68e2f0b895
  Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
* sched: remove temporary demand fixups in fixup_busy_time() (Syed Rameez Mustafa, 2016-03-23)
  On older kernel versions, p->on_rq was a binary value that did not allow
  distinguishing between enqueued and migrating tasks. As a result,
  fixup_busy_time() had to do temporary load adjustments to ensure that
  update_history() would not make incorrect demand adjustments for migrating
  tasks. Since p->on_rq can now be used to distinguish migrating from enqueued
  tasks, there is no need for these temporary load calculations. Instead, make
  sure update_history() only does load adjustments for enqueued tasks.
  Change-Id: I1f800ac61a045a66ab44b9219516c39aa08db087
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: add frequency zone awareness to the load balancer (Syed Rameez Mustafa, 2016-03-23)
  Add zone awareness to the load balancer. Remove all earlier restrictions the
  load balancer had on inter-cluster kicks and migration.
  Change-Id: I12ad3d0c2d2e9bb498f49a231810f2ad418b061f
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed minor conflict in nohz_kick_needed() due to its return type change.]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Update the wakeup placement logic for fair and rt tasks (Syed Rameez Mustafa, 2016-03-23)
  For the fair sched class, update the select_best_cpu() policy to do
  power-based placement. The hope is to minimize the voltage at which the CPU
  runs. While RT tasks already do power-based placement, their placement
  preference now has to take into account the power cost of all tasks on a
  given CPU. Also remove the check for sched_boost, since sched_boost no
  longer intends to elevate all tasks to the highest capacity cluster.
  Change-Id: Ic6a7625c97d567254d93b94cec3174a91727cb87
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: remove the notion of small tasks and small task packing (Syed Rameez Mustafa, 2016-03-23)
  Task packing will now be determined solely on the basis of the power cost of
  task placement. All tasks are eligible for packing. Remove the notion of
  "small" tasks from the scheduler.
  Change-Id: I72d52d04b2677c6a8d0bc6aa7d50ff0f1a4f5ebb
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: Rework energy aware scheduling (Syed Rameez Mustafa, 2016-03-23)
  Energy-aware core rotation is not compatible with the power-based task
  placement being introduced in subsequent patches. Remove all existing EA
  based task placement/migration logic; power_cost() is the only function
  remaining. It has been modified to return the total power cost associated
  with a task on a given CPU, taking the CPU's existing load into account.
  Change-Id: Ia00501e3cbfc6e11446a9a2e93e318c4c42bdab4
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed multiple conflicts in fair.c and a minor conflict in features.h]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: encourage idle load balance and discourage active load balance (Joonwoo Park, 2016-03-23)
  Encourage IDLE and NEWLY_IDLE load balancing by ignoring cache hotness, and
  discourage active load balancing by increasing the busy balancing failure
  threshold. These changes help idle CPUs relieve busy CPUs more aggressively,
  and reduce unnecessary active load balancing within the same CPU domain.
  Change-Id: I22f6aba11932ccbb82a436c0532589c46f9148ed
  [joonwoop@codeaurora.org: fixed conflict in need_active_balance() and can_migrate_task().]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: avoid stale cumulative_runnable_avg HMP statistics (Joonwoo Park, 2016-03-23)
  When a new window starts for a task that is on a rq, the scheduler
  momentarily decreases the rq's cumulative_runnable_avg, re-accounts the
  task's demand, and then increases cumulative_runnable_avg by the newly
  accounted demand. There is therefore a short period during which the rq's
  cumulative_runnable_avg is less than what it's supposed to be. Meanwhile,
  another CPU searching for the best CPU to place a task can make a suboptimal
  decision based on the momentarily stale cumulative_runnable_avg. Fix this by
  adding or subtracting the delta between the task's old and new demand,
  instead of decrementing and incrementing by the task's entire load.
  Change-Id: I3c9329961e6f96e269fa13359e7d1c39c4973ff2
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Add load based placement for RT tasks (Syed Rameez Mustafa, 2016-03-23)
  Currently RT tasks prefer the lowest power CPU in the system. This can end
  up causing contention on that CPU. Instead, ensure that RT tasks end up on
  the lowest power cluster and on the least loaded CPU within that cluster.
  Change-Id: I363b3d43236924962c67d2fb5d3d2d09800cd994
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: Avoid running idle_balance() consecutively (Syed Rameez Mustafa, 2016-03-23)
  With the introduction of "6dd123a sched: update ld_moved for active balance
  from the load balancer", load_balance() returns a nonzero number of migrated
  tasks in anticipation of tasks that will end up on the destination CPU via
  active migration. Unfortunately, on kernel versions 3.14 and beyond this
  breaks pick_next_task_fair(), which assumes that the load balancer returns a
  nonzero number only for tasks already migrated to the destination CPU. A
  nonzero return triggers a rerun of the pick_next_task_fair() logic so that
  it can return one of the migrated tasks as the next task. When the load
  balancer reports tasks that will only be moved later via active migration,
  the rerun of pick_next_task_fair() finds the CPU still has no runnable
  tasks. This in turn causes a rerun of idle_balance(), possibly migrating yet
  another task; the destination CPU can unintentionally end up pulling several
  tasks. The intent of the change above is still valid, though: load balancing
  must terminate at higher scheduling domains when active migration occurs.
  Achieve the same effect by using continue_balancing instead of faking the
  number of pulled tasks. This way pick_next_task_fair() stays happy and load
  balancing stops at higher scheduling domains.
  Change-Id: Id223a3287e5d401e10fbc67316f8551303c7ff96
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: inline function scale_load_to_cpu() (Joonwoo Park, 2016-03-23)
  Inline the relatively small and frequently used function
  scale_load_to_cpu().
  CRs-fixed: 849655
  Change-Id: Id5f60595c394959d78e6da4cc4c18c338fec285b
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: look for least busy and fallback CPU only when it's needed (Joonwoo Park, 2016-03-23)
  best_small_task_cpu() is biased towards mostly idle CPUs and shallow-C-state
  CPUs, so the need to find the least busy CPU or the least-power-cost
  fallback CPU is typically quite rare. At present, however, the function
  always finds those two CPUs unnecessarily. Optimize the function by looking
  for the least busy CPU and the least-power-cost fallback CPU only when they
  are needed. This change is solely an optimization and makes no functional
  difference.
  CRs-fixed: 849655
  Change-Id: I5eca11436e85b448142a7a7644f422c71eb25e8e
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: iterate search CPUs starting from prev_cpu for optimization (Joonwoo Park, 2016-03-23)
  best_small_task_cpu() looks for a mostly idle CPU and returns it as the best
  CPU for a given small task. At present, however, it cannot break out of the
  CPU search loop when it finds a mostly idle CPU; it must keep iterating
  because it needs to return the task's previous CPU as the best CPU whenever
  the previous CPU is mostly idle, to avoid an unnecessary migration. Optimize
  best_small_task_cpu() to iterate the search CPUs starting from the task's
  own CPU, so it can break out of the loop as soon as a mostly idle CPU is
  found. This optimization saves the few hundred nanoseconds spent in the
  function and makes no functional change.
  CRs-fixed: 849655
  Change-Id: I8c540963487f4102dac4d54e9f98e24a4a92a7b3
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Optimize the select_best_cpu() "for" loop (Syed Rameez Mustafa, 2016-03-23)
  select_best_cpu() is agnostic of the hardware topology. This means that
  certain functions such as task_will_fit() and skip_cpu() run unnecessarily
  for every CPU in a cluster, whereas they need to run only once per cluster.
  Reduce the execution time of select_best_cpu() by ensuring these functions
  run only once per cluster. The frequency domain mask is used to identify
  CPUs that fall in the same cluster.
  CRs-fixed: 849655
  Change-Id: Id24208710a0fc6321e24d9a773f00be9312b75de
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
  [joonwoop@codeaurora.org: added continue after clearing search_cpus. fixed indentation with spaces. fixed skip_cpu() to return true when rq == task_rq.]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: Optimize select_best_cpu() to reduce execution time (Syed Rameez Mustafa, 2016-03-23)
  select_best_cpu() is a crucial wakeup routine that determines the time taken
  by the scheduler to wake up a task. Optimize this routine for higher
  performance. The following changes have been made, listed in the order they
  build on one another:
  * Several routines called by select_best_cpu() recalculate task load and CPU
    load even though these are already known quantities. For example,
    mostly_idle_cpu_sync() calculates CPU load, task_will_fit() calculates
    task load, and then spill_threshold_crossed() recalculates both. Remove
    these redundant calculations by moving the task load and CPU load
    computations into the select_best_cpu() 'for' loop and passing them to any
    functions that need the information.
  * Rewrite best_small_task_cpu() to avoid the existing two-pass approach,
    which was only in place to find the minimum power cluster for small task
    placement. This information can easily be established by looking at
    runqueue capacities: the cluster without the highest capacity constitutes
    the minimum power cluster. A special CPU mask, mpc_mask, is required to
    safeguard against undue side effects on SMP systems. Also terminate the
    function early if the previous CPU is found to be mostly_idle.
  * Reorganize code to ensure that no unnecessary computations or variable
    assignments are done. For example, there is no need to compute CPU load if
    that information does not end up getting used in any iteration of the
    'for' loop.
  * The tick logic for EA migrations unnecessarily checks the power of all
    CPUs, only for skip_cpu() to throw away the result later. Ensure that for
    EA we only check CPUs within the same cluster, and avoid running
    select_best_cpu() whenever possible.
  CRs-fixed: 849655
  Change-Id: I4e722912fcf3fe4e365a826d4d92a4dd45c05ef3
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed cpufreq_notifier_policy() to set mpc_mask. added a comment about prerequisite of lower_power_cpu_available(). s/struct rq * rq/struct rq *rq/. s/TASK_NICE/task_nice/]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched/debug: Add Kconfig to trigger panics on all 'BUG:' conditions (Matt Wagantall, 2016-03-23)
  Introduce CONFIG_PANIC_ON_SCHED_BUG to trigger panics along with all 'BUG:'
  prints from the scheduler core, even potentially recoverable ones such as
  scheduling while atomic, sleeping from invalid context, and detection of
  broken arch topologies.
  Change-Id: I5d2f561614604357a2bc7900b047e53b3a0b7c6d
  Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
  [joonwoop@codeaurora.org: fixed trivial merge conflict in lib/Kconfig.debug.]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: fix incorrect prev_runnable_sum accounting with long ISR run (Joonwoo Park, 2016-03-23)
  At present, when an IRQ handler spans multiple scheduler windows, the HMP
  scheduler resets the IRQ CPU's prev_runnable_sum to its current max
  capacity, under the assumption that nothing else can contribute to that
  CPU's prev_runnable_sum. This isn't correct, since another CPU can migrate
  tasks to the IRQ CPU. Such incorrectness can even trigger a BUG_ON() if the
  migrated task's prev_window is larger than the migrating CPU's current
  capacity, in the following scenario:
  1. An ISR on the power-efficient CPU has been running for multiple windows.
  2. A task whose prev_window is higher than the IRQ CPU's current capacity
     migrates to the IRQ CPU.
  3. IRQ servicing finishes and the IRQ CPU resets its prev_runnable_sum to
     the CPU's current capacity.
  4. Before the window rolls over, the task on the IRQ CPU migrates to another
     CPU, which fixes up the source and destination CPUs' busy time.
  5. BUG_ON(src_rq->prev_runnable_sum < 0) triggers, as p->ravg.prev_window is
     larger than src_rq->prev_runnable_sum.
  Fix this by preserving prev_runnable_sum when an ISR spans multiple
  scheduler windows; there is no need to reset it.
  CRs-fixed: 828055
  Change-Id: I1f95ece026493e49d3810f9c940ec5f698cc0b81
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: prevent task migration while governor queries CPUs' load (Joonwoo Park, 2016-03-23)
  At present the governor retrieves each CPU's load sequentially, so there is
  a chance of a race between the governor's load query and task migration,
  which can result in reporting less CPU load than actual. For example, with
  CPU 0 load = 30% and CPU 1 load = 50%:

      Governor                        Load balancer
      --------                        -------------
      sched_get_busy(cpu 0) = 30%
                                      A task 'p' with
                                      p->ravg->prev_window = 50 migrates
                                      from CPU 1 to CPU 0. Now CPU 0's
                                      load = 80%, CPU 1's load = 0%.
      sched_get_busy(cpu 1) = 0%

  The 50% of load moved from CPU 1 to CPU 0 is never accounted. Fix this by
  introducing a new API, sched_get_cpus_busy(), which lets the governor fetch
  the load of a set of CPUs at once. The load values are gathered while the
  load balancer is blocked, ensuring migration cannot occur in the meantime.
  Change-Id: I4fa4dd1195eff26aa603829aca2054871521495e
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: report loads greater than 100% only during load alert notifications (Srivatsa Vaddagiri, 2016-03-23)
  The busy time of CPUs is adjusted during task migrations. This can result in
  reporting load greater than 100% to the governor, causing direct jumps to
  higher frequencies during intra-cluster migrations. Hence, clip the load to
  100% when reporting at the end of the window. The load is not clipped for
  load alert notifications, which allows ramping up the frequency faster for
  inter-cluster migrations and heavy task wakeup scenarios.
  Change-Id: I7347260aa476287ecfc706d4dd0877f4b75a1089
  Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
  Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: turn off the TTWU_QUEUE feature (Syed Rameez Mustafa, 2016-03-23)
  While the TTWU_QUEUE feature has the advantage of reducing cache bouncing of
  runqueue locks, it has the side effect that runqueue statistics are not
  updated until the remote CPU has a chance to enqueue the task. Since there
  is no upper bound on how long the remote CPU may take to enqueue the task,
  several sequential wakeups can result in suboptimal task placement based on
  the stale statistics. Turn off the feature, as the cost of suboptimal
  placement is much higher than the cost of cache-bouncing spinlocks for msm
  based systems.
  Change-Id: I0b85c0225237b2bc44f54934769f5e3750c0f3d6
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: avoid unnecessary HMP scheduler stat re-accounting (Joonwoo Park, 2016-03-23)
  When a sched_entity's runnable average changes, we decrease and then
  increase the HMP scheduler's statistics for that sched_entity to account for
  the updated runnable average. During that window, however, other CPUs see
  the updating CPU's load as less than it actually is. This is suboptimal and
  can lead to improper task placement and load balance decisions. We can avoid
  this situation, at least with window-based load tracking, since a
  sched_entity's PELT load average doesn't feed into the HMP scheduler's load
  tracking statistics. Fix this by updating HMP statistics only when the HMP
  scheduler uses PELT based load statistics.
  Change-Id: I9eb615c248c79daab5d22cbb4a994f94be6a968d
  [joonwoop@codeaurora.org: applied fix into __update_load_avg() instead of update_entity_load_avg().]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched/fair: Fix capacity and nr_run comparisons in can_migrate_task() (Syed Rameez Mustafa, 2016-03-23)
  Kernel versions 3.18 and beyond alter the definition of sgs->group_capacity:
  it now reflects the load a group is capable of taking, whereas in previous
  kernel versions the term referred to the number of effective CPUs available.
  This change breaks the comparison of capacity against the number of running
  tasks in a group. To fix this, convert the capacity metric before doing the
  comparison.
  Change-Id: I3ebd941273edbcc903a611d9c883773172e86c8e
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
  [joonwoop@codeaurora.org: fixed minor conflict in can_migrate_task().]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* Revert "sched: Use only partial wait time as task demand" (Joonwoo Park, 2016-03-23)
  This reverts commit 0e2092e47488 ("sched: Use only partial wait time as task
  demand") as it causes a performance regression.
  Change-Id: I3917858be98530807c479fc31eb76c0f22b4ea89
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched/deadline: Add basic HMP extensions (Syed Rameez Mustafa, 2016-03-23)
  Some HMP extensions have to be supported by all scheduling classes,
  irrespective of whether they use HMP task placement. Add these basic
  extensions to make deadline scheduling work. Also, during the tick, if a
  deadline task gets throttled, its HMP stats get decremented as part of the
  dequeue; however, the throttled task does not update its on_rq flag, causing
  the HMP stats to be double decremented when update_history() is called as
  part of a window rollover. Avoid this by checking for throttled deadline
  tasks before subtracting and adding the deadline task's load from the rq's
  cumulative runnable average.
  Change-Id: I9e2ed6675a730f2ec830f764f911e71c00a7d87a
  Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: Fix racy invocation of fixup_busy_time via move_queued_task (Vikram Mulukutla, 2016-03-23)
  set_task_cpu uses fixup_busy_time to redistribute a task's load information
  between the source and destination runqueues. fixup_busy_time assumes that
  both source and destination runqueue locks are held if the task is not being
  concurrently woken up. This is no longer true, since move_queued_task does
  not acquire the destination CPU's runqueue lock due to optimizations in
  recent kernels. Acquire both source and destination runqueue locks before
  invoking set_task_cpu in move_queued_task.
  Change-Id: I39fadf0508ad42e511db43428e52c8aa8bf9baf6
  Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
  [joonwoop@codeaurora.org: fixed conflict in move_queued_task().]
  Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: don't inflate the task load when the CPU max freq is restricted (Pavankumar Kondeti, 2016-03-23)
  When the CPU's max frequency is restricted and the CPU is running at that
  max frequency, the task load is inflated by a factor of
  max_possible_freq/max_freq. This results in tasks migrating early to
  better-capacity CPUs, which makes things worse if the frequency restriction
  is due to thermal conditions.
  Change-Id: Ie0ea405d7005764a6fb852914e88cf97102c138a
  Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>