summaryrefslogtreecommitdiff
path: root/kernel/sysctl.c (follow)
Commit message (Collapse)AuthorAge
...
* | | Revert "sched: add scheduling latency tracking procfs node"Joonwoo Park2016-06-03
| | | | | | | | | | | | | | | | | | | | | | | | This reverts commit b40bf941f61756bcc ("sched: add scheduling latency tracking procfs node") as this feature is no longer used. Change-Id: I5de789b6349e6ea78ae3725af2a3ffa72b7b7f11 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: eliminate sched_early_detection_duration knobJoonwoo Park2016-06-03
| | | | | | | | | | | | | | | | | | | | | Kill unused scheduler knob sched_early_detection_duration. Change-Id: I36b7a10982367f9c7ab8eefcb8ef1d0f9955601d Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Remove the sched heavy task frequency guidance featureJoonwoo Park2016-06-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This has always been unused feature given its limitation of adding phantom load to the system. Since there are no immediate plans of using this and the fact that it adds unnecessary complications to the new load fixup mechanism, remove this feature for now. It can be revisited later in light of the new mechanism. Change-Id: Ie9501a898d0f423338293a8dde6bc56f493f1e75 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: eliminate sched_migration_fixup knobJoonwoo Park2016-06-03
| | | | | | | | | | | | | | | | | | | | | | | | Kill unused scheduler knob sched_migration_fixup. With this change scheduler always adjusts CPU's busy time during migration. Change-Id: I5d59e89d5cc0f2c705c40036cd7b47f5d3f89e58 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: eliminate sched_upmigrate_min_nice knobJoonwoo Park2016-06-03
| | | | | | | | | | | | | | | | | | | | | Kill unused scheduler knob sched_upmigrate_min_nice. Change-Id: I53ddfde39c78e78306bd746c1c4da9a94ec67cd8 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: eliminate sched_enable_power_aware knob and parameterJoonwoo Park2016-06-01
| | | | | | | | | | | | | | | | | | | | | | | | Kill unused scheduler knob and parameter sched_enable_power_aware. HMP scheduler always take into account power cost for placing task. Change-Id: Ib26a21df9b903baac26c026862b0a41b4a8834f3 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: eliminate sched_freq_account_wait_time knobJoonwoo Park2016-06-01
| | | | | | | | | | | | | | | | | | | | | Kill unused scheduler knob sched_freq_account_wait_time. Change-Id: Ib74123ebd69dfa3f86cf7335099f50c12a6e93c3 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: eliminate sched_account_wait_time knobJoonwoo Park2016-06-01
| | | | | | | | | | | | | | | | | | | | | | | | Kill unused scheduler knob sched_account_wait_time. With this change scheduler always accounts task's wait time into demand. Change-Id: Ifa4bcb5685798f48fd020f3d0c9853220b3f5fdc Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Aggregate for frequencySrivatsa Vaddagiri2016-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Related threads in a group could execute on different CPUs and hence present a split-demand picture to cpufreq governor. IOW the governor fails to see the net cpu demand of all related threads in a given window if the threads's execution were to be split across CPUs. That could result in sub-optimal frequency chosen in comparison to the ideal frequency at which the aggregate work (taken up by related threads) needs to be run. This patch aggregates cpu execution stats in a window for all related threads in a group. This helps present cpu busy time to governor as if all related threads were part of the same thread and thus help select the right frequency required by related threads. This aggregation is done per-cluster. Change-Id: I71e6047620066323721c6d542034ddd4b2950e7f Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: Fixed notify_migration() to hold rcu read lock as this version of Linux doesn't hold p->pi_lock when the function gets called while keeping use of rcu_access_pointer() since we never dereference return value.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: restrict sync wakee placement bias with waker's demandJoonwoo Park2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Biasing sync wakee task towards waker CPU's cluster makes sense when the waker's demand is high enough so the wakee also can take advantage of high CPU frequency voted because of waker's load. Placing sync wakee on the low demand waker's CPU can lead placement imbalance which can lead unnecessary migration. Introduce a new tunable "sched_big_waker_task_load" that defines the big waker so scheduler avoid wakee on waker's cluster bias when the waker's load is below the tunable. CRs-fixed: 971295 Change-Id: I1550ede0a71ac8c9be74a7daabe164c6a269a3fb Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> [joonwoop@codeaurora.org: fixed a minor conflict in include/linux/sched/sysctl.h.]
* | | sched: add preference for waker cluster CPU in wakee task placementJoonwoo Park2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If sync wakee task's demand is small it's worth to place the wakee task on waker's cluster for better performance in the sense that waker and wakee are corelated so the wakee should take advantage of waker cluster's frequency which is voted by the waker along with cache locality benefit. While biasing towards the waker's cluster we want to avoid the waker CPU as much as possible as placing the wakee on the waker's CPU can make the waker got preempted and migrated by load balancer. Introduce a new tunable 'sched_small_wakee_task_load' that differentiates eligible small wakee task and place the small wakee tasks on the waker's cluster. CRs-fixed: 971295 Change-Id: I96897d9a72a6f63dca4986d9219c2058cd5a7916 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> [joonwoop@codeaurora.org: fixed a minor conflict in include/linux/sched/sysctl.h.]
* | | sched: Add separate load tracking histogram to predict loadsPavankumar Kondeti2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current window based load tracking only saves history for five windows. A historically heavy task's heavy load will be completely forgotten after five windows of light load. Even before the five window expires, a heavy task wakes up on same CPU it used to run won't trigger any frequency change until end of the window. It would starve for the entire window. It also adds one "small" load window to history because it's accumulating load at a low frequency, further reducing the tracked load for this heavy task. Ideally, scheduler should be able to identify such tasks and notify governor to increase frequency immediately after it wakes up. Add a histogram for each task to track a much longer load history. A prediction will be made based on runtime of previous or current window, histogram data and load tracked in recent windows. Prediction of all tasks that is currently running or runnable on a CPU is aggregated and reported to CPUFreq governor in sched_get_cpus_busy(). sched_get_cpus_busy() now returns predicted busy time in addition to previous window busy time and new task busy time, scaled to the CPU maximum possible frequency. Tunables: - /proc/sys/kernel/sched_gov_alert_freq (KHz) This tunable can be used to further filter the notifications. Frequency alert notification is sent only when the predicted load exceeds previous window load by sched_gov_alert_freq converted to load. Change-Id: If29098cd2c5499163ceaff18668639db76ee8504 Suggested-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Junjie Wu <junjiew@codeaurora.org> [joonwoop@codeaurora.org: fixed merge conflicts around __migrate_task() and removed changes for CONFIG_SCHED_QHMP.]
* | | sched: Revise the inter cluster load balance restrictionsPavankumar Kondeti2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The frequency based inter cluster load balance restrictions are not reliable as frequency does not provide a good estimate of the CPU's current load. Replace them with the spill_load and spill_nr_run based checks. The higher capacity cluster is restricted from pulling the tasks from the lower capacity cluster unless all of the lower capacity CPUs are above spill. This behavior can be controlled by a sysctl tunable and it is disabled by default (i.e. no load balance restrictions). Change-Id: I45c09c8adcb61a8a7d4e08beadf2f97f1805fb42 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> [joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes for CONFIG_SCHED_QHMP.]
* | | sched: colocate related threadsSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide userspace interface for tasks to be grouped together as "related" threads. For example, all threads involved in updating display buffer could be tagged as related. Scheduler will attempt to provide special treatment for group of related threads such as: 1) Colocation of related threads in same "preferred" cluster 2) Aggregation of demand towards determination of cluster frequency This patch extends scheduler to provide best-effort colocation support for a group of related threads. Change-Id: Ic2cd769faf5da4d03a8f3cb0ada6224d0101a5f5 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [joonwoop@codeaurora.org: fixed minor merge conflicts. removed ifdefry for CONFIG_SCHED_QHMP.] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Update fair and rt placement logic to use scheduler clustersSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make use of clusters in the fair and rt scheduling classes. This is needed as the freq domain mask can no longer be used to do correct task placement. The freq domain mask was being used to demarcate clusters. Change-Id: I57f74147c7006f22d6760256926c10fd0bf50cbd Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes for CONFIG_SCHED_QHMP.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | mm: swap: swap ratio supportVinayak Menon2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support to receive a static ratio from userspace to divide the swap pages between ZRAM and disk based swap devices. The existing infrastructure allows to keep same priority for multiple swap devices, which results in round robin distribution of pages. With this patch, the ratio can be defined. CRs-fixed: 968416 Change-Id: I54f54489db84cabb206569dd62d61a8a7a898991 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
* | | sched: fix compile error where !CONFIG_SCHED_FREQ_INPUTJoonwoo Park2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | The sysctl node sched_new_task_windows is only for CONFIG_SCHED_HMP and CONFIG_SCHED_FREQ_INPUT. Change-Id: I4791e977fa8516fd2cd31198f71103b8d7e874c3 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: select task's prev_cpu as the best CPU when it was chosen recentlyJoonwoo Park2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Select given task's prev_cpu when the task slept for short period to reduce latency of task placement and migrations. A new tunable /proc/sys/kernel/sched_select_prev_cpu_us introduced to determine whether tasks are eligible to go through fast path. CRs-fixed: 947467 Change-Id: Ia507665b91f4e9f0e6ee1448d8df8994ead9739a [joonwoop@codeaurora.org: fixed conflict in include/linux/sched.h, include/linux/sched/sysctl.h, kernel/sched/core.c and kernel/sysctl.c] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Notify cpufreq governor early about potential big tasksSyed Rameez Mustafa2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tasks that are on the runqueue continuously for a certain amount of time have the potential to be big tasks at the end of the window in which they are runnable. In such scenarios ramping the CPU frequency early can boost performance rather than waiting till the end of a window for the governor to query load. Notify the governor early at every tick when a task has been observed to execute beyond some percentage of the tick period. The threshold beyond which a task is eligible for early detection can be changed via the tunable sched_early_detection_duration. The feature itself is enabled only when scheduler boost is in effect. Change-Id: I528b72bbc79a55b4593d1b8ab45450411c6d70f3 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in scheduler_tick() in kernel/sched/core.c. fixed minor conflicts in include/linux/sched.h, include/linux/sched/sysctl.h and kernel/sysctl.c due to CONFIG_SCHED_QHMP.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: account new task load so that governor can apply different policyJoonwoo Park2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Account amount of load contributed by new tasks within CPU load so that governor can apply different policy when CPU is loaded by new tasks. To be able to distinguish new task load a new tunable sched_new_task_windows also introduced. The tunable defines tasks as new when the tasks are have been active less than configured windows. Change-Id: I2e2e62e4103882f7362154b792ab978b181b9f59 Suggested-by: Saravana Kannan <skannan@codeaurora.org> [joonwoop@codeaurora.org: ommited changes for drivers/cpufreq/cpufreq_interactive.c. cpufreq changes needs to be applied separately later. fixed conflict in include/linux/sched.h and include/linux/sched/sysctl.h. omitted changes for qhmp_core.c] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: add frequency zone awareness to the load balancerSyed Rameez Mustafa2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add zone awareness to the load balancer. Remove all earlier restrictions that the load balancer had for inter cluster kicks and migration. Change-Id: I12ad3d0c2d2e9bb498f49a231810f2ad418b061f Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed minor conflict in nohz_kick_needed() due to its return type change.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: remove the notion of small tasks and small task packingSyed Rameez Mustafa2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | Task packing will now be determined solely on the basis of the power cost of task placement. All tasks are eligible for packing. Remove the notion of "small" tasks from the scheduler. Change-Id: I72d52d04b2677c6a8d0bc6aa7d50ff0f1a4f5ebb Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Rework energy aware schedulingSyed Rameez Mustafa2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Energy aware core rotation is not compatible with the power based task placement being introduced in subsequent patches. Remove all existing EA based task placement/migration logic. power_cost() is the only function remaining. This function has been modified to return the total power cost associated with a task on a given CPU taking existing load on that CPU into account. Change-Id: Ia00501e3cbfc6e11446a9a2e93e318c4c42bdab4 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed multiple conflicts in fair.c and minor conflict in features.h] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: check HMP scheduler tunables validityJoonwoo Park2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | Check tunables validity to take valid values only. CRs-fixed: 812443 Change-Id: Ibb9ec0d6946247068174ab7abe775a6389412d5b Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: add scheduling latency tracking procfs nodeJoonwoo Park2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new procfs node /proc/sys/kernel/sched_max_latency_us to track the worst scheduling latency. It provides easier way to identify maximum scheduling latency seen across the CPUs. Change-Id: I6e435bbf825c0a4dff2eded4a1256fb93f108d0e [joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: warn/panic upon excessive scheduling latencyJoonwoo Park2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new tunables /proc/sys/kernel/sched_latency_warn_threshold_us and /proc/sys/kernel/sched_latency_panic_threshold_us to warn or panic for the cases that tasks are runnable but not scheduled more than configured time. This helps to find out unacceptably high scheduling latency more easily. Change-Id: If077aba6211062cf26ee289970c5abcd1c218c82 [joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Per-cpu prefer_idle flagSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the global sysctl_sched_prefer_idle flag and replace it with a per-cpu prefer_idle flag. The per-cpu flag is expected to same for all cpus in a cluster. It thus provides convenient means to disable packing in one cluster while allowing packing in another cluster. Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Add sysctl to enable power aware schedulingOlav Haugan2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add sysctl to enable energy awareness at runtime. This is useful for performance/power tuning/measurements and debugging. In addition this will match up with the Documentation/scheduler/sched-hmp.txt documentation. Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
* | | sched: Prevent race conditions where upmigrate_min_nice changesJoonwoo Park2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When upmigrate_min_nice is changed dec_nr_big_small_task() can trigger BUG_ON(rq->nr_big_tasks < 0). This happens when there is a task which was considered as non-big task due to its nice > upmigrate_min_nice and later upmigrate_min_nice is changed to higher value so the task becomes big task. In this case runqueue still has nr_big_tasks = 0 incorrectly with current implementation. Consequently next scheduler tick sees a big task to schedule and try to decrease nr_big_tasks which is already 0. Introduce sched_upmigrate_min_nice which is updated atomically and re-count the number of big and small tasks to fix BUG_ON() triggering. Change-Id: I6f5fc62ed22bbe5c52ec71613082a6e64f406e58 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Avoid frequent task migration due to EA in lbOlav Haugan2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new tunable exists that allow task migration to be throttled when the scheduler tries to do task migrations due to Energy Awareness (EA). This tunable is only taken into account when migrations occur in the tick path. Extend the usage of the tunable to take into account the load balancer (lb) path also. In addition ensure that the start of task execution on a CPU is updated correctly. If a task is preempted but still runnable on the same CPU the start of execution should not be updated. Only update the start of execution when a task wakes up after sleep or moves to a new CPU. Change-Id: I6b2a8e06d8d2df8e0f9f62b7aba3b4ee4b2c1c4d Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org [joonwoop@codeaurora.org: fixed conflict in group_classify() and set_task_cpu().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Avoid frequent migration of running taskSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Power values for cpus can drop quite considerably when it goes idle. As a result, the best choice for running a single task in a cluster can vary quite rapidly. As the task keeps hopping cpus, other cpus go idle and start being seen as more favorable target for running a task, leading to task migrating almost every scheduler tick! Prevent this by keeping track of when a task started running on a cpu and allowing task migration in tick path (migration_needed()) on account of energy efficiency reasons only if the task has run sufficiently long (as determined by sysctl_sched_min_runtime variable). Note that currently sysctl_sched_min_runtime setting is considered only in scheduler_tick()->migration_needed() path and not in idle_balance() path. In other words, a task could be migrated to another cpu which did a idle_balance(). This limitation should not affect high-frequency migrations seen typically (when a single high-demand task runs on high-performance cpu). CRs-Fixed: 756570 Change-Id: I96413b7a81b623193c3bbcec6f3fa9dfec367d99 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in set_task_cpu() and __schedule().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: Provide knob to prefer mostly_idle over idle cpusSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | sysctl_sched_prefer_idle lets the scheduler bias selection of idle cpus over mostly idle cpus for tasks. This knob could be useful to control balance between power and performance. Change-Id: Ide6eef684ef94ac8b9927f53c220ccf94976fe67 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: make sched_cpu_high_irqload a runtime tunableSteve Muckle2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | It may be desirable to be able to alter the scehd_cpu_high_irqload setting easily, so make it a runtime tunable value. Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* | | sched: per-cpu mostly_idle thresholdSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help pack tasks on cpus to some extent. In some cases, it may be desirable to have different packing limits for different cpus. For example, pack to a higher limit on high-performance cpus compared to power-efficient cpus. This patch removes the global mostly_idle tunables and makes them per-cpu, thus letting task packing behavior to be controlled in a fine-grained manner. Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: update governor notification logicSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | Make criteria for notifying governor to be per-cpu. Governor is notified of any large change in cpu's busy time statistics (rq->prev_runnable_sum) since the last reported value. Change-Id: I727354d994d909b166d093b94d3dade7c7dddc0d Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Use absolute scale for notifying governorSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the tunables used for deciding the need for notification to be on absolute scale. The earlier scale (in percent terms relative to cur_freq) does not work well with available range of frequencies. For example, 100% tunable value would work well for lower range of frequencies and not for higher range. Having the tunable to be on absolute scale makes tuning more realistic. Change-Id: I35a8c4e2f2e9da57f4ca4462072276d06ad386f1 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: window-stats: Enhance cpu busy time accountingSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rq->curr/prev_runnable_sum counters represent cpu demand from various tasks that have run on a cpu. Any task that runs on a cpu will have a representation in rq->curr_runnable_sum. Their partial_demand value will be included in rq->curr_runnable_sum. Since partial_demand is derived from historical load samples for a task, rq->curr_runnable_sum could represent "inflated/un-realistic" cpu usage. As an example, lets say that task with partial_demand of 10ms runs for only 1ms on a cpu. What is included in rq->curr_runnable_sum is 10ms (and not the actual execution time of 1ms). This leads to cpu busy time being reported on the upside causing frequency to stay higher than necessary. This patch fixes cpu busy accounting scheme to strictly represent actual usage. It also provides for conditional fixup of busy time upon migration and upon heavy-task wakeup. CRs-Fixed: 691443 Change-Id: Ic4092627668053934049af4dfef65d9b6b901e6b Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in init_task_load(), se.avg.decay_count has deprecated.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: improve logic for alerting governorSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we send notification to governor not taking note of cpus that are synchronized with regard to their frequency. As a result, scheduler could send pointless notifications (notification spam!). Avoid this by considering synchronized cpus and alerting governor only when the highest demand of any cpu within cluster far exceeds or falls behind current frequency. Change-Id: I74908b5a212404ca56b38eb94548f9b1fbcca33d Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: window-stats: legacy modeSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support legacy mode, which results in busy time being seen by governor that is close to what it would have seen via existing APIs i.e get_cpu_idle_time_us(), get_cpu_iowait_time_us() and get_cpu_idle_time_jiffy(). In particular, legacy mode means that only task execution time is counted in rq->curr_runnable_sum and rq->prev_runnable_sum. Also task migration does not result in adjustment of those counters. Change-Id: If374ccc084aa73f77374b6b3ab4cd0a4ca7b8c90 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: window-stats: Code cleanupSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | Remove code duplication associated with update of various window-stats related sysctl tunables Change-Id: I64e29ac065172464ba371a03758937999c42a71f Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Make RAVG_HIST_SIZE tunableOlav Haugan2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make RAVG_HIST_SIZE available from /proc/sys/kernel/sched_ravg_hist_size to allow tuning of the size of the history that is used in computation of task demand. CRs-fixed: 706138 Change-Id: Id54c1e4b6e974a62d787070a0af1b4e8ce3b4be6 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [joonwoop@codeaurora.org: fixed minor conflict in sysctl.h] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: window-stats: Allow acct_wait_time to be tunedSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | Add sysctl interface to tune sched_acct_wait_time variable at runtime Change-Id: I38339cdb388a507019e429709a7c28e80b5b3585 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: window-stats: Handle policy change properlySrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sched_window_stat_policy influences task demand and thus various statistics maintained per-cpu like curr_runnable_sum. Changing policy non-atomically would lead to improper accounting. For example, when task is enqueued on a cpu's runqueue, its demand that is added to rq->cumulative_runnable_avg could be based on AVG policy and when its dequeued its demand that is removed can be based on MAX, leading to erroneous accounting. This change causes policy change to be "atomic" i.e all cpu's rq->lock are held and all task's window-stats are reset before policy is changed. Change-Id: I6a3e4fb7bc299dfc5c367693b5717a1ef518c32d CRs-Fixed: 687409 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [joonwoop@codeaurora.org: fixed minor conflict in include/linux/sched/sysctl.h. Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: remove sysctl control for HMP and power-aware task placementSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no real need to control HMP and power-aware task placement at runtime after kernel has booted. Boot-time control should be sufficient. Not allowing for runtime (sysctl) support simplifies the code quite a bit. Also rename sysctl_sched_enable_hmp_task_placement to be shorter. Change-Id: I60cae51a173c6f73b79cbf90c50ddd41a27604aa Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed minor conflict. p->nr_cpus_allowed == 1 has moved to core.c Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched/fair: Introduce scheduler boost for low latency workloadsSyed Rameez Mustafa2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Certain low latency bursty workloads require immediate use of highest capacity CPUs in HMP systems. Existing load tracking mechanisms may be unable to respond to the sudden surge in the system load within the latency requirements. Introduce the scheduler boost feature for such workloads. While boost is in effect the scheduler bypasses regular load based task placement and prefers highest capacity CPUs in the system for all non-small fair sched class tasks. Provide both a kernel and userspace API for software that may have apriori knowledge about the system workload. Change-Id: I783f585d1f8c97219e629d9c54f712318821922f Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed minor conflict in include/linux/sched/sysctl.h.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: notify cpufreq on over/underprovisioned CPUsSteve Muckle2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After a migration occurs the source and destination CPUs may not be running at frequencies which match the new task load on those CPUs. Previously, the scheduler was notifying cpufreq anytime a task greater than a certain size migrates. This is suboptimal however since this does not take into account the CPU's current frequency and other task activity that may be present. Change-Id: I5092bda3a517e1343f97e5a455957c25ee19b549 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
* | | sched: Introduce spill threshold tunables to manage overcommitmentSyed Rameez Mustafa2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the number of tasks intended for a cluster exceed the number of mostly idle CPUs in that cluster, the scheduler currently freely uses CPUs in other clusters if possible. While this is optimal for performance the power trade off can be quite significant. Introduce spill threshold tunables that govern the extent to which the scheduler should attempt to contain tasks within a cluster. Change-Id: I797e6c6b2aa0c3a376dad93758abe1d587663624 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org [joonwoop@codeaurora.org: fixed conflict in nohz_kick_needed()] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | | sched: add migration load change notifier for frequency guidanceSteve Muckle2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a task moves between CPUs in two different frequency domains the cpufreq governor may wish to immediately modify the frequency of both the source and destination CPUs of the migrating task. A tunable is provided to establish what size task is considered "significant" enough to warrant notifying cpufreq. Also fix a bug that would cause load to not be accounted properly during wakeup migrations. Change-Id: Ie8f6b1cc4d43a602840dac18590b42a81327c95a Signed-off-by: Steve Muckle <smuckle@codeaurora.org> [rameezmustafa@codeaurora.org: Add double rq locking for set_task_cpu()] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: add power aware scheduling sysctlSteve Muckle2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sched_enable_power_aware sysctl will control whether or not scheduling decisions are influenced by the power consumption of individual CPUs. Change-Id: I312f892cf76a3fccc4ecc8aa6703908b205267f0 Signed-off-by: Steve Muckle <smuckle@codeaurora.org> Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | | sched: Basic task placement support for HMP systemsSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HMP systems have cpus with different power and performance characteristics. Some cpus could offer better power at cost of lower performance while other cpus could offer better performance at cost of higher power. As a result, bandwidth consumed by a task to do some "fixed" amount of work could vary across cpus. Optimal task placement on HMP would involve placing a task on a cpu where it can meet its performance goals at lowest power cost. Since kernel has little to no awareness of performance goals of applications, we guestimate whether task is meeting its performance goals or not by looking at its cpu bandwidth consumption. High bandwidth consumption could imply that task's performance can improve by running on cpus with better capacity/performance-characterisitcs. This patch makes the basic changes to support HMP. It provides a configurable threshold and any task consuming bandwidth in excess of threshold will be placed on a cpu with better capacity. Change-Id: I3fd98edd430f73342fbef06411e8b2d1cf2f56fa Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict about members of p->se which are not available anymore.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>