path: root/kernel/sched/stop_task.c
* sched/walt: Re-add code to allow WALT to function (Ethan Chen, 2023-07-16)
Change-Id: Ieb1067c5e276f872ed4c722b7d1fabecbdad87e7
* Merge android-4.4@ceee5bd (v4.4.95) into msm-4.4 (Blagovest Kolenichev, 2017-11-06)
  * refs/heads/tmp-ceee5bd:
      BACKPORT: arm64: relocatable: suppress R_AARCH64_ABS64 relocations in vmlinux
      sched/core: fix have_sched_energy_data build warning
      sched/core: Warn if ENERGY_AWARE is enabled but data is missing
      sched: walt: Correct WALT window size initialization
      FROMLIST: sched/fair: Use wake_q length as a hint for wake_wide
      sched: WALT: account cumulative window demand
      sched/fair: remove useless variable in find_best_target
      sched/tune: access schedtune_initialized under CGROUP_SCHEDTUNE
      sched/fair: consider task utilization in group_max_util()
      sched/fair: consider task utilization in group_norm_util()
      sched/fair: enforce EAS mode
      sched/fair: ignore backup CPU when not valid
      sched/fair: trace energy_diff for non boosted tasks
      UPSTREAM: sched/fair: Sync task util before slow-path wakeup
      UPSTREAM: sched/fair: Fix usage of find_idlest_group() when the local group is idlest
      UPSTREAM: sched/fair: Fix usage of find_idlest_group() when no groups are allowed
      BACKPORT: sched/fair: Fix find_idlest_group when local group is not allowed
      UPSTREAM: sched/fair: Remove unnecessary comparison with -1
      BACKPORT: sched/fair: Move select_task_rq_fair slow-path into its own function
      UPSTREAM: sched/fair: Force balancing on nohz balance if local group has capacity
      UPSTREAM: sched/core: Add missing update_rq_clock() call in set_user_nice()
      UPSTREAM: sched/core: Add missing update_rq_clock() call for task_hot()
      UPSTREAM: sched/core: Add missing update_rq_clock() in detach_task_cfs_rq()
      UPSTREAM: sched/core: Add missing update_rq_clock() in post_init_entity_util_avg()
      UPSTREAM: sched/core: Fix find_idlest_group() for fork
      BACKPORT: sched/fair: Fix PELT integrity for new tasks
      BACKPORT: sched/cgroup: Fix cpu_cgroup_fork() handling
      UPSTREAM: sched/fair: Fix and optimize the fork() path
      BACKPORT: sched/fair: Make it possible to account fair load avg consistently
      cpufreq/sched: Consider max cpu capacity when choosing frequencies
      Linux 4.4.95
      FS-Cache: fix dereference of NULL user_key_payload
      fscrypto: require write access to mount to set encryption policy
      KEYS: Fix race between updating and finding a negative key
      fscrypt: fix dereference of NULL user_key_payload
      f2fs crypto: add missing locking for keyring_key access
      f2fs crypto: replace some BUG_ON()'s with error checks
      sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
      parisc: Fix double-word compare and exchange in LWS code on 32-bit kernels
      parisc: Avoid trashing sr2 and sr3 in LWS code
      pkcs7: Prevent NULL pointer dereference, since sinfo is not always set.
      KEYS: don't let add_key() update an uninstantiated key
      lib/digsig: fix dereference of NULL user_key_payload
      KEYS: encrypted: fix dereference of NULL user_key_payload
      rtlwifi: rtl8821ae: Fix connection lost problem
      clockevents/drivers/cs5535: Improve resilience to spurious interrupts
      bus: mbus: fix window size calculation for 4GB windows
      brcmsmac: make some local variables 'static const' to reduce stack size
      i2c: ismt: Separate I2C block read from SMBus block read
      ALSA: hda: Remove superfluous '-' added by printk conversion
      ALSA: seq: Enable 'use' locking in all configurations
      drm/nouveau/mmu: flush tlbs before deleting page tables
      drm/nouveau/bsp/g92: disable by default
      can: esd_usb2: Fix can_dlc value for received RTR, frames
      usb: musb: Check for host-mode using is_host_active() on reset interrupt
      usb: musb: sunxi: Explicitly release USB PHY on exit
      can: gs_usb: fix busy loop if no more TX context is available
      ALSA: usb-audio: Add native DSD support for Pro-Ject Pre Box S2 Digital
      usb: hub: Allow reset retry for USB2 devices on connect bounce
      usb: quirks: add quirk for WORLDE MINI MIDI keyboard
      usb: cdc_acm: Add quirk for Elatec TWN3
      USB: serial: metro-usb: add MS7820 device id
      USB: core: fix out-of-bounds access bug in usb_get_bos_descriptor()
      USB: devio: Revert "USB: devio: Don't corrupt user memory"
      ANDROID: binder: show high watermark of alloc->pages.
      ANDROID: binder: Add thread->process_todo flag.
      UPSTREAM: arm64: compat: Remove leftover variable declaration
      ANDROID: sched/fair: Select correct capacity state for energy_diff
      Revert "UPSTREAM: efi/libstub/arm64: Set -fpie when building the EFI stub"
      cpufreq: schedutil: clamp util to CPU maximum capacity
      FROMLIST: android: binder: Fix null ptr dereference in debug msg
      FROMLIST: android: binder: Change binder_shrinker to static
      cpufreq/sched: Use cpu max freq rather than policy max

Conflicts:
      include/linux/sched.h
      kernel/sched/core.c
      kernel/sched/fair.c

Change-Id: I2751f851df741f00e797deaf2119872b3dced655
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
* FROMLIST: sched/fair: Use wake_q length as a hint for wake_wide (Brendan Jackman, 2017-10-27)
(from https://patchwork.kernel.org/patch/9895261/)

This patch adds a parameter to select_task_rq, sibling_count_hint, allowing the caller, where it has this information, to inform the sched_class of the number of tasks being woken up as part of the same event.

The wake_q mechanism is one case where this information is available.

select_task_rq_fair can then use the information to detect that it needs to widen the search space for task placement in order to avoid overloading the last-level cache domain's CPUs.

* * *

The reason I am investigating this change is the following use case on ARM big.LITTLE (asymmetrical CPU capacity): 1 task per CPU, which all repeatedly do X amount of work then pthread_barrier_wait (i.e. sleep until the last task finishes its X and hits the barrier). On big.LITTLE, the tasks which get a "big" CPU finish faster, and then those CPUs pull over the tasks that are still running:

    v CPU v            ->time->

                  -------------
    0 (big)       11111  /333
                  -------------
    1 (big)       22222   /444|
                  -------------
    2 (LITTLE)    333333/
                  -------------
    3 (LITTLE)    444444/
                  -------------

Now when task 4 hits the barrier (at |) and wakes the others up, there are 4 tasks with prev_cpu=<big> and 0 tasks with prev_cpu=<little>. want_affine therefore means that we'll only look in CPUs 0 and 1 (sd_llc), so tasks will be unnecessarily coscheduled on the bigs until the next load balance, something like this:

    v CPU v            ->time->

                  ------------------------
    0 (big)       11111  /333  31313\33333
                  ------------------------
    1 (big)       22222   /444|424\4444444
                  ------------------------
    2 (LITTLE)    333333/       \222222
                  ------------------------
    3 (LITTLE)    444444/          \1111
                  ------------------------
                                ^^^
                        underutilization

So, I'm trying to get want_affine = 0 for these tasks. I don't _think_ any incarnation of the wakee_flips mechanism can help us here because which task is waker and which tasks are wakees generally changes with each iteration. However pthread_barrier_wait (or more accurately FUTEX_WAKE) has the nice property that we know exactly how many tasks are being woken, so we can cheat.

It might be a disadvantage that we "widen" _every_ task that's woken in an event, while select_idle_sibling would work fine for the first sd_llc_size - 1 tasks.

IIUC, if wake_affine() behaves correctly this trick wouldn't be necessary on SMP systems, so it might be best guarded by the presence of SD_ASYM_CPUCAPACITY?

* * *

Final note: in order to observe "perfect" behaviour for this use case, I also had to disable the TTWU_QUEUE sched feature. Suppose during the wakeup above we are working through the work queue and have placed tasks 3 and 2, and are about to place task 1:

    v CPU v            ->time->

                  --------------
    0 (big)       11111  /333  3
                  --------------
    1 (big)       22222   /444|4
                  --------------
    2 (LITTLE)    333333/      2
                  --------------
    3 (LITTLE)    444444/          <- Task 1 should go here
                  --------------

If TTWU_QUEUE is enabled, we will not yet have enqueued task 2 (having instead sent a reschedule IPI) or attached its load to CPU 2. So we are likely to also place task 1 on CPU 2.

Disabling TTWU_QUEUE means that we enqueue task 2 before placing task 1, solving this issue. TTWU_QUEUE is there to minimise rq lock contention, and I guess that this contention is less of an issue on big.LITTLE systems since they have relatively few CPUs, which suggests the trade-off makes sense here.

Change-Id: I2080302839a263e0841a89efea8589ea53bbda9c
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
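A rough sketch of the mechanism being described, illustrative only: the extra parameter mirrors the patch description, but the widening condition and the llc_size() helper below are assumptions, not the actual kernel code.

    /*
     * Illustrative sketch of the sibling_count_hint idea (not the actual
     * patch): the waker tells select_task_rq_fair() how many tasks the
     * current wake event covers, and the fair class skips the affine
     * (LLC-only) search when the hint exceeds what the LLC domain can hold.
     */
    static int select_task_rq_fair(struct task_struct *p, int prev_cpu,
                                   int sd_flag, int wake_flags,
                                   int sibling_count_hint)
    {
        int want_affine = 0;

        if (sd_flag & SD_BALANCE_WAKE) {
            /* llc_size() is a stand-in for the per-CPU sd_llc_size value. */
            want_affine = sibling_count_hint <= llc_size(prev_cpu);
        }

        /* ... continue with the usual fast/slow path selection ... */
        return prev_cpu;
    }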
* sched: Introduce Window Assisted Load Tracking (WALT) (Srivatsa Vaddagiri, 2016-08-11)
Use a window-based view of time in order to track task demand and CPU utilization in the scheduler.

Window Assisted Load Tracking (WALT) implementation credits: Srivatsa Vaddagiri, Steve Muckle, Syed Rameez Mustafa, Joonwoo Park, Pavan Kumar Kondeti, Olav Haugan

2016-03-06: Integration with EAS/refactoring by Vikram Mulukutla and Todd Kjos

Change-Id: I21408236836625d4e7d7de1843d20ed5ff36c708

Includes fixes for issues:

eas/walt: Use walt_ktime_clock() instead of ktime_get_ns() to avoid a race resulting in watchdog resets
BUG: 29353986
Change-Id: Ic1820e22a136f7c7ebd6f42e15f14d470f6bbbdb

Handle WALT accounting anomaly during resume: during resume, there is a corner case where, on wakeup, a task's prev_runnable_sum can go negative. This is a workaround that fixes the condition and warns (instead of crashing).
BUG: 29464099
Change-Id: I173e7874324b31a3584435530281708145773508

Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
[jstultz: fwdported to 4.4]
Signed-off-by: John Stultz <john.stultz@linaro.org>
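The core idea of window-based tracking can be illustrated with a small, self-contained sketch. This is not WALT's code; the 20 ms window size and all names are arbitrary assumptions for illustration.

    #include <stdint.h>

    #define WINDOW_NS 20000000ULL          /* hypothetical 20 ms window */

    struct task_demand {
        uint64_t window_start;             /* start of the current window */
        uint64_t curr_window;              /* runtime accumulated in it   */
        uint64_t demand;                   /* sample from the last window */
    };

    /* Account 'delta' ns of runtime ending at time 'now' (both in ns).
     * A real implementation would split 'delta' across a window boundary;
     * this sketch attributes it wholly to the new window. */
    static void account_runtime(struct task_demand *t, uint64_t now,
                                uint64_t delta)
    {
        if (now - t->window_start >= WINDOW_NS) {
            /* Window rolled over: publish the completed window as the
             * task's demand and start a fresh window. */
            t->demand = t->curr_window;
            t->curr_window = 0;
            t->window_start +=
                ((now - t->window_start) / WINDOW_NS) * WINDOW_NS;
        }
        t->curr_window += delta;
    }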
* sched: Add separate load tracking histogram to predict loads (Pavankumar Kondeti, 2016-03-23)
The current window-based load tracking only saves history for five windows. A historically heavy task's heavy load will be completely forgotten after five windows of light load. Even before the five windows expire, a heavy task that wakes up on the same CPU it used to run on won't trigger any frequency change until the end of the window; it would starve for the entire window. It also adds one "small" load window to the history, because it is accumulating load at a low frequency, further reducing the tracked load for this heavy task.

Ideally, the scheduler should be able to identify such tasks and notify the governor to increase frequency immediately after they wake up. Add a histogram for each task to track a much longer load history. A prediction is made based on the runtime of the previous or current window, the histogram data, and the load tracked in recent windows. The predictions of all tasks currently running or runnable on a CPU are aggregated and reported to the CPUFreq governor in sched_get_cpus_busy(). sched_get_cpus_busy() now returns predicted busy time in addition to previous window busy time and new task busy time, scaled to the CPU's maximum possible frequency.

Tunables:

- /proc/sys/kernel/sched_gov_alert_freq (KHz)

This tunable can be used to further filter the notifications. A frequency alert notification is sent only when the predicted load exceeds the previous window's load by sched_gov_alert_freq converted to load.

Change-Id: If29098cd2c5499163ceaff18668639db76ee8504
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts around __migrate_task() and removed changes for CONFIG_SCHED_QHMP.]
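A self-contained illustration of the histogram idea, not the actual algorithm: past window loads are bucketed, and the prediction favours the heaviest bucket the task has hit, so a heavy task is not forgotten after a few light windows. The bucket count, scaling, and prediction policy are all assumptions made for illustration.

    #include <stdint.h>

    #define NUM_BUCKETS 10

    struct task_hist {
        uint32_t bucket[NUM_BUCKETS];      /* hit counts per load bucket */
    };

    /* Record one completed window whose load is 'load' out of 'max_load'. */
    static void hist_add(struct task_hist *h, uint32_t load, uint32_t max_load)
    {
        uint32_t i = (uint32_t)((uint64_t)load * NUM_BUCKETS / (max_load + 1));
        h->bucket[i]++;
    }

    /* Predict using the highest bucket ever hit (illustrative policy only). */
    static uint32_t hist_predict(const struct task_hist *h, uint32_t max_load)
    {
        for (int i = NUM_BUCKETS - 1; i >= 0; i--) {
            if (h->bucket[i])
                return (uint32_t)((uint64_t)(i + 1) * max_load / NUM_BUCKETS);
        }
        return 0;
    }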
* sched: clean up fixup_hmp_sched_stats() (Joonwoo Park, 2016-03-23)
The commit 392edf4969d20 ("sched: avoid stale cumulative_runnable_avg HMP statistics") introduced the callback function fixup_hmp_sched_stats() so update_history() can avoid a decrement-and-increment pair of HMP stats. However, the commit also made the fixup function do an obscure p->ravg.demand update, which isn't the cleanest approach. Revise fixup_hmp_sched_stats() so the caller can update p->ravg.demand directly.

Change-Id: Id54667d306495d2109c26362813f80f08a1385ad
[joonwoop@codeaurora.org: stripped out CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* sched: avoid stale cumulative_runnable_avg HMP statistics (Joonwoo Park, 2016-03-23)
When a new window starts for a task and the task is on a rq, the scheduler momentarily decreases the rq's cumulative_runnable_avg, re-accounts the task's demand, and then increases the rq's cumulative_runnable_avg with the newly accounted demand. There is therefore a short period during which the rq's cumulative_runnable_avg is less than what it's supposed to be. Meanwhile, there is a chance that another CPU is searching for the best CPU to place a task on and makes a suboptimal decision based on the momentarily stale cumulative_runnable_avg.

Fix this by adding or subtracting the delta between the task's old and new demand instead of decrementing and incrementing by the task's entire load.

Change-Id: I3c9329961e6f96e269fa13359e7d1c39c4973ff2
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
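A minimal sketch of the before/after behaviour described here; the variable and function names are illustrative, not taken from the patch.

    #include <stdint.h>

    /* Before: remove-then-re-add leaves the aggregate momentarily low; a
     * concurrent reader on another CPU can observe the understated value. */
    static void fixup_two_step(uint64_t *cumulative, uint64_t old_demand,
                               uint64_t new_demand)
    {
        *cumulative -= old_demand;
        /* <-- a reader here sees the sum without this task at all */
        *cumulative += new_demand;
    }

    /* After: apply only the delta, so the aggregate never drops below the
     * value it is supposed to have. */
    static void fixup_delta(uint64_t *cumulative, uint64_t old_demand,
                            uint64_t new_demand)
    {
        *cumulative += new_demand - old_demand;
    }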
* sched: Consolidate hmp stats into their own struct (Srivatsa Vaddagiri, 2016-03-23)
Key hmp stats (nr_big_tasks, nr_small_tasks and cumulative_runnable_average) are currently maintained per-cpu in 'struct rq'. Merge those stats into their own structure (struct hmp_sched_stats) and modify the impacted functions to deal with the newly introduced structure.

This cleanup is required for a subsequent patch which fixes various issues with the use of the CFS_BANDWIDTH feature in the HMP scheduler.

Change-Id: Ieffc10a3b82a102f561331bc385d042c15a33998
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in __update_load_avg().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
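Based on the stats named in this message, the consolidated structure has roughly this shape; this is a sketch, and the field types are assumptions.

    /* Per-CPU HMP statistics gathered into one structure (sketch). */
    struct hmp_sched_stats {
        int nr_big_tasks;
        int nr_small_tasks;
        u64 cumulative_runnable_avg;
    };

    /* 'struct rq' then embeds it instead of carrying the fields directly:
     *
     *     struct rq {
     *         ...
     *         struct hmp_sched_stats hmp_stats;
     *         ...
     *     };
     */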
* sched: Make sched_class::set_cpus_allowed() unconditional (Peter Zijlstra, 2015-08-12)
Give every class a set_cpus_allowed() method; this enables some small optimization in the RT and DL implementations by avoiding a double cpumask_weight() call.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dedekind1@gmail.com
Cc: juri.lelli@arm.com
Cc: mgorman@suse.de
Cc: riel@redhat.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20150515154833.614517487@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
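A sketch of what "every class gets a set_cpus_allowed() method" looks like in practice; the common helper below mirrors the idea and is illustrative rather than a verbatim copy of the patch.

    /* Common implementation that classes without special requirements
     * (such as the stop class) can point their callback at. */
    void set_cpus_allowed_common(struct task_struct *p,
                                 const struct cpumask *new_mask)
    {
        cpumask_copy(&p->cpus_allowed, new_mask);
        p->nr_cpus_allowed = cpumask_weight(new_mask);
    }

    /* With every class providing the callback, the core can simply call
     *
     *     p->sched_class->set_cpus_allowed(p, new_mask);
     *
     * without first checking whether the method exists. */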
* sched: Provide update_curr callbacks for stop/idle scheduling classes (Thomas Gleixner, 2014-11-23)
Chris bisected a NULL pointer dereference in task_sched_runtime() to commit 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'.

Chris observed crashes in atop or other /proc walking programs when he started fork bombs on his machine. He assumed that this is a new exit race, but that does not make any sense when looking at that commit.

What's interesting is that the commit provides update_curr callbacks for all scheduling classes except stop_task and idle_task.

While nothing can ever hit that via the clock_nanosleep() and clock_gettime() interfaces, which have been the target of the commit in question, the author obviously forgot that there are other code paths which invoke task_sched_runtime():

    do_task_stat()
     thread_group_cputime_adjusted()
      thread_group_cputime()
       task_cputime()
        task_sched_runtime()
         if (task_current(rq, p) && task_on_rq_queued(p)) {
           update_rq_clock(rq);
           p->sched_class->update_curr(rq);
         }

If the stats are read for a stomp machine task, aka 'migration/N', and that task is current on its CPU, this will happily call the NULL pointer of stop_task->update_curr. Ooops.

Chris's observation that this happens faster when he runs the fork bomb makes sense, as the fork bomb will kick migration threads more often, so the probability of hitting the issue increases.

Add the missing update_curr callbacks to the scheduler classes stop_task and idle_task. While idle tasks cannot be monitored via /proc, we have other means to hit the idle case.

Fixes: 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'
Reported-by: Chris Mason <clm@fb.com>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
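For the stop class, the fix amounts to providing an update_curr callback so the indirect call above is never a NULL pointer; a sketch of its shape in kernel/sched/stop_task.c (the stop task's runtime bookkeeping happens elsewhere, so the callback can be empty):

    static void update_curr_stop(struct rq *rq)
    {
        /* Nothing to do: present only so that
         * p->sched_class->update_curr(rq) is never a NULL call. */
    }

    const struct sched_class stop_sched_class = {
        /* ... existing callbacks ... */
        .update_curr    = update_curr_stop,
    };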
* sched: Add wrapper for checking task_struct::on_rq (Kirill Tkhai, 2014-08-20)
Implement task_on_rq_queued() and use it everywhere instead of the raw on_rq check. No functional changes.

The only exception is that we do not use the wrapper in check_for_tasks(), because that would require exporting task_on_rq_queued() in global header files. The next patch in the series brings it back, so we avoid shuffling it back and forth.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1408528052.23412.87.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
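The wrapper is a small readability helper over the raw field; roughly this shape (the TASK_ON_RQ_QUEUED constant comes from the same series):

    #define TASK_ON_RQ_QUEUED 1

    static inline int task_on_rq_queued(struct task_struct *p)
    {
        return p->on_rq == TASK_ON_RQ_QUEUED;
    }

    /* Call sites then read as a predicate instead of a bare field access:
     *
     *     if (task_on_rq_queued(p))      // instead of: if (p->on_rq)
     *         ...;
     */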
* sched, nohz: Change rq->nr_running to always use wrappers (Kirill Tkhai, 2014-05-22)
Sometimes ->nr_running may cross 2, but an interrupt is not sent to the rq's CPU. In this case we don't re-enable the timer. This looks like it may be the reason for rare unexpected effects when nohz is enabled.

The patch replaces all direct modifications of nr_running with wrappers and makes add_nr_running() take care of crossing the boundary.

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140508225830.2469.97461.stgit@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>
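The shape of the wrapper, sketched from the description above; the body of the overload branch is indicated only by a comment, since the exact nohz hook is not shown here.

    static inline void add_nr_running(struct rq *rq, unsigned int count)
    {
        unsigned int prev_nr = rq->nr_running;

        rq->nr_running = prev_nr + count;

        if (prev_nr < 2 && rq->nr_running >= 2) {
            /* The rq just crossed the "more than one runnable task"
             * boundary: this is the single place where the nohz/tick
             * machinery gets poked, so the case can no longer be missed
             * by a call site that modified nr_running directly. */
        }
    }

    static inline void sub_nr_running(struct rq *rq, unsigned int count)
    {
        rq->nr_running -= count;
    }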
* sched: Fix hotplug task migration (Peter Zijlstra, 2014-02-21)
Dan Carpenter reported:

    > kernel/sched/rt.c:1347 pick_next_task_rt() warn: variable dereferenced before check 'prev' (see line 1338)
    > kernel/sched/deadline.c:1011 pick_next_task_dl() warn: variable dereferenced before check 'prev' (see line 1005)

Kirill also spotted that migrate_tasks() will have an instant NULL deref because pick_next_task() will immediately deref prev.

Instead of fixing all the corner cases because migrate_tasks() can pass in a NULL prev task in the unlikely case of hot-un-plug, provide a fake task such that we can remove all the NULL checks from the far more common paths.

A further problem, not previously spotted, is that because we pushed pre_schedule() and idle_balance() into pick_next_task() we now need to avoid those getting called and pulling more tasks onto our dying CPU. We avoid pull_{dl,rt}_task() by setting fake_task.prio to MAX_PRIO+1. We also note that since we call pick_next_task() exactly as many times as we have runnable tasks present, we should never land in idle_balance().

Fixes: 38033c37faab ("sched: Push down pre_schedule() and idle_balance()")
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Kirill Tkhai <tkhai@yandex.ru>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140212094930.GB3545@laptop.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
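A sketch of the fake-task approach described above, reconstructed from the message rather than copied from the patch:

    static void put_prev_task_fake(struct rq *rq, struct task_struct *prev)
    {
        /* Nothing to put: the fake task was never really running. */
    }

    static const struct sched_class fake_sched_class = {
        .put_prev_task  = put_prev_task_fake,
    };

    static struct task_struct fake_task = {
        /* MAX_PRIO + 1 keeps pull_rt_task()/pull_dl_task() away
         * while the CPU is dying. */
        .prio           = MAX_PRIO + 1,
        .sched_class    = &fake_sched_class,
    };

    /* migrate_tasks() can then call pick_next_task(rq, &fake_task), and the
     * per-class pick_next_task_*() implementations need no NULL checks. */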
* sched: Push put_prev_task() into pick_next_task() (Peter Zijlstra, 2014-02-10)
In order to avoid having to do put/set on a whole cgroup hierarchy when we context switch, push the put into pick_next_task() so that both operations are in the same function. Further changes then allow us to possibly optimize away redundant work.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1328936700.2476.17.camel@laptop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
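For the stop class the restructuring is small; a sketch of what pick_next_task_stop() looks like once it receives the previous task and performs the put itself (reconstructed for illustration, not verbatim):

    static struct task_struct *
    pick_next_task_stop(struct rq *rq, struct task_struct *prev)
    {
        struct task_struct *stop = rq->stop;

        if (!stop || !stop->on_rq)
            return NULL;

        /* The put now happens here, next to the pick, instead of in the
         * core scheduler before the class method is invoked. */
        put_prev_task(rq, prev);

        stop->se.exec_start = rq_clock_task(rq);

        return stop;
    }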
* sched/deadline: Add SCHED_DEADLINE structures & implementation (Dario Faggioli, 2014-01-13)
Introduces the data structures, constants and symbols needed for the SCHED_DEADLINE implementation.

Core data structures of SCHED_DEADLINE are defined, along with their initializers. Hooks for checking if a task belongs to the new policy are also added where they are needed.

Adds a scheduling class, in sched/dl.c, and a new policy called SCHED_DEADLINE. It is an implementation of the Earliest Deadline First (EDF) scheduling algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS) that makes it possible to isolate the behaviour of tasks from each other.

The typical -deadline task is made up of a computation phase (instance) which is activated in a periodic or sporadic fashion. The expected (maximum) duration of such a computation is called the task's runtime; the time interval by which each instance needs to be completed is called the task's relative deadline. The task's absolute deadline is dynamically calculated as the time instant a task (better, an instance) activates plus the relative deadline.

The EDF algorithm selects the task with the smallest absolute deadline as the one to be executed first, while the CBS ensures that each task runs for at most its runtime every (relative) deadline length time interval, avoiding any interference between different tasks (bandwidth isolation). Thanks to this feature, tasks that do not strictly comply with the computational model sketched above can also effectively use the new policy.

To summarize, this patch:
 - introduces the data structures, constants and symbols needed;
 - implements the core logic of the scheduling algorithm in the new scheduling class file;
 - provides all the glue code between the new scheduling class and the core scheduler, and refines the interactions between sched/dl and the other existing scheduling classes.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
Signed-off-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-4-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
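The runtime/deadline/period model described above is what user space later declares through the sched_setattr() syscall (added by a companion patch in this series); a minimal, self-contained sketch, assuming a kernel and libc where the syscall number is available:

    #include <linux/sched.h>        /* SCHED_DEADLINE */
    #include <sys/syscall.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;        /* unused for -deadline tasks */
        uint32_t sched_priority;    /* unused for -deadline tasks */
        uint64_t sched_runtime;     /* worst-case runtime per instance (ns) */
        uint64_t sched_deadline;    /* relative deadline (ns) */
        uint64_t sched_period;      /* activation period (ns) */
    };

    int become_deadline_task(void)
    {
        struct sched_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size           = sizeof(attr);
        attr.sched_policy   = SCHED_DEADLINE;
        attr.sched_runtime  = 10ULL * 1000 * 1000;   /* 10 ms of CPU ...     */
        attr.sched_deadline = 30ULL * 1000 * 1000;   /* ... within 30 ms ... */
        attr.sched_period   = 100ULL * 1000 * 1000;  /* ... every 100 ms     */

        /* There is no glibc wrapper, so the raw syscall is used. */
        return syscall(SYS_sched_setattr, 0 /* self */, &attr, 0 /* flags */);
    }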
* sched/numa: Introduce migrate_swap() (Peter Zijlstra, 2013-10-09)
Use the new stop_two_cpus() to implement migrate_swap(), a function that flips two tasks between their respective CPUs.

I'm fairly sure there's a less crude way than employing the stop_two_cpus() method, but everything I tried either got horribly fragile and/or complex. So keep it simple for now.

The notable detail is how we 'migrate' tasks that aren't runnable anymore. We'll make it appear like we migrated them before they went to sleep. The sole difference is the previous CPU in the wakeup path, so we override this.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/1381141781-10992-39-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched: Use an accessor to read the rq clock (Frederic Weisbecker, 2013-05-28)
Read the runqueue clock through an accessor. This prepares for adding a debugging infrastructure to detect missing or redundant calls to update_rq_clock() between a scheduler's entry and exit point.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1365724262-20142-6-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
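The accessors are thin wrappers over the raw fields, roughly of this shape (a sketch; the planned debug check can later be hung off them):

    static inline u64 rq_clock(struct rq *rq)
    {
        return rq->clock;
    }

    static inline u64 rq_clock_task(struct rq *rq)
    {
        return rq->clock_task;
    }

    /* Callers switch from 'rq->clock_task' to 'rq_clock_task(rq)', e.g. when
     * stamping se.exec_start in the stop class. */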
* sched: Fix migration thread runtime bogosity (Mike Galbraith, 2012-08-13)
Make the stop scheduler class do the same accounting as other classes. Migration threads can be caught in the act while doing exec balancing, leading to the below, due to the use of the unmaintained ->se.exec_start. The load that triggered this particular instance was an apparently out-of-control, heavily threaded application that does system monitoring in what equated to an exec bomb, with one of the VERY frequently migrated tasks being ps.

    %CPU   PID USER CMD
    99.3    45 root [migration/10]
    97.7    53 root [migration/12]
    97.0    57 root [migration/13]
    90.1    49 root [migration/11]
    89.6    65 root [migration/15]
    88.7    17 root [migration/3]
    80.4    37 root [migration/8]
    78.1    41 root [migration/9]
    44.2    13 root [migration/2]

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344051854.6739.19.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
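Roughly, "the same accounting as other classes" means the stop class stamps exec_start when its task goes on the CPU and folds the elapsed time into sum_exec_runtime when it comes off; a simplified sketch of that shape, not the exact patch:

    static void set_curr_task_stop(struct rq *rq)
    {
        /* Start of a run: remember when the stop task went on the CPU. */
        rq->curr->se.exec_start = rq->clock_task;
    }

    static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
    {
        struct task_struct *curr = rq->curr;
        u64 delta_exec;

        delta_exec = rq->clock_task - curr->se.exec_start;
        if ((s64)delta_exec < 0)
            delta_exec = 0;

        /* Charge the runtime like the other classes do, so ps and friends
         * no longer show bogus CPU usage for migration/N threads. */
        curr->se.sum_exec_runtime += delta_exec;
        account_group_exec_runtime(curr, delta_exec);
        cpuacct_charge(curr, delta_exec);

        curr->se.exec_start = rq->clock_task;
    }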
* sched: Move all scheduler bits into kernel/sched/ (Peter Zijlstra, 2011-11-17)
There are too many sched*.[ch] files in kernel/; give them their own directory. (No code changed, other than Makefile glue added.)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>