From 58c453484f7ef1b81d9da4a5696e849a92e61883 Mon Sep 17 00:00:00 2001
From: Giovanni Gherdovich
Date: Fri, 5 Aug 2016 10:21:56 +0200
Subject: sched/cputime: Mitigate performance regression in times()/clock_gettime()

Commit:

  6e998916dfe3 ("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency")

fixed a problem whereby clock_nanosleep() followed by clock_gettime() could
allow a task to wake early. It addressed the problem by calling the
scheduling classes update_curr() when the cputimer starts.

Said change induced a considerable performance regression on the syscalls
times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID). There are some
debuggers and applications that monitor their own performance and
accidentally depend on the performance of these specific calls.

This patch mitigates the performance loss by prefetching data into the CPU
cache, as stalls due to cache misses appear to be where most time is spent
in our benchmarks.

Here are the performance gains of this patch over v4.7-rc7 on a Sandy
Bridge box with 32 logical cores and 2 NUMA nodes. The test is repeated
with a variable number of threads, from 2 to 4*num_cpus; the results are
in seconds and correspond to the average of 10 runs; the percentage gain
is computed with (before-after)/before, so a positive value is an
improvement (it's faster). The improvement varies between a few percent
for 5-20 threads and more than 10% for 2 or >20 threads.

pound_clock_gettime:

    threads       4.7-rc7     patched 4.7-rc7
    [num]         [secs]      [secs (percent)]
      2           3.48        3.06 ( 11.83%)
      5           3.33        3.25 (  2.40%)
      8           3.37        3.26 (  3.30%)
     12           3.32        3.37 ( -1.60%)
     21           4.01        3.90 (  2.74%)
     30           3.63        3.36 (  7.41%)
     48           3.71        3.11 ( 16.27%)
     79           3.75        3.16 ( 15.74%)
    110           3.81        3.25 ( 14.80%)
    128           3.88        3.31 ( 14.76%)

pound_times:

    threads       4.7-rc7     patched 4.7-rc7
    [num]         [secs]      [secs (percent)]
      2           3.65        3.25 ( 11.03%)
      5           3.45        3.17 (  7.92%)
      8           3.52        3.22 (  8.69%)
     12           3.29        3.36 ( -2.04%)
     21           4.07        3.92 (  3.78%)
     30           3.87        3.40 ( 12.17%)
     48           3.79        3.16 ( 16.61%)
     79           3.88        3.28 ( 15.42%)
    110           3.90        3.38 ( 13.35%)
    128           4.00        3.38 ( 15.45%)

pound_times and pound_clock_gettime are two benchmarks included in the
MMTests framework. They launch a given number of threads which repeatedly
call times() or clock_gettime(). The results above can be reproduced by
cloning MMTests from github.com and running the "poundtime" workload:

  $ git clone https://github.com/gormanm/mmtests.git
  $ cd mmtests
  $ cp configs/config-global-dhp__workload_poundtime config
  $ ./run-mmtests.sh --run-monitor $(uname -r)

The above will run "poundtime" measuring the kernel currently running on
the machine; once a new kernel is installed and the machine rebooted,
running again

  $ cd mmtests
  $ ./run-mmtests.sh --run-monitor $(uname -r)

will produce results to compare with. A comparison table will be output
with:

  $ cd mmtests/work/log
  $ ../../compare-kernels.sh

The table will contain a lot of entries; grepping for "Amean" (as in
"arithmetic mean") will give the tables presented above. The source code
for the two benchmarks is reported at the end of this changelog for
clarity.

The cache misses addressed by this patch were found using a combination of
`perf top`, `perf record` and `perf annotate`. The incriminated lines were
found to be

    struct sched_entity *curr = cfs_rq->curr;

and

    delta_exec = now - curr->exec_start;

in the function update_curr() from kernel/sched/fair.c. This patch
prefetches that data from memory just before update_curr() is called in
the execution path of interest.
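To see the pattern outside the kernel, here is a minimal user-space sketch
of the same prefetch-before-use idea, written with GCC's
__builtin_prefetch(); the kernel patch below uses the prefetch() helper
from <linux/prefetch.h> instead, and the struct here is an illustrative
stand-in rather than real scheduler data:

  #include <stddef.h>

  /* Stand-in for the scheduler entity; the real patch prefetches
   * cfs_rq->curr and curr->exec_start instead. */
  struct entity {
          unsigned long long exec_start;
  };

  unsigned long long sum_deltas(struct entity **ents, size_t n,
                                unsigned long long now)
  {
          unsigned long long total = 0;
          size_t i;

          for (i = 0; i < n; i++) {
                  /* Request the next element's cache line early, so the
                   * miss overlaps with the current iteration's work. */
                  if (i + 1 < n)
                          __builtin_prefetch(&ents[i + 1]->exec_start);
                  total += now - ents[i]->exec_start;
          }
          return total;
  }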
A comparison of the total number of cycles before and after the patch
follows; the data is obtained using `perf stat -r 10 -ddd` running over
the same sequence of number of threads used above (a positive gain is an
improvement):

  threads   cycles before               cycles after                gain

    2      19,699,563,964 (+-1.19%)    17,358,917,517 (+-1.85%)   11.88%
    5      47,401,089,566 (+-2.96%)    45,103,730,829 (+-0.97%)    4.85%
    8      80,923,501,004 (+-3.01%)    71,419,385,977 (+-0.77%)   11.74%
   12     112,326,485,473 (+-0.47%)   110,371,524,403 (+-0.47%)    1.74%
   21     193,455,574,299 (+-0.72%)   180,120,667,904 (+-0.36%)    6.89%
   30     315,073,519,013 (+-1.64%)   271,222,225,950 (+-1.29%)   13.92%
   48     321,969,515,332 (+-1.48%)   273,353,977,321 (+-1.16%)   15.10%
   79     337,866,003,422 (+-0.97%)   289,462,481,538 (+-1.05%)   14.33%
  110     338,712,691,920 (+-0.78%)   290,574,233,170 (+-0.77%)   14.21%
  128     348,384,794,006 (+-0.50%)   292,691,648,206 (+-0.66%)   15.99%

A comparison of cache miss vs total cache loads ratios, before and after
the patch (again from the `perf stat -r 10 -ddd` tables):

  threads   L1 misses/total*100      L1 misses/total*100      gain
            before                   after

    2        7.43 (+-4.90%)           7.36 (+-4.70%)           0.94%
    5       13.09 (+-4.74%)          13.52 (+-3.73%)          -3.28%
    8       13.79 (+-5.61%)          12.90 (+-3.27%)           6.45%
   12       11.57 (+-2.44%)           8.71 (+-1.40%)          24.72%
   21       12.39 (+-3.92%)           9.97 (+-1.84%)          19.53%
   30       13.91 (+-2.53%)          11.73 (+-2.28%)          15.67%
   48       13.71 (+-1.59%)          12.32 (+-1.97%)          10.14%
   79       14.44 (+-0.66%)          13.40 (+-1.06%)           7.20%
  110       15.86 (+-0.50%)          14.46 (+-0.59%)           8.83%
  128       16.51 (+-0.32%)          15.06 (+-0.78%)           8.78%

As a final note, the following shows the evolution of performance figures
in the "poundtime" benchmark and pinpoints commit 6e998916dfe3
("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency") as
a major source of degradation, mostly unaddressed to this day (figures
expressed in seconds).

pound_clock_gettime:

  threads   parent of 6e998916dfe3   6e998916dfe3 itself    4.7-rc7

    2       2.23                     3.68 ( -64.56%)        3.48 ( -55.48%)
    5       2.83                     3.78 ( -33.42%)        3.33 ( -17.43%)
    8       2.84                     4.31 ( -52.12%)        3.37 ( -18.76%)
   12       3.09                     3.61 ( -16.74%)        3.32 (  -7.17%)
   21       3.14                     4.63 ( -47.36%)        4.01 ( -27.71%)
   30       3.28                     5.75 ( -75.37%)        3.63 ( -10.80%)
   48       3.02                     6.05 (-100.56%)        3.71 ( -22.99%)
   79       2.88                     6.30 (-118.90%)        3.75 ( -30.26%)
  110       2.95                     6.46 (-119.00%)        3.81 ( -29.24%)
  128       3.05                     6.42 (-110.08%)        3.88 ( -27.04%)

pound_times:

  threads   parent of 6e998916dfe3   6e998916dfe3 itself    4.7-rc7

    2       2.27                     3.73 ( -64.71%)        3.65 ( -61.14%)
    5       2.78                     3.77 ( -35.56%)        3.45 ( -23.98%)
    8       2.79                     4.41 ( -57.71%)        3.52 ( -26.05%)
   12       3.02                     3.56 ( -17.94%)        3.29 (  -9.08%)
   21       3.10                     4.61 ( -48.74%)        4.07 ( -31.34%)
   30       3.33                     5.75 ( -72.53%)        3.87 ( -16.01%)
   48       2.96                     6.06 (-105.04%)        3.79 ( -28.10%)
   79       2.88                     6.24 (-116.83%)        3.88 ( -34.81%)
  110       2.98                     6.37 (-114.08%)        3.90 ( -31.12%)
  128       3.10                     6.35 (-104.61%)        4.00 ( -28.87%)

The source code of the two benchmarks follows.
To compile the two:

  NR_THREADS=42
  for FILE in pound_times pound_clock_gettime; do
      gcc -O2 -DNUM_THREADS=$NR_THREADS $FILE.c -o $FILE -lrt -lpthread
  done

==== BEGIN pound_times.c ====

#include <stdio.h>
#include <pthread.h>
#include <sys/times.h>

struct tms start;

void *pound(void *threadid)
{
        struct tms end;
        int oldutime = 0;
        int utime;
        int i;

        for (i = 0; i < 5000000 / NUM_THREADS; i++) {
                times(&end);
                utime = ((int)end.tms_utime - (int)start.tms_utime);
                if (oldutime > utime) {
                        printf("utime decreased, was %d, now %d!\n",
                               oldutime, utime);
                }
                oldutime = utime;
        }
        pthread_exit(NULL);
}

int main()
{
        pthread_t th[NUM_THREADS];
        long i;

        times(&start);
        for (i = 0; i < NUM_THREADS; i++) {
                pthread_create(&th[i], NULL, pound, (void *)i);
        }
        pthread_exit(NULL);
        return 0;
}

==== END pound_times.c ====

==== BEGIN pound_clock_gettime.c ====

#include <stdio.h>
#include <time.h>
#include <pthread.h>

void *pound(void *threadid)
{
        struct timespec ts;
        int rc, i;
        unsigned long prev = 0, this = 0;

        for (i = 0; i < 5000000 / NUM_THREADS; i++) {
                rc = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
                if (rc < 0)
                        perror("clock_gettime");
                this = (ts.tv_sec * 1000000000) + ts.tv_nsec;
                if (0 && this < prev)
                        printf("%lu ns timewarp at iteration %d\n",
                               prev - this, i);
                prev = this;
        }
        pthread_exit(NULL);
}

int main()
{
        pthread_t th[NUM_THREADS];
        long rc, i;

        for (i = 0; i < NUM_THREADS; i++) {
                rc = pthread_create(&th[i], NULL, pound, (void *)i);
                if (rc < 0)
                        perror("pthread_create");
        }
        pthread_exit(NULL);
        return 0;
}

==== END pound_clock_gettime.c ====

Suggested-by: Mike Galbraith
Change-Id: Iad82d9f31c92e50e1e3b1339892512526ceb0acf
Signed-off-by: Giovanni Gherdovich
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1470385316-15027-2-git-send-email-ggherdovich@suse.cz
Signed-off-by: Ingo Molnar
---
 kernel/sched/core.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6c3f46e759d2..18afb0fe2704 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -78,6 +78,7 @@
 #include
 #include
 #include
+#include <linux/prefetch.h>
 #include
 #include
@@ -3133,6 +3134,23 @@ DEFINE_PER_CPU(struct kernel_cpustat, kernel_cpustat);
 EXPORT_PER_CPU_SYMBOL(kstat);
 EXPORT_PER_CPU_SYMBOL(kernel_cpustat);
 
+/*
+ * The function fair_sched_class.update_curr accesses the struct curr
+ * and its field curr->exec_start; when called from task_sched_runtime(),
+ * we observe a high rate of cache misses in practice.
+ * Prefetching this data results in improved performance.
+ */
+static inline void prefetch_curr_exec_start(struct task_struct *p)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	struct sched_entity *curr = (&p->se)->cfs_rq->curr;
+#else
+	struct sched_entity *curr = (&task_rq(p)->cfs)->curr;
+#endif
+	prefetch(curr);
+	prefetch(&curr->exec_start);
+}
+
 /*
  * Return accounted runtime for the task.
  * In case the task is currently running, return the runtime plus current's
@@ -3167,6 +3185,7 @@ unsigned long long task_sched_runtime(struct task_struct *p)
 	 * thread, breaking clock_gettime().
 	 */
 	if (task_current(rq, p) && task_on_rq_queued(p)) {
+		prefetch_curr_exec_start(p);
 		update_rq_clock(rq);
 		p->sched_class->update_curr(rq);
 	}
--
cgit v1.2.3

From 13e66175965635ed74c948f232a32033230cc5d0 Mon Sep 17 00:00:00 2001
From: Pavankumar Kondeti
Date: Thu, 28 Feb 2019 10:40:39 +0530
Subject: cpuset: Restore tasks affinity while moving across cpusets

When tasks move across cpusets, the current affinity settings are lost.
Cache the task affinity and restore it during cpuset migration. The
restore happens only when the cached affinity is a subset of the current
cpuset settings.
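The diff below adds only the caching side: the affinity-setting path
records the requested mask in p->cpus_requested. The restore side lives in
the cpuset attach path, outside this file; a rough sketch of the subset
check described above (the helper name and call site are illustrative
assumptions, not part of this patch) could look like:

  /* Hypothetical helper, called when a task is attached to a new cpuset;
   * 'cpuset_cpus' is the new cpuset's cpu mask. Restore the cached
   * request only when it fits entirely inside the new cpuset, otherwise
   * fall back to the full cpuset mask. */
  static void restore_task_affinity(struct task_struct *p,
                                    const struct cpumask *cpuset_cpus)
  {
          if (cpumask_subset(&p->cpus_requested, cpuset_cpus))
                  set_cpus_allowed_ptr(p, &p->cpus_requested);
          else
                  set_cpus_allowed_ptr(p, cpuset_cpus);
  }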
Change-Id: I6c2ec1d5e3d994e176926d94b9e0cc92418020cc
Signed-off-by: Pavankumar Kondeti
---
 kernel/sched/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 18afb0fe2704..b33433586774 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4921,6 +4921,9 @@ again:
 		retval = -EINVAL;
 	}
 
+	if (!retval && !(p->flags & PF_KTHREAD))
+		cpumask_and(&p->cpus_requested, in_mask, cpu_possible_mask);
+
 out_free_new_mask:
 	free_cpumask_var(new_mask);
 out_free_cpus_allowed:
@@ -8344,6 +8347,7 @@ void __init sched_init_smp(void)
 	/* Move init over to a non-isolated CPU */
 	if (set_cpus_allowed_ptr(current, non_isolated_cpus) < 0)
 		BUG();
+	cpumask_copy(&current->cpus_requested, cpu_possible_mask);
 	sched_init_granularity();
 	free_cpumask_var(non_isolated_cpus);
--
cgit v1.2.3

From b9b6bc6ea3c06ab2edac96db7a3f9e51c9e459d1 Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Sat, 17 Jun 2017 08:10:08 -0400
Subject: sched: Allow migrating kthreads into online but inactive CPUs

Per-cpu workqueues have been tripping CPU affinity sanity checks while a
CPU is being offlined. A per-cpu kworker ends up running on a CPU which
isn't its target CPU while the CPU is online but inactive.

While the scheduler allows kthreads to wake up on an online but inactive
CPU, it doesn't allow a running kthread to be migrated to such a CPU,
which leads to an odd situation where setting affinity on a sleeping and
running kthread leads to different results.

Each mem-reclaim workqueue has one rescuer which guarantees forward
progress, and the rescuer needs to bind itself to the CPU which needs help
in making forward progress; however, due to the above issue, while
set_cpus_allowed_ptr() succeeds, the rescuer doesn't end up on the correct
CPU if the CPU is in the process of going offline, tripping the sanity
check and executing the work item on the wrong CPU.

This patch updates __migrate_task() so that kthreads can be migrated into
an inactive but online CPU.

Change-Id: I38cc3eb3b2ec5b7034cc72a2bcdd32a549314915
Signed-off-by: Tejun Heo
Reported-by: "Paul E. McKenney"
Reported-by: Steven Rostedt
---
 kernel/sched/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b33433586774..3c64cd08e8e2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1144,8 +1144,13 @@ static struct rq *__migrate_task(struct rq *rq, struct task_struct *p, int dest_cpu)
 {
 	int src_cpu;
 
-	if (unlikely(!cpu_active(dest_cpu)))
-		return rq;
+	if (p->flags & PF_KTHREAD) {
+		if (unlikely(!cpu_online(dest_cpu)))
+			return rq;
+	} else {
+		if (unlikely(!cpu_active(dest_cpu)))
+			return rq;
+	}
 
 	/* Affinity changed (again). */
 	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
--
cgit v1.2.3

From c9999f04236e640bc82b2085cadca8487184bc76 Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Thu, 16 Jun 2016 15:35:04 -0400
Subject: sched/core: Allow kthreads to fall back to online && !active cpus

During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is online
but not active. A CPU_ONLINE callback may create or bind a kthread so that
its cpus_allowed mask only allows the CPU which is being brought online.
The kthread may start executing before the CPU is made active and can end
up in select_fallback_rq().
In such cases, the expected behavior is to select the CPU which is coming
online; however, because select_fallback_rq() only chooses from active
CPUs, it determines that the task doesn't have any viable CPU in its
allowed mask and ends up overriding the mask to cpu_possible_mask.

CPU_ONLINE callbacks should be able to put kthreads on the CPU which is
coming online. Update select_fallback_rq() so that it follows cpu_online()
rather than cpu_active() for kthreads.

Reported-by: Gautham R. Shenoy
Tested-by: Gautham R. Shenoy
Change-Id: I562dcc53717b1f2f8324abffb652b91592ba8d5c
Signed-off-by: Tejun Heo
Signed-off-by: Peter Zijlstra (Intel)
Cc: Abdul Haleem
Cc: Aneesh Kumar
Cc: Linus Torvalds
Cc: Michael Ellerman
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: kernel-team@fb.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160616193504.GB3262@mtj.duckdns.org
Signed-off-by: Ingo Molnar
---
 kernel/sched/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3c64cd08e8e2..e83959592d6c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1654,9 +1654,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p, bool allow_iso)
 	for (;;) {
 		/* Any allowed, online CPU? */
 		for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) {
-			if (!cpu_online(dest_cpu))
+			if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu))
 				continue;
-			if (!cpu_active(dest_cpu))
+			if (!cpu_online(dest_cpu))
 				continue;
 			if (cpu_isolated(dest_cpu)) {
 				if (allow_iso)
--
cgit v1.2.3

From 0ffdb886996b66e1eb9bbda6e9605bf7dbca0cd1 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra
Date: Tue, 25 Jul 2017 18:58:21 +0200
Subject: BACKPORT: sched/core: Fix rules for running on online && !active CPUs

As already enforced by the WARN() in __set_cpus_allowed_ptr(), the rules
for running on an online && !active CPU are stricter than just being a
kthread; you need to be a per-cpu kthread. If you're not strictly per-CPU,
you have better CPUs to run on and don't need the partially booted one to
get your work done.

The exception is to allow smpboot threads to bootstrap the CPU itself and
get kernel 'services' initialized before we allow userspace on it.
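For context, "strictly per-CPU" means a kthread bound to a single CPU
before it first runs, which is exactly the nr_cpus_allowed == 1 condition
the new is_per_cpu_kthread() helper below tests. A sketch of how such a
thread is typically created (illustrative only, not part of this patch):

  #include <linux/err.h>
  #include <linux/kthread.h>

  /* Create a kthread that qualifies as per-CPU: kthread_bind() pins it
   * to one CPU before it ever runs, so nr_cpus_allowed == 1 and
   * PF_KTHREAD is set. */
  static struct task_struct *make_per_cpu_kthread(int (*fn)(void *),
                                                  void *data, int cpu)
  {
          struct task_struct *t;

          t = kthread_create(fn, data, "example/%d", cpu);
          if (!IS_ERR(t)) {
                  kthread_bind(t, cpu);   /* affinity = { cpu } */
                  wake_up_process(t);
          }
          return t;
  }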
Change-Id: I515e873a6e5be0cde7771ecedf56101614300fe2
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Tejun Heo
Cc: Thomas Gleixner
Fixes: 955dbdf4ce87 ("sched: Allow migrating kthreads into online but inactive CPUs")
Link: http://lkml.kernel.org/r/20170725165821.cejhb7v2s3kecems@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar
Backported to 4.4
Signed-off-by: joshuous
---
 kernel/sched/core.c | 41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e83959592d6c..f73831486c2f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1085,6 +1085,33 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 }
 
 #ifdef CONFIG_SMP
+
+static inline bool is_per_cpu_kthread(struct task_struct *p)
+{
+	if (!(p->flags & PF_KTHREAD))
+		return false;
+
+	if (p->nr_cpus_allowed != 1)
+		return false;
+
+	return true;
+}
+
+/*
+ * Per-CPU kthreads are allowed to run on !active && online CPUs, see
+ * __set_cpus_allowed_ptr() and select_fallback_rq().
+ */
+static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
+{
+	if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
+		return false;
+
+	if (is_per_cpu_kthread(p))
+		return cpu_online(cpu);
+
+	return cpu_active(cpu);
+}
+
 /*
  * This is how migration works:
  *
@@ -1144,16 +1171,8 @@ static struct rq *__migrate_task(struct rq *rq, struct task_struct *p, int dest_cpu)
 {
 	int src_cpu;
 
-	if (p->flags & PF_KTHREAD) {
-		if (unlikely(!cpu_online(dest_cpu)))
-			return rq;
-	} else {
-		if (unlikely(!cpu_active(dest_cpu)))
-			return rq;
-	}
-
 	/* Affinity changed (again). */
-	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
+	if (!is_cpu_allowed(p, dest_cpu))
 		return rq;
 
 	src_cpu = cpu_of(rq);
@@ -1654,9 +1673,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p, bool allow_iso)
 	for (;;) {
 		/* Any allowed, online CPU? */
 		for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) {
-			if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu))
-				continue;
-			if (!cpu_online(dest_cpu))
+			if (!is_cpu_allowed(p, dest_cpu))
 				continue;
 			if (cpu_isolated(dest_cpu)) {
 				if (allow_iso)
--
cgit v1.2.3

From db74739c86de566ca88a8affa96d2d59cfbdbb98 Mon Sep 17 00:00:00 2001
From: Pavankumar Kondeti
Date: Fri, 14 Sep 2018 13:59:59 +0530
Subject: sched: Don't fail isolation request for an already isolated CPU

When isolating a CPU, a check is performed to see if there is only 1
active CPU in the system. If that is the case, the CPU is not isolated.
However, this check is done before testing whether the requested CPU is
already isolated. If the requested CPU is already isolated, there is no
need to fail the isolation even when there is only 1 active CPU in the
system.

For example, CPUs 0-6 are isolated on an 8 CPU machine. When an isolation
request comes for CPU6, which is already isolated, the current code fails
the request, thinking we would end up with no active CPU in the system.
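Condensed, the fix is an ordering change plus an unwind. A simplified
sketch of the resulting logic in sched_isolate_cpu() (the real code below
uses ret_code and goto-based exit paths rather than bare returns):

  if (!cpu_online(cpu))
          return -EINVAL;

  if (++cpu_isolation_vote[cpu] > 1)      /* already isolated: succeed */
          return 0;

  /* Only now reject isolating the last active CPU, and undo the vote
   * taken above if we do reject it. */
  if (cpumask_weight(&avail_cpus) == 1) {
          --cpu_isolation_vote[cpu];
          return -EINVAL;
  }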
Change-Id: I28fea4ff67ffed82465e5cfa785414069e4a180a
Signed-off-by: Pavankumar Kondeti
---
 kernel/sched/core.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f73831486c2f..c597f6c61115 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5967,12 +5967,6 @@ int sched_isolate_cpu(int cpu)
 
 	cpumask_andnot(&avail_cpus, cpu_online_mask, cpu_isolated_mask);
 
-	/* We cannot isolate ALL cpus in the system */
-	if (cpumask_weight(&avail_cpus) == 1) {
-		ret_code = -EINVAL;
-		goto out;
-	}
-
 	if (!cpu_online(cpu)) {
 		ret_code = -EINVAL;
 		goto out;
@@ -5981,6 +5975,13 @@ int sched_isolate_cpu(int cpu)
 	if (++cpu_isolation_vote[cpu] > 1)
 		goto out;
 
+	/* We cannot isolate ALL cpus in the system */
+	if (cpumask_weight(&avail_cpus) == 1) {
+		--cpu_isolation_vote[cpu];
+		ret_code = -EINVAL;
+		goto out;
+	}
+
 	/*
 	 * There is a race between watchdog being enabled by hotplug and
 	 * core isolation disabling the watchdog. When a CPU is hotplugged in
--
cgit v1.2.3

From c0fa7577022c4169e1aaaf1bd9e04f63d285beb2 Mon Sep 17 00:00:00 2001
From: Ethan Chen
Date: Sat, 20 Jan 2018 16:35:53 -0800
Subject: sched/walt: Re-add code to allow WALT to function

Change-Id: Ieb1067c5e276f872ed4c722b7d1fabecbdad87e7
---
 kernel/sched/core.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c597f6c61115..17c13347d703 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -98,6 +98,7 @@
 #define CREATE_TRACE_POINTS
 #include
+#include "walt.h"
 
 ATOMIC_NOTIFIER_HEAD(load_alert_notifier_head);
@@ -1389,6 +1390,7 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 		p->se.nr_migrations++;
 		perf_event_task_migrate(p);
 
+		walt_fixup_busy_time(p, new_cpu);
 		fixup_busy_time(p, new_cpu);
 	}
@@ -2155,6 +2157,9 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags,
 	raw_spin_lock(&rq->lock);
 	old_load = task_load(p);
+	wallclock = walt_ktime_clock();
+	walt_update_task_ravg(rq->curr, rq, TASK_UPDATE, wallclock, 0);
+	walt_update_task_ravg(p, rq, TASK_WAKE, wallclock, 0);
 	wallclock = sched_ktime_clock();
 	update_task_ravg(rq->curr, rq, TASK_UPDATE, wallclock, 0);
 	update_task_ravg(p, rq, TASK_WAKE, wallclock, 0);
@@ -2253,6 +2258,11 @@ static void try_to_wake_up_local(struct task_struct *p)
 	update_task_ravg(rq->curr, rq, TASK_UPDATE, wallclock, 0);
 	update_task_ravg(p, rq, TASK_WAKE, wallclock, 0);
 	cpufreq_update_util(rq, 0);
+
+	wallclock = walt_ktime_clock();
+
+	walt_update_task_ravg(rq->curr, rq, TASK_UPDATE, wallclock, 0);
+	walt_update_task_ravg(p, rq, TASK_WAKE, wallclock, 0);
 	ttwu_activate(rq, p, ENQUEUE_WAKEUP);
 	note_task_waking(p, wallclock);
 }
@@ -2385,6 +2395,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 #endif
 	INIT_LIST_HEAD(&p->se.group_node);
 
+	walt_init_new_task_load(p);
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq = NULL;
@@ -2669,6 +2680,7 @@ void wake_up_new_task(struct task_struct *p)
 	struct rq *rq;
 
 	add_new_task_to_grp(p);
+	walt_init_new_task_load(p);
 
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	p->state = TASK_RUNNING;
@@ -2687,6 +2699,7 @@ void wake_up_new_task(struct task_struct *p)
 #endif
 	rq = __task_rq_lock(p);
 	mark_task_starting(p);
+	walt_mark_task_starting(p);
 	update_rq_clock(rq);
 	post_init_entity_util_avg(&p->se);
 	activate_task(rq, p, ENQUEUE_WAKEUP_NEW);
@@ -3235,10 +3248,13 @@ void scheduler_tick(void)
 	raw_spin_lock(&rq->lock);
 	old_load = task_load(curr);
+	walt_set_window_start(rq);
 	set_window_start(rq);
 	update_rq_clock(rq);
 	curr->sched_class->task_tick(rq, curr, 0);
 	update_cpu_load_active(rq);
+	walt_update_task_ravg(rq->curr, rq, TASK_UPDATE,
+			walt_ktime_clock(), 0);
 	calc_global_load_tick(rq);
 	wallclock = sched_ktime_clock();
 	update_task_ravg(rq->curr, rq, TASK_UPDATE, wallclock, 0);
@@ -3602,6 +3618,9 @@ static void __sched notrace __schedule(bool preempt)
 		update_rq_clock(rq);
 
 	next = pick_next_task(rq, prev);
+	wallclock = walt_ktime_clock();
+	walt_update_task_ravg(prev, rq, PUT_PREV_TASK, wallclock, 0);
+	walt_update_task_ravg(next, rq, PICK_NEXT_TASK, wallclock, 0);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
 	rq->clock_skip_update = 0;
@@ -6364,6 +6383,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 
 	case CPU_UP_PREPARE:
 		raw_spin_lock_irqsave(&rq->lock, flags);
+		walt_set_window_start(rq);
 		set_window_start(rq);
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
 		rq->calc_load_update = calc_load_update;
@@ -6385,6 +6405,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		sched_ttwu_pending();
 
 		/* Update our root-domain */
 		raw_spin_lock_irqsave(&rq->lock, flags);
+		walt_migrate_sync_cpu(cpu);
 		if (rq->rd) {
 			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
@@ -8580,6 +8601,11 @@ void __init sched_init(void)
 		}
 #endif
 		rq->max_idle_balance_cost = sysctl_sched_migration_cost;
+#ifdef CONFIG_SCHED_WALT
+		rq->cur_irqload = 0;
+		rq->avg_irqload = 0;
+		rq->irqload_ts = 0;
+#endif
 
 		INIT_LIST_HEAD(&rq->cfs_tasks);
--
cgit v1.2.3

From daaa5da96a74d35c64db2952add990213355ab4a Mon Sep 17 00:00:00 2001
From: Prasad Sodagudi
Date: Fri, 15 Mar 2019 11:52:48 -0700
Subject: sched: Take irq_sparse lock during the isolation

irq_migrate_all_off_this_cpu() is used to migrate IRQs, and it iterates
over all active IRQs in the allocated_irqs mask.
irq_migrate_all_off_this_cpu() expects the caller to take the sparse IRQ
lock to avoid race conditions while accessing the allocated_irqs mask
variable.

Prevent a race between IRQ alloc/free and IRQ migration by holding the
sparse IRQ lock across CPU isolation.

Change-Id: I9edece1ecea45297c8f6529952d88b3133046467
Signed-off-by: Prasad Sodagudi
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 17c13347d703..61a80b81afcf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6024,7 +6024,9 @@ int sched_isolate_cpu(int cpu)
 	smp_call_function_any(&avail_cpus, hrtimer_quiesce_cpu, &cpu, 1);
 	smp_call_function_any(&avail_cpus, timer_quiesce_cpu, &cpu, 1);
 
+	irq_lock_sparse();
 	stop_cpus(cpumask_of(cpu), do_isolation_work_cpu_stop, 0);
+	irq_unlock_sparse();
 
 	calc_load_migrate(rq);
 	update_max_interval();
--
cgit v1.2.3