| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch series "HWPOISON: soft offlining for non-lru movable page", v6.
After Minchan's commit bda807d44454 ("mm: migrate: support non-lru
movable page migration"), some type of non-lru page like zsmalloc and
virtio-balloon page also support migration.
Therefore, we can:
1) soft offlining no-lru movable pages, which means when memory
corrected errors occur on a non-lru movable page, we can stop to use
it by migrating data onto another page and disable the original
(maybe half-broken) one.
2) enable memory hotplug for non-lru movable pages, i.e. we may offline
blocks, which include such pages, by using non-lru page migration.
This patchset is heavily dependent on non-lru movable page migration.
This patch (of 4):
Change the return type of isolate_movable_page() from bool to int. It
will return 0 when isolate movable page successfully, and return -EBUSY
when it isolates failed.
There is no functional change within this patch but prepare for later
patch.
[xieyisheng1@huawei.com: v6]
Link: http://lkml.kernel.org/r/1486108770-630-2-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1485867981-16037-2-git-send-email-ysxie@foxmail.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 9e5bcd610ffcedf5e485e78a72762810b25c7f25
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I0d921d0a4e202397ce2a37909c67aaafd86fb465
Signed-off-by: Arun KS <arunks@codeaurora.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* refs/heads/tmp-5f6325b
Linux 4.4.112
selftests/x86: Add test_vsyscall
x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
x86/alternatives: Fix optimize_nops() checking
sysfs/cpu: Fix typos in vulnerability documentation
x86/cpu: Implement CPU vulnerabilites sysfs functions
sysfs/cpu: Add vulnerability folder
x86/cpu: Merge bugs.c and bugs_64.c
x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
x86/cpufeatures: Add X86_BUG_CPU_INSECURE
x86/cpufeatures: Make CPU bugs sticky
x86/cpu: Factor out application of forced CPU caps
x86/Documentation: Add PTI description
e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
uas: ignore UAS for Norelsys NS1068(X) chips
Bluetooth: Prevent stack info leak from the EFS element.
staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
usbip: remove kernel addresses from usb device and urb debug msgs
USB: fix usbmon BUG trigger
usb: misc: usb3503: make sure reset is low for at least 100us
USB: serial: cp210x: add new device ID ELV ALC 8xxx
USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
bpf, array: fix overflow in max_entries and undefined behavior in index_mask
bpf: prevent out-of-bounds speculation
bpf: adjust insn_aux_data when patching insns
bpf: refactor fixup_bpf_calls()
bpf: move fixup_bpf_calls() function
bpf: don't (ab)use instructions to store state
bpf: add bpf_patch_insn_single helper
kaiser: Set _PAGE_NX only if supported
drm/vmwgfx: Potential off by one in vmw_view_add()
KVM: x86: Add memory barrier on vmcs field lookup
x86/microcode/intel: Extend BDW late-loading with a revision check
rbd: set max_segments to USHRT_MAX
crypto: algapi - fix NULL dereference in crypto_remove_spawns()
ipv6: fix possible mem leaks in ipv6_make_skb()
net: stmmac: enable EEE in MII, GMII or RGMII only
sh_eth: fix SH7757 GEther initialization
sh_eth: fix TSU resource handling
RDS: null pointer dereference in rds_atomic_free_op
RDS: Heap OOB write in rds_message_alloc_sgs()
net: core: fix module type in sock_diag_bind
ip6_tunnel: disable dst caching if tunnel is dual-stack
8021q: fix a memory leak for VLAN 0 device
x86/pti/efi: broken conversion from efi to kernel page table
Revert "userfaultfd: selftest: vm: allow to build in vm/ directory"
xhci: Fix ring leak in failure path of xhci_alloc_virt_device()
sysrq: Fix warning in sysrq generated crash.
hwrng: core - sleep interruptible in read
x86/mm/pat, /dev/mem: Remove superfluous error message
cx82310_eth: use skb_cow_head() to deal with cloned skbs
smsc75xx: use skb_cow_head() to deal with cloned skbs
sr9700: use skb_cow_head() to deal with cloned skbs
lan78xx: use skb_cow_head() to deal with cloned skbs
r8152: adjust ALDPS function
r8152: use test_and_clear_bit
r8152: fix the wake event
usb: musb: ux500: Fix NULL pointer dereference at system PM
usbvision fix overflow of interfaces array
locking/mutex: Allow next waiter lockless wakeup
futex: Replace barrier() in unqueue_me() with READ_ONCE()
locks: don't check for race with close when setting OFD lock
zswap: don't param_set_charp while holding spinlock
mm/zswap: use workqueue to destroy pool
mm/page-writeback: fix dirty_ratelimit calculation
mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page
mm/compaction: fix invalid free_pfn and compact_cached_free_pfn
x86/acpi: Reduce code duplication in mp_override_legacy_irq()
ALSA: aloop: Fix racy hw constraints adjustment
ALSA: aloop: Fix inconsistent format due to incomplete rule
ALSA: aloop: Release cable upon open error path
ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
ALSA: pcm: Abort properly at pending signal in OSS read/write loops
ALSA: pcm: Add missing error checks in OSS emulation plugin builder
ALSA: pcm: Remove incorrect snd_BUG_ON() usages
iommu/arm-smmu-v3: Don't free page table ops twice
x86/acpi: Handle SCI interrupts above legacy space gracefully
x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
kvm: vmx: Scrub hardware GPRs at VM-exit
net/mac80211/debugfs.c: prevent build failure with CONFIG_UBSAN=y
MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
MIPS: Factor out NT_PRFPREG regset access helpers
MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
IB/srpt: Disable RDMA access by the initiator
can: gs_usb: fix return value of the "set_bittiming" callback
KVM: Fix stack-out-of-bounds read in write_mmio
dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
fscrypt: updates on 4.15-rc4
ANDROID: uid_sys_stats: fix the comment
BACKPORT: optee: fix invalid of_node_put() in optee_driver_init()
BACKPORT: tee: optee: sync with new naming of interrupts
BACKPORT: tee: indicate privileged dev in gen_caps
BACKPORT: tee: optee: interruptible RPC sleep
BACKPORT: tee: optee: add const to tee_driver_ops and tee_desc structures
BACKPORT: tee: tee_shm: Constify dma_buf_ops structures.
BACKPORT: tee: add forward declaration for struct device
BACKPORT: tee: optee: fix uninitialized symbol 'parg'
BACKPORT: tee.txt: standardize document format
BACKPORT: tee: add ARM_SMCCC dependency
BACKPORT: selinux: nlmsgtab: add SOCK_DESTROY to the netlink mapping tables
Conflicts:
security/selinux/nlmsgtab.c
Change-Id: I5770a565f39c321f2305f8228e41f822e3cd0625
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
commit e1409c325fdc1fef7b3d8025c51892355f065d15 upstream.
pageblock_pfn_to_page() is used to check there is valid pfn and all
pages in the pageblock is in a single zone. If there is a hole in the
pageblock, passing arbitrary position to pageblock_pfn_to_page() could
cause to skip whole pageblock scanning, instead of just skipping the
hole page. For deterministic behaviour, it's better to always pass
pageblock aligned range to pageblock_pfn_to_page(). It will also help
further optimization on pageblock_pfn_to_page() in the following patch.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
commit 623446e4dc45b37740268165107cc63abb3022f0 upstream.
free_pfn and compact_cached_free_pfn are the pointer that remember
restart position of freepage scanner. When they are reset or invalid,
we set them to zone_end_pfn because freepage scanner works in reverse
direction. But, because zone range is defined as [zone_start_pfn,
zone_end_pfn), zone_end_pfn is invalid to access. Therefore, we should
not store it to free_pfn and compact_cached_free_pfn. Instead, we need
to store zone_end_pfn - 1 to them. There is one more thing we should
consider. Freepage scanner scan reversely by pageblock unit. If
free_pfn and compact_cached_free_pfn are set to middle of pageblock, it
regards that sitiation as that it already scans front part of pageblock
so we lose opportunity to scan there. To fix-up, this patch do
round_down() to guarantee that reset position will be pageblock aligned.
Note that thanks to the current pageblock_pfn_to_page() implementation,
actual access to zone_end_pfn doesn't happen until now. But, following
patch will change pageblock_pfn_to_page() so this patch is needed from
now on.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch is motivated from Hugh and Vlastimil's concern [1].
There are two ways to get freepage from the allocator. One is using
normal memory allocation API and the other is __isolate_free_page()
which is internally used for compaction and pageblock isolation. Later
usage is rather tricky since it doesn't do whole post allocation
processing done by normal API.
One problematic thing I already know is that poisoned page would not be
checked if it is allocated by __isolate_free_page(). Perhaps, there
would be more.
We could add more debug logic for allocated page in the future and this
separation would cause more problem. I'd like to fix this situation at
this time. Solution is simple. This patch commonize some logic for
newly allocated page and uses it on all sites. This will solve the
problem.
[1] http://marc.info/?i=alpine.LSU.2.11.1604270029350.7066%40eggly.anvils%3E
Change-Id: I601ec8ce8ee4ab76cd408ff2148dd8c73b959fc2
[iamjoonsoo.kim@lge.com: mm-page_alloc-introduce-post-allocation-processing-on-page-allocator-v3]
Link: http://lkml.kernel.org/r/1464230275-25791-7-git-send-email-iamjoonsoo.kim@lge.com
Link: http://lkml.kernel.org/r/1466150259-27727-9-git-send-email-iamjoonsoo.kim@lge.com
Link: http://lkml.kernel.org/r/1464230275-25791-7-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 46f24fd857b37bb86ddd5d0ac3d194e984dfdf1c
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[guptap@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It's not necessary to initialized page_owner with holding the zone lock.
It would cause more contention on the zone lock although it's not a big
problem since it is just debug feature. But, it is better than before
so do it. This is also preparation step to use stackdepot in page owner
feature. Stackdepot allocates new pages when there is no reserved space
and holding the zone lock in this case will cause deadlock.
Change-Id: Id96ab8444f194bead3fa4a8ddda30cdcca4ddc9f
Link: http://lkml.kernel.org/r/1464230275-25791-2-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 83358ece26b70f20c0ba2e0e00dc84b0ee24fe6d
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[guptap@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We don't need to split freepages with holding the zone lock. It will
cause more contention on zone lock so not desirable.
Change-Id: Ifb1ee4e48e322abb25a9293885f68dfe75afb743
[rientjes@google.com: if __isolate_free_page() fails, avoid adding to freelist so we don't call map_pages() with it]
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606211447001.43430@chino.kir.corp.google.com
Link: http://lkml.kernel.org/r/1464230275-25791-1-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 66c64223ad4e7a4a9161fcd9606426d9f57227ca
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[guptap@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We have squeezed meta data of zspage into first page's descriptor. So,
to get meta data from subpage, we should get first page first of all.
But it makes trouble to implment page migration feature of zsmalloc
because any place where to get first page from subpage can be raced with
first page migration. IOW, first page it got could be stale. For
preventing it, I have tried several approahces but it made code
complicated so finally, I concluded to separate metadata from first
page. Of course, it consumes more memory. IOW, 16bytes per zspage on
32bit at the moment. It means we lost 1% at *worst case*(40B/4096B)
which is not bad I think at the cost of maintenance.
Link: http://lkml.kernel.org/r/1464736881-24886-9-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 3783689a1aa82ef27a6418b043dd7a077b8330c5
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I2f0c67d7b85ebfe0655b92e13854fdbde5f26f2b
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now, VM has a feature to migrate non-lru movable pages so balloon
doesn't need custom migration hooks in migrate.c and compaction.c.
Instead, this patch implements the page->mapping->a_ops->
{isolate|migrate|putback} functions.
With that, we could remove hooks for ballooning in general migration
functions and make balloon compaction simple.
[akpm@linux-foundation.org: compaction.h requires that the includer first include node.h]
Link: http://lkml.kernel.org/r/1464736881-24886-4-git-send-email-minchan@kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: b1123ea6d3b3da25af5c8a9d843bd07ab63213f4
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: Ibf5bc7ffcbfd31e01946729dc5366baa79d7cf5e
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We have allowed migration for only LRU pages until now and it was enough
to make high-order pages. But recently, embedded system(e.g., webOS,
android) uses lots of non-movable pages(e.g., zram, GPU memory) so we
have seen several reports about troubles of small high-order allocation.
For fixing the problem, there were several efforts (e,g,. enhance
compaction algorithm, SLUB fallback to 0-order page, reserved memory,
vmalloc and so on) but if there are lots of non-movable pages in system,
their solutions are void in the long run.
So, this patch is to support facility to change non-movable pages with
movable. For the feature, this patch introduces functions related to
migration to address_space_operations as well as some page flags.
If a driver want to make own pages movable, it should define three
functions which are function pointers of struct
address_space_operations.
1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
What VM expects on isolate_page function of driver is to return *true*
if driver isolates page successfully. On returing true, VM marks the
page as PG_isolated so concurrent isolation in several CPUs skip the
page for isolation. If a driver cannot isolate the page, it should
return *false*.
Once page is successfully isolated, VM uses page.lru fields so driver
shouldn't expect to preserve values in that fields.
2. int (*migratepage) (struct address_space *mapping,
struct page *newpage, struct page *oldpage, enum migrate_mode);
After isolation, VM calls migratepage of driver with isolated page. The
function of migratepage is to move content of the old page to new page
and set up fields of struct page newpage. Keep in mind that you should
indicate to the VM the oldpage is no longer movable via
__ClearPageMovable() under page_lock if you migrated the oldpage
successfully and returns 0. If driver cannot migrate the page at the
moment, driver can return -EAGAIN. On -EAGAIN, VM will retry page
migration in a short time because VM interprets -EAGAIN as "temporal
migration failure". On returning any error except -EAGAIN, VM will give
up the page migration without retrying in this time.
Driver shouldn't touch page.lru field VM using in the functions.
3. void (*putback_page)(struct page *);
If migration fails on isolated page, VM should return the isolated page
to the driver so VM calls driver's putback_page with migration failed
page. In this function, driver should put the isolated page back to the
own data structure.
4. non-lru movable page flags
There are two page flags for supporting non-lru movable page.
* PG_movable
Driver should use the below function to make page movable under
page_lock.
void __SetPageMovable(struct page *page, struct address_space *mapping)
It needs argument of address_space for registering migration family
functions which will be called by VM. Exactly speaking, PG_movable is
not a real flag of struct page. Rather than, VM reuses page->mapping's
lower bits to represent it.
#define PAGE_MAPPING_MOVABLE 0x2
page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
so driver shouldn't access page->mapping directly. Instead, driver
should use page_mapping which mask off the low two bits of page->mapping
so it can get right struct address_space.
For testing of non-lru movable page, VM supports __PageMovable function.
However, it doesn't guarantee to identify non-lru movable page because
page->mapping field is unified with other variables in struct page. As
well, if driver releases the page after isolation by VM, page->mapping
doesn't have stable value although it has PAGE_MAPPING_MOVABLE (Look at
__ClearPageMovable). But __PageMovable is cheap to catch whether page
is LRU or non-lru movable once the page has been isolated. Because LRU
pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
good for just peeking to test non-lru movable pages before more
expensive checking with lock_page in pfn scanning to select victim.
For guaranteeing non-lru movable page, VM provides PageMovable function.
Unlike __PageMovable, PageMovable functions validates page->mapping and
mapping->a_ops->isolate_page under lock_page. The lock_page prevents
sudden destroying of page->mapping.
Driver using __SetPageMovable should clear the flag via
__ClearMovablePage under page_lock before the releasing the page.
* PG_isolated
To prevent concurrent isolation among several CPUs, VM marks isolated
page as PG_isolated under lock_page. So if a CPU encounters PG_isolated
non-lru movable page, it can skip it. Driver doesn't need to manipulate
the flag because VM will set/clear it automatically. Keep in mind that
if driver sees PG_isolated page, it means the page have been isolated by
VM so it shouldn't touch page.lru field. PG_isolated is alias with
PG_reclaim flag so driver shouldn't use the flag for own purpose.
[opensource.ganesh@gmail.com: mm/compaction: remove local variable is_lru]
Link: http://lkml.kernel.org/r/20160618014841.GA7422@leo-test
Link: http://lkml.kernel.org/r/1464736881-24886-3-git-send-email-minchan@kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: bda807d4445414e8e77da704f116bb0880fe0c76
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I03380d927fed84c7464bd5f7c4405bef6b265b69
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
While testing the kcompactd in my platform 3G MEM only DMA ZONE. I
found the kcompactd never wakeup. It seems the zoneindex has already
minus 1 before. So the traverse here should be <=.
It fixes a regression where kswapd could previously compact, but
kcompactd not. Not a crash fix though.
[akpm@linux-foundation.org: fix kcompactd_do_work() as well, per Hugh]
Link: http://lkml.kernel.org/r/1463659121-84124-1-git-send-email-puck.chen@hisilicon.com
Fixes: accf62422b3a ("mm, kswapd: replace kswapd compaction with waking up kcompactd")
Signed-off-by: Chen Feng <puck.chen@hisilicon.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zhuangluan Su <suzhuangluan@hisilicon.com>
Cc: Yiping Xu <xuyiping@hisilicon.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 6cd9dc3e75078ef646076fa63adfb9b85ced0b66
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: Id5d2fe71ab7e62b0c063b9ba27b3b112447d1fdc
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Assume memory47 is the last online block left in node1. This will hang:
# echo offline > /sys/devices/system/node/node1/memory47/state
After a couple of minutes, the following pops up in dmesg:
INFO: task bash:957 blocked for more than 120 seconds.
Not tainted 4.6.0-rc6+ #6
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bash D ffff8800b7adbaf8 0 957 951 0x00000000
Call Trace:
schedule+0x35/0x80
schedule_timeout+0x1ac/0x270
wait_for_completion+0xe1/0x120
kthread_stop+0x4f/0x110
kcompactd_stop+0x26/0x40
__offline_pages.constprop.28+0x7e6/0x840
offline_pages+0x11/0x20
memory_block_action+0x73/0x1d0
memory_subsys_offline+0x47/0x60
device_offline+0x86/0xb0
store_mem_state+0xda/0xf0
dev_attr_store+0x18/0x30
sysfs_kf_write+0x37/0x40
kernfs_fop_write+0x11d/0x170
__vfs_write+0x37/0x120
vfs_write+0xa9/0x1a0
SyS_write+0x55/0xc0
entry_SYSCALL_64_fastpath+0x1a/0xa4
kcompactd is waiting for kcompactd_max_order > 0 when it's woken up to
actually exit. Check kthread_should_stop() to break out of the wait.
Fixes: 698b1b306 ("mm, compaction: introduce kcompactd").
Reported-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 172400c69cb0d0d684b7cd75ac75872b3d7c61a1
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: Ia90162cd5fe4cd502efc6bde3bf70cd227f58899
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Similarly to direct reclaim/compaction, kswapd attempts to combine
reclaim and compaction to attempt making memory allocation of given
order available.
The details differ from direct reclaim e.g. in having high watermark as
a goal. The code involved in kswapd's reclaim/compaction decisions has
evolved to be quite complex.
Testing reveals that it doesn't actually work in at least one scenario,
and closer inspection suggests that it could be greatly simplified
without compromising on the goal (make high-order page available) or
efficiency (don't reclaim too much). The simplification relieas of
doing all compaction in kcompactd, which is simply woken up when high
watermarks are reached by kswapd's reclaim.
The scenario where kswapd compaction doesn't work was found with mmtests
test stress-highalloc configured to attempt order-9 allocations without
direct reclaim, just waking up kswapd. There was no compaction attempt
from kswapd during the whole test. Some added instrumentation shows
what happens:
- balance_pgdat() sets end_zone to Normal, as it's not balanced
- reclaim is attempted on DMA zone, which sets nr_attempted to 99, but
it cannot reclaim anything, so sc.nr_reclaimed is 0
- for zones DMA32 and Normal, kswapd_shrink_zone uses testorder=0, so
it merely checks if high watermarks were reached for base pages.
This is true, so no reclaim is attempted. For DMA, testorder=0
wasn't used, as compaction_suitable() returned COMPACT_SKIPPED
- even though the pgdat_needs_compaction flag wasn't set to false, no
compaction happens due to the condition sc.nr_reclaimed >
nr_attempted being false (as 0 < 99)
- priority-- due to nr_reclaimed being 0, repeat until priority reaches
0 pgdat_balanced() is false as only the small zone DMA appears
balanced (curiously in that check, watermark appears OK and
compaction_suitable() returns COMPACT_PARTIAL, because a lower
classzone_idx is used there)
Now, even if it was decided that reclaim shouldn't be attempted on the
DMA zone, the scenario would be the same, as (sc.nr_reclaimed=0 >
nr_attempted=0) is also false. The condition really should use >= as
the comment suggests. Then there is a mismatch in the check for setting
pgdat_needs_compaction to false using low watermark, while the rest uses
high watermark, and who knows what other subtlety. Hopefully this
demonstrates that this is unsustainable.
Luckily we can simplify this a lot. The reclaim/compaction decisions
make sense for direct reclaim scenario, but in kswapd, our primary goal
is to reach high watermark in order-0 pages. Afterwards we can attempt
compaction just once. Unlike direct reclaim, we don't reclaim extra
pages (over the high watermark), the current code already disallows it
for good reasons.
After this patch, we simply wake up kcompactd to process the pgdat,
after we have either succeeded or failed to reach the high watermarks in
kswapd, which goes to sleep. We pass kswapd's order and classzone_idx,
so kcompactd can apply the same criteria to determine which zones are
worth compacting. Note that we use the classzone_idx from
wakeup_kswapd(), not balanced_classzone_idx which can include higher
zones that kswapd tried to balance too, but didn't consider them in
pgdat_balanced().
Since kswapd now cannot create high-order pages itself, we need to
adjust how it determines the zones to be balanced. The key element here
is adding a "highorder" parameter to zone_balanced, which, when set to
false, makes it consider only order-0 watermark instead of the desired
higher order (this was done previously by kswapd_shrink_zone(), but not
elsewhere). This false is passed for example in pgdat_balanced().
Importantly, wakeup_kswapd() uses true to make sure kswapd and thus
kcompactd are woken up for a high-order allocation failure.
The last thing is to decide what to do with pageblock_skip bitmap
handling. Compaction maintains a pageblock_skip bitmap to record
pageblocks where isolation recently failed. This bitmap can be reset by
three ways:
1) direct compaction is restarting after going through the full deferred cycle
2) kswapd goes to sleep, and some other direct compaction has previously
finished scanning the whole zone and set zone->compact_blockskip_flush.
Note that a successful direct compaction clears this flag.
3) compaction was invoked manually via trigger in /proc
The case 2) is somewhat fuzzy to begin with, but after introducing
kcompactd we should update it. The check for direct compaction in 1),
and to set the flush flag in 2) use current_is_kswapd(), which doesn't
work for kcompactd. Thus, this patch adds bool direct_compaction to
compact_control to use in 2). For the case 1) we remove the check
completely - unlike the former kswapd compaction, kcompactd does use the
deferred compaction functionality, so flushing tied to restarting from
deferred compaction makes sense here.
Note that when kswapd goes to sleep, kcompactd is woken up, so it will
see the flushed pageblock_skip bits. This is different from when the
former kswapd compaction observed the bits and I believe it makes more
sense. Kcompactd can afford to be more thorough than a direct
compaction trying to limit allocation latency, or kswapd whose primary
goal is to reclaim.
For testing, I used stress-highalloc configured to do order-9
allocations with GFP_NOWAIT|__GFP_HIGH|__GFP_COMP, so they relied just
on kswapd/kcompactd reclaim/compaction (the interfering kernel builds in
phases 1 and 2 work as usual):
stress-highalloc
4.5-rc1+before 4.5-rc1+after
-nodirect -nodirect
Success 1 Min 1.00 ( 0.00%) 5.00 (-66.67%)
Success 1 Mean 1.40 ( 0.00%) 6.20 (-55.00%)
Success 1 Max 2.00 ( 0.00%) 7.00 (-16.67%)
Success 2 Min 1.00 ( 0.00%) 5.00 (-66.67%)
Success 2 Mean 1.80 ( 0.00%) 6.40 (-52.38%)
Success 2 Max 3.00 ( 0.00%) 7.00 (-16.67%)
Success 3 Min 34.00 ( 0.00%) 62.00 ( 1.59%)
Success 3 Mean 41.80 ( 0.00%) 63.80 ( 1.24%)
Success 3 Max 53.00 ( 0.00%) 65.00 ( 2.99%)
User 3166.67 3181.09
System 1153.37 1158.25
Elapsed 1768.53 1799.37
4.5-rc1+before 4.5-rc1+after
-nodirect -nodirect
Direct pages scanned 32938 32797
Kswapd pages scanned 2183166 2202613
Kswapd pages reclaimed 2152359 2143524
Direct pages reclaimed 32735 32545
Percentage direct scans 1% 1%
THP fault alloc 579 612
THP collapse alloc 304 316
THP splits 0 0
THP fault fallback 793 778
THP collapse fail 11 16
Compaction stalls 1013 1007
Compaction success 92 67
Compaction failures 920 939
Page migrate success 238457 721374
Page migrate failure 23021 23469
Compaction pages isolated 504695 1479924
Compaction migrate scanned 661390 8812554
Compaction free scanned 13476658 84327916
Compaction cost 262 838
After this patch we see improvements in allocation success rate
(especially for phase 3) along with increased compaction activity. The
compaction stalls (direct compaction) in the interfering kernel builds
(probably THP's) also decreased somewhat thanks to kcompactd activity,
yet THP alloc successes improved a bit.
Note that elapsed and user time isn't so useful for this benchmark,
because of the background interference being unpredictable. It's just
to quickly spot some major unexpected differences. System time is
somewhat more useful and that didn't increase.
Also (after adjusting mmtests' ftrace monitor):
Time kswapd awake 2547781 2269241
Time kcompactd awake 0 119253
Time direct compacting 939937 557649
Time kswapd compacting 0 0
Time kcompactd compacting 0 119099
The decrease of overal time spent compacting appears to not match the
increased compaction stats. I suspect the tasks get rescheduled and
since the ftrace monitor doesn't see that, the reported time is wall
time, not CPU time. But arguably direct compactors care about overall
latency anyway, whether busy compacting or waiting for CPU doesn't
matter. And that latency seems to almost halved.
It's also interesting how much time kswapd spent awake just going
through all the priorities and failing to even try compacting, over and
over.
We can also configure stress-highalloc to perform both direct
reclaim/compaction and wakeup kswapd/kcompactd, by using
GFP_KERNEL|__GFP_HIGH|__GFP_COMP:
stress-highalloc
4.5-rc1+before 4.5-rc1+after
-direct -direct
Success 1 Min 4.00 ( 0.00%) 9.00 (-50.00%)
Success 1 Mean 8.00 ( 0.00%) 10.00 (-19.05%)
Success 1 Max 12.00 ( 0.00%) 11.00 ( 15.38%)
Success 2 Min 4.00 ( 0.00%) 9.00 (-50.00%)
Success 2 Mean 8.20 ( 0.00%) 10.00 (-16.28%)
Success 2 Max 13.00 ( 0.00%) 11.00 ( 8.33%)
Success 3 Min 75.00 ( 0.00%) 74.00 ( 1.33%)
Success 3 Mean 75.60 ( 0.00%) 75.20 ( 0.53%)
Success 3 Max 77.00 ( 0.00%) 76.00 ( 0.00%)
User 3344.73 3246.04
System 1194.24 1172.29
Elapsed 1838.04 1836.76
4.5-rc1+before 4.5-rc1+after
-direct -direct
Direct pages scanned 125146 120966
Kswapd pages scanned 2119757 2135012
Kswapd pages reclaimed 2073183 2108388
Direct pages reclaimed 124909 120577
Percentage direct scans 5% 5%
THP fault alloc 599 652
THP collapse alloc 323 354
THP splits 0 0
THP fault fallback 806 793
THP collapse fail 17 16
Compaction stalls 2457 2025
Compaction success 906 518
Compaction failures 1551 1507
Page migrate success 2031423 2360608
Page migrate failure 32845 40852
Compaction pages isolated 4129761 4802025
Compaction migrate scanned 11996712 21750613
Compaction free scanned 214970969 344372001
Compaction cost 2271 2694
In this scenario, this patch doesn't change the overall success rate as
direct compaction already tries all it can. There's however significant
reduction in direct compaction stalls (that is, the number of
allocations that went into direct compaction). The number of successes
(i.e. direct compaction stalls that ended up with successful
allocation) is reduced by the same number. This means the offload to
kcompactd is working as expected, and direct compaction is reduced
either due to detecting contention, or compaction deferred by kcompactd.
In the previous version of this patchset there was some apparent
reduction of success rate, but the changes in this version (such as
using sync compaction only), new baseline kernel, and/or averaging
results from 5 executions (my bet), made this go away.
Ftrace-based stats seem to roughly agree:
Time kswapd awake 2532984 2326824
Time kcompactd awake 0 257916
Time direct compacting 864839 735130
Time kswapd compacting 0 0
Time kcompactd compacting 0 257585
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: accf62422b3a67fce8ce086aa81c8300ddbf42be
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I5cc784ff0fb1b28bdcfa8d4ef9bd33f585ee4662
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Memory compaction can be currently performed in several contexts:
- kswapd balancing a zone after a high-order allocation failure
- direct compaction to satisfy a high-order allocation, including THP
page fault attemps
- khugepaged trying to collapse a hugepage
- manually from /proc
The purpose of compaction is two-fold. The obvious purpose is to
satisfy a (pending or future) high-order allocation, and is easy to
evaluate. The other purpose is to keep overal memory fragmentation low
and help the anti-fragmentation mechanism. The success wrt the latter
purpose is more
The current situation wrt the purposes has a few drawbacks:
- compaction is invoked only when a high-order page or hugepage is not
available (or manually). This might be too late for the purposes of
keeping memory fragmentation low.
- direct compaction increases latency of allocations. Again, it would
be better if compaction was performed asynchronously to keep
fragmentation low, before the allocation itself comes.
- (a special case of the previous) the cost of compaction during THP
page faults can easily offset the benefits of THP.
- kswapd compaction appears to be complex, fragile and not working in
some scenarios. It could also end up compacting for a high-order
allocation request when it should be reclaiming memory for a later
order-0 request.
To improve the situation, we should be able to benefit from an
equivalent of kswapd, but for compaction - i.e. a background thread
which responds to fragmentation and the need for high-order allocations
(including hugepages) somewhat proactively.
One possibility is to extend the responsibilities of kswapd, which could
however complicate its design too much. It should be better to let
kswapd handle reclaim, as order-0 allocations are often more critical
than high-order ones.
Another possibility is to extend khugepaged, but this kthread is a
single instance and tied to THP configs.
This patch goes with the option of a new set of per-node kthreads called
kcompactd, and lays the foundations, without introducing any new
tunables. The lifecycle mimics kswapd kthreads, including the memory
hotplug hooks.
For compaction, kcompactd uses the standard compaction_suitable() and
ompact_finished() criteria and the deferred compaction functionality.
Unlike direct compaction, it uses only sync compaction, as there's no
allocation latency to minimize.
This patch doesn't yet add a call to wakeup_kcompactd. The kswapd
compact/reclaim loop for high-order pages will be replaced by waking up
kcompactd in the next patch with the description of what's wrong with
the old approach.
Waking up of the kcompactd threads is also tied to kswapd activity and
follows these rules:
- we don't want to affect any fastpaths, so wake up kcompactd only from
the slowpath, as it's done for kswapd
- if kswapd is doing reclaim, it's more important than compaction, so
don't invoke kcompactd until kswapd goes to sleep
- the target order used for kswapd is passed to kcompactd
Future possible future uses for kcompactd include the ability to wake up
kcompactd on demand in special situations, such as when hugepages are
not available (currently not done due to __GFP_NO_KSWAPD) or when a
fragmentation event (i.e. __rmqueue_fallback()) occurs. It's also
possible to perform periodic compaction with kcompactd.
[arnd@arndb.de: fix build errors with kcompactd]
[paul.gortmaker@windriver.com: don't use modular references for non modular code]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 698b1b30642f1ff0ea10ef1de9745ab633031377
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I987ae548cba936987b8479dc02de67d0f88b9cb6
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* v4.4-16.09-android-tmp:
unsafe_[get|put]_user: change interface to use a error target label
usercopy: remove page-spanning test for now
usercopy: fix overlap check for kernel text
mm/slub: support left redzone
Linux 4.4.21
lib/mpi: mpi_write_sgl(): fix skipping of leading zero limbs
regulator: anatop: allow regulator to be in bypass mode
hwrng: exynos - Disable runtime PM on probe failure
cpufreq: Fix GOV_LIMITS handling for the userspace governor
metag: Fix atomic_*_return inline asm constraints
scsi: fix upper bounds check of sense key in scsi_sense_key_string()
ALSA: timer: fix NULL pointer dereference on memory allocation failure
ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE
ALSA: timer: fix NULL pointer dereference in read()/ioctl() race
ALSA: hda - Enable subwoofer on Dell Inspiron 7559
ALSA: hda - Add headset mic quirk for Dell Inspiron 5468
ALSA: rawmidi: Fix possible deadlock with virmidi registration
ALSA: fireworks: accessing to user space outside spinlock
ALSA: firewire-tascam: accessing to user space outside spinlock
ALSA: usb-audio: Add sample rate inquiry quirk for B850V3 CP2114
crypto: caam - fix IV loading for authenc (giv)decryption
uprobes: Fix the memcg accounting
x86/apic: Do not init irq remapping if ioapic is disabled
vhost/scsi: fix reuse of &vq->iov[out] in response
bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two.
ubifs: Fix assertion in layout_in_gaps()
ovl: fix workdir creation
ovl: listxattr: use strnlen()
ovl: remove posix_acl_default from workdir
ovl: don't copy up opaqueness
wrappers for ->i_mutex access
lustre: remove unused declaration
timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
timekeeping: Cap array access in timekeeping_debug
xfs: fix superblock inprogress check
ASoC: atmel_ssc_dai: Don't unconditionally reset SSC on stream startup
drm/msm: fix use of copy_from_user() while holding spinlock
drm: Reject page_flip for !DRIVER_MODESET
drm/radeon: fix radeon_move_blit on 32bit systems
s390/sclp_ctl: fix potential information leak with /dev/sclp
rds: fix an infoleak in rds_inc_info_copy
powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
nvme: Call pci_disable_device on the error path.
cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
block: make sure a big bio is split into at most 256 bvecs
block: Fix race triggered by blk_set_queue_dying()
ext4: avoid modifying checksum fields directly during checksum verification
ext4: avoid deadlock when expanding inode size
ext4: properly align shifted xattrs when expanding inodes
ext4: fix xattr shifting when expanding inodes part 2
ext4: fix xattr shifting when expanding inodes
ext4: validate that metadata blocks do not overlap superblock
net: Use ns_capable_noaudit() when determining net sysctl permissions
kernel: Add noaudit variant of ns_capable()
KEYS: Fix ASN.1 indefinite length object parsing
drivers:hv: Lock access to hyperv_mmio resource tree
cxlflash: Move to exponential back-off when cmd_room is not available
netfilter: x_tables: check for size overflow
drm/amdgpu/cz: enable/disable vce dpm even if vce pg is disabled
cred: Reject inodes with invalid ids in set_create_file_as()
fs: Check for invalid i_uid in may_follow_link()
IB/IPoIB: Do not set skb truesize since using one linearskb
udp: properly support MSG_PEEK with truncated buffers
crypto: nx-842 - Mask XERS0 bit in return value
cxlflash: Fix to avoid virtual LUN failover failure
cxlflash: Fix to escalate LINK_RESET also on port 1
tipc: fix nl compat regression for link statistics
tipc: fix an infoleak in tipc_nl_compat_link_dump
netfilter: x_tables: check for size overflow
Bluetooth: Add support for Intel Bluetooth device 8265 [8087:0a2b]
drm/i915: Check VBT for port presence in addition to the strap on VLV/CHV
drm/i915: Only ignore eDP ports that are connected
Input: xpad - move pending clear to the correct location
net: thunderx: Fix link status reporting
x86/hyperv: Avoid reporting bogus NMI status for Gen2 instances
crypto: vmx - IV size failing on skcipher API
tda10071: Fix dependency to REGMAP_I2C
crypto: vmx - Fix ABI detection
crypto: vmx - comply with ABIs that specify vrsave as reserved.
HID: core: prevent out-of-bound readings
lpfc: Fix DMA faults observed upon plugging loopback connector
block: fix blk_rq_get_max_sectors for driver private requests
irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144
clocksource: Allow unregistering the watchdog
btrfs: Continue write in case of can_not_nocow
blk-mq: End unstarted requests on dying queue
cxlflash: Fix to resolve dead-lock during EEH recovery
drm/radeon/mst: fix regression in lane/link handling.
ecryptfs: fix handling of directory opening
ALSA: hda: add AMD Polaris-10/11 AZ PCI IDs with proper driver caps
drm: Balance error path for GEM handle allocation
ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO
time: Verify time values in adjtimex ADJ_SETOFFSET to avoid overflow
Input: xpad - correctly handle concurrent LED and FF requests
net: thunderx: Fix receive packet stats
net: thunderx: Fix for multiqset not configured upon interface toggle
perf/x86/cqm: Fix CQM memory leak and notifier leak
perf/x86/cqm: Fix CQM handling of grouping events into a cache_group
s390/crypto: provide correct file mode at device register.
proc: revert /proc/<pid>/maps [stack:TID] annotation
intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
cxlflash: Fix to avoid unnecessary scan with internal LUNs
Drivers: hv: vmbus: don't manipulate with clocksources on crash
Drivers: hv: vmbus: avoid scheduling in interrupt context in vmbus_initiate_unload()
Drivers: hv: vmbus: avoid infinite loop in init_vp_index()
arcmsr: fixes not release allocated resource
arcmsr: fixed getting wrong configuration data
s390/pci_dma: fix DMA table corruption with > 4 TB main memory
net/mlx5e: Don't modify CQ before it was created
net/mlx5e: Don't try to modify CQ moderation if it is not supported
mmc: sdhci: Do not BUG on invalid vdd
UVC: Add support for R200 depth camera
sched/numa: Fix use-after-free bug in the task_numa_compare
ALSA: hda - add codec support for Kabylake display audio codec
drm/i915: Fix hpd live status bits for g4x
tipc: fix nullptr crash during subscription cancel
arm64: Add workaround for Cavium erratum 27456
net: thunderx: Fix for Qset error due to CQ full
drm/radeon: fix dp link rate selection (v2)
drm/amdgpu: fix dp link rate selection (v2)
qla2xxx: Use ATIO type to send correct tmr response
mmc: sdhci: 64-bit DMA actually has 4-byte alignment
drm/atomic: Do not unset crtc when an encoder is stolen
drm/i915/skl: Add missing SKL ids
drm/i915/bxt: update list of PCIIDs
hrtimer: Catch illegal clockids
i40e/i40evf: Fix RSS rx-flow-hash configuration through ethtool
mpt3sas: Fix for Asynchronous completion of timedout IO and task abort of timedout IO.
mpt3sas: A correction in unmap_resources
net: cavium: liquidio: fix check for in progress flag
arm64: KVM: Configure TCR_EL2.PS at runtime
irqchip/gic-v3: Make sure read from ICC_IAR1_EL1 is visible on redestributor
pwm: lpc32xx: fix and simplify duty cycle and period calculations
pwm: lpc32xx: correct number of PWM channels from 2 to 1
pwm: fsl-ftm: Fix clock enable/disable when using PM
megaraid_sas: Add an i/o barrier
megaraid_sas: Fix SMAP issue
megaraid_sas: Do not allow PCI access during OCR
s390/cio: update measurement characteristics
s390/cio: ensure consistent measurement state
s390/cio: fix measurement characteristics memleak
qeth: initialize net_device with carrier off
lpfc: Fix external loopback failure.
lpfc: Fix mbox reuse in PLOGI completion
lpfc: Fix RDP Speed reporting.
lpfc: Fix crash in fcp command completion path.
lpfc: Fix driver crash when module parameter lpfc_fcp_io_channel set to 16
lpfc: Fix RegLogin failed error seen on Lancer FC during port bounce
lpfc: Fix the FLOGI discovery logic to comply with T11 standards
lpfc: Fix FCF Infinite loop in lpfc_sli4_fcf_rr_next_index_get.
cxl: Enable PCI device ID for future IBM CXL adapter
cxl: fix build for GCC 4.6.x
cxlflash: Enable device id for future IBM CXL adapter
cxlflash: Resolve oops in wait_port_offline
cxlflash: Fix to resolve cmd leak after host reset
cxl: Fix DSI misses when the context owning task exits
cxl: Fix possible idr warning when contexts are released
Drivers: hv: vmbus: fix rescind-offer handling for device without a driver
Drivers: hv: vmbus: serialize process_chn_event() and vmbus_close_internal()
Drivers: hv: vss: run only on supported host versions
drivers/hv: cleanup synic msrs if vmbus connect failed
Drivers: hv: util: catch allocation errors
tools: hv: report ENOSPC errors in hv_fcopy_daemon
Drivers: hv: utils: run polling callback always in interrupt context
Drivers: hv: util: Increase the timeout for util services
lightnvm: fix missing grown bad block type
lightnvm: fix locking and mempool in rrpc_lun_gc
lightnvm: unlock rq and free ppa_list on submission fail
lightnvm: add check after mempool allocation
lightnvm: fix incorrect nr_free_blocks stat
lightnvm: fix bio submission issue
cxlflash: a couple off by one bugs
fm10k: Cleanup exception handling for mailbox interrupt
fm10k: Cleanup MSI-X interrupts in case of failure
fm10k: reinitialize queuing scheme after calling init_hw
fm10k: always check init_hw for errors
fm10k: reset max_queues on init_hw_vf failure
fm10k: Fix handling of NAPI budget when multiple queues are enabled per vector
fm10k: Correct MTU for jumbo frames
fm10k: do not assume VF always has 1 queue
clk: xgene: Fix divider with non-zero shift value
e1000e: fix division by zero on jumbo MTUs
e1000: fix data race between tx_ring->next_to_clean
ixgbe: Fix handling of NAPI budget when multiple queues are enabled per vector
igb: fix NULL derefs due to skipped SR-IOV enabling
igb: use the correct i210 register for EEMNGCTL
igb: don't unmap NULL hw_addr
i40e: Fix Rx hash reported to the stack by our driver
i40e: clean whole mac filter list
i40evf: check rings before freeing resources
i40e: don't add zero MAC filter
i40e: properly delete VF MAC filters
i40e: Fix memory leaks, sideband filter programming
i40e: fix: do not sleep in netdev_ops
i40e/i40evf: Fix RS bit update in Tx path and disable force WB workaround
i40evf: handle many MAC filters correctly
i40e: Workaround fix for mss < 256 issue
UPSTREAM: audit: fix a double fetch in audit_log_single_execve_arg()
UPSTREAM: ARM: 8494/1: mm: Enable PXN when running non-LPAE kernel on LPAE processor
FIXUP: sched/tune: update accouting before CPU capacity
FIXUP: sched/tune: add fixes missing from a previous patch
arm: Fix #if/#ifdef typo in topology.c
arm: Fix build error "conflicting types for 'scale_cpu_capacity'"
sched/walt: use do_div instead of division operator
DEBUG: cpufreq: fix cpu_capacity tracing build for non-smp systems
sched/walt: include missing header for arm_timer_read_counter()
cpufreq: Kconfig: Fixup incorrect selection by CPU_FREQ_DEFAULT_GOV_SCHED
sched/fair: Avoid redundant idle_cpu() call in update_sg_lb_stats()
FIXUP: sched: scheduler-driven cpu frequency selection
sched/rt: Add Kconfig option to enable panicking for RT throttling
sched/rt: print RT tasks when RT throttling is activated
UPSTREAM: sched: Fix a race between __kthread_bind() and sched_setaffinity()
sched/fair: Favor higher cpus only for boosted tasks
vmstat: make vmstat_updater deferrable again and shut down on idle
sched/fair: call OPP update when going idle after migration
sched/cpufreq_sched: fix thermal capping events
sched/fair: Picking cpus with low OPPs for tasks that prefer idle CPUs
FIXUP: sched/tune: do initialization as a postcore_initicall
DEBUG: sched: add tracepoint for RD overutilized
sched/tune: Introducing a new schedtune attribute prefer_idle
sched: use util instead of capacity to select busy cpu
arch_timer: add error handling when the MPM global timer is cleared
FIXUP: sched: Fix double-release of spinlock in move_queued_task
FIXUP: sched/fair: Fix hang during suspend in sched_group_energy
FIXUP: sched: fix SchedFreq integration for both PELT and WALT
sched: EAS: Avoid causing spikes to max-freq unnecessarily
FIXUP: sched: fix set_cfs_cpu_capacity when WALT is in use
sched/walt: Accounting for number of irqs pending on each core
sched: Introduce Window Assisted Load Tracking (WALT)
sched/tune: fix PB and PC cuts indexes definition
sched/fair: optimize idle cpu selection for boosted tasks
FIXUP: sched/tune: fix accounting for runnable tasks
sched/tune: use a single initialisation function
sched/{fair,tune}: simplify fair.c code
FIXUP: sched/tune: fix payoff calculation for boost region
sched/tune: Add support for negative boost values
FIX: sched/tune: move schedtune_nornalize_energy into fair.c
FIX: sched/tune: update usage of boosted task utilisation on CPU selection
sched/fair: add tunable to set initial task load
sched/fair: add tunable to force selection at cpu granularity
sched: EAS: take cstate into account when selecting idle core
sched/cpufreq_sched: Consolidated update
FIXUP: sched: fix build for non-SMP target
DEBUG: sched/tune: add tracepoint on P-E space filtering
DEBUG: sched/tune: add tracepoint for energy_diff() values
DEBUG: sched/tune: add tracepoint for task boost signal
arm: topology: Define TC2 energy and provide it to the scheduler
CHROMIUM: sched: update the average of nr_running
DEBUG: schedtune: add tracepoint for schedtune_tasks_update() values
DEBUG: schedtune: add tracepoint for CPU boost signal
DEBUG: schedtune: add tracepoint for SchedTune configuration update
DEBUG: sched: add energy procfs interface
DEBUG: sched,cpufreq: add cpu_capacity change tracepoint
DEBUG: sched: add tracepoint for CPU load/util signals
DEBUG: sched: add tracepoint for task load/util signals
DEBUG: sched: add tracepoint for cpu/freq scale invariance
sched/fair: filter energy_diff() based on energy_payoff value
sched/tune: add support to compute normalized energy
sched/fair: keep track of energy/capacity variations
sched/fair: add boosted task utilization
sched/{fair,tune}: track RUNNABLE tasks impact on per CPU boost value
sched/tune: compute and keep track of per CPU boost value
sched/tune: add initial support for CGroups based boosting
sched/fair: add boosted CPU usage
sched/fair: add function to convert boost value into "margin"
sched/tune: add sysctl interface to define a boost value
sched/tune: add detailed documentation
fixup! sched/fair: jump to max OPP when crossing UP threshold
fixup! sched: scheduler-driven cpu frequency selection
sched: rt scheduler sets capacity requirement
sched: deadline: use deadline bandwidth in scale_rt_capacity
sched: remove call of sched_avg_update from sched_rt_avg_update
sched/cpufreq_sched: add trace events
sched/fair: jump to max OPP when crossing UP threshold
sched/fair: cpufreq_sched triggers for load balancing
sched/{core,fair}: trigger OPP change request on fork()
sched/fair: add triggers for OPP change requests
sched: scheduler-driven cpu frequency selection
cpufreq: introduce cpufreq_driver_is_slow
sched: Consider misfit tasks when load-balancing
sched: Add group_misfit_task load-balance type
sched: Add per-cpu max capacity to sched_group_capacity
sched: Do eas idle balance regardless of the rq avg idle value
arm64: Enable max freq invariant scheduler load-tracking and capacity support
arm: Enable max freq invariant scheduler load-tracking and capacity support
sched: Update max cpu capacity in case of max frequency constraints
cpufreq: Max freq invariant scheduler load-tracking and cpu capacity support
arm64, topology: Updates to use DT bindings for EAS costing data
sched: Support for extracting EAS energy costs from DT
Documentation: DT bindings for energy model cost data required by EAS
sched: Disable energy-unfriendly nohz kicks
sched: Consider a not over-utilized energy-aware system as balanced
sched: Energy-aware wake-up task placement
sched: Determine the current sched_group idle-state
sched, cpuidle: Track cpuidle state index in the scheduler
sched: Add over-utilization/tipping point indicator
sched: Estimate energy impact of scheduling decisions
sched: Extend sched_group_energy to test load-balancing decisions
sched: Calculate energy consumption of sched_group
sched: Highest energy aware balancing sched_domain level pointer
sched: Relocated cpu_util() and change return type
sched: Compute cpu capacity available at current frequency
arm64: Cpu invariant scheduler load-tracking and capacity support
arm: Cpu invariant scheduler load-tracking and capacity support
sched: Introduce SD_SHARE_CAP_STATES sched_domain flag
sched: Initialize energy data structures
sched: Introduce energy data structures
sched: Make energy awareness a sched feature
sched: Documentation for scheduler energy cost model
sched: Prevent unnecessary active balance of single task in sched group
sched: Enable idle balance to pull single task towards cpu with higher capacity
sched: Consider spare cpu capacity at task wake-up
sched: Add cpu capacity awareness to wakeup balancing
sched: Store system-wide maximum cpu capacity in root domain
arm: Update arch_scale_cpu_capacity() to reflect change to define
arm64: Enable frequency invariant scheduler load-tracking support
arm: Enable frequency invariant scheduler load-tracking support
cpufreq: Frequency invariant scheduler load-tracking support
sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task()
FROMLIST: pstore: drop pmsg bounce buffer
UPSTREAM: usercopy: remove page-spanning test for now
UPSTREAM: usercopy: force check_object_size() inline
BACKPORT: usercopy: fold builtin_const check into inline function
UPSTREAM: x86/uaccess: force copy_*_user() to be inlined
UPSTREAM: HID: core: prevent out-of-bound readings
Android: Fix build breakages.
UPSTREAM: tty: Prevent ldisc drivers from re-using stale tty fields
UPSTREAM: netfilter: nfnetlink: correctly validate length of batch messages
cpuset: Make cpusets restore on hotplug
UPSTREAM: mm/slub: support left redzone
UPSTREAM: Make the hardened user-copy code depend on having a hardened allocator
Android: MMC/UFS IO Latency Histograms.
UPSTREAM: usercopy: fix overlap check for kernel text
UPSTREAM: usercopy: avoid potentially undefined behavior in pointer math
UPSTREAM: unsafe_[get|put]_user: change interface to use a error target label
BACKPORT: arm64: mm: fix location of _etext
BACKPORT: ARM: 8583/1: mm: fix location of _etext
BACKPORT: Don't show empty tag stats for unprivileged uids
UPSTREAM: tcp: fix use after free in tcp_xmit_retransmit_queue()
ANDROID: base-cfg: drop SECCOMP_FILTER config
UPSTREAM: [media] xc2028: unlock on error in xc2028_set_config()
UPSTREAM: [media] xc2028: avoid use after free
ANDROID: base-cfg: enable SECCOMP config
ANDROID: rcu_sync: Export rcu_sync_lockdep_assert
RFC: FROMLIST: cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
RFC: FROMLIST: cgroup: avoid synchronize_sched() in __cgroup_procs_write()
RFC: FROMLIST: locking/percpu-rwsem: Optimize readers and reduce global impact
net: ipv6: Fix ping to link-local addresses.
ipv6: fix endianness error in icmpv6_err
ANDROID: dm: android-verity: Allow android-verity to be compiled as an independent module
backporting: a brief introduce of backported feautures on 4.4
Linux 4.4.20
sysfs: correctly handle read offset on PREALLOC attrs
hwmon: (iio_hwmon) fix memory leak in name attribute
ALSA: line6: Fix POD sysfs attributes segfault
ALSA: line6: Give up on the lock while URBs are released.
ALSA: line6: Remove double line6_pcm_release() after failed acquire.
ACPI / SRAT: fix SRAT parsing order with both LAPIC and X2APIC present
ACPI / sysfs: fix error code in get_status()
ACPI / drivers: replace acpi_probe_lock spinlock with mutex
ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro
staging: comedi: ni_mio_common: fix wrong insn_write handler
staging: comedi: ni_mio_common: fix AO inttrig backwards compatibility
staging: comedi: comedi_test: fix timer race conditions
staging: comedi: daqboard2000: bug fix board type matching code
USB: serial: option: add WeTelecom 0x6802 and 0x6803 products
USB: serial: option: add WeTelecom WM-D200
USB: serial: mos7840: fix non-atomic allocation in write path
USB: serial: mos7720: fix non-atomic allocation in write path
USB: fix typo in wMaxPacketSize validation
usb: chipidea: udc: don't touch DP when controller is in host mode
USB: avoid left shift by -1
dmaengine: usb-dmac: check CHCR.DE bit in usb_dmac_isr_channel()
crypto: qat - fix aes-xts key sizes
crypto: nx - off by one bug in nx_of_update_msc()
Input: i8042 - set up shared ps2_cmd_mutex for AUX ports
Input: i8042 - break load dependency between atkbd/psmouse and i8042
Input: tegra-kbc - fix inverted reset logic
btrfs: properly track when rescan worker is running
btrfs: waiting on qgroup rescan should not always be interruptible
fs/seq_file: fix out-of-bounds read
gpio: Fix OF build problem on UM
usb: renesas_usbhs: gadget: fix return value check in usbhs_mod_gadget_probe()
megaraid_sas: Fix probing cards without io port
mpt3sas: Fix resume on WarpDrive flash cards
cdc-acm: fix wrong pipe type on rx interrupt xfers
i2c: cros-ec-tunnel: Fix usage of cros_ec_cmd_xfer()
mfd: cros_ec: Add cros_ec_cmd_xfer_status() helper
aacraid: Check size values after double-fetch from user
ARC: Elide redundant setup of DMA callbacks
ARC: Call trace_hardirqs_on() before enabling irqs
ARC: use correct offset in pt_regs for saving/restoring user mode r25
ARC: build: Better way to detect ISA compatible toolchain
drm/i915: fix aliasing_ppgtt leak
drm/amdgpu: record error code when ring test failed
drm/amd/amdgpu: sdma resume fail during S4 on CI
drm/amdgpu: skip TV/CV in display parsing
drm/amdgpu: avoid a possible array overflow
drm/amdgpu: fix amdgpu_move_blit on 32bit systems
drm/amdgpu: Change GART offset to 64-bit
iio: fix sched WARNING "do not call blocking ops when !TASK_RUNNING"
sched/nohz: Fix affine unpinned timers mess
sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
of: fix reference counting in of_graph_get_endpoint_by_regs
arm64: dts: rockchip: add reset saradc node for rk3368 SoCs
mac80211: fix purging multicast PS buffer queue
s390/dasd: fix hanging device after clear subchannel
EDAC: Increment correct counter in edac_inc_ue_error()
pinctrl/amd: Remove the default de-bounce time
iommu/arm-smmu: Don't BUG() if we find aborting STEs with disable_bypass
iommu/arm-smmu: Fix CMDQ error handling
iommu/dma: Don't put uninitialised IOVA domains
xhci: Make sure xhci handles USB_SPEED_SUPER_PLUS devices.
USB: serial: ftdi_sio: add PIDs for Ivium Technologies devices
USB: serial: ftdi_sio: add device ID for WICED USB UART dev board
USB: serial: option: add support for Telit LE920A4
USB: serial: option: add D-Link DWM-156/A3
USB: serial: fix memleak in driver-registration error path
xhci: don't dereference a xhci member after removing xhci
usb: xhci: Fix panic if disconnect
xhci: always handle "Command Ring Stopped" events
usb/gadget: fix gadgetfs aio support.
usb: gadget: fsl_qe_udc: off by one in setup_received_handle()
USB: validate wMaxPacketValue entries in endpoint descriptors
usb: renesas_usbhs: Use dmac only if the pipe type is bulk
usb: renesas_usbhs: clear the BRDYSTS in usbhsg_ep_enable()
USB: hub: change the locking in hub_activate
USB: hub: fix up early-exit pathway in hub_activate
usb: hub: Fix unbalanced reference count/memory leak/deadlocks
usb: define USB_SPEED_SUPER_PLUS speed for SuperSpeedPlus USB3.1 devices
usb: dwc3: gadget: increment request->actual once
usb: dwc3: pci: add Intel Kabylake PCI ID
usb: misc: usbtest: add fix for driver hang
usb: ehci: change order of register cleanup during shutdown
crypto: caam - defer aead_set_sh_desc in case of zero authsize
crypto: caam - fix echainiv(authenc) encrypt shared descriptor
crypto: caam - fix non-hmac hashes
genirq/msi: Make sure PCI MSIs are activated early
genirq/msi: Remove unused MSI_FLAG_IDENTITY_MAP
um: Don't discard .text.exit section
ACPI / CPPC: Prevent cpc_desc_ptr points to the invalid data
ACPI: CPPC: Return error if _CPC is invalid on a CPU
mmc: sdhci-acpi: Reduce Baytrail eMMC/SD/SDIO hangs
PCI: Limit config space size for Netronome NFP4000
PCI: Add Netronome NFP4000 PF device ID
PCI: Limit config space size for Netronome NFP6000 family
PCI: Add Netronome vendor and device IDs
PCI: Support PCIe devices with short cfg_size
NVMe: Don't unmap controller registers on reset
ALSA: hda - Manage power well properly for resume
libnvdimm, nd_blk: mask off reserved status bits
perf intel-pt: Fix occasional decoding errors when tracing system-wide
vfio/pci: Fix NULL pointer oops in error interrupt setup handling
virtio: fix memory leak in virtqueue_add()
parisc: Fix order of EREFUSED define in errno.h
arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
ALSA: usb-audio: Add quirk for ELP HD USB Camera
ALSA: usb-audio: Add a sample rate quirk for Creative Live! Cam Socialize HD (VF0610)
powerpc/eeh: eeh_pci_enable(): fix checking of post-request state
SUNRPC: allow for upcalls for same uid but different gss service
SUNRPC: Handle EADDRNOTAVAIL on connection failures
tools/testing/nvdimm: fix SIGTERM vs hotplug crash
uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
x86/mm: Disable preemption during CR3 read+write
hugetlb: fix nr_pmds accounting with shared page tables
mm: SLUB hardened usercopy support
mm: SLAB hardened usercopy support
s390/uaccess: Enable hardened usercopy
sparc/uaccess: Enable hardened usercopy
powerpc/uaccess: Enable hardened usercopy
ia64/uaccess: Enable hardened usercopy
arm64/uaccess: Enable hardened usercopy
ARM: uaccess: Enable hardened usercopy
x86/uaccess: Enable hardened usercopy
x86: remove more uaccess_32.h complexity
x86: remove pointless uaccess_32.h complexity
x86: fix SMAP in 32-bit environments
Use the new batched user accesses in generic user string handling
Add 'unsafe' user access functions for batched accesses
x86: reorganize SMAP handling in user space accesses
mm: Hardened usercopy
mm: Implement stack frame object validation
mm: Add is_migrate_cma_page
Linux 4.4.19
Documentation/module-signing.txt: Note need for version info if reusing a key
module: Invalidate signatures on force-loaded modules
dm flakey: error READ bios during the down_interval
rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq()
lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from lpfc_send_taskmgmt()
ACPI / EC: Work around method reentrancy limit in ACPICA for _Qxx
x86/platform/intel_mid_pci: Rework IRQ0 workaround
PCI: Mark Atheros AR9485 and QCA9882 to avoid bus reset
MIPS: hpet: Increase HPET_MIN_PROG_DELTA and decrease HPET_MIN_CYCLES
MIPS: Don't register r4k sched clock when CPUFREQ enabled
MIPS: mm: Fix definition of R6 cache instruction
SUNRPC: Don't allocate a full sockaddr_storage for tracing
Input: elan_i2c - properly wake up touchpad on ASUS laptops
target: Fix ordered task CHECK_CONDITION early exception handling
target: Fix max_unmap_lba_count calc overflow
target: Fix race between iscsi-target connection shutdown + ABORT_TASK
target: Fix missing complete during ABORT_TASK + CMD_T_FABRIC_STOP
target: Fix ordered task target_setup_cmd_from_cdb exception hang
iscsi-target: Fix panic when adding second TCP connection to iSCSI session
ubi: Fix race condition between ubi device creation and udev
ubi: Fix early logging
ubi: Make volume resize power cut aware
of: fix memory leak related to safe_name()
IB/mlx4: Fix memory leak if QP creation failed
IB/mlx4: Fix error flow when sending mads under SRIOV
IB/mlx4: Fix the SQ size of an RC QP
IB/IWPM: Fix a potential skb leak
IB/IPoIB: Don't update neigh validity for unresolved entries
IB/SA: Use correct free function
IB/mlx5: Return PORT_ERR in Active to Initializing tranisition
IB/mlx5: Fix post send fence logic
IB/mlx5: Fix entries check in mlx5_ib_resize_cq
IB/mlx5: Fix returned values of query QP
IB/mlx5: Fix entries checks in mlx5_ib_create_cq
IB/mlx5: Fix MODIFY_QP command input structure
ALSA: hda - Fix headset mic detection problem for two dell machines
ALSA: hda: add AMD Bonaire AZ PCI ID with proper driver caps
ALSA: hda/realtek - Can't adjust speaker's volume on a Dell AIO
ALSA: hda: Fix krealloc() with __GFP_ZERO usage
mm/hugetlb: avoid soft lockup in set_max_huge_pages()
mtd: nand: fix bug writing 1 byte less than page size
block: fix bdi vs gendisk lifetime mismatch
block: add missing group association in bio-cloning functions
metag: Fix __cmpxchg_u32 asm constraint for CMP
ftrace/recordmcount: Work around for addition of metag magic but not relocations
balloon: check the number of available pages in leak balloon
drm/i915/dp: Revert "drm/i915/dp: fall back to 18 bpp when sink capability is unknown"
drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB
drm/edid: Add 6 bpc quirk for display AEO model 0.
drm: Restore double clflush on the last partial cacheline
drm/nouveau/fbcon: fix font width not divisible by 8
drm/nouveau/gr/nv3x: fix instobj write offsets in gr setup
drm/nouveau: check for supported chipset before booting fbdev off the hw
drm/radeon: support backlight control for UNIPHY3
drm/radeon: fix firmware info version checks
drm/radeon: Poll for both connect/disconnect on analog connectors
drm/radeon: add a delay after ATPX dGPU power off
drm/amdgpu/gmc7: add missing mullins case
drm/amdgpu: fix firmware info version checks
drm/amdgpu: Disable RPM helpers while reprobing connectors on resume
drm/amdgpu: support backlight control for UNIPHY3
drm/amdgpu: Poll for both connect/disconnect on analog connectors
drm/amdgpu: add a delay after ATPX dGPU power off
w1:omap_hdq: fix regression
netlabel: add address family checks to netlbl_{sock,req}_delattr()
ARM: dts: sunxi: Add a startup delay for fixed regulator enabled phys
audit: fix a double fetch in audit_log_single_execve_arg()
iommu/amd: Update Alias-DTE in update_device_table()
iommu/amd: Init unity mappings only for dma_ops domains
iommu/amd: Handle IOMMU_DOMAIN_DMA in ops->domain_free call-back
iommu/vt-d: Return error code in domain_context_mapping_one()
iommu/exynos: Suppress unbinding to prevent system failure
drm/i915: Don't complain about lack of ACPI video bios
nfsd: don't return an unhashed lock stateid after taking mutex
nfsd: Fix race between FREE_STATEID and LOCK
nfs: don't create zero-length requests
MIPS: KVM: Propagate kseg0/mapped tlb fault errors
MIPS: KVM: Fix gfn range check in kseg0 tlb faults
MIPS: KVM: Add missing gfn range check
MIPS: KVM: Fix mapped fault broken commpage handling
random: add interrupt callback to VMBus IRQ handler
random: print a warning for the first ten uninitialized random users
random: initialize the non-blocking pool via add_hwgenerator_randomness()
CIFS: Fix a possible invalid memory access in smb2_query_symlink()
cifs: fix crash due to race in hmac(md5) handling
cifs: Check for existing directory when opening file with O_CREAT
fs/cifs: make share unaccessible at root level mountable
jbd2: make journal y2038 safe
ARC: mm: don't loose PTE_SPECIAL in pte_modify()
remoteproc: Fix potential race condition in rproc_add
ovl: disallow overlayfs as upperdir
HID: uhid: fix timeout when probe races with IO
EDAC: Correct channel count limit
Bluetooth: Fix l2cap_sock_setsockopt() with optname BT_RCVMTU
spi: pxa2xx: Clear all RFT bits in reset_sccr1() on Intel Quark
i2c: efm32: fix a failure path in efm32_i2c_probe()
s5p-mfc: Add release callback for memory region devs
s5p-mfc: Set device name for reserved memory region devs
hp-wmi: Fix wifi cannot be hard-unblocked
dm: set DMF_SUSPENDED* _before_ clearing DMF_NOFLUSH_SUSPENDING
sur40: fix occasional oopses on device close
sur40: lower poll interval to fix occasional FPS drops to ~56 FPS
Fix RC5 decoding with Fintek CIR chipset
vb2: core: Skip planes array verification if pb is NULL
videobuf2-v4l2: Verify planes array in buffer dequeueing
media: dvb_ringbuffer: Add memory barriers
media: usbtv: prevent access to free'd resources
mfd: qcom_rpm: Parametrize also ack selector size
mfd: qcom_rpm: Fix offset error for msm8660
intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
s390/cio: allow to reset channel measurement block
KVM: nVMX: Fix memory corruption when using VMCS shadowing
KVM: VMX: handle PML full VMEXIT that occurs during event delivery
KVM: MTRR: fix kvm_mtrr_check_gfn_range_consistency page fault
KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
arm64: mm: avoid fdt_check_header() before the FDT is fully mapped
arm64: dts: rockchip: fixes the gic400 2nd region size for rk3368
pinctrl: cherryview: prevent concurrent access to GPIO controllers
Bluetooth: hci_intel: Fix null gpio desc pointer dereference
gpio: intel-mid: Remove potentially harmful code
gpio: pca953x: Fix NBANK calculation for PCA9536
tty/serial: atmel: fix RS485 half duplex with DMA
serial: samsung: Fix ERR pointer dereference on deferred probe
tty: serial: msm: Don't read off end of tx fifo
arm64: Fix incorrect per-cpu usage for boot CPU
arm64: debug: unmask PSTATE.D earlier
arm64: kernel: Save and restore UAO and addr_limit on exception entry
USB: usbfs: fix potential infoleak in devio
usb: renesas_usbhs: fix NULL pointer dereference in xfer_work()
USB: serial: option: add support for Telit LE910 PID 0x1206
usb: dwc3: fix for the isoc transfer EP_BUSY flag
usb: quirks: Add no-lpm quirk for Elan
usb: renesas_usbhs: protect the CFIFOSEL setting in usbhsg_ep_enable()
usb: f_fs: off by one bug in _ffs_func_bind()
usb: gadget: avoid exposing kernel stack
UPSTREAM: usb: gadget: configfs: add mutex lock before unregister gadget
ANDROID: dm-verity: adopt changes made to dm callbacks
UPSTREAM: ecryptfs: fix handling of directory opening
ANDROID: net: core: fix UID-based routing
ANDROID: net: fib: remove duplicate assignment
FROMLIST: proc: Fix timerslack_ns CAP_SYS_NICE check when adjusting self
ANDROID: dm verity fec: pack the fec_header structure
ANDROID: dm: android-verity: Verify header before fetching table
ANDROID: dm: allow adb disable-verity only in userdebug
ANDROID: dm: mount as linear target if eng build
ANDROID: dm: use default verity public key
ANDROID: dm: fix signature verification flag
ANDROID: dm: use name_to_dev_t
ANDROID: dm: rename dm-linear methods for dm-android-verity
ANDROID: dm: Minor cleanup
ANDROID: dm: Mounting root as linear device when verity disabled
ANDROID: dm-android-verity: Rebase on top of 4.1
ANDROID: dm: Add android verity target
ANDROID: dm: fix dm_substitute_devices()
ANDROID: dm: Rebase on top of 4.1
CHROMIUM: dm: boot time specification of dm=
Implement memory_state_time, used by qcom,cpubw
Revert "panic: Add board ID to panic output"
usb: gadget: f_accessory: remove duplicate endpoint alloc
BACKPORT: brcmfmac: defer DPC processing during probe
FROMLIST: proc: Add LSM hook checks to /proc/<tid>/timerslack_ns
FROMLIST: proc: Relax /proc/<tid>/timerslack_ns capability requirements
UPSTREAM: ppp: defer netns reference release for ppp channel
cpuset: Add allow_attach hook for cpusets on android.
UPSTREAM: KEYS: Fix ASN.1 indefinite length object parsing
ANDROID: sdcardfs: fix itnull.cocci warnings
android-recommended.cfg: enable fstack-protector-strong
Linux 4.4.18
mm: memcontrol: fix memcg id ref counter on swap charge move
mm: memcontrol: fix swap counter leak on swapout from offline cgroup
mm: memcontrol: fix cgroup creation failure after many small jobs
ext4: fix reference counting bug on block allocation error
ext4: short-cut orphan cleanup on error
ext4: validate s_reserved_gdt_blocks on mount
ext4: don't call ext4_should_journal_data() on the journal inode
ext4: fix deadlock during page writeback
ext4: check for extents that wrap around
crypto: scatterwalk - Fix test in scatterwalk_done
crypto: gcm - Filter out async ghash if necessary
fs/dcache.c: avoid soft-lockup in dput()
fuse: fix wrong assignment of ->flags in fuse_send_init()
fuse: fuse_flush must check mapping->flags for errors
fuse: fsync() did not return IO errors
sysv, ipc: fix security-layer leaking
block: fix use-after-free in seq file
x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace
drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)
x86/mm/pat: Fix BUG_ON() in mmap_mem() on QEMU/i386
x86/pat: Document the PAT initialization sequence
x86/xen, pat: Remove PAT table init code from Xen
x86/mtrr: Fix PAT init handling when MTRR is disabled
x86/mtrr: Fix Xorg crashes in Qemu sessions
x86/mm/pat: Replace cpu_has_pat with boot_cpu_has()
x86/mm/pat: Add pat_disable() interface
x86/mm/pat: Add support of non-default PAT MSR setting
devpts: clean up interface to pty drivers
random: strengthen input validation for RNDADDTOENTCNT
apparmor: fix ref count leak when profile sha1 hash is read
Revert "s390/kdump: Clear subchannel ID to signal non-CCW/SCSI IPL"
KEYS: 64-bit MIPS needs to use compat_sys_keyctl for 32-bit userspace
arm: oabi compat: add missing access checks
cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind
i2c: i801: Allow ACPI SystemIO OpRegion to conflict with PCI BAR
x86/mm/32: Enable full randomization on i386 and X86_32
HID: sony: do not bail out when the sixaxis refuses the output report
PNP: Add Broadwell to Intel MCH size workaround
PNP: Add Haswell-ULT to Intel MCH size workaround
scsi: ignore errors from scsi_dh_add_device()
ipath: Restrict use of the write() interface
tcp: consider recv buf for the initial window scale
qed: Fix setting/clearing bit in completion bitmap
net/irda: fix NULL pointer dereference on memory allocation failure
net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
bonding: set carrier off for devices created through netlink
ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space
tcp: enable per-socket rate limiting of all 'challenge acks'
tcp: make challenge acks less predictable
arm64: relocatable: suppress R_AARCH64_ABS64 relocations in vmlinux
arm64: vmlinux.lds: make __rela_offset and __dynsym_offset ABSOLUTE
Linux 4.4.17
vfs: fix deadlock in file_remove_privs() on overlayfs
intel_th: Fix a deadlock in modprobing
intel_th: pci: Add Kaby Lake PCH-H support
net: mvneta: set real interrupt per packet for tx_done
libceph: apply new_state before new_up_client on incrementals
libata: LITE-ON CX1-JB256-HP needs lower max_sectors
i2c: mux: reg: wrong condition checked for of_address_to_resource return value
posix_cpu_timer: Exit early when process has been reaped
media: fix airspy usb probe error path
ipr: Clear interrupt on croc/crocodile when running with LSI
SCSI: fix new bug in scsi_dev_info_list string matching
RDS: fix rds_tcp_init() error path
can: fix oops caused by wrong rtnl dellink usage
can: fix handling of unmodifiable configuration options fix
can: c_can: Update D_CAN TX and RX functions to 32 bit - fix Altera Cyclone access
can: at91_can: RX queue could get stuck at high bus load
perf/x86: fix PEBS issues on Intel Atom/Core2
ovl: handle ATTR_KILL*
sched/fair: Fix effective_load() to consistently use smoothed load
mmc: block: fix packed command header endianness
block: fix use-after-free in sys_ioprio_get()
qeth: delete napi struct when removing a qeth device
platform/chrome: cros_ec_dev - double fetch bug in ioctl
clk: rockchip: initialize flags of clk_init_data in mmc-phase clock
spi: sun4i: fix FIFO limit
spi: sunxi: fix transfer timeout
namespace: update event counter when umounting a deleted dentry
9p: use file_dentry()
ext4: verify extent header depth
ecryptfs: don't allow mmap when the lower fs doesn't support it
Revert "ecryptfs: forbid opening files without mmap handler"
locks: use file_inode()
power_supply: power_supply_read_temp only if use_cnt > 0
cgroup: set css->id to -1 during init
pinctrl: imx: Do not treat a PIN without MUX register as an error
pinctrl: single: Fix missing flush of posted write for a wakeirq
pvclock: Add CPU barriers to get correct version value
Input: tsc200x - report proper input_dev name
Input: xpad - validate USB endpoint count during probe
Input: wacom_w8001 - w8001_MAX_LENGTH should be 13
Input: xpad - fix oops when attaching an unknown Xbox One gamepad
Input: elantech - add more IC body types to the list
Input: vmmouse - remove port reservation
ALSA: timer: Fix leak in events via snd_timer_user_tinterrupt
ALSA: timer: Fix leak in events via snd_timer_user_ccallback
ALSA: timer: Fix leak in SNDRV_TIMER_IOCTL_PARAMS
xenbus: don't bail early from xenbus_dev_request_and_reply()
xenbus: don't BUG() on user mode induced condition
xen/pciback: Fix conf_space read/write overlap check.
ARC: unwind: ensure that .debug_frame is generated (vs. .eh_frame)
arc: unwind: warn only once if DW2_UNWIND is disabled
kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
pps: do not crash when failed to register
vmlinux.lds: account for destructor sections
mm, meminit: ensure node is online before checking whether pages are uninitialised
mm, meminit: always return a valid node from early_pfn_to_nid
mm, compaction: prevent VM_BUG_ON when terminating freeing scanner
fs/nilfs2: fix potential underflow in call to crc32_le
mm, compaction: abort free scanner if split fails
mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
dmaengine: at_xdmac: double FIFO flush needed to compute residue
dmaengine: at_xdmac: fix residue corruption
dmaengine: at_xdmac: align descriptors on 64 bits
x86/quirks: Add early quirk to reset Apple AirPort card
x86/quirks: Reintroduce scanning of secondary buses
x86/quirks: Apply nvidia_bugs quirk only on root bus
USB: OHCI: Don't mark EDs as ED_OPER if scheduling fails
Conflicts:
arch/arm/kernel/topology.c
arch/arm64/include/asm/arch_gicv3.h
arch/arm64/kernel/topology.c
block/bio.c
drivers/cpufreq/Kconfig
drivers/md/Makefile
drivers/media/dvb-core/dvb_ringbuffer.c
drivers/media/tuners/tuner-xc2028.c
drivers/misc/Kconfig
drivers/misc/Makefile
drivers/mmc/core/host.c
drivers/scsi/ufs/ufshcd.c
drivers/scsi/ufs/ufshcd.h
drivers/usb/dwc3/gadget.c
drivers/usb/gadget/configfs.c
fs/ecryptfs/file.c
include/linux/mmc/core.h
include/linux/mmc/host.h
include/linux/mmzone.h
include/linux/sched.h
include/linux/sched/sysctl.h
include/trace/events/power.h
include/trace/events/sched.h
init/Kconfig
kernel/cpuset.c
kernel/exit.c
kernel/sched/Makefile
kernel/sched/core.c
kernel/sched/cputime.c
kernel/sched/fair.c
kernel/sched/features.h
kernel/sched/rt.c
kernel/sched/sched.h
kernel/sched/stop_task.c
kernel/sched/tune.c
lib/Kconfig.debug
mm/Makefile
mm/vmstat.c
Change-Id: I243a43231ca56a6362076fa6301827e1b0493be5
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
commit a46cbf3bc53b6a93fb84a5ffb288c354fa807954 upstream.
It's possible to isolate some freepages in a pageblock and then fail
split_free_page() due to the low watermark check. In this case, we hit
VM_BUG_ON() because the freeing scanner terminated early without a
contended lock or enough freepages.
This should never have been a VM_BUG_ON() since it's not a fatal
condition. It should have been a VM_WARN_ON() at best, or even handled
gracefully.
Regardless, we need to terminate anytime the full pageblock scan was not
done. The logic belongs in isolate_freepages_block(), so handle its
state gracefully by terminating the pageblock loop and making a note to
restart at the same pageblock next time since it was not possible to
complete the scan this time.
[rientjes@google.com: don't rescan pages in a pageblock]
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1607111244150.83138@chino.kir.corp.google.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606291436300.145590@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Minchan Kim <minchan@kernel.org>
Tested-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
commit a4f04f2c6955aff5e2c08dcb40aca247ff4d7370 upstream.
If the memory compaction free scanner cannot successfully split a free
page (only possible due to per-zone low watermark), terminate the free
scanner rather than continuing to scan memory needlessly. If the
watermark is insufficient for a free page of order <= cc->order, then
terminate the scanner since all future splits will also likely fail.
This prevents the compaction freeing scanner from scanning all memory on
very large zones (very noticeable for zones > 128GB, for instance) when
all splits will likely fail while holding zone->lock.
compaction_alloc() iterating a 128GB zone has been benchmarked to take
over 400ms on some systems whereas any free page isolated and ready to
be split ends up failing in split_free_page() because of the low
watermark check and thus the iteration continues.
The next time compaction occurs, the freeing scanner will likely start
at the end of the zone again since no success was made previously and we
get the same lengthy iteration until the zone is brought above the low
watermark. All thp page faults can take >400ms in such a state without
this fix.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606211820350.97086@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* msm-4.4/tmp-510d0a3f:
Linux 4.4.11
nf_conntrack: avoid kernel pointer value leak in slab name
drm/radeon: fix DP link training issue with second 4K monitor
drm/i915/bdw: Add missing delay during L3 SQC credit programming
drm/i915: Bail out of pipe config compute loop on LPT
drm/radeon: fix PLL sharing on DCE6.1 (v2)
Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing"
Input: max8997-haptic - fix NULL pointer dereference
get_rock_ridge_filename(): handle malformed NM entries
tools lib traceevent: Do not reassign parg after collapse_tree()
qla1280: Don't allocate 512kb of host tags
atomic_open(): fix the handling of create_error
regulator: axp20x: Fix axp22x ldo_io voltage ranges
regulator: s2mps11: Fix invalid selector mask and voltages for buck9
workqueue: fix rebind bound workers warning
ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
vfs: rename: check backing inode being equal
vfs: add vfs_select_inode() helper
perf/core: Disable the event on a truncated AUX record
regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case
pinctrl: at91-pio4: fix pull-up/down logic
spi: spi-ti-qspi: Handle truncated frames properly
spi: spi-ti-qspi: Fix FLEN and WLEN settings if bits_per_word is overridden
spi: pxa2xx: Do not detect number of enabled chip selects on Intel SPT
ALSA: hda - Fix broken reconfig
ALSA: hda - Fix white noise on Asus UX501VW headset
ALSA: hda - Fix subwoofer pin on ASUS N751 and N551
ALSA: usb-audio: Yet another Phoneix Audio device quirk
ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)
crypto: testmgr - Use kmalloc memory for RSA input
crypto: hash - Fix page length clamping in hash walk
crypto: qat - fix invalid pf2vf_resp_wq logic
s390/mm: fix asce_bits handling with dynamic pagetable levels
zsmalloc: fix zs_can_compact() integer overflow
ocfs2: fix posix_acl_create deadlock
ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
net/route: enforce hoplimit max value
tcp: refresh skb timestamp at retransmit time
net: thunderx: avoid exposing kernel stack
net: fix a kernel infoleak in x25 module
uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h MIME-Version: 1.0
bridge: fix igmp / mld query parsing
net: bridge: fix old ioctl unlocked net device walk
VSOCK: do not disconnect socket when peer has shutdown SEND only
net/mlx4_en: Fix endianness bug in IPV6 csum calculation
net: fix infoleak in rtnetlink
net: fix infoleak in llc
net: fec: only clear a queue's work bit if the queue was emptied
netem: Segment GSO packets on enqueue
sch_dsmark: update backlog as well
sch_htb: update backlog as well
net_sched: update hierarchical backlog too
net_sched: introduce qdisc_replace() helper
gre: do not pull header in ICMP error processing
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
samples/bpf: fix trace_output example
bpf: fix check_map_func_compatibility logic
bpf: fix refcnt overflow
bpf: fix double-fdput in replace_map_fd_with_map_ptr()
net/mlx4_en: fix spurious timestamping callbacks
ipv4/fib: don't warn when primary address is missing if in_dev is dead
net/mlx5e: Fix minimum MTU
net/mlx5e: Device's mtu field is u16 and not int
openvswitch: use flow protocol when recalculating ipv6 checksums
atl2: Disable unimplemented scatter/gather feature
vlan: pull on __vlan_insert_tag error path and fix csum correction
net: use skb_postpush_rcsum instead of own implementations
cdc_mbim: apply "NDP to end" quirk to all Huawei devices
bpf/verifier: reject invalid LD_ABS | BPF_DW instruction
net: sched: do not requeue a NULL skb
packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface
route: do not cache fib route info on local routes with oif
decnet: Do not build routes to devices without decnet private data.
parisc: Use generic extable search and sort routines
arm64: kasan: Use actual memory node when populating the kernel image shadow
arm64: mm: treat memstart_addr as a signed quantity
arm64: lse: deal with clobbered IP registers after branch via PLT
arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
arm64: consistently use p?d_set_huge
arm64: fix KASLR boot-time I-cache maintenance
arm64: hugetlb: partial revert of 66b3923a1a0f
arm64: make irq_stack_ptr more robust
arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
efi: stub: use high allocation for converted command line
efi: stub: add implementation of efi_random_alloc()
efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
arm64: kaslr: randomize the linear region
arm64: add support for kernel ASLR
arm64: add support for building vmlinux as a relocatable PIE binary
arm64: switch to relative exception tables
extable: add support for relative extables to search and sort routines
scripts/sortextable: add support for ET_DYN binaries
arm64: futex.h: Add missing PAN toggling
arm64: make asm/elf.h available to asm files
arm64: avoid dynamic relocations in early boot code
arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
arm64: add support for module PLTs
arm64: move brk immediate argument definitions to separate header
arm64: mm: use bit ops rather than arithmetic in pa/va translations
arm64: mm: only perform memstart_addr sanity check if DEBUG_VM
arm64: User die() instead of panic() in do_page_fault()
arm64: allow kernel Image to be loaded anywhere in physical memory
arm64: defer __va translation of initrd_start and initrd_end
arm64: move kernel image to base of vmalloc area
arm64: kvm: deal with kernel symbols outside of linear mapping
arm64: decouple early fixmap init from linear mapping
arm64: pgtable: implement static [pte|pmd|pud]_offset variants
arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
arm64: add support for ioremap() block mappings
arm64: prevent potential circular header dependencies in asm/bug.h
of/fdt: factor out assignment of initrd_start/initrd_end
of/fdt: make memblock minimum physical address arch configurable
arm64: Remove the get_thread_info() function
arm64: kernel: Don't toggle PAN on systems with UAO
arm64: cpufeature: Test 'matches' pointer to find the end of the list
arm64: kernel: Add support for User Access Override
arm64: add ARMv8.2 id_aa64mmfr2 boiler plate
arm64: cpufeature: Change read_cpuid() to use sysreg's mrs_s macro
arm64: use local label prefixes for __reg_num symbols
arm64: vdso: Mark vDSO code as read-only
arm64: ubsan: select ARCH_HAS_UBSAN_SANITIZE_ALL
arm64: ptdump: Indicate whether memory should be faulting
arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
arm64: Drop alloc function from create_mapping
arm64: prefetch: add missing #include for spin_lock_prefetch
arm64: lib: patch in prfm for copy_page if requested
arm64: lib: improve copy_page to deal with 128 bytes at a time
arm64: prefetch: add alternative pattern for CPUs without a prefetcher
arm64: prefetch: don't provide spin_lock_prefetch with LSE
arm64: allow vmalloc regions to be set with set_memory_*
arm64: kernel: implement ACPI parking protocol
arm64: mm: create new fine-grained mappings at boot
arm64: ensure _stext and _etext are page-aligned
arm64: mm: allow passing a pgdir to alloc_init_*
arm64: mm: allocate pagetables anywhere
arm64: mm: use fixmap when creating page tables
arm64: mm: add functions to walk tables in fixmap
arm64: mm: add __{pud,pgd}_populate
arm64: mm: avoid redundant __pa(__va(x))
arm64: mm: add functions to walk page tables by PA
arm64: mm: move pte_* macros
arm64: kasan: avoid TLB conflicts
arm64: mm: add code to safely replace TTBR1_EL1
arm64: add function to install the idmap
arm64: unmap idmap earlier
arm64: unify idmap removal
arm64: mm: place empty_zero_page in bss
arm64: mm: specialise pagetable allocators
asm-generic: Fix local variable shadow in __set_fixmap_offset
Eliminate the .eh_frame sections from the aarch64 vmlinux and kernel modules
arm64: Fix an enum typo in mm/dump.c
arm64: kasan: ensure that the KASAN zero page is mapped read-only
arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP
arm64: hide __efistub_ aliases from kallsyms
Linux 4.4.10
drm/i915/skl: Fix DMC load on Skylake J0 and K0
lib/test-string_helpers.c: fix and improve string_get_size() tests
ACPI / processor: Request native thermal interrupt handling via _OSC
drm/i915: Fake HDMI live status
drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
drm/i915: Fix eDP low vswing for Broadwell
drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume
drm/radeon: make sure vertical front porch is at least 1
iio: ak8975: fix maybe-uninitialized warning
iio: ak8975: Fix NULL pointer exception on early interrupt
drm/amdgpu: set metadata pointer to NULL after freeing.
drm/amdgpu: make sure vertical front porch is at least 1
gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading
nvmem: mxs-ocotp: fix buffer overflow in read
USB: serial: cp210x: add Straizona Focusers device ids
USB: serial: cp210x: add ID for Link ECU
ata: ahci-platform: Add ports-implemented DT bindings.
libahci: save port map for forced port map
powerpc: Fix bad inline asm constraint in create_zero_mask()
ACPICA: Dispatcher: Update thread ID for recursive method calls
x86/sysfb_efi: Fix valid BAR address range check
ARC: Add missing io barriers to io{read,write}{16,32}be()
ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
propogate_mnt: Handle the first propogated copy being a slave
fs/pnode.c: treat zero mnt_group_id-s as unequal
x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
MAINTAINERS: Remove asterisk from EFI directory names
writeback: Fix performance regression in wb_over_bg_thresh()
batman-adv: Reduce refcnt of removed router when updating route
batman-adv: Fix broadcast/ogm queue limit on a removed interface
batman-adv: Check skb size before using encapsulated ETH+VLAN header
batman-adv: fix DAT candidate selection (must use vid)
mm: update min_free_kbytes from khugepaged after core initialization
proc: prevent accessing /proc/<PID>/environ until it's ready
Input: zforce_ts - fix dual touch recognition
HID: Fix boot delay for Creative SB Omni Surround 5.1 with quirk
HID: wacom: Add support for DTK-1651
xen/evtchn: fix ring resize when binding new events
xen/balloon: Fix crash when ballooning on x86 32 bit PAE
xen: Fix page <-> pfn conversion on 32 bit systems
ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
mm/zswap: provide unique zpool name
mm, cma: prevent nr_isolated_* counters from going negative
Minimal fix-up of bad hashing behavior of hash_64()
MD: make bio mergeable
tracing: Don't display trigger file for events that can't be enabled
mac80211: fix statistics leak if dev_alloc_name() fails
ath9k: ar5008_hw_cmn_spur_mitigate: add missing mask_m & mask_p initialisation
lpfc: fix misleading indentation
clk: qcom: msm8960: Fix ce3_src register offset
clk: versatile: sp810: support reentrance
clk: qcom: msm8960: fix ce3_core clk enable register
clk: meson: Fix meson_clk_register_clks() signature type mismatch
clk: rockchip: free memory in error cases when registering clock branches
soc: rockchip: power-domain: fix err handle while probing
clk-divider: make sure read-only dividers do not write to their register
CNS3xxx: Fix PCI cns3xxx_write_config()
mwifiex: fix corner case association failure
ata: ahci_xgene: dereferencing uninitialized pointer in probe
nbd: ratelimit error msgs after socket close
mfd: intel-lpss: Remove clock tree on error path
ipvs: drop first packet to redirect conntrack
ipvs: correct initial offset of Call-ID header search in SIP persistence engine
ipvs: handle ip_vs_fill_iph_skb_off failure
RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
Revert: "powerpc/tm: Check for already reclaimed tasks"
arm64: head.S: use memset to clear BSS
efi: stub: define DISABLE_BRANCH_PROFILING for all architectures
arm64: entry: remove pointless SPSR mode check
arm64: mm: move pgd_cache initialisation to pgtable_cache_init
arm64: module: avoid undefined shift behavior in reloc_data()
arm64: module: fix relocation of movz instruction with negative immediate
arm64: traps: address fallout from printk -> pr_* conversion
arm64: ftrace: fix a stack tracer's output under function graph tracer
arm64: pass a task parameter to unwind_frame()
arm64: ftrace: modify a stack frame in a safe way
arm64: remove irq_count and do_softirq_own_stack()
arm64: hugetlb: add support for PTE contiguous bit
arm64: Use PoU cache instr for I/D coherency
arm64: Defer dcache flush in __cpu_copy_user_page
arm64: reduce stack use in irq_handler
arm64: Documentation: add list of software workarounds for errata
arm64: mm: place __cpu_setup in .text
arm64: cmpxchg: Don't incldue linux/mmdebug.h
arm64: mm: fold alternatives into .init
arm64: Remove redundant padding from linker script
arm64: mm: remove pointless PAGE_MASKing
arm64: don't call C code with el0's fp register
arm64: when walking onto the task stack, check sp & fp are in current->stack
arm64: Add this_cpu_ptr() assembler macro for use in entry.S
arm64: irq: fix walking from irq stack to task stack
arm64: Add do_softirq_own_stack() and enable irq_stacks
arm64: Modify stack trace and dump for use with irq_stack
arm64: Store struct thread_info in sp_el0
arm64: Add trace_hardirqs_off annotation in ret_to_user
arm64: ftrace: fix the comments for ftrace_modify_code
arm64: ftrace: stop using kstop_machine to enable/disable tracing
arm64: spinlock: serialise spin_unlock_wait against concurrent lockers
arm64: enable HAVE_IRQ_TIME_ACCOUNTING
arm64: fix COMPAT_SHMLBA definition for large pages
arm64: add __init/__initdata section marker to some functions/variables
arm64: pgtable: implement pte_accessible()
arm64: mm: allow sections for unaligned bases
arm64: mm: detect bad __create_mapping uses
Linux 4.4.9
extcon: max77843: Use correct size for reading the interrupt register
stm class: Select CONFIG_SRCU
megaraid_sas: add missing curly braces in ioctl handler
sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
thermal: rockchip: fix a impossible condition caused by the warning
unbreak allmodconfig KCONFIG_ALLCONFIG=...
jme: Fix device PM wakeup API usage
jme: Do not enable NIC WoL functions on S0
bus: imx-weim: Take the 'status' property value into account
ARM: dts: pxa: fix dma engine node to pxa3xx-nand
ARM: dts: armada-375: use armada-370-sata for SATA
ARM: EXYNOS: select THERMAL_OF
ARM: prima2: always enable reset controller
ARM: OMAP3: Add cpuidle parameters table for omap3430
ext4: fix races of writeback with punch hole and zero range
ext4: fix races between buffered IO and collapse / insert range
ext4: move unlocked dio protection from ext4_alloc_file_blocks()
ext4: fix races between page faults and hole punching
perf stat: Document --detailed option
perf tools: handle spaces in file names obtained from /proc/pid/maps
perf hists browser: Only offer symbol scripting when a symbol is under the cursor
mtd: nand: Drop mtd.owner requirement in nand_scan
mtd: brcmnand: Fix v7.1 register offsets
mtd: spi-nor: remove micron_quad_enable()
serial: sh-sci: Remove cpufreq notifier to fix crash/deadlock
ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
x86/mm/kmmio: Fix mmiotrace for hugepages
perf evlist: Reference count the cpu and thread maps at set_maps()
drivers/misc/ad525x_dpot: AD5274 fix RDAC read back errors
rtc: max77686: Properly handle regmap_irq_get_virq() error code
rtc: rx8025: remove rv8803 id
rtc: ds1685: passing bogus values to irq_restore
rtc: vr41xx: Wire up alarm_irq_enable
rtc: hym8563: fix invalid year calculation
PM / Domains: Fix removal of a subdomain
PM / OPP: Initialize u_volt_min/max to a valid value
misc: mic/scif: fix wrap around tests
misc/bmp085: Enable building as a module
lib/mpi: Endianness fix
fbdev: da8xx-fb: fix videomodes of lcd panels
scsi_dh: force modular build if SCSI is a module
paride: make 'verbose' parameter an 'int' again
regulator: s5m8767: fix get_register() error handling
irqchip/mxs: Fix error check of of_io_request_and_map()
irqchip/sunxi-nmi: Fix error check of of_io_request_and_map()
spi/rockchip: Make sure spi clk is on in rockchip_spi_set_cs
locking/mcs: Fix mcs_spin_lock() ordering
regulator: core: Fix nested locking of supplies
regulator: core: Ensure we lock all regulators
regulator: core: fix regulator_lock_supply regression
Revert "regulator: core: Fix nested locking of supplies"
videobuf2-v4l2: Verify planes array in buffer dequeueing
videobuf2-core: Check user space planes array in dqbuf
USB: usbip: fix potential out-of-bounds write
cgroup: make sure a parent css isn't freed before its children
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc/<pid>/numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
memcg: relocate charge moving from ->attach to ->post_attach
cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
slub: clean up code for kmem cgroup support to kmem_cache_free_bulk
workqueue: fix ghost PENDING flag while doing MQ IO
x86/apic: Handle zero vector gracefully in clear_vector_irq()
efi: Expose non-blocking set_variable() wrapper to efivars
efi: Fix out-of-bounds read in variable_matches()
IB/security: Restrict use of the write() interface
IB/mlx5: Expose correct max_sge_rd limit
cxl: Keep IRQ mappings on context teardown
v4l2-dv-timings.h: fix polarity for 4k formats
vb2-memops: Fix over allocation of frame vectors
ASoC: rt5640: Correct the digital interface data select
ASoC: dapm: Make sure we have a card when displaying component widgets
ASoC: ssm4567: Reset device before regcache_sync()
ASoC: s3c24xx: use const snd_soc_component_driver pointer
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
toshiba_acpi: Fix regression caused by hotkey enabling value
i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
i2c: cpm: Fix build break due to incompatible pointer types
perf intel-pt: Fix segfault tracing transactions
drm/i915: Use fw_domains_put_with_fifo() on HSW
drm/i915: Fixup the free space logic in ring_prepare
drm/amdkfd: uninitialized variable in dbgdev_wave_control_set_registers()
drm/i915: skl_update_scaler() wants a rotation bitmask instead of bit number
drm/i915: Cleanup phys status page too
pwm: brcmstb: Fix check of devm_ioremap_resource() return code
drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()
drm/dp/mst: Restore primary hub guid on resume
drm/dp/mst: Validate port in drm_dp_payload_send_msg()
drm/nouveau/gr/gf100: select a stream master to fixup tfb offset queries
drm: Loongson-3 doesn't fully support wc memory
drm/radeon: fix vertical bars appear on monitor (v2)
drm/radeon: forbid mapping of userptr bo through radeon device file
drm/radeon: fix initial connector audio value
drm/radeon: add a quirk for a XFX R9 270X
drm/amdgpu: fix regression on CIK (v2)
amdgpu/uvd: add uvd fw version for amdgpu
drm/amdgpu: bump the afmt limit for CZ, ST, Polaris
drm/amdgpu: use defines for CRTCs and AMFT blocks
drm/amdgpu: when suspending, if uvd/vce was running. need to cancel delay work.
iommu/dma: Restore scatterlist offsets correctly
iommu/amd: Fix checking of pci dma aliases
pinctrl: single: Fix pcs_parse_bits_in_pinctrl_entry to use __ffs than ffs
pinctrl: mediatek: correct debounce time unit in mtk_gpio_set_debounce
xen kconfig: don't "select INPUT_XEN_KBDDEV_FRONTEND"
Input: pmic8xxx-pwrkey - fix algorithm for converting trigger delay
Input: gtco - fix crash on detecting device without endpoints
netlink: don't send NETLINK_URELEASE for unbound sockets
nl80211: check netlink protocol in socket release notification
powerpc: Update TM user feature bits in scan_features()
powerpc: Update cpu_user_features2 in scan_features()
powerpc: scan_features() updates incorrect bits for REAL_LE
crypto: talitos - fix AEAD tcrypt tests
crypto: talitos - fix crash in talitos_cra_init()
crypto: sha1-mb - use corrcet pointer while completing jobs
crypto: ccp - Prevent information leakage on export
iwlwifi: mvm: fix memory leak in paging
iwlwifi: pcie: lower the debug level for RSA semaphore access
s390/pci: add extra padding to function measurement block
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
Revert "drm/radeon: disable runtime pm on PX laptops without dGPU power control"
drm/i915: Fix race condition in intel_dp_destroy_mst_connector()
drm/qxl: fix cursor position with non-zero hotspot
drm/nouveau/core: use vzalloc for allocating ramht
futex: Acknowledge a new waiter in counter before plist
futex: Handle unlock_pi race gracefully
asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic()
ALSA: hda - Add dock support for ThinkPad X260
ALSA: pcxhr: Fix missing mutex unlock
ALSA: hda - add PCI ID for Intel Broxton-T
ALSA: hda - Keep powering up ADCs on Cirrus codecs
ALSA: hda/realtek - Add ALC3234 headset mode for Optiplex 9020m
ALSA: hda - Don't trust the reported actual power state
x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address
x86/mm/xen: Suppress hugetlbfs in PV guests
arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings
sched/cgroup: Fix/cleanup cgroup teardown/init
dmaengine: pxa_dma: fix the maximum requestor line
dmaengine: hsu: correct use of channel status register
dmaengine: dw: fix master selection
debugfs: Make automount point inodes permanently empty
lib: lz4: fixed zram with lz4 on big endian machines
dm cache metadata: fix cmd_read_lock() acquiring write lock
dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros
usb: gadget: f_fs: Fix use-after-free
usb: hcd: out of bounds access in for_each_companion
xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers
usb: xhci: fix wild pointers in xhci_mem_cleanup
xhci: resume USB 3 roothub first
usb: xhci: applying XHCI_PME_STUCK_QUIRK to Intel BXT B0 host
assoc_array: don't call compare_object() on a node
ARM: OMAP2+: hwmod: Fix updating of sysconfig register
ARM: OMAP2: Fix up interconnect barrier initialization for DRA7
ARM: mvebu: Correct unit address for linksys
ARM: dts: AM43x-epos: Fix clk parent for synctimer
KVM: arm/arm64: Handle forward time correction gracefully
kvm: x86: do not leak guest xcr0 into host interrupt handlers
x86/mce: Avoid using object after free in genpool
block: loop: fix filesystem corruption in case of aio/dio
block: partition: initialize percpuref before sending out KOBJ_ADD
Conflicts:
arch/arm64/Kconfig
arch/arm64/include/asm/cputype.h
arch/arm64/include/asm/hardirq.h
arch/arm64/include/asm/irq.h
arch/arm64/include/asm/mmu_context.h
arch/arm64/kernel/cpu_errata.c
arch/arm64/kernel/cpuinfo.c
arch/arm64/kernel/setup.c
arch/arm64/kernel/smp.c
arch/arm64/kernel/stacktrace.c
arch/arm64/mm/init.c
arch/arm64/mm/mmu.c
arch/arm64/mm/pageattr.c
mm/memcontrol.c
CRs-Fixed: 1069136
Signed-off-by: Bryan Huntsman <bryanh@codeaurora.org>
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Change-Id: Ie9a16debd0578331a66947376f3b787a7bb54d65
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This reverts commit 9d6fd2c3e9fcfb ("Merge remote-tracking branch
'msm-4.4/tmp-510d0a3f' into msm-4.4"), because it breaks the
dump parsing tools due to kernel can be loaded anywhere in the memory
now and not fixed at linear mapping.
Change-Id: Id416f0a249d803442847d09ac47781147b0d0ee6
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* msm-4.4/tmp-510d0a3f:
Linux 4.4.11
nf_conntrack: avoid kernel pointer value leak in slab name
drm/radeon: fix DP link training issue with second 4K monitor
drm/i915/bdw: Add missing delay during L3 SQC credit programming
drm/i915: Bail out of pipe config compute loop on LPT
drm/radeon: fix PLL sharing on DCE6.1 (v2)
Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing"
Input: max8997-haptic - fix NULL pointer dereference
get_rock_ridge_filename(): handle malformed NM entries
tools lib traceevent: Do not reassign parg after collapse_tree()
qla1280: Don't allocate 512kb of host tags
atomic_open(): fix the handling of create_error
regulator: axp20x: Fix axp22x ldo_io voltage ranges
regulator: s2mps11: Fix invalid selector mask and voltages for buck9
workqueue: fix rebind bound workers warning
ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
vfs: rename: check backing inode being equal
vfs: add vfs_select_inode() helper
perf/core: Disable the event on a truncated AUX record
regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case
pinctrl: at91-pio4: fix pull-up/down logic
spi: spi-ti-qspi: Handle truncated frames properly
spi: spi-ti-qspi: Fix FLEN and WLEN settings if bits_per_word is overridden
spi: pxa2xx: Do not detect number of enabled chip selects on Intel SPT
ALSA: hda - Fix broken reconfig
ALSA: hda - Fix white noise on Asus UX501VW headset
ALSA: hda - Fix subwoofer pin on ASUS N751 and N551
ALSA: usb-audio: Yet another Phoneix Audio device quirk
ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)
crypto: testmgr - Use kmalloc memory for RSA input
crypto: hash - Fix page length clamping in hash walk
crypto: qat - fix invalid pf2vf_resp_wq logic
s390/mm: fix asce_bits handling with dynamic pagetable levels
zsmalloc: fix zs_can_compact() integer overflow
ocfs2: fix posix_acl_create deadlock
ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
net/route: enforce hoplimit max value
tcp: refresh skb timestamp at retransmit time
net: thunderx: avoid exposing kernel stack
net: fix a kernel infoleak in x25 module
uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h MIME-Version: 1.0
bridge: fix igmp / mld query parsing
net: bridge: fix old ioctl unlocked net device walk
VSOCK: do not disconnect socket when peer has shutdown SEND only
net/mlx4_en: Fix endianness bug in IPV6 csum calculation
net: fix infoleak in rtnetlink
net: fix infoleak in llc
net: fec: only clear a queue's work bit if the queue was emptied
netem: Segment GSO packets on enqueue
sch_dsmark: update backlog as well
sch_htb: update backlog as well
net_sched: update hierarchical backlog too
net_sched: introduce qdisc_replace() helper
gre: do not pull header in ICMP error processing
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
samples/bpf: fix trace_output example
bpf: fix check_map_func_compatibility logic
bpf: fix refcnt overflow
bpf: fix double-fdput in replace_map_fd_with_map_ptr()
net/mlx4_en: fix spurious timestamping callbacks
ipv4/fib: don't warn when primary address is missing if in_dev is dead
net/mlx5e: Fix minimum MTU
net/mlx5e: Device's mtu field is u16 and not int
openvswitch: use flow protocol when recalculating ipv6 checksums
atl2: Disable unimplemented scatter/gather feature
vlan: pull on __vlan_insert_tag error path and fix csum correction
net: use skb_postpush_rcsum instead of own implementations
cdc_mbim: apply "NDP to end" quirk to all Huawei devices
bpf/verifier: reject invalid LD_ABS | BPF_DW instruction
net: sched: do not requeue a NULL skb
packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface
route: do not cache fib route info on local routes with oif
decnet: Do not build routes to devices without decnet private data.
parisc: Use generic extable search and sort routines
arm64: kasan: Use actual memory node when populating the kernel image shadow
arm64: mm: treat memstart_addr as a signed quantity
arm64: lse: deal with clobbered IP registers after branch via PLT
arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
arm64: consistently use p?d_set_huge
arm64: fix KASLR boot-time I-cache maintenance
arm64: hugetlb: partial revert of 66b3923a1a0f
arm64: make irq_stack_ptr more robust
arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
efi: stub: use high allocation for converted command line
efi: stub: add implementation of efi_random_alloc()
efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
arm64: kaslr: randomize the linear region
arm64: add support for kernel ASLR
arm64: add support for building vmlinux as a relocatable PIE binary
arm64: switch to relative exception tables
extable: add support for relative extables to search and sort routines
scripts/sortextable: add support for ET_DYN binaries
arm64: futex.h: Add missing PAN toggling
arm64: make asm/elf.h available to asm files
arm64: avoid dynamic relocations in early boot code
arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
arm64: add support for module PLTs
arm64: move brk immediate argument definitions to separate header
arm64: mm: use bit ops rather than arithmetic in pa/va translations
arm64: mm: only perform memstart_addr sanity check if DEBUG_VM
arm64: User die() instead of panic() in do_page_fault()
arm64: allow kernel Image to be loaded anywhere in physical memory
arm64: defer __va translation of initrd_start and initrd_end
arm64: move kernel image to base of vmalloc area
arm64: kvm: deal with kernel symbols outside of linear mapping
arm64: decouple early fixmap init from linear mapping
arm64: pgtable: implement static [pte|pmd|pud]_offset variants
arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
arm64: add support for ioremap() block mappings
arm64: prevent potential circular header dependencies in asm/bug.h
of/fdt: factor out assignment of initrd_start/initrd_end
of/fdt: make memblock minimum physical address arch configurable
arm64: Remove the get_thread_info() function
arm64: kernel: Don't toggle PAN on systems with UAO
arm64: cpufeature: Test 'matches' pointer to find the end of the list
arm64: kernel: Add support for User Access Override
arm64: add ARMv8.2 id_aa64mmfr2 boiler plate
arm64: cpufeature: Change read_cpuid() to use sysreg's mrs_s macro
arm64: use local label prefixes for __reg_num symbols
arm64: vdso: Mark vDSO code as read-only
arm64: ubsan: select ARCH_HAS_UBSAN_SANITIZE_ALL
arm64: ptdump: Indicate whether memory should be faulting
arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
arm64: Drop alloc function from create_mapping
arm64: prefetch: add missing #include for spin_lock_prefetch
arm64: lib: patch in prfm for copy_page if requested
arm64: lib: improve copy_page to deal with 128 bytes at a time
arm64: prefetch: add alternative pattern for CPUs without a prefetcher
arm64: prefetch: don't provide spin_lock_prefetch with LSE
arm64: allow vmalloc regions to be set with set_memory_*
arm64: kernel: implement ACPI parking protocol
arm64: mm: create new fine-grained mappings at boot
arm64: ensure _stext and _etext are page-aligned
arm64: mm: allow passing a pgdir to alloc_init_*
arm64: mm: allocate pagetables anywhere
arm64: mm: use fixmap when creating page tables
arm64: mm: add functions to walk tables in fixmap
arm64: mm: add __{pud,pgd}_populate
arm64: mm: avoid redundant __pa(__va(x))
arm64: mm: add functions to walk page tables by PA
arm64: mm: move pte_* macros
arm64: kasan: avoid TLB conflicts
arm64: mm: add code to safely replace TTBR1_EL1
arm64: add function to install the idmap
arm64: unmap idmap earlier
arm64: unify idmap removal
arm64: mm: place empty_zero_page in bss
arm64: mm: specialise pagetable allocators
asm-generic: Fix local variable shadow in __set_fixmap_offset
Eliminate the .eh_frame sections from the aarch64 vmlinux and kernel modules
arm64: Fix an enum typo in mm/dump.c
arm64: kasan: ensure that the KASAN zero page is mapped read-only
arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP
arm64: hide __efistub_ aliases from kallsyms
Linux 4.4.10
drm/i915/skl: Fix DMC load on Skylake J0 and K0
lib/test-string_helpers.c: fix and improve string_get_size() tests
ACPI / processor: Request native thermal interrupt handling via _OSC
drm/i915: Fake HDMI live status
drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
drm/i915: Fix eDP low vswing for Broadwell
drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume
drm/radeon: make sure vertical front porch is at least 1
iio: ak8975: fix maybe-uninitialized warning
iio: ak8975: Fix NULL pointer exception on early interrupt
drm/amdgpu: set metadata pointer to NULL after freeing.
drm/amdgpu: make sure vertical front porch is at least 1
gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading
nvmem: mxs-ocotp: fix buffer overflow in read
USB: serial: cp210x: add Straizona Focusers device ids
USB: serial: cp210x: add ID for Link ECU
ata: ahci-platform: Add ports-implemented DT bindings.
libahci: save port map for forced port map
powerpc: Fix bad inline asm constraint in create_zero_mask()
ACPICA: Dispatcher: Update thread ID for recursive method calls
x86/sysfb_efi: Fix valid BAR address range check
ARC: Add missing io barriers to io{read,write}{16,32}be()
ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
propogate_mnt: Handle the first propogated copy being a slave
fs/pnode.c: treat zero mnt_group_id-s as unequal
x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
MAINTAINERS: Remove asterisk from EFI directory names
writeback: Fix performance regression in wb_over_bg_thresh()
batman-adv: Reduce refcnt of removed router when updating route
batman-adv: Fix broadcast/ogm queue limit on a removed interface
batman-adv: Check skb size before using encapsulated ETH+VLAN header
batman-adv: fix DAT candidate selection (must use vid)
mm: update min_free_kbytes from khugepaged after core initialization
proc: prevent accessing /proc/<PID>/environ until it's ready
Input: zforce_ts - fix dual touch recognition
HID: Fix boot delay for Creative SB Omni Surround 5.1 with quirk
HID: wacom: Add support for DTK-1651
xen/evtchn: fix ring resize when binding new events
xen/balloon: Fix crash when ballooning on x86 32 bit PAE
xen: Fix page <-> pfn conversion on 32 bit systems
ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
mm/zswap: provide unique zpool name
mm, cma: prevent nr_isolated_* counters from going negative
Minimal fix-up of bad hashing behavior of hash_64()
MD: make bio mergeable
tracing: Don't display trigger file for events that can't be enabled
mac80211: fix statistics leak if dev_alloc_name() fails
ath9k: ar5008_hw_cmn_spur_mitigate: add missing mask_m & mask_p initialisation
lpfc: fix misleading indentation
clk: qcom: msm8960: Fix ce3_src register offset
clk: versatile: sp810: support reentrance
clk: qcom: msm8960: fix ce3_core clk enable register
clk: meson: Fix meson_clk_register_clks() signature type mismatch
clk: rockchip: free memory in error cases when registering clock branches
soc: rockchip: power-domain: fix err handle while probing
clk-divider: make sure read-only dividers do not write to their register
CNS3xxx: Fix PCI cns3xxx_write_config()
mwifiex: fix corner case association failure
ata: ahci_xgene: dereferencing uninitialized pointer in probe
nbd: ratelimit error msgs after socket close
mfd: intel-lpss: Remove clock tree on error path
ipvs: drop first packet to redirect conntrack
ipvs: correct initial offset of Call-ID header search in SIP persistence engine
ipvs: handle ip_vs_fill_iph_skb_off failure
RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
Revert: "powerpc/tm: Check for already reclaimed tasks"
arm64: head.S: use memset to clear BSS
efi: stub: define DISABLE_BRANCH_PROFILING for all architectures
arm64: entry: remove pointless SPSR mode check
arm64: mm: move pgd_cache initialisation to pgtable_cache_init
arm64: module: avoid undefined shift behavior in reloc_data()
arm64: module: fix relocation of movz instruction with negative immediate
arm64: traps: address fallout from printk -> pr_* conversion
arm64: ftrace: fix a stack tracer's output under function graph tracer
arm64: pass a task parameter to unwind_frame()
arm64: ftrace: modify a stack frame in a safe way
arm64: remove irq_count and do_softirq_own_stack()
arm64: hugetlb: add support for PTE contiguous bit
arm64: Use PoU cache instr for I/D coherency
arm64: Defer dcache flush in __cpu_copy_user_page
arm64: reduce stack use in irq_handler
arm64: Documentation: add list of software workarounds for errata
arm64: mm: place __cpu_setup in .text
arm64: cmpxchg: Don't incldue linux/mmdebug.h
arm64: mm: fold alternatives into .init
arm64: Remove redundant padding from linker script
arm64: mm: remove pointless PAGE_MASKing
arm64: don't call C code with el0's fp register
arm64: when walking onto the task stack, check sp & fp are in current->stack
arm64: Add this_cpu_ptr() assembler macro for use in entry.S
arm64: irq: fix walking from irq stack to task stack
arm64: Add do_softirq_own_stack() and enable irq_stacks
arm64: Modify stack trace and dump for use with irq_stack
arm64: Store struct thread_info in sp_el0
arm64: Add trace_hardirqs_off annotation in ret_to_user
arm64: ftrace: fix the comments for ftrace_modify_code
arm64: ftrace: stop using kstop_machine to enable/disable tracing
arm64: spinlock: serialise spin_unlock_wait against concurrent lockers
arm64: enable HAVE_IRQ_TIME_ACCOUNTING
arm64: fix COMPAT_SHMLBA definition for large pages
arm64: add __init/__initdata section marker to some functions/variables
arm64: pgtable: implement pte_accessible()
arm64: mm: allow sections for unaligned bases
arm64: mm: detect bad __create_mapping uses
Linux 4.4.9
extcon: max77843: Use correct size for reading the interrupt register
stm class: Select CONFIG_SRCU
megaraid_sas: add missing curly braces in ioctl handler
sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
thermal: rockchip: fix a impossible condition caused by the warning
unbreak allmodconfig KCONFIG_ALLCONFIG=...
jme: Fix device PM wakeup API usage
jme: Do not enable NIC WoL functions on S0
bus: imx-weim: Take the 'status' property value into account
ARM: dts: pxa: fix dma engine node to pxa3xx-nand
ARM: dts: armada-375: use armada-370-sata for SATA
ARM: EXYNOS: select THERMAL_OF
ARM: prima2: always enable reset controller
ARM: OMAP3: Add cpuidle parameters table for omap3430
ext4: fix races of writeback with punch hole and zero range
ext4: fix races between buffered IO and collapse / insert range
ext4: move unlocked dio protection from ext4_alloc_file_blocks()
ext4: fix races between page faults and hole punching
perf stat: Document --detailed option
perf tools: handle spaces in file names obtained from /proc/pid/maps
perf hists browser: Only offer symbol scripting when a symbol is under the cursor
mtd: nand: Drop mtd.owner requirement in nand_scan
mtd: brcmnand: Fix v7.1 register offsets
mtd: spi-nor: remove micron_quad_enable()
serial: sh-sci: Remove cpufreq notifier to fix crash/deadlock
ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
x86/mm/kmmio: Fix mmiotrace for hugepages
perf evlist: Reference count the cpu and thread maps at set_maps()
drivers/misc/ad525x_dpot: AD5274 fix RDAC read back errors
rtc: max77686: Properly handle regmap_irq_get_virq() error code
rtc: rx8025: remove rv8803 id
rtc: ds1685: passing bogus values to irq_restore
rtc: vr41xx: Wire up alarm_irq_enable
rtc: hym8563: fix invalid year calculation
PM / Domains: Fix removal of a subdomain
PM / OPP: Initialize u_volt_min/max to a valid value
misc: mic/scif: fix wrap around tests
misc/bmp085: Enable building as a module
lib/mpi: Endianness fix
fbdev: da8xx-fb: fix videomodes of lcd panels
scsi_dh: force modular build if SCSI is a module
paride: make 'verbose' parameter an 'int' again
regulator: s5m8767: fix get_register() error handling
irqchip/mxs: Fix error check of of_io_request_and_map()
irqchip/sunxi-nmi: Fix error check of of_io_request_and_map()
spi/rockchip: Make sure spi clk is on in rockchip_spi_set_cs
locking/mcs: Fix mcs_spin_lock() ordering
regulator: core: Fix nested locking of supplies
regulator: core: Ensure we lock all regulators
regulator: core: fix regulator_lock_supply regression
Revert "regulator: core: Fix nested locking of supplies"
videobuf2-v4l2: Verify planes array in buffer dequeueing
videobuf2-core: Check user space planes array in dqbuf
USB: usbip: fix potential out-of-bounds write
cgroup: make sure a parent css isn't freed before its children
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc/<pid>/numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
memcg: relocate charge moving from ->attach to ->post_attach
cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
slub: clean up code for kmem cgroup support to kmem_cache_free_bulk
workqueue: fix ghost PENDING flag while doing MQ IO
x86/apic: Handle zero vector gracefully in clear_vector_irq()
efi: Expose non-blocking set_variable() wrapper to efivars
efi: Fix out-of-bounds read in variable_matches()
IB/security: Restrict use of the write() interface
IB/mlx5: Expose correct max_sge_rd limit
cxl: Keep IRQ mappings on context teardown
v4l2-dv-timings.h: fix polarity for 4k formats
vb2-memops: Fix over allocation of frame vectors
ASoC: rt5640: Correct the digital interface data select
ASoC: dapm: Make sure we have a card when displaying component widgets
ASoC: ssm4567: Reset device before regcache_sync()
ASoC: s3c24xx: use const snd_soc_component_driver pointer
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
toshiba_acpi: Fix regression caused by hotkey enabling value
i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
i2c: cpm: Fix build break due to incompatible pointer types
perf intel-pt: Fix segfault tracing transactions
drm/i915: Use fw_domains_put_with_fifo() on HSW
drm/i915: Fixup the free space logic in ring_prepare
drm/amdkfd: uninitialized variable in dbgdev_wave_control_set_registers()
drm/i915: skl_update_scaler() wants a rotation bitmask instead of bit number
drm/i915: Cleanup phys status page too
pwm: brcmstb: Fix check of devm_ioremap_resource() return code
drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()
drm/dp/mst: Restore primary hub guid on resume
drm/dp/mst: Validate port in drm_dp_payload_send_msg()
drm/nouveau/gr/gf100: select a stream master to fixup tfb offset queries
drm: Loongson-3 doesn't fully support wc memory
drm/radeon: fix vertical bars appear on monitor (v2)
drm/radeon: forbid mapping of userptr bo through radeon device file
drm/radeon: fix initial connector audio value
drm/radeon: add a quirk for a XFX R9 270X
drm/amdgpu: fix regression on CIK (v2)
amdgpu/uvd: add uvd fw version for amdgpu
drm/amdgpu: bump the afmt limit for CZ, ST, Polaris
drm/amdgpu: use defines for CRTCs and AMFT blocks
drm/amdgpu: when suspending, if uvd/vce was running. need to cancel delay work.
iommu/dma: Restore scatterlist offsets correctly
iommu/amd: Fix checking of pci dma aliases
pinctrl: single: Fix pcs_parse_bits_in_pinctrl_entry to use __ffs than ffs
pinctrl: mediatek: correct debounce time unit in mtk_gpio_set_debounce
xen kconfig: don't "select INPUT_XEN_KBDDEV_FRONTEND"
Input: pmic8xxx-pwrkey - fix algorithm for converting trigger delay
Input: gtco - fix crash on detecting device without endpoints
netlink: don't send NETLINK_URELEASE for unbound sockets
nl80211: check netlink protocol in socket release notification
powerpc: Update TM user feature bits in scan_features()
powerpc: Update cpu_user_features2 in scan_features()
powerpc: scan_features() updates incorrect bits for REAL_LE
crypto: talitos - fix AEAD tcrypt tests
crypto: talitos - fix crash in talitos_cra_init()
crypto: sha1-mb - use corrcet pointer while completing jobs
crypto: ccp - Prevent information leakage on export
iwlwifi: mvm: fix memory leak in paging
iwlwifi: pcie: lower the debug level for RSA semaphore access
s390/pci: add extra padding to function measurement block
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
Revert "drm/radeon: disable runtime pm on PX laptops without dGPU power control"
drm/i915: Fix race condition in intel_dp_destroy_mst_connector()
drm/qxl: fix cursor position with non-zero hotspot
drm/nouveau/core: use vzalloc for allocating ramht
futex: Acknowledge a new waiter in counter before plist
futex: Handle unlock_pi race gracefully
asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic()
ALSA: hda - Add dock support for ThinkPad X260
ALSA: pcxhr: Fix missing mutex unlock
ALSA: hda - add PCI ID for Intel Broxton-T
ALSA: hda - Keep powering up ADCs on Cirrus codecs
ALSA: hda/realtek - Add ALC3234 headset mode for Optiplex 9020m
ALSA: hda - Don't trust the reported actual power state
x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address
x86/mm/xen: Suppress hugetlbfs in PV guests
arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings
sched/cgroup: Fix/cleanup cgroup teardown/init
dmaengine: pxa_dma: fix the maximum requestor line
dmaengine: hsu: correct use of channel status register
dmaengine: dw: fix master selection
debugfs: Make automount point inodes permanently empty
lib: lz4: fixed zram with lz4 on big endian machines
dm cache metadata: fix cmd_read_lock() acquiring write lock
dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros
usb: gadget: f_fs: Fix use-after-free
usb: hcd: out of bounds access in for_each_companion
xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers
usb: xhci: fix wild pointers in xhci_mem_cleanup
xhci: resume USB 3 roothub first
usb: xhci: applying XHCI_PME_STUCK_QUIRK to Intel BXT B0 host
assoc_array: don't call compare_object() on a node
ARM: OMAP2+: hwmod: Fix updating of sysconfig register
ARM: OMAP2: Fix up interconnect barrier initialization for DRA7
ARM: mvebu: Correct unit address for linksys
ARM: dts: AM43x-epos: Fix clk parent for synctimer
KVM: arm/arm64: Handle forward time correction gracefully
kvm: x86: do not leak guest xcr0 into host interrupt handlers
x86/mce: Avoid using object after free in genpool
block: loop: fix filesystem corruption in case of aio/dio
block: partition: initialize percpuref before sending out KOBJ_ADD
Conflicts:
arch/arm64/Kconfig
arch/arm64/include/asm/cputype.h
arch/arm64/include/asm/hardirq.h
arch/arm64/include/asm/irq.h
arch/arm64/kernel/cpu_errata.c
arch/arm64/kernel/cpuinfo.c
arch/arm64/kernel/setup.c
arch/arm64/kernel/smp.c
arch/arm64/kernel/stacktrace.c
arch/arm64/mm/init.c
arch/arm64/mm/mmu.c
arch/arm64/mm/pageattr.c
mm/memcontrol.c
CRs-Fixed: 1054234
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
Change-Id: I2a7a34631ffee36ce18b9171f16d023be777392f
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
commit 14af4a5e9b26ad251f81c174e8a43f3e179434a5 upstream.
/proc/sys/vm/stat_refresh warns nr_isolated_anon and nr_isolated_file go
increasingly negative under compaction: which would add delay when
should be none, or no delay when should delay. The bug in compaction
was due to a recent mmotm patch, but much older instance of the bug was
also noticed in isolate_migratepages_range() which is used for CMA and
gigantic hugepage allocations.
The bug is caused by putback_movable_pages() in an error path
decrementing the isolated counters without them being previously
incremented by acct_isolated(). Fix isolate_migratepages_range() by
removing the error-path putback, thus reaching acct_isolated() with
migratepages still isolated, and leaving putback to caller like most
other places do.
Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
[vbabka@suse.cz: expanded the changelog]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit "mm: vmscan: fix the page state calculation in too_many_isolated"
fixed an issue where a number of tasks were blocked in reclaim path
for seconds, because of vmstat_diff not being synced in time.
A similar problem can happen in isolate_migratepages_block, where
similar calculation is performed. This patch fixes that.
Change-Id: Ie74f108ef770da688017b515fe37faea6f384589
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When CONFIG_PAGE_POISONING is enabled, the pages are poisoned
after setting free page in KASan Shadow memory and KASan reports
the read after free warning. The same thing happens in the allocation
path. So change the order of calling KASan_alloc/free API so that
pages poisoning happens when the pages are in alloc status in KASan
shadow memory.
following is the KASan report for reference.
==================================================================
BUG: KASan: use after free in memset+0x24/0x44 at addr ffffffc000000000
Write of size 4096 by task swapper/0
page:ffffffbac5000000 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-g5a4a5d5-07242-g6938a8b-dirty #1
Hardware name: Qualcomm Technologies, Inc. MSM 8996 v2 + PMI8994 MTP (DT)
Call trace:
[<ffffffc000089ea4>] dump_backtrace+0x0/0x1c4
[<ffffffc00008a078>] show_stack+0x10/0x1c
[<ffffffc0010ecfd8>] dump_stack+0x74/0xc8
[<ffffffc00020faec>] kasan_report_error+0x2b0/0x408
[<ffffffc00020fd20>] kasan_report+0x34/0x40
[<ffffffc00020f138>] __asan_storeN+0x15c/0x168
[<ffffffc00020f374>] memset+0x20/0x44
[<ffffffc0002086e0>] kernel_map_pages+0x238/0x2a8
[<ffffffc0001ba738>] free_pages_prepare+0x21c/0x25c
[<ffffffc0001bc7e4>] __free_pages_ok+0x20/0xf0
[<ffffffc0001bd3bc>] __free_pages+0x34/0x44
[<ffffffc0001bd5d8>] __free_pages_bootmem+0xf4/0x110
[<ffffffc001ca9050>] free_all_bootmem+0x160/0x1f4
[<ffffffc001c97b30>] mem_init+0x70/0x1ec
[<ffffffc001c909f8>] start_kernel+0x2b8/0x4e4
[<ffffffc001c987dc>] kasan_early_init+0x154/0x160
Change-Id: Idbd3dc629be57ed55a383b069a735ae3ee7b9f05
Signed-off-by: Se Wang (Patrick) Oh <sewango@codeaurora.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compaction returns prematurely with COMPACT_PARTIAL when contended or has
fatal signal pending. This is ok for the callers, but might be misleading
in the traces, as the usual reason to return COMPACT_PARTIAL is that we
think the allocation should succeed. After this patch we distinguish the
premature ending condition in the mm_compaction_finished and
mm_compaction_end tracepoints.
The contended status covers the following reasons:
- lock contention or need_resched() detected in async compaction
- fatal signal pending
- too many pages isolated in the zone (only for async compaction)
Further distinguishing the exact reason seems unnecessary for now.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some compaction tracepoints convert the integer return values to strings
using the compaction_status_string array. This works for in-kernel
printing, but not userspace trace printing of raw captured trace such as
via trace-cmd report.
This patch converts the private array to appropriate tracepoint macros
that result in proper userspace support.
trace-cmd output before:
transhuge-stres-4235 [000] 453.149280: mm_compaction_finished: node=0
zone=ffffffff81815d7a order=9 ret=
after:
transhuge-stres-4235 [000] 453.149280: mm_compaction_finished: node=0
zone=ffffffff81815d7a order=9 ret=partial
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce is_via_compact_memory() helper indicating compacting via
/proc/sys/vm/compact_memory to improve readability.
To catch this situation in __compaction_suitable, use order as parameter
directly instead of using struct compact_control.
This patch has no functional changes.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We cache isolate_start_pfn before entering isolate_migratepages(). If
pageblock is skipped in isolate_migratepages() due to whatever reason,
cc->migrate_pfn can be far from isolate_start_pfn hence we flush pages
that were freed. For example, the following scenario can be possible:
- assume order-9 compaction, pageblock order is 9
- start_isolate_pfn is 0x200
- isolate_migratepages()
- skip a number of pageblocks
- start to isolate from pfn 0x600
- cc->migrate_pfn = 0x620
- return
- last_migrated_pfn is set to 0x200
- check flushing condition
- current_block_start is set to 0x600
- last_migrated_pfn < current_block_start then do useless flush
This wrong flush would not help the performance and success rate so this
patch tries to fix it. One simple way to know the exact position where
we start to isolate migratable pages is that we cache it in
isolate_migratepages() before entering actual isolation. This patch
implements that and fixes the problem.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The compaction free scanner is looking for PageBuddy() pages and
skipping all others. For large compound pages such as THP or hugetlbfs,
we can save a lot of iterations if we skip them at once using their
compound_order(). This is generally unsafe and we can read a bogus
value of order due to a race, but if we are careful, the only danger is
skipping too much.
When tested with stress-highalloc from mmtests on 4GB system with 1GB
hugetlbfs pages, the vmstat compact_free_scanned count decreased by at
least 15%.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The compaction migrate scanner tries to skip THP pages by their order,
to reduce number of iterations for pages it cannot isolate. The check
is only done if PageLRU() is true, which means it applies to THP pages,
but not e.g. hugetlbfs pages or any other non-LRU compound pages, which
we have to iterate by base pages.
This limitation comes from the assumption that it's only safe to read
compound_order() when we have the zone's lru_lock and THP cannot be
split under us. But the only danger (after filtering out order values
that are not below MAX_ORDER, to prevent overflows) is that we skip too
much or too little after reading a bogus compound_order() due to a rare
race. This is the same reasoning as patch 99c0fd5e51c4 ("mm,
compaction: skip buddy pages by their order in the migrate scanner")
introduced for unsafely reading PageBuddy() order.
After this patch, all pages are tested for PageCompound() and we skip
them by compound_order(). The test is done after the test for
balloon_page_movable() as we don't want to assume if balloon pages (or
other pages with own isolation and migration implementation if a generic
API gets implemented) are compound or not.
When tested with stress-highalloc from mmtests on 4GB system with 1GB
hugetlbfs pages, the vmstat compact_migrate_scanned count decreased by
15%.
[kirill.shutemov@linux.intel.com: change PageTransHuge checks to PageCompound for different series was squashed here]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reseting the cached compaction scanner positions is now open-coded in
__reset_isolation_suitable() and compact_finished(). Encapsulate the
functionality in a new function reset_cached_positions().
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Handling the position where compaction free scanner should restart
(stored in cc->free_pfn) got more complex with commit e14c720efdd7 ("mm,
compaction: remember position within pageblock in free pages scanner").
Currently the position is updated in each loop iteration of
isolate_freepages(), although it should be enough to update it only when
breaking from the loop. There's also an extra check outside the loop
updates the position in case we have met the migration scanner.
This can be simplified if we move the test for having isolated enough
from the for-loop header next to the test for contention, and
determining the restart position only in these cases. We can reuse the
isolate_start_pfn variable for this instead of setting cc->free_pfn
directly. Outside the loop, we can simply set cc->free_pfn to current
value of isolate_start_pfn without any extra check.
Also add a VM_BUG_ON to catch possible mistake in the future, in case we
later add a new condition that terminates isolate_freepages_block()
prematurely without also considering the condition in
isolate_freepages().
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Assorted compaction cleanups and optimizations. The interesting patches
are 4 and 5. In 4, skipping of compound pages in single iteration is
improved for migration scanner, so it works also for !PageLRU compound
pages such as hugetlbfs, slab etc. Patch 5 introduces this kind of
skipping in the free scanner. The trick is that we can read
compound_order() without any protection, if we are careful to filter out
values larger than MAX_ORDER. The only danger is that we skip too much.
The same trick was already used for reading the freepage order in the
migrate scanner.
To demonstrate improvements of Patches 4 and 5 I've run stress-highalloc
from mmtests, set to simulate THP allocations (including __GFP_COMP) on
a 4GB system where 1GB was occupied by hugetlbfs pages. I'll include
just the relevant stats:
Patch 3 Patch 4 Patch 5
Compaction stalls 7523 7529 7515
Compaction success 323 304 322
Compaction failures 7200 7224 7192
Page migrate success 247778 264395 240737
Page migrate failure 15358 33184 21621
Compaction pages isolated 906928 980192 909983
Compaction migrate scanned 2005277 1692805 1498800
Compaction free scanned 13255284 11539986 9011276
Compaction cost 288 305 277
With 5 iterations per patch, the results are still noisy, but we can see
that Patch 4 does reduce migrate_scanned by 15% thanks to skipping the
hugetlbfs pages at once. Interestingly, free_scanned is also reduced
and I have no idea why. Patch 5 further reduces free_scanned as
expected, by 15%. Other stats are unaffected modulo noise.
[1] https://lkml.org/lkml/2015/1/19/158
This patch (of 5):
Compaction should finish when the migration and free scanner meet, i.e.
they reach the same pageblock. Currently however, the test in
compact_finished() simply just compares the exact pfns, which may yield
a false negative when the free scanner position is in the middle of a
pageblock and the migration scanner reaches the begining of the same
pageblock.
This hasn't been a problem until commit e14c720efdd7 ("mm, compaction:
remember position within pageblock in free pages scanner") allowed the
free scanner position to be in the middle of a pageblock between
invocations. The hot-fix 1d5bfe1ffb5b ("mm, compaction: prevent
infinite loop in compact_zone") prevented the issue by adding a special
check in the migration scanner to satisfy the current detection of
scanners meeting.
However, the proper fix is to make the detection more robust. This
patch introduces the compact_scanners_met() function that returns true
when the free scanner position is in the same or lower pageblock than
the migration scanner. The special case in isolate_migratepages()
introduced by 1d5bfe1ffb5b is removed.
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
| |
mm/compaction.c:250:13: warning: 'suitable_migration_target' defined but not used [-Wunused-function]
Reported-by: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the compaction is activated via /proc/sys/vm/compact_memory it would
better scan the whole zone. And some platforms, for instance ARM, have
the start_pfn of a zone at zero. Therefore the first try to compact via
/proc doesn't work. It needs to reset the compaction scanner position
first.
Signed-off-by: Gioh Kim <gioh.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, pages which are marked as unevictable are protected from
compaction, but not from other types of migration. The POSIX real time
extension explicitly states that mlock() will prevent a major page
fault, but the spirit of this is that mlock() should give a process the
ability to control sources of latency, including minor page faults.
However, the mlock manpage only explicitly says that a locked page will
not be written to swap and this can cause some confusion. The
compaction code today does not give a developer who wants to avoid swap
but wants to have large contiguous areas available any method to achieve
this state. This patch introduces a sysctl for controlling compaction
behavior with respect to the unevictable lru. Users who demand no page
faults after a page is present can set compact_unevictable_allowed to 0
and users who need the large contiguous areas can enable compaction on
locked memory by leaving the default value of 1.
To illustrate this problem I wrote a quick test program that mmaps a
large number of 1MB files filled with random data. These maps are
created locked and read only. Then every other mmap is unmapped and I
attempt to allocate huge pages to the static huge page pool. When the
compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages
after fragmenting memory. When the value is set to 1, allocations
succeed.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compaction has anti fragmentation algorithm. It is that freepage should
be more than pageblock order to finish the compaction if we don't find any
freepage in requested migratetype buddy list. This is for mitigating
fragmentation, but, there is a lack of migratetype consideration and it is
too excessive compared to page allocator's anti fragmentation algorithm.
Not considering migratetype would cause premature finish of compaction.
For example, if allocation request is for unmovable migratetype, freepage
with CMA migratetype doesn't help that allocation and compaction should
not be stopped. But, current logic regards this situation as compaction
is no longer needed, so finish the compaction.
Secondly, condition is too excessive compared to page allocator's logic.
We can steal freepage from other migratetype and change pageblock
migratetype on more relaxed conditions in page allocator. This is
designed to prevent fragmentation and we can use it here. Imposing hard
constraint only to the compaction doesn't help much in this case since
page allocator would cause fragmentation again.
To solve these problems, this patch borrows anti fragmentation logic from
page allocator. It will reduce premature compaction finish in some cases
and reduce excessive compaction work.
stress-highalloc test in mmtests with non movable order 7 allocation shows
considerable increase of compaction success rate.
Compaction success rate (Compaction success * 100 / Compaction stalls, %)
31.82 : 42.20
I tested it on non-reboot 5 runs stress-highalloc benchmark and found that
there is no more degradation on allocation success rate than before. That
roughly means that this patch doesn't result in more fragmentations.
Vlastimil suggests additional idea that we only test for fallbacks when
migration scanner has scanned a whole pageblock. It looked good for
fragmentation because chance of stealing increase due to making more free
pages in certain pageblock. So, I tested it, but, it results in decreased
compaction success rate, roughly 38.00. I guess the reason that if system
is low memory condition, watermark check could be failed due to not enough
order 0 free page and so, sometimes, we can't reach a fallback check
although migrate_pfn is aligned to pageblock_nr_pages. I can insert code
to cope with this situation but it makes code more complicated so I don't
include his idea at this patch.
[akpm@linux-foundation.org: fix CONFIG_CMA=n build]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add kernel address sanitizer hooks to mark allocated page's addresses as
accessible in corresponding shadow region. Mark freed pages as
inaccessible.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vmstat interfaces are good at hiding negative counts (at least when
CONFIG_SMP); but if you peer behind the curtain, you find that
nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
more negative: so they can absorb larger and larger numbers of isolated
pages, yet still appear to be zero.
I'm happy to avoid a congestion_wait() when too_many_isolated() myself;
but I guess it's there for a good reason, in which case we ought to get
too_many_isolated() working again.
The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
to call acct_isolated() to increment them.
It is possible that the bug whcih this patch fixes could cause OOM kills
when the system still has a lot of reclaimable page cache.
Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org> [3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, freepage isolation in one pageblock doesn't consider how many
freepages we isolate. When I traced flow of compaction, compaction
sometimes isolates more than 256 freepages to migrate just 32 pages.
In this patch, freepage isolation is stopped at the point that we
have more isolated freepage than isolated page for migration. This
results in slowing down free page scanner and make compaction success
rate higher.
stress-highalloc test in mmtests with non movable order 7 allocation shows
increase of compaction success rate.
Compaction success rate (Compaction success * 100 / Compaction stalls, %)
27.13 : 31.82
pfn where both scanners meets on compaction complete
(separate test due to enormous tracepoint buffer)
(zone_start=4096, zone_end=1048576)
586034 : 654378
In fact, I didn't fully understand why this patch results in such good
result. There was a guess that not used freepages are released to pcp list
and on next compaction trial we won't isolate them again so compaction
success rate would decrease. To prevent this effect, I tested with adding
pcp drain code on release_freepages(), but, it has no good effect.
Anyway, this patch reduces waste time to isolate unneeded freepages so
seems reasonable.
Vlastimil said:
: I briefly tried it on top of the pivot-changing series and with order-9
: allocations it reduced free page scanned counter by almost 10%. No effect
: on success rates (maybe because pivot changing already took care of the
: scanners meeting problem) but the scanning reduction is good on its own.
:
: It also explains why e14c720efdd7 ("mm, compaction: remember position
: within pageblock in free pages scanner") had less than expected
: improvements. It would only actually stop within pageblock in case of
: async compaction detecting contention. I guess that's also why the
: infinite loop problem fixed by 1d5bfe1ffb5b affected so relatively few
: people.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
What we want to check here is whether there is highorder freepage in buddy
list of other migratetype in order to steal it without fragmentation.
But, current code just checks cc->order which means allocation request
order. So, this is wrong.
Without this fix, non-movable synchronous compaction below pageblock order
would not stopped until compaction is complete, because migratetype of
most pageblocks are movable and high order freepage made by compaction is
usually on movable type buddy list.
There is some report related to this bug. See below link.
http://www.spinics.net/lists/linux-mm/msg81666.html
Although the issued system still has load spike comes from compaction,
this makes that system completely stable and responsive according to his
report.
stress-highalloc test in mmtests with non movable order 7 allocation
doesn't show any notable difference in allocation success rate, but, it
shows more compaction success rate.
Compaction success rate (Compaction success * 100 / Compaction stalls, %)
18.47 : 28.94
Fixes: 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page immediately when it is made available")
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org> [3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compaction deferring logic is heavy hammer that block the way to the
compaction. It doesn't consider overall system state, so it could prevent
user from doing compaction falsely. In other words, even if system has
enough range of memory to compact, compaction would be skipped due to
compaction deferring logic. This patch add new tracepoint to understand
work of deferring logic. This will also help to check compaction success
and fail.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is not well analyzed that when/why compaction start/finish or not.
With these new tracepoints, we can know much more about start/finish
reason of compaction. I can find following bug with these tracepoint.
http://www.spinics.net/lists/linux-mm/msg81582.html
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It'd be useful to know current range where compaction work for detailed
analysis. With it, we can know pageblock where we actually scan and
isolate, and, how much pages we try in that pageblock and can guess why it
doesn't become freepage with pageblock order roughly.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now have tracepoint for begin event of compaction and it prints start
position of both scanners, but, tracepoint for end event of compaction
doesn't print finish position of both scanners. It'd be also useful to
know finish position of both scanners so this patch add it. It will help
to find odd behavior or problem on compaction internal logic.
And mode is added to both begin/end tracepoint output, since according to
mode, compaction behavior is quite different.
And lastly, status format is changed to string rather than status number
for readability.
[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Expand the usage of the struct alloc_context introduced in the previous
patch also for calling try_to_compact_pages(), to reduce the number of its
parameters. Since the function is in different compilation unit, we need
to move alloc_context definition in the shared mm/internal.h header.
With this change we get simpler code and small savings of code size and stack
usage:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-27 (-27)
function old new delta
__alloc_pages_direct_compact 283 256 -27
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-13 (-13)
function old new delta
try_to_compact_pages 582 569 -13
Stack usage of __alloc_pages_direct_compact goes from 24 to none (per
scripts/checkstack.pl).
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The goal of memory compaction is to create high-order freepages through
page migration. Page migration however puts pages on the per-cpu lru_add
cache, which is later flushed to per-cpu pcplists, and only after pcplists
are drained the pages can actually merge. This can happen due to the
per-cpu caches becoming full through further freeing, or explicitly.
During direct compaction, it is useful to do the draining explicitly so
that pages merge as soon as possible and compaction can detect success
immediately and keep the latency impact at minimum. However the current
implementation is far from ideal. Draining is done only in
__alloc_pages_direct_compact(), after all zones were already compacted,
and the decisions to continue or stop compaction in individual zones was
done without the last batch of migrations being merged. It is also
missing the draining of lru_add cache before the pcplists.
This patch moves the draining for direct compaction into compact_zone().
It adds the missing lru_cache draining and uses the newly introduced
single zone pcplists draining to reduce overhead and avoid impact on
unrelated zones. Draining is only performed when it can actually lead to
merging of a page of desired order (passed by cc->order). This means it
is only done when migration occurred in the previously scanned cc->order
aligned block(s) and the migration scanner is now pointing to the next
cc->order aligned block.
The patch has been tested with stress-highalloc benchmark from mmtests.
Although overal allocation success rates of the benchmark were not
affected, the number of detected compaction successes has doubled. This
suggests that allocations were previously successful due to implicit
merging caused by background activity, making a later allocation attempt
succeed immediately, but not attributing the success to compaction. Since
stress-highalloc always tries to allocate almost the whole memory, it
cannot show the improvement in its reported success rate metric. However
after this patch, compaction should detect success and terminate earlier,
reducing the direct compaction latencies in a real scenario.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compaction caches the migration and free scanner positions between
compaction invocations, so that the whole zone gets eventually scanned and
there is no bias towards the initial scanner positions at the
beginning/end of the zone.
The cached positions are continuously updated as scanners progress and the
updating stops as soon as a page is successfully isolated. The reasoning
behind this is that a pageblock where isolation succeeded is likely to
succeed again in near future and it should be worth revisiting it.
However, the downside is that potentially many pages are rescanned without
successful isolation. At worst, there might be a page where isolation
from LRU succeeds but migration fails (potentially always). So upon
encountering this page, cached position would always stop being updated
for no good reason. It might have been useful to let such page be
rescanned with sync compaction after async one failed, but this is now
handled by caching scanner position for async and sync mode separately
since commit 35979ef33931 ("mm, compaction: add per-zone migration pfn
cache for async compaction").
After this patch, cached positions are updated unconditionally. In
stress-highalloc benchmark, this has decreased the numbers of scanned
pages by few percent, without affecting allocation success rates.
To prevent free scanner from leaving free pages behind after they are
returned due to page migration failure, the cached scanner pfn is changed
to point to the pageblock of the returned free page with the highest pfn,
before leaving compact_zone().
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deferred compaction is employed to avoid compacting zone where sync direct
compaction has recently failed. As such, it makes sense to only defer
when a full zone was scanned, which is when compact_zone returns with
COMPACT_COMPLETE. It's less useful to defer when compact_zone returns
with apparent success (COMPACT_PARTIAL), followed by a watermark check
failure, which can happen due to parallel allocation activity. It also
does not make much sense to defer compaction which was completely skipped
(COMPACT_SKIP) for being unsuitable in the first place.
This patch therefore makes deferred compaction trigger only when
COMPACT_COMPLETE is returned from compact_zone(). Results of
stress-highalloc becnmark show the difference is within measurement error,
so the issue is rather cosmetic.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit 53853e2d2bfb ("mm, compaction: defer each zone individually
instead of preferred zone"), compaction is deferred for each zone where
sync direct compaction fails, and reset where it succeeds. However, it
was observed that for DMA zone compaction often appeared to succeed
while subsequent allocation attempt would not, due to different outcome
of watermark check.
In order to properly defer compaction in this zone, the candidate zone
has to be passed back to __alloc_pages_direct_compact() and compaction
deferred in the zone after the allocation attempt fails.
The large source of mismatch between watermark check in compaction and
allocation was the lack of alloc_flags and classzone_idx values in
compaction, which has been fixed in the previous patch. So with this
problem fixed, we can simplify the code by removing the candidate_zone
parameter and deferring in __alloc_pages_direct_compact().
After this patch, the compaction activity during stress-highalloc
benchmark is still somewhat increased, but it's negligible compared to the
increase that occurred without the better watermark checking. This
suggests that it is still possible to apparently succeed in compaction but
fail to allocate, possibly due to parallel allocation activity.
[akpm@linux-foundation.org: fix build]
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compaction relies on zone watermark checks for decisions such as if it's
worth to start compacting in compaction_suitable() or whether compaction
should stop in compact_finished(). The watermark checks take
classzone_idx and alloc_flags parameters, which are related to the memory
allocation request. But from the context of compaction they are currently
passed as 0, including the direct compaction which is invoked to satisfy
the allocation request, and could therefore know the proper values.
The lack of proper values can lead to mismatch between decisions taken
during compaction and decisions related to the allocation request. Lack
of proper classzone_idx value means that lowmem_reserve is not taken into
account. This has manifested (during recent changes to deferred
compaction) when DMA zone was used as fallback for preferred Normal zone.
compaction_suitable() without proper classzone_idx would think that the
watermarks are already satisfied, but watermark check in
get_page_from_freelist() would fail. Because of this problem, deferring
compaction has extra complexity that can be removed in the following
patch.
The issue (not confirmed in practice) with missing alloc_flags is opposite
in nature. For allocations that include ALLOC_HIGH, ALLOC_HIGHER or
ALLOC_CMA in alloc_flags (the last includes all MOVABLE allocations on
CMA-enabled systems) the watermark checking in compaction with 0 passed
will be stricter than in get_page_from_freelist(). In these cases
compaction might be running for a longer time than is really needed.
Another issue compaction_suitable() is that the check for "does the zone
need compaction at all?" comes only after the check "does the zone have
enough free free pages to succeed compaction". The latter considers extra
pages for migration and can therefore in some situations fail and return
COMPACT_SKIPPED, although the high-order allocation would succeed and we
should return COMPACT_PARTIAL.
This patch fixes these problems by adding alloc_flags and classzone_idx to
struct compact_control and related functions involved in direct compaction
and watermark checking. Where possible, all other callers of
compaction_suitable() pass proper values where those are known. This is
currently limited to classzone_idx, which is sometimes known in kswapd
context. However, the direct reclaim callers should_continue_reclaim()
and compaction_ready() do not currently know the proper values, so the
coordination between reclaim and compaction may still not be as accurate
as it could. This can be fixed later, if it's shown to be an issue.
Additionaly the checks in compact_suitable() are reordered to address the
second issue described above.
The effect of this patch should be slightly better high-order allocation
success rates and/or less compaction overhead, depending on the type of
allocations and presence of CMA. It allows simplifying deferred
compaction code in a followup patch.
When testing with stress-highalloc, there was some slight improvement
(which might be just due to variance) in success rates of non-THP-like
allocations.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|