From 4d1ddb8d3b84e9e162217bf55e8aad6fa796b836 Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Date: Thu, 5 Apr 2018 16:24:43 -0700
Subject: BACKPORT: zsmalloc: introduce zs_huge_class_size()

Patch series "zsmalloc/zram: drop zram's max_zpage_size", v3.

ZRAM's max_zpage_size is a bad thing.  It forces zsmalloc to store
normal objects as huge ones, which results in bigger zsmalloc memory
usage.  Drop it and use actual zsmalloc huge-class value when decide if
the object is huge or not.

This patch (of 2):

Not every object can be share its zspage with other objects, e.g.  when
the object is as big as zspage or nearly as big a zspage.  For such
objects zsmalloc has a so called huge class - every object which belongs
to huge class consumes the entire zspage (which consists of a physical
page).  On x86_64, PAGE_SHIFT 12 box, the first non-huge class size is
3264, so starting down from size 3264, objects can share page(-s) and
thus minimize memory wastage.

ZRAM, however, has its own statically defined watermark for huge
objects, namely "3 * PAGE_SIZE / 4 = 3072", and forcibly stores every
object larger than this watermark (3072) as a PAGE_SIZE object, in other
words, to a huge class, while zsmalloc can keep some of those objects in
non-huge classes.  This results in increased memory consumption.

zsmalloc knows better if the object is huge or not.  Introduce
zs_huge_class_size() function which tells if the given object can be
stored in one of non-huge classes or not.  This will let us to drop
ZRAM's huge object watermark and fully rely on zsmalloc when we decide
if the object is huge.

[sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()]
  Link: http://lkml.kernel.org/r/20180314081833.1096-2-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20180306070639.7389-2-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 010b495e2fa32353d0ef6aa70a8169e5ef617a15)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 113183619
Change-Id: Ic35f8c1ec75f0b78bf2d83729b6aedd2999f25c8
---
 include/linux/zsmalloc.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 57a8e98f2708..2219cce81ca4 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -47,6 +47,8 @@ void zs_destroy_pool(struct zs_pool *pool);
 unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags);
 void zs_free(struct zs_pool *pool, unsigned long obj);
 
+size_t zs_huge_class_size(struct zs_pool *pool);
+
 void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 			enum zs_mapmode mm);
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
-- 
cgit v1.2.3


From 461a6385e58e8247e6ba2005aa5d1b8d980ee4a2 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@wdc.com>
Date: Thu, 2 Aug 2018 10:51:40 -0700
Subject: scsi: sysfs: Introduce sysfs_{un,}break_active_protection()

commit 2afc9166f79b8f6da5f347f48515215ceee4ae37 upstream.

Introduce these two functions and export them such that the next patch
can add calls to these functions from the SCSI core.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/sysfs.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 00a1f330f93a..d3c19f8c4564 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -238,6 +238,9 @@ int __must_check sysfs_create_files(struct kobject *kobj,
 				   const struct attribute **attr);
 int __must_check sysfs_chmod_file(struct kobject *kobj,
 				  const struct attribute *attr, umode_t mode);
+struct kernfs_node *sysfs_break_active_protection(struct kobject *kobj,
+						  const struct attribute *attr);
+void sysfs_unbreak_active_protection(struct kernfs_node *kn);
 void sysfs_remove_file_ns(struct kobject *kobj, const struct attribute *attr,
 			  const void *ns);
 bool sysfs_remove_file_self(struct kobject *kobj, const struct attribute *attr);
@@ -351,6 +354,17 @@ static inline int sysfs_chmod_file(struct kobject *kobj,
 	return 0;
 }
 
+static inline struct kernfs_node *
+sysfs_break_active_protection(struct kobject *kobj,
+			      const struct attribute *attr)
+{
+	return NULL;
+}
+
+static inline void sysfs_unbreak_active_protection(struct kernfs_node *kn)
+{
+}
+
 static inline void sysfs_remove_file_ns(struct kobject *kobj,
 					const struct attribute *attr,
 					const void *ns)
-- 
cgit v1.2.3


From d25b6212cc955482eefef191b02975c1fb87d65c Mon Sep 17 00:00:00 2001
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
Date: Thu, 7 Jun 2018 09:56:59 -0700
Subject: iommu/vt-d: Add definitions for PFSID

commit 0f725561e168485eff7277d683405c05b192f537 upstream.

When SRIOV VF device IOTLB is invalidated, we need to provide
the PF source ID such that IOMMU hardware can gauge the depth
of invalidation queue which is shared among VFs. This is needed
when device invalidation throttle (DIT) capability is supported.

This patch adds bit definitions for checking and tracking PFSID.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: "Ashok Raj" <ashok.raj@intel.com>
Cc: "Lu Baolu" <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/intel-iommu.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 23e129ef6726..0892615ce93d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -125,6 +125,7 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
  * Extended Capability Register
  */
 
+#define ecap_dit(e)		((e >> 41) & 0x1)
 #define ecap_pasid(e)		((e >> 40) & 0x1)
 #define ecap_pss(e)		((e >> 35) & 0x1f)
 #define ecap_eafs(e)		((e >> 34) & 0x1)
@@ -294,6 +295,7 @@ enum {
 #define QI_DEV_IOTLB_SID(sid)	((u64)((sid) & 0xffff) << 32)
 #define QI_DEV_IOTLB_QDEP(qdep)	(((qdep) & 0x1f) << 16)
 #define QI_DEV_IOTLB_ADDR(addr)	((u64)(addr) & VTD_PAGE_MASK)
+#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52))
 #define QI_DEV_IOTLB_SIZE	1
 #define QI_DEV_IOTLB_MAX_INVS	32
 
@@ -318,6 +320,7 @@ enum {
 #define QI_DEV_EIOTLB_PASID(p)	(((u64)p) << 32)
 #define QI_DEV_EIOTLB_SID(sid)	((u64)((sid) & 0xffff) << 16)
 #define QI_DEV_EIOTLB_QDEP(qd)	((u64)((qd) & 0x1f) << 4)
+#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52))
 #define QI_DEV_EIOTLB_MAX_INVS	32
 
 #define QI_PGRP_IDX(idx)	(((u64)(idx)) << 55)
-- 
cgit v1.2.3


From d792799caa81f9b0a850380a9eacafa4922b3990 Mon Sep 17 00:00:00 2001
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
Date: Thu, 7 Jun 2018 09:57:00 -0700
Subject: iommu/vt-d: Fix dev iotlb pfsid use

commit 1c48db44924298ad0cb5a6386b88017539be8822 upstream.

PFSID should be used in the invalidation descriptor for flushing
device IOTLBs on SRIOV VFs.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: "Ashok Raj" <ashok.raj@intel.com>
Cc: "Lu Baolu" <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/intel-iommu.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

(limited to 'include/linux')

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 0892615ce93d..e353f6600b0b 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -466,9 +466,8 @@ extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
 			     u8 fm, u64 type);
 extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 			  unsigned int size_order, u64 type);
-extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
-			       u64 addr, unsigned mask);
-
+extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+			u16 qdep, u64 addr, unsigned mask);
 extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
 
 extern int dmar_ir_support(void);
-- 
cgit v1.2.3


From 1fc5fa527625d2cbddf9004b26e020ecc83d272d Mon Sep 17 00:00:00 2001
From: Dave Airlie <airlied@redhat.com>
Date: Mon, 24 Oct 2016 15:27:59 +1000
Subject: x86/io: add interface to reserve io memtype for a resource range.
 (v1.1)

commit 8ef4227615e158faa4ee85a1d6466782f7e22f2f upstream.

A recent change to the mm code in:
87744ab3832b mm: fix cache mode tracking in vm_insert_mixed()

started enforcing checking the memory type against the registered list for
amixed pfn insertion mappings. It happens that the drm drivers for a number
of gpus relied on this being broken. Currently the driver only inserted
VRAM mappings into the tracking table when they came from the kernel,
and userspace mappings never landed in the table. This led to a regression
where all the mapping end up as UC instead of WC now.

I've considered a number of solutions but since this needs to be fixed
in fixes and not next, and some of the solutions were going to introduce
overhead that hadn't been there before I didn't consider them viable at
this stage. These mainly concerned hooking into the TTM io reserve APIs,
but these API have a bunch of fast paths I didn't want to unwind to add
this to.

The solution I've decided on is to add a new API like the arch_phys_wc
APIs (these would have worked but wc_del didn't take a range), and
use them from the drivers to add a WC compatible mapping to the table
for all VRAM on those GPUs. This means we can then create userspace
mapping that won't get degraded to UC.

v1.1: use CONFIG_X86_PAT + add some comments in io.h

Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: x86@kernel.org
Cc: mcgrof@suse.com
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/io.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/io.h b/include/linux/io.h
index de64c1e53612..8ab45611fc35 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -154,4 +154,26 @@ enum {
 void *memremap(resource_size_t offset, size_t size, unsigned long flags);
 void memunmap(void *addr);
 
+/*
+ * On x86 PAT systems we have memory tracking that keeps track of
+ * the allowed mappings on memory ranges. This tracking works for
+ * all the in-kernel mapping APIs (ioremap*), but where the user
+ * wishes to map a range from a physical device into user memory
+ * the tracking won't be updated. This API is to be used by
+ * drivers which remap physical device pages into userspace,
+ * and wants to make sure they are mapped WC and not UC.
+ */
+#ifndef arch_io_reserve_memtype_wc
+static inline int arch_io_reserve_memtype_wc(resource_size_t base,
+					     resource_size_t size)
+{
+	return 0;
+}
+
+static inline void arch_io_free_memtype_wc(resource_size_t base,
+					   resource_size_t size)
+{
+}
+#endif
+
 #endif /* _LINUX_IO_H */
-- 
cgit v1.2.3