| Commit message (Collapse) | Author |
|
Per RFC 6724, section 4, "Candidate Source Addresses":
It is RECOMMENDED that the candidate source addresses be the set
of unicast addresses assigned to the interface that will be used
to send to the destination (the "outgoing" interface).
Add a sysctl to enable this behaviour.
Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allows putting a VXLAN device into a new flow-based mode in which
skbs with a ip_tunnel_info dst metadata attached will be encapsulated
according to the instructions stored in there with the VXLAN device
defaults taken into consideration.
Similar on the receive side, if the VXLAN_F_COLLECT_METADATA flag is
set, the packet processing will populate a ip_tunnel_info struct for
each packet received and attach it to the skb using the new metadata
dst. The metadata structure will contain the outer header and tunnel
header fields which have been stripped off. Layers further up in the
stack such as routing, tc or netfitler can later match on these fields
and perform forwarding. It is the responsibility of upper layers to
ensure that the flag is set if the metadata is needed. The flag limits
the additional cost of metadata collecting based on demand.
This prepares the VXLAN device to be steered by the routing and other
subsystems which allows to support encapsulation for a large number
of tunnel endpoints and tunnel ids through a single net_device which
improves the scalability.
It also allows for OVS to leverage this mode which in turn allows for
the removal of the OVS specific VXLAN code.
Because the skb is currently scrubed in vxlan_rcv(), the attachment of
the new dst metadata is postponed until after scrubing which requires
the temporary addition of a new member to vxlan_metadata. This member
is removed again in a later commit after the indirect VXLAN receive API
has been removed.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This implementation uses lwtunnel infrastructure to register
hooks for mpls tunnel encaps.
It picks cues from iptunnel_encaps infrastructure and previous
mpls iptunnel RFC patches from Eric W. Biederman and Robert Shearman
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Provides infrastructure to parse/dump/store encap information for
light weight tunnels like mpls. Encap information for such tunnels
is associated with fib routes.
This infrastructure is based on previous suggestions from
Eric Biederman to follow the xfrm infrastructure.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop via
helper functions. These functions may change skb->data/hlen which are
cached by some JITs to improve performance of ld_abs/ld_ind instructions.
Therefore JITs need to recognize bpf_skb_vlan_push/pop() calls,
re-compute header len and re-cache skb->data/hlen back into cpu registers.
Note, skb->data/hlen are not directly accessible from the programs,
so any changes to skb->data done either by these helpers or by other
TC actions are safe.
eBPF JIT supported by three architectures:
- arm64 JIT is using bpf_load_pointer() without caching, so it's ok as-is.
- x64 JIT re-caches skb->data/hlen unconditionally after vlan_push/pop calls
(experiments showed that conditional re-caching is slower).
- s390 JIT falls back to interpreter for now when bpf_skb_vlan_push() is present
in the program (re-caching is tbd).
These helpers allow more scalable handling of vlan from the programs.
Instead of creating thousands of vlan netdevs on top of eth0 and attaching
TC+ingress+bpf to all of them, the program can be attached to eth0 directly
and manipulate vlans as necessary.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Both of these fields are unused and has been unused since they
were added 3 and 5 years ago. Drop them since they are clearly
not very useful.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's not used anywhere.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Just before queuing skb for xmit on port, check if skb has been marked by
switchdev port driver as already fordwarded by device. If so, drop skb. A
non-zero skb->offload_fwd_mark field is set by the switchdev port
driver/device on ingress to indicate the skb has already been forwarded by
the device to egress ports with matching dev->skb_mark. The switchdev port
driver would assign a non-zero dev->offload_skb_mark for each device port
netdev during registration, for example.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
on x86
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.
Also optimize the x86 code a bit by caching task_struct_size.
Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).
Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already. This saves from doing an extra slab
allocation at fork().
The only real downside here is that we have to stick everything
and the end of the task_struct. But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently, we set wrong gfp_mask to page_owner info in case of isolated
freepage by compaction and split page. It causes incorrect mixed
pageblock report that we can get from '/proc/pagetypeinfo'. This metric
is really useful to measure fragmentation effect so should be accurate.
This patch fixes it by setting correct information.
Without this patch, after kernel build workload is finished, number of
mixed pageblock is 112 among roughly 210 movable pageblocks.
But, with this fix, output shows that mixed pageblock is just 57.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Switch to my kernel.org alias instead of a badly named gmail address,
which I rarely use.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Using __printf attributes helps to detect several format string issues
at compile time (even though -Wformat-security is currently disabled in
Makefile). For example it can detect when formatting a pointer as a
number, like the issue fixed in commit a3fa71c40f18 ("wl18xx: show
rx_frames_per_rates as an array as it really is"), or when the arguments
do not match the format string, c.f. for example commit 5ce1aca81435
("reiserfs: fix __RASSERT format string").
To prevent similar bugs in the future, add a __printf attribute to every
function prototype which needs one in include/linux/ and lib/. These
functions were mostly found by using gcc's -Wsuggest-attribute=format
flag.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
s390 has a constant hugepage size, by setting HPAGE_SHIFT we also change
e.g. the pageblock_order, which should be independent in respect to
hugepage support.
With this patch every architecture is free to define how to check
for hugepage support.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch introduces the proto_down flag that can be used by user space
applications to notify switch drivers that errors have been detected on the
device.
The switch driver can react to protodown notification by doing a phys down
on the associated switch port.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
They just call file_inode and then the corresponding *_inode_file_wait
function. Just make them static inlines instead.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
|
|
Allow callers to pass in an inode instead of a filp.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org>
Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
|
|
Commit 514ac99c64b "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.
Without timestamping to be required by user space applications this timestamp
was not generated which lead to commit 36c01245eb8 "can: fix loss of CAN frames
in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs
by introducing several __net_timestamp() calls.
This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb()
to add __net_timestamp() after skbuff creation to prevent the frame loss fixed
in mainline Linux.
This patch removes the timestamp dependency and uses an atomic counter to
create an unique identifier together with the skbuff pointer.
Btw: the new skbcnt element introduced in struct can_skb_priv has to be
initialized with zero in out-of-tree drivers which are not using
alloc_can{,fd}_skb() too.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Pass the mdix setting from ethtool down to the phy driver, to allow
driver specific implementations of manually setting the polarity.
Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If there are no assigned devices, the guest PAT are not providing
any useful information and can be overridden to writeback; VMX
always does this because it has the "IPAT" bit in its extended
page table entries, but SVM does not have anything similar.
Hook into VFIO and legacy device assignment so that they
provide this information to KVM.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Hop was always either 0 or sizeof(struct ipv6hdr).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
NCM specs are not actually mandating a specific position in the frame for
the NDP (Network Datagram Pointer). However, some Huawei devices will
ignore our aggregates if it is not placed after the datagrams it points
to. Add support for doing just this, in a per-device configurable way.
While at it, update NCM subdrivers, disabling this functionality in all of
them, except in huawei_cdc_ncm where it is enabled instead.
We aren't making any distinction between different Huawei NCM devices,
based on what the vendor driver does. Standard NCM devices are left
unaffected: if they are compliant, they should be always usable, still
stay on the safe side.
This change has been tested and working with a Huawei E3131 device (which
works regardless of NDP position), a Huawei E3531 (also working both
ways) and an E3372 (which mandates NDP to be after indexed datagrams).
V1->V2:
- corrected wrong NDP acronym definition
- fixed possible NULL pointer dereference
- patch cleanup
V2->V3:
- Properly account for the NDP size when writing new packets to SKB
Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
e48453c386f3 ("block, cgroup: implement policy-specific per-blkcg
data") updated per-blkcg policy data to be dynamically allocated.
When a policy is registered, its policy data aren't created. Instead,
when the policy is activated on a queue, the policy data are allocated
if there are blkg's (blkcg_gq's) which are attached to a given blkcg.
This is buggy. Consider the following scenario.
1. A blkcg is created. No blkg's attached yet.
2. The policy is registered. No policy data is allocated.
3. The policy is activated on a queue. As the above blkcg doesn't
have any blkg's, it won't allocate the matching blkcg_policy_data.
4. An IO is issued from the blkcg and blkg is created and the blkcg
still doesn't have the matching policy data allocated.
With cfq-iosched, this leads to an oops.
It also doesn't free policy data on policy unregistration assuming
that freeing of all policy data on blkcg destruction should take care
of it; however, this also is incorrect.
1. A blkcg has policy data.
2. The policy gets unregistered but the policy data remains.
3. Another policy gets registered on the same slot.
4. Later, the new policy tries to allocate policy data on the previous
blkcg but the slot is already occupied and gets skipped. The
policy ends up operating on the policy data of the previous policy.
There's no reason to manage blkcg_policy_data lazily. The reason we
do lazy allocation of blkg's is that the number of all possible blkg's
is the product of cgroups and block devices which can reach a
surprising level. blkcg_policy_data is contrained by the number of
cgroups and shouldn't be a problem.
This patch makes blkcg_policy_data to be allocated for all existing
blkcg's on policy registration and freed on unregistration and removes
blkcg_policy_data handling from policy [de]activation paths. This
makes that blkcg_policy_data are created and removed with the policy
they belong to and fixes the above described problems.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: e48453c386f3 ("block, cgroup: implement policy-specific per-blkcg data")
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Add all_blkcgs list goes through blkcg->all_blkcgs_node and is
protected by blkcg_pol_mutex. This will be used to fix
blkcg_policy_data allocation bug.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Grab a reference on a network namespace of the 'rbd map' (in case of
rbd) or 'mount' (in case of ceph) process and use that to open sockets
instead of always using init_net and bailing if network namespace is
anything but init_net. Be careful to not share struct ceph_client
instances between different namespaces and don't add any code in the
!CONFIG_NET_NS case.
This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
|
All users gone. Remove it before we get another one.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When a cpu goes up some architectures (e.g. x86) have to walk the irq
space to set up the vector space for the cpu. While this needs extra
protection at the architecture level we can avoid a few race
conditions by preventing the concurrent allocation/free of irq
descriptors and the associated data.
When a cpu goes down it moves the interrupts which are targeted to
this cpu away by reassigning the affinities. While this happens
interrupts can be allocated and freed, which opens a can of race
conditions in the code which reassignes the affinities because
interrupt descriptors might be freed underneath.
Example:
CPU1 CPU2
cpu_up/down
irq_desc = irq_to_desc(irq);
remove_from_radix_tree(desc);
raw_spin_lock(&desc->lock);
free(desc);
We could protect the irq descriptors with RCU, but that would require
a full tree change of all accesses to interrupt descriptors. But
fortunately these kind of race conditions are rather limited to a few
things like cpu hotplug. The normal setup/teardown is very well
serialized. So the simpler and obvious solution is:
Prevent allocation and freeing of interrupt descriptors accross cpu
hotplug.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
|
|
Making tick_broadcast_oneshot_control() independent from
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST broke the build for
CONFIG_GENERIC_CLOCKEVENTS=n because the function is not defined
there.
Provide a proper stub inline.
Fixes: f32dd1170511 'tick/broadcast: Make idle check independent from mode and config'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Currently the broadcast busy check, which prevents the idle code from
going into deep idle, works only in one shot mode.
If NOHZ and HIGHRES are off (config or command line) there is no
sanity check at all, so under certain conditions cpus are allowed to
go into deep idle, where the local timer stops, and are not woken up
again because there is no broadcast timer installed or a hrtimer based
broadcast device is not evaluated.
Move tick_broadcast_oneshot_control() into the common code and provide
proper subfunctions for the various config combinations.
The common check in tick_broadcast_oneshot_control() is for the C3STOP
misfeature flag of the local clock event device. If its not set, idle
can proceed. If set, further checks are necessary.
Provide checks for the trivial cases:
- If broadcast is disabled in the config, then return busy
- If oneshot mode (NOHZ/HIGHES) is disabled in the config, return
busy if the broadcast device is hrtimer based.
- If oneshot mode is enabled in the config call the original
tick_broadcast_oneshot_control() function. That function needs
extra checks which will be implemented in seperate patches.
[ Split out from a larger combo patch ]
Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
|
|
Device drivers typically use ACPI _HIDs/_CIDs listed in struct device_driver
acpi_match_table to match devices. However, for generic drivers, we do not
want to list _HID for all supported devices. Also, certain classes of devices
do not have _CID (e.g. SATA, USB). Instead, we can leverage ACPI _CLS,
which specifies PCI-defined class code (i.e. base-class, subclass and
programming interface). This patch adds support for matching ACPI devices using
the _CLS method.
To support loadable module, current design uses _HID or _CID to match device's
modalias. With the new way of matching with _CLS this would requires modification
to the current ACPI modalias key to include _CLS. This patch appends PCI-defined
class-code to the existing ACPI modalias as following.
acpi:<HID>:<CID1>:<CID2>:..:<CIDn>:<bbsspp>:
E.g:
# cat /sys/devices/platform/AMDI0600:00/modalias
acpi:AMDI0600:010601:
where bb is th base-class code, ss is te sub-class code, and pp is the
programming interface code
Since there would not be _HID/_CID in the ACPI matching table of the driver,
this patch adds a field to acpi_device_id to specify the matching _CLS.
static const struct acpi_device_id ahci_acpi_match[] = {
{ ACPI_DEVICE_CLASS(PCI_CLASS_STORAGE_SATA_AHCI, 0xffffff) },
{},
};
In this case, the corresponded entry in modules.alias file would be:
alias acpi*:010601:* ahci_platform
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This effectively reverts the following three commits:
7bc10388ccdd ACPI / resources: free memory on error in add_region_before()
0f1b414d1907 ACPI / PNP: Avoid conflicting resource reservations
b9a5e5e18fbf ACPI / init: Fix the ordering of acpi_reserve_resources()
(commit b9a5e5e18fbf introduced regressions some of which, but not
all, were addressed by commit 0f1b414d1907 and commit 7bc10388ccdd
was a fixup on top of the latter) and causes ACPI fixed hardware
resources to be reserved at the fs_initcall_sync stage of system
initialization.
The story is as follows. First, a boot regression was reported due
to an apparent resource reservation ordering change after a commit
that shouldn't lead to such changes. Investigation led to the
conclusion that the problem happened because acpi_reserve_resources()
was executed at the device_initcall() stage of system initialization
which wasn't strictly ordered with respect to driver initialization
(and with respect to the initialization of the pcieport driver in
particular), so a random change causing the device initcalls to be
run in a different order might break things.
The response to that was to attempt to run acpi_reserve_resources()
as soon as we knew that ACPI would be in use (commit b9a5e5e18fbf).
However, that turned out to be too early, because it caused resource
reservations made by the PNP system driver to fail on at least one
system and that failure was addressed by commit 0f1b414d1907.
That fix still turned out to be insufficient, though, because
calling acpi_reserve_resources() before the fs_initcall stage of
system initialization caused a boot regression to happen on the
eCAFE EC-800-H20G/S netbook. That meant that we only could call
acpi_reserve_resources() at the fs_initcall initialization stage
or later, but then we might just as well call it after the PNP
initalization in which case commit 0f1b414d1907 wouldn't be
necessary any more.
For this reason, the changes made by commit 0f1b414d1907 are reverted
(along with a memory leak fixup on top of that commit), the changes
made by commit b9a5e5e18fbf that went too far are reverted too and
acpi_reserve_resources() is changed into fs_initcall_sync, which
will cause it to be executed after the PNP subsystem initialization
(which is an fs_initcall) and before device initcalls (including
the pcieport driver initialization) which should avoid the initial
issue.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581
Link: http://marc.info/?t=143092384600002&r=1&w=2
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
Link: http://marc.info/?t=143389402600001&r=1&w=2
Fixes: b9a5e5e18fbf "ACPI / init: Fix the ordering of acpi_reserve_resources()"
Reported-by: Roland Dreier <roland@purestorage.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Modular users will always be users of init functionality, but
users of init functionality are not necessarily always modules.
Hence any functionality like module_init and module_exit would
be more at home in the module.h file. And module.h should
explicitly include init.h to make the dependency clear.
We've already done all the legwork needed to ensure that this
move does not cause any build regressions due to implicit
header file include assumptions about where module_init lives.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
Change ntb_hw_intel to use the new NTB hardware abstraction layer.
Split ntb_transport into its own driver. Change it to use the new NTB
hardware abstraction layer.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Abstract the NTB device behind a programming interface, so that it can
support different hardware and client drivers.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Currently print_cfs_rq() is declared in include/linux/sched.h.
However it's not used outside kernel/sched. Hence move the
declaration to kernel/sched/sched.h
Also some functions are only available for CONFIG_SCHED_DEBUG=y.
Hence move the declarations to within the #ifdef.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435252903-1081-2-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task
sched_info, which results in ugly #if clauses.
Simplify the code by introducing a synthethic CONFIG_SCHED_INFO
switch, selected by both.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: ricklind@us.ibm.com
Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers")
had two problems. First, the preempt-notifier API needs to sleep with the
addition of the static_key, we do however need to hold off preemption
while modifying the preempt notifier list, otherwise a preemption could
observe an inconsistent list state. KVM correctly registers and
unregisters preempt notifiers with preemption disabled, so the sleep
caused dmesg splats.
Second, KVM registers and unregisters preemption notifiers very often
(in vcpu_load/vcpu_put). With a single uniprocessor guest the static key
would move between 0 and 1 continuously, hitting the slow path on every
userspace exit.
To fix this, wrap the static_key inc/dec in a new API, and call it from
KVM.
Fixes: 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers")
Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Reported-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
At the moment the IRQCHIP_DECLARE macro is only declared locally in
drivers/irqchip/irqchip.h. It prevents from using it directly in arch/*
directories whenever irqchip drivers only exist there, which happens in a few
cases (e.g. arc, arm, microblaze and mips).
This patch makes the macro to be globally defined, i.e. in
include/linux/irqchip.h, and thus usable for arch-specific declarations of
irqchip drivers. In this way, it is very similar to what clocksource does (ie
CLOCKSOURCE_OF_DECLARE is defined in include/linux/clocksource.h).
For now, this patch only moves the declaration of the macro
IRQCHIP_DECLARE to the global header 'include/linux/irqchip.h' and make
'drivers/irqchip/irqchip.h' include 'include/linux/irqchip.h'. Later, other
patches will get rid of 'drivers/irqchip/irqchip.h' and modify all the impacted
irqchip drivers.
Signed-off-by: Joel Porquet <joel@porquet.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1435865565-14114-1-git-send-email-joel@porquet.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
52ebea749aae ("writeback: make backing_dev_info host cgroup-specific
bdi_writebacks") made bdi (backing_dev_info) host per-cgroup wb's
(bdi_writeback's). As the congested state needs to be per-wb and
referenced from blkcg side and multiple wbs, the patch made all
non-root cong's (bdi_writeback_congested's) reference counted and
indexed on bdi.
When a bdi is destroyed, cgwb_bdi_destroy() tries to drain all
non-root cong's; however, this can hang indefinitely because wb's can
also be referenced from blkcg_gq's which are destroyed after bdi
destruction is complete.
To fix the bug, bdi destruction will be updated to not wait for cong's
to drain, which naturally means that cong's may outlive the associated
bdi. This is fine for non-root cong's but is problematic for the root
cong's which are embedded in their bdi's as they may end up getting
dereferenced after the containing bdi's are freed.
This patch makes root cong's behave the same as non-root cong's. They
are no longer embedded in their bdi's but allocated separately during
bdi initialization, indexed and reference counted the same way.
* As cong handling is the same for all wb's, wb->congested
initialization is moved into wb_init().
* When !CONFIG_CGROUP_WRITEBACK, there was no indexing or refcnting.
bdi->wb_congested is now a pointer pointing to the root cong
allocated during bdi init and minimal refcnting operations are
implemented.
* The above makes root wb init paths diverge depending on
CONFIG_CGROUP_WRITEBACK. root wb init is moved to cgwb_bdi_init().
This patch in itself shouldn't cause any consequential behavior
differences but prepares for the actual fix.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jon Christopherson <jon@jons.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=100681
Tested-by: Jon Christopherson <jon@jons.org>
Added <linux/slab.h> include to backing-dev.h for kfree() definition.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This patch only moves files to their new locations, before applying the
next two patches adding the NTB Abstraction layer. Splitting this patch
from the next is intended make distinct which code is changed only due
to moving the files, versus which are substantial code changes in adding
the NTB Abstraction layer.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
sb_getblk() is used during ext4 (and possibly other FSes) writeback
paths. Sometimes such path require allocating memory and guaranteeing
that such allocation won't block. Currently, however, there is no way
to provide user flags for sb_getblk which could lead to deadlocks.
This patch implements a sb_getblk_gfp with the only difference it can
accept user-provided GFP flags.
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
Add two functions sysfs_create_mount_point and
sysfs_remove_mount_point that hang a permanently empty directory off
of a kobject or remove a permanently emptpy directory hanging from a
kobject. Export these new functions so modular filesystems can use
them.
Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Add a new function kernfs_create_empty_dir that can be used to create
directory that can not be modified.
Update the code to use make_empty_dir_inode when reporting a
permanently empty directory to the vfs.
Update the code to not allow adding to permanently empty directories.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Add a magic sysctl table sysctl_mount_point that when used to
create a directory forces that directory to be permanently empty.
Update the code to use make_empty_dir_inode when accessing permanently
empty directories.
Update the code to not allow adding to permanently empty directories.
Update /proc/sys/fs/binfmt_misc to be a permanently empty directory.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
To ensure it is safe to mount proc and sysfs I need to check if
filesystems that are mounted on top of them are mounted on truly empty
directories. Given that some directories can gain entries over time,
knowing that a directory is empty right now is insufficient.
Therefore add supporting infrastructure for permantently empty
directories that proc and sysfs can use when they create mount points
for filesystems and fs_fully_visible can use to test for permanently
empty directories to ensure that nothing will be gained by mounting a
fresh copy of proc or sysfs.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Mateusz Guzik reported :
Currently obtaining a new file descriptor results in locking fdtable
twice - once in order to reserve a slot and second time to fill it.
Holding the spinlock in __fd_install() is needed in case a resize is
done, or to prevent a resize.
Mateusz provided an RFC patch and a micro benchmark :
http://people.redhat.com/~mguzik/pipebench.c
A resize is an unlikely operation in a process lifetime,
as table size is at least doubled at every resize.
We can use RCU instead of the spinlock.
__fd_install() must wait if a resize is in progress.
The resize must block new __fd_install() callers from starting,
and wait that ongoing install are finished (synchronize_sched())
resize should be attempted by a single thread to not waste resources.
rcu_sched variant is used, as __fd_install() and expand_fdtable() run
from process context.
It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R)
CPU E5-2696 v2 @ 2.50GHz
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Mateusz Guzik <mguzik@redhat.com>
Tested-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
To be consistent with other kernel interface namings, rename
of_get_named_gen_pool() to of_gen_pool_get(). In the original function
name "_named" suffix references to a device tree property, which contains
a phandle to a device and the corresponding device driver is assumed to
register a gen_pool object.
Due to a weak relation and to avoid any confusion (e.g. in future
possible scenario if gen_pool objects are named) the suffix is removed.
[sfr@canb.auug.org.au: crypto/marvell/cesa - fix up for of_get_named_gen_pool() rename]
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Boris BREZILLON <boris.brezillon@free-electrons.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
To be consistent with other genalloc interface namings, rename
dev_get_gen_pool() to gen_pool_get(). The original omitted "dev_" prefix
is removed, since it points to argument type of the function, and so it
does not bring any useful information.
[akpm@linux-foundation.org: update arch/arm/mach-socfpga/pm.c]
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Mark Brown <broonie@kernel.org>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Alan Tull <atull@opensource.altera.com>
Cc: Dinh Nguyen <dinguyen@opensource.altera.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|