summaryrefslogtreecommitdiff
path: root/include/net (follow)
Commit message (Collapse)AuthorAge
...
* | | route: Extend flow representation with tunnel keyThomas Graf2015-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new flowi_tunnel structure which is a subset of ip_tunnel_key to allow routes to match on tunnel metadata. For now, the tunnel id is added to flowi_tunnel which allows for routes to be bound to specific virtual tunnels. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | vxlan: Flow based tunnelingThomas Graf2015-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allows putting a VXLAN device into a new flow-based mode in which skbs with a ip_tunnel_info dst metadata attached will be encapsulated according to the instructions stored in there with the VXLAN device defaults taken into consideration. Similar on the receive side, if the VXLAN_F_COLLECT_METADATA flag is set, the packet processing will populate a ip_tunnel_info struct for each packet received and attach it to the skb using the new metadata dst. The metadata structure will contain the outer header and tunnel header fields which have been stripped off. Layers further up in the stack such as routing, tc or netfitler can later match on these fields and perform forwarding. It is the responsibility of upper layers to ensure that the flag is set if the metadata is needed. The flag limits the additional cost of metadata collecting based on demand. This prepares the VXLAN device to be steered by the routing and other subsystems which allows to support encapsulation for a large number of tunnel endpoints and tunnel ids through a single net_device which improves the scalability. It also allows for OVS to leverage this mode which in turn allows for the removal of the OVS specific VXLAN code. Because the skb is currently scrubed in vxlan_rcv(), the attachment of the new dst metadata is postponed until after scrubing which requires the temporary addition of a new member to vxlan_metadata. This member is removed again in a later commit after the indirect VXLAN receive API has been removed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | dst: Metadata destinationsThomas Graf2015-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduces a new dst_metadata which enables to carry per packet metadata between forwarding and processing elements via the skb->dst pointer. The structure is set up to be a union. Thus, each separate type of metadata requires its own dst instance. If demand arises to carry multiple types of metadata concurrently, metadata dst entries can be made stackable. The metadata dst entry is refcnt'ed as expected for now but a non reference counted use is possible if the reference is forced before queueing the skb. In order to allow allocating dsts with variable length, the existing dst_alloc() is split into a dst_alloc() and dst_init() function. The existing dst_init() function to initialize the subsystem is being renamed to dst_subsys_init() to make it clear what is what. The check before ip_route_input() is changed to ignore metadata dsts and drop the dst inside the routing function thus allowing to interpret metadata in a later commit. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel genericThomas Graf2015-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename the tunnel metadata data structures currently internal to OVS and make them generic for use by all IP tunnels. Both structures are kernel internal and will stay that way. Their members are exposed to user space through individual Netlink attributes by OVS. It will therefore be possible to extend/modify these structures without affecting user ABI. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mpls: ip tunnel supportRoopa Prabhu2015-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implementation uses lwtunnel infrastructure to register hooks for mpls tunnel encaps. It picks cues from iptunnel_encaps infrastructure and previous mpls iptunnel RFC patches from Eric W. Biederman and Robert Shearman Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | lwtunnel: support dst output redirect functionRoopa Prabhu2015-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces lwtunnel_output function to call corresponding lwtunnels output function to xmit the packet. It adds two variants lwtunnel_output and lwtunnel_output6 for ipv4 and ipv6 respectively today. But this is subject to change when lwtstate will reside in dst or dst_metadata (as per upstream discussions). Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: support for fib route lwtunnel encap attributesRoopa Prabhu2015-07-21
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds support in ipv6 fib functions to parse Netlink RTA encap attributes and attach encap state data to rt6_info. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: support for fib route lwtunnel encap attributesRoopa Prabhu2015-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support in ipv4 fib functions to parse user provided encap attributes and attach encap state data to fib_nh and rtable. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | lwtunnel: infrastructure for handling light weight tunnels like mplsRoopa Prabhu2015-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provides infrastructure to parse/dump/store encap information for light weight tunnels like mpls. Encap information for such tunnels is associated with fib routes. This infrastructure is based on previous suggestions from Eric Biederman to follow the xfrm infrastructure. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | switchdev: add offload_fwd_mark generator helperScott Feldman2015-07-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be unique for device and may even be unique for a sub-set of ports within device, so add switchdev helper function to generate unique marks based on port's switch ID and group_ifindex. group_ifindex would typically be the container dev's ifindex, such as the bridge's ifindex. The generator uses a global hash table to store offload_fwd_marks hashed by {switch ID, group_ifindex} key. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | cls_cgroup: factor out classid retrievalDaniel Borkmann2015-07-20
| |/ |/| | | | | | | | | | | | | | | | | | | | | Split out retrieving the cgroups net_cls classid retrieval into its own function, so that it can be reused later on from other parts of the traffic control subsystem. If there's no skb->sk, then the small helper returns 0 as well, which in cls_cgroup terms means 'could not classify'. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Nonlocal bindTom Herbert2015-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support to allow non-local binds similar to how this was done for IPv4. Non-local binds are very useful in emulating the Internet in a box, etc. This add the ip_nonlocal_bind sysctl under ipv6. Testing: Set up nonlocal binding and receive routing on a host, e.g.: ip -6 rule add from ::/0 iif eth0 lookup 200 ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200 sysctl -w net.ipv6.ip_nonlocal_bind=1 Set up routing to 2001:0:0:1::/64 on peer to go to first host ping6 -I 2001:0:0:1::1 peer-address -- to verify Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: inet_twsk_deschedule factorizationEric Dumazet2015-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | inet_twsk_deschedule() calls are followed by inet_twsk_put(). Only particular case is in inet_twsk_purge() but there is no point to defer the inet_twsk_put() after re-enabling BH. Lets rename inet_twsk_deschedule() to inet_twsk_deschedule_put() and move the inet_twsk_put() inside. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: simplify timewait refcountingEric Dumazet2015-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | timewait sockets have a complex refcounting logic. Once we realize it should be similar to established and syn_recv sockets, we can use sk_nulls_del_node_init_rcu() and remove inet_twsk_unhash() In particular, deferred inet_twsk_put() added in commit 13475a30b66cd ("tcp: connect() race with timewait reuse") looks unecessary : When removing a timewait socket from ehash or bhash, caller must own a reference on the socket anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: remove BUG_ON() in twsk_destructor()Eric Dumazet2015-07-09
| | | | | | | | | | | | | | Kernel will crash the same if one of the pointer is NULL anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: do not slow start when cwnd equals ssthreshYuchung Cheng2015-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the original design slow start is only used to raise cwnd when cwnd is stricly below ssthresh. It makes little sense to slow start when cwnd == ssthresh: especially when hystart has set ssthresh in the initial ramp, or after recovery when cwnd resets to ssthresh. Not doing so will also help reduce the buffer bloat slightly. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: add tcp_in_slow_start helperYuchung Cheng2015-07-09
| | | | | | | | | | | | | | | | | | | | | | | | Add a helper to test the slow start condition in various congestion control modules and other places. This is to prepare a slight improvement in policy as to exactly when to slow start. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: act_mirred: remove spinlock in fast pathEric Dumazet2015-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Like act_gact, act_mirred can be lockless in packet processing 1) Use percpu stats 2) update lastuse only every clock tick to avoid false sharing 3) use rcu to protect tcfm_dev 4) Remove spinlock usage, as it is no longer needed. Next step : add multi queue capability to ifb device Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: act_gact: remove spinlock in fast pathEric Dumazet2015-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Final step for gact RCU operation : 1) Use percpu stats 2) update lastuse only every clock tick to avoid false sharing 3) Remove spinlock acquisition, as it is no longer needed. Since this is the last contended lock in packet RX when tc gact is used, this gives impressive gain. My host with 8 RX queues was handling 5 Mpps before the patch, and more than 11 Mpps after patch. Tested: On receiver : dev=eth0 tc qdisc del dev $dev ingress 2>/dev/null tc qdisc add dev $dev ingress tc filter del dev $dev root pref 10 2>/dev/null tc filter del dev $dev pref 10 2>/dev/null tc filter add dev $dev est 1sec 4sec parent ffff: protocol ip prio 1 \ u32 match ip src 7.0.0.0/8 flowid 1:15 action drop Sender sends packets flood from 7/8 network Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: act_gact: use a separate packet counters for gact_determ()Eric Dumazet2015-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Second step for gact RCU operation : We want to get rid of the spinlock protecting gact operations. Stats (packets/bytes) will soon be per cpu. gact_determ() would not work without a central packet counter, so lets add it for this mode. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sched: add percpu stats to actionsEric Dumazet2015-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | | Reuse existing percpu infrastructure John Fastabend added for qdisc. This patch adds a new cpustats parameter to tcf_hash_create() and all actions pass false, meaning this patch should have no effect yet. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sched: extend percpu stats helpersEric Dumazet2015-07-08
|/ | | | | | | | | | | | | | | | | | qdisc_bstats_update_cpu() and other helpers were added to support percpu stats for qdisc. We want to add percpu stats for tc action, so this patch add common helpers. qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update() qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Kill sock->sk_protinfoDavid Miller2015-06-28
| | | | | | No more users, so it can now be removed. Signed-off-by: David S. Miller <davem@davemloft.net>
* ax25: Stop using sock->sk_protinfo.David Miller2015-06-28
| | | | | | Just make a ax25_sock structure that provides the ax25_cb pointer. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2015-06-24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Add TX fast path in mac80211, from Johannes Berg. 2) Add TSO/GRO support to ibmveth, from Thomas Falcon 3) Move away from cached routes in ipv6, just like ipv4, from Martin KaFai Lau. 4) Lots of new rhashtable tests, from Thomas Graf. 5) Run ingress qdisc lockless, from Alexei Starovoitov. 6) Allow servers to fetch TCP packet headers for SYN packets of new connections, for fingerprinting. From Eric Dumazet. 7) Add mode parameter to pktgen, for testing receive. From Alexei Starovoitov. 8) Cache access optimizations via simplifications of build_skb(), from Alexander Duyck. 9) Move page frag allocator under mm/, also from Alexander. 10) Add xmit_more support to hv_netvsc, from KY Srinivasan. 11) Add a counter guard in case we try to perform endless reclassify loops in the packet scheduler. 12) Extern flow dissector to be programmable and use it in new "Flower" classifier. From Jiri Pirko. 13) AF_PACKET fanout rollover fixes, performance improvements, and new statistics. From Willem de Bruijn. 14) Add netdev driver for GENEVE tunnels, from John W Linville. 15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso. 16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet. 17) Add an ECN retry fallback for the initial TCP handshake, from Daniel Borkmann. 18) Add tail call support to BPF, from Alexei Starovoitov. 19) Add several pktgen helper scripts, from Jesper Dangaard Brouer. 20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa. 21) Favor even port numbers for allocation to connect() requests, and odd port numbers for bind(0), in an effort to help avoid ip_local_port_range exhaustion. From Eric Dumazet. 22) Add Cavium ThunderX driver, from Sunil Goutham. 23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata, from Alexei Starovoitov. 24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai. 25) Double TCP Small Queues default to 256K to accomodate situations like the XEN driver and wireless aggregation. From Wei Liu. 26) Add more entropy inputs to flow dissector, from Tom Herbert. 27) Add CDG congestion control algorithm to TCP, from Kenneth Klette Jonassen. 28) Convert ipset over to RCU locking, from Jozsef Kadlecsik. 29) Track and act upon link status of ipv4 route nexthops, from Andy Gospodarek. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits) bridge: vlan: flush the dynamically learned entries on port vlan delete bridge: multicast: add a comment to br_port_state_selection about blocking state net: inet_diag: export IPV6_V6ONLY sockopt stmmac: troubleshoot unexpected bits in des0 & des1 net: ipv4 sysctl option to ignore routes when nexthop link is down net: track link-status of ipv4 nexthops net: switchdev: ignore unsupported bridge flags net: Cavium: Fix MAC address setting in shutdown state drivers: net: xgene: fix for ACPI support without ACPI ip: report the original address of ICMP messages net/mlx5e: Prefetch skb data on RX net/mlx5e: Pop cq outside mlx5e_get_cqe net/mlx5e: Remove mlx5e_cq.sqrq back-pointer net/mlx5e: Remove extra spaces net/mlx5e: Avoid TX CQE generation if more xmit packets expected net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device ...
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-06-24
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/mellanox/mlx4/main.c net/packet/af_packet.c Both conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| | * sctp: fix ASCONF list handlingMarcelo Ricardo Leitner2015-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->auto_asconf_splist is per namespace and mangled by functions like sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization. Also, the call to inet_sk_copy_descendant() was backuping ->auto_asconf_list through the copy but was not honoring ->do_auto_asconf, which could lead to list corruption if it was different between both sockets. This commit thus fixes the list handling by using ->addr_wq_lock spinlock to protect the list. A special handling is done upon socket creation and destruction for that. Error handlig on sctp_init_sock() will never return an error after having initialized asconf, so sctp_destroy_sock() can be called without addrq_wq_lock. The lock now will be take on sctp_close_sock(), before locking the socket, so we don't do it in inverse order compared to sctp_addr_wq_timeout_handler(). Instead of taking the lock on sctp_sock_migrate() for copying and restoring the list values, it's preferred to avoid rewritting it by implementing sctp_copy_descendant(). Issue was found with a test application that kept flipping sysctl default_auto_asconf on and off, but one could trigger it by issuing simultaneous setsockopt() calls on multiple sockets or by creating/destroying sockets fast enough. This is only triggerable locally. Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).") Reported-by: Ji Jianwen <jiji@redhat.com> Suggested-by: Neil Horman <nhorman@tuxdriver.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ipv4 sysctl option to ignore routes when nexthop link is downAndy Gospodarek2015-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This feature is only enabled with the new per-interface or ipv4 global sysctls called 'ignore_routes_with_linkdown'. net.ipv4.conf.all.ignore_routes_with_linkdown = 0 net.ipv4.conf.default.ignore_routes_with_linkdown = 0 net.ipv4.conf.lo.ignore_routes_with_linkdown = 0 ... When the above sysctls are set, will report to userspace that a route is dead and will no longer resolve to this nexthop when performing a fib lookup. This will signal to userspace that the route will not be selected. The signalling of a RTNH_F_DEAD is only passed to userspace if the sysctl is enabled and link is down. This was done as without it the netlink listeners would have no idea whether or not a nexthop would be selected. The kernel only sets RTNH_F_DEAD internally if the interface has IFF_UP cleared. With the new sysctl set, the following behavior can be observed (interface p8p1 is link-down): default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1 cache local 80.0.0.1 dev lo src 80.0.0.1 cache <local> 80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15 cache While the route does remain in the table (so it can be modified if needed rather than being wiped away as it would be if IFF_UP was cleared), the proper next-hop is chosen automatically when the link is down. Now interface p8p1 is linked-up: default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2 90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1 cache local 80.0.0.1 dev lo src 80.0.0.1 cache <local> 80.0.0.2 dev p8p1 src 80.0.0.1 cache and the output changes to what one would expect. If the sysctl is not set, the following output would be expected when p8p1 is down: default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 Since the dead flag does not appear, there should be no expectation that the kernel would skip using this route due to link being down. v2: Split kernel changes into 2 patches, this actually makes a behavioral change if the sysctl is set. Also took suggestion from Alex to simplify code by only checking sysctl during fib lookup and suggestion from Scott to add a per-interface sysctl. v3: Code clean-ups to make it more readable and efficient as well as a reverse path check fix. v4: Drop binary sysctl v5: Whitespace fixups from Dave v6: Style changes from Dave and checkpatch suggestions v7: One more checkpatch fixup Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: track link-status of ipv4 nexthopsAndy Gospodarek2015-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are reachable via an interface where carrier is off. No action is taken, but additional flags are passed to userspace to indicate carrier status. This also includes a cleanup to fib_disable_ip to more clearly indicate what event made the function call to replace the more cryptic force option previously used. v2: Split out kernel functionality into 2 patches, this patch simply sets and clears new nexthop flag RTNH_F_LINKDOWN. v3: Cleanups suggested by Alex as well as a bug noticed in fib_sync_down_dev and fib_sync_up when multipath was not enabled. v5: Whitespace and variable declaration fixups suggested by Dave. v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them all. Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | switchdev: rename vlan vid_start to vid_beginScott Feldman2015-06-23
| | | | | | | | | | | | | | | | | | | | | Use vid_begin/end to be consistent with BRIDGE_VLAN_INFO_RANGE_BEGIN/END. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nf_qeueue: Drop queue entries on nf_unregister_hookEric W. Biederman2015-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add code to nf_unregister_hook to flush the nf_queue when a hook is unregistered. This guarantees that the pointer that the nf_queue code retains into the nf_hook list will remain valid while a packet is queued. I tested what would happen if we do not flush queued packets and was trivially able to obtain the oops below. All that was required was to stop the nf_queue listening process, to delete all of the nf_tables, and to awaken the nf_queue listening process. > BUG: unable to handle kernel paging request at 0000000100000001 > IP: [<0000000100000001>] 0x100000001 > PGD b9c35067 PUD 0 > Oops: 0010 [#1] SMP > Modules linked in: > CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted > task: ffff8800b9c8c050 ti: ffff8800ba9d8000 task.ti: ffff8800ba9d8000 > RIP: 0010:[<0000000100000001>] [<0000000100000001>] 0x100000001 > RSP: 0018:ffff8800ba9dba40 EFLAGS: 00010a16 > RAX: ffff8800bab48a00 RBX: ffff8800ba9dba90 RCX: ffff8800ba9dba90 > RDX: ffff8800b9c10128 RSI: ffff8800ba940900 RDI: ffff8800bab48a00 > RBP: ffff8800b9c10128 R08: ffffffff82976660 R09: ffff8800ba9dbb28 > R10: dead000000100100 R11: dead000000200200 R12: ffff8800ba940900 > R13: ffffffff8313fd50 R14: ffff8800b9c95200 R15: 0000000000000000 > FS: 00007fb91fc34700(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000100000001 CR3: 00000000babfb000 CR4: 00000000000007f0 > Stack: > ffffffff8206ab0f ffffffff82982240 ffff8800bab48a00 ffff8800b9c100a8 > ffff8800b9c10100 0000000000000001 ffff8800ba940900 ffff8800b9c10128 > ffffffff8206bd65 ffff8800bfb0d5e0 ffff8800bab48a00 0000000000014dc0 > Call Trace: > [<ffffffff8206ab0f>] ? nf_iterate+0x4f/0xa0 > [<ffffffff8206bd65>] ? nf_reinject+0x125/0x190 > [<ffffffff8206dee5>] ? nfqnl_recv_verdict+0x255/0x360 > [<ffffffff81386290>] ? nla_parse+0x80/0xf0 > [<ffffffff8206c42c>] ? nfnetlink_rcv_msg+0x13c/0x240 > [<ffffffff811b2fec>] ? __memcg_kmem_get_cache+0x4c/0x150 > [<ffffffff8206c2f0>] ? nfnl_lock+0x20/0x20 > [<ffffffff82068159>] ? netlink_rcv_skb+0xa9/0xc0 > [<ffffffff820677bf>] ? netlink_unicast+0x12f/0x1c0 > [<ffffffff82067ade>] ? netlink_sendmsg+0x28e/0x650 > [<ffffffff81fdd814>] ? sock_sendmsg+0x44/0x50 > [<ffffffff81fde07b>] ? ___sys_sendmsg+0x2ab/0x2c0 > [<ffffffff810e8f73>] ? __wake_up+0x43/0x70 > [<ffffffff8141a134>] ? tty_write+0x1c4/0x2a0 > [<ffffffff81fde9f4>] ? __sys_sendmsg+0x44/0x80 > [<ffffffff823ff8d7>] ? system_call_fastpath+0x12/0x6a > Code: Bad RIP value. > RIP [<0000000100000001>] 0x100000001 > RSP <ffff8800ba9dba40> > CR2: 0000000100000001 > ---[ end trace 08eb65d42362793f ]--- Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'for-upstream' of ↵David S. Miller2015-06-23
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-06-18 Here's the final bluetooth-next pull request for 4.2. - Cleanups & fixes to 802.15.4 code and related drivers - Fix btusb driver memory leak - New USB IDs for Atheros controllers - Support for BCM4324B3 UART based Broadcom controller - Fix for Bluetooth encryption key size handling - Broadcom controller initialization fixes - Support for Intel controller DDC parameters - Support for multiple Bluetooth LE advertising instances - Fix for HCI user channel cleanup path Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | Bluetooth: hci_core: increase max adv instFlorian Grandel2015-06-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that all preconditions are present for actual multi-advertising, the number of allowed advertising instances can be larger than one. This patch increases the number of allowed advertising instances to 5. Signed-off-by: Florian Grandel <fgrandel@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | Bluetooth: hci_core: remove obsolete adv_instanceFlorian Grandel2015-06-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the obsolete adv_instance is no longer being referenced anywhere in the code it can be removed without breaking the build. Signed-off-by: Florian Grandel <fgrandel@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | Bluetooth: mgmt/hci_core: multi-adv for add_advertising*()Florian Grandel2015-06-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The add_advertising() and add_advertising_complete() functions reference the now obsolete hdev->adv_instance struct. Both methods are being refactored to access the dynamic advertising instance list instead. This patch also introduces all logic necessary to actually deal with multiple instance advertising. Notably the mgmt_adv_inst_expired() and schedule_adv_inst() method are being referenced to schedule instances in a round robin fashion. This patch also introduces a "pending" flag into the adv_info struct. This is necessary to identify and remove recently added advertising instances when the HCI commands return with an error status code. Otherwise new advertising instances could be leaked without properly informing userspace about their existence. Signed-off-by: Florian Grandel <fgrandel@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | Bluetooth: hci_core/mgmt: move adv timeout to hdevFlorian Grandel2015-06-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the delayed work managing advertising duration and timeout is part of the advertising instance structure. This is not correct as only a single instance can be advertised at any given time. To implement round robin advertising a single delayed work structure is needed. To fix this the delayed work structure is being moved to the hci_dev structure. The instance specific variable is renamed to "remaining_time" to make it clear that this is the remaining lifetime of the instance and not the current advertising timeout. Signed-off-by: Florian Grandel <fgrandel@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | Bluetooth: hci_core/mgmt: Introduce multi-adv listFlorian Grandel2015-06-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current hci dev structure only supports a single advertising instance. To support multi-instance advertising it is necessary to introduce a linked list of advertising instances so that multiple advertising instances can be dynamically added and/or removed. In a first step, the existing adv_instance member of the hci_dev struct is supplemented by a linked list of advertising instances. This patch introduces the list and supporting list management infrastructure. The list is not being used yet. Signed-off-by: Florian Grandel <fgrandel@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | mac802154: fix flags BIT definitions orderAlexander Aring2015-06-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes commits ("mac802154: cleanup ieee802154 hardware flags") bcbfd2078d9b11277d9c9ce0c30ba73c750503c9 and ("mac802154: cleanup address filtering flags") 6b70a43c7e0202cf285c864bc9f20f607c42e432 by starting the flags definitions at BIT(0) which is same like the previous behaviour. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | mac802154: cleanup llsec param flagsVarka Bhadram2015-06-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the setting of llsec param flag with the BIT macro. Signed-off-by: Varka Bhadram <varkab@cdac.in> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | Bluetooth: Read encryption key size for BR/EDR connectionsJohan Hedberg2015-06-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since Bluetooth 3.0 there's a HCI command available for reading the encryption key size of an BR/EDR connection. This information is essential e.g. for generating an LTK using SMP over BR/EDR, so store it as part of struct hci_conn. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | nl802154: fix misspelled enumChristoffer Holmstedt2015-06-10
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoffer Holmstedt <christoffer@christofferholmstedt.se> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | Bluetooth: Move SCO support under BT_BREDR config optionArron Wang2015-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SCO/eSCO link is supported by BR/EDR controller, it is suitable to move them under BT_BREDR config option Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | Bluetooth: Make l2cap_recv_acldata() and sco_recv_scodata() return voidArron Wang2015-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The return value of l2cap_recv_acldata() and sco_recv_scodata() are not used, then change it to return void Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | Bluetooth: Fix encryption key size handling for LTKsJohan Hedberg2015-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The encryption key size for LTKs is supposed to be applied only at the moment of encryption. When generating a Link Key (using LE SC) from the LTK the full non-shortened value should be used. This patch modifies the code to always keep the full value around and only apply the key size when passing the value to HCI. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | mac802154: change pan_coord type to boolAlexander Aring2015-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To indicate if it's a coordinator or not a bool is enough. There should no more values available which represent some other state. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Varka Bhadram <varkabhadram@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | mac802154: add missing structure commentsAlexander Aring2015-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch add missing comments to internal mac802154 structures. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Varka Bhadram <varkabhadram@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | mac802154: rearrange attribute in ieee802154_hwAlexander Aring2015-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the priv attribute in ieee802154_hw to the right section which is commented by attributes which needs to be filled by driver layer. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Varka Bhadram <varkabhadram@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | mac802154: remove unused hw_filt attributeAlexander Aring2015-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removed an attribute from ieee802154_hw structure which is never used inside kernel. Address information are stored inside wpan_dev nowadays. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Varka Bhadram <varkabhadram@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | mac802154: cleanup ieee802154 hardware flagsAlexander Aring2015-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the ieee802154 hardware flags to enums and setting the flag values with the BIT macro. Additional this patch changes the commenting style for matching usual kernel style. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Varka Bhadram <varkabhadram@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | mac802154: remove aack hw flagAlexander Aring2015-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the hardware auto acknowdledge flag which indicates that the transceiver supports this handling. This flag is never evaluated inside mac802154 and all transceivers should support this handling by default per hardware. Suggested-by: Lennert Buytenhek <buytenh@wantstofly.org> Cc: Alan Ott <alan@signal11.us> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Varka Bhadram <varkabhadram@gmail.com> Acked-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: Varka Bhadram <varkabhadram@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>