This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Tests for res_init

From: "H.J. Lu" <hjl dot tools at gmail dot com>
To: Florian Weimer <fweimer at redhat dot com>
Cc: Siddhesh Poyarekar <siddhesh at gotplt dot org>, GNU C Library <libc-alpha at sourceware dot org>
Date: Sat, 3 Jun 2017 22:10:05 -0700
Subject: Re: [PATCH] Tests for res_init

On Sat, Jun 3, 2017 at 6:40 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sat, Jun 3, 2017 at 5:56 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Sat, Jun 3, 2017 at 8:00 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Jun 2, 2017 at 11:43 PM, Florian Weimer <fweimer@redhat.com> wrote:
>>>> On 06/03/2017 04:06 AM, H.J. Lu wrote:
>>>>> On Fri, Jun 2, 2017 at 1:43 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>> On Fri, Jun 2, 2017 at 1:40 PM, Florian Weimer <fweimer@redhat.com> wrote:
>>>>>>> On 06/02/2017 10:19 PM, H.J. Lu wrote:
>>>>>>>> On Fri, Jun 2, 2017 at 6:49 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>>>>>>>> On 05/18/2017 10:05 PM, Siddhesh Poyarekar wrote:
>>>>>>>>>> Why not just have a tst-resolv-res_init.c and
>>>>>>>>>> tst-resolv-res_init-thread.c?  That's how a lot of the other similar
>>>>>>>>>> kinds of tests are rewritten.  I don't have a very strong opinion on
>>>>>>>>>> this though, you can choose the color of your shed :)
>>>>>>>>>
>>>>>>>>> I do it this way so that I can use #if instead of #ifdef, following the
>>>>>>>>> current guidelines to trigger -Wundef warnings on typos.
>>>>>>>>>
>>>>>>>>> I'm going to push this without the tests expecting incorrect results.
>>>>>>>>>
>>>>>>>>
>>>>>>>> On Fedora 25/x86-64, I got
>>>>>>>>
>>>>>>>> [hjl@gnu-6 build-x86_64-linux]$ cat resolv/tst-resolv-res_init.out
>>>>>>>> Timed out: killed the child process
>>>>>>>> [hjl@gnu-6 build-x86_64-linux]$ cat resolv/tst-resolv-res_init-thread.out
>>>>>>>> Timed out: killed the child process
>>>>>>>> [hjl@gnu-6 build-x86_64-linux]$
>>>>>>>
>>>>>>> I see that too, with 4.11.3-200.fc25.x86_64.  What's your kernel version?
>>>>>>>
>>>>>>> I don't see the delay with 4.10.17-100.fc24.x86_64.  There, the poll
>>>>>>> system calls return immediately.  I wonder if it's some sort of
>>>>>>> regression in network namespaces.
>>>>>>
>>>>>> Yes, I am also running 4.11.3-200.fc25.x86_64.
>>>>>>
>>>>>
>>>>> I booted 4.10.16-200.fc25.x86_64 and the test passed.  Any ideas?
>>>>
>>>> I'm going to try to come up with a self-contained reproducer, and then
>>>> approach the kernel people if it still looks like a kernel bug.
>>>>
>>>> Florian
>>>
>>> Kernel regression was introduced by
>>>
>>> commit 580bdf5650fff8f66468ce491f8308f1117b7074
>>> Merge: e60a426 a249708
>>> Author: David S. Miller <davem@davemloft.net>
>>> Date:   Tue Jan 17 15:19:37 2017 -0500
>>>
>>>     Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
>>>
>>> But I don't know exactly which commit caused it.
>>>
>>
>> It was caused by:
>>
>> commit 44bb765cf07ab6622e6fdf4bce546b43bd20faee
>> Author: Florian Fainelli <f.fainelli@gmail.com>
>> Date:   Tue Jan 10 12:32:36 2017 -0800
>>
>>     net: dsa: Implement ndo_get_phys_port_name()
>>
>>     Return the physical port number of a DSA created network device using
>>     ndo_get_phys_port_name().
>>
>>     Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>>     Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
>>     Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>>
>
> I was wrong.  It isn't the bad one.
>
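Florian's reason for keeping a separate wrapper file per variant is a glibc convention. Configuration macros are defined to 0 or 1 and tested with `#if`. With `-Wundef`, a misspelled macro name then produces a warning, whereas `#ifdef` would silently treat it as undefined. The sketch below illustrates the convention. `TEST_THREAD` and `test_is_threaded` are hypothetical names and are not necessarily what the glibc tests use.

```c
#include <assert.h>

/* Hypothetical wrapper setting: the threaded variant would instead
   define this to 1 before including the shared test body.  */
#define TEST_THREAD 0

/* static_assert is provided by <assert.h> in C11.  */
static_assert (TEST_THREAD == 0 || TEST_THREAD == 1,
               "TEST_THREAD must be 0 or 1");

/* Returns 1 if the threaded variant of the test body was selected.  */
static int
test_is_threaded (void)
{
  /* A typo such as "#if TEST_THRAED" still evaluates to 0, but
     -Wundef reports it.  "#ifdef TEST_THRAED" would be silently
     false with no diagnostic.  */
#if TEST_THREAD
  return 1;
#else
  return 0;
#endif
}
```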

This commit:

commit c0303efeab7391ec51c337e0ac5740860ad01fe7
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date:   Mon Jan 9 16:04:09 2017 +0100

    net: reduce cycles spend on ICMP replies that gets rate limited

    This patch split the global and per (inet)peer ICMP-reply limiter
    code, and moves the global limit check to earlier in the packet
    processing path.  Thus, avoid spending cycles on ICMP replies that
    gets limited/suppressed anyhow.

    The global ICMP rate limiter icmp_global_allow() is a good solution,
    it just happens too late in the process.  The kernel goes through the
    full route lookup (return path) for the ICMP message, before taking
    the rate limit decision of not sending the ICMP reply.

    Details: The kernels global rate limiter for ICMP messages got added
    in commit 4cdf507d5452 ("icmp: add a global rate limitation").  It is
    a token bucket limiter with a global lock.  It brilliantly avoids
    locking congestion by only updating when 20ms (HZ/50) were elapsed. It
    can then avoids taking lock when credit is exhausted (when under
    pressure) and time constraint for refill is not yet meet.

    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

is the culprit. The patch below, against kernel 4.11.3, reverts it and
fixes the test timeouts for me.


-- 
H.J.
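The quoted commit message describes the global ICMP limiter as a token bucket that refills only once at least 20 ms (HZ/50) have elapsed. It can also reject requests cheaply when the bucket is empty and no refill is due. The following is a minimal userspace model of that behavior, written to illustrate the description. It is not the kernel's `icmp_global_allow()`: time is in milliseconds rather than jiffies, there is no locking, and the rate (1000 messages/s) and burst (50) passed in the usage check are assumed example values, not values read from sysctls. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { REFILL_INTERVAL_MS = 20 };  /* Models HZ/50.  */

struct icmp_bucket
{
  uint32_t credit;        /* Tokens currently available.  */
  uint32_t stamp_ms;      /* Time of the last refill.  */
  uint32_t msgs_per_sec;  /* Refill rate.  */
  uint32_t burst;         /* Bucket capacity.  */
};

static uint32_t
min_u32 (uint32_t a, uint32_t b)
{
  return a < b ? a : b;
}

static void
bucket_init (struct icmp_bucket *b, uint32_t msgs_per_sec, uint32_t burst,
             uint32_t now_ms)
{
  assert (burst > 0);
  b->credit = burst;       /* Start with a full bucket.  */
  b->stamp_ms = now_ms;
  b->msgs_per_sec = msgs_per_sec;
  b->burst = burst;
}

/* Returns true if one more ICMP message may be sent at NOW_MS.  */
static bool
bucket_allow (struct icmp_bucket *b, uint32_t now_ms)
{
  /* Elapsed time since the last refill, capped at one second.  */
  uint32_t delta = min_u32 (now_ms - b->stamp_ms, 1000);
  uint32_t incr = 0;

  /* Cheap rejection: bucket empty and no refill due yet.  */
  if (b->credit == 0 && delta < REFILL_INTERVAL_MS)
    return false;

  /* Refill only after at least 20 ms have elapsed.  */
  if (delta >= REFILL_INTERVAL_MS)
    {
      incr = b->msgs_per_sec * delta / 1000;
      if (incr > 0)
        b->stamp_ms = now_ms;
    }

  uint32_t credit = min_u32 (b->credit + incr, b->burst);
  bool allowed = false;
  if (credit > 0)
    {
      credit--;            /* Consume one token for this message.  */
      allowed = true;
    }
  b->credit = credit;
  return allowed;
}
```

With the assumed values, a burst of 50 messages at t = 0 empties the bucket. Further requests are refused until 20 ms have passed, and at that point 20 more tokens become available.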

From 4f92e60679454c35dbbd2df844ab38184baa62e9 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sat, 3 Jun 2017 22:02:26 -0700
Subject: [PATCH] Revert "net: reduce cycles spend on ICMP replies that gets
 rate limited"

This reverts commit c0303efeab7391ec51c337e0ac5740860ad01fe7, which
caused resolv/tst-resolv-res_init and resolv/tst-resolv-res_init-thread
on the glibc master branch to time out.
---
 net/ipv4/icmp.c | 69 ++++++++++++++++++---------------------------------------
 net/ipv6/icmp.c | 49 +++++++++++++---------------------------
 2 files changed, 37 insertions(+), 81 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index fc310db..d58a1bc 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -280,33 +280,6 @@ bool icmp_global_allow(void)
 }
 EXPORT_SYMBOL(icmp_global_allow);
 
-static bool icmpv4_mask_allow(struct net *net, int type, int code)
-{
-	if (type > NR_ICMP_TYPES)
-		return true;
-
-	/* Don't limit PMTU discovery. */
-	if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED)
-		return true;
-
-	/* Limit if icmp type is enabled in ratemask. */
-	if (!((1 << type) & net->ipv4.sysctl_icmp_ratemask))
-		return true;
-
-	return false;
-}
-
-static bool icmpv4_global_allow(struct net *net, int type, int code)
-{
-	if (icmpv4_mask_allow(net, type, code))
-		return true;
-
-	if (icmp_global_allow())
-		return true;
-
-	return false;
-}
-
 /*
  *	Send an ICMP frame.
  */
@@ -315,22 +288,34 @@ static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
 			       struct flowi4 *fl4, int type, int code)
 {
 	struct dst_entry *dst = &rt->dst;
-	struct inet_peer *peer;
 	bool rc = true;
-	int vif;
 
-	if (icmpv4_mask_allow(net, type, code))
+	if (type > NR_ICMP_TYPES)
+		goto out;
+
+	/* Don't limit PMTU discovery. */
+	if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED)
 		goto out;
 
 	/* No rate limit on loopback */
 	if (dst->dev && (dst->dev->flags&IFF_LOOPBACK))
 		goto out;
 
-	vif = l3mdev_master_ifindex(dst->dev);
-	peer = inet_getpeer_v4(net->ipv4.peers, fl4->daddr, vif, 1);
-	rc = inet_peer_xrlim_allow(peer, net->ipv4.sysctl_icmp_ratelimit);
-	if (peer)
-		inet_putpeer(peer);
+	/* Limit if icmp type is enabled in ratemask. */
+	if (!((1 << type) & net->ipv4.sysctl_icmp_ratemask))
+		goto out;
+
+	rc = false;
+	if (icmp_global_allow()) {
+		int vif = l3mdev_master_ifindex(dst->dev);
+		struct inet_peer *peer;
+
+		peer = inet_getpeer_v4(net->ipv4.peers, fl4->daddr, vif, 1);
+		rc = inet_peer_xrlim_allow(peer,
+					   net->ipv4.sysctl_icmp_ratelimit);
+		if (peer)
+			inet_putpeer(peer);
+	}
 out:
 	return rc;
 }
@@ -409,8 +394,6 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	struct inet_sock *inet;
 	__be32 daddr, saddr;
 	u32 mark = IP4_REPLY_MARK(net, skb->mark);
-	int type = icmp_param->data.icmph.type;
-	int code = icmp_param->data.icmph.code;
 
 	if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb))
 		return;
@@ -418,10 +401,6 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	/* Needed by both icmp_global_allow and icmp_xmit_lock */
 	local_bh_disable();
 
-	/* global icmp_msgs_per_sec */
-	if (!icmpv4_global_allow(net, type, code))
-		goto out_bh_enable;
-
 	sk = icmp_xmit_lock(net);
 	if (!sk)
 		goto out_bh_enable;
@@ -455,7 +434,8 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	rt = ip_route_output_key(net, &fl4);
 	if (IS_ERR(rt))
 		goto out_unlock;
-	if (icmpv4_xrlim_allow(net, rt, &fl4, type, code))
+	if (icmpv4_xrlim_allow(net, rt, &fl4, icmp_param->data.icmph.type,
+			       icmp_param->data.icmph.code))
 		icmp_push_reply(icmp_param, &fl4, &ipc, &rt);
 	ip_rt_put(rt);
 out_unlock:
@@ -674,10 +654,6 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	/* Needed by both icmp_global_allow and icmp_xmit_lock */
 	local_bh_disable();
 
-	/* Check global sysctl_icmp_msgs_per_sec ratelimit */
-	if (!icmpv4_global_allow(net, type, code))
-		goto out_bh_enable;
-
 	sk = icmp_xmit_lock(net);
 	if (!sk)
 		goto out_bh_enable;
@@ -734,7 +710,6 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	if (IS_ERR(rt))
 		goto out_unlock;
 
-	/* peer icmp_ratelimit */
 	if (!icmpv4_xrlim_allow(net, rt, &fl4, type, code))
 		goto ende;
 
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 230b5aa..f480c85 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -166,30 +166,6 @@ static bool is_ineligible(const struct sk_buff *skb)
 	return false;
 }
 
-static bool icmpv6_mask_allow(int type)
-{
-	/* Informational messages are not limited. */
-	if (type & ICMPV6_INFOMSG_MASK)
-		return true;
-
-	/* Do not limit pmtu discovery, it would break it. */
-	if (type == ICMPV6_PKT_TOOBIG)
-		return true;
-
-	return false;
-}
-
-static bool icmpv6_global_allow(int type)
-{
-	if (icmpv6_mask_allow(type))
-		return true;
-
-	if (icmp_global_allow())
-		return true;
-
-	return false;
-}
-
 /*
  * Check the ICMP output rate limit
  */
@@ -200,7 +176,12 @@ static bool icmpv6_xrlim_allow(struct sock *sk, u8 type,
 	struct dst_entry *dst;
 	bool res = false;
 
-	if (icmpv6_mask_allow(type))
+	/* Informational messages are not limited. */
+	if (type & ICMPV6_INFOMSG_MASK)
+		return true;
+
+	/* Do not limit pmtu discovery, it would break it. */
+	if (type == ICMPV6_PKT_TOOBIG)
 		return true;
 
 	/*
@@ -217,16 +198,20 @@ static bool icmpv6_xrlim_allow(struct sock *sk, u8 type,
 	} else {
 		struct rt6_info *rt = (struct rt6_info *)dst;
 		int tmo = net->ipv6.sysctl.icmpv6_time;
-		struct inet_peer *peer;
 
 		/* Give more bandwidth to wider prefixes. */
 		if (rt->rt6i_dst.plen < 128)
 			tmo >>= ((128 - rt->rt6i_dst.plen)>>5);
 
-		peer = inet_getpeer_v6(net->ipv6.peers, &fl6->daddr, 1);
-		res = inet_peer_xrlim_allow(peer, tmo);
-		if (peer)
-			inet_putpeer(peer);
+		if (icmp_global_allow()) {
+			struct inet_peer *peer;
+
+			peer = inet_getpeer_v6(net->ipv6.peers,
+					       &fl6->daddr, 1);
+			res = inet_peer_xrlim_allow(peer, tmo);
+			if (peer)
+				inet_putpeer(peer);
+		}
 	}
 	dst_release(dst);
 	return res;
@@ -490,10 +475,6 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 	/* Needed by both icmp_global_allow and icmpv6_xmit_lock */
 	local_bh_disable();
 
-	/* Check global sysctl_icmp_msgs_per_sec ratelimit */
-	if (!icmpv6_global_allow(type))
-		goto out_bh_enable;
-
 	mip6_addr_swap(skb);
 
 	memset(&fl6, 0, sizeof(fl6));
-- 
2.9.4
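The diff shows what changes in the IPv4 ordering. After commit c0303efeab73, `icmp_reply()` and `icmp_send()` call `icmpv4_global_allow()` before the route lookup. The "No rate limit on loopback" test in `icmpv4_xrlim_allow()` therefore runs only after the global bucket has already been charged. The revert restores the older order, in which loopback traffic skips `icmp_global_allow()` entirely.

The following is a deliberately simplified, hypothetical model of the two orderings. It assumes the ICMP type is one covered by the ratemask, and it omits the per-peer limiter. The thread does not establish whether this is the exact mechanism behind the glibc timeouts. It would fit, though, if the tests depend on ICMP errors over loopback arriving promptly, as suggested by the poll calls returning immediately on unaffected kernels.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global ICMP bucket.  */
struct model
{
  int tokens;
};

/* Stand-in for icmp_global_allow(): consume a token if one is left.  */
static bool
global_allow (struct model *m)
{
  assert (m->tokens >= 0);
  if (m->tokens > 0)
    {
      m->tokens--;
      return true;
    }
  return false;
}

/* Ordering after c0303efeab73: the global bucket is consulted first,
   so loopback replies are dropped once it is empty.  */
static bool
allow_after_c0303 (struct model *m, bool loopback)
{
  if (!global_allow (m))
    return false;
  /* The loopback exemption could only skip the per-peer limiter,
     which this model leaves out.  */
  (void) loopback;
  return true;
}

/* Ordering restored by the revert: loopback is exempted before the
   global bucket is touched.  */
static bool
allow_reverted (struct model *m, bool loopback)
{
  if (loopback)
    return true;
  return global_allow (m);
}
```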
