...making Linux just a little more fun!
By Jim Dennis, Jason Creighton, Chris G, Karl-Heinz, and... (meet the Gang) ... the Editors of Linux Gazette... and You!
From Andy Smith
Hi there,
Today I sent this email to the netfilter list, but I've had no
responses yet; can the Answer Gang get anywhere with it?
Since writing this email I have started graphing how many lines are
in /proc/net/ip_conntrack, and the value does not go above 200. The
maximum according to /proc/sys/net/ipv4/ip_conntrack_max is 32768, so
I don't think my connection tracking table is overflowing.
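(A minimal sketch of the kind of monitoring loop Andy describes, for anyone who wants to reproduce it; the log file location is our assumption:)

# Log the current conntrack entry count against the limit once a minute.
# A sketch only; the log file path is an assumption.
while sleep 60; do
    echo "$(date '+%F %T') $(wc -l < /proc/net/ip_conntrack) of $(cat /proc/sys/net/ipv4/ip_conntrack_max)"
done >> /var/log/conntrack-count.log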
Having said that, I haven't experienced the abrupt disconnection
again yet either. Perhaps the connections increase dramatically at
that time of day.
Andy
Tue, 14 Jun 2005 16:11:05 +0000
Question on the netfilter mailing list (netfilter@lists.netfilter.org)

Hi,
This is rather a long email, so I hope that someone who knows
about netfilter, bridging, and possibly Xen will have the patience to
read it all the way through.
I have a server that I run Xen
(http://www.cl.cam.ac.uk/Research/SRG/netos/xen) on, with six Xen
user domains (virtual machines).
For those unfamiliar with Xen, the dom0 (host machine) has a virtual
network interface for each user domain, and each of those virtual
interfaces is bridged onto xen-br0, along with the machine's real
eth0. In each user domain, the virtual interface appears as eth0.
In dom0 I have iptables running, with the bridge netfilter (br-nf)
support of Linux 2.6.11 and the physdev module loaded, so that I can
match traffic coming in to each of my user domains.
Part of my ruleset looks like this:

$IPT -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPT -A FORWARD -m physdev --physdev-out vif+ -j domu_forward_in
$IPT -A FORWARD -m physdev --physdev-in vif+ -j domu_forward_out

######################################################################
# strugglers.net
######################################################################
$IPT -A domu_forward_in -m physdev --physdev-out vif-struggler.0 -j domu_forward_in_strugglers
$IPT -A domu_forward_in_strugglers -p tcp --syn -j domu_forward_in_strugglers_tcp
$IPT -A domu_forward_in_strugglers_tcp -p tcp --dport 22 -j ACCEPT
$IPT -A domu_forward_in_strugglers -m limit --limit 1/s -j LOG --log-prefix "FWD DROP: "
$IPT -A domu_forward_in_strugglers -j DROP
Now, I have noticed that while this works most of the time, for
reasons unknown to me some TCP connections just seem to stop being
tracked and hit the DROP rule, even though they have been tracked
fine for several hours. This happens on every user domain, to all
kinds of TCP connections, but I have pared the ruleset down to just
the one domain (strugglers.net) and SSH to demonstrate.
If I add a rule in domu_forward_in_strugglers to allow all TCP to
port 22 regardless of state, then I have no problems.
This does not seem to affect the INPUT chain, where I have a similar
set of rules.
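(For comparison, a minimal sketch of what such an INPUT ruleset usually looks like; these exact rules are an assumption on our part, not taken from Andy's configuration:)

# Assumed sketch of a "similar set of rules" on INPUT; not Andy's actual
# config. Accept tracked traffic, accept new SSH, log and drop the rest.
$IPT -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPT -A INPUT -p tcp --syn --dport 22 -j ACCEPT
$IPT -A INPUT -m limit --limit 1/s -j LOG --log-prefix "IN DROP: "
$IPT -A INPUT -j DROP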
Today I decided to take a tcpdump while I was ssh'd in, up until the
moment it kicked me out. I ssh'd in at approx 13:07 GMT and got
kicked out at approx 15:32:49 GMT. Here is a selection of what got
logged on the console of dom0:
See attached andy.dom0-console-log.txt
At the same time I see a lot of TCP connections suddenly being
denied to a number of other user domains, so I suspect that all TCP
connection tracking was purged then for some reason.
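(A quick way to test that theory is to look for the connection in the tracking table while it is alive; the client address below is the one from the tcpdump filter further down:)

# Look for the SSH connection's entry in the tracking table;
# 82.44.131.131 is the client address from the tcpdump filter below.
grep 82.44.131.131 /proc/net/ip_conntrack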
Although I was kicked out, I was able to reconnect straight away (as
you would expect from the above ruleset, it allows the SYN to port 22
and away we go), and in fact that is how I am typing this email to
you now.
Here is the bridge setup:

[andy@curacao src]$ brctl show
bridge name     bridge id               STP enabled     interfaces
xen-br0         8000.00e081641d07       no              eth0
                                                        vif-admin.0
                                                        vif-cholet.0
                                                        vif-outpostlo.0
                                                        vif-ruminant.0
                                                        vif-seinfeld.0
                                                        vif-struggler.0
[andy@curacao src]$ ip link
1: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:e0:81:64:1d:07 brd ff:ff:ff:ff:ff:ff
2: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:e0:81:64:1d:08 brd ff:ff:ff:ff:ff:ff
3: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
5: xen-br0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether 00:e0:81:64:1d:07 brd ff:ff:ff:ff:ff:ff
6: vif-admin.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
7: vif-cholet.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
8: vif-outpostlo.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
9: vif-ruminant.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
10: vif-seinfeld.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
11: vif-struggler.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
As I said above, I did a:

[andy@curacao src]$ sudo tcpdump -w /tmp/xen-br0.dump -i xen-br0 'host 212.13.198.70 and host 82.44.131.131'

just after ssh'ing in and left it running until just after my ssh client
gave up. That file (1.5MB) can be found here:

http://strugglers.net/~andy/tmp/xen-br0.dump
But I cannot see anything obviously wrong with it.
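(For anyone who wants to look for themselves, a capture like that can be read back with something along these lines; the filename is as linked above:)

# Replay the capture without DNS lookups, restricted to the SSH traffic.
tcpdump -n -r xen-br0.dump 'tcp port 22'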
Anyone have any ideas? I can give up on connection tracking for my
user domains, but it's troubling that it doesn't work. Is it an
issue with using a bridge?
Thanks,
Andy
Having said that, I haven't experienced the abrupt disconnection again yet either. Perhaps the connections increase dramatically at that time of day.
That is likely. Mind you, I have experienced the same symptoms even when my tracking table was filling up but not hitting the upper bound. There could be any number of reasons for this -- iptables is a good firewall, but if it has to deal with a very large number of connections simultaneously, I have seen it keel over and die -- or, at best, start dropping packets.
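(If the table ever were the bottleneck, the limit can at least be raised at runtime; a sketch, with the doubled value chosen arbitrarily:)

# Raise the connection tracking limit at runtime; 65536 is an
# arbitrary example, roughly double the 32768 reported above.
echo 65536 > /proc/sys/net/ipv4/ip_conntrack_max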
I suppose it could be the result of your bridge, but I doubt it. I can't offer any technical advice, Andy, but if you can afford a means to disconnect your Xen connections, reconnect them one by one, and monitor/log the process, that might help.
-- Thomas Adam
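(One quick thing to rule out on a bridged setup like this is whether bridged traffic is being handed to iptables at all; on a 2.6 kernel with bridge netfilter support, the relevant knob can be checked like so:)

# Should print 1 if bridged packets traverse the iptables FORWARD chain.
cat /proc/sys/net/bridge/bridge-nf-call-iptables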
This had been a stumper, so we were going to present it in Help Wanted. However, Andy reports:
Regarding netfilter: no, it was revealed to be a bug in 2.6.11 involving TCP SACK and connection tracking. I have the URL for the email thread archive if you want:
https://lists.netfilter.org/pipermail/netfilter/2005-June/061101.html
Turning off SACK support has worked around the problem, so presumably upgrading the kernel would, too.
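(Andy doesn't say exactly how he turned SACK off, but the standard knob for it is:)

# Disable TCP selective acknowledgements until the next reboot.
echo 0 > /proc/sys/net/ipv4/tcp_sack
# ...or, equivalently:
sysctl -w net.ipv4.tcp_sack=0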
[Hugo, one of his fellow LUG members] Have you tried 2.6.12.2 yet?
[Andy] Not on that machine. I don't want to reboot it unless I have to.