...making Linux just a little more fun!
By Jim Dennis, Jason Creighton, Chris G, Karl-Heinz, and... (meet the Gang) ... the Editors of Linux Gazette... and You!
From Andy Smith
Hi there,
Today I sent this email to the netfilter list, but I've had no
responses yet; can the Answer Gang get anywhere with it?
Since writing this email I have started graphing how many lines are
in /proc/net/ip_conntrack, and the value does not go above 200. The
maximum according to /proc/sys/net/ipv4/ip_conntrack_max is 32768, so
I don't think my connection tracking table is overflowing.
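(A minimal sketch of the kind of monitoring loop Andy describes, for anyone who wants to reproduce it; the log file location is our assumption:)

# Log the current conntrack entry count against the limit once a minute.
# A sketch only; the log file path is an assumption.
while sleep 60; do
    echo "$(date '+%F %T') $(wc -l < /proc/net/ip_conntrack) of $(cat /proc/sys/net/ipv4/ip_conntrack_max)"
done >> /var/log/conntrack-count.log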
Having said that, I haven't experienced the abrupt disconnection
again yet either. Perhaps the connections increase dramatically at
that time of day.
Andy
Tue, 14 Jun 2005 16:11:05 +0000
Question on the netfilter mailing list (netfilter@lists.netfilter.org)

Hi,
This is rather a long email, so I hope that someone who knows
about netfilter, bridging, and possibly Xen will have the patience to
read it all the way through.
I have a server that I run Xen
(http://www.cl.cam.ac.uk/Research/SRG/netos/xen) on, with six Xen
user domains (virtual machines).
For those unfamiliar with Xen, the dom0 (host machine) has a virtual
network interface for each user domain, and each of those virtual
interfaces is bridged onto xen-br0, along with the machine's real
eth0. In each user domain, the virtual interface appears as eth0.
In dom0 I have iptables running, with the bridge netfilter (br-nf)
support of Linux 2.6.11 and the physdev module loaded, so that I can
match traffic coming in to each of my user domains.
Part of my ruleset looks like this:

$IPT -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPT -A FORWARD -m physdev --physdev-out vif+ -j domu_forward_in
$IPT -A FORWARD -m physdev --physdev-in vif+ -j domu_forward_out

######################################################################
# strugglers.net
######################################################################
$IPT -A domu_forward_in -m physdev --physdev-out vif-struggler.0 -j domu_forward_in_strugglers
$IPT -A domu_forward_in_strugglers -p tcp --syn -j domu_forward_in_strugglers_tcp
$IPT -A domu_forward_in_strugglers_tcp -p tcp --dport 22 -j ACCEPT
$IPT -A domu_forward_in_strugglers -m limit --limit 1/s -j LOG --log-prefix "FWD DROP: "
$IPT -A domu_forward_in_strugglers -j DROP
Now, I have noticed that while this works most of the time, for
reasons unknown to me some TCP connections just seem to stop being
tracked and hit the DROP rule, even though they have been tracked
fine for several hours. This happens on every user domain, to all
kinds of TCP connections, but I have pared the ruleset down to just
the one domain (strugglers.net) and SSH to demonstrate.
If I add a rule in domu_forward_in_strugglers to allow all TCP to
port 22 regardless of state, then I have no problems.
This does not seem to affect the INPUT chain, where I have a similar
set of rules.
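(For comparison, a minimal sketch of what such an INPUT ruleset usually looks like; these exact rules are an assumption on our part, not taken from Andy's configuration:)

# Assumed sketch of a "similar set of rules" on INPUT; not Andy's actual
# config. Accept tracked traffic, accept new SSH, log and drop the rest.
$IPT -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPT -A INPUT -p tcp --syn --dport 22 -j ACCEPT
$IPT -A INPUT -m limit --limit 1/s -j LOG --log-prefix "IN DROP: "
$IPT -A INPUT -j DROP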
Today I decided to take a tcpdump while I was ssh'd in, up until the
moment it kicked me out. I ssh'd in at approx 13:07 GMT and got
kicked out at approx 15:32:49 GMT. Here is a selection of what got
logged on the console of dom0:
See attached andy.dom0-console-log.txt
At the same time I see a lot of TCP connections suddenly being
denied to a number of other user domains, so I suspect that all TCP
connection tracking was purged then for some reason.
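(A quick way to test that theory is to look for the connection in the tracking table while it is alive; the client address below is the one from the tcpdump filter further down:)

# Look for the SSH connection's entry in the tracking table;
# 82.44.131.131 is the client address from the tcpdump filter below.
grep 82.44.131.131 /proc/net/ip_conntrack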
Although I was kicked out, I was able to reconnect straight away (as
you would expect from the above ruleset, it allows the SYN to port 22
and away we go), and in fact that is how I am typing this email to
you now.
Here is the bridge setup:

[andy@curacao src]$ brctl show
bridge name     bridge id               STP enabled     interfaces
xen-br0         8000.00e081641d07       no              eth0
                                                        vif-admin.0
                                                        vif-cholet.0
                                                        vif-outpostlo.0
                                                        vif-ruminant.0
                                                        vif-seinfeld.0
                                                        vif-struggler.0
[andy@curacao src]$ ip link
1: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:e0:81:64:1d:07 brd ff:ff:ff:ff:ff:ff
2: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:e0:81:64:1d:08 brd ff:ff:ff:ff:ff:ff
3: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
5: xen-br0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether 00:e0:81:64:1d:07 brd ff:ff:ff:ff:ff:ff
6: vif-admin.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
7: vif-cholet.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
8: vif-outpostlo.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
9: vif-ruminant.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
10: vif-seinfeld.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
11: vif-struggler.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
As I said above, I did a:

[andy@curacao src]$ sudo tcpdump -w /tmp/xen-br0.dump -i xen-br0 'host 212.13.198.70 and host 82.44.131.131'

just after ssh'ing in and left it running until just after my ssh client
gave up. That file (1.5MB) can be found here:

http://strugglers.net/~andy/tmp/xen-br0.dump
But I cannot see anything obviously wrong with it.
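(For anyone who wants to look for themselves, a capture like that can be read back with something along these lines; the filename is as linked above:)

# Replay the capture without DNS lookups, restricted to the SSH traffic.
tcpdump -n -r xen-br0.dump 'tcp port 22'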
Anyone have any ideas? I can give up on connection tracking for my
user domains, but it's troubling that it doesn't work. Is it an
issue with using a bridge?
Thanks,
Andy
Having said that, I haven't experienced the abrupt disconnection again yet either. Perhaps the connections increase dramatically at that time of day.
That is likely. Mind you, I have experienced the same symptoms even when my tracking table was filling up but not hitting the upper bound. There could be any number of reasons for this -- iptables is a good firewall, but if it has to deal with a very large number of connections simultaneously, I have seen it keel over and die -- or, at best, start dropping packets.
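(If the table ever were the bottleneck, the limit can at least be raised at runtime; a sketch, with the doubled value chosen arbitrarily:)

# Raise the connection tracking limit at runtime; 65536 is an
# arbitrary example, roughly double the 32768 reported above.
echo 65536 > /proc/sys/net/ipv4/ip_conntrack_max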
I suppose it could be the result of your bridge, but I doubt it. I can't offer any technical advice, Andy, but if you can afford a means to disconnect your Xen connections, reconnect them one by one, and monitor/log the process, that might help.
-- Thomas Adam
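(One quick thing to rule out on a bridged setup like this is whether bridged traffic is being handed to iptables at all; on a 2.6 kernel with bridge netfilter support, the relevant knob can be checked like so:)

# Should print 1 if bridged packets traverse the iptables FORWARD chain.
cat /proc/sys/net/bridge/bridge-nf-call-iptables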
This had been a stumper, so we were going to present it in Help Wanted. However, Andy reports:
Regarding netfilter: no, it was revealed to be a bug in 2.6.11 involving TCP SACK and connection tracking. I have the URL for the email thread archive if you want:
https://lists.netfilter.org/pipermail/netfilter/2005-June/061101.html
Turning off SACK support has worked around the problem, so presumably upgrading the kernel would, too.
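(Andy doesn't say exactly how he turned SACK off, but the standard knob for it is:)

# Disable TCP selective acknowledgements until the next reboot.
echo 0 > /proc/sys/net/ipv4/tcp_sack
# ...or, equivalently:
sysctl -w net.ipv4.tcp_sack=0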
[Hugo, one of his fellow LUG members] Have you tried 2.6.12.2 yet?
[Andy] Not on that machine. I don't want to reboot it unless I have to.