Fucking AT&T
For most of our time at Shaina’s house, the internet has seemed pretty good. The house is exceptionally close to the central office, so her average rate is around 18 Mbps on ADSL2+ (unfortunately still with the same shitty upstream that all ADSL customers are stuck with). For the last week or so, though, it’s been really intermittent, tearing down the entire connection (DSL logical link and all), sometimes multiple times in a 5-minute period.
Then it went dark completely.
Flip called AT&T, who sent out an “internal tech”. He basically looked at things in the house (all brand new as of about a year ago: a single twisted pair from a single RJ11 socket straight out to the demarcation point), decided it wasn’t his job, and ordered an outside tech. The outside tech supposedly showed up the next day (though we never saw him) and referred it back to internal, but neglected to actually place the order.
When Flip called AT&T again shortly after the window expired, he was informed that the problem was in the central office and they were working on it right then (at 7pm). The connection came back online, but was still spotty, so he called again yesterday, and they repeated the same mess. They’re supposed to come out this afternoon and fix it.
Last night during the worst of it, I inspected Shaina’s router (which has the annoying habit Mum’s Telstra router had of intercepting HTTP - and HTTPS, with bad certificates - connections when the internet breaks, and it seems you can’t disable that) and found some diagnostic information that looks pretty telling:
2017-01-06T15:34:09-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -5.4 0.0
2017-01-06T15:34:38-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -5.6 0.0
2017-01-06T15:36:51-05:00 1 17995 20583 20583 6.3 6.3 15.0 -51.2 0 0 2.50 8.00 1020 1200 6.9 6.9 9.3 12.0 0 0 1.50 4.00 G.DMT2+ Annex A Broadcom Success ERR_LOS_LIMIT N 0/0 -109.3 -5.2 0.0
2017-01-06T15:37:20-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -3.5 0.0
2017-01-06T15:49:48-05:00 1 17997 20091 20091 6.1 6.1 15.0 -51.2 0 0 2.00 7.00 1020 1193 6.8 6.8 9.3 12.1 0 0 1.50 4.00 G.DMT2+ Annex A Broadcom Success ERR_LOS_LIMIT N 0/0 -109.3 -3.5 0.0
2017-01-06T15:50:17-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -8.6 0.0
2017-01-06T17:48:45-05:00 1 17993 21051 21051 6.2 6.2 15.0 -51.2 0 0 3.00 8.00 1020 1200 7.0 7.0 9.2 12.1 0 0 1.50 4.00 G.DMT2+ Annex A Broadcom Success ERR_LOS_LIMIT N 0/0 -109.3 -8.1 0.0
2017-01-06T17:49:14-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -4.7 0.0
2017-01-06T17:49:43-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -5.2 0.0
2017-01-06T17:50:34-05:00 1 17995 20635 20635 6.4 6.4 14.5 -51.2 0 0 2.50 8.00 1020 1182 6.5 6.5 9.1 12.1 0 0 1.50 4.00 G.DMT2+ Annex A Broadcom Success ERR_LOS_LIMIT N 0/0 -109.0 -4.8 0.0
2017-01-06T17:51:03-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -4.3 -25.2
2017-01-06T17:53:00-05:00 1 17995 20538 20538 6.2 6.2 14.5 -51.2 0 0 2.50 8.00 1020 1183 6.2 6.2 9.1 12.1 0 0 1.50 4.00 G.DMT2+ Annex A Broadcom Success ERR_LOS_LIMIT N 0/0 -109.0 -4.4 0.0
2017-01-06T17:53:29-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -0.1 0.0
2017-01-06T17:54:18-05:00 1 17995 20705 20705 6.5 6.5 14.5 -51.2 0 0 2.50 8.00 1020 1197 6.9 6.9 9.1 12.1 0 0 1.50 4.00 G.DMT2+ Annex A Broadcom Success ERR_LOS_LIMIT N 0/0 -109.0 -2.0 0.0
2017-01-06T17:54:47-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 3.1 0.0
2017-01-06T17:55:48-05:00 1 17995 20731 20731 6.5 6.5 14.5 -51.2 0 0 2.50 8.00 1020 1182 6.5 6.5 9.1 12.1 0 0 1.50 4.00 G.DMT2+ Annex A Broadcom Success ERR_LOS_LIMIT N 0/0 -109.0 -2.1 0.0
2017-01-06T17:56:17-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -6.5 0.0
2017-01-06T20:09:11-05:00 1 17993 21125 21125 6.3 6.3 14.5 -51.2 0 0 3.00 8.00 1020 1186 6.6 6.6 9.0 12.1 0 0 1.50 4.00 G.DMT2+ Annex A Broadcom Success ERR_LOS_LIMIT N 0/0 -109.0 -8.7 0.0
2017-01-06T20:09:40-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -7.9 -25.4
2017-01-06T20:49:01-05:00 1 17995 20572 21121 6.3 7.1 14.5 18.4 20 7283 2.50 8.00 1021 1146 6.3 10.8 9.0 12.1 11 146 1.00 4.00 G.DMT2+ Annex A Broadcom Success N/A N 0/0 -108.9 -7.2 0.0
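A rough way to tally a log like this, sketched in Python. The router doesn’t label its columns, so this only assumes that each line starts with an ISO 8601 timestamp and that the `ERR_TRAINING_FAILURE` and `Success` tokens mean what they appear to mean; `summarize` is a made-up helper name, not anything the router provides.

```python
from datetime import datetime

# Three lines copied verbatim from the log above.
SAMPLE = [
    "2017-01-06T15:34:09-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -5.4 0.0",
    "2017-01-06T15:36:51-05:00 1 17995 20583 20583 6.3 6.3 15.0 -51.2 0 0 2.50 8.00 1020 1200 6.9 6.9 9.3 12.0 0 0 1.50 4.00 G.DMT2+ Annex A Broadcom Success ERR_LOS_LIMIT N 0/0 -109.3 -5.2 0.0",
    "2017-01-06T15:37:20-05:00 1 0 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 0 0 0.0 0.0 0.0 0.0 0 0 0.00 0.00 Unknown Broadcom Config Error ERR_TRAINING_FAILURE N 0/0 0.0 -3.5 0.0",
]


def summarize(lines):
    """Count failed and successful training attempts and the time span covered.

    Assumption: the first whitespace-separated field is the timestamp, and the
    status tokens appear as standalone fields somewhere later in the line.
    """
    failures = 0
    successes = 0
    stamps = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        stamps.append(datetime.fromisoformat(fields[0]))
        if "ERR_TRAINING_FAILURE" in fields:
            failures += 1
        elif "Success" in fields:
            successes += 1
    span = stamps[-1] - stamps[0] if stamps else None
    return failures, successes, span


failures, successes, span = summarize(SAMPLE)
print(f"{failures} failed trainings, {successes} successful, over {span}")
```

Feeding it the whole log instead of the three sample lines gives a quick count of how often the modem failed to train versus how often it managed to sync.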
Later (this morning) I found this:
Type                       Since    Current     Current      Time Since
Period                     Reset    24-hr int.  15-min int.  Last Event
Link Retrains              32       32          1            0:01:01
DSL Training Errors        43       43          2            0:01:30
Training Timeouts          43       43          2            0:01:30
Loss of Framing Failures   264      264         9            0:02:28
Loss of Signal Failures    31       31          1            0:02:26
Loss of Power Failures     0        0           0            0:00:00
Loss of Margin Failures    8        8           0            0:26:25
Cum. Seconds w/Errors      430      430         12           0:02:26
Cum. Sec. w/Severe Errors  343      343         11           0:02:26
Corrected Blocks           374488   374488      3845         0:00:11
Uncorrectable Blocks       15981    15981       476          0:02:29
DSL Unavailable Seconds    2609     2609        97           0:01:01
That’s 97 seconds of unavailability in a 15-minute interval (roughly 11%, if my math is correct), and about 43 minutes of outage in 24 hours (not counting the time it takes to get the PPPoE connection back up, if I’m not mistaken). This is utterly unacceptable, but sadly I don’t think there’s an SLA on home broadband connections.
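To double-check that math, here is the arithmetic as a few lines of Python. It assumes the 15-minute and 24-hour counters cover complete intervals, which the router page doesn’t actually confirm (the “current” intervals may only be partly elapsed), and `unavailable_share` is just a name made up for this sketch.

```python
def unavailable_share(unavailable_seconds, interval_seconds):
    """Fraction of an interval during which the DSL link was unavailable."""
    return unavailable_seconds / interval_seconds


# "DSL Unavailable Seconds" values from the table above.
quarter_hour = unavailable_share(97, 15 * 60)       # current 15-min interval
day = unavailable_share(2609, 24 * 60 * 60)         # current 24-hr interval

print(f"15-min interval: {quarter_hour:.1%}")
print(f"24-hr interval: {day:.1%} ({2609 / 60:.1f} minutes)")
```

Under those assumptions it works out to about 10.8% of the 15-minute interval and about 3.0% of the day (43.5 minutes). If the intervals were only partly elapsed, the real percentages would be higher.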
Update: The internal tech came out again and I managed to catch him. He started out with the questionable assertion that too many devices hanging off the gateway will cause it to reboot. I know that these particular units (Pace 5268AC) have issues with insufficient memory for the NAT state table, which causes grief with network sessions getting kicked out when you have about 50 busy devices on it. We’re nowhere near that level, and it’s the DSL that’s kicking off, not the gateway.
He then switched to saying that it’s the DSL line card at the other end that’ll reboot when it’s saturated, which I found even more ridiculous… I would expect the DSL link to throw away packets when it’s saturated, not to spontaneously reboot under 100% load. I’ve never had a DSL line do that before (I literally kept our ADSL in Sacramento saturated for weeks on end with no connection drops) and it would be utterly unacceptable.
They decided to replace the modem/gateway thing, under the justification that dealing with line errors for extended periods of time shortens the life of the modem… I don’t know enough about DSL stuff to dispute that, so we had him go ahead and do it - the modems are rented so it’s not like it costs Shaina anything.
It appears to have done the trick.