September 2012 Archives

asterisk-func_dns Update

I pushed some updates to my Asterisk func_dns module today.

These updates let func_dns build against (at least) Asterisk 1.8, since 1.4 is long since deprecated. I also bastardized the Makefile a little so that it automatically detects a /usr/lib64 Asterisk modules directory and installs there instead of the old hardcoded /usr/lib/asterisk/modules directory.

I wish Asterisk shipped a pkg-config file, but I did not see one in any of the files installed by the EPEL 6 Asterisk package.

Thank You Sprint!

Yesterday, my Sprint Nexus S 4G stopped sending or receiving calls or text messages. The phone is running stock Sprint Android ICS software.

This occurred after a round of manual application updates to apps like Facebook, Skype, iMO and others. My initial suspicion (after trying to reboot the phone) was that one of these apps and its new permissions might have installed some kind of hook that disrupted my calling abilities. That was probably a stretch, but it was easy enough to try uninstalling these apps.

That didn't work, so I pursued information on the net that suggested updating the phone's profile (and its PRL) and uninstalling Google Voice (though my copy was factory preloaded).

I thought that I might be experiencing a Sprint network problem (despite the fact that data service worked fine) so I decided to sleep on it and see if it was working in the morning.

No dice, so I did a factory reset. I was annoyed that I'd need to replace all of my apps and configuration after said reset, but not as annoyed as I was when I discovered that it still wasn't working.

Finally, I got to another phone and called Sprint support. At this point I wasn't expecting a carrier outage, as an outage lasting this long in the telecommunications industry can get the carrier into some regulatory hot water. To my surprise, I was told there was an outage in my area, and that Sprint had lost communication with the tower. I suppose that explains why the phone indicates full signal strength even though I can't make calls.

I've been assured that it will be fixed in the next two hours. Time to work on restoring my phone. :(

Super Duper Update!

It's now 5 hours after being told it would be fixed in two hours, and Sprint's connection to their tower is still not working!

Super Duper Super Duper Update!

It's now almost 11 hours since Sprint's first ETR (or at least the first one I was given) and the phone has been unusable in my area since Sunday afternoon (an outage of over 24 hours at this point). The new ETR I have been given is for 3:30 AM, 17 hours and 30 minutes after the first ETR.

Oh Sprint, you continue to be so amazing!

At work, our app is hosted on a pair of Internet connections from different upstream providers. We have incoming and outgoing SIP calls, incoming web traffic, incoming and outgoing e-mail, and incoming and outgoing web service calls.

We wanted to be able to load balance all of these functions with failover, and we have a philosophy of simultaneously utilizing resources from all of our available routers and connections. That allows us to avoid situations in which the failover system or failover circuit does not function as intended when it becomes the active master.

We use a variety of load-balancing proxies and techniques to allow these various services to function reliably.

What we're not using is BGP anycast -- mainly because doing so requires a Class C (a /24) of provider-neutral IP space and an AS number, and IP space is getting harder to come by. Instead, we utilize DNS-based load balancing and failover from DynECT for all of our inbound traffic.

Our network consists of Linux-based routers/load balancers running in parallel. To load balance our outbound Internet traffic, we send all of it through proxy servers that are looked up in our internal DNS. Here, we use a 5-second TTL for fast failover.

Finally, we have a load balancer probe which continually tests the uptime of our backend servers and our Internet circuits.

One time, we discovered a flaw with this system. We performed uptime monitoring of our outbound Internet circuits by pinging the default gateway. In many cases this check was sufficient, but if our Internet provider was experiencing loss of connectivity at a level beyond our next-hop, this strategy failed.

A tempting solution would be to ping something out on the Internet, but that means tying our reliability to the uptime of what we are pinging. We also weren't too sure about constantly pinging someone we hadn't already made arrangements with.

It turns out there is a better way. DynECT is constantly requesting web pages from each of our external load balancers, in order to determine whether or not to publish that IP address for our domains. We realized that we could monitor the frequency of these requests, and if we were not receiving them on a particular load balancer, that server could arrange for its own internal IP to stop being served for our internal proxy server DNS A record.

Of course, using this approach meant that if we stopped getting requests from DynECT due to a problem on their end, we could generate an outage where we were trying to prevent one. In order to build in some redundancy, we upgraded our Pingdom account and created a check for each server/Internet circuit. Now, if we're not hearing from either DynECT or Pingdom on a given circuit, we consider that circuit offline.
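The decision rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code we run: the `CircuitMonitor` class, the source names, and the timeout value are all hypothetical, and how an incoming request gets attributed to DynECT or Pingdom is left out entirely.

```python
import time


class CircuitMonitor:
    """Track when each external monitor last reached us over one circuit.

    Hypothetical sketch: the class name, source names and timeout are
    illustrative assumptions, not part of any real product or API.
    """

    def __init__(self, sources, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        # Treat start-up as a sighting so a freshly started probe does not
        # immediately declare the circuit dead.
        now = clock()
        self.last_seen = {source: now for source in sources}

    def record_request(self, source):
        """Call whenever a health-check request from a known source arrives."""
        if source in self.last_seen:
            self.last_seen[source] = self.clock()

    def is_online(self):
        """Online while at least one monitor has been heard from recently."""
        now = self.clock()
        return any(now - seen <= self.timeout_seconds
                   for seen in self.last_seen.values())
```

A probe on each load balancer would create something like `CircuitMonitor(["dynect", "pingdom"], timeout_seconds=120)`, call `record_request()` from whatever handles the monitoring requests, and withdraw its internal IP from the proxy A record once `is_online()` returns false. Requiring silence from every source before failing the circuit is what keeps a problem at a single monitoring provider from causing an outage.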

Since implementing this solution, we have experienced conditions in which the older "I can ping my default route" check did not trip even though our Internet circuit was offline. Our new WAN monitoring solution reliably catches the problem and takes the affected circuit out of service correctly.

Many Linux/BSD users are now hosting their dotfiles in git repositories. This scheme allows you to quickly deploy your favorite system configuration to a new server on which you've been given an account, letting you get bash, vim, screen or whatever utilities you use most working exactly as you prefer them with a minimal amount of fuss.

I started following this approach and have been doing so successfully for months.

In order to make deployment as easy as possible, I wrote a simple "apply" script in bash which would symlink desired configuration files into place, and automatically add an include line to the local server's bashrc which would include my global bashrc, so that my settings would mix in gracefully with operating system defaults.
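For anyone who wants the general shape of such a script, here is a minimal sketch. It is written in Python rather than bash and is not the actual script from my repository: the `apply` function, its parameters, and the file names in the usage note are hypothetical, and it assumes each repository file `name` should be linked as `~/.name`.

```python
from pathlib import Path


def apply(repo_dir, home_dir, dotfiles, include_name="bashrc"):
    """Symlink dotfiles into home_dir and source the shared bashrc.

    Hypothetical sketch: names and layout are assumptions for illustration.
    """
    repo = Path(repo_dir).resolve()
    home = Path(home_dir)

    for name in dotfiles:
        link = home / ("." + name)
        if link.is_symlink():
            link.unlink()      # replace a stale link from an earlier run
        elif link.exists():
            continue           # never clobber a real file
        link.symlink_to(repo / name)

    # Append the include line only once, so the script is safe to re-run
    # and the operating system's own bashrc content is left in place.
    include_line = f'. "{repo / include_name}"'
    bashrc = home / ".bashrc"
    existing = bashrc.read_text() if bashrc.exists() else ""
    if include_line not in existing.splitlines():
        with bashrc.open("a") as handle:
            if existing and not existing.endswith("\n"):
                handle.write("\n")
            handle.write(include_line + "\n")
```

Calling `apply("shared-env", Path.home(), ["vimrc", "screenrc"])` would link those two files and add the include line. The shared bashrc is deliberately kept out of the symlink list, since it is meant to be sourced from the local bashrc and not to replace it.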

I published a subset of my shared-env repository on GitHub to help anyone who wants to save a little bit of time getting a skeleton in place.

My shared-env contains a few handy vim plugins (including localvimrc and my own makesd) and the really cool tab completion directory history script