In yet another round of crazy stuff I find while working on client projects, I just came across a very strange iPhone app connectivity problem that ended up being related to a DNS issue on AT&T’s 3G network.
The whole thing started when my client couldn’t use the iPhone app I created while on AT&T’s 3G in New York. Initially I thought it was just a weird temporary network routing problem, and it did go away for a while. However, it came back several times in New York, and then it started happening in San Francisco and Portland too.
We messed around with the connection timeout settings and added a bunch of debugging code to see what was happening. To make things worse, the iPhone app was already live on the App Store, and some users were reporting the issue as well.
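The actual debugging code lived in the Objective-C app, but the core question it answered reduces to: is the request stalling during DNS resolution, or later during the connection? A rough Python sketch of that kind of timed, bounded DNS lookup (purely illustrative, not the real app code):

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

def timed_resolve(hostname, timeout=5.0):
    """Resolve a hostname in a worker thread so a hung DNS lookup
    can't block the caller past `timeout` seconds. Returns the list
    of IPv4 addresses (or None on timeout) and the elapsed time."""
    start = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(socket.getaddrinfo, hostname, 80, socket.AF_INET)
    try:
        results = future.result(timeout=timeout)
        ips = sorted({r[4][0] for r in results})
    except FuturesTimeout:
        ips = None
    pool.shutdown(wait=False)  # don't wait on a stuck lookup
    return ips, time.monotonic() - start

ips, elapsed = timed_resolve("localhost")
```

Logging the elapsed time for each lookup is what makes the intermittent failures visible: over a healthy network the resolution is near-instant, and on the broken 3G path it either times out or returns nothing.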
We were using Amazon EC2 for the backend service of the application, with an Amazon Elastic Load Balancer (ELB) in front of the backend web service servers to balance the load. We created a CNAME record that pointed api.domain.com to the specific ELB instance, which then routed the traffic to the EC2 instances. It worked great over WiFi, also great over EDGE, and worked pretty well over 3G most of the time, except when it didn’t.
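The thing to notice about that setup is the extra hop: a resolver has to follow api.domain.com to the ELB hostname, and then resolve the ELB hostname to the current IPs. A flaky mobile resolver can fail on that second step. A toy sketch of the chain-following a resolver performs (the hostnames and IP below are made up for illustration):

```python
# Illustrative zone data, not our real records.
RECORDS = {
    "api.domain.com":
        ("CNAME", "my-elb-123456.us-east-1.elb.amazonaws.com"),
    "my-elb-123456.us-east-1.elb.amazonaws.com":
        ("A", "203.0.113.10"),
}

def resolve(name, max_hops=5):
    """Follow CNAME records until an A record (an IP) is found."""
    for _ in range(max_hops):
        rtype, value = RECORDS[name]
        if rtype == "A":
            return value
        name = value  # CNAME: one more lookup the resolver must make
    raise RuntimeError("CNAME chain too long")

print(resolve("api.domain.com"))  # → 203.0.113.10
```

Every extra link in that chain is one more lookup that can time out or be mishandled by a carrier’s DNS infrastructure.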
Long story short, the issue was with using the CNAME record. An AWS Developer forum post described a problem identical to ours, so we tried switching the api.domain.com entry to an A record, and things magically started working again.
The obvious issue with that solution is that an A record needs a stable IP address, and ELB doesn’t give you one (the IPs behind its public hostname can change at any time), so we had to roll our own load balancer. Not the best situation, but at least everything is working properly again.
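“Rolling our own” here just means pointing the A record at machines we control and spreading requests across the backends ourselves. A toy version of the selection logic a simple proxy would use, round-robin over a fixed backend list (the IPs are made up):

```python
import itertools

# Illustrative backend IPs, not our actual instances.
BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

class RoundRobin:
    """Cycle through backends so requests are spread evenly."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)

rr = RoundRobin(BACKENDS)
picks = [rr.pick() for _ in range(4)]
# → ['10.0.0.11', '10.0.0.12', '10.0.0.13', '10.0.0.11']
```

In practice you would put this behind something like nginx or HAProxy rather than hand-rolling the proxy itself, but the A record now points at an IP you own, which sidesteps the CNAME resolution problem entirely.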