Friday, June 27, 2008

House Subcommittee on Courts, the Internet and Intellectual Property passed the Performance Rights Act

The fight between the RIAA and the NAB is heating up.  The RIAA scored one when the House Subcommittee on Courts, the Internet and Intellectual Property passed the Performance Rights Act. I have mixed feelings about this.  I don't like the fact that net radio has to pay high royalties while over the air radio doesn't.  On the other hand, I don't want to see AM/FM broadcasters forced to pay the same ridiculous rates that we have to pay. 
I really think this is going to backfire on the RIAA and its major label members.


Friday, June 20, 2008

We rolled out iPhone streaming today!

After a lot of testing, we rolled out iPhone streaming tonight. I'm still not completely happy with the look of our iPhone mini-site, so you might see some changes in the near future, but rather than wait until everything was perfect, I decided to release it now.

So now when you go to somafm.com on your iPhone, you get an iPhone-specific site with links for both EDGE (32-56k) and WiFi (128k) streams.
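One common way to route phones to a device-specific site is to look at the browser's User-Agent header. Here's a minimal Python sketch of that idea. The function name and paths are hypothetical and only for illustration; this is not a description of how somafm.com actually does it.

```python
# Hypothetical sketch: choose a site variant from the User-Agent header.
# The paths below are made up for illustration.

IPHONE_SITE = "/iphone/"
DEFAULT_SITE = "/"


def pick_site(user_agent: str) -> str:
    """Return the path to serve for a given User-Agent string."""
    ua = (user_agent or "").lower()
    # Mobile Safari on the iPhone includes the token "iPhone".
    if "iphone" in ua:
        return IPHONE_SITE
    return DEFAULT_SITE
```

A real site would probably also give people a link back to the full site, since User-Agent sniffing can guess wrong.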


Thursday, June 19, 2008

Infrastructure Upgrades

We've been improving our streaming and web infrastructure for the last couple of weeks. Not everything has launched yet, because we want to fully test it for a couple more weeks first (for example, the web site is still running on the old server). We're also installing a backup web server on the East Coast at the facilities of Steadyhost (where we have some streaming servers now). We've been happy with the service provided by Steadyhost; they also provide hosting for some other large internet radio stations such as DI.FM.


Friday, June 6, 2008

Continued problems with our hosting provider; email and web move to our San Francisco datacenter under way

Regrettably, we're still running services from ThePlanet.com's web hosting facilities until we can migrate everything to 365 Main in San Francisco.

So this morning, I wake up to find that our mail server is unreachable again, and this series of messages on The Planet's service update site:

  • June 6 – 10:00am CDT - We have lost network connectivity to H1. We are confirming the extent of any power loss, and we will be updating shortly.
  • June 6 – 10:05am CDT - Transport for H1 temporarily fell offline and is restored. H1 Phase 2 did not lose power. H1 Phase 1 lost power. We will be updating again shortly.
  • June 6 – 10:10am CDT - The temporary generator powering Phase 1 failed. We switched over to the backup generators that were just brought in. The CRAC units have been powered on, and PDUs are having power restored right now. [This is the second temporary generator that has failed in the last week at The Planet. Perhaps it is operator error? - Rusty]
  • June 6 – 10:15am CDT - We continue to power PDUs in Phase 1. We will update when all PDUs have been restored.
  • June 6 – 10:20am CDT - Power has been restored completely to Phase 1. Our DC Ops team will be walking through the aisles to confirm all racks are online.
  • Customer Support Overview (June 6, 11:30am CDT): -Technical Support Phone: No Hold Time

From our monitoring, the service went down at 7:40 AM Pacific, or 9:40 AM CDT. They were a little slow to notice they lost communications with their data center!

Of course our mail server is still unreachable at 11:44 Pacific, or 1:44 PM CDT, more than 3 hours after they stated that power had been restored.
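For anyone who wants to check the time zone arithmetic, here's a small Python sketch using the times quoted above. The fixed UTC offsets are my assumption (daylight time was in effect in both zones in June 2008), and the variable names are just for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for June 2008 (daylight time in both zones).
PDT = timezone(timedelta(hours=-7), "PDT")
CDT = timezone(timedelta(hours=-5), "CDT")

went_down = datetime(2008, 6, 6, 7, 40, tzinfo=PDT)        # from our monitoring
power_restored = datetime(2008, 6, 6, 10, 20, tzinfo=CDT)  # The Planet's update
still_down = datetime(2008, 6, 6, 11, 44, tzinfo=PDT)      # time of writing


def hhmm(delta: timedelta) -> str:
    """Format a timedelta as hours and minutes, e.g. '3h24m'."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h{minutes % 60:02d}m"


print(went_down.astimezone(CDT).strftime("%I:%M %p %Z"))  # 09:40 AM CDT
print(hhmm(still_down - went_down))       # 4h04m since the outage began
print(hhmm(still_down - power_restored))  # 3h24m since power was "restored"
```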

What really annoys me is that they are stating on their site, "Technical Support Phone: No Hold Time". The reason for this is that they're sending all support calls to their sales people, who don't do much more than tell you they're going to escalate you to Level 2 support, but all those techs are busy and they'll need to call you back. I suspect they did that to reduce their 800 number call expenses, because they had hundreds of customers sitting on hold for 30-40 minutes all the time. They also get to make it look like their response time is much better than it really is.

After waiting 45 minutes for a callback that never came, I called in again. Finally I got them to connect me with a real tech support service, not just the person logging callbacks. I've been on hold with "real" technical support at The Planet for 10 minutes now, trying to get our mail server powered back on.

10 more minutes on hold, and the tech tells me, "Can you go online and submit a reboot request ticket, that will expedite things."

At this point I have no idea when our mail will be back.

I'll continue to move our services out of The Planet and to our own servers in San Francisco. Our DNS is already moved (although we don't have the redundant DNS location in place yet), and the hardware for the new mail server is set up, but the mail services aren't configured yet. There are also a few issues with some of the web services we run: the old systems at The Planet used a much older version of the Berkeley DB software package, which isn't compatible with the current versions. So I have to make a few changes to our "now playing" code as well as our stream server monitoring systems. The "now playing" database is the most important to our listeners, because that's got all the information on which album songs come from, as well as the info on where to buy the track or get more info on the artist.

The mail server is a bit harder to migrate, but I'm working on that right now as well.

Hopefully, the good thing that will eventually come out of this is that we'll have redundant servers in different geographic locations.


Tuesday, June 3, 2008

Tough Weekend Outage

The company that hosts the webserver for SomaFM.com and the mail server, ThePlanet.com, had a rather large outage last weekend, which took the SomaFM web site off the air (so to speak) from 3:08 PM Pacific time on Saturday (May 31st) until about 3:37 AM Pacific time Monday morning (June 2nd).

Our mail server is still down, about 72 hours later. More on that in a bit.

The cause of this outage was not immediately known, and calls to The Planet's tech support lines (which had 30 minute waits) were "unrewarding" to say the least. At first they wouldn't give me any information at all (because I didn't have the proper password), and they were only giving out information to "affected customers". I pointed out that since they had caller ID and knew that I was calling from the phone number on record for our account, that should be adequate proof to allow them to give me some information on what was happening. The rep finally agreed, even though he said he "could get in trouble for telling me this".

What he told me was that they had had a transformer explosion at the datacenter where our servers were located.

This seemed kind of fishy. Didn't they have adequate generator power? What about the UPSes? Blown transformers happen fairly frequently; that's one reason you have redundant power systems.

A while later, they made a public announcement about the outage at the Planet's Houston data center:

Today at approximately 5:45 p.m. [central time], a transformer in our H1 data center in Houston caught fire, thus requiring us to take down all generators as instructed by the fire department. All servers are down until power can be restored.

According to our monitoring logs, it was 5:07 PM central time, not 5:45 PM.

We received more information dated May 31 – 10:46pm (8:46 pm Pacific):

On Saturday, May 31st at 4:55pm CDT in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room. Thankfully, no one was injured. In addition, no customer servers were damaged or lost.

We have just been allowed into the building to physically inspect the damage. Early indications are that the short was in a high-volume wire conduit. We were not allowed to activate our backup generator plan based on instructions from the fire department.

This time makes more sense. Seems like the UPSes did indeed work, but they weren't able to switch over to generator power. So about 10 minutes after they lost power, the UPS batteries were expended, and the facility lost power.

This is also the first time they mention "the short". At first it was just a transformer fire. But now it sounds like it was a transformer explosion caused by an electrical short, which implies that some wires were so overloaded that the insulation melted and caused them to short out.

There have been lots of discussions about the blame for the problems at The Planet. I'm not going to go into that now. However, I am less than satisfied with the quality of the communications from them, and not at all happy with how they've handled the situation.

The SomaFM.com web server eventually came back while we were just finishing up restoring our backups to a new web server. (So at least we now have a tested plan and sequence for restoring from backups!)

However, as of 10:30am on June 3rd, our mail server is still not running, nor did it come back up when The Planet said that they had powered back on the part of the datacenter where it is located. After sitting on hold (with very bad music) for 35 minutes, a tech told me that our mail server machine was one of the older ones that would have to be powered on by hand... and that there were over 1000 of these machines that they would be going around and turning on one at a time. But that never happened.

The last update on The Planet's web site was kind of ominous:

This morning at approximately 2:45 a.m. CST, the temporary generator supplying power to the servers and environmental control systems located in Phase 1 of our H1 facility shut down. This was caused by some faulty current sensors in the output breaker. The sensors detected an out of balance current condition that did not exist.

At this point, I don't know when the mail servers will be working again. I guess we have to deploy a new mail server (which is also the secondary DNS server).

Wait! Another update:

Fixing the faulty breaker on the generator powering H1 Phase 1 was not successful. we have located a second generator that is currently being delivered to the facility. It is expected to arrive this afternoon and we will provide additional information regarding the new generator at that time.

That doesn't sound promising. And for all I know, our server has been blown up by a power glitch or something. Time to get working on that new mailserver, I guess!

Unfortunately, I screwed up and didn't properly back up the mail server configs and will have to recreate all that by hand, so it's not a real simple process.

But I guess it won't take too long as I won't have any interruptions from email today!

But now we do have a full backup of the SomaFM web server up and running at our rack in 365 Main's San Francisco data center. And I'm working on getting further redundancy in place so this won't impact our listeners much if it happens again.
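As a rough illustration of the basic idea behind that kind of redundancy, here's a Python sketch that checks a list of servers in order and picks the first one that answers. The host names and function names are hypothetical, and this is just one simple pattern, not a description of our actual setup.

```python
import socket

# Hypothetical host names, for illustration only.
SERVERS = ["web-primary.example.com", "web-backup.example.com"]


def is_up(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def first_healthy(servers, check=is_up):
    """Return the first server that passes the check, or None."""
    for host in servers:
        if check(host):
            return host
    return None
```

Passing the check in as a parameter makes it easy to swap the plain TCP connect for something stricter, such as fetching a known page.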

You can follow the drama of The Planet on their Service Update web page.

And thanks for your patience with us.
