I’ve been really, really busy this week, both in and out of work, and have struggled to find time to write up all the interesting things we have been doing with Windows Azure.
This morning, while troubleshooting a broken SAN that keeps losing its iSCSI connection to the server, I realised just how resilient Windows Azure and SQL Azure are.
Here is a rundown of what we are running in the Cloud:
- 3x Production servers hosting two web roles and two worker roles
- 2x development servers hosting dev copies of the above production code
- 4x SQL Azure databases
All connected to:
- 200GB used of our 100TB storage allocation
- A worldwide CDN carrying 220GB/month of traffic from 18 different local data centres
The server instances in Windows Azure effectively run as an N+1 cluster in what Microsoft calls the ‘Azure Fabric’. Each machine in the group shares the load, and as machines fail or are taken down for maintenance, another machine is brought online on the fly to take over, up to our configured machine limit. SQL Azure doesn’t yet let end users set the level of redundancy; for now, Microsoft has decided that triple redundancy works best.
Clustering isn’t easy. Managing the hardware is just one part; having spares for the inevitable failures and managing patches and configuration changes is a full-time job, even for a clever person. Small companies and start-ups generally can’t afford this, which is why I think Windows Azure is such a valuable service: it lets start-ups focus on their key product rather than on infrastructure.
Add to that keeping our 100TB data allocation available 24/7, and seamlessly integrating it with the Microsoft worldwide CDN (the same one that powers Windows Update, Zune video and Bing Maps; source: ZDNet) to improve caching close to users, and you have a massive undertaking, even for a large multinational company.
My day-to-day involvement in running all this? Five minutes glancing at our usage reports and checking the Windows Azure service stats to make sure it’s all running smoothly.
As a developer who has worked in small companies where the DBA is also the SQL programmer and maybe even handles firewall security too, I find being able to hand off the hassle of keeping hardware running smoothly simply awesome.
Having Microsoft Azure as a platform gives me real confidence to build great services, because I know that if we got 100K extra users overnight, all I would have to do is turn the server count up a couple of notches and we could cope. What I don’t want to worry about is finance taking two weeks to order servers that arrive unconfigured while users struggle and quit the service because it’s slow.
So as busy as I am this week, one thing I don’t have to worry about is the stability and performance of the Azure platform.