Posted on

Windows Azure Virtual Machines, Not Ready For Prime Time

Just last month, Microsoft announced that their Windows Azure Virtual Machines are no longer considered a pre-release service. In other words, that was Microsoft’s official notice that they feel their Virtual Machines offering is ready for enterprise-class deployments. In fact, they even offer uptime guarantees if you use certain round-robin and/or load-balanced deployments that help mitigate downtime in your cloud environment.

Essentially, the Virtual Machines offering on Windows Azure is the equivalent of the virtual dedicated server you would get from most hosting companies. The only difference with the Windows Azure platform, as with most cloud-based offerings, is that you need to serve as your own system admin. This is not web hosting for business owners but for tech geeks. In other words, it works perfectly for guys like me.

Or so I thought.

Different Shades of White

As I learned tonight, there are differences between the various cloud offerings that are not easy to tease out of the hundreds of pages of online documentation touting how awesome each service provider’s cloud services are. Sure, there are the metrics. You can compare instance sizes in terms of disk space, CPU, and bandwidth. You can compare pricing and the relative costs of operating your server on each of the cloud platforms. You can even get background information on the company providing the virtualized environment, getting some clue (though never a clear picture) of where the servers are physically located, how many servers they have, how secure the environment is, and more.

At the end of the day they all look very similar. Sure, there are discrete elements you can point to on each comparison spreadsheet you throw together, but in the end the differences are relatively minor. The pricing is similar. The network and server room build-outs are similar. The support offerings look similar. When all is said and done, you end up making a choice based on price, the reputation of the company, the quality of the online documentation, and the overall user experience (UX) presented during your research.

After a lot of research, and with quite a bit of experience with Amazon Web Services behind me, I found that all the cloud-based offerings were very similar. Different shades of white. In the end I decided to try the Microsoft Windows Azure offering. Microsoft has a good reputation in the tech world, they are not going anywhere, and as a Microsoft BizSpark member I also have preview access and discounted services.

My decision to go against the recommendation I’ve been making to my clients for years (“Amazon was one of the first, constantly innovates, and is the leader in the space”) was flawed. Yes, I tested and evaluated the options for months before making the move. But it takes an unusual event to truly test the mettle of any service provider.

Breaking A Server

After following the advice a Microsoft employee posted in a Windows Azure forum about Linux servers, I managed to reset the Windows Azure Linux Agent (WALinuxAgent) application. No, I did not do this on a whim. I needed to install a GUI application on the server and followed the instructions presented. It turns out that Microsoft has deployed a custom application that allows their Azure management interface to “talk” to the Linux server. That same application DISABLES the basic NetworkManager package on CentOS. To install any kind of GUI application or interface, you must disable WALinuxAgent, enable NetworkManager, install, disable NetworkManager, then re-enable WALinuxAgent.

The only problem with the instructions published in several places is that they omit a very important step. While connected with elevated privileges (sudo or su), you must DISABLE WALinuxAgent (waagent) provisioning so that it does not apply the Windows Azure proprietary security model on top of your installation. If you do not do this and you log out of that elevated-privileges session, you will NEVER have access to an elevated-privileges account again.
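Because the published instructions skip the provisioning step, it is worth checking the agent’s configuration before leaving the elevated session. The following is a minimal Python sketch under stated assumptions: that the agent reads a plain `key=value` file (typically `/etc/waagent.conf`) and that the relevant switch is `Provisioning.Enabled` with `y`/`n` values. Verify both against the WALinuxAgent documentation for your agent version; the script only reads configuration text and changes nothing.

```python
# Values treated as "off"; the y/n convention is an assumption about the agent.
FALSE_VALUES = {"n", "no", "false", "0"}


def parse_agent_conf(text):
    """Parse `key=value` lines, skipping blank lines and `#` comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def provisioning_enabled(text, key="Provisioning.Enabled"):
    """Return True unless the (assumed) provisioning key is explicitly off.

    A missing key is treated as enabled, on the assumption that the agent
    provisions by default when the setting is absent.
    """
    value = parse_agent_conf(text).get(key, "y")
    return value.lower() not in FALSE_VALUES


if __name__ == "__main__":
    # Illustrative configuration text; on a real VM you might read
    # /etc/waagent.conf instead (path assumed, check your distribution).
    sample = "# Azure Linux agent settings (sample)\nProvisioning.Enabled=y\n"
    if provisioning_enabled(sample):
        print("Provisioning is still enabled: turn it off before leaving the root session.")
    else:
        print("Provisioning is disabled.")
```

Run it while still connected as root; if it reports that provisioning is enabled, you have not yet done the step the instructions leave out.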

Needless to say, you cannot keep an enterprise-level server running in this state. Eventually you need to install updates and patches, for security or other reasons.

As I would learn, there is ZERO support on recovering from this situation.

Support versus support

In the years of working with Amazon Web Services and hosting a number of cloud deployments on their platform, I had become accustomed to being able to reach support personnel who actually TRY to help you out. They often go above and beyond what is required by contract and try either to get you back on track through their own efforts or at least to provide enough research and information that you can recover from any issue with limited effort. Amazon support services can be pricey, but having access not just to level-one but also to higher-level techs is an invaluable resource.

The bottom line is that Microsoft offers NO support services for their Linux images, even those they provide as “sanctioned images”, beyond making sure the ORIGINAL image is stable and that the virtual machine did not crash. Not only is there no apparent way to escalate support tickets; as it turns out, there is NO SUPPORT at all if you are running a Linux image.

Clearly Microsoft does not put this “front and center” in ANY of their Windows Azure literature. In fact, just the opposite. Microsoft has made an extensive effort in all their “before the purchase” propaganda to make it sound like they EMBRACE Linux. They go out of their way to make you feel like Linux is a welcome member of their family and that they work closely with multiple vendors to ensure a top-quality experience.

Until you have a problem. At that point they wash their hands of it, as is evident in this support response, complete with a link to a Knowledge Base article that amounts to “Linux. Not our problem.”:

Hello Lance, I understand your concerns and frustration, but Microsoft does not offer technical support for CentOS or any other Linux OS at this time.

 Please, review guidelines for the Linux support on Windows Azure Virtual Machines: http://support.microsoft.com/kb/2805216

No Azure Support

Other Issues

While the lack of support and the inability to regain privileged user access to my server is the primary concern that has me on the path of choosing a new hosting provider, there have been other issues as well.

A few times in the past several months the WordPress application has sent Apache into a tailspin, consuming all of the memory on the server. While that is not necessarily an issue with Windows Azure, the fact that the “restart virtual image” process DOES NOT WORK at least 50% of the time IS a big issue. Windows Azure is apparently overly reliant on that dreaded WALinuxAgent on the server. If the agent does not respond, because memory is over-allocated for example, the server will not reboot. The only thing you can do is press the restart button, wait 15 minutes to see whether the agent happened to get enough memory to catch the restart command, and try again. Ouch.
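That “press restart, wait 15 minutes, try again” routine is a plain retry loop. Here is a minimal Python sketch of it; `request_restart` and `rebooted` are hypothetical callables standing in for whatever portal action or management API call you actually use, the 15-minute wait is the figure from this post, and the default of four attempts is an arbitrary choice.

```python
import time


def restart_with_retries(request_restart, rebooted, attempts=4,
                         wait_seconds=15 * 60, sleep=time.sleep):
    """Request a restart, wait, and check; repeat up to `attempts` times.

    `request_restart` and `rebooted` are hypothetical callables: the first
    stands in for clicking restart (or the equivalent API call), the second
    should return True once the VM has actually gone through a reboot.
    Returns the 1-based attempt that worked, or None if none did.
    """
    for attempt in range(1, attempts + 1):
        request_restart()
        sleep(wait_seconds)
        if rebooted():
            return attempt
    return None


if __name__ == "__main__":
    # Simulated agent that only catches the third restart request.
    restart_requests = []
    result = restart_with_retries(
        request_restart=lambda: restart_requests.append("restart"),
        rebooted=lambda: len(restart_requests) >= 3,
        sleep=lambda seconds: None,  # skip the real 15-minute wait in the demo
    )
    print(result)  # 3
```

Injecting `sleep` keeps the loop testable; in real use you would leave the default so each attempt actually waits.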

The Azure interface is also not as nice as I first thought. While better than the original UX at Amazon Web Services, it is overly simplistic in some places and downright confusing in others. Try looking at your bill. Or your subscription status. You end up jumping between seemingly disjointed sites. Forget about online support forums. Somehow you end up in the MSDN network, far removed from your cloud portal. I often find myself with a dozen windows open so I can keep track of where I was or what I need to reference, lest I lose my original navigation path and have to start over. Not to mention the number of times this site-to-site hand-off fails and your login is suddenly deemed “invalid” mid-session.

Azure Session Amnesia

Moving Servers

So once again, I find myself looking for a new hosting provider. Luckily, I made the move to Windows Azure only recently, and I have not only VaultPress to make it easy to relocate the WordPress site but also CrashPlan PRO to move all the “auxiliary” installation “cruft” along with it.

Where will I go?

In my mind there are only two choices for an expandable cloud deployment running Linux boxes: Amazon Web Services or Rackspace. I’ll likely end up with Amazon again, but who knows… maybe it is time to try the legendary support at Rackspace once again. We’ll see. Stay tuned.


How To Be Unproductive Doing 8 Hours of Coding

Today was just one of those days. You know, THAT day. When you wake up early, get online, and think “this is going to be a good, productive day”. Then all hell breaks loose. Eight hours later you’ve found yourself coding and doing system work all day long with NOTHING to show for it. That was my kind of day today. Here is what broke and what I learned, so you can avoid the same issues or at least spend less time fixing them.

WooCommerce 2.0

Ouch. The new UI is nice. They fixed some bugs. They patched some security holes. They didn’t document a DAMN THING when it comes to all the modifications in the core engine. You know, little stuff like how their hooks and filters were changed. The stuff that many third-party WooCommerce add-on packs, their own add-on packs, and my custom PayPal tracking and licensing system rely on. That made today another long day, after I had already spent 4 hours yesterday reading their code to learn what they changed and patching those items. I don’t know how many other WooCommerce add-ons broke yesterday, but I’m guessing it was more than just mine.

Today the PayPal IPN listener was not working. It turns out they completely removed the original AJAX listener that was, and still is, documented on their site as “how to get your custom PayPal buttons to record transactions in WooCommerce”. Thus, any purchase of the Pro Pack made from within the plugin would not get recorded. No sale. No license. No fun.

Some of the changes that impacted me involved the way the WC_Order methods are written, how to fetch an order from a PayPal transaction, and other niceties. If you have a custom processor for WooCommerce and PayPal and are having issues, contact me through this site and I can share some of my hacks.

After HOURS of digging I finally discovered that I had to change my PayPal IPN settings for the new WooCommerce “feature”. Thanks, Woo team, for letting all of us IPN users know. Making the change was easy enough after reading code for 3 hours to learn what the heck they broke in WooCommerce, but…

PayPal IPN

PayPal, in typical fashion, has done half a job of making things work. Once I discovered what I needed to change to get the IPN working with WooCommerce, I went and updated my IPN address in PayPal. Well, in a wonderfully useful moment of forward-thinking by the PayPal dev team, I learned that if you change the IPN service address TODAY, any transactions posted before TODAY will not use the new address.

OK, I guess I get that, but here is the fun part. I have several transactions from this morning that were sent to the wrong location. So I just change the setting to the new location, find the transaction, and click the “resend IPN transaction to your server” button. That should do it, right? WRONG. The resend never picks up the updated URL. Thus there is NO WAY to get those few misdirected transactions through to my server to “do the magic”. What a PITA. Because of this one simple issue I then spent another hour hacking WooCommerce so I could fake an IPN transaction without leaving a huge gaping security hole in my site.

Thanks, PayPal.
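The hack itself isn’t shown here, so the following is not it. It is a minimal Python sketch of one common way to make a manual replay path safe: require an HMAC-SHA256 signature computed with a secret that never leaves your own machines, and compare it in constant time. The secret, the sample payload, and the idea of a separate replay endpoint are all illustrative assumptions.

```python
import hashlib
import hmac


def sign_replay(payload: bytes, secret: bytes) -> str:
    """Hex HMAC-SHA256 signature for a manually replayed payload."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def accept_replay(payload: bytes, signature: str, secret: bytes) -> bool:
    """Accept only if the signature matches; compare_digest avoids timing leaks."""
    return hmac.compare_digest(sign_replay(payload, secret), signature)


if __name__ == "__main__":
    secret = b"replace-with-a-long-random-secret"  # hypothetical, never published
    payload = b"txn_id=EXAMPLE123&payment_status=Completed"  # illustrative fields
    signature = sign_replay(payload, secret)
    print(accept_replay(payload, signature, secret))            # True
    print(accept_replay(payload + b"&x=1", signature, secret))  # False
```

A replay path built this way should still refuse a transaction ID it has already processed, so a captured request cannot be replayed twice.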

Oh… and for the record, your PayPal IPN URL needs to look like this:

http://<your site>/?wc-api=WC_Gateway_Paypal

NOT the old-school /?paypalListener=paypal_standard_IPN setting.
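If you manage several sites, building and checking these URLs in code avoids typos. Below is a minimal Python sketch using only the standard library; `example.com` stands in for your site, and both function names are hypothetical helpers, not part of WooCommerce or PayPal.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def woocommerce_ipn_url(site: str) -> str:
    """Build the IPN URL in the ?wc-api=WC_Gateway_Paypal form."""
    return site.rstrip("/") + "/?" + urlencode({"wc-api": "WC_Gateway_Paypal"})


def uses_legacy_listener(url: str) -> bool:
    """True if the URL still points at the old paypalListener endpoint."""
    params = parse_qs(urlsplit(url).query)
    return params.get("paypalListener") == ["paypal_standard_IPN"]


if __name__ == "__main__":
    print(woocommerce_ipn_url("http://example.com/"))
    # http://example.com/?wc-api=WC_Gateway_Paypal
    print(uses_legacy_listener("http://example.com/?paypalListener=paypal_standard_IPN"))
    # True
```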

However, not to be outdone, Microsoft decided to get in on the act.

Azure Code-word For…

I finally figured out why Microsoft named their new cloud service Azure.    At first I thought it was some clever reference to the sky… you know, clouds are in the sky, the sky is blue, but Amex kinda took that word and marketed the crap out of it… so clever Microsoft came up with Azure.  How cute.   But then I figured out what it really means…

Servers are shutdown out of the clear azure… I mean blue…

Yup, that’s right. Here I was, logged in and in the middle of a log-tracking session as a privileged user, when suddenly, out of nowhere, I get a “Server being shut down for POWER OFF. Now.” WTF?!?!? Yeah, that’s right. Microsoft decided to just shut down my server. I can’t get any answers from them as to what might have happened or how to look for potential issues. NOTHING. Not a word.

Thinking “there must be a plausible explanation for this”, I spent another few hours scanning my server logs. Security breach? Nope. At least none recorded in ANY of the log files. Rogue shutdown command? Nope. Hardware fault on the virtualized “metal”? Nope. After looking at Unix and Linux system log files for the past 25 years, I can tell you it sure as hell looks like someone just plain hit the reset or power-off button on the hardware. For a virtual machine, that simply means some dill-hole at Microsoft clicked the “power off” button on the wrong server in the host manager interface. No warning. No “let me check and make sure this is the right server”. Nothing.
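Much of that log scanning can be scripted. The following is a minimal Python sketch that lists log lines mentioning a shutdown, power-off, or reboot; the patterns are assumptions to adjust to the exact wording your distribution’s logs use, and the function only reads the lines it is given.

```python
import re

# Assumed patterns: adjust to the wording your logs actually use.
SHUTDOWN_PATTERNS = [
    re.compile(r"shut(?:ting)?\s?down", re.IGNORECASE),
    re.compile(r"power[- ]?off", re.IGNORECASE),
    re.compile(r"reboot", re.IGNORECASE),
]


def shutdown_evidence(lines):
    """Return (line_number, text) pairs for lines that mention a shutdown,
    power-off, or reboot. Line numbers start at 1."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(pattern.search(text) for pattern in SHUTDOWN_PATTERNS):
            hits.append((number, text))
    return hits


if __name__ == "__main__":
    # Illustrative sample lines, not real log output.
    sample = [
        "host sshd[101]: Accepted publickey for admin from 192.0.2.10",
        "host wall: Server being shut down for POWER OFF. Now.",
        "host kernel: Initializing cgroup subsys cpuset",
    ]
    for number, text in shutdown_evidence(sample):
        print(number, text)
```

Because it accepts any iterable of lines, you can pass an open log file directly, for example CentOS’s `/var/log/messages` (reading it usually requires root).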

Maybe Microsoft can prove me wrong and show me a memory fault, security breach, or some other internal explanation of how my server just shut down, but I’ll be damned if I can find one.

I’m going to hold my breath until Microsoft responds.

Now that I think of it, maybe that is where they came up with the name Azure…. I’m already starting to feel a bit dizzy… think I’m turning Azure….

Quality of Service

I didn’t even touch on the complete suckitude of the system produced by Microsoft-founded Expedia that I got to deal with in between. When I get the occasional email from a Store Locator Plus customer saying things like “I can’t believe you can’t do X” or “how could you release a product with Y not working?”, I always think of the “awesome quality” that everyone else can produce but I cannot. Yeah, I’m being sarcastic here. Today I got to deal with anything-but-perfect services from multi-million-dollar corporations that have big dev teams, QA teams, and a plethora of beta testers available, and still have problems.

I guess for a solo act I’m not doing so badly.

At least today was not COMPLETELY wasted. I did happen to stumble across a few dozen douchebag hackers from India, China, and Russia who have been trying to brute-force my server. None got in, but it did remind me that I need better security on my new server. That is content for another article. Maybe after I get some actual code written for Tagalong. At this rate I’ll be lucky to get that done before WordCamp Atlanta next week.
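The post doesn’t say how the brute-force attempts were spotted. One common approach is to count failed SSH password attempts per source address in the auth log (on CentOS, typically `/var/log/secure`). Here is a minimal Python sketch, assuming OpenSSH’s usual `Failed password for ... from <ip> port <n>` wording; the threshold is an arbitrary illustrative value, and the sample lines use documentation-reserved IP addresses.

```python
import re
from collections import Counter

# Assumes OpenSSH's usual wording, with or without "invalid user".
FAILED_PASSWORD = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+"
)


def failed_logins_by_ip(lines):
    """Count failed password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED_PASSWORD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def likely_brute_force(counts, threshold=10):
    """Return addresses with at least `threshold` failures, sorted."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Illustrative lines using documentation-reserved addresses.
    sample = [
        "sshd[101]: Failed password for root from 203.0.113.5 port 40022 ssh2",
        "sshd[102]: Failed password for invalid user admin from 203.0.113.5 port 40030 ssh2",
        "sshd[103]: Failed password for invalid user test from 198.51.100.7 port 51000 ssh2",
        "sshd[104]: Accepted publickey for lance from 192.0.2.10 port 50000 ssh2",
    ]
    counts = failed_logins_by_ip(sample)
    print(likely_brute_force(counts, threshold=2))  # ['203.0.113.5']
```

Tools such as fail2ban automate this pattern and add firewall bans on top; the sketch only reports.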

 
