July 28, 2011 02:43 PM

Microsoft Works On Regaining User Confidence After Software Fiascos

Windows IT Pro
InstantDoc ID #140000
On July 27, 2011, the Microsoft Exchange Server development group closed out an embarrassing episode with the rerelease of Rollup Update 4 (RU4) for Exchange 2010 Service Pack 1, which was first released on June 22. You can download the new version of RU4 from Microsoft Support. This version replaces both the original RU4 and the interim patch that Microsoft rushed out after discovering the problem that caused them to withdraw RU4 on July 13. The embarrassment was compounded by the fact that Microsoft had also had to withdraw the previous rollup update (RU3) in March, after a problem with duplicate messages was found on BlackBerry devices.

About 90 minutes after Microsoft announced the rerelease of RU4, Kevin Allison, the General Manager of the Exchange development group, posted details on the EHLO blog about the problem that caused Microsoft to withdraw RU4. Kevin explained that the root cause was an attempt to fix a bug that prevented deleted public folders from being recovered. The fix exposed some code in Outlook that caused problems when clients moved items. When Microsoft withdrew RU4, its explanation said:

A small number of customers have reported when the Outlook client is used to move or copy a folder that subfolders and content for the moved folder are deleted. After investigation we have determined that the folder and item contents do not appear in the destination folder as expected but may be recovered from the Recoverable Items folder (what was previously known as Dumpster in older versions of Exchange) from the original folder.

Kevin then attempted to address the concern that many customers have expressed since the recalls of RU3 and RU4: why didn't Microsoft's extensive regression testing pick up a problem in what is, after all, a pretty fundamental operation? Kevin noted that the Exchange team uses a suite of well over 100,000 automated tests to validate code. It's understandable that such an array of automated tests is used, because otherwise it would be impossible to test a product such as Exchange that is deployed in so many different circumstances. Kevin said that the tests are supplemented with manual validation where necessary, and he admitted that the downside of relying so heavily on automated testing is that a bug lurking in a scenario outside the boundaries of the tests can go undetected. In this case, the automated tests exercised the move and copy functions but didn't emulate the code that Outlook uses for the same operations.

I guess that a risk will always exist that testing won't catch bugs, and that a software development group might be lulled into a false sense of security when it looks at successful results generated by over 100,000 automated tests. However, an irritating niggle in the back of my mind makes me think that this kind of problem should have been found by manual tests. Specifically, don't Microsoft engineers use Outlook to move items and folders in their day-to-day work?

Ah, but then we realize that Microsoft lost interest in public folders long ago and that the company's engineers are extremely unlikely to use public folders in their work, so they could never find such a problem even if they ran development code for months before releasing it to customers. The same is probably true of the problem that caused Microsoft to withdraw RU3. How many Microsoft engineers use BlackBerry devices when they can use Windows Phone 7 phones? So the bugs creep through in scenarios that Microsoft cannot anticipate or test manually through normal use.

Kevin also explained that the RU4 bug is a legacy of a change made in the Exchange 2003 Information Store. That code was carried forward into Exchange 2007 but dropped in Exchange 2010, because Microsoft introduced the RPC Client Access service to provide a new MAPI endpoint for Outlook clients. This demonstrates just how difficult it is for engineering groups to preserve immaculate backward compatibility when they make important strategic changes to their product. However, you still come back to the point that regression testing should have picked this issue up far earlier, perhaps during the original development cycle for Exchange 2010 back in 2008–2009.

To be fair to Microsoft and the Exchange team, they have done a good job of communicating to their customer base and acknowledged that they have to do better in the future. They bit the bullet and withdrew RU3 and RU4 in a very public manner when other software development groups might have sought to make the necessary updates behind the scenes, away from the interested gaze of public opinion. This is the way things should be done, and I wish other product groups were as honest and open.

Finally, Kevin's note says that the Exchange team has conducted a top-to-bottom review of the process it uses to triage, develop, and validate changes for rollup updates and service packs, and is making some improvements. Interestingly, Kevin says that the Exchange team is now working more closely with the Outlook team to run Outlook's automated testing tools against new versions of Exchange. Given that Outlook has been the premier client for Exchange since the first release of Outlook in 1997, one wonders why it has taken some 14 years for the two teams to take this step together.

The problems with RU3 and RU4 have damaged customer confidence in the testing process used to validate new versions of Exchange. No one in Redmond has been covered in glory. It's now up to the Exchange development team to earn back the confidence and trust of its user community through better quality and reliability in future releases. With Exchange 2010 SP2 on the horizon, now's a good time to start.



Comments
  • Chris
    Aug 09, 2011

    First off, I don't recommend patching or upgrading to anyone unless the patch contains the feature or fix needed to address a current problem or need. That said, I feel the most important point to take away here is that the group acknowledged the problem and worked quickly to resolve it, while explaining what they were going to do to prevent it from happening again. For most people, this earns them a lot of credit for sustaining positive customer relationships. It's always best to acknowledge mistakes and work hard to correct them swiftly and completely.

    Chris Rich
    Product Manager, NetWrix Corporation
    NetWrix is #1 in Change Auditing: Simple, Lightweight, Affordable www.netwrix.com

  • Mark
    Aug 05, 2011

    I agree, the setup was bewildering. We actually have Lotus Domino on site too, and it was embarrassing: how could I say Exchange was better? Sure, the features are great, but management?
    Why is Exchange still using its own DB? All our other MS products use SQL (SCCM, SCOM, MOSS, ...).
    Not Gmail though, surely, Murat? Hotmail?

  • murat yildirimoglu
    Jul 29, 2011

    The Exchange team lost our confidence a long time ago, when they released Exchange 2007. The main problems with it were:
    1) No upgrade path: you couldn't upgrade an existing Exchange 2003 installation in place.
    2) Complexity: Exchange Server had always been easy to install and operate, but Exchange 2007 was a nightmare from a management point of view. It had too many roles, while Exchange 2003 had two simple roles.
    3) Emphasis on the command prompt: Exchange 2003 and earlier versions relied on a graphical user interface, but Exchange 2007 required the use of a strange command prompt.
    Now, my suggestion for Exchange admins is this:
    Switch to Gmail. If you have fewer than 10 users, switching is free. If you have more than 10 users, switching is less expensive and easier. Switch and forget the mess that 2007 and 2010 brought.
