Lucky Lucky Boy!

Yesterday I was supposed to be spending the day thinking about SharePoint at an event Combined Knowledge were running (Vijay posted the details here)

I did spend the day thinking about SharePoint but for different reasons

First a little bit of background

We run two SharePoint sites internally that pretty much run our business

There is the SBS “companyweb” SharePoint site (WSS v2) – we’ve been using this from the beginning and we have a ton of information in here. Contract details, company calendar, contacts, customer network information, etc (you get the picture)

Not that long ago I did the side-by-side install to get WSS v3 up and running

(David Schrag has an excellent post on this and there is also the official white paper on the SBS blog)

The idea long term is to move everything over to the WSS v3 site but we’re doing it a bit at a time with the main function of that site currently being our helpdesk system

So back to the story!

I’d blogged a couple of times about problems I was having with emails and workflows so when I saw the details of an “infrastructure update” on the Microsoft download site I thought this may be the answer I’d been looking for

So I eagerly downloaded the update and this is where I made a fatal error

I’ll hold my hands up and say that recently I haven’t been treating our internal systems with the same attention we would give one of our clients’ systems. We keep drumming into our clients that their systems run their business and that they need to look after them properly, so I’m really disappointed in myself

So I installed the update and it failed

The “friendly error message” was MOST unhelpful.

“Configuration of SharePoint Products and Technologies failed”

I was then informed that nothing would be rolled back and that I should correct the problem and re-run the update

This is where I first failed. Instead of taking my time and trying to figure out what the problem was I did a couple of searches and found solutions that seemed to fit some error messages I found in the logs and tried those

It made it even worse. I couldn’t get to the WSS v3 site or the v2 site (I still don’t understand why that was the case)

So at this point you’d think: OK, go back to the backup you took before you started.

Second failure. I’d just jumped in at the deep end on this one. Very careless of me

However, the overnight backup had taken a full copy of the v2 site so it wasn’t too long before I was able to get that up and running

My main panic was over as so much data was in there. Since a lot of the WSS v3 stuff is still work in progress most of the data was available somewhere else. If the worst came to the worst I’d have to start over and build it from scratch

Then I realised my next failing.

I’d been getting some notifications from the backups recently telling me “backup completed with exceptions” – basically it couldn’t back up some files so just skipped over them

I’d had a quick look and added it to my “to-do” list.

This is when I wished I’d treated the problem the same as I would on a client’s system and given it immediate attention. The files it skipped just happened to be the WSS v3 SQL database files…..argh!
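
A check along these lines might have flagged the problem sooner. This is only a minimal sketch: the "Skipped: <path>" log line format, the function name and the sample paths are all assumptions made for illustration, and a real backup report will look different. It simply picks out skipped files that look like SQL database files (.mdf and .ldf) so they can be treated as urgent rather than added to a to-do list.

```python
import re

# Assumption: SQL Server data and log files are the ones that matter most here.
CRITICAL_EXTENSIONS = (".mdf", ".ldf")

# Hypothetical log line format: "Skipped: <path>". Real backup logs will differ.
SKIPPED_LINE = re.compile(r"^\s*Skipped:\s*(?P<path>.+?)\s*$", re.IGNORECASE)


def critical_skipped_files(log_text):
    """Return skipped paths whose extension marks them as database files."""
    critical = []
    for line in log_text.splitlines():
        match = SKIPPED_LINE.match(line)
        if match is None:
            continue  # not a "skipped file" line
        path = match.group("path")
        if path.lower().endswith(CRITICAL_EXTENSIONS):
            critical.append(path)
    return critical


if __name__ == "__main__":
    # Made-up sample report, purely for illustration.
    sample_log = "\n".join([
        "Backup completed with exceptions",
        r"Skipped: D:\Data\example_content.mdf",
        r"Skipped: D:\Data\example_content_log.ldf",
        r"Skipped: C:\Temp\scratch.tmp",
    ])
    for path in critical_skipped_files(sample_log):
        print("URGENT: backup skipped", path)
```

The point of the sketch is the triage rule, not the parsing: a skipped temp file can wait, a skipped database file cannot.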

This was when I got lucky. Before I’d started the update I had SharePoint Designer open as I’d been working on some workflows, and even though I wasn’t expecting anything to go wrong I took a backup from there – just in case

The difference from the WSS v2 site, though, was that with v2 it didn’t matter that the site was down. The restore fixed that!

To restore my SharePoint Designer backup I needed a working SharePoint site!

Since I’d been so careless up till now I decided to get back to doing things right

I fired up a virtual machine and configured a SharePoint installation from scratch, then connected to it using SharePoint Designer and verified my backup would restore OK

Once I was happy with this it was just a matter of removing SharePoint and reloading it back onto the SBS where I was then able to create a blank site and restore my backup file

It may sound so simple but it took up the whole of my day and I did my final restore at 1am

My workflows are now broken and all the alerts have gone but it could have been a lot worse

So another lesson learnt. I’ve added our internal systems onto our help desk system so they will now be treated in the same way as any other system we look after. I won’t jump in head first “just because it’s our system”, and I’ll treat it no differently to any other server we look after

The next question I asked myself is why did I get into this situation?

Impatience I guess. Things have been very busy lately and there were a ton of other things I wanted to get on with instead of testing a patch in a controlled environment before putting it on our own server. My attitude to the running of our own network was very wrong here

As with any mistakes I make I’ve certainly learnt from this one

I was a bit dubious about posting this but I’m treating it as my punishment (even though I feel like I’ve been punished twice as I missed the SharePoint event as well! :-)  ) 

Andy Parkes is Technical Director at Coventry based IT support company IBIT Solutions. Formerly, coordinator of AMITPRO and Microsoft Partner Area Lead for 2012-2013. He also isn't a fan of describing himself in the third person.

8 thoughts on “Lucky Lucky Boy!”

  • Ah you aren’t the only one to do things like this. I find the paranoia this kind of mistake has instilled in me very useful now!

    Double check? Nah at least 4 times!

  • Andy,

    Can I tell you what I have done? I run Sharepoint on a stand alone virtual PC. Before I do any updates I shutdown the machine and commit the changes to disk. I then power up the virtual machine and apply the updates. If it all works then I leave it. If it fails I close down the virtual machine and don’t save the changes therefore reverting to previous state.

    I am moving towards using HyperV with which you can easily take a snap shot and revert back.

    I would also recommend an stsadm -o backup to backup all the Sharepoint data. I have a “clean” Sharepoint site as a backup onto which I can plonk this data in case of an emergency (again in a virtual PC).

    After being bitten myself a while back I have found the virtual PC option works well.

    I’ll also point you at my Sharepoint Operations Guide http://wssops.saturnalliance.com.au which covers a lot of the stuff I have learned over the years with Sharepoint. Cheap insurance in my book.

    Thanks
    Robert

  • Andy

    This is a *terrific* post and thanks for being honest and sharing. Those lessons learned can be applied to any product/system/discipline – not just SharePoint. And I think you have highlighted something important. When I read about security breaches that end up, say costing millions of dollars in lost reputation, often the root cause is very similar.

    “there were a ton of other things I wanted to get on with”

    Do you mind if I use your post for one of mine? I’d like to expand on this in relation to more holistic governance.

  • Thanks Robert

    We’re using virtualisation a lot for testing at the moment but not so much in production (yet!)

    I do normally use the stsadm backup but hadn’t gotten around to setting it up for our WSS v3 site (I missed that out of the post)

    Rest assured it’s done now!
