We learnt quite a lot on an SBS install we did recently
Normally when we’re putting in a network we try to ensure we’re only working with equipment we know and trust. Normally this isn’t a problem, as we recommend certain hardware to the client and they just decide which specific items they want
On this particular install there was an IT person in-house who was looking after the day-to-day running of the system and was in charge of managing the new server installation project
He decided that he wouldn’t need the UPS we had quoted for as they already had one and it was “fairly new”
We also know the in-house IT manager well as we’ve worked closely with him on various other projects for other companies so we had to trust him on this one
So towards the end of the install we wanted to make sure the UPS (of a brand I’d never heard of!) behaved as expected so we performed a simulated power failure (I pulled the plug out!)
Normally we’d expect the server to stay on and the software to shut the server down safely when needed
This isn’t what happened
The server stayed on for about two seconds and then restarted. Not good
Worried we had too much load on the UPS, we took everything else off and tried again with just the server connected. Same thing, and we also noticed a weird clicking noise coming from the back of the rack
As this was a UPS we didn’t trust, we decided it best to replace it. The in-house IT guy disappeared and returned with a price for an APC UPS from one of his other suppliers, at a price he was happy with
Though we’d have preferred him to use the model of UPS we usually recommend, the “customer is always right” (and he had a budget he wanted to work to), so he ordered the UPS (APC part number SC1500I)
It arrived the next day and we tried again. Exactly the same problem!
We couldn’t possibly have two faulty UPSs, could we? So we thought it must be the server’s power supply. What else could it be? We’d also determined that the weird clicking noise (like a relay constantly switching) was coming from the power supply, so this seemed to make sense
After getting the power supply changed we tried again and had exactly the same issue
This was getting frustrating, so we went back to the UPS. We were getting some odd communication issues between the UPS and the server, so maybe we’d been really unlucky and had been sent a faulty UPS. We sent it back and asked for a replacement
The replacement UPS arrived and once again we had the same issue
We had some dealings with HP support and they were pushing us in the direction of the UPS (though at the time we felt like they were doing this because it wasn’t an HP component, and they weren’t exactly confident about it)
Now we’ve got other ML350 G5 servers with APC UPSs that are working just fine, so we pulled one of our own UPSs out of the office so that we were working with equipment we knew and trusted
The problem went away!
After a bit more digging we think we’ve found out why
When using APC we usually use this part code: SUA1500I (the part number varies slightly depending on the load we’re expecting and whether we need it rack mounted, but it’s still the Smart-UPS range)
The APC website lets you compare products side by side, which illustrates what we think is the problem
Look at the last entry
The UPS we normally use has a true sine wave output; the SC1500I doesn’t (I’m guessing the first UPS didn’t either, but it was so obscure I struggled to find out anything about it)
This website gives a decent explanation of the difference between the two waveform types
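To get a feel for the difference, here’s a rough Python sketch comparing a true sine wave with a simple three-level stepped wave that has the same RMS voltage. The numbers are my own assumptions for illustration (230 V mains, a stepped wave that sits at full peak voltage for half of each half-cycle and at zero the rest of the time); they are not taken from the APC spec sheets, and the function names are made up for this example

```python
import math

V_RMS = 230.0                    # assumed nominal mains voltage
V_PEAK = V_RMS * math.sqrt(2)    # peak of a 230 V RMS sine wave


def sine_output(phase):
    """True sine wave. phase is the fraction of one cycle, 0.0 to 1.0."""
    return V_PEAK * math.sin(2 * math.pi * phase)


def stepped_output(phase, duty=0.5):
    """Hypothetical three-level stepped wave: +peak, zero, -peak.

    duty is the fraction of each half-cycle spent at full voltage.
    With duty=0.5 the RMS works out the same as the sine wave above.
    """
    quarter_width = duty / 4
    if abs(phase - 0.25) < quarter_width:
        return V_PEAK
    if abs(phase - 0.75) < quarter_width:
        return -V_PEAK
    return 0.0


def rms(wave, samples=1000):
    """RMS voltage of one full cycle, sampled at evenly spaced points."""
    total = sum(wave(i / samples) ** 2 for i in range(samples))
    return math.sqrt(total / samples)


def largest_step(wave, samples=1000):
    """Biggest voltage jump between two neighbouring samples."""
    values = [wave(i / samples) for i in range(samples + 1)]
    return max(abs(b - a) for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    for name, wave in (("sine", sine_output), ("stepped", stepped_output)):
        print(f"{name}: rms={rms(wave):.1f} V, "
              f"largest step={largest_step(wave):.1f} V")
```

Both waveforms come out at roughly the same RMS voltage, but the sine wave changes by a couple of volts between samples while the stepped wave jumps by the full peak voltage in one go. That abrupt switching is the sort of thing some power supplies are reported not to cope with, which fits what we saw, though I can’t prove it was the cause here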
I’ve since been back with a brand new UPS (of the type we normally use) and there are no power issues. I did a full calibration via the APC software and the system stayed on battery for about twenty minutes. A manual test (pulling the plug out!) gave similar results
So what did we learn?
Be very wary of components you don’t know. This is a great example of why we work with a specific group of products: it makes support easier and we don’t have to stay trained on several different products that do the same thing. That doesn’t mean we stay fixed on specific items, push clients into products they don’t need or give them no choice (for example, we work with Liebert UPSs too)
We possibly should have brought in a trusted component earlier. We ended up chasing after the server power supply when this had nothing to do with the problem
Also be careful of how you "manage" the in-house IT guy. While we knew this particular person very well, I think we could have been a little more insistent on trying a product we knew and trusted once we’d decided the original UPS was to be replaced
The best way to deal with any mistake or setback is to learn from it, and I think we certainly have here