So I set up SRM with the Compellent SRAs, with everything configured as per the best practice guides.
I would run test plans and they would run fine. But the cleanup would show as finished in SRM, while in Enterprise Manager the test volume was still there and still mapped to the cluster. SRM had unmounted and detached it from the hosts, but it appeared the SRA/EM didn’t finish the job. Interestingly, when I did an SRM test going the other way, the clear-down worked perfectly.
Anyway, I kept going and actually did a planned migration of some VMs from our protected site to the recovery site. This went off perfectly. The VMs were brought up over there without issue, logging in worked, and the DNS customisation I talked about earlier had been applied. Some of the VMs were being recovered using vSphere Replication and the rest were using array-based replication.
Then came the time to re-protect. The vSphere Replication VMs re-protected fine, but the array-based volumes failed.
The error would point to a volume ID that didn’t correspond to either the source or destination volumes.
I called up Dell/Compellent support and they asked me to try again, but this time to make sure the Enterprise/SAN recycle bins were empty and, after the planned migration, to save the restore points and re-scan the SRAs in SRM.
I did all of this and it still fell over on re-protect. What I found strange was that the engineer didn’t even want to look at any logs?! To me, looking at the logs would be one of the first things anyone would do, but his response was “no need to look at the logs yet, we could end up going in circles”.
I pointed him to a Reddit post I had made, where others had a similar problem to ours. But they noted that when they used EM 2015 R1 they never encountered any problems. We were on 2015 R3. I tried to call the support engineer a few times and couldn’t get hold of him; he was either away or on a WebEx/call. I did ask, when I phoned up, for him to call me back… but that never happened. He did email me telling me that downgrading was a good idea. I asked why, and he said it was the recommended course of action in this situation.
Since I had trouble getting hold of him on numerous occasions, I called up and asked if there was any other engineer with SRM/SRA knowledge. I was told, “Oh, the engineer assigned to your case will be back in about 10 mins and I will make sure he calls you.” Well, guess what, he never called. I got an email from him at 5pm, but when I replied a minute later I got his out-of-office and he was gone.
I mean, I understand the engineers are busy, but come on, that level of service isn’t up to par, especially since I asked for any other senior tech with SRM/SRA knowledge.
The last time I upgraded the Compellent Data Collectors and Enterprise Manager it was a pain, so I wasn’t looking forward to it at all. But if it had to be done…
I backed up the VMs, then took a snapshot, downloaded the 2015 R1 version, uninstalled everything, and did a fresh install with the older version. I took screenshots of all the key information so I could put it back in later. I had to configure the EM users again for people/myself/SRM/SRAs. I did all that and then did the same uninstall/reinstall for the remote data collector at the recovery site.
When I re-did the array pairs in SRM, it would go through the process fine but not actually show any replications.
I spent a while pondering this, then I realised I had come across this before! It was down to the fact that the accounts created for SRM to use with the SRA didn’t have the controllers mapped to them, so SRM would never see the actual replications. So I logged into Enterprise Manager with the SRA user account and mapped in the controllers so all the replications would be visible!
Once I got over that, I set up some Protection Groups and Recovery Plans. I tested them out and this time the clear-down was flawless both ways, and the re-protects for the array-based replications were fine too!
All in all, it appears to have come down to some kind of bug in Enterprise Manager 2015 R3!