vCloud Director, VMware

CA-signed vCloud Director certificates no longer trusted – SAN missing

This post has been moved to https://www.funkycloudmedina.com

Hello again! Today’s adventures drove me a little wild…

Some background first. In my test environment, I have a full vCloud Director v8.10.1 deployment, load balanced with an F5 LTM. The certificates are loaded on the F5 so that traffic is terminated and re-encrypted on its way to the vCloud cells. Since deployment, both the http and console FQDNs functioned as expected. This all changed just a few months ago…

Some users of the test vCloud deployment reported certificate validation errors when accessing the ‘http’ site for vCloud. They also reported that the console had stopped working. I checked the certificates’ validity periods and they seemed OK. I spoke to the networking team and they confirmed the certificates on the F5 were also A-OK.

I dug deeper into the certificate validity message in Chrome and found this:

[Screenshot: Chrome certificate error, Subject Alternative Name missing]

I looked around online and found that starting with Chrome v58 and Firefox v48 (only source I could find), support for SSL certificates without Subject Alternative Names had been deprecated.

This is very interesting! Why would this issue happen in this environment when I’m almost certain that SAN attributes are included as part of the VMware doco? In fact, I’m definitely certain it’s there…

keytool \
   -keystore certificates.ks \
   -alias consoleproxy \
   -storepass passwd \
   -keypass passwd \
   -storetype JCEKS \
   -genkeypair \
   -keyalg RSA \
   -keysize 2048 \
   -validity 365 \
   -dname "CN=vcd2.example.com, OU=Engineering, O=Example Corp, L=Palo Alto, S=California, C=US" \
   -ext "san=dns:vcd2.example.com,dns:vcd2,ip:10.100.101.10"

Yup, there it is.

The linked VMware doco will step you through generating a keypair in the form of a self-signed certificate and private key into a keystore you specify (one will be created if it does not already exist). The SAN attribute specified in the example above will go into this self-signed certificate. However, if you attempt to create a CSR from this self-signed certificate using the instructions from VMware, you will be left with a CSR with no SAN attributes.

You can check the CSR yourself by running:

openssl req -in {csr-file} -noout -text

You’ll see that there are no Subject Alternative Names specified. Without knowing this the first time around, I submitted this newly generated CSR to my internal Microsoft CA. While the certificate was issued successfully, none of the SAN attributes had been included.
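
If you would rather script that check than eyeball the output, something like the following might do. It is only a sketch: the csr_text_has_san name is made up for illustration, and it assumes openssl labels the extension "X509v3 Subject Alternative Name" in its text output.

```shell
# Hypothetical helper (not from the VMware doco): reads the text dump of a CSR
# on stdin and succeeds only if a Subject Alternative Name extension is requested.
csr_text_has_san() {
    grep -q "X509v3 Subject Alternative Name"
}

# Example usage, with http.csr standing in for your CSR file:
#   openssl req -in http.csr -noout -text | csr_text_has_san \
#       && echo "SAN present" || echo "SAN missing"
```

Running this before submitting the CSR to the CA would have caught the missing SAN attributes up front.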

This is because VMware’s CSR generation command leaves off a very important switch: the one that makes sure the SAN attributes are included in the CSR.

To get the SAN attributes included in the CSR, you’ll need to modify VMware’s example from the doco. Instead of running this command to generate the CSR from the self-signed cert:

keytool -keystore certificates.ks -storetype JCEKS -storepass {password} -certreq -alias http -file http.csr

You’ll want to add your SAN attributes to the keytool certreq command so it looks like this:

keytool -keystore certificates.ks -storetype JCEKS -storepass {password} -certreq -alias http -file http.csr -ext SAN=dns:vcd2.example.com,dns:vcd2,ip:10.100.101.10
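
Typing the SAN list by hand for each alias is easy to get wrong, so here is a minimal sketch of a helper that builds the -ext value. The build_san_ext name is hypothetical, and it assumes every argument is either a hostname or an IPv4 address (anything made up solely of digits and dots is treated as an IP; IPv6 is not handled).

```shell
# Hypothetical helper: turns a list of names into a keytool -ext SAN value.
# Assumption: arguments are hostnames or IPv4 addresses only (no IPv6).
build_san_ext() {
    san=""
    for name in "$@"; do
        case "$name" in
            *[!0-9.]*) entry="dns:$name" ;;  # contains something other than digits and dots
            *)         entry="ip:$name" ;;   # digits and dots only, treated as IPv4
        esac
        san="${san:+$san,}$entry"            # comma-separate every entry after the first
    done
    printf 'SAN=%s\n' "$san"
}

# Example usage with the values from this post:
#   keytool -keystore certificates.ks -storetype JCEKS -storepass {password} \
#       -certreq -alias http -file http.csr \
#       -ext "$(build_san_ext vcd2.example.com vcd2 10.100.101.10)"
```

The same call should work for the consoleproxy alias; swap in that alias, its own output file and its own names.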

Huge credit to Eric Lawrence from textslashplain for sending me down the rabbit hole. Even bigger credit to StackOverflow user MrPatol for basically spelling out the fix (original SO thread here).

Certifications, VMware

VCAP6-DCV Design certification achieved!

After months of studying and a Design and Deploy course for good measure, I passed the incredible VCAP6-DCV Design exam!

[Image: VCAP6-DCV Design certification badge]

This certification, in combination with my VCAP5-DCA from 18 months ago, has earned me the VCIX6 title, which I’m incredibly proud of. Here’s the official link from VMware about the certification: https://mylearn.vmware.com/mgrReg/plan.cfm?plan=89125&ui=www_cert

To give you an idea of what the Design exam is about, take a look at www.virtualtiers.net. That exam simulator is a great representation of how the exam is formatted and how you’re tested. The content isn’t the same as the exam for obvious reasons, but it helps.

I’ll be doing a quick post sometime soon on all the resources I’ve collected during my preparation.

Now to slap these badges on the sidebar!

vCenter, VMware

PSC 6.0U3 not respecting certool.cfg settings when generating VMCA CSR

After a very successful and quick migration from a Windows SSO 5.5 U3e installation to a Platform Services Controller v6.0U3 appliance, I was ready to get my VMCA into action.

We have a corporate internal Microsoft CA with the VMware certificate templates already created as per VMware KB 2112009. Everything was coming up Milhouse, until CSR generation time using the ‘certificate-manager’ on the PSCs.

After stepping through the ‘certificate-manager’ wizard and having the CSR and private key files sent to a directory of my choosing, I quickly inspected the CSR using openssl to make sure I was on the right track:

openssl req -in vmca_issued_csr.csr -noout -text

My CSR still had the old self-signed details of the PSC node! Sure, it was marked as a certificate authority, but contained all the default VMware self-signed details.
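
Both of those observations (whose details ended up in the subject, and whether the request is flagged as a CA) can be checked without reading the whole dump. A rough sketch, assuming the text comes from openssl req -noout -text and that a CA request shows up as "CA:TRUE" under Basic Constraints; the csr_text_is_ca name is invented for illustration.

```shell
# Hypothetical helper: reads `openssl req -noout -text` output on stdin and
# succeeds only if the request asks for Basic Constraints CA:TRUE.
csr_text_is_ca() {
    grep -q "CA:TRUE"
}

# Example usage against the CSR from certificate-manager:
#   openssl req -in vmca_issued_csr.csr -noout -text | csr_text_is_ca \
#       && echo "CA signing request" || echo "machine certificate request"
#   openssl req -in vmca_issued_csr.csr -noout -subject   # prints only the subject line
```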

I had a look in the VMware pubs (specifically this bit) and found that it’s possible to generate the CSR with my own config file. Using the “certool.cfg” template config file in /usr/lib/vmware-vmca/share/config, I quickly spun out a config file to match my VMCA node details and stuck it in /tmp for the time being.

Here is how you use the certool command:

/usr/lib/vmware-vmca/bin/certool --gencsr --privkey={destination of private key} --pubkey={destination of public key} --csrfile={destination of new CSR} --config={the config file I created}

And here is what I ran:

/usr/lib/vmware-vmca/bin/certool --gencsr --privkey=/root/vmca_private.key --pubkey=/root/vmca_public.key --csrfile=/root/vmca_req.csr --config=/tmp/vmca.cfg

Obviously, you can name the files whatever you like.

While this seems like it should’ve worked and should churn out a VMCA compatible intermediate CSR, it doesn’t. It only creates a CSR for a normal ‘machine’ certificate (compared to what I wanted which was a CA signing cert). I couldn’t figure out the config requirements to generate a CSR for a CA. But how was the certificate-manager doing it?

Certificate-manager is actually generating a CSR from an existing certificate while using a config file to overwrite most of the parameters. The certificate it uses is the default VMCA self-signed root certificate, and the config file is made up from your answers in the certificate-manager wizard. Cool! Thinking certificate-manager had regressed in Update 3, I decided to try this manually using certool instead. Referencing my previously crafted certool.cfg file in /tmp, here’s what I ran:

/usr/lib/vmware-vmca/bin/certool --gencsrfromexistingcert --privkey=/root/vmca_private.key --pubkey=/root/vmca_public.key --csrfile=/root/vmca_req.csr --certfile=/etc/vmware-vmca/*************

Unfortunately, this didn’t work either. I still ended up with a CSR with all the details of a self-signed VMCA. It definitely looks like the 6.0U3 certool has regressed and is experiencing a similar bug to 6.0U1 (6.0U1 release notes).

The only way I was able to get around it was using a temporary 6.0U2 PSC machine and using the certificate-manager tool to create the CSR and private key. The CSR and key were taken off the temporary PSC, and the CSR was submitted to and approved by my enterprise CA with great success. I was then able to use the 6.0U3 certool to install the new VMCA intermediate certificate.

Let me know in the comments if you found a fix or are experiencing the same issue.

vCenter, VMware

Empty inventory after SSO v5.5 to PSC v6.0 U3 migration

After performing the vSphere v5.5 to vSphere 6.0 migration in our testing environment with great success, I began work on our production environment. First things first: migrating Windows SSO to a PSC appliance.

I had successfully converted the first machine, and started doing some testing. Things like logging into the thick client and checking all vCenter servers and basic login services.

Problem

Out of 6 vCenter servers, only 1 was having issues. Logging in with the SSO administrator account, I was able to see the entire inventory and all services were running just fine. However, attempting to log in with my org’s domain account was met with a generic “You do not have permissions to login” message. Quickly jumping over to the SSO administrator session, I found the permissions for the affected vCenter were completely gone; only the SSO admin was listed as an administrator.

Cause

All vCenter servers have a security setting called Active Directory Validation. Essentially, this setting will perform a synchronization of AD users and groups every X minutes with the domain that vCenter is connected to. If vCenter is unable to perform the validation (SSO is unavailable, for example) then vCenter will remove all invalidated users and groups. For my environment, vCenter was set to sync every 24 hours. This timer begins when the vCenter service starts.

In what may be the worst timing ever, I had restarted the vCenter server roughly 24 hours before I had performed my SSO->PSC migration. This resulted in vCenter attempting to validate just as SSO had become unavailable during the migration. Goodbye user and group permissions.

Fix

To get this vCenter usable, I ended up just re-adding the required ACLs to vCenter for the time being. I did, however, find a VMware KB article on how to restore your permissions from a vCenter DB backup: KB2086548

If you want to prevent this from happening on your vCenter servers, just disable the AD validation setting until you’ve finished your migrations.

vCenter, vCloud Director, VMware

SYSTEM_REFRESH_VIMSERVER – Could not register vCloud Director as an extension to vCenter Server

While trying to troubleshoot another problem, we tried Refreshing vCloud to vCenter which includes registering/updating the extension. This is when we hit a beauty we’d never seen before:

[Screenshot: vCloud Director error, “Could not register vCloud Director as an extension to vCenter Server”]

Alright, calm down. Probably something with the network, right? And if it’s not the network then it’s probably DNS. Right? Wrong.

I dug around in the vCenter MOB and found the vCloud Director extension. As expected it already had a “vCloud Director-1” named extension. What I found odd was the last heartbeat time was back in 2013. Interestingly enough the last version recorded was also v5.1.2. I say interestingly because we are running v8.10.1 for SP.

Jumping into our test environment, I performed a Refresh of our test vCloud instance to vCenter and lo and behold it happened there too! I couldn’t find anything in the vCloud logs reporting the why behind this failure, but I needed to get this running and quick, too.

Knowing that the vCloud DB stores its own references to the vCenter MOB, and that vCloud would try to register itself as vCloud Director-1 again, I theorised that we could remove the existing extension and perform another Refresh without causing any issues.

So, that’s what I did right in the test environment. It went without a hitch. Rolled the same change out in production and it went beautifully.

If you’re getting this error, I’d suggest taking a backup of your vCenter server/DB and removing the existing vCloud Director extension.

Removing the extension (from KB1025360):

  1. In a web browser, navigate to http://vCenter_Server_name_or_IP/mob.
    Where vCenter_Server_name_or_IP is the name of your vCenter Server or its IP address.
  2. Click Content.
  3. Click ExtensionManager.
  4. Select and copy the name of the plug-in you want to remove from the list of values under Properties. For a list of default plug-ins, see the Additional Information section of this article.
  5. Click UnregisterExtension. A new window appears.
  6. Paste the key of the plug-in and click Invoke Method. This removes the plug-in and results in void.
  7. Close the window.
  8. Refresh the Managed Object Type:ManagedObjectReference:ExtensionManager window to verify that the plug-in is removed successfully.

Now go back to vCloud and perform a Refresh against your vCenter server. You should be back in action now!

vCloud Director, VMware

‘java.lang.NullPointerException’ received when modifying objects in vCloud Director

Problem

Roughly 2 weeks ago one of our vCloud Director tenants reported an error when attempting to increase a disk on their VM. They were told to contact their cloud administrator (yay). When we tried to perform the increase, we received an error we’d never seen before: “java.lang.NullPointerException”.

[Screenshot: “java.lang.NullPointerException” error dialog]

Here is what we checked:

  1. Confirm the tenant Org vDC has the appropriate resources available (this was an ‘Allocation’ style vDC).
  2. Check the status of vCloud to vCenter connection and perform a vCenter Reconnect followed by a Refresh. This actually exposed another issue written about here.
  3. Check Log Insight for entries similar to this. We found the entries, but even after viewing the log in context we couldn’t find a cause or correlated action.
  4. We tested the same changes against other Org vDCs. We found that newly created test Org vDCs were fine and unaffected by whatever the root issue was. Only some of our existing Org vDCs experienced these issues.
  5. We found that it wasn’t limited to disk changes. Performing any action on infrastructure within these affected Org vDCs resulted in the same error.

We spent longer than we should’ve trawling through the vCloud logs, and ended up logging an SR with VMware.

After going through the intricate details of what we were experiencing and the testing we’d performed, VMware requested a copy of our vCloud databases and a list of affected VMs, vApps and Org vDCs.

It was a nail-biting few days, but eventually our assigned tech got back to us and found the cause of this error. There was a stale resgroup-id record in the org_prov_vdc table. Let me explain…

Cause

There are 3 tables in the vCloud Director database that track Org vDC entities and their corresponding Org vDC resource pool and the parent vCenter resource pool. These 3 are supposed to be kept up to date/in sync by vCD:

  • vrp – this tracks Org vDC resource pool names in vCenter that correspond to Org vDCs. It also tracks the resource model (allocated, PAYG) and any compute resource settings that are applied to the resource pool.
  • vrp_rp – stores the vrp_id from the vrp table, along with the sub_rp_moref value for that vrp.
  • org_prov_vdc – stores data related more to the Org vDC entity itself (name, description, network pools, VM folder Moref IDs, resource pool Moref IDs etc)

Notice the “resource pool Moref IDs” entry in the last bullet above. This is important, as this value should be the same as what’s stored in the sub_rp_moref column in the vrp_rp table.

Disclaimer: all of the steps below were performed with VMware support.

You can find out if a particular Org vDC has a stale record in the org_prov_vdc table by performing the following queries against your vCloud Director database:

SELECT id FROM vrp WHERE name LIKE '%My Org vDC Name%'

This will return the vrp ID for your Org vDC. Replace the ID in the following query with the ID you received in the last step:

SELECT sub_rp_moref FROM vrp_rp WHERE vrp_id = 0x31JSD81AA0923NAFV801234UASD2BF76

Note: this ID has been changed to protect the innocent. Yours will differ.

From the above SELECT query, you’ll get a resource group ID similar to this:

resgroup-6000

Run this query to find out what the current value is in org_prov_vdc table. Make sure to change the Org vDC name inside the percent signs.

SELECT sub_rp_moref FROM org_prov_vdc WHERE name LIKE '%My Org vDC Name%'

If the values from the vrp_rp table and the org_prov_vdc table do not match, then you’ve got a stale moref in the org_prov_vdc table.

Fix

To fix this stale record, take the resgroup ID and the Org vDC name and run the following query:

UPDATE org_prov_vdc SET sub_rp_moref = 'resgroup-6000' WHERE name = 'My Org vDC Name'

All done. You should now be able to make changes to your vCloud objects.

If you’d like to find all Org vDCs with stale moref IDs in the database, I’ve written a small query that can do that for you:

SELECT vrp.name AS vrpNAME, org_prov_vdc.name AS orgprovNAME, vrp_rp.sub_rp_moref AS correctMOREF, org_prov_vdc.sub_rp_moref AS staleMOREF
FROM vrp
 JOIN org_prov_vdc
 ON vrp.name LIKE '%' + org_prov_vdc.name + '%'
 JOIN vrp_rp
 ON vrp_rp.vrp_id = vrp.id
WHERE vrp_rp.sub_rp_moref != org_prov_vdc.sub_rp_moref

The root cause for all of this has not been found. I’m hoping VMware support can provide us with a little more information so I can update this post.

vCenter, VMware

vSphere 6 certificate templates with SHA256 encryption

I was just in the middle of configuring a PSC 6.0 node’s VMCA as an intermediate CA and, in traditional fashion, went to request a certificate from a 2008 R2 Microsoft CA using the web enrollment form (as per this VMware KB article).

Oddly enough, though, my brand spanking new vSphere 6.0 machine and intermediate CA certificate templates were missing from the template selection drop-down.

I had a look around online and found that MS CA v3 certificate templates are not supported in the web enrollment form. Why is this relevant? Well, this VMware KB states that if you use SHA256 encryption in your environment you must select Windows Server 2008 Enterprise as your certificate template version. That instantly sets your certificate templates to v3.

Damn. How was I going to submit my CSR to this Microsoft CA and get back my certificates?! The Certificate Management snap-in doesn’t allow CSR files to be submitted. It’s just not an option.

Luckily we have the trusty certreq tool. I was easily able to submit my CSR file to the Microsoft CA and get a certificate back in a simple command:

Certreq -submit -attrib "certificateTemplate:vSphere6.0VMCA" vmca_issued_csr.csr

Make sure you specify the correct certificate template. In my example above, I was after the VMCA intermediate CA template. The file specified was in my cmd working directory and is the same file the PSCs spit out when you’re using the certificate manager tool.