Decommissioning PDC (AD) results in thoughts of seppuku.

x-guest

Post by x-guest »

This involves 4 DCs; we'll call them R, A, H, and Z for simplicity's sake. I'll proceed with a highly condensed backstory so as to give some potential insight into the problem, and gradually get to the problem itself.

The story begins one sunny day ten years ago with two Domain Controllers, both properly functioning Windows Server 2003 32-bit servers. Their names are A and R. Incidentally, A held all the FSMO roles and R was considered a "BDC". One day, R catastrophically fails. The procedure to manually remove all the junk/metadata from R that had been left on A (since a proper decommissioning isn't an option on a non-booting machine) goes well. It is now as if R never even existed at all. A new machine (Windows Server 2012 R2 64-bit) is brought online and promoted to a DC to take R's place. This machine is named H. So now we have A and H. No one knows if the promotion of H was perfect, but all seemed well for a long time.

5 years later it was decided to remove the aging, senile, dilapidated A so as to replace it with a superior machine. A new beefcake of a box was brought online (Windows Server 2012 R2 64-bit) and promoted to the status of a DC. Its name was Z. However, for reasons that seem of no importance to the reader, A was never actually decommissioned/taken offline. So now we have A, H, and Z, all working together (or so it seemed).

YET ANOTHER 5 years go by. The company has long since had financial problems, and by now has pretty much gone from 80 employees (3D designers, developers, artists, and so on) to under 9. All three DCs continue happily humming along, oblivious to the economics and politics that can so harm human-creatures... Well, the day finally comes when DCPROMO is run on A (but not before all the FSMO roles are properly moved to Z, of course). The old rust-bucket of a 2003/32-bit box is finally sent to the graveyard. We are now left with neither of the original servers from the beginning of the story, and only have H and Z for the domain.

Just one problem. As soon as A was finally decommissioned and taken offline on that fateful day, everything broke. H and Z report no problems (dcdiag, dcdiag /test:dns, and other tools report no issues). And YET, the entire trust of the network has gone to sh*t. Windows client machines are failing to authenticate after users log in with their domain credentials (instead popping up messages that they've been logged in using a "temporary account"). It seems all machines are falling back to NTLM authentication. When trying to access shares on other Windows machines (or the DCs themselves), users are being asked for their credentials again. Once they're entered, they can then access those shares normally; however, this is NOT THE CASE with the Linux machines on the network. The shares on Linux machines can NO longer be accessed even with the proper credentials. This has likely jacked Samba up on those machines. Needless to say, this is a nightmare.

The question now becomes, what in the eff happened and how does one go about fixing it?
Philip
SG VIP
Posts: 11761
Joined: Sat May 08, 1999 5:00 am
Location: Jacksonville, Florida

Post by Philip »

I am not an expert in this, but you can try "netdom query fsmo" to make sure that the FSMO roles were correctly moved to Z. Also, you may want to run dcdiag with the /e (run against all DCs) and /v (verbose) switches to see if there is some issue that was missed.

As for the Samba shares, you can double-check the /etc/samba/smb.conf file, but it sounds like the Linux machine is not authenticating correctly with an admin account to the Windows domain, similarly to the other clients. I believe you can test it with something like: "getent passwd Administrator"
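For anyone following along on the Linux side, a minimal sketch of those checks might look like the following. It assumes the box runs Samba with winbind (wbinfo and net are Samba's own tools, and may not be installed on every machine). The check helper is a hypothetical name introduced here, and Administrator is just the account from the example above.

```shell
# Run a command only if its tool exists, and print a one-line verdict.
check() {
    # $1 is a label; the remaining arguments are the command to run
    label=$1; shift
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "SKIP  $label ($1 not installed)"
    elif "$@" >/dev/null 2>&1; then
        echo "OK    $label"
    else
        echo "FAIL  $label"
    fi
}

check "winbindd responding"     wbinfo -p
check "machine account secret"  wbinfo -t
check "domain user lookup"      getent passwd Administrator
check "AD join still valid"     net ads testjoin
```

A FAIL on the machine account secret or the join test would point at the domain membership itself rather than at smb.conf.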
x-guest

Post by x-guest »

Thanks for the reply. The good news is that all the Windows clients are properly authenticating again. In fact, even the shares on the Linux machine I mentioned are now accessible (read/write) again with no problems. However, there is still something wrong at the end of the day that I think _still_ has to do with what you mentioned. Logging into the Linux machine as 'root' works normally but produces a small message: "no logon servers". This of course tells me that Samba is still crippled. And I know this because when attempting to sync files (using SVN) from the clients, the Linux machine _rejects_ the attempts with the (unsurprising) error "svn: Can't connect to host 'svn' : Connection refused". To add to the already deep depression of knowing that no one is left who knows jack about Linux/Samba, the Linux machine itself has gone over 3 years with no updates (since the admin responsible for it left). Basically an ancient Gentoo box with no GUI. Trying to even get simple tools like nslookup, etc. onto the machine has proven to be futile, given the micro skillset available. Needless to say, I continue to scour the Internet and various forums for answers. I suppose if anything good comes out of this, it will be a better understanding of Linux/Samba and how they interact with Kerberos. So many pieces missing on the Linux side.
Philip

Post by Philip »

"nano" is a good command-line text editor under Linux (a free clone of the older "pico"). You should be able to SSH/Telnet to the machine and poke around:

cd /etc/samba (change directory to /etc/samba)
ls (to list files in the directory)
cat smb.conf (to show the contents of the config file)
nano smb.conf (to edit the file if need be)
...

Here is a good starting point: http://www.papercut.com/blog/matt/2004/ ... ws-domain/

I hope this helps.
YeOldeStonecat
SG VIP
Posts: 51171
Joined: Mon Jan 15, 2001 12:00 pm
Location: Somewhere along the shoreline in New England

Post by YeOldeStonecat »

Sorry I missed this...haven't seen a Windows server question here in ages so I rarely look now.
(this is what I live and breathe every day)

As to "what happened?" I'm not sure. I know some people "rush" adding new DCs to a domain and shifting over the FSMO roles. I like to take my time...let things settle, manually replicate, and check the event logs.

Without knowing your network layout...and how the "new" DCs were added...and without knowing how the TCP/IP properties were configured for them (specifically what was set for primary and secondary DNS), and what was set up for the replication subnets...it's hard to say.

And as the FSMO roles are shifted...the DHCP properties for the network need to be adjusted. DHCP was handing out (hopefully) the IP address of the old DC as the DNS server....but when a new DC takes over, you need to change the DHCP properties to hand the new DC's IP out to clients now. And for nodes on the network that are manually assigned IPs (like other servers)..those need to be updated too.

If they are not, and if replication is not properly happening, you can end up with "active directory islands"....the old DC still gets updates via DNS registration as clients boot up and log in, and the new DC fails to get those updates.

So likely, user and computer accounts got "stale" on the new DC, as the old DC had the current/active ones. Take the old DC away, and the SIDs on the new DC don't match what the workstations and user accounts have...and basically workstations have fallen off the domain, need to be unjoined and rejoined.
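On a Linux box with manually assigned settings, the DNS part of this can be checked roughly as follows. This is only a sketch: stale_nameservers is a hypothetical helper name, and 192.0.2.10 and 192.0.2.11 are placeholder addresses standing in for whatever the remaining DCs actually use.

```shell
# Print every nameserver in a resolv.conf-style file that is NOT in the
# list of current DC addresses passed as the remaining arguments.
stale_nameservers() {
    file=$1; shift
    awk '$1 == "nameserver" { print $2 }' "$file" | while read -r ns; do
        keep=no
        for dc in "$@"; do
            [ "$ns" = "$dc" ] && keep=yes
        done
        [ "$keep" = yes ] || echo "$ns"
    done
}

# Placeholder DC addresses; substitute the real ones.
if [ -r /etc/resolv.conf ]; then
    stale_nameservers /etc/resolv.conf 192.0.2.10 192.0.2.11
fi
```

Any address it prints is a resolver the machine still points at that is not one of the current DCs, which is where a decommissioned DC's IP would show up.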
MORNING WOOD Lumber Company
Guinness for Strength!!!
x-guest

Post by x-guest »

Thanks for all your help, fellas. Finally got the problem ironed out. It turned out to be a cocktail of issues that needed to be addressed in a certain order, one by one. Ultimately, though, it did boil down to authentication/authorization issues both with the NEWER version of Windows Server (which now rejects DES encryption by default), and with how a very outdated version of Gentoo/Samba was configured (krb5.conf & smb.conf) and interacting with Kerberos (basically failing on the TGTs due to that encryption snafu). After meddling with (and fixing) the issues with the root CA on the Windows side, the instructions on this page (http://www.gentoo-wiki.info/HOWTO_Activ ... nd_Winbind) enabled me to finally resolve the issue. SVN clients are once again able to sync with the server running on the Gentoo box, and so on. All in all, this harrowing experience has led to enlightenment, as I suspect most collisions with reality often do. Feels good. Thanks again!
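The exact lines that were changed aren't listed here, so the following is only a sketch of how one might spot the DES problem on a similar box. It assumes an MIT-style /etc/krb5.conf that uses the default_tkt_enctypes, default_tgs_enctypes, or permitted_enctypes settings; find_des_enctypes is a hypothetical helper name.

```shell
# List krb5.conf enctype settings that mention single-DES types
# (des-cbc-crc, des-cbc-md5, des-cbc-md4). des3-* names do not match.
find_des_enctypes() {
    # $1: path to a krb5.conf-style file
    grep -nE '^[[:space:]]*(default_tkt_enctypes|default_tgs_enctypes|permitted_enctypes)[[:space:]]*=' "$1" |
        grep -E 'des-cbc-(crc|md5|md4)'
}

if [ -r /etc/krb5.conf ]; then
    find_des_enctypes /etc/krb5.conf || echo "no single-DES enctype lines found"
fi
```

Each line it prints (with its line number) is a setting that still lists single DES, which the 2012 R2 DCs would refuse unless DES has been explicitly re-enabled on the Windows side.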
Philip

Post by Philip »

Glad to hear you figured it out, and thanks for the follow-up so the info can help others.