Author Archives: Joe Stocker

How to share your Outlook calendar free/busy with your friends and family

Both Microsoft Exchange on-premises and the hosted version in Office 365 provide a calendar publishing feature that makes it easy to share your calendar free/busy information with your friends and family.

For end-users, it takes less than 5 minutes to have this working and only takes a few mouse clicks.

1. Log on to Outlook Web Access and click on the Calendar.

2. Click on the Share menu, and then click on “Publish this Calendar to Internet”.

3. Select the publishing detail: how much do you want to share, the full details of your calendar or just when you are free or busy? Also select the access level: should anyone be able to view your calendar, or only those who receive a link to it?

4. That’s it! Now, to share your calendar, just click on the Share menu again and select “Send Links to This Calendar”.

5. Enter the email address of your friends or family

After you click Send, the recipients will get an email with an invitation. When they click on the hyperlink to the .ics file, they will see a pop-up message asking whether to subscribe to the calendar.

After they click Subscribe, they will be able to view your calendar.

The previous steps work out of the box with Office 365. However, if you do not have Office 365, your email administrator can still set up Internet Calendar Publishing if they are running Exchange 2010 SP1 or later. For more information, see:
http://technet.microsoft.com/en-us/library/ff607475(v=exchg.141).aspx

10 Recommendations for preventing worm outbreaks

The US Department of Homeland Security issued a statement on Friday recommending that users disable Java. It’s a serious recommendation because many business applications rely on Java.
http://www.kb.cert.org/vuls/id/625617 

[Update 1/13/2013 3:43 PM PST]
Oracle has just released Java SE 7u11 – an emergency software update.
http://www.oracle.com/technetwork/java/javase/downloads/index.html 

The good news is this problem only impacts the very latest versions of Java, so organizations that are behind should be okay. Java installs an auto-updater that nags users to update, so it could be hard to predict how many systems are vulnerable without some type of software inventory tool like Microsoft’s System Center Configuration Manager or Windows Intune.
My guess is many organizations will not disable Java, either because they don’t have the tools to do so, or because they are just going to cross their fingers that they don’t get hit by a worm; perhaps the loss of productivity is greater than the potential impact of a worm. I wouldn’t hesitate to disable Java, because I recently witnessed first-hand how a modern worm can quickly bypass traditional security controls. The result was a complete loss of productivity, where users could not access file shares for days.
Consider the typical minimum safeguards that most businesses have in place today:
1. Firewall
2. Antivirus Software
3. Windows Updates

If this is all you have to defend yourself against a modern worm, it is only a matter of time before an employee, vendor or guest brings an infected system onto your network. That is when you will find out that traditional safeguards have not kept pace with the modern worms that are spreading. These worms are being written by state-sponsored organizations, not the stereotypical 16-year-old kid looking for attention. It has always been an arms race between the virus writers and the security vendors, but lately the bad guys seem to be on top. These are professional teams who sometimes directly target specific users within an organization who have elevated administrative rights on the network. They can also be financially motivated, distributing so-called “ransomware” that holds your data hostage unless you pony up the cash.
The level of sophistication that goes into these worms is astonishing. Consider the multiple attack vectors that these worms can spread through: email, network, USB thumb-drives; virtually any and all methods of propagating. They mutate themselves often to evade detection, then silently send your passwords and private information overseas. They inject themselves into known-good processes to evade detection. They can also spread by exploiting vulnerabilities in the host operating system. But usually they spread by taking advantage of people’s naivety. “But the pop-up said I had a virus on my system and it said to click here to clean it!” Yep.
This requires IT Security policies and procedures to be updated to combat the threat and innovative strategies and tactics to be developed.
I want to make an important distinction between worms/viruses and Malware. Malware infects a single system and does not spread. MalwareBytes is a tool that does a pretty effective job at removing Malware from a single system. But if you have a handful of staff supporting hundreds of users, MalwareBytes is not an effective tool to clean hundreds of systems that are simultaneously infected.

(disclaimer: the following recommendations are for educational purposes only and there is no warranty expressed or implied; use at your own risk).

1. Do not rely on traditional Antivirus alone.

Traditional antivirus engines rely on signatures to detect threats. Lately they have been getting smoked by malware, viruses and worms that automatically mutate themselves to stay a step ahead of the definition updates.
Zero-day worms are even more sophisticated: they can call home to a distributed command center that has an ever-evolving list of domain names, so you can’t block a specific static list of IP addresses or domain names at the firewall level.
Therefore, you really need to combine signature-based AV with behavior-based AV such as SONAR or Bit9. SONAR develops a profile for a process and then determines whether it is a threat based on its behavior, eliminating the dependency on virus definitions (but it should be deployed to supplement AV signatures, not completely replace them). For example, suppose a particular process tried to access the system folder and tried to call home, but does not have any running UI, and it also downloaded more than 15 files the previous day. Any one of these things alone may not be “bad”, but taken as a whole the behavioral profile is bad, and the AV can then prevent the process from executing. By taking a process’s communication characteristics into consideration, a behavior-based AV solution is much more effective than a signature-based solution alone. This is not a perfect science, as legitimate processes can be quarantined, but in a controlled environment those processes can be proactively whitelisted.
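To make the "taken as a whole" idea concrete, here is a minimal Python sketch of behavior scoring. It is not how SONAR or Bit9 actually work; the class, the weights, the threshold and the process names are all hypothetical and chosen only to mirror the example above (system folder access, calling home, no UI, more than 15 downloads).

```python
from dataclasses import dataclass


@dataclass
class ProcessObservation:
    """Hypothetical summary of what a monitor saw one process do."""
    name: str
    touched_system_folder: bool
    made_outbound_connection: bool
    has_visible_ui: bool
    files_downloaded_last_day: int


# Illustrative weights and limits only (assumptions, not vendor values).
WEIGHTS = {"system_folder": 2, "calls_home": 2, "no_ui": 1, "many_downloads": 2}
DOWNLOAD_LIMIT = 15
BLOCK_THRESHOLD = 5


def risk_score(obs):
    """Add up the weights of every suspicious behavior that was observed."""
    score = 0
    if obs.touched_system_folder:
        score += WEIGHTS["system_folder"]
    if obs.made_outbound_connection:
        score += WEIGHTS["calls_home"]
    if not obs.has_visible_ui:
        score += WEIGHTS["no_ui"]
    if obs.files_downloaded_last_day > DOWNLOAD_LIMIT:
        score += WEIGHTS["many_downloads"]
    return score


def should_block(obs, whitelist=frozenset()):
    """Block only when the combined profile is bad and the process is not whitelisted."""
    if obs.name.lower() in whitelist:
        return False
    return risk_score(obs) >= BLOCK_THRESHOLD


suspicious = ProcessObservation("dropper.exe", True, True, False, 20)
browser = ProcessObservation("browser.exe", False, True, True, 30)
print(should_block(suspicious), should_block(browser))
```

No single behavior reaches the threshold on its own, which is the point: the browser calls home and downloads a lot but is not blocked, while the process that shows every indicator is. The whitelist parameter corresponds to proactively approving legitimate processes that would otherwise be quarantined.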

So do yourself a favor and deploy the latest AV solution possible, with the most locked down configuration that still allows your applications to function. Security has always been a trade-off between productivity and security, but many are predicting 2013 to be the year of the worm, so it is important to be very proactive and not wait until it is too late. 

2. Do not give end-users local Administrator rights to their computers

If a virus cannot gain a foothold onto the computer to begin with, then half the battle is already won. In the past, this type of configuration would result in increased helpdesk requests (and increased support costs) because end users had to rely on someone else to install printers and software on their systems.
However, the last three major versions of Windows include a feature called User Account Control (UAC) that allows the user to run under a non-privileged account, and supply credentials only when necessary (a process known as elevation). Many IT departments are quick to disable this feature for fear of complaints from users, and to those departments I say it is time to re-evaluate that decision.

Worms that exploit Windows vulnerabilities do not require local administrator privileges to spread; they can perform a privilege escalation to grant themselves administrative rights if the system has not been kept up to date with Microsoft updates. Worms like W32.ChangeUP disable the registry key for Windows Update to prevent the machine from fixing those vulnerabilities.

IT users with Domain Administrator rights must have a separate username and password that they use sparingly, only to perform those duties that require elevated rights. Otherwise, if a worm executes on a machine under an account with domain admin rights, say goodbye to your network.

3. Patch 3rd party products like Java, Acrobat and Flash

How do you patch 3rd party software today? Windows Server Update Services (WSUS) cannot do it. There are three methods native to Microsoft: Group Policy or Scripting, System Center Configuration Manager or Intune (kind of like a Cloud-based SCCM).
Windows Update alone is not enough to protect your network from worms and viruses. It is now mandatory to patch applications like Adobe Acrobat, Flash and Java.

As evidenced by the DHS Java announcement, viruses and worms are spreading not just by exploiting vulnerabilities in Internet Explorer and Windows, but they are increasingly exploiting Adobe Acrobat and Java.
Windows Intune can be used to effectively deploy software updates to computers. It is similar to its big brother, System Center Configuration Manager, but Intune runs in the cloud, so there is no back-end infrastructure to set up or maintain.

4. Disable Auto Run

Many worms spread by attaching themselves to network file shares and placing an Autorun.inf file on the share. When the user opens the folder, Autorun.inf will cause a virus to load, even if the user did not open an executable file directly.
Auto Run can be disabled via Group Policy. There are two policies to update: one for XP and one for Vista/Win7/Win8.

Vista/Win7/Win8 Group Policy Setting:

Computer Configuration, expand Administrative Templates, expand Windows Components, and then click Autoplay Policies.
In the Details pane, double-click Turn off Autoplay.
Reboot client computers.

Windows XP

Computer Configuration, expand Administrative Templates, and then click System.
In the Settings pane, right-click Turn off Autoplay, and then click Properties.
Note: In Windows 2000, the policy setting is named Disable Autoplay.
Click Enabled, and then select All drives in the Turn off Autoplay box to disable Autorun on all drives.
Click OK to close the Turn off Autoplay Properties dialog box.
Restart client computers.
http://support.microsoft.com/kb/967715 
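Behind the "All drives" choice, the policy writes a bitmask to the NoDriveTypeAutoRun registry value described in the KB article above. The Python sketch below shows how that mask is composed and renders a .reg fragment for review. It assumes the bits follow the usual drive-type numbering and that the value lives under the HKLM Policies\Explorer key; the helper names are hypothetical, and Group Policy remains the recommended way to deploy the setting.

```python
# Bit flags for the NoDriveTypeAutoRun value (assumed drive-type numbering).
DRIVE_TYPE_BITS = {
    "unknown": 0x01,
    "no_root_dir": 0x02,
    "removable": 0x04,
    "fixed": 0x08,
    "network": 0x10,
    "cdrom": 0x20,
    "ramdisk": 0x40,
    "reserved": 0x80,
}

# Registry key assumed from KB967715 (machine-wide policy location).
POLICY_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer"


def autorun_mask(drive_types):
    """OR together the bits for every drive type that should have Autorun disabled."""
    mask = 0
    for drive_type in drive_types:
        mask |= DRIVE_TYPE_BITS[drive_type]
    return mask


def reg_file_text(mask):
    """Render a .reg fragment that sets NoDriveTypeAutoRun to the given mask."""
    if not 0 <= mask <= 0xFF:
        raise ValueError("mask must fit in one byte")
    return (
        "Windows Registry Editor Version 5.00\n\n"
        f"[{POLICY_KEY}]\n"
        f'"NoDriveTypeAutoRun"=dword:{mask:08x}\n'
    )


all_drives = autorun_mask(DRIVE_TYPE_BITS)  # every drive type selected
print(hex(all_drives))
print(reg_file_text(all_drives))
```

Selecting every drive type yields 0xFF, which matches the "All drives" option in the policy dialog. Selecting only removable and network drives would yield 0x14.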

5. Enable Windows Firewall.

This can prevent a worm from scanning and spreading itself on various ports. Windows Firewall could potentially disrupt valid business applications so be sure to test this and any other configuration before deploying in a production environment.

6. Deploy a virus cleaner on computer startup

[Updated 1/18/2013]

One technique is to deploy a free tool like McAfee’s stinger.exe. This is a stand-alone executable that can remove many of the worms out there.

Put this in the Domain controller’s sysvol\domains\scripts folder because it is shared out as the netlogon folder, and that way clients will download stinger.exe from their nearest domain controller to minimize the impact on the WAN.

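REM Begin cleaner.cmd
REM Skip the scan if stinger.exe was already copied (and run) on this machine.
if exist "%userprofile%\stinger.exe" goto end
echo not yet
REM Copy the scanner from the nearest domain controller's netlogon share.
copy "\\contoso.com\netlogon\stinger.exe" "%userprofile%"
REM Flags restored to ASCII double hyphens (assumed); confirm against your Stinger version.
"%userprofile%\stinger.exe" --adl --delete --go --silent
:end
REM End of cleaner.cmd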

Notice that if stinger.exe already exists, the script won’t run it a second time. That prevents the scan from running more than once, because it consumes a lot of CPU (end users might want to be informed that their computers may slow down a bit).

Then create a group policy that references that cmd file. I recommend putting it in the computer startup scripts so that it runs as local system rather than as a user process. Then email the users and tell them to reboot to take effect.

7. Deploy Network Access Protection (NAP)

Network Access Protection (NAP) is really important to deploy on VLANs where your critical line of business systems are located. Imagine the scenario where someone takes their laptop home, and their child unknowingly downloads a virus on the machine while playing an online game. When the adult brings that system back into work, the worm could spread the moment they plug into the network. They could also do the same damage if they connect to the network from home over a VPN connection. By deploying NAP, the system will first have to go through a health check to validate that AV is running, has the latest virus definitions, and has the latest Windows updates. If it passes the checks, then it can be permitted to communicate on the network.
Deploying NAP takes a serious commitment because it may involve re-architecting the network boundaries to accommodate the multiple requirements.
http://technet.microsoft.com/library/cc771746.aspx 

8. Use File Screening on your file servers

[Updated 1/18/2013]

Windows Server 2003 R2 and up has the ability to block .exe files and .inf files from being placed on file shares. This can be an effective technique to prevent worms from placing themselves on file shares.

2008 R2 Instructions:
http://technet.microsoft.com/en-us/library/cc732074.aspx

2003 R2 Instructions:
http://technet.microsoft.com/en-us/library/cc755492(v=ws.10).aspx
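The matching logic behind a file screen is simple wildcard matching on the file name. The Python sketch below only illustrates that logic; it does not configure anything on a file server, and the pattern list and function name are assumptions for the example.

```python
from fnmatch import fnmatchcase

# Patterns to block, mirroring the .exe and .inf examples above (assumed list).
BLOCKED_PATTERNS = ("*.exe", "*.inf")


def is_blocked(filename, patterns=BLOCKED_PATTERNS):
    """Return True if the file name matches any blocked pattern, ignoring case."""
    name = filename.lower()
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ("Autorun.INF", "setup.exe", "report.docx"):
    print(candidate, is_blocked(candidate))
```

Lower-casing the name first matters because Windows file names are case-insensitive, so Autorun.INF and autorun.inf must be treated the same. Test any real screen against your line-of-business applications first, since some legitimately store executables on shares.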

9. Adopt Defense in Depth

Deploy multiple levels of antivirus and defense.  Select different vendors at each layer of your network. It is a mistake to deploy the same antivirus engine at the gateway or web proxy that you do on your desktops. Otherwise the virus that evades your web filter will also evade your desktop. Not filtering web requests? Your users can unknowingly download viruses into your network by checking their personal email and downloading threats from email attachments that do not go through your hardened email server.
I recommend using OpenDNS (paid) or Dyn.com Security Guide (free) to filter DNS requests to known domains that host spyware and malware.

10. DNS Sinkholing

[Updated 1/18/2013]


DNS sinkholing is an effective technique where, instead of blocking IP addresses at your firewall, you host the DNS zones that the worm tries to look up. Each DNS zone is populated with the IP address of your IDS sensor. This is similar to a Wi-Fi honeypot or tarpit. This is effective for two reasons:
1. It provides the worm a DNS response, so the worm does not attempt to look up any other domain names. It thus prevents the worm from calling home and getting a new variant.
2. It provides your IDS sensor the exact IP addresses of the infected hosts so that your incident response team can go and clean those systems. This is more effective than firewall logs because those might only show the previous hop if the last gateway strips off the original host IP.

How to deploy DNS sinkholing quickly.
Worms can use dozens of DNS zones to call home, so the quickest way to create the zones is to use the DNSCMD command built-into Windows:

Step 1: Create the zones
dnscmd /zoneadd ddnsd.at /DsPrimary
dnscmd /zoneadd noip.at /DsPrimary
dnscmd /zoneadd 3d-game.com /DsPrimary
… (repeat for all zones). Note: DsPrimary means the zone is Active Directory-integrated, so it will replicate to all domain controllers. You only have to run this on a single DC and the zones will replicate everywhere. You can later clean these zones up with another dnscmd script.

Step 2: Populate the zones with @ records pointing to your IDS sensor
dnscmd /RecordAdd 3d-game.com @ A 192.168.1.2
(change 192.168.1.2 to the IP address of your IDS sensor, and repeat for all zones)
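With dozens of zones, typing each command by hand gets tedious. As a rough sketch, the Python script below generates both sets of dnscmd lines from a list of zone names, so you can paste the output into a .cmd file and review it before running it on a DC. The function name is hypothetical; the zone names and sensor address are the same examples used above.

```python
import ipaddress


def sinkhole_commands(zones, sensor_ip):
    """Build the dnscmd lines for Step 1 (zones) and Step 2 (@ records)."""
    ipaddress.IPv4Address(sensor_ip)  # raises ValueError on a malformed address
    commands = []
    # Step 1: create every zone as AD-integrated so it replicates to all DCs.
    for zone in zones:
        commands.append(f"dnscmd /zoneadd {zone} /DsPrimary")
    # Step 2: point the root (@) record of every zone at the IDS sensor.
    for zone in zones:
        commands.append(f"dnscmd /RecordAdd {zone} @ A {sensor_ip}")
    return commands


if __name__ == "__main__":
    example_zones = ["ddnsd.at", "noip.at", "3d-game.com"]
    for line in sinkhole_commands(example_zones, "192.168.1.2"):
        print(line)
```

Keeping the zone list in one place also makes the later cleanup script easier to write, since the same list can drive it.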

Summary

Even if you do all the things recommended in this article, you could still get hit by a zero day worm. Therefore, it is important to review your antivirus logs regularly (daily if possible) or configure email alerts so that you can become aware of outbreaks as soon as possible. Make sure you have your Antivirus vendor contact information and support contract numbers at hand. If your network is compromised, engage your Antivirus vendor early in the process so that you can upload the specific strain of worm that has infected your network. They can tell you which virus definitions are effective for removing the threat. This is especially important if it is a zero day threat, or a threat that mutates daily. Communicate to your end users early so that they know what to avoid clicking on. As part of a Business Continuity Plan, departments should have plans for how their business processes can continue to operate without computers. Develop a communication plan for how IT will communicate with each other and key decision makers and end users if the email system is incapacitated.
There are many things you can do proactively to safeguard your network. Hiring a dedicated Security Engineer with CISSP certification is a great start. Hiring an outside consulting company to give you an objective analysis of your strengths and weaknesses is another good idea, and then having them come back to measure you against this first baseline periodically is also a good idea. Providing security awareness training for your end users is also very important.
I think it is also important to keep a level head and not overreact to every news article about the latest threat. Don’t overwhelm your users with scary emails. Sometimes our response to a problem can create a worse situation than any virus or worm outbreak. Therefore our responses should be carefully measured and tested when possible.

Some worms spread by guessing weak passwords on servers, shares and SQL applications. Most publicly traded companies are required to change their passwords frequently and should have strong passwords. Private companies are advised to follow suit, as this is a wise practice to adopt.

Why backups are important

If a worm or virus does some damage, you may need to restore from Backup.
Before you restore from backup, develop an Incident Response Procedure to inform users about any potential data loss that could occur as a result of the restore. If possible, perform one last backup prior to the restore so that you can selectively restore any valid files that may have been saved by users after the last backup was taken. Do not perform the restore until after the threat has been eliminated from the network; otherwise the restored files could become re-infected, wasting valuable time and frustrating end users.

Keep calm and carry on.

Active Directory Migration Tool (ADMT) Walkthrough

The Active Directory Migration Tool (ADMT; the latest version is v3.2) is a free tool that supports both inter-forest and intra-forest migration of users, groups and computers.

ADMT version 3.2 must be installed on a Windows Server 2008 R2 server (a member server is highly recommended). The only prerequisite is SQL Express.

An Inter-Forest migration is popular when an organization merges with another organization. An Inter-forest migration requires a forest trust between the two forests. This in itself requires name resolution between the domain controllers and implies WAN connectivity as well.

Objects can be migrated and merged into the target repeatedly if it is necessary to edit the source object even after the new target object has been created. The ADMT guide goes through this in detail.

The trust relationship must be configured to permit SIDHistory to flow across the forest trust. This can be done with the following command:
Netdom trust <Source Domain> /domain:<Target Domain> /EnableSidHistory:yes

Passwords can be migrated using the Password Export Server (PES) v3.1, or new passwords can be generated. PES is a separate download and is installed on the source domain controller.

PES performs an initial sync of the password and can be used for subsequent password updates but was not designed to be used as a password sync tool. For that you would need Forefront Identity Manager (FIM) and Password Change Notification Service (PCNS). PCNS has its own set of requirements, for example, it must be installed on each domain controller in the source domain whereas PES only needs to be installed on a single domain controller (the one you select as the source domain controller during ADMT migration).

Setting up PES is a little tricky because you must first generate an encryption key on the ADMT member server located in the target domain.

http://technet.microsoft.com/en-us/library/cc974435(v=ws.10).aspx

Note: Microsoft recommends that you run the PES service as an authenticated user in the target domain. This way, you do not have to add the Everyone group and the Anonymous Logon group to the Pre–Windows 2000 Compatible Access group.

Notice the warning: You must reboot before ADMT’s Password Migration DLL will be operational.

After the reboot, the service does not start automatically and needs to be started manually.

You can then invoke the Password Migration Wizard on the ADMT member server. If you get an error “unable to establish a session with the password export server” – check to make sure the “Password Export Server Service” is running on the source domain controller.

Note: The flag “user must change password after first logon” will be set on the target user after migration with ADMT. This is because ADMT does not check the target domain’s password policy to see whether the source password is compliant. You can manually deselect this option if you have set the target domain policy with a weaker password policy (ex: password complexity disabled, a minimum password length less than or equal to the source domain’s, etc.). Otherwise the user will be required to change their password immediately after logging into the target domain.

After the users have been migrated it is necessary to run the Security Translation Wizard from within the ADMT tool against the source domain controller and resource servers (ex: File/Print Servers). This allows the users to access file and print resources in the source domain without error.

After the users have been migrated to the target domain, the next step is to migrate their computer accounts. This is recommended so that the user’s local profile (ex: desktop wallpaper, files on the desktop, mapped network drives, etc.) carries over and there is minimal impact on the end user.

However, if Windows Firewall is enabled then the ADMT tool may not be able to connect to the machine to translate the user profile.
After resolving that issue, you may run into another error “unable to determine the local path for ADMIN share on the machine” – The solution is to make sure the target admins have sufficient permission on the source computers. For example, you could use group policy to add the new domain admins group into the local administrators group of the source computers. For testing you could manually add it as this blog article suggests:
http://msexchangetips.blogspot.com/2010/08/admt-32-err27674-unable-to-determine.html

ADMT pushes an agent out to computers to perform the Security Translation activity. To verify that the ADMT processes are running, you can use Task Manager to check that ADMTAgnt.exe and DctAgentServices.exe appear in the Processes tab. They will terminate upon completion.  The local working directory for the agent is
%windir%\onepointdomainagent

The agent needs sufficient rights, and the dynamic RPC TCP port range (1024 to 5000) must be open, so that it can write back to the ADMT member server in the target domain (c:\windows\ADMT\Logs, accessed via the ADMIN$ share).
http://technet.microsoft.com/en-us/library/cc974403(v=ws.10).aspx

There are 3rd party tools that can simplify things by integrating the password synchronization and providing an undo option. The most popular tool is Quest Migration Manager for Active Directory:
http://www.quest.com/migration-manager-for-active-directory/ 

Best practice is to create a test lab and run through all the steps before making any attempts in a production environment.

Resources:

Screenshots of the blog article above as originally posted on my former blog:

http://blogs.catapultsystems.com/IT/archive/2012/12/30/active-directory-migration-toolkit-admt-walkthrough.aspx

ADMT v3.2 Download (free)
http://www.microsoft.com/en-us/download/details.aspx?id=8377

Password Export Server v3.1 x86 Download (free MSFT Tool)
http://www.microsoft.com/en-us/download/details.aspx?id=10370
x64 Download:
http://www.microsoft.com/en-us/download/details.aspx?id=1838

SQL 2008 SP3 (Free PreRequisite)
http://www.microsoft.com/en-us/download/details.aspx?id=27597

ADMT Guide
http://technet.microsoft.com/en-us/library/cc974332(v=ws.10).aspx

Decent Video Walkthrough of the Installation
http://www.youtube.com/v/KHIWlWFf2AM?hd=1

How to Quarantine unauthorized smartphones with Exchange or Office 365

Some organizations have a mobile device policy where they only permit company-owned phones to connect to their email server. They want to prevent employee-owned or rogue devices from establishing an ActiveSync connection.

Exchange 2010 and Office 365 provide the ability to quarantine phones that attempt to enroll in an ActiveSync relationship. This permits an administrator to review the device before approving it.

The process works very well because the user receives an email letting them know that their device is pending administrator approval. The administrator receives an email letting them know a new device requires approval.

Configuring it is also very simple. Just sign into the Exchange Control Panel (ECP) and click a few boxes.

Note: this setting will apply to all existing phones, so you will need to be prepared to perform a one-time mass approval for existing phones that are already connected. An email will be generated to users that their phone is in quarantine, which might be unsettling to some users, so I recommend sending an email in advance to inform them they can ignore the email. Perhaps there is a way to prevent this behavior from occurring for existing devices and only allow it to occur for new devices, but I have not found that option yet.

After this has been configured, you may want to delegate fine-grained RBAC rights to your mobile phone administrators so that they can approve these devices without having too many additional privileges within Exchange.

New-ManagementRole "ActiveSync User Options" -Parent 'User Options'

New-ManagementRole "ActiveSync Client Access" -Parent 'Organization Client Access'

Get-ManagementRoleEntry -Identity 'ActiveSync User Options\*' | Where {$_.Name -notlike "*activesync*"} | Remove-ManagementRoleEntry -Confirm:$False

Get-ManagementRoleEntry -Identity 'ActiveSync Client Access\*' | Where {$_.Name -notlike "*activesync*"} | Remove-ManagementRoleEntry -Confirm:$False

Remove-ManagementRoleEntry 'ActiveSync Client Access\Set-ActiveSyncOrganizationSettings'
Remove-ManagementRoleEntry 'ActiveSync Client Access\Set-ActiveSyncDeviceAccessRule'
Remove-ManagementRoleEntry 'ActiveSync Client Access\Remove-ActiveSyncDeviceAccessRule'
Remove-ManagementRoleEntry 'ActiveSync Client Access\New-ActiveSyncDeviceAccessRule'

New-RoleGroup 'ActiveSync Access Admins' -Roles 'ActiveSync User Options', 'ActiveSync Client Access'

Add-RoleGroupMember "ActiveSync Access Admins" -Member [email protected]

The delegated administrator should then see quarantined devices in the Exchange Control Panel.

The link to my original blog post with pictures is available here:

http://blogs.catapultsystems.com/IT/archive/2012/11/30/how-to-quarantine-unauthorized-smartphones-with-exchange-2010-or-office-365.aspx

Office 365 free busy not working with Exchange 2003

In an Exchange 2003 and Office 365 hybrid deployment, Office 365 users are able to view the free/busy information of Exchange 2003 users or resources. However, Exchange 2003 users may unexpectedly be unable to view the free/busy information of Office 365 users or resources.

Many configuration issues may cause this to occur. In the case of my customer, the fix was to change the Default permission on the free/busy folder from Author to Editor, as described here:

http://blogs.technet.com/b/hot/archive/2012/03/30/an-exchange-server-2003-user-cannot-view-the-free-busy-information-of-office-365-resources-or-users-within-a-hybrid-deployment.aspx

This can be accomplished in the Exchange 2003 System Manager. Right-click the folder with the EXTERNAL name and select Properties.

Click Client Permissions

Change the Default Permission from Author to Editor

If you are lucky, this will solve the problem for you. If not, there are a few other things to try.

The first issue that you should be aware of is that Outlook Web Access (OWA) cannot view free/busy for a mailbox that resides in Exchange Online.

http://community.office365.com/en-us/wikis/officeapps/558.aspx 

There are some articles that recommend setting the LegacyExchangeDN parameter on mailboxes, but I did not have to do that.
http://technet.microsoft.com/en-us/library/hh310374.aspx
http://community.office365.com/en-us/wikis/officeapps/558.aspx
http://community.office365.com/en-us/forums/162/t/55245.aspx

There are also articles that recommend hardcoding the targetsharingepr record but I think that was only necessary when Exchange Online was in Beta. For example, they said to run this command
Set-OrganizationRelationship "CompanyABC" -TargetSharingEpr https://mail.companyabc.com/EWS/Exchange.asmx/WSSecurity
http://blogs.technet.com/b/neiljohn/archive/2011/08/15/office-365-hybrid-deployment-exchange-rich-coexistence-sharing-availability-free-busy.aspx
Again, I don’t think that is necessary with the current builds because it is not mentioned in the Exchange Deployment Assistant documentation.

There are also issues with routing group configuration that could cause problems with free/busy. One of our other customers ran into this so you should see whether this impacts you: http://blogs.technet.com/b/messaging_with_communications/archive/2011/09/09/office365-exchange-2003-free-busy-coexistence.aspx

One helpful technique to isolate the issue is to create a user on the Exchange 2010 hybrid server. If that user can view free/busy for Exchange Online mailboxes, then the issue is isolated to public folder configuration, since Exchange 2010 users do not rely on public folders for free/busy. However, if an Exchange 2010 mailbox cannot view Exchange Online free/busy, then the problem could be with the organization relationship or the Autodiscover DNS records.

http://support.microsoft.com/kb/2555008

http://blogs.technet.com/b/hot/archive/2012/03/30/free-busy-information-is-not-being-shared-between-cloud-and-on-premise-accounts-error-code-5037.aspx

Another article I found helpful explains how to validate that the free/busy folders exist to begin with.

http://support.microsoft.com/kb/2555008

Original blog post with images is available here:

Office 365 free busy not working with Exchange 2003

SQL Transaction Logs filling up disk

This is the most common cause of SQL outages and crashes. The SQL transaction logs grow until they completely fill up the hard disk, causing the databases to go offline and disrupting service to any application that depends on SQL, including mission-critical applications like SharePoint, Lync, Project Server, etc.

This is preventable by doing one of two things:

1) Change the database recovery option from Full to Simple*

or

2) Create a transaction log backup job. This can be done using the native SQL backup or a 3rd party backup software. To keep the log from filling up again, schedule log backups frequently.

When the recovery mode for a database is set to Full, then a transaction log backup job must be created in addition to backing up the database itself.
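For reference, both options come down to a single T-SQL statement each. The Python sketch below only builds those statements as text so they can be reviewed or pasted into a scheduled job; it does not connect to SQL Server. The helper names, the database name and the backup path are hypothetical.

```python
def _bracket(name):
    """Quote an identifier in square brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def set_simple_recovery_sql(database):
    """Option 1: switch the database to the Simple recovery model."""
    return f"ALTER DATABASE {_bracket(database)} SET RECOVERY SIMPLE;"


def backup_log_sql(database, backup_file):
    """Option 2: back up the transaction log to a file on disk."""
    path = backup_file.replace("'", "''")  # escape single quotes for T-SQL
    return f"BACKUP LOG {_bracket(database)} TO DISK = N'{path}';"


# Example values are placeholders, not recommendations.
print(set_simple_recovery_sql("WSS_Content"))
print(backup_log_sql("WSS_Content", r"E:\Backups\WSS_Content.trn"))
```

Option 2 only helps if the statement runs on a schedule, for example from a SQL Agent job or your backup software, and if that job is monitored for failures.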

Tips:

1. The backup jobs should be monitored because if they fail for any reason, and the logs continue to grow until they fill up the disk, then SQL will crash along with any application that depends upon SQL.

2. The disk where the transaction logs are stored should be large enough to allow for multiple days of failed backup jobs (just in case your DBA is on vacation and the backups happen to fail for that duration of time, and they are not there to respond to the failed jobs).

3. For performance reasons, the transaction logs should be isolated to their own disks because SQL writes logs sequentially, whereas data written to the MDF file is written with random I/O. This is more relevant for Direct Attach Storage (DAS) and less relevant for a Storage Area Network (SAN) where large amounts of cache reside on the disk controller to offset the write transactions.

4. For high availability reasons, the transaction logs should be isolated to their own disks when databases are set to the Full recovery mode, because if they reside on the same disk that hosts the MDF database file and that disk crashes, then your ability to recover is limited to the last full backup. By having the transaction logs on a separate disk, you give yourself the ability to recover to the point of failure (steps: restore the last full backup, then restore the transaction logs).

5. Your SQL backups should not be stored on the same server as the SQL server itself. If the server crashes, how will you get to your backups? Many companies use 3rd party backup software to pull the database backups off the server, and simultaneously use the native SQL backups to keep a 2nd set of backups locally. The benefit is that you now have two backup copies of your data, and it is usually faster and more reliable to recover data using the native SQL backups than using 3rd party backup software (this scenario applies when the server itself has not crashed and you are simply trying to recover data that was deleted).
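
To put a number on tip 2, here is a small back-of-the-envelope sketch in Python. The daily log growth figure, the number of days, and the 20% headroom are assumptions you would replace with measurements from your own environment; the function name is hypothetical.

```python
def required_log_disk_gb(daily_log_growth_gb: float,
                         days_without_backup: int,
                         headroom: float = 0.2) -> float:
    """Estimate the log disk size needed to survive several days of failed log backups.

    daily_log_growth_gb: how much the transaction log grows per day (measured, in GB)
    days_without_backup: how many days of failed backups the disk should absorb
    headroom:            extra safety margin as a fraction (0.2 means 20%)
    """
    if daily_log_growth_gb < 0 or days_without_backup < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    return daily_log_growth_gb * days_without_backup * (1 + headroom)


if __name__ == "__main__":
    # Assumed example: 5 GB of log growth per day, a 7-day vacation, 20% headroom.
    print(f"{required_log_disk_gb(5, 7):.1f} GB")
```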

*Before changing the database recovery option from Full to Simple, you should understand the difference between these two modes, because with Simple you are choosing to only recover to the point of the last full (or differential) backup. This may be okay for non-critical data, and only the custodian of the data can make that decision. See this MSDN article for more information.

It is worth noting that backing up the transaction log will not reduce the physical size of the .LDF file (this is a common misconception). The backup will only free up the internal free space inside the LDF file. To reduce the physical size of the LDF file you must shrink the transaction log.  Normally this is not something you

If you do not have a DBA to monitor and respond to SQL backups then consider hiring a company who will monitor this for you.

How to verify that SQL Server is using the best practice NTFS Cluster Size

 

When planning the NTFS cluster size for a new volume the general consensus with the latest versions of Microsoft SQL Server is to use 64k.

So if you are auditing an existing installation, the following command-line utility can be used to query the NTFS file system to return whether this best practice was followed.

C:\Windows\system32>fsutil fsinfo ntfsinfo d:

NTFS Volume Serial Number :       0x32fe73cbfe7385bf
Version :                         3.1
Number Sectors :                  0x000000001beb87ff
Total Clusters :                  0x000000000037d70f
Free Clusters  :                  0x000000000037d1a9
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Cluster :               65536
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000000010000
Mft Start Lcn  :                  0x000000000000c000
Mft2 Start Lcn :                  0x0000000000000001
Mft Zone Start :                  0x000000000000c000
Mft Zone End   :                  0x000000000000cca0
RM Identifier:        582D805C-3F5D-11E0-90B5-0017A4770826

 

You will notice that the Bytes Per Cluster value returned is 65536 rather than 64000. That is the expected result when 64k is chosen from the format menu, because storage is calculated in multiples of 1024 bytes: 64 x 1024 = 65536.
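
If you need to audit several volumes, the check can be scripted. The following Python sketch assumes you have already captured the text output of fsutil fsinfo ntfsinfo (for example, by redirecting it to a file); it only parses that text and compares the Bytes Per Cluster value against 64 x 1024. The function names and the SAMPLE text are my own illustration.

```python
import re

RECOMMENDED_CLUSTER_BYTES = 64 * 1024  # 64 KB = 65536 bytes


def bytes_per_cluster(fsutil_output: str) -> int:
    """Extract the 'Bytes Per Cluster' value from fsutil ntfsinfo text output."""
    match = re.search(r"^\s*Bytes Per Cluster\s*:\s*(\d+)", fsutil_output, re.MULTILINE)
    if match is None:
        raise ValueError("'Bytes Per Cluster' line not found")
    return int(match.group(1))


def uses_64k_clusters(fsutil_output: str) -> bool:
    """Return True when the volume was formatted with a 64 KB cluster size."""
    return bytes_per_cluster(fsutil_output) == RECOMMENDED_CLUSTER_BYTES


# A few lines copied from the output shown above, used as sample input.
SAMPLE = """\
Bytes Per Sector  :               512
Bytes Per Cluster :               65536
Bytes Per FileRecord Segment    : 1024
"""

if __name__ == "__main__":
    print(uses_64k_clusters(SAMPLE))
```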

What defines success for IT Operations?

 

What must an IT Department do to be successful? The Operations Department within IT requires diligence across many technology disciplines. Here are some suggestions for IT Operations Management, that if met, will bring IT Operations closer to success.

  1. When the latest security patches have been applied to all servers.
  2. When all hardware is operational. There are no known failed components in the infrastructure. A streamlined process is in place to detect and respond to failed components. We also monitor the life cycle of equipment to make sure that critical systems are always under warranty.
  3. When all critical devices are monitored 24/7 and IT staff is notified when a failure event occurs.
  4. When line-of-business applications have sufficient bandwidth to perform their role. A monitoring solution should alert IT when network traffic exceeds 70% of link capacity, because WAN links become saturated at this level and TCP retransmissions will occur, causing latency within applications.
  5. When servers have sufficient hard drive space to perform their role.
  6. When servers and workstations are protected from viruses, worms and advanced persistent threats (APTs).
  7. When servers are protected from data loss. For example, Exchange Native Protection does not protect you if all copies of the DAG databases are taken offline by an external hacker, an internal disgruntled admin, or a worm.
  8. When servers are fast or adequately responsive to end user requests. Synthetic transactions are helpful for measuring performance against previously accepted baselines.
  9. When servers have sufficient capacity to not only meet existing need, but to handle data and transactional growth for the next twelve months. This helps you be less reactive when problems occur. Using Azure IaaS helps because of the Autoscale feature.
  10. When all servers are provisioned with the smallest attack surface possible.
  11. When IT can respond to a request to provision a server in minutes.  
  12. When IT discusses and then tests changes before implementing them in a production environment. Using Virtualization can help reduce the cost of implementing change management.
  13. When the most critical systems are clustered.
  14. When the IT staff has a good work/life balance. For example, creating a single weekend where all patches or maintenance is performed can reduce turnover compared to allowing IT Operations staff to work most nights and weekends.
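
As one small example, the 70% bandwidth threshold from item 4 could be expressed as a check like the Python sketch below. The function names and the sample link speeds are hypothetical; a real monitoring product would supply the measured traffic figures.

```python
ALERT_THRESHOLD = 0.70  # alert when a link is more than 70% utilized


def link_utilization(traffic_bps: float, link_capacity_bps: float) -> float:
    """Return link utilization as a fraction of capacity (0.5 means 50%)."""
    if link_capacity_bps <= 0:
        raise ValueError("link capacity must be positive")
    return traffic_bps / link_capacity_bps


def should_alert(traffic_bps: float, link_capacity_bps: float,
                 threshold: float = ALERT_THRESHOLD) -> bool:
    """Return True when measured traffic exceeds the alert threshold."""
    return link_utilization(traffic_bps, link_capacity_bps) > threshold


if __name__ == "__main__":
    # Assumed example: 80 Mbps of traffic measured on a 100 Mbps WAN link.
    print(should_alert(80e6, 100e6))
```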

 

Please leave a comment below if you have any other suggestions to add to this list.

Locked out of SQL?

I recently found myself locked out of a demo environment after removing the builtin\administrators group from SQL Server’s sysadmin role (a good practice, by the way). I needed to get into SQL Management Studio.

This particular installation of SQL was running under the LocalSystem account. Using PsExec from Sysinternals, I was able to launch a command prompt running as LocalSystem: psexec -i -s cmd.exe

I was then able to run SQL Management Studio (by right-clicking on the shortcut and pasting the path into the elevated cmd shell) and add my domain account back in as sysadmin =)