BlobCache issues with time difference between SharePoint WFE and SQL

We recently ran into an interesting issue: when a user uploaded an image into SharePoint and then tried to view it, they would receive an error.  For the rest of the day they would continue to get the error when viewing that image, yet the image worked fine for everyone else.  If the user cleared their browser cache, the image started working for them.  Also, if the user waited a few minutes after uploading an image before viewing it, everything worked as expected.  The end user saw “An unexpected error has occurred”, but digging into the real error revealed the following:

Message: Specified argument was out of the range of valid values.
Parameter name: utcDate
Stack Trace:
   at System.Web.HttpCachePolicy.UtcSetLastModified(DateTime utcDate)
   at System.Web.HttpCachePolicy.SetLastModified(DateTime date)
   at Microsoft.SharePoint.Publishing.BlobCache.<>c__DisplayClass42.<SendCachedFile>b__41()
   at Microsoft.SharePoint.SPSecurity.<>c__DisplayClass4.<RunWithElevatedPrivileges>b__2()
   at Microsoft.SharePoint.Utilities.SecurityContext.RunAsProcess(CodeToRunElevated secureCode)
   at Microsoft.SharePoint.SPSecurity.RunWithElevatedPrivileges(WaitCallback secureCode, Object param)
   at Microsoft.SharePoint.SPSecurity.RunWithElevatedPrivileges(CodeToRunElevated secureCode)
   at Microsoft.SharePoint.Publishing.BlobCache.SendCachedFile(HttpContext context, BlobCacheEntry target, SPUserToken currentUserToken, SiteEntry currentSiteEntry)
   at Microsoft.SharePoint.Publishing.BlobCache.HandleCachedFile(HttpContext context, BlobCacheEntry target, Boolean anonymousUser, SiteEntry currentSiteEntry)
   at Microsoft.SharePoint.Publishing.BlobCache.RewriteUrl(Object sender, EventArgs e, Boolean preAuthenticate)
   at Microsoft.SharePoint.Publishing.PublishingHttpModule.AuthorizeRequestHandler(Object sender, EventArgs ea)
   at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Looking at the error told me a few things.  One, this was time related, and two, it only affected files served from the BLOB cache.  That also explained why clearing the browser cache “fixed” the issue for the affected user and why other users never saw the error.  Because of the way SharePoint’s BlobCache optimizes requests, it only sends the image to the browser if the image has changed since the browser cached its copy.  Since the user’s first view of the image generated an error, every view after that was just the browser displaying its cached copy of the error.  This was also apparent because the correlation ID was the same GUID each time they viewed the image.  In SharePoint, each request gets its own GUID as the correlation ID, so it should never be the same between requests.
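To make that caching behavior concrete, here is a minimal sketch of the conditional GET optimization described above.  This is an illustration of the general technique, not SharePoint’s actual BlobCache code, and the method name is made up:

using System;
using System.IO;
using System.Web;

// Sketch of a conditional GET: if the browser's cached copy is still
// current, answer 304 Not Modified and send no body at all.
public static void ServeCachedFile(HttpContext context, string physicalPath)
{
	DateTime lastModified = File.GetLastWriteTimeUtc(physicalPath);
	string ifModifiedSince = context.Request.Headers["If-Modified-Since"];

	DateTime since;
	if (ifModifiedSince != null
		&& DateTime.TryParse(ifModifiedSince, out since)
		&& lastModified <= since.ToUniversalTime())
	{
		// The file hasn't changed, so the browser re-uses whatever it
		// cached - including, in our case, the cached error response.
		context.Response.StatusCode = 304;
		return;
	}

	context.Response.WriteFile(physicalPath);
}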

I spent some time going through the code with Reflector, and the best I could determine was that it sends the last modified date of the image in the response header, and for some reason that date was in the future on the WFEs where the BLOB cache was running, hence the error.  At first I thought the client’s clock might cause this when uploading an image through explorer view, but that didn’t seem to affect it.  To be honest, I was stumped for a little while.  Then that evening I was out for a run and it hit me: I bet the SQL box’s time was ahead of the WFEs’.
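That would fit the stack trace, which points at HttpCachePolicy.SetLastModified: it refuses any timestamp later than the server’s own clock, which is exactly what a fast SQL clock would produce.  You can reproduce the exception in a couple of lines inside any ASP.NET handler; the one-minute offset here just simulates a file timestamp from a server running ahead:

// context is the current HttpContext inside any ASP.NET handler.
// A last-modified date even slightly ahead of this server's clock...
DateTime fromTheFuture = DateTime.Now.AddMinutes(1);

// ...makes SetLastModified throw ArgumentOutOfRangeException:
// "Specified argument was out of the range of valid values. Parameter name: utcDate"
context.Response.Cache.SetLastModified(fromTheFuture);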

I tested the clock theory by comparing the time on the SQL box and the WFE.  Since SQL was ahead by over a minute, I waited until the two machines showed different minute values and uploaded a file.  The last modified time of the file shown in SharePoint was the time on the SQL server, not on the WFE; at that instant, the file’s last modified time was in the future from the WFE’s point of view.  So it appears the stored procedure SharePoint calls to add a document calls getutcdate() and thus uses the SQL server’s clock.
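A quick way to check for this kind of skew is to compare getutcdate() on the SQL server against the WFE’s clock from a small console app run on the WFE.  A minimal sketch, assuming you can connect to the SQL instance (the connection string is a placeholder):

using System;
using System.Data.SqlClient;

class ClockSkewCheck
{
	static void Main()
	{
		// Placeholder connection string - point this at your SQL instance.
		using (SqlConnection conn = new SqlConnection("Data Source=SQLSERVER;Integrated Security=SSPI"))
		{
			conn.Open();
			using (SqlCommand cmd = new SqlCommand("SELECT getutcdate()", conn))
			{
				DateTime sqlUtc = (DateTime)cmd.ExecuteScalar();
				DateTime wfeUtc = DateTime.UtcNow;

				// A positive value means SQL is ahead of this WFE.
				Console.WriteLine("Skew (SQL - WFE): {0}", sqlUtc - wfeUtc);
			}
		}
	}
}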

I got our infrastructure guys involved and had them look into the time issue.  Once they resolved it, our image issues went away.  Normally all computers in an Active Directory domain keep the same time, but in this instance we were migrating to a new AD environment: our SQL boxes were on one domain and the WFEs were on another.

Update 5/3/2013

After talking with our infrastructure team, it turns out the issue was caused by our domain controllers being virtual.  This is a problem because, by default, VMs get their time from the Hyper-V host instead of from the PDC emulator on the domain as they should.  Basically, they needed to uncheck the box that says the VM gets its time from the Hyper-V host.

Update 5/30/2013

It seems like this issue has gotten a lot better but it still pops up every once in a while.  My infrastructure team pointed me to this blog article:  http://blogs.msdn.com/b/virtual_pc_guy/archive/2010/11/19/time-synchronization-in-hyper-v.aspx.  It states that “we put logic in our integration service that will not change the time if the virtual machine is more than 5 seconds ahead of the physical computer”.  Has anyone else come across this issue and resolved it completely?

Speaking at SharePoint Saturday Charlotte

I am speaking at the Charlotte SharePoint Saturday event tomorrow.  Below are my abstract and slide deck if you are interested.  If you are a reader of my blog and are going to the event, please stop by and say hello.

Session Title:  Skanska’s Partner Portal and External Document Management System

Session Summary:  Skanska needed a single, consistent place for all project teams to access the important tools they need on a day-to-day basis.  A project document management system allowing internal and external collaboration and an easy-to-use landing page were keys to success.

Session Topic:  Helping users find the tools and documents they need to get their job done can be difficult in an enterprise where every project team has different needs and requirements.  Skanska accomplished this by creating a site template with easy-to-use links to different enterprise tools and building an effective project document management system where both external and internal parties contribute content.  We will be demoing an early release product while discussing the following pieces of functionality:

  • Managed Metadata
  • Content types and the syndication hub
  • Content Organizer
  • SkyDrive Pro
  • Email enabled lists and libraries
  • Custom Solutions
    • Metadata Based Security
    • Batch Edit (Codeplex)
    • Download as a Zip

Here is the link to the presentation.

Resources for setting up Kerberos Authentication in SharePoint

I recently had a colleague ask me about Kerberos authentication in SharePoint.  They were attempting to get around the infamous NTLM double-hop issue.  Below is the list of resources which I use when getting Kerberos working in SharePoint.

SharePoint 2010 Kerberos document

  • Word document with step-by-step instructions on how to set up Kerberos in SharePoint and SQL

Kerbtray

  • System tray utility that displays your current Kerberos tickets.  It helps confirm that you are actually logging in with Kerberos and not NTLM.

Delegconfig

  • Web application you install in IIS to test whether Kerberos and delegation are configured correctly.

Fiddler2

  • This proxy-based HTTP debugger is absolutely fantastic for troubleshooting authentication and other SharePoint/website issues.  It will also decrypt HTTPS traffic if you enable that in the settings and trust its certificate.

Wireshark

  • If Fiddler2 isn’t showing you everything, this low-level packet sniffer will show you everything on the wire, but it will NOT decrypt HTTPS traffic.

It may be covered in the documentation above, but if you are publishing SharePoint externally through your firewall, you will need to open a few ports from the web servers to your AD domain controllers (at a minimum the Kerberos port, TCP/UDP 88).

Also, with SharePoint 2013, claims-based authentication is the preferred authentication mechanism.  SP 2013 does include a nice authentication mode called Windows claims which works with NTLM and Kerberos, but if you are running any custom code in SharePoint, you might need to change it to work correctly with claims auth.
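For example, code that assumes SPUser.LoginName returns a classic DOMAIN\user value will instead see an encoded claim like i:0#.w|domain\user under Windows claims.  A minimal sketch of handling both cases, assuming web is an SPWeb you already have:

using Microsoft.SharePoint;
using Microsoft.SharePoint.Administration.Claims;

// Under Windows claims, LoginName comes back encoded, e.g. "i:0#.w|contoso\\jdoe".
string login = web.CurrentUser.LoginName;

if (SPClaimProviderManager.IsEncodedClaim(login))
{
	// DecodeClaim returns the underlying SPClaim; for Windows claims its
	// Value is the classic DOMAIN\user account name.
	SPClaim claim = SPClaimProviderManager.Local.DecodeClaim(login);
	login = claim.Value;
}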

Process Drop Off Library Items Programmatically

I was working on a solution at work where they wanted to tag and approve all documents submitted to the site.  SharePoint 2010’s Content Organizer seemed like it would meet their needs.  I turned on approval for the drop off library and made Document the default content type.  That way, when an external user submitted a document they would just hit save, and the internal user approving it would change its content type, fill out the metadata, and approve it.  Then the Content Organizer routing kicks in and puts it in the correct location.

This all worked great when working with a single document.  What didn’t work well was when several documents needed the same metadata applied, approved, and routed.  I turned to Batch Edit on CodePlex for this piece.  I had used it before, and with a few tweaks it worked great.  I ran into three issues in this scenario, though.

  1. Batch Edit didn’t allow me to edit content types, so we still had to go to each item individually to change its content type
  2. Batch Edit had no way to approve the items
  3. When saving an item programmatically in SharePoint, the routing rules did not process the item until the timer job ran (typically scheduled to run once a day at night)

I wrote some code to fix #1, which was pretty straightforward, and I can include it here on the blog if someone leaves a comment saying they want it.  #2 involved adding a second button to the page that says save and approve, plus a little extra code on save to check whether that button was pressed and approve the item (a sketch of the approval piece is below).  #3 was a lot harder to resolve than I thought it would be.
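The approval piece of #2 boils down to setting the item’s moderation status.  A minimal sketch, assuming item is the SPListItem being saved and content approval is enabled on the library:

// Approve a list item in a library that has content approval enabled.
if (item.ParentList.EnableModeration)
{
	item.ModerationInformation.Status = SPModerationStatusType.Approved;
	item.ModerationInformation.Comment = "Approved via batch edit";
	item.Update();
}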

Initially I was using OfficialFileCore.SubmitFile().  After realizing some additional permissions were needed (the Records Center Web Service Submitters SharePoint group), I actually got it working.  It seems that OfficialFileCore.SubmitFile() is mainly meant to put a file INTO the drop off library and run the routing rules against that file.  The issue was that the files I needed to push through were already in the drop off library, and OfficialFileCore.SubmitFile() made an extra copy of them there with a random name that I would somehow have to figure out and delete.  That is not what I wanted and seemed overly complicated.

Remember that timer job that runs nightly to process items in the drop off library and send emails about items missing metadata?  It’s called the Content Organizer Processing job (Microsoft.Office.RecordsManagement.Internal.RecordsRepositoryJobDefinition).  I reflectored it and found it calls an internal method named ProcessWaitingFiles, which loops through all of the files and calls another internal method named ProcessWaitingFile(SPListItem, bool).  That second one was really interesting to me since it only requires the SPListItem to process and a bool indicating whether moderation is enabled on the library.  Using reflection I was able to call this method, and my files in the drop off library got routed just as if I had processed them individually.

Here’s the code:

private void ProcessDropOffLibraryItem(SPWeb web, SPListItem item, bool listRequiresModeration)
{
	EcmDocumentRoutingWeb routingWeb = new EcmDocumentRoutingWeb(web);

	// ProcessWaitingFile(SPListItem, bool) is internal, so we have to grab it via reflection.
	System.Reflection.MethodInfo processWaitingFile = typeof(EcmDocumentRoutingWeb).GetMethod(
		"ProcessWaitingFile", BindingFlags.Instance | BindingFlags.NonPublic);

	try
	{
		// Routes the item exactly like the nightly Content Organizer Processing job does.
		object result = processWaitingFile.Invoke(routingWeb, new object[] { item, listRequiresModeration });
	}
	catch (System.Reflection.TargetInvocationException ex)
	{
		// Record the failure for troubleshooting instead of failing the whole batch.
		string message = string.Format("Unable to route file {0}: {1}", item.File.ServerRelativeUrl, ex.Message);
		SPCriticalTraceCounter.AddDataToScope(67, "SP2010 BatchEdit", 1, message + ": " + ex.StackTrace);
	}
}
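To push everything sitting in the drop off library through at once, you can loop over the items and hand each one to the method above.  A minimal sketch (the site URL and library name are placeholders):

using System.Linq;
using Microsoft.SharePoint;

using (SPSite site = new SPSite("http://server/sites/records"))
using (SPWeb web = site.OpenWeb())
{
	SPList dropOffLibrary = web.Lists["Drop Off Library"];

	// Snapshot the items first; successful routing moves them out of the library.
	SPListItem[] items = dropOffLibrary.Items.Cast<SPListItem>().ToArray();
	foreach (SPListItem item in items)
	{
		ProcessDropOffLibraryItem(web, item, dropOffLibrary.EnableModeration);
	}
}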

SharePoint DNS name considerations when using anonymous access

Last year we migrated our external SharePoint farm from SharePoint 2007 to SharePoint 2010. This environment has anonymous access enabled on the web application, and certain sites have anonymous access turned on. We’ve been having issues when accessing these anonymous sites where it would show us as signed out for one request and then signed in for the next. See the video here to see what I’m talking about (please ignore the fact that the picture is not working; that’s an unrelated issue…).

We ended up opening a case with Microsoft, and they were finally able to figure out what’s going on. It’s a cookie conflict problem with the WSS_KeepSessionAuthenticated cookie. This is the cookie that gets set to let SharePoint know you have already authenticated when on an anonymous site. The reason we have a conflict is how the DNS names for our SharePoint sites are set up. We have an SP 2007 environment at https://company.com which sets the cookie value to 443 (the HTTPS port number). Our anonymously accessible SP 2010 environment, the one with the issues, is at https://partners.company.com and sets the cookie value to a GUID.

Since partners.company.com is a sub-domain of company.com, both the GUID and the 443 get sent to partners.company.com under the same cookie name because of sub-domain cookie sharing. I think the SP 2010 code only looks at the first value when checking for the cookie in the request, and sometimes 443 is first while other times the GUID is first in the cookie’s NameValueCollection. Thus it shows a user as logged out when 443 happens to be first and logged in when the GUID is first.
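You can watch the conflict happen by logging the raw Cookie header from a debugging HttpModule. A sketch, where app is the current HttpApplication and the GUID is made up:

// The browser sends BOTH values under the same name because of
// sub-domain cookie sharing, e.g.:
//   Cookie: WSS_KeepSessionAuthenticated=443; WSS_KeepSessionAuthenticated=ab1c2d3e-...
string rawCookieHeader = app.Request.Headers["Cookie"];

// Which value code sees when it reads the cookie by name depends on the
// order the browser happened to send them in.
HttpCookie winner = app.Request.Cookies["WSS_KeepSessionAuthenticated"];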

This never showed up for us with SP 2007 because we used SSL everywhere, so the cookie was set to 443 for all of the SharePoint web applications. Now that SP 2010 uses the web application GUID, I see this being a bigger problem. Our ultimate solution is to change the DNS names for our SharePoint sites. Until then, I was playing around with an HttpModule to “fix” the issue on our SP 2010 environment. Unfortunately, removing the cookie that was not a GUID caused a race condition with the SharePoint HttpModule which checks the cookie, producing the same behavior even when the correct cookies were being sent. The next thought was to create an HttpModule for SP 2007 that would remove the cookie after it was set, since we aren’t using anonymous access in our SP 2007 environment. Unfortunately, this didn’t work either because we use BLOB caching, which exits the pipeline before I could remove the cookie. So my final code, below, always expires the cookie, in the hope that a later request will expire any copy the BLOB cache sets.

using System;
using System.Web;

public class RemoveAuthCookie : IHttpModule
{
	/// <summary>
	/// Initializes the module and registers for application events.
	/// </summary>
	public void Init(HttpApplication context)
	{
		context.PreRequestHandlerExecute += new EventHandler(context_PreRequestHandlerExecute);
		context.PostAuthenticateRequest += new EventHandler(context_PostAuthenticateRequest);
	}

	void context_PostAuthenticateRequest(object sender, EventArgs e)
	{
		HttpApplication app = sender as HttpApplication;
		RemoveCookie(app);
	}

	void context_PreRequestHandlerExecute(object sender, EventArgs e)
	{
		HttpApplication app = sender as HttpApplication;
		RemoveCookie(app);
	}

	void RemoveCookie(HttpApplication app)
	{
		if (app == null) return;
		if (app.Response == null) return;
		if (app.Response.Cookies == null) return;

		try
		{
			if (app.Response.Cookies["WSS_KeepSessionAuthenticated"] != null)
			{
				// Expire the WSS_KeepSessionAuthenticated cookie.
				app.Response.Cookies["WSS_KeepSessionAuthenticated"].Expires = DateTime.Now.AddDays(-1D);
			}

			if (app.Response.Cookies["MSOWebPartPage_AnonymousAccessCookie"] != null)
			{
				// Expire the MSOWebPartPage_AnonymousAccessCookie cookie.
				app.Response.Cookies["MSOWebPartPage_AnonymousAccessCookie"].Expires = DateTime.Now.AddDays(-1D);
			}
		}
		catch { /* never let cookie cleanup break the request */ }
	}

	/// <summary>
	/// Disposes of the resources (other than memory) used by the module.
	/// </summary>
	public void Dispose()
	{
	}
}

This is not a fun situation to be in, so please keep it in mind when you are architecting your SharePoint solutions: a little thing such as the URL used to access a site can be a hard thing to change later.