We recently ran into an interesting issue where when a user uploaded an image into SharePoint and then tried to view that image, they would receive an error. For the rest of the day when viewing the image they would continue to get the error but the image would work fine for others. If the user cleared their browser cache then the image would start working for them. Also, after uploading an image, if the user waited for a few minutes before viewing the image it would work as expected. The error the end user saw was “An Unexpected error has occurred” but by looking at the real error revealed the following:
Message: Specified argument was out of the range of valid values.
Parameter name: utcDate
Stack Trace: at System.Web.HttpCachePolicy.UtcSetLastModified(DateTime utcDate) at System.Web.HttpCachePolicy.SetLastModified(DateTime date) at Microsoft.SharePoint.Publishing.BlobCache.<>c__DisplayClass42.<SendCachedFile>b__41() at Microsoft.SharePoint.SPSecurity.<>c__DisplayClass4.<RunWithElevatedPrivileges>b__2() at Microsoft.SharePoint.Utilities.SecurityContext.RunAsProcess(CodeToRunElevated secureCode) at Microsoft.SharePoint.SPSecurity.RunWithElevatedPrivileges(WaitCallback secureCode, Object param) at Microsoft.SharePoint.SPSecurity.RunWithElevatedPrivileges(CodeToRunElevated secureCode) at Microsoft.SharePoint.Publishing.BlobCache.SendCachedFile(HttpContext context, BlobCacheEntry target, SPUserToken currentUserToken, SiteEntry currentSiteEntry) at Microsoft.SharePoint.Publishing.BlobCache.HandleCachedFile(HttpContext context, BlobCacheEntry target, Boolean anonymousUser, SiteEntry currentSiteEntry) at Microsoft.SharePoint.Publishing.BlobCache.RewriteUrl(Object sender, EventArgs e, Boolean preAuthenticate) at Microsoft.SharePoint.Publishing.PublishingHttpModule.AuthorizeRequestHandler(Object sender, EventArgs ea) at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
Looking at the error told me a few things. One, this seemed to be time related and two, this seems to only affect files that are stored in the blob cache. That also explained why clearing the browser cache worked to “fix” the issue for a user getting the error and why other users did not receive the error. Because of the way that SharePoint’s BlobCache optimizes things, it will only send the image to the browser if the image has changed assuming the browser has a cached version. Since the first time the user viewed the image generated an error, then every time after that the browser was just displaying a cached version or the error. This was also apparent because the correlation id was the same GUID each time they viewed the image. In SharePoint, each request gets it’s own guid which is the correlation id so they should never be the same between requests.
I spent some time reflectoring the code and the best I could determine was that it was sending the last modified date of the image in the response header and for some reason that date was in the future on the WFE’s where the blobcache was running and thus the error. I at first thought that maybe the client’s time was causing this when uploading an image through explorer view but that didn’t seem to affect it. To be honest, I was stumped for a little while. Then that evening I was out for a run and it hit me, I bet the SQL box’s time was ahead of the WFE’s.
I tested this theory by viewing the time on the SQL box and the WFE and since SQL was ahead by over a minute I waited until they were at different minute values and uploaded a file. The last modified time of the file showing in SharePoint was actually the time on the SQL server and not on the WFE. At this instant the last modified time of the file was in the future on the WFE. So it seems that the stored procedure that SP calls to add a document to SharePoint calls getutcdate() and thus uses the SQL server time.
I got our infrastructure guys involved and had them look into the time issues. Once they got those resolved our image issues went away. I know normally all computers in an active directory domain have the same time but in this instance, we are migrating to a new AD environment and our SQL boxes were on one domain and the WFE’s were on another.
Update 5/3/2013
After talking with our infrastructure team, it turns out the issue was around our domain controllers being virtual. This is an issue because by default VM’s get their time from the Hyper-V host instead of the PDC emulator on the domain as they should be. Basically, they needed to uncheck the box which says that the VM get’s their time from the Hyper-V host.
Update 5/30/2013
It seems like this issue has gotten a lot better but it still pops up every once in a while. My infrastructure team pointed me to this blog article: http://blogs.msdn.com/b/virtual_pc_guy/archive/2010/11/19/time-synchronization-in-hyper-v.aspx. It states that “we put logic in our integration service that will not change the time if the virtual machine is more than 5 seconds ahead of the physical computer”. Has anyone else come across this issue and resolved it completely?