Unable to Manually Start Workflows

Recently, after we finished migrating our large external farm from SharePoint 2010 to SharePoint 2013, I started hearing complaints that some users could not manually start workflows. They would get either a 403 Forbidden or the “Sorry, you don’t have access to this page” message.

The stack trace on the back end produced the following:

Microsoft.SharePoint.Utilities.SPUtility.HandleAccessDenied(HttpContext context)
   at Microsoft.SharePoint.Utilities.SPUtility.HandleAccessDenied(Exception ex)
   at Microsoft.SharePoint.SPSecurableObject.CheckPermissions(SPBasePermissions permissionMask)
   at Microsoft.SharePoint.WorkflowServices.StoreSubscriptionService.EnumerateSubscriptionsByEventSource(Guid eventSourceId)
   at Microsoft.SharePoint.WorkflowServices.ApplicationPages.WorkflowPage.ConstructStartArray()
   at Microsoft.SharePoint.WorkflowServices.ApplicationPages.WorkflowPage.OnLoad(EventArgs e)

Figuring out the problem took a little investigation. I recently used this as a demo for a session I did at my local SharePoint Saturday, and from that session I have a video of the issue and how I determined the cause. Skip to about 2:15 for the start of the issue and investigation.

So basically, the new SharePoint 2013 Workflow Manager is checking for Contribute access (specifically the EditListItems permission) at the Site (SPWeb) scope, rather than on the list itself, when determining whether someone should see the list of workflows for a list (see the first line of the method in the code below).

// Microsoft.SharePoint.WorkflowServices.StoreSubscriptionService
public override WorkflowSubscriptionCollection EnumerateSubscriptionsByEventSource(Guid eventSourceId)
{
	this.context.Web.CheckPermissions(SPBasePermissions.EditListItems);
	WorkflowStore workflowStore = new WorkflowStore(this.context.Web);
	eventSourceId = StoreSubscriptionService.ConvertToGuidToken(eventSourceId, this.context.Web);
	WorkflowFile[] files = workflowStore.QueryWithGuid("0x0100AA27A923036E459D9EF0D18BBD0B9587", StoragePublishState.Unchanged, "WSEventSourceGUID", eventSourceId);
	return this.ConvertToWorkflowSubscriptionCollection(files);
}
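
To see why the scope of that check matters, here is a minimal Python sketch of the situation. It is a toy model, not SharePoint code: the permission names mirror SPBasePermissions, but the Securable class and both enumerate functions are hypothetical stand-ins, and the user and list names are made up.

```python
from dataclasses import dataclass, field

# Toy model only: these names are hypothetical stand-ins, not the SharePoint API.
VIEW = "ViewListItems"
EDIT = "EditListItems"


class AccessDenied(Exception):
    """Stands in for the 403 / "Sorry, you don't have access" result."""


@dataclass
class Securable:
    name: str
    # user -> set of permissions granted at this scope
    grants: dict = field(default_factory=dict)

    def check_permissions(self, user, permission):
        if permission not in self.grants.get(user, set()):
            raise AccessDenied(f"{user} lacks {permission} on {self.name}")


def enumerate_subscriptions_web_scoped(web, lst, user):
    # Mirrors the decompiled method: the check is made against the web.
    web.check_permissions(user, EDIT)
    return [f"workflows for {lst.name}"]


def enumerate_subscriptions_list_scoped(web, lst, user):
    # Assumed alternative: check the list the workflows belong to instead.
    lst.check_permissions(user, EDIT)
    return [f"workflows for {lst.name}"]


web = Securable("site", {"alice": {VIEW}})            # read on the site
lst = Securable("Requests", {"alice": {VIEW, EDIT}})  # contribute on the list

try:
    enumerate_subscriptions_web_scoped(web, lst, "alice")
    web_scoped_result = "allowed"
except AccessDenied:
    web_scoped_result = "denied"

list_scoped_result = enumerate_subscriptions_list_scoped(web, lst, "alice")
print(web_scoped_result, list_scoped_result)
```

The web-scoped check denies the user even though she can edit items in the list, while a list-scoped check would let her through.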

So in our scenario, our users had Read access to the site but Contribute access on the list. This should allow them to start the workflow, and it did in SP 2010. I currently have a Design Change Request (DCR) open with MS regarding this issue, so hopefully it will be fixed soon. Once I hear more, I will update this post. In the meantime, there seem to be two workarounds.

Workaround 1: By tracing earlier in the code, I determined that if you DON’T set up and connect a SharePoint 2013 Workflow Manager to the farm, then this code will never run, and thus it will work just like it did in SP 2010. Of course, the issue with this workaround is that you can’t have any SP 2013 workflows.

Workaround 2: Give everyone who needs to manually start workflows Contribute access to the site, then break inheritance on all lists and libraries where the user DOESN’T need Contribute access and remove their Contribute access there.

We ended up implementing the second workaround, which sucks for my users. I am hoping this is fixed soon, as I have also seen other people complaining about it (http://sharepoint.stackexchange.com/questions/115311/manually-start-sharepoint-2010-workflow-in-sharepoint-2013-farm/).

SharePoint 2013 conflicts with custom site definition

I was updating one of our custom Site Definitions from SharePoint 2010 to SharePoint 2013 recently, and everything was going well until I tried to create a site from the updated definition. I kept getting the error:

Microsoft.SharePoint.SPException: The template you have 
chosen is invalid or cannot be found.

Searching on the internet told me the most common cause of this error was a conflicting ID for the template.  I was pretty sure this couldn’t be the case since we followed the instructions here:  http://msdn.microsoft.com/en-us/library/office/ms454677(v=office.14).aspx which stated:

Change the ID attribute of the Template element to a 
value of 10000 or more. This ensures that your ID will 
not conflict with future site definitions produced by 
Microsoft. If there are other custom site definitions 
on your target farm, make sure that each one has a 
unique ID.

and we had given our template an ID of 10000.  I had previously updated another custom site definition of ours with an ID of 10001 without any issues.

So I decided to do a quick search in the SharePointRoot\template\1033\XML folder for anything that contains ID="10000". Sure enough, I found two site definitions: mine and a new one that comes with SharePoint 2013 called the Academic Library.
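
That search can also be scripted. Below is a rough Python sketch, assuming the files in that folder are XML documents containing Template elements with Name and ID attributes. The function name find_template_id_conflicts is my own invention for illustration, and it reports every duplicated ID rather than just 10000.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def find_template_id_conflicts(xml_folder):
    """Map each Template ID used by more than one definition to its users.

    Returns {id: [(file name, template name), ...]} for duplicated IDs only.
    """
    seen = defaultdict(list)
    for path in sorted(Path(xml_folder).glob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        for element in root.iter():
            # Ignore any namespace prefix when matching the element name.
            if element.tag.rsplit("}", 1)[-1] != "Template":
                continue
            template_id = element.get("ID")
            if template_id is not None:
                seen[template_id].append((path.name, element.get("Name")))
    return {tid: uses for tid, uses in seen.items() if len(uses) > 1}
```

Pointing it at the SharePointRoot\template\1033\XML folder on a farm server should list any ID that more than one site definition claims, which is worth checking before an upgrade rather than after.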

I found this article:  http://social.technet.microsoft.com/wiki/contents/articles/20149.sharepoint-2013-default-site-templates.aspx which discusses the new site definitions that come with SharePoint 2013.  You can see it listed there as:

ID       Name                     Title
10000    DOCMARKETPLACESITE#0     Academic Library

Description: The Academic Library template provides a rich view and consumption experience for published content and management. Authors populate metadata and apply rules at the time of publishing, such as description, licensing, and optional rights management (IRM). Visitors of the site can search or browse published titles and add authorized selections to their collection to consume, subject to the rights and rules applied by the author. The site provides an IRM-capable document library, a publishing mechanism for authors to publish documents, detailed views for each document, a check-out mechanism, and related search capabilities.

So it would seem that if you followed the recommendations for SharePoint 2010 and used an ID of 10000 for your custom site definition, then when you try to go to SharePoint 2013, your site definition won’t work.

This is an issue because there is no way to use the API to change the template ID of a site after it has been created. Fortunately, I have found two ways to get around this. I haven’t fully run the first one to ground and verified that it works, but the concept seems valid. The second isn’t supported by Microsoft.

Option 1:

I got the idea for this option from this blog: http://iknowsharepoint2007.blogspot.com/2010/02/changing-sharepoint-site-definition.html. The basic idea is that you export your site without using compression, change some XML configuration files so they use your new ID, and then import it back into your SharePoint environment. You might not even have to change any XML files, since that post was for SP 2007. If someone tries this method, please comment below. Otherwise, if I find time to give it a try, I’ll update with my findings here.

Option 2:

This involves editing data directly in the SharePoint content database. This is unsupported by Microsoft but worked for me during my testing. Go to the AllWebs table in the content database for the site in question. You should be able to find your site listed by scanning the FullUrl column. Once found, write a SQL update statement that sets the WebTemplate column to the new site definition ID. This was an instant fix for me, as I was able to immediately start the upgrade to SharePoint 2013 for that site collection.

It sucks that Microsoft told us they would never use site definition IDs of 10000 or more and then went back on their word with SharePoint 2013. Anyway, I hope this helps someone else trying to complete their migration to SharePoint 2013. If anyone finds a better way to fix the issue, please post in the comments below. Thanks.

BlobCache issues with time difference between SharePoint WFE and SQL

We recently ran into an interesting issue: when a user uploaded an image into SharePoint and then tried to view that image, they would receive an error. For the rest of the day they would continue to get the error when viewing the image, but the image would work fine for others. If the user cleared their browser cache, the image would start working for them. Also, if the user waited a few minutes after uploading an image before viewing it, it would work as expected. The error the end user saw was “An Unexpected error has occurred”, but looking at the real error revealed the following:

Message: Specified argument was out of the range of valid values.
Parameter name: utcDate
Stack Trace:
   at System.Web.HttpCachePolicy.UtcSetLastModified(DateTime utcDate)
   at System.Web.HttpCachePolicy.SetLastModified(DateTime date)
   at Microsoft.SharePoint.Publishing.BlobCache.<>c__DisplayClass42.<SendCachedFile>b__41()
   at Microsoft.SharePoint.SPSecurity.<>c__DisplayClass4.<RunWithElevatedPrivileges>b__2()
   at Microsoft.SharePoint.Utilities.SecurityContext.RunAsProcess(CodeToRunElevated secureCode)
   at Microsoft.SharePoint.SPSecurity.RunWithElevatedPrivileges(WaitCallback secureCode, Object param)
   at Microsoft.SharePoint.SPSecurity.RunWithElevatedPrivileges(CodeToRunElevated secureCode)
   at Microsoft.SharePoint.Publishing.BlobCache.SendCachedFile(HttpContext context, BlobCacheEntry target, SPUserToken currentUserToken, SiteEntry currentSiteEntry)
   at Microsoft.SharePoint.Publishing.BlobCache.HandleCachedFile(HttpContext context, BlobCacheEntry target, Boolean anonymousUser, SiteEntry currentSiteEntry)
   at Microsoft.SharePoint.Publishing.BlobCache.RewriteUrl(Object sender, EventArgs e, Boolean preAuthenticate)
   at Microsoft.SharePoint.Publishing.PublishingHttpModule.AuthorizeRequestHandler(Object sender, EventArgs ea)
   at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Looking at the error told me a few things. One, this seemed to be time related, and two, it seemed to only affect files that are stored in the blob cache. That also explained why clearing the browser cache worked to “fix” the issue for a user getting the error and why other users did not receive the error. Because of the way SharePoint’s BlobCache optimizes things, if the browser has a cached version, it will only send the image to the browser if the image has changed. Since the first time the user viewed the image generated an error, every time after that the browser was just displaying a cached version of the error. This was also apparent because the correlation ID was the same GUID each time they viewed the image. In SharePoint, each request gets its own GUID as the correlation ID, so they should never be the same between requests.

I spent some time reflectoring the code, and the best I could determine was that it was sending the last modified date of the image in the response header, and for some reason that date was in the future on the WFEs where the BlobCache was running, hence the error. At first I thought that maybe the client’s time was causing this when uploading an image through explorer view, but that didn’t seem to affect it. To be honest, I was stumped for a little while. Then that evening I was out for a run and it hit me: I bet the SQL box’s time was ahead of the WFEs’.

I tested this theory by comparing the time on the SQL box and the WFE. Since SQL was ahead by over a minute, I waited until the two clocks showed different minute values and uploaded a file. The last modified time of the file showing in SharePoint was actually the time on the SQL server, not the time on the WFE. At that instant, the last modified time of the file was in the future as far as the WFE was concerned. So it seems that the stored procedure that SP calls to add a document to SharePoint calls getutcdate() and thus uses the SQL server time.
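
The failure mode can be sketched in a few lines of Python. This is a simplified model, not SharePoint or ASP.NET code: set_last_modified is a hypothetical stand-in for HttpCachePolicy.SetLastModified, which rejects dates later than the current time, and the dates and the 75-second skew are illustrative numbers only.

```python
from datetime import datetime, timedelta, timezone


def set_last_modified(utc_date, server_now):
    """Hypothetical stand-in for the web server's Last-Modified validation."""
    if utc_date > server_now:
        raise ValueError("utcDate: specified argument was out of the range of valid values")
    return utc_date


def serve_cached_file(file_modified, wfe_now):
    """Return True if the WFE could send the file, False if it would error."""
    try:
        set_last_modified(file_modified, wfe_now)
        return True
    except ValueError:
        return False


wfe_clock = datetime(2013, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
sql_skew = timedelta(seconds=75)  # illustrative: SQL box runs ahead of the WFE

# The upload is stamped with the SQL server's clock, not the WFE's.
file_modified = wfe_clock + sql_skew

immediately = serve_cached_file(file_modified, wfe_clock)
after_waiting = serve_cached_file(file_modified, wfe_clock + timedelta(minutes=3))
print(immediately, after_waiting)
```

A request made right after the upload fails because the timestamp is still in the WFE's future, while the same request a few minutes later succeeds. That matches what our users saw when they waited before viewing the image.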

I got our infrastructure guys involved and had them look into the time issues. Once they got those resolved, our image issues went away. I know that normally all computers in an Active Directory domain have the same time, but in this instance we were migrating to a new AD environment, and our SQL boxes were on one domain and the WFEs were on another.

Update 5/3/2013

After talking with our infrastructure team, it turns out the issue was around our domain controllers being virtual. This is an issue because by default VMs get their time from the Hyper-V host instead of from the PDC emulator on the domain, as they should. Basically, they needed to uncheck the box that says the VM gets its time from the Hyper-V host.

Update 5/30/2013

It seems like this issue has gotten a lot better but it still pops up every once in a while.  My infrastructure team pointed me to this blog article:  http://blogs.msdn.com/b/virtual_pc_guy/archive/2010/11/19/time-synchronization-in-hyper-v.aspx.  It states that “we put logic in our integration service that will not change the time if the virtual machine is more than 5 seconds ahead of the physical computer”.  Has anyone else come across this issue and resolved it completely?