Are your Models killing performance?

Recently I blogged about my design pattern for Sitecore presentation components: Item->Repository->ViewModel->Controller->View. For brevity, let’s refer to it as Item-Repository-ViewModel.

In that post I admonished users to pay careful attention to code responsible for generating ViewModels. This post will reveal some common mistakes around Item retrieval and ViewModel generation and discuss how to get maximum efficiency out of your Repository and Model building layers.

Model Generation

Let’s have a look at the most common ViewModel strategies in Sitecore development.

Item Facades

An Item Facade is any Object that contains or descends from Sitecore.Data.Item. (CustomItem is a good example.) Until recently, Item Facades were a very popular construct in Sitecore development, and are not without advantages:

  • Base classes provided by Sitecore allow for easy extension and interoperability with other aspects of the Sitecore API.
  • Significant Item field values are exposed through properties, allowing for as-you-type code validation.
  • In “Custom Item” derived objects, Properties are not populated until the value is requested, which gives better performance.

Technically objects like this violate the spirit of Item-Repository-ViewModel. They expose a significant footprint of the Sitecore API to your Views, thus they are no longer considered best-practice.

ContentSearch API

If you’re using Content Search to retrieve data for display, you should also be taking advantage of the POCO hydration facilities available in the ContentSearch API. The IQueryable<T> style interface allows you to skip any intermediate Sitecore objects and hydrate your ViewModels directly from the search index. However, developers without an excellent working knowledge of the ContentSearch API and/or some understanding of the power of the Solr server that backs this API can cause all kinds of performance problems.

Boring Old Property Mapping

Let’s assume you’ve adopted the concept that Views should have discrete ViewModels and not directly interact with Items. To keep things simple, let’s also assume you are using naked Sitecore, no ORM kit like Glass or Synthesis.

Chances are your code looks something like this:

var model = new MyModel
{
    Property1 = myItem.Fields["FieldName1"],                   // the Field object itself
    Property2 = myItem.Fields["FieldName2"].Value,             // raw field value
    Property3 = ((LinkField)myItem.Fields["FieldName3"]).Url,  // URL stored in a Link field
    Property4 = FieldRenderer.Render(myItem, "FieldName4")     // rendered field output
    // etc...
};

This works, although it is tedious and prone to typos. It’s surprisingly lightweight and therefore should get good performance. You only access the Item Fields you need, and explicitly retrieve relevant Field properties directly.

ORM Frameworks

Developers that do property mapping by hand will eventually tire of it and begin looking for quality of life improvement. Frameworks like Constellation ModelMapper, Glass Mapper and Synthesis can remove uninteresting concerns by handling the mapping internally. Depending on the implementation of the ORM framework, you may inadvertently introduce performance problems or violate the separation of Item and View concerns:

  • ORM frameworks by nature rely on .NET Reflection technology to figure out how to assign Item field values to Model properties. This is a process that involves lots of little loops and can cause performance problems.
  • Depending upon the implementation of the ORM, you may actually be implementing an Item Facade strategy. Know your evil wizards!
  • Even if your ORM is completely “itemless”, pay attention to how it resolves Property values. Most ORMs are two-way: you have full read & write access to Item Field values. This flexibility introduces additional overhead on Model creation, since your Model is likely proxied by a dynamic object that has a significant amount of change-event wiring attached to it.
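
To make the reflection cost concrete, here is a minimal, framework-free sketch. It is not how any particular ORM is implemented: TeaserModel and CachedMapper are hypothetical names, and a plain dictionary stands in for an Item's field values. The idea it illustrates is that discovering properties by reflection on every mapping call repeats the same work, while caching the lookup per model type pays that cost once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical ViewModel used only for this illustration.
public class TeaserModel
{
    public string Headline { get; set; }
    public string Summary { get; set; }
}

public static class CachedMapper
{
    // One reflection pass per model type; later calls reuse the cached array.
    private static readonly ConcurrentDictionary<Type, PropertyInfo[]> Cache =
        new ConcurrentDictionary<Type, PropertyInfo[]>();

    // "fields" stands in for Item field values keyed by field name (an assumption).
    public static T Map<T>(IDictionary<string, string> fields) where T : class, new()
    {
        var properties = Cache.GetOrAdd(typeof(T), t => t
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanWrite && p.PropertyType == typeof(string))
            .ToArray());

        var model = new T();
        foreach (var property in properties)
        {
            string value;
            if (fields.TryGetValue(property.Name, out value))
            {
                // Still a reflective call, but the property lookup itself is cached.
                property.SetValue(model, value, null);
            }
        }
        return model;
    }
}
```

Even with the lookup cached, the loop still runs once per property for every model instance, which is one reason lean ViewModels map faster than wide ones.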

A Note about Code Generation

Developers who don’t like mapping properties are also unlikely to enjoy writing View Models by hand, and may rely on code generation tools to produce models based on Item Templates. Be wary:

  • Automatically generated Models tend to lack the fine-grained scope of a true ViewModel, containing many more properties than are actually required. Aside from violating principles, each extra field contributes to lost performance.
  • The runtime resolution of a ViewModel instance from an Item can be very slow depending upon the implementation. In some cases, the ORM compensates by generating mapping tables on app startup, but this just aggregates all that time into Sitecore’s already lengthy startup process.
  • In a Helix-style environment, Model generation based on Templates and Inheritance may cause Feature separation headaches and namespace collisions. These may not affect system performance, but chasing these issues down will impact your developer productivity.

Item Retrieval

Repositories in our pattern are responsible for generating ViewModels from Items. While we’ve discussed some performance pitfalls in models themselves, without question the biggest Repository performance hit comes from poor decisions when retrieving Items from Sitecore.

XPath Pitfalls

While the trend is to use the ContentSearch API, sometimes it’s more legible to rely on Sitecore’s older Database object’s Item retrieval methods. When you consider that the RenderingContext object provides you with access to your Rendering’s Datasource Item as well as the Page’s Item, you have a logical stepping off point for querying the Sitecore content tree using XPath. It’s important to be able to identify performance-sapping queries.

Example

Solving XPath query problems requires some understanding of the requirements that drove the XPath creation in the first place. Let’s look at a not-unrealistic scenario:

query = "//sites/*[@@key='somesitename']/*[@@templateName='News List']/*[@@templateName='News Folder']/*[@@templateName='News Page']";

In the example above, the developer clearly wants all the News Page Items stored in one or more News Lists, but only for a named Site.

Avoid “//” like The Plague

The “//” path expression will force Sitecore to evaluate each and every Item below the query’s starting node. When Sitecore processes the XPath query, the search first goes to the bottom of each branch rather than evaluating all Items at a given tree level. This is not only extremely inefficient, but it returns Items in an order that is seldom useful, forcing yet another organizing loop on the result set. To eliminate this performance-draining operation we can replace “//” with an absolute path:

query = "/sitecore/content/tenants/*/sites/*[@@key='somesitename']/*[@@templateName='News List']/*[@@templateName='News Folder']/*[@@templateName='News Page']";

Not pretty, but the query now inspects only the Items along that path instead of every Item in the tree.

Try to minimize Attribute parameters

Further interrogation of the developer reveals:

  • Only News Folders are allowed to be inserted below News Lists.
  • Only News Pages are allowed to be inserted below News Folders.

This immediately allows for simplification as we don’t need to specify the type of Items we’re looking for, or that of their parents:

query = "/sitecore/content/tenants/*/sites/*[@@key='somesitename']/*[@@templateName='News List']/*/*";

All searches should have context

Since this post is about the Sitecore presentation layer, one can make a few key assumptions:

  • The query is being called from a Rendering.
  • The Rendering presents a specific list of News Pages.

The following approaches in development can give us additional simplification and therefore performance:

  • The Rendering should have a Datasource, and that Datasource should be an Item with a Template of “News List”.
  • To ease the Author’s ability to select only the current site’s News List Items, the Rendering’s Datasource Location can be set to something like the following: "./ancestor-or-self::*[@@templateName='Site']/home"
  • We can use the Datasource Item as the context node of the query.

Putting these facts together produces code which greatly simplifies our XPath statement and retrieves the desired list of Items with excellent performance:

var query = "./*/*";
var newsList = RenderingContext.Current.ContextItem;
var items = Sitecore.Data.Query.Query.SelectItems(query, newsList);

We have now:

  • Improved query performance.
  • Made the query more portable by removing Template and root path specifications.
  • Improved the Author’s editing experience by forcing them to specify which News List they want to display.

XPath Considerations Rollup:

  • Never start a query from the root (“/” or “/sitecore”). Know the most significant node in your information architecture and start your query there.
  • If you don’t have a starting node in mind, consider the context of Site, and start from SiteContext.StartPath. Aside from trimming the number of inspected Items, this simple rule will also prevent the appearance of _Standard Values Items in your result set.
  • Know which tree level you need to query and include it in your query. “./*/*/*[@@templatename='Page']” is surprisingly efficient in comparison to “//” and can often be used to achieve the same effect.
  • Instead of using Database.SelectItems(), consider using Sitecore.Data.Query.Query.SelectItems(query, contextNode) which forces you to explicitly define the top node of your search and implicitly specifies the Database and Language of the query through the provided contextNode, making your query context-safe.
  • Remember that @@templateName or @@templateId do not support inheritance. Keep your queries future-proof by avoiding these attributes.
  • Store queried Items together in the Content Tree to simplify Query parameters. Every “@” or “@@” in your query incurs processing cost.
  • If you need to use a compound query “|” to retrieve all of your results, consider the complexity of each discrete query. If the queries are not extremely simple, consider using SearchContext instead.
  • If you truly need to use “//” you should switch to SearchContext.
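
The StartPath advice in the rollup above can be sketched as follows. This is an illustration under assumptions rather than a drop-in implementation: SiteScopedQuery and GetSecondLevelItems are hypothetical names, the two-level query is only an example, and the code assumes it runs inside a Sitecore request where Sitecore.Context.Site and Sitecore.Context.Database are populated.

```csharp
using Sitecore.Data.Items;

public static class SiteScopedQuery
{
    // Hypothetical helper: returns the Items two levels below the current site's start Item.
    public static Item[] GetSecondLevelItems()
    {
        var site = Sitecore.Context.Site;
        var database = Sitecore.Context.Database;
        if (site == null || database == null)
        {
            return new Item[0];
        }

        // StartPath combines the site's root path and start item, e.g. ".../home".
        var startItem = database.GetItem(site.StartPath);
        if (startItem == null)
        {
            return new Item[0];
        }

        // Relative query evaluated from startItem: a fixed depth, no "//",
        // and no attribute filters. Guard against a null result.
        return startItem.Axes.SelectItems("./*/*") ?? new Item[0];
    }
}
```

Because the query is relative to an Item that already carries its Database and Language, nothing outside the current site is inspected.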

ContentSearch Pitfalls

Sitecore’s Solr-backed ContentSearch API is an incredibly powerful and fast data access system. However, it is often hobbled by poor implementation.

Example

Here’s our poorly performing example, which was cribbed from a similar query seen in a production installation.

items = query.Where(i => i.Path.StartsWith("/sitecore/content/"))
    .Where(i => i.Name.Contains(searchWord)
        || i.Headline.Contains(searchWord)
        || i.Content.Contains(searchWord)).ToList()
    .Where(r => r.Language == contextLanguage.Name);

There’s a lot going on here. Let’s fix the obvious things first.

.ToList() should always be last.

The .ToList() extension method actually executes the query and makes it “real”. Any further LINQ operations occurring after .ToList() are against the realized result set and are not part of the original query. As a rule of thumb, you should consider any LINQ operation after .ToList() to be an additional foreach loop through your results. Let’s re-order:

items = query.Where(i => i.Path.StartsWith("/sitecore/content/"))
    .Where(i => i.Name.Contains(searchWord)
        || i.Headline.Contains(searchWord)
        || i.Content.Contains(searchWord))
    .Where(r => r.Language == contextLanguage.Name).ToList();

One should always be wary about using .ToList() on an IQueryable because you may be shooting yourself in the foot, particularly if you need pagination in your result set. The better UX libraries for managing paginated results can actually handle an IQueryable directly, allowing you to avoid writing pagination code yourself.
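
The effect of where .ToList() sits can be seen without Sitecore at all. The sketch below uses plain LINQ over an in-memory IQueryable<string> as a stand-in for a search query; PagingSketch and GetPage are hypothetical names chosen for the illustration. Filtering, ordering, and paging all stay inside the query, and only the requested page is materialized.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Applies the filter and paging to the IQueryable, then materializes once.
    // pageIndex is zero-based.
    public static List<string> GetPage(IQueryable<string> query, string prefix, int pageIndex, int pageSize)
    {
        return query
            .Where(name => name.StartsWith(prefix)) // still part of the query
            .OrderBy(name => name)                  // still part of the query
            .Skip(pageIndex * pageSize)             // still part of the query
            .Take(pageSize)                         // still part of the query
            .ToList();                              // executes; only one page is realized
    }
}
```

For example, calling GetPage with the names "news-c", "about", "news-a", "news-b" (via AsQueryable()), the prefix "news", page index 0, and page size 2 yields "news-a" and "news-b". With a real search provider the same shape lets the provider return only the requested page, provided it translates Skip and Take into the underlying query.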

Relevance Matters, Use Filters for Speed

In general, one should order the .Where() clauses from least-specific to most-specific. With Sitecore 9, one can also take advantage of Solr’s filter queries (the fq parameter), which are sort of like database Views, or Index columns, with caching: Solr caches each filter’s result set separately from the main query, and filters do not affect relevance scoring. Use “Filter” on the broad strokes of the query as follows:

items = query.Filter(r => r.Language == contextLanguage.Name)
    .Filter(i => i.Path.StartsWith("/sitecore/content/"))
    .Where(i => i.Name.Contains(searchWord)
        || i.Headline.Contains(searchWord)
        || i.Content.Contains(searchWord))
    .OrderBy(r => r.Name).ToList();

The re-ordered and filtered query should now be significantly better performing. Here’s my filter priority rule of thumb:

  1. Language
  2. Search Start Location/Ancestor Item
  3. Item Template (if applicable or necessary)

Start your search from a known location in the content tree

Just like XPath, we want to limit our search. This prevents the following from being exposed to the public in search results:

  • _Standard Values
  • Branch Templates
  • System settings
  • Dictionary Items
  • Rendering Definitions
  • Items from other Sites in the installation

In our example, there’s an attempt to establish a path, but the syntax is incorrect. Here’s the example with the correct syntax for establishing a “context” node for the search:

var siteRoot = contextDatabase.GetItem(contextSite.StartPath, contextLanguage);
items = query.Filter(r => r.Language == contextLanguage.Name)
    .Filter(i => i.Paths.Contains(siteRoot.ID))
    .Where(i => i.Name.Contains(searchWord)
        || i.Headline.Contains(searchWord)
        || i.Content.Contains(searchWord))
    .OrderBy(r => r.Name).ToList();

The magic is in i.Paths.Contains(ID) which provides a high-performance way to limit a query to a specific area of the Content Tree.

Use .Like() and .Boost() to search text

Unfortunately, it’s very difficult to divine the correct way to search for text within a field. The answer is not C# intuition. It requires an understanding of the Solr query parsers used behind the scenes. Here are some basic takeaways:

  • Use .Like(text, slop) instead of .Contains()
  • Slop is the allowed distance between words in a phrase. 0.0f means the phrase must occur as supplied. It’s a safe default value.
  • Use .Boost(decimal) to set matching priority when you’re searching multiple fields with a query.
  • The .Boost() value needs to be between 0 and 1, and no two .Boost() values should be the same in a given query. Higher numbers give a field higher matching priority.

Here’s our example with Like, slop, and Boost:

var siteRoot = contextDatabase.GetItem(contextSite.StartPath, contextLanguage);

var slop = 0.0f;
var nameBoost = 1.0f;
var headlineBoost = 0.9f;
var contentBoost = 0.8f;

items = query.Filter(r => r.Language == contextLanguage.Name)
    .Filter(i => i.Paths.Contains(siteRoot.ID))
    .Where(i => i.Name.Like(searchWord, slop).Boost(nameBoost)
        || i.Headline.Like(searchWord, slop).Boost(headlineBoost)
        || i.Content.Like(searchWord, slop).Boost(contentBoost))
    .ToList();

The above will produce very “natural” looking search results for the search term, and do it very quickly.

Use Query<T> to build your models for you

This is bad:

var results = query.ToList();
foreach (var result in results)
{
    modelList.Add(mapper.MapToNew<ViewModel>(result.GetItem()));
}

We’re ruining performance by adding a complete loop through the result set and hitting the Sitecore database for the full Item for each record in our search results, just so we can hydrate ViewModels.

This is a significantly better approach:

public class ViewModel : Sitecore.ContentSearch.SearchTypes.SearchResultItem
{
    [IndexField("field1")]
    public string Property1 { get; set; }

    [IndexField("field2")]
    public string Property2 { get; set; }
    // etc...
}

// and now in your Repository...
public IEnumerable<ViewModel> GetModel(Item contextItem, Site contextSite, Language contextLanguage)
{
    // "context" is assumed to be an open search context for the index being queried.
    IQueryable<ViewModel> query = context.GetQueryable<ViewModel>();
    query = query.Where(...something...);

    return query.ToList(); // A list of ViewModels for your Controller.
}

ViewModel is now a SearchResultItem and can be hydrated directly by the ContentSearch API. No extra loops, and we never hit the Sitecore database.

Your ViewModel doesn’t have to descend from SearchResultItem

Inheriting from SearchResultItem has some drawbacks:

  • SearchResultItem contains many methods and facade properties that hide potential database calls.
  • SearchResultItem contains many facts that your View doesn’t need.
  • Assuming you’re building a modern, AJAX-enabled web application, SearchResultItem has several properties that resist basic JSON serialization.

Consider the following alternative base class:

public class SearchResultViewModel
{
    #region Utility Fields borrowed from SearchResultItem
    [IndexField("_group")]
    [ScriptIgnore]
    [TypeConverter(typeof(IndexFieldIDValueConverter))]
    public virtual ID ItemId { get; set; }

    [IndexField("_name")]
    [ScriptIgnore]
    public virtual string Name { get; set; }

    [IndexField("_database")]
    [ScriptIgnore]
    public virtual string DatabaseName { get; set; }

    [IndexField("_language")]
    [ScriptIgnore]
    public virtual string Language { get; set; }

    [IndexField("_path")]
    [ScriptIgnore]
    [TypeConverter(typeof(IndexFieldEnumerableConverter))]
    public IEnumerable<ID> Paths { get; set; }

    [IndexField("_template")]
    [ScriptIgnore]
    [TypeConverter(typeof(IndexFieldIDValueConverter))]
    public virtual ID TemplateId { get; set; }
    #endregion
}

A base class like this provides the critical search facets needed to build queries, but intentionally hides them from JSON serialization, and contains no performance-sucking methods to trip up developers building Views.

Next Steps

We’ve discussed performance pitfalls you can avoid in your Repositories. Here’s a small list of things to remember:

  • Use the Item-Repository-ViewModel pattern to isolate Item retrieval.
  • Use the ContentSearch API when possible for performance.
  • For both ContentSearch and XPath, establish a Context Item for your searches.
  • Build your ViewModels as small as possible:
    • This enforces separation of concerns
    • If you use a model-mapping technology, fewer ViewModel properties means fewer loops in .NET Reflection land.
    • Serializing ViewModels is easier if you start from scratch. You can then control the serialization method of each Property.

Following the recommendations in this post will keep your Sitecore build speedy and bug free. Time to go inspect your own code!

2 thoughts on “Are your Models killing performance?”

  1. I have to say – Kudos for the post, but I do largely disagree with your approaches and reasoning.

    Item > Repository > (Service) > Viewmodel is predominantly to achieve proper isolation for unit testing and flexibility concerns. From a pure performance point of view it is heavy, as there are many (albeit small) transactions occurring.

    Sitecore Query is just bad – just don’t use it, I genuinely can’t put it more bluntly. If you have to – I agree, set a context, don’t traverse the tree etc etc.

    The content search api is based upon an inherently stale data source. Sitecore in fact takes intrinsically fast full text searching tools and makes them slow by storing WAY too much content in there. This leads to often incorrect results, in the case of solr / azure search etc, large amounts of data being streamed over http(s) which has inherent latency and potential security issues. Where the content search api does tend to help (as indexing helped before the content search api) is to cover the shortcomings in the Sitecore hierarchical model. So – for example – get all ancestors of type x. But truthfully, once you have the item uri of a sitecore item, you are usually looking at low ms or ticks to get at its data due to caching. In addition – queries are virtually untestable.

    I do agree with building viewmodels with only what you use.

    In terms of Glass Mapper in particular, it does not use reflection to establish property values for the most part due to inherent performance concerns. Instead (as of V4 at least) it was using compiled lambdas. In terms of performance, Glass Mapper’s own site used to have live tests showing that it could be (from database to screen) actually quicker than Sitecore itself. In addition, Glass Mapper(V4) was largely microoptimised with many unit tests in the codebase showing the reasoning for its performance decisions made down to even string manipulation.

    I have worked on many performance optimisation projects on the Sitecore platform, many of them blaming things like the ORM for the relative lack of performance. In almost all cases, I have proven that it is not the ORM that is at fault (though occasionally it might be a circular parent > child > parent mapping that may be at fault). 99% of the time it is poor thought on the part of the developer in terms of how their code is going to affect the sitecore database or how their code is going to behave at scale. Other things have been – IoC container usage (be it – the container chosen, overuse, repeated usage), incorrect calling of controllers / renderings without or bypassing caching, poor entries in pipelines doing more than they should. Never much if any of the above in the 25 + Sitecore projects I have personally been involved in.

    1. Regarding the complaint about micro-transactions across the Item-Repository-Service-ViewModel, I’m going to disagree. The task breakdown ends up being something like this:

      • Get Context from Sitecore (Controller)
      • Get Items that need to be Rendered (Repository)
      • Make ViewModel (Repository or Model Builder)
      • Get View (Controller or View Resolver)
      • Render View (View)

      I’m going to assume you’re following decades-old industry best practice for organizing your code (a la Code Complete).

      If you were going to go more traditional OOP and use base classes, you’d still have roughly the same number of handshakes. Even if you were to approach this in a more procedural manner, you’d still see roughly the same operations, assuming any respect for DRY at all. All this design pattern does is isolate these methods into discrete areas that can be (in practice) modified or replaced while minimizing regression testing. It gives developers a bit of religion to follow and makes the code more legible and predictable.

      If you are intensely interested in speed and you don’t care about unit testing, then yes, you could do everything in a single block of code, but your overall code would not be DRY and I would curse it every minute I was forced to debug it. The handshakes between methods when using a proper Design Pattern are on the order of nanoseconds. Considering you also complain (and rightfully so) about millisecond long HTTPS transaction speeds between storage and web server, I’d say that the decision to not use a modern, testable OOP approach is early over-optimization and therefore a strain on the budget. Coupled with a lack of respect for DRY principle, I’d say that approach is a project risk.

      Let’s talk about data sourcing. You don’t like Sitecore Query and you don’t appear to like the ContentSearch API either, which begs the question: where are you getting your data? You can’t get everything from Database.GetItem(ID, Language) or RenderingContext.Current. It’s just not a reasonable expectation, because it assumes complete assembly of the page by the content author, a la SXA. This might be fine for campaign landing pages on a 100-page website, but will not scale to a 50,000 page hospital or university site. Contrary to the current SXA fad, content management systems aren’t just design management systems. They’re supposed to provide some automation to the maintenance of the overall information architecture, and that requires data queries.

      You mention stale data, which I also found interesting. In my opinion, one thing that Sitecore implementers fail to establish during the specification stage is an expectation around content freshness. Coupled with a frequent lack of understanding on how Sitecore’s Workflow and Publishing features work, this tends to result in one of the largest sources of Sitecore frustration among content authors: Content either fails to show up on time, or content shows up before it’s finalized. Establishing some expectations around when content needs to be live will affect your approach to development around that particular content type or page fragment:

      • What data repository do you retrieve the content from?
      • What level of caching do you need to support for that particular content type (or page fragment)?
      • Do you need to customize cache keys to ensure presentation of new data is timely?
      • What is the “normal” SLA for “content going live” that won’t impact site performance under load? (content authors will generally assume “immediately” but this is seldom a reasonable expectation)

      My point is that your complaint about the freshness of the Solr/Lucene/Azure search repository is irrelevant. Content on the CDs almost never needs to be “up to the millisecond”, and once you introduce server performance constraints due to budget, as well as 3rd party CDNs the suggestion becomes even less viable. You have to cache data somewhere!

      This post covers the incredibly common mistakes I see out of Sitecore shops worldwide. Clearly you’ve got a better than average grasp of Glass and you’re avoiding the pitfalls I discuss in this post, which is good. Many, many developers use Glass based on older, slower, and more error prone strategies that have proliferated on the Internet and/or have been encoded as “standard” for their organization, hence my warnings. Frankly, every time I run into Glass on a project, its implementation tends to be a source of trouble, particularly when paired with code gen. Everything you mention in your last paragraph is accurate: Circular references, incorrect Controller access, caching (which I mention!), pipeline messes, and the like. I personally haven’t seen a ton of IoC problems, but then, a lot of my rescue missions were either before Sitecore supported that customization, the developers chose not to customize (smart), or the project was after Sitecore changed their approach to something more standard for ASP.NET MVC. I have heard it can be a nightmare though.

      I don’t think we disagree as much as you say.
