Posts Tagged ‘.NET’

Performance Spotlight: Detecting Multithreaded Bottlenecks in .NET

Case study as hands-on software architect for a large three-tier business application for global services and consulting firm.  

Takeaways: Quickly detect performance bottlenecks in multithreaded .NET applications by measuring and analyzing “processor queue length” and “thread contention” performance counters in your monitoring practice. 

Metric: Frequent or sustained processor queue lengths of more than two per processor may indicate a bottleneck or locking issues.  If coupled with high thread contention levels or high CPU utilization, it may point to locking or thread synchronization issues.

Situation

Performance is a critical aspect of successful project delivery and usability.

Recently, I was appointed software architect for a large business-critical web application with a global audience.  This application was having performance problems, and the team was trying to get it to pass its performance measurement test (PMT).  Earlier in the year, the the application had been the subject of a month-long review by an outside team of scalability consultants.

The web application is a classic three-tier Microsoft stack with an ASP.NET front end, a WCF business services tier, and a SQL Server backend. It spans roughly one million lines of code including test suites – the larger and more complex applications are, the harder it is generally to find problems.  On the whole, I found the solution very well coded, with advanced use of templates, interfaces, design patterns, and layering.

Solution network diagram

I took a look at the monitoring data they were gathering during performance stress test sessions.  There was a basic set of performance counters, and not much stood out: for the front-end web and middle-tier services, CPU was only moderately loaded at 25-40%, and all other major subsystems – memory, disk, network – were barely taxed at all.  The SQL Server appeared to be barely turning over, with single-digit CPU usage and no stress on other subsystems.

Where was the problem?  

Only one performance counter really stood out as spiky and hyperactive: System\Processor Queue Length.  This is one of those counters that a lot of developers may have heard about, but have never really used or looked at.  PQL was regularly spiking to 150-200, and maxing out at 250 (click for larger image): 

Performance stress test showing spiking processor queue length

The general guidelines from Microsoft say that an average PQL of two or more per core indicates a bottleneck:

“A collection of one or more threads that is ready but not able to run on the processor due to another active thread that is currently running is called the processor queue. The clearest symptom of a processor bottleneck is a sustained or recurring queue of more than two threads

What this means is that on a 12-core hyper-threaded server (24 logical cores), in our application threads were stacked 10 to 20 deep on each processor just waiting for the CPU.   This is the opposite of high performance: more specifically, it’s a bottleneck.

Advancing the analysis

Based on this & research, I compiled a more detailed set of performance counters that included .NET thread contention levels: the Contention Rate / sec counter in the .NET CLR LocksAndThreads category.  These counters were collected in a subsequent PMT run:

Detail of contention rate / sec

This shows that on average there are ~90 contentions per second where a thread is waiting on or blocked by a lock – and sometimes up to 300 per second.   By itself, 300 contentions per second might not be an issue – but coupled with a high PQL it seemed suspect.

Finding the culprit

When troubleshooting systemic issues, start with the core/common code structures and move outward.

How do you find the cause of excessive contention in a million lines of code that’s studded with locks, much less on a new project you’re not familiar with?  Answer: very strategically. 

I’ve had many years’ experience as a consultant troubleshooting clients’ solutions and reviewing new solution codebases.  If you’re lucky, you’ll be able to reproduce the issue in a development environment where you can use profiling tools to identify performance hotspots or concurrency bottlenecks.  For the harder issues to solve, the first thing I will do is to code review any core/common code related to suspected areas of functionality.  In this case, searching for Parallel.For statements would have been a great bet…

On this application, I had already in fact flagged a piece of code in a core component.  Caches are essential to high performance, and the business logic service tier was home to more than 20 in-memory caches, all of which derived from a common base class, SynchronizedCache. 

Sample cache declaration showing use of common base class

In this base class the developers parallelized the cache search functionality, presumably to take advantage of nice things like a 72-core production environment.  Unfortunately, there’s a lock statement in the heart of it:

Parallel.For with a lock in its heart!

Developers experienced with high-performance multithreading know to avoid locking within parallelized code: it’s a real performance killer.   Here the code, with many thread combing through a cache that could contain a million entities, locks the shared matches list when it finds a match, causing all other matching threads to pile up in a traffic jam behind its lock. 

A search-heavy run through the .NET profiler in Visual Studio 2012 supported this analysis.  The little “flame” icon shows this line as a hotspot, which by the way doesn’t conclusively indicate a bottleneck – one might reasonably look at this and expect that this would be “hot” since it’s doing a lot of search work: 

Visual Studio profiler showing hotspot on Parallel.For

But this is a classic multithreading anti-pattern.  Locking — and blocking — on a shared resource (the matched results list in this case) negates the fundamental reason for multithreading.  Multithreading splits the work up among processors, and locking and synchronization in general brings them together.  It also adds context-switching work for the processor, since each thread match forces, on average, a context switch out and back during a lock wait.

Fixing it

Best practice in multithreading: minimize locking on shared shared resources.

So we’d like to search a large cache, dividing the work among many threads in parallel, but not lock on a central list; how do we accomplish that?   The solution is straightforward: have a results list for each thread, then merge the thread’s results into the master list when complete.  The “penalty” for this is creating a few extra lists — one per thread – which is almost zero.  There’s even a variant of Parallel.For that has a delegate for local thread init action (create local results list) and completion action (lock and merge to master) – perfect!

After reworking this tiny stretch of code, the results were dramatic: PQL dropped to zero, thread contention dropped to zero, and – most surprisingly – CPU utilization dropped dramatically, from its range of 25-40%, to around 8-10%.  To me, this says that the CPU was having to do a lot of work juggling all the threads and locks and context switches that resulted from the anti-pattern. 

With the fix in, a 90-minute performance measurement test shows the radical improvement:

Performance counters after fix

Conclusion

Multithreading can be a curse as well as a blessing; monitoring processor queue length and thread contention will reveal how effectively your application is utilizing processor threads and cores.  Just a single statement in a million lines of code can dramatically impact optimal performance and throughput.

Add the following performance counters to your monitoring bag of tricks to identify bottlenecks in multithreaded .NET solutions:

  • System\Processor Queue Length.  Look for any spikes or sustained queue lengths of more than one or two per processor.
  • .NET CLR ThreadsAndLocks\Contention rate /sec.   Make sure you understand what in your code  is causing contention; look for spikes or sustained contention.
  • Processor\% Processor Time.   I am sure you’re already monitoring this!

More Reading

ASP.NET’s dirty little secret

image[tweetmeme source=”KeithBluestone” only_single=false]It’s no wonder most developers are confused about Microsoft’s ASP.NET. Although it was released over eight years ago in January 2002, there are few if any coherent explanations of the ASP.NET page life cycle. This is unfortunate, as it’s probably the most critical aspect for web developers to understand.

The Microsoft documentation is almost worthless (see below), and strangely enough, books and blogs give it only a cursory treatment, without really explaining how best to use it – and why.   The net effect is a huge number of developers who struggle to make their ASP.NET applications work.

It’s ASP.NET’s dirty little secret: very few developers really understand the page life cycle and the page model, and there’s little meaningful guidance to be found.   From a Microsoft product/program perspective, this seems like a monumentally costly mistake: developer guidance – or lack thereof – has a huge impact on the success of the entire product ecosystem, from Microsoft to developers to clients to end-users.

In a future post: I’ll offer up my own take on the ASP.NET page life cycle, with metaphors that developers can relate to.

When documentation isn’t guidance…

Sure, there’s plenty of documentation reams and reams of it. There are a thousand articles which offer functional, mechanical, and temporal descriptions of the page life cycle and its ten events.

But there’s little that makes sense to the everyday developer.  There is no real guidance, no best practices for page architecture that tells us how a well-designed page should be factored.  There is plenty explaining the order of events;  but little explaining their meaning.

I’m not sure why Microsoft let this happen: ASP.NET is a great framework.  It’s hard to imagine what the development costs associated with this must be across the industry.  You think that Deep Horizon spill is big?  This one’s been leaking since January 2002!  One can only imagine the millions of developer hours spent debugging issues related to an imperfect understanding of the page lifecycle.   I know I’ve spent my share.

A hammer and a saw does not a carpenter make.
— Anon.

Backstory

I’m doing  architectural reviews of a couple ASP.NET projects for my client.  These sites are regular, relatively low-volume, get-things-done business apps:  a lot of forms-based data entry, forms authentication, a healthy dash of reporting, some middle-tier WCF services, and a handful of back-end databases.  The “meat and potatoes” of ASP.NET, as it were.

While the sites actually work pretty well, the ASP.NET code is a mess. (I’m sorry, I couldn’t come up with any nicer way to say it.)  It looks like it was written by developers who had success writing basic ASP.NET forms – like most of us, once upon a time — but then everything went to hell in a handbasket in the seething cauldron of project schedules, nested user controls and AJAX postbacks.

Personal aside: when I first began doing ASP.NET around 2002, I had no clue. I had a simplistic, relatively inaccurate conceptual model of the page lifecycle,  limited primarily to a dysfunctional love-hate relationship with the Load and PreRender events.   I certainly didn’t have anywhere near a full understanding of viewstate, databinding, or the postback model.  Like most of the developers out there, I suspect, I learned just enough to get by – and got myself into a heap of trouble as the pages I was building became more complex.

“Lifecycle not optional”

The facts are this: without a thorough understanding of the page lifecycle, the ASP.NET developer is doomed to a long and rocky road. The same may be said for the site’s end users, who will have to deal with bugs and erratic behavior.

The lifecycle is central to the page developer’s world:  it weaves together page creation, a hierarchy of controls, viewstate, postback, eventing, validation, rendering, and ultimately, death — against the backdrop of the page model and the HTTP application pipeline, which fuse the the protocol of the HTTP request with the mechanism of rendering an HTML response.

Misconceptions about core concepts like the page lifecycle cause a huge number of problems.  When a page becomes a little more complex than a basic form – a few user controls here, some databinding there – things start to break.  Under schedule pressure, developers start relying on oh-so-clever software hackery — which in the best scenario, actually makes things work reasonably well.   The best result, unfortunately, is a hacked-up codebase that only works “reasonably well.”

It is important for you to understand the page life cycle so that you can write code at the appropriate life-cycle stage for the effect you intend.
ASP.NET Page Life Cycle Overview (MSDN)

The control life cycle is the most important architectural concept that you need to understand as a control developer.
Developing Microsoft ASP.NET Server Controls and Components (Microsoft Press)

image

Back to the client

In my client’s code, I consistently found treatments of the page Load and PreRender events that showed a lack of understanding of the life cycle.   Sometimes everything was done on Load, and sometimes on PreRender;  sometimes postback conditions were respected, and other times, not so much.

The code was also guilty of minor abuse of session state (using it as the backing store for user controls), occasional major violence to user controls (page reaching in, control reaching out — a violation of encapsulation), and a near-complete aversion to object-oriented design (primarily a procedural approach).

Some of this, like using session state to back user controls, I believe was due to the fact that the developers didn’t understand the life cycle:  how the page databinds its controls, in what order it all happens (top-down or bottom-up?), and how data flows on the page throughout the life cycle.  In this case, using session state worked — for one control on one page, in one window — so it shipped.

This would all be duly noted in the architectural report.  But to be most helpful, I really wanted to give the client team some rock-solid guidance and best practices for developing ASP.NET web forms:  what should one do to write clear, stable, maintainable ASP.NET applications?  What kind of code should go in Init, Load, and PreRender? How to explain clearly to the developers what the events really mean?

20% of the effort produces 80% of the results I started off thinking that compiling best practices would be a reasonably simple affair involving a few web searches.

The search…

I was looking specifically for ASP.NET “best practices” related to the page lifecycle that would address questions like these:

  • What key concepts should developers keep in mind when designing ASP.NET pages to work in harmony with the page lifecycle?
  • How should a developer best structure the page to address common scenarios, such as a form which accepts input from its users?
  • What kinds of things should be done during the page Load event?  In the Init event?  In PreRender?    What are some guidelines for deciding when to put what types of code in Load vs. PreRender? Init vs. Load? Etc.
  • What kinds of things should be done –or not done — during postback (IsPostBack == true), especially in specific, basic scenarios — like that standard data input form?
  • What key concepts should developers keep in mind when using controls throughout the page lifecycle?  E.g. what are the responsibilities of the page or parent control vs. those of the child control?

Surely there was a site that would distill “The Tao of ASP.NET” or somesuch for the poor developer; something that would convey a clear, coherent conceptual model.

The emperor has no clothes

What I found first was the MSDN documentation from Microsoft on the ASP.NET Life Cycle.   This might charitably be described as mind-numbingly functional. Can I get a witness??

From the official ASP.NET Page Life Cycle Overview on MSDN:

Use this event to read or initialize control properties.
— Explaining the page Init event

Use the OnLoad(EventArgs) event method to set properties in controls and to establish database connections.
— Explaining the “typical use” of the page Load event

If the request is a postback, control event handlers are called.
— On the postback event handling phase

That’s it?  In Load, we should “set properties in controls and establish database connections?”  To me, this is of shockingly little value.  At this point I was thinking, wow, if this is the official guidance, it’s no wonder most developers don’t really understand the ASP.NET page life cycle.

It’s really hard to believe that this is the extent of Microsoft’s offering on such an important subject to its developers.   Don’t they understand what it means not only for Microsoft and its image, but also developers and their clients, the users… the whole ASP.NET ecosystem?

Sigh.

Order is not the whole of knowledge

Outside the official Microsoft channel, I found scores and scores of sites which catalogued the various lifecycle events, many of them replaying the MSDN documentation without much additional detail.  Site after site described  the page life cycle and its events as if  “to order” were “to know.”

There were few meaningful metaphors for developers to relate to, and little if any guidance as to how to actually use the events in a practical manner.  Witness again:

Objects take true form during the Load event.
— “The ASP.NET Page Life Cycle,” 15seconds.com, in an apparent fit of Buddhistic nihilism

The Pre_Render [sic] event is triggered next and you can perform any update operations that you require prior to saving the View State and rendering of the web page content.
— “Understanding the ASP.NET Page Life Cycle,”  aspnet101.com

PreRender is often described as “the last chance to do something that affects viewstate.” Am I the only one who doesn’t find this particularly helpful?

Personal aside: I didn’t understand viewstate fully; until recently, that is, when I read Dave Reed’s epic, thorough, and now-classic treatment in Truly Understanding Viewstate.   If you want to be an ASP.NET pro, read this excellent post.

It’s a gusher…  a leaky abstraction

After many hours across days of searching, and hundreds of articles read, I found few clues.  I did find grim evidence in blogs and forums that what I was after might not exist:  a large number of developers expressing frustration with how ASP.NET works:

Let me tell you all that if you want to work from home and be a freelance web developer, ASP.NET will stress your life out and you will die young.
— From a multi-year discussion on the post “ASP.NET Sucks” by Oliver Brown

WebForms is a lie. It’s abstraction wrapped in deception covered in lie sauce presented on a plate full of diversion and sleight of hand.
— “Problems or Flaws of the ASP.NET postback model,” quoting Rob Conery in “You Should Learn MVC,” channelling Will Ferrell

Are there any best practices or recommendations around using "Page_PreRender" vs "Page_Load"?
— Max, a lonely voice in the wilderness,  on the thread “Best Practices around PreRender and Load”  (July 2007)

image

" WebForms is a lie! It’s abstraction wrapped in deception covered
in lie sauce presented on a plate full of diversion and sleight of hand!"

Finally, from respected veteran blogger and ASP.NET consultant Rick Strahl:

It can sometimes get very difficult to coordinate the event sequence for data binding, rendering, and setup of the various controls at the correct time in the page cycle. Do you load data in the Init, Load or PreRender events or do you assign values during postback events?
— “What’s Ailing ASP.NET Web Forms,” Rick Strahl (Nov 2007), citing complexity in the page lifecycle as a major ASP.NET issue

If it’s “very difficult” for an ASP.NET guru, what hope is there for Joe and Jane ASP.NET Developer?

Amidst the sea of holes…

In the middle of all of this I did find this gem of conceptual guidance:

Web Forms is framed as "page-down" instead of "control-up"… From the perspective of a control, databinding looks like "someone who is responsible for me, because they contain me, will set my data source and tell me when to bind to it. They know what I am supposed to do and when, *not* me. I am but a tool in the master plan of a bigger entity."

From the perspective of a page, databinding looks like: "I am the biggest entity on this page. I will determine relevant data, set data sources on my children, and bind them. They will in turn set data sources on their children and bind them, and so forth. I kick off this entire process but am only responsible for the first logical level of it."
Bryan Watts in a reply on Rick Strahl’s blog, offering a conceptual model of page-control interaction

Brilliant!!  Watt’s Law of the page-control interaction. Finally, a conceptual model that a developer can relate to, understand, and most importantly, help teach effective page and control design.  This simple guidance conveys:

  1. How the control should be structured to work with its parent (page will set my data source and bind to it); and
  2. How the page should be structured to work with a child control (I will set the data source of the control and tell it when to bind)

This was the only real ray of light so far.  I forged on, throwing hail-Mary terms like “zen” and “tao” into my search.  It seemed like the more I read, the less I understood.

This was partly because as I delved further I realized there were aspects of the page model and lifecycle that I didn’t truly understand, either.

The great white (paper) hope

Undaunted, I renewed my subscription to Safari Books Online, publishing home to some of the world’s finest technical authors.  If the answer was anywhere, surely it would be here. I lapsed briefly into optimism:  maybe the blogosphere had eschewed its treatment of this topic because the publishing world had illuminated it so thoroughly?

Not so much. Most books, while fantastic learning resources – far better for learning than reading random articles on the web –  spilled precious little ink on the page life cycle, rushing on to more exciting topics like databinding and the GridView control.

image ASP.NET 3.5 Unleashed treats the life cycle almost dismissively:

Ninety-nine percent of the time, you won’t handle any of these events except for the Load and the PreRender events. The difference between these two events is that the Load event happens before any control events and the PreRender event happens after any control events.
ASP.NET 3.5 Unleashed, “Handling Page Events

imageJesse Liberty’s Programming ASP.NET 3.5 presents a great life cycle flowchart: it has a lot of order and flow information, but little guidance.  A page or so later, and the lifecycle is dragged outside and summarily dispatched in the following fashion:

The load process is completed, control event handlers are run, and page validation occurs.
Programming ASP.NET 3.5,Website Fundamentals: Life Cycle

The excellent book Developing Microsoft ASP.NET Server Controls and Components (Kothari /Datye, Microsoft Press) is a must-read for any serious ASP.NET architect or control developer.  But they, too, shy away from the concerns of guidance:

In this phase, you should implement any work that your control needs to do before it is rendered, by overriding the OnPreRender method.
Developing Microsoft ASP.NET Server Controls and Components, “Chapter 9, Control Life Cycle

image

From Programming ASP.NET 3.5: running before walking

As I read each book’s treatment of the life cycle, I kept thinking, how can you move beyond the page life cycle when you haven’t described what we’re supposed to do with it?

Conclusion

What do the Init, Load, and PreRender events really mean? How should developers architect pages to address common scenarios? Where are the workable models and metaphors?   The order of page events is only one aspect of their meaning. And explaining events in terms of their interaction with viewstate, postback, or the control tree is most useful only when one thoroughly understands those concepts.

I am a big proponent of software designs and user interfaces that follow natural conceptual models: it’s a win-win for developers and all stakeholders across the entire software development lifecycle.    See my post on metaphors in software design.

In the end, it took a very long time and a lot of dogged persistence for me, an experienced software architect, to feel like I thoroughly understood the ASP.NET page model and its lifecycle:  the “bottom-up” of control initialization vs. the “top-down” of control loading, the meaning of the Init, Load, and PreRender events — and specifically what kind of code I would use in them, when and why– and a clear understanding of viewstate and postback.

Why burn money when developing ASP.NET pages is so much more fun?Along the way, I realized why a lot of developers have issues with ASP.NET.  It’s ASP.NET’s dirty little secret: very few developers really understand the page lifecycle and the page model – and very little “real” documentation exists for it.

From a Microsoft product/program perspective, this seems like a monumentally costly mistake. On my Googling travels through ASP.NET forums and in blog post comments, I “met” tons of developers who were struggling with ASP.NET – and who are still struggling with it.  It’s pretty congruent with my own experience, and of the ASP.NET developers I’ve known in person.

As far as I’m concerned, the best architectures are those that are almost impossible to screw up.  Despite its power and flexibility, ASP.NET falls short on that account.

As far  as Microsoft goes, devoting time, care, and energy to make sure developers have crystal-clear guidance represents a huge opportunity for the success of the product ecosystem. Imagiimagene how many millions of corporate development hours have been spent since January 2002 debugging ASP.NET apps because a developer didn’t understand the page lifecycle, viewstate,  postback, or Watt’s Law of Databinding.

Even worse, how many clients and end-users were frustrated because their Microsoft-based web apps didn’t work properly? You can see how this issue could have disastrous implications for the ecosystem.  As a long-time Microsoft fan, I hope they make sure this never happens again.

Coming soon: in my next  post, I’ll offer up my “three dualities of ASP.NET,” clear metaphors that will enable ASP.NET developers to understand the lifecycle and write complex applications that work well.

***

More Reading

  • Truly Understanding ViewState, Dave Reed (Infinities Loop)
    A comprehensive and humorous look at the uses and mis-uses of viewstate; a definitive resource and must-read for the professional ASP.NET developer:
    “It’s not that there’s no good information out there about ViewState, it’s just all of them seem to be lacking something, and that is contributing to the community’s overall confusion about ViewState.”
  • Understanding ASP.NET ViewState, Scott Mitchell (4GuysFromRolla.com)
    A great starter article.
  • The ASP.NET Page Object Model, Dino Esposito (MSDN)
    Another excellent article from an top-notch writer…  but shy of guidance.
  • What’s Ailing ASP.NET Web Forms, Rick Strahl (West-Wind.com)
    A great post with pros and cons of ASP.NET and a comparison to MVC.
    “Microsoft built a very complex engine that has many side effects in the Page pipeline.”
  • 10 Reasons Why ASP.NET WebForms Suck, JD Conley (“Starin’ at the Wall 2.0”)
    Like a true mockumentary, this dial actually goes up to 11.  What gets that honor?
    “Number 11: The odd feeling that you have to beat the framework into submission to get it to do what you want.”
  • Page Load is Evil, Wayne Barnett
    I can’t really say I agree. But this post is accompanied by a deliciously raging discussion that boils over into near-flame-war at the end.  Unfortunately with no pat answers to what to do in Load or PreRender or Init, for that matter.
    “It seems to me that the root problem is that the page life cycle isn’t given the respect it deserves in tutorials and documentation.” – Anonymous poster
    I would desperately like to see more on this subject [of page life cycle]!  … I’ve tried reading up on this before, but from a designer/programmer’s point of view it’s sometimes difficult to get my head around the order of events or their meanings.” – Chris Ward, followup poster