Performance Spotlight: Detecting Multithreaded Bottlenecks in .NET

Case study as hands-on software architect for a large three-tier business application for a global services and consulting firm.

Takeaways: Quickly detect performance bottlenecks in multithreaded .NET applications by measuring and analyzing “processor queue length” and “thread contention” performance counters in your monitoring practice. 

Metric: Frequent or sustained processor queue lengths of more than two per processor may indicate a bottleneck or locking issues.  If coupled with high thread contention levels or high CPU utilization, it may point to locking or thread synchronization issues.

Situation

Performance is a critical aspect of successful project delivery and usability.

Recently, I was appointed software architect for a large business-critical web application with a global audience.  This application was having performance problems, and the team was trying to get it to pass its performance measurement test (PMT).  Earlier in the year, the application had been the subject of a month-long review by an outside team of scalability consultants.

The web application is a classic three-tier Microsoft stack with an ASP.NET front end, a WCF business services tier, and a SQL Server backend. It spans roughly one million lines of code including test suites – the larger and more complex applications are, the harder it is generally to find problems.  On the whole, I found the solution very well coded, with advanced use of templates, interfaces, design patterns, and layering.

Solution network diagram

I took a look at the monitoring data they were gathering during performance stress test sessions.  There was a basic set of performance counters, and not much stood out: for the front-end web and middle-tier services, CPU was only moderately loaded at 25-40%, and all other major subsystems – memory, disk, network – were barely taxed at all.  The SQL Server appeared to be barely turning over, with single-digit CPU usage and no stress on other subsystems.

Where was the problem?  

Only one performance counter really stood out as spiky and hyperactive: System\Processor Queue Length (PQL).  This is one of those counters that a lot of developers may have heard about, but have never really used or looked at.  PQL was regularly spiking to 150-200, and maxing out at 250:

Performance stress test showing spiking processor queue length

The general guidelines from Microsoft say that an average PQL of two or more per core indicates a bottleneck:

“A collection of one or more threads that is ready but not able to run on the processor due to another active thread that is currently running is called the processor queue. The clearest symptom of a processor bottleneck is a sustained or recurring queue of more than two threads…”

What this means is that on a 12-core hyper-threaded server (24 logical cores), in our application threads were stacked 10 to 20 deep on each processor just waiting for the CPU.   This is the opposite of high performance: more specifically, it’s a bottleneck.

Advancing the analysis

Based on this & research, I compiled a more detailed set of performance counters that included .NET thread contention levels: the Contention Rate / sec counter in the .NET CLR LocksAndThreads category.  These counters were collected in a subsequent PMT run:

Detail of contention rate / sec

This shows that on average there are ~90 contentions per second where a thread is waiting on or blocked by a lock – and sometimes up to 300 per second.   By itself, 300 contentions per second might not be an issue – but coupled with a high PQL it seemed suspect.

Finding the culprit

When troubleshooting systemic issues, start with the core/common code structures and move outward.

How do you find the cause of excessive contention in a million lines of code that’s studded with locks, much less on a new project you’re not familiar with?  Answer: very strategically. 

I’ve had many years’ experience as a consultant troubleshooting clients’ solutions and reviewing new solution codebases.  If you’re lucky, you’ll be able to reproduce the issue in a development environment where you can use profiling tools to identify performance hotspots or concurrency bottlenecks.  For the harder issues to solve, the first thing I will do is to code review any core/common code related to suspected areas of functionality.  In this case, searching for Parallel.For statements would have been a great bet…

On this application, I had already in fact flagged a piece of code in a core component.  Caches are essential to high performance, and the business logic service tier was home to more than 20 in-memory caches, all of which derived from a common base class, SynchronizedCache. 

Sample cache declaration showing use of common base class

In this base class the developers parallelized the cache search functionality, presumably to take advantage of nice things like a 72-core production environment.  Unfortunately, there’s a lock statement in the heart of it:

Parallel.For with a lock in its heart!

Developers experienced with high-performance multithreading know to avoid locking within parallelized code: it’s a real performance killer.   Here the code, with many threads combing through a cache that could contain a million entities, locks the shared matches list when it finds a match, causing all other matching threads to pile up in a traffic jam behind its lock.
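
Since the original screenshot is not reproduced here, the following is a minimal sketch of the shape of the anti-pattern, not the application's actual code. The class name LockedSearch and the FindAll signature are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical illustration of the anti-pattern: a parallel search
// that funnels every match through one shared lock.
public static class LockedSearch
{
    public static List<T> FindAll<T>(IList<T> items, Predicate<T> isMatch)
    {
        var matches = new List<T>();      // shared by all worker threads

        Parallel.For(0, items.Count, i =>
        {
            if (isMatch(items[i]))
            {
                lock (matches)            // contention point: one thread at a time
                {
                    matches.Add(items[i]);
                }
            }
        });

        return matches;                   // order of results is not guaranteed
    }
}
```

The result is correct, which is why code like this tends to survive review; the cost only shows up under load, when many iterations match and the workers spend their time queuing for the lock.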

A search-heavy run through the .NET profiler in Visual Studio 2012 supported this analysis.  The little “flame” icon shows this line as a hotspot, which by the way doesn’t conclusively indicate a bottleneck – one might reasonably look at this and expect that this would be “hot” since it’s doing a lot of search work: 

Visual Studio profiler showing hotspot on Parallel.For

But this is a classic multithreading anti-pattern.  Locking — and blocking — on a shared resource (the matched results list in this case) negates the fundamental reason for multithreading.  Multithreading splits the work up among processors, and locking and synchronization in general brings them together.  It also adds context-switching work for the processor, since each thread match forces, on average, a context switch out and back during a lock wait.

Fixing it

Best practice in multithreading: minimize locking on shared resources.

So we’d like to search a large cache, dividing the work among many threads in parallel, but not lock on a central list; how do we accomplish that?   The solution is straightforward: have a results list for each thread, then merge the thread’s results into the master list when complete.  The “penalty” for this is creating a few extra lists — one per thread – which is almost zero.  There’s even a variant of Parallel.For that has a delegate for local thread init action (create local results list) and completion action (lock and merge to master) – perfect!
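
The approach described above can be sketched with the Parallel.For overload that takes localInit and localFinally delegates. This is a minimal sketch under assumptions, not the code that went into the application: the class name PartitionedSearch and the FindAll signature are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: each worker collects matches in its own list
// and takes the lock only once, when merging into the master list.
public static class PartitionedSearch
{
    public static List<T> FindAll<T>(IList<T> items, Predicate<T> isMatch)
    {
        var matches = new List<T>();
        var mergeLock = new object();

        Parallel.For<List<T>>(
            0, items.Count,
            () => new List<T>(),                  // localInit: one private list per worker task
            (i, loopState, local) =>
            {
                if (isMatch(items[i]))
                {
                    local.Add(items[i]);          // no lock: the list is private to this worker
                }
                return local;
            },
            local =>
            {
                lock (mergeLock)                  // localFinally: one short lock per worker
                {
                    matches.AddRange(local);
                }
            });

        return matches;                           // order of results is not guaranteed
    }
}
```

The lock is still there, but it is taken once per worker task rather than once per match, so the number of acquisitions scales with the degree of parallelism instead of the size of the result set.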

After reworking this tiny stretch of code, the results were dramatic: PQL dropped to zero, thread contention dropped to zero, and – most surprisingly – CPU utilization dropped dramatically, from its range of 25-40%, to around 8-10%.  To me, this says that the CPU was having to do a lot of work juggling all the threads and locks and context switches that resulted from the anti-pattern. 

With the fix in, a 90-minute performance measurement test shows the radical improvement:

Performance counters after fix

Conclusion

Multithreading can be a curse as well as a blessing; monitoring processor queue length and thread contention will reveal how effectively your application is utilizing processor threads and cores.  Just a single statement in a million lines of code can dramatically impact optimal performance and throughput.

Add the following performance counters to your monitoring bag of tricks to identify bottlenecks in multithreaded .NET solutions:

  • System\Processor Queue Length.  Look for any spikes or sustained queue lengths of more than one or two per processor.
  • .NET CLR LocksAndThreads\Contention Rate / sec.   Make sure you understand what in your code is causing contention; look for spikes or sustained contention.
  • Processor\% Processor Time.   I am sure you’re already monitoring this!
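
If you'd rather sample these counters from code than from Performance Monitor, the System.Diagnostics.PerformanceCounter class can read all three. The sketch below assumes the .NET Framework on Windows, where the .NET CLR LocksAndThreads category is available. The class name CounterSampler, the one-second interval, and the instance name passed in are assumptions for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical sampler for the three counters listed above.
public static class CounterSampler
{
    // Rule of thumb from this post: more than two queued threads per processor.
    public static bool IsQueueSuspicious(float queueLength, int logicalProcessors)
    {
        return queueLength > 2f * logicalProcessors;
    }

    // clrInstanceName is the performance counter instance of the process to
    // watch (typically its process name); this is an assumption to adjust.
    public static void Sample(string clrInstanceName, int samples)
    {
        using (var queue = new PerformanceCounter("System", "Processor Queue Length"))
        using (var contention = new PerformanceCounter(
                   ".NET CLR LocksAndThreads", "Contention Rate / sec", clrInstanceName))
        using (var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total"))
        {
            // Rate counters need two samples; the first NextValue() call returns 0.
            contention.NextValue();
            cpu.NextValue();

            for (int i = 0; i < samples; i++)
            {
                Thread.Sleep(1000);
                float pql = queue.NextValue();
                Console.WriteLine(
                    "PQL={0:F0} suspicious={1} contention/sec={2:F0} cpu={3:F0}%",
                    pql,
                    IsQueueSuspicious(pql, Environment.ProcessorCount),
                    contention.NextValue(),
                    cpu.NextValue());
            }
        }
    }
}
```

For example, CounterSampler.Sample("w3wp", 60) would print one line per second for a minute against an IIS worker process, assuming that is the instance name on your server.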

More Reading

Visual Studio 2010: automate datestamping of code comments

In this post, I’ll show you how easy it is to create a three-line macro to insert your name with the current date in the current file, handy for automatically adding a datestamp and/or a timestamp with your user info to code comments.

Backstory

If you’re like me, you hate repetitive tasks.   But when I’m reviewing or writing code, it’s a best practice to add my name or initials to the code, along with the current date and time.  This ensures that other team members know when the comment was made: there’s a big difference between a comment that says “fix this” that was added two years ago, and one that was added last week.   And adding my name or initials ensures that someone can always track me down and ask a question.

But to the point – I hate typing in the same text over and over:

// ALERT: this should use CurrentThread.Principal. KWB 2010.11.04

Fortunately, Visual Studio makes it easy to automate this.

Creating a macro in Visual Studio

The macro below will insert the name of the currently logged-in user and the current date. For me at my current client, my domain login (reflected by Environment.UserName) is “Keith.Bluestone”:

// ALERT: take me now, Lord!  Keith.Bluestone 11/8/2010

Go to View | Other Windows | Macro Explorer on the Visual Studio menu:

Macro Explorer menu item

In any public module, create a new sub named AddInitialsDateTime – or pick any name you like. The sub should look something like the code below.  I chose to identify my comments with my Windows login name, which ensures consistency within a corporate environment – but you can customize this to include whatever information you like! 

Imports System
Imports EnvDTE
Imports EnvDTE80
Imports EnvDTE90
Imports EnvDTE90a
Imports EnvDTE100
Imports System.Diagnostics

Public Module Module1

    ' Add name of currently logged-in user with the current date
    ' Handy for datestamping comments
    Public Sub AddInitialsDateTime()

        Dim textSelection As EnvDTE.TextSelection

        ' The selection (or cursor position) in the active editor window
        textSelection = CType(DTE.ActiveDocument.Selection(), EnvDTE.TextSelection)

        ' Inserts e.g. " Keith.Bluestone 11/8/2010" at the cursor
        textSelection.Insert(" " + Environment.UserName + " " + Date.Now.ToShortDateString)

    End Sub
End Module

Hooking your macro up to a key shortcut

Now that you have entered the macro, you simply need to hook it up to a keyboard shortcut.  Close the Macro editor and open Tools | Options | Environment | Keyboard. Search for commands containing your macro name and choose a key combination;  I chose “Ctrl-D Ctrl-D” because it’s really easy and it’s not used by anything I need.  Make sure you click the “Assign” button to commit your shortcut choice.

Assigning a keyboard shortcut to a macro

Testing the macro

Now go to any code file, enter a comment, and hit Ctrl-D Ctrl-D or whatever shortcut you chose:

// I love automation…   Keith.Bluestone 12/15/2010

Enjoy.

Conceptual coherence in software design

There’s no doubt about it, a good design is hard to find.   But design is critical in software engineering, just as in hardware engineering – bridges, skyscrapers, iPads, anything in the physical world.  Design affects almost every aspect of product development and its costs – and in software, the most costly one is maintenance. 

I always try to take extra time in the design phase to ensure that I’ve got it right, since I find that with a good design, everything is easier – from the start.

How do you know if you have a “good” design?   After all, no one in the world can sum up the essentials of good design in a meaningful way for everyone.  And rather than empty academic prescriptions – “flexible, layered, blah blah blah” – I tend to use a series of acid tests whenever I’m coming up with a software design.  For instance:

  • Does the design meet the needs of the project?
  • Is it flexible and extensible enough to meet the likely needs of the business moving forward?
  • Can the development team pull it off easily to meet committed deadlines?
  • Is it conceptually coherent?  Would someone easily understand it, or would it be an effort to explain? 

There are others, too; but in this post what I want to talk about is near and dear to my heart: whether a design is conceptually coherent, and whether it affords easy use. 

Conceptual coherence

Sounds fancy, but what this really means is: “does it make sense?”  Do the concepts “stick together?”  (From the Latin, co-, together, + haerere, to stick.)   If you’ve factored your design into classes and entities, do they make sense at a high level?  Can the classes, components, and services in your design be easily understood and used by other developers?  Or do developers have to understand and master details that should really be hidden from their sphere of concern?

A good design will make it easy to create robust, working software:

  • Classes and components map directly to real-world entities
  • Object model is easy to use and supports real-world scenarios (containment, iteration, etc.)

Let’s look at each one of these aspects in turn.

Classes and components map to real-world entities

The classes and entities you create in your design should map almost 1:1 to real-world entities.  Name your classes and entities in terms that end-users understand; use vocabulary familiar in the solution domain.  We programmers are generally such a smart lot that we tend to invent alternate realities, populated with their own fantastic devices and widgets. Sometimes we lose sight of the real one, and forget that we eventually have to explain ours to someone else in order for them to use it.  Avoid this like the plague!  Keep it simple: classes and components should model “things” in the real world.

An astute reader pointed out that this is very similar to the concepts of domain-driven design, which is true:  see additional links at the end.

Say you’re developing the My Golf Genius application, which will allow golfers to track their play on courses across the world, and will use massively parallel service-oriented clustered computing architectures in the cloud to apply brilliant heuristic analyses to captured golf rounds, bettering the game of the average Joe by over 8 strokes!

Your classes, objects, and entities should model real-world things like golfers, clubs, courses, holes, tees, strokes, handicaps, and so on.   Even if the implementation seems trivial, create and keep entities separate if that’s the way we think about them in the real world.   In code, classes and objects are super-cheap to create and manage these days in all aspects, from compilation to memory footprint to runtime.  The benefits of conceptual clarity far outweigh the shortcuts. 

For example, our My Golf Genius application architect needs to keep track of the number of strokes per hole during play;  he could reasonably store the number of strokes as an integer StrokeCount in a MyGolfApp.Hole class.  After all, it is a “count” – and counts are integers or longs, right?  It’s design overkill to create a whole object to represent each stroke, right??


Not so fast.  What happens when you need to extend your successful My Golf Genius application to provide stats on how your success in sand saves correlates with the time of day, the weather, and the stock market trends that day?  You don’t have this info:  you only have “StrokeCount” for each hole.   The problem is that “StrokeCount” is really a rollup of the real data. The solution is, of course, to have a list of Stroke objects, each of which can be any legal type of stroke, such as a PenaltyStroke or a SandStroke.  You can even add Drives and Putts – which of course, derive from MyGolfGenius.Stroke.

If the architect of My Golf Genius had started out simple, he would have captured the reasonably relevant details of what really happens on the golf course: a series of strokes are taken on each hole by a golfer.  Extending the application to capture stroke detail – which of course enables enhanced analytics for Joe the Golfer –  would then be a straightforward task. 
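
As a rough sketch of that model: the type names Stroke, PenaltyStroke, SandStroke, Drive, Putt, and Hole and the StrokeCount property come from the discussion above, while the TakenAt property, the Add method, and the CountOf helper are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each stroke is its own object, so detail can be attached later.
public class Stroke
{
    public DateTime TakenAt { get; set; }   // hypothetical detail, e.g. for time-of-day stats
}

public class Drive : Stroke { }
public class Putt : Stroke { }
public class SandStroke : Stroke { }
public class PenaltyStroke : Stroke { }

public class Hole
{
    private readonly List<Stroke> strokes = new List<Stroke>();

    public Hole(int number)
    {
        Number = number;
    }

    public int Number { get; private set; }

    public IEnumerable<Stroke> Strokes
    {
        get { return strokes; }
    }

    public void Add(Stroke stroke)
    {
        strokes.Add(stroke);
    }

    // StrokeCount becomes a rollup of the real data rather than the data itself.
    public int StrokeCount
    {
        get { return strokes.Count; }
    }

    // Hypothetical helper: count strokes of one kind, e.g. CountOf<SandStroke>().
    public int CountOf<TStroke>() where TStroke : Stroke
    {
        return strokes.OfType<TStroke>().Count();
    }
}
```

With this shape, the scorecard still reads StrokeCount exactly as before, and a later analytics feature can ask the same Hole for its sand strokes or their timestamps without a redesign.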


An exception to this rule might naturally be  an “extreme” design, such as a high-performance computing system or any system where memory, storage, or performance requirements might require a tailored approach.  Few systems are like this.

The model supports real-world scenarios

Above all, it’s important that the models, classes, and entities you come up with are usable.   They should work together easily and make it difficult for developers to screw up.   A lot of this happens naturally if you have composed your entities according to the “looks like the real world” directive above.   Developers can interact with your model in a straightforward way to access and use entities in the system.

For instance, we’d certainly be happy to see a clear model like this, which every developer is sure to understand:


Avoid creating “leaky abstractions,” where the user must understand underlying implementation details or “gotchas” in order to use your object model correctly.   The developer’s mental image of what’s happening must match what the objects are doing in software;  if there’s a disparity, the developer doesn’t really understand what’s going on.

Conclusion

Model your designs after the real world, and your solutions will be easy to create, easy to use, and easy to test.  By not inventing an alternate world in software, the team will have a ready common language of objects between them, with an existing understanding of how they interrelate and function.  In addition, other stakeholders such as business analysts and testers will find it easier to relate to the system.

***

More reading

  • Joel on Software: The Law of Leaky Abstractions
    Abstractions “leak” when you have to understand the underlying implementation details in order to use them.
  • “Domain-Driven Design,” Eric Evans (Amazon.com)
    The book that started it all: four stars, well-loved by its readers. 
    “Readers learn how to use a domain model to make a complex development effort more focused and dynamic. A core of best practices and standard patterns provides a common language for the development team.”
  • Domain-driven design (Wikipedia.com)
    “Domain-driven design (DDD) is an approach to developing software for complex needs by deeply connecting the implementation to an evolving model of the core business concepts”
  • The Secret to Designing an Intuitive UX: Match the Mental Model to the Conceptual Model (UXMag.com)
    An analog in user interface design – again, the paradigm of matching existing mental models. 
    “The secret to designing an intuitive user experience is making sure that the conceptual model of your product matches, as much as possible, the mental models of your users”

Better UX: application workspaces

Lately I’ve had some questions and thoughts about software usability, specifically in the area of desktop software personalization:   why do I have to care so much about where I compute?

  • If I’m writing a book, why can’t I just sit down at any of my computers, open it in Microsoft Word, and work on it? 
  • Why are all my browser bookmarks and favorite sites different on each computer I log into?   If I install Firefox – or Internet Explorer — on a new computer, why do I have to go through and re-set all my favorite options and settings?  
  • When I install my favorite blogging software (Windows Live Writer), why do I have to re-create all of my blog accounts on the new computer?  And re-import a copy of my glossary (quick links)?  Why can’t I just “log in” and have everything set up the same, on every computer? 

And so on, down a long list of applications.  Why do we have to care so much about the where in computing, so often? 

Looking backwards

I think it’s part cultural artifact, part technological artifact, and part “chocolate ice-cream factor.”  The cultural aspect is that it’s always been done this way: why change?  The technological aspect is we simply don’t have well established frameworks and programming paradigms to provide usable identity-based support in our apps, especially desktop apps (“rich clients”).   The chocolate ice-cream factor is that people don’t realize how much easier it would make computing in their everyday lives.

Chocolate ice cream factor: when people turn down choices simply because they don’t realize how really good it can be – because they’ve never had it before; the status quo is good enough. Chocolate mixed with ice…  and cream?  Yeccch.

I coined this term in the late 1980s after University Microfilms, where I worked as a software lead, spent $70,000 on a study by a high-falutin’ strategy consulting firm in Boston to tell them that no, people didn’t really want to read magazines on computers in gorgeous high-resolution, but “were fine with” a plain text transcription only (the existing format).  Eventually common sense prevailed… but still.

“If I had asked them what they wanted, they would have said ‘Faster horses.’ ”
— Henry Ford

So I say to the product management visionaries, CTOs, and chief software architects of the land:  provide personalized workspaces in your applications. Don’t do it for me; do it for your users, who will thank you a million times over.

And do it to be competitive with web applications, which are rapidly eating your desktop lunch.

“In other words, the desktop is simply the means by which a user loads a browser. It’s a gateway. The value is not in the desktop anymore. It’s in the browser, which is the new desktop, in terms of real functionality delivered.”
Google competes for the future; Microsoft, the past (CNET, October 2009)

Next-gen Windows?

Microsoft could bestow this vision on the world by designing an application-based “virtualizable workspace” – say, in a next-generation version of Windows – which would transparently virtualize your documents.

Imagine logging on to any Windows 2012 desktop in the world and having Windows customized exactly like your personal home desktop theme!  Not to mention having your documents securely and transparently mirrored on the desktop and in your document folders.  If workspace-enabled client applications are installed on the local machine (FireFox, QuickBooks, whatever), they would behave exactly like they did on your home desktop.  Awesome!

OK, you can put the top of your head back on now.  Tell me this wouldn’t help Microsoft in the fight against the browser invasion…

Or Microsoft could innovate in the .NET framework arena, and marry user information (an authenticated user profile) with a filesystem proxy which writes to “a cloud.”  (See code snippet below.) No more local, file-based filesystem — or only by special request, via special API, or by advanced users.  For existing legacy apps, it might even be pretty easy to retrofit them into this new storage paradigm.

Aside: I recently saw Microsoft’s introduction of Windows Live Mesh.  Now this is a particularly boneheaded example of Microsoft’s technobfuscation of useful features.    (I just made that word up especially for WLM.)  First of all, I, a 30-year veteran of technology, cannot understand without drilling into it what the hell is going on and what I really have to do to use it.  It’s all hidden behind marketbabble and hype.  Mesh??  I don’t need a mesh – I need my files and folders!  For a clean, clear-headed rendition of filesharing that really works, see Dropbox.com (watch the excellent 2 min intro video).

For the record, a primary definition of the word mesh is “something that snares or entraps.”  Nice.

I haven’t studied how Apple’s iOS conceptualizes application storage, but I wouldn’t be surprised if theirs is pretty close to the right way.  One thing is for sure: users of iPhones and iPads never have to worry about a “filesystem.”

Microsoft, are you listening?

Design notes

Let’s think for a brief minute about how this vision might be realized, at the level of software design and architecture.

Personalization is enabled by having an authenticated identity: at the very least, a currently logged-in user profile.

The application workspace is the stuff the user’s working on, whatever is produced by the application:  files, projects, documents, etc.  A “workspace” is an easily graspable paradigm for the user – read: better UX — as well as an architectural abstraction that works extremely well in software.

Imagine the joy of “logging into” any Microsoft Word installation, on any internet-connected PC, to find that Word has instantly configured itself with all your favorite options, toolbars, and settings!  Priceless.

Implemented properly in code, the workspace as a storage provider abstraction completely isolates and protects the application from the physical details of where the documents and settings are stored.

At the same time, the workspace concept opens up a world of flexibility: the user’s workspace can be stored in “the cloud” – pick one – or via a central corporate web service, in a database, or even on a local filesystem if needed.

In fact, it makes offline workspace access a snap, since a local workspace can be easily synced with a central master as needed.   It makes backup a model of simplicity, since the workspace is the unit of backup, not a whole bunch of disparate, messy files & settings.  Sigh…  good design makes everything easy.

But the net effect is that the design lets the user work efficiently wherever the app is installed,  just like the incredibly successful model of a web application.  I can’t help but think this would be a significant competitive advantage in today’s marketplace.

FireFox, are you listening?

Now that’s a workspace

Executing a framework strategy

The problem with implementing personalization and location-independence in desktop apps is in the framework – or specifically, the almost complete lack of it.  There just aren’t built-in facilities to either Windows or .NET to do this kind of stuff.

The Microsoft .NET stack, as wonderful as it is, provides no additional built-in storage abstractions beyond those of the Windows file and folder.  Instead, each Microsoft developer or team must venture into the wilds of software-land and cobble together a custom implementation of the personalization and workspace concepts, involving both an identity model and a remote storage provider model.

For one, it can be very expensive.  From my experience, it’s possible only in more well funded teams – that is, if Management in its infinite wisdom even decided it should be done in the first place.  Number two, the level of complexity in design and implementation is such that only the most technically elite teams could pull it off well, while maintaining commitments to project schedule.

For those companies and teams that can pull it off, they will reap the benefits of a “wow” factor from their users.  In addition, with the proper architectural guidance, a single implementation can be leveraged across an entire product portfolio.

It’s a compelling vision.

Some example code

A simple little example, where an “AppWorkspace” framework-style class implements .NET-compatible replacements for file and folder manipulation.  I imagine an authenticated user profile, attached to the current thread context, would be obtained by any of the available services out there, from Windows Live to OpenID.

// current practice = locally bound files
// (File is a static class; File.Open returns a FileStream)
using (FileStream f1 = File.Open(@"C:\MyStuff\mystuff.doc", FileMode.Create))
{
    // write user stuff to local file
}

// proposed concept: location-independence – no “C:”
// (AppWorkspace is the imagined framework class, not an existing .NET API)
using (Stream f2 = AppWorkspace.File.Open(@"mystuff.doc", FileMode.Create))
{
    // write user stuff to omnipresent “workspace”
}
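
To make the idea a little more concrete, here is one way the storage side of such a workspace might be sketched. Everything in it is hypothetical: IWorkspaceStore and LocalFolderStore are names invented for this example, and a local folder stands in for whatever cloud, web service, or database provider would sit behind the abstraction.

```csharp
using System;
using System.IO;

// Hypothetical storage-provider abstraction behind an "AppWorkspace".
public interface IWorkspaceStore
{
    Stream Open(string relativePath, FileMode mode);
}

// One possible provider: a folder on the local disk. A cloud or
// web-service provider would implement the same interface.
public sealed class LocalFolderStore : IWorkspaceStore
{
    private readonly string root;

    public LocalFolderStore(string rootFolder)
    {
        root = Path.GetFullPath(rootFolder).TrimEnd(Path.DirectorySeparatorChar);
    }

    public Stream Open(string relativePath, FileMode mode)
    {
        // Workspace paths are relative to the workspace, never to a drive.
        if (Path.IsPathRooted(relativePath))
        {
            throw new ArgumentException("Workspace paths must be relative.", "relativePath");
        }

        string full = Path.GetFullPath(Path.Combine(root, relativePath));

        // Reject "..\" tricks that would escape the workspace folder.
        if (!full.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
        {
            throw new ArgumentException("Path escapes the workspace.", "relativePath");
        }

        Directory.CreateDirectory(Path.GetDirectoryName(full));
        return File.Open(full, mode);
    }
}
```

The application only ever sees IWorkspaceStore and relative paths, so swapping the local folder for a remote provider would not touch application code.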

Conclusions

Identity-aware, personalized, location-independent computing just plain old helps us get things done.   Because we don’t have to worry about where we’re computing, we – the users — can always focus on the task at hand: the document or blog post, the PowerPoint or the poem.

Today, legacy paradigms still haunt the modern desktop domain.  Until all applications can be effectively written in a browser (HTML 5, anyone?), there will still be a need for locally installed rich desktop applications.   As developers, we should implement truly personalized experiences, freeing the average user from having to think about physical location and – horrors!  a file system.  It’s just the right thing to do – it takes the user experience to a new level. 

And in case there’s anyone alive today who does not understand that the new game in town is user experience (UX), then I have three words for you:  Apple Computer, Inc.   So then it’s decided:  let’s allow our users to log into any Internet-connected computer, and have their applications behave exactly as they did on the home machine.

Now that’s what I would call progress.

***

More reading and references

  • Dropbox.com.  Google-esque in its clarity and simplicity.  And it just works beautifully!  I use Dropbox to sync files between about four or five PCs.   Best of all, it’s free!
    “Dropbox allows you to sync your files online and across your computers automatically.”
  • Windows Live Mesh.  Microsoft’s new offering.  Confusing to me – if anyone has an experience to relate, please share it with us.
    “Working on one computer, but need a program from another? No problem. Use Live Mesh to connect to your other computer and access its desktop as if you were sitting right in front of it.”
  • How "Windows Live" Is Obscuring Some Actually Good Products (LifeHacker, October 2010)
    “Microsoft has some cool products hiding behind ridiculously confusing names. Users of the very nifty  Live Mesh file and desktop syncing beta, for example, were told their service shuts down in March 2011. Where should they migrate? Windows Live Mesh, of course.   
    “Microsoft is emailing Live Mesh beta users and explaining how they can transition their files to Windows Live Mesh. It involves not a small bit of re-configuring the folders you want to sync, the settings you’d like for each synced folders, and waiting while your folders all move over to the new Windows Live Mesh servers….
    “But it’s the naming, and duality of names, that puts people off – people including your Lifehacker editors, if I do presume to speak for most of us. The fact that somebody pulled the trigger on a mass user email saying, essentially, "Live Mesh is dead, so use Windows Live Mesh" is pretty astounding. To then require that users pull off what amounts to a manual transition of folders they wanted to set-and-forget for syncing is just salt in the weirdly worded wound.”

Afterthoughts: design practices

In case there’s anybody still reading:  you have too much time on your hands, get a job!  Here are a few off-the-top-of-my-head thoughts about how a team might go about designing a virtualization solution.  It’s not rocket science, but… 

A core best practice that I always try to follow in new designs and architectures is: don’t reinvent reality.   Software designs and architectures are best when they reflect the way things really are.   If you’re in the banking vertical, your domain of interest includes people, tellers, accountants, accounts, deposits, withdrawals, and so on.   It’s easy to express a banking solution in the same language;  in fact it’s a hell of a lot easier, since you the architect and your posse of developers didn’t have to come up with a whole new set of objects – which is what most teams do, more or less. 

No, in good design, types of items in real life typically map directly to classes in code-land.    Things that can happen in real life become methods in code.  It simplifies life for architects, developers, and testers:  basically anyone who has to work with code.   Everyone understands the code structure and its classes, because they’ve seen them all before – in “real life.”
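
To make that mapping concrete, here is a minimal sketch in Python (picked only for brevity). The `Account` class, its `deposit` and `withdraw` methods, and the `InsufficientFunds` error are hypothetical names chosen to mirror the banking vocabulary above; they are an illustration, not code from any real system.

```python
from dataclasses import dataclass, field
from decimal import Decimal


class InsufficientFunds(Exception):
    """Raised when a withdrawal would overdraw the account."""


@dataclass
class Account:
    """A real-world thing (an account) becomes a class."""
    owner: str
    balance: Decimal = Decimal("0")
    history: list = field(default_factory=list)

    def deposit(self, amount: Decimal) -> None:
        """A real-world event (a deposit) becomes a method."""
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount
        self.history.append(("deposit", amount))

    def withdraw(self, amount: Decimal) -> None:
        """Another real-world event, with the rule a teller would apply."""
        if amount <= 0:
            raise ValueError("withdrawal must be positive")
        if amount > self.balance:
            raise InsufficientFunds(f"{self.owner} has only {self.balance}")
        self.balance -= amount
        self.history.append(("withdrawal", amount))


account = Account(owner="Jane")
account.deposit(Decimal("100"))
account.withdraw(Decimal("30"))
print(account.balance)  # 70
```

Anyone who has used a bank can read this code without a glossary, which is the whole point.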

In the case of application workspaces and personalization of desktop applications, “real life” says the user is truly one and the same person, no matter where she sits down (logs in).  The “identity-aware application” might reflect this by having a current user profile.

Likewise, the user has – in general – one set of files and documents they’re working with:  this is the user workspace.   The workspace proxies the Windows filesystem, looking exactly like it, and enabling centralized, one-stop storage and retrieval of user information (profile), preferences and configuration settings, and any documents or content items that the user was involved in creating.    Like web apps, it would seem that all file paths would have to be “root-relative,” and not refer to a specific physical directory; an exception would be thrown otherwise.
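
The root-relative rule could look something like the following sketch. Everything here is an assumption made for illustration: the `resolve_in_workspace` function, the `WorkspacePathError` exception, and the use of POSIX-style paths (a real Windows implementation would also need to handle drive letters and UNC paths).

```python
from pathlib import PurePosixPath


class WorkspacePathError(ValueError):
    """Raised when a path is not relative to the workspace root."""


def resolve_in_workspace(workspace_root: str, relative_path: str) -> PurePosixPath:
    """Map a root-relative path onto a user's workspace root.

    Absolute paths, and paths that climb out of the workspace with "..",
    are rejected with an exception instead of being resolved.
    """
    candidate = PurePosixPath(relative_path)
    if candidate.is_absolute() or ".." in candidate.parts:
        raise WorkspacePathError(f"not root-relative: {relative_path!r}")
    return PurePosixPath(workspace_root) / candidate


print(resolve_in_workspace("/workspaces/jane", "Documents/report.docx"))
# /workspaces/jane/Documents/report.docx
```

Because the application only ever sees the relative part, the same document reference works no matter which machine the user logs in from.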

ASP.NET’s dirty little secret

It’s no wonder most developers are confused about Microsoft’s ASP.NET. Although it was released over eight years ago in January 2002, there are few if any coherent explanations of the ASP.NET page life cycle. This is unfortunate, as it’s probably the most critical aspect for web developers to understand.

The Microsoft documentation is almost worthless (see below), and strangely enough, books and blogs give it only a cursory treatment, without really explaining how best to use it – and why.   The net effect is a huge number of developers who struggle to make their ASP.NET applications work.

It’s ASP.NET’s dirty little secret: very few developers really understand the page life cycle and the page model, and there’s little meaningful guidance to be found.   From a Microsoft product/program perspective, this seems like a monumentally costly mistake: developer guidance – or lack thereof – has a huge impact on the success of the entire product ecosystem, from Microsoft to developers to clients to end-users.

In a future post, I’ll offer up my own take on the ASP.NET page life cycle, with metaphors that developers can relate to.

When documentation isn’t guidance…

Sure, there’s plenty of documentation: reams and reams of it. There are a thousand articles which offer functional, mechanical, and temporal descriptions of the page life cycle and its ten events.

But there’s little that makes sense to the everyday developer.  There is no real guidance, no best practices for page architecture that tell us how a well-designed page should be factored.  There is plenty explaining the order of events;  but little explaining their meaning.

I’m not sure why Microsoft let this happen: ASP.NET is a great framework.  It’s hard to imagine what the development costs associated with this must be across the industry.  You think that Deepwater Horizon spill is big?  This one’s been leaking since January 2002!  One can only imagine the millions of developer hours spent debugging issues related to an imperfect understanding of the page lifecycle.   I know I’ve spent my share.

A hammer and a saw does not a carpenter make.
— Anon.

Backstory

I’m doing architectural reviews of a couple of ASP.NET projects for my client.  These sites are regular, relatively low-volume, get-things-done business apps:  a lot of forms-based data entry, forms authentication, a healthy dash of reporting, some middle-tier WCF services, and a handful of back-end databases.  The “meat and potatoes” of ASP.NET, as it were.

While the sites actually work pretty well, the ASP.NET code is a mess. (I’m sorry, I couldn’t come up with any nicer way to say it.)  It looks like it was written by developers who had success writing basic ASP.NET forms – like most of us, once upon a time — but then everything went to hell in a handbasket in the seething cauldron of project schedules, nested user controls and AJAX postbacks.

Personal aside: when I first began doing ASP.NET around 2002, I had no clue. I had a simplistic, relatively inaccurate conceptual model of the page lifecycle,  limited primarily to a dysfunctional love-hate relationship with the Load and PreRender events.   I certainly didn’t have anywhere near a full understanding of viewstate, databinding, or the postback model.  Like most of the developers out there, I suspect, I learned just enough to get by – and got myself into a heap of trouble as the pages I was building became more complex.

“Lifecycle not optional”

The fact is this: without a thorough understanding of the page lifecycle, the ASP.NET developer is doomed to a long and rocky road. The same may be said for the site’s end users, who will have to deal with bugs and erratic behavior.

The lifecycle is central to the page developer’s world:  it weaves together page creation, a hierarchy of controls, viewstate, postback, eventing, validation, rendering, and ultimately, death – against the backdrop of the page model and the HTTP application pipeline, which fuse the protocol of the HTTP request with the mechanism of rendering an HTML response.

Misconceptions about core concepts like the page lifecycle cause a huge number of problems.  When a page becomes a little more complex than a basic form – a few user controls here, some databinding there – things start to break.  Under schedule pressure, developers start relying on oh-so-clever software hackery — which in the best scenario, actually makes things work reasonably well.   The best result, unfortunately, is a hacked-up codebase that only works “reasonably well.”

It is important for you to understand the page life cycle so that you can write code at the appropriate life-cycle stage for the effect you intend.
ASP.NET Page Life Cycle Overview (MSDN)

The control life cycle is the most important architectural concept that you need to understand as a control developer.
Developing Microsoft ASP.NET Server Controls and Components (Microsoft Press)

Back to the client

In my client’s code, I consistently found treatments of the page Load and PreRender events that showed a lack of understanding of the life cycle.   Sometimes everything was done on Load, and sometimes on PreRender;  sometimes postback conditions were respected, and other times, not so much.

The code was also guilty of minor abuse of session state (using it as the backing store for user controls), occasional major violence to user controls (page reaching in, control reaching out — a violation of encapsulation), and a near-complete aversion to object-oriented design (primarily a procedural approach).

Some of this, like using session state to back user controls, I believe was due to the fact that the developers didn’t understand the life cycle:  how the page databinds its controls, in what order it all happens (top-down or bottom-up?), and how data flows on the page throughout the life cycle.  In this case, using session state worked — for one control on one page, in one window — so it shipped.

This would all be duly noted in the architectural report.  But to be most helpful, I really wanted to give the client team some rock-solid guidance and best practices for developing ASP.NET web forms:  what should one do to write clear, stable, maintainable ASP.NET applications?  What kind of code should go in Init, Load, and PreRender? How to explain clearly to the developers what the events really mean?

I started off thinking that compiling best practices would be a reasonably simple affair involving a few web searches.

The search…

I was looking specifically for ASP.NET “best practices” related to the page lifecycle that would address questions like these:

  • What key concepts should developers keep in mind when designing ASP.NET pages to work in harmony with the page lifecycle?
  • How should a developer best structure the page to address common scenarios, such as a form which accepts input from its users?
  • What kinds of things should be done during the page Load event?  In the Init event?  In PreRender?    What are some guidelines for deciding when to put what types of code in Load vs. PreRender? Init vs. Load? Etc.
  • What kinds of things should be done – or not done – during postback (IsPostBack == true), especially in specific, basic scenarios – like that standard data input form?
  • What key concepts should developers keep in mind when using controls throughout the page lifecycle?  E.g. what are the responsibilities of the page or parent control vs. those of the child control?

Surely there was a site that would distill “The Tao of ASP.NET” or somesuch for the poor developer; something that would convey a clear, coherent conceptual model.

The emperor has no clothes

What I found first was the MSDN documentation from Microsoft on the ASP.NET Life Cycle.   This might charitably be described as mind-numbingly functional. Can I get a witness??

From the official ASP.NET Page Life Cycle Overview on MSDN:

Use this event to read or initialize control properties.
— Explaining the page Init event

Use the OnLoad(EventArgs) event method to set properties in controls and to establish database connections.
— Explaining the “typical use” of the page Load event

If the request is a postback, control event handlers are called.
— On the postback event handling phase

That’s it?  In Load, we should “set properties in controls and establish database connections?”  To me, this is of shockingly little value.  At this point I was thinking, wow, if this is the official guidance, it’s no wonder most developers don’t really understand the ASP.NET page life cycle.

It’s really hard to believe that this is the extent of Microsoft’s offering on such an important subject to its developers.   Don’t they understand what it means not only for Microsoft and its image, but also developers and their clients, the users… the whole ASP.NET ecosystem?

Sigh.

Order is not the whole of knowledge

Outside the official Microsoft channel, I found scores and scores of sites which catalogued the various lifecycle events, many of them replaying the MSDN documentation without much additional detail.  Site after site described  the page life cycle and its events as if  “to order” were “to know.”

There were few meaningful metaphors for developers to relate to, and little if any guidance as to how to actually use the events in a practical manner.  Witness again:

Objects take true form during the Load event.
— “The ASP.NET Page Life Cycle,” 15seconds.com, in an apparent fit of Buddhistic nihilism

The Pre_Render [sic] event is triggered next and you can perform any update operations that you require prior to saving the View State and rendering of the web page content.
— “Understanding the ASP.NET Page Life Cycle,”  aspnet101.com

PreRender is often described as “the last chance to do something that affects viewstate.” Am I the only one who doesn’t find this particularly helpful?

Personal aside: I didn’t fully understand viewstate until recently, when I read Dave Reed’s epic, thorough, and now-classic treatment in Truly Understanding Viewstate.   If you want to be an ASP.NET pro, read this excellent post.

It’s a gusher…  a leaky abstraction

After many hours across days of searching, and hundreds of articles read, I found few clues.  I did find grim evidence in blogs and forums that what I was after might not exist:  a large number of developers expressing frustration with how ASP.NET works:

Let me tell you all that if you want to work from home and be a freelance web developer, ASP.NET will stress your life out and you will die young.
— From a multi-year discussion on the post “ASP.NET Sucks” by Oliver Brown

WebForms is a lie. It’s abstraction wrapped in deception covered in lie sauce presented on a plate full of diversion and sleight of hand.
— “Problems or Flaws of the ASP.NET postback model,” quoting Rob Conery in “You Should Learn MVC,” channelling Will Ferrell

Are there any best practices or recommendations around using "Page_PreRender" vs "Page_Load"?
— Max, a lonely voice in the wilderness,  on the thread “Best Practices around PreRender and Load”  (July 2007)

“WebForms is a lie! It’s abstraction wrapped in deception covered in lie sauce presented on a plate full of diversion and sleight of hand!”

Finally, from respected veteran blogger and ASP.NET consultant Rick Strahl:

It can sometimes get very difficult to coordinate the event sequence for data binding, rendering, and setup of the various controls at the correct time in the page cycle. Do you load data in the Init, Load or PreRender events or do you assign values during postback events?
— “What’s Ailing ASP.NET Web Forms,” Rick Strahl (Nov 2007), citing complexity in the page lifecycle as a major ASP.NET issue

If it’s “very difficult” for an ASP.NET guru, what hope is there for Joe and Jane ASP.NET Developer?

Amidst the sea of holes…

In the middle of all of this I did find this gem of conceptual guidance:

Web Forms is framed as "page-down" instead of "control-up"… From the perspective of a control, databinding looks like "someone who is responsible for me, because they contain me, will set my data source and tell me when to bind to it. They know what I am supposed to do and when, *not* me. I am but a tool in the master plan of a bigger entity."

From the perspective of a page, databinding looks like: "I am the biggest entity on this page. I will determine relevant data, set data sources on my children, and bind them. They will in turn set data sources on their children and bind them, and so forth. I kick off this entire process but am only responsible for the first logical level of it."
Bryan Watts in a reply on Rick Strahl’s blog, offering a conceptual model of page-control interaction

Brilliant!!  Watts’s Law of the page-control interaction. Finally, a conceptual model that a developer can relate to and understand and, most importantly, one that helps teach effective page and control design.  This simple guidance conveys:

  1. How the control should be structured to work with its parent (the page will set my data source and tell me when to bind to it); and
  2. How the page should be structured to work with a child control (I will set the data source of the control and tell it when to bind)
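
The two perspectives above can be sketched in a few lines. This is a language-neutral model written in Python, not ASP.NET code: the `Control` and `Page` classes, the `select_for` rule, and the nested-dictionary data are all assumptions made for illustration.

```python
class Control:
    """A control that never fetches its own data; its parent hands it over."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.data_source = None
        self.bound = None

    def data_bind(self):
        # Bind to whatever my parent gave me...
        self.bound = self.data_source
        # ...then take responsibility for my own first level of children only.
        for child in self.children:
            child.data_source = self.select_for(child)
            child.data_bind()

    def select_for(self, child):
        # Hypothetical rule: each child gets the slice of data named after it.
        return (self.data_source or {}).get(child.name)


class Page(Control):
    """The biggest entity: it determines the data and kicks off binding."""

    def load(self, data):
        self.data_source = data
        self.data_bind()


city = Control("city")
address = Control("address", [city])
page = Page("page", [Control("name"), address])
page.load({"name": "Jane", "address": {"city": "Boston"}})
print(city.bound)  # Boston
```

Note that `city` never reaches up to the page and the page never reaches down past `address`: each level sets the data source of its direct children and tells them when to bind.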

This was the only real ray of light so far.  I forged on, throwing hail-Mary terms like “zen” and “tao” into my search.  It seemed like the more I read, the less I understood.

This was partly because as I delved further I realized there were aspects of the page model and lifecycle that I didn’t truly understand, either.

The great white (paper) hope

Undaunted, I renewed my subscription to Safari Books Online, publishing home to some of the world’s finest technical authors.  If the answer was anywhere, surely it would be here. I lapsed briefly into optimism:  maybe the blogosphere had eschewed treatment of this topic because the publishing world had illuminated it so thoroughly?

Not so much. Most books, while fantastic learning resources – far better for learning than reading random articles on the web –  spilled precious little ink on the page life cycle, rushing on to more exciting topics like databinding and the GridView control.

ASP.NET 3.5 Unleashed treats the life cycle almost dismissively:

Ninety-nine percent of the time, you won’t handle any of these events except for the Load and the PreRender events. The difference between these two events is that the Load event happens before any control events and the PreRender event happens after any control events.
ASP.NET 3.5 Unleashed, “Handling Page Events”

Jesse Liberty’s Programming ASP.NET 3.5 presents a great life cycle flowchart: it has a lot of order and flow information, but little guidance.  A page or so later, and the lifecycle is dragged outside and summarily dispatched in the following fashion:

The load process is completed, control event handlers are run, and page validation occurs.
Programming ASP.NET 3.5, “Website Fundamentals: Life Cycle”

The excellent book Developing Microsoft ASP.NET Server Controls and Components (Kothari/Datye, Microsoft Press) is a must-read for any serious ASP.NET architect or control developer.  But they, too, shy away from the concerns of guidance:

In this phase, you should implement any work that your control needs to do before it is rendered, by overriding the OnPreRender method.
Developing Microsoft ASP.NET Server Controls and Components, “Chapter 9, Control Life Cycle”

From Programming ASP.NET 3.5: running before walking

As I read each book’s treatment of the life cycle, I kept thinking, how can you move beyond the page life cycle when you haven’t described what we’re supposed to do with it?

Conclusion

What do the Init, Load, and PreRender events really mean? How should developers architect pages to address common scenarios? Where are the workable models and metaphors?   The order of page events is only one aspect of their meaning. And explaining events in terms of their interaction with viewstate, postback, or the control tree is most useful only when one thoroughly understands those concepts.

I am a big proponent of software designs and user interfaces that follow natural conceptual models: it’s a win-win for developers and all stakeholders across the entire software development lifecycle.    See my post on metaphors in software design.

In the end, it took a very long time and a lot of dogged persistence for me, an experienced software architect, to feel like I thoroughly understood the ASP.NET page model and its lifecycle:  the “bottom-up” of control initialization vs. the “top-down” of control loading; the meaning of the Init, Load, and PreRender events (and specifically what kind of code I would use in them, when, and why); and how viewstate and postback really work.
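
For readers who have not seen the “bottom-up” vs. “top-down” distinction spelled out, here is a toy model of it. It is a Python sketch of the ordering only, not ASP.NET code; the `Node` class and the control names are made up for illustration.

```python
class Node:
    """A stand-in for a page or control in the control tree."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def init(self, log):
        # Bottom-up: children initialize before their parent.
        for child in self.children:
            child.init(log)
        log.append(f"Init {self.name}")

    def load(self, log):
        # Top-down: the parent loads before its children.
        log.append(f"Load {self.name}")
        for child in self.children:
            child.load(log)


page = Node("Page", [Node("UserControl", [Node("TextBox")])])
log = []
page.init(log)
page.load(log)
print(log)
# ['Init TextBox', 'Init UserControl', 'Init Page',
#  'Load Page', 'Load UserControl', 'Load TextBox']
```

The practical consequence: by the time the page’s own Init runs, its children already exist and are initialized, and by the time a child’s Load runs, its parent has already had the chance to hand it data.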

Along the way, I realized why a lot of developers have issues with ASP.NET.  It’s ASP.NET’s dirty little secret: very few developers really understand the page lifecycle and the page model – and very little “real” documentation exists for it.

From a Microsoft product/program perspective, this seems like a monumentally costly mistake. On my Googling travels through ASP.NET forums and in blog post comments, I “met” tons of developers who were struggling with ASP.NET – and who are still struggling with it.  It’s pretty congruent with my own experience, and with that of the ASP.NET developers I’ve known in person.

As far as I’m concerned, the best architectures are those that are almost impossible to screw up.  Despite its power and flexibility, ASP.NET falls short on that account.

As far as Microsoft goes, devoting time, care, and energy to make sure developers have crystal-clear guidance represents a huge opportunity for the success of the product ecosystem. Imagine how many millions of corporate development hours have been spent since January 2002 debugging ASP.NET apps because a developer didn’t understand the page lifecycle, viewstate, postback, or Watts’s Law of Databinding.

Even worse, how many clients and end-users were frustrated because their Microsoft-based web apps didn’t work properly? You can see how this issue could have disastrous implications for the ecosystem.  As a long-time Microsoft fan, I hope they make sure this never happens again.

Coming soon: in my next  post, I’ll offer up my “three dualities of ASP.NET,” clear metaphors that will enable ASP.NET developers to understand the lifecycle and write complex applications that work well.

***

More Reading

  • Truly Understanding ViewState, Dave Reed (Infinities Loop)
    A comprehensive and humorous look at the uses and mis-uses of viewstate; a definitive resource and must-read for the professional ASP.NET developer:
    “It’s not that there’s no good information out there about ViewState, it’s just all of them seem to be lacking something, and that is contributing to the community’s overall confusion about ViewState.”
  • Understanding ASP.NET ViewState, Scott Mitchell (4GuysFromRolla.com)
    A great starter article.
  • The ASP.NET Page Object Model, Dino Esposito (MSDN)
    Another excellent article from a top-notch writer…  but shy of guidance.
  • What’s Ailing ASP.NET Web Forms, Rick Strahl (West-Wind.com)
    A great post with pros and cons of ASP.NET and a comparison to MVC.
    “Microsoft built a very complex engine that has many side effects in the Page pipeline.”
  • 10 Reasons Why ASP.NET WebForms Suck, JD Conley (“Starin’ at the Wall 2.0”)
    Like a true mockumentary, this dial actually goes up to 11.  What gets that honor?
    “Number 11: The odd feeling that you have to beat the framework into submission to get it to do what you want.”
  • Page Load is Evil, Wayne Barnett
    I can’t really say I agree. But this post is accompanied by a deliciously raging discussion that boils over into near-flame-war at the end.  Unfortunately with no pat answers to what to do in Load or PreRender or Init, for that matter.
    “It seems to me that the root problem is that the page life cycle isn’t given the respect it deserves in tutorials and documentation.” – Anonymous poster
    “I would desperately like to see more on this subject [of page life cycle]!  … I’ve tried reading up on this before, but from a designer/programmer’s point of view it’s sometimes difficult to get my head around the order of events or their meanings.” – Chris Ward, followup poster

Design Q&A: usage quotas in WCF services

Question

I’ve got a WCF service on which I want to restrict the number of calls per hour – on a per-user basis. For example, max 1000 calls per user, per hour (a la Google Maps, etc).   I also want to implement some sort of subscription mechanism, so that users can upgrade their call-limit across various ‘price plans’.

I know that I could achieve this with a custom Inspector, backed by a DB containing some sort of ‘subscription’ table and a counter, but I’d like to avoid reinventing the wheel.

Thanks, Eric

Answer

I don’t know if there are any off-the-shelf packages to do this (anyone listening? could be an opportunity!), but here are my quick thoughts on the issue:

  1. Your requirement is “within the last hour” – let’s say “time period” instead of hour, since that can be changed easily. You’ll have to keep track of all the calls by that user within the time period, as well as have some kind of mechanism to roll off or archive this data. If you’re storing in a database, this can be a significant performance issue, depending on your database, the number of users, the number of calls made per time period, the “weight” of your service methods (i.e. amount of work done), etc.

    It’s pretty easy to design a generic interface that will let you splice in caching if you need it — but you will also want to track the total time spent retrieving API/service limit info, to make sure your usage quota enforcement isn’t slowing down your service too much.

  2. Partition the quota-limited functionality at the service level if possible – not the individual operation or method. If you can make the limits apply to use of an entire service and not to just specific or individual methods, everything will be easier: the code, the tracking, the user’s understanding, etc. In general, that is…
  3. The proper place to intercept & check is not in a message inspector IMHO, but in the OperationInvoker. Install a custom operation invoker via a service-wide behavior, and you will lock down the entire service. In addition, you will have access to post-message-processing info, like the authenticated user name etc. See Skonnard’s article on MSDN “Extending WCF via Behaviors.”
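
To make the bookkeeping in point 1 concrete, here is a minimal in-memory sketch of a per-user, sliding-window quota. It is written in Python as a language-neutral illustration, not as WCF code: the `UsageQuota` class, its `set_plan` and `try_acquire` methods, and the injectable `clock` are all hypothetical names. In a real service, something like `try_acquire` would be called from the custom operation invoker, and the counters would probably live in a database or cache so that they survive restarts.

```python
import threading
import time
from collections import defaultdict, deque


class UsageQuota:
    """Per-user sliding-window call quota, kept in memory."""

    def __init__(self, default_limit, period_seconds, clock=time.monotonic):
        self.default_limit = default_limit
        self.period = period_seconds
        self.clock = clock
        self.limits = {}                 # user -> limit from their price plan
        self.calls = defaultdict(deque)  # user -> timestamps of recent calls
        self.lock = threading.Lock()     # guards both dictionaries

    def set_plan(self, user, limit):
        """Upgrade or downgrade a user's call limit."""
        with self.lock:
            self.limits[user] = limit

    def try_acquire(self, user):
        """Record a call and return True, or return False if over quota."""
        now = self.clock()
        with self.lock:
            recent = self.calls[user]
            # Roll off calls that have aged out of the time period.
            while recent and now - recent[0] >= self.period:
                recent.popleft()
            if len(recent) >= self.limits.get(user, self.default_limit):
                return False
            recent.append(now)
            return True


quota = UsageQuota(default_limit=2, period_seconds=3600)
print([quota.try_acquire("eric") for _ in range(3)])  # [True, True, False]
quota.set_plan("eric", 1000)
print(quota.try_acquire("eric"))  # True
```

A single lock keeps the sketch simple and safe when several threads call in at once; a busy service might prefer one lock per user to reduce contention.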

Hope this is helpful. If you decide to do it yourself, make sure to handle concurrency (multiple threads calling into your service at the same time)!

Keith

The metaphor in software design

One major problem with software today is usability: user interfaces are difficult to use, obscure, confusing, and frustrating.   Why?   Because they do not present consistent metaphors to the users, who don’t understand what the software is doing.

Metaphor, metaphor, on the wall

An interface metaphor represents the user’s conception of what is going on.  UI coherence is achieved if the user’s mental model mirrors what’s going on in the software.

As in life, software interface metaphors should be simple, common, and relevant.

An early example was Intuit’s Quicken bill payment software:  Quicken presented check-writing and bill payment functionality using the metaphor of an ordinary checkbook.

How radical!  Making software in the model of real-world things!  If only more software was made as easy to use as Quicken…

The checkbook and its checks are familiar to all of us — well, most of us.  As a result, we immediately understand what the application is doing and find it highly user-friendly.

Poor design strips context

An impaired metaphor – a poor design – for check-writing might present fields in a table-like layout, labelled “number,” “payee,” “amount,” and “date.”  Think spreadsheet.  (Trust me, it has been done.)

Tabular design: an inappropriate metaphor

In this interface design, almost all the aspects of normal, real-world bill payment  have been stripped out.  The process has been reduced to a simple task of data input.  Bland and tasteless, isn’t it?

So what, you might say.  When writing a check, everyone has to enter the check amount and payee anyway — it’s just a few fields of data.  What’s the big deal about the "check" form and all these fancy “metaphors?”  Seems like a big waste of time to me.

Well… no.   Check-writing is just one aspect of managing finances on a computer.   There are a whole host of other activities that need to be addressed, including managing the register, searching for items, reconciling the “checkbook,” and reports.   We already have a conceptual framework for doing most of these things in the physical world.

What are you, the developer, going to do?  Re-invent the wheel for each concept and activity by providing new, “de-natured” modern-age mechanisms for each — a form, a field, a listbox, a grid?   Or will you create familiar models for your users?

Encapsulating familiar metaphors in self-contained packages – user interface controls, for instance – promotes re-use across the codebase (lowering SDLC costs) and a better, more consistent UI (looking and behaving the same everywhere).

Affordances

Affordance is another key usability concept, discussed by Don Norman in his excellent The Design of Everyday Things. An affordance suggests or directs the appropriate action.

A good example of a design with a prominent affordance is the lowly doorknob.  It’s shaped and sized for the hand to grab comfortably, and it’s placed at a height which makes for easy access to the hand.

Affordance ensures effective use.  The best designs are ones that can’t be misunderstood, and which don’t cause frustration.  The same applies to software design as to user interface design.

Doorknob

Is your design as good as this? 
(Don’t gold-plate it unless you’re asked to…)

Poor design is very expensive

Defining usable metaphors takes time and dedication to the task.  It takes time to model the entities in the application domain – entities the user will already be familiar with, and a requirement for creating a coherent model.  Often, the designers or engineers cave to schedule pressure and use an assortment of loosely coupled, functional designs:  the fields and listboxes and grids mentioned above.  These get the job done, but often at a high cost to UX, code clarity, extensibility, scalability, and maintainability.  

Individually and for certain tasks, these metaphors may be usable and may make sense.   But no system is an island any more, and complexity and other issues may pop up at system boundaries, interactions, and special cases.

As users, we have high expectations for software, and it’s the unexpected failures, the crashes, the lost documents, that lead us to silently loathe software.   Tell me:  who loves Microsoft Word?   Many use it, many respect it, but very few actually love it.

Conclusion

When designing software, ask what metaphors you’re offering to the user.  The key to usability is modeling your user interface after concepts the user understands fully.  As an added benefit, these concepts generally map one-to-one to objects or entities in your software designs and models, so you’re killing at least two birds with one stone.  There are many other benefits, all of which lead to better software in less time.  

Whatever it is you’re building, develop your metaphors fully, then make the software behave consistently with them.  Your users will thank you – and your clients will re-hire you.

Some great examples include:

  • Quicken’s check metaphor. As mentioned above.  More or less everybody could figure out how to use Quicken in 15 minutes.
  • Control panels. A good control panel or dashboard metaphor enables a user to understand what’s happening at a glance.  Everybody understands gauges, warning lights, and switches;  and almost everybody understands the basic semantics of red (problem/critical), yellow (warning), and green (ok).  
  • Print layout view.  Shows documents on the screen as they would look in printed form;  most modern word processors have this feature.
  • The Macintosh trash can icon.   Continues the metaphor of a real desktop; understood and loved by all.  (Yet to be understood:  why Apple’s OS designers violated this metaphor most rudely by requiring users to drag their disks into the trash can to eject them.  Tell me you didn’t think long and hard before you did that the first time.)

***

References / more reading

  • The Secret to Designing an Intuitive UX: Match the Mental Model to the Conceptual Model (UXMag.com, April 2010)
    This is what a metaphor enables…  Excerpt:
    “The secret to designing an intuitive user experience is making sure that the conceptual model of your product matches, as much as possible, the mental models of your users.”
  • The Design of Everyday Things (Donald Norman).   Excellent, fun-to-read treatment by one of the world’s foremost design authorities.  Learn about “affordance” and other tenets of good design, illustrated in everyday objects in the world around you.
  • useit.com (Jakob Nielsen).  A partner of Donald Norman (above) in the Nielsen-Norman Group and also a leading design expert.  Great articles and perspectives on usability.
  • Interface metaphor (Wikipedia).  Actually uses the example of files and folders, too. (I wrote this article before I read the Wiki entry.)
  • Tom Proulx (Wikipedia).  Co-founder of Intuit and passionate user interface designer who pioneered software usability testing.
  • Web Design Trends 2010: Real-Life Metaphors and CSS3 Adaptation (Smashing Magazine).