Archive for the ‘Best Practices’ Category

Visual Studio 2010: automate datestamping of code comments

In this post, I’ll show you how easy it is to create a three-line macro that inserts your name and the current date into the current file – handy for automatically adding a datestamp and/or timestamp with your user info to code comments.

Backstory

If you’re like me, you hate repetitive tasks.   But when I’m reviewing or writing code, it’s a best practice to add my name or initials to the code, along with the current date and time.  This ensures that other team members know when the comment was made: there’s a big difference between a comment that says “fix this” that was added two years ago, and one that was added last week.   And adding my name or initials ensures that someone can always track me down and ask a question.

But to the point – I hate typing in the same text over and over:

// ALERT: this should use CurrentThread.Principal. KWB 2010.11.04

Fortunately, Visual Studio makes it easy to automate this.

Creating a macro in Visual Studio

The macro below inserts the name of the currently logged-in user and the current date. For me at my current client, my domain login (returned by Environment.UserName) is “Keith.Bluestone”:

// ALERT: take me now, Lord!  Keith.Bluestone 11/8/2010

Go to View | Other Windows | Macro Explorer on the Visual Studio menu:

Macro Explorer menu item

In any public module, create a new sub named AddInitialsDateTime – or pick any name you like. The sub should look something like the code below.  I chose to identify my comments with my Windows login name, which ensures consistency within a corporate environment – but you can customize this to include whatever information you like! 

Imports System
Imports EnvDTE
Imports EnvDTE80
Imports EnvDTE90
Imports EnvDTE90a
Imports EnvDTE100
Imports System.Diagnostics

Public Module Module1

    ' Add the name of the currently logged-in user with the current date.
    ' Handy for datestamping code comments.
    Public Sub AddInitialsDateTime()

        ' Grab the current selection in the active document...
        Dim textSelection As EnvDTE.TextSelection
        textSelection = CType(DTE.ActiveDocument.Selection(), EnvDTE.TextSelection)

        ' ...and insert the login name and today's date at the caret.
        textSelection.Insert(" " + Environment.UserName + " " + Date.Now.ToShortDateString())

    End Sub

End Module

Hooking your macro up to a key shortcut

Now that you have entered the macro, you simply need to hook it up to a keyboard shortcut.  Close the Macro editor and open Tools | Options | Environment | Keyboard. Search for commands containing your macro name and choose a key combination;  I chose “Ctrl-D Ctrl-D” because it’s really easy and it’s not used by anything I need.  Make sure you click the “Assign” button to commit your shortcut choice.

Assigning a keyboard shortcut to a macro

Testing the macro

Now go to any code file, enter a comment, and hit Ctrl-D Ctrl-D or whatever shortcut you chose:

// I love automation…   Keith.Bluestone 12/15/2010

Enjoy.


Conceptual coherence in software design

There’s no doubt about it, a good design is hard to find.   But design is critical in software engineering, just as in hardware engineering – bridges, skyscrapers, iPads, anything in the physical world.  Design affects almost every aspect of product development and its costs – and in software, the most costly one is maintenance. 

I always try to take extra time in the design phase to ensure that I’ve got it right, since I find that with a good design, everything is easier – from the start.

How do you know if you have a “good” design?   After all, no one in the world can sum up the essentials of good design in a meaningful way for everyone.  And rather than empty academic prescriptions — “flexible, layered, blah blah blah” — I tend to use a series of acid tests whenever I’m coming up with a software design.  For instance:

  • Does the design meet the needs of the project?
  • Is it flexible and extensible enough to meet the likely needs of the business moving forward?
  • Can the development team pull it off easily to meet committed deadlines?
  • Is it conceptually coherent?  Would someone easily understand it, or would it be an effort to explain? 

There are others, too; but in this post what I want to talk about is near and dear to my heart: whether a design is conceptually coherent, and whether it affords easy use. 

Conceptual coherence

Sounds fancy, but what this really means is: “does it make sense?”  Do the concepts “stick together?”  (From the Latin co-, together, + haerēre, to stick.)   If you’ve factored your design into classes and entities, do they make sense at a high level?  Can the classes, components, and services in your design be easily understood and used by other developers?  Or do developers have to understand and master details that should really be hidden from their sphere of concern?

A good design will make it easy to create robust, working software:

  • Classes and components map directly to real-world entities
  • Object model is easy to use and supports real-world scenarios (containment, iteration, etc.)

Let’s look at each one of these aspects in turn.

Classes and components map to real-world entities

The classes and entities you create in your design should map almost 1:1 to real-world entities.  Name your classes and entities in terms that end-users understand; use vocabulary familiar in the solution domain.  We programmers are generally such a smart lot that we tend to invent alternate realities, populated with fantastic devices and widgets.  Sometimes we lose sight of the real one – and then we eventually have to explain ours to someone else in order for them to use it.  Avoid this like the plague!  Keep it simple: classes and components should model “things” in the real world.

An astute reader pointed out that this is very similar to the concepts of domain-driven design, which is true:  see additional links at the end.

Say you’re developing the My Golf Genius application, which will allow golfers to track their play on courses across the world, and will use massively parallel service-oriented clustered computing architectures in the cloud to apply brilliant heuristic analyses to captured golf rounds, bettering the game of the average Joe by over 8 strokes! 

Your classes, objects, and entities should model real-world things like golfers, clubs, courses, holes, tees, strokes, handicaps, and so on.   Even if the implementation seems trivial, create and keep entities separate if that’s the way we think about them in the real world.   In code, classes and objects are super-cheap to create and manage these days – in compilation, memory footprint, and runtime alike.  The benefits of conceptual clarity far outweigh the savings from any shortcuts. 

For example, our My Golf Genius application architect needs to keep track of the number of strokes per hole during play;  he could reasonably store the number of strokes as an integer StrokeCount in a MyGolfApp.Hole class.  After all, it is a “count” – and counts are integers or longs, right?  It’s design overkill to create a whole object to represent each stroke, right??


Not so fast.  What happens when you need to extend your successful My Golf Genius application to provide stats on how your success in sand saves correlates with the time of day, the weather, and the stock market trends that day?  You don’t have this info:  you only have “StrokeCount” for each hole.   The problem is that “StrokeCount” is really a rollup of the real data. The solution is, of course, to have a list of Stroke objects, each of which can be any legal type of stroke, such as a PenaltyStroke or a SandStroke.  You can even add Drives and Putts – which, of course, derive from MyGolfGenius.Stroke.  

If the architect of My Golf Genius had started out simple, he would have captured the reasonably relevant details of what really happens on the golf course: a series of strokes are taken on each hole by a golfer.  Extending the application to capture stroke detail – which of course enables enhanced analytics for Joe the Golfer –  would then be a straightforward task. 
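To make that concrete, here’s a minimal sketch of what such a model might look like in C#. The class and member names are my own illustration, not a prescribed API:

using System;
using System.Collections.Generic;

// Strokes are first-class entities, just as they are on the course.
public abstract class Stroke
{
    public DateTime Time { get; set; }
    public string ClubUsed { get; set; }
}

// Each real-world kind of stroke gets its own type.
public class Drive : Stroke { }
public class Putt : Stroke { }
public class SandStroke : Stroke { }
public class PenaltyStroke : Stroke { }

// A hole holds the actual strokes; the "count" is derived, not stored.
public class Hole
{
    private readonly List<Stroke> strokes = new List<Stroke>();

    public int Number { get; set; }
    public IList<Stroke> Strokes { get { return strokes; } }
    public int StrokeCount { get { return strokes.Count; } }
}

StrokeCount is still available for convenience, but it rolls up from the real data instead of replacing it – so stroke-level analytics can be added later without reworking the model.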


An exception to this rule might naturally be  an “extreme” design, such as a high-performance computing system or any system where memory, storage, or performance requirements might require a tailored approach.  Few systems are like this.

The model supports real-world scenarios

It’s important above all that the models and classes and entities you come up with are usable.   They should work together easily and make it difficult for developers to screw up.   A lot of this happens naturally if you have composed your entities according to the “looks like the real world” directive above:  developers can then interact with your model in a straightforward way to access and use the entities in the system. 

For instance, we’d certainly be happy to see a clear model like the one sketched below, which every developer is sure to understand.

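Continuing the hypothetical golf model from above (Round and Golfer are again my own illustrative names, not anything prescribed):

using System;
using System.Collections.Generic;
using System.Linq;

public class Golfer
{
    public string Name { get; set; }
}

// A round contains holes; a hole contains strokes.  Containment and
// iteration read just the way a golfer would describe them.
public class Round
{
    private readonly List<Hole> holes = new List<Hole>();

    public Golfer Golfer { get; set; }
    public IList<Hole> Holes { get { return holes; } }

    public int TotalStrokes()
    {
        return holes.Sum(hole => hole.StrokeCount);
    }
}

Client code then falls out naturally: round.Holes[8].Strokes.OfType<PenaltyStroke>().Count() answers “how many penalties on the ninth?” with no explanation required.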

Avoid creating “leaky abstractions,” where the user must understand underlying implementation details or “gotchas” in order to use your object model correctly.   The developer’s mental image of what’s happening must match what the objects are doing in software;  if there’s a disparity, the developer doesn’t really understand what’s going on.

Conclusion

Model your designs after the real world, and your solutions will be easy to create, easy to use, and easy to test.  By not inventing an alternate world in software, the team will have a ready common language of objects between them, with an existing understanding of how they interrelate and function.  In addition, other stakeholders such as business analysts and testers will find it easier to relate to the system.

***

More reading

  • Joel on Software: The Law of Leaky Abstractions
    Abstractions “leak” when you have to understand the underlying implementation details in order to use them.
  • “Domain-Driven Design,” Eric Evans (Amazon.com)
    The book that started it all: four stars, well-loved by its readers. 
    “Readers learn how to use a domain model to make a complex development effort more focused and dynamic. A core of best practices and standard patterns provides a common language for the development team.”
  • Domain-driven design (Wikipedia.com)
    “Domain-driven design (DDD) is an approach to developing software for complex needs by deeply connecting the implementation to an evolving model of the core business concepts”
  • The Secret to Designing an Intuitive UX: Match the Mental Model to the Conceptual Model (UXMag.com)
    An analog in user interface design – again, the paradigm of matching existing mental models. 
    “The secret to designing an intuitive user experience is making sure that the conceptual model of your product matches, as much as possible, the mental models of your users”

ASP.NET’s dirty little secret

It’s no wonder most developers are confused about Microsoft’s ASP.NET. Although it was released over eight years ago in January 2002, there are few if any coherent explanations of the ASP.NET page life cycle. This is unfortunate, as it’s probably the most critical aspect for web developers to understand.

The Microsoft documentation is almost worthless (see below), and strangely enough, books and blogs give it only a cursory treatment, without really explaining how best to use it – and why.   The net effect is a huge number of developers who struggle to make their ASP.NET applications work.

It’s ASP.NET’s dirty little secret: very few developers really understand the page life cycle and the page model, and there’s little meaningful guidance to be found.   From a Microsoft product/program perspective, this seems like a monumentally costly mistake: developer guidance – or lack thereof – has a huge impact on the success of the entire product ecosystem, from Microsoft to developers to clients to end-users.

In a future post, I’ll offer up my own take on the ASP.NET page life cycle, with metaphors that developers can relate to.

When documentation isn’t guidance…

Sure, there’s plenty of documentation: reams and reams of it. There are a thousand articles offering functional, mechanical, and temporal descriptions of the page life cycle and its ten events.

But there’s little that makes sense to the everyday developer.  There is no real guidance, no best practices for page architecture that tell us how a well-designed page should be factored.  There is plenty explaining the order of events, but little explaining their meaning.

I’m not sure why Microsoft let this happen: ASP.NET is a great framework.  It’s hard to imagine what the development costs associated with this must be across the industry.  You think that Deepwater Horizon spill is big?  This one’s been leaking since January 2002!  One can only imagine the millions of developer hours spent debugging issues related to an imperfect understanding of the page lifecycle.   I know I’ve spent my share.

A hammer and a saw does not a carpenter make.
— Anon.

Backstory

I’m doing  architectural reviews of a couple ASP.NET projects for my client.  These sites are regular, relatively low-volume, get-things-done business apps:  a lot of forms-based data entry, forms authentication, a healthy dash of reporting, some middle-tier WCF services, and a handful of back-end databases.  The “meat and potatoes” of ASP.NET, as it were.

While the sites actually work pretty well, the ASP.NET code is a mess. (I’m sorry, I couldn’t come up with any nicer way to say it.)  It looks like it was written by developers who had success writing basic ASP.NET forms – like most of us, once upon a time — but then everything went to hell in a handbasket in the seething cauldron of project schedules, nested user controls and AJAX postbacks.

Personal aside: when I first began doing ASP.NET around 2002, I had no clue. I had a simplistic, relatively inaccurate conceptual model of the page lifecycle,  limited primarily to a dysfunctional love-hate relationship with the Load and PreRender events.   I certainly didn’t have anywhere near a full understanding of viewstate, databinding, or the postback model.  Like most of the developers out there, I suspect, I learned just enough to get by – and got myself into a heap of trouble as the pages I was building became more complex.

“Lifecycle not optional”

The fact is this: without a thorough understanding of the page lifecycle, the ASP.NET developer is doomed to a long and rocky road. The same may be said for the site’s end users, who will have to deal with bugs and erratic behavior.

The lifecycle is central to the page developer’s world:  it weaves together page creation, a hierarchy of controls, viewstate, postback, eventing, validation, rendering, and ultimately, death — against the backdrop of the page model and the HTTP application pipeline, which fuse the protocol of the HTTP request with the mechanism of rendering an HTML response.

Misconceptions about core concepts like the page lifecycle cause a huge number of problems.  When a page becomes a little more complex than a basic form – a few user controls here, some databinding there – things start to break.  Under schedule pressure, developers start relying on oh-so-clever software hackery — which in the best scenario, actually makes things work reasonably well.   The best result, unfortunately, is a hacked-up codebase that only works “reasonably well.”

It is important for you to understand the page life cycle so that you can write code at the appropriate life-cycle stage for the effect you intend.
ASP.NET Page Life Cycle Overview (MSDN)

The control life cycle is the most important architectural concept that you need to understand as a control developer.
Developing Microsoft ASP.NET Server Controls and Components (Microsoft Press)


Back to the client

In my client’s code, I consistently found treatments of the page Load and PreRender events that showed a lack of understanding of the life cycle.   Sometimes everything was done on Load, and sometimes on PreRender;  sometimes postback conditions were respected, and other times, not so much.

The code was also guilty of minor abuse of session state (using it as the backing store for user controls), occasional major violence to user controls (page reaching in, control reaching out — a violation of encapsulation), and a near-complete aversion to object-oriented design (primarily a procedural approach).

Some of this, like using session state to back user controls, I believe was due to the fact that the developers didn’t understand the life cycle:  how the page databinds its controls, in what order it all happens (top-down or bottom-up?), and how data flows on the page throughout the life cycle.  In this case, using session state worked — for one control on one page, in one window — so it shipped.
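For what it’s worth, the conventional home for per-control state like this is the control’s own viewstate, which is scoped to that control instance on that page – exactly the behavior the session-state hack was approximating. A minimal sketch (GolferPicker and the property are hypothetical names of mine):

using System.Web.UI;

public class GolferPicker : UserControl
{
    // Backed by the control's own viewstate: survives postback, and is
    // scoped per control instance per page - unlike session state, which
    // is shared across every page and window in the user's session.
    public int SelectedGolferId
    {
        get
        {
            object value = ViewState["SelectedGolferId"];
            return (value == null) ? -1 : (int)value;
        }
        set { ViewState["SelectedGolferId"] = value; }
    }
}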

This would all be duly noted in the architectural report.  But to be most helpful, I really wanted to give the client team some rock-solid guidance and best practices for developing ASP.NET web forms:  what should one do to write clear, stable, maintainable ASP.NET applications?  What kind of code should go in Init, Load, and PreRender? How to explain clearly to the developers what the events really mean?

I started off thinking that compiling best practices would be a reasonably simple affair involving a few web searches.

The search…

I was looking specifically for ASP.NET “best practices” related to the page lifecycle that would address questions like these (the sketch after the list shows the flavor of answer I was hoping for):

  • What key concepts should developers keep in mind when designing ASP.NET pages to work in harmony with the page lifecycle?
  • How should a developer best structure the page to address common scenarios, such as a form which accepts input from its users?
  • What kinds of things should be done during the page Load event?  In the Init event?  In PreRender?    What are some guidelines for deciding when to put what types of code in Load vs. PreRender? Init vs. Load? Etc.
  • What kinds of things should be done –or not done — during postback (IsPostBack == true), especially in specific, basic scenarios — like that standard data input form?
  • What key concepts should developers keep in mind when using controls throughout the page lifecycle?  E.g. what are the responsibilities of the page or parent control vs. those of the child control?
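To show that flavor – and to be clear, what follows is my own working pattern, not anything official – here’s a minimal sketch of a basic data-entry page that respects the lifecycle (BindGolferList, GolferList, and SaveButton are hypothetical):

using System;
using System.Web.UI;

public partial class EditGolferPage : Page
{
    protected void Page_Init(object sender, EventArgs e)
    {
        // Init: create any dynamic controls here, on every request,
        // so they exist before viewstate is loaded into them.
    }

    protected void Page_Load(object sender, EventArgs e)
    {
        // Load: populate controls from the database only on the first
        // request; on postback, viewstate has already restored their values.
        if (!IsPostBack)
        {
            BindGolferList();
        }
    }

    protected void Page_PreRender(object sender, EventArgs e)
    {
        // PreRender: the last chance to adjust what the user will see,
        // after all postback and control event handlers have fired.
        SaveButton.Enabled = User.Identity.IsAuthenticated;
    }

    private void BindGolferList()
    {
        // e.g. GolferList.DataSource = ...; GolferList.DataBind();
    }
}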

Surely there was a site that would distill “The Tao of ASP.NET” or some such for the poor developer; something that would convey a clear, coherent conceptual model.

The emperor has no clothes

What I found first was the MSDN documentation from Microsoft on the ASP.NET Life Cycle.   This might charitably be described as mind-numbingly functional. Can I get a witness??

From the official ASP.NET Page Life Cycle Overview on MSDN:

Use this event to read or initialize control properties.
— Explaining the page Init event

Use the OnLoad(EventArgs) event method to set properties in controls and to establish database connections.
— Explaining the “typical use” of the page Load event

If the request is a postback, control event handlers are called.
— On the postback event handling phase

That’s it?  In Load, we should “set properties in controls and establish database connections?”  To me, this is of shockingly little value.  At this point I was thinking, wow, if this is the official guidance, it’s no wonder most developers don’t really understand the ASP.NET page life cycle.

It’s really hard to believe that this is the extent of Microsoft’s offering on such an important subject to its developers.   Don’t they understand what it means not only for Microsoft and its image, but also for developers and their clients, the users… the whole ASP.NET ecosystem?

Sigh.

Order is not the whole of knowledge

Outside the official Microsoft channel, I found scores and scores of sites which catalogued the various lifecycle events, many of them replaying the MSDN documentation without much additional detail.  Site after site described  the page life cycle and its events as if  “to order” were “to know.”

There were few meaningful metaphors for developers to relate to, and little if any guidance as to how to actually use the events in a practical manner.  Witness again:

Objects take true form during the Load event.
— “The ASP.NET Page Life Cycle,” 15seconds.com, in an apparent fit of Buddhistic nihilism

The Pre_Render [sic] event is triggered next and you can perform any update operations that you require prior to saving the View State and rendering of the web page content.
— “Understanding the ASP.NET Page Life Cycle,”  aspnet101.com

PreRender is often described as “the last chance to do something that affects viewstate.” Am I the only one who doesn’t find this particularly helpful?

Personal aside: I didn’t fully understand viewstate until recently, when I read Dave Reed’s epic, thorough, and now-classic treatment in Truly Understanding Viewstate.   If you want to be an ASP.NET pro, read this excellent post.

It’s a gusher…  a leaky abstraction

After many hours across days of searching, and hundreds of articles read, I found few clues.  I did find grim evidence in blogs and forums that what I was after might not exist:  a large number of developers expressing frustration with how ASP.NET works:

Let me tell you all that if you want to work from home and be a freelance web developer, ASP.NET will stress your life out and you will die young.
— From a multi-year discussion on the post “ASP.NET Sucks” by Oliver Brown

WebForms is a lie. It’s abstraction wrapped in deception covered in lie sauce presented on a plate full of diversion and sleight of hand.
— “Problems or Flaws of the ASP.NET postback model,” quoting Rob Conery in “You Should Learn MVC,” channelling Will Ferrell

Are there any best practices or recommendations around using "Page_PreRender" vs "Page_Load"?
— Max, a lonely voice in the wilderness,  on the thread “Best Practices around PreRender and Load”  (July 2007)

Finally, from respected veteran blogger and ASP.NET consultant Rick Strahl:

It can sometimes get very difficult to coordinate the event sequence for data binding, rendering, and setup of the various controls at the correct time in the page cycle. Do you load data in the Init, Load or PreRender events or do you assign values during postback events?
— “What’s Ailing ASP.NET Web Forms,” Rick Strahl (Nov 2007), citing complexity in the page lifecycle as a major ASP.NET issue

If it’s “very difficult” for an ASP.NET guru, what hope is there for Joe and Jane ASP.NET Developer?

Amidst the sea of holes…

In the middle of all of this, I did find one gem of conceptual guidance:

Web Forms is framed as "page-down" instead of "control-up"… From the perspective of a control, databinding looks like "someone who is responsible for me, because they contain me, will set my data source and tell me when to bind to it. They know what I am supposed to do and when, *not* me. I am but a tool in the master plan of a bigger entity."

From the perspective of a page, databinding looks like: "I am the biggest entity on this page. I will determine relevant data, set data sources on my children, and bind them. They will in turn set data sources on their children and bind them, and so forth. I kick off this entire process but am only responsible for the first logical level of it."
— Bryan Watts, in a reply on Rick Strahl’s blog, offering a conceptual model of page-control interaction

Brilliant!!  Watts’s Law of page-control interaction. Finally, a conceptual model that a developer can relate to and understand – and, most importantly, one that helps teach effective page and control design.  This simple guidance conveys:

  1. How the control should be structured to work with its parent (page will set my data source and bind to it); and
  2. How the page should be structured to work with a child control (I will set the data source of the control and tell it when to bind)
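In code, Watts’s model might look something like this – a sketch of the pattern with hypothetical names (Scores, ScoresGrid, GetScores), not code from his reply:

using System;
using System.Web.UI;

// The page: "I am the biggest entity on this page.  I determine the
// relevant data, set data sources on my children, and bind them."
public partial class LeaderboardPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        if (!IsPostBack)
        {
            Scores.DataSource = GetScores();   // hypothetical query
            Scores.DataBind();
        }
    }

    private object GetScores() { return new[] { "Joe", "Jane" }; }
}

// The control: "someone who contains me will set my data source and
// tell me when to bind.  I am but a tool in a bigger entity's plan."
public partial class ScoresControl : UserControl
{
    public object DataSource { get; set; }

    public override void DataBind()
    {
        // Bind my first logical level of children; they take it from there.
        ScoresGrid.DataSource = DataSource;
        base.DataBind();
    }
}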

This was the only real ray of light so far.  I forged on, throwing hail-Mary terms like “zen” and “tao” into my search.  It seemed like the more I read, the less I understood.

This was partly because as I delved further I realized there were aspects of the page model and lifecycle that I didn’t truly understand, either.

The great white (paper) hope

Undaunted, I renewed my subscription to Safari Books Online, publishing home to some of the world’s finest technical authors.  If the answer was anywhere, surely it would be here. I lapsed briefly into optimism:  maybe the blogosphere had eschewed its treatment of this topic because the publishing world had illuminated it so thoroughly?

Not so much. Most books, while fantastic learning resources – far better for learning than reading random articles on the web –  spilled precious little ink on the page life cycle, rushing on to more exciting topics like databinding and the GridView control.

ASP.NET 3.5 Unleashed treats the life cycle almost dismissively:

Ninety-nine percent of the time, you won’t handle any of these events except for the Load and the PreRender events. The difference between these two events is that the Load event happens before any control events and the PreRender event happens after any control events.
— ASP.NET 3.5 Unleashed, “Handling Page Events”

Jesse Liberty’s Programming ASP.NET 3.5 presents a great life cycle flowchart: it has a lot of order and flow information, but little guidance.  A page or so later, the lifecycle is dragged outside and summarily dispatched in the following fashion:

The load process is completed, control event handlers are run, and page validation occurs.
— Programming ASP.NET 3.5, “Website Fundamentals: Life Cycle”

The excellent book Developing Microsoft ASP.NET Server Controls and Components (Kothari/Datye, Microsoft Press) is a must-read for any serious ASP.NET architect or control developer.  But they, too, shy away from the concerns of guidance:

In this phase, you should implement any work that your control needs to do before it is rendered, by overriding the OnPreRender method.
— Developing Microsoft ASP.NET Server Controls and Components, “Chapter 9: Control Life Cycle”


As I read each book’s treatment of the life cycle, I kept thinking, how can you move beyond the page life cycle when you haven’t described what we’re supposed to do with it?

Conclusion

What do the Init, Load, and PreRender events really mean? How should developers architect pages to address common scenarios? Where are the workable models and metaphors?   The order of page events is only one aspect of their meaning. And explaining events in terms of their interaction with viewstate, postback, or the control tree is most useful only when one thoroughly understands those concepts.

I am a big proponent of software designs and user interfaces that follow natural conceptual models: it’s a win-win for developers and all stakeholders across the entire software development lifecycle.    See my post on metaphors in software design.

In the end, it took a very long time and a lot of dogged persistence for me, an experienced software architect, to feel like I thoroughly understood the ASP.NET page model and its lifecycle:  the “bottom-up” of control initialization vs. the “top-down” of control loading, the meaning of the Init, Load, and PreRender events — and specifically what kind of code I would use in them, when and why– and a clear understanding of viewstate and postback.
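One of those hard-won details is worth making concrete: Init bubbles up from the bottom of the control tree, while Load proceeds from the top down. A throwaway control like this (my own sketch, not from any of the sources above) makes the order visible in the trace log:

using System;
using System.Web.UI;

public class TracingControl : Control
{
    protected override void OnInit(EventArgs e)
    {
        base.OnInit(e);
        // Children's Init fires before their parent's: bottom-up.
        Page.Trace.Write(ID + ": Init");
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        // Load runs parent-first: top-down.
        Page.Trace.Write(ID + ": Load");
    }
}

Nest a few of these inside one another on a page with Trace="true", and the ordering falls right out of the trace output.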

Along the way, I realized why a lot of developers have issues with ASP.NET.  It’s ASP.NET’s dirty little secret: very few developers really understand the page lifecycle and the page model – and very little “real” documentation exists for it.

From a Microsoft product/program perspective, this seems like a monumentally costly mistake. On my Googling travels through ASP.NET forums and in blog post comments, I “met” tons of developers who were struggling with ASP.NET – and who are still struggling with it.  It’s pretty congruent with my own experience, and with that of the ASP.NET developers I’ve known in person.

As far as I’m concerned, the best architectures are those that are almost impossible to screw up.  Despite its power and flexibility, ASP.NET falls short on that account.

As far as Microsoft goes, devoting time, care, and energy to making sure developers have crystal-clear guidance represents a huge opportunity for the success of the product ecosystem. Imagine how many millions of corporate development hours have been spent since January 2002 debugging ASP.NET apps because a developer didn’t understand the page lifecycle, viewstate, postback, or Watts’s Law of Databinding.

Even worse, how many clients and end-users were frustrated because their Microsoft-based web apps didn’t work properly? You can see how this issue could have disastrous implications for the ecosystem.  As a long-time Microsoft fan, I hope they make sure this never happens again.

Coming soon: in my next  post, I’ll offer up my “three dualities of ASP.NET,” clear metaphors that will enable ASP.NET developers to understand the lifecycle and write complex applications that work well.

***

More Reading

  • Truly Understanding ViewState, Dave Reed (Infinities Loop)
    A comprehensive and humorous look at the uses and mis-uses of viewstate; a definitive resource and must-read for the professional ASP.NET developer:
    “It’s not that there’s no good information out there about ViewState, it’s just all of them seem to be lacking something, and that is contributing to the community’s overall confusion about ViewState.”
  • Understanding ASP.NET ViewState, Scott Mitchell (4GuysFromRolla.com)
    A great starter article.
  • The ASP.NET Page Object Model, Dino Esposito (MSDN)
    Another excellent article from a top-notch writer…  but shy of guidance.
  • What’s Ailing ASP.NET Web Forms, Rick Strahl (West-Wind.com)
    A great post with pros and cons of ASP.NET and a comparison to MVC.
    “Microsoft built a very complex engine that has many side effects in the Page pipeline.”
  • 10 Reasons Why ASP.NET WebForms Suck, JD Conley (“Starin’ at the Wall 2.0”)
    Like a true mockumentary, this dial actually goes up to 11.  What gets that honor?
    “Number 11: The odd feeling that you have to beat the framework into submission to get it to do what you want.”
  • Page Load is Evil, Wayne Barnett
    I can’t really say I agree. But this post is accompanied by a deliciously raging discussion that boils over into near-flame-war at the end.  Unfortunately, it offers no pat answers about what to do in Load or PreRender – or Init, for that matter.
    “It seems to me that the root problem is that the page life cycle isn’t given the respect it deserves in tutorials and documentation.” – Anonymous poster
    “I would desperately like to see more on this subject [of page life cycle]!  … I’ve tried reading up on this before, but from a designer/programmer’s point of view it’s sometimes difficult to get my head around the order of events or their meanings.” – Chris Ward, followup poster

Creating a code repository with Subversion

If you’re going to be serious about software development, you need to maintain a source code repository.  In this post, I’ll share my experience creating a source code repository with Subversion on a Windows server in under 30 minutes.   


My first home repo

Kind of appropriate for a first post on a software blog, don’t you think?   I’ve done a ton of coding (C# and .NET mostly since 2001), but never took the time to set up a code repository at home.   Now that I think of it, it’s sort of like the cobbler with his tattered shoes, since at work part of what I do is make sure that software best practices are followed:  checking in code, testing code, ensuring repeatable release practices (CMM) are followed, etc. 

In my last role as an architect and director at Advertising.com (now owned by AOL), we evaluated code repos, selected Subversion, and migrated the whole Technology division onto it.  

The best part about it – beyond that it works very well, of course – is that Subversion is open-source and free.

Preparing the server: 15 minutes

I chose to set it up on my primary network server, a Dell PowerEdge 1600SC running Windows Server 2003.    I wanted a dedicated and secure space for the repo, so I created a separate volume for it.  

I followed my own, relatively new best practices for intellectual property (IP) and information assurance (IA) management, to ensure that this critical resource (code) was secure and could not be lost to a hard drive failure or other accident:

  • Created a highly available disk volume. I created a volume (“DEV”) using Windows Disk Management on my most reliable, RAID-1 mirrored disks (Seagate 10K SCSI drives).  The likelihood of one of these failing is small enough, but the likelihood of both failing at once is astronomically small.
  • Enabled file system versioning. I set up a cross-volume shadow copy (VSS, not to be confused with the ancient Microsoft source control product) to ensure that prior versions of source files would be accessible.   Having your IP on mirrored, robust drives means nothing if you or someone else accidentally changes or deletes a file or folder.   If you’re interested in this kind of stuff, check out Windows volume shadow copy – it automatically keeps prior versions of files and lets you restore them using Windows Explorer. 
  • Locked down NTFS permissions.  I constrained the permissions on the volume to just Administrators, since this is a highly sensitive resource:  I removed the general Users group permissions.  This of course means that only users authenticated as admins will be able to browse the repo.
  • Shared the volume on the network.  I created a public network share (DEV), making it available to my development machines.  Once again, I restricted the share permissions – even though the effective permissions on a share under Windows are never less restrictive than the permissions on the underlying filesystem.   
  • Set up a file-level recurring backup task.  I set up a file/document-level secure backup job under Scheduled Tasks in WS2003 using a 7-zip backup batch file I created.   This backup job ensures that all the files and folders on the volume are backed up nightly using AES-256 encryption, to a separate physical backup location (which is itself mirrored).    Since this archive is securely encrypted, it could also be FTP’d to an online storage service to provide additional security and recoverability. 
  • Set up a volume-level recurring backup task.   Hot-imaging the repo at the disk volume level provides additional recoverability: I feel much better knowing that if the primary RAID array failed for any reason, I would have not just one, but two separate means of restoring it.    I scheduled this backup task to run weekly; it images the entire volume using a supporting batch file I created for DriveImage XML. 

Preparation extras

Other options you can consider, based on your knowledge of security and Windows, etc – I may implement some of these in the future:

  • Hide the share.  You could make the network share invisible to casual browsers by calling it “DEV$” instead of “DEV”:  this is a slightly hacky holdover which goes back to DOS networking days, if I’m not mistaken.   This would prevent non-authorized users or hackers from even seeing that the share exists (“security through obscurity”).
  • Encrypt the volume. You could encrypt the volume using the highly secure Windows Encrypting File System (EFS) or BitLocker under Windows 7.   This would help ensure that the contents (your source code) would not be available to someone who swiped the actual physical server or hard drives.
  • Use asymmetric encryption on the backups.  You could use asymmetric encryption on the backup media (the archives and the volume images).   While the 7-zip archives I’m creating are locked down with highly secure AES-256, the password always ends up in a batch file somewhere, making it vulnerable to interception or theft.   With asymmetric public-private key encryption, the public key is used to encrypt – you can even give the key out to the world — but only the private key can decrypt – and it’s locked in your vault on a couple USB drives… and probably a good old piece of paper.

If anyone has experience or best practices with secure and available volumes, please chime in via comment:  I’m a software guy, not a security expert or IT administrator. 

Installing Subversion: 10 minutes

Now that I had created a relatively secure home for the repo that couldn’t easily be wiped out by any single failure, it was time to actually create the code repository.  

In my case, I followed the simple and clear instructions at Coding Horror to install Subversion on my server, run Subversion as a Windows service (so it’s always available when the machine is running), and create a code repo.  True to the article, it took less than 30 minutes:  ten in my case, since I had worked with Subversion before.   That’s it!

If you’ve set up Subversion on your home server, are there any great tips/tricks/links that I’ve missed here?