Trick and Tips: You are what you document

Saturday, January 24, 2015

You are what you document

Hey, grab a seat - we need to talk about documentation. Now, I know what you're thinking: documentation is tedious, a chore, an afterthought, a redundant source of information given your beautiful, self-documenting code. It's just like a good diet and exercise - you'll do it when you have the time!

Well, this blog post is an intervention. You're hurting others and you're hurting yourself. You poured countless hours into a project, but your co-workers won't use it. You tried to run it in production, but the Ops team won't support it. You put the project on GitHub, but the fools on Hacker News just don't see the brilliance of what you've done.

The number one cause of startup failure is not the product, but the distribution: it doesn't matter how good the product is if no one uses it. With software, the documentation is the distribution: it doesn't matter how good the code is if no one uses it. If it isn't documented, it doesn't exist.

Think of this blog post as documentation for your documentation. By "documentation", I don't just mean a written manual, but all the pieces that go into making your software learnable: the coding practices, tutorials, white papers, marketing, the community, and the user experience.

I'll be discussing three types of documentation:

Written documentation: READMEs, tutorials, reference guides, white papers.
Code documentation: API docs, comments, example code, the type system.
Community documentation: blog posts, Q&A sites, talks, meetup groups.

Each type of documentation solves a different problem, so most projects should include some mix of all three types. I've tried to include links to open source projects that best demonstrate each of the different types of documentation. If you know of other great examples or other types of documentation that I've missed, please leave a comment.

1. Written documentation

Let's start with what people typically think of when they hear the word "documentation": READMEs, tutorials, reference guides, etc.

1a. The README

Every project should have a README: it is the single most important document in your codebase. The README is typically your first contact with a new user, so your goal is to introduce them to the project as quickly as possible, convince them why it's worth learning more, and give them pointers on how to get started and where to get more info.

A typical README should have the following information:

Description: short "sales pitch". Tell the reader why they should keep reading.
Quick examples: short code snippets or screenshots to support the description.
Quick start: how to get going, install instructions, and more examples.
Further documentation: links to the full docs and more info.
Project organization: who are the authors, how to contribute, how to file bugs.
Legal notices: license, copyright, and any other legal details.

Here are some examples of great READMEs:

Twitter Bootstrap
guard
Ace
jekyll
hogan.js
ember.js

I usually practice Readme Driven Development, writing the README before writing any code. This forces me to be clear on exactly what I'm trying to build, helps me prioritize the work (anything in the "sales pitch" is a must-have), and provides a great sanity check on what the basic user experience looks like (the quick example and quick start sections are essential). See the original Readme Driven Development post and The Most Important Code Isn't Code for more info.

1b. Tutorials, walkthroughs, and guides

The README gets the user in the door; the tutorial shows them how to walk around. The goal is to guide a new user through example use cases that highlight the idiomatic patterns, the best practices, and the unique features of the project. Use the tutorial to have a dialogue with the user, walking them through the typical development flow step by step and introducing the key ideas. You don't have to cover every single topic and you don't have to go too in-depth: instead, at each step of the tutorial, provide links to where the user can find more info.

For small, simple projects, you may be able to squeeze a tutorial into the README itself, but most projects will want to use a wiki, a blog post, a standalone webpage, slide deck, or even a recorded video. Here are some great examples:

Ruby on Rails Guides
Django Tutorial
Dropwizard Getting Started
Intro to Play Framework for Java
Twilio quick start tutorials

The gold standard, however, is the interactive tutorial. Most developers learn best by doing, so a step-by-step guide that lets the developer participate is the ultimate learning tool. Here are a few great examples:

A Tour of Go
Scala Tutorials
Typesafe Activator
Try Redis and Redis commands
Try Git
Codecademy

Creating your own interactive tutorial is not easy, but it dramatically lowers the bar for trying and learning about your project. Here are some (language/framework specific) tools you may find helpful: io.livecode.ch, IPython Notebook, java-repl, Pamflet, Typesafe Activator, repl.it, Ace Editor, CodeMirror, Cloud9 IDE, jsfiddle, Codecademy, and codepen.

1c. Reference documentation

Ok, your new user got their foot in the door with the README and they took a few steps by following the tutorial; now, the user actually knows enough to start asking questions. This is where the reference documentation comes into play: the goal is to give users a way to find the specific information they need. In this part of the documentation, you can cover all the major topics in depth, but make sure to organize the information in a way that is easy to search and navigate.

Here are some great examples of reference documentation:

Stripe docs
Django documentation
Dropwizard user manual
Codahale metrics
SQLite documents

For large projects, the amount of reference documentation can be pretty large. How do you keep it up to date? One technique is to include references to real code: that is, instead of typing code snippets directly into your docs, build a system to dynamically include them from a real repository.

For example, consider the Play Framework async docs. This documentation is generated from Markdown files using the play-doc project.

In the Markdown for the "Returning futures" section, the code snippet is not written inline. Instead, there is just the line @[async-result](code/ScalaAsync.scala), which is a reference to ScalaAsync.scala in Play's Git repo, where the relevant code is demarcated using special comments.

Since this file is compiled and tested, developers have to update it whenever they make changes to the framework - otherwise, the build fails. Moreover, as the comments identify the section of code as "used in the documentation", there is a good chance the developers will remember to update the relevant part of the documentation as well.
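The include-from-real-code idea can be sketched in a few lines of Python. This is a simplified illustration, not play-doc's actual implementation: the `#label` marker syntax, the function names, and the file layout are assumptions made for the example. Only the `@[label](path)` reference form comes from the text above.

```python
import re
from pathlib import Path
from tempfile import TemporaryDirectory

# Assumed convention for this sketch: a snippet is the text between two
# comment lines of the form "#<label>" in the referenced source file.
INCLUDE = re.compile(r"^@\[(?P<label>[\w-]+)\]\((?P<path>[^)]+)\)$")


def extract_snippet(source: str, label: str) -> str:
    """Return the lines between the two '#<label>' marker comments."""
    marker = f"#{label}"
    lines = source.splitlines()
    positions = [i for i, line in enumerate(lines) if line.strip() == marker]
    if len(positions) != 2:
        # Failing loudly is the point: a renamed or deleted snippet
        # should break the docs build instead of silently going stale.
        raise ValueError(f"expected two '{marker}' markers, found {len(positions)}")
    start, end = positions
    return "\n".join(lines[start + 1:end])


def render(markdown: str, root: Path) -> str:
    """Replace each '@[label](path)' line with the snippet it refers to."""
    out = []
    for line in markdown.splitlines():
        match = INCLUDE.match(line.strip())
        if match:
            source = (root / match["path"]).read_text()
            out.append(extract_snippet(source, match["label"]))
        else:
            out.append(line)
    return "\n".join(out)


def demo() -> str:
    """Render a tiny doc page against a throwaway source tree."""
    with TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "code").mkdir()
        (root / "code" / "example.py").write_text(
            "def setup():\n    pass\n#async-result\nresult = compute()\n#async-result\n"
        )
        page = "Returning futures\n@[async-result](code/example.py)\n"
        return render(page, root)
```

Calling `demo()` returns the page with the reference line replaced by the single line of code between the markers. In a real setup the referenced file would also be compiled and tested as part of the build, which is what keeps the docs honest.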

1d. Project websites

Standalone project websites are a great example of documentation as marketing: you can give your project its own home, with a custom look and feel, and content that is linkable, tweetable, and indexable.

Here are a few great examples:

Bootstrap
jekyll
Yeoman
Ember
Foundation

The easiest way to create a website for your project is with GitHub Pages: create a repo on GitHub, put a few static HTML files in it (possibly using jekyll), git push, and you have your own landing page on the github.io domain.

1e. White papers and books

If you want to make a project look legit, a white paper, and especially a book, is the way to go. White papers are a great way to explain the background for the project: why it was built, the requirements, the approach, and the results. Books, of course, can contain the material in all the sections above: a quick intro, a tutorial, a reference guide, and more. Books are a sign that your project has "made it": there is enough interest in it that a publisher is willing to put money into printing the book and programmers are willing to put money into buying the book.

Some great examples:

Bitcoin: a peer-to-peer electronic cash system
Ethereum white paper
Kafka: a distributed messaging system for log processing
C Programming Language
Effective Java

2. Code documentation

We now understand the role of written documentation: the README gets your foot in the door; the tutorial shows you how to walk around; the reference guide is a map. But to truly understand how a piece of software works, you have to learn to read the source. As the author of a project, it is your job to make the code as easy to understand as possible: programs must be written for people to read, and only incidentally for machines to execute.

However, the code cannot be the only documentation for a project. You can no more learn how to use a complicated piece of software by reading the source than you can learn to drive a car by taking apart the engine.

As we'll discuss below, code structure, comments, API docs, design patterns, and test cases all contain critical information for learning how to use a project, but remember that they are not a replacement for written documentation.

2a. Naming, design patterns, and the type system

There is no such thing as "self-documenting" code, but there are ways to make the code easier or harder to understand. One of the first aspects of code readability is naming: every piece of software defines its own mini language or DSL that consists of class names, package names, method names, and variable names. When a developer uses your code, they are really learning a new language, so choose the words in it wisely! However, since naming is one of the two hardest problems in computer science, I recommend getting yourself a copy of Code Complete, which dedicates quite a few pages to this topic.

Design patterns are another tool for communicating the intent of your code. You have to be careful not to overuse them (see Rethinking Design Patterns), but having a shared vocabulary of terms like singleton, factory, decorator, and iterator can be useful in setting expectations and making the naming problem a little easier. The classic book on this topic is Design Patterns: Elements of Reusable Object-Oriented Software, aka the "Gang of Four" book.

Finally, the type system in statically typed languages can be another powerful source of information. A type system can reduce not only the number of tests you write (by catching a certain class of errors automatically), but also the amount of documentation you have to write. For example, when calling a function in a dynamically typed language, there is no way to know the types of parameters to pass in, short of reading the implementation, unless the author of the function manually documented them; in a statically typed language, the types are known automatically, especially with a good IDE.

Of course, not all type systems are equal, and you have to use them correctly (e.g. avoid stringly typed programming) to see the benefits. For examples of powerful type systems, check out (in increasing order of power and crazy) Scala, Haskell, and Idris.
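As a rough illustration of "stringly typed" versus typed code, here is a small sketch in Python. Python's type hints are optional and are only checked by external tools or an IDE, so this approximates what a statically typed language enforces at compile time. The `LogLevel` enum and both functions are hypothetical names invented for the example.

```python
from enum import Enum


# Stringly typed: the signature says nothing about which strings are valid,
# so callers must rely on prose documentation or read the implementation.
def set_log_level_stringly(level: str) -> int:
    levels = {"debug": 10, "info": 20, "error": 40}
    return levels[level]  # a typo such as "eror" only fails at runtime


class LogLevel(Enum):
    """Hypothetical enum: the set of valid values is now part of the code."""
    DEBUG = 10
    INFO = 20
    ERROR = 40


def set_log_level(level: LogLevel) -> int:
    # The parameter type documents what to pass, and a type checker or IDE
    # can flag a wrong argument before the program runs.
    return level.value
```

With the second version, an IDE can autocomplete `LogLevel.` and list every legal value, which is documentation the author never had to write by hand.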

2b. API docs and literate programming

API docs are documentation for each class, function, and variable in your code. They are a fine-grained form of documentation that lets you learn about the inputs and outputs of each function, the preconditions and postconditions, and, perhaps most importantly, why a certain piece of code exists and behaves the way it does.
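To make that concrete, here is a minimal sketch of an API doc comment written as a Python docstring. The `withdraw` function and its rules are hypothetical. The point is only to show inputs, outputs, preconditions, postconditions, and the "why" living next to the code.

```python
import inspect


def withdraw(balance_cents: int, amount_cents: int) -> int:
    """Return the balance remaining after withdrawing ``amount_cents``.

    Amounts are integers in cents rather than floats because binary floating
    point cannot represent most decimal fractions exactly (the "why").

    Preconditions: ``amount_cents`` is positive and not larger than
    ``balance_cents``.
    Postcondition: the returned balance is never negative.

    Raises:
        ValueError: if a precondition is violated.
    """
    if amount_cents <= 0 or amount_cents > balance_cents:
        raise ValueError("amount must be positive and at most the balance")
    return balance_cents - amount_cents


# Documentation tools and IDEs read the same text that help(withdraw) shows.
summary = inspect.getdoc(withdraw).splitlines()[0]
```

Here `summary` holds the first line of the docstring, which is the part most doc generators use as the one-line description in an index page.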

Many programming languages have tools to generate API docs. For example, Java comes with JavaDoc, which lets you add specially formatted comments to the code. You can then run a command line utility that generates a webpage for each class with the JavaDoc comment formatted as HTML. Good IDEs can also show API docs automatically for any part of the code.

Some frameworks have special handling for API docs as well. For example, rest.li automatically extracts the documentation from your REST service and exposes it in a web UI. You can use this UI to browse all the RESTful services available, see what resources they expose, what methods and parameters they support, and even make REST calls straight from your browser.

Here are a few nice examples of API docs:

Java API docs
Scala API docs
Stripe API docs
Twilio API docs
Github API docs
rest.li API docs

Literate programming goes even further: the idea is that program logic should be described first in natural language; the code comes second, interspersed amongst the English description where convenient. Instead of organizing programs in a way that's easy for compilers to process (i.e., rigid file, folder, and package structure), literate programs should be organized in a way that makes it easier for humans to understand, such as an essay format.

I think literate programming is a great concept, but I'm not aware of any mainstream languages that support it fully. The closest I've seen are projects that use tools like docco, which lets you generate an HTML page that shows your comments intermingled with the code, and feels like a halfway point between API docs and literate programming. For an example, see Literate CoffeeScript.

There are flavors of docco tailored for specific languages, such as rocco (Ruby), Pycco (Python), Gocco (Go), and shocco (POSIX shell). There is also an extension of docco called Groc, which adds support for a searchable table of contents, handles hierarchies of files and folders, and integrates with GitHub Pages.

2c. Comments

When used correctly, comments are another important source of information: whereas the code tells you how, comments tell you why. The trick is finding the right balance. Code without any comments can't explain why the program is being written, the rationale for choosing this or that method, or the reasons certain alternative approaches were taken; code with too many comments can often be a sign that the code itself is unclear and instead of fixing the code, the comments are being used as a crutch.

In short: always use comments in moderation and always to explain why.
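A small sketch of the difference, using a hypothetical retry helper (the function name and the specific numbers are made up for illustration):

```python
def next_retry_delay(attempt: int) -> float:
    # Unhelpful comment (restates the code):
    #   multiply 0.5 by 2 to the power of attempt, then take the minimum.
    #
    # Helpful comment (explains why):
    #   back off exponentially so that many clients retrying at once do not
    #   keep hammering a service that is already struggling; cap the delay
    #   at 30 seconds so callers are not left waiting for minutes after the
    #   service has recovered.
    return min(0.5 * 2 ** attempt, 30.0)
```

The first comment becomes wrong the moment someone changes a constant. The second one stays useful because it records the reasoning that the code alone cannot express.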

For the "best" examples of comments, I point you to a hilarious StackOverflow thread: What is the best comment in source code you have ever encountered?

2d. Example code and test code

No matter how good your docs are, you can't force developers to RTFM. Some developers prefer to learn by example - which is a polite way of saying that they like to copy and paste.

Getting the example code right is critical to the success of a project, as many developers will blindly copy and paste it. Your goal is to make as many clean, idiomatic examples available as possible. You may also want to invest extra time with the first few teams that adopt your project to help them write clean code: their projects may become the models for everyone else, so make sure it's a model that's worth following!

Here are some projects with great example code:

Twilio HowTos and Example Code
Twitter bootstrap examples
Typesafe Activator templates
async.js
Firebase examples

Automated tests are a special case of example code. Tests can be useful as documentation in that they show the expected behavior of the code for a variety of use cases. BDD style unit tests, such as Specs2 and RSpec, even encourage writing test cases as formal specifications. However, in practice, test code can get tangled up with mock objects, test frameworks, and corner cases, all of which can be a source of confusion if you try to rely on it too heavily as a form of documentation.
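Here is a minimal sketch of tests written to read like a specification. It uses Python's built-in unittest module rather than Specs2 or RSpec, and the `slugify` function under test is a hypothetical example.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: turn a title into a URL slug."""
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)


class SlugifySpec(unittest.TestCase):
    """Each test name reads as one sentence of the specification."""

    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("You Are What You Document"), "you-are-what-you-document")

    def test_treats_punctuation_as_a_word_break(self):
        self.assertEqual(slugify("Teach, Don't Tell"), "teach-don-t-tell")

    def test_returns_empty_slug_for_blank_title(self):
        self.assertEqual(slugify("   "), "")


def run_spec() -> unittest.TestResult:
    """Run the spec programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifySpec)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A reader can learn the expected behavior from the test names alone. Note how free of mocks and setup these tests are: once a test needs heavy scaffolding, it stops working as documentation.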

Projects with great test code:

SQLite
Apache Lucene
backbone.js
Chromium
jQuery

3. Community documentation

We've talked about written documentation and code documentation; the final piece of the puzzle comes from the people involved with the project and the tools they use.

3a. Project management tools

Most teams use bug tracking software (e.g. JIRA, Bugzilla, GitHub Issues) and/or project management software (e.g. Basecamp, Asana, Trello). These systems contain a lot of information about the project: what you worked on before, what you're working on now, what you'll work on in the future, bugs found, bugs fixed, and so on.

A few examples:

Play Framework Github Issues
Mozilla Bugzilla
Firefox Roadmap Wiki
Chromium Issues

It's hard to imagine how a TPS report can be useful as documentation, but very often, the discussions over a tricky bug or the requirements gathering before starting a new project contain critical information not available anywhere else. It's not uncommon to come across a bug report or an old wiki page while searching for information about a project, especially if it's an open source project that makes all of this information publicly available.

3b. Mailing lists and Q&A boards

Discussions from Q&A sites like StackOverflow and mailing lists like Google Groups also come up frequently in search results. Even the best documentation will not be able to answer everything, so cultivating community websites can be a critical part of making software learnable. Over time, these may become some of the most important parts of your project's documentation, as they inherently deal with issues where many developers got stuck.

A few examples:

Play Framework Google Group
Android StackOverflow Tag
Ruby on Rails StackOverflow Tag

This is one area where open source projects shine: being able to instantly find answers by using Google is a huge win. That said, for internal/proprietary projects, I encourage you to set up internal mailing lists, maintain an FAQ, and/or install an internal StackOverflow-style Q&A site within your company.

3c. Blog posts, talks, meetup groups

For popular open source projects, some of the best documentation comes in the form of content contributed by the community. For example, blog posts and talks from end users are a valuable source of information, revealing what's really working and what isn't; they are also great marketing, as they make it clear that other people are using the project. Even blog posts that completely trash the project can be useful - think of it as a free design review!

If your project is open source, growing a community around it can have a huge payoff. A small investment in "marketing" your project - via good documentation, custom project pages, giving talks, and setting up meetup groups - can yield huge returns in the form of free labor, cleaner code, and better branding.

There are countless great blog posts and talks, so here are a few unbiased, randomly selected links that you should definitely check out:

The Ultimate Guide to Getting Started with the Play Framework
Composable and Streamable Play Apps
The Play Framework at LinkedIn
Play Framework: Async I/O with Java and Scala
Bitcoin by Analogy

Further reading

If you've made it this far, you should now know how, and why, to document your code. I hope you join me in building software that is easier to use and learn.

If you're hungry for more info, I recommend the following resources:

Writing Great Documentation
The Most Important Code Isn't Code
Teach, Don't Tell
Designing Great API Docs
No docs == no product
Pointers to useful, well-written, and otherwise beautiful documentation
If It Isn't Documented, It Doesn't Exist
A beginner's guide to writing documentation
Tips for Writing Good Documentation
