Saturday, January 24, 2015
You are what you document
Hey, grab a seat - we need to talk about documentation. Now, I know what you're thinking: documentation is tedious, a chore, an afterthought, a redundant source of information given your beautiful, self-documenting code. It's just like a good diet and exercise - you'll do it when you have the time!
Well, this blog post is an intervention. You're hurting others and you're hurting yourself. You poured countless hours into a project, but your co-workers won't use it. You tried to run it in production, but the Ops team won't support it. You put the project on Github, but the fools on Hacker News just don't see the brilliance of what you've done.
The number one cause of startup failure is not the product, but the distribution: it doesn't matter how good the product is if no one uses it. With software, the documentation is the distribution: it doesn't matter how good the code is if no one uses it. If it isn't documented, it doesn't exist.
Think of this blog post as documentation for your documentation. By "documentation", I don't just mean a written manual, but all the pieces that go into making your software learnable: the coding practices, tutorials, white papers, marketing, the community, and the user experience.
I'll be discussing three types of documentation:
- Written documentation: READMEs, tutorials, reference guides, white papers.
- Code documentation: API docs, comments, example code, the type system.
- Community documentation: blog posts, Q&A sites, talks, meetup groups.
1. Written documentation
Let's start with what people typically think of when they hear the word "documentation": READMEs, tutorials, reference guides, etc.
1a. The README
Every project should have a README: it is the single most important document in your codebase. The README is typically your first contact with a new user, so your goal is to introduce them to the project as quickly as possible, convince them why it's worth learning more, and give them pointers on how to get started and where to get more info.
A typical README should have the following information:
- Description: short "sales pitch". Tell the reader why they should keep reading.
- Quick examples: short code snippets or screenshots to support the description.
- Quick start: how to get going, install instructions, and more examples.
- Further documentation: links to the full docs and more info.
- Project organization: who are the authors, how to contribute, how to file bugs.
- Legal notices: license, copyright, and any other legal details.
Here are some great examples of READMEs:
- Twitter Bootstrap
- guard
- Ace
- jekyll
- hogan.js
- ember.js
I usually practice Readme Driven Development, writing the README before writing any code. This forces me to be clear on exactly what I'm trying to build, helps me prioritize the work (anything in the "sales pitch" is a must-have), and provides a great sanity check on what the basic user experience looks like (the quick example and quick start sections are essential). See the original Readme Driven Development post and The Most Important Code Isn't Code for more info.
The README gets the user in the door; the tutorial shows them how to walk around. The goal is to guide a new user through example use cases that highlight the idiomatic patterns, the best practices, and the unique features of the project. Use the tutorial to have a dialogue with the user, walking them through the typical development flow step by step and introducing the key ideas. You don't have to cover every single topic and you don't have to go too in-depth: instead, at each step of the tutorial, provide links to where the user can find more info.
For small, simple projects, you may be able to squeeze a tutorial into the README itself, but most projects will want to use a wiki, a blog post, a standalone webpage, a slide deck, or even a recorded video. Here are some great examples:
- Ruby on Rails Guides
- Django Tutorial
- Dropwizard Getting Started
- Intro to Play Framework for Java
- Twilio quick start tutorials
- A Tour of Go
- Scala Tutorials
- Typesafe Activator
- Try Redis and Redis commands
- Try Git
- Codecademy
Creating your own interactive tutorial is not easy, but it dramatically lowers the bar for trying and learning about your project. Here are some (language/framework specific) tools you may find helpful: io.livecode.ch, IPython Notebook, java-repl, Pamflet, Typesafe Activator, repl.it, Ace Editor, CodeMirror, Cloud9 IDE, jsfiddle, Codecademy, and codepen.
Ok, your new user got their foot in the door with the README and they took a few steps by following the tutorial; now, the user actually knows enough to start asking questions. This is where the reference documentation comes into play: the goal is to give users a way to find the specific information they need. In this part of the documentation, you can cover all the major topics in depth, but make sure to organize the information in a way that is easy to search and navigate.
Here are some great examples of reference documentation:
- Stripe docs
- Django documentation
- Dropwizard user manual
- Codahale metrics
- SQLite documents
For example, consider the Play Framework async docs. This documentation is generated from Markdown files using the play-doc project.
In the Markdown for the "Returning futures" section, the code snippet itself does not appear. Instead, there is just the line
@[async-result](code/ScalaAsync.scala),
which is a reference to ScalaAsync.scala in Play's git repo, where the relevant code is demarcated using special comments.
Since this file is compiled and tested, developers have to update it whenever they make changes to the framework - otherwise, the build fails. Moreover, as the comments identify the section of code as "used in the documentation", there is a good chance the developers will remember to update the relevant part of the documentation as well.
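To make the idea concrete, here is a minimal sketch in Python of pulling a marked snippet out of a tested source file. This is not how play-doc is implemented: the `# doc-start:` / `# doc-end:` marker format and the `extract_snippet` helper are assumptions I made up purely for illustration.

```python
import textwrap


def extract_snippet(source: str, label: str) -> str:
    """Return the code between the start and end markers for `label`.

    The marker format (`# doc-start: <label>` / `# doc-end: <label>`) is
    hypothetical, chosen only for this sketch.
    """
    start = f"# doc-start: {label}"
    end = f"# doc-end: {label}"
    collected = []
    inside = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == start:
            inside = True
        elif stripped == end and inside:
            # Remove the shared indentation so the snippet reads well in docs.
            return textwrap.dedent("\n".join(collected))
        elif inside:
            collected.append(line)
    raise ValueError(f"snippet {label!r} not found or not closed")


# A stand-in for a source file that is compiled and tested with the project.
SOURCE = '''\
def test_greeting():
    # doc-start: greeting
    message = "Hello, " + "world"
    # doc-end: greeting
    assert message == "Hello, world"
'''

snippet = extract_snippet(SOURCE, "greeting")
```

Because the snippet lives inside a test, it cannot silently rot: if the code stops working, the test fails before the docs are regenerated.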
Standalone project websites are a great example of documentation as marketing: you can give your project its own home, with a custom look and feel, and content that is linkable, tweetable, and indexable.
Here are a few great examples:
- Bootstrap
- jekyll
- Yeoman
- Ember
- Foundation
The easiest way to create a website for your project is with Github Pages: create a repo on Github, put a few static HTML files in it (possibly using jekyll), git push, and you have your own landing page on the github.io domain.
If you want to make a project look legit, a white paper, and especially a book, is the way to go. White papers are a great way to explain the background for the project: why it was built, the requirements, the approach, and the results. Books, of course, can contain the material in all the sections above: a quick intro, a tutorial, a reference guide, and more. Books are a sign that your project has "made it": there is enough interest in it that a publisher is willing to put money into printing the book and programmers are willing to put money into buying the book.
Some great examples:
- Bitcoin: a peer-to-peer electronic cash system
- Ethereum white paper
- Kafka: a distributed messaging system for log processing
- C Programming Language
- Effective Java
2. Code documentation
We now understand the role of written documentation: the README gets your foot in the door; the tutorial shows you how to walk around; the reference guide is a map. But to truly understand how a piece of software works, you have to learn to read the source. As the author of a project, it is your job to make the code as easy to understand as possible: programs must be written for people to read, and only incidentally for machines to execute.
However, the code cannot be the only documentation for a project. You can no more learn how to use a complicated piece of software by reading the source than you can learn to drive a car by taking apart the engine.
As we'll discuss below, code structure, comments, API docs, design patterns, and test cases all contain critical information for learning how to use a project, but remember that they are not a replacement for written documentation.
2a. Naming, design patterns, and the type system
Design patterns are another tool for communicating the intent of your code. You have to be careful not to overuse them (see Rethinking Design Patterns), but having a shared vocabulary of terms like singleton, factory, decorator, and iterator can be useful in setting expectations and making the naming problem a little easier. The classic book on this topic is Design Patterns: Elements of Reusable Object-Oriented Software, aka "The Gang of Four".
Finally, the type system in statically typed languages can be another powerful source of information. A type system can reduce not only the number of tests you write (by catching a certain class of errors automatically), but also the amount of documentation you have to write. For example, when calling a function in a dynamically typed language, there is no way to know the types of parameters to pass in unless the author of the function manually documented it; in a statically typed language, the types are known automatically, especially with a good IDE.
Of course, not all type systems are equal, and you have to use them correctly (e.g. avoid stringly typed programming) to see the benefits. For examples of powerful type systems, check out (in increasing order of power and crazy) Scala, Haskell, and Idris.
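As a rough sketch of the difference, here is a "stringly typed" function next to a typed one, written in Python. Keep in mind that Python's type hints are optional and are checked by external tools (such as mypy) or an IDE rather than by the interpreter, so this only approximates what a statically typed language gives you. The `Currency`, `Money`, and `charge` names are made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


# Stringly typed: nothing in the signature says what "amount" or "currency"
# should look like, so the caller has to read the docs (or the source).
def charge_stringly(amount, currency):
    return f"charged {amount} {currency}"


class Currency(Enum):
    USD = "USD"
    EUR = "EUR"


@dataclass(frozen=True)
class Money:
    cents: int
    currency: Currency


# Typed: the signature documents itself, and an IDE or type checker can flag
# a call such as charge("12.50") before the code ever runs.
def charge(money: Money) -> str:
    return f"charged {money.cents / 100:.2f} {money.currency.value}"


receipt = charge(Money(cents=1250, currency=Currency.USD))
```

The typed version needs no prose explaining that amounts are in cents or which currencies are allowed: the types already say so.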
2b. API docs and literate programming
API docs are documentation for each class, function, and variable in your code. They are a fine-grained form of documentation that lets you learn about the inputs and outputs of each function, the preconditions and postconditions, and, perhaps most importantly, why a certain piece of code exists and behaves the way it does.
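A doc comment along these lines might look like the sketch below. It uses a Python docstring rather than any particular tool's format, and the `retry_delay` function and its parameters are hypothetical.

```python
def retry_delay(attempt: int, base_seconds: float = 0.5, cap_seconds: float = 30.0) -> float:
    """Return how long to wait before retry number `attempt`.

    Why this exists: retrying immediately after a failure tends to hit a
    service that is still struggling, so the wait doubles with each attempt
    up to a fixed cap.

    Preconditions: `attempt` is 1 or greater, and `base_seconds` is positive
    and no larger than `cap_seconds`.
    Postcondition: the result is between `base_seconds` and `cap_seconds`.

    Raises:
        ValueError: if `attempt` is less than 1.
    """
    if attempt < 1:
        raise ValueError("attempt must be 1 or greater")
    return min(cap_seconds, base_seconds * 2 ** (attempt - 1))
```

In a Python shell, `help(retry_delay)` prints this docstring, so the inputs, outputs, and the "why" travel with the function itself.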
Many programming languages have tools to generate API docs. For example, Java comes with JavaDoc, which lets you add specially formatted comments to the code.
You can then run a command line utility that generates a webpage for each class with the JavaDoc comment formatted as HTML.
Good IDEs can show API docs automatically for any part of the code.
Some frameworks have special handling for API docs as well. For example, rest.li automatically extracts the documentation from your REST service and exposes it in a web UI. You can use this UI to browse all the RESTful services available, see what resources they expose, what methods and parameters they support, and even make REST calls straight from your browser.
Here are a few nice examples of API docs:
- Java API docs
- Scala API docs
- Stripe API docs
- Twilio API docs
- Github API docs
- rest.li API docs
I think literate programming is a great concept, but I'm not aware of any mainstream languages that support it fully. The closest I've seen are projects that use tools like docco, which lets you generate an HTML page that shows your comments intermingled with the code, and feels like a halfway point between API docs and literate programming. Literate CoffeeScript is one example.
There are flavors of docco tailored for specific languages, such as rocco (Ruby), Pycco (Python), Gocco (Go), and shocco (POSIX shell). There is also an extension of docco called Groc, which adds support for a searchable table of contents, handles hierarchies of files and folders, and integrates with Github Pages.
2c. Comments
When used correctly, comments are another important source of information: whereas the code tells you how, comments tell you why. The trick is finding the right balance. Code without any comments can't explain why the program is being written, the rationale for choosing this or that method, or the reasons certain alternative approaches were taken; code with too many comments can often be a sign that the code itself is unclear and instead of fixing the code, the comments are being used as a crutch.
In short: always use comments in moderation and always to explain why.
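Here is a small Python sketch of the difference between a "how" comment and a "why" comment. The upstream API limit it mentions is a made-up scenario, as are the names.

```python
# "How" comment (adds nothing the code doesn't already say):
# set page size to 50
PAGE_SIZE = 50

# "Why" comment (records a decision the code cannot express):
# The hypothetical upstream API rejects requests for more than 50 items,
# so a larger page would fail outright rather than just being slow.
MAX_PAGE_SIZE = 50


def page_count(total_items: int, page_size: int = MAX_PAGE_SIZE) -> int:
    # Ceiling division on integers, so very large counts stay exact
    # instead of going through floating point.
    return -(-total_items // page_size)
```

If you deleted the first comment, nobody would notice; if you deleted the second, the next developer might "optimize" the limit and break production.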
For the "best" examples of comments, I point you to a hilarious StackOverflow thread: What is the best comment in source code you have ever encountered?
2d. Example code and test code
No matter how good your docs are, you can't force developers to RTFM. Some developers prefer to learn by example - which is a polite way of saying that they like to copy and paste.
Getting the example code right is critical to the success of a project, as many developers will blindly copy and paste it. Your goal is to make as many clean, idiomatic examples available as possible. You may also want to invest extra time with the first few teams that adopt your project to help them write clean code: their projects may become the models for everyone else, so make sure it's a model that's worth following!
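One way to keep example code honest is to run it as a test, so that whatever gets copied and pasted is at least known to work. The sketch below uses Python's standard `doctest` module; the `slugify` function is a made-up example, not part of any project mentioned here.

```python
import doctest
import re


def slugify(title: str) -> str:
    """Turn a post title into a URL-friendly slug.

    The examples below are both documentation and tests:

    >>> slugify("You are what you document")
    'you-are-what-you-document'
    >>> slugify("  Docs == Product!  ")
    'docs-product'
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# Parse the examples out of the docstring and run them, comparing the real
# output with the output written in the docs.
parser = doctest.DocTestParser()
test = parser.get_doctest(slugify.__doc__, {"slugify": slugify}, "slugify", "<sketch>", 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
```

In a regular module you would more likely call `doctest.testmod()` or run `python -m doctest yourfile.py`; the explicit parser and runner are used here only to keep the sketch self-contained.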
Here are some projects with great example code:
- Twilio HowTos and Example Code
- Twitter bootstrap examples
- Typesafe Activator templates
- async.js
- Firebase examples
Projects with great test code:
- SQLite
- Apache Lucene
- backbone.js
- Chromium
- jQuery
3. Community documentation
We've talked about written documentation and code documentation; the final piece of the puzzle comes from the people involved with the project and the tools they use.
3a. Project management tools
Most teams use bug tracking software (e.g. JIRA, bugzilla, github issues) and/or project management software (e.g. Basecamp, Asana, Trello). These systems contain a lot of information about the project: what you worked on before, what you're working on now, what you'll work on in the future, bugs found, bugs fixed, and so on.
A few examples:
- Play Framework Github Issues
- Mozilla Bugzilla
- Firefox Roadmap Wiki
- Chromium Issues
It's hard to imagine how a TPS report can be useful as documentation, but very often, the discussions over a tricky bug or the requirements gathering before starting a new project contain critical information not available anywhere else. It's not uncommon to come across a bug report or an old wiki page while searching for information about a project, especially if it's an open source project that makes all of this information publicly available.
Discussions from Q&A sites like StackOverflow and mailing lists like Google Groups also come up frequently in search results. Even the best documentation will not be able to answer everything, so cultivating community websites can be a critical part of making software learnable. Over time, these may become some of the most important parts of your project's documentation, as they inherently deal with issues where many developers got stuck.
A few examples:
- Play Framework Google Group
- Android StackOverflow Tag
- Ruby on Rails StackOverflow Tag
This is one area where open source projects shine: being able to instantly find answers by using Google is a huge win. That said, for internal/proprietary projects, I encourage you to set up internal mailing lists, maintain an FAQ, and/or install an internal StackOverflow-style Q&A site within your company.
For popular open source projects, some of the best documentation comes in the form of content contributed by the community. For example, blog posts and talks from end users are a valuable source of information, revealing what's really working and what isn't; they are also great marketing, as they make it clear other people are using the project. Even blog posts that completely trash the project can be useful - think of it as a free design review!
If your project is open source, growing a community around it can have a huge payoff. A small investment in "marketing" your project - via good documentation, custom project pages, giving talks, and setting up meetup groups - can yield huge returns in the form of free labor, cleaner code, and better branding.
There are countless great blog posts and talks, so here are a few unbiased, randomly selected links that you should definitely check out:
- The Ultimate Guide to Getting Started with the Play Framework
- Composable and Streamable Play Apps
- The Play Framework at LinkedIn
- Play Framework: Async I/O with Java and Scala
- Bitcoin by Analogy
Further reading
If you've made it this far, you should now know how, and why, to document your code. I hope you join me in building software that is easier to use and learn.
If you're hungry for more info, I recommend the following resources:
- Writing Great Documentation
- The Most Important Code Isn't Code
- Teach, Don't Tell
- Designing Great API Docs
- No docs == no product
- Pointers to useful, well-written, and otherwise beautiful documentation
- If It Isn't Documented, It Doesn't Exist
- A beginner's guide to writing documentation
- Tips for Writing Good Documentation