Introduction to CouchDB

CouchDB is an open-source document database maintained by the Apache Software Foundation, the same foundation behind open-source projects like Lucene.Net, Tomcat, and the Groovy programming language.

As a developer, you’re probably familiar with the typical SQL-style databases like Microsoft’s SQL Server, PostgreSQL, and MySQL. Document databases like CouchDB are conceptually different from those kinds of databases. Instead of storing data in tables and rows, your data gets stored in a document file. It’s a fancy way of saying your data is just a JSON file saved on disk somewhere; well, it’s a little more complicated than that, but that’s the gist: you pass CouchDB some JSON, it puts it in a document, and returns it when you query for it in the future.

CouchDB’s core strengths are syncing and replication, database-per-user scenarios, in-browser support through PouchDB, and a concept called “eventual consistency”. One of my personal favorite things about CouchDB is its built-in “Fauxton” admin interface (the successor to the older “Futon” interface). If you need to quickly look at some data, just navigate to the admin interface in your browser and get to work; there’s no need to install SQL management tools or load up Azure Data Studio, because your browser is all you need. Fauxton can be used to view or edit documents, create new documents, run a quick query with “Mango” (CouchDB’s MongoDB-inspired query language), and even edit the database’s views, something we’ll talk more about later.

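Installation used to be a huge pain (for me, at least, on Windows), but in the age of Docker and containers, it’s as easy as docker pull couchdb && docker run -it -p 5984:5984 couchdb. (Recent CouchDB 3.x images won’t start without admin credentials, so there you’ll also need to pass something like -e COUCHDB_USER=admin -e COUCHDB_PASSWORD=password to docker run.) Go ahead, try it on your machine right now! Once it’s done, just open up http://localhost:5984/_utils in your browser and you can start playing with CouchDB!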

The CouchDB "Fauxton" interface

The document and eventual consistency

Documents are a fairly easy thing to reason about in CouchDB. If you’re familiar with SQL, then you can think of them as the equivalent of a row in a table, except they have no schema. As long as your data can be converted to valid JSON, it’s a valid document. This is an important distinction, because it means you can store nested data inside the same document. Since there’s no concept of rows or tables, you don’t create a bunch of tables and link rows together with foreign keys like you would in SQL; instead you either store nested data inside the document, or you create “views” to map and reduce multiple documents into a clear set of data. Again, more on views later!

Every document in CouchDB has two required fields:

  1. The _id field, which must be unique but can also be literally any string you want it to be, whether it’s a GUID, a name, a street address, and so on. When you look up a document in CouchDB, you’re looking it up by the _id.
  2. The _rev field, which stands for revision. This field is very important, but it’s auto-generated by the database whenever you create, update and even delete a document. If you’re updating or deleting a document in CouchDB, you must include the revision or the update will be rejected.

The revision is the secret behind CouchDB’s “eventual consistency”. Eventual consistency means that CouchDB does not guarantee that everyone who uses the database will immediately receive the latest version of a document; instead, CouchDB focuses on being highly available and partition-tolerant.

Sending the revision with every update sounds a little tedious, and it’s certainly more “work” than updating a row in SQL, but the revision field is what allows CouchDB to be “eventually consistent”. This is particularly powerful in scenarios where users may be offline (e.g. in a Progressive Web App), or where multiple CouchDB databases or even instances are synchronizing with each other.
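To make the revision rule concrete, here is a minimal in-memory sketch of the check described above. It is a toy model, not CouchDB’s implementation: the ToyDatabase class and its counter-style revisions are made up for illustration (real revisions are generated by the server and look more like "2-<hash>"), and a real CouchDB would answer the stale update with an HTTP 409 conflict response.

```typescript
// A toy model of CouchDB's revision check (hypothetical, for illustration only).
interface StoredDoc {
    _id: string;
    _rev: string;
    [field: string]: unknown;
}

type PutResult =
    | { ok: true; rev: string }
    | { ok: false; error: "conflict" };

class ToyDatabase {
    private docs = new Map<string, StoredDoc>();

    put(doc: { _id: string; _rev?: string; [field: string]: unknown }): PutResult {
        const current = this.docs.get(doc._id);

        // Updating an existing document requires the revision we last saw.
        if (current && current._rev !== doc._rev) {
            return { ok: false, error: "conflict" };
        }

        // Assumption: revisions are a simple counter here; CouchDB generates its own.
        const rev = String(current ? parseInt(current._rev, 10) + 1 : 1);
        this.docs.set(doc._id, { ...doc, _rev: rev });
        return { ok: true, rev };
    }
}

const db = new ToyDatabase();

// Creating a document: no revision needed, nested data is fine.
const created = db.put({ _id: "order-1", items: [{ sku: "abc", quantity: 2 }] });

// Updating without the current revision is rejected.
const stale = db.put({ _id: "order-1", items: [] });

// Updating with the current revision succeeds and produces a new revision.
const updated = db.put({ _id: "order-1", _rev: "1", items: [] });

console.log(created, stale, updated);
```

The same check is what lets two replicas notice that they changed the same document independently: each side holds a revision the other has never seen.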

Database-per-user

  • Each CouchDB instance can have tens of thousands of databases. That sounds confusing, but these are not the same thing as a SQL database. It might be better to think of a CouchDB database as a SQL table (except that there’s no schema).
  • Each database can have its own access rules too, so it can be restricted to a single user’s credentials. This lets you do things like create a database for every user and protect it with a password that’s unique to that user. Each user owns all of the data in their database, and they can’t access the databases owned by other users.
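One practical detail of database-per-user is picking a database name for each user. The sketch below assumes the naming scheme used by CouchDB’s optional couch_peruser feature ("userdb-" followed by the hex-encoded username); if you create the databases yourself, any valid database name works. The userDatabaseName helper is a hypothetical name, not part of any library.

```typescript
// Derive a per-user database name: "userdb-" + the username's UTF-8 bytes in hex.
// Assumption: this mirrors the couch_peruser convention; adjust it if you name
// your databases differently.
function userDatabaseName(username: string): string {
    // encodeURIComponent turns every non-ASCII or reserved character into
    // %XX escapes of its UTF-8 bytes and leaves a few ASCII characters as-is.
    const hex = encodeURIComponent(username).replace(
        /%([0-9A-F]{2})|([^%])/g,
        (_match: string, escapedByte?: string, plainChar?: string) =>
            escapedByte !== undefined
                ? escapedByte.toLowerCase()
                : (plainChar as string).charCodeAt(0).toString(16)
    );

    return `userdb-${hex}`;
}

console.log(userDatabaseName("bob")); // userdb-626f62
```

Hex-encoding keeps the name valid no matter what characters the username contains, since CouchDB database names only allow a limited character set.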

Synchronizing and replicating databases

  • CouchDB has a robust replication tool that lets you synchronize changes from one database to another, even across CouchDB instances.
  • [Screenshot of replication page]
  • You can replicate from a local database to another local database, or from a remote database to a local database, local database to remote, or even remote to remote.
  • If you’re using PouchDB to “emulate” CouchDB databases in the browser, you can sync those changes to your CouchDB instance, and get document conflict warnings if there are documents that can’t be merged (e.g. because a document was changed both on the server and in the browser without syncing first).
    • This enables some pretty powerful offline scenarios, such as when you’re building a PWA that might not always have internet connectivity. The user can continue using the app even when they have no internet connection, and any documents that were created, updated or deleted will get replicated back to the CouchDB database as soon as a connection is available.
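Replication can also be started programmatically by POSTing a small JSON body to the server’s /_replicate endpoint. The sketch below only builds that request so it can be inspected or handed to any HTTP client; the buildReplicationRequest helper and the example hostnames are hypothetical.

```typescript
// Describes the JSON body accepted by CouchDB's /_replicate endpoint.
interface ReplicationBody {
    source: string;          // database name or full URL to replicate from
    target: string;          // database name or full URL to replicate to
    continuous?: boolean;    // keep replicating as new changes arrive
    create_target?: boolean; // create the target database if it is missing
}

interface ReplicationRequest {
    method: "POST";
    path: "/_replicate";
    headers: { "Content-Type": "application/json" };
    body: string;
}

// Hypothetical helper: builds the request without sending it.
function buildReplicationRequest(
    source: string,
    target: string,
    options: { continuous?: boolean; createTarget?: boolean } = {}
): ReplicationRequest {
    const body: ReplicationBody = { source, target };

    if (options.continuous) body.continuous = true;
    if (options.createTarget) body.create_target = true;

    return {
        method: "POST",
        path: "/_replicate",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body),
    };
}

// Local to remote, kept running in the background. The hostname is made up.
const request = buildReplicationRequest(
    "orders",
    "https://couch.example.com/orders",
    { continuous: true }
);

console.log(request.body);
```

Swapping the source and target arguments gives you remote to local, and two full URLs give you remote to remote.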

Document views

  • By now you might be wondering how you can actually work with the data in a CouchDB database. Since there is no SQL involved, how are you supposed to group things together, or only pull out the data you need? After all, you don’t get to choose which fields are returned when you get a document; you either get it all or you get nothing.
    • The answer is document views. Views are “precomputed”: CouchDB builds a view’s index the first time the view is queried, and afterwards it only reprocesses documents that have changed since. The view results are sitting around in the database waiting for you to request them; they aren’t calculated from scratch on every request like a SQL query is.
    • CouchDB 2.0 introduced “find queries” (Mango), which give you a sort-of query language (very similar to MongoDB’s), but unless you create an index for them, these are run against all documents in the database whenever the query is made. If you’re making the same query often, you should put it in a view instead and get a big performance boost.
    • Views have two functions:
      1. A map function, which receives a doc and can then map that data to any valid JSON value to be “emitted” (or not). The map function is just a regular JavaScript function, although it has no access to the DOM or Node APIs.
      2. A reduce function, which is optional. The map function will always be called first, and then the reduce function will be called on the array of mapped values to further transform them.
        • I find reduce functions can sometimes be incredibly confusing: they’re prone to throw errors (or appear to get stuck) if the reduce function causes the result to grow instead of shrink.
        • CouchDB views have several built-in reduce functions, including _count and _sum. The _count function counts the number of rows in the result. The _sum function sums up the mapped values (assuming they’re numbers).
    • You can combine these views with certain parameters to return specific results. For example, if your database contains order information, using the _count reduce function on its own isn’t super useful, as listing the docs in a database will already tell you the total number of docs; but if you combine the view with the start_key={some timestamp} parameter, CouchDB will only return mapped results with a timestamp key greater than or equal to that timestamp, giving you a count of all orders placed from that timestamp onward.
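The map, filter and reduce steps above can be simulated in a few lines of plain TypeScript. This is only a sketch of the idea, not how CouchDB runs views: in a real map function emit is a global rather than an argument, real reduce functions receive (keys, values, rereduce), and the OrderDoc fields and helper names here are made up for illustration.

```typescript
interface OrderDoc {
    _id: string;
    timestamp: number;
    total: number;
}

interface ViewRow {
    id: string;
    key: number;
    value: number;
}

type Emit = (key: number, value: number) => void;
type MapFn = (doc: OrderDoc, emit: Emit) => void;
type ReduceFn = (values: number[]) => number;

// Run the map function over every doc and keep the emitted rows sorted by key,
// which is the order CouchDB returns view rows in.
function buildView(docs: OrderDoc[], map: MapFn): ViewRow[] {
    const rows: ViewRow[] = [];
    for (const doc of docs) {
        map(doc, (key, value) => {
            rows.push({ id: doc._id, key, value });
        });
    }
    return rows.sort((a, b) => a.key - b.key);
}

// Equivalent of the start_key parameter: keep rows whose key is >= startKey.
function rowsFrom(rows: ViewRow[], startKey: number): ViewRow[] {
    return rows.filter(row => row.key >= startKey);
}

// Simplified stand-ins for the built-in _count and _sum reducers.
const count: ReduceFn = values => values.length;
const sum: ReduceFn = values => values.reduce((total, value) => total + value, 0);

function reduceRows(rows: ViewRow[], reduce: ReduceFn): number {
    return reduce(rows.map(row => row.value));
}

const orders: OrderDoc[] = [
    { _id: "o3", timestamp: 300, total: 30 },
    { _id: "o1", timestamp: 100, total: 10 },
    { _id: "o2", timestamp: 200, total: 20 },
];

const byTimestamp = buildView(orders, (doc, emit) => emit(doc.timestamp, doc.total));
const recent = rowsFrom(byTimestamp, 200);

console.log(reduceRows(recent, count)); // 2 orders placed at or after timestamp 200
console.log(reduceRows(recent, sum));   // 50, the combined total of those orders
```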

Using views, an example

  • We’re going to use a simple scenario as an example, which will be a CouchDB database that stores orders placed on both mobile devices and on the web. We’ll create two different views: one that will list orders placed before or after certain timestamps, and one that will list orders based on whether they were placed on the web or on mobile.
    • You could also put both of these in one single view by making smart use of the key you emit in the map function. Since the key can be any JSON value, you can emit an array containing two different keys: the timestamp, and the source. Then you’d use the start_key and end_key parameters to filter the results.
      • This is where views can start to get a little confusing in my opinion, because the way the start_key and end_key work as an array isn’t always obvious. Instead, we’ll use two simple views for this example and dive into more complex views and lookups in a dedicated article.
    • In our scenario, reduce functions aren’t actually needed to get the functionality we’re after; the map functions emit all of the data we need and it can be easily filtered using the key parameters. Instead we’ll use a reduce function on the list_by_source view to tally the orders for both sources and return that information.
      • Without that reduce function, you’d either need to list all of the documents and count them in your program (not recommended, there could be tens of thousands of orders), or you’d add a prebuilt _count reduce function and make two calls to get the counts for web and mobile.
  • Example of creating the views using the CouchDB UI
  • Example of querying the views using the CouchDB UI
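For reference, here is roughly what the two views could look like as a design document, plus a small helper for building view query URLs. Several things here are assumptions rather than the setup from the walkthrough: the design document name (orders), the list_by_timestamp view name, the source field on each order, and the use of the built-in _count reducer together with group=true to tally each source in one call instead of a hand-written reduce function.

```typescript
// Map functions are stored as strings inside the design document.
const ordersDesignDoc = {
    _id: "_design/orders",
    views: {
        list_by_timestamp: {
            map: "function (doc) { if (doc.timestamp) { emit(doc.timestamp, null); } }",
        },
        list_by_source: {
            // Assumes each order has a source field of "web" or "mobile".
            map: "function (doc) { if (doc.source) { emit(doc.source, 1); } }",
            reduce: "_count",
        },
    },
};

// Hypothetical helper: builds the path for querying a view. CouchDB expects
// key parameters to be JSON-encoded, so every value goes through JSON.stringify.
function viewPath(
    database: string,
    designDoc: string,
    view: string,
    params: Record<string, unknown> = {}
): string {
    const query = Object.keys(params)
        .map(name => `${name}=${encodeURIComponent(JSON.stringify(params[name]))}`)
        .join("&");
    const path = `/${database}/_design/${designDoc}/_view/${view}`;

    return query ? `${path}?${query}` : path;
}

// Orders placed at or after a timestamp.
console.log(viewPath("orders", "orders", "list_by_timestamp", { start_key: 1500000000000 }));

// One row per source with its order count.
console.log(viewPath("orders", "orders", "list_by_source", { group: true }));
```

With group=true the reduce step runs once per distinct key, so a single request returns one count for web and one for mobile.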

CouchDB’s HTTP API

  • While CouchDB has a fancy UI for working with the data inside of it, and that’s really nice when you’re debugging an issue or running a quick manual query using Mango, that isn’t really useful for your app or program. Here’s where we come to CouchDB’s best feature (in my opinion): CouchDB’s HTTP API.
    • Everything you’ve used up to this point in the admin UI is just a wrapper around the HTTP API. You can do everything you just did with curl on the command line, fetch from the browser, System.Net.Http.HttpClient in .NET, and the http package in Node. There’s no special wire protocol, driver or query language like you have in a SQL database. Anything that can make a web request can use the CouchDB API!
  • Here are some things you can do with the HTTP API:
    • Create, get, update, delete documents
    • Create, get, update, delete views
    • Create, get, update, delete entire databases
  • In my opinion, this makes CouchDB very, very accessible. Just do a quick docker pull couchdb && docker run -p 5984:5984 couchdb and start making HTTP calls to the database.
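As a rough illustration of how little is involved, the sketch below maps the basic document and database operations onto plain HTTP requests. The helper names are hypothetical and the functions only describe each request instead of sending it, so you can pass the result to fetch, curl or any other HTTP client. Document IDs are URL-encoded here, which suits ordinary documents but not design documents (their IDs contain a slash that must stay as-is).

```typescript
interface CouchRequest {
    method: "GET" | "PUT" | "DELETE";
    path: string;
    body?: string;
}

type NewOrExistingDoc = { _id: string; _rev?: string; [field: string]: unknown };

// PUT /{db} creates a database; DELETE /{db} removes it.
const createDatabase = (db: string): CouchRequest => ({ method: "PUT", path: `/${db}` });
const deleteDatabase = (db: string): CouchRequest => ({ method: "DELETE", path: `/${db}` });

// GET /{db}/{id} fetches a document by its _id.
const getDoc = (db: string, id: string): CouchRequest => ({
    method: "GET",
    path: `/${db}/${encodeURIComponent(id)}`,
});

// PUT /{db}/{id} creates the document, or updates it when _rev is included.
const saveDoc = (db: string, doc: NewOrExistingDoc): CouchRequest => ({
    method: "PUT",
    path: `/${db}/${encodeURIComponent(doc._id)}`,
    body: JSON.stringify(doc),
});

// DELETE /{db}/{id}?rev={rev} deletes a document; the current revision is required.
const deleteDoc = (db: string, id: string, rev: string): CouchRequest => ({
    method: "DELETE",
    path: `/${db}/${encodeURIComponent(id)}?rev=${encodeURIComponent(rev)}`,
});

// Sending one of these with fetch would look something like:
//   const r = saveDoc("orders", { _id: "order-1", total: 5 });
//   await fetch("http://localhost:5984" + r.path, {
//       method: r.method,
//       headers: { "Content-Type": "application/json" },
//       body: r.body,
//   });
console.log(saveDoc("orders", { _id: "order-1", total: 5 }));
```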

Using Davenport

  • Since Davenport is written in TypeScript, you can store multiple document types in the same database, then discriminate on a shared field (such as type), and you’ll get intellisense and compiler protection when working with the documents.
  • Davenport abstracts away much of the HTTP API into simple functions you can use to work with your database.
    • Although it currently only supports Node, I have plans to add browser support using fetch.
  • Davenport provides a “client” which you can pass a generic type to, giving you intellisense and compiler protection on all of the documents in your database. A long while ago this meant your database was constrained to one single type, which works perfectly fine but precludes your views from doing things like combining a user document with all of the orders placed by that user. These days, though, you can model all of your document types in one single database thanks to TypeScript’s union types, string literal types and discriminated unions.

Your CouchDB client goes from this:

import * as Davenport from "davenport";

interface Order {
    _id: string
    _rev: string 
    timestamp: number
    user_id: string
}

const client = new Davenport<Order>(...);
const doc = await client.get("some id");
// doc type is Order, and all docs listed or retrieved by this client will be Order too

To this:

import * as Davenport from "davenport";

interface Order {
    _id: string
    _rev: string
    type: "order"
    timestamp: number
    user_id: string
}

interface User {
    _id: string
    _rev: string
    type: "user"
    name: string
}

type Doc = Order | User;

const client = new Davenport<Doc>(...);
const doc = await client.get("some id");
// doc type is Doc. TypeScript knows it has _id, _rev, and type props.
// Check the doc's type prop to determine what it is
if (doc.type === "order") {
    // TypeScript knows this doc is an Order, and now we can use the doc.timestamp and doc.user_id props
} else {
    // TypeScript knows this doc is a User, and now we can use the doc.name prop
}
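The narrowing shown above works with any discriminated union, with or without Davenport. As a small self-contained sketch, the hypothetical describeDoc function below uses a switch with a never check, so the compiler will complain if a new document type is added to Doc but not handled.

```typescript
interface Order {
    _id: string;
    _rev: string;
    type: "order";
    timestamp: number;
    user_id: string;
}

interface User {
    _id: string;
    _rev: string;
    type: "user";
    name: string;
}

type Doc = Order | User;

// Hypothetical helper: produces a short description of any document in the database.
function describeDoc(doc: Doc): string {
    switch (doc.type) {
        case "order":
            // Narrowed to Order: timestamp and user_id are available.
            return `Order placed by ${doc.user_id} at ${doc.timestamp}`;
        case "user":
            // Narrowed to User: name is available.
            return `User named ${doc.name}`;
        default: {
            // If a new member is added to Doc and not handled above,
            // this assignment stops compiling.
            const unhandled: never = doc;
            throw new Error(`Unknown document: ${JSON.stringify(unhandled)}`);
        }
    }
}

console.log(describeDoc({ _id: "u1", _rev: "1-abc", type: "user", name: "Ada" }));
```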
