Building Browser Extensions At Scale

Over ten million people use the Grammarly Chrome extension, and our Firefox, Safari, and Edge extensions are popular, too. The extensions may look simple from the outside because they are low-profile and easy to use, but together they are a complex product supported by a full team of engineers. We have been developing and refining it for six years. Along the way we have learned a thing or two that we’d like to share. This article is an overview of our learnings and best practices on a broad range of topics. Feel free to skip the topics that aren’t relevant to you!

As a heads up, we are planning to host a meetup dedicated to browser extension development, where we can gather feedback and share knowledge. It will be in Grammarly’s San Francisco office. If you are interested in coming, register here and we will notify you close to the event date.

Our extension highlights mistakes in editable fields on all sites that we can technically support. We inject UI onto the page, have processes running on the background page, trigger popups, and integrate the extension with our Editor app. We also conduct experiments within the extension, track business metrics, and monitor technical performance.

Part 1. Development infrastructure

Extension builder

Chrome, Firefox, and Edge all support the WebExtensions standard, which makes development for those browsers relatively easy. Safari is the only holdout: Apple promotes its own kind of extension, called Safari App Extensions, so development for Safari requires special support.

However, even the browsers that adopted the WebExtensions standard differ in their APIs. Chrome has the chrome.* API with callbacks. Firefox has the browser.* API with Promises, plus chrome.* (which will be deprecated in the future). And Edge has the browser.* API with callbacks (!). In practice, that gives us three different implementations of one standard!
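One way to smooth over the callback/Promise split is a small adapter that turns a callback-style method into a Promise-returning one. The following is a minimal sketch under stated assumptions, not a description of any particular wrapper: `promisify`, `CallbackApi`, and `fakeQueryTabs` are hypothetical names, and the injected `getLastError` stands in for reading chrome.runtime.lastError in a real Chrome build.

```typescript
// Hypothetical shape of a callback-style API method (similar to chrome.tabs.query).
type CallbackApi<A, R> = (arg: A, callback: (result: R) => void) => void

// Wrap a callback-style method so callers can always await a Promise.
// `getLastError` is injected; in Chrome it would read chrome.runtime.lastError.
function promisify<A, R>(
  method: CallbackApi<A, R>,
  getLastError: () => { message?: string } | undefined = () => undefined
): (arg: A) => Promise<R> {
  return (arg: A) =>
    new Promise<R>((resolve, reject) => {
      method(arg, (result: R) => {
        const error = getLastError()
        if (error) {
          reject(new Error(error.message ?? 'Unknown extension API error'))
        } else {
          resolve(result)
        }
      })
    })
}

// Stand-in for a browser API so the sketch runs outside an extension.
const fakeQueryTabs: CallbackApi<{ active: boolean }, string[]> = (query, callback) =>
  callback(query.active ? ['tab-1'] : ['tab-1', 'tab-2'])

const queryTabs = promisify(fakeQueryTabs)
```

With an adapter like this, application code can use `await queryTabs({ active: true })` regardless of which style the underlying browser exposes.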

Directory structure

The most convenient way to build a cross-browser extension is to maximize the code shared between browsers. Below is an annotated example of a file tree for an extension that supports cross-browser builds:

/src - source code
    /bg - background page files
    /cs - content script files
    /popup - popup files
    index.js
/resources - resources and templates required to generate valid extension
    /safari
        update.plist
        Settings.plist
        Info.plist
        icon-32.png
        icon-48.png
    /webextensions
        manifest.json
        ...

/tasks - build tasks (can be a gulp or any other scripts)
    assets.ts - builds JavaScript and CSS from source
    safari.ts - generates Safari specific extension
    webextension.ts - generates a WebExtension (Chrome, Firefox, Edge)
/bin - output directory
    /development - unpackaged extensions for testing needs
        /safari.safariextension - this is Safari's naming convention
        /edge
        /firefox
        /chrome
    /release - packaged extensions
        /safari.safariextension
        /edge
        /firefox
        /chrome

config.ts - here we keep data to generate manifests
package.json
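To illustrate how a file like config.ts could feed the manifest templates, here is a minimal sketch. The fields of `ExtensionConfig`, the `buildManifest` helper, the Manifest V2 layout, and the example file names are all assumptions made for the example rather than the real build code.

```typescript
type BrowserTarget = 'chrome' | 'firefox' | 'edge'

// Hypothetical config shape; a real config.ts will carry more data.
interface ExtensionConfig {
  name: string
  version: string
  permissions: string[]
  firefoxAddonId: string // assumption: only the Firefox build needs an explicit add-on id
}

// Produce the manifest.json content for one WebExtensions target.
function buildManifest(config: ExtensionConfig, target: BrowserTarget): Record<string, unknown> {
  const manifest: Record<string, unknown> = {
    manifest_version: 2,
    name: config.name,
    version: config.version,
    permissions: config.permissions,
    background: { scripts: ['vendor.js', 'background.js'] },
    content_scripts: [{ matches: ['<all_urls>'], js: ['vendor.js', 'content.js'] }],
    browser_action: { default_popup: 'popup.html' },
  }
  if (target === 'firefox') {
    // Firefox-specific key that carries the add-on id.
    manifest['applications'] = { gecko: { id: config.firefoxAddonId } }
  }
  return manifest
}
```

Keeping the shared fields in one place and adding per-browser keys in small branches means a version bump or a new permission is edited once for all targets.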

We first developed the extension in JavaScript, but in the past year we've migrated all of our code to TypeScript (this journey merits its own article, but not here!). In the process, we realized that it's better to split one large TypeScript project into several smaller projects. For example, you don't want typings that are relevant to the extension source code to leak into the builder or the tests, and vice versa. The same is true for Node.js dependencies.

Command line API

This is the default command to produce a development build for Chrome. It also runs a watcher and development web server (discussed later in this post):

npm start

Development build for Safari:

npm run dev:safari

Production build for Chrome:

npm run build:chrome

This article is not intended to tell you how to produce bundles, move files, or fill templates. You have to decide this for yourself, depending on your needs. We at Grammarly use:

  • gulp to compose tasks and move files. Gulp also has plugins that simplify many tasks. However, we’re considering using npm scripts as a simpler solution.
  • browserify to build JavaScript bundles. We’re considering using Webpack in the future.
  • TypeScript - we write code in TypeScript as it allows us to catch typos, simplifies refactoring, and adds autosuggestions.
  • We have these bundles:
    • background.js - scripts that run on the background page
    • popup.js - scripts for popup (UI that opens by default when you click the toolbar button)
    • vendor.js - dependencies
    • content.js - content scripts code injected into the page
  • gulp tasks are called from npm scripts, which gives us a uniform command-line API

Development workflow

Sandbox (components playground)

We develop pieces of our app separately to shorten the feedback loop. We have a sandbox directory where we configure components, serve them via a development web server, and open them in a browser as a regular webpage. In sandbox mode, we mock browser APIs. It’s much easier to develop UI components in the webpage context than in the extension context. We use Chrome as the primary browser for development because it has the best development tools.

Testing

Testing is one of our main challenges, because a cross-browser extension is difficult to test manually. We have many automated tests that protect the project from bugs:

  • Unit tests with browser API mocks, including tests that rely on the page DOM. We run them in different browsers on TestingBot (you can also run them on BrowserStack or a similar provider). We use the Karma runner and Mocha. Karma has pros and cons: it runs all suites on one page, so if some tests change the DOM or global scope, they can interfere with each other in unexpected ways.

  • Functional tests. We use Selenium WebDriver to install our browser extension and simulate user behavior with generated input events. We run these tests on TestingBot as well. Their developer support is responsive, and they are quick to fix issues or add missing features. They implemented a good way to test browser extensions. These tests are hard to write and debug, but they are worth it for us.

  • Browser API tests. We wrote a module that encapsulates all browser APIs under one interface. So, for example, when you want to show a badge, it calls the Safari-specific API in Safari and the Chrome-specific API in Chrome. To test this functionality, we run tests right inside the extension (!), where all the browser APIs we need to test are available. After the tests complete, we pass the report. Here we use Selenium and TestingBot as well.

Branching model & continuous integration

Our goal is to eliminate manual actions as much as possible. We implemented (and are still improving) a set of CI jobs that help us. We use GitHub, where we have a master branch and release branches for all supported browsers (Chrome, Safari, Firefox, Edge).

In master, we always have tested, production-ready code. In release branches, we have the actual production state for each browser.

We create a branch per task and open a pull request (PR) to master upon completion. A PR triggers a CI job (we use TeamCity) that builds the extensions and runs all the tests mentioned above. A PR can be merged only if all checks pass.

To trigger a release, you have to create and merge a PR to one of the release branches (Chrome, for example). It will run the build, then test jobs, and upon success, will run release jobs.

TeamCity

We use TeamCity as a CI platform. Before that, we used Jenkins and decided to migrate to a more robust CI system. TeamCity has good pricing for our scale as we have many agents. Just for the extension project, we have 11 agents (including 1 OS X agent). We execute tasks in Docker containers powered by Rocker (a wrapper on top of Docker that adds nice features like directory mounting). This way, all our builds always run in the same conditions, and they are reproducible.

Publishing

Every store has its own nuances, so this is tricky.

Chrome Store

As with development tools, Chrome is ahead here as well. Google has a webstore API. This webstore API is super easy to use with an npm package called chrome-webstore-manager. It allows uploading extensions to the Chrome Store. If you were to do this manually, it would take over an hour to propagate the extension (update in the store); with the npm package, propagation takes only seconds. Another good thing about the Chrome Store is that it has an automated review process for established extensions, so we can have extremely quick release cycles. We release as often as we want and don’t collect release debt.

For security, we’d recommend securing your Google account with two-factor authentication so no one else can release your extension.

Mozilla add-ons

In contrast to Chrome, passing Mozilla review is hard and time-consuming. Firefox reviewers check extensions manually and have limited resources to do so. When we tried to submit our first version, it took us six months to get to the store. They are trying to make things better though.

Automatic upload can be done with the web-ext tool. When you run its sign command, it uploads the extension to the store. If the code in your extension is not minified, you are fine. Otherwise, you will also need to submit the raw sources, which we had to do, along with a build script that produces the same minified code as you have in the extension.

In our case, it was a significant effort to implement a sources bundle, because we had a big project with private npm packages and didn’t want to share credentials. We created a special build step that generates a bundle for reviewers, with all needed dependencies already downloaded. We also automated the upload of this bundle (using a CasperJS script).

We learned that if you can get in touch with a reviewer through IRC and answer questions from the reviewer, it can significantly speed up the review process.

The Mozilla development account doesn’t have two-factor authentication, so your account is not as secure as Chrome’s.

Publish to the Safari store

Apple also has a manual review process, and release automation for the Safari store is hard. To pass review, you have to sign your extension. Before 2016, we used the xar utility (similar to this approach) to sign extensions. However, after Apple changed the store, extensions signed with xar no longer pass review. To sign an extension, you now have to use the Extension Builder UI from Safari (or the new Xcode-powered way to develop extensions). We found a way to automate that with AppleScript and dedicated an OS X TeamCity agent (a physical Mac mini) just to signing the Safari extension.

Recently we found xar-js, which creates a correct xar archive that works in Safari. The question is whether it will pass review. Its XML manifest (read more on xar archive) is a bit different from the manifest generated with Extension Builder.

We upload our extension via a POST request to the URL from the upload form. The only thing you need is the auth token from the myacinfo cookie, which is set after you log in to the Apple developer account. We created this procedure:

  1. Log in manually to the store (which does allow for two-factor authentication). Copy the myacinfo cookie value with Web Inspector (this part can be automated with a console utility that uses phantomjs to log in to the store and fill in forms).
  2. Set the CI parameter to the cookie value and run the build.

Regarding security: If your team operates in a single physical location, it makes sense to have a physical phone device in the office to get two-factor auth codes.

Edge

In the majority of cases, your Chrome extension should work as is in Edge. See Edge extensions documentation.

Deployment, however, is manual:

Submitting a Microsoft Edge extension to the Windows Store is currently a restricted capability. Reach out to us with your requests to be a part of the Windows Store, and we’ll consider you for a future update.

Versioning

The Chrome and Firefox stores will not accept the same version twice. So if you need to revert a release, you will have to upload the old code under a bumped version number.
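A revert job therefore needs to compute a fresh version number before re-uploading older code. Here is a minimal sketch; `bumpPatch` is a hypothetical helper, and the assumed version format is one to four dot-separated integers.

```typescript
// Increment the last component of a dotted version string, e.g. for a revert release.
function bumpPatch(version: string): string {
  // Assumption: versions are one to four dot-separated non-negative integers.
  if (!/^\d+(\.\d+){0,3}$/.test(version)) {
    throw new Error(`Unsupported version format: ${version}`)
  }
  const parts = version.split('.').map(Number)
  const head = parts.slice(0, -1)
  const last = parts[parts.length - 1] ?? 0
  return [...head, last + 1].join('.')
}
```

For example, reverting release 14.800.1 to the code of an earlier release would be uploaded as `bumpPatch('14.800.1')`, which yields 14.800.2.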

Monitoring

It goes without saying that we want to make sure no extension release breaks anything. We created a set of essential metrics that tell us about extension health and display them on a Grafana dashboard on top of Elasticsearch. We monitor:

  • browser segmentation, including which browser version
  • spikes in errors
  • UI (cards) usage. If we see, for example, that cards with suggested corrections are opened and never closed, it means that we have an error and should revert

Tracking

It’s easy to confuse monitoring with tracking. Monitoring is about application health. Tracking is about business metrics. The primary extension metric for us is Daily Active Users: the number of distinct users who use the extension on a given day. We count them with a daily ping:

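async function dailyPing() {
    // Timestamp (ms) of the next allowed ping, stored by the previous run
    const pingDate = await prefs.get('pingDate')
    // Skip if the next ping time is still in the future
    if (pingDate && Number(pingDate) > Date.now())
        return
    logger.dailyPing()
    await prefs.set('pingDate', getNextPingDate())
}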

// get random local time between 3-4 AM
function getNextPingDate() {
  const now = new Date()
  if (now.getHours() > 2) {
    now.setDate(now.getDate() + 1)
  }
  now.setHours(3)
  now.setMinutes(Math.floor(Math.random() * 60))
  return now.getTime()
}

Logger is our entity that sends events via HTTP (using window.fetch) to the server. We use our in-house Apache Spark based solution for business metrics. Before that, Mixpanel worked pretty well for us, though we grew out of it (and it can be expensive when you send many events).

Collecting feedback about our extension

Feedback bot

We created a little script that downloads user reviews from the Chrome and Firefox stores. It posts feedback to a Slack channel, so we can easily see what our customers are saying.

Uninstall page

Another, even more important, feedback channel is our uninstall page survey. It’s implemented with the runtime.setUninstallURL API.
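As a rough illustration, the survey URL can carry context such as the extension version so that answers can be segmented. In this sketch the survey address, the query parameters, and the `RuntimeLike` interface (a stand-in for the browser's runtime object) are assumptions for the example.

```typescript
// Minimal subset of the browser runtime API used here.
interface RuntimeLike {
  setUninstallURL(url: string): void
}

// Build the survey URL with context that helps interpret the feedback.
function buildUninstallUrl(base: string, params: Record<string, string>): string {
  const target = new URL(base)
  Object.keys(params).forEach(key => target.searchParams.set(key, params[key] ?? ''))
  return target.toString()
}

// Register the survey page; the base URL is a placeholder.
function registerUninstallSurvey(runtime: RuntimeLike, version: string, browserName: string): string {
  const surveyUrl = buildUninstallUrl('https://example.com/uninstall', { version, browser: browserName })
  runtime.setUninstallURL(surveyUrl)
  return surveyUrl
}
```

In a real extension this would be called once from the background page, passing the browser's runtime object.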

Part 2. Application specifics and troubleshooting

This part is about our app-specific learnings.

How to send HTTP-requests

While it’s technically possible to send HTTP (Ajax) requests from content scripts, it’s not reliable: the content security policy of the page where you inject content scripts may apply and block the request. So instead we send a message to the background page, which then sends the request to the server.

It’s highly advisable to filter the request origin on the backend, as it makes your API more secure. In Chrome, by default, the extension gets a different ID, and therefore a different origin, on each development machine. To keep the ID static (so that development builds work with the API), put a key property in the manifest of the development build.
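The background-page side of this pattern can be sketched as a handler that validates the message and performs the request. The message shape, `createRequestHandler`, and the injected `FetchLike` function (a stand-in for window.fetch) are assumptions for illustration.

```typescript
// Hypothetical message a content script sends to the background page.
interface ApiRequestMessage {
  type: string
  path: string
  body?: unknown
}

interface ApiResult {
  ok: boolean
  data?: unknown
  error?: string
}

// Minimal subset of fetch, injected so the sketch runs without a browser.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

function createRequestHandler(apiBase: string, fetchImpl: FetchLike) {
  return async (message: ApiRequestMessage): Promise<ApiResult> => {
    // Only forward well-formed requests to our own API.
    if (message.type !== 'api-request' || !message.path.startsWith('/')) {
      return { ok: false, error: 'Rejected message' }
    }
    const response = await fetchImpl(apiBase + message.path, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(message.body ?? null),
    })
    if (!response.ok) {
      return { ok: false, error: `HTTP ${response.status}` }
    }
    return { ok: true, data: await response.json() }
  }
}
```

In Chrome, such a handler would be wired to chrome.runtime.onMessage, with the listener returning true so that the response can be sent asynchronously.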

Authentication

The easiest way to implement authentication is with server-side cookies. A user goes to the registration/log-in page and gets a cookie associated with your domain.

Then when you send a request from the background page to the authentication API (on your domain), it will see your cookie and return a valid user object. Any other API on your domain will see the auth cookie as well and can identify the user.

If the cookie is removed, you will need to update the state of your extension. In WebExtensions, you can subscribe to the cookies.onChanged event to invalidate the user. Safari is missing this API, so you will need to make periodic requests to the server to get the current user state.
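A minimal sketch of the WebExtensions side follows. `CookiesApiLike` mirrors only the part of the cookies API used here, and the cookie name, domain, and `watchAuthCookie` helper are hypothetical. The sketch also assumes that a cookie update is reported as a removal with the cause "overwrite" followed by a new cookie, so that case is not treated as a logout.

```typescript
// Subset of the change info delivered to cookies.onChanged listeners.
interface CookieChangeInfo {
  removed: boolean
  cause: string
  cookie: { name: string; domain: string }
}

interface CookiesApiLike {
  onChanged: { addListener(listener: (change: CookieChangeInfo) => void): void }
}

// True when the cookie belongs to `domain` or one of its subdomains.
function matchesDomain(cookieDomain: string, domain: string): boolean {
  const normalized = cookieDomain.replace(/^\./, '')
  return normalized === domain || normalized.endsWith('.' + domain)
}

function watchAuthCookie(
  cookies: CookiesApiLike,
  cookieName: string,
  domain: string,
  onLoggedOut: () => void
): void {
  cookies.onChanged.addListener(change => {
    if (change.cookie.name !== cookieName) return
    if (!matchesDomain(change.cookie.domain, domain)) return
    // An overwrite is a removal followed by a new value, not a logout.
    if (change.removed && change.cause !== 'overwrite') onLoggedOut()
  })
}
```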

Storing User Preferences

We use browser.storage.local in Chrome, Firefox, and Edge. In Safari, we use safari.extension.settings. We don’t call the preferences API from content scripts. We only call them on the background page, which we treat like a server to which we send messages from content scripts. This is just an architectural choice.

Experiments

When we do a major UI change, we want to make sure that our main business metrics don’t suffer. So, on rollout, we initially show a new feature to only a small percentage of users. We assign a test group to users through our in-house experiment framework. And then in the code, we have something like (pseudocode):

if (experiments.showExperiment) {
    showNewInterface()
} else {
    showOldInterface()
}

How we implemented our backend for assigning groups for tracking business metrics is another big and separate topic, not covered here.
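Although the assignment backend is out of scope, the client-side idea of a stable percentage rollout can be illustrated. The sketch below is not the in-house framework mentioned above; it shows one common technique (deterministic hashing of a user id into 100 buckets), and all names in it are hypothetical.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units; good enough for bucketing.
function hashString(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Map a user to a stable bucket in [0, 100) for a given experiment.
function bucketFor(userId: string, experiment: string): number {
  return hashString(`${experiment}:${userId}`) % 100
}

// A user is in the experiment if their bucket falls below the rollout percentage.
function isInExperiment(userId: string, experiment: string, rolloutPercent: number): boolean {
  return bucketFor(userId, experiment) < rolloutPercent
}
```

Because the bucket depends only on the user id and the experiment name, the same user sees the same variant on every page load, and raising the percentage only adds users to the experiment group.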

Browser API wrapper

Our application works in multiple browsers, so we created a wrapper module around browser APIs that encapsulates browser differences. In the extension application logic we write extensionApi.preferences.set, which translates to browser.storage.local.set in Chrome or safari.extension.settings in Safari. For every browser, we create a bundle with a particular extension API implementation. Our extension API wrapper is organized as a separate library. Here is a high-level TypeScript interface:

export interface ExtensionApi {
  readonly tabs: TabsApi
  readonly cookies: CookiesApi
  readonly button: ButtonApi
  readonly notification: NotificationApi
  readonly message: MessageApi
  readonly preferences: PreferencesApi
  readonly management: ManagementApi
}
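To make the idea concrete, here is a sketch of what two implementations of one member, `PreferencesApi`, could look like. The method signatures of `PreferencesApi` are an assumption (the interface above only names it), and `ChromeStorageAreaLike` and `SafariSettingsLike` model only the parts of the storage area and of safari.extension.settings that the sketch relies on.

```typescript
// Assumed shape of the preferences member of the wrapper.
interface PreferencesApi {
  get(key: string): Promise<string | undefined>
  set(key: string, value: string): Promise<void>
}

// Callback-style subset of a WebExtensions storage area.
interface ChromeStorageAreaLike {
  get(key: string, callback: (items: Record<string, string>) => void): void
  set(items: Record<string, string>, callback: () => void): void
}

// Synchronous subset of the legacy Safari settings object (assumed).
interface SafariSettingsLike {
  getItem(key: string): string | undefined
  setItem(key: string, value: string): void
}

function createChromePreferences(storage: ChromeStorageAreaLike): PreferencesApi {
  return {
    get: key =>
      new Promise<string | undefined>(resolve => storage.get(key, items => resolve(items[key]))),
    set: (key, value) =>
      new Promise<void>(resolve => storage.set({ [key]: value }, () => resolve())),
  }
}

function createSafariPreferences(settings: SafariSettingsLike): PreferencesApi {
  return {
    get: key => Promise.resolve(settings.getItem(key)),
    set: (key, value) => {
      settings.setItem(key, value)
      return Promise.resolve()
    },
  }
}
```

Application code depends only on `PreferencesApi`, and each browser bundle picks the matching factory at build time.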

Permissions

You have to think ahead of time about what permissions you want to ask users for in the extension. When users first install an extension, they easily grant permissions. But later, it’s harder to ask for greater permissions. When you add a new permission to the manifest, Chrome silently disables your extension with little warning. If you have millions of users, this could be disastrous for your business.

We have a CI check that prevents a manifest file from accidentally changing permissions.
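Such a check can be as simple as comparing the manifest against a reviewed allowlist. This is a minimal sketch, assuming the approved list lives next to the build scripts; `findPermissionChanges` is a hypothetical helper.

```typescript
interface PermissionDiff {
  added: string[]
  removed: string[]
}

// Compare the permissions in a manifest with the approved list.
function findPermissionChanges(
  approved: string[],
  manifest: { permissions?: string[] }
): PermissionDiff {
  const current = manifest.permissions ?? []
  const approvedSet = new Set(approved)
  const currentSet = new Set(current)
  return {
    added: current.filter(permission => !approvedSet.has(permission)),
    removed: approved.filter(permission => !currentSet.has(permission)),
  }
}
```

A CI step could fail the build whenever `added` is non-empty, forcing a deliberate update of the approved list in the same pull request.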

However, sometimes a new permission is critical for your product. In that case, you can use optional_permissions (at least in Chrome). You may also need to show a custom-designed dialog that explains to the user why you are going to ask for additional permission. Otherwise, you might get uninstalls and negative reviews.
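For permissions declared under optional_permissions, the runtime request can be wrapped in a Promise. The sketch below assumes a callback-style permissions API like Chrome's; `PermissionsApiLike` models only the request method, and `requestOptionalPermissions` is a hypothetical helper.

```typescript
interface PermissionSet {
  permissions?: string[]
  origins?: string[]
}

// Subset of the browser permissions API used here.
interface PermissionsApiLike {
  request(wanted: PermissionSet, callback: (granted: boolean) => void): void
}

// Ask for optional permissions and resolve with the user's decision.
// Note: browsers expect this call to happen inside a user gesture, for example
// in the click handler of the "Continue" button of your explanatory dialog.
function requestOptionalPermissions(
  api: PermissionsApiLike,
  wanted: PermissionSet
): Promise<boolean> {
  return new Promise<boolean>(resolve => {
    api.request(wanted, granted => resolve(granted))
  })
}
```

If the promise resolves to false, the feature that needs the permission should stay disabled rather than asking again right away.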

Preventing broken UI

When you add your DOM elements to the page, the page styles will be applied, and you may break your UI. Here are some tips on how to protect your components from page styles.

CSS modules. Make sure your elements have unique CSS class names that do not conflict with the page’s classes (see CSS Modules).

Custom tags. Instead of span or div tags, use unique custom tags like <extensionTag>, so they will not match the page’s CSS selectors.

Use !important in CSS whenever possible if your UI is exposed to the page. (It’s best to automate this step from the beginning.) Many sites still have greedy CSS resets with a * {} selector.

Put your UI outside the body tag with document.documentElement.appendChild when possible. This way your UI will not match a body * selector (it will sit directly inside the html tag), and if you set a z-index, it will be on top of other elements.

Another approach is to use an iframe to completely isolate your UI. We use this method for our popup editor. Use iframe.contentWindow to modify the content inside the iframe, or use the srcdoc attribute. Using an iframe with a remote src is not advised because the security policy of many sites prohibits it. This approach works well when you have one big UI element.
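The !important tip above is easy to automate at the point where styles are generated. Here is a minimal sketch for inline styles; `withImportant` is a hypothetical helper, and a build-time CSS transform would be an equally valid place to do this.

```typescript
// Turn a map of CSS declarations into an inline style string in which
// every declaration carries !important exactly once.
function withImportant(declarations: Record<string, string>): string {
  return Object.keys(declarations)
    .map(property => {
      const rawValue = declarations[property] ?? ''
      // Strip an existing flag so it is never duplicated.
      const value = rawValue.replace(/\s*!important\s*$/i, '').trim()
      return `${property}: ${value} !important;`
    })
    .join(' ')
}
```

The result can be assigned to an element's style attribute, for example `element.setAttribute('style', withImportant({ color: 'red' }))`.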

How to set up a popup

If your extension has a toolbar button, it will probably have a popup that opens on a button click. By default, you can add an HTML page popup.html (Chrome, Firefox, Safari). This approach has pitfalls. You will have to deal with differences in the way browsers open it. In some cases, the popup appears with animation. In some cases, it will jump because of a height change after your UI renders on the client side. You can spend hours fixing that.

Another approach is to handle the toolbar button click yourself and inject your popup UI into the page. In this case, you have full control over the popup interface. This is how the Pocket extension does it.

Having blacklisted domains

When your extension conflicts with some site, you may not be able to update your extension quickly. In Safari and Firefox, you may wait weeks before your update can be released. To solve this problem, we created a server-side config (blacklist), that has a list of domains and pages for which our extension is disabled. You could use this as a remote config for your extension.
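On the extension side, checking a page against such a config can be a small pure function. In this sketch the config format (`domains` matched together with their subdomains, `pages` matched as host-plus-path prefixes) and the names `BlacklistConfig` and `isDisabledFor` are assumptions for illustration.

```typescript
// Hypothetical shape of the server-side config.
interface BlacklistConfig {
  domains: string[] // e.g. "example.com", also matches subdomains
  pages: string[]   // e.g. "docs.example.org/editor", matched as a prefix
}

function isDisabledFor(pageUrl: string, config: BlacklistConfig): boolean {
  const parsed = new URL(pageUrl)
  const host = parsed.hostname
  const hostAndPath = host + parsed.pathname
  const domainBlocked = config.domains.some(
    domain => host === domain || host.endsWith('.' + domain)
  )
  const pageBlocked = config.pages.some(page => hostAndPath.startsWith(page))
  return domainBlocked || pageBlocked
}
```

The background page could fetch the config periodically, cache it in storage, and consult this check before activating content scripts on a page.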

Messaging mechanism

In some cases, you can’t update your extension. This may happen if you decide to restart your extension from scratch (i.e., upload a new extension and deprecate the old one). It happened to us when Apple released a new Safari extension gallery, and we had to upload our extension to the new gallery and deprecate the old one. We needed a way to show users migration instructions.

You can use a similar mechanism to the Blacklist. Make sure you send a message in JSON (or other) format that is parsed on the extension side and constructs DOM. Don’t send raw HTML with naive parsing like el.innerHTML = serverRawMessage. It’s insecure, and you will not pass Firefox review.
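A sketch of the parsing side: the extension accepts only known fields of known types and drops anything else. The message shape (`title`, `body`, optional `linkUrl`) and the `parseServerMessage` helper are hypothetical.

```typescript
// Hypothetical message format sent by the server.
interface ServerMessage {
  title: string
  body: string
  linkUrl?: string
}

// Parse and validate the raw payload; return null for anything unexpected.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    return null
  }
  if (typeof data !== 'object' || data === null) return null
  const record = data as Record<string, unknown>
  const title = record['title']
  const body = record['body']
  const linkUrl = record['linkUrl']
  if (typeof title !== 'string' || typeof body !== 'string') return null
  const message: ServerMessage = { title, body }
  // Only allow https links, so javascript: or data: URLs are ignored.
  if (typeof linkUrl === 'string' && /^https:\/\//i.test(linkUrl)) {
    message.linkUrl = linkUrl
  }
  return message
}
```

The validated fields would then be placed into elements created with document.createElement and assigned through textContent, so the server text is never interpreted as markup.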

Runtime updates

Chrome can update an extension at runtime, at any moment. If the extension has a background page, it will be reloaded. But all open tabs with previously injected content scripts remain unchanged, and those content scripts can no longer send messages to the background page! When a content script tries to use port.postMessage (or chrome.runtime.sendMessage), it gets an exception. We found a way to fix that: the new background page injects an additional content script that provides a proxy to a new port, and the old content scripts use it through custom DOM events.

Sometimes changes are so significant that the old content scripts and the new background page are no longer compatible. In that case, we bump the major version for the release, and the extension detects the change (the current version is stored in the extension storage). Then we show a browser notification with a reload button; clicking it makes the background page reload the tabs where we want to refresh content scripts.
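The major-version comparison itself is small. In this sketch, `majorOf` and `needsTabReload` are hypothetical names, and the stored version is assumed to be undefined on a fresh install.

```typescript
// Extract the major component of a dotted version string.
function majorOf(version: string): number {
  const major = Number(version.split('.')[0])
  if (!Number.isInteger(major)) {
    throw new Error(`Unsupported version format: ${version}`)
  }
  return major
}

// Decide whether tabs with old content scripts should be reloaded.
function needsTabReload(storedVersion: string | undefined, currentVersion: string): boolean {
  // Nothing stored means a fresh install: there are no old content scripts.
  if (storedVersion === undefined) return false
  return majorOf(currentVersion) > majorOf(storedVersion)
}
```

The background page would run this check on startup, show the notification when it returns true, and then store the current version for the next comparison.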

This is not relevant for Safari because it updates extensions only after the browser restarts.

Performance

We noticed that a big bundle of injected code could dramatically slow down page load speed. You have to pay attention to your dependencies and try not to include more than you need.

We also don’t load content scripts into iframes (see all_frames in the manifest documentation) because doing so considerably slows down pages with many iframes (like Gmail).

Summary

It took us six years to learn these nuances of extension development. We are still learning, and we still have many things to improve. If the above challenges sound interesting to you, we’d love for you to join our team! Feel free to write me at kigorw@grammarly.com. Tell us about yourself, share your own challenges, and even better, tell us what we could be doing better.