Climbing the Summit

Stories and Insights about Building Grammarly

  • WebdriverIO (or Wdio) is a great open-source tool for end-to-end UI testing. It provides Selenium bindings for Node.js, along with a set of services and reporters.

    But what if none of them serves your purpose? For example, what if a reporter doesn’t show test durations or doesn’t attach failure screenshots? You have at least two options: make a pull request and ask the package’s maintainer to review and merge it, or develop your own reporter (or service).

    In this post, we will show you step by step how to create a custom Wdio reporter using TypeScript and Node.js. We’ll be using documented Wdio features as well as undocumented ones that we discovered along the way.

    Andrii Lazebnyi, Alisa Mansurova December 6, 2017 webdriverio, reporting, typescript
  • Over ten million people use the Grammarly Chrome extension. Our Firefox, Safari, and Edge extensions are incredibly popular, too. These extensions may look simple on the outside because they are low-profile and easy to use, but together they are a complex product supported by a full team of engineers. We have been developing and perfecting it for six years. Along the way, we have learned a thing or two that we’d like to share. This article is an overview of our learnings and best practices on a broad range of topics. Feel free to skip the topics that aren’t relevant to you!

    Igor Kononuchenko October 6, 2017 browser extensions
  • Grammarly, like most growing companies, strives to make data-driven decisions. That means that we need a reliable way to collect, analyze, and query data about our users. We started out using third-party tools like Mixpanel to handle our analytics needs, but soon our needs surpassed the capabilities of those tools. For example, we wanted to control the pre-aggregation and enrichment of data, to generate reports that were more customized, and to have higher confidence in the accuracy of data. So we developed our own in-house analytics engine and application on top of Apache Spark. Recently, I gave a talk at the Spark Summit sharing some of our learnings along the way. The talk covered:

    Misha Chernetsov June 28, 2017 data, analytics, etl, apache, spark
  • This is the first in a series of interviews with engineers at Grammarly. We want to highlight the personalities, backgrounds, talents, and perspectives of our Engineering team: the people who bring our ideas to life.

    Sunshine Yin, Stas Kravets May 12, 2017 profile, team, diversity, culture
  • We at Grammarly have a lot of data at our disposal: frequency and type of errors, user behavior, the amount of text sent for processing, etc. This data allows us to test and improve our performance. However, managing data is extra work: often the location and format of the data are not suitable for immediate consumption. So in data-driven companies, it is very common to create information-reshaping pipelines, conventionally called ETL (Extract, Transform, Load), which do literally what it says on the box: grab the data in one place, modify it, and put it in another place.

    Alexander Yakushev May 4, 2017 clojure, etl
  • How do you know if your proofreading algorithm is doing a good job? So far, the NLP community has used the standard of “minimal edit corrections,” i.e., the minimal number of edits to make a sentence grammatically correct. However, the problem with this approach is that a grammatically correct sentence doesn’t always sound natural to a native speaker. For the past two years, we—Joel Tetreault, Courtney Napoles, and Keisuke Sakaguchi—have been tackling this problem. Joel is Grammarly’s Director of Research, and Courtney and Keisuke are both Ph.D. students at Johns Hopkins Center for Language and Speech Processing.

    Joel Tetreault, Courtney Napoles, Keisuke Sakaguchi March 31, 2017 nlp, machine learning, corpus, proofreading
  • As discussed in the first part of this series, we were very excited when we figured out how to properly build Docker images, until we realized that we had no idea how to run them in production. You might have already guessed that we were pondering building our own tool.

    Yuriy Bogdanov September 9, 2015 infrastructure, platform, open source
  • Today, the industry is saturated with discussions about containers. Many companies are looking for ways they can benefit from running an immutable infrastructure or simply boost development performance by making repeatable builds between environments simpler. However, sometimes by simplifying the user experience we end up complicating the implementation. On our journey to a usable, containerized infrastructure, we faced a number of daunting challenges, the solutions to which are the subject of this post. Welcome to the bleeding edge!

    Yuriy Bogdanov September 7, 2015 infrastructure, platform, open source
  • At Grammarly, the foundation of our business, our core grammar engine, is written in Common Lisp. It currently processes more than a thousand sentences per second, is horizontally scalable, and has served reliably in production for almost three years.

    We noticed that there are very few, if any, accounts of how to deploy Lisp software to modern cloud infrastructure, so we thought that it would be a good idea to share our experience. The Lisp runtime and programming environment provide several unique, albeit obscure, capabilities to support production systems (for the impatient, they are described in the final chapter).

    Vsevolod Dyomkin June 26, 2015 lisp, infrastructure, debugging
  • In this post, we are going to discuss a common evolution of server-side architecture that many growing companies face: the now-legendary transition from a monolithic application to a microservices architecture. Although decoupling is a sound software development concept, there are a number of risks and pain points associated with it. This write-up covers some of the issues we faced while scaling Grammarly’s server backend and the solutions and insights that we had in the process.

    Stas Kravets April 24, 2015 introductory, architecture
  • The task of comparing constituency parsers is not a trivial one. Parsers vary in the types of mistakes they make, types of texts they are good at parsing, speed, and all kinds of interesting features and quirks within each implementation. We set out to understand what stands behind the vague F-measure numbers lurking around 90% and what kind of issues to expect from different parsers, regardless of their overall quality.

    Mariana Romanyshyn, Vsevolod Dyomkin November 3, 2014 nlp, open source
  • At Grammarly, we use a lot of off-the-shelf core NLP technologies to help us make a little bit of sense of the mess that is natural language text (English in particular). The issue with all these technologies is that even small errors in their output are often multiplied by the downstream algorithms. So, when a sophisticated mistake-detection algorithm is supposed to work on individual sentences, but it receives a fragment of a sentence or a couple of sentences merged together, it may find all sorts of funny things inside.

    Oleksii Sliusarenko, Vsevolod Dyomkin April 22, 2014 nlp