Thinkful Dev Toolkit Series Part 3: Philip Forget, CTO: Readability

In this installment of Thinkful’s Dev Toolkit: an interview with Philip Forget, CTO of Readability–an NYC startup that turns cluttered web content into clean articles that you’ll actually want to read. 

Hi Philip. Tell us more about Readability.

Readability is basically a DVR for reading. We find just the content of the articles you want to read and sync them to all your devices–phone, tablet, e-reader–with a single button click or keystroke. We also give you a clean, readable view almost instantly.

What tools and services are crucial to keeping Readability up and running?

Sentry and Graphite are extremely important for us. With Graphite, being able to measure almost any metric, with no setup required to start tracking a new one, is a revelation. We also lean heavily on Sentry for error tracking, which gives us a real-time view of system health and, almost more importantly, of the impact of new deploys as they happen.
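To illustrate the "no setup" point: Graphite's plaintext protocol accepts one line per data point in the form "path value timestamp", and a metric name that has not been seen before is simply created on first write. The following is a minimal sketch, not Readability's code. The metric name, host and port are assumptions (2003 is the usual default for Carbon's plaintext listener).

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    # Each data point is "path value timestamp" terminated by a newline.
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003):
    """Send one data point to a Carbon listener (host and port are assumed)."""
    line = format_metric(path, value)
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(line.encode("ascii"))
```

Calling send_metric("parser.articles.parsed", 1) against a running Carbon instance would record a data point under that name without any prior registration of the metric.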

We also rely heavily on Boto, a community-driven Python library for AWS–it’s the underlying glue for our Amazon Web Services (AWS) integrations. It maps AWS to a very clean Python interface. We use it for small scripts, for maintenance, and for deploys via our Jenkins box. We also have a number of libraries that use it transparently in the background, e.g. to sign all our outgoing emails with DKIM and send them using SES.

We also use Boto to handle part of the transport layer for our messaging queues, in connection with Celery task servers and Amazon’s SQS (Simple Queue Service).

Tell me about some interesting tech challenges that you’re solving, and what tools or processes you’re using to solve them.

One of the biggest challenges for us is making sense of something as heterogeneous as the web. We don’t only parse English content with Readability–we handle right-to-left languages like Hebrew and Arabic really well too.

We’ve set it up so that we can handle pretty much any language, and that wasn’t trivial. Our goal is to keep our parser as agnostic as possible about the language of the content it’s working on, while making sure to capture everything that a user would identify as useful content.
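One small piece of language-agnostic handling is deciding whether extracted text should be rendered right-to-left. The sketch below is an assumption about how that could be done, not a description of Readability's parser. It uses the Unicode bidirectional class of each character (via Python's standard unicodedata module) rather than any per-language rule; the function name is hypothetical.

```python
import unicodedata


def dominant_direction(text):
    """Return 'rtl' or 'ltr' depending on which strong bidi class dominates."""
    rtl = ltr = 0
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew letters are "R", Arabic letters are "AL"
            rtl += 1
        elif bidi == "L":         # Latin, Cyrillic, CJK and most other scripts
            ltr += 1
        # Digits, spaces and punctuation have weak or neutral classes; skip them.
    return "rtl" if rtl > ltr else "ltr"
```

Because the decision rests on character properties, the same function covers Hebrew, Arabic and any other script without a list of supported languages.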

We operate on a huge dataset–making what seems like a small change to the parser can have a huge impact on the results we serve to users. Constant testing helps us deal with the cat-and-mouse problems that we often find ourselves chasing down, so we’re really investing in our testing suite.

It’s not just unit testing–for Readability, we had to create our own test framework for testing parsers. We also built a plugin to our parser to allow less technical people to improve the parser internally–helping them tweak parameters, add and remove selectors–with an easier-to-use visual interface.
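A parser test framework of this kind usually boils down to replaying saved pages through the parser and comparing the output against a known-good result. Here is a minimal sketch of that idea under stated assumptions: ParagraphExtractor is a toy stand-in for a real content parser, and the names extract, run_cases and CASES are hypothetical, not part of Readability's tooling.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Toy stand-in for a content parser: keeps only text inside <p> tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract(html):
    """Run the toy parser and return the extracted text as one string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def run_cases(extractor, cases):
    """Return the names of cases whose output differs from the saved expectation."""
    return [name for name, (html, expected) in cases.items()
            if extractor(html) != expected]


# Each case pairs a saved page with the text a human marked as correct.
CASES = {
    "nav_is_dropped": ("<div>Menu</div><p>Story text.</p>", "Story text."),
    "two_paragraphs": ("<p>One.</p><p>Two.</p>", "One. Two."),
}
```

Running run_cases(extract, CASES) after every parser change gives a list of regressions by name, which is what makes a "small change" safe to ship against a large corpus.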

What’s interesting about your tech stack?

Well, we built a simple SMTP server in Python to handle the email-in-to-your-account functionality, so you can easily email articles to your account. This was actually surprisingly easy to do–there are lots of libraries that make it trivial to set up this kind of email integration.
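Whatever server receives the message, the handling step for an email-in feature can be small. The sketch below is an assumption about what that step might look like, not Readability's implementation: it takes the raw bytes of a received message and returns the sender plus any links found in the plain-text body, using only Python's standard email package. The function name and the URL pattern are hypothetical.

```python
import re
from email import message_from_bytes, policy

# Deliberately simple pattern; real-world URL matching needs more care.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def links_from_email(raw_bytes):
    """Return (sender, urls) for a raw email message."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    sender = msg["From"]
    # Prefer the plain-text part; a message with no such part yields no links.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return (str(sender) if sender is not None else None,
            URL_PATTERN.findall(text))
```

The sender identifies which account the article belongs to, and each URL can then be handed to the parser like any other saved link.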

Also, we’re an AWS company all the way through. One thing that’s particularly interesting is our “Send to Kindle” integration. Amazon actually doesn’t provide any API for sending to the Kindle. When you get a Kindle device or app, all you get is an email address that you can email things to in a particular format.
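Since the only entry point is an email address, "sending to Kindle" amounts to building an email with the e-book attached. A minimal sketch using Python's standard email package follows. The addresses, the function name and the MIME subtype are assumptions for illustration, and this is not Readability's code.

```python
from email.message import EmailMessage


def build_kindle_email(sender, kindle_address, title, mobi_bytes):
    """Wrap a MOBI file in an email addressed to a Kindle's email address."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = kindle_address
    msg["Subject"] = title
    msg.set_content("Sent to your Kindle.")
    msg.add_attachment(
        mobi_bytes,
        maintype="application",
        subtype="x-mobipocket-ebook",  # assumed MIME type for MOBI files
        filename=f"{title}.mobi",
    )
    return msg
```

The resulting message can be handed to any mail transport, for example smtplib's send_message, or to a service such as SES.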

That’s really interesting. How did you get around that Kindle limitation?

Amazon provides a closed binary for generating MOBI files called kindlegen, and we wrote a Python library that enables you to create both EPUB and MOBI files on the fly, creating all the necessary navigation files and metadata for those formats. We’d love to open-source this, and we’re working on it–but it’s still very tied to the codebase.

We’re big open-source enthusiasts generally–our original parser was written in JavaScript, and that’s open-sourced. Now, the parsing happens on the server side, but we’ve open-sourced a few other internal tools.

Tell us about one of the tools that the team has open-sourced so far.

Keanu’s an exciting one. It powers our browser extensions and enables keyboard shortcuts that work across different browsers. Essentially, we did a bunch of abstract mapping for all three browsers, and Keanu gives you a clean interface for listening to keystrokes so that you don’t have to worry about conflicting shortcut rules between browsers.

Photo of Chris Schomaker in Readability's offices at Arc90

Readability’s Chris Schomaker keeping an eye on the codebase

On a more personal note, what’s your machine set-up? What tools do you depend upon to keep yourself efficient?

Well, I’m a Vim zealot. I think you should spend as much time as possible learning Vim and UNIX. Learning the philosophy of getting small tools together to do one job makes it a lot easier for you to work on a team of people that also think that way.

When you’re spending 10 hours a day moving text around, your tools should make it as close to the way that you’re thinking as possible. It’s important to make sure that your tools aren’t getting in the way.

Did you always want to work in tech?

When I was 8, my parents bought me a Visual Basic book. I moved on to C++ in high school and ended up studying architecture in college. I didn’t do “computer stuff” for a while, but did a lot of 3D and programmatic modeling for my architecture degree. When I left school, the job path for architecture was closing a bit, so I decided to move into robotics for a while. That led me back into the tech world, and here we are today at Readability.

How did you learn to code when you decided to dive back into the tech world?

Well, I mostly taught myself web programming. I tried to apply what I already knew: when I was programming in architecture school, I’d only played around on the front-end, mostly with JavaScript and ActionScript 3. When you work with those languages for 7-8 years, you learn more about the parts you don’t like than the parts you do, and that led me to play around with Ruby, Python and Java.

Python stuck out immediately as the one that fits my way of thinking the most. At the beginning, I thought I’d remain a generalist in terms of programming languages, but now I focus so much on Python.

I find it that much more useful to be well-versed in one language and paradigm. It’s definitely important to be a polyglot and know more than one language, but really knowing your tools and frameworks inside and out makes your life a lot easier.

What advice would you give to programmers who are just starting out?

When I started programming, I had this nagging fear of starting new projects. I’d always think, “Oh, this problem has likely been solved already, so I won’t even attempt it.” That’s wrong–there are so many problems that still need to be solved or solved differently.

That’s the one thing that I wish I’d known–just do the thing you want to do, just code. The worst that can happen is finding something better than what you’ve made, and there’s no shame in that… you’ve learned a ton along the way.

Readability is hiring for a few positions–tell us what you look for when evaluating candidates.

We’re hiring for a dedicated sysadmin / devops position. We’re looking for people with AWS experience who know and love working at scale. Readability, along with Arc90 where it’s built, offers a great atmosphere for growth.

—-

Philip got his start with front-end development. Sign up for more info on our next class–starting June 10th–to learn about how front-end skills can help you kickstart your coding career.