This nascent site will be a repository for thoughts too big for a tweet that I really have to stop bothering my friends with. It will likely focus on natural language processing and machine learning, but some other topics might creep in from time to time. I’ll be impressed if I manage two posts a year.

Some earlier posts are hosted on the Roam Analytics GitHub page (previously on their website, which no longer exists). A few highlights:


  • Tokenization for language modeling: Byte Pair Encoding vs Unigram Language Modeling