This nascent site will be a repository for thoughts too big for a tweet that I really have to stop bothering my friends with. It will likely focus on natural language processing and machine learning, but some other topics might creep in from time to time. I’ll be impressed if I manage two posts a year.
Some earlier posts are hosted on the Roam Analytics GitHub page (previously on their website, which no longer exists). A few highlights:
- Are categorical variables getting lost in your random forests?: demonstrating a limitation of common random forest implementations when faced with categorical data.
- Prescription-based prediction: predicting a medical provider’s specialty from their prescribing patterns.
- Translation and scaling invariance in regression models: correcting some misconceptions about feature normalization for linear models.