Plea: Use Machine Learning in the Linux World, More!


As a Linux user with a fondness for the command line, I'm somewhat disappointed to see machine learning go unused in CLI software. Specifically, any time there is a "hard" or "confusing" problem to be solved, usually one whose solution depends partially or entirely on the user's personal preferences (the ones they hold in their head, not the ones in their config files), solving it falls back to manual work every single time. Let's give an example, because that sentence confuses me, too:

Problem: How to select all related email threads in a tree within mutt, based on a certain primer, e.g. "Discussion about adding HTTPS support to Mux".

Further Details: Email is threaded and, while one might hope that replies would always fall exactly in line with the context of the branch being replied to, this is not always the case. Sometimes the replier goofs and replies to the wrong thread, and others follow suit because that's where the topic has moved to. Other times, topics intermix and there's no clean way to separate branches into logical topics.

Current Solution: The user needs to manually search through the tree (maybe assisted by a search command, done by text matching or regex), and has to manually select each related thread, before performing whatever operation they desire.

Proposed Solution: Assume a model is trained to classify blocks of text (paragraphs, sentences, whatever) into one or more topics. This is commonly done and done very well, so it's a safe assumption that this is feasible. Select the block of text with the context you care about in any email within the tree, feed the text to the model as a primer (initiated via keybinding or something), and then run the model over the thread tree. Anytime the model classifies an email as having the same topic as the primer, select it.
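
To make the selection pass concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Email` class is a hypothetical stand-in for a message in the thread tree (it is not a mutt data structure), and `same_topic` uses a crude bag-of-words cosine similarity with an arbitrary threshold in place of the trained classifier. A real model would replace `same_topic`; the tree walk around it would stay the same.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Email:
    # Hypothetical stand-in for one message in a thread tree.
    subject: str
    body: str
    replies: list = field(default_factory=list)  # child Email objects


def bag_of_words(text):
    # Lowercased word counts; a deliberately crude text representation.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def same_topic(primer, email, threshold=0.2):
    # Placeholder for the trained topic classifier.
    # The threshold is an arbitrary assumption, not a tuned value.
    return cosine(bag_of_words(primer), bag_of_words(email.body)) >= threshold


def select_related(root, primer):
    # Walk the whole tree depth-first and keep every email that the
    # (stand-in) model says shares the primer's topic.
    selected = []
    stack = [root]
    while stack:
        email = stack.pop()
        if same_topic(primer, email):
            selected.append(email)
        stack.extend(reversed(email.replies))
    return selected
```

Because the walk visits every message regardless of where it sits in the tree, a reply that landed in the wrong branch is still picked up, which is the case plain thread selection misses.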

Result: If your classifier was adequately trained on enough text similar to the email in question (say, by training it on your labelled inbox), you should have just saved yourself many minutes of reading and selecting emails, now only having to wait a second or two at most, even for a very large tree of lengthy emails.
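
As a rough illustration of the "train it on your labelled inbox" step, the sketch below fits a tiny multinomial Naive Bayes classifier with add-one smoothing using only the Python standard library. Naive Bayes is my choice for the example, not a claim about which model would work best, and the `train` and `classify` functions and the `(text, topic)` input format are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def train(labelled):
    # labelled: iterable of (text, topic) pairs, e.g. exported from an inbox
    # where each message already carries a topic label (an assumption).
    word_counts = defaultdict(Counter)  # topic -> word -> count
    doc_counts = Counter()              # topic -> number of messages
    for text, topic in labelled:
        doc_counts[topic] += 1
        word_counts[topic].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def classify(model, text):
    # Pick the topic with the highest log-probability under Naive Bayes.
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_topic, best_score = None, -math.inf
    for topic, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][topic]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            if word in model["vocab"]:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids log(0).
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic
```

Labelling is the one-time cost here; once `train` has run, `classify` is cheap enough to call on every message in a tree.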

I've of course made many simplifications and assumptions about how easy this would be, but even if it's not that easy right now, it *should* be. We live in an age where we're getting ready to make self-driving cars a mainstay, where human lives and infrastructure would be at risk from even the slightest mistake or bad assumption in the training and implementation of the car's model. And you're telling me there's no easy way to do what I want? Of *course* this is easy; it's just a matter of setting up the right tooling to make the "hard" parts of this just single key presses, and the tedious parts (labelling samples and other hand-holding of models) one-time investments.