Sunday, February 21, 2016

The Set Up - What tools will be implemented...subject to change :-)

Machine learning is a hot topic right now.  Because it is trendy, there are many bright, shiny tools out there to distract from the process of getting good at my real goal: prediction.  To that end, the tech stack that I implement is going to be limited.  It's far too easy to get distracted by the next new thing.  Staying focused on the end goal is key.

Then there is the code editor of choice.  Why this one?  There are no concrete reasons.  The soft reasons range from it being an open source project that is widely used to it being customizable and, as its makers like to boast, hackable.  There won't be much hacking in my future, but I have added packages to support Python.

Python is my scripting language of choice, specifically Python 2.7.11 rather than the 3.x version.  I'm not geeked-out enough to understand the underlying differences between the two strains, but when I researched the downstream packages I plan to use, 2.x seemed to be the more compatible choice.
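Since the 2.x versus 3.x choice matters for everything downstream, it helps to confirm which interpreter is actually running.  Here is a minimal sketch of that check; the helper name describe_interpreter is my own invention for illustration, and it only uses the standard sys module, so it should run the same on either strain.

```python
import sys


def describe_interpreter(version_info=sys.version_info):
    """Return ((major, minor, micro), is_python2) for the given version info.

    describe_interpreter is a hypothetical helper, not part of any library.
    """
    major, minor, micro = version_info[0], version_info[1], version_info[2]
    return (major, minor, micro), major == 2


version, is_python2 = describe_interpreter()
print("Running Python %d.%d.%d" % version)
print("This is the 2.x line" if is_python2 else "This is the 3.x line")
```

Running this inside the editor, the terminal, and the virtual machine is a quick way to make sure they all point at the same Python.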

Python, again, is open source, stable and broadly used.  My initial work in econometrics was done in SAS, which is not open source and is too expensive for a personal project.  Many people use R, which is also open source.  That would be a great choice too, as it seems many online classes use R.

I chose Python for several other reasons as well.  First, learning Python builds skills for attacking other types of problems outside of machine learning and statistical analysis.  Second, it supports machine learning with pandas, numpy, ggplot, sklearn and many other modules.  Finally, given the timing, I will attempt to use Google's TensorFlow (one shiny new toy), which comes with an easy-to-use Python interface.
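To give a feel for what prediction with this stack looks like, here is a minimal sketch that uses only numpy.  The data is made up for illustration (y is exactly 2*x + 1), and a straight-line least-squares fit stands in for whatever model I end up using; it is not meant as the method for this project.

```python
import numpy as np

# Made-up toy data: y is exactly 2*x + 1, so the fit is easy to sanity check.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a straight line (degree-1 polynomial) by least squares.
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(x, y, 1)


def predict(new_x):
    """Predict y for a new x value using the fitted line."""
    return slope * new_x + intercept


print("slope=%.2f intercept=%.2f" % (slope, intercept))
print("prediction for x=5: %.2f" % predict(5.0))
```

The same fit-then-predict pattern carries over to sklearn and TensorFlow; only the model in the middle changes.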

TensorFlow can work on a Windows machine but is really Linux based.  To that end, I will also install Oracle VM VirtualBox with Ubuntu.  The virtual machine is pretty easy to set up, and TensorFlow installs easily too.

A review of the stack above reveals a strong bias of mine: I default to broadly adopted open source tools.  There are good and bad points to this bias.  One bad point is that I sometimes find myself solving a simple problem that is difficult in open source and easy in a paid product.  It seems that paid products spend more resources on getting the UI/UX right.  However, the good points (especially for broadly adopted tools) generally outweigh the bad.  Also, if anything that gets developed turns into an income-producing product, the stack doesn't need to be changed to scale.

One final tool that I use extensively is Google.  My coding skills are not all that advanced, but I have found that my problems are pretty common.  Google points me to the right place, which often happens to be Stack Overflow.  Get an account.  I'm mostly a consumer but hope to be able to contribute later.