Sunday, February 21, 2016

The Set Up - What tools will be implemented...subject to change :-)

Machine learning is a hot topic currently.  Being trendy, it has attracted many bright shiny tools that can distract from the process of getting good at my real goal: prediction.  To stay on track toward that goal, the tech stack that I will implement is going to be limited.  It's far too easy to get distracted by the next new thing.  Staying focused on the end goal is key.

Atom.io will be the code editor of choice.   Why?  There are no concrete reasons.  The soft reasons range from it being an open source project that is widely used to it being customizable and, as they like to boast, hackable.  There won't be much hacking in my future but packages have been added to support Python.

Python is my scripting language of choice.  Specifically Python 2.7.11 rather than the 3.x version.  I'm not geeked-out enough to understand the underlying differences between the two strains but in doing research on other downstream packages 2.x seemed to be a more compatible choice.
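For the curious, here is a minimal sketch of two well-known differences between the strains: print is a function in 3.x, and dividing two integers with `/` truncates in 2.x but not in 3.x.  The `__future__` import makes 2.7 behave like 3.x on both counts, so the same script runs on either.  The helper name `describe_interpreter` is made up for illustration.

```python
from __future__ import division, print_function

import sys


def describe_interpreter(version_info=None):
    """Return a short label such as 'Python 2.7' for the given version tuple."""
    if version_info is None:
        version_info = sys.version_info
    return "Python %d.%d" % (version_info[0], version_info[1])


# With the __future__ import, "/" is true division on both 2.7 and 3.x.
half = 7 / 2
# "//" is floor division on both 2.7 and 3.x.
floored = 7 // 2

print(describe_interpreter())
print(half, floored)
```

Without the `__future__` line, `7 / 2` would come out as 3 on Python 2.7.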

Python, again, is open source, stable and broadly used.  My initial work in econometrics was done in SAS, which is not open source and too expensive for a personal project.  Many use R, which is also open source.   That would be a great choice too, as it seems many online classes use R.

Python was chosen for many other reasons too.  A few of them: learning Python builds skills for attacking other types of problems outside of machine learning and statistical analysis; it supports machine learning with pandas, numpy, ggplot, sklearn and many other modules; and finally, given the timing, I will attempt to use Google's TensorFlow (one shiny new toy), which comes with an easy-to-use Python interface.

TensorFlow can work on a Windows machine but is really Linux based.  To that end I will also install Oracle VM VirtualBox with Ubuntu.  The virtual machine is pretty easy to set up and TensorFlow installs easily too.

A review of the stack above reveals a strong bias of mine.  I will default to broadly adopted open source tools.  There are good and bad points to this bias.  One bad one is that sometimes I find myself solving a simple problem that is difficult in open source and easy in a paid product.  It seems that paid products spend more resources on getting the UI/UX right.  However, the good points (especially for broadly adopted tools) generally outweigh the bad.  Also, if anything that gets developed turns into an income-producing product, the stack doesn't need to be changed to scale.

One final tool that I use extensively is Google. My coding skills are not all that advanced.  But I found that my problems are pretty common too.  Google points me to the right place, which often happens to be StackOverflow.   Get an account.  I'm mostly a consumer but hope to be able to contribute later.

 

Wednesday, February 17, 2016

Ginning up

It has been forever since my last post and seems like a good time to try to start writing again.  So...

I'm starting a new life project.  I am attempting to learn machine learning.  This is not a from scratch project as much of my Master's degree was very quantitative with heavy emphasis on econometrics.  Some of the basics of machine learning build on this foundation.  

Part of my problem is that I've forgotten many of the details of econometrics and statistics.  Thankfully the Internets have rushed to my rescue.  The Khan Academy has sessions on matrix algebra, calculus, statistics and history (for fun.)   Udacity and Coursera have many courses on data science, machine learning and visualizations.  Just to be a bit traditional like the old fart I am, Amazon has provided Machine Learning in Python by Michael Bowles.   There are also many new machine learning tools being open sourced by Google and others.

A reasonably large data set is also needed.  For that I'm using the Lending Club loan data, which has initially provided 240 MB of data.
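A first pass over a file that size can be done one row at a time rather than loading it all at once.  Here is a minimal sketch using only Python's built-in csv module.  The sample rows and the column names (`loan_amnt`, `int_rate`, `loan_status`) are made-up stand-ins, not the actual Lending Club layout, and `summarize_loans` is a hypothetical helper.

```python
import csv

# Hypothetical sample lines standing in for the real file; the column
# names and values are assumptions for illustration only.
SAMPLE_LINES = [
    "loan_amnt,int_rate,loan_status",
    "10000,11.5,Fully Paid",
    "5000,15.2,Charged Off",
    "12000,9.8,Fully Paid",
]


def summarize_loans(lines, status_column="loan_status"):
    """Stream CSV rows and count how many rows fall under each status."""
    counts = {}
    total = 0
    # DictReader accepts any iterable of lines, including an open file,
    # so rows are processed one at a time.
    for row in csv.DictReader(lines):
        total += 1
        status = row[status_column]
        counts[status] = counts.get(status, 0) + 1
    return total, counts


total, counts = summarize_loans(SAMPLE_LINES)
print("rows: %d" % total)
for status in sorted(counts):
    print("%s: %d" % (status, counts[status]))
```

Passing an open file handle in place of `SAMPLE_LINES` should work the same way, since the reader only needs something it can iterate over line by line.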

Wish me luck on my journey and feel free to participate.

Sunday, November 4, 2012

Chopper Dog

Chopper Dog passed away last night at 1:00 AM with Peggy holding him and making sure his last moments were comfortable.

Chopper had the biggest heart and the best "Hello" ever for greeting old and new friends alike. Sadly his enormous personality could not compensate for his weak heart.

 I miss him.

 I miss his "good morning"

 I miss his pants, snorts and snores.

He was my office buddy, lying on his bed supervising my every action, just itching to go on the next walk to sniff and investigate every leaf, blade of grass, piece of litter or whatever crossed his path. We called him the Professor because he would smell, investigating every nuance of one spot until pulled away.

 I'm not sure what else to say. I am not sure how to come to terms with his passing. Chopper is now sniffing everything, playing frisbee and doing all those things his body would not allow.

Sic Balls Chopper!!!

Thursday, October 4, 2012

Only once?

Is it possible for a person to be mentioned only once on the Internet?

Depending on how the Internet is defined, I'd say it's impossible to occur only once on the web.  For now, I'm defining the web as any data publicly available through a URL in a browser (Google counts).

Why?

As soon as the individual is mentioned at a URL, the site is indexed, scraped, shared and so on.  Each replication adds a mention.

So what is the minimum number of times a person/instance of data can occur on the Internet?

Tuesday, October 2, 2012

Rabbits

While cycling with my former co-worker this weekend, we spied a pack of riders ahead of us. 

At what age do the people in front of you morph into just people out having fun?  Right now for me they are more than that.  They need to be reeled in.  They need to be behind me.  I must catch the rabbit.

Why?  Does this change?

Strava
