Ron Zacharski | Textbooks for data mining

I finally made a decision about which textbooks to use for the data mining course I will be teaching in the spring. One challenge was that the course is cross-listed in several departments (computer science, business, and information technology), so the students taking the class will have diverse backgrounds: some strong in statistics, others in programming. My original plan was to skip programming entirely and have students just use Weka, a free data mining tool. I was considering two textbooks: Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar; and Data Mining: Practical Machine Learning Tools and Techniques by Ian Witten and Eibe Frank. I've owned the Witten & Frank book for quite some time and found it useful, but the Tan et al. book seemed more comprehensive and presented a bit more of the mathematical foundations (without the math being overwhelming). Then a third book entered the picture: Programming Collective Intelligence by Toby Segaran. Even though I have only had it for a week, I like this book a lot. It is oriented toward applying data mining techniques, uses Python, and involves connecting to a variety of online data sources (for example, del.icio.us links). Among other things, the book covers how to build a recommendation system.

So now I have placed my textbook order, and there will be two required textbooks: Programming Collective Intelligence and the Witten & Frank book.

If you are a student taking this course in the spring and if you don’t already know Python, I would recommend learning Python basics over Christmas break.

One topic I will cover in the course that is not in either of these books is visualization. We will probably use the programming language Processing. Another possibility is the excellent site many-eyes.com, which lets people visualize their own data.

This should be fun!