Mr. Thearling: I am incredibly interested in moving my career into the world of data mining. What training do you recommend in order to be successful in this field?
FYI: I have an undergraduate degree in Sociology, which included plenty of instruction in quantitative research methods, and began my career as a data processing manager (cross tabs in WinCross, other analysis in SPSS, etc.). I'm currently known in-house as the stat/numbers nerd.
I'm registered for a course in SQL and aspire to attend training in SPSS and Cognos BI. I am proficient at Crystal Reports, but consider it a pretty crude tool (I liken using Crystal to trying to paint a fine portrait with kindergarten jumbo crayons). Any other suggestions on launching a new leg of my career toward data mining?
Sarah, I think you have made a good start. Your interest in quantitative methods provides a solid foundation to build upon. I would recommend that you focus on increasing your skills in two areas: 1) databases and 2) analytic techniques. Your course in SQL is a great way to get started with databases. You might want to get some hands-on experience with a commercial database product (e.g., Oracle) so that your knowledge isn't completely abstract. OLAP (Online Analytical Processing) databases are also pretty important, but you apparently have experience with this kind of software (WinCross, Cognos), so that should give you the background you need. The basic idea with the database experience is that you want to be able to move data into and out of a relational database, pull out the subsets you are interested in, slice and dice the data, and present it to users. If you can't reliably manipulate the data, it is difficult to get it to the point where you can explore it.
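To make the database skills above concrete, here is a minimal sketch of the three tasks: moving data in, pulling out a subset, and slicing it into a simple cross tab. It uses Python's built-in sqlite3 module as a stand-in for a commercial product such as Oracle, and the table, column names, and rows are invented for illustration only.

```python
import sqlite3

# Hypothetical survey data: (region, answer, age). Invented for illustration.
ROWS = [
    ("north", "yes", 34),
    ("north", "no", 51),
    ("south", "yes", 29),
    ("south", "yes", 42),
    ("south", "no", 38),
]


def load(conn, rows):
    """Move data into the database."""
    conn.execute("CREATE TABLE responses (region TEXT, answer TEXT, age INTEGER)")
    conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", rows)
    conn.commit()


def subset(conn, min_age):
    """Pull out a subset: respondents at or above a given age, youngest first."""
    cur = conn.execute(
        "SELECT region, answer, age FROM responses WHERE age >= ? ORDER BY age",
        (min_age,),
    )
    return cur.fetchall()


def crosstab(conn):
    """Slice and dice: count answers within each region, like a simple cross tab."""
    cur = conn.execute(
        "SELECT region, answer, COUNT(*) FROM responses "
        "GROUP BY region, answer ORDER BY region, answer"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
load(conn, ROWS)
older = subset(conn, 40)
table = crosstab(conn)
```

The same SELECT, WHERE, and GROUP BY statements carry over to most relational databases with little change, which is why practicing on a small local database is a reasonable first step.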
After you can access the data that you are interested in, the next step is to analyze it. A good statistical background is critical, since all data mining techniques need to operate in a statistically robust framework. Take several stats classes (if you haven't done this already) and try to work with real data. Learn how to evaluate and compare statistical models. You should also learn about the various machine learning algorithms (neural networks, decision trees, nearest neighbors, etc.); depending on where you take your classes, these might be taught in either the statistics or the computer science department. You want to build up a general knowledge of the tools that you can apply to data analysis problems. This isn't something that will happen quickly. If you are looking for a couple of books on techniques, I can recommend "Intelligent Data Analysis" by Berthold & Hand and "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman.
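As a small illustration of what "evaluate and compare models" means in practice, the sketch below fits two very simple models on training data and scores both on separate held-out data. The toy points and labels are invented, and the two models (a majority-class baseline and a 1-nearest-neighbor classifier) are chosen only because they fit in a few lines; real projects would use larger data sets and more careful validation.

```python
from collections import Counter

# Invented toy data: ((feature_1, feature_2), label).
TRAIN = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.3), "a"),
         ((3.0, 3.1), "b"), ((3.2, 2.9), "b")]
TEST = [((1.1, 1.1), "a"), ((2.9, 3.0), "b"),
        ((3.1, 3.3), "b"), ((0.8, 0.9), "a")]


def majority_model(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor_model(train):
    """1-nearest neighbor: predict the label of the closest training point."""
    def predict(x):
        def squared_distance(item):
            point, _ = item
            return sum((p - q) ** 2 for p, q in zip(point, x))
        return min(train, key=squared_distance)[1]
    return predict


def accuracy(model, data):
    """Fraction of held-out examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


baseline_acc = accuracy(majority_model(TRAIN), TEST)
knn_acc = accuracy(nearest_neighbor_model(TRAIN), TEST)
print(f"baseline: {baseline_acc:.2f}, nearest neighbor: {knn_acc:.2f}")
```

Scoring on data the model never saw during fitting, and always comparing against a trivial baseline, are the two habits this sketch is meant to show.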
I would also recommend looking into data visualization and information design. Presenting the results of a data analysis project is often as important as the analysis itself. If your results are not understood and trusted, they will not have an impact. Unfortunately, most data visualization courses focus on the technology and whiz-bang presentations, so be careful that you don't waste your time in this area. I highly recommend the books of Edward Tufte (especially his second book, "Envisioning Information") as an introduction to this subject. Tufte also teaches a very popular one-day course on the subject that is quite good.
Finally, if you are mathematically inclined, you might want to look into the field of operations research and optimization. In many cases, the output of a data mining model is not a single result but a collection of predictions. Optimization procedures can take these predictions and select one or more optimal actions.
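A minimal sketch of that last idea, under assumptions that are entirely invented for illustration: a model has predicted a response probability for each customer, each response has a known value, every contact costs the same amount, and there is a fixed budget. With uniform costs, the best set of actions is simply the most profitable customers that the budget allows.

```python
def choose_contacts(predictions, cost_per_contact, budget):
    """Pick which customers to contact, given model predictions.

    predictions: list of (customer_id, response_probability, value_if_responds).
    Returns customer ids ordered from highest to lowest expected profit.
    Assumes every contact costs the same, so taking the top scorers is optimal.
    """
    max_contacts = int(budget // cost_per_contact)
    scored = [
        (probability * value - cost_per_contact, customer_id)
        for customer_id, probability, value in predictions
    ]
    # Drop customers whose expected profit is not positive, then rank the rest.
    profitable = sorted((s for s in scored if s[0] > 0), reverse=True)
    return [customer_id for _, customer_id in profitable[:max_contacts]]


# Hypothetical model output: (customer id, predicted probability, value).
PREDICTIONS = [
    ("c1", 0.10, 200.0),
    ("c2", 0.02, 100.0),
    ("c3", 0.30, 50.0),
    ("c4", 0.25, 100.0),
]

tight_budget = choose_contacts(PREDICTIONS, cost_per_contact=5.0, budget=10.0)
loose_budget = choose_contacts(PREDICTIONS, cost_per_contact=5.0, budget=100.0)
```

When costs differ per action or actions interact with each other, a simple ranking like this is no longer guaranteed to be optimal, and that is where the linear and integer programming methods from operations research come in.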
For more information, check out SearchCRM's Best Web Links on Data Mining.
Related Q&A from Kurt Thearling, Years: 2002-2003