So why is it that a subject I loathed so much when I was in college has become the centerpiece of this blog? I don’t have the answer. The truth is that in order to understand many things in this world, we need to understand statistics. And not just at your plain “I know what a bell curve is” level, but with the knowledge required to make decisions when faced with data. These days the amount of data generated every single day surpasses what we could read in a year of full-time reading. We need tools to process all this data so that we can mine knowledge and make decisions based on that knowledge.
Maybe there is your answer. I need to understand statistics in order to understand data mining. But not only that, I need to master probability as well. At the end of the day, we need to produce models that can predict future events based on the analysis of the data. These models are sometimes based on statistical models, or we need statistical concepts to calculate parameters that mean something when presented. I consider data mining to be a subject that every college graduate these days has to know, at least on a basic level.
And why is it that I hated the subject so much? Well, as usually happens, everything started a long time ago, when I was in college. The teacher we had was good; I don’t want to imply she didn’t teach us anything. I was able to grasp some things from that class. I hated that we didn’t have time to cover a lot. She surely made the subject hard to learn. In those days we didn’t have no stinking computers or TAs, her office hours were impossible to attend (they usually fell while I was taking other classes), and my classmates hated the subject more than I did, so I didn’t get much help from my peers either.
When I graduated from college I went to work for a Metrology Laboratory (yes, metrology, the science of measurement; maybe I should start a blog about that too), and surprise! I had to use statistics and probability to calculate uncertainties and errors in the measurements. At this point, I was able to learn a little more about the subject. But most of the details in this area were already worked out for us (we usually based our calibration and calculation procedures on standards and other publications), so it is not like we were talking about statistics that much anyway.
Then I came to the University of Florida to get an MS in Engineering Mechanics. There I took a class on data analysis. It was very interesting, and I guess it was one of the reasons I started to use Matlab for programming, mathematical calculations and plotting of data. The use of probability and techniques such as Fourier transforms made the class more interesting, although the focus was more on the analysis of periodic data. But in this class I learned cross-correlation analysis, linear regression, probability density functions, and so on.
To be continued...