Factor Analysis of Ameru Sales

DoChenRollingBearing

Yellow Jacket
Supporter
Messages
1,275
Reaction score
0
Points
0
Location
SE USA
...

My newest article is very numbers-intensive, so will not likely appeal to most of you, but I hope that some of you guys (wait, wait, I know at least one!) who might...

:)

"Factor Analysis" is a multivariate statistical routine that seeks to simplify complex data sets into a relatively compact model while retaining some predictive power. I am hoping to use this and other methods to help me find useful nuggets in our sales data. For example: if I buy 500 pcs of X, how confident should I be in buying 200 pcs of Y if they are used in the same vehicle?

http://tinyurl.com/bsjlho7

I am only an "advanced beginner" with this. I have a little training, but as they say, "a little knowledge is a very dangerous thing".

Comments and suggestions for further work are extremely welcome!
 

pmbug

Your Host
Administrator
Messages
7,433
Reaction score
14
Points
193
Location
Texas
Sorry DCRB, it looks all Greek to me. :shrug:

Glad you got your computer working again. :)
 

benjamen

Yellow Jacket
Messages
1,620
Reaction score
0
Points
0
Location
Migratory
Interesting read; a few comments:
1) You do have some really strong correlations, which is great for you. Typically, I only see correlation coefficients over 0.9 in a classroom example!

2) If you have a decent idea of what products are correlated with product X, why don't you try something like multiple linear regression? Even Excel has a free add-in that includes regression (Data Analysis, I believe) and lets you run up to 16 variables through it. This will give you an equation to calculate how many of a certain product to purchase given certain inputs.

Essentially, if you know how many of product 1 (X) you are going to buy, the equation tells you how many of product 2 (Y) to purchase. Regression will solve for the beta0 and beta1 portion of the equation below.

Y = beta0 + beta1*X

http://en.wikipedia.org/wiki/Regression_analysis
http://office.microsoft.com/en-us/e...ssion-analysis-in-excel-2007-HA010219001.aspx
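
For anyone who would rather see the fit spelled out than run it through Excel, here is a minimal sketch in Python using only the standard library. The quantities are made up for illustration (they are not DCRB's sales data), and the names `fit_line`, `x_sold` and `y_sold` are hypothetical.

```python
# Ordinary least squares for Y = beta0 + beta1*X, standard library only.
# All quantities below are invented for illustration.

def fit_line(xs, ys):
    """Return (beta0, beta1) that minimize the squared error of ys ~ beta0 + beta1*xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx                # slope: extra pieces of Y per extra piece of X
    beta0 = mean_y - beta1 * mean_x  # intercept: Y when X is zero
    return beta0, beta1

# Hypothetical history: pieces of product X and product Y sold per period.
x_sold = [100, 200, 300, 400, 500]
y_sold = [42, 88, 121, 170, 203]

beta0, beta1 = fit_line(x_sold, y_sold)
predicted_y = beta0 + beta1 * 500
print(f"Y = {beta0:.2f} + {beta1:.3f}*X")
print(f"If buying 500 of X, the line suggests about {predicted_y:.0f} of Y")
```

With more than one predictor the same idea applies, but the arithmetic is better left to Excel's regression tool or a statistics library.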
 

DoChenRollingBearing

Yellow Jacket
Supporter
Messages
1,275
Reaction score
0
Points
0
Location
SE USA
...

benjamen, a multiple regression along the lines of what you suggest looks like a great idea, thanks!

***

I am wondering about the very high correlations as well. Perhaps business data does not behave in the messy way of typical "Social Sciences" studies that show correlations of 0.3 - 0.6, for example.

But a correlation of 0.463 (as in one of my examples) does me little good as a buyer!

Another thing I need to look at is "spurious correlations" (The Signal and the Noise, Nate Silver). These would be pieces that appear related only "by chance". But as my data set gets larger, and as I do this more often (say every three months), those spurious correlations should lessen.
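
A rough way to see why more data helps is to simulate it. The sketch below is an illustration with random numbers, not real sales data: it draws pairs of completely unrelated series and counts how often they still show a strong correlation. The 0.8 cutoff, the sample sizes (5 and 50 periods) and the function names are all arbitrary assumptions.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

def share_of_strong_flukes(n_periods, trials=2000, cutoff=0.8, seed=1):
    """Fraction of unrelated random pairs whose |r| exceeds the cutoff."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n_periods)]
        b = [rng.gauss(0, 1) for _ in range(n_periods)]  # independent of a
        if abs(pearson_r(a, b)) > cutoff:
            hits += 1
    return hits / trials

frac_small = share_of_strong_flukes(5)   # e.g. only 5 periods of history
frac_large = share_of_strong_flukes(50)  # e.g. 50 periods of history
print(f"|r| > 0.8 by pure chance,  5 periods: {frac_small:.1%}")
print(f"|r| > 0.8 by pure chance, 50 periods: {frac_large:.1%}")
```

With only a handful of periods, a noticeable share of unrelated pairs clears the cutoff by luck; with a longer history that share shrinks toward zero.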

The regression method you suggest would at least give me a prediction as well as an error component. That could be very useful, and it is better than just "dead reckoning".

:cheers:
 

benjamen

Yellow Jacket
Messages
1,620
Reaction score
0
Points
0
Location
Migratory
How high you can realistically get your correlation really depends on what you are trying to model. For example, for amusement I tried every combination of a baseball team's statistics (RBIs, wins, HRs, etc.) during the season to predict who would win the World Series out of the initial 8 teams that make the playoffs (this was before they added more wildcards last year). I knew going into this little experiment that most likely nothing was going to be extremely correlated. I think the best combination of variables only produced a model that correctly picked the team ~40% of the time.

Things to watch for in your regression model:
R Square: You want this to be high; it is the fraction of the variance in Y that is explained by your model
Significance F: You want this to be low; roughly, it is the chance of seeing a fit this good if none of your predictors were actually related to Y
p-value of each predictor variable: You want this to be low; roughly, it is the chance of seeing a coefficient this far from zero if that predictor had no real effect
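
To make those three numbers less abstract, here is a small standard-library sketch for the one-predictor case, again on invented data. It computes R Square and the F statistic directly. For the "how likely is this by chance" number it swaps in a permutation test (shuffling Y and refitting) rather than the F-distribution calculation Excel uses, so treat the result as an approximation of Significance F, not the same figure. With a single predictor, Significance F and that predictor's p-value are the same thing.

```python
import random

def r_squared(xs, ys):
    """R Square for a one-predictor regression (the squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

def permutation_p_value(xs, ys, trials=5000, seed=0):
    """Share of random reshufflings of ys that fit at least as well as the real data."""
    rng = random.Random(seed)
    observed = r_squared(xs, ys)
    shuffled = list(ys)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between X and Y
        if r_squared(xs, shuffled) >= observed:
            at_least_as_good += 1
    return (at_least_as_good + 1) / (trials + 1)  # +1 avoids reporting exactly zero

# Hypothetical history: pieces of product X and product Y sold per period.
x_sold = [100, 150, 200, 250, 300, 350, 400, 500]
y_sold = [44, 61, 85, 98, 125, 138, 170, 203]

r2 = r_squared(x_sold, y_sold)
f_stat = r2 / (1 - r2) * (len(x_sold) - 2)  # F statistic with 1 and n-2 degrees of freedom
p_perm = permutation_p_value(x_sold, y_sold)
print(f"R Square = {r2:.3f}, F = {f_stat:.1f}, permutation p-value = {p_perm:.4f}")
```

A high R Square together with a small p-value is the pattern to look for; a high R Square on only a few data points with a large p-value is a warning sign.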

:cheers:
 