## Assignment 3 comment advice

The multiple regression and the simple regression give you different numbers for the effect of fertilizer on yield. Here are some ideas that you can use in your comment:

1. Rain and fertilizer are correlated in the data (from part 2 of your answer).
2. The difference in rain, between plots that had a lot of fertilizer and plots that had a little, may have contributed to the effect that fertilizer showed in the simple regression.
3. The multiple regression tried to separate the effects of fertilizer and rain. It gave some of the effect on yield to rain, taking it away from fertilizer.
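
If you want to see these three ideas at work, here is a minimal sketch in plain Python. The numbers are invented for illustration and are not the assignment's data: yield is constructed as exactly `10 + 0.05 * fertilizer + 2 * rain`, with no noise, so you know the "true" effects in advance. All variable and function names are made up for this sketch.

```python
# Hypothetical, noise-free toy data (NOT the assignment's data).
# Yield is built as 10 + 0.05 * fertilizer + 2 * rain, so the true effects are known.
fertilizer = [100, 200, 300, 400, 500, 600]  # pounds
rain = [10, 12, 18, 16, 24, 22]              # inches; tends to rise with fertilizer
crop_yield = [10 + 0.05 * f + 2 * r for f, r in zip(fertilizer, rain)]


def mean(values):
    return sum(values) / len(values)


def cross_dev(a, b):
    """Sum of products of deviations from the means (an unscaled covariance)."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))


s_ff = cross_dev(fertilizer, fertilizer)
s_rr = cross_dev(rain, rain)
s_fr = cross_dev(fertilizer, rain)
s_fy = cross_dev(fertilizer, crop_yield)
s_ry = cross_dev(rain, crop_yield)

# Idea 1: rain and fertilizer are correlated in the data.
correlation = s_fr / (s_ff * s_rr) ** 0.5

# Simple regression: yield on fertilizer alone.
simple_slope = s_fy / s_ff

# Multiple regression: yield on fertilizer and rain
# (closed-form solution of the two-predictor normal equations).
det = s_ff * s_rr - s_fr ** 2
multi_fert = (s_rr * s_fy - s_fr * s_ry) / det
multi_rain = (s_ff * s_ry - s_fr * s_fy) / det

# Ideas 2 and 3: the extra effect in the simple slope is rain's effect,
# passed along by the way rain moves with fertilizer in the data.
rain_per_pound = s_fr / s_ff  # slope of rain regressed on fertilizer
borrowed = multi_rain * rain_per_pound

print(f"correlation(rain, fertilizer)   = {correlation:.3f}")
print(f"simple regression, fertilizer   = {simple_slope:.4f}")
print(f"multiple regression, fertilizer = {multi_fert:.4f}")
print(f"multiple regression, rain       = {multi_rain:.4f}")
print(f"fertilizer + borrowed from rain = {multi_fert + borrowed:.4f}")
```

In this toy data the simple regression's fertilizer coefficient comes out larger than the multiple regression's, and the gap equals the rain coefficient times the slope of rain on fertilizer. With real, noisy data the same accounting holds for the fitted coefficients, but the multiple regression will not recover the true effects exactly.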

To explain the difference in prediction between the two methods, you can use these ideas, in addition to (or instead of) the ideas in Assignment 3:

1. Rain and fertilizer are correlated in the data. (See part 2 of your answer.)
2. The simple regression predicts well only if rain and fertilizer continue to move together the way they did in the data. In effect, if you give a plot 800 pounds of fertilizer, the simple regression predicts as if the plot also got 30 inches of rain.
3. You were predicting for 800 pounds of fertilizer but only 20 inches of rain.
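
The prediction argument above can be checked with a small sketch in plain Python. Again, the data are invented (yield is constructed as `10 + 0.05 * fertilizer + 2 * rain`, no noise) and are not the assignment's data; only the 800 pounds and 20 inches come from the text. The names are hypothetical.

```python
# Hypothetical, noise-free toy data (NOT the assignment's data).
fertilizer = [100, 200, 300, 400, 500, 600]  # pounds
rain = [10, 12, 18, 16, 24, 22]              # inches; tends to rise with fertilizer
crop_yield = [10 + 0.05 * f + 2 * r for f, r in zip(fertilizer, rain)]


def mean(values):
    return sum(values) / len(values)


def cross_dev(a, b):
    """Sum of products of deviations from the means (an unscaled covariance)."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))


s_ff = cross_dev(fertilizer, fertilizer)
s_rr = cross_dev(rain, rain)
s_fr = cross_dev(fertilizer, rain)
s_fy = cross_dev(fertilizer, crop_yield)
s_ry = cross_dev(rain, crop_yield)

# Simple regression: yield on fertilizer alone.
simple_slope = s_fy / s_ff
simple_intercept = mean(crop_yield) - simple_slope * mean(fertilizer)

# Multiple regression: yield on fertilizer and rain.
det = s_ff * s_rr - s_fr ** 2
multi_fert = (s_rr * s_fy - s_fr * s_ry) / det
multi_rain = (s_ff * s_ry - s_fr * s_fy) / det
multi_intercept = (mean(crop_yield)
                   - multi_fert * mean(fertilizer)
                   - multi_rain * mean(rain))

# How rain has moved with fertilizer in the data: rain regressed on fertilizer.
rain_slope = s_fr / s_ff
rain_intercept = mean(rain) - rain_slope * mean(fertilizer)

new_fert, new_rain = 800, 20  # the case you were asked to predict

simple_pred = simple_intercept + simple_slope * new_fert
implied_rain = rain_intercept + rain_slope * new_fert  # rain the simple model "assumes"
multi_pred = multi_intercept + multi_fert * new_fert + multi_rain * new_rain
multi_pred_at_implied = (multi_intercept + multi_fert * new_fert
                         + multi_rain * implied_rain)

print(f"rain implied by 800 lb of fertilizer:   {implied_rain:.1f} inches")
print(f"simple regression prediction:           {simple_pred:.1f}")
print(f"multiple regression at implied rain:    {multi_pred_at_implied:.1f}")
print(f"multiple regression at 20 inches:       {multi_pred:.1f}")
```

In this toy data, 800 pounds of fertilizer goes with roughly 29 inches of rain, and the simple regression's prediction matches what the multiple regression predicts at that implied rainfall. Feeding the multiple regression the actual 20 inches gives a lower number, which is the gap you are asked to explain.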

Here's another way to think about it:

Imagine that, instead of rain from heaven, the different amounts of water on the different fields came from a practical joker who deliberately put more water on the fields on which you put more fertilizer. He wants to fool you into thinking that the fertilizer's effect is bigger than it really is. (Maybe he sells fertilizer for a living!) If you use the simple regression to predict, you will fall for his scheme. If you then apply 800 pounds of fertilizer -- an amount that is way above the average of the other fields -- to a new field, and if the joker doesn't add a corresponding above-average amount of water, the yield will disappoint you. You won't get as much as the simple regression predicted.

By the way, if your rain coefficient is not statistically significant, the difference between the two predictions will not be statistically significant either. If you compare the confidence intervals of the predictions, you'll see that each prediction is inside the other prediction's 95% confidence interval.