Replace "independent variables" with "modeling variables"

The meanings of the words dependent and independent are not difficult to understand. Independent simply means not dependent. It could refer to a person, a molecule, a cell, a country, or to any kind of object. The point is that it is not influenced or controlled by anything except itself. It is IN-dependent. Of course, no person or nation can be completely independent, and there are degrees of dependence and independence. This is the first problem with these words: they have a black-or-white character.

The second problem appears when using them in the context of modeling. They guide us to think about cause-and-effect. It seems very reasonable to believe that the “dependent variables” are depending on something. And because there are no other variables around, the reader is guided to think that they are depending on the “independent variables”.

Although we already see that there is a problem with these words, let us do our best to understand them when used in data modeling, and to make them fit a modeling context.

“Independent” could mean that the variables are not depending on anything at all. But what could be an example of such variables? Think about this and you will realize that only random variables are like that. Random variables are by definition not caused or affected by anything at all! So the words “independent variables” can be used for random numbers. But in data modeling, we usually don’t work with random numbers.

“Independent” could also mean that there is no correlation between the variables, that the variables are independent of each other. The level of one variable will not be related to the level of any other of the independent variables. Such sets of independent variables can be created by using designed experiments where only selected variables having (close to) zero correlation are allowed. An interesting point in this context is that, besides the designed variables, random variables also have (close to) zero correlation with each other. But again, we usually don’t work with random numbers in data modeling.
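The zero-correlation point can be checked with a small sketch. It assumes a two-level full factorial design with three variables coded as -1 and +1, which is only one common kind of designed experiment, and the function and variable names (pearson, design, r1, r2) are made up for this illustration. The designed columns are uncorrelated by construction, while two columns of random numbers are only approximately uncorrelated in a finite sample.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Assumed example: full factorial design, every combination of
# low (-1) and high (+1) settings for three variables (8 runs).
design = list(itertools.product([-1, 1], repeat=3))
columns = list(zip(*design))  # one tuple of settings per variable

# Correlation for every pair of designed variables.
design_corr = [
    pearson(columns[i], columns[j])
    for i, j in itertools.combinations(range(3), 2)
]

# Two columns of random numbers (fixed seed so the run is repeatable).
rng = random.Random(0)
r1 = [rng.random() for _ in range(1000)]
r2 = [rng.random() for _ in range(1000)]
random_corr = pearson(r1, r2)

print("designed pairs:", design_corr)
print("random pair:   ", random_corr)
```

Measured variables from a real system will rarely behave like either of these two cases, which is the point of the argument.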

The third problem with the words “dependent” and “independent” is therefore that no matter how we try to make “independent variables” understandable, they relate only to tightly controlled experiments, or to random variables. Fortunately, nature has more to offer than this. Fortunately, the world is not black and white.

The way out of the swamp is to stop using “independent” and “dependent” and to use other words. One way is to use “predictor variables” instead of “independent variables” and “response variables” instead of “dependent variables”. I have often copied these from the literature and I have used them myself in my own scientific writing, but I never felt completely happy about them. The reason is that they, too, have a tendency to guide the reader towards cause-and-effect thinking.

The better way would be to use “modeling variables” for the variables that make up the model instead! “Model” is a very good word to explain that something is a simplification. “It’s not the real thing, it’s a model”. Even better, “modeling variables” are simplifications in TWO ways. First, they are a simplification in themselves, because not all the variables of the universe are included. Second, there will always be a part of the variables that does not relate to the system under study or that cannot be described, for example random errors. Therefore, when a mathematical model is created from the modeling variables, a further step of simplification is taken. Again, it is great that the word “model” is used. It is clear from the beginning that it is an approximation. Another good point is that “modeling variables” is a neutral expression, clearly stating that it is NOT about cause-and-effect.

Finally, use “predicted variables” or “estimated variables” if we are trying to calculate other variables using e.g. regression. In this way, we will say that the modeling variables can be used to obtain an estimated or predicted variable.
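As a minimal sketch of this wording in practice, the example below fits a straight line by ordinary least squares and names the results accordingly. The numbers are invented for illustration only, and fit_line, modeling_variable, measured and predicted_variable are hypothetical names, not taken from any library.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Invented data: one modeling variable and the measured values
# we would like to estimate from it.
modeling_variable = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(modeling_variable, measured)

# The model turns the modeling variable into a predicted variable.
predicted_variable = [intercept + slope * v for v in modeling_variable]

# The part the model does not describe, for example random errors.
residuals = [m - p for m, p in zip(measured, predicted_variable)]

print("predicted variable:", predicted_variable)
print("residuals:         ", residuals)
```

Nothing in the fit says that the modeling variable causes the measured values. The model only states that one can be estimated from the other, within the residuals.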
