Technical Report number 413
University of Wisconsin - Madison
I want to share some ideas about teaching design of experiments. They are related to something I have often wondered about: whether it is possible to let students experience first-hand all the steps involved in an experimental investigation: thinking of the problem, deciding what experiments might shed light on the problem, planning the runs to be made, carrying them out, analyzing the results, and writing a report summarizing the work. One curiosity about most courses on experimental design, it seems to me, is that students get no practice designing realistic experiments although, from homework assignments, they do get practice analyzing data. Clearly, however, because of limitations of time and money, if students are to design experiments and actually carry them out, they cannot be involved with elaborate investigations. Therefore, the key question is this: Is it feasible for students to devise their own simple experiments and carry them through to completion and, if so, is it of any educational value to have them do so? I believe the answer to both parts of the question is yes, and the purpose of this paper is to explain why.
The particular design course I have taught most often is a one-semester course that includes these standard statistical techniques: t-tests (paired and unpaired), analysis of variance (primarily for one-way and two-way layouts), factorial and fractional factorial designs (emphasis given to two-level designs), the method of least squares (for linear and nonlinear models), and response surface methodology. The value of randomization and blocking is stressed. Special attention is given to these questions: What are the assumptions being made? What if they are violated? What common pitfalls are encountered in practice? What precautions can be taken to avoid these pitfalls? In analyzing data how can one determine whether the model is adequate? Homework problems provide ample opportunity for carefully examining residuals, especially by plotting them. The material for this course is discussed in the context of the iterative nature of experimental investigations.
Most of those who have taken this course have been graduate students, principally in engineering (chemical, civil, mechanical, industrial, agricultural) but also in a variety of other fields including statistics, food science, forestry, chemistry, and biology. There is a prerequisite of a one-semester introductory statistics course, but this requirement is customarily waived for graduate students with the understanding that they do a little extra work to catch up.
One possibility is to use simulated data, and the scope here is wide, especially with the availability of computers. At times I have given assignments of this kind, especially response surface problems. Each student receives his or her own sets of data based upon the designs he or she chooses.
The problem might be set up as one involving a chemist who wishes to find the best settings of these five variables-temperature, concentration, pH, stirring rate, and amount of catalyst-and to determine the local geography of the response surface(s) near the optimum. To define the region of operability, ranges are specified for each of these variables. Perhaps more than one response can be measured, for instance, yield and cost. The student is given a certain budget, either in terms of runs or money, the latter being appropriate if there is an option provided for different types of experiments which have different costs. The student can ask for data in, say, three stages. Between these stages the accumulated data can be analyzed so that future experiments can be planned on the basis of all available information.
In generating the data, which contains experimental error, there are many possibilities. Different models can be used for each student, the models not necessarily being the usual simple first-order or second-order linear models. Not all variables need to be important, that is, some may be dummy variables (different ones for different students). Time trends and other abnormalities can be deliberately introduced into the data provided to the students.
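The data-generating scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model form, its coefficients, the error standard deviation, and the function names `true_yield` and `run_experiments` are all hypothetical and are not taken from the course. The five variables are expressed in coded units, with -1 and +1 marking the ends of the region of operability, and two of them are dummies.

```python
import random

# Hypothetical "true" model, known only to the instructor. All coefficients
# are invented for illustration. Variables are in coded units
# (-1 = low end of the allowed range, +1 = high end).
# Concentration and stirring rate are dummy variables: they do not enter the model.
def true_yield(temp, conc, ph, stir, cat):
    return (60.0 + 5.0 * temp + 3.0 * ph + 2.0 * cat
            - 4.0 * temp ** 2 - 2.0 * ph ** 2 + 1.5 * temp * ph)

def run_experiments(design, sigma=2.0, drift_per_run=0.0, seed=None):
    """Return one simulated yield for each requested run.

    design        -- list of 5-tuples (temp, conc, ph, stir, cat), coded units
    sigma         -- standard deviation of the experimental error
    drift_per_run -- optional linear time trend, added in run order
    seed          -- seed for the random number generator (for reproducibility)
    """
    rng = random.Random(seed)
    results = []
    for i, point in enumerate(design):
        # Reject runs that fall outside the region of operability.
        if any(abs(x) > 1 for x in point):
            raise ValueError(f"run {i} is outside the region of operability")
        y = true_yield(*point) + drift_per_run * i + rng.gauss(0.0, sigma)
        results.append(round(y, 1))
    return results
```

A student's stage of the investigation then amounts to one call such as `run_experiments(chosen_design, seed=stage_seed)`. A different model, different dummy variables, or a nonzero `drift_per_run` can be assigned to each student.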
The student prepares a report including a summary of the most important facts discovered about his or her system and perhaps containing a contour map of the response surface(s) for the two most important variables (if three of the five variables are dummies, this map should correspond to the true surface from which the data were generated). It is instructive then to compare each student's findings with the corresponding true situation.
Students enjoy games of this type and learn a considerable amount from them. For many it is the first time they realize just how frustrating the presence of an appreciable amount of experimental error can be. The typical prearranged undergraduate laboratory experiments in physics and chemistry, of course, have all important known sources of experimental error removed (typically the data are supposed to fall on a straight line-exactly-or else).
A few years ago I asked each student taking the course to perform an experiment of his or her own devising, thereby giving rise to real rather than simulated data. The students were given three weeks to complete this assignment and hand in a detailed report describing what they had done and what they had learned. The students obviously enjoyed the project and derived quite a bit from it. Consequently, I have repeated the assignment every semester I have taught the course since then.
One's first reaction might be that there are not enough possibilities for experiments of this kind. But this is incorrect, as is illustrated by Table 1, which lists some of the experiments reported by the students. Experiments number 1-63 are of the home type and experiments number 64-101 are of the laboratory type. Note the variety of studies done. To save space, for most variables the levels used are not given. Anyway, they are not essential for our purposes here. Most of these experiments were factorial designs. Let us look briefly at the first two home experiments and the first two laboratory experiments.
In experiment number 1 the student, Norman Miller, using a factorial design with all points replicated, studied the effects of three variables-seat height (26, 30 inches), light generator (on or off), and tire pressure (40, 55 psi)-on two responses-time required to ride his bicycle over a particular course and his pulse rate at the finish of each run (pulse rate at the start was virtually constant). To him the most surprising result was how much he was slowed down by having the generator on. The average time for each run was approximately 50 seconds. He discovered that raising the seat reduced the time by about 10 seconds, having the generator on increased it by about one-third that amount and inflating the tires to 55 psi reduced the time by about the same amount that the generator increased it. He planned further experiments.
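To make the arithmetic behind such statements concrete, here is a minimal Python sketch of how main effects are estimated in a two-level factorial design: the average response at the high level of a factor minus the average at its low level. The times used are not the student's data. They are noise-free values constructed from the approximate figures quoted above (a mean near 50 seconds, a seat effect of about -10 seconds, and generator and tire-pressure effects of about one-third that size), and the function names are hypothetical.

```python
from itertools import product

def main_effects(runs):
    """Estimate the main effect of each factor in a two-level design.

    runs -- list of (levels, response) pairs, where levels is a tuple of
            -1 (low) or +1 (high), one entry per factor.
    Returns a list: mean response at +1 minus mean response at -1, per factor.
    """
    k = len(runs[0][0])
    effects = []
    for j in range(k):
        high = [y for levels, y in runs if levels[j] == +1]
        low = [y for levels, y in runs if levels[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects

# Illustrative model only (NOT the original data): each coefficient is half
# of the corresponding approximate effect mentioned in the text.
def illustrative_time(seat, generator, tires):
    return 50.0 - 5.0 * seat + (5.0 / 3.0) * generator - (5.0 / 3.0) * tires

# Full 2^3 factorial in the factors (seat height, generator, tire pressure).
runs = [((s, g, t), illustrative_time(s, g, t))
        for s, g, t in product((-1, +1), repeat=3)]

seat_effect, generator_effect, tire_effect = main_effects(runs)
```

With replicated runs, as in the student's experiment, the same function applies unchanged: each replicate simply appears as an additional `(levels, response)` pair.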
In experiment number 2 the student, Karen Vlasek, using a factorial design with four replicated center points, determined the effects of three variables on the amount of popcorn produced. She found, for example, that although double the yield was obtained with the gourmet popcorn, it cost three times as much as the regular popcorn. By using this experimental design she discovered approximately what combination of variables gave her best results. She noted that it differed from those recommended by the manufacturer of her popcorn popper and both suppliers of popcorn.
In experiment number 64 the student, Dean Hafeman, studied a routine laboratory procedure (a dilution) that was performed many times each day where he worked-almost on a mass production basis. The manufacturer of the equipment used for this work emphasized that the key operations, the raising and lowering of two plungers, had to be done slowly for good results. The student wondered what difference it would make if these operations were done quickly. He set up a factorial design in which the variables were the raising and lowering of plunger A and the raising and lowering of plunger B. The two levels of each variable were slow and fast. To his surprise, he found that none of the variables had any measurable effect on the readings. This conclusion had important practical implications in his laboratory because it meant that good results could be obtained even if the plungers were moved quickly; consequently a considerable amount of time could be saved in doing this routine work.
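A design of this kind, four variables each at two levels, has sixteen distinct runs. The following Python sketch lists them and puts them in random order, in keeping with the emphasis the course places on randomization. It is an illustration only: the report summarized here does not say how the student ordered his runs, and the factor labels and the function name `randomized_full_factorial` are assumptions.

```python
import random
from itertools import product

# Factor labels paraphrased from the description above; the level names
# "slow" and "fast" are the two levels mentioned in the text.
FACTORS = ["raise plunger A", "lower plunger A",
           "raise plunger B", "lower plunger B"]
LEVELS = ("slow", "fast")

def randomized_full_factorial(factors, levels, seed=None):
    """Return every combination of levels, one dict per run, in random order."""
    runs = [dict(zip(factors, combo))
            for combo in product(levels, repeat=len(factors))]
    random.Random(seed).shuffle(runs)  # randomize the run order
    return runs

plan = randomized_full_factorial(FACTORS, LEVELS, seed=1)
for run_number, settings in enumerate(plan, start=1):
    print(run_number, settings)
```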
In experiment number 65 the student, Rodger Melton, solved a trouble-shooting problem that he encountered in his research work. In one piece of his apparatus an extremely small quantity of a certain chemical was distilled to be collected in a second piece of the apparatus. Unfortunately, some of this material condensed prematurely in the line between these two pieces of apparatus. Was there a way to prevent this? By using a factorial design the problem was solved, it being discovered that by suitably adjusting the voltage and using a J-tube none of the material condensed prematurely. The column temperature, which was discovered to be of minor consequence as far as premature condensation was concerned (a surprise), could be set to maximize throughput.
The most popular home experiments have concerned cooking since recipes lend themselves so readily to variations. What to measure for the response has sometimes created a problem. Usually a quality characteristic such as taste has been determined (preferably independently by a number of judges) on a 1-5 or 1-10 scale. Growing seeds has also been an easy and popular experiment. In the laboratory experiments, sensitivity or robustness tests have been the most common (the dilution experiment, number 64, discussed above is of this type). Typically the experimenter varies the conditions for a standard analytical procedure (for example, for the measurement of chemical oxygen demand, COD) to see how much the measured value is affected. That is, if the standard procedure calls for the addition of 20 ml. of a particular chemical, 18 ml. and 22 ml. might be tried. Results from such tests are revealing no matter which way they turn out. One student, for example, concluded ``The results sort of speak for themselves. The test is not very robust.'' Another student, who studied a different test, reported ``The results of the Yates analysis show that the COD test is indeed robust.''
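The Yates analysis mentioned in the last quotation is a tabular method for computing all the effects of a two-level factorial design by repeated pairwise sums and differences. A minimal Python sketch follows. It assumes one response per run, listed in standard (Yates) order with the first factor changing fastest, and the function name `yates` and the sample readings are hypothetical, not taken from any student report.

```python
def yates(responses):
    """Yates's algorithm for an unreplicated 2^k factorial design.

    responses -- the 2^k observations in standard order:
                 (1), a, b, ab, c, ac, bc, abc, ...
    Returns (mean, effects), where effects maps labels such as
    "A", "B", "AB" to the estimated effects.
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("number of responses must be a power of two (>= 2)")

    col = list(responses)
    for _ in range(k):
        # Each pass: sums of adjacent pairs, then differences (second - first).
        sums = [col[i] + col[i + 1] for i in range(0, n, 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, n, 2)]
        col = sums + diffs

    mean = col[0] / n
    effects = {}
    for i in range(1, n):
        # Bit j of the row index tells whether factor j appears in the label.
        label = "".join(chr(ord("A") + j) for j in range(k) if i >> j & 1)
        effects[label] = col[i] / (n / 2)  # contrast divided by 2^(k-1)
    return mean, effects

# Made-up readings for a 2^2 design, in the order (1), a, b, ab.
mean, effects = yates([10, 14, 20, 28])
```

For these made-up readings the effect of A works out to 6, which agrees with the direct calculation: the average of the two runs with A high, (14 + 28)/2 = 21, minus the average with A low, (10 + 20)/2 = 15. In a robustness test, effects that are small compared with the experimental error suggest that the procedure is insensitive to the conditions varied.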
I have always made these assignments completely open, saying that they could study anything that interested them. I have tended to favor home rather than laboratory experiments. I have suggested they choose something they care about, preferably something they've wondered about. Such projects seem to turn out better than those picked for no particularly good reason. Here is how a few of the reports began: ``Ever since we came to Madison my family has experienced difficulty in making bread that will rise properly.'' ``Since moving to Madison, my green thumb has turned black. Every plant I have tried to grow has died.'' (Nothing works in Madison?) ``This experiment deals with how best to prepare pancakes to satisfy the group of four of us living together.'' ``I rent an efficiency on the second floor of an apartment building which has cooking facilities on the first floor only. When I cook rice, my staple food, I have to make one to three visits to the kitchen to make sure it is ready to be served and not burned. Because of this inconvenience, I wanted to study the effects of certain variables on the cooking time of rice.'' ``My wife and I were wondering if our oldest daughter had a favorite toy.'' ``For the home brewer, a small kitchen blender does a good job of grinding malt, provided the right levels of speed, batch size and time are used. This is the basis of the experimental design.'' ``During my career as a beer drinker, various questions have arisen.'' ``I do much of the maintenance and repair work around my home, and some of the repairs require the use of epoxy glue. I was curious about some of the factors affecting its performance.'' ``My wife and I are interested in indoor plants, and often we like to give them as gifts. We usually select a cutting from one of our fifty or so plants, put it in a glass of water until it develops roots, and then pot it. We have observed that sometimes the cutting roots quickly and sometimes it roots slowly, so we decided to experiment with several factors that we thought might be important in this process.'' ``I chose to find out how my shotguns were firing. I reload my own shells with powders that were recommended to me, one for short range shooting and one for long range shooting. I had my doubts if the recommendations were valid.''
The conclusion reached in this last experiment was: ``As it looks now, I should use my Gun A with powder C for close range shooting, such as for grouse and woodcock. I should use gun B and powder D for longer range shooting as for ducks and geese.'' As is illustrated by this example and the first four discussed above, the students sometimes learned things that were directly useful to them. Some other examples: ``Spending $70 extra to buy tape deck 2 is not justified as the difference in sound is better with the other, or probably there is no difference. The synthesizer appears not to affect the quality of the sound.'' ``In operating my calculator I can anticipate increasing operation time by an additional 15 minutes and 23 seconds on the average by charging 60 minutes instead of 30 minutes.'' ``In conclusion, the Chinese dumplings turned out very pretty and very delicious, especially the ones with thin skins. I think this was a successful experiment.''
Naturally, not all experiments were successful. ``A better way to have run the experiment would have been to...'' Various troubles arose. ``The reason that there is only one observation for the eighth row is that one of the cups was knocked over by a curious cat.'' ``One observation made during the experiment was that the child's posture may have affected the duration of the ride. Mark (13 pounds) leaned back, thus distributing his weight more evenly. On the other hand, Mike (22 pounds) preferred to sit forward, which may have made the restoring action of the spring more difficult.'' (The trouble here was that the variable the student wanted to study was weight, not posture.) Another student, who was studying factors that affected how fast snow melted on sidewalks, had some of his data destroyed because the sun came out brightly (and unexpectedly) one day near the end of his experiment and melted all the snow.
Because of such troubles these simple experiments have served as useful vehicles for discussing important practical points that arise in more serious scientific investigations. Excellent questions for this purpose have arisen from these studies. ``Do I really need to use a completely randomized experiment? It will take much longer to do that way?'' There have been good examples that illustrate the sequential nature of experimentation and show how carefully conceived experimental designs can help in solving problems. ``...This must have been the main reason why the first experiment completely failed. I decided to try another factorial design. Synchronization of the flash unit and camera still bothered me. I decided to experiment with...'' some other factors.
As a result of these projects students seem to get a much better appreciation of the efficiency and beauty of experimental designs. For example, in this last experiment the student concluded: ``The factorial design proved to be efficient in solving the problem. I did get off on the wrong track initially, but the information learned concerning synchronization is quite valuable.'' Another student: ``It is interesting to see how a few experiments can give so much information.''
There is another point, and it is not the least important. Most of the students had fun with these projects. And I did, too. Just looking through Table 1 suggests why this is so, I think. One report ended simply: ``This experiment was really fun!'' Many students have reported that this was the best part of the course.
There is a tendency sometimes for experimenters to discount what they have learned, this being true not only for students in this class, but also for experimenters in general. That is, they learn more than they realize. Hindsight is the culprit. On pondering a certain conclusion, one is prone to say ``Oh yes, that makes sense. Yes, that's the way it should be. That's what I would have expected.'' While this reaction is often correct, one is sometimes just fooling oneself, that is, interrogation at the outset would have produced exactly the opposite opinion. So that students could more accurately gauge what they learned from their simple experiments, I tried the following and it seemed to work: after having decided on the experimental runs to perform, the student guessed what his or her major conclusions would be and wrote them down. Upon completion of the assignment, these guesses were checked against the actual results, which immediately provided a clear picture of what was learned (the surprises) and what was confirmed (the non-surprises).
I now tend to spend much more time introducing each new topic than I used to. Providing appropriate motivation is extremely important. For classes I have had the privilege of teaching-whether in universities or elsewhere-I have found that it has been better to use concrete examples followed by the general theory rather than the reverse. I now try to describe a particular problem in some detail, preferably a real one with which I am familiar, and then pose the question: What would YOU do? I find it helpful to resist the temptation to move on too quickly to the prepared lecture so that there is ample time for students to consider this question seriously, to discuss it, to ask questions of clarification, to express ideas they have, and ultimately (and this is really the object of the exercise) to realize that a genuine problem exists and they do not know how to solve it. They are then eager to learn. And after we have finished with that particular topic they know they have learned something of value. (I realize as I write this that I have been strongly influenced by George Barnard, who masterfully conducted a seminar in this manner at Imperial College, London, in 1964-65, which I was fortunate to have attended.)
Current examples are well-received, especially controversies (for example, weather modification experiments). Some useful sources are court cases, advertisements, TV and radio commercials, and ``Consumer Reports''. An older controversy still of considerable interest from a pedagogical point of view is the AD-X2 battery additive case. Gosset's comments on the Lanarkshire Milk Experiment are still illuminating. Sometimes trying to get the data that support a particular TV commercial or the facts from both parties of a dispute has made an interesting side project to carry along through a semester.
Having each student exercise his or her own initiative in thinking up an experiment and carrying it through to completion has turned out successfully. Using games involving simulated data has also been useful. I have incorporated such projects, principally of the former type, into courses I have taught, and I urge others to consider doing the same. Why?
First of all, it's fun. The students have generally welcomed the opportunity to learn something about a particular question they have wondered about. I have been fascinated to see what they have chosen to study and what conclusions they have reached, so it has been fun for me, too. The students and I have certainly learned interesting things we did not know before. Why doesn't my bread rise? Why don't my flowers grow? Is this analytical procedure robust? Will carrying a crutch make it easier for me to get a ride hitchhiking? (Incidentally, it made it harder.)
Secondly, the students have gotten a lot out of such experiences. There is a definite deepening of understanding that comes from having been through a study from start to finish-deciding on a problem, the variables, the ranges of the variables, and how to measure the response(s), actually running the experiment and collecting the data, analyzing the results, learning what the practical consequences are, and finally writing a report. Being veterans, not of the war certainly but of a minor skirmish at least, the students seem more comfortable and confident with the entire subject of the design of experiments, especially as they share their experiences with one another.
Thirdly, I have found it particularly worthwhile to discuss with them in class some of the practical questions that naturally emerge from these studies. ``What can I do about missing data?'' ``These first three readings are questionable because I think I didn't have my technique perfected then-What should I do?'' ``A most unusual thing happened during this run, so should I analyze this result with all the others or leave it out?'' They are genuinely interested in such questions because they have actually encountered them, not just read about them in a textbook. Sometimes there is no simple answer, and lively and valuable discussions then occur. Such discussions, I hope, help them understand that, when they confront real problems later on which refuse to look like those in the textbooks no matter how they are viewed, there are alternatives to pretending they do and charging ahead regardless or forgetting about them in hopes they will go away or adopting a ``non-statistical'' approach-in a word, there are alternatives to panic.
Table 1. List of some studies done by students in an experimental design course.