Predictive Hacks

# Orthogonal Arrays with R

Whenever we can control how the data is collected, we should take advantage of it: for example, by applying sampling techniques in a poll, or by assigning participants to groups and treatments in a medical study.

In this post we give an example of Orthogonal Arrays, which are one member of the family of Experimental Designs.

## Scenario

Let’s assume that Joe sometimes suffers from stomach ache during the night. His gastroenterologist suspects that his diet is responsible for these occasional symptoms. Let’s also assume that Joe’s diet includes:

• Breakfast: Sandwich or Pancakes or Omelette or Yogurt with Honey and Nuts
• Morning Beverage: Coffee or Orange Juice
• Lunch: Pork or Fish or Chicken or Salad
• Dinner: Pasta or Rice or Milk with Cereals or Pizza
• Dessert: Ice-Cream or Nothing
• Night Drink: Tea or Wine

So the total number of possible combinations is $$4\times 2 \times 4 \times 4 \times 2 \times 2=512$$. The doctor would like to detect which food(s) may cause this discomfort, and he plans to use Orthogonal Arrays. Assuming that there is no interaction between the meals, he will ask Joe to follow a diet built from such an array.
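The count can be reproduced in base R. This is a minimal sketch; the names `menu`, `n_levels` and `full_factorial` are introduced here for illustration only.

```r
# Joe's menu: one vector of options per factor
menu <- list(
  Breakfast = c("Sandwich", "Pancakes", "Omelette", "Yogurt+Honey+Nuts"),
  Beverage  = c("Coffee", "Orange Juice"),
  Lunch     = c("Pork", "Fish", "Chicken", "Salad"),
  Dinner    = c("Pasta", "Rice", "Milk+Cereals", "Pizza"),
  Dessert   = c("Ice-Cream", "Nothing"),
  Drink     = c("Tea", "Wine")
)

# Number of levels per factor and their product
n_levels <- lengths(menu)
prod(n_levels)

# The full factorial design: one row per possible daily menu
full_factorial <- expand.grid(menu, stringsAsFactors = FALSE)
nrow(full_factorial)
```

Testing every row of `full_factorial` would take well over a year of daily menus, which is why a much smaller orthogonal subset is attractive.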

Question: Which Orthogonal Arrays are available for this case?

Answer: Notice that we have 3 factors with 4 levels and 3 factors with 2 levels. Using the DoE.base library we can list the arrays that accommodate them.

library(DoE.base)

## the orthogonal arrays with 3 4-level factors and 3 2-level factors
show.oas(factors = list(nlevels=c(4,2),number=c(3,3)))

5  resolution IV or more  arrays found
name nruns lineage
10   L64.2.8.4.3    64
12   L64.2.6.4.4    64
23 L128.2.20.4.3   128
26 L192.2.36.4.3   192
29 L256.2.52.4.3   256
990  orthogonal  arrays found,
the first  10  are listed
name nruns                           lineage
17      L16.2.6.4.3    16                   4~5;:(4~1!2~3;)
18      L16.2.3.4.4    16                   4~5;:(4~1!2~3;)
53     L32.2.22.4.3    32 4~8;8~1;:(8~1!2~4;4~1;)(4~1!2~3;)
55     L32.2.19.4.4    32 4~8;8~1;:(8~1!2~4;4~1;)(4~1!2~3;)
57     L32.2.16.4.5    32 4~8;8~1;:(8~1!2~4;4~1;)(4~1!2~3;)
59 L32.2.15.4.3.8.1    32               4~8;8~1;:(4~1!2~3;)
60     L32.2.13.4.6    32 4~8;8~1;:(8~1!2~4;4~1;)(4~1!2~3;)
61 L32.2.12.4.4.8.1    32               4~8;8~1;:(4~1!2~3;)
62     L32.2.10.4.7    32 4~8;8~1;:(8~1!2~4;4~1;)(4~1!2~3;)
63  L32.2.9.4.5.8.1    32               4~8;8~1;:(4~1!2~3;)

From the R output we can see that 16 is the minimum number of runs for this experiment. The ID code of this design is L16.2.6.4.3, which tells you that the array has 16 runs and can accommodate up to six 2-level factors and three 4-level factors.
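A quick counting argument supports a minimum of 16 runs. The sketch below is not taken from DoE.base; it only checks two necessary conditions for a strength-2 array, and the helper names `gcd` and `lcm` are defined here for illustration. First, the main-effects model needs one run per parameter. Second, every pair of factors must show each level combination equally often, so the number of runs must be a multiple of the product of their level counts.

```r
levels_per_factor <- c(4, 4, 4, 2, 2, 2)

# Parameters in a main-effects model: intercept + (levels - 1) per factor
df_bound <- 1 + sum(levels_per_factor - 1)

# Runs must be a multiple of s_i * s_j for every pair of factors
pair_products <- combn(levels_per_factor, 2, FUN = prod)

gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)
lcm <- function(a, b) a / gcd(a, b) * b
run_multiple <- Reduce(lcm, pair_products)

# Smallest multiple of run_multiple that is at least df_bound
min_runs <- run_multiple * ceiling(df_bound / run_multiple)
min_runs
```

These conditions only give a lower bound; the `show.oas` listing is what confirms that a 16-run array actually exists.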

Question: What is the recommended diet for Joe?

Answer: The doctor could ask Joe to follow the diet below for the next 16 days. Notice that 16 was the minimum number of runs that we found for this experimental design.

OA<-oa.design(nruns=16, factor.names=list(Breakfast=c("Sandwich","Pancakes","Omelette", "Yogurt+Honey+Nuts"), Beverage=c("Coffee","Orange Juice"),
Lunch=c("Pork","Fish", "Chicken", "Salad"), Dinner=c("Pasta","Rice", "Milk+Cereals", "Pizza"),
Dessert=c("Ice-Cream","Nothing"), Drink=c("Tea", "Wine")))

OA

           Breakfast     Beverage   Lunch         Dinner   Dessert Drink
1  Yogurt+Honey+Nuts       Coffee    Fish Milk+Cereals Ice-Cream   Tea
2  Yogurt+Honey+Nuts Orange Juice   Salad        Pizza   Nothing   Tea
3           Omelette Orange Juice Chicken Milk+Cereals Ice-Cream  Wine
4           Pancakes       Coffee    Fish         Rice   Nothing  Wine
5           Omelette Orange Juice    Fish        Pasta   Nothing   Tea
6           Pancakes       Coffee Chicken        Pizza Ice-Cream   Tea
7  Yogurt+Honey+Nuts Orange Juice    Pork         Rice Ice-Cream  Wine
8           Pancakes Orange Juice   Salad        Pasta Ice-Cream  Wine
9           Sandwich       Coffee    Pork        Pasta Ice-Cream   Tea
10          Sandwich       Coffee   Salad Milk+Cereals   Nothing  Wine
11 Yogurt+Honey+Nuts       Coffee Chicken        Pasta   Nothing  Wine
12          Sandwich Orange Juice    Fish        Pizza Ice-Cream  Wine
13          Omelette       Coffee   Salad         Rice Ice-Cream   Tea
14          Omelette       Coffee    Pork        Pizza   Nothing  Wine
15          Sandwich Orange Juice Chicken         Rice   Nothing   Tea
16          Pancakes Orange Juice    Pork Milk+Cereals   Nothing   Tea


Every row in the table above represents one day.

A good check is to confirm that the factor levels are balanced pairwise. Let’s take two factors as an example:

aggregate(Lunch~Breakfast+Dessert, OA, length)

          Breakfast   Dessert Lunch
1          Sandwich Ice-Cream     2
2          Pancakes Ice-Cream     2
3          Omelette Ice-Cream     2
4 Yogurt+Honey+Nuts Ice-Cream     2
5          Sandwich   Nothing     2
6          Pancakes   Nothing     2
7          Omelette   Nothing     2
8 Yogurt+Honey+Nuts   Nothing     2
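The same check can be extended to every pair of factors. The sketch below is a base-R illustration rather than part of the original analysis: `design` re-enters the 16-day table as level indices (an assumption made so that the snippet runs without DoE.base), in the order the levels were given to `oa.design`, for example Breakfast: 1 = Sandwich, 2 = Pancakes, 3 = Omelette, 4 = Yogurt+Honey+Nuts.

```r
# The 16-day diet table, transcribed as level indices (one row per day)
design <- data.frame(
  Breakfast = c(4,4,3,2,3,2,4,2,1,1,4,1,3,3,1,2),
  Beverage  = c(1,2,2,1,2,1,2,2,1,1,1,2,1,1,2,2),
  Lunch     = c(2,4,3,2,2,3,1,4,1,4,3,2,4,1,3,1),
  Dinner    = c(3,4,3,2,1,4,2,1,1,3,1,4,2,4,2,3),
  Dessert   = c(1,2,1,2,2,1,1,1,1,2,2,1,1,2,2,2),
  Drink     = c(1,1,2,2,1,1,2,2,1,2,2,2,1,2,1,1)
)

# All 15 pairs of factors
factor_pairs <- combn(names(design), 2, simplify = FALSE)

# A pair is balanced when every cell of its cross-tabulation has the same count
balanced <- vapply(factor_pairs, function(p) {
  counts <- table(design[[p[1]]], design[[p[2]]])
  length(unique(as.vector(counts))) == 1
}, logical(1))
names(balanced) <- vapply(factor_pairs, paste, character(1), collapse = " x ")

balanced
all(balanced)
```

If `all(balanced)` is `TRUE`, each main effect can be estimated independently of the others, which is the property the doctor relies on.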

Question: What are the next steps?

Answer: Every day, Joe should write down how bad his stomach ache was during the night, on a scale from 0 to 10. The doctor will then have the independent variables (the Xs) from the Orthogonal Array and the dependent variable (Y), which is the score provided by Joe. Finally, he will be able to run a regression or ANOVA model to find out which variables are statistically significant.
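A main-effects ANOVA on such a diary could look like the sketch below. The scores here are SIMULATED placeholders, not real measurements, so the resulting table says nothing about Joe; the diet is re-entered as level indices so that the snippet runs on its own, and the names `idx`, `diet`, `Score` and `fit` are illustrative.

```r
# The 16-day diet table as level indices (order of levels as passed to oa.design)
idx <- list(
  Breakfast = c(4,4,3,2,3,2,4,2,1,1,4,1,3,3,1,2),
  Beverage  = c(1,2,2,1,2,1,2,2,1,1,1,2,1,1,2,2),
  Lunch     = c(2,4,3,2,2,3,1,4,1,4,3,2,4,1,3,1),
  Dinner    = c(3,4,3,2,1,4,2,1,1,3,1,4,2,4,2,3),
  Dessert   = c(1,2,1,2,2,1,1,1,1,2,2,1,1,2,2,2),
  Drink     = c(1,1,2,2,1,1,2,2,1,2,2,2,1,2,1,1)
)
diet <- as.data.frame(lapply(idx, factor))

# SIMULATED scores from 0 to 10; replace with Joe's real diary entries
set.seed(42)
diet$Score <- sample(0:10, nrow(diet), replace = TRUE)

# Main effects only, matching the no-interaction assumption
fit <- aov(Score ~ Breakfast + Beverage + Lunch + Dinner + Dessert + Drink,
           data = diet)
summary(fit)
```

With 16 runs and 13 model parameters only 3 residual degrees of freedom remain, so the tests have little power; repeating the 16-day cycle would be one way to gain more.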
