Multiple testing procedures for identifying desirable dose combinations in bifactorial designs
Bettina Buchheister 1, Walter Lehmacher 1
1 Institute for Medical Statistics, Informatics and Epidemiology, University of Cologne, Cologne, Germany
Abstract
Hung, Chi, and Lipicky proposed the AVE and MAX tests to analyse in a bifactorial design whether combinations of two drugs at several doses fulfil the desirable property of superiority to both their single drug components. These are global tests and do not identify the special combinations which are more effective than their respective single components. Here multiple testing procedures based on linear contrast tests and on the closed testing principle will be presented. They will be compared with simultaneous Min tests of Laska and Meisner. The performance of these approaches is investigated by simulation studies.
Keywords
drug combination, Min test, bifactorial design, AVE- and MAX test, closed testing procedure, linear contrast test, experimentwise error rate
Introduction
According to the guidelines of the U.S. Food and Drug Administration (FDA, CFR 300.50), one of the requirements for approving a drug combination is that each component must make a contribution to the claimed effects. Analogously, the guideline for combination drugs of the European Agency for the Evaluation of Medicinal Products (EMEA, CPMP/EWP/240/95) requires that the benefit/risk assessment of the fixed combination equals or exceeds that of each of its substances taken alone. That is, the combination needs to be more effective than each of its single components simultaneously.
This property of superiority may be tested by using the Min test of Laska, Meisner [1], [2] in (2x2) factorial design trials where each component of the combination is chosen at some fixed dose level based on prior information. Due to unknown potential interactions of the components, the preselection of the dose combination is often difficult. Therefore, multi-level factorial designs that study several dose combinations simultaneously are called for.
With more than one combination drug, however, there is a multiple testing problem and two questions are posed: (1) Globally: Is there any combination which is superior to its single components? Here one global hypothesis is tested. (2) Locally: Which specific combinations fulfil this property? In this case test procedures controlling the experimentwise error rate α are required. That is, the probability of at least one wrong inference should be bounded by α.
Neither dose response analyses nor general analysis of variance with interactions are suited for answering these two questions. Approaches which solely compare the effects of the combinations with the effects of their components are of interest. Hung, Chi, Lipicky [3] and Hung [4] developed two global test procedures which protect the overall type I error rate. For the local problem they only recommended using adjusted simultaneous Min tests according to the Hochberg [5] procedure.
In this article new procedures based on the closed testing principle are presented. They are built on different families of elementary hypotheses and allow multiple hypotheses to be tested in a step-down manner. Specific linear contrast tests are used. Furthermore, a local maximum test based on the global MAX test of Hung et al. [3] is developed and a modification of simultaneous Min tests is suggested. All procedures control the experimentwise error rate α. The performance of the proposed test procedures is investigated by simulation studies and suggestions for practical applications are formulated.
Notation
Consider a two factorial design with I dose levels of drug A and J dose levels of drug B. Let µij, i=0, …, I and j=0, …, J, denote the true mean responses of the dose combination (i, j), whereby high values of µij's indicate benefit. (i, 0) and (0, j) denote the single drug components (Table 1 [Tab. 1]).
Table 1: Scheme of a bifactorial dose combination design

Let µij be estimated by the group mean $\bar X_{ij} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} X_{ijk}$, where Xijk, k=1, …, nij, is the observed effect of the k-th subject in the (i, j)-th dose combination group and nij is the sample size of that group. Assuming variance homogeneity, the pooled estimator of σ2 is given by
$$S^2 = \frac{\sum_{i=0}^{I} \sum_{j=0}^{J} \sum_{k=1}^{n_{ij}} (X_{ijk} - \bar X_{ij})^2}{\sum_{i=0}^{I} \sum_{j=0}^{J} (n_{ij} - 1)}.$$
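There are 2 • IJ marginal hypotheses, each of which compares a combination with one of its single components:
$H^A_{ij}: \mu_{ij} \le \mu_{i0}$ versus $K^A_{ij}: \mu_{ij} > \mu_{i0}$,
$H^B_{ij}: \mu_{ij} \le \mu_{0j}$ versus $K^B_{ij}: \mu_{ij} > \mu_{0j}$.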
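To examine the claimed superiority of the combinations, IJ local combination hypotheses can be formulated as unions of the two marginal hypotheses:
$H_{ij}: \mu_{ij} \le \mu_{i0}$ or $\mu_{ij} \le \mu_{0j}$
versus $K_{ij}: \mu_{ij} > \mu_{i0}$ and $\mu_{ij} > \mu_{0j}$.
These IJ local combination hypotheses should be tested controlling the experimentwise error rate α. If the global testing problem is considered, the global hypothesis is
$H_0 = \bigcap_{i=1}^{I} \bigcap_{j=1}^{J} H_{ij}$.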
Previous approaches
Up to now, two approaches have been published. In case of only one combination drug Laska and Meisner [1], [2] suggest the Min test. For the general (I+1) x (J+1) design, two global tests are proposed by Hung, Chi, Lipicky [3].
Laska-Meisner Min test
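In the simple case (I = J = 1) only one combination drug is observed and the hypothesis of interest is the following union hypothesis:
$H_{AB} = H_A \cup H_B: \mu_{11} \le \mu_{10}$ or $\mu_{11} \le \mu_{01}$
versus $K_{AB}: \mu_{11} > \mu_{10}$ and $\mu_{11} > \mu_{01}$.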
HAB will be rejected if both marginal hypotheses HA and HB are rejected at level α, using appropriate test statistics. This so-called Min test is a test for the simple combination drug problem with experimentwise level α.
Under rather mild conditions Laska and Meisner [1], [2] showed that this test is uniformly most powerful within the class of monotone level α tests. A generalization to union hypotheses composed of more than two hypotheses is possible. With the extended Min test procedure, a union hypothesis is rejected if each partial hypothesis can be rejected at level α.
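The decision rule of the Min test can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not spelled out at this point of the text: normally distributed data, a pooled variance over the three groups, and two-sample t statistics for the marginal comparisons. The function names `pooled_sd` and `min_test` are hypothetical, and the critical value `t_crit` (the one-sided level α quantile of the t distribution) must be supplied by the caller.

```python
import math


def pooled_sd(groups):
    """Pooled standard deviation and its degrees of freedom (variance homogeneity assumed)."""
    ss = 0.0
    for g in groups:
        m = sum(g) / len(g)
        ss += sum((x - m) ** 2 for x in g)
    df = sum(len(g) for g in groups) - len(groups)
    return math.sqrt(ss / df), df


def min_test(combo, comp_a, comp_b, t_crit):
    """Reject H_AB only if BOTH marginal one-sided t statistics reach t_crit.

    combo, comp_a, comp_b: observations of the combination and of its two components.
    t_crit: one-sided level-alpha t quantile for the returned degrees of freedom
            (assumption: provided by the caller, e.g. from a table).
    """
    s, df = pooled_sd([combo, comp_a, comp_b])
    m_ab = sum(combo) / len(combo)

    def t_stat(component):
        diff = m_ab - sum(component) / len(component)
        return diff / (s * math.sqrt(1 / len(combo) + 1 / len(component)))

    # The smaller of the two statistics decides, hence the name "Min test".
    t_min = min(t_stat(comp_a), t_stat(comp_b))
    return t_min, df, t_min >= t_crit
```

For example, `min_test([3, 4, 5], [1, 2, 3], [2, 3, 4], t_crit=1.943)` uses 1.943, the approximate one-sided 5% quantile of the t distribution with 6 degrees of freedom, and does not reject because the comparison with the second component is too weak.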
Global tests from Hung, Chi and Lipicky
In the general (I + 1) x (J + 1) case, a multiple testing problem arises. Two global tests are presented by Hung, Chi, Lipicky [3]. Their test statistics are based on the minimum gains over all dose combinations, $\delta_{ij} = \min(\mu_{ij} - \mu_{i0}, \mu_{ij} - \mu_{0j})$. The tested global hypothesis $H: \delta_{ij} \le 0$ for all (i, j) versus $K: \delta_{ij} > 0$ for at least one (i, j) is equivalent to H0.
The "AVErage" global test statistic TAVE is defined by the average of the observed minimum gains, and the "MAXimum" global test statistic TMAX is the maximum of the observed minimum gains. That is,
$$T_{ij} = \frac{\min(\bar X_{ij} - \bar X_{i0},\ \bar X_{ij} - \bar X_{0j})}{S \sqrt{2/n}}, \qquad T_{AVE} = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} T_{ij}, \qquad T_{MAX} = \max_{i,j} T_{ij},$$
where S is the pooled estimator of σ, $\bar X_{ij}$ the observed mean effect of the combination (i, j) and n the common sample size per group. Both tests are one-sided level α tests, requiring a balanced design and normally distributed data with homogeneous variances. The distributions of TAVE and TMAX are derived by Hung et al. [3]. A more precise and extended table of critical values of the tests than that given by Hung et al. [3] is presented in Table 2 [Tab. 2]. The tests, however, do not address the multiplicity of the testing problem of the IJ local combination union hypotheses.
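The two global statistics can be computed as in the following sketch. It assumes a balanced design stored as a dictionary that maps each cell (i, j), i = 0, …, I and j = 0, …, J, to its list of n observations; the function name `ave_max_statistics` is hypothetical. The sketch returns the statistics only. The decision requires the tabulated critical values, which are not reproduced here.

```python
import math


def ave_max_statistics(data, n):
    """Return the local statistics T_ij, T_AVE and T_MAX for a balanced design.

    data: dict mapping (i, j) -> list of n observations; cells with i == 0 or
          j == 0 are the single components (or placebo).
    """
    means = {cell: sum(x) / n for cell, x in data.items()}
    # Pooled variance over all groups, each contributing n - 1 degrees of freedom.
    ss = sum(sum((v - means[cell]) ** 2 for v in x) for cell, x in data.items())
    s = math.sqrt(ss / (len(data) * (n - 1)))
    se = s * math.sqrt(2 / n)

    t = {}
    for (i, j), m in means.items():
        if i > 0 and j > 0:
            # Observed minimum gain of the combination over its two components.
            t[(i, j)] = min(m - means[(i, 0)], m - means[(0, j)]) / se

    t_ave = sum(t.values()) / len(t)
    t_max = max(t.values())
    return t, t_ave, t_max
```

With a single combination (I = J = 1) both statistics coincide with the Min test statistic, which is a convenient sanity check.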
Table 2: Level α critical values of AVE and MAX tests in a balanced design

Note that these two global tests are developed for balanced designs, but two modified global tests in case of unequal sample sizes are provided [6].
New approaches
In practice, one is often interested in the local question: Which dose combinations are superior to their respective components? Therefore, multiple testing procedures that identify desirable combinations while controlling the experimentwise error rate are required.
Closed testing procedure of IJ local combination hypotheses
Consider the IJ local combination hypotheses $H_{ij}$ as elementary hypotheses. In the closed system of hypotheses constructed from them, the global hypothesis H0 is the intersection of all IJ local combination hypotheses. Therefore each global test for H0 is a competitor to the AVE and MAX global tests. The family of all intersections of the IJ local combination hypotheses can be tested at level α by using a step-down procedure.
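The closed testing principle itself can be sketched generically: an elementary hypothesis is rejected only if every intersection hypothesis that contains it is rejected by a level α test. In the sketch below, `closed_testing` and `local_test` are hypothetical names, and the local test for an intersection hypothesis is passed in as a function. The sketch enumerates all 2^m - 1 intersections of m elementary hypotheses, which is only feasible for small designs; a step-down implementation would avoid the full enumeration.

```python
from itertools import combinations


def closed_testing(elementary, local_test):
    """Return the elementary hypotheses rejected by the closed testing procedure.

    elementary: list of hashable labels, e.g. the cells (i, j).
    local_test: function(frozenset of labels) -> True if the intersection
                hypothesis over these labels is rejected at level alpha
                (assumption: supplied by the caller).
    """
    family = [frozenset(c)
              for size in range(1, len(elementary) + 1)
              for c in combinations(elementary, size)]
    rejected = {s: local_test(s) for s in family}
    # Reject H_h only if all intersections containing h are rejected.
    return [h for h in elementary
            if all(rejected[s] for s in family if h in s)]


# Illustration with hypothetical p-values and Bonferroni local tests,
# which reproduces the Holm procedure.
alpha = 0.05
p = {(1, 1): 0.01, (1, 2): 0.04, (2, 1): 0.03, (2, 2): 0.5}
result = closed_testing(list(p), lambda s: min(p[h] for h in s) <= alpha / len(s))
```

In this illustration only the hypothesis for cell (1, 1) is rejected, because the intersection of the remaining three hypotheses is retained at level α/3.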
Two examples of closed systems of hypotheses with local combination hypotheses are given in Figure 1 [Fig. 1] and Figure 2 [Fig. 2]. In the latter only the hierarchy of hypotheses is presented; arrows are omitted for the sake of clarity.
Figure 1: Closed system of hypotheses with local combination hypotheses, (2x3)-design

Figure 2: Closed system of hypotheses with local combination hypotheses, (3x3)-design

Note that all hypotheses are intersections of union hypotheses (intersection-union hypotheses), whereas most commonly used level α test procedures are constructed for unions of intersection hypotheses (union-intersection hypotheses). Thus each intersection-union hypothesis must first be transformed into a union-intersection hypothesis by the rules of elementary set algebra. Afterwards generalized Min tests may be used. Specific level α tests for the intersection hypotheses based on linear contrast tests are specified below.
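As a worked illustration of this transformation, consider the intersection of two local combination hypotheses. The notation is an assumption made for this example: $H^A_{ij}: \mu_{ij} \le \mu_{i0}$ and $H^B_{ij}: \mu_{ij} \le \mu_{0j}$ denote the two marginal hypotheses of combination (i, j), and $H_{ij} = H^A_{ij} \cup H^B_{ij}$. By the distributive law,

```latex
\begin{align*}
H_{11} \cap H_{12}
  &= (H^A_{11} \cup H^B_{11}) \cap (H^A_{12} \cup H^B_{12}) \\
  &= (H^A_{11} \cap H^A_{12}) \cup (H^A_{11} \cap H^B_{12})
     \cup (H^B_{11} \cap H^A_{12}) \cup (H^B_{11} \cap H^B_{12}).
\end{align*}
```

By the generalized Min test, $H_{11} \cap H_{12}$ is rejected at level α if each of the four intersection hypotheses on the right-hand side is rejected at level α. An intersection of m local combination hypotheses expands into 2^m such terms, which is why this approach becomes laborious in large designs.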
Closed testing procedure of 2 • IJ marginal hypotheses
Consider the 2 • IJ marginal hypotheses $H^A_{ij}: \mu_{ij} \le \mu_{i0}$ and $H^B_{ij}: \mu_{ij} \le \mu_{0j}$, which compare the effect of the combination with the effect of one of its components, as elementary hypotheses. Then, a system of hypotheses closed under intersection can be constructed (e.g. Figure 3 [Fig. 3]). This system of hypotheses contains $2^{2IJ} - 1$ hypotheses and is substantially larger than the system of hypotheses constructed from the IJ elementary local combination hypotheses (cf. Figure 3 [Fig. 3]). However, it contains only intersection hypotheses without unions, which are easier to test. The family of all intersections of the 2 • IJ marginal hypotheses is tested by a step-down procedure. Subsequently, a local combination hypothesis $H_{ij}$ can be rejected by the Min test principle if both of its marginal hypotheses $H^A_{ij}$ and $H^B_{ij}$ are rejected by the step-down procedure using level α tests.
Figure 3: Closed system of hypotheses with marginal hypotheses, (2x3)-design

The global hypothesis $\tilde H_0$ of this closed system of hypotheses is the intersection of all marginal hypotheses and differs from H0:
$\tilde H_0 = \bigcap_{i,j} (H^A_{ij} \cap H^B_{ij}) \ne \bigcap_{i,j} (H^A_{ij} \cup H^B_{ij}) = H_0$.
Accordingly, a global test for $\tilde H_0$ is not a competitor to the AVE and MAX global tests. This test procedure only allows the local question to be answered.
Two simultaneous closed testing procedures for each drug
Consider two simultaneous closed testing procedures generated by the IJ marginal hypotheses $H^A_{ij}$ (comparisons with drug A alone) and the IJ marginal hypotheses $H^B_{ij}$ (comparisons with drug B alone), respectively (e.g. Figure 4 [Fig. 4]).
Figure 4: Closed systems of hypotheses with marginal hypotheses for each drug, (3x3)-design

This procedure includes the advantages of both approaches mentioned above: Both systems of hypotheses are as small as in the first approach, and there are only intersection hypotheses without unions as in the second approach. But the disadvantage is that an α adjustment is required in order to control the overall error rate α. The two families of hypotheses will be tested separately for drug A and for drug B by step down procedures at level α/2. Finally Min tests can be applied to test the local combination hypotheses.
As in the preceding approach, both global hypotheses $H^A_0 = \bigcap_{i,j} H^A_{ij}$ and $H^B_0 = \bigcap_{i,j} H^B_{ij}$, as well as their union and their intersection, differ from H0. This approach does not address the global question.
Simultaneous Min tests and a modification
A common way to answer the local question while controlling the overall error rate is to use simultaneous Min tests. Each local combination hypothesis is tested using the Min test of Laska, Meisner [2] at an adjusted level α* ≤ α. Several α adjustments are described in the literature (e.g. Bonferroni-Holm [7], Hochberg [5] or Hommel [8], [9]). Simultaneous Min tests also belong to the class of closed testing procedures. For example, the α adjustment by Holm is a closed testing procedure using the Bonferroni inequality at each step.
Lehmacher [10] and Lehmacher, Wassmer, Reitmeir [11] propose a modification which is a shortcut version of a closed testing procedure. It is a two-step procedure in which both the global and the local combination hypotheses are tested. A combination drug is declared superior to its components if the global hypothesis can be rejected at level α and the corresponding local combination hypothesis can be rejected using the modified Bonferroni-Holm procedure with the modified levels α/(IJ-1), α/(IJ-1), α/(IJ-2), …, α/2, α.
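The two-step procedure can be sketched as follows. The sketch assumes that each local combination hypothesis is summarised by a Min test p-value (the larger of its two marginal one-sided p-values) and that the outcome of the global test is passed in as a flag. The function names `modified_holm_levels` and `two_step_min_tests` are hypothetical.

```python
def modified_holm_levels(m, alpha):
    """Levels alpha/(m-1), alpha/(m-1), alpha/(m-2), ..., alpha/2, alpha for m = IJ hypotheses."""
    if m == 1:
        return [alpha]
    return [alpha / (m - 1)] + [alpha / (m - k) for k in range(1, m)]


def two_step_min_tests(p_values, alpha, global_rejected):
    """Step-down over the ordered Min test p-values, performed only after global rejection.

    p_values: dict mapping each combination (i, j) to its Min test p-value.
    global_rejected: outcome of the level-alpha global test (assumption: computed elsewhere).
    """
    if not global_rejected:
        return []
    ordered = sorted(p_values, key=p_values.get)  # smallest p-value first
    levels = modified_holm_levels(len(ordered), alpha)
    rejected = []
    for cell, level in zip(ordered, levels):
        if p_values[cell] > level:
            break  # stop at the first hypothesis that cannot be rejected
        rejected.append(cell)
    return rejected
```

Compared with the ordinary Holm procedure, only the first level changes, from α/IJ to α/(IJ-1). With hypothetical p-values 0.015, 0.016, 0.03 and 0.2 for four combinations and α = 0.05, the ordinary Holm procedure rejects nothing (0.015 > 0.0125), whereas the modified version rejects the first two hypotheses once the global hypothesis has been rejected.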
Special linear contrast tests
When testing superiority of combination drugs in a multi-level two-factorial design, not all pairwise comparisons of treatments are considered, but only the 2 • IJ comparisons of the combination drugs with their components. Therefore, the closed testing procedures described above require tests of partition hypotheses, in which two or more disjoint hypotheses are intersected (e.g. the intersection of the marginal hypotheses µ11 ≤ µ10 and µ22 ≤ µ20). Usual test statistics such as F-tests do not apply. In order to control the experimentwise error rate, partition hypotheses can be tested by multiple tests with α adjustment. Another possibility is to use a special linear contrast test, which could be less conservative and easier to apply.
Thus, each hypothesis in the closed test procedures is tested by a specific linear contrast statistic. When testing an intersection of union hypotheses, a transformation into a union of intersection hypotheses is required. The test statistics of the partition hypotheses are constructed by averaging over the corresponding marginal hypotheses. That is, the suitable contrast coefficients cij are obtained from the sum of the differences between the effects of the combinations and the effects of their single components. The test statistic is given by
$$T = \frac{\sum_{i=0}^{I} \sum_{j=0}^{J} c_{ij} \bar X_{ij}}{S \sqrt{\sum_{i=0}^{I} \sum_{j=0}^{J} c_{ij}^2 / n_{ij}}},$$
where
$$\sum_{i=0}^{I} \sum_{j=0}^{J} c_{ij} \bar X_{ij} = \frac{1}{|\pi_1| + |\pi_2|} \left[ \sum_{(i,j) \in \pi_1} (\bar X_{ij} - \bar X_{i0}) + \sum_{(i,j) \in \pi_2} (\bar X_{ij} - \bar X_{0j}) \right].$$
The index sets π1 and π2 are subsets of {(i, j) |i=1,…, I; j=1,…,J}, and |π1| and |π2| denote the number of elements in π1 and π2, respectively. S is the pooled estimator of σ for all treatment groups with cij ≠ 0.
In case of normally distributed data with homogeneous variances, T is t-distributed with $\sum_{(i,j):\, c_{ij} \ne 0} (n_{ij} - 1)$ degrees of freedom.
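A minimal sketch of such a contrast statistic is given below. It rests on assumptions: π1 collects the combinations compared with their drug A component (i, 0), π2 those compared with their drug B component (0, j), the pooled variance uses only the groups with non-zero contrast coefficient, and the degrees of freedom are the sum of nij - 1 over these groups. The function name `contrast_statistic` is hypothetical. A common scaling factor of the coefficients cancels in T and is therefore omitted.

```python
import math


def contrast_statistic(data, pi1, pi2):
    """Linear contrast statistic T and its degrees of freedom.

    data: dict mapping (i, j) -> list of observations.
    pi1:  combinations (i, j) compared with the component (i, 0).
    pi2:  combinations (i, j) compared with the component (0, j).
    """
    c = {}
    for (i, j) in pi1:
        c[(i, j)] = c.get((i, j), 0) + 1
        c[(i, 0)] = c.get((i, 0), 0) - 1
    for (i, j) in pi2:
        c[(i, j)] = c.get((i, j), 0) + 1
        c[(0, j)] = c.get((0, j), 0) - 1

    used = [cell for cell, w in c.items() if w != 0]
    means = {cell: sum(data[cell]) / len(data[cell]) for cell in used}
    # Pooled variance over the groups that enter the contrast.
    ss = sum(sum((x - means[cell]) ** 2 for x in data[cell]) for cell in used)
    df = sum(len(data[cell]) - 1 for cell in used)
    s = math.sqrt(ss / df)

    numerator = sum(c[cell] * means[cell] for cell in used)
    denominator = s * math.sqrt(sum(c[cell] ** 2 / len(data[cell]) for cell in used))
    return numerator / denominator, df
```

With π1 = {(1, 1)} and π2 empty, the statistic reduces to the two-sample t statistic for the comparison of combination (1, 1) with its drug A component.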
Local MAX test
The MAX test of Hung, Chi, Lipicky [3] is a global test. Here an extension to the local question, the local MAX test, is developed. The test statistic TMAX of the global MAX test is based on the combination drug with the maximum observed minimum gain over its components (in the following called "MAX-combination"). If the global hypothesis is rejected (TMAX > $c^{MAX}_{\alpha}$, the level α critical value of the global MAX test), at least the MAX-combination is superior to its respective components. But there could be other combinations which fulfil this property too.
The idea of the local MAX test is to test all local combination hypotheses against the critical value $c^{MAX}_{\alpha}$. Thus, each combination drug whose local test statistic $T_{ij}$ is greater than or equal to $c^{MAX}_{\alpha}$ fulfils the property of combination superiority.
The local MAX test is a step-up procedure (cf. [12]) based on the ordered local test statistics T(1) ≤ … ≤ T(IJ) = TMAX, the corresponding local combination hypotheses $H_{(1)}, …, H_{(IJ)}$ and a fixed critical value C:
step 1: test the local combination hypothesis $H_{(1)}$ by the critical value C:
  T(1) < C → step 2
  T(1) ≥ C → Stop! Reject all local combination hypotheses
step 2: test the local combination hypothesis $H_{(2)}$ by the critical value C:
  T(2) < C → step 3
  T(2) ≥ C → Stop! Retain $H_{(1)}$ and reject all $H_{(i)}$, i = 2,..., IJ
step n (n = 3,...,IJ): test the local combination hypothesis $H_{(n)}$ by the critical value C:
  T(n) < C → step n + 1
  T(n) ≥ C → Stop! Retain $H_{(1)}, …, H_{(n-1)}$ and reject all $H_{(i)}$, i = n,..., IJ.
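The steps above can be sketched as a short function. The name `local_max_test` is hypothetical, and the critical value `c` is assumed to be taken from a table of critical values of the global MAX test. Because the critical value is the same at every step, the procedure rejects exactly those hypotheses whose local statistic is at least `c`.

```python
def local_max_test(t_stats, c):
    """Step-up procedure with a fixed critical value.

    t_stats: dict mapping each combination (i, j) to its local statistic T_ij.
    c:       fixed critical value (assumption: supplied by the caller).
    Returns (rejected, retained), both ordered by increasing statistic.
    """
    ordered = sorted(t_stats, key=t_stats.get)  # T_(1) <= ... <= T_(IJ)
    for n, cell in enumerate(ordered):
        if t_stats[cell] >= c:
            # Stop: retain the hypotheses tested so far, reject this and all later ones.
            return ordered[n:], ordered[:n]
    return [], ordered  # no statistic reached c: retain all hypotheses


# Hypothetical local statistics and a hypothetical critical value.
rejected, retained = local_max_test(
    {(1, 1): 0.5, (1, 2): 2.9, (2, 1): 1.0, (2, 2): 3.4}, c=2.5)
```

In this illustration the hypotheses for the combinations (1, 2) and (2, 2) are rejected, and those for (1, 1) and (2, 1) are retained.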
The local MAX test controls the experimentwise error rate α when the critical value C is $c^{MAX}_{\alpha}$, the level α critical value of the global MAX test. Consider any true hypothesis $H_{(i)}$, then
error rate
= P (Reject $H_{(i)}$)
= 1 - P (Retain $H_{(i)}$)
= 1 - P (T(1) < C, …, T(i) < C).
Under the global hypothesis H0 this error rate is clearly maximum, and from this it follows that α = 1 - P (T(1) < C, …, TMAX < C) and consequently C = $c^{MAX}_{\alpha}$.
Simulation studies of statistical power
All approaches described above control the experimentwise error rate α. The power of the following methods for the two posed questions is compared by simulation.
The global hypothesis H0 is tested by:
• Average (AVE) and maximum (MAX) test,
• Linear contrast test for the transformed global hypothesis H0 (GCo) and
• Multiple test of Simes [13] using Min tests (SIM).
The IJ local combination hypotheses are tested by:
• Simultaneous Min tests according to the Hochberg procedure (simMin),
• Simultaneous Min tests according to the modified Holm procedure after rejection of the global hypothesis (MinAVE, MinMAX, MinGCo, MinSIM),
• Closed test procedure of IJ local combination hypotheses (CTPloc),
• Closed test procedures of 2 • IJ marginal hypotheses (CTPmar),
• Two simultaneous closed test procedures at level α/2 (TwoCTP) and
• Local MAX test (loMAX).
The simulations are based on normally distributed data with different means, a common standard deviation σ = 1, a balanced design with nij = 30 and significance level α = 0.05. All approaches are very conservative. No procedure is uniformly more powerful than the others. Nevertheless, depending on the kind of design and on a priori information, suggestions for practical applications can be formulated.
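The general structure of such a simulation can be sketched as follows. This is not the simulation program used for the results reported here. It is a simplified illustration with simultaneous Min tests at the Bonferroni level α/IJ instead of the Hochberg adjustment, and with a normal approximation to the t quantile, which is reasonable for nij = 30. The function name `simulate_bonferroni_min` and all parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_bonferroni_min(mu, n=30, alpha=0.05, reps=1000, seed=1):
    """Estimate rejection rates of Bonferroni-adjusted simultaneous Min tests.

    mu: dict mapping each cell (i, j), i = 0..I, j = 0..J, to its true mean (sigma = 1).
    Returns (rejection rate per combination, estimated experimentwise error rate).
    """
    rng = random.Random(seed)
    combos = [(i, j) for (i, j) in mu if i > 0 and j > 0]
    # A local hypothesis is true if the combination does not beat both components.
    true_null = {(i, j) for (i, j) in combos
                 if mu[(i, j)] <= max(mu[(i, 0)], mu[(0, j)])}
    # Normal approximation to the one-sided t quantile (assumption).
    crit = NormalDist().inv_cdf(1 - alpha / len(combos))

    rejections = dict.fromkeys(combos, 0)
    runs_with_error = 0
    for _ in range(reps):
        xbar, ss = {}, 0.0
        for cell, m in mu.items():
            x = [rng.gauss(m, 1.0) for _ in range(n)]
            xbar[cell] = sum(x) / n
            ss += sum((v - xbar[cell]) ** 2 for v in x)
        se = math.sqrt(ss / (len(mu) * (n - 1))) * math.sqrt(2 / n)
        error = False
        for (i, j) in combos:
            gain = min(xbar[(i, j)] - xbar[(i, 0)], xbar[(i, j)] - xbar[(0, j)])
            if gain / se >= crit:
                rejections[(i, j)] += 1
                error = error or (i, j) in true_null
        runs_with_error += error
    rates = {cell: k / reps for cell, k in rejections.items()}
    return rates, runs_with_error / reps
```

Calling the function with equal means in all nine cells of a (3x3)-design gives an estimate of the experimentwise error rate in that configuration, which illustrates how conservative Min tests are when all means are equal.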
In this paper the results of simulation are mainly qualitatively described. Some quantitative results for a (2x3) as well as for a (3x3)-design (Table 3 [Tab. 3]) are presented as an example. Detailed power analyses are given by Buchheister [14].
Table 3: (2x3) and (3x3)-design (examples)

In case of the global question the results clearly show that the global contrast test (GCo) may substitute the global AVE test. GCo is the most powerful test in situations where the global AVE test has so far been more powerful than the global MAX test (see Figure 5 [Fig. 5]).
Figure 5: Power of global tests in (3x3)-design of Table 3 when all combinations are similarly more effective than their components (δ11=δ12=δ21=δ22=δ)

An example comparing the power across different situations for the (2x3)-design is given in Figure 6 [Fig. 6]. There the black areas describe situations where the global contrast test is more powerful than the global MAX test, and in the grey areas power(GCo) < power(MAX).
Figure 6: Difference of power of global MAX test (MAX) and global contrast test (GCo) in (2x3)-design of Table 3

The global contrast test (GCo) is recommended when all combination drugs fulfil the property of superiority, or when only a few combinations do not and the others are similarly more effective than their components. Otherwise the global MAX test of Hung, Chi, Lipicky [3] is suggested. For very large multi-level two-factorial designs, when the tables of critical values for the global MAX test (cf. Table 2 [Tab. 2] or [3]) do not suffice, the multiple test of Simes (SIM) is recommended for practical reasons. The loss of power is negligible.
The results of the simulations concerning the local question are quite difficult to summarize. A comparison of the three new closed testing procedures shows that the closed testing procedure of IJ local combination hypotheses (CTPloc) is often the most powerful one. In large multi-level designs, however, this closed testing procedure is hardly usable in practice because of the many transformations of intersection-union hypotheses into union-intersection hypotheses. In contrast, the practical application of the closed testing procedures CTPmar and TwoCTP is much easier. Therefore the closed testing procedure of 2 • IJ marginal hypotheses is preferable, provided that there is no interest in the global hypothesis. The loss of power in relevant situations is negligible as well. The approach of two simultaneous closed testing procedures at level α/2 is very conservative in small designs. However, it becomes attractive in larger designs because of its manageable systems of hypotheses.
Depending on the number of dose combinations which fulfil the property of superiority and the size of their effectiveness, different most powerful procedures can be specified:
1) If all or nearly all combinations are similarly more effective than their components, the three new closed testing procedures have the largest power. In this situation the closed testing procedure of the 2 • IJ marginal hypotheses is suggested for small designs, and two simultaneous closed testing procedures at level α/2 are suggested for large designs.
2) If the sizes of effectiveness of the dose combinations differ a lot, but there are only few combinations which are not simultaneously more effective than all of their single components, simultaneous Min tests like simMin, MinMAX or MinSIM are more powerful. The power of these three procedures is similar.
3) Otherwise, if only few dose combinations fulfil the property of superiority the local maximum test has the largest power. Using the tabulated level α critical value (cf. Table 2 [Tab. 2] or [3]) it is easy to apply.
These suggestions are simplified and summarized in Table 4 [Tab. 4] and an example is given in Figure 7 [Fig. 7].
Table 4: Suggestions for approaches depending on design and asked question

Figure 7: Power of local tests in (3x3)-design of Table 3 with (δ11=0, δ12=0.5δ, δ21=0, δ22=δ)

If no a priori information about the effects of the combinations is available, it is difficult to assume that all dose combinations are similarly more effective than their components, especially in larger multi-level two-factorial designs. Therefore, with regard to simplicity, robustness and power, the new global linear contrast test followed by α-adjusted simultaneous Min tests according to the modified Holm procedure (MinGCo) is a good compromise and thus the procedure of choice in this situation. The loss of power from using simultaneous Min tests in case of similarly strong effects is small.
Discussion
When several dose combinations are to be tested for being simultaneously more effective than all of their single components, a multiple testing problem arises. Different test procedures concerning the global or the local approach can be used. Due to the complexity of the problem, an optimal procedure cannot be recommended in general (cf. [14]).
Retaining the global tests of Hung et al. [3], which have been the most widely used so far, is not a bad choice. Moreover, the local MAX test presented here is a quick and easy extension of the global MAX test to the local question. Nevertheless, the closed testing procedures have some advantages. They place fewer restrictions on the data than the two global tests of Hung et al. [3], which require normally distributed data with homogeneous variances. Each intersection hypothesis may be tested by a suitable level α test according to the nature of the data. Because of the numerous partition hypotheses that arise in the closed testing procedures for combination superiority, the use of special contrast tests is recommended. In case of variance heterogeneity, Welch-type modifications of the tests can easily be applied. In case of binary data, Gauss tests can be used. Note also that no placebo dose combination needs to be included in the study when closed testing procedures are applied. This can be important when there are ethical or medical problems with administering a placebo.
In view of the complexity and the multiplicity of the problem (cf. the application in [15]) a sequential design as presented by Lehmacher, Kieser, Hothorn [16] could be more advantageous, because during the conduct of the study drug combinations can be skipped.
This paper focuses on the multiple identification of desirable dose combinations. The procedures presented are multiple decision procedures; related simultaneous confidence intervals are not available. Another related topic is the identification of minimum effective doses; multiple testing procedures for this problem are proposed by Hellmich and Lehmacher [17].
References
[1] Laska EM, Meisner MJ. Testing whether an identified treatment is best: The combination problem. Proceedings of the Biopharmaceutical Section of the American Statistical Association. 1986;163-70.
[2] Laska EM, Meisner MJ. Testing whether an identified treatment is best. Biometrics. 1989;45:1139-51.
[3] Hung HMJ, Chi GYH, Lipicky RJ. Testing for the existence of a desirable dose combination. Biometrics. 1993;49:85-94.
[4] Hung HMJ. Testing for existence of desirable dose combination (Correspondence). Biometrics. 1994;50:307-8.
[5] Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988;75:800-2.
[6] Hung HMJ. Evaluation of a combination drug with multiple doses in unbalanced factorial design clinical trials. Statist Med. 2000;19:2079-87.
[7] Holm S. A simple sequentially rejective multiple test procedure. Scand J Statist. 1979;6:65-70.
[8] Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika. 1988;75:383-6.
[9] Hommel G. A comparison of two Bonferroni procedures. Biometrika. 1989;76:624-5.
[10] Lehmacher W. Verlaufskurven und Crossover. Heidelberg: Springer; 1987.
[11] Lehmacher W, Wassmer G, Reitmeir P. Procedures for two-sample comparisons with multiple endpoints controlling the experimentwise error rate. Biometrics. 1991;47:511-21.
[12] Tamhane AC, Hochberg Y, Dunnett CW. Multiple test procedures for dose finding. Biometrics. 1996;52:21-37.
[13] Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73:751-4.
[14] Buchheister B. Statistische Methoden zum Nachweis der Effektivität von Kombinationspräparaten [Dissertation]. Köln: Medizinische Fakultät der Universität zu Köln; 2001.
[15] Letzel H, Blümner E. Bivariate Dosis-Wirkungs-Beziehungen für ein Kombinationsantihypertensivum: Biometrische Erfahrungen mit einem komplexen Studienmodell. In: Baur MP et al.: Medizinische Informatik, Biometrie und Epidemiologie, 41. Jahrestagung der GMDS, Bonn. München: Urban und Vogel; 1997. p. 382-6.
[16] Lehmacher W, Kieser M, Hothorn L. Sequential and multiple testing for dose-response analysis. Drug Inf J. 2000;34:591-7.
[17] Hellmich M, Lehmacher W. Closure Procedures for Monotone Bi-Factorial Dose-Response Designs. Biometrics. 2005;61:270-7.
 
                                                        


