Background Fields on Advanced Medical Statistics
A] Clinical diagnosis：
1] Validity Control：ROC, Sensitivity / Specificity
2] Capture Recapture Sampling
3] χ2 Association Test / Fisher’s direct count
4] Markers Detection：
a) T-test / Rank sum test / Sign rank test
b) Pearson product-moment r / Spearman rank ρ
B] Clinical Decision Making：
1] Decision Tree
2] Sequential Trial
C] Clinical Treatment：
RCT : Randomized Clinical Trial : Block / Stratification
Efficiency / Effectiveness / Efficacy
1] χ2 Test：General / Breslow-Day test / Cochran-Mantel-Haenszel tests
2] Armitage Trend Test
3] t , ANOVA Tests / Non-parametric Tests
4] Multiple linear regression / Multiple logistic regression
D] Prognosis：[ Treatment Comparison ]
1] Kaplan-Meier Curve / Log-Rank Test
2] Cox Proportional Hazard Regression：HR (hazard ratio) estimation
E] Dx Causation：
1] Incidence Rates： Risk Ratios / Odds Ratios / SMR
2] Confounding： Standardization
3] χ2 , t , ANOVA Tests / Fisher’s direct count / Rank sum test etc.
Multiple Logistic Regression：OR estimation
@@ Statistics for head-count (count) data： χ2 Test
I. Variance Tests：
1. Of a Normal Distribution
2. Bartlett's Test for Homogeneity
3. Breslow-Day Test for Homogeneity (Meta-Analysis for OR): 2×2 tables with large sample size
II. Inference Tests：
1. Association Test χ2A
2. Goodness of fit test χ2GF
3. Trend Test [Cochran-Armitage χ2T]
4. Fisher's Exact Test
5. McNemar's Test (Paired χ2M)：Sensitivity
6. Cochran-Mantel-Haenszel Test (CMH / QMH: Stratified χ2): more generalized than Breslow test
7. Mantel Extension Test: Trend Test controlling for Confounding
8. Matched stratified Cochran-Armitage Test (Matched stratified χ2T): Trend Test
III. Clinical / Epidemiological χ2 Test for Person-Time Data：
1. χ2HM (χ2het)：Woolf's Homogeneity Test (for Odds Ratios across Strata)
2. Incidence-Rate Trend Test
3. χ2LR： Log-Rank Test [Mantel-Cox Test]
(Kaplan-Meier Survival Comparisons)
IV. Genotox χ2 Test：
1. χ2GF for Dose-Response Curve Comparison
2. χ2MT for Mutagenicity Significance
@ retrospective study has a great chance of bias :
(1) selection bias
(2) recall bias
@ 2 sample test for binomial proportions (e.g., smokers vs. non-smokers)
(1) chi-square test for 2x2 contingency table
(2) Fisher's exact test --- esp for small samples
(3) chi-square Goodness-of-fit test
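The first two tests in this list can be sketched with the Python standard library alone. This is a minimal illustration, not a validated implementation: the function names and the 2x2 counts are hypothetical, the chi-square statistic is computed without a continuity correction, and the two-sided Fisher p-value uses the "sum of tables no more likely than the observed one" convention, which is one of several in use.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical counts: 15 of 50 smokers and 5 of 50 non-smokers develop the disease.
chi2, p_chi2 = chi_square_2x2(15, 35, 5, 45)
# Hypothetical small sample, where the exact test is the safer choice.
p_fisher = fisher_exact_2x2(3, 1, 1, 3)
print(round(chi2, 2), round(p_chi2, 4), round(p_fisher, 4))
```

For the small table the exact p-value is far from significant, which is the situation in which the chi-square approximation should not be trusted.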
@ Measures of effect for categorical data
1, p1: probability of developing dz for exposed individuals
p2: probability of developing dz for unexposed individuals
risk difference=p1-p2; risk ratio(relative risk)=p1/p2
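2, odds ratio (OR) --- designed to avoid the small-value restriction of the risk ratio (q=1-p; p: probability of success)
OR = (p1/q1) / (p2/q2) = (p1 × q2) / (p2 × q1)
3, To guard against a confounding variable, stratification is recommended. (A positive confounder is related to exposure and dz in the same direction, either both positively or both negatively; a negative confounder is related positively to one of them and negatively to the other.) For example, a study of the effect of alcohol drinking on oral cancer must take smoking status into account. Age in particular is a factor that very often has to be controlled, that is, the data are standardized for age, which then yields a standardized risk ratio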
4, methods of inference for stratified categorical data:
5, chi-square test for homogeneity of odds ratios over different strata (Woolf method)
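The effect measures in items 1 and 2 above can be sketched as follows. The counts are hypothetical and the function names are illustrative. The Mantel-Haenszel pooled odds ratio is added as one common way to summarize stratified 2x2 tables; choosing that estimator here is an assumption of this sketch, not something these notes prescribe.

```python
def effect_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk difference, risk ratio and odds ratio from two groups."""
    p1 = exposed_cases / exposed_total      # risk among the exposed
    p2 = unexposed_cases / unexposed_total  # risk among the unexposed
    q1, q2 = 1 - p1, 1 - p2
    return {
        "risk_difference": p1 - p2,
        "risk_ratio": p1 / p2,
        "odds_ratio": (p1 / q1) / (p2 / q2),
    }

def mantel_haenszel_odds_ratio(strata):
    """Pooled OR over strata; each stratum is (a, b, c, d) for the table
    [[exposed cases, exposed non-cases], [unexposed cases, unexposed non-cases]]."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical data: 30 of 100 exposed and 10 of 100 unexposed develop the disease.
measures = effect_measures(30, 100, 10, 100)
# Hypothetical strata, e.g. non-smokers and smokers in a study of alcohol and oral cancer.
pooled_or = mantel_haenszel_odds_ratio([(10, 20, 5, 40), (30, 10, 20, 20)])
print(measures, round(pooled_or, 3))
```

Comparing the pooled odds ratio with the crude (unstratified) one gives a first impression of how much the stratifying variable confounds the association.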
@ multiple logistic regression
The techniques above control for a single categorical covariate C while assessing the association between a dichotomous dz variable D and a categorical exposure variable E. Multiple logistic regression is used when:
E is continuous,
or C is continuous,
or there are many confounding variables C1, C2, C3..., each of which may be categorical or continuous
@ A randomized controlled trial (RCT) is a form of clinical trial, or scientific procedure used in the testing of the efficacy of medicines or medical procedures. It is widely considered the most reliable form of scientific evidence because it is the best known design for eliminating the variety of biases that regularly compromise the validity of medical research.
Sellers of medicines throughout the ages have had to convince their patients that the medicine works. As science has progressed, public expectations have risen, and government health budgets have become ever tighter, pressure has grown for a reliable system to do this. Moreover, the public's concern for the dangers of medical interventions has spurred both legislators and administrators to provide an evidential basis for licensing or paying for new procedures and medications. In most modern health-care systems all new medicines and surgical procedures therefore have to undergo trials before being approved.
Trials are used to establish the average efficacy of a treatment as well as to learn about its most frequently occurring side-effects. This is meant to address the following concerns. First, effects of a treatment may be small and therefore undetectable except when studied systematically on a large population. Second, biological organisms (including humans) are complex, and do not react to the same stimulus in the same way, which makes inference from single clinical reports very unreliable and generally unacceptable as scientific evidence. Third, some conditions will spontaneously go into remission, with many extant reports of miraculous cures for no discernible reason. Finally, it is well-known and has been proven that the simple process of administering the treatment may have direct psychological effects on the patient, sometimes very powerful, which is known as the placebo effect.
"Evidence-based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients."
@ Types of trials:
1, Open trial
In an open trial, the researcher knows the full details of the treatment, and so does the patient. These trials are open to challenge for bias, and they do nothing to reduce the placebo effect. However, sometimes they are unavoidable, particularly in relation to surgical techniques, where it may not be possible or ethical to hide from the patient which treatment he or she received.
2, Blind trials
In a single-blind trial, the researcher knows the details of the treatment but the patient does not. Because the patient does not know which treatment is being administered (the new treatment or another treatment) there should be no placebo effect. In practice, since the researcher knows, it is possible for them to treat the patient differently or to subconsciously hint to the patient important treatment-related details, thus influencing the outcome of the study.
In a double-blind trial, one researcher allocates a series of numbers to 'new treatment' or 'old treatment'. The second researcher is told the numbers, but not what they have been allocated to. Since the second researcher does not know, they cannot possibly tell the patient, directly or otherwise, and cannot give in to patient pressure to give them the new treatment. In this system, there is also often a more realistic distribution of sexes and ages of patients. Therefore double-blind (or randomized) trials are preferred, as they tend to give the most accurate results.
But with surgical procedures, for example, a surgeon inevitably knows whether it is the procedure or a sham that he or she is performing. The evaluation of such procedures can be approximately double-blind if the researchers responsible for recording subjects' responses and analyzing the data are blinded. Such a test typically is not considered "double-blind."
Some randomized controlled trials are considered triple-blinded, although the meaning of this may vary according to the exact study design. The most common meaning is that the subject, researcher and person administering the treatment (often a pharmacist) are blinded to what is being given. Alternately, it may mean that the patient, researcher and statistician are blinded. These additional precautions are often in place with the more commonly accepted term "double blind trials", and thus the term "triple-blinded" is infrequently used. However, it connotes an additional layer of security to prevent undue influence of study results by anyone directly involved with the study.
@ Tabular presentation
(1) contingency table: the categories must be mutually exclusive
univariate distribution---- one-way table
bivariate distribution----- two-way table
trivariate distribution---- three-way table
(2) multiple response table: e.g., a patient may have more than one symptom
@ Central tendency is mainly expressed in three ways: mean=7.9 days; median=7 days; mode=5 days
---- the mean has a drawback: it is easily and unduly influenced by extreme (extraordinary) values
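The three measures, and the sensitivity of the mean to an extreme value, can be illustrated with a small sketch. The data set is hypothetical; it was chosen only so that it reproduces the figures quoted above (mean 7.9, median 7, mode 5 days).

```python
import statistics

# Hypothetical lengths of stay in days, chosen to reproduce the figures above.
stays = [5, 5, 5, 6, 7, 7, 8, 9, 10, 17]

def central_tendency(values):
    """Return (mean, median, mode) of a list of numbers."""
    return statistics.mean(values), statistics.median(values), statistics.mode(values)

mean_days, median_days, mode_days = central_tendency(stays)

# Replace the longest stay with an extreme value: the mean moves, the median does not.
with_outlier = stays[:-1] + [170]
mean_out, median_out, mode_out = central_tendency(with_outlier)

print(mean_days, median_days, mode_days)
print(mean_out, median_out, mode_out)
```

A single extreme stay roughly triples the mean while leaving the median and mode unchanged, which is why the median is often reported for skewed clinical data.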
@ measuring variability (spread):
(1) Range=max - min
(2) Standard deviation (SD)
(3) Coefficient of variation (CV)=SD/mean (percent deviation)------e.g., an SD of 5 mmHg for systolic pressure readings is small, but an SD of even 3 g/dL for Hb level is large
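A short sketch of the three spread measures follows. The sample values are hypothetical, and so are the group means used for the blood-pressure and haemoglobin comparison (120 mmHg and 14 g/dL are assumed only to make the CV concrete). The sample SD with an n - 1 denominator is used.

```python
import statistics

def spread(values):
    """Range, sample SD and coefficient of variation (in percent)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    return {
        "range": max(values) - min(values),
        "sd": sd,
        "cv_percent": 100 * sd / mean,
    }

def cv_percent(sd, mean):
    """Coefficient of variation from a reported SD and mean."""
    return 100 * sd / mean

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
summary = spread(sample)

# Assumed means: 120 mmHg for systolic pressure, 14 g/dL for haemoglobin.
cv_systolic = cv_percent(5, 120)
cv_hb = cv_percent(3, 14)
print(summary, round(cv_systolic, 1), round(cv_hb, 1))
```

Under these assumed means the CV is about 4% for systolic pressure and about 21% for Hb, which is the sense in which the smaller SD is nevertheless the larger spread.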
(1) skew: A distribution is skewed if one of its tails is longer than the other
--- positive skew (skewed to the right) // negative skew (skewed to the left)
(2) Kurtosis: based on the size of a distribution’s tails
--- large (heavy) tails: leptokurtic; small (light) tails: platykurtic; the same kurtosis as a normal distribution: mesokurtic
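Skewness and kurtosis can be computed from central moments. The sketch below uses the simple moment-based (population) definitions, g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2^2 - 3. Statistical packages often apply small-sample corrections, so their values may differ slightly. The data are hypothetical.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over the data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: positive means a longer right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0.
    Positive: leptokurtic; negative: platykurtic."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]      # hypothetical, symmetric and flat
right_skewed = [1, 1, 1, 2, 10]  # hypothetical, long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```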
@ presentation of variation by figures----should not be too complex !!
------A polygon is the most commonly drawn figure: a shape enclosed by straight lines that can represent a frequency curve
(2) Diagram (chart)---Bar, Pie, Line, scatter
(3) Box(-and-Whiskers) plot: A box is drawn from the first quartile Q1 to the third quartile Q3 and is split into two divisions at the median. The lower edge of the lower division marks Q1 and the upper edge of the upper division marks Q3.
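The numbers behind a box-and-whiskers plot can be sketched as a five-number summary. Two assumptions are made here: quartiles use the "inclusive" method of `statistics.quantiles` (other quartile conventions give slightly different values), and the whiskers are taken to run to the minimum and maximum, which is only one of the common whisker rules. The data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum of the data."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,          # lower edge of the box
        "median": median,  # line dividing the box
        "q3": q3,          # upper edge of the box
        "max": max(values),
    }

box = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])  # hypothetical data
print(box)
```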
@ Wrong theory from “Joint Statistical Papers”---- Neyman and Pearson
---- type I error: reject right hypothesis (they state that p value=α means the probability of a type I error)---1-α=confidence level
type II error: accept wrong hypothesis (β means the probability of a type II error)---1-β=power
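Under the definitions just listed, the link between the type I error rate, the type II error rate and power can be made concrete for the simplest case, a one-sided one-sample z test with known SD. The function name and the numbers are hypothetical; this is a sketch of the textbook formula power = 1 - Φ(z_(1-α) - δ√n/σ), not a general sample-size tool.

```python
import math
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided one-sample z test to detect a true mean shift delta,
    given a known SD sigma, sample size n and type I error rate alpha."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha)  # critical value for the test
    shift = delta * math.sqrt(n) / sigma          # true shift in standard-error units
    return 1 - standard_normal.cdf(z_alpha - shift)

# Hypothetical design: detect a 5-unit shift, SD 10, 25 patients.
power = z_test_power(delta=5, sigma=10, n=25)
type_ii_error = 1 - power
print(round(power, 3), round(type_ii_error, 3))
```

When the true shift is zero the function returns alpha itself, which matches the definition of the type I error rate.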
@ ANOVA (analysis of variance)
<1> ANOVA was invented by Ronald A. Fisher----it is especially useful for analyzing interrelated factors and exploring causes of heterogeneity
(1) factor: controlled----- stratification (+ or -) // uncontrolled
(2) The only effective way to reduce confounding is equal distribution of the uncontrolled factors (also called covariates) among Tx groups, e.g., by stratification
(3) Specification form
@ curve over the time
@ ANOVA table
(4) linear model:
--- if an explanatory variable is continuous, its effects are represented by a mean response curve, known as a regression curve; if it is categorical, its effects are represented by a group of means.
(5) measurements for multiple factors
a, type I measure: simple average without stratification, i.e., nothing is computed separately; all the data are pooled
b, type II measure:
c, type III measure: stratified averages representing joint effects of involving factors
(6) F value= ratio of the mean square for the effects of interest to the residual mean square----- a large value indicates that the variation caused by the effects is greater than that caused by the uncontrolled factors, which suggests a significant effect.
(7) Student’s t test is valid for almost any underlying distribution if n is large but requires a Gaussian pattern if n is small. --- similar conditions also apply to the methods for comparing means in >= 3 groups
(8) However, ANOVA requires that the sample be obtained by random sampling
(9) The method compares within-group variance (arising from factors such as age, gender, height, etc., which the analysis treats as under control) with between-group variance---as a formula, F=between-group variance/within-group variance; F>>1 indicates that the group means are different. The values are assumed to follow a normal distribution, and the cutoff value of F is taken at α=0.05
1, Gaussian pattern: simple criteria for what counts as a "large" sample
for proportion, n is large if np>=8 and n(1-p)>=8
for mean, n is large if n>=30
2, homoscedasticity (equality of variance in different groups): checked with Hartley’s Fmax test
3, independence: checked with the Durbin-Watson test
(11) Multivariate analysis is best used sparingly, because many terms are not clearly defined in themselves, e.g., HTN involves two factors, systolic and diastolic BP; cure rate involves symptomatic response, functional change, laboratory results, etc.
(12) Covariates are factors or explanatory variables that are observed and recorded, but not used to guide the assignment of p’ts to Tx groups
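The F ratio and two of the checks above can be sketched for a one-way layout. The function names and group data are hypothetical. The sketch computes only the statistics: comparing F or Fmax with a critical value needs the corresponding tables or a statistics package. The large-sample rule for proportions uses the threshold of 8 quoted in these notes.

```python
import statistics

def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)       # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)   # n - k degrees of freedom
    return ms_between / ms_within

def hartley_fmax(groups):
    """Largest group variance divided by the smallest (homoscedasticity check)."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

def is_large_for_proportion(n, p, threshold=8):
    """Rule of thumb from the notes: np and n(1-p) both at least the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Hypothetical outcome values in three Tx groups.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_value = one_way_anova_f(groups)
fmax = hartley_fmax(groups)
print(f_value, fmax, is_large_for_proportion(100, 0.1))
```

An Fmax close to 1 is consistent with equal variances; a large F with similar group variances points to a difference in means rather than in spread.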
<13> Multiple comparisons in ANOVA
Tukey (Tukey-Kramer if unequal group sizes), Scheffé, Bonferroni and Newman-Keuls methods are provided for all pairwise comparisons (Armitage and Berry, 1994; Wallenstein, 1980; Miller, 1981; Hsu, 1996; Kleinbaum et al., 1998). Dunnett's method is used for multiple comparisons with a control group (Hsu, 1996).
For k groups, ANOVA can be used to look for a difference across k group means as a whole. If there is a statistically significant difference across k means then a multiple comparison method can be used to look for specific differences between pairs of groups. The reason that two sample methods should not be used to make multiple pairwise comparisons is that they are not designed for repeat testing in a "data dredging" manner.
If 20 repeat pairwise tests are made then you cannot accept the conventional 1 in 20 chance of being wrong as a cut-off level for statistical inference, i.e. there is a higher risk of type I error. A simple solution to this problem is to reduce the cut-off for statistical significance with increasing numbers of contrasts made; Bonferroni's method does just this with multiple t tests. More sophisticated methods, such as Tukey(-Kramer), consider the statistical distributions associated with systematic repeated testing; both Tukey(-Kramer) and Newman-Keuls methods are based upon the Studentized range statistic. Scheffé's method gives a very conservative/cautious weighting against the risk of type I error and is therefore less powerful for the detection of true differences. The most acceptable general method for all pairwise comparisons is Tukey(-Kramer), the P values for which are exact with balanced designs (Hsu, 1996).
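The arithmetic behind this paragraph can be sketched in a few lines. The familywise error formula assumes the tests are independent, which is an idealization (pairwise comparisons on the same groups are not independent). The function names and p-values are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance cut-off that keeps the overall rate near alpha."""
    return alpha / n_tests

def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value as significant or not after Bonferroni correction."""
    cutoff = bonferroni_cutoff(alpha, len(p_values))
    return [p <= cutoff for p in p_values]

fwer_20 = familywise_error_rate(0.05, 20)
cutoff_20 = bonferroni_cutoff(0.05, 20)
flags = bonferroni_significant([0.001, 0.02, 0.04])  # hypothetical p-values
print(round(fwer_20, 3), cutoff_20, flags)
```

With 20 independent tests at the 0.05 level the chance of at least one spurious finding is roughly 64%, and the Bonferroni cut-off drops to 0.0025 per test.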
The outputs from the different multiple contrast methods are displayed in decreasing order based upon the absolute value of the difference between the means of the two groups compared for each contrast. The word "stop" is shown next to the first non-significant P value; this indicates that you should not consider further contrasts if you are making a simultaneous analysis (similar to the Shaffer-Holm method).
The following is a decision tree for selecting a multiple contrast method:
· all pairwise comparisons
  · equal group sizes: Tukey
  · unequal group sizes: Tukey-Kramer or Scheffé
· not pairwise
  · with a control: Dunnett
  · planned: Bonferroni
  · not planned: Scheffé
Note that Bonferroni and Scheffé methods are completely general; they can be used for unplanned (a posteriori) or planned (a priori) multiple comparisons.
This is a controversial area in statistics and you would be wise to seek the advice of a statistician at the design stage of your study. In general you should design experiments so that you can avoid having to "dredge" groups of data for differences; decide which contrasts you are interested in at the outset. Note that multiple independent comparisons (e.g. multiple t or Mann-Whitney tests) may be justified if you identify the comparisons as valid at the design stage of your investigation.
@ one-tail test is enough wherever clear indication is available
(1) reliability= repeatability of a result in a series of similar studies
(2) robustness= stability of a result against minor fluctuations
@ The number of patients is also called study size or sample size---an error in a large sample is like a drop of dye falling into the Pacific Ocean: it has no effect at all
@ Kaplan-Meier plot of projected survival rates
@ Study design:
(1) #parallel setups: a number of clearly defined groups to which patients are assigned in a mutually exclusive manner
i.e., each p’t is assigned to only one Tx
#crossover setups: a number of Tx sequences, with each p’t assigned in a mutually exclusive manner to one of those sequences.---- each p’t is assigned to multiple Txs in a sequence---the time order is also randomized---this design can reduce cost by generating a large quantity of data from very few p’ts
Sequence A to B
Sequence B to A
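Random assignment of patients to the two sequences of a two-period crossover can be sketched as below. The function name, the patient identifiers and the fixed seed are hypothetical; a real trial would use a documented, concealed randomization procedure, and this sketch simply shuffles the list and splits it in half so that the two sequences are balanced.

```python
import random

def assign_crossover_sequences(patient_ids, seed=0):
    """Randomly assign each patient to exactly one sequence, A then B or B then A."""
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        patient: ("A then B" if position < half else "B then A")
        for position, patient in enumerate(shuffled)
    }

assignments = assign_crossover_sequences(range(1, 9), seed=42)  # 8 hypothetical patients
for patient in sorted(assignments):
    print(patient, assignments[patient])
```

Every patient receives both treatments, so each serves as his or her own control; only the order differs between the two sequences.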