Dr. 蔡豐州
Background Fields on Advanced Medical Statistics
A] Clinical Diagnosis:
 1] Validity control: ROC, sensitivity / specificity
 2] Capture-recapture sampling
 3] χ² association test / Fisher's exact test
 4] Marker detection:
  a) t test / rank-sum test / signed-rank test
  b) Pearson product-moment r / Spearman rank ρ
B] Clinical Decision Making:
 1] Decision tree
 2] Sequential trial
C] Clinical Treatment:
 RCT (Randomized Clinical Trial): block / stratification; efficiency / effectiveness / efficacy
 1] χ² tests: general / Breslow-Day test / Cochran-Mantel-Haenszel tests
 2] Armitage trend test
 3] t, ANOVA tests / nonparametric tests
 4] Multiple linear regression / multiple logistic regression
D] Prognosis (treatment comparison):
 1] Kaplan-Meier curve / log-rank test
 2] Cox proportional hazards regression: HR (hazard ratio) estimation
E] Disease Causation:
 1] Incidence rates: risk ratios / odds ratios / SMR
 2] Confounding: standardization
 3] χ², t, ANOVA tests / Fisher's exact test / rank-sum test, etc.
 Mantel-Haenszel test, trend test, multiple logistic regression: OR estimation
@@ Statistics for head-count data: χ² tests
I. Variance tests:
 1. Variance of a normal distribution
 2. Bartlett's test for homogeneity of variances
 3. Breslow-Day test for homogeneity of ORs (meta-analysis of ORs): 2×2 tables with large sample sizes
II. Inference tests:
 1. Association test (χ²_A)
 2. Goodness-of-fit test (χ²_GF)
 3. Trend test [Cochran-Armitage χ²_T]
 4. Fisher's exact test
 5. McNemar's test (paired χ²_M): e.g., comparing the sensitivity of two tests applied to the same patients
 6. Cochran-Mantel-Haenszel test (CMH / Q_MH: stratified χ²): more general than the Breslow-Day test
 7. Mantel extension test: trend test adjusted for confounding
 8. Matched stratified Cochran-Armitage test (matched stratified χ²_T): trend test
III. Clinical / epidemiological χ² tests for person-time data:
 1. χ²_HM (χ²_het): Woolf's homogeneity test (for odds ratios across strata)
 2. Incidence-rate trend test
 3. χ²_LR: log-rank test [Mantel-Cox test] (Kaplan-Meier survival comparisons)
IV. Genotoxicity χ² tests:
 1. χ²_GF for dose-response curve comparison
 2. χ²_MT for mutagenicity significance
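Two of the tests listed above, McNemar's paired χ² and the Cochran-Armitage trend χ², have short closed forms. The sketch below is a minimal illustration that assumes the usual large-sample χ² approximation with 1 degree of freedom. The function names are hypothetical and are not taken from any statistics package. For 1 df the upper-tail P value equals erfc(√(x/2)), so only the standard library is needed.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail P value of a chi-square statistic with 1 df.

    For 1 df, chi-square = Z**2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def mcnemar(b, c, continuity=False):
    """McNemar's paired chi-square (hypothetical helper).

    b = pairs positive on test 1 only, c = pairs positive on test 2 only.
    The concordant pairs do not enter the statistic.
    """
    diff = abs(b - c) - (1 if continuity else 0)
    stat = diff ** 2 / (b + c)
    return stat, chi2_sf_1df(stat)


def cochran_armitage(cases, totals, scores):
    """Cochran-Armitage chi-square for a linear trend in proportions (1 df).

    cases[i] / totals[i] is the observed proportion in ordered group i,
    and scores[i] is the score (e.g. dose level) assigned to that group.
    """
    n_all = sum(totals)
    p_bar = sum(cases) / n_all  # pooled proportion under H0 (no trend)
    # T: score-weighted sum of observed minus expected cases
    t = sum(s * (r - n * p_bar) for s, r, n in zip(scores, cases, totals))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    var_t = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_all)
    stat = t * t / var_t
    return stat, chi2_sf_1df(stat)
```

Worked example: suppose 10 patients are positive only on test 1 and 2 only on test 2. Then `mcnemar(10, 2)` gives χ²_M = 64/12 ≈ 5.33 (P ≈ 0.02). In the trend sketch the scores are supplied by the analyst (e.g. 0, 1, 2 for ordered dose levels), and that choice is itself an assumption.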
@ Bias / Precision
@ Retrospective studies are especially prone to bias: (1) selection bias (2) recall bias
@ Two-sample tests for binomial proportions (e.g., smokers vs. nonsmokers): (1) chi-square test for a 2×2 contingency table (2) Fisher's exact test, especially for small samples (3) chi-square goodness-of-fit test
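Tests (1) and (2) can be sketched with the Python standard library alone. This is a minimal illustration that assumes the table layout shown in the docstring: rows are exposure (e.g. smokers / nonsmokers) and columns are disease present / absent. The function names are hypothetical. For the two-sided Fisher P value, the sketch follows a common convention: it sums every table with the same margins whose probability is no greater than that of the observed table.

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square (1 df) for the 2x2 table

                 Disease+  Disease-
        Exp+        a         b
        Exp-        c         d
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:  # optional Yates continuity correction
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))  # 1-df upper-tail P value


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the same 2x2 layout."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, row1)

    def prob(x):
        # hypergeometric probability that cell a equals x, given fixed margins
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # a tiny relative tolerance guards against floating-point ties
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))
```

Take the table a = 3, b = 1, c = 1, d = 3. Here `chi2_2x2` gives χ² = 2.0, but every expected count is only 2, so the χ² approximation is doubtful. The exact test (P = 34/70 ≈ 0.49) is the better choice for a sample this small.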
@ Measures of effect for categorical data
 1, p1: probability of developing disease for exposed individuals; p2: probability of developing disease for unexposed individuals. Risk difference = p1 − p2; risk ratio (relative risk) = p1/p2
 2, Odds ratio (OR): designed to avoid the range restriction of the risk ratio (RR cannot exceed 1/p2). With q = 1 − p (p: probability of success), OR = (p1/q1) / (p2/q2)
 3, Confounding: a confounder is positive when it is associated with exposure and with disease in the same direction (positive with both or negative with both). It is negative when it is positively associated with one and negatively with the other. Stratification is recommended to control confounding. For example, a study of the effect of drinking on oral cancer must take smoking into account. Age in particular often has to be controlled, i.e., by age standardization, which yields a standardized risk ratio.
 4, Methods of inference for stratified categorical data: Mantel-Haenszel test
 5, Chi-square test for homogeneity of odds ratios over different strata (Woolf method)
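Items 1, 2, 4 and 5 can be made concrete with a few lines of arithmetic. The sketch below is a minimal illustration that assumes each 2×2 table is given as (a, b, c, d) = (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases). The function names are hypothetical. The pooled value shown is the Mantel-Haenszel odds ratio, i.e. the estimator that accompanies the Mantel-Haenszel test, not the test statistic itself. Woolf's statistic is the inverse-variance-weighted spread of the stratum log ORs, to be compared with χ² on k − 1 degrees of freedom.

```python
import math


def effect_measures(a, b, c, d):
    """Risk difference, risk ratio and odds ratio for one 2x2 table."""
    p1 = a / (a + b)  # risk in the exposed
    p2 = c / (c + d)  # risk in the unexposed
    return {
        "risk_difference": p1 - p2,
        "risk_ratio": p1 / p2,
        "odds_ratio": (p1 / (1 - p1)) / (p2 / (1 - p2)),  # equals (a*d)/(b*c)
    }


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR; each stratum is a tuple (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def woolf_homogeneity(strata):
    """Woolf's chi-square for homogeneity of ORs across k strata (k - 1 df).

    Assumes no zero cells; with zero cells, 0.5 is often added to every cell.
    """
    log_ors, weights = [], []
    for a, b, c, d in strata:
        log_ors.append(math.log(a * d / (b * c)))
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))  # 1 / Var(log OR)
    pooled = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    return sum(w * (l - pooled) ** 2 for w, l in zip(weights, log_ors))
```

Example: for a = 30, b = 70, c = 10, d = 90 the risks are p1 = 0.3 and p2 = 0.1. That gives RD = 0.2, RR = 3 and OR = (0.3/0.7)/(0.1/0.9) ≈ 3.86. The OR exceeds the RR here because the outcome is not rare.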
@ Multiple logistic regression
The methods above are techniques for controlling for a single categorical covariate C while assessing the association between a dichotomous disease variable D and a categorical exposure variable E. But a new method, multiple logistic regression, is needed if any of the following holds: E is continuous; C is continuous; or there are many confounding variables C1, C2, C3, ..., each of which may be categorical or continuous.
@ A randomized controlled trial (RCT) is a form of clinical trial, or scientific procedure, used to test the efficacy of medicines or medical procedures. It is widely considered the most reliable form of scientific evidence because it is the best-known design for eliminating the many biases that regularly compromise the validity of medical research. Sellers of medicines throughout the ages have had to convince their patients that the medicine works. As science has progressed, public expectations have risen and government health budgets have become ever tighter, so pressure has grown for a reliable system to do this. Moreover, the public's concern about the dangers of medical interventions has spurred both legislators and administrators to require an evidential basis for licensing or paying for new procedures and medications. In most modern healthcare systems, all new medicines and surgical procedures therefore have to undergo trials before being approved. Trials are used to establish the average efficacy of a treatment and to learn about its most frequent side effects. This addresses the following concerns. First, the effects of a treatment may be small and therefore undetectable except when studied systematically in a large population. Second, biological organisms (including humans) are complex and do not all react to the same stimulus in the same way. This makes inference from single clinical reports very unreliable and generally unacceptable as scientific evidence. Third, some conditions go into spontaneous remission, and there are many reports of miraculous cures for no discernible reason.
Finally, it is well known that simply administering a treatment may have direct psychological effects on the patient, sometimes very powerful ones; this is known as the placebo effect.
@ EBM: "Evidence-based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients."
@ Types of trials:
1, Open trial
In an open trial, both the researcher and the patient know the full details of the treatment. Such trials are open to challenge for bias, and they do nothing to reduce the placebo effect. However, they are sometimes unavoidable. This is particularly so for surgical techniques, where it may not be possible or ethical to hide from the patient which treatment he or she received.
2, Blind trials
Single-blind trial
In a single-blind trial, the researcher knows the details of the treatment but the patient does not. The patient does not know which treatment is being given (the new treatment or another treatment), so any placebo effect should be similar in both groups. In practice, however, the researcher knows. He or she may therefore treat patients differently or subconsciously hint at important treatment-related details, thus influencing the outcome of the study.
Double-blind trial
In a double-blind trial, one researcher allocates a series of numbers to "new treatment" or "old treatment". A second researcher is told the numbers but not what they have been allocated to. Since the second researcher does not know, he or she cannot tell the patient, directly or otherwise, and cannot give in to patient pressure to receive the new treatment. This system also often yields a more realistic distribution of patients' sexes and ages. Randomized double-blind trials are therefore preferred, as they tend to give the most accurate results. With surgical procedures, however, the surgeon inevitably knows whether he or she is performing the procedure or a sham. The evaluation of such procedures can be approximately double-blind if the researchers who record subjects' responses and analyze the data are blinded. Such a test, however, is typically not considered "double-blind."
Triple-blind trial
Some randomized controlled trials are described as triple-blind, although the meaning varies with the study design. Most commonly, the subject, the researcher and the person administering the treatment (often a pharmacist) are all blinded to what is being given. Alternatively, it may mean that the patient, the researcher and the statistician are blinded. These additional precautions are often already in place in what are commonly called "double-blind trials", so the term "triple-blind" is infrequently used. It does, however, connote an additional safeguard against undue influence on the study results by anyone directly involved with the study.
@ Tabular presentation: (1) Contingency table: categories must be mutually exclusive. Univariate distribution: one-way table; bivariate distribution: two-way table; trivariate distribution: three-way table. (2) Multiple-response table: e.g., a patient may have more than one symptom.
@ Central tendency is expressed in three main ways, e.g., mean = 7.9 days; median = 7 days; mode = 5 days. The mean has a drawback: it is easily distorted by extreme values (undue influence).
@ Measuring variability (spread): (1) Range = max − min (2) Standard deviation (SD) (3) Coefficient of variation (CV) = SD/mean (relative deviation, expressed as a percent). Whether an SD is large depends on the scale. For example, an SD of 5 mmHg for systolic pressure readings is small, but an SD of even 3 g/dL for Hb level is large.
@ Shape: (1) Skewness: a distribution is skewed if one of its tails is longer than the other: positive skew (skewed to the right) vs. negative skew (skewed to the left). (2) Kurtosis: based on the size of a distribution's tails. Large tails are called leptokurtic, small tails platykurtic, and the same kurtosis as the normal distribution mesokurtic.
@ Presentation of variation by figures (should not be too complex!): (1) Histogram; the frequency polygon is more commonly drawn: a shape enclosed by straight lines that can show the frequency curve. (2) Diagrams (charts): bar, pie, line, scatter. (3) Box(-and-whiskers) plot: the box is divided into two parts at the median.
The lower part of the box extends down to the first quartile Q1, and the upper part extends up to the third quartile Q3.
@ "Wrong theory" from "Joint Statistical Papers" (Neyman and Pearson): Type I error: rejecting a true hypothesis (they claimed that p value = α means the probability of a type I error); 1 − α = confidence level. Type II error: accepting a false hypothesis (β is the probability of a type II error); 1 − β = power.
@ ANOVA (analysis of variance)
<1> ANOVA, devised by Ronald A. Fisher, is especially useful for analyzing interrelated factors and exploring the causes of heterogeneity.
(1) Factors: controlled (stratification, + or −) vs. uncontrolled
(2) The only effective way to reduce confounding is an equal distribution of the uncontrolled factors (also called covariates) among treatment groups, e.g., by stratification.
(3) Specification form
  · Response variables
  · Controlled factors (interaction)
  · Covariates
  · Chronological marker (time-specific; curve over time)
  · Presentation (ANOVA table; graphics)
(4) Linear model: if an explanatory variable is continuous, its effects are represented by a mean response curve, known as a regression curve; if it is categorical, its effects are represented by a group of means.
(5) Measures for multiple factors:
  a, type I measure: simple average without stratification, i.e., not computed separately by stratum; all data are pooled
  b, type II measure:
  c, type III measure: stratified averages representing the joint effects of the factors involved
(6) F value = ratio of the mean sum of squares for the effects of interest to the residual mean sum of squares. A large value indicates that the variation caused by the effects is greater than that caused by the uncontrolled factors, which suggests a significant effect.
(7) Student's t test is valid for almost any underlying distribution if n is large but requires a Gaussian pattern if n is small. Similar conditions also apply to methods for comparing means in ≥ 3 groups.
(8) However, ANOVA requires the sample to be a random sample.
(9) The method compares the within-group variance (arising from factors such as age, gender, height, etc., which the analysis regards as under control) with the between-group variance: F = between-group variance / within-group variance. If F >> 1, the group means are different.
The data should follow a normal distribution; F is then compared with the cutoff value of the F distribution at α = 0.05.
(10) Validity requires three conditions:
  1, Gaussian pattern. A simple criterion for a "large" sample: for a proportion, n is large if np ≥ 8 and n(1 − p) ≥ 8; for a mean, n is large if n ≥ 30
  2, Homoscedasticity (equality of variance in different groups): check with Hartley's Fmax test
  3, Independence: check with the Durbin-Watson test
(11) Multivariate analysis is best used sparingly, because many terms are themselves not well defined. For example, HTN involves both systolic and diastolic BP, and cure rate includes symptomatic response, functional change, laboratory results, etc.
(12) Covariates are factors or explanatory variables that are observed and recorded but not used to guide the assignment of patients to treatment groups.
<13> Multiple comparisons in ANOVA
Tukey (Tukey-Kramer if group sizes are unequal), Scheffé, Bonferroni and Newman-Keuls methods are available for all pairwise comparisons (Armitage and Berry, 1994; Wallenstein, 1980; Miller, 1981; Hsu, 1996; Kleinbaum et al., 1998). Dunnett's method is used for multiple comparisons with a control group (Hsu, 1996). For k groups, ANOVA can be used to look for a difference across the k group means as a whole. If there is a statistically significant difference across the k means, a multiple comparison method can then be used to look for specific differences between pairs of groups. Two-sample methods should not be used to make multiple pairwise comparisons, because they are not designed for repeated testing in a "data dredging" manner. If 20 pairwise tests are made, you cannot accept the conventional 1-in-20 chance of being wrong as the cutoff for statistical inference, i.e., there is a higher risk of type I error. If the 20 tests were independent, the chance of at least one false positive would be 1 − 0.95^20 ≈ 64%. A simple solution is to reduce the cutoff for statistical significance as the number of contrasts increases; Bonferroni's method does just this with multiple t tests.
More sophisticated methods, such as Tukey(-Kramer), take into account the statistical distributions associated with systematic repeated testing. Both the Tukey(-Kramer) and Newman-Keuls methods are based on the Studentized range statistic. Scheffé's method weights very conservatively against the risk of type I error and is therefore less powerful for detecting true differences. The most acceptable general method for all pairwise comparisons is Tukey(-Kramer), whose P values are exact with balanced designs (Hsu, 1996). In the output described here, the contrasts are listed in decreasing order of the absolute difference between the means of the two groups compared. The word "stop" is shown next to the first nonsignificant P value. It indicates that further contrasts should not be considered if you are making a simultaneous analysis (similar to the Shaffer-Holm method).
The following is a decision tree for selecting a multiple contrast method:
 · pairwise
   · equal group sizes: Tukey
   · unequal group sizes: Tukey-Kramer or Scheffé
 · not pairwise
   · with a control: Dunnett
   · planned: Bonferroni
   · not planned: Scheffé
Note that the Bonferroni and Scheffé methods are completely general; they can be used for unplanned (a posteriori) or planned (a priori) multiple comparisons. This is a controversial area in statistics, and you would be wise to seek a statistician's advice at the design stage of your study. In general, design experiments so that you avoid having to "dredge" groups of data for differences: decide at the outset which contrasts you are interested in. Note that multiple independent comparisons (e.g., multiple t or Mann-Whitney tests) may be justified if you identify the comparisons as valid at the design stage of your investigation.
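This sketch covers three pieces of the ANOVA notes, using only the standard library:
· the F ratio (between-group over within-group variance)
· Hartley's Fmax check for homoscedasticity
· the Bonferroni per-comparison cutoff

It is a minimal illustration that assumes independent groups given as lists of numbers. The function names are hypothetical. No P values or critical values are computed; the statistics would be compared with tabulated values or handled by a statistics package. Tukey-Kramer, Scheffé and Dunnett rely on the Studentized range or other special distributions and are not shown.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """One-way ANOVA: return (F, df_between, df_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [mean(g) for g in groups]
    # between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # within-group (residual) sum of squares: spread around each group's own mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


def hartley_fmax(groups):
    """Hartley's Fmax: largest sample variance divided by the smallest."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


def bonferroni_alpha(k, alpha=0.05):
    """Per-comparison significance level for all k(k-1)/2 pairwise contrasts."""
    n_contrasts = k * (k - 1) // 2
    return alpha / n_contrasts
```

Example: for the groups [1, 2, 3], [4, 5, 6] and [7, 8, 9], `one_way_anova` returns F = 27 on (2, 6) df, well above the tabulated 5% cutoff of about 5.14. Fmax = 1 because the three groups have equal variances. With k = 3 groups there are 3 pairwise contrasts, so each Bonferroni-adjusted t test would use α = 0.05/3 ≈ 0.0167.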
@ A one-tailed test is sufficient wherever there is a clear prior indication of the direction of the effect.
@ Definitions: (1) reliability = repeatability of a result in a series of similar studies (2) robustness = stability of a result against minor fluctuations
@ The CrossGraphs software is excellent: www.statistics.com
@ The number of patients is also called the study size or sample size. In a large sample, an individual error is like a drop of dye falling into the Pacific Ocean: its effect is negligible.
@ Kaplan-Meier plot of projected survival rates
@ Study design:
(1) Parallel setups: a number of clearly defined groups to which patients are assigned in a mutually exclusive manner, i.e., each patient is assigned to only one treatment.
  Example arms (as listed): Drug A | Placebo | Drug B | A+B | B | Placebo | A | P
(2) Crossover setups: a number of treatment sequences, with each patient assigned in a mutually exclusive manner to one of those sequences. Each patient receives multiple treatments in sequence, and the order over time is also randomized. This design can reduce cost by generating a large quantity of data from very few patients.
                  | Period 1 | Washout | Period 2
  Sequence A to B | Tx A     | No Tx   | Tx B
  Sequence B to A | Tx B     | No Tx   | Tx A
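Patients can be assigned to the two crossover sequences at random while keeping the sequences balanced by using permuted blocks; the outline lists block randomization under RCTs. The sketch below is a minimal illustration under stated assumptions:
· two sequences, labelled as in the table above
· a hypothetical default block size of 4
· Python's `random` module as the source of randomness

The function name is hypothetical. In practice the allocation list would be generated in advance and kept concealed from the investigators enrolling patients.

```python
import random


def permuted_block_sequences(n_patients, block_size=4, seed=None):
    """Allocate patients to 'A to B' or 'B to A' in randomly permuted blocks.

    Each block contains equal numbers of both sequences, so the two
    sequences are balanced after every completed block.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two sequences")
    rng = random.Random(seed)  # seeded generator so the list is reproducible
    allocation = []
    while len(allocation) < n_patients:
        block = ["A to B", "B to A"] * (block_size // 2)
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    return allocation[:n_patients]
```

For example, `permuted_block_sequences(12, seed=2024)` returns 12 labels, six of each. If `n_patients` is not a multiple of the block size, the last block is truncated and balance is then only approximate.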
