2.1. Statistics

[Ref: Michael Bennett's lecture, 2006-05-11]

 

Study design

Randomised controlled trials

Blinding

Elimination of observer bias (i.e. conscious or unconscious influence by researchers or subjects)

Blinding is important --> always advantageous

Allocation concealment

People in the study are unaware of, and cannot discover, the upcoming allocation.

Enrolment occurs before allocation.

Not just blind to treatment, but also to allocation.

A 3rd party (not involved in the study) performs the allocation.

Design

Recruited study sample
* Inclusive? Exclusive?
* Exclusive --> Uses very tight inclusion criteria. May affect how well the result of the study can be applied to the rest of the population: generalisability is poor, and the result may only apply to a subset of the population.

Randomisation
* New therapy
* Standard therapy

Outcome measurement

 

Identification of study population

Draw study sample

Identify outcome parameters

 

Cohort studies

Case-control studies

Always retrospective

Cases vs controls
* Population-based controls would be better

Different levels of exposure between groups --> people in the control group may also have been exposed to the study factor (e.g. smoking), just like the cases.

Good for hypothesis generation. Very poor for proving causation.

Cross-sectional studies

Observational.

e.g. look at all lung cancer patients --> check if they are smokers.

Conclusion

At the end of the day, studies are designed to find out whether differences between groups are due to chance.

Levels of evidence

Level one: Large RCTs or systematic reviews with formal meta-analysis

Level two: Smaller RCTs or good cohorts

Level three: Case-control studies

Level four: Cross-sectional or descriptive studies

Level five: Case reports, anecdote

Types of data

In general, data can either:
* Conform to a described distribution (parametric), OR
* Be distribution-free (non-parametric)

Population

Sample

Inference

Error: sampling error vs non-sampling errors.

Central tendency and dispersion

A numerical value describing a characteristic of a sample is called a statistic

A statistic is often used as an estimate of a population parameter

Mean, median, mode (central tendency)

Standard deviation, range (dispersion)

Some distributions

Normal distribution

Bell-shaped, symmetrical about its mean

Has the property that 32% of the area under the curve lies outside 1 standard deviation in either direction from the mean, and 5% of the area lies outside 1.96 standard deviations

Standard deviation = square root of variance

SEM
= standard error of the mean
= SD of the distribution of sample means (SD / sqrt(n))
* Always smaller than the SD
* More relevant to confidence intervals

Variance = sum of (value - mean)^2 / (n - 1)

T-distribution

Chi-square distribution

 

Continuous vs interval/nominal/categorical

Use common sense

If not continuous, is it interval or categorical?

Appropriate test selection

--> Popular question

Questions can be:

Single group - descriptive only

Two groups

 

Number of cases at any one time = prevalence

Rate at which new cases occur = incidence

 

Measures of frequency and association

Relative risk (RR)

2x2 table (drug 1 and drug 2, outcome 1, and outcome 2)

The ratio of the incidence in one group to that in another: RR = [a/(a+b)] / [c/(c+d)]
--> tested with the chi-square test


Odds ratio (OR)

The ratio of the odds of an event in one group to the odds in another. An odds is not a proportion, and therefore an OR is not directly interpretable as a risk.

                    Drug 1   Drug 2
Adverse effect         A        B
No adverse effect      C        D

Odds ratio = (A/C) / (B/D)


Sensitivity and specificity

Table:
             IV    Not IV   Total
Test +ve     35    380      415
Test -ve     15    570      585
Total        50    950      1000

Prevalence = 50/1000 = 5%

Sensitivity
= TP / (TP + FN)
= 35/50 = 70%
= the chance of the test being positive when the patient truly has the condition

Specificity
= TN / (FP + TN)
= 570/950 = 60%
= the chance of the test being negative when the patient truly does not have the condition

Positive predictive value
= 35/415
= 8.4%
* Requires the prevalence to calculate


 

Error and power

If we incorrectly reject the null hypothesis, we commit a type I error (alpha)
* Cannot be helped
* e.g. a true null hypothesis is rejected because the data fall outside the 95% CI due to chance
* Conventionally, a 5% type I error rate is acceptable

With small samples and/or small real differences, we may fail to reject the null hypothesis when it is false --> a type II error (beta)
* Occurs more often than type I errors
* Conventionally, a 20% type II error rate is acceptable

Increasing sample size is the only way of improving the chance of avoiding both errors at the same time

The difference we choose should be the smallest clinically important difference

To calculate sample size, therefore, we need to know or define (see the sketch below):
* The minimal important effect
* The acceptable type I error
* The acceptable type II error
* The variability (or SD) of the characteristic we are measuring in the sample we will use


Confidence intervals

The P-value and the point estimate of the effect give us a lot of information, however...

There is no information on the precision of the result, or on the confidence with which we should apply the result in clinical practice

CI gives us this information

The range of values within which we can be 95% confident that the true population value lies

General formula: d +/- 1.96 x standard error, where d is the point estimate of the effect
