How to Read a Chiropractic Research Study

· Chiropractic Research Center

A study gets cited on social media, a clinic pamphlet references “the latest research,” or a news headline announces that chiropractic care has been proven effective for a condition you happen to have. The research may well be sound. Or it may be a case series involving nine people and no control group, dressed up in confident language. The difference matters, and telling one from the other does not require a science degree; it requires knowing which questions to ask.

The Hierarchy of Evidence

Case Reports and Case Series

Most chiropractic research begins at the same place all clinical science does: someone observes something interesting in practice and writes it down. A case report documents a single patient — their presenting complaint, the treatment provided, and the outcome. A case series does the same for a small group, typically fewer than twenty people. These are the ground floor of the evidence hierarchy, and they are genuinely useful for what they are designed to do. A published case report on, say, a patient with cervical radiculopathy who responded well to spinal manipulation can alert other practitioners to a pattern worth investigating.

What case reports cannot do is prove that the treatment caused the improvement. Without a control group, there is no way to distinguish the effect of the intervention from natural recovery, the placebo response, or simple regression to the mean — the statistical tendency for extreme symptoms to moderate over time regardless of treatment. A patient who seeks care when their pain is at its worst will often improve, whether they receive manipulation, medication, or nothing at all.
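
Regression to the mean is easier to believe once you have watched it happen. The sketch below simulates people who receive no treatment at all but who only turn up for care on a bad day. Every number in it (the usual pain level, the day-to-day variation, the threshold of 7 for seeking care) is an illustrative assumption, not data from any study, and the simulated scores are not clipped to a 0 to 10 scale.

```python
import random
import statistics

def simulate_regression_to_mean(n_people=10_000, threshold=7.0, seed=0):
    """Simulate untreated people who seek care only when pain is high.

    All parameters are illustrative assumptions, not study data.
    """
    rng = random.Random(seed)
    first_visit, follow_up = [], []
    for _ in range(n_people):
        usual_pain = rng.gauss(5.0, 1.0)            # this person's long-run average
        today = usual_pain + rng.gauss(0.0, 1.5)     # pain on the day they might seek care
        later = usual_pain + rng.gauss(0.0, 1.5)     # pain some weeks on, with no treatment
        if today >= threshold:                       # only a bad day leads to a visit
            first_visit.append(today)
            follow_up.append(later)
    return statistics.mean(first_visit), statistics.mean(follow_up)

baseline, later = simulate_regression_to_mean()
print(f"Mean pain at first visit:              {baseline:.1f}")
print(f"Mean pain at follow-up (no treatment): {later:.1f}")
```

The follow-up average comes out lower than the first-visit average even though nobody in the simulation was treated. An uncontrolled case series would record that drop as improvement.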

This does not make case reports worthless. They generate hypotheses, document rare adverse events, and provide detailed clinical context that larger studies strip away. But when someone cites a case report as proof that a treatment works, they are asking the evidence to carry more weight than it was built to bear.

Randomised Controlled Trials

The randomised controlled trial is where chiropractic claims meet their real test. The basic architecture is straightforward: participants are randomly assigned to either a treatment group or a control group, and the outcomes are compared. Randomisation matters because it distributes confounding variables — age, fitness, pain severity, psychological factors — roughly equally between groups, so that any difference in outcomes can more confidently be attributed to the treatment itself.
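
As a rough illustration of why random assignment balances the groups, the sketch below shuffles a made-up cohort and compares the two halves. The cohort, its age range and its pain scores are all hypothetical, and the helper name randomise is invented for this example.

```python
import random
import statistics

def randomise(participants, seed=1):
    """Shuffle a copy of the participant list and split it into two equal groups."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical cohort: each participant has an age and a baseline pain score.
rng = random.Random(42)
cohort = [{"age": rng.randint(20, 70), "pain": rng.uniform(3, 9)} for _ in range(200)]

treatment, control = randomise(cohort)
for key in ("age", "pain"):
    t = statistics.mean(p[key] for p in treatment)
    c = statistics.mean(p[key] for p in control)
    print(f"{key}: treatment mean {t:.1f}, control mean {c:.1f}")
```

With a couple of hundred participants the group averages usually land close together. With a dozen, chance imbalances are far more likely, which is one reason small trials deserve extra caution.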

The particular challenge for chiropractic RCTs is blinding. In a drug trial, neither the patient nor the administering clinician needs to know whether the pill is real or a placebo. Manual therapy does not afford this luxury. You cannot deliver a convincing sham spinal adjustment the way you can a sugar pill — the practitioner always knows what they are doing, and the patient usually does too. Some trials use sham manipulation (light touch at non-therapeutic sites), but participants often guess their allocation, which introduces performance and expectation biases.

The Bronfort 2012 trial, published in the Annals of Internal Medicine and one of the more frequently cited studies in the field, compared spinal manipulation to medication and home exercise for acute and subacute neck pain across 272 participants. It found manipulation and exercise both outperformed medication at several time points. This design — comparing chiropractic to another active treatment rather than to a placebo — is called a pragmatic trial. It answers a practical question: does this treatment work better than what we already offer? That is a different question from whether it works better than nothing, but for patients deciding between treatment options, it is often the more relevant one.

Systematic Reviews and Meta-Analyses

At the top of the evidence hierarchy sit systematic reviews and their statistical counterparts, meta-analyses. A systematic review follows a pre-defined protocol to identify, evaluate, and synthesise all available research on a specific clinical question. Rather than relying on whatever studies the author happens to know about, it searches multiple databases systematically and applies quality criteria to decide which studies are rigorous enough to include.

A meta-analysis takes this a step further by statistically pooling the results of multiple trials to produce a single summary estimate of the treatment effect. The logic is intuitive: if five separate trials each found a small benefit from spinal manipulation for low back pain, combining their data gives a more precise estimate than any one trial alone. The Cochrane Collaboration is widely regarded as the gold standard for this kind of work, applying strict methodological criteria that have earned it credibility across medical disciplines.
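
The pooling step can be sketched with the simplest textbook approach, inverse-variance (fixed-effect) weighting, in which more precise trials count for more. The five trials below are invented for illustration. Published meta-analyses often use random-effects models and dedicated software instead, so treat this as a sketch of the idea rather than of any particular review's method.

```python
import math

def pool_fixed_effect(estimates):
    """Inverse-variance pooling of (mean_difference, standard_error) pairs."""
    weights = [1.0 / se**2 for _, se in estimates]
    pooled = sum(w * md for w, (md, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Five hypothetical trials: (mean difference in pain score, standard error).
trials = [(-0.8, 0.6), (-0.5, 0.5), (-1.1, 0.7), (-0.4, 0.4), (-0.9, 0.6)]

pooled, se = pool_fixed_effect(trials)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"Pooled difference: {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The pooled standard error is smaller than that of any single trial, which is where the extra precision comes from. It says nothing about whether the individual trials were well conducted.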

The Rubinstein 2011 Cochrane review examined the evidence for spinal manipulative therapy in chronic low back pain and concluded that it produced a small, short-term effect on pain that was statistically significant but not clinically relevant when compared with other interventions. That careful, qualified language is typical of well-conducted systematic reviews, and worth paying attention to. But there is an important limitation that is sometimes overlooked: a systematic review is only as strong as the trials it includes. If the underlying studies are small, poorly blinded, or methodologically flawed, pooling them does not repair those weaknesses. It can, in fact, lend a false sense of precision to imprecise data.

Making Sense of the Numbers

Sample Size and Why It Matters

The number of participants in a study is not just a logistical detail — it directly determines whether the study can detect a real treatment effect if one exists. This concept is called statistical power, and it is one of the most common weaknesses in chiropractic research.

Consider a trial testing whether spinal manipulation reduces low back pain more effectively than general exercise. If the true difference between the two treatments is modest — say, an additional 1.5-point improvement on a 10-point pain scale — a study with 12 participants per group is unlikely to detect it reliably. Random variation between individuals will swamp the signal. The result will probably come back as “not statistically significant,” not because the treatment does not work, but because the study was too small to see whether it does.

An adequately powered trial for a musculoskeletal intervention typically requires 100 or more participants per group. This is expensive. Each participant in a manual therapy trial needs to physically attend multiple treatment sessions delivered one-to-one by a qualified practitioner. Unlike drug trials, where one pharmacist can dispense thousands of pills, the treatment itself does not scale easily. This practical constraint means that many published chiropractic studies are smaller than they ideally should be, and readers should pay close attention to sample size when evaluating the strength of reported findings.
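
To put rough numbers on this, the sketch below uses a standard normal approximation for comparing two group means. The standard deviation of 2.5 points is an assumption chosen for illustration, and the function names are invented here. A formal power calculation would use the t-distribution and an SD taken from prior data.

```python
from math import ceil, sqrt
from statistics import NormalDist

def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = diff / (sd * sqrt(2 / n_per_group))
    return z.cdf(noncentrality - z_crit)

def n_per_group_needed(diff, sd, power=0.80, alpha=0.05):
    """Approximate participants per group needed to reach the target power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / diff) ** 2)

# Assumed SD of 2.5 points on a 10-point pain scale (illustrative only).
print(f"Power with 12 per group, 1.5-point difference: {approx_power(1.5, 2.5, 12):.0%}")
print(f"Per-group n for 80% power, 1.5-point difference: {n_per_group_needed(1.5, 2.5)}")
print(f"Per-group n for 80% power, 1.0-point difference: {n_per_group_needed(1.0, 2.5)}")
```

Under these assumptions the 12-per-group trial would miss a real 1.5-point difference more often than it finds it, and the required sample grows quickly as the expected difference shrinks.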

P-Values in Plain Language

Few statistical concepts are as frequently cited, or as commonly misunderstood, as the p-value. In its simplest form, a p-value tells you the probability of observing results at least as extreme as the ones in the study, assuming the treatment actually has no effect. A p-value below 0.05 is conventionally taken to mean the result is “statistically significant”: if the treatment truly did nothing, a difference this large would turn up through random variation less than 5% of the time. It is not the probability that the result is a fluke, nor the probability that the treatment works.

What this does not tell you is how large the treatment effect is, or whether it matters to the person receiving the treatment. A study might find that spinal manipulation reduces pain scores by 0.3 points on a 10-point scale compared to a control group, and if the sample is large enough, that tiny difference will reach statistical significance. The p-value will be below 0.05. But a 0.3-point improvement is unlikely to be perceptible to the patient, let alone meaningful in their daily life.
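
A quick calculation shows how sample size alone can push a tiny difference under the 0.05 line. The sketch uses a simple z-test with an assumed standard deviation of 2.5 points in both groups. These are illustrative inputs, not figures from any trial.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (z-test, equal group sizes and SDs)."""
    se = sd * sqrt(2 / n_per_group)
    z = abs(diff) / se
    return 2 * (1 - NormalDist().cdf(z))

# Same 0.3-point difference, three different trial sizes.
for n in (50, 200, 1000):
    p = two_sided_p(diff=0.3, sd=2.5, n_per_group=n)
    print(f"n = {n:>4} per group: p = {p:.3f}")
```

The difference between groups never changes. Only the number of participants does, and with enough of them the same 0.3 points becomes “significant.”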

This is the distinction between statistical significance and clinical significance, and it is arguably the more important of the two when reading any health research. A result can be statistically real but clinically trivial. When evaluating a chiropractic trial, the question worth asking is not simply “did it reach significance?” but “was the improvement large enough to matter to someone in pain?”

Effect Size and Clinical Significance

If the p-value addresses whether a result could plausibly be put down to chance, the effect size answers whether it is likely to matter. Effect size measures the magnitude of the difference between the treatment group and the control group: how much better did the people receiving chiropractic care actually do?

One of the most practical tools for interpreting effect size in clinical research is the minimal clinically important difference, or MCID. This is the smallest improvement that a patient would recognise as meaningful. For the commonly used Numerical Rating Scale for pain (0 to 10), the MCID is generally accepted to be around 2 points. An improvement of less than 2 points, while potentially statistically significant in a large study, would likely go unnoticed by the person experiencing the pain.

When reading a chiropractic trial, the section to focus on is not the abstract conclusion (which almost always emphasises the positive) but the results table. Look for the mean difference between groups and compare it to the MCID for whatever outcome measure was used. A trial reporting that spinal manipulation produced a 2.5-point greater improvement in pain scores than usual care is telling you something clinically relevant. A trial reporting a 0.6-point difference, even if labelled significant, is telling you something that would be difficult to feel. The numbers, more than the authors’ interpretation, tell you what the study actually found.
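
The comparison described above amounts to a one-line check. The sketch below uses the 2-point MCID quoted earlier for the 0 to 10 pain scale as its default. Other outcome measures have their own thresholds, and the function name is invented for this example.

```python
def interpret_difference(mean_difference, mcid=2.0):
    """Label a between-group difference relative to the MCID for that outcome."""
    if abs(mean_difference) >= mcid:
        return "at or above the MCID: likely noticeable to patients"
    return "below the MCID: may be hard for patients to feel"

for diff in (2.5, 0.6):
    print(f"{diff}-point difference is {interpret_difference(diff)}")
```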

Red Flags in Study Design

No Control Group or Inadequate Blinding

A study without a control group is, in essence, an expensive anecdote. If forty patients receive spinal manipulation and thirty of them report improvement, the natural question is: how many would have improved anyway? Without a comparison group receiving either a placebo, no treatment, or an alternative intervention, there is no way to answer that question. Natural recovery, the placebo response, and regression to the mean all produce genuine symptomatic improvement that has nothing to do with the treatment under investigation.

Blinding compounds the difficulty in chiropractic research. In an ideal trial, neither the patient nor the practitioner would know which group the patient was assigned to, eliminating expectation effects on both sides. For manual therapy, practitioner blinding is essentially impossible — the chiropractor knows whether they are performing a therapeutic adjustment or a sham procedure. Patient blinding is achievable but difficult; sham manipulation techniques exist, but experienced patients may recognise the difference.

Pragmatic trials offer a practical alternative. Rather than comparing chiropractic to a placebo, they compare it to whatever treatment the patient would otherwise receive — typically medication, physiotherapy, or standard GP care. This design sacrifices some internal rigour but answers the question patients actually want answered: would I be better off seeing a chiropractor than doing what I am currently doing? Both types of trial have their place, and knowing which type you are reading changes how to interpret the results.

Cherry-Picked Outcomes and Missing Data

Most research studies measure more than one outcome. A chiropractic trial might track pain intensity, disability scores, range of motion, patient satisfaction, medication use, and days off work. Outcome reporting bias occurs when the published paper reports only the outcomes that reached statistical significance while quietly omitting those that did not.

The defence against this is pre-registration. Reputable trials register their study protocol with a clinical trial registry before recruitment begins, declaring in advance which outcome is the primary endpoint. If the published paper reports a different primary outcome than the one registered in the protocol, the authors may have changed course after seeing which results looked most favourable. This is not always deliberate misconduct — protocols do evolve for legitimate reasons — but it is a red flag worth noting.
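
A little arithmetic shows why the number of outcomes matters. If a treatment does nothing and each outcome is tested at the 0.05 level, the chance that at least one comes up “significant” grows with every outcome measured. The sketch assumes the outcomes are independent, which real outcomes such as pain and disability are not, so treat the figures as a rough guide.

```python
def chance_of_false_positive(n_outcomes, alpha=0.05):
    """Probability that at least one of n independent null outcomes reaches p < alpha."""
    return 1 - (1 - alpha) ** n_outcomes

for n in (1, 3, 6):
    print(f"{n} outcome(s): {chance_of_false_positive(n):.0%} chance of a spurious 'significant' result")
```

With six outcomes, as in the example trial above, the chance is roughly one in four. A paper that reports only the winner hides that denominator.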

Equally important is attrition. If a trial starts with 200 participants and only 140 complete the study, the 60 who dropped out may have done so for reasons related to the treatment: side effects, lack of improvement, inconvenience of attending repeated appointments. The remaining participants may therefore represent a biased sample: people who were responding well and found it easy to continue. High dropout rates are a particular concern in chiropractic trials, where treatment requires multiple in-person visits over weeks or months. A study with attrition above 20% warrants careful scrutiny of whether the authors used intention-to-treat analysis, which analyses all participants in the groups they were randomised to, regardless of whether they completed the protocol.
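
Attrition is easy to work out from the participant flow figures most trials report. The sketch below applies the 20% rule of thumb mentioned above. The function name and the wording of the message are invented for illustration.

```python
def attrition_rate(randomised, completed):
    """Fraction of randomised participants who did not complete the study."""
    return (randomised - completed) / randomised

rate = attrition_rate(randomised=200, completed=140)
print(f"Attrition: {rate:.0%}")
if rate > 0.20:
    print("Above 20%: check whether the analysis was intention-to-treat.")
```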

Conflict of Interest and Funding Sources

Most published studies include a section, typically at the end and before the references, declaring the authors’ conflicts of interest and the study’s funding source. This section is worth reading, not because funding automatically corrupts findings, but because decades of research across medicine have demonstrated a consistent pattern: studies funded by parties with a financial interest in the outcome are more likely to report favourable results.

This applies to chiropractic research just as it applies to pharmaceutical trials. A chiropractic professional association that funds a study has an institutional interest in demonstrating the value of chiropractic care, much as a drug company that funds a trial has an interest in demonstrating the value of its product. Funding from independent sources, such as the National Health and Medical Research Council in Australia or university research grants, reduces this concern.

Funding does not determine a study’s conclusions, and to claim that it does is itself a logical error. Plenty of industry-funded research is methodologically rigorous. But funding is a factor to weigh alongside sample size, study design, and effect size when forming an overall judgement. If the only positive evidence for a particular chiropractic application comes from studies funded by organisations with a stake in the result, that evidence carries less weight than independent replication would.

Why One Study Is Never Enough

Replication and the Weight of Evidence

One of the more reliable instincts a reader can develop is scepticism toward any single study presented as definitive proof. Science does not work that way, and health science especially does not. A single randomised controlled trial, no matter how well designed, might have drawn an atypical sample, measured outcomes during an unusual period, or been affected by chance in ways that only become apparent when other researchers attempt to reproduce the findings.

Replication — the process of independent researchers testing the same hypothesis in a different population or setting — is the mechanism by which tentative findings become established knowledge. The broader health sciences have grappled publicly with a “replication crisis,” in which many widely cited findings failed to hold up under re-examination. Chiropractic research is not exempt from this problem, though it receives less attention.

The evidence supporting spinal manipulation for acute low back pain carries the weight it does not because of one breakthrough study, but because multiple research groups across different countries, over several decades, have reached broadly similar conclusions. Each individual trial has its limitations. Collectively, they form a body of evidence that is substantially more persuasive than any single component. When you encounter a chiropractic research claim, the first question worth asking is not whether the study is good, but whether it stands alone or sits within a body of evidence pointing in the same direction.

How Evidence Evolves Over Time

The evidence base for chiropractic care has not arrived fully formed. It has been built piece by piece over decades, and a reader who understands this trajectory is better equipped to interpret where things stand today.

Through the 1980s and early 1990s, the evidence for spinal manipulation was sparse, and what existed was frequently dismissed by the broader medical establishment. The 1979 New Zealand Commission of Inquiry into Chiropractic was notable for its time — it concluded that chiropractic care was effective and safe for certain conditions and recommended its integration into the public health system. But the clinical trial evidence was thin by modern standards.

By the 2000s and 2010s, the picture had sharpened considerably. Multiple systematic reviews, including those published through Cochrane, established moderate evidence for spinal manipulation as a treatment for acute low back pain. Evidence for neck pain was present but less robust. Evidence for non-spinal conditions — headaches, for instance — remained mixed. The pattern is not one of a single decisive finding but of gradual clarification. This is how evidence-based medicine operates for any intervention. The evidence for chiropractic care is neither universally strong nor universally weak; it varies by condition, by outcome measure, and by the time period in which you assess it.

A Practical Checklist for the Non-Scientist

Reading research need not require a biostatistics degree. When you encounter a study or a claim about chiropractic care — whether in a news article, a clinic waiting room, or a conversation — a short set of questions can help you evaluate what you are actually being told:

What type of study is it? A systematic review carries more weight than a single trial, which carries more weight than a case report. Know where the study sits on the hierarchy.
How many participants were included? Small studies (under 50 per group) are prone to chance findings. Look for adequate sample sizes.
Was there a control group? Without one, improvements cannot be attributed to the treatment with any confidence.
How large was the effect? Look past the p-value. Was the improvement clinically meaningful — large enough that a patient would actually notice the difference?
Has it been replicated? One study is a starting point. Multiple independent studies reaching similar conclusions constitute evidence.
Who funded it? Check the conflict of interest statement. Independent funding adds credibility; industry funding warrants scrutiny.

None of these questions requires specialist knowledge to answer. Most can be determined from a study’s abstract and methods section. The goal is not to become a peer reviewer but to become a more discerning reader — someone who can distinguish between a well-supported claim and a persuasively presented one. That skill serves well beyond chiropractic research.

The ability to read a study critically is, in the end, a form of self-respect. It means refusing to outsource your judgement about your own health to whoever presents their case most confidently. Chiropractic research, like all clinical research, exists on a spectrum from preliminary to well-established, and most findings land somewhere in the middle. The reader who understands that spectrum — who knows what a p-value actually means, why sample size matters, and what a single trial can and cannot prove — is better positioned than most to make genuinely informed decisions about their care.

3 Comments

  1. T
    Tracey Marsh 5 Dec 2025

    That practical checklist at the end is genuinely useful. I’ve been guilty of reading the abstract and conclusion of studies and skipping straight past the results table. The point about comparing mean differences to the MCID is something I’ll actually start doing.

  2. S
    Sam Turei 21 Dec 2025

    The explanation of p-values is the best I’ve come across. “If the treatment had no real effect at all, data this extreme would arise less than 5% of the time by chance”: that’s so much clearer than the way my stats lecturer explained it. Would have saved me a lot of confusion in undergrad.

  3. I
    Ingrid Larsen 8 Jan 2026

    Interesting that you mention the 1979 NZ Commission of Inquiry. I had no idea New Zealand was that early in formally evaluating chiropractic. We seem to have been quite progressive on this compared to other countries.
