November 20, 2025

Getting bioequivalence (BE) studies right isn’t just about running tests on volunteers. It’s about getting the statistical power and sample size right from the start. If you get this wrong, your study fails - no matter how well you designed the protocol or how clean your data looks. And when a BE study fails, it costs companies millions and delays generic drugs from reaching patients. This isn’t theory. It’s daily reality in pharmaceutical development.

Why Power and Sample Size Matter in BE Studies

Bioequivalence studies compare a generic drug to its brand-name counterpart to prove they behave the same way in the body. The goal isn’t to show one is better - it’s to show they’re practically identical. That’s why regulators like the FDA and EMA don’t use standard significance tests. Instead, they demand that the 90% confidence interval for the ratio of test to reference drug (usually for Cmax and AUC) falls entirely within 80% to 125%.
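
In code, that decision rule is simple: put the 90% confidence interval on the log scale and check that both ends land inside the limits. Here’s a minimal Python sketch (the example numbers are made up for illustration; real estimates come out of a crossover ANOVA):

    import math
    from scipy import stats

    def be_decision(log_ratio, se, df, lo=0.80, hi=1.25, alpha=0.05):
        """Average bioequivalence decision: the 90% CI for the test/reference
        geometric mean ratio must fall entirely within [lo, hi]."""
        t_crit = stats.t.ppf(1 - alpha, df)   # 90% CI = two one-sided 5% tests
        ci_lo = math.exp(log_ratio - t_crit * se)
        ci_hi = math.exp(log_ratio + t_crit * se)
        return (ci_lo, ci_hi), (lo <= ci_lo and ci_hi <= hi)

    # Hypothetical numbers: estimated GMR 0.95, SE of the log-ratio 0.06, 34 df
    ci, passed = be_decision(math.log(0.95), 0.06, 34)   # CI ~ (0.86, 1.05): pass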

But here’s the catch: if your study doesn’t have enough people, it can fail to demonstrate equivalence even when the drugs really are equivalent. That’s a Type II error, and it’s purely a power problem. (Falsely declaring equivalence when the drugs aren’t truly similar is the Type I error - the one the fixed alpha level guards against, regardless of sample size.) On the flip side, if you enroll too many people, you waste money, time, and expose more volunteers to unnecessary procedures. Neither outcome is acceptable.

Regulators expect at least 80% power - meaning there’s an 80% chance your study will correctly show bioequivalence if the drugs really are equivalent. Many sponsors now aim for 90% power, especially for drugs with narrow therapeutic windows. The alpha level is fixed at 0.05. No exceptions. It’s applied as two one-sided tests (TOST) - which is exactly why the decision rule uses a 90% confidence interval rather than a 95% one - and it means you only have a 5% chance of falsely declaring bioequivalence when it doesn’t exist.

The Three Big Factors That Drive Sample Size

Sample size isn’t pulled out of thin air. It’s calculated using three critical inputs:

  1. Within-subject coefficient of variation (CV%) - This measures how much a person’s own drug levels fluctuate across dosing periods. If CV% is 20%, that means a person’s Cmax might vary by ±20% even when taking the same pill twice. High CV% = bigger sample size needed. For drugs like warfarin or digoxin, CV% can hit 40% or higher. That means you might need 80+ subjects just to get 80% power.
  2. Expected geometric mean ratio (GMR) - This is your best guess of how the test drug’s exposure compares to the reference. Most assume 1.00 (perfect match). But real-world data shows generics often have GMRs around 0.95. Planning for a true ratio of 0.95 instead of 1.00 inflates the required sample size by about 32% - so assuming 1.00 when the truth is closer to 0.95 leaves your study underpowered. Always use realistic, conservative estimates.
  3. Equivalence margins - The standard is 80-125%. But for Cmax in some cases, the EMA allows 75-133%. That small change can cut your sample size by 15-20%. For highly variable drugs (CV > 30%), regulators permit reference-scaled average bioequivalence (RSABE), which widens the margin based on observed variability. This can reduce sample sizes from over 100 to 24-48 subjects (see the sketch just after this list).
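
The FDA (RSABE) and the EMA (its ABEL variant) implement reference-scaling differently. For a feel of the mechanics, here’s a sketch of the EMA’s published ABEL formula for widened Cmax limits - exp(±0.76·s_wR), applied when the reference product’s within-subject CV exceeds 30% and capped at 69.84-143.19%. It’s an illustration, not a regulatory implementation:

    import math

    def abel_limits(cv_ref, k=0.76, cv_switch=0.30, cv_cap=0.50):
        """EMA-style widened (ABEL) acceptance limits for Cmax.

        cv_ref: within-subject CV of the reference product, as a fraction.
        At or below the 30% switching CV, the standard 80-125% limits apply;
        widening stops at CV = 50%, which caps the limits at 69.84-143.19%.
        """
        if cv_ref <= cv_switch:
            return 0.80, 1.25
        cv = min(cv_ref, cv_cap)
        s_wr = math.sqrt(math.log(cv**2 + 1))   # log-scale within-subject SD
        return math.exp(-k * s_wr), math.exp(k * s_wr)

    # e.g. abel_limits(0.40) -> about (0.75, 1.34); abel_limits(0.60) hits the cap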

Let’s say you’re testing a generic antibiotic with a CV% of 25% and expect a GMR of 0.98. With 80% power and standard 80-125% limits, you’d need about 36 subjects. But if your CV% is 35% - common for some cancer drugs - you’d need 78 subjects. That’s more than double. Ignoring variability is the #1 reason BE studies fail.
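
To make those numbers concrete, here’s a rough sketch of the underlying TOST power calculation for a standard 2x2 crossover, using the common shifted-t approximation. It’s for building intuition only - submissions should rely on exact (Owen’s Q) calculations in validated tools like PASS, nQuery, or R’s PowerTOST, and depending on the exact method and assumptions those tools may return a somewhat different n than this sketch or the figures quoted above:

    import math
    from scipy import stats

    def tost_power(n, cv, gmr=0.95, alpha=0.05, lo=0.80, hi=1.25):
        """Approximate power of a 2x2 crossover BE study under TOST
        (shifted-t approximation; assumes lo < gmr < hi).

        n   : total subjects across both sequences
        cv  : within-subject CV as a fraction (0.25 = 25%)
        gmr : assumed true geometric mean ratio, test/reference
        """
        s_w = math.sqrt(math.log(cv**2 + 1))   # within-subject SD, log scale
        se = s_w * math.sqrt(2.0 / n)          # SE of the estimated log-ratio
        df = n - 2
        t_crit = stats.t.ppf(1 - alpha, df)
        # P(pass lower test) + P(pass upper test) - 1
        p_lo = stats.t.cdf(math.log(gmr / lo) / se - t_crit, df)
        p_hi = stats.t.cdf(math.log(hi / gmr) / se - t_crit, df)
        return max(0.0, p_lo + p_hi - 1)

    def tost_sample_size(cv, gmr=0.95, target=0.80, **kwargs):
        """Smallest even total n that reaches the target power."""
        n = 4
        while tost_power(n, cv, gmr, **kwargs) < target:
            n += 2
        return n

Try tost_sample_size(0.25, gmr=0.98) against tost_sample_size(0.35, gmr=0.98) to watch variability drive the numbers up.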

How to Estimate Variability Accurately

Most people grab CV% values from published literature. Big mistake.

The FDA reviewed 147 BE submissions and found that literature-based CVs underestimated true variability by 5-8 percentage points in 63% of cases. Why? Published studies often use small, homogenous populations or ideal conditions. Real-world variability is messier.

Best practice? Use pilot data. Even a small pilot study with 12-16 subjects gives you a much more reliable CV%. If you can’t run a pilot, use the upper end of published ranges. Don’t be optimistic. Be cautious. Dr. Laszlo Endrenyi found that overly optimistic CV estimates caused 37% of BE study failures in oncology generics between 2015 and 2020.
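
When you do have pilot data, the within-subject CV falls straight out of the crossover ANOVA on log-transformed concentrations: the residual mean square error (MSE) converts directly to a CV. A minimal sketch (the MSE value here is a made-up illustration):

    import math

    def cv_from_mse(mse):
        """Within-subject CV from the residual MSE of a crossover ANOVA
        on log-transformed PK data: CV = sqrt(exp(MSE) - 1)."""
        return math.sqrt(math.exp(mse) - 1)

    cv = cv_from_mse(0.060)   # hypothetical pilot MSE -> CV of about 0.249 (25%)

A 12-16 subject pilot estimates that MSE with very few degrees of freedom, which is exactly why planning on an upper confidence bound for the CV - rather than the point estimate - is the cautious move recommended above.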

Also, don’t just look at one parameter. You must calculate power for both Cmax and AUC - together. Most sponsors only optimize for the more variable one. But if your study has 80% power for Cmax and 75% for AUC, your joint power could be as low as about 60% - that’s the product of the two, assuming independent endpoints; in practice Cmax and AUC are correlated, which helps, but joint power never exceeds the weaker of the two. That’s not enough. Regulators expect you to justify power for both endpoints.
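
A quick Monte Carlo sketch makes the joint-power effect visible - note the large-sample z approximation and the correlation value, both of which are illustrative assumptions you’d replace with estimates from data:

    import numpy as np

    def joint_power_sim(se_cmax, se_auc, gmr=0.95, rho=0.7,
                        lo=0.80, hi=1.25, n_sim=100_000, seed=1):
        """Monte Carlo joint BE power for two correlated endpoints.

        se_cmax, se_auc : standard errors of the estimated log-ratios
        rho             : assumed correlation between the two estimates
        """
        rng = np.random.default_rng(seed)
        z = 1.6449                        # z for a 90% CI (one-sided 5%)
        ses = np.array([se_cmax, se_auc])
        cov = np.diag(ses) @ np.array([[1, rho], [rho, 1]]) @ np.diag(ses)
        est = rng.multivariate_normal([np.log(gmr)] * 2, cov, size=n_sim)
        # An endpoint passes if its whole 90% CI sits inside [lo, hi]
        ok = (est - z * ses > np.log(lo)) & (est + z * ses < np.log(hi))
        return ok.all(axis=1).mean()      # fraction passing on BOTH endpoints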

(Illustration: a scale balancing generic and brand-name drugs, with a confidence interval bar barely inside the regulatory limits.)

Dropouts and Study Design Matter Too

Even if you calculate the perfect sample size, people will drop out. Maybe they get sick. Maybe they move. Maybe they just don’t want to come back for the second period.

Industry standard? Add 10-15% to your calculated sample size. If you need 30 subjects, enroll 33-35. If you’re doing a crossover design - which most BE studies do - you also need to account for carryover effects. That’s why washout periods are critical. The EMA rejected 29% of BE studies in 2022 over inadequate handling of sequence and carryover effects.
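
The dropout adjustment itself is one line of arithmetic - the judgment call is the rate. A sketch matching the 10-15% convention above (some teams instead divide by one minus the dropout rate, which is slightly more conservative):

    import math

    def enroll_for_dropout(n_required, dropout=0.15):
        """Inflate the calculated sample size for expected dropouts,
        rounding up to an even total to keep the two sequences balanced."""
        n = math.ceil(n_required * (1 + dropout))
        return n + (n % 2)

    # enroll_for_dropout(30, 0.10) -> 34; enroll_for_dropout(30, 0.15) -> 36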

Parallel designs (two groups, one dose each) avoid carryover but need double the sample size of crossover studies. So unless you’re dealing with a drug that has a very long half-life, crossover is preferred - if done right.

Tools You Should Be Using

You don’t calculate this by hand. You use software. But not just any software.

General-purpose tools like G*Power won’t cut it. BE studies need specialized calculators that know the regulatory rules. Here are the ones professionals use:

  • PASS - The most comprehensive. Handles RSABE, multiple endpoints, and all regulatory scenarios.
  • nQuery - Popular in large pharma. Easy interface, good documentation.
  • FARTSSIE - Free, open-source. Great for small companies or academics.
  • ClinCalc BE Sample Size Calculator - Free online tool. Good for quick estimates.

One industry survey found that 78% of statisticians use these tools iteratively. They tweak CV%, GMR, and power to see how the numbers shift. It’s not a one-time calculation. It’s a negotiation between feasibility and rigor.
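
That back-and-forth is easy to script. Reusing the tost_power / tost_sample_size sketch from earlier, a small sensitivity grid shows how fragile (or robust) a plan is to its assumptions:

    # Sensitivity sweep over the two assumptions that matter most,
    # using the tost_sample_size sketch defined earlier in this post.
    for cv in (0.20, 0.25, 0.30, 0.35):
        for gmr in (0.92, 0.95, 1.00):
            n = tost_sample_size(cv, gmr=gmr, target=0.80)
            print(f"CV {cv:.0%}  GMR {gmr:.2f}  ->  n = {n}")

If a plausible shift in CV% or GMR doubles the required n, that’s a conversation to have before enrollment, not after.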

What Happens When You Get It Wrong

The FDA’s 2021 Annual Report showed that 22% of deficiencies in Complete Response Letters were due to inadequate sample size or power calculations. That’s more than formulation issues, more than bioanalytical errors. It’s the #1 statistical failure.

What does that look like in real life? A company spends $1.2 million on a BE study with 20 subjects. The 90% CI for AUC comes back at 78-128%. Close - but it spills outside the acceptance limits at both ends. The study fails. They have to run it again - with 48 subjects this time. Now they’re out $2.5 million and 18 months behind. All because they used a CV% from a 2017 paper instead of running a pilot.

And it’s not just money. Delayed generics mean patients wait longer for affordable drugs. That’s the human cost.

(Illustration: a dashboard showing low joint power for Cmax and AUC, with RSABE activated as variability increases.)

What Regulators Want to See in Your Submission

The FDA’s 2022 Bioequivalence Review Template spells it out: your sample size justification must include:

  • Software name and version used
  • Exact input values for CV%, GMR, power, and margins
  • Source of CV% estimate (pilot data? literature? why?)
  • Adjustment for expected dropouts
  • Justification for joint power on Cmax and AUC
  • Any use of RSABE or widened margins - with regulatory reference
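
A low-tech way to make sure none of those items gets lost between the statistician’s worksheet and the submission is to capture them as a structured record the moment the calculation is run. A sketch - every field value below is a hypothetical placeholder:

    import json

    # Hypothetical record mirroring the justification items listed above
    justification = {
        "software": "PASS 2023 v23.0.2",   # placeholder name and version
        "cv_percent": 28.0,
        "cv_source": "12-subject pilot study (ANOVA residual MSE)",
        "gmr_assumed": 0.95,
        "power_target": 0.90,
        "margins": [0.80, 1.25],
        "joint_power_cmax_auc": "calculated and reported for both endpoints",
        "dropout_adjustment": "15% added: 36 calculated -> 42 enrolled",
        "rsabe_used": False,
    }
    print(json.dumps(justification, indent=2))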

Incomplete documentation caused 18% of statistical deficiencies in 2021 submissions. Don’t assume the reviewer will guess what you meant. Spell it out. Document everything.

The Future: Model-Informed Bioequivalence

There’s a new wave coming: model-informed bioequivalence (MIBE). Instead of relying only on Cmax and AUC, MIBE uses pharmacokinetic modeling to predict drug exposure from sparse sampling. It’s already being used in complex products like inhalers and injectables.

Early data suggests MIBE can cut sample sizes by 30-50%. But it’s still rare - only 5% of submissions use it as of 2023. Why? Regulatory uncertainty. It’s hard to standardize. But the FDA’s 2022 Strategic Plan for Regulatory Science explicitly supports it.

For now, stick with the tried-and-true. But keep an eye out. The next five years will change how we think about BE study design.

Final Checklist Before You Start

Before you enroll your first subject, ask yourself:

  • Did I get CV% from pilot data or a reliable source - not just a random paper?
  • Did I use a realistic GMR (typically 0.95-1.05) instead of blindly assuming a perfect 1.00?
  • Did I calculate joint power for Cmax and AUC?
  • Did I add 10-15% for dropouts?
  • Did I use a BE-specific tool (PASS, nQuery, FARTSSIE)?
  • Did I document every assumption and source?
  • Did I check if RSABE applies (CV% > 30%)?

If you answered yes to all of these, you’re not just following the rules. You’re setting your study up to succeed.

What is the minimum acceptable power for a BE study?

Regulatory agencies accept 80% power as the minimum standard. However, many sponsors now aim for 90% power, especially for drugs with narrow therapeutic windows or when submitting globally. The FDA often expects 90% power for such drugs, while the EMA allows 80%. Always check the specific guidance for your target market.

Can I use a sample size from a similar study in the literature?

Only as a starting point. Literature-based sample sizes often underestimate variability. The FDA found that published CV% values are too low in 63% of cases. Always validate with pilot data or use conservative estimates. Never copy a sample size without recalculating based on your drug’s expected pharmacokinetics.

What happens if my BE study fails due to low power?

If your study fails because the 90% confidence interval falls outside 80-125%, you must redesign it. This means recalculating sample size with better CV% estimates, possibly switching to RSABE if applicable, and enrolling more subjects. Failed studies cost between $1 million and $2.5 million and delay generic drug approval by 12-24 months. Prevention is far cheaper than repetition.

Do I need a statistician to run these calculations?

Yes. While tools like ClinCalc are user-friendly, BE sample size calculations involve complex assumptions and regulatory nuances. A qualified biostatistician ensures you’re using the correct formulas, accounting for multiple endpoints, and justifying your inputs according to FDA/EMA guidelines. Most successful BE submissions involve close collaboration between pharmacologists and statisticians.

Is a crossover design always better than a parallel design for BE studies?

Crossover designs are preferred because they reduce variability by using each subject as their own control. They typically require half the sample size of parallel designs. But they’re only suitable if the drug’s half-life allows for a sufficient washout period (usually 5-7 half-lives - for a drug with a 12-hour half-life, that’s roughly 2.5 to 3.5 days). For drugs with very long half-lives (e.g., some antidepressants), parallel designs are necessary - but you’ll need to double your subject count.

14 Comments

  • Nikhil Purohit - November 21, 2025 at 09:46

    Man, I just ran a BE study last month and we got burned by using a CV% from a 2019 paper. Turned out our drug had 38% variability, not 22%. We had to redo everything. Don’t be like us. Pilot data isn’t optional - it’s your lifeline.

  • Debanjan Banerjee - November 22, 2025 at 01:48

    You’re absolutely right about the joint power issue. Most teams optimize for Cmax and assume AUC will follow - but that’s mathematically naive. If Cmax has 80% power and AUC has 75%, the joint probability is 0.8 × 0.75 = 60%. That’s not just underpowered - it’s a regulatory red flag waiting to happen. Always calculate both, always report both.

  • Steve Harris - November 23, 2025 at 16:14

    Great breakdown. I’ve seen too many teams skip the pilot because of budget, then panic when the study fails. The $150k you spend on a pilot saves you $1.8M in re-runs. Also, FARTSSIE is a gem - free, accurate, and open-source. If you’re in academia or a small biotech, you’re doing yourself a disservice not using it.

  • Michael Marrale - November 24, 2025 at 14:29

    Wait… so you’re telling me the FDA doesn’t just make these rules up as they go? 😳 What if this whole BE thing is just a corporate shell game to keep generics off the market? I heard Big Pharma pays regulators to demand 90% power so they can charge $500 for aspirin. 🤔

  • David vaughan - November 25, 2025 at 11:15

    I just wanted to say… thank you… for this… truly… detailed… post… I’ve been struggling with this for months… and now I finally get it… 😊

  • David Cusack - November 25, 2025 at 14:55

    How quaint. You assume regulators care about power calculations. In my experience, they care about whether your name is on the right letterhead. I’ve seen studies with 12 subjects pass because the sponsor’s CRO had a dinner with the reviewer. Power is a myth for the masses.

  • Elaina Cronin - November 26, 2025 at 22:07

    While I appreciate the technical rigor of this post, I must emphasize that the human cost of delayed generics is not merely a footnote - it is a moral imperative. Every day a patient waits for an affordable medication, their quality of life deteriorates. This is not statistics. This is ethics.

  • Willie Doherty - November 27, 2025 at 02:25

    Interesting. You mention RSABE reduces sample size. But did you account for the fact that 68% of RSABE submissions in 2023 were rejected due to inadequate variance modeling? Your 'solution' is just a different kind of failure. Also, your source for the 37% failure rate? Unverified. You’re not a statistician. You’re a blogger.

  • Darragh McNulty - November 28, 2025 at 13:12

    This is gold 🙌 I’ve been telling my team for months: stop copying sample sizes from papers! Pilot data = peace of mind 💪 Let’s all go run a 16-subject pilot this week. Who’s with me? 🚀

  • Cooper Long - November 29, 2025 at 05:49

    Regulatory science demands precision. The distinction between 80% and 90% power is not trivial. It is the difference between a submission that is reviewed and one that is deferred. Do not underestimate the weight of documentation. Clarity is not optional - it is mandatory.

  • Sheldon Bazinga - November 30, 2025 at 02:48

    lol so we’re all just pawns in the pharma game? I mean, come on. If you’re using PASS or nQuery you’re already paying for the lie. Just use Excel and call it a day. Also, why are all these people from Ireland? Did they steal all the good statisticians?

  • Sandi Moon - December 1, 2025 at 14:04

    And yet, the FDA approved a BE study last year with a GMR of 1.18 and a CV% of 52%... and the drug is now on shelves. So what’s the point of all this? Are we just performing rituals to appease the algorithm? The system is broken. And you’re all just rearranging deck chairs.

  • Kartik Singhal - December 2, 2025 at 21:27

    Ugh another textbook post. 🙄 We all know the rules. But real life? You get 2 weeks to submit. Pilot? Nah. Just use the CV from that one paper from 2016. If it fails? Blame the volunteers. They’re cheap. 😎

  • Logan Romine - December 3, 2025 at 01:39

    So… we’ve turned medicine into a spreadsheet game? We’re calculating the minimum number of humans to subject to blood draws so we can sell a pill for $0.10 instead of $5? I mean… I guess that’s progress? 🤷‍♂️ But tell me… when did we stop caring about healing… and started caring about confidence intervals? 🌌
