James Cook and Olivia Seifert
More than 95% of rare diseases – of which there are thought to be over 7,000 – currently have no approved therapies, highlighting a significant unmet need for patients without access to commercially available treatments.
An increasing number of pharmaceutical and biotech companies are targeting these conditions, reflected by the fact that in 2024, some 50% of FDA approvals and approximately 32% of approvals in the EU were for orphan drugs.
However, one of the main challenges faced by all companies conducting strategic planning for rare diseases is obtaining accurate data on the size of these often complex patient populations. The challenge of accurately estimating the prevalence of rare diseases is illustrated in the literature, where prevalence figures often span strikingly wide ranges – sometimes varying by orders of magnitude between studies. This level of uncertainty affects every step of the molecule-to-market planning cycle.
Why do patient population estimates vary so widely?
Most prevalence estimates for rare diseases found in the literature are based on sources that cover the “known”, clinically ascertained patient population, such as epidemiological surveys, patient registries (where available), medical records, and insights from key opinion leaders and clinical practitioners. However, estimates based on these sources do not capture individuals who are misdiagnosed or undiagnosed – a potentially significant number of patients, as many people living with rare diseases face a long ‘diagnostic odyssey’ due to the phenotypic and genetic complexity of their conditions. This can often lead to a systematic underestimation of the true size of the patient population and the disease burden.
Why this matters: Reducing risk and uncertainty in the drug development lifecycle
For companies developing therapies, uncertainty in patient population estimates translates into significant commercial risk, as accurate data is essential for decision-making and investment at every stage of the molecule-to-market planning cycle:
Research – for understanding disease aetiology and natural history to help identify disease targets, establish development priorities, and optimise investment decisions for early-stage assets. Early Target Product Profiles will be shaped by an understanding of the prevalence and characteristics of the patient groups that the assets are intended to treat.
Development and clinical trials – for patient group identification and stratification, and selecting eligible patients for clinical trials. At the pivotal trial stage, this will determine which patient groups may be eligible for new treatments if approved by regulators.
HEOR, market access and launch – for budget impact forecasts and cost-effectiveness analysis, HTA dossier submissions, and determining which patients will be eligible for reimbursed treatment.
Post-market surveillance and market expansion – as key inputs for projecting the health, environmental and economic burden and outcomes of interventions.
Policy impact and patient advocacy – for equipping advocacy groups with data to help raise awareness and drive change.
Without reliable data on the size of patient populations, companies may hesitate to invest in a drug target or even abandon the development of promising therapies due to perceived lack of commercial viability – ultimately limiting access to potentially life-saving treatments for patients.
Widely varying prevalence estimates: Spotlight on Lysosomal Storage Disorders
Lysosomal Storage Disorders (LSDs) illustrate particularly well how widely rare disease prevalence figures in the literature can vary.
LSDs are a group of inherited metabolic conditions caused by defects in lysosomal function, typically due to enzyme deficiencies. The table below summarises prevalence data from the literature for nine LSDs.
Disease | Prevalence range, 1 in N (populations) | References |
---|---|---|
Niemann-Pick Disease Type C | ~1 in 345,000 – 256,000 (US – US) | Burton et al., 2021; van Gool et al., 2024 |
Krabbe disease | ~1 in 400,000 – 20,000 (Global population – US) | Medscape, 2024; HRSA, 2009 |
MPS III (Sanfilippo syndrome) | ~1 in 1,000,000 – 111,000 (Global population) | Orphanet |
Metachromatic leukodystrophy | ~1 in 160,000 – 40,000 (Global population) | ICER, 2023 |
Pompe disease | ~1 in 34,000 – 12,000 (Taiwan – East Asia) | Chien et al., 2009; Park, 2021 |
MPS I (Hurler syndrome) | ~1 in 1,429,000 – 1,351,000 (US and Wales – Denmark) | Puckett et al., 2021; Nørmark et al., 2017 |
MPS II (Hunter syndrome) | ~1 in 1,429,000 – 1,111,000 (US – Russia) | Puckett et al., 2021; Buchinskaia et al., 2025 |
Fabry disease | ~1 in 117,000 – 1,250 (Australia – Taiwan) | Meikle et al., 1999; Hwu et al., 2009 |
Gaucher disease | ~1 in 313,000 – 112,000 (Russia – Italy) | Movsisyan et al., 2017; Carubbi et al., 2019 |
Two key themes are strikingly clear from this data.
First, prevalence estimates for the same disease within the same population can vary considerably: estimates for Niemann-Pick Disease Type C in the US, for example, range from 1 in 345,000 to 1 in 256,000 individuals.
Second, prevalence estimates for the same disease also vary widely between populations. This likely reflects real differences in disease risk across ethnic groups, but it also highlights how complex it is to establish the true prevalence of a disease globally or within a particular population of interest. That question is critical for companies that need to understand the market size in each country where their therapy will be made available.
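To make the scale of these ranges concrete, the short sketch below converts the “1 in N” figures from the table above into a fold difference and into an expected patient count for a given population. The population size used in the example is a hypothetical round number chosen only to keep the arithmetic easy to follow; it is not a figure from this article.

```python
"""Translate a literature range of '1 in N' prevalence figures into patient counts."""


def prevalence_from_one_in(n: float) -> float:
    """Convert a '1 in N' figure into a proportion (e.g. 1 in 1,250 -> 0.0008)."""
    if n <= 0:
        raise ValueError("N must be positive")
    return 1.0 / n


def fold_difference(one_in_a: float, one_in_b: float) -> float:
    """How many times larger the higher prevalence estimate is than the lower one."""
    return max(one_in_a, one_in_b) / min(one_in_a, one_in_b)


def patient_count_range(one_in_a: float, one_in_b: float, population: int) -> tuple[int, int]:
    """Expected number of affected people at each end of the range, lowest first."""
    counts = sorted(round(population / n) for n in (one_in_a, one_in_b))
    return counts[0], counts[1]


if __name__ == "__main__":
    # Ranges taken from the table above.
    print(fold_difference(345_000, 256_000))  # Niemann-Pick Disease Type C, US vs US
    print(fold_difference(117_000, 1_250))    # Fabry disease, Australia vs Taiwan
    # Hypothetical population of 100 million, for illustration only.
    print(patient_count_range(117_000, 1_250, 100_000_000))
```

Even the narrow Niemann-Pick Type C range differs by about a third, while the Fabry disease range spans close to two orders of magnitude, which is the difference between several hundred and tens of thousands of patients in the same hypothetical population.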
Applying genetic database analysis to produce reliable prevalence estimates for rare genetic diseases
The growing availability of large-scale genetic databases offers a way to narrow these wide-ranging prevalence estimates for rare genetic diseases, complementing data from other sources such as the literature and clinical practice.
By analysing large datasets that contain the genetic profiles of thousands of individuals from diverse ancestries, it is possible to estimate how often the pathogenic variants associated with a particular rare disease occur and, from this, how many individuals in a population are likely to carry them.
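For an autosomal recessive condition (most of the LSDs in the table above; Fabry disease and MPS II are X-linked), one common way to go from variant frequencies to prevalence is Hardy-Weinberg equilibrium. The minimal sketch below assumes random mating, that every listed variant is pathogenic, and that per-variant frequencies can simply be summed. It uses allele count (AC) and allele number (AN), the per-variant fields reported by databases such as gnomAD, but the counts shown are hypothetical placeholders, not real gnomAD data, and this is not presented as HealthLumen’s method.

```python
from dataclasses import dataclass


@dataclass
class VariantCount:
    """Allele count (AC) and allele number (AN) for one variant.
    Values used below are hypothetical placeholders."""
    name: str
    allele_count: int
    allele_number: int


def pathogenic_allele_frequency(variants: list[VariantCount]) -> float:
    """Cumulative disease-allele frequency q, summed across variants.

    Summing assumes the variants are rare and no chromosome carries more than one.
    """
    total = 0.0
    for v in variants:
        if v.allele_number <= 0:
            raise ValueError(f"{v.name}: allele number must be positive")
        total += v.allele_count / v.allele_number
    return total


def hardy_weinberg_recessive(q: float) -> dict[str, float]:
    """Genotype frequencies under Hardy-Weinberg equilibrium for a recessive disease."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - q
    return {
        "affected": q * q,              # two disease alleles (q^2)
        "carrier": 2.0 * p * q,         # exactly one disease allele (2pq)
        "non_carrier": p * p,           # no disease alleles (p^2)
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    variants = [
        VariantCount("variant_A", allele_count=2_000, allele_number=1_000_000),
        VariantCount("variant_B", allele_count=1_000, allele_number=1_000_000),
    ]
    q = pathogenic_allele_frequency(variants)
    freqs = hardy_weinberg_recessive(q)
    print(f"q = {q:.4f}")
    print(f"affected: about 1 in {1 / freqs['affected']:,.0f}")
    print(f"carriers: about 1 in {1 / freqs['carrier']:,.0f}")
```

Because genotype does not always determine symptoms, a figure like this estimates genetically at-risk individuals; turning it into a symptomatic patient count needs further assumptions about penetrance.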
HealthLumen applied this methodology for our recent research on the prevalence of Fabry disease – an X-linked lysosomal storage disorder caused by mutations in the GLA gene. We screened for pathogenic GLA variants within the large gnomAD genetic database and stratified disease risk by sex and ethnic group. We then applied population data from the US to generate estimates for the number of carriers and symptomatic Fabry disease patients in the US.
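The sex and ancestry stratification described above can be sketched for an X-linked gene as follows. This is an illustrative sketch under stated assumptions, not HealthLumen’s pipeline: it assumes Hardy-Weinberg proportions within each ancestry group, treats males as hemizygous (one X chromosome, so frequency q) and females as carriers if they have at least one pathogenic allele. The ancestry group names, allele frequencies and population counts are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """One ancestry group: pathogenic X-linked allele frequency plus population counts.
    All values used below are hypothetical placeholders."""
    ancestry: str
    allele_freq: float  # frequency of pathogenic alleles per X chromosome
    males: int
    females: int


def x_linked_carriers(stratum: Stratum) -> dict[str, float]:
    """Expected number of people with at least one pathogenic X-linked allele.

    Males have one X chromosome, so the hemizygous frequency is q.
    Females have two, so P(at least one allele) = 1 - (1 - q)^2,
    which counts both heterozygous and (rare) homozygous females.
    """
    q = stratum.allele_freq
    return {
        "male_hemizygotes": stratum.males * q,
        "female_carriers": stratum.females * (1.0 - (1.0 - q) ** 2),
    }


def total_by_sex(strata: list[Stratum]) -> dict[str, float]:
    """Sum the per-ancestry expectations into national totals by sex."""
    totals = {"male_hemizygotes": 0.0, "female_carriers": 0.0}
    for s in strata:
        for key, value in x_linked_carriers(s).items():
            totals[key] += value
    return totals


if __name__ == "__main__":
    strata = [
        Stratum("group_1", allele_freq=2e-4, males=50_000_000, females=52_000_000),
        Stratum("group_2", allele_freq=5e-5, males=20_000_000, females=21_000_000),
    ]
    for key, value in total_by_sex(strata).items():
        print(f"{key}: {value:,.0f}")
```

Applying each group’s own frequency to its own population count, rather than one pooled frequency to the whole population, weights each ancestry by its share of the target country’s population instead of its share of the database, which matters when allele frequencies differ between groups.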
The study estimated that the carrier and symptomatic populations of Fabry disease in the US in 2024, based on analysis of the 8 included genetic variants, are:
This is the first study to estimate the number of Fabry disease carriers and symptomatic individuals in the US using publicly available gnomAD v4.1 data. The findings suggest that Fabry disease may be more than three times as prevalent as currently estimated by the US National Institutes of Health, with a prevalence in line with that suggested by newborn screening studies and the UK Biobank analysis. Integrating the prevalence figures from this analysis with these existing sources helps build a more complete picture of the true Fabry disease patient population in the US, which is critical both for internal investment decisions and for external engagement with payers for companies looking to develop and deliver effective therapies to this under-served population.
You can find our full study published here.
Comparison of Fabry disease prevalence estimates based on clinically ascertained patient numbers vs HealthLumen’s genetic database analysis estimate
Incorporating data from genetic database analysis provides a more comprehensive understanding of Fabry disease prevalence in the US, complementing studies based on clinically ascertained patients.
Looking ahead: Better understanding rare genetic disease patient populations
As research and therapeutic innovation for rare genetic diseases advance, it is becoming increasingly clear that accurate prevalence data is needed to improve efficiency and reduce risk across the drug development lifecycle.
To gain deeper insights into the burden of rare diseases or accurately define your target patient population, get in touch with HealthLumen today.
References: