National Data Sets


Featured Resource:

Duke Population Health Datashare

Gives Duke researchers access to electronic health data from different sources to generate new insights into health and health care.


Other Resources & Datasets:

Data sets are grouped into the following categories:

National Data Sets and Resources

Agency for Healthcare Research and Quality (AHRQ) Data

Cancer Data

Centers for Disease Control Data

Census Data

Environmental Data (EPA)

Centers for Medicare and Medicaid Data

Health Resources and Services Administration (HRSA) Data

Other National Health Data Sets and Resources

State and Community Data Sets

HealthStats

NC Health Data

Partners in Information Access

Partnership for a Healthy Durham

Other Data Inventories and Data Sharing Platforms

ICPSR

ISPOR

Center for Open Science

NIH Data Sharing Repositories

US National Library of Medicine Catalogue

 


National Data Sets

AGENCY FOR HEALTHCARE RESEARCH AND QUALITY (AHRQ) DATA

 

Agency for Healthcare Research and Quality (AHRQ) offers robust data sources to researchers, clinicians, purchasers, policymakers, and consumers. Topics include accessibility of care, healthcare disparities, healthcare provided to low-income and other vulnerable populations, healthcare quality, healthcare spending, and use of healthcare services.

Free from source

 

The AHRQ Healthcare Cost and Utilization Project (HCUP) is a family of healthcare databases and related software tools and products developed through a federal-state-industry partnership and sponsored by AHRQ. HCUP databases bring together the data collection efforts of state data organizations, hospital associations, private data organizations, and the federal government to create a national information resource of discharge-level healthcare data.

Free from source

 

AHRQ fact sheets help patients, clinicians, health system leaders, and policymakers make more informed healthcare decisions.

Free from source

 

AHRQ Medical Expenditure Panel Survey (MEPS) provides policymakers, healthcare administrators, businesses, and others with timely, comprehensive information about healthcare use and costs in the U.S., which helps them improve the accuracy of their economic projections. MEPS collects data on the specific health services that Americans use, how frequently they use them, the cost of these services, and how they are paid for, as well as data on the cost, scope, and breadth of private health insurance held by and available to the U.S. population.

Free from source

 

AHRQ's National Healthcare Disparities Report (NHDR) summarizes information about healthcare quality and access among various racial, ethnic, and income groups, and other priority populations, such as children and older adults.

Free from source

 

AHRQ's National Healthcare Quality Report (NHQR) tracks the performance of the healthcare system through quality measures, such as the percentage of heart attack patients who received recommended care when they reached the hospital.

Free from source

 

AHRQ State Snapshots provide state-specific healthcare quality information, including strengths, weaknesses, and opportunities for improvement.

Free from source

 

The American Hospital Association provides a variety of hospital and health system data resources. For instance, the AHA Annual Survey of Hospitals provides comprehensive data about individual hospitals, including organizational structure, facilities, services, community orientation, utilization, financing, and personnel.

Fee required

 


CANCER DATA

 

The National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) Program provides information on cancer statistics as an aid to reducing the cancer burden in the U.S. population. SEER is supported by the Surveillance Research Program (SRP), which provides leadership in the science of cancer research as well as analytical tools and methodological expertise in collecting, analyzing, interpreting, and disseminating reliable population-based cancer statistics.

Free from source

 

Cancer Control Planet portal is sponsored by CDC, NCI, and other agencies/organizations. PLANET provides access to data and resources that can help planners, program staff, and researchers to design, implement, and evaluate evidence-based cancer control programs. It also provides access to resources that can assist in:

  • Assessing the cancer and/or risk factor burden within a given state.
  • Identifying potential partner organizations that may already be working with high-risk populations.
  • Understanding the current research findings and recommendations.
  • Accessing and downloading evidence-based programs and products.
  • Finding guidelines for planning and evaluation.

Free from source

 

United States Cancer Statistics (USCS) is an annual online report that includes the official federal statistics on cancer incidence from registries that have high-quality data and cancer mortality statistics from the National Center for Health Statistics (NCHS).

Free from source

 

National Program of Cancer Registries (NPCR) collects data on cancer occurrence (including the type, extent, and location of the cancer) and the type of initial treatment. Through NPCR, CDC supports central cancer registries in 45 states, the District of Columbia, Puerto Rico, and the U.S. Pacific Island Jurisdictions. These data represent 96% of the U.S. population.

Free from source

 


CDC DATA

 

The Centers for Disease Control and Prevention Data and Statistics 

Free from source

 

Behavioral Risk Factor Surveillance System (BRFSS) is a telephone survey conducted by all state health departments, the District of Columbia, Puerto Rico, the Virgin Islands, and Guam with assistance from CDC. The BRFSS is the largest continuously conducted telephone health survey in the world. States use BRFSS data to track critical health problems and to develop and evaluate public health programs. The BRFSS is the primary source of information on the health-related behaviors of adults in this country. States use standard procedures to collect data through monthly telephone interviews with adults 18 or older. BRFSS interviewers ask questions related to behaviors that are associated with preventable chronic diseases, injuries, and infectious diseases.

Free from source

 

CDC WONDER (Wide-ranging Online Data for Epidemiologic Research) is an inventory of online databases that utilizes a rich ad hoc query system for the analysis of a wide array of public health data.

CDC WONDER allows users to:

  • Search for and read published documents on public health concerns, including reports, recommendations and guidelines, articles and statistical research data published by CDC, as well as reference materials and bibliographies on health-related topics.
  • Query numeric data sets on CDC's information systems. Public-use data sets about mortality (deaths), cancer incidence, HIV and AIDS, TB, natality (births), census data and many other topics are available for query, and the requested data are readily summarized and analyzed.
  • Produce tables, maps, and charts. 
  • Download tab-delimited text exports of summary statistics.

Free from source
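
As a rough illustration of working with a tab-delimited text export such as those CDC WONDER offers, the sketch below parses tab-separated rows into records and derives a crude rate per 100,000. The column names (Year, Deaths, Population) and the sample rows are hypothetical placeholders, not actual CDC WONDER output; real exports may use different headers and include notes or footer lines that need separate handling.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# Hypothetical sample rows for illustration only (not real data).
sample = "Year\tDeaths\tPopulation\n2001\t10\t1000\n2002\t12\t1100\n"

rows = read_tab_delimited(sample)

# Crude rate per 100,000 population for each year in the sample.
crude_rates = {
    row["Year"]: int(row["Deaths"]) * 100000 / int(row["Population"])
    for row in rows
}
```

In practice the text would come from a downloaded file rather than an inline string, and the header names would be taken from the export itself.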

 

National Ambulatory Medical Care Survey (NAMCS) provides objective, reliable information about the provision and use of ambulatory medical care services in the U.S. Findings are based on a sample of visits to non-federally employed, office-based physicians who are primarily engaged in direct patient care.

Free from source

 

The CDC’s National Center for Environmental Health (NCEH) provides a reference list of nationally funded major data systems with national scope that have a relationship to environmental health, including Asthma Data, Statistics and Surveillance, Childhood Lead Poisoning Data and Surveillance Resources, and the National Environmental Public Health Tracking Network.

Free from source

 

The CDC’s National Center for Health Statistics Surveys and Data Collection Systems (NCHS) provides a rich source of information about America's health from a wide variety of surveys and studies. To get started, check out their Resources for Researchers.

Free from source  

 

The NCHS Data Briefs are statistical publications from the National Center for Health Statistics that take a complex data subject and summarize it in text and graphics, giving readers easily comprehensible information in a compact publication.

Free from source

 

The NCHS FastStats – Statistics by Topic provide quick access to statistics on topics of public health importance, organized alphabetically. Links are provided to publications that include the statistics presented, to sources of more data, and to related web pages.

Free from source

 

National Hospital Discharge Survey (NHDS), which has been conducted annually since 1965, is a national probability survey designed to meet the need for information on characteristics of inpatients discharged from non-Federal short-stay hospitals in the U.S.

Free from source

 

National Health and Nutrition Examination Survey (NHANES) is a survey conducted by CDC's National Center for Health Statistics to collect information about the health and diet of people in the U.S. NHANES is unique in that it combines a home interview with health tests that are done in a Mobile Examination Center.

Free from source

 

National Program of Cancer Registries (NPCR) collects data on cancer occurrence (including the type, extent, and location of the cancer) and the type of initial treatment. Through NPCR, CDC supports central cancer registries in 45 states, the District of Columbia, Puerto Rico, and the U.S. Pacific Island Jurisdictions. These data represent 96% of the U.S. population.

Free from source

 

Work Related Injury Statistics Query System (Work-RISQS) provides a query system for obtaining national estimates (number of cases) and rates (number of cases per hours worked) for nonfatal occupational injuries and illnesses treated in U.S. hospital emergency departments. Users may interactively query based on demographic characteristics, nature of injury/illness, and incident circumstances for the years 1998, 1999, and 2000. Additional data-years will be added in future updates.

Free from source

 

Web-Based Injury Statistics Query and Reporting System (WISQARS) is the National Center for Injury Prevention and Control's interactive, online database that provides customized injury-related mortality data and nonfatal injury data useful for research and for making informed public health decisions.

Free from source

 


CENSUS DATA

 

The U.S. Census Bureau's Gateway to Census 2000 and American FactFinder have tables and maps of geographies to the block level; summaries of the most requested data for states and counties; and data highlights, documentation, and FTP access for the U.S., states, counties, and places (cities and towns), including data for Puerto Rico (en español) and Island Areas.

Free from source

 

TheDataWeb is a network of online data libraries for census, economic, health, income and unemployment, population, labor, cancer, crime and transportation, family dynamics, and vital statistics data.

Free from source

 

The World Health Organization’s Health Statistics and Information Systems provides access to the Global Health Observatory, Global Health Estimates, and the WHO Mortality Database as well as a variety of Data Analysis Tools.

Free from source

 


ENVIRONMENTAL DATA (EPA)

 

AirData provides access to yearly summaries of U.S. air pollution data, taken from EPA's databases. The summary includes all fifty states plus District of Columbia, Puerto Rico, and the U.S. Virgin Islands. AirData has information about where air pollution comes from (emissions) and how much pollution is in the air outside our homes and workplaces (monitoring).

Free from source

 

Air Information Retrieval System AQS is EPA's repository of ambient air quality data. AQS contains ambient air pollution data collected by EPA, state, local, and tribal air pollution control agencies from thousands of monitors. AQS also contains meteorological data, descriptive information about each monitoring station (including its geographic location and its operator), and data quality assurance/quality control information. The AQS Data Mart is designed to make air quality data more accessible and useful to the scientific and technical community.

Free from source

 

Envirofacts Data Warehouse provides access to several EPA databases containing information about environmental activities that may affect air, water, and land anywhere in the U.S. and allows users to generate maps of environmental information. Topics include waste, water, toxins, air, radiation, land, and maps.

Free from source

 

EnviroMapper maps various types of environmental information, including air releases, drinking water, toxic releases, hazardous wastes, water discharge permits, and Superfund sites.

Free from source

 

MyEnvironment provides a wide range of federal, state, and local information about environmental conditions and features in an area of your choice.

Free from source

 

The National Environmental Public Health Tracking Network is a system of integrated health, exposure, and hazard information and data from a variety of national, state, and city sources. On the Tracking Network, you can view maps, tables, and charts with data about:

  • chemicals and other substances found in the environment
  • some chronic diseases and conditions
  • the area where you live

You can also:

  • Try the new Data Explorer
  • Search for Information by Location
  • Visit State & Local Tracking Portals

Free from source

 


MEDICARE & MEDICAID DATA (CMS)

 

The Centers for Medicare and Medicaid Services CMS.gov Data Compendium provides key statistics about CMS programs and national health expenditures. The Compendium contains historic, current, and projected data on Medicare enrollment and Medicaid recipients, expenditures, and utilization. Data pertaining to budget, administrative and operating costs, individual income, financing, and healthcare providers and suppliers are also included. National health expenditure data not specific to the Medicare or Medicaid programs are also included.  

Free from source

 

Use the CMS Data Navigator to find data and information products for specific CMS programs, such as Medicare and Medicaid, or on specific healthcare topics or settings of care. Navigator displays search results by data type, making it easier to locate specific types of information (e.g., data files, publications, and statistical reports).

Free from source

 

Medicare Data provides direct access to the official data from the Centers for Medicare & Medicaid Services (CMS) that are used on the Medicare.gov Compare Websites and Directories and makes these CMS data readily available in open, accessible, and machine-readable formats.

Free from source

 

Medicare Research Identifiable Files (RIFs)

The Centers for Medicare & Medicaid Services (CMS) makes identifiable data files (IDFs) available to certain stakeholders as allowed by federal laws and regulations as well as CMS policy. IDFs contain protected health information (PHI) and/or personally identifiable information (PII) and CMS is committed to ensuring this information is protected.

CMS allows organizations to access IDFs or research identifiable files (RIFs) for research purposes. Requests for these data files require a research protocol and Data Use Agreement, among other documents, and are reviewed by CMS’s Privacy Board. For more information on the research request process, please visit the Research Data Assistance Center (ResDAC) website at: http://www.resdac.org

Additional fee information for CMS data can be found here.

Fee required

 

Medicare Inpatient Rehabilitation Facility Patient Assessment Instrument (IRF-PAI) is assessment data collected on all Medicare Part A fee-for-service patients who receive services under Part A from an inpatient rehabilitation unit or hospital. IRF-PAI data items address the physical, cognitive, functional, and psychosocial status of the IRF patients.

Fee required

 

Medicare Cost Reports contain provider information such as facility characteristics, utilization data, cost and charges by cost center, in total and for Medicare, Medicare settlement data, and financial statement data. CMS maintains the cost report data in the Healthcare Provider Cost Reporting Information System. 

Free from source, with charges for custom reports

 

The CMS-supported Research Data Assistance Center (ResDAC) provides free assistance to academic, government, and non-profit researchers interested in using Medicare and/or Medicaid data for their research. ResDAC is staffed by a consortium of epidemiologists, public health specialists, health services researchers, biostatisticians, and health informatics specialists from the University of Minnesota.

Data available from ResDAC, including Identifiable Files, Limited Data Sets, and Public Use Files, can be found here.

ResDAC also provides a cost estimator application that lets users estimate a cohort size and obtain an estimated cost based on size, file types, and years before moving to the next step in the data request process (for data requests and grant proposals).

Fee required

 

CMS Chronic Conditions Data Warehouse (CCW) is a research database designed to (1) identify areas for improving the quality of care provided to chronically ill Medicare beneficiaries; (2) identify ways to reduce program spending; and (3) make current Medicare data more readily available to researchers studying chronic illness in the Medicare population. The CCW contains fee-for-service institutional and non-institutional claims, enrollment/eligibility, and assessment data from 1999 forward for a random 5% sample of Medicare beneficiaries (100% for 2005 forward). The data are linked by a unique, unidentifiable beneficiary key, which allows researchers to analyze information across the continuum of care.

Fee required
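
To illustrate what linking by a beneficiary key can look like in an analysis, the sketch below groups claim records under the enrollment record that shares the same key. The field names (bene_key, enroll_year, setting) and all records are made up for illustration; they are assumptions, not the actual CCW file layout or variable names.

```python
from collections import defaultdict


def link_by_key(enrollment, claims, key="bene_key"):
    """Attach to each enrollment record the claims that share its key."""
    claims_by_key = defaultdict(list)
    for claim in claims:
        claims_by_key[claim[key]].append(claim)
    # Beneficiaries without any claims get an empty list.
    return [
        {**person, "claims": claims_by_key.get(person[key], [])}
        for person in enrollment
    ]


# Made-up toy records for illustration only (hypothetical field names).
enrollment = [
    {"bene_key": "A1", "enroll_year": 2006},
    {"bene_key": "B2", "enroll_year": 2007},
]
claims = [
    {"bene_key": "A1", "setting": "inpatient"},
    {"bene_key": "A1", "setting": "outpatient"},
]

linked = link_by_key(enrollment, claims)
```

Because the key is shared across file types, the same grouping step could be repeated for assessment records to follow one beneficiary across settings of care.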

 

CMS Long-Term Care Minimum Data Set (MDS) is a standardized, primary screening and assessment tool of health status that forms the foundation of the comprehensive assessment for all residents in a Medicare and/or Medicaid-certified long-term care facility. The MDS contains items that measure physical, psychological, and psychosocial functioning. The items in the MDS give a multidimensional view of the patient's functional capacities and help staff identify health problems.

Free from source

 


HEALTH RESOURCES AND SERVICES ADMINISTRATION (HRSA) DATA

 

Health Resources and Services Administration Data Warehouse (HDW)

The Health Resources and Services Administration (HRSA), an agency of the U.S. Department of Health and Human Services, is the primary Federal agency for improving health and achieving health equity through access to quality services, a skilled health workforce and innovative programs. HRSA's programs provide health care to people who are geographically isolated, or economically or medically vulnerable. Data in the HDW are associated with one or more topic areas such as grants, health professionals, and shortage areas. Data can also be viewed by tool (charts, data tables, maps, and preformatted reports).

Free from source

 

HRSA Area Health Resources Files

The Area Health Resources Files (AHRF) data are designed to be used by planners, policymakers, researchers, and others interested in the nation’s health care delivery system and factors that may impact health status and health care in the United States. The AHRF data include county, state, and national-level files in eight broad areas: Health Care Professions, Health Facilities, Population Characteristics, Economics, Health Professions Training, Hospital Utilization, Hospital Expenditures, and Environment. The AHRF data are obtained from more than 50 sources.

The HRSA Data Warehouse (HDW) allows users to interact with data in charts, tables/reports, maps, and tools. Visit the Data Sources and Refresh Dates page for information about where the AHRF data are available in the HRSA Data Warehouse.

Free from source

 


OTHER HEALTH DATA SETS AND RESOURCES

 

dbGaP is the database of Genotypes and Phenotypes that was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans. dbGaP is funded and managed by the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). The site provides a number of tutorials demonstrating how to access and use the data and an FAQ help manual.

Free from source

 

Commonwealth Fund shares maps and data, including ChartCart and Performance Snapshots, which allow you to create your own collections of Commonwealth Fund charts. Further resources include an interactive Health System Data Center, where you can compare state-level health performance across a variety of domains.

Free from source

 

The Dryad Digital Repository is a curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of datatypes.

Dryad's mission is to provide the infrastructure for, and promote the re-use of, data underlying the scholarly literature.

Because all data in Dryad is (after the expiration of a possible embargo) available under a CC Zero public domain waiver, Dryad does not contain sensitive, ePHI, or any other data that cannot be released to the public. Data from medical studies that is in Dryad is properly anonymized and was prepared under applicable legal and ethical guidelines. For additional information, email the Dryad Helpdesk.

Free from source

 

The annual County Health Rankings and Roadmaps measure vital health factors, including high school graduation rates, obesity, smoking, unemployment, access to healthy foods, the quality of air and water, income inequality, and teen births in nearly every county in America. 

Free from source

 

Dartmouth Atlas of Health Care uses Medicare data to provide information and analysis about national, regional, and local markets. Data are available on the following topics: primary care service areas, end-of-life care, care of chronic illness in last two years of life, hospital and physician capacity, quality/effective care, hospital use, hospital discharges for medical conditions, Medicare reimbursement, and surgical procedures.

Free from source

 

Data Resource Center for Child and Adolescent Health provides access to data from the National Survey of Children's Health (NSCH) and the National Survey of Children with Special Health Care Needs (CSHCN) as well as many other useful resources.

Free from source

 

HealthData.gov makes high-value health data more accessible to entrepreneurs, researchers, and policy makers in hopes of better health outcomes for all. The site includes a filterable search index with access to more than 3,000 datasets across multiple domains, some with one-touch accessibility.

Free from source

 

Internet Crossroads in Social Science Data

The University of Wisconsin Data and Program Library Service publishes the Internet Crossroads in Social Science Data (Crossroads), which contains more than 1,000 links to worldwide data resources. Users enter keywords in the search function or browse through the topic-related links. The Health link provides more than 50 health-related data resources, both national and international in scope.

Free from source

 

IPUMS (Integrated Public Use Microdata Series) provides census and survey data from around the world integrated across time and space. IPUMS integration and documentation allow users to study change, conduct comparative effectiveness research, merge information across data types, and analyze individuals within family and community context.

Free from source

 

Long Term Care: Facts on Care in the U.S. (LTCFocUS.org) provides data on nursing home care in the U.S., specifically focusing on the health and functional status of nursing home residents, characteristics of care facilities, state policies relevant to long-term care services, financing, and data characterizing the markets in which facilities exist.

Free from source

 

MarketScan® Research Databases by Truven Health Analytics capture person-level inpatient/outpatient services, prescription drug fills, and health insurance enrollment. The databases contain demographic, diagnosis, procedure, and treatment data on over 60 million commercial and Medicare patients in the U.S. (over three years). The data come from a selection of large-employer health plans linking claims and encounter data to patient information across sites and providers over time.

Fee required

 

Organization for Economic Cooperation and Development (OECD) promotes policies that will improve the economic and social well-being of people around the world. OECD provides international data across a variety of topics related to health.

Free summary stats; paid access to database

 

Statehealthfacts.org is the Kaiser Family Foundation's online source for the latest state-by-state data from all 50 states and national data on more than 700 demographic, health, and health policy topics.

Free from source

 

 


State and Community Data Sets

 

HealthStats for North Carolina provides statistical data as well as contextual information on the health status of North Carolinians and the state of North Carolina's healthcare system.

Free from source

 

North Carolina Health Data Query System provides customized reports of health data based on user-specified selection of variables (e.g., age, race, county). Reports include birth, birth defect, mortality, population estimates, and pregnancy data.

Free from source

 

Partners in Information Access for the Public Health Workforce provides access to health data tools and statistics.

Free from source

 

The Partnership for a Healthy Durham is a coalition of local organizations and community members with the goal of collaboratively improving the physical, mental, and social health and well-being of Durham’s residents.  The Partnership also provides a list of health data resources.

Free from source

 

 

 


Other Data Inventories & Data Sharing Platforms

 

Inter-University Consortium for Political and Social Research (ICPSR) provides over 500,000 data sets from social science research studies. Topics include gerontology, public health, medical care, substance abuse, and mental health. Some components of select data sets can be analyzed online.

 

 

International Society for Pharmacoeconomics and Outcomes Research (ISPOR) maintains an inventory of healthcare data sets from over 30 countries. The digest is grouped by country and allows keyword searches and searches by type of data set.

 

 

The Center for Open Science has a mission to increase openness, integrity and reproducibility of scholarly research and envisions a future scholarly community in which the process, content, and outcomes of research are openly accessible by default. The Open Science Framework is a free, secure web application for project management, collaboration, registration, and archiving across the entire research lifecycle.

Free from source

 

The University of California San Francisco Clinical and Translational Science Institute (UCSF CTSI) has a large dataset inventory available to Clinical and Translational Science Award institutions like Duke. The inventory has a guided search feature to help you find the best data set for your project. It is searchable by domain, study design, population, timeframe, unit of observation, scope, cost, and publisher.

 


NIH Data Sharing Repositories

The US National Library of Medicine and the Trans-NIH BioMedical Informatics Coordinating Committee (BMIC) have developed a catalogue of NIH-supported data repositories that make data available for re-use, as well as resources that aggregate information about biomedical data and information sharing systems. 

View the catalogue