Open-source Java tool to analyze comorbidities over large datasets of patients

Latest version: 3.5

Comorbidity4j is an open-source Java tool for performing comorbidity analyses, that is, for identifying significant co-occurrences of diseases over large datasets of patient data.

Given the demographic information (birth date, gender and, optionally, secondary patient features like education level, ethnicity, etc.) and the history of diseases of a set of patients, Comorbidity4j performs the comorbidity analysis of all or a subset of these diseases: several widely used measures for identifying relevant pairs of co-occurring diseases are computed over the patient population data provided as input. Besides CSV tables, results can be explored through a set of interactive Web visualizations generated on the fly: from this link you can access an example of the interactive Web visualizations generated by Comorbidity4j.
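To make the idea of a pairwise comorbidity measure concrete, here is a minimal sketch of three of the measures named on this page (relative risk, odds ratio and phi), computed from patient counts for a single disease pair. It uses the textbook definitions for a 2x2 contingency table; the class name, method names and counts are illustrative assumptions, not the Comorbidity4j API, and the exact formulas used by the tool are the ones given in its Comorbidity scores documentation.

```java
// Illustrative sketch only: hypothetical names, textbook formulas.
// No guard against zero denominators is included.
final class ComorbidityScoresSketch {

    // Observed co-occurrences divided by the number expected
    // if diseases A and B were independent (nA * nB / nTotal).
    static double relativeRisk(long nAB, long nA, long nB, long nTotal) {
        return ((double) nAB * nTotal) / ((double) nA * nB);
    }

    // Odds ratio from the four cells of the 2x2 contingency table.
    static double oddsRatio(long nAB, long nA, long nB, long nTotal) {
        long onlyA = nA - nAB;                  // patients with A but not B
        long onlyB = nB - nAB;                  // patients with B but not A
        long neither = nTotal - nA - nB + nAB;  // patients with neither disease
        return ((double) nAB * neither) / ((double) onlyA * onlyB);
    }

    // Phi coefficient: Pearson correlation between the two binary diagnoses.
    static double phi(long nAB, long nA, long nB, long nTotal) {
        double numerator = (double) nAB * nTotal - (double) nA * nB;
        double denominator =
            Math.sqrt((double) nA * nB * (nTotal - nA) * (nTotal - nB));
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Made-up counts: 1000 patients, 100 with A, 200 with B, 40 with both.
        long nTotal = 1000, nA = 100, nB = 200, nAB = 40;
        System.out.println("Relative risk: " + relativeRisk(nAB, nA, nB, nTotal)); // 2.0
        System.out.println("Odds ratio:    " + oddsRatio(nAB, nA, nB, nTotal));    // about 3.08
        System.out.println("Phi:           " + phi(nAB, nA, nB, nTotal));          // about 0.167
    }
}
```

With these made-up counts, 20 patients would be expected to have both diseases under independence, so observing 40 gives a relative risk of 2.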

A brief walkthrough of the capabilities of Comorbidity4j is provided by the Overview.

Comorbidity analyses can be executed:

  • on-line: by means of the Comorbidity4web service, powered by Comorbidity4j;
  • locally on your workstation: by means of Comorbidity4j. To download the Comorbidity4j Java library and obtain instructions for running comorbidity analyses locally, see the Executing Comorbidity4j on your pc section of the Comorbidity4j documentation. Analyses executed with Comorbidity4j expect the same input data and produce the same set of interactive Web visualizations as Comorbidity4web, but data are processed and results are stored locally on the user's workstation. This makes it possible to enforce stricter data privacy, protect sensitive data, or exploit more powerful workstations.

Source code available at:

Comorbidity4j is distributed under the terms defined by the GNU AFFERO GENERAL PUBLIC LICENSE (Version 3, 19 November 2007).

Core features:

  • computation of the following comorbidity scores: Relative Risk Index (with user-defined confidence interval), Odds Ratio (with user-defined confidence interval), Phi Index, Comorbidity Score, Fisher Test (see: Comorbidity scores)
  • customization of p-value adjustment approach by choosing one of the following methodologies: BONFERRONI, BENJAMINI-HOCHBERG, HOLM, HOCHBERG, BENJAMINI-YEKUTIELI, HOMMEL (see: Comorbidity scores)
  • sex ratio analysis to evaluate whether a comorbidity that occurs in both men and women is equally likely in both sexes or more likely in one sex than in the other
  • support for time directionality in the analysis of disease pairs
  • results of comorbidity analysis accessible as a spreadsheet (see: Comorbidity table) as well as by means of interactive visualizations (see: Web-based interactive visualizations) generated on the fly. The results of comorbidity analysis can also be downloaded as a zip file including these data in CSV format and as standalone Web visualizations
  • support for input data formatted in compliance with the OHDSI Common Data Model (OMOP) (see: Processing OMOP Common Data Model datasets)
  • support for multi-threaded execution in order to deal effectively with datasets of thousands of patients and millions of disease pairs
  • input data provided by means of a set of spreadsheets describing demographic information and history of diseases of a set of patients (see: Patient input file format and download an example input dataset, generated by Synthea)
  • interactive, Web-based input data loading, validation and customization of comorbidity analysis execution parameters, in particular:
    • interactive identification of the column semantics of input spreadsheet files
    • interactive validation of input file contents and consistency
    • customizable column separator and text delimiters of input spreadsheet files
    • support for any set of diagnosis identifiers
    • automated guess of date formats in input spreadsheet files
    • interactive definition of groups of diagnoses to consider as a single one
    • interactive identification of the pairs of diagnoses to analyze / check for comorbidities
    • customization of time directionality of diagnosis pairs
    • patient filters to select subgroups of patients by age and secondary patient features like education level, ethnicity
    • comorbidity filters to select only diagnosis pairs matching specific values of the computed comorbidity scores
    • comorbidity filters to study only the diagnoses with a minimum number of patients
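As an illustration of the p-value adjustment step listed among the core features, the following minimal sketch applies the standard Benjamini-Hochberg procedure to a list of raw p-values, one per disease pair. The class and method names are hypothetical and the p-values are made up; this is not the code Comorbidity4j uses internally.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch only: hypothetical names, standard Benjamini-Hochberg procedure.
final class BenjaminiHochbergSketch {

    // Returns the adjusted p-values in the same order as the input.
    static double[] adjust(double[] pValues) {
        int m = pValues.length;

        // Indices of the p-values sorted in ascending order of p-value.
        Integer[] order = new Integer[m];
        for (int i = 0; i < m; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> pValues[i]));

        // Walk from the largest p-value to the smallest, scaling each by m / rank
        // and keeping a running minimum so adjusted values stay monotone.
        double[] adjusted = new double[m];
        double runningMin = 1.0;
        for (int rank = m; rank >= 1; rank--) {
            int idx = order[rank - 1];
            double candidate = pValues[idx] * m / rank;
            runningMin = Math.min(runningMin, candidate);
            adjusted[idx] = runningMin;
        }
        return adjusted;
    }

    public static void main(String[] args) {
        // Made-up raw p-values for four disease pairs.
        double[] raw = {0.01, 0.04, 0.03, 0.20};
        System.out.println(Arrays.toString(adjust(raw)));
    }
}
```

For the four made-up p-values above, the adjusted values are approximately 0.04, 0.053, 0.053 and 0.20. The second and third pairs share the same adjusted value because of the running minimum.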