Software and Data Products
The Baltimore Air Quality Dashboard provides interactive access to fine-scale air pollution measurements from low-cost sensor networks deployed across Baltimore, Maryland. The dashboard is designed to support community-engaged environmental health research by visualizing spatial and temporal patterns in particulate matter and related air quality indicators. It helps residents, community partners, researchers, and public health practitioners explore local air quality conditions, identify pollution episodes, and connect sensor-based evidence with neighborhood-level environmental concerns.
R and Python packages
- E. Wilson, H. Kalter, A. Datta, S. Pramanik, and R. Black, EAVA: Deterministic Verbal Autopsy Coding with Expert Algorithm Verbal Autopsy. 2025.
Bibtex
@manual{EAVA,
title = {EAVA: Deterministic Verbal Autopsy Coding with Expert Algorithm Verbal Autopsy},
author = {Wilson, Emily and Kalter, Henry and Datta, Abhirup and Pramanik, Sandipan and Black, Robert},
year = {2025},
note = {R package},
url = {https://cran.r-project.org/web/packages/EAVA/}
}
Description
Expert Algorithm Verbal Autopsy assigns causes of death to 2016 WHO Verbal Autopsy Questionnaire data. EAVA uses the presence and absence of signs and symptoms reported in the Verbal Autopsy interview to diagnose common causes of death. A deterministic algorithm assigns a single cause of death to each Verbal Autopsy interview record using a hierarchy of all common causes for neonates or children 1 to 59 months of age.
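The hierarchical assignment described above can be pictured as an ordered rule list in which the first matching rule determines the cause. The Python sketch below is illustrative only: the cause names, symptom keys, and rule order are hypothetical placeholders, not the published EAVA hierarchy or the API of the EAVA R package.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, bool]  # symptom name -> reported present (True) or absent (False)
Rule = Tuple[str, Callable[[Record], bool]]

# Hypothetical hierarchy: causes are checked in this order and the first match wins.
# These rules are placeholders for illustration, not the published EAVA definitions.
HIERARCHY: List[Rule] = [
    ("injury", lambda r: r.get("injury", False)),
    ("diarrhea", lambda r: r.get("loose_stools", False) and not r.get("cough", False)),
    ("pneumonia", lambda r: r.get("cough", False) and r.get("fast_breathing", False)),
]


def assign_cause(record: Record, hierarchy: List[Rule] = HIERARCHY) -> str:
    """Return exactly one cause per record: the first rule in the hierarchy that matches."""
    for cause, matches in hierarchy:
        if matches(record):
            return cause
    return "unspecified"  # no rule matched
```

Because the rules are evaluated in a fixed order, a record that satisfies several rules still receives a single cause, and the same record always receives the same cause.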
- S. Pramanik, E. Wilson, J. Fiksel, B. Gilbert, and A. Datta, vacalibration: Calibration of Computer-Coded Verbal Autopsy Algorithm. 2025.
Bibtex
@manual{vacalibration,
title = {vacalibration: Calibration of Computer-Coded Verbal Autopsy Algorithm},
author = {Pramanik, Sandipan and Wilson, Emily and Fiksel, Jacob and Gilbert, Brian and Datta, Abhirup},
year = {2025},
note = {R package},
url = {https://cran.r-project.org/web/packages/vacalibration/index.html}
}
Description
VAcalibration is a package for bias-correcting estimates of cause-specific mortality fractions (CSMFs) generated by computer-coded verbal autopsy (CCVA) algorithms from WHO-standardized verbal autopsy (VA) survey data. It leverages data from the multi-country Child Health and Mortality Prevention Surveillance (CHAMPS) project, which determines causes of death via Minimally Invasive Tissue Sampling (MITS). Using the CHAMPS data, the package estimates and publishes an inventory of 48 uncertainty-quantified misclassification matrices covering three CCVA algorithms (EAVA, InSilicoVA, InterVA), two age groups (neonates aged 0-27 days and children aged 1-59 months), and eight countries or country groups: the seven CHAMPS countries (Bangladesh, Ethiopia, Kenya, Mali, Mozambique, Sierra Leone, and South Africa), together with an estimate for countries not in CHAMPS. Given VA-only data for an age group, CCVA algorithm, and country, the package uses the corresponding uncertainty-quantified misclassification matrix estimate as an informative prior and applies modular VA calibration to produce calibrated CSMF estimates. It also supports ensemble calibration when VA-only data are provided for multiple algorithms. More generally, the package can be used to calibrate predictions from any discrete classifier, or ensemble of classifiers, using user-provided fixed or uncertainty-quantified misclassification matrices.
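To illustrate the core idea of calibrating with a misclassification matrix, the sketch below treats the simplest case of a single fixed (not uncertainty-quantified) matrix. It assumes the uncalibrated fractions q relate to the true fractions p through q[j] = sum_i p[i] * M[i][j], and recovers p with a standard EM iteration. The function name and arguments are hypothetical; this is not the package's Bayesian procedure or its API.

```python
from typing import List


def calibrate_csmf(q: List[float], M: List[List[float]], iters: int = 500) -> List[float]:
    """EM estimate of true class fractions p with q[j] approximately sum_i p[i] * M[i][j].

    q: uncalibrated fractions over predicted causes (sums to 1).
    M: misclassification matrix, M[i][j] = P(predicted cause j | true cause i).
    """
    k = len(M)
    p = [1.0 / k] * k  # start from a uniform guess
    for _ in range(iters):
        # Distribution of predicted causes implied by the current p.
        implied = [sum(p[i] * M[i][j] for i in range(k)) for j in range(len(q))]
        # Reallocate the mass of each predicted cause back to the true causes.
        p = [
            p[i] * sum(q[j] * M[i][j] / implied[j] for j in range(len(q)) if implied[j] > 0)
            for i in range(k)
        ]
    return p
```

For instance, with M = [[0.8, 0.2], [0.1, 0.9]] and uncalibrated fractions q = [0.59, 0.41], the iteration converges to approximately [0.7, 0.3], the fractions that would have produced q under that matrix.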
- W. Zhan and A. Datta, geospaNN: Neural Networks for geospatial data. 2024.
Bibtex
@manual{geospaNN,
title = {geospaNN: Neural Networks for geospatial data},
author = {Zhan, Wentao and Datta, Abhirup},
year = {2024},
note = {Python package},
url = {https://pypi.org/project/geospaNN/}
}
Description
GeospaNN is a package for geospatial analysis using neural networks that explicitly account for spatial correlation in the data. The package implements the NN-GLS method of Zhan and Datta (2024, JASA) and is built on PyTorch and the PyG library. NN-GLS is a geographically informed graph neural network (GNN) for analyzing large, irregular geospatial data that combines multi-layer perceptrons, Gaussian processes, and a generalized least squares (GLS) loss. NN-GLS offers both regression function estimation and spatial prediction, and scales to sample sizes in the hundreds of thousands.
- A. Saha, S. Basu, and A. Datta, RandomForestsGLS: Random Forests for Dependent Data. 2021.
Bibtex
@manual{rfgls,
title = {RandomForestsGLS: Random Forests for Dependent Data},
author = {Saha, Arkajyoti and Basu, Sumanta and Datta, Abhirup},
year = {2021},
note = {R package version 0.1.2},
url = {https://cran.r-project.org/web/packages/RandomForestsGLS/}
}
Description
RandomForestsGLS is a package for fitting non-linear regression models to dependent (spatial and temporal) data with correlated errors. It implements the generalized least squares (GLS) based random forests (RF-GLS) detailed in Saha, Basu and Datta (2020). For spatial data, RandomForestsGLS combines random forests with Gaussian processes to estimate non-linear functions and predict spatial outcomes. For time-series data, it uses an autoregressive (AR) covariance structure with random forests for estimation.
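A GLS loss replaces the ordinary sum of squared residuals with a quadratic form weighted by the inverse error covariance. As a minimal sketch, assuming AR(1) errors with unit innovation variance, the quadratic form can be computed by whitening the residuals. The function below is a hypothetical illustration of that loss only; it does not grow trees and is not the RandomForestsGLS implementation.

```python
import math
from typing import Sequence


def ar1_gls_loss(residuals: Sequence[float], rho: float) -> float:
    """Quadratic form e' Q e, where Q is the AR(1) precision matrix (unit innovation variance)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if not residuals:
        return 0.0
    # Whitening: rescale the first residual, difference later ones against their predecessor.
    whitened = [math.sqrt(1.0 - rho * rho) * residuals[0]]
    whitened += [residuals[t] - rho * residuals[t - 1] for t in range(1, len(residuals))]
    return sum(w * w for w in whitened)
```

With rho = 0 the loss reduces to the ordinary sum of squared residuals. With positive rho, residuals that move together are penalized less than under ordinary least squares.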
- J. Fiksel and A. Datta, codalm: Transformation-Free Linear Regression for Compositional Outcomes and Predictors. 2020.
Bibtex
@manual{codalm,
title = {codalm: Transformation-Free Linear Regression for Compositional Outcomes and Predictors},
author = {Fiksel, Jacob and Datta, Abhirup},
year = {2020},
note = {R package version 0.1.0},
url = {https://cran.r-project.org/web/packages/codalm}
}
Description
codalm is an R package for linear modeling of compositional data (coda). It implements a simple transformation-free regression of a compositional outcome on a compositional predictor using an M-estimation method. The package provides estimates of the regression coefficient matrix along with bootstrap-based confidence intervals. A permutation-based test of linear association is also offered.
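One way to see why no transformation is needed is the following sketch. It assumes, as an assumption not stated above, that the fitted mean has the form y_hat = B'x with every row of the coefficient matrix B itself a composition, so that a compositional predictor always maps to a compositional outcome. The function name is hypothetical and this is not the codalm API.

```python
from typing import List


def predict_composition(x: List[float], B: List[List[float]]) -> List[float]:
    """Return y_hat[j] = sum_i x[i] * B[i][j] for a row-stochastic coefficient matrix B."""
    if abs(sum(x) - 1.0) > 1e-9 or any(v < 0 for v in x):
        raise ValueError("x must be a composition (non-negative, summing to 1)")
    for row in B:
        if abs(sum(row) - 1.0) > 1e-9 or any(v < 0 for v in row):
            raise ValueError("each row of B must be a composition")
    return [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(B[0]))]
```

For example, x = [0.5, 0.5] with B = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.5]] gives a prediction of approximately [0.55, 0.2, 0.25], which again sums to 1.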
- A. Saha and A. Datta, BRISC: Fast Inference for Large Spatial Datasets using BRISC. 2018.
Bibtex
@manual{brisc,
title = {BRISC: Fast Inference for Large Spatial Datasets using BRISC},
author = {Saha, Arkajyoti and Datta, Abhirup},
year = {2018},
note = {R package version 0.1.0},
url = {https://CRAN.R-project.org/package=BRISC}
}
Description
BRISC is a package for rapid estimation, prediction, and inference for large spatial data in a frequentist setup. BRISC estimation and prediction rely on nearest neighbor approximations of the spatial Gaussian process likelihood, and a scalable parametric bootstrap provides inference for all spatial parameters.
To our knowledge, BRISC is currently the only R package that provides confidence intervals in a frequentist setup for all parameters, including the spatial variance and range of the Gaussian process. Inference from BRISC is highly competitive with that obtained from Bayesian approaches relying on MCMC, while being many times faster.
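The parametric bootstrap mentioned above follows a simple recipe: fit the model, simulate new datasets from the fitted model, re-estimate the parameter on each, and read a confidence interval off the resulting estimates. The sketch below applies that recipe to a toy independent Gaussian sample rather than a spatial model. The function and its arguments are hypothetical, and it does not reproduce BRISC's scalable spatial bootstrap.

```python
import random
import statistics
from typing import List, Tuple


def parametric_bootstrap_ci(
    data: List[float], n_boot: int = 1000, level: float = 0.95, seed: int = 0
) -> Tuple[float, float]:
    """Percentile interval for the standard deviation of a Gaussian sample."""
    rng = random.Random(seed)
    mu_hat = statistics.fmean(data)
    sd_hat = statistics.pstdev(data)  # maximum-likelihood estimate of the standard deviation
    boot = []
    for _ in range(n_boot):
        # Simulate a new dataset from the fitted model, then re-estimate the parameter.
        sim = [rng.gauss(mu_hat, sd_hat) for _ in data]
        boot.append(statistics.pstdev(sim))
    boot.sort()
    alpha = (1.0 - level) / 2.0
    lo = boot[int(alpha * (n_boot - 1))]
    hi = boot[int((1.0 - alpha) * (n_boot - 1))]
    return lo, hi
```

In a spatial setting the simulate and re-estimate steps would use the fitted spatial model instead of independent Gaussian draws; the structure of the loop stays the same.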
- A. Finley, A. Datta, and S. Banerjee, spNNGP: Spatial Regression Models for Large Datasets using Nearest Neighbor Gaussian Processes. 2017.
Bibtex
@manual{spnngp,
title = {spNNGP: Spatial Regression Models for Large Datasets using Nearest Neighbor Gaussian Processes},
author = {Finley, Andrew and Datta, Abhirup and Banerjee, Sudipto},
year = {2017},
note = {R package version 0.1.1},
url = {https://CRAN.R-project.org/package=spNNGP}
}
Description
spNNGP is a package for fully Bayesian analysis of massive spatial data. Spatial analysis of point-referenced data is usually computationally expensive, requiring memory and computation that are, respectively, quadratic and cubic in the number of locations where data are observed. spNNGP implements a class of scalable Nearest Neighbor Gaussian Process (NNGP) models whose memory and computation requirements are linear in the size of the data.
spNNGP enables fast, fully Bayesian inference for all parameters and proper uncertainty-quantified predictions at new locations. An MCMC-free hybrid Bayesian conjugate NNGP is also included, which is extremely fast even for spatial datasets with millions of locations.
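The linear memory footprint comes from letting each location depend on only a small number m of nearby locations instead of on all others. The sketch below builds such neighbor sets under the common convention that candidates are restricted to locations earlier in a fixed ordering. It is a hypothetical illustration using a brute-force search, not the spNNGP implementation, which is an R package with its own search routines.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def neighbor_sets(locations: List[Point], m: int) -> List[List[int]]:
    """For each location i, return the indices of its (at most) m nearest predecessors."""
    sets: List[List[int]] = []
    for i, s in enumerate(locations):
        # Only earlier locations are candidates, which keeps the dependence graph acyclic.
        earlier = sorted(range(i), key=lambda j: math.dist(s, locations[j]))
        sets.append(earlier[:m])
    return sets
```

For n locations the result stores at most n * m indices, so storage grows linearly in n for fixed m. The brute-force search used here is quadratic in n and stands in for a faster neighbor search.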