Software

R and Python packages

EAVA: Deterministic Verbal Autopsy Coding with Expert Algorithm Verbal Autopsy. 2025.
Bibtex
```
@manual{EAVA,
  title = {EAVA: Deterministic Verbal Autopsy Coding with Expert Algorithm Verbal Autopsy},
  author = {},
  year = {2025},
  note = {R package},
  url = {https://cran.r-project.org/web/packages/EAVA/}
}
```
Description

Expert Algorithm Verbal Autopsy assigns causes of death to 2016 WHO Verbal Autopsy Questionnaire data. EAVA uses the presence and absence of signs and symptoms reported in the Verbal Autopsy interview to diagnose common causes of death. A deterministic algorithm assigns a single cause of death to each Verbal Autopsy interview record using a hierarchy of all common causes for neonates or children 1 to 59 months of age.
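The hierarchical assignment described above can be illustrated with a minimal Python sketch. The cause names, symptom keys, and rules below are hypothetical placeholders chosen for illustration; they are not the definitions used by the EAVA package, and the package itself is written in R.

```python
# Hypothetical hierarchy: causes are checked in order and the first rule
# that is satisfied determines the single assigned cause.
# These rules are illustrative only, NOT the EAVA package's definitions.
HIERARCHY = [
    ("injury", lambda s: s.get("injury", False)),
    ("measles", lambda s: s.get("rash", False) and s.get("fever", False)),
    ("diarrhea", lambda s: s.get("loose_stools", False)),
    ("pneumonia", lambda s: s.get("cough", False) and s.get("fast_breathing", False)),
]


def assign_cause(symptoms):
    """Return the first cause in the hierarchy whose rule is satisfied.

    `symptoms` maps a symptom name to True (present) or False (absent);
    missing keys are treated as absent.
    """
    for cause, rule in HIERARCHY:
        if rule(symptoms):
            return cause
    return "unspecified"


record = {"cough": True, "fast_breathing": True, "loose_stools": True}
assigned = assign_cause(record)
```

Because the rules are evaluated in a fixed order, a record that satisfies several rules still receives exactly one cause, namely the one ranked highest in the hierarchy.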
W. Zhan and A. Datta, geospaNN: Neural Networks for geospatial data. 2024.
Bibtex
```
@manual{geospaNN,
  title = {geospaNN: Neural Networks for geospatial data},
  author = {Zhan, Wentao and Datta, Abhirup},
  year = {2024},
  note = {Python package},
  url = {https://pypi.org/project/geospaNN/}
}
```
Description

geospaNN is a package for geospatial analysis using neural networks that explicitly accounts for spatial correlation in the data. The package implements the NN-GLS method of Zhan and Datta (2024, JASA) and is built on PyTorch within the framework of the PyG library. NN-GLS is a geographically informed Graph Neural Network (GNN) for analyzing large and irregular geospatial data that combines multi-layer perceptrons, Gaussian processes, and a generalized least squares (GLS) loss. NN-GLS offers both regression function estimation and spatial prediction, and can scale to sample sizes in the hundreds of thousands.
A. Saha, S. Basu, and A. Datta, RandomForestsGLS: Random Forests for Dependent Data. 2021.
Bibtex
```
@manual{rfgls,
  title = {RandomForestsGLS: Random Forests for Dependent Data},
  author = {Saha, Arkajyoti and Basu, Sumanta and Datta, Abhirup},
  year = {2021},
  note = {R package version 0.1.2},
  url = {https://cran.r-project.org/web/packages/RandomForestsGLS/}
}
```
Description

RandomForestsGLS is a package for fitting non-linear regression models for dependent (spatial and temporal) data with correlated errors. RandomForestsGLS implements the Generalized Least Squares (GLS) based Random Forests (RF-GLS) detailed in Saha, Basu and Datta (2020). For spatial data, RandomForestsGLS combines Random Forests and Gaussian Processes to estimate non-linear functions and predict spatial outcomes. For time-series data, RandomForestsGLS uses the AR (auto-regressive) process covariance structure with Random Forests for estimation.
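To make the idea of a GLS loss under an AR covariance structure concrete, the following Python sketch evaluates the quadratic form r' Q r for residuals r, where Q is the precision matrix of a stationary AR(1) process with unit innovation variance. It only illustrates the loss; it is not the RF-GLS tree-building procedure, and the function name `ar1_gls_loss` is a hypothetical helper rather than part of the RandomForestsGLS API.

```python
def ar1_gls_loss(y, fitted, rho):
    """GLS loss r' Q r for residuals r = y - fitted under a stationary AR(1) model.

    Q is the AR(1) precision matrix with unit innovation variance, so the
    quadratic form reduces to
        (1 - rho^2) * r_1^2 + sum_{t >= 2} (r_t - rho * r_{t-1})^2.
    With rho = 0 this is the ordinary least squares loss.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    if not residuals:
        return 0.0
    loss = (1.0 - rho ** 2) * residuals[0] ** 2
    for previous, current in zip(residuals, residuals[1:]):
        loss += (current - rho * previous) ** 2
    return loss
```

The tridiagonal structure of the AR(1) precision matrix is what lets the loss be computed in a single pass over the series without forming or inverting a covariance matrix.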
J. Fiksel and A. Datta, codalm: Transformation-Free Linear Regression for Compositional Outcomes and Predictors. 2020.
Bibtex
```
@manual{codalm,
  title = {codalm: Transformation-Free Linear Regression for Compositional Outcomes and Predictors},
  author = {Fiksel, Jacob and Datta, Abhirup},
  year = {2020},
  note = {R package version 0.1.0},
  url = {https://cran.r-project.org/web/packages/codalm}
}
```
Description

codalm is an R package for linear modeling of compositional data (coda). It implements a simple transformation-free regression of a compositional outcome on a compositional predictor using an M-estimation method. It provides estimates of the regression-coefficient matrix along with bootstrap-based confidence intervals. A permutation-based test of linear association is also offered.
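As a rough illustration of why no transformation is needed, the Python sketch below assumes the mean of the compositional outcome is modeled as x' B, where the coefficient matrix B has rows that are themselves compositions (non-negative and summing to one). Under that assumption the predicted outcome is automatically a composition. The function name and example values are hypothetical, and the sketch covers prediction only, not the package's estimation method.

```python
def predict_composition(x, B, tol=1e-9):
    """Predicted compositional outcome x' B for a compositional predictor x.

    x : list of D_x non-negative proportions summing to 1.
    B : D_x by D_y matrix (list of rows); each row is non-negative and sums to 1.
    The result is a list of D_y non-negative proportions summing to 1.
    """
    for vec in [x] + list(B):
        if any(v < 0 for v in vec) or abs(sum(vec) - 1.0) > tol:
            raise ValueError("x and every row of B must lie on the simplex")
    n_out = len(B[0])
    return [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(n_out)]


# Hypothetical coefficient matrix: 2 predictor parts, 3 outcome parts.
B_example = [[0.5, 0.5, 0.0],
             [0.1, 0.3, 0.6]]
y_hat = predict_composition([0.2, 0.8], B_example)
```

Each predicted part is a convex combination of the corresponding column entries of B, so the prediction stays on the simplex without a log-ratio or other transformation.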
A. Saha and A. Datta, BRISC: Fast Inference for Large Spatial Datasets using BRISC. 2018.
Bibtex
```
@manual{brisc,
  title = {BRISC: Fast Inference for Large Spatial Datasets using BRISC},
  author = {Saha, Arkajyoti and Datta, Abhirup},
  year = {2018},
  note = {R package version 0.1.0},
  url = {https://CRAN.R-project.org/package=BRISC}
}
```
Description

BRISC is a package for rapid estimation, prediction and inference for large spatial data in a frequentist setup. BRISC estimation and prediction rely on nearest neighbor approximations of the spatial Gaussian Process likelihood, and use a scalable parametric bootstrap to provide inference for all spatial parameters. To our knowledge, BRISC is currently the only R package that provides confidence intervals in a frequentist setup for all parameters, including the spatial variance and range of the Gaussian Process. Inference from BRISC is highly competitive with that obtained from Bayesian approaches relying on MCMC, while being many times faster.
J. Fiksel and A. Datta, calibratedVA: Locally calibrated cause specific mortality fractions using verbal autopsy data. 2018.
Bibtex
```
@manual{calibratedVA,
  title = {calibratedVA: Locally calibrated cause specific mortality fractions using verbal autopsy data},
  author = {Fiksel, Jacob and Datta, Abhirup},
  year = {2018},
  note = {},
  url = {https://github.com/jfiksel/CalibratedVA}
}
```
Description

calibratedVA is a package for local calibration of national and sub-national cause-specific mortality fraction (CSMF) estimates produced by algorithms based on verbal autopsy data. These computer-coded verbal autopsy (CCVA) algorithms usually rely on non-local gold standard training data and can be inaccurate in a local context. calibratedVA uses the output of the CCVA algorithm and a limited amount of local gold standard data to update the CSMF estimates using a fast Bayesian hierarchical model. calibratedVA also has an ensemble calibration option where outputs from multiple CCVA algorithms are used to produce a unified calibrated CSMF estimate. The package can also be used in other general contexts to calibrate any discrete classifier (or a set of classifiers) based on limited local labeled data.
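The calibration idea can be sketched in Python under a simplifying assumption: the classifier's uncalibrated class fractions q relate to the true fractions p through a misclassification matrix M, with q_j = sum_i p_i M[i][j] and M[i][j] the probability that a case of true class i is predicted as class j. The sketch treats M as known (in practice it would be estimated from the local gold standard data) and recovers p with a simple EM-style iteration. This is a point-estimate illustration, not the package's Bayesian hierarchical model, and all names and numbers are hypothetical.

```python
def calibrate_fractions(q, M, n_iter=500):
    """Recover true class fractions p from classifier output fractions q.

    q : uncalibrated fractions over predicted classes (sums to 1).
    M : M[i][j] = P(predicted class j | true class i); each row sums to 1.
    Uses the EM update for a mixture with known components, which keeps
    p non-negative and summing to 1 at every iteration.
    """
    n_true = len(M)
    p = [1.0 / n_true] * n_true  # start from uniform fractions
    for _ in range(n_iter):
        # Predicted-class fractions implied by the current p.
        mix = [sum(p[i] * M[i][j] for i in range(n_true)) for j in range(len(q))]
        p = [
            p[i] * sum(q[j] * M[i][j] / mix[j] for j in range(len(q)) if mix[j] > 0)
            for i in range(n_true)
        ]
    return p


# Hypothetical two-cause example: true fractions (0.9, 0.1) appear as
# (0.75, 0.25) after passing through the misclassification matrix.
M_example = [[0.8, 0.2],
             [0.3, 0.7]]
p_hat = calibrate_fractions([0.75, 0.25], M_example)
```

In this example the uncalibrated output understates the first cause; accounting for the misclassification rates moves the estimate back toward the fractions that generated it.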
A. Finley, A. Datta, and S. Banerjee, spNNGP: Spatial Regression Models for Large Datasets using Nearest Neighbor Gaussian Processes. 2017.
Bibtex
```
@manual{spnngp,
  title = {spNNGP: Spatial Regression Models for Large Datasets using Nearest Neighbor Gaussian Processes},
  author = {Finley, Andrew and Datta, Abhirup and Banerjee, Sudipto},
  year = {2017},
  note = {R package version 0.1.1},
  url = {https://CRAN.R-project.org/package=spNNGP}
}
```
Description

spNNGP is a package for fully Bayesian analysis of massive spatial data. Spatial analysis of point-referenced data is usually computationally expensive, requiring memory and computations that are quadratic and cubic, respectively, in the number of locations where data are observed. spNNGP implements a class of scalable Nearest Neighbor Gaussian Process (NNGP) models that use memory and computations that are linear in the size of the data. spNNGP enables fast, fully Bayesian inference for all parameters and predictions at new locations with proper uncertainty quantification. An MCMC-free hybrid Bayesian conjugate NNGP is also included, which is very fast even for spatial datasets with millions of locations.
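The linear scaling comes from conditioning each location on only a small, fixed number of neighbors. A minimal Python sketch of that neighbor-set construction follows: locations are taken in a given order, and each one is assigned its m nearest neighbors among the locations that precede it. The function name is hypothetical and the brute-force search is for illustration only; it is not how spNNGP performs its neighbor search.

```python
import math


def nearest_earlier_neighbors(coords, m):
    """For each location, return indices of its m nearest preceding locations.

    coords : list of coordinate tuples, already in the chosen ordering.
    m      : maximum neighbor-set size.
    The i-th location has min(i, m) neighbors, so the total number of stored
    neighbor indices grows linearly with the number of locations.
    """
    neighbors = []
    for i, point in enumerate(coords):
        # Brute-force search over earlier locations, closest first.
        earlier = sorted(range(i), key=lambda j: math.dist(point, coords[j]))
        neighbors.append(earlier[:m])
    return neighbors


# Four locations on a line, ordered left to right, with m = 2.
sets = nearest_earlier_neighbors([(0, 0), (1, 0), (2, 0), (3, 0)], m=2)
```

With at most m neighbors per location, the likelihood factors into small conditional terms of fixed size, which is the source of the storage and computation savings relative to a full Gaussian Process.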
