TEXAS

TetraEther indeX for Ammonia oxidizerS — Bayesian proxy system model for TEX86 paleothermometry

TEXAS is a Bayesian proxy system model (PSM) for TEX86-based sea surface temperature (SST) reconstruction. It fits hierarchical generalized-logistic Stan models to isoGDGT Ring Index data — with optional non-thermal corrections for GDGT-2/3 ratio (AOA ecology) and NO₃ (nutrient effect) — and reconstructs paleotemperatures with full posterior uncertainty.

The result is a posterior distribution of temperature for each downcore sample, not just a point estimate with a fixed RMSE.


How it works

TEXAS uses a two-stage workflow:

Stage 1 — Forward calibration fits a hierarchical Bayesian generalized logistic curve to modern culture, mesocosm, and coretop Ring Index–temperature data. The output is a posterior distribution of calibration parameters saved as a .nc file. Pre-computed posteriors are available on Zenodo — most users can skip this stage entirely.

Stage 2 — Inverse reconstruction passes your downcore Scaled RI measurements through the forward posterior, marginalizing over all calibration parameter uncertainty, and returns a full temperature posterior per sample.
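
The marginalization in Stage 2 can be illustrated with a toy grid calculation. This is a minimal sketch under stated assumptions, not the TEXAS implementation: the parameter names, the Richards-type curve parameterization, the Gaussian noise model, and every numeric value below are made up for illustration, and the real model is fit in Stan.

```python
import numpy as np

def gen_logistic(t, lower, upper, growth, midpoint, nu):
    """Generalized (Richards-type) logistic curve: temperature -> proxy."""
    return lower + (upper - lower) / (1.0 + np.exp(-growth * (t - midpoint))) ** (1.0 / nu)

rng = np.random.default_rng(0)
n_draws = 500

# Stand-in "forward posterior": invented parameter draws, NOT TEXAS values.
draws = {
    "lower": rng.normal(0.05, 0.01, n_draws),
    "upper": rng.normal(0.95, 0.01, n_draws),
    "growth": rng.normal(0.15, 0.01, n_draws),
    "midpoint": rng.normal(20.0, 0.5, n_draws),
    "nu": rng.normal(1.0, 0.05, n_draws),
    "sigma": np.abs(rng.normal(0.03, 0.003, n_draws)),
}

T_GRID = np.linspace(-5.0, 45.0, 501)  # candidate temperatures (deg C)

def invert(proxy_obs, prior_mu_t, prior_sigma_t):
    """Grid posterior of temperature, averaged over all calibration draws."""
    # Predicted proxy for every (draw, grid temperature) pair.
    pred = gen_logistic(
        T_GRID[None, :],
        draws["lower"][:, None],
        draws["upper"][:, None],
        draws["growth"][:, None],
        draws["midpoint"][:, None],
        draws["nu"][:, None],
    )
    sigma = draws["sigma"][:, None]
    # Gaussian likelihood of the observation under each draw.
    like = np.exp(-0.5 * ((proxy_obs - pred) / sigma) ** 2) / sigma
    # Average over draws (marginalization), then apply the temperature prior.
    prior = np.exp(-0.5 * ((T_GRID - prior_mu_t) / prior_sigma_t) ** 2)
    post = like.mean(axis=0) * prior
    post /= post.sum()
    cdf = np.cumsum(post)
    return {f"p{q}": float(np.interp(q / 100.0, cdf, T_GRID)) for q in (5, 50, 95)}

summary = invert(proxy_obs=0.5, prior_mu_t=15.0, prior_sigma_t=10.0)
print(summary)
```

Averaging the likelihood over draws before normalizing is what widens the temperature interval relative to a single best-fit curve with a fixed RMSE.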


Quickstart

Install

pip install texas-psm

Or open the interactive notebook in Google Colab — no installation needed:

Open in Colab

For Docker, conda-lock, and development installs see Installation.


Step 1 — Compute Scaled Ring Index

Before prediction you need Scaled Ring Index (RI₀₋₃) values. Pass raw LC/MS peak areas or fractional abundances — the formula normalises by the six-GDGT total, so either works:

import pandas as pd
from TEXAS import compute_scaledRI

df = pd.read_csv("my_gdgt_data.csv")

df["scaledRI_cren3"] = compute_scaledRI(
    df["GDGT-0"], df["GDGT-1"], df["GDGT-2"], df["GDGT-3"],
    df["cren"],   df["cren_prime"],   # cren_rings=3 by default → RI₀₋₃
)

Which Ring Index convention?

The canonical TEXAS posteriors are calibrated against RI₀₋₃ (cren_rings=3, crenarchaeol counted as 3 rings). Passing cren_rings=4 reproduces the RI₀₋₄ convention of Zhang et al. (2016), but the canonical posteriors were not calibrated against that convention, so RI₀₋₄ values should not be passed to them.
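
To make the two conventions concrete, here is a hedged sketch of an abundance-weighted mean ring number. The function ring_index_sketch is hypothetical and is not compute_scaledRI: it omits whatever scaling TEXAS applies (this page does not define it), and it assumes cren_prime takes the same ring weight as cren under both conventions. It mainly shows why raw peak areas and fractional abundances give the same answer once the six-GDGT total is divided out.

```python
import numpy as np

def ring_index_sketch(gdgt0, gdgt1, gdgt2, gdgt3, cren, cren_prime, cren_rings=3):
    """Abundance-weighted mean ring number (illustrative, unscaled)."""
    parts = [np.asarray(x, dtype=float)
             for x in (gdgt0, gdgt1, gdgt2, gdgt3, cren, cren_prime)]
    # Assumption: cren_prime shares the ring weight chosen for cren.
    weights = [0, 1, 2, 3, cren_rings, cren_rings]
    total = sum(parts)  # six-GDGT total used for normalization
    return sum(w * p for w, p in zip(weights, parts)) / total

# Made-up peak areas for one sample, in the order of the arguments above.
areas = [4.0e6, 6.0e5, 5.0e5, 1.0e5, 3.5e6, 1.0e5]
fractions = [a / sum(areas) for a in areas]

ri_from_areas = ring_index_sketch(*areas)
ri_from_fractions = ring_index_sketch(*fractions)
ri_four_rings = ring_index_sketch(*areas, cren_rings=4)
print(ri_from_areas, ri_from_fractions, ri_four_rings)
```

The first two values agree because the normalization cancels any common factor. The third is larger, which is why values computed under one convention cannot be mixed with a calibration built on the other.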


Optional: screen samples outside the calibration domain

Before running the inverse reconstruction, use Mahalanobis distance to flag samples that fall outside the modern coretop calibration domain. The detector is fit on the screened coretop training data (low-G23 subset: gdgt23ratio ≤ 5) using TEX86 and scaledRI_cren3 as features. Paleo samples whose distance exceeds the chi-squared threshold are flagged as outliers. detect_outliers_manual() additionally retains warm end-member samples (high RI and high TEX86) that lie outside the ellipse.

import pandas as pd
import matplotlib.pyplot as plt
import TEXAS
from TEXAS.utils.paths import SPREADSHEETS_DIR
from TEXAS.data import MahalanobisOutlierDetector

# Download training data from Zenodo (~1.8 MB, skipped if already cached)
TEXAS.download_training_data()

# Load combined dataset; keep coretop rows only
combined_df = pd.read_csv(SPREADSHEETS_DIR / 'combined_coretop_culture_mesocosm_rev20260210.csv')
coretop_df = combined_df[combined_df['datatype'] == 'coretop']

# Fit on low-G23 coretops (gdgt23ratio ≤ 5 excludes ecology-dominated samples)
detector = MahalanobisOutlierDetector(['TEX86', 'scaledRI_cren3'], confidence=0.9)
detector.fit(coretop_df[coretop_df['gdgt23ratio'] <= 5])
print(f"Fitted on {int((coretop_df['gdgt23ratio'] <= 5).sum())} coretop samples (gdgt23ratio ≤ 5)")
print(f"Mahalanobis threshold (90% CI): {detector.threshold:.3f}")

# Apply to your downcore data — requires TEX86 and scaledRI_cren3 columns
df['TEXRI_cren3_mahalDist_low23ratio_outliers_manual'] = detector.detect_outliers_manual(df)
n_out = int(df['TEXRI_cren3_mahalDist_low23ratio_outliers_manual'].sum())
print(f"Screened out: {n_out} / {len(df)} samples")

# Visualise — 90% confidence ellipse with inliers/outliers colour-coded
fig, ax = plt.subplots(figsize=(5, 4))
detector.plot_decision_boundary(df, ax=ax)
ax.set_xlabel("TEX$_{86}$")
ax.set_ylabel(r"Scaled RI$_{0-3}$")
ax.set_title("Mahalanobis screening (90% CI)")
plt.tight_layout()
plt.show()

# Keep only inliers
df_screened = df[df['TEXRI_cren3_mahalDist_low23ratio_outliers_manual'] == False].reset_index(drop=True)
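
For readers who want to see the underlying test, the following is a minimal NumPy sketch of Mahalanobis screening against a chi-squared cutoff. It is not the MahalanobisOutlierDetector source: the helper names and the synthetic reference data are invented, and it does not reproduce the warm end-member exception. It also works with squared distances, whereas the scale on which detector.threshold is reported is not stated here.

```python
import numpy as np

def fit_reference(X):
    """Mean vector and inverse covariance of the reference (training) cloud."""
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return mean, cov_inv

def squared_mahalanobis(X, mean, cov_inv):
    """Squared Mahalanobis distance of each row of X from the reference mean."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

rng = np.random.default_rng(1)
# Synthetic stand-in for (TEX86, scaled RI) training pairs.
reference = rng.multivariate_normal(
    mean=[0.6, 0.5], cov=[[0.010, 0.008], [0.008, 0.010]], size=400
)
mean, cov_inv = fit_reference(reference)

confidence = 0.90
# With two features, squared distances of Gaussian data follow a chi-squared
# distribution with 2 degrees of freedom, whose quantile is -2 * ln(1 - p).
threshold_sq = -2.0 * np.log(1.0 - confidence)

new_samples = np.array([[0.62, 0.52],   # close to the reference cloud
                        [0.90, 0.20]])  # far off the TEX86-RI trend
flagged = squared_mahalanobis(new_samples, mean, cov_inv) > threshold_sq
print(threshold_sq, flagged)
```

Because the covariance captures the correlation between the two features, a sample can be flagged for breaking the TEX86-RI relationship even when each value alone looks unremarkable.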

Step 2 — Download a forward posterior

Pre-computed posteriors are hosted on Zenodo. Download only what you need:

import TEXAS

# Univariate SST — recommended starting point (~0.3 MB)
TEXAS.download_posteriors(["gen_logi_fixed_hier_crtp_univ_priorApprox_SST_scaledRI_cren3"])

# Multivariate EIV (GDGT-2/3 + NO₃ corrections) — ~78 MB each
TEXAS.download_posteriors([
    "gen_logi_fixed_hier_crtp_multiv_priorApprox_eiv_SST_gdgt23ratio_no3_1.0_scaledRI_cren3",
])

# Or download everything at once (~158 MB total)
TEXAS.download_all()

# Check what is already cached
TEXAS.list_posteriors()

Available forward posteriors:

| Name (no .nc) | Model | Temperature | Size |
| --- | --- | --- | --- |
| gen_logi_fixed_culmeso_cultureT_scaledRI_cren3 | Culture + mesocosm | Culture T | <1 MB |
| gen_logi_fixed_hier_crtp_univ_priorApprox_SST_scaledRI_cren3 | Univariate coretop | SST | <1 MB |
| gen_logi_fixed_hier_crtp_univ_priorApprox_thermoT_scaledRI_cren3 | Univariate coretop | Thermo T | <1 MB |
| gen_logi_fixed_hier_crtp_multiv_priorApprox_eiv_SST_gdgt23ratio_no3_1.0_scaledRI_cren3 | EIV multivariate coretop | SST | ~78 MB |
| gen_logi_fixed_hier_crtp_multiv_priorApprox_eiv_thermoT_gdgt23ratio_no3_1.0_scaledRI_cren3 | EIV multivariate coretop | Thermo T | ~78 MB |

Step 3 — Forward prediction (temperature → proxy)

Useful for plotting the calibration curve and its uncertainty envelope:

import numpy as np
from TEXAS import predict_proxy_from_T

result = predict_proxy_from_T(
    temperatures=np.linspace(5, 35, 100),
    posterior="gen_logi_fixed_hier_crtp_univ_priorApprox_SST_scaledRI_cren3",
)
result["p50"]   # median Scaled RI (numpy array, length 100)
result["p5"]    # 5th percentile
result["p95"]   # 95th percentile
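
The p5/p50/p95 arrays can be thought of as pointwise percentiles across many calibration curves. The sketch below shows that idea with invented parameter draws and a hypothetical curve parameterization. It covers parameter uncertainty only; whether the TEXAS envelope also includes residual proxy noise is not stated on this page.

```python
import numpy as np

def gen_logistic(t, lower, upper, growth, midpoint, nu):
    """Generalized (Richards-type) logistic curve: temperature -> proxy."""
    return lower + (upper - lower) / (1.0 + np.exp(-growth * (t - midpoint))) ** (1.0 / nu)

rng = np.random.default_rng(2)
n_draws = 1000
temperatures = np.linspace(5, 35, 100)

# Invented parameter draws standing in for a forward posterior.
lower = rng.normal(0.05, 0.01, n_draws)
upper = rng.normal(0.95, 0.01, n_draws)
growth = rng.normal(0.15, 0.01, n_draws)
midpoint = rng.normal(20.0, 0.5, n_draws)
nu = rng.normal(1.0, 0.05, n_draws)

# One curve per draw: shape (n_draws, n_temperatures).
curves = gen_logistic(
    temperatures[None, :],
    lower[:, None], upper[:, None], growth[:, None],
    midpoint[:, None], nu[:, None],
)

# Pointwise percentiles across draws give the median curve and its envelope.
p5, p50, p95 = np.percentile(curves, [5, 50, 95], axis=0)
envelope = {"p5": p5, "p50": p50, "p95": p95}
print(envelope["p50"][:3])
```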

Step 4 — Inverse reconstruction (proxy → temperature)

from TEXAS import predict_T_from_proxyObs

result = predict_T_from_proxyObs(
    proxyObs=df["scaledRI_cren3"].values,
    prior_mu_t=15.0,        # prior mean temperature (°C) — geological estimate
    prior_sigma_t=10.0,     # prior uncertainty (°C) — use wide prior if unsure
    fwd_posterior="gen_logi_fixed_hier_crtp_univ_priorApprox_SST_scaledRI_cren3",
    temptype="SST",
)
result["p50"]   # median SST (°C), one value per sample
result["p5"]    # 5th percentile
result["p95"]   # 95th percentile

Multivariate EIV posterior (GDGT-2/3 + NO₃ corrections): pass gdgt23ratio and no3 as well:

result = predict_T_from_proxyObs(
    proxyObs=df["scaledRI_cren3"].values,
    prior_mu_t=15.0,
    prior_sigma_t=10.0,
    fwd_posterior="gen_logi_fixed_hier_crtp_multiv_priorApprox_eiv_SST_gdgt23ratio_no3_1.0_scaledRI_cren3",
    temptype="SST",
    gdgt23ratio=df["gdgt23ratio"].values,
    no3=df["no3"].values,   # µmol/L; scalar or per-sample array
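)

Alternatively, pass modern site coordinates and a WOA23-derived dataset, and NO₃ is looked up at that location:

import xarray as xr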

ocean_ds = xr.load_dataset("ocean_prop_ds.nc")  # WOA23-derived, from SI_code1

result = predict_T_from_proxyObs(
    proxyObs=df["scaledRI_cren3"].values,
    prior_mu_t=15.0, prior_sigma_t=10.0,
    fwd_posterior="gen_logi_fixed_hier_crtp_multiv_priorApprox_eiv_SST_gdgt23ratio_no3_1.0_scaledRI_cren3",
    temptype="SST",
    gdgt23ratio=df["gdgt23ratio"].values,
    site_lat=15.3, site_lon=-23.7,   # modern drill-site coordinates
    no3_dataset=ocean_ds,
)
# Prints: WOA23 NO₃ lookup: lat=15.3, lon=-23.7 → 0.42 µmol/L

If you have a posterior .nc file locally or on Google Drive, load it with xarray and pass the resulting Dataset as fwd_posterior. This skips the cache lookup and the download:

import xarray as xr
from TEXAS import predict_T_from_proxyObs

# Colab: mount Google Drive first, then load
ds = xr.load_dataset("/content/drive/MyDrive/posteriors/gen_logi_fixed_hier_crtp_univ_priorApprox_SST_scaledRI_cren3.nc")

result = predict_T_from_proxyObs(
    proxyObs=df["scaledRI_cren3"].values,
    prior_mu_t=15.0, prior_sigma_t=10.0,
    fwd_posterior=ds,    # xr.Dataset — skips all file I/O
    temptype="SST",
)
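
Since each call above returns per-sample p5, p50, and p95 arrays, a small table is often the most convenient form for plotting or export. The sketch below uses a stand-in result dict with invented numbers and a hypothetical depth_m column; the real dict may contain keys beyond the three shown on this page.

```python
import numpy as np
import pandas as pd

# Stand-in for the dict returned by predict_T_from_proxyObs (invented values).
result = {
    "p5": np.array([18.2, 21.0, 24.9]),
    "p50": np.array([20.1, 23.4, 27.0]),
    "p95": np.array([22.3, 25.9, 29.4]),
}
depth_m = np.array([1.5, 3.0, 4.5])  # hypothetical sample depths

summary = pd.DataFrame({
    "depth_m": depth_m,
    "sst_p5": result["p5"],
    "sst_p50": result["p50"],
    "sst_p95": result["p95"],
})
# Width of the central 90% interval, a per-sample measure of uncertainty.
summary["ci90_width"] = summary["sst_p95"] - summary["sst_p5"]
print(summary)
```

Reporting the interval width alongside the median keeps the per-sample uncertainty visible instead of collapsing it into a single calibration error.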

Saving results

By default predict_T_from_proxyObs returns a dict in memory and writes nothing to disk. Pass save_results=True to persist:

result = predict_T_from_proxyObs(
    ...,
    save_results=True,            # writes quantile .nc + .npz
    save_draws=True,              # also saves raw MCMC draws as _draws.nc
    cache_dir="/your/output/",    # default: ~/.texas/cache/TEXAS_invT_posterior_cache/
)

Running forward calibration from scratch

Only needed if you want to re-fit the model to your own data or reproduce the published calibration. Requires CmdStan and the GDGT training database (TEXAS.download_training_data()).

from TEXAS import build_fwd_data, get_posterior, save_posterior

data = build_fwd_data(
    t_cul=cul_df["SST"].values,       proxy_cul=cul_df["scaledRI"].values,
    t_meso=meso_df["SST"].values,     proxy_meso=meso_df["scaledRI"].values,
    t_crtp=crtp_df["SST"].values,     proxy_crtp=crtp_df["scaledRI"].values,
    gdgt23ratio_crtp=crtp_df["gdgt23ratio"].values,
    no3_crtp=crtp_df["no3"].values,   # no3_cutoff auto-calculated via Spearman if omitted
)

posterior, diagnostics = get_posterior(
    data,
    stan_file="gen_logi_fixed_hier_crtp_multiv_priorApprox_eiv",
    temptype="SST",
    proxy_name="scaledRI_cren3",
)
save_posterior(posterior)

Citation

If you use TEXAS in published work, please cite:

Rattanasriampaipong, R. et al. (in prep). TEXAS: A proxy system model for TEX86 paleothermometry. AGU Paleoceanography and Paleoclimatology.

See CITATION.cff for machine-readable metadata.