Tutorials overview
Welcome to the BELLA tutorials ✅. This section is split into:
- A general overview of BEAST and BELLA.
- One tutorial per example in
examples/.
If you are new to BEAST, start here before jumping into the example pages.
BEAST in one minute
BEAST (Bayesian Evolutionary Analysis by Sampling Trees) runs MCMC to estimate phylogenies and model parameters. A BEAST analysis is defined by an XML configuration file that declares:
- data (trees, alignments, predictors)
- model components (priors, likelihoods, parameterizations)
- MCMC settings (operators, logging, chain length)
Think of the XML as a model graph: nodes compute values, and BEAST samples the parameters that feed those nodes.
Where BELLA fits
BELLA provides a Bayesian neural network component that lives inside the BEAST model graph. In practice:
- You declare predictors (trait vectors or time series).
- BELLA (
bella.BayesMLP) maps predictors -> rates. - Those rates feed a phylodynamic model (birth-death, migration, etc.).
This lets BEAST estimate network weights and evolutionary parameters jointly.
BELLA behaves like any other BEAST component:
- it is a CalculationNode (recomputes when its inputs change),
- a Function (outputs a vector of values),
- and a Loggable (it can write to BEAST logs).
How a BELLA XML is structured
Below is the mental model for a BELLA-driven BEAST analysis:
Data + predictors + priors
v
BEAST XML
v
MCMC
v
Posterior logs
1) XML skeleton
Every BEAST XML starts with a <beast> tag and a namespace that includes BELLA and any extra packages you need.
2) Data inputs
You typically provide:
- a tree (often Newick)
- one or more predictor vectors (CSV or direct values)
3) State nodes (parameters to estimate)
In BEAST, state nodes are parameters sampled by MCMC. For BELLA, these are usually the network weights. Other parameters (like sampling rates, clock rates, or population sizes) can also be sampled.
Example:
<state id="state">
<plate var="n" range="$(layersRange)">
<stateNode spec="RealParameter" id="birthRateW$(n)" value="0"/>
<stateNode spec="RealParameter" id="deathRateW$(n)" value="0"/>
</plate>
<stateNode spec="RealParameter" id="samplingRate" value="$(samplingRateInit)"/>
</state>
How to think about it:
layersRangeexpands one state node per layer connection.- BELLA expects one weight parameter per connection (hidden layers + 1).
- The weight parameters are flattened vectors; BELLA reshapes them internally.
4) BELLA wiring (predictors -> rates)
A BELLA MLP is defined in XML like this:
<skylineValues id="birthRate" spec="bella.BayesMLP" nodes="$(nodes)">
<predictor spec="RealParameterFromXSV" fileName="$(birthRatePredictorFile)"/>
<plate var="n" range="$(layersRange)">
<weights idref="birthRateW$(n)"/>
</plate>
<outputActivation spec="bella.activations.Sigmoid" lower="$(birthRateLower)" upper="$(birthRateUpper)"/>
</skylineValues>
Key points:
nodesdefines hidden layers only. BELLA adds input and output layers automatically.- The predictor list becomes a matrix. Each row is an observation (time bin), each column is a predictor.
- Use
outputActivationto constrain the output range (Sigmoid is a common choice). - If
normalize=true(default), BELLA min-max normalizes predictor vectors to[0, 1].
5) Time bins and skyline vectors
BELLA often predicts skyline vectors, where each element is a rate for a time bin.
If your XML contains:
changeTimes = "t1 t2 ..."
then the number of skyline values is:
length(changeTimes) + 1
So every predictor vector must have that exact length.
6) Activation functions (hidden + output layers)
BELLA supports multiple activation functions. You configure them in XML with hiddenActivation and outputActivation:
<skylineValues id="birthRate" spec="bella.BayesMLP" nodes="$(nodes)">
<predictor spec="RealParameterFromXSV" fileName="$(birthRatePredictorFile)"/>
<hiddenActivation spec="bella.activations.ReLU"/>
<outputActivation spec="bella.activations.Sigmoid" lower="$(birthRateLower)" upper="$(birthRateUpper)"/>
...
</skylineValues>
Available activations:
- ReLU:
max(0, z)(default for hidden layers) - Tanh:
tanh(z) - SoftPlus:
log(1 + exp(z))with a stable implementation - Identity:
z(no nonlinearity) - Sigmoid: bounded logistic with parameters:
lowerandupperto clamp the output rangeshape(steepness) andmidpoint(center)
Practical guidance:
- Use Sigmoid for outputs that must stay within bounds (rates, probabilities).
- Use ReLU or Tanh for hidden layers to allow nonlinear mappings.
- If you want linear outputs, set
outputActivationto Identity.
7) Likelihood and priors
In BEAST, the posterior is a product of likelihood and prior terms:
<distribution id="posterior" spec="CompoundDistribution">
<distribution id="likelihood" spec="CompoundDistribution">
<!-- model likelihood here -->
</distribution>
<distribution id="prior" spec="CompoundDistribution">
<!-- priors here -->
</distribution>
</distribution>
BELLA weights are just parameters, so you place priors on them like any other RealParameter:
<Normal id="weightsPrior" mean="0" sigma="1"/>
<plate var="n" range="$(layersRange)">
<distribution spec="Prior" x="@birthRateW$(n)" distr="@weightsPrior"/>
<distribution spec="Prior" x="@deathRateW$(n)" distr="@weightsPrior"/>
</plate>
8) Operators (how MCMC moves)
Operators propose new parameter values so MCMC can explore the posterior. BELLA examples use Bactrian random walks on weights:
<plate var="n" range="$(layersRange)">
<operator spec="BactrianRandomWalkOperator" parameter="@birthRateW$(n)" weight="30.0"/>
<operator spec="BactrianRandomWalkOperator" parameter="@deathRateW$(n)" weight="30.0"/>
</plate>
Use higher weights for parameters that need more frequent updates. Tune operator weights and step sizes if mixing is poor.
9) Logging (including BELLA internals)
Logging controls what you see in output files. You can log:
- posterior, likelihood, prior
- skyline rates
- BELLA outputs
- BELLA network weights
Example logger:
<logger spec="Logger" fileName="FBD.log" logEvery="$(logEvery)" model="@posterior">
<log idref="posterior"/>
<log idref="prior"/>
<log idref="likelihood"/>
<log idref="birthRateSP"/>
<log idref="deathRateSP"/>
<log idref="birthRate"/>
<log idref="deathRate"/>
</logger>
How BELLA logs its weights:
bella.BayesMLPimplements Loggable.- On init, it prints headers like
W.LayerX[i][j]for every weight matrix entry. Xis the layer index (1-based).iis the input neuron index (including the bias row as index 0).jis the output neuron index.- During logging, it prints every weight value in order.
So if you add <log idref="birthRate"/> and <log idref="deathRate"/>, you will get the full network weights per MCMC sample in your log file.
10) JSON + -DF substitutions
BELLA examples use a data.json file that fills placeholders like $(treeFile) and $(nodes) at runtime.
Run with:
beast -DF path/to/data.json path/to/config.xml
This is the easiest way to tweak a BELLA configuration without editing the XML directly.
Next, choose a specific tutorial from the sidebar for a full, runnable example 🚀.