Overview
A Bayesian network is a probabilistic graphical model that encodes a joint probability distribution over a set of random variables using a directed acyclic graph (DAG). Each node represents a variable and each directed edge expresses a direct probabilistic dependency: the distribution of a child node is specified conditionally on its parent nodes. The global joint distribution factorizes according to the graph structure into a product of local conditional distributions, so the representation stays compact whenever each node has only a few parents. Bayesian networks are used for reasoning under uncertainty, diagnosis, prediction, and decision support.
Structure and components
The principal components of a Bayesian network are the graph topology and the local probability models. The topology is a DAG whose vertices correspond to random variables and whose directed edges indicate direct influences. Numerically, discrete variables are typically specified by conditional probability tables (CPTs) that give P(child | parents); continuous variables are represented by conditional densities, often Gaussian or other parametric families. The factorization implied by the DAG can be written informally as a product: the joint probability equals the product over all variables of the probability of each variable given its parents. Conditional independence relations implied by the graph can be tested using the criterion known as d-separation, which helps to simplify inference and to identify which variables shield others from influence.
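As an illustration of the factorization, the following minimal sketch builds a joint distribution from CPTs for a hypothetical three-variable network (Rain -> Sprinkler, Rain -> WetGrass, Sprinkler -> WetGrass). The variable names and all probabilities are assumptions chosen for the example, not values from any dataset.

```python
from itertools import product

# Hypothetical CPTs; every number here is made up for illustration.
P_RAIN = {True: 0.2, False: 0.8}                      # P(R)
P_SPRINKLER = {True: {True: 0.01, False: 0.99},       # P(S | R=True)
               False: {True: 0.4, False: 0.6}}        # P(S | R=False)
P_WET = {(True, True): 0.99, (True, False): 0.9,      # P(W=True | S, R),
         (False, True): 0.8, (False, False): 0.0}     # keyed by (S, R)


def joint(rain, sprinkler, wet):
    """P(R, S, W) as the product of each variable given its parents."""
    p_wet_true = P_WET[(sprinkler, rain)]
    p_wet = p_wet_true if wet else 1.0 - p_wet_true
    return P_RAIN[rain] * P_SPRINKLER[rain][sprinkler] * p_wet


# Sanity check: the eight joint entries sum to one.
total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))
print(f"sum over all assignments = {total:.6f}")
```

The three tables hold 1 + 2 + 4 = 7 free parameters, the same as a full joint table over three binary variables, because this small graph is fully connected; savings appear as soon as some edges are absent.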
Inference
Common inference tasks include computing marginal probabilities, finding the most probable explanation given observations, and answering conditional queries (belief updating). Exact algorithms include variable elimination and junction tree (clique tree) methods; these exploit graph structure, but their cost grows exponentially with the treewidth of the graph, so they can become infeasible for large or densely connected networks. Approximate algorithms include Monte Carlo sampling (e.g., Markov chain Monte Carlo), importance sampling, loopy belief propagation on factor graphs (belief propagation is exact only on tree-structured graphs), and variational approximations. Practical use often balances model expressiveness with tractable inference.
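The contrast between exact and approximate inference can be sketched on a small example. The code below answers the query P(Rain | WetGrass = true) in two ways: by brute-force enumeration of the joint distribution (the simplest exact method, not variable elimination) and by rejection sampling (a basic Monte Carlo method, not MCMC). The network, its variable names, and its probabilities are hypothetical.

```python
import random
from itertools import product

# Hypothetical network: Rain -> Sprinkler, Rain -> Wet, Sprinkler -> Wet.
P_RAIN = 0.2                                          # P(R=True)
P_SPRINKLER = {True: 0.01, False: 0.4}                # P(S=True | R)
P_WET = {(True, True): 0.99, (True, False): 0.9,      # P(W=True | S, R),
         (False, True): 0.8, (False, False): 0.0}     # keyed by (S, R)


def bernoulli(p_true, value):
    """Probability that a binary variable with P(True)=p_true takes `value`."""
    return p_true if value else 1.0 - p_true


def joint(rain, sprinkler, wet):
    return (bernoulli(P_RAIN, rain)
            * bernoulli(P_SPRINKLER[rain], sprinkler)
            * bernoulli(P_WET[(sprinkler, rain)], wet))


def exact_rain_given_wet():
    """Exact answer: sum out Sprinkler, then normalize over Rain."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den


def rejection_rain_given_wet(n_samples, seed=0):
    """Approximate answer: sample parents before children in topological
    order and keep only samples that agree with the evidence Wet=True."""
    rng = random.Random(seed)
    kept = rainy = 0
    for _ in range(n_samples):
        rain = rng.random() < P_RAIN
        sprinkler = rng.random() < P_SPRINKLER[rain]
        wet = rng.random() < P_WET[(sprinkler, rain)]
        if wet:
            kept += 1
            rainy += rain
    return rainy / kept


exact = exact_rain_given_wet()
approx = rejection_rain_given_wet(100_000)
print(f"exact={exact:.4f}  rejection sampling={approx:.4f}")
```

Enumeration touches every joint assignment and therefore scales exponentially with the number of variables, while rejection sampling wastes every sample that contradicts the evidence; both limitations motivate the more refined algorithms named above.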
Learning
Learning a Bayesian network from data involves two related problems: parameter learning and structure learning. Parameter learning fits the local conditional distributions given a fixed graph and can be done by maximum likelihood estimation or by Bayesian methods that place priors on parameters. Structure learning discovers the DAG itself from data: score-based approaches search for graphs that optimize a quality score (such as BIC or Bayesian scores), while constraint-based approaches infer conditional independencies and assemble a graph consistent with those constraints (for example, the PC algorithm). Hybrid methods combine both ideas. When data are incomplete or missing, expectation-maximization and fully Bayesian treatments are commonly used.
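For discrete variables and complete data, parameter learning reduces to counting. The sketch below estimates one CPT from a list of records; the function name `estimate_cpt`, the record layout, and the toy data are assumptions made for illustration. Setting `alpha` to zero gives the maximum likelihood estimate, while a positive `alpha` adds symmetric Dirichlet pseudo-counts and returns the posterior mean.

```python
from collections import Counter


def estimate_cpt(records, child, parents, states, alpha=0.0):
    """Estimate P(child | parents) from complete data.

    records: list of dicts mapping variable name -> observed value.
    states:  the possible values of the child variable.
    alpha:   pseudo-count added to every cell (0 gives plain MLE).
    Only parent configurations that occur in the data get an entry.
    """
    cell_counts = Counter()
    parent_counts = Counter()
    for row in records:
        config = tuple(row[p] for p in parents)
        cell_counts[(config, row[child])] += 1
        parent_counts[config] += 1

    cpt = {}
    for config, n in parent_counts.items():
        denom = n + alpha * len(states)
        cpt[config] = {s: (cell_counts[(config, s)] + alpha) / denom
                       for s in states}
    return cpt


# Toy data, made up for the example.
data = ([{"rain": True, "wet": True}] * 3
        + [{"rain": True, "wet": False}]
        + [{"rain": False, "wet": False}] * 4)

mle = estimate_cpt(data, "wet", ["rain"], [True, False])
smoothed = estimate_cpt(data, "wet", ["rain"], [True, False], alpha=1.0)
print(mle[(True,)][True], smoothed[(False,)][True])
```

The smoothed estimate avoids assigning probability zero to the unobserved event of wet grass without rain, which is one practical reason for preferring Bayesian estimates when data are sparse.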
Variants and extensions
Several extensions address different modeling needs. Dynamic Bayesian networks (DBNs) represent temporal processes by linking variables across time slices. Hybrid networks mix discrete and continuous variables. Influence diagrams extend Bayesian networks by adding decision and utility nodes for decision analysis. Undirected graphical models (Markov networks) provide an alternative representation that is better suited to problems whose interactions have no natural direction, while factor graphs and conditional random fields are related representations that emphasize factorization and conditional modeling, respectively.
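The simplest dynamic Bayesian network has one hidden state variable and one observation per time slice, which is the structure of a hidden Markov model. The sketch below performs one step of forward filtering in such a model: it pushes the current belief through the transition model and then conditions on the new observation. The state names and all probabilities are illustrative assumptions.

```python
def filter_step(belief, transition, emission, observation):
    """One forward-filtering update for a two-slice model.

    belief:     dict state -> P(state_t | observations up to t)
    transition: dict state -> dict next_state -> P(next_state | state)
    emission:   dict state -> dict observation -> P(observation | state)
    Returns P(state_{t+1} | observations up to t+1).
    """
    states = list(belief)
    # Prediction: marginalize out the previous state.
    predicted = {s2: sum(belief[s1] * transition[s1][s2] for s1 in states)
                 for s2 in states}
    # Update: weight by the likelihood of the observation, then normalize.
    unnormalized = {s: predicted[s] * emission[s][observation] for s in states}
    z = sum(unnormalized.values())
    return {s: v / z for s, v in unnormalized.items()}


# Hypothetical parameters: hidden weather, observed umbrella use.
transition = {"rain": {"rain": 0.7, "dry": 0.3},
              "dry": {"rain": 0.3, "dry": 0.7}}
emission = {"rain": {"umbrella": 0.9, "none": 0.1},
            "dry": {"umbrella": 0.2, "none": 0.8}}

belief0 = {"rain": 0.5, "dry": 0.5}
belief1 = filter_step(belief0, transition, emission, "umbrella")
belief2 = filter_step(belief1, transition, emission, "umbrella")
print(round(belief1["rain"], 3), round(belief2["rain"], 3))
```

Because the same transition and emission tables are reused at every step, the model is specified once and unrolled over as many time slices as there are observations.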
Causality and interpretation
Although a Bayesian network encodes statistical dependencies, edges do not automatically imply causal effects unless the model is built with causal assumptions. When a network is interpreted causally, it can be used to reason about interventions and counterfactuals; formal tools for this purpose include the do-calculus and structural causal models. Causal modeling typically requires domain knowledge or experimental data to justify the direction of each edge.
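The difference between observing a variable and intervening on it can be shown numerically. The sketch below assumes, purely for illustration, that the hypothetical graph Rain -> Sprinkler, Rain -> WetGrass, Sprinkler -> WetGrass is causal. Observing the sprinkler on changes the belief about rain, whereas forcing it on removes the factor P(Sprinkler | Rain) from the factorization (the truncated factorization) and leaves the prior on rain untouched.

```python
# Hypothetical parameters; the causal reading of the graph is an assumption.
P_RAIN = {True: 0.2, False: 0.8}              # P(R)
P_SPRINKLER_ON = {True: 0.01, False: 0.4}     # P(S=True | R)
P_WET_IF_ON = {True: 0.99, False: 0.9}        # P(W=True | S=True, R)


def wet_given_sprinkler_observed():
    """P(W=True | S=True): observing S=True updates the belief about Rain."""
    weights = {r: P_RAIN[r] * P_SPRINKLER_ON[r] for r in (True, False)}
    z = sum(weights.values())
    return sum(weights[r] / z * P_WET_IF_ON[r] for r in (True, False))


def wet_given_sprinkler_forced():
    """P(W=True | do(S=True)): drop the factor P(S | R), so Rain keeps
    its prior distribution."""
    return sum(P_RAIN[r] * P_WET_IF_ON[r] for r in (True, False))


observed = wet_given_sprinkler_observed()
forced = wet_given_sprinkler_forced()
print(f"observed={observed:.4f}  forced={forced:.4f}")
```

The two quantities differ because, in this example, seeing the sprinkler on is evidence against rain, while switching it on by hand carries no such evidence.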
Applications
- Medical diagnosis and clinical decision support, combining symptoms, test results and disease models.
- Fault diagnosis, reliability analysis and prognostics in engineering systems.
- Machine learning tasks such as classification and information extraction, including use as components of automated pipelines.
- Signal and image processing, speech recognition, sensor fusion and pattern recognition.
- Information retrieval and document analysis.
- Bioinformatics and genetics, where networks express regulatory or probabilistic relationships between biological variables.
Practical considerations
Designing and deploying Bayesian networks requires choices about variable granularity, model structure, and the tradeoff between expressive power and computational tractability. Large networks may be simplified by exploiting conditional independencies, modular design, or approximate inference. Software tools and libraries are available to build, learn, and perform inference with Bayesian networks; practitioners often combine theoretical understanding with empirical model validation.
For background on the graphical representation, see introductory material on graph theory; for algorithmic methods, consult material on probabilistic inference; and for the formal probabilistic foundations, see treatments of random variables and conditional probability. The approach rests on Bayes' theorem, and the models are widely taught in machine learning curricula and used in applied research.
