---
path: /guides/glm-insurance-pricing-python
title: "GLM insurance pricing in Python"
description: "Build a frequency-severity GLM for insurance pricing in Python: Poisson frequency with a log-exposure offset, Gamma severity, factor tables, and A/E validation."
section: Resources
priority: 0.6
changefreq: monthly
source_file: pages/marketing/seo/articleData.ts
---

# GLM insurance pricing in Python

Most insurers price with GLMs, many of them built in Emblem or R. Doing the same work properly in Python means handling exposure, overdispersion, and validation the way an actuary would, not the way a generic ML tutorial does. Here is the rigorous version.

## Frequency-severity, not a single model

Standard practice fits two GLMs. The frequency model is Poisson: the outcome is claim count and the offset is log(exposure). The severity model is Gamma: the outcome is average cost per claim, fitted only on records with at least one claim and weighted by claim count. Pure premium is expected frequency times expected severity. A single Tweedie model is an alternative, but you must estimate the power parameter (between 1 and 2 for a compound Poisson-Gamma) rather than defaulting to 1.5.

## The non-negotiables

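- Exposure as an offset (offset = log(exposure)), not a feature or a sample weight on raw counts. The offset fixes the exposure coefficient at 1, so the model estimates a claim rate per unit of exposure rather than a count per policy.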
- Quasi-Poisson (overdispersion-corrected) standard errors for inference; when claim counts are overdispersed, plain Poisson understates uncertainty.
- Actual/Expected validation by factor band, not just global deviance: a good global fit can hide a badly fitted factor.
- Export factor tables as multiplicative relativities compatible with your rating engine.
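The A/E check by factor band is easy to sketch with pandas. The helper name `actual_vs_expected`, the column names, and the toy holdout rows below are all hypothetical; `expected_claims` is assumed to be the model's predicted frequency multiplied by exposure on held-out data.

```python
import pandas as pd


def actual_vs_expected(df, band_col, actual_col, expected_col):
    """Sum actual and expected claims per factor band and compute A/E."""
    out = df.groupby(band_col)[[actual_col, expected_col]].sum()
    out["a_over_e"] = out[actual_col] / out[expected_col]
    return out


# Hypothetical holdout rows (illustrative numbers only).
holdout = pd.DataFrame(
    {
        "age_band": ["17-25", "17-25", "26-40", "26-40", "41+", "41+"],
        "claims": [3, 1, 2, 2, 1, 0],
        "expected_claims": [1.5, 1.5, 2.5, 1.5, 1.0, 1.0],
    }
)

ae_by_band = actual_vs_expected(holdout, "age_band", "claims", "expected_claims")
global_ae = holdout["claims"].sum() / holdout["expected_claims"].sum()

print(ae_by_band)
print(f"global A/E: {global_ae:.3f}")
```

In this toy data the global A/E is exactly 1.0, yet the `17-25` band runs at about 1.33 and the `41+` band at 0.5, which is the kind of offsetting misfit a global statistic cannot show.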

## Tooling

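glum and statsmodels both handle exposure offsets and the GLM families correctly. scikit-learn's GLM estimators (such as `PoissonRegressor`) take sample weights but no offset argument and report no standard errors, which makes them a poor fit for actuarial inference. Use temporal cross-validation rather than random k-fold: insurance data has a time structure, and random folds let the model train on periods later than the ones it is tested on.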
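One simple form of temporal cross-validation is an expanding window by policy year: train on every year before the test year, then roll forward. The sketch below uses only the standard library; the function name `expanding_year_splits` and the `min_train_years` parameter are hypothetical, and it assumes each row carries a single policy or accident year.

```python
from typing import Iterator, List, Sequence, Tuple


def expanding_year_splits(
    years: Sequence[int], min_train_years: int = 2
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; training rows always precede the test year."""
    unique_years = sorted(set(years))
    for k in range(min_train_years, len(unique_years)):
        test_year = unique_years[k]
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        yield train_idx, test_idx


# Hypothetical policy years for eight rows.
policy_years = [2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022]
splits = list(expanding_year_splits(policy_years, min_train_years=2))

for train_idx, test_idx in splits:
    print("train rows:", train_idx, "test rows:", test_idx)
```

Each fold's index lists can be used to slice the design matrix, claim counts, and exposure together, so the offset stays aligned with the rows it belongs to.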
