Difference between revisions of "Overview"

 
(40 intermediate revisions by 4 users not shown)
Line 1: Line 1:
Modeling a biological or physical phenomenon is generally motivated by observing that phenomenon, that is, by the fact that data generated by it can be collected. What could be more natural, then, than to begin by looking at some data?
+
<div style="color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding:top:1">[[Image:attention4.jpg|45px|left|link=]]
 +
(If you are experiencing problems with the display of the mathematical formulas, you can either try another browser or use this link, which should work smoothly: http://popix.lixoft.net)
 +
</div>
 +
$
 +
\def\simulix{\mathsf{simulix} }
 +
$
 +
 
 +
The desire to model a biological or physical phenomenon often arises when we are able to record observations of that phenomenon. Nothing would be more natural, therefore, than to begin this introduction by looking at some observed data.
  
 
    
 
    
 
{{ExampleWithImage
 
{{ExampleWithImage
|text= This first graphics display the viral load of four patients with hepatitis C who start a treatment at time $t=0$.
+
|text= This first plot displays the [http://en.wikipedia.org/wiki/Viral_load viral load] of four patients with [http://en.wikipedia.org/wiki/Hepatitis_C hepatitis C] who started a treatment at time $t=0$.
 
|image = NEWintro1.png
 
|image = NEWintro1.png
 
}}  
 
}}  
Line 9: Line 16:
  
 
{{ExampleWithImage
 
{{ExampleWithImage
|text=This second example involves weight data of rats, measured over 14 weeks in a setting of a sub-chronic toxicity study linked to Genetically Modified corn.
+
|text=This second example involves weight data for rats measured over 14 weeks, for a sub-chronic [http://en.wikipedia.org/wiki/Toxicity toxicity] study related to the question of [http://en.wikipedia.org/wiki/Genetically_modified_maize genetically modified corn].
 
|image = NEWintro2.png}}
 
|image = NEWintro2.png}}
  
  
 
{{ExampleWithImage
 
{{ExampleWithImage
|text= Here, the data are fluorescence intensities measured over time during a cellular biology experiment.
+
|text= In this third example, data are [http://en.wikipedia.org/wiki/Fluorescence fluorescence] intensities measured over time in a cellular biology experiment.
 
|image=NEWintro3.png }}
 
|image=NEWintro3.png }}
  
  
 
{{ExampleWithImage
 
{{ExampleWithImage
|text= Repeated measurements are not necessarily functions of time. For example, we may be interested in corn production as a function of fertilizer quantity.
+
|text= Note that repeated measurements are not necessarily functions of time.
 +
For example, we may be interested in corn production as a function of fertilizer quantity.
 
|image= NEWintro4.png}}
 
|image= NEWintro4.png}}
  
  
Although these examples come from very different domains, the data in each case are repeated measurements collected on several individuals. A population approach is therefore relevant for describing and modeling these data: in population approaches, the study is not based on a single individual, but on several individuals from the same population. The modeling goal is thus twofold: to characterize the biological or physical phenomena observed for each individual and the variability seen between individuals.
+
Even though these examples come from quite different domains, in each case the data are made up of repeated measurements on several individuals from a population. What we will call a "population approach" is therefore relevant for characterizing and modeling these data. The modeling goal is thus twofold: first, to characterize the biological or physical phenomena observed for each individual, and second, to characterize the variability seen between individuals.
  
In our rat example, the model must combine a growth model, which describes how a rat's weight increases over time, with a statistical model that describes why these kinetics can vary from one rat to another. The goal is thus to obtain a "typical" population curve (in red) and to explain the variability of the individual curves (in green) around this population curve.
+
In the example with the rats, the model needs to integrate a growth model that describes how a rat's weight increases with time, and a statistical model that describes why these kinetics can vary from one rat to another. The goal is thus to obtain a "typical" curve for the population (in red) and to be able to explain the variability of the individual curves (in green) around this population curve.
  
  
::[[File:NEWintro5.png]]
+
::[[File:NEWintro5.png|link=]]
  
  
The model will explain part of this variability through individual covariates such as sex or diet (rats 1 and 3 are male while rats 2 and 4 are female), but part of it will remain unexplained and will be treated as random. Combining effects considered fixed and effects considered random in a single model leads naturally to the use of mixed-effects models.
+
The model will explain some of this variability by individual [http://en.wikipedia.org/wiki/Covariate covariates] such as sex or diet (rats 1 and 3 are male while rats 2 and 4 are female), but some of the variability will remain unexplained and will be considered as random. Integrating into the same model effects considered fixed and others considered random  leads naturally to the use of [http://en.wikipedia.org/wiki/Mixed_model mixed-effects models].
  
An alternative but equivalent approach treats this model as a hierarchical model: each kinetic profile is described by an individual model, and the variability between individual models is described by a population model. For parametric models, this means that the observations for a given individual are described by an observation model that depends on a vector of individual parameters: this is the classical individual approach. The population approach is then a direct extension of the individual approach: a component describing the variability of these individual parameters within the population is added to the model.
+
An alternative yet equivalent approach considers this model as a [http://en.wikipedia.org/wiki/Multilevel_model hierarchical] one: each curve is described by an individual model, and the variability between individual models is described by a population model. In the case of [http://en.wikipedia.org/wiki/Parametric_model parametric models], this means that the observations for a given individual are described by a model of the observations that depends on a vector of individual parameters: this is the classic individual approach. The population approach is then a direct extension of [[The individual approach|the individual approach]]: we add a component to the model that describes the variability of the individual parameters within the population.
  
A model is then a joint probability distribution, which extends easily to the case where other variables of the model are treated as random variables: covariates, population parameters, design, etc.
+
A model can thus be seen as a [[What is a model? A joint probability distribution! | joint probability distribution]], which can easily be extended to the case where other variables in the model are considered as random variables: covariates, population parameters, the design, etc. The hierarchical structure of the model leads to a natural decomposition of the joint distribution into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional] and [http://en.wikipedia.org/wiki/Marginal_distribution marginal] distributions.
The hierarchical structure of the model allows a natural decomposition of this joint distribution into a product of conditional and marginal distributions.
 
  
Models for the individual parameters and models for the observations are described in the "Models" chapter. In particular, models for continuous, categorical, count and survival observations are presented and illustrated with various examples. Extensions for mixture models, hidden Markov models and stochastic differential equation based models are also presented.
+
Models for [[Modeling the individual parameters |individual parameters]] and models for [[Modeling the observations | observations]] are described in the [[Introduction_%26_notation|Models]] chapter. In particular, models for [[Continuous data models|continuous observations]], [[Model for categorical data|categorical data]], [[Models for count data|count data]] and [[ Models for time-to-event data | survival data]] are presented and illustrated by various examples. Extensions for [[ Mixture models|mixture models]], [[Hidden Markov models|hidden Markov models]] and [[Stochastic differential equations based models| stochastic differential equation based models]] are also presented.
  
The "Tasks & Tools" chapter presents practical examples of using these models: [[Visualization|exploration and visualization]], [[Estimation]], [[Model evaluation#Model diagnostics | Model diagnostics ]], [[Model evaluation#Model selection| Model selection]], [[Simulation]]. All the proposed approaches and methods are rigorously justified in the "Methods" chapter.
+
The  Tasks & Tools chapter presents practical examples of using these models: [[Visualization|exploration and visualization]], [[Estimation|estimation]], [[Model evaluation#Model diagnostics|model diagnostics]], [[Model evaluation#Model selection|model selection]] and [[Simulation|simulation]]. All approaches and proposed methods  are rigorously detailed in the [[Introduction and notation|Methods]] chapter.
  
 +
The main purpose of a model is to be used. Mathematical modeling and statistics remain useful tools  for many disciplines (biology, agronomy, environmental studies, pharmacology, etc.), but it is important that these tools are used properly. The various software packages used in this wiki have been developed with this in mind: they serve the modeler well, while fully complying with a coherent mathematical formalism and using well-known and theoretically justified methods.
  
The main purpose of a model is to be used. Mathematical modeling and statistics remain tools in the service of other disciplines (biology, agronomy, environment, pharmacology, etc.), but it is important that these tools are used correctly.
+
Tools for model exploration ($\mlxplore$), modeling ($\monolix$) and simulation ($\simulix$) use the same model coding language $\mlxtran$. This allows us to define a complete workflow using the same model implementation, i.e., to run several different tasks based on the same model.
The various software packages used in this wiki were developed in this spirit: they serve the modeler while strictly respecting a coherent mathematical formalism and using recognized, theoretically justified methods.
 
 
 
A distinction
 
 
   
 
   
 
+
$\mlxtran$ is extremely flexible and well-adapted to implementing complex mixed-effects models.
Why $\monolix$ & $\mlxtran$
+
With $\mlxtran$ we can easily write ODE-based models, implement  [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I|pharmacokinetic models]] with complex administration schedules, include inter-individual variability in parameters, define statistical models for covariates, etc.
 
+
Another crucial property of $\mlxtran$ is that it rigorously adopts the model representation formalism proposed in $\wikipopix$. In other words,  the model implementation is fully consistent with its mathematical representation.
 
 
<ul>
 
<li> An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$.
 
Each software package has its own algorithms implemented. It is not our goal here to rate and compare the various algorithms and implementations. We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters]] and implemented in $\monolix$ as we are convinced  by  its theoretical properties and satisfied by its practical qualities:
 
 
 
<ul style="list-style-type: square">
 
<li> The algorithms implemented in $\monolix$, including SAEM and its extensions (mixture models, hidden Markov models, SDE-based models, censored data, etc.), have been published in statistical journals. Furthermore, convergence of SAEM has been rigorously proved.
 
 
 
<li> The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.
 
 
 
<li> The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied in detail its theoretical and practical properties.
 
</ul></ul>
 
 
 
  
 +
$\mlxplore$ provides a clear graphical interface that allows us to visualize not only the structural model but also the statistical model, which is of fundamental importance in the population approach. We can  visualize for instance the impact of covariates and inter-individual variability of model parameters on predictions.  $\mlxplore$ is an ideal tool for teaching or discovering what a [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I|pharmacokinetic model]] is, for example.
  
 +
The algorithms implemented in $\monolix$ ([http://en.wikipedia.org/wiki/Stochastic_approximation Stochastic Approximation] of EM, [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo MCMC], [http://en.wikipedia.org/wiki/Simulated_Annealing Simulated Annealing], [http://en.wikipedia.org/wiki/Importance_sampling Importance Sampling], etc.) are extremely efficient for a wide variety of complex models. Furthermore,  convergence of [[The SAEM algorithm for estimating population parameters|SAEM]] and its extensions ([[Mixture models|mixture models]], [[Hidden Markov models|hidden Markov models]], [[Stochastic differential equations based models|SDE-based models]], censored data, etc.)  has been rigorously proved and published in statistical journals.
  
 +
$\simulix$ is a model computation engine which enables us to simulate a $\mlxtran$ model from within various environments. $\simulix$ is now available for the Matlab and R  platforms, allowing any user to combine the flexibility of R and Matlab scripts  with the power of $\mlxtran$ in order to easily encode complex models and simulate data.
  
 +
For these reasons, $\wikipopix$ and these  tools can be used with confidence for training and teaching. This is even more the case because  $\mlxplore$, $\monolix$ and $\simulix$ are free for academic research and education purposes.
  
  
 
{{Next
 
{{Next
 
|link=The individual approach }}
 
|link=The individual approach }}

Latest revision as of 09:34, 21 June 2013


(If you are experiencing problems with the display of the mathematical formulas, you can either try another browser or use this link, which should work smoothly: http://popix.lixoft.net)

$ \def\simulix{\mathsf{simulix} } $

The desire to model a biological or physical phenomenon often arises when we are able to record observations of that phenomenon. Nothing would be more natural, therefore, than to begin this introduction by looking at some observed data.


This first plot displays the viral load of four patients with hepatitis C who started a treatment at time $t=0$.
NEWintro1.png



This second example involves weight data for rats measured over 14 weeks, for a sub-chronic toxicity study related to the question of genetically modified corn.
NEWintro2.png



In this third example, data are fluorescence intensities measured over time in a cellular biology experiment.
NEWintro3.png



Note that repeated measurements are not necessarily functions of time. For example, we may be interested in corn production as a function of fertilizer quantity.
NEWintro4.png


Even though these examples come from quite different domains, in each case the data are made up of repeated measurements on several individuals from a population. What we will call a "population approach" is therefore relevant for characterizing and modeling these data. The modeling goal is thus twofold: first, to characterize the biological or physical phenomena observed for each individual, and second, to characterize the variability seen between individuals.

In the example with the rats, the model needs to integrate a growth model that describes how a rat's weight increases with time, and a statistical model that describes why these kinetics can vary from one rat to another. The goal is thus to obtain a "typical" curve for the population (in red) and to be able to explain the variability of the individual curves (in green) around this population curve.


NEWintro5.png


The model will explain some of this variability by individual covariates such as sex or diet (rats 1 and 3 are male while rats 2 and 4 are female), but some of the variability will remain unexplained and will be considered as random. Integrating into the same model effects considered fixed and others considered random leads naturally to the use of mixed-effects models.

An alternative yet equivalent approach considers this model as a hierarchical one: each curve is described by an individual model, and the variability between individual models is described by a population model. In the case of parametric models, this means that the observations for a given individual are described by a model of the observations that depends on a vector of individual parameters: this is the classic individual approach. The population approach is then a direct extension of the individual approach: we add a component to the model that describes the variability of the individual parameters within the population.
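The hierarchical description above can be illustrated with a minimal simulation sketch for the rat example. Everything specific in it is an assumption made for illustration and is not taken from this page: the saturating growth function, all numerical values, the log-normal distribution of the individual parameter, and the function names `growth_curve` and `simulate_rats`. The population model draws an individual asymptotic weight for each rat from a fixed sex effect plus a random effect, and the observation model adds residual noise around each individual curve.

```python
import numpy as np

def growth_curve(t, w0, w_max, k):
    """Hypothetical structural model: weight rises from w0 toward w_max at rate k."""
    return w_max - (w_max - w0) * np.exp(-k * t)

def simulate_rats(n_rats=4, n_weeks=14, omega=0.1, sigma=5.0, seed=0):
    """Simulate weekly weights from a two-level (population, individual) model."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_weeks + 1, dtype=float)            # weeks 0..n_weeks
    # Covariate: rats 1 and 3 are male, rats 2 and 4 are female.
    male = (np.arange(n_rats) % 2 == 0).astype(float)

    # Population parameters (illustrative values only).
    w0_pop, w_max_pop, k_pop, beta_male = 100.0, 300.0, 0.2, 0.4

    # Population model: fixed effect of sex plus a random effect eta_i.
    eta = rng.normal(0.0, omega, size=n_rats)
    w_max_i = w_max_pop * np.exp(beta_male * male + eta)

    # Individual (observation) model: prediction plus residual error.
    f = growth_curve(t[None, :], w0_pop, w_max_i[:, None], k_pop)
    y = f + rng.normal(0.0, sigma, size=f.shape)
    return t, male, w_max_i, y

t, male, w_max_i, y = simulate_rats()
print(y.shape)  # one row of weekly weights per rat
```

Setting `omega=0` and `sigma=0` removes the random part: every rat then follows the typical curve for its sex, which is the part of the variability explained by the covariate.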

A model can thus be seen as a joint probability distribution, which can easily be extended to the case where other variables in the model are considered as random variables: covariates, population parameters, the design, etc. The hierarchical structure of the model leads to a natural decomposition of the joint distribution into a product of conditional and marginal distributions.
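As a sketch of this decomposition, write $y_i$ for the observations of individual $i$, $\psi_i$ for the vector of individual parameters, $c_i$ for the individual covariates and $\theta$ for the population parameters. This notation is assumed here for illustration, and any dependence of the observation model on the design or on residual-error parameters is omitted. The joint distribution then factorizes as

$$ p(y_i, \psi_i ; \theta) = p(y_i \mid \psi_i) \, p(\psi_i ; \theta) , $$

and, when the covariates are also treated as random,

$$ p(y_i, \psi_i, c_i ; \theta) = p(y_i \mid \psi_i) \, p(\psi_i \mid c_i ; \theta) \, p(c_i) . $$

The marginal distribution of the observations is obtained by integrating out the individual parameters:

$$ p(y_i ; \theta) = \int p(y_i, \psi_i ; \theta) \, d\psi_i . $$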

Models for individual parameters and models for observations are described in the Models chapter. In particular, models for continuous observations, categorical data, count data and survival data are presented and illustrated by various examples. Extensions for mixture models, hidden Markov models and stochastic differential equation based models are also presented.

The Tasks & Tools chapter presents practical examples of using these models: exploration and visualization, estimation, model diagnostics, model selection and simulation. All approaches and proposed methods are rigorously detailed in the Methods chapter.

The main purpose of a model is to be used. Mathematical modeling and statistics remain useful tools for many disciplines (biology, agronomy, environmental studies, pharmacology, etc.), but it is important that these tools are used properly. The various software packages used in this wiki have been developed with this in mind: they serve the modeler well, while fully complying with a coherent mathematical formalism and using well-known and theoretically justified methods.

Tools for model exploration ($\mlxplore$), modeling ($\monolix$) and simulation ($\simulix$) use the same model coding language $\mlxtran$. This allows us to define a complete workflow using the same model implementation, i.e., to run several different tasks based on the same model.

$\mlxtran$ is extremely flexible and well-adapted to implementing complex mixed-effects models. With $\mlxtran$ we can easily write ODE-based models, implement pharmacokinetic models with complex administration schedules, include inter-individual variability in parameters, define statistical models for covariates, etc. Another crucial property of $\mlxtran$ is that it rigorously adopts the model representation formalism proposed in $\wikipopix$. In other words, the model implementation is fully consistent with its mathematical representation.

$\mlxplore$ provides a clear graphical interface that allows us to visualize not only the structural model but also the statistical model, which is of fundamental importance in the population approach. We can visualize for instance the impact of covariates and inter-individual variability of model parameters on predictions. $\mlxplore$ is an ideal tool for teaching or discovering what a pharmacokinetic model is, for example.

The algorithms implemented in $\monolix$ (Stochastic Approximation of EM, MCMC, Simulated Annealing, Importance Sampling, etc.) are extremely efficient for a wide variety of complex models. Furthermore, convergence of SAEM and its extensions (mixture models, hidden Markov models, SDE-based models, censored data, etc.) has been rigorously proved and published in statistical journals.

$\simulix$ is a model computation engine which enables us to simulate a $\mlxtran$ model from within various environments. $\simulix$ is now available for the Matlab and R platforms, allowing any user to combine the flexibility of R and Matlab scripts with the power of $\mlxtran$ in order to easily encode complex models and simulate data.

For these reasons, $\wikipopix$ and these tools can be used with confidence for training and teaching. This is even more the case because $\mlxplore$, $\monolix$ and $\simulix$ are free for academic research and education purposes.

