1. In Situ Flux Measurements : Single Site Data Optimization

1.1. Assimilating fluxtower data (NEE and LE) for Tapajós tower in the Amazon, validation for Guyaflux and Jarú site

In this study a standard model parameterization of ORCHIDEE produces seasonal NEE patterns that are opposite in phase to the eddy flux data of the tropical evergreen forest at the Tapajós km 67 site (Brazil), like many other global models. However, we optimized several key parameters of ORCHIDEE using eddy covariance data of the Tapajós km 67 site. The validity of the retrieved parameter values was evaluated for two other flux tower sites in the Amazon.

The different tested optimization scenarios showed that only a few parameters substantially improve the fit to NEE and latent heat (LE) data. Our results confirm that these forests have the ability to maintain high transpiration and photosynthesis during the dry season in association with a large soil depth (Dsoil = 10 m) and a rooting system density that decreases almost linearly with depth (Hroot = 0.1). In addition, our results highlight both the potential and the limitations of flux data to improve global terrestrial models. Several parameters were not identifiable and the risk of over-fitting of the model was illustrated. Nevertheless, we conclude that these models can be improved substantially by assimilating site level flux data over the tropics.

Ref.: Verbeeck H. et al., J. Geophys. Res., 116, G02018, doi: 10.1029/2010JG001544, 2011 

Figure 1. Measured and modeled seasonal patterns of daily mean NEE (g C m–2 s–1) of the Tapajós km 67 site in Brazil (10 day running means). Eddy covariance measurements (black) are compared with the prior model (green), optimization O7b (red) and optimization O11b (blue). The monthly measured precipitation (mm) is shown in the lower panel. Dry periods are indicated in grey (< 100 mm month–1). On the top — map showing flux tower locations in Brazil including the K67 site, which is the Tapajós km 67 flux tower located in the central Amazon tropical evergreen rainforest (adapted from de Gonçalves et al. 2013).

1.2. Optimizing a process-based ecosystem model with eddy-covariance flux measurements

In this first study we design a Bayesian inversion method (gradient-based) to optimize the key parameters of ORCHIDEE model against the combination of prior information upon the parameters and eddy covariance fluxes. Inversion is carried out using measurements of CO2, latent heat, and sensible heat fluxes as well as of net radiation over a pine forest in southern France. We designed an ensemble of inversions with different set ups using flux data over different time periods, in order to (1) identify well-constrained parameters and loosely constrained ones, (2) highlight some model structural deficiencies, and (3) quantify the overall information gained from assimilating each type of CO2 or energy fluxes. Assimilating 3 weeks of half-hourly flux data during the summer improves the fit to diurnal variations, but merely improves the fit to seasonal variations. Assimilating a full year of flux data also improves the fit to the diurnal cycle more than to the seasonal cycle. This points out to the key importance of timescales when inverting parameters from high-frequency eddy-covariance data. The figure illustrates the model fit to the data (NEE, LE, H, Rn) for the mean diurnal cycle during the growing season period.

Ref.: Santaren D. et al., Global Biogeochem. Cycles, 21, GB2013, doi: 10.1029/2006GB002834, 2007 

Figure 2. Bin-averaged diurnal cycles for the GS period (days 195–216 of 1997) and for each type of data (NEE, LE, H, and Rn). Diamonds with the errors bars represent the data and their uncertainties. Dotted lines show prior model outputs and solid and dashed lines model outputs with optimized parameters values for the full year (FY) and the growing season (GS) cases, respectively.

2. In Situ Flux Measurements : Multiple Site Data Optimization

2.1. Optimizing and evaluating generic parameters from the multi-site assimilation of flux tower data

Temperate deciduous broadleaf forests

In a first study, we simultaneously assimilated in situ measurements of daily net ecosystem exchange (NEE) and latent heat (LE) flux from 12 temperate deciduous broadleaf forests sites, into the ORCHIDEE ecosystem model. We evaluate the ability of the model to simulate the carbon (NEE, gross primary productivity GPP, ecosystem respiration Reco) and water (LE) fluxes at these 12 sites with the set of parameters retrieved through this “multi-site” approach, as compared to those derived from separate optimizations at each site (“single-site” approach).

The multi-site (MS) approach significantly decreases the model-data misfit from daily to yearly timescales, with performances consistent to those of the single-site (SS) (Fig. 3). Besides, the MS inversion system finds a unique combination of parameters which is not the simple average of the different SS sets of parameters, and yields better performances than the latter. It is also found that optimization approach in general distinctively improves the simulation of Reco, and to a lesser extent that of GPP. Finally, a global-scale simulation with MS-optimized parameters show an enhanced phase agreement between modeled leaf area index (LAI) and remote-sensing observations of normalized difference vegetation index (NDVI).

Ref.: Kuppel S. et al., Biogeosciences, 9, 3757-3776, doi: 10.5194/bg-9-3757-2012, 2012  

Figure 3. Model-data RMS at different time scales for (a) NEE and (b) LE. Prior model is in grey, MS optimization in blue and SS optimizationin orange.

Multi-site optimization across ecosystems

This second study evaluates the validity of the multi-site approach using eddy covariance measurements of daily NEE and LE from a large number of sites grouped in seven plant functional types (PFTs). The optimized PFT-generic sets of parameters are found to enhance the agreement between measured and simulated fluxes at most of the sites (Fig. 4), notably improving the CO2 balance at tropical and temperate forests sites and reducing the output uncertainty, with performances comparable to those of the SS optimizations. Comparison with estimates of GPP and Reco indicates improved simulations of both gross fluxes. Atmospheric CO2 concentration (CCO2) records from 53 locations around the globe indicates an overall improvement of the modelled CCO2 seasonality when using MS-derived parameters in earth system simulations (Fig. 5), along with a better simulated interannual variability. The NDVI/LAI evaluation methodology (see previous study) also shows a median improvement of the modelled foliar cover phenology for all PFTs. The overall poorer optimization results in tropical evergreen broadleaf forests point at structural model limitations.

Ref.: Kuppel S. et al., Geosci. Model Dev., 7, 2581-2597, doi: 10.5194/gmd-7-2581-2014, 2014 

Figure 4. Model-data (A) RMSD and (B) bias for the daily NEE time series at each site (filled circles), grouped and averaged by PFT (horizontal bars), in three cases: prior model (green), multi-site optimization (blue) and single-site optimization (orange). (C) PFT-averaged mean seasonal cycle of NEE, for the training observations (black) and the three aforementioned cases, smoothed with a 15-day-moving-average window.

Figure 5. Detrended mean seasonal cycle of the atmospheric CO2 concentrations at (A) Alert, (B) South Pole and (C) Mauna Loa locations during the 1989-2009 period: the optimization-independent concentrations records (black) are compared to simulations where the biospheric contribution is calculated using the ORCHIDEE model with default (green) and multi-site (blue) parameterization, with the model-data RMSD given between brackets. (D) Regional contributions to the mean seasonal cycle simulated at Alert.

Functional traits variability inferred from data assimilation

Past attempts to optimize Terrestrial Biosphere Models (TBM) parameters mostly focused on model performance and uncertainties, overlooking the ecological properties of ecosystems. In this study we assessed the ecological consistency of optimized trait-related parameters while improving the model performances for gross primary productivity (GPP) at sites. We proposed a simple protocol in which we optimized 14 parameters linked to carbon assimilation against 371 site-year estimates of GPP from eddy-covariance observations (Fig. 6).

The results showed that optimized parameter values are consistent with leaf-scale traits and well-known trade-offs observed at the leaf level, echoing the leaf economic spectrum theory. Our results also highlighted a marked sensitivity of trait-related parameters to local bio-climatic variables and reproduced observed relationships between traits and climate (Fig. 7), thus validating the main GPP-related processes implemented in the ORCHIDEE model.

Ref.: Peaucelle M. et al., Submitted to Global Ecology and Biogeography, doi: 10.1111/geb.12937, 2019 

Figure 6. Schematic representation of the modeling protocol followed in this study. For each FLUXNET site-year (blue), the model ORCHIDEE (green) was calibrated with the data assimilation system ORCHIDAS (red) in order to reproduce GPP observations. The ORCHIDAS system uses a gradient-based approach (L-BFGS-B) to reduce the cost function J(x). For each site-year, 14 parameters were optimized 10 times with different initial values. The best calibration, i.e. leading to the minimum value of J(x), was retained. This procedure was repeated for each site-year, resulting in 371 sets of 14 independently optimized parameters. Finally, correlations between optimized parameters and climate were explored using standardized major axis regressions.

Figure 7. Four examples of co-variations obtained between optimized parameters and environmental conditions of the sites for PFTs TroEB (black square), TEN (red square), TemEB (green triangle), TDB (blue square), BEN (cyan dots) and BDB (pink dots). Each point represents the mean optimized parameter (environmental variable) value for one site while the error bars represent the inter-annual variability (no bars means only one year of measurement). The red line represents the slope of the standardized major axis regression.

3. Satellite Data

3.1. Phenology data

In order to improve the timing of the seasonal cycle of deciduous vegetation in ORCHIDEE, we have optimised the phenology-related model parameters using nine years of satellite-derived vegetation "greenness" index (NDVI) data from the MODIS instrument.

We found the misfit between the observations and the model decreased for all boreal and temperate deciduous Plant Functional Types (PFTs) due to an earlier onset of leaf senescence induced primarily by higher temperature and moisture thresholds. The bias in the date of leaf onset and senescence was partially reduced for tropical deciduous trees; however, no improvement was seen for natural C4 grasses that are mainly located in the dry tropics and semi-arid regions. This suggests that the phenology models used for these regions need improvement.

Spatial and temporal validation at site-level, and globally, demonstrated the generality of the posterior parameter vector for use in global simulations, with an increase in global median correlation between the model and the MODIS data of 0.56 to 0.65.

The optimisation of leaf phenology resulted in a reduction in the mean global annual GPP of ~10PgC over the 1990-2010 period due to a shortened Growing Season Length (GSL) and/or smaller amplitude, thereby reducing the positive bias compared to data-based estimates. The earlier decline in carbon assimilation produced a shift in phase of the seasonal cycle of net C surface fluxes and atmospheric CO2 toward an earlier start to the period of C uptake, thus improving the fit to atmospheric CO2 concentration data measured at land-based stations.

Finally, the optimisations generally caused a decrease in the strength of vegetation "greening" and "browning" trends at a global scale.

Ref.: MacBean N. et al., Biogeosciences, 12, 7185-7208, doi: 10.5194/bg-12-7185-2015, 2015 

Figure 8. The difference (posterior — prior) of the mean (over the 1990 — 2010 period) of a) simulated GSL (days); b) annual amplitude of the simulated normalised fAPAR (range 0-1) and c) annual mean of the simulated normalised fAPAR.

3.2. Fluorescence data

Model optimization using SIF data

The aim of this study to investigate the potential benefit of using a global, satellite-derived fluorescence (SIF) dataset to constrain and improve carbon cycle simulations of the ORCHIDEE model. Given the apparent strong linear relationship between GPP and fluorescence (Frankenberg et al., 2011; Guanter et al., 2012), the aim is first to optimize the parameters of the linear relationship between fluorescence and GPP (SIF = a GPP + b) as well as parameters relating to photosynthesis and phenology, in order to determine if the fluorescence data provide a direct constraint on the carbon uptake.

We used monthly 0.5° GOME-2 SIF data from 2007 to 2011 to optimise GPP-related parameters (photosynthesis and phenology) of ORCHIDEE. The optimisation reduces GPP magnitude across all vegetation types except C4 plants. Global mean annual GPP therefore decreases from 194 ± 57 PgC/yr to 166 ± 10 PgC/yr, bringing the model more in line with an up-scaled flux tower estimate of 133 PgC/yr. Strongest reductions in GPP are seen in boreal forests: the result is a shift in global GPP distribution, with a ~50% increase in the tropical to boreal productivity ratio.

This work was supported and co-funded by the ESA FLEX-Bridge Project in collaboration with colleagues at University College London (, the Copernicus Atmosphere Monitoring Service (CAMS41 project) implemented by the European Centre for Medium-Range Weather Forecasts (ECMWF) on behalf of the European Commission, and the CNES TOSCA Flu-OR Project.

Ref.: MacBean N. et al., Scientific Reports, 8, 1973, doi: 10.1038/s41598-018-20024-w, 2018 

Figure 9. Global mean annual sum (2007–2011) and spatial distribution of: (a) GOME-2 SIF; (b) JUNG upscaled FLUXNET data-driven GPP product 18; (c) ORCHIDEE prior GPP; (d) ORCHIDEE posterior GPP; (e) difference in ORCHIDEE simulated GPP (posterior – prior); (f) reduction in GPP uncertainty (1σ).

Mechanistic SIF observation operator in ORCHIDEE

We developed a mechanistic SIF observation operator in ORCHIDEE. It relies 1) on the simulation of the regulation of the PSII fluorescence quantum yield at the leaf level, following the lake model formalism and using a novel parameterization of NPQ as a function of temperature, PAR and the normalized quantum yield of photochemistry, and 2) a parametric simplification of the SCOPE model to emulate the radiative transfer of chlorophyll fluorescence to the top of the canopy.

We then assimilated two years of monthly OCO-2 SIF product at 0.5° (2015-2016) to optimize ORCHIDEE photosynthesis and phenological parameters over an ensemble of grid points for the 14 vegetation plant functional types (PFTs). The impact on the simulated GPP was considerable with a large decrease of the global scale budget by 28 GtC.yr-1 over the period 1990-2009. The optimized GPP budget (134 / 135 GtC.yr-1 over 1990-2009 / 2001-2009) remarkably agrees with independent GPP estimates, FLUXSAT 137 GtC.yr-1 over 2001-2009) in particular and FLUXCOM (121 GtC.yr-1 over 1990-2009).

The positive trend of the model was decreased from 0.56 GtC.yr-2 to 0.43 GtC.yr-2 for both 1990-2009 and 2001-2009 periods. It was hence in closer agreement with mean linear trends estimated in the literature from several models (for instance 0.41 GtC.yr-2 both over the period 1990-2009 in Anav et al. (2015) and over the period 2000-2010 in Chen et al. (2019); 0.35 GtC.yr-2 over the 1981-2010 period in Ito et al. (2017)) , but departed from the higher positive trend of FLUXSAT (1.35 GtC.yr-2 over 2001-2009). That positive trend in ORCHIDEE is due to the fertilization effect of increasing CO2 atmospheric concentrations to which the model is sensitive, although moderated by a down-regulation effect that mimics the role of nitrogen limitation on photosynthesis in the version used, and to the increasing surface temperatures.

The reasons for these large changes vary between PFTs. For deciduous trees and C3 and C4 grasses and crops, it was attributed to a decrease of the growing season length mostly; for tropical forests, temperate and boreal evergreen trees as well as C3 and C4 grasses and crops, it can also be related to a reduction of the leaf area and photosynthetic capacities. At global scale, this translated into a decrease of the gross primary productivity over the Tropics principally, Europe and over a large area covering the Northwestern part of North America; at the opposite, an increase in GPP was obtained in the US corn belt and over boreal Asia.

Our results also suggest a biome dependency of the SIF-GPP relationship that needs to be improved for some PFTs.

Ref.: Bacour C. et al., J. Geophys. Res. Biogeosci., 124, 10, 3143-3157, doi: 10.1029/2018JG004938, 2019
Ref.: Bacour C. et al., J. Geophys. Res. Biogeosci., 124, 11, 3281-3306, doi: 10.1029/2019JG005040, 2019 

Figure 10. Compared annual variation of global scale GPP budget simulated by ORCHIDEE prior (ORCH-std) and after assimilation of OCO-2 SIF data (ORCH-opti), and estimated by FLUXCOM and FLUXSAT data-driven approaches.

4. In Situ Flux Measurements & Satellite-derived fAPAR Data

We have evaluated to potential benefit of jointly assimilating eddy-covariance flux measurements and fAPAR products in order to improve the simulations of the ORCHIDEE terrestrial biosphere model with ORCHIDEE Data Assimilation System.

The study relies on measurements of net carbon exchange (NEE), and latent heat (LE), fluxes at two instrumented sites - Fontainebleau (deciduous broadleaf forest) and Puechabon (Mediterranean broadleaf evergreen forest) – and fAPAR time series measured in situ and derived from spaceborne observations at high (SPOT) and medium (MERIS) spatial resolutions.

Each data stream has first been assimilated separately and we have analysed the subsequent effect on the other data streams. The strong discrepancies between the various fAPAR products, mainly in terms of mean magnitude, has led to significantly different model response after assimilation. In particular a degradation of the model-data agreement with respect to NEE has been obtained at the two sites. Interestingly, combining flux measurements with one or the other fAPAR products in the assimilation has lead to a model-data agreement of the same order than when each data stream was assimilated separately.

The results suggest that assimilating fAPAR observations can bring additional knowledge on vegetation functioning (mainly phenology), which is best achieved by assimilating normalised fAPAR time series for ecosystems with pronounced seasonal cycle, but that it has to be accounted for with caution when assimilated alone, at the risk of falsifying the model.

Ref.: Bacour C. et al., J. Geophys. Res. Biogeosci., 120, 1839-1857, doi: 10.1002/2015JG002966, 2015 

Figure 11. Ratio between the posterior and prior RMSE of fit, between the model simulations and different observed variables, considering assimilations performed with only flux data, fAPAR (normalised data for Fontainebleau, raw data for Puechabon) data only, and the combination of the two datastreams.

5. In Situ Flux Measurements & Biomass Data

5.1. Use of biomass data in addition to CO2/H2O flux data in Model Data Fusion at two temperate forest sites

We demonstrated the use of biomass data with the ORCHIDEE Data Assimilation System at two temperate forests in France, Beech forest Hesse and Maritime pine forest Le Bray. In addition to the micrometeorological fluxes, we used annual values for biomass increments and tested how to use the total aboveground biomass (AGB) measurements. We used the biomass data to optimize the parameters related to allocation.

The use of biomass increment with the flux data was successful with improvement in all the data streams (Fig. 12). However, the use of AGB was more challenging. We used AGB to optimize mortality on longer time scale, but as the clear cuts and storms were not described in the model, we obtained too high mortality rate for the forest. Overall, use of annual biomass increment improved allocation to more realistic direction in light of the observations at Hesse, but the use of AGB requires a model version with more event description.

Ref.: Thum et al., Agricultural and Forest Meteorology, 234, 48-65, doi: 10.1016/j.agrformet.2016.12.004, 2017 

Figure 12. The micrometeorological fluxes (GPP, TER and Qle) and annual aboveground biomass increments at Hesse in 2001-2004. The 15-day running averages of the flux daily values are shown. The measurements are shown in black, a priori simulation result as green and a posteriori in red. The RMSE value between simulation and observation is shown in parenthesis.

6. In Situ Flux Measurements & Satellite & Atmospheric CO2 Data

6.1. Assimilating multiple-data streams (MODIS-NDVI, FluxNet data, Atmospheric CO2) in a Carbon Cycle Data Assimilation System: a new step-wise approach

We designed a new step-wise multi-data streams assimilation system where we assimilate sequentially three sets of observations, providing complementary information on the carbon cycle (from local to continental scale), in order to constrain ORCHIDEE’s parameters:

  1. Step 1: MODIS-NDVI : An ensemble of selected points covering all Plant Functional Types (PFTs) of ORCHIDEE is used to first constrain phenology parameters; see MacBean et al. (2014) for details.
  2. Step 2: FluxNET CO2 and water fluxes : As in Kuppel et al. (2014), we used 70 sites to optimize a larger set of parameters (compared to step 1) controlling the photosynthesis and respiration related parameters.
  3. Step 3: Atmospheric CO2 data : We finally optimized a subset of step-2 parameters (including the initial soil carbon content, scaled by eco-regions) with atmospheric CO2 concentration data, using the LMDz atmospheric transport model to relate surface fluxes to concentrations.

The critical feature of the step-wise system, as described in the figure, consists to propagate the information from one step to the next (i.e., posterior parameter values and error covariance matrix). At each step, the optimization relies on a variational approach that minimizes a cost function describing the model — data mismatch and the prior information on the parameters. Examples of model-data fit are provided in the attached figure for the each data streams. The paper describes i) the benefit of this step-wise approach (i.e. from a practical point of view), compared to a more rigorous simultaneous optimization of all data, streams as well as ii) the potential drawback and the needed consistency checks.

Ref.: Peylin et al., Geosci. Model Dev., 9, 3321-3346, doi: 10.5194/gmd-9-3321-2016, 2016 

Figure 13. Schematic of a step-wise carbon cycle multi-data streams assimilation system, based on the ORCHIDEE land surface model. MODIS-NDVI, FluxNet carbon and water fluxes, and atmospheric CO2 data are assimilated sequentially in order to optimize key parameters of ORCHIDEE. The posterior (after optimization) information on the parameters (values and error covariance matrix) is propagated from one step to the next. Model-data fits are illustrated for i) the phenology of a Boreal Deciduous Broadleaf forest site after the assimilation of MODIS data (mean seasonal cycle of NDVI), ii) the mean seasonal cycle of the NEE across all FluxNet sites (per PFTs) and iii) the atmospheric CO2 concentration at Mauna Loa station.

7. Surface Temperature Data

7.1. Variational assimilation of land surface temperature within the ORCHIDEE continental surface model

The aim is to test the new tool YAO for automatic adjoint code calculation in assimilation of thermal and hydrological data in ORCHIDEE. The other objective is to estimate how strongly the model parameters are constrained by soil temperature and wetness data (in this case, in situ data at two sites: Harvard Forest and Krüger Park).

Ref.: Benavides Pinjosovsky H. S. et al., Geosci. Model Dev., doi: 10.5194/gmd-10-85-2017, 2017 

Figure 14. Gradient of modeled soil temperature with respect to various hydrological-related parameters for C3 grassland cover at Harvard Forest site (Canada). Calculations by finite differences (blue) and code differentiation using YAO (red) show almost perfect agreement. Such figures demonstrates that some parameters are better constrained by Tsol than others, and that sensitivity varies according to the forcing and to the time of the day. It also depends upon the season and upon the site.

8. Methods

8.1. Data assimilation as a tool to diagnose the structural error of an ecosystem model

This study explores the impact of the structural error of biosphere models when assimilating net ecosystem exchange (NEE) measurements or CO2 concentration measurements to optimize uncertain model parameters within carbon cycle data assimilation systems (CCDAS). We propose a simple method which is derived from the model-minus-observation mismatch statistics, here applied to the ORCHIDEE model using NEE measurements at twelve temperate deciduous broadleaf forest sites. We find that the structural model error significantly dominates the measurement error in the NEE space; it has a standard deviation of 1.5 to 1.7 gC/m2/d, without a significant correlation structure beyond the lag of a few days (Fig. 15A), and possesses a large spatial structure that can be approximated with an exponential decay of e-folding length of 500 km (Fig. 15B). In the space of atmospheric CO2 concentrations, its characteristics are commensurate with the transport errors, both for surface air sample measurements and total column measurements.

Ref.: Kuppel S. et al., Geosci. Model Dev., 6, 45-55, doi: 10.5194/gmd-6-45-2013, 2013 

Figure 15. (A) All-site median of the autocorrelation of the observation error (model+measurement errors), estimated at each site with three diagnotics: prior with the linear assumption (orange), prior with ensemble simulations (blue), and posterior (grey). (B) Distance correlogram of the observation (model+measurement) error using pairs of distant sites for a same time. The value represented by each blue diamond includes all the common years of one site pair. The thick black line represents the overall median using 400-km bins, and the dotted line an exponential decay with an e-folding length of 500 km.

8.2. Gradient-based and random search algorithms performance in single and multiple site optimisation

In this study, we compare two different assimilation approaches — gradient-based (L-BFGS-B algorithm) and random search (Genetic algorithm) — to optimise the ORCHIDEE model with the eddy-covariance observational data (net ecosystem exchange and latent heat fluxes) from 78 sites grouped in seven plant functional types. Assimilation performance is evaluated independently for the single site and multiple site optimisation schemes. We show that the random search approach generally succeeds to provide more appropriate model parameters, however its superiority is significant mostly for the single-site assimilation scheme, while in multi-site approach the gradient-based algorithm tends to reveal a similar performance.

Ref.: Bastrikov V. et al., Geosci. Model Dev., 11, 4739-4754, doi: 10.5194/gmd-11-4739-2018, 2018 

Figure 16. Mean model–data RMSD for the daily NEE (upper panel) and LE (lower panel) time series at each site, grouped and averaged by PFT (horizontal bars) for single-site variational (SV, green), single-site genetic (SG, red), multiple-site variational (SV, blue) and multiple-site genetic (SG, orange) methods. In the last column the data are grouped by different assimilation techniques and presented by boxplots with central horizontal bar indicating the median value, top and bottom of the boxes corresponding to the first and last quartiles, and whiskers encompassing the values within 1.5 IQR (interquartile range).