rm(list=ls()) # clean workspace
# setwd(dirname(rstudioapi::getActiveDocumentContext()$path)) # Set working directory to the current file
# PLANT PARAM PATH
<- "Wheat.xml"
path_wheat
# SET HERE YOUR PYTHON3 PATH
# path_python3 <- "/home/mdago/anaconda3/envs/CPB/bin/python3"
<- "/home/mdago/anaconda3/envs/CPB2/bin/python3"
path_python3
# PYTHON SCRIPTS PATH
<- "python_files/"
path_CPB
# R SCRIPTS PATH
<- "R Functions/" path_R_functions
CPB-LiDAR : RECOVER PLANT DATA FROM LiDAR METRICS USING STRUCTURAL MODELING AND MACHINE LEARNING
Marco D’Agostino1,2, Jordan Bates1, Guillaume Lobet1,2
1Agrosphere IBG3, Forschungszentrum Juelich, 52428 Juelich, Germany
2Earth and Life Institute, UCLouvain, 1348 Louvain-la-Neuve, Belgium
This project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy-EXC 2070-390732324 (PhenoRob).
INTRODUCTION
Field monitoring using Unmanned Aircraft Systems (UAS) is a promising new field of research that aims to retrieve plant information with higher spectral, temporal and spatial resolution than destructive measurements. In particular, measurements with LiDAR (Light Detection And Ranging) sensors mounted on UAS are useful to characterize the canopy, and are also promising for estimating structural plant parameters (e.g. LAI, biomass) (J. S. Bates et al. 2021).
Raw LiDAR data consist of a very dense point cloud of the field. With this point cloud, one can define a grid and apply algorithms to extract metrics: Crop Height (indicator of height per grid cell), Gap Fraction (indicator of canopy density per grid cell) and Intensity (which will not be used here). Such metrics can then be correlated with destructive measurements to build regressions and predict plant parameters from LiDAR data.
More specifically, J. Bates et al. (2022) used a neural network to predict field biomass from LiDAR metrics.
In this project, we want to investigate the idea of using CPlantBox (Zhou et al. 2020), a plant structural model, to emulate LiDAR field data, and find relations between LiDAR metrics and plant parameters. CPlantBox is a functional-structural plant model (FSPM) that can generate 3D plant architectures. CPlantBox generates individual plants; in this work we implemented code to loop the process and generate a whole virtual field. Since these virtual fields are composed of nodes and segments, it is possible to apply algorithms to extract artificial LiDAR data (Crop Height and Gap Fraction) from them. It would then be possible to loop this process with different sets of plant parameters, and study the sensitivity of artificial LiDAR metrics to these plant parameters. Finally, by inverting the process, it would be possible to build regressions of plant parameters from artificial LiDAR metrics, and therefore bring new insights into LiDAR-based plant status predictions.
The goals of this project are:
- Parametrize CPlantBox to generate pseudo-realistic wheat fields with different sets of plant parameters.
- Convert CPlantBox field simulations into LiDAR-like data (basically create artificial LiDAR metrics from CPlantBox field simulations).
- Use machine learning techniques to retrieve plant parameters from artificial LiDAR metrics.
In this notebook we first present a code pipeline to generate CPlantBox-based virtual fields. Then we use functions to extract artificial LiDAR metrics (Crop Height and Gap Fraction) from the virtual fields. We loop this process to investigate the sensitivity of these LiDAR metrics to three plant parameters: stem growth rate, stem inter-lateral distance and number of tillers. Finally, we test some very simple machine learning methods (linear regression and neural network) to recover plant parameters from LiDAR metrics.
SET UP
This project consists of a folder containing this notebook as well as R functions and Python functions. The folder is dropped into the general CPlantBox folder. We used the 2022 release of CPlantBox (Schnepfa 2022); general instructions to install CPlantBox are available here. The installation for our personal setup is described in the annexes.
Once CPlantBox is installed, you simply need to put the folder CPB-LiDAR of this repository in CPB/CPlantBox/.
In the following chunk, we define our paths :
path_wheat is the path to Wheat.xml, which is the parameter file read by CPlantBox to generate individual plants
path_python3 is the path to anaconda python to run python files
path_CPB is the path to our python functions
path_R_functions is the path to utility functions for this notebook.
Installation/loading of libraries :
#==========================================================================
<- c("fmsb", "SciViews", "pander", "visreg", "readxl", "EnvStats", "FactoMineR", "factoextra", "knitr", "data.table", "reticulate", "xml2", "tidyverse", "dplyr", "plyr", "viridis", "Hmisc", "ggplot2", "gridExtra", "ggpubr","cowplot", "stats", "emmeans")
libs # "riot"
#==========================================================================
for(pk in libs) {
  if (!requireNamespace(pk, quietly = TRUE)) {
    # If not installed, install the package
    install.packages(pk)
  }
  library(pk, character.only = TRUE)
}
#==========================================================================
fct <- list.files(path_R_functions, pattern = ".R")
for(i in fct){
  # print(paste0("R Functions/", i))
  source(paste0(path_R_functions, i))
}
#==========================================================================
# source("My_Neural_Network_functions.R")
1. DATA DESCRIPTION
The data used in this work are kindly provided by J. Bates, C. Montzka, R. Bajracharya and F. Jonard, and were collected during a flight campaign at the PhenoRob Central Experiment, Campus Klein-Altendorf (CKA), Germany, in 2021. More specifically, we focus on one plot composed of 6 subplots of 1.5 x 3 m each. The team of J. Bates measured Crop Height (CH), Gap Fraction (GF) and Intensity (I) at 7 dates using LiDAR sensors mounted on UAS. They also took destructive measurements: BBCH (development stage) and biomass.
We only used these data to compare and calibrate our simulations.
<- list.files(path = "data/LiDAR_Metrics/", pattern = ".csv")
files
<- data.frame()
LiDAR_metrics for(i in files){
print(i)
<- as.Date(strsplit(i, split = "_")[[1]][1], format="%Y%m%d")
date_i <- read.csv(paste0("data/LiDAR_Metrics/", i))
file_i $date <- date_i
file_i<- rbind(LiDAR_metrics, file_i)
LiDAR_metrics }
[1] "20210419_LiDAR_Metrics.csv"
[1] "20210519_LiDAR_Metrics.csv"
[1] "20210531_LiDAR_Metrics.csv"
[1] "20210614_LiDAR_Metrics.csv"
[1] "20210705_LiDAR_Metrics.csv"
[1] "20210727_LiDAR_Metrics.csv"
[1] "20210805_LiDAR_Metrics.csv"
BBCH <- data.frame(date = unique(LiDAR_metrics$date),
                   BBCH = c(28, 33, 36, 61, 76, 87, 90))
Days <- data.frame(date = unique(LiDAR_metrics$date),
                   day = as.numeric(as.Date(unique(LiDAR_metrics$date)) - as.Date("2020-11-16")))
LiDAR_metrics <- inner_join(LiDAR_metrics, BBCH, by = "date")
# Read tables
BBCH2      <- read_excel(path = "data/myCKA.xlsx", sheet = "BBCH")
Biomass    <- read_excel(path = "data/myCKA.xlsx", sheet = "biomass")
BBCH_scale <- read.csv("data/BBCH_scale.csv")
Destructive data
See the work of Jordan Bates.
biom <- ggplot(data = Biomass) +
  geom_line(aes(x = Date, y = Value, color = Organ)) +
  # geom_smooth(aes(x = Date, y = Value, color = Organ)) +
  facet_wrap(~Type) +
  ggtitle("Biomass per organ") +
  theme_test()

bbch <- ggplot(data = BBCH2) +
  geom_line(aes(x = Date, y = BBCH)) +
  ggtitle("Development Stages") +
  theme_test()
plot_grid(biom, bbch, ncol = 1)
GF_grid <- ggplot(data = LiDAR_metrics,
                  aes(x = x-min(x), y = y-min(y), color = GF, fill = GF)) +
  #geom_point(size = , shape = 15) +
  geom_tile() +
  coord_fixed() +
  theme_test() +
  scale_color_viridis() +
  scale_fill_viridis() +
  xlab("x") +
  ylab("y") +
  ggtitle("Gap Fraction") +
  facet_wrap(~date)
CH_grid <- ggplot(data = LiDAR_metrics,
                  aes(x = x-min(x), y = y-min(y), color = Height, fill = Height)) +
  # geom_point(size = 1, shape = 15) +
  geom_tile() +
  coord_fixed() +
  theme_test() +
  scale_color_viridis() +
  scale_fill_viridis() +
  xlab("x") +
  ylab("y") +
  ggtitle("Crop Height") +
  facet_wrap(~date)
I_grid <- ggplot(data = LiDAR_metrics,
                 aes(x = x-min(x), y = y-min(y), color = Intensity, fill = Intensity)) +
  # geom_point(size = 1, shape = 15) +
  geom_tile() +
  coord_fixed() +
  theme_test() +
  scale_color_viridis() +
  scale_fill_viridis() +
  xlab("x") +
  ylab("y") +
  ggtitle("Intensity") +
  facet_wrap(~date)
# grid.arrange(GF_grid, CH_grid, I_grid, ncol =1)
GF_grid
CH_grid
I_grid
2. CPLANTBOX
In this section we implement functions to use CPlantBox to generate individual plants and group them in a field.
First we define all plant parameters of our field. We use an XML parameter file called Wheat.xml that will be read by CPlantBox, and we update that file whenever we change parameter values. For these tests, results will be stored in the Test folder.
We add some new parameters in the XML that define the field we want to generate: number of rows, columns, plots, etc. These field parameters are not directly read by CPlantBox, but are inputs of custom scripts that loop CPlantBox to arrange individual plants in a field setup.
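The parameter update itself is handled by the custom functions CPB_read_params, CPB_updateParams and CPB_write_param (sourced below). As an illustration of the idea only, here is a minimal sketch of how a single value could be edited in Wheat.xml with the xml2 package; the XPath and attribute names are assumptions, not the actual structure of the CPlantBox parameter file.

# Minimal sketch (not the actual CPB_updateParams implementation): update one
# parameter value in Wheat.xml with xml2. The XPath below is hypothetical and
# must be adapted to the real structure of the CPlantBox parameter file.
library(xml2)

update_wheat_param <- function(xml_path, organ, param_name, value) {
  doc  <- read_xml(xml_path)
  # Hypothetical XPath: an <organ type="..."> node holding <parameter name="..."/>
  node <- xml_find_first(
    doc, sprintf("//organ[@type='%s']//parameter[@name='%s']", organ, param_name))
  xml_set_attr(node, "value", as.character(value))
  write_xml(doc, xml_path)
}

# Hypothetical call: set the stem growth rate to 10
# update_wheat_param(path_wheat, organ = "stem", param_name = "r", value = 10)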
simtime = 130.
dx = 10.
N_iter = simtime/dx
date = Sys.Date()
#==================================================================================
param_file = path_wheat
grid_size = 20
#==================================================================================
# Field parameters
N_row = 12 # number of rows
N_col = 6 # number of columns
N_plots = 2 # number of plots
dist_col = 25 # distance between the root systems within a row [cm]
dist_row = 25 # distance between crop rows
dist_plot = N_col*dist_row + 40 # distance between crop plots
#==================================================================================
# Plant parameters
#==================================================================================
## Stem
stem_r = 10 # 1
stem_lmax = 100
stem_ln = 50
stem_theta = 0.1
#==================================================================================
## Tiller
maxTi = 1
# other tiller parameters?
#==================================================================================
## leaf
leaf_lb = 8 # basal length = 8
leaf_la = 35 # apical length = 35
leaf_ln = 0 # inter-lateral dist
leaf_lmax = 43 # max length
leaf_r = 4 # growth rate
leaf_a = 0.1 # radius
leaf_Width_petiole = 0.35 # petiole width
leaf_Width_blade = 1.5 # blade width
leaf_shapeType = 2 # shape type
leaf_tropismT = 1 # Tropism 1
leaf_tropismN = 9.81 # Gravitropism
leaf_tropismS = 0.1 # Tropism 2
leaf_dx = 2 # Axial Resolution
leaf_dxMin = 1
leaf_theta = 0.1
leaf_rit = 1000000000
leaf_gf = 3 # 3
leaf_areaMax = 100
leaf_geometryN = 100
# =================================================================================
# SET PARAMS
# =================================================================================
source("R Functions/Write_Params.R")
<- read.csv("data/Param_ID.csv")
Param_ID <- CPB_read_params(path_wheat) # Read XML param file
Params <- CPB_updateParams(Params) # Update the parameters
Params CPB_write_param(Params, path = path_wheat) # Write XML param file
<- join(Params, Param_ID)
Params <- Params[c("paramID", "param_value")]
Params $simulationID <- NaN Params
Once we have defined our parameters, we write a CPlantBox Python file containing instructions, which will be run in the Linux subsystem.
This is done with the function Write_CPB.
The argument type can be set to “solo” to simulate one plant, “field” to simulate a field or “field_TS” to simulate time series of one field.
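Write_CPB itself is a custom helper sourced from R Functions/. As a rough illustration of the idea (not the actual implementation), the following sketch writes a minimal CPlantBox Python script from R; the Python calls mimic the CPlantBox tutorial examples, and the exact API of your release may differ.

# Sketch only: the real Write_CPB supports the "solo", "field" and "field_TS"
# modes and many more options. The Python API used in the string below is an
# assumption based on the CPlantBox tutorial examples.
write_cpb_solo_sketch <- function(param_filename, CPB_filename, simtime) {
  script <- c(
    "import plantbox as pb",
    "plant = pb.Plant()",
    sprintf("plant.readParameters('%s')", param_filename),
    "plant.initialize()",
    sprintf("plant.simulate(%s)", simtime),
    "plant.write('solo_plant.vtp')")
  writeLines(script, CPB_filename)
}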
Solo plant
With the following code, we run CPlantBox for one unique plant and plot the result.
#==================================================================================
# Write CPB executable
Write_CPB(type = "solo", # To generate one plant
param_filename = "Wheat.xml", # Will be read by CPB
CPB_filename = "python_files/CPB_solo.py", # Path to save the file
simtime = simtime, # Simulation time
plot = T) # Plot or not the simulation
#==================================================================================
# Unleash CPlantBox
CPlantBox(path_python3, "python_files/CPB_solo.py")
#==================================================================================
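The CPlantBox() call above is another custom helper from R Functions/. A minimal sketch of what such a wrapper could look like, assuming it simply forwards the generated script to the Linux Python interpreter through WSL:

# Assumption: the wrapper just runs the script with the Linux python3 via WSL
# and returns the console output. The actual helper may do more.
CPlantBox_sketch <- function(path_python3, script) {
  system2("wsl", args = c(path_python3, script), stdout = TRUE, stderr = TRUE)
}
# CPlantBox_sketch(path_python3, "python_files/CPB_solo.py")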
This generates a single plant looking like this:
Field
Now we can assemble multiple plants to generate a virtual field. We can run either single date simulation or time series.
For fixed time :
Write_CPB(type = "field",
param_filename = "Wheat.xml",
CPB_filename = "python_files/CPB.py",
params_df = Params,
export_path = "test/",
sim_name = "Test",
plot = T, # Plot the field
vtp = T) # Save in VTP the virtual field
CPlantBox(path_python3, "python_files/CPB.py")
This generates a virtual field looking like this:
This rendering is done with the custom function plot_field, during the CPlantBox looping in WSL.
For time series :
# TIME SERIES
Write_CPB(type = "field_TS",
param_filename = "Wheat.xml",
CPB_filename = "python_files/CPB_timeseries.py",
params_df = Params,
export_path = "test/",
sim_name = "Test",
plot = F,
vtp = T)
CPlantBox(path_python3, "python_files/CPB_timeseries.py")
Load results
Results from CPlantBox are stored as a point cloud in a CSV file. This is a test simulation, just to show what the output looks like.
= read.csv("test/Test.csv") # Load the csv
coords #==============================================================================
<- ggplot(data = coords) +
result_map geom_point(aes(x = X, y = Y, color = Z), size = 0.01) +
coord_fixed() +
scale_color_viridis() +
theme_test()
#==============================================================================
result_map
For now, leaves are represented as points along their center line. CPlantBox does not yet allow exporting points from leaf surfaces, only polygons. Future work can be done here.
3. EXTRACT ARTIFICIAL LiDAR METRICS
With the following code, we apply algorithms to extract LiDAR metrics (Crop Height and Gap Fraction) from the point cloud generated by CPlantBox.
First we extract Crop Height (CH), Gap Fraction (GF) and Density from the point cloud, per grid cell:
source("R Functions/Extract_Metrics.R")
Metrics <- Extract_Metrics(coords, gridsize = 20, method = "max") # Extract CH and GF Metrics

h1 <- ggplot(data = Metrics) +
  geom_histogram(aes(x = CH), bins = 20) +
  xlab("Crop Height") +
  ylab("Count") +
  ggtitle("Crop Height Distribution") +
  theme_bw()

h2 <- ggplot(data = Metrics) +
  geom_histogram(aes(x = GF), bins = 20) +
  xlab("Gap Fraction") +
  ylab("Count") +
  ggtitle("Gap Fraction Distribution") +
  theme_bw()

h3 <- ggplot(data = Metrics) +
  geom_histogram(aes(x = Density), bins = 20) +
  xlab("Density") +
  ylab("Count") +
  ggtitle("Density Distribution") +
  theme_bw()
plot_grid(h1, h2, h3)
Grid_density <- ggplot(data = Metrics) +
  geom_tile(aes(x = X, y = Y, fill = Density)) +
  scale_color_viridis() +
  scale_fill_viridis() +
  coord_fixed() +
  theme_test()

Grid_GF <- ggplot(data = Metrics) +
  geom_tile(aes(x = X, y = Y, fill = GF)) +
  scale_color_viridis() +
  scale_fill_viridis() +
  coord_fixed() +
  theme_test()

Grid_CH <- ggplot(data = Metrics) +
  geom_tile(aes(x = X, y = Y, fill = CH)) +
  scale_color_viridis() +
  scale_fill_viridis() +
  coord_fixed() +
  theme_test()
plot_grid(Grid_density, Grid_GF, Grid_CH)
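The internals of Extract_Metrics are not shown in this notebook. The following is only a minimal sketch of the gridding idea with dplyr; the empty-sub-cell proxy used for Gap Fraction is an assumption, not the formula actually implemented.

# Minimal sketch (assumption, not the actual Extract_Metrics code): assign every
# point to a grid cell, take the maximum Z as Crop Height, the point count as
# Density, and approximate Gap Fraction by the share of empty fine sub-cells.
library(dplyr)

extract_metrics_sketch <- function(coords, gridsize = 20, subsize = 2) {
  coords %>%
    mutate(cell_x = floor(X / gridsize),
           cell_y = floor(Y / gridsize),
           sub_x  = floor(X / subsize),
           sub_y  = floor(Y / subsize)) %>%
    group_by(cell_x, cell_y) %>%
    summarise(
      CH      = max(Z),                                  # crop height per cell
      Density = n(),                                     # number of points per cell
      GF      = 1 - n_distinct(sub_x, sub_y) /
                    (gridsize / subsize)^2,              # share of empty sub-cells
      .groups = "drop") %>%
    mutate(X = cell_x * gridsize, Y = cell_y * gridsize) # cell corner coordinates
}

# head(extract_metrics_sketch(coords, gridsize = 20))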
Next we can also extract Gap Fraction per layer, similarly to how J. Bates et al. (2022) extracted Gap Fraction metrics from the point cloud.
source("R Functions/Extract_GF_layers.R")
GF_Layers <- Extract_GF_layers(coords, gridsize = 20, DT = 1, method = "mean", n_layers = 6)
We therefore extracted here the Gap Fraction in 6 layers of 20 cm height each.
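Again, Extract_GF_layers is a custom function; as a rough sketch only (reusing the same empty-sub-cell proxy as above, which is an assumption), per-layer Gap Fraction can be obtained by first binning points into height layers:

# Minimal sketch of per-layer Gap Fraction (assumption, not the actual
# Extract_GF_layers code): bin points into horizontal layers of fixed height,
# then compute the empty-sub-cell proxy within each grid cell and layer.
library(dplyr)

extract_gf_layers_sketch <- function(coords, gridsize = 20, layer_height = 20,
                                     n_layers = 6, subsize = 2) {
  coords %>%
    mutate(cell_x = floor(X / gridsize),
           cell_y = floor(Y / gridsize),
           layer  = pmin(floor(Z / layer_height) + 1, n_layers),
           sub_x  = floor(X / subsize),
           sub_y  = floor(Y / subsize)) %>%
    group_by(cell_x, cell_y, layer) %>%
    summarise(GF = 1 - n_distinct(sub_x, sub_y) / (gridsize / subsize)^2,
              .groups = "drop") %>%
    mutate(X = cell_x * gridsize, Y = cell_y * gridsize)
}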
ggplot(data = GF_Layers) +
geom_tile(aes(x = X, y = Y, fill = GF)) +
scale_color_viridis() +
scale_fill_viridis() +
facet_wrap(~layer) +
coord_fixed() +
ggtitle("Layers of Gap Fraction") +
theme_test()
4. RUN & LOOP THROUGH PIPELINE
Now that our functions are ready, we can loop the pipeline with different plant parameters.
First we define our simulation parameters :
#==================================================================================
# Setup
#==================================================================================
rm(list=ls())
setwd(dirname(rstudioapi::getActiveDocumentContext()$path)) # Set working directory to the current file
# DEFINE PATHS
<- "Wheat.xml"
path_wheat <- "/home/mdago/anaconda3/envs/CPB/bin/python3"
path_python3 # path_python3 <- "/home/mdago/anaconda3/envs/CPB2/bin/python3"
<- "python_files/"
path_CPB <- "R Functions/"
path_R_functions #==================================================================================
<- list.files(path_R_functions, pattern = ".R")
fct for(i in fct){
# print(paste0("R Functions/", i))
source(paste0(path_R_functions, i))
}#==================================================================================
# Set basic parameters
#==================================================================================
= 140.
simtime # dx = 60.
# N_iter = simtime/dx
= Sys.Date()
date = path_wheat
param_file = 15
grid_size #==================================================================================
# Field parameters.
#==================================================================================
N_row = 18 # number of rows
N_col = 6 # number of columns
N_plots = 2 # number of plots
dist_col = 25 # distance between the root systems within a row [cm]
dist_row = 25 # distance between crop rows
dist_plot = N_col*dist_row + 40 # distance between crop plots
#==================================================================================
# Plant parameters
#==================================================================================
# Stem
stem_r = 5 # 1 # stem growth rate
stem_lmax = 100 # stem max length
stem_ln = 30 # stem inter-lateral distance
stem_theta = 0.1 # stem insertion angle
# Tiller
maxTi = 5 # max number of tillers
# leaf
leaf_lb = 8 # basal length = 8
leaf_la = 35 # apical length = 35
leaf_ln = 0 # inter-lateral dist
leaf_lmax = 43 # max length
leaf_r = 4 # growth rate
leaf_a = 0.1 # radius
leaf_Width_petiole = 0.35 # petiole width
leaf_Width_blade = 0.9 # blade width
leaf_shapeType = 2 # shape type
leaf_tropismT = 1 # Tropism 1
leaf_tropismN = 9.81 # Gravitropism
leaf_tropismS = 0.1 # Tropism 2
leaf_dx = 2 # Axial Resolution
leaf_dxMin = 1
leaf_theta = 0.1
leaf_rit = 1000000000
leaf_gf = 3 # 3
leaf_areaMax = 50
leaf_geometryN = 100
Next, we define which parameters will vary and how.
In this code we carried out a sensitivity analysis of the effect of three plant parameters on LiDAR metrics:
sln: stem inter-lateral distance, i.e. the distance between two leaves. It is an interesting parameter since it defines the total number of leaves, and is therefore expected to influence Gap Fraction in the different layers
max_t: the maximum number of tillers. This is an interesting parameter to know in the field since it defines the development stage of the plant, and is expected to greatly influence Gap Fraction through point density
s_r: the stem growth rate. It is a very important physiological parameter and is expected to influence Crop Height.
# ======================================================================
# Define parameters to loop
# ======================================================================
# Stem inter lateral distance
sln <- round(runif(50, min = 5, max = 50), digits = 1) # stem inter-lat distance
PS <- data.frame(stem_ln = sln)

# Max number of tillers
max_t <- round(runif(2, min = 1, max = 10), digits = 0)
PS <- data.frame(seed_true_maxTi = max_t)

# Stem Growth Rate
sr <- round(runif(50, min = 1, max = 20), digits = 1)
PS <- data.frame(stem_r = sr)
Run the pipeline. This creates a new directory in the results folder, named with the date. Each simulation will be stored in this new folder.
<- "maxTi" # Indicate here the parameter to vary (check chunk 16)
param_name <- "python_files/CPB.py"
CPB_filename
# RUN CPB LOOP
source("CPB_LOOP.R")
In the next chunk we load each field simulation and store it in a single database: results_clean. This table contains, for each field, a “map” of Gap Fraction, Crop Height and Density.
To simplify this table and extract unique information for each field, we constructed LiDAR features that summarize each simulation with single values. The most obvious features are the mean Gap Fraction and mean Crop Height per field. We also built “repartition” features, which count, for each field, the number of points falling between the four quartiles. In addition, we extracted mean Gap Fraction values for 20 cm height layers.
To summarize: each field simulation is characterized by the plant input parameter and 16 feature values. These feature values will become the explanatory variables when we try to recover plant parameters from LiDAR data.
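The feature construction itself is done by the custom functions Make_Results, Make_GF_features and As_Database2 used below. As a minimal sketch of the idea only (column names such as simulationID, GF, CH and layer are assumptions here):

# Sketch of per-simulation feature building (not the actual Make_* functions).
library(dplyr)
library(tidyr)

make_features_sketch <- function(results, GF_layers_all) {
  base <- results %>%
    group_by(simulationID) %>%
    summarise(Mean_GF = mean(GF),
              Mean_CH = mean(CH),
              # "repartition" features: share of cells below each CH quartile
              CHR1 = mean(CH <= quantile(CH, 0.25)),
              CHR2 = mean(CH <= quantile(CH, 0.50)),
              .groups = "drop")

  layers <- GF_layers_all %>%
    group_by(simulationID, layer) %>%
    summarise(GF = mean(GF), .groups = "drop") %>%
    pivot_wider(names_from = layer, values_from = GF, names_prefix = "GF_layer")

  left_join(base, layers, by = "simulationID")
}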
# =============================================================================
<- "results/2023-12-14_1/" # Change the directory to last simulation
dir_name # =============================================================================
<- read.csv(paste0(dir_name, "Results.csv"))
results <- read.csv(paste0(dir_name, "GF_layers_all.csv"))
GF_layers_all <- read.csv(paste0(dir_name, "Simulations.csv"))
simulation <- read.csv(paste0(dir_name, "Parametric_Space.csv"))
PS colnames(PS)[2] <- "simulationID"
# =============================================================================
<- Make_Results(results, simulation, PS)
results_clean # write.csv(results_clean, paste0(dir_name,"results_clean.csv"), row.names = F)
# =============================================================================
<- Make_GF_features(GF_layers_all, simulation, PS) # This takes lottttsss of time
GF_features <- left_join(results_clean, GF_features)
results_full
write.csv(results_full, paste0(dir_name,"results_full.csv"), row.names = F)
# EXPORT FULL SIMULATION RESULTS AS TABLE
<- As_Database2(results_full) # Extract Features
results_clean2 write.csv(results_full, "results/Tillers.csv", row.names = F) # Change name of simulation here
5. RESULTS ANALYSIS
Once the simulations are done, we can load the data and analyse the sensitivity of LiDAR metrics to plant parameters.
Summary of simulations:
- 2023-12-14_2 : stem inter-lateral distance - 50 simulations
- 2023-12-14_1 : leaf_width_blade - 50 simulations (not relevant because leaves are only characterized by the central line)
- 2023-11-24_1 : stem growth rate (stem_r) - 20 simulations
- 2023-11-06_2 : maximum number of tillers (seed_true_maxTi) - 100 simulations
Effect of the number of tillers
The number of tillers has a strong effect on LiDAR features, and clear trends are visible (e.g. GF_layer6).
Effect of stem inter lateral distance
Stem inter-lateral distance seems to have an effect on some features (e.g. CHR2, GF_layer2), but not as strong as the number of tillers.
Effect of stem growth rate
The stem growth rate does not seem to have much influence on LiDAR features.
6. MACHINE LEARNING REGRESSION
In this part we focus on the effect of the number of tillers on LiDAR features, and try to regress this plant parameter.
We first split the data into a training set and a test set. We define data_clean as our database for the machine learning: we only keep the features and the plant parameter (renamed “t”, for target).
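The exact splitting code is not shown in this notebook output; a minimal sketch, assuming the target column in results_clean2 is the tiller parameter and using an 80/20 split (consistent with the 80 training samples reported below):

# Sketch only: column names and the 80/20 ratio are assumptions.
library(dplyr)

set.seed(42)
data_clean <- results_clean2 %>%
  rename(t = seed_true_maxTi) %>%   # the plant parameter becomes the target "t"
  select(t, where(is.numeric))      # keep the target and the numeric LiDAR features

train_idx <- sample(seq_len(nrow(data_clean)), size = round(0.8 * nrow(data_clean)))
train_set <- data_clean[train_idx, ]
test_set  <- data_clean[-train_idx, ]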
0) Model Selection
Before training models, we can try to find the best model to choose. We use here the caret library for model selection, and the neuralnet package to build the neural network regression.
How do we decide which features will actually be used? We implement a recursive feature selection, where we build models with different feature subsets and compare their performance.
In rfeControl, we define the type of algorithm (functions; here we apply linear functions) and the cross-validation method (method; here repeatedcv means repeated K-fold cross-validation).
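The chunk that produced the output below is not shown; a minimal caret sketch consistent with it (linear functions, 25-fold cross-validation repeated twice) could look like this, the sizes vector and the seed being assumptions:

library(caret)

set.seed(42) # assumption
rfe_ctrl <- rfeControl(functions = lmFuncs,      # linear models
                       method    = "repeatedcv", # repeated K-fold CV
                       number    = 25,
                       repeats   = 2)
rfe_fit <- rfe(x = train_set[, setdiff(names(train_set), "t")],
               y = train_set$t,
               sizes = 1:10,                     # subset sizes to test (assumption)
               rfeControl = rfe_ctrl)
rfe_fit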
Recursive feature selection
Outer resampling method: Cross-Validated (25 fold, repeated 2 times)
Resampling performance over subset size:
Variables RMSE Rsquared MAE RMSESD RsquaredSD MAESD Selected
1 2.1178 0.4663 1.8427 0.6166 0.3261 0.5542
2 1.1573 0.8044 0.9806 0.4490 0.2162 0.4129
3 1.0568 0.8434 0.9063 0.3689 0.1709 0.3453
4 1.0757 0.8404 0.9216 0.3713 0.1764 0.3517
5 1.0921 0.8280 0.9323 0.3898 0.2178 0.3654
6 1.0585 0.8317 0.9067 0.3921 0.2218 0.3622
7 1.0497 0.8415 0.9031 0.3815 0.2088 0.3493
8 1.0549 0.8396 0.9049 0.3790 0.2082 0.3448
9 1.0344 0.8546 0.9151 0.3491 0.1964 0.3494
10 0.9875 0.8746 0.8719 0.3354 0.1835 0.3376
16 0.9696 0.8868 0.8330 0.3009 0.1594 0.2976 *
The top 5 variables (out of 16):
Mean_GF, GF_layer6, GF_layer1, GF_layer2, GF_layer4
[1] "Mean_GF" "GF_layer6" "GF_layer1"
This analysis shows that the top 3 variables already give a large drop in RMSE: the 3 best variables are Mean_GF, GF_layer6 and GF_layer1.
1) Linear model
We test here a very simple linear regression on GF_layer6 and GF_layer1.
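The fitting call behind the table below is not displayed in the notebook output; a minimal sketch:

# Fit on the training set and evaluate on the test set
lm_fit <- lm(t ~ GF_layer6 + GF_layer1, data = train_set)
summary(lm_fit)

Pred_lm <- predict(lm_fit, newdata = test_set)   # reused later for the comparison plot
sqrt(mean((test_set$t - Pred_lm)^2))             # RMSE on the test set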
Results are satisfying: we obtain \(R^2 = 0.81\).
|             | Estimate  | Std. Error | t value   | Pr(>\|t\|) |
|-------------|-----------|------------|-----------|------------|
| (Intercept) | 2.337e-15 | 0.04865    | 4.804e-14 | 1          |
| GF_layer6   | -0.6527   | 0.07239    | -9.016    | 1.116e-13  |
| GF_layer1   | -0.307    | 0.07239    | -4.241    | 6.13e-05   |

| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|--------------|---------------------|---------|------------------|
| 80           | 0.4351              | 0.8154  | 0.8106           |
Let’s see how it behaves on the test set:
RMSE : 0.437022209286027 | SSE : 1.9098841140924
We get a Root Mean Square Error of 0.437.
2) Neural network
We select here a neural network from the neuralnet package, and we train it using caret:
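The training chunk is not shown; a minimal caret sketch consistent with the output below (method "neuralnet", layer1 tuned over 1, 3 and 5 hidden neurons, layer2 and layer3 held at 0, 10-fold cross-validation repeated 5 times):

library(caret)
library(neuralnet)

set.seed(42) # assumption
nn_fit1 <- train(t ~ .,
                 data      = train_set,
                 method    = "neuralnet",
                 tuneGrid  = expand.grid(layer1 = c(1, 3, 5), layer2 = 0, layer3 = 0),
                 trControl = trainControl(method = "repeatedcv", number = 10, repeats = 5))
nn_fit1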
Neural Network
80 samples
16 predictors
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 72, 72, 72, 73, 72, 71, ...
Resampling results across tuning parameters:
layer1 RMSE Rsquared MAE
1 0.3825465 0.8703528 0.3065804
3 0.4775939 0.8006193 0.3762550
5 0.5568222 0.7447823 0.4468709
Tuning parameter 'layer2' was held constant at a value of 0
Tuning parameter 'layer3' was held constant at a value of 0
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were layer1 = 1, layer2 = 0 and layer3 = 0.
Let’s see the results
# Predict response of test set
NN_results1 <- predict(nn_fit1, test_set)

# pred_df <- data.frame(NN = NN_results1, LM <- Pred_lm, true = test_set$t)
pred_df <- data.frame(NN = NN_results1, LM = Pred_lm, true = test_set$t)
ggplot(pred_df) +
# geom_point(aes(x = true, y = LM), color = "blue") +
geom_point(aes(x = true, y = NN), color = "red") +
geom_abline() +
theme_bw()
Er = RMSE(test_set$t, NN_results1)
Er2 = SSE(test_set$t, NN_results1)
print(paste0("RMSE : ", Er, " | SSE : ", Er2))
[1] "RMSE : 0.447593358888904 | SSE : 2.00339814921451"
It is pretty good too.
# Run NN manually
NN1 <- neuralnet(data = train_set,
                 formula = t ~ Mean_GF + Mean_CH + GFR2 + GFR1 + GF_layer6,
                 hidden = c(1),
                 rep = 20,
                 lifesign = "none",
                 lifesign.step = 2000,
                 algorithm = "backprop",
                 threshold = 1,
                 learningrate = 0.0001)
NN_results1 <- predict(NN1, test_set)

Er = RMSE(test_set$t, NN_results1)
Er2 = SSE(test_set$t, NN_results1)
print(paste0("RMSE : ", Er, " | SSE : ", Er2))
Conclusion
In this document, we provide a pipeline to generate virtual fields using CPlantBox and to extract artificial LiDAR metrics (Crop Height and Gap Fraction) from them. We also provide code to build summary features from the LiDAR metric maps.
We looped the pipeline to study the sensitivity of LiDAR features to different plant parameters (namely stem growth rate, stem inter-lateral distance and maximum number of tillers). Our results show that the number of tillers greatly influences LiDAR features.
In the last part, we tried to recover the number of tillers from LiDAR features using a linear regression and a neural network model. The results are promising and call for further investigation, for example with more plant parameters and more computing power. It would also be interesting to study the combined influence of plant parameters on LiDAR metrics.
Although our virtual plants and fields are not yet visually realistic, the plant parameters and the model can still be improved to increase the precision of the representations.
We show here a proof of concept that structural modeling can help predict plant status from LiDAR data. Such a methodology could reduce the cost of destructive measurements and improve predictions of specific plant parameters.
Appendix : CPlantBox installation for our particular setup
This project was implemented on Windows 10/11.
The R version is 4.3.2 (installed 22-12-2023) and RStudio version is 01-09-2023
CPlantBox runs under Linux, so we needed to install the Windows Subsystem for Linux (WSL). It is easy to install with these instructions.
1) Install and setup Anaconda in Linux
We downloaded the Anaconda installation file here. We took the most recent file looking like […]linux-x86-64.sh
- Put it in $HOME (your virtual Linux directory, something like \\wsl.localhost)
- Open a command prompt and run wsl
- Go to your home directory: $ cd $HOME
- Run bash [...]linux-x86-64.sh, with […] the rest of your Anaconda file name. This installs Anaconda to your Linux subsystem.
- Create an environment for CPlantBox:
  $ conda create -n CPB1 python=3.9
  $ conda activate CPB1
  or $ source activate CPB1
- NB : Make sure that the version of your new environment is the same as the one indicated in the file plantbox.cpython-310-x86_64-linux-gnu.so (here 3.10)
2) Install CPlantBox
- Download CPlantBox source code. This pipeline uses the 2022 release (Schnepfa 2022).
- Create a CPB folder and unzip the source code
- Open Prompt and launch WSL
- Check that the file installCPlantBox.py is in the directory
- Run
  $ python3 installCPlantBox.py
  This can be tricky, so don’t hesitate to contact me or someone from the ROSI team. The installation can take a few minutes.
- Once it is complete, try one example:
$ python3 CPB/CPlantBox/tutorial/examples/python/example_plant.py