name: layout-general layout: true class: left, top <style> .remark-slide-number { position: inherit; } .remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 4px; display: block; left: 0; right: 0; } .remark-slide-number .progress-bar { height: 100%; background-color: rgba(2, 70, 79, 0.874); } .remark-slide-number .numeric { position: absolute; bottom: 4%; height: 4px; display: block; right: 2.5%; font-weight: bold; } .remark-slide-number .numeric-out{ color: rgba(2, 70, 79, 0.874); } </style> --- count: false class: middle, center, title-slide <img src="img/sorbonne.png" align="left" height="90"> <img src="img/LPSM.jpg" align="middle" height="100"> <img src="img/UniversiteParisCite_logo.jpg" align="right" height="90"> <hr class="L1"> ## Modèles prédictifs par agrégation consensuelle et applications <hr class="L2"> #### Soutenance de thèse de Doctorat<br><br> en Mathématiques de <br><br>Sorbonne Université .fl.w-30.pa3.f6[ .left[ .bo[.lgt-green[Présentée par :]] .bo[.darkgreen-nor[Sothea .textsc[Has]]]] ] .fl.w-40.pa3.f6[ .bo[.lgt-green[11 juillet 2022]] ] .fl.w-30.pa3.f6[ .right[ .bo[.lgt-green[Directrices de thèse :]] .bo[.darkgreen-nor[Mathilde .textsc[Mougeot]]]<br> .bo[.darkgreen-nor[Aurélie .textsc[Fischer]]]] ] </br></br> --- exclude: true class: left, middle .fl.w-60.pa2.f6[ ### Main focus<hbr> - Supervised learning : using input `\((X_1, ..., X_d)\)` to predict output `\((Y)\)`. | `\(X_1\)` | `\(\dots\)` | `\(X_d\)` | `\(Y\)` | | -------|---------|-------|---- | `\(x_{11}\)` | `\(\dots\)` | `\(x_{1d}\)` | `\(y_1\)` | | `\(\vdots\)`| `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | | `\(x_{n1}\)`| `\(\vdots\)` | `\(x_{nd}\)` | `\(y_n\)` | ### Main question<hbr> - How to use information such as - .stress[clustering structure] of the input - .stress[predicted features] (given by a list of estimators: small or large) for .stress[prediction]? ] .fl.w-40.pa2.f6[ <br> <br> <br> <br> <br> .center[ <img src="./img/tools.png" width="250px"/>] ] ??? - show picture after point 2. - - clustering step is very crucial for the final prediction - Finding the right configuration is not easy. - .stress[KFC] procedure aims at flexibly combining both .stress[clustering] and .stress[modeling] for prediction. --- class: left, middle .fl.w-70.pa2.f6[ ### Main focus<hbr> - Supervised learning : using input `\((X_1, ..., X_d)\)` to predict output `\((Y)\)`. .center[ | `\(X_1\)` | `\(X_2\)` | `\(\dots\)` | `\(X_d\)` | `\(Y\)` | |--------|-------|---------|-------|---- | `\(x_{11}\)` | `\(x_{12}\)` | `\(\dots\)` | `\(x_{1d}\)` | `\(y_1\)` | | `\(\dots\)` | `\(\dots\)` | `\(\dots\)` | `\(\dots\)` | `\(\dots\)` | | `\(x_{n1}\)` | `\(x_{n2}\)`| `\(\dots\)` | `\(x_{nd}\)` | `\(y_n\)` | <hbr> ] ### Main question<hbr> - How to use .stress[clustering structure of input] for .stress[prediction]? .center[<img src="./img/goal.png" width="400px"/>] ] .fl.w-30.pa2.f6[ <br> <br> <br> <br> <br> .center[ <img src="./img/tools.png" width="220px"/>] ] --- template: inter-slide class: left, middle count: false ##
.bold-blue[Outline] <br> .hhead[I. KFC procedure : clusterwise prediction via aggregation of distances] <br> .hhead[II. A kernel-based consensual aggregation for regression] <br> .hhead[III. Aggregation on projected HD predicted features for regression] --- template: inter-slide class: left, middle count: false ##
.bold-blue[Outline] <br> .section[I. KFC procedure : clusterwise prediction via aggregation of distances] <br> .hhead[II. A kernel-based consensual aggregation for regression] <br> .hhead[III. Aggregation on projected HD predicted features for regression] --- exclude: true class: left, top ## Two-step approach <hbr> The clustering step is crucial, and finding the right one is not easy! .fl.w-50.pa2.f6[
] .fl.w-50.pa2.f6[
] --- class: left, top .fl.w-60.pa2.f6[ ### Notation<hbr> - `\((X,Y)\in\mathbb{R}^d\times\mathcal{Y}\)`: input-output generic data - `\(\mathcal{Y}=\mathbb{R}\)` (regression) or `\(\{1,...,N\}\)` (classification) - `\(iid\)` training data of size `\(n\)`: `$$\mathcal{D}_n=\{(X_1,Y_1),...,(X_n,Y_n)\}$$` ### Case of interest<hbr> - Input is composed of `\(K\)` clusters (known) - The group structure is unknown - The underlying models may be different on different clusters. ### Objective<hbr> - Making good predictions. ] .fl.w-40.pa2.f6[
- Two-step approach: <hbr> - [Auder and Fischer (2012)](https://www.tandfonline.com/doi/abs/10.1080/00949655.2011.572882) <hbr> - [Devijver et al. (2015)](https://arxiv.org/abs/1507.00167)<hbr> - [Keita et al. (2015)](https://hal.laas.fr/UNIV-HESAM/hal-02471608v1), ... ] ??? - Clustering is crucial - Final performance depends strongly on clustering step --- class: left, top ## KFC procedure<hbr> .subtitle[ K-means / Fitting / Combining]<hbr> - It is a three-step approach, aiming at flexibly combining .stress[clustering] and .stress[modeling]. ------ .fl.w-10.pa2.f6[ <br> <br> .stress[Step K] <br> <br> <br> <br> <br> <br> <br> .stress[Step F] <br> <br> <br> .stress[Step C] ] .fl.w-30.pa2.f6[
<br> `$$\underbrace{\{\mathcal{M}_{1,1},\mathcal{M}_{1,2},\mathcal{M}_{1,3}\}}_{\mathcal{M}_1}$$` ] .fl.w-30.pa2.f6[
<br> `$$\underbrace{\{\mathcal{M}_{2,1},\mathcal{M}_{2,2},\mathcal{M}_{2,3}\}}_{\mathcal{M}_2}$$` <br> <h1br> .center[.stress[Combining]] `$$(\mathcal{M}_1, \mathcal{M}_2, \mathcal{M}_3)$$` ] .fl.w-30.pa2.f6[
<br> `$$\underbrace{\{\mathcal{M}_{3,1},\mathcal{M}_{3,2},\mathcal{M}_{3,3}\}}_{\mathcal{M}_3}$$` ] <br> <br> <br> <br> <br> <br> <br> -------- <br> <br> <br> <h1br> -------- <br> <br> <br> -------- ??? - The two-step approach is implemented with several distances - Then combine --- ### Step K : K-means with Bregman divergences (BD)<hbr> .subtitle[ [Bregman (1967)](https://www.sciencedirect.com/science/article/abs/pii/0041555367900407)]<hbr> .hhead[Definition] Let `\(\phi:\mathcal{C}\subset\mathbb{R}^d\rightarrow\mathbb{R}\)` be strictly convex and of class `\(C^1(\cal C)\)`; then, for any `\((x,y)\in\mathcal{C}\times int(\mathcal{C})\)`, $$ d_{\phi}(x,y)=\phi(x)-\phi(y)-\langle x-y,\nabla\phi(y)\rangle $$ <h1br> .fl.w-60.pa2.f6[ .hhead[Property] - Non-negativity, separability, convexity (first argument), linearity in index function - .stress[Mean as minimizer]: `\(E(X)=\arg\min_{c}\mathbb{E}[d_{\phi}(X,c)]\)` <br> .hhead[K-means with BD] - [Banerjee et al. (2005)](https://www.jmlr.org/papers/v6/banerjee05b.html)<hbr> - [Fischer (2010)](https://www.sciencedirect.com/science/article/pii/S0047259X10001156). ] .fl.w-40.pa2.f6[ .center[<img src="./img/bd.png" height="270px" width="400px"/>] ] --- exclude: true ## Bregman divergences [[Bregman (1967)](https://www.sciencedirect.com/science/article/abs/pii/0041555367900407)]<hbr> .subtitle[ Step K]<hbr> ### Mean minimizer property [[Banerjee et al. (2005)](https://www.jmlr.org/papers/v6/banerjee05b.html)]<h1br> ----- If `\(U\)` is a random variable defined on an open subset `\(\mathcal{O}\subset\mathbb{R}^d\)`, then `$$\mathbb{E}[U]=\arg\min_{x\in\mathcal{O}}\mathbb{E}[d_{\phi}(U,x)]$$` for any index function `\(\phi\)`. ----- ### Remark - If `\(\phi = \|.\|_2^2\Rightarrow d_{\phi}(x,y)=\|x-y\|_2^2\)`: we recover the variance case. - This motivates the K-means algorithm. --- exclude: true ## Exponential families<hbr> .subtitle[ Still step K]<h0br> ### Definition<h1br> ------ `\(X\)` is a member of an exponential family `\(\mathcal{E}_{\psi}\)` if `$$f(x|\theta)=h(x)\exp(\langle\theta,T(x)\rangle-\psi(\theta)), \theta\in\Theta$$` ----- - An exponential family is said to be .stress[regular] if - `\(T\)` is not redundant : `\(\nexists \alpha\neq0: \langle \alpha,T(x)\rangle=\)` constant `\(\forall x\)` - `\(\Theta\)` is open.<h0br> ### Theorem [[Banerjee et al. (2005)](https://www.jmlr.org/papers/v6/banerjee05b.html)]<h1br> ----- If `\(X\)` is a member of a .stress[regular] exponential family `\(\mathcal{E}_{\psi}\)` and if `\(\phi\)` is the convex conjugate of `\(\psi\)` defined by `\(\phi(x)=\sup_{y}\{\langle x,y\rangle-\psi(y)\}\)`, then there exists a unique Bregman divergence `\(d_{\phi}\)` such that `$$f(x|\theta)=h(x)\exp(-d_{\phi}(T(x),\mathbb{E}[T(X)])+\phi(T(x)))$$` ----- --- class: left, top ### Step F : Fitting clusterwise models<hbr> .subtitle[ Fitting local models & candidate models]<hbr> - For each Bregman divergence, simple models are fitted on all clusters in .stress[Step F].<hbr> - Linear regression (regression)<hbr> - Logistic regression (classification)<hbr> ------ .fl.w-10.pa2.f6[ <br> <br> .stress[Step K] <br> <br> <br> <br> <br> <br> <br> .stress[Step F] ] .fl.w-30.pa2.f6[
<br> `$$\underbrace{\{\mathcal{M}_{1,1},\mathcal{M}_{1,2},\mathcal{M}_{1,3}\}}_{\mathcal{M}_1}$$` ] .fl.w-30.pa2.f6[
<br> `$$\underbrace{\{\mathcal{M}_{2,1},\mathcal{M}_{2,2},\mathcal{M}_{2,3}\}}_{\mathcal{M}_2}$$` ] .fl.w-30.pa2.f6[
<br> `$$\underbrace{\{\mathcal{M}_{3,1},\mathcal{M}_{3,2},\mathcal{M}_{3,3}\}}_{\mathcal{M}_3}$$` ] <br> <br> <br> <br> <br> <br> <br> -------- <br> <br> <br> <h1br> -------- - To predict : `\(\mathcal{M}_m(x)=\mathcal{M}_{m,k^*}(x)\)` such that `\(x\)` belongs to cluster `\(k^*\)`. ??? - Simple model for interpretability - Other models also possible --- exclude: true ## Candidate models<hbr> .subtitle[ Step F]<hbr> - .stress[Step K] : .stress[K-means] with `\(M\)` options of Bregman divergences `\(\mathcal{B}_m, m=1,...,M\)`, are implemented. We then have `\(M\)` partition structures of in put data: `$$\mathcal{C}_1,...,\mathcal{C}_M$$` where `\(\mathcal{C}_m=\{C_{m,1},...,C_{m,K}\}\)` is the partition corresponding to `\(\mathcal{B}_m\)`. - .stress[Step F] : fit simple local model on each obtained cluster. - Linear regression (regression) - Logistic regression (classification) - ... We then have `\(M\)` candidate models: `$$\mathcal{M}_1, ..., \mathcal{M}_M$$` where `\(\mathcal{M}_m=\{\mathcal{M}_{m,1},...,\mathcal{M}_{m,K}\}\)` with `\(\mathcal{M}_{m,k}\)` built on `\(C_{m,k}\)`. ??? - Then, in step C, how would we combine the candidate models? --- ### Step C : Combining estimation methods<hbr> .subtitle[ Consensual aggregation methods]<hbr> .hhead[Classification] - Example: binary classification - With `\(4\)` classifiers `\(\mathcal{M}=(\mathcal{M}_1,\mathcal{M}_2,\mathcal{M}_3,\mathcal{M}_4)\)` - New observation `\(x\)` with predicted classes : `\((\mathcal{M}_1,\mathcal{M}_2,\mathcal{M}_3,\mathcal{M}_4)=\)` (.red[1,1,0,1]). | `\(i\)` | `\(\mathcal{M}_1\)`| `\(\mathcal{M}_2\)` | `\(\mathcal{M}_3\)` | `\(\mathcal{M}_4\)` | `\(Y\)` | |------|-------------|--------------|-----------------------------|----- | `\(1\)` | 0 | 1 | 0 | 1 | 0 | | .bo[.blue[2]] | .red[1] | .red[1] | .red[0] | .red[1] | .bo[.blue[1]] | | `\(3\)` | 1 | 0 | 0 | 1 | 1 | | .bo[.blue[4]] | .red[1] | .red[1] | .red[0] | .red[1] | .bo[.blue[0]] | | .bo[.blue[5]] | .red[1] | .red[1] | .red[0] | .red[1] | .bo[.blue[1]] | - [Mojirsheibani (1999)](https://www.tandfonline.com/doi/abs/10.1080/01621459.1999.10474154) : majority vote: `\(\hat{y}=\mathbb{1}_{\{\sum_i Y_i\mathbb{1}_{\{\mathcal{M}(x)= \mathcal{M}(X_i)\}} > 1/2\}}\)`. - [Mojirsheibani (2000)](https://www.tandfonline.com/doi/abs/10.1080/01621459.1999.10474154) : weighted vote: `\(\hat{y}=\mathbb{1}_{\{\sum_i Y_iK_h(d_{\mathcal{H}}(\mathcal{M}(x), \mathcal{M}(X_i)))>1/2\}}\)`. ??? - This allows us to combine all the candidate classifiers in Step C - How about in regression? --- class: left, top ### Step C : Combining estimation methods<hbr> .subtitle[ Consensual aggregation methods]<hbr> .hhead[Regression]<h1br> .fl.w-50.pa2.f6[ - Example: with `\((r_1,r_2,r_3,r_4)\)` | `\(i\)` | `\(r_1\)`| `\(r_2\)` | `\(r_3\)` | `\(r_4\)` | `\(Y\)` | |------|-------------|--------------|-----------------------------|------ | `\(1\)` | `\(r_1(X_1)\)` | `\(r_2(X_1)\)` | `\(r_3(X_1)\)` | `\(r_4(X_1)\)` | `\(Y_1\)` | | `\(2\)` | `\(r_1(X_2)\)` | `\(r_2(X_2)\)` | `\(r_3(X_2)\)` | `\(r_4(X_2)\)` | `\(Y_2\)` | | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | | `\(n\)` | `\(r_1(X_n)\)` | `\(r_2(X_n)\)` | `\(r_3(X_n)\)` | `\(r_4(X_n)\)` | `\(Y_n\)` | - Form: weighted average `$$\hat{y}=\sum_{i=1}^{n}W_{n,i}(x)Y_i.$$` - Let `\({\bf r}(x)=(r_1(x), ... ,r_M(x))\in\mathbb{R}^M\)` ] .fl.w-50.pa2.f6[ - [Biau et al. 
(2016)](https://www.sciencedirect.com/science/article/pii/S0047259X15000950): `$$W_{n,i}(x)=\frac{\prod_{m=1}^M\mathbb{1}_{\{|r_{m}(x)-r_{m}(X_i)|<h\}}}{\sum_{j=1}^{n}\prod_{m=1}^M\mathbb{1}_{\{|r_{m}(x)-r_{m}(X_j)|<h\}}}$$` - [Has (2021)](https://hal.archives-ouvertes.fr/hal-02884333v5): `$$W_{n,i}(x)=\frac{K_h({\bf r}(x)-{\bf r}(X_i))}{\sum_{j=1}^{n}K_h({\bf r}(x)-{\bf r}(X_j))}$$` - [Fischer and Mougeot (2019)](https://www.sciencedirect.com/science/article/pii/S0378375818302349): `$$W_{n,i}(x)=\frac{K_{\alpha, \beta}(x-X_i,{\bf r}(x)-{\bf r}(X_i))}{\sum_{j=1}^{n}K_{\alpha, \beta}(x-X_j,{\bf r}(x)-{\bf r}(X_j))},$$` with the convention `\(0/0=0\)`. ] ??? - Imagine we have 4 predictors that give us predictions for all the training data. - Now the prediction is a weighted average of the response values, and the weights (scoring how close a data point is to `\(x\)`) are computed from these predicted features. - MixCobra handles both classification and regression. From now on, we will call it aggregation with input. --- class: left, top ## KFC Procedure<hbr> .subtitle[ A bit more]<hbr> - `\(\mathcal{D}_n=\mathcal{D}_k\cup\mathcal{D}_{\ell}\)`. - .stress[Step K] and .stress[F] are based only on `\(\mathcal{D}_k\)`, and .stress[Step C] is carried out on `\(\mathcal{D}_{\ell}\)`. ------ .fl.w-10.pa2.f6[ <br> <br> .stress[Step K] <br> <br> <br> <br> <br> <br> <br> .stress[Step F] <br> <br> <br> .stress[Step C] ] .fl.w-30.pa2.f6[
<br> `$$\underbrace{\{\mathcal{M}_{1,1},\mathcal{M}_{1,2},\mathcal{M}_{1,3}\}}_{\mathcal{M}_1}$$` ] .fl.w-30.pa2.f6[
<br> `$$\underbrace{\{\mathcal{M}_{2,1},\mathcal{M}_{2,2},\mathcal{M}_{2,3}\}}_{\mathcal{M}_2}$$` <br> <h1br> .center[.stress[Combining]] `$$(\mathcal{M}_1, \mathcal{M}_2, \mathcal{M}_3)$$` ] .fl.w-30.pa2.f6[
<br> `$$\underbrace{\{\mathcal{M}_{3,1},\mathcal{M}_{3,2},\mathcal{M}_{3,3}\}}_{\mathcal{M}_3}$$` ] <br> <br> <br> <br> <br> <br> <br> -------- <br> <br> <br> <h1br> -------- --- ## Numerical experiment<hbr> .subtitle[ Bregman divergences] - Recall: $$ d_{\phi}(x,y)=\phi(x)-\phi(y)-\langle x-y,\nabla\phi(y)\rangle $$ .center[<img src="./img/all_bd.png" height="275px" width="775px"/>] - Only some are selected for numerical experiments. --- ## Numerical experiment<hbr> .subtitle[ Simulated data (classification)]<h0br>
- Average misclassification errors on the `\(20\%\)` testing data over `\(20\)` runs, in units of `\(10^{-2}\)`.<hbr> .center[<img src="./img/kfc_sim_clas1.png" height="200px" width="750px"/>] ??? How to simulate them? --- exclude: true ## Numerical experiment<hbr> .subtitle[ ]<hbr> .center[<img src="./img/kfc_sim_clas.png" height="390px" width="750px"/>] - Average misclassification errors over 20 runs, unit = `\(10^{-2}\)`. - First row: aggregation without input [[Mojirsheibani (1999)](https://www.tandfonline.com/doi/abs/10.1080/01621459.1999.10474154) and [(2000)](https://www.tandfonline.com/doi/abs/10.1080/01621459.1999.10474154)]. - Second row: aggregation with input [[Fischer and Mougeot (2019)](https://www.sciencedirect.com/science/article/pii/S0378375818302349)]. --- exclude: true ## Numerical experiment<hbr> .subtitle[ Simulated data (regression)]<hbr> .center[<img src="./img/kfc_sim_reg.png" height="390px" width="737px"/>] - Average RMSEs over 20 runs. - First row: aggregation without input [[Biau et al. (2016)](https://www.sciencedirect.com/science/article/pii/S0047259X15000950) and [Has (2021)](https://hal.archives-ouvertes.fr/hal-02884333v5)]. - Second row: aggregation with input [[Fischer and Mougeot (2019)](https://www.sciencedirect.com/science/article/pii/S0378375818302349)]. --- ## KFC in power consumption modeling<hbr> .subtitle[ Real data (Air compressor machine [[Cadet et al. (2005)](https://www.researchgate.net/publication/285367508_Monitoring_energy_performance_of_compressors_with_an_innovative_auto-adaptive_approach)])]<hbr> .fl.w-40.pa2.f6[ - Five predictors: air temperature, input pressure, output pressure, flow and water temperature. - Response variable: power consumption. .center[<img src="./img/compressor.png" height="170px" width="250px"/>] - `\(Comb_3^R\)` and `\(Comb_2^R\)` stand for the combining methods with and without input, respectively. - `\(K\)` is not available! ] .fl.w-60.pa2.f6[ .center[<img src="./img/kfc_compressor.png" height="350px" width="450px"/>] .center[<img src="./img/kfc_basic_compressor.png" height="90px" width="300px"/>] ] --- ## KFC in power production modeling<hbr> .subtitle[ Real data (Wind turbine of Maïa Eolis [[Fischer et al., 2017](https://hal.archives-ouvertes.fr/hal-01373429v2)])]<hbr> .fl.w-40.pa2.f6[ - Six predictors: wind speed (real part, imaginary part, and strength), wind direction (sine and cosine) and temperature. - Response variable: power. .center[<img src="./img/wind.jpg" height="170px" width="250px"/>] - Same here, `\(K\)` is not available! ] .fl.w-60.pa2.f6[ .center[<img src="./img/kfc_turbine.png" height="350px" width="450px"/>] .center[<img src="./img/kfc_basic_turbine.png" height="88px" width="290px"/>] ] --- ## Article & reproducibility<hbr> .subtitle[ [Has et al. (2021)](https://www.tandfonline.com/doi/full/10.1080/00949655.2021.1891539) Published in *Journal of Statistical Computation and Simulation*]<hbr> - GitHub
: [https://github.com/hassothea/KFC-procedure](https://github.com/hassothea/KFC-procedure). - Documentation : [https://hassothea.github.io/files/CodesPhD/KFCReg.html](https://hassothea.github.io/files/CodesPhD/KFCReg.html). <iframe width='720px' height='370px' src='https://hassothea.github.io/files/CodesPhD/KFCReg.html' > <p>KFC-Procedure documentation</p> </iframe> ??? - Extreme input data 'leverage', KFC works! --- template: inter-slide class: left, middle count: false ##
.bold-blue[Outline] <br> .hhead[I. KFC procedure : clusterwise prediction via aggregation of distances] <br> .section[II. A kernel-based consensual aggregation for regression] <br> .hhead[III. Aggregation on projected HD predicted features for regression] --- ## Notation - `\(\mathcal{Y} = \mathbb{R}\)` : output space - `\(\mathcal{D}_{n}\)` is partitioned into `\(\mathcal{D}_k=\{(X_i^{(k)},Y_i^{(k)})_{i=1}^k\}\)` and `\(\mathcal{D}_{\ell}=\{(X_i^{(\ell)},Y_i^{(\ell)})_{i=1}^{\ell}\}\)` - `\({\bf r}_k=(r_{k,1},...,r_{k,M})\)`: `\(M\)` regression estimators constructed using `\(\mathcal{D}_k\)` .center[ | `\(i\)` | `\(r_{k,1}\)`| `\(r_{k,2}\)` | `\(\dots\)` | `\(r_{k,M}\)` | `\(Y\)` | |------|-------------|--------------|-----------------------------|------ | `\(1\)` | `\(r_{k,1}(X_1^{(\ell)})\)` | `\(r_{k,2}(X_1^{(\ell)})\)` | `\(\dots\)` | `\(r_{k,M}(X_1^{(\ell)})\)` | `\(Y_1^{({\ell})}\)` | | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | | `\(\ell\)` | `\(r_{k,1}(X_\ell^{(\ell)})\)` | `\(r_{k,2}(X_\ell^{(\ell)})\)` | `\(\dots\)` | `\(r_{k,M}(X_\ell^{(\ell)})\)` | `\(Y_\ell^{({\ell})}\)` | ] - `\(\eta(x)=\mathbb{E}(Y|X=x)\)` : Regression function - `\(\eta({\bf r}_k(x))=\mathbb{E}(Y|X={\bf r}_k(x))\)`. - Quadratic risk : `\(\mathcal{R}(f)=\mathbb{E}(|f(X)-\eta(X)|^2)\)` --- ## The aggregation method<hbr> - Form : `$$g_n({\bf r}_k(x))=\sum_{i=1}^{\ell}W_{n,i}(x)Y_i^{(\ell)}=\sum_{i=1}^{\ell}\frac{K_h({\bf r}(x)-{\bf r}(X_i^{(\ell)}))}{\sum_{j=1}^{\ell}K_h({\bf r}(x)-{\bf r}(X_j^{(\ell)}))}Y_i^{(\ell)},$$` where `\(K_h(x)=K(x/h)\)` for some `\(h>0\)` with the convention of `\(0/0=0\)`. .fl.w-60.pa2.f6[ - Regular kernel `\(K:\mathbb{R}^d\to\mathbb{R}_+\)` satisfying: - `\(\exists b,\kappa_0,\rho>0: \forall x\in\mathbb{R}^M: b\mathbb{1}_{B_M(0,\rho)}(x)\leq K(x)\leq 1\)` - `\(\int_{\mathbb{R}^M}\sup_{u\in B_M(x,\rho)}K(u)dx = \kappa_0 < +\infty\)` - [Biau et al. (2016)](https://www.sciencedirect.com/science/article/pii/S0047259X15000950) :<h1br> `$$W_{n,i}(x)=\frac{\prod_{m=1}^M\mathbb{1}_{\{|r_{k,m}(x)-r_{k,m}(X_i^{(\ell)})|<h\}}}{\sum_{j=1}^{\ell}\prod_{m=1}^M\mathbb{1}_{\{|r_{k,m}(x)-r_{k,m}(X_j^{(\ell)})|<h\}}}$$` ] .fl.w-40.pa2.f6[ .top[<img src="./img/ker.png" height="220px" width="320px"/>] ] </h1br> -- - What is the theoretical property of the aggregation method? --- ## Theoretical performance<hbr> ### Proposition 1<h1br> ----- Let `\(\textbf{r}_k=(r_{k,1},r_{k,2},...,r_{k,M})\)` be the collection of all basic estimators and `\(g_n(\textbf{r}_k(x))\)` be the combined estimator computed at point `\(x\in\mathbb{R}^d\)`. Then, for all distributions of `\((X,Y)\)` with `\(\mathbb{E}[|Y|^2]< +\infty\)`, `$$\mathbb{E}[|g_n({\bf r}_k(X))-\eta(X)|^2]\leq \inf_{f\in\mathcal{G}}\mathbb{E}[|f({\bf r}_k(X))-\eta(X)|^2]\\ \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad +\mathbb{E}[|g_n({\bf r}_k(X))-\eta({\bf r}_k(X))|^2],$$` where `\(\mathcal{G}=\{f:\mathbb{R}^M\to\mathbb{R}:\mathbb{E}[|f(\textbf{r}_k(X))|^2]<+\infty\}\)`. In particular, `$$\mathbb{E}\Big[|g_n({\bf r}_k(X))-\eta(X)|^2\Big]\leq \min_{1\leq m\leq M}\mathbb{E}[|r_{k,m}(X)-\eta(X)|^2]\\ \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad +\mathbb{E}[|g_n({\bf r}_k(X))-\eta({\bf r}_k(X))|^2].$$` ----- - The two terms can be seen as the bias and variance terms. ??? - First term : bias and cannot be controlled. - Second term : variance term which is the price to pay for combining and to be bound. 
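---
count: false
class: left, top

## The aggregation method<hbr>
.subtitle[ A minimal code sketch]<hbr>

- A NumPy sketch of `\(g_n\)` with a Gaussian kernel. The synthetic data, the two basic estimators (a linear fit and a 5-NN average) and the fixed bandwidth `\(h\)` are illustrative assumptions only; the released implementation is in the `AggregationMethods` repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: D_k builds the basic estimators, D_l is kept for the aggregation step.
X = rng.uniform(-1, 1, size=(400, 3))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * rng.normal(size=400)
Xk, yk, Xl, yl = X[:200], y[:200], X[200:], y[200:]

# Two deliberately simple basic estimators r_1, r_2 fitted on D_k.
beta = np.linalg.lstsq(np.c_[np.ones(200), Xk], yk, rcond=None)[0]
r1 = lambda x: np.c_[np.ones(len(x)), x] @ beta                 # linear model
r2 = lambda x: np.array([yk[np.linalg.norm(Xk - q, axis=1).argsort()[:5]].mean()
                         for q in x])                           # 5-NN average

def r(x):
    """Predicted features r(x) in R^M (here M = 2)."""
    return np.column_stack([r1(x), r2(x)])

def g_n(x, h=0.1):
    """Kernel-based consensual aggregation evaluated at the rows of x."""
    d = r(x)[:, None, :] - r(Xl)[None, :, :]                    # differences of predicted features
    w = np.exp(-np.sum(d ** 2, axis=2) / (2 * h ** 2))          # Gaussian kernel weights
    w /= np.maximum(w.sum(axis=1, keepdims=True), 1e-12)        # guard for the 0/0 = 0 convention
    return w @ yl                                               # weighted average of Y over D_l

print(g_n(rng.uniform(-1, 1, size=(5, 3))))
```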
--- ## Theoretical performance<hbr> ### Proposition 2<h1br> ----- Assume that all the estimators `\(r_{k,1},...,r_{k,M}\)` are bounded. Let `\(h\rightarrow0\)` and `\(\ell\rightarrow+\infty\)` such that `\(h^M\ell\to+\infty\)`. Then `$$\mathbb{E}\Big[|g_n(\textbf{r}_k(X))-\eta(\textbf{r}_k(X))|^2\Big]\rightarrow0\ \text{as }\ell\rightarrow+\infty$$` for all distribution of `\((X,Y)\)` s.t `\(\mathbb{E}[|Y|^2]<+\infty\)`. Thus, `$$\limsup_{\ell\rightarrow+\infty}\mathbb{E}\Big[|g_n(\textbf{r}_k(X))-\eta(X)|^2\Big]\leq\inf_{f\in\mathcal{G}}\mathbb{E}\Big[|f(\textbf{r}_k(X))-\eta(X)|^2\Big].$$` And in particular, `$$\limsup_{\ell\rightarrow+\infty}\mathbb{E}\Big[|g_n(\textbf{r}_k(X))-\eta(X)|^2\Big]\leq\min_{1\leq m \leq M}\mathbb{E}\Big[|r_{k,m}(X)-\eta(X)|^2\Big].$$` ----- ??? - Asymptotic control of the error - No information about rate of convergence --- ## Theoretical performance<hbr> ### Theorem<h1br> ----- Assume that - `\(Y\)` and all the basic machines `\(r_{k,m},m=1,2,...,M\)`, are bounded by `\(R\)` a.s - `\(\exists L>0,\forall k\geq 1\)`: `\(|\eta({\bf r}_k(x))-\eta({\bf r}_k(y))|\leq L\|{\bf r}_k(x)-{\bf r}_k(y)\|,\forall x,y\in\mathbb{R}^d.\)` - `\(\exists R_K,C_K>0\)`: `\(K(z)\leq C_Ke^{-\|z\|^{\alpha}/\sigma}, \forall z\in\mathbb{R}^M\ \text{s.t }\|z\|\geq R_K\)`, `\(\alpha,\sigma>0\)`. Then, with the choice of `\(h\propto \ell^{-\frac{1}{M+2}}\)`, there exists `\(C>0\)` such that `$$\mathbb{E}[|g_n({\bf r}_k(X))-\eta(X)|^2]\leq \min_{1\leq m\leq M}\mathbb{E}[|r_{k,m}(X)-\eta(X)|^2]+C\ell^{-\frac{2}{M+2}}.$$` ----- - Similar rate as in [Biau et al. (2016)](https://www.sciencedirect.com/science/article/pii/S0047259X15000950). - Consequently, we have .stress[consistency inheritance property] : `$$\Big[\exists m_0:\mathcal{R}(r_{k,m_0})\xrightarrow[k\to+\infty]{} 0\Big]\Rightarrow \Big[\mathcal{R}(g_n({\bf r}_k(.)))\xrightarrow[k,\ell\to+\infty]{}0\Big]$$` ??? - So, we are a nice theory on the method. - And in simulation part --- ## Optimization algorithm<hbr> .subtitle[ Gradient descent]<h1br> .fl.w-60.pa2.f6[ - Observation : .center[
] - Objective function : `\(\kappa\)`-fold cross-validation error: `$$\varphi^{\kappa}(h)=\frac{1}{\kappa}\sum_{p=1}^{\kappa}\sum_{(X_j,Y_j)\in F_p}[g_n(\textbf{r}_k(X_j))-Y_j]^2,$$` where `\(g_n(\textbf{r}_k(X_j))=\sum_{(X_i,Y_i)\in \mathcal{D}_{\ell}\setminus F_p}W_{n,i}(X_j)Y_i\)`. ] .fl.w-40.pa2.f6[ - The differentiability of `\(\varphi^{\kappa}\)` depends entirely on `\(K\)`. - .stress[Algorithm] ----- - `Init`: `\(h_0,\lambda,\delta>0,N\)`. - `for` `\(k=1,2,...,\)` `maxIter`: `while` `\(\Big|\frac{d}{d h}\varphi^{\kappa}(h_{k-1})\Big|>\delta\)` `do`: `$$h_k\gets h_{k-1}-\lambda\frac{d}{d h}\varphi^{\kappa}(h_{k-1})$$` - `return` : `\(h_k\)` violating the `while` condition or `\(h_{\text{maxIter}}\)`. ---- ] ??? - For numerical experiments --- ## Numerical experiment<hbr> .subtitle[ Simulated data]<hbr> - Average RMSEs over `\(100\)` runs: - First block: 5 predictors: Lasso, Ridge, `\(k\)`NN, Tree, Random Forest. - Second block: aggregation with different kernels. - Uncorrelated case : `\(X\sim\mathcal{U}[-1,1]^d\)`, <img src="./img/uncor_1.png" height="135px" width="710px"/> - Correlated case : `\(X\sim\mathcal{N}_d(0, \Sigma)\)` with `\(\Sigma_{ij}=2^{-|i-j|}\)`. <img src="./img/corr_1.png" height="135px" width="710px"/> --- ## Numerical experiment<hbr> .subtitle[ Real data]<hbr> - **House**, **Wine**, **Abalone** are public data. **Air** and **Turbine** are private. .center[ <img src="./img/real-cobra.png" height="180px" width="650px"/> ] - Computational time on **Turbine** and **Wine** datasets. .center[ <img src="./img/turbine-time.png" height="190px" width="320px"/> <img src="./img/wine-time.png" height="190px" width="320px"/> ] --- ## Application on Physics data<hbr> .subtitle[ Provided by researchers of CEA [[Kluth et al. (2022)](https://www.frontiersin.org/articles/10.3389/fphy.2022.786639/full)]]<hbr> - Input : `\(L\)`, `\(\alpha\)`, and `\(\log(E)\)`. Response variable : `\(D_{\alpha, \alpha}\)`. - Training data is small (low resolution), predict richer testing data. - .stress[Basic estimators] were already built in [Kluth et al. (2022)](https://www.frontiersin.org/articles/10.3389/fphy.2022.786639/full) on LR data. - .stress[Gaussian] is the aggregation method (on a part of HR data). .center[ <img src="./img/domain_adapt.png" height="190px" width="400px"/><img src="./img/cea1.png" height="200px" width="350px"/> ] -- - This shows domain adaptation-like property of the method. ??? - predicting Pitch angle difusion coefficient. - estimators are built on low resolution data, to predict the test of higher resolution. - So, source and target are different. --- ## Article & reproducibility<hbr> .subtitle[ Under review and available in *HAL* [[Has (2021)](https://hal.archives-ouvertes.fr/hal-02884333v5)]]<hbr> - GitHub
: [https://github.com/hassothea/AggregationMethods](https://github.com/hassothea/AggregationMethods). - Documentation : [https://hassothea.github.io/files/CodesPhD/KernelAggReg.html](https://hassothea.github.io/files/CodesPhD/KernelAggReg.html). <iframe width='720px' height='370px' src='https://hassothea.github.io/files/CodesPhD/KernelAggReg.html' > <p>Kernel Aggregation documentation</p> </iframe> ??? - Extreme input data 'leverage', KFC works! --- template: inter-slide class: left, middle count: false ##
.bold-blue[Outline] <br> .hhead[I. KFC procedure : clusterwise prediction via aggregation of distances] <br> .hhead[II. A kernel-based consensual aggregation for regression] <br> .section[III. Aggregation on projected HD predicted features for regression] --- exclude: true ## Some studies .fl.w-50.pa2.f6[ ### .darkgreen[Classification] `$$g_n({\bf c}(x))=\text{arg}\max_{1\leq k\leq N}\sum_{i=1}^{\ell}W_{n,i}(x)\mathbb{1}_{\{Y_i^{(\ell)}=k\}}.$$` - [Mojirsheibani (1999)](https://www.tandfonline.com/doi/abs/10.1080/01621459.1999.10474154): `$$W_{n,i}(x)=\mathbb{1}_{\{{\bf c}(x)={\bf c}(X_i^{(\ell)})\}}$$` - [Mojirsheibani (2016)](https://www.sciencedirect.com/science/article/abs/pii/S0167715216301304): `$$W_{n,i}(x)=K_h(d_\mathcal{H}({\bf c}(x),{\bf c}(X_i^{(\ell)})))$$` where - `\(d_\mathcal{H}\)` : Hamming distance - `\(K_h(x)=K(x/h)\)` for some `\(h>0\)` - with convention `\(0/0 = 0\)` ] -- exclude: true .fl.w-50.pa2.f6[ ### .darkgreen[Regression] `$$g_n({\bf r}(x))=\sum_{i=1}^{\ell}W_{n,i}(x)Y_i^{(\ell)}.$$` - [Biau et al. (2016)](https://www.sciencedirect.com/science/article/pii/S0047259X15000950): `$$W_{n,i}(x)=\frac{\prod_{m=1}^M\mathbb{1}_{\{|r_{k,m}(x)-r_{k,m}(X_i^{(\ell)})|<h\}}}{\sum_{j=1}^{\ell}\prod_{m=1}^M\mathbb{1}_{\{|r_{k,m}(x)-r_{k,m}(X_j^{(\ell)})|<h\}}}$$` - [Has (2021)](https://hal.archives-ouvertes.fr/hal-02884333v5): `$$W_{n,i}(x)=\frac{K_h({\bf r}(x)-{\bf r}(X_i^{(\ell)}))}{\sum_{j=1}^{\ell}K_h({\bf r}(x)-{\bf r}(X_j^{(\ell)}))}$$` ] ??? - Une idée clé de la prédiction est d'identifier le voisin du point de données dans l'espace d'entrée. C'est la même pour aggregation consenseille, sauf que les voisons sont en sense de predictions. - La classe prédite est la classe avec le poids le plus lourd. - Pour predire un classe de point `\(x\)`, on cherche pour tout les points `\(x_i\)` de donnée apprentisage avec la même préditions que `\(x\)`. --- exclude: true ## .darkgreen[Consistency Inheritence Property (CIP)] .fl.w-50.pa2.f6[ ### .darkgreen[Classification] - Loss: `\(\ell(x,y)=\mathbb{1}_{\{x\neq y\}}\)` - Risk: `\(\mathcal{R}_{\bf c}(f)=\mathbb{P}(f(X)\neq Y)\)` - Result: for any basic classifier `\(c_{k,m}\)`, `$$\lim_{k,\ell\to+\infty}\sup\{\mathcal{R}_{\bf c}(g_n({\bf c}(.))-\mathcal{R}_{\bf c}(c_{k,m}(.))\}\leq0.$$` - In particular, `$$\exists c_{k,m_0}\text{ consistent }\Rightarrow\text{ so is }g_n({\bf c}(.)).$$` <br> <br> <br> <br> <br> ] -- exclude: true .fl.w-50.pa2.f6[ ### .darkgreen[Regression] - Loss: `\(\ell(x,y)=(x- y)^2\)` - Risk: `\(\mathcal{R}_{\bf r}(f)=\mathbb{E}[(f(X)- Y)^2]\)` - Let `\(\eta(x)=\mathbb{E}(Y|X=x)\)` be the regression function, and `$$\mathcal{R}_{\bf r}(f,\eta)=\mathbb{E}[(f(X)-\eta(X))^2].$$` - Result: `$$\mathcal{R}_{\bf r}(g_n({\bf r}(.)),\eta) \leq \min_{1\leq m\leq M}\mathcal{R}_{\bf r}(r_{k,m},\eta)+\mathcal{V}_{\ell}.$$` - In particular, `$$\exists r_{k,m_0}\text{ consistent }\Rightarrow\text{ so is } g_n({\bf r}(.)).$$` ] --- ## Motivation<hbr> - Practical question: does it work with the same type of estimators? Several types? .center[<img src="./img/agg.png" width="350px"/>] - .stress[High-dimension] refers to .stress[the number of basic estimators] `\(M\)`. - A dimensional reduction is needed. ??? - Prediction matrix : `\({\bf r}_k(\mathcal{X})\in\mathbb{R}^{\ell\times M}\)`. 
.center[ | `\(r_{k,1}\)`| `\(r_{k,2}\)` | `\(\dots\)` | `\(r_{k,M}\)` | |----------|-----------|---------|-----------| | `\(r_{k,1}(X_1)\)` | `\(r_{k,2}(X_1)\)` | `\(\dots\)` | `\(r_{k,M}(X_1)\)` | | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | | `\(r_{k,1}(X_\ell)\)` | `\(r_{k,2}(X_\ell)\)` | `\(\dots\)` | `\(r_{k,M}(X_\ell)\)` | ] - In practice, we combine `\(5\)` or `\(6\)` estimators, and the aggregation tends to the best one. But those candidates are often not at their full potential (no cross-validation). - In theory: some assumptions, such as the Lipschitz condition, are not verifiable. - There is theory in classification where `\(M\to\infty\)`. --- ## Johnson-Lindenstrauss Lemma (J-L)<hbr> .subtitle[ Implementation] - For any `\(m<M\)`, let `\(G=(G_{i,j})\)` be an `\(M\times m\)` random matrix, with `\(G_{i,j}\sim\mathcal{N}(0,1/m)\)` independent. - Random projection: `$$\tilde{\bf r}_k(\mathcal{X})={\bf r}_k(\mathcal{X})\times G\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\ \\ =\begin{pmatrix} r_{k,1}(X_1) & \dots & r_{k,M}(X_1) \\ r_{k,1}(X_2) & \dots & r_{k,M}(X_2) \\ \vdots & \ddots & \vdots \\ r_{k,1}(X_\ell) & \dots & r_{k,M}(X_\ell) \\ \end{pmatrix}\times \begin{pmatrix} G_{11} & \dots & G_{1m} \\ G_{21} & \dots & G_{2m} \\ \vdots & \ddots & \vdots \\ G_{M1} & \dots & G_{Mm} \\ \end{pmatrix}\\ \quad\ =\begin{pmatrix} \tilde{r}_1(X_{1}) & \tilde{r}_2(X_{1}) & \dots & \tilde{r}_m(X_{1}) \\ \tilde{r}_1(X_{2}) & \tilde{r}_2(X_{2}) & \dots & \tilde{r}_m(X_{2}) \\ \vdots & \vdots & \vdots & \vdots \\ \tilde{r}_1(X_{\ell}) & \tilde{r}_2(X_{\ell}) & \dots & \tilde{r}_m(X_{\ell}) \\ \end{pmatrix}_{\ell \times m}.\quad\quad\quad\quad\quad\quad\quad\quad$$` --- ## .darkgreen[Johnson-Lindenstrauss Lemma (J-L)]<hbr> ### .darkgreen[Lemma (Johnson-Lindenstrauss)]<hbr> ----- Let `\(S_n = \{z_j\in\mathbb{R}^M:j=1,2,...,n\}\)` denote a subset containing `\(n\)` points of `\(\mathbb{R}^M\)`, `\(z_0\in\mathbb{R}^M\)` fixed. Let `\(\tilde{z_0}\)` and `\(\tilde{z_j}\)` be the projected points of `\(z_0\)` and `\(z_j\)` respectively into `\(\mathbb{R}^m\)` using the described random projection. Then, for any `\(\delta\in(0,1)\)`, one has: `$$\mathbb{P}\Big(\Big|\frac{\|\tilde{z_0}-\tilde{z_j}\|^2}{\|z_0-z_j\|^2}-1\Big|\leq \delta,\text{ for all } z_j\in S_n\Big)\geq 1-2n\exp(-m(\delta^2/2-\delta^3/3)/2).$$` ------ <hbr> .fl.w-50.pa2.f6[
] .fl.w-50.pa2.f6[
] --- ## Aggregation scheme - Two steps: .stress[projection] and .stress[aggregation].<br> More precisely, for any `\(x\in\mathbb{R}^d\)`, `$$g_n(\tilde{\bf r}_k(x))=\sum_{i=1}^{\ell}\frac{K_h(\tilde{\bf r}_k(x)-\tilde{\bf r}_k(X_i^{(\ell)}))}{\sum_{j=1}^{\ell}K_h(\tilde{\bf r}_k(x)-\tilde{\bf r}_k(X_j^{(\ell)}))}Y_i^{(\ell)},$$` where `\(K_h(x)=\exp(-\|x/h\|^{\alpha}/\sigma)\)` for some `\(\alpha,\sigma\)` and `\(h>0\)`. <br> .center[<img src="./img/scheme2.png" height="200px" width="700px"/>] ??? - The result doesn't depend on `\(M\)`. --- ## Theoretical result - Let `\(g_n({\bf r}(.))\)` and `\(g_n(\tilde{\bf r}(.))\)` be the aggregation using .stress[full] predicted features and .stress[projected] features respectively. ### Theorem<hbr> ------ Assume that all the regressors `\(r_1,r_2,...,r_M\)` and the response variable `\(Y\)` are bounded almost surely by `\(R_0\)`, thus for any `\(h,\varepsilon>0, n\geq1,\)` and for any `\(\delta\in(0,1)\)`, with the choice of `\(m\)` satisfying: `\(m\geq C_1\frac{\log[2/(1-\sqrt[n]{1-\delta})]}{h^{2\alpha}\varepsilon^2}\)`, with `\(C_1=3(2+\alpha)^2(2R_0)^{2(1+\alpha)}/\sigma^2\)`, one has: $$ \mathbb{P}\Big(|g_n({\bf r}(X))-g_n(\tilde{\bf r}(X))|>\varepsilon\Big)\leq \delta. $$ ------ - For large `\(n\)`, the lower bound of `\(m\)` is of order `\(O\Big(\frac{\log(2n/\delta)}{h^{2\alpha}\varepsilon^2}\Big)\)`. ??? - `\(m\)` assez grand. --- exclude: true ## Theoretical result - Let `\(g_n({\bf r}(.))\)` and `\(g_n(\tilde{\bf r}(.))\)` be the aggregation using .blue[full] and .blue[projected] features respectively. ## .darkgreen[Theorem] Assume that all the regressors `\(r_1,r_2,...,r_M\)` and the response variable `\(Y\)` are bounded almost surely by `\(R_0\)`, thus for any `\(h,\varepsilon>0, n\geq1,\)` and for any `\(\delta\in(0,1)\)`, $$ \mathbb{P}\Big(|g_n(\textbf{r}(X))-g_n(\tilde{\textbf{r}}(X))|>\varepsilon\Big)\leq 1-\Big[1-2\exp\Big(-\frac{mh^{2\alpha}\varepsilon^2}{3R_1^{2}}\Big)\Big]^n $$ --- exclude: true ## Numerical simulation<hbr> .subtitle[ Basis regressors]<hbr> - `\(k\)`NN, elastic net, bagging, random forest, boosting. - `\(200\)` values of parameter each `\(\Rightarrow M=1000\)` (very highly cor related). - Projected dimension `\(m\in\{2,3,...,9,100,200,...,900\}\)`. - Average root mean square errors of `\(20\%\)`-testing data (RMSE) over `\(30\)` independent runs are reported. - For each model, only the best and the worst perfrmances are reported. --- ## Numerical simulation<hbr> .subtitle[ Simulated data]<h0br> - Estimators: `\(k\)`NN, elastic net, bagging, random forest, boosting. - `\(M=1000\)` ( `\(200\)` for each type), `\(m\in\{2,3,...,9,100,200,...,900\}\)`. .center[<img src="img/sim-mod11.png" align="center" height="220" width="370"><img src="img/sim-mod12.png" align="center" height="220" width="370">] .center[<img src="img/sim-mod-time1.png" align="center" height="220" width="740">] --- ## Numerical simulation<hbr> .subtitle[ Real data]<h0br> .center[<img src="img/real1.png" align="center" height="230" width="380"><img src="img/real2.png" align="center" height="230" width="380">] .center[<img src="img/real-time1.png" align="center" height="240" width="370"><img src="img/real-time2.png" align="center" height="240" width="370">] --- ## Conclusion - Numerically, the method works well in high-dimensional case (very correlated). - The performances are almost preserved in much smaller subspaces. - Several types of (highly redundant) models can be directly aggregated without model selection or cross validation. 
- Theoretically and numerically, `\(m\approx O(\log(n))\)` works. ## Article & reproducibility - [Has (2022)](https://hal.archives-ouvertes.fr/hal-03631715) available in *HAL*: .stress[Consensual Aggregation on Random Projected High-dimensional Features for Regression]. - GitHub
: [https://github.com/hassothea/AggregationMethods](https://github.com/hassothea/AggregationMethods). --- class: center, middle count: false .center[# Conclusion & perspectives] --- ## Conclusion & summary<hbr> .center[<img src="img/map.png" align="center" height="340" width="780">] ## Future perspectives<hbr> - High-dimensional .stress[KFC procedure] : random projection with Bregman divergences. - Domain adaptation-like property of the aggregation method .stress[II]. --- exclude: true - .stress[KFC procedure] : clustering structure of input & aggregation method <hbr> - Three-step procedure : .stress[K-means / Fitting / Combining] - Article [Has et al. (2021)](https://www.tandfonline.com/doi/full/10.1080/00949655.2021.1891539).<hbr> - .stress[A Kernel-based aggregation for regression] :<hbr> - Theoretical study : consistency inheritance property & convergence rate - Optimization : gradient descent minimizing `\(\kappa\)`-cross validation error - Numerical experiments & application on physics data - Codes & documentation.<hbr> - .stress[Aggregation on randomly projected high-dimensional features] :<hbr> - Theoretical study : high probability bound (.stress[full] - .stress[projected]) - Numerical experiments on simulated and real data - Codes & documentation.<hbr> - Joint work with researchers from CEA [[Kluth et al. (2022)](https://www.frontiersin.org/articles/10.3389/fphy.2022.786639/full)].<hbr> --- count: false class: middle, center # Thank you 🤓 --- exclude: true ## Work summary And coffee `\(\approx 2750\)` ☕ ... --- count: false class: left, top template: inter-slide class: left, top count: false # References<hbr> .small[ 📚 [Aurélie Fischer and Mathilde Mougeot. Aggregation using input-output trade-off. Journal of Statistical Planning and Inference, 200:1–19, May 2019.](https://www.sciencedirect.com/science/article/pii/S0378375818302349) 📚 [B. Auder and A. Fischer. Projection-based curve clustering. Journal of Statistical Computation and Simulation, 82(8):1145–1168, 2012.](https://www.tandfonline.com/doi/abs/10.1080/00949655.2011.572882) 📚 [E. Devijver, Y. Goude, and J.M. Poggi. Clustering electricity consumers using highdimensional regression mixture models. arXiv preprint arXiv:1507.00167, 2015.](https://arxiv.org/abs/1507.00167) 📚 [G. Biau, A. Fischer, B. Guedj, and J.D. Malley. COBRA: a combined regression strategy. Journal of Multivariate Analysis, 146:18–28, 2016.](https://www.sciencedirect.com/science/article/pii/S0047259X15000950) 📚 [G. Kluth, J.-F. Ripoll, S. Has, A. Fischer, M. Mougeot, and E. Camporeale. Machine learning methods applied to the global modeling of event-driven pitch angle diffusion coefficients during high speed streams. Frontiers in Physics, 10, 2022. ISSN 2296-424X. doi: 10.3389/fphy.2022.786639.](https://www.frontiersin.org/articles/10.3389/fphy.2022.786639/full) 📚 [Majid Mojirsheibani. Combined classifiers via discretization. Journal of the American Statistical Association, 94(446):600–609, June 1999.](https://www.tandfonline.com/doi/abs/10.1080/01621459.1999.10474154) 📚 [Majid Mojirsheibani. A kernel-based combined classification rule. Journal of Statistics and Probability Letters, 48(4):411–419, July 2000.]() ] --- template: inter-slide class: left, top count: false # References<hbr> .small[ 📚 [Majid Mojirsheibani and Jiajie Kong. An asymptotically optimal kernel combined classifier. Journal of Statistics and Probability Letters, 119:91–100, 2016.](https://www.sciencedirect.com/science/article/abs/pii/S0167715216301304) 📚 [N. Keita, S. Bougeard, and G. 
Saporta. Clusterwise multiblock PLS regression. In CFE-CMStatistics 2015, page 195, London, United Kingdom, December 2015. 8th International Conference of the ERCIM (European Research Consortium for Informatics and Mathematics) Working Group on Computational and Methodological Statistics (CMStatistics 2015) ISBN 978-9963-2227-0-4.](https://hal.laas.fr/UNIV-HESAM/hal-02471608v1) 📚 [S. Has, A. Fischer, and M. Mougeot. KFC: A clusterwise supervised learning procedure based on the aggregation of distances. Journal of Statistical Computation and Simulation, 0(0):1–21, 2021. doi: 10.1080/00949655.2021.1891539.](https://www.tandfonline.com/doi/abs/10.1080/00949655.2021.1891539) 📚 [William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984. doi: 10.1090/conm/026/737400](https://doi.org/10.1007/BF02764938). 📚 [Sothea Has. A Kernel-based Consensual Aggregation for Regression. Preprint, April 2021, *HAL*.](https://hal.archives-ouvertes.fr/hal-02884333v5)
[https://github.com/hassothea/KFC-Procedure](https://github.com/hassothea/KFC-Procedure)
[https://github.com/hassothea/AggregationMethods](https://github.com/hassothea/AggregationMethods) ]
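---
count: false
class: left, top

## Appendix : KFC Steps K and F (code sketch)<hbr>

- A minimal NumPy sketch under illustrative assumptions: synthetic data, the squared Euclidean divergence only in .stress[Step K], and ordinary least squares as the local model in .stress[Step F]. The actual procedure runs several Bregman divergences and aggregates the resulting candidate models in .stress[Step C] (see the `KFC-procedure` repository).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic input with two hidden groups and a different linear model on each group.
X = np.r_[rng.normal(-2, 1, size=(150, 2)), rng.normal(2, 1, size=(150, 2))]
y = np.r_[1.0 + X[:150] @ [2.0, -1.0], -3.0 + X[150:] @ [0.5, 4.0]]
y = y + 0.1 * rng.normal(size=300)

# Step K: Lloyd's K-means; squared Euclidean distance is the Bregman divergence of phi = ||.||^2.
K = 2
centers = X[rng.choice(len(X), K, replace=False)]
for _ in range(50):
    labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
    centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(K)])

# Step F: one simple (here linear) model fitted on each cluster.
models = []
for k in range(K):
    A = np.c_[np.ones((labels == k).sum()), X[labels == k]]
    models.append(np.linalg.lstsq(A, y[labels == k], rcond=None)[0])

def candidate_model(x):
    """M_m(x): predict with the local model of the cluster that x falls into."""
    k = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
    return np.array([np.r_[1.0, xi] @ models[ki] for xi, ki in zip(x, k)])

print(candidate_model(rng.normal(0, 2, size=(5, 2))))
```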
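---
count: false
class: left, top

## Appendix : bandwidth tuning by gradient descent (code sketch)<hbr>

- A toy sketch of the optimization step of part II, under illustrative assumptions: a smooth one-dimensional function stands in for the `\(\kappa\)`-fold cross-validation error `\(\varphi^{\kappa}(h)\)`, and a finite-difference derivative replaces the closed-form one available with the Gaussian kernel.

```python
import numpy as np

def phi(h):
    """Stand-in for the kappa-fold cross-validation error of g_n as a function of h."""
    return (np.log(h) + 1.0) ** 2 + 0.1 * h

def dphi(h, eps=1e-6):
    """Central finite-difference approximation of d phi / d h."""
    return (phi(h + eps) - phi(h - eps)) / (2 * eps)

h, lr, tol, max_iter = 1.0, 0.05, 1e-6, 1000     # h_0, lambda, delta, maxIter
for _ in range(max_iter):
    g = dphi(h)
    if abs(g) <= tol:                            # stop once |d phi / d h| <= delta
        break
    h = max(h - lr * g, 1e-8)                    # gradient step, keeping the bandwidth positive

print(f"h* = {h:.4f}, phi(h*) = {phi(h):.4f}")   # minimizer close to h = exp(-1) for this toy phi
```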
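---
count: false
class: left, top

## Appendix : random projection of predicted features (code sketch)<hbr>

- A minimal NumPy sketch of the projection step of part III, under illustrative assumptions: a synthetic matrix with strongly correlated columns stands in for the `\(\ell\times M\)` prediction matrix `\({\bf r}_k(\mathcal{X})\)`. The kernel aggregation of part II is then run on the projected `\(\ell\times m\)` features.

```python
import numpy as np

rng = np.random.default_rng(2)
ell, M, m = 500, 1000, 100            # observations, basic estimators, projected dimension

# Stand-in for the prediction matrix r_k(X): correlated columns mimic redundant estimators.
base = rng.normal(size=(ell, 5))
R = base @ rng.normal(size=(5, M)) + 0.05 * rng.normal(size=(ell, M))

# Johnson-Lindenstrauss projection: G has i.i.d. N(0, 1/m) entries.
G = rng.normal(scale=1.0 / np.sqrt(m), size=(M, m))
R_tilde = R @ G                       # projected predicted features, shape (ell, m)

# Pairwise distances between predicted-feature vectors are approximately preserved.
i, j = rng.integers(0, ell, size=(2, 1000))
keep = i != j
num = np.linalg.norm(R_tilde[i[keep]] - R_tilde[j[keep]], axis=1)
den = np.linalg.norm(R[i[keep]] - R[j[keep]], axis=1)
print("distance ratios:", np.round(np.mean(num / den), 3), "+/-", np.round(np.std(num / den), 3))
```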