class: center, middle, inverse, title-slide # Mining the National Diet and Nutrition Survey (NDNS RP) data ## Where, when, what you eat. ### 王 超辰 | Chaochen Wang ### 2019-09-17 18:00~19:30
@AMU
・Epidemiology Colloquium (疫学懇話会)

---
class: middle

# Outline of today's talk

--

### Correspondence analyses:

- Relationship between **food consumed** and **eating location** for UK adolescents
- Relationship between **food consumed** and **eating time** for UK adults according to their diabetes status

<!-- -- -->
<!-- ### Multilevel Latent Class Analysis: -->
<!-- - Relationship between **carbohydrate consumption** and **eating time** for UK adults. -->

---
class: middle

## Some intuition about correspondence analysis (CA)

- CA is a method for investigating the relationship in a **two-dimensional contingency table**.
- It takes a large table, and turns it into a seemingly easy-to-read visualization.

---
class: middle

## Analysis of a contingency table

- The frequency of consumption of three healthy food groups at each location in the NDNS RP data among teenagers (age: 11~19 years):

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Others </th> <th style="text-align:right;"> Home </th> <th style="text-align:right;"> School_work </th> <th style="text-align:right;"> Sum </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;font-weight: bold;"> Brown Bread </td> <td style="text-align:right;"> 95 </td> <td style="text-align:right;"> 613 </td> <td style="text-align:right;"> 201 </td> <td style="text-align:right;"> 909 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Fruit </td> <td style="text-align:right;"> 455 </td> <td style="text-align:right;"> 2595 </td> <td style="text-align:right;"> 839 </td> <td style="text-align:right;"> 3889 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Veg not raw </td> <td style="text-align:right;"> 698 </td> <td style="text-align:right;"> 6122 </td> <td style="text-align:right;"> 376 </td> <td style="text-align:right;"> 7196 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Sum </td> <td
style="text-align:right;"> 1248 </td> <td style="text-align:right;"> 9330 </td> <td style="text-align:right;"> 1416 </td> <td style="text-align:right;"> 11994 </td> </tr> </tbody> </table> --- class: middle ## Usually, we will conduct a `\(\chi^2\)` test. <table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Others </th> <th style="text-align:right;"> Home </th> <th style="text-align:right;"> School_work </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> Brown Bread </td> <td style="text-align:right;"> 95 </td> <td style="text-align:right;"> 613 </td> <td style="text-align:right;"> 201 </td> <td style="text-align:right;"> 909 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Fruit </td> <td style="text-align:right;"> 455 </td> <td style="text-align:right;"> 2595 </td> <td style="text-align:right;"> 839 </td> <td style="text-align:right;"> 3889 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Veg not raw </td> <td style="text-align:right;"> 698 </td> <td style="text-align:right;"> 6122 </td> <td style="text-align:right;"> 376 </td> <td style="text-align:right;"> 7196 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Sum </td> <td style="text-align:right;"> 1248 </td> <td style="text-align:right;"> 9330 </td> <td style="text-align:right;"> 1416 </td> <td style="text-align:right;"> 11994 </td> </tr> </tbody> </table> $$ \chi^2 = \sum\frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \sim \chi^2_{(m -1)\times(n-1)} $$ -- To look for evidence against the null hypothesis that **there is no difference across the columns or rows**. 
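---
class: middle

### The `\(\chi^2\)` statistic, computed by hand

A minimal sketch of the formula above, not necessarily how the deck's own numbers were produced. The counts are copied from the contingency table; the object name `freqtab` follows the `chisq.test(freqtab)` call used in this deck, but how that object was originally built is not shown, so the construction here is an assumption.

```r
# Counts of food recordings (rows: food groups, columns: locations),
# copied from the contingency table on the slide.
freqtab <- matrix(
  c( 95,  613, 201,
    455, 2595, 839,
    698, 6122, 376),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Brown Bread", "Fruit", "Veg not raw"),
                  c("Others", "Home", "School_work"))
)

n <- sum(freqtab)

# Expected counts under independence: (row total x column total) / n
expected <- outer(rowSums(freqtab), colSums(freqtab)) / n

# Pearson chi-squared statistic, degrees of freedom, and p-value
chi2 <- sum((freqtab - expected)^2 / expected)
df   <- (nrow(freqtab) - 1) * (ncol(freqtab) - 1)
pval <- pchisq(chi2, df = df, lower.tail = FALSE)

# Compare with the X-squared value reported by chisq.test(freqtab)
round(chi2, 2)
```

The cell-by-cell terms `(freqtab - expected)^2 / expected` are what correspondence analysis goes on to decompose, so it can help to look at them before summing.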
---
class: middle

## Hypothesis of independence

The null hypothesis of no difference across the columns or rows is also called the **hypothesis of independence, or homogeneity assumption.**

```r
chisq.test(freqtab)
```

```
## 
## 	Pearson's Chi-squared test
## 
## data:  freqtab
## X-squared = 792.56, df = 4, p-value < 2.2e-16
```

We conclude that **there is dependency between the rows (food groups) and the columns (locations)**.

---
class: middle

## But what exactly does this dependency mean?

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Others </th> <th style="text-align:right;"> Home </th> <th style="text-align:right;"> School_work </th> <th style="text-align:right;"> Sum </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;font-weight: bold;"> Brown Bread </td> <td style="text-align:right;"> 95 </td> <td style="text-align:right;"> 613 </td> <td style="text-align:right;"> 201 </td> <td style="text-align:right;"> 909 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Fruit </td> <td style="text-align:right;"> 455 </td> <td style="text-align:right;"> 2595 </td> <td style="text-align:right;"> 839 </td> <td style="text-align:right;"> 3889 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Veg not raw </td> <td style="text-align:right;"> 698 </td> <td style="text-align:right;"> 6122 </td> <td style="text-align:right;"> 376 </td> <td style="text-align:right;"> 7196 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Sum </td> <td style="text-align:right;"> 1248 </td> <td style="text-align:right;"> 9330 </td> <td style="text-align:right;"> 1416 </td> <td style="text-align:right;"> 11994 </td> </tr>
</tbody>
</table>

### Can we get **more information** from this contingency table? How?

--

### Yes! We can visualize their relationships!
--- class: middle ### Step 1 - convert the frequency cell to proportions <table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;"> <caption>Matrix X. Observed Cell Proportions</caption> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Others </th> <th style="text-align:left;"> Home </th> <th style="text-align:left;"> School_work </th> <th style="text-align:left;"> r: row masses </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> Brown Bread </td> <td style="text-align:left;"> 0.0079 </td> <td style="text-align:left;"> 0.0511 </td> <td style="text-align:left;"> 0.0168 </td> <td style="text-align:left;"> 0.0758 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Fruit </td> <td style="text-align:left;"> 0.0379 </td> <td style="text-align:left;"> 0.2160 </td> <td style="text-align:left;"> 0.0700 </td> <td style="text-align:left;"> 0.3240 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Veg not raw </td> <td style="text-align:left;"> 0.0582 </td> <td style="text-align:left;"> 0.5100 </td> <td style="text-align:left;"> 0.0313 </td> <td style="text-align:left;"> 0.6000 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> c: col masses </td> <td style="text-align:left;"> 0.1040 </td> <td style="text-align:left;"> 0.7780 </td> <td style="text-align:left;"> 0.1180 </td> <td style="text-align:left;"> 1.0000 </td> </tr> </tbody> </table> ??? 
- Row masses: `\(\mathbf{r} = \mathbf{X1}\)`

```r
# Row masses: row sums of the proportion matrix Xp (Xp times a column of ones)
r <- Xp %*% matrix(c(1, 1, 1), nrow = 3)
```

- Column masses: `\(\mathbf{c} = \mathbf{X}^T\mathbf{1}\)`

```r
# Column masses: column sums of Xp (transpose of Xp times a column of ones)
c <- t(Xp) %*% matrix(c(1, 1, 1), nrow = 3)
```

- Overall, most (77.8%) of these healthy food recordings were made at home.

---
class: middle

### Step 2 - The deviation (residual) of each cell in matrix X from its expected value

`$$\text{Deviation} = \mathbf{X} - \mathbf{rc}^T$$`

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<caption>Deviations from Expected</caption>
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Others </th> <th style="text-align:left;"> Home </th> <th style="text-align:left;"> School_work </th> <th style="text-align:left;"> Sum </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;font-weight: bold;"> Brown Bread </td> <td style="text-align:left;"> 0.00003 </td> <td style="text-align:left;"> -0.00785 </td> <td style="text-align:left;"> 0.00781 </td> <td style="text-align:left;"> -0.00000 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Fruit </td> <td style="text-align:left;"> 0.00420 </td> <td style="text-align:left;"> -0.03587 </td> <td style="text-align:left;"> 0.03167 </td> <td style="text-align:left;"> 0.00000 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Veg not raw </td> <td style="text-align:left;"> -0.00423 </td> <td style="text-align:left;"> 0.04371 </td> <td style="text-align:left;"> -0.03948 </td> <td style="text-align:left;"> 0.00000 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Sum </td> <td style="text-align:left;"> 0.00000 </td> <td style="text-align:left;"> 0.00000 </td> <td style="text-align:left;"> -0.00000 </td> <td style="text-align:left;"> 0.00000 </td> </tr>
</tbody>
</table>

???

Large positive (negative) values mean that the cell is observed more (less) often than expected under independence.
The residuals quantify the difference between the observed data and the data we would expect under the assumption that there is no relationship between the row and column categories of the table.

---
class: middle

### Step 3 - Standardise the residual for fair comparison (Z score)

$$
\mathbf{Z} = \mathbf{D}_r^{-\frac{1}{2}} \times \text{Deviation} \times \mathbf{D}_c^{-\frac{1}{2}}
$$

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<caption>Standardised Residuals Table</caption>
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Others </th> <th style="text-align:left;"> Home </th> <th style="text-align:left;"> School_work </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;font-weight: bold;"> Brown Bread </td> <td style="text-align:left;"> 0.00039 </td> <td style="text-align:left;"> -0.03231 </td> <td style="text-align:left;"> 0.08258 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Fruit </td> <td style="text-align:left;"> 0.02285 </td> <td style="text-align:left;"> -0.07142 </td> <td style="text-align:left;"> 0.16188 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Veg not raw </td> <td style="text-align:left;"> -0.01694 </td> <td style="text-align:left;"> 0.06399 </td> <td style="text-align:left;"> -0.14835 </td> </tr>
</tbody>
</table>

???

These standardised residuals are computed on proportions. Multiplied by `\(\sqrt{n}\)`, they become the Pearson residuals of the `\(\chi^2\)` test, which behave approximately like z-scores: values above 1.96 or below -1.96 point to cells that deviate from independence at roughly the 0.05 level. For count data, the standard deviation of a cell is approximately the square root of its expected value.
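---
class: middle

### Steps 1 to 3 in R

A minimal sketch of Steps 1 to 3, assuming the counts from the contingency table are held in a matrix named `freqtab`. The names `row_mass`, `col_mass`, `dev` and `Z` are chosen here for illustration and are not taken from the original analysis code.

```r
# Counts copied from the contingency table (rows: foods, columns: locations)
freqtab <- matrix(
  c( 95,  613, 201,
    455, 2595, 839,
    698, 6122, 376),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Brown Bread", "Fruit", "Veg not raw"),
                  c("Others", "Home", "School_work"))
)
n <- sum(freqtab)

# Step 1: observed cell proportions and the row / column masses
X        <- freqtab / n
row_mass <- rowSums(X)   # r = X 1
col_mass <- colSums(X)   # c = X^T 1

# Step 2: deviation of each cell from its expected proportion, X - r c^T
dev <- X - outer(row_mass, col_mass)

# Step 3: standardised residuals, Z = D_r^{-1/2} (X - r c^T) D_c^{-1/2}
Z <- diag(1 / sqrt(row_mass)) %*% dev %*% diag(1 / sqrt(col_mass))
dimnames(Z) <- dimnames(freqtab)

round(Z, 5)

# n times the sum of squared standardised residuals is the Pearson statistic
n * sum(Z^2)
```

Because `n * sum(Z^2)` equals the `\(\chi^2\)` statistic, the matrix `Z` can be read as the `\(\chi^2\)` test broken down cell by cell.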
---
class: middle

### Step 4 - Singular Value Decomposition

$$
\mathbf{Z} = \mathbf{UD}_\lambda \mathbf{V}^T
$$

Where,

- `\(\mathbf{Z}\)`: the standardised deviation matrix;
- `\(\mathbf{U}\)`: the left singular vectors of `\(\mathbf{Z}\)` <br> (principal axes of food groups space **for projection of locations**);
- `\(\mathbf{D}_\lambda\)`: the diagonal matrix of singular values;
- `\(\mathbf{V}^T\)`: right singular vectors of `\(\mathbf{Z}\)`<br> (principal axes of location space **for projection of food groups**).

---
class: middle

### Recall that in principal component analysis (PCA)

- Spectral decomposition

$$
\mathbf{S} = \mathbf{P}\Lambda\mathbf{P}^T
$$

### It is also easy to prove that

$$
\mathbf{D}_\lambda^2 = \Lambda
$$

???

- S is the variance-covariance matrix
- Lambda is the diagonal matrix of eigenvalues
- P is the orthogonal matrix of eigenvectors, which gives the coordinates of the rotated new variables

---
class: middle

### Step 4 - Singular Value Decomposition

.small[
$$
`\begin{aligned} \mathbf{Z} & = \mathbf{UD}_\lambda \mathbf{V}^T \\ \mathbf{Z} & = \left[\begin{matrix} 0.00039&-0.032&0.083\\ 0.02285&-0.071&0.162\\ -0.01694&0.064&-0.148 \end{matrix}\right] \\ = \left[\begin{matrix} -0.34&0.8980&0.28\\ -0.69&-0.4400&0.57\\ 0.63&0.0043&0.77 \end{matrix}\right]& \left(\begin{matrix} 0.26&0&0\\ 0&0.01&0\\ 0&0&0 \end{matrix}\right) \left[\begin{matrix} -0.10&-0.94&0.32\\ 0.39&0.26&0.88\\ -0.91&0.22&0.34 \end{matrix}\right]\\ \end{aligned}`
$$
]

In R, simply do

```r
svd(z)
```

---
class: middle

### Step 5 - Calculate the row and column profiles, then centre them

.pull-left[ .small[
- Share of each location among the recordings of each food:

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<caption>Matrix R.
Row Profile</caption>
<thead>
<tr> <th style="text-align:left;"> Others </th> <th style="text-align:left;"> Home </th> <th style="text-align:left;"> School_work </th> <th style="text-align:left;"> Sum </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 0.1050 </td> <td style="text-align:left;"> 0.6740 </td> <td style="text-align:left;"> 0.2210 </td> <td style="text-align:left;"> 1.0000 </td> </tr>
<tr> <td style="text-align:left;"> 0.1170 </td> <td style="text-align:left;"> 0.6670 </td> <td style="text-align:left;"> 0.2160 </td> <td style="text-align:left;"> 1.0000 </td> </tr>
<tr> <td style="text-align:left;"> 0.0970 </td> <td style="text-align:left;"> 0.8510 </td> <td style="text-align:left;"> 0.0523 </td> <td style="text-align:left;"> 1.0000 </td> </tr>
</tbody>
</table>

$$
`\begin{aligned} \mathbf{R} & = \mathbf{D}^{-1}_r\mathbf{X} \\ \mathbf{R} &\; - \mathbf{1c}^T \end{aligned}`
$$
]]

.pull-right[ .small[
- Share of each food among the recordings at each location:

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<caption>Matrix C.
Column Profile</caption>
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Others </th> <th style="text-align:left;"> Home </th> <th style="text-align:left;"> School_work </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;font-weight: bold;"> Brown Bread </td> <td style="text-align:left;"> 0.0761 </td> <td style="text-align:left;"> 0.0657 </td> <td style="text-align:left;"> 0.1420 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Fruit </td> <td style="text-align:left;"> 0.3650 </td> <td style="text-align:left;"> 0.2780 </td> <td style="text-align:left;"> 0.5930 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Veg not raw </td> <td style="text-align:left;"> 0.5590 </td> <td style="text-align:left;"> 0.6560 </td> <td style="text-align:left;"> 0.2660 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Sum </td> <td style="text-align:left;"> 1.0000 </td> <td style="text-align:left;"> 1.0000 </td> <td style="text-align:left;"> 1.0000 </td> </tr>
</tbody>
</table>

$$
`\begin{aligned} \mathbf{C} & = \mathbf{D}^{-1}_c\mathbf{X}^T \\ \mathbf{C} &\; - \mathbf{1r}^T \end{aligned}`
$$
] ]

---
class: middle

### Step 6 - Find principal coordinates for rows and columns

.pull-left[ .small[
- Project the centred row profiles (food groups) onto the principal axes of locations `\(\mathbf{V}\)`

$$
F = (\mathbf{R} - \mathbf{1c}^T)(\mathbf{D}_c^{-1})(\mathbf{D}_c^{\frac{1}{2}}\mathbf{V})
$$

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<caption>Matrix F: Principal Coordinates for Rows</caption>
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Horizontal </th> <th style="text-align:left;"> Vertical </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;font-weight: bold;"> Brown Bread </td> <td style="text-align:left;"> -0.3203 </td> <td style="text-align:left;"> 0.0339 </td> </tr>
<tr> <td style="text-align:left;font-weight:
bold;"> Fruit </td> <td style="text-align:left;"> -0.3132 </td> <td style="text-align:left;"> -0.0080 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Veg not raw </td> <td style="text-align:left;"> 0.2097 </td> <td style="text-align:left;"> 0.0001 </td> </tr>
</tbody>
</table>
]]

.pull-right[ .small[
- Project the centred column profiles (locations) onto the principal axes of food groups `\(\mathbf{U}\)`

$$
G = (\mathbf{C} - \mathbf{1r}^T)(\mathbf{D}_r^{-1})(\mathbf{D}_r^{\frac{1}{2}}\mathbf{U})
$$

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<caption>Matrix G: Principal Coordinates for Columns</caption>
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Horizontal </th> <th style="text-align:left;"> Vertical </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;font-weight: bold;"> Others </td> <td style="text-align:left;"> -0.0828 </td> <td style="text-align:left;"> -0.0303 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Home </td> <td style="text-align:left;"> 0.1147 </td> <td style="text-align:left;"> 0.0031 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> School_work </td> <td style="text-align:left;"> -0.6827 </td> <td style="text-align:left;"> 0.0066 </td> </tr>
</tbody>
</table>
]]

---
background-image: url("./img/CA_3fg3lc.png")
background-position: 50% 50%
background-size: contain

???

- The vertical axis may be interpreted as moving from structured environments (home, school-work) to unstructured ones (Other: leisure, party, mobile, bus, etc.).
- The percentage label for each axis measures how much of the total variation in the data (inertia) is captured by that axis.
- So 99.8% of the total variation in the contingency table is explained by the horizontal axis.
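---
class: middle

### Steps 4 to 6 in R

A minimal sketch of Steps 4 to 6 under the same assumptions as before: the counts come from the 3 x 3 contingency table, and the object names (`freqtab`, `row_mass`, `col_mass`, `F_rows`, `G_cols`, `share`) are illustrative rather than taken from the original analysis code. The signs of singular vectors are arbitrary, so coordinates may come out mirrored relative to the tables.

```r
# Counts copied from the contingency table (rows: foods, columns: locations)
freqtab <- matrix(
  c( 95,  613, 201,
    455, 2595, 839,
    698, 6122, 376),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Brown Bread", "Fruit", "Veg not raw"),
                  c("Others", "Home", "School_work"))
)
X        <- freqtab / sum(freqtab)
row_mass <- rowSums(X)
col_mass <- colSums(X)
Z <- diag(1 / sqrt(row_mass)) %*% (X - outer(row_mass, col_mass)) %*%
  diag(1 / sqrt(col_mass))

# Step 4: singular value decomposition, Z = U D V^T
sv <- svd(Z)
k  <- 2  # keep two axes; the third singular value is numerically zero

# Step 5: centred row and column profiles
R_centred <- sweep(X / row_mass, 2, col_mass)     # R - 1 c^T
C_centred <- sweep(t(X) / col_mass, 2, row_mass)  # C - 1 r^T

# Step 6: principal coordinates for rows (foods) and columns (locations)
F_rows <- R_centred %*% diag(1 / sqrt(col_mass)) %*% sv$v[, 1:k]
G_cols <- C_centred %*% diag(1 / sqrt(row_mass)) %*% sv$u[, 1:k]
dimnames(F_rows) <- list(rownames(freqtab), c("Horizontal", "Vertical"))
dimnames(G_cols) <- list(colnames(freqtab), c("Horizontal", "Vertical"))

# Share of total inertia captured by each axis (the axis labels on the biplot)
share <- sv$d^2 / sum(sv$d^2)

round(F_rows, 4)
round(G_cols, 4)
round(100 * share[1:k], 1)
```

Plotting the rows of `F_rows` and `G_cols` on the same pair of axes gives the biplot.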
--- class: middle ### Let's look at the row (food group) profile again .pull-left[ .small[ <table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;"> <caption>Matrix R. Row Profile</caption> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Others </th> <th style="text-align:left;"> Home </th> <th style="text-align:left;"> School_work </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> Brown Bread </td> <td style="text-align:left;"> 0.1050 </td> <td style="text-align:left;"> 0.6740 </td> <td style="text-align:left;"> 0.2210 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;font-weight: bold;color: white !important;background-color: #d95f0e !important;"> Fruit </td> <td style="text-align:left;font-weight: bold;color: white !important;background-color: #d95f0e !important;"> 0.1170 </td> <td style="text-align:left;font-weight: bold;color: white !important;background-color: #d95f0e !important;"> 0.6670 </td> <td style="text-align:left;font-weight: bold;color: white !important;background-color: #d95f0e !important;"> 0.2160 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Veg not raw </td> <td style="text-align:left;"> 0.0970 </td> <td style="text-align:left;"> 0.8510 </td> <td style="text-align:left;"> 0.0523 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;font-weight: bold;color: white !important;background-color: #666666 !important;"> c: col masses </td> <td style="text-align:left;font-weight: bold;color: white !important;background-color: #666666 !important;"> 0.1040 </td> <td style="text-align:left;font-weight: bold;color: white !important;background-color: #666666 !important;"> 0.7780 </td> <td style="text-align:left;font-weight: bold;color: white !important;background-color: #666666 !important;"> 0.1180 </td> </tr> </tbody> </table> ]] .pull-right[ .small[1. The origin point is the average profile of these points. 2. 
On average, 77.8% of these healthy foods were eaten at home.
3. Although 66.7% of fruit intake was recorded at home, this is lower than the average.
4. So fruit is plotted away from Home.
5. Similarly, cooked vegetables (85.1% at home, above the average) are plotted closer to Home.
]]

???

- The length of the vector from the origin to a point represents that point's deviation (residual) from the average profile.

---
background-image: url("./img/CA_3fg3lc.png")
background-position: 50% 50%
background-size: contain

---
class: middle, center

### The full contingency table of food recordings and eating locations

.small[
]

---
class: middle

# Objectives

- To investigate and describe visually the association between food groups and locations using Correspondence Analysis (CA).
- To investigate and describe visually the association between food groups and time slots using CA.
- To generate and test potential hypotheses suggested by the close associations identified using CA.

---
class: middle

# Methods and strategy

1. Randomly select 50% of the food recordings;
2. Visualize the association structure between locations and food groups;
3. Generate hypotheses about these associations, and test them using the other half of the data;
4. Adjust *p* values for multiple testing using the Bonferroni method.

---
class: middle

# Data from the NDNS RP

- NDNS RP (started in 2008):
    - On-going cross-sectional study
    - Representative of the UK population <br>- about 1000/year
    - Food diary kept over 4 consecutive days
- 2821 teenagers <br> (1396 boys, 1425 girls, aged 11-18 inclusive)
- 208037 food recordings collected.

---
background-image: url("./img/Diary1.png")
background-position: 50% 50%
background-size: contain

---
background-image: url("./img/Diary2.png")
background-position: 50% 50%
background-size: contain

---
background-image: url("./img/Diary03.png")
background-position: 50% 50%
background-size: contain

---
class: middle

# Hierarchies in the data

1. Food recordings are nested within individuals;
    - Within-person correlation
2. Individuals are nested within 12 survey regions;
    - The complex survey design

## Solution:

- Mixed (random) effect logistic regression models using generalised estimating equations (GEE).
- Covariates included: <br>Sex, age, socio-economic class

???

- GEE estimates odds ratios that can be interpreted as population-average effects.
- The estimates remain consistent even when the working correlation structure is mis-specified.

---
class: middle

### Frequencies and proportions of food recordings by eating locations

.small[
]

---
class: left

## The healthiness of food groups

- The healthiness score is based on the nutrient profiling system described by Rayner et al. (2013).
- Categorising foods by this score is controversial; roughly, the lower the score, the healthier the food is considered.
- Later we use tertiles of the score to separate the foods into three groups.

<br>

.footnote[
<div class="csl-entry"> <div class="csl-left-margin"></div><div class="csl-right-inline">M. Rayner, P. Scarborough, and A. Kaur, “Nutrient profiling and the regulation of marketing to children. Possibilities and pitfalls,” <i>Appetite</i>, vol. 62, pp. 232–235, 2013.</div> </div>
]

---
class:

### Food groups and contribution to calories

.small[
] --- class: middle, center ### The full contingency table of food recordings and eating locations .small[
]

---
background-image: url("./img/CA60fg7lc_not.png")
background-position: 50% 50%
background-size: contain

---
background-image: url("./img/CA60fg7lc_big.png")
background-position: 50% 50%
background-size: contain

???

- This plot captures 88.8% of the total variation in the contingency table.
- Horizontally, Home contrasts with all the other locations;
- Vertically, school and work locations differ from the other locations;
- School and Home are hugely different from Leisure locations (clubs and cafés); Home and Move are also far from similar.

---
class: inverse, center, middle

# Stratified CA biplots

---
background-image: url("./img/F1st20_loc.png")
background-position: 50% 50%
background-size: contain

???

- This figure explains 90% of the total variation.
- Chicken, fish, brown bread, fruit, white meal bread are more likely to be consumed at school and work.
- Pasta, rice, and milk are more likely to be consumed at home.
- Beer, wine, and chips are associated with being with friends and with leisure locations.

---
background-image: url("./img/F2nd20_loc.png")
background-position: 50% 50%
background-size: contain

???

- Regular sweet soft drinks are closely associated with Friends and with being on the move.
- Spirits and liqueurs are far away from the other foods here and closer to leisure locations.

---
background-image: url("./img/F3rd20_loc.png")
background-position: 50% 50%
background-size: contain

???

- Cheese, crisps, and biscuits are associated with school and work.
- Sugar and candy are associated with being on the move.
- Meat pastries and chocolate are plotted together and are associated with Friends.
---
class:

## To simplify testing

Location definitions were collapsed:

.pull-left[ .small[
<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Original location </th> <th style="text-align:left;"> Collapsed location </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Home </td> <td style="text-align:left;"> Home </td> </tr>
<tr> <td style="text-align:left;"> School </td> <td style="text-align:left;"> School-Work </td> </tr>
<tr> <td style="text-align:left;"> Work </td> <td style="text-align:left;"> School-Work </td> </tr>
<tr> <td style="text-align:left;"> Friends/Carers/Relatives </td> <td style="text-align:left;"> Other </td> </tr>
<tr> <td style="text-align:left;"> Mobile: bus, car, street, etc </td> <td style="text-align:left;"> Other </td> </tr>
<tr> <td style="text-align:left;"> Leisure: café, club, church, etc </td> <td style="text-align:left;"> Other </td> </tr>
<tr> <td style="text-align:left;"> Other </td> <td style="text-align:left;"> Other </td> </tr>
</tbody>
</table>
] ]

.pull-right[ .small[
- Leisure, on the move, Friends, and Other are usually close to each other and in the same quadrant.
- The same holds for School and Work.
- So the interest becomes finding the foods that are associated with locations away from home and school.
]]

---
class: middle

## Hypotheses:

1. Chocolate and meat pastries appear to be highly associated in where they are consumed.
2. Sweetened soft drinks, chips, fruit, chocolate, and meat pastries were more likely to be consumed when a teenager is away from home, or away from school.
---
class: middle

## Correlation between chocolate and meat pastries

- The correlation between two profile vectors, `\(\mathbf{f}_1\)` and `\(\mathbf{f}_2\)`, can be shown to be equivalent to the cosine of the angle between them:

`$$\cos\theta = \frac{\mathbf{f}_1^T\mathbf{f}_2}{\|\mathbf{f}_1\|\,\|\mathbf{f}_2\|}$$`

- We applied a bootstrap approach (10000 samples drawn with replacement), ran CA on each sample, and calculated `\(\cos\theta\)` for each sample.

---
class: middle, center

### Bootstrap distribution of the correlation for chocolate and meat pastries

![Bootstrap](img/Screen Shot 2019-09-11 at 19.19.45.png)

---
class: middle

## Location profile for chocolate and meat pastries

.small[
<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Friend </th> <th style="text-align:left;"> Home </th> <th style="text-align:left;"> Leis </th> <th style="text-align:left;"> Move </th> <th style="text-align:left;"> Other </th> <th style="text-align:left;"> School </th> <th style="text-align:left;"> Work </th> <th style="text-align:left;"> Total </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;font-weight: bold;"> Chocolate </td> <td style="text-align:left;"> 105 </td> <td style="text-align:left;"> 1077 </td> <td style="text-align:left;"> 120 </td> <td style="text-align:left;"> 117 </td> <td style="text-align:left;"> 147 </td> <td style="text-align:left;"> 331 </td> <td style="text-align:left;"> 32 </td> <td style="text-align:left;"> 1929 </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> </td> <td style="text-align:left;"> 5.4% </td> <td style="text-align:left;"> 55.8% </td> <td style="text-align:left;"> 6.2% </td> <td style="text-align:left;"> 6.1% </td> <td style="text-align:left;"> 7.6% </td> <td style="text-align:left;"> 17.2% </td> <td style="text-align:left;"> 1.7% </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Meat
pastries </td> <td style="text-align:left;"> 23 </td> <td style="text-align:left;"> 243 </td> <td style="text-align:left;"> 34 </td> <td style="text-align:left;"> 27 </td> <td style="text-align:left;"> 35 </td> <td style="text-align:left;"> 86 </td> <td style="text-align:left;"> 11 </td> <td style="text-align:left;"> 459 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> </td> <td style="text-align:left;"> 5.0% </td> <td style="text-align:left;"> 52.9% </td> <td style="text-align:left;"> 7.4% </td> <td style="text-align:left;"> 5.9% </td> <td style="text-align:left;"> 7.6% </td> <td style="text-align:left;"> 18.7% </td> <td style="text-align:left;"> 2.4% </td> <td style="text-align:left;"> </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Row mass </td> <td style="text-align:left;"> 4.5% </td> <td style="text-align:left;"> 63.4% </td> <td style="text-align:left;"> 5.5% </td> <td style="text-align:left;"> 3.0% </td> <td style="text-align:left;"> 4.2% </td> <td style="text-align:left;"> 17.4% </td> <td style="text-align:left;"> 2.0% </td> <td style="text-align:left;"> </td> </tr> </tbody> </table> ] --- class:middle ### Soft-drinks, chips, chocolate, meat pastries and fruit at locations away from home/school .small[ <table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> H0 </th> <th style="text-align:right;"> OR vs Home </th> <th style="text-align:left;"> 99% CI </th> <th style="text-align:right;"> OR vs School-work </th> <th style="text-align:left;"> 99% CI </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> Sweetened soft drinks </td> <td style="text-align:right;"> 2.9 </td> <td style="text-align:left;"> (2.3, 2.5) </td> <td style="text-align:right;"> 2.30 </td> <td style="text-align:left;"> (1.7, 3.1) </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Chips </td> <td style="text-align:right;"> 2.8 </td> <td 
style="text-align:left;"> (2.2, 3.6) </td> <td style="text-align:right;"> 3.40 </td> <td style="text-align:left;"> (2.1, 5.3) </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Chocolates </td> <td style="text-align:right;"> 2.5 </td> <td style="text-align:left;"> (1.8, 3.4) </td> <td style="text-align:right;"> 1.80 </td> <td style="text-align:left;"> (1.2, 2.8) </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Meat pastries </td> <td style="text-align:right;"> 2.8 </td> <td style="text-align:left;"> (1.5, 5.0) </td> <td style="text-align:right;"> 1.30 </td> <td style="text-align:left;"> (0.6, 3.0) </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> Fruit </td> <td style="text-align:right;"> 0.7 </td> <td style="text-align:left;"> (0.49, 0.98) </td> <td style="text-align:right;"> 0.44 </td> <td style="text-align:left;"> (0.29, 0.67) </td> </tr>
</tbody>
</table>
]

---
class:

# Conclusion

- Sweetened soft drinks, chips, and chocolates are more likely to be consumed when a teenager is at locations other than Home or School.
- Meat pastries are more likely to be consumed at other locations than at Home, but there is no clear difference between other locations and School-Work (the 99% CI includes 1).
- The odds of eating fruit at other locations are significantly lower than at Home or at School-Work.

---
class: middle

## Correlation between chocolate and meat pastries

- Chocolate and meat pastries have location profiles which are strongly correlated.
- Both are convenient foods to carry and eat.
- Both are available at low prices (<£2).

---
class: middle

## Why are these findings important?

- A large proportion of the food consumed at other locations is packaged and sold, rather than cooked as meals at home or school.
- Policymakers may need to legislate and regulate what is sold, how it is priced, and how it is packaged and promoted to teenagers.
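---
class: middle

### Sketch: cosine between two location profiles

A small illustration of the `\(\cos\theta\)` formula, using the chocolate and meat pastries counts from the location profile table. This is an assumption-laden sketch, not the bootstrap analysis itself: it uses the average profile printed in that table, works in the full seven-location space with the chi-squared metric rather than in the reduced CA space, and the helper `centred_profile()` is a hypothetical name introduced here.

```r
# Counts by location, copied from the location profile table
locations     <- c("Friend", "Home", "Leis", "Move", "Other", "School", "Work")
chocolate     <- c(105, 1077, 120, 117, 147, 331, 32)
meat_pastries <- c( 23,  243,  34,  27,  35,  86, 11)

# Average location profile (the masses printed in the last row of that table)
avg_profile <- c(4.5, 63.4, 5.5, 3.0, 4.2, 17.4, 2.0) / 100

# Hypothetical helper: profile minus the average profile,
# rescaled by sqrt(mass) so that distances follow the chi-squared metric
centred_profile <- function(counts, avg) {
  (counts / sum(counts) - avg) / sqrt(avg)
}

f1 <- centred_profile(chocolate, avg_profile)
f2 <- centred_profile(meat_pastries, avg_profile)
names(f1) <- names(f2) <- locations

# Cosine of the angle between the two centred profiles
cos_theta <- sum(f1 * f2) / (sqrt(sum(f1^2)) * sqrt(sum(f2^2)))
cos_theta
```

A value close to 1 means the two foods deviate from the average location profile in nearly the same direction; repeating this on resampled data would give a distribution of `cos_theta`.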
---
class: middle, center, inverse

# Food group consumption and time slots

---
class: middle

## Objectives

- To investigate and describe the relationship between **food** eaten by British adults, **time slots** and **diabetes status**.

---
background-image: url("./img/Diary1.png")
background-position: 50% 50%
background-size: contain

---
background-image: url("./img/Diary2.png")
background-position: 50% 50%
background-size: contain

---
background-image: url("./img/Diary03.png")
background-position: 50% 50%
background-size: contain

---
class: middle

## Methods and strategies

- Correspondence analysis (CA)
- Biplots stratified by diabetes status
- Mixed effect logistic regression using generalized estimating equations (GEE).
- Generate hypotheses from half of the food recordings, and test them using the other half.
- Time slots defined: 6-9 am, 9 am-12 noon, 12-2 pm, 2-5 pm, 5-8 pm, 8-10 pm and 10 pm-6 am.

---
class: middle

## Definition of diabetes

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Diabetes Status </th> <th style="text-align:left;"> Self-reported </th> <th style="text-align:left;"> Glucose (mmol/L) </th> <th style="text-align:left;"> HbA1c (%) </th> <th style="text-align:right;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> No diabetes </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> < 6.1 </td> <td style="text-align:left;"> < 6.5 </td> <td style="text-align:right;"> 2626 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Pre-diabetes </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> 6.1 ~ 6.99 </td> <td style="text-align:left;"> -- </td> <td style="text-align:right;"> 133 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Undiagnosed </td> <td style="text-align:left;"> No </td> <td style="text-align:left;"> >= 7.00 </td> <td style="text-align:left;"> >=
6.5 </td> <td style="text-align:right;"> 99 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Diagnosed </td> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> -- </td> <td style="text-align:left;"> -- </td> <td style="text-align:right;"> 227 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Missing </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:right;"> 3717 </td> </tr> </tbody> </table>

---
class: middle

## Summary statistics

6802 adults (age >= 19 years), 2810 men and 3992 women from NDNS RP data (2008-2018). 749,026 recordings of food group entries.

.small[
<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> MealTimeSlot </th> <th style="text-align:right;"> n </th> <th style="text-align:left;"> rel.freq </th> <th style="text-align:left;"> cum.freq </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> 6am to 8:59am </td> <td style="text-align:right;"> 107144 </td> <td style="text-align:left;"> 14.304% </td> <td style="text-align:left;"> 14.304% </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> 9am to 11:59am </td> <td style="text-align:right;"> 110614 </td> <td style="text-align:left;"> 14.768% </td> <td style="text-align:left;"> 29.072% </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> 12 noon to 1:59pm </td> <td style="text-align:right;"> 138183 </td> <td style="text-align:left;"> 18.448% </td> <td style="text-align:left;"> 47.521% </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> 2pm to 4:59pm </td> <td style="text-align:right;"> 94606 </td> <td style="text-align:left;"> 12.631% </td> <td style="text-align:left;"> 60.151% </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> 5pm to 7:59pm </td> <td style="text-align:right;"> 180498 </td> <td
style="text-align:left;"> 24.098% </td> <td style="text-align:left;"> 84.249% </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> 8pm to 9:59pm </td> <td style="text-align:right;"> 81716 </td> <td style="text-align:left;"> 10.91% </td> <td style="text-align:left;"> 95.158% </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> 10pm to 5:59am </td> <td style="text-align:right;"> 36265 </td> <td style="text-align:left;"> 4.842% </td> <td style="text-align:left;"> 100% </td> </tr> </tbody> </table>
]

---
class: middle

### Food groups and contribution to calories

.small[
]

---
class: middle, center

### The full contingency table of food recordings and time slots

.tiny[
]

---
class: middle, center, inverse

# CA biplots in total sample

---
background-image: url("./img/F60T7.png")
background-position: 50% 50%
background-size: contain

???

- The later time slots of the day (8 pm onwards and 10 pm onwards) seem to sit on the upper side of the plot.
- In the morning, people are having breakfast foods (cereals/milk).
- Chocolate, beer, spirits, sugar candy and sweetened soft drinks cluster around the later time slots.

---
background-image: url("./img/F201T7.png")
background-position: 50% 50%
background-size: contain

???

- In the figures stratified by food group (for better visualisation), it is clearer that the late time slots (8 pm-10 pm, 10 pm-6 am) differ from the earlier hours.

---
background-image: url("./img/F202T7.png")
background-position: 50% 50%
background-size: contain

---
background-image: url("./img/F203T7.png")
background-position: 50% 50%
background-size: contain

---
class: middle, center, inverse

# Food group consumption and time slots <br> stratified by DM status

---
background-image: url("./img/F60T7_nonDM.png")
background-position: 50% 50%
background-size: contain

???

- In participants without diabetes, sugar, beer, wine, crisps, sweetened soft drinks and biscuits are more closely associated with the night-time slots.

---
background-image: url("./img/F60T7_DM.png")
background-position: 50% 50%
background-size: contain

???

- For diagnosed (self-reported) DM patients, beer, spirits, chocolate, biscuits and regular sweetened soft drinks appear close to the time slots after 8 pm.

---
background-image: url("./img/F60T7_UndiagDM.png")
background-position: 50% 50%
background-size: contain

???

- Among undiagnosed DM patients, sugar, chocolate, beer, regular soft drinks, wine, ice cream, biscuits and puddings seem to appear in the later time slots.

---
background-image: url("./img/F60T7_PreDM.png")
background-position: 50% 50%
background-size: contain

---
class: center

##### OR for food groups eaten at night (8 pm onwards) vs. earlier times, in the total sample and by DM status

.between[
]

---
class: middle

# Discussion (1)

- All of the unhealthy foods that emerged from the CA were significantly more likely to be eaten after 8 pm.
- These included alcoholic/sweetened beverages, chocolates and other foods rich in added sugars and saturated fats, such as biscuits and ice cream.
- Food and drinks consumed in the evening/night time slots tend to be highly processed and easily accessible.

<!-- - Potentially alcoholic/sweetened beverages and foods rich in added sugars and saturated fats. -->
<!-- - The greater odds of eating puddings in the evening might partly explain the undiagnosed/prediabetic state. -->

---
class: middle

# Discussion (2)

- Assessing the relationships between less healthy foods and the timing of eating is a first step towards identifying specific public health targets for behaviour change/modification.
- Undiagnosed T2D patients might be at higher risk of causing/worsening their condition, as they had higher odds of consuming a number of less healthy foods after 8 pm (sugar confectionery, biscuits, sweetened soft drinks and puddings).
- The cross-sectional nature of the survey warrants further investigation in longitudinal cohort studies.

---
background-image: url("./img/CAbook.jpg")
background-position: 50% 50%
background-size: contain

---
class: left
background-image: url("./img/Screen Shot 2019-09-13 at 22.40.12.png")
background-position: 50% 50%
background-size: contain

### The Cholera map

.footnote[
<div class="csl-entry"> <div class="csl-right-inline">N. Shiode, et al. International Journal of Health Geographics 2015.</div> </div>
]

---
class: inverse
background-image: url("./img/Screen Shot 2019-09-13 at 22.29.43.png")
background-position: 50% 50%
background-size: contain

---
class: left, bottom, inverse
background-image: url("./img/IMG_0258.png")
background-position: 50% 50%
background-size: contain

# Thanks