appDatos: Logic and Workflow

Technical documentation of the Shiny application

Published

February 21, 2026

Overview

appDatos is a Shiny application designed for the interactive exploration of environmental and food safety laboratory data. It allows the user to:

  • Select a matrix (type of product or sample, e.g. water, wine, fruit),
  • Choose the analytes to display on the X and Y axes,
  • Apply filters by label (rótulo) and by analytical entity,
  • Explore data through a scatter plot (two variables) or a histogram (one variable),
  • Consult a Spearman correlation table for the selected X-axis analyte.

Packages used

library(shinyWidgets)    # Enhanced widgets (pickerInput, actionBttn)
library(bslib)           # Modern theme and layout (page_navbar, card, sidebar)
library(tidyverse)       # Data manipulation and plotting (ggplot2, dplyr, stringr)
library(plotly)          # Interactive charts (ggplotly)
library(shinycssloaders) # Loading spinner while plots render
library(DT)              # Interactive tables
library(feather)         # Fast file reading (loaded but not actively used)

Data file structure

Data are stored as .rds files in the same directory as app.R. There are three file types per matrix:

Type            | Name pattern       | Contents
----------------|--------------------|--------------------------------------------------------------
Analytical data | <key>.rds          | Table with one column per analyte + Rótulo, Entidad, Fracción
Correlations    | correl_<key>.rds   | Spearman correlation matrix between analytes
Analyte names   | analisis_<key>.rds | Character vector of available analyte names for the matrix
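As a minimal sketch of the naming scheme above, the three paths for one matrix could be derived from its key. The helper name rutas_matriz() and its dir argument are hypothetical and do not appear in app.R; the key "agupr" is the one this document later shows for AGUA PROCESO.

```r
# Hypothetical helper (not part of app.R): build the three file paths
# that belong to one matrix key, following the patterns in the table.
rutas_matriz <- function(key, dir = ".") {
  list(
    datos    = file.path(dir, paste0(key, ".rds")),
    correl   = file.path(dir, paste0("correl_", key, ".rds")),
    analisis = file.path(dir, paste0("analisis_", key, ".rds"))
  )
}

rutas <- rutas_matriz("agupr")
```

Keeping the patterns in one place like this means a renamed file only has to be changed once.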

Available matrices (30 total)

matrices <- c(
  "AGUA PROCESO", "AGUA SUPERFICIAL", "AGUA EFLUENTE", "AGUA SUMINISTRO",
  "AGUA SUBTE", "SUELO REMEDIACIÓN", "JCONT LIMÓN", "JSIM LIMÓN",
  "ACEITE LIMÓN", "DESH TE", "JCONT NARANJA", "JSIM NARANJA",
  "JCONC MANZANA", "FF PERA", "FF MANZANA", "FF DURAZNO", "FF LIMÓN",
  "BAYAS UVA", "HORTALIZA TOMATE", "VINO TINTO", "VINO BLANCO",
  "VINO ROSADO", "PSIM DURAZNO", "PCON DURAZNO", "PSIM DAMASCO",
  "PCON DAMASCO", "PSIM MANZANA", "PCON MANZANA", "CEREAL MAÍZ",
  "TRUCHA MÚSCULO"
)

Data preparation

The .rds files consumed by the application are produced by two functions defined in carga.R: carga_datos() for the analytical data tables and correlaciones() for the Spearman correlation matrices. Both share a common data preparation pipeline applied to a raw CSV exported from the LIMS.

Common pipeline

1. Column selection and renaming

The CSV is read with data.table::fread() for speed. Only the relevant columns are kept, and Resultado convertido is renamed to Resultado_conv.

datos <- data.table::fread(file = archivo) %>%
  select(
    Fracción, `Tipo de producto`, Matriz, Rótulo, Entidad, Análisis,
    Resultado, `Modificador de resultado`, `Límite de detección`,
    `Límite de cuantificación`, `Resultado convertido`, `Unidad inicial`
  ) %>%
  rename(Resultado_conv = `Resultado convertido`) %>%
  mutate(Resultado_conv = as.numeric(Resultado_conv))

2. Handling censored values

Results flagged as below the detection or quantification limit are replaced by the corresponding limit value:

datos <- datos %>%
  mutate(Resultado_conv = case_when(
    `Modificador de resultado` == "nd" ~ as.numeric(`Límite de detección`),
    `Modificador de resultado` == "<"  ~ as.numeric(`Límite de cuantificación`),
    .default = Resultado_conv
  ))
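To make the substitution rule concrete, here is a small worked example on three toy rows. The limit values are invented for illustration; the column names follow the excerpt above, and .default requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Toy rows with hypothetical limits; column names follow the LIMS export.
toy <- tibble::tibble(
  `Modificador de resultado` = c("nd", "<", ""),
  `Límite de detección`      = c("0.01", "0.01", "0.01"),
  `Límite de cuantificación` = c("0.05", "0.05", "0.05"),
  Resultado_conv             = c(NA, NA, 1.2)
)

censurado <- toy %>%
  mutate(Resultado_conv = case_when(
    `Modificador de resultado` == "nd" ~ as.numeric(`Límite de detección`),
    `Modificador de resultado` == "<"  ~ as.numeric(`Límite de cuantificación`),
    .default = Resultado_conv
  ))

censurado$Resultado_conv
```

The "nd" row takes the detection limit (0.01), the "<" row takes the quantification limit (0.05), and the unflagged row keeps its measured value (1.2).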

3. Unit conversion for scaled results

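Some rows carry results expressed with a power-of-10 multiplier encoded in Unidad inicial (e.g. "x1³" means ×10³). These rows are separated, their Resultado is rescaled, and they are rejoined with the rest. The rescaling touches Resultado only, which step 4 uses as a fallback when Resultado_conv is missing: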

dat1 <- datos %>% filter(str_detect(`Unidad inicial`, "x1"))

supin <- str_extract(dat1$`Unidad inicial`, "\\W")  # first non-word character, expected to be the superscript

dat1 <- dat1 %>%
  mutate(y = as.numeric(as.factor(supin))) %>%       # integer code = position among the sorted levels present (1–6 only if all six occur)
  mutate(Resultado = case_when(
    y == 1 ~ Resultado * 1e3,
    y == 2 ~ Resultado * 1e4,
    y == 3 ~ Resultado * 1e5,
    y == 4 ~ Resultado * 1e6,
    y == 5 ~ Resultado * 1e7,
    y == 6 ~ Resultado * 1e8
  ))

datos <- datos %>% filter(!str_detect(`Unidad inicial`, "x1"))
datos <- full_join(dat1, datos)  # no `by`: joins on all shared columns; helper column y is dropped in step 4
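Because as.factor() numbers only the levels that are present in a given file, the integer codes in the excerpt line up with ×10³ to ×10⁸ only when all six superscripts occur. A minimal sketch of an explicit lookup that does not depend on which superscripts are present might look as follows. The names exponentes and factor_escala() are hypothetical and not part of carga.R, and the superscripts are written as Unicode escapes so the snippet does not depend on the file encoding.

```r
library(stringr)

# Hypothetical lookup (not in carga.R): superscript character -> exponent.
# "\u00b3" is superscript three; "\u2074" to "\u2078" are superscripts four to eight.
exponentes <- setNames(
  3:8,
  c("\u00b3", "\u2074", "\u2075", "\u2076", "\u2077", "\u2078")
)

# Multiplier for each unit string; units without a known superscript get 1.
factor_escala <- function(unidad) {
  sup <- str_extract(unidad, "[\u00b3\u2074-\u2078]")
  ifelse(is.na(sup), 1, 10^unname(exponentes[sup]))
}

unidades <- c("x1\u00b3", "x1\u2076", "mg/l")
factor_escala(unidades)

# For comparison: with only two superscripts present, the factor codes are
# 1 and 2, so neither row would reach the y == 4 (x 1e6) branch.
codigos <- as.numeric(as.factor(c("\u00b3", "\u2076")))
```

With the lookup, a file that contains only some of the superscripts still gets the intended multipliers.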

4. Result consolidation and NA handling

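Resultado_conv is the primary numeric value. Missing values are first set to 0 so that rows where Resultado_conv is 0 (originally missing) fall back to Resultado. All zeros are then converted back to NA, which means a genuine zero result is treated as missing too, and drop_na() removes every row that still has an NA in any of the remaining columns: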

datos <- datos %>%
  select(Fracción, `Tipo de producto`, Matriz, Rótulo, Entidad,
         Análisis, Resultado, Resultado_conv, `Unidad inicial`) %>%
  replace(is.na(.), 0) %>%
  mutate(Resultado_conv = ifelse(Resultado_conv == 0, Resultado, Resultado_conv)) %>%
  replace(. == 0, NA) %>%
  select(Fracción, `Tipo de producto`, Matriz, Rótulo, Entidad,
         Análisis, Resultado_conv, `Unidad inicial`) %>%
  drop_na()
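For comparison only, the same fallback can be written with dplyr::coalesce(), which replaces NA values without routing them through 0. This is an alternative sketch, not what carga.R does, and it assumes Resultado is already numeric; unlike the excerpt above, it keeps genuine zero results.

```r
library(dplyr)

# Toy data: row 3 is a genuine zero, row 4 has no result at all.
toy <- tibble::tibble(
  Resultado      = c(5, 7, 0, NA),
  Resultado_conv = c(NA, 2, 0, NA)
)

consolidado <- toy %>%
  # Use Resultado only where Resultado_conv is NA
  mutate(Resultado_conv = coalesce(Resultado_conv, Resultado)) %>%
  # Drop rows that have no result in either column
  filter(!is.na(Resultado_conv))

consolidado$Resultado_conv
```

Here the first row falls back to Resultado (5), the second keeps its converted value (2), the zero survives, and the empty row is dropped.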

5. Unit harmonization

Three units are converted to a common scale before pivoting:

Original unit | Multiplied by | Resulting unit
--------------|---------------|---------------
g/l           | ×1000         | mg/l
mS/cm         | ×1000         | µS/cm
g/kg          | ×1000         | mg/kg

datos <- datos %>%
  mutate(Resultado_conv = case_when(
    `Unidad inicial` == "g/l"   ~ Resultado_conv * 1000,
    `Unidad inicial` == "mS/cm" ~ Resultado_conv * 1000,
    `Unidad inicial` == "g/kg"  ~ Resultado_conv * 1000,
    .default = Resultado_conv
  ))

6. Pivot to wide format

Each unique value of Análisis becomes a column; the cell values are the consolidated Resultado_conv:

datos <- pivot_wider(datos, names_from = Análisis, values_from = Resultado_conv)
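A toy example may help to picture the reshaping. The fraction codes and analyte names below are invented; only the column names Fracción, Análisis and Resultado_conv come from the pipeline.

```r
library(tidyr)

# Long format: one row per (fraction, analyte) pair.
largo <- tibble::tibble(
  Fracción       = c("A-1", "A-1", "B-1"),
  Análisis       = c("pH", "Sodio", "pH"),
  Resultado_conv = c(7.1, 12, 6.8)
)

# Wide format: one row per fraction, one column per analyte.
ancho <- pivot_wider(largo, names_from = Análisis, values_from = Resultado_conv)
```

The result has one row per fraction and the columns Fracción, pH and Sodio; fraction B-1 has no Sodio measurement, so that cell is NA.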

7. Fraction aggregation

The last digit of the Fracción code is standardised to "1" to group sub-fractions. Within each group, numeric columns are summed and Rótulo/Entidad are taken from the first row:

datos <- datos %>%
  mutate(Fracción = str_replace_all(Fracción, "\\d$", "1")) %>%
  replace(is.na(.), 0) %>%
  group_by(Fracción) %>%
  summarise(
    Rótulo   = first(Rótulo),
    Entidad  = first(Entidad),
    across(where(is.numeric), sum)
  ) %>%
  ungroup()
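The effect of the aggregation can be seen on a small toy table. The fraction code format ("123-1", "123-2") is an assumption made for illustration, and the NA replacement is restricted to numeric columns here, which is a simplification of the excerpt above.

```r
library(dplyr)
library(stringr)

# Two sub-fractions of sample 123 carry different analytes.
toy <- tibble::tibble(
  Fracción = c("123-1", "123-2", "456-1"),
  Rótulo   = c("a", "a", "b"),
  Entidad  = c("E1", "E1", "E2"),
  pH       = c(7, NA, 6.5),
  Sodio    = c(NA, 12, NA)
)

agregado <- toy %>%
  mutate(Fracción = str_replace(Fracción, "\\d$", "1")) %>%          # "123-2" -> "123-1"
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0))) %>% # NA -> 0 so sum() works
  group_by(Fracción) %>%
  summarise(
    Rótulo  = first(Rótulo),
    Entidad = first(Entidad),
    across(where(is.numeric), sum)
  ) %>%
  ungroup()
```

Both sub-fractions of sample 123 collapse into a single row with pH 7 and Sodio 12, while sample 456 keeps pH 6.5 and gets a Sodio of 0, which step 8 turns back into NA.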

8. Entity ordering and final NA restoration

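Entities are sorted alphabetically and stored as a factor whose levels follow that order (factor() is called without ordered = TRUE, so the factor itself is unordered). Zeros introduced during aggregation are converted back to NA: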

r <- sort(unique(datos$Entidad))
datos <- datos %>%
  mutate(Entidad = factor(Entidad, levels = r)) %>%
  replace(. == 0, NA)

carga_datos(): saving the analytical table

After the common pipeline, the wide-format tibble is saved directly as an .rds file ready for the application:

saveRDS(datos, guardado)

correlaciones(): computing and saving the Spearman matrix

After the same pipeline, correlaciones() additionally:

  1. Drops the Fracción, Rótulo, and Entidad identifier columns to keep only the analyte columns.
  2. Removes any analyte with ≤1 non-NA observation (required for a valid correlation estimate).
  3. Computes the pairwise Spearman correlation matrix with cor(..., use = "pairwise.complete.obs").
  4. Rounds values to 2 decimal places, adds an Analisis name column, and saves as .rds.

# Drop the three identifier columns (Fracción, Rótulo, Entidad)
data <- datos %>% select(-(1:3))

# Keep only analytes with more than one observation
x     <- sapply(1:ncol(data), function(i) data %>% filter(!is.na(data[, i])) %>% nrow())
data1 <- data[, which(x > 1)]

matriz <- cor(data1, method = "spearman", use = "pairwise.complete.obs")
correl  <- as_tibble(matriz) %>% round(2)
correl  <- correl %>% mutate(Analisis = names(correl))

write_rds(correl, archivo_output)
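The steps above can be tried on a toy table. In this sketch the observation count uses colSums(!is.na(...)), a base R shortcut that is assumed to give the same counts as the sapply() call in the excerpt; the analyte names a, b and c are invented.

```r
library(dplyr)

# Column c has a single observation and must be dropped.
toy <- tibble::tibble(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 6, 9, 10),
  c = c(NA, NA, NA, 5, NA)
)

n_obs <- colSums(!is.na(toy))   # non-NA observations per analyte
toy1  <- toy[, n_obs > 1]       # keep analytes with more than one

matriz <- cor(toy1, method = "spearman", use = "pairwise.complete.obs")
correl <- tibble::as_tibble(matriz) %>% round(2)
correl <- correl %>% mutate(Analisis = names(toy1))
```

Because a and b increase together on the four rows where both are present, their Spearman coefficient is 1, and c does not appear in the result.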

Note: correlaciones() also writes an analisis_*.rds file containing the vector of analyte names. In carga.R this filename is hard-coded ("analisis_trucha_musc.rds"); in practice it would need to be parameterised the same way archivo_output is.


User Interface (UI)

The UI is built with bslib::page_navbar(), which produces a top navigation bar with tabs and a collapsible sidebar.

General layout

page_navbar
├── nav_panel("Dos variables")           ← Tab: scatter plot + correlation table
├── nav_panel("Una variable")            ← Tab: histogram
├── nav_item: selectizeInput(matrices)   ← Matrix selector (embedded in navbar)
├── nav_item: uiOutput(selec_analisis_x) ← X-axis selector (dynamic)
├── nav_item: conditionalPanel           ← Y-axis selector (only in "Dos variables")
└── sidebar
    ├── ("Una variable" panel) textInput(rot1) + actionBttn(buscar1)
    ├── ("Dos variables" panel) textInput(rot) + actionBttn(buscar)
    ├── checkboxInput(log)     ← Apply log10() to BOTH axes in the data values
    ├── checkboxInput(logy)    ← Log10 axis scale on Y (axis transformation)
    ├── checkboxInput(logx)    ← Log10 axis scale on X (axis transformation)
    ├── numericInput(vert, vert2)   ← Vertical reference lines at x
    ├── ("Dos variables") numericInput(hor, hor2)  ← Horizontal reference lines at y
    ├── checkboxInput(ent)     ← Color points/bars by entity
    ├── uiOutput(selec_entidades)  ← Multi-select entity picker (dynamic)
    └── ("Dos variables") numericInput(pendiente, ordenada, ...) ← Manual regression lines

Conditional panels

conditionalPanel() is used to show or hide controls based on the active tab (input.nav):

  • “Dos variables” controls (Y-axis selector, horizontal lines, slope/intercept inputs) appear only in that tab.
  • “Una variable” controls (label filter for the histogram) appear only in that tab.

Dynamic widgets

Three UI outputs are rendered dynamically by the server because they depend on the loaded data:

Output           | Widget produced            | Depends on
-----------------|----------------------------|--------------------------------------------------
selec_analisis_x | selectizeInput (X axis)    | analisis(): analyte names for the matrix
selec_analisis_y | selectizeInput (Y axis)    | analisis(): same as above
selec_entidades  | pickerInput (multi-select) | datos_full()$Entidad: unique entities in the data
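The uiOutput()/renderUI() pairing behind these dynamic widgets can be sketched in a few lines. This is a reduced illustration, not the app's code: it uses fluidPage() instead of page_navbar(), only two matrices, and invented analyte vectors (analisis_tinto, analisis_blanco).

```r
library(shiny)

# Hypothetical analyte vectors; in the app these come from analisis_<key>.rds files.
analisis_tinto  <- c("pH", "Alcohol")
analisis_blanco <- c("pH", "Acidez total")

ui <- fluidPage(
  selectInput("matrices", "Matriz", choices = c("VINO TINTO", "VINO BLANCO")),
  uiOutput("selec_analisis_x")   # placeholder filled in by the server
)

server <- function(input, output, session) {
  # Recomputed only when the matrix changes
  analisis <- eventReactive(input$matrices, {
    switch(input$matrices,
      "VINO TINTO"  = analisis_tinto,
      "VINO BLANCO" = analisis_blanco
    )
  })

  # The X-axis selector is rebuilt from whatever analisis() returns
  output$selec_analisis_x <- renderUI({
    selectizeInput("x", "Eje X", choices = analisis())
  })
}

# shinyApp(ui, server)  # uncomment to run interactively
```

The UI only reserves a slot with uiOutput(); the choices are filled in on the server once the data for the selected matrix are known.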

Server logic

Reactive flow diagram

input$matrices
    │
    ├──► datos_full()   ─────────────────────────────────────────────────┐
    │       (analytical data for the selected matrix)                     │
    │                                                                      │
    ├──► analisis()     ──► selec_analisis_x / selec_analisis_y (UI)     │
    │       (analyte names)                                                │
    │                                                                      │
    └──► correl()       ──► tablacorr (correlation table)                 │
            (Spearman correlation matrix)                                   │
                                                                           │
input$buscar  ──► rotulos()  ────────────────────► subsetted() ──► scatterplot
                   (label filter)                       │
                                                        │ also filters by entity
input$enti  ────────────────────────────────────────────┘

input$buscar1 ──► rotulos1() ───────────────────► subsetted1() ──► scatterplot2
                   (label filter for histogram)

Data loading: datos_full(), analisis(), correl()

All three are eventReactive(input$matrices, {...}): they recompute only when the user changes the matrix. Each uses switch() on the matrix name: datos_full() reads the corresponding .rds file, while analisis() and correl() return objects that are already in memory.

datos_full <- eventReactive(input$matrices, {
  req(input$matrices)
  switch(input$matrices,
    "AGUA PROCESO"    = read_rds("agupr.rds"),
    "AGUA SUPERFICIAL" = read_rds("super_agua.rds"),
    # ... (30 matrices)
  )
})

Note: The analisis_* and correl_* objects are loaded once when the app starts (outside the server function), so every session shares them. The analisis() and correl() reactives simply select the already-loaded object via switch(), without reading files on every matrix change.
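The name-to-file mapping that switch() expresses can also be held in a named vector. The sketch below is a hypothetical alternative, not code from app.R: only the two file names shown above are taken from the source, and the helper leer_matriz() and its dir argument are invented for illustration.

```r
# Hypothetical lookup (only the two file names shown above are from app.R).
archivos <- c(
  "AGUA PROCESO"     = "agupr.rds",
  "AGUA SUPERFICIAL" = "super_agua.rds"
)

# Read the data for one matrix; unknown names fail loudly instead of
# returning NULL, which is what an unmatched switch() would do.
leer_matriz <- function(matriz, dir = ".") {
  if (!matriz %in% names(archivos)) {
    stop("Unknown matrix: ", matriz)
  }
  readRDS(file.path(dir, archivos[[matriz]]))
}
```

Inside the app, a call such as leer_matriz(input$matrices) would then take the place of the switch() body in datos_full().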

Label filtering: rotulos() and rotulos1()

Both are eventReactive triggered by the “Visualizar” button (input$buscar or input$buscar1). This is an intentional design choice: the plot does not update as the user types, only when the button is clicked.

rotulos <- eventReactive(input$buscar, {
  if (input$rot != "") {
    datos_full() %>%
      filter(str_detect(Rótulo, regex(input$rot, ignore_case = TRUE))) %>%
      pull(Rótulo)
  } else {
    datos_full() %>% pull(Rótulo)  # No filter: returns all labels
  }
})

The search uses str_detect() with regex(..., ignore_case = TRUE), so it accepts regular expressions and is case-insensitive.
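The matching behaviour can be checked outside Shiny with a stand-alone version of the filter. The function name filtrar_rotulos() and the toy labels are hypothetical; the filtering expression is the one used in rotulos().

```r
library(dplyr)
library(stringr)

# Hypothetical stand-alone version of the filter inside rotulos().
filtrar_rotulos <- function(datos, patron) {
  if (patron == "") return(pull(datos, Rótulo))   # no filter: all labels
  datos %>%
    filter(str_detect(Rótulo, regex(patron, ignore_case = TRUE))) %>%
    pull(Rótulo)
}

toy <- tibble::tibble(Rótulo = c("Vino Malbec 2021", "VINO Syrah", "Agua de pozo"))

vinos <- filtrar_rotulos(toy, "vino")          # both wine labels, regardless of case
mixto <- filtrar_rotulos(toy, "^agua|2021$")   # starts with "agua" or ends with "2021"
```

Since the pattern is interpreted as a regular expression, characters such as ( or [ in a label search need to be escaped by the user.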

Data subsetting: subsetted() and subsetted1()

These combine the entity filter (multi-select picker) and the label filter:

subsetted <- reactive({
  req(input$enti)
  datos_full() %>%
    filter(Entidad %in% input$enti & Rótulo %in% rotulos())
})

subsetted() feeds the scatter plot; subsetted1() feeds the histogram.


Visualizations

Scatter plot (output$scatterplot)

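Built with ggplot2 and made interactive with ggplotly(). Because subsetted() depends on rotulos(), the plot first appears once input$buscar has been pressed, and the label filter changes only on later clicks. Inputs that the plot code reads directly, as in the excerpts shown here (axes, reference lines, entity selection), are ordinary reactive dependencies.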

Layers always present

ggplot(subsetted(), aes(.data[[input$x]], .data[[input$y]],
                         text = paste0("Fracción: ", Fracción, "</br>",
                                       "Rótulo: ", Rótulo))) +
  geom_point() +
  geom_vline(xintercept = input$vert,  color = "red") +
  geom_vline(xintercept = input$vert2, color = "red") +
  geom_hline(yintercept = input$hor,   color = "red") +
  geom_hline(yintercept = input$hor2,  color = "red") +
  geom_abline(slope = input$pendiente,  intercept = input$ordenada,  color = "green") +
  geom_abline(slope = input$pendiente2, intercept = input$ordenada2, color = "green")

Variants based on active controls

The server covers all eight combinations of input$ent, input$logx and input$logy via explicit if/else if branches, plus a separate case for input$log:

input$ent  | input$logx | input$logy | input$log | Effect
-----------|------------|------------|-----------|--------------------------------------------------------
FALSE      | FALSE      | FALSE      | FALSE     | Standard plot, no entity coloring
FALSE      | TRUE       | FALSE      | FALSE     | X axis on log10 scale
FALSE      | TRUE       | TRUE       | FALSE     | Both axes on log10 scale
FALSE      | FALSE      | TRUE       | FALSE     | Y axis on log10 scale
TRUE       | FALSE      | FALSE      | FALSE     | Points colored by Entidad
TRUE       | TRUE       | FALSE      | FALSE     | Entity color + X axis log10
TRUE       | FALSE      | TRUE       | FALSE     | Entity color + Y axis log10
TRUE       | TRUE       | TRUE       | FALSE     | Entity color + both axes log10
TRUE/FALSE |            |            | TRUE      | log10() applied to the data values (not the axis scale)

Key distinction: input$logx/input$logy use scale_x/y_continuous(trans = "log10"), which transforms the axis scale while keeping the original values in the tooltip. In contrast, input$log applies log10() directly inside aes(), transforming the data before plotting — so axis labels show the log-transformed values.
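The two approaches can be put side by side on toy data. This sketch uses scale_x_log10(), the ggplot2 shorthand for a continuous scale with a log10 transformation, in place of the scale_x_continuous() call mentioned above; the data frame is invented.

```r
library(ggplot2)

toy <- data.frame(x = c(1, 10, 100), y = c(2, 20, 200))

# Axis transformation (as with input$logx): the mapped variable is still x,
# and only the axis is drawn on a log10 scale with labels 1, 10, 100.
p_escala <- ggplot(toy, aes(x, y)) + geom_point() + scale_x_log10()

# Data transformation (as with input$log): log10() is applied inside aes(),
# so the axes are linear in the transformed values (0, 1, 2 on X).
p_datos <- ggplot(toy, aes(log10(x), log10(y))) + geom_point()
```

The points land in the same relative positions on X in both plots; what differs is the axis labelling and the variable that is actually mapped.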

Histogram (output$scatterplot2)

Uses only the X axis (input$x) to show the distribution of a single analyte. Also made interactive via ggplotly().

ggplot(subsetted1(), aes(.data[[input$x]],
                          text = paste0("Fracción: ", Fracción, "</br>",
                                        "Rótulo: ", Rótulo))) +
  geom_histogram() +
  geom_vline(xintercept = input$vert,  color = "red") +
  geom_vline(xintercept = input$vert2, color = "red") +
  ylab("N° de muestras")

Variants: combination of input$ent (entity coloring via fill = Entidad) and input$logx (log10 scale on X axis), yielding four branches.

Correlation table (output$tablacorr)

Displays the analytes most correlated with the selected X-axis analyte, filtered to |r| > 0.5 and sorted by the signed coefficient in descending order, so strong negative correlations appear at the bottom:

output$tablacorr <- renderDT({
  req(input$x, correl())

  correl() %>%
    filter(.data[[input$x]] > 0.5 | .data[[input$x]] < -0.5) %>%
    select(Analisis, all_of(input$x)) %>%
    arrange(desc(.data[[input$x]])) %>%
    datatable()
})
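As a variation on the excerpt above, the table could be ordered by the strength of the correlation instead of its signed value. The sketch below is not what the app does; it runs on an invented stand-in for correl(), with x playing the role of input$x.

```r
library(dplyr)

# Toy stand-in for correl(): one column per analyte plus the Analisis column.
correl_toy <- tibble::tibble(
  pH       = c(1.00, 0.62, -0.81, 0.10),
  Analisis = c("pH", "Sodio", "Alcohol", "Cloruro")
)
x <- "pH"   # plays the role of input$x

tabla <- correl_toy %>%
  filter(abs(.data[[x]]) > 0.5) %>%        # same rows as the two-sided filter
  select(Analisis, all_of(x)) %>%
  arrange(desc(abs(.data[[x]])))           # strongest first, whatever the sign
```

With this ordering Alcohol (-0.81) is listed before Sodio (0.62), whereas the signed ordering used by the app places it last.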

The table header is dynamic (output$encabezado) and displays the name of the currently selected X analyte.


Complete workflow summary

User selects a matrix
        │
        ▼
datos_full() + analisis() + correl()   ← loaded / selected from memory
        │
        ▼
X/Y axis selectors and entity picker update (renderUI)
        │
User chooses analytes, types a label filter, selects entities
        │
        ▼
Clicks "Visualizar"
        │
        ▼
rotulos() / rotulos1()   ← label filter applied
        │
        ▼
subsetted() / subsetted1()   ← data filtered by entity and label
        │
        ├──► scatterplot   ("Dos variables" tab; tablacorr sits beside it but depends only on correl() and input$x)
        └──► scatterplot2  ("Una variable" tab / histogram)

Design observations

Strengths

  • Clear separation of concerns: each reactive has a single role (loading, filtering, subsetting).
  • “Visualizar” button pattern: prevents the plot from recomputing on every keystroke, which is efficient with large datasets.
  • Upfront metadata loading: analisis_* vectors and correl_* matrices are loaded once at startup to minimize latency when switching matrices.
  • Consistent styling: bslib layout with a corporate color scheme applied uniformly via inline CSS.

Opportunities for refactoring (for future reference)

  • The ~10 if/else if branches in the scatter plot renderer could be simplified by building the base plot once and appending layers conditionally.
  • The three switch() blocks (data, analisis, correl) all share the same matrix list and could be consolidated into a helper function or a named lookup list.
  • The feather package is loaded but never used and can be removed.
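The first refactoring idea, building the base plot once and appending layers conditionally, could look roughly like this. The function construir_grafico() and its arguments are hypothetical, the toy data are invented, and the sketch leaves out the input$log case, the reference lines and the tooltip text.

```r
library(ggplot2)

# Hypothetical helper: one base plot, optional layers appended conditionally.
construir_grafico <- function(datos, x, y, ent = FALSE, logx = FALSE, logy = FALSE) {
  p <- ggplot(datos, aes(.data[[x]], .data[[y]]))
  # Color by entity only when requested
  p <- if (ent) p + geom_point(aes(color = Entidad)) else p + geom_point()
  # Each axis scale is added independently of the other options
  if (logx) p <- p + scale_x_log10()
  if (logy) p <- p + scale_y_log10()
  p
}

toy <- data.frame(
  pH      = c(3.2, 3.5, 3.8),
  Alcohol = c(12.1, 13.4, 14.0),
  Entidad = c("E1", "E2", "E1")
)

p <- construir_grafico(toy, "pH", "Alcohol", ent = TRUE, logx = TRUE)
```

Each option is handled by a single if, so adding another toggle does not double the number of branches.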