# Load the cleaned pilot data and the item-to-factor mapping via the helper
# script. `load_data()` is defined in scripts/read_qualtrics_data.R.
source("scripts/read_qualtrics_data.R")
d <- load_data("data/TPSai+-+Pilot+Tester_27+April+2026_13.02.csv", "data/factors_v1.1.csv")
items_data <- d$items_data # response data; TPS-AI item columns selected via cols_tpsai below
cols_tpsai <- d$item_ids   # character vector of TPS-AI item column names

Item Reduction
Items
Questions
# Display the item wording keyed by item ID (IID).
d$factors_map %>%
  select(IID, Item) %>%
  kable()

| IID | Item |
|---|---|
| COMP_01 | I trust the AI to provide me information that helps me manage my psychological challenges |
| COMP_02 | I trust the AI to help me clarify my thoughts about a difficult situation |
| COMP_03 | I trust that what the AI tells me about psychological concepts and processes is accurate and reliable |
| COMP_04 | I can rely on the AI to make sense of the perspective of someone I am having difficulties to understand |
| COMP_05 | Using the AI enables me to see how my experiences fit into what is common for others |
| PERS_01 | I trust the AI to understand my specific needs when I ask for support |
| PERS_02 | I trust the AI to give me advice that is well-tailored to my specific situation |
| PERS_03 | I trust that the more information the AI has about me, the better it can support me |
| PERS_04 | I trust the AI to help me gain new insights into my issues |
| PERS_05 | I trust the AI to suggest helpful strategies based on my situation |
| EMOT_01 | I trust the AI to reflect the feelings I share in a way that feels accurate |
| EMOT_02 | I trust the AI to make me feel heard and validated |
| EMOT_03 | The AI helps me regulate my emotions when I feel distressed |
| EMOT_04 | I trust the AI to help me feel more confident in my ability to handle a difficult situation |
| EMOT_05 | I trust the AI to adjust its responses appropriately as my emotions change during our conversation |
| SAFE_01 | I feel comfortable sharing my private and sensitive information with the AI |
| SAFE_02 | I feel safe opening up to the AI about anything that is on my mind |
| SAFE_03 | I trust that the AI responses are free from harmful social stereotypes |
| SAFE_04 | I trust the AI to provide me with a space where I can express myself freely |
| SAFE_05 | I trust the AI to respond to me in ways that respect who I am |
| RISK_01 | I am concerned that the AI may not be able to respond appropriately to a mental health crisis (e.g., feeling hopeless, stuck, or wanting to self-harm) |
| RISK_02 | Using the AI as my primary source of psychological support would be risky |
| RISK_03 | I am concerned that the AI does not sufficiently highlight its limitations as a mental health tool |
| RISK_04 | I doubt that the AI can provide sufficient support for complex mental health conditions |
| RISK_05 | I believe the AI may provide inaccurate or misleading mental health information |
| CORP_01 | I am concerned that the AI company may use my data in ways I do not approve of |
| CORP_03 | I am skeptical that the AI company ensures the AI is safe to use for mental health support |
| CORP_04 | I am concerned that the values of the AI company do not sufficiently align with my own |
| CORP_05 | I am concerned that the AI company treats my wellbeing as secondary to their commercial interests |
| CRIT_01 | I am concerned that the way the AI communicates may lead me to trust it more than I should |
| CRIT_02 | It is up to me to guide the AI to give responses that meet my needs |
| CRIT_03 | I trust AI responses less as conversations get longer and the AI starts to ‘drift’ (e.g., speaks repetitively or less consistently) |
| CRIT_04 | When in doubt, I would trust my own experience over what the AI tells me |
| CRIT_05 | I feel the need to verify the AI’s sources when the information is important |
| HUMN_01 | The AI cannot replace human connections for emotional support |
| HUMN_02 | Even though the AI sounds human, our interactions lack the shared experience of human connections |
| HUMN_03 | The emotional support the AI can offer is limited by its lack of embodied feelings and shared presence |
| HUMN_04 | While the AI uses empathic language, it cannot care about my wellbeing as a good friend would |
| HUMN_05 | Even though I can talk with the AI about my feelings, it does not experience emotions the way I do |
Statistics
Total number of responses is 70
# Reshape the TPS-AI items to long format, then compute per-item
# descriptive statistics.
tpsai_long <- items_data %>%
  select(all_of(cols_tpsai)) %>%
  pivot_longer(everything(), names_to = "item", values_to = "response")

item_stats <- tpsai_long %>%
  group_by(item) %>%
  summarise(
    n = sum(!is.na(response)),
    mean = mean(response, na.rm = TRUE),
    sd = sd(response, na.rm = TRUE),
    median = median(response, na.rm = TRUE),
    # suppressWarnings guards the all-NA case, where min/max warn and
    # return Inf/-Inf
    min = suppressWarnings(min(response, na.rm = TRUE)),
    max = suppressWarnings(max(response, na.rm = TRUE)),
    skew = psych::skew(response, na.rm = TRUE),
    kurtosis = psych::kurtosi(response, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(item)
# Render the descriptive statistics table.
# Fixed: removed the trailing comma after col.names (it passes an empty
# argument to kable) and separated the code from the rendered table header.
item_stats %>%
  knitr::kable(
    digits = 2,
    col.names = c("Item", "N", "Mean", "SD",
                  "Median", "Min", "Max", "Skew", "Kurtosis")
  )

| Item | N | Mean | SD | Median | Min | Max | Skew | Kurtosis |
|---|---|---|---|---|---|---|---|---|
| COMP_01 | 69 | 4.29 | 1.91 | 5.0 | 1 | 7 | -0.39 | -1.14 |
| COMP_02 | 70 | 4.67 | 1.78 | 5.0 | 1 | 7 | -0.66 | -0.62 |
| COMP_03 | 69 | 3.90 | 1.78 | 4.0 | 1 | 7 | -0.18 | -1.22 |
| COMP_04 | 67 | 4.19 | 1.91 | 5.0 | 1 | 7 | -0.34 | -1.15 |
| COMP_05 | 67 | 4.30 | 1.72 | 4.0 | 1 | 7 | -0.27 | -0.80 |
| CORP_01 | 69 | 5.86 | 1.39 | 6.0 | 2 | 7 | -1.15 | 0.61 |
| CORP_03 | 69 | 5.54 | 1.58 | 6.0 | 1 | 7 | -1.00 | 0.29 |
| CORP_04 | 65 | 5.28 | 1.77 | 6.0 | 1 | 7 | -0.80 | -0.36 |
| CORP_05 | 69 | 5.68 | 1.59 | 6.0 | 1 | 7 | -1.04 | 0.12 |
| CRIT_01 | 70 | 5.54 | 1.49 | 6.0 | 2 | 7 | -0.81 | -0.39 |
| CRIT_02 | 70 | 5.44 | 1.45 | 6.0 | 1 | 7 | -1.07 | 0.50 |
| CRIT_03 | 66 | 5.55 | 1.47 | 6.0 | 2 | 7 | -1.03 | 0.09 |
| CRIT_04 | 70 | 6.13 | 1.14 | 7.0 | 2 | 7 | -1.34 | 1.43 |
| CRIT_05 | 68 | 6.04 | 1.34 | 6.0 | 1 | 7 | -1.94 | 4.03 |
| EMOT_01 | 67 | 4.33 | 1.61 | 5.0 | 1 | 7 | -0.60 | -0.57 |
| EMOT_02 | 70 | 4.29 | 1.76 | 4.5 | 1 | 7 | -0.43 | -0.67 |
| EMOT_03 | 63 | 4.49 | 1.75 | 5.0 | 1 | 7 | -0.48 | -0.84 |
| EMOT_04 | 67 | 4.31 | 1.85 | 5.0 | 1 | 7 | -0.36 | -1.04 |
| EMOT_05 | 65 | 3.89 | 1.67 | 4.0 | 1 | 7 | -0.15 | -0.85 |
| HUMN_01 | 70 | 6.24 | 1.26 | 7.0 | 2 | 7 | -1.75 | 2.36 |
| HUMN_02 | 69 | 6.12 | 1.33 | 7.0 | 1 | 7 | -1.75 | 2.62 |
| HUMN_03 | 67 | 5.40 | 1.74 | 6.0 | 1 | 7 | -0.93 | -0.25 |
| HUMN_04 | 70 | 6.37 | 1.19 | 7.0 | 2 | 7 | -1.99 | 3.12 |
| HUMN_05 | 69 | 6.42 | 1.14 | 7.0 | 1 | 7 | -2.37 | 6.24 |
| PERS_01 | 69 | 3.94 | 1.85 | 5.0 | 1 | 7 | -0.40 | -1.22 |
| PERS_02 | 69 | 4.17 | 1.95 | 5.0 | 1 | 7 | -0.39 | -1.04 |
| PERS_03 | 69 | 4.58 | 1.71 | 5.0 | 1 | 7 | -0.56 | -0.58 |
| PERS_04 | 69 | 4.41 | 1.67 | 5.0 | 1 | 7 | -0.61 | -0.53 |
| PERS_05 | 68 | 4.49 | 1.76 | 5.0 | 1 | 7 | -0.65 | -0.80 |
| RISK_01 | 70 | 5.41 | 1.81 | 6.0 | 1 | 7 | -0.82 | -0.72 |
| RISK_02 | 70 | 6.03 | 1.35 | 7.0 | 1 | 7 | -1.48 | 1.83 |
| RISK_03 | 69 | 5.30 | 1.83 | 6.0 | 1 | 7 | -0.71 | -0.90 |
| RISK_04 | 69 | 5.86 | 1.45 | 6.0 | 1 | 7 | -1.30 | 1.01 |
| RISK_05 | 70 | 5.49 | 1.51 | 6.0 | 2 | 7 | -0.77 | -0.41 |
| SAFE_01 | 70 | 3.44 | 2.17 | 3.0 | 1 | 7 | 0.15 | -1.49 |
| SAFE_02 | 70 | 3.36 | 2.06 | 3.0 | 1 | 7 | 0.30 | -1.38 |
| SAFE_03 | 67 | 3.66 | 2.14 | 3.0 | 1 | 7 | 0.19 | -1.47 |
| SAFE_04 | 70 | 4.69 | 1.81 | 5.0 | 1 | 7 | -0.47 | -0.80 |
| SAFE_05 | 70 | 4.86 | 1.67 | 5.0 | 1 | 7 | -0.66 | -0.45 |
Missingness
Items where many people respond Don't know likely capture constructs that participants are uncertain about; such items reduce the effective sample size and can distort item statistics such as means and correlations. We decided to drop items exceeding 30% missing responses. Our rationale is that if at least 3 out of 10 individuals were not able to meaningfully engage with an item, dropping it simplifies our scale and allows it to focus on measuring items with generally clearer responses.
Beyond just dropping items with at least 30% missingness, we want to inspect the distribution of missing answers per item. When visually inspecting this distribution we will look for discontinuities, meaning items with a substantially higher proportion of Don't know than other items. We may remove such items even if below our pre-set missingness threshold.
# Pre-registered missingness cut-off (30% of responses).
missing_threshold <- 0.3

# Proportion of missing (Don't know) responses per item, with each item
# tagged by its factor prefix and question number (e.g. "COMP_01").
missing_data <- items_data %>%
  summarise(across(all_of(cols_tpsai), ~ mean(is.na(.x)))) %>%
  pivot_longer(everything(), names_to = "item", values_to = "missing") %>%
  separate(item, c("factor", "q_n"), sep = "_", remove = FALSE)
# adding another threshold from visual inspection
v_missing_threshold <- 0.05

# Histogram of per-item missingness. Solid line = pre-registered 30%
# threshold; dashed red line = post-hoc visual threshold.
ggplot(missing_data, aes(x = missing, fill = factor)) +
  geom_histogram(binwidth = 0.0075) +
  geom_vline(xintercept = missing_threshold) +
  geom_vline(xintercept = v_missing_threshold, linetype = "dashed", color = "red")

Visual threshold selects items with at least 3.5 missing responses out of 70.
# Items exceeding either missingness threshold. Since
# v_missing_threshold < missing_threshold, the second condition subsumes the
# first; both are kept to document the two criteria explicitly.
missing_items <- missing_data %>%
  filter(missing > missing_threshold | missing > v_missing_threshold) %>%
  select(item, missing) %>%
  arrange(desc(missing))

if (nrow(missing_items) > 0) {
  missing_items %>% knitr::kable(digits = 2)
}

| item | missing |
|---|---|
| EMOT_03 | 0.10 |
| EMOT_05 | 0.07 |
| CORP_04 | 0.07 |
| CRIT_03 | 0.06 |
Distributional Shapes
Skewness and Kurtosis
We examine skewness and kurtosis for each item to check whether responses are spread across the scale or piled up at one end. If everyone scores high/low on an item, it means that the item is too easy/hard to endorse and cannot discriminate between people with high versus low levels of that trait being measured. Moreover, items where most people gave the same answer have little variation, which makes it harder for factor analysis to detect the underlying structure, regardless of the estimation method used.
Items with negative skew and high positive kurtosis (left tail) may have a “ceiling effect” where an item is too easy to endorse and almost everyone agrees.
Items with positive skew and high negative kurtosis (right tail) may have a “floor effect” where an item is too extreme to endorse and very few people agree.
Avoiding ceiling/floor effects is crucial because if items cluster at the extremes, you lose the ability to distinguish among people at those ends.
Following Kline (2016), we treat absolute skewness and kurtosis values above 1.5 as a sign that an item’s responses are too bunched together to be useful. Items exceeding this threshold will be considered for removal or rewording alongside the other item reduction criteria. Additionally, we may consider keeping items with high skew or kurtosis for comparison or reporting purposes but flag them for removal from factor analysis.
# Kline (2016) cut-off for absolute skewness/kurtosis.
dist_threshold <- 1.5

# Skewness and kurtosis per item.
# Fixed: the original used across(all_of(everything()), ...), which is invalid
# tidyselect syntax and would also sweep in any non-item columns; restrict to
# the TPS-AI items via cols_tpsai, consistent with the other chunks.
dist_data <- items_data %>%
  summarise(
    across(all_of(cols_tpsai), list(
      skew = ~ psych::skew(.x, na.rm = TRUE),
      kurt = ~ psych::kurtosi(.x, na.rm = TRUE)
    ), .names = "{.col}_{.fn}")
  ) %>%
  pivot_longer(
    everything(),
    names_to = c("item", ".value"),
    names_sep = "_(?=[^_]+$)"  # split at the LAST underscore: item id vs. stat
  ) %>%
  separate(item, c("subscale", "q_n"), sep = "_", remove = FALSE)
# adding another threshold at 1.25 from visual inspection
v_dist_threshold <- 1.25

# Scatter of skewness vs. kurtosis per item. Solid lines mark the Kline (2016)
# cut-offs; dashed red lines mark the stricter visual threshold.
ggplot(dist_data, aes(x = skew, y = kurt, colour = subscale, label = item)) +
  geom_point() +
  geom_hline(yintercept = c(-dist_threshold, dist_threshold)) +
  geom_vline(xintercept = c(-dist_threshold, dist_threshold)) +
  geom_hline(yintercept = c(-v_dist_threshold, v_dist_threshold), linetype = "dashed", color = "red") +
  geom_vline(xintercept = c(-v_dist_threshold, v_dist_threshold), linetype = "dashed", color = "red") +
  labs(x = "Skewness", y = "Kurtosis")

# Items flagged by the stricter visual threshold on either statistic
# (this also captures every item beyond the 1.5 cut-off).
dist_items <- dist_data %>%
  filter(abs(skew) > v_dist_threshold | abs(kurt) > v_dist_threshold) %>%
  select(item, skew, kurt)

if (nrow(dist_items) > 0) {
  dist_items %>% knitr::kable(digits = 2)
}

| item | skew | kurt |
|---|---|---|
| SAFE_01 | 0.15 | -1.49 |
| SAFE_02 | 0.30 | -1.38 |
| SAFE_03 | 0.19 | -1.47 |
| RISK_02 | -1.48 | 1.83 |
| RISK_04 | -1.30 | 1.01 |
| CRIT_04 | -1.34 | 1.43 |
| CRIT_05 | -1.94 | 4.03 |
| HUMN_01 | -1.75 | 2.36 |
| HUMN_02 | -1.75 | 2.62 |
| HUMN_04 | -1.99 | 3.12 |
| HUMN_05 | -2.37 | 6.24 |
Visual Inspection
Beyond flagging items with high skew, we can also visually inspect the distribution of each item, looking for items where people consistently avoid using certain points in the scale, such as items with responses clustered at both ends of the scale (bimodal distribution) or items where some scale points are consistently not used. These items might need to be reworded or removed depending on their distributional shape.
# Long format of item responses for plotting, tagged with the factor prefix.
# Fixed: restrict to the TPS-AI items (consistent with the statistics chunk);
# a bare everything() would also pivot any non-item columns in items_data.
long_data <- items_data %>%
  select(all_of(cols_tpsai)) %>%
  pivot_longer(everything(), names_to = "item", values_to = "response") %>%
  mutate(factor = sub("_.*", "", item))

# Shared y-axis ceiling: the largest response-category proportion of any
# single item, so all facets are drawn on a comparable scale.
y_max <- long_data %>%
  count(item, response) %>%
  group_by(item) %>%
  mutate(prop = n / sum(n)) %>%
  pull(prop) %>%
  max()
# One figure per factor: a row of bar charts (one facet per item) showing the
# proportion of responses at each scale point 1-7, on a shared y-axis.
for (f in unique(long_data$factor)) {
  p <- long_data %>%
    filter(factor == f) %>%
    ggplot(aes(x = factor(response), y = after_stat(prop), group = 1)) +
    geom_bar(fill = "#D85A30") +
    scale_x_discrete(limits = as.character(1:7)) +  # show all 7 points even if unused
    scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    coord_cartesian(ylim = c(0, y_max)) +
    facet_wrap(~ item, nrow = 1) +
    labs(x = "", y = "") +
    theme_minimal(base_size = 16)
  print(p)  # explicit print needed inside a for loop
}

Survey Duration
Displaying survey duration for pilot test responses in order to decide how much time to allow for future data collection.
# Histogram of survey completion time (seconds), trimmed at the 95th
# percentile, with a reference line at 600 s (10 minutes).
# NOTE(review): xlim() removes observations beyond the limit BEFORE binning
# (with a warning), so the tail is dropped from the counts; if the intent is
# only to zoom the view, coord_cartesian(xlim = ...) would keep the data —
# confirm which behavior is wanted.
ggplot(d$raw_data, aes(x = `Duration (in seconds)`)) +
geom_histogram(bins = 50) +
xlim(NA, quantile(d$raw_data$`Duration (in seconds)`, 0.95, na.rm = TRUE)) +
geom_vline(xintercept = 600)