Research Guide
- Start with a clear, meaningful question
- Navigate research ethics approval and student vulnerability
- Evaluate your project
- Report honestly and completely
For additional resources, please visit AAMC’s Scholar Development Pathway for Medical Educators.
Start with a clear, meaningful question
A clear, meaningful question is the anchor for your research and begins with the intention to improve education and help students. Methods follow questions. Deciding you want to “do a survey”, “conduct interviews”, “run an experiment”, or “implement AI” before clarifying your question guarantees problems. A clear question has four components: a specific population, a phenomenon or intervention, a comparison or context, and a measurable outcome. “Does team-based learning help students?” is too vague. “Does team-based learning improve exam performance compared to lectures among first-year students?” immediately suggests an experimental design with exam scores as the primary outcome.
Navigate research ethics approval and student vulnerability
Medical education research involving human participants requires IRB review when a project constitutes research, which is defined as systematic investigation designed to contribute to generalizable knowledge. Quality improvement and program evaluation activities may not require IRB review at all if they meet specific criteria. The university provides a QI/Program Evaluation Self-Certification Tool that allows study teams to independently determine whether their project constitutes research or falls outside IRB purview.
Many education research projects in SMPH qualify for exempt status, which represents an intermediate pathway between full IRB review and activities not requiring review.
While exempt research is not subject to ongoing IRB oversight or full review, researchers must still submit an application through ARROW—only the IRB can officially determine whether a project qualifies for exemption. The IRB provides an HRP-312 Worksheet and an Exemption Category Tool to help researchers assess whether their project may qualify for exemption before submitting their application. Regardless of exemption status, all research personnel must complete CITI Human Subjects Protection Training, and the review process typically takes several weeks, so early planning is essential.
Informed consent for research must clearly communicate study purpose, time commitment, risks and benefits, voluntary participation, and data handling procedures. However, protecting student participants requires thoughtful study design beyond consent forms alone. The faculty-student power differential creates inherent coercion concerns, so genuine protections include using anonymous data collection when possible, having someone other than the faculty member recruit participants, and separating research from graded activities.
Research involving AI or machine learning technologies requires additional documentation per HRP-337, including the technology’s development stage, parameters limiting AI interactions, monitoring plans for participant safety, and data security measures. Researchers must explicitly address how AI systems will access and protect participant data, whether confidentiality can be guaranteed given AI terms of use, and how the AI might create indirect identifiers that could re-identify participants even in de-identified datasets.
Evaluate your project
Whether you are testing a new teaching method, implementing a curriculum change, or studying how students experience a program, planning your evaluation from the start ensures you collect the right evidence to support meaningful conclusions. Two practical tools help structure this work: a logic model to clarify how your program is supposed to work, and an evaluation matrix to plan what evidence you need and how you will collect it.
Build a Logic Model
A logic model is a visual map of your program’s theory of change. It connects what you invest, what you do, and what you expect to happen—making your assumptions explicit and testable. A basic logic model has five components:
- Inputs: resources you invest, such as faculty time, funding, or technology
- Activities: what the program does, such as workshops, simulations, or mentoring sessions
- Outputs: direct products of those activities, such as sessions delivered or students enrolled
- Short-term outcomes: immediate changes in participants, such as knowledge gains or attitude shifts
- Long-term outcomes: downstream effects, such as changes in clinical behavior or patient outcomes
Example: A new simulation-based curriculum for clinical reasoning
| Inputs | Activities | Outputs | Short-term Outcomes | Long-term Outcomes |
|---|---|---|---|---|
| Coaching faculty; validated clinical reasoning rubric; integrated case bank spanning Phase 1 block content | Longitudinal coaching sessions with structured case-based reasoning practice embedded in PaCE cases | 6 coached sessions per block across Phase 1; 175 students participate per cohort | Students demonstrate improved illness script development on block assessments; coaches report more sophisticated reasoning during C&C reviews | Higher diagnostic accuracy on Phase 2 OSCE stations; students identify reasoning gaps earlier in clinical encounters |
The logic model forces you to articulate the causal chain before you collect data. If your short-term outcomes don’t change, you know to look at whether the activities were delivered as planned (an implementation problem) rather than concluding the underlying idea doesn’t work. If outputs are strong but outcomes are weak, the theory of change itself may need revisiting.
Develop an Evaluation Matrix
Once your logic model identifies what outcomes matter, an evaluation matrix plans how you will gather evidence for each one. The matrix connects evaluation questions (derived from your logic model’s outcomes) to specific evidence sources, analysis plans, and timelines. An evaluation matrix has six columns:
- Evaluation Question: what do you need to know?
- Evidence: what data will answer this question?
- Who: who provides or has access to this data?
- When: how often will data be collected?
- How: what analysis methods will you use?
- Action Plan: what will you do with the results?
Example: Evaluating the coached clinical reasoning curriculum
| Evaluation Question | Evidence | Who | When | How | Action Plan |
|---|---|---|---|---|---|
| Do students perceive coached reasoning practice in PaCE cases as relevant to their clinical development? | Post-session surveys; end-of-block focus groups | Learners | After each coached session (survey); end of block (focus groups) | Descriptive statistics; thematic analysis | Coaching faculty debrief to review themes and revise case bank |
| Do students improve in illness script development across Phase 1, and do gains transfer to Phase 2 clinical encounters? | Clinical reasoning rubric scores from coached sessions across blocks; clinical reasoning scores on Phase 2 OSCE stations | Learners; coaching faculty; OSCE evaluators | Each block (rubric scores); end of Phase 2 blocks (OSCE) | Growth trajectories across Phase 1 blocks; compare OSCE clinical reasoning scores to the prior cohort; effect sizes with confidence intervals | Identify blocks with weakest reasoning growth and OSCE stations where transfer is weakest; collaborate with block and coaching leadership to revise case sequencing and debriefing approach |
The action plan column is critical and often overlooked. Evaluation is not just about measuring—it is about creating a feedback loop that improves the program. Each question should have a clear path from evidence to decision.
Kirkpatrick’s four-level model of program evaluation may also help you think about the level of evidence you are targeting: participant reactions, learning, behavior change, or results. Most curriculum evaluations appropriately focus on the first two levels, while longer-term studies may address behavior change in clinical settings.
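The “How” column in the matrix above calls for effect sizes with confidence intervals when comparing cohorts. As a minimal sketch of what that computation might look like, assuming hypothetical OSCE scores and a large-sample normal approximation for the interval:

```python
import numpy as np
from scipy import stats

# Hypothetical OSCE clinical reasoning scores (percent) for two cohorts.
current_cohort = np.array([72, 78, 81, 69, 75, 83, 77, 74, 80, 76], dtype=float)
prior_cohort = np.array([70, 74, 79, 66, 71, 78, 73, 72, 75, 69], dtype=float)

# Welch's t-test: compares means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(current_cohort, prior_cohort, equal_var=False)

# Cohen's d from the pooled standard deviation.
n1, n2 = len(current_cohort), len(prior_cohort)
pooled_sd = np.sqrt(((n1 - 1) * current_cohort.var(ddof=1)
                     + (n2 - 1) * prior_cohort.var(ddof=1)) / (n1 + n2 - 2))
d = (current_cohort.mean() - prior_cohort.mean()) / pooled_sd

# Approximate 95% CI for d (large-sample standard error).
se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
print(f"p = {p_value:.3f}, d = {d:.2f}, "
      f"95% CI [{d - 1.96 * se_d:.2f}, {d + 1.96 * se_d:.2f}]")
```

Reporting the effect size and its interval alongside the p-value keeps the conversation focused on whether a difference is educationally meaningful, not just detectable.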
Choose Descriptive or Causal Questions
This is the most consequential methodological decision and the one most often muddled. Descriptive research characterizes patterns. Causal research establishes that X causes Y. The distinction fundamentally changes your methods, and confusing them leads to conclusions your data cannot support.
| | Descriptive | Causal |
|---|---|---|
| Question | Who experiences burnout and what factors are associated with it? | Does peer mentoring decrease student burnout? |
| Goal | Characterize patterns, prevalence, and associations | Determine whether X causes a change in Y |
| Typical designs | Surveys, interviews, focus groups, observational studies | Randomized experiments, quasi-experiments, observational studies with causal methods |
| Sampling | Representative or purposeful sampling | Randomized assignment to conditions (ideal) or careful adjustment for confounders |
| Analysis | Descriptive statistics, correlations, thematic analysis | Group comparisons, regression with confounder adjustment, effect sizes |
| Appropriate language | “is associated with,” “correlates with,” “co-occurs with” | “improves,” “reduces,” “has an effect on” |
A descriptive burnout study might survey students across years and programs to measure prevalence, identify which factors correlate with higher burnout scores, and characterize when it typically emerges. Qualitative approaches—interviews, focus groups, or observations—might explore how students experience burnout, what they perceive as contributors, and how it manifests in daily life. Either way, the findings would be reported using associative language: burnout was more prevalent among third-year students, certain factors co-occurred with higher scores.
Causal questions require a fundamentally different design—one that accounts for confounding and, ideally, randomization. These studies are resource-intensive and methodologically complex, and they’re rarely the right starting point. If your question is genuinely causal, talk to CISR-UME staff before you design your study. We can help you determine whether a causal approach is feasible and what it would require.
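To make the contrast concrete, here is a minimal sketch in Python using simulated data (the variables `baseline_score`, `tutoring_hours`, and `exam_score` are hypothetical). The correlation supports only associative language; the regression adjusts for one measured confounder, which moves toward a causal estimate but does not by itself license causal language:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
n = 200

# Simulated confounding: stronger students both seek more tutoring and
# score higher, so tutoring and exam scores are correlated even apart
# from any effect of tutoring itself.
baseline_score = rng.normal(70, 8, n)
tutoring_hours = 0.1 * baseline_score + rng.normal(0, 2, n)
exam_score = 0.8 * baseline_score + 0.5 * tutoring_hours + rng.normal(0, 5, n)
df = pd.DataFrame({"baseline_score": baseline_score,
                   "tutoring_hours": tutoring_hours,
                   "exam_score": exam_score})

# Descriptive analysis: supports "is associated with" language only.
r, p = stats.pearsonr(df["tutoring_hours"], df["exam_score"])
print(f"r = {r:.2f}, p = {p:.3f}")

# Adjusting for a measured confounder moves toward a causal estimate, but
# unmeasured confounders can still bias it; randomization addresses the
# confounders you did not think to measure.
model = smf.ols("exam_score ~ tutoring_hours + baseline_score", data=df).fit()
print(model.params["tutoring_hours"])
print(model.conf_int().loc["tutoring_hours"])
```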
Why This Distinction Matters in Practice
Most medical education research at CISR-UME will be descriptive, and that’s appropriate and valuable. But understanding the causal/descriptive distinction isn’t just a methods question—it directly governs the language you’re allowed to use when reporting findings.
A survey showing that students who attended peer tutoring had higher exam scores cannot conclude that peer tutoring improved performance. It can conclude that attendance was associated with higher performance. Causal language (“improves,” “reduces,” “leads to”) makes a claim your design didn’t test and your data can’t support.
Getting this right protects you in peer review, in presentation Q&A, and in the practical recommendations you make to curriculum leaders.
Report honestly and completely
Share Your Work
All CISR-UME projects should be scholarly, but we recognize that dissemination takes many forms and different levels of effort are appropriate for different projects. What matters is that your work reaches the people who can learn from it.
- White papers — Document what you did, what you learned, and what you recommend. These preserve institutional knowledge that would otherwise be lost when faculty rotate or programs evolve. Not every project needs a journal article, but every project should be written up. These projects can be shared on our website or in a repository such as OSF.
- Local presentations — Events like Med Ed Day offer low-barrier opportunities to share findings with colleagues, get feedback, and refine your thinking before pursuing external venues.
- Regional and national conferences — Organizations such as CGEA, RIME, AMEE, and AAMC host conferences where poster and oral presentations reach broader audiences and invite peer feedback.
- Regional journals — Outlets like the Wisconsin Medical Journal serve an important role in disseminating findings relevant to local and regional contexts.
- National and international journals — Journals such as Academic Medicine, Medical Education, Teaching and Learning in Medicine, Medical Teacher, and BMC Medical Education reach the widest audiences and undergo rigorous peer review.
Match your dissemination level to the scope and generalizability of your findings. A well-written white paper documenting a curriculum innovation at SMPH is more valuable than an underpowered national publication, and it may be the foundation for a stronger study later. CISR-UME staff can help you identify the right venue for your work.
Report everything honestly and completely: your hypotheses, methods, statistical analyses (confirmatory and exploratory), and use of AI tools. Report any deviations from your study plan, why they occurred, and their consequences.
Statistical software will produce p-values and effect sizes regardless of whether your analysis makes sense. Match your analysis to your question, report all pre-specified outcomes (not just significant ones), acknowledge limitations explicitly, and use appropriate causal language: “associated with” for associations, “effect on” or “improves” only when you’ve properly addressed confounding.
Be Honest About Significance
Statistical significance (the p-value) tells you the probability of observing an effect at least as extreme as yours if the null hypothesis were true. Practical significance tells you whether an effect actually matters. These are different questions. For example, a statistically significant OSCE improvement might not move students across pass-fail thresholds. Similarly, a correlation of r = 0.15, 95% CI [0.01, 0.29] may be statistically significant but too weak to guide decisions.
Always report effect sizes with confidence intervals. Then ask: What’s the minimal educationally important difference? Would this change justify the time, effort, or burden? Would it translate to measurable differences in patient care or learner outcomes? Large samples can detect tiny, meaningless differences; small samples miss big ones. Address whether effects justify action, and be honest when you are not certain.
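A minimal sketch of this distinction, using simulated data and an assumed minimal educationally important difference of 2 points (a placeholder threshold, not a published standard):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two large simulated groups whose true means differ by only half a point.
control = rng.normal(75.0, 10.0, 20_000)
intervention = rng.normal(75.5, 10.0, 20_000)

t_stat, p_value = stats.ttest_ind(intervention, control)
diff = intervention.mean() - control.mean()
meid = 2.0  # assumed minimal educationally important difference, in points

print(f"p = {p_value:.4f}")  # tiny p-value at this sample size
print(f"mean difference = {diff:.2f} points (MEID = {meid})")
if abs(diff) < meid:
    print("Statistically detectable, but below the educationally important threshold.")
```

At this sample size the half-point difference is overwhelmingly “significant,” yet it falls well short of the assumed threshold for action.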
Be Honest About Generalizability
Medical education research typically uses convenience samples, not random samples from defined populations. A study of M1s in their first block may not generalize to other blocks, years, schools, or cohorts. Explicitly describe your sample’s characteristics and educational context. Report who participated, who didn’t, and the specific setting. If response rate was 60%, acknowledge findings represent students who chose to participate. In your discussion, name where findings might reasonably apply and where they likely don’t.
Report AI Use
Transparency about AI assistance safeguards research integrity. Describe which AI tools you used and for what purposes. Authors remain fully responsible for all AI-generated content and cannot attribute errors to AI. Check disclosure requirements for target journals. Learn more about UW’s generative AI services and policies.