In 2022, Health Education England (HEE) released a practice paper for the Multi-Specialty Recruitment Assessment (MSRA)1. It included twenty-three Professional Dilemmas, the MSRA rebrand of the Situational Judgement Test (SJT) question. While the name is different, the premise is identical: you are asked to resolve a scenario with competing demands that you may come across as a Foundation Year 2 (F2) doctor. For years HEE had held back on providing practice material for the MSRA; the general line was that the SJT was not an exam you could revise for. Eventually they relented. What followed was an immediate backlash.
Question eight of the practice paper asked you to imagine being an F2 doctor on annual leave. You’re driving to the airport when you get called by the ward. Not to wish you a nice holiday, but to let you know that your colleague has not shown up for their shift. To complicate matters further, you don’t have this colleague's mobile number. Why should this complicate things, you ask? Because the expectation within the answer was that this is your problem to sort out. So much so that cancelling your holiday to return to the ward and do the shift yourself was considered more correct than simply telling the ward that this isn’t your problem to solve.
For a profession where we are often infantilised and asked to make personal sacrifices, this caused much consternation. On Twitter, many people asked if the question was real (it was). The GMC education and standards director gave the question a disapproving ‘oh dear’, and the Yorkshire branch of the BMA exclaimed that the question was ‘ludicrous’2,3. In 2023 the paper was re-released with one change: question eight had been removed4. For many people the question brought to attention something that is widely thought - that the Professional Dilemma paper is unfair. Worse yet, for an exam so important to our careers, its answers and logic are so opaque that preparing for it is near impossible.
The SJT began life in 2006 because the GP national recruitment office had a problem5,6,7. Applications were growing, and marking the structured Application Form Questions (AFQs) that formed part of the shortlisting process was becoming cumbersome. Markers were noticing similar answers to questions such as “Tell us about a time your communication skills made a difference to a patient”, behind which was a culture of sharing answers, getting help and even buying ‘model’ answers online. A tender for a replacement was put out and won by a private company: the Work Psychology Group (WPG).
The WPG developed a detailed job analysis of the role of an F2 doctor, the level that candidates were expected to be operating at. They then recruited twenty-four GPs who, alongside educational psychologists, developed 186 SJT questions. These were divided into four pilot papers which GP applicants would sit that year. The pilots showed that SJT scores correlated well with performance at the final OSCE-style selection stage; the SJT was also cheaper to operate than the AFQs. With that it was approved. In 2008 the exam went live and became an official part of the GP recruitment process. It is now used in recruitment for 12 different specialty programmes8. This propagation continues even in the face of ongoing concerns around the exam.
In educational psychology there are two ideas that map nicely onto criticisms of the MSRA SJT. The first is construct validity. A construct is ‘what’ an exam is aiming to measure; its validity is how well it measures it. For a driving test the construct would be your driving ability; for GCSE biology it’s your knowledge of the exam syllabus. For the MSRA SJT it’s your soft skills in the domains of Empathy & Sensitivity, Working under Pressure and Professional Integrity. No one would argue these are not important skills for a doctor to possess, but how well the SJT assesses them, its validity, is a point of contention. The WPG would point to various papers showing that outcomes at the SJT correlate with performance in the Foundation Programme and specialty training exams as signs that the exam has validity and is measuring something real and important9,10,11. Others would point to question eight, their own experience of sitting the exam, or the fact that many of those papers were authored by people who work for the WPG as signs the exam needs a major overhaul, or even replacement.
As this is an exam that does not appear to be going anywhere, our main concern has been a second widely held criticism: that the MSRA SJT is nothing more than a Random Number Generator (RNG). That the reason we often feel the answers are nonsensical is, well, that they are. This idea even contributed to the Foundation Programme eventually scrapping their own SJT and replacing it with a literal RNG12. In educational psychology this idea is known as internal consistency. A random exam would have poor internal consistency, i.e. you could have the same person sit 5 variations of the exam and get 5 different grades. A consistent exam would give that same person the same score each and every time. This idea is important because it speaks to the revisability of an exam. An exam that is consistent can be revised for, as long as it’s not measuring something static (like your IQ). We believe that judgement is something that can be modelled and learnt through practice. While you may not change your own intrinsic judgement, you can develop a model of, and a feel for, the judgement expected by the exam writers. But only if the exam is consistent.
After every round of the MSRA, the WPG sits down to analyse the performance of the exam. They look at various technical indicators which are worked out by a statistician; amongst them is the exam’s Cronbach's Alpha (α) - its internal consistency. From this they produce a report. For the Foundation Programme these reports were published online for anyone to see; for the MSRA they are marked as confidential and distributed only to the various recruitment offices and NHS England13. A couple of years ago, in an effort to find out the consistency of the MSRA SJT, we made various freedom of information requests to obtain these technical reports. After some to-ing and fro-ing they were eventually sent over. A completely consistent exam would have an α of 1.0, a truly random exam an α of 0.0. With an α of 0.82-0.83, the MSRA SJT is fairly consistent14. This is in line with the Foundation Programme SJT’s α of 0.83. For all its faults and subversive logic we can at least say the WPG is reasonably consistent in how they judge responses to SJTs. Importantly this means the MSRA is an exam we should be able to prepare for, but how?
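For readers who want to see what α actually measures, here is a minimal sketch of the standard Cronbach's Alpha formula in Python. It is not the WPG's own analysis, which we have not seen in code form. The function name `cronbach_alpha` and the toy score tables are hypothetical, and the sketch assumes every candidate has a numeric mark for every question.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's Alpha for a table of marks.

    `scores` is a list of candidates, each a list of per-question marks
    (hypothetical layout; every candidate must answer every question).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two questions")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every candidate needs a mark for every question")

    # Variance of each question's marks across candidates.
    item_columns = list(zip(*scores))
    sum_item_variances = sum(pvariance(col) for col in item_columns)

    # Variance of candidates' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("all candidates have the same total; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_variance)


# Toy data: questions that always agree with each other.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Toy data: two questions whose marks are unrelated to each other.
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]

print(round(cronbach_alpha(consistent), 3))
print(round(cronbach_alpha(unrelated), 3))
```

The first toy table, where every question ranks candidates identically, sits at the top of the scale; the second, where the questions tell you nothing about each other, sits at the bottom. A real exam falls somewhere in between.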
From the age of 15 we have had to sit an array of exams, one after another. We start out doe-eyed and idealistic, reading textbooks and other resource materials to learn things from base principles. Eventually, as we become more time-poor and our background knowledge accumulates, we converge on the tried and tested method of "just do the question bank".
We know that by working through thousands of questions we can refine and refresh our knowledge, identify gaps that need further reading and pass the exam. Yet for the SJT there is no question bank to just do, at least not a validated one.
When we use a question bank, we take for granted that it is valid: that the answers are correct and consistent with the official exam. This is because for most medical exams, correctness is trivial. We know there is an established ground truth, or consensus, and can verify quite easily, for example, what the second-line management for asthma is. But for the MSRA Professional Dilemmas paper, things are much trickier.
For an SJT question to be valid it must be consistent with the judgement expected within official MSRA questions; otherwise practising on it is likely to worsen your performance. But how do you validate something as subjective as judgement? For the answer we can take a tour of how the WPG themselves ensure that their questions stay consistent.
Before any SJT question is included within the MSRA it has been worked on for at least one year. The process starts in annual workshops where the WPG recruits specialty trainees, consultants, GPs and Foundation Programme directors. They are instructed to come up with realistic scenarios and to write stems and plausible answers for them15. These questions are then iterated on until they reach a specific threshold of agreement on the answers (concordance) from up to 5 doctors.
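As a rough illustration of a concordance check, the sketch below measures what fraction of reviewers chose the most common answer to a draft question and compares it against a cut-off. This is an assumption-laden toy: the source does not state the actual threshold or how agreement is calculated, so the `threshold=0.8` default, the function names and the single-best-answer format are all hypothetical.

```python
from collections import Counter


def concordance(reviewer_answers):
    """Return (most common answer, fraction of reviewers who chose it)."""
    if not reviewer_answers:
        raise ValueError("need at least one reviewer answer")
    modal_answer, votes = Counter(reviewer_answers).most_common(1)[0]
    return modal_answer, votes / len(reviewer_answers)


def meets_threshold(reviewer_answers, threshold=0.8):
    """True if agreement reaches the (hypothetical) cut-off."""
    _, agreement = concordance(reviewer_answers)
    return agreement >= threshold


# Five reviewers answer the same draft question.
draft_one = ["A", "A", "A", "B", "A"]   # four of five agree
draft_two = ["A", "B", "C", "A", "B"]   # no clear consensus

for draft in (draft_one, draft_two):
    answer, agreement = concordance(draft)
    verdict = "keep" if meets_threshold(draft) else "rework"
    print(answer, agreement, verdict)
```

A question like `draft_two` would go back for another iteration under this toy rule. Real SJT items often ask for a ranking or for several choices rather than a single best answer, which would need a different agreement measure.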
Questions are then piloted in a live exam and have their consistency tested against the existing bank. If a question lacks consistency it is removed; if it passes, it is entered into the bank for use in the next round's exam. Questions already in the bank are routinely reassessed to make sure there is no drift over time. For the past three years we have replicated this writing and concordance process to come up with a provisional bank of 480 new SJT questions. We now plan to validate them.
For validation, we need a ground truth. The MSRA uses its existing bank and the concordance amongst its exam writers. We also plan to use official MSRA questions. Over the past 13 years we’ve been collecting official SJT questions via various releases from Health Education England and other bodies that use WPG-produced clinical SJTs. All in, we now have a database of 289 official questions, all of which have been developed and validated by the WPG and their own internal processes. We believe this is the most comprehensive collection of official SJT questions and we’re making it freely available to access via our platform here and in their raw PDF formats here.
For people who sign up to our platform, we can use your responses to these official questions to model the exam. We will then pilot our own questions and, using a statistical framework known as Item Response Theory, pick out the questions that are consistent with the official question set. Importantly, only 53 of these official questions are at the same level (F2) as the MSRA; the bulk are at a different level (F1), meaning they will be measuring a subtly different construct. We believe these questions have enough overlap to still have some use, as long as they are viewed more critically. For validation, however, we will only be using MSRA-level questions, or other questions where we can show with sufficient confidence that they have a high level of internal consistency with the MSRA-level questions.
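To make the idea concrete, the sketch below shows two things. The first is the item response function of the two-parameter logistic (2PL) model, one common Item Response Theory model; the text above does not say which IRT model will be used, so this choice is an assumption. The second is a much simpler classical stand-in for a full IRT fit: the correlation between marks on one pilot question and candidates' totals on the official questions. All names, data and the idea of scoring a question as simply right or wrong are hypothetical simplifications.

```python
import math
from statistics import mean, pstdev


def two_pl(theta, discrimination, difficulty):
    """2PL model: probability that a candidate of ability `theta`
    gets the item right."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def item_vs_official_correlation(pilot_marks, official_totals):
    """Pearson correlation between marks on one pilot question and the
    same candidates' totals on the official questions.

    This is a classical screening statistic, not an IRT fit.
    """
    n = len(pilot_marks)
    if n != len(official_totals) or n < 2:
        raise ValueError("need matched marks for at least two candidates")
    mean_x, mean_y = mean(pilot_marks), mean(official_totals)
    sd_x, sd_y = pstdev(pilot_marks), pstdev(official_totals)
    if sd_x == 0 or sd_y == 0:
        return 0.0  # no spread, so the question tells us nothing
    covariance = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(pilot_marks, official_totals)
    ) / n
    return covariance / (sd_x * sd_y)


# Four hypothetical candidates: totals on official questions, and
# right/wrong marks on two pilot questions.
official_totals = [10, 12, 18, 20]
pilot_good = [0, 0, 1, 1]   # stronger candidates get it right
pilot_bad = [1, 1, 0, 0]    # stronger candidates get it wrong

print(round(item_vs_official_correlation(pilot_good, official_totals), 2))
print(round(item_vs_official_correlation(pilot_bad, official_totals), 2))
```

A pilot question that tracks performance on the official set shows a strongly positive correlation and is a candidate to keep; one that runs against it shows a negative correlation and would be reworked or dropped. In the 2PL model the same idea appears as the `discrimination` parameter: the higher it is, the more sharply the question separates stronger from weaker candidates.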
Pilot questions that aren’t consistent with the official question set will either be reworked or abandoned. Questions that pass the validation process will then be included in a new question bank that we plan to release closer to Round 1 of the 2027 MSRA. This question bank will have a high internal consistency with the official MSRA and, we believe, will give people a distinct advantage in improving their performance on this exam. We plan to validate it further by matching your performance on the official exam against your performance on our question bank.
To maintain the advantage we believe our validated question bank will give you, we will be limiting sign-ups once it is available. People who signed up to our free tier to help us validate it will automatically be prioritised. Access to our bank of official SJT questions will always remain free and will not be capped.
Questions around the fairness of the MSRA Professional Dilemmas paper remain. But while it has such an oversized impact on our careers, our main focus is on ‘beating’ the exam by developing an edge that will allow us to improve our performance and, ultimately, to rank higher.