Baddeley 1966 Case Study

Research into the capacity of STM

Miller (1956) identified that people have a digit span of 7 +/- 2 items. This means that, on average, we are able to hold between 5 and 9 numbers in short-term memory (STM).
Grouping items into chunks (for example, groups of 3 digits) can increase the capacity of STM, especially if the chunks are meaningful to the individual.

Research into the duration of STM
Peterson & Peterson (1959)

Aim: To test how long STM lasts when rehearsal is prevented.
  • Participants were presented with trigrams (three-letter nonsense syllables, e.g. BCM, CPW), which they were then asked to recall in order after a delay of 3, 6, 9, 12, 15 or 18 seconds.
  • To prevent rehearsal, participants were given an interference task: counting backwards in threes from a random three-digit number (known as the Brown-Peterson technique).
  • Recall had to be 100% accurate and in the correct order to count as correct.

The percentage of correct recall was:
After 3 SECONDS = 80%
After 18 SECONDS = LESS THAN 10%
Recall decreased steadily between 3 and 18 seconds, suggesting that the duration of STM is not much more than 18 seconds.
Conclusions: The memory trace in the STM has just about disappeared after 18 seconds. Information held in the STM is quickly lost without rehearsal. This supports the hypothesis that the duration of the STM is limited to approximately 20 seconds. They also concluded that this is evidence that the STM is distinct from the LTM as the LTM has a much longer duration.

AO2 Criticisms:
1. The trigrams are artificial and they do not reflect everyday memory.

2. It may be that interference from earlier trigrams causes poor recall, not simply decay.
3. A strength is that the research method employed (laboratory experiment) allows us to see the causal effect of time passing (IV) on recall of the trigrams (DV).

Research into duration of the LTM
Bahrick et al (1975)

Investigation of the duration of very-long-term memory (VLTM). 
Duration was tested using recall of real-life information.

Participants were 392 American former high-school students aged 17-74.
Recall was tested in four ways:
1. Free recall of the names of as many of their former classmates as possible.
2. A photo recognition test where they were asked to identify former classmates in a set of 50 photos, only some of which were their classmates.
3. A name recognition test.
4. A name and photo matching test.

90% accuracy in FACE AND NAME RECOGNITION after 34 YEARS 
80% accuracy for NAME RECOGNITION after 48 YEARS 
40% accuracy for FACE RECOGNITION after 48 YEARS 
60% accuracy for FREE RECALL after 15 YEARS 
30% accuracy for FREE RECALL after 30 YEARS

Participants in the name-matching condition were 90% accurate after 14 years and 60% accurate after 47 years, whereas the recognition group were 60% accurate after 7 years and less than 20% accurate after 47 years.

Classmates are rarely forgotten, but cues are sometimes needed.
Recognition was better than recall.

  • A strength is that this study used meaningful stimuli (high-school yearbooks) and tested people's memories of their own lives.
  • A weakness is that it is unclear whether the drop-off in accuracy at 47 years is due to the limits of duration or to a general decline in memory as we get older.

Research into encoding in STM & LTM
Baddeley (1966)

Aim: To explore the effects of acoustic and semantic coding in short-term memory and long-term memory.
Procedures:
  • In the STM study, participants were asked to recall, in serial order and immediately after presentation, a list of five words taken from one of the following categories:
- acoustically similar words (e.g. man, mad, map): words that sound alike
- acoustically dissimilar words (e.g. pen, day, few): words that sound different
- semantically similar words (e.g. great, big, large): words that have the same meaning
- semantically dissimilar words (e.g. hot, old, late): words that have different meanings
  • In the LTM study, each list of words was extended to ten, and recall was tested after an interval of twenty minutes.
Results:
  • Words with similar sounds were much harder to recall from STM than words with dissimilar sounds
  • Similarity of meaning had only a very slight detrimental effect on STM
  • When participants were recalling from LTM, recall was much worse for semantically similar words than for semantically dissimilar words
  • Recall from LTM was the same for acoustically similar and acoustically dissimilar words
Conclusions:
  • STM relies heavily on acoustic coding
  • LTM primarily makes use of semantic coding
Evaluation:
  • The use of the experimental method allows a causal link to be drawn between the type of coding used in STM and LTM and the accuracy of recall, since extraneous variables can be controlled; this gives the study high validity and reliability.
  • It is scientific in its approach, which also adds credibility to the research.
  • The conclusions of this study may not reflect the complexities of encoding. Evidence from other studies shows that, in certain circumstances, both STM and LTM can use other forms of coding.
This is the Classic Cognitive Study, so every student has to know it and the Examiner will expect you to be familiar with the details. As well as general questions about the Aims, Procedure, Results & Conclusions (APRC), you could get fairly specific questions on how Baddeley tested memory or how the different groups performed. However, remember that there are 3 different experiments reported in the 1966b paper; although we're covering the 3rd one, candidates can in fact write about any of them.


BADDELEY (1966b)

This study was carried out by Alan Baddeley in the 1960s. Baddeley (and Hitch) went on to develop the Working Memory Model in the 1970s, so this study is quite important as part of the background to that theory. It charts Baddeley’s growing realisation that memory was in fact more complicated than the Multi Store Model made out.

This study is significant for students in other ways:
  • It shows how scientific research proceeds, because Baddeley carried out 3 experiments, performing one that produced baffling results, a second that corrected the first, then the third that you are studying.
  • It illustrates features of the Cognitive Approach, since it uses the experimental method to try to isolate and measure functions of memory that are so subtle we don’t normally realise they are going on.
  • It illustrates the power of the experimental method, making use of clever experimental controls to isolate and remove confounding (extraneous) variables.
  • It shows the importance of experimental design, since it uses both Independent Groups and Repeated Measures.


Baddeley (1966b) reports three linked experiments. The Unit 1 Exam will only assess you on the 3rd experiment. However, it's helpful to have a basic awareness of the earlier two experiments, to see how Baddeley perfected his procedures and identified the variables he was controlling.
Baddeley started off trying to test Long Term Memory (LTM). He gave participants four trials at learning the order of a list of words. Then he used a 20 minute delay (to remove Short Term Memory or STM) and then asked participants to recall as many words as possible in order. He compared their score in the 5th trial with their score in the 4th trial 20 minutes earlier to see how much they had forgotten.

Baddeley expected that participants who had to remember the order of rhyming words would struggle less than those who had to remember the order of words with similar meanings. This is because he thought LTM worked by semantic encoding and would be confused by similarity of meaning but not by similarity of sound.

Baddeley’s results weren’t what he expected, and he realised that the participants’ STM was helping their LTM out, with the two memory stores working together. To remove this confounding variable, he carried out a second test. This time the participants had to perform an interference task after hearing the list of words. This seemed to work, because it cancelled out STM and meant that the participants were only using LTM to perform their recall tasks.

With this technique in place, Baddeley then carried out his third test, which is described below. He made one more change, replacing the tape recordings of the word lists with a slide show, because he had been disqualifying participants who couldn’t hear well.

  • Notice Baddeley’s use of experimental controls. The first is the 20 minute delay to allow “forgetting” to take place. The second is the interference task which makes it hard for participants to use their STM to remember the words from the word list.
  • Also notice Baddeley’s scientific approach. When his results don’t match what his hypothesis predicts, he suspects a confounding variable is at work. He designs a further experiment with more detailed controls to try to isolate the confounding variable and remove it.

Aim: To find out whether LTM encodes acoustically (based on sound) or semantically (based on meaning). This is done by giving participants word lists that are similar in the way they sound (acoustic) or in their meaning (semantic); if participants struggle to recall the word order, it suggests LTM is confused by that type of similarity, which in turn suggests that this is how LTM tends to encode.


Independent variables: This lab experiment has three IVs: (1) an acoustically similar or acoustically dissimilar word list; (2) a semantically similar or semantically dissimilar word list; (3) performance before and after a 15-minute “forgetting” delay.

IVs (1) and (2) are tested using Independent Groups design but IV (3) is tested through Repeated Measures.


Dependent variable: Score on a recall test of 10 words; words must be recalled in the correct order (really, this is a test of remembering the word order, not the words themselves).


Sample: 72 volunteers, a mixture of men and women, from the Cambridge University subject panel (mostly students). There were 15-20 in each condition (15 in Acoustically Similar, 16 in Semantically Similar).


Procedure: The participants are split into four groups according to IVs (1) and (2). Each group views a slideshow of a set of 10 words, with each word appearing for 3 seconds.

In the Acoustically Similar condition, the participants get a list of words that share a similar sound (man, cab, can, max, etc.), while the Control group get simple one-syllable words that do not sound alike (pit, few, cow, pen, etc.).

In the Semantically Similar condition, the words share a similar meaning (great, large, big, huge, etc) but the Control group get words that are unconnected (good, huge, hot, safe, etc).

The participants in all 4 conditions then carry out an “interference test” which involves hearing then writing down 8 numbers three times. Then they recall the words from the slideshow in order.

There are four “trials” and (as you would expect) the participants get better each time because the words stay the same. The words themselves are displayed on signs around the room, so the participants only have to concentrate on getting the ORDER of the words right, not on remembering the words themselves.

After the 4th trial, the participants get a 15 minute break and perform an unrelated interference task. Then they are asked to recall the list again. This fifth and final trial is unexpected. The words themselves are still on display; it is the order of the words the participants have to recall.


Results: Baddeley was interested to see whether Acoustic or Semantic Similarity made it harder to learn the words. He compared the scores of the participants in the Similar and Control conditions and paid particular attention to whether they recalled as well in the 5th “forgetting” trial or whether there was a drop-off in scores.
Acoustically similar words seem to be confusing at first, but participants soon “catch up” with the Control group and even overtake them, but this isn’t statistically significant. Notice how LTM is not confused by acoustic similarities – scores on the last test are similar to the 4th trial, suggesting no forgetting has taken place.
Semantically similar words do seem to be confusing and the experimental group lags behind the Control group. In fact, the experimental group never catches up with the Control group and performs worse overall than the Acoustically Similar group above. Very little forgetting takes place, but scores are lower.


Conclusions: Baddeley concludes that LTM encodes semantically, at least primarily. His earlier experiments suggest STM encodes acoustically.

This is why LTM gets confused when it has to retrieve the order of words that are semantically similar: it gets distracted by the semantic similarities and muddles them up. It has no problem retrieving acoustically similar words because LTM pays no attention to how the words sound.

The “slow start” in the Acoustically Similar condition would be because the interference task doesn’t block STM 100% - some of the words linger on in the rehearsal loop. This means in most conditions, the participants’ LTM gets a bit of help from STM. But in the Acoustically Similar condition, STM gets confused by the similar sounds the way that LTM gets confused by similar meanings. It can’t be of much help so this group lags behind the Controls until all the words are encoded in LTM, at which point the two groups finally get similar scores.


Describing Studies can be done by following the pattern of A-P-R-C, which stands for Aims, Procedures, Results, Conclusions. "Methods" is a term that often covers Aims and Procedures, while "Findings" covers Results and Conclusions.

Generalisability: Baddeley has a large sample of 72. Any anomalies (people with unusually good or bad memories) will be “averaged out” in a sample this size. This suggests you can generalise from this sample.

However, there were so many conditions in this study that each group only had 15-20 people in it. That’s not a lot. Only 15 people did the Acoustically Similar condition. An anomaly could make a difference to scores with numbers that small.

The sample was made up of British volunteers. It might be that there is something unusual about the memories of British people or the memorable qualities of English words. However, this is unlikely. LTM works the same for people from all countries, speaking all languages, so this sample is probably representative.

However, a volunteer sample might include more people with particularly good memories who enjoy doing memory tests, which would not be representative of people in general.


Reliability: This is a great example of a reliable study because it has standardised procedures that you could replicate yourself. You wouldn’t need special equipment and you could use exactly the same words that Baddeley used.

Baddeley improved the reliability of his own study by getting rid of the read-aloud word lists (some participants had hearing difficulties) and replacing them with slides. Everyone saw the same word for the same amount of time (3 seconds).


Applications: The main application of this study has been for other Cognitive Psychologists, who have built on Baddeley’s research and investigated LTM in greater depth. Baddeley’s use of interference tasks to control STM has been particularly influential. Baddeley & Hitch built on this research and developed a brand new memory model – Working Memory.

Another application is for your own revision. If LTM encodes semantically, it makes sense to revise using mind maps that use semantic links. Reading passages out loud over and over (rote learning) relies on acoustic coding, but LTM doesn’t seem to work this way, so it won't be as effective.


Validity: Baddeley took trouble to improve the internal validity of his experiment. He used controls to do this. Rather than getting participants to recall words, he asked them to recall word order (with the words themselves on display the whole time). This reduced the risk that some words would be hard to recall because they were unfamiliar, or others easy to recall because they had associations for the participants.

However, the ecological validity of this study is not good. Recalling lists of words is quite artificial but you sometimes have to do it (a shopping list, for example). Recalling the order of words is completely artificial and doesn’t resemble anything you would use memory to do in the real world.

Baddeley did improve this. For example, he made the 5th “forgetting” trial a surprise that the participants weren’t expecting. This is similar to real life, where you are not usually expecting it when you are asked to recall important information.


Ethics: There are no significant ethical issues with this study, so do not bring up ethics when evaluating it.


Evaluating Studies can be done using the mnemonic G-R-A-V-E, which reminds you about Generalisability, Reliability, Applications, Validity and Ethics. These issues are often used in the wording of exam questions.

An 8-mark essay on the classic study

Evaluate the classic study from cognitive psychology. (8 marks)
  • An 8-mark “evaluate” question awards 4 marks for AO1 (Describe) and 4 marks for AO3 (Evaluate). To get a top band mark (7-8) you MUST add a conclusion.
Baddeley had a very reliable experiment. In fact, he replicated it 3 times, improving the procedures each time. He used the same lists of words, gave the participants the same amount of time and tested them in the same way. This is called following standardised procedures.

Baddeley improved the validity of his study by using controls. He added an interference task (writing down lists of numbers) before each trial to “block” the STM and make sure only LTM was being used. He also presented the words on slides because he didn’t want to disqualify people for having bad hearing.

However, Baddeley’s study lacks ecological validity because it is unrealistic. Learning lists of similar sounding or similarly themed words is not an ordinary activity. As with most memory tests, there was nothing at stake and no reason for participants to try hard to remember.

Baddeley had a big sample which is probably representative. However, there were 4 different conditions and one of them only had 15 people in it. This is quite a small group where an anomaly (someone with an unusual memory) might skew the results.
In conclusion, Baddeley's study made a big contribution to our understanding of memory and provided the basis for Tulving's later work into Semantic LTM and Baddeley & Hitch's theory of working memory.
Notice that for an 8-mark answer you don’t have to include everything Baddeley did. I haven’t mentioned the Control groups or the fact that the words were posted up for participants to see the whole time. But I have tried to make the two halves – Description and Evaluation – evenly balanced.

Frequently Asked Questions

Why do participants do worse in the Semantic Similarity condition? Surely they should do better if LTM is good at semantic things?

What LTM is good at is semantic differences. It's as if LTM is specially designed to "put things into different boxes" based on meaning. This is why we're very good at remembering oddities - the "odd one out". If I showed you a collection of tools (like a hammer and a screwdriver and a spanner, etc) and in among them was an apple, then I asked you to recall what I'd shown you, you'd recall the apple straight away. It stood out. This is LTM doing what it's good at: sorting things into meaningful categories.

Baddeley's experiment puts LTM under some strain because one of the groups gets a list where all the words mean the same thing. There's no semantic difference between the words great, large and huge. This gives LTM "nothing to hold on to". As far as LTM is concerned, the words are all as good as each other; they're interchangeable. Then Baddeley asked the participants to recall the exact order the words were in. But that's rather hard to do with interchangeable words.

You can see that the Control Group didn't have this problem. When given words like good, huge and hot to remember, LTM puts them into different categories. Good goes into "words to do with nice things" and huge goes into "words to do with big things" and hot goes into "fire words" and so on. So when the Control Group had to recall the words in order, their LTM told them that the first word was something nice, the next word was something big and the third word was something to do with fire: good, huge, hot. They got much higher scores.

This is why mind maps make such good revision tools. When you create a mind map, you deliberately put your information into different semantic categories and LTM soaks it up like a sponge. This is particularly important if the stuff you're learning seems like it all belongs to one big semantic category (like "names of boring psychologists"). You have to introduce semantic differences to get LTM to work at its best.
What were the interference tasks for?

In his first experiment, Baddeley didn't use an interference task. The participants listened to the 10 words then immediately wrote them down in the order they heard them. The results were a mess and didn't show a clear pattern.

Baddeley realised that STM was joining in the recall task. Participants were holding the words in the Rehearsal Loop in STM. This is a good example of a confounding variable that makes the study invalid.

The interference task is designed to "block" STM by giving you something to concentrate on that lasts longer than the 20-second duration of STM and involves more than the 5-9 items STM can handle at once. This is a good example of an experimental control that reduces a confounding variable.

So in the final experiment, the participants studied the words on the screen, did the interference task (which took about two minutes), then recalled the words in order. Because STM has been "blocked", they were only using their LTM to do the recalling.
Why did the Acoustic Similarity condition do so badly compared to the Control Group?

They only did badly in the first two trials. In trials 3 and 4 they caught up and in the Forgetting Test they overtook the Controls.

This is because the interference tasks Baddeley used weren't 100% successful at "blocking" STM. Despite having to listen to numbers and write them down, participants still squeezed in a bit of rehearsal using STM. In 3 of the conditions, this made very little difference, but in the Acoustic Similarity condition it did. This is because STM finds similar-sounding words "slippery" and interchangeable, in the same way that LTM finds similarly-themed words interchangeable.

So STM wasn't much help in the Acoustic Similarity condition; these participants had to rely entirely on LTM. It's not so much that the Acoustic Similarity group did worse than the other groups; it's more that the other groups all got a little boost from STM in the first couple of trials but the Acoustic Similarity condition didn't get this boost.

Therefore, this result is a bit of extra proof that STM encodes acoustically.
Why did the participants have to recall the words in order? What was the point of that?

There are two answers to this.

The simple answer is that recalling words is not a very reliable test of memory. This is because some words may be easier or harder to remember than others. For example, one of the words in the Semantically Similar condition was broad. Now, in the 1960s, broad was also an American slang term for a woman. Some participants might have been more likely to remember broad because it "stuck out" from the list. Another word was fat, which has humorous connections for some people. Baddeley didn't ask participants to recall the words themselves and he posted the words up on signs around the room. He asked participants to recall the word order. This task shouldn't be affected by certain words being funny or unusual.

The more complicated answer is that recalling the words in order is more of a challenge for LTM. If Baddeley had just asked participants to recall the words in any order, participants might have recalled 7 or 8, then "guessed" the remaining 2 or 3. Because the words are all from the same semantic category, they could easily have guessed right by writing down words that mean the same as "big". Baddeley wouldn't know whether the participants got their scores because they had really remembered the words, or because they had guessed correctly.

Of course, participants could guess the word order too, but the semantic similarity doesn't help them do that. Baddeley can be much more confident that participants who wrote the words in the right order had really remembered the words.
