Word Association Thesaurus


The Edinburgh Associative Thesaurus (EAT) is a set of word association norms: counts of the responses that subjects gave to stimulus words. It is not a constructed semantic network such as WordNet, but empirical association data.

Interactive Associative Thesaurus


The EAT can be searched in two directions. You can enter a stimulus word and find the words that were produced in association to it; or you can enter a response word that was produced by subjects, and find the list of stimulus words which produced it as a response.
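The two search directions can be pictured as two indexes built from the same (stimulus, response, count) records. The following is only a minimal sketch of that idea, not the implementation behind this page; the function name `build_indexes` is hypothetical, and apart from CAT → DOG (49), which is taken from the worked example on this page, the records are invented for illustration.

```python
from collections import defaultdict


def build_indexes(records):
    """Build a forward (by stimulus) and a reverse (by response) index."""
    by_stimulus = defaultdict(dict)
    by_response = defaultdict(dict)
    for stimulus, response, count in records:
        by_stimulus[stimulus][response] = count
        by_response[response][stimulus] = count
    return by_stimulus, by_response


# Illustrative records; only CAT -> DOG (49) comes from this page.
records = [
    ("CAT", "DOG", 49),
    ("CAT", "MOUSE", 5),
    ("BONE", "DOG", 3),
]

by_stimulus, by_response = build_indexes(records)

# Stimulus search: which words were produced in response to CAT?
print(by_stimulus["CAT"])

# Response search: which stimulus words produced DOG?
print(by_response["DOG"])
```

Keeping both indexes trades some memory for constant-time lookup in either direction.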

The version of the EAT provided here is the one included in the MRC Psycholinguistic Database, so that it can be used with the other measures available there.

Data Returned

Each query returns the number of different answers, the total count of all answers, and a list of triples, each giving an associated word (type), its individual frequency, and its proportion of occurrence. The proportion of occurrence for a given type is its individual frequency divided by the total count of all answers.

For example, if the word CAT is entered as the stimulus, the output starts as:

cat stimulated the following associations

Number of different answers: 30

Total count of all answers: 95

  • DOG 49 0.52

This states that 30 different answers were given, from a total of 95 people responding to the stimulus word. The first word returned is DOG, which has a frequency of 49 (49 of the 95 responses were the word DOG) and a proportion of occurrence of 0.52, which is 49 divided by 95, rounded to two decimal places. The proportion of occurrence can be multiplied by 100 to give a percentage: in this case, 52% of respondents produced DOG as the response to the word CAT.
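The arithmetic above can be reproduced in a few lines. This is a sketch only: the function name `proportions` is hypothetical, and the counts other than DOG (49) and the total of 95 are placeholders, since the full list of 30 answers is not shown here.

```python
def proportions(counts):
    """Return (word, frequency, proportion) triples, most frequent first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, freq, round(freq / total, 2)) for word, freq in ranked]


# DOG = 49 and the total of 95 are from the example above.
# REST_A and REST_B are placeholders standing in for the other 29 answers.
counts = {"DOG": 49, "REST_A": 30, "REST_B": 16}

rows = proportions(counts)
word, freq, share = rows[0]
print(word, freq, share)           # DOG 49 0.52
print(f"{share * 100:.0f}%")       # 52%
```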


Uses of the Edinburgh Associative Thesaurus

The EAT has been used for a wide variety of purposes, including:

  • as a test case of a large, complex network in several studies of networks and graphs. The EAT network contains 23,219 vertices (words) and 325,624 arcs (stimulus → response), including 564 loops. Almost 70% of arcs have value 1.
  • to provide a psychologically credible basis for semantic linking between words
  • as a source of associations between ideas in advertising.
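The network figures quoted in the first item (vertices, arcs, loops, and the share of arcs with value 1) are simple to compute once the data is held as weighted arcs. The sketch below assumes each arc is a (stimulus, response, value) triple and that a loop is an arc whose stimulus and response are the same word; the function name `network_stats` and the toy arcs are illustrative, not EAT data.

```python
def network_stats(arcs):
    """Summarise a directed, weighted association network."""
    vertices = set()
    loops = 0
    value_one = 0
    for stimulus, response, value in arcs:
        vertices.update((stimulus, response))
        if stimulus == response:   # a word given as a response to itself
            loops += 1
        if value == 1:             # an association produced by one subject only
            value_one += 1
    return {
        "vertices": len(vertices),
        "arcs": len(arcs),
        "loops": loops,
        "share_value_1": value_one / len(arcs),
    }


# Toy arcs, invented for illustration.
arcs = [
    ("CAT", "DOG", 49),
    ("CAT", "MOUSE", 5),
    ("DOG", "CAT", 40),
    ("SAD", "SAD", 1),
    ("SAD", "HAPPY", 1),
]

stats = network_stats(arcs)
print(stats)
```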

As an example of the first use, a 2008 study by Randolf Rotta analyses the response stimulus graph at different granularities. A 2012 study by Amancio, Oliveira Jr and Costa uses EAT to analyse the consistency in the use of words. The image below shows the network of words linked to SAD, from a 2004 study by Matjaz Zaversnik and Vladimir Batagelj on islands in networks.

graph of words linking to SAD and each other

An example of the other uses, which rely on the content itself, can be seen in the image below, which shows some of the 84 six-node association paths from "peace" to "neutron". These paths show the semantic distance between the two words with some psychological validity, and can be used to stimulate creative message design in the media and advertising. The example uses the MS-Windows tool available below, written by Professor Maocheng LIANG.

table of six-node association paths from peace to neutron

Downloadable Word Association Data

Downloadable versions of the word association thesaurus are available below.

An XML version of EAT has been created by Guy Lapalme from the University of Montreal, which is available from his web site.

A version of EAT in the format used by the Pajek large network analysis tools is also available from Vladimir Batagelj at the University of Ljubljana.

EAT for MS-Windows

Professor Maocheng LIANG of the National Research Centre for Foreign Language Education, Beijing Foreign Studies University (BFSU), People's Republic of China, has ported the data from the Edinburgh Word Association Thesaurus into a stand-alone tool for MS Windows, which can be used off-line to find word associations when Internet access is not available. The tool also reports chains of associations between any two words entered, which the on-line tool does not. The product of the probabilities of a response to a cue is used as a measure of the semantic distance along the chain of words. No attempt is made to provide semantic labels for the links in the chain which, like any response to a word association cue, could be any of many types (e.g. synonym, antonym, morphologically or phrasally syntagmatic, a property or its value, co-members of a semantic class, case slot fillers to a verb, rhymes, etc.). However theoretically meaningless such chains may be, they may still be of use in the creative industries in generating acceptable transitions between ideas in media presentations.
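The chain measure described above can be illustrated as follows. This is a sketch under stated assumptions and not Professor LIANG's implementation: the function names `chain_strength` and `chains`, the depth-first search, the `max_nodes` limit, and every proportion in the toy table are invented for illustration. The only element taken from the description is that a chain is scored by multiplying the response proportions along it.

```python
from math import prod


def chain_strength(chain, proportion):
    """Product of the response proportions along a chain of words."""
    return prod(proportion[a][b] for a, b in zip(chain, chain[1:]))


def chains(start, goal, proportion, max_nodes):
    """Yield loop-free chains from start to goal with at most max_nodes words."""
    stack = [[start]]
    while stack:
        chain = stack.pop()
        last = chain[-1]
        if last == goal:
            yield chain
            continue
        if len(chain) == max_nodes:
            continue
        for nxt in proportion.get(last, {}):
            if nxt not in chain:   # never revisit a word within one chain
                stack.append(chain + [nxt])


# proportion[stimulus][response]: all values are invented placeholders.
proportion = {
    "PEACE": {"WAR": 0.5, "QUIET": 0.2},
    "WAR": {"BOMB": 0.1},
    "QUIET": {"BOMB": 0.05},
    "BOMB": {"NEUTRON": 0.02},
}

found = list(chains("PEACE", "NEUTRON", proportion, max_nodes=4))
best = max(found, key=lambda c: chain_strength(c, proportion))
print(best, chain_strength(best, proportion))
```

Because each factor is at most 1, the product shrinks as a chain gets longer or passes through weaker associations, so a larger product corresponds to a shorter semantic distance.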

The tool can be downloaded in a ZIP archive from the link below. The tool is provided here as Professor Maocheng LIANG made it available and STFC accept no liability for any consequences of using this tool.

DOWNLOAD ZIP Archive


EAT Data Collection Procedure

Data Collection Procedure - excerpted from:

Kiss, G.R., Armstrong, C., Milroy, R., and Piper, J. (1973) An associative thesaurus of English and its computer analysis. In Aitken, A.J., Bailey, R.W. and Hamilton-Smith, N. (Eds.), The Computer and Literary Studies. Edinburgh: University Press.

STIMULUS WORDS

Since the objective was to obtain a reasonably large complete mapping of the associative network for a large set of words, a systematic procedure of 'growing' the network from a small nucleus was followed. At first responses were obtained from this nucleus set, then these responses were used as stimuli to obtain further responses, and so on. In fact, this cycle was repeated about three times, since by then the number of different responses was so large that they could not all be re-used as stimuli.

The nucleus set was derived from (a) the 200 stimuli used in the Palermo and Jenkins (1964) norms, (b) the 1,000 most frequent words of the Thorndike and Lorge (1944) word frequency count, and (c) the basic English vocabulary of Ogden (1954).

Data collection was stopped when 8,400 stimulus words had been used. Only a minimal amount of selection of stimuli was applied in each cycle of the data collection. Effectively all responses which were English words or meaningful verbal units were included, including some phrasal forms and numerals. The data cover a wide range of grammatical form classes and inflexional forms.
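The 'growing' procedure described in this section amounts to expanding outwards from the nucleus, cycle by cycle, until a stimulus limit is reached. The sketch below only illustrates that loop: `grow_network` and `get_responses` are hypothetical names, `get_responses` stands in for actually collecting responses from subjects, and the toy vocabulary is invented. The defaults of three cycles and 8,400 stimuli echo the figures quoted above.

```python
def grow_network(nucleus, get_responses, cycles=3, max_stimuli=8400):
    """Return the stimulus words used, in order, when growing from a nucleus."""
    used = []
    seen = set()
    frontier = list(nucleus)
    for _ in range(cycles):
        next_frontier = []
        for word in frontier:
            if word in seen:
                continue               # each word is used as a stimulus once
            if len(used) >= max_stimuli:
                return used            # stop once the stimulus limit is hit
            seen.add(word)
            used.append(word)
            # Responses to this stimulus become candidate stimuli next cycle.
            next_frontier.extend(get_responses(word))
        frontier = next_frontier
    return used


# Invented toy data standing in for collected responses.
toy = {
    "CAT": ["DOG", "MOUSE"],
    "DOG": ["BONE", "CAT"],
    "MOUSE": ["CHEESE"],
}

stimuli = grow_network(["CAT"], lambda w: toy.get(w, []))
print(stimuli)
```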

PROCEDURE

Each stimulus word was presented to 100 different subjects. Each subject received a computer-printed sheet with 100 stimuli in randomised arrangement (to minimize priming effects). The total contribution of each subject was thus 100 responses. The verbal environment of each word for each subject was different. The instructions asked the subject to write down against each stimulus the first word it made him think of, working as quickly as possible. The total time spent on this task was measured, and most subjects completed the sheet in five to ten minutes.

Most of the data was collected in a classroom setting under supervision. Sheets which had more than 25 percent blank responses were rejected and fresh data was collected.
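The rejection rule for sheets can be written as a one-line check. This is a sketch, with the function name `accept_sheet` chosen for illustration; treating a whitespace-only entry as blank is an assumption not stated in the excerpt.

```python
def accept_sheet(responses, max_blank_share=0.25):
    """Accept a sheet unless more than 25 percent of its responses are blank."""
    blanks = sum(1 for response in responses if not response.strip())
    return blanks / len(responses) <= max_blank_share


# A sheet holds 100 responses; "word" is a placeholder for any answer.
sheet_ok = ["word"] * 75 + [""] * 25    # exactly 25 percent blank
sheet_bad = ["word"] * 74 + [""] * 26   # more than 25 percent blank

print(accept_sheet(sheet_ok))    # True
print(accept_sheet(sheet_bad))   # False
```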

Other Word Association Norms

Kent, G.H. and Rosanoff, A.J. (1910). A study of association in insanity. American Journal of Insanity, 67, 37-96, 317-390.

Palermo and Jenkins (1964). Word Association Norms. University of Minnesota Press.

Postman, L.J. and Keppel, G. (1970). Norms of Word Association. Academic Press.

Moss, H. and Older, L. (1996). Birkbeck Word Association Norms. Psychology Press.

Nelson, D. L., McEvoy, C. L., and Schreiber, T. A. (1998). The University of South Florida word association, rhyme, and word fragment norms.


Contact for issues about the Edinburgh Associative Thesaurus