#######################################
##### Contents ########################
#######################################


File NN_ratings.csv
  - the original rating data from Graves, Binder, & Seidenberg (2013)
  
Files data_XXX_nmf.txt
  - contain the original data set of Graves, Binder, & Seidenberg (2013) together with our additions (plausibility measures etc.)
    (some of these data sets are smaller than the original because compositional vectors could not be computed for every compound)

File CORE_SS.seidenberg_core_headplusmod.ppmi.nmf_300.rda
  - core space (i.e. corpus-derived vector representations) employed in this study, in the .rda format for R
 
File PER_SS.seidenberg_phrase_headplusmod.CORE_SS.seidenberg_core_headplusmod.ppmi.nmf_300.rda
  - peripheral space, containing the corpus-derived vector representations for the noun compounds of the training set
    (in the .rda format for R)
  
File trainingset.txt
  - contains the items used for training the composition methods that require training 
 
Files COMPOSED_SS.XXX.list_to_compose.txt.dm
  - phrase spaces (containing the compositionally derived vector representations for the compounds) for the methods requiring training:
      Weighted Additive (T), Dilation (T), Full Additive, Head as a Lexical Function (list_to_compose_head), Modifier as a Lexical Function (list_to_compose_mod)
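
A minimal Python sketch for reading one of these phrase spaces. It assumes the .dm files are plain-text dense matrices with one row per compound (a label followed by whitespace-separated numbers); this README does not state the format, so check a file before relying on it. The function name read_dm and the sample rows are hypothetical.

```python
import io

def read_dm(handle):
    """Parse a dense-matrix file: one item per line, label first, then numbers.

    Assumed layout (not confirmed by this README):
        label<whitespace>value_1<whitespace>value_2 ...
    """
    vectors = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label, values = fields[0], fields[1:]
        vectors[label] = [float(v) for v in values]
    return vectors

# Made-up two-row example standing in for a real COMPOSED_SS file.
sample = io.StringIO("lamp_shade\t0.1\t0.0\t0.3\nshade_lamp\t0.2\t0.5\t0.0\n")
phrase_space = read_dm(sample)
```

For a real file, pass an open file handle instead of the in-memory sample, e.g. read_dm(open(path, encoding="utf-8")).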

The corpora employed in this study, from which the co-occurrence counts were collected, can be found here:
   - ukWaC:      http://wacky.sslmit.unibo.it/
   - BNC:        http://www.natcorp.ox.ac.uk/
   - Wikipedia:  http://en.wikipedia.org

   
   
#######################################
##### Variables in the data files #####
#######################################

#### Plausibility measures in the data sets:

ndensity   : Neighbourhood Density
stemprox   : Head Proximity
stemprox2  : Modifier Proximity
entropy    : Entropy
nounsim    : Constituent Similarity
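
As an illustration only: the sketch below assumes that the two proximity measures and the constituent similarity are cosine similarities between vectors (compound vs. head, compound vs. modifier, modifier vs. head). This README does not define the measures, so treat that reading as an assumption; the three toy vectors are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy vectors, not taken from the data set.
modifier = [1.0, 0.0, 2.0]
head = [0.0, 1.0, 2.0]
compound = [0.5, 0.5, 2.0]

# Assumed definitions (see the note above).
head_proximity = cosine(compound, head)          # cf. stemprox
modifier_proximity = cosine(compound, modifier)  # cf. stemprox2
constituent_similarity = cosine(modifier, head)  # cf. nounsim
```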


#### Other vector-based measures:

length     : Vector Length
variance   : Vector Variance
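
A short sketch of how these two measures could be computed, assuming that Vector Length is the Euclidean norm and Vector Variance is the sample variance (n - 1 denominator) of the vector's components. Both definitions are assumptions, not taken from this README.

```python
import math
import statistics

def vector_length(v):
    """Euclidean norm of the vector (assumed definition of 'length')."""
    return math.sqrt(sum(x * x for x in v))

def vector_variance(v):
    """Sample variance of the components, n - 1 denominator (assumed definition of 'variance')."""
    return statistics.variance(v)

# Toy vector, not taken from the data set.
v = [3.0, 4.0, 0.0]
length = vector_length(v)
variance = vector_variance(v)
```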

weedsprec, cosweeds, clarkede, invcl    : Entailment measures (see Kruszewski & Baroni, 2014) with respect to the compound head
weedsprec2, cosweeds2, clarkede2, invcl2: Entailment measures (see Kruszewski & Baroni, 2014) with respect to the compound modifier


#### Other psycholinguistic measures:

logfreq_mod        : SUBTLEX log-frequency for the modifier
logfreq_stem       : SUBTLEX log-frequency for the head
logfreq_bigram     : SUBTLEX log-frequency for the modifier-head pair
logfreq_bigram_rev : SUBTLEX log-frequency for the head-modifier pair (i.e. the modifier-head pair in reversed order)

pmi                : Pointwise Mutual Information for the modifier-head pair
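
For orientation, a sketch of the standard PMI definition, log( p(modifier, head) / (p(modifier) * p(head)) ). The logarithm base (2 here), the use of a single corpus size for all three probabilities, and the example counts are assumptions for illustration; this README does not specify how the pmi column was computed.

```python
import math

def pmi(count_pair, count_mod, count_head, total):
    """Pointwise mutual information from raw counts, base-2 logarithm.

    All probabilities are estimated as count / total, which is a
    simplification chosen for this sketch.
    """
    p_pair = count_pair / total
    p_mod = count_mod / total
    p_head = count_head / total
    return math.log2(p_pair / (p_mod * p_head))

# Made-up counts: the pair occurs 50 times more often than expected by chance.
example = pmi(count_pair=10, count_mod=100, count_head=200, total=100000)
```

Positive values indicate that modifier and head co-occur more often than expected if they were independent; negative values indicate the opposite.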


#### Other variables:

For the remaining variables, see the original study by Graves, Binder, & Seidenberg (2013).
