SentiWS
~~~~~~~

SentimentWortschatz, or SentiWS for short, is a publicly available German-language resource for sentiment analysis, opinion mining and related tasks. It lists positive and negative polarity-bearing words, each weighted within the interval [-1; 1], together with its part-of-speech (POS) tag and, where applicable, its base form. Words for which the polarity weighting described in the work listed under "Citation" was inconclusive were revised manually and assigned the minimum weight of their class.

This version of SentiWS (v2.0b) contains 1,644 positive and 1,827 negative base forms, which, together with their inflections, amount to 16,562 positive and 18,024 negative word forms. It contains not only adjectives and adverbs that explicitly express sentiment but also nouns and verbs that implicitly carry it.
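Because every entry carries a weight in [-1; 1], a simple way to use SentiWS is to look up each word form of a text and add up the weights of the matches. The Python sketch below illustrates this idea under several assumptions: the mini-lexicon and its weights are made up for illustration (real values would come from the SentiWS files), tokens are lower-cased runs of word characters, and unmatched tokens count as 0. The function name `polarity_score` is hypothetical and not part of SentiWS.

```python
import re

# HYPOTHETICAL mini-lexicon: lower-cased word form -> polarity weight.
# These weights are placeholders for illustration, NOT real SentiWS values.
LEXICON = {
    "gut": 0.4,
    "besser": 0.4,
    "schlecht": -0.5,
    "problem": -0.3,
}


def polarity_score(text: str, lexicon: dict) -> float:
    """Sum the weights of all tokens that appear in the lexicon.

    Tokens are lower-cased runs of Unicode word characters, so umlauts
    and ß are kept; tokens missing from the lexicon contribute 0.
    """
    tokens = re.findall(r"\w+", text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)


if __name__ == "__main__":
    score = polarity_score("Das Essen war gut, aber der Service war schlecht.", LEXICON)
    print(round(score, 2))  # negative: "schlecht" outweighs "gut" here
```

Lower-casing both the keys and the text sidesteps German noun capitalisation but merges capitalised nouns with homographic lower-case forms, and the sketch ignores negation such as "nicht gut"; treat it as a baseline rather than a complete method.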


License
~~~~~~~

SentiWS is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (http://creativecommons.org/licenses/by-nc-sa/3.0/).


Obtain a Copy
~~~~~~~~~~~~~
The latest version of SentiWS can be found at https://wortschatz.uni-leipzig.de/download/.


Data Format
~~~~~~~~~~~
SentiWS is organised in two UTF-8-encoded, tab-separated text files, one for positive and one for negative words, with one line per word form (base form or inflection).
Each line is structured as follows:

<Word> \t <Polarity weight> \t <POS tag> \t <Base form> \n

Seventeen duplicate base-form/inflection combinations that were present in older versions have been removed.
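The line format above can be read with a few lines of Python. The following is a minimal sketch, assuming one entry per line with the four tab-separated fields shown, a decimal point in the weight, and that an empty (or missing) fourth column means no base form is given; the `Entry` class and `parse_sentiws` function are hypothetical names. For the actual UTF-8-encoded files, pass `open(path, encoding="utf-8")` as the stream, where `path` stands for wherever the downloaded file is stored.

```python
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


@dataclass(frozen=True)
class Entry:
    word: str                 # the word form itself
    weight: float             # polarity weight in [-1, 1]
    pos: str                  # POS tag, kept verbatim
    base_form: Optional[str]  # None if the base-form field is empty


def parse_sentiws(stream: TextIO) -> Iterator[Entry]:
    """Yield one Entry per non-blank line of the tab-separated format."""
    for line_no, raw in enumerate(stream, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines, e.g. at the end of a file
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) == 3:
            # Assumption: a missing 4th column means "no base form given".
            fields.append("")
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        word, weight_text, pos, base_form = fields
        weight = float(weight_text)
        if not -1.0 <= weight <= 1.0:
            raise ValueError(f"line {line_no}: weight {weight} outside [-1, 1]")
        yield Entry(word, weight, pos, base_form or None)


if __name__ == "__main__":
    # HYPOTHETICAL sample lines: the weights and the "ADJX" tag are placeholders.
    sample = "gut\t0.5\tADJX\tgut\nguten\t0.5\tADJX\tgut\n"
    for entry in parse_sentiws(io.StringIO(sample)):
        print(entry)
```

Validating the weight range while parsing catches malformed lines early instead of letting out-of-range values silently skew later scores.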


Citation
~~~~~~~~

If you use SentiWS in your work, we kindly ask you to cite

R. Remus, U. Quasthoff & G. Heyer: SentiWS - a Publicly Available German-language Resource for Sentiment Analysis.
In: Proceedings of the 7th International Language Resources and Evaluation (LREC'10), 2010

or use the following BibTeX entry:

@INPROCEEDINGS{remquahey2010,
title = {SentiWS -- a Publicly Available German-language Resource for Sentiment Analysis},
booktitle = {Proceedings of the 7th International Language Resources and Evaluation (LREC'10)},
author = {Remus, R. and Quasthoff, U. and Heyer, G.},
year = {2010}
}


Version History
~~~~~~~~~~~~~~~

SentiWS does not claim to be exhaustive or error-free. It has been refined several times by adding missing words and word forms and by removing ambiguous ones. Future updates, if any, will be released as new versions.

v1.8b, 2010-05-19: First publicly available version as described in Remus et al. (2010).
v1.8c, 2012-03-21: Second publicly available version in which some POS tags were corrected.
v2.0, 2018-10-19: Third publicly available version in which the inflected forms were extended.
v2.0b, 2024-08-12: Fourth publicly available version in which an alternative data format was introduced.


Statistics
~~~~~~~~~~

                           Positive  Negative
Adjectives   Base forms         792       712
             Inflections     10,935    10,467
Adverbs      Base forms           7         4
             Inflections          5         0
Nouns        Base forms         548       688
             Inflections        735     1,158
Verbs        Base forms         297       423
             Inflections      3,243     4,572

All          Base forms       1,644     1,827
             Inflections     14,918    16,197

             Total           16,562    18,024

Table: Overview of the dictionary's content
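Counts like those in the table can, in principle, be recomputed from the data files. The following Python sketch assumes the rows have already been read into (word form, POS tag, base form) tuples from the format described under "Data Format". It treats a row as a base form when its base-form field is empty or equals the word form, which is an assumption rather than a documented rule, and it drops exact duplicate rows before counting. POS tags are grouped verbatim, so mapping them to the Adjectives/Adverbs/Nouns/Verbs labels is left to the caller; the names `count_forms` and `all_rows` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# HYPOTHETICAL minimal record: (word form, POS tag, base form or None).
Row = Tuple[str, str, Optional[str]]


def is_base_form(word: str, base_form: Optional[str]) -> bool:
    # Assumption: a row is a base form when its base-form field is empty
    # or identical to the word form; every other row is an inflection.
    return not base_form or base_form == word


def count_forms(rows: Iterable[Row]) -> Counter:
    """Count base forms and inflections per POS tag, ignoring exact duplicates."""
    counts: Counter = Counter()
    for word, pos, base_form in dict.fromkeys(rows):  # dedupe, keep order
        kind = "Base forms" if is_base_form(word, base_form) else "Inflections"
        counts[(pos, kind)] += 1
    return counts


def all_rows(counts: Counter) -> Counter:
    """Sum the per-POS counts into the 'All' rows; 'Total' is their sum."""
    summary: Counter = Counter()
    for (_pos, kind), n in counts.items():
        summary[kind] += n
    summary["Total"] = summary["Base forms"] + summary["Inflections"]
    return summary


if __name__ == "__main__":
    # HYPOTHETICAL rows; the last one is an exact duplicate and is skipped.
    rows = [
        ("gut", "ADJX", None),
        ("guten", "ADJX", "gut"),
        ("guter", "ADJX", "gut"),
        ("guten", "ADJX", "gut"),
    ]
    counts = count_forms(rows)
    print(counts)
    print(all_rows(counts))
```

Deriving "Total" from the two "All" rows, rather than counting it separately, keeps the three summary figures consistent by construction.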



SentiWS.txt was last updated on 2024-08-12.
