2023-03-31T12:08:43,151 Created temporary directory: /tmp/pip-ephem-wheel-cache-s70io6nb
2023-03-31T12:08:43,157 Created temporary directory: /tmp/pip-build-tracker-r3kq4i7t
2023-03-31T12:08:43,158 Initialized build tracking at /tmp/pip-build-tracker-r3kq4i7t
2023-03-31T12:08:43,158 Created build tracker: /tmp/pip-build-tracker-r3kq4i7t
2023-03-31T12:08:43,158 Entered build tracker: /tmp/pip-build-tracker-r3kq4i7t
2023-03-31T12:08:43,160 Created temporary directory: /tmp/pip-wheel-pbyqzit3
2023-03-31T12:08:43,169 DEPRECATION: --no-binary currently disables reading from the cache of locally built wheels. In the future --no-binary will not influence the wheel cache. pip 23.1 will enforce this behaviour change. A possible replacement is to use the --no-cache-dir option. You can use the flag --use-feature=no-binary-enable-wheel-cache to test the upcoming behaviour. Discussion can be found at https://github.com/pypa/pip/issues/11453
2023-03-31T12:08:43,176 Created temporary directory: /tmp/pip-ephem-wheel-cache-ujl371ve
2023-03-31T12:08:43,230 Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
2023-03-31T12:08:43,237 2 location(s) to search for versions of lex2sent:
2023-03-31T12:08:43,237 * https://pypi.org/simple/lex2sent/
2023-03-31T12:08:43,237 * https://www.piwheels.org/simple/lex2sent/
2023-03-31T12:08:43,238 Fetching project page and analyzing links: https://pypi.org/simple/lex2sent/
2023-03-31T12:08:43,239 Getting page https://pypi.org/simple/lex2sent/
2023-03-31T12:08:43,242 Found index url https://pypi.org/simple
2023-03-31T12:08:43,470 Fetched page https://pypi.org/simple/lex2sent/ as application/vnd.pypi.simple.v1+json
2023-03-31T12:08:43,473 Skipping link: No binaries permitted for lex2sent: https://files.pythonhosted.org/packages/9f/23/d4e07b91d2157bb1e51525b26c7ae839774e87afa1ba8a72cef73cbada8d/lex2sent-0.0.1-py3-none-any.whl (from https://pypi.org/simple/lex2sent/)
2023-03-31T12:08:43,474 Found link https://files.pythonhosted.org/packages/da/ba/1ff3079efb210af9b70660dc9d629071307ab2930bd4d49212b90d5c9ca2/lex2sent-0.0.1.tar.gz (from https://pypi.org/simple/lex2sent/), version: 0.0.1
2023-03-31T12:08:43,475 Skipping link: No binaries permitted for lex2sent: https://files.pythonhosted.org/packages/02/b5/b3b4b6f0f3439d4f9de8755ba0331a09bb96a13c57c0b552857cd44dff0c/lex2sent-0.0.2-py3-none-any.whl (from https://pypi.org/simple/lex2sent/)
2023-03-31T12:08:43,475 Found link https://files.pythonhosted.org/packages/53/03/3042faa6bec7661cccc59f3e0a544e5f4b320c6c43650dce35001031be58/lex2sent-0.0.2.tar.gz (from https://pypi.org/simple/lex2sent/), version: 0.0.2
2023-03-31T12:08:43,476 Fetching project page and analyzing links: https://www.piwheels.org/simple/lex2sent/
2023-03-31T12:08:43,477 Getting page https://www.piwheels.org/simple/lex2sent/
2023-03-31T12:08:43,479 Found index url https://www.piwheels.org/simple
2023-03-31T12:08:43,691 Fetched page https://www.piwheels.org/simple/lex2sent/ as text/html
2023-03-31T12:08:43,693 Skipping link: not a file: https://www.piwheels.org/simple/lex2sent/
2023-03-31T12:08:43,694 Skipping link: not a file: https://pypi.org/simple/lex2sent/
2023-03-31T12:08:43,727 Given no hashes to check 1 links for project 'lex2sent': discarding no candidates
2023-03-31T12:08:43,757 Collecting lex2sent==0.0.1
2023-03-31T12:08:43,762 Created temporary directory: /tmp/pip-unpack-9z2zar7u
2023-03-31T12:08:43,996 Downloading lex2sent-0.0.1.tar.gz (11 kB)
2023-03-31T12:08:44,068 Added lex2sent==0.0.1 from https://files.pythonhosted.org/packages/da/ba/1ff3079efb210af9b70660dc9d629071307ab2930bd4d49212b90d5c9ca2/lex2sent-0.0.1.tar.gz to build tracker '/tmp/pip-build-tracker-r3kq4i7t'
2023-03-31T12:08:44,071 Running setup.py (path:/tmp/pip-wheel-pbyqzit3/lex2sent_c311994db98a49ccb7d3876370a9d3eb/setup.py) egg_info for package lex2sent
2023-03-31T12:08:44,073 Created temporary directory: /tmp/pip-pip-egg-info-llo78hy4
2023-03-31T12:08:44,073 Preparing metadata (setup.py): started
2023-03-31T12:08:44,075 Running command python setup.py egg_info
2023-03-31T12:08:45,623 # Lex2Sent - A bagging approach to unsupervised Sentiment Analysis
2023-03-31T12:08:45,624 Lex2Sent is a text classification/clustering model that can be used with minimal a-priori-information to classify texts into two classes. While the [original paper](https://doi.org/10.48550/arXiv.2209.13023) used it for sentiment analysis on english documents, it is not limited to that purpose, but can be used for any arbitrary type of classification and language as long as there are lexica that can be used as an information-basis.
2023-03-31T12:08:45,625 ## Getting Started
2023-03-31T12:08:45,625 You may install this package using either pypi
2023-03-31T12:08:45,626 ```
2023-03-31T12:08:45,626 pip install lex2sent
2023-03-31T12:08:45,626 ```
2023-03-31T12:08:45,627 or GitHub
2023-03-31T12:08:45,627 ```
2023-03-31T12:08:45,628 pip install git+https://github.com/K-RLange/Lex2Sent.git
2023-03-31T12:08:45,628 ```
2023-03-31T12:08:45,629 The following is an example of using the Opinion Lexicon to classify an iMDb movie review data set. You may have to use ```nltk.download()``` to download the opinion_lexicon first.
2023-03-31T12:08:45,629 First we configure our data set
2023-03-31T12:08:45,629 ```
2023-03-31T12:08:45,629 from datasets import load_dataset
2023-03-31T12:08:45,630 from nltk.corpus import opinion_lexicon
2023-03-31T12:08:45,630 data = load_dataset('imdb')
2023-03-31T12:08:45,630 ratings, reviews = [], []
2023-03-31T12:08:45,631 for stars, text in zip(data["train"]["label"], data["train"]["text"]):
2023-03-31T12:08:45,631     if text:
2023-03-31T12:08:45,631         if stars == 0:
2023-03-31T12:08:45,632             ratings.append("negative")
2023-03-31T12:08:45,632         else:
2023-03-31T12:08:45,632             ratings.append("positive")
2023-03-31T12:08:45,633         reviews.append(text)
2023-03-31T12:08:45,633 ```
2023-03-31T12:08:45,633 And now we can start applying Lex2Sent
2023-03-31T12:08:45,634 ```
2023-03-31T12:08:45,634 from lex2sent.textClass import *
2023-03-31T12:08:45,634 lexicon = ClusterLexicon([opinion_lexicon.positive(), opinion_lexicon.negative()])
2023-03-31T12:08:45,634 rated_texts = RatedTexts(reviews, lexicon, ratings)
2023-03-31T12:08:45,635 #Basic "counting" method of classification:
2023-03-31T12:08:45,635 count_res = rated_texts.lexicon_classification_eval(label_list=["positive", "negative"])
2023-03-31T12:08:45,636 l2s_res = rated_texts.lbte(label_list=["positive", "negative"], workers=4)
2023-03-31T12:08:45,636 print("Counting accuracy: {}%; Lex2Sent accuracy: {}%".format(count_res * 100, l2s_res*100))
2023-03-31T12:08:45,636 ```
2023-03-31T12:08:45,637 yielding the result "Counting accuracy: 73.772%; Lex2Sent accuracy: 78.172%".
2023-03-31T12:08:45,637 ## Reference
2023-03-31T12:08:45,638 Please refer to ["Lex2Sent - A bagging approach to unsupervised Sentiment Analysis"](https://doi.org/10.48550/arXiv.2209.13023) when using this package. When you use this package in a publication, please cite it as
2023-03-31T12:08:45,638 ```
2023-03-31T12:08:45,638 @misc{lex2sent,
2023-03-31T12:08:45,639 title = {{Lex2Sent}: {A} bagging approach to unsupervised sentiment analysis},
2023-03-31T12:08:45,639 shorttitle = {{Lex2Sent}},
2023-03-31T12:08:45,639 publisher = {arXiv},
2023-03-31T12:08:45,639 author = {Lange, Kai-Robin and Rieger, Jonas and Jentsch, Carsten},
2023-03-31T12:08:45,640 month = sep,
2023-03-31T12:08:45,640 year = {2022},
2023-03-31T12:08:45,640 note = {arXiv:2209.13023 [cs]},
2023-03-31T12:08:45,641 keywords = {Computer Science - Computation and Language},
2023-03-31T12:08:45,641 }
2023-03-31T12:08:45,641 ```
2023-03-31T12:08:45,642 ## Future Features
2023-03-31T12:08:45,642 -Calling from the console
2023-03-31T12:08:45,643 -FastText and SentenceBERT as alternatives to Doc2Vec
2023-03-31T12:08:45,643 -Options to classify into more than two clusters
2023-03-31T12:08:45,644 running egg_info
2023-03-31T12:08:45,644 creating /tmp/pip-pip-egg-info-llo78hy4/lex2sent.egg-info
2023-03-31T12:08:45,693 writing /tmp/pip-pip-egg-info-llo78hy4/lex2sent.egg-info/PKG-INFO
2023-03-31T12:08:45,698 writing dependency_links to /tmp/pip-pip-egg-info-llo78hy4/lex2sent.egg-info/dependency_links.txt
2023-03-31T12:08:45,703 writing requirements to /tmp/pip-pip-egg-info-llo78hy4/lex2sent.egg-info/requires.txt
2023-03-31T12:08:45,705 writing top-level names to /tmp/pip-pip-egg-info-llo78hy4/lex2sent.egg-info/top_level.txt
2023-03-31T12:08:45,708 writing manifest file '/tmp/pip-pip-egg-info-llo78hy4/lex2sent.egg-info/SOURCES.txt'
2023-03-31T12:08:45,913 reading manifest file '/tmp/pip-pip-egg-info-llo78hy4/lex2sent.egg-info/SOURCES.txt'
2023-03-31T12:08:45,916 adding license file 'LICENSE'
2023-03-31T12:08:45,921 writing manifest file '/tmp/pip-pip-egg-info-llo78hy4/lex2sent.egg-info/SOURCES.txt'
2023-03-31T12:08:46,039 Preparing metadata (setup.py): finished with status 'done'
2023-03-31T12:08:46,051 Source in /tmp/pip-wheel-pbyqzit3/lex2sent_c311994db98a49ccb7d3876370a9d3eb has version 0.0.1, which satisfies requirement lex2sent==0.0.1 from https://files.pythonhosted.org/packages/da/ba/1ff3079efb210af9b70660dc9d629071307ab2930bd4d49212b90d5c9ca2/lex2sent-0.0.1.tar.gz
2023-03-31T12:08:46,053 Removed lex2sent==0.0.1 from https://files.pythonhosted.org/packages/da/ba/1ff3079efb210af9b70660dc9d629071307ab2930bd4d49212b90d5c9ca2/lex2sent-0.0.1.tar.gz from build tracker '/tmp/pip-build-tracker-r3kq4i7t'
2023-03-31T12:08:46,064 Created temporary directory: /tmp/pip-unpack-_3u3ao9y
2023-03-31T12:08:46,065 Building wheels for collected packages: lex2sent
2023-03-31T12:08:46,074 Created temporary directory: /tmp/pip-wheel-4n9z2hhj
2023-03-31T12:08:46,075 Building wheel for lex2sent (setup.py): started
2023-03-31T12:08:46,077 Destination directory: /tmp/pip-wheel-4n9z2hhj
2023-03-31T12:08:46,077 Running command python setup.py bdist_wheel
2023-03-31T12:08:48,140 running bdist_wheel
2023-03-31T12:08:48,890 running build
2023-03-31T12:08:48,891 running build_py
2023-03-31T12:08:48,965 creating build
2023-03-31T12:08:48,966 creating build/lib
2023-03-31T12:08:48,967 creating build/lib/lex2sent
2023-03-31T12:08:48,970 copying lex2sent/__init__.py -> build/lib/lex2sent
2023-03-31T12:08:48,974 copying lex2sent/Bootstrap.py -> build/lib/lex2sent
2023-03-31T12:08:48,978 copying lex2sent/textClass.py -> build/lib/lex2sent
2023-03-31T12:08:48,985 creating build/lib/lex2sent/tests
2023-03-31T12:08:48,987 copying lex2sent/tests/__init__.py -> build/lib/lex2sent/tests
2023-03-31T12:08:48,991 copying lex2sent/tests/test_RatedTexts.py -> build/lib/lex2sent/tests
2023-03-31T12:08:49,077 /usr/local/lib/python3.7/dist-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
2023-03-31T12:08:49,078 setuptools.SetuptoolsDeprecationWarning,
2023-03-31T12:08:49,148 installing to build/bdist.linux-armv7l/wheel
2023-03-31T12:08:49,149 running install
2023-03-31T12:08:49,207 running install_lib
2023-03-31T12:08:49,277 creating build/bdist.linux-armv7l
2023-03-31T12:08:49,278 creating build/bdist.linux-armv7l/wheel
2023-03-31T12:08:49,281 creating build/bdist.linux-armv7l/wheel/lex2sent
2023-03-31T12:08:49,284 creating build/bdist.linux-armv7l/wheel/lex2sent/tests
2023-03-31T12:08:49,286 copying build/lib/lex2sent/tests/__init__.py -> build/bdist.linux-armv7l/wheel/lex2sent/tests
2023-03-31T12:08:49,290 copying build/lib/lex2sent/tests/test_RatedTexts.py -> build/bdist.linux-armv7l/wheel/lex2sent/tests
2023-03-31T12:08:49,293 copying build/lib/lex2sent/__init__.py -> build/bdist.linux-armv7l/wheel/lex2sent
2023-03-31T12:08:49,297 copying build/lib/lex2sent/Bootstrap.py -> build/bdist.linux-armv7l/wheel/lex2sent
2023-03-31T12:08:49,301 copying build/lib/lex2sent/textClass.py -> build/bdist.linux-armv7l/wheel/lex2sent
2023-03-31T12:08:49,305 running install_egg_info
2023-03-31T12:08:49,471 running egg_info
2023-03-31T12:08:49,536 writing lex2sent.egg-info/PKG-INFO
2023-03-31T12:08:49,540 writing dependency_links to lex2sent.egg-info/dependency_links.txt
2023-03-31T12:08:49,544 writing requirements to lex2sent.egg-info/requires.txt
2023-03-31T12:08:49,547 writing top-level names to lex2sent.egg-info/top_level.txt
2023-03-31T12:08:49,619 reading manifest file 'lex2sent.egg-info/SOURCES.txt'
2023-03-31T12:08:49,623 adding license file 'LICENSE'
2023-03-31T12:08:49,628 writing manifest file 'lex2sent.egg-info/SOURCES.txt'
2023-03-31T12:08:49,631 Copying lex2sent.egg-info to build/bdist.linux-armv7l/wheel/lex2sent-0.0.1-py3.7.egg-info
2023-03-31T12:08:49,651 running install_scripts
2023-03-31T12:08:49,683 creating build/bdist.linux-armv7l/wheel/lex2sent-0.0.1.dist-info/WHEEL
2023-03-31T12:08:49,688 creating '/tmp/pip-wheel-4n9z2hhj/lex2sent-0.0.1-py3-none-any.whl' and adding 'build/bdist.linux-armv7l/wheel' to it
2023-03-31T12:08:49,694 adding 'lex2sent/Bootstrap.py'
2023-03-31T12:08:49,697 adding 'lex2sent/__init__.py'
2023-03-31T12:08:49,705 adding 'lex2sent/textClass.py'
2023-03-31T12:08:49,709 adding 'lex2sent/tests/__init__.py'
2023-03-31T12:08:49,712 adding 'lex2sent/tests/test_RatedTexts.py'
2023-03-31T12:08:49,717 adding 'lex2sent-0.0.1.dist-info/LICENSE'
2023-03-31T12:08:49,720 adding 'lex2sent-0.0.1.dist-info/METADATA'
2023-03-31T12:08:49,722 adding 'lex2sent-0.0.1.dist-info/WHEEL'
2023-03-31T12:08:49,724 adding 'lex2sent-0.0.1.dist-info/top_level.txt'
2023-03-31T12:08:49,726 adding 'lex2sent-0.0.1.dist-info/RECORD'
2023-03-31T12:08:49,728 removing build/bdist.linux-armv7l/wheel
2023-03-31T12:08:49,903 Building wheel for lex2sent (setup.py): finished with status 'done'
2023-03-31T12:08:49,911 Created wheel for lex2sent: filename=lex2sent-0.0.1-py3-none-any.whl size=12227 sha256=83e4cbc6e406170d0fc45b3aec9b9cb725c0770f926650a255006d6bfe493706
2023-03-31T12:08:49,913 Stored in directory: /tmp/pip-ephem-wheel-cache-ujl371ve/wheels/f8/6e/1c/68452a1d630672ae17e3b9def0e98eb05d37989415171628c1
2023-03-31T12:08:49,938 Successfully built lex2sent
2023-03-31T12:08:49,946 Removed build tracker: '/tmp/pip-build-tracker-r3kq4i7t'
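The last lines of the log record the built wheel's size (12227 bytes) and SHA-256 digest. As a minimal standard-library sketch, assuming you have a copy of the wheel on disk, both values can be recomputed and compared with what pip reported. `wheel_digest` and `matches_log` are hypothetical helpers for illustration, not part of pip.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def wheel_digest(path: Path, chunk_size: int = 65536) -> Tuple[int, str]:
    """Return (size in bytes, hex SHA-256 digest) of a file.

    Hypothetical helper: reads the file in chunks so large archives
    do not need to fit in memory.
    """
    sha = hashlib.sha256()
    size = 0
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
            size += len(chunk)
    return size, sha.hexdigest()


def matches_log(path: Path, expected_size: int, expected_sha256: str) -> bool:
    """Compare a file against the size= and sha256= values from a pip log line."""
    size, digest = wheel_digest(path)
    # hexdigest() is lower-case, so normalise the expected value first.
    return size == expected_size and digest == expected_sha256.lower()
```

For example, `matches_log(Path("lex2sent-0.0.1-py3-none-any.whl"), 12227, "83e4cbc6e406170d0fc45b3aec9b9cb725c0770f926650a255006d6bfe493706")` would check a copy of this wheel; the path is hypothetical. A wheel rebuilt from the same sdist is not guaranteed to reproduce this digest, because archive metadata such as file timestamps can differ between builds.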
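The README printed during the `egg_info` step describes `lexicon_classification_eval` only as a basic "counting" method of classification. Assuming that means giving each text the class whose lexicon matches the most of its tokens, a stand-alone sketch in plain Python could look like the following. It illustrates lexicon counting in general and is not lex2sent's implementation; `count_classify`, `accuracy`, the lower-cased whitespace tokenizer, and the tie-breaking rule are all hypothetical choices.

```python
from typing import Iterable, List, Sequence


def count_classify(texts: Sequence[str], lexica: Sequence[Iterable[str]],
                   labels: Sequence[str]) -> List[str]:
    """Hypothetical lexicon-counting classifier (not lex2sent's code).

    Each text gets the label of the lexicon with the most token hits.
    Tokenisation is lower-cased whitespace splitting; ties go to the
    first lexicon in `lexica`.
    """
    # Sets make membership checks O(1) even if the lexica arrive as lists,
    # e.g. the word lists returned by NLTK's opinion_lexicon.
    lexicon_sets = [set(lexicon) for lexicon in lexica]
    predictions = []
    for text in texts:
        tokens = text.lower().split()
        hits = [sum(token in lex for token in tokens) for lex in lexicon_sets]
        # max() returns the first index with the highest count.
        best = max(range(len(lexicon_sets)), key=lambda i: hits[i])
        predictions.append(labels[best])
    return predictions


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that equal the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    positive = {"good", "great", "wonderful"}
    negative = {"bad", "awful", "boring"}
    reviews = ["A great and wonderful film",
               "Boring plot and awful acting",
               "Good cast but a bad script and a boring ending"]
    gold = ["positive", "negative", "negative"]
    preds = count_classify(reviews, [positive, negative], ["positive", "negative"])
    print(preds)                  # ['positive', 'negative', 'negative']
    print(accuracy(preds, gold))  # 1.0
```

Splitting on whitespace leaves punctuation attached to words (`"film."` does not match `"film"`), so a run over real IMDb reviews would need a proper tokenizer.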