2019-07-23T16:35:58 Created temporary directory: /tmp/pip-ephem-wheel-cache-_6k0qwug
2019-07-23T16:35:58 Created temporary directory: /tmp/pip-req-tracker-5sdeswx3
2019-07-23T16:35:58 Created requirements tracker '/tmp/pip-req-tracker-5sdeswx3'
2019-07-23T16:35:58 Created temporary directory: /tmp/pip-wheel-xrahtn4s
2019-07-23T16:35:58 Collecting pystempel==1.0.post1
2019-07-23T16:35:58 1 location(s) to search for versions of pystempel:
2019-07-23T16:35:58 * https://pypi.org/simple/pystempel/
2019-07-23T16:35:58 Getting page https://pypi.org/simple/pystempel/
2019-07-23T16:35:58 Analyzing links from page https://pypi.org/simple/pystempel/
2019-07-23T16:35:58 Found link https://files.pythonhosted.org/packages/7a/e6/8595b6b2706e06e01be7540eb88ff59852a55571ac9290ff1a4ac807a552/pystempel-1.0.post1.tar.gz#sha256=17f572d3b75459f7b5e2d007d1816e09e2493c137b8b94a9a196662c9936f7de (from https://pypi.org/simple/pystempel/) (requires-python:>=3.7), version: 1.0.post1
2019-07-23T16:35:58 Found link https://files.pythonhosted.org/packages/be/cd/fe9ab8760ff402b923841c5843f742052666508a5b84c4fc6a2e0b67dfdd/pystempel-1.0.1.tar.gz#sha256=787199b74ef4e9353aa538c66f5237286d27eca2c111ae7e36b04c90f463b428 (from https://pypi.org/simple/pystempel/) (requires-python:>=3.7), version: 1.0.1
2019-07-23T16:35:58 Using version 1.0.post1 (newest of versions: 1.0.post1)
2019-07-23T16:35:58 Created temporary directory: /tmp/pip-unpack-8eqgsokc
2019-07-23T16:35:58 Downloading https://files.pythonhosted.org/packages/7a/e6/8595b6b2706e06e01be7540eb88ff59852a55571ac9290ff1a4ac807a552/pystempel-1.0.post1.tar.gz (428kB)
2019-07-23T16:35:58 Downloading from URL https://files.pythonhosted.org/packages/7a/e6/8595b6b2706e06e01be7540eb88ff59852a55571ac9290ff1a4ac807a552/pystempel-1.0.post1.tar.gz#sha256=17f572d3b75459f7b5e2d007d1816e09e2493c137b8b94a9a196662c9936f7de (from https://pypi.org/simple/pystempel/) (requires-python:>=3.7)
2019-07-23T16:35:58 Added pystempel==1.0.post1 from https://files.pythonhosted.org/packages/7a/e6/8595b6b2706e06e01be7540eb88ff59852a55571ac9290ff1a4ac807a552/pystempel-1.0.post1.tar.gz#sha256=17f572d3b75459f7b5e2d007d1816e09e2493c137b8b94a9a196662c9936f7de to build tracker '/tmp/pip-req-tracker-5sdeswx3'
2019-07-23T16:35:58 Running setup.py (path:/tmp/pip-wheel-xrahtn4s/pystempel/setup.py) egg_info for package pystempel
2019-07-23T16:35:58 Running command python setup.py egg_info
2019-07-23T16:36:00 Stempel Stemmer
2019-07-23T16:36:00 ===============
2019-07-23T16:36:00 Python port of Stempel, an algorithmic stemmer for Polish language, originally written in Java.
2019-07-23T16:36:00 The original stemmer has been implemented as part of `Egothor Project`_, taken virtually unchanged to
2019-07-23T16:36:00 `Stempel Stemmer Java library`_ by Andrzej Białecki and next included as part of `Apache Lucene`_,
2019-07-23T16:36:00 a free and open-source search engine library.
2019-07-23T16:36:00 .. _Egothor Project: https://www.egothor.org/product/egothor2/
2019-07-23T16:36:00 .. _Stempel Stemmer Java library: http://www.getopt.org/stempel/index.html
2019-07-23T16:36:00 .. _Apache Lucene: https://lucene.apache.org/core/3_1_0/api/contrib-stempel/index.html
2019-07-23T16:36:00 This package includes also high-quality stemming table for Polish with 20,000 training sets,
2019-07-23T16:36:00 pretrained by Andrzej Białecki.
2019-07-23T16:36:00 The port does not include code for compiling stemming tables.
2019-07-23T16:36:00 .. _sjp.pl: https://sjp.pl/slownik/en/
2019-07-23T16:36:00 How to use
2019-07-23T16:36:00 ----------
2019-07-23T16:36:00 Install in your local environment:
2019-07-23T16:36:00 .. code:: console
2019-07-23T16:36:00 pip install pystempel
2019-07-23T16:36:00 Use in your code:
2019-07-23T16:36:00 .. code:: python
2019-07-23T16:36:00 >>> from stempel import StempelStemmer
2019-07-23T16:36:00 >>> stemmer = StempelStemmer.default()
2019-07-23T16:36:00 >>> for word in ['książki', 'książki', 'książkami', 'książkowa', 'książkowymi']:
2019-07-23T16:36:00 ... print(stemmer.stem(word))
2019-07-23T16:36:00 ...
2019-07-23T16:36:00 książek
2019-07-23T16:36:00 książek
2019-07-23T16:36:00 książek
2019-07-23T16:36:00 książkowy
2019-07-23T16:36:00 książkowy
2019-07-23T16:36:00 Choosing between port and wrapper
2019-07-23T16:36:00 ---------------------------------
2019-07-23T16:36:00 If you work on an NLP project in Python you can choose between Python port and Python wrapper.
2019-07-23T16:36:00 Python port is what pystempel tries to achieve: translation from Java implementation to Python.
2019-07-23T16:36:00 Python wrapper is what I used in `tests`_: Python functions to call the original Java implementation of
2019-07-23T16:36:00 stemmer. You can find more about wrappers and ports in `Stackoverflow comparision post`_. Here, I
2019-07-23T16:36:00 compare both approaches to help you decide:
2019-07-23T16:36:00 * **Same accuracy**. I have verified Python port by comparing its output
2019-07-23T16:36:00 with output of original Java implementation for 331224 words from Free Polish dictionary
2019-07-23T16:36:00 (`sjp.pl`_) and for 100% of words it returns same output.
2019-07-23T16:36:00 * **Similar performance**. For mentioned dataset both stemmer versions achieved comparable performance.
2019-07-23T16:36:00 Python port completed stemming in 4.4 seconds, while Python wrapper -- in 5 seconds (Intel Core
2019-07-23T16:36:00 i5-6000 3.30 GHz, 16GB RAM, Windows 10, OpenJDK)
2019-07-23T16:36:00 * **Different setup**. Python wrapper requires additionally installation of Cython and pyjnius.
2019-07-23T16:36:00 Python wrapper will make also `debugging harder`_ (switching between two programming languages).
2019-07-23T16:36:00 .. _Stackoverflow comparision post: https://stackoverflow.com/questions/10113218/how-to-decide-when-to-wrap-port-write-from-scratch
2019-07-23T16:36:00 .. _debugging harder: https://stackoverflow.com/questions/6970359/find-an-efficient-way-to-integrate-different-language-libraries-into-one-project
2019-07-23T16:36:00 .. _tests: tests/
2019-07-23T16:36:00 Development setup
2019-07-23T16:36:00 -----------------
2019-07-23T16:36:00 To setup environment for development you will need `Anaconda`_ installed.
2019-07-23T16:36:00 .. _Anaconda: https://anaconda.org/
2019-07-23T16:36:00 .. code:: console
2019-07-23T16:36:00 conda create -n stempel-stemmer
2019-07-23T16:36:00 conda activate stempel-stemmer
2019-07-23T16:36:00 conda install -c conda-forge --file requirements.txt
2019-07-23T16:36:00 To run tests:
2019-07-23T16:36:00 .. code:: console
2019-07-23T16:36:00 curl https://repo1.maven.org/maven2/org/apache/lucene/lucene-analyzers-stempel/8.1.1/lucene-analyzers-stempel-8.1.1.jar > stempel-8.1.1.jar
2019-07-23T16:36:00 python -m pytest ./
2019-07-23T16:36:00 To run benchmark:
2019-07-23T16:36:00 .. code:: console
2019-07-23T16:36:00 python tests\test_benchmark.py
2019-07-23T16:36:00 Licensing
2019-07-23T16:36:00 ------------------
2019-07-23T16:36:00 Most of the code is covered by `Egothor Open Source License`_, an Apache-style license. The rest of
2019-07-23T16:36:00 the code and pretrained stemming table are covered by the `Apache License 2.0`_. Unit tests use the
2019-07-23T16:36:00 Free Polish dictionary for use in spell-checking from `sjp.pl`_ , covered by `Apache License 2.0`_
2019-07-23T16:36:00 as well.
2019-07-23T16:36:00 .. _Egothor Open Source License: https://www.egothor.org/product/egothor2/
2019-07-23T16:36:00 .. _Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0
2019-07-23T16:36:00 Other languages
2019-07-23T16:36:00 ------------------
2019-07-23T16:36:00 * `Estem`_ is Erlang wrapper (not port) for Stempel stemmer.
2019-07-23T16:36:00 .. _Estem: https://github.com/arcusfelis/estem
2019-07-23T16:36:00 running egg_info
2019-07-23T16:36:00 creating pip-egg-info/pystempel.egg-info
2019-07-23T16:36:00 writing pip-egg-info/pystempel.egg-info/PKG-INFO
2019-07-23T16:36:00 writing dependency_links to pip-egg-info/pystempel.egg-info/dependency_links.txt
2019-07-23T16:36:00 writing requirements to pip-egg-info/pystempel.egg-info/requires.txt
2019-07-23T16:36:00 writing top-level names to pip-egg-info/pystempel.egg-info/top_level.txt
2019-07-23T16:36:00 writing manifest file 'pip-egg-info/pystempel.egg-info/SOURCES.txt'
2019-07-23T16:36:00 reading manifest file 'pip-egg-info/pystempel.egg-info/SOURCES.txt'
2019-07-23T16:36:00 writing manifest file 'pip-egg-info/pystempel.egg-info/SOURCES.txt'
2019-07-23T16:36:00 Source in /tmp/pip-wheel-xrahtn4s/pystempel has version 1.0.post1, which satisfies requirement pystempel==1.0.post1 from https://files.pythonhosted.org/packages/7a/e6/8595b6b2706e06e01be7540eb88ff59852a55571ac9290ff1a4ac807a552/pystempel-1.0.post1.tar.gz#sha256=17f572d3b75459f7b5e2d007d1816e09e2493c137b8b94a9a196662c9936f7de
2019-07-23T16:36:00 Removed pystempel==1.0.post1 from https://files.pythonhosted.org/packages/7a/e6/8595b6b2706e06e01be7540eb88ff59852a55571ac9290ff1a4ac807a552/pystempel-1.0.post1.tar.gz#sha256=17f572d3b75459f7b5e2d007d1816e09e2493c137b8b94a9a196662c9936f7de from build tracker '/tmp/pip-req-tracker-5sdeswx3'
2019-07-23T16:36:00 Building wheels for collected packages: pystempel
2019-07-23T16:36:00 Created temporary directory: /tmp/pip-wheel-2zevnna7
2019-07-23T16:36:00 Building wheel for pystempel (setup.py): started
2019-07-23T16:36:00 Destination directory: /tmp/pip-wheel-2zevnna7
2019-07-23T16:36:00 Running command /usr/bin/python3 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-wheel-xrahtn4s/pystempel/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-2zevnna7
2019-07-23T16:36:01 Stempel Stemmer
2019-07-23T16:36:01 ===============
2019-07-23T16:36:01 Python port of Stempel, an algorithmic stemmer for Polish language, originally written in Java.
2019-07-23T16:36:01 The original stemmer has been implemented as part of `Egothor Project`_, taken virtually unchanged to
2019-07-23T16:36:01 `Stempel Stemmer Java library`_ by Andrzej Białecki and next included as part of `Apache Lucene`_,
2019-07-23T16:36:01 a free and open-source search engine library.
2019-07-23T16:36:01 .. _Egothor Project: https://www.egothor.org/product/egothor2/
2019-07-23T16:36:01 .. _Stempel Stemmer Java library: http://www.getopt.org/stempel/index.html
2019-07-23T16:36:01 .. _Apache Lucene: https://lucene.apache.org/core/3_1_0/api/contrib-stempel/index.html
2019-07-23T16:36:01 This package includes also high-quality stemming table for Polish with 20,000 training sets,
2019-07-23T16:36:01 pretrained by Andrzej Białecki.
2019-07-23T16:36:01 The port does not include code for compiling stemming tables.
2019-07-23T16:36:01 .. _sjp.pl: https://sjp.pl/slownik/en/
2019-07-23T16:36:01 How to use
2019-07-23T16:36:01 ----------
2019-07-23T16:36:01 Install in your local environment:
2019-07-23T16:36:01 .. code:: console
2019-07-23T16:36:02 pip install pystempel
2019-07-23T16:36:02 Use in your code:
2019-07-23T16:36:02 .. code:: python
2019-07-23T16:36:02 >>> from stempel import StempelStemmer
2019-07-23T16:36:02 >>> stemmer = StempelStemmer.default()
2019-07-23T16:36:02 >>> for word in ['książki', 'książki', 'książkami', 'książkowa', 'książkowymi']:
2019-07-23T16:36:02 ... print(stemmer.stem(word))
2019-07-23T16:36:02 ...
2019-07-23T16:36:02 książek
2019-07-23T16:36:02 książek
2019-07-23T16:36:02 książek
2019-07-23T16:36:02 książkowy
2019-07-23T16:36:02 książkowy
2019-07-23T16:36:02 Choosing between port and wrapper
2019-07-23T16:36:02 ---------------------------------
2019-07-23T16:36:02 If you work on an NLP project in Python you can choose between Python port and Python wrapper.
2019-07-23T16:36:02 Python port is what pystempel tries to achieve: translation from Java implementation to Python.
2019-07-23T16:36:02 Python wrapper is what I used in `tests`_: Python functions to call the original Java implementation of
2019-07-23T16:36:02 stemmer. You can find more about wrappers and ports in `Stackoverflow comparision post`_. Here, I
2019-07-23T16:36:02 compare both approaches to help you decide:
2019-07-23T16:36:02 * **Same accuracy**. I have verified Python port by comparing its output
2019-07-23T16:36:02 with output of original Java implementation for 331224 words from Free Polish dictionary
2019-07-23T16:36:02 (`sjp.pl`_) and for 100% of words it returns same output.
2019-07-23T16:36:02 * **Similar performance**. For mentioned dataset both stemmer versions achieved comparable performance.
2019-07-23T16:36:02 Python port completed stemming in 4.4 seconds, while Python wrapper -- in 5 seconds (Intel Core
2019-07-23T16:36:02 i5-6000 3.30 GHz, 16GB RAM, Windows 10, OpenJDK)
2019-07-23T16:36:02 * **Different setup**. Python wrapper requires additionally installation of Cython and pyjnius.
2019-07-23T16:36:02 Python wrapper will make also `debugging harder`_ (switching between two programming languages).
2019-07-23T16:36:02 .. _Stackoverflow comparision post: https://stackoverflow.com/questions/10113218/how-to-decide-when-to-wrap-port-write-from-scratch
2019-07-23T16:36:02 .. _debugging harder: https://stackoverflow.com/questions/6970359/find-an-efficient-way-to-integrate-different-language-libraries-into-one-project
2019-07-23T16:36:02 .. _tests: tests/
2019-07-23T16:36:02 Development setup
2019-07-23T16:36:02 -----------------
2019-07-23T16:36:02 To setup environment for development you will need `Anaconda`_ installed.
2019-07-23T16:36:02 .. _Anaconda: https://anaconda.org/
2019-07-23T16:36:02 .. code:: console
2019-07-23T16:36:02 conda create -n stempel-stemmer
2019-07-23T16:36:02 conda activate stempel-stemmer
2019-07-23T16:36:02 conda install -c conda-forge --file requirements.txt
2019-07-23T16:36:02 To run tests:
2019-07-23T16:36:02 .. code:: console
2019-07-23T16:36:02 curl https://repo1.maven.org/maven2/org/apache/lucene/lucene-analyzers-stempel/8.1.1/lucene-analyzers-stempel-8.1.1.jar > stempel-8.1.1.jar
2019-07-23T16:36:02 python -m pytest ./
2019-07-23T16:36:02 To run benchmark:
2019-07-23T16:36:02 .. code:: console
2019-07-23T16:36:02 python tests\test_benchmark.py
2019-07-23T16:36:02 Licensing
2019-07-23T16:36:02 ------------------
2019-07-23T16:36:02 Most of the code is covered by `Egothor Open Source License`_, an Apache-style license. The rest of
2019-07-23T16:36:02 the code and pretrained stemming table are covered by the `Apache License 2.0`_. Unit tests use the
2019-07-23T16:36:02 Free Polish dictionary for use in spell-checking from `sjp.pl`_ , covered by `Apache License 2.0`_
2019-07-23T16:36:02 as well.
2019-07-23T16:36:02 .. _Egothor Open Source License: https://www.egothor.org/product/egothor2/
2019-07-23T16:36:02 .. _Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0
2019-07-23T16:36:02 Other languages
2019-07-23T16:36:02 ------------------
2019-07-23T16:36:02 * `Estem`_ is Erlang wrapper (not port) for Stempel stemmer.
2019-07-23T16:36:02 .. _Estem: https://github.com/arcusfelis/estem
2019-07-23T16:36:02 running bdist_wheel
2019-07-23T16:36:02 running build
2019-07-23T16:36:02 running build_py
2019-07-23T16:36:02 creating build
2019-07-23T16:36:02 creating build/lib
2019-07-23T16:36:02 creating build/lib/stempel
2019-07-23T16:36:02 copying stempel/egothor.py -> build/lib/stempel
2019-07-23T16:36:02 copying stempel/__init__.py -> build/lib/stempel
2019-07-23T16:36:02 copying stempel/streams.py -> build/lib/stempel
2019-07-23T16:36:02 copying stempel/stemmer_20000.tbl -> build/lib/stempel
2019-07-23T16:36:02 installing to build/bdist.linux-armv7l/wheel
2019-07-23T16:36:02 running install
2019-07-23T16:36:02 running install_lib
2019-07-23T16:36:02 creating build/bdist.linux-armv7l
2019-07-23T16:36:02 creating build/bdist.linux-armv7l/wheel
2019-07-23T16:36:02 creating build/bdist.linux-armv7l/wheel/stempel
2019-07-23T16:36:02 copying build/lib/stempel/egothor.py -> build/bdist.linux-armv7l/wheel/stempel
2019-07-23T16:36:02 copying build/lib/stempel/__init__.py -> build/bdist.linux-armv7l/wheel/stempel
2019-07-23T16:36:02 copying build/lib/stempel/streams.py -> build/bdist.linux-armv7l/wheel/stempel
2019-07-23T16:36:02 copying build/lib/stempel/stemmer_20000.tbl -> build/bdist.linux-armv7l/wheel/stempel
2019-07-23T16:36:02 running install_egg_info
2019-07-23T16:36:02 running egg_info
2019-07-23T16:36:02 writing pystempel.egg-info/PKG-INFO
2019-07-23T16:36:02 writing dependency_links to pystempel.egg-info/dependency_links.txt
2019-07-23T16:36:02 writing requirements to pystempel.egg-info/requires.txt
2019-07-23T16:36:02 writing top-level names to pystempel.egg-info/top_level.txt
2019-07-23T16:36:02 reading manifest file 'pystempel.egg-info/SOURCES.txt'
2019-07-23T16:36:02 writing manifest file 'pystempel.egg-info/SOURCES.txt'
2019-07-23T16:36:02 Copying pystempel.egg-info to build/bdist.linux-armv7l/wheel/pystempel-1.0.post1-py3.7.egg-info
2019-07-23T16:36:02 running install_scripts
2019-07-23T16:36:02 creating build/bdist.linux-armv7l/wheel/pystempel-1.0.post1.dist-info/WHEEL
2019-07-23T16:36:02 creating '/tmp/pip-wheel-2zevnna7/pystempel-1.0.post1-py3-none-any.whl' and adding 'build/bdist.linux-armv7l/wheel' to it
2019-07-23T16:36:02 adding 'stempel/__init__.py'
2019-07-23T16:36:02 adding 'stempel/egothor.py'
2019-07-23T16:36:03 adding 'stempel/stemmer_20000.tbl'
2019-07-23T16:36:03 adding 'stempel/streams.py'
2019-07-23T16:36:03 adding 'pystempel-1.0.post1.dist-info/METADATA'
2019-07-23T16:36:03 adding 'pystempel-1.0.post1.dist-info/WHEEL'
2019-07-23T16:36:03 adding 'pystempel-1.0.post1.dist-info/top_level.txt'
2019-07-23T16:36:03 adding 'pystempel-1.0.post1.dist-info/RECORD'
2019-07-23T16:36:03 removing build/bdist.linux-armv7l/wheel
2019-07-23T16:36:03 Building wheel for pystempel (setup.py): finished with status 'done'
2019-07-23T16:36:03 Stored in directory: /tmp/tmpqq3mchrk
2019-07-23T16:36:03 Successfully built pystempel
2019-07-23T16:36:03 Cleaning up...
2019-07-23T16:36:03 Removing source in /tmp/pip-wheel-xrahtn4s/pystempel
2019-07-23T16:36:03 Removed build tracker '/tmp/pip-req-tracker-5sdeswx3'