2019-07-23T16:39:23 Created temporary directory: /tmp/pip-ephem-wheel-cache-cwikmas2
2019-07-23T16:39:23 Created temporary directory: /tmp/pip-req-tracker-taf0zudk
2019-07-23T16:39:23 Created requirements tracker '/tmp/pip-req-tracker-taf0zudk'
2019-07-23T16:39:23 Created temporary directory: /tmp/pip-wheel-1t5ky9dr
2019-07-23T16:39:23 Collecting pystempel==1.0.1
2019-07-23T16:39:23 1 location(s) to search for versions of pystempel:
2019-07-23T16:39:23 * https://pypi.org/simple/pystempel/
2019-07-23T16:39:23 Getting page https://pypi.org/simple/pystempel/
2019-07-23T16:39:23 Analyzing links from page https://pypi.org/simple/pystempel/
2019-07-23T16:39:23 Found link https://files.pythonhosted.org/packages/7a/e6/8595b6b2706e06e01be7540eb88ff59852a55571ac9290ff1a4ac807a552/pystempel-1.0.post1.tar.gz#sha256=17f572d3b75459f7b5e2d007d1816e09e2493c137b8b94a9a196662c9936f7de (from https://pypi.org/simple/pystempel/) (requires-python:>=3.7), version: 1.0.post1
2019-07-23T16:39:23 Found link https://files.pythonhosted.org/packages/be/cd/fe9ab8760ff402b923841c5843f742052666508a5b84c4fc6a2e0b67dfdd/pystempel-1.0.1.tar.gz#sha256=787199b74ef4e9353aa538c66f5237286d27eca2c111ae7e36b04c90f463b428 (from https://pypi.org/simple/pystempel/) (requires-python:>=3.7), version: 1.0.1
2019-07-23T16:39:23 Using version 1.0.1 (newest of versions: 1.0.1)
2019-07-23T16:39:23 Created temporary directory: /tmp/pip-unpack-54r2rs90
2019-07-23T16:39:24 Downloading https://files.pythonhosted.org/packages/be/cd/fe9ab8760ff402b923841c5843f742052666508a5b84c4fc6a2e0b67dfdd/pystempel-1.0.1.tar.gz (428kB)
2019-07-23T16:39:24 Downloading from URL https://files.pythonhosted.org/packages/be/cd/fe9ab8760ff402b923841c5843f742052666508a5b84c4fc6a2e0b67dfdd/pystempel-1.0.1.tar.gz#sha256=787199b74ef4e9353aa538c66f5237286d27eca2c111ae7e36b04c90f463b428 (from https://pypi.org/simple/pystempel/) (requires-python:>=3.7)
2019-07-23T16:39:24 Added pystempel==1.0.1 from https://files.pythonhosted.org/packages/be/cd/fe9ab8760ff402b923841c5843f742052666508a5b84c4fc6a2e0b67dfdd/pystempel-1.0.1.tar.gz#sha256=787199b74ef4e9353aa538c66f5237286d27eca2c111ae7e36b04c90f463b428 to build tracker '/tmp/pip-req-tracker-taf0zudk'
2019-07-23T16:39:24 Running setup.py (path:/tmp/pip-wheel-1t5ky9dr/pystempel/setup.py) egg_info for package pystempel
2019-07-23T16:39:24 Running command python setup.py egg_info
2019-07-23T16:39:25 Stempel Stemmer
2019-07-23T16:39:25 ===============
2019-07-23T16:39:25 Python port of Stempel, an algorithmic stemmer for Polish language, originally written in Java.
2019-07-23T16:39:25 The original stemmer has been implemented as part of `Egothor Project`_, taken virtually unchanged to
2019-07-23T16:39:25 `Stempel Stemmer Java library`_ by Andrzej Białecki and next included as part of `Apache Lucene`_,
2019-07-23T16:39:25 a free and open-source search engine library.
2019-07-23T16:39:25 .. _Egothor Project: https://www.egothor.org/product/egothor2/
2019-07-23T16:39:25 .. _Stempel Stemmer Java library: http://www.getopt.org/stempel/index.html
2019-07-23T16:39:25 .. _Apache Lucene: https://lucene.apache.org/core/3_1_0/api/contrib-stempel/index.html
2019-07-23T16:39:25 This package includes also high-quality stemming table for Polish with 20,000 training sets,
2019-07-23T16:39:25 pretrained by Andrzej Białecki.
2019-07-23T16:39:25 The port does not include code for compiling stemming tables.
2019-07-23T16:39:25 .. _sjp.pl: https://sjp.pl/slownik/en/
2019-07-23T16:39:25 How to use
2019-07-23T16:39:25 ----------
2019-07-23T16:39:25 Install in your local environment:
2019-07-23T16:39:25 .. code:: console
2019-07-23T16:39:25 pip install pystempel
2019-07-23T16:39:25 Use in your code:
2019-07-23T16:39:25 .. code:: python
2019-07-23T16:39:25 >>> from stempel import StempelStemmer
2019-07-23T16:39:25 >>> stemmer = StempelStemmer.default()
2019-07-23T16:39:25 >>> for word in ['książki', 'książki', 'książkami', 'książkowa', 'książkowymi']:
2019-07-23T16:39:25 ... print(stemmer.stem(word))
2019-07-23T16:39:25 ...
2019-07-23T16:39:25 książek
2019-07-23T16:39:25 książek
2019-07-23T16:39:25 książek
2019-07-23T16:39:25 książkowy
2019-07-23T16:39:25 książkowy
2019-07-23T16:39:25 Choosing between port and wrapper
2019-07-23T16:39:25 ---------------------------------
2019-07-23T16:39:25 If you work on an NLP project in Python you can choose between Python port and Python wrapper.
2019-07-23T16:39:25 Python port is what pystempel tries to achieve: translation from Java implementation to Python.
2019-07-23T16:39:25 Python wrapper is what I used in `tests`_: Python functions to call the original Java implementation of
2019-07-23T16:39:25 stemmer. You can find more about wrappers and ports in `Stackoverflow comparision post`_. Here, I
2019-07-23T16:39:25 compare both approaches to help you decide:
2019-07-23T16:39:25 * **Same accuracy**. I have verified Python port by comparing its output
2019-07-23T16:39:25 with output of original Java implementation for 331224 words from Free Polish dictionary
2019-07-23T16:39:25 (`sjp.pl`_) and for 100% of words it returns same output.
2019-07-23T16:39:25 * **Similar performance**. For mentioned dataset both stemmer versions achieved comparable performance.
2019-07-23T16:39:25 Python port completed stemming in 4.4 seconds, while Python wrapper -- in 5 seconds (Intel Core
2019-07-23T16:39:25 i5-6000 3.30 GHz, 16GB RAM, Windows 10, OpenJDK)
2019-07-23T16:39:25 * **Different setup**. Python wrapper requires additionally installation of Cython and pyjnius.
2019-07-23T16:39:25 Python wrapper will make also `debugging harder`_ (switching between two programming languages).
2019-07-23T16:39:25 .. _Stackoverflow comparision post: https://stackoverflow.com/questions/10113218/how-to-decide-when-to-wrap-port-write-from-scratch
2019-07-23T16:39:25 .. _debugging harder: https://stackoverflow.com/questions/6970359/find-an-efficient-way-to-integrate-different-language-libraries-into-one-project
2019-07-23T16:39:25 .. _tests: tests/
2019-07-23T16:39:25 Development setup
2019-07-23T16:39:25 -----------------
2019-07-23T16:39:25 To setup environment for development you will need `Anaconda`_ installed.
2019-07-23T16:39:25 .. _Anaconda: https://anaconda.org/
2019-07-23T16:39:25 .. code:: console
2019-07-23T16:39:25 conda create -n stempel-stemmer
2019-07-23T16:39:25 conda activate stempel-stemmer
2019-07-23T16:39:25 conda install -c conda-forge --file requirements.txt
2019-07-23T16:39:25 To run tests:
2019-07-23T16:39:25 .. code:: console
2019-07-23T16:39:25 curl https://repo1.maven.org/maven2/org/apache/lucene/lucene-analyzers-stempel/8.1.1/lucene-analyzers-stempel-8.1.1.jar > stempel-8.1.1.jar
2019-07-23T16:39:25 python -m pytest ./
2019-07-23T16:39:25 To run benchmark:
2019-07-23T16:39:25 .. code:: console
2019-07-23T16:39:25 python tests\test_benchmark.py
2019-07-23T16:39:25 Licensing
2019-07-23T16:39:25 ------------------
2019-07-23T16:39:25 Most of the code is covered by `Egothor Open Source License`_, an Apache-style license. The rest of
2019-07-23T16:39:25 the code and pretrained stemming table are covered by the `Apache License 2.0`_. Unit tests use the
2019-07-23T16:39:25 Free Polish dictionary for use in spell-checking from `sjp.pl`_ , covered by `Apache License 2.0`_
2019-07-23T16:39:25 as well.
2019-07-23T16:39:25 .. _Egothor Open Source License: https://www.egothor.org/product/egothor2/
2019-07-23T16:39:25 .. _Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0
2019-07-23T16:39:25 Other languages
2019-07-23T16:39:25 ------------------
2019-07-23T16:39:25 * `Estem`_ is Erlang wrapper (not port) for Stempel stemmer.
2019-07-23T16:39:25 .. _Estem: https://github.com/arcusfelis/estem
2019-07-23T16:39:25 running egg_info
2019-07-23T16:39:25 creating pip-egg-info/pystempel.egg-info
2019-07-23T16:39:25 writing pip-egg-info/pystempel.egg-info/PKG-INFO
2019-07-23T16:39:25 writing dependency_links to pip-egg-info/pystempel.egg-info/dependency_links.txt
2019-07-23T16:39:25 writing requirements to pip-egg-info/pystempel.egg-info/requires.txt
2019-07-23T16:39:25 writing top-level names to pip-egg-info/pystempel.egg-info/top_level.txt
2019-07-23T16:39:26 writing manifest file 'pip-egg-info/pystempel.egg-info/SOURCES.txt'
2019-07-23T16:39:26 reading manifest file 'pip-egg-info/pystempel.egg-info/SOURCES.txt'
2019-07-23T16:39:26 writing manifest file 'pip-egg-info/pystempel.egg-info/SOURCES.txt'
2019-07-23T16:39:26 Source in /tmp/pip-wheel-1t5ky9dr/pystempel has version 1.0.1, which satisfies requirement pystempel==1.0.1 from https://files.pythonhosted.org/packages/be/cd/fe9ab8760ff402b923841c5843f742052666508a5b84c4fc6a2e0b67dfdd/pystempel-1.0.1.tar.gz#sha256=787199b74ef4e9353aa538c66f5237286d27eca2c111ae7e36b04c90f463b428
2019-07-23T16:39:26 Removed pystempel==1.0.1 from https://files.pythonhosted.org/packages/be/cd/fe9ab8760ff402b923841c5843f742052666508a5b84c4fc6a2e0b67dfdd/pystempel-1.0.1.tar.gz#sha256=787199b74ef4e9353aa538c66f5237286d27eca2c111ae7e36b04c90f463b428 from build tracker '/tmp/pip-req-tracker-taf0zudk'
2019-07-23T16:39:26 Building wheels for collected packages: pystempel
2019-07-23T16:39:26 Created temporary directory: /tmp/pip-wheel-172_1wel
2019-07-23T16:39:26 Building wheel for pystempel (setup.py): started
2019-07-23T16:39:26 Destination directory: /tmp/pip-wheel-172_1wel
2019-07-23T16:39:26 Running command /usr/bin/python3 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-wheel-1t5ky9dr/pystempel/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-172_1wel
2019-07-23T16:39:27 Stempel Stemmer
2019-07-23T16:39:27 ===============
2019-07-23T16:39:27 Python port of Stempel, an algorithmic stemmer for Polish language, originally written in Java.
2019-07-23T16:39:27 The original stemmer has been implemented as part of `Egothor Project`_, taken virtually unchanged to
2019-07-23T16:39:27 `Stempel Stemmer Java library`_ by Andrzej Białecki and next included as part of `Apache Lucene`_,
2019-07-23T16:39:27 a free and open-source search engine library.
2019-07-23T16:39:27 .. _Egothor Project: https://www.egothor.org/product/egothor2/
2019-07-23T16:39:27 .. _Stempel Stemmer Java library: http://www.getopt.org/stempel/index.html
2019-07-23T16:39:27 .. _Apache Lucene: https://lucene.apache.org/core/3_1_0/api/contrib-stempel/index.html
2019-07-23T16:39:27 This package includes also high-quality stemming table for Polish with 20,000 training sets,
2019-07-23T16:39:27 pretrained by Andrzej Białecki.
2019-07-23T16:39:27 The port does not include code for compiling stemming tables.
2019-07-23T16:39:27 .. _sjp.pl: https://sjp.pl/slownik/en/
2019-07-23T16:39:27 How to use
2019-07-23T16:39:27 ----------
2019-07-23T16:39:27 Install in your local environment:
2019-07-23T16:39:27 .. code:: console
2019-07-23T16:39:27 pip install pystempel
2019-07-23T16:39:27 Use in your code:
2019-07-23T16:39:27 .. code:: python
2019-07-23T16:39:27 >>> from stempel import StempelStemmer
2019-07-23T16:39:27 >>> stemmer = StempelStemmer.default()
2019-07-23T16:39:27 >>> for word in ['książki', 'książki', 'książkami', 'książkowa', 'książkowymi']:
2019-07-23T16:39:27 ... print(stemmer.stem(word))
2019-07-23T16:39:27 ...
2019-07-23T16:39:27 książek
2019-07-23T16:39:27 książek
2019-07-23T16:39:27 książek
2019-07-23T16:39:27 książkowy
2019-07-23T16:39:27 książkowy
2019-07-23T16:39:27 Choosing between port and wrapper
2019-07-23T16:39:27 ---------------------------------
2019-07-23T16:39:27 If you work on an NLP project in Python you can choose between Python port and Python wrapper.
2019-07-23T16:39:27 Python port is what pystempel tries to achieve: translation from Java implementation to Python.
2019-07-23T16:39:27 Python wrapper is what I used in `tests`_: Python functions to call the original Java implementation of
2019-07-23T16:39:27 stemmer. You can find more about wrappers and ports in `Stackoverflow comparision post`_. Here, I
2019-07-23T16:39:27 compare both approaches to help you decide:
2019-07-23T16:39:27 * **Same accuracy**. I have verified Python port by comparing its output
2019-07-23T16:39:27 with output of original Java implementation for 331224 words from Free Polish dictionary
2019-07-23T16:39:27 (`sjp.pl`_) and for 100% of words it returns same output.
2019-07-23T16:39:27 * **Similar performance**. For mentioned dataset both stemmer versions achieved comparable performance.
2019-07-23T16:39:27 Python port completed stemming in 4.4 seconds, while Python wrapper -- in 5 seconds (Intel Core
2019-07-23T16:39:27 i5-6000 3.30 GHz, 16GB RAM, Windows 10, OpenJDK)
2019-07-23T16:39:27 * **Different setup**. Python wrapper requires additionally installation of Cython and pyjnius.
2019-07-23T16:39:27 Python wrapper will make also `debugging harder`_ (switching between two programming languages).
2019-07-23T16:39:27 .. _Stackoverflow comparision post: https://stackoverflow.com/questions/10113218/how-to-decide-when-to-wrap-port-write-from-scratch
2019-07-23T16:39:27 .. _debugging harder: https://stackoverflow.com/questions/6970359/find-an-efficient-way-to-integrate-different-language-libraries-into-one-project
2019-07-23T16:39:27 .. _tests: tests/
2019-07-23T16:39:27 Development setup
2019-07-23T16:39:27 -----------------
2019-07-23T16:39:27 To setup environment for development you will need `Anaconda`_ installed.
2019-07-23T16:39:27 .. _Anaconda: https://anaconda.org/
2019-07-23T16:39:27 .. code:: console
2019-07-23T16:39:27 conda create -n stempel-stemmer
2019-07-23T16:39:27 conda activate stempel-stemmer
2019-07-23T16:39:27 conda install -c conda-forge --file requirements.txt
2019-07-23T16:39:27 To run tests:
2019-07-23T16:39:27 .. code:: console
2019-07-23T16:39:27 curl https://repo1.maven.org/maven2/org/apache/lucene/lucene-analyzers-stempel/8.1.1/lucene-analyzers-stempel-8.1.1.jar > stempel-8.1.1.jar
2019-07-23T16:39:27 python -m pytest ./
2019-07-23T16:39:27 To run benchmark:
2019-07-23T16:39:27 .. code:: console
2019-07-23T16:39:27 python tests\test_benchmark.py
2019-07-23T16:39:27 Licensing
2019-07-23T16:39:27 ------------------
2019-07-23T16:39:27 Most of the code is covered by `Egothor Open Source License`_, an Apache-style license. The rest of
2019-07-23T16:39:27 the code and pretrained stemming table are covered by the `Apache License 2.0`_. Unit tests use the
2019-07-23T16:39:27 Free Polish dictionary for use in spell-checking from `sjp.pl`_ , covered by `Apache License 2.0`_
2019-07-23T16:39:27 as well.
2019-07-23T16:39:27 .. _Egothor Open Source License: https://www.egothor.org/product/egothor2/
2019-07-23T16:39:27 .. _Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0
2019-07-23T16:39:27 Other languages
2019-07-23T16:39:27 ------------------
2019-07-23T16:39:27 * `Estem`_ is Erlang wrapper (not port) for Stempel stemmer.
2019-07-23T16:39:27 .. _Estem: https://github.com/arcusfelis/estem
2019-07-23T16:39:27 running bdist_wheel
2019-07-23T16:39:27 running build
2019-07-23T16:39:27 running build_py
2019-07-23T16:39:28 creating build
2019-07-23T16:39:28 creating build/lib
2019-07-23T16:39:28 creating build/lib/stempel
2019-07-23T16:39:28 copying stempel/egothor.py -> build/lib/stempel
2019-07-23T16:39:28 copying stempel/__init__.py -> build/lib/stempel
2019-07-23T16:39:28 copying stempel/streams.py -> build/lib/stempel
2019-07-23T16:39:28 copying stempel/stemmer_20000.tbl -> build/lib/stempel
2019-07-23T16:39:28 installing to build/bdist.linux-armv7l/wheel
2019-07-23T16:39:28 running install
2019-07-23T16:39:28 running install_lib
2019-07-23T16:39:28 creating build/bdist.linux-armv7l
2019-07-23T16:39:28 creating build/bdist.linux-armv7l/wheel
2019-07-23T16:39:28 creating build/bdist.linux-armv7l/wheel/stempel
2019-07-23T16:39:28 copying build/lib/stempel/egothor.py -> build/bdist.linux-armv7l/wheel/stempel
2019-07-23T16:39:28 copying build/lib/stempel/__init__.py -> build/bdist.linux-armv7l/wheel/stempel
2019-07-23T16:39:28 copying build/lib/stempel/streams.py -> build/bdist.linux-armv7l/wheel/stempel
2019-07-23T16:39:28 copying build/lib/stempel/stemmer_20000.tbl -> build/bdist.linux-armv7l/wheel/stempel
2019-07-23T16:39:28 running install_egg_info
2019-07-23T16:39:28 running egg_info
2019-07-23T16:39:28 writing pystempel.egg-info/PKG-INFO
2019-07-23T16:39:28 writing dependency_links to pystempel.egg-info/dependency_links.txt
2019-07-23T16:39:28 writing requirements to pystempel.egg-info/requires.txt
2019-07-23T16:39:28 writing top-level names to pystempel.egg-info/top_level.txt
2019-07-23T16:39:28 reading manifest file 'pystempel.egg-info/SOURCES.txt'
2019-07-23T16:39:28 writing manifest file 'pystempel.egg-info/SOURCES.txt'
2019-07-23T16:39:28 Copying pystempel.egg-info to build/bdist.linux-armv7l/wheel/pystempel-1.0.1-py3.7.egg-info
2019-07-23T16:39:28 running install_scripts
2019-07-23T16:39:28 creating build/bdist.linux-armv7l/wheel/pystempel-1.0.1.dist-info/WHEEL
2019-07-23T16:39:28 creating '/tmp/pip-wheel-172_1wel/pystempel-1.0.1-py3-none-any.whl' and adding 'build/bdist.linux-armv7l/wheel' to it
2019-07-23T16:39:28 adding 'stempel/__init__.py'
2019-07-23T16:39:28 adding 'stempel/egothor.py'
2019-07-23T16:39:29 adding 'stempel/stemmer_20000.tbl'
2019-07-23T16:39:29 adding 'stempel/streams.py'
2019-07-23T16:39:29 adding 'pystempel-1.0.1.dist-info/METADATA'
2019-07-23T16:39:29 adding 'pystempel-1.0.1.dist-info/WHEEL'
2019-07-23T16:39:29 adding 'pystempel-1.0.1.dist-info/top_level.txt'
2019-07-23T16:39:29 adding 'pystempel-1.0.1.dist-info/RECORD'
2019-07-23T16:39:29 removing build/bdist.linux-armv7l/wheel
2019-07-23T16:39:29 Building wheel for pystempel (setup.py): finished with status 'done'
2019-07-23T16:39:29 Stored in directory: /tmp/tmpney1jfgq
2019-07-23T16:39:29 Successfully built pystempel
2019-07-23T16:39:29 Cleaning up...
2019-07-23T16:39:29 Removing source in /tmp/pip-wheel-1t5ky9dr/pystempel
2019-07-23T16:39:29 Removed build tracker '/tmp/pip-req-tracker-taf0zudk'