Development¶
Goals¶
Goals:
clearly documented API
minimal web interface
minimal CLI
Development should follow a problem-solution approach.
Roadmap¶
In no particular order:
API to delete old entries. (#96)
API to delete duplicate entries. (#140)
Batch get related resources API. (#191)
update_feeds() filtering. (#193)
Web application re-design.
Plugin system / hooks stabilization. (#80)
Internal API stabilization.
CLI stabilization.
Web application stabilization.
OPML support. (#165)
Style guide¶
reader uses the Black style.
You should enforce it by using pre-commit. To install it into your git hooks, run:
pip install pre-commit # ./run.sh install-dev already does both
pre-commit install
Every time you clone the repo, running pre-commit install should always be the first thing you do.
Testing¶
First, install the testing dependencies:
./run.sh install-dev # or
pip install '.[search,cli,app,tests,dev,unstable-plugins]'
Run tests using the current Python interpreter:
pytest --runslow
Run tests using the current Python interpreter, but skip slow tests:
pytest
Run tests for all supported Python versions:
tox
Run tests with coverage and generate an HTML report (in ./htmlcov):
./run.sh coverage-all
Run the type checker:
./run.sh typing # or
mypy --strict src
Start a local development server for the web application:
./run.sh serve-dev # or
FLASK_DEBUG=1 FLASK_TRAP_BAD_REQUEST_ERRORS=1 \
FLASK_APP=src/reader/_app/wsgi.py \
READER_DB=db.sqlite flask run -h 0.0.0.0 -p 8000
Building the documentation¶
First, install the dependencies:
pip install '.[docs]' # ./run.sh install-dev already does it for you
The documentation is built with Sphinx:
./run.sh docs # or
make -C docs html # using Sphinx's Makefile directly
The built HTML docs should be in ./docs/_build/html/.
Making a release¶
Making a release (from x to y == x + 1):
Note
scripts/release.py already does most of these.
(release.py) bump version in src/reader/__init__.py to y
(release.py) update changelog with release version and date
(release.py) make sure tests pass / docs build
(release.py) clean up dist/:
rm -rf dist/
(release.py) build tarball and wheel:
python -m build
(release.py) push to GitHub
(release.py prompts) wait for GitHub Actions / Codecov / Read the Docs builds to pass
upload to test PyPI and check:
twine upload --repository-url https://test.pypi.org/legacy/ dist/*
(release.py) upload to PyPI:
twine upload dist/*
(release.py prompts) tag release in GitHub
build docs from latest and enable y docs version (should happen automatically after the first time)
(release.py) bump versions from y to (y + 1).dev0, add (y + 1) changelog section
(release.py prompts) deactivate old versions in Read the Docs
Design notes¶
Following are various design notes that aren’t captured somewhere else (either in the code, or in the issue where a feature was initially developed).
Why use SQLite and not SQLAlchemy?¶
tl;dr: For “historical reasons”.
In the beginning:
I wanted to keep things as simple as possible, so I wouldn’t get demotivated and stop working on it. I also wanted to try out a “problem-solution” approach.
I think by that time I was already a great SQLite fan, and knew that because of the relatively single-user nature of the thing I wouldn’t have to change databases because of concurrency issues.
The fact that I didn’t know exactly where and how I would deploy the web app (and that SQLite is in stdlib) kinda cemented that assumption.
Since then, I did come up with some of my own complexity: there’s a SQL query builder, a schema migration system, and there were some concurrency issues. SQLAlchemy would have likely helped with the first two, but not with the last one (not without dropping SQLite).
Note that it is possible to use a different storage implementation; all storage stuff happens through a DAO-style interface, and SQLAlchemy was the main real alternative I had in mind. The API is private at the moment (1.10), but if anyone wants to use it I can make it public.
It is unlikely I’ll write a SQLAlchemy storage myself, since I don’t need it (yet), and I think testing it with multiple databases would take quite some time.
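To make the "DAO-style interface" idea concrete, here is a minimal sketch of how high-level code can depend on an interface rather than on SQLite directly. All names (FeedStorage, SQLiteFeedStorage, InMemoryFeedStorage, list_feeds) and the feeds table are hypothetical and chosen for illustration; this is not reader's actual private storage API.

```python
import sqlite3
from typing import Iterable, Protocol


class FeedStorage(Protocol):
    """Hypothetical DAO-style interface (not reader's real storage API)."""

    def add_feed(self, url: str) -> None: ...

    def get_feed_urls(self) -> Iterable[str]: ...


class SQLiteFeedStorage:
    """SQLite-backed implementation; needs only the standard library."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS feeds (url TEXT PRIMARY KEY)")

    def add_feed(self, url: str) -> None:
        with self._db:  # commit on success, roll back on error
            self._db.execute("INSERT INTO feeds (url) VALUES (?)", (url,))

    def get_feed_urls(self) -> Iterable[str]:
        rows = self._db.execute("SELECT url FROM feeds ORDER BY url")
        return [url for (url,) in rows]


class InMemoryFeedStorage:
    """Alternative implementation; stands in for any other backend."""

    def __init__(self) -> None:
        self._urls = set()

    def add_feed(self, url: str) -> None:
        self._urls.add(url)

    def get_feed_urls(self) -> Iterable[str]:
        return sorted(self._urls)


def list_feeds(storage: FeedStorage) -> list:
    """High-level code sees only the interface, never the database."""
    return list(storage.get_feed_urls())
```

With this shape, swapping the backend means writing another class with the same two methods; the cost mentioned above is that each such class has to be tested against its own database.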
Multiple storage implementations¶
Detailed requirements and API discussion: #168#issuecomment-642002049.
Parser¶
file:// handling, feed root, per-URL-prefix parsers (later retrievers, see below):
requirements: #155#issuecomment-667970956
detailed requirements: #155#issuecomment-672324186
method for URL validation (not added, as of 1.13): #155#issuecomment-673694472
Requests session plugins:
requirements: #155#issuecomment-667970956
why the Session wrapper exists: #155#issuecomment-668716387 and #155#issuecomment-669164351
Retriever / parser split:
Metrics¶
Some thoughts on implementing metrics: #68#issuecomment-450025175.
Query builder¶
Survey of possible options: #123#issuecomment-582307504.
Pagination for methods that return iterators¶
Why do it for the private implementation: #167#issuecomment-626753299 (also a comment in storage code).
Detailed requirements and API discussion for public pagination: #196#issuecomment-706038363.
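As a rough illustration of what paginating behind an iterator can look like, the sketch below fetches rows in fixed-size pages and resumes after the last id seen. The entries table, the function names, and the choice of keyset-style resumption are assumptions made for the example; they are not taken from reader's storage code.

```python
import sqlite3


def make_example_db():
    """Create an in-memory database with a hypothetical `entries` table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
    db.executemany(
        "INSERT INTO entries (id, title) VALUES (?, ?)",
        [(i, f"entry {i}") for i in range(1, 6)],
    )
    return db


def get_entries_page(db, chunk_size, last=None):
    """Return up to chunk_size (id, title) rows that come after `last`."""
    if last is None:
        rows = db.execute(
            "SELECT id, title FROM entries ORDER BY id LIMIT ?", (chunk_size,)
        )
    else:
        rows = db.execute(
            "SELECT id, title FROM entries WHERE id > ? ORDER BY id LIMIT ?",
            (last, chunk_size),
        )
    # fetchall() exhausts the cursor, so no query stays open between pages.
    return rows.fetchall()


def iter_entries(db, chunk_size=2):
    """Yield all rows, fetching them one page at a time."""
    last = None
    while True:
        page = get_entries_page(db, chunk_size, last)
        yield from page
        if len(page) < chunk_size:
            return  # a short page means there is nothing left
        last = page[-1][0]  # resume after the last id seen
```

The caller still sees a plain iterator; the `last` value is the only state carried between pages, which is also what a public pagination API would need to expose.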
Search¶
From the initial issue:
detailed requirements and API discussion: #122#issuecomment-591302580
discussion of possible backend-independent search queries: #122#issuecomment-508938311
Entry/feed “primary key” attribute naming¶
This whole issue: #159#issuecomment-612914956.
Change feed URL¶
From the initial issue:
use cases: #149#issuecomment-700066794
initial requirements: #149#issuecomment-700532183
Feed tags¶
Detailed requirements and API discussion: #184#issuecomment-689587006.
Entry user data¶
#228#issuecomment-810098748 discusses three different kinds, how they would be implemented, and why I want more use-cases before implementing them (basically, YAGNI):
entry searchable text fields (for notes etc.)
entry tags (similar to feed tags, can be used as additional bool flags)
entry metadata (similar to feed metadata)
also discusses how to build an enclosure cache/preloader (doesn’t need special reader features besides what’s available in 1.16)
Feed updates¶
Some thoughts about adding a map argument: #152#issuecomment-606636200.
How update_feeds() is like a pipeline: comment.
Data flow diagram for the update process, as of v1.13: #204#issuecomment-779709824.
update_feeds_iter():
use case: #204#issuecomment-779893386 and #204#issuecomment-780541740
return type: #204#issuecomment-780553373
Disabling updates:
Updating entries based on a hash of their content (regardless of updated):
stable hashing of Python data objects: #179#issuecomment-796868555, the reader._hash_utils module, death and gravity article
ideas for how to deal with spurious hash changes: #225
Decision to ignore feed.updated when updating feeds: #231.
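For the content-hash item above, the built-in hash() is not usable as-is, because string hashes are randomized per process by default. A minimal sketch of one way to get a stable hash follows: convert the object to a canonical JSON string and hash that. The helper names and the ExampleEntry class are hypothetical, and this is not the algorithm used by reader._hash_utils.

```python
import dataclasses
import hashlib
import json
from datetime import datetime
from typing import Optional


@dataclasses.dataclass(frozen=True)
class ExampleEntry:
    """Hypothetical data object, used only for this example."""

    title: str
    updated: Optional[datetime] = None


def _to_jsonable(obj):
    """Recursively convert supported objects to JSON-compatible values."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return {
            f.name: _to_jsonable(getattr(obj, f.name))
            for f in dataclasses.fields(obj)
        }
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, dict):
        return {str(k): _to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [_to_jsonable(v) for v in obj]
    return obj  # str, int, float, bool, None pass through unchanged


def stable_hash(obj) -> str:
    """Return a hex digest that is the same across processes and runs."""
    # sort_keys and fixed separators make the serialization canonical.
    payload = json.dumps(_to_jsonable(obj), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Comparing the stored digest with a freshly computed one tells you whether the content changed, independent of the updated field. Any change to the serialization (for example, adding a field) changes every digest, which is one source of the spurious hash changes mentioned above.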
Counts API¶
Detailed requirements and API discussion: #185#issuecomment-731743327.
Using None as a special argument value¶
This comment: #177#issuecomment-674786498.
Batch update (set) methods¶
There’s a discussion on why I want to postpone this in this comment: #187#issuecomment-700740251.
Using a single Reader object from multiple threads¶
Some thoughts on why it’s difficult to do: #206#issuecomment-751383418.
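One concrete obstacle at the SQLite level (which may or may not be the main point of the linked comment) is that, by default, a sqlite3 connection can only be used from the thread that created it. The sketch below shows that behaviour, plus a hypothetical PerThreadConnections helper that keeps one connection per thread; it illustrates the general technique and is not how reader is implemented.

```python
import sqlite3
import threading


def use_from_other_thread(db):
    """Use `db` from a new thread; return the exception raised, or None."""
    result = []

    def target():
        try:
            db.execute("SELECT 1")
            result.append(None)
        except sqlite3.ProgrammingError as e:
            # Raised by default (check_same_thread=True) when a connection
            # is used outside the thread that created it.
            result.append(e)

    thread = threading.Thread(target=target)
    thread.start()
    thread.join()
    return result[0]


class PerThreadConnections:
    """Hypothetical helper: lazily open one connection per thread."""

    def __init__(self, path):
        self._path = path
        self._local = threading.local()

    def get(self):
        db = getattr(self._local, "db", None)
        if db is None:
            db = self._local.db = sqlite3.connect(self._path)
        return db
```

Per-thread connections avoid the error, but they do not address everything else that would have to be thread-safe, such as shared caches or connection cleanup when threads exit.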
Plugins¶
List of hooks (unmaintained as of 2021): #80.
Minimal plugin API (case study and considerations for the built-in plugin naming scheme): #229#issuecomment-803870781.
Reserved names¶
Requirements, thoughts about the naming scheme and prefixes unlikely to collide with user names: #186 (multiple comments).
Wrapping underlying storage exceptions¶
Which exception to wrap, and which not: #21#issuecomment-365442439.
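A minimal sketch of the wrapping technique, assuming for illustration that only sqlite3.OperationalError is wrapped and everything else propagates unchanged. The StorageError class here is a local stand-in and wrap_storage_exceptions and count_feeds are hypothetical names; the actual choice of which exceptions to wrap is the subject of the linked comment.

```python
import sqlite3
from contextlib import contextmanager


class StorageError(Exception):
    """Local stand-in for a storage-independent exception type."""


@contextmanager
def wrap_storage_exceptions():
    """Re-raise selected sqlite3 errors as StorageError."""
    try:
        yield
    except sqlite3.OperationalError as e:
        # Assumption for this example: only OperationalError is wrapped.
        # `from e` keeps the original exception available as __cause__.
        raise StorageError(f"sqlite3 error: {e}") from e


def count_feeds(db):
    """Hypothetical storage method; callers never see sqlite3 exceptions."""
    with wrap_storage_exceptions():
        return db.execute("SELECT count(*) FROM feeds").fetchone()[0]
```

Callers can then catch StorageError without importing sqlite3, while the original error stays attached for debugging.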