1. Introduction¶
The module lru_ng
provides a single Python type,
LRUDict
that behaves like a normal Python dictionary
dict
by providing key-value mapping.
In addition, its capacity is bounded. When the size bound is reached, inserting
a new key will cause the least-recently used key (and its associated value) to
be removed (or “evicted”) from the mapping. The LRUDict
can be
resized, and reducing its size will remove keys in the order from the least- to
the most-recent use.
Optionally, a callback can be registered for displaced items. The callback is applied to the (key, value) pair, and can be used for a variety of purposes such as freeing up resources, printing information, or queueing up the discarded items for reuse, among others.
1.1. Installation¶
The latest version’s source is available from the PyPI, and you can install it
using the pip
command:
pip install lru_ng
The pip
utility will download the C source and try to build a binary
distribution. For this to work, you need CPython >= 3.6. The following are
required for compilation:
A C compiler that supports C99 syntax
Python development headers
setuptools, although
distutils
included with Python should work, too.
1.2. First example¶
An LRUDict
object is created by specifying its fixed size bound. In
this example we begin with a size of 4 items.
>>> from lru_ng import LRUDict
>>> cache = LRUDict(3)
The object cache
, in many ways, will behave just like a Python
dict
. It can be indexed, and you can test for membership using the
in
keyword.
>>> cache["system"] = "programming"
>>> cache["web"] = "development"
>>> "technology" in cache
False
>>> cache["documentation"]
Traceback (most recent call last):
...
KeyError: 'documentation'
By incurring a KeyError
exception, we have caused a cache miss, which
is recorded.
>>> # For Python >= 3.8
>>> "hits: {0.hits}; misses: {0.misses}".format(cache.get_stats())
'hits: 0; misses: 1'
>>> # For Python < 3.8, use
>>> "hits: {}; misses: {}".format(*cache.get_stats())
'hits: 0; misses: 1'
Successful retrieval of a value by key increases the “hits” record by one:
>>> word = cache["system"]
>>> hits, misses = cache.get_stats()
>>> hits
1
When more keys are inserted and the capacity reached, items are displaced by starting from the least-recently used.
>>> for phrase in ("documentation writing", "software testing"):
... topic, action = phrase.split(" ")
... cache[topic] = action
>>> print(cache)
<LRUDict(3) object with dict {'system': 'programming', 'documentation': 'writing', 'software': 'testing'} at 0x...>
>>> len(cache)
3
The item for "web development"
has been removed from cache
. The
item is the first to be discarded because "web"
is the least recently
used key.
1.3. Using a callback¶
A callback is a callable object with two arguments for (key, value)
.
If it is set, it will apply to displaced (or “evicted”) key-value pairs. The
callback can be accessed or set via the LRUDict.callback
property.
>>> def func(key, value):
... print(f"discarded: ({key}, {value})")
... print("total length: ", len(key) + len(value))
>>> cache.callback = func
>>> cache["quality"] = "assurance"
discarded: (system, programming)
total length: 17
This is a rather trivial illustration. A more useful situation for the callback would be, for example, where the values are objects that should be cleaned up before reaching the end of lifecycle. If the values stored are file-like objects, closing them would be a suitable task for a callback.
The callback can also be specified during the instance initialization as an optional parameter:
>>> cache = LRUDict(size, callback=func)
1.4. Caveats with callbacks¶
Although the callback may execute arbitrary code, certain restrictions are imposed on its action. Most notably:
Warning
The return value of the callback is discarded.
Any exception raised in the callback cannot be handled the usual way.
What the latter means is that, if the callback would have raised an exception,
the exception is passed to the “unraisable handler”
sys.unraisablehook
which may be further customized. Python’s default
unraisable handler will print a message with the traceback to
sys.stderr
.
Note
It is strongly suggested that a callback should only
do what is minimally necessary,
avoid lengthy computation or substantial interaction with an external system, and
especially avoid modifying the state of the
LRUDict
object itself.
Because of the loss of return value and suppression of exceptions, it has limited capability to interact with other parts of the programme.
The most significant reason why it works like this is that any method call that
“adds something” to the LRUDict
object may trigger the
eviction-handling callback. The caller may not be well-equipped to handle any
exception raised; indeed, we cannot assume that the caller ever knows or should
know what, if any, is being evicted.
Also, it is rather likely that the evicted object will go beyond the reach of
normal code if no other references to it are held. The callback in this way
shares some similarity with the __del__()
method (or
“finalizer”). However, unlike __del__()
, the callback is not
bound to the object about to be discarded.