Fleche Storage Backends
This notebook demonstrates the usage of the different storage backends available in fleche. In fleche, a Cache is composed of two storage components:
values: Stores the actual results of the functions.
calls: Stores the metadata about the function calls (arguments, function name, etc.) and references to the stored values.
You can mix and match different storage backends for values and calls to suit your needs.
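To make the values/calls split concrete, here is a minimal sketch of a two-store cache built from plain dictionaries. It is not fleche's implementation: `ToyCache`, `call_key`, and the hashing scheme are hypothetical names chosen only to illustrate how a call record can point at a separately stored value.

```python
import hashlib
import pickle


def call_key(func_name, args, kwargs):
    """Derive a key from the function name and arguments (illustrative scheme only)."""
    payload = pickle.dumps((func_name, args, sorted(kwargs.items())))
    return hashlib.sha256(payload).hexdigest()


class ToyCache:
    """Hypothetical two-store cache: `calls` holds metadata, `values` holds results."""

    def __init__(self, values, calls):
        self.values = values  # value key -> result
        self.calls = calls    # call key -> metadata, including the value key

    def run(self, func, *args, **kwargs):
        key = call_key(func.__name__, args, kwargs)
        if key in self.calls:
            # Cache hit: follow the reference from the call record to the value.
            return self.values[self.calls[key]["value_key"]]
        result = func(*args, **kwargs)
        value_key = hashlib.sha256(pickle.dumps(result)).hexdigest()
        self.values[value_key] = result
        self.calls[key] = {
            "name": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "value_key": value_key,
        }
        return result


executions = []


def add(a, b):
    executions.append((a, b))  # record every real execution
    return a + b


toy = ToyCache(values={}, calls={})
first = toy.run(add, 2, 3)
second = toy.run(add, 2, 3)  # served from the stores; `add` is not executed again
```

Because the two stores only meet through the value key, either one could be swapped for a different container without touching the other, which is the property the backends below rely on.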
Memory Storage
The Memory storage backend keeps all the cached data in memory. This is the simplest backend and is useful for testing or when you don’t need to persist the cache beyond the current process.
[ ]:
from fleche import fleche, cache
from fleche.caches import Cache
from fleche.storage import ValueMemory, CallMemory

# Using Memory for both values and calls
memory_cache = Cache(values=ValueMemory({}), calls=CallMemory({}))

with cache(memory_cache):
    @fleche
    def add(a, b):
        print(f"Executing add({a}, {b})")
        return a + b

    print(f"Result 1: {add(2, 3)}")
    print(f"Result 2: {add(2, 3)}")  # This should be cached

print('Cache keys in calls storage:', list(memory_cache.calls.list()))
PickleFile Storage
The PickleFile storage backend serializes data using Python’s standard pickle module and stores it in individual files. It requires a root directory.
[ ]:
import shutil
from pathlib import Path
from fleche.storage import ValuePickleFile, CallPickleFile

shutil.rmtree('.pickle_cache', ignore_errors=True)

pickle_cache = Cache(
    values=ValuePickleFile.with_pickle(root='.pickle_cache/values'),
    calls=CallPickleFile.with_pickle(root='.pickle_cache/calls')
)

with cache(pickle_cache):
    @fleche
    def greet(name):
        print(f"Executing greet({name})")
        return f"Hello, {name}!"

    print(greet("World"))
    print(greet("World"))

print("Files in calls storage:")
!ls .pickle_cache/calls
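The one-file-per-entry layout can be sketched with the standard library alone. The class below is a hypothetical stand-in, not fleche's `ValuePickleFile`: the name `ToyPickleDir`, its methods, and the `<key>.pkl` file naming are assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path


class ToyPickleDir:
    """Hypothetical store that writes each entry to `<root>/<key>.pkl`."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, key, obj):
        with open(self.root / f"{key}.pkl", "wb") as f:
            pickle.dump(obj, f)

    def load(self, key):
        with open(self.root / f"{key}.pkl", "rb") as f:
            return pickle.load(f)

    def list(self):
        # Keys are recovered from the file names on disk.
        return sorted(p.stem for p in self.root.glob("*.pkl"))


# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    store = ToyPickleDir(Path(tmp) / "values")
    store.save("greeting", "Hello, World!")
    loaded = store.load("greeting")
    keys = store.list()
```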
Compressed PickleFile Storage
You can also enable gzip compression for PickleFile (and its variants CloudpickleFile and DillFile) by passing compress=True.
[ ]:
import shutil

shutil.rmtree('.compressed_pickle_cache', ignore_errors=True)

compressed_cache = Cache(
    values=ValuePickleFile.with_pickle(root='.compressed_pickle_cache/values', compress=True),
    calls=CallPickleFile.with_pickle(root='.compressed_pickle_cache/calls', compress=True)
)

with cache(compressed_cache):
    @fleche
    def big_result(n):
        return "a" * n

    big_result(1000)

file_path = list(Path('.compressed_pickle_cache/values').iterdir())[0]
with open(file_path, 'rb') as f:
    header = f.read(2)

# check for gzip magic number 0x1f 0x8b
is_compressed = (header[0] == 0x1f) and (header[1] == 0x8b)
print(f"Is gzip compressed: {is_compressed}")
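The magic-number check can be reproduced without fleche. This sketch uses only the standard `gzip` and `pickle` modules and assumes nothing about how fleche writes its files. It shows that a gzip stream begins with the bytes `0x1f 0x8b`, while a plain pickle does not.

```python
import gzip
import pickle

payload = pickle.dumps("a" * 1000)   # uncompressed pickle bytes
compressed = gzip.compress(payload)  # the same bytes wrapped in a gzip stream

starts_with_magic = compressed[:2] == b"\x1f\x8b"
plain_has_magic = payload[:2] == b"\x1f\x8b"

# Decompressing and unpickling recovers the original object.
round_trip = pickle.loads(gzip.decompress(compressed))

print(f"gzip magic present: {starts_with_magic}")
print(f"sizes: {len(payload)} bytes plain, {len(compressed)} bytes compressed")
```

Highly repetitive data such as this string compresses very well. For small or already-compressed values, the gzip header and CPU cost may outweigh the savings.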
CloudpickleFile Storage
The CloudpickleFile storage backend is similar to PickleFile but uses cloudpickle for serialization. cloudpickle can handle more complex Python objects, like lambdas or functions defined interactively, that standard pickle might struggle with.
[ ]:
shutil.rmtree('.cloudpickle_cache', ignore_errors=True)

cp_cache = Cache(
    values=ValuePickleFile.with_cloudpickle(root='.cloudpickle_cache/values'),
    calls=CallPickleFile.with_cloudpickle(root='.cloudpickle_cache/calls')
)

with cache(cp_cache):
    @fleche
    def mul(a, b):
        print(f"Executing mul({a}, {b})")
        return a * b

    print(f"Result: {mul(3, 4)}")
    print(f"Result: {mul(3, 4)}")

print("Files in values storage:")
!ls .cloudpickle_cache/values
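To see the kind of object that motivates cloudpickle, the sketch below tries to serialize a lambda with the standard `pickle` module. It uses only the standard library. The claim that cloudpickle handles this case rests on cloudpickle serializing such functions by value rather than by importable name, and is not demonstrated here.

```python
import pickle

square = lambda x: x * x  # an anonymous function, as might be defined interactively

try:
    # Standard pickle stores functions by reference (module plus qualified name),
    # and a lambda has no name that can be looked up again.
    pickle.dumps(square)
    pickled_ok = True
except (pickle.PicklingError, AttributeError) as exc:
    pickled_ok = False
    print(f"Standard pickle failed: {type(exc).__name__}")
```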
DillFile Storage
The DillFile storage backend is similar to CloudpickleFile but uses dill for serialization.
[ ]:
shutil.rmtree('.dill_cache', ignore_errors=True)

dill_cache = Cache(
    values=ValuePickleFile.with_dill(root='.dill_cache/values'),
    calls=CallPickleFile.with_dill(root='.dill_cache/calls')
)

with cache(dill_cache):
    @fleche
    def add(a, b):
        print(f"Executing add({a}, {b})")
        return a + b

    print(f"Result: {add(3, 4)}")
    print(f"Result: {add(3, 4)}")

print("Files in values storage:")
!ls .dill_cache/values
BagOfHoldingH5File Storage
The BagOfHoldingH5File storage backend uses the bagofholding library to store data in HDF5 files. This is particularly efficient for large numerical arrays (e.g., NumPy arrays).
[ ]:
import numpy as np
from fleche.storage import ValueBagOfHoldingH5File

shutil.rmtree('.boh_cache', ignore_errors=True)

boh_cache = Cache(
    values=ValueBagOfHoldingH5File(root='.boh_cache/values'),
    calls=CallPickleFile.with_cloudpickle(root='.boh_cache/calls')  # We can use Cloudpickle for calls
)

with cache(boh_cache):
    @fleche
    def make_array(n):
        print(f"Executing make_array({n})")
        return np.ones((n, n))

    print(f"Array sum: {make_array(5).sum()}")
    print(f"Array sum: {make_array(5).sum()}")

print("Files in H5 values storage:")
!ls .boh_cache/values
Sql Storage (Call Storage Only)
The Sql storage backend uses SQLAlchemy to store call metadata in a SQL database (like SQLite). It provides advanced querying capabilities but can only be used for the calls component of a Cache.
[ ]:
import os
from fleche.storage import Sql

if os.path.exists('calls.db'):
    os.remove('calls.db')

sql_cache = Cache(
    values=ValueMemory({}),
    calls=Sql(url='sqlite:///calls.db')
)

with cache(sql_cache):
    @fleche
    def power(a, b):
        print(f"Executing power({a}, {b})")
        return a ** b

    power(2, 8)
    power(2, 8)

print("Calls in SQL storage:")
print(list(sql_cache.calls.list()))
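The querying advantage of keeping call metadata in a database can be illustrated with the standard `sqlite3` module. This is a sketch under assumptions: it does not use SQLAlchemy, and the `calls` table, its columns, and the sample rows are hypothetical rather than the schema fleche creates.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute(
    "CREATE TABLE calls (key TEXT PRIMARY KEY, name TEXT, arguments TEXT, value_key TEXT)"
)

# Hypothetical call records: arguments are stored as JSON text.
rows = [
    ("k1", "power", json.dumps({"a": 2, "b": 8}), "v1"),
    ("k2", "power", json.dumps({"a": 3, "b": 2}), "v2"),
    ("k3", "add", json.dumps({"a": 2, "b": 3}), "v3"),
]
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", rows)

# Find every recorded call to `power` without loading or unpickling any values.
power_calls = conn.execute(
    "SELECT key, arguments FROM calls WHERE name = ? ORDER BY key", ("power",)
).fetchall()
decoded = [(key, json.loads(arguments)) for key, arguments in power_calls]
conn.close()

print(decoded)
```

A file-per-call backend would have to open each file to answer the same question, which is why a database suits the calls component but not necessarily the values.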
Mix and Match
As seen in the BagOfHoldingH5File and Sql examples, you can mix different backends for values and calls. For instance, you might want to use BagOfHoldingH5File for large values but Sql for calls to enable efficient metadata querying.
[ ]:
mixed_cache = Cache(
    values=ValueBagOfHoldingH5File(root='.mixed_cache/values'),
    calls=Sql(url='sqlite:///.mixed_cache/calls.db')
)

with cache(mixed_cache):
    @fleche
    def compute_heavy(x):
        return np.random.rand(x, x)

    compute_heavy(10)

print(f"Value storage: {type(mixed_cache.values).__name__}")
print(f"Call storage: {type(mixed_cache.calls).__name__}")
Clean Up
Remove the temporary cache directories and the SQLite database file created above.
[ ]:
import os
import shutil

# Remove every cache directory created in this notebook
cache_dirs = [
    '.pickle_cache',
    '.compressed_pickle_cache',
    '.cloudpickle_cache',
    '.dill_cache',
    '.boh_cache',
    '.mixed_cache',
]
for directory in cache_dirs:
    shutil.rmtree(directory, ignore_errors=True)

# Remove the SQLite database from the Sql example
if os.path.exists('calls.db'):
    os.remove('calls.db')