Fleche Storage Backends
This notebook demonstrates the storage backends available in fleche. A Cache is composed of two storage components:
values: Stores the actual results of the functions.
calls: Stores the metadata about the function calls (arguments, function name, etc.) and references to the stored values.
You can mix and match different storage backends for values and calls to suit your needs.
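To make the split between the two components concrete, here is a minimal toy model in plain Python. It is not fleche's implementation: the class `ToyCache`, the helper `digest`, and the key scheme (SHA-256 over a pickle of the input) are assumptions made only for illustration.

```python
import hashlib
import pickle


def digest(obj) -> str:
    """Hash an object via its pickle bytes (an illustrative key scheme only)."""
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()


class ToyCache:
    """Hypothetical two-part cache: one store for results, one for call records."""

    def __init__(self):
        self.values = {}  # value key -> stored result
        self.calls = {}   # call key -> metadata plus a reference to the value

    def call(self, func, *args):
        call_key = digest((func.__qualname__, args))
        if call_key in self.calls:
            # Cache hit: follow the reference from the call record to the value.
            return self.values[self.calls[call_key]["result"]]
        result = func(*args)
        value_key = digest(result)
        self.values[value_key] = result
        self.calls[call_key] = {
            "name": func.__qualname__,
            "args": args,
            "result": value_key,
        }
        return result


executions = []


def add(a, b):
    executions.append((a, b))  # record every real execution
    return a + b


toy = ToyCache()
first = toy.call(add, 2, 3)
second = toy.call(add, 2, 3)  # served from the stores, add() is not run again
print(first, second, len(executions))
```

Because the two dictionaries are only ever reached through the call record's reference, either one could be swapped for a different kind of store without touching the other, which is the idea behind mixing backends.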
Memory Storage
The Memory storage backend keeps all the cached data in memory. This is the simplest backend and is useful for testing or when you don’t need to persist the cache beyond the current process.
[1]:
from fleche import fleche, cache
from fleche.caches import Cache
from fleche.storage import Memory

# Using Memory for both values and calls
memory_cache = Cache(values=Memory({}), _calls=Memory({}))

with cache(memory_cache):
    @fleche
    def add(a, b):
        print(f"Executing add({a}, {b})")
        return a + b

    print(f"Result 1: {add(2, 3)}")
    print(f"Result 2: {add(2, 3)}")  # This should be cached

print('Cache keys in calls storage:', list(memory_cache.calls.list()))
No config file found. Using default memory cache.
() ()
Executing add(2, 3)
Result 1: 5
Result 2: 5
Cache keys in calls storage: ['947d73b2adcb9f69e1ddd5d05678a08983d98a971591910b0466084214a9cc08']
PickleFile Storage
The PickleFile storage backend serializes data using Python’s standard pickle module and stores it in individual files. It requires a root directory.
[2]:
import shutil
from pathlib import Path
from fleche.storage import PickleFile

# Start from an empty cache directory
shutil.rmtree('.pickle_cache', ignore_errors=True)

pickle_cache = Cache(
    values=PickleFile.with_pickle(root='.pickle_cache/values'),
    _calls=PickleFile.with_pickle(root='.pickle_cache/calls')
)

with cache(pickle_cache):
    @fleche
    def greet(name):
        print(f"Executing greet({name})")
        return f"Hello, {name}!"

    print(greet("World"))
    print(greet("World"))  # Served from the cache, so greet() is not executed again

print("Files in calls storage:")
!ls .pickle_cache/calls
() ()
Executing greet(World)
Hello, World!
Hello, World!
Files in calls storage:
e9cc541e2a3c72d117bec87999a8966057cdd5f356c83a98583288163d8ef535
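The listing above shows one file per stored entry, named by a hexadecimal key. A rough standard-library sketch of that layout follows. The class `ToyPickleStore` and the choice of a SHA-256 digest of the pickled bytes as the file name are assumptions for illustration, not how fleche derives its keys.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class ToyPickleStore:
    """Hypothetical one-file-per-entry store under a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, obj) -> str:
        data = pickle.dumps(obj)
        key = hashlib.sha256(data).hexdigest()  # assumed key scheme
        (self.root / key).write_bytes(data)
        return key

    def load(self, key):
        return pickle.loads((self.root / key).read_bytes())

    def list(self):
        return sorted(p.name for p in self.root.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    store = ToyPickleStore(Path(tmp) / "values")
    key = store.save("Hello, World!")
    restored = store.load(key)
    keys = store.list()

print(restored, keys)
```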
Compressed PickleFile Storage
You can also enable gzip compression for PickleFile (and its variants CloudpickleFile and DillFile) by passing compress=True.
[3]:
import shutil
from pathlib import Path

shutil.rmtree('.compressed_pickle_cache', ignore_errors=True)

compressed_cache = Cache(
    values=PickleFile.with_pickle(root='.compressed_pickle_cache/values', compress=True),
    _calls=PickleFile.with_pickle(root='.compressed_pickle_cache/calls', compress=True)
)

with cache(compressed_cache):
    @fleche
    def big_result(n):
        return "a" * n

    big_result(1000)

# Read the first two bytes of one stored value file
file_path = list(Path('.compressed_pickle_cache/values').iterdir())[0]
with open(file_path, 'rb') as f:
    header = f.read(2)

# Check for the gzip magic number 0x1f 0x8b
is_compressed = (header[0] == 0x1f) and (header[1] == 0x8b)
print(f"Is gzip compressed: {is_compressed}")
() ()
Is gzip compressed: True
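The magic-number check can be reproduced with the standard library alone. The sketch below compresses a pickled payload with `gzip` and confirms the same two leading bytes. It only illustrates what gzip-compressed pickle data looks like and makes no claim about how fleche wires compression internally.

```python
import gzip
import pickle

payload = "a" * 1000

raw = pickle.dumps(payload)
compressed = gzip.compress(raw)

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
starts_with_magic = compressed[:2] == b"\x1f\x8b"

# Decompressing and unpickling recovers the original object.
restored = pickle.loads(gzip.decompress(compressed))

print(starts_with_magic, len(raw), len(compressed))
```

Highly repetitive data such as this string compresses well. For small or already compressed values the gzip header overhead can outweigh the savings.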
CloudpickleFile Storage
The CloudpickleFile storage backend, created here with PickleFile.with_cloudpickle, is similar to PickleFile but uses cloudpickle for serialization. cloudpickle can handle more complex Python objects, such as lambdas or functions defined interactively, that the standard pickle module often cannot serialize.
[4]:
from fleche.storage import PickleFile

shutil.rmtree('.cloudpickle_cache', ignore_errors=True)

cp_cache = Cache(
    values=PickleFile.with_cloudpickle(root='.cloudpickle_cache/values'),
    _calls=PickleFile.with_cloudpickle(root='.cloudpickle_cache/calls')
)

with cache(cp_cache):
    @fleche
    def mul(a, b):
        print(f"Executing mul({a}, {b})")
        return a * b

    print(f"Result: {mul(3, 4)}")
    print(f"Result: {mul(3, 4)}")  # Served from the cache

print("Files in values storage:")
!ls .cloudpickle_cache/values
() ()
Executing mul(3, 4)
Result: 12
Result: 12
Files in values storage:
056a34cfa7662755ef47b05e966a8cd152ad9cd663134c03b0a4e6f61349c7e5
65d52a82c5a72f12ca0499522dc9274a0e6822e1038630ba68f94400b3e4c98f
83ada2198553b88cb3d0882f7fca8c4e9531049b978df3e9e3b5d6301c6c0bfa
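The limitation that motivates cloudpickle can be seen with the standard library alone. This sketch only shows that the built-in `pickle` module refuses a lambda. It does not import cloudpickle, so it runs without extra dependencies.

```python
import pickle

square = lambda x: x * x  # a lambda has no importable name

try:
    pickle.dumps(square)
    pickled_ok = True
    reason = None
except (pickle.PicklingError, AttributeError) as exc:
    # Standard pickle stores functions by reference (module plus qualified name),
    # and the name "<lambda>" cannot be looked up again.
    pickled_ok = False
    reason = type(exc).__name__

print(pickled_ok, reason)
```

Serializers such as cloudpickle and dill work around this by serializing the function's code itself rather than a reference to it.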
DillFile Storage
The DillFile storage backend is similar to CloudpickleFile but uses dill for serialization.
[5]:
from fleche.storage import PickleFile

shutil.rmtree('.dill_cache', ignore_errors=True)

dill_cache = Cache(
    values=PickleFile.with_dill(root='.dill_cache/values'),
    _calls=PickleFile.with_dill(root='.dill_cache/calls')
)

with cache(dill_cache):
    @fleche
    def add(a, b):
        print(f"Executing add({a}, {b})")
        return a + b

    print(f"Result: {add(3, 4)}")
    print(f"Result: {add(3, 4)}")  # Served from the cache

print("Files in values storage:")
!ls .dill_cache/values
() ()
Executing add(3, 4)
Result: 7
Result: 7
Files in values storage:
307ea96b8b27c58a4c93e2e34e3783e475dc0e8ef99b4c5bbfa6b67e2cab7d80
65d52a82c5a72f12ca0499522dc9274a0e6822e1038630ba68f94400b3e4c98f
83ada2198553b88cb3d0882f7fca8c4e9531049b978df3e9e3b5d6301c6c0bfa
BagOfHoldingH5File Storage
The BagOfHoldingH5File storage backend uses the bagofholding library to store data in HDF5 files. This is particularly efficient for large numerical arrays (e.g., NumPy arrays).
[6]:
import numpy as np
from fleche.storage import BagOfHoldingH5File

shutil.rmtree('.boh_cache', ignore_errors=True)

boh_cache = Cache(
    values=BagOfHoldingH5File(root='.boh_cache/values'),
    _calls=PickleFile.with_cloudpickle(root='.boh_cache/calls')  # We can use Cloudpickle for calls
)

with cache(boh_cache):
    @fleche
    def make_array(n):
        print(f"Executing make_array({n})")
        return np.ones((n, n))

    print(f"Array sum: {make_array(5).sum()}")
    print(f"Array sum: {make_array(5).sum()}")  # Served from the cache

print("Files in H5 values storage:")
!ls .boh_cache/values
() ()
Executing make_array(5)
Array sum: 25.0
Array sum: 25.0
Files in H5 values storage:
183708770a6459111eb4effccefdb31bac94cee07d7ea356d8e99b08b8551bcf
5b07a837d91d67764109c11fb912c9b91b4e9d9d4c909fca542e5fbe39ddc244
Sql Storage (Call Storage Only)
The Sql storage backend uses SQLAlchemy to store call metadata in a SQL database (like SQLite). It provides advanced querying capabilities but can only be used for the calls component of a Cache.
[7]:
import os
from fleche.storage import Sql

# Start from a fresh database file
if os.path.exists('calls.db'):
    os.remove('calls.db')

sql_cache = Cache(
    values=Memory({}),
    _calls=Sql(url='sqlite:///calls.db')
)

with cache(sql_cache):
    @fleche
    def power(a, b):
        print(f"Executing power({a}, {b})")
        return a ** b

    power(2, 8)
    power(2, 8)  # Served from the cache

print("Calls in SQL storage:")
print(list(sql_cache.calls.list()))
() ()
Executing power(2, 8)
Calls in SQL storage:
['bbbcff21a09114bee332b0f0da9085948aadc5177ac01dccbf54023426813087']
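To illustrate why a SQL table suits call metadata, here is a small sketch using the standard `sqlite3` module. The table layout, column names, and short keys are all hypothetical: they are not the schema that fleche's Sql backend creates, and the example does not use SQLAlchemy.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per call, pointing at a stored value by key.
conn.execute(
    "CREATE TABLE calls ("
    "key TEXT PRIMARY KEY, name TEXT, arguments TEXT, result_key TEXT)"
)

rows = [
    ("k1", "power", json.dumps([2, 8]), "v1"),
    ("k2", "power", json.dumps([3, 2]), "v2"),
    ("k3", "add", json.dumps([2, 3]), "v3"),
]
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", rows)

# Metadata queries run inside the database, without loading any stored values.
power_keys = [
    row[0]
    for row in conn.execute(
        "SELECT key FROM calls WHERE name = ? ORDER BY key", ("power",)
    )
]
power_args = [
    json.loads(row[0])
    for row in conn.execute(
        "SELECT arguments FROM calls WHERE name = ? ORDER BY key", ("power",)
    )
]
conn.close()

print(power_keys, power_args)
```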
Mix and Match
As seen in the BagOfHoldingH5File and Sql examples, you can mix different backends for values and calls. For instance, you might want to use BagOfHoldingH5File for large values but Sql for calls to enable efficient metadata querying.
[8]:
mixed_cache = Cache(
    values=BagOfHoldingH5File(root='.mixed_cache/values'),
    _calls=Sql(url='sqlite:///.mixed_cache/calls.db')
)

with cache(mixed_cache):
    @fleche
    def compute_heavy(x):
        return np.random.rand(x, x)

    compute_heavy(10)

print(f"Value storage: {type(mixed_cache.values).__name__}")
print(f"Call storage: {type(mixed_cache.calls).__name__}")
() ()
Value storage: DestructuringStorage
Call storage: Sql
Clean Up
Remove temporary cache directories.
[9]:
import os
import shutil

# Remove every cache directory created in this notebook
for directory in ('.pickle_cache', '.compressed_pickle_cache', '.cloudpickle_cache',
                  '.dill_cache', '.boh_cache', '.mixed_cache'):
    shutil.rmtree(directory, ignore_errors=True)

# Remove the SQLite database from the Sql example
if os.path.exists('calls.db'):
    os.remove('calls.db')