Adding per-user persistent memory to a chat application is the most common Spectron integration. This guide covers the two integration shapes – Spectron-driven and caller-driven – and shows how to inject memory into system prompts and how memory accumulates across multiple sessions.
The pattern
The core pattern is:
One session per conversation – scoped to the user's identifier.
Profile injection – retrieve the user's accumulated memory and prepend it to the system prompt before each LLM call.
Turn recording – after each exchange, record both the user and assistant turns so the extraction pipeline can update memory.
Memory builds up across sessions automatically. The second time a user starts a conversation, the profile already contains facts from previous sessions.
Integration shape 1 – Spectron drives the loop
Use this shape when you want the simplest possible integration and are comfortable letting Spectron manage the LLM calls. You provide an agent_fn callback; Spectron calls it, records the result, and runs extraction.
async def llm_call(messages: list[dict]) -> str:
    # Build a memory-aware system prompt
    user_id = messages[0].get("user_id")  # passed as metadata
    profile = await memory.profile(scope={"user": user_id})
    system = "You are a helpful assistant."
    if profile.summary:
        system += f"\n\n{profile.summary}"
    return your_llm(system=system, messages=messages)

# Per conversation
session = await memory.sessions.create(
    scope={"user": user_id},
    agent_fn=llm_call,
)

# Each user message
result = await session.chat(content=user_message)
reply = result.response
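Integration shape 2 – the caller drives the loop
async def handle_message(user_id: str, session_id: str | None, user_message: str) -> str:
    # Re-open existing session or create a new one
    if session_id:
        session = memory.sessions.open(session_id)
    else:
        session = await memory.sessions.create(scope={"user": user_id})

    # Retrieve the user's accumulated profile (same call as in shape 1)
    profile = await memory.profile(scope={"user": user_id})
    # `ctx` holds the relevant memory for this message; its retrieval call
    # is not shown in this listing and must run before the prompt is built.

    # Build system prompt
    system = "You are a helpful assistant."
    if profile.summary:
        system += f"\n\n## User context\n{profile.summary}"
    if ctx.formatted:
        system += f"\n\n## Relevant memory\n{ctx.formatted}"

    # Your LLM call
    response = your_llm(system=system, user=user_message)

    # Record the exchange
    await session.turn(role="user", content=user_message)
    await session.turn(role="assistant", content=response)
    return response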
In TypeScript, the prompt-building step looks the same:
let system = "You are a helpful assistant.";
if (profile.summary) system += `\n\n## User context\n${profile.summary}`;
if (ctx.formatted) system += `\n\n## Relevant memory\n${ctx.formatted}`;
The key difference between the two shapes is ownership of the LLM call. The caller-driven shape is usually the right choice for existing applications because it requires no changes to the core call path – you add memory injection before and recording after.
Injecting profile into the system prompt
The profile endpoint is designed for system prompt injection. It returns a summary string that is dense and LLM-readable, plus a structured attributes list if you want fine-grained control.
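As a rough sketch of how the two return fields might be used together, the helper below builds a system prompt either from the summary string or from a filtered subset of attributes. The `Profile` and `Attribute` dataclasses are hypothetical stand-ins rather than Spectron's actual types: this guide only states that the profile exposes a summary and a structured attributes list, so the `key` and `value` field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attribute:
    # Hypothetical shape: the guide does not specify attribute fields.
    key: str
    value: str


@dataclass
class Profile:
    # Stand-in for the object returned by the profile endpoint.
    summary: str = ""
    attributes: list[Attribute] = field(default_factory=list)


def build_system_prompt(
    base: str,
    profile: Profile,
    include_keys: Optional[set[str]] = None,
) -> str:
    """Append the profile summary, or a filtered attribute list, to a base prompt."""
    if include_keys is not None:
        # Fine-grained control: inject only the selected attributes.
        lines = [
            f"- {a.key}: {a.value}"
            for a in profile.attributes
            if a.key in include_keys
        ]
        if lines:
            return base + "\n\n## User context\n" + "\n".join(lines)
        return base
    if profile.summary:
        # Default: inject the dense summary string as-is.
        return base + "\n\n## User context\n" + profile.summary
    return base
```

Passing `include_keys` is useful when the full summary would spend prompt tokens on facts that are irrelevant to the current task; leaving it unset falls back to the summary.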
Memory is not session-scoped – it is user-scoped. Every session with the same user scope dimension feeds into the same pool of entities and attributes.
Consider a user who has three conversations over a week:
Session 1: "I work at Acme Corp as a backend engineer." → extracts employer: Acme Corp, role: backend engineer.
Session 3: "I just moved to the platform team." → updates role: platform engineer with a supersession chain.
By session 3, the profile contains all three facts. The role recorded in session 3 supersedes the role extracted in session 1, but the old value is preserved in the supersession chain for auditability.
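One way to picture a supersession chain is as an attribute that keeps its replaced values alongside the current one. The sketch below is purely illustrative: `AttributeHistory`, `UserMemory`, and their methods are hypothetical names, and Spectron's internal representation may well differ.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeHistory:
    """Current value of one attribute plus the values it superseded."""
    current: str
    current_session: int
    # (session number, old value) pairs, oldest first
    superseded: list[tuple[int, str]] = field(default_factory=list)

    def supersede(self, value: str, session: int) -> None:
        # Keep the old value for auditability instead of overwriting it.
        self.superseded.append((self.current_session, self.current))
        self.current, self.current_session = value, session


@dataclass
class UserMemory:
    """All attributes for one user scope, shared by every session."""
    attributes: dict[str, AttributeHistory] = field(default_factory=dict)

    def record(self, key: str, value: str, session: int) -> None:
        existing = self.attributes.get(key)
        if existing is None:
            self.attributes[key] = AttributeHistory(value, session)
        elif existing.current != value:
            existing.supersede(value, session)


user_memory = UserMemory()
user_memory.record("employer", "Acme Corp", session=1)
user_memory.record("role", "backend engineer", session=1)
user_memory.record("role", "platform engineer", session=3)

role = user_memory.attributes["role"]
print(role.current)     # platform engineer
print(role.superseded)  # [(1, 'backend engineer')]
```

Because the store is keyed by user rather than by session, the session number only appears as provenance on each value.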
Querying accumulated memory
To see what Spectron currently knows about a user:
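A minimal sketch, assuming the same `memory.profile(scope=...)` call used earlier in this guide. `FakeMemory` is a hypothetical in-process stand-in so the example runs without a Spectron client, and the attribute field names (`key`, `value`) are assumptions.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Stand-in for the profile object; field layout is assumed.
    summary: str = ""
    attributes: list[dict] = field(default_factory=list)


class FakeMemory:
    """Hypothetical stand-in for a Spectron client; mirrors only the profile() call shape."""

    def __init__(self, profiles: dict[str, Profile]) -> None:
        self._profiles = profiles

    async def profile(self, scope: dict) -> Profile:
        # Unknown users get an empty profile rather than an error (an assumption).
        return self._profiles.get(scope["user"], Profile())


async def describe_user(memory, user_id: str) -> str:
    """Return the summary followed by one line per known attribute."""
    profile = await memory.profile(scope={"user": user_id})
    lines = [profile.summary or "(no summary yet)"]
    for attr in profile.attributes:
        lines.append(f"{attr['key']}: {attr['value']}")
    return "\n".join(lines)


memory = FakeMemory({
    "user-123": Profile(
        summary="Platform engineer at Acme Corp.",
        attributes=[
            {"key": "employer", "value": "Acme Corp"},
            {"key": "role", "value": "platform engineer"},
        ],
    )
})
report = asyncio.run(describe_user(memory, "user-123"))
print(report)
```

With a real client, `describe_user` would take the client in place of `FakeMemory`, provided its `profile` call has the shape shown earlier in this guide.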