Coordination

Projects

Tasks

Meeting Notes

Weekly Meetings

We hold a call over Google Meet on Wednesdays from 4:30 PM to roughly 5:00 PM PST. This time is used to resolve blockers, brainstorm, and clarify our direction. Anyone is welcome, even if you’re just interested in listening in. The call isn’t required, but contributing researchers are encouraged to attend. Video call link: https://meet.google.com/jtk-bkdo-jgz

Motivation

Memorization refers to language models' tendency to sometimes output entire training sequences verbatim. This phenomenon is not well understood, but it has clear implications for deploying language models safely. In particular, it is vital to minimize a model’s memorization of sensitive data points, such as those containing personally identifiable information (PII) or trade secrets.

This project aims to challenge this traditional definition of memorization. We believe it captures the spirit of the problem but is too broad. For example, the k-elicitable definition (Carlini et al., 2022) treats highly repetitive text, code, and sequences with only a single plausible continuation as memorized and therefore undesirable. We conjecture that traditional definitions capture too many of these benign cases and fail to isolate the memorization that is actually undesirable. A rough sketch of the kind of test such definitions imply is given below.
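To make the definition concrete, here is a minimal sketch of an extractability-style memorization check in the spirit of Carlini et al. (2022): prompt the model with the first k tokens of a training sequence and ask whether greedy decoding reproduces the next m tokens verbatim. This is not the paper's code; the model choice (GPT-2 via Hugging Face transformers), the function name, and the default values of k and m are illustrative assumptions.

from transformers import AutoModelForCausalLM, AutoTokenizer

def is_memorized(text: str, k: int = 50, m: int = 50,
                 model_name: str = "gpt2") -> bool:
    """Sketch of a k-token-prefix memorization test (illustrative, not canonical)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    token_ids = tokenizer(text, return_tensors="pt").input_ids
    if token_ids.shape[1] < k + m:
        # Sequence is too short to split into a k-token prompt and m-token continuation.
        return False

    prompt = token_ids[:, :k]          # first k tokens used as the prompt
    target = token_ids[:, k:k + m]     # the true next m tokens from the sequence

    # Greedy decoding: the sequence counts as memorized only if the model's
    # argmax continuation matches the original continuation token for token.
    output = model.generate(prompt, max_new_tokens=m, do_sample=False)
    continuation = output[:, k:k + m]
    return bool((continuation == target).all())

Note how blunt this test is: any sequence whose continuation is essentially forced (boilerplate licenses, repeated code, number sequences) passes it, which is exactly the over-breadth this project wants to address.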

Figure: Archetypal examples of sequences from The Pile “memorized” by GPT-2, even though GPT-2 was not trained on The Pile. This implies either that there is training-set overlap, or that there are sequences most competent language models could predict without ever seeing them during training. (Carlini et al., 2022)


Why Does This Matter?

Potential Research/Paper Contributions