Foundation Models Guide / Chapter 12

Training Custom Adapters

On-device Foundation Models are capable, but sometimes they are not specific enough for your use case. You may need the model to understand your app’s domain, follow particular formatting rules, or adopt a consistent voice that matches your app’s personality. Custom adapters specialize behavior without requiring you to retrain the full model.

I trained my first adapter on an M4 MacBook Air with 24GB RAM, using a toy dataset of playwriting scripts that is included in the adapter training toolkit. In about 70 minutes of training, the adapter learned to generate perfectly formatted theatrical scenes with consistent XML-style markup. Apple later sent me a maxed-out M5 MacBook Pro with 32GB of RAM, which let me rerun the same experiment with some RAM headroom.

This chapter covers Apple’s custom adapter training process from start to finish, with performance metrics, actual training results, and guidance for training on both resource-constrained and well-provisioned machines.

Prerequisites and Context

This chapter builds on the sessions chapter, streaming and snapshots, structured generation, tool use, safety and best practices, and internationalization. Custom adapters are not directly related to these topics, but an understanding of the base framework is important before you invest your time, energy, and money in specialized training.

What You Will Learn

By the end of this chapter, you will be able to:

Determine when custom adapters are worth the investment over advanced prompting
Understand memory usage, training time, and practical performance metrics
Prepare training datasets that capture your specific domain or style
Train adapters using Apple’s toolkit with real hyperparameter guidance
Export and integrate custom adapters into your Foundation Models apps
Evaluate adapter performance with concrete before-and-after examples
Plan for production deployment with version management and asset delivery

Understanding Foundation Models Adapters

Custom adapters are not full model retraining. They are small, specialized layers that modify how the base Foundation Models behave for a specific use case. Apple uses LoRA (Low-Rank Adaptation) where the original model weights stay frozen and only small adapter matrices are trained.

This approach has several advantages:

Faster training: you can train in hours instead of days or weeks
Lower memory requirements: you can train on consumer hardware with optimizations
Smaller file sizes: each adapter is around 160MB
Multiple adapters: you can train different adapters for different tasks
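To make the low-rank idea concrete, here is a minimal pure-Python sketch of a LoRA-style forward pass. It is an illustration only, not the toolkit's implementation: the sizes (a 64-wide layer, rank 4), the helper names, and the zero-initialized B matrix are assumptions chosen for the example.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, scale=1.0):
    """Frozen base output W @ x plus the low-rank update B @ (A @ x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

d, r = 64, 4  # hypothetical sizes: layer width 64, adapter rank 4
rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]  # frozen base weights
A = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(r)]  # trainable, r x d
B = [[0.0] * r for _ in range(d)]                               # trainable, d x r, starts at zero
x = [1.0] * d

# With B at zero the adapter is a no-op, so training starts from base behavior.
assert lora_forward(W, A, B, x) == matvec(W, x)

full_params = d * d
adapter_params = r * d + d * r
print(full_params, adapter_params)  # 4096 512
```

Only A and B would receive gradient updates during training, which is why the trainable parameter count stays a small fraction of the frozen layer.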

The trade-off is that adapters are tied to specific Foundation Models versions. When Apple updates the base model with an OS release, you must retrain your adapters. This is a significant consideration when deciding whether adapters are right for your app.

Each adapter is compatible with a single base system model version. If the adapter version does not match the runtime base model version on a person’s device, the framework raises a runtime error and the adapter cannot load.

When to Consider Custom Adapters

Before committing to adapter training, ask: “Can you solve this with better prompting or tools?”

Most of the time, the answer is yes. The base Foundation Models are good enough when you provide strong instructions and relevant context through tools.

Adapters are worth the investment when you have specialized domain knowledge, such as medical terminology, legal concepts, or technical jargon, that the base model struggles with or does not know at all. You may also need recent information that the on-device base model, which has no internet access, does not have.

Another reason to consider adapters is a consistent style requirement: your app needs responses in a specific format or voice, and achieving that reliably through prompt engineering wastes too many tokens. Adapters also help with repetitive tasks. If you send the same lengthy prompt repeatedly, an adapter can reduce latency and token usage, so more of the user’s prompt fits into the model’s limited context window.

A useful rule of thumb: if you are writing 500+ word prompts to achieve consistent behavior, an adapter might be more efficient.

Training on Resource-Constrained Hardware

The official toolkit documentation recommends 32GB+ for adapter training. I successfully trained on a MacBook Air M4 with 24GB unified memory, but I do not recommend it. The machine was swapping memory constantly, making the training process painfully slow. A fanless machine is also not ideal for training as it gets very hot. Here is the memory usage I observed while training on the toy dataset:

Base model: 13GB
System/PyTorch overhead: 5GB
Total Python process: 25-27GB (peak)
Swap usage: 7GB

Training Time (Toy Dataset)

Epoch 1: 37 minutes 35 seconds (84 training batches)
Epoch 1 Evaluation: 2 minutes 34 seconds (36 validation batches)
Epoch 2: 32 minutes 37 seconds (batch size reductions helped)
Epoch 2 Evaluation: 3 minutes 20 seconds (more thorough)
Total: ~76 minutes for 2 epochs

This was with batch-size 1 and activation checkpointing enabled, purposefully optimized for a 24GB machine.
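The per-phase timings listed above can be summed with a few lines of Python. This sketch uses only the four durations reported for the 24GB run and separates training time from total wall-clock time including evaluation.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to whole seconds."""
    return minutes * 60 + seconds

# Durations reported for the M4 MacBook Air run.
phases = {
    "epoch 1 training": to_seconds(37, 35),
    "epoch 1 evaluation": to_seconds(2, 34),
    "epoch 2 training": to_seconds(32, 37),
    "epoch 2 evaluation": to_seconds(3, 20),
}

training = sum(s for name, s in phases.items() if name.endswith("training"))
total = sum(phases.values())

def fmt(seconds):
    """Format seconds as XmYYs."""
    return f"{seconds // 60}m{seconds % 60:02d}s"

print(fmt(training))  # 70m12s
print(fmt(total))     # 76m06s
```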

Scaling Up with the M5 MacBook Pro

I was able to train the adapter on a machine with more resources. Here are the numbers during training on the toy dataset:

Epoch 1/2
Training: 21/21 batches in 13m50s, loss 0.859
Evaluation: 9/9 batches in 2m04s, loss 0.605

Epoch 2/2
Training: 21/21 batches in 8m47s, loss 0.234
Evaluation: 9/9 batches in 4m55s, loss 0.606

Training loss fell 73% across the two epochs (0.859 -> 0.234) with a relatively flat evaluation curve at ~0.606, mirroring the M4 experiment while finishing the two training epochs in about 23 minutes versus about 70 minutes on the M4.

Setting Up the Training Environment

Apple provides an adapter training toolkit available from the Developer portal. Here is how to set up the environment:

Environment Setup

# 1. Download toolkit from Apple Developer portal
# 2. Create Python virtual environment
python3 -m venv adapter_env
source adapter_env/bin/activate

# 3. Navigate to toolkit
cd /path/to/adapter_training_toolkit_v26_0_0

# 4. Install dependencies
pip install -r requirements.txt

# 5. Test basic generation (verifies setup)
python -m examples.generate --prompt "Hello, world!" --max-new-tokens 20

Training Your First Adapter

For this chapter, I trained an adapter on the toy dataset included in the toolkit, which contains 81 playwriting training examples split between training and evaluation sets. The goal is to specialize the base model to generate consistently formatted theatrical scripts.

The Dataset

The toy dataset contains examples like the following:

[{"role": "system", "content": "A conversation between a user and a helpful assistant. Taking the role as a play writer assistant for a kids' play."}, {"role": "user", "content": "generate a script about The Friendly Dolphin"}, {"role": "assistant", "content": "<center>Act One</center>\n\n<center>Scene 1</center>\n\n<stage>The ocean waves. Day. SUNLIGHT dances on the water's surface.</stage>\n\n<center>DOLPHIN</center>\n\n<dialog>Let us play a game of tag in the water!</dialog>\n\n<center>SEAGULL</center>\n\n<dialog>I love playing tag! But watch out for the waves!</dialog>\n\n<stage>DOLPHIN leaps through the air, playfully splashing SEAGULL.</stage>\n\n<center>SEAGULL</center>\n\n<dialog>You are a great friend, Dolphin!</dialog>"}]

The key point: this dataset is small (81 total examples) but highly consistent. Every response follows the same XML-like theatrical markup:

<center> tags for character names
<stage> tags for stage directions
<dialog> tags for dialogue
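Before training on your own data, it can help to check that every line has the same message structure as the sample record above. The following is a minimal sketch, not part of the toolkit: the function name is hypothetical, and it assumes every record is a system/user/assistant triple like the toy example, which may not hold for your dataset.

```python
import json

# Assumption: each record mirrors the toy example's three-message layout.
EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_record(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        messages = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(messages, list):
        return ["record is not a list of messages"]
    problems = []
    roles = [m.get("role") if isinstance(m, dict) else None for m in messages]
    if roles != EXPECTED_ROLES:
        problems.append(f"unexpected roles: {roles}")
    for m in messages:
        if isinstance(m, dict) and not str(m.get("content", "")).strip():
            problems.append(f"empty content for role {m.get('role')}")
    return problems

sample = json.dumps([
    {"role": "system", "content": "A conversation between a user and a helpful assistant."},
    {"role": "user", "content": "generate a script about The Friendly Dolphin"},
    {"role": "assistant", "content": "<center>Act One</center>"},
])
print(validate_record(sample))  # []
```

Running a check like this over each line of the training and evaluation files catches malformed records before they cost you a long training run.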

Training Configuration

I used the adapter-studio wrapper to orchestrate the training process:

adapter-studio train-adapter \
  --demo \
  --epochs 2 \
  --batch-size 1 \
  --activation-checkpointing \
  --learning-rate 1e-3 \
  --warmup-epochs 1

Here is what each setting does:

--demo: Uses the included toy dataset automatically
--epochs 2: Enough training to see specialization without overfitting
--batch-size 1: Important for 24GB machines, processes one example at a time
--activation-checkpointing: Recomputes activations instead of storing them (trades compute for memory)
--learning-rate 1e-3: Standard rate for adapter fine-tuning, does not dramatically change base model behavior
--warmup-epochs 1: Gradually increases LR in epoch 1 to stabilize training
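To illustrate what a one-epoch warmup means in practice, here is a sketch of a linear warmup schedule. The linear shape and the constant rate afterwards are assumptions made for illustration; the toolkit's actual schedule may differ, for example by decaying the rate after warmup.

```python
def warmup_lr(step, steps_per_epoch, base_lr=1e-3, warmup_epochs=1):
    """Linear warmup to base_lr over the warmup epochs, then constant (assumed shape)."""
    warmup_steps = warmup_epochs * steps_per_epoch
    if warmup_steps == 0 or step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps

# 84 steps per epoch matches the batch-size 1 run described in this chapter.
for step in (0, 41, 83, 100):
    print(step, f"{warmup_lr(step, steps_per_epoch=84):.6f}")
```

With these settings the rate starts near zero, reaches half of 1e-3 midway through epoch 1, and sits at the full 1e-3 for all of epoch 2.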

Training Results

Here are the training results for both epochs:

Epoch 1/2
Training:  100%|==| 84/84 [37:35<00:00, 26.85s/it, loss=0.751]
Evaluation: 100%|==| 36/36 [02:34<00:00,  4.29s/it, loss=0.604]

Epoch 2/2
Training:  100%|==| 84/84 [32:37<00:00, 23.30s/it, loss=0.283]
Evaluation: 100%|==| 36/36 [03:20<00:00,  5.57s/it, loss=0.571]

Here is what this means:

Training loss dropped 62% (0.751 -> 0.283)
Validation loss improved modestly (0.604 -> 0.571), ending above the training loss of 0.283; a gap like this is common on a small dataset
The adapter generated consistently structured theatrical markup in evaluation runs, indicating it learned the format instead of memorizing specific examples
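The percentages quoted for these runs come from a simple relative-change calculation, sketched here using only the loss values reported in this chapter.

```python
def percent_drop(start, end):
    """Relative decrease from start to end, as a percentage."""
    return (start - end) / start * 100

print(f"{percent_drop(0.751, 0.283):.1f}%")  # 62.3% (24GB run, training loss)
print(f"{percent_drop(0.604, 0.571):.1f}%")  # 5.5%  (24GB run, validation loss)
print(f"{percent_drop(0.859, 0.234):.1f}%")  # 72.8% (32GB run, training loss)
```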

The adapter checkpoints are saved to:

/path/to/toolkit/checkpoints/demo_20251022_161532/
├── adapter_epoch_1.pt
├── adapter_epoch_2.pt  (best one)
└── adapter-final.pt    (ready to use)

Testing Your Adapter: Before and After

To see the difference, compare the same prompt before and after adapter training. For the prompt “Write a script about a friendly robot”:

Base Model Response

**Title: Andy: A Friendly Robot Adventure**

**INT. SMALL APARTMENT - DAY**

*(The scene opens in a cozy, cluttered apartment bustling with youthful energy. A friendly robot named ANDY, with rounded silver panels...

Notice: Generic screenplay format with asterisks for stage directions. No consistent markup. The model defaulted to what it “knows” about screenplays in general.

Adapter-Trained Response

Response:
<center>Scene 1</center>
<stage>A manufacturing plant. Robots assemble products.</stage>
<center>ROBOT</center>
<dialog>I am known for my friendly demeanor and efficient operations.</dialog>
<center>AUDIENCE MEMBER</center>
<dialog>Such a helpful machine!</dialog>

Notice: Perfectly formatted with consistent XML-style tags. The adapter learned the exact structure from training data and applies it reliably.
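Eyeballing two responses works for a demo, but a small checker makes the comparison repeatable across many prompts. The sketch below is an assumption-laden illustration rather than toolkit code: the function names are hypothetical, and it treats the three tags from the toy dataset as the only valid markup and disallows nesting.

```python
import re

ALLOWED_TAGS = {"center", "stage", "dialog"}
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z]+)>")

def markup_problems(text):
    """Report unknown, unbalanced, or nested tags in a generated script."""
    problems = []
    open_tag = None
    for match in TAG_PATTERN.finditer(text):
        closing, name = match.group(1) == "/", match.group(2)
        if name not in ALLOWED_TAGS:
            problems.append(f"unknown tag <{name}>")
        elif not closing:
            if open_tag is not None:
                problems.append(f"<{name}> opened inside <{open_tag}>")
            open_tag = name
        elif open_tag != name:
            problems.append(f"</{name}> without matching open tag")
        else:
            open_tag = None
    if open_tag is not None:
        problems.append(f"<{open_tag}> never closed")
    return problems

def uses_markup(text):
    """True when the text contains at least one tag and no problems."""
    return bool(TAG_PATTERN.search(text)) and not markup_problems(text)

adapter_output = "<center>ROBOT</center>\n<dialog>Such a helpful machine!</dialog>"
base_output = "**INT. SMALL APARTMENT - DAY**"
print(uses_markup(adapter_output), uses_markup(base_output))  # True False
```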

Using the Adapter Studio CLI

The adapter-studio CLI simplifies the training workflow by handling subcommands, config management, and validation.

It replaces the raw Python commands, but you still need to download the toolkit and configure its path. Download link: https://developer.apple.com/download/foundation-models-adapter/

The general workflow with the adapter-studio CLI is:

Initialize the toolkit
Test the base model
Train the adapter
Test the trained adapter
Export the adapter
Train the draft model (optional)
Test the draft model (optional)
Export the draft model (optional)

# Initialize (download toolkit, set up venv, etc.)
adapter-studio init  # Configure toolkit path
adapter-studio setup # Create Python venv, install deps

# Test base model
adapter-studio demo --prompt "test prompt"

# Train adapter
adapter-studio train-adapter --demo --epochs 2 --batch-size 1 --activation-checkpointing

# Test trained adapter
adapter-studio generate \
  --prompt "Write a script about a friendly robot" \
  --checkpoint /path/to/adapter-final.pt

# Export to .fmadapter
adapter-studio export \
  --adapter-name playwriting \
  --checkpoint /path/to/adapter-final.pt \
  --output-dir ./exports/ \
  --author "Your Name" \
  --description "Trained to generate theatrical scripts in XML-style markup"

# Optional: Train draft model for inference speedup
adapter-studio train-draft \
  --checkpoint /path/to/adapter-final.pt \
  --train-data toy_dataset/playwriting_train.jsonl \
  --eval-data toy_dataset/playwriting_valid.jsonl

Python Version for Export

Export requires Python 3.12 or earlier. The coremltools library does not have binary wheels for Python 3.13+ on macOS arm64, causing ModuleNotFoundError: No module named 'coremltools.libmilstoragepython'.

Create the toolkit venv with Python 3.12 specifically:

# Ensure you have Python 3.12
brew install python@3.12

# Create venv with Python 3.12
python3.12 -m venv /path/to/toolkit/venv
source /path/to/toolkit/venv/bin/activate
pip install -r requirements.txt --prefer-binary

This does not affect your system Python (which can stay at 3.13 or later). The toolkit venv is isolated.
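If you want to catch the wrong interpreter before a long export attempt, a small guard like the following can run at the top of your own wrapper script. It is a sketch: the function name and the warning text are hypothetical, and the 3.12 ceiling simply encodes the limitation described above.

```python
import sys

# Newest Python version this chapter reports as working for export.
MAX_EXPORT_VERSION = (3, 12)

def export_supported(version=None):
    """True when the interpreter's major.minor version is 3.12 or earlier."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor <= MAX_EXPORT_VERSION

if not export_supported():
    print("Warning: export may fail on this Python version; use a 3.12 venv.")
```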

Exporting the Adapter

The exported adapter is an .fmadapter package, a self-contained bundle ready to use in Xcode:

Exporting adapter to .fmadapter format...
...
Adapter saved at /Users/rudrankriyam/Downloads/adapter_exports/playwriting.fmadapter.
Adapter exported successfully to: /Users/rudrankriyam/Downloads/adapter_exports/playwriting.fmadapter

The .fmadapter package contains:

playwriting.fmadapter/
├── adapter_weights.bin  (127MB - trained adapter weights)
└── metadata.json        (adapter metadata: author, description, etc.)

Shipping and App Size

Each adapter occupies roughly 160 MB on disk. Do not ship adapters in your main app bundle. As soon as you include multiple adapters or versions, your binary size grows quickly, and users may choose not to install or update your app.

Treat adapters like other large, optional assets:

Host adapters on your server or CDN.
Use the Background Assets framework to download exactly one adapter that is compatible with the user’s device and OS.
Keep versioning separate from your app release so you can rotate or revoke adapters without a full app update.

Entitlements and Device Support

Production apps require the entitlement com.apple.developer.foundation-model-adapter to enable custom adapters on device. However, you do not need this entitlement to train or to locally preview adapters in Xcode.

While Foundation Models can run in simulator and even in SwiftUI previews, testing adapters requires a physical device.

Adapter Studio: Side-by-Side Evaluation on macOS

I wrote an open-source macOS app called Adapter Studio that lets you import an .fmadapter package and compare its responses against the baseline model side by side.

You can run the app from Xcode or build it from the source code. Here are the features:

Run live comparisons by entering one prompt and watching both the system model and adapter responses stream in parallel with timing metrics
Inspect adapter context by reviewing file metadata, swapping adapters, or revealing them in Finder without leaving the app
Measure latency by tracking time-to-first-token and total duration so regressions surface immediately
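The two latency metrics are easy to define precisely. The sketch below shows one way to measure them over any stream of text chunks; it is not Adapter Studio's code, and the function name and the injectable clock parameter are assumptions made so the example is self-contained and testable.

```python
import time

def measure_stream(chunks, clock=time.perf_counter):
    """Consume an iterable of text chunks; return (text, ttft, total) in seconds."""
    start = clock()
    first = None
    parts = []
    for chunk in chunks:
        if first is None:
            first = clock()  # the moment the first chunk arrived
        parts.append(chunk)
    end = clock()
    ttft = None if first is None else first - start
    return "".join(parts), ttft, end - start

text, ttft, total = measure_stream(iter(["<center>", "ROBOT", "</center>"]))
print(text)  # <center>ROBOT</center>
```

Running the same prompt through the base model and the adapter and comparing the two tuples gives the kind of side-by-side timing the app displays.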

Loading Adapters in Your App

For local testing, keep your .fmadapter outside the project directory. In Finder, select the file and press Option + Command + C to copy its absolute path, then initialize with a file URL:

import FoundationModels

let localURL = URL(filePath: "/absolute/path/to/my_adapter.fmadapter")
let adapter = try SystemLanguageModel.Adapter(fileURL: localURL)
let model = SystemLanguageModel(adapter: adapter, guardrails: .default)

// Check the adapted model itself, not only the base model
guard model.isAvailable else {
    // Fallback to a non-adapted model or a user message
    throw NSError(domain: "Adapter", code: 1)
}

let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "Your prompt here")

There are two ways to load adapters in your app:

Local Testing (file URL)

Use the file-based initializer only for local validation. Remove local adapter references before publishing your app.

Production Loading (Background Assets)

In production, initialize by name and let the system download a compatible adapter asset on demand. Remove obsolete adapters at launch, check availability, and provide a fallback when unavailable:

import BackgroundAssets
import FoundationModels

// Reclaim space and avoid mismatched versions
try SystemLanguageModel.Adapter.removeObsoleteAdapters()

// Initialize by base name (no extension). If no adapter is present,
// the system begins a background download of a compatible asset pack.
let adapter = try SystemLanguageModel.Adapter(name: "playwriting_adapter")

// Optionally, track download status and update UI before prompting.
let model = SystemLanguageModel(adapter: adapter, guardrails: .default)

// Check the adapted model itself, not only the base model
guard model.isAvailable else {
    // Fall back to the base model (if it is available) or defer the feature,
    // for example: LanguageModelSession(model: SystemLanguageModel.default)
    // Inform the user or queue the request
    throw NSError(domain: "Adapter", code: 2)
}

let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "Write a script about a friendly robot")

Background Assets Quickstart

To download adapters at runtime, you need to add a Background Assets downloader extension in Xcode. You need to choose the hosting type:

Apple-Hosted, Managed: BAHasManagedAssetPacks=YES, BAAppGroupID, BAUsesAppleHosting=YES
Self-Hosted, Managed: BAHasManagedAssetPacks=YES, BAAppGroupID

Then, in your downloader, only allow compatible adapter packs:

func shouldDownload(_ assetPack: AssetPack) -> Bool {
    // Filter other assets if needed, then gate on compatibility
    return SystemLanguageModel.Adapter.isCompatible(assetPack)
}

Finally, to drive your loading UI, track status updates for the compatible adapter identifier and wait for .finished before prompting:

func checkAdapterDownload(name: String) async -> Bool {
    let ids = SystemLanguageModel.Adapter.compatibleAdapterIdentifiers(name: name)
    guard let id = ids.first else { return false }
    for await status in AssetPackManager.shared.statusUpdates(forAssetPackWithID: id) {
        switch status {
        case .finished(_): return true
        case .failed(_, _): return false
        default: break
        } 
    }
    return false
}

Compile the Draft Model (optional)

If your adapter includes a draft model for speculative decoding, compile it to speed up inference. Schedule compilation with Background Tasks, and expect rate limiting (three compilations per app per day on iOS, iPadOS, and visionOS; no limit on macOS):

do {
    try await adapter.compile()
} catch {
    // Handle compilation error and continue with uncompiled adapter if necessary
}

Locale and Language Support

You can also gate adapter selection by language or locale. Query SystemLanguageModel.default.supportedLanguages and SystemLanguageModel.default.supportsLocale(_:) before prompting, and prefer a locale-appropriate adapter when you host variants. This follows the guidance in the internationalization chapter.

Integration Checklist

Here is a checklist for integration:

Request the com.apple.developer.foundation-model-adapter entitlement
Package adapters as Background Assets; do not ship adapters in the main app bundle
Add a downloader extension and required Info.plist keys
Call SystemLanguageModel.Adapter.removeObsoleteAdapters() at launch to reclaim space and avoid mismatched versions
Initialize adapters by name and track download status before prompting
Use guardrails and availability checks; provide a non-adapter fallback
Optionally compile the draft model in a background task to speed up inference

What’s Next

You now understand how to train custom adapters and integrate them into Foundation Models apps. Start with the base model, refine with prompting and tools, then train adapters for genuinely domain-specific behavior that cannot be achieved through instruction alone. Adapters remain tied to Foundation Models versions, so plan for retraining whenever an OS update introduces a new base model.