Wunderkind & Audiophile
Audiophile
I’ve been tinkering with a DSP that auto‑corrects frequency drift in real time—think of a “cleaner” that keeps every note razor‑sharp. Ever thought about wrapping that in a neural net so it learns to perfect itself? Let’s code a prototype and see what sonic wizardry we can conjure.
Wunderkind
Oh, that’s fire! Picture a tiny neural net peeking into the spectrogram, nudging each bin toward its ideal pitch. We could start with a 128‑band STFT, feed the magnitudes into a lightweight Conv‑1D, then output a correction vector for the next hop. Training on a synthetic library of detuned notes will let it learn the “clean” manifold. Let’s prototype it in PyTorch, hook it up to a real‑time VST, and watch the auto‑toner dance. Ready to hit the code!
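Here’s roughly what that front end could look like; a minimal sketch assuming a 256‑point FFT (which technically gives 129 bins, close enough to the 128‑band idea), with magnitude_frames and apply_correction as made‑up helper names:

import torch

N_FFT, HOP = 256, 128
window = torch.hann_window(N_FFT)

def magnitude_frames(wave):
    # wave: [samples] -> [frames, 1, bins], one magnitude frame per hop
    spec = torch.stft(wave, n_fft=N_FFT, hop_length=HOP,
                      window=window, return_complex=True)  # [bins, frames]
    return spec.abs().T.unsqueeze(1)

def apply_correction(mag, corr):
    # one possible reading of "correction vector": an additive nudge per bin
    # (multiplicative per-bin gains would be a reasonable alternative)
    return (mag + corr).clamp(min=0.0)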
Audiophile
Absolutely—let’s fire up a Jupyter session, write the Conv‑1D encoder‑decoder, train on a synthetic detuned set, and then dump it into the VST wrapper. I’m already spotting the first hiccup with the overlap‑add, but that’s the fun part. Let’s get the prototype running.
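Before wiring the model in, it might be worth sanity‑checking the overlap‑add path on its own. A quick sketch, assuming a Hann window at 50% overlap (which satisfies the COLA condition), to confirm that a bare STFT → iSTFT round trip reconstructs a test tone:

import torch

N_FFT, HOP = 256, 128  # 50% overlap satisfies COLA for a Hann window
window = torch.hann_window(N_FFT)
wave = torch.sin(2 * torch.pi * 440 * torch.arange(4410) / 44100)

spec = torch.stft(wave, n_fft=N_FFT, hop_length=HOP,
                  window=window, return_complex=True)
recon = torch.istft(spec, n_fft=N_FFT, hop_length=HOP,
                    window=window, length=wave.numel())

# should be tiny; if it isn't, the window/hop combo is the hiccup
print("max round-trip error:", (wave - recon).abs().max().item())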
Wunderkind
Alright, here’s a bare‑bones sketch you can drop into a cell and tweak:

import torch
import torch.nn as nn

class ConvAutoCorrelate(nn.Module):
    def __init__(self, n_bins=129, hidden=64):  # a 256-point FFT gives 129 bins
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):  # x shape [B, 1, n_bins]
        return self.decoder(self.encoder(x))

def synth_detuned(freq, sr=44100, duration=0.1, detune=0.005):
    t = torch.linspace(0, duration, int(sr * duration))
    # integrate instantaneous frequency (note the 1/sr step) to get phase
    inst_freq = freq * (1 + detune * torch.randn_like(t))
    phase = torch.cumsum(2 * torch.pi * inst_freq / sr, dim=0)
    return torch.sin(phase)

N_FFT, HOP = 256, 128
window = torch.hann_window(N_FFT)

def mag_frames(wave):
    # [..., samples] -> [total_frames, 1, bins]: one magnitude frame per row
    spec = torch.stft(wave, n_fft=N_FFT, hop_length=HOP,
                      window=window, return_complex=True)
    return spec.abs().transpose(-1, -2).reshape(-1, 1, N_FFT // 2 + 1)

# training loop
model = ConvAutoCorrelate()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# target: clean magnitude frames of an in-tune 440 Hz sine
t = torch.linspace(0, 0.1, 4410)
clean = mag_frames(torch.sin(2 * torch.pi * 440 * t))  # [T, 1, bins]

for epoch in range(200):
    # generate a batch of detuned sine bursts
    batch = torch.stack([synth_detuned(440, detune=0.02) for _ in range(32)])
    mag = mag_frames(batch)              # [32*T, 1, bins]
    target = clean.repeat(32, 1, 1)      # the matching clean frame for each
    pred = model(mag)
    loss = criterion(pred, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# after training, you can hook model.eval() into the VST’s DSP chain:
# for each incoming frame, compute its magnitude STFT, run the model,
# apply the corrections, and overlap‑add the inverse STFT.
# that’s the prototype—time to tweak learning rates, add a residual skip,
# maybe even let the net predict phase corrections next!
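To make that hook‑up concrete, here’s a rough offline stand‑in for the per‑frame VST path. process_buffer is a made‑up name (no VST SDK involved yet), it reuses N_FFT, HOP, and window from the cell above, and it simply recycles the input’s phase for the inverse STFT, a common simplification until phase correction lands:

def process_buffer(model, wave):
    model.eval()
    with torch.no_grad():
        spec = torch.stft(wave, n_fft=N_FFT, hop_length=HOP,
                          window=window, return_complex=True)  # [bins, T]
        mag, phase = spec.abs(), spec.angle()
        frames = mag.T.unsqueeze(1)                # [T, 1, bins]
        corrected = model(frames).squeeze(1).T.clamp(min=0.0)  # [bins, T]
        out_spec = torch.polar(corrected, phase)   # recombine with old phase
        return torch.istft(out_spec, n_fft=N_FFT, hop_length=HOP,
                           window=window, length=wave.numel())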
Audiophile
Nice draft—looks solid. Just remember to normalise the magnitudes before feeding them in; otherwise the network will just learn to scale. Also, a small skip connection from the input to the output could help preserve the baseline spectrum while still letting it learn fine‑grained detune corrections. Once you’ve got the loop working, try feeding in a real guitar clip—it’s the ultimate test of whether your auto‑toner can really “clean” a live signal. Good luck, and watch that loss drop fast!
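For what it’s worth, here’s one way those two suggestions could be folded in; a sketch only, with ResidualAutoCorrelate as an invented wrapper name: per‑frame peak normalisation on the way in, a residual skip around the net, and the scale restored on the way out:

class ResidualAutoCorrelate(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.body = ConvAutoCorrelate(hidden=hidden)

    def forward(self, x):                          # x: [B, 1, bins]
        peak = x.amax(dim=-1, keepdim=True).clamp(min=1e-8)
        x_norm = x / peak                          # per-frame peak normalisation
        # skip connection: the net only learns the detune correction,
        # not the baseline spectrum; undo the scaling afterwards
        return (x_norm + self.body(x_norm)) * peak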