The day DJing changed — source separation for djay Pro AI and VirtualDJ

I had an entirely different article planned, one that told the story of my two hour pre-release djay Pro AI sofa mixing session. It started like this:

“Bugger me — it’s 1.30 am”.

Turning to my better half on the sofa, I assured her it’s not a timely request but an exclamation.

“I know. I left you to play. I haven’t seen you this happy in a long time”.

I had only expected one game-changing announcement that day, but then VirtualDJ green-lit their own take on precisely the same thing. Thus that particular story arc got spiked.

So what happened? Well, Algoriddim announced djay Pro AI for iOS and iPadOS, a ground-up rebuild complete with audio source splitting tech dubbed Neural Mix. And faced with little choice, Atomix released VirtualDJ 2021 for macOS and Windows, which offers the very same feature: real-time audio source splitting into stems.

And the DJ game… changed forever? Will never be the same again? Time will tell, but such hyperbolic marketing phrases do seem somewhat appropriate as I write this.

This is not a glib statement dished out for dramatic effect: the introduction of music source splitting just changed everything. And while it’s early days (actually still hours at the time of writing), you’re witnessing the next real revolution. I’ll explain why shortly.

As I’m stepping away from pure DJ news reporting, I’ll let you discover the respective djay Pro AI and VirtualDJ 2021 news for yourself. Instead, I want to take you down a different path, to explain how we got here, and why this is so damned important.

DISCLAIMER — I’ve been Algoriddim’s video making guy for years. This however doesn’t stop me having an independent opinion.

In the beginning

It has long been my contention that everything that needs to be done by DJs has been rinsed to the nth degree via DJ technology, and that the next real innovation will happen with music. This manifested itself with the lurch towards streaming services, and the plumbing in of said services into the usual suspects’ software, and now with Denon DJ directly into hardware.

But we’re talking about individual track manipulation rather than the delivery of them. We’ve always been able to screw around with our music in all manner of ways, either directly in software to create new versions, or while performing via loops, hot cues, samples, effects, and filters.

Let’s talk about Spleeter

For DJing, this is a whole new ballgame, but one that has developed over time across a number of products. Most recently, Deezer’s Spleeter technology made huge waves with real demonstrations of actively extracting decent stems. Not the cleanest of stems, you understand, but nobody would argue that this was anything less than audio sorcery.

The problem was usage. You couldn’t just download an app and extract away: this was command line stuff. Hell, people struggle with the App Store, let alone navigating arcane instructions on GitHub.

But the real joy is that it was open-source, meaning that anyone could use it. And while Algoriddim isn’t expressly saying they’re not using Spleeter, Atomix goes out of its way to stress that their take is all their own work. Let’s see if it was worth it should Algoriddim’s patent application get approved.

The fundamental difference between old methods and this new one is immediacy. Being baked into performance software means that this all happens live. No prep is needed, and it works with any audio source, including streamed music. I happily smashed out two hours’ worth via TIDAL with djay Pro AI without a single hitch.

But it is transient and temporary. There’s no extraction of stems to a saved file — record labels and streaming platforms might have an issue with that. But there’s nothing stopping you recording your output in real-time. Ugh — how positively archaic.

The same but different

On the face of it, Algoriddim and Atomix just announced much the same thing. And on one level, yes that’s true. And while the end result is largely the same, the implementation and target audiences are quite different.

Firstly, djay Pro AI calls it Neural Mix, and it works by isolating beats, instruments, and vocals. You have full control over these independently on all four decks, but in two-deck mode, you can solo, mute, or swap sources with the other deck. Or you can combine beats and instruments, or instruments and vocals, to switch or fade between them for an instant a cappella or beats. There are also options for viewing the source waveforms.
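For the curious, here is a rough way to picture what solo, mute, and swap amount to once a track has been split. This is purely my own toy sketch in Python, not how Algoriddim actually does it: the `Deck` class, the `swap_stem` helper, and the idea of stems as plain lists of samples are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

# The three sources named in the article.
STEMS = ("beats", "instruments", "vocals")


@dataclass
class Deck:
    # Hypothetical: stem name -> list of already-separated samples.
    stems: dict
    # One gain per stem; 1.0 is fully on, 0.0 is muted.
    gains: dict = field(default_factory=lambda: {s: 1.0 for s in STEMS})

    def mute(self, stem):
        self.gains[stem] = 0.0

    def solo(self, stem):
        # Keep one stem, silence the others.
        for s in STEMS:
            self.gains[s] = 1.0 if s == stem else 0.0

    def output(self):
        # The deck's audio is just the gain-weighted sum of its stems.
        length = len(self.stems[STEMS[0]])
        return [
            sum(self.gains[s] * self.stems[s][i] for s in STEMS)
            for i in range(length)
        ]


def swap_stem(deck_a, deck_b, stem):
    # Exchange one source between two decks, e.g. put A's vocal over B's beats.
    deck_a.stems[stem], deck_b.stems[stem] = deck_b.stems[stem], deck_a.stems[stem]


deck_a = Deck({"beats": [1, 1], "instruments": [2, 2], "vocals": [4, 4]})
deck_b = Deck({"beats": [10, 10], "instruments": [20, 20], "vocals": [40, 40]})
```

Mute the vocal on `deck_a` and you have an instrumental; solo it and you have an a cappella; swap it and `deck_b` plays its own beats under the other track's vocal. The clever part, of course, is the separation itself, which this sketch takes as given.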

It’s very simple but incredibly powerful when you experience it for yourself. Of the two, it’s the instant gratification option, implemented in the most uncomplicated way. It just works, and it works instantly too, even on my comparatively elderly iPhone 7 Plus.

Look at the pads and EQs. 

VirtualDJ, however, takes the base function of splitting source audio and creates its own more expansive take on it. It takes more of a complementary EQ approach to things, with an added stem on/off pad mode too. Interestingly, VirtualDJ splits the song into five stems (vocals, instruments, bass, kick, and hi-hat) and combines them for different modes.

Atomix does stress that to get the instant feel of real-time source splitting, you’ll need a Mac or PC with some grunt. My 2014 MacBook Pro does work pretty well, but it’s not quite the absolutely instant feel of djay Pro AI.

From what I can tell, the big difference is that djay Pro AI quite literally does it in real time, i.e. it starts analysing chunks at the playhead, whereas VirtualDJ analyses the whole track, hence needing the powerful machine to deliver that necessary instant real-time response.
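To show why that distinction matters for the instant feel, here is a toy Python sketch of the two approaches as I understand them. It is an assumption-laden illustration rather than either company's actual method: `toy_separate` is a made-up stand-in for the real separation model, and the chunk size and playhead values are arbitrary.

```python
def toy_separate(samples):
    """Made-up stand-in for a real separation model: returns two fake stems."""
    half = [s * 0.5 for s in samples]
    return half, list(half)


def whole_track(track):
    # Everything is processed up front, so the work done before the first
    # playable stem sample is the length of the entire track.
    stems = toy_separate(track)
    return stems, len(track)


def chunked_from_playhead(track, playhead, chunk_size):
    # Only the chunk at the playhead is processed before audio can start;
    # later chunks are separated as playback reaches them.
    for start in range(playhead, len(track), chunk_size):
        yield start, toy_separate(track[start:start + chunk_size])


track = list(range(10))
_, work_before_audio = whole_track(track)
first_start, first_stems = next(chunked_from_playhead(track, playhead=4, chunk_size=3))
```

In the whole-track version, ten samples get processed before anything is ready; in the chunked version, only three. Scale that up to a six-minute track and you can see why one approach wants a machine with some grunt.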

Out of the starting gates, djay Pro AI’s version is more polished, easier to understand and delivers that instant feel. VirtualDJ feels like it needs a little more work, but the five-way stem extraction may well be a winner for some. Make it separate out kick and snare/clap and that’s me sold. They’re obviously continuously developing it as it’s had three updates since launch.

Ultimately, they’re two very different animals. One is on the iOS/iPadOS platform and the other is macOS/Windows. One is simple, the other does more. Let the stems race begin.

Check out this djay Pro AI mix courtesy of Crossfader

BUT HOW DOES IT SOUND?

This, to quote football pundits, is definitely a game of two halves. The very first time you try either software, you get a genuine wow feeling, as if everything you knew about DJing just changed and you can’t go back. 

The experience of removing your first vocal or making that long-yearned-for instrumental is epic. And having done it repeatedly over the last two weeks or so, it’s a feeling that shows no signs of going away any time soon.

That said, the more you try it out, the more you realise that this is early days for this technology. There will be numerous ways to implement wrangling of stems, but the key is the quality of the output.

Right now it’s a mixed bag, but that is only to be expected. The source material matters, so music with space around the drums, instruments, and vocals will clearly deliver the best results.

Vocals clearly work best. Even pushing some very angry Sepultura through them pulled out a pretty clean vocal. But there’s no denying the slightly reverse reverb feel to the stems. But I stress that this is when sitting down and listening for such things with a single track. When mixing, it’s much less pronounced — it’s like your brain hears something familiar in a mix and uses real intelligence to fill in the audible gaps in quality.

It would be folly to expect true stems level quality from every track any time soon. But it will get better, and quickly too. It just needs some machines to do some more learning and to feed that right into the AI that’s driving this stuff, or however the hell this is working anyway.

But there’s no two ways about it — even at this fledgeling stage, this technology is pretty bloody magical. And as it develops, the possibilities become unfathomable.

An interesting thought to ponder: given that SoundCloud is about to be flooded with pretty shitty mashups made by DJs trying to be producers using this tech (wanna hear my Good Times/Another One Bites The Dust mashup? No?), will it make the music industry think about monetising real stems at last? Or is that just too progressive a thought for them to comprehend? Will streaming takedowns be impacted because this newfangled layering of stems will confuse the algorithms?

But what about other software? I’d say Pioneer DJ’s rekordbox is already tiptoeing through the idea of deeper track analysis with the recently announced vocal position feature. Native Instruments will probably be devastated that free stems will kill off their real Stems project, although I’d argue that they were never really into it anyway.

And already commenters are looking at Serato to respond. I doubt they’ll be too worried about people migrating from Serato DJ Pro to djay and VirtualDJ though, but it could turn the heads of newer DJs less entrenched in a particular ecosystem.

BEYOND SOFTWARE

The immediate problem is making it work with hardware. Short term, you can shift-map controllers, but that’s a quick workaround rather than a proper solution.

The knock-on effect will be a slew of new full controllers designed to give direct hardware access to these new features. Having just been presented with a new cash cow, I’m sure the industry is collectively rubbing its hands together. Product managers will be working out how to implement these new features in hardware form as we speak.

I urge caution to the industry — not everyone will want to use this source splitting tech at this point, so don’t rush to deliver it to everyone. Take baby steps to deliver some modular controller solutions and see what sticks first. This might not be the technology to quickly cram into your range just yet.

For most, especially mixer users, it might be wise to lay hands on a modular controller like a Kontrol X1 or Korg Nano and play with this new feature for yourself. And should this feature take off, keeping up with the Joneses and one-upmanship dictate that hardware churn will be equally rapid.

Also, given the march towards standalone performance, it’s just a matter of time before we see this happening in hardware too. It looks like a potential update to the already stunning Denon DJ SC6000/M Primes just got handed its next major USP. Perhaps I’ll wait for whatever those will be to arrive.

The promo video I made for Algoriddim. No explanations — just that oh wow moment.

Summing up

Wearing my editor’s hat, it’s been a while since I had an oh shit moment. And while a few bits of hardware have delivered that in recent years, none of them has been a real revolution. You can count those on your fingers over the last couple of decades.

But for me, source separation offers the next real shift in how we play music to a crowd. When we look back at things we now take for granted, they all started somewhere, and were pretty bloody awful by modern standards, and should by all rights have been doomed to fail. 

At the start, DVS latency felt more like a delay effect than a feature. But the promise was so strong that people stuck with it. And I feel the same about this.

Does it deliver studio-grade stems? Of course not. It would be unwise for anyone to say it never will, though, because we’re only seven months past the launch of Spleeter, and those algorithms can only improve dramatically, just as digital audio compression and DVS latency did.

But it’s a start, and a bloody good one too. I sat completely lost in music for two hours not even noticing the diminished quality. It was just pure unadulterated fun and made me think more about possibilities than it did the sound.

The biggest test of all is not our DJ ears, but those of the audience. They’ll soon tell you if it isn’t good enough. And sure, a discerning audience only happy with lossless recordings being played via a £5K rotary through a Void sound system won’t be having any of that nonsense. But even in these early days, I’m certain that your average audience will love being regaled with vocals over beats that previously were strangers. Cheap beer and a few pills can make anything sound a-maaay-zing maaate.

I haven’t been this excited for the DJ future in a very long time. It’s finally going in the direction I’ve wanted it to for years.