Cool. Generation of symbolic music using transformers is indeed a pretty neglected field. I assume you have to encode more "musical knowledge" into the embeddings than when you "just" compress waveforms. Can you provide information on your embeddings?
Hi, it is actually not using transformers; those would be too slow. It uses a combination of CNNs and linear layers. Correct, it uses embeddings, not waveforms or spectrograms. The inputs are MIDI files, some of which I made myself in FL Studio. The model creates a "latent representation" from each MIDI file, and I can then sample randomly from this latent space to get an original piece. In my opinion, the most important part is the preprocessing.
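For readers wondering what such preprocessing might look like: the thread does not describe the actual pipeline, so the following is only a minimal sketch of one common approach, a binary piano roll that a CNN could take as input. The function name `notes_to_piano_roll`, the `(pitch, start, end)` note format, and the grid resolution are all assumptions made for illustration, not the author's method.

```python
from typing import List, Tuple

# Hypothetical note format: (MIDI pitch 0-127, start in beats, end in beats).
Note = Tuple[int, float, float]


def notes_to_piano_roll(
    notes: List[Note], steps_per_beat: int = 4, total_beats: int = 4
) -> List[List[int]]:
    """Quantize notes onto a 128 x n_steps binary grid (1 = note sounding)."""
    n_steps = steps_per_beat * total_beats
    roll = [[0] * n_steps for _ in range(128)]
    for pitch, start, end in notes:
        if not 0 <= pitch < 128:
            continue  # skip pitches outside the MIDI range
        first = int(round(start * steps_per_beat))
        last = int(round(end * steps_per_beat))
        last = max(last, first + 1)  # every note occupies at least one step
        # Clip to the grid so notes running past the end are truncated.
        for t in range(max(first, 0), min(last, n_steps)):
            roll[pitch][t] = 1
    return roll


# Usage: a one-beat C4 followed by a half-beat E4, at 4 steps per beat.
example_roll = notes_to_piano_roll([(60, 0.0, 1.0), (64, 1.0, 1.5)])
```

A fixed-size grid like this is convenient because every piece becomes an input of the same shape, though it discards velocity and fine timing; whether that trade-off matches the model discussed here is not stated in the thread.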
That's fascinating. This sounds like a variational autoencoder. From my humble point of view (as a trained musician), the embeddings are a largely unexplored area that is not really supported by existing theory, yet they are decisive for the result. Have you found a good solution for this?
Feel free to comment with any suggestions for improvement or other points you have.