Interview - The potential of the AI developed for Ushio Kensuke’s Chainsaw Man soundtrack

— The Future of the Music Business

The music business is currently undergoing a revolution on a global scale, from changes in the way we listen to music, to how it is delivered to consumers, to the diversification of the settings in which it is used, to how it is monetized. “The Future of the Music Business” series looks at these changes from a variety of viewpoints to address the question of what awaits the music business in the future.

In this article, we feature the AI tool that was developed for the production of music for the television anime series Chainsaw Man. The series was highly praised for its expert adaptation of the original manga’s robust storyline to the TV anime format, along with its stunning depiction of action scenes. The anime’s worldview is powerfully supported by the many compositions Ushio Kensuke created for the soundtrack, which seem to embody destruction and chaos. An AI tool, ChainsawGAN, developed by Sony Computer Science Laboratories (Sony CSL), was also used in creating this soundtrack.

Just what sort of AI tool is ChainsawGAN, and how was it created? Four people involved in the project, including Ushio Kensuke, generously shared behind-the-scenes stories of the development process, as well as their thoughts on AI and music.

In this article, they discuss the process that led to ChainsawGAN’s development and its conceptual origin in DrumGAN. We also look at the role ChainsawGAN played in producing the Chainsaw Man soundtrack and dig deeper into the question, “What does AI mean for creators?”

Kensuke Ushio —

Musician/Producer. Under the name agraph, he released his debut solo album a day, phases in 2008. Since 2012, Ushio has been a supporting member in live performances of the band Denki Groove. His first job writing music for the screen was for the TV anime series Ping Pong the Animation in 2014. Since then, Ushio has composed music for many works, including the movies A Silent Voice; Sunny/32; Liz and the Blue Bird; Mori, the Artist’s Habitat; and A Gambler’s Odyssey 2020, as well as two anime series streamed worldwide on Netflix, Devilman Crybaby and Japan Sinks 2020, and the TV anime The Heike Story.

Javier Nistal Hurlé —

Researcher at Sony Computer Science Laboratories – Paris

Ikuko Matsusaka —

Sony Computer Science Laboratories – Tokyo

Haruko Takanoha —

Sony Music Publishing

About

The TV anime series Chainsaw Man: Anime production company MAPPA adapted this highly regarded, popular manga, written and illustrated by Fujimoto Tatsuki. The manga’s second arc is currently serialized in Shōnen Jump+ (Shueisha). Denji is a young man living in poverty because he has to pay off his dead father’s debts. One day, Denji is betrayed and killed by a Devil. However, Pochita, the Chainsaw Devil and his partner, becomes his new heart, and Denji is reborn as Chainsaw Man.

A technology developer and a creator join forces

Kensuke Ushio, who worked on “Chainsaw Man,” talks about the AI “ChainsawGAN”

── Today we’re here to talk about ChainsawGAN, the AI tool developed for the soundtrack of the TV anime Chainsaw Man. This is probably an unprecedented example of an AI tool being developed specifically for a single anime title. How did it all start?

Ushio: I had a connection with the people at Sony CSL even before I began working on the soundtrack. The first time was Flow Machines™, wasn’t it?

Takanoha: Yes, that was in 2020. Flow Machines is an AI music production support tool developed by Sony CSL: an application that uses AI technology to support music composition. I work in the Digital Strategy section of Sony Music Publishing (SMP), which handles the music publishing business of the Sony Music Group. My job there is to connect leading-edge Sony Group technology with creators.

However, there aren’t many creators who can use AI well while also understanding the design concepts behind the technology. That’s why, when I heard about Mr. Ushio, who is not only a composer but has also supported the music production and live performances of Denki Groove and many other artists as a technical engineer, I approached him immediately.

Ushio: It was tremendously stimulating for me. If you’re always composing by yourself, you inevitably fall into old habits. Especially with soundtrack music, where you have to write a lot of pieces in a short period of time, there’s a risk of unconsciously falling back on your own favorite melodies and patterns. Since Flow Machines offers suggestions you wouldn’t come up with yourself, I felt it would be an extremely useful tool in the sense that it reveals these other choices.

Takanoha: Mr. Ushio composed an actual song for us called parkside in bloom using Flow Machines. After that, we kept exchanging information and advising each other on the latest technologies. Essentially, the ChainsawGAN project came out of that process.

── Sounds like the ideal relationship between the technology developer and the creator, who is the actual user.

Ushio: Everyone at Sony CSL is truly passionate about this, and they regularly present their latest research findings to me. Like university professors, they lay everything out step by step, and one by one they show me the parts they are each responsible for, saying, “When you use this technology, you can do this cool thing.” One of the AI tools they presented to me became the starting point for this project.

Matsusaka: I also remember the presentation that day very clearly. I work in the Technology Promotion Office at Sony CSL. My work is mostly introducing the music AI technology developed by our researchers into places where music is actually produced, and getting it put into practice. That day, we presented the AI tool DrumGAN to Mr. Ushio. It can automatically generate tons of drum sounds that no one has ever heard before. Mr. Ushio took an interest in it and grasped our development concept right away.

Ushio: I’m a tech geek at heart [laughs]. Of course, I don’t understand the technical details, but in my own way I think I was able to grasp a conceptual picture of how “it sort of works like this.” That explanation of DrumGAN stuck in my memory.

Matsusaka: A little later, we got a surprising offer, didn’t we? He asked, “Can we do the same thing, but with a chainsaw sound instead of drums?”

Setting up “Messed up” as the core concept of the soundtrack

── So that’s what led to the development of ChainsawGAN. I’ll ask you about ways of using AI later, but first, please tell me about the production of the Chainsaw Man soundtrack. What were your first thoughts when you were offered the job?

Edge of Chainsaw (Chainsaw Man Original Soundtrack) / Music by Kensuke Ushio / Guitar: Hisako Tabuchi

Ushio: I do this for every project: before creating a soundtrack, I work out in my own way what its core concept will be. This is because background music has a far greater effect on a video work than you might imagine. Depending on how it’s used, a soundtrack can push the viewer in a fixed direction, and can even force through a story that is clearly unnatural. I always keep in mind that I should avoid using the soundtrack like a powerful drug. To this end, it’s extremely important to me to first establish an unshakeable axis of reference.

── So that when you’re unsure of something, you go right back to that?

Ushio: Yes. It’s a soundtrack, so you have to adjust it to each individual scene, but if that’s all you do, it can end up being merely descriptive. When I read the original Chainsaw Man manga, I found it to be “messed up” in a good sense. The main character is a dark hero who makes a contract with the Chainsaw Devil, and there are a lot of violent scenes. Characters keep getting killed off one after another. Even so, the story never collapses; instead it has a powerful appeal that draws the reader in. So the first thing I wanted to do was to place this sense of “messed up” at the core of the soundtrack, and then consider how I could expand on my ideas from there.

── How did you incorporate “messed up” into your work?

Ushio: I think there were three main methods. The first was my own handiwork: taking a finished song and cutting it up through waveform editing. The second was developing my own plugins. In the Max/MSP development environment, I worked with a programmer to create applications that shuffle the input sounds into a big mess. The last method was using AI to automatically generate sounds.
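The plugin approach, shuffling input sound into a mess, can be illustrated with a short sketch. This is not Ushio’s actual Max/MSP patch, whose details the interview doesn’t give; it assumes the audio is a plain list of sample values, cuts it into fixed-length slices (the slice length is a hypothetical parameter), and reorders them at random.

```python
import random


def shuffle_slices(samples, slice_len, seed=None):
    """Cut an audio buffer into fixed-length slices and return them in random order.

    samples:   list of sample values (a hypothetical mono buffer)
    slice_len: number of samples per slice; the last slice may be shorter
    seed:      optional seed, so the same "mess" can be reproduced
    """
    if slice_len <= 0:
        raise ValueError("slice_len must be positive")
    rng = random.Random(seed)
    slices = [samples[i:i + slice_len] for i in range(0, len(samples), slice_len)]
    rng.shuffle(slices)
    # Flatten the reordered slices back into one buffer
    return [s for piece in slices for s in piece]


# Example: shuffle a 10-sample buffer in slices of 3
print(shuffle_slices(list(range(10)), 3, seed=42))
```

Given the same seed, the output is always the same. However chaotic the result sounds, the procedure is fully predictable, which is exactly the limitation Ushio goes on to describe.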

Takanoha: The fact that you came up with the idea of using DrumGAN at that point was amazing, wasn’t it? Even though you’d only heard about it once at that presentation. You understood intuitively that making a “Chainsaw version” of it would definitely make things more interesting.

Ushio: I guess it must have made that strong of an impression on me. I made that request because I wanted to add elements that I couldn’t predict myself. With the cutting up and plugin methods I mentioned, I can see the process at work. Even though the output sound is “messed-up-sounding,” the algorithm itself is clear-cut. In other words, these two methods on their own can’t realize the core concept.

── I see. That’s where the AI comes in, right?

Ushio: To embody “messed up,” I needed a tool that would let me give up control and generate truly unpredictable material. I thought that the key to this project would be how I placed this material into the compositions and incorporated them as soundtrack music.

Matsusaka: Of course, the relationship between AI and music takes an extremely wide range of forms. There are creators who have a strong aversion to AI, as well as creators who use it in line with our development vision. With Flow Machines, too, Mr. Ushio came up with many surprising ways to use it, and suggestions that made me think, “Wow, is that even possible?”

Ushio: Sorry for the hassle [laughs].

Matsusaka: No, no [laughs]. On the contrary, those were happy surprises for us as presenters. As soon as I heard your idea, I arranged a remote meeting with Javier Nistal Hurlé, the researcher who developed DrumGAN.

DrumGAN, the inspiration for ChainsawGAN

── Javier, you work at Sony Computer Science Laboratories in Paris, right?

Javier: Yes. As a researcher, my work is to study the latest AI technologies and use them in the actual development of music production tools. My area of expertise is deep learning and machine learning, which are essential to AI; I earned my doctorate in this field in 2022.

── When Mr. Ushio asked you to create ChainsawGAN based on DrumGAN, how did you feel?

Javier: Hmm…I thought, “What a weird thing for someone to say.”

Ushio: Ha ha ha ha. Well, I guess that’s true.

Javier: I’m only kidding [laughs]. At first, I didn’t know the content of the original Chainsaw Man comic. To be honest, I did think that it was quite an extraordinary thing to ask.

Takanoha: Even so, these two understood each other instantly. Even though they didn’t know each other at all until then, they were communicating on the same wavelength after a very short amount of time. I remember looking on from the outside and being impressed by it.

Matsusaka: They could understand each other very quickly. I was also very impressed.

── To begin with, what kind of tool is DrumGAN?

Javier: As was mentioned at the beginning, it’s a program that automatically generates drum sounds: a tool that uses AI to intuitively derive drum sounds that didn’t exist before. The objective is to support creators’ creative activities by expanding their freedom. The “GAN” part is an abbreviation for Generative Adversarial Network. AI models are roughly divided into two categories, discriminative and generative, and a GAN is a robust algorithm that belongs to the latter. It works by having two neural networks effectively compete with each other within a single system. You could think of it as a debate or an MC battle between two AIs.

Ushio: The term “adversarial” is just metaphorical. What they are really doing in there is maturing rapidly while cooperating—sort of like two boxers sparring.

Javier: That’s a good example. In short, a GAN is a framework for having an AI learn from data. You could also say it resembles game theory, in that the competition drives the AI to evolve. It’s a process whose goal is to automatically generate sounds based on the data it has learned.
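The game-theory comparison can be made concrete. In the original GAN formulation (Goodfellow et al., 2014), a generator G and a discriminator D play a two-player minimax game over a single value function. The notation below is the standard one, not something taken from the interview, and DrumGAN itself may be trained with a different loss variant:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

Here x is a real training sample, z is random noise, D(x) is the discriminator’s estimated probability that x is real, and G(z) is a generated (fake) sample. The discriminator raises V by scoring real sounds near 1 and fakes near 0; the generator lowers V by pushing D(G(z)) toward 1.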

── What is the actual process by which drum sounds are generated?

Javier: To put it very simply, first, Network A uses a huge amount of waveform data to produce a fake sound that doesn’t exist in reality. Network B’s task is to judge whether that sound is real or fake. A keeps creating fake sounds in order to deceive B, while B continuously fights against it.

In technical terms, we call A the generator and B the discriminator. Through these repeated exchanges between the two over huge amounts of data, the AI learns to freely generate sounds that didn’t exist in the original data.
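The back-and-forth between A and B can be shown in miniature. The sketch below is a toy, not DrumGAN: it assumes one-dimensional “sounds” (single numbers drawn from a hypothetical real distribution with mean 4.0 and spread 0.5), a linear generator A, and a logistic-regression discriminator B, with hand-derived gradients. Each step gives B one update to tell real from fake, then gives A one update to fool the updated B, using the common non-saturating generator loss, -log D(G(z)).

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function 1 / (1 + exp(-s))."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def mean(xs):
    return sum(xs) / len(xs)


def train_toy_gan(steps=2000, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator A: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator B: D(x) = sigmoid(w*x + c) = P(x is real)
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]  # hypothetical "real" data
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]      # random noise fed to A
        fake = [a * zi + b for zi in z]

        # B's turn: gradient descent on -log D(real) - log(1 - D(fake))
        d_real = [sigmoid(w * x + c) for x in real]
        d_fake = [sigmoid(w * x + c) for x in fake]
        grad_w = (mean([(d - 1.0) * x for d, x in zip(d_real, real)])
                  + mean([d * x for d, x in zip(d_fake, fake)]))
        grad_c = mean([d - 1.0 for d in d_real]) + mean(d_fake)
        w -= lr * grad_w
        c -= lr * grad_c

        # A's turn: gradient descent on -log D(fake), against B's updated weights
        d_fake = [sigmoid(w * x + c) for x in fake]
        grad_a = mean([(d - 1.0) * w * zi for d, zi in zip(d_fake, z)])
        grad_b = mean([(d - 1.0) * w for d in d_fake])
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b), (w, c)


def generate(gen_params, n, seed=1):
    """Draw n new samples from the trained generator."""
    a, b = gen_params
    rng = random.Random(seed)
    return [a * rng.gauss(0.0, 1.0) + b for _ in range(n)]


gen, disc = train_toy_gan()
print(mean(generate(gen, 1000)))
```

Real systems such as DrumGAN replace these two tiny models with deep neural networks working on audio. Because a linear discriminator can only compare averages, this toy can at best pull the generator’s output toward the real mean, and, like many GANs, its parameters may oscillate rather than settle.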

Ushio: To do this, you first need to load in as many actual drum sounds as possible. How many sample sounds did DrumGAN learn from?

Javier: Around 300,000.

── 300,000! That’s amazing! I didn’t think that many samples of drum sounds existed.

Javier: But 300,000 isn’t a large number by AI-development standards. The training data behind ChatGPT, which is a hot topic now, was far larger.

Ushio: For us flesh-and-blood creators, though, it’s more than enough. It’s probably more sounds than one could get through in a lifetime.

A desire to help creators make music

── So far, we’ve learned the background behind Mr. Ushio’s desire to use AI in the production of the Chainsaw Man soundtrack, as well as the history of DrumGAN, the AI that automatically generates drum sounds and was the inspiration for his idea. I’d like to dig a little deeper into the development and use of ChainsawGAN, which we could call a variant of its predecessor.

Ushio: Before that, I’d like to ask a question. Javier, what made you want to create such an interesting tool as DrumGAN?

Javier: The biggest reason is that I’m a huge fan of music with a strong beat. For someone who composes in genres like house, breakbeat, or hip-hop, isn’t the choice of, say, bass drum timbre or texture just as important as which rhythm to play?

Ushio: Yes, exactly.

Javier: Recently, many creators have been searching online sample libraries for sound sources that fit their purposes, and then processing them to make their own drum sounds. But this is actually a very time-consuming task. In many cases, they can’t find a sample that matches the mood they are aiming for. On top of that, using material from a famous song requires a huge sampling fee. My first thought was to improve this situation and provide wide-ranging support to creators in their music-making.

Ushio: I completely get this. Javier’s vision and intentions came across to me very strongly, including through the feel of ChainsawGAN itself.

── So, based on an understanding of this basic structure of DrumGAN, you had the idea of replacing its sample data with chainsaw sounds.

Ushio: That’s right. The AI system was already built and actually running. So I thought that if you could change the data it learns from, it would have all sorts of applications.

Outputting rhythms as chainsaw sounds

── Javier, what difficulties did you face in using DrumGAN as the basis for creating ChainsawGAN, the generative AI for Chainsaw Man?

Javier: As Mr. Ushio said, the AI system itself was already there, so an image of the finished product came to mind right away. I thought that simply having the GAN learn new sound samples would be enough to let one freely handle random chainsaw noises in the same way as drum sounds. The problem we faced was that we only had a small number of sample sounds.

Ushio: I mean, they don’t have chainsaw sounds at any sample sharing sites.

Javier: Right. That’s where Ms. Takanoha and the project team at Sony Music Publishing (SMP) came in, and they helped us clear that hurdle.

Takanoha: Gathering the data while getting the copyright clearances was quite difficult.

── Mr. Ushio, how did you use ChainsawGAN in the actual soundtrack production?

Ushio: For example, let’s say I want to use a rhythm that goes, “zu-da-da-das, zu-da-das.” This is a rhythm that I basically got from my own body. When I play it for ChainsawGAN, it outputs it with a chainsaw sound, while keeping the “zu-da-da-das, zu-da-das” rhythm. This becomes something like “Shu-gu-wa-ja-gya, kyu-ju-wa-ju.” And I have absolutely no control over the kind of timbre that results.
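The interview doesn’t describe how ChainsawGAN maps Ushio’s rhythm onto chainsaw sounds, so the following is only a hypothetical illustration of the behavior he describes: the timing of the input is kept, while the sound of every hit is replaced by something freshly generated. It assumes the input is an amplitude envelope, uses a simple threshold onset detector, and substitutes a decaying-noise function (`noisy_hit`, hypothetical) for a real generative model.

```python
import random


def detect_onsets(envelope, threshold):
    """Return the indices where the envelope first rises to or above threshold."""
    onsets = []
    above = False
    for i, level in enumerate(envelope):
        if level >= threshold and not above:
            onsets.append(i)
        above = level >= threshold
    return onsets


def noisy_hit(rng, n):
    """Hypothetical stand-in for a generated chainsaw-like hit: n samples of decaying noise."""
    return [rng.uniform(-1.0, 1.0) * (1.0 - k / n) for k in range(n)]


def render_with_new_timbre(envelope, threshold, make_hit, hit_len, seed=None):
    """Keep the input's rhythm (onset positions) but replace every hit with a new sound."""
    rng = random.Random(seed)
    out = [0.0] * len(envelope)
    for start in detect_onsets(envelope, threshold):
        hit = make_hit(rng, hit_len)  # a fresh, unpredictable timbre for each hit
        for j, sample in enumerate(hit):
            if start + j < len(out):  # truncate hits that run past the end
                out[start + j] += sample
    return out


# A made-up envelope with four hits, loosely like "zu-da-da-das"
rhythm = [0.0, 1.0, 0.2, 0.9, 0.1, 0.8, 0.0, 0.0, 1.0, 0.3]
chainsawed = render_with_new_timbre(rhythm, 0.5, noisy_hit, hit_len=3, seed=7)
```

Changing the seed, or swapping `noisy_hit` for a real model, changes every timbre while the timing stays fixed, which mirrors the lack of control over timbre that Ushio describes.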

── I see. It seems like you were able to get output that perfectly embodies the “messed up” concept that you set from the start.

Ushio: Yes, but when I’m composing, I don’t use the chainsaw beat that I get from ChainsawGAN as is. I process it, cut it up, and embed it deep in the music.

── When you listen to the finished soundtrack, you don’t hear plain chainsaw sounds in it.

Ushio: Maybe it’s closer to a sensation of “chaos” dispersed throughout the sound, like a basso continuo?

Takanoha: That’s what impressed me the most. At first, I couldn’t help trying to figure out where in which piece ChainsawGAN was being used [laughs]. As I listened intently to the soundtrack, I came to see a complex, layered structure.

Matsusaka: I feel the same way. It easily exceeded our expectations on the presenting side. And of course, I once again came to feel just how amazing the creative force of a musician and creator is.

The benefits of AI’s capability for infinite trial and error

── As an amateur, I’d like to ask: what is the main difference between generating drum sounds with an AI trained on real chainsaw sounds and conventional sampling?

Ushio: I think you can get close even with conventional sampling methods. But the effort and cost needed to do that are huge. For example, if you had 10 minutes of chainsaw sample sounds, you might only want to use a few seconds of it.

You’d probably only have the typical starter sound in mind, and on top of that, if you had 10 songs, each one would need a different drum sound. Even a single drum set consists of a variety of sounds, such as snare, kick, cymbals, and toms. Searching on your own for chainsaw samples to fit all of these is realistically impossible.

Javier: That’s right. That’s another reason why I came up with DrumGAN.

Ushio: What’s interesting about ChainsawGAN is that, since the AI generates all kinds of tones, it can perform trial and error endlessly until it comes up with the optimal solution. The chainsaw starter sound “bru-u-u-n” is an example. You can use the interface to intuitively change the pitch or the timbre, which allows you to come up with infinite variations. This makes a huge difference. At the very least, without this AI tool, I’d never have been able to meet the deadline for delivering the Chainsaw Man soundtrack.
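The “infinite variations” Ushio mentions are typical of generative models: each sound comes from a point in a continuous latent space, so moving between two points yields an unbroken series of in-between sounds. The sketch shows only that interpolation step. The latent size, the random vectors, and the idea that ChainsawGAN’s pitch and timbre controls work this way are all assumptions, since the interview doesn’t describe the interface’s internals.

```python
import random


def lerp(u, v, t):
    """Linearly interpolate between latent vectors u and v (t=0 gives u, t=1 gives v)."""
    if len(u) != len(v):
        raise ValueError("latent vectors must have the same length")
    return [(1.0 - t) * a + t * b for a, b in zip(u, v)]


def variations(u, v, steps):
    """Return `steps` evenly spaced latent points from u to v, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(u, v, i / (steps - 1)) for i in range(steps)]


# Hypothetical 8-dimensional latent codes for two generated sounds
rng = random.Random(0)
z_a = [rng.gauss(0.0, 1.0) for _ in range(8)]
z_b = [rng.gauss(0.0, 1.0) for _ in range(8)]
path = variations(z_a, z_b, 5)
# Each point in `path` would be fed to the generator to render one in-between sound.
```

Linear interpolation is the simplest choice. For Gaussian latent codes, spherical interpolation is often preferred because it keeps intermediate points at a typical distance from the origin.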

── Javier, what was your impression on hearing the soundtrack?

Javier: Simply put, it got my heart pounding. There were incredibly noisy and violent compositions, but there were also quiet and beautiful ones, and I felt that the soundtrack as a whole was extremely multilayered. At the same time, a mysterious force was at work, unifying this teeming chaos of sound. My impression was that this sonic state itself was embodying the narrative of the Chainsaw Man story.

Sweet Dreams (Chainsaw Man Original Soundtrack) / Music by Kensuke Ushio

Ushio: I am so happy to hear you say that! Naturally, a single piece of music contains a variety of elements, and of course, the Chainsaw Man soundtrack wasn’t created with ChainsawGAN alone. Still, the frightening metallic roar of a chainsaw stands out even as a single sound.

I’ve always thought of myself as a composer of sounds rather than melodies, chords, or rhythms, and ultimately, as one who can stir up a wide range of emotions in an audience with a single sound.

Javier: That is really interesting.

Ushio: Well, that’s always the kind of soundtrack I want to create. In that sense, ChainsawGAN without a doubt became the essence of this particular soundtrack.

AI as a tool for expanding a creator’s creativity

── Mr. Ushio, did you realize anything new about the potential of AI through this project?

Ushio: Let’s see. At the very least, it gave me lots of inspiration. A tool like ChainsawGAN didn’t even exist before, and AI itself just keeps evolving, so I get the feeling that it holds unimaginable potential. To start with, humans and AI take exactly opposite approaches to output. This is something I experienced first-hand when I started using this tool.

── What do you mean by that?

Ushio: Human creative activity is, so to speak, additive. We have a desire to express something, and this drives us through repeated trial and error as we build up a work of art. Generative AI, by contrast, doesn’t create something from scratch. Instead, it extracts a single possibility from a huge number of combinations. You might say it’s like carving a work of art out of a state of infinity akin to white noise.

Javier: Yes. Conceptually, it’s just like that.

Ushio: In any case, it’s a way of creating that is absolutely impossible for humans, and that’s another interesting point. The question is how each creator can make use of this unpredictability in their work. For us musicians, AI is only a tool; what matters is whether the music you make has the power to draw people in.

Javier: That is so true. AI excels at learning from vast amounts of data and weighing all the possibilities to arrive at the optimal solution. We humans, on the other hand, start with the question “why?” and then create a variety of possibilities from a small amount of data, which AI can’t do.

Ushio: But what is the “optimal solution” for music in the first place? This is also an important question [laughs]. You can’t put it in the same category as a game that is won or lost, like go or shogi.

Javier: That’s right. You could derive a probabilistic theory such as “This combination of sounds tends to be preferred,” but this has nothing to do with musical quality. Ushio and I both feel that AI should be a tool that stimulates and amplifies a creator’s creativity.

For example, if AI can easily generate the necessary sound sources, then the time and labor that was saved thereby can be reassigned toward asking the more essential question of “What should I do and how should I do it?” This is the way I think it should work.

── The fundamental way of creating music doesn’t change, even as AI technology advances. So, the role of AI is to assist the creator in concentrating on this fundamental aspect?

Ushio: To me, ChainsawGAN played exactly that role. What would have probably taken a week with conventional sampling was achieved in just 15 minutes. This is a genuinely welcome evolution, isn’t it?

Javier: I want to push this democratization of music through AI further and further. I believe that everyone has the ability to create music inside them. Anyone has the potential to be a creator. If AI provides the power and the means to do this, then each and every one of us will be able to express our inner feelings through sound. I would like to build that bridge.

── This is some very convincing discourse on AI and music.

Matsusaka: When you’re researching and marketing AI for music, people often equate your work with “getting a machine to automatically generate a complete piece of music at the push of a button,” but our overriding concept at Sony CSL is that AI should be used as a tool. We develop AI just as Mr. Ushio said: as a tool that expands the creator’s creativity.

That’s why the fact that the two of you came together and a great soundtrack was created by making the most of your respective individuality and skills makes me truly happy. I feel that this was the ideal collaboration.

Takanoha: I feel exactly the same way. I think we at SMP still have a lot of work to do to create more success stories like this. One thing we need to do is proactively create points of contact for creators who are actively seeking new forms of expression. As Ms. Matsusaka said, the best way to change the perception of AI as equivalent to “automatic composing” is for creators who use AI to produce appealing works of art and great musical compositions and introduce them to the world.

── This also fits into the context of “technology for creators” pursued by the Sony Group as a whole.

Takanoha: Yes. As a music publishing company, this is of course our mission as well. Maintaining an environment where all creators can devote themselves to creating with peace of mind, while securely protecting the rights of copyright owners, is also extremely important. These days, the issue of generative AI and copyright is getting global attention, and I think a lot of people are worried, thinking, “Is it really okay to use this?” That’s why we want to always stay on top of the latest trends and provide accurate information to creators.

── I guess the structure of the copyright business could also be impacted by the emergence of AI.

Takanoha: That’s right. We used to be able to determine who used a given work without any doubt whatsoever. Of course, the essence of copyright shouldn’t change, but further evolution of AI could create a situation where copyright owners are harder to identify than they are now. When that happens, we may see more emphasis placed on intent and ideas than on musical output.

To quote Javier’s words just now, only humans can ask the question “why?” To make sure that the value of this is never lost, I think it’s important for us to keep asking what true creative value is.

Kensuke Ushio, who created the soundtrack of “Chainsaw Man”, talks about AI “ChainsawGAN” (Short)

Text/Interview: Takayuki Otani

Photography: Osamu Hoshikawa

Release information:

Chainsaw Man Original Soundtrack Complete Edition-chainsaw edge fragments – Ushio Kensuke

Price: ¥3,630

Related websites:

Official Website for the TV Anime Series Chainsaw Man

https://chainsawman.dog/

Ushio Kensuke Official Website

http://www.agraph.jp

Sony Computer Science Laboratories

https://www.sonycsl.co.jp/

Sony Music Publishing

https://smpj.jp/

Published on December 20, 2023