
Shenzhen East: Behind the Scenes

Last updated 2023-06-24, 10:06:54 A.M. UTC+00:00

Familiar Junction, Unfamiliar Juncture.

What follows is FFFanwen's account of his deep dive into Shenzhen East Station, originally published on July 17th, 2023.

Shenzhen East Railway Station

Warning: A verbose, convoluted, self-centered, and somewhat schizophrenic account ahead. This is more a personal reflection than an attempt to share experience.


Some time has passed since the release of this work. During this period, I have largely let the work ferment on its own, aside from some brief interactions. Even the various post-upload tasks were conducted in a lackadaisical manner. A few days ago, I remembered that I had promised in the comments or in a DM to provide a sort of account behind the video. Hence, I thought I should overcome my laziness and write a comprehensive summary.

But to be honest, I'm not really keen on discussing my videos. This is because, as each work nears completion, I spend a significant amount of time fixing and packaging it, concealing the crude and unappealing aspects to make it look passable. For instance, after installing various appliances at home and seeing the messy cords on the floor, some people might choose to take on cable management, arranging them neatly along the skirting board, making even the minutest details pleasing to the eye. However, my approach would probably be to haphazardly toss them into a corner cabinet; as long as it remains closed, that's fine by me. Therefore, writing a detailed account implies that I need to dissect my work, exposing the crudest parts and my clumsy methods for all to see.

However, upon reflection, it might not be a bad thing to expose it as such. After all, I can selectively omit certain details in the creation discussion. Moreover, I've always had this feeling that once a work is completed, it escapes my grip. In less than a month, I find it hard to remember the process of making it. The smoother the creation process and the more flawless the final product, the faster my memories fade. A detailed discussion could be a good way to preserve those memories and feelings. So, for the sake of my future self as well, let's get on with it.


Let's start with what led me to make this video.

As February drew to a close, some time had passed since the New Year. I had been living rather idly. Due to the pandemic, my study-abroad plan had been delayed for a year, and I hadn't been seeking full-time employment. I had been spending my time working on my portfolio for the application and doing some part-time work. Frankly, I had been quite lazy during this period. In September of the previous year, I finished a longer piece and submitted it to an animation festival. After that, I didn't do much. I made a BGA for a BOF entry, visited my hometown, and went to Beijing to attend the aforementioned animation festival, where my work had been nominated. Unfortunately, I was quarantined in a hotel and had to return home. Before and after the New Year, I made little progress on my portfolio. According to my plan, I should have been creating a shorter piece to wrap up my portfolio, but I couldn't figure out what to make. So, my idle self started to feel anxious and began to ponder what I should make.

March came, and I started to handle some paperwork and other miscellaneous tasks. During that period, I had to go out to different places at different times of the day, passing through transport hubs and then returning home at various times. It was quite a hassle for someone like me who prefers to stay home, but it also gave me a different experience of commuting, especially at one of the transport hubs I had to pass through - Shenzhen East Station. It was quite fascinating. Unlike during my earlier commutes through the station as a student, I now passed through Shenzhen East Station at different times of the day. As such, I actually developed a new impression of it. Depending on the time of day, the station had its own rhythm of busyness and interludes. It was a familiar yet strange feeling, which inspired me. So I thought I should make a short film set in Shenzhen East Station.

So, I finally got back on track with my portfolio. Although I only had a vague idea of a framework, I already felt much more at ease. So, what form should it take? Thinking back to my previous agonizing experience with animation, maybe I could take a step back and create something in a format I'm familiar with. Yes! How about an otoMAD! It's the final piece of my portfolio anyway, so why not treat myself and make an otoMAD? I could extend my last Shenzhen video referencing Teine's Kyoto video from last New Year! This way, I could not only complete my portfolio but also have it my way as an online content creator, a win-win situation indeed. Besides, I had asked beforehand, and it seems that the copyright requirements for the portfolio aren't extremely strict. As long as the sources of other elements are acknowledged upon submission, derivative work at this level should be fine. Moving forward, all I need to do is ensure that the otoMAD style is formally recognized as a visual art form.

And so, I embarked on my journey on the project as an otoMAD, a familiar process that indeed comforts me. The first step is to find the suitable music, and then I just need to materialize my mental image onto it. This is a really important step.

About TrainMADs

As a digression, although I didn't tag the video as a TrainMAD, I have to admit that I was indeed influenced by TrainMADs when creating it. At the very least, my willingness to use the classic element of announcer clips stems from this trend, as well as this video by Karashi. Since a good while ago, the so-called "TrainMAD" has become a special existence for me. It's different from other otoMADs, as it requires you to actively "create material".

Take anime or movies: they offer only about ten to thirty episodes, or one to two hours, of content. From beginning to end, the content that can be used as material is already established, unless there is a sequel or side story, and that is not something an otoMAD creator gets to decide. To a large extent, expansion of the so-called "main material" is completely dependent on luck.

But TrainMADs are different: the material we use is physically filmed and recorded on location by the creator. The author is therefore free to choose their own shooting and recording angles. If the processing flow for other otoMAD materials is "Download - Edit - Use", then the process for TrainMADs should be "Select - Produce - Use". It can be said that there is an everlasting, constantly updated supply of material to use. The only limitation is the physical location. But on the other hand, any material shot at and around this location is then technically related to TrainMADs, right? If other conventional sources are "closed" in themselves, then the materials for TrainMADs are completely "open" to the author. This is actually very liberating for otoMAD. Think about it: if the "original material" and the "original song" are the two shackles that firmly lock otoMAD in the field of "derivative content", then the creative method of TrainMADs is a convenient weapon to destroy the former shackle. After all, the material is obtained by the author from reality, so there is no so-called "original material", and in some sense it can become an "original production" (as opposed to a "derivative work").

Admittedly, I still used a lot of materials found on the internet (truly a man true to his words!). But still, this approach is at least a point of entry towards "original production".

I know others have their own understanding of and prejudice towards TrainMADs, but from my perspective, the most potent essence of TrainMADs lies only in their unique creative approach. This might seem strange, given that even I, who have a certain understanding of TrainMADs, don't necessarily appreciate most of them. Yet, I cannot deny that I have found a certain "entry point" through TrainMADs. Those who know me know that I seek a kind of "escape" in otoMADs. And this seems to be an "entry point" constituting the crucial step guiding the otoMAD escape. Therefore, if no one is making the TrainMAD I'm happy with, I'll do it myself; if no one is making the cut to take the step of escape, let me be the one to do it.


Enough of the grand talk. Even if the materials have a way to be liberated, I still need to choose the music, right? I'm not capable of composing myself, so I can't take all those big steps at once. Therefore, from the perspective of song selection, this video is still a derivative work. What's more troublesome is that when I used to make otoMAD, I usually formed my ideas by repeatedly appreciating music I like. But this time, I had a concept before choosing the music, which left me at a loss. With so many choices, I really didn't know how to choose.

So I went back to that video by Karashi, the video that had coincidentally given me the idea in the first place. The track he chose seemed quite fitting. And there are quite a few tracks in this style from Nekomata Master, so I decided to limit myself to Nekomata Master and look for a piece of his instrumental music that is cool in texture, fast-paced, and richly arranged. I spent two days listening to over a hundred of Nekomata's songs that I wasn't familiar with.

FFFanwen's Nekomata Master Playlist on NetEase Music

Nekomata Master playlist consisting of 149 tracks

I don't know if this arbitrary restriction really helped in selecting the best track. However, if I don't set a scope for myself, I'll probably exhaust myself and still be unable to make a choice. Since I've decided to narrow the scope, I'll ignore everything outside and focus on comparing the limited number of songs within the range. I was finally able to gradually eliminate the unsuitable tracks. The final piece left was the one presented in the end - End of World. I don't know much about professional music analysis, but intuitively, the sense of imagery and emotion this piece gives me is incomparable to any other. I'm quite satisfied with my choice.

But then, I still hesitated to get my hands dirty in software. Frankly, I had no idea what kind of effects to implement in which parts of the song. This was another problem I had never experienced before, following that of song selection. Therefore, all I could do was listen to the chosen song over and over again. Sometimes, I would sit quietly in a chair, spending most of the day listening to it. Later on, I would go to Shenzhen East Station, walking around every corner of it while listening to the song. This may seem a little like Wang Yangming's experience of bamboo-watching back in the day, but I really didn't have a better solution.

Eventually, I did come up with some of my own ideas for the song: Rough Ideas 1 Rough Ideas 2 Rough Ideas 3 Rough Ideas 4 Rough Ideas 5

These are some rough notes I made in March. You can see that even some of the later parts were thought through early on. I consider myself quite lucky to have been able to implement these ideas.

In addition to walking around, I was also shooting and recording. Before this, I wasn't sure if it was allowed, so I specifically went to the central kiosk of Shenzhen East Station to ask. If there was an official permit that could be given, that would be even better. The final answer I received was a verbal assurance that "it's fine as long as you're only using your phone to shoot." Well, since they had said so, I shouldn't worry anymore. If any staff were to question me, I would just respond with this statement. So, I simply used my phone and a recording pen, recording every place I passed by. At the time, I didn't know exactly what kind of material to record, so I just recorded everything that I thought could be useful. At this stage, I didn't yet know how to home in on the main theme. After all, this was just the beginning of the creative process; without getting my hands dirty, I probably wouldn't be able to figure out which materials shot at which locations would be useful. It was only then that I realized the freedom to "create materials" comes at a cost. Facing an unlimited material library without a production direction, even making a choice is difficult. Looking back now, nearly 60% of the materials I recorded in the early days turned out to be useless. But such recording actually helped me build an impression of Shenzhen East, so I must admit, this may still have been necessary.

File Recordings

Aside from video, audio faces the same challenges. Out of nearly three hours of audio I recorded, the usable segments might add up to less than ten minutes. The rest is nothing more than pure noise, ever so slightly different due to changes in location. There is another difference with audio compared to video. Although my exploratory video shooting is chaotic, if I change the angle and shoot again, that useless video footage can still be used. But the audio I recorded initially pretty much exhausted all the sounds of the subway station. No matter how much more I record, noise will still be just noise. The final ten minutes of material is all I have that can be used. I listened carefully and it's mainly announcements, ring tones, and some pure mechanical sounds. It seems a bit difficult to reinterpret a song over four minutes long with these monotonous materials. The sounds from my previous Shenzhen video might still be useful, but it's really just a drop in the bucket. What I'm more concerned about is that if I go this route, it will just be a cover of the original song, which is very detrimental to the theme. Maybe I need to think of other ways.

In the end, I remembered a lecture from over half a year ago. A phrase I heard there really struck me:

Whatever you now find weird, ugly, uncomfortable and nasty about a new medium will surely become its signature. CD distortion, the jitteriness of digital video, the crap sound of 8-bit - all of these will be cherished and emulated as soon as they can be avoided. It's the sound of failure: so much modern art is the sound of things going out of control, of a medium pushing to its limits and breaking apart. The distorted guitar sound is the sound of something too loud for the medium supposed to carry it. The blues singer with the cracked voice is the sound of an emotional cry too powerful for the throat that releases it. The excitement of grainy film, of bleached-out black and white, is the excitement of witnessing events too momentous for the medium assigned to record them.

-- Brian Eno (1996)

It seems that today, after layer upon layer of technological breakthroughs, the past shortcomings that were unavoidably crude, weird, and annoying due to various limitations can now become a style. The flaws are precisely due to the inability of the medium to truly carry reality, so instead of extracting reality, it's better to preserve the flaws. Those useless, broken, flawed elements actually have a beauty that can be appreciated. For instance, the noise of those three plus hours is actually a true representation of every part of Shenzhen East. "Don't just discard them because they're noise, at least try to use them," a voice inside me said. I'm glad I remembered this phrase. What I didn't realize at the time was that this phrase I happened to recall would become a principle I would use until the end, like a lifesaving straw (hereafter referred to as Principle A).

Land of Hopes and Dreams

This is a 2022 video from LHDW, voted to be one of 2022's top otoMADs on Bilibili due to the unique sentiments it evokes. After amassing over a million views, the video was taken down by Bilibili for unknown reasons.

It is definitely still possible, albeit perhaps more difficult, to appreciate the video if one were not entirely familiar with the source material or the culture.

However, such noise doesn't seem suitable for tuning melodies. Perhaps it's better suited for creating beats or something of the sort? On second thought, if it can be used for beats, then any rhythm-related performance could potentially be created with these noises. In fact, when I was writing my column last year about my thoughts on Land of Hopes and Dreams, it seemed as though there was a method of using sound effects this way, differing from traditional instrumentation performing a song. This is purely about using timbre to create an emotional response. In fact, I saw many otoMADs from the last couple of years doing something similar. So, with that in mind, how can I take it a step further this time? I could increase the emphasis on sound effects, creating an audio piece led purely by sound effects. In this case, the original song may no longer be the fundamental basis upon which the rest of the audio is built, but merely one instance of such a sound effect. Each sound effect should perhaps be given more context, allowing it to become a meaningful sound.


Honestly, my production process was quite chaotic, doing a bit here and a bit there, even alternating between audio and video at times. This anxious and disorderly method of production only eased towards the end of the project. Therefore, in order to make the following production explanation a bit clearer, I will not explain it in chronological order, but in the order they appear in the video.

The specifics of the audio will probably be based on the version not yet sent to Liangyu for mixing. After all, I can only understand the part of the audio I worked on myself. In the end, I'm still questioning whether Liangyu really stuffed a “Yajuu Senpai” in there or not.

Liangyu's YJSNPI comment

For context, MixBadGun, responsible for the recent hit, Function, was running a stream during which he played Shenzhen East. Liangyu, responsible for mixing Shenzhen East, dropped the bombshell above in chat, claiming he had stuffed in samples of "Yajuu Senpai" at this stage.

Section 1 (0:00-1:35)

90 seconds could well be the full length of many otoMADs, but here I chose to make this a single shot. After all, this is a piece that lasts over four minutes, with the texture of the first minute and a half being almost the same. If I used all the tricks up my sleeve here, I wouldn't be able to enhance the viewing experience later. I like to use the analogy of playing cards: it's certainly a bad idea to drop your best hand at the beginning. When creating this section, I was completely suppressing myself. Many times, when I thought of adding an effect, I would force myself to give up the idea, focusing purely on achieving the main effect without excessively branching out. Therefore, the only audio material I added at the start was the ticking sound of a traffic light; the underlying instruments are all from the original song. Then various sound effects are gradually introduced.

In my view, a sense of progression is encapsulated within this section of the song. For Shenzhen East Station, it is indeed like the start of a period of time, gradually gaining its own momentum. Therefore, I added all the materials I could think of as time progressed in the early morning. Correspondingly, the visuals also appear with the sound. Compared to later sections, this part should be the most obvious representation of an otoMAD; to some extent, I guess it's also about familiarizing the audience with this performance mechanism of otoMAD.

I mostly rely on my old phone for video shooting, so the quality isn't particularly high. In this era where 4K is commonplace, my phone stubbornly maxes out at 1080p, 30FPS, and even when shooting slow motion, it can only handle 720p at 30FPS. These materials look decent from a distance, but the lack of detail becomes painfully clear when zoomed in.

However, as I mentioned before, when you are presented with imperfect and flawed materials and have no choice but to use them, the only option seems to be to identify their characteristics and make the best use of them. Since blurriness and stuttering are inherent characteristics of these materials, I decided to highlight these aspects and make them a feature. So I added noise and depth of field, changed the color to black and white (and added "Magic Bullet Looks"), and preserved the 30FPS as the final framerate. This was essentially my approach to processing the visuals.

Adding a thick layer of fuzzy noise filter to the video is indeed a classic technique, and I was greatly inspired by the work of Katsushika Shusshin. It's like adding a lot of strong flavors to not-so-fresh ingredients; what should have been an embarrassing cover-up can, through fermentation, become a unique feature. As for color, I'm even less proficient in handling it. The combination of colors is too chaotic and dirty to control, so why not cancel them all out and let the audience experience the purest sense of light from the work. Finally, the frame rate doesn't have to be 60FPS. In my opinion, preserving the sharp motion at 30FPS is a form of commitment to the original flavor of the recorded material. Instead of using 60FPS to make the motion soft, why not retain the sharp and distinct texture of 30FPS? And perhaps later, if even sharper motion is needed, it could be reduced to an even lower frame rate.
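For context, one common way to get the choppier look of a lower frame rate without changing a clip's duration is to hold each kept frame for several output frames. The following is only a minimal Python sketch of that idea, not FFFanwen's actual workflow; the function name `hold_frames` and the use of plain integers as stand-in frames are assumptions made for illustration.

```python
def hold_frames(frames, source_fps, target_fps):
    """Simulate a lower frame rate by holding frames.

    The output has the same length (and so the same duration at source_fps)
    as the input, but only about target_fps distinct frames per second.
    Both rates are assumed to be positive integers (hypothetical helper).
    """
    if not 0 < target_fps <= source_fps:
        raise ValueError("target_fps must be in the range 1..source_fps")
    held = []
    for i in range(len(frames)):
        # Which "held" frame this output position belongs to.
        kept = i * target_fps // source_fps
        # First source index of that group (ceiling division).
        first = (kept * source_fps + target_fps - 1) // target_fps
        held.append(frames[first])
    return held


# Stand-in "frames": just their indices in a 30FPS clip.
clip = list(range(12))
print(hold_frames(clip, 30, 15))  # every second frame is repeated
```

Holding frames rather than blending them keeps each image crisp, which is closer to the "sharp and distinct" motion described above than frame interpolation would be.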

To sum up, this is roughly how I approached the long process. Oh, and most of the audio and visuals in this segment do not match. Almost all of the audio and video were paired based on what I thought suited each other during post-production. Some materials were even sourced online because they weren't available on the spot, and they had to match the rest of the live recorded materials. I remember this approach of re-dubbing visuals being referred to as having an "otoMAD flavor". But can this be considered otoMAD? I'm not sure, hehe. I'm not one to obsess over definitions, and perhaps the "station announcement sentencing" later might pull this video back to being an otoMAD. But rather than obsessing over these details, what I really want to do is simply label a video, which doesn't quite fit into the otoMAD definition, as otoMAD.

Section 2 (1:35-2:12)

Speaking of the station announcement sentencing, we might have to delve into the so-called second part of this video, specifically the 1:35-2:12 section. Chronologically, the audio for this part was one of the earliest things I finished in mid-March, but the video wasn't completed until the beginning of June. This temporal disjointedness somewhat reflects my struggle with this part, as I really didn't know how to finish the video section.

The concept for the audio was simple enough. Since there was so much build-up, adding a simple drumbeat and the explosions of station announcements would be enough. The subsequent repetition of "Shenzhen East" and the ringing segment was the treatment I had in mind from the beginning of the song. However, the video, after a long build-up of a single scene, left me at a loss when it came to transitions. Just like a long-oppressed slave who doesn't know what to do once they gain freedom, I felt unable to express freely and could only keep a self-repressed posture. In fact, this part did require me to continue this self-repression. Because looking at the whole song, this is just an early part. It is not the time to play all the "tricks up the sleeve", or there will be nothing left for the later parts! This uncertainty made me struggle, so I finally decided to skip this part. I will complete all the sections that can be called "tricks" first, and then come back to reassess how to do this part. In fact, I did just that, ruthlessly handing this part over to my future self in June. By the beginning of June, looking at this leftover gift, I couldn't help but curse the me from three months ago who planned to rely on the wisdom from the future to solve the problem.

By the beginning of June, I was really burnt out. Facing this part of the video, I just wanted to finish it quickly. After comparing it with the effects I created later, it seemed better to use some of the fast editing techniques I am good at to bluff my way through, in order to make the overall transition smoother. So, in this state of not knowing whether I was being lazy or making a serious decision, I used unused scraps of material to piece together the video for this part. As a result, many of the scenes were actually magnified several times for use, creating a blurry corner of the video that couldn't be more blurry. However, after being baptized by Principle A, I was able to understand and appreciate this texture of the image. (In truth, I was just being lazy.)

How it's made: Metro route

The reality

In the edited video, I subtly added a layer of 3D train footage. Since I used the subway background noise for sound effects in the audio, it seemed fitting to incorporate elements that would match this noise in the visuals as well. However, this background noise was initially introduced to mask the poor quality of some audio materials. The sound quality of the subway station announcements was particularly terrible, especially the repeated "Shenzhen East" part. The recorded material was so distorted from clipping that it was inaudible. To hide this, I used sound effects along with a significant delay to blur the original sound. This approach also created a sort of "spatial presence" for the material, compensating for its shortcomings. Truly, this is yet another "heavy seasoning for stale ingredients" method. Compared to my completely blurred audio, Liangyu's version salvaged this part quite a bit.

There is also an interesting incident related to this. When I was adjusting the EQ for the background noise, I could never get it to a state that satisfied me. Sometimes, I would play the original song and adjust the EQ knob at the same time. As the rhythm of the song progressed, I playfully adjusted the EQ to the beat. This playful approach resulted in a novel and dynamic experience. Therefore, I decided to simulate this effect using an envelope, as shown here:

FL Envelopes EQ GIF

In this way, I retained the rough dynamic effect brought about by the adjustments, giving the background noise a sense of distance. This is not an innovative technique; I imagine such effects are not uncommon in music I'm not aware of. However, the things learned by messing around can indeed be more memorable than those absorbed during an earnest learning attempt. This philosophy seems to be reflected in the third section as well.

Section 3 (2:12-2:31)

This segment ranges from 2:12 to 2:31 and is a relatively short, quiet passage in the middle of the song. There's not much to say about the audio; on top of the original track, I only layered some ambient noise and station announcement vocals. As for the video, I drew some inspiration from the phrase "a large number of passengers in the station" at the beginning of this segment. Since the phrase implies "heavy passenger flow" but the original track suggests a "sudden pause of emptiness," why not portray this contradiction in the video and create a sense of "heavy passenger flow, yet seemingly empty"? Luckily, I had an idea for how to do this, using an effect I discovered while playing around in January:

The Content-Aware Fill feature in the new version of AE automatically calculates and fills in missing areas of a layer. It can be considered the dynamic version of Photoshop's Content-Aware Fill. Therefore, many people use this feature to remove unwanted objects from footage, such as moving cars on the road.

Content aware fill tutorial thumbnail

But in many cases, this kind of filling is not perfect. Especially when dealing with many objects that take up a large proportion of the image, there will almost always be a hazy trace remaining. Generally speaking, people are eager to eliminate this annoying trace, but I want to preserve it to create a certain effect. This is still in line with Principle A, turning flaws into features. So, I gradually smudge the characters frame by frame, put them into After Effects, use a mask to remove the characters and generate a filled scene. Then, I overlay the generated scene onto the original characters to create this main effect.
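For context, the final overlay step can be thought of as a per-pixel mix of the filled scene and the original footage. The sketch below is only an illustration in Python and does not call After Effects: the function name `overlay`, the grayscale nested-list frames, and the opacity value are all assumptions, and the opacity used in the video is not stated.

```python
def overlay(original, filled, opacity):
    """Blend a content-aware-filled frame over the original frame.

    Frames are nested lists of grayscale values (rows of pixels).
    opacity is the weight of the filled layer: 0.0 shows only the original,
    1.0 shows only the filled scene (hypothetical helper).
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")
    return [
        [(1.0 - opacity) * o + opacity * f for o, f in zip(row_o, row_f)]
        for row_o, row_f in zip(original, filled)
    ]


# A 1x3 stand-in frame: a dark "passenger" pixel between two bright floor pixels.
original = [[200.0, 40.0, 200.0]]
filled = [[200.0, 180.0, 200.0]]  # the fill has painted floor over the passenger
print(overlay(original, filled, 0.5))
```

At an intermediate opacity the passenger is neither fully present nor fully erased, which matches the "heavy passenger flow, yet seemingly empty" look described above.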

In fact, once explained, this is a very simple effect to achieve. The real difficulty, of course, is the time investment, especially the process of smudging the characters frame by frame. A project of just a few seconds can take me a whole day. If we delve deeper, there are surely more efficient ways than smudging frame by frame, but I still enjoy being stuck in this inefficient method.

Firstly, although my hand needs to keep smudging, my brain doesn't need to work at this stage. Therefore, I can listen to some entertainment programs while doodling. To some extent, this is a long-lost rest for me, and I really want to slow down this process and enjoy it more. Just like some people would take a slow green train on purpose, just to appreciate the scenery or spend more time with the people they are with.

Secondly, I like the certainty that hand-smudging gives me. When I do this, I always know that as long as I finish smudging, I can definitely create the effect, without worrying about getting stuck halfway. And I personally like the jittery feeling between frames brought by doodling. Compared to the situation of "a small stump every three steps and a big stump every five steps" when making other parts, this experience is truly precious.

Section 4 (2:31-2:42)

Comment section

A comment from Xuwuyun on the maladroitness of running around looking for the appropriate signs, with FFFanwen's acknowledgment.

If I had to summarize this section in a brief sentence, I think the above quote would be most appropriate. But before talking about finding the signs, let's talk about the audio first.

This section was the second one I started working on in terms of audio, so it also dates from my directionless period in March. The original song had a breakbeat at this part, so the rhythm seemed rushed, and my initial idea before making the audio was to restore the breakbeat. But the problem I encountered was still that "a simple cover of the instruments is insufficient". Transcribing, tuning, replacing, and restoring the original song here really became a thankless task. Facing this vast amount of work, I decided to "take a shortcut" - I didn't restore any of the original song's texture; at most I added a layer of the original song underneath to highlight the texture. What I really needed to do was create content that the original song did not have.


As mentioned above, because I felt the drumbeat's texture was not enough, I simply stacked another copy of drumbeats from the song.

"Even noise has the potential to be an element of performance".

If the previous section was just using noise as a decorative sound effect, then this section should be considered a deeper manifestation of that idea. No longer concerned with recreating the original song, my thoughts became more daring. First, I felt the audio was empty, so I boldly layered in some noise. After realizing I could use noise as a rhythm, I started to segment and manipulate it. I must admit, even noise can exhibit different textures with the change in environment. When you place two noises from different environments side by side, the difference in tone becomes particularly noticeable. These differences in noise textures were key in completing this audio piece. In the end, when the audio was finished, the effect of different noises from different scenes rushing towards me naturally inspired my video. Indeed, I think the video should also involve some kind of transition that matches the audio.

The "photo freeze-frame quick-cut" effect is a technique used in many video works, and I wanted to try it this time. However, there is a key point when using this technique, which is the need for an "anchoring element" to establish a visual connection. As I mentioned in the comment section:

For this segment, the most direct inspirations came from Kevin McGloughlin and Max Cooper.

In this video, the "anchored quick-cut" technique is well demonstrated. Rapid switching alone can only create visual chaos for the viewer. Therefore, in the midst of rapid and complex "changes", there must be something "unchanged" or "regular" so that the viewer knows where to focus their attention and appreciate the movement. So, I thought about using something readily available in Shenzhen East Station as the anchor, and I finally settled on "signs".

Before I started actively paying attention to these signs for this video, I never would have imagined that there were so many of them scattered throughout Shenzhen East Station. When I was scouting locations, almost every few steps I took, a sign would pop up in front of me. As a creator, this was a good thing, as I didn't have to worry about sourcing material. But on top of that, as an ordinary person stripped of my creator identity, I felt an indescribable surprise. I was shocked to discover that the amount of information in the scenes I face every day is so saturated. While I'm usually able to ignore these signs, when I consciously focus on them, I start to feel confused and dizzy over time. I couldn't help but marvel at how much my brain has compromised to help me adapt to city life.

Footage of signs

Enough about that. In short, I collected a lot of signs, planning to edit them to the audio when I got home. This may seem to contradict what I initially told Xuewuyun, since so far there had been no "maladroitness in searching". But in fact, the "maladroitness of searching" starts here. I had indeed collected a wide variety of materials, but how to combine them was actually the most complex part of the process. I established some key shapes like building blocks, then filled in between them. Most of my time was spent searching for materials in folders, dragging them into the software to see if they fit, or abandoning them to find new ones, in a continuous cycle, which did make me feel maladroit. But after several rounds of comparison and modification, although it's not very refined overall, I did manage to complete this part of the production.

While assembling, I also had to straighten out any crooked materials, and where I didn't have enough material, I filled the gaps with DALL-E.

DALL-E folder DALL-E

If you look closely, you should still be able to notice the unnatural parts of the picture, but since the scene changes quickly, it won't be scrutinized too much. In the end, I also set up a transition into the fifth section.

Section 5 (2:42-3:20)

Through the underground tunnel advertisement, we reach the fifth part of the piece. I personally feel that this is where the video really starts to get into the main course, and the effect in this section is also one of the core effects. After all, I even emphasized it and drew it in my initial sketch. Strictly speaking, this section can be further divided into three parts, but I always feel that the three are still connected, so I refer to them collectively as the fifth section. That way I can talk about it more freely later on, whether divided or combined.

From a technical perspective, the effect of the first part is actually very simple. So much so that some people would understand how it's done at a glance:

How it's made: tunnel

Overall, it's about hollowing out the billboard to create a feeling similar to a "window", and then pushing the camera through the windows to create "parallelism". I personally imagine it as different parallel compartments of the train, but some viewers interpret it as "parallel worlds". Either way, this sense of simultaneous existence yet irrelevance is indeed the focus of this section. I let a plugin handle the keying work and carefully patched up any missing areas by hand. Then I stacked the layers along the z-axis, added depth of field to the camera, and put mosaics on people's faces to protect the portrait rights of passers-by and the like. In the end, the traces of collage are still very obvious; after all, both people and backgrounds are compressed onto a plane, so the changes in their depth offset are not really that clear.

One effect I consider quite important is that I added a sort of "exposure change from dark to clear as the camera gets closer" to each 3D layer. After all, without some kind of "visibility" restriction for the camera, I would have had to control even the farthest layer to avoid any mistakes. This mist-like effect also adds a sense of the unknown as the camera advances. I thought for a long time about how to achieve this effect in AE, and finally decided to use a simple expression to do it. Although it seems rough, the effect is actually quite passable.

How it's made: exposure
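As a rough illustration of the idea (not the actual AE expression, which isn't reproduced here), the distance-based exposure can be sketched in Python. The function name, the parameter names, and the two distances are all hypothetical; the only thing taken from the text is that each layer goes from dark to clear as the camera approaches it.

```python
def layer_exposure(layer_z, camera_z, clear_dist=500.0, dark_dist=3000.0):
    """Return a brightness factor in [0, 1] for one 3D layer.

    Hypothetical sketch: clear_dist and dark_dist are made-up values,
    not the ones used in the video.
    """
    distance = abs(layer_z - camera_z)
    if distance <= clear_dist:
        return 1.0  # close enough: fully exposed
    if distance >= dark_dist:
        return 0.0  # too far away: hidden in the dark
    # Linear ramp between the two distances.
    return (dark_dist - distance) / (dark_dist - clear_dist)


# Five layers spaced along the z-axis, with the camera at two positions.
layers_z = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
for camera_z in (-500.0, 1500.0):
    print(camera_z, [round(layer_exposure(z, camera_z), 2) for z in layers_z])
```

Because every layer computes its own value from its distance to the camera, the farthest layers stay dark without any manual keyframing, which matches the "visibility restriction" described above.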

On the audio side, this part mainly emphasizes the bass and station announcements. I really liked the texture of the bass when I first heard the original song, so it was appropriate to preserve this texture. Other than that, the tracks are mostly ambient sounds, with: background noise,

Audio screenshot

high-frequency crackling noise from electrical sparks,

Audio screenshot

and to achieve a better transition effect, I even added "free wind transition sound effect", hehe.

Audio screenshot

The first half of the second part of the fifth section basically continues the method of the first part. The video uses repetition and offset in a more surreal way, while the audio replaces the original station announcement with noise slices similar to those in the fourth section. This part of the original song actually introduced a fast-paced bass, but since I couldn't figure it out, I used vocal slices as a substitute for the texture. However, when Liangyu took over, he figured it out and added it back for me, for which I am truly grateful.

Liangyu saves the day

FFFanwen: I don't think I got the bass quite right...
Liangyu: I'll see what I can do.

As for the second half of this section and the end of the third part of the video, I used a different method. After all, I didn't want to be confined to this corridor, and it was appropriate to change the scene at the right time (although the actual change was quite abrupt).

Depth map plugin

For this section, I used a plugin to generate depth maps to mask the original video, and then stacked the processed materials from different scenes together to create this effect. The depth maps generated by the plugin naturally can't be said to be very accurate. As shown in the figure below, the jitter between frames is almost fatal. But I didn't worry about that and just stacked them up as they were. Fortunately, since the overall lighting also flickers, the jitter of the material itself ended up reading as a flicker that seems to coincide with the light, so I kept it as a feature.

Depth demo
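To make the masking step concrete, here is a minimal Python sketch of using a depth map as a matte and stacking one scene over another. It assumes grayscale frames stored as nested lists and a depth map where larger values mean "nearer"; both conventions, and the function names, are assumptions for illustration rather than how the plugin actually works.

```python
def depth_matte(depth_map, threshold):
    """Alpha matte: 1.0 where a pixel is nearer than the threshold, else 0.0.

    Assumes larger depth values mean nearer (white = near).
    """
    return [[1.0 if d >= threshold else 0.0 for d in row] for row in depth_map]


def composite_over(front, matte, back):
    """Place `front` over `back`, using `matte` as per-pixel opacity."""
    return [
        [f * a + b * (1.0 - a) for f, a, b in zip(f_row, a_row, b_row)]
        for f_row, a_row, b_row in zip(front, matte, back)
    ]


# One-row toy frames: scene A keeps only its near pixels, scene B fills the rest.
scene_a = [[0.9, 0.8, 0.2]]
depth_a = [[0.95, 0.7, 0.1]]
scene_b = [[0.3, 0.3, 0.3]]

matte = depth_matte(depth_a, threshold=0.5)
print(composite_over(scene_a, matte, scene_b))
```

With a hard threshold like this, a pixel whose estimated depth hovers around the cutoff flips between the two scenes from frame to frame, which is one plausible source of the jitter described above.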

Section 6 (3:20-3:33)

This section is a transition in the latter half of the video, and the overall texture of the song suddenly changes, so I wanted to move away from the camera dolly movement used so far and create a new effect. For some reason, this slightly eerie tune reminded me of cyriak. I don't think I need to introduce this author too much. In my view, he excels at repetition techniques and compositing, so I thought I could draw some inspiration from him. I imitated him and created this effect through the repetition of objects.

Repetition Escalator

I cut out the objects I wanted in motion and then added an afterimage effect to create a feeling of the same object continuously appearing. At the same time, the audio emphasizes the ringing of the bell. This sense of urgency fits this transition period well and sets the stage for the next scene. A synchronized "next stop, next stop" oozes into the voiceover, combined with quick cuts of the station names in the picture, and the camera eventually shifts abruptly, heading towards the unknown, the so-called final station. To add a sharp sense of movement, I dropped some of the motion to 15FPS. At least in my view, this section should build a certain sense of anxious trance up to its peak, and then finally let it all out at once.

Section 7 (3:33-3:46)

Train door

This section is also composed of the repetition of the door. After all, a transition was set up where the camera moves towards the unknown terminus, so it feels like a "road" for running can be set up here, only this road is infinite, with no end. Moreover, there is something telling you to hurry in the background, which makes it necessary to move forward.

I thought of the warning sound when the subway door closes, which for me is just about the most annoying kind of command to hurry. So I incorporated it into the audio of this section, and let the camera slip through the gap into the next level almost every time the car door is about to close. I wanted to make the advance feel more urgent this way.

I'm not sure if anyone noticed, but I actually used DALL-E to expand this closing door footage. My phone couldn't capture the entire scene in the narrow space of the subway, so I had to resort to this. And for a smooth transition from the previous section, I also dropped this section to 15FPS. I did try a 30FPS version of this section, but for some reason, the closing action always felt a bit soft, so I decided to cut the frame rate in half and keep it at 15FPS.
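For readers unfamiliar with what halving the frame rate does on a 30FPS timeline, the following Python sketch shows one common way to do it: each source frame is simply held for two output frames. The function name is hypothetical, and this is only an assumed equivalent of whatever effect was used in the actual project.

```python
def held_frame(frame_index, source_fps=30, target_fps=15):
    """Return which source frame is shown at `frame_index` after the rate drop."""
    if source_fps % target_fps != 0:
        raise ValueError("this sketch only handles whole-number ratios")
    hold = source_fps // target_fps  # how many output frames each source frame lasts
    return (frame_index // hold) * hold


print([held_frame(i) for i in range(8)])  # [0, 0, 2, 2, 4, 4, 6, 6]
```

Every other frame is discarded, so each remaining pose stays on screen twice as long and the motion between poses becomes a larger jump, which is what gives the closing door its harder, snappier feel.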

Behind the door, I decided to add a layer of tunnel footage as a base. And on this basis, I personally wanted to make the tunnel editing a bit more peculiar. So I added a Droste effect, giving the tunnel an added layer of spiral depth.
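For the curious, a spiral Droste effect is often described in log-polar coordinates: the image is made to repeat at ever smaller scales, and a twist is added so that one turn around the center also moves one scale level inward. The Python sketch below maps a point of the output picture to the point of the source ring that should be sampled there. It is only a coordinate-mapping sketch with hypothetical names and radii, and says nothing about how the plugin used in the video is implemented.

```python
import cmath
import math


def droste_source(z, r_inner=0.25, r_outer=1.0, spiral=True):
    """Map a non-zero output point z (complex) to a point in the source ring.

    The source image is assumed to live in the ring r_inner <= |z| < r_outer.
    """
    period = math.log(r_outer / r_inner)  # one scale level, in log-radius units
    w = cmath.log(z)                      # real part: log radius, imaginary part: angle
    if spiral:
        # Shear log-polar space so a full turn (2*pi) also shifts one scale level.
        w *= complex(1.0, -period / (2.0 * math.pi))
    # Fold the log radius back into the source ring.
    log_in = math.log(r_inner)
    folded_re = log_in + (w.real - log_in) % period
    return cmath.exp(complex(folded_re, w.imag))


# Without the twist, a point at radius 2 samples the same direction at radius 0.5.
print(droste_source(2.0 + 0j, spiral=False))
```

Rendering a full frame would mean calling this for every output pixel and sampling the tunnel footage at the returned coordinate; the spiral variant is what turns a plain "picture within a picture" into a continuous corkscrew.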


However, this tunnel image I used actually came from a blogger online, and it doesn't even seem to be a tunnel in the Shenzhen subway (I apologize to the Shenzhen fans, I deserve to die).

But in the beginning, I did seriously consider filming the material myself. Line 14, which passes through Shenzhen East, runs driverless trains, so you can see straight through the glass at the front and rear of the car. I heard that when the line first opened, you could go right up to either end of the train and film the tunnel. Now it's sealed off by a railing and you can only look from a distance. In theory, you could still film some tunnel material from there, but in practice, the distance imposed by the railing means most of the frame is obscured by the window sill, and sometimes other passengers are filming as well. The resulting footage was not very good, so in the end, I chose to take a shortcut.

That being said, I had actually already reused this material earlier, in the second section.


Section 8 (3:46-4:49)

Finally, we've arrived at the last section. My ideas here weren't particularly complex. Whether in terms of audio or video, I simply tossed all the effects I had used before into a fragmented mix to achieve a massive amount of information. Except for one technique I hadn't used before, photogrammetry.

In fact, I initially considered the possibility of needing a 3D subway model later and planned to utilize a scanned subway model. For this, I even bought a Shenzhen subway model to try. Here's what the actual model looked like when I got it:

Train model 1

Well, it's a paper model after all, so I understand the lack of detail. Anyway, let's scan it and see:

Train model 2

Ughhh... how should I put it, this was probably my first attempt at scanning. Without a doubt, it was a failure. I hadn't prepared thoroughly and didn't know what factors would affect the scan quality. In my attempt, I managed to scan an object with this kind of texture. This, of course, made me feel defeated, but I also noticed the fragmented texture.

Train model 3

Afterwards, I made more attempts and managed to generate better models. However, looking at the details, they still weren't satisfactory. I didn't know how to redo the topology on the models, so like Edison searching for the right filament for the lightbulb, I just had to keep trying. Later, I even went down a dark path, buying a specialized subway model for scanning, hoping to achieve better results than with the paper model.

Train model 4 Train model 5 Train model 6 Train model 7

The final result wasn't satisfactory either. However, this rough and scattered texture does have a certain charm, perhaps it could even become a flavor of its own?

Train model 8

Just like the first scan filled with errors, if we don't consider these results as mistakes, but rather as flavors, it's hard to imagine what software or method could create this effect. Perhaps I could try not using it as a tool to scan physical models into high-quality ones, but rather as a tool to scan real-life scenes into rough, fragmented models.

So, I scanned interesting corners of Shenzhen East Station.

Photoscanning 1 Photoscanning 2 Photoscanning 3

Of course, this included the previously shot traffic light poles. I had roughly planned the transition at the end, so I made preparations at this time.

The final scan produced a texture that was to my liking.

Photoscanning 4 Photoscanning 5 Photoscanning 6

I combined them to create a tunnel that could be advanced through, creating the general look of the final section.

Photoscanning 7

Additionally, to give the models a dynamic feel, I added a layer of dynamic noise to the displacement properties of their materials. Although the effect isn't very pronounced under the moving camera, it does add a strange vitality.


On top of these models, I overlaid the depth map scenes and moving subway from the fifth section, finally achieving the visual effects of this section.

Photoscanning 8

As for the previously scanned subway model, I've also placed it here for a second or two. It's somewhat ironic that the material, which cost me a few hundred RMB and was supposed to be the focus of the production, only had such a minor effect in the end. When I needed to use a detailed subway model, I reluctantly paid for a model online that doesn't particularly resemble the Shenzhen subway, and added some effects to make it look passable. It's some black humor, and I can't help but tease myself about how I learned nothing but how to spend money.

But this experience of sparing no expense for a certain effect, much like gambling, is indeed more exciting than the gloomy feeling of trying out different effects on the computer. Especially when the final result is unexpectedly good, it's like hitting the jackpot.

In the end, I managed to sell the acquired subway model second-hand to recoup some of my costs.

Photoscanning 9

Finally, I successfully transitioned from the scanned traffic light scene to the real-life scene, bringing the visuals to an ending that corresponds to the beginning and smoothly ending this video.

About Shenzhen East itself

After a marathon of more than three months, this is the longest otoMAD I've made in my eight-year otoMAD career. It was a back and forth battle like I had never experienced before. Looking back, it feels like a dream. I don't remember what I went through or why I decided to make this video. Especially now that I've left Shenzhen and am recording these events in a completely unfamiliar place, I've started to lose clarity about what Shenzhen East really is.

From the video, Shenzhen East Station seems like a cold place made up of commuting people. But in reality, I lied. Point the camera elsewhere, color it in. There are certainly other people here too:

The Real Shenzhen

Especially after the removal of the coronavirus testing kiosk, the open space of the west square was restored. There were more parents bringing children to play and people taking walks. Sometimes when I come here at night, I can still see older folks bustling around in the open space. Aside from being a transportation hub, it's almost like a small park. But step outside, and it's surrounded by bustling commercial streets. This island-like scene also left a deep impression on me, but I didn't include it in the video to stick to the theme.

Shift the perspective to the other side, are there other types of people besides those leisurely walking and those busy commuting? The answer is, of course, yes. I remember one time I left the house around five in the morning to capture the early morning scenes of Shenzhen East Station. I thought there wouldn't be many people at the station at five o'clock. But the reality was, when I arrived, the underground hall of the station was already filled with people lying and standing in all directions, with large and small burlap bags by their sides. I guessed these were people who came to Shenzhen East Railway Station to take the long-distance train.

Speaking of which, I often overlook Shenzhen East's primary role as a train station. After all, Shenzhen East Station is a rather small one, with a waiting hall at best comparable to a slightly larger school lobby. The number of trains departing from here is minimal, mostly consisting of green trains. Hence, whenever I need to take a train, I usually opt for Shenzhen North instead of Shenzhen East. On the other hand, perhaps it's because of the affordable prices due to the absence of high-speed trains here that many workers carrying their cloth bags choose to take the train from Shenzhen East. I had to return to Meizhou this year, so I decided to take a train from Shenzhen East. On one hand, I wanted to capture some train-related footage of Shenzhen East, and on the other hand, I wanted to find out what it feels like as a train station. But when I actually boarded the train and was surrounded by people on their journeys, I truly sensed a different side of Shenzhen through them.

I ended up not photographing them. I just felt that at the very least, I should show some respect to these travelers, hence it's better not to turn them into a spectacle. There are things that I still don't fully understand, and without the creative ability and power to perfectly present these to the audience, it might be better to refrain.

So, I chose to focus only on the commuters and the subway that represents them and made the video. As for the real Shenzhen East, or even the real Shenzhen, it's not for me to define. I see many people in the comment section sharing their experiences and impressions of Shenzhen, and maybe by compiling them, we could get a general impression of Shenzhen East. To me, Shenzhen is neither particularly joyous nor painful. It's just an interesting place where I've grown accustomed to living.


That's pretty much my complete summary. Deconstructed in this way, the video somehow seems less appealing. Before making it, I attached too many labels. I wanted it to be part of my portfolio, and also a video that can be uploaded onto my channel; I wanted it to have the quality of a film, but also retain the characteristics of otoMAD. In the end, this work seems like a chimera, and even I don't know what it really is.

From a filmmaking perspective, the use of techniques in this work is not exhaustive. For example, the effect of making characters transparent in the third segment is a jarring addition. I used it simply because I wanted to.

Silhouette Haze

To solidify this effect would be simple - just use it again in the subsequent segments. I had originally planned to use this effect to create a character glitch feel in the freeze-frame quick cutting and push-shot segments. But in the end, I didn't because the workload was too much.

On the other hand, from an otoMAD perspective, I didn't adhere to the production methods of otoMAD. Dubbing, and non-matching audio and video are prevalent in this work. Perhaps only the station announcements in the middle can slightly highlight this work's attribute as an otoMAD.

Moreover, despite being a collection of works, the way I collect materials, including using the non-original music, still leans on the gray area. I didn't try my best to erase my habits surrounding this sense of "derivative work". Even what I said at the beginning of this article, about elevating "derivative creation" to "original production", I didn't persist till the end. I only offered a possibility, and finally slacked off on the details. It seems that as long as some feelings are in place, I can always rush and gloss over the other details.

Hasty and casual, that's actually my way of creation. The effects achieved must be seen, and what's expressed must be made clear to the audience. So I've also mocked myself that watching my works often means seeing the same thing over and over; it seems you can't discover more hidden details by rewatching, because I've put most of the effects on the surface.

The same applies to my personality. Honestly, I don't enjoy the happiness in the creative process. True, I feel novelty when I discover some new technique. But this is unrelated to purely enjoying the creative process, I'm just looking forward to the results brought by this effect and the completion of the whole work. Therefore, as I mentioned before about being hasty and casual, I really just want this work to be completed quickly. I really don't know what's so joyful about being immersed in details. To put it harshly, I think the real joy only exists in the few days after the moment the work is released, the rest of the time is just a painful endeavor, and the sense of alienation after the work is no longer "mine".

For me, creation is essentially a painful luxury. During the process, I was disgusted several times to the point where I would rather do anything else than this. I would play the most boring games just to avoid doing it. In the process of making this, I played Minesweeper, Sudoku, and 2048; even listening to academic lectures seemed more interesting. But in the end, I still have to make it, not only because it is a work that brings me practical benefits, but also because it is my way to escape from otoMAD.

It might seem strange. I'm creating otoMAD works, but not to enter the world of otoMAD, but rather to escape from it. This escape is not about quitting otoMAD, but about struggling to not be enveloped by the parts of otoMAD I dislike. I have indeed felt this way for quite a long time. After all, otoMAD is not just about the videos, it also encompasses various related matters, social connections, and "drama", which are also part of otoMAD. The videos are an important and fundamental aspect of otoMAD, but there's no concept of otoMAD that solely consists of the videos. So when facing otoMAD, I naturally have to confront those things I dislike, things that seem to be ostentatious. But when I finish a work, I genuinely feel a sense of freedom that's removed from these frivolous aspects of otoMAD. This feeling is different from the regret or defeat that comes from leaving a circle or fleeing to another. After this separation, I have the right to look at otoMAD on equal terms. However, this freedom only exists for a few days after a work is completed and published. So, for this freedom, I must continue to create.

On the other hand, I believe this so-called desire to escape is prevalent in other subcultures, and even in everyone's daily life. But clearly, disassociating from an identity doesn't provide a solution. Just like escaping from the otoMAD circle to others, these constraints still exist. Just like escaping from otoMAD to daily life, these constraints will undoubtedly manifest in all aspects of life. Of course, in the context of this work, it's like even after having escaped from Shenzhen, those shackles will surely follow you relentlessly.

Therefore, I want to confront these constraints, even if they're only confined to the small realm of otoMAD. But I can see that behind them lie the essential predicaments of life. It seems I have overlapped the struggles of otoMAD with the struggles of creating, living, and even more grandiose contexts. So, to rid myself of these universally applicable constraints, I start with otoMAD. Every time I finish a work, I gain a short-lived escape, the right to observe otoMAD from a distance on an equal footing. Even if I don't speak or articulate, the video forms a force that allows me to maintain my selfhood, which is also a kind of silent power. This is not the helpless silence of those who cannot speak, but a powerful silence like a towering tree standing in the blistering cold. Eventually, I hope to find the answers and solutions I seek within this relentless cycle of separation and confrontation.