The Tech of Vocaloid

Vocaloid 1 Interface
Various voice banks have been released for use with the Vocaloid synthesizer technology. Each is sold as "a singer in a box" designed to act as a replacement for an actual singer. As such, each is released with a moe anthropomorphic character as its avatar. These avatars are also referred to as Vocaloids and are often marketed as virtual idols; some have gone on to perform at live concerts as on-stage projections.
Vocaloid's singing synthesis technology is generally categorized as concatenative synthesis in the frequency domain: it splices and processes vocal fragments extracted from human singing voices, working on them in the form of a time-frequency representation. The Vocaloid system can produce realistic voices by adding vocal expressions such as vibrato to the score information. When Vocaloid was released in 2004, its synthesis technology was called "Frequency-domain Singing Articulation Splicing and Shaping" (周波数ドメイン歌唱アーティキュレーション接続法, Shūhasū-domain Kashō Articulation Setsuzoku-hō), although this name has not been used since the release of Vocaloid 2 in 2007.
"Singing Articulation" is explained as "vocal expressions" such as vibrato and vocal fragments necessary for singing. The Vocaloid and Vocaloid 2 synthesis engines are designed for singing, not reading text aloud, though software such as Vocaloid-flex and Voiceroid have been developed for that. They cannot naturally replicate singing expressions like hoarse voices or shouts.