-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathFORMAT
More file actions
293 lines (238 loc) · 11 KB
/
FORMAT
File metadata and controls
293 lines (238 loc) · 11 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
OptimizedObs is an audio container used in âge's rUGP engine.
The "reference implementation" of the format is the riooor.rpo
plugin file distributed with many titles based on rUGP.
OptimizedObs is based on Ogg Vorbis. The audio stream is 100%
compatible with Vorbis. The encapsulation format is a heavily
modified version of Ogg, similar in high-level structure but
more tightly packed than regular Ogg and without the packet
segmentation scheme employed in Ogg.
This document covers page versions 0 and 1. Other versions are
likely to be incompatible in some ways with what's written here.
Some knowledge of the Ogg Vorbis stream structure is assumed.
If a term written here seems unfamiliar, it's likely precisely
defined in the Vorbis I specification or one of the accompanying
documents. The most relevant resources are listed at the bottom of
this file, they're worth a skim.
Page bit packing conventions are not compatible with the way Vorbis
packet contents are packed. Bit order of the field contents is the
same but the fields themselves are "MSb-aligned" instead of
"LSb-aligned", so for example a 3-bit field x packed next to a
7-bit field y would look like this:
MSb LSb
xxxyyyyy byte 1
yy------ byte 2
Assuming x=5 (101) and y=19 (10011):
10100100
11------
Page header
-----------
All variants of the header begin with a 2-bit version and 4-bit
flags. Flags are, in order: continued (1000), partial (0100),
bos (0010) and eos (0001).
If the `continued` flag is set, the first packet on this page is
fragmented and needs to be merged with the last packet from the
previous page in order to be fully parseable. This is analogous to
the Ogg 0x01 bitflag.
If the `partial` flag is set, the last packet on this page is not
complete and needs to be merged with the first packet from the
next page in order to be fully parseable.
Unlike Ogg, last packet length doesn't have to be equal to 255 in
order for this flag to be set.
If the `bos` flag is set, this page is the first page in the stream.
This is analogous to the Ogg 0x02 bitflag.
If the `eos` flag is set, this page is the last page in the stream.
This is analogous to the Ogg 0x04 bitflag.
Pictured:
0 1 2 3 4 5 bits
+-+-+-+-+-+-+
|ver|c|p|b|e|
+-+-+-+-+-+-+
If the version is 1 or later, flags are followed by the 64-bit
granule position, padded to the next byte.
0 1 2 3 4 bytes
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|ver|c|p|b|e| | |
+-+-+-+-+-+-+-+-+ +
| granulepos |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |0 0|0|0|0|0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The zeroed 6 bits are presumably left here to make byte-oriented
parsing easier - with the 6-bit padding the rest of the page has
the same bit alignment regardless of the header version so it's
possible to read version, flags, granule position if present,
and use v0 page parsing code for the rest of the fields.
The following description applies to every page except for the first.
Information on how to parse the first page is available in the
"Identification header" section.
Following the version, flags and optional granulepos, the page header
contains packet length information. Length information consists of a
4-bit length of the `variable packet size` field in bits, 1-bit
padding, 8-bit packet count and a 2-bit selector of the `base packet
size` field length in bits.
0 1 2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 bits
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|vps len| | packet count |sel|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-----+------+
| sel | bits |
+-----+------+
| 0 | 0 |
| 1 | 8 |
| 2 | 11 |
| 3 |undefd|
+-----+------+
The rest of the header contains the base packet size and variable
packet sizes, one for each packet. Both packet sizes are in bytes.
For example a 2-packet page header with 3-bit variable sizes and
a base packet size of 15 bytes would look like this:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ......... |0 0 1 1| |0 0 0 0 0 0 1 0|0 1|0 0 0 0 1 1 1 1|size1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|size2|
+-+-+-+
5-packet page header with 4-bit variable sizes and a base packet
size of 512 (packets aren't usually that large, but just for the
sake of illustration):
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 bits
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ......... |0 1 0 0| |0 0 0 0 0 1 0 1|1 0|0 1 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| size1 | size2 | size3 | size4 | size5 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Packet size can be calculated by adding the base packet size to the
respective variable packet size.
The last variable packet size is followed by actual packet bodies.
Packets are byte-padded.
Vorbis headers
==============
While the audio packets (MSb 0) of the Vorbis stream are exactly
the same as in any regular Ogg Vorbis file, the header packets
(first byte 1, 3, 5) are shortened and packed more tightly. The
comment header is absent. The setup header, most likely due to
its enormous size and the fact it's in many cases the same for
multiple files, is usually stored apart from the audio file with
the setup page only containing the reference number.
Identification header
---------------------
The identification page is always the first page in the stream.
It's treated a bit differently from other pages as the size is
not explicitly stored in the page header and needs to be inferred
from the contents of the identification header.
Following the 6-bit padding (or version-flags in v0 pages), the
identification header consists of a 2-bit version number, 3-bit
channel count and a 2-bit sample rate selector. The version field
appears to have exactly the same contents as the page header
version. If the rate selector is less than 3, sample rate is
calculated by the following formula: 11025*2^rsl. Otherwise the
sample rate is determined by the following 8-bit field (present
only if the rate selector is equal to 3).
(Note: the reference implementation does not actually maintain
backwards compatibility here. Post-v0-capable versions always
use the values for which the ver column is marked with "*", older
versions the ones marked with "0". It's not clear why compatibility
is not maintained here as other parts of the post-v0 implementation
appear to be able to parse v0 pages. It's possible pre-v0 files
with these sample rates never existed or weren't deemed important
enough.)
+---+---+------+
|ver|sel| rate |
+---+---+------+
| 0 | 3 | 32000|
| 0 | 4 | 48000|
| 0 | 5 | 96000|
+---+---+------+
| * | 4 | 32000|
| * | 5 | 48000|
| * | 6 | 64000|
| * | 7 | 88200|
| * | 8 | 96000|
+---+---+------+
If the header version is greater than 0, the sample rate selector is
followed by:
- 1-bit unknown field (usually set to 1),
- 1-bit unknown field (usually set to 1),
- 7-bit unknown field (usually set to 0),
- 64-bit final granule position, equal to the granule position of the
last page.
Regardless of the version, the previous fields are followed by:
- 4-bit log2(blocksize 0) (consult the Vorbis spec for details),
- 4-bit log2(blocksize 1),
- 1 framing bit (always set to 1),
- (x<8)-bit zero padding to make the whole page byte-aligned.
Example of an entire setup page with the following fields:
- version: 0
- channels: 1
- sample rate: 44100
- blocksize 0: 256
- blocksize 1: 2048
0 1 2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 bits
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0 |0|0|1|0| 0 | 1 | 2 | 8 | 11 |1|0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
ver b ver chans sel lg(bs0) lg(bs1) f pad
Another example:
- version: 1
- channels: 2
- sample rate: 48000
- blocksize 0: 64
- blocksize 1: 128
- final granule position: 12345678
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 bits
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 1 |0|0|1|0| 1 | 2 | 3 | 5 |1|1| 0 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| 12345678 |
+ +-+-+
| | 6 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 7 |1|0| final granule position lg(
+-+-+-+-+-+-+-+-+
bs0) lg(bs1) f p
Setup header
------------
The setup page is always the second page in the stream.
0 1 2 3 4 5 6 7 bits
+-+-+-+-+-+-+-+-+
| | header id |
+-+-+-+-+-+-+-+-+
It's not clear what the first 2 bits are used for. If header id
is equal to 0, the packed setup header (without the "\x5vorbis"
magic string) is placed immediately after the id field; otherwise
it refers to a predefined setup header.
All known predefined headers are included in the `data` directory
for convenience, but it's possible to generate them from scratch
by executing the following command line:
echo | oggenc -r - $oggenc_options | trash/get-setup-hdr.escript > hdrfile
It's important to use libvorbis 1.0 for this task as the more
recent versions generate different setup headers. Get-setup-hdr
is a simple script that extracts setup headers from Ogg files
with the "\x05vorbis" prefix stripped.
+----+------------------------------------------+----------------------+
| id | sha1 sum of the stripped header | $oggenc_options |
+----+------------------------------------------+----------------------+
| 1 | 44a0a0ebf68508c81999bfee18882e73f08633e9 | -C 1 -R 44100 -q 0.5 |
| 2 | 35317cb593d127f30c589ccbdb111802f7f7398c | -C 1 -R 44100 -q 1.5 |
| 3 | 0fd840c8246d4dbea6467eff468ad3476f8483b0 | -C 1 -R 22050 -q 5 |
| 4 | 3612e56744fbdcdae2241d917f5ad2b430880f24 | -C 2 -R 44100 -q 4 |
| 5 | ed285ef5553f492664ee8cd13c98e59857b3a093 | -C 1 -R 44100 -q 2.2 |
| 6 | 707c8fb7bd247e3e91f1952107f8872d2b5a3ac9 | -C 1 -R 48000 -q 2.5 |
+----+------------------------------------------+----------------------+
Note: options mentioned above are just examples. In most cases the given
header can be generated with other combinations too so it's not possible
to reliably infer just from the setup header that a file with the given
header has been encoded with these exact options.
Audio pages
-----------
Every page past the setup page is an audio page. There's nothing special
about them, they contain Vorbis audio packets.
References
==========
Vorbis I spec: https://xiph.org/vorbis/doc/Vorbis_I_spec.html
Ogg framing: https://xiph.org/vorbis/doc/framing.html