Understanding CX2D Conversions and How to Get Best Quality



Post by ckhouston »


This topic is intended to explain why and how CX2D conversions are different from the conversion programs many users are familiar with, and how to get the best results from them. It is necessarily long in order to overcome the preconceived biases of users not familiar with CX2D’s unique method of converting, but you don’t have to read it if you are using version 4 or 5 and are willing to accept the concise recommendations in http://forums.vso-software.fr/answers-t ... 11326.html . If you have questions after reading that, please come back and read this entire post carefully.

This is the best starting point for versions 2 and 3 users to understand their conversions. Read the So How do We Decide which Option to Use for a Particular Project? section below at the very least for guidelines about how to choose the best encoding option.

Don’t just post in the forum expecting some quick magic answer or expecting volunteers that help here to take time to give you a private tutorial without carefully reading this entire post first. Feel free to post questions though if you don’t understand something specific or have questions about something not covered here.


There are three main types of encoding used to convert source files to DVD format.

Most users are familiar with constant bitrate (CBR) encoding which assigns the same amount of bitrate to every scene in a video. That usually causes more bitrate to be used for simple scenes than is needed to describe them but less than needed for complex scenes. The resulting quality of scenes varies widely with high quality for simple scenes and lower quality for complex ones. But converted size is easily predicted since an average bitrate is usually prescribed which will exactly fill a DVD completely.

And most users are familiar with the traditional variable bitrate (VBR) method which assigns bitrate according to scene complexity with more bitrate used for complex ones than CBR and less for the simple ones.

Most users apparently believe this method assigns the ultimate amount of bitrate to each scene but it doesn’t. It is usually constrained by an average bitrate like with CBR which will again exactly fill a DVD, and sometimes a minimum and maximum bitrate is also imposed. It will assign more bitrate to complex scenes than simple ones as one might expect. But it cannot allow bitrate to swing very much in order to achieve the average it seeks. The result is again less quality for more complex scenes than simple ones but the quality swing is much less than with CBR. A very good VBR encoder can achieve more consistent quality from scene to scene with multiple encoding passes which refine the bitrate distribution among different scenes. But even 5 or 6 passes will only approach constant quality for all scenes.
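To make the size constraint concrete, here is a rough sketch of the arithmetic a CBR or average-constrained VBR encode has to satisfy. The 4300 MB figure is the DVD5 size used in the guidelines later in this post; the function name, the 192 kb/s audio value and the assumption that 1 MB = 1,000,000 bytes are mine, for illustration only.

```python
def fill_bitrate_kbps(disc_mb, minutes, audio_kbps=0.0):
    """Average video bitrate (kb/s) at which a project exactly fills the disc.

    Assumes 1 MB = 10**6 bytes and 1 kb = 1000 bits, and ignores
    muxing/menu overhead, so treat the result as an estimate.
    """
    total_kbits = disc_mb * 1_000_000 * 8 / 1000
    total_kbps = total_kbits / (minutes * 60)
    return total_kbps - audio_kbps


# The longer the project, the lower the average every scene must share.
for minutes in (60, 120, 180):
    print(minutes, "min ->", round(fill_bitrate_kbps(4300, minutes, audio_kbps=192)), "kb/s")
```

Doubling the project length halves the total average bitrate, which is why an encoder that must hit that average cannot let bitrate swing freely for complex scenes.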

ConvertXtoDVD (CX2D) uses a constant quantization (CQ) method that many are not familiar with. It maintains the same quantization factor (Q) for all scenes and, since Q can be taken as a measure of quality, it assigns bitrate as needed to each scene to keep the same quality for all scenes regardless of complexity. The result can be a bitrate distribution over time that swings up and down much more than with VBR in order to keep quality the same for all scenes.

Converted size with CQ encodes is unpredictable before conversion since average scene complexity is not known, and Q is an integer so converted size jumps in steps as Q varies. The result is that DVDs will typically not be exactly filled as with other encoding methods that control bitrate. So do not expect many of your conversions with CX2D to completely fill a DVD.

Many experts familiar with CQ encodes do not consider the inability to fill a DVD completely to be a problem; they have found that CQ can give better overall quality than other methods that will fill the disc. And CX2D enjoys a reputation for producing the best quality among its competitors when it is used right.

Note: The basic encoding has not changed since the beginning of version 2 although some minor improvements were made along the way. But the names/descriptions of the encoding options were changed in version 4 to reflect better how they should be used. The equivalent names/descriptions are:

SP (Short Projects) in ver 4 = High quality in ver 2 and 3

MP (Medium Projects) in ver 4 = Medium quality in ver 2 and 3

LP (Long Projects) in ver 4 = Low quality in ver 2 and 3

And an Automatic option -- based on some guidelines that will be presented later in the So How do We Decide which Option to Use for a Particular Project? section below-- was added in ver 4 that automatically picks the appropriate option so you don’t have to choose manually.

Version 4 designations for the encoding options will be used in this topic.

Note: A more detailed discussion of encoding methods is in http://tangentsoft.net/video/mpeg/enc-modes.html

Sidebar: Some users that simply cannot accept that a conversion that does not fill a DVD can give better quality than one that does have asked if the converted size of the CX2D encode can be forced to fill the DVD in order to increase quality. No, it cannot; the result is then no longer a CQ encode and quality suffers. A test was done in http://forums.vso-software.fr/post40642.html#p40642 that proves it -- see the Wouldn’t Increasing Bitrate so Output Size is Bigger be Better? section there.

A Comparison of Results with Different Encoders

(Note: The images for referenced figures are at the end of this post. You can open a second tab in your browser and scroll it to the figures to view them without losing your place in the discussion.)

Fig. 1 compares the distribution of bitrate over a 2 minute segment of several conversions to a DVD5 target with that of the source. Time varies along the horizontal axis and each vertical division represents a 1000 kb/s bitrate change for the conversions but only 800 kb/s for the source -- the reason for the difference is not important to this discussion.

The source is a 20 minute clip taken from a high quality DL commercial DVD and added 6 times to get a 120 minute project. Using a 20 minute clip in this way allows keeping content constant while varying project length, in order to study its effect, by varying the number of times the clip is added to a project.

We can assume that every effort was made to provide an optimum presentation on the original DVD. So one way to evaluate a conversion of the source is to compare how well the shape of the converted result emulates that of the source. Comparing actual bitrates is not valid because the source was taken from a DL DVD so its bitrates would be higher than a conversion to DVD5.

In the figure:

1P = 1 pass conversion

2P = 2 pass conversion

Nero = the Nero Vision program

HC = the highly regarded HC encoder as implemented in the AVStoDVD program

CX2D = the ConvertXtoDVD program

Conversions in the figure are arranged from left to right in order of how well the converted distribution appears to emulate that of the source. The one that does best is the CX2D 1P CQ encode on the right. Note that it delineates complex scenes with high bitrates and simple ones with low bitrates just as the source does, and the relative complexity of scenes, as indicated by bitrate, is much the same.

It will surprise some that CX2D 2P resembles a traditional VBR encode more than CQ. That is because that conversion filled the DVD so a CQ encode could not be done, more about this later.

These results seem to substantiate the statement above that “A very good VBR encoder can achieve more consistent quality from scene to scene with multiple encoding passes which refine the bitrate distribution among different scenes. But even 5 or 6 passes will only approach constant quality for all scenes.”

The question in many minds must now be which is better, the HC 2P VBR or the CX2D MP 1P CQ conversion. One cannot make an accurate judgment from bitrate values alone because one encoder might be more efficient than another in the way it uses bitrate resources. My own limited comparisons of the two give a slight edge to CX2D for the more complex scenes. They seem to be very close in the simple scenes with HC maybe a little better. That is typical of comparisons between CQ and VBR conversions though, because VBR usually provides more bitrate in simple scenes but less in the complex ones because of the size constraint discussed above. A comparison of partial frame captures is presented in http://forums.vso-software.fr/post54205.html#p54205 for the last complex scene on the right so you can form your own opinion. And a user that stirred up quite a controversy finally decided that CX2D was best when the right settings were used -- see http://forums.vso-software.fr/post56139.html#p56139.

Effects of Different Encoding Options and Project Length

Fig. 2 shows how project length affects conversions. Both bitrate and Q are shown there with the vertical scale for Q varying from 0 to 10. Remember that Q (shown in green color) can be taken as a measure of quality with lower values indicating better quality.

Note that the plots for MP 1P are all the same because none of those conversions fill the DVD. The Q plots in them are not exactly constant; there are small peaks at the most complex scenes, so CX2D has not done a pure CQ conversion. It cannot, because DVD standards impose a maximum bitrate that can be used on a DVD so that standalone players can be designed to play it correctly. We still refer to conversions of that type as CQ even though they are not pure CQ.

(A pure CQ encode would require trial and error conversions to find the minimum Q that would not cause bitrate to exceed the max allowed and would not exceed the size of the DVD. That would normally require a higher Q than used by CX2D which would cause overall quality to be less than with CX2D’s modified method.)

Now look at the plots for SP 2P at 60 minutes. That conversion did not fill the DVD either so it is a CQ conversion in the same manner, although not as apparent because it uses a lower Q value of 1 and Q has to rise to the same levels as for MP 1P for the complex scenes.

(Higher values of Q indicate that more bitrate was needed to maintain constant Q but could not be applied.)

We can compare bitrate and Q directly here because the same encoder was used for all conversions. Extensive analysis of converted results confirms this.

So, at 60 minutes, the SP 2P conversion is clearly better than one with MP 1P in simple scenes, but there is little, if any, difference in complex ones. But look at what happens with longer projects where SP 2P fills the DVD and an average bitrate must be imposed to force converted size to fit the DVD. Q begins to rise for all scenes as project length increases. For 120 minute conversions, it is about 45% higher for SP 2P than MP 1P in complex scenes and only about 12 % lower in simple ones. So one would expect MP 1P to give better quality overall at 120 minutes, and thorough examination of frame captures combined with careful viewing of playback confirms that.

One must conclude then that the best choice of encoding options for a particular source depends on project length. Note that converted size also depends on project length and the choice of encoding option.

These plots show the progression from a CQ encode to VBR for SP 2P as project length increases. Since the overall quality of those encodes, compared with MP 1P, drops for longer projects, one must also conclude that using SP 2P for most projects, as proposed by some users, is not wise.

Fig. 3 compares 1 and 2 pass conversions at 120 minutes.

The difference between 1 and 2 pass for MP and LP is insignificant because neither fills the DVD so a CQ encode is done. Multipass cannot change anything that is constant whether CBR or CQ. Even in the complex scenes where CQ is not exactly constant, there is little room for bitrate improvement because the bitrate there is restricted by DVD standards. The only place a potentially significant difference is found is in transitions between scenes, but they pass so fast that they are inconsequential. So anyone that tells you that 2 pass is better when a conversion does not fill the DVD has a vivid imagination.

Now compare the SP 1P conversion with others in that figure. The SP 1P conversion is clearly bad in complex scenes because it has filled the DVD and is no longer a CQ encode, but is still the best in simple ones. SP 2P is better than that in complex scenes but worse in simple ones, but we have already established that MP 1P is better overall than SP 2P so the 2 pass option is a waste of time in this case. So we are faced with a compromise in deciding which option is best.

Note that the LP 1P conversion is very nearly a pure CQ encode but doesn't give results as good as MP 1P. So that figure illustrates that CX2D can give better results than a program that does a pure CQ encode.

Fig. 4 and 5 present worst case partial frame captures to help you decide. Remember that a frame capture represents what is actually converted and written on your DVD for every pixel, it is not enhanced as usually done when playing the conversion on your computer with a software player or when played on your TV through a standalone player.

Download those figures, load them in a viewer or graphics editor and zoom them to about 300 % where individual pixels are apparent to see differences better.

You should find big differences for complex scenes and less of a difference in simple ones. Those images also give a clue about how the different encoding options work.

Look for differences in sharpness of detail in the complex scene in faces, clothing and the colored tape around the stair rail and the post. You will find what appears to be the beginning of block type pixelation with SP 1P in areas where there is little variation in visual properties, like the man’s gray pants or the coats and hats. It really isn’t as bad as it may seem at first glance. In an editor that has a tool that shows visual properties for individual pixels, you will find that those properties actually vary within what appears to be a block of equal properties; the biggest block with exactly the same properties that I found in a quick examination is 3 x 3 pixels. So the encoder has done the best it can do under the circumstances but the result is not good when viewed during playback.

There may be little difference at first glance in the simple scene comparison. But a close look at things like the texture of wall paint, edges of the vertical post and individual strands of hair shows a little less detail for MP 1P, but the face, the most important part of the scene for most people, is fine unless one is very critical. The result can best be described as a softening of the scene. Many people that have no experience with video conversion and therefore no prejudices or biases either don’t notice a difference during playback or, believe it or not, actually prefer the softer look of the MP 1P conversion for simple scenes.

Note again that Fig. 4 and 5 represent worst cases, they are for the specific encoding options and projects shown in the titles. But they can be used to get an idea of what to expect in other cases, refer to the discussion above of Fig 2 and 3 to get an idea of what can happen.

MP 1P should be the clear winner here if one keeps an open mind. So a CQ conversion that did not fill a DVD is better than a non-CQ VBR conversion that did fill the DVD, at least in this case. The same will be true in most cases as shown by many tests discussed in the References below and opinions of unbiased people watching conversions.

And, because the only noticeable difference between SP 1P and MP 1P when both do not fill the DVD, is a slight reduction in quality with MP 1P in simple scenes which many do not object to or even like better, one can conclude that not much is lost, if any, if the MP option is chosen when in doubt about whether SP will fill the DVD or not.

Sidebar: Keeping an open mind in deciding quality can be difficult for those bringing strong prejudices/biases from their experience with VBR. This has been discussed many times in the forums so we won't dwell on it. But be aware that your mind is a strong filter of what you actually see. In other words, if you still believe at this point that bigger just has to be better somehow with CX2D, you will be convinced that it is even though others with no biases don't agree.

Why the Difference in Complex and Simple Scenes?

What is going on? There is little difference in the simple scene of Fig. 5, SP 1P looks only slightly better despite applying more than 4 times as much bitrate as MP 1P. But there are big differences in the complex scene of Fig. 4. MP 1P is much better there even though it applied only a little more than 2 times as much bitrate.

That points to the conclusion that quality in a CQ encode is not determined by bitrate alone, and therefore bigger converted size does not guarantee quality. One might also conclude that the SP option applies more bitrate to simple scenes than is necessary if you agree that MP does a pretty good job.

This behavior, which seems odd to users not familiar with CQ, is due to the way Q is used by the encoder -- CX2D uses the open source ffmpeg encoder.

The encoder allows Q to range from 1 to 31. When Q = 1, every pixel in every frame, regardless of how complex the scenery is, is described in the best detail the encoder is capable of. As Q rises above 1, the encoder begins to save bitrate resources in simple scenes by grouping together adjacent pixels that have similar visual characteristics. Those saved resources can then be applied in complex scenes to maintain their quality.

If Q = 2, you will rarely find more than 2 or 3 pixels grouped together and those grouped pixels/blocks are spread throughout the frame, rarely close to each other where it would be objectionable. The blocks naturally become bigger and closer together as Q increases but still not very objectionable as long as Q is less than about 5. You will start to see noticeable blocks when Q is above 5 in a scene, they will be objectionable for most people for Q more than about 10, very few people will like a conversion where Q rises above 15, and any conversion producing Q as high as 25 in any scene is virtually unwatchable.
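As a quick reference, the rough Q ranges described above can be written as a small lookup. This is only a sketch: the function name is mine, and the cut-offs are the "about" values from the paragraph above, not hard limits built into the encoder.

```python
def describe_q(q):
    """Map a quantizer value (1..31) to the rough viewing impression
    described in the post. Thresholds are approximate by nature."""
    if not 1 <= q <= 31:
        raise ValueError("Q must be between 1 and 31")
    if q == 1:
        return "best detail the encoder is capable of"
    if q <= 5:
        return "small scattered blocks, not very objectionable"
    if q <= 10:
        return "noticeable blocks"
    if q <= 15:
        return "objectionable for most people"
    if q < 25:
        return "very few people will like it"
    return "virtually unwatchable"


for q in (1, 3, 8, 12, 20, 25):
    print(q, "->", describe_q(q))
```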

Sidebar: The encoder options actually provide minimum Q values for the encoder to use:

Min Q for SP = 1

Min Q for MP = 2

Min Q for LP = 3

For some reason unknown to me, a minimum Q of 1 is used for all frames when possible with the SP option, but Q = 2 is only used for about 1/3 of the frames with MP while the remaining 2/3 are assigned Q = 3. The result is an average of 2.67 as reflected in the plots already presented. A similar thing happens with LP so the lowest average is 3.67. (For those familiar with mpeg encoding, I and P frame types receive the lower values while B frames receive the higher value.)
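The averages quoted above are just frame-weighted means, which can be checked with a few lines. The table below is a sketch: the SP and MP rows come straight from the text, while the LP row (Q = 3 for about 1/3 of frames and Q = 4 for the rest) is inferred from the statement that "a similar thing happens" and from the 3.67 average.

```python
# (Q value, fraction of frames) per encoding option.
# LP split is inferred from the 3.67 average, not measured separately.
Q_MIX = {
    "SP": [(1, 1.0)],
    "MP": [(2, 1 / 3), (3, 2 / 3)],
    "LP": [(3, 1 / 3), (4, 2 / 3)],
}


def average_q(option):
    """Frame-weighted average Q for an option when the disc is not filled."""
    return sum(q * fraction for q, fraction in Q_MIX[option])


for option in ("SP", "MP", "LP"):
    print(option, f"{average_q(option):.2f}")
```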

What Happens with Different Source Files?

We have learned that a CQ encode uses only as much bitrate as necessary to keep quality the same for all scenes. Less bitrate is needed for simple scenes than complex, so it is logical to expect that converted size of different sources will vary according to the average scene complexity.

Logs posted in the forum were surveyed to shed light on how much variation can be expected with typical sources. The results are in Fig. 6.

Only conversions that did not fill the DVD are shown because that gives a CQ encode. (The percent occupancy of the conversion is shown in your log. Because of the way CX2D works, occupancy less than 95 % usually indicates a CQ conversion while those above 95 % usually indicate a less desirable traditional VBR conversion.) Only those conversions with one audio stream, either stereo or 5.1 surround, and one or no subtitles are shown, because audio and subtitles require fixed bitrate resources so multiple ones reduce bitrate resources that could be used for better video quality. And there were not enough samples of LP conversions to include.

The bitrates shown on the vertical scale were obtained by simply dividing the converted size by project length. They can easily be converted to more conventional kb/s units if desired by using the conversion factor shown there.
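If you want to place your own conversion on that scale, the calculation is simple. This is a sketch with my own function names; the conversion factor here assumes 1 MB = 1,000,000 bytes and 1 kb = 1000 bits, so it may differ slightly from the factor printed in Fig. 6 if that one uses binary megabytes.

```python
# kb/s per (MB/min), assuming decimal megabytes and kilobits.
KBPS_PER_MB_PER_MIN = 8 * 1000 / 60


def size_rate_mb_per_min(converted_mb, minutes):
    """Converted size divided by project length, as plotted in Fig. 6."""
    return converted_mb / minutes


def to_kbps(mb_per_min):
    """Convert MB/min to the more conventional kb/s."""
    return mb_per_min * KBPS_PER_MB_PER_MIN


# Hypothetical example: a 100 minute project that converted to 3000 MB.
rate = size_rate_mb_per_min(3000, 100)
print(rate, "MB/min =", round(to_kbps(rate)), "kb/s")
```

Keep in mind that this figure includes audio, subtitles and overhead because it is based on total converted size, so it will be higher than the video bitrate alone.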

Fig. 6 shows that

1. Converted size for SP can vary by more than a factor of 2 because of different scene complexity and by more than a factor of 3 for MP.

2. Project lengths that will not fill a DVD5 can vary by a factor of more than 2 for SP and more than 3 for MP.

3. Converted size for complex scenery is about 1.8 times higher with SP than with MP. That ratio is about 2.5 for simple scenery and about 2.2 for average scene complexity. (This illustrates how converted size jumps in steps when the encoding option is changed, as mentioned earlier.)

Now we must conclude that the best choice of encoding options for a particular source depends not only on project length but also very much on average scene complexity in the source.

Remember that these are typical results. There are a few videos with average scenery even more complex than these and some even more simple.

Data in Fig. 6 is presented as actual numbers in http://forums.vso-software.fr/post71866.html#p71866 along with some more insight into what they indicate.

So How do We Decide which Option to Use for a Particular Project?

We want to use an encoding option that will come as close as possible to filling the DVD without actually filling it. But we don't know for sure just how complex the scenery is in our project and how much it will affect converted size. So how can we choose the best encode option to use?

The easiest way is to rely on some general guidelines that were developed over two years ago and seem to work well for most users and most projects.

Briefly, for conversions to a DVD5, use:

SP for projects less than 80 minutes long

MP for those between 80 and 160 minutes

LP for those more than 160 minutes

Multiply those times by 8100 / 4300 = 1.88 to get appropriate switch times for DVD9.
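The guidelines above can be sketched in a few lines. The function and constant names are mine, and how the exact boundary values (80 and 160 minutes) are treated is an assumption since the guidelines do not say. This is not the code the Automatic option uses, which also applies corrections this sketch leaves out.

```python
DVD5_MB = 4300
DVD9_MB = 8100


def pick_option(minutes, disc_mb=DVD5_MB):
    """Suggest SP, MP or LP from project length using the rule of thumb.

    Switch times are scaled by disc size relative to DVD5, which gives
    the 8100 / 4300 = 1.88 factor for DVD9.
    """
    scale = disc_mb / DVD5_MB
    if minutes < 80 * scale:
        return "SP"
    if minutes <= 160 * scale:
        return "MP"
    return "LP"


print(pick_option(70))             # short project on DVD5
print(pick_option(120))            # medium project on DVD5
print(pick_option(120, DVD9_MB))   # same project on DVD9
```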

Note that the same guidelines are shown in the encoding settings of version 4. And the Automatic option in versions 4 and 5 uses them to decide whether to use SP, MP or LP, with a few minor corrections for things like multiple audio streams. Version 2 and 3 users have to set encoding options manually.

It may seem that these guidelines ignore the big effect that average scene complexity has. But that effect is accounted for indirectly because the switch times are based on the optimum for a test file (the same one that results are shown for here) that has average scene complexity about the same as those shown in Fig. 6 as Complex. Then the guidelines will recommend changing from SP to MP much earlier than might be optimum for simpler sources. But remember the earlier statement that:

And, because the only noticeable difference between SP 1P and MP 1P when both do not fill the DVD, is a slight reduction in quality with MP 1P in simple scenes which many do not object to or even like better, one can conclude that not much is lost, if any, if the MP option is chosen when in doubt about whether SP will fill the DVD or not.

The same principle applies to a choice between MP or LP.

Like any rule-of-thumb, the guidelines work well most of the time but are not infallible.

They work very well for projects with fairly complex average scenery. And they usually work well for most people and most projects with scenery between complex and simple.

Alternate Ways to Pick the Best Option

There is no easy way to pick a better option than indicated by the guidelines. But there are some things one can try for very simple scenery sources if determined to get a slight quality increase and satisfy the desire for a fuller DVD at the same time. Note: There is another method presented in http://forums.vso-software.fr/determine ... 17528.html that can be used in addition to those discussed below.

A conversion with MP which gives disc occupancy less than about 40 to 50 % indicates fairly simple average scenery so you can probably reconvert using SP if it is important. You will still probably get a good result in that case even if the new SP conversion does fill the DVD because it will most likely still resemble CQ fairly well.

You can also start a conversion with an option higher than the one indicated by the guidelines, let it run for about 10 minutes then pause it. Note the bitrate shown in the status bar at the bottom, convert it to MB/min using the conversion factor in Fig. 6, then project a horizontal line that intersects the red envelope line in Fig. 6, and then project down to find the maximum project length that can be used without filling the DVD. Then proceed with the conversion or switch options as indicated. (To find the equivalent max project length for a DVD9 disc, multiply the max for a DVD5 found from Fig.6 by 8100/4300 = 1.88.)

This approach is not foolproof though. The average scenery complexity in the first few minutes may be significantly different than in the rest of the video.
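If you don't have Fig. 6 handy, the same estimate can be approximated with arithmetic. This sketch assumes the red envelope line corresponds roughly to disc capacity divided by project length, which is my reading and not something confirmed here. It also assumes decimal units (1 MB = 1,000,000 bytes) and that the status bar shows video bitrate only, so an audio bitrate can be added. All names are hypothetical.

```python
def kbps_to_mb_per_min(kbps):
    """Convert kb/s to MB/min, assuming decimal kilobits and megabytes."""
    return kbps * 60 / 8 / 1000


def max_project_minutes(video_kbps, audio_kbps=0.0, disc_mb=4300):
    """Longest project that should stay under disc_mb if the observed
    bitrate holds for the whole video (a big 'if', as noted above)."""
    rate = kbps_to_mb_per_min(video_kbps + audio_kbps)
    return disc_mb / rate


# Hypothetical reading of 5000 kb/s after about 10 minutes of converting:
print(round(max_project_minutes(5000)), "min on DVD5")
print(round(max_project_minutes(5000, disc_mb=8100)), "min on DVD9")
```

If your project is longer than the estimate, the option you started with will probably fill the disc and the next lower option is the safer choice.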

A better way, but more time consuming, is to set the target size to a high custom value in settings, then convert with your preferred option. Then you can decide if the conversion will fit on a DVD5 or DVD9. You can try again with another option if the results indicate so.

You can make better judgments as you gain familiarity with conversions, particularly if most of your sources are very similar and you pay attention to where their conversions fall in Fig. 6. But you may find, as one user did, that there can be as much as a 40 % difference in scene complexity, and therefore in converted sizes, between collections of the same TV show for example.

In the end, most people are better off sticking with the guidelines or choosing the Automatic option in ver 4.

Judging Scene Complexity

A discussion of this is necessary mainly because of the widespread mistaken and misleading impression that action primarily determines complexity. It does not. Bitrates assigned to different scenes by encoders indicate how complex a scene is, and they show that most action scenes are not very complex. In fact, the average complexity of most action movies is simpler than commonly thought; it usually falls somewhere from fairly simple to about midway in the range between the most simple and most complex.

A basic explanation of the way an encoder compresses helps understand why. It will fully describe the first frame using grouping of pixels and other techniques to reduce the amount of data that describes what is in the frame. Then only differences from that first frame are described in the next few frames. Those differences are caused not only by action but also by any movement that makes the next frames different -- natural movement such as falling rain or wind rustling leaves, camera pans or zooms, etc. An occasional full frame refresh is done to re-establish accuracy.

Try to think like an encoder and imagine that you had to describe each frame, maybe with a detailed drawing. Suppose the camera records a man running across the frame. The background doesn’t change so only the man has to be described in subsequent frames to account for differences. But his image is usually blurred so there is not a lot of difference to describe, it is a fairly simple scene in other words. Now suppose the camera pans to follow the man. This scene is a little more complex but still fairly simple, because the background will be blurred and easy to account for and because people are fairly easy to describe even if in perfect focus.

It turns out that what we regard as action video, meaning a lot of fast movement by people or machines, is not very complex in many videos. One of the most complex scenes you can find is a forest with leaves gently rustling in the breeze. There is much detail to describe there if the scene is in sharp focus because of all the contrasting edges and variation of visual properties within those edges -- note that black and white video is much easier to describe, and therefore simpler, than color because it has less variation of visual properties. And almost all that detail has to be accounted for in the next few frames because of the slight movement of the leaves. But that scene is certainly not what most would describe as action. Another example of a very complex "non-action scene" is shown in Fig. 4. It is complex because of the falling confetti, which provides both movement and detail to describe; the rest of the scene involves a lot of detail because there are several people and they are clapping, which is movement to describe. But it is also not what one would call an action scene.
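A toy model can make the "think like an encoder" idea more tangible. The sketch below simply counts how many pixels differ between two consecutive frames for a small object moving over a flat background versus a fine texture that shifts slightly. It is only an illustration: real MPEG encoders use motion compensation and transforms rather than raw pixel counts, and the frame sizes and shapes here are made up.

```python
W, H = 32, 16  # tiny made-up frame size


def changed_pixels(prev, curr):
    """Count pixels that differ between two frames of equal size."""
    return sum(
        p != c
        for row_p, row_c in zip(prev, curr)
        for p, c in zip(row_p, row_c)
    )


def runner_frame(x):
    """Flat background (0) with a 2 x 4 'runner' (9) starting at column x."""
    frame = [[0] * W for _ in range(H)]
    for y in range(6, 10):
        for dx in range(2):
            frame[y][x + dx] = 9
    return frame


def leaves_frame(phase):
    """Fine checkerboard texture that shifts by one pixel per phase step,
    standing in for leaves rustling in the breeze."""
    return [[(x + y + phase) % 2 for x in range(W)] for y in range(H)]


runner = changed_pixels(runner_frame(10), runner_frame(12))
leaves = changed_pixels(leaves_frame(0), leaves_frame(1))
print("runner moving fast:", runner, "of", W * H, "pixels changed")
print("leaves barely moving:", leaves, "of", W * H, "pixels changed")
```

The fast "action" changes only a small patch of the frame, while the barely moving texture changes everything, which is the point made above about forests and confetti.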

In fact, the video that has the most complex average scenery I have found is a documentary of the Amazon river area, certainly not an action video. It was filmed by Imax with extraordinary quality, things in focus, etc. The Avatar movie has a lot of fast action and its scenery is almost as complex as the Amazon documentary. It has extraordinary detail even in fast action scenes because of the excellent detail in the animation. But those graphics allow sharp detail in action scenes that cannot be captured by a camera.

Compare your conversions with those in Fig. 6, think like an encoder to understand the complexity indicated, and you will eventually be able to judge if you want to choose custom encoding options rather than rely on guidelines or the Automatic option.


Here are some reference links that provide more detail about this subject. Follow links in them, or even read the entire topic that contains them, for even more information.

First results -- http://forums.vso-software.fr/post20817.html#p20817 and for bitrate only http://forums.vso-software.fr/post38576.html#p38576

Some more Answers -- http://forums.vso-software.fr/post40642.html#p40642 and http://forums.vso-software.fr/post40813.html#p40813 and http://forums.vso-software.fr/post41357.html#p41357

Compression -- http://forums.vso-software.fr/post43436.html#p43436

Modified CQ -- http://forums.vso-software.fr/post47115.html#p47115

2 Pass -- http://forums.vso-software.fr/post51085.html#p51085

Summary -- http://forums.vso-software.fr/post38321.html#p38321

We see what we want -- http://forums.vso-software.fr/post26588.html#p26588

180 min SP, LP images -- http://forums.vso-software.fr/post21505.html#p21505
Fig. 1 - Encoder Comparison for 120 Minute Project
Fig. 2 - SP 2P and MP 1P Compare for Different Project Lengths
Fig. 3 - 1 and 2 Pass Comparison for 120 Minute Project
Fig. 4 - Complex Scene Compare for 120 Minute Project
Fig. 5 - Simple Scene Compare for 60 Minute Project
Fig. 6 - Effect of Scenery Complexity and Encode Option.
Last edited by ckhouston on Sun Nov 21, 2010 1:48 pm, edited 13 times in total.