One of the easiest solutions, relatively speaking, is ffmpeg. It is a very active open source project with some awesome people contributing. Building it from source can be bothersome, though. If you're using Linux, there is a good chance your distribution offers a package. Windows users should refer to the unofficial Win32 page. Consider donating some money; really, you don't want to build it yourself on Windows ;)
Let's get started with the conversion script I use on Windows:
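@echo off
REM Source and destination files (replace with your own names)
set in=input_file
set out=output_file
REM Output frame size, audio bitrate and video bitrate
set size=500x272
set ab=64k
set vb=384k
REM H.264 tuning options shared by both passes (explained below)
set flags=-coder ac -flags +gmc+loop+qpel+umv -flags2 +bpyramid+dct8x8+mixed_refs+wpred -me_method umh -subq 6 -partitions +parti4x4+parti8x8+partp4x4+partp8x8+partb8x8 -qmin 8 -qmax 48 -refs 3 -trellis 2 -sws_flags lanczos -threads 4 -y
REM Pass 1 analyses the video and writes statistics to the pass log
ffmpeg -i %in% -vcodec libx264 -b %vb% -s %size% -acodec libfaac -ac 2 -ab %ab% -pass 1 %flags% %out%
REM Pass 2 uses those statistics to distribute the bitrate; -y lets it overwrite the output
ffmpeg -i %in% -vcodec libx264 -b %vb% -s %size% -acodec libfaac -ac 2 -ab %ab% -pass 2 %flags% %out%
REM Clean up the pass logs
if exist ffmpeg2pass-0.log del ffmpeg2pass-0.log
if exist x264_2pass.log del x264_2pass.log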
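If you would rather drive the same two passes from a portable script, here is a minimal Python sketch. It assumes ffmpeg is on your PATH and accepts the same options as the batch file. Newer FFmpeg releases have renamed or dropped several of them, libfaac among them. The names FLAGS, build_pass_args and encode_two_pass are hypothetical helpers, not part of ffmpeg.

```python
import subprocess

# Same tuning options as the batch script; these names follow the ffmpeg
# builds of that era and may be rejected by newer releases.
FLAGS = ("-coder ac -flags +gmc+loop+qpel+umv "
         "-flags2 +bpyramid+dct8x8+mixed_refs+wpred -me_method umh -subq 6 "
         "-partitions +parti4x4+parti8x8+partp4x4+partp8x8+partb8x8 "
         "-qmin 8 -qmax 48 -refs 3 -trellis 2 -sws_flags lanczos -threads 4 -y").split()


def build_pass_args(src, dst, pass_no, size="500x272", vb="384k", ab="64k"):
    """Return the ffmpeg argument list for one pass of the two-pass encode."""
    return (["ffmpeg", "-i", src, "-vcodec", "libx264", "-b", vb, "-s", size,
             "-acodec", "libfaac", "-ac", "2", "-ab", ab,
             "-pass", str(pass_no)] + FLAGS + [dst])


def encode_two_pass(src, dst):
    """Run pass 1, then pass 2; check=True stops if ffmpeg reports an error."""
    for pass_no in (1, 2):
        subprocess.run(build_pass_args(src, dst, pass_no), check=True)
```

The arguments are passed as a list rather than a command string, so file names containing spaces need no extra quoting.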
This script makes certain assumptions about how you want your video done. I think H.264 and AAC are the codecs to use right now, so this script uses settings that mostly apply to them. Your mileage with other codecs may vary. As for the bitrates of the video and audio streams, they are more than enough at the size we're talking about (500 pixels wide). I think 64k is acceptable for stereo sound from most sources. For movie trailers I copy the original audio stream unchanged instead of re-encoding it. You can replace the relevant lines with:
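To put these bitrates in perspective, a quick back-of-the-envelope calculation helps. The sketch below estimates file size from the combined bitrate and ignores container overhead. The functions are hypothetical helpers, and the 24 fps frame rate in the example is an assumption, since the script leaves the source frame rate untouched.

```python
def estimate_size_mb(video_kbps, audio_kbps, seconds):
    """Approximate output size in megabytes (1 MB = 1,000,000 bytes), ignoring container overhead."""
    total_kbits = (video_kbps + audio_kbps) * seconds
    return total_kbits * 1000 / 8 / 1_000_000


def bits_per_pixel(video_kbps, width, height, fps):
    """Average number of video bits available for each pixel of each frame."""
    return video_kbps * 1000 / (width * height * fps)


# A two-minute trailer at 384k video + 64k audio:
print(estimate_size_mb(384, 64, 120))             # about 6.72 MB
print(round(bits_per_pixel(384, 500, 272, 24), 3))  # about 0.118 bits per pixel at 24 fps
```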
ffmpeg -i %in% -vcodec libx264 -b %vb% -s %size% -acodec copy -pass 1 %flags% %out%
ffmpeg -i %in% -vcodec libx264 -b %vb% -s %size% -acodec copy -pass 2 %flags% %out%
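There is no doubt that two-pass encoding is worth it whenever possible. The first pass analyses the whole video, so the second pass can spend more bits on complex scenes and fewer on simple ones. This much better rate control results in a crisper picture at the same file size. H.264 also supports more efficient entropy coding (CABAC) and rate-distortion optimized quantization (Trellis). It can also split its 16x16 macroblocks ('tiles') into smaller partitions, down to 4x4. It really takes a leap from the DivX era in terms of what you can do with a certain amount of data.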
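The idea behind two-pass rate control can be sketched in a few lines. The first pass yields a complexity estimate per frame. The second pass then splits the total bit budget so that harder frames get more bits, though less than proportionally. This is a deliberately simplified, hypothetical model: the exponent and the function name are illustrative assumptions, and x264's real rate control is considerably more involved.

```python
def allocate_bits(complexities, total_bits, exponent=0.6):
    """Split total_bits over frames in proportion to complexity ** exponent.

    complexities: per-frame difficulty measured in a (hypothetical) first pass.
    An exponent below 1 gives complex frames more bits, but not proportionally more.
    """
    weights = [c ** exponent for c in complexities]
    scale = total_bits / sum(weights)
    return [w * scale for w in weights]


# A simple frame and one four times as complex share 300 bits:
print(allocate_bits([1, 4], 300, exponent=0.5))  # [100.0, 200.0]
```

A one-pass constant-bitrate encoder with no look-ahead would give both frames 150 bits. Two-pass encoding can see ahead and move bits to where they are needed.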
But let's get down to the nitty-gritty: the parameters. If you are not a geek, you're done now ;) I am not an expert on video codecs, so the following may confuse you or may be flat out wrong.
-coder ac
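This enables Context-Adaptive Binary Arithmetic Coding, or CABAC for short. The short answer is that it further compresses the data in the video stream without losing anything. It does not operate on the image itself, but on the already encoded video data (motion vectors, coefficients and so on). You could call it 'zipping' your stream, except this coder is on steroids: it continuously adapts its probability estimates to the data it has seen so far. CABAC is a Main and High profile feature, and Flash Player supports it.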
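Why does adapting help? An arithmetic coder can spend less than one bit on a symbol it predicts well. The sketch below computes the ideal code length of a bit sequence under a simple adaptive probability estimate. It is an idealized model with a single context, not CABAC's actual state machine. CABAC keeps many such models and picks one based on context, such as the type of syntax element and its neighbours. The function name is hypothetical.

```python
import math


def adaptive_code_length(bits):
    """Ideal code length (in bits) of a binary sequence under an adaptive model.

    The probability of a 1 is estimated from the symbols seen so far; the counts
    start at 1 so no probability is ever zero. An arithmetic coder driven by this
    model would need (almost exactly) this many bits.
    """
    ones, zeros = 1, 1
    total = 0.0
    for b in bits:
        p_one = ones / (ones + zeros)
        p = p_one if b else 1 - p_one
        total += -math.log2(p)  # cost of coding this symbol
        if b:
            ones += 1
        else:
            zeros += 1
    return total


skewed = [0] * 95 + [1] * 5   # predictable, like many flags in a video stream
print(adaptive_code_length(skewed))        # far fewer than 100 bits
print(adaptive_code_length([0, 1] * 50))   # a bit more than 100 bits: no gain
```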
-flags +gmc+loop+qpel+umv
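Here we have Global Motion Compensation, which analyzes the video for motion affecting the whole picture, like camera pans. If many macroblocks (the 'tiles') are making the same move, this can improve compression. Not all hardware supports it because of the memory requirements. The loop flag enables the 'deblocking' filter, which smooths the edges between blocks. Because this filter runs inside the coding loop, the filtered frames are also used as references for prediction. Quarter-pixel (qpel) motion estimation tells the encoder that we're quite picky and want motion vectors with sub-pixel precision. Unlimited Motion Vectors, which may point outside the picture, are tossed in just for good measure to (maybe) further optimize the encoding. Note that GMC, qpel and UMV are options from the MPEG-4 Part 2 (DivX/Xvid) world. H.264 always uses quarter-pixel motion vectors, so libx264 probably ignores these three flags.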
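How does a decoder find a pixel value 'between' two pixels? For luma, H.264 computes half-pixel positions with a 6-tap filter with weights (1, -5, 20, 20, -5, 1), rounded and divided by 32. It then averages neighbouring samples to reach quarter-pixel positions. The sketch below shows only the one-dimensional horizontal case. The helper names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)


def clip255(v):
    """Clamp a sample to the 8-bit range."""
    return max(0, min(255, v))


def half_pel(row, i):
    """Half-pixel sample between row[i] and row[i + 1] (needs 2 samples left, 3 right)."""
    acc = sum(c * s for c, s in zip(TAPS, row[i - 2:i + 4]))
    return clip255((acc + 16) >> 5)  # round, divide by 32, clip


def quarter_pel(row, i):
    """Quarter-pixel sample between row[i] and the half-pixel position, by rounded average."""
    return (row[i] + half_pel(row, i) + 1) >> 1


ramp = [0, 10, 20, 30, 40, 50]
print(half_pel(ramp, 2), quarter_pel(ramp, 2))  # 25 23
```

The negative taps make the filter sharper than plain averaging, which is why the result has to be clipped at strong edges.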
-flags2 +bpyramid+dct8x8+mixed_refs+wpred
A video stream consists of different frame types. An I-frame stores a whole picture. A P-frame only stores the differences with respect to earlier frames. A B-frame can be predicted from both earlier and later frames. The B-pyramid feature allows some B-frames to serve as references for other B-frames, which builds a hierarchy (the 'pyramid') within each group of B-frames. This can optimize the encoding result: a B-frame in the middle of a group is a closer, and thus usually better, reference for its neighbours than the I- or P-frames at either end.
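Because B-frames refer to later frames, frames are coded in a different order than they are displayed. The sketch below shows a simplified hierarchical ordering. The anchor frames come first, then the middle B-frame, so it can act as a reference, and each half is then handled recursively. This is an illustration of the pyramid idea under that assumption, not x264's exact frame decision logic. The function names are hypothetical.

```python
def pyramid_order(first, last):
    """Coding order of the B-frames strictly between anchor frames first and last.

    The middle B-frame is coded first so it can serve as a reference for the
    B-frames on either side of it; each half is then handled the same way.
    """
    if last - first < 2:
        return []
    mid = (first + last) // 2
    return [mid] + pyramid_order(first, mid) + pyramid_order(mid, last)


def coding_order(first, last):
    """Anchors (I- or P-frames) first, then the B-frames in pyramid order."""
    return [first, last] + pyramid_order(first, last)


# Display order 0..4: I B B B P
print(coding_order(0, 4))  # [0, 4, 2, 1, 3]
```

Without a pyramid the B-frames would simply follow as 1, 2, 3, each predicted only from frames 0 and 4.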
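The 'dct8x8' flag enables the adaptive 8x8 transform. For each macroblock, the encoder may choose between H.264's standard 4x4 transform and a larger 8x8 one, whichever compresses that block better. Mixed references allow each 8x8 partition of a macroblock to use its own reference frame, instead of one reference for the whole macroblock. Weighted prediction lets the codec scale the brightness of a reference frame (a weight plus an offset) before using it for prediction. This optimizes movie segments like fade-ins or cross-fades.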
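A fade shows why weighted prediction pays off. When a frame is simply a darker copy of its reference, plain prediction leaves a large residual, while a weight of one half leaves none. The sketch below follows the form of H.264's explicit weighted prediction for a single reference. There, the weight is an integer scaled by 2 to the power log_wd, with rounding and clipping to 8 bits. The function name and the example values are hypothetical.

```python
def weighted_pred(ref, weight, offset, log_wd=6):
    """Explicit weighted prediction of one 8-bit sample from a single reference.

    With log_wd = 6, a weight of 64 means 1.0 and 32 means 0.5.
    """
    if log_wd >= 1:
        value = ((ref * weight + (1 << (log_wd - 1))) >> log_wd) + offset
    else:
        value = ref * weight + offset
    return max(0, min(255, value))


reference = [100, 200, 60]
faded = [50, 100, 30]  # the same picture at half brightness

plain = [weighted_pred(r, 64, 0) for r in reference]     # weight 1.0
weighted = [weighted_pred(r, 32, 0) for r in reference]  # weight 0.5
print([c - p for c, p in zip(faded, plain)])     # [-50, -100, -30]
print([c - p for c, p in zip(faded, weighted)])  # [0, 0, 0]
```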
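Please note that these features need a more advanced profile. B-frames, CABAC and weighted prediction require at least the 'Main' profile, and the 8x8 transform makes the stream 'High' profile. Not all devices, like my iPod Video 5G, support such advanced functionality. Fortunately, Flash Player does.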
-me_method umh -subq 6 -partitions +parti4x4+parti8x8+partp4x4+partp8x8+partb8x8
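We want ffmpeg to use an Uneven Multi-Hexagon search pattern while estimating motion. This is more precise, at the expense of encoding speed. The 'subq' option is a bit of a voodoo parameter. It sets how much effort goes into sub-pixel motion estimation and mode decision, trading speed for quality: higher is slower and better. The partitions parameter specifies which macroblock partition sizes the encoder may consider. These are 4x4 and 8x8 for intra blocks (i), 4x4 and 8x8 for P-frames (p), and 8x8 for B-frames (b).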
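At its core, motion estimation means finding the displacement at which a block of the current frame best matches the reference frame. Matching is usually measured by the sum of absolute differences (SAD). The sketch below does this by brute force, trying every vector within a search radius. Patterns like UMH exist to approximate this result while testing far fewer candidates; x264 also has an exhaustive mode (esa). The function names and the synthetic test frames are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))


def full_search(cur, ref, bx, by, size, radius):
    """Try every vector within +/- radius; return (best_sad, dx, dy)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip vectors that would read outside the reference frame.
            if not (0 <= by + dy and by + dy + size <= h and
                    0 <= bx + dx and bx + dx + size <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic 16x16 frames: the current frame shows the reference moved by (1, -2).
f = lambda x, y: 7 * x + 13 * y
ref = [[f(x, y) for x in range(16)] for y in range(16)]
cur = [[f(x - 1, y + 2) for x in range(16)] for y in range(16)]
print(full_search(cur, ref, 6, 6, 4, 3))  # (0, -1, 2)
```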
-qmin 8 -qmax 48 -trellis 2
An important part of compressing video is discarding detail the eye is unlikely to miss. Each block is first transformed into frequency coefficients. Those coefficients are then divided by a step size and rounded; this is called 'quantization'. The higher the quantizer, the more detail we lose in the picture. The 'qmin' and 'qmax' options set the lower and upper bounds for the quantizer the codec may use. You can see the current value in ffmpeg's output while encoding. If it reaches 'qmax' most of the time, your bitrate is too low. Finally there is Trellis, which is not a separate lossless coder but a smarter form of quantization. It picks the rounded values that give the best trade-off between picture quality and the number of bits needed to code them.
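In H.264 the quantizer step size doubles for every increase of 6 in QP. This sketch uses the standard step sizes for QP 0 to 5 and applies plain rounding. Real encoders use integer scaling with a dead zone, and trellis goes further still. The example assumes that qmin 8 and qmax 48 map directly onto H.264 QP values. The coefficient values and the function names are hypothetical.

```python
# H.264 quantizer step sizes for QP 0..5; every +6 in QP doubles the step.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for an H.264 QP (0..51)."""
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


def quantize(coeffs, qp):
    """Divide each transform coefficient by the step size and round to the nearest integer."""
    step = qstep(qp)
    return [round(c / step) for c in coeffs]


coeffs = [500, 120, -60, 30, 12, -5, 2, 1]
print(qstep(8), qstep(48))   # 1.625 160.0
print(quantize(coeffs, 8))   # every coefficient survives
print(quantize(coeffs, 48))  # [3, 1, 0, 0, 0, 0, 0, 0]: most detail is gone
```

Those zeros are cheap to code, which is where the savings come from, and also why a quantizer stuck near qmax looks blurry.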
-refs 3 -sws_flags lanczos -threads 4
With 'refs' we specify how many previously decoded frames may be used as references for prediction. The more references, the more memory the decoder needs, as it has to keep all those frames around. It comes as no surprise that most portable decoding hardware requires this to be '1'. Lanczos is one of the hidden gems of image resizing; it preserves an unbelievable amount of detail. The 'threads' parameter, with which we conclude, should be equal to the number of logical processors in your system.
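The Lanczos filter weights neighbouring samples with a windowed sinc function, sinc(x) times sinc(x / a) for |x| < a. The sketch below applies it to a one-dimensional signal. It assumes a = 3 and clamps at the edges. When shrinking, it widens the filter so that detail is averaged rather than aliased. This is a generic illustration, not swscale's actual implementation, and the function names are hypothetical.

```python
import math


def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, otherwise 0."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def resample(samples, new_len, a=3):
    """Resize a 1-D signal to new_len samples with a normalized Lanczos filter."""
    old_len = len(samples)
    scale = old_len / new_len
    stretch = max(scale, 1.0)  # widen the filter when shrinking
    support = a * stretch
    out = []
    for i in range(new_len):
        center = (i + 0.5) * scale - 0.5  # output sample position in input coordinates
        total = weight_sum = 0.0
        for j in range(math.floor(center - support) + 1, math.floor(center + support) + 1):
            w = lanczos_kernel((j - center) / stretch, a)
            total += w * samples[min(max(j, 0), old_len - 1)]  # clamp at the edges
            weight_sum += w
        out.append(total / weight_sum)  # normalize so flat areas stay flat
    return out
```

For the 'threads' setting, Python's os.cpu_count() reports the number of logical processors.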