ffmpeg - Thumbnail and Preview Clip Generation (Part 2)

Disclaimer - If you are unfamiliar with FFmpeg, then please read this blog post before proceeding.

When you upload a video to a platform such as YouTube, you can add a custom thumbnail image that is displayed with the video in search results and recommendations. Among the many recommended videos, a professionally made thumbnail captures the attention of undecided users and improves the chances of your video being played. At a low level, a thumbnail consists of an image, a title and a duration (placed within a faded black box and fixed to the lower-right corner).

To generate a thumbnail from a video with ffmpeg:

  1. Decide on a frame to extract the thumbnail image from via the -ss and -vframes options. The thumbnail only requires a single frame, so set the -vframes option to 1. The -ss option ("start timestamp") lets you pick the timestamp within the video from which to extract that frame. If the -ss option is not provided to the ffmpeg command, then by default, the frame is extracted from the 00:00:00.000 (hours:minutes:seconds.milliseconds) timestamp.

  2. Draw the title and duration onto the thumbnail image via the drawtext video filter. A video filter transforms the media streaming through it and is specified with the -vf option. When a filter finishes modifying its input, it outputs the result, which is piped to the next filter as its input. Connecting multiple filters forms a directed graph called a filtergraph.

Adding Text to a Thumbnail

Let's test the drawtext filter by extracting the thumbnail image from the beginning of the video and writing "Test Text" to the center of this image. This thumbnail image will be a JPEG file.
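A minimal sketch of such a command follows. It assumes the Big Buck Bunny sample file used later in this post as the input, an ffmpeg build with drawtext support (libfreetype and fontconfig), and an output name, thumbnail.jpg, chosen for illustration. The font color and size are arbitrary values.

```shell
# Assumed sample input; replace with the path to your own video.
INPUT=./Big_Buck_Bunny_360_10s_30MB.mp4

# Center the text: x = (frame width - text width) / 2, and likewise for y.
FILTER="drawtext=text='Test Text':fontcolor=white:fontsize=40:x=(w-tw)/2:y=(h-th)/2"

# -ss picks the timestamp, -vframes 1 keeps a single frame, -y overwrites the output.
if [ -f "$INPUT" ]; then
  ffmpeg -y -i "$INPUT" -ss 00:00:00 -vframes 1 -vf "$FILTER" ./thumbnail.jpg
fi
```

The filter string is kept in a variable only so that it is easy to read and reuse; passing it inline to -vf works the same way.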

Notice that the drawtext filter accepts the parameters text, fontcolor, fontsize, x and y for configuring it:

  • text - What should be written to the media by the filter?

  • fontcolor - What color should the written text be?

  • fontsize - How large/small should the written text be? When this parameter is set to a numerical value, the font size is in pixels.

  • x and y - Where should the written text be placed on the media? These parameters can accept expressions. Here, we set x to (w-tw)/2 and y to (h-th)/2, which are the coordinates that center the text on the media. These expressions make use of the variables w (the input's width), tw (the text's width), h (the input's height) and th (the text's height).

The parameters are delimited by a colon.

For a full list of drawtext parameters, see the drawtext section of the FFmpeg filters documentation.

Adding Multiple Texts to a Thumbnail

Now that we've covered the basics, let's add a duration to this thumbnail:

Unfortunately, there's no convenient variable like w or tw for accessing the input's duration. Therefore, we must extract the duration from the input's information, which ffmpeg prints when it is run with the -i option.

2>&1 redirects standard error (2 for stderr) to standard output (1 for stdout), which is needed because ffmpeg writes the input's information to standard error. We pipe this information to grep to find the line containing the text "Duration", and then pipe that line to cut to extract the duration (e.g., 00:00:10 for ten seconds). The duration is stored in a variable DURATION so that it can be injected into the text passed to drawtext.
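The pipeline described above might look like the following sketch. It assumes ffmpeg prints the duration on a line of the form "  Duration: 00:00:10.00, start: ..." (two leading spaces), which is what the field number passed to cut relies on. The extract_duration helper is a name introduced here for illustration.

```shell
INPUT=./Big_Buck_Bunny_360_10s_30MB.mp4

# Pull HH:MM:SS out of a line such as "  Duration: 00:00:10.00, start: ...".
# Space-delimited field 4 is "00:00:10.00,"; the second cut drops the fraction.
extract_duration() {
  grep "Duration" | cut -d ' ' -f 4 | cut -d '.' -f 1
}

# ffmpeg writes the input's information to stderr, hence the 2>&1.
if [ -f "$INPUT" ]; then
  DURATION=$(ffmpeg -i "$INPUT" 2>&1 | extract_duration)
  echo "$DURATION"
fi
```

If your ffmpeg build formats the line differently, adjust the field number accordingly.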

Here, we use two drawtext filters to modify the input media: one for writing the title text "Test Text" and one for writing the duration "00:00:10". The filters are comma delimited. To place the duration within a box, provide the box parameter and set it to 1 to enable it. To set the background color of this box, provide the boxcolor parameter.
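As a sketch of the two chained filters, assuming the duration has already been extracted into DURATION: the font sizes, the box color and the position of the duration (60 pixels below the title) are illustrative choices. One detail that is easy to miss is that the colons in "00:00:10" must be escaped, because drawtext uses colons to separate its parameters.

```shell
INPUT=./Big_Buck_Bunny_360_10s_30MB.mp4
DURATION="00:00:10"

# Escape the colons so drawtext does not read them as parameter separators.
DURATION_TEXT=$(printf '%s' "$DURATION" | sed 's/:/\\:/g')

TITLE_FILTER="drawtext=text='Test Text':fontcolor=white:fontsize=40:x=(w-tw)/2:y=(h-th)/2"
# box=1 draws a background box behind the text; boxcolor sets its color.
DURATION_FILTER="drawtext=text='$DURATION_TEXT':fontcolor=white:fontsize=24:box=1:boxcolor=black:x=(w-tw)/2:y=(h-th)/2+60"

# Filters in a chain are separated by commas.
FILTER="$TITLE_FILTER,$DURATION_FILTER"

if [ -f "$INPUT" ]; then
  ffmpeg -y -i "$INPUT" -ss 00:00:00 -vframes 1 -vf "$FILTER" ./thumbnail.jpg
fi
```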

Note: Alternatively, you could get the video's duration via the ffprobe command.
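A sketch of the ffprobe route, assuming ffprobe is installed alongside ffmpeg: it prints the container's duration in seconds (for example, 10.000000), so a small helper, format_duration (a name introduced here), converts that value to HH:MM:SS.

```shell
INPUT=./Big_Buck_Bunny_360_10s_30MB.mp4

# Convert a duration in seconds (e.g. "10.000000") to HH:MM:SS.
format_duration() {
  total=${1%.*}   # drop the fractional part
  printf '%02d:%02d:%02d\n' $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

if [ -f "$INPUT" ]; then
  # -show_entries format=duration prints only the duration, in seconds.
  DURATION_SECONDS=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$INPUT")
  DURATION=$(format_duration "$DURATION_SECONDS")
  echo "$DURATION"
fi
```

Unlike the grep and cut pipeline, this does not depend on the layout of ffmpeg's log output.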

Writing a Bash Script for Generating a Thumbnail

Let's tidy up this thumbnail by substituting the placeholder title with the actual title, uppercasing this title, changing the font to "Open Sans" and moving the duration box to the bottom-right corner. Like the duration, the title must also be extracted from the input media's information. To uppercase every letter in the title, place the ^^ symbol of Bash 4 at the end of the title's variable via parameter expansion (${TITLE^^}). Since Bash is required for the uppercasing, let's place these commands inside of a .sh file beginning with a Bash shebang, which determines how the script will be executed.
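The title extraction and uppercasing could be sketched as follows. This assumes the video carries a title tag that ffmpeg prints as "title : ..." in its metadata block; the sample line below is illustrative, and extract_title is a helper name introduced here.

```shell
# Illustrative metadata line; real input comes from `ffmpeg -i <input> 2>&1`.
SAMPLE_LINE='    title           : Big Buck Bunny'

# Keep everything after the first colon and strip the leading spaces.
extract_title() {
  grep -m 1 "title" | cut -d ':' -f 2- | sed 's/^ *//'
}

TITLE=$(printf '%s\n' "$SAMPLE_LINE" | extract_title)

# ^^ is a Bash 4+ parameter expansion that uppercases every letter.
echo "${TITLE^^}"   # prints BIG BUCK BUNNY
```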

To find the location of the Bash interpreter for the shebang, run the following command:
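One way to do this is shown below. The sketch uses the shell builtin command -v, which reports the same path as the commonly used which bash.

```shell
# Look up the absolute path of the Bash interpreter.
BASH_PATH=$(command -v bash)

# Print the shebang line to place at the top of the script.
printf '#!%s\n' "$BASH_PATH"
```

On many Linux systems this prints #!/bin/bash or #!/usr/bin/bash; on macOS with a newer Bash installed through a package manager, the path is usually different.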


To specify a font weight for a custom font, reference that font weight's file as the fontfile. Don't forget to replace <username> with your own username!

Additionally, several changes were made to the duration box. The box color has an opacity of 0.625; this number (any value between 0 and 1) follows the @ in the boxcolor value (for example, black@0.625). A border width of 8px, set with the boxborderw parameter, adds a bit of spacing between the edges of the box and the text itself.
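Putting these pieces together, the script might look like the sketch below. Several details are assumptions: the font path is hypothetical (point FONT_FILE at wherever the Open Sans weight you want is installed), the font sizes and the 16px corner offset are arbitrary, and escape_text and build_filter are helper names introduced here. Only colons are escaped, so titles containing apostrophes or percent signs would need extra handling, and a video without a title tag would leave the title text empty.

```shell
#!/bin/bash
# Replace the path above with the location of Bash on your system.

INPUT=./Big_Buck_Bunny_360_10s_30MB.mp4
OUTPUT=./thumbnail.jpg
# Hypothetical font path; point it at the file for the weight you want.
FONT_FILE="/home/<username>/.fonts/OpenSans-Bold.ttf"

# Escape colons, which drawtext would otherwise treat as parameter separators.
escape_text() {
  printf '%s' "$1" | sed 's/:/\\:/g'
}

# Build the filter chain: a centered uppercase title, then the duration in a
# semi-transparent box near the bottom-right corner.
build_filter() {
  title=$(escape_text "${1^^}")
  duration=$(escape_text "$2")
  printf '%s,%s' \
    "drawtext=text='$title':fontfile='$FONT_FILE':fontcolor=white:fontsize=40:x=(w-tw)/2:y=(h-th)/2" \
    "drawtext=text='$duration':fontfile='$FONT_FILE':fontcolor=white:fontsize=24:box=1:boxcolor=black@0.625:boxborderw=8:x=w-tw-16:y=h-th-16"
}

if [ -f "$INPUT" ]; then
  # Both values come from the information ffmpeg prints to stderr.
  TITLE=$(ffmpeg -i "$INPUT" 2>&1 | grep -m 1 "title" | cut -d ':' -f 2- | sed 's/^ *//')
  DURATION=$(ffmpeg -i "$INPUT" 2>&1 | grep "Duration" | cut -d ' ' -f 4 | cut -d '.' -f 1)
  ffmpeg -y -i "$INPUT" -ss 00:00:00 -vframes 1 \
    -vf "$(build_filter "$TITLE" "$DURATION")" "$OUTPUT"
fi
```

Because boxborderw extends the box 8px beyond the text on every side, the 16px offset leaves an 8px gap between the box and the edges of the frame.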

Note: If you run into a bash: Bad Substitution error, update Bash to version 4+ and verify the Bash shebang correctly points to the Bash executable.

Clipping a Video

When you hover over a recommended video's thumbnail, a brief clip plays to give you an idea of the video's content. With the ffmpeg command, generating a clip from a video is relatively easy. Provide a starting timestamp via the -ss option (ffmpeg seeks through the original video until it reaches this timestamp, which is where the clip begins) and an ending timestamp via the -to option (the point in the original video at which the clip ends). Because video previews on YouTube are three seconds long, let's extract a three-second segment starting at the four-second mark and ending at the seven-second mark.
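A minimal sketch of this clip extraction, assuming the same sample input and an output name, clip.mp4, chosen for illustration. Both -ss and -to are placed after -i so that they are interpreted as timestamps in the original video.

```shell
INPUT=./Big_Buck_Bunny_360_10s_30MB.mp4
START=00:00:04   # where the clip begins in the original video
END=00:00:07     # where the clip ends in the original video

# No -c copy here, so ffmpeg re-encodes the three-second segment.
if [ -f "$INPUT" ]; then
  ffmpeg -y -i "$INPUT" -ss "$START" -to "$END" ./clip.mp4
fi
```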

Since the clip lasts only a few seconds, we must re-encode the video (omit -c copy). Stream copying with -c copy does not decode any frames, so a usable clip must contain at least one keyframe from the original video, whereas re-encoding decodes the frames and can start the clip at any timestamp. MP4 is a container format, and the video stream in this file is encoded with the H.264 codec (h264 (High) is listed in the stream information printed by ffmpeg -i <input>). If we assume that there are 250 frames between any two keyframes ("a GOP size of 250"), then the ten-second Big Buck Bunny video, with a frame rate of 30 fps, has one keyframe roughly every 8.3 seconds (250 / 30). Clipping less than nine seconds of this video with -c copy can therefore capture no keyframes, in which case the outputted clip contains no video (0 kB of video).
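The keyframe arithmetic can be checked with a short sketch. The GOP size of 250 is the assumption stated above, not a value read from the file; the ffprobe command lists the packets actually flagged as keyframes so that the assumption can be compared against the real video.

```shell
INPUT=./Big_Buck_Bunny_360_10s_30MB.mp4
GOP_SIZE=250   # assumed number of frames between keyframes
FPS=30

# Seconds between keyframes = GOP size / frame rate.
KEYFRAME_INTERVAL=$(awk -v gop="$GOP_SIZE" -v fps="$FPS" 'BEGIN { printf "%.2f", gop / fps }')
echo "$KEYFRAME_INTERVAL"   # prints 8.33

# List the timestamps of the video packets flagged as keyframes (K).
if [ -f "$INPUT" ]; then
  ffprobe -v error -select_streams v:0 \
    -show_entries packet=pts_time,flags -of csv=p=0 "$INPUT" | grep ',K'
fi
```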

Eight Second Clip (with -c copy):

Nine Second Clip (with -c copy):

Note: Alternatively, the -t option can be used in place of the -to option. With the -t option, you must specify the duration rather than the ending timestamp. So instead of 00:00:07 with -to, it would be 00:00:03 with -t for a three second clip.
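The same three-second clip with -t, as a sketch under the same assumptions (sample input, illustrative output name):

```shell
INPUT=./Big_Buck_Bunny_360_10s_30MB.mp4
START=00:00:04
LENGTH=00:00:03   # the clip's duration, not an ending timestamp

if [ -f "$INPUT" ]; then
  ffmpeg -y -i "$INPUT" -ss "$START" -t "$LENGTH" ./clip.mp4
fi
```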

Overlaying an Image on Top of a Thumbnail

Suppose you want to add your brand's logo, custom-made title graphics or watermark to the thumbnail.

To overlay such an image on top of a thumbnail, pass this image as an additional input file via the -i option and apply the overlay filter. Position the image on top of the thumbnail with the overlay filter's x and y parameters.


Passing multiple inputs (in this case, a video and watermark image) requires the -filter_complex option in place of the -vf option. The main_h and overlay_h variables represent the main input's height (from the input video) and the overlay's height (from the input watermark image) respectively. Here, we place the watermark image in the lower-left corner of the thumbnail.
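A sketch of the overlay on its own, without the drawtext filters, which could be appended to the chain with commas. The 10px margin is an arbitrary value; with no input labels given, ffmpeg feeds the video and the watermark to overlay in the order they were passed.

```shell
INPUT=./Big_Buck_Bunny_360_10s_30MB.mp4
WATERMARK=./watermark-ex.png
MARGIN=10   # arbitrary gap, in pixels, between the watermark and the frame edges

# Lower-left corner: x is the margin, y is frame height - watermark height - margin.
FILTER_COMPLEX="overlay=x=$MARGIN:y=main_h-overlay_h-$MARGIN"

if [ -f "$INPUT" ] && [ -f "$WATERMARK" ]; then
  ffmpeg -y -i "$INPUT" -i "$WATERMARK" -ss 00:00:00 -vframes 1 \
    -filter_complex "$FILTER_COMPLEX" ./thumbnail.jpg
fi
```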

The watermark image looks a bit large compared to the other elements on the thumbnail. Let's scale the watermark image down to half its original size before any of the existing chained filters are executed.


To scale the watermark image to half its size, we must explicitly tell the scale filter to only scale this image and not the video. This is done by prepending [1:v] to the scale filter to have the scale filter target our second input -i ./watermark-ex.png. The iw and ih variables will represent the watermark image's width and height respectively. Once the scaling is done, the scaled watermark image is outputted to ovrl, which can be referenced by other filters for consumption as a filter input. Because the overlay filter takes two inputs, an input video and an input image overlay, we prepend the overlay filter with these inputs: [0:v] for the first input -i ./Big_Buck_Bunny_360_10s_30MB.mp4 and ovrl for our scaled watermark image.
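The labeled filtergraph described above could be sketched like this. The position values are the same illustrative ones (a 10px margin in the lower-left corner), and the drawtext filters are again left out for brevity.

```shell
INPUT=./Big_Buck_Bunny_360_10s_30MB.mp4
WATERMARK=./watermark-ex.png

# Chain 1: halve the second input (the watermark) and label the result [ovrl].
# Chain 2: overlay [ovrl] on the first input (the video). A semicolon separates chains.
FILTER_COMPLEX="[1:v]scale=iw/2:ih/2[ovrl];[0:v][ovrl]overlay=x=10:y=main_h-overlay_h-10"

if [ -f "$INPUT" ] && [ -f "$WATERMARK" ]; then
  ffmpeg -y -i "$INPUT" -i "$WATERMARK" -ss 00:00:00 -vframes 1 \
    -filter_complex "$FILTER_COMPLEX" ./thumbnail.jpg
fi
```

Note that overlay_h in the second chain now refers to the height of the scaled watermark, so the watermark still sits 10px above the bottom edge.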

Next Steps

Imagine having a large repository of videos that needs to be processed and uploaded during continuous integration. Write a Bash script to automate this process.