A simple trick for how to reuse and update Encoding resources

If you use the Bitmovin API (and SDKs), you will quickly notice something critical: although our solution is managed through a REST API, there are very few resources that can be updated (whether with an HTTP PUT or POST operation). You can create most resources, you can retrieve them, and you can delete them. That’s it. It’s an architectural change that we made a while ago for reasons I won’t go into in this post (if only because it’s not going to change any time soon, if ever).

No update?

And for most resources you create and use in the configuration of an encoding, it’s not really something you need to think about. Streams, Muxings, Manifests, they all only really make sense in the context of a single encoding, and in a typical encoding script are created on the fly just before starting the encoding, after which there really should be no need to update them.

There are, however, a number of resources that you might well want to be able to modify, in particular the “independent” resources (i.e. those that don’t depend on other resources having been created previously), such as Inputs, Outputs, CodecConfigurations and Filters.

To reuse or not to reuse?

This matters all the more because you’ll often hear or read that we recommend, as a best practice, reusing resources once created when setting up encoding workflows. Sure, you could decide to create each such resource in every single script you build, and we won’t stop you from doing this. In many ways it is simpler.

But, in particular when considering production workflows, there are some good reasons not to do so:

  1. Troubleshooting. Say for example that you realise that a codec configuration had a wrong parameter, causing an issue down the line. You will want to quickly identify how many other encodings were affected in the same way. It’s much simpler and faster to do so if you’ve used the same CodecConfiguration in all your encodings, rather than created a similar one for each encoding.
  2. Usability. If you don’t reuse those resources, you will quickly end up with thousands of resources in your account that are functionally equivalent. Again, not a problem, unless you then try to find them in the dashboard or through the APIs.
  3. Certification. If you have separate teams configuring and validating configurations, and putting them to production, reusing validated resources offers a level of assurance that you are doing things “right” when you create your encodings.

The cost of reuse

Let’s therefore say that you decide to adhere to those best practices and reuse independent resources once created. Congratulations, you’ve now made your workflow more complex. Instead of just having one independent script able to set up all aspects of an encoding that you can use in a fire-and-forget way, you now have to worry about (and implement) additional considerations, such as:

  • Where will I keep a record of the resource identifiers I want to reuse? Traditionally this means doing a lookup in a database, or some other form of long term storage.
  • Do these resources still exist? Who’s to say that someone else didn’t go and delete them recently, thereby causing any new encoding to fail?
  • How do I create those resources in the first place? That’s simple enough: a dedicated script could create the resources, but you will need to create it and maintain it.

The need to update

And then we come back to the first point: how do you go about updating things? Sure, you can’t update the Bitmovin resources themselves, as highlighted above, but functionally you may need to update your workflow, for example:

  • When the storage credentials change (which is good security practice)
  • When the codec configuration parameters need to change (for example, as a result of fixing the problem given as an example earlier)

My solution

In the rest of this post, I want to give you my solution to all this, which gives me the best of both worlds: the ability to reuse and update resources, whilst still having all my configuration of the resources and the encodings in the same script.

Part 1 - identifiers

It relies on a number of principles about using our APIs, which are worth knowing even if you don’t implement a solution similar to mine:

  1. Almost all resources in the Bitmovin solution can be named. You can set the name property on any resource in almost any way you want. Those names are not validated, and there is no requirement for them to be unique.
  2. Most GET endpoints that return lists of resources accept a parameter to filter resources by name.
  3. A GET endpoint will return lists of resources in reverse order of creation. Therefore, the first item in the list is always going to be the most recent one.

What follows from these 3 principles is that you can use the name property as an identifier of sorts.

  • Set it when you create the resource, and then in your encoding script, retrieve the first (i.e. the most recent) one, and use its id.
  • If the list comes back empty, you know it doesn’t exist, and you can create it. So your encoding script, instead of creating resources, moves to a logic of create-if-not-exist.
  • And if you need to (functionally) update it, you just create a new one, with the same name. Simples!
    • You might also be tempted to delete the older one. My advice? Don’t! There is almost never any reason to delete resources within the Bitmovin solution, and if you do so, you may impact your ability to retrieve information about old encodings. You never know when you’ll need to be able to do that…
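To make that logic concrete, here is a minimal sketch of it in Java. It does not call the Bitmovin API at all: `NamedResourceSketch` is a hypothetical in-memory stand-in that only mimics the three principles above (free-form names, filtering by name, most recent first), and every class and method name in it is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for a "list by name" and "create" API. All names are hypothetical. */
final class NamedResourceSketch {

  /** A minimal resource: an id assigned on creation, a free-form name and some settings. */
  static final class Resource {
    final int id;
    final String name;
    final String settings;

    Resource(int id, String name, String settings) {
      this.id = id;
      this.name = name;
      this.settings = settings;
    }
  }

  private final List<Resource> created = new ArrayList<>();

  /** Mimics a create call: names are not validated and do not have to be unique. */
  Resource create(String name, String settings) {
    Resource resource = new Resource(created.size() + 1, name, settings);
    created.add(resource);
    return resource;
  }

  /** Mimics a list call filtered by name, returning the most recently created first. */
  List<Resource> listByName(String name) {
    List<Resource> matches = new ArrayList<>();
    for (int i = created.size() - 1; i >= 0; i--) {
      if (created.get(i).name.equals(name)) {
        matches.add(created.get(i));
      }
    }
    return matches;
  }

  /** Create-if-not-exist: reuse the most recent resource with this name, otherwise create it. */
  Resource getOrCreate(String name, String settings) {
    List<Resource> matches = listByName(name);
    return matches.isEmpty() ? create(name, settings) : matches.get(0);
  }

  public static void main(String[] args) {
    NamedResourceSketch api = new NamedResourceSketch();
    Resource first = api.getOrCreate("h264-1080p", "bitrate=4800000");
    Resource again = api.getOrCreate("h264-1080p", "bitrate=4800000");
    System.out.println(first.id == again.id); // true: the second call reuses the first resource

    // "Update" by creating a new resource under the same name; the old one is left in place
    Resource updated = api.create("h264-1080p", "bitrate=5800000");
    System.out.println(api.getOrCreate("h264-1080p", "unused").id == updated.id); // true
  }
}
```

Note that the older resource is never deleted: it simply stops being the first item returned for that name.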

However, that’s all well and good, but now you have the burden of defining identifiers, unique enough that there is no chance that anyone else in the team would create another resource functionally different, but with the same name. And again, you’ll need to store these identifiers somewhere…

Part 2 - hashing

When you use the APIs (whether you use SDKs or not), what you essentially do is build payloads of resources you want to create, and then submit those to the APIs for adding to the back-end. Most of these payloads are simple, and contain just the information that you want to submit (leaving any other unspecified property of the resource to its default value).

The functional “semantic” of your resource is therefore contained in that payload. And so, we can turn to a technique widely used in computer systems: hashing.

By hashing the payload, we can get a short and sweet string representation of it that, for all practical purposes, uniquely represents that functional resource. Which also makes a perfect name for it!

So, what would your fire-and-forget script do then?

  1. Build the payload for the resource you want to create
  2. Calculate a hash for it, using whatever mechanism you prefer. Depending on how you go about it, you may need to make sure that it only considers properties of the resource that are “functionally meaningful” (and not, for example, its id or creation date)
  3. Query the GET endpoint for that resource, using that hash as a filter.
  4. If a non-empty list is returned, pick the first item, which is the most recent
  5. Otherwise, that resource doesn’t exist, and you create it.

That’s it! No more database, no more need to remember the identifiers… Just a few lines of generic code to insert into your existing encoding workflows.
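Before getting to my own code, here is a minimal, SDK-free sketch of the hashing step, using only the Java standard library. It assumes the functionally meaningful properties have already been collected into a map; `PayloadHashSketch`, `nameFor` and the property names are hypothetical, and SHA-256 is simply one reasonable choice of hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: derives a resource name from its functionally meaningful properties. */
final class PayloadHashSketch {

  /** Builds a canonical "Type-key=value-key=value" string with sorted keys, then hashes it. */
  static String nameFor(String resourceType, Map<String, String> properties) {
    StringBuilder canonical = new StringBuilder(resourceType);
    // A TreeMap iterates in key order, so the same properties always give the same string
    for (Map.Entry<String, String> e : new TreeMap<>(properties).entrySet()) {
      canonical.append('-').append(e.getKey()).append('=').append(e.getValue());
    }
    return sha256Hex(canonical.toString());
  }

  /** Returns the SHA-256 digest of the text as 64 lowercase hexadecimal characters. */
  static String sha256Hex(String text) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is one of the algorithms every Java platform is required to provide
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> payload = Map.of("bucketName", "my-bucket", "accessKey", "EXAMPLE-KEY");
    // The printed value can be used as the resource name and as the filter in the list call
    System.out.println(nameFor("S3Output", payload));
  }
}
```

Sorting the keys is the important design choice here: two payloads with the same properties must produce the same string, whatever order the properties were set in.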

Example

Here is some code from one of my previous projects. It’s specific to Java, as it uses reflection, but similar mechanisms would be possible with all SDKs. It will not work for every use case, and probably suffers from edge cases (not that I’ve found any yet). It’s therefore really only provided as an example.

First is a new class, the ModelHasher, that takes care of creating that hashed identifier:

package common;

import com.bitmovin.api.sdk.model.BitmovinResource;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.commons.lang3.RandomStringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * This class provides a static method to calculate the hash of a Bitmovin resource
 *
 * It uses Java reflection to find all properties of the object, and after filtering unwanted ones
 * creates a simple string that could be used as a functional unique identifier
 */
public class ModelHasher {
  private static final Logger logger = LoggerFactory.getLogger(ModelHasher.class);

  private static final List<String> fieldsAlwaysExcluded = Arrays.asList(
      "id", "createdAt", "modifiedAt", "description"
  );

  // when true, a random suffix is appended so that a new resource is always created
  public static boolean forceRandom = false;

  private static List<Field> getAllFields(List<Field> fields, Class<?> type) {
    fields.addAll(Arrays.asList(type.getDeclaredFields()));

    if (type.getSuperclass() != null) {
      getAllFields(fields, type.getSuperclass());
    }

    return fields;
  }

  public static String createHash(
      BitmovinResource obj) {
    return createHash(obj, null, null, null);
  }

  /**
   * Creates a string with a hash of the resource provided as parameter
   * @param obj The object representing the resource
   * @param includeFields A list of the fields of the object to take into account
   *                      (cannot be combined with excludeFields)
   * @param excludeFields The fields of the object that will never be taken into account
   * @param prefix An optional prefix to add before the hash
   * @return The hash, optionally prefixed, or a random string if hashing failed
   */
  public static String createHash(
      Object obj,
      List<String> includeFields,
      List<String> excludeFields,
      String prefix
  ) {
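    Class<?> clazz = obj.getClass();

    // An include list and a caller-supplied exclude list are mutually exclusive. Check this
    // before adding the always-excluded fields, otherwise any include list would be rejected.
    boolean hasIncludes = includeFields != null && !includeFields.isEmpty();
    if (hasIncludes && excludeFields != null && !excludeFields.isEmpty()) {
      throw new IllegalArgumentException("Cannot have both an include and exclude list of fields");
    }

    if (excludeFields == null) {
      excludeFields = new ArrayList<>();
    } else {
      excludeFields = new ArrayList<>(excludeFields);
    }
    excludeFields.addAll(fieldsAlwaysExcluded);

    if (includeFields == null) {
      includeFields = new ArrayList<>();
    }

    // traverse the class hierarchy to collect all object properties
    List<Field> allResourceFields = getAllFields(new LinkedList<Field>(), clazz);
    // getDeclaredFields() guarantees no particular order, so sort by name to keep the hash stable
    allResourceFields.sort(java.util.Comparator.comparing(Field::getName));

    // create a simple string that will be hashed
    StringBuilder stringToHash = new StringBuilder(obj.getClass().getSimpleName());

    try {
      for (Field f : allResourceFields) {
        // with an include list, keep only the listed fields; otherwise keep all non-excluded ones
        boolean selected = hasIncludes
            ? includeFields.contains(f.getName())
            : !excludeFields.contains(f.getName());
        if (selected)
        {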
          int modifiers = f.getModifiers();
          if (Modifier.isPrivate(modifiers))
            f.setAccessible(true);

          stringToHash.append(String.format("-%s=%s", f.getName(), f.get(obj)));

          if (Modifier.isPrivate(modifiers))
            f.setAccessible(false);
        }
      }

      logger.debug(stringToHash.toString());

      // hash it
      StringBuilder hashString = new StringBuilder();
      if (prefix != null) {
        hashString.append(prefix).append("|");
      }
      hashString.append(DigestUtils.sha256Hex(stringToHash.toString()));
      if (forceRandom)
        hashString
            .append("|")
            .append(RandomStringUtils.random(6, true, true));

      return hashString.toString();
    } catch (Exception e) {
      logger.error("Error in creating hash for model: " + e.getMessage());
      // return just a random string to ensure the main code continues running
      return RandomStringUtils.random(16, true, true);
    }
  }

}

And here is how I would use it, in this case in the creation of an S3 output. Here I modify one of the methods you’ll find in the majority of our SDK examples, essentially replacing lines 244-252 in this example:

private static S3Output createS3Output(String bucketName, String accessKey, String secretKey)
      throws BitmovinException {

    S3Output s3Output = new S3Output();
    s3Output.setBucketName(bucketName);
    s3Output.setAccessKey(accessKey);
    s3Output.setSecretKey(secretKey);

    // compute hash
    String hash = ModelHasher.createHash(s3Output);

    // query to see if it exists
    OutputListQueryParams params = new OutputListQueryParams();
    params.setName(hash);
    params.setLimit(1);
    List<Output> existingOutputs = bitmovinApi.encoding.outputs.list(params).getItems();

    // reuse or create
    if (existingOutputs.size() > 0) {
      return (S3Output) existingOutputs.get(0);
    } else {
      s3Output.setName(hash);
      return bitmovinApi.encoding.outputs.s3.create(s3Output);
    }
  }
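If you apply the same pattern to several resource types, the reuse-or-create step can be factored out. The following is only a sketch under that assumption: `ReuseOrCreateSketch` and `reuseOrCreate` are hypothetical names, not part of any SDK, and the two lambdas are where the actual list and create calls would go.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical generic helper; the lambdas passed in would wrap the real SDK calls. */
final class ReuseOrCreateSketch {

  /**
   * @param name       the name to look up, for example the hash of the payload
   * @param listByName returns the resources with that name, most recent first
   * @param create     creates the resource; only invoked when nothing was found
   */
  static <T> T reuseOrCreate(
      String name,
      Function<String, List<T>> listByName,
      Supplier<T> create) {
    List<T> existing = listByName.apply(name);
    if (!existing.isEmpty()) {
      return existing.get(0); // reuse the most recent match
    }
    return create.get(); // nothing with that name yet
  }

  // Possible wiring for the S3 output above (reusing the calls from that method):
  //
  //   s3Output.setName(hash);
  //   Output output = ReuseOrCreateSketch.<Output>reuseOrCreate(
  //       hash,
  //       name -> {
  //         OutputListQueryParams params = new OutputListQueryParams();
  //         params.setName(name);
  //         params.setLimit(1);
  //         return bitmovinApi.encoding.outputs.list(params).getItems();
  //       },
  //       () -> bitmovinApi.encoding.outputs.s3.create(s3Output));
}
```

Passing the create call as a `Supplier` means it is only executed when the lookup comes back empty, which keeps the helper independent of any particular resource type.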

Let us know what you think about this type of technique. Is that something you’d use?
And if you have equivalent code for our other SDKs, we’d love for you to share it in the comments!

Note

In a future version of the Bitmovin Encoding solution, we intend to implement something similar on our end, thereby removing the need for you to apply this logic in your own scripts. You’ll then be able to go back to a pure create-every-resource-every-time approach, and we’ll take care of ensuring that we reuse previously created resources that are strictly equivalent, if they exist.
