Boldon James: When Classification Really Means Classification

January 28, 2020

Software companies that offer related products, such as data governance and DLP (data loss prevention), have increasingly used the term “classification” loosely. In some cases, the vendor defines “classification” as the ability to discover and protect data, which is a recent and misleading use of the word. Traditionally, “classification” referred to visual and metadata markings, like the control markings used by the intelligence community. Outside of that community, a full range of standard classifications and their related markings can be found in the CUI (controlled unclassified information) handbook. CUI is just one of many classification systems that involve markings. In other words, those systems require that the data be “marked” in some way. Protecting a file with DLP or encryption is not the same thing as marking the file as confidential.

For those who are unfamiliar with classification, the terminology can be confusing. For instance, what is the difference (if any) between classifications, values, file properties, metadata markings, visual markings, and marking formats? Let’s look at a real-world example and put these terms into context.

If a sensitive document is marked with [TOP SECRET] in the header, then we could say that the classification is “top secret”, which, in abstract terms, describes an option within a category. In the Classifier Admin console, those categories are called “selectors” and the options are called “values.” So the value might be “top_secret”, and there can be alternate values, like “TOP SECRET”. When those values are written to locations, like the header in MS Word, they can be formatted using any combination of fonts, font sizes, colors, justification, brackets and other punctuation. In other words, placing [TOP SECRET] in the header requires a marking format that writes the classification value in all caps and encloses it in brackets.
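To make the relationship between a value and a marking format concrete, here is a minimal Python sketch. It is not Classifier's API: the `MarkingFormat` class, the `ALTERNATE_VALUES` table and the `render_marking` function are hypothetical names invented for illustration, and fonts, colors and justification are left out because they cannot be shown in plain text.

```python
from dataclasses import dataclass

# Hypothetical table: each value of a "classification" selector mapped to
# its alternate value. The names are illustrative, not Classifier's own.
ALTERNATE_VALUES = {
    "public": "PUBLIC",
    "confidential": "CONFIDENTIAL",
    "top_secret": "TOP SECRET",
}


@dataclass(frozen=True)
class MarkingFormat:
    """Hypothetical marking format: how a value is rendered as text."""
    prefix: str = "["
    suffix: str = "]"
    uppercase: bool = True
    use_alternate: bool = True


def render_marking(value: str, fmt: MarkingFormat) -> str:
    """Turn a selector value into the text written to a location."""
    text = ALTERNATE_VALUES[value] if fmt.use_alternate else value
    if fmt.uppercase:
        text = text.upper()
    return f"{fmt.prefix}{text}{fmt.suffix}"


# The header format from the example: all caps, enclosed in brackets.
header_format = MarkingFormat()
print(render_marking("top_secret", header_format))
```

The point of the separation is that the same value can be written to different locations with different formats, without changing the underlying classification.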

Markings that appear in the header are considered visual markings (also called visible markings). Any non-visual marking is called a metadata marking. Classifier, for instance, can place metadata in the file properties (document properties and custom document properties) of a Word document. This metadata may use the same “[TOP SECRET]” marking format that was used for the header. Alternatively, the metadata can take an encoded or abbreviated form, e.g., [xyzTopSecx] or [TS]. The Classifier label detection mechanism (and other software, e.g., DLP and data governance tools) is then configured to equate those markings with the “top secret” level. If the DLP tool detects [xyzTopSecx] in the keywords document property, it will prevent the file from leaving the organisation.
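The detection step can be sketched in the same spirit. The Python below is an illustration under stated assumptions, not how Classifier or any particular DLP product is implemented: document properties are modelled as a plain dictionary rather than read from a real Word file, and `MARKING_TO_LEVEL`, `BLOCKED_LEVELS`, `detect_level` and `may_leave_organisation` are hypothetical names.

```python
from typing import Mapping, Optional

# Hypothetical detection table: every marking that should be treated as
# the "top secret" level, whether readable or encoded.
MARKING_TO_LEVEL = {
    "[TOP SECRET]": "top_secret",
    "[xyzTopSecx]": "top_secret",
    "[TS]": "top_secret",
}

# Hypothetical policy: levels that must not leave the organisation.
BLOCKED_LEVELS = {"top_secret"}


def detect_level(properties: Mapping[str, str]) -> Optional[str]:
    """Return the level of the first known marking found in any property."""
    for text in properties.values():
        for marking, level in MARKING_TO_LEVEL.items():
            if marking in text:
                return level
    return None  # no recognised marking: the file is unlabelled


def may_leave_organisation(properties: Mapping[str, str]) -> bool:
    """A DLP-style decision based only on the detected level."""
    return detect_level(properties) not in BLOCKED_LEVELS


# Document properties modelled as a dict; a real tool would read them
# from the file itself.
doc_properties = {"title": "Q3 plan", "keywords": "budget [xyzTopSecx]"}
print(detect_level(doc_properties))
print(may_leave_organisation(doc_properties))
```

Because the decision depends only on the marking, any tool that can read the same property and shares the same lookup table reaches the same conclusion about the file.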

Products that claim to offer “classification” must be able to read and write labels in the manner described above. The alternative is that data is simply discovered and protected. Fingerprinting might be the closest substitute for classification. At first glance, fingerprinting gives us a way to track and identify specific files. That is a powerful capability that can be important in certain use cases. The challenge is that the fingerprint database can be large, and the related communications can consume excessive bandwidth (depending on how it’s used). Furthermore, fingerprinting is typically binary: either the fingerprint is applied or it is not. There is no “public fingerprint” vs. “confidential fingerprint”. So true classification stands alone as the only solution to complex classification needs, such as the categories required for regulatory compliance under systems like CUI.

Contact us today to find out more about protecting your sensitive data using a classification tool that really does classify your data how you need it to.