Document Watermarking
Watermarks for document security.
Watermarking is an old technique. It appears to date from the 13th century, when watermarks identified the papermaker or the trade guild that manufactured the paper. So from the very beginning watermarking was about identifying the source of the paper and preventing forgery. Even the Gutenberg Bible was printed on watermarked paper.
Today the commonest use of watermarks in paper is to protect banknotes and postage stamps (since 1826). Embedding watermarks in the paper was not the only technique developed: intaglio printing presses the ink into the paper, leaving a distinct ridge that can be felt by touch.
Intaglio can be thought of as a precursor to using laser and inkjet printers to print watermarks as part of the document itself. The difference is that today's printers put everything on the page in a single pass, instead of needing a separate pass for each technique.
So what’s the big deal about watermarking as a security method?
The answer is that watermarks can provide a range of controls reasonably easily.
Depending on how they are applied, watermarks can be used for:
- Authenticating the source of the document;
- Identifying the content owner;
- Authenticating that the content is unchanged;
- Identifying any copyrights claimed;
- Identifying the authorised user of this copy of the document.
These properties are useful in many situations, so watermarks are commonly applied to Excel, Word, PDF and image files, as well as to physical banknotes, documents, postage stamps and so on.
Authenticating the source and owner
On banknotes, the watermark shows that the note really was issued by the bank named on it and is worth the sum shown on its face. It does this by combining features that are very difficult to copy: printed and embedded marks, holograms and sometimes even transparent windows. The aim is not to make forgery impossible, which would be rather far-fetched, but to make copying so expensive that it is not worth doing.
In the humbler world of office computing, no printer can reproduce all these technical complexities, but you can still make documents difficult to copy.
Take the picture below. It may not look like much at first glance, but it has some very useful properties: it is a selection from what is called the Mandelbrot set.
The Mandelbrot set is the set of complex numbers c for which repeatedly applying z → z² + c, starting from z = 0, never runs off to infinity. Drawn as a picture, it produces a fractal: an infinitely detailed pattern that is quasi-self-similar, meaning that similar shapes recur at every scale but never repeat exactly. See Wikipedia if you are interested in the technicalities.
So you can select a pattern such as the one below and be reasonably confident that nobody else can reproduce it exactly without knowing the precise region, magnification and shading you used. Reproducing the image below, which I have provided to illustrate the point, is very hard indeed, so the technique lets you create almost unique patterns to associate with your document content.
Such patterns can also be printed fairly light rather than high-contrast. If a printed document is then scanned or photocopied, the pattern will either fill in and look muddy, or become harsher and show up as stark black and white instead of grey. Either way, the authenticity of the copy is called into question.
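Generating such a pattern is straightforward. The following is a minimal sketch, assuming a plain escape-time rendering of the Mandelbrot set; the function names, grey range and output format are illustrative choices, not a particular product's method. Each pixel is treated as a complex number c and shaded by how many iterations of z → z² + c it takes for |z| to exceed 2. The grey levels are confined to a pale band so the pattern stays light, and the result can be saved as a plain-text PGM image, which most image viewers open.

```python
def escape_count(c: complex, max_iter: int) -> int:
    """Iterations of z = z*z + c (from z = 0) before |z| exceeds 2; max_iter if it never does."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter


def mandelbrot_pattern(x_min, x_max, y_min, y_max, width, height,
                       max_iter=100, lightest=255, darkest=200):
    """Render a rectangle of the complex plane as rows of pale grey levels (0 = black, 255 = white)."""
    rows = []
    for j in range(height):
        y = y_max - (y_max - y_min) * j / (height - 1)  # top row corresponds to y_max
        row = []
        for i in range(width):
            x = x_min + (x_max - x_min) * i / (width - 1)
            n = escape_count(complex(x, y), max_iter)
            # Quickly escaping points stay near white; points inside the set get the darkest allowed grey.
            row.append(lightest - (lightest - darkest) * n // max_iter)
        rows.append(row)
    return rows


def to_pgm(rows) -> str:
    """Serialise grey levels as an ASCII (P2) PGM image."""
    height, width = len(rows), len(rows[0])
    lines = ["P2", f"{width} {height}", "255"]
    lines += [" ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"
```

For example, `to_pgm(mandelbrot_pattern(-0.7485, -0.7445, 0.0985, 0.1015, 400, 300))` renders a small, hypothetical region near the edge of the set; shifting the region or changing `max_iter` even slightly yields a visibly different pattern, which is what makes the exact parameters worth keeping private.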
Unfortunately, ordinary laser and inkjet printers cannot print at a quality that makes hidden words and markings appear when the page is photocopied, although other printing technologies (as used on banknotes and cheques) can do that.
Finally, a watermark of this type can help resist OCR (optical character recognition) scanning, because the software struggles to make sense of the characters against the busy background pattern.
Authenticating content is unchanged
To change the content that a watermark protects, you have to be able to separate the content from the watermark. If the text and the watermark are both black, they cannot be separated by colour. If you instead drop new text with a transparent background onto the existing image, it has to fit exactly over the existing text or the forgery will show.
Similarly, you cannot copy and paste from one part of the picture to another, because the pattern does not repeat, so the alteration will be immediately obvious.
This makes authenticating the content of the document much more straightforward, particularly where a document is claimed to be a genuine copy but cannot be proved to be an original.
Identifying Copyrights
One issue with using copyright as a legal means of control is the need to display a copyright notice in a prominent position. Whether it needs to appear on every page is a matter of debate, although some publishers do put it on every page to cover cases where particular diagrams, pictures or tables are copied on their own.
A copyright statement can be included as a watermark, but because a watermark is normally placed in the same position on every page, it can be removed with an editing tool, unless the publisher accepts invasive watermarking that runs through the text and pictures of the document. This may be acceptable, particularly if only printed copies carry invasive watermarks and copies viewed on screen do not.
Of course, this requires different watermarking schemes for viewed and printed copies. That in turn needs a relatively sophisticated DRM (digital rights management) system that lets the content owner treat the on-screen image (presumably the intended presentation and brand image) differently from the printed image (which may be used for backup or offline reading). This may seem to reverse centuries of treating print as the primary form, but over the next 20 years we are going to see generations of people who naturally read from a tablet or phone and will not reach for a book when they can go online. We must plan for that future.
Identifying the authorized user
Watermarking can also make authorized users reluctant to let others have copies of documents that they alone were authorized to use.
Some people claim that copyright law gives them the right to let others have free copies of any document for the purpose of private study (that is, reading it?). In classical terms, such people are effectively operating as lending libraries.
All this is possible because there is normally nothing to identify which user passed a copy on. With dynamic watermarking, however, both viewed and printed copies can be personalized as they are produced. The personalization might be a name, an identifier, an email address or a company name (where the "user" is a corporate body rather than a private individual), which lets the publisher, using the records in its own administration system, identify which individual or organization was the source of a printed copy or screenshot.
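One way such personalization might work is sketched below, assuming the publisher holds a secret key and a list of licensed users in its administration system; the key, identifiers and watermark wording are all hypothetical. Rather than printing an email address in clear, the watermark carries a short reference computed with HMAC-SHA256 from the user and document identifiers. Only the holder of the key can recompute that reference, so only the publisher can trace a leaked copy back to the user it was issued to.

```python
import hashlib
import hmac

# Hypothetical secret held only by the publisher; it never appears on the document.
PUBLISHER_KEY = b"replace-with-a-long-random-secret"


def licence_ref(user_id: str, document_id: str, key: bytes = PUBLISHER_KEY) -> str:
    """Short reference tying one user to one document; cannot be recomputed without the key."""
    message = f"{user_id}|{document_id}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:12].upper()


def watermark_text(display_name: str, user_id: str, document_id: str) -> str:
    """Text stamped on every viewed or printed page of this user's copy."""
    return f"Licensed to {display_name} - ref {licence_ref(user_id, document_id)}"


def trace_leak(ref: str, licensed_users, document_id: str):
    """Given the ref read off a leaked copy, return the licensed user it belongs to, or None."""
    for user_id in licensed_users:
        if hmac.compare_digest(licence_ref(user_id, document_id), ref.upper()):
            return user_id
    return None
```

A corporate licensee might see "Licensed to Acme Ltd - ref ..." across each page; the reference is meaningless to anyone else, but the publisher can run `trace_leak` over its licence list to find the source.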
General observations about watermarking
Watermarking can be a powerful technique, provided you understand what protection the method you have chosen actually provides. You may need more than one type of watermark, because an anti-copy pattern or a stamp added to a PDF will not identify the authorized user, and vice versa.
The colour of a watermark matters. If screen images are captured, say by a camera, an image editor can easily remove a watermark that is the only thing on the page in its colour. For example, a red watermark on a document that hardly ever uses red is easy to strip out.
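To see why a unique colour is a weakness, here is a minimal sketch of the colour-keyed removal an image editor performs, assuming the captured page is held as rows of (R, G, B) tuples (a simplification; real editors work on image files, and the tolerance of 60 is an arbitrary illustrative value). Every pixel close enough to the watermark colour is painted white, which strips a red watermark while leaving black text untouched. The same filter applied to a black watermark would erase the black text along with it, which is why a watermark in the text colour is much harder to remove.

```python
def remove_colour(pixels, target, tolerance=60, background=(255, 255, 255)):
    """Replace every pixel within `tolerance` (Euclidean RGB distance) of `target` with `background`."""
    limit = tolerance * tolerance
    cleaned = []
    for row in pixels:
        new_row = []
        for pixel in row:
            distance_sq = sum((a - b) ** 2 for a, b in zip(pixel, target))
            new_row.append(background if distance_sq <= limit else pixel)
        cleaned.append(new_row)
    return cleaned


WHITE, BLACK, RED = (255, 255, 255), (0, 0, 0), (220, 30, 30)

# A tiny "captured page": white paper, black text and a red watermark in two slightly different reds.
page = [
    [WHITE, BLACK, (210, 40, 35)],
    [RED, BLACK, WHITE],
]
cleaned = remove_colour(page, target=RED)
print(cleaned == [[WHITE, BLACK, WHITE], [WHITE, BLACK, WHITE]])  # True
```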
Similarly, a watermark that always sits outside the main body of the page (say, in the header or footer) is not too difficult to remove, although it takes more work than removing a colour. To make removal harder, a text watermark has to run through the text of the document. This creates a conflict between protecting the document and keeping it attractive, so the publisher has to decide on a compromise between appearance and security: the more important the security, the less visually attractive the result.
One option is to use extra-thin fonts, which are much less likely to obscure the text or pictures underneath. Thin fonts, some of them free, can be found by searching the Internet; examples include Swiss 721 Thin BT and HowardThinRegular. Setting the watermark in a larger size at a lower opacity can further improve its visual acceptability.
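Lowering the opacity of a watermark is ordinary alpha compositing: each output colour channel is opacity × watermark + (1 - opacity) × page. A minimal sketch for 8-bit RGB pixels follows; the function name and the 15% opacity are illustrative assumptions. Note that a black watermark at low opacity becomes a pale grey on white paper but leaves black text beneath it black, so the text stays legible.

```python
def blend(page, mark, opacity):
    """Alpha-composite one watermark pixel over one page pixel (8-bit RGB tuples)."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return tuple(round(opacity * m + (1.0 - opacity) * p) for p, m in zip(page, mark))


# A black watermark at 15% opacity: pale grey on white paper, no change over black text.
print(blend((255, 255, 255), (0, 0, 0), 0.15))  # (217, 217, 217)
print(blend((0, 0, 0), (0, 0, 0), 0.15))        # (0, 0, 0)
```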
Finally, you may want to time-stamp a digital document, both for copyright purposes and to verify the sequence in which documents or versions were produced.
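The idea can be sketched as follows, assuming a self-kept log rather than an independent service; the function names are hypothetical. Each record stores a SHA-256 digest of the document and a UTC time, plus the digest of the previous record, so the records form a hash chain whose order cannot be rearranged without detection. A self-kept log only proves internal consistency, not when something actually happened; evidence a third party will accept needs a trusted time-stamping authority, for example one implementing RFC 3161.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def record_digest(record: dict) -> str:
    """Stable SHA-256 digest of a record (keys sorted so the JSON is canonical)."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


def timestamp_record(document: bytes, previous: Optional[dict] = None,
                     when: Optional[datetime] = None) -> dict:
    """Log entry binding the document's digest to a UTC time and to the previous entry."""
    when = when or datetime.now(timezone.utc)
    return {
        "sha256": hashlib.sha256(document).hexdigest(),
        "utc": when.isoformat(timespec="seconds"),
        "previous": record_digest(previous) if previous else None,
    }


def verify_chain(documents, records) -> bool:
    """Check that each record matches its document and links to the record before it."""
    if len(documents) != len(records):
        return False
    for i, (doc, rec) in enumerate(zip(documents, records)):
        if rec["sha256"] != hashlib.sha256(doc).hexdigest():
            return False  # document was altered after it was recorded
        expected_prev = record_digest(records[i - 1]) if i > 0 else None
        if rec["previous"] != expected_prev:
            return False  # records were reordered, removed or inserted
    return True
```

Changing a single byte of any recorded document, or swapping the order of two records, makes `verify_chain` return `False`.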