Protect online machinery manuals from AI web scraping & copying
What the EU machinery directive means for digital machinery manuals & how to protect them
Learn the new requirements for digital machinery manuals under the EU machinery directive, why online machinery manuals are vulnerable to AI web scraping, and how to protect them.
What the EU machinery directive means for businesses
The EU’s new machinery regulation (2023/1230) introduces a long-awaited change: manufacturers can now deliver machinery manuals to their customers digitally. This should allow publishers to cut costs while making delivery more convenient for both parties. However, it also brings new challenges. Manufacturers must not only comply with the new requirements but also look out for their own interests by preventing their digital assets from being stolen, shared, or misused.
This blog will cover:
- The requirements for manuals under the EU machinery directive
- The risks of publishing manuals online
- Why AI web scraping threatens manual publishers
- Popular methods to protect internet content from web scraping
- What to look for in a machinery manual DRM
- How Locklizard Safeguard can protect your machine manuals
What are the requirements for manuals under the new EU machinery directive?
The European Union has laid out several requirements that businesses must meet in order to deliver their manuals digitally in the EU market. EU-compliant user instructions must meet the following criteria:
- Users must be able to download and print the instructions.
- Users must be able to access manuals offline for the machinery’s lifetime and at least ten years after the first sale.
- Publishers must provide paper copies on request at no extra cost. They must be delivered within a month.
- Machinery intended for non-professional users must include essential safety information in paper form, even if the instruction manual is digital. Machinery for professional users does not need to ship with printed safety information, but you still need to provide technical documentation on request.
- You must mark on the packaging or machinery where the user can obtain the digital instructions.
- The instructions must be in a language understandable to the target market.
Currently, the EU does not appear to enforce a specific file format for instructions, although PDF would be a logical choice. It’s worth noting that the act also allows you to submit your declaration of conformity or conformity assessment in digital format. However, they must be provided via a valid internet address or machine-readable code.
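The two time limits in the list above (availability for the machinery's lifetime and at least ten years, and paper copies within a month) can be turned into concrete dates. The following is a minimal sketch, not legal guidance. It assumes that the availability period is the longer of the expected lifetime and ten years, counted from the first sale, and that "within a month" can be approximated as 30 days. The function names and the `expected_lifetime_years` parameter are hypothetical.

```python
from datetime import date, timedelta


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def availability_end(first_sale: date, expected_lifetime_years: int) -> date:
    """Earliest date on which the digital manual could be withdrawn.

    Assumption: the period is the longer of the expected lifetime and a
    ten-year minimum, both counted from the first sale.
    """
    return add_years(first_sale, max(expected_lifetime_years, 10))


def paper_copy_deadline(request_date: date, days: int = 30) -> date:
    """Latest delivery date for a requested paper copy.

    Assumption: "within a month" is approximated as 30 days.
    """
    return request_date + timedelta(days=days)


if __name__ == "__main__":
    # A machine with a 7-year expected lifetime still needs 10 years of access.
    print(availability_end(date(2027, 3, 1), expected_lifetime_years=7))
    print(paper_copy_deadline(date(2027, 3, 1)))
```

Check the exact counting rules against the regulation text before relying on dates like these.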
The risks of publishing manuals online
Online publishing comes with several risks that manuals are not immune to. The natural consequence of making documents available to a global audience of billions is that some are bound to be bad actors. You may have to contend with various challenges, including:
- AI web scraping
- IP theft
- Unauthorized sharing and redistribution
- Inability to take manuals out of circulation (potentially leading to legal liability)
- Unauthorized modification
- Easier access by competitors, which could lead to loss of competitive advantage or reverse engineering
- Attacks on the web server hosting the manual
Why AI web scraping threatens manual publishers
Generative AI has caused new problems for many businesses, but it is particularly damaging to those that sell or provide instructions, PDF files, or ebooks. Web scraping for AI bots is largely indiscriminate, with companies looking to add valuable information to their datasets without regard for the impact on rights holders. As we have previously covered, copyright protection is often useless. There is a real risk that users will start turning to often inaccurate chatbots for their information rather than official sources.
Can you scrape data from a PDF?
Scraping text from a PDF is simple. Tools such as PDFMiner, a Python library, automatically parse and analyze PDF documents. They can extract text, images, and tables of contents, convert PDF files to other formats, and read tagged and compressed content.
So, scraping a PDF requires slightly different methods than scraping a webpage, but it is still easy to automate. At the time of writing, there is no clear legal framework to prevent AI companies from taking this data without license or consent and training their bots on it. If users then turn to tools like ChatGPT before the official instructions, they may act on seriously inaccurate information.
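To illustrate how little effort is involved, here is a minimal sketch that pulls text out of an unprotected PDF using only the Python standard library. It is not how PDFMiner works internally. It assumes a simple file whose page content streams are either uncompressed or Flate-compressed and whose text is stored as literal strings shown with the `Tj` or `TJ` operators. Real tools also handle fonts, encodings, and layout. The function names are hypothetical.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A literal string such as (Hello), allowing backslash-escaped characters.
LITERAL_RE = re.compile(rb"\((?:\\.|[^\\()])*\)")
# Text-showing operations: "(...) Tj" or "[(...) -20 (...)] TJ".
SHOW_RE = re.compile(rb"\((?:\\.|[^\\()])*\)\s*Tj|\[(?:\\.|[^\\\]])*\]\s*TJ")


def _unescape(raw: bytes) -> str:
    """Strip the enclosing parentheses and resolve \\( \\) \\\\ escapes."""
    body = re.sub(rb"\\([()\\])", rb"\1", raw[1:-1])
    return body.decode("latin-1")


def extract_text(pdf_bytes: bytes) -> str:
    """Return one line of text per text-showing operation found."""
    lines = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # Flate-compressed stream; trailing bytes after it are ignored.
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # assume the stream was stored uncompressed
        for op in SHOW_RE.finditer(data):
            parts = LITERAL_RE.findall(op.group(0))
            lines.append("".join(_unescape(p) for p in parts))
    return "\n".join(lines)


if __name__ == "__main__":
    # A stripped-down PDF fragment, just enough to exercise the function.
    content = b"BT /F1 12 Tf 72 720 Td (Torque bolts to 45 Nm) Tj ET"
    fragment = (
        b"4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
        + zlib.compress(content)
        + b"\nendstream\nendobj\n"
    )
    print(extract_text(fragment))  # prints: Torque bolts to 45 Nm
```

A few dozen lines are enough for simple files, which is why an unprotected PDF on a public server should be treated as machine-readable text.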
Popular methods to protect digital manuals from web scraping
There are two main ways to protect manuals from scraping, sharing, and theft. One involves securing a webpage on which the instructions are embedded, and the other involves securing the file itself. One of these is clearly better than the other, and we’ll explain why.
Why protecting a web PDF is so difficult
The most tempting way to provide instruction manuals is to embed them in an existing website. This allows companies to leverage their existing web provider or use a cloud document portal for convenience on both sides.
Unfortunately, while security and convenience sometimes line up, they don’t in this case. PDFs embedded in webpages are prime targets for automatic web crawling and mass unauthorized sharing. Even if manuals are protected online, users can find simple ways around the security measures.
For several reasons, it’s almost impossible to protect PDFs in the browser adequately:
- Developers cannot adequately enforce document controls because doing so effectively requires the ability to disable certain system functions. Browsers cannot access resources at the system level because this would open up users’ devices or computer systems to malware.
- Passwords are the only real option to limit access to specific web pages/portals. This is a poor method because they can easily be shared, along with usernames and 2FA codes.
- If your manuals are publicly available, you can expect them to be AI web-scraped and used by chatbots shortly. Competitors may also steal them to use in reverse engineering.
- It is tough to prevent PDFs from being downloaded from a browser, and EU regulations require you to allow downloads anyway. Without additional protection in the file, your manuals can be modified and reuploaded or shared on piracy sites.
The bottom line is that hosting PDF manuals online will almost certainly lead to them being mass-shared, modified, scraped, and analyzed by your competitors. If that is a reality you are willing to accept, more power to you, but it is untenable for companies that must be more protective of their manuals.
Embedding protection in PDF manuals
Embedding restrictions or controls into files is the only thing that makes sense when you are required by law to allow downloads. That said, it is critical that you choose the right type of protection if you want this to have any measurable impact. There are many file protection options for PDF, but most are unsuitable for our use case:
- PDF password protection: Users who know open passwords (required to open a password protected PDF) can remove them and share the unprotected document with others. Permissions passwords are easily removed with free online tools, as outlined in How secure is Adobe PDF encryption?.
- PGP file encryption: Is good at protecting files in transit and at rest but doesn’t change what a user can do once they’ve opened the file. They can still copy and paste, modify, and redistribute at will. Both parties also need to share keys with each other in advance, which isn’t feasible at scale.
- PDF certificates: Are better at preventing unauthorized opening than passwords, but are simply a way to exchange encryption keys, not prevent PDF editing or sharing.
The best way to embed protection in PDF files and protect online manuals from copying, editing and sharing, is to use a DRM solution. Digital Rights Management solutions are designed to protect businesses’ IP in numerous ways, conveniently and at scale.
Checklist for a content protection system for digital manuals
As we see it, there are four main security requirements a DRM for machine manuals should meet:
- It should allow publishers to restrict manual access to only their customers to mitigate the risk of reverse engineering and other issues.
- The DRM should prevent the manual from being scraped for use in chatbots and other AI, as inaccuracies may lead to improper usage and maintenance.
- It should stop the modification and redistribution of manuals, as this could also lead to safety and reliability issues.
- The DRM must offer a way to take outdated manuals out of circulation, particularly if they contain an error.
Importantly, it must be flexible enough to meet these requirements while still satisfying the EU’s stipulations on printing, downloads, and offline access.
How Safeguard can protect your machine manuals
Locklizard Safeguard is a PDF DRM solution that is purpose-built to protect documents. It uses a combination of AES 256-bit encryption, transparent licensing, and a secure viewer application to:
- Enforce editing and copy/paste controls that cannot be removed or bypassed.
- Prevent screenshots and/or blank the viewer window out when it is not active.
- Stop online and offline PDF scraping.
- Restrict access to specific users.
- Expire manuals based on date, days since first open, number of opens, or number of prints.
- Manually revoke access regardless of where the document is located.
- Disable printing or restrict it to black-and-white or greyscale copies.
- Add dynamic, irremovable watermarks that can include usernames, date/time, company, publisher, and more.
- Allow fully offline USB distribution. Safeguard PDF portable for USB can distribute secure viewers, documents, and key stores on USB devices as a complete offline solution with no need for admin privileges or firewall setup.
- Stop mass sharing and redistribution. Manuals are locked to individual authorized devices so they cannot be shared.
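To make the expiry, revocation, and device-locking controls above more concrete, here is a minimal sketch of the kind of check a secure viewer might run before opening or printing a manual. It is purely illustrative and is not Locklizard's implementation. The `ManualLicense` record, its field names, and the functions are all hypothetical. Checks like these only have value inside a viewer that also controls decryption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Tuple


@dataclass
class ManualLicense:
    """Hypothetical license record; every field name is illustrative."""
    device_id: str                                   # device the manual is locked to
    expires_on: Optional[date] = None                # fixed expiry date
    max_days_after_first_open: Optional[int] = None  # days since first open
    max_opens: Optional[int] = None                  # number of opens
    max_prints: Optional[int] = None                 # number of prints
    revoked: bool = False                            # manual revocation
    first_opened: Optional[date] = None
    opens: int = 0
    prints: int = 0


def check_open(lic: ManualLicense, device_id: str, today: date) -> Tuple[bool, str]:
    """Decide whether the manual may be opened, and give the reason."""
    if lic.revoked:
        return False, "access revoked by publisher"
    if device_id != lic.device_id:
        return False, "license is locked to another device"
    if lic.expires_on is not None and today > lic.expires_on:
        return False, "expired on fixed date"
    if lic.max_days_after_first_open is not None and lic.first_opened is not None:
        limit = lic.first_opened + timedelta(days=lic.max_days_after_first_open)
        if today > limit:
            return False, "expired after first open"
    if lic.max_opens is not None and lic.opens >= lic.max_opens:
        return False, "open limit reached"
    return True, "ok"


def record_open(lic: ManualLicense, device_id: str, today: date) -> bool:
    """Open the manual if allowed and update the usage counters."""
    allowed, _ = check_open(lic, device_id, today)
    if allowed:
        lic.opens += 1
        if lic.first_opened is None:
            lic.first_opened = today
    return allowed


def record_print(lic: ManualLicense) -> bool:
    """Allow a print only while the license is valid and under its print limit."""
    if lic.revoked or (lic.max_prints is not None and lic.prints >= lic.max_prints):
        return False
    lic.prints += 1
    return True


if __name__ == "__main__":
    lic = ManualLicense(device_id="workshop-tablet", max_opens=2, max_prints=1)
    print(record_open(lic, "workshop-tablet", date(2027, 1, 1)))  # True
    print(check_open(lic, "other-laptop", date(2027, 1, 2)))      # locked to another device
```

Leaving every limit at `None` models a manual that never expires, which is one way to reconcile controls like these with the long availability period the regulation requires.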
Critically for the EU machinery directive, our DRM works fully or partially offline. Locklizard provides the best content protection system for digital manuals, whether they are online machine manuals or offline ones.
You can see how Locklizard can help you comply with the EU machinery directive, prevent AI scraping, and protect your IPR by taking a 15-day free trial of our PDF DRM software.
FAQs
What is the current EU Machinery Directive?
The Machinery Directive 2006/42/EC remains the applicable law until 20 January 2027. From that date it is replaced by Regulation (EU) 2023/1230, which was adopted on 14 June 2023 and becomes mandatory on 20 January 2027.
What is the EU work equipment directive?
The 2009/104/EC work equipment directive outlines minimum health and safety requirements when using work equipment in the workplace.
Why is the EU machinery directive necessary?
A standardized set of safety, instruction, component, and other requirements makes it easier for machinery makers as they only need their products to follow a single set of rules rather than catering to individual member states. At the same time, it helps to guarantee a baseline safety of machinery across the union while facilitating the free movement of machinery between EU member states without additional national approvals.
What happens if my machine manuals do not comply with the EU directive?
You may fail to receive the CE marking and therefore be unable to legally sell or put your machinery into service in the European Economic Area (EEA) until it is rectified. There is also the risk of fines or penalties from market surveillance authorities as well as product recalls, liability for accidents and injuries, etc.
Can you protect machine manuals with a secure browser based solution?
No. Take a read of How secure are virtual datarooms and How secure are Google Docs for examples of how users can easily remove protection.
Can you protect machine manuals in PDF format with a password?
No. It is simple for users to edit a password protected PDF. PDF restrictions that are meant to prevent editing, copying, and printing are useless since they can be bypassed or instantly removed.
Can you use AI to scrape the web for data?
Yes, it is simple to do with AI tools. If you publish machine manuals online, you should therefore make sure they are protected with a content protection system for digital manuals that prevents web scraping of PDF files. This can only be achieved by ensuring that only authorized users can view encrypted content, that manuals are locked to devices so they cannot be shared, and that digital content is only viewable using a secure viewer app that can enforce DRM controls.