You’re not alone, and the reason isn’t always what you think.
Across major UK news sites, anti-bot defences are rising. Real readers get mistaken for scripts, raising tempers and questions.
Why websites are tightening the gates
Publishers have moved from simple captchas to sophisticated behaviour checks in response to automated scraping and industrial-scale text and data mining. News Group Newspapers, which publishes The Sun, explicitly bars automated access to its content in its terms and conditions. The ban covers direct scraping and the use of intermediaries that fetch or process content on behalf of others, including for artificial intelligence, machine learning and large language models.
Automated access and text/data mining, including for AI, machine learning and LLMs, are prohibited under publisher terms and require prior permission.
The aim is straightforward: protect intellectual property, preserve advertising value, and keep infrastructure stable when traffic surges. The side effect is predictable. Some readers who do nothing wrong hit a verification screen or a hard block that looks accusatory and final.
When real people look like bots
Bot-detection tools profile patterns more than identities. They weigh signals like speed, sequence, fingerprints and network origin, then score a session as risky or safe. You might be holding a phone on the Northern line, juggling tabs, and copying snippets for a group chat. To an automated system, that can resemble a script.
| Signal | What triggers it | How to fix |
|---|---|---|
| Rapid-fire actions | Multiple page requests in under a second, constant refresh, mass opening of new tabs | Slow down; open fewer tabs; avoid auto-refresh extensions |
| Blocked scripts or cookies | Hardened privacy settings, script blockers, disabled local storage | Allow JavaScript and first-party cookies for the site; whitelist just that domain |
| Network anomalies | VPNs, rotating proxies, or IPs previously linked to automation | Turn off the VPN or choose a UK endpoint; stick to one connection briefly |
| Odd browser fingerprints | Headless browsers, spoofed user agents, unusual screen or timezone mismatches | Use a standard browser; align device timezone with your location |
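The scoring idea behind that table can be sketched in a few lines of code. This is a toy illustration only: the signal names, weights and threshold below are invented for this example, and no publisher's actual detection system is shown.

```python
def risk_score(requests_per_second, scripts_blocked, on_vpn, timezone_mismatch):
    """Combine weighted behaviour signals into a single risk score.

    Weights are illustrative; real systems use many more signals
    and tune them continuously.
    """
    score = 0
    if requests_per_second > 1:   # rapid-fire actions
        score += 3
    if scripts_blocked:           # blocked scripts or cookies
        score += 2
    if on_vpn:                    # network anomaly
        score += 2
    if timezone_mismatch:         # odd browser fingerprint
        score += 3
    return score

# A hurried reader on a VPN with a script blocker enabled...
risky = risk_score(requests_per_second=5, scripts_blocked=True,
                   on_vpn=True, timezone_mismatch=False)

# ...versus the same reader after pausing the VPN and allowing scripts.
calm = risk_score(requests_per_second=0.3, scripts_blocked=False,
                  on_vpn=False, timezone_mismatch=False)

print(risky, calm)  # 7 0
```

The point is that no single signal condemns you: it is the combination that pushes a session over a block threshold, which is why changing just one or two habits often clears the check.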
Five quick moves that often clear the check
- Refresh once and wait 10–20 seconds before trying again. Multiple reloads can raise suspicion.
- Enable JavaScript and cookies for the news site. Disable extensions that block scripts or rewrite pages.
- Pause your VPN or proxy. If you need one, pick a stable endpoint and avoid rapid hopping.
- Close automation tools and aggressive scrapers. Remove auto-refreshers or tab managers that prefetch many pages.
- Still stuck? Note the time, your IP and any request ID on the page. Send those to [email protected]. For commercial access to content, use [email protected].
Legitimate users can be misread as automated. Keep a record of your block screen and contact [email protected] for support. Commercial reuse requests go to [email protected].
The AI training rush meets publisher pushback
Generative models feed on vast text corpora. Newsrooms argue that years of reporting cannot be vacuumed up to train commercial systems without consent or payment. That tension has turned legal and technical. Expect more publishers to tighten terms, harden bot defences, and demand licences for any systematic collection. The trend is not about stopping a single curious reader; it targets mass-scale harvesting that repackages journalism into other products.
What counts as text and data mining
Text and data mining means systematic analysis of content to extract patterns. That includes scraping entire sections, building datasets for model training, or using a third-party crawler to fetch and process pages. It also covers repeated access by automated tools, even if run by a human on a desktop. Many publishers now spell out that both direct and indirect methods breach their terms without prior permission.
What to do if your work relies on scraping
Some teams genuinely need structured access, from compliance tracking to media monitoring. The route that avoids conflict is a licence or an official feed. Ask about rate limits, caching windows and how attribution should appear. Keep logs of your usage. A small paid trial often opens the door to longer-term access with clear boundaries. Trying to cloak requests behind rotating IPs usually gets noticed, and the fallout can affect unrelated users on the same network.
What this means for your daily reading
For most people, a block arrives during a news rush: a big match, a breaking story, a celebrity case. Traffic spikes make systems more cautious. A simple rule helps—browse like a person, not a script. Read a page before opening the next. Avoid copy-pasting at machine pace. Keep privacy tools, but whitelist a handful of trusted news sites you read every day.
Three quick checks before you think you’re banned
- Open the site in a standard browser profile, not incognito, to allow cookies to do their job.
- Turn off the VPN for five minutes and retry. If it works, choose a different endpoint next time.
- Disable one extension at a time. Start with script blockers, anti-fingerprinting add-ons and auto-refreshers.
Why the wording looks so stern
Legal text on block pages can sound harsh because it must cover every scenario, from hobbyist crawlers to commercial AI labs. The same page often mixes two messages: readers who need a quick nudge to pass verification, and organisations that need a formal licence before touching any content at scale. That is why you will see a support email alongside a rights email. Each serves a different audience.
A quick scenario to test your setup
Imagine you open five articles in two seconds while running a full tracker blocker and a VPN. That combination scores as high risk for many systems. Now change two variables: switch off the VPN and allow first-party cookies. Wait three seconds between clicks. In most cases, you will slip back into the clear without needing to solve a captcha or email support.
What publishers usually approve—and what they reject
- Likely approved with a licence: structured feeds for headline monitoring, limited index building, internal research tied to a contract.
- Often rejected outright: bulk copying for AI training without permission, redistributing full articles, or using intermediaries to sidestep rate limits.
- Case by case: academic projects with non-commercial aims, provided access is narrow, time-bound and respectful of technical controls.
If you need help today
Stuck on a verification loop as a normal reader? Write to [email protected] with the exact time of the block, your IP address and any request or error ID printed on the page. Seeking permission to reuse or mine content for a product or a model? Email [email protected] with a clear outline of your purpose, volumes and time frame. Each minute spent preparing those details usually shaves hours off the back-and-forth.

Great walkthrough. Refresh + wait + disabling my auto‑refresher did the trick; I was back in under a minute. Thanks!
Feels like "privacy = suspicious" is the underlying assumption. Why should readers have to weaken their privacy tools just to read? Not convinced this is only about bots.