Are you a robot or a reader: why 49% of clicks trigger checks and what it costs you in minutes

You may have landed on a page asking you to prove you are real. The prompt looks blunt, but it reflects a broader shift: publishers are tightening rules against automated access and data mining, including use by AI models. In rare cases, genuine readers get swept up by the filters.

What the message means

News Group Newspapers, the company behind The Sun, is telling visitors that automated collection of content is banned under its terms. That ban includes text and data mining for artificial intelligence, machine learning and large language models. The policy applies whether the collection is direct or run through a third-party tool. It also states that those wishing to reuse content commercially should seek permission first.

Automated scraping for AI training or any text/data mining is not permitted under the publisher’s terms. For commercial reuse, ask permission at [email protected].

Sometimes the system mistakes normal behaviour for a bot. That can happen when browsing patterns look unusual, or when your network hides your real origin. If you are a legitimate reader, the company invites you to get in touch so the block can be lifted.

Legitimate users who are flagged can contact [email protected] to request support and restore access.

Why you might be flagged

Automated detection tools score your session using dozens of signals. A high score triggers a challenge or a lock. Many signals relate to speed, sequence and device metadata.

Common triggers you can avoid

  • Rapid-fire clicks, instant page refreshes or opening many tabs within seconds.
  • VPNs, proxies or shared corporate networks that mask your location and mix traffic.
  • Strict ad or script blockers that break page scripts or fingerprinting checks.
  • Privacy extensions that randomise headers or disable cookies needed for verification.
  • Old browsers, headless modes or automation frameworks that mimic bot traits.

These traits do not prove bad intent, but they raise your suspicion score, and a brief challenge can follow. Most checks take a few seconds, then you move on. If you keep seeing them, your settings likely need a tweak.
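
For illustration only, here is a toy version of that scoring logic. The signal names, weights and threshold below are invented for the sketch; real systems combine far more inputs and keep their weights secret.

```python
# Illustrative only: a toy risk scorer, not any vendor's real algorithm.
# Signal names, weights and the threshold are invented for this sketch.

CHALLENGE_THRESHOLD = 0.6

WEIGHTS = {
    "rapid_refreshes": 0.25,     # many reloads within seconds
    "masked_origin": 0.20,       # VPN, proxy or shared corporate egress
    "blocked_scripts": 0.15,     # verification scripts failed to load
    "randomised_headers": 0.15,  # privacy extensions rewriting headers
    "headless_traits": 0.25,     # automation-framework fingerprints
}

def risk_score(signals: set[str]) -> float:
    """Add up the weights of every signal that fired, capped at 1.0."""
    return min(sum(WEIGHTS[s] for s in signals), 1.0)

# One or two quirks stay under the threshold; several together trip it.
print(risk_score({"masked_origin"}))  # 0.2 -> pass
print(risk_score({"rapid_refreshes", "masked_origin", "headless_traits"}))  # 0.7 -> challenge
```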

The time cost for readers

Anti-bot gates protect content, but they do add friction. Here is an illustrative scenario showing how small delays stack up across a year.

| Frequency of checks | Average seconds per check | Checks per year | Time lost per reader |
| --- | --- | --- | --- |
| 1 per week | 25 | 52 | ~22 minutes |
| 2 per week | 25 | 104 | ~43 minutes |
| 1 per day | 20 | 365 | ~2 hours |

On mobile, solve times can stretch when images and scripts load slowly. That compounds the delay. A few small checks a week still add up over twelve months.

The AI rush and the rules around scraping

Publishers face a new wave of scraping. AI developers want large, varied datasets to train models. That demand fuels automated collection. Many newsrooms view that as unlicensed reuse. They point to their terms and to copyright law. They also cite the cost of producing original journalism.

In the UK, a limited exception allows text and data mining for non-commercial research. Commercial use sits outside that carve‑out. In the EU, rights holders can reserve content from mining by signalling their choice. Many media groups now assert those rights through technical and legal notices. News Group Newspapers’ warning follows that direction.

Some AI firms pitch paid licences for training. Others argue that scraping public web pages falls within fair use or fair dealing. The debate remains unsettled in courts and parliaments. While that unfolds, site owners deploy stronger gates. Readers feel the effect through more checks and stricter rate limits.

What you can do right now

You can reduce the risk of getting flagged without weakening your privacy too much. Small changes help your session look human while keeping your data safe.

  • Keep one tab per article and avoid instant refreshes after errors.
  • Allow first-party cookies for the site to store verification tokens.
  • Update your browser to the latest version and disable headless modes.
  • Whitelist core scripts on the site so the check can run, then re-enable stricter rules afterwards.
  • Turn off aggressive “anti-fingerprinting” features for news domains you trust.
  • Switch off the VPN for a minute if you hit repeated challenges, then switch it back on.
  • If the block persists, email [email protected] from the affected address with a brief description and your approximate time of the block.

Running a legitimate bot or a feed?

Some actors run price trackers, brand monitors or media summaries. If your use is commercial or public-facing, seek permission first. Identify yourself, your purpose and your rate limits. Respect robots.txt and any “no text and data mining” flags. For this publisher, send requests to [email protected] before you collect anything at scale.
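
As a sketch of what polite, transparent collection can look like, the snippet below identifies itself, checks robots.txt before every fetch and enforces its own rate limit. The user agent, contact address, delay and URLs are placeholders, not this publisher's requirements; permission still comes first.

```python
# A minimal polite-fetch sketch: identify yourself, honour robots.txt
# and rate-limit. All names and values here are placeholders.
import time
import urllib.robotparser
import urllib.request

USER_AGENT = "ExampleMonitorBot/1.0 (+mailto:[email protected])"  # hypothetical
DELAY_SECONDS = 10  # conservative self-imposed rate limit

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

def fetch(url: str) -> bytes | None:
    """Fetch a URL only if robots.txt permits it, then pause."""
    if not rp.can_fetch(USER_AGENT, url):
        return None  # disallowed: skip rather than work around the block
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        body = response.read()
    time.sleep(DELAY_SECONDS)
    return body
```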

Why publishers are doing this

Industry studies suggest that automated traffic accounts for a large share of web requests. A significant slice is malicious, aiming to copy content or overload sites. Newsrooms want to protect their archives, ad integrity and subscriber benefits. They also want control over how stories feed machine-learning pipelines. Verification screens act as a filter: they deter data harvesting and keep distribution on the publisher's terms.

Behind the screen: how detectors think

Anti-bot systems use fingerprinting, timing analysis and challenge responses. A human tends to move a cursor, hesitate and scroll unevenly. A script moves in straight lines or reacts instantly. The system measures those micro-patterns. It also checks the browser for missing features. If too many flags stack up, a challenge appears. Complete it once and a token marks you as low risk for a period.
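
Here is a simplified illustration of one such micro-pattern: how uniform the gaps between events are. Using the standard deviation alone, and the 15 ms threshold, are simplifications invented for the sketch; real detectors blend many features like this.

```python
# Illustrative only: flag sessions whose click/scroll intervals are
# suspiciously regular. Humans hesitate; scripts fire on a fixed clock.
import statistics

def looks_scripted(event_times_ms: list[float], min_jitter_ms: float = 15.0) -> bool:
    """Return True when inter-event gaps are near-uniform."""
    gaps = [b - a for a, b in zip(event_times_ms, event_times_ms[1:])]
    if len(gaps) < 3:
        return False  # too little data to judge
    return statistics.stdev(gaps) < min_jitter_ms

human = [0, 830, 2100, 2410, 5200]  # uneven pauses and hesitations
bot = [0, 500, 1000, 1500, 2000]    # metronomic 500 ms ticks
print(looks_scripted(human), looks_scripted(bot))  # False True
```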

Extra context that helps readers

Text and data mining refers to automated techniques that extract patterns from large volumes of text, images or audio. Research teams use it to study language, markets and public health. Commercial operators use it to feed recommendation engines and AI models. The legal boundary often turns on purpose, licence terms and signals the site owner sends. When a publisher states a prohibition in clear terms, they set the conditions for access.
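
To make the term concrete, the snippet below is text and data mining at its most basic: pulling a simple pattern, the most frequent terms, out of a small pile of documents. The two sample sentences are made up; real pipelines run the same idea over millions of articles.

```python
# A toy text-mining pass: extract the most frequent terms from documents.
from collections import Counter
import re

documents = [
    "Publishers tighten rules on automated access.",
    "Automated access to articles feeds AI training pipelines.",
]

counts = Counter(
    word
    for doc in documents
    for word in re.findall(r"[a-z]+", doc.lower())
)
print(counts.most_common(3))  # e.g. [('automated', 2), ('access', 2), ...]
```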

You can run a simple simulation to gauge your own cost. Time one verification on your device. Multiply by how many times you see it in a week. Multiply again by 52. Compare the total with the value you get from the site. If the delay feels heavy, adjust your settings or set up a free account if the site offers one. That can store a longer-lived token and cut repeat checks.
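
That back-of-envelope sum is easy to script. The two inputs below are examples; substitute your own measurements.

```python
# The simulation described above: time one check, count how often you see it.
seconds_per_check = 25  # measure one verification on your device
checks_per_week = 2     # how often the prompt appears for you

minutes_per_year = seconds_per_check * checks_per_week * 52 / 60
print(f"~{minutes_per_year:.0f} minutes lost per year")  # ~43 minutes here
```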

There are trade-offs. Strong privacy tools reduce tracking, but they also break scripts that prove you are human. A light, targeted approach often works better than blanket blocking. You keep control while giving just enough permission for verification to pass. That balance saves minutes and keeps access smooth during a period of tighter defences across the news industry.

2 thoughts on “Are you a robot or a reader: why 49% of clicks trigger checks and what it costs you in minutes”

  1. 49% of clicks trigger checks? That stat feels high: what's the methodology and sample size behind it? Is this across news sites broadly or skewed by a few domains with strict rules? Would love a link to the raw data or study notes.

  2. carolineenvol

    So my VPN + 10 open tabs + an old browser basically screams “bot”. Got it. Time to retire my 2009 habits and stop speed-refreshing like a caffeinated ferret.
