mikemccaffrey,
@mikemccaffrey@drupal.community

We have a client site where a crawler bot has been coming back to the site for years, trying to access hundreds and hundreds of bad URLs. The requests from one of its previous episodes caused the account to be bumped up to the next tier, costing the organization hundreds of dollars more a month. Any ideas on how to keep an out-of-control crawler off of a Pantheon site?

kreynen,
@kreynen@fosstodon.org

@mikemccaffrey this probably won't help in your use case, but Pantheon customers paying for their Advanced Global CDN (partially unlocked Fastly access) can block the source of "malicious" requests at that level. In addition to the traffic limit changes, there are also changes coming to AGCDN pricing. Hopefully those changes will make it more affordable for smaller organizations.

mikemccaffrey,
@mikemccaffrey@drupal.community

@kreynen Is there an interface for managing that?

kreynen,
@kreynen@fosstodon.org

@mikemccaffrey yes, but not all configurations/VCL options can be done as self-service... and, similar to my motivation for writing https://www.drupal.org/project/pantheon_autopilot_toolbar, you can't actually navigate to https://agcdn.ps-pantheon.com/ through Pantheon's UI. The Edge Cache icon in the left-hand navigation of the Pantheon Dashboard links to a page promoting the AGCDN and telling you to contact your salesperson... even if you are already paying for it.

jurgenhaas,
@jurgenhaas@fosstodon.org

@mikemccaffrey
@CrowdSec is built to solve this problem. The integration at https://www.drupal.org/project/crowdsec lets you use their shield and also automatically ban any IP that crawls your site that way for a configurable period of time. For example: after 3 consecutive 404s, the IP is banned.

mikemccaffrey,
@mikemccaffrey@drupal.community

@jurgenhaas @CrowdSec Interesting. I wonder if returning a 403 error would be any more effective than the 404 errors it is already ignoring.

jurgenhaas,
@jurgenhaas@fosstodon.org

@mikemccaffrey @CrowdSec
Actually, it takes all 4xx request responses into account.
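The "ban after N consecutive 4xx responses" idea can be sketched roughly as below. This is a minimal illustration under stated assumptions, not CrowdSec's actual implementation: the class name `ConsecutiveErrorBanner`, the `threshold` and `ban_seconds` parameters, and the injectable `clock` are all hypothetical, and the streak is assumed to reset whenever a non-4xx response is seen.

```python
import time
from collections import defaultdict


class ConsecutiveErrorBanner:
    """Hypothetical sketch: ban an IP after `threshold` consecutive 4xx responses."""

    def __init__(self, threshold=3, ban_seconds=3600, clock=time.monotonic):
        self.threshold = threshold
        self.ban_seconds = ban_seconds
        self.clock = clock                # injectable so the logic can be tested
        self._streak = defaultdict(int)   # ip -> current run of consecutive 4xx responses
        self._banned_until = {}           # ip -> clock value at which the ban expires

    def is_banned(self, ip):
        until = self._banned_until.get(ip)
        if until is None:
            return False
        if self.clock() >= until:         # ban has expired; forget it
            del self._banned_until[ip]
            return False
        return True

    def record(self, ip, status):
        if 400 <= status < 500:           # any 4xx counts, not just 404
            self._streak[ip] += 1
            if self._streak[ip] >= self.threshold:
                self._banned_until[ip] = self.clock() + self.ban_seconds
                self._streak[ip] = 0
        else:
            self._streak[ip] = 0          # a successful response breaks the streak


# Usage with a fake clock (203.0.113.7 is a documentation-only address).
now = [0.0]
banner = ConsecutiveErrorBanner(threshold=3, ban_seconds=600, clock=lambda: now[0])
for status in (404, 403, 404):
    banner.record("203.0.113.7", status)
print(banner.is_banned("203.0.113.7"))  # True
now[0] = 601.0
print(banner.is_banned("203.0.113.7"))  # False
```

Note that a ban enforced by the application still runs on the origin, so banned requests may still reach the hosting platform; it reduces work per request, not necessarily the request count.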

froboy,
@froboy@mastodon.online

@mikemccaffrey can you block the user agent? Also related: Pantheon made some big announcements about overages today: https://pantheon.io/blog/enhanced-pantheon-overage-policy-traffic-limits-and-pricing

mikemccaffrey,
@mikemccaffrey@drupal.community

@froboy Define "block". I can tell the crawler is this idiot because it has been crawling the same non-existent .html addresses for over five years now, but all I can do is return errors at it, which it ignores.
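A user-agent block at the application layer, as suggested above, might look like the sketch below. The blocklist entry `ExampleBadBot` and the function names are hypothetical placeholders; the real crawler's user agent is not given in the thread. As the thread points out, a check like this still runs on the origin, so it only turns the request into a cheap 403 rather than keeping it off the platform.

```python
# Hypothetical blocklist: case-insensitive substrings matched against the User-Agent header.
BLOCKED_AGENT_SUBSTRINGS = ("ExampleBadBot",)


def should_block(user_agent, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Return True if the User-Agent contains any blocked substring."""
    ua = (user_agent or "").lower()  # treat a missing header as an empty string
    return any(s.lower() in ua for s in blocked)


def handle(user_agent):
    """Toy request handler: refuse blocked agents before doing any real work."""
    if should_block(user_agent):
        return 403, "Forbidden"
    return 200, "OK"


print(handle("Mozilla/5.0 (compatible; ExampleBadBot/1.0)"))  # (403, 'Forbidden')
print(handle("Mozilla/5.0 (X11; Linux x86_64)"))               # (200, 'OK')
```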

Archnemysis,
@Archnemysis@mastodon.social

@mikemccaffrey @froboy You would need to do the block at the CDN layer, before the traffic ever gets to Pantheon, to keep it from counting against your Pantheon plan. Cloudflare has decent bot management tools you can use beyond just blocking a specific IP. My guess is it would easily identify this crawler and every script kiddie trying to reach wp-login.

Archnemysis,
@Archnemysis@mastodon.social

@mikemccaffrey @froboy The Pro plan @ $25/month would likely be sufficient: https://www.cloudflare.com/learning/bots/what-is-bot-management/ But if the customer is on Performance Medium at Pantheon and facing a bump to Performance Large, even the Cloudflare Business plan @ $250/month would still be more cost-effective.
