forum.coppermine-gallery.net

Support => cpg1.5.x Support => cpg1.5 miscellaneous => Topic started by: Walkinman on April 21, 2012, 09:34:11 pm

Title: weird url showing up
Post by: Walkinman on April 21, 2012, 09:34:11 pm
Hello

While looking into why my cup usage is higher than it should be, I found that a lot of crawlers are hitting URLs like this:

http://example.com/stock/thumbnails-18-Whitetail-Deer-Photos.htmlhttp:/css/themes/water_drop/displayimage-search-0-260-ArctGrndSqrl_

Clearly, the address (SEF plugin) for that page should be http://example.com/stock/thumbnails-18-Whitetail-Deer-Photos.html

But something is also linking to the extended (and wrong) URL above, and the crawlers are requesting a lot of pages with that same kind of URL:

example.com/stock/thumbnails-18-Whitetail-Deer-Photos.htmlhttp:/css/themes/water_drop/displayimage-search-0-5503-Ski-tracks-in-snow-Wrangell-St-Elias-Natio.html

etc, etc
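
For anyone wanting to measure how many requests are affected, here is a minimal sketch that flags URLs of this shape. It assumes every bad request has a second "http:/" glued directly onto a ".html" page name, as in the examples above; the names `is_glued_url` and `filter_glued` are made up for illustration, and the sample list stands in for URLs pulled from a real access log.

```python
import re

# Assumption: the malformed requests contain "http:/" (or "https:/")
# attached directly to the end of a ".htm"/".html" page name.
GLUED_URL = re.compile(r"\.html?https?:/", re.IGNORECASE)


def is_glued_url(url: str) -> bool:
    """Return True if the URL has a second scheme glued onto a page name."""
    return GLUED_URL.search(url) is not None


def filter_glued(urls):
    """Keep only the URLs that look malformed in this particular way."""
    return [u for u in urls if is_glued_url(u)]


# Sample data taken from the URLs quoted in this post.
sample_urls = [
    "http://example.com/stock/thumbnails-18-Whitetail-Deer-Photos.html",
    "http://example.com/stock/thumbnails-18-Whitetail-Deer-Photos.html"
    "http:/css/themes/water_drop/displayimage-search-0-260-ArctGrndSqrl_",
    "example.com/stock/thumbnails-18-Whitetail-Deer-Photos.html"
    "http:/css/themes/water_drop/"
    "displayimage-search-0-5503-Ski-tracks-in-snow-Wrangell-St-Elias-Natio.html",
]

if __name__ == "__main__":
    for url in filter_glued(sample_urls):
        print(url)
```

The pattern is deliberately narrow, so it will miss malformed URLs that are broken in some other way.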

The only bots I see crawling that stuff are from China, particularly baidu.com. I'm adding them to my blocked IP addresses, but I'm curious whether I have something coded incorrectly that's causing the above URLs to be crawled.

Thank you.

Cheers

Carl
Title: Re: weird url showing up
Post by: Walkinman on April 21, 2012, 10:57:09 pm
ETA: I also noticed that this weird url is ONLY showing up via one album:

http://www.skolaiimages.com/stock/thumbnails-18-Whitetail-Deer-Photos.html

It doesn't show up with any of the other albums.

I've blocked baidu from crawling my site, but am curious if anyone might have an idea what is generating that set of urls.

Thank you.

Cheers

Carl
Title: Re: weird url showing up
Post by: Walkinman on April 21, 2012, 10:58:53 pm
"cup usage" should read "cpu usage" of course .. it'd be nice if at least some editing of posts were allowed.

Thanks.
Title: Re: weird url showing up
Post by: Walkinman on May 02, 2012, 06:20:42 am
Hello, is it possible for an admin to PLEASE edit the first post here and change the domain name to example.com? I'm getting hammered by Google: over 5500 'file not found' entries and rising.

What's weird is that once a crawler picks up a URL containing Whitetail-Deer-Photos.htmlhttp:/css/themes/, it then tries to crawl the entire Coppermine gallery with that same kind of URL.

I shouldn't have typed the correct domain name in the post. Please edit or delete it.

Thank you.
Title: Re: weird url showing up
Post by: Αndré on May 02, 2012, 11:42:54 am
Edited as requested.
Title: Re: weird url showing up
Post by: Walkinman on May 02, 2012, 07:43:22 pm
Thanks so much, André. I shouldn't have been so stupid as to post it with the url.

What I don't understand is why, once a crawler accesses that one URL, it then proceeds to try to crawl every link in the site with that string as the prefix. It'll take searches like "displayimage-search-0-260-ArctGrndSqrl_" and search every single keyword, and display a page for each one, with http://example.com/stock/thumbnails-18-Whitetail-Deer-Photos.htmlhttp as the first part of the string. All those pages appear messed up, as the CSS doesn't apply correctly.
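
One plausible explanation, sketched here under assumptions rather than confirmed: if the server answers the malformed URL with a normal gallery page, and that page uses relative links, then a crawler resolves every link against the malformed address and the bad prefix carries over to each one. The sketch below uses Python's standard `urljoin` to show that resolution. The relative hrefs are assumptions about what the gallery emits, and the stylesheet path in particular is hypothetical.

```python
from urllib.parse import urljoin

GOOD_BASE = "http://example.com/stock/thumbnails-18-Whitetail-Deer-Photos.html"
BAD_BASE = (
    "http://example.com/stock/thumbnails-18-Whitetail-Deer-Photos.html"
    "http:/css/themes/water_drop/displayimage-search-0-260-ArctGrndSqrl_"
)

# Assumption: the page contains relative hrefs like these.
RELATIVE_LINKS = [
    "displayimage-search-0-5503-Ski-tracks-in-snow-Wrangell-St-Elias-Natio.html",
    "themes/water_drop/style.css",  # hypothetical stylesheet path
]


def resolve_all(base, hrefs):
    """Resolve each relative href against the page URL, as a crawler would."""
    return [urljoin(base, href) for href in hrefs]


if __name__ == "__main__":
    print("From the correct page URL:")
    for url in resolve_all(GOOD_BASE, RELATIVE_LINKS):
        print("  " + url)
    print("From the malformed page URL:")
    for url in resolve_all(BAD_BASE, RELATIVE_LINKS):
        print("  " + url)
```

Resolved against the malformed address, the first link becomes exactly the second URL quoted in the first post, and the stylesheet link points into a directory that does not exist, which would fit the unstyled pages. This does not explain where the very first malformed link came from.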

Thanks again for editing the post.

Cheers

Carl