I only recently started using the gallery and have about 300 files uploaded. I ran Xenu's Link Sleuth to check for bad links and build a sitemap, and it turned up over 6,800 links. So, yes, a lot of duplicate pages. I wasn't sure where to begin, so thanks for pointing out the pageheader function.
Here is how I set it up to keep duplicate pages out of the search engine indexes. I have only briefly tested it, but it seems to work. I'd be grateful if anyone could point out mistakes in the code or problems with the approach.
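Step 1: The first thing I needed was the name of the running script and the full request URI. $_SERVER["PHP_SELF"] holds the path from the document root (e.g. /gallery/index.php), so it goes through basename() to reduce it to index.php for the comparisons below:

```php
$php_self = basename($_SERVER["PHP_SELF"]);
$php_request = $_SERVER["REQUEST_URI"];
```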
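To see what these two server values look like for a gallery installed in a subdirectory, sample values can be pushed through the same normalization. This is a minimal sketch: the helper name cpg_request_parts() and the sample paths are hypothetical, and on a live page the inputs would be $_SERVER["PHP_SELF"] and $_SERVER["REQUEST_URI"].

```php
<?php
// Hypothetical helper: reduce PHP_SELF to the bare script name and
// split REQUEST_URI into its path and query string.
function cpg_request_parts(string $phpSelf, string $requestUri): array
{
    return [
        'script' => basename($phpSelf),                             // "/gallery/index.php" -> "index.php"
        'path'   => parse_url($requestUri, PHP_URL_PATH),           // still includes the directory
        'query'  => (string) parse_url($requestUri, PHP_URL_QUERY), // "" when there is no query string
    ];
}

$parts = cpg_request_parts('/gallery/thumbnails.php', '/gallery/thumbnails.php?album=1&page=2');
echo $parts['script'], "\n"; // thumbnails.php
echo $parts['path'], "\n";   // /gallery/thumbnails.php
echo $parts['query'], "\n";  // album=1&page=2
```

The path keeps the /gallery/ prefix, which is why any pattern run against REQUEST_URI should not assume the script name sits right after the first slash.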
Step 2: Next I needed to check for the pages I want indexed and followed.

index.php serves the home page and the category pages, so all of these need to be indexed and followed:

```php
$php_self == 'index.php'
```
I want the search page indexed as well; this is optional:

```php
$php_self == 'search.php'
```
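The following matches only a bare album URL, with no page number or other parameters after the album number. The dots and the question mark are escaped so they match literally, and the pattern is anchored only at the end: REQUEST_URI includes the full path, so a ^ anchor would fail if the gallery is installed in a subdirectory such as /gallery/.

```php
preg_match('/thumbnails\.php\?album=[0-9]+$/', $php_request)
```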
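The album check can be tried against a few sample URIs. This sketch uses an escaped, end-anchored pattern (dots and ? escaped, no ^ anchor); the helper name is_album_first_page() and the URIs are hypothetical. The last line shows that a pattern starting with ^. allows exactly one character before "thumbnails", so it rejects a subdirectory path.

```php
<?php
// Hypothetical helper: true only for a bare numbered album URL such as
// /thumbnails.php?album=1 (in any directory), with nothing after the number.
function is_album_first_page(string $requestUri): bool
{
    return preg_match('/thumbnails\.php\?album=[0-9]+$/', $requestUri) === 1;
}

var_dump(is_album_first_page('/gallery/thumbnails.php?album=1')); // bool(true)
var_dump(is_album_first_page('/thumbnails.php?album=1&page=1'));  // bool(false)
var_dump(is_album_first_page('/thumbnails.php?album=lastup'));    // bool(false)

// ^. permits a single leading character, so "/gallery/..." does not match
var_dump(preg_match('/^.thumbnails.php.album.[0-9]+$/', '/gallery/thumbnails.php?album=1')); // int(0)
```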
If an album has multiple pages, I want the first page indexed (e.g. /thumbnails.php?album=1), which the pattern above already covers. I don't want to index that page's duplicate in a multi-page album (e.g. /thumbnails.php?album=1&page=1), but I do need to index the other pages (e.g. /thumbnails.php?album=1&page=5). This pattern matches page 2 and above:

```php
preg_match('/thumbnails\.php\?album=[0-9]+&page=([2-9]|[0-9]{2,})/', $php_request)
```
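A quick check of the later-page pattern with sample URIs. This is a sketch using an escaped version of the pattern; the helper name is_album_later_page() is hypothetical. [2-9] covers single-digit pages and [0-9]{2,} covers page 10 and up, so page=1 is the only page number left out.

```php
<?php
// Hypothetical helper: true for page 2 and above of a numbered album.
function is_album_later_page(string $requestUri): bool
{
    return preg_match('/thumbnails\.php\?album=[0-9]+&page=([2-9]|[0-9]{2,})/', $requestUri) === 1;
}

foreach (['/thumbnails.php?album=1&page=1',
          '/thumbnails.php?album=1&page=5',
          '/thumbnails.php?album=1&page=12'] as $uri) {
    echo $uri, ' => ', is_album_later_page($uri) ? 'index' : 'skip', "\n";
}
// prints: page=1 => skip, page=5 => index, page=12 => index
```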
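Next I need to index the individual files. This matches URLs of the form /displayimage.php?pos=-123, where the negative pos is the picture ID:

```php
preg_match('/displayimage\.php\?pos=-[0-9]+/', $php_request)
```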
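A sketch of the single-file check with sample URIs, assuming single-file links take the form displayimage.php?pos=-123; the helper name is_single_file() is hypothetical. Album-relative links such as displayimage.php?album=1&pos=3 do not match, so they are left to the later checks.

```php
<?php
// Hypothetical helper: true for a single-file link of the form
// displayimage.php?pos=-<picture id>.
function is_single_file(string $requestUri): bool
{
    return preg_match('/displayimage\.php\?pos=-[0-9]+/', $requestUri) === 1;
}

var_dump(is_single_file('/displayimage.php?pos=-42'));       // bool(true)
var_dump(is_single_file('/displayimage.php?album=1&pos=3')); // bool(false)
```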
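Step 3: Now I need to check for pages that should be neither indexed nor followed, to keep web crawlers from wasting bandwidth on duplicate pages that won't be indexed anyway. A robots meta tag is only read after a page has been fetched, so this keeps crawlers from following the links on these pages rather than from requesting the pages themselves.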
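Check for the thumbnails.php meta albums that duplicate other content: lastup, lastcom, topn, toprated, favpics, and search. The names have to be grouped with parentheses; inside square brackets they would form a character class that matches any single one of the letters listed:

```php
preg_match('/thumbnails\.php\?album=(lastup|lastcom|topn|toprated|favpics|search)/', $php_request)
```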
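Writing the album names inside square brackets does not do what it looks like, and the difference is easy to check. In this sketch (sample URIs are hypothetical) the bracketed version also matches album=random, because r is one of the letters in the class, while the grouped version matches only the listed names.

```php
<?php
// [ ... ] is a character class: it matches ONE character from the set.
$bracketed = '/thumbnails\.php\?album=[lastup|lastcom|topn|toprated|favpics|search]/';
// ( ... | ... ) is alternation: it matches one of the whole names.
$grouped   = '/thumbnails\.php\?album=(lastup|lastcom|topn|toprated|favpics|search)/';

foreach (['/thumbnails.php?album=toprated',
          '/thumbnails.php?album=random',
          '/thumbnails.php?album=3'] as $uri) {
    printf("%-32s bracketed=%d grouped=%d\n", $uri, preg_match($bracketed, $uri), preg_match($grouped, $uri));
}
```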
Tell crawlers not to index ratepic.php or addfav.php:

```php
preg_match('/(ratepic|addfav)\.php/', $php_request)
```
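The same bracket problem affects the ratepic/addfav check, and here the effect is broader: "any character, one letter from the class, any character, then php" matches many script names. This sketch compares a bracketed and a grouped version on a few hypothetical URIs; displayimage.php is caught by the bracketed pattern because of the "ge.php" at its end.

```php
<?php
$bracketed = '/.[ratepic|addfav].php/';  // any char, one letter from the class, any char, "php"
$grouped   = '/(ratepic|addfav)\.php/';  // the literal script names

foreach (['/ratepic.php', '/addfav.php', '/displayimage.php?album=1&pos=3', '/index.php'] as $uri) {
    printf("%-34s bracketed=%d grouped=%d\n", $uri, preg_match($bracketed, $uri), preg_match($grouped, $uri));
}
```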
Step 4: Anything not matched above gets noindex,follow: it stays out of the index, but crawlers still follow its links.
Here is the code. Add it to your theme's theme.php file, or modify the pageheader() function already there:
```php
// Function for writing a pageheader
function pageheader($section, $meta = '')
{
    global $CONFIG, $THEME_DIR;
    global $template_header, $lang_charset, $lang_text_dir;

    // PHP_SELF includes the path (e.g. /gallery/index.php), so keep only the file name
    $php_self = basename($_SERVER["PHP_SELF"]);
    $php_request = $_SERVER["REQUEST_URI"];

    if ($php_self == 'index.php' ||
        $php_self == 'search.php' ||
        // first page of a numbered album: thumbnails.php?album=1
        preg_match('/thumbnails\.php\?album=[0-9]+$/', $php_request) ||
        // page 2 and above of a numbered album: thumbnails.php?album=1&page=2
        preg_match('/thumbnails\.php\?album=[0-9]+&page=([2-9]|[0-9]{2,})/', $php_request) ||
        // a single file: displayimage.php?pos=-123
        preg_match('/displayimage\.php\?pos=-[0-9]+/', $php_request))
    {
        $meta .= '<meta name="robots" content="index,follow" />'."\n";
    }
    // meta albums and action scripts; (a|b) matches whole names, [a|b] would match single letters
    elseif (preg_match('/thumbnails\.php\?album=(lastup|lastcom|topn|toprated|favpics|search)/', $php_request) ||
            preg_match('/(ratepic|addfav)\.php/', $php_request))
    {
        $meta .= '<meta name="robots" content="noindex,nofollow" />'."\n";
    }
    else
    {
        // everything else: not indexed, but its links are still followed
        $meta .= '<meta name="robots" content="noindex,follow" />'."\n";
    }

    $custom_header = cpg_get_custom_include($CONFIG['custom_header_path']);
    $charset = ($CONFIG['charset'] == 'language file') ? $lang_charset : $CONFIG['charset'];

    header('P3P: CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV INT DEM PRE"');
    header("Content-Type: text/html; charset=$charset");
    user_save_profile();

    $template_vars = array('{LANG_DIR}' => $lang_text_dir,
        '{TITLE}' => $CONFIG['gallery_name'] . ' - ' . strip_tags(bb_decode($section)),
        '{CHARSET}' => $charset,
        '{META}' => $meta,
        '{GAL_NAME}' => $CONFIG['gallery_name'],
        '{GAL_DESCRIPTION}' => $CONFIG['gallery_description'],
        '{SYS_MENU}' => theme_main_menu('sys_menu'),
        '{SUB_MENU}' => theme_main_menu('sub_menu'),
        '{ADMIN_MENU}' => theme_admin_mode_menu(),
        '{CUSTOM_HEADER}' => $custom_header,
    );

    echo template_eval($template_header, $template_vars);
}
```
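Because pageheader() depends on Coppermine's globals and helper functions, the robots decision is hard to test in place. One option is to pull it into a standalone function that takes the two server values as arguments, as sketched below. cpg_robots_content() is a hypothetical name, not part of Coppermine; pageheader() would call it with $_SERVER["PHP_SELF"] and $_SERVER["REQUEST_URI"] and wrap the result in the meta tag.

```php
<?php
// Hypothetical standalone version of the robots decision in pageheader().
// Returns the value for <meta name="robots" content="...">.
function cpg_robots_content(string $phpSelf, string $requestUri): string
{
    $script = basename($phpSelf); // "/gallery/index.php" -> "index.php"

    $indexFollow = $script === 'index.php'
        || $script === 'search.php'
        || preg_match('/thumbnails\.php\?album=[0-9]+$/', $requestUri)                       // album, first page
        || preg_match('/thumbnails\.php\?album=[0-9]+&page=([2-9]|[0-9]{2,})/', $requestUri) // album, page 2+
        || preg_match('/displayimage\.php\?pos=-[0-9]+/', $requestUri);                      // single file
    if ($indexFollow) {
        return 'index,follow';
    }

    $noFollow = preg_match('/thumbnails\.php\?album=(lastup|lastcom|topn|toprated|favpics|search)/', $requestUri)
        || preg_match('/(ratepic|addfav)\.php/', $requestUri);
    if ($noFollow) {
        return 'noindex,nofollow';
    }

    // Everything else: keep it out of the index but let crawlers follow its links.
    return 'noindex,follow';
}

echo cpg_robots_content('/gallery/thumbnails.php', '/gallery/thumbnails.php?album=1&page=1'), "\n"; // noindex,follow
```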
I have one more concern about duplication, although it probably belongs in a separate thread. Google recently added a section to its Webmaster Tools that lists pages with duplicate titles. I'd suggest that when someone uploads images, the CPG software warn them if the title already exists in the database. It wouldn't force them to change it, just warn them that they will end up with two pages that have the same title. Just a thought.
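The upload warning suggested above could start from a check like the one sketched here. It is a hypothetical helper, not part of CPG: it compares a proposed title against a list of existing titles, ignoring ASCII case and surrounding whitespace. How the existing titles are fetched from the database is left out, and the check only produces a warning; it never blocks the upload.

```php
<?php
// Hypothetical helper: true when $newTitle duplicates an existing title,
// ignoring surrounding whitespace and ASCII case (strcasecmp is byte-wise).
function title_is_duplicate(string $newTitle, array $existingTitles): bool
{
    $newTitle = trim($newTitle);
    if ($newTitle === '') {
        return false; // skip empty titles, since many files have no title at all
    }
    foreach ($existingTitles as $title) {
        if (strcasecmp($newTitle, trim((string) $title)) === 0) {
            return true;
        }
    }
    return false;
}

$existing = ['Sunset over the bay', 'Harbor at night'];
if (title_is_duplicate('sunset over the bay ', $existing)) {
    echo "Warning: another file already uses this title.\n";
}
```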