Writing docs for the new bridge system is something I aim to get around to doing soon, but I'll answer these questions now.
collect_groups grabs the custom groups from the forum in cases where the forum has some special requirement or query for getting the groups list. For example, phpbb has an extra condition in the SQL, while for some forums we can just read the groups straight from an already provided user data array. The +100 is there so that the groups from Coppermine and the groups from the forum do not conflict: Coppermine groups would be e.g. 1, 2, 3, 4 and the additional forum groups would start at around 100 (some group ids can be negative). As a side note, the term post-based-groups is a little misleading. It actually refers to using the true groups defined in the forum, not specifically post-based ranks.
udb_hash_db can be used where there is some seed unique to the installation that needs to be combined with the database password to form the cookie password. punbb in particular uses this. If both the db and the cookie hold the same type of hash, you don't need to use this.
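To make the +100 concrete, here is a rough sketch. It is written in Python purely as an illustration, not taken from the bridge code, and the group names, the constant name and the helper name are all made up for the example.

```python
# Illustration only: names and sample groups are hypothetical.
FORUM_GROUP_OFFSET = 100  # the "+100" described above


def to_gallery_group_id(forum_group_id: int) -> int:
    """Shift a forum group id so it cannot collide with a Coppermine group id."""
    return forum_group_id + FORUM_GROUP_OFFSET


# Coppermine's own groups use small ids (1, 2, 3, 4 in the example above).
coppermine_groups = {1: "Group A", 2: "Group B", 3: "Group C", 4: "Group D"}

# Groups read from the forum; note the ids overlap with Coppermine's.
forum_groups = {1: "Forum group X", 2: "Forum group Y", 5: "Forum group Z"}

# Merge both sets into one table without overwriting anything.
merged = dict(coppermine_groups)
for forum_id, name in forum_groups.items():
    merged[to_gallery_group_id(forum_id)] = name
```

Without the offset, forum groups 1 and 2 would overwrite Coppermine groups 1 and 2 in the merged table.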
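Roughly, the idea looks like the sketch below. This is Python for illustration only. The seed value, the use of MD5 and the order in which the seed and the stored hash are joined are assumptions, so check the forum's own source for the real recipe.

```python
import hashlib
import hmac

# Hypothetical per-installation seed; a real forum keeps this in its config.
INSTALL_SEED = "a1b2c3"


def hash_db_password(db_password_hash: str, seed: str = INSTALL_SEED) -> str:
    """Combine the seed with the hash stored in the db to get the cookie value.

    Assumption: MD5 over seed + stored hash. The real scheme may differ.
    """
    combined = (seed + db_password_hash).encode("utf-8")
    return hashlib.md5(combined).hexdigest()


def cookie_matches(cookie_hash: str, db_password_hash: str) -> bool:
    """Compare what the browser sent with what the db hash should produce."""
    expected = hash_db_password(db_password_hash)
    return hmac.compare_digest(cookie_hash, expected)
```

If the cookie simply holds the same hash as the db, this extra step is not needed and the two values can be compared directly.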
Yes, you can provide an existing db connection id to connect() at the end of the constructor. SMF is a good example of where we do this.
get_groups is used to get an array of all groups that the user is in for forums that support multiple groups. Multiple group membership is implemented in different ways in different forums.
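As one possible shape for this, the sketch below assumes a forum that stores a primary group plus a comma-separated list of extra groups on the user row. It is Python for illustration only, and the column names are hypothetical; other forums keep memberships in a separate table instead.

```python
from typing import Dict, List


def get_groups(user_row: Dict[str, str]) -> List[int]:
    """Return every group id the user belongs to, primary group first."""
    groups = [int(user_row["primary_group"])]
    extra = user_row.get("additional_groups", "")
    for part in extra.split(","):
        part = part.strip()
        # Skip empty entries and ids we have already collected.
        if part and int(part) not in groups:
            groups.append(int(part))
    return groups
```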
session_extraction and cookie_extraction are two different ways to get auth info from the user. Many forums use three cookies: one with a session id, one with the user id and one with a password hash. Some use full-blown sessions, so that can be used too. Either method should return the id and password hash the browser is providing. Having two methods makes things more reliable; the IPB2 bridge is a good example here. The session method takes priority.
You can of course skip those two functions and do all the authentication yourself by overriding the authenticate method. The phorum bridge does this: we include its authentication file and trust the information that comes back.
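The priority order can be sketched like this. It is Python for illustration only; the session and cookie key names are made up, and a real bridge would then check the returned hash against the forum's db.

```python
from typing import Mapping, Optional, Tuple

Credentials = Tuple[int, str]  # (user id, password hash)


def session_extraction(session: Mapping[str, str]) -> Optional[Credentials]:
    """Read id and hash from session data, if the forum uses sessions."""
    if "user_id" in session and "pass_hash" in session:
        return int(session["user_id"]), session["pass_hash"]
    return None


def cookie_extraction(cookies: Mapping[str, str]) -> Optional[Credentials]:
    """Read id and hash from the forum's cookies."""
    if "member_id" in cookies and "pass_hash" in cookies:
        return int(cookies["member_id"]), cookies["pass_hash"]
    return None


def extract_credentials(
    session: Mapping[str, str], cookies: Mapping[str, str]
) -> Optional[Credentials]:
    """Try the session first and fall back to cookies only if it gives nothing."""
    creds = session_extraction(session)
    if creds is None:
        creds = cookie_extraction(cookies)
    return creds
```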
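In outline, overriding authenticate might look like the sketch below. It is Python for illustration only; the class names, the check_hash hook and the forum_auth callable are hypothetical stand-ins, the last one for whatever the forum's own authentication file gives back.

```python
from typing import Callable, Optional, Tuple


class Bridge:
    """Stand-in for the default flow built on the two extraction functions."""

    def session_extraction(self) -> Optional[Tuple[int, str]]:
        return None

    def cookie_extraction(self) -> Optional[Tuple[int, str]]:
        return None

    def check_hash(self, user_id: int, password_hash: str) -> bool:
        # A real bridge would compare against the forum's user table here.
        return False

    def authenticate(self) -> Optional[int]:
        """Return the logged-in user id, or None for a guest."""
        creds = self.session_extraction() or self.cookie_extraction()
        if creds is not None and self.check_hash(*creds):
            return creds[0]
        return None


class DelegatingBridge(Bridge):
    """Skips extraction entirely and trusts the forum's own answer."""

    def __init__(self, forum_auth: Callable[[], Optional[int]]) -> None:
        self.forum_auth = forum_auth

    def authenticate(self) -> Optional[int]:
        return self.forum_auth()
```

The trade-off is that the bridge then depends on the forum's authentication code being available and behaving as expected.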