Author Topic: What about Similiar Images with metric distance functions?  (Read 3776 times)
da4walker Topic starter
Coppermine newbie

Germany Germany

Posts: 6


« on: November 24, 2009, 07:30:21 am »

Hi folks,
there are several methods for finding similar images using color histograms. They work reasonably well, and this gets really interesting when you have a lot of family pictures...
It works like this:
You select a picture and ask for other pictures that are similar to it. A kind of signature is created for the selected picture and compared with the signatures of all the other pictures stored in the database.

I am not asking anyone to implement this feature; I think I could implement it myself.
So is there interest in such a feature?
If so, I would contact some of the developers here to discuss the details.

cu
Logged
Joachim Müller
Dev Team member
****
Gender: Male
Germany Germany

Posts: 47735


aka "GauGau"


WWW
« Reply #1 on: November 24, 2009, 09:26:14 am »

There is a mod that creates histograms: http://forum.coppermine-gallery.net/index.php/topic,18759.0.html
The enlargeIt plugin by Timo features a histogram button as well. What exactly do you propose? Do you propose to create a histogram on upload and store it in the database, so that code can search the database for identical or similar images based on the histogram? That would be a cool feature, but I'm not sure about the impact on resources, which may be huge.
Anyway, I'm interested to hear more, please elaborate.
Logged
da4walker Topic starter
« Reply #2 on: November 24, 2009, 10:22:12 am »

Hi Joachim,

As you already noticed, the intention is the following:
- On each picture upload a histogram vector is created and stored in the database, e.g. v(n1,n2,n3,...).
  Of course, when we later want to search for the most similar pictures, we have to calculate a metric distance (e.g. Euclidean, Manhattan, and so on) to every vector in the database. Although this operation is linear in complexity, it would take really long if you have thousands of pictures in your database.
- To speed this up, I would suggest calculating a single value for each vector and storing it in the database too, indexed with a B-tree or something like that. This value could be the Euclidean length of the vector itself: value=(n1^2+n2^2+n3^2)^(1/2). Now comes the great thing about this value (at least I think so): if two pictures are similar in their histograms, their values will surely be similar too (the difference between the two values is small). Of course there are a lot of dissimilar pictures whose values are close as well. But this doesn't matter, because we will still reduce the result set to just a fraction of all the pictures in the database.

We first determine the pictures that are similar with respect to this single calculated value. Let's say we get 30 pictures as a result out of thousands. Because the difference in the value described above should be a lower bound on the real distance, it is guaranteed that the most similar pictures will be among these 30 pictures. Now we do the more expensive similarity calculation, which is fine for a small number of pictures, and then we have the most similar pictures.

If this value is stored with a tree index, which should be possible in MySQL, this could work well.
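
The two-step search described above could be sketched roughly as follows. This is only a minimal illustration in Python rather than PHP, with the database replaced by in-memory dicts; the names find_similar, norm_index and max_dist are made up for the example. It assumes the final comparison uses the Euclidean distance, for which the difference of two vector lengths can never exceed the distance between the vectors (reverse triangle inequality); for any other distance function that would have to be checked separately.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (city block) distance, shown for comparison."""
    return sum(abs(x - y) for x, y in zip(a, b))

def norm(v):
    """The single stored value: (n1^2 + n2^2 + ...)^(1/2)."""
    return math.sqrt(sum(x * x for x in v))

def find_similar(query, vectors, norm_index, max_dist):
    """Return [(distance, name)] for all vectors within max_dist of query.

    vectors:    hypothetical stand-in for the table of histogram vectors
    norm_index: hypothetical stand-in for the indexed column of norms
    """
    q_norm = norm(query)
    # Step 1: cheap range filter on the single precomputed value.
    candidates = [name for name, value in norm_index.items()
                  if abs(value - q_norm) <= max_dist]
    # Step 2: full distance calculation, only for the few candidates.
    hits = [(euclidean(query, vectors[name]), name) for name in candidates]
    return sorted(hit for hit in hits if hit[0] <= max_dist)
```

In a real setup the norm would be computed once at upload time and step 1 would be a range query on the indexed column, so only step 2 touches full vectors.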

Here is a PDF from some lecture where the idea of distance functions is described a bit.

Any questions on this?

cu


* Distanzfunktionen.pdf (205.1 KB - downloaded 325 times.)
Logged
Αndré
Administrator
*****
Gender: Male
Germany Germany

Posts: 4038


aka eenemeenemuu


« Reply #3 on: November 24, 2009, 02:12:05 pm »

Sounds very interesting.
Logged
da4walker Topic starter
« Reply #4 on: November 24, 2009, 05:38:54 pm »

So does anyone know which library I could use to calculate color histograms from pictures in PHP? I read some of the GD library documentation, but it seems to be useful only for creating images...
I need some hints on that so I can get started in PHP. I will then write such a script and see whether it also runs at a good speed.

cu

Logged
da4walker Topic starter
« Reply #5 on: November 24, 2009, 05:51:56 pm »

OK, I found it. The GD library has at least a function to get pixel information:
int gdImageGetPixel(gdImagePtr im, int x, int y) (FUNCTION)
So I have to build the histograms myself, which should just be a bit of typing and not difficult.
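
Building the histogram by hand from pixel values might look roughly like this. It is a sketch in Python rather than PHP and makes a few assumptions that are not from the thread: the pixels arrive as (r, g, b) tuples with values 0..255 (in PHP they would come from GD, e.g. via imagecolorat()), each channel gets 8 bins, and the counts are divided by the number of pixels so that pictures of different sizes stay comparable. The name channel_histograms is made up.

```python
def channel_histograms(pixels, bins=8):
    """Build one histogram per color channel and return them as one vector.

    pixels: iterable of (r, g, b) tuples with values in 0..255
    Returns a list of 3 * bins floats: red bins, then green, then blue.
    """
    counts = [0] * (3 * bins)
    total = 0
    for rgb in pixels:
        for channel, value in enumerate(rgb):
            # Map 0..255 onto bin 0..bins-1 and offset by the channel.
            counts[channel * bins + value * bins // 256] += 1
        total += 1
    if total == 0:
        return [0.0] * (3 * bins)
    # Normalize so the vector does not depend on the picture size.
    return [c / total for c in counts]
```

The resulting list is the vector v(n1,n2,n3,...) that would be stored per picture.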

cu
Logged
Phill Luckhurst
Administrator
*****
Gender: Male
United Kingdom United Kingdom

Posts: 2615



WWW
« Reply #6 on: November 24, 2009, 06:59:08 pm »

Take a look at the enlargeIt plugin ( http://forum.coppermine-gallery.net/index.php/topic,57424.0.html ) as that generates histograms in PHP. Might save you a bit of time re-inventing the wheel.
Logged

It is a mistake to think you can solve any major problems just with potatoes.
da4walker Topic starter
« Reply #7 on: November 24, 2009, 07:12:34 pm »

Oh yeah, I should have read Joachim's post more carefully, he already mentioned this.
I found the code in the enlargeIt plugin, which should make everything a bit faster now.

Thanks for the hint.
cu
Logged
Joachim Müller
« Reply #8 on: November 26, 2009, 08:53:45 am »

The enlargeIt plugin basically just uses the histogram code taken from the mod I referred to initially, so you don't actually have to look at the enlargeIt plugin (as the main purpose of that plugin lies somewhere else), but just look at the mod (Histogram added.).
By coincidence I have been working on the histogram part of the enlargeIt plugin for cpg1.5.x the past three days (adding an option to the plugin to cache the histogram images properly and maintaining that cache efficiently).
Keep in mind though that GD2 (which is needed for that mod) is not available everywhere and that it's really a resource hog. However, as you don't need to actually create graphical resources, but are just interested in coming up with a vector and a calculated value based on that vector, the resource consumption should be negligible.
I can see what you're trying to do now, and as far as I can see it could be accomplished by creating a plugin (instead of a mod, which basically is a hack of Coppermine's core code). Maybe you will need some additional plugin hooks, but that would be acceptable.
The really tricky part is to come up with the search queries in the end to refine your search.
Logged
da4walker Topic starter
« Reply #9 on: December 03, 2009, 08:52:19 pm »

Hi folks,

after coding for a while I have managed to create a working standalone system.
I created a JApplet to upload complete folders of pictures, which are indexed into the database.
On an HTML page you can select another picture to be uploaded and compared. After that, all similar pictures (within a certain range) are shown, ordered by their similarity.
For some pictures it works really well, for others not so well. But it would be really cool as a plugin that suggests similar pictures....

I attached an example result. The algorithm just uses the histograms of the pictures, which are stored as a vector.
This example takes under 2 seconds with a database that has 1000 pictures in it.
To reach this speed I made small histograms with just 3 bins for each color (red, green, blue), which are stored and indexed in the database. My first idea with the Euclidean distance of the vector didn't work because it wasn't a lower bound and lost some good results.
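
One possible reading of the 3-bin approach, sketched in Python rather than PHP. The thread does not say how the small histograms are queried, so the range check per bin, the tolerance parameter and the names coarse_signature and coarse_candidates are all assumptions made for this illustration. Each picture gets 9 values (3 bins for each of red, green, blue), and a picture only becomes a candidate if every one of its 9 values lies close to the query's.

```python
def coarse_signature(pixels, bins=3):
    """Return 3 * bins normalized counts: red bins, green bins, blue bins.

    pixels: iterable of (r, g, b) tuples with values in 0..255
    """
    counts = [0] * (3 * bins)
    total = 0
    for rgb in pixels:
        for channel, value in enumerate(rgb):
            counts[channel * bins + value * bins // 256] += 1
        total += 1
    return [c / total for c in counts] if total else [0.0] * (3 * bins)

def coarse_candidates(query_sig, signatures, tolerance):
    """Keep the pictures whose every coarse bin is within tolerance.

    signatures: hypothetical stand-in for the indexed database columns,
                a dict mapping picture name to its coarse signature
    """
    return [name for name, sig in signatures.items()
            if all(abs(q - s) <= tolerance for q, s in zip(query_sig, sig))]
```

With one indexed column per bin, the check in coarse_candidates would correspond to a set of range conditions in the query, and the full histogram comparison would then run only on the returned candidates.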

I have another idea to make this algorithm much faster and will test it over the next few days.

If someone has some pictures (each not too big), they could upload them somewhere so that I have more pictures to test with. I am thinking of a few tens of thousands of pictures at least, so that I can say something about a big database (at least big for pictures).

The only thing you need on the server side so far is the GD library for calculating the histograms. But even if you don't have such a library, you could calculate the vectors offline on your computer and put them into the database with a link to the picture.

Once this is working I will go through the code and comment it a bit, but then I will need some help from you because I have no idea how to create a plugin for Coppermine.
One of you could do that part; I could wrap all these algorithms in functions so that they are easy to use.

What do you think about this?


* exampleResult.png (249.44 KB, 1413x545 - viewed 120 times.)
Logged
Αndré
« Reply #10 on: December 04, 2009, 08:30:54 am »

The algorithm has to be sped up a lot to avoid timeout problems on larger galleries. I have a gallery with ~75k pictures on a free webhost (funpic.de), where we can test your plugin once it's created.
Logged