Author Topic: GSoC: Scalability  (Read 22829 times)
intorio Topic starter
Coppermine newbie

Posts: 6


« on: March 28, 2008, 01:20:42 am »

Hey, I thought I would start off getting some feedback on my abstract before I write up the detailed portion.

Code:
Abstract
=========
I propose adding distributed image serving functionality to coppermine. This
would greatly increase the capacity of each coppermine installation and reduce
the load on individual servers. With bandwidth getting cheaper, the desire to
share higher quality images continues to increase, and the associated load of
serving up those files, especially from shared hosting, becomes greater and
you risk consuming too many resources. With distributed image storing/serving,
this problem is greatly alleviated. Also, one might want to use a filestore
system that does not do anything but serve files, such as Amazon's Simple
Storage Service. Something like this has its own benefits and would be good to
take advantage of.

My system would add several methods of accomplishing this, choosable by the administrator:
1) Single off-site server. (Best for Amazon's S3)
2) A round-robin style service where each server is given load requests in turn.
3) A load-balance style service where the servers occasionally report in their system load to distribute future requests.

Each server added to the cluster could have a specific role:
a) Thumbnail Server
b) Picture Server
c) Both

Finally, there would be three replication modes
1) Distributed (Large data-sets)
2) Mirrored (Redundancy)
3) Distributed+Mirrored 
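
The round-robin and load-balance options listed in the abstract could be sketched roughly as below. This is only an illustration in Python rather than Coppermine's PHP; the class names, method names, and server names are hypothetical and not taken from any existing Coppermine code.

```python
import itertools


class RoundRobinSelector:
    """Hands out servers in turn, wrapping around at the end of the list."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(list(servers))

    def next_server(self):
        return next(self._cycle)


class LoadBasedSelector:
    """Picks the server with the lowest most recently reported load."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # Until a server reports in, assume it is idle (load 0.0).
        self._loads = {server: 0.0 for server in servers}

    def report_load(self, server, load):
        if server not in self._loads:
            raise KeyError(f"unknown server: {server}")
        self._loads[server] = load

    def next_server(self):
        return min(self._loads, key=self._loads.get)
```

In this sketch the load-based selector only knows what each server last reported, so how often servers report in would decide how well requests are actually balanced.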

Thanks.
Logged
Joachim Müller
Administrator
*****
Gender: Male
Germany Germany

Posts: 45051


aka "GauGau"


WWW
« Reply #1 on: March 28, 2008, 07:39:32 am »

How do you propose to handle authentication (in terms of avoiding image and bandwidth theft)?
Logged
Tarique Sani
Dev Team member
****
Gender: Male
Posts: 2710



WWW
« Reply #2 on: March 28, 2008, 07:58:28 am »

How do you propose to handle authentication (in terms of avoiding image and bandwidth theft)?

I am not sure that authentication falls within the ambit of scaling. The distributed storage would be as secure as the current method of storage in CPG - if you know the direct URL you can see the image.

What I am more interested in seeing is the abstraction of storage methods.
Logged

SANIsoft PHP applications for E Biz
Joachim Müller
Administrator
*****
Gender: Male
Germany Germany

Posts: 45051


aka "GauGau"


WWW
« Reply #3 on: March 28, 2008, 09:38:33 am »

Let me clarify my concerns:

Let's assume that we have two website owners: Arthur Artist and Theodore Thief. Arthur is an anime artist - he is creating genuine images and for reasons of storage limitations on his webserver (where he runs coppermine) he is uploading his anime art to some external storage space - let's say Amazon's S3 storage. His files then reside on http://amazon.com/s3-storage-space/arthur-artist/. He then goes to coppermine and (assuming that the proposed project is being accepted and implemented) specifies in coppermine's user interface that his external images are residing at the external storage space. The pics stored at Amazon are being displayed embedded into Arthur's coppermine-driven gallery located at http://arthur-artist.tld/gallery/
Theodore Thief visits Arthur's gallery and decides that he wants to display Arthur's anime pics on his own gallery to promote his site. He right-clicks an image, opens its properties and finds out that the images are located at http://amazon.com/s3-storage-space/arthur-artist/. He goes to his coppermine-driven gallery located at http://theodore-thief.tld/gallery/ and uses the interface proposed by intorio to just specify the URL of the external pics. As a result, the pics owned by Arthur will be displayed on Theodore's page. There is no bandwidth theft (as the bandwidth of Amazon is being consumed, not the bandwidth of Arthur), however this is image theft in my opinion. In fact, it doesn't even matter whether Arthur uploaded the pics to external storage space or his own webspace: if the coppermine user interface should ever contain a method to specify external images to be embedded into the gallery, there needs to be a method that will only do so if the original owner of the external images has explicitly agreed to have his images displayed embedded on a particular site.
There are countless images stored on public storage space (be it Amazon's S3, Imageshack, Flickr or whatever you could imagine). I wouldn't want coppermine to get a bad reputation as an application that encourages users to steal images. That's why I suggest implementing an authorization check that will only allow you to specify external storage space if there is a particular file present in the folder of the external storage. This could be a file that follows a particular naming scheme; in my above example, this could be a file named arthur-artist.tld_gallery.txt or arthur-artist.tld_gallery.jpg. If that file is present on the external storage space (e.g. http://amazon.com/s3-storage-space/arthur-artist/arthur-artist.tld_gallery.txt), the proposed part of the coppermine user interface accepts the files stored on the external storage; if it doesn't exist, the external images are rejected.
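
The marker-file check described above could look something like the following sketch. It is written in Python for illustration only; the function names are hypothetical, the exact naming scheme (host and path segments joined by underscores) is inferred from the single example in the post, and the actual HTTP lookup is left to a caller-supplied function.

```python
from urllib.parse import urlparse


def marker_filename(gallery_url, extension="txt"):
    """Build the marker file name for a gallery URL.

    "http://arthur-artist.tld/gallery/" becomes "arthur-artist.tld_gallery.txt".
    """
    parsed = urlparse(gallery_url)
    if not parsed.hostname:
        raise ValueError(f"not an absolute URL: {gallery_url!r}")
    # Join the host and the non-empty path segments with underscores.
    segments = [part for part in parsed.path.split("/") if part]
    return "_".join([parsed.hostname] + segments) + "." + extension


def external_storage_allowed(storage_url, gallery_url, url_exists):
    """Accept external storage only if the owner placed the marker file there.

    `url_exists` is a caller-supplied function (hypothetical) that returns
    True when an HTTP request for the given URL succeeds.
    """
    marker_url = storage_url.rstrip("/") + "/" + marker_filename(gallery_url)
    return url_exists(marker_url)
```

With this scheme Theodore's gallery would look for theodore-thief.tld_gallery.txt in Arthur's storage folder, would not find it, and would be refused.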

This is what I meant by authentication - I thought that we had already agreed a long time ago (on the dev board) that this would be the prerequisite for allowing external storage embedded in coppermine.

Please let me know your thoughts...

Joachim
Logged
Abbas Ali
Administrator
*****
Gender: Male
India India

Posts: 2087


Spread the PHP Web


WWW
« Reply #4 on: March 28, 2008, 10:46:29 am »

Here is what will happen:

  • At the time of installation the storage module will be determined. It can be normal file storage, ftp, s3 or any other
  • There won't be any change in current upload system front end i.e. user will see file upload boxes and he will upload the files to coppermine
  • On upload the file is placed in a temp folder on user's server. Currently we use move_uploaded_file to move the file to its final location but after storage module implementation the respective module's move method will be called and that method will move the file to the storage system (local file, ftp, s3 etc...)
  • The file will be moved to its destination on storage server
  • For file urls, there will be a get_url method in the storage module (class) which will take either pid or filename and will return the correct url of the file to be used in img src
  • So this system will basically be similar to what we have now, but the files can be stored on another server
  • The default storage module shipped with cpg will be normal http file system (i.e. what we have currently)
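
To make the list above more concrete, here is a rough sketch of what a storage module with a move method and a get_url method might look like. It is in Python rather than PHP, all class and parameter names are hypothetical, and for brevity get_url only accepts a filename (not a pid).

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class StorageModule(ABC):
    """Interface every storage module would implement (names are hypothetical)."""

    @abstractmethod
    def move(self, temp_path, filename):
        """Move an uploaded file from the temp folder to its final location."""

    @abstractmethod
    def get_url(self, filename):
        """Return the URL to use in an img src attribute."""


class LocalFileStorage(StorageModule):
    """Default module: files stay on the gallery's own file system."""

    def __init__(self, album_dir, base_url):
        self.album_dir = Path(album_dir)
        self.base_url = base_url.rstrip("/")

    def move(self, temp_path, filename):
        self.album_dir.mkdir(parents=True, exist_ok=True)
        destination = self.album_dir / filename
        shutil.move(str(temp_path), str(destination))
        return destination

    def get_url(self, filename):
        return f"{self.base_url}/{filename}"
```

An FTP or S3 module would implement the same two methods, so the upload code would only ever call move and get_url and would not need to know where the file ends up.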

Joachim, the approach you mentioned above was the other way round i.e. upload the files separately (directly) on storage system and provide the URL in coppermine.

Yes - I agree that image urls will be visible to the thief but he will not be able to directly use that url in his coppermine installation. He will first have to download those images and then upload them to his coppermine installation.

I hope I was able to explain it clearly.

Abbas
Logged

--- Love is blind, wish it was mute too ---
Visit me @ www.abbasali.net
Tarique Sani
Dev Team member
****
Gender: Male
Posts: 2710



WWW
« Reply #5 on: March 28, 2008, 11:10:50 am »

In addition to what Abbas wrote

This is what I meant by authentication - I thought that we had already agreed a long time ago (on the dev board) that this would be the prerequisite for allowing external storage embedded in coppermine.

We are going one step further - CPG will now ensure that if CPG cannot upload to the remote storage, it will not be able to display from it either. I hope that satisfies your very valid concern.
Logged

SANIsoft PHP applications for E Biz
intorio Topic starter
Coppermine newbie

Posts: 6


« Reply #6 on: March 28, 2008, 02:55:22 pm »

My understanding of the problem was the same as Abbas Ali's.

I plan to write up the rest of the proposal this afternoon since there don't seem to be any issues with the core ideas.
Logged
Tarique Sani
Dev Team member
****
Gender: Male
Posts: 2710



WWW
« Reply #7 on: March 29, 2008, 04:17:24 am »

FWIW - there is already another proposal for this idea from zmarty (of this forum); maybe you two can get together and sort out which part each of you wants to code.

To re-iterate - We are looking for a way to abstract the image storage - storage modules which immediately come to mind are local filesystem/local but outside webroot/remote ftp upload/remote http upload/Amazon S3/Flickr/Picasa

It will have two basic parts
#1 move after upload part
#2 create the image / media display URL

The default that CPG will ship with will be local filesystem. All others would be downloadable and need their own setup

The abstraction should be powerful enough to not need any tweaking of core CPG code if the user decides to use another storage module. Which storage module to use will be an install time decision and not changeable at least for now - devs are free to write storage module converters
Logged

SANIsoft PHP applications for E Biz
intorio Topic starter
Coppermine newbie

Posts: 6


« Reply #8 on: March 29, 2008, 08:20:24 pm »

So it would be optimal to add this functionality as a plug-in?
Logged
intorio Topic starter
Coppermine newbie

Posts: 6


« Reply #9 on: March 29, 2008, 08:28:58 pm »

Sorry, doesn't seem to be an edit feature.

I just discovered the 'picture_url' filter for the plug-in api, which essentially makes the choice obvious. Though it does not seem to be in the 1416 release, is it okay to make a proposal for the svn version?
Logged
SaWey
Dev Team member
****
Gender: Male
Belgium Belgium

Posts: 1119



WWW
« Reply #10 on: March 29, 2008, 10:36:05 pm »

Of course, that is the one you would be working on!
Logged
intorio Topic starter
Coppermine newbie

Posts: 6


« Reply #11 on: March 31, 2008, 08:20:33 am »

I know I am pushing the deadline here; I had some things come up this weekend that I really wish hadn't.

Anyway, I was hoping for some last input on my proposal before I submit it. What follows are the Abstract and Detailed portions of the application (not the entire application).

Code:
Abstract
========
I propose creating a series of plugins for Coppermine that would add support
for distributed storage. This would increase the capacity of each coppermine
installation and reduce resource use on individual servers. This can be useful in
a number of cases, such as shared and vps hosting where resources are limited and,
under load, the server might be slow to serve up both the webpage and the images.

This system would have several methods of implementing distributed storage,
selectable in basic to advanced configurations, all derived from existing
distributed methods.
1) Separate server(s). (Best for Amazon's S3) (If >1, random server)
2) A round-robin service, each server going in turns.
3) A load service, similar to round-robin, but factors in the server's load.

Each server would be configurable to host only thumbnails, pictures, or combination.
There would be, essentially, two separate server clusters, thumbnail and pictures,
with combination belonging to both.

In addition, servers could be setup in several different ways. Unique, where they
are the only server with copies of the files (precludes round-robin, load service).
Mirrored, where each server in a server group replicates the others. And
Distributed+Mirrored, similar to RAID 10, where some servers have unique files
in their group, and another group replicates them.

Finally, support for rule-based plugins. These would be plugins that have specific
mirroring rules, such as a certain user's photos always being placed on a specific
server. They would be implemented as a hook which would inform the plugin of
information regarding the picture and the plugin returns a specific server or
nothing if the rule does not apply, and the system will use the normal settings.


Detailed
========

To distinguish a concept, I will refer to image server types (Amazon S3, FTP, etc)
that are part of the distributed system as 'providers'. The actual server entities
will still be referred to as 'servers'.

The primary plugin would be the one responsible for the entire distributed system.
It would handle the necessary hooks, such as 'picture_url', to implement itself in
the system. It would handle management of the different servers and the distribution
mechanism.

Primary functions:

1) Provide a mechanism for providers to register themselves.
2) Offer a clean interface for adding new servers. This would be deferred to the
responsible plugin (as configuration requirements differ), but would be presented
through this plugin's configuration, to keep a consistent place of adding servers.
3) Allow the user to push files from one server to another.
4) Allow the user to delete servers.
5) Store deferred operations. Deferred operations being actions that could not be
completed on a remote server for some reason (disconnection, no write access, etc).
It would try again later. In the intermediate time, the picture(s) would be hosted
locally.
6) A non-trivial error system which would notify the admin as needed, such as: no
write access would be reported immediately, whereas disconnection would be reported
after a period of time.
7) Utilize the storage/rule plugin's interface to add/delete files.
8) Handle the distribution mechanisms and corresponding url modifications.

The setup for handling servers would be three clusters: Thumbnail, Picture, Both;
with their function corresponding to their name. Inside each cluster are three
sub-groups: Mirrored, Distributed, Mirrored/Distributed. Mirrored/Distributed
has multiple sub-groups which indicate where the distributed portion ends, each
sub-group mirroring the others. Finally, there would be a category of servers
not used by the normal distribution mechanisms, which rely solely on the
rule-based plugins to assign images to them.

The storage plugins are responsible for connecting to, putting, and removing
files from their storage provider. When installed they register themselves with
the primary plugin. They are responsible for providing the interface to add a new
server, though their actual configuration pages should not have this functionality.

The storage plugins would be, as provided by this project, Amazon S3, FTP, and
Managed HTTP. Managed HTTP would be a php script installed on the remote server,
set up with a unique security key which would be required to communicate with it,
and would allow the primary plugin to add/delete files on the remote machine.

The rule plugins are responsible for handling special cases, or possibly even the
actual distribution mechanisms if the design allows for it. Essentially they
receive a notification each time a picture is added and are allowed to manipulate
its destination server group or set a single server if desired.

Finally, many users do not need this advanced configuration, so instead a simple
interface will be created which will be the default presented. It would basically
be a list of servers that could be added to/deleted from, with a radio control
allowing them to select Separate, Round-Robin, or Load with explanation. All the
servers would be performing in mirrored mode, for simplicity. Though it might be
better to allow them to select Distributed+Mirrored also, and present two lists
of mirrored server groups.
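
As a rough illustration of primary function 5 (deferred operations), the sketch below queues a remote operation when it fails and retries it later. It is Python rather than PHP, the class and method names are hypothetical, and it assumes that remote failures such as disconnection or missing write access surface as OSError (which covers ConnectionError and PermissionError in Python).

```python
from collections import deque


class DeferredOperations:
    """Queues remote operations that failed so they can be retried later."""

    def __init__(self):
        self._pending = deque()

    def run(self, operation, description):
        """Try `operation` now; queue it if it raises OSError.

        Returns True if the operation succeeded immediately.
        """
        try:
            operation()
            return True
        except OSError:
            self._pending.append((operation, description))
            return False

    def retry_all(self):
        """Retry every queued operation once; keep the ones that still fail."""
        still_pending = deque()
        while self._pending:
            operation, description = self._pending.popleft()
            try:
                operation()
            except OSError:
                still_pending.append((operation, description))
        self._pending = still_pending
        return len(self._pending)

    def pending_descriptions(self):
        return [description for _, description in self._pending]
```

While an upload is still in the pending list, the picture would keep being served from the local copy, and the descriptions could feed the admin notifications mentioned in function 6.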

What I am looking for especially are bad ideas, ideas I am too terse on, etc. Thank you.
Logged
rpjanaka
Coppermine newbie

Posts: 2


« Reply #12 on: April 02, 2008, 10:31:36 am »

Could you please explain what you mean by "User-based sharding"?

Is it something like torrents?
Logged
Aditya Mooley
Dev Team member
****
Gender: Male
India India

Posts: 769



WWW
« Reply #13 on: April 02, 2008, 11:00:31 am »

In user-based sharding we can keep each user's data on a different server based on some criterion. For example, for users with usernames starting with a to d, the data will be on server 1; e to h on server 2, etc.
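
A tiny sketch of that first-letter scheme, in Python for illustration. The function name and the fallback behaviour (anything that does not fit a range goes to the last server) are assumptions of mine, not something decided for Coppermine.

```python
def shard_for_username(username, servers, letters_per_server=4):
    """Map a username to a server by its first letter.

    With the default of four letters per server, a-d go to servers[0],
    e-h to servers[1], and so on. Names that do not start with a-z, and
    letters beyond the last range, fall back to the final server.
    """
    if not servers:
        raise ValueError("at least one server is required")
    first = username[:1].lower()
    if not ("a" <= first <= "z"):
        return servers[-1]
    index = (ord(first) - ord("a")) // letters_per_server
    return servers[min(index, len(servers) - 1)]
```

Splitting by first letter is easy to explain but rarely spreads users evenly, so a real setup might prefer a different criterion.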
Logged

--- "Its Nice 2 BE Important but its more Important 2 Be NICE" ---
Follow Coppermine on Twitter
intorio Topic starter
Coppermine newbie

Posts: 6


« Reply #14 on: April 02, 2008, 08:23:35 pm »

Which I hope to achieve through rule plugins, to allow other kinds of sharding.
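
Roughly, the rule plugin idea from the proposal could work like the sketch below: each rule looks at the picture and returns a server or nothing, and the normal settings apply when no rule matches. This is Python for illustration only, and every name in it is hypothetical.

```python
def choose_server(picture, rules, default_selector):
    """Ask each rule in order; fall back to the normal settings if none applies.

    `picture` is a dict of picture details, each rule is a callable that
    returns a server name or None, and `default_selector` is a callable
    that returns the server the normal distribution mechanism would pick.
    """
    for rule in rules:
        server = rule(picture)
        if server is not None:
            return server
    return default_selector()


def owner_rule(owner, server):
    """Build a rule that pins one user's pictures to a specific server."""
    def rule(picture):
        return server if picture.get("owner") == owner else None
    return rule
```

A sharding scheme would then just be another rule in the list, which is why other kinds of sharding would not need changes to the primary plugin.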
Logged
slausen
Coppermine regular visitor
**
Posts: 67


« Reply #15 on: April 08, 2008, 02:51:39 am »

Hi-

I recently submitted a feature request that may be able to add some value to this thread: http://forum.coppermine-gallery.net/index.php/topic,51694.0.html

It is not as ambitious as this GSOC proposal, but it's possible that implementation of the feature request could help to lay a small bit of the groundwork for what you envision.
Logged
slausen
Coppermine regular visitor
**
Posts: 67


« Reply #16 on: September 16, 2008, 07:26:31 pm »

Hi-

So the summer is over. I was just wondering if any work on this feature was done as outlined in this thread? Thanks.
Logged
SaWey
Dev Team member
****
Gender: Male
Belgium Belgium

Posts: 1119



WWW
« Reply #17 on: September 16, 2008, 09:18:28 pm »

Work has been done on this feature, but the GSoC code isn't available yet.
Logged
Joachim Müller
Administrator
*****
Gender: Male
Germany Germany

Posts: 45051


aka "GauGau"


WWW
« Reply #18 on: September 17, 2008, 06:08:21 pm »

Check the sticky thread http://forum.coppermine-gallery.net/index.php/topic,51330.0.html - it contains references to the GSoC pages. Check out the SVN to see the code (unsupported!) - see http://coppermine.svn.sourceforge.net/viewvc/coppermine/branches/cpg1.5.x/gsoc2008/ and http://documentation.coppermine-gallery.net/en/dev_subversion.htm#subversion
Logged