Thursday, April 9, 2009

memcache memo(1)

Memcache FAQ
Memcached does its work with two hashes: first, the client hashes the key to pick the server that should hold it; second, that server does its own internal hash lookup to find the corresponding value. Of course, different client implementations handle these steps somewhat differently.

When doing a memcached lookup, first the client hashes the key against the whole list of servers. Once it has chosen a server, the client then sends its request, and the server does an internal hash key lookup for the actual item data.

For example, if we have clients 1, 2, 3, and servers A, B, C:

Client 1 wants to set key "foo" with value "barbaz". Client 1 takes the full list of servers (A, B, C), hashes the key against them, and, let's say, ends up picking server B. Client 1 then directly connects to server B and sets key "foo" with value "barbaz". Next, client 2 wants to get key "foo". Client 2 runs the same client library as client 1 and has the same server list (A, B, C). It is able to use the same hashing process to figure out that key "foo" is on server B. It then directly requests key "foo" and gets back "barbaz".
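
A minimal sketch of those two stages in PHP (the pecl/memcache extension). The server addresses are made up, and real client libraries do the server selection internally, often with smarter consistent hashing rather than this simple modulo scheme:

    <?php
    // Stage 1: the client hashes the key against the whole server list.
    // This naive modulo scheme is only for illustration.
    $servers = array('10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211');
    $index   = (crc32('foo') & 0x7fffffff) % count($servers);
    list($host, $port) = explode(':', $servers[$index]);

    // Stage 2: talk to that one server; it does its own internal hash
    // lookup to find the item stored under key "foo".
    $memcache = new Memcache();
    $memcache->connect($host, (int)$port);
    $memcache->set('foo', 'barbaz', 0, 3600);   // flags = 0, expire = 1 hour
    echo $memcache->get('foo');                 // "barbaz"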



Can I use different size caches across servers and will memcached use the servers with more memory efficiently?

Memcached's hashing algorithm, which determines which server a key is cached on, does not take the memory sizes of the servers into account. A workaround is to run multiple memcached instances on the server that has more memory, with each instance using the same cache size as all your other servers.
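
For example (host names and sizes here are made up), if the larger box has twice the RAM, you could start two same-sized daemons on it and list both in the client:

    <?php
    // On the larger box, two instances were started beforehand, e.g.:
    //   memcached -d -m 1024 -p 11211
    //   memcached -d -m 1024 -p 11212
    // Each uses the same -m size as the single instance on the smaller boxes.
    $memcache = new Memcache();
    $memcache->addServer('small-box', 11211);
    $memcache->addServer('big-box',   11211);   // first instance
    $memcache->addServer('big-box',   11212);   // second instance, same host

The pecl/memcache addServer() call also accepts a weight parameter, which is another way to bias key distribution toward a bigger server.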



Memcached is not faster than my database. Why?

In a one-to-one comparison, memcached may not be faster than your SQL queries. However, that is not its goal. Memcached's goal is scalability. As connections and requests increase, memcached will perform better than most database-only solutions. Please test your code under high load, with simultaneous connections and requests, before deciding that memcached is not right for you.



Cache things other than SQL data!

When first plugging memcached into everything you can get your hands on, it may not be obvious that you can or should cache anything other than SQL resultsets. You can, and you should!

Say you are building a profile page for display. You might fetch a user's bio section (name, birthdate, hometown, blurb), then format the blurb to replace custom XML tags with HTML, or run some nasty regexes. Instead of caching 'name, birthdate, hometown, blurb' independently, or as one item, cache the rendered output chunk! Then you can simply fetch the pre-processed HTML chunk, ready for inclusion in the rest of the page, saving precious CPU cycles.
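
A rough sketch of that idea, where fetch_bio_from_db() and render_bio_html() are placeholders standing in for whatever your application actually does:

    <?php
    $memcache = new Memcache();
    $memcache->connect('127.0.0.1', 11211);

    $userId = 42;                                // example user id
    $key    = 'profile_html:' . (int)$userId;
    $html   = $memcache->get($key);

    if ($html === false) {                       // cache miss
        // Placeholder helpers: replace with your own data access and rendering.
        $row  = fetch_bio_from_db($userId);      // name, birthdate, hometown, blurb
        $html = render_bio_html($row);           // the expensive tag/regex work
        $memcache->set($key, $html, 0, 600);     // keep the finished chunk for 10 minutes
    }
    echo $html;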



Use a cache hierarchy

In most cases you have the ability to use a localized cache or memcached. We reach for memcached so we can enjoy a massive volume of cached data in a high-speed farm, but sometimes it makes sense to go back to your roots a little and maintain multiple levels of cache.

Peter Zaitsev has written about the speed comparisons of PHP's APC over localhost, vs memcached over localhost, and the benefits of using both:

* http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/
* http://www.mysqlperformanceblog.com/2006/09/27/apc-or-memcached/

Often you'll have a very small amount of data (product categories, connection information, server status variables, application config variables) that is accessed on nearly every page load. It makes a lot of sense to cache this as close to the process as possible (or even inside the process, if you can). It can help lower page render time, and it increases reliability in case of memcached node failures.
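
A sketch of such a two-level lookup, using APC's apc_fetch()/apc_store() in front of memcached; load_config_from_db() is a placeholder for the real data source:

    <?php
    function get_app_config($memcache) {
        $config = apc_fetch('app_config');           // level 1: in-process APC
        if ($config !== false) {
            return $config;
        }
        $config = $memcache->get('app_config');      // level 2: the memcached farm
        if ($config === false) {
            $config = load_config_from_db();         // level 3: placeholder for the real source
            $memcache->set('app_config', $config, 0, 300);
        }
        apc_store('app_config', $config, 60);        // keep a short-lived local copy
        return $config;
    }

The short APC TTL keeps the local copies from drifting too far from what the rest of the farm sees.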



Creating good keys

It's a good idea to use sprintf(), or a similar function, when creating keys. Otherwise, it's easy for null values and boolean values to slip into your keys, and these may not work as you expect. e.g. $memKey = sprintf('cat:%u', $categoryId);

WIP: Mulled this over, need someone with better examples to fill this in. Short keys tend to be good, using prefixes along with an MD5 or short SHA1 can be good, namespace prep is good. What else?
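
A few examples along those lines (the 'myapp:v2' namespace prefix and the variables are just illustrative); note that memcached keys are limited to 250 characters and may not contain spaces or control characters:

    <?php
    $categoryId  = 42;
    $searchQuery = 'blue suede shoes';

    // Force the type with sprintf() so nulls/booleans can't sneak in.
    $memKey = sprintf('myapp:v2:cat:%u', $categoryId);

    // For long or arbitrary strings (search queries, URLs), keep the key
    // short and safe by hashing it.
    $memKey = 'myapp:v2:search:' . md5($searchQuery);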



Is memcached atomic?
Atomic operation: being atomic means that even if some process issues a set and a get on the same item at the same time, the two operations are still executed one after the other, and no other process is allowed to touch the item in between.

Of course! Well, let's be specific:

* All individual commands sent to memcached are absolutely atomic. If you send a set and a get in parallel, against the same object, they will not clobber each other. They will be serialized, and one will be executed before the other. Even in threaded mode, all commands are atomic. If they are not, it's a bug :)

* A series of commands is not atomic. If you issue a 'get' against an item, operate on the data, then wish to 'set' it back into memcached, you are not guaranteed to be the only process working on that value. In parallel, you could end up overwriting a value set by something else.
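
A small sketch of the difference, using a made-up 'hits' counter. The get()-then-set() sequence below is the kind of read-modify-write that can be clobbered, while add() and increment() are single commands that the server executes atomically:

    <?php
    $memcache = new Memcache();
    $memcache->connect('127.0.0.1', 11211);

    // NOT safe under concurrency: another client can update 'hits'
    // between this get() and set(), and its write will be lost.
    $hits = (int)$memcache->get('hits');
    $memcache->set('hits', $hits + 1, 0, 0);

    // Each of these is a single command, so the server applies it atomically.
    $memcache->add('hits', 0, 0, 0);      // create the counter only if it is missing
    $memcache->increment('hits', 1);      // server-side atomic bump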
